File Deletions
ATI 2003-0001

Document revision: 0.7 - 03/21/2003
Primary Author: Walter C. Wong (wcw+@cmu.edu)
http://asg.web.cmu.edu/arch/ati/2003-0001-deletion.html

1.0 Introduction

This document describes how files are deleted from each of the services below. Backups are discussed in section 3.

This document describes the current state of the system and should serve as a basis for future discussions. No discussions have yet occurred with local legal folks to align the technical solutions with any organizational document retention policy.

At this point, we recommend not making any changes to the systems without fully understanding the legal obligations and organizational objectives.

1.1 Overview

The permanent deletion of files from computer systems is very difficult, especially against someone with near-limitless resources to devote to recovery. Without complete physical destruction and/or degaussing of the media, it is likely that data can be recovered [1].

At this point, services tend to use the "standard" file deletion mechanism: when a file is deleted, the reference to that file is removed and the blocks containing its data are marked as 'free'. The system is then able to reuse those blocks for other data; until that happens, the original data remains on disk.
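As an illustration only, the toy model below shows why "standard" deletion leaves data recoverable: deleting removes the directory entry and returns the block to a free list, but the bytes stay in place until the block is reused. The `ToyDisk` class, its one-block-per-file layout, and its naive first-free allocation policy are hypothetical and do not model UFS or VxFS.

```python
class ToyDisk:
    """Hypothetical toy disk: each file occupies exactly one block."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [b""] * num_blocks     # raw block contents
        self.free = list(range(num_blocks))  # blocks available for reuse
        self.directory = {}                  # file name -> block number

    def write(self, name: str, data: bytes) -> None:
        block = self.free.pop(0)             # naive first-free allocation
        self.blocks[block] = data
        self.directory[name] = block

    def delete(self, name: str) -> int:
        # "Standard" deletion: drop the reference and mark the block free.
        # The data in the block itself is left untouched.
        block = self.directory.pop(name)
        self.free.append(block)
        return block

    def raw_read(self, block: int) -> bytes:
        # What a recovery tool scanning the raw device would see.
        return self.blocks[block]


disk = ToyDisk(4)
disk.write("mail.txt", b"secret")
freed = disk.delete("mail.txt")
print("mail.txt" in disk.directory)  # False: the name is gone
print(disk.raw_read(freed))          # b'secret': the data is still there
```

In this model a freed block goes to the end of the free list, so it is overwritten only after the other free blocks are used. A real filesystem's allocator makes that choice differently, which is why the sections below describe reuse as likely but not assured.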

No services attempt to overwrite the data with random data in order to make data recovery even more difficult.
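For comparison, a minimal sketch of overwrite-before-delete might look like the following. `overwrite_and_delete` is a hypothetical helper, not a tool in use on any service here. On journaling or copy-on-write filesystems, or on disks that remap sectors, the random bytes may be written to different blocks than the original data, so even this offers no guarantee.

```python
import os


def overwrite_and_delete(path: str, chunk_size: int = 64 * 1024) -> None:
    """Hypothetical helper: overwrite a file in place with random bytes,
    then unlink it. Not guaranteed to destroy the original blocks."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:          # open for in-place writing
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))        # random data over the old bytes
            remaining -= n
        f.flush()
        os.fsync(f.fileno())              # ask the OS to push writes to disk
    os.unlink(path)                       # then the "standard" deletion
```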

Recovering deleted data is already a fairly difficult and time-consuming task, especially on busy server machines, where freed blocks are likely to be overwritten anyway. Also, from a service provider's viewpoint, taking a busy server offline in order to recover data is very costly to the organization and would result in a serious loss of productivity.

2.0 Services

2.1 Cyrus Email

The Cyrus Email system currently stores email messages on the local disk of "back-end" servers. Veritas VXFS is used over the native Solaris UFS for performance and reliability reasons.

In practice, the filesystems are constantly busy. It is likely that freed blocks would be immediately reused by another email message. However, whether and when a given block is reused depends on the filesystem's block allocation algorithm and on whatever else the system is doing at the time. Clearly, this provides no assurance that all the data has been deleted.

The filesystem is backed up by the Cyrus instance of Amanda.

2.2 AFS

Every file in AFS is stored in a volume. A volume starts off in the read-write state. A single backup volume and multiple replicated volumes can be created from that read-write volume.

The backup volume is a read-only, copy-on-write version of the read-write volume. The backup volume is usually created by the backup system, and the data from this volume is then transferred to the backup system. The backup volume is also accessible by the end user. By convention, when a new user volume is created, the backup volume is mounted as OldFiles in the home directory; backup volumes for other volumes are not automatically mounted.

The replicated volumes are multiple read-only volumes created from a single read-write volume. Replicated volumes allow multiple servers to host the same data, thereby providing additional performance capacity for frequently accessed files and providing redundancy in case of a single server failure. The replication process is done "manually" by issuing the command to release a read-write volume. The system does NOT detect that a change has been made to the read-write volume and automatically propagate the change out to the replicas.

However, there is an "auto-release" process that runs twice daily, checks a list of volumes, and releases any volume that has changed.
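The auto-release check can be sketched as follows, assuming each watched volume records when its read-write copy last changed and when its replicas were last released. The `Volume` fields, volume names, and integer timestamps are hypothetical; the actual auto-release process is not described beyond what is stated above.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    rw_updated: int     # hypothetical: last change to the read-write volume
    last_released: int  # hypothetical: last time the replicas were released


def auto_release(watch_list: list[Volume], now: int) -> list[str]:
    """Release every watched volume whose read-write copy changed since
    its last release; return the names of the volumes released."""
    released = []
    for vol in watch_list:
        if vol.rw_updated > vol.last_released:
            # A real job would issue the AFS release command
            # (e.g. `vos release`) for this volume here.
            vol.last_released = now
            released.append(vol.name)
    return released
```

Volumes that are not on the list are never released automatically, which matches the point above that the system does not detect changes on its own.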

The fileservers currently use the namei filesystem method. That means that for every file in AFS there is a corresponding file on a server's local filesystem. When a user deletes a file in AFS, the corresponding server file may not be deleted immediately: it is removed from the disk only once no references to it remain. At that point, it is deleted using standard Unix filesystem mechanisms.

Outstanding references to deleted files may exist in the backup volume. As such, the file may not be deleted until the next time the backup volume is created. In general, a new backup volume is created nightly.

Outstanding references may also exist in the replicated volumes. To decrement these references, one must release the read-write volume. Some replicated volumes are released daily; others are released as needed.
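The reference rules above can be modeled with a small sketch. It assumes, for simplicity, that each server file tracks the set of volumes (read-write, backup, replica) that reference it; the real namei fileserver keeps link counts in its own on-disk structures, and `NameiStore` and the volume labels are hypothetical.

```python
from collections import defaultdict


class NameiStore:
    """Hypothetical model: a server file is unlinked only when no
    volume (read-write, backup, or replica) still references it."""

    def __init__(self) -> None:
        self.refs = defaultdict(set)  # server file -> referencing volumes
        self.on_disk = set()          # server files still present on disk

    def add(self, server_file: str, volume: str) -> None:
        self.refs[server_file].add(volume)
        self.on_disk.add(server_file)

    def drop(self, server_file: str, volume: str) -> None:
        self.refs[server_file].discard(volume)
        if not self.refs[server_file]:
            # Last reference gone: standard Unix unlink; blocks become free.
            self.on_disk.discard(server_file)
            del self.refs[server_file]


store = NameiStore()
for vol in ("rw", "backup", "replica"):
    store.add("f1", vol)
store.drop("f1", "rw")       # user deletes the file in AFS
store.drop("f1", "backup")   # nightly backup volume is recreated
store.drop("f1", "replica")  # read-write volume is released
```

Only the last of the three steps removes the server file, which is why a deleted file can linger until both the next backup and the next release.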

In practice, the filesystems are fairly busy. It is likely that freed blocks would be immediately reused by another file. However, whether and when a given block is reused depends on the filesystem's block allocation algorithm and on whatever else the system is doing at the time. Clearly, this provides no assurance that all the data has been deleted.

Backups are done via the Stage system, described in section 3.1. While not all volumes are backed up, all volumes that contain user data should be.

2.3 Web

2.3.1 User Web

www.andrew.cmu.edu, the user web publishing system, copies data from AFS to the local disk of the web server. The server picks the partition with the largest amount of free space and copies the files to that partition. If the copy succeeds, the pointer to the web pages is moved from the previous directory to the new one, and the old directory is then deleted.
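The publishing steps can be sketched as follows. This assumes the "pointer" is a symbolic link that is swapped atomically by renaming a temporary link over it; the source does not say how the pointer is implemented, and `publish`, its parameters, and the directory names are hypothetical.

```python
import os
import shutil


def publish(src: str, partitions: list[str], current_link: str,
            new_name: str) -> str:
    """Hypothetical sketch: copy src to the partition with the most free
    space, repoint current_link at the copy, then delete the old copy."""
    # 1. Pick the partition with the largest amount of free space.
    target = max(partitions, key=lambda p: shutil.disk_usage(p).free)
    new_dir = os.path.join(target, new_name)
    # 2. Copy the files. If this raises, the old pages stay in service
    #    and the partial copy is left behind as an orphan.
    shutil.copytree(src, new_dir)
    # 3. Move the pointer: build a temporary symlink, rename it over the
    #    old one (rename replaces the link atomically on POSIX systems).
    old_dir = (os.path.realpath(current_link)
               if os.path.islink(current_link) else None)
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_dir, tmp_link)
    os.replace(tmp_link, current_link)
    # 4. Delete the old directory; a failure here also leaves an orphan.
    if old_dir is not None:
        shutil.rmtree(old_dir, ignore_errors=True)
    return new_dir
```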

Sometimes there can be orphan directories -- directories to which there is no valid pointer. This can occur if the copy or deletion failed. There is a process that runs nightly that deletes the orphaned directories.
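A nightly orphan sweep might look roughly like the sketch below, assuming every live site is reached through a symbolic link and that any directory on a web partition not targeted by one of those links is an orphan. `remove_orphans` and its arguments are hypothetical.

```python
import os
import shutil


def remove_orphans(partitions: list[str], live_links: list[str]) -> list[str]:
    """Hypothetical sketch: delete every directory on the web partitions
    that no live pointer references; return the paths removed."""
    live = {os.path.realpath(link) for link in live_links}
    removed = []
    for part in partitions:
        # Snapshot the listing first so nothing is deleted mid-iteration.
        with os.scandir(part) as it:
            entries = list(it)
        for entry in entries:
            is_live = os.path.realpath(entry.path) in live
            if entry.is_dir(follow_symlinks=False) and not is_live:
                shutil.rmtree(entry.path)
                removed.append(entry.path)
    return removed
```

A production job would also need to skip directories that a publish is still copying into (for example, by ignoring recently modified ones); the source does not say how the nightly process handles this.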

The copies of web pages on the local disk are not backed up; the only backups of that content are the AFS backups. The system disk is backed up via the Amanda internal instance.

2.3.2 Campus Web

www.cmu.edu, the campus web publishing system, uses the new "Andrew Web Publishing System." Web pages are copied into AFS and also incorporated into CVS. CVS is a revision management system that allows one to have access to multiple versions of a file.

The CVS system is not directly accessible by end users at this time.

Because of CVS, files tagged for deletion may not actually be deleted. CVS also preserves changes, so deleted text remains readily accessible (though not to end users at this time).

The system disk is backed up via the Amanda internal instance. Since the web pages are stored on AFS, backup of the data is via the AFS mechanisms.

2.4 Oracle

Oracle deletions act like a standard file deletion.

Data in the database is dumped to a file and those files are backed up by the Amanda internal instance.

3.0 Backup Systems

There are two backup systems: Amanda and Stage.

The typical backup for both systems is not taken off site. Regular backups are written to a RAID array attached to the backup server. When the expiration time is reached, the files are deleted and the partition is then reused for the next backup.

3.1 Stage

Stage is a backup system specifically for AFS.

AFS volumes are categorized into stage groups and the retention of volumes varies with each group. The retention for each group is as follows:

Full dumps of every backup group are done every month.

"End of Semester" backups are done for AFS. These backups are full dumps that are performed to tape at the end of each semester and sent off site. These tapes are kept for five (5) years. At the end of this time, the tapes are retrieved and bulk erased.

3.2 Amanda

Amanda is used to back up the local disks of the servers. There are three Amanda instances: Cyrus, cost recovery, and internal. Amanda is better suited as a 'disaster recovery' backup system, where the backups are used to restore disks that have gone bad. Restoring an individual user's data has a higher overhead, as restores are done in whole disk partition chunks.

Amanda backups are not taken off site. They are written to a RAID array attached to the backup server. When the expiration time is reached, the files are deleted and the partition is then reused for the next backup.

The Cyrus rotation is 21 days: files can be restored from backups up to 21 days old, and no data is available beyond that. The internal and cost recovery rotation is 14 days.
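The rotation arithmetic can be made concrete with a small sketch. It assumes a backup remains on disk until it is more than the rotation length old; the exact expiry boundary in the Amanda configuration is not stated here, and the instance keys and function names are hypothetical.

```python
from datetime import date, timedelta

# Rotation lengths from the text; the inclusive boundary is an assumption.
ROTATION_DAYS = {"cyrus": 21, "internal": 14, "cost recovery": 14}


def oldest_restorable(instance: str, today: date) -> date:
    """Earliest backup date assumed to still be on disk for an instance."""
    return today - timedelta(days=ROTATION_DAYS[instance])


def is_restorable(instance: str, backup_day: date, today: date) -> bool:
    """True if a backup taken on backup_day is inside the rotation window."""
    return oldest_restorable(instance, today) <= backup_day <= today


today = date(2003, 3, 27)
print(oldest_restorable("cyrus", today))     # 2003-03-06
print(oldest_restorable("internal", today))  # 2003-03-13
```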

References

[1] - http://www.sans.org/rr/privacy/gone.php

ChangeLog

0.7  - 03/27/2003 - Document approved
0.6  - 03/13/2003 - include sans reference; clean up intro secure 
                    delete text 
0.5  - 03/10/2003 - poepping updates
0.4  - 03/10/2003 - update from dk08; tapes are destroyed
0.3  - 03/10/2003 - comments from kvd, cg2v, spellcheck, added intro
0.2  - 03/10/2003 - added backup foo
0.1  - 03/03/2003 - Initial draft