How To Archive Your Files
Introduction
No matter how much disk space we have, it is still a finite resource. Most of us do not clear out old files until a particular disk partition runs out of space.
Please take a look through your files to see whether there is anything that is no longer worth holding onto. If your overall directory is less than a few hundred gigabytes, then you might have very little that you can clean out. However, nearly everyone has some files that are no longer needed.
Please do not delete anything you are really using or want to hold onto for reference. For files you want to keep but rarely touch, the gzip compression utility can make a real difference. The point is to clear out garbage you have been meaning to remove, not to make your life difficult. :-)
Thanks for your help with this! A complete description of various tools (primarily df, du, gzip, tar, and gtar) follows below.
A Command Line approach for handling disk space
How to determine your disk usage:
The df command shows "Disk space Free". Running df by itself shows all currently mounted filesystems. You can also give it a directory argument, which shows available space for all users on that partition, not just your account.
- df -h ~ (where '~' represents your entire home directory)
- df -h /
- df -Ph / (recommended for Macs, to unclutter and not show inodes)
By default the df command reports sizes in fixed-size blocks (1K blocks with GNU df on Linux, 512-byte blocks on macOS), which can be hard to read for large disks. Using the -h flag for a "human-readable" form (megabytes, gigabytes, etc.) is very helpful.
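As a small sketch of putting df to work in a script, the function below prints how full the filesystem holding a given path is. The name disk_used_pct is made up for this illustration, and the sketch assumes the POSIX output format (-P), in which the fifth column of the second line is the capacity.

```shell
# disk_used_pct is a hypothetical helper name, not a standard command.
# Print the "Capacity" column (for example 42%) for the filesystem
# that holds the given path (default: the current directory).
disk_used_pct() {
    # -P forces one output line per filesystem; -k reports 1K blocks
    df -Pk "${1:-.}" | awk 'NR == 2 { print $5 }'
}

disk_used_pct /
```

The same call with ~ instead of / reports on the partition that holds your home directory.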
The du command shows "Disk Usage". It is most useful in its "summary" ('-s') mode, because otherwise it shows you every nested subdirectory.
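By default du reports sizes in blocks (512-byte blocks on macOS and other BSD-derived systems, 1K blocks with GNU du on Linux), which is even less intuitive to read. Thus, use du -hs for a "human-readable" form just like df, or use du -ks for kilobytes, which are easier to sort numerically. To find the overall size of your home directory, type:
- du -hs ~
To list the size of everything in the current directory, largest first:
- du -ks * | sort -nr
To do the same for your home directory, including hidden "dot" files and directories:
- cd ~
- du -ks * .[a-zA-Z0-9]* | sort -nr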
You can handle the output by piping it to head or a pager (less, more). Or you can redirect it to a file so that you don't have to keep repeating the du over and over.
- du -ks * | sort -nr | head -25
- du -ks * | sort -nr | less
- du -ks * | sort -nr > /tmp/du.myfiles
(where that filename can be anything you want)
Then you can use less on that file, e.g.,
- less /tmp/du.myfiles
An effective way to use this command is to cd to your home directory, determine the largest sub-directories (using this command), then cd to those directories and do it again.
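The drill-down routine described above could be wrapped in a small function like the one below. This is only a sketch: the name biggest is invented here, and it assumes a Bourne-style shell (sh or bash), where a pattern that matches nothing is passed through unchanged.

```shell
# biggest is a hypothetical helper name.
# usage: biggest [directory] [how_many]
biggest() {
    (
        cd "${1:-.}" || exit 1
        # * skips dot files, so .[!.]* is added to pick them up.
        # 2>/dev/null hides du's complaints about unmatched patterns
        # (and, as a side effect, about unreadable directories).
        du -ks -- * .[!.]* 2>/dev/null | sort -nr | head -n "${2:-10}"
    )
}
```

For example, biggest ~ 25 lists the 25 largest items in your home directory; running biggest again on the top entry repeats the search one level down.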
On the Linux cluster, you can check your home directory on the file servers with quota -s. (This won't work for local data disks.)
How to compress files
First of all, you should use gzip, not compress. gzip is generally faster than compress, and more importantly, it nearly always squeezes files smaller than compress. (Gzip is also backwards compatible, and can undo .Z files created by compress.) The bzip2 utility usually compresses files to an even smaller size than gzip, at the cost of more CPU time, and is very much recommended for larger files or for tar archives of directories. Type:
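- gzip filename (compresses the file and replaces it with filename.gz)
- gzip filename* or
- gzip fileA fileB (compresses several files, each into its own .gz file)
- gzip -r dirname (compresses every file beneath the directory dirname, recursively)
- gzip -d filename.gz (decompresses; gunzip filename.gz does the same)
- gzip --help (lists the available options)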
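To see the effect of these commands without touching real data, a round trip can be tried in a scratch directory. The sketch below makes up its own sample file (the name sample.log and its contents are purely illustrative) and compares the size before and after.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)

# Build a repetitive text file; text like this compresses well.
awk 'BEGIN { for (i = 0; i < 5000; i++) print "sample log line", i }' > "$workdir/sample.log"
before=$(wc -c < "$workdir/sample.log" | tr -d ' ')

gzip "$workdir/sample.log"          # replaces sample.log with sample.log.gz
after=$(wc -c < "$workdir/sample.log.gz" | tr -d ' ')
echo "before: $before bytes, after: $after bytes"

gzip -t "$workdir/sample.log.gz"    # test the compressed file's integrity
gzip -d "$workdir/sample.log.gz"    # restore the original sample.log
```

How much a file shrinks depends heavily on its contents: plain text usually shrinks a lot, while images, video, and files that are already compressed barely change.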
How to create archives
The tar command (Tape ARchive) allows you to create a single file which represents a concatenated group of files, retaining the original directory structure and file ownership/permissions. Creating a tar file involves no compression unless you request it, so an uncompressed tar file will be about the same size as (in fact slightly larger than) the sum of the sizes of the files it contains.
Thus, you should plan to compress your tar files as you create them to save space. You can do this by adding a flag of "z" (for gzip) or "j" (for bzip2).
You also want your tar file names to be self-documenting, so that you and others will know what they are in the future: use suffixes such as .tar.gz or .tgz for gzip, and .tar.bz2 or .tbz2 for bzip2.
The best format to use is:
- tar zcvf tar_file.tar.gz relative_path_to_files_to_archive
- tar jcvf tar_file.tar.bz2 relative_path_to_files_to_archive
Make sure always to use relative pathnames when referring to the files to be archived, because otherwise you have no choice as to where to restore them (they must go back in the same place). [Definition: an absolute path has a leading "/".]
Example:
- tar zcvf june_data.tar.gz JUNE_data (good: relative path)
- tar zcvf june_data.tar.gz ~/JUNE_data (bad: absolute path)
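The latter case expands to "$HOME/JUNE_data", which traditionally means that this directory is the only place to which the files can be extracted from the archive. (For example, you might want to put the files back into JUNE_data.OLD or into a directory in /data, and not clobber your existing JUNE_data directory. You lose this flexibility with absolute pathnames.) Current versions of GNU tar and bsdtar strip the leading "/" by default, but the archive then still carries the whole chain of directories down to your home directory, so relative paths remain the better habit.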
The basic rule here is not to specify the files to be tar'ed with a leading "/" (including "~", which has an implied leading slash). (You are allowed to use an absolute path for the name of the tar file you are creating.)
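The relative-path rule can be checked on a throwaway example. The sketch below builds a tiny JUNE_data directory in a scratch area (the file run1.txt is invented for the illustration), archives it from the parent directory, and then restores it somewhere else.

```shell
work=$(mktemp -d)
mkdir -p "$work/JUNE_data"
echo "reading 1" > "$work/JUNE_data/run1.txt"

# Run tar from the parent directory so member names are relative.
( cd "$work" && tar zcf june_data.tar.gz JUNE_data )

# List the archive: every name should start with JUNE_data/, not "/".
tar ztf "$work/june_data.tar.gz"

# Because the names are relative, the files can be restored anywhere,
# here underneath JUNE_data.OLD instead of on top of the original.
mkdir -p "$work/JUNE_data.OLD"
( cd "$work/JUNE_data.OLD" && tar zxf ../june_data.tar.gz )
```

The "v" flag is left out here only to keep the output short; adding it back prints each file name as it is processed.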
Follow-through, after tar file creation
Once you have created the tar file, please do one or both of the following items:
- Delete the original file or directory (if the purpose was archiving to clear out disk space, as opposed to sharing the files with someone else). Otherwise you are consuming up to twice the disk space: the original files or directories plus the tar file.
- Move the tar file itself to another location (a local disk on your workstation, another computer, etc.) and delete the tar file from the system.
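One cautious way to combine archiving with deleting the original is sketched below. The function name archive_and_remove is made up, and the read-back check before deleting is an extra precaution added for this illustration, not a step the text above requires.

```shell
# archive_and_remove is a hypothetical helper name.
# usage: archive_and_remove directory   (a relative path)
archive_and_remove() {
    dir=${1%/}                    # drop a trailing slash, if any
    archive="$dir.tar.gz"
    tar zcf "$archive" "$dir" || return 1
    # Delete the original only if the archive can be read back in full.
    if tar ztf "$archive" > /dev/null; then
        rm -r -- "$dir"
    else
        echo "could not verify $archive; keeping $dir" >&2
        return 1
    fi
}
```

Called as archive_and_remove JUNE_data from the parent directory, this would leave JUNE_data.tar.gz behind and remove JUNE_data. Note that it overwrites an existing archive of the same name.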
How to extract from tar files
Don't plan to uncompress first: you can simply work with the tar file in its compressed form! (Some web browsers unhelpfully decompress .tar.gz files.)
You simply replace the "c" with an "x" (eXtract):
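- gtar zxvf filename.tar.gz (extracts everything)
- gtar zxvf filename.tar.gz file3 (extracts only file3; the name must match the entry in the archive exactly)
- gtar ztvf filename.tar.gz | grep file3 (finds the exact path under which file3 is stored)
- gtar ztvf filename.tar.gz ./dir_AB/file3 (lists just that entry; replace the "t" with an "x" to extract it)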
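Extracting a single file goes wrong most often because the name given does not match the stored name exactly. The sketch below looks the name up first and then extracts only that entry; it uses plain tar, on the assumption that gtar may not be installed under that name, and builds its own small archive so it can be run anywhere.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/dir_AB" "$work/restore"
echo "one"   > "$work/src/dir_AB/file1"
echo "three" > "$work/src/dir_AB/file3"
( cd "$work/src" && tar zcf ../filename.tar.gz ./dir_AB )

# Look up the stored name first; it may or may not carry a leading "./".
member=$(tar ztf "$work/filename.tar.gz" | grep 'file3$')
echo "stored as: $member"

# Extract just that entry into a separate directory.
( cd "$work/restore" && tar zxf ../filename.tar.gz "$member" )
```

Afterwards restore/dir_AB contains file3 only; file1 stays inside the archive.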
Important comment & rant:
Always examine the table of contents of a tar file (especially one you get from someone else) before extracting anything. This is done by using the table of contents "t" flag (instead of "c" for 'create'):
- gtar ztvf filename.tar.gz
Example of a 'friendly' tar file:
- gtar ztvf filename.tar.gz
dir_AB/file1
dir_AB/file2
dir_AB/file3
dir_AB/sub_dir_xx/file4
dir_AB/sub_dir_xx/file5
dir_AB/sub_dir_xx/file6
Example of an 'unfriendly' tar file:
- gtar ztvf filename.tar.gz
file1
file2
file3
sub_dir_xx/file4
sub_dir_xx/file5
sub_dir_xx/file6
The latter will 'litter' your current directory with file1, file2, and file3 amidst everything else you already had there. It is much better for it to create dir_AB and for you to know that everything you just extracted is there. If you check the table of contents and the form is 'unfriendly', you have the opportunity of creating a directory first:
- gtar ztvf filename.tar.gz (determine that it is unfriendly)
- mkdir dir_foo
- cd dir_foo
- gtar zxvf ../filename.tar.gz (the tar file is now one directory up)
Think about this issue as well when you are creating tar files. You might want to go up one directory ("cd ..") and do the tar from there.
END IMPORTANT COMMENT & RANT. (with apologies to the Mac Bible :-) )
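The check described in the rant above can be roughed out as a function. This is a sketch under stated assumptions: the name safe_extract and the default directory name "extracted" are invented here, plain tar stands in for gtar, and the archive is best given as a full path.

```shell
# safe_extract is a hypothetical helper name.
# usage: safe_extract /full/path/to/archive.tar.gz [directory_for_unfriendly_archives]
safe_extract() {
    archive=$1
    dest=${2:-extracted}
    # Count the distinct top-level names in the table of contents.
    tops=$(tar ztf "$archive" | sed 's|^\./||' | cut -d/ -f1 | sort -u | grep -c .)
    if [ "$tops" -gt 1 ]; then
        # Unfriendly: unpack inside its own directory (-C changes into it).
        mkdir -p "$dest" && tar zxf "$archive" -C "$dest"
    else
        # Friendly: the archive already brings its own top-level directory.
        tar zxf "$archive"
    fi
}
```

A friendly archive is unpacked in place, while an unfriendly one ends up inside the named directory instead of littering the current one.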
David Friedlander
October 1995, revised April 2020.