How does one compress a large number of small files? I have over a terabyte of data that is currently stored as small matlab .mat files (~750 KB each) in a hierarchy of directories (the top-level directories contain several GB of data each, which are then divided among a few hundred subdirectories that vary in size from a couple of MB to a few hundred MB). I'm considering compressing the data, but I'm not 100% sure what the best way to do that is for such a large structure. Also, does anybody have any suggestions as to which compression format will work best? — D. Borrero 2009-07-08 13:29
Aren't Matlab .mat files binary floating-point data and thus already compressed? If there is room to be gained from compression (try gzipping an individual file), I would suggest one of two alternatives. The first is “tar cvfpz bigdir.tgz bigdir”, where bigdir is the name of the top-level directory. That will compress everything into one big tarfile, whose contents you can then list or extract with “tar tvfpz bigdir.tgz” or “tar xvfpz bigdir.tgz”.
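For clarity, the same tar workflow laid out as separate commands, with the flags spelled out (c = create, t = list, x = extract, v = verbose, f = archive filename, p = preserve permissions, z = filter through gzip); this is just an expanded restatement of the commands above:

  tar cvfpz bigdir.tgz bigdir     # create a gzip-compressed archive of bigdir
  tar tvfpz bigdir.tgz            # list the archive's contents
  tar xvfpz bigdir.tgz            # extract the archive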
The second is “gzip -r .”, which will do a recursive descent into the current directory and compress every file within. Alternatively you could do “find . -name '*.mat' -exec gzip {} \;”.
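A minimal sketch of that per-file route and how to undo it; the gunzip step is my addition rather than part of the reply above:

  # compress every .mat file under the current directory in place (each becomes file.mat.gz)
  find . -name '*.mat' -exec gzip {} \;
  # reverse the process before loading the files back into Matlab
  find . -name '*.mat.gz' -exec gunzip {} \;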
You can use bzip2 instead of gzip in the latter commands (j in place of z in the tar commands); bzip2 is supposed to give better compression, but it doesn't always.
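Since it is hard to predict which compressor wins on this particular data, a quick comparison on one representative file before committing the whole terabyte might look like this (the path below is just a placeholder):

  # compress one sample file with each tool, leaving the original untouched
  gzip  -c somedir/sample.mat > /tmp/sample.mat.gz
  bzip2 -c somedir/sample.mat > /tmp/sample.mat.bz2
  # compare the resulting sizes against the original
  ls -l somedir/sample.mat /tmp/sample.mat.gz /tmp/sample.mat.bz2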