gtspring2009:howto:compress [2010/02/02 07:55] (current)
====== How to compress data files ======
{{gtspring2009:dbthumb.png?24 }} How does one compress a large number of small files? I have over a terabyte of data that is currently stored as small MATLAB .mat files (~750 KB each) in a hierarchy of directories (the top-level directories contain several GB of data, which are divided among a few hundred subdirectories ranging in size from a couple of MB to a few hundred MB). I'm considering compressing the data, but I'm not 100% sure what the best way to do that is for such a large structure. Also, does anybody have any suggestions as to which compression format will work best? --- //[[dborrero@gatech.edu|D. Borrero]] 2009-07-08 13:29//

{{gtspring2009:gibson.png?24 }} Aren't MATLAB .mat files binary floating-point data, and already compressed? (MAT-files saved in MATLAB's default v7 format are compressed by MATLAB itself.) If there is room to be gained from compression (try gzipping an individual file and comparing sizes), I would suggest one of two alternatives:

  - Compress the entire directory structure into one big tarfile with "''tar cvfpz bigdir.tgz bigdir''", where bigdir is the name of the top-level directory. You can then list its contents with "''tar tvfpz bigdir.tgz''" or extract it with "''tar xvfpz bigdir.tgz''". Since the tarfile is a single compressed stream, getting at one file means reading and decompressing the archive up to that file.
  - Compress files individually with "''gzip -r .''". That'll do a recursive descent into the current directory and compress every file within it. Alternatively, "''find . -name '*.mat' -exec gzip {} \;''" compresses only the .mat files. You can use ''bzip2'' instead of ''gzip'' in the ''find'' command (''bzip2'' has no ''-r'' option), and ''j'' in place of ''z'' in the tar commands; ''bzip2'' is supposed to give better compression, but it doesn't always.
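The "try gzipping an individual file" check can be run over a sample of files instead of just one, without touching the originals. This is a minimal sketch, assuming a POSIX shell with ''find'', ''head'', ''gzip'' and ''wc''; the function name ''mat_gzip_ratio'' and the default sample of 20 files are hypothetical choices, not an existing tool.

```shell
# Hypothetical helper (not a standard tool): estimate how much gzip would save
# on the first N .mat files found under a directory. The originals are never
# modified, because gzip -c writes the compressed data to stdout.
mat_gzip_ratio() {
    dir=${1:-.}   # directory to sample (default: current directory)
    n=${2:-20}    # number of files to sample (default: 20)
    find "$dir" -type f -name '*.mat' | head -n "$n" | {
        orig=0
        comp=0
        while IFS= read -r f; do
            o=$(wc -c < "$f")            # original size in bytes
            c=$(gzip -c "$f" | wc -c)    # size after gzip, in bytes
            orig=$((orig + o))
            comp=$((comp + c))
        done
        if [ "$orig" -eq 0 ]; then
            echo "no non-empty .mat files found under $dir" >&2
        else
            echo "original: $orig bytes, gzipped: $comp bytes"
            echo "saved: $((100 - 100 * comp / orig))%"
        fi
    }
}
```

For example, ''mat_gzip_ratio bigdir 50''. If the reported saving is only a few percent, the files are most likely already compressed, and compressing the whole tree is probably not worth the trouble.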
  
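The per-file route can be wrapped so the same command works with either compressor. Because ''bzip2'' cannot recurse on its own, ''find'' does the directory walk in both cases, and each tool's ''-t'' option then tests the files it just wrote. A sketch assuming a POSIX shell and a ''find'' that supports ''-exec ... {} +''; ''compress_mats'' is a hypothetical name.

```shell
# Hypothetical helper: compress every .mat file under a directory in place
# with gzip (default) or bzip2, then test the integrity of the results.
# Like "gzip -r", this replaces each foo.mat with foo.mat.gz or foo.mat.bz2.
compress_mats() {
    dir=${1:-.}
    tool=${2:-gzip}
    case "$tool" in
        gzip)  ext=gz ;;
        bzip2) ext=bz2 ;;
        *) echo "unsupported compressor: $tool" >&2; return 1 ;;
    esac
    # bzip2 has no -r option, so find does the recursion for both tools;
    # "{} +" passes many file names to each compressor invocation.
    find "$dir" -type f -name '*.mat' -exec "$tool" {} + || return 1
    # gzip -t / bzip2 -t check the compressed files and print nothing on success.
    find "$dir" -type f -name "*.mat.$ext" -exec "$tool" -t {} +
}
```

For example, ''compress_mats bigdir bzip2''. Since it replaces the originals, running it on a copy of one subdirectory first is a cheap way to see whether ''bzip2'' actually beats ''gzip'' on your data.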
gtspring2009/howto/compress.txt · Last modified: 2010/02/02 07:55 (external edit)