- I took a directory of source code and test data, about 9MB in total.
- Copied it to a remote box.
- On both boxes, ran tar cvzf to produce one file (.tgz) and tar cvf to produce another (.tar).
- On the source box, edited one source file, inserting a single line.
- Re-ran tar cvzf and tar cvf on the source box. The source box now has the source tree, a .tar and a .tgz, each differing from the remote copies by one line in one internal file.
- Results:
  - rsync of the source tree: speedup of 450 (14K sent, 94 received).
  - rsync of the .tar: speedup of 85000+ (78 bytes received, 20 bytes sent).
  - rsync of the .tgz: speedup of 1.48 (2.4MB sent, about 12K received).
So rsync of a tar file works best, because only one file has to be analyzed to find where the differences are. rsync of a compressed file (at least a .tgz, but probably the output of any compressor) works badly. I'm not sure why, but I wouldn't be surprised if the compressed representation of each piece of data depends on the data that came before it, so a one-line change alters much of the output after that point. That, and possibly other effects like it, would confound the difference finder, since too much of the file looks different.
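To illustrate the hypothesis above, here is a minimal sketch in Python (the note itself has no code, so the language is an assumption). It is not rsync. It mimics rsync's block matching in simplified form: fixed 512-byte blocks compared by raw bytes instead of rolling checksums. The input is made-up pseudo-source text, not the 9MB tree from the experiment. The sketch counts how many bytes of the new file would have to be sent literally when a one-line change is made to uncompressed data, versus the same change made to data compressed with zlib's DEFLATE, the algorithm used inside .gz/.tgz files. `literal_bytes`, `BLOCK` and the word list are hypothetical names chosen for this illustration.

```python
import random
import zlib

BLOCK = 512  # hypothetical fixed block size; real rsync chooses one based on file size


def literal_bytes(old: bytes, new: bytes, block: int = BLOCK) -> int:
    """Bytes of `new` that a simplified rsync-style delta would send verbatim.

    Index every aligned, full-size block of `old`, then scan `new`: on a match,
    jump a whole block (only a reference would be sent); on a miss, emit one
    literal byte and slide forward by one. Real rsync uses a rolling weak
    checksum plus a strong hash instead of comparing raw bytes.
    """
    known = {old[i:i + block] for i in range(0, len(old) - block + 1, block)}
    literal = 0
    i = 0
    while i + block <= len(new):
        if new[i:i + block] in known:
            i += block
        else:
            literal += 1
            i += 1
    return literal + (len(new) - i)  # any trailing partial block is sent literally


# Made-up stand-in for the source tree: 10,000 lines of pseudo-code.
rng = random.Random(42)
WORDS = ["int", "return", "if", "else", "for", "while", "count", "buffer",
         "size", "data", "{", "}", ";", "=", "+", "foo", "bar", "baz"]
lines = [" ".join(rng.choice(WORDS) for _ in range(rng.randint(3, 12))) + "\n"
         for _ in range(10_000)]
INSERTED = "int inserted_line = 1;\n"  # the single inserted line

old_plain = "".join(lines).encode()
new_plain = "".join(lines[:5_000] + [INSERTED] + lines[5_000:]).encode()

# The same content, DEFLATE-compressed (the algorithm inside .gz/.tgz).
old_gz = zlib.compress(old_plain, 9)
new_gz = zlib.compress(new_plain, 9)

plain_literal = literal_bytes(old_plain, new_plain)
gz_literal = literal_bytes(old_gz, new_gz)

print(f"uncompressed: {plain_literal} of {len(new_plain)} bytes sent literally")
print(f"compressed:   {gz_literal} of {len(new_gz)} bytes sent literally")
```

On data like this, the uncompressed delta should stay within roughly two blocks plus the inserted line, while the compressed delta covers most of the compressed stream after the edit. That is the same pattern as the .tar versus .tgz results, though the exact numbers depend on the made-up data and the block size.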