[NTLUG:Discuss] Copying a file to TWO destinations at the same time..
steve
sjbaker1 at airmail.net
Sun Jan 29 13:06:23 CST 2006
Terry wrote:
> On 1/29/06, Stuart Yarus <syarus at kvsystems.org> wrote:
>
>>If you really want to use tee, try:
>>
>> cat file | tee location1 | tee location2 | tee location3
>>
>>This method uses the original file just once. The method certainly
>>isn't recommended for many or for large files, due to the use and
>>duplication of stdout (standard output).
>>
>>Your backup method should depend on the nature of the backup.
>>
>>Stuart Yarus
>>
>
>
> And what about?:
>
> cp file location1 ; cp file location2 ; cp file location 3
>
> same end result, right?
> (Or maybe I'm missing something here?)
There is a HUGE amount of difference between:
cat file1 > file2
cp file1 file2
When you use 'cp', it preserves things like the file permissions.
Try this: start with a junk file - let's call it 't1'.
% chmod a-w t1
% cat t1 > t2
% cp t1 t3
% ls -l
total 12
-r--r--r-- 1 steve users 25 2006-01-29 12:54 t1
-rw-rw-rw- 1 steve users 25 2006-01-29 12:55 t2
-r--r--r-- 1 steve users 25 2006-01-29 12:55 t3
...see what I mean?
When you are doing a backup, you REALLY want the permissions to match.
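(If you do end up creating copies with cat or tee and still want the
permissions to match, one workaround - just a sketch, using GNU chmod's
--reference option and fresh file names - is to copy the mode across
from the original afterwards:

% tee t4 t5 < t1 > /dev/null
% chmod --reference=t1 t4 t5
% ls -l t1 t4 t5

...after which t4 and t5 carry the same permission bits as t1.)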
Furthermore, when you do this:
cp file location1 ; cp file location2 ; cp file location3
...you are reading the "file" three times. When you do this:
cat file | tee location1 | tee location2 > location3
...you are only reading 'file' once.
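(A small aside: tee takes more than one filename in a single
invocation, so you don't actually need to chain several copies of it.
Something like this - same placeholder names as above - reads the file
once and writes all three destinations:

cat file | tee location1 location2 > location3

The chained form works too; it just starts extra tee processes for no
real benefit.)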
In the olden days of UNIX, I'd be telling you that only reading 'file'
once would be faster, because hard disks are slower than main memory
and data going through 'pipes' never touches the disk at all.
However, this is the 21st century and computers have a TON of RAM and
our shiny new Linux operating system uses it. So when you read "file"
for the first time, Linux leaves a copy of it in whatever unused RAM
there is in your computer. The second time you read from "file", it
cleverly notices that there is a copy in memory and DOESN'T RE-READ THE
DISK DRIVE.
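You can watch this happening for yourself. Pick a reasonably large
file that you haven't read since booting (call it 'bigfile' here -
substitute your own name), and try (the '#' notes are just annotations):

% time cat bigfile > /dev/null   # first read: comes off the disk
% time cat bigfile > /dev/null   # second read: served from the cache - much faster
% free -m                        # the 'cached' column shows file data held in RAM

If the file was only just written, it will already be sitting in the
cache, so the first run won't show the slow case.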
So in practice, I very much doubt there is any measurable performance
difference between using a pipeline and simply copying the file
a bunch of times. In both cases, the data is in RAM during the copying.
The only time you might see a difference would be if your computer had
very little free memory during the copying. Linux can only cache copies
of files in RAM if there is free memory...if not, it'll just have to
re-read the slow old disk drive every time...but that's VERY unlikely.
So, IMHO, if you are doing backups it'll make very little difference
whether you copy the file a couple of times or whether you pipe it
through 'tee'.
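If you want to convince yourself, a quick and unscientific test is to
time both approaches against the same file (the names here are just
placeholders, bash syntax - its 'time' will happily time a whole
subshell):

% time ( cp file loc1 ; cp file loc2 ; cp file loc3 )
% time ( cat file | tee loc1 loc2 > loc3 )

With the file already in the cache, the two should come out about the
same - the cost is dominated by writing three copies, not by reading
one.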
Knowing what Linux does with file caching, there is one very important
strategy: back up one file to two places, then move on to the next
file. Don't back up all of your files to one place and then all of them
to the other place. By the time you start the second copy of the first
file, all the other files you read will have filled up free memory and
kicked the cache of the first file out of RAM.
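In shell terms that strategy is just a per-file loop - something along
these lines (bash syntax), where /backup1 and /backup2 stand in for
your two destinations, and 'cp -p' keeps the permission point from
above:

% for f in * ; do cp -p "$f" /backup1/ ; cp -p "$f" /backup2/ ; done

The second cp of each file is almost guaranteed to be served from the
cache, because you read that same file only a moment earlier.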