[NTLUG:Discuss] Combining data from multiple files
Steve Baker
sjbaker1 at airmail.net
Mon May 26 11:13:00 CDT 2003
Michael P wrote:
> I'm a little too sleepy to work up the full command but what you are
> looking for is the command "sort" or maybe a combination of cat, sort,
> and uniq.
cat file1 file2 | sort | uniq
This is OK if the files are line-by-line databases where the order of
the lines in the resulting merged file doesn't matter and where the
same line should never appear twice in the result.
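As a rough illustration of that case (the directory, file names and
contents below are made up for the demo, and 'sort -u' is used as a
shorthand for 'sort | uniq'):

```shell
# Throwaway directory so the demo doesn't touch real files.
demo=$(mktemp -d)

# Two hypothetical line-by-line "databases" sharing one entry.
printf 'apple\nbanana\n'  > "$demo/file1"
printf 'banana\ncherry\n' > "$demo/file2"

# Equivalent to: cat file1 file2 | sort | uniq
sort -u "$demo/file1" "$demo/file2" > "$demo/merged"

cat "$demo/merged"
```

The shared line comes out once, and the original line order of both
files is discarded in favour of sorted order.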
However, if these two files were something like a novel that someone
was writing where they'd edited some parts on their desktop machine
and edited other parts on their laptop - and now wanted a file
containing BOTH sets of changes, then sort and uniq would most
certainly not be the answer!!
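I'd use 'diff -d -D' (or perhaps 'diff3 -m' for the 3 file case).
These produce a file containing a merge of the two or three original
files - but with additional marker lines showing which of the original
files the changes came from. With GNU diff, 'diff -D NAME' writes those
markers as C preprocessor style '#ifdef', '#ifndef', '#else' and
'#endif' lines, each of which mentions NAME. 'grep'ing out those
additional lines should get the result you need - and because every
marker line contains the name you chose, diff provides a really simple
mechanism for doing that: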
e.g.:
diff -d -D VERYUNLIKELYWORD file1 file2 | grep -v VERYUNLIKELYWORD > result
(Where VERYUNLIKELYWORD is something that you know won't be found in
either of the two input files.)
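Here's a small sketch of that command in action, assuming GNU diff.
The directory, file names and contents are invented for the demo: one
copy has a line added on the "desktop" and the other has a line added
on the "laptop".

```shell
# Throwaway directory with two hypothetical edited copies of a text.
work=$(mktemp -d)
printf 'Chapter one\nDesktop edit\nChapter two\n' > "$work/file1"
printf 'Chapter one\nChapter two\nLaptop edit\n'  > "$work/file2"

# diff -D emits every line of both files, wrapping the differences in
# #ifdef/#ifndef ... #endif lines that mention VERYUNLIKELYWORD;
# grep -v then drops those marker lines and keeps the text itself.
diff -d -D VERYUNLIKELYWORD "$work/file1" "$work/file2" \
    | grep -v VERYUNLIKELYWORD > "$work/result"

cat "$work/result"
```

Both additions survive in the result. Note that where the two files
changed the *same* line differently, both versions of that line end up
in the result, so it's worth reading through the output afterwards.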
The 'diff3 -m' command works a little differently but to be honest it's
much easier to merge three files using the 'merge two files' method twice:
diff -d -D VERYUNLIKELYWORD file1 file2 | grep -v VERYUNLIKELYWORD > temp
diff -d -D VERYUNLIKELYWORD temp file3 | grep -v VERYUNLIKELYWORD > result
...you can easily see how to extend this to merge any number of files.
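One way to sketch that extension is a small shell function that folds
the two-file merge over a list of files. 'merge_all' is a hypothetical
helper name (not part of diffutils), and this again assumes GNU diff
and that VERYUNLIKELYWORD appears in none of the inputs:

```shell
# merge_all FILE1 FILE2 [FILE3 ...]
# Hypothetical helper: repeatedly applies the two-file merge and
# writes the final merged text to standard output.
merge_all() {
    merge_marker=VERYUNLIKELYWORD
    merge_acc=$(mktemp)    # running merge so far
    merge_tmp=$(mktemp)    # scratch file for the next step

    cp "$1" "$merge_acc"
    shift

    for merge_next in "$@"; do
        diff -d -D "$merge_marker" "$merge_acc" "$merge_next" \
            | grep -v "$merge_marker" > "$merge_tmp"
        mv "$merge_tmp" "$merge_acc"
    done

    cat "$merge_acc"
    rm -f "$merge_acc" "$merge_tmp"
}

# Demo with three made-up files.
many=$(mktemp -d)
printf 'one\ntwo\n'        > "$many/a"
printf 'one\ntwo\nthree\n' > "$many/b"
printf 'zero\none\ntwo\n'  > "$many/c"
merge_all "$many/a" "$many/b" "$many/c"
```

Usage would be something like 'merge_all file1 file2 file3 > result'.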
I find this useful for sorting out files that have been edited separately
on laptops and PDAs - and that then need merging into a common central
file when I get back to my desktop machine.
---------------------------- Steve Baker -------------------------
HomeEmail: <sjbaker1 at airmail.net> WorkEmail: <sjbaker at link.com>
HomePage : http://www.sjbaker.org
Projects : http://plib.sf.net http://tuxaqfh.sf.net
http://tuxkart.sf.net http://prettypoly.sf.net
-----BEGIN GEEK CODE BLOCK-----
GCS d-- s:+ a+ C++++$ UL+++$ P--- L++++$ E--- W+++ N o+ K? w--- !O M- V-- PS++ PE- Y-- PGP-- t+ 5 X R+++ tv b++ DI++ D G+ e++ h--(-) r+++ y++++
-----END GEEK CODE BLOCK-----