[NTLUG:Discuss] eliminating lines with the same information
Lance Simmons
simmons.lance at gmail.com
Mon Jul 2 12:55:33 CDT 2007
I have a text file with several thousand email addresses, many of
which are duplicates. I've used "sort" and "uniq" to make the list
smaller, but almost a thousand entries remain, and many of them are
still duplicates. For example, three lines in the file might be:
jsmith at abc.org
"John Smith" <jsmith at abc.org>
"Mr. John Smith" <jsmith at abc.org>
Obviously, I'd like to get rid of two of those lines without having to
manually go through and decide which to keep. And I don't care about
keeping names, I'm only interested in addresses.
Also, the duplicates are not all on lines near each other, so even if
I wanted to do it manually, it would be a huge hassle.
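The reduction described above (keep only the address, drop the name, and
discard repeats wherever they occur in the file) could be sketched roughly
as follows. This is only a minimal sketch under stated assumptions: the
function name unique_addresses and the ADDRESS pattern are hypothetical,
the pattern accepts both "user@host" and the "user at host" form shown in
the examples, and addresses are compared case-insensitively.

```python
import re

# Assumption: an address is "user@host" or "user at host", and the host
# ends in a dot followed by at least two letters. This is a simplification,
# not a full RFC 5322 parser.
ADDRESS = re.compile(
    r'([A-Za-z0-9._%+-]+)(?:@| at )([A-Za-z0-9.-]+\.[A-Za-z]{2,})'
)

def unique_addresses(lines):
    """Return each address once, in order of first appearance."""
    seen = set()
    result = []
    for line in lines:
        match = ADDRESS.search(line)
        if match is None:
            continue  # no recognizable address on this line
        # Assumption: addresses differing only in case are the same.
        address = f"{match.group(1)}@{match.group(2)}".lower()
        if address not in seen:
            seen.add(address)
            result.append(address)
    return result

sample = [
    'jsmith at abc.org',
    '"John Smith" <jsmith at abc.org>',
    '"Mr. John Smith" <jsmith at abc.org>',
]
print(unique_addresses(sample))
```

Because the set remembers every address seen so far, duplicates are caught
even when they are far apart in the file, so no prior sorting is needed.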
Any suggestions?
--
Lance Simmons