[NTLUG:Discuss] eliminating lines with the same information

Lance Simmons simmons.lance at gmail.com
Mon Jul 2 12:55:33 CDT 2007


I have a text file with several thousand email addresses, many of
which are duplicates. I've used "sort" and "uniq" to make the list
smaller, but almost a thousand lines remain, and many of them are
still duplicates.  For example, three lines in the file might be

  jsmith at abc.org
  "John Smith" <jsmith at abc.org>
  "Mr. John Smith" <jsmith at abc.org>

Obviously, I'd like to get rid of two of those lines without having to
manually go through and decide which to keep.  And I don't care about
keeping names; I'm only interested in addresses.

Also, the duplicates are not all on lines near each other, so even if
I wanted to do it manually, it would be a huge hassle.

Any suggestions?
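
One possible starting point, offered as a rough sketch rather than a
definitive answer: strip each line down to the bare address before
running it through "sort" and "uniq", so that the three example lines
above collapse into one. This assumes the real file uses "@" (the
" at " above is only how the list archive displays addresses), that a
grep supporting -o is available (GNU and BSD grep both have it), and
that addresses differing only in letter case should count as the same.
The function name extract_addresses and the file names are made up for
illustration, and the pattern is a loose approximation of an address,
not a full parser.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct address found on standard input.
extract_addresses() {
    # -o prints only the matching part of each line (GNU/BSD grep extension).
    # The pattern is a loose approximation of an address, not an RFC 5322 parser.
    grep -oE '[[:alnum:]._%+-]+@[[:alnum:].-]+' |
        tr '[:upper:]' '[:lower:]' |  # assumption: case differences do not matter
        sort -u                       # sort and drop duplicate lines in one step
}

# Example usage (addresses.txt and unique.txt are placeholder names):
# extract_addresses < addresses.txt > unique.txt
```

Because the names and angle brackets are discarded before sorting, it
should not matter that the duplicates are scattered through the file.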

-- 
Lance Simmons


