[NTLUG:Discuss] eliminating lines with the same information
Wayne Walker
waynewalker at bybent.com
Mon Jul 2 17:05:27 CDT 2007
cat file_with_names | sed -e 's/.*<//' -e 's/>.*//' | sort -u
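As a quick sanity check, here is a small sketch that runs the same two sed expressions on the three sample lines from the question. It assumes the real file contains a literal @ in each address (the list archive displays it as " at "), and the function name strip_names is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: drop everything up to the last "<" and everything
# from the first remaining ">" onward, then sort and de-duplicate.
strip_names() {
    sed -e 's/.*<//' -e 's/>.*//' | sort -u
}

# The three sample lines from the original question, with @ restored.
printf '%s\n' \
    'jsmith@abc.org' \
    '"John Smith" <jsmith@abc.org>' \
    '"Mr. John Smith" <jsmith@abc.org>' | strip_names
```

All three lines collapse to the single line jsmith@abc.org.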
This will HOSE you if any of the addresses are in the rarely used reversed form, since the first expression throws away everything before the <, address included:
wwalker@bybent.com <Wayne Walker>
If you have any of those, instead use:
cat file_with_names | sed -e 's/^[^@]*<//' -e 's/ *[<>][^@]*$//' | sort -u
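A different way to sidestep the ordering problem, offered only as a sketch: rather than stripping the name, pull out the one token on each line that contains an @. This assumes a grep that supports -o (GNU and BSD grep do, but it is not in POSIX), at most one address per line, and no spaces or quotes inside the address itself. The function name extract_addrs and the example.com address are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print only the token containing "@", wherever it
# sits on the line, then sort and de-duplicate.
extract_addrs() {
    grep -Eo '[^<>"[:space:]]+@[^<>"[:space:]]+' | sort -u
}

# Mix of plain, name-first, and address-first lines.
printf '%s\n' \
    'jsmith@abc.org' \
    '"John Smith" <jsmith@abc.org>' \
    'wwalker@example.com <Wayne Walker>' | extract_addrs
```

On this input it prints jsmith@abc.org and wwalker@example.com, one per line. To run it on the real file, use extract_addrs < file_with_names.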
Wayne
On Mon, Jul 02, 2007 at 12:55:33PM -0500, Lance Simmons wrote:
> I have a text file with several thousand email addresses, many of
> which are duplicates. I've used "sort" and "uniq" to make the list
> smaller, but there are still almost a thousand.
>
> But I still have many duplicates. For example, three lines in the file might be
>
> jsmith@abc.org
> "John Smith" <jsmith@abc.org>
> "Mr. John Smith" <jsmith@abc.org>
>
> Obviously, I'd like to get rid of two of those lines without having to
> manually go through and decide which to keep. And I don't care about
> keeping names, I'm only interested in addresses.
>
> Also, the duplicates are not all on lines near each other, so even if
> I wanted to do it manually, it would be a huge hassle.
>
> Any suggestions?
>
> --
> Lance Simmons
>