[NTLUG:Discuss] eliminating lines with the same information

Stuart Johnston saj at thecommune.net
Mon Jul 2 17:09:08 CDT 2007


Woo Hoo!!  Wayne wins the coveted Useless Use of Cat Award!!  ;)

http://partmaps.org/era/unix/award.html

Redirect the file straight into sed instead:

  sed -e 's/.*<//' -e 's/>.*//' < file_with_names | sort -u
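`sort -u` handles duplicates that are far apart because sorting puts identical lines next to each other first. That is also why `uniq` on its own only catches adjacent repeats. If you would rather keep the addresses in the order they first appear, a common alternative is awk's `!seen[$0]++` idiom. Here is a minimal sketch that reuses the same two sed substitutions. `dedupe_in_order` is a hypothetical name. It also inherits the same weakness with the rarely used `address <Name>` form that Wayne describes.

```shell
# dedupe_in_order (hypothetical name): strip everything outside <...> with
# the same two sed substitutions, then print each result only the first
# time it is seen, keeping file order instead of sorting.
# Reads the named files, or stdin if no arguments are given.
dedupe_in_order() {
    # seen[$0]++ evaluates to 0 (false) on a line's first appearance,
    # so !seen[$0]++ is true exactly once per distinct line.
    sed -e 's/.*<//' -e 's/>.*//' "$@" | awk '!seen[$0]++'
}
```

Usage: `dedupe_in_order file_with_names`. awk keeps every distinct address in memory, which is no problem for a list of a few thousand lines.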

Wayne Walker wrote:
> cat file_with_names | sed -e 's/.*<//' -e 's/>.*//' | sort -u
> 
> This will HOSE you if any of the addresses are in the valid but rarely used form:
> 
> wwalker at bybent.com <Wayne Walker>
> 
> If you have any of those, instead use:
> 
> cat file_with_names | sed -e 's/^[^@]*<//' -e 's/>[^@]*$//' | sort -u
> 
> Wayne 
> 
> On Mon, Jul 02, 2007 at 12:55:33PM -0500, Lance Simmons wrote:
>> I have a text file with several thousand email addresses, many of
>> which are duplicates. I've used "sort" and "uniq" to make the list
>> smaller, but there are still almost a thousand entries.
>>
>> But I still have many duplicates.  For example, three lines in the file might be:
>>
>>   jsmith at abc.org
>>   "John Smith" <jsmith at abc.org>
>>   "Mr. John Smith" <jsmith at abc.org>
>>
>> Obviously, I'd like to get rid of two of those lines without having to
>> manually go through and decide which to keep.  And I don't care about
>> keeping names, I'm only interested in addresses.
>>
>> Also, the duplicates are not all on lines near each other, so even if
>> I wanted to do it manually, it would be a huge hassle.
>>
>> Any suggestions?
>>
>> -- 
>> Lance Simmons
>>
>> _______________________________________________
>> http://www.ntlug.org/mailman/listinfo/discuss
> 
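Wayne's second command keeps the address in the rare form, but tracing it by hand on `wwalker@bybent.com <Wayne Walker>` suggests it removes only the final `>` and leaves `wwalker@bybent.com <Wayne Walker` behind. A different approach is to match the address itself rather than strip what surrounds it. This is a minimal sketch under several assumptions. The real file uses a literal `@` (the archive displays addresses with "at"). No address contains spaces, quotes, or angle brackets. Your grep supports `-o` (GNU and BSD grep do; POSIX does not require it). `extract_addresses` is a hypothetical name.

```shell
# extract_addresses (hypothetical name): print every token that looks like
# user@domain, one per line, then sort and drop duplicates.
# Assumes addresses contain no whitespace, double quotes, or angle brackets.
# Reads the named files, or stdin if no arguments are given.
extract_addresses() {
    # -o prints only the matched part of each line; -E enables extended regexes.
    grep -oE '[^<>"[:space:]]+@[^<>"[:space:]]+' "$@" | sort -u
}
```

Usage: `extract_addresses file_with_names`. All three of Lance's example lines reduce to `jsmith@abc.org`, and Wayne's example reduces to `wwalker@bybent.com`, so no cat is needed either.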


