[NTLUG:Discuss] Need help debugging simple script commands

George Lass George.Lass at osc.com
Wed Jul 31 20:25:13 CDT 2002


Don't know if you realize it or not, but there is an option within
sort itself that will suppress output of duplicate lines:

sort -u -o domains.uniq domains


It may work better than piping sort through uniq (and it saves all
of the pipes and redirects).
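
For example (sample data, not your file):

```shell
# Small demo of sort -u: three lines in, duplicates suppressed on
# the way out, no separate uniq step needed.
printf 'example.com\ntest.org\nexample.com\n' > domains
sort -u -o domains.uniq domains
cat domains.uniq
# example.com
# test.org
```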

George

Rick Matthews wrote:
> 
> This problem is driving me crazy! (Please help, because that's a short
> trip from where I live!)
> 
> I've got a file of domain names (one per line) that contains duplicates.
> I've been removing the duplicates with:
> 
> cat domains | sort | uniq > domains.uniq
> 
> That has stopped working. It stopped once before and I found some
> garbage in the file. I cleaned out blank lines and trailing spaces and
> tabs, and it started working again. (It wouldn't match 'domain-name'
> with 'domain-name<TAB>'). Now the cleanup is a standard part of my
> routine, and it has worked fine for months, until about a week ago.
> The file size has been growing and is up to about 7.5 meg and about
> 500,000 lines before removing duplicates.
> 
> When I say that it doesn't work, I don't mean that it abends with an
> error. It takes the same amount of time before it completes, and it
> is removing some of the duplicates, but it is leaving most of them.
> 
> I copied about 5k of the file into a test file and it successfully
> removed all of the duplicates. That same section of the file is not
> deduped when included in the big file.
> 
> I think the problem is one of two things:
> 
> a) Something is blowing up and I'm not looking in the right place
> for the error messages.
> 
> b) The file contains some other kind of garbage besides what I am
> cleaning out.
> 
> Can anyone agree with that, or have a better answer? Suggestions?
> 
> Does anyone have a grep or sed or perl command or two that I can use
> that will remove everything that is not legal in a domain name?
> (not the http:, just the rest).
> 
> Thanks for your help!
> 
> Rick
> 
> _______________________________________________
> http://www.ntlug.org/mailman/listinfo/discuss
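
P.S. On the last question (stripping anything that isn't legal in a
domain name): here is a rough sketch along those lines. It assumes
plain ASCII host names, one per line, and the file names are only
placeholders -- adjust to taste.

```shell
# Keep only characters legal in a host name (ASCII letters, digits,
# hyphen, dot), fold case so 'Foo.com' and 'foo.com' compare equal,
# drop any lines left empty, then dedupe in one pass.
sed 's/[^A-Za-z0-9.-]//g' domains \
    | tr 'A-Z' 'a-z' \
    | sed '/^$/d' \
    | sort -u > domains.uniq
```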
