[NTLUG:Discuss] Need help debugging simple script commands
George Lass
George.Lass at osc.com
Wed Jul 31 20:25:13 CDT 2002
Don't know if you realize it or not, but there is an option within
sort itself that will suppress output of duplicate lines:
sort -u -o domains.uniq domains
It may work better than chaining sort and uniq (and it avoids all of
the pipes and redirects).
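As a quick sketch (the filename `domains` is from the thread; the sample
data here is made up), the two approaches should produce the same result:

```shell
#!/bin/sh
# Build a small sample file with a duplicate line (made-up data).
printf 'example.com\nexample.org\nexample.com\n' > domains

# Rick's original pipeline: sort, then collapse adjacent duplicates.
sort domains | uniq > domains.uniq.pipeline

# George's suggestion: let sort drop the duplicates itself and write
# the output file directly (-u = unique, -o = output file).
sort -u -o domains.uniq domains

# Both output files should now hold the two unique names, sorted.
cat domains.uniq
```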
George
Rick Matthews wrote:
>
> This problem is driving me crazy! (Please help, because that's a short
> trip from where I live!)
>
> I've got a file of domain names (one per line) that contains duplicates.
> I've been removing the duplicates with:
>
> cat domains | sort | uniq > domains.uniq
>
> That has stopped working. It stopped once before and I found some
> garbage in the file. I cleaned out blank lines and trailing spaces and
> tabs, and it started working again. (It wouldn't match 'domain-name'
> with 'domain-name<TAB>'). Now the cleanup is a standard part of my
> routine, and it has worked fine for months, until about a week ago.
> The file size has been growing and is up to about 7.5 meg and about
> 500,000 lines before removing duplicates.
>
> When I say that it doesn't work, I don't mean that it abends with an
> error. It takes the same amount of time before it completes, and it
> is removing some of the duplicates, but it is leaving most of them.
>
> I copied about 5k of the file into a test file and it successfully
> removed all of the duplicates. That same section of the file is not
> deduped when included in the big file.
>
> I think the problem is one of two things:
>
> a) Something is blowing up and I'm not looking in the right place
> for the error messages.
>
> b) The file contains some other kind of garbage besides what I am
> cleaning out.
>
> Can anyone agree with that, or have a better answer? Suggestions?
>
> Does anyone have a grep or sed or perl command or two that I can use
> that will remove everything that is not legal in a domain name?
> (not the http:, just the rest).
>
> Thanks for your help!
>
> Rick
>
> _______________________________________________
> http://www.ntlug.org/mailman/listinfo/discuss
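On Rick's last question, here is a minimal cleanup sketch. It assumes the
only characters that should survive in a name are letters, digits, hyphens,
and dots, and it guesses that stray carriage returns (e.g. from DOS line
endings) or tabs are the kind of invisible garbage that keeps otherwise
identical lines from matching:

```shell
#!/bin/sh
# Cleanup sketch for the domains file:
# 1. tr -d strips invisible garbage: carriage returns, tabs, spaces.
# 2. grep keeps only lines made entirely of legal domain characters
#    (letters, digits, hyphen, dot); this also drops blank lines.
# 3. sort -u sorts and removes duplicates in one step.
tr -d '\r\t ' < domains \
  | grep -E '^[A-Za-z0-9.-]+$' \
  | sort -u -o domains.uniq
```

Note the two steps behave differently on bad input: tr deletes the illegal
characters but keeps the line, while the grep drops any line that still
contains something illegal after the whitespace cleanup.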
More information about the Discuss mailing list