[NTLUG:Discuss] Need help debugging simple script commands

Rick Matthews RedHat.Linux at verizon.net
Wed Jul 31 21:06:43 CDT 2002


> Don't know if you realize it or not, but there is an option within
> sort itself that will suppress output of duplicate lines:

Yes, I know it's there, and I've used it before, but I had not tried
it here. 

I just ran a test using:

sort -u -o domains.uniq domains

and got exactly the same results as the other method.  The duplicates
are still there.

It doesn't make sense.
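
My best guess at this point is that the leftover "duplicates" aren't
byte-for-byte identical; carriage returns or some other non-printing
garbage may have crept in. Here is what I'm planning to check next
(example.com below is just a stand-in for one of the names that still
shows up twice in domains.uniq):

# Make the invisible characters visible: with GNU cat, $ marks the
# end of each line, ^M is a carriage return and ^I is a tab.
cat -A domains | head -20

# Count DOS-style carriage returns hiding in the file.
tr -cd '\r' < domains | wc -c

# Dump one suspect name byte by byte to compare the "identical" copies.
grep 'example\.com' domains | od -c | head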

Rick


> -----Original Message-----
> From: discuss-admin at ntlug.org [mailto:discuss-admin at ntlug.org]On Behalf
> Of George Lass
> Sent: Wednesday, July 31, 2002 8:25 PM
> To: discuss at ntlug.org
> Subject: Re: [NTLUG:Discuss] Need help debugging simple script commands
> 
> 
> Don't know if you realize it or not, but there is an option within
> sort itself that will suppress output of duplicate lines:
> 
> sort -u -o domains.uniq domains
> 
> 
> It may work better than using sort & uniq (as well as all of the
> pipes & re-directs).
> 
> George
> 
> Rick Matthews wrote:
> > 
> > This problem is driving me crazy! (Please help, because that's a short
> > trip from where I live!)
> > 
> > I've got a file of domain names (one per line) that contains duplicates.
> > I've been removing the duplicates with:
> > 
> > cat domains | sort | uniq > domains.uniq
> > 
> > That has stopped working. It stopped once before and I found some
> > garbage in the file. I cleaned out blank lines and trailing spaces and
> > tabs, and it started working again. (It wouldn't match 'domain-name'
> > with 'domain-name<TAB>'). Now the cleanup is a standard part of my
> > routine, and it has worked fine for months, until about a week ago.
> > The file size has been growing and is up to about 7.5 meg and about
> > 500,000 lines before removing duplicates.
> > 
> > When I say that it doesn't work, I don't mean that it abends with an
> > error. It takes the same amount of time before it completes, and it
> > is removing some of the duplicates, but it is leaving most of them.
> > 
> > I copied about 5k of the file into a test file and it successfully
> > removed all of the duplicates. That same section of the file is not
> > deduped when included in the big file.
> > 
> > I think the problem is one of two things:
> > 
> > a) Something is blowing up and I'm not looking in the right place
> > for the error messages.
> > 
> > b) The file contains some other kind of garbage besides what I am
> > cleaning out.
> > 
> > Can anyone agree with that, or have a better answer? Suggestions?
> > 
> > Does anyone have a grep or sed or perl command or two that I can use
> > that will remove everything that is not legal in a domain name?
> > (not the http:, just the rest).
> > 
> > Thanks for your help!
> > 
> > Rick
> > 
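
For what it's worth, here is the kind of cleanup I'm planning to try
for the "strip everything that isn't legal in a domain name" step I
asked about above. It's untested against the real file; it assumes one
name per line with no http: prefix and keeps only letters, digits,
dots and hyphens:

# Strip DOS carriage returns, delete any character that can't appear
# in a domain name, drop lines that end up empty, then dedupe.
tr -d '\r' < domains | sed -e 's/[^A-Za-z0-9.-]//g' -e '/^$/d' | sort -u > domains.uniq

# Or, to see which lines are carrying the garbage before silently
# fixing them:
grep '[^A-Za-z0-9.-]' domains | cat -A | head

If that last check lights up nearly every line, DOS line endings are
the likely culprit, and the tr -d '\r' by itself should be enough to
make sort -u behave.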



