[NTLUG:Discuss] db vs flat file

Fri Dec 14 17:05:59 CST 2001

David,

The only real-life experience that I have with flat files is with
/etc/passwd, and I was managing an /etc/passwd with 10,000 entries in it.  I
think the speed was bearable.  I probably could have shaved off a couple of
milliseconds with a hash, but the OS usually does not support that. ;)

If you go with a hash, you run the risk that your database could become
corrupted, and if it does, you will not have a way to clean it up like you
would with a flat file.

I am assuming that speed and simplicity are priorities for you.  If speed is
very important, I would load up a hash and a flat file and test to see if the
hash is any faster than the flat file.  My gut tells me that you probably
will not see any significant speed gains until you get up to 20,000+
lines/records.  Actually, that is probably close to the hard limit of flat
files.
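
A rough sketch of that kind of test, using the core Benchmark module. Several
things here are assumptions and not from David's message: SDBM_File stands in
for whatever DBM flavor would really be used, the file names are made up, and
the record layout (key:name:group) and the "group equals 42" condition are
invented just to have something to search for.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;                   # assumption: stand-in for the real DBM type
use Benchmark qw(timethese);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $flat = "$dir/input.txt";     # hypothetical flat file
my $dbm  = "$dir/input";         # SDBM adds .dir and .pag to this name

# Write the same invented records to the flat file and to the DBM file.
sub build_test_data {
    my ($count) = @_;
    unlink "$dbm.dir", "$dbm.pag";          # start from an empty database
    open(my $out, '>', $flat) or die "cannot write $flat: $!";
    tie(my %d, 'SDBM_File', $dbm, O_RDWR | O_CREAT, 0644)
        or die "cannot tie $dbm: $!";
    for my $i (1 .. $count) {
        my $value = join(':', "user$i", $i % 100);
        print {$out} "$i:$value\n";
        $d{$i} = $value;
    }
    untie %d;
    close($out) or die "cannot close $flat: $!";
}

# Read the flat file line by line and count records whose group is 42.
sub scan_flat {
    my $hits = 0;
    open(my $in, '<', $flat) or die "cannot read $flat: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($key, $name, $group) = split /:/, $line;
        $hits++ if $group == 42;
    }
    close($in);
    return $hits;
}

# Walk every key/value pair of the tied hash and apply the same condition.
sub scan_dbm {
    my $hits = 0;
    tie(my %d, 'SDBM_File', $dbm, O_RDONLY, 0644)
        or die "cannot tie $dbm: $!";
    while (my ($key, $value) = each %d) {
        my ($name, $group) = split /:/, $value;
        $hits++ if $group == 42;
    }
    untie %d;
    return $hits;
}

build_test_data(20_000);
timethese(20, { flat => \&scan_flat, dbm => \&scan_dbm });
```

Both subs do a full scan, which is what the search in the original question
needs.  Change the record count to see where, if anywhere, the two start to
diverge on your machine.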

Another thing to look at is DBI, which is a Perl interface to different
types of databases.  I think that one of the DBI drivers works on text files,
and code written against it could easily be ported to a relational database,
or whatever fits your needs.

FYI: (from CPAN)
Module          DBD::ODBC       (J/JU/JURL/DBD-ODBC-0.28.tar.gz)
Module          DBD::Oracle     (T/TI/TIMB/DBD-Oracle-1.12.tar.gz)
Module          DBD::LDAP       (T/TU/TURNERJW/DBD-LDAP-0.04.tar.gz)
Module          DBD::CSV        (J/JW/JWIED/DBD-CSV-0.1029.tar.gz)
Module          DBD::File       (J/JW/JWIED/DBD-CSV-0.1029.tar.gz)
...
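
For what the text-file route might look like, here is a minimal sketch using
DBD::CSV.  It assumes DBI and DBD::CSV are installed from CPAN; the table
name, the column names, the sample rows and the names_in_city helper are all
made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;                         # needs DBI and DBD::CSV from CPAN
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Invented sample data; the first line names the columns.
open(my $out, '>', "$dir/members.csv") or die "cannot write: $!";
print {$out} "id,name,city\n",
             "1,alice,Dallas\n",
             "2,bob,Plano\n",
             "3,carol,Dallas\n";
close($out) or die "cannot close: $!";

# Hypothetical helper: return the names of all members in a given city.
sub names_in_city {
    my ($city) = @_;
    my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
        f_dir      => $dir,       # directory that holds the "tables"
        f_ext      => '.csv/r',   # table "members" lives in members.csv
        RaiseError => 1,
    });
    my $rows = $dbh->selectcol_arrayref(
        'SELECT name FROM members WHERE city = ? ORDER BY name',
        undef, $city,
    );
    $dbh->disconnect;
    return @$rows;
}

print join(', ', names_in_city('Dallas')), "\n";
```

The point of going through DBI is that the query code stays mostly the same if
the data later moves to a relational database; in the simple case only the
connect string changes.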

My Favorite: ;)
Module          DBD::Excel      (K/KW/KWITKNR/DBD-Excel-0.05.tar.gz)

Hope it helps

Greg

On Fri, Dec 14, 2001 at 03:05:25PM -0600, David Camm wrote:
> i'm about to write some code to do a search based on conditions among
> several fields in records in a file.
> 
> i can use either a flat file or a hash version of the file.
> 
> as the file gets larger (multiple thousands of records) will it be
> faster to: 
> 
> open(I, 'inputfile'); while (<I>) { .......} close (I); 
> 
> or:
> 
> tie (%D .... input.db...) while (($k,$v) = each %D){ .....} untie %D;
> 
> my ***GUESS*** is that dbm support contains more code than sequential
> file support and hence might be slower, but that's just a guess.
> 
> any opinions (or hard facts) would be greatly appreciated.
> 
> david camm
> advanced web systems