DBZ(3)
NAME
dbzinit, dbzfresh, dbzagain, dbzclose, dbzexists, dbzfetch, dbzstore,
dbzsync, dbzsize, dbzgetoptions, dbzsetoptions, dbzdebug - database
routines
SYNOPSIS
#include <dbz.h>
BOOL dbzinit(const char *base)
BOOL dbzclose(void)
BOOL dbzfresh(const char *base, const long size)
BOOL dbzagain(const char *base, const char *oldbase)
BOOL dbzexists(const HASH key)
OFFSET_T dbzfetch(const HASH key)
BOOL dbzfetch(const HASH key, void *ivalue)
BOOL dbzstore(const HASH key, const OFFSET_T offset)
BOOL dbzstore(const HASH key, void *ivalue)
BOOL dbzsync(void)
long dbzsize(const long nentries)
void dbzgetoptions(dbzoptions *opt)
void dbzsetoptions(const dbzoptions opt)
BOOL dbzdebug(const BOOL newvalue)
DESCRIPTION
These functions provide an indexing system for rapid random access to a
text file (the base file).
Dbz stores offsets into the base text file for rapid retrieval. All
retrievals are keyed on a hash value that is generated by the
HashMessageID() function.
Dbzinit opens a database, an index into the base file base, consisting of
files base.dir, base.index, and base.hash, which must already exist. (If
the database is new, they should be zero-length files.) Subsequent accesses
go to that database until dbzclose is called to close the database.
Dbzfetch searches the database for the specified key. If
--enable-tagged-hash was given to configure, it returns the corresponding
value, if any. Otherwise, it returns TRUE on success and sets the content
of ivalue. Dbzstore stores the key-value pair in the database if
--enable-tagged-hash was given to configure; otherwise, it stores the
content of ivalue. Dbzstore will fail unless the database files are
writable. Dbzexists checks whether the given hash exists in the database.
Dbz is optimized for this operation and it may be significantly faster than
dbzfetch().
Dbzfresh is a variant of dbzinit for creating a new database with more
control over details.
Dbzfresh's size parameter specifies the size of the first hash table within
the database, in key-value pairs. Performance will be best if the number
of key-value pairs stored in the database does not exceed about 2/3 of
size. (The dbzsize function, given the expected number of key-value pairs,
will suggest a database size that meets these criteria.) Assuming that an
fseek offset is 4 bytes, the .index file will be 4 * size bytes. The .hash
file will be DBZ_INTERNAL_HASH_SIZE * size bytes (the .dir file is tiny and
roughly constant in size) until the number of key-value pairs exceeds about
80% of size. (Nothing awful will happen if the database grows beyond 100%
of size, but accesses will slow down quite a bit and the .index and .hash
files will grow somewhat.)
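The sizing arithmetic above can be sketched roughly as follows. This is an
illustration only: the helper names (suggest_size, index_bytes, hash_bytes)
are hypothetical and not part of dbz, suggest_size does not claim to
reproduce what dbzsize actually computes, and the width of a .hash entry is
passed in as a parameter because DBZ_INTERNAL_HASH_SIZE is fixed when dbz is
compiled.

```c
#include <assert.h>

/* Hypothetical helper (not the real dbzsize): smallest table size such
 * that nentries key-value pairs fill no more than 2/3 of it. */
long suggest_size(long nentries)
{
    assert(nentries >= 0);
    /* ceil(nentries * 3 / 2) using integer arithmetic */
    return (nentries * 3 + 1) / 2;
}

/* Approximate .index size: one fseek offset per table slot.
 * offset_bytes is 4 under the assumption stated in the text. */
long index_bytes(long size, long offset_bytes)
{
    return size * offset_bytes;
}

/* Approximate .hash size: hash_entry_bytes stands in for
 * DBZ_INTERNAL_HASH_SIZE, whose value is an assumption of the caller. */
long hash_bytes(long size, long hash_entry_bytes)
{
    return size * hash_entry_bytes;
}
```

For instance, 1,000,000 expected key-value pairs give a suggested size of
1,500,000 here, and with 4-byte offsets an .index file of about 6,000,000
bytes.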
Dbz stores up to DBZ_INTERNAL_HASH_SIZE bytes of the message-id's hash in
the .hash file to confirm a hit. This eliminates the need to read the base
file to handle collisions. This replaces the tagmask feature in previous
dbz releases.
A size of ``0'' given to dbzfresh is synonymous with the local default; the
normal default is suitable for tables of 5,000,000 key-value pairs.
Calling dbzinit(name) with the empty name is equivalent to calling
dbzfresh(name, 0).
When databases are regenerated periodically, as in news, it is simplest to
pick the parameters for a new database based on the old one. This also
permits some memory of past sizes of the old database, so that a new
database size can be chosen to cover expected fluctuations. Dbzagain is a
variant of dbzinit for creating a new database as a new generation of an
old database. The database files for oldbase must exist. Dbzagain is
equivalent to calling dbzfresh with a size equal to the result of applying
dbzsize to the largest number of entries in the oldbase database and its
previous 10 generations.
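The choice described for dbzagain amounts to taking a maximum over recent
usage figures and sizing the new table from it. A minimal sketch of the
first step, assuming the entry counts of the old database and its previous
generations are already available in an array (how dbz records them is not
shown here, and largest_usage is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest entry count among the old database and
 * its recorded previous generations, given as counts[0..n-1]. */
long largest_usage(const long *counts, size_t n)
{
    long max;
    size_t i;

    assert(counts != NULL && n > 0);
    max = counts[0];
    for (i = 1; i < n; i++) {
        if (counts[i] > max)
            max = counts[i];
    }
    return max;
}
```

The result would then be passed to dbzsize, and the size it suggests to
dbzfresh, which is what the text says dbzagain does on the caller's behalf.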
When many accesses are being done by the same program, dbz is massively
faster if its first hash table is in memory. If the ``pag_incore'' flag is
set to INCORE_MEM, an attempt is made to read the table in when the
database is opened, and dbzclose writes it out to disk again (if it was
read successfully and has been modified). Dbzsetoptions can be used to set
the pag_incore and exists_incore flags to new values, each of which should
be ``INCORE_NO'', ``INCORE_MEM'', or ``INCORE_MMAP'', for the .hash and
.index files separately; this does not affect the status of a database that
has already been opened. The default is ``INCORE_NO'' for the .index file and
``INCORE_MMAP'' for the .hash file. The attempt to read the table in may
fail due to memory shortage; in this case dbz fails with an error. Stores
to an in-memory database are not (in general) written out to the file until
dbzclose or dbzsync, so if robustness in the presence of crashes or
concurrent accesses is crucial, in-memory databases should probably be
avoided or the writethrough option should be set to ``TRUE''.
If the nonblock option is ``TRUE'', then writes to the .hash and .index
files will be done using non-blocking I/O. This can be significantly
faster if your platform supports non-blocking I/O with files.
Dbzsync causes all buffers etc. to be flushed out to the files. It is
typically used as a precaution against crashes or concurrent accesses when
a dbz-using process will be running for a long time. It is a somewhat
expensive operation, especially for an in-memory database.
If dbz has been compiled with debugging facilities available (which makes
it bigger and a bit slower), dbzdebug alters the value (and returns the
previous value) of an internal flag which (when 1; default is 0) causes
verbose and cryptic debugging output on standard output.
Concurrent reading of databases is fairly safe, but there is no
(inter)locking, so concurrent updating is not.
An open database occupies three stdio streams and two file descriptors.
Memory consumption is negligible (apart from stdio buffers) except for
in-memory databases.
SEE ALSO
dbm(3), history(5), libinn(3)
DIAGNOSTICS
Functions returning BOOL values return ``TRUE'' for success, ``FALSE'' for
failure. Functions returning OFFSET_T values return -1 for failure.
Dbzinit attempts to have errno set plausibly on return, but
otherwise this is not guaranteed. An errno of EDOM from dbzinit indicates
that the database did not appear to be in dbz format.
If DBZTEST is defined at compile-time then a main() function will be
included. This will run performance and integrity tests.
HISTORY
The original dbz was written by Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us).
Later contributions by David Butler and Mark Moraes. Extensive reworking,
including this documentation, by Henry Spencer (henry@zoo.toronto.edu) as
part of the C News project. MD5 code borrowed from RSA. Extensive
reworking to remove backwards compatibility and to add hashes into dbz
files by Clayton O'Neill (coneill@oneill.net).
BUGS
Unlike dbm, dbz will refuse to dbzstore with a key already in the database.
The user is responsible for avoiding this.
The RFC822 case mapper implements only a first approximation to the
hideously-complex RFC822 case rules.
Dbz no longer tries to be call-compatible with dbm in any way.