From brister@vix.com Mon Nov 25 11:14:48 EST 1996
Article: 26878 of news.software.nntp
Path: news.math.psu.edu!news3.cac.psu.edu!howland.erols.net!news.mathworks.com!uunet!in2.uu.net!vixie!velo.pp.vix.com!not-for-mail
From: James Brister <brister@vix.com>
Newsgroups: news.software.nntp
Subject: Re: History DB Performance
Date: 20 Nov 1996 20:03:33 +0100
Organization: Internet Software Consortium
Lines: 67
Message-ID: <yy36830612y.fsf@velo.pp.vix.com>
References: <56turt$5vb$1@newsfeed.dreamscape.com>
NNTP-Posting-Host: localhost
Cc: msd@nova.dreamscape.com 
X-Newsreader: Gnus v5.3/Emacs 19.34

msd@nova.dreamscape.com (Matthew Dwyer) writes:

> As with 1.4, We are still having trouble with the history database once it
> exceeds about 35megs. Our responses to incoming articles become too large to
> accept a full feed.

What do you mean by 'too large'?

> There seems to be little information on the relationship between the size of 
> the history DB and performance.

The performance gets worse as the size gets bigger (obviously). It also
gets worse if the database is not rebuilt nightly (which it is by default
with expire). There's a bug in 'makehistory' when using the '-i' and '-s'
flags, which results in poor performance when your history file is over
16MB (untested patch below). Using '-i' to expire will also result in poor
performance (but that's not a bug).

A good indicator of your performance characteristics is how the number
generated by this,

   head -1 /var/news/etc/history.dir | perl -ane 'print 2 ** $F[7], "\n";'

compares with the size of your history text file. If the number is bigger,
you're OK. If it's smaller, then lookups for the message ids at the tail of
the history text file (past the byte indexed by the number just generated)
will be much slower than for those at the front.
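
For anyone who would rather do that check in C, here is a rough sketch of
the same arithmetic. It is not part of INN: the function names
fast_lookup_limit() and slow_tail_bytes() are made up for illustration.
The only thing it assumes about the .dir file is what the perl one-liner
already assumes, namely that the eighth whitespace-separated field of the
first line is the exponent.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not an INN function.  For a well-formed first line
 * of history.dir, returns 2 ** (eighth whitespace-separated field), i.e.
 * the number the perl one-liner prints.  Returns 0 if the line has fewer
 * than eight fields or the field is not a usable exponent. */
unsigned long long fast_lookup_limit(const char *dir_line)
{
    char buf[256];
    char *tok, *end;
    long shift;
    int i;

    assert(dir_line != NULL);

    /* strtok() modifies its argument, so work on a private copy. */
    strncpy(buf, dir_line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    /* Step from field 0 to field 7, like perl's $F[7]. */
    tok = strtok(buf, " \t\n");
    for (i = 0; tok != NULL && i < 7; i++)
        tok = strtok(NULL, " \t\n");
    if (tok == NULL)
        return 0;

    shift = strtol(tok, &end, 10);
    if (end == tok || shift < 0 || shift > 62)
        return 0;
    return 1ULL << shift;
}

/* Hypothetical helper.  Number of bytes at the tail of the history text
 * file that lie past the limit; 0 means the whole file is in the fast
 * range. */
unsigned long long slow_tail_bytes(unsigned long long limit,
                                   unsigned long long history_size)
{
    return history_size > limit ? history_size - limit : 0;
}
```

Feed fast_lookup_limit() the first line of history.dir and pass the result,
together with the size of the history text file in bytes (from stat(2),
say), to slow_tail_bytes(). A non-zero answer is the number of bytes of
history that fall into the slow range.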
 
> One idea we are considering is to keep the "live" database small during the 
> day and appending it to the large database at expire time.. then producing a 
> new "live" database for the next day.

What exactly is the 'live' database, and what's the 'large' database?

> We just don't need to check every 
> incoming article against 14 days worth. We only need about 1 day of history 
> for incoming news and the rest for expiration.

Actually, you have that backwards. You need 14 days of history so that when
someone upstream starts re-feeding you week-old articles, you can reject
them properly.

James
-- 
James Brister                                            brister@vix.com
Internet Software Consortium       http://www.isc.org    inn@isc.org 
---------------------------------------------------------------------------
*** makehistory.c	1996/11/07 00:50:51	1.20
--- makehistory.c	1996/11/13 22:15:01
***************
*** 125,131 ****
      /* Open the new database, using the old file if desired and possible. */
      (void)dbzincore(1);
      if (IgnoreOld) {
! 	if (dbzfresh(p, dbzsize(size), HIS_FIELDSEP, 'C', dbztagmask(size)) < 0) {
  	    (void)fprintf(stderr, "Can't do dbzfresh, %s\n",
  		    strerror(errno));
  	    if (temp[0])
--- 125,132 ----
      /* Open the new database, using the old file if desired and possible. */
      (void)dbzincore(1);
      if (IgnoreOld) {
!         /* 70 == average history file line length */
! 	if (dbzfresh(p, dbzsize(size), HIS_FIELDSEP, 'C', dbztagmask(size*70)) < 0) {
  	    (void)fprintf(stderr, "Can't do dbzfresh, %s\n",
  		    strerror(errno));
  	    if (temp[0])


