From news@news.cam.ac.uk Fri Aug  8 10:30:41 EDT 1997
Article: 41009 of news.software.nntp
Path: news.cis.ohio-state.edu!news.maxwell.syr.edu!howland.erols.net!rill.news.pipex.net!pipex!join.news.pipex.net!pipex!server1.netnews.ja.net!lyra.csx.cam.ac.uk!not-for-mail
From: news@news.cam.ac.uk (USENET news)
Newsgroups: news.software.nntp
Subject: INN 1.5.1sec2 - fix for an innd crash
Date: 8 Aug 1997 00:22:17 +0100
Organization: Computing Service, Cambridge University, England
Lines: 77
Message-ID: <5sdlb9$oq$1@lyra.csx.cam.ac.uk>
NNTP-Posting-Host: lyra.csx.cam.ac.uk
Xref: news.cis.ohio-state.edu news.software.nntp:41009

At least, it seems to be the fix for a problem I've been seeing with 
INN 1.5.1sec2 on Solaris 2.5 (SPARC) and Sun's C compiler - "your mileage 
may vary". 

The symptoms I saw were innd collapsing in response to anything that caused
outbound feeds to be restarted (ctlinnd reload newsfeeds, ctlinnd newgroup,
etc.), though it also depended on the newsfeeds configuration (it was related
to funnel files, and to how many sites a given article was being sent to
through the funnel). Since the underlying cause is a memory addressing error,
the symptoms are likely to vary with OS, compiler and phase of the moon,
depending on where data structures get allocated.

In my case, examining a core dump in dbx showed, for example:

Current function is SITEfree
 1044           DISPOSE(sp->FNLnames.Data);
(dbx)   [1] kill(0x0, 0x6, 0x0, 0x0, 0xffffffff, 0x128be8), at 0xef67434c
  [2] abort(0x11b104, 0x127580, 0x2ebc80, 0x0, 0x0, 0x1), at 0xef6396a4
  [3] free(0x549e08, 0x55555400, 0x55555555, 0x549e00, 0xd0, 0x131f48), at 0x6e9d8
=>[4] SITEfree(sp = 0x52a0a0), line 1044 in "site.c"
  [5] SITEparsefile(StartSite = 1), line 542 in "newsfeeds.c"
  [6] ICDsetup(StartSites = 1), line 98 in "icd.c"
  [7] ICDwritevactive(vp = 0xefffef54, vpcount = 2), line 221 in "icd.c"
  [8] ICDnewgroup(Name = 0xeffff6aa "soc.genealogy.britain", Rest = 0xeffff6c0 "y"), line 281 in "icd.c"
  [9] CCnewgroup(av = 0xeffff264), line 1016 in "cc.c"
  [10] CCreader(cp = 0x131728), line 1739 in "cc.c"
  [11] CHANreadloop(), line 833 in "chan.c"
  [12] main(ac = 0, av = 0xeffffe7c), line 972 in "innd.c"

accompanied, every time, by this message in errlog:

assertion botched: *(unsigned int *)((caddr_t)Perl_op + Perl_op->ovu.ovu_size + 1 - sizeof (unsigned int)) == 0x55555555
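For readers puzzled by that errlog line: "assertion botched" and the Perl_op name suggest the check comes from the copy of Perl's malloc linked into innd, which in its checking configuration appears to place the marker 0x55555555 just after each block and verify it in free(). A write past the end of a buffer is then reported only later, when the block is freed. That would fit the trace above, which dies in free() called from SITEfree() rather than in the code that did the damage. The sketch below shows the guard-word idea only; guarded_alloc(), guarded_check(), guarded_free() and their memory layout are hypothetical and are not Perl's actual implementation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard value, chosen to match the 0x55555555 in the errlog
 * assertion; the layout below is illustrative, not Perl's malloc. */
#define GUARD_WORD 0x55555555u

typedef struct {
    size_t size;  /* bytes requested by the caller */
} guard_header;

/* Allocate `size` usable bytes: [header][user bytes][guard word]. */
static void *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(guard_header) + size + sizeof(unsigned int));
    if (raw == NULL)
        return NULL;
    ((guard_header *)raw)->size = size;
    unsigned int guard = GUARD_WORD;
    /* memcpy avoids assuming the guard position is suitably aligned. */
    memcpy(raw + sizeof(guard_header) + size, &guard, sizeof guard);
    return raw + sizeof(guard_header);
}

/* Return 1 if the guard word after the block is intact, 0 if overwritten. */
static int guarded_check(const void *p)
{
    const unsigned char *raw = (const unsigned char *)p - sizeof(guard_header);
    size_t size = ((const guard_header *)raw)->size;
    unsigned int guard;
    memcpy(&guard, raw + sizeof(guard_header) + size, sizeof guard);
    return guard == GUARD_WORD;
}

/* Like free(), but abort if the guard shows the block was overrun. */
static void guarded_free(void *p)
{
    if (p == NULL)
        return;
    if (!guarded_check(p)) {
        fprintf(stderr, "guard word overwritten: buffer overrun detected at free\n");
        abort();
    }
    free((unsigned char *)p - sizeof(guard_header));
}
```

Filling exactly the requested 16 bytes of a guarded_alloc(16) block leaves the guard intact; setting byte 16 overwrites part of it, so guarded_check() then returns 0 and guarded_free() would abort(), much as innd did.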

The error (if I'm right, and the code certainly appears to be wrong anyway!)
is actually in innd/art.c, in an area where the INN 1.5.1sec2 (and 1.5.1sec)
code differs from both 1.5.1 and the 1.6 beta releases. The 1.5.1sec2
version appears broken, and substituting the 1.6b3 version (patch below)
appears to fix it.

Grateful thanks are due to Forrest J. Cavalier III <mibsoft@epix.net>, whose
unified source files for the various recent INN versions (announced in this
group a short while ago) made it much easier to see where possibly relevant
parts of 1.5.1sec2 differed from 1.5.1 and 1.6beta. In fact, a couple of days
ago, on the inn-workers mailing list, he flagged the 1.5.1sec2 version of this
change as dubious from examining the source code, but the effects of the
broken code were not known until now...

                                John Line

===== patch for innd/art.c in INN 1.5.1sec2 (retrofitted from 1.6b3)
*** art.c.original	Thu Aug  7 21:30:52 1997
--- art.c	Thu Aug  7 22:06:26 1997
***************
*** 1690,1697 ****
  		    *p++ = ' ';
  		    bp->Used++;
  		}
!                 strncpy(p, sp->Name, bp->Size - 1) ;
! 		p[bp->Size - 1] = '\0';
  		bp->Used += strlen(p);
  	    }
  	}
--- 1690,1697 ----
  		    *p++ = ' ';
  		    bp->Used++;
  		}
! 		strncpy(p, sp->Name, bp->Size - bp->Used - 1) ;
! 		bp->Data[bp->Size - 1] = '\0';
  		bp->Used += strlen(p);
  	    }
  	}
=====
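To see why the 1.5.1sec2 lines go wrong, it helps to work out which bytes they touch. This is a minimal sketch, assuming (as the *p++ = ' '; bp->Used++; context suggests) that p starts at bp->Data + bp->Used. The buffer struct, sec2_last_write_offset() and append_fixed() are hypothetical names; only the Data, Size and Used fields come from the patch. Because strncpy() pads with NULs up to its length argument, the sec2 call always writes bp->Size - 1 bytes starting at p, and the NUL goes to p[bp->Size - 1], so its last write lands at offset Used + Size - 1. That is inside the buffer only when Used is 0, which would be consistent with the crash depending on how many sites an article was sent to through the funnel. append_fixed() applies the 1.6b3 lines, plus a room check that is not in the patch.

```c
#include <string.h>

/* Hypothetical stand-in for the innd buffer; only the field names
 * Data, Size and Used are taken from the patch. */
typedef struct {
    char   *Data;
    size_t  Size;  /* total bytes allocated for Data */
    size_t  Used;  /* bytes of Data already holding site names */
} buffer;

/* Offset (from bp->Data) of the last byte the 1.5.1sec2 code writes,
 * given bp->Used at the moment strncpy() is called and assuming
 * p == bp->Data + bp->Used. Computed only: the real write would overrun. */
static size_t sec2_last_write_offset(const buffer *bp)
{
    /* strncpy(p, name, Size - 1) fills p[0..Size-2]; then p[Size-1] = NUL. */
    return bp->Used + (bp->Size - 1);
}

/* Append a space-separated name using the 1.6b3 logic. Returns 0 on
 * success, -1 if there is no room for a separator and a terminating NUL. */
static int append_fixed(buffer *bp, const char *name)
{
    /* Added guard (not in the patch): keeps Size - Used - 1 from wrapping. */
    if (bp->Used + 2 > bp->Size)
        return -1;
    char *p = bp->Data + bp->Used;
    if (bp->Used) {  /* separate names with a space, as in art.c */
        *p++ = ' ';
        bp->Used++;
    }
    /* Copy at most the space left, keeping the last byte for the NUL. */
    strncpy(p, name, bp->Size - bp->Used - 1);
    bp->Data[bp->Size - 1] = '\0';  /* terminates even if name was truncated */
    bp->Used += strlen(p);
    return 0;
}
```

With a 16-byte buffer, the sec2 code stays in bounds for the first name (last write at offset 15), but once "alpha" and a space are in place (Used = 6 at the strncpy() call) its last write is at offset 21, six bytes past the end. The 1.6b3 logic instead truncates a long second name, leaving "alpha a-very-lo" with the NUL in the last byte.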


