Bug (and probable fix) in CSBwin

Discuss Chaos Strikes Back for Windows and Linux, an unofficial port of Chaos Strikes Back to PC by Paul Stevens, as well as CSBuild, an associated dungeon editor.

Moderator: Zyx

Forum rules
Please read the Forum rules and policies before posting.
Post Reply
Ful Ir
Novice
Posts: 28
Joined: 26-Jan-11 20:04

Bug (and probable fix) in CSBwin

Post by Ful Ir » 2-Feb-11 22:30

I recently stumbled across an internal bug in CSBwin. In BACKGROUND_LIB::BackgroundGraphicExists (viewport.cpp, line 5848) it says

Code: Select all

  if (i < 0) -1;
when it probably should be

Code: Select all

  if (i < 0)
    return -1;
The missing return causes an out-of-bounds array access in the following line(s), triggered for example in Conflux when starting a new game and then turning for the first time.

User avatar
Paul Stevens
CSBwin Guru
Posts: 4099
Joined: 8-Apr-01 16:00
Location: Madison, Wisconsin, USA

Re: Bug (and probable fix) in CSBwin

Post by Paul Stevens » 3-Feb-11 00:20


Ful Ir
Novice
Posts: 28
Joined: 26-Jan-11 20:04

Re: Bug (and probable fix) in CSBwin

Post by Ful Ir » 6-Feb-11 23:37

I have encountered another bug, this one is not as easily fixed though :(
From my limited understanding, what happens is that during the processing of a DSA a database runs out of free entries and uses realloc() to enlarge the available memory (DATABASES::Enlarge()), then the processing of the DSA continues but because the pointer to some 'master object' (e.g. argument 'pDBMaster' to Execute()) still points to the old allocation area, the access sometime later falls into a freed memory area.

How to trigger the bug: In Conflux, pick a character then go down the stairs next to the half-open black door ("Foul smells and vapors" message) and follow the corridor to the area with the munching trees. Waiting for about 10 seconds in that room at the bottom of the stairs will trigger the bug.
If my guess is correct I could imagine an easy fix to be to make sure the database is large enough before starting to process a DSA, assuming the number of entries allocated during the DSA has a fixed maximum or is known in advance.

Some gory details follow.
Backtrace from a breakpoint at the realloc() call (before the invalid access occurs (the first line is from a debug printf):

Code: Select all

enlarging DB 3: from 10318 to 10418 entries (10 per entry)

Breakpoint 1, DATABASES::Enlarge (this=0x8179de0, dbNum=3) at data.cpp:859
859    MALLOC064);                                                                                                                        
(gdb) bt
#0  DATABASES::Enlarge (this=0x8179de0, dbNum=3) at data.cpp:859
#1  0x080ebe8c in DATABASES::FindEmptyDBEntry (this=0x8179de0, dbType=dbACTUATOR, important=true) at data.cpp:973
#2  0x0808191a in FindEmptyDB3Entry (important=59) at CSBCode.cpp:4926
#3  0x080ad49c in CopyItem (origRN=...) at DSA.cpp:1771
#4  0x080ae71d in EX_ADD (exPkt=<value optimized out>, cmdOffset=<value optimized out>) at DSA.cpp:2026
#5  EX_AMPERSAND (exPkt=<value optimized out>, cmdOffset=<value optimized out>) at DSA.cpp:2520
#6  0x080b4f27 in Execute (RNmaster=<value optimized out>, RNslave=<value optimized out>, pDBMaster=<value optimized out>, pMaster=0x82a2ba8, 
    pSlave=0x82a2ba8, curState=1, msgType=1, locrMaster=..., locrSlave=..., subroutineLevel=1, filter=false) at DSA.cpp:5136
#7  0x080b62e2 in EX_GOSUB (exPkt=...) at DSA.cpp:804
#8  0x080b5e52 in Execute (RNmaster=<value optimized out>, RNslave=<value optimized out>, pDBMaster=<value optimized out>, pMaster=0x82a2ba8, pSlave=0x82a2ba8, curState=1, msgType=0, locrMaster=..., locrSlave=..., subroutineLevel=0, filter=false) at DSA.cpp:5242
#9  0x080b6785 in ProcessDSATimer6 (objSlave=..., pTimer=0xbfffebb0, locrSlave=..., filter=<value optimized out>, dsaVars=0xbfffe97a) at DSA.cpp:5437
#10 0x080e085c in ProcessTT_STONEROOM (pTimer=0xbfffebb0, index=197) at Timer.cpp:2200
#11 0x0808646c in ProcessTimers () at CSBCode.cpp:6412
#12 0x0808932e in _MainLoop () at CSBCode.cpp:617
The error message from valgrind of the invalid access:

Code: Select all

==3396== Thread 1:
==3396== Invalid read of size 2
==3396==    at 0x80E7306: DB3::ParameterA() (data.cpp:1339)
==3396==    by 0x80AC82C: GetTarget(TARGETTYPE, EXECUTIONPACKET&, bool) (DSA.cpp:597)
==3396==    by 0x80ACE68: EX_MESSAGE(EXECUTIONPACKET&, bool, char) (DSA.cpp:667)
==3396==    by 0x80B5DCF: Execute(RN, RN, DB3*, DSA*, DSA*, int, int, LOCATIONREL, LOCATIONREL, int, bool) (DSA.cpp:5221)
==3396==    by 0x80B62E1: EX_GOSUB(EXECUTIONPACKET&) (DSA.cpp:804)
==3396==    by 0x80B5E51: Execute(RN, RN, DB3*, DSA*, DSA*, int, int, LOCATIONREL, LOCATIONREL, int, bool) (DSA.cpp:5242)
==3396==    by 0x80B6784: ProcessDSATimer6(RN, TIMER const*, LOCATIONREL, bool, DSAVARS*) (DSA.cpp:5437)
==3396==    by 0x80E085B: ProcessTT_STONEROOM(TIMER*, unsigned int) (Timer.cpp:2200)
==3396==  Address 0x6456c28 is 100,128 bytes inside a block of size 103,180 free'd
==3396==    at 0x4025A7D: realloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==3396==    by 0x808A314: UI_realloc(void*, int, unsigned int) (CSBUI.cpp:2270)
==3396==    by 0x80E97F1: DATABASES::Enlarge(int) (data.cpp:857)
==3396==    by 0x80EBDFB: DATABASES::FindEmptyDBEntry(DBTYPE, bool) (data.cpp:971)
==3396==    by 0x8081919: FindEmptyDB3Entry(bool) (CSBCode.cpp:4926)
==3396==    by 0x80AD49B: CopyItem(RN) (DSA.cpp:1771)
==3396==    by 0x80AE71C: EX_AMPERSAND(EXECUTIONPACKET&, int) (DSA.cpp:2026)
==3396==    by 0x80B4F26: Execute(RN, RN, DB3*, DSA*, DSA*, int, int, LOCATIONREL, LOCATIONREL, int, bool) (DSA.cpp:5136)
I will try to provide additional info if needed. :)

User avatar
Sophia
Concise and Honest
Posts: 3978
Joined: 12-Sep-02 19:50
Location: Nowhere in particular
Contact:

Re: Bug (and probable fix) in CSBwin

Post by Sophia » 6-Feb-11 23:41

DSB gets around issues like this by storing the pointers to its main databases in global data structures, and not actually passing them anywhere. In some ways, it's not too pretty, but this means that any time they're accessed, it's sure to get the correct pointer, even if something has been moved or resized. So that might be a moderately easy fix... Just a random thought. :)

User avatar
Paul Stevens
CSBwin Guru
Posts: 4099
Joined: 8-Apr-01 16:00
Location: Madison, Wisconsin, USA

Re: Bug (and probable fix) in CSBwin

Post by Paul Stevens » 7-Feb-11 01:25

Thanks again.....these are tough to locate.
I appreciate your work.

User avatar
Paul Stevens
CSBwin Guru
Posts: 4099
Joined: 8-Apr-01 16:00
Location: Madison, Wisconsin, USA

Re: Bug (and probable fix) in CSBwin

Post by Paul Stevens » 7-Feb-11 18:56

Thanks once more.......I have found it easy
to replicate this bug.

Now I am pondering various ways of fixing it.

User avatar
Paul Stevens
CSBwin Guru
Posts: 4099
Joined: 8-Apr-01 16:00
Location: Madison, Wisconsin, USA

Re: Bug (and probable fix) in CSBwin

Post by Paul Stevens » 7-Feb-11 20:31

And finally, one last 'Thank You'........

I fixed this rather simply. I never store the
pointer to that record. I maintain only the
'Record Identifier' (or RN) and compute the
pointer wherever it is needed. A couple of
extra machine-language instructions three
or four times per DSA invocation.

Anyone playing Conflux should get the
new executable:

http://dianneandpaul.net/CSBwin/CSBwin11_058.zip

Zyx should probably take notice and say something
in the appropriate Conflux threads. This is a pretty
ugly bug.

Ful Ir
Novice
Posts: 28
Joined: 26-Jan-11 20:04

Re: Bug (and probable fix) in CSBwin

Post by Ful Ir » 8-Feb-11 22:42

You're welcome, I'm happy to help.

I tried a similar change yesterday, but was too tired to look in other places except DSA.cpp if I had missed any other references before posting it.
I essentially replaced all (about 5 or so) m_pDBMaster references with calls to GetRecordAddressDB3() and removed the parameter from the Execute() function. So far does seem to do the trick, or did I miss some other cases?
Patch of my changes:

Code: Select all

diff --git a/DSA.cpp b/DSA.cpp
index e751468..ca4fbfd 100644
--- a/DSA.cpp
+++ b/DSA.cpp
@@ -245,7 +245,6 @@ struct EXECUTIONPACKET
   i32         m_nextState;      //The prefix '5' in "5MS"
   ui16       *m_PC;             //Updated
   i32         m_wordsRemaining; //Updated
-  DB3        *m_pDBMaster;      //Unchanged
   DSA        *m_pMaster;        //Unchanged
   i32         m_curState;       //Unchanged
   i32         m_column;         //Unchanged
@@ -427,7 +426,6 @@ void STACK::minusRoll(ui32 i)
 
 static i32 Execute(RN  RNmaster,
                    RN  RNslave,
-                   DB3 *pDBMaster,
                    DSA *pMaster, 
                    DSA *pSlave, 
                    i32 curState, 
@@ -594,10 +592,10 @@ LOCATIONREL GetTarget(TARGETTYPE targetType,
   switch (targetType)
   {
   case TARGET_A:
-    target = exPkt.m_pDBMaster->ParameterA();
+    target = GetRecordAddressDB3(exPkt.m_RNmaster)->ParameterA();^M
     break;
   case TARGET_B:
-    target = exPkt.m_pDBMaster->ParameterB();
+    target = GetRecordAddressDB3(exPkt.m_RNmaster)->ParameterB();^M
     break;
   case TARGET_ABS:
     target = *(exPkt.m_PC++);
@@ -653,8 +651,8 @@ static void EX_MESSAGE(EXECUTIONPACKET& exPkt, bool thirtyTwo, char MorD)
   switch (cmd.Delay())
   {
   case 0: delay = 0; break;
-  case 1: delay = exPkt.m_pDBMaster->ParameterA(); break;
-  case 2: delay = exPkt.m_pDBMaster->ParameterB(); break;
+  case 1: delay = GetRecordAddressDB3(exPkt.m_RNmaster)->ParameterA(); break;^M
+  case 2: delay = GetRecordAddressDB3(exPkt.m_RNmaster)->ParameterB(); break;^M
   case 3: delay = *(exPkt.m_PC++); exPkt.m_wordsRemaining--; break;
   };
 //  locr = GetTarget(cmd.Target(),
@@ -793,7 +791,6 @@ static void EX_GOSUB(EXECUTIONPACKET& exPkt)
   };
   Execute(exPkt.m_RNmaster,
           exPkt.m_RNslave,
-          exPkt.m_pDBMaster,
           exPkt.m_pMaster,
           exPkt.m_pMaster,
           row,
@@ -942,7 +939,6 @@ static void EX_QUESTION(EXECUTIONPACKET& exPkt)
     };
     Execute(exPkt.m_RNmaster,
             exPkt.m_RNslave,
-            exPkt.m_pDBMaster,
             exPkt.m_pMaster,
             exPkt.m_pMaster,
             addr,
@@ -4748,7 +4744,6 @@ static void EX_AMPERSAND(EXECUTIONPACKET& exPkt, int cmdOffset)
       };
       Execute(exPkt.m_RNmaster,
               exPkt.m_RNslave,
-              exPkt.m_pDBMaster,
               exPkt.m_pMaster,
               exPkt.m_pMaster,
               state,
@@ -5045,7 +5040,6 @@ public:
 
 static i32 Execute(RN  RNmaster,
                    RN  RNslave,
-                   DB3 *pDBMaster,
                    DSA *pMaster, 
                    DSA *pSlave, 
                    i32 curState, 
@@ -5066,7 +5060,6 @@ static i32 Execute(RN  RNmaster,
   exPkt.m_subroutineLevel = subroutineLevel;
   exPkt.m_RNmaster       = RNmaster;
   exPkt.m_RNslave        = RNslave;
-  exPkt.m_pDBMaster      = pDBMaster;
   exPkt.m_pMaster        = pMaster; 
   exPkt.m_curState       = curState;
   exPkt.m_column         = msgType;
@@ -5426,7 +5419,6 @@ bool ProcessDSATimer6(RN objSlave,
   startingOperationCount = operationCount;
   curState = Execute(objMaster,
                      objSlave,
-                     pDBMaster, 
                      pMaster, 
                      pSlave, 
                      curState, 
@@ -5436,7 +5428,6 @@ bool ProcessDSATimer6(RN objSlave,
                      0,
                      filter);
   pDBSlave = GetRecordAddressDB3(objSlave);
-  pDBMaster = GetRecordAddressDB3(objMaster);
 #ifdef _DEBUG
   instrumentation.AddToDSACount(absDSAindex, operationCount-startingOperationCount);
 #endif
@@ -5457,7 +5448,7 @@ bool ProcessDSATimer6(RN objSlave,
     };
   };
   if (pdsaDbank->forceState >= 0) curState = pdsaDbank->forceState;
-  PutState(pDBMaster, pMaster, curState);
+  PutState(GetRecordAddressDB3(objMaster), pMaster, curState);
   if (!excessiveWarning)
   {
     if (operationCount > 50000)

User avatar
Paul Stevens
CSBwin Guru
Posts: 4099
Joined: 8-Apr-01 16:00
Location: Madison, Wisconsin, USA

Re: Bug (and probable fix) in CSBwin

Post by Paul Stevens » 8-Feb-11 23:50

The pointer was computed in 'processTimer6'. I think
that was the only place the pointer was saved.
I removed that computation and then had to
recode every reference to the pointer.
I removed the pointer from EXECUTIONPACKET.

I don't remember any of the other particulars.

I posted:
http://dianneandpaul.net/CSBwin/DSA.cpp
if you want to do some sort of comparison
with the old version.

Ful Ir
Novice
Posts: 28
Joined: 26-Jan-11 20:04

Re: Bug (and probable fix) in CSBwin

Post by Ful Ir » 21-Feb-11 22:04

Thanks, I took a look at it and it seems I haven't missed anything.

I fixed another small problem (two related?), but this is more an annoyance than serious.
a) When the recording is in "preopen" mode (lines go into the m_lineQueue buffer, but not to disk yet), the checks "if (m_lineQueueLen > MAXLINEQUEUE)" in data.cpp near lines 1917 and 1974 trigger too late (it should be ">="), so the game segfaults instead of exiting cleanly.
b) The game quits after about 10 clicks on the dungeon door or "load game" screen when the m_lineQueue buffer becomes full, so I changed m_lineQueue to use dynamic allocation.

I don't know if you think these are worth fixing, but I will post patches and/or the modified source files if you want them.

User avatar
Paul Stevens
CSBwin Guru
Posts: 4099
Joined: 8-Apr-01 16:00
Location: Madison, Wisconsin, USA

Re: Bug (and probable fix) in CSBwin

Post by Paul Stevens » 21-Feb-11 22:42

Yes, indeed. They are worth fixing. Details like this
are very important in my opinion. I appreciate
your work in this regard.

Ful Ir
Novice
Posts: 28
Joined: 26-Jan-11 20:04

Re: Bug (and probable fix) in CSBwin

Post by Ful Ir » 24-Feb-11 22:39

OK, I have another one, but this one may be in Conflux' DSAs and not in the engine itself. While playing Conflux I sometimes had the game quit on me with System error 45 / No empty timer entries. There was no apparent relation to where I was in the dungeon or how I long I had played since starting to play or loading a saved game.
I added a debug printf outputting m_numTimer every time it was increased or decreased. This showed usually about 371 with frequent jumps to about 430, but never higher than 500.
Some time later I encountered the bug again and was lucky enough to save just before it triggers. The timers jump to ~500, then ~800 and then go up ever faster until they hit the limit of 5000 some seconds later.
This only happens however if I stand still after loading the saved game, if I move the party the game continues normally.
I haven't tried yet if the bug also triggers in the Windows version.
If anyone wants to give it a try, you can download the savegame and a record file here: http://www.diamondsword.de/csb/csbgame3.zip. I'm currently stumped because my knowledge of DSAs is rather limited.

User avatar
Paul Stevens
CSBwin Guru
Posts: 4099
Joined: 8-Apr-01 16:00
Location: Madison, Wisconsin, USA

Re: Bug (and probable fix) in CSBwin

Post by Paul Stevens » 25-Feb-11 00:23

I will check it out. These things are usually pretty
obvious when I look at the traces.

Thanks.

User avatar
Paul Stevens
CSBwin Guru
Posts: 4099
Joined: 8-Apr-01 16:00
Location: Madison, Wisconsin, USA

Re: Bug (and probable fix) in CSBwin

Post by Paul Stevens » 25-Feb-11 03:30

Well, I tried your CSBgame3 and playfile.

The playfile ended immediately after loading
the savegame. So I just stood there and let
my champions get clubbed to death one-at-a-
time. Tried it with 11.057 and 11.059. I did
not have an 11.056 available. Tried it with
debug version and release version. Watched
the timer count. Nothing went wrong.

Hmmmmmm......don't know what to say.
Unless 11.057 fixed something.

Ful Ir
Novice
Posts: 28
Joined: 26-Jan-11 20:04

Re: Bug (and probable fix) in CSBwin

Post by Ful Ir » 25-Feb-11 18:27

I tried some things:
- Increasing the timer interval on Linux from 10ms to 100ms changes nothing
- decreasing the resolution of UI_GetSystemTime to 100ms changes nothing

But I can reproduce the fault on Windows with version 11.058 by just loading the save game (ignoring the record file) and waiting for about 5 seconds. Sorry, I tacked on the record file as an afterthought without really checking if it contained meanful data.
I tried turning on timer and DSA trace but can't see any change in the log files.

User avatar
Paul Stevens
CSBwin Guru
Posts: 4099
Joined: 8-Apr-01 16:00
Location: Madison, Wisconsin, USA

Re: Bug (and probable fix) in CSBwin

Post by Paul Stevens » 25-Feb-11 21:23

Yep....fails very quickly.

See you later today. Hopefully

User avatar
Paul Stevens
CSBwin Guru
Posts: 4099
Joined: 8-Apr-01 16:00
Location: Madison, Wisconsin, USA

Re: Bug (and probable fix) in CSBwin

Post by Paul Stevens » 26-Feb-11 01:12

Level 3 is flooded with DSA type Decoration AB. I think
it is to provide the appearance of a water flood but
it turns out to be a DSA flood.

Level 3 has multiple Decoration AB DSAs at 13,23.

Each receives messages and produces new messages.
So when one message arrives at 13,23 two are produced.

I though we might have fixed some problem like this
when we assured that messages issued with delay of
zero would be put in the front of the queue. Otherwise
things might happen like this:
We notice that there is no DSA in 13,23 so we initiate
a DSA to be placed there. Then, before it is actually
born, someone else initiates another DSA to be placed
in the same spot. Then both are born. The initiation
was fun but the multiple births-----big problems.

The entire 5,190,001 line DSA trace is at:
http://dianneandpaul.net/CSBwin/trace014.zip
for those among you who are sadists.

Here is an abbreviated piece of the non-DSA trace.

Code: Select all

00deca timer Process        18f 03 DLY=0000 66 00 0d 17 02 01 59f9c191
00deca timer create         153 03 DLY=0000 65 1a 0c 17 03 00 840f4c88
00deca timer create         18f 03 DLY=0001 66 00 0d 17 02 01 840f4c88
00deca timer create         1c6 03 DLY=0000 65 1b 0d 17 02 00 840f4c88
DSA Decoration AB                                                                   State 13 MSG 7 [458]
00deca timer create         1d7 03 DLY=0000 65 1c 0c 17 03 00 6f72a3f3
00deca timer create         1f9 03 DLY=0001 66 00 0d 17 02 01 6f72a3f3
00deca timer create         23b 03 DLY=0000 65 1d 0d 17 02 00 6f72a3f3
DSA Decoration AB                                                                   State 13 MSG 7 [458]



User avatar
Zyx
DSA Master
Posts: 2564
Joined: 5-Jun-00 11:53
Location: in the mind
Contact:

Re: Bug (and probable fix) in CSBwin

Post by Zyx » 27-Feb-11 23:26

I found the faulty dsa code. Send me a savegame for bugfixing.

Post Reply