This is not an easy task. It is very slow and tedious, with no guaranteed result, but very gratifying when you find something!
There are many ways to start, and not all of them lead to interesting results.
- You must know what you are looking for (images, sounds, color palettes, maps of levels)
- You may study the hardware platform to know the raw formats used for the items you are looking for.
For example, when I searched for data in the Super NES ROM of Dungeon Master, I read lots of technical documentation explaining how the video is managed: the fact that it uses tile-based graphics, the possible sizes of tiles, the fact that tile graphics are stored in a particular planar mode, and the format of color palettes.
There are tools available that can display the content of any file as graphics, so you can visually search for tiles encoded in many different ways (lots of bitmap and planar variations).
Once I had identified a set of tiles (grouped together), I knew I had to find a 'tilemap', which specifies which tiles should be assembled to form an image. With luck, you can recognize an in-game image from the tiles. You then take a screenshot of that image and use the tiles in the tile set like a puzzle: build the list of tile indices that form the image (you usually only need a small part of it). Then you can search for that list of indices in the raw file data, encoded as bytes or as words, little endian or big endian, etc. (you need to write a custom program for this kind of search, and this example is included in SCK). This is how we found most of the tilemaps. When you get several matches, you have to try each candidate tilemap to check which one is correct.
- You can search for already known patterns of data: does the dungeon data have the same format in DM SNES as in other versions? You then search the ROM for some dungeon byte sequence and bingo! The dungeon is there. But this does not work for Theron's Quest: the dungeon data does not have the same format, and only some parts can be found.
- You can disassemble the program to see how it manipulates the data. This requires skills but produces the best results of course. This is often necessary to understand compression or encryption algorithms.
- You can inspect file contents with a Hex editor and see if you can spot interesting things like:
- Text strings that may help you identify the content
- Patterns: some repetitive byte sequences or structures
- With an experienced eye, you can sometimes spot uncompressed graphics directly in the hex editor (much like in the Matrix movie: the heroes can see things on the screens, but you can only see green scrolling symbols!). For example, I remember finding the picture of the Axe in the ROM of Theron's Quest, and that was the start of my study to find and extract some other uncompressed graphics (this is now included in SCK). Using a hex editor where you can adjust the width (number of bytes) of the view helps a lot here.
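The tile-index search described in the list above can be sketched in a few lines of Python. This is only a minimal illustration, not the program included in SCK: the function names, the set of encodings tried, and the `0x03FF` mask are all assumptions. The mask is there because SNES tilemap entries are 16-bit words that keep the tile number in the low 10 bits and use the upper bits for palette, priority and flip flags, so an exact word match can miss a real tilemap.

```python
import struct

# Candidate encodings for a tile index (an assumed, non-exhaustive set).
ENCODINGS = {
    "B": "8-bit",
    "<H": "16-bit little endian",
    ">H": "16-bit big endian",
}

def find_index_sequences(data, indices):
    """Return (offset, label) pairs where the exact index list occurs."""
    matches = []
    for fmt, label in ENCODINGS.items():
        if fmt == "B" and max(indices) > 0xFF:
            continue  # this index list cannot be stored as single bytes
        needle = b"".join(struct.pack(fmt, i) for i in indices)
        start = data.find(needle)
        while start != -1:
            matches.append((start, label))
            start = data.find(needle, start + 1)
    return matches

def find_masked_words(data, indices, mask=0x03FF, byteorder="little"):
    """Return offsets where 16-bit words match the indices after masking.

    The mask drops attribute bits stored next to the tile number
    (assumed layout: tile number in the low 10 bits).
    """
    n = len(indices)
    hits = []
    for off in range(len(data) - 2 * n + 1):
        words = (
            int.from_bytes(data[off + 2 * k:off + 2 * k + 2], byteorder)
            for k in range(n)
        )
        if all(w & mask == idx for w, idx in zip(words, indices)):
            hits.append(off)
    return hits
```

You would call it with the whole ROM read into memory, for example `find_index_sequences(open("rom.bin", "rb").read(), [5, 6, 7])` (the file name is hypothetical). A short index list often matches in more than one encoding or at more than one offset, which is why each candidate still has to be checked by hand.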
When you have spotted some interesting patterns or structures, you can start looking at headers, which often contain offsets and sizes. Use these to see if you can find file sizes, block sizes and block offsets.
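As a rough sketch of that last step, here is one way to test whether a run of integers in a header could be a table of block offsets. Everything about the layout is an assumption for illustration (32-bit little-endian values, sorted offsets, blocks stored back to back), and `read_offset_table` is a made-up name. Real files will differ, so treat a positive result only as a hint.

```python
import struct

def read_offset_table(data, count, start=0, fmt="<I"):
    """Read `count` integers at `start` and test them as block offsets.

    Returns a list of (offset, size) pairs if the values are strictly
    increasing and all fall inside the file, otherwise None.
    """
    size = struct.calcsize(fmt)
    values = [
        struct.unpack_from(fmt, data, start + i * size)[0]
        for i in range(count)
    ]
    plausible = (
        all(a < b for a, b in zip(values, values[1:]))
        and all(v <= len(data) for v in values)
    )
    if not plausible:
        return None
    # Assume each block runs up to the next offset (or the end of the file).
    ends = values[1:] + [len(data)]
    return [(off, end - off) for off, end in zip(values, ends)]
```

If the derived sizes line up with structures you have already spotted in the hex editor, the guess is probably right. If not, try another integer width, byte order or starting position.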
I did some very early research on DM Nexus files a long time ago but did not spend much time on it. I did not extract anything and only identified some structures in some files by looking at them in a hex editor.
I have uploaded my notes here:
http://dmweb.free.fr/Stuff/DMNFiles.xlsx
http://dmweb.free.fr/Stuff/DMNNotes.txt
But I don't think this will help you much.
If anyone wants to continue the research though, it would be cool to extract bitmaps, textures and sounds. Extracting level maps (there is an automap feature in the game) would also be nice, and they should not be very hard to understand as they seem to be stored uncompressed.