[NTLK] ROM Disassembly online?

Fri May 29 19:16:32 EDT 2015

I remember the discussion about having the private disassembly file that's generated by Albert and, separately, the public comment file.  I even wrote that utility "ROMCOM" that will merge the public comment file into the disassembly, if both are in a particular format.
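
To make the merge idea concrete, here is a minimal Python sketch of joining a comment file onto a disassembly by address. It is not ROMCOM itself and does not use ROMCOM's real formats, which aren't spelled out here. The two line layouts, the " @ " comment separator, and the names `load_comments` and `merge` are all assumptions for illustration.

```python
import re

# Assumed (hypothetical) formats, NOT ROMCOM's actual ones:
#   disassembly line: "<8 hex digits>: <text>"
#   comment line:     "<8 hex digits> <comment text>"
ADDR_RE = re.compile(r"^([0-9A-Fa-f]{8}):")


def load_comments(lines):
    """Map address -> comment text from the public comment file."""
    comments = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' remarks in the comment file.
        if not line or line.startswith("#"):
            continue
        addr, _, text = line.partition(" ")
        comments[int(addr, 16)] = text.strip()
    return comments


def merge(disassembly_lines, comments):
    """Return the disassembly with matching comments appended after ' @ '."""
    merged = []
    for line in disassembly_lines:
        line = line.rstrip("\n")
        m = ADDR_RE.match(line)
        if m and int(m.group(1), 16) in comments:
            line = f"{line}    @ {comments[int(m.group(1), 16)]}"
        merged.append(line)
    return merged
```

Because the comment file holds only addresses and our own notes, it can be public, while the merged output stays on each person's machine.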

As far as I can tell, there is no actual public comment file in existence yet.  If this is true, and there are no objections, I could simply create a new GitHub repo for the public comment file and invite everyone to it.  (Unless it should be part of the existing Albert repo?)  It would simply be an empty file for now, though.

Also, AFAIK, the current Albert does not generate its disassembly in the format ROMCOM is expecting.  I'm not sure whether this has already been tweaked and just hasn't been committed, or whether it still needs to be done.  I can probably do it if it needs doing, but there's no need to do it twice.  :)

	. . .

So, let's say we resolve those two problems -- not too hard -- what now?

In his README for Albert, Matthias has the following to-do list:

  - Jump Tables
    - generate the correct offsets for the remaining jump tables
    - verify
  - ARM code section
    - understand and label every word in the code (grep "\.word")
    - create symbolic information
    - use Einstein to record byte vs. word access to ROM
    - use class information to analyse byte offsets
  - NewtonScript code section
    - understand and label all data bytes
    - generate symbolic labels for jump instructions in Byte Code
  - Databases
    - not much done here yet: try to understand and disassemble the compressed
      section
    - reassemble and compress whatever we find out
  - ROM Extensions
    - develop strategy

While I understand in general what all these words mean, I don't understand specifically what steps I need to do to help.

For example: "understand and label every word in the code (grep "\.word")"  -- I know there are .word directives in the disassembly that are untyped, unknown data, and I presume he means each one of these needs to be identified as either ARM code, data, NewtonScript code, etc.  Obviously not a one-person job as there are thousands of them in the disassembly.  But even if I figured out a bunch of these... what then do I do with that information?  Does it go in a file somewhere?  Just email it to him?
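
As a starting point for dividing up that work, here is a small Python sketch that finds the raw .word lines and collapses neighbouring ones into address ranges people could claim. It assumes each disassembly line looks like "<hex address>: <instruction>" and that words are 4 bytes apart. Neither that layout nor the function names `unlabeled_words` and `group_runs` come from Albert.

```python
import re

# Assumption: each disassembly line looks like "<hex address>: <instruction>".
WORD_RE = re.compile(r"^([0-9A-Fa-f]+):\s+\.word\b")


def unlabeled_words(lines):
    """Return the addresses of lines that carry a raw .word directive."""
    addrs = []
    for line in lines:
        m = WORD_RE.match(line)
        if m:
            addrs.append(int(m.group(1), 16))
    return addrs


def group_runs(addrs, step=4):
    """Collapse addresses that are `step` bytes apart into (start, end) runs."""
    runs = []
    for a in addrs:
        if runs and a == runs[-1][1] + step:
            runs[-1][1] = a          # extend the current run
        else:
            runs.append([a, a])      # start a new run
    return [tuple(r) for r in runs]
```

Each run would then need a human decision (code, data, NewtonScript, and so on). Where those decisions get recorded is exactly the open question.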

"generate the correct offsets for the remaining jump tables" -- I know what offsets are and I know what jump tables are, but which jump tables, where (at what addresses?), and what process do I need to follow to "generate the correct offsets"?  Armed with this knowledge, I will generate offsets all day, if possible.  :)

You get the idea... I'm just really eager to help with anything I can!

Steven

> On May 29, 2015, at 2:51 PM, Jake Bordens <jake at allaboutjake.com> wrote:
> 
> 
>> But what I still really need is someone to tell me what is the most important thing I ought to be doing right now to move Matthias toward the goal of completing his static analysis of the entire ROM!  
> 
> I'll take a stab and Matthias can correct me. 
> 
> I think his vision is an app that has 2 "databases"-- a "public database" and a "private database."
> 
> The public database would contain all the metadata, comments, and annotation about the ROM as entered and developed by those working on the reverse engineering.  The database can be posted to github and updated by anyone.  Crowdsourced if you will.  Any metadata needed to understand, decode, decompile, or disassemble could be held in the public database.
> 
> The private database would contain the ROM and derived information from the ROM (disassembly and decompilation).  Perhaps it would have decoded assets such as images, fonts, and sound files eventually as well.  This wouldn't be public, since it is under copyright.
> 
> Ideally, with the ROM and the current version of the community "public" database, one would be able to regenerate the private database.   I guess what I'm calling the "private database" is more a cache of decompiled/disassembled code and assets. 
> 
> It doesn't necessarily need to be a "database", it could be text files or something more human readable and diff-able, but I'm just using the word as a concept.
> 
> Anyhow, that's my interpretation.  We want to be able to collaborate on the public stuff, yet have a way to generate the exact same "private database" without sharing it directly between the developers in the community.
> 
> 
> ----------------------------------------------------------------------
> 
> http://newtontalk.net/