[NTLK] Einstein Simulator Status Update
Matthias Melcher
mm at matthiasm.com
Wed Sep 19 15:59:44 EDT 2012
On 19.09.2012, at 19:33, Steven Frank <stevenfrank880 at gmail.com> wrote:
> I looked at how you implemented CArrayIterator last night, and did my
> best to try to understand it. I think the main thing holding me back
> from adding new simulator methods is not knowing where to find the
> appropriate documentation on the Newton side. For example, to
> implement a method in the simulator, you need to know at least:
>
> - The ROM address of the method you're patching (I have a symbol dump,
> but I'm sure most people don't)
To work with all this, you need to download as much information as is available. Apple has released so much code that I sometimes wonder if they actually *wanted* us to rewrite the code ;-)
1: On Unna, download and unpack the DDKs. They all contain original header files. These files are gold, as they give you the class dependencies, the class members, the methods, the signatures of the methods, and the return types of a great many functions in the ROM:
http://www.unna.org/view.php?/apple/development/DDKs
Better yet, download and unpack the entire Newton C++ Toolkit. You will find many more valuable header files in there.
2: Next, get the debugger images. They are in the archives above, but they are also available on Unna. They contain the entire ROM (oops!) *plus* (and that is the absolute mother lode of luck) all debugging symbols that are used in the ROM. To spell that out again: this file contains a list of where (almost) each function in the ROM starts, and for C++ functions, a list of all arguments that this function takes.
http://www.unna.org/view.php?/development/Debugger_Images
3: So how do you get the symbols out of the ROM? Simple. Uncompress the file above. The result is a regular and well documented ARM binary image. Just run the SymbolDumper on it:
http://www.unna.org/view.php?/development/tools/ROMSymbolDumper
4: Aaaah, we just received over 52000 symbols from the ROM file. Thanks, Apple! Many of the symbols have a double underscore in them. Those are C++ symbols which need to be demangled to be readable. Get a program named c++filt and run it on the symbol list - tadaaa: all C++ symbols are now very readable and extremely informative, for example, "AppendToList__14CArrayIteratorFP14CArrayIterator" becomes:
CArrayIterator::AppendToList(CArrayIterator*);
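If you just want to see how such a name is put together, here is a minimal sketch that splits the simple "Method__<length><Class>F<args>" shape from the example above. It is not a replacement for c++filt: the struct and function names are made up for illustration, it only handles this one shape, and it leaves the argument encoding untouched.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper type, not part of any toolkit.
struct SplitSymbol {
    std::string methodName;  // text before the "__" separator
    std::string className;   // length-prefixed class name
    std::string rawArgs;     // still-mangled argument list after 'F'
};

// Splits "Method__<len><Class>F<args>". Returns false for anything else.
bool splitMangled(const std::string& sym, SplitSymbol& out) {
    // Start at index 1 so a leading underscore pair is not taken as separator.
    std::size_t sep = sym.find("__", 1);
    if (sep == std::string::npos) return false;

    // Read the decimal length of the class name.
    std::size_t pos = sep + 2;
    std::size_t digitsStart = pos;
    std::size_t len = 0;
    while (pos < sym.size() && std::isdigit(static_cast<unsigned char>(sym[pos]))) {
        len = len * 10 + static_cast<std::size_t>(sym[pos] - '0');
        ++pos;
    }
    if (pos == digitsStart || len > sym.size() - pos) return false;

    out.methodName = sym.substr(0, sep);
    out.className = sym.substr(pos, len);
    pos += len;

    // 'F' introduces the argument list of a function.
    if (pos >= sym.size() || sym[pos] != 'F') return false;
    out.rawArgs = sym.substr(pos + 1);
    return true;
}
```

Feeding it the symbol above gives the method name "AppendToList", the class name "CArrayIterator", and the raw argument string "P14CArrayIterator", where the leading P marks a pointer.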
When CArrayIterator::AppendToList is called, the ARM register r0 holds a pointer to the object (the "this" pointer), and r1 holds the other CArrayIterator pointer. The return address is in r14, as always, and the stack pointer in r13. All we still need is the return type; the return value itself is always passed back in r0.
So search the header files and what do we find in DDKIncludes/UtilityClass/ArrayIterator.h? Ah, there:
Line 89: CArrayIterator* AppendToList(CArrayIterator* toList);
Now we know the method signature, the return type, and the address in the ROM.
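To make the register convention concrete, here is a small sketch of how a one-argument method call could be mapped from ARM registers onto a native function. Everything in it (ArmRegs, Method1, callNative) is hypothetical and only illustrates the convention described above; it is not how Einstein actually does it. Pointers are kept as 32-bit emulator addresses, not host pointers.

```cpp
#include <cstdint>

// Hypothetical register file: sixteen 32-bit ARM registers.
struct ArmRegs {
    std::uint32_t r[16];
};

// Hypothetical native implementation of a method with one argument.
// Both values are addresses in emulator memory.
typedef std::uint32_t (*Method1)(std::uint32_t self, std::uint32_t arg1);

// Calls fn using the convention from the text and returns to the caller.
void callNative(ArmRegs& regs, Method1 fn) {
    std::uint32_t self = regs.r[0];  // r0: "this" pointer
    std::uint32_t arg1 = regs.r[1];  // r1: first explicit argument
    regs.r[0] = fn(self, arg1);      // the return value goes back in r0
    regs.r[15] = regs.r[14];         // same effect as "mov pc, lr"
}
```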
> - The method signature, including types
> - Return type
> - Offsets for the class's instance variables (I assume the purpose of
> the SIM_GET_SET_W() macros is to map these to RAM in the emulator (?),
> although I don't fully understand it yet.)
Yes, that is correct. In a normal environment, you would simply have class variables. In the Simulator, we still need to go through Einstein to read from and write to emulator memory. That's what the GET_SET macros are for. In a native class, SIM_GET_SET_W(int, Size) would translate to:
class CTest {
private:
    int fSize;
public:
    int GetSize() { return fSize; }
    void SetSize(int v) { fSize = v; }
};
An optimizing compiler throws those inline methods away and simply translates:
x = this->GetSize(); // x = fSize; *will fail*!!!
into
ldr r1, [r0, #0] // load r1 with the memory at address r0 plus offset 0
In Simulator world, a lot more stuff goes on. Ignore that. Always use the Get and Set methods to access these variables and all will be fine (including MMU exception handling...)
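For the curious, here is a rough idea of what accessors of this kind boil down to. This is only a sketch under my own assumptions, not the actual expansion of the SIM_GET_SET macros: FakeMemory and SimTest are invented names, the memory is a plain byte array without any MMU, and words are stored big-endian, which I assume here to match the Newton.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for emulator memory (no MMU, no exceptions).
class FakeMemory {
public:
    explicit FakeMemory(std::size_t bytes) : data_(bytes, 0) {}

    // Reads a 32-bit word stored big-endian at addr (bounds-checked).
    std::uint32_t readWord(std::uint32_t addr) const {
        return (std::uint32_t(data_.at(addr)) << 24) |
               (std::uint32_t(data_.at(addr + 1)) << 16) |
               (std::uint32_t(data_.at(addr + 2)) << 8) |
               std::uint32_t(data_.at(addr + 3));
    }

    // Writes a 32-bit word big-endian at addr (bounds-checked).
    void writeWord(std::uint32_t addr, std::uint32_t value) {
        data_.at(addr) = std::uint8_t(value >> 24);
        data_.at(addr + 1) = std::uint8_t(value >> 16);
        data_.at(addr + 2) = std::uint8_t(value >> 8);
        data_.at(addr + 3) = std::uint8_t(value);
    }

private:
    std::vector<std::uint8_t> data_;
};

// Hypothetical simulated class: it owns no data of its own, only the
// emulator address of the object plus a fixed offset per member.
class SimTest {
public:
    SimTest(FakeMemory& mem, std::uint32_t base) : mem_(mem), base_(base) {}

    std::uint32_t GetSize() const { return mem_.readWord(base_ + kSizeOffset); }
    void SetSize(std::uint32_t v) { mem_.writeWord(base_ + kSizeOffset, v); }

private:
    static constexpr std::uint32_t kSizeOffset = 0;  // first member: offset 0
    FakeMemory& mem_;
    std::uint32_t base_;
};
```

The point is that there is no fSize in the host process at all, so every access has to go through the accessor pair.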
To get the offsets right, the GET_SET methods must be in the order as given by the header file, using the right sizes, and keeping the correct order of inheritance. It's always wise to calculate, mark, and verify the offsets.
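One way to "calculate, mark, and verify" is to write down a flat mirror of the layout and let the compiler check the numbers. The struct below is purely hypothetical (the field names and offsets are not taken from any Newton header); it only shows the technique. Inherited members come first, and pointers are stored as 32-bit values because a host pointer may be 64 bits wide.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical flat mirror of a derived class: the base class members
// are listed first, followed by the members of the derived class.
struct MirrorLayout {
    std::uint32_t fBaseFirst;     // offset 0, from the base class
    std::uint32_t fBaseSecond;    // offset 4, from the base class
    std::uint32_t fDerivedCount;  // offset 8
    std::uint32_t fDerivedLink;   // offset 12, a pointer kept as a 32-bit address
};

// The build fails if a member is added, removed, or reordered by mistake.
static_assert(offsetof(MirrorLayout, fBaseFirst) == 0, "fBaseFirst offset");
static_assert(offsetof(MirrorLayout, fBaseSecond) == 4, "fBaseSecond offset");
static_assert(offsetof(MirrorLayout, fDerivedCount) == 8, "fDerivedCount offset");
static_assert(offsetof(MirrorLayout, fDerivedLink) == 12, "fDerivedLink offset");
static_assert(sizeof(MirrorLayout) == 16, "total size");
```

The offsets you verify this way should then match the immediate values in the disassembly, such as the #24 in a ldr r2, [r1, #24].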
Two caveats: no constructors and destructors yet! I will have to override "new" and "delete" to make that work. Also, no virtual functions yet!
> - What the method is actually supposed to do. :)
Ah, now that is tricky. The legal and perfect way to do it is the Black Box approach: one person "observes" what the function does and explains that to another person. The second person then implements a function that does whatever the first person observed. Easy, huh?
In the real world however, Reverse Engineering by Static Analysis is the common approach. I *explicitly* do *not* recommend this approach. I only mention it for completeness and explain it in detail purely for educational purposes here:
Get a disassembler, for example DisARM. DisARM is actually part of Einstein: open the monitor window and type "stop". The current instruction will be clearly readable. Type "pc=000383B4" to see another location.
Let's say we get something similar to this:
AppendToList__14CArrayIteratorFP14CArrayIterator:
@ 0x000383B4: t_unknown CArrayIterator::AppendToList(...)
@ label = 'AppendToList__14CArrayIteratorFP14CArrayIterator'
@ ARM R0 = type: 'CArrayIterator'*
@ ARM R1 = type: 'CArrayIterator'*
@ name = 'AppendToList'
@ class = 'CArrayIterator'
teq r1, #0 @ 0x000383B4 0xE3310000 - .1..
moveq pc, lr @ 0x000383B8 0x01A0F00E - ....
ldr r2, [r1, #24] @ 0x000383BC 0xE5912018 - ....
(...)
str r0, [r2, #20]! @ 0x000383CC 0xE5A20014 - ....
str r0, [r1, #24]! @ 0x000383D0 0xE5A10018 - ....
mov pc, lr @ 0x000383D4 0xE1A0F00E - ....
Some obvious stuff is here in plain sight. The C++ compiler that Apple used was set to do almost no optimization. Again, did the engineers *want* us to read this?
mov pc, lr returns to the caller. This is simply the end of the method. The return value is whatever is in r0 at that point, in this case return this;
ldr r2, [r1, #24] reads the RAM at r1 (the first argument: a CArrayIterator*) plus 24. Looking into the header file, this translates directly into r2 = arg1->GetNextLink();
str r0, [r1, #24] translates into arg1->SetNextLink(r0);
Follow the code, untangle the branches, recreate the if's and for's, then replace r0 to r12 with meaningful names and types. When done, add the stub that injects some code which makes the JIT compiler switch into the simulator, for example:
T_SIM_INJECTION(0x000383B4, "CArrayIterator::AppendToList(list)") {
    SIM_RETVAL SIM_CLASS(CArrayIterator)->AppendToList(SIM_ARG1(CArrayIterator*));
    SIM_RETURN;
}
Now your function is linked into the ROM. Run the emulator and watch its behavior closely. Does it still run as expected? When setting a breakpoint in your function, will it actually be called? Do your observations match the observations of your partner in the Black Box approach? Yes? Great!
Just one remark on copyright: I see this work as a mix of research and self defense. It's research because none of this is used commercially. Findings are based on tools that do not decrypt, circumvent, or break any copy protection. Every tool and piece of information has been made public by Apple, ARM, or one of the other respective creators. It's self defense, simply because our hardware is giving up, the software is buggy (y10k, need I say more), and there is no support at all from Apple anymore. My only way to continue to use the product I purchased and have access to my data is to somehow transfer the software to another machine. And one last indicator that Apple has lost interest and made all this abandonware is the fact that Apple has let all trademarks on MessagePad and Newton run out without renewal.
> The other thing I noticed is the OS X app only builds for 32-bit. It
> might be tricky to make it 64-bit clean, but this is something I could
> look into also.
Yes, I have not invested any thought into that. 64-bit is only useful if a program plans to access more than 2GB of data and would run out of address space. As long as 32-bit software still runs on the respective operating systems, I see no need. But maybe Apple has announced that 32-bit software will soon stop working?
- Matthias