Reverse Engineering MacWars

One may have seen my prior post on Cracking a 35 Year-Old Macintosh Game, wherein I describe cracking Star Trek: The Kobayashi Alternative, only to “find out it was absolute garbage” as one friend and colleague put it. Still, it was a fun little challenge, and it whetted my appetite for the next challenge.

So I decided to look into the other abandonware for 68k Macs that remained uncracked, and hopefully required a bit more reverse engineering than just narrowing down the section of code to replace with no-operation instructions.

Soon I ran across what may have been considered the holy grail of early Macintosh copy-protected software: MacWars. If you haven’t heard of Macwars, this Youtube video by gruz gives a good overview. This was a clone of the popular Star Wars arcade game, which would not have its own Macintosh port for a few more years. It featured vector graphics, a 3d environment, digitized voice audio, … and a very sophisticated anti-piracy code that encrypted the entire game, literally decrypting chunks of the application as needed right before execution.

MacWar’s title screen. The icons were actually stored as custom font characters in the system file.

The encryption and anti-piracy routines were likely done by PACE Anti-Piracy, Inc, which was formed in 1985, the same year MacWars came out. It is still active, known for its Interlok system from the dongle-based anti-piracy era and today’s Internet-based iLok license system.

Cracking MacWars would certainly be a proper exercise in reverse engineering, and as it appears theses early PACE-protected titles had never been cracked before, would make for some interesting security research.

A quick aside for those wanting a quick primer of reverse engineering through disassembly: its reading the assembly language directly from the machine code, usually without any hints like variables or function names, and trying to figure out what the code does. Half the battle is just figuring out if you’re even looking in the correct section of code. Sometimes the variables and function names are sometimes included in so-called ‘debugging symbols’ and can help with that immensely, but these are usually not included in production code as they take up time and memory. However, sometimes those symbols are left in, as was famously (and likely accidentally) done in the Grand Theft Auto 3 for PS2 and Android…but not so for MacWars. Anyhow, for a simple example, try to figure out what the following simplified assembly code does…and note that disassembled machine code doesn’t have the commented hints:

0: movq.b 0, d0 #set register d0 to 0
16: movq.b $20,d1 #set register d1 to 32
32: cmp.b d0,d1 #compare registers d0 and d1
48: bgt pc+$40 #if d0>d1, then skip 64 bytes ahead
64: _sysbeep #call the system trap to make a beep
80: addq.b 1,d0 #add 1 to register d0
96: jmp pc-$40 #jump back 64 bytes
112:

Figured it out? It basically does the same as this C-like code:

i=0;
while (i≤32) {
beep();
i = i + 1;
}

Now, imagine doing this for a couple thousands of lines of code, and this is what I took on for reverse engineering MacWars. For those researchers wanting to follow along at home, I’ve published my code on github at https://github.com/barberd/depace/. The line numbers I refer to later in this article refer to specifically https://github.com/barberd/depace/blob/main/decrypt/macwars-D-decrypt.py. Researchers may want to note that the hexadecimal addresses in the comments of that file are the corresponding location in the original file’s resource fork. Also note I don’t provide a copy of the MacWars software; to use my scripts, one will need to obtain it and then extract a copy of the application files into MacBinary format.

I didn’t start out knowing the application was encrypted. Stepping through the assembly, I first noticed the code that made sense was only at the beginning, and the rest was gibberish — just completely (seemingly) random data. This is unusual; programmers don’t tend to waste space, especially in the mid-80s when memory and disk space was limited. Even today such waste would be considered inelegant. It was a safe bet that this bad code might be encrypted good code — after all, something actually had to make up the game.

Focusing on the readable code, I noticed it grabbed a pointer to the beginning of the seemingly-random data. Further along, I spotted an EOR instruction. EOR (“exclusive or,” also termed XOR in other CPUs and languages), is one of the most basic digital instructions, basically saying: given two inputs, output true if only one of the inputs is true. The nice thing about this is that by EORing data with a key, the output is scrambled, but by EORing it again with the same key, one then gets the original data back. This makes it a symmetric operation, and its very commonly found in encryption algorithms. As such, I had a big hint this subroutine would decrypt the data.

But I still had a big problem: even if this did decrypt the later code block, the program didn’t actually seem to do anything with it. It just set up some registers like it was about to call another subroutine, but then just returned. Huh?

Just fly around randomly; eventually one will find the enemy space station… or decrypted code routine, whatever.

It took me a few minutes, but I finally figured it out: at the very beginning of the code, almost the very first thing it does, a pointer to the random data (decrypted later) is pushed onto the stack. And tracing further down, I could not find a corresponding pop from the stack.

This had to be on purpose; no compiler would output code that corrupted the stack so directly. But this was actually a clever hack: on the 68k processor, a JSR (Jump to SubRoutine) instruction pushes the current execution location onto the stack before executing the subroutine, and the RTS instruction (ReTurn from Subroutine), located at the end of every subroutine, pops the top 4 bytes off the stack and continues execution there, redirecting the computer back to the same place it had started from before the jump. But the PACE code pushes a new address onto the stack at the beginning of the subroutine without a corresponding removal from the stack at the end. Instead of returning to the true caller, the RTS instruction ‘returns’ to the wrong place, and starts executing code at the now-decrypted code block. Neat, with a minor side of stack corruption (wasting 4 bytes each time).

So, now that I’d figured out the general flow, the first order of business was to reverse engineer the decryption code, which ended up being two challenges: determine both the encryption key and encryption algorithm used. Both were tedious but straightforward, and I was able to recreate both using python.

For the key, the program read all the bytes of code making up the entire code segment, adding the value of each to a ‘seed’ number into a 32-bit register, ignoring any overflow. This became the key for later decryption. This is an artful method, as modifying the code at all, perhaps to add inspection breakpoints, would result in an incorrect key and stop most would-be crackers. This is the ‘getkey1’ call on line 139 of my code. Note that the initial seed number translates to “PACE” in ascii, the first hint that PACE Anti-piracy Inc provided this code.

For the decryption algorithm, its basically a key stream algorithm where the key was used to EOR a chunk of code, and then the key was rotated in some way for the next round— somewhat akin (though much simpler) to the way the RC4 encryption algorithm works. It was slightly tedious to get all the bitwise operations correct in python (python3 basically allows for infinite bits in its bitwise functions, so one has to code in 32 bit boundaries to simulate the 68k processor’s behavior), but quite doable, so a few dozen lines of code later (initiated by the write_decrypt1 call on line 141 of my code), I had another big section of Macwar’s code in readable form.

Then another layer of PACE’s sophistication was revealed: even after decryption, a good amount of the code was still encrypted, and this round used different algorithms both for key generation and decryption…and both were much more sophisticated. I reversed both of these as well, becoming the getkey2() and write_decrypt2() functions in my python code.

Prepare to enter the tunnels…of encryption!

In total, it turns out there were 4 layers of encryption. Each round undertook several steps to prevent tampering: the system’s error jump tables were modified and then later examined to eliminate the ability to redirect execution outside of the program (and therefore give crackers an avenue), and it would even check to see if more than 3 seconds had passed since the last decryption round, and would corrupt the key in memory by EORing it with “POOP” if so (I guess programmers’ humor hasn’t changed much). If anything seemed out of place, the system would reboot. Additionally, each layer wiped the prior layers from memory when no longer needed, so anyone somehow bypassing all the lockouts still wouldn’t have access to the data if done a millisecond too late.

The third round was where the bad-block disk check took place, and would also check certain things like the timestamps of the application file and the disk volume, which it passed to the fourth round for later use. This underlines why its so important to get a good disk image when doing research; sometimes archivists accidentally modify the volume in such a way that loses important data, such as the ‘last modified’ timestamp.

The fourth and final round was finally where the ‘real’ application decryption took place, which involved two activities. The first was decrypt the program’s jump table found in the first code segment, and the second was to set up the OS to decrypt the rest of the application’s code segments. And both implementations were glorious, from a computer science perspective.

For full appreciation of the beauty of how PACE designed their solution, one needs to understand how the original Macintosh did memory management. The original 1984 Macintosh had only 128 kilobytes of memory, so entire programs often couldn’t be loaded into memory — and instead only sections (called segments) would be loaded, one at a time. Programmers would compile their code into these different segments (some development suites would break up the code automatically), use a linker to assemble them with control passing through a ‘jump table,’ and store them as different entries within the application’s file. There could be up to 256 of these “CODE” segments, allowing for large programs to operate in only 128k of memory, largely seamlessly to the user. The jump table, combined with the Macintosh’s memory management ROM calls, acted like a traffic copy to pull in the right code segments when needed. And PACE developed their system to integrate with all this.

When a user starts an application in the Macintosh, the operating system (really, the ROM calls) first loads and executes the first code segment, which on most programs contains the jump table. As such, a processor entering a new application normally involves executing the very first jump in the table. In the case of MacWars, PACE encrypted the original jump table, replaced the first 8 bytes with a jump to its own code, and saved the original (but encrypted) 8 bytes into its own code segment (CODE4). The fourth round of encryption copied these bytes back, and then decrypted the entire jump table, allowing the application’s normal behavior. This is replicated in line 164 and 166 of my script.

The enemy space station. Maybe the final encryption key is somewhere inside?

Next, the key was incremented by a certain amount to derive the final key used for decrypting the rest of the application. The exact number was determined in round three, but is calculated in some very complex code that I couldn’t get my head around after 2–3 days of tracing assembly. While I’m certain I could grok it with a little more effort, it turned out to not be necessary. I figured out that there is a fatal flaw in PACE’s decryption algorithm: it only really uses the last 8 bits of the 32-bit key, as the algorithm only operates one byte at a time. This means there are only 256 possible keys, not 4.3 billion. Additionally, the first two bytes of a code segment is the offset of that code segment’s first entry in the jump table. Since the jump table was less than 256 bytes long, the offset would always entirely fit into just the second byte, meaning I was certain the first byte must decrypt to 0. Once I’d deduced these facts, it became trivial to bruteforce the correct key (as one can see in lines 177–187 of my script).

The fourth round then took this derived key and tied itself to the Macintosh’s memory ROM routines. PACE’s code replaced the CPU’s custom trap handler for “_LoadSegment” with its own routine that would first call the original “_LoadSegment” routine, then decrypt the code just loaded from disk, and then return to the application. This allowed for decryption on-the-fly, and must have required a very deep understanding of the Macintosh…and some Linked-In sleuthing suggests the founder of PACE had indeed worked at Apple.

Fully appreciating the elegance of the solution, but armed with the final key and the reverse-engineered algorithm, I was able to decrypt the entire application (lines 193–196 in my script). The original application code was revealed, and the PACE code had been bypassed.

Yay!

The application was decrypted, revealing the ‘core’ of the anti-piracy routines.

The challenge wasn’t completely over: the application is split into four different executable files, and I needed to repeat the whole exercise for one of the other files. I also had to bypass some additional anti-piracy code (patches found in https://github.com/barberd/depace/tree/main/patch) within the application segments, but these were pretty straightforward to locate and bypass. Now, 35 years after release, the game is now playable on emulators, and one can write new floppies to replace the aged original ones. And unlike my last article, I can say this game was actually a lot of fun, and is very exciting to play even today.

In the few weeks since, I’ve discovered a few other games that were protected by PACE. I’m assuming PACE must have had different package offerings available to client software publishing houses, as several games (Sword of Kadash, Shanghai (version 1), and Star Trek: The Kobayashi Alternative) did have the same self-decrypting anti-piracy routines, but did not encrypt the rest of the application. These used the same encryption algorithm as rounds 2–4 in MacWars, but the keys were hard-coded instead of being generated dynamically. The only other game I’ve found (so far) that encrypted the entire application similar to MacWars is Seven Cities of Gold. This doesn’t mean others won’t turn up though!

PACE should be very proud; their anti-piracy system lasted 35+ years, and was full of methods that would still be considered best-practice today, and were absolutely ground-breaking back then. It was an amazing piece of work, and one gets a sense of how exciting it must have been to help develop it, even 35 years disconnected. But…maybe next time use the full 32 bits of the key, not just the last 8? I’m betting someone got a little rushed and didn’t want to figure out how to handle cases where the code size didn’t line up to an exact multiple of 32 bits.

Enemy space station go boom. The galaxy is now safe for vintage gamers.

I’m proud to have solved this challenge, and must give kudos to PACE for developing such a strong system that lasted 35+ years. I hope by sharing my code and these techniques, other researchers can continue to work on other applications and games no longer available to vintage computing enthusiasts.

Geek from the Oregon Trail generation