Writing a Debugger

For the upcoming version of Elements we completed the port from LLDB to our own native debugger. We already had debug engines for .NET, Java, WebAssembly, Windows and Linux, but for iOS and macOS we used Apple's open source LLDB. LLDB had the obvious advantage of supporting all the platforms and of having built-in Objective-C support. While LLDB has served us well over the years, we kept running into limitations because our languages are not officially covered by it, and some of their semantics simply have no C equivalent. We started this new engine for our native code backend (Island) a few years ago, but with the recent addition of iOS and arm64 we could finally finish our macOS and iOS targets, so the new debug engine is now used for Island and Toffee projects. This article explores how we wrote it on a more technical level.

How we started

While we had experience with compilers and managed debuggers (for .NET, Mono and Java), we had never written a native debugger before we started the Island platform. One thing that was important to us from the start was being able to share as much code as possible between the target platforms and architectures. We started by exploring what was available on Windows and Linux.

Windows

On Windows we used the native debugging APIs, as Windows doesn't have any built-in remote debugger:

API	Purpose
CreateProcess	Starts a new process; has flags to start the process in debugging mode (Imports.DEBUG_PROCESS or Imports.DEBUG_ONLY_THIS_PROCESS)
DebugActiveProcess	Attach to another process
DebugActiveProcessStop	Detach from a process
OpenProcess	Gets a handle to a running process (used for the read/write memory APIs)
CloseHandle	Closes a previously opened handle (from CreateProcess and OpenProcess)
WaitForDebugEvent	Inside the debug loop, wait for debugging events to occur and return the next one
ContinueDebugEvent	Resumes the process after last debug event
TerminateProcess	Kills a process with a given exit code
ReadProcessMemory/WriteProcessMemory	Reads and writes process memory
VirtualAllocEx/VirtualFreeEx/VirtualProtectEx	Allocates, frees and changes the access rights of virtual memory in a foreign process; we use this for on-the-fly jitting to run code in another process, for example during evaluation
DebugBreakProcess	Forces a break in the target process
(Wow64)GetThreadContext/SetThreadContext	Reads and writes the registers of a thread in a foreign process; the Wow64 versions are used when accessing a 32-bit process from a 64-bit process

The APIs support the concepts of breaking, continuing, instruction stepping, module loading, exceptions and threads. Other than that, they also support reading and writing memory and registers. Generally, all debug-related APIs should be called from a single thread, the one that also runs the debug loop. Any pause in the process should be done by pausing the debug loop for the duration of the pause.

Note that you cannot debug 64-bit processes from a 32-bit process, and when debugging a 32-bit process from a 64-bit one the WoW64 APIs must be used. Also, the exception codes differ for things like the break exception (e.g. STATUS_BREAKPOINT vs STATUS_WOW64_BREAKPOINT). The debug loop generally looks something like:

  while (running) {
     DEBUG_EVENT data;
     // timeoutMs: caller-chosen wait time in milliseconds
     if (WaitForDebugEvent(&data, timeoutMs)) {
       switch (data.dwDebugEventCode) {
         case CREATE_PROCESS_DEBUG_EVENT:
           // process was started
           break;
         case EXIT_PROCESS_DEBUG_EVENT:
           // process finished running; leave the debug loop
           running = false;
           break;
         // Other events:
         // CREATE_THREAD_DEBUG_EVENT a new thread was created
         // EXIT_THREAD_DEBUG_EVENT a thread finished
         // OUTPUT_DEBUG_STRING_EVENT debug line info
         // LOAD_DLL_DEBUG_EVENT loading a dll
         // UNLOAD_DLL_DEBUG_EVENT a dll was unloaded
         case EXCEPTION_DEBUG_EVENT:
           // the exception record lives in the event's union member
           switch (data.u.Exception.ExceptionRecord.ExceptionCode) {
             // STATUS_WOW64_BREAKPOINT/STATUS_BREAKPOINT: Int3/breakpoint
             // STATUS_WOW64_SINGLE_STEP/STATUS_SINGLE_STEP: Single step instruction breakpoint
             // Other codes are regular exceptions, like user ones, access violations and others.
           }
           break;
       }
       // Exceptions the debugger does not handle itself are normally continued
       // with DBG_EXCEPTION_NOT_HANDLED instead of DBG_CONTINUE.
       ContinueDebugEvent(data.dwProcessId, data.dwThreadId, DBG_CONTINUE);
     }
  }
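
The exception branch of such a loop mostly comes down to sorting exception codes into "ours" and "theirs". The following is a minimal sketch of that decision, written in Python only so that it runs anywhere. The numeric values are the ones defined in the Windows SDK headers; the helper names classify_exception and continue_status are hypothetical, and the sketch assumes every breakpoint and single-step event was caused by the debugger itself.

```python
# NTSTATUS codes that arrive in an EXCEPTION_DEBUG_EVENT (Windows SDK values).
STATUS_BREAKPOINT = 0x80000003
STATUS_SINGLE_STEP = 0x80000004
STATUS_WOW64_BREAKPOINT = 0x4000001F
STATUS_WOW64_SINGLE_STEP = 0x4000001E

# Values for the last argument of ContinueDebugEvent.
DBG_CONTINUE = 0x00010002
DBG_EXCEPTION_NOT_HANDLED = 0x80010001


def classify_exception(code: int) -> str:
    """Hypothetical helper: map an exception code to the kind of debug event."""
    if code in (STATUS_BREAKPOINT, STATUS_WOW64_BREAKPOINT):
        return "breakpoint"
    if code in (STATUS_SINGLE_STEP, STATUS_WOW64_SINGLE_STEP):
        return "single_step"
    return "exception"


def continue_status(code: int) -> int:
    """Hypothetical helper: choose the continue status for an exception event.

    Assumption: breakpoints and single steps belong to the debugger and are
    swallowed; everything else is passed back to the process's own handlers.
    """
    if classify_exception(code) == "exception":
        return DBG_EXCEPTION_NOT_HANDLED
    return DBG_CONTINUE
```

With this split, an access violation (0xC0000005) is handed back to the process, while a breakpoint the debugger planted is consumed silently.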

Posix: gdbserver & debugserver

On Linux we use gdbserver, and on macOS and iOS we use LLDB's debugserver. We build on gdbserver and debugserver because they are generally available or easily installed, which saved us a lot of implementation time. Conceptually, however, they are very thin wrappers over Linux's ptrace and macOS's equivalent infrastructure. Both speak the same protocol, which is mostly identical to the one used by serial JTAG debug devices. It's a text protocol that, again, has the concepts of signals, stopping, resuming, threads, and reading and writing memory and registers. The gdb remote protocol documentation describes it in detail, but a packet generally looks like $<packetdata>#<checksum>, where the checksum is the sum of the ASCII bytes of the packet data, modulo 256, written as two hex digits.
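
To make the packet framing concrete, here is a minimal sketch in Python. The function names are hypothetical, and the sketch deliberately ignores the protocol's escaping of special characters and its run-length encoding, so it only handles plain ASCII payloads.

```python
def checksum(payload: str) -> int:
    """Sum of the payload's ASCII bytes, modulo 256."""
    return sum(payload.encode("ascii")) % 256


def frame_packet(payload: str) -> str:
    """Wrap a payload as $<payload>#<two hex digits>."""
    return f"${payload}#{checksum(payload):02x}"


def unframe_packet(packet: str) -> str:
    """Check the framing and checksum of a packet and return its payload."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    if checksum(payload) != int(packet[-2:], 16):
        raise ValueError("checksum mismatch")
    return payload


print(frame_packet("OK"))       # $OK#9a
print(frame_packet("vCont;c"))  # $vCont;c#a8
```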

Announce which features we support and ask which ones are supported on the other end.
->: 'qSupported:multiprocess+;swbreak+;hwbreak+;qRelocInsn+;vContSupported+;QProgramSignals+;'
'QThreadEvents+;QStartNoAckMode+;no-resumed+;xmlRegisters=i386'
<-: 'qXfer:features:read+;PacketSize=20000;qEcho+'

Disable the requirement for explicit acks. Acking is needed for unreliable serial 
links but serves no real purpose over a reliable connection.
->: 'QStartNoAckMode'
<-: 'OK'

Ask what cpu the other end runs.
->: 'qHostInfo'
<-: 'cputype:16777228;cpusubtype:2;addressing_bits:47;ostype:macosx;'
'watchpoint_exceptions_received:before;vendor:apple;os_version:11.2.1;'
'maccatalyst_version:14.4;endian:little;ptrsize:8;'

Enable thread suffixes for debug commands; this will include the current thread in callbacks.
->: 'QThreadSuffixSupported'
<-: 'OK'

Request the first part of the thread list.
->: 'qfThreadInfo'
<-: 'm41e8dc'

Request the next part (empty)
->: 'qsThreadInfo'
<-: 'l'

Ask why the target is currently stopped. This returns signal 17 (0x11). The fields 
that follow are the current register values. In this particular case it's a 
breakpoint at the entry of the process. 
->: '?'
<-: 'T11thread:41e8dc;00:0000000000000000;01:0000000000000000;02:0000000000000000;'
'03:0000000000000000;04:0000000000000000;05:0000000000000000;06:0000000000000000;'
'07:0000000000000000;08:0000000000000000;09:0000000000000000;0a:0000000000000000;'
'0b:0000000000000000;0c:0000000000000000;0d:0000000000000000;0e:0000000000000000;'
'0f:0000000000000000;10:0000000000000000;11:0000000000000000;12:0000000000000000;'
'13:0000000000000000;14:0000000000000000;15:0000000000000000;16:0000000000000000;'
'17:0000000000000000;18:0000000000000000;19:0000000000000000;1a:0000000000000000;'
'1b:0000000000000000;1c:0000000000000000;1d:0000000000000000;1e:0000000000000000;'
'1f:c03af46e01000000;20:00907e0101000000;21:00000000;metype:5;mecount:2;'
'medata:10003;medata:11;'

Write memory:
->: 'M100ec5f04,4:e0031faa'
<-: 'OK'

Read memory:
->: 'm16ef436f0,8'
<-: '1037f46e01000000'

Resume debugging
->: 'vCont;c'

The above will not return until the next event, whose reply looks a lot like the 
T11 line above, depending on the signal.
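
As an illustration of how such a stop reply can be taken apart, here is a minimal Python sketch. The function name and the shape of the returned dictionary are hypothetical. It assumes a little-endian target (as reported by qHostInfo above), treats every key made only of hex digits as a register number, and collects repeated keys such as medata into lists.

```python
import string


def parse_stop_reply(reply: str) -> dict:
    """Split a 'T' stop reply into signal, thread, registers and other fields."""
    if not reply.startswith("T"):
        raise ValueError("not a T stop reply")
    result = {"signal": int(reply[1:3], 16), "thread": None,
              "registers": {}, "extra": {}}
    for field in reply[3:].split(";"):
        if not field:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = field.partition(":")
        if key and all(c in string.hexdigits for c in key):
            # Register number in hex; the value is raw target bytes,
            # so it has to be byte-swapped on a little-endian target.
            number = int(key, 16)
            result["registers"][number] = int.from_bytes(
                bytes.fromhex(value), "little")
        elif key == "thread":
            result["thread"] = int(value, 16)
        else:
            result["extra"].setdefault(key, []).append(value)
    return result


reply = "T11thread:41e8dc;1f:c03af46e01000000;metype:5;medata:10003;medata:11;"
info = parse_stop_reply(reply)
print(info["signal"], hex(info["thread"]), hex(info["registers"][0x1F]))
# 17 0x41e8dc 0x16ef43ac0
```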

The macOS version supports reading module information, but it does not have events for detecting delayed loads. To work around that, there is a hardcoded symbol inside dyld where the debugger can set an internal breakpoint.

The Linux version has an accessible list of pointers, one of which gives an equivalent breakpoint location and another a list of currently loaded modules.

So on macOS we set a breakpoint on the __dyld_debugger_notification function symbol, and use the jGetLoadedDynamicLibrariesInfos request to retrieve the list of modules. On Linux we use AUXV data records. We set a breakpoint at the address given by AUXV_AT_ENTRY, like on macOS. AUXV_AT_PHDR points to the program header table, which leads to, among other info, the list of loaded modules, while AUXV_AT_PHNUM contains the number of entries in that table.
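
The auxiliary vector itself is a flat array of (type, value) pairs that ends with a type of 0. A minimal Python sketch of reading it follows. It assumes the raw bytes have already been fetched (for example with a qXfer:auxv:read request or from /proc/<pid>/auxv), that the target is little-endian, and that the buffer length is a multiple of the pair size. The function name is hypothetical; the AT_* numbers are the standard ELF values.

```python
import struct

# Standard ELF auxiliary vector types.
AT_NULL, AT_PHDR, AT_PHNUM, AT_ENTRY = 0, 3, 5, 9


def parse_auxv(data: bytes, pointer_size: int = 8) -> dict:
    """Decode raw auxv bytes into a {type: value} dictionary."""
    fmt = "<QQ" if pointer_size == 8 else "<II"
    entries = {}
    for tag, value in struct.iter_unpack(fmt, data):
        if tag == AT_NULL:
            break  # terminator entry
        entries[tag] = value
    return entries


# Example with made-up addresses, only to show the layout.
raw = struct.pack("<8Q", AT_PHDR, 0x400040, AT_PHNUM, 9,
                  AT_ENTRY, 0x401000, AT_NULL, 0)
auxv = parse_auxv(raw)
print(hex(auxv[AT_ENTRY]))  # 0x401000
```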

Breakpoints

The most low-level and important concept in a debugger is the breakpoint. Breakpoints are used both by low-level constructs (such as exceptions) and for user-set breakpoints. While some platforms support a limited set of hardware breakpoints, as far as I could tell using them wasn't very common. Instead you create a software breakpoint by replacing code with an instruction like int3 (on x86/x86_64), saving the original code that was overwritten. When the breakpoint hits, the debugger restores the code, rewinds the instruction pointer, runs a single instruction and, if the breakpoint needs to stay set, writes the breakpoint instruction out again.
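
The patch, restore, step and re-patch cycle can be sketched against a simulated memory buffer. In the following Python sketch the bytearray stands in for target memory (a real debugger would go through WriteProcessMemory or the M packet), and the class name and the single_step callback are hypothetical.

```python
INT3 = 0xCC  # one-byte breakpoint instruction on x86/x86_64


class BreakpointTable:
    """Hypothetical bookkeeping for software breakpoints."""

    def __init__(self, memory: bytearray):
        self.memory = memory  # stand-in for the target's memory
        self.saved = {}       # address -> original byte

    def insert(self, address: int) -> None:
        if address not in self.saved:
            self.saved[address] = self.memory[address]
            self.memory[address] = INT3

    def remove(self, address: int) -> None:
        self.memory[address] = self.saved.pop(address)

    def on_hit(self, ip_after_trap: int, single_step, keep: bool = True) -> int:
        # int3 has already executed, so the reported ip is one byte past it.
        address = ip_after_trap - 1
        self.memory[address] = self.saved[address]  # restore original code
        single_step(address)                        # run one real instruction
        if keep:
            self.memory[address] = INT3             # arm the breakpoint again
        else:
            del self.saved[address]
        return address


code = bytearray([0x55, 0x48, 0x89, 0xE5, 0xC3])  # push rbp; mov rbp, rsp; ret
table = BreakpointTable(code)
table.insert(1)
table.on_hit(2, single_step=lambda ip: None)
```

Keeping the original bytes in a dictionary keyed by address also makes it easy to hide the patched bytes again whenever memory is shown to the user.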

Signals

Signals occur for lots of things, but for debuggers the important one is the one raised for breakpoints. The actual signal code depends on the platform. On Windows the closest comparable thing is an exception; single-step and breakpoint events are exception types there. The debugger gets the thread, the current registers and the current instruction pointer as part of the event.

Call stacks

All debuggers have to be able to walk the stack, starting from the current instruction. While a lot of debug formats carry info on how to do this, debuggers should have a fallback for when no debug info is available. Walking the stack is an instruction-set-specific operation (and gets trickier when it comes to optimized code), but on i386 and x86_64 it can be done by using the frame pointer to find both the caller's saved frame pointer and the return address, which is the caller's instruction pointer. This works because the general function prologue looks like:

push ebp
mov ebp, esp

// or 
push rbp
mov rbp, rsp

Generally this comes down to:

[EBP/RBP + PointerSize] containing the return address
[EBP/RBP] containing the previous frame pointer

Other platforms like arm/arm64 have the same concept, just with different registers.
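
A frame-pointer walk built on these two memory reads can be sketched as follows. This Python sketch assumes a little-endian target and code compiled with frame pointers; the function name and the read_memory callback, which stands in for whatever memory-read primitive the platform offers, are hypothetical.

```python
import struct


def walk_frames(read_memory, frame_pointer: int,
                pointer_size: int = 8, max_frames: int = 64) -> list:
    """Follow the chain of saved frame pointers and collect return addresses."""
    fmt = "<Q" if pointer_size == 8 else "<I"
    return_addresses = []
    while frame_pointer and len(return_addresses) < max_frames:
        # [FP] holds the caller's frame pointer, [FP + PointerSize] the return address.
        saved_fp = struct.unpack(fmt, read_memory(frame_pointer, pointer_size))[0]
        return_address = struct.unpack(
            fmt, read_memory(frame_pointer + pointer_size, pointer_size))[0]
        if return_address == 0:
            break
        return_addresses.append(return_address)
        if saved_fp <= frame_pointer:
            break  # the stack grows down, so a caller's frame must be higher
        frame_pointer = saved_fp
    return return_addresses


# Fake stack with three frames, only to show the layout.
stack = bytearray(0x80)
struct.pack_into("<QQ", stack, 0x10, 0x30, 0x1000)
struct.pack_into("<QQ", stack, 0x30, 0x50, 0x2000)
struct.pack_into("<QQ", stack, 0x50, 0, 0x3000)
frames = walk_frames(lambda addr, size: bytes(stack[addr:addr + size]), 0x10)
print([hex(a) for a in frames])  # ['0x1000', '0x2000', '0x3000']
```

The check that each saved frame pointer is higher than the current one, together with the frame limit, guards against looping forever on a corrupted stack.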

Conclusion

This concludes the first article on our new debugger.

You can see the Island debugger in action when debugging Cocoa, Windows, Linux or WebAssembly projects with Elements, in Fire, Water or Visual Studio.

In a future article, I will dive into more detail about the different debug info formats.