For the upcoming version of Elements we completed the port from LLDB to our own native debugger. We already had debug engines for .NET, Java, WebAssembly, Windows and Linux, but for iOS and macOS we used Apple's open source LLDB. LLDB had the obvious advantages of supporting all the platforms and having built-in Objective-C support. While LLDB has served us well over the years, we kept running into limitations because our languages are not officially covered by it, and some of their semantics simply have no C equivalent. We started this new engine for our native code backend (Island) a few years ago, and with the recent addition of iOS and arm64 support we could finally finish our macOS and iOS targets, so the new debug engine is now used for Island and Toffee projects. This article explores how we wrote it on a more technical level.
How we started
While we had experience with compilers and managed debuggers (for .NET, Mono and Java), we had never written a native debugger before we started the Island platform. One thing that was important to us, however, was being able to share as much code as possible between the target platforms and architectures. We started by exploring what is available on Windows and Linux.
Windows
On Windows we used the native debugging APIs, as Windows doesn't have any built-in remote debugger:
Api | Purpose |
---|---|
CreateProcess | Starts a new process; has flags to start the process in debugging mode (Imports.DEBUG_PROCESS or Imports.DEBUG_ONLY_THIS_PROCESS) |
DebugActiveProcess | Attach to another process |
DebugActiveProcessStop | Detach from a process |
OpenProcess | Gets a handle to a running process (used for the read/write memory APIs) |
CloseHandle | Closes a previously opened handle (CreateProcess and OpenProcess) |
WaitForDebugEvent | Inside the debug loop, wait for debugging events to occur and return the next one |
ContinueDebugEvent | Resumes the process after last debug event |
TerminateProcess | Kills a process with a given exit code |
ReadProcessMemory/WriteProcessMemory | Reads and writes process memory |
VirtualAllocEx/VirtualFreeEx/VirtualProtectEx | Allocates, frees and changes the access rights of virtual memory in a foreign process; we use this for on-the-fly jitting to run code in another process, for example during evaluation |
DebugBreakProcess | Forces a break in the target process |
(Wow64)GetThreadContext/SetThreadContext | Reads and writes the registers of a thread in a foreign process; the WoW64 versions are used when a 64-bit debugger accesses a 32-bit process |
The APIs support the concepts of breaking, continuing, instruction stepping, module loading, exceptions and threads. Other than that, they also support reading and writing memory and registers. Generally, all debug-related APIs should be called from a single thread, the one that also runs the debug loop. Any pause in the process should be implemented by pausing the debug loop for the duration of the pause.
Note that you cannot debug 64-bit processes from a 32-bit process, and when debugging a 32-bit process from a 64-bit one the WoW64 APIs must be used. Also, the exception codes differ for things like the break exception (e.g. STATUS_BREAKPOINT vs STATUS_WOW64_BREAKPOINT). The debug loop generally looks something like:
while (running) {
    DEBUG_EVENT data;
    // timeoutMs is a placeholder: how long to wait (in milliseconds)
    // before checking 'running' again.
    if (WaitForDebugEvent(&data, timeoutMs)) {
        switch (data.dwDebugEventCode) {
        case CREATE_PROCESS_DEBUG_EVENT:
            // process was started
            break;
        case EXIT_PROCESS_DEBUG_EVENT:
            // process finished running
            break;
        // Other events:
        // CREATE_THREAD_DEBUG_EVENT  a new thread was created
        // EXIT_THREAD_DEBUG_EVENT    a thread finished
        // OUTPUT_DEBUG_STRING_EVENT  debug line info
        // LOAD_DLL_DEBUG_EVENT       loading a dll
        // UNLOAD_DLL_DEBUG_EVENT     a dll was unloaded
        case EXCEPTION_DEBUG_EVENT:
            switch (data.u.Exception.ExceptionRecord.ExceptionCode) {
            // STATUS_WOW64_BREAKPOINT/STATUS_BREAKPOINT: Int3/breakpoint
            // STATUS_WOW64_SINGLE_STEP/STATUS_SINGLE_STEP: Single step instruction breakpoint
            // Other codes are regular exceptions, like user ones, access violations and others.
            }
            break;
        }
        // Resume the thread that reported the event.
        ContinueDebugEvent(data.dwProcessId, data.dwThreadId, DBG_CONTINUE);
    }
}
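The exception branch of that loop boils down to sorting exception codes into breakpoints, single steps and everything else. The following is a minimal, portable sketch of that decision, not our actual implementation. The names `stop_kind`, `classify_exception` and `continue_status_for` are made up for illustration. The numeric values are the ones I know from the Windows SDK headers, where, as far as I can tell, the 32-bit-under-WoW64 codes are spelled `STATUS_WX86_BREAKPOINT` and `STATUS_WX86_SINGLE_STEP`. They are redeclared here so the sketch compiles without `windows.h`; drop the defines when building against the real headers.

```c
#include <stdint.h>

/* Values as found in the Windows SDK headers; redeclared so that this
   sketch is standalone (assumption: you are NOT including windows.h). */
#define STATUS_BREAKPOINT         0x80000003u
#define STATUS_SINGLE_STEP        0x80000004u
#define STATUS_WX86_SINGLE_STEP   0x4000001Eu /* 32-bit process under WoW64 */
#define STATUS_WX86_BREAKPOINT    0x4000001Fu /* 32-bit process under WoW64 */
#define DBG_CONTINUE              0x00010002u
#define DBG_EXCEPTION_NOT_HANDLED 0x80010001u

/* Hypothetical classification used by the debug loop. */
typedef enum {
    STOP_BREAKPOINT,
    STOP_SINGLE_STEP,
    STOP_OTHER_EXCEPTION
} stop_kind;

/* Maps the exception code of an EXCEPTION_DEBUG_EVENT to a stop kind. */
stop_kind classify_exception(uint32_t code) {
    switch (code) {
    case STATUS_BREAKPOINT:
    case STATUS_WX86_BREAKPOINT:
        return STOP_BREAKPOINT;
    case STATUS_SINGLE_STEP:
    case STATUS_WX86_SINGLE_STEP:
        return STOP_SINGLE_STEP;
    default:
        return STOP_OTHER_EXCEPTION;
    }
}

/* Picks the status to hand to ContinueDebugEvent: debugger-owned traps are
   swallowed, anything else is passed back to the program's own handlers. */
uint32_t continue_status_for(stop_kind kind) {
    return kind == STOP_OTHER_EXCEPTION ? DBG_EXCEPTION_NOT_HANDLED
                                        : DBG_CONTINUE;
}
```

One design choice worth noting: continuing a regular exception with `DBG_CONTINUE` tells Windows the debugger handled it, so a sketch like this passes `DBG_EXCEPTION_NOT_HANDLED` for anything that is not a breakpoint or single step.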
Posix: gdbserver & debugserver
On Linux we use gdbserver, and on macOS and iOS LLDB's debugserver. The reason we build on gdbserver and debugserver here is that they are generally available or easily installed, which saved us a lot of time implementing this. Conceptually, however, these are very thin wrappers over Linux' ptrace and macOS' equivalent infrastructure. They both use the same protocol, which is mostly identical to the protocol used by serial JTAG debug devices. It's a text protocol that, again, has the concept of signals, stopping, resuming, threads, and reading and writing memory and registers. The gdb remote protocol documentation describes the protocol in detail, but a packet generally looks like $<packet-data>#<checksum>, where the checksum is the sum of the ASCII bytes of the packet data, modulo 256, written as two hex digits.
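To make the framing concrete, here is a small sketch of building and checking such a packet. The function names (`rsp_checksum`, `rsp_frame`, `rsp_verify`) are hypothetical, and the sketch assumes the payload contains none of the characters the protocol requires to be escaped (`$`, `#`, `}` and `*`).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Checksum of a payload: the sum of its bytes, modulo 256. */
unsigned char rsp_checksum(const char *payload) {
    unsigned sum = 0;
    for (const unsigned char *p = (const unsigned char *)payload; *p; ++p)
        sum += *p;
    return (unsigned char)(sum & 0xff);
}

/* Writes "$<payload>#<two hex digits>" into out.
   Returns the packet length, or -1 if out is too small. */
int rsp_frame(const char *payload, char *out, size_t out_size) {
    int n = snprintf(out, out_size, "$%s#%02x", payload, rsp_checksum(payload));
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}

/* Returns 1 if packet has the "$...#xx" shape and the checksum matches. */
int rsp_verify(const char *packet) {
    const char *hash = strrchr(packet, '#');
    unsigned expected = 0, sum = 0;
    if (packet[0] != '$' || hash == NULL ||
        sscanf(hash + 1, "%2x", &expected) != 1)
        return 0;
    for (const char *p = packet + 1; p < hash; ++p)
        sum += (unsigned char)*p;
    return (sum & 0xff) == expected;
}
```

With this, framing `QStartNoAckMode` yields `$QStartNoAckMode#b0` on the wire, and the `OK` reply arrives as `$OK#9a`.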
Request which features we support and are supported on the other end.
->: 'qSupported:multiprocess+;swbreak+;hwbreak+;qRelocInsn+;vContSupported+;QProgramSignals+;'
'QThreadEvents+;QStartNoAckMode+;no-resumed+;xmlRegisters=i386'
<-: 'qXfer:features:read+;PacketSize=20000;qEcho+'
Disable the requirement for explicit ack. Acking is something needed for serial
protocols but has no real purpose on a reliable connection.
->: 'QStartNoAckMode'
<-: 'OK'
Ask what cpu the other end runs.
->: 'qHostInfo'
<-: 'cputype:16777228;cpusubtype:2;addressing_bits:47;ostype:macosx;'
'watchpoint_exceptions_received:before;vendor:apple;os_version:11.2.1;'
'maccatalyst_version:14.4;endian:little;ptrsize:8;'
Enable thread suffixes for debug commands, this will include the current thread in callbacks.
->: 'QThreadSuffixSupported'
<-: 'OK'
Request the first part of the thread list.
->: 'qfThreadInfo'
<-: 'm41e8dc'
Request the next part (empty)
->: 'qsThreadInfo'
<-: 'l'
Returns where we currently are. This returns SIG17 (0x11). The following fields
are the current register values. In this particular case it's a breakpoint at the
entry of the process.
->: '?'
<-: 'T11thread:41e8dc;00:0000000000000000;01:0000000000000000;02:0000000000000000;'
'03:0000000000000000;04:0000000000000000;05:0000000000000000;06:0000000000000000;'
'07:0000000000000000;08:0000000000000000;09:0000000000000000;0a:0000000000000000;'
'0b:0000000000000000;0c:0000000000000000;0d:0000000000000000;0e:0000000000000000;'
'0f:0000000000000000;10:0000000000000000;11:0000000000000000;12:0000000000000000;'
'13:0000000000000000;14:0000000000000000;15:0000000000000000;16:0000000000000000;'
'17:0000000000000000;18:0000000000000000;19:0000000000000000;1a:0000000000000000;'
'1b:0000000000000000;1c:0000000000000000;1d:0000000000000000;1e:0000000000000000;'
'1f:c03af46e01000000;20:00907e0101000000;21:00000000;metype:5;mecount:2;'
'medata:10003;medata:11;'
Write memory:
->: 'M100ec5f04,4:e0031faa'
<-: 'OK'
Read memory:
->: 'm16ef436f0,8'
<-: '1037f46e01000000'
Resume debugging
->: 'vCont;c'
The above will not return until the next event, which looks a lot like the T11
line above, depending on the signal.
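As an illustration of how little is needed to consume these replies, the sketch below pulls the signal and thread id out of a `T` stop reply and decodes a register or memory value. All names here (`stop_info`, `parse_stop_reply`, `decode_le_hex`) are hypothetical. It assumes a little-endian target, as reported by `endian:little` above, where register and memory contents are sent as hex bytes in target order, while the thread id is sent as a plain hex number.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes up to 8 bytes of little-endian hex (as used for register and
   memory contents) into a value; stops at the first non-hex character. */
uint64_t decode_le_hex(const char *hex) {
    uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        int hi = hex_nibble(hex[2 * i]);
        if (hi < 0) break;
        int lo = hex_nibble(hex[2 * i + 1]);
        if (lo < 0) break;
        value |= (uint64_t)(hi * 16 + lo) << (8 * i);
    }
    return value;
}

typedef struct {
    int      signal; /* e.g. 0x11 for the T11 reply above */
    uint64_t thread; /* 0 if the reply carried no thread field */
} stop_info;

/* Parses "T<two hex digits>key:value;..." and returns 1 on success.
   Simplification: the thread field is located with a plain substring search. */
int parse_stop_reply(const char *reply, stop_info *out) {
    if (reply[0] != 'T' || hex_nibble(reply[1]) < 0 || hex_nibble(reply[2]) < 0)
        return 0;
    out->signal = hex_nibble(reply[1]) * 16 + hex_nibble(reply[2]);
    out->thread = 0;
    const char *field = strstr(reply + 3, "thread:");
    if (field != NULL)
        out->thread = strtoull(field + 7, NULL, 16);
    return 1;
}
```

Applied to the session above, the stop reply gives signal 0x11 on thread 0x41e8dc, the value `c03af46e01000000` of register 1f decodes to 0x16ef43ac0, and the memory read result `1037f46e01000000` decodes to 0x16ef43710.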
The macOS version supports the concept of reading module information, but it does not have events for detecting delayed loads. To work around that, there is a hardcoded symbol inside dyld where the debugger can set an internal breakpoint.
The Linux version has an accessible list of pointers, one of which has the same breakpoint location, and a list of currently loaded modules.
So on macOS we set a breakpoint on the __dyld_debugger_notification function symbol, and use the jGetLoadedDynamicLibrariesInfos API to retrieve the list of modules. On Linux we use the AUXV data records: we set a breakpoint at the address in AUXV_AT_ENTRY, like on macOS. AUXV_AT_PHDR points to the program header table and AUXV_AT_PHNUM contains the number of entries in that table; starting from these headers the debugger can find, among other info, the list of loaded modules.
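The auxiliary vector itself is simple to consume: it is a list of (type, value) pairs terminated by a null entry. Here is a minimal lookup sketch, assuming a 64-bit target and that the raw bytes have already been fetched from the process and converted to host byte order. The struct and function names are hypothetical; the tag numbers are the standard ELF `AT_*` values (3, 5 and 9), redeclared under different names so the sketch does not depend on `<elf.h>`.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard ELF auxiliary vector tags (AT_NULL, AT_PHDR, AT_PHNUM, AT_ENTRY). */
enum {
    AUX_NULL  = 0, /* terminates the vector */
    AUX_PHDR  = 3, /* address of the program header table */
    AUX_PHNUM = 5, /* number of program header entries */
    AUX_ENTRY = 9  /* entry point of the program */
};

/* One auxv record as laid out on a 64-bit target. */
typedef struct {
    uint64_t type;
    uint64_t value;
} auxv_entry;

/* Looks up one tag; returns 1 and stores its value, or 0 if not present.
   The search stops at the terminating null entry. */
int auxv_lookup(const auxv_entry *auxv, size_t count,
                uint64_t type, uint64_t *value) {
    for (size_t i = 0; i < count && auxv[i].type != AUX_NULL; ++i) {
        if (auxv[i].type == type) {
            *value = auxv[i].value;
            return 1;
        }
    }
    return 0;
}
```

A debugger would call this with `AUX_ENTRY` to find where to place its initial breakpoint, and with `AUX_PHDR` and `AUX_PHNUM` to know where the program headers start and how many of them to read.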
Breakpoints
The most low-level and important concept in a debugger is the breakpoint. Breakpoints are used by both low-level constructs (such as exceptions) and user-set breakpoints. While some platforms support a limited set of hardware breakpoints, as far as I could tell their use isn't very common. Instead you create a software breakpoint by replacing code with an instruction like int3 (on x86/x86_64), saving the original code that was overwritten. When the breakpoint hits, the debugger restores the code, rewinds the ip, runs a single instruction and, if the breakpoint needs to be set again, writes it out again.
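The set, hit and re-arm cycle can be sketched against a plain byte buffer standing in for the target's code. In a real debugger the reads and writes would go through WriteProcessMemory or the `M` packet shown earlier; the `breakpoint` struct and the `bp_*` functions below are hypothetical names, and the sketch assumes x86, where the one-byte int3 opcode is 0xCC and the reported ip points just past it.

```c
#include <stddef.h>
#include <stdint.h>

#define X86_INT3 0xCC /* one-byte breakpoint instruction on x86/x86_64 */

typedef struct {
    size_t  address; /* offset of the patched byte in the code buffer */
    uint8_t saved;   /* original byte that int3 replaced */
    int     armed;   /* 1 while int3 is written into the code */
} breakpoint;

/* Saves the original byte and overwrites it with int3. */
void bp_set(uint8_t *code, breakpoint *bp, size_t address) {
    bp->address = address;
    bp->saved = code[address];
    bp->armed = 1;
    code[address] = X86_INT3;
}

/* Called when the trap is reported: restores the original byte and
   returns the rewound ip, so the real instruction can be executed. */
size_t bp_on_hit(uint8_t *code, breakpoint *bp, size_t ip_after_trap) {
    code[bp->address] = bp->saved;
    bp->armed = 0;
    return ip_after_trap - 1; /* int3 is a single byte */
}

/* Called after single-stepping the original instruction, if the
   breakpoint should stay active. */
void bp_rearm(uint8_t *code, breakpoint *bp) {
    code[bp->address] = X86_INT3;
    bp->armed = 1;
}
```

The single step between `bp_on_hit` and `bp_rearm` is what keeps the breakpoint from immediately firing again on the instruction it replaced.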
Signals
Signals occur for lots of things, but for debuggers the important one is the one for breakpoints. The actual signal code depends on the platform. On Windows the closest equivalent is exceptions; single-step and breakpoint events are exception types there. The debugger gets the thread, current registers and current instruction pointer as part of the event.
Call stacks
All debuggers have to be able to walk the stack from the current instruction. While a lot of debug formats have info on how to do this, debuggers should have a fallback for when no debug info is available. Walking the stack is specific to the instruction set (and gets trickier when it comes to optimized code), but on i386 and x86_64 it can be done by using the frame pointer, which holds the stack pointer that was set on entry, to find the caller's saved frame pointer and the return address, which serves as the previous instruction pointer. This works because the general function prologue looks like:
push ebp
mov ebp, esp
// or
push rbp
mov rbp, rsp
Generally this comes down to:
- [EBP/RBP + PointerSize] containing the return address
- [EBP/RBP] containing the previous frame pointer
Other platforms like arm/arm64 have the same concept, just with different registers.
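The frame pointer chain described above can be followed with a short loop. This is a sketch under stated assumptions rather than a complete unwinder: it assumes a 64-bit target whose code keeps frame pointers, and it reads target memory through a hypothetical callback (`read_ptr_fn`). The array-backed `fake_stack` reader only exists to make the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: stores the pointer-sized value found at address
   and returns 1, or returns 0 if the memory cannot be read. */
typedef int (*read_ptr_fn)(void *ctx, uint64_t address, uint64_t *value);

/* Collects instruction pointers, innermost frame first; returns the count. */
size_t walk_frames(read_ptr_fn read_ptr, void *ctx, uint64_t ip, uint64_t fp,
                   uint64_t *out_ips, size_t max_frames) {
    const uint64_t pointer_size = 8;
    size_t count = 0;
    if (max_frames == 0) return 0;
    out_ips[count++] = ip; /* frame 0 is where the thread stopped */
    while (count < max_frames && fp != 0) {
        uint64_t return_address, previous_fp;
        if (!read_ptr(ctx, fp + pointer_size, &return_address) ||
            !read_ptr(ctx, fp, &previous_fp) || return_address == 0)
            break;
        out_ips[count++] = return_address;
        /* The stack grows down, so a caller's frame must sit at a higher
           address; anything else means the chain ended or is corrupt. */
        if (previous_fp <= fp) break;
        fp = previous_fp;
    }
    return count;
}

/* Array-backed memory for the example: words[i] lives at base + 8 * i. */
typedef struct {
    uint64_t        base;
    const uint64_t *words;
    size_t          count;
} fake_stack;

int fake_read(void *ctx, uint64_t address, uint64_t *value) {
    const fake_stack *s = ctx;
    if (address < s->base || (address - s->base) % 8 != 0) return 0;
    uint64_t index = (address - s->base) / 8;
    if (index >= s->count) return 0;
    *value = s->words[index];
    return 1;
}
```

The check that each saved frame pointer is higher than the current one is a cheap guard against looping forever on a damaged stack, which is exactly the situation a fallback walker tends to run into.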
Conclusion
This concludes the first article on our new debugger.
You can see the Island debugger in action when debugging Cocoa, Windows, Linux or WebAssembly projects with Elements, in Fire, Water or Visual Studio.
In a future article, I will dive into more detail about the different debug info formats.