Last Resort Debugging
When coworkers ask for help debugging their codebases I often say ājust use straceā to them. Normally Iām joking; simpler tools like in-language debuggers and plain old print statements are enough to fix most problems. On the other hand, when youāre out of other options these tools can be a life-saver. Not all programs are that observable; they might have little or no logging, or you might not have even written them or know what they do. I can think of a few cases where more advanced tooling has really helped me:
- With about an hourās work, I found that CPU was high on one of our production serverās because our DNS daemon was repeatedly reading
/etc/group. This would have been fine, if that file wasnāt about three hundred megabytes⦠- Finding out which files our cursed configuration management system was loading, to debug why it ignored my configuration
- I OOMed my laptop when running
deno fmtagainst my mildly hauntedwebsitesdirectory. Without having any notion of how that tool worked, I was able to root-cause the issue within a few minutes
I have to admit Iām writing this just so I remember to use one of these recently discovered tools again, next time I see a fleet running out of memory (that is, soon).
Strace
strace traces system-calls, signals, and process events. Ultimately thatās a lot of the things your program actually does.
The program output is not terribly easy to read (or scan with grep), so I often use B3 to parse it into something jq can filter. I generally use it to see what syscalls a program is making (especially interesting if the program is terminating prematurely), and trace what files it is loading.
Iāll link to Brendan Greggās list of one-liners, and include a few of my own.
# Trace all `open` syscalls for a command (including child processes), and write output to a file
strace -f -o out.txt -e trace=open deno fmt
# Trace syscall timings for a PID
strace -c -p <pid>
Heaptrace
Iāve occasionally needed to get heapdumps for some of my companiesā, em, less robust Java services. Itās always been painful, so I was impressed when I found a memory analysis tool thatās pleasant to use. When I OOMed my laptop yesterday while working with deno, I wanted to find out why so much memory was being leaked. heaptrack (and heaptrack_gui) made this trivial:
heaptrack deno fmt
heaptrack_gui heaptrack.deno.80887.zst
Iāve only just started using this tool, but the most useful things Iāve seen are:
- Overall summaries of leaked memory
- Tracking the call tree and which callers are leaking / allocating how much memory
Flamegraph
Flamegraphs are a useful way of spotting what a program is spending time on. In work Iāve spotted services with excessive regular-expression calls (precompile and stop using explosive regular expressions, people!) and lousy process handling within a few seconds of finding their flamegraphs.
These used to be difficult to generate, but a coworker mentioned this nice utility.
cargo install flamegraph
flamegraph -- deno fmt
open flamegraph.svg
Neat!
/Proc & /Sys
When all else fails, youāll probably find what you need in /proc and /sys. Thereās a lot in there (see my parser, procolli!) but in short itās a virtual file-system that lets you inspect (and write) kernel level information about processes. A few uses Iāve found:
- Spotting how commands were started (
/proc/<pid>cmdline) - Inspecting the program environment (
/proc/<pid>/environ)
And in particular, spotting resource-contention. Without getting into too many details, many system failures are caused by too many processes trying to use too few resources. This leads to queueing (contention) which slows processes and can signify the OS OOM killer is about to strike. Iāve used this to diagnose why some internal services are failing, in a way that looking at absolute resource usage would not allow.
Takeaway Points:
- Debugging is fun
- Itās more fun with powerful tooling
- Just use
strace