Tech Notes: The dominator tree of a dependency graph

The dominator tree of a dependency graph

July 12, 2023

One of the core joys of mathematical thinking is how sometimes, once you frame a problem the right way, the solution becomes obvious. Think of Euler inventing graph theory to trivially solve a complex-seeming problem. This post describes another graph trick.

The practice of software is bursting with graphs. One I've touched a lot in my career is the dependency graph. C files may #include another; TypeScript files may import another; Rust crates may import from one another; Ninja build step inputs depend on the outputs of others. These collections of edges form a directed graph.

Sometimes we want to visualize or understand these graphs. Some questions you may have are "which parts of my program depend on others?", or "why is my program so big, what can I cut?" If you dump a real-world dependency graph into a graph visualization algorithm you'll often find the resulting picture is a mess. (A pet peeve: visualizations of graphs that end up as big amorphous blobs of dots, coupled with some "whoa look how complex it is" insightless insight.)

A graph dominator is well-specified there on Wikipedia. My hand-wavy description: A dominates B if all paths from the root to B go through A; A immediately dominates B if it's the "lowest" dominator; and a dominator tree is the tree constructed from the immediately-dominates relationship. I have mostly seen dominators discussed in compilers and control flow graphs, where A dominating B means that control flow always reaches A before B.

What is the dominator tree of a dependency graph? When A is above B in a dominator tree, the intuition is that it means all paths to B go through A, and A is the lowest such point. For a dependency graph specifically, here are some other ways of looking at what the dominator tree reveals:

The only reason B is in this graph is because A uses it.
If we removed A's usage of B, it would no longer be in the graph.
Any cost (in binary size etc.) of B is wholly attributable to A.
No part of the program outside of A interacts with B; B is a "private" dependency of A.

When working with dependencies these are great tools to have. Further, a tree often ends up easier to visualize than a graph: you can render it as a simple indented list without needing to touch on graph layout.

More generally, whenever I have a directed graph which may contain cycles, I now think of the dominator tree as a way of "summarizing" the graph into a simpler tree DAG. If the graph is too tighly woven the tree may end up flat, but if not it can useful reveal structure.