Rethinking errors, warnings, and lints

January 12, 2022

Programming language ecosystems often involve tools that process code and produce errors or warnings, such as compilers and linters. Different programmers have widely different opinions on how to treat these systems. This culminates in disagreements around questions like: Should I treat all warnings as errors? Why is X an error and not a warning? Can I ignore lint warnings?

In this post I aim to give you a framework for how to reason about these systematically, with lessons learned from my recent years of work on programming language tooling for thousands of engineers at Google.

To begin with, we need to reset your biases, and to do so let's make up new words just for the duration of this post. Let's say a compiler/linter/static analyzer/etc executes checks, which can identify problems in code.

Here are some examples of checks I've seen, just to ensure we're more or less on the same page: a compiler reporting a type error, a compiler warning about an unused import, a linter flagging a suspicious assignment like if (x = 3), a formatter noting that imports aren't sorted, or a security tool detecting a possible XSS vulnerability.

The development cycle

At a high level there are two purposes of checks:

  1. identify problems earlier in the development cycle;
  2. maintain invariants, such as "the code never contains unused imports".

(There's potentially a third case of checks — what to do with problems like syntax errors that make the program completely uninterpretable. But more on that in a second.)

To elaborate on (1), software development is a process of taking code through a series of phases:

  1. edit
  2. build
  3. run locally
  4. commit
  5. deploy (e.g. release to users)
  6. execute remotely (e.g. on server or on user's machine)

We programmers loop through each of these — we edit many more times than we build, build more often than we run, run more often than we commit, and so on. Discovering a problem at a later phase is more costly than discovering it at an earlier one, sometimes by orders of magnitude: contrast the cost of spotting and fixing a squiggly underline in a code editor with the cost of tracking down the cause of a user's bug report.

So here's an easy goal: when possible, discover problems in earlier phases rather than later. For example, one way to view static typing is that it shifts type problems from the execute phases to the earlier build phase; a reason programmers like IDEs is that they can surface problems as you type rather than needing to wait for a build first.

Invariants, and inform versus block

How we maintain invariants is more interesting. What do you do when a check finds a problem? There are two basic options: either inform the programmer and move on, or block forward progress.

Here's a warm-up to make this more concrete. In most statically typed languages, a type error stops compilation, so you could say that for a type problem the compiler must block at build time. But even something as apparently fundamental as this is not fixed! For one counterexample, TypeScript can be configured to generate code even in the presence of type errors (which never affect the generated code's runtime behavior anyway).
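
For instance, here's a minimal tsconfig.json sketch that reports type errors but still emits JavaScript; noEmitOnError is the relevant compiler option, and false is in fact its default, so spelling it out mainly documents the intent:

{
  "compilerOptions": {
    // Inform about type errors, but don't block emitting JavaScript.
    // false is the default; setting it explicitly documents the choice.
    "noEmitOnError": false,
    "outDir": "dist"
  }
}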

You could imagine even a C compiler choosing to "inform" on a type problem and generating a runtime failure in its place, to allow you to run the resulting program. (I've heard some tools in the Java ecosystem can already do this, perhaps?) This behavior is plausibly useful when developing, where you want to iterate on module A of a program while ignoring type problems in an irrelevant module B that you know you won't execute while you're working on A.

In principle then, at least, for any problem you could choose between blocking and informing, and informing lets the programmer choose whether to stop rather than forcing it. At any given moment there are surely hundreds of different ways the code could be improved, some of them not even detectable by your tools; in my experience there's never a shortage of known flaws in a given codebase, and the hard problem is instead how to prioritize them. When we block on a problem, what we're effectively saying is that of all the different things that could be worked on, the problem just identified is absolutely the highest priority to fix, so much so that you cannot be allowed to work on anything else before fixing it.

So, you might ask: if we were starting over, why would we ever choose to block on any problem? It's a pretty deep question, really! In practice it comes down to human nature, and how we sometimes won't do work unless we're forced to. For example, consider warning blindness.

Warning blindness

You often find programmers recommending configuring C compilers with -Werror, which turns all warnings (which typically "inform") into errors (which "block"). But at that point, why do languages even have a mechanism for warnings at all? Again consider TypeScript: the problems it finds ("diagnostics", in TypeScript lingo) have a severity field, an enum with a possible "warning" level that the compiler never actually uses! It ends up feeling pretty circular: warnings are never used, so everything is an error, but then there's also an option to not block on errors.
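
You can see this for yourself with a small sketch against the TypeScript compiler API (the input file name here is made up): every diagnostic carries a severity category, but the ones the compiler itself reports come back as errors.

import * as ts from "typescript";

// Sketch: print each diagnostic together with its severity category.
// DiagnosticCategory includes Warning, but tsc's own diagnostics are Errors.
const program = ts.createProgram(["src/main.ts"], { strict: true });
for (const diag of ts.getPreEmitDiagnostics(program)) {
  const severity = ts.DiagnosticCategory[diag.category];  // "Error", "Warning", ...
  const message = ts.flattenDiagnosticMessageText(diag.messageText, "\n");
  console.log(`${severity}: ${message}`);
}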

The underlying reason for this is that in my (and likely your!) experience, informing about a problem without ever blocking on it often leads to problems simply being ignored. That in turn leads to "warning blindness": the checks find so many problems that it becomes hard to spot the important ones amidst the ignored ones, at which point the checks stop actually finding problems. The treat-warnings-as-errors approach fixes this using the only lever we normally have available — blocking compilation — which forces the warnings to be fixed.

Choosing to block at a given phase is to enforce an invariant, e.g. "code that passes this phase never contains a type error". Invariants are powerful tools for reasoning because they let you relax the part of your brain that would otherwise need to worry about whatever the invariant covers. For example, imagine a system that verifies (perhaps using a type system) that code doesn't contain XSS or SQL injection, and that blocks before deploy. Now code reviewers don't need to check for those problems as closely.

But importantly, it's also useful to be able to relax an invariant. While developing, you might want to sprinkle in some printfs while debugging something, and you don't want to worry about appeasing the XSS checker as you do it. It's valuable both to be able to sprinkle in those prints (not blocking on the error at the run phase) and to guarantee that you never deploy such code (blocking at the commit or deploy phase).

The pattern of informing in one phase followed by blocking in a later one is a nice way to softly introduce an invariant while still giving the programmer some wiggle room while developing. For example, most problems that block at a later phase (such as a blocking build problem) can usefully be informed about during the editing phase (as a highlight in the editor) as a way of shortening the development loop. Another example of this pattern in practice is projects that provide a warning-only linter during development, but then require code to be lint-clean before submitting it.

The solution to warning blindness here can be generalized into a principle: informing without blocking is useful for allowing rapid progress, but if any phase decides to inform, some later phase must block on the same problem to ensure it doesn't stick around. That is, to my mind, the best application of informing: as a marker that effectively time-shifts blocking to a later phase.
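
To make that concrete, here is a hypothetical sketch (all names invented) of a check runner where each check declares the phase at which it starts blocking; earlier phases surface the same problem, but only as information:

// Hypothetical sketch: checks inform in early phases and block in later ones.
type Phase = "edit" | "build" | "commit" | "deploy";
const phaseOrder: Phase[] = ["edit", "build", "commit", "deploy"];

interface Check {
  name: string;
  blockingPhase: Phase;           // earliest phase at which this problem blocks
  run(source: string): string[];  // returns a list of problem messages
}

// Returns false if any problem blocks at the current phase.
function runChecks(checks: Check[], source: string, phase: Phase): boolean {
  let ok = true;
  for (const check of checks) {
    for (const problem of check.run(source)) {
      const blocks =
        phaseOrder.indexOf(phase) >= phaseOrder.indexOf(check.blockingPhase);
      console.log(`${blocks ? "error" : "warning"} [${check.name}]: ${problem}`);
      if (blocks) ok = false;
    }
  }
  return ok;
}

An unused-import check configured with blockingPhase: "commit" would then show up as a warning in the editor and during builds, but refuse to let the code be committed, which happens to be the next case study.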

Unused import hygiene and lint

It's time for another case study: how unused imports are handled in Go. Go wanted to enforce the invariant that code never has unused imports, so the compiler refuses to build code that has them. This has a nice outcome — you never see Go code with unused imports — but it can be frustrating while developing, where you might repeatedly add and remove a temporary import like the log package while debugging.

As a user of Go and as a person who prefers more checks rather than fewer, I still think it would be better to allow building and running code with extra imports. What I think we actually care about is that you don't commit such code, which is a later phase in the development cycle. In my experience there are a lot of "hygiene"-like problems, such as preferred whitespace or whether imports are in the proper order, that better belong as checks at commit time rather than while developing. I imagine one big reason Go doesn't behave this way is that its toolchain doesn't have adequate hooks into the developer workflow to enforce such a thing.

The lack of adequate hooks in the other direction, from the developer into the toolchain, also explains the existence of "lint" tools. In my experience tools named "lint" often end up being a grab-bag of "all the checks we wanted but that the language doesn't already enforce", which can vary from trivial style guidelines all the way up to really important invariants. Switching your mental framing from "which program does the check execute as part of" to "which development phase does the check execute during" makes it obvious to me that programming language tooling like compilers ought to provide hooks for programmer-defined checks, such that those checks behave just like language builtins.

TypeScript does something almost like this, with an option for providing language service plugins in the tsconfig (which can then surface problems in the editor), but those plugins aren't used in the compilation step itself. So in TypeScript, as in many other languages, introducing additional checks often involves a separate build step that has to re-parse (or even re-type-check) the input source. For another consequence of this layering, note that code editors build compiler integrations (to show compiler checks) and then end up separately building lint integrations (to show lint checks), when it would be more coherent if they only needed to integrate with one "tooling that checks for problems" system.
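
For reference, the configuration looks roughly like this (the plugin name here is just a placeholder); plugins registered this way are loaded by the editor's language service but ignored by tsc, so they can only ever inform, never block:

{
  "compilerOptions": {
    "plugins": [
      // Picked up by the editor's language service; `tsc` ignores this list,
      // so these checks surface in the editor but never block the build.
      { "name": "some-lint-plugin" }
    ]
  }
}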

One objection you might have at this point is that compilers typically check "important" problems and linters check "unimportant" ones. This is true to an extent, but it's also often a historical accident; there are plenty of unimportant things a compiler can complain about, and plenty of lint-like tools that discover real problems. It also explains why different compilers might classify the same problem as either a warning (inform) or an error (block).

Confidence

Another axis along which to analyze checks is how confident they are about the problem. At one extreme, a compiler might say "if this code is ever executed I guarantee it will explode"; at the other, a linter might say "this code pattern looks kinda fishy but I'm not sure". At the low-confidence end, this means the check may misfire and complain about a problem that doesn't actually exist.

For this reason, in the past I believed the right approach was to just let programmers ignore (bypass) low-confidence checks; for example, you could decide that some lint check is purely a guess and that you should feel free to commit code even when it fires. I have since come around to seeing that this is a bad idea, because it means the next person who edits the code will need to reevaluate the same question, and a system where you regularly need to disregard warnings leads to warning blindness. (Within Google we had a system that attempted to identify whether a given change actually introduced a problem or whether it was preexisting in the code, but it ends up being fragile when, say, code is moved or variables are renamed.)

Instead, I think a better mechanism for responding to checks is for programmers to annotate the code directly. In some cases that can mean modifying the code itself. A common check across languages is to say that a statement like if (x = 3) ... is suspicious because it assigns a value to x rather than testing it, and in some languages the fix is to write it as if ((x = 3)) ... as a way of more or less saying "no, I really meant that".
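
In JavaScript or TypeScript, for example, ESLint's no-cond-assign rule works this way in its default "except-parens" mode, so the acknowledgement looks something like:

let x: number;

// Flagged by no-cond-assign: assignment where a comparison was probably intended.
// if (x = 3) { ... }

// Accepted: the extra parentheses say "yes, I really meant to assign here".
if ((x = 3)) {
  console.log(x);
}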

But in some cases there isn't an easy syntactic modification available, in which case I think an explicit acknowledgement of the check is valuable both for satisfying the check and for other programmers. For example, an explicit comment that turns off the check is also exactly the right place to explain to the next programmer why the code looks suspicious but is actually ok:

// The unsafe loader here is ok because bar() already sanitizes the input.
// ignore:no-dangerous-api
dangerouslyLoad(bar(input));

(This example isn't itself great, in that the above case could likely be better addressed with types, but that is itself another demonstration of how modifying the code can be better than annotating it.)

One tempting solution that doesn't work well is to put these annotations somewhere other than the code, like in a file on the side or in a config file. This doesn't work out because it results in action at a distance: if someone moves the code, suddenly a check starts firing, and it's far from obvious that the cause is a reference to the code's old location in some other file. The above mechanism, with the annotation directly on the code, means that anyone who touches the code (or copy-pastes it) will preserve its behavior.

Fixing

In some cases the fix for a problem can be autogenerated by tooling. For example, imagine a check that enforces that imports are sorted. (Enforcing this invariant is often useful for reducing meaningless churn as imports are added and removed over time, as well as making it easier for tools to automatically modify imports.)

Computers are better than humans at sorting, so it would be helpful if the imports-are-sorted check were accompanied by a way to automatically apply the fix. Again, such fixes are usefully plumbed throughout the stack of phases: even if a lint-like problem blocks at submit time, it would be ideal if you could click a button in your editor to accept the suggested fix, or even apply the fix automatically on save.

Frequently, the code that identifies a problem is also the best place to identify the fix. A tool might say "you wrote foobra right here but you probably meant foobar", and when it does so it knows the exact byte offsets of the symbols involved, which other tools need in order to display or apply the fix.

So to make this work well, what you need is a uniform, serializable representation of problems along with their accompanying fixes throughout the cycle, which allows tools at any layer to both produce and consume these fixes. (Nothing fancy; JSON is adequate.) For example, an editor consumes fixes by offering to apply them, like the light bulb in VSCode; an offline tool could consume fixes by applying them to the code automatically.
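
Here's a hypothetical sketch of what such a representation could look like (the field names are invented, not any particular tool's format):

// Hypothetical sketch of a serializable problem-plus-fix record.
// Any JSON-ish schema with file + range + edits works equally well.
interface TextEdit {
  path: string;
  start: number;        // byte offset of the range to replace
  end: number;
  replacement: string;
}

interface Fix {
  description: string;  // shown in menus like VSCode's light bulb
  edits: TextEdit[];
}

interface Problem {
  checkName: string;    // e.g. "imports-sorted"
  path: string;
  start: number;
  end: number;
  message: string;
  fixes: Fix[];         // zero or more candidate fixes
}

// A producer (the check) emits these; any consumer (editor, presubmit hook,
// offline fixer) can read them back and apply the edits.
const example: Problem = {
  checkName: "imports-sorted",
  path: "src/main.ts",
  start: 0,
  end: 48,
  message: "imports are not sorted",
  fixes: [{
    description: "sort imports",
    edits: [{ path: "src/main.ts", start: 0, end: 48,
              replacement: 'import {a} from "./a";\nimport {b} from "./b";\n' }],
  }],
};
console.log(JSON.stringify(example, null, 2));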

Evolving checks

One reason we used tools to apply fixes offline at Google was to roll out a new check. Suppose that we historically didn't sort imports but wanted to start doing so. A bad approach is to just turn on the check without changing any code. Doing that means the next person to build or modify an affected file is suddenly interrupted with a demand unrelated to the change they were planning to make, and the resulting commits mix unrelated concerns.

Much better is to programmatically execute the check and apply the fix up front, as its own independent change. A common task over my last years at Google was exactly this: deploying automated fixes like these across thousands of files. Representing these "code migrations" as a problem check with an attached fix, rather than as a one-off migration program, has the additional benefit that you can use the same code to maintain the invariant going forward — the code that implements the offline import sorting is the same code that offers to sort your imports on new commits, sorts your imports within the editor, or comments on code reviews.

In the case where a new suppressible check cannot be automatically fixed, it's counterintuitively still valuable to insert a suppression comment at every current instance of it in the codebase as part of rolling it out, because this (1) still allows the new check to fire on newly introduced code, and (2) clearly marks all the locations that are still failing the check, making them easy to grep for (how many remaining places use APIs we've marked as dangerous? just grep for the dangerous-api suppression) and to track in a burndown chart. In my experience, sending a code review that inserts a suppression is often a good prompt for the reviewer to either say "this code is fine as is" and accept the change, or push back with "this actually discovered a bug" and send a counterproposal.
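
A hypothetical sketch of that rollout step (the Violation shape, the comment syntax, and the file handling are all made up for illustration):

import * as fs from "fs";

// Hypothetical shape: one pre-existing violation of the new check.
interface Violation {
  path: string;
  line: number;  // 1-based line of the offending statement
}

// Insert a suppression comment above each existing violation so the check can
// be enabled for all future code without breaking what's already committed.
function suppressExisting(checkName: string, violations: Violation[]) {
  const byFile = new Map<string, number[]>();
  for (const v of violations) {
    const lines = byFile.get(v.path) ?? [];
    lines.push(v.line);
    byFile.set(v.path, lines);
  }
  for (const [path, lineNumbers] of byFile) {
    const fileLines = fs.readFileSync(path, "utf8").split("\n");
    // Insert from the bottom up so earlier insertions don't shift line numbers.
    for (const line of lineNumbers.sort((a, b) => b - a)) {
      fileLines.splice(line - 1, 0, `// ignore:${checkName} -- pre-existing; see rollout`);
    }
    fs.writeFileSync(path, fileLines.join("\n"));
  }
}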

For fixes that have high confidence, a better option than bothering humans about them is to just apply them immediately. This, for example, is how whitespace is handled in any humane programming language these days — it's never mentioned to the programmer and never brought up in code review; it's just fixed as you hit save in the editor. Other fixes have low confidence, where you definitely need a programmer involved before applying them; sometimes a given problem might even have multiple plausible generated fixes. For this reason it's useful to model potential fixes as an array. VSCode, for example, can sometimes pop up a menu to let the programmer choose between the alternatives.

Conclusion

In this post I jumped around between how things actually work and how things ought to work. My intent was not to describe any system in particular, but rather to gather the patterns that I have seen work well, and to suggest how programming language designers might usefully think about them in the future. I hope it was helpful!