The Principle Of Least Astonishment

POLA, defined

POLA stands for “The Principle Of Least Astonishment,” which means that when you introduce changes into an established system you should do so in a way that does the least damage to continuity from one system to another.

I have often seen this term “POLA” mentioned as if it were some sacred law, but until now I never looked it up. It’s an excellent principle to hold to: I have often used the phrase “I hate surprises,” generally meaning I don’t like to discover undocumented or untested gotchas. But there’s always someone who thinks they understand your needs so well, they don’t need to discuss a proposed change with you.

An example: back in my last startup, we had a midnight cron job to rotate web server access logs for report processing. Midnight is a convenient time, since it occurs at a change in the date. It also occurs when most people are sleeping, so problems that occur in conjunction with that process will be inconvenient, to say the least.

It turned out that some bright young spark in engineering decided to commit the unpardonable sin, from an operations standpoint: he changed something that was working perfectly. At some point, some of the engineers were infected with a mild form of cleverness, the kind that never quite works out, so they decided to start using bash’s ability to accept variables and declarations from external files. I’m sympathetic to the idea of wanting cleaner, more easily maintained code, but it rarely works on the first try.

The routine, at its most basic, is this: you stop the webserver process, you rename the active logfile, you restart the webserver, then you compress or do whatever you like with the old logfile. This would result in an outage of one to five seconds, depending on how much logging data had to be flushed. In those days you had to stop the server process to get it to close the file and stop writing to the filehandle. Anyway, the POLA violation occurred when someone adjusted the process to stop the webserver, rename the logfile, compress it for archival storage, and only then restart the webserver.

This meant that while the compression was happening (10:1 compression of a file of more than 2 million lines) the service was dead. No requests would be answered. This could take up to ten minutes.
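
For illustration, here is a minimal sketch of the original, well-behaved ordering. It is not the actual script: rotate_logs, stop_server, and start_server are hypothetical names, the service controls are placeholders that only print a message, and gzip stands in for whatever compression was really used.

```shell
# Placeholder service controls (hypothetical); substitute whatever
# commands your web server actually uses.
stop_server()  { echo "stopping web server"; }
start_server() { echo "starting web server"; }

# Rotate one logfile: keep the outage as short as possible by
# restarting the server before the slow compression step.
rotate_logs() {
    local logfile=$1
    local rotated="${logfile}.$(date +%Y%m%d)"

    stop_server                 # server lets go of its filehandle
    mv "$logfile" "$rotated"    # a rename is near-instant
    start_server                # outage ends here, after a few seconds

    echo "compressing old log"
    gzip -f "$rotated"          # slow step runs while the server is back up
}
```

The POLA-violating version amounts to moving the start_server call below the gzip line, which stretches the outage from seconds to however long the compression takes.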

The change was buried in a function definition in a file that was referenced or “sourced” by the actual executable script. Yours truly discovered it after about a month of midnight wakeup calls for the operations staff, and it took me a few tries.

So now I have a new way to refer to incidents like this, and I think the phenomenon is not necessarily specific to technology. Does a day go by when we aren’t inconvenienced by transgressions of POLA?