Why Is Email So Complicated? Part 221: The Legacy of Punch Cards

Most people who use email today have probably never heard of punch cards, and although my readers are probably a bit more knowledgeable, I'll describe them briefly.  Punch cards were the primary method of data input and output for the earliest computers.  Programmers would write programs, and specialized operators would translate them into a series of cards to be fed into the machine.  The machine would run the programs and deliver its output as a new stack of punched cards.

In their heyday, billions of punch cards were used each year, and every program written for the first generations of computers was tailored to their capabilities.  Various sizes of punch card were used and manufactured, but the dominant standard was the IBM/Hollerith 80 column card.  By the time alternate input and output mechanisms such as keyboards and screens and printers became common, nearly every program in the world was designed to accept and produce 80-column data.

It was therefore natural that nearly every video display built for the first few decades of that technology's existence was also geared to 80-column data.  The 80-column display was a near universal standard, largely because it made it easier to convert older punch-card oriented programs to work on the newfangled TV-like screens.

Similarly, programs that exchanged data files tended to use 80 columns of data, which meant 80 characters per line.  Various conventions were developed to indicate when data was continued on a subsequent line, though these conventions tended to differ from one protocol or application to another.

That is the world for which the first email programs were written.  The Internet email protocols which evolved into SMTP were all designed for an 80-column world.  Lest you judge the designers too harshly, recall that it was impossible to know what kind of system your file would be transferred to, and it was possible, well into the 1970s, that it would involve punched cards.  Limiting the data to 80 columns was simply good, conservative engineering, ensuring that it wouldn't break when transmitted to those older systems.

The world changed, of course.  But changing a protocol that everyone uses every day is no small matter; it has to be done carefully and incrementally.  You want to make sure that most mail servers can handle "long-line" email before you start sending it.  So, although the standard eventually raised the line length limit to its current 1000 characters (counting the end-of-line marker), most mail sending software still tries to keep lines under 80 characters, so that it won't have to worry about problems with older mail software on the receiving end.  At this point, no one really knows how many servers would have trouble with longer lines, but nearly everyone thinks it safest to stay under the 80-character limit first defined by IBM punch cards.
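To make the two limits concrete, here is a minimal sketch in Python that sorts the lines of a message into "fine", "long but legal", and "over the limit".  The numbers are the ones I believe RFC 5322 uses (78 characters recommended, 998 maximum, both excluding the two-character end-of-line marker); the function name classify_lines is made up for this illustration and is not part of any mail library.

```python
# Assumed limits, taken from RFC 5322 and excluding the CRLF end-of-line marker.
SOFT_LIMIT = 78    # recommended maximum, comfortably inside 80 columns
HARD_LIMIT = 998   # absolute maximum (1000 once CRLF is counted)


def classify_lines(message: bytes) -> dict:
    """Count how many lines of a raw message fall into each category.

    Hypothetical helper for illustration only.
    """
    counts = {"ok": 0, "long": 0, "too_long": 0}
    for line in message.splitlines():
        if len(line) > HARD_LIMIT:
            counts["too_long"] += 1   # may break strict receivers
        elif len(line) > SOFT_LIMIT:
            counts["long"] += 1       # legal, but wider than a punch card
        else:
            counts["ok"] += 1
    return counts


sample = b"Subject: hi\r\n\r\n" + b"x" * 100 + b"\r\n" + b"y" * 1200 + b"\r\n"
print(classify_lines(sample))
```

Cautious sending software aims to keep every line in the first bucket, which is the habit described above.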

You might also ask, why should there be a line limit at all?  Why not just send a binary chunk of data, like most modern software?  Again, the answer is in the installed base.  Most email software is designed to receive data in line-oriented format, terminated by a standard end-of-line marker (about which I'll have more to say in a future essay).  Binary data would break nearly everything.

In the 1990s the mail gurus (yes, including me) designed an SMTP extension that would allow consenting mail servers to exchange binary data.  As useful as that sounds, it's seen remarkably little use, because there's already a convention called base64 for converting binary data into line-oriented data.  Base64 enlarges the data by about a third, which sounds like the kind of thing engineers would want to avoid, but it's much easier than changing all the mail software in the world.
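For readers who have never looked at base64 up close, here is a small sketch using Python's standard base64 module.  It turns arbitrary bytes into short text lines and back again.  The helper name to_mail_safe_lines and the sample data are invented for this example; the 76-character line length is what the standard library's encodebytes produces, not something configured here.

```python
import base64


def to_mail_safe_lines(data: bytes) -> list[bytes]:
    """Encode binary data as base64 text, split into short lines.

    Hypothetical helper for illustration; base64.encodebytes inserts a
    newline after every 76 characters of output.
    """
    return base64.encodebytes(data).splitlines()


# Every possible byte value, repeated: 3072 bytes that no line-oriented
# mail program could be trusted to carry as-is.
raw = bytes(range(256)) * 12
lines = to_mail_safe_lines(raw)

print(max(len(line) for line in lines))       # longest line, in characters
print(len(b"".join(lines)) / len(raw))        # size growth before line breaks

# Decoding the joined lines gives back the original bytes exactly.
assert base64.b64decode(b"".join(lines)) == raw
```

Every three bytes of input become four printable characters of output, which is where the one-third growth comes from.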

So, we live in a world where my daughter can email me a video of my grandchildren, which still amazes and delights me, but that 6 megabyte video becomes roughly 8 MB for transit because it has to be encoded as lines short enough to fit in 80 columns, safe for punch cards, just in case it needs to be printed on them.
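The arithmetic behind that figure can be sketched in a few lines of Python.  This assumes a megabyte of 1,000,000 bytes, 76-character base64 lines, and a two-byte end-of-line marker on each line; the function name base64_transit_size is made up for the example, and real messages add headers and other overhead that this ignores.

```python
def base64_transit_size(n_bytes: int, line_len: int = 76, eol_len: int = 2) -> int:
    """Estimate the size of n_bytes after base64 encoding with line breaks.

    Hypothetical helper; the defaults are assumptions stated above.
    """
    # Every 3 input bytes (rounded up) become 4 output characters.
    encoded = 4 * ((n_bytes + 2) // 3)
    # The encoded text is cut into lines, each ending in an end-of-line marker.
    n_lines = (encoded + line_len - 1) // line_len
    return encoded + n_lines * eol_len


video = 6_000_000
print(base64_transit_size(video, eol_len=0))  # 8000000: the encoding alone
print(base64_transit_size(video))             # 8210528: with line endings
```

So the 33% comes from the encoding itself, and the line breaks that keep everything punch-card safe add a couple of percent on top.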

Protocols, like animals, evolve to produce solutions that work, not necessarily solutions that are optimal or elegant.  We walk upright with a quadruped's backbone, and email transmits video with a punch card's line format.

As long as it ain't broke, we probably won't fix it.

Image via Luke Sheppard