Geoff Huston recently discussed the processes behind putting together an RFC, and the history of why they came to be, so today we’re going to explore another aspect of what makes RFCs the way they are — their plain text format.
If you’ve ever seen court documents, you’ll have noticed they’re typeset in a very defined manner. They have a structure that is required by the court, using a monospaced font with double spacing. If you breach these basics, your document might be rejected. Some courts are very specific in what they accept, in terms of font size and weight, and use of overstrike or drawn lines.
Why is this? It’s because these strict requirements help the courts do their job. Narrow specifications help avoid confusion by preventing people from choosing weird fonts or formatting.
IETF documents end up looking quite a bit like those legal documents above. Although it’s 2021, and we’re up in the 9,000s of documents published as RFCs, the IETF still specifies a remarkably simple, textual form of document.
Let’s explore it.
It started with four simple guidelines
That the RFC series starts with RFC 1 shouldn’t come as a huge surprise (although, given they are written by computer scientists, you might be forgiven for asking why they don’t start with RFC 0).
However, it’s notable that RFC 3 is when the first hints of ‘how to write an RFC’ kick in. Authored by Steve Crocker in 1969, when he was a research student along with Jon Postel and Vint Cerf, it is admirably brief:
DOCUMENTATION CONVENTIONS
CONTENT
The content of a NWG note may be any thought, suggestion, etc. related to the HOST software or other aspect of the network. Notes are encouraged to be timely rather than polished. Philosophical positions without examples or other specifics, specific suggestions or implementation techniques without introductory or background explication, and explicit questions without any attempted answers are all acceptable. The minimum length for a NWG note is one sentence.
These standards (or lack of them) are stated explicitly for two reasons. First, there is a tendency to view a written statement as ipso facto authoritative, and we hope to promote the exchange and discussion of considerably less than authoritative ideas. Second, there is a natural hesitancy to publish something unpolished, and we hope to ease this inhibition.
FORM
Every NWG note should bear the following information:
1. "Network Working Group" "Request for Comments:" x where x is a serial number. Serial numbers are assigned by Bill Duvall at SRI
2. Author and affiliation
3. Date
4. Title. The title need not be unique.
The idea of calling the notes ‘Requests for Comments’ came from Bill Duvall. The format and content remained the same until RFC 10, when the distribution list was altered. The sequence numbering was maintained by Bill Duvall, and people were happy to share ideas and thoughts. More revisions to distribution happened periodically in RFCs 16, 24, 27, and 30.
This guidance was simple, and it gave each note a unique identity via the RFC number, along with context and attribution.
Steve Crocker managed the RFC series until May 1971 when he departed UCLA to work at ARPA. Jon Postel nurtured and evolved the series for many decades thereafter.
Then came Postel’s formatting rules
The first RFCs were typeset in a reasonable approximation of what was to follow. They were perhaps a little rough and ready, but they basically met the goal of being readable on a computer of the day, or when printed out according to the US legal page size, which had lines of 80 characters.
Right up until the modern era, when RFC 7994 was published, the end goal for an RFC was to be defined by the ASCII textual version.
This textual format has deep roots in RFCs. It goes all the way back to RFC 825 in 1982, which was authored by Jon Postel, the man who was basically the original Internet Assigned Numbers Authority (IANA). In this RFC, Jon laid the roots of the format to document ‘Requests for Comments’ to be widely readable, and therefore useful across the community developing Internet protocols and standards.
RFC 825 was titled ‘Request for Comments on Requests for Comments’, which is pleasantly recursive by nature. It was only two pages long.
[....] the following rules are established for the format of RFCs:
- The character codes are ASCII.
- Each page must be limited to 58 lines followed by a form feed on a line by itself.
- Each line must be limited to 72 characters followed by carriage return and line feed.
- No overstriking (or underlining) is allowed.
- These "height" and "width" constraints include any headers, footers, page numbers, or left side indenting.
In the above excerpt you can see the beginnings of the structural form that is still being used by many people.
What were the rules?
Each page must be limited to 58 lines followed by a form feed on a line by itself
This constraint establishes a vertical size suitable for printing on US legal paper, most likely on a line printer, which would typically print 66 lines of text per page, each 132 characters wide, in a fixed-width font. So, an RFC was defined to fit ‘inside’ this limit, leaving eight spare lines (66 minus 58) to share between the top and bottom margins. The rule also calls for a ‘form feed’ on a line by itself, a clear signal of the intention to print on a line printer: a bare line containing a form feed is the normal ASCII-encoded signal to a printer to advance to a new sheet.
Each line must be limited to 72 characters, followed by carriage return and line feed
This directive is interesting, because it doesn’t directly allow two-up page printing on a line-printer of only 132 characters width. That would require 144 characters. This constraint comes from the dimensions of a standard teletype terminal device, VT100 class, or similar. The standard ‘line length’ in a terminal like this was 80 characters, and so many products were specified to use 80 character line widths. Hollerith Punched Cards, which were still widely used for data entry, are 80 characters wide. So, the maximum you could ordinarily use on a line was 80 characters.
So, why 72 rather than 80? For wiggle room, basically. This comes from the practice of emailing text around. With an arbitrary limit like 80 (such as used on punched cards and display terminals) it is wise to stay inside the limit. With end-of-line markers being either a bare line-feed, or sometimes a combination of the line-feed and carriage-return ASCII markers (in fact, this is the key difference of opinion about line marks between UNIX users and Microsoft/DOS/Windows users), you need to leave at least one, and possibly two, spaces spare inside the 80 on each side. This means you can be sure the line still ‘fits’ if, by accident, the line-end marks become printable.
There’s also another reason. Email was defined in RFC 822 and related specifications to prefer lines of no more than 76 characters. This is said to stem from the characters-per-line limit inherited from the teletype.
I personally think it’s a reflection of the emerging email ‘quoting’ model, which uses the ‘>’ marker at the front of quoted text. Seventy-six character lines, in an 80 character terminal, would permit either four characters of indent, or four levels of ‘>’ quoting in a sequence. This means you can rationally read formatted text, which is typeset to be readable inside this limit if somebody quotes it.
And indeed, the UNIX ‘fmt’ command sets a 60 character line by default, with 10 characters of ‘overflow’ (to avoid word hyphenation) and recommends a maximum 72 characters in some manual pages. This also permits repeated quoting inside email.
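The wiggle-room argument above can be checked with a little arithmetic. The following is only a rough sketch: it assumes an 80-column terminal and a single-character ‘>’ quote marker with no trailing space, and the helper names `max_quote_depth` and `quote` are made up for this illustration.

```python
import textwrap

TERMINAL_WIDTH = 80  # assumed display width, as discussed above


def max_quote_depth(line_limit, marker=">"):
    """Return how many quote markers fit in front of a full-length line."""
    return (TERMINAL_WIDTH - line_limit) // len(marker)


def quote(text, depth, marker=">"):
    """Prefix every line of text with `depth` copies of the quote marker."""
    prefix = marker * depth
    return "\n".join(prefix + line for line in text.splitlines())


sample = "word " * 60
wrapped = textwrap.fill(sample, width=72)       # RFC-style 72-column text
quoted = quote(wrapped, max_quote_depth(72))    # quote it as deeply as fits

print(max_quote_depth(76))  # 4
print(max_quote_depth(72))  # 8
print(max(len(line) for line in quoted.splitlines()) <= TERMINAL_WIDTH)  # True
```

With a two-character marker such as ‘> ’, the same 72-column text would survive four rounds of quoting rather than eight.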
No overstriking (or underlining) is allowed
This rule is because there were many overstrike typography tricks in the time of line printers that did not work reliably in all output media. Some would ignore them. Some would do the right thing and overstrike. Some would print the text twice. Some would misinterpret the ASCII overstrike commands and shift into bizarre fonts and typesetting.
In any case, the use of overstrike was damaging to line-printer heads. As a student at the time, I was forbidden to print ASCII art because of excessive ink consumption, partly due to overstrike causing damage to the line printer.
These height and width constraints include any headers, footers, page numbers, or left-side indenting
This is interesting, because it guaranteed that the text you received was laid out exactly as specified (not always a given when things like headers and footers sneak in). Each page would be at most 58 lines, and each line at most 72 characters.
Pages were baked in. The idea that you had to be paginated, and had to subsequently repeat things like document title, author, page number at set intervals, followed on.
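Taken together, the RFC 825 rules are simple enough to check mechanically. The sketch below is one possible reading of them, not an official tool: the function name `check_rfc825_layout` is invented here, and it assumes that pages are separated by a form feed and that overstriking shows up as a backspace or as a carriage return with no line feed after it.

```python
import re

MAX_LINES_PER_PAGE = 58   # from the RFC 825 rules quoted above
MAX_LINE_LENGTH = 72      # excludes the trailing carriage return and line feed


def check_rfc825_layout(text):
    """Return a list of rule violations found in `text` (empty if none)."""
    problems = []
    if not text.isascii():
        problems.append("contains non-ASCII characters")
    # Assumption: overstriking appears as a backspace, or as a carriage
    # return that is not followed by a line feed.
    if "\b" in text or re.search(r"\r(?!\n)", text):
        problems.append("contains overstrike control characters")
    # Assumption: a form feed (plus the line ending after it) separates pages.
    pages = re.split(r"\f(?:\r?\n)?", text)
    for page_number, page in enumerate(pages, start=1):
        lines = page.splitlines()
        if len(lines) > MAX_LINES_PER_PAGE:
            problems.append(f"page {page_number}: {len(lines)} lines")
        for line_number, line in enumerate(lines, start=1):
            if len(line) > MAX_LINE_LENGTH:
                problems.append(
                    f"page {page_number}, line {line_number}: "
                    f"{len(line)} characters"
                )
    return problems


good_page = "\r\n".join(["x" * 72] * 58) + "\r\n"
print(check_rfc825_layout(good_page + "\f\r\n" + good_page))  # []
print(check_rfc825_layout("y" * 73 + "\r\n"))
# ['page 1, line 1: 73 characters']
```

Because the limits include headers, footers, and indentation, the check deliberately treats every line on the page alike.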
The rules kept growing
In 1989, along came RFC 1111 with more constraints. This document was still ‘Request for Comments on Request for Comments’ but was usefully subtitled ‘Instructions to RFC Authors’. RFC 1111 was more prescriptive about the structure of the document as a whole, as well as format and syntax:
- Do not fill the text with extra spaces to provide a straight right margin. This is just about readability. People were getting annoyed by lines with too much white space from aggressive right justification.
- Do not do hyphenation of words at the right margin. This makes sense because specific terms can’t be matched properly by automatic processing if they’re hyphenated. So, rather than invoke hyphenation, a ragged margin means you can put the whole word on the next line.
- Do not use footnotes. If such notes are necessary, put them at the end of a section, or at the end of the document. This rule starts to make it clear that the text has to be ‘at the same level’ and can’t have comments or parenthetical remarks, but should be clear in and of itself, or else refer to text elsewhere in the document that can be read in flow.
- Use single spaced text within a paragraph, and one blank line between paragraphs. These formatting requirements were to aid comprehension and density of the text, I suppose.
A more interesting directive preceded the formatting. This was the beginning of the IETF document purpose/requirements section, which still exists to this day for things like copyright and clarification of normative language use.
Each RFC is to include on its title page or in the first or second paragraph a statement describing the intention of the RFC. The following sample paragraphs may be used to satisfy this requirement:
Specification: This RFC specifies a standard for the ARPA Internet community. Hosts on the ARPA Internet are expected to adopt and implement this standard.
Discussion: The purpose of this RFC is to focus discussion on particular problems in the ARPA Internet and possible methods of solution. No proposed solutions in this document are intended as standards at this time. Rather, it is hoped that a general consensus will emerge as to the appropriate solution to such problems, leading eventually to the adoption of standards.
Information: This RFC is presented to members of the ARPA Internet community in order to solicit their reactions to the proposals contained in it. While perhaps the issues discussed are not directly relevant to the research problems of the ARPA Internet, they may be particularly interesting to some researchers and implementers.
Status: This RFC is issued in response to the need for current information about the status and progress of various projects in the ARPA Internet community. The information contained in this document is accurate as of the date of publication, but is subject to change. Subsequent RFCs may reflect such changes.
Report: This RFC is issued to report on the results of a meeting. It may document significant decisions made that impact the implementation of network protocols, or limit or expand the use of optional features of protocols. Other meeting results may be indicated including (but not limited to) policy issues, technical topics discussed and problems needing further work.
Of course these paragraphs need not be followed word for word, but the general intent of the RFC must be made clear.
Jon’s document also refers to PostScript, and there are clues as to how RFCs would address a world that no longer revolved around fixed-width fonts. However, it was clear the PostScript version of the RFC was intended to be pretty but not canonical.
This RFC also introduces how the N/ROFF markup method can be used to formalize typesetting in an RFC. N/ROFF is the normal typesetting system for UNIX, and can produce text for terminals, or be used to make text for printing. T/ROFF and N/ROFF can produce camera-ready copy or PostScript, and continue to be supported to this day for submitting RFCs.
The RFC had grown to six pages long, at this point.
The document then added the concept of ‘revisions’ to prior work.
6. Relation to other RFCs
Sometimes an RFC adds information on a topic discussed in a previous RFC or completely replaces an earlier RFC. There are two terms used for these cases respectively, UPDATES and OBSOLETES. A document that obsoletes an earlier document can stand on its own. A document that merely updates an earlier document cannot stand on its own; it is something that must be added to or inserted into the existing document, and has limited usefulness independently. The terms SUPERSEDES and REPLACES are no longer used.
Not surprisingly, this document says in its header — Obsoletes: 825, and defines the ‘Obsoleted by’ header, which means old and new documents are mutually linked in the ‘metadata’ part, so you can see what superseded it and what’s been replaced.
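The mutual linking described above amounts to keeping a relation and its inverse in step. As a minimal sketch (the `invert` helper and the dictionary layout are assumptions made for illustration, not how the RFC Editor stores its metadata), the ‘Obsoleted by’ view can be derived from the ‘Obsoletes’ headers:

```python
from collections import defaultdict

# "Obsoletes" headers as declared by the newer documents. The single entry
# here comes from the RFC 1111 header mentioned above; more can be added.
obsoletes = {
    1111: [825],
}


def invert(relation):
    """Build the reverse mapping (e.g. 'Obsoleted by') from a forward one."""
    reverse = defaultdict(list)
    for newer, older_documents in relation.items():
        for older in older_documents:
            reverse[older].append(newer)
    return dict(reverse)


obsoleted_by = invert(obsoletes)
print(obsoleted_by)  # {825: [1111]}
```

The same helper would work for an ‘Updates’ relation, since it only depends on the shape of the mapping.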
This version lasted until 1993, when RFC 1543, ‘Instructions to RFC Authors’, was published. The text above was left unchanged, but a new section was added:
Note that the number of pages in a document and the page numbers on which various sections fall will likely change with reformatting. Thus cross references in the text by section number usually are easier to keep consistent than cross references by page number.
This reflected the emerging reality that page length was not consistent across all systems. Aside from the USA, the world overwhelmingly preferred to use A4 paper.
As a result, in-document page references were not going to continue working and text would need structural markers. There is even more direction to authorship, editorial, intent, and purpose, reflecting the increasing formalism of the IETF process overall.
The document had now grown from its two-page origins to 16 pages long. It has the N/ROFF directives to produce the format, so much of the growth is because it now explains more about how to make things work.
There’s also a higher process barrier now. The Status section has been updated to refer to ‘Standards Track’ and ‘Experimental’ because of the increased requirements to distinguish an RFC from a standards document. Also added was the ‘Security Considerations’ section. The IETF is now much more aware of its requirements to try to prevent bad things from happening on the wire.
1997 arrives, and so does a revision to the RFC guidance
In 1997 we got RFC 2223. The document has grown to 20 pages. As far as I can tell, it didn’t substantively alter the format. It did address emerging formalisms around different document statuses. The document had gotten big enough to require a table of contents occupying the whole of the first page and added three new sections: Copyright Notice, Copyright Section and Full Copyright statement. The first and last are boilerplate declarations of form to the intellectual property rights inherent in the IETF documents. The middle one specifies that RFCs need to have copyright statements.
RFC 2223bis and the rise of XML
Sadly, in 1998 Jon Postel passed away, unexpectedly, after complications from heart surgery. This had a huge impact on the document process, because Jon was fundamental to the idea of the RFC, the numbering, and the editorial process. His death left a huge hole in the IETF process, but the inherent problems with a textually defined standard specification were apparent before this time.
From 1997 onwards, work to refine the RFC document series continued, with continuity from Joyce Reynolds, the RFC editor who had overseen much of the emerging process with Jon. This process led to RFC 2223bis, which went through eight revisions between 2002 and 2004. Note: ‘bis’ and ‘ter’ are Latin-derived terms, also used in French, meaning ‘twice’ and ‘thrice’. They are used in many other standards processes to mark a second or third refinement of a document.
As noted in RFC 2223bis’ document tracker:
IESG note: Probably will emerge in a completely different form when other RFC Editor related drafts have been approved by IAB (2/07)
Indeed it did. But, interestingly, this draft notes the emergence of Extensible Markup Language (XML) for the first time in the RFC formatting specifications. XML aided immensely in making RFCs understood by machines.
Even though the web had been a lived reality for almost a decade, the RFC series continued to be specified in terms of the ASCII text and use of the N/ROFF package (with some side notes about PostScript printable format). Now, for the first time, a structured text model emerged. This is interesting, because unlike N/ROFF, which merely marks how to lay out the letters in the line and page, XML specifies the logical hierarchy and functional role of the body of text.
If something is a section header, it isn’t using the .SH directive to define a page. Instead it’s in XML markup as <section>……</section>, so the machine-processing possibilities of recognizing this as a section become more clear.
It means things like machine processing, formal proofs, completeness checks, and future reformatting can be done more easily, because the abstraction of the structural form of the RFC has become clear in how it is presented. The document also shows how other word processing systems like Microsoft Word and LaTeX can be used (which, of course, had been a reality for some time, but this is the formal entry of them into the specification process for an RFC).
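To make the machine-processing point concrete, here is a small sketch that rebuilds a numbered outline from nested section elements. The XML fragment is invented for this example, loosely following the RFC 2629 style of a ‘title’ attribute on each section, and the `outline` helper is hypothetical rather than part of any RFC tooling.

```python
import xml.etree.ElementTree as ET

# A made-up fragment in the spirit of RFC 2629 markup; the titles and
# text are invented for this example.
SOURCE = """
<rfc>
  <middle>
    <section title="Introduction">
      <t>Some introductory text.</t>
      <section title="Terminology">
        <t>Definitions go here.</t>
      </section>
    </section>
    <section title="Security Considerations">
      <t>None.</t>
    </section>
  </middle>
</rfc>
"""


def outline(element, prefix=""):
    """Yield (number, title) pairs for every nested <section>."""
    for index, section in enumerate(element.findall("section"), start=1):
        number = f"{prefix}{index}."
        yield number, section.get("title")
        yield from outline(section, number)


root = ET.fromstring(SOURCE)
for number, title in outline(root.find("middle")):
    print(number, title)
```

Nothing in the source says where a page ends or how a heading is indented, yet the section numbering falls out of the structure alone, which is much harder to recover reliably from layout directives.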
At the start of this drafting process in 2002, the document was 30 pages long. By 2007, the document was 43 pages long. Even at the start (’00’ revision) of the process, the document included the entire ASCII character set as a definition, presumably because of disagreements about the encoding of form and page feed, and other characters in the datastream. The old 58/72 rule has been significantly augmented with directions as to how to encode the page throw and why there is a ragged right margin.
But by 2007 (’08’ revision), the ASCII table had gone, although the increased direction on how to paginate remains. Both 00 and 08 refer to the emerging reality of XML specification through references to Marshall T. Rose’s RFC 2629, from 1999. For three whole years prior to the bis process, an RFC had documented how to write an RFC in XML. 2007 was important because it formalized the inclusion of XML.
2013 and the acceptance of XML as the canonical form
With the publication of RFC 6949, authored by Heather Flanagan (the RFC series editor since 2012) and Nevil Brownlee (the independent submissions editor), the IETF set the scene for XML as a ‘first class’ RFC state, putting it in the same category as N/ROFF, plaintext, and Word documents.
Over 40 years ago, the RFC Series began as a collection of memos in an environment that included handwritten RFCs, typewritten RFCs, RFCs produced on mainframes with complicated layout tools, and more. As the tools changed and some of the source formats became unreadable, the core individuals behind the Series realized that a common format that could be read, revised, and archived long in the future was required. US-ASCII was chosen for the encoding of characters, and after a period of variability, a well-defined presentation format was settled upon. That format has proved to be persistent and reliable across a large variety of devices, operating systems, and editing tools. That stability has been a continuing strength of the Series. However, as new technology, such as small devices and advances in display technology, comes into common usage, there is a growing desire to see the format of the RFC Series adapt to take advantage of these different ways to communicate information. Since the format stabilized, authors and readers have suggested enhancements to the format. However, no suggestion developed clear consensus in the Internet technical community. As always, some individuals see no need for change, while others press strongly for specific enhancements.
This was released in an email written by Heather Flanagan, who was also the primary editor, author, and coordinator of community input into the design process, as subsequently documented in a series of RFCs, beginning with RFC 7322 in 2014. This RFC continued the fine tradition of keeping things simple and had one poignant nod to the past in its early text:
The ultimate goal of the RFC publication process is to produce documents that are readable, clear, consistent, and reasonably uniform. The basic formatting conventions for RFCs were established in the 1970s by the original RFC Editor, Jon Postel. This document describes the fundamental and unique style conventions and editorial policies currently in use for the RFC Series [RFC4844]. It is intended as a stable, infrequently updated reference for authors, editors, and reviewers.
But it also carried some seeds from that early, 1970s-era vision of what an RFC needed:
3.2. Punctuation
* No overstriking (or underlining) is allowed.
It’s 2014 and we’re finally defining XML as the formal definition of what an RFC ‘is’ (with a nod to plain ASCII), and we still make sure it prints properly, even if you can’t process overstrike characters!
Good old IETF!
This blog article was amended 30/08/2021 to clarify the critical role of Steve Crocker and Bill Duvall before RFC 825, with thanks to Steve Crocker and Vint Cerf for their detailed correction.