Notation and conventions
These conventions apply, at least in theory, to all of the specification documents unless stated otherwise.
Remember, our specification documents were once a collection of separate text files, written separately and edited over the course of years.
While we are trying (as of 2023) to edit them into consistency, you should be aware that these conventions are not now followed uniformly everywhere.
MUST, SHOULD, and so on
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Data lengths
Lengths are given as a number of 8-bit bytes.
All bytes are 8 bits long. We sometimes call them "octets"; the terms as used here are interchangeable.
When referring to longer lengths, we use IEC binary prefixes (as in "kibibytes", "mebibytes", and so on) to refer unambiguously to increments of 1024^X bytes.
If you encounter a reference to "kilobytes", "megabytes", or so on, you cannot safely infer whether the author intended a decimal (1000^N) or binary (1024^N) interpretation. In these cases, it is better to revise the specifications.
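As a rough illustration of why the ambiguity matters, the sketch below compares the two readings of a prefix. The function and constant names are hypothetical and appear nowhere in the specifications.

```python
def binary_unit(exponent: int) -> int:
    """Size in bytes of a binary-prefixed unit: 1024^exponent."""
    return 1024 ** exponent


def decimal_unit(exponent: int) -> int:
    """Size in bytes of a decimal-prefixed unit: 1000^exponent."""
    return 1000 ** exponent


KIBIBYTE = binary_unit(1)  # 1024 bytes
MEBIBYTE = binary_unit(2)  # 1048576 bytes

# How far apart the two possible readings of "megabyte" are, in bytes.
ambiguity = MEBIBYTE - decimal_unit(2)
print(KIBIBYTE, MEBIBYTE, ambiguity)
```

The gap grows with each prefix step, which is one reason an ambiguous "megabytes" is better fixed in the text than guessed at.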
Integer encoding
Multi-byte integers are encoded in big-endian ("network") order.
For example, 4660 (0x1234), when encoded as a two-byte integer, is the byte 0x12 followed by the byte 0x34 ([12 34]).
When encoded as a four-byte integer, it is the byte 0x00, the byte 0x00, the byte 0x12, and the byte 0x34 ([00 00 12 34]).
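The example above can be reproduced with a minimal sketch like the following. The helper names `encode_uint` and `decode_uint` are hypothetical and chosen only for illustration.

```python
def encode_uint(value: int, length: int) -> bytes:
    """Encode a non-negative integer as `length` bytes in big-endian order."""
    return value.to_bytes(length, byteorder="big")


def decode_uint(data: bytes) -> int:
    """Decode a big-endian byte string back into an integer."""
    return int.from_bytes(data, byteorder="big")


two_bytes = encode_uint(0x1234, 2)   # the bytes 12 34
four_bytes = encode_uint(0x1234, 4)  # the bytes 00 00 12 34
print(two_bytes.hex(" "), "/", four_bytes.hex(" "))
```

Note that `to_bytes` raises `OverflowError` if the value does not fit in the requested length, so a caller does not silently truncate.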
Textual formats
The Tor protocols involve textual elements (for example, text files such as netdocs, command lines, and bridge lines).
Strings, including keywords (for example, netdoc Item keywords, network parameters, and router flags), are compared for equality as byte strings. Likewise, "lexical ordering" of strings is lexical ordering of byte strings.
This means that these keywords are compared case-sensitively, without any kind of Unicode normalisation, etc.
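A short sketch of what byte-string comparison implies in practice is given below. The function name `same_keyword` and the sample strings are illustrative assumptions; only the comparison rule itself comes from the text above.

```python
import unicodedata


def same_keyword(a: bytes, b: bytes) -> bool:
    """Plain byte equality: no case folding and no Unicode normalisation."""
    return a == b


# Two encodings of a visually identical (hypothetical) string:
# precomposed e-acute versus "e" followed by a combining accent.
composed = unicodedata.normalize("NFC", "caf\u00e9").encode("utf-8")
decomposed = unicodedata.normalize("NFD", "caf\u00e9").encode("utf-8")

case_matches = same_keyword(b"Guard", b"guard")         # False: case differs
normalised_matches = same_keyword(composed, decomposed)  # False: bytes differ

# Lexical ordering is by byte value, so ASCII uppercase sorts before lowercase.
ordered = sorted([b"b", b"a", b"B", b"A"])
print(case_matches, normalised_matches, ordered)
```

An implementation that decodes to a native string type before comparing should take care that the type's own equality does not fold case or normalise.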
Binary-as-text encodings
When we refer to "base64", "base32", or "base16", we mean the encodings described in RFC 4648, with the following notes:
- In base32, we never insert linefeeds, and we omit trailing = padding characters.
- In base64, we sometimes omit trailing = padding characters, and we do not insert linefeeds unless explicitly noted.
- We do not insert any other whitespace, except as specifically noted.
Base16 and base32 are case-insensitive. Implementations should accept any case, and should produce a single uniform case.
We sometimes refer to base16 as "hex" or "hexadecimal".
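The notes above might be handled as in the following sketch, which uses Python's standard `base64` module. All helper names are hypothetical, and it is an assumption of this example, not a rule from the specifications, that the decoders restore padding before decoding.

```python
import base64


def b32_nopad(data: bytes) -> str:
    """Base32-encode without linefeeds and without trailing '=' padding."""
    return base64.b32encode(data).decode("ascii").rstrip("=")


def b32_decode_nopad(text: str) -> bytes:
    """Decode unpadded base32 in either case."""
    text = text.upper()
    text += "=" * (-len(text) % 8)  # restore the padding RFC 4648 expects
    return base64.b32decode(text)


def b64_nopad(data: bytes) -> str:
    """Base64-encode without linefeeds and without trailing '=' padding."""
    return base64.b64encode(data).decode("ascii").rstrip("=")


def b64_decode_maybe_padded(text: str) -> bytes:
    """Decode base64 whether or not the trailing '=' padding is present."""
    text += "=" * (-len(text) % 4)
    return base64.b64decode(text)


def hex_decode(text: str) -> bytes:
    """Decode base16; bytes.fromhex accepts upper and lower case digits."""
    return bytes.fromhex(text)


print(b32_nopad(b"fo"), b64_nopad(b"f"), hex_decode("6F6e"))
```

The inputs `"fo"` and `"f"` are taken from the test vectors in RFC 4648, whose padded forms are `MZXQ====` and `Zg==`.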
Note that as of 2023, in some places, the specs are not always explicit about:
- which base64 strings are multiline
- which base32 strings and base16 strings should be generated in what case.
This is something we should correct.
Notation
Operations on byte strings
A | B represents the concatenation of two binary strings A and B.
Binary literals
When we write a series of one-byte hexadecimal literals in square brackets, it represents a multi-byte binary string.
For example, [6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67] is a 13-byte sequence representing the unterminated ASCII string "onion routing".
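To make the bracket notation and the concatenation operator concrete, here is a minimal sketch that parses such literals into byte strings. The helper `parse_literal` is hypothetical. Note also that Python spells concatenation of byte strings as `+`, since `|` is not defined for its `bytes` type.

```python
def parse_literal(text: str) -> bytes:
    """Parse a bracketed literal such as "[12 34]" into a byte string."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("expected a literal of the form [xx xx ...]")
    # bytes.fromhex ignores the spaces between the two-digit groups.
    return bytes.fromhex(inner[1:-1])


onion_routing = parse_literal("[6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67]")

# A | B in the notation above corresponds to A + B here.
a = parse_literal("[6f 6e 69 6f 6e]")
b = parse_literal("[20 72 6f 75 74 69 6e 67]")
joined = a + b

print(len(onion_routing), onion_routing, joined == onion_routing)
```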