Relay cells
Within a circuit, the client and the end node use the contents of relay cells to tunnel end-to-end commands and TCP connections ("Streams") across circuits. End-to-end commands can be initiated by either edge; streams are initiated by the client.
End nodes that accept streams may be:
- exit relays (RELAY_BEGIN, anonymous),
- directory servers (RELAY_BEGIN_DIR, anonymous or non-anonymous),
- onion services (RELAY_BEGIN, anonymous via a rendezvous point).
The body of each unencrypted relay cell consists of an enveloped relay message, encoded as follows:
Field | Size |
---|---|
Relay command | 1 byte |
'Recognized' | 2 bytes |
StreamID | 2 bytes |
Digest | 4 bytes |
Length | 2 bytes |
Data | Length bytes |
Padding | CELL_BODY_LEN - 11 - Length bytes |
TODO: When we implement prop340, we should clarify which parts of the above are about the relay cell, and which are the enveloped message.
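The layout in the table above can be sketched with Python's `struct` module. This is a minimal illustration, not a reference implementation: the value `CELL_BODY_LEN = 509`, the big-endian (network order) integer encoding, and the helper names `encode_relay_body` and `decode_relay_body` are assumptions made for the example and are not defined in this section.

```python
import struct

CELL_BODY_LEN = 509  # assumed value; defined elsewhere in the spec

# Relay command (1), 'Recognized' (2), StreamID (2), Digest (4), Length (2)
HEADER = struct.Struct(">BHH4sH")  # 11 bytes, assuming network byte order


def encode_relay_body(command, stream_id, data, digest=b"\x00" * 4, padding=None):
    """Build an unencrypted relay cell body (hypothetical helper)."""
    max_data = CELL_BODY_LEN - HEADER.size
    if len(data) > max_data:
        raise ValueError("data does not fit in one relay cell")
    pad_len = max_data - len(data)
    if padding is None:
        padding = b"\x00" * pad_len  # zero padding, for simplicity only
    if len(padding) != pad_len:
        raise ValueError("padding has the wrong length")
    # 'Recognized' is always zero in the unencrypted form.
    return HEADER.pack(command, 0, stream_id, digest, len(data)) + data + padding


def decode_relay_body(body):
    """Split a decrypted relay cell body into its fields (hypothetical helper)."""
    if len(body) != CELL_BODY_LEN:
        raise ValueError("unexpected body length")
    command, recognized, stream_id, digest, length = HEADER.unpack_from(body)
    if length > CELL_BODY_LEN - HEADER.size:
        raise ValueError("Length field exceeds the cell body")
    data = body[HEADER.size:HEADER.size + length]
    return command, recognized, stream_id, digest, data


body = encode_relay_body(2, 7, b"hello")  # a DATA message on stream 7
print(len(body), decode_relay_body(body))
```

The decoder rejects a 'Length' value larger than the space left after the 11-byte header, since such a value cannot describe a valid message.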
The relay commands are:
Command | Identifier | Type | Description | |
---|---|---|---|---|
Core protocol | ||||
1 | BEGIN | F | Open a stream | |
2 | DATA | F/B | Transmit data | |
3 | END | F/B | Close a stream | |
4 | CONNECTED | B | Stream has successfully opened | |
5 | SENDME | F/B, C? | Acknowledge traffic | |
6 | EXTEND | F, C | Extend a circuit with TAP (obsolete) | |
7 | EXTENDED | B, C | Finish extending a circuit with TAP (obsolete) | |
8 | TRUNCATE | F, C | Remove nodes from a circuit (unused) | |
9 | TRUNCATED | B, C | Report circuit truncation (unused) | |
10 | DROP | F/B, C | Long-range padding | |
11 | RESOLVE | F | Hostname lookup | |
12 | RESOLVED | B | Hostname lookup reply | |
13 | BEGIN_DIR | F | Open stream to directory cache | |
14 | EXTEND2 | F, C | Extend a circuit | |
15 | EXTENDED2 | B, C | Finish extending a circuit | |
16..18 | Reserved | | For UDP; see prop339. | |
Conflux | ||||
19 | CONFLUX_LINK | F, C | Link circuits into a bundle | |
20 | CONFLUX_LINKED | B, C | Acknowledge link request | |
21 | CONFLUX_LINKED_ACK | F, C | Acknowledge CONFLUX_LINKED message (for timing) | |
22 | CONFLUX_SWITCH | F/B, C | Switch between circuits in a bundle | |
Onion services | ||||
32 | ESTABLISH_INTRO | F, C | Create introduction point | |
33 | ESTABLISH_RENDEZVOUS | F, C | Create rendezvous point | |
34 | INTRODUCE1 | F, C | Introduction request (to intro point) | |
35 | INTRODUCE2 | B, C | Introduction request (to service) | |
36 | RENDEZVOUS1 | F, C | Rendezvous request (to rendezvous point) | |
37 | RENDEZVOUS2 | B, C | Rendezvous request (to client) | |
38 | INTRO_ESTABLISHED | B, C | Acknowledge ESTABLISH_INTRO | |
39 | RENDEZVOUS_ESTABLISHED | B, C | Acknowledge ESTABLISH_RENDEZVOUS | |
40 | INTRODUCE_ACK | B, C | Acknowledge INTRODUCE1 | |
Circuit padding | ||||
41 | PADDING_NEGOTIATE | F, C | Negotiate circuit padding | |
42 | PADDING_NEGOTIATED | B, C | Negotiate circuit padding | |
Flow control | ||||
43 | XON | F/B | Stream-level flow control | |
44 | XOFF | F/B | Stream-level flow control | |
- F (Forward): Must only be sent by the originator of the circuit.
- B (Backward): Must only be sent by other nodes in the circuit back towards the originator.
- F/B (Forward or backward): May be sent in either direction.
- C (Control): Must have a zero-valued stream ID. (Other commands must have a nonzero stream ID.)
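The stream ID rule in the list above can be expressed as a small check. The sketch below covers only a handful of commands from the table; the `COMMANDS` mapping and the function `stream_id_ok` are hypothetical names for illustration. SENDME is marked "C?" in the table, so the sketch accepts either kind of stream ID for it.

```python
# name -> (command number, control flag)
# control flag: True  = "C", stream ID must be zero
#               False = stream-level, stream ID must be nonzero
#               None  = "C?", either is acceptable (SENDME)
COMMANDS = {
    "BEGIN": (1, False),
    "DATA": (2, False),
    "END": (3, False),
    "SENDME": (5, None),
    "DROP": (10, True),
    "EXTEND2": (14, True),
    "XOFF": (44, False),
}


def stream_id_ok(name, stream_id):
    """Return True if stream_id is acceptable for the named relay command."""
    if not 0 <= stream_id <= 0xFFFF:
        return False  # StreamID is a 2-byte field
    _, is_control = COMMANDS[name]
    if is_control is None:
        return True
    return (stream_id == 0) == is_control


print(stream_id_ok("DATA", 5), stream_id_ok("EXTEND2", 5))
```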
The 'recognized' field is used as a simple indication that the cell is still encrypted. It is an optimization to avoid calculating expensive digests for every cell. When sending cells, the unencrypted 'recognized' MUST be set to zero.
When receiving and decrypting cells, the 'recognized' field will always be zero if we're the endpoint that the cell is destined for. For cells that we should relay, the 'recognized' field will usually be nonzero, but will be zero by chance with probability 2^-16.
When handling a relay cell, if the 'recognized' field in a decrypted relay cell is zero, the 'digest' field is computed as the first four bytes of the running digest of all the bytes that have been destined for this hop of the circuit or originated from this hop of the circuit, seeded from Df or Db respectively (obtained in Setting circuit keys), and including this relay cell's entire body (taken with the digest field set to zero). Note that these digests do include the padding bytes at the end of the cell, not only those up to 'Length'. If the digest is correct, the cell is considered "recognized" for the purposes of decryption (see Routing relay cells).
(The digest does not include any bytes from relay cells that do not start or end at this hop of the circuit. That is, it does not include forwarded data. Therefore if 'recognized' is zero but the digest does not match, the running digest at that node should not be updated, and the cell should be forwarded on.)
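The rule that a failed digest check must leave the running digest untouched can be sketched with a tentative copy of the hash state. This is an illustration under assumptions: SHA-1 is used (as for ordinary circuits), "seeded" is interpreted as feeding the seed into the hash first, `CELL_BODY_LEN = 509` is assumed, and the seed value and the function name `try_recognize` are hypothetical.

```python
import hashlib
import hmac

CELL_BODY_LEN = 509  # assumed value; defined elsewhere in the spec


def try_recognize(running, body):
    """Return (recognized, state). The state only advances on a match."""
    if body[1:3] != b"\x00\x00":
        return False, running  # 'recognized' is nonzero: still encrypted
    # Hash the body with the digest field (bytes 5..9) set to zero.
    zeroed = body[:5] + b"\x00" * 4 + body[9:]
    candidate = running.copy()  # work on a copy so a mismatch changes nothing
    candidate.update(zeroed)
    if hmac.compare_digest(candidate.digest()[:4], body[5:9]):
        return True, candidate
    return False, running  # not for this hop: keep the old state, forward the cell


# Usage: a hypothetical sender and receiver seeded identically.
seed = b"hypothetical Df seed"
zeroed = bytes([2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 5]) + b"hello"
zeroed += b"\x00" * (CELL_BODY_LEN - len(zeroed))
sender = hashlib.sha1(seed)
sender.update(zeroed)
cell = zeroed[:5] + sender.digest()[:4] + zeroed[9:]

receiver = hashlib.sha1(seed)
ok, receiver = try_recognize(receiver, cell)
print(ok)
```

Copying the state before updating it is one way to avoid committing bytes from a cell that turns out to be forwarded data; an implementation could equally save and restore the state.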
All relay messages pertaining to the same tunneled stream have the same stream ID. StreamIDs are chosen arbitrarily by the client. No stream may have a StreamID of zero. Rather, relay messages that affect the entire circuit rather than a particular stream use a StreamID of zero -- they are marked in the table above as "C" ("control") style cells. (SENDME cells are marked as "sometimes control" because they can include a StreamID or not depending on their purpose -- see Flow control.)
The 'Length' field of a relay cell contains the number of bytes in the relay cell's body which contain the body of the message. The remainder of the unencrypted relay cell's body is padded with padding bytes. Implementations handle padding bytes of unencrypted relay cells as they do padding bytes for other cell types; see Cell Packet format.
The 'Padding' field is used to make relay cell contents unpredictable, to avoid certain attacks (see proposal 289 for rationale). Implementations SHOULD fill this field with four zero-valued bytes, followed by as many random bytes as will fit. (If there are fewer than 4 bytes for padding, then they should all be filled with zero.)
Implementations MUST NOT rely on the contents of the 'Padding' field.
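The recommended padding layout can be sketched as follows. The function name `make_padding` is hypothetical, and `os.urandom` stands in for whatever random source an implementation actually uses.

```python
import os


def make_padding(pad_len):
    """Up to four zero bytes, then random bytes for the remainder."""
    if pad_len < 0:
        raise ValueError("padding length cannot be negative")
    zeros = min(pad_len, 4)  # fewer than 4 bytes available: all of them are zero
    return b"\x00" * zeros + os.urandom(pad_len - zeros)


padding = make_padding(100)
print(len(padding), padding[:4])
```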
If the relay cell is recognized but the relay command is not understood, the cell must be dropped and ignored. Its contents still count with respect to the digests and flow control windows, though.
Calculating the 'Digest' field
The 'Digest' field is used to check whether a cell has been fully decrypted, that is, whether all onion layers have been removed. The 'Recognized' field alone is not sufficient for this, as outlined above.
In this section, we assume an incrementally updated hash function, where hash_calculate(state) computes the current digest, and hash_update(state,M) adjusts the hash function's state by adding M to its input.
For ordinary circuits, the hash function used here is SHA-1. For onion service circuits, the hash function is SHA3-256.
When ENCRYPTING a relay cell, an implementation does the following:
# Encode the cell in binary (recognized and digest set to zero)
tmp = cmd + [0, 0] + stream_id + [0, 0, 0, 0] + length + data + padding
# Update the hash state with the encoded data
hash_state = hash_update(hash_state, tmp)
digest = hash_calculate(hash_state)
# The encoded data is the same as above with the digest field not being
# zero anymore
encoded = cmd + [0, 0] + stream_id + digest[0..4] + length + data + padding
# Now we can encrypt the cell by adding the onion layers ...
When DECRYPTING a relay cell, an implementation does the following:
decrypted = decrypt(cell)
# Replace the digest field in decrypted by zeros
tmp = decrypted[0..5] + [0, 0, 0, 0] + decrypted[9..]
# Update the hash state with the decrypted data, with its digest field
# set to zero
hash_state = hash_update(hash_state, tmp)
digest = hash_calculate(hash_state)
if digest[0..4] == decrypted[5..9]
  # The cell has been fully decrypted ...
Note that only the binary data with the digest bytes set to zero is taken into account when calculating the running digest. The final plaintext cells (with the digest field set to its actual value) are not fed into the running digest.
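The point about which bytes enter the running digest can be checked with a short sender-side sketch. It assumes SHA-1, a seed fed into the hash first, `CELL_BODY_LEN = 509`, and big-endian field encoding; the seed value and the helpers `blank_body` and `seal` are hypothetical. Zero padding is used only to keep the example short.

```python
import hashlib
import struct

CELL_BODY_LEN = 509  # assumed value; defined elsewhere in the spec


def blank_body(command, stream_id, data):
    """Encode a relay cell body with 'recognized' and digest set to zero."""
    header = struct.pack(">BHH4sH", command, 0, stream_id, b"\x00" * 4, len(data))
    body = header + data
    return body + b"\x00" * (CELL_BODY_LEN - len(body))


def seal(running, zeroed_body):
    """Update the running digest and write its first 4 bytes into the cell."""
    running.update(zeroed_body)  # only the zero-digest form is hashed
    return zeroed_body[:5] + running.digest()[:4] + zeroed_body[9:]


seed = b"hypothetical Df seed"
state = hashlib.sha1(seed)
z1 = blank_body(2, 1, b"first")
z2 = blank_body(2, 1, b"second")
c1 = seal(state, z1)
c2 = seal(state, z2)

# The second digest covers the seed and both zero-digest bodies,
# not the sealed form of the first cell.
expected = hashlib.sha1(seed + z1 + z2).digest()[:4]
print(c2[5:9] == expected)
```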