Filename: 359-cgo-redux.md
Title: Counter Galois Onion, Updated
Author: Nick Mathewson
Created: 2024-04-02
Status: Open

Introduction

In Tor, a "Relay cryptography" algorithm is what we use to decrypt, encrypt, and authenticate RELAY and RELAY_EARLY cells.

This proposal describes a new approach for relay cryptography in Tor, instantiating the work of Degabriele, Melloni, Münch, and Stam. (See "Counter Galois Onion (CGO) for Tor: Fast Non-Malleable Onion Encryption".)

This document does not currently explain or justify the decisions involved in CGO, except in a few cases when necessary to avoid possible implementation errors.

We believe that the scheme provides the following advantages over Tor's current behavior:

The scheme is non-malleable. Any attempt to introduce a tag on a message will cause that message, and future messages, to become undecryptable.
The scheme provides additional forward security: after a party has decrypted a message, it no longer holds the material necessary to decrypt that message again.
This scheme has stronger protection against modification. (The previous one only had a 16 bit digest; this has 128 bits of authentication.)

Performance is expected to be faster than the existing Tor algorithm overall, due to no longer having to use SHA-1.

For some background and history here, see proposal 202, proposal 261, proposal 295, and proposal 308.

Defining Counter Galois Onion

Introduction and outline

Here we specify how to instantiate the Counter Galois Onion (CGO) encryption scheme. The description here is meant to match that in the paper and the reference implementation.

We will be presenting the algorithm from the bottom up:

At level 1, we'll define the symbols and cryptographic primitives we need.
At level 2, we'll use those primitives to define a tweakable block cipher and a tweakable PRF.
At level 3, we'll use those definitions to define a wide block cipher implementing the "Rugged Pseudorandom Permutation" security definition.
Finally, at level 4, we use that wide block cipher to construct a new set of forward- and reverse- direction onion encryption algorithms for use in Tor.

Preliminaries

We will give all lengths here as a number of bytes.

Define MSG_LEN as the length of the encrypted payload of a relay cell. In Tor, this is currently 509 bytes.

We will require below that several of our primitives work on data of the same size. This "block size" (BLK_LEN) will be the block size of our underlying block cipher, the output length of our universal hash, and other things too. As instantiated here, BLK_LEN is 16 bytes.

Level 1: Primitives

Let UH(K,M) be a universal hash taking a key K of length KLEN_UH and a message M of a given fixed length MLEN_UH(K), and outputting a tag of length BLK_LEN.

Let ENC_BC(K,M) / DEC_BC(K,M) be a block cipher taking a key of length KLEN_BC and a block of length BLK_LEN, and outputting a block of the same length.

Use || to mean concatenation.

Use ^ to mean bitwise xor. If the inputs are not the same length, extend the shorter one by appending 0s.

Use + to mean arithmetic addition.

Use * to mean arithmetic multiplication.

Use CEIL_DIV(a,b) to mean a divided by b, rounded up to the nearest whole number.

Instantiate UH as POLYVAL, giving KLEN_UH=16. If MLEN_UH is not a multiple of 16 (the input length of POLYVAL), pad its input with 0s.

Note: This padding is not safe if MLEN_UH is variable. As used here, every key used with UH is used only with messages of only a single length.

Instantiate ENC_BC/DEC_BC as AES-128, giving KLEN_BC=16.

Note: I want to change this to AES-192 or AES-256, but for now I am trying to specify the same thing as in the paper.

Define CTR_n(K,X) as ENC_BC used in counter mode, truncated to n bytes:

CTR_n(K,X) = ENC_BC(K, X) || ENC_BC(K, X+1) || ...`

Level 2: Building blocks

We use the above definitions to build ENC_ET(K,T,M)/DEC_ET(K,T,M): a tweakable block cipher taking a key of length KLEN_ET, a tweak of an fixed length TLEN_ET, and a block of length BLK_LEN, and producing a block of the same length.

We instantiate ENC_ET/DEC_ET using LRW2 (so-named in LRW2-NAME):

ENC_ET((KB,KU), T, M) = UH(KU,T) ^ ENC_BC(KB, M ^ UH(KU,T))
DEC_ET((KB,KU), T, M) = UH(KU,T) ^ DEC_BC(KB, M ^ UH(KU,T))

This gives: KLEN_ET = KLEN_UH + KLEN_BC. Here, MLEN_UH(KU) is TLEN_ET.

We also use the above definitions to build a pseudorandom function PRF((K, B), T, t), taking a key (K,B) of length KLEN_PRF, an input of T of length INLEN_PRF, and a single bit t, producing outputs of n0 or n1 bytes, depending on whether t=0 or t=1.

PRF((K, B), T, t) = CTR_{nt}(K, UH(B, T) + (t * C)).

where C > n0 / BLK_LEN. Here, MLEN_UH(B) is TLEN_ET.

Note: In the first version of the paper we saw, the POLYVAL operation above is given as XOR. In an email, Jean Paul said that it should be GF128_MUL, but they're actually using POLYVAL. I've put in POLYVAL (UH), so we can match the reference code.

This gives KLEN_PRF = KLEN_BC + BLEN_BC.

We will set n0 = MSG_LEN - BLK_LEN, and C >= CEIL_DIV(n0, BLK_LEN). (For our parameters, n0 = 509-16 = 493 and C = 31.) We define n1 later.

Note: since n0 depends on the message length to be encrypted, and n1 depends only on the algorithm itself, maybe it would make more sense to switch the two, eventually? This would let us specify the constant offset C as depending only on the length of the keys to be generated.

Layer 3: The UIV+ construction

Now we use the definitions above to define a tweakable updatable wide block cipher based on a Unilaterally Protected IV (UIV) construction, with key length KLEN_UIV and block length MSG_LEN. (This is not the original UIV construction, and so it is called "UIV+".)

It takes a tweak H of length HLEN_UIV and a key K=(J,S), where J has length KLEN_ET and S has length KLEN_PRF. It works on a ciphertext in two parts (X_L,X_R), where X_L has length of BLK_LEN, and X_R has length LEN_XR = MSG_LEN - BLK_LEN.

(This gives a key of length KLEN_UIV = KLEN_ET + KLEN_PRF.)

ENC_UIV((J,S), H, (X_L,X_R)):
   Y_L <-- ENC_ET(J, (H || X_R), X_L)
   Y_R <-- X_R ^ PRF_n0(S, Y_L, 0)
   return (Y_L, Y_R)

DEC_UIV((J,S), H, (Y_L,Y_R)):
   X_R <-- Y_R xor PRF_n0(S, Y_L, 0)
   X_L <-- DEC_ET(J, (H || X_R), Y_L)
   return (X_L, X_R)

The UIV+ construction also has an update function. Define n1 = KLEN_UIV + BLK_LEN, and:

UPDATE_UIV((J,S), N):
   ((J',S'), N') = PRF_{n1}(S, N, 1)
   return ((J', S'), N')

This construction above gives TLEN_ET = HLEN_UIV + LEN_XR.

Note an important property of UIV+: ENC_UIV is nonmalleable (and hence tagging resistant), but DEC_UIV is somewhat malleable (and hence not tagging resistant). This influences the choices of which UIV+ direction is used where: Don't swap them around!

The onion encryption algorithm

For each hop, in each direction, the client and the relay must share a key for the UIV construction (K), a nonce N, and a tag (T').

K (length KLEN_UIV) -- initialize from circuit handshake KDF.
N (length BLK_LEN) -- initialize from circuit handshake KDF.
T' (length BLK_LEN) -- initialize to zero.

We will give the forward and reverse encryption algorithms separately below. In the forward-direction algorithms, we will be referring only to the forward keys, and in the backward-direction algorithms, we will be referring only to the backward keys. These are nonetheless two separate sets of keys.

When writing the client algorithm, we'll refer to the state corresponding to the i'th hop as S_i = (K_i, N_i, T'_i). Hops are 1-indexed. When writing the OR algorithms, we'll refer to the state as R = (K, N, T').

These algorithms take an "Additional Data" parameter AD. We set this to a single byte, corresponding to the command (RELAY or RELAY_EARLY) used to transmit the relay message.

Caveat: These algorithms are secure for any single length of AD, but they do not accommodate a variable-length AD. If it later becomes necessary to use a different AD length, we must never use two different lengths with the same set of keys.

We do not give a fixed value for MSG_LEN below; in Tor's current design, it is 509, but other values are possible. The important constraint here is that the same keys (and any keys derived from them) must never be used to encrypt messages of different lengths.

Forward definitions

To encrypt a message M in the forward direction to the dth hop in a circuit, a client behaves as follows:

ENC_OP(S[], d, AD, M):
    Initialize T as N_d.
    Initialize C as M.
    for i = d down to 1:
        (K, N, T') <-- S_i
        H <-- T' || AD
        T' <-- T
        (T, C) <-- DEC_UIV(K, H, (T,C))
        if i == d:  # Destination hop
            (K, N) <-- UPDATE_UIV(K, N) # update key
        S_i <-- (K, N, T')
    return T || C

Note: I moved the initialization of T,C to the top of the function since I think that is more clear. No change is intended.

When receiving an outbound message T || C on a circuit with state R, the OR behaves as follows:

DEC_OR(R, AD, T || C):
    (K, N, T') <-- R
    H <-- T' || AD
    (T, C) <-- ENC_UIV(K, H, (T,C))
    if T == N:  # Cell is recognized!
        (K, N) <-- UPDATE_UIV(K, N) # update key
        X <-- "receive message C"
    else:
        X <-- "forward message T || C"
    R <-- (K, N, T)
    return X

Backward definitions.

To initiate a message M with additional data AD, the OR behaves as follows:

ENC_OR(R, AD, M):
    (K, N, T') <-- R
    H <-- T' || AD
    (T, C) <-- ENC_UIV(K, H, (N, M))
    (K, N) <-- UPDATE_UIV(K, T)
    R <-- (K, N, T)
    return T || C

To process an inbound message T || C, the OR behaves as follows:

PROC_OR(R, AD, T || C):
    (K, N, T') <-- R
    H <-- T' || AD
    (T, C) <-- ENC_UIV(K, H, (T, C))
    R <-- (K, N, T)
    return T || C

Finally, to process an inbound message T || C on a circuit with length L, the client behaves as follows:

DEC_OP(S[], AD, T || C):
   for i = 1 to L:
       (K, N, T') <-- S_i
       H <-- T' || AD
       (T', C) <-- DEC_UIV(K, H, (T, C))
       if T' == N: # Cell recognized
           (K, N) <-- UPDATE_UIV(K, T)
           S_i <-- (K, N, T)
           return "message C from hop i"
       S_i <-- (K, N, T)
       T <-- T'
   return "corrupt message".

Note: I've altered the layout a little above, but I have not changed the algorithm, I think.

Integrating with Tor

Dependencies

In order to use CGO, the following protocol elements must also be supported:

Congestion control (FlowCtrl=2, proposal 324)
Ntorv3 (Relay=4, proposal 332)
Subprotocol-based negotiation (Relay=5, proposal 346)

Relay cell format

This format is the same as first specified in proposal 340.

When exchanging relay messages with a hop using CGO, we use the following new format for a decrypted relay cell:

   tag             [16 bytes]
   body            [509 - 16 = 493 bytes]

The tag and body correspond to "T" and "C" in the algorithm definitions above.

Within the body, we use the following format for the enveloped relay message:

   Message Header
      command       [1 byte]
      length        [2 bytes]
   Message Routing Header (optional)
      stream_id     [2 bytes]
   Message body
      data          ["length" bytes]
   Padding
      (4 zero bytes, followed by random bytes, up to end of cell)

The stream_id field has moved into a separate inner header that only appears sometimes. The command value tells us if the header is to be expected or not.

The following message types take required stream IDs: BEGIN, DATA, END, CONNECTED, RESOLVE, RESOLVED, and BEGIN_DIR, XON, XOFF.

(Note that the above rule is not compatible with pre-CC flow control, where stream-SENDMEs also existed. That's why we require FlowCtrl=2.)

Advertising and Negotiating CGO.

If a relay supports CGO encryption, it must advertise the fact using a subprotocol version. (We recommend allocating a new "Relay=" subprotocol).

The client may choose CGO encryption by negotiating with the relay using any handshake that supports extensions. (Currently, the only such handshake supported in this context is "ntor v3".) The client sends the protocol-negotiation extension from proposal 346, requesting the relevant subprotocol. and sending the protocol-negotiation extension.

Deriving keys the onion handshake

In the existing Tor relay encryption, keys are derived from the ntor handshake's KDF as:

    Df  (20 bytes)        digest seed, forward
    Db  (20 bytes)        digest seed, backward
    Kf  (16 bytes)        AES key, forward
    Kb  (16 bytes)        AES key, backward
    KH  (20 bytes)        Binding value, used in handshake

When we are instead deriving keys for CGO, we extract them from the KDF as:

    Forward direction
        K   (KLEN_UIV bytes)
        N   (BLK_LEN bytes)
    Backward direction
        K   (KLEN_UIV bytes)
        N   (BLK_LEN bytes)
        KH  (20 bytes)        Binding value, used in handshake

Note: The next time we revisit the way our KDF works, we should possibly change it so that it first derives KH and then a separate seed that is used for an independent KDF that gives us the relay crypto keys.

Also see proposal 357.

Authenticated SENDMEs

In its circuit-level "SENDME" messages, Tor now includes what amounts to a 20-byte SHA1 digest of the traffic that is acknowledging, to prove that it really received that traffic. (See proposal 289 for more details and design rationale.) This digest is "free", since it is the same digest used in the current Tor relay encryption design.

But when using CGO, we will no longer keep a running SHA1 digest of all our relay cells. Instead, when CGO is in use at a given hop, SENDME messages must contain the tag value T of the last relay cell that they are acknowledging. This means that, for every 100th relay cell from a given hop, the sender must store the value T until it has been acknowledged, and the recipient must send that value T in the corresponding SENDME message.

Note that this SENDME format proves less than it did before:

It does proves that that the party sending the SENDME has received the 100th cell.
It does not prove that they have received any of the intermediate cells.
It does not prove that they have decrypted any cells.

Nonetheless, this SENDME format should still be sufficient for our needs.

Onion Services

This section is a restatement of relevant parts of proposal 346.

We also need to negotiate and enable CGO for onion services. There are two points where CGO might be used:

When clients are connecting to an introduction point listed in the HsDesc, and when services are connecting to a rendezvous point listed in an INTRODUCE2 message.
When the client and service are connecting to one another over a "virtual" hop.

To handle the first case, we use the same approach as we use to handle other protocol changes for introduction and rendezvous points:

When connecting to a node that is listed in our latest directory, we use the protocols listed for that node in the directory.
When connecting to a node that isn't listed, we assume that it supports only the minimum protocol versions listed in the directory.

TODO: Instead, we could add additional features to the onion service protocol so that we can include protocol information about IPts and RendPts in the appropriate location.

This above could be a separate proposal.

For the second case, we need a way for services to advertise that they support CGO, and clients to opt in to use it. Services advertise their support via adding relay=X to the proto item in the inner (encrypted section), where relay=X is the reserved subprotocol flag to indicate CGO support.

To opt in to using CGO, clients send a subproto_request extension in their INTRODUCE2 messages, as before. (See proposal 346 and proposal 358.)

TODO: We could in theory take this opportunity to unify hs-ntor and ntor-v3. But maybe it would be better to wait instead until we want to support postquantum crypto.

Deployment and timeline

We should implement CGO on relays and clients simultaneously. It should be enabled by default on relays, and enabled by default on Arti clients, and disabled by default on and onion services.

It isn't clear if we should make it enabled-by-default on C tor clients or not.

TODO: Analyze the fingerprinting risk.

Operation will be controlled by these consensus parameters, based on the scheme from proposal 346:

hs-support-relay-X
hs-advertise-relay-X

where "Relay=X" is the reserved subprotocol for CGO.

For testing, there will be a configuration option to tell clients and onion services to use CGO even when the consensus parameter has disabled CGO.

Mixed operation

We note that the CGO system can be used on a "mixed" circuit where some relays support CGO and some do not. When a client is using CGO, it uses it with every relay on the circuit that supports it, and uses the legacy relay crypto with the other relays.

(This doesn't give the same benefits as using CGO with every hop; see the paper for more information.)

Eventually, when all relay versions without CGO support are obsolete, all circuits will use CGO for all hops.

Appendices

Reserved numbers, names, and IDs

We need a new Relay=X subprotocol to indicate support for CGO. (RELAY_CGO)

We'll need new *-relay-X parameters (possibly) to indicate whether clients, relays, and onion services should advertise and/or use CGO.

Test vectors

For POLYVAL test vectors, see RFC 8452.

For AES test vectors, see NIST800-38a.

For test vectors for the ET, PRF, UIV, and cell encryption algorithms, see https://gitlab.torproject.org/nickm/cgo-drafts/-/blob/main/cgo-test-vectors.

Index of symbols

Functions:

Symbol	Meaning
`UH(K,M)`	Universal hash with key K and message M. (POLYVAL)
`ENC_BC(K,M)`	Block cipher encryption of M with key K. (AES128)
`DEC_BC(K,M)`	Block cipher decryption of M with key K. (AES128)
`A
`A ^ B`	A xored with B
`CEIL_DIV(a,b)`	A divided by B, rounded up.
`CTR_n(K,X)`	n bytes of output from BC in counter mode, with key K and nonce X
`ENC_ET(K,T,M)`	Tweaked-block-cipher encryption of M with key K and tweak T
`DEC_ET(K,T,M)`	Tweaked-block-cipher decryption of M with key K and tweak T
`PRF((K,B), T, t)`	Pseudorandom function with key (K,B), input T, and bit t
`ENC_UIV((J,S), H, (X_L,X_R))`	UIV+ encryption of `X_L,X_R` with key (J,S) and tweak H
`DEC_UIV((J,S), H, (Y_L,Y_R))`	UIV+ encryption of `Y_L,Y_R` with key (J,S) and tweak H
`UPDATE_UIV(K,N)`	Update UIV+ keys K based on based on a nonce value N

Constants:

Constant	Value	Meaning
`MSG_LEN`	509	Length of the encrypted part of a relay cell
`BLK_LEN`	16	Block length; used in various places
`MLEN_UH(KU)`	510	Length of message input to universal hash used in ENC_ET
`MLEN_UH(B)`	16	Length of message input to universal hash used in PRF
`KLEN_UH`	16	Length of universal hash key.
`KLEN_BC`	16	Length of block cipher key.
`TLEN_ET`	510	Length of tweak for tweakable block cipher
`KLEN_ET`	32	Length of key for tweakable block cipher
`KLEN_PRF`	32	Length of PRF key
`C`	31	Offset in blocks to separate n0 and n1
`INLEN_PRF`	16	Length of PRF input.
`n0`	493	Length of PRF output when t=0
`n1`	80	Length of PRF output when t=1
`HLEN_UIV`	17	Length of UIV tweak
`LEN_XR`	493	Length of `X_R` portion of UIV input.
`KLEN_UIV`	64	Length of UIV+ key

Possible future changes

We may want to move to bigger AES keys.

We may want to move around the concatenated inputs to the universal hash, if doing so is more efficient.

We may want to analyze the cost of updating our AES keys in UPDATE_UIV: Updating UIV keys is free, but AES keys need a key expansion step that isn't free for cheap CPUs.

Tor design proposals