Computing a consensus from a set of votes
Given a set of votes, authorities compute the contents of the consensus.
The consensus status, along with as many signatures as the server currently knows (see section 3.10 below), should be available at
http://<hostname>/tor/status-vote/next/consensus.z
The contents of the consensus document are as follows:
The "valid-after", "valid-until", and "fresh-until" times are taken as the median of the respective values from all the votes.
The times in the "voting-delay" line are taken as the median of the VoteSeconds and DistSeconds times in the votes.
Known-flags is the union of all flags known by any voter.
Entries are given on the "params" line for every keyword on which a majority of authorities (total authorities, not just those participating in this vote) voted on, or if at least three authorities voted for that parameter. The values given are the low-median of all votes on that keyword.
(In consensus methods 7 to 11 inclusive, entries were given on the "params" line for every keyword on which any authority voted, the value given being the low-median of all votes on that keyword.)
"client-versions" and "server-versions" are sorted in ascending
order; A version is recommended in the consensus if it is recommended
by more than half of the voting authorities that included a
client-versions or server-versions lines in their votes.
With consensus methods 19 through 33, a package line is generated for a given PACKAGENAME/VERSION pair if at least three authorities list such a package in their votes. (Call these lines the "input" lines for PACKAGENAME.) The consensus will contain every "package" line that is listed verbatim by more than half of the authorities listing a line for the PACKAGENAME/VERSION pair, and no others.
The authority item groups (dir-source, contact, fingerprint, vote-digest) are taken from the votes of the voting authorities. These groups are sorted by the digests of the authorities identity keys, in ascending order. If the consensus method is 3 or later, a dir-source line must be included for every vote with legacy-key entry, using the legacy-key's fingerprint, the voter's ordinary nickname with the string "-legacy" appended, and all other fields as from the original vote's dir-source line.
A router status entry:
* is included in the result if some router status entry with the same
identity is included by more than half of the authorities (total
authorities, not just those whose votes we have).
(Consensus method earlier than 21)
* is included according to the rules in section 3.8.0.1 and
3.8.0.2 below. (Consensus method 22 or later)
* For any given RSA identity digest, we include at most
one router status entry.
* For any given Ed25519 identity, we include at most one router
status entry.
* A router entry has a flag set if that is included by more than half
of the authorities who care about that flag.
* Two router entries are "the same" if they have the same
(descriptor digest, published time, nickname, IP, ports> tuple.
We choose the tuple for a given router as whichever tuple appears
for that router in the most votes. We break ties first in favor of
the more recently published, then in favor of smaller server
descriptor digest.
[
* The Named flag appears if it is included for this routerstatus by
_any_ authority, and if all authorities that list it list the same
nickname. However, if consensus-method 2 or later is in use, and
any authority calls this identity/nickname pair Unnamed, then
this routerstatus does not get the Named flag.
* If consensus-method 2 or later is in use, the Unnamed flag is
set for a routerstatus if any authorities have voted for a different
identities to be Named with that nickname, or if any authority
lists that nickname/ID pair as Unnamed.
(With consensus-method 1, Unnamed is set like any other flag.)
[But note that authorities no longer vote for the Named flag,
and the above two bulletpoints are now irrelevant.]
]
* The version is given as whichever version is listed by the most
voters, with ties decided in favor of more recent versions.
* If consensus-method 4 or later is in use, then routers that
do not have the Running flag are not listed at all.
* If consensus-method 5 or later is in use, then the "w" line
is generated using a low-median of the bandwidth values from
the votes that included "w" lines for this router.
* If consensus-method 5 or later is in use, then the "p" line
is taken from the votes that have the same policy summary
for the descriptor we are listing. (They should all be the
same. If they are not, we pick the most commonly listed
one, breaking ties in favor of the lexicographically larger
vote.) The port list is encoded as specified in section 3.8.2.
* If consensus-method 6 or later is in use and if 3 or more
authorities provide a Measured= keyword in their votes for
a router, the authorities produce a consensus containing a
Bandwidth= keyword equal to the median of the Measured= votes.
* If consensus-method 7 or later is in use, the params line is
included in the output.
* If the consensus method is under 11, bad exits are considered as
possible exits when computing bandwidth weights. Otherwise, if
method 11 or later is in use, any router that is determined to get
the BadExit flag doesn't count when we're calculating weights.
* If consensus method 12 or later is used, only consensus
parameters that more than half of the total number of
authorities voted for are included in the consensus.
[ As of 0.2.6.1-alpha, authorities no longer advertise or negotiate
any consensus methods lower than 13. ]
* If consensus method 13 or later is used, microdesc consensuses
omit any router for which no microdesc was agreed upon.
* If consensus method 14 or later is used, the ns consensus and
microdescriptors may include an "a" line for each router, listing
an IPv6 OR port.
* If consensus method 15 or later is used, microdescriptors
include "p6" lines including IPv6 exit policies.
* If consensus method 16 or later is used, ntor-onion-key
are included in microdescriptors
* If consensus method 17 or later is used, authorities impose a
maximum on the Bandwidth= values that they'll put on a 'w'
line for any router that doesn't have at least 3 measured
bandwidth values in votes. They also add an "Unmeasured=1"
flag to such 'w' lines.
* If consensus method 18 or later is used, authorities include
"id" lines in microdescriptors. This method adds RSA ids.
* If consensus method 19 or later is used, authorities may include
"package" lines in consensuses.
* If consensus method 20 or later is used, authorities may include
GuardFraction information in microdescriptors.
* If consensus method 21 or later is used, authorities may include
an "id" line for ed25519 identities in microdescriptors.
[ As of 0.2.8.2-alpha, authorities no longer advertise or negotiate
consensus method 21, because it contains bugs. ]
* If consensus method 22 or later is used, and the votes do not
produce a majority consensus about a relay's Ed25519 key (see
3.8.0.1 below), the consensus must include a NoEdConsensus flag on
the "s" line for every relay whose listed Ed key does not reflect
consensus.
* If consensus method 23 or later is used, authorities include
shared randomness protocol data on their votes and consensus.
* If consensus-method 24 or later is in use, then routers that
do not have the Valid flag are not listed at all.
[ As of 0.3.4.1-alpha, authorities no longer advertise or negotiate
any consensus methods lower than 25. ]
* If consensus-method 25 or later is in use, then we vote
on recommended-protocols and required-protocols lines in the
consensus. We also include protocols lines in routerstatus
entries.
* If consensus-method 26 or later is in use, then we initialize
bandwidth weights to 1 in our calculations, to avoid
division-by-zero errors on unusual networks.
* If consensus method 27 or later is used, the microdesc consensus
may include an "a" line for each router, listing an IPv6 OR port.
[ As of 0.4.3.1-alpha, authorities no longer advertise or negotiate
any consensus methods lower than 28. ]
* If consensus method 28 or later is used, microdescriptors no longer
include "a" lines.
* If consensus method 29 or later is used, microdescriptor "family"
lines are canonicalized to improve compression.
* If consensus method 30 or later is used, the base64 encoded
ntor-onion-key does not include the trailing = sign.
* If consensus method 31 or later is used, authorities parse the
"bwweightscale" and "maxunmeasuredbw" parameters correctly when
computing votes.
* If consensus method 32 or later is used, authorities handle the
"MiddleOnly" flag specially when computing a consensus. When the
voters agree to include "MiddleOnly" in a routerstatus, they
automatically remove "Exit", "Guard", "V2Dir", and "HSDir". If
the BadExit flag is included in the consensus, they automatically
add it to the routerstatus.
* If consensus method 33 or later is used, and the consensus
flavor is "microdesc", then the "Publication" field in the "r"
line is set to "2038-01-01 00:00:00".
* If consensus method 34 or later is used, the consensus
does not include any "package" lines.
The signatures at the end of a consensus document are sorted in ascending order by identity digest.
All ties in computing medians are broken in favor of the smaller or earlier item.
Deciding which Ids to include
This sorting algorithm is used for consensus-method 22 and later.
First, consider each listing by tuple of <Ed,Rsa> identities, where 'Ed'
may be "None" if the voter included "id ed25519 none" to indicate that
the authority knows what ed25519 identities are, and thinks that the RSA
key doesn't have one.
For each such <Ed, RSA> tuple that is listed by more than half of the
total authorities (not just total votes), include it. (It is not
possible for any other <id-Ed, id-RSA'> to have as many votes.) If more
than half of the authorities list a single <Ed,Rsa> pair of this type, we
consider that Ed key to be "consensus"; see description of the
NoEdConsensus flag.
Log any other id-RSA values corresponding to an id-Ed we included, and any
other id-Ed values corresponding to an id-RSA we included.
For each <id-RSA> that is not yet included, if it is listed by more than
half of the total authorities, and we do not already have it listed with
some <id-Ed>, include it, but do not consider its Ed identity canonical.
Deciding which descriptors to include
Deciding which descriptors to include.
A tuple belongs to an <id-RSA, id-Ed>
identity if it is a new tuple that
matches both ID parts, or if it is an old tuple (one with no Ed opinion)
that matches the RSA part. A tuple belongs to an <id-RSA>
identity if its
RSA identity matches.
A tuple matches another tuple if all the fields that are present in both tuples are the same.
For every included identity, consider the tuples belonging to that identity. Group them into sets of matching tuples. Include the tuple that matches the largest set, breaking ties in favor of the most recently published, and then in favor of the smaller server descriptor digest.
Forward compatibility
Future versions of Tor will need to include new information in the consensus documents, but it is important that all authorities (or at least half) generate and sign the same signed consensus.
To achieve this, authorities list in their votes their supported methods for generating consensuses from votes. Later methods will be assigned higher numbers. Currently specified methods:
"1" -- The first implemented version.
"2" -- Added support for the Unnamed flag.
"3" -- Added legacy ID key support to aid in authority ID key rollovers
"4" -- No longer list routers that are not running in the consensus
"5" -- adds support for "w" and "p" lines.
"6" -- Prefers measured bandwidth values rather than advertised
"7" -- Provides keyword=integer pairs of consensus parameters
"8" -- Provides microdescriptor summaries
"9" -- Provides weights for selecting flagged routers in paths
"10" -- Fixes edge case bugs in router flag selection weights
"11" -- Don't consider BadExits when calculating bandwidth weights
"12" -- Params are only included if enough auths voted for them
"13" -- Omit router entries with missing microdescriptors.
"14" -- Adds support for "a" lines in ns consensuses and microdescriptors.
"15" -- Adds support for "p6" lines.
"16" -- Adds ntor keys to microdescriptors
"17" -- Adds "Unmeasured=1" flags to "w" lines
"18" -- Adds 'id' to microdescriptors.
"19" -- Adds "package" lines to consensuses
"20" -- Adds GuardFraction information to microdescriptors.
"21" -- Adds Ed25519 keys to microdescriptors.
"22" -- Instantiates Ed25519 voting algorithm correctly.
"23" -- Adds shared randomness protocol data.
"24" -- No longer lists routers that are not Valid in the consensus.
"25" -- Vote on recommended-protocols and required-protocols.
"26" -- Initialize bandwidth weights to 1 to avoid division-by-zero.
"27" -- Adds support for "a" lines in microdescriptor consensuses.
"28" -- Removes "a" lines from microdescriptors.
"29" -- Canonicalizes families in microdescriptors.
"30" -- Removes padding from ntor-onion-key.
"31" -- Uses correct parsing for bwweightscale and maxunmeasuredbw
when computing weights
"32" -- Adds special handling for MiddleOnly flag.
"33" -- Sets "publication" field in microdesc consensus "r" lines
to a meaningless value.
"34" -- Removes "package" lines from consensus.
Before generating a consensus, an authority must decide which consensus method to use. To do this, it looks for the highest version number supported by more than 2/3 of the authorities voting. If it supports this method, then it uses it. Otherwise, it falls back to the newest consensus method that it supports (which will probably not result in a sufficiently signed consensus).
All authorities MUST support method 25; authorities SHOULD support more recent methods as well. Authorities SHOULD NOT support or advertise support for any method before 25. Clients MAY assume that they will never see a current valid signed consensus for any method before method 25.
(The consensuses generated by new methods must be parsable by implementations that only understand the old methods, and must not cause those implementations to compromise their anonymity. This is a means for making changes in the contents of consensus; not for making backward-incompatible changes in their format.)
The following methods have incorrect implementations; authorities SHOULD NOT advertise support for them:
"21" -- Did not correctly enable support for ed25519 key collation.
Encoding port lists
Whether the summary shows the list of accepted ports or the list of rejected ports depends on which list is shorter (has a shorter string representation). In case of ties we choose the list of accepted ports. As an exception to this rule an allow-all policy is represented as "accept 1-65535" instead of "reject " and a reject-all policy is similarly given as "reject 1-65535".
Summary items are compressed, that is instead of "80-88,89-100" there only is a single item of "80-100", similarly instead of "20,21" a summary will say "20-21".
Port lists are sorted in ascending order.
The maximum allowed length of a policy summary (including the "accept " or "reject ") is 1000 characters. If a summary exceeds that length we use an accept-style summary and list as much of the port list as is possible within these 1000 bytes. [XXXX be more specific.]
Computing Bandwidth Weights
Let weight_scale = 10000, or the value of the "bwweightscale" parameter. (Before consensus method 31 there was a bug in parsing bwweightscale, so that if there were any consensus parameters after it alphabetically, it would always be treated as 10000. A similar bug existed for "maxunmeasuredbw".)
Starting with consensus method 26, G, M, E, and D are initialized to 1 and T to 4. Prior consensus methods initialize them all to 0. With this change, test tor networks that are small or new are much more likely to produce bandwidth-weights in their consensus. The extra bandwidth has a negligible impact on the bandwidth weights in the public tor network.
Let G be the total bandwidth for Guard-flagged nodes. Let M be the total bandwidth for non-flagged nodes. Let E be the total bandwidth for Exit-flagged nodes. Let D be the total bandwidth for Guard+Exit-flagged nodes. Let T = G+M+E+D
Let Wgd be the weight for choosing a Guard+Exit for the guard position. Let Wmd be the weight for choosing a Guard+Exit for the middle position. Let Wed be the weight for choosing a Guard+Exit for the exit position.
Let Wme be the weight for choosing an Exit for the middle position. Let Wmg be the weight for choosing a Guard for the middle position.
Let Wgg be the weight for choosing a Guard for the guard position. Let Wee be the weight for choosing an Exit for the exit position.
Balanced network conditions then arise from solutions to the following system of equations:
WggG + WgdD == M + WmdD + WmeE + WmgG (guard bw = middle bw) WggG + WgdD == WeeE + WedD (guard bw = exit bw) WedD + WmdD + WgdD == D (aka: Wed+Wmd+Wdg = weight_scale) WmgG + WggG == G (aka: Wgg = weight_scale-Wmg) WmeE + WeeE == E (aka: Wee = weight_scale-Wme)
We are short 2 constraints with the above set. The remaining constraints come from examining different cases of network load. The following constraints are used in consensus method 10 and above. There are another incorrect and obsolete set of constraints used for these same cases in consensus method 9. For those, see dir-spec.txt in Tor 0.2.2.10-alpha to 0.2.2.16-alpha.
Case 1: E >= T/3 && G >= T/3 (Neither Exit nor Guard Scarce)
In this case, the additional two constraints are: Wmg == Wmd, Wed == 1/3.
This leads to the solution:
Wgd = weight_scale/3
Wed = weight_scale/3
Wmd = weight_scale/3
Wee = (weight_scale*(E+G+M))/(3*E)
Wme = weight_scale - Wee
Wmg = (weight_scale*(2*G-E-M))/(3*G)
Wgg = weight_scale - Wmg
Case 2: E < T/3 && G < T/3 (Both are scarce)
Let R denote the more scarce class (Rare) between Guard vs Exit. Let S denote the less scarce class.
Subcase a: R+D < S
In this subcase, we simply devote all of D bandwidth to the scarce class.
Wgg = Wee = weight_scale
Wmg = Wme = Wmd = 0;
if E < G:
Wed = weight_scale
Wgd = 0
else:
Wed = 0
Wgd = weight_scale
Subcase b: R+D >= S
In this case, if M <= T/3, we have enough bandwidth to try to achieve a balancing condition.
Add constraints Wgg = weight_scale, Wmd == Wgd to maximize bandwidth in the guard position while still allowing exits to be used as middle nodes:
Wee = (weight_scale*(E - G + M))/E Wed = (weight_scale*(D - 2E + 4G - 2M))/(3D) Wme = (weight_scale*(G-M))/E Wmg = 0 Wgg = weight_scale Wmd = (weight_scale - Wed)/2 Wgd = (weight_scale - Wed)/2
If this system ends up with any values out of range (ie negative, or above weight_scale), use the constraints Wgg == weight_scale and Wee == weight_scale, since both those positions are scarce:
Wgg = weight_scale
Wee = weight_scale
Wed = (weight_scale*(D - 2*E + G + M))/(3*D)
Wmd = (weight_Scale*(D - 2*M + G + E))/(3*D)
Wme = 0
Wmg = 0
Wgd = weight_scale - Wed - Wmd
If M > T/3, then the Wmd weight above will become negative. Set it to 0
in this case:
Wmd = 0
Wgd = weight_scale - Wed
Case 3: One of E < T/3 or G < T/3
Let S be the scarce class (of E or G).
Subcase a: (S+D) < T/3:
if G=S:
Wgg = Wgd = weight_scale;
Wmd = Wed = Wmg = 0;
// Minor subcase, if E is more scarce than M,
// keep its bandwidth in place.
if (E < M) Wme = 0;
else Wme = (weight_scale*(E-M))/(2*E);
Wee = weight_scale-Wme;
if E=S:
Wee = Wed = weight_scale;
Wmd = Wgd = Wme = 0;
// Minor subcase, if G is more scarce than M,
// keep its bandwidth in place.
if (G < M) Wmg = 0;
else Wmg = (weight_scale*(G-M))/(2*G);
Wgg = weight_scale-Wmg;
Subcase b: (S+D) >= T/3
if G=S:
Add constraints Wgg = weight_scale, Wmd == Wed to maximize bandwidth
in the guard position, while still allowing exits to be
used as middle nodes:
Wgg = weight_scale
Wgd = (weight_scale*(D - 2*G + E + M))/(3*D)
Wmg = 0
Wee = (weight_scale*(E+M))/(2*E)
Wme = weight_scale - Wee
Wmd = (weight_scale - Wgd)/2
Wed = (weight_scale - Wgd)/2
if E=S:
Add constraints Wee == weight_scale, Wmd == Wgd to maximize bandwidth
in the exit position:
Wee = weight_scale;
Wed = (weight_scale*(D - 2*E + G + M))/(3*D);
Wme = 0;
Wgg = (weight_scale*(G+M))/(2*G);
Wmg = weight_scale - Wgg;
Wmd = (weight_scale - Wed)/2;
Wgd = (weight_scale - Wed)/2;
To ensure consensus, all calculations are performed using integer math with a fixed precision determined by the bwweightscale consensus parameter (defaults at 10000, Min: 1, Max: INT32_MAX). (See note above about parsing bug in bwweightscale before consensus method 31.)
For future balancing improvements, Tor clients support 11 additional weights for directory requests and middle weighting. These weights are currently set at weight_scale, with the exception of the following groups of assignments:
Directory requests use middle weights:
Wbd=Wmd, Wbg=Wmg, Wbe=Wme, Wbm=Wmm
Handle bridges and strange exit policies:
Wgm=Wgg, Wem=Wee, Weg=Wed
Computing consensus flavors
Consensus flavors are variants of the consensus that clients can choose to download and use instead of the unflavored consensus. The purpose of a consensus flavor is to remove or replace information in the unflavored consensus without forcing clients to download information they would not use anyway.
Directory authorities can produce and serve an arbitrary number of flavors of the same consensus. A downside of creating too many new flavors is that clients will be distinguishable based on which flavor they download. A new flavor should not be created when adding a field instead wouldn't be too onerous.
Examples for consensus flavors include:
- Publishing hashes of microdescriptors instead of hashes of
full descriptors (see section 3.9.2).
- Including different digests of descriptors, instead of the
perhaps-soon-to-be-totally-broken SHA1.
Consensus flavors are derived from the unflavored consensus once the voting process is complete. This is to avoid consensus synchronization problems.
Every consensus flavor has a name consisting of a sequence of one or more alphanumeric characters and dashes. For compatibility, the original (unflavored) consensus type is called "ns".
The supported consensus flavors are defined as part of the authorities' consensus method.
All consensus flavors have in common that their first line is "network-status-version" where version is 3 or higher, and the flavor is a string consisting of alphanumeric characters and dashes:
"network-status-version" SP version [SP flavor] NL
ns consensus
The ns consensus flavor is equivalent to the unflavored consensus. When the flavor is omitted from the "network-status-version" line, it should be assumed to be "ns". Some implementations may explicitly state that the flavor is "ns" when generating consensuses, but should accept consensuses where the flavor is omitted.
Microdescriptor consensus
The microdescriptor consensus is a consensus flavor that contains microdescriptor hashes instead of descriptor hashes and that omits exit-policy summaries which are contained in microdescriptors. The microdescriptor consensus was designed to contain elements that are small and frequently changing. Clients use the information in the microdescriptor consensus to decide which servers to fetch information about and which servers to fetch information from.
The microdescriptor consensus is based on the unflavored consensus with the exceptions as follows:
"network-status-version" SP version SP "microdesc" NL
[At start, exactly once.]
The flavor name of a microdescriptor consensus is "microdesc".
Changes to router status entries are as follows:
"r" SP nickname SP identity SP publication SP IP SP ORPort
SP DirPort NL
[At start, exactly once.]
Similar to "r" lines in section 3.4.1, but without the digest element.
"a" SP address ":" port NL
[Any number]
Identical to the "a" lines in section 3.4.1.
(Only included when the vote is generated with consensus-method 14 or later, and the consensus is generated with consensus-method 27 or later.)
"p" ... NL
[At most once]
Not currently generated.
Exit policy summaries are contained in microdescriptors and therefore omitted in the microdescriptor consensus.
"m" SP digest NL
[Exactly once.*]
"digest" is the base64 of the SHA256 hash of the router's microdescriptor with trailing =s omitted. For a given router descriptor digest and consensus method there should only be a single microdescriptor digest in the "m" lines of all votes. If different votes have different microdescriptor digests for the same descriptor digest and consensus method, at least one of the authorities is broken. If this happens, the microdesc consensus should contain whichever microdescriptor digest is most common. If there is no winner, we break ties in the favor of the lexically earliest.
[*Before consensus method 13, this field was sometimes erroneously omitted.]
Additionally, a microdescriptor consensus SHOULD use the sha256 digest algorithm for its signatures.