Opened 4 years ago

Last modified 3 years ago

#19531 new enhancement

Major cleanup in our baseXX APIs

Reported by: dgoulet Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: util tor-client base32 base64 base16 cleanup technical-debt
Cc: catalyst Actual Points:
Parent ID: Points: 1
Reviewer: Sponsor:


Our base16/32/64 APIs are quite inconsistent and a timebomb of possible errors and issues. Here is some recommendation with this:

1) We should have for _each_ type of encoding a function that returns the encoded size using a source length. We have such function for baese32 and base64 but they are not consistent:

  • base32: base32_encoded_size() returns the size including the NUL byte
  • base64: base64_encode_size() has a different name and does NOT add the NUL byte.

They should all return the size with the extra NUL byte because every _encode() function we have requires it in the first place. The other solution is to have a new function baseXX_encoded_string_size() or something that return the NUL byte and the non string function returns the value without it.

Else we end up with this kind of code pattern:

len = base64_encoded_size() + 1
buf = tor_malloc_zero(len)
ret = base64_encode()
tor_assert(ret == len - 1)

or the +1 added explicitly like so which is _not_ good.

base32_encode(serviceid, REND_SERVICE_ID_LEN_BASE32+1,
              circuit->rend_data->rend_pk_digest, REND_SERVICE_ID_LEN);

or even better:

base16_encode(hex, 2*CONDITIONAL_CONSENSUS_FPR_LEN+1,
              ds->v3_identity_digest, CONDITIONAL_CONSENSUS_FPR_LEN);

2) Source length requirement issue. I think though most of them are fixed except base64 API with ticket #17868.

I do see this in the base16_decode which could either be a _hard_ requirement assert or assume that if it gets 9 bytes in srclen, well only 8 are decoded. We have callsites that do NOT check the returned value so...

  if ((srclen % 2) != 0)
    return -1;

3) Every baseXX_encode/decode should return the amount of bytes that has been encoded or decoded. They also should return ssize_t; but that's another ticket I feel like because it's a large refactoring. Here are the missing pieces:

  • base16_encode() returns void so it should return int as the encoded length.
  • base32_encode() returns void.
  • base32_decode() returns 0 on success.

4) We should also have macros for the encoded length computation else we ended up with stuff like this (taken from the prop250 branch).

  (((DIGEST256_LEN - 1) / 3) * 4 + 4)

or flat values


This is a static calculation so most of the times that macro would be more useful than the function since it could be used for stack allocation which is a BIG use case of ours.

Child Tickets

#14013defectclosednikkolasgbase16_decode() API is inconsistent and error-prone
#17868defectclosednikkolasgbase64_decode_nopad() destination buffer length problem
#19222defectclosedbase64_decode() unreachable heap corruption on 32-bit systems
#19564enhancementclosedSR: use macros to compute our base64 encoded length
#21872enhancementclosedcatalystencoded length macros for baseXX encodings
#22123enhancementnewbaseXX API strictness
#28913defectclosednickmBase32_decode should return the length of its result.

Change History (13)

comment:1 Changed 4 years ago by asn

There is also a base64_decode() overflow at #19222 which might be worth fixing.

comment:2 Changed 4 years ago by dgoulet

Keywords: 029-proposed removed

Ok, we are getting to a point we have lots of tickets about the baseXX APIs so I'm making this one the parent ticket for them all. Marking it for 030 because I believe we should fix all those sooner than later.

comment:3 Changed 4 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:4 Changed 4 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:5 Changed 4 years ago by catalyst

Further complication: in our baseXX APIs, base64 is optionally padded, but base32 never is (even though the RFC4648 default is to pad).

Also, should we reject encodings whose padding bits are non-zero?

comment:6 Changed 4 years ago by nickm

Also, should we reject encodings whose padding bits are non-zero?

You mean, ones where the padding isn't made of "="s ? Or something else?

comment:7 Changed 4 years ago by catalyst

I mean where a non-= character at the end of the encoding is supposed to contain some padding bits. e.g., for a base64 encoding where the input length is 1 mod 3, the last 4 characters are XY==, where X can be any base64 alphabet character, but there are only 4 valid canonical values for Y (because it encodes the low order 2 bits of the last input byte, along with 4 padding bits that should be zero).

To put it another way, when you're decoding a base64 encoding, you can end up with 4 or 2 extra bits at the end that don't go into an output byte. For a canonical encoding, these bits should be zero, but a lenient decoder might accept nonzero bits there.

comment:8 Changed 4 years ago by nickm

Ah. Yes, I think we should probably reject stuff where the bits to discard are nonzero -- that would probably indicate an encoding error.

comment:9 Changed 4 years ago by catalyst

REND_DESC_COOKIE_LEN_BASE64 is a special case because it's the length of a truncated base64 encoding (not just an unpadded one). rend_auth_encode_cookie() removes the A character that represents the low bits of the extended auth cookie that are known to be zero. This gives the encoded cookie different properties than a 22-byte unpadded base64 encoding (which would ordinarily decode to 16 bytes).

I think this is not a great situation but it's already deployed, right? We should probably document it better.

comment:10 Changed 3 years ago by catalyst

Cc: catalyst added

comment:11 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:12 Changed 3 years ago by nickm

Keywords: tor-client base32 base64 base16 added

comment:13 Changed 3 years ago by nickm

Keywords: cleanup technical-debt added
Note: See TracTickets for help on using tickets.