Opened 3 years ago
Last modified 2 years ago
#19531 new enhancement
Major cleanup in our baseXX APIs
Reported by: | dgoulet | Owned by: | |
---|---|---|---|
Priority: | Medium | Milestone: | Tor: unspecified |
Component: | Core Tor/Tor | Version: | |
Severity: | Normal | Keywords: | util tor-client base32 base64 base16 cleanup technical-debt |
Cc: | catalyst | Actual Points: | |
Parent ID: | Points: | 1 | |
Reviewer: | Sponsor: |
Description
Our base16/32/64 APIs are quite inconsistent and a timebomb of possible errors and issues. Here is some recommendation with this:
1) We should have for _each_ type of encoding a function that returns the encoded size using a source length. We have such function for baese32 and base64 but they are not consistent:
- base32:
base32_encoded_size()
returns the size including the NUL byte
- base64:
base64_encode_size()
has a different name and does NOT add the NUL byte.
They should all return the size with the extra NUL byte because every _encode()
function we have requires it in the first place. The other solution is to have a new function baseXX_encoded_string_size()
or something that return the NUL byte and the non string function returns the value without it.
Else we end up with this kind of code pattern:
len = base64_encoded_size() + 1 buf = tor_malloc_zero(len) ret = base64_encode() tor_assert(ret == len - 1)
or the +1 added explicitly like so which is _not_ good.
base32_encode(serviceid, REND_SERVICE_ID_LEN_BASE32+1, circuit->rend_data->rend_pk_digest, REND_SERVICE_ID_LEN);
or even better:
base16_encode(hex, 2*CONDITIONAL_CONSENSUS_FPR_LEN+1, ds->v3_identity_digest, CONDITIONAL_CONSENSUS_FPR_LEN);
2) Source length requirement issue. I think though most of them are fixed except base64 API with ticket #17868.
I do see this in the base16_decode
which could either be a _hard_ requirement assert or assume that if it gets 9 bytes in srclen, well only 8 are decoded. We have callsites that do NOT check the returned value so...
if ((srclen % 2) != 0) return -1;
3) Every baseXX_encode/decode should return the amount of bytes that has been encoded or decoded. They also should return ssize_t;
but that's another ticket I feel like because it's a large refactoring. Here are the missing pieces:
base16_encode()
returns void so it should returnint
as the encoded length.base32_encode()
returns void.base32_decode()
returns 0 on success.
4) We should also have macros for the encoded length computation else we ended up with stuff like this (taken from the prop250 branch).
#define SR_SRV_VALUE_BASE64_LEN \ (((DIGEST256_LEN - 1) / 3) * 4 + 4)
or flat values
#define REND_DESC_COOKIE_LEN_BASE64 22
This is a static calculation so most of the times that macro would be more useful than the function since it could be used for stack allocation which is a BIG use case of ours.
Child Tickets
Ticket | Type | Status | Owner | Summary |
---|---|---|---|---|
#14013 | defect | closed | nikkolasg | base16_decode() API is inconsistent and error-prone |
#17868 | defect | closed | nikkolasg | base64_decode_nopad() destination buffer length problem |
#19222 | defect | closed | base64_decode() unreachable heap corruption on 32-bit systems | |
#19564 | enhancement | closed | SR: use macros to compute our base64 encoded length | |
#21872 | enhancement | closed | catalyst | encoded length macros for baseXX encodings |
#22123 | enhancement | new | baseXX API strictness | |
#28913 | defect | closed | nickm | Base32_decode should return the length of its result. |
Change History (13)
comment:1 Changed 3 years ago by
comment:2 Changed 3 years ago by
Keywords: | 029-proposed removed |
---|
Ok, we are getting to a point we have lots of tickets about the baseXX APIs so I'm making this one the parent ticket for them all. Marking it for 030 because I believe we should fix all those sooner than later.
comment:4 Changed 3 years ago by
Keywords: | tor-03-unspecified-201612 added |
---|---|
Milestone: | Tor: 0.3.??? → Tor: unspecified |
Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.
comment:5 Changed 3 years ago by
Further complication: in our baseXX APIs, base64 is optionally padded, but base32 never is (even though the RFC4648 default is to pad).
Also, should we reject encodings whose padding bits are non-zero?
comment:6 Changed 3 years ago by
Also, should we reject encodings whose padding bits are non-zero?
You mean, ones where the padding isn't made of "="s ? Or something else?
comment:7 Changed 3 years ago by
I mean where a non-=
character at the end of the encoding is supposed to contain some padding bits. e.g., for a base64 encoding where the input length is 1 mod 3, the last 4 characters are XY==
, where X can be any base64 alphabet character, but there are only 4 valid canonical values for Y (because it encodes the low order 2 bits of the last input byte, along with 4 padding bits that should be zero).
To put it another way, when you're decoding a base64 encoding, you can end up with 4 or 2 extra bits at the end that don't go into an output byte. For a canonical encoding, these bits should be zero, but a lenient decoder might accept nonzero bits there.
comment:8 Changed 3 years ago by
Ah. Yes, I think we should probably reject stuff where the bits to discard are nonzero -- that would probably indicate an encoding error.
comment:9 Changed 3 years ago by
REND_DESC_COOKIE_LEN_BASE64
is a special case because it's the length of a truncated base64 encoding (not just an unpadded one). rend_auth_encode_cookie()
removes the A
character that represents the low bits of the extended auth cookie that are known to be zero. This gives the encoded cookie different properties than a 22-byte unpadded base64 encoding (which would ordinarily decode to 16 bytes).
I think this is not a great situation but it's already deployed, right? We should probably document it better.
comment:10 Changed 3 years ago by
Cc: | catalyst added |
---|
comment:11 Changed 3 years ago by
Keywords: | tor-03-unspecified-201612 removed |
---|
Remove an old triaging keyword.
comment:12 Changed 2 years ago by
Keywords: | tor-client base32 base64 base16 added |
---|
comment:13 Changed 2 years ago by
Keywords: | cleanup technical-debt added |
---|
There is also a
base64_decode()
overflow at #19222 which might be worth fixing.