Opened 19 months ago

Closed 3 months ago

#17868 closed defect (fixed)

base64_decode_nopad() destination buffer length problem

Reported by: dgoulet Owned by: nikkolasg
Priority: Medium Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: review-group-12
Cc: yawning, rl1987@… Actual Points:
Parent ID: #19531 Points: 2
Reviewer: dgoulet Sponsor: SponsorR-can

Description

TL;DR; the base64_decode_nopad() doesn't work.

Here is a concrete example. We have 40 bytes of binary data that we want to encode. With padding, that is using base64_encode() we end up with a size of 56 bytes. Those resulting bytes, when passed to base64_decode(), the check on the destination buffer done in that function makes it that we need 42 bytes and not the original 40 bytes. This is due because of the padding.

One solution, instead of explicitly adding 2 bytes like it's been done in many places in the code, it is to use the _nopad() interface. However, the base64_decode_nopad() simply adds some = at the end with a new source buffer and passes along the base64_decode() function. However the dstlen that is the destination buffer length where the decoded data will go is not updated to reflect the new length of the source buffer so the call fails because of the dstlen check in the decode function.

Passing 40 bytes for dstlen and 54 for srclen (which is the expected value _without_ padding), the nopad() call changes srclen to 56 bytes but then dstlen should be 42 bytes else the call fails.

I'm not sure how to fix that properly apart from making _nopad() call allocating a new source buffer if needed. I would much prefer that than the caller adding bytes beforehand making the code cryptic and honestly unsafe to any future length changes.

Thoughts?

Child Tickets

Attachments (5)

17868_base64_decode_nopad.patch (4.9 KB) - added by nikkolasg 14 months ago.
first try
base64_decode_nopad_v2.patch (6.2 KB) - added by nikkolasg 14 months ago.
some refinements.
base64_decode_nopad_reviewed1.patch (7.1 KB) - added by nikkolasg 13 months ago.
Patch after first review of dgoulet
base64_decode_nopad_reviewed2.patch (21.1 KB) - added by nikkolasg 13 months ago.
base64_decode_nopad_reviewed3.patch (6.6 KB) - added by nikkolasg 11 months ago.
third revision

Download all attachments as: .zip

Change History (47)

comment:1 Changed 19 months ago by yawning

  • Cc yawning added

The interfaces for all of this stuff need cleanups imo. The sensible thing to do would probably be to make our decoder factor in padding into the length calculations (fairly easy, in theory).

comment:2 Changed 19 months ago by rl1987

  • Cc rl1987@… added

comment:3 Changed 17 months ago by nickm

  • Milestone changed from Tor: 0.2.8.x-final to Tor: 0.2.9.x-final

comment:4 Changed 17 months ago by teor

  • Parent ID set to #16943

We need to fix this before #16943 merges.

comment:5 Changed 15 months ago by nickm

  • Points set to medium

comment:6 Changed 15 months ago by dgoulet

  • Sponsor changed from SponsorR to SponsorR-can

Move those from SponsorR to SponsorR-can.

Changed 14 months ago by nikkolasg

first try

Changed 14 months ago by nikkolasg

some refinements.

comment:7 Changed 14 months ago by nikkolasg

  • Owner set to nikkolasg
  • Status changed from new to assigned

comment:8 Changed 14 months ago by nikkolasg

  • Status changed from assigned to needs_review

comment:9 Changed 14 months ago by dgoulet

  • Points changed from medium to 0.1
  • Reviewer set to dgoulet

comment:10 Changed 13 months ago by nickm

  • Keywords review-group-2 added

Create a review-group-2 from (most of the) tickets in 0.2.8 or 0.2.9 or 029-nickm-says-yes listed as needs_review,

comment:11 Changed 13 months ago by dgoulet

  • Keywords review-group-2 removed
  • Status changed from needs_review to needs_revision

First of all, thanks for this!

Could you explain how this patch is fixing the issue? Basically, for the commit message.

Some review comments:

  • Rename base64_decode_internal to something like base64_decode_impl( and make it static so it's not accessible by anyone outside of util_format.c
  • In the test, uint8_t *dst2 = tor_malloc_zero(40); could probably be changed to uint8_t dst2[40]; and then pass sizeof(dst2) instead of 40.
  • In the test, I would put this string in a const char above "Hi,thisisatestforbase64decodenopadfuncti" like char *decoded_src2 and use it in the mem test with strlen(). It just makes things so much easier if we ever need to modify this part.
  • Please align the arguments of base64_decode_internal() like base64_decode_nopad does it.
  • I think the following checks could be done at the beginning of the function just before the malloc because now they leak buf memory on error:
  if (destlen < (srclen*3)/4)
    return -1;

  if (destlen > SIZE_T_CEILING)
    return -1;

comment:12 Changed 13 months ago by dgoulet

  • Parent ID #16943 deleted

Oh, this one is orthogonal to #16943. Yes it would be great to have it before so we don't have to explicitly add two bytes to all of our base64 decoded buffer but ultimately it won't change the behavior.

Removing it as a child of #16943.

comment:13 Changed 13 months ago by nikkolasg

Alongside with #14013, I finally found time to finalize it. Thanks for your comments, they were good !

One line commit message: Decoupled the verification logic from the decoding logic
=> Moved the decoding logic into its own function (base64_decode_impl) and both decode64_* make their own verification logic then call the decoding logic function.

Changed 13 months ago by nikkolasg

Patch after first review of dgoulet

comment:14 Changed 13 months ago by nikkolasg

  • Status changed from needs_revision to needs_review

comment:15 Changed 13 months ago by andrea

  • Status changed from needs_review to needs_revision

Review of https://trac.torproject.org/projects/tor/attachment/ticket/17868/base64_decode_nopad_reviewed1.patch for #17868:

(patch is in my ticket17868-nikkolasg-patch-v3 branch for reference/future work)

Substantive points:

  • I believe the copy/add padding on the source dance in base64_decode_nopad() is unnecessary, since base64_decode_internal() just falls out of the loop when it sees a padding character and will behave identically if called with n % 4 != 0 non-padding inputs in src.
  • if base64_decode_internal() is not supposed to be called from outside util_format.c, it should be static and declared in util_format.c rather than util_format.h, or perhaps STATIC and in an #ifdef-ed off private section of util_format.h if the test suite needs to see it.
  • It might be simpler, especially given the possibility of space characters in the middle of the input, to just check dest against dest_len at each write in base64_decode_internal() and error-exit if we run out of buffer rather than trying to precalculate output sizes.

Style/spelling/etc.:

  • s/responsability/responsibility/ in comment for base64_decode_internal()
  • please fix the alignment of args to base64_decode_internal()

comment:16 Changed 13 months ago by nikkolasg

You're damn right :) After doing your first change, it seemed like we could simplify way more ... and we can. I removed the base64_decode_impl() because it's in fact useless now. I added another test also to check invalid input without padding (srclen % 4 == 1).
You'll have to double check to make sure it's fine but the *sane conditions* seem respected.

About your third point, I'm afraid I don't see what you mean; we can use dst_len instead of using src_len in the decoding for-loop, but I don't see the big end-goal here. It seems to me that's the responsibility of the caller of precomputing the output size and that's what's happening now... ?

Also, if you want me to work based on your branch, where can I find it ? (link to repo?)

Last edited 13 months ago by nikkolasg (previous) (diff)

Changed 13 months ago by nikkolasg

comment:17 Changed 13 months ago by nikkolasg

  • Status changed from needs_revision to needs_review

comment:18 Changed 12 months ago by nickm

  • Keywords review-group-4 added

comment:19 Changed 12 months ago by dgoulet

  • Reviewer dgoulet deleted
  • Status changed from needs_review to needs_revision

Seems now that the nopad function is useless because it's just a call to base64_decode(). This also doesn't solve the issue. Using base64_encode_size() computation, here is an example (all in bytes):

to_encode_size = 40
encoded_size = base64_encode_size(40) == 56

base64_decode() makes this check: if (destlen < (srclen*3)/4)

... which turns out that if destlen is 40 and srclen is 56, well that check fails which is the original issue here. The nopad function should have taken a smaller value since we don't need padding thus don't go to 56. That nopad function needs to compensate for the extra padding that the decode one wants to put in there (which is what was there in the first place).

comment:20 Changed 12 months ago by dgoulet

  • Reviewer set to dgoulet

comment:21 follow-up: Changed 11 months ago by nikkolasg

I finally found time to dig more into this problem.
First of all, looking at the my current version of the code, here's where base64_{encode,decode}_nopad is used:

  • base64_encode_nopad:
    • crypto_format.c: int n = base64_encode_nopad(buf, sizeof(buf), sig->sig, ED25519_SIG_LEN);
    • test_crypto.c
    • test_dir_handle_get.c:
      • base64_encode_nopad(digest_base64,sizeof(digest_base64),(uint8_t *) digest,DIGEST256_LEN);
      • base64_encode_nopad(digest_base64, sizeof(digest_base64), (uint8_t *) digest,DIGEST256_LEN);
  • base64_decode_nopad:
    • test_crypto.c

Your example just above uses the function base64_encode_size to compute the length of the encoded buffer, but it seems to me that all the use case for base64_decode_nopad uses a already-known length for the base64 encoded buffer (DIGEST256_LEN,ED25519_SIG_LEN...).

Second of all, the issue description states:
Passing 40 bytes for dstlen and 54 for srclen (which is the expected value _without_ padding), the nopad() call changes srclen to 56 bytes but then dstlen should be 42 bytes else the call fails.
First thing I've done when tackling this issue is I wrote a failing test, and tried to make it work (test_util_format.c:189). It works now: passing a dst buffer of length 40 with a src buffer of length 54 decodes correctly. What you just described above is the use of base64_encode_size which will necessarily gives a src buffer length of 56. So we're not really back into the beginning :) Giving a correct size will *not* fail.

With these two remarks in mind, it seems to me that using base64_encode_size for encoding *without* padding is wrong API wise and logic wise. My suggestion would be to introduce a new function base64_encode_size_nopad which returns the encoded buffer size without padding. That way, both base64 APIs are being consistent:

  • Padding:
    • Get the base64 buffer length with m = base64_encode_size(n)
    • Encode with base64_encode(encoded,m,plain,n)
    • Decode with base64_decode(decoded,n,encoded,m)
  • No padding:
    • Get the base64 buffer length with m = base64_encode_size_nopad(n)
    • Encode with base64_encode_nopad(encoded,m,plain,n)
    • Decode with base64_decode_nopad(decoded,n,encoded,m)

Does it make sense or am I over my head ?
If it does make sense, I'm happy to provide that method once you approve it.
If it does not make sense, then I'm eager to hear your reaction ;)

Thanks
Ps: sorry for the huge latency, but holidays first !

Last edited 11 months ago by nikkolasg (previous) (diff)

comment:22 Changed 11 months ago by nikkolasg

  • Status changed from needs_revision to needs_review

comment:23 Changed 11 months ago by nickm

  • Keywords review-group-6 added

comment:24 in reply to: ↑ 21 Changed 11 months ago by dgoulet

  • Status changed from needs_review to needs_revision

Replying to nikkolasg:
[snip]

With these two remarks in mind, it seems to me that using base64_encode_size for encoding *without* padding is wrong API wise and logic wise. My suggestion would be to introduce a new function base64_encode_size_nopad which returns the encoded buffer size without padding. That way, both base64 APIs are being consistent:

  • Padding:
    • Get the base64 buffer length with m = base64_encode_size(n)
    • Encode with base64_encode(encoded,m,plain,n)
    • Decode with base64_decode(decoded,n,encoded,m)
  • No padding:
    • Get the base64 buffer length with m = base64_encode_size_nopad(n)
    • Encode with base64_encode_nopad(encoded,m,plain,n)
    • Decode with base64_decode_nopad(decoded,n,encoded,m)

Does it make sense or am I over my head ?
If it does make sense, I'm happy to provide that method once you approve it.

It does! It makes a consistent API which makes sense to me. It's symmetric also, so please go ahead! :)

Changed 11 months ago by nikkolasg

third revision

comment:25 Changed 11 months ago by nikkolasg

  • Status changed from needs_revision to needs_review

comment:26 Changed 11 months ago by nikkolasg

  • Status changed from needs_review to needs_revision

comment:27 Changed 11 months ago by nikkolasg

In order to make it API consistent, I had to change also the base64_encode{_nopad} structure. That way, it's possible to use the size given by base64_encode_size (before, base64_encode used implicitely the base64_encode_size in both cases). See the tests in test_util_format_base64_encode_size{_nopad} which test the "full API from a-to-z".

There's a few inconsistencies I did not solve right here (but it's easily resolvable):

  • base64_encode_nopad does not take any flags parameters while we have now a base64_encode_size_nopad with a flag parameter.
  • base64_encode expects char * but base64_encode_nopad expects uint8_t *.

You can find the branch base64_nopad in my github repo https://github.com/nikkolasg/tor/tree/base64_nopad
Ask for a patch if preferable.

comment:28 Changed 11 months ago by nikkolasg

  • Status changed from needs_revision to needs_review

comment:29 Changed 11 months ago by nickm

  • Keywords review-group-7 added; review-group-6 removed

All review-group-6 tickets have had at least one review; moving them to 7.

comment:30 Changed 11 months ago by dgoulet

  • Keywords review-group-4 review-group-7 removed
  • Milestone changed from Tor: 0.2.9.x-final to Tor: 0.3.0.x-final
  • Points changed from 0.1 to 2
  • Status changed from needs_review to needs_revision

Thanks nikkolasg! I think we are almost there! Some syntax and naming issues in this but overall I believe we have the right idea.

Now, I'm going to defer this one for 030 for two reasons. First, this has lots of changes for something that is quite core to many subsystems. I would like to take more time to think about the changes and test it before directly going into the next stable in a month or two. Second, we have other tickets open for this whole API and would be a good idea to assess the other baseXX functions to make the whole thing somehow "uniform". There are some changes here I believe we might want to take for the other functions.

comment:31 Changed 11 months ago by dgoulet

  • Parent ID set to #19531

comment:32 Changed 9 months ago by nikkolasg

I don't know what is your schedule for 030 but if possible I'd like to continue working on it. As you say, I also think it's a good idea to make the whole baseXX api uniform and I got ideas for it (uniform return type, uniform argument types, etc).
Would that be more reasonable to create one big issue for the uniformization of the baseXX api and then one sub-issue for each base{16,32,64} ?

comment:33 Changed 9 months ago by nikkolasg

  • Status changed from needs_revision to needs_review

comment:34 Changed 8 months ago by nickm

  • Keywords review-group-11 added

comment:35 Changed 8 months ago by nickm

  • Keywords review-group-12 added; review-group-11 removed

comment:36 Changed 7 months ago by dgoulet

  • Status changed from needs_review to needs_revision

Setting this in needs_revision as I think there are some more action items to do on this. I'll have a look asap on where we are and see what's left or if ready for review.

comment:37 Changed 5 months ago by dgoulet

  • Milestone changed from Tor: 0.3.0.x-final to Tor: 0.3.1.x-final

Not critical for Tor *but* needs to get fixed definitely. Moving it to 031.

Actually, I have a feeling that we should focus on the parent ticket #19531 instead which should fix this issue.

comment:38 follow-up: Changed 3 months ago by catalyst

I think part of the difficulty is that the contract of base64_decode_nopad() is a bit ambiguous. Must padding be absent? Must spaces (or newlines) be absent? i.e., must the unpadded input encoding be of minimum length (which also means that the output length is a function solely of the input length)? If the input to base64_decode_nopad() doesn't meet these constraints, should that be an error?

Also, in base64_decode(), the length check at the beginning is overly conservative. Maybe it should just start decoding and return an error if it runs out of space? Or maybe make a first pass over the input to count the actual number of output bytes required before the actual decoding?

comment:39 in reply to: ↑ 38 Changed 3 months ago by nickm

Replying to catalyst:

I think part of the difficulty is that the contract of base64_decode_nopad() is a bit ambiguous. Must padding be absent? Must spaces (or newlines) be absent? i.e., must the unpadded input encoding be of minimum length (which also means that the output length is a function solely of the input length)? If the input to base64_decode_nopad() doesn't meet these constraints, should that be an error?

Suggestion:

The nopad() variant should allow either padding or no padding.

Also, in base64_decode(), the length check at the beginning is overly conservative. Maybe it should just start decoding and return an error if it runs out of space? Or maybe make a first pass over the input to count the actual number of output bytes required before the actual decoding?

I think the first option is better? Two-pass approaches can be risky if there is the tiniest mismatch in the passes' behavior.

comment:40 Changed 3 months ago by catalyst

Further findings:

  • Nothing besides test code currently calls base64_decode_nopad()
  • base64_decode_nopad() probably has no reason to exist if base64_decode() is sufficiently lenient

[moved comment from parent ticket]

comment:41 Changed 3 months ago by catalyst

  • Status changed from needs_revision to needs_review

Proposed (somewhat more minimalist) fix in https://gitlab.com/argonblue/tor/merge_requests/3
This deletes base64_decode_nopad(), makes base64_decode() check the actual decoded length instead of making an overly conservative estimate, and removes the now-obsolete SR_COMMIT_LEN workaround in #16943.

comment:42 Changed 3 months ago by nickm

  • Resolution set to fixed
  • Status changed from needs_review to closed

LGTM but needed a changes file. I've merged this to master, and added a changes file as aebd72a2f07dbb. Thanks!

Note: See TracTickets for help on using tickets.