Opened 7 years ago

Last modified 2 years ago

#6877 new defect

Finally replace all char[] buffers with uint8_t[] buffers

Reported by: nickm Owned by:
Priority: High Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-client tor-relay refactoring technical-debt lots-of-work
Cc: catalyst Actual Points:
Parent ID: Points: 10
Reviewer: Sponsor:

Description

We started doing this a while back, and covered a lot of the more worrisome cases, but there's lots more to do. Whenever we pass around a chunk of bytes, it should be as an array of an unsigned type. Otherwise we'll keep getting bugs like #6861.

This is going to have to go API by API, alas.

Child Tickets

TicketStatusOwnerSummaryComponent
#22410closedcatalystensure that uint8_t is unsigned charCore Tor/Tor

Change History (30)

comment:1 Changed 7 years ago by nickm

Incidentally, we might want to use a typedef for this, since "uint8_t" and "unsigned char" are a little bulky to type.

comment:2 Changed 7 years ago by nickm

Priority: normalmajor

comment:3 Changed 7 years ago by nickm

Keywords: tor-client added

comment:4 Changed 7 years ago by nickm

Component: Tor ClientTor

comment:5 Changed 6 years ago by arma

Milestone: Tor: 0.2.4.x-finalTor: 0.2.5.x-final

Deferring the major swath of changes to 0.2.5 (though if we find actual bugs while doing it, and it's not too late, we can fix them in 0.2.4 too)

comment:6 Changed 6 years ago by unixninja92

Status: newneeds_review

I've started converting crypto.c. Everything but openssl version handling, converting keys to and from strings, handling algorithm names, and encodings have been switched over to unsigned chars. aes.c has been completely converted as a well. https://github.com/unixninja92/Tor/tree/ticket6877crypto1

Last edited 6 years ago by unixninja92 (previous) (diff)

comment:7 Changed 6 years ago by nickm

Wow, thanks!

This is a ~5000 line patch series, so reviewing it is going to be a tiny bit slow, but I'll try to chug through it piece by piece. It compiles without warnings for me, so that's a good sign.

baf7eb6762befb40e86fdc0d337408329b031537:

  • crypto_add_spaces_to_fp and crypto_pk_get_fingerprint probably should still use char *, not uint8_t, since it's working on human-readable strings.

858ccbd17f9fe4bd36b049ab484b4fe1e4ed4e01:

  • looks okay.

comment:8 in reply to:  7 Changed 6 years ago by unixninja92

Replying to nickm:

Wow, thanks!

:) I had some free time on my hands.

baf7eb6762befb40e86fdc0d337408329b031537:

  • crypto_add_spaces_to_fp and crypto_pk_get_fingerprint probably should still use char *, not uint8_t, since it's working on human-readable strings.

I've just pushed a commit(6211cd4e1d8c093e64ee6cf0195694c08abb57ed) where I changed crypto_add_spaces_to_fp and crypto_pk_get_fingerprint back to char*.

Last edited 6 years ago by unixninja92 (previous) (diff)

comment:9 Changed 6 years ago by nickm

c48c2fb5979ad7189fb9f37a25e250e9379ad9eb:

  • Looks okay.
  • For crypto_digest_smartlist, since it's defined to take a list of NUL-terminated strings, I think it makes sense to have them remain char*.

24f08854b9e3efa94158ccb9b1f161f1a27b5e4a:

  • Wow, that's a big one...
  • In dirvote.c, since fingerprint is a hex string, it probably makes sense to have it stay char.
  • Otherwise looks okay.

comment:10 in reply to:  9 Changed 6 years ago by unixninja92

Replying to nickm:

c48c2fb5979ad7189fb9f37a25e250e9379ad9eb:

  • Looks okay.
  • For crypto_digest_smartlist, since it's defined to take a list of NUL-terminated strings, I think it makes sense to have them remain char*.

I made *append a char for crypto_dyget_smartlist again in 299de8cafb60bab080268dfef8a3f371e34439de. I left *digest_out as a uinit_8 because all the other digests in crypto are now uinit_8 and it would require a lot of casting to change it.

24f08854b9e3efa94158ccb9b1f161f1a27b5e4a:

  • Wow, that's a big one...
  • In dirvote.c, since fingerprint is a hex string, it probably makes sense to have it stay char.

This should be fixed in 6211cd4e1d8c093e64ee6cf0195694c08abb57ed.

  • Otherwise looks okay.

comment:11 Changed 5 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.6.x-final
Status: needs_reviewneeds_revision

Looking at the volume of patches that would be necessary to actually do this throughout tor, we need to find an easier, more automatable approach for validating patches here. If we do this as a series of n-thousand-line patches for each module converted, we'll surely have more patches than we can review or validate without the certainty of missing something, somewhere.

One option that Andrea and I discussed is to try to divide changes into:

  • one set of small patches that change function declarations: these would make things non-compiling, and these would be reviewed manually.
  • one set of bigger patches that do NOTHING besides add/remove casts to char* and uint8_t*, and change char types to uint8_t. These patches would be machine-verifiable to make no other changes, even to whitespace or formatting.
  • one set of misc patches that would do other stuff, as needed. These would be reviewed manually, and would be very small.

After that, we could do manual whitespace cleanup.

It's a bit of an irksome way to do development, but I think that the experiment here suggests that reviewing a non-automatable approach is going to be difficult too because of the volume of code.

Either way, this is something to revisit in 0.2.6.

comment:12 Changed 5 years ago by nickm

Milestone: Tor: 0.2.6.x-finalTor: 0.2.???

comment:13 Changed 5 years ago by nickm

Another option to use here would be trying to express transformations using coccinelle. Probably not trivial to do, though.

comment:14 in reply to:  11 ; Changed 4 years ago by unixninja92

Severity: Normal

Replying to nickm:
For the first set of patches, do you want that changed in the h files, c files, or both?

comment:15 in reply to:  14 Changed 4 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.2.8.x-final

Replying to unixninja92:

Replying to nickm:
For the first set of patches, do you want that changed in the h files, c files, or both?

Changing the .h files would be small and definitely make things not compile. Maybe the .h files and the corresponding definitions in the .c files only.

(I assume you've read the manual/automated/manual steps in https://trac.torproject.org/projects/tor/ticket/6877?replyto=14#comment:11 )

comment:16 Changed 4 years ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.???
Status: needs_revisionnew

comment:17 Changed 4 years ago by cypherpunks

uint8_t is a dangerous choice. It is not guaranteed by the standard to be the same as unsigned char and can be an unrelated type.

If it is an unrelated type, dereferencing an uint8_t* pointer that was created from a char* is undefined behavior (violates strict aliasing). There are a lot of casts between char* and uint8_t* in the Tor code, so this is likely happening.

comment:18 Changed 3 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:19 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:20 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:21 Changed 2 years ago by nickm

Keywords: tor-relay refactoring technical-debt lots-of-work added
Points: 10

comment:22 Changed 2 years ago by catalyst

Cc: catalyst added

If we really want to avoid the aliasing problems we could use a typedef like u_char to abbreviate unsigned char. It's 6 characters instead of 7 (for uint8_t). (Note that u_char might only be available on BSD-ish platforms.)

My understanding is nothing in the C standard guarantees that uint8_t is unsigned char, but it is highly likely to be on POSIX platforms because POSIX requires uint8_t to exist. It is theoretically possible that uint8_t is a distinct type from unsigned char on a POSIX platform, but that would be a very unusual implementation choice. We should probably document (and test, if possible) this assumption if we choose to use uint8_t.

Also using unsigned char doesn't necessarily prevent all foreseeable signed arithmetic bugs in byte manipulation, because the integer promotions will promote unsigned char values to signed int on most platforms. (OK fine some unusual platforms could have identically sized char and int.) The fix for #6861 in 96d2a21683cdfe25b549e13fa450d4b12fb945b2 still right-shifts a signed integer, just a nonnegative one. That instance looks to be safe, but similar more complicated expressions could cause trouble.

comment:23 in reply to:  22 Changed 2 years ago by cypherpunks

Replying to catalyst:

It is theoretically possible that uint8_t is a distinct type from unsigned char on a POSIX platform, but that would be a very unusual implementation choice.

AFAIK uint8_t is guaranteed to be exactly 8 bits while unsigned char is CHAR_BIT bits which depends on the platform.

comment:24 Changed 2 years ago by catalyst

uint8_t is required by C99 to have no padding bits so the existence of uint8_t implies CHAR_BIT==8. The highly improbable condition to test for is that CHAR_BIT==8 and uint8_t exists, but uint8_t is a type (e.g., an extended integer type) that's distinct from unsigned char.

comment:25 Changed 2 years ago by catalyst

A comment on stackoverflow reminds me that uint8_t could be char on a platform where char is unsigned. It would still be a character type in that case and would therefore still have the privileged aliasing properties. (The test in #22410 would need to additionally check for the possibility that uint8_t is char on platforms where char is unsigned, if such platforms existed.)

comment:26 in reply to:  24 ; Changed 2 years ago by cypherpunks

Replying to catalyst:

uint8_t is required by C99 to have no padding bits so the existence of uint8_t implies CHAR_BIT==8. The highly improbable condition to test for is that CHAR_BIT==8 and uint8_t exists, but uint8_t is a type (e.g., an extended integer type) that's distinct from unsigned char.

I don't think you can make that implication because (as you state) uint8_t is a distinct type.

comment:27 in reply to:  25 Changed 2 years ago by cypherpunks

Replying to catalyst:

A comment on stackoverflow reminds me that uint8_t could be char on a platform where char is unsigned. It would still be a character type in that case and would therefore still have the privileged aliasing properties. (The test in #22410 would need to additionally check for the possibility that uint8_t is char on platforms where char is unsigned, if such platforms existed.)

Checking how C implementations implement certain data types is the wrong approach. The only thing you can depend on is how the C standard defines it. All else is implementation-defined and doomed to fail if the code depends on it.

comment:28 in reply to:  26 Changed 2 years ago by catalyst

Replying to cypherpunks:

Replying to catalyst:

uint8_t is required by C99 to have no padding bits so the existence of uint8_t implies CHAR_BIT==8. The highly improbable condition to test for is that CHAR_BIT==8 and uint8_t exists, but uint8_t is a type (e.g., an extended integer type) that's distinct from unsigned char.

I don't think you can make that implication because (as you state) uint8_t is a distinct type.

No, I said that it is highly improbable that uint8_t is a type distinct from unsigned char on POSIX platforms, and that we should check for that case.

C99 requires the absence of padding bits in uint8_t (and in all of the exactly-sized integer types in stdint.h). If a platform has uint8_t, then CHAR_BIT must be 8, because otherwise uint8_t would have padding bits. This is true regardless of whether uint8_t is a character type or some (implementation-defined) extended integer type.

comment:29 Changed 2 years ago by catalyst

Things get really hairy once you consider the possibility that uint8_t can be different from unsigned char (on a CHAR_BIT==8 system). Unfortunately, compiler developers have apparently seriously considered this in the past.

https://stackoverflow.com/questions/16138237/when-is-uint8-t-%E2%89%A0-unsigned-char#16138470
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110
https://stackoverflow.com/questions/26297571/how-to-create-an-uint8-t-array-that-does-not-undermine-strict-aliasing

An additional direction is having (void *) parameters for functions that read or write byte arrays. In that case, a pointer to any type can be passed in so the implementation of those functions must use a can-alias-anything byte type like unsigned char internally.

(Also if uint8_t is an extended integer type, it looks like nothing in C99 guarantees that its bits will even be in the same order as those of an unsigned char. I think it might be feasible but annoying to test for that at compile time once we detect that they're different types.)

comment:30 Changed 2 years ago by catalyst

This gist goes through a very similar train of thought, and includes many references to interesting background material.

Note: See TracTickets for help on using tickets.