We started doing this a while back, and covered a lot of the more worrisome cases, but there's lots more to do. Whenever we pass around a chunk of bytes, it should be as an array of an unsigned type. Otherwise we'll keep getting bugs like #6861 (moved).
This is going to have to go API by API, alas.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
I've started converting crypto.c. Everything but openssl version handling, converting keys to and from strings, handling algorithm names, and encodings have been switched over to unsigned chars. aes.c has been completely converted as a well. https://github.com/unixninja92/Tor/tree/ticket6877crypto1
Trac: Username: unixninja92 Status: new to needs_review
This is a ~5000 line patch series, so reviewing it is going to be a tiny bit slow, but I'll try to chug through it piece by piece. It compiles without warnings for me, so that's a good sign.
baf7eb6762befb40e86fdc0d337408329b031537:
crypto_add_spaces_to_fp and crypto_pk_get_fingerprint probably should still use char *, not uint8_t, since it's working on human-readable strings.
Wow, thanks!
:) I had some free time on my hands.
baf7eb6762befb40e86fdc0d337408329b031537:
crypto_add_spaces_to_fp and crypto_pk_get_fingerprint probably should still use char *, not uint8_t, since it's working on human-readable strings.
I've just pushed a commit(6211cd4e1d8c093e64ee6cf0195694c08abb57ed) where I changed crypto_add_spaces_to_fp and crypto_pk_get_fingerprint back to char*.
For crypto_digest_smartlist, since it's defined to take a list of NUL-terminated strings, I think it makes sense to have them remain char*.
I made *append a char for crypto_dyget_smartlist again in 299de8cafb60bab080268dfef8a3f371e34439de. I left *digest_out as a uinit_8 because all the other digests in crypto are now uinit_8 and it would require a lot of casting to change it.
24f08854b9e3efa94158ccb9b1f161f1a27b5e4a:
Wow, that's a big one...
In dirvote.c, since fingerprint is a hex string, it probably makes sense to have it stay char.
This should be fixed in 6211cd4e1d8c093e64ee6cf0195694c08abb57ed.
Looking at the volume of patches that would be necessary to actually do this throughout tor, we need to find an easier, more automatable approach for validating patches here. If we do this as a series of n-thousand-line patches for each module converted, we'll surely have more patches than we can review or validate without the certainty of missing something, somewhere.
One option that Andrea and I discussed is to try to divide changes into:
one set of small patches that change function declarations: these would make things non-compiling, and these would be reviewed manually.
one set of bigger patches that do NOTHING besides add/remove casts to char* and uint8_t*, and change char types to uint8_t. These patches would be machine-verifiable to make no other changes, even to whitespace or formatting.
one set of misc patches that would do other stuff, as needed. These would be reviewed manually, and would be very small.
After that, we could do manual whitespace cleanup.
It's a bit of an irksome way to do development, but I think that the experiment here suggests that reviewing a non-automatable approach is going to be difficult too because of the volume of code.
Either way, this is something to revisit in 0.2.6.
Trac: Milestone: Tor: 0.2.5.x-final to Tor: 0.2.6.x-final Status: needs_review to needs_revision
Replying to nickm:
For the first set of patches, do you want that changed in the h files, c files, or both?
Changing the .h files would be small and definitely make things not compile. Maybe the .h files and the corresponding definitions in the .c files only.
uint8_t is a dangerous choice. It is not guaranteed by the standard to be the same as unsigned char and can be an unrelated type.
If it is an unrelated type, dereferencing an uint8_t* pointer that was created from a char* is undefined behavior (violates strict aliasing). There are a lot of casts between char* and uint8_t* in the Tor code, so this is likely happening.
If we really want to avoid the aliasing problems we could use a typedef like u_char to abbreviate unsigned char. It's 6 characters instead of 7 (for uint8_t). (Note that u_char might only be available on BSD-ish platforms.)
My understanding is nothing in the C standard guarantees that uint8_t is unsigned char, but it is highly likely to be on POSIX platforms because POSIX requires uint8_t to exist. It is theoretically possible that uint8_t is a distinct type from unsigned char on a POSIX platform, but that would be a very unusual implementation choice. We should probably document (and test, if possible) this assumption if we choose to use uint8_t.
Also using unsigned char doesn't necessarily prevent all foreseeable signed arithmetic bugs in byte manipulation, because the integer promotions will promote unsigned char values to signed int on most platforms. (OK fine some unusual platforms could have identically sized char and int.) The fix for #6861 (moved) in 96d2a21683cdfe25b549e13fa450d4b12fb945b2 still right-shifts a signed integer, just a nonnegative one. That instance looks to be safe, but similar more complicated expressions could cause trouble.
It is theoretically possible that uint8_t is a distinct type from unsigned char on a POSIX platform, but that would be a very unusual implementation choice.
AFAIK uint8_t is guaranteed to be exactly 8 bits while unsigned char is CHAR_BIT bits which depends on the platform.
uint8_t is required by C99 to have no padding bits so the existence of uint8_t implies CHAR_BIT==8. The highly improbable condition to test for is that CHAR_BIT==8 and uint8_t exists, but uint8_t is a type (e.g., an extended integer type) that's distinct from unsigned char.
A comment on stackoverflow reminds me that uint8_t could be char on a platform where char is unsigned. It would still be a character type in that case and would therefore still have the privileged aliasing properties. (The test in #22410 (moved) would need to additionally check for the possibility that uint8_t is char on platforms where char is unsigned, if such platforms existed.)
uint8_t is required by C99 to have no padding bits so the existence of uint8_t implies CHAR_BIT==8. The highly improbable condition to test for is that CHAR_BIT==8 and uint8_t exists, but uint8_t is a type (e.g., an extended integer type) that's distinct from unsigned char.
I don't think you can make that implication because (as you state) uint8_t is a distinct type.
A comment on stackoverflow reminds me that uint8_t could be char on a platform where char is unsigned. It would still be a character type in that case and would therefore still have the privileged aliasing properties. (The test in #22410 (moved) would need to additionally check for the possibility that uint8_t is char on platforms where char is unsigned, if such platforms existed.)
Checking how C implementations implement certain data types is the wrong approach. The only thing you can depend on is how the C standard defines it. All else is implementation-defined and doomed to fail if the code depends on it.
uint8_t is required by C99 to have no padding bits so the existence of uint8_t implies CHAR_BIT==8. The highly improbable condition to test for is that CHAR_BIT==8 and uint8_t exists, but uint8_t is a type (e.g., an extended integer type) that's distinct from unsigned char.
I don't think you can make that implication because (as you state) uint8_t is a distinct type.
No, I said that it is highly improbable that uint8_t is a type distinct from unsigned char on POSIX platforms, and that we should check for that case.
C99 requires the absence of padding bits in uint8_t (and in all of the exactly-sized integer types in stdint.h). If a platform has uint8_t, then CHAR_BIT must be 8, because otherwise uint8_t would have padding bits. This is true regardless of whether uint8_t is a character type or some (implementation-defined) extended integer type.
Things get really hairy once you consider the possibility that uint8_t can be different from unsigned char (on a CHAR_BIT==8 system). Unfortunately, compiler developers have apparently seriously considered this in the past.
An additional direction is having (void *) parameters for functions that read or write byte arrays. In that case, a pointer to any type can be passed in so the implementation of those functions must use a can-alias-anything byte type like unsigned char internally.
(Also if uint8_t is an extended integer type, it looks like nothing in C99 guarantees that its bits will even be in the same order as those of an unsigned char. I think it might be feasible but annoying to test for that at compile time once we detect that they're different types.)