Thanks for the patch! The getrandom(2) manpage on one of my Ubuntu VMs says
The behavior when a call to getrandom() that is blocked while reading from /dev/urandom is interrupted by a signal handler depends on the initialization state of the entropy buffer and on the request size, buflen. If the entropy is not yet initialized, then the call will fail with the EINTR error.
so in the case of tor starting up soon after boot, I think it might be possible to get EINTR if tor receives a signal while blocked on insufficient entropy. Arguably we want to retry in this case. (Also I would be interested in hearing if there are good reasons to treat this as a bug when it occurs anyway.)
Please make a GitHub pull request for your revised patches so CI can run on them. Thanks!
Hm... I see, you are correct. I really really hope that nowadays everybody has random seed persistence, but it is of course better to be conservative here.
Ah, I remember. I read the EINTR part of the ERRORS section, and I interpreted it to mean that we will never receive EINTR, contrary to your quote. I filed https://bugzilla.kernel.org/show_bug.cgi?id=199711 to ask which one it is.
Ah, I remember. I read the EINTR part of the ERRORS section, and I interpreted it to mean that we will never receive EINTR, contrary to your quote. I filed https://bugzilla.kernel.org/show_bug.cgi?id=199711 to ask which one it is.
Thanks for filing the kernel bug report! It looks like libevent might always set SA_RESTART when installing signal handlers on systems with sigaction()? (at least based on my quick skim of the source)
When thinking about how to describe the user-visible parts of this change, I realized that the previous code would loop on EINTR, while the patch causes a failure and disables getrandom() thereafter. This is unlikely to be a problem in practice, because libevent seems to always set SA_RESTART, which should prevent us from getting EINTR.
Maybe we should mention this in the changes file. On the other hand, maybe the conservative and likely harmless thing to do is to leave the existing loop as it is, even if it doesn't ever end up looping. If we restore the loop, I think the remaining parts of the patch are some comment improvements and handling of a (also unlikely) short-read condition.
Hm... I see, you are correct. I really really hope that nowadays everybody has random seed persistence, but it is of course better to be conservative here.
Unfortunately, the random seed takes quite some time (on the order of minutes) to actually take effect. The seed is written to the non-blocking character device which triggers the random_write file operation which uses write_pool to send the data to the input pool. Unfortunately it can take a while for the secondary pools to receive the seed, since they have to wait for the push_to_pool workqueue function to be triggered. On newer Linux kernels (using ChaCha20 rather than SHA-1 for the non-blocking character device), the input pool is queried every 5 minutes, and it only reseeds the stream cipher if more than 128 bits of entropy have been collected in the input pool since the last reseed.
If you do not check for EINTR (and avoid the blocking behavior altogether), then even if you are using a persistent random seed, you will end up obtaining potentially predictable random data.
When thinking about how to describe the user-visible parts of this change, I realized that the previous code would loop on EINTR, while the patch causes a failure and disables getrandom() thereafter. This is unlikely to be a problem in practice, because libevent seems to always set SA_RESTART, which should prevent us from getting EINTR.
It would be foolish to rely on a library's current behavior if it's not explicitly standardized.
Maybe we should mention this in the changes file. On the other hand, maybe the conservative and likely harmless thing to do is to leave the existing loop as it is, even if it doesn't ever end up looping. If we restore the loop, I think the remaining parts of the patch are some comment improvements and handling of a (also unlikely) short-read condition.
It would be completely harmless. You can even use LTO to allow the compiler to optimize the loop out.