Opened 3 weeks ago

Last modified 34 hours ago

#31552 merge_ready defect

--disable-module-dirauth broken (missing symbols)

Reported by: LarryBitcoin Owned by: nickm
Priority: Medium Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version: Tor: 0.4.1.5
Severity: Normal Keywords: build, configure, features, modules, regression, 041-should, 041-backport
Cc: Actual Points: .1
Parent ID: Points:
Reviewer: catalyst Sponsor: Sponsor31-can

Description (last modified by catalyst)

build with ./configure --disable-module-dirauth

this will fail at linking time because of missing symbols.

It used to work in 0.4.0.5 and is broken in 0.4.1.5

for example

make: *** [Makefile:9672: src/app/tor] Error 1
ld.lld: error: undefined symbol: dirserv_should_launch_reachability_test
>>> referenced by ld-temp.o
>>>               lto.tmp:(routers_update_status_from_consensus_networkstatus)

ld.lld: error: undefined symbol: authdir_wants_to_reject_router
>>> referenced by ld-temp.o
>>>               lto.tmp:(router_add_to_routerlist)

ld.lld: error: undefined symbol: dirserv_would_reject_router
>>> referenced by ld-temp.o
>>>               lto.tmp:(update_consensus_router_descriptor_downloads)
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Child Tickets

Change History (27)

comment:1 Changed 3 weeks ago by nickm

It works for me, but I se that you're maybe using link-time optimization. Can you say more about your build environment?

comment:2 Changed 3 weeks ago by nickm

Keywords: regression added
Milestone: Tor: 0.4.1.x-final

comment:3 Changed 3 weeks ago by teor

Sometimes, LTO errors happen because the function signatures are different in different modules.
Maybe we should check these functions.

comment:4 Changed 3 weeks ago by LarryBitcoin

I have two projects that failed with the latest version: one that uses tor as a library in process rather than as an independent executable/process, and one that uses tor as its own 'vanilla' independent binary/process. Both worked fine with 0.4.0.5 and lto on and --disable-module-dirauth set and both don't with 0.4.1.5

I tried the library without lto and it still failed

tor/src/core/libtor-app.a(main.o):main.c:function run_tor_main_loop: error: undefined reference to 'keypin_load_journal'
tor/src/core/libtor-app.a(main.o):main.c:function run_tor_main_loop: error: undefined reference to 'keypin_open_journal'
tor/src/core/libtor-app.a(main.o):main.c:function do_hup: error: undefined reference to 'dirserv_load_fingerprint_file'
tor/src/core/libtor-app.a(connection_or.o):connection_or.c:function connection_or_client_learned_peer_id: error: undefined reference to 'dirserv_orconn_tls_done'
tor/src/core/libtor-app.a(networkstatus.o):networkstatus.c:function routers_update_status_from_consensus_networkstatus: error: undefined reference to 'dirserv_should_launch_reachability_test'
src/core/libtor-app.a(nodelist.o):nodelist.c:function nodelist_set_routerinfo: error: undefined reference to 'dirserv_router_get_status'
tor/src/core/libtor-app.a(nodelist.o):nodelist.c:function nodelist_set_routerinfo: error: undefined reference to 'dirserv_set_node_flags_from_authoritative_status'
tor/src/core/libtor-app.a(routerlist.o):routerlist.c:function router_add_to_routerlist: error: undefined reference to 'authdir_wants_to_reject_router'
tor/src/core/libtor-app.a(routerlist.o):routerlist.c:function update_consensus_router_descriptor_downloads: error: undefined reference to 'dirserv_would_reject_router'
tor/src/core/libtor-app.a(router.o):router.c:function init_keys: error: undefined reference to 'dirserv_add_own_fingerprint'
tor/src/core/libtor-app.a(router.o):router.c:function init_keys: error: undefined reference to 'dirserv_add_descriptor'
tor/src/core/libtor-app.a(router.o):router.c:function init_keys: error: undefined reference to 'dirserv_load_fingerprint_file'
tor/src/core/libtor-app.a(dircache.o):dircache.c:function directory_handle_command_post: error: undefined reference to 'dirserv_add_multiple_descriptors'     
clang: error: linker command failed with exit code 1 (use -v to see invocation) 
ninja: build stopped: subcommand failed.   

I used the following config (static openssl, static libevent, static zlib).

CONFIGURE_ARGS="--prefix=${BUILD_ROOT}/tor/build --disable-system-torrc --disable-asciidoc --enable-pic --enable-static-openssl \
                --enable-static-libevent --enable-static-zlib --with-openssl-dir=${BUILD_ROOT}/openssl/build \
                --with-libevent-dir=${BUILD_ROOT}/libevent/build --with-zlib-dir=${BUILD_ROOT}/zlib/build \
                --disable-system-torrc --disable-systemd --disable-zstd --disable-lzma --disable-largefile ac_cv_c_bigendian=no --disable-unittests --disable-tool-name-check --disable-rust --disable-module-dirauth"


without --disable-module-dirauth it works

The other use of tor is in its' binary form and that definitively fails with lto on, see https://travis-ci.org/greenaddress/bitcoin_ndk/jobs/578034561

I'm still doing some testing to confirm if it works without lto in binary form but as a library even without lto it breaks.

Last edited 3 weeks ago by LarryBitcoin (previous) (diff)

comment:5 Changed 3 weeks ago by LarryBitcoin

I can confirm that binary build with lto off and --disable-module-dirauth set builds fine.

comment:6 Changed 3 weeks ago by nickm

Owner: set to nickm
Status: newaccepted

Ah, this is for android!

I suspect that the reason I can't reproduce this issue locally is that it has something to do with the behavior of LTO on the android toolchain specifically. This is confirmed by #31301, which might be the same issue.

Looking at the code: It seems that the functions that the compiler is complaining about are only called within blocks that, after compilation with --disable-module-dirauth, become "if (0) { ... }". So I can believe that they are totally removed before linking sometimes, but not all the time. The fix should be simple enough if this is the case.

comment:7 Changed 3 weeks ago by nickm

Bug confirmed.

What does the compiler do when it sees:

   void function();
   // ...
   if (0) {
     function();
   }

but function() is never actually defined?

Usually, the compiler will remove the code in the if (0) block, so that by the time the linker sees it, the call to function() is no longer there.

But there are at least two circumstances in which the call to function() might last long enough for the linker to see that function() doesn't exist.

  1. When building with LTO, the call to function() might not get removed until later on.
  2. When building with -O0, the call won't get removed at all.

I've reproduced these warnings by building Tor with -O0.

comment:8 Changed 3 weeks ago by nickm

Actual Points: .1
Sponsor: Sponsor31-can
Status: acceptedneeds_review

Fix in my branch ticket31552_041; pull request at https://github.com/torproject/tor/pull/1273

(This is 31-can because it is followup from a modularization ticket.)

comment:9 Changed 3 weeks ago by teor

Should we run the --disable-module-dirauth CI build with -O0, to catch errors like this earlier?

comment:10 in reply to:  9 Changed 3 weeks ago by nickm

Replying to teor:

Should we run the --disable-module-dirauth CI build with -O0, to catch errors like this earlier?

I just made #31560 for that. :)

comment:11 Changed 3 weeks ago by LarryBitcoin

Nice one!

I tried your branch and it looks like it fixes the lto binary

The lto tor in library mode seems still broken (but i wonder now if it's my build config or not)

tor/src/core/libtor-app.a(main.o):main.c:function run_tor_main_loop: error: undefined reference to 'keypin_load_journal'
tor/src/core/libtor-app.a(main.o):main.c:function run_tor_main_loop: error: undefined reference to 'keypin_open_journal'
tor/src/core/libtor-app.a(main.o):main.c:function do_hup: error: undefined reference to 'dirserv_load_fingerprint_file'
tor/src/core/libtor-app.a(connection_or.o):connection_or.c:function connection_or_client_learned_peer_id: error: undefined reference to 'dirserv_orconn_tls_done'
tor/src/core/libtor-app.a(nodelist.o):nodelist.c:function nodelist_set_routerinfo: error: undefined reference to 'dirserv_router_get_status'
tor/src/core/libtor-app.a(nodelist.o):nodelist.c:function nodelist_set_routerinfo: error: undefined reference to 'dirserv_set_node_flags_from_authoritative_status'
tor/src/core/libtor-app.a(router.o):router.c:function init_keys: error: undefined reference to 'dirserv_add_own_fingerprint'
tor/src/core/libtor-app.a(router.o):router.c:function init_keys: error: undefined reference to 'dirserv_add_descriptor'
tor/src/core/libtor-app.a(router.o):router.c:function init_keys: error: undefined reference to 'dirserv_load_fingerprint_file'
tor/src/core/libtor-app.a(dircache.o):dircache.c:function directory_handle_command_post: error: undefined reference to 'dirserv_add_multiple_descriptors'     
clang: error: linker command failed with exit code 1 (use -v to see invocation) 
ninja: build stopped: subcommand failed.                                        
                                                                                

I'll run more tests locally and get back

comment:12 Changed 3 weeks ago by LarryBitcoin

I am not sure if the library failures are lto related or not but i tried to build my application which integrates with tor as a library on osx with lto off and the dirauth module disabled

Undefined symbols for architecture x86_64:
  "_dirserv_add_descriptor", referenced from:
      _init_keys in libtor-app.a(router.o)
  "_dirserv_add_multiple_descriptors", referenced from:
      _directory_handle_command_post in libtor-app.a(dircache.o)
  "_dirserv_add_own_fingerprint", referenced from:
      _init_keys in libtor-app.a(router.o)
  "_dirserv_load_fingerprint_file", referenced from:
      _do_hup in libtor-app.a(main.o)
      _init_keys in libtor-app.a(router.o)
  "_dirserv_orconn_tls_done", referenced from:
      _connection_or_client_learned_peer_id in libtor-app.a(connection_or.o)
  "_dirserv_router_get_status", referenced from:
      _nodelist_set_routerinfo in libtor-app.a(nodelist.o)
  "_dirserv_set_node_flags_from_authoritative_status", referenced from:
      _nodelist_set_routerinfo in libtor-app.a(nodelist.o)
  "_keypin_load_journal", referenced from:
      _run_tor_main_loop in libtor-app.a(main.o)
  "_keypin_open_journal", referenced from:
      _run_tor_main_loop in libtor-app.a(main.o)
ld: symbol(s) not found for architecture x86_64
Last edited 3 weeks ago by LarryBitcoin (previous) (diff)

comment:13 Changed 3 weeks ago by nickm

I've added another commit to the branch; does it fix the remaining problems?

comment:14 Changed 3 weeks ago by LarryBitcoin

Will test now and get back - thanks!

comment:15 Changed 3 weeks ago by LarryBitcoin

ack fb0e8966f00c4349602594f312dae60cad2ea11d

tested both as a library and as a standalone binary and with and without lto with the dirauth module disabled and everything builds correctly on linux/windows/osx/freebsd/android with both gcc/clang where available.

Great work! thank you!

comment:16 Changed 8 days ago by dgoulet

Component: Core TorCore Tor/Tor

comment:17 Changed 8 days ago by dgoulet

Reviewer: catalyst

comment:18 Changed 8 days ago by nickm

Keywords: 041-should added

comment:19 in reply to:  13 Changed 8 days ago by catalyst

Status: needs_reviewneeds_information

Replying to nickm:

I've added another commit to the branch; does it fix the remaining problems?

Did the previous changes work with -O0? If so, what readily available platform do we have to detect the problems that the second commit addresses?

comment:20 Changed 8 days ago by nickm

Status: needs_informationneeds_review

For me, the previous changes work with GCC and -O0, but fail with clang and -O0. The configuration that reproduces the failure with the original version of the patch but which succeeds with the second is:

./configure  --disable-module-dirauth --disable-gcc-hardening CC=clang CFLAGS='-O0'

(The --disable-gcc-hardening part is necessary to turn off FORTIFY_SOURCE, which is not compatible with -O0.)

For me, both versions of the patch work with GCC. My GCC is 9.2.1; my clang is 8.0.0.

comment:21 Changed 7 days ago by catalyst

Description: modified (diff)

Fix formatting in description.

comment:22 in reply to:  20 Changed 7 days ago by catalyst

Status: needs_reviewmerge_ready

Replying to nickm:

For me, the previous changes work with GCC and -O0, but fail with clang and -O0. The configuration that reproduces the failure with the original version of the patch but which succeeds with the second is:

./configure  --disable-module-dirauth --disable-gcc-hardening CC=clang CFLAGS='-O0'

(The --disable-gcc-hardening part is necessary to turn off FORTIFY_SOURCE, which is not compatible with -O0.)

For me, both versions of the patch work with GCC. My GCC is 9.2.1; my clang is 8.0.0.

Thanks! Patches look good.

I have

gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)

and gcc works with the first version of the patch while clang does not.

comment:23 Changed 7 days ago by nickm

Keywords: dgoulet-merge added

comment:24 Changed 42 hours ago by teor

Keywords: 041-backport added
Milestone: Tor: 0.4.1.x-finalTor: 0.4.2.x-final

comment:25 Changed 34 hours ago by nickm

Keywords: asn-merge added; dgoulet-merge removed

Setting two tickets to asn-merge, since dgoulet is on vacation.

comment:26 Changed 34 hours ago by asn

Keywords: asn-merge removed

Merged! Leaving open for backports.

comment:27 Changed 34 hours ago by nickm

Milestone: Tor: 0.4.2.x-finalTor: 0.4.1.x-final
Note: See TracTickets for help on using tickets.