Opened 3 years ago

Closed 2 years ago

#20323 closed defect (fixed)

avoid httpurl connection and use more robust approach

Reported by: iwakeh Owned by: karsten
Priority: High Milestone: metrics-lib 1.6.0
Component: Metrics/Library Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

While testing sync the following happened:
during the downloading of recent from the main collector using the old, i.e. DescriptorCollectorImpl, method the download just hung at some point. I killed it after 30min, restarted and the collection worked.

When the hang-up happened again, I stopped the process immediately and re-started it using the new method, i.e. DescriptorIndexCollector. Now, FileNotFound warnings were logged and the download finished without problems. The files that were not available anymore had just expired, that is CollecTor just cleaned them out of its recent folder, which would explain the hang-up of the old method, too.

Why can one approach cope with disappearing files?

The two ways differ in the trace below FileInputStream (marked below; needs a wide window to be visible)

thread dump old method

"CollecTor-Scheduled-Thread-1" #9 daemon prio=5 os_prio=0 tid=0x00007f49d43c0800 nid=0x25fd runnable [0x00007f49b0f06000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:170)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
	at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
	at sun.security.ssl.InputRecord.read(InputRecord.java:532)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
	- locked <0x000000071dc8b160> (a java.lang.Object)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
	- locked <0x000000071dcbe658> (a sun.security.ssl.AppInputStream)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345) <------------------------------------+
	- locked <0x000000071dd26f98> (a java.io.BufferedInputStream)                                          |
	at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)                  |
	at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)                          |  Difference
	at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)                               |
	- locked <0x000000071dd2ae48> (a sun.net.www.http.ChunkedInputStream)                                  |
	at java.io.FilterInputStream.read(FilterInputStream.java:133)    <-------------------------------------+
	at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3336)
	at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:238)
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
	at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	- locked <0x000000071dd2d138> (a java.io.BufferedInputStream)
	at org.torproject.descriptor.impl.DescriptorCollectorImpl.fetchRemoteFile(DescriptorCollectorImpl.java:225)
        ...

thread dump new method

"CollecTor-Scheduled-Thread-1" #9 daemon prio=5 os_prio=0 tid=0x00007f19fc3a9800 nid=0x4060 runnable [0x00007f19e4a3f000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:170)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
	at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
	at sun.security.ssl.InputRecord.read(InputRecord.java:532)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
	- locked <0x0000000734394300> (a java.lang.Object)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
	- locked <0x00000007343963d0> (a sun.security.ssl.AppInputStream)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345) <-------------------------------------+
	- locked <0x000000071df8b3b8> (a java.io.BufferedInputStream)                                           |
	at sun.net.www.MeteredStream.read(MeteredStream.java:134)                                               | Difference 
	- locked <0x000000071df8f618> (a sun.net.www.http.KeepAliveStream)  <------- (*)                        |
	at java.io.FilterInputStream.read(FilterInputStream.java:133)    <--------------------------------------+
	at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3336)
	at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3329)
	at java.nio.file.Files.copy(Files.java:2908)
	at java.nio.file.Files.copy(Files.java:3027)
	at org.torproject.descriptor.index.DescriptorIndexCollector.fetchRemoteFiles(DescriptorIndexCollector.java:88)
	at org.torproject.descriptor.index.DescriptorIndexCollector.collectDescriptors(DescriptorIndexCollector.java:61)
        ...

(*) see KeepAliveStream.java

example code from DescriptorIndexCollector (cf. here)

      try (InputStream is = new URL(baseUrl + "/" + filepathname)
          .openStream()) {
        Files.copy(is, tempDestinationFile.toPath());
...

This is also a little shorter than the older approach

    try {
      URL url = new URL(urlString);
      huc = (HttpURLConnection) url.openConnection();
      huc.setRequestMethod("GET");
      huc.connect();
      int responseCode = huc.getResponseCode();
      if (responseCode == 200) {
        BufferedReader br = new BufferedReader(new InputStreamReader(
            huc.getInputStream()));
        String line;
        while ((line = br.readLine()) != null) {
          sb.append(line).append("\n");
        }
        br.close();
      }
      ...

next steps

  • Try to device a test that triggers the above problem.
  • And/Or analyze the underlying sources for the reason.
  • Replace the HttpURLConnection code with the shorter Files.copy of InputStream.

This would probably important for the following tickets, too:
#8799, #16151

Child Tickets

Change History (9)

comment:1 Changed 3 years ago by karsten

Huh, wow, very nice find! Looking forward to replacing that old code with something less fragile. Thanks!

comment:2 Changed 3 years ago by iwakeh

Milestone: metrics-lib 1.5.0metrics-lib 2.0.0

comment:3 Changed 3 years ago by karsten

Priority: MediumHigh

It might be that this issue killed the relaydescs module 1.5 days ago by blocking the downloader indefinitely:

2016-10-30 23:05:00,000 INFO o.t.c.c.CollecTorMain:54 Starting relaydescs module of CollecTor.
2016-10-30 23:05:20,823 INFO o.t.c.r.CachedRelayDescriptorReader:255 Finished importing relay descriptors from local Tor data directories:
cached-consensus: 2016-10-30 23:00:00
cached-descriptors: parsed 0, skipped 24471 server descriptors
cached-descriptors.new: parsed 753, skipped 6024 server descriptors
cached-extrainfo: parsed 0, skipped 24508 extra-info descriptors
cached-extrainfo.new: parsed 745, skipped 1463 extra-info descriptors
v3-status-votes: parsed 8, skipped 0 votes
2016-10-30 23:06:00,001 INFO o.t.c.c.CollecTorMain:54 Starting updateindex module of CollecTor.

There should be a line like this (copied over from the previous run), but there wasn't:

2016-10-30 22:35:19,933 INFO o.t.c.r.RelayDescriptorDownloader:1067 Finished downloading relay descriptors from the directory authorities.

Raising priority to high.

comment:4 Changed 3 years ago by iwakeh

Milestone: metrics-lib 2.0.0metrics-lib 1.6.0

This ticket is only about metrics-lib.
The above comment:3 belongs to #8799. I added it there and raised #8799's priority.
Chose an earlier milestone for this ticket.

comment:5 Changed 2 years ago by karsten

Status: newneeds_information

Hmm, I briefly looked into this issue and tried to reproduce it, but failed. Here's what I did: I placed a large file (100MB) on people.torproject.org, which is powered by Apache, and attempted to download it using HttpURLConnection. This would take less than a minute under normal circumstances. I removed the file from the server a few seconds after starting, but the download succeeded without problems. I assume Apache was so kind to cache the file for me.

But do we really need to analyze this problem, or could we as well kick out all mentions of HttpURLConnection? As far as I can see this would only affect DescriptorCollectorImpl and DirectoryDownloader, and we could quite easily deprecate both of those. Wouldn't that fix this issue, too?

comment:6 in reply to:  5 Changed 2 years ago by iwakeh

Replying to karsten:

[snip]
But do we really need to analyze this problem, or could we as well kick out all mentions of HttpURLConnection? As far as I can see this would only affect DescriptorCollectorImpl and DirectoryDownloader, and we could quite easily deprecate both of those. Wouldn't that fix this issue, too?

I agree, this seems the best solution here.

comment:7 Changed 2 years ago by karsten

Status: needs_informationneeds_review

Okay, please review this commit.

comment:8 in reply to:  7 Changed 2 years ago by iwakeh

Status: needs_reviewmerge_ready

Replying to karsten:

Okay, please review this commit.

Looks fine.

comment:9 Changed 2 years ago by karsten

Resolution: fixed
Status: merge_readyclosed

Thanks for looking! Cherry-picked and merged to master. Closing. Thanks again!

Note: See TracTickets for help on using tickets.