Opened 3 years ago

Closed 3 years ago

#24534 closed defect (fixed)

Investigate Exonerator down times

Reported by: atagar Owned by: metrics-team
Priority: High Milestone:
Component: Metrics/ExoneraTor Version:
Severity: Major Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by iwakeh)

Investigate the reasons for ExoneraTor service being unable to provide service.

One incident reported by atagar:

When attempting to visit the connection hangs for several minutes, then fails for me with 502 proxy error ("Reason: Error reading from remote server"). sysrqb says he's getting the same thing.

Change History (6)

comment:1 Changed 3 years ago by karsten

I just restarted the process which fixed the issue for now. It may come back, though. Keeping this ticket open to investigate further. Thanks for the report!

comment:2 Changed 3 years ago by iwakeh

Further investigation led to a thread dump with 200 waiting backend queries, but no such query thread.
Logs didn't reveal anything suspicious except a very small number of errors caused by odd input (reproducible by entering an IPv6 address followed by a percent sign and number combination).
In addition, ExoneraTor is often queried more than 3 times per second, which is surprising, but not a performance problem.
Thus, a temporary measure could be to simply catch all possible errors and proceed whenever we catch something.
Please find a minimal catch-all patch here (all changes other than adding try-catch were made to avoid checkstyle complaints).
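
For illustration only, here is a minimal sketch of the catch-all idea. It is not the linked patch: the class name, the `respond` method, and the fallback response string are hypothetical, and the supplier merely stands in for whatever code performs the backend query.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustrative only: all names here are hypothetical, not ExoneraTor's classes. */
final class CatchAllSketch {

  private static final Logger logger =
      Logger.getLogger(CatchAllSketch.class.getName());

  /**
   * Runs the given query and returns its response. Anything unexpected is
   * logged with its stack trace and turned into a fallback response, so the
   * caller always gets an answer. Catching Throwable is deliberately broad.
   */
  static String respond(Supplier<String> query) {
    try {
      return query.get();
    } catch (Throwable t) {
      logger.log(Level.WARNING, "Unexpected problem while handling query.", t);
      return "500 Internal Server Error";
    }
  }

  public static void main(String[] args) {
    // A query that succeeds passes through unchanged.
    System.out.println(respond(() -> "200 OK"));
    // A query that fails unexpectedly is logged and answered with the fallback.
    System.out.println(respond(() -> {
      throw new IllegalStateException("simulated backend failure");
    }));
  }
}
```

The point of such a wrapper is diagnostic rather than curative: it does not remove the underlying cause, but the next occurrence should leave a log entry with a stack trace.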

comment:3 Changed 3 years ago by iwakeh

Description: modified (diff)
Summary: changed from "Exonerator is down" to "Investigate Exonerator down times"

Changed the summary and description to describe what this ticket is about.

comment:4 Changed 3 years ago by irl

There should never be a % sign in globally scoped IP addresses. I'm guessing someone was trying to enter link-local addresses with an interface identifier? It might be worth catching all RFC 1918 and RFC 4193 addresses and just returning them as invalid without ever performing a query, as they should never appear in the database and would not have any real meaning if they did (they are explicitly not unique identifiers).
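
A rough sketch of such a pre-query filter, under stated assumptions: the class and method names are hypothetical, and this is not code from ExoneraTor. It relies on `java.net.InetAddress`, whose `isSiteLocalAddress()` covers the RFC 1918 ranges for IPv4; the RFC 4193 range (fc00::/7) is checked by hand because the standard library has no dedicated method for it. The character pre-check is there so that `getByName` only ever sees literal-looking input and is not asked to resolve a host name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Pattern;

/** Illustrative only: names are hypothetical, not ExoneraTor's classes. */
final class AddressFilterSketch {

  /** Dotted-quad IPv4 literal with each octet in 0..255. */
  private static final Pattern IPV4 = Pattern.compile(
      "((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)\\.){3}"
      + "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)");

  /** Characters allowed in an IPv6 literal without a zone identifier. */
  private static final Pattern IPV6_CHARS =
      Pattern.compile("[0-9a-fA-F:][0-9a-fA-F:.]*");

  /** Returns false for malformed, private, link-local, or unique-local input. */
  static boolean isQueryable(String input) {
    boolean looksLikeV4 = IPV4.matcher(input).matches();
    boolean looksLikeV6 =
        input.contains(":") && IPV6_CHARS.matcher(input).matches();
    if (!looksLikeV4 && !looksLikeV6) {
      return false; // Also rejects zone identifiers such as "%eth0".
    }
    InetAddress address;
    try {
      address = InetAddress.getByName(input);
    } catch (UnknownHostException e) {
      return false; // Looked like a literal but did not parse as one.
    }
    if (address.isSiteLocalAddress() || address.isLinkLocalAddress()) {
      return false; // RFC 1918 ranges, 169.254.0.0/16, fe80::/10, fec0::/10.
    }
    byte[] raw = address.getAddress();
    boolean uniqueLocalV6 = raw.length == 16 && (raw[0] & 0xfe) == 0xfc;
    return !uniqueLocalV6; // fc00::/7 per RFC 4193.
  }

  public static void main(String[] args) {
    for (String ip : new String[] {
        "8.8.8.8", "10.0.0.1", "fd12:3456:789a::1", "fe80::1%eth0"}) {
      System.out.println(ip + " -> " + isQueryable(ip));
    }
  }
}
```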

comment:5 in reply to:  4 Changed 3 years ago by iwakeh

Thanks for your reply! No need to worry about the db or the application here; my description above was unclear and painted the wrong picture. Trying to do better:

Of course, there is no percent sign in IPv6 addresses. A query with an IPv6 address is percent-encoded (%3A for :), and tampering with that percent-encoded URL query triggered the very few warnings (<5 in several days), i.e., the input is immediately identified as invalid in our code. Jetty validly logs a warning. ExoneraTor replies fine by stating that there was an invalid IP parameter.
Thus, these few warnings have nothing to do with the ExoneraTor issue at hand.
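
To make the percent-encoding point concrete, here is a small sketch using only `java.net.URLEncoder` and `java.net.URLDecoder` (the `Charset` overloads need Java 10 or later). It is an assumption-laden illustration, not how Jetty or ExoneraTor actually decode parameters; the class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Illustrative only: names are hypothetical, not ExoneraTor's classes. */
final class PercentEncodingSketch {

  /** Percent-encodes an address for use in a URL query; ':' becomes "%3A". */
  static String encode(String ip) {
    return URLEncoder.encode(ip, StandardCharsets.UTF_8);
  }

  /** Decodes a query value, or returns empty if an escape is malformed. */
  static Optional<String> decode(String raw) {
    try {
      return Optional.of(URLDecoder.decode(raw, StandardCharsets.UTF_8));
    } catch (IllegalArgumentException e) {
      // Thrown for broken escapes such as "%ZZ" or a trailing "%3".
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    String encoded = encode("2001:db8::1");
    System.out.println(encoded);                      // well-formed query value
    System.out.println(decode(encoded));              // round-trips to the address
    System.out.println(decode("2001%3Adb8%3A%3"));    // truncated escape: empty
  }
}
```

A tampered escape sequence is rejected at decoding time, before any address parsing or database query would take place, which matches the behaviour described above.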

comment:6 Changed 3 years ago by karsten

Resolution: fixed
Status: changed from new to closed

Merged iwakeh's patch that will catch anything we didn't think of and log a more helpful message. Now we'll have to wait until it happens again. When it does (and maybe it never does), let's open a new ticket. Closing. Thanks, all!
