Opened 9 years ago

Closed 9 years ago

#4930 closed defect (fixed)

bridgedb is not monitored enough by nagios?

Reported by: runa
Owned by: phobos
Priority: Medium
Milestone:
Component: Company
Version:
Severity:
Keywords:
Cc: kaner
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

I have not had any replies from bridges@… this week, and we've received a few complaints on help@… as well. Can someone please figure out what's wrong?

Change History (17)

comment:2 Changed 9 years ago by kaner

Is there any way to estimate when this stopped working? From the logs, I can tell that BridgeDB processed email requests at least until Jan 15 06:07:31.

If it stopped working after that, or if we aren't sure, we should let someone with admin rights (weasel or phobos) check whether or not emails are forwarded from whatever MTA to BridgeDB. To me, it currently looks like mails aren't arriving for processing in BridgeDB.
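
One way to narrow down when processing stopped is to pull the timestamp of the last handled request out of the log. A minimal sketch, assuming log lines shaped like the "Got a completed email" entries quoted elsewhere in this ticket; the function name `last_processed` is hypothetical:

```shell
#!/bin/sh
# Print the timestamp of the last "completed email" entry read from stdin.
# Assumes each line starts with "Mon DD HH:MM:SS", as in the log excerpt
# quoted in this ticket. Prints nothing if no such entry is found.
last_processed() {
    awk '/Got a completed email/ { last = $1 " " $2 " " $3 }
         END { if (last != "") print last }'
}
```

It reads the log on stdin, so it can be fed the current log file or a rotated one alike.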

comment:3 Changed 9 years ago by arma

I just got a response from bridges@…. I think the problem was that the bridgedb service was down.

I guess the followup problem is: how come no nagios things noticed that the service was down?
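
For illustration, a Nagios-style check for this kind of silent outage could alert on how long it has been since BridgeDB last handled an email. This is only a sketch under assumptions: the function name, thresholds, and messages are hypothetical; only the exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL) is the standard Nagios plugin one.

```shell
#!/bin/sh
# Hypothetical freshness check; thresholds and wording are assumptions.
# $1 = seconds since the last handled email
# $2 = warning threshold in seconds, $3 = critical threshold in seconds
# Return status follows the Nagios plugin convention: 0 OK, 1 WARNING, 2 CRITICAL.
check_age() {
    age=$1
    warn=$2
    crit=$3
    if [ "$age" -ge "$crit" ]; then
        echo "CRITICAL: no email handled for ${age}s"
        return 2
    elif [ "$age" -ge "$warn" ]; then
        echo "WARNING: no email handled for ${age}s"
        return 1
    fi
    echo "OK: last email handled ${age}s ago"
    return 0
}
```

A wrapper script would compute the age (for example from the modification time of the log file) and exit with the function's return status so that Nagios can act on it.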

comment:4 Changed 9 years ago by arma

Component: BridgeDB → Company
Owner: set to phobos
Summary: bridges@torproject.org does not reply → bridgedb is not monitored enough by nagios?

comment:5 in reply to:  4 Changed 9 years ago by phobos

Replying to arma:

We don't run bridgedb, therefore nagios ignores it.

comment:6 Changed 9 years ago by weasel

That time seems to coincide with byblos having been rebooted. My guess is that something doesn't start right.

comment:7 Changed 9 years ago by arma

I just changed a ~ to a /home/bridges in the @reboot line of its crontab. Maybe that will improve things.

comment:8 in reply to:  7 Changed 9 years ago by weasel

Replying to arma:

> I just changed a ~ to a /home/bridges in the @reboot line of its crontab. Maybe that will improve things.

That shouldn't make a difference: crontab entries pass through the shell, which does tilde expansion.
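
The tilde behavior can be checked outside cron. A minimal sketch, assuming cron hands the command string to `sh -c` with `HOME` set to the user's home directory (the usual behavior of Vixie-style cron, not verified on byblos); `run_like_cron` is a hypothetical helper:

```shell
#!/bin/sh
# Run a command string the way cron is assumed to: via "sh -c" with HOME set.
# $1 = value for HOME, $2 = command string as it would appear in the crontab.
run_like_cron() {
    HOME=$1 sh -c "$2"
}

run_like_cron /home/bridges 'echo ~/run'     # unquoted tilde: expanded by sh
run_like_cron /home/bridges 'echo "~/run"'   # quoted tilde: left as-is
```

The first call prints /home/bridges/run, the second a literal ~/run, so an unquoted ~ at the start of a path in a crontab command resolves to the home directory as long as HOME is what one expects.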

comment:9 Changed 9 years ago by runa

Users are reporting that they are not getting any emails from bridges@tpo. I'm not having any luck either.

comment:10 Changed 9 years ago by arma

Works fine for me -- I just got an email response.

I wonder if these users are expecting more than one answer per hour?

Or if they're not sending from the right domains.

comment:11 in reply to:  10 Changed 9 years ago by runa

Replying to arma:

> Works fine for me -- I just got an email response.
>
> I wonder if these users are expecting more than one answer per hour?
>
> Or if they're not sending from the right domains.

Did you send the email from a GMail account? I just tried again (and I had a friend try as well), and still no reply.

comment:12 Changed 9 years ago by arma

Cc: kaner added

bridgedb's logs say:

Feb 10 07:50:38 [INFO] Got a completed email; deciding whether to reply.
Feb 10 07:50:38 [INFO] Got a bad dkim header ('invalid (public key: DNS query timeout for gamma._domainkey.gmail.com)') on an incoming mail; rejecting it.

Looks like our dkim proxy is failing to do a dkim step, and declaring the email invalid.
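
For context on the name in that log line: a DKIM verifier fetches the signer's public key as a TXT record at `<selector>._domainkey.<domain>`, where selector and domain come from the `s=` and `d=` tags of the message's DKIM-Signature header. A small sketch of that naming rule; the helper name `dkim_key_name` is hypothetical:

```shell
#!/bin/sh
# Build the DNS name under which a DKIM public key is published.
# $1 = selector (s= tag), $2 = signing domain (d= tag).
dkim_key_name() {
    printf '%s._domainkey.%s\n' "$1" "$2"
}

dkim_key_name gamma gmail.com
```

This prints gamma._domainkey.gmail.com, the name whose lookup timed out in the log above. The result can be passed to `dig +short TXT` to repeat the query by hand.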

Where does our dkim proxy live? I heard a rumor it was on gettor? I think kaner runs that?

comment:13 Changed 9 years ago by kaner

I wonder what's going on there. It's not like dkimproxy completely fails to work:

dnsel2:/var/log# grep "DKIM verify - pass" mail.log* | wc -l
18
dnsel2:/var/log# grep "DKIM verify - invalid" mail.log* | wc -l
5084

But it has a really bad success rate. All the "invalid"s there are timeouts, btw.
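
To put a number on that, the two counts can be turned into a pass percentage. A minimal sketch; the function name `pass_rate` is hypothetical, and only the two outcomes counted above are considered:

```shell
#!/bin/sh
# Percentage of DKIM verifications that passed.
# $1 = number of "pass" results, $2 = number of "invalid" results.
pass_rate() {
    awk -v p="$1" -v f="$2" 'BEGIN {
        total = p + f
        if (total == 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * p / total
    }'
}

pass_rate 18 5084
```

With the counts from the log this prints 0.35, so roughly one message in 280 was verified.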

Running the query from the command line doesn't look slow at all:

dnsel2:/var/log# time dig +short TXT gamma._domainkey.gmail.com
"k=rsa\; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDIhyR3oItOy22ZOaBrIVe9m/iME3RqOJeasANSpg2YTHTYV+Xtp4xwf5gTjCmHQEMOs0qYu0FYiNQPQogJ2t0Mfx9zNu06rfRBDjiIU9tpx2T+NGlWZ8qhbiLo5By8apJavLyqTLavyPSrvsx0B3YzC63T4Age2CDqZYA+OwSMWQIDAQAB"

real    0m0.047s
user    0m0.000s
sys     0m0.020s

Why is it only slow when the query is called from dkimproxy? Why did that timeout problem start to occur all of a sudden anyway? It did work for quite a while, then started getting problems, without anyone changing the setup (to my knowledge).

comment:14 Changed 9 years ago by kaner

I tried to verify one of the GetTor DKIM test emails from the command line:

dnsel2:/var/log# cat /tmp/testmail.txt | dkimproxy-verify 
originator address: ioerror@gmail.com
signature identity: @gmail.com
verify result: pass
signature identity: ioerror@gmail.com
verify result: pass
sender policy result: accept
author policy result: accept
ADSP policy result: accept

Worked.

comment:15 Changed 9 years ago by kaner

Fun fact: I don't see any DNS requests for domainkey hosts when I run:

tcpdump -n -i eth0 udp port 53 | grep domainkey

Unless I do a manual request from the command line like so:

dig +short TXT s1024._domainkey.yahoo.cn

or

cat /tmp/testmail.txt | dkimproxy-verify

As arma concluded, it may be that dkimproxy only *thinks* it does DNS requests. Maybe it starts those requests with a timeout of 0? Someone with better perl-foo than myself should maybe take a look at /usr/share/perl5/Mail/DKIM/DNS.pm?

comment:16 Changed 9 years ago by kaner

I don't know what to say, but changing the request timeout from 10 to 60 seconds in /usr/share/perl5/Mail/DKIM/DNS.pm helped. Now all requests get through. And no, they don't take longer than a fraction of a second each. I'm confused. But it works now. BridgeDB sends out bridge addresses again.

comment:17 Changed 9 years ago by phobos

Resolution: fixed
Status: new → closed

closing this. the new bridgedb is on a tor server and will be monitored by nagios.
