Opened 5 years ago

Closed 4 years ago

#8285 closed enhancement (fixed)

facilitator-email-poller: don't process registrations that are too old

Reported by: dcf Owned by: dcf
Priority: Medium Milestone:
Component: Archived/Flashproxy Version:
Severity: Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

When I restarted the poll after the crash in #8284, the poller picked up a flood of pending registrations (over 200 of them) that had accumulated while the poller wasn't clearing them out. We should limit registrations to the past 30 minutes or some other threshold.

Child Tickets

Attachments (2)

8285.patch (2.6 KB) - added by sukhbir 5 years ago.
patch
patchtest.py (1.2 KB) - added by sukhbir 5 years ago.
Test script for the patch

Download all attachments as: .zip

Change History (7)

Changed 5 years ago by sukhbir

Attachment: 8285.patch added

patch

Changed 5 years ago by sukhbir

Attachment: patchtest.py added

Test script for the patch

comment:1 Changed 5 years ago by sukhbir

A couple of quick notes about the patch, just in case I missed something:

  • I am assuming a timezone-agnostic approach for both the mail service (Gmail in our case, which is PDT/PST) and the local time; both are converted to UTC and then compared.
    • To make handling timezone stuff easier, I am using dateutil.
  • For the received header, I am taking into account Gmail's "X-Received", which is when the message is received by the Gmail servers.
  • To test this patch, you can use patchtest.py (attached). Just run it and paste the headers (or the entire message), and it will show you whether the message is "new" or "old". The same function is used in `facilitator-email-poller'.
    • As discussed, this is the easiest way of testing it without having to set up an emulation environment.

Let me know if there any changes to be made.

comment:2 Changed 5 years ago by aallai

Status: newneeds_review

comment:3 Changed 4 years ago by dcf

Status: needs_reviewneeds_revision

Thanks for this patch.

REGISTRATION_LIMIT should be REGISTRATION_AGE_LIMIT and should be in seconds, not minutes.

REGISTRATION_AGE_LIMIT = 30 * 60

Add another layer of abstraction, a function that takes a message and returns whether it should be acted on. The division of labor with imap_loop parsing out the header and accept_message reading it is weird. The changed code in imap_loop should be only:

    if message_ok(msg):
        handle_message(msg)

There needs to be some error handling in case the X-Received header is missing or can't be parsed. What happens then? I at least need a special log message to happen in that case, so that I can eventually figure out what's wrong and fix it. I think my preference in this case is for it to fail open: if we can't tell how old the message is, treat it as new.

This way of getting the date string is not robust:

header.splitlines()[-1]

There's no guarantee that the header will be split across lines in the same way always. I'm actually kind of surprised that the email code exposes those line breaks to you. Looking at http://cr.yp.to/immhf/envelope.html#received, it sounds like it's better to grab whatever follows the final semicolon. I'm assuming that X-received is the same as Received in that respect, which seems to be true from looking at my own inbox.

I'd prefer not to use dateutil if possible. It looks like you can use email.utils.parsedate_tz to parse the date out of the header, and calendar.timegm to turn that back into a UTC-based number. You can use time.gmtime and calendar.timegm to get a comparable value for the current time.

comment:4 in reply to:  3 Changed 4 years ago by sukhbir

Replying to dcf:

There needs to be some error handling in case the X-Received header is missing or can't be parsed. What happens then?

The X-Received seems to be a standard Gmail header (I cannot find official documentation to back that up though) that shows the time the message was delivered to Gmail servers, so there should be no reason to believe that it will fail, and hence the header.splitlines()[-1]. I know it looks like a crude implementation, but I based this on the fact that it is a standard header and thus unlikely to break. Also:

>>> email.message_from_string(msg)["X-Received"]
'by 10.236.198.136 with SMTP id v...3;\n        Fri, 31 May 2013 21:13:41 -0700 (PDT)'

(Notice the \n)

I at least need a special log message to happen in that case, so that I can eventually figure out what's wrong and fix it.

OK, I can put that in.

There's no guarantee that the header will be split across lines in the same way always. I'm actually kind of surprised that the email code exposes those line breaks to you. Looking at http://cr.yp.to/immhf/envelope.html#received, it sounds like it's better to grab whatever follows the final semicolon. I'm assuming that X-received is the same as Received in that respect, which seems to be true from looking at my own inbox.

Yes, the X-Received is the same as Received in that respect, which is when the message is delivered to you. The X-Received header always breaks at a newline and has the same "structure", based on a lot of headers that I have observed. FTA that you linked:

In practice, SMTP servers put all sorts of badly formatted information into Received lines.

While that may be correct, it seems like Gmail does use standardized headers and follows a fixed format. So our assumption of breaking it in the same way may not be unsound. So how do you want me to handle this?

For the other issues, I have fixed them but I will upload the patch after we decide about the above issue.

comment:5 Changed 4 years ago by dcf

Resolution: fixed
Status: needs_revisionclosed

I modified the patch and committed something similar. Thanks, Sukhbir.

Note: See TracTickets for help on using tickets.