Opened 4 months ago

Last modified 8 weeks ago

#33835 merge_ready defect

Gmail's quoted response confuses BridgeDB's email autoresponder

Reported by: phw Owned by: agix
Priority: Medium Milestone:
Component: Circumvention/BridgeDB Version:
Severity: Normal Keywords: s30-o22a2
Cc: cohosh, phw Actual Points:
Parent ID: #31279 Points: 1
Reviewer: Sponsor: Sponsor30-can

Description

When replying to a BridgeDB email in Gmail's web interface, one ends up sending an email like this:

On Mon, Apr 6, 2020 at 2:12 PM <bridges@torproject.org> wrote:

>
> [This is an automated email.  Please do not reply.]
>
> Here are your bridges:
>
>   [redacted]
>
> Add these bridges to your Tor Browser by opening your browser
> preferences, clicking on "Tor", and then adding them to the "Provide a
> bridge" field.
>
> If these bridges are not what you need, reply to this email with one of
> the following commands in the message body:
>
>   get bridges            (Request unobfuscated Tor bridges.)
>   get ipv6               (Request IPv6 bridges.)
>   get transport TYPE     (Request obfuscated bridges. Replace TYPE with
> 'obfs4'.)
>   get key                (Get a copy of BridgeDB's public GnuPG key.)
>
>
>

--000000000000dde1a205a2a60f3e
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">get transport obfs4<br></div><br><div class=3D"gmail_quote=
"><div dir=3D"ltr" class=3D"gmail_attr">On Mon, Apr 6, 2020 at 2:12 PM &lt;=
<a href=3D"mailto:bridges@torproject.org">bridges@torproject.org</a>&gt; wr=
ote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px=
 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
[This is an automated email.=C2=A0 Please do not reply.]<br>
<br>
Here are your bridges:<br>
<br>
=C2=A0 [redacted]<br>
<br>
Add these bridges to your Tor Browser by opening your browser<br>
preferences, clicking on &quot;Tor&quot;, and then adding them to the &quot=
;Provide a<br>
bridge&quot; field.<br>
<br>
If these bridges are not what you need, reply to this email with one of<br>
the following commands in the message body:<br>
<br>
=C2=A0 get bridges=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (Request unobfu=
scated Tor bridges.)<br>
=C2=A0 get ipv6=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(Requ=
est IPv6 bridges.)<br>
=C2=A0 get transport TYPE=C2=A0 =C2=A0 =C2=A0(Request obfuscated bridges. R=
eplace TYPE with &#39;obfs4&#39;.)<br>
=C2=A0 get key=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (Get =
a copy of BridgeDB&#39;s public GnuPG key.)<br>
<br>
<br>
</blockquote></div>

--000000000000dde1a205a2a60f3e--

BridgeDB correctly ignores the commands that start with > but it doesn't ignore the commands that are encoded in quoted-printable. BridgeDB's email parser ends up interpreting each line as command, ending in "get key", which raises an exception, resulting in BridgeDB sending no response at all.

We should make sure that BridgeDB doesn't get confused by Gmail's quoted-printable response.

Child Tickets

Attachments (2)

Replacing-the-current-Mail-parsing-option.patch (8.9 KB) - added by agix 4 months ago.
Replacing-the-current-Mail-parsing-option.2.patch (9.0 KB) - added by agix 4 months ago.

Download all attachments as: .zip

Change History (28)

comment:1 Changed 4 months ago by agix

Owner: set to agix
Status: newassigned

comment:2 Changed 4 months ago by agix

To effectively parse out the requested options via email, I used the get_payload() function for EmailMessage objects.
As pointed out in both answers of https://stackoverflow.com/questions/45124127/unable-to-extract-the-body-of-the-email-file-in-python/45124153 there needs to be a certain policy defined (policy.compat32 instead the default policy) to be able to use get_payload instead of get_body.
The advantage of get_payload is that it ignores the Content-Type and Content-Transfer-Encoding header and therefore solely focuses on the actual payload which makes the parsing way easier.
get_payload(0).get_payload() parses out the plain text, but cant differentiate between the actual message and previous responses.
Thats why this approach just focuses on the first 3 words of an incoming email to determine the response action.
This seems to be one of only few approaches if we don't want to rely on regex anymore, but I am open to suggestions.
Additionally some unittests need to be adjusted for this patch.

comment:3 Changed 4 months ago by agix

Sorry, the previous patch had a small issue.
I fixed it in the second attachment.

comment:4 Changed 4 months ago by agix

Status: assignedneeds_review

comment:5 Changed 4 months ago by phw

Can you please push your patch to GitHub or GitLab? It makes code review easier if I can comment on specific lines.

comment:7 Changed 4 months ago by phw

Status: needs_reviewneeds_revision

With "push your patch to GitHub," I meant pushing the commits that your patch is based on.

That said, here's some feedback:

  • Overall, the get_payload() approach seems reasonable. Nicely done!
  • Make sure that your code is based on the develop and not on the master branch. For BridgeDB, we're using a development model in which patches branch off of develop rather than master.
  • I'm not sure why the patch deletes TRANSPORT_PATTERN and UNBLOCKED_PATTERN, and replaces them with a TODO item?
  • The patch's commit message should provide a short summary of how the patch accomplishes its goal. Here's a summary of how to write good commit messages.
  • We should add unit tests to make sure that the patch correctly deals with emails of different content types.

comment:8 Changed 3 months ago by agix

Thanks for the feedback!
On the branch 'stable-optionparser' you will find my work over the last week.
All but 4 unit tests still result in an error.
If my approach is acceptable thus far, I will continue to fix those as well.

Here is what I got so far:
https://github.com/agiix/bridgedb/tree/stable-optionparser

comment:9 Changed 3 months ago by agix

Status: needs_revisionneeds_review

comment:10 Changed 3 months ago by phw

For some reason I'm unable to comment on GitHub's "Files changed" view. Do you mind squashing your branch to a single commit? It will make it easier for me to comment on specific code lines.

comment:11 Changed 3 months ago by agix

There you go:
https://github.com/agiix/bridgedb/commit/1fb531ed505352026223c298607554654116564e

Correction about comment:8
What I meant was that all but 4 unit tests (and a few commented out ones) pass :)

comment:12 in reply to:  11 ; Changed 3 months ago by phw

Status: needs_reviewneeds_revision

Replying to agix:

There you go:
https://github.com/agiix/bridgedb/commit/1fb531ed505352026223c298607554654116564e

Correction about comment:8
What I meant was that all but 4 unit tests (and a few commented out ones) pass :)


Thanks! I left a bunch of comments here. The overall approach seems reasonable but we should make a handful of small changes – I'm happy to elaborate on any of my comments, just let me know. I would also suggest to not worry about the unit tests for now. Let's make sure that we're happy with the overall design, and then we can make sure that everything is properly tested.

Finally, please don't feel discouraged by my asking for revisions. Non-trivial patches (and this certainly is one) typically go through at least one round of revisions.

comment:13 Changed 3 months ago by agix

Status: needs_revisionneeds_review

comment:14 in reply to:  12 ; Changed 3 months ago by agix

Replying to phw:

Replying to agix:

There you go:
https://github.com/agiix/bridgedb/commit/1fb531ed505352026223c298607554654116564e

Correction about comment:8
What I meant was that all but 4 unit tests (and a few commented out ones) pass :)


Thanks! I left a bunch of comments here. The overall approach seems reasonable but we should make a handful of small changes – I'm happy to elaborate on any of my comments, just let me know. I would also suggest to not worry about the unit tests for now. Let's make sure that we're happy with the overall design, and then we can make sure that everything is properly tested.

Finally, please don't feel discouraged by my asking for revisions. Non-trivial patches (and this certainly is one) typically go through at least one round of revisions.

Thanks :)
I implemented your feedback and made a commit for it here

comment:15 in reply to:  14 ; Changed 3 months ago by phw

Replying to agix:

I implemented your feedback and made a commit for it here


This looks better, thanks. Can you please squash your two commits into one? I will then clean it up a bit, take a look at the unit tests, and get it merged.

comment:16 in reply to:  15 Changed 3 months ago by agix

Replying to phw:

Replying to agix:

I implemented your feedback and made a commit for it here


This looks better, thanks. Can you please squash your two commits into one? I will then clean it up a bit, take a look at the unit tests, and get it merged.

Here you go
Additionally I removed the fallback solution in request.py on line 80 since it was not an elegant solution anyway and was only there for testing purposes. Should get_payload return a string instead of a list, an exception will be raised.

comment:17 Changed 2 months ago by phw

I took a look at this issue and may have found a simpler way to fix it. Agix, what do you think about the following fix?
https://github.com/NullHypothesis/bridgedb/compare/defect/33835-2

The unit tests pass. If you think it looks good, we can test-deploy it on polyanthum.

comment:18 in reply to:  17 ; Changed 2 months ago by agix

Replying to phw:

I took a look at this issue and may have found a simpler way to fix it. Agix, what do you think about the following fix?
https://github.com/NullHypothesis/bridgedb/compare/defect/33835-2

The unit tests pass. If you think it looks good, we can test-deploy it on polyanthum.

Your fix is definitely more elegant.
The reason for my messy approach was that I noticed when, for example, the email is sent from an outlook address to a gmail address, every part is encoded as quoted-printable, while when the mails are sent between two gmail addresses, only the content of type html is being encoded as quoted-printable.
I guess I went a little overboard with my solution, since the different encoding behaviors might not effect BridgeDB at all (since it only replies to gmail & riseup).
So I am all for testing your fix.

comment:19 in reply to:  18 ; Changed 2 months ago by phw

Replying to agix:

The reason for my messy approach was that I noticed when, for example, the email is sent from an outlook address to a gmail address, every part is encoded as quoted-printable, while when the mails are sent between two gmail addresses, only the content of type html is being encoded as quoted-printable.


Oh, that's good to know! Do you happen to have such an email that you can paste in here? We should add it to our unit tests and find a way to support it.

comment:20 in reply to:  19 Changed 2 months ago by agix

Replying to phw:

Oh, that's good to know! Do you happen to have such an email that you can paste in here? We should add it to our unit tests and find a way to support it.


Sure, I posted the original email here

comment:21 Changed 2 months ago by phw

Hmm. I think the right way forward is to discard message parts whose content type is not text/plain. In both my and agix's example, the text/html part shouldn't reach the parser and the text/plain part should be fine.

We won't be able to support every single email client but at the very least, we should be able to support emails from Gmail's and Riseup's web interface.

What do you think?

comment:22 in reply to:  21 Changed 2 months ago by agix

Replying to phw:

Hmm. I think the right way forward is to discard message parts whose content type is not text/plain. In both my and agix's example, the text/html part shouldn't reach the parser and the text/plain part should be fine.

We won't be able to support every single email client but at the very least, we should be able to support emails from Gmail's and Riseup's web interface.

What do you think?


I agree.
Let's match for the content-type being text/plain and this should solve the issue for now.
We just have to keep in mind, if we ever add support for other email services that the issue might occur again.

comment:23 Changed 2 months ago by phw

Here's a revised patch set including two unit tests:
https://github.com/NullHypothesis/bridgedb/compare/defect/33835-2

Basically, if our autoresponder receives a multipart email, the patch removes the parts whose content type is not text/plain. If there is no text/plain part (which shouldn't happen), we log a warning message. Let's keep an eye out for that warning message after deploying this branch.

Agix, can you please take another look at my latest revisions?

comment:24 Changed 2 months ago by agix

I just left one comment on your patch here
Other than that, the patch looks good to me.

comment:25 Changed 2 months ago by agix

Status: needs_reviewneeds_information

comment:26 Changed 8 weeks ago by agix

Status: needs_informationmerge_ready
Note: See TracTickets for help on using tickets.