Opened 3 months ago

Closed 5 weeks ago

#29565 closed defect (fixed)

Fix broker robots.txt to disallow crawling

Reported by: dcf
Owned by: cohosh
Priority: Medium
Milestone:
Component: Circumvention/Snowflake
Version:
Severity: Normal
Keywords: easy
Cc: ahf, cohosh, dcf, arlolra
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

From comment:11:ticket:28848 and https://github.com/ahf/snowflake-notes/blob/fb4304a7df08c6ddeeb103f38fc9103721a20cd9/Broker.markdown#the-robotstxt-handler:

  • Was the question about crawling ever answered? I can't think of a very good reason not to allow it. Even if censors were crawling the web for Snowflake brokers, they could get this information much more easily just from the source code.

I believe the intention behind the robots.txt handler is to prevent search engines from indexing any pages on the site, because there's no permanent information there, not for any security or anti-enumeration reason.

ahf points out that the current robots.txt achieves the opposite: because its Disallow directive is empty, it disallows nothing, and so allows crawling of all pages by anyone. Instead of

User-agent: *
Disallow:

it should be

User-agent: *
Disallow: /
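
For context, the broker serves this file from a small HTTP handler (the broker is written in Go; ahf's notes above describe its robots.txt handler). Below is a minimal sketch of a handler that returns the corrected, disallow-everything robots.txt. The handler name robotsTxtHandler, the /robots.txt route, and the listen address are illustrative assumptions, not necessarily the broker's actual identifiers.

package main

import (
	"log"
	"net/http"
)

// robotsTxtHandler serves a robots.txt that disallows crawling of
// every path, matching the corrected policy above.
// (Sketch only: the real broker's handler name and wiring may differ.)
func robotsTxtHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	w.Write([]byte("User-agent: *\nDisallow: /\n"))
}

func main() {
	http.HandleFunc("/robots.txt", robotsTxtHandler)
	log.Fatal(http.ListenAndServe(":8080", nil)) // illustrative port
}

A quick way to verify after deploying is to fetch /robots.txt from the broker and confirm the body reads "Disallow: /" rather than an empty Disallow line.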

Child Tickets

Change History (4)

comment:1 Changed 5 weeks ago by cohosh

Owner: set to cohosh
Status: new → assigned

comment:2 Changed 5 weeks ago by cohosh

Status: assigned → needs_review

comment:3 Changed 5 weeks ago by dcf

Status: needs_review → merge_ready

Looks good, thanks for doing this.

comment:4 Changed 5 weeks ago by cohosh

Resolution: fixed
Status: merge_ready → closed

Merged to master and deployed.
