Opened 3 weeks ago

#29565 new defect

Fix broker robots.txt to disallow crawling

Reported by: dcf
Owned by:
Priority: Medium
Milestone:
Component: Obfuscation/Snowflake
Version:
Severity: Normal
Keywords: easy
Cc: ahf, cohosh, dcf, arlolra
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

From comment:11:ticket:28848 and https://github.com/ahf/snowflake-notes/blob/fb4304a7df08c6ddeeb103f38fc9103721a20cd9/Broker.markdown#the-robotstxt-handler:

  • Was the question about crawling ever answered? I can't think of a very good reason not to allow it. Even if censors were crawling the web for Snowflake brokers, they could get this information much more easily just from the source code.

I believe the intention behind the robots.txt handler is to prevent search engines from indexing any pages on the site, because there's no permanent information there; it is not there for any security or anti-enumeration reason.

ahf points out that the current robots.txt achieves the opposite: an empty Disallow value matches no paths, so it allows crawling of all pages by any user agent. Instead of

User-agent: *
Disallow:

it should be

User-agent: *
Disallow: /

Child Tickets

Change History (0)
