Opened 22 months ago

Closed 21 months ago

Last modified 21 months ago

#17051 closed defect (fixed)

Binary reading on Windows platforms doesn't resolve the "0D0A"/"0A" problem.

Reported by: TORques
Owned by: atagar
Priority: Medium
Milestone:
Component: Core Tor/Stem
Version:
Severity:
Keywords: descriptor
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

When I run the first example from: https://stem.torproject.org/api/descriptor/microdescriptor.html

import os

from stem.control import Controller
from stem.descriptor import parse_file

with Controller.from_port(port = 9051) as controller:
  controller.authenticate()

  exit_digests = set()
  data_dir = controller.get_conf('DataDirectory')

  for desc in controller.get_microdescriptors():
    if desc.exit_policy.is_exiting_allowed():
      exit_digests.add(desc.digest)

  print 'Exit Relays:'

  for desc in parse_file(os.path.join(data_dir, 'cached-microdesc-consensus')):
    if desc.digest in exit_digests:
      print '  %s (%s)' % (desc.nickname, desc.fingerprint)

on Windows XP or Windows 10 the result was:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\>example1.py
Exit Relays:

C:\>

After I replaced in "C:\Python27\Lib\site-packages\stem\descriptor\__init__.py":

def _parse_file_for_path(descriptor_file, *args, **kwargs):
  with open(descriptor_file, 'rb') as desc_file:
    for desc in parse_file(desc_file, *args, **kwargs):
      yield desc

with

def _parse_file_for_path(descriptor_file, *args, **kwargs):
  if os.environ.get('OS', '') != 'Windows_NT':
    with open(descriptor_file, 'rb') as desc_file:
      for desc in parse_file(desc_file, *args, **kwargs):
        yield desc
  if os.environ.get('OS', '') == 'Windows_NT':
    with open(descriptor_file, 'r') as desc_file:
      for desc in parse_file(desc_file, *args, **kwargs):
        yield desc

the result is:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\>example1.py
Exit Relays:
  CalyxInstitute14 (0011BD2485AD45D984EC4159C88FC066E5E3300E)
  ieditedtheconfig (0098C475875ABC4AA864738B1D1079F711C38287)
  default (00AE2BBFB5C0BBF25853B49E04CC76895044A795)
  ...

This bug was reported on the #tor IRC channel by maiena, and I fixed it with this patch.

The file "cached-microdesc-consensus" created by tor on Windows platforms ends every line with CRLF (0D0A). As the Python documentation states about "open(name[, mode[, buffering]])": "The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists). If mode is omitted, it defaults to 'r'. The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading."

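That means that reading in binary mode on Windows platforms doesn't resolve the "0D0A"/"0A" problem: the carriage returns are left in place, whereas text mode converts them back to "0A" on reading.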
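The difference between the two modes can be reproduced without tor. The following is a minimal sketch, not taken from Stem: it writes a small throwaway file with CRLF endings (the sample content and the `read_lines` helper are made up for illustration) and reads it back in binary mode and in text mode. It uses `io.open` so that it behaves the same on Python 2.7 and Python 3, where text mode with the default `newline` setting translates CRLF to LF on any platform.

```python
import io
import os
import tempfile

# Hypothetical sample content: two lines ending in CRLF (0D0A).
SAMPLE = b"first line\r\nsecond line\r\n"


def read_lines(path, mode):
    """Return all lines of the file at 'path', opened with the given mode."""
    with io.open(path, mode) as handle:
        return handle.readlines()


# Write the sample to a temporary file, read it both ways, then clean up.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as handle:
        handle.write(SAMPLE)

    binary_lines = read_lines(path, "rb")  # bytes, carriage returns kept
    text_lines = read_lines(path, "r")     # text, CRLF translated to LF
finally:
    os.remove(path)

print(binary_lines)
print(text_lines)
```

In binary mode every line still ends in "\r\n", so a parser that expects lines to end in "\n" alone sees a stray carriage return at the end of each one.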

Attachments (1)

cached-microdesc-consensus (1.3 MB) - added by trodun 22 months ago.
created by Tor 0.2.6.10 on Windows

Change History (8)

comment:1 Changed 22 months ago by atagar

Interesting, thanks for the report! Mind attaching one of the consensus files Stem is failing to parse? As discussed on irc this suggested fix would break python3 so it would be nice to get a local repro so I can come up with something better.
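For illustration only: one way to avoid text mode (which on Python 3 returns str rather than bytes, presumably the reason the suggested patch would break there) is to keep reading in binary mode and normalize the line endings afterwards. The sketch below is not the fix that was pushed to Stem; `normalized_lines` is a hypothetical helper, and the sample data is made up.

```python
import io


def normalized_lines(binary_file):
    """Yield lines from a binary file object, converting CRLF endings to LF.

    Hypothetical helper: lines that already end in LF, or that have no
    line ending at all, are passed through unchanged.
    """
    for line in binary_file:
        if line.endswith(b"\r\n"):
            line = line[:-2] + b"\n"
        yield line


# In-memory stand-in for a file opened with mode 'rb'.
crlf_content = io.BytesIO(b"first line\r\nsecond line\r\nno newline")
lines = list(normalized_lines(crlf_content))
print(lines)
```

Because the data stays as bytes throughout, the same code path works on Python 2.7 and Python 3, and files that already use LF are unaffected.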

Changed 22 months ago by trodun

Attachment cached-microdesc-consensus added: created by Tor 0.2.6.10 on Windows

comment:2 Changed 22 months ago by trodun

This bug also affects cached-consensus and possibly other files which use CRLF.

Here is a list of files from DataDirectory with their line endings on Windows:

CRLF cached-certs
CRLF cached-consensus
LF   cached-descriptors
LF   cached-descriptors.new
CRLF cached-microdesc-consensus
LF   cached-microdescs
LF   cached-microdescs.new
CRLF state

All of them are saved with LF on Linux.
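A survey like the one above can be scripted. This is a rough sketch under stated assumptions: `detect_line_ending` and `survey` are hypothetical helpers, only the first line of each file is inspected, and the demonstration runs against a throwaway directory rather than a real DataDirectory.

```python
import os
import shutil
import tempfile


def detect_line_ending(path):
    """Return 'CRLF', 'LF', or 'none' based on the first line of the file."""
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if first_line.endswith(b"\r\n"):
        return "CRLF"
    if first_line.endswith(b"\n"):
        return "LF"
    return "none"


def survey(directory):
    """Map each regular file in 'directory' to its detected line ending."""
    results = {}
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):
            results[name] = detect_line_ending(full_path)
    return results


# Demonstration on a temporary directory with one file of each kind.
demo_dir = tempfile.mkdtemp()
try:
    samples = (("crlf-file", b"a\r\nb\r\n"), ("lf-file", b"a\nb\n"))
    for name, content in samples:
        with open(os.path.join(demo_dir, name), "wb") as handle:
            handle.write(content)
    demo_result = survey(demo_dir)
finally:
    shutil.rmtree(demo_dir)

for name, ending in sorted(demo_result.items()):
    print("%-4s %s" % (ending, name))
```

Pointing `survey` at tor's DataDirectory instead of the temporary directory should print a listing in the same "ending, file name" layout as the one above.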

Instead of patching stem, would it make sense to decide upon a line ending for current and future files generated by Tor, and use it consistently across all platforms?

Last edited 22 months ago by trodun

comment:3 Changed 21 months ago by atagar

Thanks! I'll try to get a fix for this over the weekend. It's certainly tempting to normalize this in Tor but it's probably proper to use Windows newlines on that platform. When in Rome...

Easy enough to fix on Stem's end. ;)

comment:4 Changed 21 months ago by trodun

Great! In that case a separate issue could be opened, for using CRLF in all files generated on Windows.

comment:5 Changed 21 months ago by atagar

Pushed a fix, mind giving it a whirl?

comment:6 Changed 21 months ago by atagar

  • Resolution set to fixed
  • Status changed from new to closed

comment:7 Changed 21 months ago by teor

See #17197 for fixing the line endings written by tor.
