Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#17051 closed defect (fixed)

Binary reading on Windows platforms doesn't resolve the "0D0A"/"0A" problem.

Reported by: TORques
Owned by: atagar
Priority: Medium
Milestone:
Component: Core Tor/Stem
Version:
Severity:
Keywords: descriptor
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

When I run the first example from: https://stem.torproject.org/api/descriptor/microdescriptor.html

import os

from stem.control import Controller
from stem.descriptor import parse_file

with Controller.from_port(port = 9051) as controller:
  controller.authenticate()

  exit_digests = set()
  data_dir = controller.get_conf('DataDirectory')

  for desc in controller.get_microdescriptors():
    if desc.exit_policy.is_exiting_allowed():
      exit_digests.add(desc.digest)

  print 'Exit Relays:'

  for desc in parse_file(os.path.join(data_dir, 'cached-microdesc-consensus')):
    if desc.digest in exit_digests:
      print '  %s (%s)' % (desc.nickname, desc.fingerprint)

on Windows XP or Windows 10 the result was:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\>example1.py
Exit Relays:

C:\>

After I replaced in "C:\Python27\Lib\site-packages\stem\descriptor\__init__.py":

def _parse_file_for_path(descriptor_file, *args, **kwargs):
  with open(descriptor_file, 'rb') as desc_file:
    for desc in parse_file(desc_file, *args, **kwargs):
      yield desc

with

def _parse_file_for_path(descriptor_file, *args, **kwargs):
  # On Windows, open in text mode so CRLF is translated to LF on reading;
  # everywhere else, keep reading in binary mode as before.
  if os.environ.get('OS', '') == 'Windows_NT':
    mode = 'r'
  else:
    mode = 'rb'

  with open(descriptor_file, mode) as desc_file:
    for desc in parse_file(desc_file, *args, **kwargs):
      yield desc

the result is:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\>example1.py
Exit Relays:
  CalyxInstitute14 (0011BD2485AD45D984EC4159C88FC066E5E3300E)
  ieditedtheconfig (0098C475875ABC4AA864738B1D1079F711C38287)
  default (00AE2BBFB5C0BBF25853B49E04CC76895044A795)
  ...

This bug was reported on the #tor IRC channel by maiena, and I fixed it with this patch.

The file "cached-microdesc-consensus" created by tor on Windows platforms ends every line with CRLF (0D0A). The Python documentation for "open(name[, mode[, buffering]])" states: "The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists). If mode is omitted, it defaults to 'r'. The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading."

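That means that reading in binary mode on Windows platforms doesn't resolve the "0D0A"/"0A" problem: the CRLF (0D0A) line endings are not converted to LF (0A), so the carriage returns are passed on to the parser.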
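To illustrate the point, here is a minimal sketch showing that a binary read keeps the CRLF bytes, together with one possible way to strip them that stays in binary mode. The helper name `iter_normalized_lines` and the sample content are made up for this illustration; this is not Stem's actual fix.

```python
import io


def iter_normalized_lines(binary_file):
  """Yield lines from a binary file object, turning a trailing CRLF into LF.

  Hypothetical helper for illustration only, not part of Stem.
  """

  for line in binary_file:
    if line.endswith(b'\r\n'):
      line = line[:-2] + b'\n'

    yield line


# Stand-in for a consensus file written by tor on Windows (made-up content).
sample = io.BytesIO(b'network-status-version 3 microdesc\r\nvote-status consensus\r\n')

raw_lines = sample.readlines()  # binary read: the CR bytes are still present
sample.seek(0)
clean_lines = list(iter_normalized_lines(sample))  # CR bytes stripped
```

Because the content stays as bytes throughout, a sketch like this behaves the same way regardless of platform, which a check on the OS environment variable does not.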

Attachments (1)

cached-microdesc-consensus (1.3 MB) - added by trodun 2 years ago.
created by Tor 0.2.6.10 on Windows

Change History (8)

comment:1 Changed 2 years ago by atagar

Interesting, thanks for the report! Mind attaching one of the consensus files Stem is failing to parse? As discussed on IRC, this suggested fix would break Python 3, so it would be nice to get a local repro so I can come up with something better.

Changed 2 years ago by trodun

Attachment: cached-microdesc-consensus added

created by Tor 0.2.6.10 on Windows

comment:2 Changed 2 years ago by trodun

This bug also affects cached-consensus and possibly other files which use CRLF.

Here is a list of files from DataDirectory with their line endings on Windows:

CRLF cached-certs
CRLF cached-consensus
LF   cached-descriptors
LF   cached-descriptors.new
CRLF cached-microdesc-consensus
LF   cached-microdescs
LF   cached-microdescs.new
CRLF state

All of them are saved with LF on Linux.
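A listing like this could be produced with a short script along the following lines. It is only a sketch: the function names are hypothetical, it inspects just the first line of each file, and the demo directory is a temporary stand-in for a real DataDirectory.

```python
import os
import tempfile


def detect_line_ending(path):
  """Return 'CRLF', 'LF' or 'none', judged from the first line only."""

  with open(path, 'rb') as f:
    first_line = f.readline()

  if first_line.endswith(b'\r\n'):
    return 'CRLF'
  elif first_line.endswith(b'\n'):
    return 'LF'
  else:
    return 'none'


def report_line_endings(directory):
  """Map each regular file in the directory to its detected line ending."""

  results = {}

  for name in os.listdir(directory):
    path = os.path.join(directory, name)

    if os.path.isfile(path):
      results[name] = detect_line_ending(path)

  return results


# Demo with two made-up files instead of a real tor DataDirectory.
demo_dir = tempfile.mkdtemp()

with open(os.path.join(demo_dir, 'cached-consensus'), 'wb') as f:
  f.write(b'network-status-version 3\r\n')

with open(os.path.join(demo_dir, 'cached-microdescs'), 'wb') as f:
  f.write(b'onion-key\n')

report = report_line_endings(demo_dir)

for name in sorted(report):
  print('%-4s %s' % (report[name], name))
```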

Instead of patching stem, would it make sense to decide upon a line ending for current and future files generated by Tor, and use it consistently across all platforms?

Last edited 2 years ago by trodun

comment:3 Changed 2 years ago by atagar

Thanks! I'll try to get a fix for this over the weekend. It's certainly tempting to normalize this in Tor but it's probably proper to use Windows newlines on that platform. When in Rome...

Easy enough to fix on Stem's end. ;)

comment:4 Changed 2 years ago by trodun

Great! In that case a separate issue could be opened, for using CRLF in all files generated on Windows.

comment:5 Changed 2 years ago by atagar

Pushed a fix, mind giving it a whirl?

comment:6 Changed 2 years ago by atagar

Resolution: fixed
Status: new → closed

comment:7 Changed 2 years ago by teor

See #17197 for fixing the line endings written by tor.
