Compass' command-line script can't encode unicode characters
Today I found that tail and less are unhappy about the task #6329 (moved) script printing out unicode characters. When piping its output into tail or less, the script exits with a traceback. When writing to stdout directly, Python is happy.
Here's how to reproduce the problem:
- Clone the metrics-tasks repository.
- Navigate to the #6329 (moved) script and make it download required data:
  cd task-6329/; ./tor-relays-stats.py -d
- Find a unicode character in an AS name:
  grep -B1 "as_name.*\\\\u" details.json
- Display relays in that AS, e.g. AS28548:
  ./tor-relays-stats.py -i -a 28548 | tail
Python should print out the following traceback:
Traceback (most recent call last):
  File "./tor-relays-stats.py", line 197, in <module>
    short=70 if options.short else None)
  File "./tor-relays-stats.py", line 110, in print_groups
    print formatted_group[:short]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 144: ordinal not in range(128)
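The error can also be triggered outside the script. Here is a minimal sketch, assuming the cause is that Python 2 falls back to the ascii codec when stdout is a pipe (sys.stdout.encoding is None in that case). The AS name and the try_ascii helper are made up for illustration; only the u'\xf3' character comes from the traceback:

```python
import sys

# Hypothetical AS name; only the u'\xf3' character is taken from the traceback.
as_name = u"Comunicaci\xf3n Ejemplo"


def try_ascii(text):
    """Return the error message from encoding text as ASCII, or None if it works."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as error:
        return str(error)
    return None


# Under Python 2 this is None when stdout is a pipe, which is when print
# falls back to the ascii codec for unicode strings.
sys.stderr.write("stdout encoding: %r\n" % (getattr(sys.stdout, "encoding", None),))
sys.stderr.write("ascii encode error: %s\n" % try_ascii(as_name))
```

Running this once directly and once piped into tail should show whether the reported stdout encoding differs between the two cases.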
I found that a possible solution is to replace all non-ASCII characters with '?'s, but that doesn't seem very elegant:
- exit, guard, country, as_number, as_name)
+ exit, guard, country, as_number, as_name.encode('ascii', 'replace'))
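For comparison, here is a small sketch of what that workaround does to a name, next to one alternative I could imagine: encoding to UTF-8 before printing. The AS name is made up, and the UTF-8 variant assumes that whatever reads the output (terminal, tail, less) expects UTF-8; I have not tried it in the script itself:

```python
# Hypothetical AS name; only the u'\xf3' character is taken from the traceback.
as_name = u"Comunicaci\xf3n Ejemplo"

# The workaround from the diff: lossy, every non-ASCII character becomes '?'.
replaced = as_name.encode("ascii", "replace")

# One alternative: encode as UTF-8 bytes, which keeps the character intact
# but assumes the consumer of the output expects UTF-8.
utf8 = as_name.encode("utf-8")

# The UTF-8 bytes decode back to the original name; the replaced bytes do not.
assert utf8.decode("utf-8") == as_name
assert replaced.decode("ascii") != as_name
```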
Are there better solutions?