See PythonTranslator for the origin of this exercise.
See also: CommentingChallengeResponsePartTwo
Original Java code:

    import java.io.*;
    import java.util.*;

    /**
     * Like HttpUtils.parseQueryString, except that it never throws parse
     * exceptions.
     */
    public class QueryStringParser {

        private Dictionary _result = new Hashtable();
        private InputStream _stream;
        private int _nextCharacter;

        public QueryStringParser(InputStream stream) throws IOException {
            _stream = stream;
            _nextCharacter = stream.read();
        }

        public Dictionary parseArgs() throws IOException {
            while (hasAnotherCharacter()) {
                parseNameValuePair();
            }
            return _result;
        }

        private void parseNameValuePair() throws IOException {
            String name = readUpTo('=');
            String value = readUpTo('&');
            _result.put(name, value);
        }

        private String readUpTo(char boundaryCharacter) throws IOException {
            String word = "";
            while (hasAnotherCharacter()) {
                char character = readCharacter();
                if (character == boundaryCharacter) return word;
                else if (character == '%') word += readHexEncodedCharacter();
                else if (character == '+') word += " ";
                else word += character;
            }
            return word;
        }

        private String readHexEncodedCharacter() throws IOException {
            int sixteens = readHexDigit();
            int ones = readHexDigit();
            if ((sixteens < 0) || (ones < 0)) return "";
            char character = (char)((16 * sixteens) + ones);
            return "" + character;
        }

        private int readHexDigit() throws IOException {
            if (!hasAnotherCharacter()) return -1;
            return Character.digit(readCharacter(), 16);
        }

        private boolean hasAnotherCharacter() {
            return (_nextCharacter >= 0);
        }

        private char readCharacter() throws IOException {
            if (!hasAnotherCharacter())
                throw new IllegalStateException("assertion failed");
            char result = (char)_nextCharacter;
            _nextCharacter = _stream.read();
            return result;
        }
    }

Python port of the Java example:
This is pretty much a line-for-line port of Java code that parses an HTTP query string. Differences between the two languages for this code snippet:

    class queryStringParser:
        def __init__(self, stream):
            self._result = {}
            self._stream = stream
            self._nextCharacter = stream.read()

        def parseArgs(self):
            while self.hasAnotherCharacter():
                self.parseNameValuePair()
            return self._result

        def parseNameValuePair(self):
            name = self.readUpTo('=')
            value = self.readUpTo('&')
            self._result[name] = value

        def readUpTo(self, boundaryCharacter):
            word = ""
            while self.hasAnotherCharacter():
                character = self.readCharacter()
                if (character == boundaryCharacter):
                    return word
                elif (character == '%'):
                    word = word + self.readHexEncodedCharacter()
                elif (character == '+'):
                    word = word + " "
                else:
                    word = word + character
            return word

        def readHexEncodedCharacter(self):
            sixteens = self.readHexDigit()
            ones = self.readHexDigit()
            if (sixteens < 0 or ones < 0):
                return ""
            character = 16*sixteens + ones
            return "%c" % character

        def readHexDigit(self):
            if (not self.hasAnotherCharacter()):
                return -1
            return Character().digit(self.readCharacter(), 16)

        def hasAnotherCharacter(self):
            return (self._nextCharacter is not None)

        def readCharacter(self):
            if (not self.hasAnotherCharacter()):
                # string exceptions are rejected by later Python versions,
                # so raise a real exception object
                raise AssertionError("assertion failed")
            result = self._nextCharacter
            self._nextCharacter = self._stream.read()
            return result

The following classes are an artifact of porting the original code from Java. If you were writing pure Python code from scratch, you might use file/string methods directly.

    class inputStream:
        def __init__(self):
            self.data = "search=find+stuff+here+%26+do+stuff&foo=bar"

        def read(self):
            try:
                c = self.data[0]
                self.data = self.data[1:]
            except:
                c = None
            return c

    class Character:
        def digit(self, c, base):
            try:
                return long(c, base)
            except:
                return -1

Extensive unit-testing :)

    qsp = queryStringParser(inputStream())
    print qsp.parseArgs()

Enjoy. -- SteveHowell
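
The remark above, that pure Python would use file/string methods directly instead of the inputStream and Character helper classes, might look roughly like the following. This is only a sketch and rests on assumptions not in the original: it targets Python 3 (the code on this page is Python 2), and `hex_digit` is a hypothetical helper name.

```python
import io

def hex_digit(c):
    """Value of the hex digit c, or -1 if c is not one
    (a stand-in for Java's Character.digit(c, 16))."""
    try:
        return int(c, 16)
    except ValueError:
        return -1

# io.StringIO stands in for the hand-written inputStream class.
stream = io.StringIO("search=find+stuff")
first = stream.read(1)   # read one character at a time
# Note: read(1) returns '' at end of input, not None, so a port using it
# would test for '' in hasAnotherCharacter().
```

With these two pieces the parser class itself would not need to change much, apart from the end-of-input test.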
Here's a Python non-translation, 33 lines instead of 70. It depends on Python2 features: at least list comprehensions and string methods; and since Python's string type (like Java's String class) is immutable, I accumulate characters in a list to avoid O(N^2) behavior. It would be a little longer without those features. As it is, it's about half as long as the more literal implementation above. It also improves over the previous Python implementation in the following ways:

    import sys, StringIO, re

    # decode a URL string
    # URL-unescape
    def decode(astring):
        return re.sub('%(..)', lambda mo: chr(int(mo.group(1), 16)),
                      astring.replace('+', ' '))

    class queryStringParser:
        def __init__(self, input):
            self._stream = input

        def parseArgs(self):
            # read up to first \0
            chars = []
            while 1:
                c = self._stream.read(1)
                if c == '' or c == '\0': break
                chars.append(c)
            query_string = ''.join(chars)
            # parse
            rv = {}
            for name, value in [ pair.split('=', 1)
                                 for pair in query_string.split('&')]:
                rv[decode(name)] = decode(value)
            return rv

    qsp = queryStringParser(
        StringIO.StringIO("a=b&c=d+e&f=g=h&i=%2bjk%21l\0bad=man")
    )
    print qsp.parseArgs()
    print qsp.parseArgs()

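
The point about accumulating characters in a list (rather than growing an immutable string one character at a time) can be isolated in a few lines. The sketch below assumes Python 3 and `io.StringIO`; `read_until_nul` is a hypothetical name, not something from the code above.

```python
import io

def read_until_nul(stream):
    """Read characters up to (and consuming) the next '\\0' or end of input."""
    chars = []                  # appending to a list is cheap
    while True:
        c = stream.read(1)
        if c == '' or c == '\0':
            break
        chars.append(c)
    return ''.join(chars)       # build the string once, at the end

stream = io.StringIO("a=b&c=d\0bad=man")
first = read_until_nul(stream)    # "a=b&c=d"
second = read_until_nul(stream)   # "bad=man"
```

Because the stream keeps its position between calls, a second call picks up after the `\0`, which is why calling parseArgs() twice yields two different dictionaries.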
Here's an even shorter, even cleaner idiomatic implementation:

    import re

    class queryStringParser(list):
        def __init__(self, args):
            # [::-1] reverses the list so that pop() can be used
            self.extend(args.split("\0")[::-1])

        def _decode(self, astring):
            def _convert(amatch):
                return chr(int(amatch.group(1), 16))
            astring = astring.replace('+', ' ')
            return re.sub('%(..)', _convert, astring)

        def parseArgs(self):
            query_string = self.pop()  # Throw IndexError if called too often
            pairs = [pair.split('=', 1) for pair in query_string.split('&')]
            rv = [(self._decode(name), self._decode(val)) for name, val in pairs]
            return dict(rv)

    qsp = queryStringParser("a=b&c=d+e&f=g=h&i=%2bjk%21l\0bad=man")
    print qsp.parseArgs()
    print qsp.parseArgs()

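
For comparison, a present-day Python 3 version could probably lean on the standard library's `urllib.parse.parse_qsl` instead of hand-rolled decoding. This is a sketch under that assumption, not a translation of the code above; `parse_args` is a hypothetical name.

```python
from urllib.parse import parse_qsl

def parse_args(query_string):
    """Parse one query string into a dict of decoded names and values."""
    # parse_qsl splits on '&', splits each pair at the first '=',
    # and decodes '+' and %XX escapes.
    return dict(parse_qsl(query_string, keep_blank_values=True))

# Same test data as above: two query strings separated by '\0'.
queries = "a=b&c=d+e&f=g=h&i=%2bjk%21l\0bad=man".split("\0")
results = [parse_args(q) for q in queries]
```

It is not an exact equivalent: `parse_qsl` decodes escapes as UTF-8 by default, and a pair with no `=` is kept with an empty value rather than raising an error as the list-comprehension versions would.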
Anyone care to add another translation of this program? How about Smalltalk?