Query String Parser Translations

See PythonTranslator for the origin of this exercise.


Original Java code:

See also: CommentingChallengeResponsePartTwo

  import java.io.*;
  import java.util.*;

  /**
   * Like HttpUtils.parseQueryString, except that it never throws parse
   * exceptions.
   */
  public class QueryStringParser {

      private Dictionary _result = new Hashtable();
      private InputStream _stream;
      private int _nextCharacter;

      public QueryStringParser(InputStream stream) throws IOException {
          _stream = stream;
          _nextCharacter = stream.read();
      }

      public Dictionary parseArgs() throws IOException {
          while (hasAnotherCharacter()) {
              parseNameValuePair();
          }
          return _result;
      }

      private void parseNameValuePair() throws IOException {
          String name = readUpTo('=');
          String value = readUpTo('&');
          _result.put(name, value);
      }

      private String readUpTo(char boundaryCharacter) throws IOException {
          String word = "";

          while (hasAnotherCharacter()) {
              char character = readCharacter();

              if (character == boundaryCharacter) return word;
              else if (character == '%') word += readHexEncodedCharacter();
              else if (character == '+') word += " ";
              else word += character;
          }
          return word;
      }

      private String readHexEncodedCharacter() throws IOException {
          int sixteens = readHexDigit();
          int ones = readHexDigit();
          if ((sixteens < 0) || (ones < 0)) return "";

          char character = (char)((16 * sixteens) + ones);
          return "" + character;
      }

      private int readHexDigit() throws IOException {
          if (!hasAnotherCharacter()) return -1;

          return Character.digit(readCharacter(), 16);
      }

      private boolean hasAnotherCharacter() {
          return (_nextCharacter >= 0);
      }

      private char readCharacter() throws IOException {
          if (!hasAnotherCharacter()) throw new IllegalStateException("assertion failed");

          char result = (char)_nextCharacter;
          _nextCharacter = _stream.read();
          return result;
      }

  }

Python port of the Java example:

This is pretty much a line-for-line port of Java code that parses an HTTP query string. Differences between the two languages for this code snippet:

This code doesn't demonstrate any compelling advantages for Python over Java; instead, it simply shows that you can port from one language to the other without jumping through major hoops. You could, for example, prototype code like this in Python, then port the code to Java when you wanted stronger typing and more punctuation. ;}

    class queryStringParser:
        def __init__(self, stream):
            self._result = {}
            self._stream = stream
            self._nextCharacter = stream.read()

        def parseArgs(self):
            while self.hasAnotherCharacter():
                self.parseNameValuePair()
            return self._result

        def parseNameValuePair(self):
            name = self.readUpTo('=')
            value = self.readUpTo('&')
            self._result[name] = value

        def readUpTo(self, boundaryCharacter):
            word = ""
            while self.hasAnotherCharacter():
                character = self.readCharacter()
                if (character == boundaryCharacter):
                    return word
                elif (character == '%'):
                    word = word + self.readHexEncodedCharacter()
                elif (character == '+'):
                    word = word + " "
                else:
                    word = word + character
            return word

        def readHexEncodedCharacter(self):
            sixteens = self.readHexDigit()
            ones = self.readHexDigit()
            # A malformed escape decodes to nothing rather than raising.
            if (sixteens < 0 or ones < 0):
                return ""
            character = 16*sixteens + ones
            return "%c" % character

        def readHexDigit(self):
            if (not self.hasAnotherCharacter()):
                return -1
            return Character().digit(self.readCharacter(), 16)

        def hasAnotherCharacter(self):
            return (self._nextCharacter is not None)

        def readCharacter(self):
            if (not self.hasAnotherCharacter()):
                raise AssertionError("assertion failed")
            result = self._nextCharacter
            self._nextCharacter = self._stream.read()
            return result

The following classes are an artifact of porting the original code from Java. If you were writing pure Python code from scratch, you might use file/string methods directly.

    class inputStream:
        def __init__(self):
            self.data = "search=find+stuff+here+%26+do+stuff&foo=bar"

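        def read(self):
            # Return the next character, or None once the data is exhausted.
            try:
                c = self.data[0]
                self.data = self.data[1:]
            except IndexError:
                c = None
            return c

    class Character:
        def digit(self, c, base):
            # Stand-in for Java's Character.digit: -1 for a non-digit.
            try:
                return int(c, base)
            except ValueError:
                return -1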
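As a rough illustration of the "file/string methods directly" remark above, here is a minimal sketch, assuming Python 3 and the standard io.StringIO class in place of the hand-written inputStream. The helper names next_char and hex_digit are hypothetical and are not part of the original port.

```python
import io

def next_char(stream):
    """Read one character; return None at end of input (like inputStream.read)."""
    c = stream.read(1)
    return c if c else None

def hex_digit(c):
    """Return the value of hex digit c, or -1 if it is not one (like Character.digit)."""
    digits = "0123456789abcdef"
    return digits.find(c.lower()) if len(c) == 1 else -1

# Drain a small in-memory stream one character at a time.
stream = io.StringIO("a%2B")
chars = []
while True:
    c = next_char(stream)
    if c is None:
        break
    chars.append(c)

print(chars)
print(hex_digit("B"), hex_digit("g"))
```

With helpers like these, the parser class could take any file-like object, so neither wrapper class would be needed.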

Extensive unit testing :)

    qsp = queryStringParser(inputStream())
    print(qsp.parseArgs())

Enjoy. -- SteveHowell
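
For a slightly less tongue-in-cheek check, the expected result for the sample data can be cross-checked against the standard library. This is only a sketch, assuming Python 3: urllib.parse.parse_qsl is a different parser from the port above, used here purely as a reference for what the sample query string should decode to.

```python
from urllib.parse import parse_qsl

# Same sample data as the inputStream class above.
sample = "search=find+stuff+here+%26+do+stuff&foo=bar"

# parse_qsl returns (name, value) pairs with '+' and %XX already decoded.
expected = dict(parse_qsl(sample))
print(expected)
```

The port should produce the same two entries, with "search" mapped to "find stuff here & do stuff" and "foo" mapped to "bar".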


Here's a Python non-translation, 33 lines instead of 70. It depends on Python 2 features (at least list comprehensions and string methods). Since Python's string type, like Java's String class, is immutable, I accumulate characters in a list to avoid O(N^2) behavior. It would be a little longer without those features. As it is, it's about half as long as the more literal implementation above. It also improves over the previous Python implementation in the following ways:

    import io, re

    # URL-unescape: decode a URL string
    def decode(astring):
        return re.sub('%(..)', lambda mo: chr(int(mo.group(1), 16)),
                      astring.replace('+', ' '))

    class queryStringParser:
        def __init__(self, input):
            self._stream = input
        def parseArgs(self):
            # read up to first \0
            chars = []
            while True:
                c = self._stream.read(1)
                if c == '' or c == '\0': break
                chars.append(c)
            query_string = ''.join(chars)

            # parse
            rv = {}
            for name, value in [pair.split('=', 1) for pair in query_string.split('&')]:
                rv[decode(name)] = decode(value)
            return rv

    qsp = queryStringParser( io.StringIO("a=b&c=d+e&f=g=h&i=%2bjk%21l\0bad=man") )
    print(qsp.parseArgs())
    print(qsp.parseArgs())
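
The list-accumulation point above can be seen in isolation. A minimal sketch, assuming Python 3 and io.StringIO; read_until_nul is a hypothetical helper name, not part of the code above.

```python
import io

def read_until_nul(stream):
    """Collect characters up to the first NUL (or end of input), then join once."""
    chars = []
    while True:
        c = stream.read(1)
        if c == '' or c == '\0':
            break
        chars.append(c)       # appending to a list does not copy earlier characters
    return ''.join(chars)     # the final string is built in a single pass

stream = io.StringIO("a=b&c=d\0bad=man")
first = read_until_nul(stream)    # everything before the NUL
second = read_until_nul(stream)   # the remainder, read by a second call
print(first)
print(second)
```

By contrast, repeated `word = word + c` on an immutable string can copy the whole string on each step, which is where the quadratic cost comes from.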


Here's an even shorter, even cleaner, idiomatic implementation:

    import re

    class queryStringParser(list):
        def __init__(self, args):
            # [::-1] reverses the list so that pop() can be used
            self.extend(args.split("\0")[::-1])

        def _decode(self, astring):
            def _convert(amatch):
                return chr(int(amatch.group(1), 16))

            astring = astring.replace('+', ' ')
            return re.sub('%(..)', _convert, astring)

        def parseArgs(self):
            query_string = self.pop() # Throw IndexError if called too often
            pairs = [pair.split('=', 1) for pair in query_string.split('&')]
            rv = [(self._decode(name), self._decode(val)) for name, val in pairs]
            return dict(rv)

    qsp = queryStringParser("a=b&c=d+e&f=g=h&i=%2bjk%21l\0bad=man")
    print(qsp.parseArgs())
    print(qsp.parseArgs())
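
One behavioral difference from the Java original: it never throws parse exceptions, whereas the regex-based decode in the two shorter versions raises ValueError on a malformed escape such as "%zz". The following is a minimal sketch of a decode that drops malformed escapes instead, roughly mirroring readHexEncodedCharacter. It assumes Python 3, and tolerant_decode is a hypothetical name rather than part of either version above.

```python
import re
import string

def tolerant_decode(text):
    """URL-unescape text, dropping malformed %-escapes instead of raising."""
    def convert(match):
        digits = match.group(1)
        # Accept exactly two hex digits; anything else is treated as malformed.
        if len(digits) == 2 and all(ch in string.hexdigits for ch in digits):
            return chr(int(digits, 16))
        return ""  # like the Java code, a bad escape decodes to nothing

    # Replace '+' first so that a '+' produced by %2b is not turned into a space.
    # '%' consumes up to two following characters, as the Java reader does.
    return re.sub(r'%(.{0,2})', convert, text.replace('+', ' '), flags=re.DOTALL)

print(tolerant_decode("%2bjk%21l"))
print(tolerant_decode("a%zzb"))
```

Whether silently dropping bad escapes is preferable to raising is a design choice; this sketch simply follows what the Java version does.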


Anyone care to add another translation of this program? How about Smalltalk?


(last edited November 17, 2009)