Raw HTTP Requests
Abusing Python's httplib

Part of a recent project involved sending HTTP requests over non-standard transports. The problem is that all the usual offenders (urllib2, Requests, etc.) don't provide an interface to either wrap the connection or extract the final product. But this is Python; we can monkeypatch it! The first step was figuring out how urllib2 requests actually get sent to a socket...

# urllib2 module
class AbstractHTTPHandler(BaseHandler):
    def do_open(self, http_class, req):
        h = http_class(host, timeout=req.timeout) # will parse host:port
        try:
            h.request(req.get_method(), req.get_selector(), req.data, headers)
        except socket.error, err: # XXX what error?
            raise URLError(err)

class HTTPHandler(AbstractHTTPHandler):
    def http_open(self, req):
        return self.do_open(httplib.HTTPConnection, req)
    http_request = AbstractHTTPHandler.do_request_

# httplib module
class HTTPConnection:
    def request(self, method, url, body=None, headers={}):
        """Send a complete request to the server."""
        self._send_request(method, url, body, headers)
    def _send_request(self, method, url, body, headers):
        if body is not None and 'content-length' not in header_names:
            self._set_content_length(body)
        for hdr, value in headers.iteritems():
            self.putheader(hdr, value)
        self.endheaders(body)
    def endheaders(self, message_body=None):
        """Indicate that the last header line has been sent to the server."""
        self._send_output(message_body)
    def _send_output(self, message_body=None):
        """Send the currently buffered request and clear the buffer."""
        self.send(msg)

Searching for 'socket' in the urllib2 module reveals AbstractHTTPHandler.do_open calling the request method on an instance of its http_class argument. HTTPHandler.http_open passes httplib.HTTPConnection as that argument. (How http_open actually gets called is interesting: urlopen calls OpenerDirector.open, which calls OpenerDirector._open, which executes "self._call_chain(self.handle_open, protocol, protocol + '_open', req)".)
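The name-based dispatch described above can be seen in isolation with a toy handler. This is a minimal sketch in Python 3, where urllib2 lives on as urllib.request; the EchoHandler class and the "echo" scheme are made up for illustration and return a plain string rather than a real response object.

```python
import urllib.request

class EchoHandler(urllib.request.BaseHandler):
    """Hypothetical handler for a made-up 'echo' scheme."""

    def echo_open(self, req):
        # Found by name: the opener looks up protocol + '_open' on its handlers
        return "handled " + req.full_url

# A bare OpenerDirector, so no default handlers are involved
opener = urllib.request.OpenerDirector()
opener.add_handler(EchoHandler())

result = opener.open("echo://anything")
print(result)
```

No network traffic is involved: the opener only parses the URL, reads the scheme, and calls the matching method.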

Now in the httplib module, HTTPConnection.request calls HTTPConnection._send_request, which calls HTTPConnection.endheaders, which calls HTTPConnection._send_output, which, finally, calls self.send!

class HTTPConnection:
    def send(self, data):
        """Send `data' to the server."""
        if self.sock is None:
            if self.auto_open:
                self.connect()
            else:
                raise NotConnected()

        if self.debuglevel > 0:
            print "send:", repr(data)
        blocksize = 8192
        if hasattr(data,'read') and not isinstance(data, array):
            if self.debuglevel > 0: print "sendIng a read()able"
            datablock = data.read(blocksize)
            while datablock:
                self.sock.sendall(datablock)
                datablock = data.read(blocksize)
        else:
            self.sock.sendall(data)
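Since every request funnels through send, overriding that one method is already enough to see the outgoing bytes. The following is a rough sketch in Python 3, where httplib is named http.client; CapturingConnection and its captured attribute are hypothetical names, and "example.invalid" is a placeholder host that is never contacted.

```python
import http.client

class CapturingConnection(http.client.HTTPConnection):
    """Hypothetical subclass that records outgoing bytes instead of sending them."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.captured = []

    def send(self, data):
        # The request -> ... -> _send_output chain ends here; no socket is opened
        self.captured.append(data)

conn = CapturingConnection("example.invalid")
conn.request("GET", "/")
raw = b"".join(conn.captured)
print(raw)
```

This only helps when the connection class can be chosen directly, which is not the case when going through urlopen.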

self.sock.sendall looks like a send on a socket, but only the code that sets self.sock can confirm it...

class HTTPConnection:
    def __init__(self, host, port=None, strict=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT, source_address=None):
        self.sock = None
    def connect(self):
        """Connect to the host and port specified in __init__."""
        self.sock = socket.create_connection((self.host,self.port),self.timeout, self.source_address)
        if self._tunnel_host:
            self._tunnel()

There we go! The socket is instantiated in httplib.HTTPConnection.connect and finally receives data via sendall (HTTPS is similar). The interesting question then becomes how to patch this long chain of objects and methods so that the data is captured instead of being sent over a real socket. Python doesn't natively provide any socket-mimicking objects, but for our purposes socket.sendall behaves like a standard file write. That means the file-like objects in the io module can act as stand-ins once a sendall attribute is monkeypatched on, as below. Subclassing BytesIO would be a better strategy if we wanted to mimic a socket more fully, but sendall is the only method needed for now.

import io
buffer = io.BytesIO()
buffer.sendall = buffer.write
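For comparison, the subclassing strategy could start out as below. This is only a sketch: FakeSocket is a hypothetical name, and it covers nothing beyond sendall, so anything else a real socket offers (makefile, recv, and so on) is still missing.

```python
import io

class FakeSocket(io.BytesIO):
    """Hypothetical in-memory stand-in for the sending half of a socket."""

    def sendall(self, data):
        # socket.sendall returns None on success; write() returns a byte count,
        # so the return value is dropped here
        self.write(data)

sock = FakeSocket()
sock.sendall(b"GET / HTTP/1.1\r\n")
sock.sendall(b"\r\n")
print(sock.getvalue())
```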

The same instance-level patch can't be applied directly to connect: getting at the http_class(..) instance would mean patching urllib2.AbstractHTTPHandler.do_open (which has a ton of code). An alternative is to patch connect on the httplib.HTTPConnection class itself, which is both easier and harder. It provides automatic access to the http_class(..) instance (as self), but the replacement must accept the same limited arguments (i.e., only self) and is called from inside the httplib module. A lambda can close over locally scoped variables and pass them along, though. That pretty much wraps up the necessary adjustments, so here's the resulting function:

import httplib
import io
import urllib2

def get_request_data(url, data=None, headers=None, **kwargs):
    # in-memory stand-in for the socket; sendall is the only method the send path needs
    buffer = io.BytesIO()
    buffer.sendall = buffer.write

    targets = [httplib.HTTPConnection, httplib.HTTPSConnection]
    old_connects = [target.connect for target in targets]

    def fake_connect(self, buffer):
        self.sock = buffer

    # connect() is only ever called with self, so the lambda closes over the local buffer
    for target in targets:
        target.connect = lambda self: fake_connect(self, buffer)
    try:
        urllib2.urlopen(urllib2.Request(url, data=data, headers=headers or {}, **kwargs)).read()
    except AttributeError:  # socket interface only partially implemented on BytesIO
        pass  # missing makefile method throws an AttributeError in HTTPResponse.__init__
    finally:
        # always restore the real connect methods
        for target, old_connect in zip(targets, old_connects):
            target.connect = old_connect
    return buffer.getvalue()
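The class-level patch itself carries over to Python 3, where httplib is named http.client. The sketch below is an assumption-laden variant rather than a port of the function above: captured_connect is a hypothetical helper, it drives http.client directly instead of going through urlopen, it assumes the interpreter was built with SSL support (otherwise HTTPSConnection is absent), and "example.invalid" is a placeholder host that is never contacted.

```python
import contextlib
import http.client
import io

@contextlib.contextmanager
def captured_connect():
    """Temporarily replace connect() on the http.client connection classes."""
    buffer = io.BytesIO()
    buffer.sendall = buffer.write  # file-like write standing in for socket.sendall

    targets = [http.client.HTTPConnection, http.client.HTTPSConnection]
    originals = [target.connect for target in targets]

    def fake_connect(self):
        # The closure gives the patched method access to the local buffer
        self.sock = buffer

    for target in targets:
        target.connect = fake_connect
    try:
        yield buffer
    finally:
        # Restore the real connect methods even if the request raised
        for target, original in zip(targets, originals):
            target.connect = original

with captured_connect() as buffer:
    conn = http.client.HTTPConnection("example.invalid")
    conn.request("POST", "/", body=b"upload", headers={"User-Agent": "custom"})
raw = buffer.getvalue()
print(raw.decode("ascii"))
```

Stopping before getresponse sidesteps the missing makefile method entirely, at the cost of not exercising whatever headers urllib would add on its own.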

Example Execution

>>> import net
>>> net.get_request_data('http://www.shysecurity.com')
'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.shysecurity.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
>>> net.get_request_data('https://www.shysecurity.com',data='upload',headers={'host':'www.not-shysecurity.com','User-Agent':'custom'})
'POST / HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 6\r\nHost: www.not-shysecurity.com\r\nContent-Type: application/x-www-form-urlencoded\r\nConnection: close\r\nUser-Agent: custom\r\n\r\nupload'
>>> print net.get_request_data('https://www.shysecurity.com',data='upload',headers={'host':'shysecurity.com','User-Agent':'custom'})
POST / HTTP/1.1
Accept-Encoding: identity
Content-Length: 6
Host: shysecurity.com
Content-Type: application/x-www-form-urlencoded
Connection: close
User-Agent: custom

upload

Code available on Github

- Kelson (kelson@shysecurity.com)