Monday, April 9, 2012

Reverse Engineering AudioDr7 PDF Malware

1. Introduction
This article presents a step-by-step analysis of a piece of malware that we will call AudioDr7, after the URL it attempts to contact. The MD5 hash of the sample is “ca1c1adab23e5baeeb3b49e0809e4ad4”, and a copy can be found at offensivecomputing.net. The malware is embedded in a PDF document. Several tools aid the analysis: tools to extract the JavaScript, execute it to recover the payload, convert the payload to shellcode, and finally run the malicious code in an emulator and a debugger. Each is shown later in this article.

2. The Malware
A sample of the malware analyzed in this article can be obtained at http://www.offensivecomputing.net/.

Figure 1.0 - Malware found on Offensive Computing
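Before starting, it is worth confirming that the downloaded file really is the sample discussed here. The following is a minimal sketch that compares a file's MD5 against the hash quoted in the introduction; the function names and the file path passed to them are illustrative assumptions, not part of the original workflow.

```python
import hashlib

# MD5 hash of the sample, as quoted in the introduction of this article.
EXPECTED_MD5 = "ca1c1adab23e5baeeb3b49e0809e4ad4"

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_expected_sample(path):
    """True if the file at `path` (hypothetical name) matches the article's hash."""
    return md5_of_file(path) == EXPECTED_MD5
```

MD5 is used here only because it is the identifier the article gives for the sample.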

The analysis is performed on a system running Ubuntu 10.04. The PDF document is first opened in a text editor to identify any suspicious objects it contains. In Figure 1.1, Vim is used to view the contents of the PDF file; the object shown is object 13. The extremely large value assigned to the variable "s" strongly suggests malicious code: it is a long string of numbers that most likely represents some form of encoded shellcode.

Figure 1.1 - Large string from object 13 from the malicious PDF
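The manual inspection above can be approximated with a short triage script. This is only a sketch under stated assumptions: the keyword list is a common-sense selection, the "long run of digits and commas" heuristic is a guess at what the string in object 13 looks like, and the function name and threshold are hypothetical. It also only sees uncompressed content, which is the case for the object viewed in the editor here.

```python
import re

# PDF name keywords that commonly mark active content (an assumed, non-exhaustive list).
SUSPICIOUS_KEYWORDS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA")

def triage_pdf(data, min_run=200):
    """Count suspicious keywords and locate long runs of digits, commas and whitespace.

    Returns (counts, runs), where runs is a list of (offset, length) tuples.
    """
    counts = {kw.decode("ascii"): data.count(kw) for kw in SUSPICIOUS_KEYWORDS}
    # Assumption: an encoded payload shows up as a very long numeric string.
    pattern = re.compile(rb"\d[\d,\s]{%d,}" % min_run)
    runs = [(m.start(), len(m.group())) for m in pattern.finditer(data)]
    return counts, runs
```

A caller would read the PDF with `open(path, "rb").read()` and inspect any reported offsets in an editor, as done in Figure 1.1.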
Following the string of numbers is JavaScript code that appears to parse a string. Figure 1.2 shows the code segment that follows the declaration of the variable "s". After this preliminary inspection of the PDF document, the tool Jsunpack [1] is used to extract any JavaScript from the PDF into a separate file.

Figure 1.2 - JavaScript code from object 13 from the malicious PDF
Figure 1.3 displays the end of the output from Jsunpack. JavaScript is found and written to a separate file named “malware.exe.out”. The output contains the same content shown in Figure 1.1 and Figure 1.2: the declaration of the variable “s” followed by the code that parses a string.

Figure 1.3 - Output from Jsunpack executed with the malicious PDF
3. Analysis of the JavaScript
The next step is to recover the shellcode, if any exists within the PDF. For this we need a JavaScript interpreter such as SpiderMonkey [2] or Google’s V8 JavaScript Engine [3]; either can run the extracted code. We use SpiderMonkey to execute the JavaScript contained in the file malware.exe.out. A patched version of SpiderMonkey 1.7 is available that is better suited to malware analysis: it redefines vulnerable functions and objects to prevent infection of the analysis system and to make the analysis easier. This patched version is used here, together with a file named pre.js that defines document objects so that the script does not fail with a reference error. The file pre.js can be found inside the Jsunpack folder.

Figure 1.4 - Output of SpiderMonkey executed with the malicious PDF
The command to run SpiderMonkey with pre.js and the JavaScript in malware.exe.out is shown in Figure 1.4. Two interesting results are obtained. First, the pre.js file from Jsunpack identifies the exploit that the malware attempts to use; in this case it is “collab.getIcon”. Second, SpiderMonkey creates log files.

Figure 1.5 - Folder containing the two log files created by SpiderMonkey
In Figure 1.5, the files “eval.001.log” and “eval.002.log” are the two files created by SpiderMonkey. The first file contains the string built by the parsing function in Figure 1.2.

Figure 1.6 - Contents of eval.001.log
The second file contains the result of evaluating the string in the first file, and in it we obtain the payload: the shellcode is assigned to the variable “payload”. The patched SpiderMonkey makes it easy to execute the JavaScript and obtain the shellcode. Done manually, we would have to replace the eval and unescape calls with print statements, and the JavaScript would have to be modified and executed twice to obtain the same output.

Figure 1.7 - Snippet from the contents of eval.002.log

Figure 1.7 shows a snippet of the contents of eval.002.log. The payload, starting with “%uC033” and ending with “%u0070”, is copied and saved in a separate file, “payload.txt”. To analyze the shellcode we need to convert it to its hex representation; for this we use a Perl script provided with “Malware Analyst’s Cookbook and DVD” [4].
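For readers without the Perl script at hand, the conversion itself is simple to sketch. The snippet below is not the book's script; it is a stand-in that decodes JavaScript-style %uXXXX escapes (each one a 16-bit value stored little-endian) and %XX escapes into raw bytes. The function name is an assumption chosen for illustration.

```python
import re
import struct

# Matches either a %uXXXX (16-bit) escape or a %XX (8-bit) escape.
ESCAPE_PATTERN = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")

def unescape_to_bytes(payload):
    """Decode %uXXXX and %XX escapes into raw bytes, ignoring anything else."""
    out = bytearray()
    for match in ESCAPE_PATTERN.finditer(payload):
        if match.group(1):
            # %uC033 becomes the bytes 33 C0: the low byte is stored first.
            out += struct.pack("<H", int(match.group(1), 16))
        else:
            out.append(int(match.group(2), 16))
    return bytes(out)
```

Applied to the first and last escapes quoted above, “%uC033” yields the bytes 33 C0 and “%u0070” yields 70 00.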

Figure 1.8 - Payload converted to shellcode with Perl Script

Figure 1.8 shows the hex and ASCII representation of the shellcode converted from the payload string. The ASCII column displays a URL, http://audiodr7..., which is most likely the address the malware will contact to download more malicious code. The shellcode should be saved in a separate file, named “shellcode.txt” in this example. Figure 1.9 shows the command that saves the output to that file.

Figure 1.9 - Shellcode saved to text file named "shellcode.txt"
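The URL was spotted by eye in the ASCII column of the dump. As a rough sketch, the same check can be automated by pulling runs of printable ASCII out of the decoded bytes, much like the Unix `strings` utility. The function name and the minimum length of 6 are assumptions.

```python
import re

def printable_strings(data, min_len=6):
    """Return runs of printable ASCII characters at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Any result beginning with `http://` is a candidate for the download address discussed above.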

4. Analysis of Shellcode
The next step is to use libemu [5], a tool that runs shellcode in an emulated environment. Libemu should report any Windows API functions that are called and list the instructions that are executed.

Figure 1.10 - Output of libemu executed with shellcode

In Figure 1.10 the step limit is 100000 and verbose output is enabled. Libemu reports that the malware calls the Windows function GetTempPathA, and execution stops there. Execution stops because GetTempPathA is expected to return a temporary path for the program to use; the emulator supplies none, so the program cannot continue. This is one limitation of libemu. However, we can analyze the instructions manually with a user-level debugger, Immunity Debugger [6].

The hex code is needed to load the malware into Immunity Debugger. Figure 1.8 displays the hex code, which is copied to a separate file named “hexdump.txt”. To obtain the hex code without the offset or ASCII columns, the command in Figure 1.11 is used.

Figure 1.11 - Hex dump only of the malicious shellcode

Instead of displaying it on the screen we save it to the file hexdump.txt as shown in Figure 1.12.

Figure 1.12 - Command to output shellcode to text file in hex code format
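The exact command from Figures 1.11 and 1.12 is not reproduced in the text, so the following is only an equivalent sketch: it formats raw shellcode bytes as plain hex pairs, with no offsets and no ASCII column, suitable for pasting into a converter. The function name and the 16-bytes-per-line layout are assumptions.

```python
def hex_only(data, per_line=16):
    """Format bytes as space-separated hex pairs, per_line bytes on each line."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append(" ".join(f"{byte:02x}" for byte in chunk))
    return "\n".join(lines)
```

Writing the returned text to hexdump.txt gives a file in the same spirit as the one described above.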

Immunity Debugger is installed on a system running Windows XP SP2. From the hex dump file we can obtain an executable file by using the online Sandsprite tool “shellcode 2 exe” [7]. The hex dump is pasted into the text box on the web page, and the resulting executable is downloaded to the system.

Figure 1.13 - Shellcode 2 exe web interface

The file created is named “shellcode.exe_”. This file can be opened with Immunity Debugger.

Figure 1.14 - Shellcode executable loaded into Immunity Debugger

To step through the program the key “F8” is used. To step into a function the key “F7” is used. To set a software breakpoint the key “F2” is used. To run the program or execute until a breakpoint is reached, the key “F9” is used. These are the commands used for this analysis. For an explanation on how to use Immunity Debugger refer to Dr. Fu’s Security Blog [8].

The first interesting instruction is at address 00401002. Here the instruction “MOV EAX, DWORD PTR FS:[EAX+30]” copies an address into the EAX register. Because the payload begins with the bytes 33 C0 (XOR EAX, EAX), EAX is zero at this point and the instruction effectively reads FS:[30]. A reference to the FS segment should raise a red flag: on 32-bit Windows, FS points to the Thread Environment Block, which stores critical per-thread information. The layout of this structure can be verified with WinDbg: attach WinDbg to any process or executable and examine the data structure for the Thread Environment Block.

Figure 1.15 - Data structure for Thread Environment Block in WinDBG

As we can see in Figure 1.15, FS:[30] holds the ProcessEnvironmentBlock field, a 32-bit pointer to the Process Environment Block (PEB). The next value saved to the EAX register is loaded at address 00401008: DS:[EAX+C] is read, and after it DS:[EAX+1C]. The first read, DS:[EAX+C], loads the “Ldr” field, which is a pointer to _PEB_LDR_DATA. This can be verified with WinDbg.

Figure 1.16 - Data structure of _PEB in WinDBG

The second read, DS:[EAX+1C], loads the InInitializationOrderModuleList field into the EAX register. This value points to the beginning of a list of loaded modules, and the malware will probably try to access one of these modules later. This can also be verified with WinDbg.

Figure 1.17 - Data structure of _PEB_LDR_DATA
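The three reads described so far form a pointer chain: FS:[30] gives the PEB, PEB+0C gives Ldr, and Ldr+1C gives the start of the initialization-order module list. The sketch below replays that chain against a simulated memory map. The offsets come from the WinDbg output discussed above; every address in the usage example is invented purely for illustration, and the function and constant names are assumptions.

```python
# Field offsets taken from the structures shown in Figures 1.15 to 1.17.
TEB_PEB_OFFSET = 0x30          # _TEB.ProcessEnvironmentBlock
PEB_LDR_OFFSET = 0x0C          # _PEB.Ldr
LDR_INIT_ORDER_OFFSET = 0x1C   # _PEB_LDR_DATA.InInitializationOrderModuleList

def read_dword(memory, address):
    """Read a 32-bit value from a simulated memory map (a dict of address -> value)."""
    return memory[address]

def first_init_order_entry(memory, teb_base):
    """Follow TEB -> PEB -> Ldr -> InInitializationOrderModuleList."""
    peb = read_dword(memory, teb_base + TEB_PEB_OFFSET)
    ldr = read_dword(memory, peb + PEB_LDR_OFFSET)
    return read_dword(memory, ldr + LDR_INIT_ORDER_OFFSET)

# Usage with made-up addresses (not taken from the debugging session):
fake_memory = {
    0x7FFDF030: 0x7FFD4000,  # TEB+30 -> PEB
    0x7FFD400C: 0x00251EA0,  # PEB+0C -> Ldr
    0x00251EBC: 0x00251F28,  # Ldr+1C -> first list entry
}
first_entry = first_init_order_entry(fake_memory, 0x7FFDF000)
```

Shellcode favors this chain because it finds loaded modules without calling any API, which is why it appears before the first resolved function.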
As we can see in Figure 1.17, InInitializationOrderModuleList is at offset 1C. Next, let us set a breakpoint at 0040105F. As Figure 1.18 shows, there is a nested loop. After some analysis we can conclude that the malware carries its own table of hashes and uses it to locate specific functions to load from kernel32.dll. At address 0040105B the instruction CMP EDI, EAX compares the hash values; if they are not equal, the search continues. When the malware finds a match, execution falls through the JNZ instruction and continues at 0040105F, which pops the top of the stack into the ESI register.

Figure 1.18 - Section of shellcode loaded in Immunity Debugger
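The lookup-by-hash idea in the loop above can be illustrated in a few lines. The article does not state which hash function this sample uses, so the sketch below substitutes the rotate-right-by-13 additive hash that is widely seen in Windows shellcode; treat the algorithm, the function names, and the export list as assumptions made for illustration only.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def ror13_hash(name):
    """Assumed hash: rotate the accumulator right by 13, then add each character."""
    accumulator = 0
    for byte in name.encode("ascii"):
        accumulator = (ror32(accumulator, 13) + byte) & 0xFFFFFFFF
    return accumulator

def resolve_by_hash(export_names, wanted_hash):
    """Walk the export names and return the first whose hash matches, else None.

    The comparison mirrors the CMP EDI, EAX / JNZ pair described above.
    """
    for name in export_names:
        if ror13_hash(name) == wanted_hash:
            return name
    return None
```

Storing only hashes keeps readable API names out of the shellcode, which is why none of the five function names appear in the hex dump.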

After the breakpoint has been set at 0040105F we can run the program to the breakpoint with the key “F9”. Continue to step through the program until the instruction ADD EAX, EBX at address 00401071. Here the EAX register holds the function that the malware was searching for. The function is GetTempPathA, which matches the output of libemu.

Figure 1.19 - Registers of shellcode.exe at address 00401071

We continue to step through the program. Inside GetTempPathA, the function obtains the temp folder for the system and returns it to the malware as an ANSI string (the “A” suffix denotes the ANSI variant of the API). The figure below displays the stack contents at the address 7C822220, which is inside the function GetTempPathA. The value stored is “C:\DOCUME~1\Mario\LOCALS~1\Temp\”.

Figure 1.19 - Stack contents of shellcode.exe at address 7C822220

We continue to step through the program. At address 0040109E, at the instruction PUSH EAX, the ESI register contains the system temp path followed by the name of an executable file, “e.exe”. This is most likely the file the malware wants to download.

Figure 1.20 - Immunity Debugger instructions and registers at the address 0040109E

We continue to step through the program and note the functions called by the malware, since these should reveal what the malware is trying to accomplish. A breakpoint is set at the POP EDI instruction to quickly find each function the malware resolves. This location is chosen because it comes after the hash lookup routine: when a match is found, the resolved function is visible in the EAX register.

Figure 1.21 - Immunity Debugger showing the function in EAX register

The second function called by the malware is GetProcAddress, which is exported by kernel32.dll. The function name can be seen in the EAX register in Figure 1.21. We continue to the next function by pressing “F9”.

Figure 1.22 - Immunity Debugger showing the function in the EAX register

Figure 1.22 above shows the third function, stored in the EAX register: LoadLibraryA, also exported by kernel32.dll. If we examine the calls to LoadLibraryA further, we find that two additional libraries are loaded into memory: first twain_32.dll and then urlmon.dll.

Again we execute the program to the breakpoint at 00401073 and the fourth function called is URLDownloadToFileA from the library urlmon.dll. The function can be seen in the EAX register in Figure 1.23.

Figure 1.23 - Immunity Debugger showing the function in the EAX register

Examining the call to URLDownloadToFileA, we find the web address it connects to in order to download an executable. The address is “http://audiodr7...”, the same one that appeared in the hex dump of the shellcode in Figure 1.8. Figure 1.24 shows the stack contents at the address 772BAAD3 inside the URLDownloadToFileA function.

Figure 1.24 - Stack contents at address 772BAAD3

Again we run the program to the previously set breakpoint by pressing “F9” and obtain the fifth function. It is WinExec from kernel32.dll, and its address is stored in the EAX register. After WinExec is called, the malware terminates and the system is infected.

Figure 1.25 - Immunity Debugger showing the function in the EAX register

5. Conclusion
Now we have an overview of what the AudioDr7 malware is trying to accomplish and which functions it attempts to call. To summarize, five important functions are called:
  1. GetTempPathA – Obtains the location of the temporary folder for the system
  2. GetProcAddress – Obtains the address of an exported function from a loaded library
  3. LoadLibraryA – Loads two extra libraries, twain_32.dll and urlmon.dll
  4. URLDownloadToFileA – Connects to the audiodr7 URL and downloads the file “e.exe” to the temp location
  5. WinExec – The last function called, used to execute the downloaded file “e.exe”
To conclude, many tools exist to aid in the analysis of malware. The approach described above is one way to reverse engineer malware, specifically malware that is embedded in a PDF document.

6. References
[1] Jsunpack, Available at https://code.google.com/p/jsunpack-n/
[2] SpiderMonkey, Available at https://developer.mozilla.org/en/SpiderMonkey
[3] V8 JavaScript Engine, Available at http://code.google.com/p/v8/
[4] Michael Hale Ligh et al., “Malware Analyst’s Cookbook and DVD”, Available at
[5] Libemu – x86 Shellcode Emulation, Available at http://libemu.carnivore.it/
[6] Immunity Debugger, Available at http://www.immunitysec.com/products-immdbg.shtml
[7] Shellcode 2 Exe, Available at http://sandsprite.com/shellcode_2_exe.php
[8] Dr. Xiang Fu, Malware Analysis Tutorial 4: Int2dh Anti-Debugging, Available at,