1. Introduction
This article presents a step-by-step analysis of a malware sample that we call AudioDr7 after the URL it attempts to contact. The MD5 hash of the sample is “ca1c1adab23e5baeeb3b49e0809e4ad4”, and a copy can be found at offensivecomputing.net.
The malware is embedded in a PDF document. Several tools aid the analysis: tools to extract the JavaScript, execute the payload, obtain the shellcode, and finally run the malicious code in an emulator and a debugger. Each is shown later in this article.
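Before starting, it is worth confirming that the downloaded file really is the sample discussed here. The following is a minimal sketch of that check; the function names are illustrative and the file path is whatever name the sample was saved under.

```python
import hashlib

# MD5 hash of the sample as quoted in this article.
EXPECTED_MD5 = "ca1c1adab23e5baeeb3b49e0809e4ad4"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_expected_sample(path):
    """True only if the file hashes to the MD5 quoted in the article."""
    return md5_of_file(path) == EXPECTED_MD5
```

Reading in chunks keeps memory use flat regardless of file size, and the file is only read, never opened by a PDF viewer.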
2. The Malware
A sample of the malware analyzed in this article can be obtained at http://www.offensivecomputing.net/.
Figure 1.0 - Malware found on Offensive Computing
The analysis is performed on a system running Ubuntu 10.04. The PDF document is first examined in a text editor to identify any suspicious objects it contains. In Figure 1.1, Vim is used to view the PDF file, and the object shown is object 13. The extremely large value assigned to the variable "s" strongly suggests malicious code: it is a long string of numbers that most likely encodes some form of shellcode.
Figure 1.1 - Large string from object 13 from the malicious PDF
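The manual inspection above can be approximated with a short script. This is only a sketch under stated assumptions: it expects the objects to be stored uncompressed (as they evidently are here, since the JavaScript is readable in the editor), and the keyword list and the size threshold are illustrative choices, not part of the original analysis.

```python
import re

# Matches "<num> <gen> obj ... endobj" in raw PDF bytes.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)

# Keys that often accompany embedded scripts (illustrative list).
SUSPICIOUS_KEYS = (b"/JavaScript", b"/JS", b"/OpenAction")


def suspicious_objects(pdf_bytes, big_body=1024):
    """Return (object number, body size, matched keys) for notable objects.

    An object is reported if it contains a suspicious key or if its body
    is at least `big_body` bytes long (a hypothetical threshold).
    """
    findings = []
    for match in OBJ_RE.finditer(pdf_bytes):
        number = int(match.group(1))
        body = match.group(3)
        keys = [key.decode() for key in SUSPICIOUS_KEYS if key in body]
        if keys or len(body) >= big_body:
            findings.append((number, len(body), keys))
    return findings
```

A script like this only narrows down where to look; the flagged objects still have to be read by hand.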
The string of numbers is followed by JavaScript code that appears to parse a string. Figure 1.2 shows the code segment that follows the declaration of the variable "s". After this preliminary inspection of the PDF document, the tool Jsunpack [1] is used to extract any JavaScript from the PDF into a separate file.
Figure 1.2 - JavaScript code from object 13 from the malicious PDF
Figure 1.3 displays the end of the output from Jsunpack. JavaScript is found and written to a separate file named “malware.exe.out”. The output contains the same information displayed in Figure 1.1 and Figure 1.2: the declaration of the variable “s” followed by the code that parses a string.
Figure 1.3 - Output from Jsunpack executed with the malicious PDF
3. Analysis of the JavaScript
The next step in the analysis is to obtain the shellcode, if any exists within the PDF. Suitable tools are SpiderMonkey [2] and Google’s V8 JavaScript Engine [3]; both are JavaScript interpreters that allow us to run JavaScript code. We use SpiderMonkey to execute the JavaScript contained in the file malware.exe.out. A patched version of SpiderMonkey 1.7 is available that redefines vulnerable functions and objects, which prevents infection of the analysis system and makes the analysis easier. This patched version is used here alongside a file named pre.js, which defines document objects so that the script does not fail with a reference error. The file pre.js can be found inside the Jsunpack folder.
Figure 1.4 - Output of SpiderMonkey executed with the malicious PDF
The command to run SpiderMonkey with pre.js and the JavaScript in malware.exe.out is shown in Figure 1.4. SpiderMonkey yields two interesting results. First, the pre.js file from Jsunpack identifies the exploit that the malware attempts to use, in this case “collab.getIcon”. Second, SpiderMonkey creates log files.
Figure 1.5 - Folder containing the two log files created by SpiderMonkey
In Figure 1.5, “eval.001.log” and “eval.002.log” are the two files created by SpiderMonkey. The first file contains the string built by the parsing function in Figure 1.2.
Figure 1.6 - Contents of eval.001.log
The second file contains the code produced when the string in the first file is executed, and it is here that we obtain the payload: the shellcode is assigned to the variable “payload”. The patched SpiderMonkey makes it easier to execute the JavaScript and obtain the shellcode. If the process were done manually, we would have to replace the eval and unescape calls with print statements, and the JavaScript would have to be modified and executed twice to obtain the same output.
Figure 1.7 - Snippet from the contents of eval.002.log
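The manual alternative mentioned above, swapping eval and unescape for print, amounts to a textual rewrite of the extracted script. Here is a rough sketch of that rewrite, assuming the interpreter provides a print function (as the SpiderMonkey shell does); the helper name is hypothetical, and a plain regular expression will not cope with every obfuscation trick.

```python
import re


def hook_calls(js_source, names=("eval",), sink="print"):
    """Rewrite calls such as eval(...) into print(...) in JavaScript text.

    Only whole identifiers are matched, so a function named `myeval`
    is left untouched.
    """
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(r"\b(" + alternatives + r")\s*\(")
    return pattern.sub(sink + "(", js_source)
```

Running the rewritten script prints the string that would otherwise have been evaluated. That output can be rewritten and run a second time to expose the next layer, which is why the manual route needs two passes.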
Figure 1.7 shows a snippet of the contents of eval.002.log. The payload, starting with “%uC033” and ending with “%u0070”, is copied and saved in a separate file, “payload.txt”. To analyze the shellcode we need to convert it to its hex representation, and for this we use a Perl script provided by “Malware Analyst’s Cookbook and DVD” [4].
Figure 1.8 - Payload converted to shellcode with Perl script
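The conversion itself is simple enough to sketch. This is not the Perl script from the book, only an illustration of the same idea: each %uXXXX token is a 16-bit value that lands in memory in little-endian order, so “%uC033” becomes the bytes 33 C0. The function name is an assumption, and any literal characters between tokens are ignored in this sketch.

```python
import re
import struct

# %uXXXX (16-bit unit) or %XX (single byte), as produced for unescape().
TOKEN_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def unescape_to_bytes(payload):
    """Convert a %u-encoded payload string into raw shellcode bytes."""
    out = bytearray()
    for match in TOKEN_RE.finditer(payload):
        if match.group(1):
            # 16-bit unit, stored low byte first on x86.
            out += struct.pack("<H", int(match.group(1), 16))
        else:
            out.append(int(match.group(2), 16))
    return bytes(out)
```

Applied to the first and last tokens quoted above, the payload begins with the bytes 33 C0 and ends with 70 00.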
Figure 1.8 shows the hex and ASCII representation of the shellcode converted from the payload string. The ASCII representation displays a URL, http://audiodr7..., which is most likely the address the malware will contact to download more malicious code. The shellcode should be saved in a separate file, named “shellcode.txt” in this example. Figure 1.9 shows the command that saves the output to that file.
Figure 1.9 - Shellcode saved to text file named "shellcode.txt"
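The URL stands out in the ASCII column because it is a long run of printable bytes. The same observation can be automated with a small strings-style helper. The sketch below assumes the shellcode is available as raw bytes; the minimum length of 6 is an arbitrary choice.

```python
import re


def ascii_strings(data, min_len=6):
    """Return runs of at least `min_len` printable ASCII bytes in `data`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

For example, calling ascii_strings on the decoded shellcode bytes would list the embedded URL along with any other readable strings, such as file names.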
4. Analysis of Shellcode
The next step is to use libemu [5], a tool that runs shellcode in an emulated environment. Libemu should report any Windows API functions that are called and list the instructions that are executed.
Figure 1.10 - Output of libemu executed with shellcode
In Figure 1.10 the step size is 100000 and the verbose option is enabled. Libemu reports that the malware calls the Windows function GetTempPathA, and execution stops there. Execution stops because GetTempPathA is expected to return a temporary path for the program to use; none is given, so the program cannot continue. This is one limitation of libemu. However, we can manually analyze the binary instructions of the malware with a user-level debugger, Immunity Debugger [6].
The hex code is needed to load the malware into Immunity Debugger. Figure 1.8 displays the hex code, which is copied to a separate file named “hexdump.txt”. To obtain the hex code without the offset or ASCII columns, the command in Figure 1.11 is used.
Figure 1.11 - Hex dump only of the malicious shellcode
Instead of displaying it on the screen, we save it to the file hexdump.txt as shown in Figure 1.12.
Figure 1.12 - Command to output shellcode to text file in hex code format
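For readers without the original script at hand, a hex-only dump is easy to produce from the raw shellcode bytes. This sketch is not the command shown in the figures; the function name and the 16-bytes-per-line layout are assumptions.

```python
def hex_only(data, per_line=16):
    """Format bytes as plain hex text, without offsets or an ASCII column."""
    lines = []
    for start in range(0, len(data), per_line):
        lines.append(data[start:start + per_line].hex())
    return "\n".join(lines)
```

The resulting text contains only hex digits and line breaks, which is the form that can be pasted into a shellcode-to-executable converter.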
Immunity Debugger is installed on a system running Windows XP SP2. From the hex dump file we can easily obtain an executable by using the online Sandsprite tool “shellcode 2 exe” [7]. The hex dump is pasted into the text box on the web page, and the executable is created and downloaded to the system.
Figure 1.13 - Shellcode 2 exe web interface
The file created is named “shellcode.exe_”. This file can be opened with Immunity Debugger.
Figure 1.14 - Shellcode executable loaded into Immunity Debugger
The following keys are used in this analysis: “F8” steps through the program, “F7” steps into a function, “F2” sets a software breakpoint, and “F9” runs the program until a breakpoint is reached. For an explanation of how to use Immunity Debugger, refer to Dr. Fu’s Security Blog [8].
The first interesting instruction is at address 00401002. Here the instruction “MOV EAX, DWORD PTR FS:[EAX+30]” copies an address into the EAX register. A reference to the FS segment should raise a red flag, because this region stores critical per-thread information. The layout of this region can be verified with WinDbg: attach WinDbg to any process or executable and examine the data structure for the Thread Environment Block.
Figure 1.15 - Data structure for Thread Environment Block in WinDbg
As we can see in Figure 1.15, offset 30 from FS (FS:[30]) holds the ProcessEnvironmentBlock field, a 32-bit pointer. The next value loaded into the EAX register comes from the instruction at address 00401008, which reads DS:[EAX+C]; it is followed by a read of DS:[EAX+1C]. The first read, DS:[EAX+C], loads the “Ldr” field, which is a pointer to _PEB_LDR_DATA. This can be verified with WinDbg.
Figure 1.16 - Data structure of _PEB in WinDbg
The second read, DS:[EAX+1C], loads the address of InInitializationOrderModuleList into the EAX register. This address points to the beginning of a list of modules, and the malware will probably try to access one of these modules later. This can also be verified with WinDbg.
Figure 1.17 - Data structure of _PEB_LDR_DATA
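The three offsets used by the shellcode (30, C and 1C, all hexadecimal) can be cross-checked without a debugger by modelling the start of each structure. The sketch below uses ctypes with 32-bit integers standing in for pointers. The field names follow the 32-bit Windows XP layouts that WinDbg displays; the class names and the decision to model only the leading fields are assumptions made for illustration.

```python
import ctypes


class LIST_ENTRY32(ctypes.Structure):
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]


class TEB32_HEAD(ctypes.Structure):
    # Leading part of the 32-bit Thread Environment Block (FS:[0] onward).
    _fields_ = [
        ("NtTib", ctypes.c_uint32 * 7),          # _NT_TIB, 0x1C bytes
        ("EnvironmentPointer", ctypes.c_uint32),
        ("ClientId", ctypes.c_uint32 * 2),
        ("ActiveRpcHandle", ctypes.c_uint32),
        ("ThreadLocalStoragePointer", ctypes.c_uint32),
        ("ProcessEnvironmentBlock", ctypes.c_uint32),
    ]


class PEB32_HEAD(ctypes.Structure):
    # Leading part of the 32-bit Process Environment Block.
    _fields_ = [
        ("InheritedAddressSpace", ctypes.c_uint8),
        ("ReadImageFileExecOptions", ctypes.c_uint8),
        ("BeingDebugged", ctypes.c_uint8),
        ("SpareBool", ctypes.c_uint8),
        ("Mutant", ctypes.c_uint32),
        ("ImageBaseAddress", ctypes.c_uint32),
        ("Ldr", ctypes.c_uint32),
    ]


class PEB_LDR_DATA32(ctypes.Structure):
    _fields_ = [
        ("Length", ctypes.c_uint32),
        ("Initialized", ctypes.c_uint8),         # padded up to offset 8
        ("SsHandle", ctypes.c_uint32),
        ("InLoadOrderModuleList", LIST_ENTRY32),
        ("InMemoryOrderModuleList", LIST_ENTRY32),
        ("InInitializationOrderModuleList", LIST_ENTRY32),
    ]


def offsets():
    """Return the three offsets the shellcode dereferences, in order."""
    return (
        TEB32_HEAD.ProcessEnvironmentBlock.offset,
        PEB32_HEAD.Ldr.offset,
        PEB_LDR_DATA32.InInitializationOrderModuleList.offset,
    )
```

Each value returned by offsets() corresponds to one dereference in the shellcode: FS:[30] to reach the PEB, [PEB+C] to reach the loader data, and [Ldr+1C] to reach the module list.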
As we can see in Figure 1.17, InInitializationOrderModuleList is at offset 1C. Next, let us set a breakpoint at 0040105F. As Figure 1.18 shows, there is a nested loop. After some analysis we can conclude that the malware carries its own table of hashes and uses it to locate a specific function in kernel32.dll. At address 0040105B the instruction CMP EDI, EAX compares the hash values; if they are not equal, the JNZ that follows is taken and the malware continues to search the list. When the malware finds a match, execution falls through the JNZ to the instruction at 0040105F, which pops the top of the stack into the ESI register.
Figure 1.18 - Section of shellcode loaded in Immunity Debugger
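The lookup loop described above follows a common shellcode pattern: hash each candidate name and compare the result with a stored value, so that no function names need to appear in the shellcode. The hash algorithm used by this sample is not identified in this article. The sketch below uses the widely seen rotate-right-by-13 scheme purely as an illustration, and the helper names are hypothetical.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by `count` bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name):
    """Illustrative name hash: rotate right by 13, then add each character.

    This is an assumed algorithm, not one recovered from the sample.
    """
    result = 0
    for byte in name.encode("ascii"):
        result = (ror32(result, 13) + byte) & 0xFFFFFFFF
    return result


def resolve_by_hash(export_names, wanted_hash):
    """Mimic the CMP/JNZ loop: return the first name whose hash matches."""
    for name in export_names:
        if ror13_hash(name) == wanted_hash:   # CMP EDI, EAX
            return name                       # fall through the JNZ
    return None                               # end of list, no match
```

An analyst who has identified the real algorithm can precompute the hashes of all kernel32.dll export names and look up the constants found in the shellcode, which names the resolved functions without single-stepping through the loop.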
With the breakpoint set at 0040105F, we run the program to the breakpoint with the “F9” key, then continue stepping until the instruction ADD EAX, EBX at address 00401071. Here the EAX register holds the function the malware was searching for. The function is GetTempPathA, which matches the output of libemu.
Figure 1.19 - Registers of shellcode.exe at address 00401071
We continue to step through the program. Inside GetTempPathA, the function obtains the temp folder for the system and returns it to the malware as an ANSI string. The figure below displays the stack contents at address 7C822220, which is inside GetTempPathA. The value stored is “C:\DOCUME~1\Mario\LOCALS~1\Temp\”.
Figure 1.19 - Stack contents of shellcode.exe at address 7C822220
We continue to step through the program. At address 0040109E, at the instruction PUSH EAX, the ESI register contains the temp path of the system followed by the file name of an executable, “e.exe”. This is most likely the file the malware wants to download.
Figure 1.20 - Immunity Debugger instructions and registers at the address 0040109E
We continue to step through the program and note the functions that the malware calls, since they reveal what the malware is trying to accomplish. A breakpoint is set at the instruction POP EDI to quickly find the different functions the malware will call. This location is chosen because it comes after the hash lookup routine, so whenever a match is found the resolved function is visible in the EAX register.
Figure 1.21 - Immunity Debugger showing the function in the EAX register
The second function called by the malware is GetProcAddress, from kernel32.dll. The function name can be seen in the EAX register in Figure 1.21. We continue to the next function by pressing “F9”.
Figure 1.22 - Immunity Debugger showing the function in the EAX register
In Figure 1.22 above, the third function called is stored in the EAX register. The function is LoadLibraryA, which is also found in kernel32.dll. If we examine the call to LoadLibraryA further, we find that two extra libraries are loaded into memory: first twain_32.dll and then urlmon.dll.
Again we execute the program to the breakpoint at 00401073, and the fourth function called is URLDownloadToFileA from the library urlmon.dll. The function can be seen in the EAX register in Figure 1.23.
Figure 1.23 - Immunity Debugger showing the function in the EAX register
Examining the call to URLDownloadToFileA, we encounter the web address it connects to in order to download an executable. The address is “http://audiodr7...”, the same one that appeared in the hex dump of the shellcode in Figure 1.8. Figure 1.24 shows the stack contents at address 772BAAD3, inside the URLDownloadToFileA function.
Figure 1.24 - Stack contents at address 772BAAD3
Again we execute the program to the previously set breakpoint by pressing “F9” and obtain the fifth function called: WinExec, from kernel32.dll, whose address is stored in the EAX register. After WinExec is called, the malware terminates and the system is infected.
Figure 1.25 - Immunity Debugger showing the function in the EAX register
5. Conclusion
We now have an overview of what the AudioDr7 malware is trying to accomplish and which functions it attempts to call. To summarize, five important functions are called.
- GetTempPathA – Obtains the location of the temporary folder for the system
- GetProcAddress – Obtains the address of an exported function in a loaded library
- LoadLibraryA – Loads two extra libraries, twain_32.dll and urlmon.dll
- URLDownloadToFileA – Connects to the audiodr7 URL and downloads the file “e.exe” to the temp location
- WinExec – The last function called, used to execute the downloaded file “e.exe”
To conclude, many tools exist to aid in the analysis of malware. The approach described above is one way to reverse engineer malware, specifically malware that is embedded in a PDF document.
6. References
[1] Jsunpack, Available at https://code.google.com/p/jsunpack-n/
[2] SpiderMonkey, Available at https://developer.mozilla.org/en/SpiderMonkey
[3] V8 JavaScript Engine, Available at http://code.google.com/p/v8/
[4] Michael Hale Ligh et al., “Malware Analyst’s Cookbook and DVD”
[5] Libemu – x86 Shellcode Emulation, Available at http://libemu.carnivore.it/
[6] Immunity Debugger, Available at http://www.immunitysec.com/products-immdbg.shtml
[7] Shellcode 2 Exe, Available at http://sandsprite.com/shellcode_2_exe.php
[8] Dr. Xiang Fu, Malware Analysis Tutorial 4: Int2dh Anti-Debugging