Wednesday, November 28, 2012

Stack Overflows - Part 1 : The Basics


I have wanted to start an exploit development learning series for quite some time, and I've finally managed to take some time out of work to begin. With this series, I plan to go through a set of exercises and concepts that help beginners learn the basics of exploit development and, later on, more advanced exploitation concepts.

Disclaimer: These tutorials are for educational purposes only. Knowledge acquired through this series should never be used for hacking into networks or computers, or for any similar illegal activity. Continue reading only if you agree with this disclaimer.

Historically, stack-based buffer overflows have been among the most prevalent classes of security bugs in software, and they have been around for as long as the C language. Numerous methods and techniques for exploiting them have been published, for instance in Phrack papers. Even today, this category of bugs continues to be widely exploited. I am starting with some theoretical concepts, and we'll move on to more advanced material as we go.

Windows Memory Layout

A developer basically sees Windows memory as a flat memory model in which, on a 32-bit system, the processor can generate and access any memory address in the range 0 to 2^32 - 1, i.e. from 0x00000000 to 0xFFFFFFFF. Internally, Windows relies on the processor's memory segmentation and paging mechanisms to divide memory into protected segments: the code segment, the data segment and the stack segment. Each process resides in its own virtual address space and is not allowed to access the address space of another process, so that processes do not end up corrupting each other's data structures. Access control for memory was primarily handled by segmentation on systems based on older processors, but now it is all done through paging. If we visualize the flat memory layout of Windows, it looks like this:

















In the flat memory model, Windows NT presents a contiguous, linear 32-bit address space. Intel processors from the 386 onward have a 32-bit address bus (the 286 had only 24 address lines), which in the absence of segmentation and paging could reach any physical memory address directly. With segmentation, the processor's segment registers are loaded with 2-byte segment selectors, which are used as an index into the segment descriptor tables; the hardware uses them to translate a logical address (selector plus 32-bit offset) into a linear address. All of this is taken care of in hardware, which allows the programmer to treat the address space as a flat memory model.

As we can see from the picture above, 0x00000000 to 0x7FFFFFFF is the user space memory, which consists of the process image, process stack and heap, per-thread stacks, DLL code and user-land process data structures. The address space above that is the kernel space memory, where the kernel code resides.

As a side note, the separation between user space memory and kernel space memory is implemented via paging: the user/supervisor flag in the page directory entries (PDEs) and page table entries marks which pages are accessible from user mode. Let's not worry about all that now.

Process Memory in Windows

In the Windows environment, each process is loaded by the OS into its own virtual address space. As I said above, this is the user space memory: the process sees the address range from 0x00000000 to 0x7FFFFFFF as its linear address space, and the memory above that belongs to the kernel, which a user-land process has no access to.

When a process is created, additional data structures such as the Process Environment Block (PEB) and the per-thread Thread Environment Block (TEB) are also created. These data structures are of prime interest to exploit developers. The PEB gives access to PEB_LDR_DATA, which contains information about all the DLL modules loaded in the process, and shellcode reads this data structure to retrieve the base address of a DLL. For example, it could locate the base address of kernel32.dll and then LoadLibraryA to perform some interesting stuff :-)

This is what a Win32 process memory map looks like:


















The memory area marked as "Program Image" is where the program code and everything else that is immediately visible about the program resides.

Data section: This is the writable section of the loaded binary where all the initialized global and static variables of the program are located. For instance, variables defined with C statements such as int a = 2; char buff[] = "Hello"; static int b = 1; are stored in the data section.

Code section: This is the segment of the program where all the compiled code is located.

BSS section: This is the area where all the uninitialized global and static variables of the program are stored. Variables declared with C statements such as int a; or static int b; are located in the BSS section.

RSRC section: This is the resource section of the PE file, which contains information related to the UI of the program: icons, menus, dialog boxes, cursors, fonts and the like.

Heap section: This is the area in memory where all dynamically allocated data is stored. For example, memory allocated by malloc() lives in this space.

The Stack

The stack is an area in process memory, maintained per thread, that lets the program access data in LIFO (Last In, First Out) fashion: the most recent item stored on the stack is the first one removed. There are two primary operations on the stack, PUSH and POP. Both are CPU instructions that manipulate the stack.

When a process is created, stack memory is reserved and committed for it to store data on. Intel processors have a 32-bit register called ESP (Extended Stack Pointer), which points to the top of the stack. When a PUSH is performed, ESP is decremented and the data is stored at the new top of the stack, since the stack grows towards lower memory addresses. When a POP is performed, the data is read from the top of the stack and ESP is incremented.













Following are the basic uses of the stack memory:

1. When a subroutine is called, the arguments passed to it are PUSHed on the stack before the CALL instruction.
2. The return address, i.e. the address of the instruction at which the caller should resume after the subroutine finishes, is pushed on the stack by the CALL instruction.
3. During execution of the subroutine, memory for its temporary local variables is allocated on the stack.

Also, it is important to note that the stack is temporary storage: the frame is released when the subroutine returns (the bytes are not erased, but they may be overwritten by later calls). We will quickly walk through the Intel x86 registers, and then see with example code exactly how the stack frame of a function is established.

Intel x86 architecture General purpose registers and Instruction pointer

Processor registers are storage locations used to hold data for the arithmetic and logical operations performed by the processor. Since they are built into the processor itself, access to these registers is very fast. The Intel x86 architecture has 8 general purpose registers and an instruction pointer register, which points to the next instruction to be executed.

EAX : Known as the accumulator register. Usually used to store the results of arithmetic and logical operations, as well as return values from functions.
EBX : Base register, traditionally used as a pointer to data in the data section of the program.
ECX : Used as the counter for string and loop operations. ECX holds the value that is decremented by loop instructions.
EDX : Used as the I/O pointer. Also used in slightly more complex calculations (multiply, divide etc.).
EBP : Stack frame base pointer register. It points to the start of the function's stack frame and is used to access the function arguments and local variables via offsets.
ESP : Stack pointer. As discussed before, ESP always points to the top of the stack.
ESI : Source pointer for string operations (string copy, string comparison etc.).
EDI : Destination pointer for string operations (string copy, string comparison etc.).
EIP : Instruction pointer register, which points to the next instruction to execute.


Visualization of stack memory with example code

Let's look at the behaviour of the stack and the operations performed on it using example C code. I am using the following C code, compiled with Microsoft Visual Studio 2010 Express edition.

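#include <stdio.h>
#include <string.h>

int main (int argc , char *argv[])
{
    char buffer[20];

    /* Deliberately unchecked copy: this is the bug studied in this post.
       The program expects exactly one command line argument. */
    strcpy(buffer , argv[1]);
    if (!strcmp(buffer,"password"))
    {
        printf("Login Successful...\n");
        return 1;
    }
    else
    {
        printf("Access denied..password incorrect..\n");
        return 0;
    }
}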

If we open the binary in the debugger and look at the code, we see that argv and argc are pushed on the stack before main() is called. Since this binary accepts command line arguments, I pass them in the debugger as follows:

Navigate to the menu Debug --> Arguments, where we can pass the command line parameters to the binary:








Next, argv and argc are pushed on the stack:








If you cannot read the code in the screenshot, here is what it looks like:

PUSH EAX
MOV ECX, DWORD PTR DS:[argv]    ; argv moved to ECX register
PUSH ECX                        ; argv pushed on the stack
MOV EDX, DWORD PTR DS:[argc]    ; argc moved to EDX register
PUSH EDX                        ; argc pushed on the stack
CALL Buffer_O.main              ; call main()
ADD ESP, 0C                     ; caller removes the 12 bytes it pushed
Our stack at this point will look like this:















Next, when the CALL instruction executes, the address of the instruction following the CALL is pushed on the stack, so that the processor knows where to resume execution after returning from main(). The stack will look like this:















At this point, if we look at the stack in the OllyDbg debugger, here is how it is set up on entry to main(). The bottom right window of the debugger is the stack window, and the highlighted portion of the stack shows the pushed return address (saved EIP) and the two arguments to main.











The code window in the above picture shows the disassembled code of main(). If we examine the first few lines of the assembly code, we can relate them to the C source I showed a while back.

PUSH EBP
MOV EBP, ESP
SUB ESP, 54
PUSH EBX
PUSH ESI
PUSH EDI                                                  ;  ntdll.7C910228
MOV EAX, DWORD PTR SS:[EBP+C]              ;  kernel32.7C81776F
MOV ECX, DWORD PTR DS:[EAX+4]
PUSH ECX                                                  ; /src = "L+7"
LEA EDX, DWORD PTR SS:[EBP-14]                            ; |
PUSH EDX                                                  ; |dest = 00000002
CALL Buffer_O.strcpy                               ; \strcpy
ADD ESP, 8
PUSH Buffer_O.0040316C                          ; /s2 = "password"
LEA EAX, DWORD PTR SS:[EBP-14]                            ; |
PUSH EAX                                                  ; |s1 = "X17"
CALL Buffer_O.strcmp                              ; \strcmp
ADD ESP, 8
TEST EAX, EAX
JNZ SHORT Buffer_O.00401050
PUSH Buffer_O.00403154                           ; /format = "Login Successful...\n"
CALL DWORD PTR DS:[<&MSVCR100D.printf>]          ; \printf

Within main, as shown in the marked code, the function prologue is executed: EBP is pushed on the stack to save the caller's frame pointer, and then ESP is copied into EBP, which establishes the new stack frame for main. At this point, ESP and EBP both point to the same location. Remember that the stack always grows towards lower memory addresses, so each time a dword is pushed, ESP is decremented by 4.

Here, the stack will look like this. You can also view the stack state in the debugger and examine the ESP and EBP registers.


















At the third instruction, 0x54 is subtracted from ESP, which allocates space for the local variables, including the buffer we declared in the source (the compiler reserves more than the 20 bytes the buffer itself needs). Once this instruction is executed, the stack appears as shown below. The highlighted area in the debugger stack window is the space allocated for the variables.















In the next few lines of the code, we see strcpy() copying argv[1] into buffer, followed by the string comparison, after which the appropriate printf call is executed. What we need to remember here is that for every function called from within main, a stack frame is created in exactly the same way as demonstrated above. When the function returns, its stack frame is released, and the saved EBP and return address (EIP) are popped from the stack so that execution resumes in the caller.

Overflowing the buffer with long command line parameters

Now that we know how the stack frame is established, it is much easier to understand what happens if we pass a long command line argument to main. Until now we were OK, since we passed a string of 8 bytes and our buffer is 20 bytes long. If we pass a string of, say, 30 bytes as the command line parameter, strcpy() will write past the allocated buffer space, over the saved EBP and finally over the saved EIP as well, and even further if our string is longer. If we visualize the stack after the strcpy() operation, it will look like this:

An important point to note here is that the overflow proceeds from lower memory addresses to higher memory addresses.















Let's pass a long command line parameter and check the stack state and the behaviour of the debugger. Pass the parameter to the program in the same way I described previously. I passed a string of 45 "A" characters (hex: 0x41), and here is what happened in the debugger.













Aha! We've overflowed the buffer with the long string of "A"s passed as the parameter to main and, effectively, to strcpy(), which ran past the end of the buffer and eventually overwrote the saved return address (EIP). So when main returned, the overwritten value was popped off the stack into EIP, and an exception was thrown because the memory at that location could not be read. You can also see in the bottom right stack window that the allocated memory is filled with 0x41 (the hex value of "A") and that ESP points into our overflowed buffer. This is exactly what we will use to exploit the buffer overflow and jump to our shellcode. We'll see that in the next part :-)

So what exactly happened here? Let's step through the code closely and examine what caused the debugger to throw the exception. We will set a breakpoint on the strcpy call and examine the stack just before it, to get some clarity.

I restarted the program and set a breakpoint at the strcpy call:




If you look closely at the arguments of strcpy, the source is our string of "A"s, and the other argument is the destination, memory location 0x0012FF54 on the stack, where the string is to be copied. Observe the stack at this point, before the copy is done. A few locations below our destination pointer in the stack window, at 0x0012FF6C, the return address is saved. As I indicated before, the copy proceeds from low to high memory addresses. Once strcpy executes, this location will be overwritten with our string of "A"s.












If you look at the state of the stack after strcpy, it is filled with our supplied parameter, and the stack location 0x0012FF6C, which previously held the return address, now contains 0x41414141. Further, if you step through the code, it performs the strcmp, prints the appropriate message and finally executes the function epilogue:

MOV ESP, EBP
POP EBP
RETN

EBP is copied into ESP, and the top of the stack is then popped into EBP, which now receives 0x41414141 as well. Finally, when RETN is executed, ESP points to 0x0012FF6C, which contains 0x41414141, and that value is popped into EIP. The debugger reports an exception when the processor tries to fetch the next instruction from 0x41414141.

So with this simple vulnerable code, I demonstrated that we can control EIP and overwrite it with a memory address of our choosing. In this example, it is easy to find the exact offset in our parameter that overwrites EIP. We know that our buffer is exactly 20 bytes long. If we add the 4 bytes of saved EBP to that, the saved EIP should be overwritten starting at the 25th byte of our string. Let's try that out.

I'll pass a string with 24 bytes of "A"s + 4 bytes of "B"s, and we'll see that EIP is overwritten with 0x42424242.













That is what we expected. At this point, we've just triggered the buffer overflow in our vulnerable code. The next step is to exploit this vulnerability and modify the execution flow of the program to execute our own shellcode. If the command line argument is longer than 28 bytes, you will see that ESP points to an offset within our string, and we can use that to redirect the execution flow of the program. We'll see how to achieve it in the next part of this series.

Little endian Vs Big endian

This is another small concept that we need to understand before we dive deep into exploitation. Little endian and big endian describe the order in which the bytes of a multi-byte value are stored in memory, which depends on the underlying hardware architecture.

On big endian hardware, the most significant byte of the word / dword is stored first, at the lowest memory address, and the subsequent bytes are stored at increasing memory locations. For instance, if you visualize big endian storage of the dword 0x41424344 starting at memory location 0x0012FF6C, it will be stored in the following format:

0x0012FF6C  : 0x41
0x0012FF6D  : 0x42
0x0012FF6E  : 0x43
0x0012FF6F  : 0x44

If you take a memory dump of the bytes stored in big endian format, you will find 0x41424344 stored like this in memory:

ADDRESS     : ---- MEMORY BYTES ----------
0x0012ff6c  : 41 42 43 44 00 00 00 00 00 ...

On little endian hardware, the least significant byte of the word / dword is stored first, at the lowest memory address, and the subsequent bytes are stored at increasing memory locations. Intel processors store data in little endian format. So if you take a memory dump of the bytes stored in little endian format, you will find 0x41424344 stored in memory as below:

ADDRESS     : ---- MEMORY BYTES ----------
0x0012ff6c  : 44 43 42 41 00 00 00 00 00 ...


In part 2 of this series, we will explore a stack overflow vulnerability in a commercial software product and see how it can be exploited to do something very interesting.
