
Thread: How a program works

  1. #1
    Just burned his ISO PeterPunk's Avatar
    Join Date
    Jul 2009
    Posts
    9

    Default How a program works

    The post title could (also) be
    A very early step for understanding Buffer Overflow

    First, I would like to point out that everything I say applies to the x86 processor family. In addition, most memory addresses are written in decimal notation (for the sake of clarity for beginners) instead of the hexadecimal notation that real-world systems actually use.

    Every process is laid out in memory in three basic segments:
    -Code Segment
    -Data Segment (which includes the well-known BSS area for uninitialized globals)
    -Stack Segment

    CODE SEGMENT
    ---------------
    All the instructions of our program "live" in this memory segment. Nobody... (nobody? well ok, almost nobody) can write to this memory segment, i.e. it is a read-only segment.

    For example, the assembly instructions generated from the following C code are located in the code segment:

    /* Set the items on the main diagonal to 1, all others to 0 */
    for (i = 0; i < 100; i++)
        for (j = 0; j < 100; j++)
            if (i != j)
                a[i][j] = 0;
            else
                a[i][j] = 1;


    PS: The remarks /*...*/ are not included in the code segment (or in any other segment). The compiler does not produce code for remarks.

    DATA SEGMENT
    ---------------
    All global variables, initialized or uninitialized, are stored in this writable (non-read-only) segment.
    For example:

    int i;
    int j = 0;
    int a[100][100];
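
    To tie the two segments together, here is a minimal sketch that combines the loop from the CODE SEGMENT example with the globals above. The function name fill_diagonal is my own (hypothetical), and the comments describe where each piece typically ends up with a common x86 toolchain, which is an assumption and not something the C language itself promises.

    ```c
    int i;               /* uninitialized global: data segment (BSS area), starts as 0 */
    int j = 0;           /* initialized global: data segment */
    int a[100][100];     /* uninitialized global array: data segment (BSS area) */

    /* The machine instructions generated for this function live in the code segment. */
    void fill_diagonal(void)
    {
        for (i = 0; i < 100; i++)
            for (j = 0; j < 100; j++)
                if (i != j)
                    a[i][j] = 0;   /* off the diagonal */
                else
                    a[i][j] = 1;   /* on the diagonal */
    }
    ```

    After a call to fill_diagonal(), an item such as a[3][3] holds 1 and an item such as a[3][4] holds 0.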



    STACK SEGMENT
    ----------------
    All local function variables, function arguments and return addresses are stored in this writable (non-read-only) memory.
    This segment is actually a stack data structure (familiar to anyone who has attended a basic information technology course). This means that we put variables on a stack in memory. The last variable put (or pushed) is on the top of the stack, i.e. it is the first one available. This is the well-known LIFO (Last In First Out) data structure.

    The processor register ESP (Extended Stack Pointer) holds the address of the top of the stack, i.e. of the most recently pushed element.

    We can put (PUSH) values onto the stack and get (POP) values from it.
    There are two important "secrets" here:
    [1] PUSH and POP instructions work in 4-byte units because of the 32-bit architecture of the x86 processor family.
    [2] The stack grows downward, that is, if ESP=256, just after a "PUSH EAX" instruction, ESP will become 252 and the value of EAX will be placed at address 252.

    For example:

    STACK
    adr memory
    --- -------
    256 xy
    252
    248
    244
    (ESP=256)

    Instruction > PUSH EAX ; remark: suppose EAX = 34

    STACK
    256 xy
    252 34
    248
    244
    (ESP=252)

    Instruction > POP EAX ; remark: Get the value from the stack into EAX register

    STACK
    256 xy
    252 34 <- still in memory, but no longer part of the stack
    248
    244
    (ESP=256)


    Instruction > PUSH EAX ; remark: suppose EAX = 15
    Instruction > PUSH EBX ; remark: suppose EBX = 16

    STACK
    256 xy
    252 15
    248 16
    244
    (ESP=248)
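
    The PUSH and POP steps above can be replayed with a toy model in C. This is only a sketch: ToyStack, toy_push and toy_pop are names I made up, and the array simply stands in for the memory at decimal addresses 0 to 260.

    ```c
    #include <stdint.h>

    #define TOY_MEM_BYTES 260   /* hypothetical size of our toy memory */

    typedef struct {
        int32_t mem[TOY_MEM_BYTES / 4 + 1]; /* slot n holds the 4 bytes at address 4*n */
        int esp;                            /* decimal address of the top of the stack */
    } ToyStack;

    /* PUSH: first move ESP down by 4, then store the value at the new ESP. */
    void toy_push(ToyStack *s, int32_t value)
    {
        s->esp -= 4;
        s->mem[s->esp / 4] = value;
    }

    /* POP: first read the value at ESP, then move ESP up by 4.
       The old value stays in memory, it is just no longer part of the stack. */
    int32_t toy_pop(ToyStack *s)
    {
        int32_t value = s->mem[s->esp / 4];
        s->esp += 4;
        return value;
    }
    ```

    Starting with esp = 256, pushing 34 moves esp to 252 and popping gives back 34 with esp = 256 again. Pushing 15 and then 16 leaves 15 at address 252, 16 at address 248 and esp = 248, exactly as in the diagrams.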



    What is behind a function-call
    -----------------------------
    Before we explain what happens behind the scenes, we must say a few words about EIP (the Extended Instruction Pointer, or simply the 'instruction pointer'). This register holds the code segment address of the next instruction that the CPU will execute.

    Every time the CPU executes an instruction, it stores into EIP the address of the instruction that follows the one currently being executed.
    But how does the CPU find the address of the next instruction?
    Well... we have two cases here...
    1. The next instruction is located immediately after the one currently being executed.
    2. There is a jump (a 'JMP', or a function call), so the instruction that must be executed next is at an address that does not directly follow the current one.

    In case 1 the address is calculated by simply adding the length of the currently executed instruction to the current EIP value.
    Example:
    Suppose we have the following 2 instructions at addresses 100 and 101:

    100 push EDX
    101 mov ESP, 0

    Suppose that at the starting point of our little program we have: EIP = 100
    The CPU executes the instruction at address 100.
    The CPU checks the instruction:
    Is it a jump? No, so it calculates the instruction's size. The CPU knows that the push instruction is 1 byte long.
    So... the new value of EIP is:
    EIP = EIP + size(push EDX) =>
    EIP = 100 + 1 =>
    EIP = 101
    The CPU executes the instruction at address 101, and so forth...
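
    The two cases can be written down as a tiny C function. This is a simplified sketch, not how a real decoder works: the Insn struct and next_eip are hypothetical names, and a real x86 jump usually encodes a displacement rather than a ready-made target address.

    ```c
    /* A decoded instruction, reduced to what is needed to find the next one. */
    typedef struct {
        int length;   /* size of the instruction in bytes */
        int is_jump;  /* non-zero if the instruction transfers control */
        int target;   /* destination address, used only when is_jump is set */
    } Insn;

    /* Case 1: fall through to eip + length. Case 2: continue at the target. */
    int next_eip(int eip, Insn insn)
    {
        if (insn.is_jump)
            return insn.target;
        return eip + insn.length;
    }
    ```

    For the 1-byte push EDX at address 100 this gives 100 + 1 = 101. For a jump, the length is ignored and the target address is used.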

    In case 2 we have a jump, so things are a bit different.
    Conceptually, just before we jump to another address to call a function, we save the address of the next instruction somewhere safe, and just before returning from the function we write that saved address back into EIP. On x86 the saved address is kept on the stack, not in a register.

    The CALL and RETN assembly instructions do exactly this bookkeeping.
    CALL does 2 things:
    1. It "remembers" the instruction that must be executed after the function returns (by pushing its address onto the stack), and
    2. It writes the address of the called function into EIP, i.e. it performs the function call.

    The RETN instruction is executed at the end of the function:
    It pops (gets) the "return address" that CALL pushed onto the stack and writes it into EIP, so execution continues right after the call.
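
    Here is a minimal C sketch of what CALL and RETN do to EIP and ESP. ToyCpu, toy_call and toy_retn are made-up names, and the memory is again a small array indexed by decimal address. The numbers in the usage note are taken from the listing further down in this post (a 5-byte CALL at 00401245 that targets 0040124F).

    ```c
    #include <stdint.h>

    typedef struct {
        uint32_t eip;        /* address of the instruction being executed */
        uint32_t esp;        /* decimal address of the top of the stack */
        uint32_t mem[66];    /* toy memory: slot n holds the 4 bytes at address 4*n */
    } ToyCpu;

    /* CALL: push the address of the following instruction, then jump to target. */
    void toy_call(ToyCpu *cpu, uint32_t target, uint32_t call_length)
    {
        cpu->esp -= 4;
        cpu->mem[cpu->esp / 4] = cpu->eip + call_length;  /* the return address */
        cpu->eip = target;
    }

    /* RETN: pop the return address from the stack straight into EIP. */
    void toy_retn(ToyCpu *cpu)
    {
        cpu->eip = cpu->mem[cpu->esp / 4];
        cpu->esp += 4;
    }
    ```

    With eip = 0x00401245 and esp = 244, toy_call(&cpu, 0x0040124F, 5) stores 0x0040124A at address 240 and sets eip to 0x0040124F. A following toy_retn(&cpu) puts 0x0040124A back into eip and esp back to 244.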

    The Base pointer (EBP)
    ----------------------
    Each function in any program (even the main() function in C) has its own stack frame. A stack frame is a logical group of consecutive locations on the stack that holds the variables and addresses of one function that is currently being executed.
    Every address in a stack frame is a relative address. That means we address the data in our frame relative to some reference point. This reference point is EBP, the Extended Base Pointer.
    On entry to a function, EBP still holds the base pointer of the caller. We PUSH this old EBP onto the stack, copy ESP into EBP, and from then on use EBP to reference the arguments and local variables of the callee relative to it.
    I hope the use of the base pointer will become clearer in the following example.

    A REAL EXAMPLE C PROGRAM.
    Consider the following C program:

    void function1(int, int, int);
    void main()
    {
        function1(1, 2, 3);
    }

    void function1(int a, int b, int c)
    {
        char z[4];
    }

    I compile and link the above program and use the OllyDbg debugger to inspect the generated assembly code.
    Skipping the start-up instructions that are not ours (about 90% of the assembly code), the rest is the code that corresponds to our little program:

    0040123C /. 55 PUSH EBP
    0040123D |. 8BEC MOV EBP,ESP
    0040123F |. 6A 03 PUSH 3 ; /Arg3 = 00000003
    00401241 |. 6A 02 PUSH 2 ; |Arg2 = 00000002
    00401243 |. 6A 01 PUSH 1 ; |Arg1 = 00000001
    00401245 |. E8 05000000 CALL bo1.0040124F ; \bo1.0040124F
    0040124A |. 83C4 0C ADD ESP,0C
    0040124D |. 5D POP EBP
    0040124E \. C3 RETN

    0040124F /$ 55 PUSH EBP
    00401250 |. 8BEC MOV EBP,ESP
    00401252 |. 51 PUSH ECX
    00401253 |. 59 POP ECX
    00401254 |. 5D POP EBP
    00401255 \. C3 RETN


    ANALYSIS:
    -----------
    The addresses from 0040123C to 0040124E are the main() function.
    The addresses from 0040124F to 00401255 are the function1() function.

    0040123C /. 55 PUSH EBP
    Backs up the old base pointer (the caller's EBP) by pushing it onto the stack.

    0040123D |. 8BEC MOV EBP,ESP
    Copies the current stack pointer into the EBP register.
    From then on, in the function, we'll reference the function's arguments and local
    variables relative to EBP. These two instructions are called the
    "Procedure Prologue".

    The stack now holds the saved EBP value:
    STACK
    256 [ebp]
    (ESP=256)



    0040123F |. 6A 03 PUSH 3 ; /Arg3 = 00000003
    00401241 |. 6A 02 PUSH 2 ; |Arg2 = 00000002
    00401243 |. 6A 01 PUSH 1 ; |Arg1 = 00000001
    Here we push the arguments onto the stack, from the last one to the first.

    The stack is:
    STACK
    256 [ebp]
    252 3
    248 2
    244 1
    (ESP=244)


    00401245 |. E8 05000000 CALL bo1.0040124F ; \bo1.0040124F
    Calls the function at address 0040124F. bo1 is the name of my executable.
    The stack becomes:
    STACK
    256 [ebp]
    252 3
    248 2
    244 1
    240 0040124A <- the return address, used when function1 ends.
    (ESP=240)

    Let’s follow the execution, so go to address 0040124F (function1):

    0040124F /$ 55 PUSH EBP
    00401250 |. 8BEC MOV EBP,ESP
    Hmm... this is the "Procedure Prologue" again (remember, this must be executed in every function). It sets up the function's own stack frame. The EBP register is currently pointing at a location in main's stack frame. This value must be preserved, so EBP is pushed onto the stack. Then the contents of ESP are copied to EBP. This allows the arguments to be referenced at an offset from EBP and frees up the stack register ESP to do other things.

    The stack now is:
    STACK
    256 [ebp]
    252 3
    248 2
    244 1
    240 0040124A <- the return address, used when function1 ends.
    236 <main’s EBP> <- Note that both ESP and EBP now point to this address.
    (ESP=236)
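
    With EBP = 236 as in the diagram above, every item of the frame sits at a fixed distance from EBP. A small C sketch of that arithmetic follows. The names are hypothetical, and the layout assumed is the one this particular listing produces (caller pushes 4-byte arguments right to left, callee runs the standard prologue).

    ```c
    /* Offsets from EBP inside a function that has run the standard prologue. */
    enum {
        SAVED_EBP_OFFSET   = 0,  /* [EBP]    the caller's EBP */
        RETURN_ADDR_OFFSET = 4,  /* [EBP+4]  return address pushed by CALL */
        FIRST_ARG_OFFSET   = 8   /* [EBP+8]  first argument */
    };

    /* Address of the n-th argument (n = 0 for the first), 4 bytes each. */
    int arg_address(int ebp, int n)
    {
        return ebp + FIRST_ARG_OFFSET + 4 * n;
    }

    /* Address of the k-th 4-byte local slot (k = 0 is the first one below EBP). */
    int local_address(int ebp, int k)
    {
        return ebp - 4 * (k + 1);
    }
    ```

    For EBP = 236 the three arguments are found at 244, 248 and 252, which are exactly the addresses holding 1, 2 and 3 in the diagram. The return address is at 236 + 4 = 240, and the first local slot would be at 232.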


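    00401252 |. 51 PUSH ECX
    This reserves 4 bytes on the stack, presumably the compiler's cheap way of making room for the local array z[4] (the value of ECX itself does not matter). ESP becomes 232.

    00401253 |. 59 POP ECX
    00401254 |. 5D POP EBP
    The first pop releases the 4 bytes reserved for z, the second restores main's EBP. After the two pops the stack becomes:
    STACK
    256 [ebp]
    252 3
    248 2
    244 1
    240 0040124A <- the return address
    (ESP=240)

    00401255 \. C3 RETN
    The function ends: RETN pops the return address 0040124A into EIP (remember our definition of the RETN instruction), so execution continues there with ESP=244.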

    0040124A |. 83C4 0C ADD ESP,0C
    After the function has returned, we add 12 (0C in hex) to the stack pointer, since we pushed 3 arguments
    onto the stack, each occupying 4 bytes (integers). By increasing ESP we actually shrink the stack (remember that the stack fills downwards, from high to low memory addresses), i.e. ESP = 244 + 12 = 256.
    STACK
    256 [ebp]
    (ESP=256)

    Thus, ESP is back to the value it had before the arguments were pushed for the function call.
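
    For those who want to check the numbers, the whole walk can be replayed in a few lines of C. This is a toy model under my own assumptions: Cpu, replay and esp_log are invented names, memory is a small array indexed by decimal address, ESP is assumed to start at 260 so that the first PUSH EBP lands on 256, and ESP is recorded only at five checkpoints.

    ```c
    #include <stdint.h>

    typedef struct {
        uint32_t esp, ebp, eip;
        uint32_t mem[66];              /* slot n holds the 4 bytes at address 4*n */
    } Cpu;

    static void push(Cpu *c, uint32_t v) { c->esp -= 4; c->mem[c->esp / 4] = v; }
    static uint32_t pop(Cpu *c) { uint32_t v = c->mem[c->esp / 4]; c->esp += 4; return v; }

    /* Replays main() and function1(); stores ESP at five checkpoints. */
    void replay(Cpu *c, uint32_t esp_log[5])
    {
        c->esp = 260;                  /* assumed start value */
        c->ebp = 0;                    /* placeholder for the caller's EBP */

        push(c, c->ebp);               /* 0040123C PUSH EBP */
        c->ebp = c->esp;               /* 0040123D MOV EBP,ESP */
        push(c, 3);                    /* 0040123F PUSH 3 */
        push(c, 2);                    /* 00401241 PUSH 2 */
        push(c, 1);                    /* 00401243 PUSH 1 */
        esp_log[0] = c->esp;           /* arguments are on the stack */

        push(c, 0x0040124A);           /* 00401245 CALL: push return address... */
        c->eip = 0x0040124F;           /* ...and jump to function1 */
        esp_log[1] = c->esp;

        push(c, c->ebp);               /* 0040124F PUSH EBP */
        c->ebp = c->esp;               /* 00401250 MOV EBP,ESP */
        esp_log[2] = c->esp;           /* function1's frame is set up */

        push(c, 0);                    /* 00401252 PUSH ECX (value is irrelevant) */
        (void)pop(c);                  /* 00401253 POP ECX */
        c->ebp = pop(c);               /* 00401254 POP EBP */
        c->eip = pop(c);               /* 00401255 RETN */
        esp_log[3] = c->esp;           /* back in main, arguments still there */

        c->esp += 12;                  /* 0040124A ADD ESP,0C */
        esp_log[4] = c->esp;
    }
    ```

    The recorded values come out as 244 (arguments pushed), 240 (after CALL), 236 (after function1's prologue), 244 (after RETN) and 256 (after ADD ESP,0C). At the end EIP holds 0040124A and EBP is back at 256.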

    I hope that you now have a basic understanding of how the stack and the stack pointer are used.
    In another article I will describe how nasty things can happen here.

  2. #2
    Good friend of the forums
    Join Date
    Jun 2008
    Posts
    425

    Thumbs up

    nice write up

  3. #3
    Just burned his ISO alchemist-tek's Avatar
    Join Date
    Jun 2008
    Posts
    6


    i'd hate to steal your thunder, but ever hear of phrack magazine? they have this wonderful article called "smashing the stack for fun and profit" by aleph 0ne....

  4. #4
    Just burned his ISO PeterPunk's Avatar
    Join Date
    Jul 2009
    Posts
    9


    Quote: Originally posted by alchemist-tek
    i'd hate to steal your thunder, but ever hear of phrack magazine? they have this wonderful article called "smashing the stack for fun and profit" by aleph 0ne....
    You don't steal any of my "thunder", of course.
    Also, I am not trying to claim ownership of the above knowledge!

    Of course I know the very famous Aleph One article, "Smashing The Stack For Fun And Profit", released in 1996 (I think).
    To be honest, I have published the above post in other forums and mags too (also in Greek), with additional remarks that I could not include here because of the post size restriction (10kb).
    So... to be complete, let me reproduce the "cut" parts, which, I hope, clear up some issues.

    This is not a buffer overflow exploit, but the required background that will help you understand how the CPU and memory "collaborate" to execute a program.
    I have read many articles about buffer overflows. Most of them start from a specific point, skipping the basic knowledge one must have to deeply understand what is going on behind the scenes. I wrote this article to fill (I hope) this gap.

    If at the end of this article you feel more comfortable with concepts like CALL, RETN and how a function is executed using memory (buffer, stack, etc.), then I will feel that I have succeeded... so, help me feel like a successful and nice person :)

    First, I would like to point out that everything we say applies to the x86 processor family. In addition, most memory addresses are written in decimal notation (for the sake of clarity for beginners) instead of the hexadecimal notation that real-world systems actually use.

    Requirements in order to read this article:
    1. A basic understanding of assembly language.
    2. A basic understanding of C language.
    3. A basic understanding of a Personal Computer.
    4. A basic understanding of English (i hope...).
    5. None of the above... just an open mind, imagination and... frame.
    Well... ok... I believe 4 and 5 are the most crucial, even though they contradict each other!
    And last (for some, maybe the most important part)...

    References:
    [1] BUFFER OVERFLOWS DEMYSTIFIED by murat@enderunix.org
    [2] C Function Call Conventions and the Stack (UMBC CMSC 313, Computer Organization & Assembly Language, Spring 2002, Section 0101)
    [3] The Assembly Language Book for IBM PC by Peter Norton (ISBN 960-209-028-6)
    [4] Analysis of Buffer Overflow Attacks from ***
    [5] 8088 8086 Programming and Applications for IBM PC/XT & Compatibles by Nikos Nasoufis
    [6] Smashing The Stack For Fun And Profit by Aleph One - aleph1@underground.org ***
    *** Sorry about the asterisks but I have got less than 15 posts so I can't put URLs...

    Thanks for your comments...

  5. #5
    Junior Member
    Join Date
    Apr 2009
    Posts
    33


    For anyone looking for further reading, or for a better understanding if you don't know assembly very well: I would find a cheap copy of "Gray Hat Hacking: The Ethical Hacker's Handbook". It's also got a good bit of info on writing attack code along with ASM basics.
