1. Goals for this week:

  1. Learn tools for examining binary files.

  2. Practice examining a binary program file to discover what it’s doing.

  3. Introduction to Lab 5.

2. Starting Point Code

Start by creating a week06 directory in your cs31/weeklylab subdirectory and copying over some files:

$ cd ~/cs31/weeklylab
$ mkdir week06
$ cd week06
$ pwd
/home/you/cs31/weeklylab/week06
$ cp ~richardw/public/cs31/week06/* ./
$ ls
Makefile  mystery*  README  simplefuncs.c

Compile simplefuncs.c using the provided Makefile:

$ make

3. Tools for examining binary files

Some tools for examining binary files:

  • strings dumps all the strings in a binary file:

$ strings simplefuncs

  • objdump -t or nm to list the symbol table contents:

$ objdump -t simplefuncs        # list symbol table in the executable file
$ nm --format sysv simplefuncs  # list symbol table in the executable file

The symbol table includes the names of all functions and global variables in the program. There is a lot of information in the symbol table that looks odd, but you should be able to see an entry for the two functions main and func1, and see where their start addresses are in memory.

  • gdb: for debugging programs at the assembly code level and examining the state of CPU registers and memory as the program runs. These will be the most useful tools for the next lab assignment.

3.1. gdb for debugging at the assembly code level

With gdb you can debug and trace through a program execution at the assembly code level. This includes executing individual IA32 instructions, examining register values, and disassembling functions.

First, let’s open up simplefuncs.c in an editor. Then, let’s try some things out in gdb:

$ gdb simplefuncs
(gdb) break main
(gdb) break func1
(gdb) run

In gdb you can disassemble code using the disas command:

(gdb) disas
(gdb) disas func1

You can break at a particular offset into a function:

(gdb) break *main+58    # set breakpoint at offset +58 in main

And you can step or next at the instruction level using si or ni (si steps into function calls, ni steps over them):

(gdb) ni        # execute the next instruction then gdb gets control again
(gdb) ni
(gdb) ni
(gdb) disas
(gdb) cont      # continue to next break point

Now we are at the call to func1. Let’s step into this function using si (we also have a breakpoint at this function, so let’s see when it is hit):

(gdb) si        # step into instructions in the called function (func1)
(gdb) disas
(gdb) ni
(gdb) where
(gdb) disas
(gdb) cont

The difference between si and ni shows up in what each does on a call instruction. si gives gdb control again at the first instruction of the called function. ni gives gdb control again at the instruction immediately after the call instruction (the instruction at the return address). In other words, si "steps into" the called function, whereas ni lets the called function run to completion, and only after the function returns does gdb get control again.

You can print out the values of individual registers like this:

(gdb) p $eax

You can also view all register values:

(gdb) info registers

You can use the display command to automatically display values each time a breakpoint is reached:

(gdb) display $eax
(gdb) display $edx

Tired of typing disas all the time? GDB has an option to always show the current assembly code in part of the window:

(gdb) layout asm

The only caveat is that it doesn’t always play nicely when the program you’re debugging produces output (e.g., with printf). If you’re using this mode and the display gets scrambled, try pressing CTRL-L (or resizing the terminal window).

3.2. Examining memory

Let’s reset the state of the program to just before the call to func1:

(gdb) run
(gdb) cont
(gdb) disas

At this point in the program, we can see that in addition to being in registers, the values 2 and 200 have been stored on the stack at addresses -0x10(%ebp) and -0xc(%ebp), respectively.

If you want to check the contents of memory, you could do something like:

p *(int *)($ebp - 0x10)

That’s a really nasty statement! Alternatively, you can use the examine command (x) to display the contents of a memory location. The memory address operand to (x) can be specified as the name of the register storing the address value or as an absolute memory address value. Here are some examples:

(gdb) p $ebp-0x10   # print evaluates the expression: it shows the address itself
(gdb) x $ebp-0x10   # examine shows the contents of memory at that address

The examine command also takes formatting options to tell it how to interpret the memory at the address:

(gdb) x/wd $ebp-0x10 # examine memory at specified address and display it in decimal
(gdb) x/wx $ebp-0x10 # examine memory at specified address and display it in hex
(gdb) x/s $ebp-0x10  # examine memory at specified address and display it as a string

Examine’s formatting is sticky: the last format specification you gave is reused by subsequent x commands until you explicitly specify a new one. This is different from print, which displays each value in the natural format for its type unless you give it a format.

(gdb) x/wd $ebp-0x10 # examine memory at address ($ebp-0x10) as an int in decimal
(gdb) x $ebp-0xc     # examine memory at ($ebp-0xc), reusing the /wd format (sticky formatting)

Let’s move forward until we’re about to call printf at main+80:

(gdb) break *main+80
(gdb) cont   # breaks when we enter func1
(gdb) cont   # breaks when we get to main+80
(gdb) disas

We know that printf always receives a format string as its first argument, so let’s see if we can find it. The parameters to the function should have been pushed onto the stack. If we look right above the call to printf, we see push $0x80b8014. Let’s look at that:

(gdb) p 0x80b8014

Hmm, that didn’t do anything helpful — maybe because it’s a memory address. Let’s try examining it as a string:

(gdb) x/s 0x080b8014

There we go! That’s the first argument to printf. We can also print the value of the second argument (y) using x/wd to examine the previously pushed item as an integer:

(gdb) x/wd $ebp-0xc

This strategy of printing function arguments just prior to calling a function should help you a lot when deciphering what mysterious assembly code is doing.

4. Try out some of these tools on a program binary

Run the mystery binary a few times and see what it is doing:

$ ./mystery

The program is asking you for input, but there is really not a lot of information provided to guess the right input, and this executable was not compiled with -g so there is no C code information we can get from it when we run it in gdb.

Let’s examine the assembly code to see if we can figure out what to enter.

Let’s try running it in gdb and disassembling some code.

$ gdb ./mystery
(gdb) layout asm # optional: turns on the ASM layout
(gdb) break main
(gdb) run
(gdb) disas      # you only need to do this if you didn't turn on ASM layout

Let’s work through some questions and steps for this program:

  1. What does main’s control flow look like?

  2. Add some breakpoints around function calls and inside functions.

  3. Examine some state around the function calls.

  4. Print out strings using x/s:

(gdb) x/s base_addr_of_string

5. Lab 5

Finally, let’s take a look at Lab 5.

6. Handy References