CS 31 Lab 6

1. Due Date

Lab 6 is a Mini Lab. As such, you will work on it on primarily in the weekly lab session and submit what you have completed at the end of your lab session. Being a mini lab means that it:

does not count as much towards your lab grade as the other regular labs.
is one that you complete (or mostly complete) during your weekly lab session.
is not be eligible for late days (you cannot use late days on this lab).

The reason for this being a mini lab is to support your studying for the midterm. As a result, the maze lab will be turned off on Wednesday Oct. 22 (lab day) at 11:59pm, which means you cannot work on it.

It will stay turned off until after the midterm, and we will turn it back on again briefly (on Monday Oct. 27 around 9pm (after the midterm)), and it then will be available to you again until Wednesday Oct. 29 at 2am.

Although not required, we encourage you to work on it some more after the midterm to see if you can solve more floors.

Due: At end of the weekly lab session: push your how_we_solved.txt file to github with notes about what you have done to solve your maze program so far. (and see below about how you can share your how_we_solved.txt file with your partner using scp).

After the midterm, we will turn the maze lab back on for about 24 hours. We encourage you and your partner to work on it some more to see if you can solve more floors. See if you can Complete floors 1 and 2 by Wednesday Oct. 29 at 2am. You will Minimal Extra Credit for additionally completing floors 3 and 4.

Partner for Lab 6: you each have your own Git repo for this lab, and will be assigned to your Lab 6 partner at the start of your lab session. We will step through together how you will share code with your partner.

2. Lab 6 Overview and Goals

2.1. Lab Overview

You have been sleep walking again, and you wake up on the roof of Parrish Hall. You need to find your way through its maze of floors and out of the building in time for your first class. The problem is that due to construction, there is no stairwell that connects more than two floors. As a result, you need to travel along each floor to find the next open stairwell down to the next floor below. However, there are people or things along your path that can trip you up and impede your progress, forcing you to run back to the roof to try again.

In this assignment, you and your partner are going to receive a binary maze program. Your maze has 5 phases, one for each floor of Parrish Hall (from the roof to out the door). Each floor’s phase is a binary puzzle that needs to be solved to move on to the next floor. To solve a puzzle you need to enter a correct phrase on stdin (you can also have your maze read phrases from a file given as a command line argument).

Your goal is to solve all the required phases/floors of your maze, limiting the number of times you trip-up along the way and have to start all over.

Your maze will automatically notify a central scoreboard every time you either trip-up or solve a floor for the first time.

2.2. Lab Goals

Gain experience reading and tracing through the execution of x86_64 assembly instructions.
Practice with tools for examining binary files.
Put your GDB skills to work to solve an assembly code puzzle.

3. Lab Starting Point Code

3.1. Getting Your Lab Repo

Both you and your partner should clone your Lab 6 repo into your cs31/Labs subdirectory:

get your Lab 6 ssh-URL from the CS31 GitHub Organization. The repository to clone is named Lab6-userID1-userID2 where the two user names match that of you and your {labpartnerurl}[Lab 6 lab partner].
cd into your cs31/Labs subdirectory:
```
$ cd ~/cs31/Labs
$ pwd
```

clone your repo

$ git clone [the ssh url to your your repo]
$ cd Lab6-userID1-userID2

There are more detailed instructions about getting your lab repo from the "Getting Lab Starting Point Code" section of the Using Git for CS31 Labs page. To make changes, follow the directions in the "Sharing Code with your Lab Partner" section of the Using Git for CS31 Labs page.

3.2. Starting Point files

The starting point files for this lab are:

$ ls
README.adoc how_we_solved.txt  mazeID soln.txt

maze_ID : a file into which you will add your maze ID (maze number).
how_we_solved.txt : a file into which you will add your description of how you solved each phase of your maze.
soln.txt: a file into which you will add the solution to each maze phase, one solution per line. You can use this file as input to your maze program. If you enter your answer on one line, make sure to hit <enter> at the end to introduce the newline character so that your solution is parsed correctly.

Note that your repo does not include your maze program. You will get that another way and add it to your repo.

4. Getting, Running, Solving your Maze

4.1. Getting Your Maze

Please wait on this step: We will do this together in lab.

Each time you register on the maze server, you will receive a unique maze with unique solutions. For this reason, you should only perform the following steps once!

You must perform this next step from a CS lab machine.

First, cd into your cs31/labs/Lab6-userID1-userID2 subdirectory.
Next, only one of you and your partner will follow these steps to get your maze.
1. In a browser on a CS lab machine, one of you or your partner should enter this url: http://owl.cs.swarthmore.edu:8000/.
2. Enter your CS user name and choose Submit.
3. Choose to save the mazeX.tar file in the dialog box that pops up. (Note that X is a number: each group will get a different maze number. For example, you might get maze12.tar.) Save this .tar file into your Lab6-userID1-userID2 repository directory. If you are not able to choose the download directory for your Lab6-userID1-userID2 directory, then the file is likely saved in the Desktop directory in your home directory. Use the mv command to move it into your lab repo:
  $ cd ~/cs31/labs/Lab6-userID1-userID2 $ mv ~/Desktop/mazeX.tar . # replace X with your maze number
At this point, your partner can remotely copy your mazeX.tar file so that you each have the same one. From the one of you who has the file’s account run scp to copy the file from your account to your partner’s account (scp file_name <username>@<machinename>:.). For example, to copy to user’s tnas account on the machine oil I’d do the following (and you can use oil too, or any other lab machine name):
```
scp mazeX.tar tnas@oil:.  # replace tnas partner's user name, X with your maze num
```
scp will prompt for your partner’s password that they should enter to copy the file to their account:
```
(partner@oil) Password:
```
Running scp copies the mazeX.tar file from your account into your partner’s home directory. Your partner can then move it into their Lab repo by running the mv command and then git add, commit, push to add it to their repo. For example, from your Lab repo do the following (replacing the X in mazeX.tar with your maze number):
```
mv ~/mazeX.tar .     # replace X with your maze num
git add mazeX.tar
git commit -m "my maze"
git push
```
Next, you and your partner should open your how_we_solved_it.txt file and add the following information:
1. Your maze number.
2. Both of your names.
At this point both you and your partner can add your mazeX.tar and how_we_solved_it.txt to your repos (replacing X with your maze number):
```
$ git add mazeX.tar how_we_solved_it.txt # replace X with your maze number
$ git commit -m "our maze"
$ git push
```

Next, you and your partner will untar your maze tar file:

$ cd ~/cs31/labs/Lab6-userID1-userID2
$ ls                  # NOTE: your mazeX.tar file will have a different number
maze1.tar
$ tar xvf maze1.tar   # NOTE: put your maze number here instead of mine
README
maze
main.c
$ ls
maze*  maze1.tar  maze.c  README

### DO NOT RUN THE MAZE PROGRAM YET! ###

Do not run the maze program yet!

At the end of lab today, your partner can share their how_we_solved_it.txt file with you using scp in as similar way to how you scp’ed the mazeX.tar file:

scp how_we_solved_it jas@oil:.  # replace jas with your partner's user name

4.1.1. Maze Program Code

The files included in your mazeX.tar file are:

main.c: contains the maze program’s main function. You can open main.c in vim (or another text editor) and see what the code is doing.
maze: contains your maze program binary. This is binary contains an x86_64 maze (described below), that you need to solve for this lab. Do not run the maze program yet!

4.2. Checking its Status

To check the status of your maze program, enter the following url into a browser (that is running on a CS machine): http://owl.cs.swarthmore.edu:8000/scoreboard.

To view the scoreboard from the terminal, you can type the following command. It won’t be super pretty, but it will allow you to check the scoreboard if you aren’t at the CS lab machines.

$ maze31scores

4.3. Running your Maze

Your maze program checks the machine it’s running on and will refuse to run anywhere other than a CS lab machine. If you wish to work on the lab remotely, you’ll need to use ssh to log in to a lab machine first, and then you can execute commands via that remote connection. See the remote access CS help page about how to do this.

The maze program can be run both with and without your soln.txt input file (as you solve floors we recommend running with this). Do not run the maze program yet!

As you are solving your maze, you are almost always going to want to run it in gdb. You’ll likely want to start it in gdb, set breakpoints, and then run (with or without the soln.txt input file), in order to step through its execution:

$ gdb ./maze
(gdb) layout asm
(gdb) break main
(gdb) run                # run maze
                         #   OR
(gdb) run soln.txt       # run maze with your soln.txt input file

It may be helpful to use the layout reg command to see how register values get updated after instructions are run. We don’t recommend running the maze outside of gdb, but you can also just run your maze program from the command line:

$ ./maze
$ ./maze soln.txt

If you are afraid your maze is about to trip-up, just enter Control-C to kill it.

5. Lab Details

The binary maze is a program that consists of a sequence of assembly language puzzles, one corresponding to each floor of Parrish that you need to pass through to get out the door. Each puzzle expects you to type a particular string when prompted. If you type the correct string, then the puzzle is solved and the maze program proceeds to the next floor. Otherwise, the maze program issues a trip-up message and terminates.

For the minilab we would like you to solve floors 1 and 2.

The scoreboard shows points for solving every floor. You are penalized for every trip-up that you let fully happen (1 point for every 4 trip-ups, rounded down), so you need to be careful not to trip-up too many times. Note that there is a cap on this penalty, so don’t be too hesitant to try things out.

Solving floor 1 and solving or making significant progress towards solving floor 2 is worth full credit for the mini lab. This includes the write-up of how you solved floors 1 and 2.
Solving the 3rd and 4th floor is worth a very small amount of extra credit points (and a large amount of satisfaction in solving more difficult floors).

5.1. Solving Your Maze

You must run your maze on one of the CS lab machines.

To kill your maze executable (to make it exit without tripping-up), type Control-C. This way you can run your maze, solve a puzzle on a floor, and then exit and come back later to try the puzzle on the next floor.

You can use many tools to help you solve your maze. Look at the hints section below for some tips and ideas. The best way is to use gdb to step through the execution of the disassembled binary.
Although the puzzles on each floor get progressively harder to solve, the expertise you gain as you move from floor to floor should offset this difficulty.
Once you have solved the puzzle on a floor, I encourage you to run your maze with a soln.txt file containing the input phrases for the floors you have solved. The format of the file should be one phrase per line, in order of the maze floors. Using an input file will help to prevent you from accidentally tripping up in the maze on a previously solved floor. For example: Make sure to hit <enter> at the end to introduce the newline character so that your solution is parsed correctly.
```
./maze soln.txt
```
reads the input lines from soln.txt until it reaches EOF (end of file), and then switches over to stdin for the remaining input. This feature is also nice so you don’t have to keep retyping the solutions to floors you have already solved. The maze ignores blank input lines, both in the file and on stdin.
To avoid accidentally tripping up in the maze, you will need to learn how to single-step through the assembly code and how to set breakpoints. You will also need to learn how to inspect both the registers and the memory states. One of the nice side-effects of doing the lab is that you will get very good at using a debugger. This is a crucial skill that will pay big dividends the rest of your career!

6. Lab Requirements

You must solve your maze by examining it at the assembly code level using tools like gdb, strings, objdump, and other similar tools for examining binary files.

For the complete solution you need to get past floors 1 and 2, complete and submit the write-up of how you solved each floor, and submit the solution to each floor in soln.txt, editing `mazeID file, and pushing it to your repo.

Write-up Requirements (for the completed solution only)

Edit how_we_solved.txt in an editor to include a short explanation of how you solved each floor and a short explanation of what each floor is doing.
Describe at a high-level what the original C code is doing for each floor. For example, is it doing some type of numeric computation, string processing, function calls, etc. and describe the specific computation it is doing (i.e. what type of string processing and how is that being used?).
Don’t describe in terms of registers and assembly code for this part, but describe what the puzzle on each floor is doing at a higher-level in terms of C semantics. You do not need to reverse engineer the assembly code and translate every part of it to equivalent C code. Instead, give a rough idea of equivalent C or pseudo code for the main part of the puzzle on each floor.

For example, something like "uses an if-else to choose to do X or Y based on the input value Z" is an appropriate right level of explanation. Something like "moves the value at rsp + 8 into register rax " is way too low-level.
The lab write-up lab should not be onerous; you should be able to explain each puzzle in a short paragraph or two (maybe with a few lines of C or pseudo code to help explain). I recommend doing the write-ups for each floor as you solve them.
Excessively verbose, low-level descriptions will be penalized, as will vague descriptions; you want to clearly demonstrate that you figured out what that floor is doing by examining the assembly code for each floor in your maze executable.
If you are unable to solve a floor, you can still receive partial credit for it in the write-up part by telling me what you have determined about that floor.

7. Tips

There are many ways of solving your maze. There are various tools for examining the program binary without running the maze program. These may provide some helpful information for solving some floors. The most useful tool will be gdb, which will allow you to run the maze program, set breakpoints, step through parts of its instruction’s execution, and examine its execution state. This will help you to discover information about what the program does, and you can use this information to solve your maze.

Remember that the maze program must be run on a CS machine. See Section 4.3 for information about how to run your maze, and how to run it remotely.

7.1. How not to solve the maze lab

Do not try to use brute force! You could write a program that tries every possible input string to find the right one. But this is no good for several reasons:

You lose 1/4 point (up to a max of 5 points) every time you guess incorrectly and the maze trips-up. Every 4th trip-up is -1 points (3 trip-ups is -0 points).
Every time you guess wrong, a message is sent to the mazelab server. You could very quickly saturate the network with these messages, and cause the system administrators to revoke your computer access.
We haven’t told you how long the input strings are, nor have we told you what characters are in them. Even if you made the (incorrect) assumptions that they all are less than 80 characters long and only contain letters, then you will have 26^80 guesses for each floor. This will take a very very long time to run, and you will not get the answer before the assignment is due.

7.2. How to solve the maze lab!

There are many tools that are designed to help you figure out both how programs work, and what is wrong when they don’t work. Here is a list of some of the tools you may find useful in analyzing your maze, and hints on how to use them. And refer to the Week 7 weekly lab page for more information on using gdb and tools for examining binaries:

gdb ./maze The GNU debugger will be your most useful tool. You can trace through a program line by line, examine memory and registers, look at both the source code and assembly code (we are not giving you the source code for most of your maze), set breakpoints, and set memory watch points.
draw the stack and register contents as you are tracing through code in gdb, and take notes as you go (this will also help you with the write-up part of the lab assignment).
strings maze: display the printable strings in your maze.
objdump: objdump may provide some information that is helpful, but gdb will be your most useful tool
- objdump -t maze prints out the maze’s symbol table. The symbol table includes the names of all functions and global variables in the maze, the names of all the functions the maze calls, and their addresses. You may learn something by looking at the function names.
- objdump -d maze: disassemble all of the code in the maze. You can also just look at individual functions. Although objdump -d gives you a lot of information, it doesn’t tell you the whole story. Calls to system-level functions are displayed in a cryptic form. For example, a call to sscanf might appear as:
  4015a6: e8 15 fd ff ff call 4012c0 <__isoc99_sscanf@plt>
  To determine what that call to sscanf is doing, you need to disassemble within a running maze program using gdb.

Looking for documentation about a particular tool? The man command will help you find documentation about unix utilities, and in gdb the help command will explain gdb commands:

$ man objdump

(gdb) help ni

Here is some more information about man

7.3. Notes on odd instructions or code sequences

You may find some code in your maze lab that uses instructions that we have not talked about in class. Here are some notes about some of these:

endbr64: You can ignore this instruction in any code that appears in this lab. It helps to ensure that jump instructions (including function calls) change the PC to valid locations in a program. It provides one mechanism to prevent attackers from hijacking control of a program, but it won’t affect the behavior of your maze in a way that you need to account for (i.e., it doesn’t change register values or memory that you care about). For more information, see: control flow integrity.
lea — Load effective address: This instruction computes a memory address without actually accessing memory at the location of the address. It’s really deceiving, since it looks like a memory access (it puts parentheses around a register), but it’s really just computing the address that would be accessed via the same displacement(register) syntax. See the "Load Effective Address" section of the textbook for more information.
Instructions that reference parts of registers. You may see code sequences that reference the lower half or lower quarter of a register. For example:
```
mov    %al,-0xa1(%rbp)
```
stores a 1 byte value from the lower byte in register rax to the specified address. Instructions with sub-registers are generated when the C code manipulates values that are smaller than 8 bytes, such as short or char values. Specific sub-bytes of the 8-byte register rax can be specified in instructions as registers: ah is the byte in bits 15-8 of register rax; al is the byte in bits 7-0 of register rax; ax is the 2-byte value in bits 15-0 of register rax; and eax is the 4-byte value in bits 32-0 of register rax. See the lecture notes and the "Advanced Register Notation" section of the textbook for more information about the names of these portions of the general purpose registers.
Stack canary code. You may see some odd code sequences around function return code that includes an instruction that looks like this: sub %fs:0x28,%rax. For example:
```
0x56556a53 <+100>:    sub    %fs:0x28,%rax
0x56556a5a <+107>:    jne    0x401723 <func+117>
< function return instructions here>
0x56556a61 <+117>:    call   0x401240 <__stack_chk_fail@plt>
```
You can ignore this sequence of instructions in figuring out the solution to a maze floor. You should, however, look at the <function return instructions here> parts of the code that are surrounded by this sequence of instructions.

This is stack canary testing code that the compiler generates to check that the stack is not corrupted by the function’s execution (called a buffer overflow error). The code checks something called a stack canary that is used to detect buffer overflow. If the compared value doesn’t match the stack canary value then the stack has been corrupted and the __stack_check_fail function is called, likely terminating the program with a stack memory error. Here is some more information about what a stack canary is: stack canaries

8. Submitting

Whether or not you have solved a maze floor, and how many times you have tripped-up your maze is automatically submitted to the maze server by your maze program, but you still need to submit a few things via GitHub:

We hope you can solve floors 1 and 2, but however far you get, we want you to submit your how_we_solved.txt file with your notes on your progess.

Note: Solving floors 3 and 4 are extra credit, provided you include a write-up on how you solved the floors.

To submit your code, commit your changes locally using git add and git commit. Then run git push while in your lab directory. Only one partner needs to run the final git push, but make sure both partners have pulled and merged each others changes.

Here are the commands to submit your complete solution (from one of you or your partner’s cs31/labs/Lab6-userID1-userID2 directory:

$ git add how_we_solved.txt soln.txt mazeID
$ git commit -m "lab 6 completed"
$ git push

If you have difficulty pushing your changes, see the "Troubleshooting" section and "can’t push" sections at the end of the Using Git for CS31 Labs page. And for more information and help with using git, see the git help page.

Lab 6 Survey

After submitting your complete Lab 6 solution, you should submit the required TBA[Lab 6 Questionnaire] (each lab partner must do this). Note this link is typically live a day or two before the due date through several days after the due date. Please submit within one week of the lab due date.

9. Handy Resources

General Lab Resources

Refer to the Week 7 weekly lab page
Class EdSTEM page for questions and answers about lab assignment
CS31 github org
Git help and Git for CS31 labs
Appendix2: Using Unix
Some Useful Unix Commands
vi (and vim) quick reference
man and apropos documentation for libraries and commands

Assembly Resource

Refer to the Week 7 weekly lab page
x86_64 Assembly Reference Sheet
x86_64 Assembly Stack
ASM Visualizer (choose x86_64 ISA)
gdb cheat sheet summary of the most common/useful commands (2nd page is for assembly debugging)
Section 3.2 and Section 3.5 of textbook (assembly debugging, print, display, info and x commands)
GDB for Assembly (from the gdb Guide). (assembly debugging and x command)
Dive into Systems, Chapter 7 on loops in x86_64 assembly
Tools for examining phases of compiling and running C programs
Tools for examining .o and executable files

CS31 Mini Lab 6: Binary Maze: The great escape from Parrish Hall