CS 45: Lab 5

Handy References:

Lab Audio, Week 1
Lab Audio, Week 2
Lab Audio, Week 3

Lab 5 Goals:

Implement a small-scale, traditional file system using inodes, directory entries, and blocks.
Interact with indirection in data structures.
Simulate a block-level disk interface with mmap().

Overview

You've probably taken file systems for granted your whole life: you click save, and the file is safely stored on the disk for you to retrieve later. For this lab, you'll develop your very own file system so you can see what goes into making that 'save' work. Unlike a real FS, though, we'll focus only on correctness, not on performance. It's difficult enough making the FS reliable!

To interface with the OS, we'll use the File System in Userspace (FUSE) library, which allows you to write a userspace program that manages files and integrates them into the OS's file structure. That is, when your userspace program is running, you'll be able to interact with your file system from the terminal using standard commands (ls, mkdir, rm, etc.). FUSE is the same mechanism that SSHFS uses to provide files over an SSH connection.

To mount your file system, you'll need to choose a location in the system's file tree to "graft" your files. Because of the way FUSE works, you cannot mount on top of an NFS file system (unless you're the root user, which you aren't). Thus, you will not be able to mount within your home directory. Luckily, you still have /local available to you. I would suggest making a new empty directory in /local (e.g., /local/fuse-username) to use for testing. To be clear, you can still store, edit, build, and run your swatfs code from your home directory. When you execute it, you just need to tell it to mount your files in a non-NFS location like /local.

You can mount the FS with this command:

# The -d and -s are FUSE parameters:
# -d: debug mode - provide output about the operations that are happening
# -s: single-threaded mode - you probably don't want to worry about concurrency right now...
./swatfs -d -s /local/fuse-username [disk image]

Replace the "fuse-username" above with the directory you created in /local for testing. The disk image will be the path to a file that represents a "disk" that has been formatted with the swatfs format. I've provided an example disk image, named test-disk.img that is pre-populated with a few directories and files. Use this to test your FS's read functionality before you have writing implemented. You can create a new disk image with a combination of dd and swatfs_mkfs:

# Create a 100-block disk image file, named disk.img, with a block size of 4096 bytes.
dd if=/dev/zero of=disk.img bs=4096 count=100

# Format the file system for use with swatfs:
./swatfs_mkfs disk.img

If you break a file system image (e.g. you attempt to write and something goes wrong), you can always reformat it to bring it back to a clean slate (no files, all blocks free, etc.).

To unmount your file system, you want to cleanly remove it from the system's file tree. Don't just Ctrl-C your swatfs program. Instead, run fusermount -u [mount point]. That should terminate your userspace process and unmount the FS cleanly.

Requirements

Warning: The amount of code you'll need to write for this lab is likely to be larger than previous labs in this course. You should start this lab early.

Your file system should support creating, reading, writing, and removing both regular files and directories. It should correctly report file attributes (e.g., if the user calls stat(), which triggers your FUSE getattr()). To make all this happen, you should provide implementations for all of the empty swatfs_ functions in swatfs.c. Ultimately, typical commands like ls, cd, rm, rmdir, I/O redirection (< and >), and text editors should all be able to interact with the files and directories in your file system.

When dealing with file attributes, you only need to worry about those that exist in the inode struct defined in swatfs_types.h. For example, your FS should properly update a file's modification time, but it does not need to worry about access time, since there is no access time field in your inode struct.

Your file system should detect errors when they happen, and when they do, set errno and return an appropriate error value. It's not feasible for me to dictate every single possible error that might occur, so please choose error constants that describe the error condition as closely as possible. You can see a list of error constants using errno --list or search for specific words using errno --search [word]. Try to be as specific as possible. For example, if the user asks to remove a directory that isn't empty, don't use EIO (I/O error), even though this is an I/O related problem, because ENOTEMPTY is a much better fit.
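As a rough illustration of picking the most specific error, here's a minimal sketch. The helper demo_rmdir_status() and its parameters are hypothetical (they are not part of the starter code), and how you count a directory's entries is up to your implementation. FUSE's high-level callbacks report failure by returning the negated constant (e.g., -ENOTEMPTY), which is what this sketch returns.

```c
#include <errno.h>

/* Hypothetical helper, not part of the starter code: decide what an
 * rmdir-style operation should report, given facts the caller has
 * already looked up. Returns 0 on success or a negated errno constant. */
int demo_rmdir_status(int exists, int is_directory, int entry_count)
{
    if (!exists)
        return -ENOENT;     /* no such file or directory */
    if (!is_directory)
        return -ENOTDIR;    /* the path names something that isn't a directory */
    if (entry_count > 0)
        return -ENOTEMPTY;  /* a much better fit than a generic -EIO */
    return 0;
}
```

Checking the conditions from least to most specific to the operation keeps each error code tied to exactly one cause.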

You should use the data structures defined in swatfs_types.h without modifying them. They're designed to be a particular size that evenly divides blocks. The function prototypes for the swatfs_ functions in swatfs.c are dictated by the FUSE library, so your code will fail to compile if you change those. Otherwise, I don't care too much how you structure things or where you put code. I would strongly suggest factoring out common functionality into helper functions though. There are several places where functions have major overlap.

Your implementation should use the #defined constants from swatfs_types.h when dealing with numerical constants, and it should continue to function properly if those constants change.

You may always assume:
- The root directory (path "/") will always be described by the inode whose number is 0.
- There will not be multiple hard links for a file. That is, for every inode in the system (other than inode 0, which is special), the links field will be either 0 (inode is free) or 1 (inode is being used). It will not be larger than 1.
- No path parameter given to your swatfs_ functions will be longer than SWATFS_NAMELEN.
- There will be no gaps in the files you are asked to write. For example, if you have an existing file that is 100 bytes long (bytes 0 through 99), the user can ask to do a write starting at any offset from 0 through 100, where "100" means "add new data to the end of the file". They would not be allowed to request a write to an offset >=101 without first writing to 100.
- Directories will not need to use indirect block pointers. That is, there will not be more than 64 entries in a directory when using a 4096-byte block size.
- The size of a struct inode and the size of a struct dirent will evenly divide the size of a block.
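The "no gaps" assumption means a write offset is valid exactly when it is at most the current file size. Here's a minimal sketch of that check, plus the arithmetic for turning an offset into a block index and a position within that block. Everything prefixed with demo_ or DEMO_ is hypothetical: DEMO_BSIZE stands in for SWATFS_BSIZE, and returning -EINVAL for a gap is just one defensive choice, since you're allowed to assume gaps won't happen.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_BSIZE 4096u  /* stand-in for SWATFS_BSIZE (assumption) */

/* Where a byte offset lands within a file's sequence of blocks. */
struct demo_location {
    uint32_t block_index;   /* which of the file's blocks, counting from 0 */
    uint32_t block_offset;  /* byte position inside that block */
};

/* Hypothetical helper: reject offsets that would leave a gap, otherwise
 * compute which block (and where in it) the write begins. */
int demo_locate_write(uint64_t file_size, uint64_t offset,
                      struct demo_location *loc)
{
    if (offset > file_size)
        return -EINVAL;  /* offset == file_size means "append" and is fine */

    loc->block_index = (uint32_t)(offset / DEMO_BSIZE);
    loc->block_offset = (uint32_t)(offset % DEMO_BSIZE);
    return 0;
}
```

For the 100-byte file from the list above, offset 100 maps to block 0 at position 100, while offset 101 is rejected.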

Your file system implementation should get a clean bill of health from valgrind: no invalid memory accesses, uninitialized variables, or memory leaks.

Checkpoint

For the checkpoint, your file system should be able to read directories (e.g., ls should work) and read regular files. Your GitHub repository contains a disk image named test-disk.img that I've pre-populated with a few directories and files of varying sizes for testing purposes. You should be able to list and read all of those. For the .jpg image file, you can use the eog application ("eye of GNOME") to open it from the command line.

Starter Code Map

This lab uses several source files to divide up the functionality across logical boundaries. With the exception of the structs in swatfs_types.h, you're allowed to modify any code you'd like. Most of your changes are expected to go in swatfs.c and swatfs_directory.c:

swatfs.c: the main file that defines and implements the core file system functions.

swatfs_directory.c: a place to put directory-related helper functions. You can use this, or not, I don't really care. It has several examples of function prototypes that I found to be useful in my implementation. If you want to put everything in swatfs.c I'm not going to stop you, but it might get cluttered...

While you're allowed to modify the other files, I don't think you'll need to. They're there to provide you with basic functionality like retrieving/storing data to the disk. The corresponding headers for each of these has more details about the functions, their arguments, and their return values:

swatfs_disk.c: provides functions to simulate low-level disk access: read a block or write a block by providing a block number. Note: these functions transfer an entire block. If you're trying to just change a small portion of a block, you need to read the rest of the block first so that you don't overwrite the pieces you'd like to keep. swatfs_inode.c does this when writing inodes, so take a look there for an example.
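The read-modify-write pattern described above can be sketched as follows. This is not the real swatfs_disk.c interface: the "disk" here is just an in-memory array, and demo_read_block(), demo_write_block(), and demo_patch_block() are hypothetical names made up for the illustration.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_BSIZE   4096u  /* stand-in for SWATFS_BSIZE (assumption) */
#define DEMO_NBLOCKS 4u     /* tiny pretend disk */

/* In-memory stand-in for the disk image. */
static unsigned char demo_disk[DEMO_NBLOCKS][DEMO_BSIZE];

/* Whole-block transfers only, like the real disk layer. */
static void demo_read_block(uint32_t bnum, void *buf)
{
    memcpy(buf, demo_disk[bnum], DEMO_BSIZE);
}

static void demo_write_block(uint32_t bnum, const void *buf)
{
    memcpy(demo_disk[bnum], buf, DEMO_BSIZE);
}

/* Change len bytes at offset within block bnum, preserving the rest:
 * read the whole block, patch the copy, write the whole block back.
 * Returns 0 on success, -1 if the request doesn't fit in one block. */
int demo_patch_block(uint32_t bnum, uint32_t offset,
                     const void *data, uint32_t len)
{
    unsigned char block[DEMO_BSIZE];

    if (bnum >= DEMO_NBLOCKS || offset > DEMO_BSIZE ||
        len > DEMO_BSIZE - offset)
        return -1;

    demo_read_block(bnum, block);       /* 1. fetch current contents */
    memcpy(block + offset, data, len);  /* 2. modify only the target bytes */
    demo_write_block(bnum, block);      /* 3. store the full block */
    return 0;
}
```

Skipping step 1 and writing a mostly uninitialized buffer is the classic way to clobber the neighboring bytes you meant to keep.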

swatfs_blockmap.c: keeps track of the status (free vs. in-use) of data blocks on the disk. You can query these functions to give you a free block or tell it to mark a block that you no longer need as free.

swatfs_inode.c: allows you to read and write inode data by providing the inode number (inum). Also has a function for getting the inode number of the next free inode.

swatfs_mkfs.c: this file won't be used by your FS implementation directly. It defines a standalone program that formats a file to be used as a disk image.

swatfs_types.h: defines structs for the key FS data structures you'll be interacting with: inodes and directory entries. It also defines several important constants that will come up repeatedly in your implementation. The structures have been carefully designed to be appropriate sizes that evenly divide disk blocks, so you shouldn't change the struct definitions!

With all the default settings, the block layout and inode block pointers will look like these figures:

The structure of the block pointers in a SwatFS inode.
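To give a feel for what following the block pointers involves, here's a minimal sketch that decides whether a file's Nth block is reached through a direct pointer or through an indirect block. The layout is an assumption made purely for illustration (DEMO_NDIRECT direct pointers followed by one single-indirect pointer); the real counts and field names come from swatfs_types.h, so use its constants rather than these.

```c
#include <stdint.h>

#define DEMO_BSIZE   4096u  /* stand-in for SWATFS_BSIZE (assumption) */
#define DEMO_NDIRECT 8u     /* hypothetical number of direct pointers */

/* How many 32-bit block numbers fit in one indirect block. */
#define DEMO_PTRS_PER_BLOCK (DEMO_BSIZE / sizeof(uint32_t))

enum demo_level { DEMO_DIRECT, DEMO_INDIRECT, DEMO_TOO_BIG };

struct demo_slot {
    enum demo_level level;  /* where the block pointer lives */
    uint32_t index;         /* index within the direct array or indirect block */
};

/* Hypothetical helper: map a file-relative block index to the slot
 * that holds its block number. */
struct demo_slot demo_find_slot(uint32_t file_block)
{
    struct demo_slot s;

    if (file_block < DEMO_NDIRECT) {
        s.level = DEMO_DIRECT;
        s.index = file_block;
    } else if (file_block - DEMO_NDIRECT < DEMO_PTRS_PER_BLOCK) {
        s.level = DEMO_INDIRECT;
        s.index = file_block - DEMO_NDIRECT;
    } else {
        s.level = DEMO_TOO_BIG;  /* beyond what this layout can address */
        s.index = 0;
    }
    return s;
}
```

For the indirect case, you would then read the indirect block from disk and look up entry s.index in it to get the data block's number.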

Tips

Start early, and start testing as soon as you have enough functionality implemented to do so!

Factor out repeated functionality into helper functions. In some cases, there are pairs of functions that are extremely similar (e.g., mkdir and create). You can probably handle most, if not all, of the work for both in one shared helper function.

When reading a block from the disk, you'll often want to treat the block's data differently depending on how you intend to use it. For example, if you know the block you're reading will contain directory entries, you'd like it to be typed as such in the C type system. When declaring variables for data blocks, you can use the following tricks:
- For directory entries, you can declare a block as: struct dirent block[SWATFS_BSIZE / sizeof(struct dirent)];
- For block pointers, you can declare a block as: uint32_t block[SWATFS_BSIZE / sizeof(uint32_t)];
- For generic user data, you can declare a block using characters: char block[SWATFS_BSIZE];
The implementation in swatfs_inode.c uses this idea if you'd like to see an example.
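Here's a minimal sketch of the typed-block trick applied to a directory lookup. The struct below is a made-up stand-in, not the real struct dirent from swatfs_types.h: the field names, the 60-byte name, and the convention that an empty name marks an unused entry are all assumptions for the sake of the example.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_BSIZE   4096u  /* stand-in for SWATFS_BSIZE (assumption) */
#define DEMO_NAMELEN 60u    /* hypothetical name length */

/* Hypothetical directory entry: 4 + 60 = 64 bytes. */
struct demo_dirent {
    uint32_t inum;
    char name[DEMO_NAMELEN];
};

#define DEMO_DIRENTS_PER_BLOCK (DEMO_BSIZE / sizeof(struct demo_dirent))

/* The trick only works if entries tile a block exactly. */
_Static_assert(DEMO_BSIZE % sizeof(struct demo_dirent) == 0,
               "dirent size must evenly divide the block size");

/* Scan one block's worth of entries for name. Returns the index of the
 * matching entry, or -1 if no in-use entry matches. Assumes (for this
 * sketch only) that an empty name marks an unused entry. */
int demo_find_entry(const struct demo_dirent *block, const char *name)
{
    for (size_t i = 0; i < DEMO_DIRENTS_PER_BLOCK; i++) {
        if (block[i].name[0] != '\0' &&
            strncmp(block[i].name, name, DEMO_NAMELEN) == 0)
            return (int)i;
    }
    return -1;
}
```

A caller would declare struct demo_dirent block[DEMO_DIRENTS_PER_BLOCK], fill it with a block read from disk, and pass it straight in. Because the loop bound is computed from the constants, it keeps working if the block size changes.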

FUSE's readdir function is a little weird. It gives you a "filler" that you can use to populate a buffer with names from the directory. For each name, call filler(buf, name, NULL, 0). That should take care of formatting the output so that you don't have to.
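To make the filler pattern concrete without needing FUSE itself, here's a minimal sketch that uses a mock filler. The typedef mirrors the four-argument shape of the call above, but demo_fill_dir_t, demo_filler(), demo_listing, and demo_readdir() are all hypothetical names. In your real readdir, FUSE hands you buf and filler, and the names come from your directory's entries.

```c
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Same shape as the filler call described above (mock, not FUSE's own type). */
typedef int (*demo_fill_dir_t)(void *buf, const char *name,
                               const struct stat *stbuf, off_t off);

/* Mock output buffer: collects up to 8 names so we can inspect them. */
struct demo_listing {
    char names[8][32];
    int count;
};

/* Mock filler: appends name to the listing; returns nonzero when full. */
static int demo_filler(void *buf, const char *name,
                       const struct stat *stbuf, off_t off)
{
    struct demo_listing *out = buf;
    (void)stbuf;
    (void)off;

    if (out->count >= 8)
        return 1;
    strncpy(out->names[out->count], name, 31);
    out->names[out->count][31] = '\0';
    out->count++;
    return 0;
}

/* Hypothetical readdir body: call the filler once per directory name,
 * stopping early if the filler reports that its buffer is full. */
int demo_readdir(const char *const names[], int n,
                 void *buf, demo_fill_dir_t filler)
{
    for (int i = 0; i < n; i++) {
        if (filler(buf, names[i], NULL, 0) != 0)
            break;
    }
    return 0;
}
```

The point is that your code never touches the layout of buf; it just hands each name to the filler and lets it do the formatting.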

The mode attribute of a file encodes both the file's type and its access permissions. You can use the S_ISREG() and S_ISDIR() macros to test the file type. See the stat() manual for more details about the mode attribute.

When you're adding a new entry to a directory, make sure to initialize all the fields of the new inode, including the block map.

Submitting

Please remove any excessive debugging output prior to submitting.

To submit your code, commit your changes locally using git add and git commit. Then run git push while in your lab directory.
