1. Introduction

In this lab, you will learn how to build a simplified BitTorrent client in C. The client will implement core functionalities of the BitTorrent protocol, including connecting to peers, requesting pieces, downloading data, and verifying its integrity. This hands-on lab will help you understand networking concepts, protocols, and file-sharing mechanisms.

2. Objectives

After completing this lab, you should be able to:

  • Parse .torrent files and extract metadata.

  • Establish peer-to-peer connections using the BitTorrent protocol.

  • Request and download pieces of a file from peers.

  • Validate downloaded data using hashing.

  • Assemble and store the file in the correct order.

Refer to the official BitTorrent protocol specification: https://www.bittorrent.org/beps/bep_0003.html

3. Background

The BitTorrent protocol is one of the most popular peer-to-peer (P2P) file-sharing systems. Unlike traditional client-server models, BitTorrent decentralizes file distribution by dividing files into small chunks called pieces and allowing peers to share these pieces with one another. This document provides an in-depth overview of the protocol and its key components.

4. Parts of the BitTorrent Protocol

Before diving into the protocol details, let’s clarify some important terms:

  • Torrent File: A .torrent file contains metadata about the shared file(s), including:

    • File names and sizes

    • Piece length (in bytes)

    • SHA-1 hashes for integrity checks

    • Tracker URL for peer discovery

  • Tracker: A central server that coordinates peers by providing their IP addresses and port numbers.

  • Peers: Clients participating in the file-sharing network. Peers can have the following roles:

    • Leechers: Downloaders that request pieces from others.

    • Seeders: Uploaders that have the complete file.

  • Pieces: Files are divided into fixed-size chunks called pieces. Each piece is further split into smaller units called blocks during transmission.

  • Info Hash: A SHA-1 hash of the "info" section of the .torrent file, used to uniquely identify the torrent.
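The relationship between file size, pieces, and blocks comes down to a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the 16 KiB block size is an assumption (a common choice in practice, not something this handout fixes).

```c
#include <stdint.h>

#define BLOCK_SIZE 16384u  /* assumed block size (16 KiB) */

/* Number of pieces needed to cover total_length bytes (rounds up). */
uint32_t piece_count(uint64_t total_length, uint32_t piece_length)
{
    return (uint32_t)((total_length + piece_length - 1) / piece_length);
}

/* Size of piece `index`; only the last piece may be shorter than piece_length. */
uint32_t piece_size(uint64_t total_length, uint32_t piece_length, uint32_t index)
{
    uint64_t start = (uint64_t)index * piece_length;
    uint64_t remaining = total_length - start;
    return remaining < piece_length ? (uint32_t)remaining : piece_length;
}

/* Number of blocks needed to transfer a piece of the given size (rounds up). */
uint32_t block_count(uint32_t size)
{
    return (size + BLOCK_SIZE - 1) / BLOCK_SIZE;
}
```

For example, a 1,000,000-byte file with a piece length of 262,144 bytes has four pieces, and the last one is only 213,568 bytes long.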

5. Protocol Overview

The BitTorrent protocol operates in distinct phases:

  1. Metadata Distribution: The .torrent file is distributed to users. This file contains the necessary metadata for downloading and verifying the file.

  2. Peer Discovery: The client contacts a tracker (or uses decentralized methods like DHT) to find peers.

  3. Peer Communication: The client establishes TCP connections with other peers using a well-defined handshake.

  4. Piece Exchange: Peers exchange pieces of the file using a system of requests and responses.

  5. Data Integrity: Each piece is verified using the SHA-1 hash from the .torrent file.

6. Handshake Protocol

The handshake is the first step in peer-to-peer communication. It ensures that both peers are downloading the same file and confirms their identities.

6.1. Handshake Message Format

The handshake message consists of:

[Length (1 byte)][Protocol Name (19 bytes)][Reserved (8 bytes)][Info Hash (20 bytes)][Peer ID (20 bytes)]

  • Length: The length of the protocol name in bytes, always 19.

  • Protocol Name: Always "BitTorrent protocol" (19 bytes).

  • Reserved: 8 bytes reserved for future use, sent as zeros in the base protocol.

  • Info Hash: A SHA-1 hash of the info dictionary from the .torrent file.

  • Peer ID: A 20-byte unique identifier for the client.

6.2. Example

A client sends a handshake message to a peer:

Length: 19
Protocol Name: BitTorrent protocol
Reserved: 00000000
Info Hash: <20-byte SHA-1 hash>
Peer ID: -MY0001-123456789012

The peer responds with a similar message. If the Info Hash matches, the connection proceeds.
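The handshake layout maps directly onto a 68-byte buffer (1 + 19 + 8 + 20 + 20). The following is a minimal sketch of building and checking that buffer; the function names are hypothetical, and sending or receiving the bytes over a socket is left out.

```c
#include <stdint.h>
#include <string.h>

#define HANDSHAKE_LEN 68  /* 1 + 19 + 8 + 20 + 20 */

static const char PSTR[] = "BitTorrent protocol";

/* Fill `out` with a handshake for the given info hash and peer ID. */
void build_handshake(uint8_t out[HANDSHAKE_LEN],
                     const uint8_t info_hash[20],
                     const uint8_t peer_id[20])
{
    out[0] = 19;                      /* length of the protocol name */
    memcpy(out + 1, PSTR, 19);        /* protocol name, no NUL terminator */
    memset(out + 20, 0, 8);           /* reserved bytes, all zero */
    memcpy(out + 28, info_hash, 20);  /* identifies the torrent */
    memcpy(out + 48, peer_id, 20);    /* identifies this client */
}

/* Return 1 if a received handshake is well formed and names the same torrent. */
int handshake_matches(const uint8_t in[HANDSHAKE_LEN], const uint8_t info_hash[20])
{
    return in[0] == 19
        && memcmp(in + 1, PSTR, 19) == 0
        && memcmp(in + 28, info_hash, 20) == 0;
}
```

A client would typically call build_handshake, write all 68 bytes to the peer, read 68 bytes back, and close the connection if handshake_matches returns 0.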

7. Message Protocol

After the handshake, peers communicate using a series of messages. Each message follows this format, where the 4-byte length is a big-endian integer that counts the type byte plus the payload:

[Message Length (4 bytes)][Message Type (1 byte)][Payload (variable)]

7.1. Common Message Types

  1. Choke (Type: 0): Informs a peer that it cannot request pieces.

  2. Unchoke (Type: 1): Informs a peer that it can request pieces.

  3. Interested (Type: 2): Indicates that a peer is interested in downloading pieces.

  4. Have (Type: 4): Notifies a peer that the sender has a specific piece.

  5. Bitfield (Type: 5): Sends a bitmap of pieces that the peer possesses.

  6. Request (Type: 6): Requests a specific block of data: [Index (4 bytes)][Begin (4 bytes)][Length (4 bytes)]

  7. Piece (Type: 7): Sends a block of data: [Index (4 bytes)][Begin (4 bytes)][Block (variable)]

  8. Cancel (Type: 8): Cancels a previously sent request.
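To make the framing concrete, here is a sketch that serializes a Request message. All integers on the wire are 4-byte big-endian values, and the length prefix counts the type byte plus the payload (13 for a Request). The enum and helper names are hypothetical and not part of any provided library.

```c
#include <stddef.h>
#include <stdint.h>

/* Message type IDs from the list above. */
enum msg_type {
    MSG_CHOKE = 0, MSG_UNCHOKE = 1, MSG_INTERESTED = 2, MSG_HAVE = 4,
    MSG_BITFIELD = 5, MSG_REQUEST = 6, MSG_PIECE = 7, MSG_CANCEL = 8
};

#define REQUEST_MSG_LEN 17  /* 4 (length) + 1 (type) + 12 (payload) */

/* Store v at p as a 4-byte big-endian (network order) integer. */
void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Read a 4-byte big-endian integer from p. */
uint32_t get_u32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Serialize a Request for `length` bytes at offset `begin` of piece `index`. */
size_t build_request(uint8_t out[REQUEST_MSG_LEN],
                     uint32_t index, uint32_t begin, uint32_t length)
{
    put_u32(out, 13);            /* 1 type byte + 12 payload bytes */
    out[4] = MSG_REQUEST;
    put_u32(out + 5, index);
    put_u32(out + 9, begin);
    put_u32(out + 13, length);
    return REQUEST_MSG_LEN;
}
```

The same get_u32 helper can decode the length prefix of an incoming message: read 4 bytes, decode the length, then read exactly that many more bytes.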

7.2. Example Exchange

  1. Peer A sends an Interested message to Peer B.

  2. Peer B responds with an Unchoke message.

  3. Peer A sends a Request for a block of a specific piece.

  4. Peer B sends the requested Piece.

8. File Download Process

  1. Bitfield Exchange: Right after the handshake, each peer may send a Bitfield message to indicate which pieces it already has.

  2. Piece Requests:

    • A peer requests specific blocks of pieces it does not have.

    • Multiple requests can be sent in parallel to different peers.

  3. Piece Assembly:

    • The client assembles blocks into complete pieces.

    • Each piece is validated using the SHA-1 hash from the .torrent file.

  4. File Assembly:

    • Validated pieces are written to their correct position in the output file.
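Deciding which pieces to request starts with reading a peer's bitfield. In the protocol, the most significant bit of the first byte stands for piece 0. The helpers below are a small sketch with hypothetical names; they treat any index beyond the buffer as "not present".

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the bitfield marks piece `index` as present, else 0. */
int bitfield_has(const uint8_t *bits, size_t nbytes, uint32_t index)
{
    size_t byte = index / 8;
    if (byte >= nbytes)
        return 0;                          /* out of range: treat as missing */
    return (bits[byte] >> (7 - index % 8)) & 1;
}

/* Mark piece `index` as present, e.g. after receiving a Have message. */
void bitfield_set(uint8_t *bits, size_t nbytes, uint32_t index)
{
    size_t byte = index / 8;
    if (byte < nbytes)
        bits[byte] |= (uint8_t)(1u << (7 - index % 8));
}
```

A client can keep one such bitfield per peer and one for itself, then request only pieces that the peer has and the client lacks.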

9. Data Integrity

Data integrity is critical in the BitTorrent protocol. Each piece has a corresponding SHA-1 hash in the .torrent file.

  1. SHA-1 Verification: After assembling a piece, the client calculates its SHA-1 hash and compares it with the expected value.

  2. Handling Corrupt Data:

    • If the hash does not match, the piece is discarded.

    • The client re-requests the piece from another peer.
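The comparison step can be sketched as follows. This assumes the 20-byte digest of the assembled piece has already been computed with whatever SHA-1 routine your build provides, and that the expected hashes are stored back to back, 20 bytes per piece, as in the .torrent metadata. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20

/*
 * Compare a computed digest with the expected hash of piece `index`.
 * `pieces` holds the concatenated 20-byte hashes, `pieces_len` its size.
 * Returns 1 on a match, 0 on a mismatch or an out-of-range index.
 */
int piece_is_valid(const uint8_t digest[SHA1_LEN],
                   const uint8_t *pieces, size_t pieces_len, uint32_t index)
{
    size_t offset = (size_t)index * SHA1_LEN;
    if (offset + SHA1_LEN > pieces_len)
        return 0;
    return memcmp(digest, pieces + offset, SHA1_LEN) == 0;
}
```

When this returns 0, the piece buffer should be discarded and its blocks requested again, preferably from a different peer.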

10. Peer-to-Peer Optimization

The BitTorrent protocol includes optimizations to improve download speed and fairness:

  1. Rarest-First Piece Selection: Peers download the pieces held by the fewest peers first, which keeps every piece available in the swarm.

  2. Tit-for-Tat Algorithm: Peers upload to those who upload to them, fostering cooperative behavior.

  3. Endgame Mode: Near the end of a download, the client requests remaining blocks from all peers to minimize delays.
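Rarest-first selection can be illustrated by counting how many connected peers advertise each piece. This is a simplified sketch, not a required part of the lab: the names are hypothetical, every bitfield is assumed to hold at least one bit per piece, and ties go to the lowest index (real clients usually randomize).

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if bit `index` is set; the high bit of byte 0 is piece 0. */
int has_bit(const uint8_t *bits, uint32_t index)
{
    return (bits[index / 8] >> (7 - index % 8)) & 1;
}

/*
 * Pick the piece we still need that the fewest peers have.
 * peer_bits: one bitfield per peer; have: our own bitfield.
 * Returns the piece index, or -1 if no peer has a piece we need.
 */
long pick_rarest(const uint8_t *const *peer_bits, size_t npeers,
                 const uint8_t *have, uint32_t npieces)
{
    long best = -1;
    size_t best_count = 0;

    for (uint32_t i = 0; i < npieces; i++) {
        if (has_bit(have, i))
            continue;                       /* already downloaded */
        size_t count = 0;
        for (size_t p = 0; p < npeers; p++)
            count += (size_t)has_bit(peer_bits[p], i);
        if (count > 0 && (best < 0 || count < best_count)) {
            best = (long)i;
            best_count = count;
        }
    }
    return best;
}
```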

11. Advantages of BitTorrent

  1. Scalability: File distribution scales with the number of peers.

  2. Decentralization: Files are distributed without relying on a central server.

  3. Efficient Bandwidth Usage: Peers share pieces, reducing the load on individual uploaders.

12. Limitations of BitTorrent

  1. Tracker Dependence: Centralized trackers can fail or be taken offline.

  2. Integrity Risks: Malicious peers may upload corrupt data.

  3. Legal Issues: BitTorrent is often associated with piracy, despite legitimate use cases.

More details in the protocol documentation: https://www.bittorrent.org/beps/bep_0003.html

12.1. Understanding the BitTorrent Protocol

  • Peer Handshake: The client must initiate a handshake with peers using a predefined protocol message.

  • Piece Requests: Once connected, the client requests blocks of a file from peers.

  • Integrity Check: The client validates each downloaded piece using its SHA-1 hash.

13. Tasks

13.1. Task 1: Parse .torrent Files

  1. Use the provided bencode.h library to parse .torrent files.

  2. Extract the following fields:

    • announce: Tracker URL.

    • info_hash: SHA-1 hash of the info dictionary.

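    • piece_length and pieces_hashes: The piece size in bytes and the concatenated 20-byte SHA-1 hashes, one per piece (the "piece length" and "pieces" keys of the info dictionary).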

  3. Print the extracted metadata to verify your implementation.

13.2. Task 2: Implement Peer Handshake

  1. Write a function to perform the BitTorrent peer handshake.

  2. Test the handshake by connecting to a peer and verifying the response.

13.3. Task 3: Download Pieces

  1. Implement the logic to request pieces from peers in blocks.

  2. Place each received block at its Begin offset within the piece so that blocks reassemble into full pieces, even if they arrive out of order.

13.4. Task 4: Validate Data Integrity

  1. Use SHA-1 hashing to verify the integrity of each piece.

  2. Discard invalid pieces and retry the download.

13.5. Task 5: Store and Assemble the File

  1. Store validated pieces in the correct location in the output file.

  2. Ensure the output file matches the size and structure defined in the .torrent metadata.
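Because every piece except the last has the same length, the file offset of a piece is simply its index multiplied by the piece length. The sketch below writes one verified piece at that offset. The function name is hypothetical, and it assumes a single-file torrent whose size fits in a long (use a 64-bit seek such as fseeko for larger files).

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Write one verified piece at its position in the output file.
 * `size` is the actual piece size (the last piece may be shorter).
 * Returns 0 on success, -1 on a seek or write error.
 */
int store_piece(FILE *fp, uint32_t index, uint32_t piece_length,
                const uint8_t *data, size_t size)
{
    long offset = (long)index * (long)piece_length;

    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    return fwrite(data, 1, size, fp) == size ? 0 : -1;
}
```

Pieces can be stored in any order with this approach, so the client does not have to download them sequentially.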

13.6. Deliverables

  1. Source Code:

    • Your bittorrent_client.c file with all required functionality implemented.

    • A Makefile for compiling the code.

  2. Documentation:

    • An explanation of your implementation in README.md.

    • Describe how each protocol step is implemented.

  3. Test Results:

    • A text file (results.txt) showing the program’s output when downloading a sample file.

14. Evaluation Criteria

Your submission will be evaluated based on:

  • Functionality: The client correctly downloads and assembles the file (50%).

  • Data Integrity: The client validates all downloaded pieces (20%).

  • Code Quality: Proper use of C programming principles (15%).

  • Documentation: Clear and detailed explanations in the README file (15%).

15. Additional Resources