1. Introduction
In this lab, you will learn how to build a simplified BitTorrent client in C. The client will implement core functionalities of the BitTorrent protocol, including connecting to peers, requesting pieces, downloading data, and verifying its integrity. This hands-on lab will help you understand networking concepts, protocols, and file-sharing mechanisms.
2. Objectives
After completing this lab, you should be able to:
- Parse .torrent files and extract metadata.
- Establish peer-to-peer connections using the BitTorrent protocol.
- Request and download pieces of a file from peers.
- Validate downloaded data using hashing.
- Assemble and store the file in the correct order.
Refer to the official BitTorrent protocol specification: https://www.bittorrent.org/beps/bep_0003.html
3. Background
The BitTorrent protocol is one of the most popular peer-to-peer (P2P) file-sharing systems. Unlike traditional client-server models, BitTorrent decentralizes file distribution by dividing files into small chunks called pieces and allowing peers to share these pieces with one another. This document provides an in-depth overview of the protocol and its key components.
4. Parts of the BitTorrent Protocol
Before diving into the protocol details, let’s clarify some important terms:
- Torrent File: A .torrent file contains metadata about the shared file(s), including:
  - File names and sizes
  - Piece length (in bytes)
  - SHA-1 hashes for integrity checks
  - Tracker URL for peer discovery
- Tracker: A central server that coordinates peers by providing their IP addresses and port numbers.
- Peers: Clients participating in the file-sharing network. Peers can have the following roles:
  - Leechers: Downloaders that request pieces from others.
  - Seeders: Uploaders that have the complete file.
- Pieces: Files are divided into fixed-size chunks called pieces. Each piece is further split into smaller units called blocks during transmission.
- Info Hash: A SHA-1 hash of the "info" section of the .torrent file, used to uniquely identify the torrent.
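The relationship between file size, piece length, and piece count can be sketched as below. This is a minimal illustration, not part of the lab skeleton: the function names `piece_count` and `piece_size` are hypothetical, and it assumes a single-file torrent in which only the final piece may be shorter than the piece length.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pieces needed to cover total_len bytes (rounds up). */
uint32_t piece_count(uint64_t total_len, uint32_t piece_len)
{
    assert(piece_len > 0);
    return (uint32_t)((total_len + piece_len - 1) / piece_len);
}

/* Size in bytes of piece `index`; only the last piece may be shorter. */
uint32_t piece_size(uint64_t total_len, uint32_t piece_len, uint32_t index)
{
    uint32_t n = piece_count(total_len, piece_len);
    assert(index < n);
    if (index + 1 < n)
        return piece_len;
    return (uint32_t)(total_len - (uint64_t)index * piece_len);
}
```

For example, a 1000-byte file with a 256-byte piece length has four pieces, and the last one holds the remaining 232 bytes.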
5. Protocol Overview
The BitTorrent protocol operates in distinct phases:
- Metadata Distribution: The .torrent file is distributed to users. This file contains the necessary metadata for downloading and verifying the file.
- Peer Discovery: The client contacts a tracker (or uses decentralized methods like DHT) to find peers.
- Peer Communication: The client establishes TCP connections with other peers using a well-defined handshake.
- Piece Exchange: Peers exchange pieces of the file using a system of requests and responses.
- Data Integrity: Each piece is verified using the SHA-1 hash from the .torrent file.
6. Handshake Protocol
The handshake is the first step in peer-to-peer communication. It ensures that both peers are downloading the same file and confirms their identities.
6.1. Handshake Message Format
The handshake message consists of:
[Length (1 byte)][Protocol Name (19 bytes)][Reserved (8 bytes)][Info Hash (20 bytes)][Peer ID (20 bytes)]
- Length: The length of the protocol name in bytes, always 19.
- Protocol Name: Always "BitTorrent protocol" (19 bytes).
- Reserved: 8 bytes reserved for future use.
- Info Hash: A SHA-1 hash of the info dictionary from the .torrent file.
- Peer ID: A 20-byte unique identifier for the client.
6.2. Example
A client sends a handshake message to a peer:
Length: 19
Protocol Name: BitTorrent protocol
Reserved: 00000000
Info Hash: <20-byte SHA-1 hash>
Peer ID: -MY0001-123456789012
The peer responds with a handshake in the same format. If the Info Hash matches, the connection proceeds; otherwise, the client should close it.
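The 68-byte layout described in this section could be assembled and checked roughly as follows. This is a sketch under stated assumptions: `build_handshake`, `handshake_matches`, and `HANDSHAKE_LEN` are illustrative names, the reserved bytes are simply zeroed, and sending or receiving the buffer over a socket is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HANDSHAKE_LEN 68  /* 1 + 19 + 8 + 20 + 20 bytes */

static const char PSTR[] = "BitTorrent protocol";

/* Fill buf (HANDSHAKE_LEN bytes) with a handshake message. */
void build_handshake(unsigned char buf[HANDSHAKE_LEN],
                     const unsigned char info_hash[20],
                     const unsigned char peer_id[20])
{
    assert(buf && info_hash && peer_id);
    buf[0] = 19;                      /* length of the protocol name */
    memcpy(buf + 1, PSTR, 19);        /* protocol name, no NUL terminator */
    memset(buf + 20, 0, 8);           /* reserved bytes, all zero here */
    memcpy(buf + 28, info_hash, 20);
    memcpy(buf + 48, peer_id, 20);
}

/* Return 1 if a received handshake is well formed and carries our info hash. */
int handshake_matches(const unsigned char resp[HANDSHAKE_LEN],
                      const unsigned char info_hash[20])
{
    return resp[0] == 19
        && memcmp(resp + 1, PSTR, 19) == 0
        && memcmp(resp + 28, info_hash, 20) == 0;
}
```

A client would send the buffer right after connecting, read exactly 68 bytes back, and pass them to `handshake_matches` before sending any other message.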
7. Message Protocol
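After the handshake, peers communicate using a series of length-prefixed messages. Message Length is a 4-byte big-endian integer that counts the type byte plus the payload. Each message follows this format: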
[Message Length (4 bytes)][Message Type (1 byte)][Payload (variable)]
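Decoding the header of an incoming message might look like the sketch below. The names `struct msg_header` and `parse_header` are hypothetical. It also handles the keep-alive case from the protocol specification, a message whose length is 0 and which has no type byte, which this handout does not otherwise cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct msg_header {
    uint32_t length;  /* bytes following the 4-byte prefix (type + payload) */
    int      type;    /* message type, or -1 for a keep-alive (length 0) */
};

/* Decode the header at the start of buf. Returns 1 on success,
 * 0 if fewer bytes are available than the header needs. */
int parse_header(const unsigned char *buf, size_t avail, struct msg_header *out)
{
    assert(buf && out);
    if (avail < 4)
        return 0;
    /* The length prefix is big-endian (network byte order). */
    out->length = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
                | ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    if (out->length == 0) {           /* keep-alive: no type byte follows */
        out->type = -1;
        return 1;
    }
    if (avail < 5)
        return 0;
    out->type = buf[4];
    return 1;
}
```

After a successful parse, the payload is the next `length - 1` bytes on the connection; because TCP is a byte stream, the caller has to keep reading until that many bytes have arrived.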
7.1. Common Message Types
- Choke (Type: 0): Informs a peer that it cannot request pieces.
- Unchoke (Type: 1): Informs a peer that it can request pieces.
- Interested (Type: 2): Indicates that a peer is interested in downloading pieces.
- Not Interested (Type: 3): Indicates that a peer is no longer interested in downloading pieces.
- Have (Type: 4): Notifies a peer that the sender has a specific piece.
- Bitfield (Type: 5): Sends a bitmap of pieces that the peer possesses.
- Request (Type: 6): Requests a specific block of data:
  [Index (4 bytes)][Begin (4 bytes)][Length (4 bytes)]
- Piece (Type: 7): Sends a block of data:
  [Index (4 bytes)][Begin (4 bytes)][Block (variable)]
- Cancel (Type: 8): Cancels a previously sent request.
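As an illustration of the message layout, a Request message could be serialized as in the following sketch. The enum names, `REQUEST_MSG_LEN`, and `build_request` are illustrative, not part of any provided library. Type 3 (Not Interested) comes from the protocol specification, and all integers are written in big-endian order.

```c
#include <assert.h>
#include <stdint.h>

enum msg_type {
    MSG_CHOKE = 0, MSG_UNCHOKE = 1, MSG_INTERESTED = 2,
    MSG_NOT_INTERESTED = 3, MSG_HAVE = 4, MSG_BITFIELD = 5,
    MSG_REQUEST = 6, MSG_PIECE = 7, MSG_CANCEL = 8
};

#define REQUEST_MSG_LEN 17  /* 4-byte prefix + 1 type byte + 12 payload bytes */

/* Store v as a 4-byte big-endian integer. */
static void write_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Serialize a Request for `length` bytes at offset `begin` of piece `index`. */
void build_request(unsigned char buf[REQUEST_MSG_LEN],
                   uint32_t index, uint32_t begin, uint32_t length)
{
    assert(buf);
    write_be32(buf, 13);              /* type byte + three 4-byte fields */
    buf[4] = MSG_REQUEST;
    write_be32(buf + 5, index);
    write_be32(buf + 9, begin);
    write_be32(buf + 13, length);
}
```

A Cancel message carries the same three fields, so the same layout applies with the type byte set to 8.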
7.2. Example Exchange
- Peer A sends an Interested message to Peer B.
- Peer B responds with an Unchoke message.
- Peer A sends a Request for a block of a specific piece.
- Peer B sends the requested Piece.
8. File Download Process
- Bitfield Exchange: When two peers connect, they exchange Bitfield messages to indicate which pieces they already have.
- Piece Requests:
  - A peer requests specific blocks of pieces it does not have.
  - Multiple requests can be sent in parallel to different peers.
- Piece Assembly:
  - The client assembles blocks into complete pieces.
  - Each piece is validated using the SHA-1 hash from the .torrent file.
- File Assembly:
  - Validated pieces are written to their correct position in the output file.
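Deciding which piece to request from a peer starts with reading its bitfield. The sketch below assumes the bit order defined in the protocol specification, where the high bit of the first byte stands for piece 0. The helper names are hypothetical, and both bitfields are assumed to have the same length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if piece `index` is marked in the bitfield, else 0. */
int bitfield_has(const unsigned char *bf, size_t bf_len, uint32_t index)
{
    assert(bf);
    size_t byte = index / 8;
    if (byte >= bf_len)
        return 0;
    return (bf[byte] >> (7 - index % 8)) & 1;
}

/* Mark piece `index` as present. */
void bitfield_set(unsigned char *bf, size_t bf_len, uint32_t index)
{
    assert(bf && index / 8 < bf_len);
    bf[index / 8] |= (unsigned char)(0x80u >> (index % 8));
}

/* First piece the peer has and we lack, or -1 if there is none. */
long next_wanted_piece(const unsigned char *ours, const unsigned char *theirs,
                       size_t bf_len, uint32_t num_pieces)
{
    for (uint32_t i = 0; i < num_pieces; i++)
        if (!bitfield_has(ours, bf_len, i) && bitfield_has(theirs, bf_len, i))
            return (long)i;
    return -1;
}
```

Picking the lowest missing index keeps the example short; it is a simplification, not the selection strategy real clients use.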
9. Data Integrity
Data integrity is critical in the BitTorrent protocol. Each piece has a corresponding SHA-1 hash in the .torrent file.
- SHA-1 Verification: After assembling a piece, the client calculates its SHA-1 hash and compares it with the expected value.
- Handling Corrupt Data:
  - If the hash does not match, the piece is discarded.
  - The client re-requests the piece from another peer.
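The compare-and-discard step could be sketched as follows. To stay independent of any particular hashing library, the example takes an already computed 20-byte digest; in practice it would likely come from a call such as OpenSSL's `SHA1(data, len, digest)`. The `piece_state` enum and `accept_piece` are illustrative names, and `pieces_hashes` is assumed to be the concatenated 20-byte hashes taken from the .torrent file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_LEN 20

enum piece_state { PIECE_MISSING, PIECE_VERIFIED };

/* Compare the digest of a freshly assembled piece with the expected hash
 * stored at index * 20 in pieces_hashes. Returns 1 and marks the piece
 * verified on a match; returns 0 and marks it missing otherwise, so the
 * caller can drop the buffer and request the piece again. */
int accept_piece(enum piece_state *states, size_t index,
                 const unsigned char digest[SHA1_LEN],
                 const unsigned char *pieces_hashes)
{
    assert(states && digest && pieces_hashes);
    const unsigned char *expected = pieces_hashes + index * SHA1_LEN;
    if (memcmp(digest, expected, SHA1_LEN) == 0) {
        states[index] = PIECE_VERIFIED;
        return 1;
    }
    states[index] = PIECE_MISSING;
    return 0;
}
```

Keeping the state update next to the comparison means a piece can never be marked verified without having passed the hash check.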
10. Peer-to-Peer Optimization
The BitTorrent protocol includes optimizations to improve download speed and fairness:
- Rare Piece Priority: Peers prioritize downloading rarer pieces to ensure their availability.
- Tit-for-Tat Algorithm: Peers upload to those who upload to them, fostering cooperative behavior.
- Endgame Mode: Near the end of a download, the client requests remaining blocks from all peers to minimize delays.
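Rare piece priority can be illustrated with a small selection function. This sketch assumes the client keeps a per-piece count of how many connected peers advertise that piece (for example, updated from Bitfield and Have messages). `pick_rarest` and its parameters are hypothetical, and ties go to the lowest index, whereas real clients usually randomize among equally rare pieces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* availability[i]: number of connected peers that have piece i.
 * have[i]: nonzero if we already hold piece i.
 * Returns the index of the rarest piece we still need that at least one
 * peer can provide, or -1 if there is no such piece. */
long pick_rarest(const uint32_t *availability, const unsigned char *have,
                 size_t num_pieces)
{
    assert(availability && have);
    long best = -1;
    for (size_t i = 0; i < num_pieces; i++) {
        if (have[i] || availability[i] == 0)
            continue;                 /* already done, or nobody has it */
        if (best < 0 || availability[i] < availability[(size_t)best])
            best = (long)i;
    }
    return best;
}
```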
11. Advantages of BitTorrent
- Scalability: File distribution scales with the number of peers.
- Decentralization: File data is transferred directly between peers without relying on a central file server.
- Efficient Bandwidth Usage: Peers share pieces, reducing the load on individual uploaders.
12. Limitations of BitTorrent
- Tracker Dependence: Centralized trackers can fail or be taken offline.
- Integrity Risks: Malicious peers may upload corrupt data.
- Legal Issues: BitTorrent is often associated with piracy, despite legitimate use cases.
For more details, see the protocol specification: https://www.bittorrent.org/beps/bep_0003.html
12.1. Understanding the BitTorrent Protocol
- Peer Handshake: The client must initiate a handshake with peers using a predefined protocol message.
- Piece Requests: Once connected, the client requests blocks of a file from peers.
- Integrity Check: The client validates each downloaded piece using its SHA-1 hash.
13. Tasks
13.1. Task 1: Parse .torrent Files
- Use the provided bencode.h library to parse .torrent files.
- Extract the following fields:
  - announce: Tracker URL.
  - info_hash: SHA-1 hash of the info dictionary.
  - piece_length and pieces_hashes.
- Print the extracted metadata to verify your implementation.
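Printing the hashes is easier to check in hexadecimal. The sketch below does not use bencode.h, whose API is not shown in this handout. It assumes you have already extracted the piece hashes as one byte string of concatenated 20-byte SHA-1 values (the form in which the specification stores them). `hash_to_hex` and `print_piece_hashes` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SHA1_LEN 20

/* Write the 40-character hex form of a 20-byte hash into out,
 * which must hold 41 bytes including the terminating NUL. */
void hash_to_hex(const unsigned char hash[SHA1_LEN], char out[2 * SHA1_LEN + 1])
{
    assert(hash && out);
    for (size_t i = 0; i < SHA1_LEN; i++)
        snprintf(out + 2 * i, 3, "%02x", (unsigned)hash[i]);
}

/* Print every piece hash from the concatenated hash string. Returns the
 * number of hashes, or -1 if the length is not a multiple of 20. */
long print_piece_hashes(const unsigned char *pieces, size_t pieces_len)
{
    char hex[2 * SHA1_LEN + 1];
    assert(pieces);
    if (pieces_len % SHA1_LEN != 0)
        return -1;
    size_t n = pieces_len / SHA1_LEN;
    for (size_t i = 0; i < n; i++) {
        hash_to_hex(pieces + i * SHA1_LEN, hex);
        printf("piece %zu: %s\n", i, hex);
    }
    return (long)n;
}
```

The same `hash_to_hex` helper can print the info hash, which makes it easy to compare against the value reported by another BitTorrent client for the same .torrent file.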
13.2. Task 2: Implement Peer Handshake
- Write a function to perform the BitTorrent peer handshake.
- Test the handshake by connecting to a peer and verifying the response.
13.3. Task 3: Download Pieces
- Implement the logic to request pieces from peers in blocks.
- Place each received block at its Begin offset within the piece, since blocks may arrive in any order, and reassemble the blocks into full pieces.
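One way to split a piece into block requests and reassemble the replies is sketched below. It assumes a block size of 16 KiB, the request size most clients use; the handout does not prescribe one. `BLOCK_LEN`, `block_count`, `block_span`, and `store_block` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_LEN 16384u  /* 16 KiB, an assumed request size */

/* Number of blocks needed to cover a piece of piece_size bytes. */
uint32_t block_count(uint32_t piece_size)
{
    return (uint32_t)(((uint64_t)piece_size + BLOCK_LEN - 1) / BLOCK_LEN);
}

/* Compute the Begin offset and Length of block b within a piece;
 * only the last block may be shorter than BLOCK_LEN. */
void block_span(uint32_t piece_size, uint32_t b,
                uint32_t *begin, uint32_t *length)
{
    assert(begin && length && b < block_count(piece_size));
    *begin = b * BLOCK_LEN;
    uint32_t left = piece_size - *begin;
    *length = left < BLOCK_LEN ? left : BLOCK_LEN;
}

/* Copy a received block into the piece buffer at its Begin offset.
 * Returns 1 on success, 0 if the block would fall outside the piece. */
int store_block(unsigned char *piece_buf, uint32_t piece_size,
                uint32_t begin, const unsigned char *data, uint32_t len)
{
    assert(piece_buf && data);
    if (begin > piece_size || len > piece_size - begin)
        return 0;
    memcpy(piece_buf + begin, data, len);
    return 1;
}
```

Because `store_block` writes by offset, the order in which Piece messages arrive does not matter; the client only needs to track which blocks have been received before hashing the piece.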
13.4. Task 4: Validate Data Integrity
- Use SHA-1 hashing to verify the integrity of each piece.
- Discard invalid pieces and retry the download.
13.5. Task 5: Store and Assemble the File
- Store validated pieces in the correct location in the output file.
- Ensure the output file matches the size and structure defined in the .torrent metadata.
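Writing a validated piece to its position might be done as in this sketch. It assumes a single-file torrent, so piece `index` starts at byte `index * piece_len` of the output file. `write_piece` is a hypothetical helper, and because it uses standard `fseek` with a `long` offset, very large files would need `fseeko` or a similar 64-bit interface instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write a verified piece at its offset in the output file.
 * data_len is piece_len for every piece except possibly the last.
 * Returns 0 on success, -1 on an I/O error. */
int write_piece(FILE *out, uint32_t index, uint32_t piece_len,
                const unsigned char *data, size_t data_len)
{
    assert(out && data);
    long offset = (long)index * (long)piece_len;
    if (fseek(out, offset, SEEK_SET) != 0)
        return -1;
    if (fwrite(data, 1, data_len, out) != data_len)
        return -1;
    return 0;
}
```

Since each write seeks to its own offset, pieces can be stored in whatever order they are verified, and the final file size follows from the last piece being written at its shorter length.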
13.6. Lab Setup
- Source Code:
  - Your bittorrent_client.c file with all required functionality implemented.
  - A Makefile for compiling the code.
- Documentation:
  - An explanation of your implementation in README.md.
  - Describe how each protocol step is implemented.
- Test Results:
  - A text file (results.txt) showing the program’s output when downloading a sample file.
14. Evaluation Criteria
Your submission will be evaluated based on:
- Functionality: The client correctly downloads and assembles the file (50%).
- Data Integrity: The client validates all downloaded pieces (20%).
- Code Quality: Proper use of C programming principles (15%).
- Documentation: Clear and detailed explanations in the README file (15%).
15. Additional Resources
- Official BitTorrent Protocol Documentation: https://www.bittorrent.org/beps/bep_0003.html
- Networking in C: https://www.beej.us/guide/bgnet/
- SHA-1 Hashing with OpenSSL: https://www.openssl.org/docs/man1.1.1/man3/SHA1.html