"Reliability Support for Network Swapping Enabled Clusters"
America Holloway: CS and Math double major, class of 2005.
Heather Jones: Engineering major (focus on Electrical and Computer Engineering), CS minor, class of 2006.
Jennifer Barry: Likely Physics and CS double major, class of 2007 (students don't declare majors until the end of their sophomore year).
Tia Newhall: Assistant Professor of Computer Science
The goal of our project is to develop, implement, and test reliability algorithms for a network swapping system running on Linux clusters. This will be part of Professor Newhall's ongoing cluster computing project, Nswap. Network swapping systems allow individual cluster nodes with over-committed memory to use the idle memory of remote nodes as their backing store, swapping their pages over the network. Network swapping is motivated by two observations: first, network speeds are improving more quickly than disk speeds, and this disparity is likely to grow; second, there is often a significant amount of idle memory in a cluster that can be used for storing remotely swapped pages [1, 3, 9]. Transferring pages over a fast network and using remote idle memory as a "swap device" is therefore faster than swapping to local disk.
As the number of nodes in a cluster increases, it becomes more likely that a node will fail or become unreachable, making it important that such a system provide reliability support. Without reliability, a single node crash can affect programs running on other cluster nodes by losing remotely swapped page data that was stored on the crashed node. Any reliability support will add extra time and space overhead to remote swapping; reliability information consumes idle RAM space that could otherwise be used for storing remote pages and typically requires some extra computational and message passing overhead. RAID-based reliability schemes are likely to provide a good balance between reliability and cost, in addition to providing some flexibility. However, Nswap has design features that make implementing a strict RAID-like reliability scheme difficult. First, Nswap is designed to adapt to each node's local memory needs. The amount of local RAM space each node makes available to Nswap for storing remotely swapped pages grows and shrinks in response to the node's local processing needs: the amount of "swap" space is not fixed in size, and an individual node's Nswap storage capacity changes over time. The second difficulty is caused by Nswap's support for migrating remotely swapped pages between cluster nodes. Remote page migration occurs when a node needs to reclaim some of its RAM space from Nswap to use for local processing. Page migration complicates reliability support. For example, two pages in the same parity group could end up on the same node, resulting in a loss of reliability for that parity group. Prior work on reliability schemes for network swapping systems has relied on fixed placement of page and reliability data and is not applicable to Nswap.
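The RAID-style parity idea underlying these schemes can be sketched as follows. This is an illustrative Python sketch, not Nswap code; the page contents and group size are hypothetical. It also shows why two pages of one parity group must not share a node: a single crash would then lose two group members, and XOR parity can reconstruct only one.

```python
# Illustrative sketch of RAID-style parity for remotely swapped pages.
# Not Nswap code; page contents and sizes are hypothetical.

PAGE_SIZE = 4096

def xor_pages(pages):
    """XOR a list of equal-sized byte pages into a single parity page."""
    parity = bytearray(PAGE_SIZE)
    for page in pages:
        for i, b in enumerate(page):
            parity[i] ^= b
    return bytes(parity)

def recover_page(surviving_pages, parity):
    """Reconstruct the single lost page of a parity group by XORing
    the parity page with every surviving data page."""
    return xor_pages(surviving_pages + [parity])

# A parity group of three data pages, each stored on a different node:
pages = [bytes([n]) * PAGE_SIZE for n in (1, 2, 3)]
parity = xor_pages(pages)

# If the node holding pages[1] crashes, its data is recoverable:
lost = pages[1]
recovered = recover_page([pages[0], pages[2]], parity)
assert recovered == lost
```

The space overhead is one parity page per group rather than one replica per page, which is the cost/reliability trade-off that makes RAID-style schemes attractive here.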
We have studied some initial work investigating reliability algorithms for Nswap. We may use this as a starting point for developing reliability algorithms, or we may start from scratch.
This is a fairly ambitious project, and it is possible that we will not be able to complete full development, implementation, and testing of our solutions during the 2004-2005 academic year. However, we should make significant progress towards a completed project. In addition, there is the possibility of funding from Swarthmore College to support the students for 10 weeks during the summer of 2005 to continue work on this project.
Two of the students, America Holloway and Jennifer Barry, will work on the project during both the fall and spring semesters of the 2004-2005 academic year. The other student, Heather Jones, will be abroad during the spring semester of 2005 and thus will work on the project only during the fall semester.
The main question we will address is how to efficiently add reliability support to Nswap by making the best use of cluster-wide storage (idle RAM or disk) for reliability data. Our goal is to develop algorithms that minimize the added time and space overhead associated with reliability support. In particular, we will focus on algorithms that have minimal impact on the cluster nodes that are currently swapping pages. We also need to solve problems associated with maintaining reliability of swapped page data in the presence of Nswap's remote page migration. In addition, it is important that we develop solutions that scale to large cluster sizes and that fit with Nswap's asynchronous, decentralized design.
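One way to state the migration problem is as a placement constraint: a migrating page must not land on a node that already holds another member of its parity group. A minimal sketch, with hypothetical page and node names and a simplified bookkeeping table:

```python
# Hypothetical bookkeeping: which node currently holds each member
# of one parity group (three data pages plus the parity page).
group_location = {"p0": "nodeA", "p1": "nodeB", "p2": "nodeC", "parity": "nodeD"}

def safe_migration_targets(page_id, candidate_nodes, group_location):
    """Return the candidate nodes that do not already host another
    member of the migrating page's parity group."""
    occupied = {node for member, node in group_location.items() if member != page_id}
    return [n for n in candidate_nodes if n not in occupied]

# nodeB wants to reclaim RAM, so p1 must migrate; nodeA and nodeC are
# excluded because they already hold members of p1's group.
targets = safe_migration_targets("p1", ["nodeA", "nodeC", "nodeE"], group_location)
assert targets == ["nodeE"]
```

In Nswap's decentralized design no node holds a global table like this, so a real solution must enforce the same constraint with only local or exchanged information; that is one of the problems our algorithms must address.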
Once we have developed some algorithms that we think are efficient, scalable, and solve the problem of providing reliability for network swapping systems like Nswap, we will implement and test our solutions. Nswap is written as a loadable kernel module for Linux 2.4, running entirely in kernel space. We will need to investigate how best to incorporate reliability code into the existing Nswap code. In particular, since we want to implement, test, and evaluate several solutions, we need a way to switch easily between our reliability algorithms.
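Switching easily between reliability algorithms suggests a common interface behind which each scheme is implemented. In the Linux 2.4 kernel module this would likely be a struct of function pointers selected at module load time; the idea can be sketched in Python with hypothetical scheme names:

```python
# Sketch of a pluggable reliability interface (hypothetical names).
# Each scheme provides protect() to create reliability data and
# recover() to rebuild a lost page from that data.

class Mirroring:
    """Keep a full replica of each page on a second node."""
    def protect(self, *pages):
        return {"replica": pages[0]}
    def recover(self, info, surviving):
        return info["replica"]

class XorParity:
    """Keep only an XOR parity page for a group of pages."""
    def protect(self, *pages):
        parity = bytearray(len(pages[0]))
        for p in pages:
            for i, b in enumerate(p):
                parity[i] ^= b
        return {"parity": bytes(parity)}
    def recover(self, info, surviving):
        # XOR of the survivors with the parity page yields the lost page.
        return self.protect(*(surviving + [info["parity"]]))["parity"]

SCHEMES = {"mirror": Mirroring(), "parity": XorParity()}

scheme = SCHEMES["mirror"]        # selected by a load-time parameter
page = b"\x07" * 4096
info = scheme.protect(page)
assert scheme.recover(info, []) == page
```

Keeping the interface identical across schemes would let us swap algorithms for testing without touching the rest of the Nswap code.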
During the testing phase, we will need to discover how best to evaluate our solutions. We need to determine what metrics are important for comparing our solutions, and how best to measure the overhead added by reliability support. We will need to obtain both micro- and macro-benchmarks of our system, which will involve finding or developing benchmarks that test reliability under a "normal" cluster workload. We will develop hypotheses about conditions under which we think our solutions will perform well and conditions under which we think they may not perform as well, and then develop cluster workloads that produce these conditions so that we can test our hypotheses. In addition, we must determine how best to add a mechanism to the cluster to simulate node failure so that we can test and measure the recovery part of our reliability schemes.
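At its simplest, simulating node failure means discarding one node's stored pages and checking that every affected page is still recoverable. A toy failure-injection harness, assuming a mirroring scheme and hypothetical node and page names:

```python
import random

# Toy failure-injection harness (hypothetical names, not Nswap code).
# Each page is mirrored on two distinct nodes; crashing any one node
# must leave every page recoverable from its surviving replica.

nodes = {"nodeA": {}, "nodeB": {}, "nodeC": {}}

def store_mirrored(page_id, data, nodes):
    """Place a page and its replica on two distinct nodes."""
    primary, backup = random.sample(sorted(nodes), 2)
    nodes[primary][page_id] = data
    nodes[backup][page_id] = data

for i in range(10):
    store_mirrored(f"page{i}", bytes([i]) * 64, nodes)

# Simulate a crash by dropping one node's contents entirely.
crashed = random.choice(sorted(nodes))
lost_pages = nodes.pop(crashed)

# Recovery check: every page the crashed node held must survive elsewhere.
for page_id, data in lost_pages.items():
    replicas = [n for n in nodes if page_id in nodes[n]]
    assert replicas and nodes[replicas[0]][page_id] == data
```

A real mechanism would kill or partition an actual cluster node while timing the recovery, but the same invariant (every lost page recoverable, with measured cost) is what our tests must check.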
Much of the first semester of this project will be devoted to background research. We will meet weekly to discuss papers on distributed and cluster computing, reliability, and network swapping. In particular, we will read about several previous projects that examine using remote idle memory as backing store for nodes in networks of workstations [2, 5, 9, 7, 4, 12, 6]. It will be important to read this related work and to understand how Nswap is similar to and different from this work and why. We also will spend time reading and understanding Nswap's implementation and learning how to compile, load, run and debug a Linux loadable kernel module (lkm). One student, America Holloway, has experience with Linux kernel code and lkms through work she did in the Operating Systems course.
Near the end of the first semester we will be ready to start developing our own reliability algorithms. At this point the focus of our weekly group meetings will be on group problem solving to develop good algorithms. We will analyze proposed solutions both to see if they solve the problem, and to determine how well they solve it in the context of the goals of our project.
The second semester will be devoted to finalizing one or more solutions that we think will work well, to determining how to add our solutions to Nswap, and to implementing and testing our solutions. We will work together and independently during the week on implementation and testing, and we will continue our weekly group meetings to discuss our progress.
Anurag Acharya and Sanjeev Setia. Availability and Utility of Idle Memory in Workstation Clusters. In ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 35-46, May 1999.
T. Anderson, D. E. Culler, D. A. Patterson, and the NOW Team. A Case for NOW (Networks of Workstations). IEEE Micro, February 1995.
Remzi H. Arpaci, Andrea C. Dusseau, Amin M. Vahdat, Lok T. Liu, Thomas E. Anderson, and David A. Patterson. The Interaction of Parallel and Sequential Workloads on a Network of Workstations. In ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 267-278, 1995.
G. Bernard and S. Hamma. Remote Memory Paging in Networks of Workstations. In SUUG'94 Conference, April 1994.
Michael J. Feeley, William E. Morgan, Frederic H. Pighin, Anna R. Karlin, Henry M. Levy, and Chandramohan A. Thekkath. Implementing Global Memory Management in a Workstation Cluster. In 15th ACM Symposium on Operating Systems Principles, December 1995.
Michail D. Flouris and Evangelos P. Markatos. Network RAM. In High Performance Cluster Computing: Architectures and Systems, Chapter 16. Prentice Hall, 1999.
Liviu Iftode, Karin Petersen, and Kai Li. Memory Servers for Multicomputers. In IEEE COMPCON'93 Conference, February 1993.
John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach, 3rd Edition. Morgan Kaufmann, 2002.
Evangelos P. Markatos and George Dramitinos. Implementation of a Reliable Remote Memory Pager. In USENIX 1996 Annual Technical Conference, 1996.
Tia Newhall, Sean Finney, Kuzman Ganchev, and Michael Spiegel. Nswap: A Network Swapping Module for Linux Clusters. In Euro-Par 2003 International Conference on Parallel and Distributed Computing, Klagenfurt, Austria, 2003.
David A. Patterson, Garth Gibson, and Randy H. Katz. A case for redundant arrays of inexpensive disks (RAID). In ACM SIGMOD International Conference on Management of Data, pages 109-116, 1988.
Li Xiao, Xiaodong Zhang, and Stefan A. Kubricht. Incorporating Job Migration and Network RAM to Share Cluster Memory Resources. In Ninth IEEE International Symposium on High Performance Distributed Computing (HPDC'00), 2000.
Students will be responsible for carrying out the research described in this proposal. Students will meet weekly as a group with Professor Newhall to discuss research papers and to participate in group problem solving. In addition, students will be responsible for working together to implement and test the project. Students will document their work by keeping research notes. They also will be responsible for disseminating their project in a poster session at a conference, and by giving a talk about their work at the CS department's colloquium series.
The faculty member will be responsible for coming up with background readings, for teaching students how to read research papers, for fostering group problem solving, and will work with the students in the design, implementation, testing, and evaluation of the project.
We request a total of $3,000 for our project. Two thousand dollars will be used as $1,000 stipends for the two full-year students (America Holloway and Jennifer Barry), and $500 will be used as a stipend for the half-year student (Heather Jones). The remaining $500 will be used to fund the students' attendance at a conference to present a paper or poster of their work. We have a good chance of getting a poster accepted to the student research poster session at the Consortium for Computing Sciences in Colleges Northeastern Conference (CCSCNE 2005) in April of 2005.