Swarthmore College Department of Computer Science

Talk by Charles Killian of Purdue University

Building and Testing Distributed Systems
Wednesday, Jan 25, 2012
4:00 pm, Science Center 240

Abstract

Building distributed systems is particularly difficult because of the asynchronous, heterogeneous, and failure-prone environment where these systems must run. This asynchrony makes verifying the correctness of systems implementations even more challenging. Tools for building distributed systems must strike a compromise between reducing programmer effort and increasing system efficiency. In my research, we strive to introduce a limited amount of structure and limitations to implementations to enable a wide range of analysis and development assistance. Most prominently, we have built the Mace language and runtime, which translates a concise by expressive distributed system specification into a C++ implementation. The Mace specification importantly exposes three key pieces of structure: atomic events, explicit state, and explicit messaging. By leveraging these three structural elements, we have built tools such as a model checker capable of detecting liveness violations in systems code, a performance tester, and an automated malicious protocol tester. Recent research has also explored applications of these key structures in legacy software, that has produced a log analysis tool that can detect performance problems, and a malicious fault injector that can discover successful performance attacks. Mace is fully operational, has been in development since 2004, and has been used to build a wide variety of Internet-ready distributed systems both by myself and by researchers at places such as Cornell University, Microsoft Research (Redmond, Silicon Valley, and Beijing), HP Labs, UCLA, EPFL, and UCSD. This talk will describe the Mace design and model checker, and highlight recent research directions.

Refreshments at 3:45