Total Recall



Total Recall is an effort to address reliability and availability in systems built from highly unavailable components. Explicit availability management is especially needed when components undergo frequent, transient failures, and more so when the characteristics of these components change over time.

Currently, Total Recall focuses on building automated availability management for peer-to-peer systems. Transient failures are frequent in such systems: hosts periodically leave the network and come back later. The behavior of hosts in peer-to-peer systems needs to be understood, and system support has to be provided that takes these characteristics into account.

To understand host characteristics, we have performed a measurement study and evaluated several means of providing redundancy in peer-to-peer systems. We are currently applying our findings to the design and implementation of a highly available mutable file system.