Failures of all kinds are common in today's data centers. As data scales up, guaranteeing its availability becomes more complex, and different applications, or even different data items, may require different availability levels. Skute is a self-managed key-value store that dynamically allocates the resources of a data cloud to several applications in a cost-efficient and fair way. Our approach offers, and dynamically maintains, multiple differentiated availability guarantees for each application despite failures. We employ a virtual economy in which each data partition acts as an individual optimizer and chooses whether to migrate, replicate or remove itself so as to maximize its net benefit, that is, the utility offered by the partition minus its storage and maintenance cost. Comprehensive experimental evaluations suggest that our solution is highly scalable and adapts to query rate variations and to resource upgrades and failures.

Skute has the following properties:

  • it provides geographical replication of data
  • it ensures high availability of data by maximizing the geographical diversity of replicas. Hence, two replicas of the same data partition are hosted on servers in the same rack only with minimal probability
  • it handles load peaks or flash crowds by replicating popular data partitions close to the users
  • thanks to our economic model, the load among the servers in the cloud is balanced, resulting in better global resource usage
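
As an illustration of the geographical diversity property, the sketch below scores how far apart two servers are and picks the candidate server that is most diverse with respect to the existing replicas. The `Location` fields, the scoring rule and the function names are assumptions made for this example; they are not taken from the Skute implementation.

```python
from typing import NamedTuple, Sequence


class Location(NamedTuple):
    """Hypothetical hierarchical location label of a server (coarsest level first)."""
    continent: str
    country: str
    datacenter: str
    rack: str


def diversity(a: Location, b: Location) -> int:
    """Score two locations by the coarsest level at which they differ.

    Same rack -> 0, same datacenter but different rack -> 1,
    same country but different datacenter -> 2, and so on up to
    different continents -> 4. This scoring rule is an assumption.
    """
    depth = len(a)
    for level, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return depth - level
    return 0


def pick_replica_server(existing: Sequence[Location],
                        candidates: Sequence[Location]) -> Location:
    """Pick the candidate whose closest existing replica is as far away as possible."""
    return max(candidates,
               key=lambda c: min(diversity(c, r) for r in existing))


existing = [Location("EU", "CH", "lausanne", "r1")]
candidates = [
    Location("EU", "CH", "lausanne", "r1"),  # same rack as the existing replica
    Location("EU", "CH", "zurich", "r7"),    # same country, other datacenter
    Location("NA", "US", "nyc", "r2"),       # other continent
]
best = pick_replica_server(existing, candidates)
```

With this scoring, a candidate in the same rack as an existing replica gets the lowest score, so it is chosen only when no more diverse server is available.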

Figure 1 - Virtual rings: three applications with different availability levels.

Our approach combines the following innovative characteristics:

  • it enables a computational economy for cloud storage resources.
  • it provides differentiated availability statistical guarantees to different applications despite failures by geographical diversification of replicas.
  • it applies a distributed economic model for the cost-efficient self-organization of data replicas in the cloud storage that is adaptive to adding new storage, to node failures and to client locations.
  • it efficiently and fairly utilizes cloud resources by performing load balancing in the cloud adaptively to the query load.

Optimal replica placement is based on the distributed maximization of net benefit, defined as query response throughput minus storage and communication costs, under the availability constraints. The optimality of the approach is verified by comparing simulation results with those obtained by numerically solving an analytical form of the global optimization problem. A game-theoretic model is also employed to study the properties of the approach at equilibrium. A series of simulation experiments confirms the aforementioned characteristics of the approach. Finally, using a fully working prototype of Skute, we experimentally demonstrate its applicability in real settings.
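
The per-replica decision described above can be sketched as follows. This is a minimal illustration, not the actual Skute decision rule: the `ReplicaState` fields, the order of the checks and the assumption that a replica would earn the same utility on another server are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STAY = "stay"
    MIGRATE = "migrate"
    REPLICATE = "replicate"
    REMOVE = "remove"


@dataclass
class ReplicaState:
    """Hypothetical local view of one replica of a data partition."""
    utility: float                    # value of the queries served by this replica
    storage_cost: float               # rent charged by the current server
    communication_cost: float         # cost of serving and synchronizing from here
    availability: float               # availability currently reached by the partition
    availability_without_me: float    # availability if this replica removed itself
    required_availability: float      # level promised to the application
    cheapest_alternative_cost: float  # total cost on the best other server


def net_benefit(utility: float, storage_cost: float, communication_cost: float) -> float:
    """Net benefit = utility minus storage and communication costs."""
    return utility - (storage_cost + communication_cost)


def decide(s: ReplicaState) -> Action:
    # The availability constraint is checked first: replicate if it is violated.
    if s.availability < s.required_availability:
        return Action.REPLICATE
    here = net_benefit(s.utility, s.storage_cost, s.communication_cost)
    # An unprofitable replica removes itself only if availability still holds.
    if here < 0 and s.availability_without_me >= s.required_availability:
        return Action.REMOVE
    # Assumption: the replica would earn the same utility on the other server.
    elsewhere = s.utility - s.cheapest_alternative_cost
    if elsewhere > here:
        return Action.MIGRATE
    return Action.STAY


state = ReplicaState(utility=10.0, storage_cost=3.0, communication_cost=1.0,
                     availability=0.999, availability_without_me=0.95,
                     required_availability=0.99, cheapest_alternative_cost=2.0)
action = decide(state)  # migrating yields a net benefit of 8 instead of 6
```

Checking the availability constraint before any cost comparison reflects the statement that net benefit is maximized under the availability constraints, so a replica never trades availability for a lower cost.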

Virtual Ring

Our approach employs multiple virtual rings on a single cloud. This allows multiple applications to share the same cloud infrastructure while offering differentiated availability guarantees per data item and per application, without performance conflicts. Each application uses its own virtual rings, one per availability level, as depicted in Figure 1. Each virtual ring consists of multiple virtual nodes, which are responsible for the different data partitions of the same application that demand a specific availability level. This approach provides the following advantages over existing key-value stores:

  • Multiple data availability levels per application. Within the same application, some data may be crucial and some may be less important. In other words, an application provider may want to store data with different availability guarantees. Unlike existing approaches, Skute allows fine-grained control of the resources of each server, as every virtual node of each virtual ring acts as an individual optimizer, thus minimizing the impact of applications on one another.
  • Geographical data placement per application. Data that is mostly accessed from a given geographical region should be moved close to that region. Without the concept of virtual rings, if multiple applications were using the same data store, data of different applications would have to be stored in the same partition, thus removing the ability to move data close to the clients. However, by employing multiple virtual rings, Skute is able to provide one virtual store per application, allowing the geographical optimization of data placement.

We have implemented a fully working prototype of Skute on top of Project Voldemort, an open-source implementation of Amazon Dynamo written in Java. Servers are not synchronized and no centralized component is required.
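
The virtual-ring organization described above can be sketched in a few lines. The example keeps one ring per (application, availability level) pair and maps a key to a virtual node with consistent hashing, as Dynamo-style stores do. The class names, the token scheme and the number of virtual nodes are assumptions for illustration and do not reflect the prototype's code.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class VirtualRing:
    """One ring for one (application, availability level) pair."""

    def __init__(self, application: str, availability_level: str,
                 num_virtual_nodes: int) -> None:
        self.application = application
        self.availability_level = availability_level
        # Each virtual node owns the arc of the ring that ends at its token.
        tokens = sorted(
            (_hash(f"{application}/{availability_level}/vnode-{i}"), i)
            for i in range(num_virtual_nodes)
        )
        self._positions = [position for position, _ in tokens]
        self._node_ids = [node_id for _, node_id in tokens]

    def virtual_node_for(self, key: str) -> int:
        """Return the id of the virtual node responsible for the key."""
        idx = bisect.bisect_left(self._positions, _hash(key))
        return self._node_ids[idx % len(self._node_ids)]  # wrap around the ring


class Cloud:
    """Shared infrastructure hosting the rings of several applications."""

    def __init__(self) -> None:
        self._rings = {}

    def add_ring(self, application: str, availability_level: str,
                 num_virtual_nodes: int = 8) -> VirtualRing:
        ring = VirtualRing(application, availability_level, num_virtual_nodes)
        self._rings[(application, availability_level)] = ring
        return ring

    def locate(self, application: str, availability_level: str, key: str):
        """Find the virtual node of the given ring that stores the key."""
        ring = self._rings[(application, availability_level)]
        return (application, availability_level, ring.virtual_node_for(key))


cloud = Cloud()
cloud.add_ring("webshop", "high")
cloud.add_ring("webshop", "low")
cloud.add_ring("analytics", "low", num_virtual_nodes=4)
placement = cloud.locate("webshop", "high", "order:1234")
```

Because every ring has its own set of virtual nodes, the data of one application at one availability level never shares a partition with other data, which is what allows each partition to be placed and replicated independently.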


Publications

  • A self-organized, fault-tolerant and scalable replication scheme for cloud storage
    Nicolas Bonvin, Thanasis G. Papaioannou, Karl Aberer
    In ACM Symposium on Cloud Computing 2010 (SOCC2010), June 10-11, 2010, Indianapolis, USA
  • Cost-efficient and Differentiated Data Availability Guarantees in Data Clouds
    Nicolas Bonvin, Thanasis G. Papaioannou, Karl Aberer
    In 26th IEEE International Conference on Data Engineering (ICDE2010), March 1-6, 2010, Long Beach, California, USA
  • Dynamic Cost-Efficient Replication in Data Clouds
    Nicolas Bonvin, Thanasis G. Papaioannou, Karl Aberer
    In Automated Control for Datacenters and Clouds (ACDC09), June 15-19, 2009, Barcelona, Spain

© 2007-2011 Nicolas Bonvin | Last Modified 2011-05-09 16:57:50