DR2: Dynamic Request Routing for Tolerating Latency Variability in Online Cloud Applications

Jieming Zhu,  Zibin Zheng,  and Michael R. Lyu

Department of Computer Science and Engineering
The Chinese University of Hong Kong

Application latency is a significant user-facing metric for evaluating the performance of online cloud applications. However, as applications are migrated to the cloud and deployed across a wide-area network, application latency usually varies widely over time. Among the many subtleties that influence latency, one important factor is the reliance on the Internet for application connectivity, which introduces a high degree of variability and uncertainty into user-perceived application latency. As a result, a key challenge faced by application designers is how to build consistently low-latency cloud applications from a large number of geo-distributed, latency-varying cloud components. In this paper, we propose a dynamic request routing framework, DR2, which takes full advantage of redundant components in the clouds to tolerate latency variability. In practice, many functionally equivalent components have already been deployed redundantly for load balancing and fault tolerance, so DR2 incurs low additional overhead. To evaluate the performance of our approach, we conduct a set of experiments based on two large-scale real-world datasets and a synthetic dataset. The results show the effectiveness and efficiency of our approach.

Read more from our paper:
-------------------------------
Jieming Zhu, Zibin Zheng, and Michael R. Lyu, "DR2: Dynamic Request Routing for Tolerating Latency Variability in Online Cloud Applications," in Proc. of the 6th IEEE Conference on Cloud Computing (CLOUD), 2013. [Paper][Slides]

Dataset Release

This dataset was collected via the PlanetLab platform in Jan. 2012 and comprises two data matrices, Lu and Ls. In our experiments, we take the 460 PlanetLab nodes as the hosts of services and the 1,350 IPs as the users. We measure the latency of each pair by pinging 10 times in succession at an interval of 1 ms and taking the median as the result. Thus, the latencies between the 1,350 users and the 460 services (Lu), and between the 460 services themselves (Ls), are obtained. The unit of each value is the millisecond. Please refer to our paper for more information on the dataset collection.
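The median-of-10 aggregation described above can be sketched as follows. This is a minimal illustration, not the collection script used for the dataset: the function name `summarize_pings`, the sample values, and the treatment of lost pings (represented as `None` and skipped) are all assumptions made for this example.

```python
from statistics import median


def summarize_pings(rtts_ms):
    """Return the median round-trip time in ms, or None if every ping was lost.

    Lost pings are represented as None and skipped. This handling is an
    assumption for illustration; the dataset description does not say how
    failed pings were treated.
    """
    valid = [rtt for rtt in rtts_ms if rtt is not None]
    return median(valid) if valid else None


# Ten made-up samples for one user-service pair, with one outlier and one loss.
samples = [31.2, 30.8, 95.4, 31.0, 30.9, None, 31.1, 31.3, 30.7, 31.0]
print(summarize_pings(samples))  # prints 31.0
```

Taking the median rather than the mean keeps a single delayed ping (95.4 ms in the made-up samples) from skewing the recorded latency.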

Download dataset

List of contents of the dataset
-----------------------------------
1. "Lu" (4.58 MB): This file is a 1350-by-460 data matrix, comprising the latencies between the 1,350 users and the 460 services.
2. "Ls" (1.55 MB): This file is a 460-by-460 data matrix, comprising the latencies between the 460 services. This matrix is asymmetric, since the latency from a to b may differ from the latency from b to a due to the routing policy.
3. "readme.txt" (2 KB): This file describes the dataset in detail.
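
As a quick sanity check after downloading, the matrices can be parsed and inspected along these lines. This sketch assumes the files are plain text with one whitespace-delimited row per line; that format is an assumption, so consult readme.txt for the actual layout and for how missing measurements are encoded. The helper names (`parse_matrix`, `shape`, `max_asymmetry`) and the toy values are hypothetical and are not part of the dataset.

```python
def parse_matrix(text):
    """Parse whitespace-delimited rows of numbers into a list of float lists."""
    rows = [
        [float(token) for token in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]
    width = len(rows[0]) if rows else 0
    if any(len(row) != width for row in rows):
        raise ValueError("rows have inconsistent lengths")
    return rows


def shape(matrix):
    """Return (number of rows, number of columns)."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


def max_asymmetry(matrix):
    """Return the largest |m[a][b] - m[b][a]| over all pairs of a square matrix."""
    n_rows, n_cols = shape(matrix)
    if n_rows != n_cols:
        raise ValueError("matrix must be square")
    return max(
        (abs(matrix[a][b] - matrix[b][a])
         for a in range(n_rows)
         for b in range(a + 1, n_rows)),
        default=0.0,
    )


# Toy 3-by-3 stand-in for Ls (made-up values in ms, not taken from the dataset).
toy_ls = parse_matrix("0 12.5 40.0\n13.1 0 22.0\n39.2 22.4 0\n")
print(shape(toy_ls))
print(max_asymmetry(toy_ls))
```

If the real files follow the assumed format, `parse_matrix(open("Lu").read())` should yield a shape of (1350, 460) and the same call on "Ls" a shape of (460, 460). A nonzero `max_asymmetry` on Ls would reflect the asymmetry noted in item 2.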