DCCP: an effective data placement strategy for data-intensive computations in distributed cloud computing systems

2016-11-30

Authors: Wang, T (Wang, Tao); Yao, SH (Yao, Shihong); Xu, ZQ (Xu, Zhengquan); Jia, S (Jia, Shan)

Source: JOURNAL OF SUPERCOMPUTING Volume: 72 Issue: 7 Pages: 2537-2564 DOI: 10.1007/s11227-015-1511-z Published: JUL 2016

Abstract: Cloud computing systems provide high-performance computing resources and distributed storage for data-intensive computations. When large datasets are stored at different data centers and data analytics requires inter-center access, scheduling data between data centers becomes indispensable. The performance of data scheduling, however, depends heavily on how sensibly the data are placed. Data placement is a key optimization for reducing data scheduling between data centers and achieving statistical I/O load balancing, which in turn reduces the mean computation execution time. This paper proposes DCCP, a data placement strategy based on dynamic computation correlation. DCCP places datasets with high dynamic computation correlation at the same data center, taking into account the I/O load and capacity load of the data centers; when computations are scheduled to that data center, most of the datasets they process are stored locally, so the mean computation execution time is reduced. Extensive experiments show that DCCP achieves statistical I/O load balancing and capacity load balancing across data centers, and thereby reduces the total data scheduling between data centers as much as possible at very low time complexity, even as the numbers of datasets and data centers increase.
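
The abstract does not spell out the placement procedure, so the following is only a minimal sketch of the general idea of co-locating correlated datasets under a capacity limit. It is not the DCCP algorithm itself. The function names, the definition of correlation as the number of computations shared by two datasets, and the greedy largest-first order are all assumptions made for illustration; I/O load and the dynamic updating of correlations described in the paper are not modeled.

```python
def correlation(usage, a, b):
    """Count the computations that read both dataset a and dataset b.

    Assumption: this shared-computation count stands in for the paper's
    "computation correlation", whose exact definition is not in the abstract.
    """
    return len(usage[a] & usage[b])


def place(sizes, usage, capacities):
    """Greedily assign each dataset to a data center (illustrative only)."""
    free = dict(capacities)                 # remaining capacity per center
    members = {c: [] for c in capacities}   # datasets placed at each center
    placement = {}
    # Assumption: handle large datasets first, ties broken by name.
    for d in sorted(sizes, key=lambda name: (-sizes[name], name)):
        candidates = [c for c in free if free[c] >= sizes[d]]
        if not candidates:
            raise ValueError(f"no data center can hold {d}")
        # Prefer the center whose datasets are most correlated with d;
        # break ties by the larger remaining capacity.
        best = max(
            candidates,
            key=lambda c: (
                sum(correlation(usage, d, m) for m in members[c]),
                free[c],
            ),
        )
        placement[d] = best
        members[best].append(d)
        free[best] -= sizes[d]
    return placement


def split_computations(placement, usage):
    """Count computations whose datasets span more than one data center."""
    centers = {}
    for dataset, computations in usage.items():
        for comp in computations:
            centers.setdefault(comp, set()).add(placement[dataset])
    return sum(1 for used in centers.values() if len(used) > 1)


if __name__ == "__main__":
    # Hypothetical toy input: sizes in arbitrary units, usage maps each
    # dataset to the set of computations that read it.
    sizes = {"d1": 4, "d2": 4, "d3": 4, "d4": 4}
    usage = {"d1": {"c1", "c2"}, "d2": {"c1", "c2"}, "d3": {"c3"}, "d4": {"c3"}}
    capacities = {"A": 8, "B": 8}
    result = place(sizes, usage, capacities)
    print(result)
    print(split_computations(result, usage))
```

In this toy input, datasets read by the same computations end up at the same center, so no computation needs data from a second center. A closer approximation of the strategy described in the abstract would also weigh each center's I/O load when choosing among candidates.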