How does SSD Cluster Perform for Distributed File Systems: An Empirical Study

by   Jiashu Wu, et al.

As the capacity of Solid-State Drives (SSDs) is constantly being optimised and boosted with gradually reduced cost, the SSD cluster is now widely deployed as part of the hybrid storage system in various scenarios such as cloud computing and big data processing. However, despite its rapid developments, the performance of the SSD cluster remains largely under-investigated, leaving its sub-optimal applications in reality. To address this issue, in this paper we conduct extensive empirical studies for a comprehensive understanding of the SSD cluster in diverse settings. To this end, we configure a real SSD cluster and gather the generated trace data based on some often-used benchmarks, then adopt analytical methods to analyse the performance of the SSD cluster with different configurations. In particular, regression models are built to provide better performance predictability under broader configurations, and the correlations between influential factors and performance metrics with respect to different numbers of nodes are investigated, which reveal the high scalability of the SSD cluster. Additionally, the cluster's network bandwidth is inspected to explain the performance bottleneck. Finally, the knowledge gained is summarised to benefit the SSD cluster deployment in practice.


page 1

page 8

page 12


Plug and Play Bench: Simplifying Big Data Benchmarking Using Containers

The recent boom of big data, coupled with the challenges of its processi...

Evaluation of Distributed Databases in Hybrid Clouds and Edge Computing: Energy, Bandwidth, and Storage Consumption

A benchmark study of modern distributed databases is an important source...

C3O: Collaborative Cluster Configuration Optimization for Distributed Data Processing in Public Clouds

Distributed dataflow systems enable data-parallel processing of large da...

Does Big Data Require Complex Systems? A Performance Comparison Between Spark and Unicage Shell Scripts

The paradigm of big data is characterized by the need to collect and pro...

Understanding System Characteristics of Online Erasure Coding on Scalable, Distributed and Large-Scale SSD Array Systems

Large-scale systems with arrays of solid state disks (SSDs) have become ...

Analyzing astronomical data with Apache Spark

We investigate the performances of Apache Spark, a cluster computing fra...

Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

Recently, due to rapid development of information and communication tech...

Please sign up or login with your details

Forgot password? Click here to reset