A Framework for Energy-aware Evaluation of Distributed Data Processing Platforms in Edge-Cloud Environment
Distributed data processing platforms (e.g., Hadoop, Spark, and Flink) are widely used to distribute the storage and processing of data among computing nodes of a cloud. The centralization of cloud resources has given birth to edge computing, which enables the processing of data closer to the data source instead of sending it to the cloud. However, due to resource constraints such as energy limitations, edge computing cannot be used for deploying all kinds of applications. Therefore, tasks are offloaded from an edge device to the more resourceful cloud. Previous research has evaluated the energy consumption of the distributed data processing platforms in the isolated cloud and edge environments. However, there is a paucity of research on evaluating the energy consumption of these platforms in an integrated edge-cloud environment, where tasks are offloaded from a resource-constraint device to a resource-rich device. Therefore, in this paper, we first present a framework for the energy-aware evaluation of the distributed data processing platforms. We then leverage the proposed framework to evaluate the energy consumption of the three most widely used platforms (i.e., Hadoop, Spark, and Flink) in an integrated edge-cloud environment consisting of Raspberry Pi, edge node, edge server node, private cloud, and public cloud. Our evaluation reveals that (i) Flink is most energy-efficient followed by Spark and Hadoop is found least energy-efficient (ii) offloading tasks from resource-constraint to resource-rich devices reduces energy consumption by 55.2 and server are found key factors impacting the energy consumption.
READ FULL TEXT