ProvLight: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum

by Daniel Rosendo, et al.

Modern scientific workflows require hybrid infrastructures combining numerous decentralized resources on the IoT/Edge interconnected to Cloud/HPC systems (aka the Computing Continuum) to enable their optimized execution. Understanding and optimizing the performance of such complex Edge-to-Cloud workflows is challenging. Capturing the provenance of key performance indicators, with their related data and processes, may assist in understanding and optimizing workflow executions. However, the capture overhead can be prohibitive, particularly in resource-constrained devices, such as the ones on the IoT/Edge.

To address this challenge, based on a performance analysis of existing systems, we propose ProvLight, a tool to enable efficient provenance capture on the IoT/Edge. We leverage simplified data models, data compression and grouping, and lightweight transmission protocols to reduce overheads. We further integrate ProvLight into the E2Clab framework to enable workflow provenance capture across the Edge-to-Cloud Continuum. This integration makes E2Clab a promising platform for the performance optimization of applications through reproducible experiments.

We validate ProvLight at a large scale with synthetic workloads on 64 real-life IoT/Edge devices in the FIT IoT LAB testbed. Evaluations show that ProvLight outperforms state-of-the-art systems like ProvLake and DfAnalyzer in resource-constrained devices. ProvLight is 26–37x faster to capture and transmit provenance data; uses 5–7x less CPU; 2x less memory; transmits 2x less data; and consumes 2–2.5x less energy. ProvLight and E2Clab are available as open-source tools.
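The grouping and compression ideas named in the abstract can be illustrated with a minimal Python sketch. This is not ProvLight's actual implementation or API: the record fields, the function names (`group_records`, `encode_group`, `decode_group`), and the choice of JSON plus zlib are all assumptions made for illustration, and the transmission step is left out entirely.

```python
import json
import zlib


def group_records(records, group_size):
    """Split records into consecutive groups of at most group_size items."""
    return [records[i:i + group_size] for i in range(0, len(records), group_size)]


def encode_group(group):
    """Serialize one group as compact JSON, then compress it with zlib."""
    raw = json.dumps(group, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)


def decode_group(payload):
    """Reverse encode_group: decompress, then parse the JSON list."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Hypothetical provenance records; the field names are illustrative only.
records = [
    {"task": f"t{i}", "status": "finished", "cpu_pct": 12.5, "elapsed_s": 0.25}
    for i in range(50)
]

groups = group_records(records, group_size=10)
payloads = [encode_group(g) for g in groups]

# Bytes needed if every record were sent alone and uncompressed,
# versus bytes needed for the grouped, compressed payloads.
raw_bytes = sum(len(json.dumps(r).encode("utf-8")) for r in records)
sent_bytes = sum(len(p) for p in payloads)
print(f"messages: {len(records)} -> {len(payloads)}, bytes: {raw_bytes} -> {sent_bytes}")
```

Grouping reduces the number of messages a device has to send, and compressing a whole group tends to work better than compressing single records because the field names repeat. The actual savings depend on the data model and workload, so the numbers printed here say nothing about the results reported in the abstract.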



Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum

In more and more application areas, we are witnessing the emergence of c...

KheOps: Cost-effective Repeatability, Reproducibility, and Replicability of Edge-to-Cloud Experiments

Distributed infrastructures for computation and analytics are now evolvi...

BeeFlow: Behavior Tree-based Serverless Workflow Modeling and Scheduling for Resource-Constrained Edge Clusters

Serverless computing has gained popularity in edge computing due to its ...

Data Provenance for Sport

Data analysts often discover irregularities in their underlying dataset,...

Enabling Reproducible Analysis of Complex Workflows on the Edge-to-Cloud Continuum

Distributed digital infrastructures for computation and analytics are no...

TAOS-CI: Lightweight Modular Continuous Integration System for Edge Computing

With the proliferation of IoT and edge devices, we are observing a lot o...

A Study of Optimizing Heterogeneous Resources for Open IoT

Recently, IoT technologies have been progressed, and many sensors and ac...
