DataLab: A Platform for Data Analysis and Intervention

02/25/2022
by   Yang Xiao, et al.

Despite data's crucial role in machine learning, most existing tools and research focus on systems built on top of existing data rather than on how to interpret and manipulate the data itself. In this paper, we propose DataLab, a unified data-oriented platform that not only allows users to interactively analyze the characteristics of data, but also provides a standardized interface for different data processing operations. Additionally, in view of the ongoing proliferation of datasets, DataLab has features for dataset recommendation and global vision analysis that help researchers form a better view of the data ecosystem. So far, DataLab covers 1,715 datasets and 3,583 transformed versions of them (e.g., hyponym replacement); 728 of the datasets support various analyses (e.g., with respect to gender bias) with the help of 140M samples annotated by 318 feature functions. DataLab is under active development and will be supported going forward. We have released a web platform, web API, Python SDK, PyPI package, and online documentation, which we hope can meet the diverse needs of researchers.
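
To make the idea of samples "annotated by feature functions" concrete, the following is a minimal Python sketch of how sample-level annotation and a dataset-level summary could work. It does not use the DataLab SDK; the names `text_length`, `gender_mentions`, `annotate`, and `summarize_gender`, as well as the small word lists, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical word lists for illustration; a real gender-bias analysis
# would use a far more careful lexicon.
MALE_WORDS = {"he", "him", "his"}
FEMALE_WORDS = {"she", "her", "hers"}


def text_length(text: str) -> int:
    """Feature function: number of whitespace-separated tokens."""
    return len(text.split())


def gender_mentions(text: str) -> Dict[str, int]:
    """Feature function: counts of male and female words in the text."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    return {
        "male": sum(t in MALE_WORDS for t in tokens),
        "female": sum(t in FEMALE_WORDS for t in tokens),
    }


# Registry of feature functions (assumed design, not DataLab's API).
FEATURE_FUNCTIONS: Dict[str, Callable[[str], object]] = {
    "text_length": text_length,
    "gender_mentions": gender_mentions,
}


def annotate(samples: List[str]) -> List[dict]:
    """Apply every registered feature function to every sample."""
    return [
        {"text": s, **{name: fn(s) for name, fn in FEATURE_FUNCTIONS.items()}}
        for s in samples
    ]


def summarize_gender(annotated: List[dict]) -> Dict[str, int]:
    """Aggregate sample-level gender counts into dataset-level totals."""
    totals = {"male": 0, "female": 0}
    for row in annotated:
        for key in totals:
            totals[key] += row["gender_mentions"][key]
    return totals


if __name__ == "__main__":
    rows = annotate(["She gave him her book.", "He left."])
    print(rows)
    print(summarize_gender(rows))
```

Under this assumed design, a new analysis is added by registering one more function in `FEATURE_FUNCTIONS`, and dataset-level statistics are computed from the stored sample-level annotations.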

