Data-to-Value: An Evaluation-First Methodology for Natural Language Projects

01/19/2022
by   Jochen L. Leidner, et al.

Big data, i.e. the collecting, storing and processing of data at scale, has recently become possible due to the arrival of clusters of commodity computers powered by application-level distributed parallel operating systems like HDFS/Hadoop/Spark, and such infrastructures have revolutionized data mining at scale. For data mining projects to succeed more consistently, several methodologies were developed (e.g. CRISP-DM, SEMMA, KDD), but these do not account for (1) very large scales of processing, (2) dealing with textual (unstructured) data, i.e. Natural Language Processing (NLP, "text analytics"), and (3) non-technical considerations (e.g. legal, ethical and project-managerial aspects). To address these shortcomings, a new methodology called "Data to Value" (D2V) is introduced. It is guided by a detailed catalog of questions, in order to avoid a disconnect between the big data text analytics project team and the topic when facing the rather abstract box-and-arrow diagrams commonly associated with methodologies.


