Time and the Value of Data

by Ehsan Valavi, et al.

Managers often believe that collecting more data will continually improve the accuracy of their machine learning models. We argue in this paper, however, that when data lose relevance over time, it may be optimal to collect a limited amount of recent data rather than keep an unbounded supply of older, less relevant data. We further argue that increasing the stock of data by including older datasets may in fact damage the model's accuracy. As expected, the model's accuracy improves as the flow of data (defined as the data collection rate) increases; the tradeoff is that machine learning models must be refreshed or retrained more frequently. Using these results, we investigate how the business value created by machine learning models scales with data and when the stock of data establishes a sustainable competitive advantage. We argue that data's time-dependency weakens the barrier to entry that the stock of data creates. As a result, a competing firm equipped with a limited (yet sufficient) amount of recent data can develop more accurate models. This result, coupled with the fact that older datasets may degrade models' accuracy, suggests that the business value created does not scale with the stock of available data unless the firm offloads less relevant data from its data repository. Consequently, a firm's growth policy should balance the stock of historical data against the flow of new data. We complement our theoretical results with an experiment in which we empirically measure the loss in accuracy of a next word prediction model trained on datasets from various time periods. Our empirical measurements confirm the economic significance of the decline in value over time. For example, after seven years, 100MB of text data becomes as valuable as 50MB of current data for the next word prediction task.
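To make the closing figure concrete, the sketch below works through the arithmetic under one loudly stated assumption: that the effective size of a dataset decays exponentially with age. The abstract does not say which functional form the paper uses, so the exponential curve and the helper names `implied_half_life` and `effective_size` are illustrative only; the sole number taken from the source is the 100MB to 50MB equivalence after seven years.

```python
import math


def implied_half_life(original_mb: float, equivalent_mb: float, years: float) -> float:
    """Half-life implied by one (size, equivalent size, age) observation.

    ASSUMPTION: effective size decays exponentially with age. This form is
    chosen for illustration and is not stated in the abstract.
    """
    ratio = equivalent_mb / original_mb
    return years * math.log(0.5) / math.log(ratio)


def effective_size(original_mb: float, age_years: float, half_life_years: float) -> float:
    """Effective size (in MB of current data) of a dataset of a given age,
    under the same assumed exponential decay."""
    return original_mb * 0.5 ** (age_years / half_life_years)


# The only figure from the abstract: 100MB is worth 50MB after seven years.
half_life = implied_half_life(100.0, 50.0, 7.0)

# Extrapolating to other ages is hypothetical and relies entirely on the
# assumed decay curve.
for age in (0.0, 3.5, 7.0, 14.0):
    print(f"age {age:>4} years: {effective_size(100.0, age, half_life):.1f} MB equivalent")
```

Under this assumption the single data point implies a half-life of seven years, so the same 100MB would count for roughly a quarter of its size after fourteen. A different decay shape would fit the seven-year observation equally well and extrapolate differently.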


