The Proper Care and Feeding of CAMELS: How Limited Training Data Affects Streamflow Prediction
Accurate streamflow prediction largely relies on historical records of both meteorological data and streamflow measurements. For many regions around the world, however, such data are only scarcely or not at all available. To select an appropriate model for a region with a given amount of historical data, it is therefore indispensable to know a model's sensitivity to limited training data, both in terms of geographic diversity and different spans of time. In this study, we provide decision support for tree- and LSTM-based models. We feed the models meteorological measurements from the CAMELS dataset, and individually restrict the training period length and the number of basins used in training. Our findings show that tree-based models provide more accurate predictions on small datasets, while LSTMs are superior given sufficient training data. This is perhaps not surprising, as neural networks are known to be data-hungry; however, we are able to characterize each model's strengths under different conditions, including the "breakeven point" when LSTMs begin to overtake tree-based models.
READ FULL TEXT