AI applications for water supply and streamflow forecasting: a case study in the Russian River Basin, CA
Laurel Larsen, Professor
Geography
Applications for Fall 2024 are closed for this project.
The Environmental Systems Dynamics Laboratory (ESDL) focuses on the interplay between biological, physical, and human aspects of the environment using a combination of physically-based and data-driven models. This internship aims to expand on our current work exploring the use of deep learning for environmental predictions.
Deep learning methods, and long short-term memory networks (LSTMs) in particular, often outperform other models (including physically-based ones) in making environmental predictions. However, they are often used as a "black box", which limits our ability to gain insight into the physical processes involved and to make robust forecasts in a changing climate. To address this, we seek to introduce physical constraints (such as water balance or physically-based state variables) into the LSTM by including process-based model outputs among its input variables. This will improve streamflow prediction, particularly under non-stationary conditions where out-of-sample data are more frequent. It will also allow more robust generalization to other watersheds where measurements are sparser.
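At the data level, including process-based model outputs among the LSTM inputs can be as simple as widening each day's input vector. The sketch below makes a few assumptions:

- the daily series are already aligned on the same dates;
- the function `build_hybrid_samples` is a hypothetical illustration, not the project's code;
- the variable names (`precip`, `temp`, `hms_swe`, `hms_soil_moisture`) are hypothetical placeholders, not the project's actual inputs.

Each day's feature vector concatenates the meteorological forcings with the physical model's state variables. The series is then cut into fixed-length windows, and each window's target is the streamflow on its last day.

```python
from typing import Dict, List, Sequence, Tuple


def build_hybrid_samples(
    forcings: Dict[str, Sequence[float]],
    physical_states: Dict[str, Sequence[float]],
    streamflow: Sequence[float],
    seq_length: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Return (X, y): X[i] is a seq_length x n_features window, y[i] the flow on its last day.

    Hypothetical helper: forcings and physical_states map variable names
    (placeholders such as "precip" or "hms_swe") to daily series.
    """
    # Fixed column order: sorted forcings first, then sorted process-based states
    columns = [forcings[k] for k in sorted(forcings)] + [
        physical_states[k] for k in sorted(physical_states)
    ]
    n_steps = len(streamflow)
    if any(len(col) != n_steps for col in columns):
        raise ValueError("all input series must have the same length as streamflow")

    # One feature vector per day, e.g. [precip, temp, hms_soil_moisture, hms_swe]
    rows = [[col[t] for col in columns] for t in range(n_steps)]

    X, y = [], []
    for end in range(seq_length, n_steps + 1):
        X.append(rows[end - seq_length:end])  # inputs for days end-seq_length .. end-1
        y.append(streamflow[end - 1])         # target: flow on the window's last day
    return X, y
```

Sorting the keys fixes the column order, so a trained network always sees each variable in the same position. Tools such as NeuralHydrology typically handle this windowing (and input normalization) internally, with the window length set by a `seq_length` option.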
This project, developed in collaboration with the U.S. Army Corps of Engineers (USACE) and the Hydrologic Engineering Center (HEC), is part of the overarching goal of predicting California water supply in a changing climate.
In the first phase, we performed a case study focused on snowpack in the Tule River basin and inflows to the Lake Success reservoir, on the western slope of California's Sierra Nevada. We developed an LSTM network to forecast snowpack accumulation and reservoir inflows. We tested the predictive performance of novel machine learning models and compared the results with those of a companion effort, in which USACE HEC-HMS modelers developed a physical model. Finally, we integrated the two models and showed that the additional physical constraints resulted in better out-of-sample predictions.
The next phase of this project aims to develop an LSTM model, as well as a hybrid model combining the LSTM with HEC-HMS state variables. The project's specific objectives are to:
(1) update deep learning algorithms, focusing on the LSTM model available in open-source repositories such as NeuralHydrology; and
(2) evaluate the performance of these models in predicting streamflow in the Russian River Basin, California.
These models are intended to improve streamflow predictability, which is crucial for water availability analysis, reservoir operations, and flood risk assessment.
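For objective (2), one widely used skill score for streamflow is the Nash–Sutcliffe efficiency (NSE). The listing does not name the project's evaluation metrics, so using NSE here is an assumption. A minimal standard-library sketch:

```python
from statistics import fmean
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 is a perfect fit; 0.0 means the model is no better than always
    predicting the observed mean; negative values mean it is worse than that.
    """
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least two values")
    mean_obs = fmean(observed)
    # Squared error of the simulation vs. squared deviation of observations from their mean
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_term = sum((o - mean_obs) ** 2 for o in observed)
    if variance_term == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - sse / variance_term
```

Because NSE is normalized by the variance of the observations, scores can be compared across basins with very different flow magnitudes. This matters when moving between CAMELS catchments and the Russian River Basin.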
Role: The specific tasks for this project include setting up and extending the Python-based LSTM model available in NeuralHydrology. The data used for the models will come from the Russian River Basin, California, and from the CAMELS (Catchment Attributes and Meteorology for Large-Sample Studies) datasets.
Students will work with a variety of datasets from selected US and California watersheds, with a focus on the Russian River Basin, as well as state and national datasets of discharge and precipitation. Students will work collaboratively to develop deep learning models and to augment the LSTM with physically-based state variables. Example tasks include:
- Experiment with diverse LSTM model architectures
- Parallelize code to tune LSTM hyperparameters at scale on UC Berkeley high-performance computing clusters
- Explore different physical constraints in the LSTM model inputs
- Apply transfer learning to the LSTM model
- Analyze the importance of physical inputs to the LSTM
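Hyperparameter tuning parallelizes naturally, because each configuration trains independently. The sketch below makes a few assumptions:

- the cluster uses the SLURM scheduler (as UC Berkeley's Savio cluster does);
- the tuning runs are launched as a job array, such as `sbatch --array=0-23 tune.sh`, so that each array task picks one grid point;
- the search-space values and the script name `tune.sh` are hypothetical;
- the training step itself is left out.

```python
import itertools
import os
from typing import Any, Dict, List

# Hypothetical search space; values are illustrative, not the project's settings
SEARCH_SPACE: Dict[str, List[Any]] = {
    "hidden_size": [64, 128, 256],
    "seq_length": [180, 365],
    "learning_rate": [1e-3, 5e-4],
    "output_dropout": [0.2, 0.4],
}


def all_configs(space: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Expand the search space into every combination (a full grid), in a fixed order."""
    keys = sorted(space)
    return [dict(zip(keys, values)) for values in itertools.product(*(space[k] for k in keys))]


def config_for_task(task_id: int, space: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Map one array-task index to one grid point, so each task trains one model."""
    configs = all_configs(space)
    if not 0 <= task_id < len(configs):
        raise IndexError(f"task_id {task_id} out of range 0..{len(configs) - 1}")
    return configs[task_id]


if __name__ == "__main__":
    # SLURM sets SLURM_ARRAY_TASK_ID for each task of a job array; default to 0 locally
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    config = config_for_task(task_id, SEARCH_SPACE)
    print(f"task {task_id}: {config}")
    # Training the LSTM with `config` would go here (omitted in this sketch)
```

The grid above has 3 x 2 x 2 x 2 = 24 points, which is why the array range is 0-23. Random search or a dedicated tuning library would scale better for larger spaces, but a full grid keeps the index-to-configuration mapping trivial to reproduce.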
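One model-agnostic way to analyze the importance of physical inputs is permutation importance. This is an assumed technique, not necessarily the one the project will use. The idea is to shuffle one input variable across samples and measure how much the prediction error grows. In the sketch, `predict` is a hypothetical interface standing in for any trained model, wrapped as a function from one input window (days x features) to one streamflow value.

```python
import random
from typing import Callable, Dict, List, Sequence

Window = List[List[float]]  # seq_length x n_features


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Window], float],
    X: List[Window],
    y: Sequence[float],
    feature_names: Sequence[str],
    n_repeats: int = 5,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean increase in MSE when one feature's series is shuffled across samples.

    A larger increase suggests the model relies more on that input.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(w) for w in X])
    importances: Dict[str, float] = {}
    for j, name in enumerate(feature_names):
        increases = []
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            # Replace feature j in each window with feature j from a randomly
            # permuted window, leaving all other features untouched
            X_perm = [
                [row[:j] + [X[src][t][j]] + row[j + 1:] for t, row in enumerate(window)]
                for window, src in zip(X, order)
            ]
            increases.append(mse(y, [predict(w) for w in X_perm]) - baseline)
        importances[name] = sum(increases) / n_repeats
    return importances
```

Correlated inputs (for example, air temperature and a simulated snow water equivalent) can share credit, so permutation scores are best read alongside other diagnostics rather than as a definitive ranking.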
Learning outcomes include:
- Mastering how to train, calibrate, and optimize deep neural network models
- Learning how to use artificial intelligence to understand physical processes and improve environmental science
- Achieving an improved understanding of environmental systems and hydrology in particular
- Improving data processing skills, including time series analyses
- Becoming familiar with major issues in environmental forecasting and the underlying science
- Gaining hands-on experience with data-driven approaches to catchment hydrology
Day-to-day supervisors for this project: Dino Bellugi, UCB Project Scientist; Matt Fleming, Hydrologic Engineering Center; and Chris Tennant, Army Geospatial Center. UCB Principal Investigator and supervisor: Laurel Larsen, Associate Professor, Geography and Civil Engineering.
Qualifications: This project will be of interest to students in Computer Science, Data Science, and Statistics who want to apply their Machine Learning (ML) experience to Earth Science, Geography, and Civil and Environmental Engineering. Students from other majors are also welcome to apply.
Required: Students should have programming experience in Python, and PyTorch in particular, as well as experience in Deep Learning.
Desirable: Familiarity with LSTM networks, Matlab, and other ML libraries such as Scikit-Learn, Keras, and the Matlab Statistics and Machine Learning Toolbox.
Students should demonstrate a strong ML background, highlighting relevant courses they have taken and applications they have developed. Students should be willing to work as members of a research team and have strong communication skills.
Hours: to be negotiated
Off-Campus Research Site: Predominantly remote, but with in-person meetings when needed.
Related website: http://esdlberkeley.com
Mathematical and Physical Sciences