Machine Learning

Last modified by s2s_wiki on 2020/12/04 00:37

Machine Learning / Artificial Intelligence for S2S prediction

There is currently a lot of excitement in the weather and climate communities about exploring the potential of data-driven approaches based on Artificial Intelligence/Machine Learning/Deep Learning for S2S prediction through, for instance, improved parameterization, improved calibration and multi-model calibration, extreme event attribution, and verification. The publicly available S2S database, which contains a considerable amount of data (re-forecasts and real-time forecasts from 11 operational centres), represents an ideal testbed for these data-driven methods. The SubX database provides another such opportunity.

Potential applications:
 1. Improved data assimilation (e.g. better quality control of observations)
 2. Improved parameterization (e.g. radiative schemes)
 3. Improved post-processing (model calibration, bias-correction, multi-ensemble combination...)
 4. Predictability diagnostics (e.g. teleconnections)
 5. S2S event attribution (e.g. origins of extreme events)
 6. Empirical forecasts

Links on activities of ML community 

Ongoing Community Research on Machine Learning / Artificial Intelligence for S2S prediction
1. Scripps Institution of Oceanography (From Dr. Peter Gibson. Updated on April 1, 2020)
The project explores the potential for modern machine learning tools to improve seasonal prediction skill of precipitation over the Western US. Modern machine learning approaches are 'data hungry', while observational records are relatively short for the purposes of seasonal forecasting. To circumvent this issue, we train a variety of machine learning tools on perturbed initial condition climate model ensembles that span several thousand years, then use these 'learnt' teleconnections to make seasonal predictions. We are testing a hierarchy of machine learning approaches from simple to complex: simple logistic regression, LASSO, random forests, gradient boosted decision trees, and convolutional neural networks. This project is a collaboration between researchers at Scripps CW3E and JPL, and is funded by the California Department of Water Resources.

2. Australian Bureau of Meteorology (From Catherine de Burgh-Day with Oscar Alves and Debbie Hudson. Updated on April 2, 2020)

We are at the early stages of developing an ML-based vegetation model which uses outputs of the Bureau's seasonal prediction system, ACCESS-S. The purpose of this work is twofold:

  • Investigate the possibility of making predictions of vegetation characteristics in the coming weeks and seasons using model outputs as predictors. Forecasts of vegetation could be useful for a number of sectors, including fire agencies and agriculture.
  • Attempt to use the vegetation model we develop to periodically update the vegetation ancillary file used in model runs. Currently ACCESS-S1 uses a static vegetation file. We plan to investigate what skill gains could be obtained from a more dynamic representation of vegetation, and then to try updating the vegetation ancillary of the model every N timesteps by passing it through our vegetation model, along with the latest model parameters.

We intend to start by trying an LSTM neural network for the vegetation model, potentially also including some convolutional layers. We will, however, investigate what is most effective as we go. Initially we will train on the ACCESS-S1 hindcast; if a larger training set is needed, we may train on a larger dataset and then use transfer learning techniques to adapt the model to ACCESS-S.

3. APEC Climate Center (From Dr. Hyung Jin Kim with Dr. Uran Chung and Dr. Kyungwon Park. Updated on April 3, 2020)
Our project is to develop a deep learning ensemble technique to improve subseasonal forecasts over the Korean Peninsula. Deep learning is now recognized as a technique for improving climate forecasting, especially subseasonal climate prediction; however, its application is limited because subseasonal forecast datasets are too small to train and test deep learning models. We are therefore testing ensemble techniques for constructing sufficient subseasonal prediction data for the Korean Peninsula from climate models, and applying machine learning and deep learning algorithms (e.g. SVM, RF, RNN, LSTM, and Convolutional LSTM) to the multi-model-ensemble-based subseasonal prediction data, in order to improve forecasts of daily maximum and minimum temperatures and precipitation over the Korean Peninsula.
4. NOAA (ESRL/PSD) (From Dr. Michael Scheuerer. Updated on April 3, 2020)
'Using artificial neural networks for generating probabilistic subseasonal precipitation forecasts over California'
We did NOT obtain our data from the publicly available S2S database. For this study, we used:
  • Subseasonal retrospective forecasts by the IFS ensemble, Cycle 43r3, that we retrieved from the ECMWF MARS archive system
  • The daily accumulated PRISM precipitation data set, obtained from 
  • ERA5 reanalysis data, obtained from the Copernicus Climate Change Service

ML/AI methodology used:
In our work (paper has been submitted recently) we propose two new approaches for statistical post-processing of subseasonal ensemble forecasts:

  • The first approach uses an artificial neural network to translate subseasonal IFS precipitation forecasts into reliable probabilistic forecasts of week-2, week-3, and week-4 precipitation accumulations over California
  • The second approach uses a convolutional neural network to link large-scale predictors (geopotential height and total column water over the north-eastern Pacific) calculated from ERA5 analyses to precipitation amounts over California; these relationships are then used to derive week-2, week-3, and week-4 precipitation forecasts from subseasonal IFS forecasts of these large-scale weather variables
5. Colorado State University (From Prof. Elizabeth A. Barnes. Updated on April 3, 2020)
I have multiple members of my group using ML for S2S prediction. Specifically, we are focused on interpretable neural networks - so the goal is to not only make better empirical predictions, but to also understand where the predictability is coming from. We are also working on using ML to leverage climate model information to improve observational predictions.
6. Climate Prediction Center, NOAA/NWS/NCEP (From Dr. Yun Fan with Dr. Jon Gottschalck. Updated on April 4, 2020)
Benefiting from the great advances in machine learning techniques in recent years, such as more flexible and capable algorithms and the availability of big datasets, we designed neural network setups that enable us not only to explore nonlinear impacts from big data, but also to extract more sophisticated patterns and co-variability relationships hidden in the multi-dimensional predictors and predictands. These learned, more complicated relationships and high-level statistical information are then used to correct the original bias-corrected NOAA NCEP Climate Forecast System (CFSv2) Week 3-4 precipitation and 2-meter temperature forecasts. The results show that, to some extent, neural network techniques can improve the Week 3-4 forecast accuracy and greatly increase efficiency compared with the traditional pointwise multiple linear regression methods. The dataset currently used is the NOAA NCEP CFSv2. In the near future, we will work on the NCEP GEFS, ECMWF, CMC, etc. real-time datasets available here at the NOAA CPC.
The following link has our NN short paper (on pages 59-63):
A paper submitted to the AMS Journal: WAF (under revision):
    Yun Fan, Vladimir Krasnopolsky, Huug van den Dool, Chung-Yu Wu and Jon Gottschalck; 2020: Using Artificial Neural Networks to Improve CFS Week 3-4 Precipitation and 2 Meter Air Temperature Forecasts. 
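
The pointwise linear regression baseline that item 6 compares against can be illustrated with a minimal sketch. This is not the CPC implementation: it simplifies the multiple regression to a single predictor (the raw forecast at the same grid point), and the function names `fit_pointwise` and `correct` and the toy numbers are assumptions made for illustration only.

```python
def fit_pointwise(forecasts, observations):
    """Fit obs = a + b * forecast independently at each grid point.

    forecasts, observations: lists of training samples; each sample is a
    list with one value per grid point (hypothetical layout).
    Returns a list of (a, b) pairs, one per grid point.
    """
    n_points = len(forecasts[0])
    coeffs = []
    for j in range(n_points):
        x = [sample[j] for sample in forecasts]
        y = [sample[j] for sample in observations]
        mean_x = sum(x) / len(x)
        mean_y = sum(y) / len(y)
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        # Fall back to the observed mean when the forecast never varies.
        slope = sxy / sxx if sxx > 0 else 0.0
        coeffs.append((mean_y - slope * mean_x, slope))
    return coeffs


def correct(forecast, coeffs):
    """Apply the fitted per-point correction to a new forecast field."""
    return [a + b * f for f, (a, b) in zip(forecast, coeffs)]


# Toy training set: three past forecasts on a two-point "grid".
train_fcst = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
train_obs = [[3.0, 1.0], [5.0, 1.5], [7.0, 2.0]]
coeffs = fit_pointwise(train_fcst, train_obs)
corrected = correct([4.0, 4.0], coeffs)
```

Because each grid point is fitted separately, a scheme like this cannot use spatial patterns or relationships between variables, which is the kind of information the neural network approach described above is meant to exploit.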
7. Royal Netherlands Meteorological Institute (KNMI) and the Institute for Environmental Studies at the Vrije Universiteit Amsterdam (IVM) (From Chiem van Straaten. Updated on April 6, 2020)
At the Royal Netherlands Meteorological Institute (KNMI) and the Institute for Environmental Studies at the Vrije Universiteit Amsterdam (IVM) we run a research project called ‘Improvement of sub-seasonal probabilistic forecasts of European high-impact weather events using machine learning techniques’. The project uses ML for post-processing and diagnostics (mainly dimension reduction and learning connections).
We evaluate whether probabilistic forecasts at the sub-seasonal timescale contain skill for surface variables in Europe (e.g. 2-meter temperature), and how this skill depends on scale, location and extremity. Then, for events in which some predictability is found (predictability is expected for hot extremes), we try to find their physical precursors in other variables. Ridge regression and unsupervised clustering are used for dimension reduction in SSTs, geopotential height and other fields.
Lastly, we combine the information on observed driving factors with information on shortcomings of the ensemble prediction systems (e.g. propagation of waves from the tropics to the mid-latitudes) to post-process the forecasts. We have experience with RFs and CNNs for post-processing at shorter timescales. Regarding data: forecast evaluation was done on ECMWF cycle 45r1, precursors are currently being searched for in ERA5, and we might apply our post-processing to the EPSs in the S2S database.
8. National Center of Scientific Research “Demokritos” (NCSRD) (From Dr. Athanasios Sfetsos. Updated on April 9, 2020)
Generic Title: implementing a Deep Learning approach for spatial and time error correction of S2S simulation data over Greece
The current work of the National Center of Scientific Research “Demokritos” (NCSRD) on Machine Learning (ML) and Subseasonal to Seasonal (S2S) prediction is based on the temporal and spatial enhancement of S2S predictions with Deep Learning approaches. More specifically, NCSRD locally produces an S2S prediction for Greece (at a very high spatial resolution of 5x5 km²) downscaled from a European-wide domain (at 20x20 km² grid resolution). The simulations are forced by the Climate Forecast System (CFS) model from the National Centers for Environmental Prediction (NCEP), in addition to existing datasets from the S2S database. To correct the error of the simulation results effectively, a deep learning approach is being tested that combines Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) architectures for the space and time domains respectively, focusing on Greece, with the aim of enhancing the accuracy and predictability of longer S2S simulations.
9. ETH Zurich (From Prof. Daniela Domeisen. Updated on April 13, 2020)
Our ongoing project is a collaboration between ETH and the Swiss Data Science Center (SDSC), exploring the subseasonal predictability of stratospheric extreme events using data science methods.
The upper atmosphere, i.e. the stratosphere at about 12-50 km above the Earth’s surface, provides increased predictability for Europe after extreme stratospheric events known as Sudden Stratospheric Warming (SSW) events. These events can provide skill over Europe for several weeks to months, with persistently colder than usual weather over Northern and Central Europe. SSW events themselves can currently be predicted only several days in advance. An extended prediction of SSW events would therefore significantly benefit forecasts at the surface, which makes it crucial to understand the predictability of the stratosphere itself.
The main objectives of this project are the use of reanalysis data and the S2S prediction database to extract novel insights from this data using data science tools. A first step will be an improved classification of stratospheric events, allowing for a flexible definition that includes the predictability aspects of these events. For instance, we are building new representations of the polar vortex using non-linear dimension reduction techniques that can later be used in unsupervised clustering algorithms. In a second step, this project aims to classify remote predictors of long-term weather variability. In particular, known predictors for stratospheric and tropospheric variability will be evaluated using data science methods and possible new predictors will be identified. This knowledge is expected to lead to an improved predictability of the weather over Europe on weekly to monthly timescales.
10. ECMWF (From Dr. Michel Rixen. Updated on April 13, 2020)
Machine learning seminars:

S2S machine Learning competition:

Artificial Intelligence (AI) and Machine Learning (ML) methods for weather forecasting have recently generated huge interest in the research community. These methods could be used to improve data assimilation methods, physical parameterizations or the post-processing of model outputs. Research on using AI/ML methods as an alternative to dynamical models is also ongoing. The newly formed WMO Research Board has identified AI as a key research topic in weather and climate science for the upcoming years. The World Meteorological Organization (WMO) Science & Innovation Department, in collaboration with the Services and Infrastructure, has encouraged holding an open competition to explore new services based on AI methods applied to the WWRP/WCRP Sub-seasonal to Seasonal (S2S) project database. Following this recommendation, the WWRP/WCRP S2S Project is planning to organize an AI/ML competition in 2021. The innovation coming out of this competition will support the goals and action areas of the S2S and WWRP implementation plans as well as the WCRP strategic plan.

The main goal of the competition is to encourage the use of AI/ML tools to extract valuable information from the S2S database. The S2S database contains a huge amount of data (more than 100 TB), which makes it a potentially powerful resource for AI/ML methods to explore possibilities of improving current S2S forecasts through, for instance, improved bias correction and multi-model combination. The competition should give us more insight into the potential benefit of AI/ML methods for S2S prediction.

The current proposal for this competition is to provide the “best possible” forecast of 2-metre temperature and precipitation at forecast lead times of weeks 3-4, using bi-weekly averages. The forecast domain will be global (on the 1.5-degree spatial grid of the S2S database, but limited to land gridpoints) and the forecasts must be issued as tercile probabilities. The verification will be performed using the Ranked Probability Skill Score (RPSS) on 3 domains: Northern Extratropics, Tropics and Southern Extratropics. The verification data will come from CPC gridded 2-metre temperature and precipitation data. The created software, code, documentation and results will be required to be open source and open access.
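
As an illustration of the verification described above, the following sketch turns an ensemble into tercile probabilities and computes the RPSS against a climatological reference of 1/3 per category. It assumes one common convention (RPSS = 1 - summed forecast RPS / summed reference RPS); the competition may aggregate or weight scores differently. The function names and the toy values are hypothetical.

```python
def tercile_probs(members, lower, upper):
    """Fraction of ensemble members below, between and above the
    climatological tercile edges `lower` and `upper`."""
    n = len(members)
    below = sum(m < lower for m in members) / n
    above = sum(m > upper for m in members) / n
    return [below, 1.0 - below - above, above]


def rps(probs, obs_category):
    """Ranked probability score of one forecast.

    probs: probabilities for the (below, near, above) normal categories.
    obs_category: index (0, 1 or 2) of the category that was observed.
    """
    score, cum_fcst, cum_obs = 0.0, 0.0, 0.0
    for k, p in enumerate(probs):
        cum_fcst += p
        cum_obs += 1.0 if k == obs_category else 0.0
        score += (cum_fcst - cum_obs) ** 2
    return score


def rpss(forecasts, observations, reference=(1 / 3, 1 / 3, 1 / 3)):
    """Skill relative to a reference forecast: 1 is perfect, 0 matches
    the reference, negative values are worse than the reference."""
    rps_fcst = sum(rps(p, o) for p, o in zip(forecasts, observations))
    rps_ref = sum(rps(reference, o) for o in observations)
    return 1.0 - rps_fcst / rps_ref


# Toy example: two forecast cases with made-up tercile edges.
fcsts = [
    tercile_probs([0.2, 0.5, 1.1, 1.4, 1.6, 2.0], lower=0.8, upper=1.5),
    tercile_probs([0.1, 0.3, 0.4, 0.9, 1.0, 1.7], lower=0.8, upper=1.5),
]
skill = rpss(fcsts, observations=[2, 0])
```

The RPS is computed on cumulative probabilities, so a forecast that puts its weight in a category adjacent to the observed one is penalised less than one that puts it in the opposite tercile.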

Two rounds are envisaged. During the first round, hindcasts starting on every Thursday of a given year (e.g. 2020) will be produced. The benchmark will be the ECMWF hindcasts after simple calibration. No data more recent than the forecast start date should be used. During the second round, the most highly ranked teams from round 1 will compete on real-time forecasts. The competition will be open to AI/ML methods using data from the S2S database, but also to AI/ML methods using other types of input data, such as large climate model ensembles or reanalysis data. There will be a monetary prize from WMO for the winners.
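
The first-round setup can be sketched as follows: list the Thursday start dates of a year, derive a weeks 3-4 target window, and check the rule that no input data may be more recent than the start date. The day-counting convention (start date counted as day 1, so weeks 3-4 cover days 15-28) is an assumption, as are all function names; the competition rules may define the window differently.

```python
from datetime import date, timedelta


def thursdays(year):
    """All Thursdays in `year`, in chronological order."""
    day = date(year, 1, 1)
    # date.weekday(): Monday is 0, so Thursday is 3.
    day += timedelta(days=(3 - day.weekday()) % 7)
    starts = []
    while day.year == year:
        starts.append(day)
        day += timedelta(days=7)
    return starts


def weeks_3_4_window(start):
    """First and last valid date of the weeks 3-4 average.

    Assumption: the start date is day 1, so the window spans days 15-28.
    """
    return start + timedelta(days=14), start + timedelta(days=27)


def is_usable(data_date, start):
    """True if data valid on `data_date` may feed a forecast started on `start`."""
    return data_date <= start


starts_2020 = thursdays(2020)
first_window = weeks_3_4_window(starts_2020[0])
```

A filter such as `is_usable` would typically be applied when assembling predictors for each start date, so that verification data from inside the target window cannot leak into training.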

The competition is planned to take place in 2021 and will be advertised via the S2S mailing list. Depending on the platform which will be used to run the competition, some of the aspects of the competition, as described above, may be modified.

Created by Administrator on 2020/04/01 10:19
This wiki is licensed under a Creative Commons 2.0 license