Running a hydrological model with multiple timeseries files

In this notebook, we show how to provide the hydrological model with multiple timeseries files. For example, one file could contain meteorological data and another the streamflow data, or every variable could sit in its own file (e.g. precipitation, temperature, streamflow, etc.). The following instructions should make it easier to understand how to do this. For this example, we actually start from a single netCDF file containing all the information, and from there divide it into multiple time series netCDF files. We then use the split files to drive the model. In most use cases, the separate files will be provided directly by the user, so there is no need to pre-split your own files!

[1]:
# Cookie-cutter template necessary to provide the tools, packages and paths for the project.
# All notebooks need this template (or a slightly adjusted one depending on the required packages).
from birdy import WPSClient

from example_data import TESTDATA
import datetime as dt
from pathlib import Path
from urllib.request import urlretrieve
import xarray as xr
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import os
import json
import netCDF4 as nc
from zipfile import ZipFile
import glob
import tempfile

# Set environment variable WPS_URL to "http://localhost:9099" to run on the default local server
url = os.environ.get("WPS_URL", "https://pavics.ouranos.ca/twitcher/ows/proxy/raven/wps")
wps = WPSClient(url)

# DATA MAIN SOURCE - DAP link to CANOPEX dataset
CANOPEX_DAP = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/ets/Watersheds_5797_cfcompliant.nc'
CANOPEX_URL = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/ets/Watersheds_5797_cfcompliant.nc'
[2]:
# Open Canopex dataset using DAP link
ds = xr.open_dataset(CANOPEX_DAP)
ds
[2]:
xarray.Dataset
Dimensions:      (time: 22280, watershed: 5797)
Coordinates:
  * time           (time) datetime64[ns] 1950-01-01 ... 2010-12-31
  * watershed      (watershed) |S64 b'St. John River at Ninemile Bridge, Maine' ... b'KETTLE CREEK AT ST. THOMAS'
Data variables:
    drainage_area  (watershed) float64 ...        [units: km2]
    pr             (watershed, time) float64 ...  [units: kg m-2 s-1]
    tasmax         (watershed, time) float64 ...  [units: K]
    tasmin         (watershed, time) float64 ...  [units: K]
    discharge      (watershed, time) float64 ...  [units: m3 s-1]
Attributes:
    title:          Hydrometeorological data for lumped hydrological modelling of 5797 catchments in North America
    institute_id:   ETS
    contact:        Richard Arsenault: richard.arsenault@etsmtl.ca
    date_created:   2020-08-01
    source:         Hydrometric data from USGS National Water Information Service and ECCC Water Survey Canada. Meteorological data from ECCC stations, NRCan 10km gridded interpolated dataset and Livneh 2014 database. Catchment areas from ECCC HYDAT, HydroSheds and USGS.
    featureType:    timeSeries
    cdm_data_type:  station
    license:        ODC-BY
    keywords:       hydrology, North America, streamflow, hydrometeorological, PAVICS, PAVICS-Hydro, modelling
    activity:       PAVICS_Hydro
    Conventions:    CF-1.6, ACDD-1.3
    summary:        Hydrometeorological database for the PAVICS-Hydro platform, including precipitation, temperature, discharge and catchment area to drive the RAVEN hydrological modelling framework. Provided by the HC3 Laboratory at École de technologie supérieure, Montréal, Canada.
    institution:    ETS (École de technologie supérieure)
[3]:
# Setup some parameters to run the models. See the "canopex.ipynb" notebook for more detailed information
# on these parameters. The data we use comes from the extended CANOPEX database.
start = dt.datetime(1998, 1, 1)
stop = dt.datetime(2010, 12, 31)
watershedID = 5600
[4]:
# With this info, we can gather some properties from the CANOPEX database:
tmp = pd.read_csv(TESTDATA['canopex_attributes'])
basin_area = tmp['area'][watershedID]
basin_latitude = tmp['latitude'][watershedID]
basin_longitude = tmp['longitude'][watershedID]
basin_elevation = tmp['elevation'][watershedID]
basin_name = ds.watershed[watershedID].data

print("Basin name: ", basin_name)
print("Latitude: ", basin_latitude, " °N")
print("Area: ", basin_area, " km^2")
Basin name:  b'WHITEMOUTH RIVER NEAR WHITEMOUTH'
Latitude:  49.51119663557124  °N
Area:  3650.476384548832  km^2
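Note that the watershed names are stored as fixed-width byte strings (dtype `|S64`), which is why the printed basin name carries a `b'...'` prefix. A minimal sketch of decoding such a value to a regular Python string (the example name is copied from the output above):

```python
# Watershed names come back from the netCDF file as fixed-width byte strings.
raw_name = b'WHITEMOUTH RIVER NEAR WHITEMOUTH'

# Decode to a regular string and strip any fixed-width padding.
basin_name_str = raw_name.decode('utf-8').strip()
print(basin_name_str)  # WHITEMOUTH RIVER NEAR WHITEMOUTH
```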

Here we separate the discharge and meteorological data into two files, which we will recombine later to drive the model

[5]:
# Define the 2 new files, i.e. the meteorological data and the streamflow data
filepathMet = os.getcwd() + "/CANOPEX_Met.nc"
filepathQobs = os.getcwd() + "/CANOPEX_Qobs.nc"

# Do the extraction for the selected catchment
newBasin = ds.isel(watershed=watershedID)

# Generate the streamflow time-series netcdf
Qobsfile = newBasin['discharge']
Qobsfile.to_netcdf(filepathQobs)

# Generate the meteorological time-series netcdf
newBasin = newBasin[['drainage_area', 'pr', 'tasmax', 'tasmin']]
newBasin.to_netcdf(filepathMet)
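Before driving the model, it can be worth checking that the split lost no variables. The sketch below demonstrates the same split-and-verify pattern on a tiny synthetic dataset (hypothetical values standing in for one extracted CANOPEX catchment, so it runs without downloading anything):

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# A tiny synthetic stand-in for one extracted CANOPEX catchment.
time = pd.date_range("2000-01-01", periods=10)
ds = xr.Dataset(
    {
        "pr": ("time", np.random.rand(10)),        # precipitation
        "tasmax": ("time", 280 + np.random.rand(10)),
        "tasmin": ("time", 270 + np.random.rand(10)),
        "discharge": ("time", np.random.rand(10)),
    },
    coords={"time": time},
)

tmpdir = tempfile.mkdtemp()
filepathMet = os.path.join(tmpdir, "Met.nc")
filepathQobs = os.path.join(tmpdir, "Qobs.nc")

# Split: streamflow in one file, meteorology in the other.
ds["discharge"].to_netcdf(filepathQobs)
ds[["pr", "tasmax", "tasmin"]].to_netcdf(filepathMet)

# Re-open both files and check that no variable was lost in the split.
met = xr.open_dataset(filepathMet)
qobs = xr.open_dataset(filepathQobs)
assert set(met.data_vars) == {"pr", "tasmax", "tasmin"}
assert "discharge" in qobs.data_vars
```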

Here is where we run the model with multiple input time series

[6]:
# The model parameters. We are forcing values here just so the model runs; these parameter values are probably very poor choices!

params = '9.5019, 0.2774, 6.3942, 0.6884, 1.2875, 5.4134, 2.3641, 0.0973, 0.0464, 0.1998, 0.0222, -1.0919, ' \
         '2.6851, 0.3740, 1.0000, 0.4739, 0.0114, 0.0243, 0.0069, 310.7211, 916.1947'

# Model configuration parameters. Please see "canopex.ipynb" for more details.
# This remains unchanged with multiple timeseries!
config = dict(
    start_date=start,
    end_date=stop,
    area=basin_area,
    elevation=basin_elevation,
    latitude=basin_latitude,
    longitude=basin_longitude,
    run_name='test_hmets_NRCAN',
    rain_snow_fraction='RAINSNOW_DINGMAN',
    nc_spec=json.dumps(
        {
            'tasmax': {'linear_transform': (1.0, -273.15)},
            'tasmin': {'linear_transform': (1.0, -273.15)},
            'pr': {'linear_transform': (86400.0, 0.0)},
        }
    ),
)


# Here is where we must tell the model that there are multiple input files. The idea is to combine them into a list of strings,
# with each string representing a path to a netcdf file. So we could do something like this:
ts_combined = [str(filepathMet), str(filepathQobs)]
resp = wps.raven_hmets(ts=ts_combined, params=params, **config)

# And get the response
# With `asobj` set to False, only the reference to the output is returned in the response.
# Setting `asobj` to True will retrieve the actual files and copy them locally.
[hydrograph, storage, solution, diagnostics, rv] = resp.get(asobj=True)
print(diagnostics)
observed data series,filename,DIAG_NASH_SUTCLIFFE,DIAG_RMSE,
HYDROGRAPH,/tmp/pywps_process_fufyza3l/CANOPEX_Qobs.nc,-0.12679,30.5267,
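Each `linear_transform` entry in `nc_spec` above is a `(scale, offset)` pair applied as `new_value = scale * value + offset`, used here to convert temperatures from Kelvin to Celsius and precipitation from kg m-2 s-1 (equivalent to mm/s of water) to mm/day. A quick sanity check of the two conversions:

```python
def linear_transform(value, scale, offset):
    """Apply the (scale, offset) convention used in nc_spec above."""
    return scale * value + offset

# Temperature: 300.0 K -> 26.85 degrees C with (1.0, -273.15)
temp_c = linear_transform(300.0, 1.0, -273.15)
assert abs(temp_c - 26.85) < 1e-9

# Precipitation: 1e-4 kg m-2 s-1 (i.e. mm/s) -> 8.64 mm/day with (86400.0, 0.0)
pr_mm_day = linear_transform(1e-4, 86400.0, 0.0)
assert abs(pr_mm_day - 8.64) < 1e-9
```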

You can even invert the order of the netcdf files. Raven will detect which files contain which variables, so the order is not important!

[7]:
# Test with reversed timeseries files:
ts_combined = [str(filepathQobs), str(filepathMet)]
resp = wps.raven_hmets(ts=ts_combined, params=params, **config)

# And get the response
# With `asobj` set to False, only the reference to the output is returned in the response.
# Setting `asobj` to True will retrieve the actual files and copy them locally.
[hydrograph, storage, solution, diagnostics, rv] = resp.get(asobj=True)
print(diagnostics)
observed data series,filename,DIAG_NASH_SUTCLIFFE,DIAG_RMSE,
HYDROGRAPH,/tmp/pywps_process_6r8q6yk0/CANOPEX_Qobs.nc,-0.12679,30.5267,

As you can see, the NSE and RMSE values are identical. You can pass as many NetCDF files as you have variables, in any order, and the model will still run correctly.
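Since the diagnostics output is plain CSV text, the scores from the two runs can also be compared programmatically rather than by eye. A small sketch, using the diagnostics strings hard-coded from the outputs above (the `/tmp/...` paths are illustrative):

```python
import csv
import io

def parse_diagnostics(diag_text):
    """Return {metric_name: value} from Raven's diagnostics CSV output."""
    reader = csv.reader(io.StringIO(diag_text))
    header = next(reader)
    values = next(reader)
    # Keep only the DIAG_* columns; the filename column differs per run.
    return {h: v for h, v in zip(header, values) if h.startswith("DIAG_")}

run1 = ("observed data series,filename,DIAG_NASH_SUTCLIFFE,DIAG_RMSE,\n"
        "HYDROGRAPH,/tmp/a/CANOPEX_Qobs.nc,-0.12679,30.5267,\n")
run2 = ("observed data series,filename,DIAG_NASH_SUTCLIFFE,DIAG_RMSE,\n"
        "HYDROGRAPH,/tmp/b/CANOPEX_Qobs.nc,-0.12679,30.5267,\n")

d1 = parse_diagnostics(run1)
d2 = parse_diagnostics(run2)
assert d1 == d2  # same NSE and RMSE regardless of file order
```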