Running a hydrological model with multiple timeseries files¶
In this notebook, we show how to provide the hydrological model with multiple time series files. For example, one file could contain the meteorological data and another the streamflow data, or every variable could come in its own file (i.e. precipitation, temperature, streamflow, etc.). The following instructions should make it easier to understand how to do this. For this example, we actually start from a single netCDF file containing all the information, and from there divide it into multiple time series netCDF files. We then use the split files to drive the model. In most use cases, the separate files will be provided directly by the user, so there is no need to pre-split your own files!
[1]:
# Cookie-cutter template necessary to provide the tools, packages and paths for the project.
# All notebooks need this template (or a slightly adjusted one depending on the required packages).
from birdy import WPSClient
from example_data import TESTDATA
import datetime as dt
from pathlib import Path
from urllib.request import urlretrieve
import xarray as xr
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import os
import json
import netCDF4 as nc
from zipfile import ZipFile
import glob
import tempfile
# Set environment variable WPS_URL to "http://localhost:9099" to run on the default local server
url = os.environ.get("WPS_URL", "https://pavics.ouranos.ca/twitcher/ows/proxy/raven/wps")
wps = WPSClient(url)
# DATA MAIN SOURCE - DAP link to CANOPEX dataset
CANOPEX_DAP = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/ets/Watersheds_5797_cfcompliant.nc'
CANOPEX_URL = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/ets/Watersheds_5797_cfcompliant.nc'
[2]:
# Open Canopex dataset using DAP link
ds = xr.open_dataset(CANOPEX_DAP)
ds
[2]:
xarray.Dataset
Dimensions:    (time: 22280, watershed: 5797)
Coordinates:
  * time           (time) datetime64[ns] 1950-01-01 ... 2010-12-31
  * watershed      (watershed) |S64 b'St. John River at Ninemile Bridge, Maine' ... b'KETTLE CREEK AT ST. THOMAS'
Data variables:
    drainage_area  (watershed) float64        drainage_area_at_river_stretch_outlet [km2]
    pr             (watershed, time) float64  precipitation_flux [kg m-2 s-1]
    tasmax         (watershed, time) float64  air_temperature, Daily Maximum Near-Surface Air Temperature [K]
    tasmin         (watershed, time) float64  air_temperature, Daily Minimum Near-Surface Air Temperature [K]
    discharge      (watershed, time) float64  water_volume_transport_in_river_channel [m3 s-1]
Attributes:
    title:          Hydrometeorological data for lumped hydrological modelling of 5797 catchments in North America
    institution:    ETS (École de technologie supérieure)
    institute_id:   ETS
    contact:        Richard Arsenault: richard.arsenault@etsmtl.ca
    date_created:   2020-08-01
    source:         Hydrometric data from USGS National Water Information Service and ECCC Water Survey Canada. Meteorological data from ECCC stations, NRCan 10km gridded interpolated dataset and Livneh 2014 database. Catchment areas from ECCC HYDAT, HydroSheds and USGS.
    featureType:    timeSeries
    cdm_data_type:  station
    license:        ODC-BY
    keywords:       hydrology, North America, streamflow, hydrometeorological, PAVICS, PAVICS-Hydro, modelling
    activity:       PAVICS_Hydro
    Conventions:    CF-1.6, ACDD-1.3
    summary:        Hydrometeorological database for the PAVICS-Hydro platform, including precipitation, temperature, discharge and catchment area to drive the RAVEN hydrological modelling framework. Provided by the HC3 Laboratory at École de technologie supérieure, Montréal, Canada.
[3]:
# Setup some parameters to run the models. See the "canopex.ipynb" notebook for more detailed information
# on these parameters. The data we use comes from the extended CANOPEX database.
start = dt.datetime(1998, 1, 1)
stop = dt.datetime(2010, 12, 31)
watershedID = 5600
[4]:
# With this info, we can gather some properties from the CANOPEX database:
tmp = pd.read_csv(TESTDATA['canopex_attributes'])
basin_area = tmp['area'][watershedID]
basin_latitude = tmp['latitude'][watershedID]
basin_longitude = tmp['longitude'][watershedID]
basin_elevation = tmp['elevation'][watershedID]
basin_name = ds.watershed[watershedID].data
print("Basin name: ", basin_name)
print("Latitude: ", basin_latitude, " °N")
print("Area: ", basin_area, " km^2")
Basin name: b'WHITEMOUTH RIVER NEAR WHITEMOUTH'
Latitude: 49.51119663557124 °N
Area: 3650.476384548832 km^2
Separating the discharge and meteorological data to recombine later¶
[5]:
# Define the two new files, i.e. the meteorological data and the streamflow data
filepathMet = os.getcwd() + "/CANOPEX_Met.nc"
filepathQobs = os.getcwd() + "/CANOPEX_Qobs.nc"

# Do the extraction for the selected catchment
newBasin = ds.isel(watershed=watershedID)

# Generate the streamflow time series netCDF file
Qobsfile = newBasin['discharge']
Qobsfile.to_netcdf(filepathQobs)

# Generate the meteorological time series netCDF file
newBasin = newBasin[['drainage_area', 'pr', 'tasmax', 'tasmin']]
newBasin.to_netcdf(filepathMet)
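As a quick sanity check after splitting, you can reopen the two files and confirm that each one contains the variables you expect. The sketch below reproduces the split on a small synthetic dataset (the variable names match the CANOPEX extraction, but the values and the temporary paths are made up for illustration):

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr

# A tiny stand-in for the extracted basin: same variable names, synthetic values.
time = pd.date_range("1998-01-01", periods=10)
ds = xr.Dataset(
    {
        "pr": ("time", np.linspace(0.0, 1e-4, 10)),
        "tasmax": ("time", np.full(10, 285.0)),
        "tasmin": ("time", np.full(10, 275.0)),
        "discharge": ("time", np.full(10, 50.0)),
        "drainage_area": ((), 3650.5),
    },
    coords={"time": time},
)

tmpdir = Path(tempfile.mkdtemp())
filepathMet = tmpdir / "CANOPEX_Met.nc"
filepathQobs = tmpdir / "CANOPEX_Qobs.nc"

# Split exactly as above: streamflow in one file, meteorology in the other.
ds["discharge"].to_netcdf(filepathQobs)
ds[["drainage_area", "pr", "tasmax", "tasmin"]].to_netcdf(filepathMet)

# Reopen each file and list its data variables.
met_vars = set(xr.open_dataset(filepathMet).data_vars)
qobs_vars = set(xr.open_dataset(filepathQobs).data_vars)
print(sorted(met_vars), sorted(qobs_vars))
```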
Here is where we run the model with multiple input time series¶
[6]:
# The model parameters. We are forcing these values just so the model runs; they are probably very poor choices!
params = '9.5019, 0.2774, 6.3942, 0.6884, 1.2875, 5.4134, 2.3641, 0.0973, 0.0464, 0.1998, 0.0222, -1.0919, ' \
'2.6851, 0.3740, 1.0000, 0.4739, 0.0114, 0.0243, 0.0069, 310.7211, 916.1947'
# Model configuration parameters. Please see "canopex.ipynb" for more details.
# This remains unchanged with multiple timeseries!
config = dict(
    start_date=start,
    end_date=stop,
    area=basin_area,
    elevation=basin_elevation,
    latitude=basin_latitude,
    longitude=basin_longitude,
    run_name='test_hmets_NRCAN',
    rain_snow_fraction='RAINSNOW_DINGMAN',
    nc_spec=json.dumps(
        {
            'tasmax': {'linear_transform': (1.0, -273.15)},
            'tasmin': {'linear_transform': (1.0, -273.15)},
            'pr': {'linear_transform': (86400.0, 0.0)},
        }
    ),
)
# Here is where we must tell the model that there are multiple input files. The idea is to combine
# them into a list of strings, with each string representing a path to a netCDF file:
ts_combined = [str(filepathMet), str(filepathQobs)]
resp = wps.raven_hmets(ts=ts_combined, params=params, **config)
# And get the response
# With `asobj` set to False, only the reference to the output is returned in the response.
# Setting `asobj` to True will retrieve the actual files and copy them locally.
[hydrograph, storage, solution, diagnostics, rv] = resp.get(asobj=True)
print(diagnostics)
observed data series,filename,DIAG_NASH_SUTCLIFFE,DIAG_RMSE,
HYDROGRAPH,/tmp/pywps_process_fufyza3l/CANOPEX_Qobs.nc,-0.12679,30.5267,
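The `linear_transform` entries in `nc_spec` tell Raven to rescale units on the fly: each raw value becomes `scale * value + offset`. A minimal sketch of the arithmetic behind the two conversions used above (the `linear_transform` helper below is written for illustration only, it is not a Raven function):

```python
def linear_transform(values, scale, offset):
    """Illustrative stand-in for Raven's linear transform: scale * value + offset."""
    return [scale * v + offset for v in values]

# Kelvin to degrees Celsius: (scale, offset) = (1.0, -273.15)
celsius = linear_transform([273.15, 293.15], 1.0, -273.15)

# Precipitation flux in kg m-2 s-1 to mm/day: (scale, offset) = (86400.0, 0.0),
# since 1 kg of water over 1 m^2 is a 1 mm layer and a day has 86400 s.
mm_per_day = linear_transform([1.0 / 86400.0], 86400.0, 0.0)

print(celsius, mm_per_day)  # ≈ [0.0, 20.0] °C and ≈ [1.0] mm/day
```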
You can even invert the order of the netCDF files: Raven detects which file contains which variables, so the order is not important!¶
[7]:
# Test with reversed time series files:
ts_combined = [str(filepathQobs), str(filepathMet)]
resp = wps.raven_hmets(ts=ts_combined, params=params, **config)
# And get the response
# With `asobj` set to False, only the reference to the output is returned in the response.
# Setting `asobj` to True will retrieve the actual files and copy them locally.
[hydrograph, storage, solution, diagnostics, rv] = resp.get(asobj=True)
print(diagnostics)
observed data series,filename,DIAG_NASH_SUTCLIFFE,DIAG_RMSE,
HYDROGRAPH,/tmp/pywps_process_6r8q6yk0/CANOPEX_Qobs.nc,-0.12679,30.5267,
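Since Raven matches each file's contents by variable name rather than by position in the list, the same mechanism extends to the fully separated case mentioned in the introduction, where every variable lives in its own file. A hypothetical sketch (the directory and file names are made up for illustration, and the `wps.raven_hmets` call is commented out because it requires a live server):

```python
from pathlib import Path

# Hypothetical directory holding one netCDF file per variable.
data_dir = Path("/path/to/forcings")

ts_separate = [
    str(data_dir / "precip.nc"),
    str(data_dir / "tasmax.nc"),
    str(data_dir / "tasmin.nc"),
    str(data_dir / "qobs.nc"),
]

# The list order does not matter: Raven inspects each file to find its variables.
# resp = wps.raven_hmets(ts=ts_separate, params=params, **config)
print(ts_separate)
```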