Analyzing time series

Although the Raven server by no means offer a complete suite of tools for time series analysis, we strive to provide basic algorithms that can reduce the volume of data to be downloaded and analyzed locally. To this end, Raven offers a couple of indicators for stream flow series.

Here we’ll test those indicators on a simple test file with around ten years of daily streamflow generated by a Raven simulation.

[1]:
%matplotlib inline

from birdy import WPSClient
from example_data import TESTDATA
import os

# Set environment variable WPS_URL to "http://localhost:9099" to run on the default local server
url = os.environ.get("WPS_URL", "https://pavics.ouranos.ca/twitcher/ows/proxy/raven/wps")
wps = WPSClient(url)
fn = str(TESTDATA['simfile_single'])

Base flow index

The base flow index is the minimum 7-day average flow divided by the mean flow.

[2]:
help(wps.base_flow_index)
Help on method base_flow_index in module birdy.client.base:

base_flow_index(q=None, variable=None, freq='YS') method of birdy.client.base.WPSClient instance
    Return the base flow index, defined as the minimum 7-day average flow divided by the mean flow.

    Parameters
    ----------
    q : ComplexData:mimetype:`application/x-netcdf`
        NetCDF Files or archive (tar/zip) containing netCDF files.
    variable : string
        Name of variable to analyze in netCDF file.
    freq : {'YS', 'MS', 'QS-DEC', 'AS-JUL'}string
        Resampling frequency

    Returns
    -------
    output : ComplexData:mimetype:`application/x-netcdf`
        The indicator values computed on the original input grid.
    output_log : ComplexData:mimetype:`text/plain`
        Collected logs during process run.

The base flow index needs as input arguments the link to a NetCDF file storing the stream flow time series, the name of the stream flow variable, and the frequency at which the index is computed (YS: yearly, QS-DEC: seasonally).

[3]:
resp = wps.base_flow_index(fn, variable='q_sim')
out, log = resp.get(asobj=True)

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

out.base_flow_index.plot()
[3]:
[<matplotlib.lines.Line2D at 0x7f0b243cbda0>]
../_images/notebooks_time_series_analysis_5_1.png

To compute generic statistics of a time series, use the ts_stats process.

[4]:
help(wps.ts_stats)
Help on method ts_stats in module birdy.client.base:

ts_stats(da, op=None, variable=None, freq='YS', season=None, month=None) method of birdy.client.base.WPSClient instance
    Parameters
    ----------
    da : ComplexData:mimetype:`application/x-netcdf`
        NetCDF Files or archive (tar/zip) containing netCDF files.
    variable : string
        Name of variable to analyze in netCDF file.
    op : {'min', 'max', 'mean', 'std', 'var', 'count', 'sum', 'argmax', 'argmin'}string
        Operation name
    freq : {'YS', 'MS', 'QS-DEC', 'AS-JUL'}string
        Resampling frequency
    season : {'DJF', 'MAM', 'JJA', 'SON'}string
        Season selection specification.
    month : {'range(1, 13)'}string
        Month selection specification

    Returns
    -------
    output : ComplexData:mimetype:`application/x-netcdf`
        The indicator values computed on the original input grid.
    output_log : ComplexData:mimetype:`text/plain`
        Collected logs during process run.

[5]:
# Here we compute the annual summer (JJA) minimum
resp = wps.ts_stats(fn, variable='q_sim', op='min', season='JJA')
out, log = resp.get(asobj=True)
out.ts_stats.plot()
[5]:
[<matplotlib.lines.Line2D at 0x7f0b242e8358>]
../_images/notebooks_time_series_analysis_8_1.png

Frequency analysis

The process freq_analysis is similar to the previous stat sin the it fits a series of annual maxima or minima to a statistical distribution, and returns the values corresponding to different return periods.

[6]:
help(wps.freq_analysis)
Help on method freq_analysis in module birdy.client.base:

freq_analysis(da, mode, t, dist=None, variable=None, window=1, freq=None, season=None, month=None) method of birdy.client.base.WPSClient instance
    Parameters
    ----------
    da : ComplexData:mimetype:`application/x-netcdf`
        NetCDF Files or archive (tar/zip) containing netCDF files.
    variable : string
        Name of variable to analyze in netCDF file.
    mode : {'min', 'max'}string
        Mode
    t : integer
        Return period
    dist : string
        Distribution
    window : integer
        Window size
    freq : {'YS', 'MS', 'QS-DEC', 'AS-JUL'}string
        Resampling frequency
    season : {'DJF', 'MAM', 'JJA', 'SON'}string
        Season selection specification.
    month : {'range(1, 13)'}string
        Month selection specification

    Returns
    -------
    output : ComplexData:mimetype:`application/x-netcdf`
        The indicator values computed on the original input grid.
    output_log : ComplexData:mimetype:`text/plain`
        Collected logs during process run.

For example, computing the Q27, the minimum 7-days streamflow of reccurrence two years, can be done using the following.

[7]:
resp = wps.freq_analysis(fn, variable='q_sim', mode='min', t=2, dist='gumbel_r', window=7)
[8]:
out, log = resp.get(asobj=True)
out.freq_analysis
[8]:
<xarray.DataArray 'freq_analysis' (return_period: 1, nbasins: 1)>
array([[8.88304]])
Coordinates:
    basin_name     (nbasins) object ...
  * return_period  (return_period) int64 2
Dimensions without coordinates: nbasins
Attributes:
    units:          m^3 s-1
    long_name:      N-year return period min annual 7-day flow
    original_name:
    description:    Streamflow frequency analysis for the min annual 7-day fl...
    estimator:      Maximum likelihood
    scipy_dist:     gumbel_r
    standard_name:  discharge
    cell_methods:
    mode:           min
    history:        [2020-03-09 15:39:43] freq_analysis: freq_analysis(da, mo...

An array of return periods can be passed.

[9]:
resp = wps.freq_analysis(fn, variable='q_sim', mode='max', t=(2, 5, 10, 25, 50, 100), dist='gumbel_r')
out, log = resp.get(asobj=True)
out.freq_analysis.plot()
[9]:
[<matplotlib.lines.Line2D at 0x7f0b1db6cf28>]
../_images/notebooks_time_series_analysis_15_1.png

Getting the parameters of the distribution and comparing the fit

It’s sometimes more useful to store the fitted parameters of the distribution rather than storing only the quantiles. In the example below, we’re first computing the annual maxima of the simulated time series, then fitting them to a gumbel distribution using the fit process.

[10]:
resp = wps.ts_stats(fn, variable="q_sim", op="max")
ts = resp.get()[0]

resp = wps.fit(ts, dist='gumbel_r')
pa_fn = resp.get()[0]
pa = resp.get(True)[0]
pa.params
[10]:
<xarray.DataArray 'params' (dparams: 2, nbasins: 1)>
array([[154.27106 ],
       [ 44.071605]])
Coordinates:
    basin_name  (nbasins) object ...
  * dparams     (dparams) object 'loc' 'scale'
Dimensions without coordinates: nbasins
Attributes:
    units:
    long_name:      gumbel_r distribution parameters
    history:        [2020-03-09 15:39:46] ts_stats: ts_stats(da, op, freq='YS...
    cell_methods:   time: fit
    standard_name:  gumbel_r parameters
    description:    Parameters of the gumbel_r distribution fitted
    original_name:  discharge
    estimator:      Maximum likelihood
    scipy_dist:     gumbel_r

To see how the distribution defined by those parameters compare with the original data, the graph_fit process can help by creating a graphic showing an histogram of the data, overlayed with the fitted probability density function. The left y-axis displays the density (pdf) while the right y-axis displays the histogram count.

[11]:
resp = wps.graph_fit(ts=ts, params=pa_fn)
resp.get(asobj=True)[0]
[11]:
../_images/notebooks_time_series_analysis_19_0.png