TSTool / Command / FillMixedStation

Overview
Command Editor
Command Syntax
Examples
Troubleshooting
See Also

Overview

The FillMixedStation command fills missing data in a time series by using one or more independent time series in sequence. This approach was developed to automate analysis of regression filling and to facilitate batch filling of many related time series. The implementation is based on the Mixed Station Model implemented for Colorado’s Decision Support Systems (Ayres Associates, 2000), which was based on the similarly named approach implemented by the USGS (Alley and Burns, 1981). However, because TSTool performs calculations in double precision, its results are not always identical to those of the earlier implementations.

The time series involved in the analysis are typically related, such as being from nearby locations in a region. The main uses of the command are:

To automatically fill every time series in a data set, using other time series in the data set. For example, for hydrologic modeling natural flow time series may have been estimated by processing measured streamflow, diversion, and reservoir time series. The natural flow time series can be filled for use in modeling.
To perform an analysis without filling, to guide application of individual FillRegression and other commands.

Important: TSTool does not automatically exclude time series that have been filled in previous steps. Consequently, care must be taken when specifying the list of independent time series to NOT use time series that were filled in a previous step. In the future, features may be enabled to examine the data flags to determine if data have been filled.

For each dependent time series being filled, the Mixed Station Analysis (MSA) selects the independent time series and parameters that result in the best filling results, considering combinations of the following:

The list of independent time series being considered can be constrained to a subset of available time series.
Filling methods include ordinary least squares (OLS) regression (see the FillRegression command for details). Support for MOVE2 may be added in the future (see the FillMOVE2 command for details).
One equation and/or monthly equations can be used.
The data can be transformed using log10, or no transformation can be applied.
A minimum number of overlapping data points (sample size N1) can be specified to indicate a valid relationship.
A minimum correlation coefficient r can be specified to indicate a valid relationship.
A minimum confidence level for the slope of the regression line can be specified (see T-Test discussion below).
The best fit indicator can be the correlation coefficient (R), or the standard error of prediction (SEP, described below).
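
The confidence-level criterion in the list above can be illustrated with the standard t statistic for the slope of a simple linear regression. This is a minimal sketch, not TSTool's code: the function name `slope_t_statistic` is hypothetical, a two-sided test is assumed, and the critical value is taken from a standard Student's t table rather than computed.

```python
import math

def slope_t_statistic(r, n):
    """t statistic for testing slope == 0, given correlation r and n overlapping pairs."""
    # Standard identity for simple linear regression: t = r * sqrt(n - 2) / sqrt(1 - r^2)
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Hypothetical relationship: r = 0.8 from 10 overlapping values (8 degrees of freedom).
t = slope_t_statistic(0.8, 10)

# Two-sided 95% critical value for 8 degrees of freedom, from a standard t table.
# Assumption: the command's test is two-sided; the source does not say.
T_CRIT_95_DF8 = 2.306

passes = abs(t) > T_CRIT_95_DF8
print(round(t, 3), passes)
```

A relationship whose statistic does not exceed the critical value would be rejected under this criterion.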

Because extensive analysis may be necessary to evaluate all the combinations of parameters, the FillMixedStation command will be slower than other commands that specifically indicate how to perform the filling. The number of combinations can also be limited by reducing the number of parameter options and using stricter limitations on the number of overlapping points and correlation coefficients that are required for a good regression result.
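
To give a feel for how quickly the number of regressions grows, the following sketch enumerates the cases for one dependent time series. It is illustrative only: the function name and station identifiers are hypothetical, and it assumes that monthly equations require one regression per calendar month.

```python
from itertools import product

def parameter_combinations(independent_ids, equations, transformations):
    """Yield each (independent, equation, month, transformation) case to analyze."""
    for ts_id, equation, transform in product(independent_ids, equations, transformations):
        if equation == "MonthlyEquations":
            # Assumed: one regression per calendar month.
            for month in range(1, 13):
                yield (ts_id, equation, month, transform)
        else:
            # A single equation covers all months.
            yield (ts_id, equation, None, transform)

combos = list(parameter_combinations(
    ["station_A", "station_B", "station_C"],   # hypothetical independent time series
    ["OneEquation", "MonthlyEquations"],
    ["None", "Log"],
))
print(len(combos))
```

With three independent time series, both equation options, and both transformations, this yields 3 x 2 x (1 + 12) = 78 regressions for a single dependent time series, which is why restricting the options shortens the run.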

The full MSA process is as follows:

For each dependent time series, perform a regression analysis for each unique combination of parameters (e.g., use an independent time series, OLS regression with one equation, no data transform). This produces one or more regression results for each dependent time series.
Qualifying results (those that meet the requirements of minimum number of overlapping points and correlation coefficient) are retained in a list for the dependent time series, for processing in the next step.
The qualifying results are used to estimate each missing value. Typically, the SEP is used to select the relationship to use (the one that has lowest SEP).
Missing data in the dependent time series are filled using the regression results for the selected relationship. If missing values remain, the next highest ranking regression result is used until all missing values are filled (or no additional qualifying regression results are available). When monthly equations are used, filling is performed separately for each of the 12 months. This approach may use different stations for each filled value, both because the goodness of fit differs between relationships and because a given station may or may not have data in the period to be filled.
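
The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the TSTool implementation: it uses a single equation with no transformation, represents missing values as `None`, and uses the root of the summed squared residuals as a stand-in for SEP. The names `ols` and `mixed_station_fill` are hypothetical.

```python
import math

def ols(x, y):
    """Return (slope, intercept, r, sep) for paired samples, or None if degenerate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return None  # no variance, so no usable relationship
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    # Assumption: simplified stand-in for SEP (root of summed squared residuals).
    sep = math.sqrt(sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)))
    return slope, intercept, r, sep

def mixed_station_fill(dependent, independents, min_count=10, min_r=0.0):
    """Fill None values in dependent, trying qualifying relationships in SEP order."""
    results = []
    for name, indep in independents.items():
        # Step 1: regress on the overlapping (non-missing) values.
        pairs = [(x, y) for x, y in zip(indep, dependent) if x is not None and y is not None]
        if len(pairs) < min_count:
            continue
        xs, ys = zip(*pairs)
        fit = ols(xs, ys)
        # Step 2: keep only qualifying results.
        if fit is None or fit[2] < min_r:
            continue
        slope, intercept, r, sep = fit
        results.append((sep, name, slope, intercept))
    # Step 3: rank by the best-fit indicator (lowest SEP first).
    results.sort(key=lambda item: item[0])
    # Step 4: fill, falling back to lower-ranked relationships where data are missing.
    filled, sources = list(dependent), [None] * len(dependent)
    for sep, name, slope, intercept in results:
        indep = independents[name]
        for i, value in enumerate(filled):
            if value is None and indep[i] is not None:
                filled[i] = intercept + slope * indep[i]
                sources[i] = name
    return filled, sources

dependent = [2.0, 4.1, 6.0, None, 10.0, None]
independents = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, None],
    "B": [1.0, 2.5, 2.0, 3.0, 6.0, 7.0],
}
filled, sources = mixed_station_fill(dependent, independents, min_count=4)
print(sources)
```

In this made-up data set, station A has the better fit and fills the first gap, while the second gap falls back to station B because A has no value at that position.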

Implementation in Colorado’s Decision Support Systems

The Mixed Station Model implemented for the State of Colorado typically used the following input:

Log transform (Transformation=Log)
One and monthly relationships (NumberOfEquations=MonthlyEquations,OneEquation)
Rank on SEP (BestFitIndicator=SEP)
Minimum concurrent values = 5 (MinimumDataCount=5)
Confidence level = 95% (ConfidenceInterval=95)
Fill all time series in data set (nothing selected in filling)

Command Editor

The following dialog is used to edit the command and illustrates the syntax of the command.

FillMixedStation Command Editor for Analysis Parameters (see also the full-size image)

FillMixedStation Command Editor for Criteria Parameters (see also the full-size image)

FillMixedStation Command Editor for Fill Parameters (see also the full-size image)

FillMixedStation Command Editor for Output Parameters (see also the full-size image)

Command Syntax

The command syntax is as follows:

FillMixedStation(Parameter="Value",...)

Command Parameters

Parameter	Description	Default
`DependentTSList`	Indicates the list of dependent time series to be processed, one of: `AllMatchingTSID` – all time series that match the TSID (single TSID or TSID with wildcards) will be processed. `AllTS` – all time series before the command will be processed. `EnsembleID` – all time series in the ensemble will be processed. `FirstMatchingTSID` – the first time series that matches the TSID (single TSID or TSID with wildcards) will be processed. `LastMatchingTSID` – the last time series that matches the TSID (single TSID or TSID with wildcards) will be processed. `SelectedTS` – the time series selected with the `SelectTimeSeries` command will be processed.	`AllTS`
`DependentTSID`	The time series identifier or alias for the dependent time series to be processed, using the `*` wildcard character to match multiple time series.	Required if `DependentTSList=*TSID`.
`IndependentTSList`	Indicates the list of independent time series to be considered for each dependent time series, one of: `AllMatchingTSID` – all time series that match the TSID (single TSID or TSID with wildcards) will be processed. `AllTS` – all time series before the command will be processed. `EnsembleID` – all time series in the ensemble will be processed. `FirstMatchingTSID` – the first time series that matches the TSID (single TSID or TSID with wildcards) will be processed. `LastMatchingTSID` – the last time series that matches the TSID (single TSID or TSID with wildcards) will be processed. `SelectedTS` – the time series selected with the `SelectTimeSeries` command will be processed.	`AllTS`
`IndependentTSID`	The time series identifier or alias for the independent time series to be compared, using the `*` wildcard character to match multiple time series.	Required if `IndependentTSList=*TSID`.
`NumberOfEquations`	The number of equations to use for the analysis: `OneEquation` and/or `MonthlyEquations`.	`OneEquation`
`AnalysisMonth`	The month that data should be considered for.	All months
`Transformation`	Indicates how to transform the data before analyzing. Specify as None (no transformation) or `Log` (for `Log10`). If the Log option is used, zero and negative values in data are set to `.001`. Missing data are ignored. If multiple values are selected, separate with a comma and surround with double quotes.	None (no transformation)
`LEZeroLogValue`	Value to use for data values less than or equal to zero when using a log transformation. The Log10 of this value will be used in calculations. Use `Missing` to ignore those values entirely. Caution: this will set `0` as a missing value in the time series.	`.0010`
`Intercept`	Specify as 0 to force the intercept of the best-fit line through the origin. This is made available only for OLS regression analysis on untransformed data, to be consistent with the `FillRegression` command.	Do not force the intercept through zero.
`AnalysisStart`	The date/time to start the analysis, to focus on a period appropriate for analysis. For example, specify the unregulated period for streamflow.	Analyze the full period.
`AnalysisEnd`	The date/time to end the analysis.	Analyze the full period.
`BestFitIndicator`	Specifies the indicator to use when determining the best fit, one of: `R` – correlation coefficient (attempts to maximize). `SEP` – Standard Error of Prediction, defined as the square root of the sum of differences between the known dependent value and the value determined from the equation of best fit at the same point (attempts to minimize).	`SEP`
`MinimumDataCount`	The minimum number of overlapping data points that are required for a valid analysis (`N1` in `FillRegression` documentation). If the minimum count is not met, then the independent time series is ignored for the specific combination of parameters. For example, if monthly equations are used, the independent time series may be ignored for the specific month; however, it may still be analyzed for other months.	`10`
`ConfidenceLevel`	Required confidence level for the T-Test on the regression slope. Relationships not passing the test are not allowed for filling.	No limit on confidence level.
`Fill`	Indicates whether filling should occur (`True`) or just analyze to compute statistics (`False`). The latter is useful for testing combinations of statistics prior to actually performing filling. For example, use this command to analyze relationships, create an output table, and then use individual `FillRegression` commands for filling specific time series.	`True`
`FillStart`	The date/time to start filling, if other than the full time series period.	Fill the entire period.
`FillEnd`	The date/time to end filling, if other than the full time series period.	Fill the entire period.
`FillFlag`	A single character that will be used to flag filled data. `Auto` will show whether a monthly or single equation was used to fill, as well as what rank the equation used to fill was. `I` will show the location of the equation used to fill.	Filled values will not be flagged.
`FillFlagDesc`	Description for the fill flag, used in reports.	Automatically generated.
`MinimumR`	The minimum correlation coefficient required for a best fit. If the minimum is not met, then the results are not considered in the best fit ranking or filling.	No limit on R.
`TableID`	A table identifier for a table to receive output of the analysis. Note that creating the table requires a significant amount of memory, making it impractical for very large data sets.	Statistics are not written to the table. Refer to the log file for information.
`TableTSIDColumn`	The name of the column in the table that contains time series identifier information. This is used to match the table with time series being analyzed so that statistics can be written to the correct row.	Required if `TableID` is specified.
`TableTSIDFormat`	The specifier used to format the time series identifier in the TableTSIDColumn. The location part of the TSID, or the time series alias is typically used.	The alias will be used if available, or otherwise the full TSID will be used.
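
The interaction between `Transformation=Log` and `LEZeroLogValue` described in the table can be sketched as follows. This is an illustration of the documented behavior under assumptions, not TSTool's code: the function name `log10_transform` is hypothetical and missing values are represented as `None`.

```python
import math

def log10_transform(values, le_zero_log_value=0.001):
    """Apply log10, substituting le_zero_log_value for values <= 0.

    Pass le_zero_log_value="Missing" to treat values <= 0 as missing instead.
    """
    transformed = []
    for value in values:
        if value is None:
            transformed.append(None)  # missing data are ignored
        elif value <= 0:
            if le_zero_log_value == "Missing":
                transformed.append(None)
            else:
                transformed.append(math.log10(le_zero_log_value))
        else:
            transformed.append(math.log10(value))
    return transformed

default_result = log10_transform([100.0, 0.0, None, -5.0])
missing_result = log10_transform([100.0, 0.0, None, -5.0], le_zero_log_value="Missing")
print(default_result)
print(missing_result)
```

With the default substitution, zero and negative values enter the regression as the log10 of the small replacement value; with `Missing`, they are dropped from the analysis altogether, which reduces the overlapping sample size.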

Examples

See the automated tests.

The FillFlag parameter shows more information about how the time series were filled:

Example of using the I fill flag (see also the full-size image)

Example of using the Auto fill flag (see also the full-size image)

Various statistics for each combination of time series, such as count, mean, and standard deviation, can be output in a table:

Example of FillMixedStation output table (see also the full-size image)

More information on the statistics in this table can be found in the FillRegression command documentation.

The following example command file fills natural flow time series from a StateMod model file using the traditional CDSS parameters:

ReadStateMod(InputFile="np2008_BF.xbf")
FillMixedStation(BestFitIndicator=SEP,NumberOfEquations="MonthlyEquations,OneEquation",Transformation="Log",ConfidenceInterval=95,MinimumDataCount=5,MinimumR=0,FillFlag="i",TableID="stats",TableTSIDColumn="dependent")