GLCsim {Rglimclim} | R Documentation |
This routine is used to simulate data from models of class GLC.modeldef, typically created via calls to GLCfit. The routine can generate univariate or multivariate sequences: the multivariate case is handled by consecutively simulating from a set of linked models in which there are no circular dependencies. Imputation (i.e. simulation conditioned on all available data values) can be performed, as well as unconditional simulation.
GLCsim(modeldefs, siteinfo, start, end, nsims, impute.until = end, output = c("daily", "monthly"), which.regions = 0, which.daily = 1:nsims, daily.start = start, daily.end = end, data.file, external.files, simdir, file.prefix, missval = -99.99)
modeldefs |
Either an object defining a model for a single variable (for univariate simulation), or a list of such objects, each defining a model for a different variable (multivariate simulation). For most variables, these individual model objects will be of class |
siteinfo |
A |
start |
Start date for simulation, in format |
end |
End date for simulation, similarly. The last day simulated is the last of the month. |
nsims |
Number of simulations to perform. |
impute.until |
A date, in the form |
output |
Chooses whether to produce daily output files, monthly output files or both (the default). |
which.regions |
A vector of integers containing the codes of regions for which monthly summaries should be produced if monthly outputs have been requested. NB region 0 is the entire area. For more on region definitions, see |
which.daily |
Vector of simulation numbers for which to produce daily output if this has been requested. Defaults to all simulations. |
daily.start |
Start date for daily output, in form |
daily.end |
End date for daily output. Default is |
data.file |
Name of data file from which to take data for initialisation and imputation. If this is not supplied, the routine will take data from the file that was used for fitting the model(s) in |
external.files |
Character vector of length 3, giving names of files from which to take "external" covariate data (yearly, monthly and daily) to drive the simulations (see the help to |
simdir |
Name of directory in which to store the output files. This will be interpreted as a pathname relative to the current working directory (see |
file.prefix |
Output files are named in a structured way as, for example, |
missval |
The value representing missing observations in the input file. The default of -99.99 is the same as that used in |
This routine is designed to be used with one or more models that have been fitted to a single univariate or multivariate dataset using GLCfit. The result of a call to GLCfit is a GLC.modeldef object which stores the name of the file containing the data used for model fitting in component $filenames$Data, and also stores the names of the variables in that file in component $var.names. If the argument data.file is not supplied, the simulation routine expects to find this data file in the current working directory; data from the file will be used to initialise simulations, and also for conditioning purposes when imputing missing values (see below for more on both of these points). All of the models defined in modeldefs must reference the same data file; failure to do this will lead to an error.
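The requirement that every model references the same data file can be checked before calling GLCsim. The following is a minimal sketch and not part of Rglimclim: the helper name same_data_file is hypothetical, plain lists stand in for GLC.modeldef objects, and only the $filenames$Data component described above is assumed. It also assumes that modeldefs is a list of models rather than a single model object.

```r
# Hypothetical helper (not part of Rglimclim): TRUE if every model definition
# names the same data file in its $filenames$Data component.
same_data_file <- function(modeldefs) {
  files <- vapply(modeldefs, function(m) m$filenames$Data, character(1))
  length(unique(files)) == 1
}

# Plain lists standing in for fitted GLC.modeldef objects (illustrative only)
mock_rain <- list(filenames = list(Data = "weather.dat"))
mock_temp <- list(filenames = list(Data = "weather.dat"))
mock_wind <- list(filenames = list(Data = "other.dat"))

ok  <- same_data_file(list(mock_rain, mock_temp))  # same file for both models
bad <- same_data_file(list(mock_rain, mock_wind))  # files differ
```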
The routine does not attempt to store the results of simulations internally within memory (see "Value" below for details of what is stored); instead, it writes ASCII files to the directory specified by the simdir argument. Control over these files is provided by the output, which.regions, which.daily, daily.start, daily.end and file.prefix arguments.
The data file may contain variables that are not required by the user when simulating. In this case, the output files will still contain columns corresponding to the non-simulated variables; thus the format of each of the daily output files is exactly the same as that of the original data file (see the data.file argument to GLCfit for details of this format).
The format of monthly output files is as follows:
There is one row for each year / region combination; rows are ordered by year, and within that by region.
Each row gives the year, region code and M sets of 13 simulated values, where M is the number of variables in the data file (including those that were not simulated). Each set of simulated values contains 12 monthly means for that variable, and an annual mean.
The FORTRAN format for reading each record is I4,1X,I3,1X,M(13(F6.2,1X)) where M is the number of variables in the data file.
Missing values (due to a lack of any non-missing daily values) are coded as missval. Note that variables that are not simulated will always be missing unless the routine is carrying out imputations.
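The monthly record layout described above can be read back into R with fixed-width input. This is a sketch under stated assumptions rather than a function supplied by Rglimclim: the helper name read_monthly and the column names are hypothetical, and the file used here is synthetic, written only to demonstrate the layout.

```r
# Hypothetical helper (not part of Rglimclim): read a monthly output file laid
# out as I4,1X,I3,1X,M(13(F6.2,1X)) and recode missval as NA.
read_monthly <- function(file, M, missval = -99.99) {
  # Positive widths are fields; negative widths are skipped blanks (the 1X items)
  widths <- c(4, -1, 3, -1, rep(c(6, -1), 13 * M))
  widths <- widths[-length(widths)]  # the trailing blank may be absent from a record
  out <- read.fwf(file, widths = widths)
  names(out) <- c("year", "region",
                  paste0("var", rep(seq_len(M), each = 13), ".",
                         rep(c(month.abb, "annual"), times = M)))
  value_cols <- -(1:2)               # everything except year and region
  out[value_cols] <- lapply(out[value_cols],
                            function(x) replace(x, x == missval, NA))
  out
}

# Build one synthetic record in the same layout (values are made up)
make_record <- function(year, region, values) {
  paste0(sprintf("%4d %3d ", year, region),
         paste(sprintf("%6.2f", values), collapse = " "))
}

tmp <- tempfile(fileext = ".dat")
writeLines(c(make_record(2000L, 0L, c(1:13, rep(-99.99, 13))),
             make_record(2001L, 0L, c(13:1, rep(-99.99, 13)))), tmp)

# M = 2 variables: the first simulated, the second entirely missing
monthly <- read_monthly(tmp, M = 2)
```

Each variable contributes 13 columns (12 monthly means followed by the annual mean), so the result has 2 + 13M columns.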
Most realistic models will contain lagged values of one or more variables as covariates. To initialise a simulation, therefore, values for these variables are needed for an appropriate number of days prior to the first day of the simulation. The routine takes these values from the data file if they are present; if not, it uses the overall mean value of each variable, as computed from the cases used to fit the model and stored in the model definition objects. This overall mean may not be a particularly realistic value: for example, in a region with an annual temperature range of 20 degrees, if a simulation is initialised in the middle of winter with the overall mean temperature then the initial values are likely to be around 10 degrees too high. In most practical applications, the effects of such initial condition errors are likely to be short-lived. Nonetheless, it is worth inspecting plots of the simulation results (see the plot method) to check that the period of interest is not affected by initial conditions. To ensure this, in some situations it may be helpful to start the simulations a few months prior to the period of interest. The argument daily.start can then be used to prevent the "start-up" values from being written to the daily output files.
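The temperature example above can be made concrete with a toy calculation. The sketch below is not the Rglimclim simulation algorithm: it assumes a simple model in which temperature is a seasonal mean plus a lag-1 autoregressive departure, with an illustrative dependence parameter phi = 0.8. All names in it are hypothetical.

```r
# Toy model, not the Rglimclim algorithm: temperature follows a seasonal mean
# plus an AR(1) departure. The random noise is omitted because it would be
# identical in both runs and cancels when they are compared.
phi  <- 0.8                       # assumed lag-1 dependence (illustrative)
days <- 1:60

# Seasonal mean with overall mean 10 and annual range 20; midwinter at d = 0
seasonal_mean <- function(d) 10 - 10 * cos(2 * pi * d / 365)

run_from <- function(x0) {
  x <- numeric(length(days))
  prev <- x0
  for (t in days) {
    x[t] <- seasonal_mean(t) + phi * (prev - seasonal_mean(t - 1))
    prev <- x[t]
  }
  x
}

from_overall_mean <- run_from(10)                # overall mean: 10 degrees too warm
from_winter_value <- run_from(seasonal_mean(0))  # realistic midwinter start
init_error <- from_overall_mean - from_winter_value
```

In this toy setting the initialisation error decays geometrically, as 10 * phi^t after t days, which is why discarding a start-up period is usually sufficient. How quickly it decays in a real model depends on the strength of the lagged dependence.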
Multivariate simulation cannot be carried out if there are direct or indirect circular dependencies between the variables. An example of a direct circular dependency would occur if the model for variable A included simultaneous (i.e. zero lag) values of B as covariates and vice versa. An indirect dependency would arise if A depended on B, B depended on C and C depended on A. The routine checks for circular dependencies, and terminates with an error message if any are found. Note that mutual dependence at lags greater than zero (e.g. A depends on the previous day's value of B, and B depends on the previous day's value of A) is not a problem.
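The kind of check described above can be sketched as a search for a valid simulation order. This is an illustration only, not the routine's own implementation: the helper sim_order is hypothetical, and it assumes that the zero-lag dependencies are supplied as a named list in which every dependency is itself one of the listed variables. Dependencies at lags greater than zero are simply left out of the list, since they cause no problem.

```r
# Hypothetical helper (not part of Rglimclim): given the zero-lag dependencies
# of each variable, return an order in which the variables can be simulated,
# or stop if a direct or indirect circular dependency exists.
sim_order <- function(deps) {
  remaining <- deps
  done <- character(0)
  while (length(remaining) > 0) {
    # A variable is ready once everything it depends on has been simulated
    is_ready <- vapply(remaining, function(d) all(d %in% done), logical(1))
    ready <- names(remaining)[is_ready]
    if (length(ready) == 0) {
      stop("circular dependency among: ",
           paste(names(remaining), collapse = ", "))
    }
    done <- c(done, ready)
    remaining <- remaining[setdiff(names(remaining), ready)]
  }
  done
}

# A depends on B, B depends on C, C depends on nothing: a valid chain
chain <- sim_order(list(A = "B", B = "C", C = character(0)))

# A -> B -> C -> A: the indirect circular dependency from the text
loop <- tryCatch(sim_order(list(A = "B", B = "C", C = "A")),
                 error = function(e) conditionMessage(e))
```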
As well as simulating sequences from the fitted models, the routine can (and will, unless explicitly prevented from doing so) perform random imputations of missing values in the data files: this is done by simulating, for each day, from the distribution of the missing data values conditioned both upon the covariates in the models (including lagged values that were either observed or have already been imputed) and upon the non-missing observations for that day. This provides a means of quantifying uncertainties in quantities of interest due to missing observations. To prevent the routine from carrying out any imputation (i.e. to ensure that the simulations run freely and are not conditioned upon any observations except during initialisation), set the argument impute.until to a date preceding start.
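The idea of simulating a missing value conditioned on a same-day observation can be illustrated with the simplest possible case. The sketch below assumes a bivariate normal distribution for two values on the same day, one missing and one observed; this is a stand-in chosen for clarity, and the algorithm actually used by the routine is the one described in the package manual. The function name and all parameter values are hypothetical.

```r
# Illustrative only: draw a missing value y1 given an observed y2 when
# (y1, y2) are bivariate normal with means mu, standard deviations sigma and
# correlation rho. Rglimclim's own imputation algorithm may differ.
impute_given <- function(n, y2, mu = c(0, 0), sigma = c(1, 1), rho = 0) {
  cond_mean <- mu[1] + rho * sigma[1] / sigma[2] * (y2 - mu[2])
  cond_sd   <- sigma[1] * sqrt(1 - rho^2)
  list(mean = cond_mean, sd = cond_sd,
       draws = rnorm(n, mean = cond_mean, sd = cond_sd))
}

set.seed(1)
imp <- impute_given(n = 5, y2 = 14, mu = c(10, 12), sigma = c(3, 4), rho = 0.8)
```

Each draw is one plausible value for the missing observation. The spread of the draws across repeated imputations is what allows the uncertainty due to missing data to be quantified.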
For more details on the algorithms used in the simulation and imputation routines, see the Appendices of the PDF package manual.
The routine returns a list object of class GLCsim, for which print and plot methods are available - see the object class documentation. Use the names command to find the names of the list components. Many of them just duplicate arguments to the routine as called, although there are a few extra ones as well - most of these are self-explanatory. The component RNGstate stores the state of the R random number generator on entry. This is a list containing two named elements: RNGkind and seed. RNGkind is the result of a call to RNGkind on entry, and seed is the value of .Random.seed on entry. This can be used to reinitialise the random number generator to the same state that was used to produce a particular simulation (see "Note" below) - although in most cases, this will be more conveniently achieved using a call to set.seed.
Daily simulation files can be large, so sometimes it may be necessary to delete them after use if storage space is limited. In this case, the simulations can always be recreated by resetting the random number generator and calling GLCsim again with exactly the same arguments as stored in the resulting object. In general, the recommended way to reset the random number generator is using a call to set.seed immediately before the call to GLCsim. For completeness however, the RNGstate component of a GLCsim object stores the values of both RNGkind and .Random.seed on entry.
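Both ways of resetting the random number generator can be demonstrated in base R. In the sketch below, runif stands in for a call to GLCsim, and the list named state is a hypothetical object built to mirror the two elements of RNGstate described above (RNGkind and seed); it is assumed, not confirmed here, that the stored components have the same form as the output of RNGkind() and the value of .Random.seed.

```r
# Recommended route: set the seed immediately before the run
set.seed(2024)
first <- runif(3)                 # stands in for a call to GLCsim

set.seed(2024)
again <- runif(3)                 # same seed, so the same values

# Alternative route: save the generator state on entry and restore it later.
# 'state' mimics the RNGstate component (an assumption about its layout).
state <- list(RNGkind = RNGkind(),
              seed = get(".Random.seed", envir = globalenv()))
run1 <- runif(3)

# Restore the kind first, then the seed: changing the kind reinitialises the seed
do.call(RNGkind, as.list(state$RNGkind))
assign(".Random.seed", state$seed, envir = globalenv())
run2 <- runif(3)                  # reproduces run1
```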
Richard Chandler (richard@stats.ucl.ac.uk)
Yang, C., Chandler, R.E., Isham, V. and Wheater, H.S. (2005). Spatial-temporal rainfall simulation using Generalized Linear Models. Water Resources Research 41, doi:10.1029/2004WR003739.
GLCfit for information on Rglimclim model objects; also documentation for GLCsim class methods.