Contributing a Model • allometric

Motivated users are entirely capable of contributing models, provided they are comfortable using git pull requests.

Fork and then clone the models repository to your local enviroment.
Write a new file in the publications directory that
1. Establishes a citation for the originating publication of the model.
2. Instantiates the model (usually a ParametricModel or ModelSet) with associated metadata and prediction function.
Add and commit the new file to the remote forked repository from (1)
Submit a pull request to allometric/models via GitHub.

After the completion of step 4, your pull request will be tested and reviewed for completion, and will become part of the main package if it passes review.

Users that are not familiar with git can simply email me their model files at bfrank70@gmail.com. All other users are encouraged to submit models via pull requests

Establishing a Publication

allometric::Publication is a class that represents a given scientific article, report, or other technical document that contains allometric models.

Citations are established using RefManageR::BibEntry which is an R representation of a BibTex citation. For example, the BibEntry for the following publication

H. Temesgen, V. J. Monleon, and D. W. Hann. “Analysis and comparison of nonlinear tree height prediction strategies for Douglas-fir forests”. In: Canadian Journal of Forest Research 38.3 (2008), pp. 553-565.

paper_citation <- RefManageR::BibEntry(
  bibtype = "article",
  key = "temesgen_2008",
  title = "Analysis and comparison of nonlinear tree height prediction
        strategies for Douglas-fir forests",
  author = "Temesgen, Hailemariam and Monleon, Vicente J. and Hann, David W.",
  journal = "Canadian Journal of Forest Research",
  year = 2008,
  volume = 38,
  number = 3,
  pages = "553-565",
  year = 2008
)

many different types of publications are available, called “entry types”, and documentation for required fields are given here. The function that creates a Publication requires two arguments, the citation itself and an optional list of descriptors. The role of descriptors will become clear as we progress. For now, just know that descriptors is a named list of attributes that apply to all models in the publication.

For example, all models in this publication are for Douglas-fir, and were developed using data in the United States and Oregon. With this in mind, we establish the Publication object.

temesgen_2008 <- Publication(
  citation = paper_citation,
  descriptors = list(
    country = "US",
    region = "US-OR",
    taxa = Taxa(
      Taxon(
        family = "Pinaceae",
        genus = "Pseudotsuga",
        species = "menziesii"
      )
    )
  )
)

temesgen_2008 now represents a publication with citation information and metadata that describe aspects of the models inside. One problem, there are no models inside the publication yet, so we must add them.

Adding a Single Model

Models can be added one at a time. Other convenient ways of adding models are described in the next section, but let us start with just one model.

The publication we are using implements several fixed effects models that predict the heights of trees using various covariates. One of the functions the authors use is \(h = 1.37 + b_0 * (1 - \exp(b_1 * d)^{b_2})\) where \(h\) represents height and \(d\) represents the diameter outside bark. Note that the height is a function of the covariate \(d\) and a finite set of parameters, \(b_0\), \(b_1\) and \(b_2\). This is a perfect candidate for the FixedEffectsModel class.

temesgen_2008_model_one <- FixedEffectsModel(
  response = list(
    hst = units::as_units("m")
  ),
  covariates = list(
    dsob = units::as_units("cm")
  ),
  parameters = list(
    beta_0 = 51.9954,
    beta_1 = -0.0208,
    beta_2 = 1.0182
  ),
  predict_fn = function(dsob) {
    1.37 + beta_0 * (1 - exp(beta_1 * dsob)^beta_2)
  }
)

We can see that four arguments are used, response, covariates, parameters and predict_fn.

response - a named list containing one element, which is the name of the response variable. In this case hst indicates the model is the height of the stem using the total height. The value of this element is established by providing the units using the units package. In this case, the units are defined as meters.
covariates - a named list containing all covariates usied in the model. In this case, only one covariate is used called dsob which stands for the diameter of the stem, outside bark at breast height. Again, the units are defined, this time as centimeters.
parameters - a named list containing the parameters and their estimates.
predict_fn - this is the prediction function. Note that the body of the function is written using the parameters and covariates, but the only argument provided to the function is dsob, i.e., the one covariate the model uses. This is for the author’s convenience, and parameters are populated into the function at a later phase, so they do not need to be defined as arguments to predict_fn.

Finally, we add the model to the publication using add_model

temesgen_2008 <- add_model(temesgen_2008, temesgen_2008_model_one)

and print the summary for the publication

temesgen_2008

## Temesgen, Monleon, and Hann (2008)
## --- hst: 1 model set
## --------- 1 model

We can see that the publciation now contains one model set for the variable hst and that set contains one model.

Adding a Model Set

A frequent pattern in allometric modeling papers are “model sets”, these are sets of allometric models that have the same response, covariate and functional structure, but are fit for several species, age classes, etc. For example, Brackett (1977) fit 24 cubic volume functions for different species disaggregated by geographic region and species.

ModelSet is a class in allometric that efficiently deals with these types of model structures. For this example we use FixedEffectsSet which represents a set of fixed effects models. Model sets allow the author to import a table (e.g., a .csv) of model descriptions to quickly define large sets of ParametricModels.

For this task, consider the Brackett (1977) publication. We begin again defining the Publication.

bracket_1977_citation <- RefManageR::BibEntry(
  bibtype = "techreport",
  key = "brackett_1977",
  title = "Notes on tarif tree volume computation",
  author = "Brackett, Michael",
  year = 1977,
  institution = "State of Washington, Dept. of Natural Resources"
)

brackett_1977 <- Publication(
  citation = bracket_1977_citation,
  descriptors = list(
    country = "US",
    region = "US-WA"
  )
)

Here we can see the citation information and the shared descriptors. Note that in this case, descriptors only contains country and region. This is because each individual model in brackett_1977 is not uniform with respect to species as it was in temsegen_2008, so we will have to define those descriptors at the model-level.

Next, we load a data frame of model descriptions and examine it

model_specifications <- tibble::as_tibble(load_parameter_frame("vsa_brackett_1977"))
head(model_specifications)

We can see that model_specifications contains a number of individual models which are specified by column names referred to as descriptors. In this case, descriptors includes family, genus, species, geographic_region and age_class. The combination of these values uniquely identifies a model within this publication. family, genus and species are part of a standard set of descriptors called “search descriptors”, which get propagated to the searching functions described in the README. geographic_region and age_class are not search descriptors, but still help to clarify the purpose of the model within the publication.

Next, we instantiate the FixedEffectsSet, which looks very similar to FixedEffectsModel used in the previous section. The arguments are the same, but the implication is that response unit, covariate units and prediction_fn all apply uniformly to the models in the set.

brackett_1977_model_set <- FixedEffectsSet(
  response = list(
    vsa = units::as_units("ft3")
  ),
  covariates = list(
    dsob = units::as_units("in"),
    hst  = units::as_units("ft")
  ),
  parameter_names = c("a", "b", "c"),
  predict_fn = function(dsob, hst) {
    10^a * dsob^b * hst^c
  },
  model_specifications = model_specifications %>% aggregate_taxa()
)

Finally, we add the model set to the publication using add_set and print a summary to verify that the model was added.

brackett_1977 <- add_set(brackett_1977, brackett_1977_model_set)
brackett_1977

This summary shows the publication name, followed by a list of all model sets defined for the publication. We can see that there is 1 model set for vsa (volume of the stem including top and stump) that contains 24 models.

Where do Publications go from here?

Once a publication is added to the models to the publications directory the models contained within can now be installed by the user using install_models(). In fact, this is exactly how the user initially gains access to the allometric_models table. If you are ever interested in viewing the source code for a model, just find the matching .R file in publications, which can be viewed locally or on GitHub. The files used for this example are located at publications/temesgen_2008.R and publications/brackett_1977.R, respectively.