Fits the LDA and TS models for a single subset (e.g. the dataset with timestep 1 as the test timestep). To run this function on every subset, use fit_ldats_crossval.

ldats_subset_one(
  subsetted_dataset_item,
  k,
  lda_seed,
  cpts,
  nit,
  return_full = FALSE,
  cpt_seed = NULL
)

Arguments

subsetted_dataset_item

Result of subset_data_one: a list with elements $full, $train, $test, $test_timestep

k

integer Number of topics for the LDA model.

lda_seed

integer Seed for running LDA model. Only use even numbers (odd numbers duplicate adjacent evens).

cpts

integer Number of changepoints for the TS model.

nit

integer Number of iterations (draws from the posterior) for the TS model.

return_full

logical Whether to return fitted model objects and abundance probabilities in addition to logliks. Can be useful for diagnostics, but hogs memory. Default FALSE.

cpt_seed

integer Seed for the changepoint (cpt) model. If NULL (default), a seed is drawn at random and recorded in model_info.

Value

list. subsetted_dataset_item with the following appended: fitted_lda, fitted_ts, and abund_probabilities (populated if return_full = TRUE, otherwise NULL); test_logliks; model_info

Details

First, fits an LDA to the full (not subsetted) dataset. Then splits the matrix of topic proportions (gamma matrix) from that LDA into training/test subsets to match the subset. (The LDA is fit to the full dataset because LDAs fit to different subsets cannot be recombined in a logical way.)
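
The split of the gamma matrix can be illustrated in base R. This is only a sketch under stated assumptions: the gamma matrix below is made up (one row per timestep, one column per topic), the variable names are hypothetical, and only the single test timestep is withheld. The package's own subsetting may differ.

```r
# Hypothetical gamma matrix: 6 timesteps (rows) x 2 topics (columns).
# In practice this would come from the LDA fit to the full dataset.
set.seed(2)
raw <- matrix(runif(12), nrow = 6, ncol = 2)
gamma <- raw / rowSums(raw)          # normalize so each row sums to 1

test_timestep <- 1                   # assumed: the timestep withheld for testing

# Split the rows to mirror the train/test split of the data.
# drop = FALSE keeps the result a matrix even when it has a single row.
gamma_train <- gamma[-test_timestep, , drop = FALSE]
gamma_test  <- gamma[test_timestep, , drop = FALSE]

dim(gamma_train)
dim(gamma_test)
```

Because both pieces come from the same LDA fit, their topic columns refer to the same topics, which would not be guaranteed for LDAs fit separately to each subset.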

Then fits a TS model to the subsetted gamma matrix, with the specified number of iterations and changepoints.

Then extracts from that TS model the predicted abundances (multinomial probability distribution of species abundances) for each timestep. Because of the Bayesian component of the changepoint model, there is a matrix of predicted abundances per timestep for every draw from the posterior, so there are nit such matrices. Then calculates the loglikelihood of the test timestep given these predicted probabilities, which gives nit estimates of the loglikelihood.
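
The per-draw loglikelihood calculation can be sketched as follows. This is an illustration, not the package's code: the predicted probabilities and observed counts are made up, the variable names are hypothetical, and the use of stats::dmultinom is an assumption based on the multinomial description above.

```r
set.seed(4)
nit <- 5          # number of posterior draws (kept small for illustration)
n_species <- 3

# Hypothetical predicted abundance probabilities for the test timestep:
# one row per posterior draw, one column per species; each row sums to 1.
raw <- matrix(runif(nit * n_species), nrow = nit)
test_probs <- raw / rowSums(raw)

# Hypothetical observed species counts at the test timestep.
test_counts <- c(12, 3, 7)

# One multinomial loglikelihood per posterior draw.
test_logliks <- vapply(
  seq_len(nit),
  function(i) dmultinom(test_counts, prob = test_probs[i, ], log = TRUE),
  numeric(1)
)

length(test_logliks)
```

The result has one value per draw, so summarizing it (for example with mean or quantile) describes how well the model predicts the withheld timestep across the posterior.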

Returns the subsetted dataset item list provided, with the following appended: the LDA, TS, and abundance probabilities (if return_full = TRUE; otherwise these are NULL); the vector of loglikelihoods for the test timestep, one per iteration; and a list model_info with the model specifications (k, seed, cpts, nit).
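
The shape of the returned list can be sketched as below. The helper assemble_result is hypothetical and not part of the package, the field names inside model_info are assumed from the description above, and the way the changepoint seed is drawn is also an assumption.

```r
# Hypothetical helper: appends results to the input list as described above.
assemble_result <- function(item, fitted_lda, fitted_ts, abund_probabilities,
                            test_logliks, k, lda_seed, cpts, nit,
                            return_full = FALSE, cpt_seed = NULL) {
  if (is.null(cpt_seed)) {
    # Assumed way of drawing a seed; the package may do this differently.
    cpt_seed <- sample.int(1e6, 1)
  }
  # list(name = NULL) keeps the element with a NULL value, whereas
  # item$name <- NULL would not create it.
  c(item, list(
    fitted_lda          = if (return_full) fitted_lda else NULL,
    fitted_ts           = if (return_full) fitted_ts else NULL,
    abund_probabilities = if (return_full) abund_probabilities else NULL,
    test_logliks        = test_logliks,
    model_info          = list(k = k, lda_seed = lda_seed, cpts = cpts,
                               nit = nit, cpt_seed = cpt_seed)
  ))
}

# Usage with placeholder values standing in for real model objects.
res <- assemble_result(
  item = list(test_timestep = 1),
  fitted_lda = "lda placeholder", fitted_ts = "ts placeholder",
  abund_probabilities = "probs placeholder",
  test_logliks = c(-10.2, -9.8),
  k = 2, lda_seed = 2, cpts = 1, nit = 2
)
names(res)
```

With return_full = FALSE the three large objects are present but NULL, which keeps memory use low while leaving the list structure the same in both cases.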