ldats_subset_one.Rd
Runs on a single subset (e.g. the dataset with timestep 1 as the test timestep). Use fit_ldats_crossval to apply this function to every subset.
ldats_subset_one( subsetted_dataset_item, k, lda_seed, cpts, nit, return_full = FALSE, cpt_seed = NULL )
Argument | Description |
---|---|
subsetted_dataset_item | Result of subset_data_one, list with elements |
k | integer Number of topics for the LDA model. |
lda_seed | integer Seed for running LDA model. Only use even numbers (odd numbers duplicate adjacent evens). |
cpts | integer Number of changepoints for the TS model. |
nit | integer Number of iterations (draws from the posterior). |
return_full | logical Whether to return fitted model objects and abundance probabilities in addition to logliks. Can be useful for diagnostics, but hogs memory. Default FALSE. |
cpt_seed | integer Seed for the changepoint model. If NULL (default), a seed is drawn at random and recorded in model_info. |
list. subsetted_dataset_item with the following appended: fitted_lda, fitted_ts, and abund_probabilities (if return_full is TRUE, otherwise NULL); test_logliks; model_info.
First, fits an LDA to the full (not subsetted) dataset. Then splits the matrix of topic proportions (the gamma matrix) for that LDA into training and test subsets to match the subset. The LDA is fit to the full dataset because LDAs fit to different subsets cannot be recombined in a logical way.
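The training/test split of the gamma matrix can be illustrated with a minimal base R sketch. The matrix values, the object names (gamma, test_timestep, gamma_train, gamma_test), and the choice to hold out exactly one timestep are assumptions made for illustration; the package's own subsetting logic may differ.

```r
# Hypothetical gamma matrix: 6 timesteps (rows) x 2 topics (columns).
# In practice this would come from the LDA fit to the full dataset.
set.seed(2)
raw <- matrix(runif(12), nrow = 6, ncol = 2)
gamma <- raw / rowSums(raw)  # normalize so each row sums to 1

# Hypothetical subset definition: timestep 3 is held out as the test timestep.
test_timestep <- 3
train_timesteps <- setdiff(seq_len(nrow(gamma)), test_timestep)

# drop = FALSE keeps the matrix shape even when a single row is selected.
gamma_train <- gamma[train_timesteps, , drop = FALSE]
gamma_test <- gamma[test_timestep, , drop = FALSE]
```

Because both pieces are rows of the same gamma matrix, the topics in the training and test subsets share one definition, which is the reason given above for fitting the LDA once to the full dataset.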
Then fits a TS model to the subsetted gamma matrix, with the specified number of iterations and changepoints.
Then extracts from that TS model the predicted abundances (multinomial probability distribution of species abundances) for each timestep. Because of the Bayesian component of the changepoint model, there is a matrix of per-timestep predicted abundances for every draw from the posterior, giving nit matrices. Then calculates the loglikelihood of the test timestep given these predicted probabilities, which yields nit estimates of the loglikelihood.
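The loglikelihood step can be sketched in base R as follows. This assumes a plain multinomial likelihood evaluated with stats::dmultinom; the counts, the probabilities, and the object names test_counts and pred_probs are hypothetical, and the package's internal calculation may differ in detail.

```r
# Hypothetical observed species counts for the test timestep (4 species).
test_counts <- c(12, 3, 0, 5)

# Hypothetical predicted abundance probabilities for the test timestep:
# one row per posterior draw (nit = 3 here), one column per species.
pred_probs <- rbind(
  c(0.60, 0.15, 0.05, 0.20),
  c(0.50, 0.20, 0.10, 0.20),
  c(0.25, 0.25, 0.25, 0.25)
)

# Multinomial loglikelihood of the observed counts under each draw.
test_logliks <- apply(pred_probs, 1, function(p) {
  dmultinom(test_counts, prob = p, log = TRUE)
})

length(test_logliks)  # one loglikelihood estimate per posterior draw
```

Keeping one value per draw, rather than averaging immediately, preserves the posterior spread of the loglikelihood for later comparison across models.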
Returns the subsetted dataset item list provided, with the following appended: the LDA, TS, and abundance probabilities (if return_full = TRUE, otherwise NULL); the vector of loglikelihoods for the test timestep, one per iteration; and a list model_info with the model specifications (k, seed, cpts, nit).