Main changes
Model selection: as discussed (see below*). I split the `lmer` function into an `lmer_selection` function and an `lmer_tests` function.

The `lmer_selection` function implements model selection in the "old way" (BIC for the random effects + LRT backward selection for the fixed effects) but can be customised to only do the BIC part (or only the LRT part, or some different method). I used the existing R library `buildmer` - the backward selection process is super annoying to implement, no need to do it from scratch. I didn't keep the code from the original `lmer` function, as it required the user to specify all anova comparisons in advance, which is not what is normally done when using LRT model selection (i.e., anovas).

The `lmer_tests` function runs some `emmeans` and `emtrends` comparisons using the provided model. If more than one model is provided, it chooses which one to use based on BIC and then runs the tests on the selected model.

Several possible modelling strategies can be implemented using these two functions - it's best explained by examples, see examples/analysis_example.py (I put the script there for now, feel free to move it / remove it).
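To give an idea of the intended flow, here is a minimal sketch of how the two functions chain together. The import path, argument names (`formula`, `specs`) and column names are made up for illustration only - the real signatures are in the code and in examples/analysis_example.py.

```python
# Hypothetical sketch -- the import path, argument names and column names are
# assumptions for illustration; see examples/analysis_example.py for real usage.
import pandas as pd

from mypackage.stats import lmer_selection, lmer_tests  # hypothetical import path

data = pd.read_csv("measurements.csv")  # long format, one row per observation

# 1. Model selection: BIC over the random-effects structure and (optionally)
#    LRT backward selection over the fixed effects, via buildmer under the hood.
selected_model = lmer_selection(
    formula="score ~ age * diagnosis + (1 | subject)",  # maximal candidate model
    data=data,
    group_col="diagnosis",  # which column holds the group labels
    id_col="subject",       # which column holds the subject IDs
)

# 2. emmeans / emtrends comparisons on the selected model. If several candidate
#    models were passed, the one with the lowest BIC would be used for the tests.
results = lmer_tests(
    selected_model,
    data=data,
    specs=["diagnosis", "age:diagnosis"],  # hypothetical spec names
    group_col="diagnosis",
    id_col="subject",
)

# The output is a dictionary keyed by the requested specs.
for spec, table in results.items():
    print(spec)
    print(table)
```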
Fitting a new model for each pairwise comparison (group1 vs. HC, group2 vs. HC, ...) is not good practice. Better to fit a single model with the group as a multilevel factor, and then implement pairwise comparisons as a post-hoc analysis.
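Purely to illustrate the principle (using statsmodels rather than the lmer-based code in this PR, and assuming columns Y, Age, Group, ID with group levels HC / group1 / group2; the exact contrast string may need adjusting to the factor coding):

```python
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("measurements.csv")  # assumed columns: Y, Age, Group, ID

# One mixed model with Group as a multi-level factor (HC as the reference level)
# and a random intercept per subject -- not one model per pairwise comparison.
fit = smf.mixedlm("Y ~ Age + C(Group)", data, groups=data["ID"]).fit()

# Pairwise comparison between the two patient groups as a post-hoc contrast
# on the single fitted model.
contrast = fit.t_test("C(Group)[T.group1] - C(Group)[T.group2] = 0")
print(contrast)
```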
The output of `lmer_tests` is now a dictionary where the keys are the different specs required by the user.

I removed `other_col`. The code gets the variables from the formulas directly, so the user doesn't have to specify things twice -- they just give the formulas they need and the data gets checked for those variables. The names of the group and ID variables can be specified by the user through `group_col` and `id_col` (instead of requiring them to be 'Group' and 'ID'), so they don't have to rename them in their data. There is also no need to rename `data_col` to 'Y'.
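The "variables come from the formulas" check is roughly the following idea (a simplified sketch, not the actual implementation; it assumes plain lme4-style formula strings with no function calls inside):

```python
import re

import pandas as pd


def formula_variables(formula: str) -> set[str]:
    """Collect the variable names appearing in an lme4-style formula string."""
    # "score ~ age * diagnosis + (1 | subject)" -> {"score", "age", "diagnosis", "subject"}
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", formula))


def check_formula_columns(data: pd.DataFrame, formula: str) -> None:
    """Raise if the data frame lacks any variable used in the formula."""
    missing = formula_variables(formula) - set(data.columns)
    if missing:
        raise ValueError(f"Data is missing columns required by the formula: {sorted(missing)}")


df = pd.DataFrame({"score": [1.0, 2.0], "age": [30, 40],
                   "diagnosis": ["HC", "group1"], "subject": ["s1", "s2"]})
check_formula_columns(df, "score ~ age * diagnosis + (1 | subject)")   # passes
# check_formula_columns(df, "score ~ age * sex + (1 | subject)")       # raises: missing "sex"
```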
Questions

- I don't get what `data_index` does. I left it as it was in the code.
- Force categorical data type for 'Sensor': is it necessary? It's a bit artificial, and very specific... If necessary, then we may add a `sensor_col` argument where the user can specify the name they used (if any) for the sensor column (see the small sketch after this list).
- Remove rows where `data_col` is zero - why?
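For reference, the sensor question above only concerns a dtype cast; a `sensor_col` argument (hypothetical name) would just parametrise which column gets cast:

```python
import pandas as pd

df = pd.DataFrame({"Sensor": ["MEG0111", "MEG0112", "MEG0111"], "Y": [0.1, 0.2, 0.3]})

sensor_col = "Sensor"  # hypothetical argument, defaulting to the currently hard-coded name
df[sensor_col] = df[sensor_col].astype("category")  # what "force categorical data type" amounts to
print(df.dtypes)
```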
(*) Model selection strategy:

Old:
New:
Motivation: iterative use of statistical inference for the selection of variables is a flawed approach to designing multivariable models, as it invalidates the assumptions of inference (which presuppose a fixed model, not one selected based on the data) and it induces selection bias. Also, with this approach, variables that are not “significantly” associated with the outcome (even though the model would be underpowered for at least some of them) are not accounted for, even if they are true confounders of the relationship of interest. Instead, a multivariable model should be built based on the known and anticipated causal relationships.