JellyMe4
Purpose
One of the difficulties in transitioning to a new programming language is not just learning how to do things in the new language, but also adjusting to a different package ecosystem. RCall helps with both aspects when moving from R to Julia, at least when it comes to basic data manipulation. JellyMe4 takes advantage of RCall's extensibility to provide a way to transfer mixed-effects models between R's lme4 and Julia's MixedModels. This means that it is now possible to fit a model in Julia and then take advantage of existing R packages for examining the model, such as car and effects.
Installation
JellyMe4 is now registered in the Julia package registry and can be installed by
julia> using Pkg
julia> Pkg.add("JellyMe4")
or, in the pkg REPL,
(@v1.4) pkg> add JellyMe4
To get the latest pre-release features, you can install the development version:
(@v1.4) pkg> add JellyMe4#master
Generally speaking, the development version should work, but especially until version 1.0, there is no guarantee that there won't be breaking changes compared to the latest release.
Basic Usage
Fitting a model in R and moving it to Julia:
julia> using MixedModels, RCall, JellyMe4
julia> R"library(lme4)"
┌ Warning: RCall.jl: Loading required package: Matrix
└ @ RCall ~/.julia/packages/RCall/g7dhB/src/io.jl:113
RObject{StrSxp}
[1] "lme4" "Matrix" "stats" "graphics" "grDevices" "utils"
[7] "datasets" "methods" "base"
julia> R"summary(m <- lmer(Reaction ~ 1 + Days + (1+Days|Subject),sleepstudy,REML=FALSE))"
RObject{VecSxp}
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
Data: sleepstudy
AIC BIC logLik deviance df.resid
1763.9 1783.1 -876.0 1751.9 174
Scaled residuals:
Min 1Q Median 3Q Max
-3.9416 -0.4656 0.0289 0.4636 5.1793
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 565.48 23.780
Days 32.68 5.717 0.08
Residual 654.95 25.592
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 251.405 6.632 37.907
Days 10.467 1.502 6.968
Correlation of Fixed Effects:
(Intr)
Days -0.138
julia> @rget m
Linear mixed model fit by maximum likelihood
Reaction ~ 1 + Days + (1 + Days | Subject)
logLik -2 logLik AIC BIC
-875.96967 1751.93934 1763.93934 1783.09709
Variance components:
Column Variance Std.Dev. Corr.
Subject (Intercept) 565.476967 23.7797596
Days 32.681785 5.7167985 0.08
Residual 654.945706 25.5919070
Number of obs: 180; levels of grouping factors: 18
Fixed-effects parameters:
───────────────────────────────────────────────────
Estimate Std.Error z value P(>|z|)
───────────────────────────────────────────────────
(Intercept) 251.405 6.63212 37.9072 <1e-99
Days 10.4673 1.50223 6.96783 <1e-11
───────────────────────────────────────────────────
julia> typeof(m)
LinearMixedModel{Float64}
Fitting a model in Julia and moving it to R:
julia> using MixedModels, RCall, JellyMe4
julia> machines = rcopy(R"nlme::Machines")
54×3 DataFrames.DataFrame
│ Row │ Worker │ Machine │ score │
│ │ Categorical… │ Categorical… │ Float64 │
├─────┼──────────────┼──────────────┼─────────┤
│ 1 │ 1 │ A │ 52.0 │
│ 2 │ 1 │ A │ 52.8 │
│ 3 │ 1 │ A │ 53.1 │
│ 4 │ 2 │ A │ 51.8 │
│ 5 │ 2 │ A │ 52.8 │
│ 6 │ 2 │ A │ 53.1 │
⋮
│ 48 │ 4 │ C │ 64.0 │
│ 49 │ 5 │ C │ 72.1 │
│ 50 │ 5 │ C │ 72.0 │
│ 51 │ 5 │ C │ 71.1 │
│ 52 │ 6 │ C │ 62.0 │
│ 53 │ 6 │ C │ 61.4 │
│ 54 │ 6 │ C │ 60.5 │
julia> m = fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)), machines)
Linear mixed model fit by maximum likelihood
score ~ 1 + Machine + (1 + Machine | Worker)
logLik -2 logLik AIC BIC
-108.208914 216.417828 236.417828 256.307668
Variance components:
Column Variance Std.Dev. Corr.
Worker (Intercept) 13.81760447 3.7172039
Machine: B 28.68496515 5.3558347 0.49
Machine: C 11.24097825 3.3527568 -0.36 0.30
Residual 0.92463436 0.9615791
Number of obs: 54; levels of grouping factors: 6
Fixed-effects parameters:
───────────────────────────────────────────────────
Estimate Std.Error z value P(>|z|)
───────────────────────────────────────────────────
(Intercept) 52.3556 1.53437 34.1218 <1e-99
Machine: B 7.96667 2.20988 3.60502 0.0003
Machine: C 13.9167 1.40579 9.89956 <1e-22
───────────────────────────────────────────────────
julia> # a LinearMixedModel doesn't keep a copy of the original dataframe,
julia> # so we need to package the model up together with its data
julia> m_machines = (m, machines);
julia> @rput m_machines;
julia> R"summary(m_machines)"
RObject{VecSxp}
Linear mixed model fit by maximum likelihood ['lmerMod']
AIC BIC logLik deviance df.resid
236.4 256.3 -108.2 216.4 44
Scaled residuals:
Min 1Q Median 3Q Max
-2.40773 -0.51890 0.03227 0.45598 2.54091
Random effects:
Groups Name Variance Std.Dev. Corr
Worker (Intercept) 13.8176 3.7172
MachineB 28.6850 5.3558 0.49
MachineC 11.2410 3.3528 -0.36 0.30
Residual 0.9246 0.9616
Number of obs: 54, groups: Worker, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 52.356 1.534 34.122
MachineB 7.967 2.210 3.605
MachineC 13.917 1.406 9.900
Correlation of Fixed Effects:
(Intr) MachnB
MachineB 0.463
MachineC -0.374 0.301
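Once the model and its data are over in R as an lmerMod, the usual R tooling applies to it. As a sketch, continuing from the example above and assuming the car package is installed in your R library:

```julia
julia> using RCall

julia> R"library(car)"              # assumes car is available in the R library

julia> R"car::Anova(m_machines)"    # Wald chi-square tests on the transferred lmerMod
```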
Limitations and warnings
This is alpha software. It has some functionality that should work well for common use cases, and even a test suite, but that test suite depends on two different software environments (R and Julia) and can run afoul of all sorts of nasty version interactions. The test suite only tests against a single version of R and the current version of lme4. In other words, even the parts that work for me may not work for you.
Parts of the API aren't as nice as I would like them to be (especially in the Julia to R direction) and may change if I figure out something nicer.
Only a subset of the options available in MixedModels and lme4 is supported. Unsupported things should break in an obvious way, but here is a list of commonly used features that may break non-obviously:
- Custom contrast coding. If you really need this and you know what you're doing, then you can set up your own numeric variables representing the appropriate contrasts, as numeric variables survive the transition without difficulty. This should work if you're not doing anything weird with your interaction terms.
- Advanced models in either language (e.g., zerocorr! in Julia or || in R, which are not completely synonymous anyway). There's not really a trivial way to deal with this at the moment, sorry.
- Fancy transformations within model formulae, especially those that have different names (e.g., scale() in R). If in doubt, pre-transform your data in the dataframe before fitting. A few transformations are supported (e.g., log, exp) and you should receive an error for unsupported transformations, but it's always a good idea to compare the estimates of the original and copied models.
- Missing data is handled differently in R and Julia, both in its representation and in when it's eliminated. If in doubt, remove missing data before fitting models for consistency.
- Interactions specified in R with ^. This part of the parsing is handled by RCall and winds up being translated into simple exponentiation instead of "interactions below a certain order". Consider using the simplify.formula function from the MuMIn package in R to reduce your formula to primitive operations before calling lme4.
- Other R-specific extensions to Wilkinson-Rogers notation. This includes %in% for nesting (/ for nesting, with reversed order, should work because lme4 and MixedModels both handle it specially), I() for literal interpretation of arithmetic, and offset().
- All the different ways to specify binomial data in R. Just stick to Bernoulli models with a numeric 0-1 response if you can.
- GLMMs with dispersion parameters. This should error when unsupported model types are encountered. It is intentionally not supported at the moment because there are known problems with these models in MixedModels.jl.
- Getting binomial GLMMs from R with any of the alternative specifications, e.g. cbind(successes, failures) or successes/total. You will receive a gentle reminder in the form of an error. Note that even once this is supported, you're subject to the constraints above.
- Getting GLMMs with quasi families from R. The corresponding distributions aren't in Distributions.jl and are only "quasi" in the GLM/GLMM framework.
- A number of scratch variables prefixed with jellyme4_ are created in R, since we need scratch variables when moving things to R.
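Several of the workarounds above amount to doing the work in the dataframe rather than in the formula. A minimal sketch of that idea, using a made-up dataframe df with response y, predictor x, and grouping variable g (none of these come from the package itself):

```julia
using DataFrames

# hypothetical data with a missing response value
df = DataFrame(y = [1.0, 2.0, missing, 4.0],
               x = [10.0, 20.0, 30.0, 40.0],
               g = ["a", "a", "b", "b"])

# remove missing data before fitting, for consistency between R and Julia
df = dropmissing(df)

# pre-transform in the dataframe instead of using scale()/etc. in the formula
df.log_x = log.(df.x)

# then fit as usual, referencing the pre-computed column:
# fit(MixedModel, @formula(y ~ 1 + log_x + (1 | g)), df)
```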
Finally, to work its magic, this package hooks into MixedModels and lme4 internals. lme4's internals are quite stable now, and several other R packages (ordinal, gamm4, robustlmm) already take advantage of this. MixedModels is also fairly stable for the core functionality, but not for everything. I am a contributor to MixedModels, so I generally know when something is going to change, but things may still break with changing MixedModels versions, especially across major version changes.
Where does the name come from?
A really bad pun.
If you take lme4, add a J for Julia, and try to say the resulting string quickly, it sounds a bit like the package name.
Sorry, not sorry.
Acknowledgements
The development of this package was supported by the Center for Interdisciplinary Research, Bielefeld (ZiF) Cooperation Group "Statistical models for psychological and linguistic data".