Mendel - Iterative Hard Thresholding
A modern approach to analyzing data from genome-wide association studies (GWAS)
Package Features
- Analyze large GWAS datasets intuitively.
- Built-in support for PLINK binary files via SnpArrays.jl and VCF files via VCFTools.jl.
- Out-of-the-box parallel computing routines for q-fold cross-validation.
- Fits a variety of generalized linear models with any choice of link function.
- Computation directly on raw genotype files.
- Efficient handling of non-genetic covariates.
- Optional acceleration (debias) step to dramatically improve speed.
- Ability to explicitly incorporate weights for predictors.
- Ability to enforce within and between group sparsity.
- Naive genotype imputation.
- Estimates the nuisance parameter for negative binomial regression using a Newton or MM algorithm.
- Flexible enough to handle different data structures, and works well alongside other Julia packages.
Read our paper for more detail.
Supported GLM models and Link functions
MendelIHT borrows the distribution and link function implementations from GLM.jl and Distributions.jl.
Distribution | Canonical Link | Status |
---|---|---|
Normal | IdentityLink | $\checkmark$ |
Bernoulli | LogitLink | $\checkmark$ |
Poisson | LogLink | $\checkmark$ |
NegativeBinomial | LogLink | $\checkmark$ |
Gamma | InverseLink | experimental |
InverseGaussian | InverseSquareLink | experimental |
Examples of these distributions with their default parameter values are visualized in this post.
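As a rough illustration of the table above, the following Julia sketch builds each listed distribution with the default parameters of Distributions.jl and pairs it with the link from the table, using the link types exported by GLM.jl. The variable name `table` is purely illustrative, and nothing here calls MendelIHT itself; it only shows what the distribution and link objects look like.

```julia
using Distributions, GLM

# Distribution/link pairs copied from the table above.
# Each distribution uses the default parameters of Distributions.jl.
table = [
    (Normal(),           IdentityLink()),
    (Bernoulli(),        LogitLink()),
    (Poisson(),          LogLink()),
    (NegativeBinomial(), LogLink()),
    (Gamma(),            InverseLink()),
    (InverseGaussian(),  InverseSquareLink()),
]

for (d, l) in table
    μ = mean(d)             # mean under the default parameters
    η = GLM.linkfun(l, μ)   # linear predictor implied by that mean, η = g(μ)
    println(rpad(string(typeof(d)), 40), " μ = ", μ, "  η = ", η)
end
```

Any distribution in the table can be combined with a non-canonical link from the list of available link functions in the same way, by swapping the second element of a pair.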
Available link functions
- CauchitLink
- CloglogLink
- IdentityLink
- InverseLink
- InverseSquareLink
- LogitLink
- LogLink
- ProbitLink
- SqrtLink
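To make the role of a link function concrete, here is a minimal sketch that applies each link listed above to a mean value and then inverts it. It relies only on `GLM.linkfun` and `GLM.linkinv` from GLM.jl; the value `μ = 0.3` is an arbitrary choice, picked inside (0, 1) so that it is a valid mean for every link in the list.

```julia
using GLM

# Link types listed above, as exported by GLM.jl.
links = [CauchitLink(), CloglogLink(), IdentityLink(), InverseLink(),
         InverseSquareLink(), LogitLink(), LogLink(), ProbitLink(), SqrtLink()]

μ = 0.3  # arbitrary mean in (0, 1), valid for every link above

for l in links
    η = GLM.linkfun(l, μ)        # η = g(μ), the linear predictor scale
    μ_back = GLM.linkinv(l, η)   # μ = g⁻¹(η), back on the mean scale
    @assert isapprox(μ_back, μ; atol = 1e-8)
    println(rpad(string(typeof(l)), 20), " g(μ) = ", round(η; digits = 4))
end
```

The round trip through `linkfun` and `linkinv` recovers the original mean for each link, which is the property a GLM fit depends on when mapping between the linear predictor and the mean.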
Manual Outline
- Getting started
- Examples
  - Using MendelIHT.jl
  - Parallel computing
  - Example 1: GWAS with PLINK files
  - Example 2: How to simulate data
  - Example 3: Logistic/Poisson/Negative-binomial GWAS
  - Example 4: Running IHT on general matrices
  - Example 5: Group IHT
  - Example 6: Linear Regression with prior weights
  - Example 7: Multivariate IHT
  - Other examples and functionalities
- Details of Parameter Estimation
  - Generalized linear models
  - Loglikelihood, gradient, and expected information
  - Iterative hard thresholding
  - Nuisance parameter estimation
- Contributing
- API