Functions and Types

Flexible Multivariate Linear Mixed Models (flxMLMM)

FlxQTL.FlxQTLModule
FlxQTL

flexible Multivariate Linear Mixed Model based QTL analysis tools for structured multiple traits.

FlxQTL.flxMLMMModule
flxMLMM

A module designed for fitting a Multivariate Linear Mixed Model run by Nesterov's Accelerated Gradient with restarting scheme incorporated with Expectation Conditional Maximization.

The model:

$Y=XBZ'+R+E$, where $E(vec(Y))= (Z \otimes X)vec(B)$, $var(vec(Y))= \tau^2 K_G \otimes K_C + I_n \otimes \Sigma$

FlxQTL.flxMLMM.K2EigFunction
  K2Eig(Kg,Kc::Array{Float64,2},LOCO::Bool=false)

Returns a two pairs of eigenvectors and eigenvalues for genetic and climatic relatedness matrices.

Arguments

  • Kg : A matrix of a genetic kinship, or 3-d array of that if LOCO sets to be true.
  • Kc : A matrix of a climatic relatedness.
  • LOCO : Boolean. Default is false (no LOCO). (Leave One Chromosome Out). LOCO is only connected to the genetic kinship (Kg).

Output

  • Tg : A matrix of eigenvectors for Kg, or 3-d array of eigenvectors if LOCO sets to be true.
  • λg : A vector of eigenvalues for Kg, or matrix of eigenvalues if LOCO sets to be true.
  • Tc : A matrix of eigenvectors for Kc.
  • λc : A vector of eigenvalues for Kc

See K2eig.

Examples

For a genetic kinship calculated under LOCO (3-d array of kinship),

 Tg,λg,Tc,λc = K2Eig(Kg,Kc,true)

produces a 3-d array of Tg, matrices of λg, Tc, and a vector of λc.

FlxQTL.flxMLMM.K2eigFunction
   K2eig(K,LOCO::Bool=false)

Returns eigenvectors and eigenvalues of a (genetic, climatic) relatedness, or 3-d array of these of a genetic relatedness if LOCO is true.

Arguments

  • K : A matrix of (genetic or climatic) relatedness (Default). 3-d array of genetic relatedness (LOCO sets to be true.)
  • LOCO : Boolean. Default is false (no LOCO). (Leave One Chromosome Out).

Output

  • T : A matrix of eigenvectors, or 3-d array of eigenvectors if LOCO sets to be true.
  • λ : A vector of eigenvalues, or matrix of eigenvalues if LOCO sets to be true.

See also K2Eig.

Examples

For a (climatic) relatedness, or genetic relatedness for LOCO =false,

 T, λ = K2eig(K)

produces a matrix of T and a vector of λ.

For a genetic kinship calculated under LOCO (3-d array of kinship),

 T, λ = K2eig(K,true)

produces a 3-d array of T and a matrix of λ.

FlxQTL.flxMLMM.envScanFunction
  envScan(Midx::Array{Int64,1},cross::Int64,Tg,Tc::Array{Float64,2},Λg,λc::Array{Float64,1},
    Y0::Array{Float64,2},XX::Markers,Z0::Array{Float64,2},LOCO::Bool=false;
            Xnul::Array{Float64,2}=ones(1,size(Y0,2)),itol=1e-4,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)

Implement environment scan conditional on a genetic marker of interest (QTL). Each of q trait covariate data is scanned (regressed) given a major QTL selected from genome scan, geneScan to obtain LOD scores.

Arguments

  • Midx : A vector of genetic marker indices (major QTL) selected based on LOD scores from geneScan.

  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.

  • Tg : A n x n matrix of eigenvectors from K2eig, or K2Eig. Returns 3d-array of eigenvectors as many as Chromosomes if LOCO is true.

  • Tc : A m x m matrix of eigenvectors from climatic relatedness matrix.

  • Λg : A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues if LOCO is true.

  • λc : A m x 1 vector of eigenvalues from climatic relatedness matrix. Use ones(m) for no climatic information added.

  • Y0 : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y0[1,:] (a vector) ->Y[[1],:] (a matrix) .

  • XX : A type of Markers.

  • Z0 : A m x q matrix of low-dimensional trait covariate data for environment scan, i.e. minimum or maximum monthly temperature data, monthly photoperiod data, etc.

  • LOCO : Boolean. Default is false (no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out).

Keyword Arguments

  • Xnul : A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.
  • itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
  • tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
  • tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
  • ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.

!!! Note

  • When some LOD scores return negative values, reduce tolerences for ECM to tol0 = 1e-4. It works in most cases. If not, can reduce both tol0 and tol to 1e-4 or further.

Output

  • LODs : A vector of LOD scores by envrionment scan when including each major QTL.
  • B : A 3-d array of B (fixed effects) matrices under H1: existence of an environment factor (covariate) conditional on a major QTL.
  • est0 : A type of EcmNestrv.Approx including parameter estimates under H0: no environment factor conditional on a major QTL.
FlxQTL.flxMLMM.gene2ScanFunction
gene2Scan(cross::Int64,Tg,Tc::Array{Float64,2},Λg,λc::Array{Float64,1},
    Y0::Array{Float64,2},XX::Markers,Z0::Array{Float64,2},LOCO::Bool=false;ρ=0.001,Xnul::Array{Float64,2}=ones(1,size(Y0,2))
    ,itol=1e-4,tol0=1e-3,tol::Float64=1e-4)
gene2Scan(cross::Int64,Tg,Λg,Y0::Array{Float64,2},XX::Markers,LOCO::Bool=false;
                  Xnul::Array{Float64,2}=ones(1,size(Y0,2)),itol=1e-4,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)

Implement 2d-genome scan with/without LOCO (Leave One Chromosome Out). Note that the second gene2Scan() is based on a conventional MLMM:

\[vec(Y) \sim MVN((Z \otimes X)vec(B) (or XBZ') , K \otimes \Sigma_1 +I \otimes \Sigma_2),\]

where K is a genetic kinship, $\Sigma_1, \Sigma_2$ are covariance matrices for random and error terms, respectively. Z can be replaced with an identity matrix.

Arguments

  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
  • Tg : A n x n matrix of eigenvectors from K2eig, or K2Eig. Returns 3d-array of eigenvectors as many as Chromosomes if LOCO is true.
  • Tc : A m x m matrix of eigenvectors from climatic relatedness matrix.
  • Λg : A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues if LOCO is true.
  • λc : A m x 1 vector of eigenvalues from climatic relatedness matrix. Use ones(m) for no climatic information added.
  • Y0 : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y0[1,:] (a vector) -> Y[[1],:] (a matrix) .
  • XX : A type of Markers.
  • Z0 : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If nothing to insert in Z0, just insert an identity matrix, Matrix(1.0I,m,m). m traits x q phenotypic covariates.
  • LOCO : Boolean. Default is false (no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out).

Keyword Arguments

  • Xnul : A matrix of covariates. Default is intercepts (1's). Unless adding covariates, just leave as it is. See geneScan.
  • itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
  • tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
  • tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
  • ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.

!!! Note

  • When some LOD scores return negative values, reduce tolerences for ECM to tol0 = 1e-4. It works in most cases. If not, can reduce both tol0 and tol to 1e-4 or further.

Output

  • LODs : LOD scores. Can change to $- \log_{10}{P}$ using lod2logP.
  • est0 : A type of EcmNestrv.Approx including parameter estimates under H0: no QTL.
FlxQTL.flxMLMM.geneScanFunction
geneScan(cross::Int64,Tg,Tc::Array{Float64,2},Λg,λc::Array{Float64,1},Y0::Array{Float64,2},XX::Markers,Z0::Array{Float64,2},
     LOCO::Bool=false;Xnul::Array{Float64,2}=ones(1,size(Y0,2)),itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001,LogP::Bool=false)
geneScan(cross::Int64,Tg,Tc::Array{Float64,2},Λg,λc::Array{Float64,1},Y0::Array{Float64,2},XX::Markers,LOCO::Bool=false;
            Xnul::Array{Float64,2}=ones(1,size(Y0,2)),itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001,LogP::Bool=false)
geneScan(cross::Int64,Tg,Λg,Y0::Array{Float64,2},XX::Markers,LOCO::Bool=false;
    Xnul::Array{Float64,2}=ones(1,size(Y0,2)),itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001,LogP::Bool=false)

Implement 1d-genome scan with/without LOCO (Leave One Chromosome Out). Note that the third geneScan() is based on a conventional MLMM:

\[vec(Y) \sim MVN((Z \otimes X)vec(B) (or XBZ'), K \otimes \Sigma_1 +I \otimes \Sigma_2),\]

where K is a genetic kinship, $\Sigma_1, \Sigma_2$ are covariance matrices for random and error terms, respectively. Z can be replaced with an identity matrix.

Arguments

  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
  • Tg : A n x n matrix of eigenvectors from K2eig, or K2Eig. Returns 3d-array of eigenvectors as many as Chromosomes if LOCO is true.
  • Tc : A m x m matrix of eigenvectors from climatic relatedness matrix.
  • Λg : A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues if LOCO is true.
  • λc : A m x 1 vector of eigenvalues from climatic relatedness matrix. Use ones(m) for no climatic information added.
  • Y0 : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y0[1,:] (a vector) ->Y[[1],:] (a matrix) .
  • XX : A type of Markers.
  • Z0 : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If nothing to insert in Z0, just exclude it or insert an identity matrix, Matrix(1.0I,m,m). m traits x q phenotypic covariates.
  • LOCO : Boolean. Default is false (no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out).

Keyword Arguments

  • Xnul : A matrix of covariates. Default is intercepts (1's): Xnul= ones(1,size(Y0)). Adding covariates (C) is Xnul= vcat(ones(1,m),C) where size(C)=(c,m) for m = size(Y0,1).
  • itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
  • tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
  • tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
  • ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.
  • LogP : Boolean. Default is false. Returns $-\log_{10}{P}$ instead of LOD scores if true.

!!! Note

  • When some LOD scores return negative values, reduce tolerences for ECM to tol0 = 1e-4. It works in most cases. If not, can reduce both tol0 and tol to 1e-4 or further.

Output

  • LODs (or logP) : LOD scores. Can change to $- \log_{10}{P}$ in lod2logP if LogP = true.
  • B : A 3-d array of B (fixed effects) matrices under H1: existence of QTL. If covariates are added to Xnul : Xnul= [ones(1,size(Y0)); Covariates], ex. For sex covariates in 4-way cross analysis, B[:,2,100], B[:,3:5,100] are effects for sex, the rest genotypes of the 100th QTL, respectively.
  • est0 : A type of EcmNestrv.Approx including parameter estimates under H0: no QTL.
FlxQTL.flxMLMM.permTestMethod
  permTest(nperm,cross,Kg,Kc,Y0,XX::Markers,Z0;pval=[0.05 0.01],Xnul=ones(1,size(Y0,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)
  permTest(nperm,cross,Kg,Y0,XX::Markers,Z0;pval=[0.05 0.01],Xnul=ones(1,size(Y0,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)
  permTest(nperm,cross,Kg,Y0,XX::Markers;pval=[0.05 0.01],Xnul=ones(1,size(Y0,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)

Implement permutation test to get thresholds at the levels of type 1 error, α. Note that the second permTest() is for the conventional MLMM:

\[vec(Y)\sim MVN((I \otimes X)vec(B) (or BX), K \otimes \Sigma_1 +I \otimes \Sigma_2),\]

where K is a genetic kinship, $\Sigma_1, \Sigma_2$ are covariance matrices for random and error terms, respectively.

Arguments

  • nperm : An integer indicating the number of permutation to be implemented.
  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
  • Kg : A n x n genetic kinship matrix. Should be symmetric positive definite.
  • Kc : A m x m climatic relatedness matrix. Should be symmetric positive definite.
  • Y0 : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y0[1,:] (a vector) -> Y[[1],:] (a matrix) .
  • XX : A type of Markers.
  • Z0 : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If nothing to insert in Z0, just insert an identity matrix, Matrix(1.0I,m,m). m traits x q phenotypic covariates.

Keyword Arguments

  • pval : A vector of p-values to get their quantiles. Default is [0.05 0.01] (without comma).
  • Xnul : A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.
  • itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
  • tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
  • tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
  • ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.

Output

  • maxLODs : A nperm x 1 vector of maximal LOD scores by permutation.
  • H1par_perm : A type of struct, EcmNestrv.Approx(B,τ2,Σ,loglik) including parameter estimates or EcmNestrv.Result(B,Vc,Σ,loglik) for a conventional MLMM under H0: no QTL by permutation.
  • cutoff : A vector of thresholds corresponding to pval.
FlxQTL.EcmNestrvModule
EcmNestrv

A module for base algorithms using ECM (Expectation-Conditional Maxization) with Speed restarting Nesterov's accelerated gradient method to fit a flexible multivariate linear mixed model (flxMLMM).

Genetic Relatedness Matrices (GRM)

FlxQTL.GRMModule
 GRM

A module for computing Genetic Relatedness Matrix (or kinship).

FlxQTL.GRM.kinship4wayMethod
 kinship4way(genmat::Array{Float64,2})

Computes a kinship for four-way cross data counting different alleles between two markers: ex. AB-AB=0; AB-AC=1; AB-CD=2,$\dots$ Note: In R/qtl, genotypes are labeled as 1=AC; 2=BC; 3=AD; 4=BD by the function, read.cross.

Argument

  • genmat : A matrix of genotypes for four-way cross $(1,2, \dots)$. size(genematrix)= (p,n), for p genetic markers x n individuals(or lines).

Output

Returns a n x n symmetric matrix containing 1's on the diagonal.

FlxQTL.GRM.kinshipCtrMethod
 kinshipCtr(genmat::Array{Float64,2})

Calculates a kinship by a centered genotype matrix (linear kernel), i.e. genotypes subtracted by marker mean.

Argument

  • genmat : A matrix of genotype data (0,1,2). size(genmat)=(p,n) for p markers x n individuals

Output

Returns a n x n symmetric matrix. See also kinshipStd.

FlxQTL.GRM.kinshipGsMethod
 kinshipGs(climate::Array{Float64,2},ρ::Float64)

Computes a kinship matrix using the Gaussian Kernel.

Arguments

  • climate : A matrix of genotype, or climate information data. size(climate)=(r,m), such that r genotype markers (or days/years of climate factors, i.e. precipitations, temperatures, etc.), and m individuals (or environments/sites)
  • ρ : A free parameter determining the width of the kernel. Could be attained empirically.

Output

Returns a m x m symmetric (positive definite) matrix containing 1's on the diagonal.

FlxQTL.GRM.kinshipLinMethod
kinshipLin(mat,cross)

Calculates a kinship (or climatic relatedness, kinshipGs) matrix by linear kernel.

Arguments

  • mat : A matrix of genotype (or allele) probabilities usually extracted from R/qtl, R/qtl2, or the counterpart packages. size(mat)= (p,n) for p genetic markers x n individuals.
  • cross : A scalar indicating instances of alleles or genotypes in a genetic marker. ex. 1 for genotypes (labeled as 0,1,2), 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.

Output

Returns a n x n symmetric (positive definite) matrix containing 1's on the diagonal.

See also kinshipCtr, kinshipStd for genetype data.

FlxQTL.GRM.kinshipLocoFunction
 kinshipLoco(kin,g::Markers,cross::Int64=1)

Generates a 3-d array of symmetric positive definite kinship matrices using LOCO (Leave One Chromosome Out) witout shrinkage intensity estimation. When a kinship is not positive definite, a tweak like a weighted average of kinship and Identity is used to correct minuscule negative eigenvalues.

Arguments

  • kin : A function of computing a kinship. Can only use with kinshipCtr, kinshipStd for genotypes, and with kinshipLin for genotype (or allele) probabilities.
  • g : A struct of arrays, type Markers.
  • cross : A scalar indicating instances of alleles or genotypes in a genetic marker. ex. 1 for genotypes (0,1,2) as default, 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.

Output

Returns 3-d array of n x n symmetric positive definite matrices as many as Chromosomes. Refer to shrinkgLoco.

FlxQTL.GRM.kinshipManMethod
  kinshipMan(genematrix::Array{Float64,2})

Calculates a kinship matrix using a manhattan distance. Missing values need to be either omitted or imputed. This function is for recombinant inbred line (RIL) (AA/BB), not for 4-way cross genotype data. See kinship4way.

Argument

  • genematrix : A matrix of genotypes, i.e. 0,1 (or 1,2). size(genematrix)= (p,n) for p genetic markers x n individuals(or lines).

Output

Returns a n x n symmetric matrix containing 1's on the diagonal.

FlxQTL.GRM.kinshipStdMethod
 kinshipStd(genmat::Array{Float64,2})

Calculates a kinship by a standardized (or normalized) genotype matrix (linear kernel), i.e. genotypes subtracted by marker mean and divided by marker standard deviation. Can also do with climatic information data. See kinshipGs.

Argument

  • genmat : A matrix of genotype data (0,1,2). size(genmat)=(p,n) for p markers x n individuals

Output

Returns a n x n symmetric matrix. See also kinshipCtr.

FlxQTL.GRM.shrinkgMethod
 shrinkg(f,nb::Int64,geno)

Estimates a full-rank positive definite kinship matrix by shrinkage intensity estimation (bootstrap). Can only use with kinshipMan, kinship4way. This function runs faster by CPU parallelization. Add workers/processes using addprocs() function before running for speedup.

Arguments

  • f: A function of computing a kinship. Can only use with kinshipMan, kinship4way.
  • nb : An integer indicating the number of bootstrap. It does not have to be a large number.
  • geno : A matrix of genotypes. See kinshipMan, kinship4way for dimension.

Example

julia> using flxQTL
julia> addprocs(8)
julia> K = shinkage(kinshipMan,20,myGeno)

Output

Returns a full-rank symmetric positive definite matrix.

FlxQTL.GRM.shrinkgLocoMethod
   shrinkgLoco(kin,nb,g::Markers)

Generates 3-d array of full-rank positive definite kinship matrices by shrinkage intensity estimation (bootstrap) using a LOCO (Leave One Chromosome Out) scheme.

Argument

  • kin : A function of computing a kinship. Can only use with kinshipMan, kinship4way
  • nb : An integer indicating the number of bootstrap.
  • g : A struct of arrays, type Markers.

Output

Returns 3-d array of n x n symmetric positive definite matrices as many as Chromosomes.

Multivariate Linear Models (MLM)

FlxQTL.MLMModule
  MLM

A module for fitting general Multivariate Linear Models motivated by functional data analysis via mle or reml. The default fitting method is mle. ( i.e. reml=false)

The model: $Y=XBZ'+E$, where $E(Y)=XBZ'$ (or $E(vec(Y))= (Z \otimes X)vec(B)$ ), $var(vec(E))=\Sigma \otimes I.$ size(Y)=(n,m), size(X)=(n,p), size(Z)=(m,q).

FlxQTL.MLM.EstimatType
 Estimat(B::Array{Float64,2},Σ::Array{Float64,2},loglik::Float64)

A struct of arrays for results by fitting a multivariate linear model, mGLM(). The results are B(fixed effects), Σ (m x m covariance matrix), loglik(a value of log-likelihood by mle or reml).

FlxQTL.MLM.mGLMFunction
  mGLM(Y::Array{Float64,2},X::Array{Float64,2},Z::Array{Float64,2},reml::Bool=false)
  mGLM(Y::Array{Float64,2},X::Array{Float64,2},reml::Bool=false)

Fitting multivariate General Linear Models via MLE (or REML) and returns a type of a struct Estimat.

Arguments

  • Y : A matrix of response variables, i.e. traits. size(Y)=(n,m) for n individuals x m traits
  • X : A matrix of independent variables, i.e. genotypes or genotype probabilities including intercept or/and covariates. size(X)=(n,p) for n individuals x p markers including intercept or/and covariates
  • Z : An optional matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If nothing to insert in Z, just exclude it or insert Matrix(1.0I,m,m). size(Z)=(m,q) for m traits x q phenotypic covariates.
  • reml : Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Output

Returns Estimat .

Utility Functions (Util)

FlxQTL.UtilModule
 Util

A module for utility functions.

FlxQTL.Util.MarkersType
Markers(name::Array{String,1},chr::Array{Any,1},pos::Array{Float64,1},X::Array{Float64,2})

A struct of arrays creating genotype or genotype probability data for genome scan.

Arguments

  • name : A vector of marker names
  • chr : A vector of Chromosomes
  • pos : A vector of marker positions (cM)
  • X : A matrix of genotypes or genotype probabilities
FlxQTL.Util.Y_huberMethod
Y_huber(Y::Array{Float64,2})

Rescale Y (phenotype or trait data) to be less sensitive to outliers using by Huber loss function and MAD (median absolute deviation). size(Y)=(m,n) for m trait and n individuals.

FlxQTL.Util.array2matMethod
 array2mat(cross::Int64,X0::Array{Float64,3})

Returns a 3-d array to a matrix of genotype probabilities. size(X0)=(p,cross,n) –> (p1,n), where p1 = cross*p for p markers, cross alleles or genotypes, and n individuals. See mat2array.

FlxQTL.Util.getFinoidxMethod
getFinoidx(phenoData::Array{Union{Missing,Float64},2})

Attains indices of phenotype data without missing values.

Argument

  • phenoData : A matrix of phenotype (or trait) data including missing values. size(phenoData) = (m,n) for m traits and n individuals.
FlxQTL.Util.getGenoidxFunction
getGenoidx(GenoData::Union{Array{Any,2},Array{Float64,2}},maf::Float64=0.025)

Attains genotype indices to drop correlated or bad markers.

Arguments

  • GenoData : A matrix of genotype data. size(GenoData)= (p,n) for p markers and n individuals.
  • maf : A scalar for dropping criterion of markers. Default is 0.025 i.e. markers of MAF < 0.025 are dropped.
FlxQTL.Util.lod2logPMethod
lod2logP(LODs::Union{Array{Float64,1},Array{Any,1}},v::Int64)

Caculates $-\log_{10}{P}$ from LOD scores.

Arguments

  • LODs : A vector of LOD scores computed from genome scan.
  • v : Degrees of freedom for Chi-squared distribution.

!!! NOTE

  • To compute degrees of freedom for 1D-genome scan (geneScan), v = prod(size(B1))-prod(size(B0), where B1 and B0 are effect size matrices under H1 of a full model (intercept + covariates + QTL) and H0 of no QTL (intercept + covariates), respectively.
FlxQTL.Util.mat2arrayMethod
 mat2array(cross::Int64,X0)

Returns a matrix of genotype probabilities to 3-d array. size(X0)=(p1,n) –> (p,cross,n), where p1 = cross*p for p markers, cross alleles or genotypes, and n individuals.

Argument

  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
  • X0 : A matrix of genotype probability data computed from r/qtl or r/qtl2.

See array2mat.

FlxQTL.Util.mat2vecMethod
 mat2vec(mat)

Stacks a matrix to a vector, i.e. vectorizing a matrix.

FlxQTL.Util.newMarkersFunction
newMarkers(XX::Markers,cross::Int64,cM::Int64=2)

Returns a struct of Markers by keeping only markers positioned in every cM centimorgans for 2-d genome scan to avoid singularity.

Arguments

  • XX : A type of Markers. See Markers.
  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
  • cM : An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.
FlxQTL.Util.ordrMarkersMethod
ordrMarkers(markers::Array{Any,2})

Rearrange by CPU parallelization marker information composed of marker name, chr, position obtained from rqtl2, which is not listed in order (excluding X chromosome).

Argument

  • markers : An array of marker information.
FlxQTL.Util.setSeedMethod
setSeed(seedNum::Int64)

Assigns different numbers of seeds to workers (or processes) for reproducibility.

Arguments

  • seedNum: An integer. A minimum seed number to assign a worker. For distributed computing, seed numbers are generated for multiple workers by increasing 1, e.g. seedNum = 123 & 10 workers, setSeed generates seeds from 123 to 132 and assigns to corresponding workers (processes).

Examples

julia> using Distributed
julia> addprocs(10)
julia> @everywhere using flxQTL
julia> setSeed(123)
FlxQTL.Util.sortBycMFunction
    sortBycM(Chr::Any,XX::Markers,cross::Int64,cM::Int64=2)

Returns marker indices in Chromosome Chr and the corresponding genotype probabilities keeping only markers positioned in every cM centimorgans for 2-d genome scan to avoid singularity.

Arguments

  • Chr : A type of Any indicating a particular chromosome to sort markers out.
  • XX : A type of Markers.
  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
  • cM : An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.

See also newMarkers