CDMrdata.jl

This package is a simple Julia port of the datasets that @alexanderrobitzsch bundled with the CDM R package.

Installation

As of July 22nd, this package is not yet registered. To install it, enter the package manager mode (press `]` in the Julia REPL) and run:

add https://github.com/athulsudheesh/CDMrdata.jl
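For scripts or other non-interactive setups, the same installation can be done through the standard `Pkg` API instead of the package manager mode. This is a minimal sketch; it assumes network access and uses the same repository URL as above.

```julia
using Pkg

# Install the unregistered package directly from its Git repository.
Pkg.add(url="https://github.com/athulsudheesh/CDMrdata.jl")
```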

Once the package is installed, you can use it like any other Julia package:

using CDMrdata

Usage

Currently the package has only two functions:

load_data(data_name): loads the dataset named data_name and returns it as a Dict.
list_datasets(): lists all the datasets available in the package.

Example

Suppose we want to load the ecpe dataset:

dat = load_data("ecpe");

To see which fields that dataset contains:

keys(dat)

KeySet for a Dict{String, Int64} with 2 entries. Keys:
  "data"
  "q.matrix"

And to access one of those fields:

dat["q.matrix"]

28×3 DataFrame
 Row │ skill1  skill2  skill3 
     │ Int32   Int32   Int32  
─────┼────────────────────────
   1 │      1       1       0
   2 │      0       1       0
   3 │      1       0       1
   4 │      0       0       1
   5 │      0       0       1
   ⋮ │      ⋮        ⋮       ⋮
  24 │      0       1       0
  25 │      1       0       0
  26 │      0       0       1
  27 │      1       0       0
  28 │      0       0       1
               18 rows omitted
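Because the fields are returned as DataFrames, the usual DataFrames.jl functions apply to them. The following is a small sketch of inspecting both fields of the ecpe dataset; it assumes DataFrames.jl is installed alongside CDMrdata, and that the "data" field holds the item responses (the variable names `responses` and `Q` are only illustrative).

```julia
using CDMrdata
using DataFrames

dat = load_data("ecpe")

responses = dat["data"]   # assumed to hold the item responses
Q = dat["q.matrix"]       # item-by-skill Q-matrix shown above

# Dimensions of each field as (rows, columns).
println(size(responses))
println(size(Q))

# Column names of the Q-matrix, i.e. the skills.
println(names(Q))

# Number of items measuring each skill: sum every Q-matrix column.
items_per_skill = combine(Q, names(Q) .=> sum)
println(items_per_skill)

# Convert to a plain integer matrix if a DataFrame is not needed.
Qmat = Matrix(Q)
```

`combine` with `names(Q) .=> sum` applies `sum` to every column, so the result is a one-row DataFrame with one count per skill.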

Dataset Description

Dataset Name	Description (From CDM R Package Dev)
`cdm01`	A multiple choice dataset
`cdm02`	Multiple choice dataset with a Q-matrix designed for polytomous attributes.
`cdm03`	Resimulated dataset from Chiu, Koehn and Wu (2016) where the data generating model is a reduced RUM model.
`cdm04`	Simulated dataset for the sequential DINA model (as described in Ma & de la Torre, 2016). The dataset contains 1000 persons and 12 items which measure 2 skills.
`cdm05`	Example dataset used in Philipp, Strobl, de la Torre and Zeileis (2018). This dataset is a sub-dataset of the probability dataset in the pks package (Heller & Wickelmaier, 2013).
`cdm06`	Resimulated example dataset from Chen and Chen (2017).
`cdm07`	This is a resimulated dataset from the social anxiety disorder data concerning social phobia which involve 13 dichotomous questions (Fang, Liu & Ling, 2017). The simulation was based on a latent class model with five classes. The dataset was also used in Chen, Li, Liu and Ying (2017).
`cdm08`	This is a simulated dataset involving four skills and three misconceptions for the model for simultaneously identifying skills and misconceptions (SISM; Kuo, Chen & de la Torre, 2018). The Q-matrix follows the specification in their simulation study.
`cdm09`	This is a simulated dataset involving polytomous skills which is adapted from the empirical example (proportional reasoning data) of Chen and de la Torre (2013).
`cdm10`	This is a simulated dataset involving a hierarchical skill structure. Skill A has four levels, skill B possesses two levels and skill C has three levels.
`dcm`	Dataset from the book 'Diagnostic Measurement' by Rupp, Templin and Henson (2010).
`dtmr`	DTMR Fraction Data (Bradshaw et al., 2014).
`ecpe`	The dataset has been used in Templin and Hoffman (2013), and Templin and Bradshaw (2014).
`fraction1`	The dataset has been used in de la Torre (2009).
`fraction2`	The dataset has been used in de la Torre (2009) and Henson, Templin and Willse (2009).
`fraction3`	The dataset has been used in de la Torre (2011).
`fraction4`	The dataset has been used in de la Torre and Douglas (2004) and Chen, Liu, Xu and Ying (2015).
`fraction5`	This dataset was used as an example for the multiple strategy DINA model in de la Torre and Douglas (2008) and Hou and de la Torre (2014).
`hr`	Simulated data according to Ravand et al. (2013).
`jang`	Simulated dataset according to the Jang (2005) L2 reading comprehension study.
`melab`	This is a simulated dataset according to the MELAB reading study (Li, 2011; Li & Suen, 2013). Li (2011) investigated the Fusion model (RUM model) for calibrating this dataset. The dataset in this package is simulated assuming the reduced RUM model (RRUM).
`mg`	Large-scale dataset with multiple groups, survey weights and 11 polytomous items.
`pgdina`	Dataset for the estimation of the polytomous GDINA model.
`pisa00R.ct`	PISA 2000 data of German students, including 26 items of the reading test (Chen and de la Torre, 2014).
`pisa00R.cc`	PISA 2000 data of German students, including 20 items of the reading test (Chen and Chen, 2016).
`sda6`	This is a simulated dataset of the SDA6 study according to information given in Jurich and Bradshaw (2014).
`Students`	This dataset contains item responses of students on scales of cultural activities (act), mathematics self-concept (sc) and mathematics enjoyment (mj) from an Austrian survey of 8th grade students.
`timss03.G8.su`	This is a dataset with a subset of 23 Mathematics items from TIMSS 2003 items used in Su, Choi, Lee, Choi and McAninch (2013).
`timss07.G4.lee`	This dataset is a list containing dichotomous item responses (data; information on booklet and gender included), the Q-matrix (q.matrix) and descriptions of the skills (skillinfo) used in Lee et al. (2011).
`timss07.G4.py`	This dataset uses the same items as `timss07.G4.lee` but employs a simplified Q-matrix with 7 skills. This Q-matrix was used in Park and Lee (2014) and Park et al. (2018).
`timss07.G4.Qdomains`	This Q-matrix data is a simplification of `timss07.G4.py$q.matrix` to 3 domains and involves a simple structure of skills.
`timss11.G4.AUT`	TIMSS 2011 dataset of 4668 Austrian fourth-graders.
`timss11.G4.AUT.part`	Part of `timss11.G4.AUT` and contains only the first three booklets (with N=1010 students).
`timss11.G4.sa`	Contains the Q-matrix used in Sedat and Arican (2015).
`fraction.subtraction.data`	Tatsuoka's (1984) fraction subtraction data set comprises responses to J = 20 fraction subtraction test items from N = 536 middle school students.
`fraction.subtraction.qmatrix`	The Q-matrix corresponding to Tatsuoka's (1984) fraction subtraction data set.