Datasets

The following shows the currently available example datasets:

```julia
using ArviZExampleData

println(describe_example_data())
```
centered_eight
==============

A centered parameterization of the eight schools model, provided as an example of a model that NUTS has trouble fitting. Compare to `non_centered_eight`.

The eight schools model is a hierarchical model used for an analysis of the effectiveness of classes that were designed to improve students' performance on the Scholastic Aptitude Test.
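For reference, here is the centered parameterization in the form given in Bayesian Data Analysis. y_j is the estimated treatment effect in school j and sigma_j is its known standard error. The priors on mu and tau are omitted here because they are not stated for this dataset:

```math
\begin{aligned}
\theta_j &\sim \mathcal{N}(\mu, \tau^2), \qquad j = 1, \dots, 8, \\
y_j &\sim \mathcal{N}(\theta_j, \sigma_j^2).
\end{aligned}
```

When tau is small, every theta_j is pulled close to mu. The posterior then takes on a funnel shape whose narrow neck NUTS struggles to explore, which is the usual explanation for the divergent transitions seen with this parameterization.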

See Bayesian Data Analysis (Gelman et al.) for more details.

local: /juliateam/.julia/artifacts/88acee5c1db592b660a577a21fa9bdc783ec49d7/arviz_example_data-0.2.0/data/centered_eight.nc
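A minimal sketch of loading this dataset follows. It assumes the `load_example_data` function exported by ArviZExampleData, which returns an `InferenceData` object from InferenceObjects.jl. The group and variable names (`posterior`, `theta`) follow the ArviZ version of this dataset, so check them against the loaded object.

```julia
using ArviZExampleData

# Load the locally bundled dataset as an InferenceData object
idata = load_example_data("centered_eight")

# Names of the groups stored in the object (posterior, observed_data, ...)
println(propertynames(idata))

# Posterior draws of the per-school effects: three dimensions covering
# draws, chains, and schools (check the dimension order on your install)
theta = idata.posterior[:theta]
println(size(theta))
```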

non_centered_eight
==================

A non-centered parameterization of the eight schools model. Rewriting the hierarchical model in non-centered form may fix the sampling problems that NUTS encounters with the centered version. Compare to `centered_eight`.
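The non-centered form samples standardized effects theta_t ~ Normal(0, 1) and then computes theta = mu + tau * theta_t deterministically. The sketch below uses only the Julia standard library and hypothetical values of tau. It illustrates why this helps: the prior spread of theta scales with tau, which produces the funnel, whereas theta_t has the same spread for every tau. The name `theta_t` mirrors the ArviZ version of this dataset; treat it as an assumption.

```julia
using Random, Statistics

# Draw n standardized effects theta_t ~ Normal(0, 1), map them to
# theta = mu + tau * theta_t (the non-centered transform), and return both spreads.
function prior_spreads(rng, mu, tau; n = 10_000)
    theta_t = randn(rng, n)
    theta = mu .+ tau .* theta_t
    return std(theta), std(theta_t)
end

rng = MersenneTwister(1)
for tau in (0.1, 1.0, 10.0)  # hypothetical values of the population scale
    sd_theta, sd_theta_t = prior_spreads(rng, 0.0, tau)
    # std(theta) grows with tau; std(theta_t) stays near 1
    println("tau = $tau: std(theta) = $(round(sd_theta; digits = 2)), std(theta_t) = $(round(sd_theta_t; digits = 2))")
end
```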

The eight schools model is a hierarchical model used for an analysis of the effectiveness of classes that were designed to improve students' performance on the Scholastic Aptitude Test.

See Bayesian Data Analysis (Gelman et al.) for more details.

local: /juliateam/.julia/artifacts/88acee5c1db592b660a577a21fa9bdc783ec49d7/arviz_example_data-0.2.0/data/non_centered_eight.nc

radon
=====

Radon is a radioactive gas that enters homes through contact points with the ground. It is a carcinogen that is the primary cause of lung cancer in non-smokers. Radon levels vary greatly from household to household.

This example uses an EPA study of radon levels in Minnesota houses to construct a hierarchical model of households within counties. The model includes estimates (gamma) for the contextual effects of uranium per household.
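To make "households within counties" concrete, here is a hypothetical generative sketch in the spirit of the Gelman and Hill radon example. It is not this dataset's exact model. Each county gets an intercept driven by a county-level uranium predictor through coefficients `g` (standing in for gamma), and each house varies around its county's intercept, with a basement/first-floor indicator. The structure, the names, and all parameter values are assumptions.

```julia
using Random

# Hypothetical generative sketch: log radon for houses nested in counties.
# Structure and all parameter values are assumptions for illustration.
function simulate_radon(rng; n_counties = 5, houses_per_county = 20,
                        g = (1.5, 0.7), b = -0.6, sigma_a = 0.2, sigma_y = 0.7)
    # County-level predictor, e.g. log uranium (hypothetical values)
    uranium = randn(rng, n_counties)
    # County intercepts: contextual effect g[1] + g[2] * uranium plus county-level noise
    a = g[1] .+ g[2] .* uranium .+ sigma_a .* randn(rng, n_counties)
    # County index for each house, and whether it was measured on the first floor
    county = repeat(1:n_counties; inner = houses_per_county)
    first_floor = rand(rng, 0:1, length(county))
    # House-level observations vary around their county's intercept
    log_radon = a[county] .+ b .* first_floor .+ sigma_y .* randn(rng, length(county))
    return (; county, first_floor, log_radon, a)
end

sim = simulate_radon(MersenneTwister(0))
```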

See Gelman and Hill (2006) for details on the example, or https://docs.pymc.io/notebooks/multilevel_modeling.html by Chris Fonnesbeck for details on this implementation.

remote: http://ndownloader.figshare.com/files/24067472

rugby
=====

The Six Nations Championship is a yearly rugby competition between Italy, Ireland, Scotland, England, France and Wales. Fifteen games are played each year, representing all combinations of the six teams.

This example includes results from 2014 to 2017, comprising 60 games in total (four seasons of 15 games). It models latent parameters for each team's attack and defense, as well as a global parameter for home-team advantage.
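The sketch below shows how such latent parameters can combine. It assumes a log-linear structure like that of the PyMC rugby example: the home team scores at rate exp(intercept + home + attack + opponent's defense), and the away team scores at the same rate without `home`. The names `atts`, `defs`, `home`, and `intercept` and all values are illustrative; the full specification is in the notebook referenced for this dataset.

```julia
# Expected scoring rates under an assumed log-linear model.
# In this convention a larger `defs` value means a weaker defense.
function expected_scores(intercept, home, atts, defs, home_team, away_team)
    # Home side: intercept + home advantage + own attack + opponent's defense
    home_rate = exp(intercept + home + atts[home_team] + defs[away_team])
    # Away side: same structure without the home-advantage term
    away_rate = exp(intercept + atts[away_team] + defs[home_team])
    return home_rate, away_rate
end

teams = ["Italy", "Ireland", "Scotland", "England", "France", "Wales"]

# Every pair of teams meets once per season: binomial(6, 2) = 15 games
fixtures = [(teams[i], teams[j]) for i in eachindex(teams) for j in (i + 1):length(teams)]
println(length(fixtures))  # 15

# Hypothetical parameter values: equally strong teams, modest home advantage
atts = zeros(length(teams))
defs = zeros(length(teams))
home_rate, away_rate = expected_scores(3.0, 0.2, atts, defs, 2, 6)  # Ireland at home to Wales
```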

See https://github.com/arviz-devs/arviz_example_data/blob/main/code/rugby/rugby.ipynb for the whole model specification.

remote: http://figshare.com/ndownloader/files/44916469

rugby_field
===========

A variant of the `rugby` example dataset. The Six Nations Championship is a yearly rugby competition between Italy, Ireland, Scotland, England, France and Wales. Fifteen games are played each year, representing all combinations of the six teams.

This example includes results from 2014 to 2017, comprising 60 games in total. It models latent parameters for each team's attack and defense, with separate values for each team depending on whether it plays at home or away.
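A sketch of the structural difference, again assuming a log-linear scoring rate, follows. Each team carries separate home and away attack and defense values, and there is no shared home-advantage parameter. All names and values are hypothetical.

```julia
# Assumed structure: per-team attack/defense values that differ for home and away games.
function expected_scores_field(intercept, atts_home, atts_away, defs_home, defs_away,
                               home_team, away_team)
    # Home team attacks with its home value against the visitor's away defense
    home_rate = exp(intercept + atts_home[home_team] + defs_away[away_team])
    # Visitor attacks with its away value against the host's home defense
    away_rate = exp(intercept + atts_away[away_team] + defs_home[home_team])
    return home_rate, away_rate
end

n_teams = 6
# Hypothetical values: every team attacks slightly better at home
atts_home = fill(0.2, n_teams)
atts_away = zeros(n_teams)
defs_home = zeros(n_teams)
defs_away = zeros(n_teams)
home_rate, away_rate = expected_scores_field(3.0, atts_home, atts_away, defs_home, defs_away, 1, 2)
```

Setting `atts_home = atts_away .+ home` for a single scalar `home`, with equal home and away defenses, recovers a single global home advantage as in `rugby`.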

See https://github.com/arviz-devs/arviz_example_data/blob/main/code/rugby_field/rugby_field.ipynb for the whole model specification.

remote: http://figshare.com/ndownloader/files/44667112

regression1d
============

A synthetic one-dimensional linear regression dataset with latent slope, intercept, and noise ("eps"). One hundred data points, fit with PyMC3.

True slope and intercept are included as deterministic variables.
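The sketch below makes the generating process concrete. It simulates data of the same form, y = intercept + slope * x + noise, using hypothetical true values (not the ones stored in the dataset). It then recovers the slope and intercept by ordinary least squares. Least squares is swapped in as a quick non-Bayesian check; it is not the PyMC3 fit used to produce the dataset.

```julia
using LinearAlgebra, Random

# Hypothetical true values, chosen for illustration only
rng = MersenneTwister(2)
n = 100
true_intercept, true_slope, noise_sd = 1.0, 2.5, 0.5

# y = intercept + slope * x + Normal(0, noise_sd) noise
x = randn(rng, n)
y = true_intercept .+ true_slope .* x .+ noise_sd .* randn(rng, n)

# Least-squares estimates from the design matrix [1 x]
X = [ones(n) x]
intercept_hat, slope_hat = X \ y
println((intercept_hat, slope_hat))
```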

remote: http://ndownloader.figshare.com/files/16254899

regression10d
=============

A synthetic 10-dimensional linear regression dataset with latent weights ("w"), intercept, and noise ("eps"). Five hundred data points, fit with PyMC3.

True weights and intercept are included as deterministic variables.

remote: http://ndownloader.figshare.com/files/16255736

classification1d
================

A synthetic one-dimensional logistic regression dataset with latent slope and intercept, whose linear predictor is passed through the logistic function to give the success probability of a Bernoulli random variable. One hundred data points, fit with PyMC3.

True slope and intercept are included as deterministic variables.
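A sketch of the generative process for logistic regression follows. The linear predictor intercept + slope * x is mapped to a probability with the logistic function, and each outcome is a Bernoulli draw with that probability. The true values are hypothetical and are not those of the dataset.

```julia
using Random, Statistics

# Logistic (sigmoid) function mapping a real number to (0, 1)
logistic(z) = 1 / (1 + exp(-z))

rng = MersenneTwister(3)
n = 100
true_intercept, true_slope = -0.5, 2.0  # hypothetical values

x = randn(rng, n)
p = logistic.(true_intercept .+ true_slope .* x)  # success probability per point
y = rand(rng, n) .< p                              # Bernoulli(p) draws as Bools

# With a positive slope, large x should succeed far more often than small x
println((mean(y[x .> 1]), mean(y[x .< -1])))
```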

remote: http://ndownloader.figshare.com/files/16256678

classification10d
=================

A synthetic 10-dimensional logistic regression dataset with latent weights ("w") and intercept, whose linear predictor is passed through the logistic function to give the success probability of a Bernoulli random variable. Five hundred data points, fit with PyMC3.

True weights and intercept are included as deterministic variables.

remote: http://ndownloader.figshare.com/files/16256681

glycan_torsion_angles
=====================

Torsion angles phi and psi are critical for determining the three-dimensional structure of biomolecules. Combinations of phi and psi that produce clashes between atoms in the biomolecule result in high-energy, unlikely structures.

This model uses a von Mises distribution to propose torsion angles for the structure of a glycan molecule (PDB ID: 2LIQ), and a Potential to estimate the energy of the proposed structure. This Potential follows Boltzmann's law, so structures with higher energy are exponentially less probable.
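Both ingredients can be sketched as log-density terms. The first is the unnormalized von Mises kernel kappa * cos(angle - mu) for each torsion angle. The second is a Boltzmann term -E / (R * T), under which a structure with higher energy E is exponentially less probable. The following are all assumptions: energies in kJ/mol, a temperature of 300 K, and the angle and energy values shown. The energy function of the actual model, computed for 2LIQ, is not reproduced here.

```julia
# Unnormalized von Mises log-density of an angle (radians) with mean mu and concentration kappa
vonmises_logkernel(angle, mu, kappa) = kappa * cos(angle - mu)

# Boltzmann log-weight: log p = -E / (R * T) + const, for a molar energy E in kJ/mol (assumed units)
const R_KJ = 0.008314462618  # molar gas constant, kJ/(mol*K)
boltzmann_logweight(energy; T = 300.0) = -energy / (R_KJ * T)

# Hypothetical proposed angles and energy value, for illustration only
phi, psi = -1.2, 2.1
energy = 5.0  # kJ/mol
logp = vonmises_logkernel(phi, -1.0, 4.0) + vonmises_logkernel(psi, 2.0, 4.0) +
       boltzmann_logweight(energy)
println(logp)
```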

remote: http://ndownloader.figshare.com/files/22882652