RegressionAndOtherStories.jl v0.4

Purpose (once completed, maybe late 2022)

RegressionAndOtherStories.jl contains supporting (Julia) functions and the data files used in "Regression and Other Stories" by Andrew Gelman, Jennifer Hill, and Aki Vehtari.

Contents

The supporting functions are intended to be used in (currently) 2 Julia projects (also under development), ROSStanPluto.jl and ROSTuringPluto.jl.

I use the ros_functions dataframe to summarize all functions introduced in RegressionAndOtherStories.jl.

The ros_functions dataframe can be created in a notebook by executing ros_functions = create_ros_functions().

Because the functions actually available vary by notebook, the function update_ros_functions(df) updates the :function column in df (typically ros_functions) for a given notebook based on its loaded packages. The column :condition shows the dependency.

All data files are in .csv format and located in the data directory.

If RegressionAndOtherStories.jl is loaded, the files can be read in as a DataFrame using:

hibbs = CSV.read(ros_datadir("ElectionsEconomy", "hibbs.csv"), DataFrame)

For that purpose ros_datadir() is exported.
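To illustrate how a helper like ros_datadir() can turn path components into a full file path, here is a minimal sketch. The name sketch_datadir and the explicit root argument are hypothetical and only serve the illustration; the real ros_datadir() may be implemented differently.

```julia
# Hypothetical stand-in for ros_datadir(); not the package's implementation.
# `root` is assumed to be the package root containing a `data` directory.
function sketch_datadir(root::AbstractString, parts::AbstractString...)
    # Join the root, the data directory and any sub-path components
    # into a single normalized path.
    return normpath(joinpath(root, "data", parts...))
end

path = sketch_datadir("RegressionAndOtherStories", "ElectionsEconomy", "hibbs.csv")
println(path)
```

Passing the components separately, instead of one hard-coded string, keeps the path portable across operating systems.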

Where needed, Stata files (.dta) have been converted to .csv files using the scripts in the scripts directory, e.g. see scripts/hdi.jl. To access the Stata files in the R package ROS-Examples, RegressionAndOtherStories.jl expects the environment variable JULIA_ROS_HOME to be defined, e.g.:

ENV["JULIA_ROS_HOME"] = expanduser("~/Projects/R/ROS-Examples")

R itself does not necessarily need to be installed for this to work.
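Before running a conversion script it can help to check that JULIA_ROS_HOME is set and points to an existing directory. The following is a minimal sketch using only Base Julia; the function name find_ros_home is hypothetical and not part of the package.

```julia
# Hypothetical helper; not exported by RegressionAndOtherStories.jl.
# Returns the directory stored in JULIA_ROS_HOME, or `nothing` if it is unset.
function find_ros_home()
    home = get(ENV, "JULIA_ROS_HOME", "")
    isempty(home) && return nothing
    if !isdir(home)
        @warn "JULIA_ROS_HOME is set but is not an existing directory" home
    end
    return home
end

ros_home = find_ros_home()
if ros_home === nothing
    println("JULIA_ROS_HOME is not defined.")
else
    println("ROS-Examples expected in: ", ros_home)
end
```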

If desired, the Stata files can also be used directly, as the Stata to .csv conversion scripts mentioned above show.

Approach

The initial approach attempted in RegressionAndOtherStories.jl (v0.2) and associated projects was different from StatisticalRethinking.jl. That approach did not work out as expected, so I will switch to a setup similar to the one in StatisticalRethinking.jl, using Requires.jl, from v0.3 onwards.

In particular Turing, Stan, Makie and AlgebraOfGraphics, if needed, will all be included using Requires.jl.
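As a rough sketch of the Requires.jl pattern: glue code is placed in a @require block inside the module's __init__ function and is only evaluated once the optional package has been loaded. The module name GlueSketch and the function turing_glue_loaded are hypothetical, and the UUID shown is assumed to be Turing's; verify it against Turing's Project.toml before relying on it. This is not the actual layout of RegressionAndOtherStories.jl.

```julia
module GlueSketch  # hypothetical module name, for illustration only

using Requires

function __init__()
    # Assumption: this UUID matches the one in Turing's Project.toml.
    @require Turing="fce5fe82-541a-59a6-adf8-730c64b5f9a0" begin
        # Evaluated only after `using Turing` has been executed.
        # Hypothetical glue function standing in for Turing-specific methods.
        turing_glue_loaded() = true
    end
end

end # module
```

In a package, the body of the @require block is often an include() of a separate glue file, which keeps code for each optional dependency in its own place.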

Over time I might minimize the use of AlgebraOfGraphics.jl. It is a nice package but also a bit more difficult to tailor (compared to Makie/GLMakie).

For testing purposes the packages enabled using Requires.jl will move to the test section of RegressionAndOtherStories.jl.

In doing this I will move over several important functions from StatisticalRethinking.jl as well, e.g. link().

I expect I can use ParetoSmoothedImportanceSampling.jl and StructuralCausalModels.jl as is.

Notebook maintenance

Pluto is a great tool and definitely leads to a different programming style. But it is also new and thus will continue to develop rapidly (I hope). As the developer/maintainer of 4 projects, each containing a growing number of notebooks, I have found that updating packages in these notebooks is time-consuming.

For now I am using a (homegrown) maintenance function (update_ros_notebooks()) which will remove the Project and Manifest sections in selected notebooks.
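To give an idea of what removing these sections involves, here is a minimal sketch that works on the text of a notebook file. It is not the implementation of update_ros_notebooks(); the function name strip_pkg_cells is hypothetical. It relies on the Pluto file format, in which cells are separated by a `# ╔═╡ ` marker and the embedded Project and Manifest are stored in cells with two reserved ids.

```julia
# Hypothetical sketch; not the package's update_ros_notebooks().
const CELL_DELIM = "# ╔═╡ "
# Reserved cell ids Pluto uses for the embedded Project.toml and Manifest.toml.
const PKG_CELL_IDS = ("00000000-0000-0000-0000-000000000001",
                      "00000000-0000-0000-0000-000000000002")

is_pkg_ref(s) = any(id -> occursin(id, s), PKG_CELL_IDS)

function strip_pkg_cells(text::AbstractString)
    chunks = split(text, CELL_DELIM)
    header, cells = chunks[1], chunks[2:end]
    kept = String[]
    for cell in cells
        # Drop the Project and Manifest cells themselves.
        any(id -> startswith(cell, id), PKG_CELL_IDS) && continue
        if startswith(cell, "Cell order:")
            # Also drop their entries from the cell-order listing.
            lines = filter(!is_pkg_ref, split(cell, '\n'))
            push!(kept, join(lines, '\n'))
        else
            push!(kept, String(cell))
        end
    end
    return join(vcat([String(header)], [CELL_DELIM * c for c in kept]))
end
```

Applied to a file, this could be used as write(path, strip_pkg_cells(read(path, String))), preferably on a copy of the notebook first.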

I will likely use this function in the future when storing all notebooks on GitHub.

Issues, comments and questions

Please file issues, comments and questions here.

Pull requests are also welcome.

Versions

Release 0.4.5

  1. Doc fixes by Pietro Monticone
  2. Added model_summary(::SampleModel).

Release 0.4.x

  1. model_summary and plot_chains accept both Symbols and Strings.
  2. Focus on Appendices A and B.
  3. Focus on chapters 4, 5, 6, 7

Versions 0.3.6 - 0.3.10

  1. Fine tuning working with ros_functions and ros_notebooks.

Release 0.3.5

  1. Added maintenance functions for a (large) set of notebooks.

Release 0.3.4

  1. Is tagging using JuliaHub with setting branch name working?

Version 0.3.3

  1. Add initial version of notebook maintenance routines.
  2. Tag this version (if not done by TagBot)

Version 0.3.2

  1. Fix Makie and AoG glue scripts.

Version 0.3.1

  1. StatsFuns compat entry to 1.0.

Version 0.3.0 (under development)

  1. Switch back to using Requires.jl
  2. Switch to using eachindex() where appropriate.
  3. Experimental versions for chapter 3.

Version 0.2.4

  1. Chapter 2 mostly done
  2. Added trankplot function

Version 0.2.0

  1. Support for the 5 examples from chapter 1 done.
  2. Added plot_chains() and model_summary() functions.
  3. Added Makie and AlgebraOfGraphics as dependencies.

Note: Source files for Makie/AoG are all in src/Makie/ to simplify moving those to a separate repo (not my intention right now, but still).

  1. In sync with both ROS[Turing|Stan]Pluto projects tagged 2.3 and up.

Version 0.1.0

  1. Initial commit (to register the package for use in projects).

References

Of course this package is focused on:

  1. Gelman, Hill, Vehtari: Regression and Other Stories

which in a sense is a major update to item 3. below.

There is no shortage of other good books on Bayesian statistics. A few of my favorites are:

  1. Bolstad: Introduction to Bayesian statistics

  2. Bolstad: Understanding Computational Bayesian Statistics

  3. Gelman, Hill: Data Analysis Using Regression and Multilevel/Hierarchical Models

  4. McElreath: Statistical Rethinking

  5. Kruschke: Doing Bayesian Data Analysis

  6. Lee, Wagenmakers: Bayesian Cognitive Modeling

  7. Betancourt: A Conceptual Introduction to Hamiltonian Monte Carlo

  8. Gelman, Carlin, and others: Bayesian Data Analysis

  9. Pearl, Glymour, Jewell: Causal Inference in Statistics: A Primer

  10. Pearl, Mackenzie: The Book of Why

  11. Cunningham: Causal Inference: The Mixtape

A good book to understand most of the Julia constructs used in this package is:

  1. Bogumił Kamiński: Julia for Data Analysis.