
This package defines a Plots recipe to implement the Stata command binscatter in Julia.


binscatter(df::Union{DataFrame, GroupedDataFrame}, f::FormulaTerm, n = 20; 
           weights::Union{Symbol, Nothing} = nothing, seriestype::Symbol = :scatter, kwargs...)


  • df: A DataFrame or a GroupedDataFrame
  • f: A formula created using @formula. The variable(s) in the left-hand side are plotted on the y-axis. The first variable in the right-hand side is plotted on the x-axis. Add other variables for controls.
  • n: Number of bins (defaults to 20).

Keyword arguments

  • weights: A Symbol naming the column of weights (default: nothing)
  • seriestype:
    • :scatter (default) only plots bins
    • :linearfit plots bins with a regression line
    • :scatterpath plots bins with a connecting line
  • kwargs...: Additional attributes from Plots.
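
To make the role of n concrete, here is a minimal sketch of the binning idea, assuming equal-count quantile bins as in Stata's binscatter. The helper binned_means is hypothetical and is not part of Binscatters; the package's internals may differ (for example in how ties, weights, or controls are handled).

```julia
using Statistics

# Hypothetical helper, not part of Binscatters: split x into n quantile bins
# and return the mean of x and the mean of y within each non-empty bin.
function binned_means(x::AbstractVector{<:Real}, y::AbstractVector{<:Real}, n::Integer = 20)
    length(x) == length(y) || throw(ArgumentError("x and y must have the same length"))
    # Interior cut points at the 1/n, 2/n, ..., (n-1)/n quantiles of x.
    cuts = n > 1 ? quantile(x, [k / n for k in 1:n-1]) : Float64[]
    # searchsortedlast counts the cut points <= xi, so bin labels run from 1 to n.
    bins = [searchsortedlast(cuts, xi) + 1 for xi in x]
    xmeans = Float64[]
    ymeans = Float64[]
    for b in 1:n
        idx = findall(==(b), bins)
        isempty(idx) && continue  # skip bins that received no observations
        push!(xmeans, mean(x[idx]))
        push!(ymeans, mean(y[idx]))
    end
    return xmeans, ymeans
end

# Toy data: y is an exact linear function of x.
x = collect(1.0:100.0)
y = 2 .* x .+ 1
xm, ym = binned_means(x, y, 4)
```

Each pair (xm[i], ym[i]) corresponds to one plotted point, so a larger n gives more, noisier points and a smaller n gives fewer, smoother ones.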


using DataFrames, RDatasets, Plots, Binscatters
df = dataset("datasets", "iris")


You can use the usual Plots attributes to customize the plot:

binscatter(df, @formula(SepalLength ~ SepalWidth), seriestype = :scatterpath, linecolor = :blue, markercolor = :blue)


Sepal length appears to be a decreasing function of sepal width in the iris dataset:

binscatter(df, @formula(SepalLength ~ SepalWidth), seriestype = :linearfit)


However, it is an increasing function within species. To show this, you can apply binscatter to a GroupedDataFrame:

binscatter(groupby(df, :Species), @formula(SepalLength ~ SepalWidth), seriestype = :linearfit)

When there is a large number of groups, a better way to visualize this fact is to partial out the variables with respect to the group:

binscatter(df, @formula(SepalLength ~ SepalWidth + fe(Species)), seriestype = :linearfit)
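
As a rough illustration of what partialling out a group means, the sketch below subtracts each group's mean from a variable and adds back the overall mean, so the residualized values stay on the original scale. The helper demean_by_group and the toy data are hypothetical; this is not the package's implementation, only the idea behind a term like fe(Species).

```julia
using Statistics

# Hypothetical helper, not part of Binscatters: remove group means from v,
# then add back the overall mean of v.
function demean_by_group(v::AbstractVector{<:Real}, g::AbstractVector)
    out = float.(v)  # broadcasting allocates a new array, so v is left unchanged
    for level in unique(g)
        idx = findall(==(level), g)
        out[idx] .-= mean(v[idx])
    end
    return out .+ mean(v)
end

# Toy data: y rises with x inside each group but falls with x across groups.
g = ["a", "a", "b", "b"]
x = [1.0, 2.0, 11.0, 12.0]
y = [5.0, 6.0, 0.0, 1.0]

xr = demean_by_group(x, g)
yr = demean_by_group(y, g)

pooled = cor(x, y)    # negative: the between-group differences dominate
within = cor(xr, yr)  # positive: only within-group variation remains
```

Binning and plotting xr against yr, rather than x against y, is what reveals the within-group relationship in a single panel.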


See more examples by typing ?binscatter in the REPL.


The package is registered in the General registry, so it can be installed at the REPL with ] add Binscatters.