Custom Simulations

This is an example of building a custom simulation of a FACS-based screen.

Tip

This section is complementary to the Implementation section of the paper

Setup

Lets first start the Julia REPL or a Julia session inside of a Jupyter Notebook and load the packages we'll need:

using Crispulator
using Gadfly

Basic screen parameters

First lets design a simple Crispulator.FacsScreen with 250 genes with 5 guides per gene. Lets say that we make sure we have 1000x as many cells as guides during transfection (representation) and sorting (bottleneck_representation) and 1000x as many reads as guides during sequencing (seq_depth). We'll leave the rest of the values as the defaults and print the object

s = FacsScreen()
s.num_genes = 250
s.coverage = 5
s.representation = 1000
s.bottleneck_representation = 1000
s.seq_depth = 1000
println(s)
Crispulator.FacsScreen <: ScreenSetup

    Number of genes: 250
    Number of guides per gene: 5 (1250 guides total)
    Cells per guide at transfection: 1000 (1250000 cells total)
    Multiplicity of infection (M.O.I.): 0.25
    Number of cells per guide at sequencing: 1000 (1250000 cells total)

    Number of cells per guide sorted: 1000 (1250000 cells total)
    Gaussian standard deviation of sorting: 1.0 (phenotype units)
    Bins:
        bin1: 0-33 percentile of cells
        bin2: 67-100 percentile of cells

Construction of true phenotype distribution

Next, lets make our distribution of true phenotypes. The basic layout is a Dict mapping a class name to a tuple of the probability of selecting this class and then the Distributions.Sampleable from which to draw a random phenotype from this class. The probabilities across all the classes should add up to 1.

For example, here we are making three different classes of "genes": the first group are :inactive, i.e. they have no phenotype, so we'll set their phenotypes to 0.0 using a Crispulator.Delta. We'll also make them 60% of all the genes. The second group are the negative controls :negcontrol (the only required group) which make up 10% of the population of genes and also have no effect. The final group is :increasing which makes up 30% of all genes and which are represented by a Normal(μ=0.1, σ=0.1) distribution clamped between 0.025 and 1.

max_phenotype_dists = Dict{Symbol, Tuple{Float64, Sampleable}}(
    :inactive => (0.60, Delta(0.0)),
    :negcontrol => (0.1, Delta(0.0)),
    :increasing => (0.3, truncated(Normal(0.1, 0.1), 0.025, 1)),
);
Dict{Symbol, Tuple{Float64, Distributions.Sampleable}} with 3 entries:
  :negcontrol => (0.1, Delta(0.0))
  :inactive   => (0.6, Delta(0.0))
  :increasing => (0.3, Truncated(Distributions.Normal{Float64}(μ=0.1, σ=0.1); l…
Note

The :negcontrol class needs to be present because Crispulator normalizes the frequencies of all other guides against the median frequency of the negative control guides. Also the distribution of :negcontrol guides serve as the null distribution against which the log2 fold changes of guides targeting a specific gene are assayed to calculate a statistical significance of the shift for each gene. See Crispulator.differences_between_bins for more details.

Library construction

Now, we actually build the library. Here we're making a Crispulator.CRISPRi library and then getting the guides that were built from the true phenotype distribution that we constructed above and we also get the frequency of each guide in the library.

lib = Library(max_phenotype_dists, CRISPRi())
guides, guide_freqs_dist = construct_library(s, lib);
(Crispulator.Barcode[Crispulator.Barcode(1, 0.9556068838439474, 0.10221415995027207, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(1, 0.9289247009847659, 0.09936016532894809, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(1, 0.8749466517427581, 0.09358653492474599, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(1, 0.9045174315285525, 0.09674950127208035, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(1, 0.7426632858894747, 0.07943716728760795, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(2, 0.9136987916149341, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(2, 0.8536938118680988, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(2, 0.9096305815630755, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(2, 0.7120688477346155, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(2, 0.9587409603742346, 0.0, NaN, :linear, :inactive, -Inf)  …  Crispulator.Barcode(249, 0.9226871138413806, 0.04374905988039778, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(249, 0.745293367907617, 0.03533796418301227, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(249, 0.9265150046632483, 0.043930558702988155, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(249, 0.8980633212608335, 0.042581526748170645, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(249, 0.9603396468030689, 0.04553434862506398, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(250, 0.8171188503650818, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(250, 0.994105695790255, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(250, 0.9150093127230605, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(250, 0.8909927062378782, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(250, 0.14720002318209385, 0.0, NaN, :linear, :inactive, -Inf)], Distributions.Categorical{Float64, Vector{Float64}}(
support: Base.OneTo(1250)
p: [0.00040211353567488943, 0.000345777950461791, 0.00031432667997179716, 0.0005098736359933936, 0.0015492548715405404, 0.000963028896842269, 0.0005531261475957026, 0.002421528526294965, 0.0007927459850577805, 0.0007997605157473652  …  0.0002185700104040626, 0.001032525285059027, 0.001701100922121758, 0.0002635310314151374, 0.0009578576636695585, 0.0034773120497709827, 0.0006982749171233774, 0.0002660525171325106, 0.00036869312569451205, 0.00042074729589129553]
)
)

Lets first look at what the true phenotype distribution of our different classes of guides looks like

df = DataFrame(Dict(
    :phenotype=>map(x->x.theo_phenotype, guides),
    :class=>map(x->x.class, guides),
    :freq=>pdf.(guide_freqs_dist, 1:(length(guides)))
))
plot(df, x=:phenotype, color=:class, Geom.histogram, Guide.ylabel("Number of guides"),
Guide.title("Guide phenotype distribution"))
phenotype -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 -0.5 0.0 0.5 1.0 -0.40 -0.38 -0.36 -0.34 -0.32 -0.30 -0.28 -0.26 -0.24 -0.22 -0.20 -0.18 -0.16 -0.14 -0.12 -0.10 -0.08 -0.06 -0.04 -0.02 0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 0.22 0.24 0.26 0.28 0.30 0.32 0.34 0.36 0.38 0.40 0.42 0.44 0.46 0.48 0.50 0.52 0.54 0.56 0.58 0.60 0.62 0.64 0.66 0.68 0.70 0.72 0.74 0.76 0.78 0.80 increasing inactive negcontrol class h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -1250 -1000 -750 -500 -250 0 250 500 750 1000 1250 1500 1750 2000 2250 -1000 -900 -800 -700 -600 -500 -400 -300 -200 -100 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 -1000 0 1000 2000 -1000 -950 -900 -850 -800 -750 -700 -650 -600 -550 -500 -450 -400 -350 -300 -250 -200 -150 -100 -50 0 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800 850 900 950 1000 1050 1100 1150 1200 1250 1300 1350 1400 1450 1500 1550 1600 1650 1700 1750 1800 1850 1900 1950 2000 Number of guides Guide phenotype distribution

As you can see, most guides should have a phenotype of 0. In FACS Screens this is equivalent to having no preference to being in either the left (bin1) or right (bin2) bins. The :increasing genes have a small preference to be in the right bin.

We can also look at the frequency of each guide in the library, which follows a Log-Normal distribution.

plot(df, x=:freq, color=:class, Geom.histogram(position=:stack),
    Guide.xlabel("Frequency"), Guide.ylabel("Number of guides"),
    Guide.title("Frequencies of guides in simulated library"))
Frequency -0.007 -0.006 -0.005 -0.004 -0.003 -0.002 -0.001 0.000 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.010 0.011 0.012 0.013 -0.0060 -0.0055 -0.0050 -0.0045 -0.0040 -0.0035 -0.0030 -0.0025 -0.0020 -0.0015 -0.0010 -0.0005 0.0000 0.0005 0.0010 0.0015 0.0020 0.0025 0.0030 0.0035 0.0040 0.0045 0.0050 0.0055 0.0060 0.0065 0.0070 0.0075 0.0080 0.0085 0.0090 0.0095 0.0100 0.0105 0.0110 0.0115 0.0120 -0.01 0.00 0.01 0.02 -0.0060 -0.0058 -0.0056 -0.0054 -0.0052 -0.0050 -0.0048 -0.0046 -0.0044 -0.0042 -0.0040 -0.0038 -0.0036 -0.0034 -0.0032 -0.0030 -0.0028 -0.0026 -0.0024 -0.0022 -0.0020 -0.0018 -0.0016 -0.0014 -0.0012 -0.0010 -0.0008 -0.0006 -0.0004 -0.0002 0.0000 0.0002 0.0004 0.0006 0.0008 0.0010 0.0012 0.0014 0.0016 0.0018 0.0020 0.0022 0.0024 0.0026 0.0028 0.0030 0.0032 0.0034 0.0036 0.0038 0.0040 0.0042 0.0044 0.0046 0.0048 0.0050 0.0052 0.0054 0.0056 0.0058 0.0060 0.0062 0.0064 0.0066 0.0068 0.0070 0.0072 0.0074 0.0076 0.0078 0.0080 0.0082 0.0084 0.0086 0.0088 0.0090 0.0092 0.0094 0.0096 0.0098 0.0100 0.0102 0.0104 0.0106 0.0108 0.0110 0.0112 0.0114 0.0116 0.0118 0.0120 increasing inactive negcontrol class h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -70 -60 -50 -40 -30 -20 -10 0 10 20 30 40 50 60 70 80 90 100 110 120 130 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 -100 0 100 200 -60 -58 -56 -54 -52 -50 -48 -46 -44 -42 -40 -38 -36 -34 -32 -30 -28 -26 -24 -22 -20 -18 -16 -14 -12 -10 -8 -6 -4 -2 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 102 104 106 108 110 112 114 116 118 120 Number of guides Frequencies of guides in simulated library

Performing the screen

Now, we'll actually perform the screen. We'll first perform the transection via Crispulator.transfect, followed by the selection process via Crispulator.select:

cells, cell_phenotypes = transfect(s, lib, guides, guide_freqs_dist)
bin_cells = Crispulator.select(s, cells, cell_phenotypes, guides)
freqs = counts_to_freqs(bin_cells, length(guides));
Dict{Symbol, Vector{Float64}} with 2 entries:
  :bin1 => [0.000334874, 0.000326656, 0.00024037, 0.000412943, 0.00145455, 0.00…
  :bin2 => [0.000406779, 0.000334873, 0.000297894, 0.000564971, 0.0016189, 0.00…

Lets look at what the observed phenotype distribution looks like when the selection was performed:

df = DataFrame(Dict(
    :phenotype=>map(x->x.theo_phenotype, guides),
    :class=>map(x->x.class, guides),
    :obs_freq=>map(x->x.obs_phenotype, guides)
))
plot(df, x=:obs_freq, Geom.density, Guide.xlabel("Observed phenotype on FACS machine"),
Guide.title("Kernel density estimate of guide observed phenotypes"), Guide.ylabel("ρ"))
Observed phenotype on FACS machine -20 -15 -10 -5 0 5 10 15 20 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 -20 -10 0 10 20 -15.0 -14.5 -14.0 -13.5 -13.0 -12.5 -12.0 -11.5 -11.0 -10.5 -10.0 -9.5 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 -0.5 0.0 0.5 1.0 -0.50 -0.48 -0.46 -0.44 -0.42 -0.40 -0.38 -0.36 -0.34 -0.32 -0.30 -0.28 -0.26 -0.24 -0.22 -0.20 -0.18 -0.16 -0.14 -0.12 -0.10 -0.08 -0.06 -0.04 -0.02 0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 0.22 0.24 0.26 0.28 0.30 0.32 0.34 0.36 0.38 0.40 0.42 0.44 0.46 0.48 0.50 0.52 0.54 0.56 0.58 0.60 0.62 0.64 0.66 0.68 0.70 0.72 0.74 0.76 0.78 0.80 0.82 0.84 0.86 0.88 0.90 0.92 0.94 0.96 0.98 1.00 ρ Kernel density estimate of guide observed phenotypes

As you can see, this looks like many FACS plots, e.g. when looking at density along the fluorescence channel. A quick sanity check is that we should see a slight enrichment of the frequency of :increasing genes on the right side

plot(df, x=:obs_freq, color=:class, Geom.density, Guide.xlabel("Observed phenotype on FACS machine"),
Guide.title("Kernel density estimate of guide observed phenotypes"), Guide.ylabel("ρ"))
Observed phenotype on FACS machine -20 -15 -10 -5 0 5 10 15 20 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 -20 -10 0 10 20 -15.0 -14.5 -14.0 -13.5 -13.0 -12.5 -12.0 -11.5 -11.0 -10.5 -10.0 -9.5 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 increasing inactive negcontrol class h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 -0.5 0.0 0.5 1.0 -0.50 -0.48 -0.46 -0.44 -0.42 -0.40 -0.38 -0.36 -0.34 -0.32 -0.30 -0.28 -0.26 -0.24 -0.22 -0.20 -0.18 -0.16 -0.14 -0.12 -0.10 -0.08 -0.06 -0.04 -0.02 0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 0.22 0.24 0.26 0.28 0.30 0.32 0.34 0.36 0.38 0.40 0.42 0.44 0.46 0.48 0.50 0.52 0.54 0.56 0.58 0.60 0.62 0.64 0.66 0.68 0.70 0.72 0.74 0.76 0.78 0.80 0.82 0.84 0.86 0.88 0.90 0.92 0.94 0.96 0.98 1.00 ρ Kernel density estimate of guide observed phenotypes

And that is what we see. The change is really small (this is pretty usual), but the later analysis will be able to pull out the increasing genes.

Sequencing and Analysis

Now we'll use Crispulator.sequencing to simulate sequencing by transforming the guide frequencies into a Categorical distribution and drawing a random sample of reads from this distribution. Finally, we'll use the Crispulator.differences_between_bins function to compute the differences between bins on a per-guide level (guide_data) and per-gene level (gene_data).

raw_data = sequencing(Dict(:bin1=>s.seq_depth, :bin2=>s.seq_depth), guides, freqs)
guide_data, gene_data = differences_between_bins(raw_data);
(1250×17 DataFrame
  Row │ gene   knockdown  theo_phenotype  behavior   class       initial_freq  ⋯
      │ Int64  Float64    Float64         Symbol     Symbol      Float64       ⋯
──────┼─────────────────────────────────────────────────────────────────────────
    1 │     1  0.955607       0.102214    linear     increasing   0.0003698    ⋯
    2 │     1  0.928925       0.0993602   linear     increasing   0.000328711
    3 │     1  0.874947       0.0935865   linear     increasing   0.00025886
    4 │     1  0.904517       0.0967495   linear     increasing   0.000484848
    5 │     1  0.742663       0.0794372   linear     increasing   0.00149153   ⋯
    6 │     2  0.913699       0.0         linear     inactive     0.000912173
    7 │     2  0.853694       0.0         linear     inactive     0.000538264
    8 │     2  0.909631       0.0         linear     inactive     0.00257216
  ⋮   │   ⋮        ⋮            ⋮             ⋮          ⋮            ⋮        ⋱
 1244 │   249  0.898063       0.0425815   linear     increasing   0.000217771  ⋯
 1245 │   249  0.96034        0.0455343   linear     increasing   0.000961479
 1246 │   250  0.817119       0.0         linear     inactive     0.00344325
 1247 │   250  0.994106       0.0         linear     inactive     0.000694402
 1248 │   250  0.915009       0.0         linear     inactive     0.000353364  ⋯
 1249 │   250  0.890993       0.0         linear     inactive     0.000390344
 1250 │   250  0.1472         0.0         linear     inactive     0.000439651
                                                11 columns and 1235 rows omitted, 227×7 DataFrame
 Row │ gene   behavior   class       pvalue_bin2_div_bin1  mean_bin2_div_bin1  ⋯
     │ Int64  Symbol     Symbol      Float64               Float64             ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │     1  linear     increasing            3.21561             0.319777    ⋯
   2 │     2  linear     inactive             -0.0                -0.0122411
   3 │     3  linear     increasing            1.5775              0.125157
   4 │     4  linear     inactive              0.99711             0.142935
   5 │     6  linear     increasing            3.42921             0.377822    ⋯
   6 │     7  sigmoidal  inactive              1.5775              0.129307
   7 │     8  linear     inactive              1.99779            -0.169173
   8 │     9  linear     inactive              1.16795            -0.092462
  ⋮  │   ⋮        ⋮          ⋮                ⋮                    ⋮           ⋱
 221 │   244  sigmoidal  inactive              1.08094             0.077192    ⋯
 222 │   245  sigmoidal  inactive              0.235651            0.0314794
 223 │   246  sigmoidal  increasing            0.112024            0.0444467
 224 │   247  linear     inactive              0.109206            0.00144287
 225 │   248  sigmoidal  increasing            3.71749             0.69264     ⋯
 226 │   249  linear     increasing            0.350441            0.044728
 227 │   250  linear     inactive              0.00457469          0.00927481
                                                  2 columns and 212 rows omitted)

Here's what the per-guide data looks like:

10×17 DataFrame
Rowgeneknockdowntheo_phenotypebehaviorclassinitial_freqcountsbarcodeidfreqsrel_freqscounts_bin1freqs_bin1rel_freqs_bin1counts_bin2freqs_bin2rel_freqs_bin2log2fc_bin2_div_bin1
Int64Float64Float64SymbolSymbolFloat64Float64Int64Float64Float64Float64Float64Float64Float64Float64Float64Float64
110.9556070.102214linearincreasing0.0003698421.510.0003370310.549186421.50.0003370310.549186510.50.0004081960.7215550.393815
210.9289250.0993602linearincreasing0.000328711400.520.000320240.521824400.50.000320240.521824417.50.0003338330.5901060.177411
310.8749470.0935865linearincreasing0.00025886320.530.0002562720.41759320.50.0002562720.41759344.50.0002754620.4869260.221616
410.9045170.0967495linearincreasing0.000484848514.540.0004113940.670358514.50.0004113940.670358722.50.0005777111.02120.607263
510.7426630.0794372linearincreasing0.001491531896.550.001516442.471011896.50.001516442.471012006.50.00160442.836040.198778
620.9136990.0linearinactive0.0009121731187.560.0009495251.547231187.50.0009495251.547231136.50.0009087461.606360.0541068
720.8536940.0linearinactive0.000538264704.570.0005633180.917915704.50.0005633180.917915597.50.0004777610.844523-0.120224
820.9096310.0linearinactive0.002572163608.580.002885364.701633608.50.002885364.701632969.50.002374414.19717-0.163743
920.7120690.0linearinactive0.000723164968.590.0007744131.26189968.50.0007744131.26189963.50.0007704151.361840.109969
1020.9587410.0linearinactive0.000809451052.5100.0008415791.371341052.50.0008415791.371341010.50.0008079961.428270.0586857
Tip

See Crispulator.differences_between_bins for details on what each column means.

And the gene level data

10×7 DataFrame
Rowgenebehaviorclasspvalue_bin2_div_bin1mean_bin2_div_bin1absmean_bin2_div_bin1pvalmeanprod_bin2_div_bin1
Int64SymbolSymbolFloat64Float64Float64Float64
11linearincreasing3.215610.3197770.3197771.02828
22linearinactive-0.0-0.01224110.01224110.0
33linearincreasing1.57750.1251570.1251570.197435
44linearinactive0.997110.1429350.1429350.142522
56linearincreasing3.429210.3778220.3778221.29563
67sigmoidalinactive1.57750.1293070.1293070.203983
78linearinactive1.99779-0.1691730.169173-0.337972
89linearinactive1.16795-0.0924620.092462-0.107991
910linearinactive0.3814250.0646070.0646070.0246427
1011sigmoidalinactive0.3428390.01064860.01064860.00365076

We can generate standard pooled screen plots from this dataset. Like a count scatterplot:

nopseudo = guide_data[(guide_data[!, :counts_bin1] .> 0.5) .& (guide_data[!, :counts_bin2] .> 0.5), :]
plot(nopseudo, x=:counts_bin1, y=:counts_bin2, color=:class, Scale.x_log10,
Scale.y_log10, Theme(highlight_width=0pt), Coord.cartesian(fixed=true),
Guide.xlabel("log counts bin1"), Guide.ylabel("log counts bin2"))
log counts bin1 10-5 10-4 10-3 10-2 10-1 100 101 102 103 104 105 106 107 108 109 10-4.0 10-3.5 10-3.0 10-2.5 10-2.0 10-1.5 10-1.0 10-0.5 100.0 100.5 101.0 101.5 102.0 102.5 103.0 103.5 104.0 104.5 105.0 105.5 106.0 106.5 107.0 107.5 108.0 10-5 100 105 1010 10-4.0 10-3.8 10-3.6 10-3.4 10-3.2 10-3.0 10-2.8 10-2.6 10-2.4 10-2.2 10-2.0 10-1.8 10-1.6 10-1.4 10-1.2 10-1.0 10-0.8 10-0.6 10-0.4 10-0.2 100.0 100.2 100.4 100.6 100.8 101.0 101.2 101.4 101.6 101.8 102.0 102.2 102.4 102.6 102.8 103.0 103.2 103.4 103.6 103.8 104.0 104.2 104.4 104.6 104.8 105.0 105.2 105.4 105.6 105.8 106.0 106.2 106.4 106.6 106.8 107.0 107.2 107.4 107.6 107.8 108.0 increasing inactive negcontrol class h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? 10-5 10-4 10-3 10-2 10-1 100 101 102 103 104 105 106 107 108 109 10-4.0 10-3.5 10-3.0 10-2.5 10-2.0 10-1.5 10-1.0 10-0.5 100.0 100.5 101.0 101.5 102.0 102.5 103.0 103.5 104.0 104.5 105.0 105.5 106.0 106.5 107.0 107.5 108.0 10-5 100 105 1010 10-4.0 10-3.8 10-3.6 10-3.4 10-3.2 10-3.0 10-2.8 10-2.6 10-2.4 10-2.2 10-2.0 10-1.8 10-1.6 10-1.4 10-1.2 10-1.0 10-0.8 10-0.6 10-0.4 10-0.2 100.0 100.2 100.4 100.6 100.8 101.0 101.2 101.4 101.6 101.8 102.0 102.2 102.4 102.6 102.8 103.0 103.2 103.4 103.6 103.8 104.0 104.2 104.4 104.6 104.8 105.0 105.2 105.4 105.6 105.8 106.0 106.2 106.4 106.6 106.8 107.0 107.2 107.4 107.6 107.8 108.0 log counts bin2

And a volcano plot:

plot(gene_data, x=:mean_bin2_div_bin1, y=:pvalue_bin2_div_bin1, color=:class, Theme(highlight_width=0pt),
Guide.xlabel("mean log2 fold change"), Guide.ylabel("-log10 pvalue"))
mean log2 fold change -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 -2 0 2 4 -2.00 -1.95 -1.90 -1.85 -1.80 -1.75 -1.70 -1.65 -1.60 -1.55 -1.50 -1.45 -1.40 -1.35 -1.30 -1.25 -1.20 -1.15 -1.10 -1.05 -1.00 -0.95 -0.90 -0.85 -0.80 -0.75 -0.70 -0.65 -0.60 -0.55 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50 increasing inactive class h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 -5 0 5 10 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0 4.2 4.4 4.6 4.8 5.0 5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.8 7.0 7.2 7.4 7.6 7.8 8.0 -log10 pvalue

And finally we can see how well we can differentiate between the different classes using Area Under the Precision-Recall Curve (Crispulator.auprc)

auprc(gene_data[!, :pvalmeanprod_bin2_div_bin1], gene_data[!, :class], Set([:increasing]))[1]
0.950629306347186

Crispulator.auroc and Crispulator.venn are also good summary statistics.