# Custom Simulations

This is an example of building a custom simulation of a FACS-based screen.

This section is complementary to the Implementation section of the paper

## Setup

Lets first start the Julia REPL or a Julia session inside of a Jupyter Notebook and load the packages we'll need:

```
using Crispulator
using Gadfly
```

## Basic screen parameters

First lets design a simple `Crispulator.FacsScreen`

with 250 genes with 5 guides per gene. Lets say that we make sure we have 1000x as many cells as guides during transfection (`representation`

) and sorting (`bottleneck_representation`

) and 1000x as many reads as guides during sequencing (`seq_depth`

). We'll leave the rest of the values as the defaults and print the object

```
s = FacsScreen()
s.num_genes = 250
s.coverage = 5
s.representation = 1000
s.bottleneck_representation = 1000
s.seq_depth = 1000
println(s)
```

```
Crispulator.FacsScreen <: ScreenSetup
Number of genes: 250
Number of guides per gene: 5 (1250 guides total)
Cells per guide at transfection: 1000 (1250000 cells total)
Multiplicity of infection (M.O.I.): 0.25
Number of cells per guide at sequencing: 1000 (1250000 cells total)
Number of cells per guide sorted: 1000 (1250000 cells total)
Gaussian standard deviation of sorting: 1.0 (phenotype units)
Bins:
bin1: 0-33 percentile of cells
bin2: 67-100 percentile of cells
```

## Construction of true phenotype distribution

Next, lets make our distribution of true phenotypes. The basic layout is a `Dict`

mapping a class name to a tuple of the probability of selecting this class and then the `Distributions.Sampleable`

from which to draw a random phenotype from this class. The probabilities across all the classes should add up to 1.

For example, here we are making three different classes of "genes": the first group are `:inactive`

, i.e. they have no phenotype, so we'll set their phenotypes to 0.0 using a `Crispulator.Delta`

. We'll also make them 60% of all the genes. The second group are the negative controls `:negcontrol`

(the only required group) which make up 10% of the population of genes and also have no effect. The final group is `:increasing`

which makes up 30% of all genes and which are represented by a Normal(μ=0.1, σ=0.1) distribution clamped between 0.025 and 1.

```
max_phenotype_dists = Dict{Symbol, Tuple{Float64, Sampleable}}(
:inactive => (0.60, Delta(0.0)),
:negcontrol => (0.1, Delta(0.0)),
:increasing => (0.3, truncated(Normal(0.1, 0.1), 0.025, 1)),
);
```

```
Dict{Symbol, Tuple{Float64, Distributions.Sampleable}} with 3 entries:
:negcontrol => (0.1, Delta(0.0))
:inactive => (0.6, Delta(0.0))
:increasing => (0.3, Truncated(Distributions.Normal{Float64}(μ=0.1, σ=0.1); l…
```

The `:negcontrol`

class needs to be present because Crispulator normalizes the frequencies of all other guides against the median frequency of the negative control guides. Also the distribution of `:negcontrol`

guides serve as the null distribution against which the log2 fold changes of guides targeting a specific gene are assayed to calculate a statistical significance of the shift for each gene. See `Crispulator.differences_between_bins`

for more details.

## Library construction

Now, we actually build the library. Here we're making a `Crispulator.CRISPRi`

library and then getting the guides that were built from the true phenotype distribution that we constructed above and we also get the frequency of each guide in the library.

```
lib = Library(max_phenotype_dists, CRISPRi())
guides, guide_freqs_dist = construct_library(s, lib);
```

```
(Crispulator.Barcode[Crispulator.Barcode(1, 0.9556068838439474, 0.10221415995027207, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(1, 0.9289247009847659, 0.09936016532894809, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(1, 0.8749466517427581, 0.09358653492474599, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(1, 0.9045174315285525, 0.09674950127208035, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(1, 0.7426632858894747, 0.07943716728760795, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(2, 0.9136987916149341, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(2, 0.8536938118680988, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(2, 0.9096305815630755, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(2, 0.7120688477346155, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(2, 0.9587409603742346, 0.0, NaN, :linear, :inactive, -Inf) … Crispulator.Barcode(249, 0.9226871138413806, 0.04374905988039778, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(249, 0.745293367907617, 0.03533796418301227, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(249, 0.9265150046632483, 0.043930558702988155, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(249, 0.8980633212608335, 0.042581526748170645, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(249, 0.9603396468030689, 0.04553434862506398, NaN, :linear, :increasing, -Inf), Crispulator.Barcode(250, 0.8171188503650818, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(250, 0.994105695790255, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(250, 0.9150093127230605, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(250, 0.8909927062378782, 0.0, NaN, :linear, :inactive, -Inf), Crispulator.Barcode(250, 0.14720002318209385, 0.0, NaN, :linear, :inactive, -Inf)], Distributions.Categorical{Float64, Vector{Float64}}(
support: Base.OneTo(1250)
p: [0.00040211353567488943, 0.000345777950461791, 0.00031432667997179716, 0.0005098736359933936, 0.0015492548715405404, 0.000963028896842269, 0.0005531261475957026, 0.002421528526294965, 0.0007927459850577805, 0.0007997605157473652 … 0.0002185700104040626, 0.001032525285059027, 0.001701100922121758, 0.0002635310314151374, 0.0009578576636695585, 0.0034773120497709827, 0.0006982749171233774, 0.0002660525171325106, 0.00036869312569451205, 0.00042074729589129553]
)
)
```

Lets first look at what the true phenotype distribution of our different classes of guides looks like

```
df = DataFrame(Dict(
:phenotype=>map(x->x.theo_phenotype, guides),
:class=>map(x->x.class, guides),
:freq=>pdf.(guide_freqs_dist, 1:(length(guides)))
))
plot(df, x=:phenotype, color=:class, Geom.histogram, Guide.ylabel("Number of guides"),
Guide.title("Guide phenotype distribution"))
```

As you can see, most guides should have a phenotype of 0. In FACS Screens this is equivalent to having no preference to being in either the left (`bin1`

) or right (`bin2`

) bins. The `:increasing`

genes have a small preference to be in the right bin.

We can also look at the frequency of each guide in the library, which follows a Log-Normal distribution.

```
plot(df, x=:freq, color=:class, Geom.histogram(position=:stack),
Guide.xlabel("Frequency"), Guide.ylabel("Number of guides"),
Guide.title("Frequencies of guides in simulated library"))
```

## Performing the screen

Now, we'll actually perform the screen. We'll first perform the transection via `Crispulator.transfect`

, followed by the selection process via `Crispulator.select`

:

```
cells, cell_phenotypes = transfect(s, lib, guides, guide_freqs_dist)
bin_cells = Crispulator.select(s, cells, cell_phenotypes, guides)
freqs = counts_to_freqs(bin_cells, length(guides));
```

```
Dict{Symbol, Vector{Float64}} with 2 entries:
:bin1 => [0.000334874, 0.000326656, 0.00024037, 0.000412943, 0.00145455, 0.00…
:bin2 => [0.000406779, 0.000334873, 0.000297894, 0.000564971, 0.0016189, 0.00…
```

Lets look at what the observed phenotype distribution looks like when the selection was performed:

```
df = DataFrame(Dict(
:phenotype=>map(x->x.theo_phenotype, guides),
:class=>map(x->x.class, guides),
:obs_freq=>map(x->x.obs_phenotype, guides)
))
plot(df, x=:obs_freq, Geom.density, Guide.xlabel("Observed phenotype on FACS machine"),
Guide.title("Kernel density estimate of guide observed phenotypes"), Guide.ylabel("ρ"))
```

As you can see, this looks like many FACS plots, e.g. when looking at density along the fluorescence channel. A quick sanity check is that we should see a slight enrichment of the frequency of `:increasing`

genes on the right side

```
plot(df, x=:obs_freq, color=:class, Geom.density, Guide.xlabel("Observed phenotype on FACS machine"),
Guide.title("Kernel density estimate of guide observed phenotypes"), Guide.ylabel("ρ"))
```

And that is what we see. The change is really small (this is pretty usual), but the later analysis will be able to pull out the `increasing`

genes.

## Sequencing and Analysis

Now we'll use `Crispulator.sequencing`

to simulate sequencing by transforming the guide frequencies into a Categorical distribution and drawing a random sample of reads from this distribution. Finally, we'll use the `Crispulator.differences_between_bins`

function to compute the differences between bins on a per-guide level (`guide_data`

) and per-gene level (`gene_data`

).

```
raw_data = sequencing(Dict(:bin1=>s.seq_depth, :bin2=>s.seq_depth), guides, freqs)
guide_data, gene_data = differences_between_bins(raw_data);
```

```
(1250×17 DataFrame
Row │ gene knockdown theo_phenotype behavior class initial_freq ⋯
│ Int64 Float64 Float64 Symbol Symbol Float64 ⋯
──────┼─────────────────────────────────────────────────────────────────────────
1 │ 1 0.955607 0.102214 linear increasing 0.0003698 ⋯
2 │ 1 0.928925 0.0993602 linear increasing 0.000328711
3 │ 1 0.874947 0.0935865 linear increasing 0.00025886
4 │ 1 0.904517 0.0967495 linear increasing 0.000484848
5 │ 1 0.742663 0.0794372 linear increasing 0.00149153 ⋯
6 │ 2 0.913699 0.0 linear inactive 0.000912173
7 │ 2 0.853694 0.0 linear inactive 0.000538264
8 │ 2 0.909631 0.0 linear inactive 0.00257216
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
1244 │ 249 0.898063 0.0425815 linear increasing 0.000217771 ⋯
1245 │ 249 0.96034 0.0455343 linear increasing 0.000961479
1246 │ 250 0.817119 0.0 linear inactive 0.00344325
1247 │ 250 0.994106 0.0 linear inactive 0.000694402
1248 │ 250 0.915009 0.0 linear inactive 0.000353364 ⋯
1249 │ 250 0.890993 0.0 linear inactive 0.000390344
1250 │ 250 0.1472 0.0 linear inactive 0.000439651
11 columns and 1235 rows omitted, 227×7 DataFrame
Row │ gene behavior class pvalue_bin2_div_bin1 mean_bin2_div_bin1 ⋯
│ Int64 Symbol Symbol Float64 Float64 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ 1 linear increasing 3.21561 0.319777 ⋯
2 │ 2 linear inactive -0.0 -0.0122411
3 │ 3 linear increasing 1.5775 0.125157
4 │ 4 linear inactive 0.99711 0.142935
5
```