Estimating $\pi$ with distributed Monte Carlo

In this tutorial we will estimate $\pi$ by simulating `n` darts thrown at a square dartboard with an inscribed circle.

When the simulated darts are thrown uniformly, the ratio $n_○/n_□$ of the number of darts that land within the circle to the number within the square is approximately the ratio of the areas $A_○/A_□$. For a radius $r$, we have

$$\frac{n_○}{n_□} \approx \frac{A_○}{A_□} = \frac{\pi r^2}{4 r^2} = \frac{\pi}{4}$$

This gives a *Monte Carlo estimate* $\pi_{est} = 4 \frac{n_○}{n_□} \approx \pi$ which improves as the total number of points increases.

To simplify the code we can use symmetry to work with just one quarter of the circle. Any radius will do, so it's simplest to choose $r=1$. This leads to the following sequential code:

```julia
function estimate_pi(n)
    n > 0 || throw(ArgumentError("number of iterations must be >0, got $n"))
    num_inside = 0
    for i in 1:n
        x, y = rand(), rand()
        if x^2 + y^2 <= 1
            num_inside += 1
        end
    end
    return 4 * num_inside / n
end
```
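As a quick sanity check, here is a sketch of running the sequential estimator for increasing `n`. The exact numbers printed will vary from run to run, since `rand()` is not seeded:

```julia
function estimate_pi(n)
    n > 0 || throw(ArgumentError("number of iterations must be >0, got $n"))
    num_inside = 0
    for i in 1:n
        x, y = rand(), rand()
        if x^2 + y^2 <= 1
            num_inside += 1
        end
    end
    return 4 * num_inside / n
end

# Monte Carlo error shrinks roughly like 1/sqrt(n),
# so each extra two digits of accuracy costs ~10,000x more darts.
for n in (10^2, 10^4, 10^6)
    println("n = $n: π ≈ ", estimate_pi(n))
end
```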

We can run this $\pi$ estimation on one core of one machine, but to run it on a cluster of machines we need to reorganize a little. One possible way is to write it as a *distributed reduction* with the `@distributed` macro. We'll also add a line to actually run the function and log the result, giving a complete script:

```julia
using Distributed

function estimate_pi(n)
    n > 0 || throw(ArgumentError("number of iterations must be >0, got $n"))
    num_inside = @distributed (+) for i in 1:n
        x, y = rand(), rand()
        Int(x^2 + y^2 <= 1)
    end
    return 4 * num_inside / n
end

pi_estimate = estimate_pi(1_000_000_000)
@info "Finished computation" pi_estimate
```
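Before launching on JuliaHub, you can try the distributed version on your own machine by starting local worker processes with `addprocs` from the Distributed standard library. The worker count of 4 below is an arbitrary choice for illustration:

```julia
using Distributed
addprocs(4)  # start four local worker processes for testing

function estimate_pi(n)
    n > 0 || throw(ArgumentError("number of iterations must be >0, got $n"))
    # The loop body is partitioned across all available workers and
    # the partial counts are combined with (+).
    num_inside = @distributed (+) for i in 1:n
        x, y = rand(), rand()
        Int(x^2 + y^2 <= 1)
    end
    return 4 * num_inside / n
end

pi_estimate = estimate_pi(10_000_000)
@info "Finished computation" pi_estimate
```

On JuliaHub the cluster's worker processes are provisioned for you when the job starts, so the submitted script doesn't need to call `addprocs` itself.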

Next we'll launch the job from within your local installation of VSCode. To set things up, you'll need to install two extensions: the Julia extension and the JuliaHub extension. You'll also need to set https://juliahub.com as the package server. See the "Connecting your editor" section for detailed instructions.

Bring up the JuliaHub extension interface via the VSCode command palette (press `F1`) with the "JuliaHub: Show" command:

The JuliaHub extension interface can be used to select the number of computers ("nodes") for a distributed computation, etc. Let's select a two node cluster with four CPUs per node:

Now click the "Start Job" button. After some time the cluster will be up and running and you can view live logs as the computation progresses. In the example above, there's just a single `@info` line of interest.

There are no external packages used in the simple script above, but we'll shortly need the JSON package, so let's add a *new Julia environment* and record the package version.

* Run the "Julia: Activate This Environment" command from the command palette.
* Add the JSON package by typing `]add JSON` into the terminal.

You'll see that two files, `Project.toml` and `Manifest.toml`, have been generated to define your package environment. As your project develops you should keep these as an important part of your source code: with these files you'll be able to reproduce your computation later, and the JuliaHub extension will know which packages to install when running code in the cloud.

While we can see the output of our computation in the job logs, it's more convenient to have the important summary information directly available in the JuliaHub interface.

To make the output visible within the JuliaHub interface, the `ENV["RESULTS"]` environment variable can be set to a snippet of JSON. For example, we can change the last few lines of our script to read:

```julia
using Distributed

function estimate_pi(n)
    n > 0 || throw(ArgumentError("number of iterations must be >0, got $n"))
    num_inside = @distributed (+) for i in 1:n
        x, y = rand(), rand()
        Int(x^2 + y^2 <= 1)
    end
    return 4 * num_inside / n
end

num_darts = 1_000_000_000
stats = @timed begin
    pi_estimate = estimate_pi(num_darts)
end
@info "π estimate" pi_estimate elapsed_time=stats.time

using JSON
ENV["RESULTS"] = json(Dict(
    :pi => pi_estimate,
    :num_darts => num_darts,
    :compute_time => stats.time
))
```
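For reference, `JSON.json` serializes the dictionary into a compact JSON string, which is what JuliaHub reads back from `ENV["RESULTS"]`. A small round-trip sketch (key order in a `Dict` is unspecified, so yours may print in a different order):

```julia
using JSON

# Serialize a results dictionary to a JSON string,
# e.g. {"num_darts":1000000000,"pi":3.141592}
results = json(Dict(:pi => 3.141592, :num_darts => 1_000_000_000))
println(results)

# Parse it back into a Dict to confirm the round trip
parsed = JSON.parse(results)
println(parsed["pi"])
```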

Launching this job, you'll see that the job details tab in the web interface at https://juliahub.com/ui/Run now includes the summary information we added to `ENV["RESULTS"]`:

Try running this code on a larger cluster of machines, and with a larger number of random dart throws. It's an embarrassingly parallel problem, so it should scale almost perfectly across a large cluster.