Running a JuliaHub job script from your local Julia IDE

Estimating $\pi$ with distributed Monte Carlo

Background

In this tutorial we will estimate $\pi$ by simulating $n$ darts thrown at a square dartboard with an inscribed circle.

Dartboard image

When the simulated darts are thrown uniformly, the ratio $n_\circ/n_\square$ of the number of darts that land within the circle to the number within the square is approximately the ratio of the areas $A_\circ/A_\square$. For a radius $r$, we have

$$
\begin{aligned}
\frac{n_\circ}{n_\square} \approx \frac{A_\circ}{A_\square} = \frac{\pi r^2}{4r^2} = \frac{\pi}{4}
\end{aligned}
$$

This gives a Monte Carlo estimate $\pi_{\mathrm{est}} = 4\,\frac{n_\circ}{n_\square} \approx \pi$, which improves as the total number of points is increased.

The code

To simplify the code, we can use symmetry to work with just one quarter of the circle. Any radius will do, so it's simplest to choose $r = 1$. This leads to the following sequential code:

function estimate_pi(n)
    n > 0 || throw(ArgumentError("number of iterations must be >0, got $n"))
    num_inside = 0
    for i in 1:n
        x, y = rand(), rand()
        if x^2 + y^2 <= 1
            num_inside += 1
        end
    end
    return 4 * num_inside / n
end
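
As a quick sanity check, the function can be run with a modest sample count. The function body is repeated here so the snippet is self-contained, and the printed value will vary from run to run since the samples are random:

```julia
# Sequential Monte Carlo estimator, as defined above.
function estimate_pi(n)
    n > 0 || throw(ArgumentError("number of iterations must be >0, got $n"))
    num_inside = 0
    for i in 1:n
        x, y = rand(), rand()
        if x^2 + y^2 <= 1
            num_inside += 1
        end
    end
    return 4 * num_inside / n
end

# With 10^6 samples the standard error of the estimate is roughly
# 4*sqrt(p*(1-p)/n) ≈ 0.0016 (p = π/4), so a typical run lands within
# a few thousandths of π.
pi_estimate = estimate_pi(1_000_000)
println(pi_estimate)
```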

We can run this $\pi$ estimation on one core of one machine, but to run it on a cluster of machines we need to reorganize a little. One possible way is to write it as a distributed reduction with the @distributed macro. We'll also add a line to actually run the function and log the result, giving a complete script:

using Distributed

function estimate_pi(n)
    n > 0 || throw(ArgumentError("number of iterations must be >0, got $n"))
    num_inside = @distributed (+) for i in 1:n
        x, y = rand(), rand()
        Int(x^2 + y^2 <= 1)
    end
    return 4 * num_inside / n
end

pi_estimate = estimate_pi(1_000_000_000)

@info "Finished computation" pi_estimate
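
One detail worth noting if you test this script outside JuliaHub: the @distributed reduction runs on whatever worker processes are available, and with no workers it falls back to running on the main process. Locally you can add workers with addprocs (the worker count of 2 below is arbitrary); on JuliaHub, the cluster you request provides the workers automatically. A minimal sketch of the mechanics:

```julia
using Distributed

# Add local worker processes for testing; a JuliaHub cluster supplies
# workers automatically, so this line is only needed on your own machine.
nprocs() == 1 && addprocs(2)

# A tiny @distributed (+) reduction: each iteration contributes an Int,
# and the partial sums from the workers are combined with +.
# Here we count the even numbers in 1:10.
n_even = @distributed (+) for i in 1:10
    Int(iseven(i))
end

println(n_even)  # 5
```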

Launching the code

Next we launch the job from within your local installation of VSCode. To set things up, you'll need to install two extensions: the Julia extension and the JuliaHub extension. You'll also need to set https://juliahub.com as the package server. See the connecting your editor section for detailed instructions.

Bring up the JuliaHub extension interface via the VSCode command palette (press F1) with the "JuliaHub: Show" command:

Command Palette

The JuliaHub extension interface can be used to configure the compute resources for a job, such as the number of machines ("nodes") and the number of CPUs per node for a distributed computation. Let's select a two-node cluster with four CPUs per node:

JuliaHub Extension

Now click the "Start Job" button. After some time the cluster will be up and running and you can view live logs as the computation progresses. In the example above, there's just a single @info line of interest.

Setting the Julia package environment

There are no external packages used in the simple script above, but we'll shortly need the JSON package, so let's create a new Julia environment and record the package version.

  • Run the command "Julia: Activate This Environment" from the command palette

  • Add the JSON package by typing ]add JSON in the Julia REPL

Adding packages to the environment

You'll see that two files, Project.toml and Manifest.toml, have been generated to define your package environment. As your project develops, you should keep these as an important part of your source code: with these files you'll be able to reproduce your computation later, and the JuliaHub extension will know which packages to install when running code in the cloud.
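
For reference, after the ]add JSON step the generated Project.toml will contain something like the following (the UUID shown is JSON.jl's registered package UUID; the exact contents, including any compat entries, depend on your setup):

```toml
[deps]
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
```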

Job summary output

While we can see the output of our computation in the job logs, it's more convenient to have the important summary information directly available in the JuliaHub interface.

To make the output visible within the JuliaHub interface, set the ENV["RESULTS"] environment variable to a snippet of JSON. For example, change the last few lines of our script to read:

using Distributed

function estimate_pi(n)
    n > 0 || throw(ArgumentError("number of iterations must be >0, got $n"))
    num_inside = @distributed (+) for i in 1:n
        x, y = rand(), rand()
        Int(x^2 + y^2 <= 1)
    end
    return 4 * num_inside / n
end

num_darts = 1_000_000_000

stats = @timed begin
    pi_estimate = estimate_pi(num_darts)
end

@info "π estimate" pi_estimate elapsed_time=stats.time

using JSON
ENV["RESULTS"] = json(Dict(
    :pi=>pi_estimate,
    :num_darts=>num_darts,
    :compute_time=>stats.time
))
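
For illustration, the resulting value of ENV["RESULTS"] is a one-line JSON object along these lines (the numbers below are made up, and the key order is unspecified since a Julia Dict is unordered):

```json
{"pi":3.141592,"num_darts":1000000000,"compute_time":123.4}
```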

Launching this job, you'll see that the job details tab in the web interface at https://juliahub.com/ui/Run now includes the summary information we added to ENV["RESULTS"]:

Job outputs

Next steps

Try running this code on a larger cluster of machines and with a larger number of random dart throws. Monte Carlo sampling like this is an embarrassingly parallel problem, so it should scale almost perfectly across a large cluster.