Reference

Julia clusters

Distributed.addprocs - Function
addprocs(template, ninstances[; kwargs...])

Add Azure scale set instances where template is either a dictionary produced via the AzManagers.build_sstemplate method or a string corresponding to a template stored in ~/.azmanagers/templates_scaleset.json.

Keyword arguments:

  • subscriptionid=template["subscriptionid"] if exists, or AzManagers._manifest["subscriptionid"] otherwise.
  • resourcegroup=template["resourcegroup"] if exists, or AzManagers._manifest["resourcegroup"] otherwise.
  • sigimagename="" The name of the SIG image[1].
  • sigimageversion="" The version of the sigimagename[1].
  • imagename="" The name of the image (alternative to sigimagename and sigimageversion used for development work).
  • osdisksize=60 The size of the OS disk in GB.
  • customenv=false If true, then send the current project environment to the workers where it will be instantiated.
  • session=AzSession(;lazy=true) The Azure session used for authentication.
  • group="cbox" The name of the Azure scale set. If the scale set does not yet exist, it will be created.
  • overprovision=true Use Azure scale-set overprovisioning?
  • ppi=1 The number of Julia processes to start per Azure scale set instance.
  • julia_num_threads="$(Threads.nthreads()),$(Threads.nthreads(:interactive))" set the number of julia threads for the workers.[2]
  • omp_num_threads=get(ENV, "OMP_NUM_THREADS", 1) set the number of OpenMP threads to run on each worker
  • exename="$(Sys.BINDIR)/julia" name of the julia executable.
  • exeflags="" set additional command line start-up flags for Julia workers. For example, --heap-size-hint=1G.
  • env=Dict() each dictionary entry is an environment variable set on the worker before Julia starts. e.g. env=Dict("OMP_PROC_BIND"=>"close")
  • nretry=20 Number of retries for HTTP REST calls to Azure services.
  • verbose=0 verbose flag used in HTTP requests.
  • save_cloud_init_failures=false set to true to copy cloud init logs (/var/log/cloud-init-output.log) from workers that fail to join the cluster.
  • show_quota=false after various operations, show the "x-ms-rate-remaining-resource" response header. Useful for debugging/understanding Azure quotas.
  • user=AzManagers._manifest["ssh_user"] ssh user.
  • spot=false use Azure SPOT VMs for the scale-set
  • maxprice=-1 set maximum price per hour for a VM in the scale-set. -1 uses the market price.
  • spot_base_regular_priority_count=0 If spot is true, only start adding spot machines once there are this many non-spot machines added.
  • spot_regular_percentage_above_base If spot is true, then when adding new machines (above spot_base_regular_priority_count), use regular (non-spot) priority for this percentage of new machines.
  • waitfor=false wait for the cluster to be provisioned before returning, or return control to the caller immediately[3]
  • mpi_ranks_per_worker=0 set the number of MPI ranks per Julia worker[4]
  • mpi_flags="-bind-to core:$(ENV["OMP_NUM_THREADS"]) -map-by numa" extra flags to pass to mpirun (has effect when mpi_ranks_per_worker>0)
  • nvidia_enable_ecc=true on NVIDIA machines, ensure that ECC is set to true or false for all GPUs[5]
  • nvidia_enable_mig=false on NVIDIA machines, ensure that MIG is set to true or false for all GPUs[5]
  • hyperthreading=nothing Turn on/off hyperthreading on supported machine sizes. The default uses the setting in the template. To override the template setting, use true (on) or false (off).

Notes

[1] If addprocs is called from an Azure VM, then the default imagename and imageversion are the image and version that the VM was built with; otherwise, the default is the latest version of the image specified in the scale-set template.

[2] Interactive threads are supported beginning in version 1.9 of Julia. For earlier versions, the default for julia_num_threads is Threads.nthreads().

[3] waitfor=false reflects the fact that the cluster manager is dynamic. After the call to addprocs returns, use workers() to monitor the size of the cluster.

[4] This is intended for use with Devito. In particular, it allows Devito to gain performance by using MPI for domain decomposition within a single VM. If mpi_ranks_per_worker=0, then MPI is not used on the Julia workers. This feature makes use of package extensions, meaning that the statement using MPI must appear somewhere in your calling script.

[5] This may result in a reboot of the VMs.
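
The following is a minimal sketch of a typical call, based only on the keyword arguments listed above. The template name "cbox16" and the group name "mycluster" are hypothetical, and the sketch assumes that the template has already been saved to ~/.azmanagers/templates_scaleset.json and that valid Azure credentials are available.

```julia
using Distributed, AzManagers

# "cbox16" is a hypothetical template name, assumed to exist in
# ~/.azmanagers/templates_scaleset.json.
ninstances = 4
addprocs("cbox16", ninstances; group="mycluster", ppi=1, waitfor=false)

# waitfor=false returns immediately, so poll until the workers have joined.
# nprocs() counts the master process plus all workers.
while nprocs() < ninstances + 1
    println("waiting for workers, $(nprocs() - 1) joined so far")
    sleep(30)
end

# Run something on every worker, then remove the workers.
hosts = [remotecall_fetch(gethostname, pid) for pid in workers()]
println("cluster hosts: ", hosts)
rmprocs(workers())
```

Passing waitfor=true instead would make the polling loop unnecessary, at the cost of blocking the caller until provisioning finishes.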

AzManagers.preempted - Function
ispreempted,notbefore = preempted([id=myid()|id="instanceid"])

Check to see if the machine id::Int has received an Azure spot preempt message. Returns (true, notbefore) if a preempt message is received and (false,"") otherwise. notbefore is the date/time before which the machine is guaranteed to still exist.
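
As an illustration, the check could be run from the master process over all workers. This is a sketch that assumes the workers were added with spot=true and that preempted accepts a worker id as described in the signature above.

```julia
using Distributed, AzManagers

# Assumes workers were already added with addprocs(...; spot=true).
for pid in workers()
    ispreempted, notbefore = preempted(pid)
    if ispreempted
        # notbefore is the date/time before which the machine still exists.
        println("worker $pid has been preempted; not before $notbefore")
    end
end
```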

Detached service

AzManagers.addproc - Function
addproc(template[; name="", basename="cbox", subscriptionid="myid", resourcegroup="mygroup", nretry=10, verbose=0, session=AzSession(;lazy=true), sigimagename="", sigimageversion="", imagename="", detachedservice=true])

Creates a VM and returns a named tuple (name, ip, resourcegroup, subscriptionid), where name is the name of the VM and ip is the IP address of the VM. resourcegroup and subscriptionid denote where the VM resides on Azure.

Parameters

  • name="" name for the VM. If it is not an empty string, then the next parameter (basename) is ignored
  • basename="cbox" base name for the VM, we append a random suffix to ensure uniqueness
  • subscriptionid=template["subscriptionid"] if exists, or AzManagers._manifest["subscriptionid"] otherwise.
  • resourcegroup=template["resourcegroup"] if exists, or AzManagers._manifest["resourcegroup"] otherwise.
  • session=AzSession(;lazy=true) Session used for OAuth2 authentication
  • sigimagename="" Azure shared image gallery image to use for the VM (defaults to the template's image)
  • sigimageversion="" Azure shared image gallery image version to use for the VM (defaults to latest)
  • imagename="" Azure image name used as an alternative to sigimagename and sigimageversion (used for development work)
  • osdisksize=60 Disk size of the OS disk in GB
  • customenv=false If true, then send the current project environment to the workers where it will be instantiated.
  • nretry=10 Max retries for re-tryable REST call failures
  • verbose=0 Verbosity flag passed to HTTP.jl methods
  • show_quota=false after various operations, show the "x-ms-rate-remaining-resource" response header. Useful for debugging/understanding Azure quotas.
  • julia_num_threads="$(Threads.nthreads()),$(Threads.nthreads(:interactive))" set the number of julia threads for the detached process.[1]
  • omp_num_threads = get(ENV, "OMP_NUM_THREADS", 1) set OMP_NUM_THREADS environment variable before starting the detached process
  • exename="$(Sys.BINDIR)/julia" name of the julia executable.
  • env=Dict() Dictionary of environment variables that will be exported before starting the detached process
  • detachedservice=true start the detached service allowing for RESTful remote code execution

Notes

[1] Interactive threads are supported beginning in version 1.9 of Julia. For earlier versions, the default for julia_num_threads is Threads.nthreads().
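
A short sketch of creating and deleting a VM follows. The template name "cbox02" and the base name "demo" are hypothetical; the sketch assumes that the VM template was saved earlier and that Azure credentials are available.

```julia
using AzManagers

# "cbox02" is a hypothetical VM template name, assumed to have been saved earlier.
vm = addproc("cbox02";
    basename = "demo",                        # a random suffix is appended
    omp_num_threads = 4,
    env = Dict("OMP_PROC_BIND" => "close"))

# The returned named tuple identifies the VM.
println("VM $(vm.name) has IP address $(vm.ip)")

# Delete the VM when it is no longer needed.
rmproc(vm)
```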

AzManagers.@detachat - Macro
@detachat myvm begin ... end

Run code on an Azure VM.

Example

using AzManagers
myvm = addproc("myvm")
job = @detachat myvm begin
    @info "I'm running detached"
end
read(job)
wait(job)
rmproc(myvm)

AzManagers.variablebundle - Function
variablebundle(:key)

Retrieve a variable from a variable bundle. See variablebundle! for more information.

AzManagers.variablebundle! - Function
variablebundle!(;kwargs...)

Define variables that will be passed to a detached job.

Example

using AzManagers
variablebundle!(;x=1)
myvm = addproc("myvm")
myjob = @detachat myvm begin
    write(stdout, "my variable is $(variablebundle(:x))\n")
end
wait(myjob)
read(myjob)

Base.read - Function
read(job[;stdio=stdout])

Returns the stdout from a detached job.

AzManagers.rmproc - Function
rmproc(vm[; session=AzSession(;lazy=true), verbose=0, nretry=10])

Delete the VM that was created using the addproc method.

Parameters

  • session=AzSession(;lazy=true) Azure session for OAuth2 authentication
  • verbose=0 verbosity flag passed to HTTP.jl methods
  • nretry=10 max number of retries for retryable REST calls
  • show_quota=false after various operations, show the "x-ms-rate-remaining-resource" response header. Useful for debugging/understanding Azure quotas.

Base.wait - Function
wait(job[;stdio=stdout])

Blocks until the detached job is complete.

Configuration

AzManagers.build_nictemplate - Function
AzManagers.build_nictemplate(nic_name; kwargs...)

Returns a dictionary for a NIC template that can be passed to the addproc method or written to AzManagers.jl configuration files.

Required keyword arguments

  • subscriptionid Azure subscription
  • resourcegroup_vnet Azure resource group that holds the virtual network that the NIC is attaching to.
  • vnet Azure virtual network for the NIC to attach to.
  • subnet Azure sub-network name.
  • location location of the Azure data center that the NIC corresponds to.

Optional keyword arguments

  • accelerated=true use accelerated networking (not all VM sizes support accelerated networking).
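
A minimal sketch of building and saving a NIC template with the keyword arguments listed above. Every value shown (names, subscription, location) is a placeholder and should be replaced with the values for your own Azure account.

```julia
using AzManagers

# All values below are placeholders (hypothetical); substitute your own.
nic = AzManagers.build_nictemplate("cbox-nic";
    subscriptionid = "my-subscription-id",
    resourcegroup_vnet = "my-vnet-resourcegroup",
    vnet = "my-vnet",
    subnet = "my-subnet",
    location = "southcentralus",
    accelerated = true)

# Persist the template under the name "cbox-nic" for later use.
AzManagers.save_template_nic("cbox-nic", nic)
```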

AzManagers.build_sstemplate - Function
AzManagers.build_sstemplate(name; kwargs...)

Returns a dictionary that is an Azure scaleset template for use in addprocs or for saving to the ~/.azmanagers folder.

Required keyword arguments

  • subscriptionid Azure subscription
  • admin_username ssh user for the scaleset virtual machines
  • location Azure data-center location
  • resourcegroup Azure resource-group
  • imagegallery Azure image gallery that contains the VM image
  • imagename Azure image
  • vnet Azure virtual network for the scaleset
  • subnet Azure virtual subnet for the scaleset
  • skuname Azure VM type

Optional keyword arguments

  • subscriptionid_image Azure subscription corresponding to the image gallery, defaults to subscriptionid
  • resourcegroup_vnet Azure resource group corresponding to the virtual network, defaults to resourcegroup
  • resourcegroup_image Azure resource group corresponding to the image gallery, defaults to resourcegroup
  • osdisksize=60 Disk size in GB for the operating system disk
  • skutier = "Standard" Azure SKU tier.
  • datadisks=[] list of data disks to create and attach [1]
  • tempdisk = "sudo mkdir -m 777 /mnt/scratch; ln -s /mnt/scratch /scratch" cloud-init commands used to mount or link to temporary disk
  • tags = Dict("azure_tag_name" => "some_tag_value") Optional tags argument for resource
  • encryption_at_host = false Optional argument for enabling encryption at host

Notes

[1] Each datadisk is a Dictionary. For example,

Dict("createOption"=>"Empty", "diskSizeGB"=>1023, "managedDisk"=>Dict("storageAccountType"=>"PremiumSSD_LRS"))

or, to accept the defaults,

Dict("diskSizeGB"=>1023)

The first example above is populated with the default options. So, if datadisks=[Dict()], then the default options will be included.
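
Putting the pieces together, a scale-set template with one data disk might be built and saved as sketched below. All names and identifiers are placeholders, and "Standard_D16s_v3" is only an example VM size.

```julia
using AzManagers

# One 1023 GB data disk per instance; unspecified options use the defaults.
datadisks = [Dict("diskSizeGB" => 1023)]

# All values below are placeholders (hypothetical); substitute your own.
template = AzManagers.build_sstemplate("cbox16";
    subscriptionid = "my-subscription-id",
    admin_username = "myuser",
    location = "southcentralus",
    resourcegroup = "my-resourcegroup",
    imagegallery = "my-gallery",
    imagename = "my-image",
    vnet = "my-vnet",
    subnet = "my-subnet",
    skuname = "Standard_D16s_v3",
    datadisks = datadisks)

# Save under the name that will later be passed to addprocs.
AzManagers.save_template_scaleset("cbox16", template)
```

Once saved, the template can be referred to by name, for example addprocs("cbox16", 4).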

AzManagers.build_vmtemplate - Function
AzManagers.build_vmtemplate(vm_name; kwargs...)

Returns a dictionary for a virtual machine template that can be passed to the addproc method or written to AzManagers.jl configuration files.

Required keyword arguments

  • subscriptionid Azure subscription
  • admin_username ssh user for the virtual machine
  • location Azure data center location
  • resourcegroup Azure resource group where the VM will reside
  • imagegallery Azure shared image gallery name
  • imagename Azure image name that is in the shared image gallery
  • vmsize Azure VM type, e.g. "Standard_D8s_v3"

Optional keyword arguments

  • resourcegroup_vnet Azure resource group containing the virtual network, defaults to resourcegroup
  • subscriptionid_image Azure subscription containing the image gallery, defaults to subscriptionid
  • resourcegroup_image Azure resource group containing the image gallery, defaults to resourcegroup
  • nicname = "cbox-nic" Name of the NIC for this VM
  • osdisksize = 60 size in GB of the OS disk
  • datadisks=[] additional data disks to attach [1]
  • tempdisk = "sudo mkdir -m 777 /mnt/scratch\nln -s /mnt/scratch /scratch" cloud-init commands used to mount or link to temporary disk
  • tags = Dict("azure_tag_name" => "some_tag_value") Optional tags argument for resource
  • encryption_at_host = false Optional argument for enabling encryption at host

Notes

[1] Each datadisk is a Dictionary. For example,

Dict("createOption"=>"Empty", "diskSizeGB"=>1023, "managedDisk"=>Dict("storageAccountType"=>"PremiumSSD_LRS"))

The above example is populated with the default options. So, if datadisks=[Dict()], then the default options will be included.
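
A sketch of building and saving a VM template using only the keyword arguments documented above. All values are placeholders, and the sketch assumes that a NIC template named "cbox-nic" is the one intended for this VM.

```julia
using AzManagers

# All values below are placeholders (hypothetical); substitute your own.
vmtemplate = AzManagers.build_vmtemplate("cbox02";
    subscriptionid = "my-subscription-id",
    admin_username = "myuser",
    location = "southcentralus",
    resourcegroup = "my-resourcegroup",
    imagegallery = "my-gallery",
    imagename = "my-image",
    vmsize = "Standard_D8s_v3",
    nicname = "cbox-nic",
    osdisksize = 60)

# Save under the name that will later be passed to addproc.
AzManagers.save_template_vm("cbox02", vmtemplate)
```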

AzManagers.write_manifest - Function
AzManagers.write_manifest(;resourcegroup="", subscriptionid="", ssh_user="", ssh_public_key_file="~/.ssh/azmanagers_rsa.pub", ssh_private_key_file="~/.ssh/azmanagers_rsa")

Write an AzManagers manifest file (~/.azmanagers/manifest.json). The manifest file contains information specific to your Azure account.
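
For example, a manifest could be written and then inspected as sketched below. The resource group, subscription and user are placeholders, the SSH key paths are left at their defaults, and the call is assumed to overwrite any existing manifest.

```julia
using AzManagers

# Placeholder values (hypothetical); this is assumed to overwrite an existing manifest.
AzManagers.write_manifest(;
    resourcegroup = "my-resourcegroup",
    subscriptionid = "my-subscription-id",
    ssh_user = "myuser")

# Inspect the file that was written.
manifest_path = joinpath(homedir(), ".azmanagers", "manifest.json")
println(read(manifest_path, String))
```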

AzManagers.save_template_nic - Function
AzManagers.save_template_nic(nic_name, template)

Save template::Dict generated by AzManagers.build_nictemplate to ~/.azmanagers/templates_nic.json.

AzManagers.save_template_scaleset - Function
AzManagers.save_template_scaleset(scalesetname, template)

Save template::Dict generated by AzManagers.build_sstemplate to ~/.azmanagers/templates_scaleset.json.

AzManagers.save_template_vm - Function
AzManagers.save_template_vm(vm_name, template)

Save template::Dict generated by AzManagers.build_vmtemplate to ~/.azmanagers/templates_vm.json.