Reference
Julia clusters
Distributed.addprocs - Function
addprocs(template, ninstances[; kwargs...])
Add Azure scale set instances, where template is either a dictionary produced by the AzManagers.build_sstemplate method or a string naming a template stored in ~/.azmanagers/templates_scaleset.json.
Keyword arguments
subscriptionid=template["subscriptionid"]
if it exists, or AzManagers._manifest["subscriptionid"] otherwise.
resourcegroup=template["resourcegroup"]
if it exists, or AzManagers._manifest["resourcegroup"] otherwise.
sigimagename=""
The name of the SIG image [1].
sigimageversion=""
The version of the sigimagename [1].
imagename=""
The name of the image (an alternative to sigimagename and sigimageversion, used for development work).
osdisksize=60
The size of the OS disk in GB.
customenv=false
If true, send the current project environment to the workers, where it will be instantiated.
session=AzSession(;lazy=true)
The Azure session used for authentication.
group="cbox"
The name of the Azure scale set. If the scale set does not yet exist, it will be created.
overprovision=true
Use Azure scale-set overprovisioning?
ppi=1
The number of Julia processes to start per Azure scale set instance.
julia_num_threads="$(Threads.nthreads()),$(Threads.nthreads(:interactive))"
Set the number of Julia threads for the detached process [2].
omp_num_threads=get(ENV, "OMP_NUM_THREADS", 1)
Set the number of OpenMP threads to run on each worker.
exename="$(Sys.BINDIR)/julia"
Name of the Julia executable.
exeflags=""
Set additional command-line start-up flags for Julia workers, for example --heap-size-hint=1G.
env=Dict()
Each dictionary entry is an environment variable set on the worker before Julia starts, e.g. env=Dict("OMP_PROC_BIND"=>"close").
nretry=20
Number of retries for HTTP REST calls to Azure services.
verbose=0
Verbose flag used in HTTP requests.
save_cloud_init_failures=false
Set to true to copy cloud-init logs (/var/log/cloud-init-output.log) from workers that fail to join the cluster.
show_quota=false
After various operations, show the "x-ms-rate-remaining-resource" response header. Useful for debugging and understanding Azure quotas.
user=AzManagers._manifest["ssh_user"]
SSH user.
spot=false
Use Azure Spot VMs for the scale set.
maxprice=-1
Set the maximum price per hour for a VM in the scale set. -1 uses the market price.
spot_base_regular_priority_count=0
If spot is true, only start adding spot machines once this many non-spot machines have been added.
spot_regular_percentage_above_base
If spot is true, then when adding new machines (above spot_base_regular_priority_count), use regular (non-spot) priority for this percentage of new machines.
waitfor=false
Wait for the cluster to be provisioned before returning, or return control to the caller immediately [3].
mpi_ranks_per_worker=0
Set the number of MPI ranks per Julia worker [4].
mpi_flags="-bind-to core:$(ENV["OMP_NUM_THREADS"]) -map-by numa"
Extra flags to pass to mpirun (takes effect when mpi_ranks_per_worker>0).
nvidia_enable_ecc=true
On NVIDIA machines, ensure that ECC is set to true or false for all GPUs [5].
nvidia_enable_mig=false
On NVIDIA machines, ensure that MIG is set to true or false for all GPUs [5].
hyperthreading=nothing
Turn hyperthreading on or off on supported machine sizes. The default uses the setting in the template. To override the template setting, use true (on) or false (off).
Notes
[1] If addprocs is called from an Azure VM, then the default imagename and imageversion are the image and version the VM was built with; otherwise, the default is the latest version of the image specified in the scale-set template.
[2] Interactive threads are supported beginning in version 1.9 of Julia. For earlier versions, the default for julia_num_threads is Threads.nthreads().
[3] waitfor=false reflects the fact that the cluster manager is dynamic. After the call to addprocs returns, use workers() to monitor the size of the cluster.
[4] This is intended for use with Devito. In particular, it allows Devito to gain performance by using MPI for domain decomposition within a single VM. If mpi_ranks_per_worker=0, then MPI is not used on the Julia workers. This feature makes use of package extensions, meaning that you need to ensure that using MPI appears somewhere in your calling script.
[5] This may result in a reboot of the VMs.
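The defaults for julia_num_threads and omp_num_threads described above can be reproduced in plain Julia before they are passed on. The following is a minimal sketch, not the package's implementation: thread_spec and the values in kwargs are illustrative, and "myscaleset" is a hypothetical template name. The version check follows note [2].

```julia
# Build the string passed as julia_num_threads.
# Interactive threads exist only on Julia 1.9 and later (note [2]).
function thread_spec()
    if VERSION >= v"1.9"
        return "$(Threads.nthreads()),$(Threads.nthreads(:interactive))"
    end
    return "$(Threads.nthreads())"
end

# Keyword arguments named as in the list above; the values are illustrative.
kwargs = (
    ppi = 1,
    julia_num_threads = thread_spec(),
    omp_num_threads = get(ENV, "OMP_NUM_THREADS", 1),
    waitfor = true,
)

# With Distributed and AzManagers loaded and Azure credentials configured,
# the call would look like this ("myscaleset" is a hypothetical template name):
#     addprocs("myscaleset", 2; kwargs...)
#     workers()   # with waitfor=false, call this repeatedly to watch the cluster grow
```

Collecting the keyword arguments in a named tuple keeps the Azure-dependent call to a single line and makes the settings easy to inspect beforehand.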
AzManagers.preempted - Function
ispreempted,notbefore = preempted([id=myid()|id="instanceid"])
Check whether the machine id::Int has received an Azure spot preempt message. Returns (true, notbefore) if a preempt message has been received and (false, "") otherwise. notbefore is the date/time before which the machine is guaranteed to still exist.
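One way to use preempted is to poll it and react once a notice arrives. The sketch below is an assumption about usage rather than a documented pattern: poll_preempt, interval, and maxpolls are hypothetical names, and check stands in for a closure such as () -> AzManagers.preempted() so that the loop can be exercised without an Azure VM.

```julia
# Poll a preempt check until it fires or the poll budget is used up.
# `check` must return a tuple (ispreempted, notbefore), as preempted does.
function poll_preempt(check; interval=1.0, maxpolls=10)
    for _ in 1:maxpolls
        ispreempted, notbefore = check()
        # Hand back the deadline so the caller can checkpoint before it passes.
        ispreempted && return notbefore
        sleep(interval)
    end
    return nothing  # no preempt message was seen
end

# Stand-in check for illustration; on a worker one would pass
# () -> AzManagers.preempted() instead.
fake_check() = (true, "hypothetical-notbefore-timestamp")
deadline = poll_preempt(fake_check; interval=0.0)
```

Returning notbefore rather than a Boolean lets the caller decide how much work can still be saved before the machine disappears.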
Detached service
AzManagers.addproc - Function
addproc(template[; name="", basename="cbox", subscriptionid="myid", resourcegroup="mygroup", nretry=10, verbose=0, session=AzSession(;lazy=true), sigimagename="", sigimageversion="", imagename="", detachedservice=true])
Create a VM and return a named tuple (name,ip,resourcegroup,subscriptionid), where name is the name of the VM and ip is the IP address of the VM. resourcegroup and subscriptionid denote where the VM resides on Azure.
Parameters
name=""
Name for the VM. If it is not an empty string, then the next parameter (basename) is ignored.
basename="cbox"
Base name for the VM; a random suffix is appended to ensure uniqueness.
subscriptionid=template["subscriptionid"]
if it exists, or AzManagers._manifest["subscriptionid"] otherwise.
resourcegroup=template["resourcegroup"]
if it exists, or AzManagers._manifest["resourcegroup"] otherwise.
session=AzSession(;lazy=true)
Session used for OAuth2 authentication.
sigimagename=""
Azure shared image gallery image to use for the VM (defaults to the template's image).
sigimageversion=""
Azure shared image gallery image version to use for the VM (defaults to latest).
imagename=""
Azure image name used as an alternative to sigimagename and sigimageversion (used for development work).
osdisksize=60
Size of the OS disk in GB.
customenv=false
If true, send the current project environment to the workers, where it will be instantiated.
nretry=10
Maximum number of retries for retryable REST call failures.
verbose=0
Verbosity flag passed to HTTP.jl methods.
show_quota=false
After various operations, show the "x-ms-rate-remaining-resource" response header. Useful for debugging and understanding Azure quotas.
julia_num_threads="$(Threads.nthreads()),$(Threads.nthreads(:interactive))"
Set the number of Julia threads for the workers [1].
omp_num_threads = get(ENV, "OMP_NUM_THREADS", 1)
Set the OMP_NUM_THREADS environment variable before starting the detached process.
exename="$(Sys.BINDIR)/julia"
Name of the Julia executable.
env=Dict()
Dictionary of environment variables that will be exported before starting the detached process.
detachedservice=true
Start the detached service, allowing for RESTful remote code execution.
Notes
[1] Interactive threads are supported beginning in version 1.9 of Julia. For earlier versions, the default for julia_num_threads is Threads.nthreads().
AzManagers.@detachat - Macro
@detachat myvm begin ... end
Run code on an Azure VM.
Example
using AzManagers
myvm = addproc("myvm")
job = @detachat myvm begin
@info "I'm running detached"
end
read(job)
wait(job)
rmproc(myvm)
AzManagers.variablebundle - Function
variablebundle(:key)
Retrieve a variable from a variable bundle. See variablebundle! for more information.
AzManagers.variablebundle! - Function
variablebundle!(;kwargs...)
Define variables that will be passed to a detached job.
Example
using AzManagers
variablebundle!(;x=1)
myvm = addproc("myvm")
myjob = @detachat myvm begin
    write(stdout, "my variable is $(variablebundle(:x))\n")
end
wait(myjob)
read(myjob)
Base.read - Function
read(job[;stdio=stdout])
Returns the stdout from a detached job.
AzManagers.rmproc - Function
rmproc(vm[; session=AzSession(;lazy=true), verbose=0, nretry=10])
Delete the VM that was created using the addproc method.
Parameters
session=AzSession(;lazy=true)
Azure session for OAuth2 authentication.
verbose=0
Verbosity flag passed to HTTP.jl methods.
nretry=10
Maximum number of retries for retryable REST calls.
show_quota=false
After various operations, show the "x-ms-rate-remaining-resource" response header. Useful for debugging and understanding Azure quotas.
AzManagers.status - Function
status(job)
Returns the status of a detached job.
Base.wait - Function
wait(job[;stdio=stdout])
Blocks until the detached job, job, is complete.
Configuration
AzManagers.build_nictemplate - Function
AzManagers.build_nictemplate(nic_name; kwargs...)
Returns a dictionary for a NIC template that can be passed to the addproc method or written to AzManagers.jl configuration files.
Required keyword arguments
subscriptionid
Azure subscription.
resourcegroup_vnet
Azure resource group that holds the virtual network that the NIC attaches to.
vnet
Azure virtual network for the NIC to attach to.
subnet
Azure sub-network name.
location
Location of the Azure data center that the NIC corresponds to.
Optional keyword arguments
accelerated=true
use accelerated networking (not all VM sizes support accelerated networking).
AzManagers.build_sstemplate - Function
AzManagers.build_sstemplate(name; kwargs...)
Returns a dictionary that is an Azure scale-set template for use in addprocs or for saving to the ~/.azmanagers folder.
Required keyword arguments
subscriptionid
Azure subscription.
admin_username
SSH user for the scale-set virtual machines.
location
Azure data-center location.
resourcegroup
Azure resource group.
imagegallery
Azure image gallery that contains the VM image.
imagename
Azure image.
vnet
Azure virtual network for the scale set.
subnet
Azure virtual subnet for the scale set.
skuname
Azure VM type.
Optional keyword arguments
subscriptionid_image
Azure subscription corresponding to the image gallery; defaults to subscriptionid.
resourcegroup_vnet
Azure resource group corresponding to the virtual network; defaults to resourcegroup.
resourcegroup_image
Azure resource group corresponding to the image gallery; defaults to resourcegroup.
osdisksize=60
Disk size in GB for the operating system disk.
skutier = "Standard"
Azure SKU tier.
datadisks=[]
List of data disks to create and attach [1].
tempdisk = "sudo mkdir -m 777 /mnt/scratch; ln -s /mnt/scratch /scratch"
cloud-init commands used to mount or link to the temporary disk.
tags = Dict("azure_tag_name" => "some_tag_value")
Optional tags for the resource.
encryption_at_host = false
Optional argument for enabling encryption at host.
Notes
[1] Each datadisk is a Dictionary. For example,
Dict("createOption"=>"Empty", "diskSizeGB"=>1023, "managedDisk"=>Dict("storageAccountType"=>"PremiumSSD_LRS"))
or, to accept the defaults,
Dict("diskSizeGB"=>1023)
The first example above is populated with the default options. So, if datadisks=[Dict()], then the default options will be included.
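To illustrate how a partially specified data disk relates to the defaults in note [1], the sketch below merges user-supplied keys over those defaults. It is an assumption about the behaviour the note describes, not AzManagers' own code; datadisk_defaults and fill_datadisk are hypothetical names.

```julia
# Defaults copied from note [1].
datadisk_defaults() = Dict{String,Any}(
    "createOption" => "Empty",
    "diskSizeGB" => 1023,
    "managedDisk" => Dict("storageAccountType" => "PremiumSSD_LRS"),
)

# Hypothetical helper: keys supplied by the user win;
# missing keys fall back to the defaults.
fill_datadisk(disk::AbstractDict) = merge(datadisk_defaults(), disk)

# One disk with a custom size, and one that accepts every default.
datadisks = [fill_datadisk(Dict("diskSizeGB" => 512)), fill_datadisk(Dict())]
```

Note that merge is shallow: supplying a "managedDisk" entry in this sketch would replace the nested default dictionary as a whole.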
AzManagers.build_vmtemplate - Function
AzManagers.build_vmtemplate(vm_name; kwargs...)
Returns a dictionary for a virtual machine template that can be passed to the addproc method or written to AzManagers.jl configuration files.
Required keyword arguments
subscriptionid
Azure subscription.
admin_username
SSH user for the virtual machine.
location
Azure data center location.
resourcegroup
Azure resource group where the VM will reside.
imagegallery
Azure shared image gallery name.
imagename
Azure image name that is in the shared image gallery.
vmsize
Azure VM type, e.g. "Standard_D8s_v3".
Optional keyword arguments
resourcegroup_vnet
Azure resource group containing the virtual network; defaults to resourcegroup.
subscriptionid_image
Azure subscription containing the image gallery; defaults to subscriptionid.
resourcegroup_image
Azure resource group containing the image gallery; defaults to resourcegroup.
nicname = "cbox-nic"
Name of the NIC for this VM.
osdisksize = 60
Size in GB of the OS disk.
datadisks=[]
Additional data disks to attach [1].
tempdisk = "sudo mkdir -m 777 /mnt/scratch; ln -s /mnt/scratch /scratch"
cloud-init commands used to mount or link to the temporary disk.
tags = Dict("azure_tag_name" => "some_tag_value")
Optional tags for the resource.
encryption_at_host = false
Optional argument for enabling encryption at host.
Notes
[1] Each datadisk is a Dictionary. For example,
Dict("createOption"=>"Empty", "diskSizeGB"=>1023, "managedDisk"=>Dict("storageAccountType"=>"PremiumSSD_LRS"))
The above example is populated with the default options. So, if datadisks=[Dict()], then the default options will be included.
AzManagers.write_manifest - Function
AzManagers.write_manifest(;resourcegroup="", subscriptionid="", ssh_user="", ssh_public_key_file="~/.ssh/azmanagers_rsa.pub", ssh_private_key_file="~/.ssh/azmanagers_rsa")
Write an AzManagers manifest file (~/.azmanagers/manifest.json). The manifest file contains information specific to your Azure account.
AzManagers.save_template_nic - Function
AzManagers.save_template_nic(nic_name, template)
Save template::Dict generated by AzManagers.build_nictemplate to ~/.azmanagers/templates_nic.json.
AzManagers.save_template_scaleset - Function
AzManagers.save_template_scaleset(scalesetname, template)
Save template::Dict generated by AzManagers.build_sstemplate to ~/.azmanagers/templates_scaleset.json.
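Before building and saving a scale-set template, it can help to confirm that every required keyword argument of build_sstemplate is present. The following sketch is a hypothetical pre-flight check, not part of AzManagers; every value is a placeholder, and the commented calls show where the documented functions would be used.

```julia
# Required keyword arguments of build_sstemplate, as listed in its documentation.
required = (:subscriptionid, :admin_username, :location, :resourcegroup,
            :imagegallery, :imagename, :vnet, :subnet, :skuname)

# Placeholder settings; replace each value with one from your Azure account.
settings = (
    subscriptionid = "my-subscription-id",
    admin_username = "myuser",
    location = "southcentralus",
    resourcegroup = "my-resource-group",
    imagegallery = "mygallery",
    imagename = "myimage",
    vnet = "my-vnet",
    subnet = "my-subnet",
    skuname = "Standard_D2s_v5",
)

# Collect any required names that are absent from the settings.
missing_keys = [k for k in required if !haskey(settings, k)]
isempty(missing_keys) || error("missing required arguments: $(missing_keys)")

# With AzManagers loaded, the template would then be built and saved under a
# name ("myscaleset" is hypothetical) that addprocs can later refer to:
#     template = AzManagers.build_sstemplate("myscaleset"; settings...)
#     AzManagers.save_template_scaleset("myscaleset", template)
```

Keeping the settings in a named tuple means the same values can be splatted into build_sstemplate once the check passes.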
AzManagers.save_template_vm - Function
AzManagers.save_template_vm(vm_name, template)
Save template::Dict generated by AzManagers.build_vmtemplate to ~/.azmanagers/templates_vm.json.