Cloud Scaling Startup Script

Create a cloud-init script that is executed when the cloud node is deployed.

Introduction

Most sites need to configure their cloud nodes after boot, for example to install packages, add users, or start services. A startup script can be added to a node class and is run when the instance boots to perform such automated tasks: installing software, applying updates, enabling services, or any other action defined in the script. Startup scripts let you customize your cloud instances easily and programmatically.

Startup Script on Windows Platforms

On Windows platforms, the startup script must be a PowerShell script, with its content enclosed in <powershell> and </powershell> tags. For more information about PowerShell, see PowerShell Scripting.
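For illustration, a minimal Windows user-data fragment might look like the following sketch (the log path and service name are arbitrary examples, not part of any required configuration):

```
<powershell>
# Record the first boot time and start a service
Add-Content -Path C:\startup.log -Value "Booted at $(Get-Date)"
Start-Service -Name W3SVC
</powershell>
```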

Startup Script on Linux Platforms

On Linux platforms, the utility designed for cloud instance initialization is cloud-init. The cloud-init program is a bootstrapping utility for pre-provisioned disk images that run in virtualized environments, usually cloud-oriented services; it sets up the server instance to be usable by the time it finishes booting. cloud-init must be installed on your cloud provider VM image so that your instances can be configured on boot. For more information see cloud-init.

cloud-init supports several input types:
  • Shell scripts
  • Cloud config files
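A cloud config file is YAML that begins with the #cloud-config header. A minimal sketch (the package and service names are illustrative, not required by NavOps):

```
#cloud-config
# Install a package and enable a service on first boot
packages:
  - htop
runcmd:
  - systemctl enable --now crond
```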

The simplest way to configure an instance on boot is to use a shell script. The shell script must begin with #! so that cloud-init recognizes it as a shell script.
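For example, a minimal first-boot shell script might look like the following (the file paths are arbitrary placeholders chosen for illustration):

```shell
#!/bin/bash
# Minimal first-boot script: record the boot time and drop a marker file.
mkdir -p /tmp/first-boot
echo "booted at $(date -u +%FT%TZ)" >> /tmp/first-boot/boot.log
touch /tmp/first-boot/done
```

Because the script runs as root during boot, anything it writes or installs is in place before users or schedulers reach the node.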

Cloud-init Example for Accelerator

Here is a cloud-init script for an Accelerator node class (without the templating). A <<DYNAMIC VALUE>> marker shows what must be updated for your site configuration. The variables used in cloud-init need to be escaped.

#!/bin/bash
 
navops_host=10.0.0.70 <<DYNAMIC VALUE>>

# Default prices to use when the cloud provider does not return price data (NavOps does not hold this data)
default_price=0.555 <<DYNAMIC VALUE>>
default_spot_price=0.222 <<DYNAMIC VALUE>>
python=/usr/bin/python3
navops_conf_file=/opt/navops/etc/navops-agent.yaml

# NavOps populates these templated values with data, available to use to extend the utility of the cloud-init script for your own purposes
nats_token={{ nats_token }}
nats_urls=nats://${navops_host}:4222
accplus_host={{ vovhostname }}
bucketid={{ bucketid }}
user={{ user }}
group={{ group }}
projectname={{ projectname }}
vovportnumber={{ vovportnumber }}
vovdir={{ vovdir }}
cloud_environment_uid={{ cloud_environment_uid }}
instance_type={{ instance_type }}
machine_group_id={{ machine_group_id }}
preemptable={{ preemptable }}
name=$(hostname)
cost_resource_name=COST
slots_per_tasker=$(lscpu | egrep 'CPU\(s\)' | head -n1 | awk '{print $NF}')
# slots_per_tasker=1/1
 
# Set for manual burst taskers as this data is not available to the cloud-init for manual scaling
[ -z "$accplus_host" ] && accplus_host=10.0.0.150 <<DYNAMIC VALUE>>
[ -z "$projectname" ] && projectname=wx <<DYNAMIC VALUE>>
[ -z "$vovportnumber" ] && vovportnumber=11495 <<DYNAMIC VALUE>>
[ -z "$user" ] && user=accplusadmin <<DYNAMIC VALUE>>
[ -z "$group" ] && group=/time/users.${user}
[ -z "$vovdir" ] && vovdir=/opt/rtda/2023.1/linux64 <<DYNAMIC VALUE>>

# Instance cost retrieval
price_filename=/tmp/instance-type-price.json
curl -s -o ${price_filename} "http://${navops_host}/api/v1/instance-type-price?cloud_environment_uid=${cloud_environment_uid}&instance_type=${instance_type}"
if [ $? -eq 0 ]; then
  price=$(cat ${price_filename} | ${python} -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get("price", "NULL"))')
  if [ $? -ne 0 ] || [ "$price" == "NULL" ]; then
    price=${default_price}
  fi
  spot_price=$(cat ${price_filename} | ${python} -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get("spot_price", "NULL"))')
  if [ $? -ne 0 ] || [ "$spot_price" == "NULL" ]; then
    spot_price=${default_spot_price}
  fi
else
  price=${default_price}
  spot_price=${default_spot_price}
fi
 
cost_resource=${cost_resource_name}#${price}
if [[ "$preemptable" =~ ^(true|True)$ ]]; then
  cost_resource=${cost_resource_name}#${spot_price}
fi
 
# Install, configure and start navops-agent
curl --insecure https://${navops_host}/resources/agent/navops-agent.accelerator-tasker.linux_amd64.tgz | sudo tar xzf - -C /opt
/opt/navops/bin/agent-setup.sh --non-interactive
systemctl enable /opt/navops/etc/systemd/navops-agent.service

# For non-default install locations, uncomment and update the right-hand side of this sed statement with your path
# Note: This updates the agent to leverage the correct path
# /usr/bin/sed -i "s?/opt/accelerator/common/etc/vovrc.sh?/opt/rtda/2023.1/common/etc/vovrc.sh?" ${navops_conf_file}
if [ -z "$bucketid" ]; then
  tasker_cmd="${vovdir}/bin/wxagent -W "tasker_" -p ${projectname} -h $accplus_host --port ${vovportnumber} --vovdir ${vovdir} --maxLife 1w --maxIdle 0 --taskerVerbosity 1 -U 15s -T ${slots_per_tasker} -w USER=$user:GROUP=$group:LOGPATH=/tmp -r ${cost_resource} >/var/log/wxagent.log 2>&1"
else
  tasker_cmd="${vovdir}/bin/wxagent -W "tasker_$(printf '%09d\n' $bucketid)_" -p ${projectname} -h $accplus_host --port ${vovportnumber} --vovdir ${vovdir} --maxIdle 30s --maxLife 1w --taskerVerbosity 1 --taskerTimeout 120 -U 15s -T ${slots_per_tasker} -e $bucketid -w BUCKETID=$bucketid:USER=$user:GROUP=$group:LOGPATH=/tmp -r ${cost_resource} -r BUCKET:$(printf '%09d\n' $bucketid) >/var/log/wxagent.log 2>&1"
fi
sed -i '/tasker_cmd:/d' ${navops_conf_file}
echo "      tasker_cmd: export VOVDIR=$vovdir && $tasker_cmd" >> ${navops_conf_file}
 
systemctl start navops-agent

Cloud-init Example for PBS Professional

Here is a cloud-init script for a PBS Professional node class (without the templating). A <<DYNAMIC VALUE>> marker shows what must be updated for your site configuration. The variables used in cloud-init need to be escaped.

#!/bin/bash
navops_host="10.0.0.70" <<DYNAMIC VALUE>>
pbs_server_ip="10.0.0.250" <<DYNAMIC VALUE>>
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
private_ip=$(curl -f -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/local-ipv4)
momHost=$(hostname)
momFQDN=$(hostname -f)
echo "$private_ip $momHost $momFQDN" >> /etc/hosts
echo "PBS_MOM_NODE_NAME=${private_ip}" >> /etc/pbs.conf
echo "PBS_SERVER_HOST_NAME=${pbs_server_ip}" >> /etc/pbs.conf
echo "PBS_LEAF_ROUTERS=${pbs_server_ip}" >> /etc/pbs.conf
curl --insecure https://${navops_host}/resources/agent/navops-agent.pbspro-mom.linux_amd64.tgz | tar xzf - -C /opt
/opt/navops/bin/agent-setup.sh --non-interactive
systemctl enable /opt/navops/etc/systemd/navops-agent.service
systemctl enable pbs
systemctl start pbs
systemctl start navops-agent

Cloud-init Example for Grid Engine

Here is a cloud-init script for a Grid Engine node class (without the templating). A <<DYNAMIC VALUE>> marker shows what must be updated for your site configuration. The variables used in cloud-init need to be escaped.

#!/bin/bash
NAVOPS_HOST="10.0.0.70" <<DYNAMIC VALUE>>
SGE_ROOT="/age" <<DYNAMIC VALUE>>
SGE_CELL="default" <<DYNAMIC VALUE>>
curl --insecure https://${NAVOPS_HOST}/resources/agent/navops-agent.age-exec.linux_amd64.tgz | sudo tar xzf - -C /opt
/opt/navops/bin/agent-setup.sh --non-interactive
systemctl enable /opt/navops/etc/systemd/navops-agent.service
# /usr/bin/sed -i "s#sge_root:.*#sge_root: \/age#" /opt/navops/etc/meta.yaml
# /usr/bin/sed -i "s#^spool_dir:.*#spool_dir: /age/default/spool/$(hostname | awk -F. '{print $1}')/active_jobs#" /opt/navops/etc/meta.yaml
systemctl start navops-agent

Cloud-init Example for Slurm

Here is a cloud-init script for a Slurm node class.

#!/bin/bash
# Define a log file
LOGFILE="/home/ec2-user/slurm-files/install.log"
mkdir -p /home/ec2-user/slurm-files
# Install dependencies for building Slurm
echo "Installing dependencies for building Slurm..." >> $LOGFILE
# Clone the json-c repository
git clone --depth 1 --single-branch -b json-c-0.15-20200726 https://github.com/json-c/json-c.git /home/ec2-user/slurm-files/json-c >> $LOGFILE 2>&1
# Create a build directory
mkdir -p /home/ec2-user/slurm-files/json-c-build
# Run cmake with absolute paths
cmake /home/ec2-user/slurm-files/json-c -B/home/ec2-user/slurm-files/json-c-build >> $LOGFILE 2>&1
# Compile and install using the build directory
sudo make -C /home/ec2-user/slurm-files/json-c-build >> $LOGFILE 2>&1
sudo make -C /home/ec2-user/slurm-files/json-c-build install >> $LOGFILE 2>&1
# Clone the libjwt repository
# Clone the libjwt repository
echo "Cloning libjwt repository..." >> $LOGFILE
git clone --depth 1 --single-branch -b v1.12.0 https://github.com/benmcollins/libjwt.git /home/ec2-user/slurm-files/libjwt >> $LOGFILE 2>&1
# Generate the build system
echo "Running autoreconf for libjwt..." >> $LOGFILE
autoreconf --install /home/ec2-user/slurm-files/libjwt >> $LOGFILE 2>&1
# Run configure in the source directory so that make -C finds the Makefile
echo "Configuring libjwt..." >> $LOGFILE
(cd /home/ec2-user/slurm-files/libjwt && ./configure --prefix=/usr/local) >> $LOGFILE 2>&1
# Build and install libjwt
echo "Building and installing libjwt..." >> $LOGFILE
make -j -C /home/ec2-user/slurm-files/libjwt >> $LOGFILE 2>&1
sudo make -C /home/ec2-user/slurm-files/libjwt install >> $LOGFILE 2>&1
# Log completion of dependencies installation
echo "Dependencies installed." >> $LOGFILE