Automate Cloud Scaling for Accelerator

Create rules to automatically scale up and scale down the deployment of nodes on the cloud.

Define automation rules to filter resource requests from Accelerator and route them to target instance types. You can provide rapid scaling parameters to scale up node deployment on the cloud to meet a surge in demand for resources. After the jobs finish, the nodes are undeployed based on a scale-down rule. You can also add automations to a cluster after the cluster is deployed.
  1. Log in to NavOps.
  2. Click Automations.
  3. Click Add Automation.
  4. Provide a name for the automation.
  5. Enter a description for the automation.
  6. Select an Accelerator cluster.
  7. In the IF condition menu:
    • Select jq to filter the job workload.
    • Select node-filter to filter on the state of the nodes in the inventory.
    • Select none to trigger actions without conditions.
  8. For Rapid scaling, in the If form field, enter a jq statement to filter a set of Accelerator jobs.
    Tip: Click (?) to get more information about the supported schema. The jq editor supports a powerful and extensible schema for creating complex rules. Use autocompletion for quick and precise editing. Here is an example of a serial rule, in which a job must pass every condition to be processed by the rule:
    select(.memory > 512) | select(.cpus == 1) | select(.memory <= 1024)
  9. The Trigger type is defined as receive.
  10. In the Then menu, select Rapid scaling and define the Rapid Scaling parameters:
    1. Select a Cost center.
    2. Select a Node class.
    3. Enter the Maximum number of nodes to add per run.
      Note: Accelerator may request more nodes, but NavOps provisions them in batches no larger than the size specified here.
    4. For Time to empty, enter the target time, in seconds, to empty the bucket of jobs. This Target Time to Empty (TTTE) is compared to the predicted Time To Empty (TTE) provided by Accelerator at regular intervals for each bucket of jobs.
    5. For Max nodes in bucket, enter the maximum total number of nodes that can be provisioned per bucket.
      Note: Nodes are reserved for the buckets they are provisioned for.
    6. For Cool down period (s), define the time interval, in seconds, that must elapse after a scaling cycle before another scaling task is initiated.
      Note: Allow time for resources to be provisioned and start running jobs, and for the Time to Empty reported by Accelerator to come down from its default (8,640,000 s). As a recommendation, set this to the time taken for a full scaling cycle to complete plus the time taken for jobs to finish per node. This recommendation does not apply to all workload types, so work with your Altair Application Engineer to tune the value for different workload types.
  11. Click Save.
    The new automation is displayed in the automations table.
  12. Enable the Automation Engine and enable the automation.
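
The way the Rapid scaling parameters above interact can be sketched in Python. This is an illustrative model only, not the NavOps implementation: the function names, the `RapidScalingParams` class, and the exact decision order are assumptions made for this example. The job fields `memory` and `cpus` come from the jq example in step 8; everything else is hypothetical.

```python
from dataclasses import dataclass


def matches_rule(job: dict) -> bool:
    """Python equivalent of the serial jq rule from step 8.

    Chained select() calls act as a logical AND: a job is processed
    only if it passes every condition.
    """
    return job["memory"] > 512 and job["cpus"] == 1 and job["memory"] <= 1024


@dataclass
class RapidScalingParams:
    """Hypothetical container for the Rapid scaling form fields."""
    max_nodes_per_run: int      # Maximum number of nodes to add per run
    target_time_to_empty: int   # TTTE, in seconds
    max_nodes_in_bucket: int    # Max nodes in bucket
    cool_down_period: int       # Cool down period, in seconds


def nodes_to_add(params: RapidScalingParams,
                 predicted_tte: int,
                 nodes_in_bucket: int,
                 requested_nodes: int,
                 seconds_since_last_scale: int) -> int:
    """Assumed decision logic for one scaling run on one bucket."""
    # Still cooling down from the previous scaling cycle: do nothing.
    if seconds_since_last_scale < params.cool_down_period:
        return 0
    # The bucket is already predicted to empty within the target time.
    if predicted_tte <= params.target_time_to_empty:
        return 0
    # Never exceed the per-bucket ceiling.
    headroom = max(params.max_nodes_in_bucket - nodes_in_bucket, 0)
    # Cap the request at the per-run batch size and the remaining headroom.
    return min(requested_nodes, params.max_nodes_per_run, headroom)


params = RapidScalingParams(max_nodes_per_run=10, target_time_to_empty=600,
                            max_nodes_in_bucket=25, cool_down_period=300)
# Accelerator asks for 40 nodes while TTE is still at its default value.
first_batch = nodes_to_add(params, 8_640_000, 0, 40, 301)
```

In this model, a request for 40 nodes is served in batches of at most 10 until the bucket reaches 25 nodes, and no batch starts before the cool down period has elapsed.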

Create Scale Down Automations

Nodes can be removed by filtering the inventory based on the state of the node. The advanced condition editor provides several templates to generate a query for scale-down automations.
  1. Log in to NavOps.
  2. Click Automations.
  3. Click Add Automation.
  4. Provide a name for the automation.
  5. Enter a description for the automation.
  6. Select an Accelerator cluster.
  7. In the IF condition menu select node-filter.
  8. In the Query field, click () to open the advanced condition editor.
  9. Select a template to configure a condition. For example, Remove nodes when tasker service is stopped.
    The configuration fields are populated based on the template. You can modify them as per your requirements. The generated query is displayed.
  10. Click Save.
  11. The Trigger type is defined as calendar.
  12. In the When section, select values from the drop-down menus to build the required cron expression, which is displayed in the text box below the menus. You can also enter a valid cron expression directly in the text box.
  13. In the Then menu, select simple-scale-down.
  14. Click Save.
    The new automation is displayed in the automations table.
  15. Enable the Automation Engine and enable the automation.
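
To reason about when a calendar trigger would fire, it can help to check a cron expression against sample timestamps. The sketch below is a minimal matcher written for illustration. It assumes the standard five-field cron layout (minute, hour, day of month, month, day of week with 0 = Sunday); the exact cron dialect accepted by NavOps is not stated here, so treat the field layout as an assumption. The function names are hypothetical.

```python
from datetime import datetime

# (min, max) for minute, hour, day of month, month, day of week (0 = Sunday)
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, lo: int, hi: int) -> set:
    """Expand one cron field ('*', '5', '1-5', '*/15', '1,3,5') into a set of ints."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            a, b = base.split("-")
            start, end = int(a), int(b)
        else:
            start = int(base)
            # 'n/step' means "from n to the field maximum"; plain 'n' is one value.
            end = hi if step else start
        if step_n < 1 or not (lo <= start <= end <= hi):
            raise ValueError(f"invalid cron field: {field!r}")
        values.update(range(start, end + 1, step_n))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if 'when' satisfies every field of a five-field cron expression.

    Simplification: all five fields must match. Some cron implementations
    instead OR the day-of-month and day-of-week fields when both are restricted.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated fields")
    allowed = [expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)]
    # datetime.weekday() uses Monday = 0; convert to cron's Sunday = 0.
    actual = [when.minute, when.hour, when.day, when.month, (when.weekday() + 1) % 7]
    return all(value in choices for value, choices in zip(actual, allowed))


# Every 15 minutes: a plausible schedule for a scale-down check.
every_quarter_hour = cron_matches("*/15 * * * *", datetime(2024, 1, 1, 10, 30))
```

A short interval such as every 15 minutes removes idle nodes sooner, while a longer one reduces how often the inventory is queried; the right value depends on your workload.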