Automate Cloud Scaling for Grid Engine Using JQ Filters
Create rules that automatically scale the deployment of cloud nodes up and down. Use the node configuration to customize a node associated with a Grid Engine cluster, and add suitable host groups and resources, for example cloud_job = true, h_vmem, license, ncpus, and slots.
Enable JQ Filters for Automations
- As Administrator, edit the navops-agent.yaml file located at /opt/navops/etc.
- Uncomment the following lines.
    - name: conditions-wlm
      type: external
      configuration:
        status_url: http://127.0.0.1:8083/v1/status
      managed: true
      unit_name: navops-conditions-wlm
- Restart the agent: systemctl restart navops-agent.
A new service called conditions-wlm is enabled and the Scale up job filter condition type for JQ filters is available for Automations.
Add Automations
- Log in to NavOps.
- Click Automations.
- Click Add Automation.
- Provide a name for the automation.
- Enter a description for the automation.
- Select a Grid Engine cluster.
- In the IF condition menu:
  - Select Scale up job filter to filter jobs for dynamic scale-up automations.
  - Select none to trigger actions without conditions, for example to scale up instances at 8 AM.
  - Select node-filter to filter on the state of the nodes in the inventory, for example to scale down instances once they are idle.
- For Jobs Data Driven Scaling, in the If form field, enter a JQ statement to filter a set of Grid Engine jobs.
- Filters you can define:
    <field_name>__contains=<substring-value>
    <field_name>__in=<exact-string-value1>,<exact-string-value2>, ...
    <field_name>__gt=<numeric-value>
    <field_name>__gte=<numeric-value>
    <field_name>__lt=<numeric-value>
    <field_name>__lte=<numeric-value>
    <field_name>__startswith=<substring-value>
    <field_name>__endswith=<substring-value>
Note: Rules can be chained with & (and), but not with | (or).
- Sorting options available are:
    sort=<field_name>
    sort=<field_name>__asc
    sort=<field_name>__desc
  Other options:
    page_size=<number of records>
- Here are some examples of filters:
  Queued jobs:
    state=qw
  Job class optistruct requested (uses contains because the job class name is optistruct.default):
    jclass_name__contains=optistruct
  No job class requested (empty):
    jclass_name=
  Hard request for complex ncpus >=1 and <=4:
    rd_list.job_hard_resources.ncpus__gte=1&rd_list.job_hard_resources.ncpus__lte=4
  Soft request for complex ncpus >2 and <=8:
    rd_list.job_soft_resources.ncpus__gt=2&rd_list.job_soft_resources.ncpus__lte=8
  Note: Always include a >0 lower bound, because empty values satisfy any <x condition where x>0. For example:
    rd_list.job_soft_resources.ncpus__gt=0&rd_list.job_soft_resources.ncpus__lte=8
  Parallel Environment (PE) and slots requested in the PE:
    requested_pe.name=pe1&requested_pe.pe_min__gt=4&requested_pe.pe_max__lte=8
  MPI PE type jobs:
    state=qw&requested_pe.name=pe*
    state=qw&requested_pe.name__contains=pe
    state=qw&requested_pe.name__in=pe1,pe2,pe3
  Jobs that don't request a PE:
    requested_pe.name=&state=qw
  Project:
    JB_project=engineering
  Requested queue:
    user_hard_requested_queues=all.q
    user_hard_requested_queues__contains=all
    user_soft_requested_queues__contains=cloud
  Jobs older than 3 minutes (the unit is seconds, so >180):
    JB_job_age__gt=180
  Complex examples derived from the above:
    jclass_name__contains=optistruct&rd_list.job_hard_resources.ncpus__gt=2&rd_list.job_hard_resources.ncpus__lte=4
    jclass_name=&rd_list.job_hard_resources.ncpus__gt=2&rd_list.job_hard_resources.ncpus__lte=4
    jclass_name__in=nastran.default,optistruct.default
    jclass_name__in=nastran.default,optistruct.default&rd_list.job_hard_resources.ncpus__gt=2&rd_list.job_hard_resources.ncpus__lte=4
    state=qw&JB_job_age__gt=180&jclass_name__contains=optistruct&rd_list.job_hard_resources.ncpus__gte=1&rd_list.job_hard_resources.ncpus__lte=4
    state=qw&requested_pe.name=pe1&requested_pe.requested_pe=8
    state=qw&requested_pe.name__in=pe1,pe*
  Sorting by job ID (oldest jobs at the top) and returning the 20 oldest jobs:
    sort=JB_job_number__asc&jclass_name__contains=nastran&page_size=20
Note: Depending on the format of the job data, it may also be necessary to:
- Modify the jq-based data transformation performed on the job data in /opt/navops/bin/get-job-data.sh.
- Modify the field_mapping in /opt/navops/etc/navops-conditions-wlm.yaml.
- The Trigger type is defined as calendar.
- In the When section, select values from the drop-down menus to build the required cron expression, which is displayed in the text box below the menus. You can also enter a valid cron expression directly in the text box. For example, * * * * * runs the automation every minute and is the recommended default.
- In the Then menu, select Jobs Data Driven Scaling and define the parameters.
- Click Save.
The new automation is displayed in the automations table.
- Enable the Automation Engine and enable the automation.
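The filter syntax used in the steps above can be easier to reason about with a small model of how such expressions read. The following Python sketch is only an illustration, not the NavOps implementation: the job dictionaries are hypothetical sample data, the helper names (lookup, parse_filter, matches, select) are invented for this example, and treating an empty value as 0 in numeric comparisons is an assumption chosen to mirror the note about always adding a >0 filter. Wildcards such as pe* are not modeled.

```python
from urllib.parse import parse_qsl


def _num(value):
    """Assumption: an empty or missing value compares as 0."""
    return float(value) if value not in ("", None) else 0.0


# Illustrative comparisons for the operators listed in the steps above.
OPERATORS = {
    "eq": lambda actual, wanted: str(actual) == wanted,
    "contains": lambda actual, wanted: wanted in str(actual),
    "in": lambda actual, wanted: str(actual) in wanted.split(","),
    "startswith": lambda actual, wanted: str(actual).startswith(wanted),
    "endswith": lambda actual, wanted: str(actual).endswith(wanted),
    "gt": lambda actual, wanted: _num(actual) > float(wanted),
    "gte": lambda actual, wanted: _num(actual) >= float(wanted),
    "lt": lambda actual, wanted: _num(actual) < float(wanted),
    "lte": lambda actual, wanted: _num(actual) <= float(wanted),
}


def lookup(job, dotted_field):
    """Walk a dotted field name through nested dicts; missing fields yield ""."""
    value = job
    for part in dotted_field.split("."):
        if not isinstance(value, dict) or part not in value:
            return ""
        value = value[part]
    return value


def parse_filter(expression):
    """Split an &-chained expression into (field, operator, value) rules."""
    rules = []
    for key, wanted in parse_qsl(expression, keep_blank_values=True):
        if key in ("sort", "page_size"):
            continue  # sorting and paging options are not match rules
        field, sep, op = key.rpartition("__")
        if not sep:  # no "__" suffix means a plain equality test
            field, op = key, "eq"
        rules.append((field, op, wanted))
    return rules


def matches(job, expression):
    """A job matches only if every chained rule holds (& means and)."""
    return all(
        OPERATORS[op](lookup(job, field), wanted)
        for field, op, wanted in parse_filter(expression)
    )


def select(jobs, expression):
    """Filter the jobs, then apply the optional sort and page_size options."""
    options = dict(parse_qsl(expression, keep_blank_values=True))
    chosen = [job for job in jobs if matches(job, expression)]
    if "sort" in options:
        field, _, direction = options["sort"].partition("__")
        chosen.sort(key=lambda job: lookup(job, field),
                    reverse=(direction == "desc"))
    if "page_size" in options:
        chosen = chosen[: int(options["page_size"])]
    return chosen


# Hypothetical job records; real job data may be shaped differently.
jobs = [
    {"JB_job_number": 101, "state": "qw", "jclass_name": "optistruct.default",
     "rd_list": {"job_hard_resources": {"ncpus": 4}}},
    {"JB_job_number": 102, "state": "qw", "jclass_name": "",
     "rd_list": {"job_hard_resources": {}}},
    {"JB_job_number": 103, "state": "r", "jclass_name": "nastran.default",
     "rd_list": {"job_hard_resources": {"ncpus": 8}}},
]
```

Under these assumptions, job 102 (which requests no ncpus) passes rd_list.job_hard_resources.ncpus__lte=4 on its own but is excluded once rd_list.job_hard_resources.ncpus__gt=0 is chained in, which is the behavior the note about the >0 filter guards against.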
Create Scale Down Automations
- Log in to NavOps.
- Click Automations.
- Click Add Automation.
- Provide a name for the automation.
- Enter a description for the automation.
- Select a Grid Engine cluster.
- In the IF condition menu select node-filter.
- In the Query field, click the icon to open the advanced condition editor.
- Select a template to configure a condition, for example Remove nodes with age-exec service idle. The configuration fields are populated based on the template, and you can modify them as required. The generated query is displayed.
- Click Save.
- The Trigger type is defined as calendar.
- In the When section, select values from the drop-down menus to build the required cron expression, which is displayed in the text box below the menus. You can also enter a valid cron expression directly in the text box.
- In the Then menu, select Scale down (AGE).
- Click Save.
The new automation is displayed in the automations table.
- Enable the Automation Engine and enable the automation.
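Both automation types above are triggered by a cron expression. As a rough aid for reading such expressions, the Python sketch below checks whether a five-field expression (minute, hour, day of month, month, day of week) matches a given time. It assumes standard five-field cron syntax with Sunday as day 0; whether NavOps accepts additional syntax is not covered here. The function names are hypothetical, and the sketch simply requires all five fields to match, which is simpler than the day-of-month/day-of-week handling of a full cron implementation.

```python
from datetime import datetime


def field_matches(spec, value, start):
    """Check one cron field; supports *, */n, a-b, single numbers, and comma lists."""
    for part in spec.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # Steps count from the first valid value of the field.
            if (value - start) % int(part[2:]) == 0:
                return True
        elif "-" in part:
            low, high = (int(x) for x in part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression, moment):
    """Return True if the five-field expression matches the given datetime."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated fields")
    minute, hour, day, month, weekday = fields
    return (
        field_matches(minute, moment.minute, 0)
        and field_matches(hour, moment.hour, 0)
        and field_matches(day, moment.day, 1)
        and field_matches(month, moment.month, 1)
        # isoweekday() is 1..7 with Sunday as 7; cron uses 0 for Sunday.
        and field_matches(weekday, moment.isoweekday() % 7, 0)
    )
```

For example, * * * * * matches every minute, while 0 8 * * * matches only at 8:00 AM each day, which corresponds to the scale-up-at-8-AM rule mentioned earlier.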