Unified InsightPro Database Schema

View the InsightPro database schema that combines the accounting logs from PBS Pro, AGE, and Accelerator.

The following table provides the InsightPro database schema mapping across WLMs.
Field Name | Data Type | Field Description | PBS Pro | AGE | Accelerator
--- | --- | --- | --- | --- | ---
account | String | Account info | account | account | N/A
arrayIndex | UInt64 | Array job task identifier | Extract the task number from the job ID | task_number | |
cgroupsFailcnt | UInt64 | Number of memory limit hits (cgroups). | N/A | failcnt | N/A |
cgroupsMaxMemory | Float64 | Maximum of the cgroups_memory load value. cgroups_memory is calculated from values in the cgroups file memory.stat | N/A | max_cgroups_memory | N/A |
cgroupsMaxUsageInBytes | Float64 | Maximum recorded memory usage (cgroups). | N/A | max_usage_in_bytes | N/A |
cgroupsMemswFailcnt | UInt64 | Number of memory and swap limit hits (cgroups). | N/A | memsw.failcnt | N/A
cgroupsMemswMaxUsageInBytes | Float64 | Maximum recorded memory and swap usage (cgroups). | N/A | memsw.max_usage_in_bytes | N/A |
cluster_id | UUID | Unique cluster ID used to distinguish different clusters | N/A | N/A | N/A
cluster_type | String | PBS, AGE, or ACCELERATOR | N/A | N/A | N/A
command | String | Application name | Resource_List.software | job_name | TOOL |
cpuTimeUsed | UInt64 | CPU usage in seconds calculated as usr+sys time (or taken from the cgroup cpuacct subsystem if it is active for the job in PBS) | resources_used.cput | cpu | CPUTIME |
cwd | String | The working directory the job ran in, as specified with the qsub(1)/qalter(1) switches -cwd and -wd. Because the delimiter used by the accounting file (colon ":") can be part of the working directory, all colons in the working directory are replaced by ASCII code 255. | N/A | cwd | DIR
deadline | UInt64 | Unix timestamp by which the job must complete | N/A | N/A | deadline
department | String | Department name | N/A | department | N/A |
dependOnJobs | String | The job IDs on which this job depended | depend | hold_jid (65) | N/A
dpcount | UInt64 | Distributed parallel jobs, e.g., 200 (cores/slots) | N/A | N/A | dpcount
dphosts | String | List of hosts on which the job ran | N/A | N/A | dphosts
dptopid | UInt64 | Top ID ("master job"), a numerical ID | N/A | N/A | dptopid
earliestStartTime | UInt64 | The initially requested execution time when the job was submitted | etime | orig_exec_time | N/A |
endTime | UInt64 | Job end time. Value is an epoch timestamp in microseconds. | end | end_time | ENDTIME
env | String | The environment which the job should use (like "modules", VOV has a named-environment management facility). | N/A | N/A | env |
exclusive | Bool | Whether the job requested exclusive use of a node | Resource_List.place=excl | N/A | N/A
exitStatus | Int64 | Exit status of the job | Exit_status | exit_status | EXIT |
failed | Int64 | Indicates the problem which occurred in case a job could not be started on the execution host | Exit_status | failed | FAILCODE |
fsgroup | String | FairShare group specified for job. | N/A | N/A | fsgroup |
fstokens | UInt64 | Number of FairShare tokens specified for job. | N/A | N/A | fstokens |
gpusRequested | UInt64 | Number of GPU slots | Resource_List.ngpus | Configurable variable name in the agent configuration. The usage is fetched from hard_resources -> <variable name>. | N/A
gpuUsage | UInt64 | GPU usage | N/A | N/A | N/A |
grabbed | String | Of resources requested, those that have been confirmed as being in-use. | N/A | N/A | grabbed |
group | String | Linux user group name | group | group | OSGROUP |
instanceType | String | "on-prem" default value for all the WLM. "on-cloud" value for PBS Navops cloud setup. | Resource_List.cloud_eligible=True | N/A | N/A |
io | Float64 | The amount of data transferred in GB. On Linux, this is the sum of all bytes read and written by the job through the read(), pread(), write(), and pwrite() system calls. On Windows, this is the sum of all bytes transferred by the job through write, read, and other operations. | N/A | io | N/A
ioops | UInt64 | The number of I/O operations. | N/A | ioops | N/A
iow | UInt64 | I/O wait time in seconds | N/A | iow | N/A
jobClass | String | Name of the Job Class the job uses. "NONE" for jobs not using a JC. | N/A | job_class | JOBCLASS |
jobExitStatus | String | "failed" for a non-zero exit status, "succeed" for a zero exit status | N/A | N/A | N/A
jobId | String | job identifier | index[2] | job_number | ID |
licenses | String | subset of RESOURCES string | N/A | N/A | licenses |
memoryRequested | UInt64 | Memory requested when submitting the job (KB) | Resource_List.mem | Order of priority - the first one found is taken: 1. hard_resources→m_mem_free 2. hard_resources→h_rss 3. hard_resources→h_vmem 4. hard_resources→mem_free | RAM value from RESOURCE field
name | String | Job name provided by the end user | jobname | job_name | JOBNAME |
owner | String | Linux user name which is used to submit job | user | user | USER |
peName | String | The parallel environment which was selected for that job. | N/A | granted_pe | N/A |
physicalMemoryUsed | UInt64 | RSS memory used as reported by the WLM end record (KB) | resources_used.mem | maxrss | MAXRAM
project | String | Project name | project | project | PROJECT |
qdelInfo | String | If the job (the array task) has been deleted via qdel, <username>@<hostname>, else NONE. If qdel was called multiple times, every invocation is recorded in a comma separated list. | N/A | qdel_info | N/A |
queued | UInt64 | Amount of time the job spent in one of the scheduling buckets, in seconds | N/A | N/A | queue
queueName | String | Queue in which job was run | queue | qname | N/A |
queueTime | UInt64 | Job queue time (when the job was submitted). Value is an epoch timestamp in microseconds. | qtime | submission_time | SUBMITTIME
resvID | String | Reservation ID | resvID | arid | N/A |
resvName | String | The name of the Advance Reservation of the job. | resvname | ar_name | N/A |
slots | UInt64 | Number of slots | Resource_List.ncpus | slots | CORES value from RESOURCE field |
spriority | UInt64 | Submission priority. | N/A | N/A | spriority |
startTime | UInt64 | Time the job started running on the execution host. Value is an epoch timestamp in microseconds. | start | start_time | STARTTIME
status | String | How the job finished: DONE or FAILED | N/A | N/A | status
submitHost | String | Hostname where job was submitted | N/A | submit_host | SUBMITHOST |
submitTime | UInt64 | Job submission time. Value is an epoch timestamp in microseconds. | ctime | submission_time | SUBMITTIME
susptime | UInt64 | Total time the job was suspended, in seconds | N/A | N/A | susptime
virtualMemoryUsed | UInt64 | Virtual memory used as reported by the WLM end record (KB) | resources_used.vmem | maxvmem | MAXVM
wallClockTimeRequested | UInt64 | Wallclock time requested, in seconds, when submitting the job | Resource_List.walltime | hard_resources | N/A
wallclockTimeUsed | UInt64 | Wallclock time used by the job | resources_used.walltime | wallclock | ENDTIME-STARTTIME-SUSPTIME
xpriority | UInt64 | Execution priority (nice). | N/A | N/A | xpriority |
exec_host_list | Nested | list of hosts where job was running | exec_vnode | exec_host_list | EXEHOST |
cloud_cost | Float64 | If cloud job what is the price | resources_used.cloud_cost_list | N/A | N/A |
cloud_node_class | String | If cloud job what is the node class | resources_used.cloud_instance_list | N/A | N/A |
cloud_node_time | UInt64 | If cloud job how much time it used the node | resources_used.cloud_time_list | N/A | N/A |
cloud_node_type | String | If cloud job what is the node type | resources_used.cloud_type_list | N/A | N/A |
frac_node_cpu_used | Float64 | Fraction of the available CPU used on the node | |||
frac_node_mem_used | Float64 | Fraction of the available memory used on the node | |||
mem | UInt64 | Memory used on the node | exec_vnode | ||
ncpus | UInt64 | Number of CPUs used on the node | exec_vnode | exec_host_list | CORES value from RESOURCE field
node | String | Execution node name | exec_vnode | exec_host_list | EXEHOST
resourcesGranted | Nested | Based on the custom resource definition, dynamic fields are created with the defined custom resource data type | resources_assigned.<resource_name>=<resource_value> | granted_req. (73) | RESOURCES parsed from RESOURCE field
datatype | Enum('integer', 'float', 'string', 'boolean') | Custom resource data type | |||
resource_name | String | Custom resource name | |||
valueBoolean | Bool | Custom resource value | |||
valueFloat | Float64 | Custom resource value | |||
valueInt | Int64 | Custom resource value | |||
valueStr | String | Custom resource value | |||
other_resources | Nested | All the non-mapped job information. | |||
resource_name | String | ||||
resource_value | String | ||||
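Because queueTime, startTime, endTime, and submitTime are stored as epoch timestamps in microseconds, they must be divided by 1,000,000 before converting to ClickHouse DateTime values. A minimal sketch of this conversion (the selected columns and aliases are illustrative; JOB_RECORDS is the job table used in the sample queries in this document):

```sql
-- Timestamps are epoch microseconds; scale to seconds before conversion.
SELECT
    jobId,
    toDateTime(intDiv(startTime, 1000000)) AS started,
    toDateTime(intDiv(endTime, 1000000))   AS ended,
    (endTime - startTime) / 1000000        AS wallclock_seconds
FROM JOB_RECORDS
WHERE endTime > 0
LIMIT 10;
```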
exec_host_list: Contains the list of nodes on which the job ran, along with the resources the job consumed on each node. Sample SQL queries:

Query 1: Find all distinct node names:
select DISTINCT(arrayJoin(exec_host_list.node)) from JOB_RECORDS;

Query 2: Find the number of jobs that ran on each node:
select exec.node, count() from JOB_RECORDS ARRAY JOIN exec_host_list AS exec group by exec.node;

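Since exec_host_list also stores per-node consumption (ncpus, mem), the same ARRAY JOIN pattern can aggregate resource usage per node. A sketch along the lines of the queries above (the output column aliases are illustrative):

```sql
-- Total CPUs and memory (KB) consumed per execution node.
SELECT
    exec.node,
    sum(exec.ncpus) AS total_cpus,
    sum(exec.mem)   AS total_mem_kb
FROM JOB_RECORDS
ARRAY JOIN exec_host_list AS exec
GROUP BY exec.node
ORDER BY total_cpus DESC;
```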
resourcesGranted: Based on the custom resource definition, dynamic fields are stored with their defined data types. Different WLMs use different terms for this: in PBS they are Custom Resources, in AGE they are Complex Attributes, and in Accelerator they are Resources. One job can have more than one custom resource, each with its own data type (Boolean, Float, Integer, String). resourcesGranted stores the value as a list of custom resources: resourcesGranted.resource_name contains the custom resource name, resourcesGranted.datatype stores the data type of that resource, and resourcesGranted.value<DataType> holds its value. Sample SQL query:

Query 1: Find the number of jobs that requested the "ndesktops" resource:
SELECT count() from JOB_RECORDS WHERE has(resourcesGranted.resource_name, 'ndesktops');

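To read the value of a specific custom resource, combine indexOf() on resourcesGranted.resource_name with the value<DataType> column that matches its datatype. A sketch using the "ndesktops" resource, assuming it was stored with an integer datatype:

```sql
-- Fetch the integer value of the 'ndesktops' custom resource per job
-- (use valueFloat/valueStr/valueBoolean for other datatypes).
SELECT
    jobId,
    resourcesGranted.valueInt[indexOf(resourcesGranted.resource_name, 'ndesktops')] AS ndesktops
FROM JOB_RECORDS
WHERE has(resourcesGranted.resource_name, 'ndesktops');
```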
other_resources: All the non-mapped job information that is not part of the default parsing fields or resourcesGranted is stored in other_resources. All values of such fields are stored as strings. For example, in PBS the Resource_List.nodect attribute of the job accounting log is not mapped to any of the database fields or resourcesGranted, so it is stored in other_resources. Sample SQL query to fetch data from other_resources:

Query 1: Find the number of jobs whose Resource_List.nodect value is 1:
SELECT count() from JOB_RECORDS WHERE has(other_resources.resource_name, 'Resource_List.nodect') and other_resources.resource_value[indexOf(other_resources.resource_name, 'Resource_List.nodect')]='1';

Nested Data Structure: A nested data structure is like a table inside a cell. The column names and types of a nested data structure are specified in the same way as in a CREATE TABLE query. Each table row can correspond to any number of rows in a nested data structure. For more information on how to work with nested data structures, refer to the ClickHouse database documentation.
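As an illustration, a minimal sketch of how a table with a Nested column might be declared in ClickHouse. The table name, column subset, and engine settings here are hypothetical, not the actual InsightPro DDL:

```sql
-- Hypothetical, simplified declaration; the real job table
-- contains many more columns than shown here.
CREATE TABLE JOB_RECORDS_EXAMPLE
(
    jobId String,
    -- Nested column: behaves like a table inside each row
    exec_host_list Nested
    (
        node  String,
        ncpus UInt64,
        mem   UInt64
    )
)
ENGINE = MergeTree
ORDER BY jobId;
```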