DataPrep Elements

DataPrep elements in the Process Designer tool are used for executing data preparation operations upon process inputs. These elements consist of Workspace and DataPrep Export.

NOTE: A Hub element can be added between a Workspace element and a DataPrep Export element to process multiple input files individually or by group on a workspace.

The table below specifies functionality of the DataPrep group elements.

Element Name

Description

Properties and Comments

Workspace

Performs operations upon process inputs.

  • Name — name of the graphic element.

  • Workspace Path — path to the required input file location.

Actions to select the workspace path:

    1. Click the Browse button to open the Browse dialog.

    2. Select among server library.

    3.  If you have slow data source connections, you can select the Delay data source connections validation check box. In this case, you can cancel the validation of Data source connections when loading workspace by clicking the Cancel button in the Properties panel. You can also cancel the validation of Data source connections when opening a process which contains a workspace.

    4. Click OK.

NOTE: The following alert displays when you change the workspace path of an existing workspace element: Load Plan should be reselected and reapplied in DataPrep Export/Verify Workspace settings because another workspace was specified. (It is impractical to assume that the Load Plan is still valid, given the Workspace has changed, thus, requiring this step to confirm/refresh accordingly).  

  • Validate Workspace — Click this button to validate the workspace manually. The result of workspace validation shows the following status:

    • Validation successful

    • Validation failed

    • Validation is not completed

NOTE: Workspace validation is sensitive to data volume and connection speed and is thus disabled by default. For web/cloud-based load plans, you may select the Disable validation optimization checkbox to disable the automatic downloading/parsing of files from the web.

  • Load Plans — the list of available export load plans, including:

    • Provider type. This shows the type of provider.

    • File Name. If your Workspace input is a file of a certain format, you can substitute it with the file from the process input. Do the following. For the PDF Report, Text Report and Delimited Text file formats add a File input item to your visual process design form. For Microsoft Access and Microsoft Excel formats add a Database input item. Connect this item with your Workspace element. Set up the Input path for your File input or Database input. Then in the File Name drop-down list for your load plan select a File input or a Database input option.

  1. NOTES:

  2. When the Database Input option is selected, the table name automatically changes to the table name from the database input. The table name may be changed manually.

    Database Input is required for Excel formats and must include the extension even with wildcards.

  3. Additional options are available for Excel Worksheet Design. They are detailed here.

    • Source — If the workspace input is a database, then this field displays the database name. If the source type is SQL Server, ODBC, or OLE DB, there is the possibility to rewrite/edit the Database Data Source. To do so, click the Edit Table Info button, and then edit the Connection String, Table, etc. If the plugin is Google Analytics, the settings can be changed by clicking the Edit Table Info button. In the Google Analytics window, edit the Service Account Email, Profile ID, Start Date, End Date, etc. If the plugin is Oracle, edit the settings by clicking the Edit Table Info button. In the Oracle window, edit the connection type and settings (including host name, port, service name, user ID, password, etc.). Tables and views, as well as queries, can also be edited from this window.

NOTE: Additional options for SQL Server, OLE DB, and Oracle source types are provided below.

    • Change History — additional operations to be executed with the input. Click the Add Change List button and select a change list from the Server Library to add to the workspace. You can also add new item to Server Library via the Add Change List dialog box. Upon selecting a change list, you will see a Change History list. You can add multiple change lists via the little Add Change List button on the right, or remove all change lists via the Reset Change History button. You can also disable certain changes to your workspace by clicking the respective Undo button (all the following change items will be disabled as well). Click the Redo button to enable the changes back. Non-applicable changes appear as disabled and accompany with a warning sign. You can remove such changes via the Delete button.

NOTE: When the load plan comes with a change list, this change list is labeled as “Default.” All new change lists added to the workspace’s change history are labeled according to their names in the Server Library.

When starting a process via manifest, the existing change list in the process can be overwritten with a change list from the Server Library.

DataPrep Export

Exports data from data preparation workspace.

  • Name — name of the graphic element.

  • Load Plan to be Exported — select a load plan for exporting.

The user can select several load plans for the export by ticking their corresponding checkboxes. To export all load plans, select the All Load Plans check box.

  • Import Export from Workspace — allows you to import an export previously saved with the connected workspace.

Click this button to open the Import Export Definition dialog box. In the Export Name dropdown menu, select the workspace export you want to import. The corresponding details of the export (e.g., load plan, export type) are displayed. Click OK to proceed. The settings of the selected export are then applied to the DataPrep Export item settings.

  • Export File System  the file system to which your table will be exported. This setting is available for the following export types:

    • CSV

    • Microsoft Excel

    • Microsoft Access

    • Qlik

    • Tableau

    • SAS Transport File Format V5

    • JSON

    • XML

The following file systems are available:

  • Local or mounted disk
  • Amazon S3
  • Azure Data Lake Storage Gen2
  • Azure Blob Storage
  • Oracle Cloud Infrastructure
  • Google Cloud Storage
  • Export Type — the type of export file. Further settings depend on the type of file and are described as advanced options below.

  • Export Destination — path to required export file. Click the Browse button   to modify the destination path for the local or mounted disk file system or to open the Browse For File window from which you can view the containers of your selected export file system and set the file to be used as the export destination. Then, click Ok to proceed.

NOTES:

You may also use the naming macros to name the exported files and destination folder. To do this, click the Insert Macro icon (Insert Macro icon) and select the naming macros from the drop-down list.

The Input File Name macro is available to specify one or more input file names. If there are several inputs in a Load Plan (Join/Union), the names of the export files will contain a combination of these input files. When the Input files are processed by group, the Input File Name macro will only hold the first file name.

  • Export Options — allows you to specify the filter, summary, and sort settings for the export

Filtering:

  • Select filter: From the dropdown list, choose the filter you want to apply to the export. When All filters is selected, you will be asked to specify the Automatic Naming setting:  

    • By Files: produces a file for each filter

    • By Tables: produces a table for each filter

Select a summary:

NOTE: This option is available for the following file types: CSV, Microsoft Excel, Microsoft Access, XML, JSON, Tableau, SAS Transport Format, and Qlik.

  • Summary:  From the dropdown list, choose the summary to apply to the export. Selecting All summaries will let you specify the Automatic Naming setting:

    • By files — when exporting multiple summaries, a file is exported for each summary, and the files are named according to the summary names.

    • By tables — when exporting multiple summaries, a table is produced in the export file for each summary.

  • Measure:  Allows you to specify the measure to apply to the summary selected. This is activated when the summary has a defined column key. From the dropdown list, you can also choose All, by key to include all measures grouped by key or All, by measure to include all measures grouped by measure.

  • Split: Check Each distinct value of the first key makes a separate file or table to split the summary export according to each value change in the first key (i.e., the leftmost one). The Automatic Naming setting can then be modified:

    • By Files: produces a file for each distinct first key

    • By Tables: produces a table for each distinct first

  • Drill level: Set the drill level to apply to the summary being exported. Drilling up and drilling down is to collapse and expand a summary data set, respectively. A lower drill level (e.g., Level 1) results in fewer columns in the exported table.

Sorting

  • Select Order: From the dropdown list, choose the sort to apply to the export.

NOTE: When the selected table has a row limit applied, you can proceed with the export by exporting either all rows or limited rows. Specifically, the following options are available:

  • Export all rows : the export file will contain all the rows from the workspace

  • Export limited rows (Limit: X number of rows): the export file will contain only a limited number of rows, as set in Data Prep Studio.

The option to export either all rows or limited rows is only available when one load plan is selected. It cannot be modified when multiple load plans are selected for export.

 

Advanced options for Workspace element

Additional options for Excel Worksheet Design

  1. When dealing with multiple inputs, the Parse inputs for sheet names option becomes available. Select the checkbox if you want to parse the inputs for specific sheets, and then click Apply to…

  2. When dealing with only one input, modifications can be applied to the following by clicking Apply to…:

Additional options for SQL Server, OLE DB, and Oracle source types

  1. In the column list in the SQL Server/OLE DB/Oracle dialog box, select the column to use.

Select from the following options:

The following filter methods are available:

 

Advanced options for DataPrep Export element

Additional options for CSV file types

  1. In the When output file exists drop-down menu,

  2. Delimiter sets the separator symbol between columns. The delimiter characters available are comma, semicolon, tab, pipe, and space. Choose Other to specify a different character.

  3. Qualifier sets the wrapper symbol type.

  4. Choose the Encoding Type to apply to the CSV file. The following options are available:

  1. Set the Code Page to apply to the CSV file. The following options are available:

  • OEM United States  

  • Greek (DOS)

  • Baltic (DOS)  

  • Western European (DOS)  

  • Central European (DOS)  

  • OEM Cyrillic  

  • Turkish (DOS)  

  • Portuguese (DOS)  

  • Icelandic (DOS)  

  • French Canadian (DOS)  

  • Nordic (DOS)  

  • Cyrillic (DOS)  

  • Greek, Modern (DOS)  

  • Japanese (Shift-JIS)  

  • Central European (Windows)  

  • Cyrillic (Windows)  

  • Western European (Windows)  

  • Greek (Windows)  

  • Turkish (Windows)  

  • Baltic (Windows)  

  • Cyrillic (KOI8-R)  

  • Cyrillic (KOI8-U)

  • Western European (ISO)  

  • Central European (ISO)  

  • Latin 3 (ISO)  

  • Baltic (ISO)  

  • Cyrillic (ISO)  

  • Greek (ISO)  

  • Turkish (ISO)  

  • Estonian (ISO)  

  • Latin 9 (ISO)  

  • Japanese (JIS)  

  • Japanese (EUC)

 

  1. Select the Include Column Header checkbox to include headers in your file.

  2. Select the Append DOS end-of-life character (x1A) checkbox to add a DOS EOF character to the export file.

  3. Select the Legacy compatibility mode – Ignore format customizations during export checkbox if you want to use the Monarch Classic style formatting instead of the column format customizations during the export of numeric and date columns.

NOTES

When this option is selected, numeric values are exported using the current system’s localized decimal separator. For date values, the following options are available:

For Monarch Server, this would typically be based on the Service Account for the Monarch Server Agent Service localization specifically (or, if running Monarch Server in console mode, the User Account context in which the console is running).

Additional options for Fixed Text file types

  1. In the When output file exists drop-down menu,

  2. Click Column sizes to adjust the widths of the columns in your file.

Additional options for Altair Panopticon file types

  1. In the When output file exists dropdown menu,

  1. Enter your Table Name.

NOTE: Click the Existing Tables button ( ) to display the names of the tables in the export file when exporting to a file that already exists. Select the desired table name.

  1. When Add data is selected in the When output file exists dropdown menu, the When table exists dropdown menu becomes available with the following options:

  1. Select the Suppress column header row checkbox if you prefer to hide the column headers in the exported file.

  2. Click the Advanced Excel Export Options button to set the parameters for the Excel export. The following settings are available:

  1. Click the Page Setup button to specify the header/footer settings and other page settings for the Excel export in the Excel Page Setup window. Select the Use workspace settings checkbox to apply the default page settings of the workspace used. To make your own specifications, the following settings are available:
  2. Select the Export All checkbox if you want to export all columns; otherwise, click Column Mapping to map the export columns to the destination columns.

Additional options for Microsoft Excel file types

  1. In the When output file exists drop-down menu,

  2. Enter your Table Name.

NOTE: Click the Existing Tables button () to display the names of the tables in the export file when exporting to a file that already exists. Select the desired table name.

  1. When Add data is selected in the When output file exists dropdown menu, the When table exists dropdown menu becomes available with the following options:

  1. Select the Suppress column header row checkbox if you prefer to hide the column headers in the exported Excel file.

  2. When Append is selected, select the Do not match column names checkbox if you want the columns from potentially different tables to be left aligned regardless of column names or data types.

  3. Click the Advanced Excel Export Options button to apply the following settings:

For Table Exports:

- Store: A list of the certificates currently available on your Certificate Store is displayed. Select a certificate from the list and click OK.

- File: Navigate to the folder containing your certificate file and then select it. Enter the certificate password in the field provided. Click the Get Certificate Information button to display the details of the certificate.

For Summary Exports:

- Store: A list of the certificates currently available on your Certificate Store is displayed. Select a certificate from the list and click OK.

- File: Navigate to the folder containing your certificate file and then select it. Enter the certificate password in the field provided. Click the Get Certificate Information button to display the details of the certificate.

  1. Click the Page Setup button to specify the header/footer settings and other page settings for the Excel export in the Excel Page Setup window. Select the Use workspace settings checkbox to apply the default page settings of the workspace used. To make your own specifications, the following settings are available:
  2. Select the Export All checkbox if you want to export all columns; otherwise, click Column Mapping to map the export columns to the destination columns.

Additional options for Microsoft Access file types

  1. In the When output file exists dropdown menu, select Overwrite to substitute files with the same file names.

  2. Enter your Table Name.

NOTE: Click the Existing Tables button () to display the names of the tables in the export file when exporting to a file that already exists. Select the desired table name.

  1. When Add data is selected in the When output file exists dropdown menu, the When table exists dropdown menu becomes available with the following options:

  1. Select the Export All checkbox if you want to export all columns; otherwise, click Column Mapping to map the export columns to the destination columns.

  2. When Update or Update and append is selected from the When table exists dropdown menu,modify the following:

Additional options for OleDb and ODBC file types

  1. Type in the Connection String or select it via the Data Link Wizard or Connection Definitions with all the required data.

  2. Select the Include password checkbox to input a password.

  3. Select the Bulk Behavior checkbox to enhance export performance.

NOTE: This option is applicable for Oracle Database and Microsoft SQL Server OLEDB providers only. 

  1. Select the Use Transaction checkbox to revert the export via rollback if an error occurs during the export.

  2. Enter your Table Name.

  3. In the When table exists drop-down menu,

  1. Select the Export All checkbox if you want to export all columns; otherwise, click Column Mapping to map the export columns to the destination columns.

  2. When Update or Update and append is selected from the When table exists dropdown menu, modify the following:

  

Additional options for Qlik file types

  1. In the When output file exists drop-down menu,

 

Additional options for Tableau file types

  1. In the When output file exists drop-down menu,

NOTE: Export to Tableau is performed in .tde format.

  1. To publish the Tableau export file to Tableau Server, click the Publish to Tableau Server check box under the Publish to Tableau Server options. This displays the following Tableau Server settings.

Additional options for IBM Cognos Analytics file types

  1. Enter your Table Name.

  2. In the When output file exists dropdown menu,

  1. Type the value of the Cognos Analytics server URL to the Server Url field.

  1. Click the Test button to get all namespaces.

  2. Select the corresponding Namespace.

  3. Type the correct Username and Password.

  4. Click the Check Credentials button to validate credentials.

NOTE: If you want to log in anonymously, select the Is Anonymous checkbox.

Additional options for Microsoft Power BI file types

  1. Enter your Table Name.

  2. In the When output file exists dropdown menu,

  3. Click the Set Power BI Login Data button to open the Power BI Login form.

NOTE: If the data on logged in user has been already received, then the button name becomes Update Power BI Login Data.

 

Additional options for SAS transport format file types

  1. In the When output file exists drop-down menu,

  2. Enter your Table Name.

NOTE: Click the Existing Tables button () to display the names of the tables in the export file when exporting to a file that already exists. Select the desired table name. The table name is limited to 8 characters in length.

 

Additional options for JSON file types

  1. In the When table exists drop-down menu,

 

Additional options for Altair Knowledge Studio file types

  1. In the When output file exists drop-down menu,

 

Additional options for XML file types

  1. In the When output file exists drop-down menu,

  2. Select the Legacy compatibility mode – Ignore format customizations during export checkbox if you want to use the Monarch Classic style formatting instead of the column format customizations during the export of numeric and date columns.

NOTE: When this option is selected, numeric values are exported using the current system’s localized decimal separator. Date values are exported in “yyy-mm-ddThh:mm:ss” format. For Monarch Server, this would typically be based on the Service Account for the Monarch Server Agent Service localization specifically (or, if running Monarch Server in console mode, the User Account context in which the console is running).

 

Plug-ins supported by Automator

Datawatch.DataPrep.Engine

 

Panopticon.DataDirect

 

Panopticon.CloudantPlugin   

 

Panopticon.DatawatchPlugin

 

Panopticon.ODataPlugin

 

Panopticon.SplunkPlugin

 

Panopticon.BusinessObjectsUniversePlugin

 

Panopticon.GoogleAnalyticsPlugin