Creating a Template
A Monarch Data Prep Studio data extraction template is used to obtain data from a PDF or report document. Monarch Data Prep Studio provides seven template types:
- Detail template: A detail template extracts information from the lowest report level, usually referred to as the detail, transaction, or itemized level. Fields extracted by the detail template are used to create each record in the resulting database table.
- Append template: An append template extracts fields from each sort level in a report, also referred to as a group level or append level. Fields extracted by each append template are appended (i.e., concatenated) to each record created by the detail template.
- Group Footer template: A group footer template extracts fields that appear after a detail line (use an append template to extract fields that appear before a detail line). Fields extracted by the footer template are appended to each record.
- Page Header template: A page header template extracts fields that appear at the top of each page. This area is referred to as the page header. Fields extracted by the page header template are appended to each record.
- Exclusion template: An exclusion template specifies lines or parts of lines that should NOT be captured by other template types.
- Start Region template: A Start Region trap identifies the line in a report where all other types of trapping (e.g., detail traps, append traps) should begin.
- End Region template: An End Region trap, in contrast to a Start Region trap, identifies the line in a report where all other types of trapping (e.g., detail traps, append traps) should end.
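The relationship between detail and append templates can be sketched in Python. This is only an illustration of the idea, not how Monarch Data Prep Studio is implemented: the sample report, the line tests, and the names `is_append_line`, `is_detail_line`, and `build_records` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical report: each CUSTOMER line is an append (group) level,
# and the indented lines beneath it are detail lines.
REPORT = [
    "CUSTOMER: Acme Corp",
    "  1001  Widget  5",
    "  1002  Gadget  2",
    "CUSTOMER: Bolt Ltd",
    "  2001  Sprocket  7",
]


def is_append_line(line: str) -> bool:
    """Stand-in for an append trap: the line starts with the group label."""
    return line.startswith("CUSTOMER:")


def is_detail_line(line: str) -> bool:
    """Stand-in for a detail trap: indented line whose first token is numeric."""
    tokens = line.split()
    return line.startswith("  ") and len(tokens) == 3 and tokens[0].isdigit()


def build_records(lines: List[str]) -> List[Dict[str, object]]:
    """Create one record per detail line and append the current group fields."""
    records: List[Dict[str, object]] = []
    current_append: Dict[str, object] = {}
    for line in lines:
        if is_append_line(line):
            # Remember the most recent append-level values.
            current_append = {"customer": line.split(":", 1)[1].strip()}
        elif is_detail_line(line):
            order, item, qty = line.split()
            record: Dict[str, object] = {"order": order, "item": item, "qty": int(qty)}
            # Append-level fields are concatenated onto every detail record.
            record.update(current_append)
            records.append(record)
    return records


if __name__ == "__main__":
    for rec in build_records(REPORT):
        print(rec)
```

Each detail line yields exactly one record, and the customer name from the nearest preceding group line is repeated on every record beneath it.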
Though the following instructions outline the procedure for creating a detail template, you may use the same procedure to create an append, footer, page header, or exclusion template.
Step 1: Identify and select a template sample
- Import a PDF or PRN report into Data Prep Studio. The Report Design window displays.
- Inspect the first few pages of the report to identify the detail level information. Note whether all detail fields fit on a single detail line or if they spread across multiple lines. Note: In most reports, all detail level fields fit on a single line. Occasionally, the detail fields may be arranged in blocks of two or more lines, sometimes with field labels interspersed. In the following illustration, all of the detail fields are arranged on a single line.
- Select a line or group of lines that contain a single instance of the detail fields. Note: To select a single line, click in the line selection area to the left of the line. To select multiple lines, click down in the line selection area to the left of the first line, then drag down to the last line, then release.
Step 2: Assign a template role
The first time you create a template for a report, you must assign a role. You may change the template role at any time while creating or editing a template definition.
In the Template and Field Properties panel, select Add a New Template and then select a template role.
Step 3: Define a Trap
Each template requires a trap that identifies unique features of the template. The trap is used to capture all instances of a template throughout the report. In the case of a detail trap, the trap identifies common features shared by all detail lines, but not shared by other lines. A proper detail trap will capture only the detail lines while ignoring lines from headers and other sort levels.
For example, in the illustration above, all detail lines have a number in position 9 followed by two blank characters:
To capture these lines, you can set a trap that looks for a numeric character in position 9 followed by two blanks.
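The logic of this trap can be sketched in Python. The sketch assumes that "position 9" is counted from 1 (so it is index 8 in a Python string); the function name `matches_trap` and the sample lines are hypothetical and are not taken from the product.

```python
def matches_trap(line: str) -> bool:
    """Return True if the line has a digit in position 9 followed by two blanks.

    Assumption: positions are 1-based, so position 9 is index 8.
    """
    start = 8
    if len(line) < start + 3:
        return False  # line is too short to contain the trap characters
    return line[start] in "0123456789" and line[start + 1 : start + 3] == "  "


if __name__ == "__main__":
    sample_lines = [
        "CUSTOMER: Betty's Music Store",       # header line, not captured
        " " * 8 + "4  CD   Bartok, Sonata",    # detail line, captured
        " " * 8 + "42 CD   Mozart, Requiem",   # digit follows the digit, not captured
    ]
    for text in sample_lines:
        print(matches_trap(text), repr(text))
```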
You create a trap by entering trap characters in the Trap line as shown below:
The Sample text line and the Trap Line. The Trap Line has one numeric trap defined.
As you create your trap, portions of the body of your report are marked with guillemets in the Line Selection area to confirm that the data in them will be captured. You can use these marks to check that your trap is correctly defined.
You can specify one or more trap characters on the trap line. Often, several different combinations will work equally well. It is generally a good idea to specify several trap characters to ensure that you do not accidentally capture lines from other sort levels. However, if you specify too many trap characters, you run the risk that some detail lines will be missed by the trap. You can experiment with the trap definition until you find a combination that works.
In some reports, the detail fields are arranged on more than one line, e.g., as a block of text. Occasionally, the first line of such a block has no unique features to trap on. If line 1 cannot be used, select another line in the block that has unique features, and indicate which of the sample lines to trap on via the Trap Line field.
When you are satisfied that the trap is working to capture all of the detail lines, but not lines from other levels, you are ready to highlight the fields that you want to extract.
Data Prep Studio accepts several trap types: Standard, Floating, and Regex. You can choose which trap type to apply to the template from the Trap Type drop-down.
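As a rough illustration of what a regex-style trap expresses, the earlier example (a numeric character in position 9 followed by two blanks) can be written as a regular expression. The sketch below uses Python's `re` module; the regex syntax accepted by Data Prep Studio may differ, and the names `DETAIL_TRAP` and `is_trapped` are hypothetical.

```python
import re

# Eight arbitrary characters, then one digit, then exactly two spaces.
# Assumption: position 9 is counted from 1.
DETAIL_TRAP = re.compile(r"^.{8}[0-9] {2}")


def is_trapped(line: str) -> bool:
    """Return True if the line matches the regex form of the detail trap."""
    return DETAIL_TRAP.match(line) is not None


if __name__ == "__main__":
    report = [
        "ACCOUNT NUMBER: 11887",
        " " * 8 + "4  CD   Bartok, Sonata",
        " " * 8 + "7  LP   Mozart, Requiem",
    ]
    # Keep only the lines captured by the trap.
    print([line for line in report if is_trapped(line)])
```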
Step 4: Highlight and name fields
To highlight fields:
Using the sample line as an example, highlight each field you want to extract. Each field highlight should be long enough to allow for long field values but not so long that you extend the highlight into another field's data. For numeric fields, which are right aligned, the highlight should extend to the left to account for the largest number that is likely to exist in the field.
- Using the mouse. Click down in the Sample text line at the left extent of the field and then drag right to highlight the field. Repeat for each field you want to extract. For numeric fields, which are right-aligned, you may start at the right edge and drag left to highlight the field.
- Using the keyboard. The keyboard provides a more precise method of highlighting fields, since you can easily adjust the field length one character at a time. Click in the Sample text line to display the insertion cursor. Use the arrow keys to move the cursor to the first character in the field, then press Insert and use the right arrow to highlight the field. Press Enter to complete the field definition. Repeat for each field you want to extract.
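Conceptually, each highlight defines a column range that is cut out of every trapped line. The following Python sketch shows why a numeric highlight should extend to the left: the same range has to hold both short and long values. The field names, column ranges, and sample lines are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical highlights: field name -> (start, end), zero-based, end exclusive.
FIELDS: Dict[str, Tuple[int, int]] = {
    "qty": (0, 9),            # right-aligned numeric field, wide enough for large values
    "media": (11, 14),
    "description": (16, 46),
}


def extract_fields(line: str, fields: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Cut each highlighted column range out of the line and trim the padding."""
    return {name: line[start:end].strip() for name, (start, end) in fields.items()}


if __name__ == "__main__":
    # Sample detail lines built with format specs so the columns line up.
    small = f"{4:>9}  {'CD':<3}  Bartok, Sonata for Solo Violin"
    large = f"{1200:>9}  {'LP':<3}  Mozart, Requiem"
    print(extract_fields(small, FIELDS))
    print(extract_fields(large, FIELDS))
```

If the `qty` range covered only the last character of the column, the value on the second line would be truncated.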
Although you could use the vertical scroll bar to scroll through the report to ensure that you have defined your fields properly, this method becomes tedious when you are working with large reports. Therefore, Monarch Data Prep Studio provides a field verification feature that reads the entire report and verifies that your fields are properly defined.
To name fields:
You can name each field via the Field Properties portion of the Template and Field Properties panel. By naming fields within the context of the report, you may find it easier to determine appropriate field names.
- Click a field you wish to rename in the Sample text line.
- Create an appropriate name for the field by clicking in the Name box and replacing the current entry with a new one. Click on the checkmark that displays next to the field to accept the field name.
- Repeat the two previous steps for each field to be named. If several fields are defined in one template, you can display the properties of the next field and rename it by selecting the Next Field button at the bottom of the Edit Field Properties panel.
The Field Properties portion of the Template and Field Properties panel displays the sample field value for the selected field along with the current field name. If you have not yet named the field, it is automatically given a temporary name.
Field names may be up to 62 characters in length and may contain uppercase and lowercase characters, spaces, and punctuation except for period (.), exclamation point (!), grave accent (`), and brackets ([]). Names may begin with any character except an underscore or a space. If a name is entered with leading spaces, the name is accepted but the leading spaces are ignored.
Note: If you elect to enforce DBF field naming rules, field names must adhere to the dBASE III field naming conventions. Names may be up to ten characters long and may contain any letter or number and the underscore character (_). The first character must be a letter. Spaces and punctuation are not allowed.
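The naming rules above can be expressed as a small validation routine. The Python sketch below is one reading of those rules, not the product's own check. It assumes that empty names are rejected, that digits are allowed under the default rules, that "letter" means an ASCII letter under the DBF rules, and that leading spaces are dropped before either rule set is applied. The function names are hypothetical.

```python
import re

FORBIDDEN_CHARS = set(".!`[]")   # punctuation not allowed in field names
MAX_NAME_LENGTH = 62
# dBASE III style: a letter first, then letters, digits, or underscores; 10 characters at most.
DBF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,9}")


def normalize_field_name(name: str) -> str:
    """Leading spaces are accepted on entry but ignored."""
    return name.lstrip(" ")


def is_valid_field_name(name: str, dbf_rules: bool = False) -> bool:
    """Check a field name against the default rules or the DBF naming rules."""
    name = normalize_field_name(name)
    if dbf_rules:
        return DBF_NAME.fullmatch(name) is not None
    if not name or len(name) > MAX_NAME_LENGTH:
        return False  # assumption: an empty name is not acceptable
    if name[0] == "_":
        return False  # names may not begin with an underscore
    return not any(ch in FORBIDDEN_CHARS for ch in name)


if __name__ == "__main__":
    for candidate in ["Unit Price", "  Qty", "_hidden", "Amt.", "Total [USD]"]:
        print(repr(candidate), is_valid_field_name(candidate))
    for candidate in ["UNIT_PRICE", "Unit Price", "1QTY", "CUSTOMER_NAME"]:
        print(repr(candidate), is_valid_field_name(candidate, dbf_rules=True))
```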
Step 5: Edit field properties
Each field is described by a set of properties that make it unique from other fields. These properties are specified in the Edit Field Properties portion of the Template and Field Properties panel when a field is selected in the Sample text line. Use the guide provided in Input Field Properties to modify the properties of your fields.
Step 6: Name and accept the template
When you are satisfied with the trap and field definitions, you can name the template and apply it to the report.
- Click on the Edit icon located to the right of a template name to activate the text box. Note that, by default, Monarch Data Prep Studio assigns the current template role as the template name.
- Enter a new name into the Template Name field.
- Click Accept on the Report Design window to save your template definition. Otherwise, choose Cancel to discard your changes.
Step 7: Verify the template
Once the template has been created, you can verify it to ensure there are no errors or issues.
- Click on the Report Verify tool on the Report Design toolbar.
- Verify scans the entire report and examines the field boundaries. If any characters are found immediately adjacent to a field boundary, Monarch Data Prep Studio will highlight the field to alert you that the field may be too short to accommodate a field value or that the field may be defined at the wrong location.
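The boundary check described above can be approximated in a few lines of Python. This is a simplified sketch, not the product's verification logic: it assumes the input holds only lines already captured by the trap, treats any non-blank character directly to the left or right of a field as suspicious, and uses hypothetical names and column ranges.

```python
from typing import Dict, List, Tuple


def find_boundary_warnings(
    lines: List[str], fields: Dict[str, Tuple[int, int]]
) -> List[Tuple[int, str]]:
    """Return (line number, field name) pairs where a character touches a field edge.

    Field ranges are zero-based with an exclusive end. Line numbers start at 1.
    """
    warnings: List[Tuple[int, str]] = []
    for line_no, line in enumerate(lines, start=1):
        for name, (start, end) in fields.items():
            # Character just left of the field, or a blank if there is none.
            left = line[start - 1] if 0 < start <= len(line) else " "
            # Character just right of the field, or a blank if the line ends first.
            right = line[end] if end < len(line) else " "
            if left != " " or right != " ":
                warnings.append((line_no, name))
    return warnings


if __name__ == "__main__":
    fields = {"qty": (5, 9)}           # highlight that is four characters wide
    report = [
        f"{4:>9}  CD",                 # value fits inside the highlight
        f"{123456:>9}  CD",            # value spills past the left edge
    ]
    print(find_boundary_warnings(report, fields))
```

A warning on the second line indicates that the `qty` highlight should be extended to the left.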