Customizing PDF Import Options in Data Prep Studio

When you import a PDF file into Monarch Data Prep Studio, the application analyzes the file to determine the best method of transforming the data accurately. Good text alignment in a PDF report makes trap creation and data capture easier. When the text isn't properly aligned, you may have to create numerous traps to capture data from different lines of text, which is both tedious and time consuming. Thus, when adjusting alignment in a PDF report is warranted, consider carefully how you intend to trap the data. In most cases, Monarch Data Prep Studio’s auto-detection routines will produce the best results. Under certain conditions, however, you may need to adjust the PDF import options.

Earlier PDF engines in Monarch relied on the concepts of monospaced and free-style text flow to adjust text alignment. These older engines are usually adequate for:

  • PDF files containing tables with tightly compacted columns.

  • PDF files containing multiple font sizes in which the data of interest use a smaller font than the other text, causing the auto-calculated font size to be too large.

  • PDF files containing mixed mono- and variable-spaced fonts in which the data of interest use monospaced fonts.

  • PDF files containing mixed free-form and tabular data.

However, in newer PDF reports:

  • Text alignment on pages with sparse text is inconsistent.

  • Text wrapping may cause horizontal misalignment.

  • Alignment of centered text is unpredictable.

Moreover, PDF reports are now published by numerous software products and may show some unpredictability in terms of their use of fonts, backgrounds, and line colors. Thus, a rendering engine that can tolerate any combination of fonts (including both monospaced and free-form fonts) and background colors is required.

Monarch introduces a new PDF engine (version 4.5) that improves the accuracy of text extraction by identifying graphical elements, such as vertical and horizontal lines and rectangles, on rendered PDF pages and using these elements to form grids to which text will be aligned. This new feature addresses alignment issues that render some trapping operations in PDF files extremely difficult.
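The general idea of aligning text to a grid built from ruling lines can be illustrated with a minimal Python sketch. This is not Monarch's implementation: the `Fragment` type, the `assign_columns` function, the coordinates, and the sample values are all hypothetical, and the sketch assumes that the x positions of the vertical lines have already been detected on the rendered page.

```python
from bisect import bisect_right
from typing import NamedTuple


class Fragment(NamedTuple):
    """A piece of text and its position on the page (hypothetical type)."""
    text: str
    x: float  # left edge of the text, in points
    y: float  # distance from the top of the page, in points


def assign_columns(fragments, vertical_lines):
    """Place each fragment in the grid cell bounded by the vertical lines.

    Returns a list of rows (top to bottom), each a list of cell strings.
    """
    edges = sorted(vertical_lines)
    n_cols = len(edges) - 1
    rows = {}
    for frag in fragments:
        # Index of the cell whose left edge is at or before frag.x,
        # clamped so that text outside the grid lands in the nearest cell.
        col = bisect_right(edges, frag.x) - 1
        col = max(0, min(col, n_cols - 1))
        rows.setdefault(round(frag.y), {}).setdefault(col, []).append(frag.text)
    return [
        [" ".join(cells.get(c, [])) for c in range(n_cols)]
        for _, cells in sorted(rows.items())
    ]


# Invented sample data: the second row's middle fragment starts further
# right than the first row's, as it might in a skewed column.
edges = [50, 200, 320, 450]
fragments = [
    Fragment("Name A", 55, 100), Fragment("Detail A", 205, 100), Fragment("1001", 330, 100),
    Fragment("Name B", 55, 115), Fragment("Detail B", 212, 115), Fragment("1002", 325, 115),
]
table = assign_columns(fragments, edges)
```

Because each fragment is assigned to a cell by the lines that enclose it rather than by its exact x position, small horizontal offsets between rows do not move text into a different column in this sketch.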

For example, when the PDF report Composers.pdf, which is usually available in C:\Users\Public\Documents\Altair Monarch\Reports, is opened in Data Prep Studio and an older PDF engine is used (e.g., 4.3), the second column appears skewed.

 

In this case, simply changing the PDF engine to 4.5 is sufficient to align all columns correctly. Data Prep Studio also automatically sets other properties to obtain optimal results.  

 

In other cases, however, simply changing the PDF engine is insufficient to align closely positioned columns consistently. For such cases, Data Prep Studio offers two engine modes, SIMPLE and EXPERT, each of which includes several properties you can modify to improve the alignment of columns in your PDF report.

 

 

© 2024 Altair Engineering Inc. All Rights Reserved.
