Working with Problem PDF Files

You may occasionally encounter a PDF file that you are unable to import into Monarch Classic. This can happen for a number of reasons, but often it is because the text layer of the PDF file was damaged during the creation process, or because the PDF file is actually a scanned image or some other embedded image.

The first step when working with a problem PDF file is to determine whether it actually contains any text.

Determining whether a PDF file contains text

A quick way to check whether a PDF file contains any text is to open it in Adobe Acrobat and use the Find feature to search for some text you can plainly see on screen. If the text is not found, the text layer has been damaged or does not exist. In that case the document is most likely an image, and its text cannot be read by either Monarch Classic or Acrobat.
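If you prefer to script this check, the same idea can be sketched in Python. This is only an illustration, not part of Monarch Classic: it assumes the third-party pypdf package is installed, and the function names extract_page_texts and has_text_layer are hypothetical. A file for which no text is extracted most likely has a missing or damaged text layer.

```python
from typing import Iterable, List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the text extracted from each page of the PDF file.

    Assumption: the third-party pypdf package is installed. It is not
    part of Monarch Classic or Adobe Acrobat.
    """
    from pypdf import PdfReader  # imported here so the rest works without pypdf

    reader = PdfReader(pdf_path)
    # Image-only pages typically yield an empty string.
    return [page.extract_text() or "" for page in reader.pages]


def has_text_layer(page_texts: Iterable[str], min_chars: int = 1) -> bool:
    """Return True if at least min_chars non-whitespace characters were extracted."""
    total = sum(len("".join(text.split())) for text in page_texts)
    return total >= min_chars
```

For example, has_text_layer(extract_page_texts("report.pdf")) returns False when no text can be extracted from the hypothetical file report.pdf. A True result does not prove that the text layer is intact, only that some text is present.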

Another test is to select some text with the text extract tool in Acrobat, copy it, and then paste it into Notepad. (Note: If the text extract tool fails to highlight any text when you left-click and drag over it, then the text you can see on screen is an image.) If the text you pasted into Notepad is not the same as the text you can see on the page of the PDF file, then the text layer is damaged.
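When you have many samples to compare, the copy-and-paste comparison can be approximated in a short script. The sketch below uses only the Python standard library. The function names and the 0.9 threshold are assumptions chosen for illustration, not values used by Monarch Classic or Acrobat. Whitespace is normalized first, because pasted text often differs from the on-screen text in line breaks and spacing alone.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace so spacing and line breaks are ignored."""
    return " ".join(text.split())


def text_layer_similarity(visible_text: str, pasted_text: str) -> float:
    """Return a similarity ratio between 0.0 (no match) and 1.0 (identical)."""
    matcher = SequenceMatcher(None, normalize(visible_text), normalize(pasted_text))
    return matcher.ratio()


def looks_damaged(visible_text: str, pasted_text: str, threshold: float = 0.9) -> bool:
    """Return True if the pasted text differs substantially from the visible text.

    The threshold is an arbitrary assumption; adjust it to suit your documents.
    """
    return text_layer_similarity(visible_text, pasted_text) < threshold
```

Here visible_text is the text you typed in by reading the page, and pasted_text is what was copied out of the PDF file. A result of True suggests that the text layer does not match what is shown on screen.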

Scenarios in which Monarch Classic Cannot Import a PDF File

Below are the more common scenarios under which Monarch Classic may not be able to import a particular PDF document, as well as some suggestions on handling them.

  • Scanned PDF Files - If a PDF file contains no text (see above for how to determine this), it may actually be a scanned image or some other embedded image. A scanned image is a picture of a document, taken by a scanner, which is then embedded into a PDF document. Monarch Classic cannot extract text from a picture. The only way to deal with images is to use OCR (optical character recognition) software to try to recognize and extract text from them.

  • CAUTION: We do NOT recommend using OCR software with critical financial documents, because extraction accuracy varies with each document and with the OCR software being used. Small recognition errors can easily creep in when using OCR software, and they may not be noticed until a review or audit of the data is performed.

  • Damaged PDF Files - Even though a PDF file may display correctly in Adobe Acrobat, the text layer may have been damaged during the creation process, in some cases beyond repair, with the result that Monarch Classic is unable to extract text from it. Adobe Acrobat is able to detect and repair many small errors in PDF documents, so opening the offending PDF file in Acrobat and using the File|Save As menu option to re-save it as a new PDF file may correct the problem.

  • Text Extraction Prohibition - When a PDF file is published, security options can be specified to prevent the extraction of content from it. When you attempt to import a PDF document for which content extraction has been prohibited, Monarch Classic displays the message "Cannot import from PDF file because it does not allow text extraction". If this occurs, you will have to ask the publisher of the PDF file to republish it for you with content extraction allowed.
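As background to the last scenario, the PDF format records these security options as a permissions value (the /P entry of the file's encryption dictionary), in which bit 5 controls copying or extracting text and graphics. The Python sketch below shows how that bit could be tested once you have the value. It is an illustration only: the function name is hypothetical, how you obtain the /P value is up to you, and the source does not state how Monarch Classic itself performs this check.

```python
# Bit 5 of the PDF permissions value (/P), counting from 1 at the
# low-order bit, governs copying or extracting text and graphics.
EXTRACT_BIT = 1 << 4  # value 16


def extraction_allowed(permissions: int) -> bool:
    """Return True if the permissions value allows content extraction.

    permissions is the signed integer stored in the /P entry of the
    encryption dictionary. Obtaining it is outside this sketch.
    """
    return bool(permissions & EXTRACT_BIT)
```

For example, extraction_allowed(-3904) is False because bit 5 is clear in that value, whereas extraction_allowed(-44) is True. A PDF file that is not encrypted has no /P entry and imposes no such restriction.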

© 2024 Altair Engineering Inc. All Rights Reserved.
