Document Types

There are two distinct types of PDF documents:

Regular (Native)
Scanned

Regular (Native) documents are usually created using Adobe Acrobat or a special printer driver that prints into a PDF file. These files contain actual text.

Scanned documents, on the other hand, are usually created by scanning a hard copy (paper) document into the computer, and therefore contain only an image of the text.
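The difference between the two types can be illustrated with a minimal sketch. It assumes the text of each page has already been extracted with some PDF library and passed in as a list of strings. The function name, the 20-character threshold, and the "at least half of the pages" rule are all hypothetical choices for illustration; this is not how PDF2XL detects scanned documents.

```python
from typing import Sequence


def classify_document(page_texts: Sequence[str], min_chars: int = 20) -> str:
    """Guess whether a PDF is 'native' or 'scanned' from its per-page text.

    page_texts holds the text extracted from each page (assumed input).
    min_chars is an arbitrary threshold chosen for this illustration.
    """
    # Count pages whose text holds at least min_chars non-whitespace characters.
    pages_with_text = sum(
        1 for text in page_texts
        if len("".join(text.split())) >= min_chars
    )
    # Treat the document as native when at least half of its pages carry text.
    # A document with no pages falls through to "scanned".
    if page_texts and pages_with_text * 2 >= len(page_texts):
        return "native"
    return "scanned"
```

A native file typically yields real text for most pages, while a scanned file yields empty or near-empty strings because each page is only an image.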

For scanned documents, PDF2XL OCR and Enterprise both use Optical Character Recognition (OCR). OCR is not available in the PDF2XL Basic or CLI editions.

The OCR module attempts to read the text inside the images so it can be converted to Excel properly.

When you open a scanned document, PDF2XL OCR or Enterprise will usually recognize it and suggest turning on OCR Mode. A message box tells you that the document is scanned and will be displayed in OCR Mode. The message box also includes a "Don't ask me again" check box. If you check it, any scanned document you open in the future will use OCR Mode automatically, without notifying you.

If you are using the Basic edition, no alert will pop up, but you may notice that your preview doesn't look quite right. The OCR tab is still available on the toolbar, but trying to start the OCR engine by pressing the Start button will return an alert that you need OCR.

To learn more about converting a scanned PDF file, click here.
