Converting a Scanned PDF File
The PDF2XL OCR and Enterprise editions both come equipped with an OCR engine that can convert scanned files.
For the best results, we recommend scanning at 300 DPI in Black & White (if possible). Note that the accuracy of the conversion depends on the quality of your scanned document.
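To get a feel for what the 300 DPI recommendation means in practice, the short sketch below works out the pixel dimensions of a scanned page. The function name `scan_pixel_size` and the US Letter page size are assumptions chosen for illustration; they are not part of PDF2XL.

```python
def scan_pixel_size(width_in, height_in, dpi=300):
    """Return the (width, height) in pixels of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# US Letter (8.5 x 11 inches) at the recommended 300 DPI.
print(scan_pixel_size(8.5, 11))  # (2550, 3300)
```

Halving the DPI halves each dimension, so the same page has only a quarter of the pixels available to describe each character.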
Converting a Scanned PDF File
- As usual, open your PDF using one of the "Open File" options.
- If the document has not been previously run through an OCR engine, you will see a prompt notifying you that this is a scanned PDF and that OCR Mode is going into effect. Just click "OK" and wait for the OCR to finish.
- If your page is rotated or the data is skewed, use the rotation buttons on the Source pane toolbar or the Fine Rotation options in the "OCR" menu.
- Create your layout, splitting and merging columns where necessary.
- There are additional settings you can adjust in the OCR Options section.
- If your document is in another language, you can select it from the dropdown or use the "Add Languages" button to import it.
- The OCR Tweaking settings at the bottom allow you to adjust how the OCR recognizes the data. Use "Threshold" when the page is too light or too dark. Use "Despeckle" if there is a lot of noise on the page. "Remove Lines" will try to clear out any vertical and horizontal lines, and "Force DPI" changes the dots per inch to try to provide more clarity. Since the right values depend entirely on the quality of your document, there is no recommended adjustment; you simply need to experiment with the settings until you get the best possible outcome.
- When you are ready to convert, make sure you have selected the pages you want to convert in the Convert Pages field of the "Convert" menu. By default, this is set to convert all the pages in your document.
- Click the "Convert Document" button.
- You will see a prompt asking you to validate your document. This is optional, but it allows you to correct any errors that the tweaking did not fix before the document is converted.
- If you agree to validate, another prompt will appear. This one shows the words that the OCR did not recognize with certainty. If the word in the "Suggested word" field is correct, click "Accept". If it is incorrect, type in the correct word and then click "Accept". You can click "Done" at any time to close the validation prompt.
- Once validation is complete, the application will complete the conversion.
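For intuition about what the "Threshold" and "Despeckle" tweaks mentioned in the steps above are trying to achieve, here is a minimal sketch on a tiny grayscale grid. It is not PDF2XL's implementation: the function names, the cutoff value of 128, and the "no neighbouring ink" rule are all assumptions made for this example.

```python
def threshold(gray, cutoff=128):
    """Binarize a grayscale page: 1 = ink (dark), 0 = paper (light)."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def despeckle(binary):
    """Clear ink pixels that have no ink among their 8 neighbours."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            # Count ink pixels in the surrounding 3x3 window, excluding (y, x).
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0  # isolated speck: treat as noise
    return out


# Toy 4x5 "page": 0 = black, 255 = white, values in between are gray.
page = [
    [255, 255, 255, 255, 40],
    [255, 30, 35, 255, 255],
    [255, 20, 200, 255, 255],
    [255, 255, 255, 255, 255],
]
clean = despeckle(threshold(page))
for row in clean:
    print(row)
```

In this toy page the lone dark pixel in the top-right corner is removed as noise, while the connected cluster of dark pixels survives. Raising the cutoff (for example to 210) would also count the light-gray pixel as ink, which is the kind of trade-off you are adjusting when a page is too light or too dark.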
Poor Conversion
If your conversion isn't accurate and you've done all you can using the OCR tools, there are a few things you can look at to determine why you ended up with this result.
Open the PDF in Adobe Acrobat Reader, view it at high zoom (800-1000%), and check the following:
- Is the background clean?
- Are there different background colors?
- Are there pixels around the text?
- Are there any watermarks or handwritten text?
- Are there any vertical or horizontal dividers in the table?
- Is any of the text in a language that is not European? The current OCR can only recognize Latin text. (See the list of supported languages.)
Any of the above issues can affect your result.
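If you have many scans to review, the "different background colors" check can also be roughed out in code. The sketch below is a hypothetical helper, not a PDF2XL feature: it assumes the page is already available as a grid of grayscale values (0 = black, 255 = white), and the names `background_shades`, `ink_cutoff`, `bucket`, and `min_share` are invented for this example.

```python
from collections import Counter


def background_shades(gray, ink_cutoff=128, bucket=32, min_share=0.05):
    """Return the light shade buckets that each cover at least min_share of the page."""
    total = sum(len(row) for row in gray)
    # Ignore dark (ink) pixels and group the rest into coarse shade buckets.
    light = [px // bucket * bucket for row in gray for px in row if px >= ink_cutoff]
    counts = Counter(light)
    return sorted(shade for shade, n in counts.items() if n / total >= min_share)


# Left half white paper, right half a light-gray shaded column.
page = [
    [255, 255, 255, 200, 200, 200],
    [255, 20, 255, 200, 30, 200],
    [255, 255, 255, 200, 200, 200],
]
shades = background_shades(page)
print(shades)  # [192, 224]
if len(shades) > 1:
    print("More than one background shade found")
```

A result with more than one shade suggests shaded cells or an uneven scan, which is a hint that the "Threshold" setting may need adjusting.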