PDF2XL Options
The PDF2XL Options menu can be found on the Start menu or by clicking on the small gear icon in the top right corner of the application.
General
Application Defaults
User interface language: This combo box allows the user to select the user interface language for PDF2XL. Note that changing the user interface language will only go into effect once you restart the application (PDF2XL will automatically suggest a restart if the user interface language is changed). If the new language was not originally installed with the application, the user will be prompted to download the language components (or install them from the CD) using the Install Language dialog.
Default magnification when opening file: This setting controls the zoom level at which the document is initially opened. The options are Original Size, which keeps the document's original size; Fit to Width, which automatically fits the document to the width of the PDF2XL window; and Set to, which lets the user set the default size (as a percentage).
Show Start pane at startup: If this option is checked, the Start pane will automatically be displayed when PDF2XL is loaded.
Show tutorial options on Start pane: If this option is checked, the Start pane will display the tutorial options.
Program Updates
Check for updates: If this option is checked, PDF2XL will automatically check for available software updates whenever it's started. In the dropdown box you can select what to do if an update is found:
- Notify: PDF2XL will notify the user that a new version is available for download.
- Download and notify: PDF2XL will automatically download any updates that are available for the current license, and notify the user that the update is ready to be installed.
- Download and install: PDF2XL will automatically download any updates that are available for the current license, and prompt the user to install them when PDF2XL starts.
If you are not subscribed to an updates package, you will need to purchase the update in the web store before you can download it.
Heuristics
Find horizontal dividers: If this option is checked, PDF2XL will use the divider finding algorithm to find and ignore horizontal dividers. The algorithm uses only characters from the list when calculating the horizontal dividers.
Find vertical dividers: If this option is checked, PDF2XL will use the divider finding algorithm to find and ignore vertical dividers. The algorithm uses only characters from the list when calculating the vertical dividers.
Allow dropping letters: In some cases, a PDF might contain text on top of other text. If this option is checked, some of that text may be ignored. Uncheck this option if you do not want PDF2XL to ignore any of the text.
Ignore more than X consecutive spaces: If this option is checked, PDF2XL will ignore blocks of consecutive spaces (i.e. will not use them to recognize tables or convert them). The user can set the exact limit of consecutive spaces that should be ignored.
Space-to-letter ratio: This setting can be used to change the minimal distance between words which PDF2XL will consider a space; setting it to narrower will ensure getting a space in the output even when the words are close together, while setting it to wider might combine parts of words even if they do not touch.
PDF2XL Printer (Enterprise Only)
While running, set Cogniview Printer as default printer: If this option is checked, Cogniview Printer will be set as the default printer when PDF2XL Enterprise starts. When PDF2XL Enterprise is closed, the previous default printer will be restored.
PDF2XL Scanning (Enterprise Only)
Use TWAIN drivers: If this option is checked, PDF2XL Enterprise will use TWAIN technology to access the scanner, instead of the newer WIA technology. If PDF2XL Enterprise fails to detect your scanner, try checking this option to see whether it fixes the problem.
Format
Column format character set
Number: Checking this option will ensure that only the characters in the box will be converted in columns and fields marked as numeric. This character set will also be used to limit the OCR engine when performing OCR on columns and fields marked as numeric.
Currency: Checking this option will ensure that only the characters in the box will be converted in columns and fields marked as currency. This character set will also be used to limit the OCR engine when performing OCR on columns and fields marked as currency.
Date: Checking this option will ensure that only the characters in the box will be converted in columns and fields marked as date. This character set will also be used to limit the OCR engine when performing OCR on columns and fields marked as date.
Time: Checking this option will ensure that only the characters in the box will be converted in columns and fields marked as time. This character set will also be used to limit the OCR engine when performing OCR on columns and fields marked as time.
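The effect of a column format character set can be pictured as a simple filter. The following is only a minimal sketch of the idea described above, not PDF2XL's actual implementation; the function name `restrict_to_charset` and the sample character set are assumptions made for illustration.

```python
def restrict_to_charset(text, allowed):
    """Keep only the characters that appear in the allowed set.

    Hypothetical helper: mimics limiting a numeric, currency, date or
    time column to the characters typed into the options box.
    """
    return "".join(ch for ch in text if ch in allowed)


# Example character set for a numeric column (an assumption, not a PDF2XL default).
NUMBER_CHARS = "0123456789.,-"

cleaned = restrict_to_charset("1,234.50*", NUMBER_CHARS)
print(cleaned)
```

In this sketch, any character outside the set (such as a footnote asterisk) is simply dropped from the converted value.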
Numeric Columns
Move right side negative to left: If this option is checked, PDF2XL will move any negative sign on the right side of numeric fields to the left, so Excel will be able to regard the data as a number. If the box is cleared, the numeric fields and columns will not be changed.
Replace parenthesis with negative sign: If this option is checked, PDF2XL will replace the parentheses around numeric fields with a minus sign on the left. If left unchecked, numeric fields and columns containing parentheses will remain unchanged.
Minus sign characters: Any character in this character set will automatically be converted to the minus (negative) sign in fields and columns marked as numeric.
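The three Numeric Columns settings can be thought of as a small normalization step applied to each numeric value. The sketch below illustrates that idea under stated assumptions: the function name, its parameters, and the default set of minus-like characters are hypothetical and are not taken from PDF2XL.

```python
def normalize_negative(text, move_trailing_sign=True,
                       replace_parentheses=True,
                       minus_chars="-\u2212\u2013"):
    """Return text with its negative marker moved to a leading '-'.

    Hypothetical sketch: minus_chars plays the role of the
    'Minus sign characters' set (here hyphen, minus sign, en dash).
    """
    value = text.strip()
    # Map every configured minus-like character to a plain '-'.
    for ch in minus_chars:
        value = value.replace(ch, "-")
    # (123) becomes -123 when parentheses replacement is enabled.
    if replace_parentheses and len(value) > 2 \
            and value.startswith("(") and value.endswith(")"):
        value = "-" + value[1:-1].strip()
    # 123- becomes -123 when moving the right-side sign is enabled.
    if move_trailing_sign and len(value) > 1 and value.endswith("-"):
        value = "-" + value[:-1].strip()
    return value


print(normalize_negative("123.45-"))
print(normalize_negative("(1,200)"))
```

With either option switched off in this sketch, the corresponding values are returned unchanged, which matches the behavior described for the unchecked settings.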
Text Columns
Keep indentations: If this option is checked, PDF2XL will retain the relative indentation of all the columns or fields marked as text. For fields or cells containing more than one line of text, this affects all the lines, provided the Keep line wrap option (under Text Columns) is checked.
Keep line wrap: Checking this box will ensure PDF2XL keeps line wrapping for all fields and cells marked as text when converting. Note that line wrapping will not be kept for CSV or Clipboard conversions, because Excel and similar applications would not receive the data correctly in such a case.
Output formatting
Convert superscript: If this option is unchecked, PDF2XL will remove any superscript data from the conversion. This is mostly useful when there are footnote or annotation markings next to numbers, and you wish to ignore them when converting so Excel won't consider them a part of the number.
Change white text's color: When converting data with the font's attributes (specifically, color), white text can disappear if the background of the resulting document is white. Because the default background color in Word and Excel is white, this can cause problems. This setting lets the user select the output color for white and nearly white text; to keep the original color, clear the check box. Note that this problem only occurs if the user has set the "Keep Text Attributes" option for either Excel or Word and PowerPoint in the Output Settings page of the Settings dialog.
OCR (Available in OCR and Enterprise)
Default OCR behaviour
Language: Allows the user to select the language of the dictionary used in the OCR process. Words that do not appear in the dictionary will have a higher chance of being marked as suspect. If 'all' is selected (which is the default), the OCR process will guess the language of the word.
Automatically OCR scanned files: If this option is checked, PDF2XL will automatically perform OCR when a new scanned PDF document is opened.
Validation
Validate only suspect words: If this is left unchecked, OCR Validation will try to validate all words in the document, rather than only ones marked as suspect.
Ignore validated words: If this is checked, opening the OCR Validation dialog will skip words that were already validated. If unchecked, the dialog will ask to validate all words.
OCR Tweaking
Threshold: You can either allow PDF2XL to select an automatic monochrome threshold, or set it manually. Change this setting if the scanned page is either very light or very dark.
Despeckle: By setting this option you can make the OCR process ignore small dots and imperfections in the scanned image. If the scanned document has a lot of 'noise', this option can help enormously. To use it, check the despeckle box, and select the maximum size of the dot to remove. Moving the bar to the right will make PDF2XL remove larger and larger 'dots', up to removing quite sizable chunks.
Remove lines: If this option is set, PDF2XL will try to remove vertical and horizontal lines before processing the image. This is mostly useful when trying to process an image scanned from old computer print-out papers that have pre-printed lines on them.
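To illustrate what the Threshold setting above controls, the sketch below converts a tiny grayscale image to monochrome. It is an illustration only: how PDF2XL picks its automatic threshold is not described here, so the mean brightness is used as a stand-in, and the function name `to_monochrome` is hypothetical.

```python
def to_monochrome(pixels, threshold=None):
    """Convert rows of grayscale values (0-255) to 0 (black) / 1 (white).

    If no threshold is given, the mean brightness is used. This is an
    assumption standing in for an automatic threshold, not PDF2XL's method.
    """
    if threshold is None:
        flat = [p for row in pixels for p in row]
        threshold = sum(flat) / len(flat)
    # Pixels brighter than the threshold become white, the rest black.
    return [[1 if p > threshold else 0 for p in row] for row in pixels]


page = [[10, 200],
        [120, 130]]
print(to_monochrome(page, threshold=128))  # manual threshold
print(to_monochrome(page))                 # automatic (mean) threshold
```

Raising the manual threshold turns more pixels black, which can help with a very light scan; lowering it does the opposite for a very dark one.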
Conversion
General
Write table names: If this option is checked, PDF2XL will include each table's name in the output (when not using the Convert into multiple sheets option). If unchecked, the table name will not be displayed in the output.
Keep empty rows: If this option is checked, the output will include empty rows (rows where all cells are empty). If unchecked, such rows will be omitted from the output.
Keep text attributes: If this option is checked, PDF2XL will keep the text attributes - size, color, italic, bold, and type - of the original data when converting to Excel or OpenDocument Spreadsheet. If unchecked, the default text attributes will be used.
Use 2007 Format: If this option is checked, PDF2XL will use the MS Office 2007 file format when saving (and will use the .xlsx, .docx or .pptx file extensions). If unchecked, the file will be saved in the old format (.xls, .doc or .ppt). This option requires MS Office 2007 to be installed on the computer.
Write column names: If this option is checked, PDF2XL will include each table's column names in the output (if there are column names for the table). If unchecked, the column names will not be converted.
Merge cells with overflowing text: Checking this option will result in merging the contents of cells where a word is overflowing from one into the other; if converting to a format that supports merging the actual cells (for example, Excel or Word), the cells themselves will also be merged if that option is set. Leaving this option unchecked will make PDF2XL split overflowing words between the cells.
Merge cells to match layout: Checking this option will make PDF2XL merge cells in Excel or OpenDocument Spreadsheet wherever there are merged cells in the table structure.
Keep leading zeros: Checking this option will make sure that Excel and OpenOffice will treat cells that contain numbers with leading zeros as text, even when not marked as text in the Conversion Format. Note that this does not apply to columns specifically marked as Numeric, but only to columns that have no set format.
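The rule behind Keep leading zeros can be summarized as a small decision function. This is a hedged sketch of the behavior described above, not the product's code; the function name, the `column_format` values, and the exact test for "a number with leading zeros" are assumptions.

```python
def keep_as_text(value, column_format=None, keep_leading_zeros=True):
    """Decide whether a cell should be written to the spreadsheet as text.

    column_format is None for columns with no set format; any explicit
    format (for example "numeric" or "text") takes precedence.
    """
    if column_format is not None:
        return column_format == "text"
    stripped = value.strip()
    has_leading_zero = (stripped.isdigit()
                        and len(stripped) > 1
                        and stripped.startswith("0"))
    return keep_leading_zeros and has_leading_zero


print(keep_as_text("00123"))                           # unformatted column
print(keep_as_text("00123", column_format="numeric"))  # explicitly numeric
```

In this sketch, a value such as a ZIP code or part number in an unformatted column keeps its zeros, while the same value in a column marked as Numeric is left to be treated as a number.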
Conversion to a File
Open file after conversion: If this option is checked, PDF2XL will automatically open the converted file once it's done converting.
Ask for filename: If this option is checked, PDF2XL will ask for a filename before converting, and save the conversion results in that name.
File Metadata
This is the metadata that will be written to the converted file.
Excel-Specific:
General
Enable Add-ins: When PDF2XL converts data into Excel or Excel File (with the Open File after Conversion option checked), the opened Excel instance might not have all the installed Add-ins enabled. Checking this option will ensure that they work in that instance. Checking this option might make the conversion slightly slower.
Use XLStart Folder: Normally, PDF2XL launches your converted file in Excel without using any of your automatically-loaded workbooks, such as your Personal Macro Workbook (PERSONAL.XLS or PERSONAL.XLSB); this is done because of a limitation in Excel Automation. If you check this box, then while launching Excel with your converted data, PDF2XL will scan your XLStart folder and attempt to load your start-up workbooks. Note that for security reasons, PDF2XL will only load XLS, XLSX, XLSB and XLSM files from your XLStart folder.
Conversion to a File
When File Exists: Select the way PDF2XL will handle the conversion when converting into an Excel file that already exists. The options include:
Option | Description |
---|---|
Overwrite | Replace the existing file. |
Add new sheet(s) | Add one or more new sheets to the workbook and put the newly converted data into them. |
Add to existing sheet(s) | Append the data to the existing sheets. If the current setting is to convert all the data into a single sheet, then PDF2XL will add the data to the first sheet. If the setting is to convert to multiple sheets, then PDF2XL will search the file for a sheet with the same name. If found, the new contents for that sheet will be appended to it; if not found, a new sheet will be added. |
Ask me what to do | The user will be asked which of the previous three options to use. If the last one (Add to existing sheets) is selected and the output will only be a single table, the user can select to which sheet to add the data, and even where to put it. |
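The "Add to existing sheet(s)" rule in the table above can be sketched as a small planning function. The function name, its arguments, and the "append"/"create" labels are hypothetical and exist only to illustrate the described decision; they do not reflect PDF2XL's internals.

```python
def plan_append(existing_sheets, new_tables, single_sheet):
    """Return (sheet name, action) pairs for 'Add to existing sheet(s)'.

    existing_sheets: sheet names already in the workbook, in order.
    new_tables: names of the sheets the conversion would produce.
    single_sheet: True if all data is converted into a single sheet.
    """
    if single_sheet:
        # All data goes to the first sheet of the existing workbook.
        return [(existing_sheets[0], "append")]
    plan = []
    for name in new_tables:
        # Append when a sheet with the same name exists, otherwise add one.
        action = "append" if name in existing_sheets else "create"
        plan.append((name, action))
    return plan


print(plan_append(["Sheet1", "Sales"], ["Sales", "Costs"], single_sheet=False))
print(plan_append(["Sheet1", "Sales"], ["Sales", "Costs"], single_sheet=True))
```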
Cell Format
Number Format: This is the format that will be set in the Excel sheet for cells marked as having numeric data.
Currency Format: This is the format that will be set in the Excel sheet for cells marked as having currency data.
Date Format: This is the format that will be set in the Excel sheet for cells marked as having dates.
Time Format: This is the format that will be set in the Excel sheet for cells marked as having time data.
CSV-Specific:
General
Quote all fields: Checking this option will make PDF2XL quote all the fields in the CSV output; leaving it clear will quote only fields that contain the separator character or the quote character.
Quote empty fields: Checking this option will make PDF2XL quote any empty fields in the CSV output (i.e., putting "" between separators); leaving it clear will result in writing the separators only.
Use Separator: This setting allows you to select the separator used in the conversion into CSV files. The default is Comma, and you can also select Tab from the combo box. If you want any other character, select 'Other' from the combo box and type it into the edit box (see, for example, the 'o' as a separator in the image above).
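The three CSV settings combine as shown in the sketch below, which formats a single row. It is an illustration under stated assumptions rather than PDF2XL's own writer: the function name and parameters are hypothetical, and doubling embedded quote characters follows the common CSV convention, which is assumed here.

```python
def format_csv_row(fields, separator=",", quote_all=False, quote_empty=False):
    """Format one CSV row according to the three options above."""
    out = []
    for field in fields:
        needs_quotes = (
            quote_all                           # Quote all fields
            or separator in field               # field contains the separator
            or '"' in field                     # field contains a quote
            or (quote_empty and field == "")    # Quote empty fields
        )
        if needs_quotes:
            # Assumed convention: embedded quotes are doubled.
            field = '"' + field.replace('"', '""') + '"'
        out.append(field)
    return separator.join(out)


row = ["Widget", "", "1,200"]
print(format_csv_row(row))
print(format_csv_row(row, quote_empty=True))
print(format_csv_row(row, quote_all=True))
print(format_csv_row(row, separator="\t"))
```

Note that with a Tab separator, the value `1,200` no longer contains the separator and so is written without quotes in this sketch.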
HTML-Specific:
General
Excel Look: When this box is set, the resulting HTML will look quite similar to Excel 2007. If it is clear, a default HTML table will be created instead (and the additional row-number column, which is necessary for the Excel-like output, will not be added). This option is on by default.
Full HTML File: When this box is set, the resulting file will contain HTML headers and footers. If it is clear, only the table will be in the output file. This option is on by default.