PDF Documents
(Available from version 7.1 to 7.6)
Use the PDF Documents built-in functions to interact with PDF documents and forms. Two sets of functions are provided:
- With Acrobat Reader: Includes functions to read the text and file name from a specified PDF document that is currently open in Adobe Acrobat Reader.
- Without Acrobat Reader: Includes functions to read text and form field names and values from PDF documents, and to enter values into PDF document form fields. These functions do not use Adobe Acrobat Reader, and therefore work faster than the functions under With Acrobat Reader.
The functions in the PDF Documents built-in service are intended for retrieving text from PDF files in which the data is saved as text, for example, PDF forms that were completed electronically. For retrieving text from PDF files that contain images of text, for example, scanned forms that have been completed by hand, use the Advanced PDF Object type and its methods.
- Text that is stored digitally
PDF files that contain digital text are usually created by converting files from other formats, such as MS Word files, into PDF files.
When such a file is opened in a PDF reader such as Acrobat Reader, the digital text can be selected and copied, as shown below.
The PDF Documents built-in service allows you to capture digital text data.
PDF forms are PDF files that contain digital text as well as special fields that allow, for example, customers to fill in their information. The information captured in the form fields is stored digitally.
Shown below is part of a PDF form. The text the customer entered into the two Name fields is stored as digital text.
The PDF Documents built-in service allows you to capture information entered into PDF form fields. Functions are also provided to allow you to enter text into PDF form fields.
- Text that is stored as an image
When a printed page is scanned to create a PDF file, the page is stored as an image within the PDF file, and the text on that page is treated as part of the image. The PDF file does not "know" that some of the dots on the page form text.
When such a file is opened in a PDF reader such as Acrobat Reader, individual pieces of text cannot be selected.
Shown below is a scanned membership form. The entire page is stored in the PDF file as a single image. Text on the page cannot be selected; only areas of the page can be selected and copied as images.
Text stored in this way can be recognized and retrieved using OCR objects. The PDF Documents built-in service does not allow you to capture scanned text data.
With Acrobat Reader
The functions in the With Acrobat Reader library allow you to read the text and file name from a PDF document that is currently open in Adobe Acrobat Reader. Additional functions are provided to get the window handle values of the Acrobat Reader windows.
Preparing Acrobat Reader
To use the functions available in the With Acrobat Reader library, accessibility options must be configured in Acrobat Reader.
To configure Adobe Acrobat Reader accessibility options:
- Open Adobe Acrobat Reader.
- Click Edit > Accessibility > Setup Assistance.
- Click Use recommended settings and skip setup.
Example for PDF Documents - With Acrobat Reader
An example workflow is presented for each function described below. All workflows are included in a sample project.
To view the sample project:
- Download the ZIP file of the sample project here.
- Copy the following files to the folder %AppData%/Nice_Systems/AutomationStudio/Projects:
  - PDF_WithAcrobat_<date>.resx
  - PDF_WithAcrobat_<date>.dproj
- Copy the following files to the folder c:/temp:
  - Dec_of_Independence.pdf
  - SamplePDF_Text.pdf
- Open the project PDF_WithAcrobat in Automation Studio. Each workflow is named after the function it demonstrates.
Functions
Retrieves the name of the PDF file that is currently active in Acrobat Reader.
Parameters
This function has no parameters.
Returns
Returns the name of the PDF file and the number of pages as text, for example, C:\temp\SamplePDF_Text.pdf, 1 pages.
Example
This workflow stores the file name of the PDF that is open in the active Acrobat Reader window in the text variable PDF_Name.
This workflow was executed with one file open in Acrobat Reader.
The name of the PDF file, with the number of pages, is returned in the variable PDF_Name.
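Because the file name and the page count come back in a single text value, downstream logic often needs to split them. The following Python sketch illustrates one way to do that outside the product. It assumes the returned text always follows the format of the example above (`<path>, <N> pages`); `parse_pdf_name` and `PdfName` are hypothetical names used only for illustration and are not part of the PDF Documents functions.

```python
import re
from typing import NamedTuple


class PdfName(NamedTuple):
    path: str
    pages: int


# Assumed format, taken from the documented example:
#   "<full path>, <N> pages"
_PATTERN = re.compile(r"^(?P<path>.+), (?P<pages>\d+) pages?$")


def parse_pdf_name(returned_text: str) -> PdfName:
    """Split the returned text into the file path and the page count."""
    match = _PATTERN.match(returned_text.strip())
    if match is None:
        raise ValueError(f"Unexpected format: {returned_text!r}")
    return PdfName(match.group("path"), int(match.group("pages")))


result = parse_pdf_name(r"C:\temp\SamplePDF_Text.pdf, 1 pages")
print(result.path)   # C:\temp\SamplePDF_Text.pdf
print(result.pages)  # 1
```

The pattern anchors on the last `, <N> pages` in the text, so a comma inside the path itself does not break the split.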
Get Active Acrobat Window Handle
Retrieves the window handle of the active Acrobat Reader window.
Parameters
This function has no parameters.
Returns
Returns the window handle as a number.
Example
This workflow stores the window handle of the active Acrobat Reader in the number variable handle.
This workflow was executed while two windows of Acrobat Reader were open. The window handle of the last active window is written to handle.
Retrieves the name of the PDF file open in the Acrobat Reader window specified by its window handle.
Parameters
Parameter | Input Type | Description |
---|---|---|
window handle | Number | The window handle of the Acrobat Reader window |
Returns
Returns the file name with the number of pages as text, for example, C:\temp\SamplePDF_Text.pdf, 1 pages.
Example
This workflow first runs the workflow shown in the example for Get Active Acrobat Window Handle to get the window handle for the active Acrobat window, which is stored in the number variable handle. The workflow then reads the name of the PDF file in the active window and stores it in the text variable PDF_Name.
This workflow was executed with one file open in Acrobat Reader.
The PDF name and number of pages are returned in the variable PDF_Name.
Retrieves the text of the PDF file open in the Acrobat Reader window specified by its window handle.
Parameters
Parameter | Input Type | Description |
---|---|---|
window handle | Number | The window handle of the Acrobat Reader window |
Returns
Returns the text contents of the PDF file as text.
Example
This workflow first runs the workflow shown in the example for Get Active Acrobat Window Handle to get the window handle for the active Acrobat window, which is stored in the number variable handle. The workflow then reads the text of the PDF file in the active window and stores it in the text variable PDF_Text.
This workflow was executed with one file open in Acrobat Reader.
The text is returned in the variable PDF_Text.
Retrieves a list of window handles of all open Acrobat Reader windows. If multiple PDF documents are open on different tabs in one Acrobat Reader window, only one window handle is returned.
Parameters
This function has no parameters.
Returns
Returns the window handles in a list of type number.
Example
This workflow stores the window handles of all open Acrobat Reader windows in the number list ListOfHandles.
This workflow was executed while two Acrobat Reader windows were open. ListOfHandles lists two window handles, one for each window.
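The list of handles is typically combined with the function that reads a PDF name by window handle, so that each open window can be matched to its file. The Python sketch below only illustrates that loop. The built-in functions are workflow steps and cannot be called from Python, so the two callables passed in are stand-ins, and the handle values are made up.

```python
from typing import Callable, Dict, List


def names_by_handle(
    get_handles: Callable[[], List[int]],
    get_name: Callable[[int], str],
) -> Dict[int, str]:
    """Map each window handle to the PDF name reported for that window."""
    return {handle: get_name(handle) for handle in get_handles()}


# Stand-in data for two open windows (handle values are invented).
fake_windows = {
    1001: r"C:\temp\Dec_of_Independence.pdf, 3 pages",
    1002: r"C:\temp\SamplePDF_Text.pdf, 1 pages",
}

names = names_by_handle(lambda: list(fake_windows), fake_windows.__getitem__)
for handle, name in names.items():
    print(handle, name)
```

Because only one handle is returned per window, PDF documents open on additional tabs of the same window would not appear in such a mapping.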
Without Acrobat Reader
Example for PDF Documents - Without Acrobat Reader
An example workflow is presented for each function described below. All workflows are included in a sample project.
To view the sample project:
- Download the ZIP file of the sample project here.
- Copy the following files to the folder %AppData%/Nice_Systems/AutomationStudio/Projects:
  - PDF_WithoutAcrobat_<date>.resx
  - PDF_WithoutAcrobat_<date>.dproj
- Copy the following files to the folder c:/temp:
  - Dec_of_Independence.pdf
  - Client_Intake_Form-JFurst.pdf
  - SamplePDF_Text.pdf
- Open the project PDF_WithoutAcrobat in Automation Studio. Each workflow is named after the function it demonstrates.
Functions
Retrieves the properties of all form fields in the specified PDF form.
Parameters
Parameter | Input Type | Description |
---|---|---|
directory | Text | The directory in which the PDF file is located |
file name | Text | The name of the PDF file |
Returns
Returns a list of type PDF Field. Each PDF Field element in the list stores the properties of one field in the PDF file.
Example
This workflow gets the PDF form fields from the file at c:/temp/Client_Intake_Form-JFurst.pdf and stores them in a list of type PDF Field called PDF_Fields.
When executed, 11 fields were found. For each, its field name, type, and value are written to an element in the PDF_Fields list variable. Note that all supported field types were identified: Text, Checkbox, Combo, Radiobutton, and List.
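To make the shape of the returned list concrete, the following Python sketch models a PDF Field as a record with a name, a type, and a value, and shows two common ways of using such a list. This is an illustration only: `PdfField`, `values_by_name`, and `writable_fields` are hypothetical names, the sample fields other than Home Phone are invented, and the actual PDF Field type may expose its properties differently.

```python
from dataclasses import dataclass
from typing import Dict, List

# Field types listed in this document as supported.
SUPPORTED_TYPES = {"Text", "Checkbox", "Combo", "Radiobutton", "List"}


@dataclass(frozen=True)
class PdfField:
    name: str
    type: str
    value: str


def values_by_name(fields: List[PdfField]) -> Dict[str, str]:
    """Build a lookup from field name to field value."""
    return {field.name: field.value for field in fields}


def writable_fields(fields: List[PdfField]) -> List[str]:
    """Names of fields that can be written to (only text fields can)."""
    return [field.name for field in fields if field.type == "Text"]


# Invented sample data standing in for a returned list.
pdf_fields = [
    PdfField("Home Phone", "Text", "+27-11-987-3242"),
    PdfField("Newsletter", "Checkbox", "Yes"),
    PdfField("Country", "Combo", "South Africa"),
]

print(values_by_name(pdf_fields)["Home Phone"])
print(writable_fields(pdf_fields))
```

Filtering on the Text type before writing reflects the restriction that only text fields can be written to.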
Retrieves the value of a specified field from a specified PDF form.
Parameters
Parameter | Input Type | Description |
---|---|---|
directory | Text | The directory in which the PDF file is located |
file name | Text | The name of the PDF file |
field name | Text | The name of the field for which to return the value |
Returns
Returns the value as text.
Example
This workflow retrieves the value of the field named Home Phone from the file at c:/temp/Client_Intake_Form-JFurst.pdf and stores it in the text variable Field_Value.
When executed, the value of the field is stored in Field_Value, as expected.
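Most functions in this library take the directory and the file name as two separate parameters, while the merge function takes full paths. When only a full path is available, it has to be split first. The Python sketch below shows one way to do this with the standard library. `split_pdf_path` is a hypothetical helper, and whether the directory parameter expects a trailing separator is not stated in this document, so the sketch simply omits it.

```python
from pathlib import PureWindowsPath
from typing import Tuple


def split_pdf_path(full_path: str) -> Tuple[str, str]:
    """Split a Windows-style full path into (directory, file name)."""
    path = PureWindowsPath(full_path)
    # as_posix() keeps the forward slashes used in this document's examples.
    return path.parent.as_posix(), path.name


directory, file_name = split_pdf_path("c:/temp/Client_Intake_Form-JFurst.pdf")
print(directory)  # c:/temp
print(file_name)  # Client_Intake_Form-JFurst.pdf
```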
Retrieves the number of pages in a specified PDF file.
Parameters
Parameter | Input Type | Description |
---|---|---|
directory | Text | The directory in which the PDF file is located |
file name | Text | The name of the PDF file |
Returns
Returns the number of pages as a number.
Example
This workflow gets the number of pages in the PDF file at c:/temp/Dec_of_Independence.PDF and stores it in the number variable No_of_Pages.
When executed, the number of pages is written to No_of_Pages as 3, as expected.
Retrieves the text from a single specified page of a specified PDF file.
Parameters
Parameter | Input Type | Description |
---|---|---|
directory | Text | The directory in which the PDF file is located |
file name | Text | The name of the PDF file |
page number | Number | The number of the page from which to retrieve the text |
Returns
Returns text.
Example
This workflow reads the text from page 1 of the PDF file at c:/temp/Dec_of_Independence.pdf and stores it in the text variable Page_1_Text.
When the workflow is executed, the text contents of page 1 of the PDF file are written to the variable Page_1_Text.
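Since this function reads one page at a time, reading a whole document means combining it with the page count function and looping over the pages. The Python sketch below illustrates that loop only. It assumes page numbers start at 1, as the page 1 example suggests; `read_all_pages` is a hypothetical helper, and the `get_page_text` callable is a stand-in for the workflow step, which cannot be called from Python.

```python
from typing import Callable, List


def read_all_pages(page_count: int, get_page_text: Callable[[int], str]) -> List[str]:
    """Collect the text of every page, assuming pages are numbered from 1."""
    return [get_page_text(number) for number in range(1, page_count + 1)]


# Stand-in for a three-page document; the page texts are placeholders.
fake_pages = {1: "text of page 1", 2: "text of page 2", 3: "text of page 3"}

all_text = read_all_pages(len(fake_pages), fake_pages.__getitem__)
print(len(all_text))
print(all_text[0])
```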
Merges multiple PDF files into a single PDF file. You can optionally add page numbers at the bottom right corner of each page. If the output file does not exist, it is created. If it already exists, it is overwritten.
Parameters
Parameter | Input Type | Description |
---|---|---|
full paths | List of Text | A list of the full paths to each PDF file to merge, for example: c:/temp/sampleFile.pdf, c:/users/documents/anotherFile |
full path | Text | The full path to the output file, for example: c:/temp/Merged.PDF |
add page numbers | Boolean | Set to True to add page numbering in the output file. |
Returns
Returns a Boolean value of True if the files were merged successfully, or False if they were not, for example, if one of the input files could not be found.
Example
This workflow creates a new PDF file at c:/temp/merged_pdf.pdf by merging the files c:/temp/Dec_of_Independence.PDF and c:/temp/SamplePDF_Text.PDF. Page numbers are added to the output file.
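The merge returns False when an input file cannot be found, and it overwrites an existing output file without asking. A check before the merge can report which input is missing and whether an existing file is about to be replaced. The Python sketch below shows such a check using only the standard library. `missing_inputs` and `will_overwrite` are hypothetical helpers, and the files are created in a temporary folder purely for the demonstration.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_inputs(full_paths: Iterable[str]) -> List[str]:
    """Return the input paths that do not point to an existing file."""
    return [path for path in full_paths if not Path(path).is_file()]


def will_overwrite(output_path: str) -> bool:
    """True if the output file already exists and would be replaced."""
    return Path(output_path).exists()


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "SamplePDF_Text.pdf"
    present.write_bytes(b"placeholder")  # not a real PDF, only needs to exist
    absent = Path(tmp) / "anotherFile.pdf"
    output = Path(tmp) / "merged_pdf.pdf"

    missing = missing_inputs([str(present), str(absent)])
    overwrite = will_overwrite(str(output))

print(len(missing))  # 1
print(overwrite)     # False
```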
Writes the specified value into the specified form field in the specified PDF file. Only text fields can be written to.
Parameters
Parameter | Input Type | Description |
---|---|---|
directory | Text | The directory in which the PDF file is located |
file name | Text | The name of the PDF file |
field name | Text | The name of the field to be written to |
field value | Text | The text to enter into the field |
Returns
Returns a Boolean value of True if the value was written successfully, or False if it was not.
Example
This workflow sets the value of the form field named Home Phone in the file c:/temp/Client_Intake_Form-JFurst.pdf to +27-11-987-3242.
When executed, the change is made in the PDF file.