CP2 data model

The CP2 data model (part of the metadata module) describes a class that inherits all functions of the XMP data model and all functions of the Dataset class. The following subsections describe the functions specific to the CP2 data model.

Overview

Since most Certified PDF 2 information is stored as XMP, the "CP2" data model inherits from the "XMP" data model. Thus any XMP data model function can be invoked on a CP2 dataset as well.

A CP2 dataset is always embedded. It is not possible to create an external dataset with this data model.

A CP2 dataset can be either read-only or writable. When a writable CP2 dataset is finalized, the XMP and CP2 data structures are updated and an appropriate Certified PDF 2 signature is written to the backing file.

A CP2 dataset can be obtained for any PDF file, whether it is a valid Certified PDF 2 file or not. This allows converting a regular PDF file into a Certified PDF 2 file.

A writable CP2 dataset can be "cleared", removing all Certified PDF 2 related information. This allows converting a Certified PDF 2 file into a regular PDF file.

File formats other than PDF are not supported, since Certified PDF 2 is tied to PDF.

Effects on the scripting API
Supporting the new CP2 data model requires:
  • Adding semantics to a limited number of existing functions.
  • Offering a range of new classes (and corresponding functions).

Compatibility

GWG Proof of Preflight
Since a PDF file with a valid GWG Proof of Preflight ticket is – by definition – also a valid Certified PDF 2 file, the scripting API presented in this document fully supports the GWG Proof of Preflight specification.
Certified PDF 1
The scripting API does not support Certified PDF 1 files. In other words, Certified PDF 1 files are treated as regular PDF files, and any Certified PDF 1 data structures are ignored.

Accessing XMP metadata

Certified PDF 2 metadata
After obtaining a CP2 dataset, the Certified PDF 2 metadata stored in the XMP packet must be accessed exclusively through the functions specific for the CP2 data model. Accessing any XMP metadata fields in the CP2 namespace using the regular XMP access functions has undefined results.
Note: This allows the CP2 data model implementation to cache Certified PDF 2 information without immediate write-through to the XMP fields.
Other XMP Metadata
It is perfectly valid however to access any other XMP metadata (i.e. unrelated to Certified PDF 2) using the regular XMP access functions on the CP2 dataset, even intermixed with the use of functions specific for the CP2 data model.
Note: This requires the CP2 data model implementation to update the Certified PDF 2 information independent of the rest of the XMP packet or – if this is not possible – to detect any regular XMP updates and synchronize when needed.

File level operations

Validity
The CP2 data model offers functions to determine whether the underlying file has a valid Certified PDF 2 signature, and whether it contains a valid Certified PDF 2 data structure even if the signature is invalid.
Creating a Certified PDF 2 file
Converting a regular PDF file to a Certified PDF 2 file requires no extra functions; it can be accomplished as follows:
  • Obtain a writable CP2 dataset for the PDF file.
  • Add any relevant Certified PDF 2 metadata information to the dataset using the CP2 data model functions described later.
  • Explicitly finish writing on the dataset or exit the entry point.
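The steps above can be sketched as follows. This is a minimal illustration, not the implementation: how the writable CP2 dataset is obtained is left to the caller, and the describeCertificate callback and the makeRecordingDataset stand-in are hypothetical helpers introduced only so the sketch runs outside the scripting host. Only addNewCertificate() and finishWriting() are taken from this document.

```javascript
// Sketch only: `dataset` stands for a writable CP2 dataset that the caller
// has already obtained for the PDF file (step 1).
function certifyPdf(dataset, classID, describeCertificate) {
  // Step 2: add Certified PDF 2 metadata, here a single certificate.
  const certificate = dataset.addNewCertificate(classID);
  // Hypothetical callback that fills in the certificate's required properties.
  describeCertificate(certificate);
  // Step 3: finish writing explicitly instead of relying on entry point exit.
  dataset.finishWriting();
  return certificate;
}

// Stand-in dataset that only records calls. It is NOT part of the
// scripting API; it exists so the sketch can be tried in isolation.
function makeRecordingDataset() {
  const calls = [];
  return {
    calls: calls,
    addNewCertificate: function (classID) {
      calls.push("addNewCertificate(" + JSON.stringify(classID) + ")");
      return { classID: classID, described: false };
    },
    finishWriting: function () {
      calls.push("finishWriting()");
    },
  };
}

const recordingDataset = makeRecordingDataset();
certifyPdf(recordingDataset, "Preflight", function (certificate) {
  certificate.described = true;
});
console.log(recordingDataset.calls.join(" -> "));
```

Calling finishWriting() explicitly makes the point at which the signature is written visible in the script; exiting the entry point would have the same effect.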
Removing Certified PDF 2 information
Converting a Certified PDF 2 file to a regular PDF file can be accomplished as follows:

  • Obtain a writable CP2 dataset for the Certified PDF 2 file.
  • Call the removeCertifiedPDF2() function on the CP2 dataset.
  • Explicitly finish writing on the dataset or exit the entry point.
Changing the PDF file
The scripting API does not offer any functions to update portions of a PDF file other than the changes directly related to the Certified PDF 2 data structures (including the XMP packet and the Certified PDF 2 signature).

Also, an entry point that obtains a writable CP2 dataset on the incoming PDF file must not invoke an external application that changes that PDF file (or keeps it open for update without actually modifying it). This is because:

  • A writable CP2 dataset keeps the underlying PDF file open for update.
  • The process of writing the changes back to a temporary copy of the file is not under the script programmer's control, and thus cannot be coordinated with the external application.

File level validity

hasValidSignature( ) : Boolean [R]

Returns true if the backing file for which this dataset was created has a valid Certified PDF 2 signature; returns false if the file has no conforming signature or if the signature is no longer valid because the file was modified.

hasPreviousSessions( ) : Boolean [R]

Returns true if the backing file for which this dataset was created contains a valid Certified PDF 2 data structure with at least one session (not counting any sessions automatically added while obtaining the dataset); returns false otherwise.

If hasValidSignature() returns true, hasPreviousSessions() should return true as well (unless some application has created a corrupt Certified PDF 2 file). However if hasValidSignature() returns false, hasPreviousSessions() may still return true.
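The combinations of the two functions can be summarized in a small helper. The sketch below is an illustration under assumptions: the returned labels and the fakeDataset stand-in are hypothetical and not part of the scripting API; only hasValidSignature() and hasPreviousSessions() come from this document.

```javascript
// Classify a file from the two validity queries of a CP2 dataset.
// `dataset` is any object offering hasValidSignature() and
// hasPreviousSessions(); the returned labels are hypothetical.
function classifyCertifiedState(dataset) {
  if (dataset.hasValidSignature()) {
    // A valid signature normally implies at least one previous session;
    // anything else points to a corrupt Certified PDF 2 file.
    return dataset.hasPreviousSessions() ? "certified" : "inconsistent";
  }
  // Without a valid signature the file may still carry a valid
  // Certified PDF 2 data structure with sessions.
  return dataset.hasPreviousSessions() ? "uncertified-with-sessions" : "plain";
}

// Stand-in dataset for trying the helper outside the scripting host.
function fakeDataset(validSignature, previousSessions) {
  return {
    hasValidSignature: function () { return validSignature; },
    hasPreviousSessions: function () { return previousSessions; },
  };
}

console.log(classifyCertifiedState(fakeDataset(false, true)));
```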

Removing Certified PDF 2

removeCertifiedPDF2( fullSave : Boolean ) [W]
Removes all metadata related to Certified PDF 2 from the dataset and causes the dataset to subsequently behave as a regular XMP dataset. When finishWriting() is called (or the entry point exits), any data structures related to Certified PDF 2 are removed from the underlying file.

If fullSave is false or missing, these changes are effected through an incremental save, which means the file will still contain the original information and its overhead. If fullSave is true, a full save is performed, eliminating any information that is no longer referenced and thus reducing overhead.

After this function has been invoked on a dataset, invoking any of the CP2-specific functions on that dataset is a programming error and has unpredictable results.
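A minimal sketch of the removal sequence follows. The makeRemovableDataset stand-in is hypothetical and exists only to make the sketch runnable; it throws on a later CP2-specific call, whereas the real dataset is documented as having unpredictable results in that case.

```javascript
// Convert a Certified PDF 2 file to a regular PDF file.
// `dataset` stands for a writable CP2 dataset obtained by the caller.
function convertToRegularPdf(dataset, fullSave) {
  // A missing fullSave is passed on as false (incremental save).
  dataset.removeCertifiedPDF2(fullSave === true);
  // From here on, no CP2-specific function may be called on the dataset.
  dataset.finishWriting();
}

// Stand-in dataset, NOT part of the scripting API. It throws where the
// real dataset would show unpredictable results.
function makeRemovableDataset() {
  const state = { removed: false, fullSave: null, finished: false };
  return {
    state: state,
    removeCertifiedPDF2: function (fullSave) {
      state.removed = true;
      state.fullSave = Boolean(fullSave);
    },
    getActiveSession: function () {
      if (state.removed) {
        throw new Error("CP2-specific call after removeCertifiedPDF2()");
      }
      return {};
    },
    finishWriting: function () {
      state.finished = true;
    },
  };
}

const removableDataset = makeRemovableDataset();
convertToRegularPdf(removableDataset, true);
console.log(removableDataset.state);
```

Passing true trades a slower full save for a smaller file; leaving the argument out keeps the original data in the file.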

Sessions

A session represents the work done on a Certified PDF 2 file between two saves.

Active session
When obtaining a CP2 dataset, a new session object is automatically created and stored in the dataset. This session is called the "Active" session and it contains any changes made to the CP2 dataset during this entry point. The session is automatically completed when the dataset is finished for writing (explicitly or by exiting the entry point).
Dealing with uncertified files
If a previously valid Certified PDF 2 file has been updated by a nonconforming application, its XMP metadata does not contain a session object for every save that was performed. When obtaining a writable CP2 dataset for such a file, session objects are automatically created for each uncertified save, before creating the active session.
Note: For implementation, use a subset of the PitStop algorithms to determine the missing sessions, without considering any Certified PDF 1 information; create minimal sessions (containing required properties only) with editing zone "All".
Updating sessions
Once a session has been finished it can no longer be modified. The active session is the only editable session object. Any session can be removed from the dataset. However when a session is removed, all previous sessions are automatically removed as well. In addition any certificates added during the removed sessions are automatically removed as well. As an alternative to removing a session completely (and as an exception to the rule of not modifying previous sessions), it is possible to strip a session. Stripping a session removes all optional session properties while leaving any certificates intact.
Note: After implementation, the scripting API should remove any PDF objects associated as private data with removed or stripped sessions or with removed certificates.
getAllSessions( ) : SessionList [R]
Returns a list of all sessions in the dataset, including the active session, in order of occurrence (that is, the active session is the last session in the list).
getPreviousSessions( ) : SessionList [R]
Returns a list of the sessions that were already in the backing file before the dataset was created, that is, excluding the active session and any other automatically added sessions. The list is in order of occurrence (that is, the most recent session is listed last). The list may be empty.
getActiveSession( ) : Session [R]
Returns the active session; this is the only editable session.
touchZones( zones : String | String[] ) : Boolean [W]
Notifies the dataset that the specified editing zone(s) have been touched in the PDF file during the active session. This function performs three actions:
  • Add the specified zone(s) to the editing zones of the active session.
  • Set the state of any certificate with overlapping zones and residing in the active session to "Unknown".
  • Cause any certificate with overlapping zones and residing in previous sessions to become dirty (there is no actual change in the data structure, just in the internal caches for the dataset).

The function returns true if any certificate was affected, false otherwise.
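The three actions can be illustrated with a small in-memory model. This is a sketch under assumptions, not the implementation: the model structure, the zone names "Text" and "Image", the state "Success", and the rule that "All" overlaps with every zone are all chosen for illustration only.

```javascript
// Assumption: "All" overlaps with every zone; other zones overlap only
// when they are equal.
function zonesOverlap(zonesA, zonesB) {
  if (zonesA.length === 0 || zonesB.length === 0) return false;
  if (zonesA.includes("All") || zonesB.includes("All")) return true;
  return zonesA.some(function (zone) { return zonesB.includes(zone); });
}

// Hypothetical in-memory model of touchZones(); the last session in
// model.sessions plays the role of the active session.
function touchZonesModel(model, zones) {
  const touched = Array.isArray(zones) ? zones : [zones];
  const active = model.sessions[model.sessions.length - 1];
  // Action 1: add the zones to the active session.
  for (const zone of touched) {
    if (!active.zones.includes(zone)) active.zones.push(zone);
  }
  let affected = false;
  for (const certificate of model.certificates) {
    if (!zonesOverlap(certificate.zones, touched)) continue;
    affected = true;
    if (certificate.session === active) {
      certificate.state = "Unknown"; // Action 2: reset the state.
    } else {
      certificate.dirty = true; // Action 3: cached flag only.
    }
  }
  return affected;
}

const previousSession = { zones: ["All"] };
const activeSession = { zones: [] };
const touchModel = {
  sessions: [previousSession, activeSession],
  certificates: [
    { session: previousSession, zones: ["Text"], state: "Success", dirty: false },
    { session: activeSession, zones: ["All"], state: "Success", dirty: false },
  ],
};
console.log(touchZonesModel(touchModel, "Text"), touchModel.certificates);
```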

removeSessionsIncluding( index : Number ) [W]
Removes the session at the specified zero-based index (in the list returned by getAllSessions) and all previous sessions. All certificates associated with these sessions and any orphaned users are removed as well.
stripSessionAt( index : Number ) [W]
Removes all optional information from the session at the specified zero-based index (in the list returned by getAllSessions). If the associated user becomes orphaned, it is removed as well.
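The cascading effects of these two functions can be sketched with an in-memory model. Everything in the block is hypothetical (the model structure, the `optional` property bag, and in particular the assumption that stripping a session drops its user reference); it only illustrates the rules stated above.

```javascript
// Remove users that are no longer referenced by any session.
function pruneOrphanedUsers(model) {
  const referenced = new Set();
  for (const session of model.sessions) {
    if (session.user) referenced.add(session.user);
  }
  model.users = model.users.filter(function (user) {
    return referenced.has(user);
  });
}

function checkSessionIndex(model, index) {
  if (index < 0 || index >= model.sessions.length) {
    throw new RangeError("no session at index " + index);
  }
}

// Model of removeSessionsIncluding(): the session at `index` and all
// previous sessions go, together with their certificates.
function removeSessionsIncludingModel(model, index) {
  checkSessionIndex(model, index);
  const removed = model.sessions.splice(0, index + 1);
  model.certificates = model.certificates.filter(function (certificate) {
    return !removed.includes(certificate.session);
  });
  pruneOrphanedUsers(model);
}

// Model of stripSessionAt(): optional data goes, certificates stay.
function stripSessionAtModel(model, index) {
  checkSessionIndex(model, index);
  const session = model.sessions[index];
  session.optional = {};
  session.user = null; // Assumption: the user reference is optional data.
  pruneOrphanedUsers(model);
}

const alice = { name: "alice" };
const bob = { name: "bob" };
const s0 = { user: alice, optional: { note: "first" } };
const s1 = { user: bob, optional: { note: "second" } };
const s2 = { user: bob, optional: {} };
const sessionModel = {
  users: [alice, bob],
  sessions: [s0, s1, s2],
  certificates: [{ session: s0 }, { session: s1 }],
};
removeSessionsIncludingModel(sessionModel, 0);
console.log(sessionModel.sessions.length, sessionModel.users);
```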

Certificates

A CP2 dataset can contain zero or more certificates. Each certificate references the session during which it was created.
Adding certificates
New certificates can be added to a writable dataset. A new certificate is automatically associated with the active session.
Certificate class
The class ID of a new certificate must be specified when the certificate object is constructed; it cannot be changed afterwards. The scripting API does not support access to class properties in certificates.
Updating certificates
Certificates other than those associated with the active session cannot be updated. It is possible however to completely remove any certificate.
Note: After implementation, the scripting API should remove any PDF objects associated as private data with removed certificates. The scripting API does not support influencing the order of certificates.
Proof of Preflight certificate
When obtaining a CP2 dataset for a PDF file that contains a GWG Proof of Preflight ticket without a corresponding preflight certificate, a new preflight certificate is automatically created and stored in the dataset. Depending on the implementation, such an auto-generated preflight certificate may be associated with the active session or with a previous session.
getAllCertificates( ) : CertificateList [R]
Returns a list of all certificates in the dataset, including any certificates in the active session, in order of appearance in the dataset. The list may be empty.
addNewCertificate( classID : String ) : Certificate [W]
Creates a new certificate of the specified vendor-neutral class, associates it with the active session, adds it to the dataset and returns a reference to the new certificate.

A certificate class associates additional semantics with a particular type of certificate. Even if the scripting API does not support access to the class properties that may be defined for certain certificate classes, it is important to specify the appropriate class. For example, a preflight certificate may cause a GWG Proof of Preflight ticket to be written in the PDF file, while other certificates do not show this correspondence.

The Certified PDF 2 specification describes two "built-in" classes, as described in the following table:

classID (CP2:class_id)   Description
"" (the empty string)    A generic certificate, without additional semantics
"Preflight"              A preflight certificate, which may correspond to a GWG Proof of Preflight ticket in the PDF file

In many cases these built-in classes provide sufficient functionality. Third-party vendors may agree on other values for classID; it is recommended to register such other values with Enfocus to avoid name collisions.

After adding a new certificate with this function, and before finishing the dataset (or exiting the entry point), the script must set the certificate's required properties (see "Required and Automatic Properties" section below) to a non-empty value. Alternatively the script may remove the certificate. Failing to do either of these is a programming error and has unpredictable results.
Note: The built-in classID values are easily mapped to one of the built-in C++ certificate classes. However supporting other classID values requires a C++ certificate class with dynamic classID (but without support for accessing class properties). Such a class may need to be implemented within the toolkit (so that special hacks can be applied).
removeCertificateAt( index : Number ) [W]
Removes the certificate at the specified zero-based index (in the list returned by getAllCertificates).

After completion of this function, the indexes for the remaining certificates may have shifted and references to the removed certificate may have become invalid. Using such invalid references is a programming error and has unpredictable results.
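Because indexes may shift after each removal, removing several certificates is safest from the highest index down. The sketch below assumes that a removal only shifts the indexes of later certificates (as in an array); the makeCertificateDataset stand-in is hypothetical and models exactly that behavior.

```javascript
// Remove several certificates by index, highest index first, so that
// the indexes still to be processed are not affected by earlier removals.
function removeCertificatesAt(dataset, indexes) {
  const descending = Array.from(new Set(indexes)).sort(function (a, b) {
    return b - a;
  });
  for (const index of descending) {
    dataset.removeCertificateAt(index);
  }
  return descending.length;
}

// Stand-in dataset backed by a plain array; NOT part of the scripting API.
function makeCertificateDataset(names) {
  const certificates = names.slice();
  return {
    certificates: certificates,
    removeCertificateAt: function (index) {
      certificates.splice(index, 1);
    },
  };
}

const certificateDataset = makeCertificateDataset(["a", "b", "c", "d"]);
removeCertificatesAt(certificateDataset, [1, 3]);
console.log(certificateDataset.certificates);
```

Any certificate references obtained before the removals should be fetched again afterwards rather than reused.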

Users

A CP2 dataset can contain zero or more user objects. Each session can reference a user; multiple sessions can share the same user.

Active user
There is a function to create a user object for the active session. If the specified user turns out to have the same properties as one of the users already present in the CP2 dataset, the active session is updated to reference that existing user. Otherwise a new user object is automatically added to the dataset.
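The matching rule can be sketched as follows. This is an illustrative model only: users are represented as plain objects, and the property names used in the example ("name", "company") are hypothetical, not taken from the Certified PDF 2 specification.

```javascript
// Two users match when they have the same property names and values.
function sameUserProperties(a, b) {
  const namesA = Object.keys(a).sort();
  const namesB = Object.keys(b).sort();
  if (namesA.length !== namesB.length) return false;
  return namesA.every(function (name, i) {
    return name === namesB[i] && a[name] === b[name];
  });
}

// Return the existing user that matches `candidate`, or add the
// candidate to `users` and return it.
function resolveActiveUser(users, candidate) {
  for (const user of users) {
    if (sameUserProperties(user, candidate)) return user;
  }
  users.push(candidate);
  return candidate;
}

const knownUsers = [{ name: "Ann", company: "Acme" }];
const resolved = resolveActiveUser(knownUsers, { company: "Acme", name: "Ann" });
console.log(resolved === knownUsers[0], knownUsers.length);
```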
Updating users
Only the active user can be updated. When a user is no longer referenced by any session (because the referencing sessions have been stripped or removed), the "orphaned" user object is automatically removed from the dataset.
Note: The C++ toolkit deals with users in a slightly different way; the scripting API implementation will have to cache the user object and associate either the new object or an existing user object with the active session when the session is finalized.
getAllUsers( ) : UserList [R]
Returns a list of all users in the dataset, including the active user if any, in arbitrary order. The list may be empty.
addActiveUser( ) : User [W]
If the active session has an associated user, this function returns it. If the active session has no associated user, this function creates a new user object, associates it with the active session and returns a reference to it.
removeActiveUser( ) : User [W]
Removes the active user from the dataset and dissociates the active session from any user.

Required and automatic properties

Session, user and certificate objects have optional and required properties, as stated in the Certified PDF 2 file format specification.
Automatic properties
The scripting API automatically generates appropriate values for most (but not all) required properties and for some optional properties. Some of these automatic properties are not even exposed to the script programmer. Others are available for reading but can't be changed under the script programmer's control. The following table lists the automatic properties and corresponding details.
Object       Property              Exposure
Session      CP2:session_id        Not exposed
Session      CP2:start_byte        Not exposed
Session      CP2:user_id_ref       Not exposed
Session      CP2:start_time        Read-only
Session      CP2:end_time          Read-only
Session      CP2:tool_id           Read-only
Session      CP2:tool_version      Read-only
Session      CP2:tool_desc         Read-only
Session      CP2:appl_id           Read-only
Session      CP2:appl_version      Read-only
Session      CP2:appl_desc         Read-only
User         CP2:user_id           Not exposed
Certificate  CP2:session_id_ref    Not exposed
Certificate  CP2:time              Read-only
Other required properties
For sessions and users, all required properties are automatic. For certificates, however, the required properties listed in the following table must be supplied by the script programmer; most do not even have a default value.
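Object       Property              Default value
Certificate  CP2:type_id           None; value to be supplied by script
Certificate  CP2:type_version      None; value to be supplied by script
Certificate  CP2:type_desc         None; value to be supplied by script
Certificate  CP2:impl_id           None; value to be supplied by script
Certificate  CP2:impl_version      None; value to be supplied by script
Certificate  CP2:statement_desc    None; value to be supplied by script
Certificate  CP2:state             "Unknown"; value can be overridden by script
Certificate  CP2:zones             "All"; value can be overridden by script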
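A script can check the required certificate properties before finishing the dataset. The sketch below works on a plain object keyed by property name, which is an assumption made for illustration; how certificate properties are actually read and written through the scripting API is not covered here, and the example values are made up.

```javascript
// Required certificate properties without a default value.
const REQUIRED_CERTIFICATE_PROPERTIES = [
  "CP2:type_id",
  "CP2:type_version",
  "CP2:type_desc",
  "CP2:impl_id",
  "CP2:impl_version",
  "CP2:statement_desc",
];

// Required certificate properties that have a default value.
const CERTIFICATE_DEFAULTS = { "CP2:state": "Unknown", "CP2:zones": "All" };

// Return the names of required properties that are missing or empty.
function missingCertificateProperties(properties) {
  return REQUIRED_CERTIFICATE_PROPERTIES.filter(function (name) {
    const value = properties[name];
    return value === undefined || value === null || value === "";
  });
}

// Apply the defaults; values supplied by the script take precedence.
function withCertificateDefaults(properties) {
  return Object.assign({}, CERTIFICATE_DEFAULTS, properties);
}

// Illustrative values only.
const draft = withCertificateDefaults({
  "CP2:type_id": "example.check",
  "CP2:type_version": "1.0",
});
console.log(missingCertificateProperties(draft));
```

A non-empty result indicates that the certificate must either be completed or removed before the dataset is finished.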

Editing zones

Zones for which a certificate is sensitive
A new certificate is automatically sensitive to all editing zones. If this is not changed, any zone touched in any following session makes the certificate dirty. It is advisable to narrow the certificate's editing zones down to avoid it becoming dirty unnecessarily.
Zones touched during a session
The active session starts without any editing zones set. For each modification applied to the PDF file during that session, appropriate editing zones must be added. Adding an editing zone to a session has the following effect on any certificates that are sensitive to that zone:
  • If the certificate resides in a previous session it becomes dirty. This is a consequence of the definitions in the Certified PDF 2 file format specification; there is no need to update the certificate.
  • If the certificate resides in the active session, its state is automatically reset to "Unknown". According to the Certified PDF 2 file format specification a certificate can never become dirty due to an editing zone listed for the session during which the certificate was added. Resetting the certificate state is the only way to indicate that the certificate may no longer represent the state of the file.
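The dirty rule can be expressed as a function of the session history, which also shows why narrowing a certificate's zones helps. This is a sketch under assumptions: the zone names "Text" and "Image" and the rule that "All" overlaps with every zone are illustrative, and the session list is a hypothetical plain array in order of occurrence.

```javascript
// Assumption: "All" overlaps with every zone; other zones overlap only
// when they are equal.
function zonesIntersect(certificateZones, sessionZones) {
  if (certificateZones.length === 0 || sessionZones.length === 0) return false;
  if (certificateZones.includes("All") || sessionZones.includes("All")) return true;
  return certificateZones.some(function (zone) {
    return sessionZones.includes(zone);
  });
}

// A certificate is dirty when any session AFTER the session in which it
// was added lists a zone the certificate is sensitive to. Zones listed
// for the certificate's own session never make it dirty.
function isCertificateDirty(sessions, sessionIndex, certificateZones) {
  for (let i = sessionIndex + 1; i < sessions.length; i++) {
    if (zonesIntersect(certificateZones, sessions[i].zones)) return true;
  }
  return false;
}

const history = [{ zones: ["All"] }, { zones: ["Text"] }, { zones: [] }];
// Added in session 0 and left sensitive to all zones.
console.log(isCertificateDirty(history, 0, ["All"]));
// Same certificate narrowed to a zone that was never touched afterwards.
console.log(isCertificateDirty(history, 0, ["Image"]));
```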