FileStatistics class

The FileStatistics class allows retrieving certain statistics about file contents for a number of supported file formats. The class does not allow modifying file contents.

Each FileStatistics instance references a particular file, which may be any file on the local file system (whether it is a job or not). The FileStatistics class does not support folders.

Constructing

FileStatistics( file-path : String ) : FileStatistics

Constructs a FileStatistics instance associated with a file specified through its absolute file path. If the specified path references a folder rather than a file, the constructed instance behaves as if the referenced file doesn't exist.

Getting file system attributes

The functions in this section work independently of the file's file format.

getPath( ) : String

Returns the absolute file path.

getName( ) : String

Returns the filename including filename extension if present.

getNameProper( ) : String

Returns the filename excluding filename extension.

getExtension( ) : String

Returns the filename extension, or the empty string if there is none.

getMacType( ) : String

Returns the Mac file type code as a 4-character string if available, otherwise the empty string.

getMacCreator( ) : String

Returns the Mac creator code as a 4-character string if available, otherwise the empty string.

isType( ext : String ) : Boolean

Returns true if the file matches the specified file type, specified as a filename extension, and false otherwise. A file matches if its filename extension and/or its Mac file type (after conversion) match the specified filename extension.

This function does not examine the file contents.

isFile( ) : Boolean

Returns true if the file exists, false otherwise.

getByteCount( ) : Number

Returns the size in bytes of the file, or zero if the file doesn't exist.

getCreated( ) : Date

Returns the creation time of the file, or null if the file doesn't exist.

getModified( ) : Date

Returns the last modification time of the file, or null if the file doesn't exist.
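As a minimal sketch of how the attribute functions combine, the helper below collects a few of them into a plain object. It assumes that stats is a FileStatistics instance (or any object offering the same functions); the helper name summarizeAttributes and the shape of the returned object are hypothetical and not part of the scripting API.

```javascript
// Hypothetical helper: gathers basic file system attributes.
// 'stats' is assumed to be a FileStatistics instance.
function summarizeAttributes(stats) {
    // isFile() is false for a missing file; a folder path behaves
    // as if the file doesn't exist, so it ends up here as well.
    if (!stats.isFile()) {
        return null;
    }
    return {
        path: stats.getPath(),
        name: stats.getName(),
        nameProper: stats.getNameProper(),
        extension: stats.getExtension(), // empty string if there is none
        byteCount: stats.getByteCount(),
        modified: stats.getModified()
    };
}
```

In a script, stats would typically be constructed from an absolute file path, as described under "Constructing".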

Accessing embedded metadata

getEmbeddedDataset( ) : Dataset

Returns a read-only embedded metadata dataset object with the XMP data model for the metadata embedded in the file, or null if the file doesn't exist. If the file exists but it has no supported embedded metadata, the function returns a valid but empty dataset.

The backing file path for the dataset points to the file. Metadata may be embedded in the file as an XMP packet and/or as binary EXIF or IPTC tags. Metadata fields from multiple sources are synchronized into a unified XMP data model. See supported file formats for more information.

This function behaves similarly to the Job.getEmbeddedDataset() function; it supports the same file and metadata formats and performs the same synchronizations. Its advantage is that it can be used with any file (for example, to iterate over all files inside a job folder). However, it does not allow metadata updates (it always returns a read-only dataset), and it does not support folders (that is, it doesn't look for an appropriate backing file inside a folder).

Recognizing file format

The functions in this section recognize file format by looking at the file contents. The current implementation supports the following formats:

| Filename extension | Description |
|---|---|
| AI | Adobe Illustrator (internal format is PDF) |
| AVI | Video clip |
| EPS | Encapsulated PostScript |
| INDD | Adobe InDesign |
| JPEG | JPEG image |
| MOV | QuickTime movie |
| MP3 | MP3 sound |
| PDF | Adobe PDF (Portable Document Format) |
| PNG | PNG image |
| PS | Adobe PostScript |
| PSD | Adobe Photoshop |
| TIFF | TIFF image |
| WAV | Wave sound |

getFileFormat( ) : String

Returns the format of the file contents (as one of the strings in the first column of the table above), or the empty string if the format is not recognized or if the file doesn't exist. This function checks for all supported file formats, so it may be slower than isFileFormat(), which checks for a single format.

isFileFormat( format : String ) : Boolean

Returns true if the file exists and its contents have the specified format (as one of the strings in the first column of the table above); otherwise it returns false. This function is faster than getFileFormat() since it has to check for only a single file format.
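When a script cares about only a few formats, it can test them one at a time with isFileFormat() instead of asking getFileFormat() to check everything. The sketch below illustrates that pattern. The helper name firstMatchingFormat is hypothetical, stats is assumed to be a FileStatistics instance, and whether this is actually faster for a given list of candidates has not been measured here.

```javascript
// Hypothetical helper: returns the first candidate format that matches
// the file contents, or the empty string if none matches.
// 'stats' is assumed to be a FileStatistics instance; 'candidates' is an
// array of format names from the first column of the table above.
function firstMatchingFormat(stats, candidates) {
    for (var i = 0; i < candidates.length; i++) {
        if (stats.isFileFormat(candidates[i])) {
            return candidates[i];
        }
    }
    return "";
}
```

For example, firstMatchingFormat(stats, ["PDF", "EPS", "PS"]) returns the empty string for an image file, mirroring what getFileFormat() returns for an unrecognized format.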

Getting contents statistics

The functions in this section interpret the file contents to obtain certain file-format-specific statistics. The requested statistic is specified through its name (as defined for each format in subsequent sections). Each statistic has a well-defined data type (listed with its description). It is recommended to use the "get" function with the appropriate data type, but reasonable conversions between data types are supported.

The functions return the null object (as opposed to an empty string or a zero number) if:


getString( query : String ) : String

Returns the requested statistic as a string.

getStringList( query : String ) : String[ ]

Returns the requested statistic as a list of strings.

getNumber( query : String ) : Number

Returns the requested statistic as a number.

getBoolean( query : String ) : Boolean

Returns the requested statistic as a Boolean.

getDate( query : String ) : Date

Returns the requested statistic as a date-time.

Data type conversions

| Statistic data type | String | Number | Boolean | Date |
|---|---|---|---|---|
| String | Straightforward | Decimal number representation | Interpret as in the XMP data model | Interpret ISO 8601 date-time representation |
| Integer | Decimal representation of the number | Straightforward | False if zero; true if nonzero | Not supported |
| Rational | Decimal floating point representation of the number | Straightforward | False if zero; true if nonzero | Not supported |
| Boolean | "true" or "false" | 1 or 0 | Straightforward | Not supported |
| Date | ISO 8601 representation of the date-time | Not supported | Not supported | Straightforward |

A string list is converted to another data type by converting the first string in the list (if the list is empty, the conversion is not supported). Any other data type is converted to a string list by converting it to a string and forming a list of one item.
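The conversion rules can be modeled in a few lines. The sketch below is an illustration only, not the actual implementation: the function name convertStatistic and the type-name strings are hypothetical, and it covers just the Integer, Rational and Boolean rows, because those are fully specified by the table. The String and Date rows depend on XMP and ISO 8601 parsing details that are not spelled out here.

```javascript
// Illustrative model of part of the conversion table above.
// 'type' is the statistic's own data type, 'target' the requested one.
// Returns null where the table says "Not supported".
function convertStatistic(value, type, target) {
    var numeric = (type === "Integer" || type === "Rational");
    if (!numeric && type !== "Boolean") {
        throw new Error("type not covered by this sketch: " + type);
    }
    switch (target) {
    case "String":
        // Numbers use their decimal form; Booleans use "true" or "false".
        return numeric ? String(value) : (value ? "true" : "false");
    case "Number":
        return numeric ? value : (value ? 1 : 0);
    case "Boolean":
        // False if zero; true if nonzero.
        return numeric ? value !== 0 : value;
    case "StringList":
        // Any non-list type becomes a one-item list of its string form.
        return [convertStatistic(value, type, "String")];
    case "Date":
        // Not supported for numeric or Boolean statistics.
        return null;
    default:
        throw new Error("unknown target type: " + target);
    }
}
```

A null result here corresponds to a "Not supported" cell in the table.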

Supported statistics

| Statistic name | Data type | Supported for | Description |
|---|---|---|---|
| NumberOfPages | Integer | All formats | The number of pages in the document (or a value of one if the format doesn't support multiple pages) |
| SamplesPerPixel | Integer | JPEG, TIFF, PNG | Number of components per pixel |
| PixelXDimension | Integer | JPEG, TIFF, PNG | Valid image width, in pixels |
| PixelYDimension | Integer | JPEG, TIFF, PNG | Valid image height, in pixels |
| ColorMode | Integer | JPEG, TIFF, PNG | The color mode used: 0 = Bitmap; 1 = Gray; 2 = Indexed color; 3 = RGB; 4 = CMYK; 7 = Multichannel; 8 = Duotone; 9 = Lab color |
| ColorSpace | Integer | JPEG, TIFF, PNG | Color space information: 1 = sRGB; 65535 = uncalibrated |
| ICCProfile | String | JPEG, TIFF, PNG | The name of the ICC color profile used, if any |
| Colorants | String list | TIFF | A list of the names of all colorants for Duotone or Multichannel images; the list is empty for all other color models |
| CellWidth | Integer | TIFF | The width of the dithering or halftoning matrix used to create a dithered or halftoned bilevel file |
| CellLength | Integer | TIFF | The length of the dithering or halftoning matrix used to create a dithered or halftoned bilevel file |
| TileWidth | Integer | TIFF | The width (number of columns) of each tile |
| TileLength | Integer | TIFF | The length (number of rows) of each tile |
| ColorIntent | Integer | PNG | The sRGB rendering intent: 0 = Perceptual; 1 = RelativeColorimetric; 2 = Saturation; 3 = AbsoluteColorimetric; 4 = invalid value |
| Version | String | PDF | The version of the file format (for example "1.6") |
| PageWidth | Rational | PDF | The width of the first page in the document, in points (derived from the media box) |
| PageHeight | Rational | PDF | The height of the first page in the document, in points (derived from the media box) |
| PageLabels | String list | PDF | A list of the page labels for all pages in the document, or null if none of the pages has a page label |
| Colorants | String list | PDF | A list of the names of all colorants as they appear in Separation and DeviceN color spaces and in page separation info dictionaries in the document |
| Fonts | String list | PDF | A list of the names of all fonts (without subsetting prefix) as they appear in font dictionaries in the document |
| SecurityMethod | String | PDF | The method used to protect the document; possible values: "None", "Password", "Certificate", "LiveCycle" |
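To illustrate how these statistics are typically read, the sketch below queries the pixel dimensions and color mode of an image and maps the ColorMode code to the name listed in its description. The helper name describeImage and the returned object are hypothetical; stats is assumed to be a FileStatistics instance, and the get functions are assumed to return null when a statistic is unavailable.

```javascript
// Names for the ColorMode codes listed in the ColorMode description.
var COLOR_MODE_NAMES = {
    0: "Bitmap", 1: "Gray", 2: "Indexed color", 3: "RGB", 4: "CMYK",
    7: "Multichannel", 8: "Duotone", 9: "Lab color"
};

// Hypothetical helper: summarizes an image file, or returns null if the
// pixel dimensions are unavailable (for example, for a non-image file).
// 'stats' is assumed to be a FileStatistics instance.
function describeImage(stats) {
    var width = stats.getNumber("PixelXDimension");
    var height = stats.getNumber("PixelYDimension");
    if (width === null || height === null) {
        return null;
    }
    var mode = stats.getNumber("ColorMode");
    var known = (mode !== null && COLOR_MODE_NAMES.hasOwnProperty(mode));
    return {
        width: width,
        height: height,
        colorMode: known ? COLOR_MODE_NAMES[mode] : "unknown"
    };
}
```

Checking for null before using a value avoids mistaking an unavailable statistic for a zero-sized image.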

PDF media boxes

The FileStatistics class offers the following additional statistics for the PDF file format:

| Statistic name | Data type | Description |
|---|---|---|
| PageBoxesEqual | Boolean | True if all pages have identical page boxes; false otherwise (verifies media, crop, bleed, trim and art box) |
| MediaBoxWidth | Rational | The width of the media box for the first page in the document, in points. This is a synonym for the 'PageWidth' statistic |
| MediaBoxHeight | Rational | The height of the media box for the first page in the document, in points. This is a synonym for the 'PageHeight' statistic |
| CropBoxWidth | Rational | The width of the crop box for the first page in the document, in points |
| CropBoxHeight | Rational | The height of the crop box for the first page in the document, in points |
| BleedBoxWidth | Rational | The width of the bleed box for the first page in the document, in points |
| BleedBoxHeight | Rational | The height of the bleed box for the first page in the document, in points |
| TrimBoxWidth | Rational | The width of the trim box for the first page in the document, in points |
| TrimBoxHeight | Rational | The height of the trim box for the first page in the document, in points |
| ArtBoxWidth | Rational | The width of the art box for the first page in the document, in points |
| ArtBoxHeight | Rational | The height of the art box for the first page in the document, in points |
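Because the box statistics are expressed in points, scripts often need to convert them to physical units. The sketch below converts the trim box of the first page to millimeters, using the fact that one PDF point is 1/72 inch and one inch is 25.4 mm. The helper names are hypothetical, stats is assumed to be a FileStatistics instance, and getNumber() is assumed to return null when the statistic is unavailable.

```javascript
// One PDF point is 1/72 inch, and one inch is 25.4 mm.
function pointsToMillimeters(points) {
    return points * 25.4 / 72;
}

// Hypothetical helper: trim box size of the first page in millimeters,
// or null if either statistic is unavailable.
// 'stats' is assumed to be a FileStatistics instance.
function trimSizeInMillimeters(stats) {
    var width = stats.getNumber("TrimBoxWidth");
    var height = stats.getNumber("TrimBoxHeight");
    if (width === null || height === null) {
        return null;
    }
    return {
        width: pointsToMillimeters(width),
        height: pointsToMillimeters(height)
    };
}
```

The results are floating point values, so compare them against a nominal size with a small tolerance rather than with strict equality.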

PDF/X version key

The FileStatistics class offers the following additional statistics for the PDF file format:

| Statistic name | Data type | Description |
|---|---|---|
| PDFXVersionKey | String | The contents of the PDF/X version key in the document, or the empty string if there is no such key; one of the following values may be returned: PDF/X-1a:2001, PDF/X-3:2002, PDF/X-1a:2003, PDF/X-3:2003, PDF/X-4, PDF/X-4p |

The PDF/X version key indicates a claim that the document conforms to the PDF/X specification, but it does not offer any guarantees.
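A script that routes files by claimed PDF/X flavor may want to separate the standard name from its year suffix. The sketch below does this for the values listed for PDFXVersionKey. The helper name parsePdfxVersionKey and the returned object are hypothetical; the input is assumed to come from getString("PDFXVersionKey").

```javascript
// Hypothetical helper: splits a PDF/X version key such as "PDF/X-1a:2001"
// into its standard name and year. Returns null for null or for the empty
// string, which the statistic uses when the document has no such key.
function parsePdfxVersionKey(key) {
    if (key === null || key === "") {
        return null;
    }
    var colon = key.indexOf(":");
    if (colon < 0) {
        // Values such as "PDF/X-4" carry no year suffix.
        return { standard: key, year: "" };
    }
    return {
        standard: key.substring(0, colon),
        year: key.substring(colon + 1)
    };
}
```

A non-null result only reflects what the document claims; it is not a conformance check.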

PDF page content

The FileStatistics class offers the following additional statistics for the PDF file format.

Obtaining the information for these statistics requires parsing the complete page contents, so it might be time-consuming.

| Statistic name | Data type | Description |
|---|---|---|
| ColorSpaceFamilies | String list | A list of the names of all color space families actually used in the document's page contents; the following names may be returned: DeviceGray, DeviceRGB, DeviceCMYK, CalGray, CalRGB, Lab, ICCBasedGray, ICCBasedRGB, ICCBasedCMYK, Indexed, Pattern, Separation, DeviceN. For Indexed or Pattern, the underlying color space families are also listed |
| TransparentColor | Boolean | True if the document's page contents use transparency for defining colors; false otherwise |
| FontTypes | String list | A list of the names of all font types actually used in the document's page contents; the following names may be returned: TrueType, Type1, Type3, MultipleMaster, Composite |
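As an example of using the page content statistics, the sketch below checks whether a PDF document uses any RGB-based color space family in its page contents. The helper name usesRgbColor is hypothetical, stats is assumed to be a FileStatistics instance, and getStringList() is assumed to return null when the statistic is unavailable.

```javascript
// Hypothetical helper: returns true if the page contents use an RGB-based
// color space family, false if they don't, and null if the statistic is
// unavailable (for example, for a non-PDF file).
// 'stats' is assumed to be a FileStatistics instance.
function usesRgbColor(stats) {
    var families = stats.getStringList("ColorSpaceFamilies");
    if (families === null) {
        return null;
    }
    for (var i = 0; i < families.length; i++) {
        var name = families[i];
        if (name === "DeviceRGB" || name === "CalRGB" || name === "ICCBasedRGB") {
            return true;
        }
    }
    return false;
}
```

Because this statistic requires parsing the complete page contents, a script may prefer to query it once and reuse the result.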