Email message schema

The mail receive tool automatically captures the contents of incoming email messages as metadata. For each incoming message, the tool creates a metadata dataset that contains the relevant components of the message, such as the sender, to, cc, and reply-to addresses and the complete email body text. This dataset is then associated, under the standard dataset name "Email", with the job ticket of every file delivered by the message.

Data model and schema

The email message dataset uses the XML data model with a simple schema without namespaces.

The XML document element name is "email". It contains an element for each message component as described in the table below; each element contains the text of the corresponding message component. If a message component is not present or contains only white space, the corresponding element is omitted.

Leading and trailing white space is removed (except for the body text, where it may be significant). Otherwise, no parsing or reformatting is performed. For example, a list of email addresses (including its separators) is stored exactly as it was provided in the incoming message.
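
Because address lists are stored verbatim, a consumer that needs individual addresses has to split them itself. The following is a minimal sketch in Python, assuming the dataset is available as an XML string. The sample document and its values are invented for illustration, and the serialization actually produced by the tool may differ in details such as element order. The helper name component is hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import getaddresses

# Hypothetical dataset content: element names follow the schema described
# here, the values are made up for illustration.
SAMPLE = """<email>
  <subject>Artwork for approval</subject>
  <from>Ann Author &lt;ann@example.com&gt;</from>
  <to>Bob &lt;bob@example.com&gt;, carol@example.com</to>
  <body>
Please review the attached files.
</body>
</email>"""

root = ET.fromstring(SAMPLE)


def component(name):
    """Return the text of a message component, or None if its element is absent."""
    return root.findtext(name)


# The "to" element holds the address list exactly as it appeared in the
# message, separators included, so the consumer splits it here.
to_text = component("to")
recipients = []
if to_text is not None:
    recipients = [addr for _name, addr in getaddresses([to_text]) if addr]

# The sample message had no cc recipients, so there is no <cc> element at all.
has_cc = component("cc") is not None
```

In this sketch a missing element is reported as None rather than as an empty string, which mirrors the rule that absent or blank components have no element.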

Element name    Message component
------------    -----------------
message-id      An identifier for the message (assigned by the host which generated the message)
subject         The subject line of the message
date            The date and time when the message was sent (formatted as it appeared in the message header)
from            The email address(es) of the author(s) of the message
sender          The single email address of the sender of the message; if there is a single author the sender is identical to the author, otherwise the sender should be one of the authors
reply-to        The email addresses to which a response to this message should be sent
to              The email addresses of the primary recipients of this message
cc              The email addresses of the secondary recipients of this message
body            A plain-text rendition (without markup) of the body text of this message
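
The element list above can be mapped onto a small record type. The sketch below is one possible way to do that in Python, not part of the tool itself: the names EmailDataset, ELEMENTS, and load_email_dataset are hypothetical, and the sample input is invented. Every field is optional because any element may be absent. The date is kept as text, since it is stored as it appeared in the header, and is parsed separately on the assumption that it uses the usual email date format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from email.utils import parsedate_to_datetime
from typing import Optional


@dataclass
class EmailDataset:
    """One field per element of the schema; None means the element is absent."""
    message_id: Optional[str] = None
    subject: Optional[str] = None
    date: Optional[str] = None
    from_: Optional[str] = None  # "from" is a reserved word in Python
    sender: Optional[str] = None
    reply_to: Optional[str] = None
    to: Optional[str] = None
    cc: Optional[str] = None
    body: Optional[str] = None


# XML element name -> attribute name on EmailDataset.
ELEMENTS = {
    "message-id": "message_id",
    "subject": "subject",
    "date": "date",
    "from": "from_",
    "sender": "sender",
    "reply-to": "reply_to",
    "to": "to",
    "cc": "cc",
    "body": "body",
}


def load_email_dataset(xml_text: str) -> EmailDataset:
    """Parse an email dataset document into an EmailDataset record."""
    root = ET.fromstring(xml_text)
    if root.tag != "email":
        raise ValueError(f"unexpected document element: {root.tag!r}")
    values = {attr: root.findtext(name) for name, attr in ELEMENTS.items()}
    return EmailDataset(**values)


# Invented sample input, for illustration only.
example = load_email_dataset(
    "<email>"
    "<message-id>&lt;1234@mail.example.com&gt;</message-id>"
    "<date>Tue, 05 Mar 2024 14:30:00 +0100</date>"
    "<from>ann@example.com</from>"
    "<body>Hello</body>"
    "</email>"
)

# The date element is unparsed header text; convert it only when present.
sent = parsedate_to_datetime(example.date) if example.date else None
```

Keeping the raw strings in the record and converting on demand avoids losing information when a header value does not follow the expected format.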