Email message schema

The mail receive tool automatically picks up the contents of incoming email messages as metadata. Specifically, for each incoming message the mail receive tool creates a metadata dataset that contains the relevant components of the message, such as the sender, to, cc, and reply-to addresses and the complete email body text. This dataset is then associated, under the standard dataset name "Email", with the job ticket of every file delivered by the message.

Data model and schema

The email message dataset uses the XML data model with a simple schema without namespaces.

The XML document element name is "email". It contains an element for each message component as described in the table below; each element contains the text of the corresponding message component. If a message component is not present or contains only white space, the corresponding element is omitted from the dataset.

Leading and trailing white space is removed (except for the body text, where it may be significant). Otherwise, no parsing or reformatting is performed. For example, a list of email addresses (including its separators) is stored exactly as it was provided in the incoming message.
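
The rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the mail receive tool's actual implementation: the function name build_email_dataset, the input dictionary, and the order in which elements are written are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Element names as listed in the table of message components.
# The output order is an assumption; the schema description does not state one.
ELEMENT_NAMES = ["message-id", "subject", "date", "from", "sender",
                 "reply-to", "to", "cc", "body"]


def build_email_dataset(components):
    """Build an "email" XML document from a dict of message components.

    Hypothetical helper: 'components' maps element names to raw header
    or body text as it appeared in the incoming message.
    """
    root = ET.Element("email")  # document element, no namespace
    for name in ELEMENT_NAMES:
        value = components.get(name)
        # A component that is absent or contains only white space
        # produces no element at all.
        if value is None or not value.strip():
            continue
        # Trim leading and trailing white space, except for the body text.
        if name != "body":
            value = value.strip()
        # No further parsing: address lists keep their separators as-is.
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_email_dataset({
        "subject": "  Proofs for approval  ",
        "to": "prepress@example.com; bob@example.com",
        "cc": "   ",
        "body": "Please find the proofs attached.\n",
    }))
```

In this sketch the white-space-only cc value produces no cc element, the subject is trimmed, and the to list keeps its semicolon separator exactly as supplied.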

Element name	Message component
message-id	An identifier for the message (assigned by the host which generated the message)
subject	The subject line of the message
date	The date and time when the message was sent (formatted as it appeared in the message header)
from	The email address(es) of the author(s) of the message
sender	The single email address of the sender of the message; if there is a single author, the sender is identical to the author; otherwise the sender should be one of the authors
reply-to	The email addresses to which a response to this message should be sent
to	The email addresses of the primary recipients of this message
cc	The email addresses of the secondary recipients of this message
body	A plain-text rendition (without markup) of the body text of this message
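
As an illustration of how such a dataset might be consumed, the following Python sketch parses a sample document and reads a few elements. The sample values, the indentation of the sample, and the function name read_email_dataset are hypothetical; only the element names and the "email" document element come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample dataset; a real one is produced by the mail receive tool.
SAMPLE = """<email>
  <message-id>&lt;1234@mail.example.com&gt;</message-id>
  <subject>Proofs for approval</subject>
  <from>alice@example.com</from>
  <to>prepress@example.com, bob@example.com</to>
  <body>Please find the proofs attached.</body>
</email>"""


def read_email_dataset(xml_text):
    """Return a dict mapping element names to their text content."""
    root = ET.fromstring(xml_text)
    if root.tag != "email":
        raise ValueError("unexpected document element: " + root.tag)
    # Components that were absent from the message have no element,
    # so they simply do not appear in the resulting dict.
    return {child.tag: child.text or "" for child in root}


if __name__ == "__main__":
    fields = read_email_dataset(SAMPLE)
    print(fields["subject"])
    # Address lists are stored unparsed; splitting them is up to the consumer.
    print(fields.get("to", ""))
    # An omitted component must be handled explicitly by the consumer.
    print(fields.get("cc"))
```

Because omitted components have no element, a consumer should look values up with a default (as with fields.get above) rather than assume every element in the table is present.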