Text encoding

JavaScript strings in Switch store Unicode text in UTF-16 encoding. A database implementation may use a different text encoding, in which case the appropriate conversion must be performed when exchanging text. The ODBC interface unfortunately does not offer a mechanism to discover the text encoding used by the database, so this information must be provided by the user.

Most modern databases use Unicode-aware encodings such as UTF-16 or UTF-8. Older database implementations may use local 8-bit encodings that are not Unicode-aware. To complicate matters even more, some databases use a mixture of encodings (for example UTF-8 for storing text in data fields and UTF-16 for interpreting queries).
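To illustrate why the conversion matters, the following sketch shows the same text as bytes in three encodings, and what happens when bytes are decoded with the wrong codec. It assumes a Node.js runtime with its global Buffer class and does not use the Switch database module; the variable names are illustrative only.

```javascript
// Assumption: Node.js, whose global Buffer can encode and decode several text encodings.
const text = "caf\u00e9"; // "café"; the last character is U+00E9

// The same text occupies different bytes in each encoding.
const asUtf8 = Buffer.from(text, "utf8");      // 63 61 66 c3 a9
const asUtf16 = Buffer.from(text, "utf16le");  // 63 00 61 00 66 00 e9 00
const asLatin1 = Buffer.from(text, "latin1");  // 63 61 66 e9

// Decoding with the matching codec restores the original text.
const roundTrip = asUtf8.toString("utf8");

// Decoding UTF-8 bytes as latin-1 raises no error but yields the wrong characters.
const garbled = asUtf8.toString("latin1");

console.log(asUtf8.length, asUtf16.length, asLatin1.length); // 5 8 4
console.log(roundTrip === text); // true
console.log(garbled === text);   // false
```

A mismatch of this kind produces no error, only wrong characters, which is why the encoding has to be supplied by the user.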

Codec arguments

Selected functions in the Switch database module offer extra "codec" arguments to support databases that use text encodings other than UTF-16.

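The DataSource.connect() function allows specifying two codec arguments: a query-codec, used for queries and for the names exchanged with the database, and a data-codec, used for the contents of data fields.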

The codecs established with the connect() function are used for all Statement instances derived from the DataSource during the connection. However, the Statement.getString() function allows overriding the data-codec for individual fields.

Text items exchanged with the database

| Function | Text item | Direction | Encoding |
|---|---|---|---|
| connect() | User name and password | To database | query-codec |
| execute() | SQL statement | To database | query-codec |
| tables() | Table names | From database | query-codec |
| columns() | Table name | To database | query-codec |
| columns() | Column names | From database | query-codec |
| getColumnName(), getColumnDataType(), getColumnSize() | Column name | From database | query-codec |
| getNumber(), getDate(), getString(), getBinary() | Column name | From database | query-codec |
| getString() for STRING and BINARY data types | Data field | From database | data-codec |

Default behavior

Both the query-codec and the data-codec default to UTF-16.

If the data-codec for the connection is UTF-16, the default codec for converting BINARY column data is latin-1: each byte of the data is copied into the low byte of one character of the resulting string, and the high byte of that character is cleared.

If the data-codec for the connection is an 8-bit encoding (i.e. not UTF-16), that encoding is also the default for converting BINARY column data.
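The latin-1 conversion described above can be sketched in plain JavaScript. The function below is a hypothetical helper, not part of the Switch database module. It only illustrates the byte-to-character mapping, assuming the binary data is available as an array of byte values.

```javascript
// Hypothetical helper (not a Switch API): converts raw bytes to a JavaScript
// string the way a latin-1 decoder does, one character per byte.
function binaryToLatin1String(bytes) {
  let result = "";
  for (const byte of bytes) {
    // Keep only the low 8 bits, so the high byte of each UTF-16 code unit is zero.
    result += String.fromCharCode(byte & 0xff);
  }
  return result;
}

const sample = [0x48, 0x69, 0xe9, 0xff];
const converted = binaryToLatin1String(sample);

console.log(converted.length);        // 4: one character per input byte
console.log(converted.charCodeAt(3)); // 255: the byte value survives unchanged
```

Because every byte value from 0 to 255 maps to exactly one character, the mapping is lossless, and the original bytes can be recovered from the string with charCodeAt().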

The default behavior is summarized in the following table.

| Item | Algorithm to determine codec |
|---|---|
| query-codec | If DataSource.connect() explicitly specifies a query-codec, use that codec. Otherwise, use UTF-16. |
| data-codec for STRING data in Statement.getString() | If Statement.getString() explicitly specifies a data-codec, use that codec. Otherwise, if DataSource.connect() explicitly specifies a data-codec, use that codec. Otherwise, use UTF-16. |
| data-codec for BINARY data in Statement.getString() | If Statement.getString() explicitly specifies a data-codec, use that codec (even if the specified codec is UTF-16). Otherwise, if DataSource.connect() explicitly specifies a data-codec other than UTF-16, use that codec. Otherwise, use latin-1. |
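The selection rules in the table can be written as three small functions. This is a sketch for illustration only: the function names are hypothetical, an argument of undefined stands for "not explicitly specified", and the strings "UTF-16" and "latin-1" are placeholder labels, not necessarily the codec names that Switch accepts.

```javascript
// Hypothetical helpers (not Switch APIs). Pass undefined for a codec that
// was not explicitly specified in the corresponding call.

function resolveQueryCodec(connectQueryCodec) {
  return connectQueryCodec !== undefined ? connectQueryCodec : "UTF-16";
}

function resolveStringDataCodec(getStringCodec, connectDataCodec) {
  if (getStringCodec !== undefined) return getStringCodec;     // per-field override
  if (connectDataCodec !== undefined) return connectDataCodec; // connection setting
  return "UTF-16";
}

function resolveBinaryDataCodec(getStringCodec, connectDataCodec) {
  // An explicit per-field codec always wins, even if it is UTF-16.
  if (getStringCodec !== undefined) return getStringCodec;
  // A connection-level UTF-16 setting does not apply to BINARY data.
  if (connectDataCodec !== undefined && connectDataCodec !== "UTF-16") {
    return connectDataCodec;
  }
  return "latin-1";
}

console.log(resolveStringDataCodec(undefined, "UTF-8"));  // UTF-8
console.log(resolveBinaryDataCodec(undefined, "UTF-16")); // latin-1
console.log(resolveBinaryDataCodec("UTF-16", "UTF-8"));   // UTF-16
```

The only asymmetry between STRING and BINARY data is the connection-level UTF-16 case: STRING data is decoded as UTF-16, while BINARY data falls back to latin-1.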