Text encoding

JavaScript strings in Switch store Unicode text in UTF-16 encoding. A database implementation may use a different text encoding, in which case the appropriate conversion must be performed when exchanging text. The ODBC interface unfortunately does not offer a mechanism to discover the text encoding used by the database, so this information must be provided by the user.

Most modern databases use Unicode-aware encodings such as UTF-16 or UTF-8. Older database implementations may use local 8-bit encodings that are not Unicode-aware. To complicate matters even more, some databases use a mixture of encodings (for example UTF-8 for storing text in data fields and UTF-16 for interpreting queries).
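To make the effect of a mismatched encoding concrete, the following minimal sketch serializes the same text with three encodings and then decodes UTF-8 bytes with the wrong codec. It is only an illustration: it assumes a Node.js environment and uses the Node.js Buffer class, which is not part of the Switch database module.

```javascript
// Stand-alone illustration for Node.js; this is NOT Switch API code.
const text = "caf\u00e9"; // "café", with a precomposed e-acute (U+00E9)

// The same four characters serialized with three different encodings.
const utf16 = Buffer.from(text, "utf16le"); // two bytes per character here
const utf8 = Buffer.from(text, "utf8");     // the e-acute takes two bytes
const latin1 = Buffer.from(text, "latin1"); // one byte per character

console.log(utf16.length, utf8.length, latin1.length); // 8 5 4

// Decoding UTF-8 bytes as latin-1 garbles every non-ASCII character,
// which is what happens when the wrong codec is specified.
const garbled = utf8.toString("latin1");
console.log(garbled === text); // false
```

ASCII characters survive such a mismatch between UTF-8 and latin-1, so the problem often goes unnoticed until the first accented character appears in the data.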

Codec arguments

Selected functions in the Switch database module offer extra "codec" arguments to support databases that use text encodings other than UTF-16.

The DataSource.connect() function allows specifying two codec arguments:

query-codec: affects text sent to the database (such as queries) and table and column names retrieved from it.
data-codec: affects STRING and BINARY data fields when retrieved as a string.

The codecs established with the connect() function are used for all Statement instances derived from the DataSource during the connection. However, the Statement.getString() function allows overriding the data-codec on a per-field basis.

Text items exchanged with the database

Function	Text item	Direction	Encoding
connect()	User name and password	To database	query-codec
execute()	SQL statement	To database	query-codec
tables()	Table names	From database	query-codec
columns()	Table name	To database	query-codec
columns()	Column names	From database	query-codec
getColumnName(), getColumnDataType(), getColumnSize()	Column name	From database	query-codec
getNumber(), getDate(), getString(), getBinary()	Column name	From database	query-codec
getString() for STRING and BINARY data types	Data field	From database	data-codec

Default behavior

The default for both query-codec and data-codec is UTF-16.

If the data-codec for the connection is UTF-16, the default codec for converting BINARY column data is latin-1 (each byte is copied to the low byte of a UTF-16 code unit and the high byte is cleared).

If the data-codec for the connection is an 8-bit encoding (i.e. not UTF-16), that encoding is also the default for converting BINARY column data.
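The latin-1 conversion of BINARY data can be sketched as follows. The helper name binaryToLatin1String is hypothetical and is not part of the Switch database module; it only illustrates the byte-to-code-unit mapping described above, and it is not a statement about how Switch implements the conversion internally.

```javascript
// Hypothetical helper (not a Switch API function): converts raw bytes
// to a JavaScript string the way a latin-1 codec would.
function binaryToLatin1String(bytes) {
  let result = "";
  for (const byte of bytes) {
    // The byte becomes the low byte of the UTF-16 code unit;
    // the high byte stays zero.
    result += String.fromCharCode(byte & 0xff);
  }
  return result;
}

// Example: the first four bytes of a PNG file.
const pngSignature = [0x89, 0x50, 0x4e, 0x47];
const asString = binaryToLatin1String(pngSignature);

// Because every byte maps to exactly one code unit, the original bytes
// can be recovered from the string without loss.
const roundTrip = Array.from(asString, (ch) => ch.charCodeAt(0) & 0xff);
```

This one-to-one mapping is why latin-1 is a safe default for binary data: unlike a multi-byte codec, it never rejects or merges byte sequences.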

The default behavior is summarized in the following table.

Item	Algorithm to determine codec
query-codec	If DataSource.connect() explicitly specifies a query-codec, use that codec. Otherwise, use UTF-16.
data-codec for STRING data in Statement.getString()	If Statement.getString() explicitly specifies a data-codec, use that codec. Otherwise, if DataSource.connect() explicitly specifies a data-codec, use that codec. Otherwise, use UTF-16.
data-codec for BINARY data in Statement.getString()	If Statement.getString() explicitly specifies a data-codec, use that codec (even if the specified codec is UTF-16). Otherwise, if DataSource.connect() explicitly specifies a data-codec other than UTF-16, use that codec. Otherwise, use latin-1.
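
The rules in the table can be restated as three small functions. This is a sketch of the decision logic only: the function names are hypothetical, an undefined argument stands for "not explicitly specified", and the codec name strings ("UTF-16", "latin-1") are assumed spellings rather than values confirmed to be accepted by the Switch database module.

```javascript
// Hypothetical restatement of the table above; not Switch API code.
const DEFAULT_CODEC = "UTF-16"; // assumed spelling of the codec name

// query-codec: the connect() value wins, otherwise UTF-16.
function resolveQueryCodec(connectQueryCodec) {
  return connectQueryCodec || DEFAULT_CODEC;
}

// STRING data: getString() override, then connect() value, then UTF-16.
function resolveStringDataCodec(getStringCodec, connectDataCodec) {
  return getStringCodec || connectDataCodec || DEFAULT_CODEC;
}

// BINARY data: an explicit getString() codec always wins, even UTF-16.
// A connect() codec is used only when it is not UTF-16.
// In every other case the fallback is latin-1.
function resolveBinaryDataCodec(getStringCodec, connectDataCodec) {
  if (getStringCodec) {
    return getStringCodec;
  }
  if (connectDataCodec && connectDataCodec !== "UTF-16") {
    return connectDataCodec;
  }
  return "latin-1";
}
```

Note the asymmetry for BINARY data: UTF-16 specified on the connection falls back to latin-1, while UTF-16 specified on the individual getString() call is honored.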