JavaScript strings in Switch store Unicode text in UTF-16 encoding. A database implementation may use a different text encoding, in which case the appropriate conversion must be performed when exchanging text. The ODBC interface unfortunately does not offer a mechanism to discover the text encoding used by the database, so this information must be provided by the user.
Most modern databases use Unicode-aware encodings such as UTF-16 or UTF-8. Older database implementations may use local 8-bit encodings that are not Unicode-aware. To complicate matters even more, some databases use a mixture of encodings (for example UTF-8 for storing text in data fields and UTF-16 for interpreting queries).
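To illustrate why the conversion matters, the following minimal sketch encodes the same character as UTF-16 and as UTF-8 and then decodes the UTF-8 bytes with the wrong codec. It uses plain standard JavaScript (including the global `TextEncoder`), not the Switch database module, and the helper names `toUtf16LeBytes`, `toUtf8Bytes` and `decodeLatin1` are hypothetical names chosen for this illustration.

```javascript
// Encode a JavaScript string as UTF-16 (little-endian) bytes: two bytes per code unit.
function toUtf16LeBytes(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes.push(unit & 0xff, unit >> 8);
  }
  return bytes;
}

// Encode the same string as UTF-8 bytes with the standard TextEncoder.
function toUtf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

// Decode bytes as latin-1: every byte becomes one character, regardless of
// which encoding actually produced the bytes.
function decodeLatin1(bytes) {
  return bytes.map((b) => String.fromCharCode(b)).join("");
}

const text = "\u00e9";               // "e" with acute accent, U+00E9
const utf16 = toUtf16LeBytes(text);  // bytes 0xE9 0x00
const utf8 = toUtf8Bytes(text);      // bytes 0xC3 0xA9
const garbled = decodeLatin1(utf8);  // two characters (U+00C3, U+00A9) instead of one
```

The same text yields different byte sequences in each encoding, so decoding with a codec other than the one the database used produces garbled characters rather than an error.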
Selected functions in the Switch database module offer extra "codec" arguments to support databases that use text encodings other than UTF-16.
The DataSource.connect() function allows specifying two codec arguments:
- query-codec: affects queries sent to and column names retrieved from the database.
- data-codec: affects STRING and BINARY data fields when retrieved as a string.
The codecs established with the connect() function are used for all Statement instances derived from the DataSource during the connection. However, the Statement.getString() function allows overriding the data-codec for an individual field.
Function | Text item | Direction | Encoding |
---|---|---|---|
connect() | User name and password | To database | query-codec |
execute() | SQL statement | To database | query-codec |
tables() | Table names | From database | query-codec |
columns() | Table name | To database | query-codec |
columns() | Column names | From database | query-codec |
getColumnName() getColumnDataType() getColumnSize() | Column name | From database | query-codec |
getNumber() getDate() getString() getBinary() | Column name | From database | query-codec |
getString() for STRING and BINARY data types | Data field | From database | data-codec |
The default for both the query-codec and the data-codec is UTF-16.
If the data-codec for the connection is UTF-16, the default codec for converting BINARY column data is latin-1 (each byte of the data is copied into the low byte of a UTF-16 code unit and the high byte is cleared).
If the data-codec for the connection is an 8-bit encoding (i.e. not UTF-16), that encoding is also the default for converting BINARY column data.
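The latin-1 conversion of BINARY data can be sketched in plain JavaScript as follows. This is an illustration of the byte-to-character mapping only, not the code Switch uses internally; the function names `binaryToLatin1String` and `latin1StringToBinary` are hypothetical.

```javascript
// Convert raw bytes to a JavaScript string using latin-1:
// each byte becomes one UTF-16 code unit whose high byte is zero.
function binaryToLatin1String(bytes) {
  let result = "";
  for (const b of bytes) {
    result += String.fromCharCode(b & 0xff);
  }
  return result;
}

// Reverse direction: keep only the low byte of each code unit.
function latin1StringToBinary(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    bytes.push(text.charCodeAt(i) & 0xff);
  }
  return bytes;
}

const raw = [0x00, 0x41, 0x89, 0xff];          // arbitrary binary data
const asString = binaryToLatin1String(raw);    // four characters, one per byte
const roundTrip = latin1StringToBinary(asString);
```

One property of this mapping is that it is reversible for any byte value, so the original bytes can be recovered from the resulting string.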
The default behavior is summarized in the following table.
Item | Algorithm to determine codec |
---|---|
query-codec | If DataSource.connect() explicitly specifies a query-codec, use that codec. Otherwise use UTF-16. |
data-codec for STRING data in Statement.getString() | If Statement.getString() explicitly specifies a data-codec, use that codec. Otherwise, if DataSource.connect() explicitly specifies a data-codec, use that codec. Otherwise use UTF-16. |
data-codec for BINARY data in Statement.getString() | If Statement.getString() explicitly specifies a data-codec, use that codec (even if the specified codec is UTF-16). Otherwise, if DataSource.connect() explicitly specifies a data-codec other than UTF-16, use that codec. Otherwise use latin-1. |
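The rules in the table above can be modeled with a small helper. This is a sketch for reasoning about the defaults, not part of the Switch API: the function `resolveCodec`, its parameters, and the codec name strings ("UTF-16", "UTF-8", "latin-1") are assumptions made for the illustration, and the values actually accepted by connect() and getString() may be spelled differently.

```javascript
// Hypothetical helper that models the default codec rules.
//   itemKind:     "query", "string" or "binary"
//   connectCodec: codec given to DataSource.connect() for this item, or undefined
//   fieldCodec:   codec given to Statement.getString(), or undefined
function resolveCodec(itemKind, connectCodec, fieldCodec) {
  if (itemKind === "query") {
    // Queries and column names: connection setting, else UTF-16.
    return connectCodec || "UTF-16";
  }
  if (fieldCodec) {
    // An explicit per-field codec always wins, even if it is UTF-16.
    return fieldCodec;
  }
  if (itemKind === "string") {
    // STRING data: connection setting, else UTF-16.
    return connectCodec || "UTF-16";
  }
  // BINARY data: a connection-level UTF-16 is ignored in favor of latin-1.
  if (connectCodec && connectCodec !== "UTF-16") {
    return connectCodec;
  }
  return "latin-1";
}

const stringDefault = resolveCodec("string");                       // "UTF-16"
const binaryDefault = resolveCodec("binary", "UTF-16");             // "latin-1"
const binaryExplicit = resolveCodec("binary", "UTF-16", "UTF-16");  // "UTF-16"
```

The last two calls show the one asymmetry in the rules: for BINARY data, UTF-16 takes effect only when it is requested on the individual getString() call, not when it is merely the connection's data-codec.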