Text encoding issues

On modern operating systems – including Windows XP and Mac OS X – file and folder names may contain any Unicode code point (except for a few platform-specific separator characters). This is true even if the default 8-bit code page is not able to represent these characters. For example, on a regular English Windows XP system it is perfectly possible to have Greek or Cyrillic characters in a filename, although these characters cannot be represented in the default Latin code page.

Switch is fully Unicode-enabled, and for optimal operation it requires any configured third-party application to be Unicode-enabled as well. In other words, Switch requires that the third-party application:


Command line on Mac OS X

Mac OS X uses UTF-8 as its default encoding for representing file names and paths. In our experience, a command line application can simply take a file path from the command line as an 8-bit string and pass it through unchanged to a regular UNIX or Mac OS file system call, even if the command line application itself is not Unicode-aware. Since the functions of the Switch Process class use UTF-8 to invoke command line applications, such applications handle Unicode paths correctly without any extra work.
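The pass-through behavior described above can be illustrated with a minimal sketch. It is not part of Switch; the function name roundtrip_utf8_path and the sample file name are hypothetical, and the sketch assumes a file system that stores the name bytes as given (as is typical on UNIX-like systems). The path is handled purely as a byte string, the way a non-Unicode-aware tool would see it.

```python
import os
import tempfile


def roundtrip_utf8_path(name: str = "αβγ-тест.txt") -> bool:
    """Create and list a file using only the raw UTF-8 bytes of its path."""
    raw_name = name.encode("utf-8")
    with tempfile.TemporaryDirectory() as tmp:
        # Build the path as an opaque 8-bit string; no decoding takes place here.
        raw_path = os.path.join(os.fsencode(tmp), raw_name)
        with open(raw_path, "wb") as handle:
            handle.write(b"ok")
        # Listing with a bytes argument returns the names as bytes again.
        return raw_name in os.listdir(os.fsencode(tmp))


if __name__ == "__main__":
    print(roundtrip_utf8_path())
```

The tool never needs to know which characters the bytes stand for; it only has to leave them untouched.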

Command line on Windows

On Windows, the functions of the Switch Process class use Windows-specific Unicode-enabled function calls to invoke command line applications. The command line application in turn MUST use the Windows-specific Unicode-enabled function calls both for retrieving the command line AND for opening the files. For example, in C/C++ the command line application must use the Unicode ("wide") version of the Windows-specific "GetCommandLine" function (GetCommandLineW) rather than the argv[] argument of the main function, which supports only the current default code page.
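To show why the code-page-limited argv[] is not sufficient, the following sketch approximates the loss in Python. It is an illustration only: the function names are hypothetical, "cp1252" is assumed as the default Latin code page, and Python's "replace" error handler stands in for the conversion Windows performs, which may differ in detail.

```python
def narrow_argv_view(path: str, code_page: str = "cp1252") -> str:
    """Approximate what a code-page-limited argv[] entry would contain."""
    # Characters missing from the code page are replaced by '?', so the
    # original file name can no longer be recovered from the result.
    return path.encode(code_page, errors="replace").decode(code_page)


def survives_code_page(path: str, code_page: str = "cp1252") -> bool:
    """Return True if the path passes through the code page unchanged."""
    return narrow_argv_view(path, code_page) == path


if __name__ == "__main__":
    print(narrow_argv_view("αβγ.txt"))       # prints ???.txt
    print(survives_code_page("report.txt"))  # prints True
```

A path that does not survive the conversion no longer names the file Switch intended, which is why the wide function calls are required.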

Other text input/output

This includes the console input/output streams and any exchanged control files that may contain plain text (as opposed to XML which has built-in Unicode support).

It is strongly recommended to use UTF-8 for all text input/output because this encoding can represent any Unicode code point and it is upward compatible with 7-bit ASCII in various ways (for example, with regard to line breaks and null terminators).
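The ASCII compatibility mentioned above can be checked with a short sketch. The function name has_stray_control_bytes is hypothetical, and UTF-16 is used here only as a contrasting example of an encoding without this property.

```python
def has_stray_control_bytes(text: str, encoding: str) -> bool:
    """Report whether encoding `text` yields LF or NUL bytes that the
    text itself does not contain."""
    data = text.encode(encoding)
    extra_line_feeds = data.count(b"\n") != text.count("\n")
    extra_nulls = data.count(b"\0") != text.count("\0")
    return extra_line_feeds or extra_nulls


if __name__ == "__main__":
    sample = "Ċafé"  # contains no line break and no null character
    print(has_stray_control_bytes(sample, "utf-8"))      # prints False
    print(has_stray_control_bytes(sample, "utf-16-le"))  # prints True
```

In UTF-8 every byte of a multi-byte sequence has its high bit set, so code that splits on line breaks or stops at a null terminator byte by byte keeps working on UTF-8 text.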

If this is not feasible, at the very least the encoding used must be well-defined and documented, and it should follow these guidelines: