Converting text encodings using the OdString and OdAnsiString classes

Sergey Zaitsev

February 23, 2017

Sometimes developers need to convert text from a multi-byte encoding to Unicode or UTF-8 and back. The OdString and OdAnsiString classes can be used for this. Starting with Teigha version 4.3.0, CP_UTF_8 is included in the list of supported code pages.

For example, the following array of characters is the UTF-8 encoding of "Test string" in Russian (Тестовая Строка):

char testStr[] = { '\xD0', '\xA2', '\xD0', '\xB5', '\xD1', '\x81', '\xD1', '\x82', '\xD0', '\xBE', '\xD0', '\xB2', '\xD0', '\xB0', '\xD1', '\x8F', '\x20', '\xD0', '\xA1', '\xD1', '\x82', '\xD1', '\x80', '\xD0', '\xBE', '\xD0', '\xBA', '\xD0', '\xB0', 0 };
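The bytes in testStr can be checked without Teigha. Below is a minimal standalone UTF-8 decoder in C++; decodeUtf8 is a hypothetical helper, not part of the Teigha API. It turns a NUL-terminated byte string into Unicode code points and rejects malformed lead or continuation bytes, but for brevity it does not reject overlong encodings or surrogate code points.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not Teigha code): decode a NUL-terminated UTF-8
// byte string into Unicode code points.
std::vector<char32_t> decodeUtf8(const char* s)
{
  std::vector<char32_t> out;
  const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
  while (*p)
  {
    unsigned char b = *p++;
    char32_t cp;
    int extra; // number of continuation bytes that follow the lead byte
    if (b < 0x80)                { cp = b;        extra = 0; }
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
    else throw std::invalid_argument("invalid UTF-8 lead byte");

    for (int i = 0; i < extra; ++i)
    {
      unsigned char c = *p;
      // Continuation bytes look like 10xxxxxx; the terminating 0 fails this test.
      if ((c & 0xC0) != 0x80)
        throw std::invalid_argument("truncated UTF-8 sequence");
      cp = (cp << 6) | (c & 0x3F);
      ++p;
    }
    out.push_back(cp);
  }
  return out;
}
```

Each Cyrillic letter in testStr is a two-byte sequence starting with 0xD0 or 0xD1, so the 29 bytes before the terminator decode to 15 code points.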

Here is an OdAnsiString initialized with this UTF-8 text:

OdAnsiString ansiStr(testStr, CP_UTF_8);

Here is an OdString constructed from it; at this point only its ASCII buffer is filled:

OdString unicodeStr(ansiStr);

Calling c_str() converts the ASCII buffer to Unicode, and pStr points to the resulting Unicode buffer:

const OdChar *pStr = unicodeStr.c_str();

unicodeStr now has both buffers (ASCII and Unicode) synchronized. To get an OdAnsiString from unicodeStr, the ASCII buffer is used without any transformation as long as the content stays unchanged.
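The buffer synchronization described above can be pictured with a small sketch. LazyDualString is a hypothetical class, not OdString's actual implementation: it keeps a narrow and a wide buffer, converts only when a missing buffer is first requested, reuses a buffer while the content is unchanged, and invalidates the narrow buffer on modification. To stay short, it assumes the narrow buffer is ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point; a real code page such as ANSI_1251 or UTF-8 needs a proper conversion.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of lazily synchronized narrow/wide buffers.
// Assumption: the narrow buffer is Latin-1 (byte value == code point).
class LazyDualString
{
public:
  explicit LazyDualString(const std::string& narrow)
    : m_narrow(narrow), m_narrowValid(true), m_wideValid(false) {}

  // Comparable to c_str() in the text: the wide buffer is built on first request only.
  const std::u32string& wide()
  {
    if (!m_wideValid)
    {
      m_wide.clear();
      for (unsigned char c : m_narrow)
        m_wide.push_back(c);
      m_wideValid = true;
      ++m_conversions;
    }
    return m_wide;
  }

  // The narrow buffer is returned as-is while both buffers are in sync.
  const std::string& narrow()
  {
    if (!m_narrowValid)
    {
      m_narrow.clear();
      for (char32_t cp : m_wide)
        m_narrow.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
      m_narrowValid = true;
      ++m_conversions;
    }
    return m_narrow;
  }

  // A modification goes through the wide buffer and invalidates the narrow one.
  void append(char32_t cp)
  {
    wide(); // make sure the wide buffer is current before changing it
    m_wide.push_back(cp);
    m_narrowValid = false;
  }

  int conversions() const { return m_conversions; }

private:
  std::string m_narrow;
  std::u32string m_wide;
  bool m_narrowValid;
  bool m_wideValid;
  int m_conversions = 0;
};
```

Requesting wide() once and then narrow() costs a single conversion; after append(), the next narrow() call has to rebuild the narrow buffer.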

Now we initialize another OdString from the Unicode buffer:

OdString unicodeStr1(pStr);

unicodeStr1 has only its Unicode buffer initialized.

Here we get UTF-8 text in sAnsiUtf8:

OdAnsiString sAnsiUtf8(unicodeStr1, CP_UTF_8);
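Requesting CP_UTF_8 here means each Unicode code point is written in the standard UTF-8 form. A minimal standalone sketch of that transformation follows; encodeUtf8 is a hypothetical helper, not Teigha code. It writes 1 to 4 bytes per code point depending on its range, so Cyrillic letters such as U+0422 take two bytes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Teigha code): encode Unicode code points as UTF-8.
std::string encodeUtf8(const std::u32string& text)
{
  std::string out;
  for (char32_t cp : text)
  {
    // Only Unicode scalar values are valid input (no surrogates, max U+10FFFF).
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80)            // 1 byte: 0xxxxxxx
      out.push_back(static_cast<char>(cp));
    else if (cp < 0x800)      // 2 bytes: 110xxxxx 10xxxxxx
    {
      out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else if (cp < 0x10000)    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    {
      out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else                      // 4 bytes: 11110xxx followed by three 10xxxxxx
    {
      out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
  }
  return out;
}
```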

Here we get ANSI_1251 text in sAnsi1251:

OdAnsiString sAnsi1251(unicodeStr1, CP_ANSI_1251);
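Windows-1251 is a single-byte code page, so each Cyrillic letter takes one byte instead of the two it needs in UTF-8. The sketch below uses a hypothetical helper, encodeCp1251, which is not Teigha code. It maps only ASCII, the basic Cyrillic letters U+0410..U+044F (bytes 0xC0..0xFF) and Ё/ё (0xA8/0xB8); the rest of the 1251 table is omitted for brevity, and unmapped characters raise an exception instead of being replaced.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper (not Teigha code): encode a subset of Unicode to Windows-1251.
std::string encodeCp1251(const std::u32string& text)
{
  std::string out;
  for (char32_t cp : text)
  {
    unsigned char b;
    if (cp < 0x80)                          // ASCII is identical in 1251
      b = static_cast<unsigned char>(cp);
    else if (cp >= 0x0410 && cp <= 0x044F)  // А..я map to 0xC0..0xFF in order
      b = static_cast<unsigned char>(0xC0 + (cp - 0x0410));
    else if (cp == 0x0401)                  // Ё
      b = 0xA8;
    else if (cp == 0x0451)                  // ё
      b = 0xB8;
    else
      throw std::invalid_argument("character not covered by this sketch");
    out.push_back(static_cast<char>(b));
  }
  return out;
}
```

For example, the four letters of "Тест" become the bytes 0xD2 0xE5 0xF1 0xF2.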

Here we use the ASCII text in code page ANSI_1251 to initialize an OdString:

OdString sStr1251(sAnsi1251);

And here we get the Unicode string as a result of conversion during synchronization of buffers:

const OdChar *pStr1251 = sStr1251.c_str();
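The reverse step, from Windows-1251 bytes back to Unicode, is the kind of conversion that takes place when the Unicode buffer is synchronized. Here is a minimal standalone sketch; decodeCp1251 is a hypothetical helper, not Teigha code, and it covers only a subset of the code page: ASCII, 0xC0..0xFF for А..я, and 0xA8/0xB8 for Ё/ё.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper (not Teigha code): decode a subset of Windows-1251 to Unicode.
std::u32string decodeCp1251(const std::string& bytes)
{
  std::u32string out;
  for (char ch : bytes)
  {
    unsigned char b = static_cast<unsigned char>(ch);
    if (b < 0x80)                 // ASCII is identical in 1251
      out.push_back(b);
    else if (b >= 0xC0)           // 0xC0..0xFF map to А..я (U+0410..U+044F)
      out.push_back(0x0410 + (b - 0xC0));
    else if (b == 0xA8)           // Ё
      out.push_back(0x0401);
    else if (b == 0xB8)           // ё
      out.push_back(0x0451);
    else
      throw std::invalid_argument("byte not covered by this sketch");
  }
  return out;
}
```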