Programming for ANSI Multibyte Character Sets in LabWindows/CVI

A traditional character in the C programming language consists of a single byte, which you can set to a particular value from the universal ASCII code. A multibyte character, on the other hand, is a character that can be composed of one or two bytes. A multibyte character set consists of all the multibyte characters required to represent a single language, such as Japanese.

If you work with multibyte character sets, be familiar with the following terminology:

  • Single-byte character—A multibyte character composed of only one byte.
  • Lead byte—The first byte of a dual-byte character.
  • Trail byte—The second byte of a dual-byte character.
  • Codepage—The numeric value that identifies a particular character set.
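The terms above can be made concrete with a small sketch. It assumes code page 932 (Shift-JIS), where lead bytes fall in the ranges 0x81-0x9F and 0xE0-0xFC; other code pages use different ranges. The helper names is_lead_byte_cp932 and char_size_cp932 are hypothetical and are not part of the LabWindows/CVI libraries.

```c
#include <stddef.h>

/* Assumption: code page 932 (Shift-JIS). A byte in 0x81-0x9F or
   0xE0-0xFC is a lead byte; other code pages use other ranges. */
static int is_lead_byte_cp932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Returns the size in bytes (1 or 2) of the character that starts at p.
   p must point at the start of a character, never at a trail byte.
   A lead byte directly followed by the terminator is treated as a
   single byte so the caller never steps past the end of the string. */
static size_t char_size_cp932(const char *p)
{
    if (is_lead_byte_cp932((unsigned char)p[0]) && p[1] != '\0')
        return 2;
    return 1;
}
```

In this sketch, a single-byte character such as 'A' reports a size of 1, while the byte pair 0x83 0x5C reports a size of 2 even though its trail byte has the same value as an ASCII backslash.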

String Handling

The primary rule in manipulating strings that might contain multibyte characters is to always treat the lead byte and the trail byte of a dual-byte character as a single unit. Unfortunately, this affects every instance in your program where characters or strings are handled.
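As an illustration of why the two bytes must be treated as a unit, the following sketch searches a path for its last backslash. It assumes code page 932 (Shift-JIS), in which 0x5C, the ASCII backslash, is also a valid trail byte. The function names are hypothetical; in a LabWindows/CVI program you would normally rely on the multibyte functions and macros the product provides.

```c
#include <stddef.h>

/* Assumption: code page 932 (Shift-JIS) lead-byte ranges. */
static int is_lead_byte_cp932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Returns a pointer to the last backslash that is a character in its
   own right, or NULL if there is none. Trail bytes that happen to
   equal 0x5C are skipped together with their lead byte. */
static const char *find_last_backslash(const char *s)
{
    const char *last = NULL;

    while (*s != '\0') {
        if (is_lead_byte_cp932((unsigned char)*s) && s[1] != '\0') {
            s += 2;               /* lead byte and trail byte move as one unit */
        } else {
            if (*s == '\\')
                last = s;         /* a genuine single-byte backslash */
            s += 1;
        }
    }
    return last;
}
```

For the byte sequence 'C', ':', '\\', 0x83, 0x5C, '.', 't', 'x', 't', a byte-wise strrchr search for '\\' stops at the trail byte at offset 4, whereas this sketch returns the real separator at offset 2.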

Keep in mind the difference between the length of a string measured in bytes and its length measured in characters. In many instances, such as when you allocate a buffer to store a string, use the number of bytes, because any character in the string might occupy two bytes. In this case, continue to use the ANSI C Library function strlen, which returns the number of bytes in a string. In other cases, however, you must replace all ANSI string handling functions with the Multibyte Character functions listed in the ANSI C Library Function Tree topic or the macros described in the Multibyte Macros and Functions in toolbox.h topic.
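The difference between the two lengths can be sketched as follows. The example assumes code page 932 (Shift-JIS); char_count_cp932 and duplicate_string are hypothetical helpers written for this illustration, not LabWindows/CVI functions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: code page 932 (Shift-JIS) lead-byte ranges. */
static int is_lead_byte_cp932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Length in characters: a lead byte and its trail byte count once. */
static size_t char_count_cp932(const char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        s += (is_lead_byte_cp932((unsigned char)*s) && s[1] != '\0') ? 2 : 1;
        count++;
    }
    return count;
}

/* Buffer sizing uses the length in bytes, so strlen is still correct.
   The caller owns the returned buffer and must free it. */
static char *duplicate_string(const char *s)
{
    size_t bytes = strlen(s) + 1;      /* every byte plus the terminator */
    char *copy = malloc(bytes);

    if (copy != NULL)
        memcpy(copy, s, bytes);
    return copy;
}
```

For a string made of two dual-byte characters followed by ".c", strlen reports 6 bytes while char_count_cp932 reports 4 characters. The copy still needs all 6 bytes plus the terminator.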

Note  Refer to the EVENT_KEYPRESS topic for information about processing keypress events that result from multibyte character input.

Write all of your string processing code in a multibyte-aware manner. For example, pointers should usually indicate the start of a character, and indices should always reference the start of a character. Use the CmbStrInc or CmbStrDec macros in toolbox.h instead of the ++ and -- operators to modify the value of pointers into your strings. Process strings sequentially, from left to right, rather than randomly. Accessing random characters in a multibyte string is computationally expensive and can be error prone.
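To show why moving backward or jumping to an arbitrary position is costly, the following sketch steps forward one character at a time and finds the previous character by rescanning from the start of the string. It assumes code page 932 (Shift-JIS). The names next_char and prev_char are hypothetical; they are similar in spirit to the toolbox.h macros but are not their implementation.

```c
#include <stddef.h>

/* Assumption: code page 932 (Shift-JIS) lead-byte ranges. */
static int is_lead_byte_cp932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Advances to the start of the next character.
   p must point at the start of a character and not at the terminator. */
static const char *next_char(const char *p)
{
    if (is_lead_byte_cp932((unsigned char)*p) && p[1] != '\0')
        return p + 2;
    return p + 1;
}

/* Returns the start of the character that precedes pos, or NULL when
   pos is at the start of the string. A byte cannot be classified
   reliably by looking backward, so the scan restarts from the
   beginning, which makes each backward step cost time proportional
   to the distance from the start. */
static const char *prev_char(const char *start, const char *pos)
{
    const char *p = start;
    const char *prev = NULL;

    while (p < pos && *p != '\0') {
        prev = p;
        p = next_char(p);
    }
    return prev;
}
```

With the bytes 'a', 0x83, 0x5C, 'b', the character before offset 3 starts at offset 1, not at offset 2, which is what a plain pointer decrement would give.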

The following is a code example that performs a text search beginning at the end of a string. Before multibyte changes, your code might look like the following example:

char * CVIFUNC FindFileExtension(const char *pathString)
{
       int index, count=0;
       char *fileName;
       char *terminatorPtr;
       AssertMsg(pathString, "Null pathString parameter passed to FindFileExtension");
       fileName = FindFileName(pathString);
       if ((index = strlen (fileName)) == 0)
              return fileName;
       terminatorPtr = fileName + index;
       /* do not bother checking when index is 1 because if the dot is in position 0, it really is not an extension */
       for (; index > 1; index--)
       {
              if (fileName[index-1] == '.')
                     return &fileName[index];
              count++;
              if (count > MAX_FILE_EXTENSION_LENGTH)
                     return terminatorPtr;
       }
       return terminatorPtr;
}

After multibyte changes, your code might look like the following example:

char * CVIFUNC FindFileExtension(const char *pathString) {
       int index;
       char *fileName;
       char *ptr, *terminatorPtr;
       AssertMsg(pathString, "Null pathString parameter passed to FindFileExtension");
       fileName = FindFileName(pathString);
       if ((index = strlen (fileName)) == 0)
              return fileName;
       terminatorPtr = fileName + index;
       ptr = CmbStrPrev (fileName, terminatorPtr);
       while (ptr && ((terminatorPtr-ptr) <= MAX_FILE_EXTENSION_LENGTH+1))
       {
              if (*ptr == '.' && ptr != fileName) /* if dot is the first char in filename, */
                     return ++ptr; /* it really is not an extension */
              CmbStrDec (fileName, ptr);
       }
       return terminatorPtr;
}
