Friday, February 02, 2007

C Tutorial: Strings

Defining Strings:
C has a character type but no string type as such. Text is defined as a character array, that is, as a table of single characters. The lack of a string type is a serious drawback for application programming, and C++ addresses it with a string class in its standard library.

There are no C instructions that will handle a string as a single variable in the way the assembler MVC instruction or the COBOL MOVE statement does. Instead, library functions like strcpy() or memcpy() work through the string one character at a time.
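
As a rough illustration of what such a function does internally, the sketch below copies a fixed number of characters one at a time. The name copy_chars is made up for this example, and real library versions are usually far more optimized.

```c
#include <stddef.h>

/* Hypothetical stand-in for memcpy(): copies len characters one by one. */
void copy_chars(char *dest, const char *src, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
    {
        dest[i] = src[i];   /* one character per pass through the loop */
    }
}
```

Calling copy_chars(volume, "STRG01", 6) would load a six-character array named volume with STRG01 and no null terminator.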

To declare a text string, specify a type of char and place the number of characters in the array in square brackets after the string name.

char volume[6];

This generates a table of six contiguous characters and an implied pointer to the first character. A detailed description of this pointer is given in Addressing a String.

Addressing Each Character in a String:

The string declared above is an array with six members, volume[0] through volume[5]. Each member is of type char. A character literal is enclosed in single quotes. Double quotes identify a string literal (a character array), not a character. So 'x' is a one-byte character and "x" is a two-byte character array consisting of an 'x' followed by a null byte.

You can also assign characters hex values. In particular, note that a null-byte (a byte with all bits set to zero) can be written either using an escape character (backslash) or as a hex value:

char zerobyte = '\0'; /*use escape character*/
char zerobyte = 0x00; /*use hex value*/

Both of the following lines assign the letter S to the first character in volume[], first using a character literal and then using the EBCDIC hex value.

volume[0] = 'S';
volume[0] = 0xe2;

Since C handles characters internally as small integers, you can do arithmetic with them. This is more than a party trick: the routines that convert between uppercase and lowercase letters simply add or subtract the difference 'A' - 'a'. The following line sets the second character in volume[] to T, because T follows S in the character set.

volume[1] = volume[0] + 1;
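
The case conversion mentioned above can be sketched as follows. The function name upper_sketch is invented for this example, and the code assumes a character set such as ASCII or EBCDIC, where every lowercase letter sits the same distance from its uppercase form. In real programs the library function toupper() does this job.

```c
#include <ctype.h>

/* Hypothetical helper: returns the uppercase form of c, or c unchanged. */
char upper_sketch(char c)
{
    if (islower((unsigned char)c))
    {
        /* assumes the offset 'A' - 'a' is the same for every letter,
           which holds in ASCII and EBCDIC */
        return (char)(c + ('A' - 'a'));
    }
    return c;
}
```

Characters that are not lowercase letters, such as digits, are returned as they are.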

Addressing a String:

The implied pointer that C generates when you declare a character array has some special properties:

  • It has the string name (no subscripts);
  • It is a constant (you can't alter its value);
  • The sizeof operator returns the size of the array, not that of a pointer.

The memcpy() function copies the string at the second address to the string at the first address for the length specified by the third parameter. memcpy() will accept volume as its first parameter, because the array name is treated as a pointer to the first character.

memcpy() will also accept a string literal as its second parameter. String literals are enclosed in double quotes and always have a null byte (binary zeros) appended to the end in memory. In the following line only the six characters are copied, so the null byte is left behind.

memcpy(volume, "STRG01", sizeof(volume));
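
Note that sizeof gives the size of the array only where the array itself is declared. A function that receives the array sees an ordinary pointer, so the length has to be passed along. The sketch below shows one way to do that. The names load_field and volume_size are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: sees only a pointer, so the caller passes the length.
   Assumes src holds at least dest_len characters. */
void load_field(char *dest, size_t dest_len, const char *src)
{
    /* sizeof(dest) here would be the size of a pointer, not of the array */
    memcpy(dest, src, dest_len);
}

/* Returns 6: sizeof sees the whole array where it is declared. */
size_t volume_size(void)
{
    char volume[6];

    load_field(volume, sizeof(volume), "STRG01");
    return sizeof(volume);
}
```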

The memset() function sets all characters in a string to a padding byte. The following line sets all six characters of volume[] to S.

memset(volume, 'S', sizeof(volume));

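memcmp() compares two strings for the length specified in the third parameter. It returns zero if the two are equal over that length, and a negative or positive value if the first is lower or higher than the second.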

if (memcmp(volume, "STRG", 4) == 0)
{
    ProcessStorageVolume(volume);
}
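
When the strings differ, memcmp() returns a nonzero value whose sign tells you which one is lower, so the same call can be used for ordering as well as for matching. A small sketch, with both function names invented for the example:

```c
#include <string.h>

/* Hypothetical helper: 1 if the first four characters of volume are "STRG". */
int is_storage_volume(const char *volume)
{
    return memcmp(volume, "STRG", 4) == 0;
}

/* Hypothetical helper: -1, 0 or 1 depending on how two six-character
   volume names compare byte by byte. */
int compare_volumes(const char *a, const char *b)
{
    int result = memcmp(a, b, 6);

    return (result > 0) - (result < 0);
}
```

Only the sign of the memcmp() result is meaningful, which is why compare_volumes() reduces it to -1, 0 or 1.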

Looping through a String using a Subscript:
The usual way to loop through an array is to use a variable as the subscript in a for or while loop. More details about loops are given in the Loop Tutorial.

int i;
char volume[6];

for (i = 0; i < sizeof(volume); i++)
{
    volume[i] = '0';
}

This code fills volume[] with six '0' characters. The loop continues while i is less than 6 and ends when i reaches 6, because the subscripts go from 0 through 5.

Looping through a String using a Pointer:
Since you can't change the value of the pointer generated automatically by C when you declare an array, you must declare your own pointer to loop through a string. Initialize it by copying the implied pointer.

int i;
char volume[6];
char *cptr;

for (i = 0, cptr = volume; i < sizeof(volume); cptr++, i++)
{
    *cptr = i;   /* stores the binary values 0 through 5, not digit characters */
}
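
A pointer loop does not need a separate counter. One common variation, sketched here, compares the moving pointer against an end pointer instead. The function name fill_digits is invented for the example. It stores printable digit characters, relying on the fact that C guarantees '0' through '9' are consecutive.

```c
#include <stddef.h>

/* Hypothetical helper: fills a field with the digits '0', '1', '2', ... */
void fill_digits(char *field, size_t len)
{
    char *cptr;
    char *end = field + len;        /* one past the last character */
    int digit = 0;

    for (cptr = field; cptr < end; cptr++)
    {
        *cptr = (char)('0' + digit);
        digit = (digit + 1) % 10;   /* wrap around after '9' */
    }
}
```

Calling fill_digits(volume, sizeof(volume)) on a six-character array leaves it holding 012345.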

Initializing a String:
You can set the initial value of a character array when you declare it by specifying a string literal. If the literal has exactly as many characters as the array, the null terminator is dropped. If it has more, standard C treats the declaration as an error, although some compilers only warn and truncate the literal. If the literal (including its null terminator) is smaller than the array, then the remaining characters in the array are set to null bytes. If you don't specify the size of the array, but do specify a literal, then C will set the array to the size of the literal, including the null terminator.

char volume[6] = "STRG01"; /* contains 'S', 'T', 'R', 'G', '0', '1' with no null terminator */
char volume[] = "STRG01"; /* contains 'S', 'T', 'R', 'G', '0', '1', '\0' */
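
The difference between the two declarations shows up in sizeof. The sketch below wraps each one in a small function so the sizes can be checked. The function names are invented for the example, and the exact-fit form is valid in C but not in C++.

```c
#include <stddef.h>

/* Returns 6: the array holds the six characters and no terminator. */
size_t exact_fit_size(void)
{
    char volume[6] = "STRG01";

    return sizeof(volume);
}

/* Returns 7: the compiler sizes the array to include the null terminator. */
size_t unsized_size(void)
{
    char volume[] = "STRG01";

    return sizeof(volume);
}
```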

Null-terminated Strings:
C was developed for Unix and still assumes that text strings will be delimited by a null terminator, a byte containing binary zeros. That means that string functions like strcpy() and strcmp(), as well as those that take strings like sprintf(), do not use the declared length of the array. They work on everything between the first character and the first null byte, wherever it is.
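
For a fixed-length field that may or may not contain a null byte, one defensive approach is to search for the terminator only within the declared size. The sketch below does this with memcpy()'s sibling memchr(). The function name field_length is made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the text in a field of at most max
   characters, stopping at the first null byte if there is one. */
size_t field_length(const char *field, size_t max)
{
    const char *end = memchr(field, '\0', max);

    return end ? (size_t)(end - field) : max;
}
```

Unlike strlen(), this never reads past the end of the field, even when no null byte is present.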

S0C4 abends on a mainframe are often caused by a string not being delimited by a null terminator. The following lines will format a message containing the value in volume[], by copying volume[] into a temporary field initialized with nulls, and using that as the string.

char volume[6], temp[7], buffer[20];
memset(temp, '\0', sizeof(temp));
memcpy(temp, volume, sizeof(volume));
sprintf(buffer, "Volume is %s", temp);

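If you know the exact size of a text string, you can print it out safely by setting the field width (the minimum number of characters to print) and the precision (the maximum). With a precision of 6, printf() reads no more than six characters from volume[], so no null terminator is needed.

printf("Volume is %6.6s", volume);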
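
When the length is not known until run time, the maximum can be passed as an argument by writing %.*s. The sketch below combines that with snprintf(), which also limits how much is written to the output buffer. It assumes a C99 library, and the function name format_volume is invented for the example.

```c
#include <stdio.h>

/* Hypothetical helper: formats a fixed-length volume field into buffer.
   Returns the value of snprintf(): the length of the full message. */
int format_volume(char *buffer, size_t buffer_size,
                  const char *volume, int volume_len)
{
    /* %.*s takes the precision from volume_len, so at most that many
       characters are read from volume */
    return snprintf(buffer, buffer_size, "Volume is %.*s", volume_len, volume);
}
```

If the return value is greater than or equal to buffer_size, the message was cut short to fit the buffer.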