Friday, 24 May 2013  
 
 
 
 
 
 
 
 


Choosing the right data type

Choose the right data type for the issue at hand.

Choosing the right data type is not as easy as you may think, especially when you are planning to write a program that should run without modification on several systems, platforms and compilers. There are several basic data types such as integers, characters, real numbers and sometimes strings. In the following I will explain the issue using C++ as an example, but the key points apply to any other programming language.

Data types in C/C++

In C/C++ the data types char, short, int and long represent integers, and each of them can be either signed or unsigned. This makes 8 different integer types in total (not counting long long, which C99 added, or the fact that plain char is a distinct type whose signedness is up to the compiler). For real numbers there are two (or three) different types: float, double and long double. To make the situation more complicated, the data types do not always have the same value range and representation on different platforms and compilers. The question is which type to use to accomplish the task at hand.

Value range

The value range of a data type depends on its width and on whether it is signed or unsigned. An 8-bit signed char can hold values from -128 to +127. For some problems this is enough, but you get into trouble if you want to store the value 200 in it. Check the value range of the numbers in the problem at hand and choose the data type that fits best. Also ask yourself: is it possible that even larger or smaller numbers can arise? Once you have determined the maximum value range, the next question is whether to use the signed or the unsigned version of the data type.
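The range check described above can be sketched with the limits from limits.h. The helper names fits_in_signed_char, fits_in_unsigned_char and print_ranges are made up for this illustration; the numbers printed depend on your compiler and platform.

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 if v can be stored in a signed char without loss, else 0.
   (Hypothetical helper, not part of any library.) */
int fits_in_signed_char(long v)
{
    return v >= SCHAR_MIN && v <= SCHAR_MAX;
}

/* Returns 1 if v can be stored in an unsigned char without loss, else 0. */
int fits_in_unsigned_char(long v)
{
    return v >= 0 && v <= UCHAR_MAX;
}

/* Prints the ranges that this particular compiler and platform provide. */
void print_ranges(void)
{
    printf("signed char:   %d .. %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("unsigned char: 0 .. %u\n", (unsigned)UCHAR_MAX);
    printf("short:         %d .. %d\n", SHRT_MIN, SHRT_MAX);
    printf("int:           %d .. %d\n", INT_MIN, INT_MAX);
    printf("long:          %ld .. %ld\n", LONG_MIN, LONG_MAX);
}
```

On a platform with 8-bit chars, fits_in_signed_char(200) yields 0 while fits_in_unsigned_char(200) yields 1, which is exactly the situation from the paragraph above.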

Signed vs. Unsigned

Very often the compiler raises a warning about a comparison between a signed and an unsigned value, or about an assignment from signed to unsigned. You may ignore such a warning if you are very certain that it is not a problem, but in some cases it points to a serious bug.
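A small sketch of why such a warning can matter: when an int is compared with an unsigned, the int is converted to unsigned first. The function names below are invented for this illustration, and the implicit conversion is written out as a cast so that the code compiles without warnings.

```c
/* What the compiler effectively does for "a < b" with int a and
   unsigned b: a is converted to unsigned before the comparison, so a
   negative a turns into a very large positive value. */
int naive_less(int a, unsigned b)
{
    return (unsigned)a < b;
}

/* Safer variant: handle the negative case before converting. */
int safe_less(int a, unsigned b)
{
    return a < 0 || (unsigned)a < b;
}
```

With these definitions naive_less(-1, 1u) is 0, although -1 is obviously smaller than 1, while safe_less(-1, 1u) is 1.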

Generate Sine-Wave

To demonstrate the signed/unsigned problem, consider the following example. The task is to generate several sine values and store them in an array, so you write a simple loop that puts each calculated sine value into an array:

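#include <stdio.h>
#include <math.h>

int main(int argc, char ** argv)
{
    /* 100 samples, i.e. two full periods of 50 samples each */
    unsigned short sine[100];
    unsigned       n;

    for(n = 0; n < sizeof(sine)/sizeof(sine[0]); n++)
    {
        /* scale the sine value to an amplitude of 32000 */
        sine[n] = 32000 * sin(2 * 3.1416 * ((float)n / 50.0));
        printf("sine[%3u]=%i\n",n,sine[n]);
    }
    return 0;
}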

Try to compile this example (e.g. gcc -o sine sine.c -lm -pedantic) and see for yourself that the result is not what you might expect. The program outputs the following lines:

sine[  0]=0
sine[  1]=4010
sine[  2]=7958
sine[  3]=11780
sine[  4]=15416
sine[  5]=18809
sine[  6]=21905
sine[  7]=24656
sine[  8]=27018
sine[  9]=28954
sine[ 10]=30433
sine[ 11]=31433
sine[ 12]=31936
sine[ 13]=31936
sine[ 14]=31433
sine[ 15]=30433
sine[ 16]=28954
sine[ 17]=27018
sine[ 18]=24656
sine[ 19]=21905
sine[ 20]=18808
sine[ 21]=15415
sine[ 22]=11779
sine[ 23]=7957
sine[ 24]=4010
sine[ 25]=0
sine[ 26]=61526
sine[ 27]=57578
sine[ 28]=53756
sine[ 29]=50120
sine[ 30]=46727
sine[ 31]=43631
sine[ 32]=40880
sine[ 33]=38518
sine[ 34]=36582
...

As you can see, the result is as expected up to n=25, but what happens when n gets larger? This is the result of mixing signed and unsigned. The array of sine values is declared as unsigned, but the result of sin() is signed, and from n=26 on it is negative. Strictly speaking, converting a negative floating-point value to an unsigned integer type is undefined behavior in C; on this platform the values simply wrap around, so -4010 shows up as 61526 (65536 - 4010). Note that in this example even the compiler does not complain about the problem. The fix is very easy (of course): change the type of the sine array from unsigned to signed and the results are as expected:

...
sine[ 22]=11779
sine[ 23]=7957
sine[ 24]=4010
sine[ 25]=0
sine[ 26]=-4010
sine[ 27]=-7958
...
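The wrapped values in the first listing can be reproduced with plain integer arithmetic. The sketch below assumes a 16-bit unsigned short and an int wider than 16 bits, which is common but not guaranteed; the function names are made up for this illustration.

```c
/* Converting an int to unsigned short is well defined: the value is
   reduced modulo 65536 when unsigned short has 16 bits (assumed here). */
unsigned short store_as_unsigned(int value)
{
    return (unsigned short)value;
}

/* Recovers the intended signed value from a wrapped 16-bit sample. */
int recover_signed(unsigned short stored)
{
    return stored > 32767 ? (int)stored - 65536 : (int)stored;
}
```

Under these assumptions store_as_unsigned(-4010) gives 61526 and store_as_unsigned(-7958) gives 57578, the values printed for n=26 and n=27 in the first listing.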

Portability

One important point in choosing the right data type is ensuring that your code remains portable. If you are using data types that are only available on one compiler or system, you will have to put a lot of effort into porting the code to another compiler or system. Portability is much easier to achieve if you use the data types defined by the standard. These data types must be present on any compiler that claims to conform to the standard. For example, the C99 standard defines a header file called stdint.h, which provides the type int32_t for a signed 32-bit integer and uint16_t for an unsigned 16-bit integer (on every platform that has integers of exactly these widths).

Sometimes you don't care much whether 32 or 64 bits are used to represent a simple number. The best example is a counter variable in a loop. In such cases you can simply use an integer type without specifying the exact width. For example, to count from 0 to 30 you can simply use an int or an unsigned. But keep the value range of the data type and the signed/unsigned issue in mind.
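Both cases can be sketched side by side: fixed-width types from stdint.h where the width matters, and a plain int for a small loop counter. The struct sample and the function sum_to_thirty are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Hypothetical record whose fields have an exactly known width. */
struct sample
{
    int32_t  timestamp;  /* signed, exactly 32 bits */
    uint16_t channel;    /* unsigned, exactly 16 bits */
};

/* Sums the numbers 0..30. A plain int is enough for the counter,
   because the exact width does not matter for such a small range. */
int sum_to_thirty(void)
{
    int i;
    int sum = 0;

    for (i = 0; i <= 30; i++)
    {
        sum += i;
    }
    return sum;
}
```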

Special data types for special situations

Most libraries introduce their own data types for special use cases. These data types are based on the native data types, but they have one advantage: they improve the readability of your program.

unsigned n;

The variable n of type unsigned does not tell the reader that it is going to be used to store a process identifier. So if you write

pid_t n;

instead, it is much clearer what the variable is meant for. At this point you may say: "But I can also accomplish this by choosing the right name for the variable!" You're right, but choosing the right data type increases readability even further. And there is another reason to use these special data types instead of the native ones. Let's assume that on your OS the maximum number of processes is 2^16, so you choose an unsigned short to represent the process identifier. After several years and several thousand lines of code you increase the maximum number to 2^32. Now you need an unsigned int for the PIDs, and you have to change every occurrence of unsigned short to unsigned int wherever it is used to hold a process identifier. If you had used the type pid_t instead, the change would be quite simple: change the definition of the type pid_t and that's all.
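The idea can be sketched with a typedef. To avoid a clash with the real pid_t of your system, the sketch uses the made-up name my_pid_t; struct process and next_pid are hypothetical as well.

```c
/* Hypothetical stand-in for pid_t. Widening all process identifiers
   later only requires changing this one line, e.g. to unsigned int. */
typedef unsigned short my_pid_t;

/* Every user of a process identifier refers to the typedef,
   never to the underlying native type. */
struct process
{
    my_pid_t id;
    my_pid_t parent;
};

/* Returns the identifier following current (wraps at the type's maximum). */
my_pid_t next_pid(my_pid_t current)
{
    return (my_pid_t)(current + 1);
}
```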

There are several well-known types which should be used in certain situations. A very good example is the type size_t in C and C++. It is meant to hold the size or length of an object or buffer. Many functions of the C/C++ standard library use size_t when the size of a buffer needs to be specified. For example, strlen returns the length of the given string in characters as a size_t. Still, many people use an unsigned or an int to represent the size of a buffer. Using size_t in such cases makes it easier to understand the function and its parameters. So choose size_t whenever you need the length or size of an object, buffer or string.
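As a small sketch of this advice, the following function takes and returns sizes as size_t. The name copy_string is invented for this example; strlen and memcpy are the standard library functions.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst, which has room for dst_size bytes including the
   terminating '\0'. Returns the number of characters actually copied. */
size_t copy_string(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst_size == 0)
    {
        return 0;             /* no room, not even for the terminator */
    }
    len = strlen(src);        /* strlen already returns a size_t */
    if (len >= dst_size)
    {
        len = dst_size - 1;   /* truncate so the terminator still fits */
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

A caller would typically pass sizeof(buffer) as dst_size, which is a size_t as well, so no signed/unsigned conversion is involved.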

Performance

Choosing a data type may also influence the performance of your program. On a 32-bit machine, 64-bit arithmetic must be implemented by the compiler (or by the compiler's support libraries), since most 32-bit machines do not have 64-bit arithmetic in their regular instruction set. For example, an addition of two 64-bit values must be carried out using several instructions, whereas an addition of two 32-bit values can be done by a single instruction. For some applications the difference matters, especially if you must perform such an operation very frequently.
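To give an impression of the extra work, here is a rough sketch of a 64-bit addition built from 32-bit operations. It only illustrates the principle and is not what any particular compiler generates; struct u64_parts and add64 are hypothetical names.

```c
#include <stdint.h>

/* A 64-bit value split into two 32-bit halves (hypothetical helper type). */
struct u64_parts
{
    uint32_t lo;
    uint32_t hi;
};

/* Adds two 64-bit values using only 32-bit operations. */
struct u64_parts add64(struct u64_parts a, struct u64_parts b)
{
    struct u64_parts r;

    r.lo = a.lo + b.lo;               /* first 32-bit addition, wraps modulo 2^32 */
    r.hi = a.hi + b.hi                /* second 32-bit addition */
         + (r.lo < a.lo ? 1u : 0u);   /* plus the carry out of the low half */
    return r;
}
```

One addition has turned into two additions plus a carry check, which is why code that does this very frequently may run noticeably slower on a 32-bit machine.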

Endianness

Normally the endianness of values does not matter as long as your program does not interact with programs on other platforms. But if you intend to exchange information over a network, you have to make sure that every participant uses the same representation of the data. On little-endian machines (like AMD or Intel) integers are stored with the least significant byte at the lowest memory address. Big-endian machines (like Motorola and some PowerPCs), on the other hand, put the most significant byte at the lowest memory address. The following shows the bytes of a 32-bit value in order of increasing memory address:

Value           0xDEADCAFE
Little-Endian   0xFECAADDE
Big-Endian      0xDEADCAFE

Most data that is transferred over a network is sent in big-endian byte order, which is therefore also called network byte order. This ensures that machines can communicate with each other even if they are using very different hardware.
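One way to get network byte order regardless of the machine the code runs on is to assemble the bytes with shifts instead of copying the integer's memory. The following is a minimal sketch; the names store_be32 and load_be32 are made up for this illustration.

```c
#include <stdint.h>

/* Writes value into out[0..3] in big-endian (network) byte order,
   independent of the byte order of the machine running the code. */
void store_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);  /* most significant byte first */
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;          /* least significant byte last */
}

/* Reads a big-endian 32-bit value from in[0..3]. */
uint32_t load_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24)
         | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)
         |  (uint32_t)in[3];
}
```

For the value 0xDEADCAFE, store_be32 produces the bytes DE AD CA FE on both little-endian and big-endian machines, and load_be32 turns them back into the same number.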

Summary

As you have seen, choosing the right data type is sometimes not easy at all. You have to consider which value range is required and whether you need negative values or not. We have seen that you can reduce the effort of porting your code from one system to another by using portable data types. Two minor points mentioned in this article are performance considerations and endianness. After reading this article you should be able to choose the right data type for your problem.

 
Copyright © by AR Soft 2005-2013