Introduction, bits, bytes, BCD, ASCII, characters, strings, integers and floating point

Site hosted by Angelfire.com: Build your free website today!

Data Structures And Number Systems
© Copyright Brian Brown, 1984-2000. All rights reserved.

Part 1

Reference Books:

Program Design : P Juliff
IBM Microcomputer Assembly Language : J Godfrey
Programmers Craft : R Weiland
Data Storage in a computer : CIT
Microcomputer Software Design : S Campbell

DATA STRUCTURES
Just as learning to design programs is important, so is the understanding of the correct format and usage of data. All programs use some form of data. To design programs which work correctly, a good understanding of how data is structured will be required.

This module introduces you to the various forms of data used by programs. We shall investigate how the data is stored, accessed and its typical usage within programs.

A computer stores information in Binary format. Binary is a number system which uses BITS to store data.

BITS
A bit is the smallest element of information used by a computer. A bit holds ONE of TWO possible values,

Value	Meaning
0	OFF
1	ON

A bit which is OFF is also considered to be FALSE or NOT SET; a bit which is ON is also considered to be TRUE or SET.

Because a single bit can only store two values, bits are combined together into large units in order to hold a greater range of values.

NIBBLE
A nibble is a group of FOUR bits. This gives a maximum number of 16 possible different values.


        2 ** 4 = 16  (2 to the power of the number of bits)

It is useful, when dealing with groups of bits, to determine which bit of the group has the least value, and which bit has the most or greatest value.

The Least Significant Bit AND The Most Sigificant Bit
This is deemed to be bit 0, and is always drawn at the extreme right. The Most significant bit is always shown on the extreme left, and is the bit with the greatest value.

The diagram below shows a NIBBLE, and each bits position and decimal weight value (for more information, consult the module on Number Systems).


          3   2   1   0     Bit Number
        +---+---+---+---+
        |   |   |   |   |
        +---+---+---+---+
          8   4   2   1     Decimal Weighting Value
         MSB         LSB

Lets consider an example of converting binary values into decimal.


   Bit        3 2 1 0
   Value      1 0 1 1
   
        Bit 3 is set, so it has a decimal weight value of        8
        Bit 2 is not set, so it has a decimal weight value of    0
        Bit 1 is set, so it has a decimal weight value of        2
        Bit 0 is set, so it has a decimal weight value of        1
        Adding up all the decimal weight values for each bit=   11

        So 1011 in binary is 11 in decimal!

For more examples, consult the module on Number Systems.

BYTES
Bytes are a grouping of 8 bits. This comprises TWO nibbles, as shown below.


          7   6   5   4   3   2   1   0     Bit Number
        +---+---+---+---+---+---+---+---+
        |   |   |   |   |   |   |   |   |
        +---+---+---+---+---+---+---+---+
       128  64 32 16  8  4   2    1     Decimal Weighting Value
         MSB                         LSB

Bytes are often used to store CHARACTERS. They can also be used to store numeric values,


        0 to 255
        +127 to -128

Binary Coded Decimal [BCD]
Binary code decimal digits (0-9) are represented using FOUR bits. The valid combinations of bits and their respective values are

Binary Value	Digit
0000	0
0001	1
0010	2
0011	3
0100	4
0101	5
0110	6
0111	7
1000	8
1001	9

The binary combinations 1010 to 1111 are invalid and are not used.

If the computer stores one BCD digit per byte, its called normal BCD. The unused nibble may be either all 0's or all 1's.

If two BCD digits are stored per byte, its called Packed BCD. This occurs in data transmission where numbers are being transmitted over a communications link. Packed BCD reduces the amount of time spent transmitting the numbers, as each data byte transmitted results in the sending of two BCD digits.

Consider the storing of the digits 56 in Packed BCD format.


          7   6   5   4   3   2   1   0     Bit Number
        +---+---+---+---+---+---+---+---+
        | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
        +---+---+---+---+---+---+---+---+
         MSB                         LSB

The UPPER nibble holds the value 5, whilst the LOWER nibble holds the value 6.

Status And Boolean Variables
BOOLEAN variables use a single bit to hold their value, so can only assume one of two possible states. This is either 0 (considered to be FALSE), or 1 (considered to be TRUE).

The computer handles each boolean variable as a single bit. If the bit is TRUE, then is has a value of 1. If the bit is FALSE, then it has the value 0.

When a group of bits are grouped together to form a limited range of values, this is known as a STATUS variable.

Consider the case in a program where we need to keep track of the number of minutes a phone line is busy for (within the limited range of 0 to 60). This does not require the use of a full integer, so some programming languages allow you to specify the number of bits used to allocate to variables with limited ranges.

The advantage of this approach, is that the storage space of status variables can be combined together into a single 16 or 32 bits, resulting in a saving of space.

Consider where a computer allocates 16 bits of storage per status variable. If we had three status variables, the space consumed would be 48 bits. BUT, if all the status variables could be combined and fitted into a single 16 bits of storage, we could save 32 bits of memory. This is very important in real-time systems where memory space is at a premium.

Consider the following diagram, which illustrates the packing of boolean and status variables together into a single byte.


          7   6   5   4   3   2   1   0     Bit Number
        +---+---+---+---+---+---+---+---+
        | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
        +---+---+---+---+---+---+---+---+
                  |   |           |   |       
                  |   |           |   +--- local call/TOLL call (bit 0)
                  |   |           +------- extension busy/free  (bit 1)
                  |   +------------------- minutes (bits 2-4)
                  +----------------------- extension diverted/TRUE/FALSE

The American Standard Code for Information Interchange
ASCII is a computer code which uses 128 different encoding combinations of a group of seven bits (27 = 128) to represent,

characters A to Z, both upper and lower case
special characters, < . ? : etc
numbers 0 to 9
special control codes used for device control

Lets now look at the encoding method. The table below shows the bit combinations required for each character.

NUL

SOH

STX

ETX

EOT

ENQ

ACK

BEL

TAB

DLE

DC1

DC2

DC3

DC4

NAK

SYN

ETB

CAN

SUB

ESC

(

)

;

[

]

{

}

DEL

A computer usually stores information in eight bits. The eighth bit is unused in ASCII, thus is normally set to 0. Some systems may use the eight bit to implement graphics or different language symbols, ie, Greek characters.

Control codes are used in communications and printers. They may be generated from an ASCII keyboard by holding down the CTRL (control) key and pressing another key (A to Z, plus {, \, ], ^, <- ).

Example Code the text string 'Hello.' in ASCII using hexadecimal digits.


  H = 48
  e = 65
  l = 6C
  l = 6C
  o = 6F
  . = 2E

thus the string is represented by the byte sequence


   48 65 6C 6C 6F 2E

CHARACTERS
Characters are non-numeric symbols used to convey language and meaning. In English, they are combined with other characters to form words. Examples of characters are;

 
	a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
	A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
 
	0 1 2 3 4 5 6 7 8 9 ! @ # $ % ^ & * ( ) _ - = + | \ 

	` , . / ; ' [ ] { } : " < >  ?

A computer system normally stores characters using the ASCII code. Each character is stored using eight bits of information, giving a total number of 256 different characters (2**8 = 256).

In the high level language Pascal, characters are defined and used as follows,

 
	var 
		plus_symbol : char; 
 
	begin 
		plus_symbol := '+';

Variables used in a Pascal program are declared after the keyword var. The above example declares the variable plus_symbol to be a character type, thus eight bits of memory storage are allocated to store its value (as yet undetermined).

Inside the main body of the program, after the keyword begin, the statement shown assigns the symbol + to the character variable plus_symbol. This is equivalent to storing the ASCII value 2B hexadecimal in the eight bits of memory allocated to the variable plus_symbol.

TEXT STRINGS
Text strings are a sequence of characters (ie, words or multi- character symbols). Each character is stored one after the other, each occupying eight bits of memory storage.

The text string Hello would be stored as follows

 
		+------+ 
		|  48  | 
		+------+ 
		|  65  | 
		+------+ 
		|  6C  | 
		+------+ 
		|  6C  | 
		+------+ 
		|  6F  | 
		+------+

In Turbo Pascal, text strings are defined and used as follows,

 
	var 
		text_message : string[22]; 
 
	begin 
		text_message := 'Welcome to text strings';

The above example declares the variable text_message to be a string type of up to 22 characters long (but no more!). Eight bits of memory storage are allocated to store each character in the string (a total of 22 bytes), with the value in each byte as yet undetermined.

Inside the main body of the program, after the keyword begin, the statement shown assigns the message Welcome to text strings to the string variable text_message. This stores the ASCII value of each character into each successive byte of memory allocated to the variable text_message.

INTEGERS
Numeric information cannot efficiently be stored using the ASCII format. Imagine storing the number 123,769 using ASCII. This would consume 6 bytes, and it would be difficult to tell if the number was positive or negative (though we could precede it with the character + or -).

A more efficient way of storing numeric information is to use a different encoding scheme. The encoding scheme in most use is shown below,

 
	 Bits 
	 15  14 13 12 11 10  9  8  7  6  5  4  3  2  1  0 
	+---+--------------------------------------------+ 
	| S |      Numeric Value in Binary               |
	+---+--------------------------------------------+ 
	  S = Sign Bit

Integers store whole numbers only! They do not contain fractional parts. Consider the examples below,


	Valid		Invalid 
	123		.987 
	0		0.0 
	278903		123.09

The sign bit (which is bit 15) indicates whether the number is positive or negative. A logic 1 indicates negative, a logic 0 indicates positive.

The number is converted to binary and stored in bits 0 to 14 of the two bytes.

 
	Example 
	Store the value +263 as an integer value. 
 
	 1) The sign bit is 0. 
	 2) The decimal value +263 is 100000111 in binary. 
 
	 Thus the storage in memory of the integer +263 looks like, 
 
		 Bits 
		 15  14 13 12 11 10  9  8  7  6  5  4  3  2  1  0 
		+---+---------------------------------------------+ 
		| 0 | 0  0  0  0  0  0  1  0  0  0  0  0  1  1  1 | 
		+---+---------------------------------------------+

When storing negative numbers, the number is stored using the two's complement format.

In Pascal, integers are defined and used as follows,

 
	var 
		whole_number : integer; 

	begin 
		whole_number := 1267;

The example declares the variable whole_number to be an integer type, thus sixteen bits of memory storage are allocated to store its value (as yet undetermined).

Inside the main body of the program, after the keyword begin, the statement shown assigns the numeric value 1267 to the integer variable whole_number. This is equivalent to storing the bit combination 0000010011110011 in the sixteen bits of memory allocated to the variable whole_number.

Signed integers using 16 bits have a number range of,

 
	-32768 to +32767  (+-2^15)

To store larger integer values would require more bits. Some systems and languages also support the use of unsigned integers, which are deemed to be positive only.

FLOATING POINT NUMBERS
There are two problems with integers; they cannot express fractions, and the range of the number is limited to the number of bits used. An efficient way of storing fractions is called the floating point method, which involves splitting the fraction into two parts, an exponent and a mantissa.

The exponent represents a value raised to the power of 2.

The mantissa represents a fractional value between 0 and 1.

Consider the number

 
	 12.50

The number is first converted into the format

 
	2n * 0.xxxxxx

where n represents the exponent and 0.xxxxx is the mantissa.

The computer industry agreed upon a standard for the storage of floating point numbers. It is called the IEEE 754 standard, and uses 32 bits of memory (for single precision), or 64 bits (for double precision accuracy). The single precision format looks like,

 
	 Bits 
	  31  30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0  
	 +---+-----------------------+---------------------------------------------------------------------+ 
	 | S | Exponent value        |                  Mantissa value                                     | 
	 +---+-----------------------+---------------------------------------------------------------------+ 
	  S = Sign Bit

The sign bit is 1 for a negative mantissa, and 0 for a positive mantissa.

The exponent uses a bias of 127.

The mantissa is stored as a binary value using an encoding technique.

Working out the FP bit patterns
The number we have is


	12.5

which expressed as fraction to the power of 2 is,

 
	12.5   / 2 = 6.25 
	6.25   / 2 = 3.125 
	3.125  / 2 = 1.5625 
	1.5625 / 2 = 0.78125

NOTE: Keep dividing by 2 till a fraction between 0 and 1 results. The fraction is the mantissa value, the number of divisions is the exponent value.

thus our values now are,

 
	0.78125 * (2*2*2*2) (comment: this is 2 to the power of 4)

The exponent bit pattern is stored using an excess of 127. This means that this value is added to the exponent when storing (and subtracted when removing).

The exponent bit pattern to store is,

 
	4 + 127 = 131 = '10000011'

As the mantissa is a positive value, the sign bit is 0.

The mantissa is a little more complicated to work out. Each bit represents 2 to the power of a negative number. It looks like,

 
	1st bit of mantissa = 0.5 
	2nd                 = 0.25 
	3rd                 = 0.125 
	4th                 = 0.0625 
	5th                 = 0.03125 
	etc

The mantissa number value we have is 0.78125, which in binary is


	11001000000000000000000	(0.5 + 0.25 + 0.03125)

How-ever, to make matters even more complicated, the mantissa is normalized, by moving the bit patterns to the left (each shift subtracts one from the exponent value) till the first 1 drops off.

The resulting pattern is then stored.

The mantissa now becomes

 
	10010000000000000000000

and the exponent is adjusted to become

 
	131 - 1 = 130 = '10000010'

The final assembled format is,

 
	 Bits 
	 31  30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0  
	 +--+-----------------------+---------------------------------------------------------------------+ 
	 | 0| 1  0  0  0  0  0  1  0| 1  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 | 
	 +--+-----------------------+---------------------------------------------------------------------+ 
	  S   Exponent               Mantissa

Now lets convert the following storage format back into a decimal number.


	 Bits 
	 31  30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0  
	 +--+-----------------------+---------------------------------------------------------------------+ 
	 | 1| 1  0  0  0  0  0  1  1| 1  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 | 
	 +--+-----------------------+---------------------------------------------------------------------+ 
	  S   Exponent               Mantissa

This gives a negative number with an exponent value of,

 
	131 - 127 = 4

and a mantissa value of,

 
	1.0 + 0.5 + 0.0625 = 1.5625

(the 1.0 comes from the bit which was shifted off when the mantissa was normalized ), thus the number is,

 
	-1.5625 * 2^4 = -25.0

The numeric range for floating point numbers using the IEEE-754 method, using 32 bits, is

 
	+- 0.838860808 * 2-128 to +- 0.838860808 * 2127

or, in decimal,

 
	2.4652 * 10-39 to 1.4272 * 1038

In Pascal, floating point numbers are defined and used as follows,


	var 
		fp_number : real; 

	begin 
		fp_number := 12.50;

The example declares the variable fp_number to be a floating- point type, thus thirty-two bits of memory storage are allocated to store its value (as yet undetermined).

Inside the main body of the program, after the keyword begin, the statement shown assigns the numeric value 12.50 to the real variable fp_number. This is equivalent to storing the bit combinations 01000001010010000000000000000000 in the thirty-two bits of memory allocated to the variable fp_number.

The advantages of storing floating point numbers in this way are,

multiplication is performed by adding exponents and mantissa's
division is performed by subtracting exponents and mantissa's
it is easy to compare two numbers to see which is the greater or lesser
large number ranges are stored using relatively few bits

The disadvantages of the storage format are,

errors are created by moving the mantissa bits
conversion backwards and forwards takes time

Home | Other Courses | Feedback | Notes | Tests