# Floating point representation of real numbers.

### Real number representation:

As we saw earlier, integers can be represented as a combination of binary digits. The same holds for real numbers, although the representation is somewhat different. There are several techniques for representing them, but the most efficient and useful is floating-point representation, for the reasons described below.

### Why:

To understand why floating-point representation is better, we first need to know what the alternatives are. Here they are:

- Treat a real number as a combination of two integers.
- Fixed-point representation with a sign bit.

But these have some disadvantages:

- First: Splitting the number into two integers is unnatural. It needs extra memory to keep track of the separate parts, and every operation carries extra overhead to combine them.
- Second: A fixed point restricts both the range of numbers that can be stored and the calculations that can be performed on them, so accuracy has to be compromised. Floating point also gives up some accuracy, but it still does much better than the fixed-point alternative.
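To make the second point concrete, here is a minimal Python sketch of a fixed-point scheme. The format itself (a 16-bit signed integer holding the value scaled by 100) and the names `SCALE`, `to_fixed`, and `from_fixed` are assumptions chosen for illustration, not something defined in this article.

```python
# Hypothetical fixed-point format: a 16-bit signed integer that stores
# the value multiplied by 100 (two digits after the point).
SCALE = 100
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


def to_fixed(x):
    """Convert x to the scaled integer, or fail if it does not fit."""
    raw = round(x * SCALE)
    if not INT16_MIN <= raw <= INT16_MAX:
        raise OverflowError(f"{x} is outside the fixed-point range")
    return raw


def from_fixed(raw):
    """Convert the scaled integer back to a real value."""
    return raw / SCALE


print(to_fixed(12.25))   # fits: stored as a scaled integer
print(to_fixed(0.001))   # too small for two digits: precision is lost
```

Under these assumptions nothing beyond roughly ±327.67 can be stored, and anything finer than 0.01 is rounded away. A floating-point format spends some of its bits on an exponent instead, which is what widens the range.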

### How:

- Normalize the number.
- The normalized number can then be separated into two parts: the fractional part, called the mantissa, and the exponent. The number can now be represented as:

sign | m | m | m | m | sign | e | e |

The above is a simple example of an 8-bit representation, in which the first sign bit belongs to the mantissa and the second to the exponent. There are 32-bit and 64-bit representations as well. Different standards define how the representation is implemented, i.e. the number of bits for the mantissa, the number of bits for the exponent, and so on. We will see the algorithm for our example.

#### IEEE 754 defines two commonly used formats for real numbers:

- Single precision (32-bit) floating point representation.
- Double precision (64-bit) floating point representation.

Single precision: Sign, bit 1 (1 bit) | Exponent, bits 2-9 (8 bits) | Mantissa, bits 10-32 (23 bits) |

Double precision: Sign, bit 1 (1 bit) | Exponent, bits 2-12 (11 bits) | Mantissa, bits 13-64 (52 bits) |
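The field widths above can be checked on a real value. The following is a small Python sketch that uses the standard `struct` module to split a number into its sign, exponent, and mantissa fields; the function names `float32_fields` and `float64_fields` are made up for this example. Note that IEEE 754 stores the exponent with an offset (127 for single precision, 1023 for double precision) and does not store the leading 1 of the normalized mantissa.

```python
import struct


def float32_fields(x):
    """Return (sign, exponent, mantissa) of x stored in single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # bit 1
    exponent = (bits >> 23) & 0xFF     # bits 2-9, stored with offset 127
    mantissa = bits & 0x7FFFFF         # bits 10-32, leading 1 not stored
    return sign, exponent, mantissa


def float64_fields(x):
    """Return (sign, exponent, mantissa) of x stored in double precision."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # bit 1
    exponent = (bits >> 52) & 0x7FF    # bits 2-12, stored with offset 1023
    mantissa = bits & ((1 << 52) - 1)  # bits 13-64, leading 1 not stored
    return sign, exponent, mantissa


# 12.375 is 1100.011 in binary, i.e. 1.100011 x 2^3
sign, exponent, mantissa = float32_fields(12.375)
print(sign, format(exponent, "08b"), format(mantissa, "023b"))
```

For 12.375 the single-precision exponent field holds 3 + 127 = 130, and the mantissa field starts with the digits 100011.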

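Let us represent the binary number 1100.011 in 16 bits.

Read n -> 1100.011

Normalize -> 1.100011 E 00110101

0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |

Reading the bits from left to right: the first bit is the sign (0, positive), the next eight bits are the exponent (00110101), and the last seven bits are the mantissa digits (1100011). The exponent uses excess-50 notation: the point was shifted 3 places, and 3 + 50 = 53, which is 00110101 in binary. For understanding how to perform normalization, you can follow the link in references.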
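The 16-bit example can be reproduced with a short Python sketch. It assumes the layout read off the bit row above (1 sign bit, 8 exponent bits in excess-50, 7 mantissa bits) and that extra mantissa digits are simply cut off. The function name `encode16` and the constants are hypothetical.

```python
EXCESS = 50     # exponent offset used in the example above
EXP_BITS = 8    # width of the exponent field
MANT_BITS = 7   # width of the mantissa field


def encode16(binary_str):
    """Encode a binary string such as '1100.011' as a 16-character bit string."""
    negative = binary_str.startswith("-")
    digits = binary_str.lstrip("+-")
    int_part, _, frac_part = digits.partition(".")
    all_digits = int_part + frac_part
    first_one = all_digits.find("1")
    if first_one == -1:
        raise ValueError("zero has no normalized form in this sketch")
    # Exponent of the 1.xxx form: distance from the first 1 to the point.
    exponent = len(int_part) - first_one - 1
    stored = exponent + EXCESS
    if not 0 <= stored < 2**EXP_BITS:
        raise OverflowError("exponent does not fit in the exponent field")
    # Keep the first MANT_BITS digits (truncating) and pad with zeros.
    mantissa = all_digits[first_one:first_one + MANT_BITS].ljust(MANT_BITS, "0")
    return ("1" if negative else "0") + format(stored, "08b") + mantissa


print(encode16("1100.011"))   # same bit pattern as the row above
```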

### Normalization:

A normalized number has two parts, the mantissa and the exponent. For our example, we will use a 16-bit representation.
For the mantissa, shift the point left or right until there is a non-zero digit immediately to its right and only a zero to its left. Then fill the remaining mantissa bits with zeros.

The number of shifts gives the exponent: a shift of the point to the left is written as E+(number of shifts), and a shift to the right as E-(number of shifts).

Example:

100.011

Mantissa: 0.1000110

Exponent: E+3, or E+00000011 (binary)
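The shifting rule can be sketched in Python as follows. This is only an illustration working on digit strings; the function name `normalize` is made up, and the mantissa is returned without zero padding.

```python
def normalize(number):
    """Return (mantissa, shifts) so that the mantissa has the form 0.d..., d != 0.

    A positive shift count means the point moved left (E+),
    a negative one means it moved right (E-).
    """
    int_part, _, frac_part = number.partition(".")
    digits = int_part + frac_part
    first = next((i for i, d in enumerate(digits) if d != "0"), None)
    if first is None:
        raise ValueError("zero cannot be normalized")
    shifts = len(int_part) - first
    return "0." + digits[first:], shifts


mantissa, shifts = normalize("100.011")
sign = "+" if shifts >= 0 else "-"
print(mantissa, f"E{sign}{abs(shifts):08b}")   # 0.100011 E+00000011
```

Because it only moves the point, the same function works for digit strings in any base; the exponent then counts powers of that base.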

### Why normalization:

As we have seen, the computer represents real numbers differently from the way we write them: it stores a combination of bits. Without a convention, numbers could be stored in many different shapes, because the point can sit anywhere among the digits. The computer has no notion of where the point is, so it cannot operate on such numbers and produce the required results unless they follow a definite structure. Normalizing the number provides that structure, so that every number is represented in one standard way.

### Algorithm:

1. Read m, e
2. if MSB = 0
3. sign = '+'
4. else
5. sign = '-'
6. while m ≥ 1, repeat steps 7 and 8
7. m = m / 10
8. e = e + 1
9. while m < 0.1, repeat steps 10 and 11
10. m = m * 10
11. e = e - 1
12. write sign + e + m
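The steps above can be sketched in Python. This is a minimal version under a few assumptions: the sign is taken from the sign of the input rather than from a stored MSB, the loops use the bounds 1 and 0.1 so that the mantissa ends up in the range 0.1 ≤ m < 1, and `decimal.Decimal` is used so that dividing by 10 stays exact. The function name `normalize_decimal` is made up for this example.

```python
from decimal import Decimal


def normalize_decimal(m, e=0):
    """Return (sign, e, m) with 0.1 <= m < 1 and the value equal to m * 10**e."""
    m = Decimal(str(m))
    sign = "-" if m < 0 else "+"
    m = abs(m)
    if m == 0:
        return sign, e, m          # zero has no normalized form
    while m >= 1:                  # point moves left, exponent goes up
        m /= 10
        e += 1
    while m < Decimal("0.1"):      # point moves right, exponent goes down
        m *= 10
        e -= 1
    return sign, e, m


print(normalize_decimal("123.45"))    # mantissa 0.12345, exponent 3
print(normalize_decimal("-0.0042"))   # mantissa 0.42, exponent -2
```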
