Search

# Decimal Points

Last updated Dec 7, 2022 Edit Source

Decimal points separate whole numbers from their fractional part.

Consider a binary number as 1001 . 1010 the first part, 1001 is equivalent to 9 whilst the second part is equal to 0.625 (0.5 + 0.125), making the whole number 9.625

## # Converting denary to binary

11.75 -> 11 | 0.75 11 = 1011 0.75 = 0.11

11.75 -> 1011.11

Fixed point binary system:

• It can be noted that using this method, only certain fractional parts can be represented.
• In other words, only the fractional numbers that can be written as a sum of those numbers specified in the table can be converted to binary using this method accurately.
• Consider an 8-bit register, using two bits for the fractional part means there will be only 6 bits to store the whole number.

17.5

-> 17 | 0.5

16 8 4 2 1 1 0 0 0 1

0.5 0.1

10001.10

(correct, use longer word length)

-17.5 01110.01

32.015625 32 | 0.015625

32 16 8 4 2 1 1 0 0 0 0 0

0.15625

0.5 -> 0.25 -> 0.125 -> 0.0625 -> 0.03125 ->0.015625 0.00001

100000.00001

## # Floating-point numbers

1.3x10^7

^7 is an exponent

1.3 is the mantissa

1234 = 0.1234 x 10^4

Step 1 ) Find exponent and check sign bit

Step 2 ) Modulate the mantissa by the exponent

Mantissa is the actual number The exponent is the modifier to be applied to the mantissa.

FPB: 001010.01

32 16 8 4 2 1 . 1/2 1/4

10.25

1001.0010

17.125 (Incorrect, negative due to sign bit)

-17.5 to bin

1001.1

0.111 010

Exponent = 2 011.1010 3 + 1/2 + 1/8 = 3.625 Mantissa = 3.625

0.110111 0100 Exponent = 4 Mantissa =

-32 16 8 4 2 1 1 1 0 1 1 1

-32 + 16 + 4 + 2 +1 = -9

Act: 13.75

## # Rounding Errors

• Binary hits some issues when we deal with some numbers that aren’t factors of 2.
• It can’t be done accurately, we have to settle on being as close as possible.

### # Absolute Error

• The difference between the number you’re trying to calculate and the number you’ve managed to store.

### # Relative Error

• Essentially the same as a relative error, but it is expressed as a percentage.

• It gives us an idea of the scale of the error that we’re dealing with and how much we can trust the numbers.

• Divide the absolute error by the number that we’re trying to represent, and times the result by 100 to get a percentage.

• So 0% is entirely accurate, and 100% is completely inaccurate.

• The closer the percentage value is to 0, the greater the accuracy of the number.

Target Number: 25000 Absolute Error: 0.5 Relative Error: 0.002%

Target Number: 100 Absolute Error: 0.5 Relative Error: 0.5%

Target Number: 5000 Absolute Error: 0.5 Relative Error: 0.01%

Target Number: 10 Absolute Error: 0.00000005 Relative Error: 0.0000005%

#### # FLOP to Denary

0.010 0110

Exponent: 6

0.010

0.100 1.0000 10 100 1000 10000 16 8 4 2 1

16

0.1101 011

Exponent: 3

0110.1 4 2. 1/2

6.5

## # Normalisation

The process of improving the accuracy of a number with decimal points.

46321 (denary) can be represented as 0.46321x10^4 0.00463x10^6 ( less accurate)

Out of these representations, the first is the most precise.

This same logic can be applied to binary numbers.

An unormalised positive number consists of a sign bit (0) and one or more zeros after the decimal point.

For a normalised positive binary number, the sign bit is 0 and the bit after the sign bit is always 1.

To normalise a binary number, there must be a 1 after the decimal point. The mantissa may be lessened and the exponent will be directly increased.

To normalised a negative binary number, the first bit of both the mantissa and the exponent is 1.

The exponent may also be lessened and the mantissa increased.

01 10

• left
• right

3.5 = 00011.100

0.1110 010

### # Standard Form

If you need to add two numbers together, they must be converted into the same form.  So 4.63x10^6 cannot be added to 4.63x10^8 until the power is normalised.