## Description

CS2035 – Assignment 1

IEEE 754 Floating Point Numbers

Introduction

This MATLAB assignment requires you to write four (4) MATLAB functions, each contained

in its own script file. The first two convert a signed integer to 8 bit decimal (int2bin8.m) and

convert an 8 bit decimal to a signed integer (bin2int8.m). The second two functions convert

a decimal number to a 32 bit binary floating-point representation (dec2bin32.m) and a 32 bit

binary string encoding a floating-point number to a decimal number (bin2dec32.m).

You will learn about:

• binary representation of an 8-bit signed integer;

• IEEE 754 binary representation of a single precision (32 bit) floating-point number;

• loop control statements;

• if-then-else control statements;

• strings and substrings;

• creating MATLAB functions.

This assignment is worth 8% of the course mark.

IEEE 754 Floating Point Numbers

Introduction

In 1985 the Institute of Electrical and Electronics Engineers (IEEE) established a technical

standard for floating-point computation, which ensures that (to a reasonable extent) the implementation of of floating-point arithmetic on different systems is reliable and code is portable.

CS2035: Assignment 1 – 2018 2

The full standard is very complex, so we will just learn about the basics of representing floatingpoint numbers in this standard.

We are all familiar with scientific notation for decimal numbers, which represents a number

as some decimal number between 1 and 9.9 (called the mantissa) multiplied by 10 (the base)

to an integer exponent. The mantissa is always restricted to a certain number of decimal digits

(determining the precision of the representation of a number). Thus, 10π can be represented

approximately in scientific notation as

3.141592 × 101

, (1)

where the mantissa here has a precision of 6 decimal digits.

The IEEE 754 standard for binary representation of floating point numbers works in much

the same way, but using 2 rather than 10 for the base, and binary for the mantissa and exponent.

To move in this direction, if we use base 2 and a binary mantissa (using decimal for the exponent

for now), 10π can be represented approximately in scientific notation as

1.1111012 × 2

4

, (2)

where the mantissa here has a precision of 6 binary digits. The decimal equivalent of this

number is 31.25 = 24 + 23 + 22 + 21 + 20 + 2−2

, which is less than but close to the decimal

value 31.41592 in (1) above. To recover the precision of (1), we actually require a precision of

17 binary digits! But what we have here is not yet an IEEE 754 floating-point number.

Single Precision Floating-Point Numbers

For this assignment we will focus on the representation of single precision floating-point numbers. Such numbers use 32 binary digits to represent a floating-point number. This breaks

down into 1 bit for the sign (0 for positive, 1 for negative), 8 bits for the signed exponent

(giving values between −126 and +127), and 24 bits for the mantissa, i.e., 23 digits after the

decimal place (precision of 23 binary digits).

You will notice that 1 + 8 + 24 = 33, so you might think I have made an error. In fact, this

is correct. The reason is that in binary the integer part of the mantissa is always 1, so that

CS2035: Assignment 1 – 2018 3

it does not need to be stored in memory. In decimal, the integer part of the mantissa can be

any number from 1 to 9, but in binary the only possibility is 1. Thus, we only need 23 bits to

represent a 24 bit mantissa in binary.

We will illustrate the 32 bit single float representation by writing 10π in binary. To do this

we need the sign bit, which is zero here, the binary representation of the exponent, and the full

binary mantissa. So we start with the sign bit, giving us

sign exponent mantissa

0 · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·

From (2) we already have part of the mantissa, so let’s fill that in:

sign exponent mantissa

0 · · · · · · · · 1 1 1 1 0 1 · · · · · · · · · · · · · · ·

Notice that I only copied the six digits after the decimal point because we don’t need to store

the leading 1.

8 Bit Signed Integers

Now for the exponent. We see from (2) that the exponent is 4. To represent this as an 8 bit

signed integer we need to apply a bias. With 8 bits we can represent the numbers 0 to 255. To

get negative numbers we subtract the bias, which here will be 127 = 27 − 1. Subtracting the

bias gives us numbers from −127 to 128. Now, in IEEE 754, the boundary values −127 and 128

are reserved for special quantities. For normal single precision floating-point numbers

the exponent takes values from −126 to 127.

To find the binary representation of a signed decimal integer, we need to add the bias (127)

and compute the binary value of the result. So for −126, adding the bias gives 1, which in 8

bit binary is just 0000 0001. Similarly, for +127 adding the bias gives 254, which is 1111 1110

in 8 bit binary. For 0, adding the bias yields 127, which is 0111 1111 in 8 bit binary. For +4,

the exponent in (2), we add the bias to get 131 = 128 + 2 + 1, which is 1000 0011. Thus, we

can now fill in the place of the binary exponent:

sign exponent mantissa

0 1 0 0 0 0 0 1 1 1 1 1 1 0 1 · · · · · · · · · · · · · · ·

CS2035: Assignment 1 – 2018 4

Computing A Binary Mantissa

To fill in the mantissa, we can use the same algorithm as presented in class to divide the decimal

number by successively smaller powers of 2, counting a 0 if the power of 2 does not divide the

number, and counting a 1 if the power of 2 does divide it, and continuing the process with the

remainder, stopping when all of the bits have been filled. When we do this for 10π, we can fill

in the rest of the mantissa to obtain

sign exponent mantissa

0 1 0 0 0 0 0 1 1 1 1 1 1 0 1 1 0 1 0 1 0 0 1 1 1 0 1 1 0 0 0 1

We therefore now have the IEEE 754 single precision representation of 10π. Expressed as a

binary string, this is 0100 0001 1111 1011 0101 0011 1101 0001, giving 4 bytes of data.

For most numbers, the exponent takes some value between -126 and 127 inclusive, as was

pointed out above. The only exceptions to this we consider are 0 and numbers that are larger

than can be represented in the system. We will treat any number with absolute value smaller

than 2−126 as floating-point zero 0, which has the binary string

0 = 0000 0000 0000 0000 0000 0000 0000 0000

2

128 and larger numbers cannot be approximated in the single precision number system and are

represented by Inf, where

Inf = 0111 1111 1000 0000 0000 0000 0000 0000.

Numbers −2

128 or less similarly are represented by -Inf, where

-Inf = 1111 1111 1000 0000 0000 0000 0000 0000.

CS2035: Assignment 1 – 2018 5

Part I: Converting an Integer to Binary and Back

For the first part of this assignment you will write a MATLAB function int2bin8 to convert

an integer x in the range −127 ≤ x ≤ 128 into its binary representation, and a second function

bin2int8 to convert from binary to integer.

To get you started, the following pseudocode gives you the basic structure of int2bin8:

function s = int2bin8(x)

% int2bin8:

% Explanation of what the function does

is x in the range [-127,128]?

s = ’’;

num = x + bias

for i from 7 to 0 decreasing by 1

compute remainder of num and 2^i

concatenate ’0’ or ’1’ to s accordingly

reset the value of num

end

For the bin2int8 function, which takes an 8 bit binary string s as input and outputs an

integer x, your function should start with

function x = bin2int8(s)

% bin2int8:

% Explanation of what the function does

You will then need to ensure that the input is indeed an 8 bit binary string (for which the

strlength and assert functions will be useful), check the bits of the input for which powers

of 2 to add to the result, and apply the bias to compute the decimal value for the output x.

CS2035: Assignment 1 – 2018 6

Part II: Converting a Decimal to a Single Precision Float and Back

For the second part of the assignment, you will use your code from part I to write a MATLAB

function dec2bin32 to convert a decimal number to a 32 bit float, and a second function

bin2dec32 to convert a 32 bit float to its decimal value.

For this part, there are three things that need to be kept track of: the sign; the exponent;

and the mantissa. For dec2bin32 the algorithm is the following

function s = dec2bin32(x)

% dec2bin32:

% Explanation of what the function does

is abs(x) < 2^-126? (zero case)

is abs(x) = 2^128? (+/- Inf case)

out = ’’;

% set sign bit

check the sign of x and set the appropriate sign bit of s

% set exponent bits

find the first power of 2 smaller than x to determine the exponent e

convert e to binary8 using your int2bin8 and concatenate the result to s

% set mantissa bits

num = remainder of x and 2^e % extract ’hidden bit’ portion of mantissa

for i from 1 to 23 by 1

rem = remainder of num and 2^(e-i)

concatenate ’0’ or ’1’ to s accordingly

reset the value of num

end

Note that if you hit one of the special cases at the beginning, the execution of the function

should end and return to the calling function. For finding the value of the exponent e, you may

find the log2 and floor functions useful.

For the bin2dec32 function, which takes a 32 bit binary string as input and outputs a

decimal number x, your function should start with

CS2035: Assignment 1 – 2018 7

function x = bin2dec32(s)

% bin2dec32:

% Explanation of what the function does

You need to ensure that the input is indeed a 32 bit binary string, extract the sign bit, the

exponent bits and the mantissa bits, convert the exponent to decimal, write a loop to add

the powers of 2 as specified by the exponent and mantissa and then apply the correct sign.

Remember to take into account the hidden bit of the mantissa!

Part III: Program Validation

Write a fifth MATLAB script file testConversions.m that begins with

%% CS2035B Assignment 1: Testing Integer and Floating Point Conversions

%% Identification

% Your Name:

% Your Student Number

and should have a section heading (line with two % signs) for each type of conversion you test.

This script file should:

• call bin2int8(int2bin8(x)) for several integers you choose, each integer individually

denoted as x;

• call int2bin8(bin2int8(s)) for several 8 bit binary strings you choose, each string

individually denoted as s;

• call dec2bin32(x) for a list of decimal numbers you choose, each decimal individually

denoted as x, to generate a corresponding list of 32 bit binary strings encoding single

precision floats, each individually denoted as s; and then

• call bin2dec32(dec2bin32(x)) and dec2bin32(bin2dec32(s)) for these lists of decimal

and binary floating-point numbers.

Do you get the expected results?

CS2035: Assignment 1 – 2018 8

Submitting the Assignment

To complete the assignment, make a new folder in your OWL Drop Box called A1 and place

in that folder your five (5) MATLAB script files together with the file testConversions.pdf

obtained by running MATLAB’s Publish on testConversions.m. You must include the

pdf output to receive full marks for your assignment.