Double check my floating point logic

Started by johnsa, November 26, 2008, 09:12:34 PM


johnsa

So.. I need to convert the decimal integer -1234 into 12-bit floating point. Not IEEE floating point with its special tricks and special cases; this format stores the leading 1 explicitly in the mantissa.
So here was the thought process...

-1234 ... the binary version of Abs(-1234) = 1234 is... 10011010010.0
Given that the floating point format is 12-bit, we have 1 sign bit, 5 bits for the characteristic and 6 bits for the mantissa.
So we need to truncate the number above to 6 significant bits and convert it to binary normalized exponential form... which gives: 0.100110 x 2^11

The exponent bias is 2^(m-1)-1, where m = number of bits in the characteristic... so we have a bias of 15 here...

the characteristic is thus 11 + 15 = 26 = 11010
so the final floating point representation is
1 11010 100110 (sign, characteristic, mantissa) = 111010100110.

If I convert this back into a decimal integer I get -1216... now either my logic is flawed, or it is in fact correct and the loss of bits in the mantissa causes the error.
If this is not correct, any suggestions as to how -1234 would be stored in 12-bit floating point?
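
To double-check the arithmetic, here is a minimal sketch in C, assuming the layout described above (1 sign bit, 5-bit characteristic with bias 15, 6-bit mantissa storing the leading 1 explicitly, truncation rather than rounding). The helper names encode12 and decode12 are mine, not part of the assignment, and zero is not handled specially:

#include <stdio.h>
#include <stdlib.h>

/* hypothetical 12-bit format: 1 sign | 5 characteristic (bias 15) | 6 mantissa,
   leading 1 kept explicitly (no IEEE hidden bit); truncation, no rounding */

static unsigned encode12(int value)
{
    unsigned sign = value < 0;
    unsigned mag  = (unsigned)abs(value);

    /* exp = bit length of mag, so that mag = 0.1xxxxx... * 2^exp in binary */
    int exp = 0;
    for (unsigned t = mag; t != 0; t >>= 1)
        exp++;

    /* keep the top 6 bits of the magnitude, truncating the rest */
    unsigned mant = exp > 6 ? mag >> (exp - 6) : mag << (6 - exp);

    unsigned characteristic = (unsigned)exp + 15;   /* bias 2^(5-1)-1 = 15 */
    return (sign << 11) | (characteristic << 6) | mant;
}

static int decode12(unsigned bits)
{
    unsigned sign = (bits >> 11) & 1;
    int exp       = (int)((bits >> 6) & 0x1F) - 15; /* remove the bias */
    unsigned mant = bits & 0x3F;                    /* 0.mmmmmm * 2^exp */

    int mag = exp > 6 ? (int)(mant << (exp - 6)) : (int)(mant >> (6 - exp));
    return sign ? -mag : mag;
}

int main(void)
{
    unsigned bits = encode12(-1234);
    printf("encoded: %X\n", bits);            /* EA6 = 1 11010 100110 */
    printf("decoded: %d\n", decode12(bits));  /* -1216 */
    return 0;
}

Running it prints encoded: EA6 (binary 1 11010 100110) and decoded: -1216, matching the round trip above.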

Roger

Hi Johnsa,

It is the loss of bits in the mantissa.

The smallest step you can represent with a 6-bit mantissa is 0.000001. With an exponent of 2^11 this is 32. Thus your 1234 can only be stored to the nearest 32, i.e. 1184, 1216, 1248, 1280, etc.

I am not sure why you need to add the exponent bias, since it increases the size of the characteristic by one bit. Without it you would have 7 bits for the mantissa, giving you accuracy to the nearest 16.

Accuracy to the nearest 1 needs 11 bits (for numbers up to 2048) plus a sign bit, which leaves 0 bits for the characteristic, so you are stuck with fixed point.
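
A quick numeric check of the spacing argument, as a sketch in C under the same assumed 1/5/6 layout (the loop bounds are just the mantissa values nearest 1234):

#include <stdio.h>

int main(void)
{
    /* normalized 6-bit mantissas 0.1xxxxx are the integers 32..63;
       with exponent 11, each step of 1 in the mantissa is worth
       2^(11-6) = 32 in the represented magnitude */
    for (unsigned mant = 37; mant <= 40; mant++)
        printf("mantissa %u -> %u\n", mant, mant << 5);
    return 0;
}

This prints 1184, 1216, 1248 and 1280, the neighbours listed above.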

Regards Roger

raymond

Just curious as to why you chose 12 bits instead of some multiple of 8 bits?
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

johnsa

Thanks for that; I just wanted to double-check that the answer I came up with was at least correct. I thought it was, but at 2am I was doubting what was left of my brain! :)

The choice of 12 bits wasn't really up to me, unfortunately; it's part of an assignment in discrete maths/digital circuit design.

Rockoon

12-bit data is actually fairly common.

For example, many DACs and ADCs were 12-bit not so long ago.

I imagine these or their offspring are now going as surplus and being loaded into embedded systems that don't need 16-bit precision... probably even into digital cameras.

(also remember that there were 36-bit computers well before Intel was putting out 32-bit ones, and 12 divides evenly into 36...)

Also, don't some video formats use 12-bit floats?
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.