Double check my floating point logic

Started by johnsa, November 26, 2008, 09:12:34 PM


johnsa

So.. I need to convert the decimal integer -1234 into 12-bit floating point. Not IEEE floating point with its special tricks and special cases; this format stores the leading 1 explicitly in the mantissa.
So here was the thought process...

-1234 ... the binary version of Abs(-1234) = 1234 is... 10011010010.0
Given that the floating point format is 12-bit, we have 1 sign bit, 5 bits for the characteristic and 6 bits for the mantissa.
So we need to truncate the number above to 6 significant bits and convert it to binary normalized exponential form... which gives: 0.100110 x 2^11

The exponent bias is 2^(m-1)-1, where m = number of bits in the characteristic... so we have a bias of 15 here...

the characteristic is thus 11 + 15 = 26 = 11010
so the final floating point representation is
1 11010 100110 (sign, characteristic, mantissa) = 111010100110.

If I convert this back into a decimal integer I get -1216... now either my logic is flawed, or it is in fact correct and the loss of bits in the mantissa causes the error.
If this is not correct, any suggestions as to how -1234 would be stored in 12-bit floating point?
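
To double-check the arithmetic, here is a minimal sketch in C, assuming the layout described above (1 sign bit, 5-bit characteristic with bias 15, 6-bit mantissa storing the leading 1 explicitly, truncation rather than rounding). The helper names encode12 and decode12 are mine, not part of the assignment, and zero is not handled specially:

#include <stdio.h>
#include <stdlib.h>

/* hypothetical 12-bit format: 1 sign | 5 characteristic (bias 15) | 6 mantissa,
   leading 1 kept explicitly (no IEEE hidden bit); truncation, no rounding */

static unsigned encode12(int value)
{
    unsigned sign = value < 0;
    unsigned mag  = (unsigned)abs(value);

    /* exp = bit length of mag, so that mag = 0.1xxxxx... * 2^exp in binary */
    int exp = 0;
    for (unsigned t = mag; t != 0; t >>= 1)
        exp++;

    /* keep the top 6 bits of the magnitude, truncating the rest */
    unsigned mant = exp > 6 ? mag >> (exp - 6) : mag << (6 - exp);

    unsigned characteristic = (unsigned)exp + 15;   /* bias 2^(5-1)-1 = 15 */
    return (sign << 11) | (characteristic << 6) | mant;
}

static int decode12(unsigned bits)
{
    unsigned sign = (bits >> 11) & 1;
    int exp       = (int)((bits >> 6) & 0x1F) - 15; /* remove the bias */
    unsigned mant = bits & 0x3F;                    /* 0.mmmmmm * 2^exp */

    int mag = exp > 6 ? (int)(mant << (exp - 6)) : (int)(mant >> (6 - exp));
    return sign ? -mag : mag;
}

int main(void)
{
    unsigned bits = encode12(-1234);
    printf("encoded: %X\n", bits);            /* EA6 = 1 11010 100110 */
    printf("decoded: %d\n", decode12(bits));  /* -1216 */
    return 0;
}

Running it prints encoded: EA6 (binary 1 11010 100110) and decoded: -1216, matching the round trip above.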

Roger

Hi Johnsa,

It is the loss of bits in the mantissa.

The smallest step you can represent with a 6-bit mantissa is 0.000001. With an exponent of 2^11 this is 32. Thus your 1234 can only be stored to the nearest 32, i.e. 1184, 1216, 1248, 1280, etc.

I am not sure why you need to add the exponent bias, since it increases the size of the characteristic by one bit. Without it you would have 7 bits for the mantissa, giving you accuracy to the nearest 16.

Accuracy to the nearest 1 needs 11 bits (for numbers up to 2048) plus a sign bit, which leaves 0 bits for the characteristic, so you are stuck with fixed point.
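
A quick numeric check of the spacing argument, as a sketch in C under the same assumed 1/5/6 layout (the loop bounds are just the mantissa values nearest 1234):

#include <stdio.h>

int main(void)
{
    /* normalized 6-bit mantissas 0.1xxxxx are the integers 32..63;
       with exponent 11, each step of 1 in the mantissa is worth
       2^(11-6) = 32 in the represented magnitude */
    for (unsigned mant = 37; mant <= 40; mant++)
        printf("mantissa %u -> %u\n", mant, mant << 5);
    return 0;
}

This prints 1184, 1216, 1248 and 1280, the neighbours listed above.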

Regards Roger

raymond

Just curious as to why you chose 12 bits instead of some multiple of 8 bits?
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

johnsa

Thanks for that; I just wanted to double-check that the answer I came up with was at least correct. I thought it was, but at 2am I was doubting what was left of my brain! :)

The choice of 12 bits wasn't really up to me, unfortunately; it's part of an assignment in discrete maths/digital circuit design.

Rockoon

12-bit data is actually fairly common.

For example, many DACs and ADCs were 12-bit not so long ago.

I imagine these or their offspring are now going as surplus and being loaded into embedded systems that don't need 16-bit precision... probably even into digital cameras.

(also remember that there were 36-bit computers well before Intel was putting out 32-bit ones, and 12 divides evenly into 36...)

Also, don't some video formats use 12-bit floats?
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.