
math coprocessor

Started by ninjarider, August 04, 2005, 12:42:16 PM


ninjarider

I thought there was an FPU tutorial among the tutorials that came with MASM. I need to know what I have to do to set up the FPU (and whether I actually need to), how to move numbers to the FPU, how to use the sine and cosine functions, and how to get my number back. I think I read some stuff about it in the HLA material.

pro3carp3

LGC

raymond

Although the following link seems to be the same as the one reported above, the html encoding of pro3carp3's message seems to have been messed up and his link does not work. This one should.

http://www.ray.masmcode.com/fpu.html

Raymond
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

ninjarider

Ray, I've been reading your tutorial and plan to finish it (it may take days), which is why I asked the question in the first place.

Mark Jones

"Days" is not fast enough?

::)
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

ninjarider

Yeah, but when I learn something new I like to learn all of it and understand all of it.

Mark Jones

"Patience Daniel-san, patience..."

ninjarider

Since the program that I'm writing does a lot of FPU operations like sin, cos, add, multiply, and divide, would I be able to scatter these instructions around in the code so that I don't have to wait? And would the following code be a good idea?

fmul st, st(1)
...some code...
fwait
fdiv st, st(2)
...some code...
fwait
fsin
...some code...
fwait
fmul st, st(3)


A question on FPU clock cycles:
I'm looking at a chart that says

fsin        | 486 257-354 *

Does that mean that fsin can take anywhere from 257 to 354 clock cycles?

raymond

First, you rarely need to use the fwait instruction with modern processors, although I still tend to use it myself as a precaution when storing data from the FPU to memory before using that data with the CPU.
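As a rough illustration of that precaution, here is a minimal sketch of a fragment meant to sit inside an existing MASM program. The variable name `result` is hypothetical, and the fwait is only the cautious habit described above, not something modern processors are known to require:

```asm
.data
result  dd ?                ; hypothetical 32-bit integer variable

.code
    ; assumes a value has already been loaded into st(0)
    fistp result            ; store st(0) to memory as an integer and pop it
    fwait                   ; precaution: let the FPU finish before the CPU reads
    mov   eax, result       ; the CPU now uses the value stored by the FPU
```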

And YES you can intersperse CPU instructions with FPU instructions and they will "normally" run concurrently. However, multiplications and divisions by the CPU and the FPU may use the same hardware and would thus wait on each other.

Another trick to improve speed is to exchange FPU registers to possibly run two FPU instructions concurrently when they don't use the same hardware. For example:

fmul st,st(2)
fxch st(3)
fadd st,st(1)
fsub st,st(4)


While the current content of st(0) would get multiplied by st(2), the current contents of st(3) and st(1) would get added and st(4) subtracted from the result well before the multiplication would get completed. You only have to remember that the result of the multiplication would be in st(3).

Quote: does that mean that the fsin can take anywhere from 257 to 354 clock cycles

The hardware is probably hardcoded with something like a Taylor series to compute the transcendental functions. The time required to reach the error margin would thus vary with the input to the function.

Raymond

ninjarider

With the code that you have, the fxch would move the value of st(3) to st, so how would it not be using the same hardware?
When you say "if it doesn't use the same hardware", does each instruction use a different set of hardware, or are you talking about the register in which the data resides?

Some of the code to be translated:
x = sin(cameraangle.y * pi / 180)
y = sin(cameraangle.x * pi / 180)
z = cos(cameraangle.y * pi / 180)

with cameraposition
    glulookat .x, .y, .z, .x + x, .y + y, .z + z, 0, 1, 0
end with

raymond

The FPU is a complex processor. For example, multiplications are performed in one section while additions are performed in another. Once the data has been sent to the multiplier, it doesn't matter anymore where that data was taken from. The "TOP" register can then be loaded with different data and fed to the adding section, which will run independently.

Obviously, there isn't a separate section for each instruction. The transcendental instructions most probably also use the multiplying section. Some of the more modern processors may also have more than one section of each type and be able to perform parallel computations. In addition, processors could differ from one manufacturer to another (Intel, AMD, etc).

Hardware experts may be able to explain this much better than I can.

FYI: The description of the fsin instruction in the FPU tutorial suggested earlier contains an example of coding to obtain the sine of an angle expressed in degrees.
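For the code being translated above, a minimal sketch of the degrees-to-radians step followed by fsin might look like the fragment below. This is not the tutorial's example; the variable names (`angle_deg`, `deg180`, `sine`) and the 30-degree input are assumptions made for illustration, and the fragment is meant to be dropped into an existing MASM program:

```asm
.data
angle_deg   real8 30.0      ; hypothetical input angle, in degrees
deg180      real8 180.0     ; constant used for the conversion
sine        real8 ?         ; receives sin(angle)

.code
    fld   angle_deg         ; st(0) = angle in degrees
    fldpi                   ; st(0) = pi, st(1) = angle
    fmulp st(1), st         ; st(0) = angle * pi (pi is popped)
    fdiv  deg180            ; st(0) = angle * pi / 180, i.e. radians
    fsin                    ; st(0) = sin(angle)
    fstp  sine              ; store the result and pop, leaving the stack as it was
```

Replacing fsin with fcos gives the cosine needed for the z component in the same way.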

Raymond

ninjarider

Believe it or not, it was a good explanation.

Mark Jones

Raymond is the house ALU genius around here. :) Have you seen his fractal renderer yet?

ninjarider


raymond

If you are interested in fractals, you can get it from the bottom of the following page:

http://www.ray.masmcode.com/complex.html

Raymond