How many registers do I have?

Started by frktons, July 15, 2010, 04:44:02 PM

hutch--

 :bg

Frank,

Quote
It looks like you never get a rest from CPU modifications and upgrades.
Probably you have to stick with whatever is best for a given timeframe
and be ready to change as needed.

Welcome to mixed mode or balanced mode assembler programming.  :P
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Rockoon

Demonstrating the reason why I suggested LEA be tested:

AMD Phenom(tm) II X6 1055T Processor (SSE3)
1896    cycles for 100*div
193     cycles for 100*mul
96      cycles for 100*imul
61      cycles for 100*lea, 2*eax
61      cycles for 100*lea, 2*eax+eax
61      cycles for 100*lea, 2*eax+eax+99
61      cycles for 100*shl 1
61      cycles for 100*shl 2

1896    cycles for 100*div
194     cycles for 100*mul
96      cycles for 100*imul
61      cycles for 100*lea, 2*eax
61      cycles for 100*lea, 2*eax+eax
61      cycles for 100*lea, 2*eax+eax+99
61      cycles for 100*shl 1
61      cycles for 100*shl 2


--- ok ---

AMD never gave up on LEA performance.
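For anyone reading along without JJ's test source, here is a minimal C sketch of what each LEA/SHL form in the table computes. The mapping from test label to instruction is my own reading of the labels (for example "lea, 2*eax+eax+99" taken as `lea eax, [eax*2+eax+99]`), not JJ's actual code, and the function names are hypothetical. It only shows the arithmetic (LEA as multiply by a small constant plus an offset, SHL as multiply by a power of two); it does not measure cycles.

```c
#include <stdint.h>

/* Hypothetical C equivalents of the benchmarked forms (assumed mapping).
   All arithmetic is unsigned 32-bit, so results wrap modulo 2^32 like EAX. */
uint32_t lea_2x(uint32_t eax)    { return eax * 2u; }             /* lea eax, [eax*2]        -> eax*2    */
uint32_t lea_3x(uint32_t eax)    { return eax * 2u + eax; }       /* lea eax, [eax*2+eax]    -> eax*3    */
uint32_t lea_3x_99(uint32_t eax) { return eax * 2u + eax + 99u; } /* lea eax, [eax*2+eax+99] -> eax*3+99 */
uint32_t shl_1(uint32_t eax)     { return eax << 1; }             /* shl eax, 1              -> eax*2    */
uint32_t shl_2(uint32_t eax)     { return eax << 2; }             /* shl eax, 2              -> eax*4    */
```

One difference the C version cannot show: LEA leaves the flags untouched, while SHL updates them, which can matter when choosing between the two in a tight loop.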
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

oex

Thank you guys for your input... I'm reasonably confident that my code is about as fast as it can be. Apart from imul being faster, everything else you've said seems to be pretty much in keeping with the rules I currently follow. I haven't used imul until now, so I might be able to tease a few more cycles out of my code yet :bg I'll take on board what you've said and see what improvements I can make :bg
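One thing worth checking before swapping mul for imul: one-operand `mul r/m32` returns the full unsigned 64-bit product in EDX:EAX, while two-operand `imul r32, r/m32` keeps only the low 32 bits. A plausible reason imul times faster here is that it has one output instead of two, though I haven't checked which forms JJ's test actually uses. The low 32 bits are identical for signed and unsigned operands, so imul is a drop-in replacement only when you don't need EDX. A minimal C sketch of the two results (function names are hypothetical):

```c
#include <stdint.h>

/* What one-operand MUL r/m32 produces: the full unsigned 64-bit product (EDX:EAX). */
uint64_t mul32_full(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}

/* What two-operand IMUL r32, r/m32 keeps: only the low 32 bits of the signed product.
   Converting to uint32_t reduces modulo 2^32, matching the bits left in the register. */
uint32_t imul32_low(int32_t a, int32_t b) {
    return (uint32_t)((int64_t)a * b);
}
```

For example, -1 * 3 through imul and 0xFFFFFFFF * 3 through mul leave the same 0xFFFFFFFD in EAX; only mul also puts 2 in EDX.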
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

frktons

Quote from: Rockoon on July 19, 2010, 01:04:09 PM
Demonstrating the reason why I suggested LEA be tested:

AMD never gave up on LEA performance.

Good to see lea is fine. ;) Could you post the ASM source
as well? I'm living on it for the time being.  :P
Mind is like a parachute. You know what to do in order to use it :-)

Rockoon

Quote from: frktons on July 19, 2010, 05:29:52 PM
Good to see lea is fine. ;) Could you post the ASM source
as well? I'm living on it for the time being.  :P

See JJ's post.

frktons

Oh! Oh! I skipped a couple of posts  :P

Miss lea hasn't improved that much on my CPU, though:

Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
1216    cycles for 100*div
178     cycles for 100*mul
94      cycles for 100*imul
94      cycles for 100*lea, 2*eax
94      cycles for 100*lea, 2*eax+eax
94      cycles for 100*lea, 2*eax+eax+99
62      cycles for 100*shl 1
62      cycles for 100*shl 2

1217    cycles for 100*div
178     cycles for 100*mul
94      cycles for 100*imul
94      cycles for 100*lea, 2*eax
94      cycles for 100*lea, 2*eax+eax
94      cycles for 100*lea, 2*eax+eax+99
62      cycles for 100*shl 1
62      cycles for 100*shl 2


--- ok ---


Thanks JJ for providing all these fine examples.  :clap:

Queue

AMD Athlon(tm) 4 Processor (SSE1)
4217 cycles for 100*div
310  cycles for 100*mul
260  cycles for 100*imul
78   cycles for 100*lea, 2*eax
66   cycles for 100*lea, 2*eax+eax
88   cycles for 100*lea, 2*eax+eax+99
74   cycles for 100*shl 1
67   cycles for 100*shl 2

4221 cycles for 100*div
310  cycles for 100*mul
260  cycles for 100*imul
78   cycles for 100*lea, 2*eax
66   cycles for 100*lea, 2*eax+eax
88   cycles for 100*lea, 2*eax+eax+99
75   cycles for 100*shl 1
66   cycles for 100*shl 2

Queue

KeepingRealBusy


Intel(R) Pentium(R) 4 CPU 3.20GHz (SSE2)
5918    cycles for 100*div
1020    cycles for 100*mul
460     cycles for 100*imul
213     cycles for 100*lea, 2*eax
91      cycles for 100*lea, 2*eax+eax
193     cycles for 100*lea, 2*eax+eax+99
95      cycles for 100*shl 1
90      cycles for 100*shl 2

5838    cycles for 100*div
1013    cycles for 100*mul
466     cycles for 100*imul
196     cycles for 100*lea, 2*eax
99      cycles for 100*lea, 2*eax+eax
197     cycles for 100*lea, 2*eax+eax+99
87      cycles for 100*shl 1
87      cycles for 100*shl 2


--- ok ---

AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ (SSE3)
4344    cycles for 100*div
210     cycles for 100*mul
79      cycles for 100*imul
70      cycles for 100*lea, 2*eax
85      cycles for 100*lea, 2*eax+eax
78      cycles for 100*lea, 2*eax+eax+99
28      cycles for 100*shl 1
61      cycles for 100*shl 2

4425    cycles for 100*div
193     cycles for 100*mul
96      cycles for 100*imul
76      cycles for 100*lea, 2*eax
62      cycles for 100*lea, 2*eax+eax
61      cycles for 100*lea, 2*eax+eax+99
78      cycles for 100*shl 1
81      cycles for 100*shl 2


--- ok ---

Ficko

Quote from: BogdanOntanu on July 15, 2010, 07:31:57 PM
Quote from: oex on July 15, 2010, 05:52:11 PM
Hey Bogdan, I was wondering: is this an OS design choice or a CPU setting?...

It is a CPU design setting, a design choice made by AMD.

Quote
At the lowest level (the OS) you could have switching, right?

Not sure what you are asking... but if I guess right, the answer is NO.

However, you do not have to, since you can run both 32-bit and 64-bit executables on a 64-bit OS. You just cannot mix them because the CPU forbids it.


I know this answer from "Bogdan" is a little old, but I just ran into this: http://vxheavens.com/lib/vrg16.html
It looks like YASM already has this "trick" built in. I'm wondering how this could be done in MASM ::) to mix 32-bit and 64-bit source code?