The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: frktons on July 15, 2010, 04:44:02 PM

Title: How many registers do I have?
Post by: frktons on July 15, 2010, 04:44:02 PM
Hello everybody. I've been a little busy recently with
learning C, so my MASM32 experiments are waiting
for available time.

But I want to do some experiments nevertheless, and
because I've already done some ADD, MOV, dereferencing
and so on, I'm wondering what else I can do, what
kind of instructions I can try, and what registers are available
on my machine other than the 8 general-purpose 32-bit registers and
their sub-registers, so to speak, plus the FLAGS register and the stack registers.

According to the Intel Processor Identification Utility, my PC
has a Core 2 Duo E6600 CPU at 2.40 GHz, which is
an x64-class processor and supports SSE3 instructions.

I'm asking myself, and anybody who knows better than me,
for some basic information about how many registers of which type
there are in that machine, what they are used for,
whether I can move data from EAX, for example, to an MMX register,
and things like that.

I think all this information is available in the Intel manuals, but before
diving into them and getting lost, I'd like some general explanation,
if you can help me or give me a link to this information.

One wonderful thing would be a complete 3-line example of
the general ideas:



««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« *

    .686                                    ; create 32 bit code [?]
    .model flat, stdcall                    ; 32 bit memory model
    option casemap :none                    ; case sensitive

    include \masm32\include\windows.inc     ; always first
    include \masm32\macros\macros.asm       ; MASM support macros

  ; -----------------------------------------------------------------
  ; include files that have MASM format prototypes for function calls
  ; -----------------------------------------------------------------
    include \masm32\include\masm32.inc
    include \masm32\include\gdi32.inc
    include \masm32\include\user32.inc
    include \masm32\include\kernel32.inc

  ; ------------------------------------------------
  ; Library files that have definitions for function
  ; exports and tested reliable prebuilt code.
  ; ------------------------------------------------
    includelib \masm32\lib\masm32.lib
    includelib \masm32\lib\gdi32.lib
    includelib \masm32\lib\user32.lib
    includelib \masm32\lib\kernel32.lib
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

start:                          ; The CODE entry point to the program

    some code here to add or move a register X64 to another
    and print the result, or the use of MMX, EMMX and the like
    exit


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end start                       ; Tell MASM where the program ends



I mean something I can assemble with the MASM32 package. I'm using
ML from VS2010, so I think it can assemble the most recent instructions.

Thanks for your patience.

Frank
Title: Re: How many registers do I have?
Post by: BogdanOntanu on July 15, 2010, 05:27:48 PM
While you run on a 32-bit OS and/or compile targeting a 32-bit executable, you will not be able to access the additional 64-bit general purpose registers. Basically, when the CPU is in 32-bit mode or executes a 32-bit executable, the x64 registers are off limits even if your CPU is x64 capable.

To put it bluntly, in 32 bits you do not have access to RAX, RCX, RDX, ... and R8, R9 to R15, nor to XMM8 ... XMM15. You can only use EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI and XMM0 to XMM7, even if your CPU is x64 capable. Access to the FPU and MMX registers is much the same in 32-bit and 64-bit mode.

Assuming that you run a 64-bit OS like Windows 7 x64, you could use the x64 general purpose registers, but only if you compile for a PE32+ (in fact PE64) executable format target.

The MASM assembler provided in the MASM32 package cannot do this. You will have to use the 64-bit version of ML. Unfortunately, the 64-bit version of ML does not yet support invoke and many of the more advanced features of the 32-bit ML, like .IF, .ELSEIF, .WHILE, etc.

Hence you can try JWASM, or GoASM, or humbly my own assembler: SOL_ASM.

JWASM is the most compatible with MASM/MASM32.
GoASM is kind of different.... too much for my taste, but some people here swear by it and it is part of these forums.
Sol_ASM is somewhere in the middle (you have invoke like in MASM, but other things like the include/structures/db/PROC format are more like in TASM).

Then on other sites you can also find FASM or NASM or YASM, with even more syntax differences compared to MASM.

I suggest that you gain confidence in the 32-bit world with MASM32 and move to x64 only later, because the transition is not exactly an easy ride, but also not very complicated once you know 32 bits well enough. Otherwise you might get confused, unless you have a solid 32-bit conceptual base to fall back on.

IMHO JWASM is your best option now if you want to try x64 with a MASM32-like syntax. Or, as a biased opinion, my own Sol_Asm :D

As for moving data between the GPR registers and the FPU/MMX/SSE3/XMM registers, some restrictions do apply, but you will just have to learn them... your question is too vague to be answered briefly and clearly.
Title: Re: How many registers do I have?
Post by: frktons on July 15, 2010, 05:45:19 PM
Thanks, BogdanOntanu.

My machine is X64 and my OS is WIN7/64 bit, but I'm not trying
to shift to 64 bit Assembly for the time being. I'm just a beginner
so I'm going to stay on 32 bit MASM for a while.

According to Wikipedia:
Quote
MMX defined eight registers, known as MM0 through MM7 (henceforth referred to as MMn). To avoid compatibility problems with the context switch mechanisms in existing operating systems, these registers were aliases for the existing x87  FPU stack registers (so no new registers needed to be saved or restored). Hence, anything that was done to the floating point stack would also affect the MMX registers and vice versa. However, unlike the FP stack, the MMn registers are directly addressable (random access).

Each of the MMn registers holds 64 bits (the mantissa-part of a full 80-bit FPU register). The main usage of the MMX instruction set is based on the concept of packed data types, which means that instead of using the whole register for a single 64-bit integer, two 32-bit integers, four 16-bit integers, or eight 8-bit integers may be processed concurrently.

The mapping of the MMX registers onto the existing FPU registers made it somewhat difficult to work with floating point and SIMD data in the same application. To maximize performance, programmers often used the processor exclusively in one mode or the other, deferring the relatively slow switch between them as long as possible.

Because the FPU stack registers are 80 bits wide, the upper 16 bits of the stack registers go unused in MMX, and these bits are set to all ones, which makes them NaNs or infinities in the floating point representation. This can be used to decide whether a particular register's content is intended as floating point or SIMD data.

MMX provides only integer operations. When originally developed, for the Intel i860, the use of integer math made sense (both 2D and 3D calculations required it), but as graphics cards that did much of this became common, integer SIMD in the CPU became somewhat redundant for graphical applications. On the other hand, the saturation arithmetic operations in MMX could significantly speed up some digital signal processing applications.

I've got some 8 registers I never use, but that could be useful
for integer operations, or just to store data, or using SSE instructions
on the 64 bit area they provide.

You are right, the question is too vague, I need to express my
idea better.

Let's take a couple of practical examples:

If I run short of 32-bit GP registers, how do I use the MMX registers
to store the content of some 32-bit GP register?
Is it better to push them on the stack and pop them afterwards?
If so, why?
When I have to deal with four 16-bit integers and I want
to perform some SIMD operation on them, like adding 1 to each of them,
is it possible to do it with an MMX register?

I hope I am a little clearer now. I'm just trying to figure out
what I can do with the MMX registers, which have existed since the Pentium MMX,
without moving to x64 assembly. It is not time yet  :P



Title: Re: How many registers do I have?
Post by: oex on July 15, 2010, 05:52:11 PM
Quote from: BogdanOntanu on July 15, 2010, 05:27:48 PM
Basically when the CPU is in 32 bits mode or executes an 32 bits executable THEN x64 register are off limits even if your CPU is x64 capable.

Hey Bogdan, I was wondering, is this an OS design choice or a CPU setting?.... At the lowest level (OS) you could have switching, right? Rather than having to have 2 separate exes like now, but this would have to be factored into the PE equivalent's design

Also does SOL OS work off PE format or do you have your own format?
Title: Re: How many registers do I have?
Post by: jj2007 on July 15, 2010, 05:59:00 PM
google for Tommesani SSE2 - best intro you can get.
Title: Re: How many registers do I have?
Post by: frktons on July 15, 2010, 06:41:53 PM
Quote from: jj2007 on July 15, 2010, 05:59:00 PM
google for Tommesani SSE2 - best intro you can get.

Thanks JJ, I'll have a look.  :8)

Could anyone post a 3-line example as well?

I mean, is the .686 directive to MASM necessary?
Do I have to declare some other directive to use the MMX registers and SIMD/
SSE/SSE2/SSE3 instructions?

Thanks
Title: Re: How many registers do I have?
Post by: BogdanOntanu on July 15, 2010, 07:11:09 PM
Quote from: frktons on July 15, 2010, 05:45:19 PM
...
Thanks, BogdanOntanu.

My machine is X64 and my OS is WIN7/64 bit, but I'm not trying
to shift to 64 bit Assembly for the time being. I'm just a beginner
so I'm going to stay on 32 bit MASM for a while.

If you already run the x64 version of Windows 7 (I do too) then you could concentrate on 32 bits for learning and also, once in a while, test some x64 code just to get your skills updated / introduced to the "new" x64 world...

At least that is what I do: I keep my main focus on 32 bits for now, but I also take long and deep incursions into the x64 world and test programs / applications whenever I feel like it.


Quote
I've got some 8 registers I never use, but that could be useful
for integer operations, or just to store data, or using SSE instructions
on the 64 bit area they provide.

Yes, most of today's machines have the extra MMX and XMM registers. The problem with the MMX registers is that they are aliased over the FPU registers, and the FPU is also needed by many of today's applications. If you can restrict yourself to the integer world then you could use MMX...

However, using XMM is a much better choice... personally I kind of ignore MMX and go directly for XMM when I want to use SSE.

Quote
You are right, the question is too vague, I need to express better my
idea:

Let's make a couple of practical examples:

if I run short of 32 bit GP register, how do I use the MMX registers
to store the content of some 32 bit GP register?
Is it better to push them on the stack and pop them afterwhile?
If so why?

Come on... such questions are naive at best. You cannot gain this kind of experience by asking "what is better and why - make me a list" kind of questions :D

First you need to read the Intel manuals and some tutorials (as proposed here by jj2007) on these SSE instructions.

Then try some simple hands-on tests... then you will understand the basics and gain the much needed neuronal paths and experience, and then you can come up with much more relevant questions if you hit a road block. If no questions arise, improve your simple hands-on tests iteratively (by adding complexity).

If you get a predigested answer then you have data but you have not improved internally. It might be useful in a robotic or production way but it is not useful for your future neuronal development.

Simple hands-on tests and coming back with more exact questions is a better way to learn IMHO.

However, to hint at answers to your questions conceptually:
1) How do you propose to push a 64-bit register on a 32-bit stack?

2) Another thing to note is the Single Instruction Multiple Data aspect: SSE instructions usually operate on multiple smaller data items packed together inside a single MMX/XMM register. It depends on your skills to prepare or to handle data packed this way.

3) They can also help you with "saturation", thus avoiding ".IF eax > 255 / mov eax, 255 / .ENDIF" kind of code that can be time consuming inside an inner loop.
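
As a rough illustration of point 3, here is a minimal, untested sketch. It assumes the MASM32 package (masm32rt.inc and its print/str$ macros) and an ML version that accepts .mmx; the labels src and dst and the data values are made up for this example. PACKUSWB converts signed 16-bit words to unsigned bytes and clamps anything outside 0..255, so no compare-and-branch is needed:

```asm
    include \masm32\include\masm32rt.inc
    .686
    .mmx
    .xmm

    .data
      src   dw 100, 255, 300, 1000      ; four 16-bit inputs (hypothetical data)
      dst   dq 0                        ; receives the packed bytes
    .code
start:
    movq     mm0, QWORD PTR src         ; load the four words into MM0
    packuswb mm0, mm0                   ; words -> bytes with unsigned saturation
    movq     dst, mm0                   ; bytes 0..3 now hold the clamped values
    emms                                ; leave MMX state before any FPU use

    movzx    eax, BYTE PTR [dst+2]      ; third value: 300 was clamped to 255
    print    str$(eax)," clamped value",13,10
    exit
end start
```

The same instruction works on XMM registers, where it clamps eight words per source operand instead of four.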

Quote
When I have to deal with 4 16 bit integer numbers and I want
to perform some SIMD on them, like adding 1 to each of them
is it possible to do it with an MMX register?

Better with an XMM register ;) but yes, generally you can do this, and with saturation.
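
A minimal, untested sketch of that idea, assuming the MASM32 package and an SSE2-capable CPU; the labels vals, ones and res and the data values are invented for illustration. PADDSW adds the 16-bit lanes in parallel with signed saturation:

```asm
    include \masm32\include\masm32rt.inc
    .686
    .mmx
    .xmm

    .data
      vals  dw 10, 20, 30, 32767        ; four signed 16-bit integers (made-up data)
      ones  dw 1, 1, 1, 1               ; the value added to every lane
      res   dw 4 dup(0)                 ; receives the four sums
    .code
start:
    movq    xmm0, QWORD PTR vals        ; low 64 bits of XMM0 = the four words
    movq    xmm1, QWORD PTR ones
    paddsw  xmm0, xmm1                  ; four additions at once, signed saturation
    movq    QWORD PTR res, xmm0         ; store the low 64 bits back

    movsx   eax, WORD PTR [res+6]       ; last lane: 32767 + 1 stays at 32767
    print   str$(eax)," last lane after saturated add",13,10
    exit
end start
```

Using PADDW instead would wrap the last lane around to -32768. The same code with mm0/mm1 also works on MMX, but then an EMMS is needed before any FPU code.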

Quote
I hope I am a little clearer now. I'm just trying to figure out
what I can do with the MMX registers, which have existed since the Pentium MMX,
without moving to x64 assembly. It is not time yet  :P

x64 will give you access to the extra XMM8...XMM15, but otherwise the instructions behave pretty much the same as in 32 bits.

Since you already use an x64 OS you do not have to restrain yourself because of this.

The real problem with x64 is that it uses another calling convention and the tools are not mature enough yet (for example OllyDbg does not work), and being a beginner it is of no use to deal with 2 (two) problems at the same time. You will not know if the problem is from x64 or from your handling of SSE.

However, as I have said above, by all means do take a peek into x64 if you are there ;)
Title: Re: How many registers do I have?
Post by: clive on July 15, 2010, 07:13:36 PM
004014F8 0F6EC1                 movd    mm0,ecx
004014FB 0F7EC1                 movd    ecx,mm0


http://www.tommesani.com/MMXDataTransfer.html
Title: Re: How many registers do I have?
Post by: BogdanOntanu on July 15, 2010, 07:16:08 PM
Quote from: frktons on July 15, 2010, 06:41:53 PM
Could anyone post some 3 lines example as well?

Not me :))

Quote
I mean the .686 directive to MASM is necessary?
Have I to declare some other directive for using MMX registers and SIMD/
SSE/SSE2/SSE3 instructions?

Thanks

A quick search of the forums reveals: .XMM
Title: Re: How many registers do I have?
Post by: BogdanOntanu on July 15, 2010, 07:31:57 PM
Quote from: oex on July 15, 2010, 05:52:11 PM
Hey Bogdan, I was wondering is this an OS design choice or a CPU setting?....

It is a CPU design setting. A design choice of AMD.

Quote
At the lowest level (OS) you could you have switching right?

Not sure what you ask... but if I guess right the answer is NO.

However you do not have to since you can run both 32bits and 64 bits executables in a 64bits OS. You just can not mix them because the CPU forbids it.

Quote
rather than having to have 2 seperate exes like now but this would have to be factored into the PE equivalents design

This is more an issue with the CPU than with PE. Hence not possible.

Quote
Also does SOL OS work off PE format or do you have your own format?

For now SOL_OS can load, map and run PE32 files for compatibility reasons (with existing compilers and tools). You can also load and run plain binary files under certain circumstances (older interfaces).

However in SOL_OS you (the programmer) have full control over the machine and hence there is nothing blocking you from switching the CPU into x64 mode and use 64 bits registers and then return to 32 bits mode and /or load and run your own favorite executable.

I do plan to use my own executable format later  but this is not a priority at this moment (a format is pre-designed)

However:
1) One needs experience and deep understanding of existing formats in order to invent another "better" format.
2) There is a huge burden that a new executable format would place on potential developers and tool chains.


Note: this is kind of off-topic for the OP's question.

Hence if you intend to ask further questions on SOL_OS then you can do so on SOL_OS forums or here BUT in another thread. Emails or PM are not recommended with me :D
Title: Re: How many registers do I have?
Post by: oex on July 15, 2010, 07:44:30 PM
ty for reply it answers my questions :)
Title: Re: How many registers do I have?
Post by: jj2007 on July 15, 2010, 07:46:09 PM
Quote from: BogdanOntanu on July 15, 2010, 07:11:09 PM

Quote
When I have to deal with 4 16 bit integer numbers and I want
to perform some SIMD on them, like adding 1 to each of them
is it possible to do it with an MMX register?

Better with an XMM register

Google for mmx fpu emms to understand the reason.
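
In short: the MMX registers overlay the x87 FPU registers, and any MMX instruction marks the whole FPU register stack as in use, so EMMS has to be executed before going back to floating point code. A minimal, untested sketch, assuming the MASM32 package with crt_printf and cfm$ as used elsewhere in this thread; the labels wordsA, wordsB, fpIn and fpOut are hypothetical:

```asm
    include \masm32\include\masm32rt.inc
    .686
    .mmx
    .xmm

    .data
      wordsA  dq 0001000200030004h      ; four packed 16-bit words (made-up data)
      wordsB  dq 0001000100010001h
      fpIn    REAL8 1.5
      fpOut   REAL8 0.0
    .code
start:
    movq    mm0, wordsA                 ; MM0..MM7 overlay the x87 registers
    paddw   mm0, wordsB                 ; packed add; the FPU tags are now all "valid"
    movq    wordsA, mm0
    emms                                ; empty the tag word so x87 code works again

    fld     fpIn                        ; reliable only after EMMS
    fadd    st(0), st(0)
    fstp    fpOut                       ; fpOut = 3.0
    invoke  crt_printf, cfm$("%f\n"), fpOut
    exit
end start
```

Without the EMMS, the FLD would find no free register and load an invalid value. The XMM registers are separate from the FPU, so SSE2 integer code avoids the issue entirely.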
Title: Re: How many registers do I have?
Post by: frktons on July 15, 2010, 07:55:32 PM
Thanks everybody.

All your suggestions will be meditated upon.  :U

what I got till now from your indications is:

1)I have to declare that I'm using MMX this way:


.686
option casemap:none
.mmx
.xmm
.model flat, stdcall


2) I can move data from 32-bit registers to the low 32 bits of an MMX register
and vice versa this way:



movd    mm0,ecx
movd    ecx,mm0      


So a complete program could be:


««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« *

   .686                                    ; create 32 bit code
   option casemap :none                    ; case sensitive
   .mmx
   .xmm
   .model flat, stdcall                    ; 32 bit memory model

   include \masm32\include\windows.inc     ; always first
   include \masm32\macros\macros.asm       ; MASM support macros

 ; -----------------------------------------------------------------
 ; include files that have MASM format prototypes for function calls
 ; -----------------------------------------------------------------
   include \masm32\include\masm32.inc
   include \masm32\include\gdi32.inc
   include \masm32\include\user32.inc
   include \masm32\include\kernel32.inc

 ; ------------------------------------------------
 ; Library files that have definitions for function
 ; exports and tested reliable prebuilt code.
 ; ------------------------------------------------
   includelib \masm32\lib\masm32.lib
   includelib \masm32\lib\gdi32.lib
   includelib \masm32\lib\user32.lib
   includelib \masm32\lib\kernel32.lib
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

start:                          ; The CODE entry point to the program

   mov     ecx, 12345
   movd    mm0,ecx
   print     str$(mm0)," value of mm0",13,10
   movd    ecx,mm0  
   print      str$(ecx)," value of ecx",13,10  
   exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end start                       ; Tell MASM where the program ends



Or do I still need something else? It doesn't assemble with MASM32.  :(
Title: Re: How many registers do I have?
Post by: clive on July 15, 2010, 07:58:15 PM
The general take away is that using MMX or SSE registers as additional scratch registers for the processor is not a great plan.

Specifically, they are designed to pull vector data out of memory and process it/them in parallel. Intel calls this SIMD (Single Instruction Multiple Data)

The Software Optimization Cookbook, Richard Gerber, Intel Press, ISBN 0-9712887-1-4
http://www.alibris.com/booksearch?qisbn=9780971288713&qwork=
Title: Re: How many registers do I have?
Post by: clive on July 15, 2010, 08:01:46 PM
A small example that assembles.

ml  -Fl  -c  -coff  test32.asm

TEST32.ASM
        .686
        .XMM
        .MODEL FLAT

        .CODE

_start:

        mov     ecx, 12345
        movd    mm0,ecx
        movd    ecx,mm0

        ret

        END     _start


TEST32.LST
Microsoft (R) Macro Assembler Version 6.15.8803     07/15/10 15:01:09
test32.asm      Page 1 - 1


        .686
        .XMM
        .MODEL FLAT

00000000         .CODE

00000000 _start:

00000000  B9 00003039         mov     ecx, 12345
00000005  0F 6E C1         movd    mm0,ecx
00000008  0F 7E C1         movd    ecx,mm0

0000000B  C3         ret

        END     _start
Title: Re: How many registers do I have?
Post by: frktons on July 15, 2010, 08:06:52 PM
OK clive, I got that.
What if I want to display the content of ecx and mm0?
Why is my example not assembling? What's wrong?

Quote from: clive on July 15, 2010, 07:58:15 PM
The general take away is that using MMX or SSE registers as additional scratch registers for the processor is not a great plan.

Specifically, they are designed to pull vector data out of memory and process it/them in parallel. Intel calls this SIMD (Single Instruction Multiple Data)

The Software Optimization Cookbook, Richard Gerber, Intel Press, ISBN 0-9712887-1-4
http://www.alibris.com/booksearch?qisbn=9780971288713&qwork=

Good to know. If I want faster code I'll have to take that into account.  :U
Title: Re: How many registers do I have?
Post by: frktons on July 15, 2010, 08:23:47 PM
Well, one of the things MASM32 doesn't like is:

   print     str$(mm0)," value of mm0",13,10


maybe the print macro doesn't accept this kind of
data to be displayed.

What else?


Microsoft (R) Macro Assembler Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

Assembling: C:\masm32\examples\mmx_usage.asm
C:\masm32\examples\mmx_usage.asm(1) : error A2044:invalid character in file
C:\masm32\examples\mmx_usage.asm(37) : error A2034:must be in segment block
C:\masm32\examples\mmx_usage.asm(39) : error A2034:must be in segment block
C:\masm32\examples\mmx_usage.asm(40) : error A2034:must be in segment block
C:\masm32\examples\mmx_usage.asm(42) : error A2034:must be in segment block
C:\masm32\examples\mmx_usage.asm(49) : error A2006:undefined symbol : start
C:\masm32\examples\mmx_usage.asm(49) : error A2148:invalid symbol type in expression : start
Assembly Error
Press a key to continue . . .


Well I found something else, I was missing .data and .code

Still something wrong:

C:\masm32\examples\mmx_usage.asm(1) : error A2044:invalid character in file


And the last one: I was missing a ";" at the very first line of comment.
Title: Re: How many registers do I have?
Post by: BogdanOntanu on July 15, 2010, 08:34:55 PM
Quote from: frktons on July 15, 2010, 08:23:47 PM
maybe the print macro doesn't accept this kind of
data to be displayed.

What else?

Well, many times in ASM you are on your own and you have to create your own tools and routines (unlike in HLLs).

MASM32 spoils you a little with its stock of ready-to-use macros and routines, BUT it is possible that the macro provided with MASM32 does not support printing 64-bit or 128-bit registers... after all, it is designed for 32 bits.

Hence there might be nothing else.

Use the example kindly provided by Clive and consider that maybe now is a good time for you to learn how to write your own simple routine to convert a 64-bit / 128-bit integer into its ASCII equivalent and to print the resulting string on screen.

Alternatively, you could store the MMX / XMM register into a memory location / variable and then do two consecutive prints on its low and high parts in order to see the ASCII on screen and avoid writing your own code ...
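
A minimal, untested sketch of that second option, assuming the MASM32 print and str$ macros; the label mm0copy is made up for this example. The register is spilled to an 8-byte variable and its two 32-bit halves are printed separately:

```asm
    include \masm32\include\masm32rt.inc
    .686
    .mmx
    .xmm

    .data
      mm0copy dq 0                      ; hypothetical 8-byte scratch variable
    .code
start:
    mov     ecx, 12345
    movd    mm0, ecx                    ; low 32 bits = 12345, high 32 bits = 0
    movq    mm0copy, mm0                ; spill the whole 64-bit register to memory
    emms                                ; done with MMX

    mov     eax, DWORD PTR [mm0copy]    ; low half
    print   str$(eax)," low 32 bits of mm0",13,10
    mov     eax, DWORD PTR [mm0copy+4]  ; high half
    print   str$(eax)," high 32 bits of mm0",13,10
    exit
end start
```

Note that the two decimal halves printed this way are not the decimal digits of the full 64-bit value; for that, a hex display or a custom conversion routine is still needed.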


Title: Re: How many registers do I have?
Post by: clive on July 15, 2010, 09:08:33 PM
Quote from: frktons
C:\masm32\examples\mmx_usage.asm(37) : error A2034:must be in segment block
C:\masm32\examples\mmx_usage.asm(49) : error A2006:undefined symbol : start

Needs a .CODE, _start is probably what you should be using. Pretty sure you can't use MM0..MM7 for those STR$ macros, being as the registers live inside the FPU.
Title: Re: How many registers do I have?
Post by: frktons on July 15, 2010, 09:11:46 PM
Quote from: BogdanOntanu on July 15, 2010, 08:34:55 PM

Well, many times in ASM you are on your own and you have to create your own tools and routines (unlike in HLLs).

MASM32 spoils you a little with its stock of ready-to-use macros and routines, BUT it is possible that the macro provided with MASM32 does not support printing 64-bit or 128-bit registers... after all, it is designed for 32 bits.

Hence there might be nothing else.

Use the example kindly provided by Clive and consider that maybe now is a good time for you to learn how to write your own simple routine to convert a 64-bit / 128-bit integer into its ASCII equivalent and to print the resulting string on screen.

Alternatively, you could store the MMX / XMM register into a memory location / variable and then do two consecutive prints on its low and high parts in order to see the ASCII on screen and avoid writing your own code ...


Thanks, I'll try to do it by myself.
For the time being I've understood how I can use the MMX registers,
and this is the first thing I was trying to understand.
The final code, which assembles and runs, could be the starting
point for doing what you suggest.  :P

;««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« *
; Example of MMX register usage
;««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« *
   include \masm32\include\masm32rt.inc
   .686  
   .mmx
   .xmm

   include \masm32\macros\macros.asm       ; MASM support macros

    .data

    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

start:                          ; The CODE entry point to the program

   mov      ecx, 12345
   movd     mm0,ecx
 ;  print    str$(mm0)," value of mm0",13,10
   movd     ecx,mm0  
   print    str$(ecx)," value of ecx",13,10  
   print "Press a key to close the program"
   call wait_key
   print chr$(13,10)

   exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end start                       ; Tell MASM where the program ends


Quote from: clive on July 15, 2010, 09:08:33 PM
Quote from: frktons
C:\masm32\examples\mmx_usage.asm(37) : error A2034:must be in segment block
C:\masm32\examples\mmx_usage.asm(49) : error A2006:undefined symbol : start

Needs a .CODE, _start is probably what you should be using. Pretty sure you can't use MM0..MM7 for those STR$ macros, being as the registers live inside the FPU.

Yes clive, after some experiments I got that, now I'll try to find a way to display
the mm0 content.  :U

Edit: something like this would be sufficient in this case:

    mov      ecx, 12345
    movd     mm0, ecx
    movd     eax, mm0
    print    str$(eax)," value of 32 lower bit mm0",13,10
Title: Re: How many registers do I have?
Post by: MichaelW on July 16, 2010, 05:26:17 AM
Printf can handle 64-bit integers directly, and the formatting is easy. For 128-bit values the only easy method I can see is to display them as back to back 64-bit values, in hex.

;==============================================================================
    include \masm32\include\masm32rt.inc
    .586
    .MMX
    .XMM
;==============================================================================
    .data
      i64     dq 1122334455667788h
              dq 8877665544332211h
      rmm0    dq 0
      rxmm0   dq 0,0
    .code
;==============================================================================
start:
;==============================================================================
    movq mm0, i64
    movq rmm0, mm0
    movups xmm0, i64
    movups rxmm0, xmm0

    invoke crt_printf, cfm$("MM0 = %I64Xh\n\n"), rmm0
    invoke crt_printf, cfm$("XMM0 = %I64X%I64Xh\n\n"), rxmm0, rxmm0+8

    inkey "Press any key to exit..."
    exit
;==============================================================================
end start

Title: Re: How many registers do I have?
Post by: ecube on July 16, 2010, 05:49:41 AM
GoASM is the best, the more I use it, the more I love it, it-just-makes-sense, and it saves me a ton of time even in the 32bit world vs masm.
Title: Re: How many registers do I have?
Post by: frktons on July 16, 2010, 10:42:56 AM
Hi Michael, thanks for the example.

I've tried to assemble it with MASM32 but I get some errors:

Microsoft (R) Macro Assembler Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

Assembling: C:\masm32\examples\print64bit.asm
C:\masm32\examples\print64bit.asm(20) : error A2070:invalid instruction operands

C:\masm32\examples\print64bit.asm(21) : error A2070:invalid instruction operands

Assembly Error
Press a key . . .


What's wrong?

It is pointing at these instructions I think:

    movups xmm0, i64
    movups rxmm0, xmm0


because I added a couple of comment lines on the top.

Quote from: E^cube on July 16, 2010, 05:49:41 AM
GoASM is the best, the more I use it, the more I love it, it-just-makes-sense, and it saves me a ton of time even in the 32bit world vs masm.

Well, GoAsm could be a next tool to get, but actually I
prefer to stick with MASM32 in order to learn enough bits
of Assembly, after a while I'll see....
Title: Re: How many registers do I have?
Post by: MichaelW on July 16, 2010, 12:20:54 PM
Quote
I've tried to assemble it with MASM32 but I get some errors

I tested with ML 6.15 only. Try adding OWORD PTR in front of the memory operands.

http://msdn.microsoft.com/en-us/library/2det2cf1(VS.71).aspx
Title: Re: How many registers do I have?
Post by: frktons on July 16, 2010, 02:44:42 PM
Quote from: MichaelW on July 16, 2010, 12:20:54 PM

I tested with ML 6.15 only. Try adding OWORD PTR in front of the memory operands.

http://msdn.microsoft.com/en-us/library/2det2cf1(VS.71).aspx

Thanks Michael it now works this way:

    movups xmm0, OWORD PTR i64
    movups OWORD PTR rxmm0,  xmm0


and outputs:


MM0 = 1122334455667788h

XMM0 = 11223344556677888877665544332211h

Press any key to exit...


Is that what we expected? OWORD means we are using 16-byte operands?
Title: Re: How many registers do I have?
Post by: MichaelW on July 16, 2010, 03:05:21 PM
Yes, and yes. I initially included the OWORD PTR because the defined size of the data does not match the register size. I removed it when 6.15 did not complain, but that clearly was a bad choice.
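
An OWORD is 16 bytes (128 bits), which is why it matches an XMM register. As a rough C counterpart to the printing trick above, here is a minimal sketch that formats a 128-bit value from its two 64-bit halves. The names `u128_halves` and `format_u128_hex` are made up for this illustration, and `PRIX64` from `<inttypes.h>` stands in for the Microsoft-specific `%I64X`. Note one deliberate difference: the MASM example passes the low half first, while this sketch prints the high half first so the digits read as one 128-bit number.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* A 128-bit value held as two 64-bit halves, in the same order the
   16-byte (OWORD) buffer uses in memory on x86: low half first. */
typedef struct {
    uint64_t lo;  /* bytes 0..7  */
    uint64_t hi;  /* bytes 8..15 */
} u128_halves;

/* Format the value as 32 hex digits, most significant half first.
   buf needs room for at least 33 characters (32 digits plus the NUL).
   Returns the number of characters written, as snprintf does. */
int format_u128_hex(char *buf, size_t size, u128_halves v)
{
    return snprintf(buf, size, "%016" PRIX64 "%016" PRIX64, v.hi, v.lo);
}
```

The `%016` width keeps leading zeros in each half, so the two halves always line up as 16 digits each.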
Title: Re: How many registers do I have?
Post by: frktons on July 16, 2010, 04:44:57 PM
Quote from: MichaelW on July 16, 2010, 03:05:21 PM
Yes, and yes. I initially included the OWORD PTR because the defined size of the data does not match the register size. I removed it when 6.15 did not complain, but that clearly was a bad choice.

:U
Title: Re: How many registers do I have?
Post by: frktons on July 16, 2010, 09:22:26 PM
According to these small experiments and code kindly posted
by some of you, I can now reply to my own question that there
are quite a few registers I can use on my box,  8 general purpose
registers, 8 MMX, 16 XMM and probably some more.

How to use them, and when it is convenient to do it, well that
is a long path to go  :P
Title: Re: How many registers do I have?
Post by: BogdanOntanu on July 16, 2010, 11:28:42 PM
Quote from: frktons on July 16, 2010, 09:22:26 PM
16 XMM and probably some more.

In 32 bits mode you only have access to 8 XMM registers.

In 64bits mode you also have access to 16 GPR registers.

Quote
How to use them, and when it is convenient to do it, well that
is a long path to go  :P

This is not really important from a conceptual point of view.

Most software algorithms can be expressed and perform very well with just a few GPR registers available. At some point the number of registers becomes a market issue and a drag on the CPU speed (selecting from 16 registers instead of 8 registers is slower in electronics).

Also, internally the CPU does perform a few extra tricks like register alias/renaming and even if you overuse the same register the CPU knows better.

Hence, for a start I would not concentrate myself too much on using registers intensively.

It is more important to learn how to express code and algorithms in a "register / memory / jumps / calls and returns"  based "world"  rather than to contemplate optimum register usage.

As a syntax example, understanding the difference between EAX and [EAX] is much more important ... and to a certain extent the difference between OFFSET and ADDR operators is also important.
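
For readers coming from C, the EAX versus [EAX] distinction maps loosely onto a pointer versus what it points at. The sketch below is only an analogy, and the function names are hypothetical: holding `address` is like having an address in EAX, while `*address` is like using [EAX]. Taking `&value` in the caller plays roughly the role of OFFSET/ADDR.

```c
#include <stdint.h>

/* Reading through the pointer: comparable to "mov eax, [ecx]",
   where ECX holds an address and EAX receives the value stored there. */
int32_t load_through(const int32_t *address)
{
    return *address;
}

/* Writing through the pointer: comparable to "mov [ecx], eax". */
void store_through(int32_t *address, int32_t new_value)
{
    *address = new_value;
}
```

A caller would write `load_through(&value)`; passing `value` itself instead of its address is the C version of confusing EAX with [EAX].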
Title: Re: How many registers do I have?
Post by: hutch-- on July 17, 2010, 01:44:16 AM
Frank,

With registers in 32 bit, about 95% of the work is done in the 8 GP registers. The additional registers with their matching instruction sets tend to be more focused on a particular type of work. FP for maths, MMX sharing the FP registers was the early multi-media extensions that were then bypassed by the extended multi-media instructions and registers (XMM or SIMD) which have then developed through about 4 families of additions (SSE - SSE4.2 Intel ).

Each register type has its own instruction set and the most extensive are the original general purpose integer registers (EAX ECX EDX EBX EBP ESP ESI EDI). 32 bit and later processors removed many restrictions on how the 8 GP registers were used but some legacy instructions still require specific registers to work, XLAT MOVS STOS etc ....
Title: Re: How many registers do I have?
Post by: GregL on July 17, 2010, 02:44:58 AM
(http://images.anandtech.com/reviews/cpu/amd/hammer/x86-64.gif)
Title: Re: How many registers do I have?
Post by: oex on July 17, 2010, 02:47:39 AM
If you have dual core+ you still technically have exactly the same registers though they are silently doubled+
Title: Re: How many registers do I have?
Post by: dedndave on July 17, 2010, 02:57:30 AM
i was waiting for someone to bring up HTT   :P
i have thought some of playing around with "hogging both threads" of my prescott - lol
i don't think it is a great idea in practice, though
seems like you tie up the machine by doing that - better to let the OS manage threads
but - it could give you a whole extra set of registers to play with
not that you could exchange from one set to another efficiently, but you might be able to find some advantage in there
Title: Re: How many registers do I have?
Post by: frktons on July 17, 2010, 08:18:46 AM
Thanks everybody for your suggestions.  :U

Actually my CPU doesn't support Hyper-Threading Technology so
HTT is not an issue for the time being, and it is probably a too advanced
subject for n00bs of my level.  :P

Algorithms in Assembly well that's the matter I'd really like
to grasp a little. I have seen a lot of good books on algorithms in
C/C++/Java and the like. Probably the C/C++ category is the most
close to the machine.
From C, that I'm actually learning, I'll take advantage to get some
Algorithm attitude, so to speak, and then all the way long to translate
or adapt them in MASM/GoAsm whatever.

It's quite a long way though, and the sources are overwhelming  :eek
A step, slow one, at a time, no other choice.  :lol
Title: Re: How many registers do I have?
Post by: frktons on July 19, 2010, 12:20:39 AM
One of the things I'd like to test is the use of 64 bit registers
to perform the division, that is quite resource consuming, as
many of you have explained to me.

This short mixed code I use for dividing by ten a number
is an example I'd like to improve a little with a better algorithm,
maybe a divide by multiply and shift, and/or with the use of
some 64 bit Assembly trick I'm not aware of:


    long div_result = 0;
    long remain = 0;
    const long ten = 10;
    num2 = rand() % 10000;
    __asm{
    xor   edx, edx
    mov eax, num2
    mov ecx, ten
    idiv   ecx
    mov  div_result, eax
    mov  remain, edx
    }


Probably MMX registers are not well suited for this purpose,
or are slower than GPR, I actually don't know. Surely if I use
the following code, that is obviously in C language:

      num2 = rand() % 10000;
      div_result = (num2 * 6554UL) >> 16;
      remain = num2 - div_result * ten;   


I get a better performance because the algorithm is smarter
and doesn't use division, but a magic number to
multiply the number to divide and after it shifts right the same
number a given number of position.

Of course 6554 works for numbers not bigger than 9999
and I'd have to calculate the magic number depending on the range
I'm going to use.
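
The range claim can be checked by brute force. This is a small sketch, with hypothetical helper names, that compares the multiply-and-shift result against ordinary division and reports the first input where they disagree.

```c
#include <stdint.h>

/* Divide by 10 with the multiply-and-shift trick from the post. */
uint32_t div10_magic16(uint32_t n)
{
    return (n * 6554u) >> 16;
}

/* Return the smallest n in [0, limit) for which the trick disagrees with
   real division, or limit itself if it never disagrees in that range.
   Keep limit at or below 600000 so n * 6554 cannot overflow 32 bits. */
uint32_t first_mismatch(uint32_t limit)
{
    for (uint32_t n = 0; n < limit; ++n) {
        if (div10_magic16(n) != n / 10u)
            return n;
    }
    return limit;
}
```

With these definitions `first_mismatch(10000)` returns 10000 (no mismatch up to 9999), and `first_mismatch(100000)` returns 16389, the first input where 6554 with a 16-bit shift gives a quotient that is one too large.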

So I was wondering what performance could we get using methods
like this with 64 bit registers and a full set of magic numbers
to use.  ::)


Title: Re: How many registers do I have?
Post by: frktons on July 19, 2010, 02:20:14 AM
I translated the C code for divide by multiply and shift:


      div_result = (num2 * 6554UL) >> 16;
      remain = num2 - div_result * ten;   


in Assembly this way:


      mov  eax, num2
      imul  eax, 6554
      shr    eax, 16
      mov  div_result, eax
      mov  ecx, num2
      imul  eax, ten
      sub   ecx, eax
      mov  remain, ecx


But the performances are about the same, and I don't
know if it depends on how good the compiler is to
translate the code, or how bad I am to do the same.  :P

Any suggestion to improve the above code?
Title: Re: How many registers do I have?
Post by: KeepingRealBusy on July 19, 2010, 03:12:14 AM
Magic numbers are good for dividing by using a magic number multiply and shifting, but you get no remainder, and need the shift, and are usually used for dividing by constants and not for dividing by variables. For variables, you would need a table of all possible magic numbers, or a table that contained a pair of number/magic_number entries which had to be searched for a number match to get the magic number to use. The full table would exceed allowable memory (especially for 64 bit). The search would take more time than you would save with the Magic number multiply.

Until you start using 64 bit processing, you do not have 64 bit gp registers (rax,rdx). I do not see any MMX 64 bit register instructions that did divides. Some MMX 64 bit register packed multiplies exist, but nothing that you cannot do with multiply eax and edx. Note, to save a register, put one value in eax, the other in edx, then mul; the 64 bit result is in edx:eax (high 32 bits:low 32 bits).
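
The 32 x 32 -> 64 bit multiply mentioned here is enough to divide any unsigned 32-bit number by 10 without a table. The sketch below assumes the well-known constant 0xCCCCCCCD, which is ceil(2^35 / 10), and a shift of 35; the function names are invented for the example. In C the widening multiply is written with a cast to `uint64_t`, and a 32-bit compiler will typically implement it with a single MUL whose high half lands in EDX.

```c
#include <stdint.h>

/* Unsigned divide by 10 for any 32-bit n: multiply by 0xCCCCCCCD
   (ceil(2^35 / 10)) into a 64-bit product, then shift right by 35. */
uint32_t div10_u32(uint32_t n)
{
    uint64_t product = (uint64_t)n * 0xCCCCCCCDu;
    return (uint32_t)(product >> 35);
}

/* The remainder costs one more multiply and a subtract. */
uint32_t mod10_u32(uint32_t n)
{
    return n - div10_u32(n) * 10u;
}
```

So the remainder is not lost, it just has to be recomputed from the quotient, as in the earlier `remain = num2 - div_result * ten` line.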

Dave.
Title: Re: How many registers do I have?
Post by: frktons on July 19, 2010, 03:18:02 AM
Thanks Dave.

I was doing naive assumptions, typical beginner stuff  :P

By the way, the code I used to translate the C code is good enough
or could I do better in some ways?
Title: Re: How many registers do I have?
Post by: oex on July 19, 2010, 03:23:00 AM
Quote from: frktons on July 19, 2010, 02:20:14 AM
Any suggestion to improve the above code?

You could swap memory for registers though it really does depend on the surrounding code.... ie I see no need for this line in current code:

mov  div_result, eax

you could also:

mov  eax, num2
mov  ecx, eax
Title: Re: How many registers do I have?
Post by: frktons on July 19, 2010, 03:48:38 AM
Quote from: oex on July 19, 2010, 03:23:00 AM
You could swap memory for registers though it really does depend on the surrounding code.... ie I see no need for this line in current code:

mov  div_result, eax

well I need the div_result variable to use in the C code.

Quote
you could also:

mov  eax, num2
mov  ecx, eax

Well, this is good  :U I can spare some cycles this way. Thanks:



      mov  eax, num2
      mov  ecx, eax
      imul  eax, 6554
      shr    eax, 16
      mov  div_result, eax
      imul  eax, ten
      sub   ecx, eax
      mov  remain, ecx


Nevertheless I'm not able to beat the Pelles'C compiler.
The C code is as fast as the Assembly.  :eek
Title: Re: How many registers do I have?
Post by: oex on July 19, 2010, 04:56:23 AM
Most of the time is taken up in the imuls.... If you can find a way to remove or combine them you should be in luck but it's too late for me to do that math :lol
Title: Re: How many registers do I have?
Post by: jj2007 on July 19, 2010, 06:43:14 AM
Quote from: oex on July 19, 2010, 04:56:23 AM
Most of the time is taken up in the imuls...

imuls are actually pretty fast, much faster than normal muls, so don't waste too much efforts for finding a workaround.
Title: Re: How many registers do I have?
Post by: oex on July 19, 2010, 07:09:36 AM
I was working off the MASM opcodes manual which has them at 13-42 clocks each.... Is there a better ref?

mov, sub and shr are down as 1-3 clocks....

I dont know for sure and it's been a VERY long night but shr, 16 would be:
movzx ebx, ax
I think.... (maybe the other way round.... bswap first) being 16 bit this might be slightly faster?
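
A quick way to see why the two are not interchangeable, written in C with hypothetical function names: `shr eax, 16` keeps the upper 16 bits of the register, while `movzx ebx, ax` keeps the lower 16 bits. In the divide-by-10 code the quotient sits in the upper bits of the product, so the shift is the one that is needed.

```c
#include <stdint.h>

/* What "shr eax, 16" leaves in the register: the upper 16 bits. */
uint32_t upper_half(uint32_t x)
{
    return x >> 16;
}

/* What "movzx ebx, ax" produces: the lower 16 bits, zero-extended. */
uint32_t lower_half(uint32_t x)
{
    return x & 0xFFFFu;
}
```

For x = 0x12345678 the first returns 0x1234 and the second returns 0x5678.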
Title: Re: How many registers do I have?
Post by: frktons on July 19, 2010, 08:38:16 AM
Quote from: oex on July 19, 2010, 07:09:36 AM
I was working off the MASM opcodes manual which has them at 13-42 clocks each.... Is there a better ref?

mov, sub and shr are down as 1-3 clocks....

I dont know for sure and it's been a VERY long night but shr, 16 would be:
movzx ebx, ax
I think.... (maybe the other way round.... bswap first) being 16 bit this might be slightly faster?

Thanks oex, this is another option to try:

movzx ebx, ax


or the code that works for it, I still don't know.  ::)

Back home, on my pc, I'll try it and see if it performs any better.  :P
Title: Re: How many registers do I have?
Post by: hutch-- on July 19, 2010, 08:45:54 AM
Frank and oex, forget old timing manuals in cycles on anything later than a 386 as they have pipelines that "SCHEDULE" instructions and on some of the later processors the throughput of any single instruction without a stall may be 40 to 50 cycles from entry to retirement.

Think of one or more pipelines as instruction assembly production lines like in a factory, performance is measured by the output, not the individual component.
Title: Re: How many registers do I have?
Post by: oex on July 19, 2010, 08:49:51 AM
Oh OK so how many conveyor belts?.... I take it you mean something like....

mov eax, 7
add eax, 3

mov ebx, 5
add ebx, 5

add eax, ebx


Pipeline 1                       Pipeline 2 - Processed on completion P1

mov eax, 7
add eax, 3
                                    add eax, ebx
mov ebx, 5
add ebx, 5

????

If so how many 'component' conveyor belts in each pipeline (ie 2 in this example in P1 and 1 in P2)

Maybe I'm way off again.... I guess this looks more like FPGA but it's been a long night....

Maybe someone could point me in the direction of a diagram and/or code example explanation?
Title: Re: How many registers do I have?
Post by: jj2007 on July 19, 2010, 09:05:30 AM
Quote from: hutch-- on July 19, 2010, 08:45:54 AM
Frank and oex, forget old timing manuals in cycles on anything later than a 386

Indeed.
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
5434    cycles for 100*div
470     cycles for 100*mul
174     cycles for 100*imul
177     cycles for 100*shl
Title: Re: How many registers do I have?
Post by: frktons on July 19, 2010, 09:08:50 AM
On my pc I get different results:

Intel(R) Core(TM)2 Duo CPU     E4500  @ 2.20GHz (SSE4)
175     cycles for 100*mul
95      cycles for 100*imul
62      cycles for 100*shl

--- ok ---


shl looks faster.
Probably because you only shifted 1 position:

mov eax, 2
shl eax, 1

Title: Re: How many registers do I have?
Post by: oex on July 19, 2010, 09:15:16 AM
hmmmm that just creates even more questions for me :lol.... Hutch just said to forget output of individual instructions so what do those timings tell us *out of context*?

From what I can see here now no code can be judged by any means other than testing it?

Quote from: hutch
on some of the later processors the throughput of any single instruction without a stall may be 40 to 50 cycles from entry to retirement
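
Since testing is what the thread keeps coming back to, here is a minimal timing sketch in C. It uses only the standard `clock()` function, which is far coarser than the cycle counters behind the figures posted here, and all names in it are made up for the example. Treat the numbers it gives as a rough comparison on one machine, nothing more; an optimizing compiler may also turn `n / 10` into a multiply and shift on its own.

```c
#include <stdint.h>
#include <time.h>

typedef uint32_t (*div_fn)(uint32_t);

uint32_t div10_plain(uint32_t n) { return n / 10u; }

/* Multiply-and-shift version from the thread; valid for n <= 9999. */
uint32_t div10_shift(uint32_t n) { return (n * 6554u) >> 16; }

/* Run fn over 0..9999, `rounds` times. Returns elapsed CPU seconds and
   stores a checksum so the compiler cannot throw the work away. */
double time_div(div_fn fn, uint32_t rounds, uint32_t *checksum)
{
    uint32_t sum = 0;
    clock_t start = clock();
    for (uint32_t r = 0; r < rounds; ++r) {
        for (uint32_t n = 0; n < 10000u; ++n)
            sum += fn(n);
    }
    clock_t stop = clock();
    *checksum = sum;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Comparing the two checksums first is a cheap sanity check that both variants compute the same thing before their times are compared.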
Title: Re: How many registers do I have?
Post by: jj2007 on July 19, 2010, 09:18:29 AM
Quote from: frktons on July 19, 2010, 09:08:50 AM
shl looks faster.
Probably because you only shifted 1 position:

That "shl reg, 1 is faster than shl reg, 15" might be valid for very old CPUs... see updated attachment above.

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
4728    cycles for 100*div
468     cycles for 100*mul
172     cycles for 100*imul
183     cycles for 100*shl 1
181     cycles for 100*shl 2
Title: Re: How many registers do I have?
Post by: frktons on July 19, 2010, 09:21:02 AM
Not many changes, actually:

Intel(R) Core(TM)2 Duo CPU     E4500  @ 2.20GHz (SSE4)
1221    cycles for 100*div
179     cycles for 100*mul
95      cycles for 100*imul
63      cycles for 100*shl 1
63      cycles for 100*shl 2

1226    cycles for 100*div
179     cycles for 100*mul
95      cycles for 100*imul
62      cycles for 100*shl 1
63      cycles for 100*shl 2


--- ok ---


shl still looks 50% faster than imul  ::)

Well I'm working on a Win XP pro/32 bit with a Core 2 duo, I don't
know if that matters, but it seems to make a difference.
Title: Re: How many registers do I have?
Post by: jj2007 on July 19, 2010, 09:25:49 AM
Quote from: frktons on July 19, 2010, 09:21:02 AM

shl still looks 50% faster than imul  ::)

Well I'm working on a Win XP pro/32 bit with a Core 2 duo, I don't
know if that matters, but it seems to make a difference.

We are talking 0.95 cycles instead of 0.63 cycles per multiplication. And you still have not explained how you want to replace imul  eax, 6554 with some intelligent shift, add etc operations that perform in less than 0.95 cycles...
Title: Re: How many registers do I have?
Post by: frktons on July 19, 2010, 09:35:34 AM
Quote from: jj2007 on July 19, 2010, 09:25:49 AM
We are talking 0.95 cycles instead of 0.63 cycles per multiplication. And you still have not explained how you want to replace imul  eax, 6554 with some intelligent shift, add etc operations that perform in less than 0.95 cycles...

Sorry JJ, I was just showing you what I get. In order to change
the  imul  eax, 6554 with something smarter I've no clue
for the time being, I have to think about that for a while. That is
a magic number and I don't know how to deal with them
without offending them  :lol

Could you suggest something?

By the way, have you any idea why on your machine the imul
and the shift have different performances?  ::) your machine should
be faster than mine according to what is displayed.

Title: Re: How many registers do I have?
Post by: Rockoon on July 19, 2010, 11:35:11 AM
Let's get some AMD representation:

AMD Phenom(tm) II X6 1055T Processor (SSE3)
1899    cycles for 100*div
193     cycles for 100*mul
96      cycles for 100*imul
61      cycles for 100*shl 1
61      cycles for 100*shl 2

1896    cycles for 100*div
193     cycles for 100*mul
96      cycles for 100*imul
61      cycles for 100*shl 1
61      cycles for 100*shl 2


But honestly, the timing of individual instructions is useless. When choosing between SHL and IMUL (and hey, why wasn't LEA represented here?) the other instructions in the pipeline mean everything. IMUL takes over a different execution unit than the SHL does on the latest from both Intel and AMD.

Title: Re: How many registers do I have?
Post by: hutch-- on July 19, 2010, 11:58:32 AM
oex,

The concept of 1 or more pipelines is not something you can control very well independently, it comes more in understanding how they work. You have 2 basic classes of instructions, the RISC preferred set and the old junk, mainly stored in microcode and what recent processors do is present an interface with the x86 instruction set. From a variety of sources you get a reasonably good idea of what the preferred instruction set is and its usually the simpler instructions. MOV ADD SUB TEST CMP, then you have more complex instructions that get slower and this varies from one processor to another, shifts, rotates are usually off the pace on late hardware, XCHG is a lemon, string instructions without REP are worth avoiding but there is special case circuitry when used with REP that cut in after about 500 bytes. On older hardware IMUL MUL were very slow and still are in comparison to preferred instructions but later hardware is getting faster with multiplications as they have additional execution units to do stuff like this.

You get the fastest code for the data size by using preferred instructions and avoiding stalls from a variety of situations, dependency being one of the bad ones that will stop a pipeline until the result it depends on is available. Earlier processors had problems with alignment and some had problems with different data sizes apart from the native unit size, 32 bit and on later stuff, 64 bit.

LEA was fast on everything from a 486 up to the early PIVs where it was off the pace and could be replaced by a number of ADDs in some contexts, on the Core series and later LEA is fast again.
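
The dependency point can be illustrated in C, with the caveat that this is a sketch of the idea and not a measured result; the function names are hypothetical. In the first loop every addition needs the result of the previous one. In the second, the additions into `even` and `odd` are independent of each other, so a processor with more than one execution unit is free to overlap them.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each add depends on the add before it. */
uint32_t sum_one_chain(const uint32_t *data, size_t count)
{
    uint32_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += data[i];
    return total;
}

/* Two accumulators: two separate dependency chains that only meet
   at the final add. */
uint32_t sum_two_chains(const uint32_t *data, size_t count)
{
    uint32_t even = 0, odd = 0;
    size_t i = 0;
    for (; i + 1 < count; i += 2) {
        even += data[i];
        odd  += data[i + 1];
    }
    if (i < count)          /* leftover element when count is odd */
        even += data[i];
    return even + odd;
}
```

Both functions return the same sum; whether the second is actually faster depends on the CPU and on what the compiler does with either loop, which is again something only a test will show.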
Title: Re: How many registers do I have?
Post by: frktons on July 19, 2010, 12:00:20 PM
Quote from: Rockoon on July 19, 2010, 11:35:11 AM
Let's get some AMD representation:

AMD Phenom(tm) II X6 1055T Processor (SSE3)
1899    cycles for 100*div
193     cycles for 100*mul
96      cycles for 100*imul
61      cycles for 100*shl 1
61      cycles for 100*shl 2

1896    cycles for 100*div
193     cycles for 100*mul
96      cycles for 100*imul
61      cycles for 100*shl 1
61      cycles for 100*shl 2


But honestly, the timing of individual instructions is useless. When choosing between SHL and IMUL (and hey, why wasn't LEA represented here?) the other instructions in the pipeline mean everything. IMUL takes over a different execution unit than the SHL does on the latest from both Intel and AMD.



What about lea, where is she?

Post some representing code of lea please, let's have
a taste of her  :lol
Title: Re: How many registers do I have?
Post by: frktons on July 19, 2010, 12:04:53 PM
Quote from: hutch-- on July 19, 2010, 11:58:32 AM
oex,

The concept of 1 or more pipelines is not something you can control very well independently, it comes more in understanding how they work. You have 2 basic classes of instructions, the RISC preferred set and the old junk, mainly stored in microcode and what recent processors do is present an interface with the x86 instruction set. From a variety of sources you get a reasonably good idea of what the preferred instruction set is and its usually the simpler instructions. MOV ADD SUB TEST CMP, then you have more complex instructions that get slower and this varies from one processor to another, shifts, rotates are usually off the pace on late hardware, XCHG is a lemon, string instructions without REP are worth avoiding but there is special case circuitry when used with REP that cut in after about 500 bytes. On older hardware IMUL MUL were very slow and still are in comparison to preferred instructions but later hardware is getting faster with multiplications as they have additional execution units to do stuff like this.

You get the fastest code for the data size by using preferred instructions and avoiding stalls from a variety of situations, dependency being one of the bad ones that will stop a pipeline until the result it depends on is available. Earlier processors had problems with alignment and some had problems with different data sizes apart from the native unit size, 32 bit and on later stuff, 64 bit.

LEA was fast on everything from a 486 up to the early PIVs where it was off the pace and could be replaced by a number of ADDs in some contexts, on the Core series and later LEA is fast again.

It looks like you never get rest with CPU modifications and upgrades.
Probably you have to stick with whatever is the best for a timeframe
and be ready to change as far as it is needed. ::)
Title: Re: How many registers do I have?
Post by: dedndave on July 19, 2010, 12:19:16 PM
Quote
From what I can see here now no code can be judged by any means other than testing it?

even testing it is only valid if you test it on a variety of CPU's
P4 cores are quickly becoming obsolete, in spite of the fact that they may not be all that old

here is the method i use...

(http://img839.imageshack.us/img839/4458/carnac.jpg)
Title: Re: How many registers do I have?
Post by: jj2007 on July 19, 2010, 12:24:53 PM
Quote from: Rockoon on July 19, 2010, 11:35:11 AM
and hey, why wasn't LEA represented here?

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
5440    cycles for 100*div
471     cycles for 100*mul
173     cycles for 100*imul
426     cycles for 100*lea, 2*eax
277     cycles for 100*lea, 2*eax+eax
426     cycles for 100*lea, 2*eax+eax+99
177     cycles for 100*shl 1
177     cycles for 100*shl 2
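
For anyone wondering what the "lea, 2*eax+eax" rows compute: LEA evaluates an address expression of the form base + scale*index + displacement and stores the result instead of accessing memory, so it doubles as a small multiply-and-add. The C functions below spell out the arithmetic; the instruction in each comment is the assumed equivalent, and the function names are invented for the sketch.

```c
#include <stdint.h>

/* lea eax, [eax + 2*eax]        computes 3 * x */
uint32_t lea_times3(uint32_t x)
{
    return x + 2u * x;
}

/* lea eax, [eax + 2*eax + 99]   computes 3 * x + 99 */
uint32_t lea_times3_plus99(uint32_t x)
{
    return x + 2u * x + 99u;
}

/* Multiply by 10 in two LEA-shaped steps. */
uint32_t lea_times10(uint32_t x)
{
    uint32_t x5 = x + 4u * x;   /* lea ecx, [eax + 4*eax] */
    return x5 + x5;             /* lea eax, [ecx + ecx]   */
}
```

The scale factor in a real LEA can only be 1, 2, 4 or 8, which is why multiplying by 10 takes two steps.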
Title: Re: How many registers do I have?
Post by: sinsi on July 19, 2010, 12:35:07 PM
FWIW,

Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)
1219    cycles for 100*div
178     cycles for 100*mul
94      cycles for 100*imul
94      cycles for 100*lea, 2*eax
94      cycles for 100*lea, 2*eax+eax
94      cycles for 100*lea, 2*eax+eax+99
62      cycles for 100*shl 1
62      cycles for 100*shl 2

1217    cycles for 100*div
178     cycles for 100*mul
94      cycles for 100*imul
94      cycles for 100*lea, 2*eax
94      cycles for 100*lea, 2*eax+eax
94      cycles for 100*lea, 2*eax+eax+99
62      cycles for 100*shl 1
62      cycles for 100*shl 2

No different to the earlier test (I do keep an eye on you mr jj :bg)
Tests should operate on the same data  :naughty:
Title: Re: How many registers do I have?
Post by: hutch-- on July 19, 2010, 12:35:54 PM
 :bg

Frank,

Quote
It looks like you never get rest with CPU modifications and upgrades.
Probably you have to stick with whatever is the best for a timeframe
and be ready to change as far as it is needed. ::)

Welcome to mixed mode or balanced mode assembler programming.  :P
Title: Re: How many registers do I have?
Post by: Rockoon on July 19, 2010, 01:04:09 PM
Demonstrating the reason why I suggested LEA be tested:

AMD Phenom(tm) II X6 1055T Processor (SSE3)
1896    cycles for 100*div
193     cycles for 100*mul
96      cycles for 100*imul
61      cycles for 100*lea, 2*eax
61      cycles for 100*lea, 2*eax+eax
61      cycles for 100*lea, 2*eax+eax+99
61      cycles for 100*shl 1
61      cycles for 100*shl 2

1896    cycles for 100*div
194     cycles for 100*mul
96      cycles for 100*imul
61      cycles for 100*lea, 2*eax
61      cycles for 100*lea, 2*eax+eax
61      cycles for 100*lea, 2*eax+eax+99
61      cycles for 100*shl 1
61      cycles for 100*shl 2


--- ok ---

AMD never gave up on LEA performance.
Title: Re: How many registers do I have?
Post by: oex on July 19, 2010, 04:06:06 PM
ty guys for your input.... I'm reasonably confident that my code is about as fast as it can be, outside of imul being faster everything else you've been saying seems to be pretty much in keeping with the current rules I implement, I haven't used imul up until now so I might be able to tease a few cycles out of my code yet :bg.... I'll take on board what you have said and see what improvements I can make :bg
Title: Re: How many registers do I have?
Post by: frktons on July 19, 2010, 05:29:52 PM
Quote from: Rockoon on July 19, 2010, 01:04:09 PM
Demonstrating the reason why I suggested LEA be tested:

AMD Phenom(tm) II X6 1055T Processor (SSE3)
1896    cycles for 100*div
193     cycles for 100*mul
96      cycles for 100*imul
61      cycles for 100*lea, 2*eax
61      cycles for 100*lea, 2*eax+eax
61      cycles for 100*lea, 2*eax+eax+99
61      cycles for 100*shl 1
61      cycles for 100*shl 2

1896    cycles for 100*div
194     cycles for 100*mul
96      cycles for 100*imul
61      cycles for 100*lea, 2*eax
61      cycles for 100*lea, 2*eax+eax
61      cycles for 100*lea, 2*eax+eax+99
61      cycles for 100*shl 1
61      cycles for 100*shl 2


--- ok ---

AMD never gave up on LEA performance.

Good to see lea is fine. ;) Could you post the ASM
as well, I live on it for the time being.  :P
Title: Re: How many registers do I have?
Post by: Rockoon on July 19, 2010, 05:59:29 PM
Quote from: frktons on July 19, 2010, 05:29:52 PM
Good to see lea is fine. ;) Could you post the ASM
as well, I live on it for the time being.  :P

See JJ's post.
Title: Re: How many registers do I have?
Post by: frktons on July 19, 2010, 06:11:45 PM
Oh! Oh! I skipped a couple of post  :P

Miss lea was not improved that much on my CPU:

Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
1216    cycles for 100*div
178     cycles for 100*mul
94      cycles for 100*imul
94      cycles for 100*lea, 2*eax
94      cycles for 100*lea, 2*eax+eax
94      cycles for 100*lea, 2*eax+eax+99
62      cycles for 100*shl 1
62      cycles for 100*shl 2

1217    cycles for 100*div
178     cycles for 100*mul
94      cycles for 100*imul
94      cycles for 100*lea, 2*eax
94      cycles for 100*lea, 2*eax+eax
94      cycles for 100*lea, 2*eax+eax+99
62      cycles for 100*shl 1
62      cycles for 100*shl 2


--- ok ---


Thanks JJ for providing all these fine examples.  :clap:
Title: Re: How many registers do I have?
Post by: Queue on July 19, 2010, 08:34:41 PM
AMD Athlon(tm) 4 Processor (SSE1)
4217 cycles for 100*div
310  cycles for 100*mul
260  cycles for 100*imul
78   cycles for 100*lea, 2*eax
66   cycles for 100*lea, 2*eax+eax
88   cycles for 100*lea, 2*eax+eax+99
74   cycles for 100*shl 1
67   cycles for 100*shl 2

4221 cycles for 100*div
310  cycles for 100*mul
260  cycles for 100*imul
78   cycles for 100*lea, 2*eax
66   cycles for 100*lea, 2*eax+eax
88   cycles for 100*lea, 2*eax+eax+99
75   cycles for 100*shl 1
66   cycles for 100*shl 2

Queue
Title: Re: How many registers do I have?
Post by: KeepingRealBusy on July 19, 2010, 09:16:31 PM

Intel(R) Pentium(R) 4 CPU 3.20GHz (SSE2)
5918    cycles for 100*div
1020    cycles for 100*mul
460     cycles for 100*imul
213     cycles for 100*lea, 2*eax
91      cycles for 100*lea, 2*eax+eax
193     cycles for 100*lea, 2*eax+eax+99
95      cycles for 100*shl 1
90      cycles for 100*shl 2

5838    cycles for 100*div
1013    cycles for 100*mul
466     cycles for 100*imul
196     cycles for 100*lea, 2*eax
99      cycles for 100*lea, 2*eax+eax
197     cycles for 100*lea, 2*eax+eax+99
87      cycles for 100*shl 1
87      cycles for 100*shl 2


--- ok ---

AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ (SSE3)
4344    cycles for 100*div
210     cycles for 100*mul
79      cycles for 100*imul
70      cycles for 100*lea, 2*eax
85      cycles for 100*lea, 2*eax+eax
78      cycles for 100*lea, 2*eax+eax+99
28      cycles for 100*shl 1
61      cycles for 100*shl 2

4425    cycles for 100*div
193     cycles for 100*mul
96      cycles for 100*imul
76      cycles for 100*lea, 2*eax
62      cycles for 100*lea, 2*eax+eax
61      cycles for 100*lea, 2*eax+eax+99
78      cycles for 100*shl 1
81      cycles for 100*shl 2


--- ok ---
Title: Re: How many registers do I have?
Post by: Ficko on March 22, 2012, 05:25:57 PM
Quote from: BogdanOntanu on July 15, 2010, 07:31:57 PM
Quote from: oex on July 15, 2010, 05:52:11 PM
Hey Bogdan, I was wondering is this an OS design choice or a CPU setting?....

It is a CPU design setting. Design choice of AMD.

Quote
At the lowest level (OS) you could you have switching right?

Not sure what you ask... but if I guess right the answer is NO.

However you do not have to since you can run both 32bits and 64 bits executables in a 64bits OS. You just can not mix them because the CPU forbids it.


I know this answer from "Bogdan" is a little bit old, however I just ran into this http://vxheavens.com/lib/vrg16.html
Looks like YASM has this "trick" already "built-in"; wondering how this can be done in MASM  ::) to mix 32/64-bit source code?