Is mixing 16 and 32-bit operands inefficient?

Started by mercifier, March 30, 2007, 07:27:15 AM


mercifier

I'm beginning to learn Intel assembler (using MASM in console mode so far) and I don't think that I quite understand how the processor distinguishes between different sizes of operands. I recall reading something about prefixes and default operand sizes. Does this mean that an instruction reading a 16-bit value from memory is actually slower (and takes up more space) than one using a 32-bit value? How does this apply to instructions which do not access memory, such as mov ax,bx vs. mov eax,ebx?

(I guess that some of you think that I should "R.T.F.M.", but could you please tell me which "F.M." to read. There are so many of them!  :P)

TNick

RTFM!!!! :bdg :bdg :bdg :bdg

If you want to see how instructions are encoded, you may want to read Intel's manuals. To give you a basic idea, most instructions are, in fact, families of instructions. At assemble time, the assembler chooses the appropriate one based on the operands that you passed and the rule that "the shorter the better".
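To make the encoding concrete, here is a rough Python sketch (not part of the original posts) of how the register-to-register form of mov is encoded in 32-bit mode, per the Intel manuals: opcode 8Bh plus a ModRM byte, with a 66h operand-size prefix added when the operands are 16-bit. Only a handful of registers are mapped, just enough for the example.

```python
# Sketch: encode "mov reg, reg" in 32-bit protected mode.
# Opcode 8Bh is MOV r16/32, r/m16/32; the 66h operand-size
# prefix overrides the default 32-bit operand size to 16 bits.

REGS32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}
REGS16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3}

def encode_mov_reg_reg(dst, src):
    """Return the byte encoding of `mov dst, src` (register forms only)."""
    if dst in REGS32:
        prefix, reg, rm = b"", REGS32[dst], REGS32[src]
    else:
        prefix, reg, rm = b"\x66", REGS16[dst], REGS16[src]
    modrm = 0b11000000 | (reg << 3) | rm  # mod=11 means register-direct
    return prefix + bytes([0x8B, modrm])

print(encode_mov_reg_reg("eax", "ebx").hex())  # 8bc3   (2 bytes)
print(encode_mov_reg_reg("ax", "bx").hex())    # 668bc3 (3 bytes)
```

So mov ax,bx really is one byte longer than mov eax,ebx: same opcode and ModRM, plus the prefix.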

About 16-bit operands in 32-bit mode: yes, the instructions will be slower because, if I recall correctly, 32 bits are always read from memory and then the interesting part is masked off (same for 8-bit operands).

F.M.: Most people around here agree that the best way to start learning assembly for Windows is Iczelion's tutorials. You can find links to those and much other useful stuff here

Nick

japheth

> About 16-bit operands in 32-bit mode: yes, the instructions will be slower because, if I recall correctly, 32 bits are always read from memory.

True, but it should be noted that it also depends on how the data is aligned - i.e., reading a DWORD from linear address 401002h should be slower than reading a WORD from the very same address.
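The point about 401002h can be sketched with a small helper (my addition, not from the thread): a 4-byte alignment granule is assumed here for illustration, while the real penalty boundary depends on the CPU's bus width and cache-line size.

```python
def crosses_boundary(addr, size, granule=4):
    """True if a `size`-byte access starting at `addr` straddles a
    `granule`-byte boundary, i.e. is a misaligned access."""
    return addr // granule != (addr + size - 1) // granule

# A DWORD at 401002h straddles the 401000h/401004h boundary...
print(crosses_boundary(0x401002, 4))  # True
# ...but a WORD at the very same address fits inside one granule.
print(crosses_boundary(0x401002, 2))  # False
```

Which is exactly why the DWORD read from that address can be the slower of the two, despite 32 bits being the "native" size.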

hutch--

Hi mercifier,

Welcome on board. The best "F" "M" is the manufacturer's technical data, and with Intel this means the PIV set of manuals that you can download from their site. AMD also has technical manuals that are very useful. You already have an asm background, so this approach will probably be of more use to you. x86 is different and quirky in some respects, but it has a mountain of code and examples around and it is very well understood.

With your example of 16- vs 32-bit code, later processors (in the last 10 years) tend to have a preferred internal "WORD" size that they are most efficient at using, and in the case of most of the later processors this means a preferred WORD size of 32 bits. As mentioned above, the 16-bit versions of a particular mnemonic use prefixes which make the code larger, and in most instances the code is also slower.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

mercifier

Thanks for all the answers! (I'll check out that tutorial, for sure!)

In particular, this...

Quote from: hutch-- on March 30, 2007, 12:59:49 PM
16 bit versions of a particular mnemonic use prefixes which make the code larger but in most instances the code is also slower.

was what I was thinking about. I recall reading something like that in a document, but it wouldn't say what impact those prefixes could have on performance. Obviously, an extra byte added to the opcode just to distinguish 16- from 32-bit operation couldn't exactly make it execute faster, but I figured it didn't really matter most of the time when executing on a pipelined processor with out-of-order and speculative execution features. (I.e. the bottleneck isn't in the opcode fetching, but in the execution.)

Or am I wrong?

EduardoS

It's processor specific.
Usually neither the fetching nor the execution (here, throughput) is the bottleneck, but dependency chains are.

add eax, bx; p1
mov var, eax
mov eax, var2; p2

Here p1 and p2 are independent and the OOO engine can execute p2 before p1.

add ax, bx; p1
mov var, ax
mov ax, var2; p2

Here p2 isn't really dependent on p1, but on modern processors it is treated as dependent (a false dependency), so p1 must execute before p2. Agner talks about it, RTFM :bdg
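The false dependency can be sketched with a toy model (my illustration, not from Agner's manuals; register and label names are made up): each instruction lists what it reads and writes, and on a core that does not rename partial registers, writing AX merges into EAX, so it also counts as a *read* of EAX.

```python
def full(reg):
    # Map a 16-bit register name onto its 32-bit parent.
    return {"ax": "eax", "bx": "ebx"}.get(reg, reg)

def deps(prog, rename_partial):
    """prog: list of (label, reads, writes). Returns (earlier, later)
    pairs where `later` must wait for `earlier`'s result."""
    edges, last_writer = [], {}
    for label, reads, writes in prog:
        eff_reads = {full(r) for r in reads}
        if not rename_partial:
            # A 16-bit write merges into the 32-bit register,
            # so it also reads the old full-register value.
            eff_reads |= {full(w) for w in writes if w != full(w)}
        for r in eff_reads:
            if r in last_writer:
                edges.append((last_writer[r], label))
        for w in writes:
            last_writer[full(w)] = label
    return edges

prog32 = [("p1", {"eax", "ebx"}, {"eax"}),   # add eax, ebx
          ("s",  {"eax"},        {"var"}),   # mov var, eax
          ("p2", {"var2"},       {"eax"})]   # mov eax, var2
prog16 = [("p1", {"ax", "bx"},   {"ax"}),    # add ax, bx
          ("s",  {"ax"},         {"var"}),   # mov var, ax
          ("p2", {"var2"},       {"ax"})]    # mov ax, var2

print(deps(prog32, rename_partial=True))    # p2 has no edge: free to reorder
print(deps(prog16, rename_partial=False))   # the merge chains p2 onto p1
```

In the 32-bit version only the store depends on p1; in the 16-bit version p2 picks up an extra edge back to p1 through the EAX merge, which is the false dependency in question.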

mercifier

Quote from: EduardoS on March 31, 2007, 01:22:15 AM

add eax, bx; p1


I suppose it should be add eax, ebx there.

Quote from: EduardoS on March 31, 2007, 01:22:15 AM
Agner talks about it, RTFM :bdg

Gonna check that out. Guess this was a little more complicated than I first thought! :)

EduardoS

Quote
I suppose it should be add eax, ebx there.
True, didn't notice that after the ctrl+c/ctrl+v.