The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: Astro on April 26, 2010, 01:14:49 AM

Title: mov vs. movzx
Post by: Astro on April 26, 2010, 01:14:49 AM
Hi,

Just want to check I understand this correctly:

mov doesn't always overwrite the upper part of a register?

If that is correct, is:

movzx ecx,eax

equivalent to:

xor ecx,ecx
mov ecx,eax


Best regards,
Robin.
Title: Re: mov vs. movzx
Post by: joemc on April 26, 2010, 01:39:40 AM
I think you have the right idea, but mov ecx,eax moves ALL of eax into ecx, so the xor isn't going to do much for you. movzx is for when you want to move an object into a larger destination than its own size.

a better example would be

movzx ecx,  byte ptr 9

is equivalent to
xor ecx, ecx
mov cl, 9

the main time I use it is for string pointers,
to get one character from a string pointed to by eax
movzx ecx, byte ptr [eax]
Title: Re: mov vs. movzx
Post by: Astro on April 26, 2010, 01:46:00 AM
 ::)

I keep forgetting that part - SIZE DIFFERENCE.

Thanks.

Best regards,
Robin.
Title: Re: mov vs. movzx
Post by: hutch-- on April 26, 2010, 02:20:13 AM
Robin,

movzx is useful for a couple of reasons: you can widen a smaller piece of data to the size you want, and it has the added advantage of avoiding a partial register read, which on some earlier Intel hardware gave you a bad stall.

Typically you have a BYTE of data at an address, and you use MOVZX to copy it into a full 32 bit register so it can be used with full 32 bit comparisons.


movzx eax, WORD PTR [esp+22]


This gets a 16 bit WORD directly off the stack and writes it to EAX so you can test against it with full 32 bit instructions which is generally faster than using 16 bit compares.
Title: Re: mov vs. movzx
Post by: Astro on April 26, 2010, 02:45:04 AM
Nice!! Thanks for the info. I was wondering if:

cmp al,1

was faster or the same as

cmp eax,1

You answered that one! :bg

Best regards,
Robin.
Title: Re: mov vs. movzx
Post by: Slugsnack on April 26, 2010, 07:12:38 AM
Quote from: joemc on April 26, 2010, 01:39:40 AM
I think you have the right idea, but mov ecx,eax moves ALL of eax into ecx, so the xor isn't going to do much for you. movzx is for when you want to move an object into a larger destination than its own size.

a better example would be

movzx ecx,  byte ptr 9

is equivalent to
xor ecx, ecx
mov cl, 9

the main time I use it is for string pointers,
to get one character from a string pointed to by eax
movzx ecx, byte ptr [eax]
actually those 2 are not equivalent. byte ptr 9 is not the same as the value 9. byte ptr 9 is the byte that is pointed to by memory address 9. most likely that would cause an access violation since no memory is allocated there.

@ Astro : generally, working with full registers rather than partial registers is quicker, so cmp eax, 1 would be faster in this case
Title: Re: mov vs. movzx
Post by: jj2007 on April 26, 2010, 08:46:34 AM
Quote from: Astro on April 26, 2010, 01:14:49 AM
If that is correct, is:
movzx ecx,eax
equivalent to:

xor ecx,ecx
mov ecx,eax


Robin,
movzx eax, ecx will first of all generate error A2070:invalid instruction operands
What works is
movzx eax, ax
movsx eax, ax (sign extension)
movzx eax, cl
movsx eax, cl (sign extension)
movzx eax, word ptr [mem]
movsx eax, word ptr [mem]
movzx eax, byte ptr [mem]
movsx eax, byte ptr [mem]
Title: Re: mov vs. movzx
Post by: Astro on April 26, 2010, 09:40:03 AM
Hi jj,

Yes - I already discovered that! :bg  Thanks for pointing it out though.

@joemc:
Quote
actually those 2 are not equivalent. byte ptr 9 is not the same as the value 9.
I've seen it written both ways, and it appears to work identically?

e.g.
mov al, byte ptr 8
mov byte ptr al,8


O/T slightly: I've found a weird error I get in one app but not another.

SomeProc proc Buffer:DWORD

mov eax,Buffer
mov eax,[eax] ; this line fails build in one app but works in another????? Build options are the same except processor type.
; working processor is 386, failing processor is 486.


Best regards,
Robin.
Title: Re: mov vs. movzx
Post by: Slugsnack on April 26, 2010, 09:56:34 AM
Astro : those instructions do very different things. the first treats 8 as an address and moves the byte at memory address 8 into al. this would cause an access violation on most machines since that is below the minimum application address. the second is not even a valid instruction: you are telling the computer to move the value 8 into the byte pointed to by al, but the machine can not treat al as a pointer, since pointers are all 32 bits. even if we were to do the following :
movzx eax, al
mov byte ptr ds:[eax], 8

that is very very different to the 'opposite' of the first instruction which i assume was your intention
Title: Re: mov vs. movzx
Post by: MichaelW on April 26, 2010, 10:19:57 AM
For both of these statements:

mov al, byte ptr 8
mov byte ptr al, 8

MASM ignores the unnecessary BYTE PTR operators (unnecessary because the size can be determined from the destination operand) and assembles a MOV reg, immed:

MOV AL, 8

Title: Re: mov vs. movzx
Post by: Slugsnack on April 26, 2010, 10:37:55 AM
this is a subject that has been discussed in the past, how :
mov al, byte ptr 8

actually assembles to :
mov al, 8

personally i believe this can be regarded as a 'bug' on the part of the assembler. the first instruction is actually valid. despite the fact that in 90% of cases, the user intended the second instruction, i do not believe it should be just changed.

above is a clear example of where there has been confusion about this already. the equivalence joemc claims is actually true if that is what you input into masm32 since it is 'translated' like above. however if you assembled that code within ollydbg, the semantics becomes very different
Title: Re: mov vs. movzx
Post by: Astro on April 26, 2010, 12:29:11 PM
So MASM behavior is:

mov al, byte ptr 8
to:
mov al, 8
which is actually INCORRECT?

Best regards,
Robin.
Title: Re: mov vs. movzx
Post by: dedndave on April 26, 2010, 12:32:57 PM
MOV AL,8 is good enough
no need to use "byte ptr", as AL is a byte register, so the operand can only be a byte
Title: Re: mov vs. movzx
Post by: brethren on April 26, 2010, 12:57:46 PM
Quote from: Astro on April 26, 2010, 01:14:49 AM
Hi,

Just want to check I understand this correctly:

mov doesn't always overwrite the upper part of a register?

If that is correct, is:

movzx ecx,eax

equivalent to:

xor ecx,ecx
mov ecx,eax


Best regards,
Robin.

movzx ecx, eax is not a valid instruction

the syntax for movzx (move with zero extend)

movzx r32, r/m16
movzx r32, r/m8
movzx r16, r/m8

i'm pretty sure you just made a typo and you meant movzx ecx, ax. if thats what you meant then you're right

xor ecx, ecx
mov cx, ax

would accomplish the same thing as

movzx ecx, ax
Title: Re: mov vs. movzx
Post by: MichaelW on April 26, 2010, 01:57:22 PM
Quote from: Slugsnack on April 26, 2010, 10:37:55 AM
this is a subject that has been discussed in the past, how :
mov al, byte ptr 8

actually assembles to :
mov al, 8

personally i believe this can be regarded as a 'bug' on the part of the assembler. the first instruction is actually valid.

It is valid, but it is not interpreted as:

mov al, byte ptr [8]

as it would be in, for example, Debug or CodeView.

Judging from multiple statements to this effect in the MASM documentation, the PTR operator is intended for specifying operand size. It is not intended for specifying direct memory operands.
Title: Re: mov vs. movzx
Post by: Slugsnack on April 26, 2010, 05:22:46 PM
Quote from: MichaelW on April 26, 2010, 01:57:22 PM
Quote from: Slugsnack on April 26, 2010, 10:37:55 AM
this is a subject that has been discussed in the past, how :
mov al, byte ptr 8

actually assembles to :
mov al, 8

personally i believe this can be regarded as a 'bug' on the part of the assembler. the first instruction is actually valid.

It is valid, but it is not interpreted as:

mov al, byte ptr [8]

as it would be in, for example, Debug or CodeView.

Judging from multiple statements to this effect in the MASM documentation, the PTR operator is intended for specifying operand size. It is not intended for specifying direct memory operands.

yes, however :
mov al, [8]

is also incorrectly interpreted as :
mov al, 8

when surely it should be :
mov al, byte ptr ds:[8]
Title: Re: mov vs. movzx
Post by: Astro on April 26, 2010, 05:38:39 PM
The MASM docs are clear in that:

mov eax,Buffer

is the same as:

mov eax,[Buffer]

The [ ] are not explicitly required (this caused me much confusion when I first started!).




The only time [ ] are explicitly required is when dealing with registers:

mov eax,ecx

is NOT the same as:

mov eax,[ecx]




Immediates are not affected, so:

mov eax,8

is the same as:

mov eax,[8]



Quote
when surely it should be :
mov al, byte ptr ds:[8]

I just tried this - you're correct. [8] should become ds:[8] when built, but it does not.

Best regards,
Robin.
Title: Re: mov vs. movzx
Post by: dedndave on April 26, 2010, 05:40:37 PM
welllll - that isn't quite true
try assembling this and see what code is generated...

        mov     al,8
        mov     al,[8]     ;this one may generate an error message - if so, remove it
        mov     al,ds:[8]
Title: Re: mov vs. movzx
Post by: Astro on April 26, 2010, 05:42:54 PM
Quote
mov     al,[8]     ;this one may generate an error message - if so, remove it
It does not generate an error. [8] becomes an immediate value, not a memory reference.

Best regards,
Robin.
Title: Re: mov vs. movzx
Post by: dedndave on April 26, 2010, 05:46:06 PM
here is what i get
00401014 B008                    mov al,08
00401016 B008                    mov al,08
00401018 A008000000              mov al,[00000008]
Title: Re: mov vs. movzx
Post by: clive on April 26, 2010, 07:23:24 PM
I love what it did with "mov al,byte ptr [512]"

Microsoft (R) Macro Assembler Version 6.15.8803     04/26/10 14:20:26
test4.asm      Page 1 - 1


        .386
        .MODEL Flat

00000000         .DATA

00000000 01 foo     db      1

00000000         .CODE

00000000 start:

00000000  B0 08         mov al,8
00000002  B0 08         mov al,[8]
00000004  B0 08         mov al,byte ptr [8]
00000006  A0 00000008         mov al,ds:[8]
0000000B  B0 08         mov al,0[8]
0000000D  B0 7F         mov al,[127]
0000000F  B0 80         mov al,[128]
00000011  B0 FF         mov al,[255]
;        mov al,[512] ; chokes
00000013  B0 00         mov al,byte ptr [512]
00000015  A0 00000000 R         mov al,foo

        END start


At least MASM 5.1 reports an error for both 512 references

Microsoft (R) Macro Assembler Version 5.10                  4/26/10 14:32:34
                                                             Page     1-1


        .386

0000 _data   SEGMENT PARA PUBLIC 'DATA'

0000  01 foo     db      1

0001 _data   ENDS

0000 _text   SEGMENT PARA USE32 PUBLIC 'CODE'
        ASSUME CS:_text, DS:_data

0000 start:

0000  B0 08         mov al,8
0002  B0 08         mov al,[8]
0004  B0 08         mov al,byte ptr [8]
0006  A0 00000008         mov al,ds:[8]
000B  B0 08         mov al,0[8]
000D  B0 7F         mov al,[127]
000F  B0 80         mov al,[128]
0011  B0 FF         mov al,[255]
0013  B0 00         mov al,[512] ; chokes
test5.asm(22): error A2050: Value out of range
0015  B0 00         mov al,byte ptr [512]
test5.asm(23): error A2050: Value out of range
0017  A0 00000000 R         mov al,foo
001C  B0 08         mov al,offset 8

001E _text   ENDS

        END start
Title: Re: mov vs. movzx
Post by: dedndave on April 26, 2010, 07:44:49 PM
that is interesting
it might be nice to know
but, i would hate to depend on it being a feature instead of a bug
otherwise, it could be handy with an equate
i suppose you could trust:

mov al,byte ptr SomeEquate and 255

the older assemblers would have spit out a syntax error, expecting the equate to find a 16-bit destination
Title: Re: mov vs. movzx
Post by: MichaelW on April 26, 2010, 07:50:06 PM
Quote from: Slugsnack on April 26, 2010, 05:22:46 PM
yes, however :
mov al, [8]
is also incorrectly interpreted as :
mov al, 8

Coming to MASM from Debug, this particular detail threw me too. The section Direct Memory Operands here (http://webster.cs.ucr.edu/Page_TechDocs/MASMDoc/ProgrammersGuide/Chap_03.htm) specifies how the index operator is used, and under Segment Override it says:
Quote
A segment name override or the segment override operator identifies the operand as an address expression.
. . .
As the example shows, a constant expression cannot be an address expression unless it has a segment override.

So you can get the behavior that you expect with:

mov al, ds:[8]


Title: Re: mov vs. movzx
Post by: clive on April 26, 2010, 08:41:29 PM
MSVC 12.00 is at least consistent. Though I've never really used the opcode in this form, generally indexing from some base register, or referencing a data structure that MASM knows about.

#include <windows.h>

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  __asm
  {
        mov al,8
        mov al,[8]
        mov al,byte ptr [8]
        mov al,ds:[8]
        mov al,0[8]
        mov al,[127]
        mov al,[128]
        mov al,[255]
        mov al,[512]
        mov al,byte ptr [512]
  }

  return(0);
}


Disassembly

00000050                    _main:
00000050 55                     push    ebp
00000051 8BEC                   mov     ebp,esp
00000053 53                     push    ebx
00000054 56                     push    esi
00000055 57                     push    edi
00000056 B008                   mov     al,8
00000058 B008                   mov     al,8
0000005A B008                   mov     al,8
0000005C 3EA008000000           mov     al,ds:[8]
00000062 B008                   mov     al,8
00000064 B07F                   mov     al,7Fh
00000066 B080                   mov     al,80h
00000068 B0FF                   mov     al,0FFh
0000006A B000                   mov     al,0
0000006C B000                   mov     al,0
0000006E 33C0                   xor     eax,eax
00000070 5F                     pop     edi
00000071 5E                     pop     esi
00000072 5B                     pop     ebx
00000073 5D                     pop     ebp
00000074 C3                     ret
Title: Re: mov vs. movzx
Post by: hutch-- on April 27, 2010, 01:22:35 AM
The general drift with historical Intel notation, which MASM more or less preserves, is that it is a fully specified language, which means you can still write it in much the same way as a CL.EXE asm dump does, where the data size is specified with every instruction.

While other tools use different notation, masm uses named variables which have corresponding addresses so if you have a stack variable [ebp+16] that is named as "var", placing square brackets around it is the same as writing [[ebp+16]] which is ambiguous as the x86 hardware does not have the mechanism for multiple levels of indirection. masm will allow mov eax, [ecx+edx*4][128] where the contents of the second pair of square brackets are ADDED to the address like any normal displacement but with [named_variable] it just ignores the notation and you just get "named_variable".

Over time a shorthand has developed where you can omit the size specifier "BYTE PTR" and similar if the size can be determined from either of the operands by the assembler but where the size cannot be determined the full specification is required.

movzx eax, [ebp+16]  ; goes bang because there is no way to determine the size of the data to be zero extended.

movzx eax, WORD PTR [ebp+16]   ; removes the ambiguity.
Title: Re: mov vs. movzx
Post by: joemc on April 27, 2010, 02:28:48 AM
oops... i had never zero extended an immediate value (since there really is no point), so i just accidentally typed it that way. I did test it before i posted and it worked :) but i agree it is a confusing way to write it when disassemblers and debuggers tend to read it differently.