Stack: size of instructions using local variables

Started by jj2007, April 04, 2008, 08:12:38 PM

jj2007

Maybe this is trivial but it might be ok for the Campus: I stumbled over a big difference in the size of instructions when using local variables. Here is my test proc:

StackTest proc arg:DWORD
LOCAL smallbuffer[BufSize]:BYTE
LOCAL locVar:DWORD
LOCAL buffer[800]:BYTE
mov locVar, 87654321h
mov eax, locVar
mov eax, arg
mov locVar, eax
mov arg, eax
ret
StackTest endp


Now, with BufSize=128 (a 128-byte smallbuffer), Olly finds this:

00404E3B  /$ 55                          PUSH EBP
00404E3C  |. 8BEC                        MOV EBP,ESP
00404E3E  |. 81C4 5CFCFFFF               ADD ESP,-3A4
00404E44  |. C785 7CFFFFFF 21436587      MOV DWORD PTR SS:[EBP-84],87654321    ; ***
00404E4E  |. 8B85 7CFFFFFF               MOV EAX,DWORD PTR SS:[EBP-84]    ; ***
00404E54  |. 8B45 08                     MOV EAX,DWORD PTR SS:[EBP+8]
00404E57  |. 8985 7CFFFFFF               MOV DWORD PTR SS:[EBP-84],EAX    ; ***
00404E5D  |. 8945 08                     MOV DWORD PTR SS:[EBP+8],EAX
00404E60  |. C9                          LEAVE
00404E61  \. C2 0400                     RETN 4


However, the same with BufSize=120 looks quite different:

00404E3B  /$ 55                          PUSH EBP
00404E3C  |. 8BEC                        MOV EBP,ESP
00404E3E  |. 81C4 64FCFFFF               ADD ESP,-39C
00404E44  |. C745 84 21436587            MOV DWORD PTR SS:[EBP-7C],87654321    ; ***
00404E4B  |. 8B45 84                     MOV EAX,DWORD PTR SS:[EBP-7C]    ; ***
00404E4E  |. 8B45 08                     MOV EAX,DWORD PTR SS:[EBP+8]
00404E51  |. 8945 84                     MOV DWORD PTR SS:[EBP-7C],EAX    ; ***
00404E54  |. 8945 08                     MOV DWORD PTR SS:[EBP+8],EAX
00404E57  |. C9                          LEAVE
00404E58  \. C2 0400                     RETN 4


Note the difference in size for the instructions marked ***: each one is 3 bytes shorter, e.g. 3 bytes instead of 6 for the MOV EAX load, and 7 instead of 10 for the store of the immediate. If there are lots of local variables around, it may make a big difference in size (and speed? I guess it also affects the CPU cache).
Simple conclusion: Start the LOCALs with DWORD variables, and put long structures and buffers at the end.
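
A quick way to check an ordering without reassembling is to model the frame layout. The following Python sketch is only an illustration and not part of the original test: the helper names local_offsets and displacement_bytes are made up here, and it assumes MASM places LOCALs below EBP in declaration order without padding, which is what the two listings above suggest.

```python
# Sketch: model how LOCALs are laid out below EBP, in declaration order.
# Assumption: each local starts at the next lower address with no padding,
# which matches the offsets seen in the two Olly listings.

def local_offsets(locals_in_order):
    """Return {name: EBP-relative offset} for (name, size_in_bytes) pairs."""
    offsets = {}
    position = 0
    for name, size in locals_in_order:
        position -= size          # locals grow downward from EBP
        offsets[name] = position
    return offsets

def displacement_bytes(offset):
    """1 if the offset fits a signed byte (disp8), otherwise 4 (disp32)."""
    return 1 if -128 <= offset <= 127 else 4

buffers_first = [("smallbuffer", 128), ("locVar", 4), ("buffer", 800)]
dwords_first = [("locVar", 4), ("smallbuffer", 128), ("buffer", 800)]

for label, order in (("buffers first", buffers_first),
                     ("DWORDs first", dwords_first)):
    offset = local_offsets(order)["locVar"]
    print(f"{label}: locVar at EBP{offset:+d}, "
          f"displacement = {displacement_bytes(offset)} byte(s)")
```

With the buffer declared first, locVar lands at EBP-132 (-84h) and needs a 4-byte displacement; declared first, it sits at EBP-4 and a 1-byte displacement is enough.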

Ossa

Well, this isn't too surprising. Look at the encoding of instructions:


  • Prefixes
  • Opcode
  • Mod R/M
  • SIB
  • Displacement
  • Immediate

Now, looking at the stack after this:

push Param1
push Param2
call Some_Function


or

invoke Some_Function, Param2, Param1

you get:

+-------------------+
|  Old stack items  | <----- (from calling procedure)
|-------------------|
|      Param 1      |
|-------------------|
|      Param 2      |
|-------------------|
|       EIP 1       |
|-------------------|
|       EBP 1       |
|-------------------| <----- ebp
|      Local 1      |
|-------------------|
|      Local 2      |
|-------------------| <----- esp
|                   |
+-------------------+


So to reference a LOCAL, you refer to it as a displacement from ebp (since esp will vary with the pushes and pops in the procedure). The opcode is 1 byte, the Mod R/M is also 1 byte, and the displacement is either 1, 2 or 4 bytes (according to the Intel manual). In the second example you have (for example):

00404E4B  |. 8B45 84                     MOV EAX,DWORD PTR SS:[EBP-7C]

; 8B = Opcode:            MOV r32, r/m32
; 45 = Mod R/M:           EAX, EBP+disp8
; 84 = Displacement (8):  84 = -7C (2's complement)


This format lets you address offsets from -80h (-128) to +7Fh (+127). In the first example, because your first local is larger, the displacement falls outside that range (-84h = -132), so a larger displacement is needed. What is not apparent in the Intel manuals at first is that 2-byte displacements are only available with 16-bit addressing (as in 16-bit real mode). This means that MASM has to switch to a 32-bit displacement for the first example, hence:

00404E4E  |. 8B85 7CFFFFFF               MOV EAX,DWORD PTR SS:[EBP-84]

; 8B = Opcode:            MOV r32, r/m32
; 85 = Mod R/M:           EAX, EBP+disp32
; 7CFFFFFF = Displacement (32): FFFFFF7C = -84 (2's complement)
; (the DWORD might look a bit wrong due to the little endian nature of the IA-32 architecture)


That is where your 3-byte increase comes from. Yes, it can affect speed, as there is a limit on the instruction sizes that the processor can decode under certain circumstances. As you say, declaring small variables first brings most of your local variables considerably closer to ebp. However, I'd like to qualify that a bit: if a larger buffer is referenced much more often than a smaller one, it would be better to put the larger buffer first, since that shortens more instructions than the other order would.
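
The encoding rule described above can be sketched in a few lines of Python. This is an illustration only and handles just MOV EAX,[EBP+disp]; the function name mov_eax_from_ebp is hypothetical, and the register numbers (EAX = 0, EBP = 5) are the usual Mod R/M register encodings.

```python
import struct

EAX, EBP = 0, 5  # register numbers as used in the Mod R/M byte

def mov_eax_from_ebp(disp):
    """Encode MOV EAX,[EBP+disp] with the shortest displacement that fits."""
    opcode = 0x8B  # MOV r32, r/m32
    if -128 <= disp <= 127:
        mod = 0b01                            # [reg + disp8]
        disp_bytes = struct.pack("<b", disp)  # one signed byte
    else:
        mod = 0b10                            # [reg + disp32]
        disp_bytes = struct.pack("<i", disp)  # four bytes, little endian
    modrm = (mod << 6) | (EAX << 3) | EBP
    return bytes([opcode, modrm]) + disp_bytes

for disp in (-0x7C, -0x84):
    code = mov_eax_from_ebp(disp)
    hex_bytes = " ".join(f"{b:02X}" for b in code)
    print(f"disp {disp:+#x}: {hex_bytes} ({len(code)} bytes)")
```

For the two displacements from the listings this reproduces the byte sequences Olly shows: 8B 45 84 for EBP-7C and 8B 85 7C FF FF FF for EBP-84.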

Ossa

[edit] I think that was a silly post from me... it didn't really add anything to what you said jj2007. [/edit]
Website (very old): ossa.the-wot.co.uk

jj2007

Quote from: Ossa on April 04, 2008, 10:42:59 PM
[edit] I think that was a silly post from me... it didn't really add anything to what you said jj2007. [/edit]

No, that was not silly: you added the theory. There are plenty of novices who read this forum, and you explained it really nicely - thanxalot!