The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: Vineel Kumar Reddy Kovvuri on January 09, 2012, 04:48:56 PM

Title: Generation of push ecx instruction after the prolog of function
Post by: Vineel Kumar Reddy Kovvuri on January 09, 2012, 04:48:56 PM
Hi,

I have often seen "push ecx" generated right after the prolog of a function,
and the slot this instruction creates on the stack is then accessed as [ebp - 4] for a local variable.
My question is: is there any special reason for this instruction, or is it generated
as shorthand for making space for the local variable (instead of using sub esp, 4)?


Thanks in advance.
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: dedndave on January 09, 2012, 04:58:28 PM
Quote:
it is generated as shorthand for making space for the local variable (instead of using sub esp, 4)

:U

which brings up a trick...
initialize ECX, and you have initialized your local   :bg
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: donkey on January 09, 2012, 05:39:27 PM
The PUSH ECX is (usually) the result of a USES ECX directive in the PROC declaration. It preserves the value of ECX across calls to the procedure; a matching POP ECX is added to the RET expansion.
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: dedndave on January 09, 2012, 05:56:05 PM
read the post carefully, Edgar   :P
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: donkey on January 09, 2012, 06:17:59 PM
Ah, missed that. I once did that to pass a result in EAX: I put in a USES EAX, then pointed the RESULT label to that space on the stack. I thought it was pretty cool, but it was pretty much pointless. If you only need 4 bytes of stack space you might save a couple of bytes with the push, but it's hardly worth it.
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: NoCforMe on January 09, 2012, 06:24:54 PM
Quote:
PUSH   ECX

Are you sure about this? I've never seen this instruction generated in my code. Under what conditions is this generated?

I can't even see what the purpose of this would be, unless you're using ECX in the subroutine. ECX has nothing to do with the stack frame--that's what EBP is used for, no? Or am I missing something here?
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: clive on January 09, 2012, 06:49:12 PM
Quote from: NoCforMe
Are you sure about this? I've never seen this instruction generated in my code. Under what conditions is this generated?

No Soup for you, it's something MSVC does, without optimization even, and thus would be present in just about any commercial application or driver you care to look at.

Quote:
#include <stdio.h>

int test(void)
{
  int i;

  return(i);
}

int main(int argc, char **argv)
{
  test();

  return(1);
}

Quote:
Disassembly

00000030                    _test:                      ; Xref 0000003E
00000030 55                     push    ebp
00000031 8BEC                   mov     ebp,esp
00000033 51                     push    ecx
00000034 8B45FC                 mov     eax,[ebp-4]
00000037 8BE5                   mov     esp,ebp
00000039 5D                     pop     ebp
0000003A C3                     ret

0000003B                    _main:
0000003B 55                     push    ebp
0000003C 8BEC                   mov     ebp,esp
0000003E E8EDFFFFFF             call    _test
00000043 B801000000             mov     eax,1
00000048 5D                     pop     ebp
00000049 C3                     ret
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: NoCforMe on January 09, 2012, 07:08:11 PM
Ah, I see. Not generated by any assembler, and just a way of moving the stack pointer.

So is a PUSH cheaper/faster than a SUB of SP? I guess I'll have to look at Hutch's opcode help file to find out.

Can I get my soup now?
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: Vineel Kumar Reddy Kovvuri on January 09, 2012, 07:12:47 PM
Quote from: donkey on January 09, 2012, 05:39:27 PM
The PUSH ECX is (usually) the result of a USES ECX directive in the PROC declaration. It preserves the value of ECX across calls to the procedure; a matching POP ECX is added to the RET expansion.


There is no POP ECX in the generated code in the function


_main PROC
push ebp
mov ebp, esp
push ecx
mov DWORD PTR [ebp-4], 4660 ; 00001234H
mov eax, 22136 ; 00005678H
mov esp, ebp
pop ebp
ret 0


for

int main()
{
int x = 0x1234;
return 0x5678;
}


compiled as

cl /Zi /GS- test.c /Fasc

Title: Re: Generation of push ecx instruction after the prolog of function
Post by: dedndave on January 09, 2012, 07:16:58 PM
the MASM epilogue generally uses LEAVE, thus no balancing POP is required
if you guys are depending on this thread for lunch, you'll go hungry   :(
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: clive on January 09, 2012, 07:20:27 PM
Quote from: Vineel Kumar Reddy Kovvuri
There is no POP ECX in the generated code in the function

No, because the stack frame is effectively collapsed with the MOV ESP,EBP and the content of local/automatic variables is lost as the scope disappears.

Now if the code used EBX,ESI,EDI, you might find it does something different.

You'll also note that LEAVE doesn't track what the prologue code, or ENTER, does.
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: jj2007 on January 09, 2012, 08:21:44 PM
I took the liberty of timing this MSVC-optimised code.
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
499     cycles for 100*MSVC
500     cycles for 100*ByHand

499     cycles for 100*MSVC
500     cycles for 100*ByHand


By the way: "ByHand" is exactly one byte shorter. Guess why :bg
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: clive on January 09, 2012, 09:41:15 PM
Quote from: dedndave
which brings up a trick... initialize ECX, and you have initialized your local

Or PUSH 0 or PUSH 01234h, which might be even cleverer. Larger tables would be pushed in reverse order, of course.

Though typically what C will do is allocate the space, and then copy in the initializers. This can get highly inefficient, with say a large table of constants like CRC tables, where the programmer really should have chosen static const so it would be stored in the code section or in a ROM.

ByHand doesn't maintain the same stack pointer. The example was more of a quick hack to demonstrate a case where PUSH ECX existed, vs the ADD ESP,-4 which would normally occur as the frame is created.

AMD Phenom(tm) II X6 1055T Processor (SSE3)
720     cycles for 100*MSVC
499     cycles for 100*ByHand

710     cycles for 100*MSVC
500     cycles for 100*ByHand
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: jj2007 on January 09, 2012, 09:56:10 PM
In any case the push ecx in Clive's disassembled C example does ... nothing, absolutely nothing useful:

ByHand:
   push ebp
   mov ebp, esp
   push ecx  ; you don't need this one
   mov eax, [ebp-4]
if 0
   leave  ; pardon, too easy and too short  :wink
else
   mov esp, ebp
   pop ebp
endif
   ret

Unless you use Dave's trick, but 1. that does not appear to be the purpose of the C example and 2. it could easily be achieved with a mov eax, ecx...
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: Antariy on January 10, 2012, 02:40:34 AM
Quote from: jj2007 on January 09, 2012, 09:56:10 PM
In any case the push ecx in Clive's disassembled C example does ... nothing, absolutely nothing useful:

Actually, yes, but that was a "strange" example just to show the point. The code makes no sense, and the compiler will even complain that the function uses an uninitialized local variable (because the result is unpredictable, a kind of random number generator).
But, if the code would be changed a bit, for example:


int test(void)
{
  int i;
  printf("qwe\n"); // <<<

  return(i);
}

int main(int argc, char **argv)
{
  test();

  return(1);
}



then without "push ecx" no space would be allocated for the local "i": esp = ebp, so [ebp-4] is the next slot to be pushed, and it receives the address of the string passed to printf.


push ebp
mov ebp,esp     ; [ebp] = [esp]
; push ecx       ; <<< if this is removed from the real working code, then
push CTXT("qwe") ; [ebp-4] = [esp]
call _printf
mov eax,[ebp-4]  ; if printf did not modify its argument (the pointer to the string),
mov esp,ebp      ; then eax is the pointer to the string "qwe"
pop ebp
ret


I.e. without "push ecx" there is a chance that the local "i" gets overwritten. The compiler, which follows the simple rule "be as straightforward and robust as possible", just produces this local allocation (it could use "sub esp,4", but that is 3 times longer than "push reg") without any assumptions about the usefulness of the code or of the local variable itself :green2
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: Antariy on January 10, 2012, 02:48:58 AM
Also, sometimes it is very funny to see some pieces of code. For example, MSVC++ passes the object pointer to class member functions via ECX, and a function may begin with code like this:


push ebp
mov ebp,esp
push ecx ; ECX is the ptr to the object
mov [ebp-4],ecx ; very cool...


Maybe this is a limitation of optimized code generation caused by EH (if EH is present, it needs the [ebp-4] slot).



Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
1281    cycles for 100*MSVC
1502    cycles for 100*ByHand

1285    cycles for 100*MSVC
1499    cycles for 100*ByHand
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: clive on January 10, 2012, 08:11:11 AM
Quote from: jj2007
In any case the push ecx in Clive's disassembled C example does ... nothing, absolutely nothing useful

Indeed, it was a minimal contrived test case that generates the PUSH ECX being queried by the OP. In fact, the compiler can also generate a pair of PUSH ECX's instead of an ADD ESP,-8.

The OP in fact had an example where the space for the local variable was created by the PUSH ECX and subsequently initialized. And, as I also noted, that might more efficiently be achieved with an immediate PUSH of the constant itself.

The point was that it is a fairly common code construct, despite others never seeing it before.
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: jj2007 on January 10, 2012, 08:47:42 AM
Clive,
No offense intended, and sorry that I hijacked this to demonstrate that compilers lack intelligence :bg
A propos,
   mov esp, ebp
   pop ebp

equals
   leave
Two bytes shorter, same speed, at least on a P4.
:thumbu
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: clive on January 10, 2012, 11:46:36 PM
Ok, I was trying to understand the goal of timing it. You'd probably want to examine a more practical example.

The whole ENTER/LEAVE concept has become quite murky, I've not looked at how this performs across different micro-architectures recently. I've not even seen the more complex ENTER forms in a very long while.
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: dedndave on January 11, 2012, 01:48:38 AM
i think ENTER is slow enough that it is better to "manually" create the stack frame
LEAVE, on the other hand makes for a nice little shortcut
if you have no locals and a balanced stack, you can just POP EBP, though   :P
Title: Re: Generation of push ecx instruction after the prolog of function
Post by: jj2007 on January 11, 2012, 02:11:01 AM
Clive & Dave,
For you:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
1203    cycles for 100*TestFrames Enter
600     cycles for 100*TestFrames Leave
1203    cycles for 100*TestFrames Enter & Leave
499     cycles for 100*TestFrames Push & Pop

1203    cycles for 100*TestFrames Enter
600     cycles for 100*TestFrames Leave
1204    cycles for 100*TestFrames Enter & Leave
499     cycles for 100*TestFrames Push & Pop

12      bytes for TestFramesE
9       bytes for TestFramesL
10      bytes for TestFramesEL
11      bytes for TestFramesPP


Which confirms Dave's view: avoiding Enter is a no-brainer, while Leave is quite OK unless you have the strange habit of calling time-critical code in an innermost loop instead of inlining it. Leave costs one cycle more than mov/pop on my Celeron, and zero extra cycles on the P4.