Hi,
I have often seen "push ecx" generated right after the prologue of a function,
and the stack slot created by that instruction is then accessed as [ebp - 4] for the local variable.
My question is: is there any special reason for this instruction, or is it generated
as a shorthand for making space for the local variable (instead of using sub esp, 4)?
Thanks in advance.
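To put numbers on the size side of the question, here is a minimal sketch (the helper names are hypothetical) that counts the code bytes each approach needs to reserve n dwords of locals in 32-bit code. It uses the standard IA-32 encodings PUSH ECX = 51, SUB ESP,imm8 = 83 EC ib and SUB ESP,imm32 = 81 EC id, and it models size only, not speed.

```c
#include <assert.h>
#include <stddef.h>

/* Code bytes to reserve n_dwords of locals with repeated PUSH ECX.
   Each "push ecx" encodes as the single byte 51h. */
static size_t bytes_push_ecx(size_t n_dwords) {
    return n_dwords;
}

/* Code bytes for one SUB ESP, n in 32-bit mode:
   83 EC ib (3 bytes) while n fits a sign-extended imm8 (<= 127),
   81 EC id (6 bytes) otherwise. */
static size_t bytes_sub_esp(size_t n_dwords) {
    assert(n_dwords > 0);
    size_t n = n_dwords * 4;   /* bytes of stack to reserve */
    return n <= 127 ? 3 : 6;
}
```

With one dword the push is 1 byte against 3; with two it is 2 against 3; at three they tie, and beyond that a single SUB ESP is smaller.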
Quote
it is generated as a shorthand for making space for the local variable (instead of using sub esp, 4)
:U
which brings up a trick...
initialize ECX, and you have initialized your local :bg
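Dave's trick can be checked with a toy model of the stack. This is a hypothetical sketch (Cpu, slot, push and local_after_push_ecx are illustrative names, and memory is modelled as an array of dwords): it runs the prologue push ebp / mov ebp,esp / push ecx and reads [ebp-4], which holds whatever ECX contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of a 32-bit stack: 64 dwords, ESP grows down. */
enum { STACK_BYTES = 256 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp, ecx;
} Cpu;

/* Byte address -> dword slot, with alignment and bounds checks. */
static uint32_t *slot(Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return &c->mem[addr / 4];
}

static void push(Cpu *c, uint32_t v) {
    c->esp -= 4;
    *slot(c, c->esp) = v;
}

/* Prologue with a one-dword local created by PUSH ECX; returns [ebp-4]. */
static uint32_t local_after_push_ecx(uint32_t ecx_value) {
    Cpu c = { .esp = STACK_BYTES, .ecx = ecx_value };
    push(&c, c.ebp);              /* push ebp                   */
    c.ebp = c.esp;                /* mov ebp, esp               */
    push(&c, c.ecx);              /* push ecx: space AND value  */
    return *slot(&c, c.ebp - 4);  /* mov eax, [ebp-4]           */
}
```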
The PUSH ECX is (usually) the result of a USES ECX directive in the PROC declaration. It is used to preserve the value of ECX across calls to the procedure; a matching POP ECX is added to the RET macro.
read the post carefully, Edgar :P
Ah, missed that. I once did that to pass a result in EAX: I put in a USES EAX, then pointed the RESULT label at that space on the stack. I thought it was pretty cool, but it was pretty much pointless. If you only need 4 bytes of stack space you might save a couple of bytes with the push, but it's hardly worth it.
Quote
PUSH ECX
Are you sure about this? I've never seen this instruction generated in my code. Under what conditions is this generated?
I can't even see what the purpose of this would be, unless you're using ECX in the subroutine. ECX has nothing to do with the stack frame--that's what EBP is used for, no? Or am I missing something here?
Quote from: NoCforMe
Are you sure about this? I've never seen this instruction generated in my code. Under what conditions is this generated?
No soup for you! It's something MSVC does, even without optimization, so it turns up in just about any commercial application or driver you care to look at.
Quote
#include <stdio.h>
int test(void)
{
int i;
return(i);
}
int main(int argc, char **argv)
{
test();
return(1);
}
Quote
Disassembly
00000030 _test: ; Xref 0000003E
00000030 55 push ebp
00000031 8BEC mov ebp,esp
00000033 51 push ecx
00000034 8B45FC mov eax,[ebp-4]
00000037 8BE5 mov esp,ebp
00000039 5D pop ebp
0000003A C3 ret
0000003B _main:
0000003B 55 push ebp
0000003C 8BEC mov ebp,esp
0000003E E8EDFFFFFF call _test
00000043 B801000000 mov eax,1
00000048 5D pop ebp
00000049 C3 ret
Ah, I see. Not generated by any assembler, and just a way of moving the stack pointer.
So is a PUSH cheaper/faster than a SUB of SP? I guess I'll have to look at Hutch's opcode help file to find out.
Can I get my soup now?
Quote from: donkey on January 09, 2012, 05:39:27 PM
The PUSH ECX is (usually) the result of a USES ECX directive in the PROC declaration. It is used to preserve the value of ECX across calls to the procedure; a matching POP ECX is added to the RET macro.
There is no POP ECX in the generated code in the function
_main PROC
push ebp
mov ebp, esp
push ecx
mov DWORD PTR [ebp-4], 4660 ; 00001234H
mov eax, 22136 ; 00005678H
mov esp, ebp
pop ebp
ret 0
for
int main()
{
int x = 0x1234;
return 0x5678;
}
compiled as
cl /Zi /GS- test.c /Fasc
the MASM epilogue generally uses LEAVE, thus no balancing POP is required
if you guys are depending on this thread for lunch, you'll go hungry :(
Quote from: Vineel Kumar Reddy Kovvuri
There is no POP ECX in the generated code in the function
No, because the stack frame is effectively collapsed with the MOV ESP,EBP and the content of local/automatic variables is lost as the scope disappears.
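Now if the code used EBX, ESI or EDI, you might find it does something different: those registers are callee-saved, so the compiler pushes them in the prologue and pops them explicitly before the epilogue.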
You'll also note that LEAVE doesn't track what the prologue code, or ENTER, does.
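Why no POP ECX is needed can be shown with a toy stack model (a hypothetical sketch; Cpu, slot, push, pop and frame_balanced are illustrative names): however many dwords the body pushes for locals, MOV ESP,EBP discards them all in one step, and POP EBP then restores the caller's frame. LEAVE performs exactly those two steps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of a 32-bit stack: 64 dwords, ESP grows down. */
enum { STACK_BYTES = 256 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp, ecx;
} Cpu;

static uint32_t *slot(Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return &c->mem[addr / 4];
}

static void push(Cpu *c, uint32_t v) { c->esp -= 4; *slot(c, c->esp) = v; }

static uint32_t pop(Cpu *c) {
    uint32_t v = *slot(c, c->esp);
    c->esp += 4;
    return v;
}

/* Returns 1 if ESP and EBP are back at their entry values just before RET,
   even though the local pushes are never popped. */
static int frame_balanced(int n_local_pushes) {
    Cpu c = { .esp = STACK_BYTES, .ebp = 0x40 };  /* 0x40: arbitrary caller EBP */
    const uint32_t entry_esp = c.esp, entry_ebp = c.ebp;
    push(&c, c.ebp);                 /* push ebp                        */
    c.ebp = c.esp;                   /* mov ebp, esp                    */
    for (int i = 0; i < n_local_pushes; i++)
        push(&c, c.ecx);             /* push ecx: local space, no POP   */
    c.esp = c.ebp;                   /* mov esp, ebp: drops all locals  */
    c.ebp = pop(&c);                 /* pop ebp (LEAVE = these two)     */
    return c.esp == entry_esp && c.ebp == entry_ebp;
}
```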
I took the liberty of timing this MSVC-optimised code.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
499 cycles for 100*MSVC
500 cycles for 100*ByHand
499 cycles for 100*MSVC
500 cycles for 100*ByHand
By the way: "ByHand" is exactly one byte shorter. Guess why :bg
Quote from: dedndave
which brings up a trick... initialize ECX, and you have initialized your local
Or PUSH 0 or PUSH 01234h, which might be even cleverer. Larger tables would be pushed in reverse order, of course.
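The immediate push is also smaller: PUSH 01234h encodes in 5 bytes (68 34 12 00 00) and PUSH 0 in 2 (6A 00), against 8 bytes for PUSH ECX followed by MOV DWORD PTR [ebp-4], 1234h (51, then C7 45 FC 34 12 00 00). For a larger table the elements must be pushed last-first because the stack grows downward. A hypothetical sketch of that ordering (Stack, push and push_table are illustrative names; memory is indexed in dwords):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stack model: esp is a dword index that counts down. */
enum { STACK_DWORDS = 64 };
typedef struct { uint32_t mem[STACK_DWORDS]; size_t esp; } Stack;

static void push(Stack *s, uint32_t v) {
    assert(s->esp > 0);            /* stack overflow check */
    s->mem[--s->esp] = v;          /* like PUSH imm32      */
}

/* Pushes table[n-1] first and table[0] last, so table[0] ends up at [esp]
   and the local copy reads in order at increasing addresses. */
static const uint32_t *push_table(Stack *s, const uint32_t *table, size_t n) {
    for (size_t i = n; i-- > 0; )
        push(s, table[i]);
    return &s->mem[s->esp];        /* address of the local table */
}
```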
Typically, though, C will allocate the space and then copy in the initializers. This can get highly inefficient with, say, a large table of constants such as a CRC table, where the programmer really should have declared it static const so it would be stored in the code section or in ROM.
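The difference is visible in C itself. In this sketch (function names are hypothetical, and a 16-entry bit-count table stands in for a CRC table to keep it short), the automatic array is a new object on every call, so the compiler may have to store the initializers onto the stack each time; the static const array exists once and is typically placed in a read-only section. An optimizing compiler may promote the first form on its own, so this shows the language semantics rather than guaranteed code generation.

```c
#include <assert.h>

/* Automatic table: conceptually rebuilt on the stack on every call. */
static unsigned bits_local(unsigned nibble) {
    const unsigned char table[16] = { 0,1,1,2, 1,2,2,3, 1,2,2,3, 2,3,3,4 };
    assert(nibble < 16);
    return table[nibble];
}

/* static const table: a single read-only copy, initialized at build time. */
static unsigned bits_static(unsigned nibble) {
    static const unsigned char table[16] = { 0,1,1,2, 1,2,2,3, 1,2,2,3, 2,3,3,4 };
    assert(nibble < 16);
    return table[nibble];
}
```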
ByHand doesn't maintain the same stack pointer. The example was more of a quick hack to demonstrate a case where PUSH ECX appears, as opposed to the ADD ESP,-4 that would normally occur when the frame is created.
AMD Phenom(tm) II X6 1055T Processor (SSE3)
720 cycles for 100*MSVC
499 cycles for 100*ByHand
710 cycles for 100*MSVC
500 cycles for 100*ByHand
In any case the push ecx in Clive's disassembled C example does ... nothing, absolutely nothing useful:
ByHand:
push ebp
mov ebp, esp
push ecx ; you don't need this one
mov eax, [ebp-4]
if 0
leave ; pardon, too easy and too short :wink
else
mov esp, ebp
pop ebp
endif
ret
Unless you use Dave's trick, but 1. that does not appear to be the purpose of the C example and 2. it could easily be achieved with a mov eax, ecx...
Quote from: jj2007 on January 09, 2012, 09:56:10 PM
In any case the push ecx in Clive's disassembled C example does ... nothing, absolutely nothing useful:
Actually, yes, but that was a "strange" example just to make the point: the code makes no sense, and the compiler will even warn that the function uses an uninitialized local variable (the result is unpredictable, a kind of random number generator).
But if the code were changed a bit, for example:
int test(void)
{
int i;
printf("qwe\n"); // <<<
return(i);
}
int main(int argc, char **argv)
{
test();
return(1);
}
then without "push ecx" no space would be allocated for the local "i": esp = ebp, so [ebp-4] is the slot the next push fills, i.e. the address of the string passed to printf.
push ebp
mov ebp,esp ; [ebp] = [esp]
; push ecx ; <<< if this is removed from the real working code, then
push CTXT("qwe") ; [ebp-4] = [esp]
call _printf
mov eax,[ebp-4] ; if printf did not modify its argument (the string pointer),
mov esp,ebp ; then eax would be the pointer to the string "qwe"
pop ebp
ret
In other words, without "push ecx" the local "i" could be overwritten. The compiler follows a simple rule, "be as straightforward and robust as possible", and just emits this local allocation (it could use "sub esp,4", but that is three times longer than "push reg") without making any assumptions about the usefulness of the code or of the local variable itself :green2
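The aliasing described above can be replayed in a toy stack model. This is a hypothetical sketch (Cpu, slot, push and read_i are illustrative names), and it assumes the call to printf leaves its argument slot untouched: without the PUSH ECX, [ebp-4] is simply the slot the next PUSH writes, so "i" reads back as printf's argument.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of a 32-bit stack: 64 dwords, ESP grows down. */
enum { STACK_BYTES = 256 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp, ecx;
} Cpu;

static uint32_t *slot(Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return &c->mem[addr / 4];
}

static void push(Cpu *c, uint32_t v) { c->esp -= 4; *slot(c, c->esp) = v; }

/* Returns what "mov eax,[ebp-4]" sees after pushing printf's argument. */
static uint32_t read_i(int allocate_local, uint32_t ecx_value, uint32_t str_addr) {
    Cpu c = { .esp = STACK_BYTES, .ecx = ecx_value };
    push(&c, c.ebp);               /* push ebp                      */
    c.ebp = c.esp;                 /* mov ebp, esp                  */
    if (allocate_local)
        push(&c, c.ecx);           /* push ecx: reserves [ebp-4]    */
    push(&c, str_addr);            /* push the address of "qwe"     */
    /* call _printf: assumed not to modify its argument slot        */
    return *slot(&c, c.ebp - 4);   /* mov eax, [ebp-4]              */
}
```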
Also, some pieces of code can be quite funny. For example, MSVC++ passes the object pointer to class member functions in ECX, and such a function may begin with code like this:
push ebp
mov ebp,esp
push ecx ; ECX is the ptr to the object
mov [ebp-4],ecx ; very cool...
Maybe this is a limitation of the code generation due to EH (exception handling needs the [ebp-4] slot when it is present).
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
1281 cycles for 100*MSVC
1502 cycles for 100*ByHand
1285 cycles for 100*MSVC
1499 cycles for 100*ByHand
Quote from: jj2007
In any case the push ecx in Clive's disassembled C example does ... nothing, absolutely nothing useful
Indeed, it was a minimal, contrived test case that generates the PUSH ECX the OP asked about. In fact, the compiler can also generate a pair of PUSH ECX instructions instead of an ADD ESP,-8.
The OP in fact had an example where the space for the local variable was created by PUSH ECX and subsequently initialized. As I also noted, that could be achieved more efficiently with an immediate PUSH of the constant itself.
The point was that it is a fairly common code construct, even if others had never seen it before.
Clive,
No offense intended, and sorry that I hijacked this to demonstrate that compilers lack intelligence :bg
A propos,
mov esp, ebp
pop ebp
equals
leave
Two bytes shorter, same speed, at least on a P4.
:thumbu
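The two-byte saving follows directly from the encodings: the first epilogue is the 8BE5 and 5D seen in Clive's disassembly, and LEAVE is the single byte C9. A minimal compile-time check (the array names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

static const uint8_t EPILOGUE_MOV_POP[] = {
    0x8B, 0xE5,  /* mov esp, ebp */
    0x5D         /* pop ebp      */
};
static const uint8_t EPILOGUE_LEAVE[] = {
    0xC9         /* leave        */
};

/* Checked at compile time (C11 static_assert from <assert.h>). */
static_assert(sizeof EPILOGUE_MOV_POP - sizeof EPILOGUE_LEAVE == 2,
              "LEAVE should be two bytes shorter");
```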
OK, I was trying to understand the goal of timing it; you'd probably want to examine a more practical example.
The whole ENTER/LEAVE picture has become quite murky; I haven't looked at how they perform across different micro-architectures recently. I haven't even seen the more complex ENTER forms in a very long while.
i think ENTER is slow enough that it is better to "manually" create the stack frame
LEAVE, on the other hand makes for a nice little shortcut
if you have no locals and a balanced stack, you can just POP EBP, though :P
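The POP EBP shortcut can be checked in the same kind of toy model (a hypothetical sketch; Cpu, slot, push, pop and pop_ebp_is_enough are illustrative names): with no locals and a balanced body, ESP already equals EBP at the epilogue, so POP EBP alone restores the caller's ESP and EBP, while any leftover push breaks it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of a 32-bit stack: 64 dwords, ESP grows down. */
enum { STACK_BYTES = 256 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp;
} Cpu;

static uint32_t *slot(Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return &c->mem[addr / 4];
}

static void push(Cpu *c, uint32_t v) { c->esp -= 4; *slot(c, c->esp) = v; }

static uint32_t pop(Cpu *c) {
    uint32_t v = *slot(c, c->esp);
    c->esp += 4;
    return v;
}

/* Epilogue is only "pop ebp" (no mov esp,ebp and no leave).
   Returns 1 if the caller's ESP and EBP are restored. */
static int pop_ebp_is_enough(int body_pushes, int body_pops) {
    Cpu c = { .esp = STACK_BYTES, .ebp = 0x40 };  /* 0x40: arbitrary caller EBP */
    const uint32_t entry_esp = c.esp, entry_ebp = c.ebp;
    push(&c, c.ebp);                 /* push ebp                  */
    c.ebp = c.esp;                   /* mov ebp, esp              */
    for (int i = 0; i < body_pushes; i++)
        push(&c, (uint32_t)i);       /* e.g. saved registers      */
    for (int i = 0; i < body_pops; i++)
        (void)pop(&c);               /* matching restores         */
    c.ebp = pop(&c);                 /* pop ebp only              */
    return c.esp == entry_esp && c.ebp == entry_ebp;
}
```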
Clive & Dave,
For you:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
1203 cycles for 100*TestFrames Enter
600 cycles for 100*TestFrames Leave
1203 cycles for 100*TestFrames Enter & Leave
499 cycles for 100*TestFrames Push & Pop
1203 cycles for 100*TestFrames Enter
600 cycles for 100*TestFrames Leave
1204 cycles for 100*TestFrames Enter & Leave
499 cycles for 100*TestFrames Push & Pop
12 bytes for TestFramesE
9 bytes for TestFramesL
10 bytes for TestFramesEL
11 bytes for TestFramesPP
Which confirms Dave's view: avoiding Enter is a no-brainer, and leave is quite OK unless you have the strange habit of calling time-critical code in an innermost loop instead of inlining it. Leave costs one cycle more than mov esp, ebp / pop ebp on my Celeron, and zero cycles more on the P4.