call instead of invoke!

Started by LAS3R, February 12, 2005, 04:57:15 PM


hutch--

While some of the techniques proposed here are very interesting, the bottom line is this: if you want to use TASM CALL syntax, write it in TASM. If you want MASM functionality, write it in MASM using INVOKE. TASM is barely supported these days and it's only the old-timers who know how to use it, so there is little point in wasting time trying to emulate an out-of-date assembler.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

James Ladd

MichaelW,
One reason I could see people doing what you have shown is to make a version of call that writes debug output to the console, which can easily be switched off to get the non-debug version at another time. It makes call interception rather easy and nice.
Of course, I could do the same with invoke. :)

Grincheux

If you have several calls, it is possible to replace them with a jump. For example, say you want to post a message to several controls.

The syntax with invoke is:

invoke PostMessage,hWnd1,EM_LIMITTEXT,256,0
invoke PostMessage,hWnd2,EM_LIMITTEXT,256,0
invoke PostMessage,hWnd3,EM_LIMITTEXT,256,0

It could be replaced with:

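push 0
push 256
push EM_LIMITTEXT
push hWnd3
push OFFSET Label_1       ; the 3rd call returns to Label_1

push 0
push 256
push EM_LIMITTEXT
push hWnd2
push OFFSET PostMessage   ; the 2nd call "returns" into PostMessage (3rd call)

push 0
push 256
push EM_LIMITTEXT
push hWnd1
push OFFSET PostMessage   ; the 1st call "returns" into PostMessage (2nd call)
jmp  PostMessage          ; 1st call, entered with a jump instead of a call

Label_1: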

It is more difficult than using INVOKE, but you avoid 2 calls!
I think it is quicker and more compact than INVOKE.
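
Why the chain ends up making three calls can be checked with a toy model. The sketch below is Python, not assembly, and only imitates the stack with symbolic names. `jmp`, `FUNCTIONS` and `post_message` are hypothetical helpers invented for the illustration. The only behaviour assumed from the real API is that PostMessage is stdcall with four arguments, so it removes them itself when it returns (ret 16).

```python
# Toy model of the chained-return trick; names stand in for addresses and values.
calls = []

def post_message(hwnd, msg, wparam, lparam):
    """Stand-in for PostMessage: just record the arguments it was given."""
    calls.append((hwnd, msg, wparam, lparam))

# symbolic address -> (callable, number of stdcall arguments)
FUNCTIONS = {"PostMessage": (post_message, 4)}

def jmp(stack, target):
    """Enter `target` with a jump and follow return addresses until one is not a function."""
    while target in FUNCTIONS:
        func, argc = FUNCTIONS[target]
        ret_addr = stack.pop()                      # top of stack is the return address
        args = [stack.pop() for _ in range(argc)]   # stdcall: callee removes its arguments
        func(*args)
        target = ret_addr                           # "ret" transfers control here
    return target

# Same push order as in the post; the end of the list is the top of the stack.
stack = []
for hwnd, ret_addr in (("hWnd3", "Label_1"),
                       ("hWnd2", "PostMessage"),
                       ("hWnd1", "PostMessage")):
    stack += [0, 256, "EM_LIMITTEXT", hwnd, ret_addr]

landed = jmp(stack, "PostMessage")
print(landed, [c[0] for c in calls])
```

In this model the controls are served in the order hWnd1, hWnd2, hWnd3, the stack is empty again at the end, and execution continues at Label_1.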

What do you think ?
Kenavo

Philippe RIO


Kenavo

Grincheux
_____________________________________________________
http://www.phrio.biz

Tedd

Intel processors LIKE you to use paired call/return.
Using this method screws up prediction, so although there appears to be less work, I'm not sure of how much of a speed-up it will actually give.
Just a random thought.
No snowflake in an avalanche feels responsible.

AeroASM


donkey

Certain operations use prediction to determine how to fill the cache. For example, a backward Jcc is normally taken (it usually closes a loop), so the branch prediction circuitry fills the cache following the Jcc with instructions from the point it jumps to. A forward Jcc is normally not taken, so the cache is filled with the instructions immediately after the opcode. On the P4 you can influence the prediction with the branch hint opcode prefixes, or with the hint.xxxx directive in GoAsm...

2Eh - hint that the branch will not occur most of the time.
3Eh - hint that the branch will occur most of the time.

GoAsm only...

CMP EDX,EAX
hint.branch JZ >L1
;
L1:
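
In an assembler without a hint directive the prefix byte would have to be emitted by hand in front of the Jcc. As a rough illustration, the Python sketch below builds the bytes of a short JZ with and without a hint. The function name `encode_jz_short` and the table `HINT_PREFIX` are made up for this example. 74h is the opcode of a short JZ, and 2Eh/3Eh are the prefixes listed above.

```python
# Sketch: byte encoding of a short JZ with an optional P4 branch hint prefix.
HINT_PREFIX = {"not_taken": 0x2E, "taken": 0x3E}
JZ_SHORT = 0x74  # opcode of JZ/JE with an 8-bit displacement

def encode_jz_short(rel8, hint=None):
    """Return the machine code bytes for 'JZ rel8', optionally prefixed with a hint."""
    if not -128 <= rel8 <= 127:
        raise ValueError("short jump displacement must fit in a signed byte")
    code = bytes([JZ_SHORT, rel8 & 0xFF])
    if hint is not None:
        code = bytes([HINT_PREFIX[hint]]) + code
    return code

print(encode_jz_short(5).hex())               # 7405
print(encode_jz_short(5, "taken").hex())      # 3e7405
print(encode_jz_short(5, "not_taken").hex())  # 2e7405
```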


The penalty for a mispredicted branch can be substantial: the processor stalls while the wrongly fetched instructions are discarded and the correct ones are fetched.
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

Randall Hyde

Quote from: Tedd on March 01, 2005, 11:40:50 AM
Intel processors LIKE you to use paired call/return.
Using this method screws up prediction, so although there appears to be less work, I'm not sure of how much of a speed-up it will actually give.
Just a random thought.

I'm not sure branch prediction would make a difference here.
This, however, is one of those "assembly language horrors" that has made assembly language famous for being difficult to read, maintain, and debug.  For the two call operations it saves, you would be hard-pressed to convince me that the few cycles saved are worth the loss of readability.

Also note: this trick is going to screw up many debuggers that track things like call stacks.

Finally, don't forget that this scheme consumes more stack space than three sequential calls. Usually this isn't important, but sometimes it can be (e.g., in device drivers and ISRs).
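
The stack-space point can be put into numbers. This is a small Python sketch assuming 32-bit code (4 bytes per stack slot) and counting only arguments and return addresses, not whatever the called function uses itself. `peak_stack_bytes` is a name invented for the example.

```python
DWORD = 4  # bytes per stack slot in 32-bit code

def peak_stack_bytes(n_calls, n_args, chained):
    """Bytes of arguments plus return addresses on the stack when the first function is entered."""
    frame = (n_args + 1) * DWORD            # arguments + one return address
    return frame * n_calls if chained else frame

chained = peak_stack_bytes(3, 4, chained=True)       # all three frames pushed up front
sequential = peak_stack_bytes(3, 4, chained=False)   # one frame at a time
print(chained, sequential)                           # 60 20
```

For the three PostMessage calls that is 60 bytes instead of 20, and the difference grows with every call added to the chain.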

It *is* a neat hack, but it definitely falls into the category of "being tricky for tricky's sake" and this is exactly the kind of code that the next programmer who works on the program *doesn't* want to see.
Cheers,
Randy Hyde

Grincheux

I don't see it the way you do. But everyone is free...

You can save more than 2 calls. Imagine a call to wsprintf followed by a call to MessageBox.

The wsprintf call fills a local buffer which is then used by MessageBox. The syntax is:

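push  MB_OK
push  OFFSET szCaption
lea   eax,szLocalResult
push  eax
push  hWnd
push  OFFSET Label_1      ; return address for MessageBox

push  dwData
push  OFFSET szFmt
push  eax
push  OFFSET Label_0      ; wsprintf returns here
jmp   wsprintf

Label_0:
add   esp,12              ; wsprintf uses the C convention: remove its 3 arguments first
jmp   MessageBox          ; stdcall: removes its own 4 arguments, returns to Label_1

Label_1: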
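
One detail matters when the chained functions use different calling conventions: wsprintf uses the C convention (the caller removes the arguments), while MessageBox is stdcall. The toy Python model below compares what MessageBox would find on the stack if wsprintf returned straight into it with what it finds when a small stub first discards the three wsprintf arguments. The names `enter`, `build_stack` and `cleanup_stub` are made up for the illustration, and the strings only stand in for addresses and values.

```python
# Toy model only: names stand in for addresses and values.
CONVENTIONS = {"wsprintf": ("cdecl", 3), "MessageBox": ("stdcall", 4)}

def enter(stack, name):
    """Enter `name` by jmp or ret: the top of the stack is taken as its return address.

    Returns (arguments as the callee reads them, return address).
    """
    convention, argc = CONVENTIONS[name]
    ret_addr = stack.pop()
    args = stack[-argc:][::-1]          # first argument sits nearest the top
    if convention == "stdcall":
        del stack[-argc:]               # callee removes its arguments (ret 16)
    return args, ret_addr               # cdecl: arguments stay for the caller

def build_stack(wsprintf_return):
    # Push order of the example; the last element is the top of the stack.
    return ["MB_OK", "szCaption", "buf", "hWnd", "Label_1",
            "dwData", "szFmt", "buf", wsprintf_return]

# Case 1: wsprintf returns straight into MessageBox.
stack = build_stack("MessageBox")
_, target = enter(stack, "wsprintf")
direct_args, direct_ret = enter(stack, target)

# Case 2: wsprintf returns to a stub that does "add esp,12", then jumps to MessageBox.
stack = build_stack("cleanup_stub")
_, target = enter(stack, "wsprintf")
del stack[-3:]                          # the stub's add esp,12
stub_args, stub_ret = enter(stack, "MessageBox")

print(direct_args, direct_ret)
print(stub_args, stub_ret)
```

In the first case the model's MessageBox reads szFmt where it expects the window handle and would return into the buffer. In the second it receives hWnd, the buffer, szCaption and MB_OK, and returns to Label_1 with the stack balanced.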

OK, this method is longer than INVOKE, but it is ASM, not a C-like language...
If I want to write in C I use Visual Studio, and if I want to write in asm I use MASM.

With the "high level" extensions (why "high level"?) it is not easy to optimize our programs.
If I use asm, it is to get quick and small programs.
It always takes longer to develop in ASM than in C or PASCAL.
That extra time should also be used for optimization.

Another advantage of this method concerns reverse engineering.
It is more difficult to disassemble these programs.

Thanks to everyone who took the time to give me an answer, even if what I think is different, very different.

Kenavo


pbrennick

IDCat,
I respect your right to have your own opinion, but at some point you need to listen to the voice of experience.  After many years of programming, that which causes problems gets left behind, and while we 'old-timers' were climbing the ladder, believe me, we tried it all.  You will see as time goes by.  In the meantime, have fun.

Paul

Grincheux

I understand and think you are right.

Thanks
Kenavo


hutch--

I certainly agree with Paul here. We have all done our share of code designs that the processor barely struggles to make sense of, but if you want another human being to be able to maintain it some time down the track, you risk him coming after you with an axe or, as Mirno put it, with a desire to kneecap you for having done so.

The design you proposed is clever and it is worth understanding how things like this are done, but putting it into production code is a surefire reason to remove your name from the code and leave the country before the next guy finds you.

donkey

You know, I have seen some pretty neat coding schemes, some that saved 1000 clocks!!! But if you think about it, what does that represent? On a 1 GHz processor, it represents 1/1,000,000 of a second. So if you managed to save those 1000 clocks once every second, it would be 11.57 days before you accumulated 1 second of time savings. However, from the standpoint of clarity, come back to your own code after not touching it for 2 years and see how much of your valuable time you waste tracing through the unintelligible code. I will always take the clearer source over the questionable optimizations. And BTW, last I checked the Intel manuals, CALL/RET were actual opcodes and are therefore assembly, not C or anything else.
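
The 11.57 days figure can be reproduced in a few lines. This is a small Python check using the numbers from the post (1000 clocks saved, a 1 GHz clock, one saving per second). The variable names are only for illustration.

```python
CLOCK_HZ = 1_000_000_000   # 1 GHz
CLOCKS_SAVED = 1_000       # clocks saved each time the optimized code runs
SECONDS_PER_DAY = 86_400

# Each run saves 1000 / 1e9 s = 1 microsecond, so one full second needs a million runs.
runs_for_one_second = CLOCK_HZ // CLOCKS_SAVED

# At one run per second, that many seconds of wall-clock time must pass.
days = runs_for_one_second / SECONDS_PER_DAY
print(f"{days:.2f} days")   # 11.57 days
```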

Ratch

IDCat,

I like your method, but the basics are not new.  It is an old trick to PUSH variables onto the stack when they are available, but not immediately needed.  I believe that compilers do that all the time.  Does the CPU really give a damn if a RET does not have a matching CALL? And if the "chained" return addresses are within a busy inner loop, the saved time could mean something.  As for obfuscation, be sure to document what you are doing.  You are not engaging in anything weird that is difficult to understand.  Ratch

AeroASM

donkey, the point is that if programs were written well and fully optimized, we could run about ten programs simultaneously without stalling on a 100 MHz machine.

pbrennick

I like Donkey's words. They remind me of something I used to say to individuals learning how to program: 'What difference does it make whether you get the answer in picoseconds or nanoseconds as long as the answer is correct? For all intents and purposes, we cannot tell the difference.'

Paul