Discussion about a more efficient way to get HI WORD and LO WORD?

Started by Slugsnack, January 20, 2010, 10:43:43 AM


Slugsnack

reading this brings up memories of when i read this article :
http://www.flounder.com/optimization.htm

Quote: Do not do clever optimizations that have no meaning. For example, people who try to "optimize" the GUI interface. Hardwired constants, distributed enabling, cute algorithms. The result is something that is hard to develop, difficult to debug, and absolutely impossible to maintain.

Optimization is meaningless here. For more details, you might want to read my essay on the only sensible way I've found to manage dialog controls. I'll summarize the key idea here.

Why doesn't efficiency matter when you're updating menus or controls? Look at the human factors. A mouse is held approximately 2 feet from the ear. Sound travels at approximately 1100 ft/sec. This means that it takes approximately 2ms for the sound of the mouse click or keystroke to reach the ear. The neural path from the fingertip to the brain of an adult is approximately 3 feet. Propagation of nerve impulses is approximately 300 ft/sec, meaning the sensation of the mouse click or keystroke takes approximately 10ms to reach the brain. Perceptual delay in the brain can add between 50 and 250ms more.

Now, how many Pentium instructions can you execute in 2ms, 10ms, or 100ms? In 2ms, on a 500MHz machine that's 1,000,000 clock cycles, so you can execute a lot of instructions in that time. Even on a now-clunky 120MHz Pentium there is no noticeable delay in handling the controls.

imho, for things like this, clearest is best. that is, readability over the potentially saved 2 clock cycles

hutch--

there is an element of truth in that API calls are dictated by the OS, not by how it's coded, but the comment comes with an agenda that goes uphill well into HLLs. There never has been a substitute for knowing the difference and knowing what you are doing. If you think optimising a MessageBox call like VC used to do is useless, try coding a byte scanner in old VB.  :bdg

I just gave it a quick read, I wonder why it has the sniff of old compiler theory? Any opinion is theory laden: come from a background of coding in HEX (my older brother still does it from time to time on ICs he has to use) and you will think that high level mnemonics are rubbish. Go up to the other end and a VH-HLL looks at C++ as really low level and not appropriate.

Coming back to the original topic, two MOVZX or MOVSX instructions are the simplest way to do it. Why mess it up doing some clunky HLL junk when you can do it so simply with two Intel mnemonics?

dedndave

still, it is good to know how the different instructions behave
there is nothing wrong with writing efficient code
it develops good habits for the times optimization IS important
i do understand what you are saying though, Mike - and i agree

jj2007

Joe,
If you are curious, use OllyDbg to check what the assembler is really doing:

SWITCH uMsg
  CASE WM_COMMAND
    mov ecx, lParam ; SWITCH uses eax, so we take ecx for the handle in lParam
    movzx eax, word ptr wParam+2
    Switch eax ; the WM_COMMAND message


004012F6              8B4D 14            mov ecx, [ebp+14]
004012F9              0FB745 12          movzx eax, word ptr [ebp+12]

As others have said before, speed optimisation is not an issue here. Check \masm32\help\opcodes.chm for clock counts if you like, but bear in mind that is only relevant if you have an innermost loop that runs a million times. The WndProc is definitely not the place for optimising for speed - you should optimise for readability here, e.g. by using a consistent SWITCH system.

dedndave

lol - you guys are gonna love me for this....
if i am not optimizing for speed, i am optimizing for size   :bg
there are exceptions to this - only to make the code more readable (last priority - lol)

joemc

To be honest i am just learning the very basics and i appreciate all the information. 

Ghandi

I must be bad and inefficient, I'm quite happy to use:


mov eax, dword ptr [wParam]
mov ecx,eax
and eax,0FFFFh
shr ecx,16


For the benefit of anyone who doesn't know, it simply loads the whole dword into EAX, copies it to ECX and then 'extracts' the LOWORD (in EAX) and HIWORD (in ECX) values for use.

HR,
Ghandi
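
For readers more at home in C, the mask-and-shift split in the post above can be sketched roughly as follows. This is only an illustration: the names `lo_word`, `hi_word` and `check_split` and the sample value are made up for the example (the Windows headers provide `LOWORD` and `HIWORD` macros for the same job).

```c
#include <assert.h>
#include <stdint.h>

/* Low 16 bits: the C counterpart of AND EAX,0FFFFh. */
uint16_t lo_word(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* High 16 bits: the C counterpart of SHR ECX,16. */
uint16_t hi_word(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Self-check with a hypothetical WM_COMMAND-style wParam:
   control ID 03E8h in the low word, notification code 1 in the high word. */
void check_split(void)
{
    uint32_t wparam = 0x000103E8u;
    assert(lo_word(wparam) == 0x03E8u);
    assert(hi_word(wparam) == 0x0001u);
}
```

Calling `check_split()` from a `main` function is enough to exercise both helpers.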

dedndave

that probably would be more efficient, until the SHR - which kind of kills the whole thing - lol
also, AND EAX,0FFFFh is 5 bytes

Sinsi has the right idea, i think

jj2007

Quote from: dedndave on January 21, 2010, 03:43:40 AM
that probably would be more efficient, until the SHR - which kind of kills the whole thing - lol
also, AND EAX,0FFFFh is 5 bytes

Sinsi has the right idea, i think

movzx eax, word ptr ax is more efficient
:bg

sinsi

'movzx eax,word ptr lParam' must be super efficient then  :bg

PS jj, should be no need for the 'word ptr'

Ghandi

Thanks everybody, I'm well aware that the instructions are far from optimal for speed but this is used in a dialog proc, simply to separate the HI and LO words of wParam during a WM_COMMAND message. It's not exactly a time critical routine so this is quite sufficient for my needs. :D

If i were depending on speed i would go with the movzx also.

HR,
Ghandi

jj2007

Quote from: Ghandi on January 21, 2010, 09:21:55 AM
If i were depending on speed i would go with the movzx also.

People who hang around in assembler forums typically go for speed and size. Why use 13 bytes if 8 bytes are enough?

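  ; mask-and-shift version: 13 bytes
  mov eax, dword ptr [wParam]
  mov ecx,eax
  and eax,0FFFFh
  shr ecx,16
  nop   ; separator between the two versions in the listing
  ; movzx version: 8 bytes
  movzx eax, word ptr wParam
  movzx ecx, word ptr wParam+2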


0040131C              . 8B45 10             mov eax, [ebp+10]
0040131F              . 8BC8                mov ecx, eax
00401321              . 25 FFFF0000         and eax, 0FFFF
00401326              . C1E9 10             shr ecx, 10

00401329              . 90                  nop

0040132A              . 0FB745 10           movzx eax, word ptr [ebp+10]
0040132E              . 0FB74D 12           movzx ecx, word ptr [ebp+12]

donkey

The most efficient way is not to bother to split wParam at all. Just use WORD PTR [wParam] for the LOWORD and WORD PTR [wParam+2] for the HIWORD. After all, one label is the same as any other and it's likely that they'll just be dumped into a short cmp block anyway, so why bother creating new labels for them?
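
Why offset +2 lands on the HIWORD can be seen from how x86 stores a dword, least significant byte first. The following is a rough C sketch of that addressing, not anyone's actual code from this thread: `word_at` and `check_word_at` are hypothetical names, and the byte array stands in for the four bytes of wParam on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the 16-bit word that starts at byte `offset` of a dword stored
   in x86 (little-endian) byte order. Offset 0 mirrors WORD PTR [wParam],
   offset 2 mirrors WORD PTR [wParam+2]. */
uint16_t word_at(const unsigned char *mem, size_t offset)
{
    assert(offset <= 2);   /* the buffer is assumed to hold exactly 4 bytes */
    return (uint16_t)(mem[offset] | (mem[offset + 1] << 8));
}

/* Self-check: the dword 000103E8h as it sits in memory on x86. */
void check_word_at(void)
{
    const unsigned char wparam_bytes[4] = { 0xE8, 0x03, 0x01, 0x00 };
    assert(word_at(wparam_bytes, 0) == 0x03E8u);   /* LOWORD */
    assert(word_at(wparam_bytes, 2) == 0x0001u);   /* HIWORD */
}
```

The helper composes the word from individual bytes, so the sketch behaves the same on any host; on x86 itself a plain 16-bit load at the same offset would give the same result.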

Ghandi


Quote from: jj2007
People who hang around in assembler forums typically go for speed and size. Why use 13 bytes if 8 bytes are enough?


Because I'm not a total purist and 5 bytes of code in a GUI application won't break the bank, when most HDDs these days are a decent capacity. Not only that, but I actually disagree with you on that comment. It depends on the actual usage, because I know for a fact that a lot of experienced assembler programmers will unroll a loop for speed at the expense of size. Smaller does not always mean faster.

I'm not out to prove anything to anyone, so I'm not going to try and optimize the hell out of a message pump which, I reiterate, doesn't need to be lightning fast or super-small. You've taken my comment out of context jj2007, quoting the code I already said is not optimal. Please tell me what you actually achieved by quoting my earlier example and also the movzx, which I also said that I would use if I were after speed, because from here it sort of seems like flogging a dead horse.

Omg, if I were banging on about the virtues of using SHR + AND, then I would understand, but I am not. Coming along and trying to point out that I've given incorrect/suboptimal/unsatisfactory code after I have already agreed that it is not ideal for use in time critical routines seems like a bit of overkill, wouldn't you agree?

HR,
Ghandi

jj2007

Quote from: donkey on January 21, 2010, 05:17:17 PM
The most efficient way is not to bother to split wParam at all.

That depends on the number of comparisons. A simple
.if word ptr wParam==3E8h
translates to a 6-byte
66:817D 10 E803          cmp word ptr [ebp+10], 3E8

In contrast,
movzx eax, word ptr wParam
.if eax==3E8h

translates to a 4+5=9-byte
0FB745 10                movzx eax, word ptr [ebp+10]
3D E8030000              cmp eax, 3E8

However, each comparison is then one byte shorter. If you have three branches, splitting costs 4+3*5=19 bytes, not splitting costs 3*6=18 bytes. My editor has around 60 cases, that makes it 6*60=360 vs 4+5*60=304 bytes...

But you are right that for a small number of branches the split is not so useful.
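
The byte counts above can be tabulated with a small C sketch. It assumes the instruction sizes from the listings in this post (6 bytes per `cmp word ptr [ebp+10], imm16`, 4 bytes for the single `movzx`, 5 bytes per `cmp eax, imm32`), which hold for constants like 3E8h that do not fit in a signed byte; smaller constants would encode shorter. All function names are made up for the example.

```c
#include <assert.h>

/* Comparing the word in memory directly: 6 bytes per comparison. */
unsigned bytes_without_split(unsigned comparisons)
{
    return 6u * comparisons;
}

/* One 4-byte MOVZX up front, then 5 bytes per comparison against EAX. */
unsigned bytes_with_split(unsigned comparisons)
{
    return 4u + 5u * comparisons;
}

/* Smallest number of comparisons at which splitting is strictly smaller. */
unsigned first_count_where_split_wins(void)
{
    unsigned n = 1;
    while (bytes_with_split(n) >= bytes_without_split(n))
        n++;
    return n;
}

/* Self-check against the figures quoted in the post. */
void check_byte_counts(void)
{
    assert(bytes_without_split(3) == 18u);
    assert(bytes_with_split(3) == 19u);
    assert(bytes_without_split(60) == 360u);
    assert(bytes_with_split(60) == 304u);
    assert(first_count_where_split_wins() == 5u);
}
```

Under these assumptions the two approaches tie at four comparisons (24 bytes each), and the split only pays off from the fifth comparison onward.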