Quote from: sinsi on March 13, 2010, 07:58:14 AM
How inefficient.
...
I thought a while loop tested before, so you test the initial value before any processing e.g.
mov ebx,whatever
@@:
test ebx,ebx
js @f
dec ebx
jmp @b
@@:
Good idea, but this version is a bit faster:
mov ebx, 1000
test ebx, ebx
js DontEnter
@@:
dec ebx ; **** insert optimised manual while loop here ****
jns @B
DontEnter:
Celeron M:
1025 cycles for .While ... .Endw
1025 cycles for manual while ... coding
Quote from: hutch-- on March 13, 2010, 08:01:09 AM
Just for the people who use the Campus, can we keep discussion on details for the Workshop or similar as we don't want to baffle a learner with technology.
I fully agree. Learners should not be confused by vague promises that there are ways to make a .While loop faster by some magic "hand coding".
Why does every freaking thread turn into a cycle-saving witch hunt ?
Did anyone say it's speed-critical ? No. Next time, assume it isn't.
Look inside this page for the first occurrence of the word "speed".
Yes, so donkey made a little reference/comment and you turned this into a benchmark ! :bg
The difference is basically this: structured loops do the job in many places, but there are many other designs where they don't work well. A structured loop has a single-entry, single-exit layout, and this is very clunky when you need multiple-entry and multiple-exit loop designs that are often interdependent and then have similar subloops in them.
If your code design requires a structured loop, use it if it suits you but if you need more complicated logic, write what you need at the mnemonic level. Empty test pieces are close to useless as instruction scheduling and loop code are interdependent. It will be the architecture that makes the difference one way or another, not bean counting the instruction order.
As a matter of personal preference I code loops manually as I am used to doing it but there is nothing wrong with a structured loop if that is what you need to do the job.
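To make "multiple exit" concrete, here is a minimal sketch of a scan loop whose two exits land on two different targets. It is an illustration only, not code from any poster or from the MASM32 library; the labels ScanLoop, Found, NotFound and Done, the buffer szSample and the search character are all made up for the example. It assumes the MASM32 SDK is installed at \masm32, as in the other listings in this thread.

```asm
include \masm32\include\masm32rt.inc

.data
szSample db "horses for courses", 0    ; hypothetical sample string

.code
start:
    mov esi, offset szSample
    mov ah, "f"                 ; hypothetical character to search for
ScanLoop:
    mov al, [esi]
    test al, al
    jz NotFound                 ; exit 1: reached the terminating zero
    cmp al, ah
    je Found                    ; exit 2: match, lands on a different target
    inc esi
    jmp ScanLoop
Found:
    sub esi, offset szSample    ; esi = zero-based position of the match
    print str$(esi),13,10
    jmp Done
NotFound:
    print "not found",13,10
Done:
    exit
end start
```

With .WHILE and .BREAK both exits would arrive at the same place after .ENDW, so the code there would have to test again to find out which condition ended the loop. Whether that matters for speed is exactly what the rest of the thread argues about.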
Hi Hutch and BlackVortex,
I am sorry, I did try not to get drawn into this mess. I don't really mind the optimization stuff but I wasn't going to be drawn into a completely useless test as loop optimization is extremely implementation specific. I just saw that in the original post he implied heavily nested loops and a complex algorithm so I mentioned that hand coding whether "magic" or not might help to speed things up but I also mentioned that the intrinsic HLC's were fairly efficient and if they fit the bill why not use them. I just don't normally get involved in the speed test garbage unless it really interests me and that is exceedingly rare as I see most optimization, especially a lot of the stuff on the board lately, as useless wastes of time.
Edgar
Quote from: BlackVortex on March 14, 2010, 02:25:59 AM
Yes, so donkey made a little reference/comment and you turned this into a benchmark ! :bg
A "little reference"? No, he promised a magic speed increase by "hand-coding", and I offered a benchmarking testbed where he could demonstrate his hand-coding skills. I like the magic of speedy and compact assembler code. What I don't like is false promises, especially in The Campus.
Quote from: jj2007 on March 14, 2010, 08:33:36 AM
Quote from: BlackVortex on March 14, 2010, 02:25:59 AM
Yes, so donkey made a little reference/comment and you turned this into a benchmark ! :bg
A "little reference"? No, he promised a magic speed increase by "hand-coding", and I offered a benchmarking testbed where he could demonstrate his hand-coding skills. I like the magic of speedy and compact assembler code. What I don't like is false promises, especially in The Campus.
Your benchmark is crap.
:bg
Come on guys, everyone who has been around for some length of time knows this stuff. In non-critical loops, .WHILE or .REPEAT loops work fine if that's how you like to code, but there will always be loop designs that you can only code in mnemonics, and the more complex they are the more this is the case. Structured loops have their limitations, especially for multiple entry/exit loops with nested dependencies and interactive loop design. Even adaptive optimising compilers have problems with complex loop design, and this is among the reasons why they often speak against complex loop structures.
Hutch,
Just give me one example where a hand-coded loop is faster than its .While equivalent, and I'll never write about it again. Promised.
JJ,
My only problem with .WHILE loops is when you need any other form of loop. I regularly write multiple entry and exit loops, interdependent loops and stepped nested loops, and none of them code well using high level architecture.
This is why I agree with the advice that if a .WHILE loop does the job for you, use it but if you have other designs in mind, write them in mnemonics.
No example...?
Donkey is busy with more important stuff, his header project. :thumbu
But what about the branch prediction hints, when using the conditionals you can't use them, isn't that a performance consideration in some cases ? (I've been reading the GoAsm manual, first time I've heard of these prefixes...)
Hi BlackVortex,
Not so much that I don't have time, I'm just not interested, but it is fun popping in every once in a while to see how jj2007 is doing :)
Edgar
In my tests, run on a P3, the .WHILE and .REPEAT .UNTIL loops have slightly more overhead, even without complex exit conditions.
;==============================================================================
.NOLIST
;==============================================================================
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
;==============================================================================
.LISTALL
;==============================================================================
.data
.code
;==============================================================================
start:
;==============================================================================
invoke Sleep, 3000
counter_begin 1000, HIGH_PRIORITY_CLASS
mov ebx, 100
.WHILE ebx
dec ebx
.ENDW
counter_end
print str$(eax),13,10
counter_begin 1000, HIGH_PRIORITY_CLASS
mov ebx, 100
.REPEAT
dec ebx
.UNTIL ebx == 0
counter_end
print str$(eax),13,10
counter_begin 1000, HIGH_PRIORITY_CLASS
mov ebx, 100
@@:
dec ebx
jnz @B
counter_end
print str$(eax),13,10
xor esi, esi
counter_begin 1000, HIGH_PRIORITY_CLASS
mov ebx, 100
.WHILE ebx
.BREAK .IF esi
dec ebx
.ENDW
counter_end
print str$(eax),13,10
counter_begin 1000, HIGH_PRIORITY_CLASS
mov ebx, 100
@@:
test esi, esi
jnz @F
dec ebx
jnz @B
@@:
counter_end
print str$(eax),13,10
inkey "Press any key to exit..."
exit
;==============================================================================
end start
234
245
205
277
208
I guess masm is actually doing all the compares instead of just depending on the Z flag, right ?
(too bored to assemble and disassemble,hehe)
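For anyone else too bored to disassemble, the sketch below shows by hand what the two structured forms are understood to expand to. This is an assumption about ML's output, not a verified listing: the label names are invented (ML generates its own), and the exact test instruction may differ between versions. Assembling with a listing file (/Fl) and .LISTALL, as in the code above, shows what your assembler really emits.

```asm
include \masm32\include\masm32rt.inc

.code
start:
    ; Assumed hand-written equivalent of:
    ;   mov ebx, 100 / .WHILE ebx / dec ebx / .ENDW
    mov ebx, 100
    jmp WhileTest           ; the condition is checked before the first pass
WhileBody:
    dec ebx                 ; loop body, already sets the Z flag
WhileTest:
    or ebx, ebx             ; the condition "ebx" is tested again explicitly
    jnz WhileBody

    ; Assumed hand-written equivalent of:
    ;   mov ebx, 100 / .REPEAT / dec ebx / .UNTIL Zero?
    mov ebx, 100
RepeatBody:
    dec ebx
    jnz RepeatBody          ; flag test only, no extra instruction

    print str$(ebx),13,10
    exit
end start
```

If that is right, a register condition such as `.WHILE ebx` or `.UNTIL ebx == 0` costs one extra instruction per pass because the flags are derived again from the register, while a flag condition such as `.UNTIL Zero?` reuses the flags left by the loop body.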
Thanks, Michael, for taking the time to deliver a concrete basis for arguing. I appreciate it.
Here is my version of your code (note that your last manual example was actually not a while but rather a repeat loop).
.NOLIST
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
.LISTALL
.code
start:
;==============================================================================
invoke Sleep, 3000
counter_begin 1000, HIGH_PRIORITY_CLASS
mov ebx, 100
.REPEAT
dec ebx
.UNTIL Zero? ; not: ebx == 0
counter_end
print str$(eax), " - rep loop",13,10
counter_begin 1000, HIGH_PRIORITY_CLASS
mov ebx, 100
@@:
dec ebx
jnz @B
counter_end
print str$(eax), " - rep manual", 13,10
xor esi, esi
counter_begin 1000, HIGH_PRIORITY_CLASS
mov ebx, 100
jj = 1
if jj
.Repeat
test esi, esi
.Break .if !Zero?
dec ebx
.Until Zero?
else
.WHILE ebx
.BREAK .IF esi
dec ebx
.ENDW
endif
counter_end
print str$(eax)," - rep2 (ex while)", 13,10
counter_begin 1000, HIGH_PRIORITY_CLASS
mov ebx, 100
@@:
test esi, esi
jnz @F
dec ebx
jnz @B
@@:
counter_end
print str$(eax)," - rep2, manual", 13,10
inkey "Press any key to exit..."
exit
;==============================================================================
end start
Celeron M:
121 - rep loop
121 - rep manual
209 - rep2 (ex while)
209 - rep2, manual
Edit: P4:
177 - rep loop
177 - rep manual
293 - rep2 (ex while)
293 - rep2, manual
Edit: I took away this one because there was no hand-coded equivalent present
counter_begin 1000, HIGH_PRIORITY_CLASS
mov ebx, 100
.WHILE ebx
dec ebx
.ENDW
counter_end
print str$(eax), " - while loop", 13,10
Quote from: BlackVortex on March 15, 2010, 12:06:42 AM
But what about the branch prediction hints, when using the conditionals you can't use them, isn't that a performance consideration in some cases ? (I've been reading the GoAsm manual, first time I've heard of these prefixes...)
You could use them by inserting the db xx, but:
Quote
Branch hint prefixes (http://gcc.gnu.org/ml/gcc/2008-02/msg00634.html)
GCC has support for this feature, but it has turned out to not gain
anything and was disabled by default, since branch reordering streamlines
code well enough to match the default predictor behaviour.
Same conclusion was done by other compiler teams too, ICC is not
generating the hints either.
Try a .while/.repeat loop with multiple .break/.continue options and see what it does. 'dec ebx' isn't a great benchmark...
I'm sure I read somewhere that the branch prefix hints are ignored in the core2 and later, and were a real waste of time (agner maybe?).
Quote from: jj2007
You could use them by inserting the db xx
Just use a CS: or DS: prefix, that's all they are.
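A minimal sketch of what that looks like in a hand-coded loop, assuming the MASM32 SDK layout used elsewhere in this thread. The prefix bytes are the segment override encodings: 2Eh (CS:) in front of a conditional jump is the "not taken" hint and 3Eh (DS:) is the "taken" hint. As the posts above suggest, many processors may simply ignore them, so treat this as a demonstration of the encoding rather than an optimisation.

```asm
include \masm32\include\masm32rt.inc

.code
start:
    mov ebx, 100
@@:
    dec ebx
    db 3Eh                  ; DS: prefix byte = "branch taken" hint (2Eh would be "not taken")
    jnz @B                  ; the prefix byte applies to this conditional jump
    print str$(ebx),13,10
    exit
end start
```

The same byte cannot be placed in front of the jump that .ENDW or .UNTIL generates, since that jump is emitted by the assembler itself, which is the limitation BlackVortex was asking about.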
JJ,
These will do for starters.
wordreplace
WordCount
szRemove
szRep
szMonoSpace
StrLen
qssorta
parse_line
ltok
wtok
InString
Quote from: hutch-- on March 15, 2010, 07:49:21 AM
JJ,
These will do for starters.
???
Quote from: sinsi on March 15, 2010, 07:42:49 AM
Try a .while/.repeat loop with multiple .break/.continue options and see what it does. 'dec ebx' isn't a great benchmark...
Example?
>Example?
You are the clock meister, I am the one who doesn't use invoke, much less .while. Some .while/.break code when disassembled is like spaghetti...
Whatever happened to LAMPS? That was always good for a chuckle or two :bg
Quote from: sinsi on March 15, 2010, 09:15:12 AM
Whatever happened to LAMPS? That was always good for a chuckle or two :bg
LAMPS work only if the code is different in size, speed, or both. Since apparently nobody is able to find an example where a hand-coded loop is different from its matching high level construct, let alone more efficient, there is no point in applying LAMPS...
JJ,
While we will happily agree about using .WHILE where it works, the library modules in that list were all written with non-standard loop designs, and while you could probably cobble together the same algorithms with .REPEAT and .WHILE structured loops, the result would be ugly, sloppy code.
As usual, horses for courses.