News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Can this be converted to sse

Started by zemtex, December 16, 2011, 05:18:28 PM

Previous topic - Next topic

dedndave

it seems like you should be able to speed it up with SSE using the same algorithm  :P

let me see if the register swap method has merit

dedndave

doesn't look good for swapping a 5 with a 10   :P

zemtex

dedndave, I have found a very fast algorithm for the same kind of function I posted here. If you want to try to convert it from c++ to asm, I highly appreciate it. The guy who wrote this had 15 ms using msvc. It is pretty darn fast for c++. Can you do it faster?  :P


static int sumlol(int m) {
int sum = 0, left = m % 15;
for (int i = 0; i < m - left; i += 15) {
sum += 7 * i + 3 + 5 + 6 + 9 + 10 + 12;
}
for (int i = m - left; i < m; i++) {
if(i % 3 == 0 || i % 5 == 0) {
sum += i;
}
}
return sum;
}


15 milliseconds for this in c++, it is twice as fast as the optimized version you gave me, it gave me 31 ms. But if you take this algorithm and convert it, perhaps we can get down to 12 perhaps?
I have been puzzling with lego bricks all my life. I know how to do this. When Peter, at age 6 is competing with me, I find it extremely neccessary to show him that I can puzzle bricks better than him, because he is so damn talented that all that is called rational has gone haywire.

dedndave

i think i understand it - lol
not very good with C operators

zemtex

I have converted it to pseudo language if you need it, I am not entirely sure if the pseudo is correct though:

result = 0

left = max / 15

i = 0

while i < (Max-left)
   i := i + 15
   resultat  = resultat + 7 * i + 3 + 5 + 6 + 9 + 10 + 12

while i < (Max)
   i := i + 1
   if (i / 3 == 0 || i / 5 == 0
      result = resultat + i
return result
I have been puzzling with lego bricks all my life. I know how to do this. When Peter, at age 6 is competing with me, I find it extremely neccessary to show him that I can puzzle bricks better than him, because he is so damn talented that all that is called rational has gone haywire.

zemtex

His compiler outputs something like this, so you see the compiler is very "smart", it uses the lea for calculation and so forth.

xor ecx, ecx
xor eax, eax
loop1: lea ecx, [ecx+eax+0x2d]
add eax, 0x69
cmp eax, 0x29b926ba
jl loop1

if you can make something out of this...
I have been puzzling with lego bricks all my life. I know how to do this. When Peter, at age 6 is competing with me, I find it extremely neccessary to show him that I can puzzle bricks better than him, because he is so damn talented that all that is called rational has gone haywire.

dedndave

i have a loop that works - kinda
i am not getting the correct answer - let me see if i can figure out why
it takes about 60% as much time as my original loop - considerably faster

zemtex

The guy who wrote the algo have written a new one, this new one runs in 0 ms for 100 million iterations, or so he claims. I don't know what to say, I think i need a cup of pepsi max.  :bdg


static unsigned long sumlol2(unsigned long m) {
unsigned long left = m % 15; m -= left;
unsigned long sum = (m - 1) / 15 * ((m - 1) / 15 + 1) / 2 * 15 * 7 + 45 * (m / 15);
for (; left != 0; m++, left--) {
if(m % 3 == 0 || m % 5 == 0) {
sum += m;
}
}
return sum;
I have been puzzling with lego bricks all my life. I know how to do this. When Peter, at age 6 is competing with me, I find it extremely neccessary to show him that I can puzzle bricks better than him, because he is so damn talented that all that is called rational has gone haywire.

dedndave

yes - lol
i was just working that out - it can be calculated directly without loops   :P

zemtex

Compilers seems to be extremely efficient breaking down long arithmetic statements and finding the right instructions to do it efficiently.
I have been puzzling with lego bricks all my life. I know how to do this. When Peter, at age 6 is competing with me, I find it extremely neccessary to show him that I can puzzle bricks better than him, because he is so damn talented that all that is called rational has gone haywire.

dedndave

they are - that does not mean it cannot be improved upon   :U

back to working on my problem...
http://projecteuler.net/problem=361
not as bad as it looks, i think

zemtex

#26
True that  :U

We need a "compiler" type tool for asm too. Perhaps write a small tool that can quickly parse an assermbly source file to find register stalls. That would be a good start. And then later expand the program to find more optimizations. Alternatively, write an assembly code editor, and with an optimization module implemented it will highlight lines in the source file if it is stalling.

The only way to get complete control of the vastness of optimizations, is if you store all of them in a database and let a program tool let you know about them. If assembly programmers always have to discover slow code by accident, it will be very time consuming. An editor with a line highlighter will force you to make good habits, over time you will learn to avvoid the same mistakes.

One could expand this functionality to automatically show the size of any given code section on the far left side for example. So that size optimizations can be easily controlled. This would require to store all instructions in a database of course.

Perhaps also implement a virtual execution unit that will simulate execution of the instructions and show output on the far right. The editor must be able to switch between different arcitectures, each architecture has its own instruction database, so that the editor is portable to other instruction sets.

A hotkey to pop up instruction listings by category. you could scroll through instructions using arrow keys, kind of like a dos listing system. And press enter to go into the description of each instruction.

Maybe later the editor could be expanded to make COM calls easier, even easier than macro calls. Find a system or a way to make this automatic.

If the editor fits, you could make an automatic coffee machine too, it will output coffee and an automatic boredom detection unit.
I have been puzzling with lego bricks all my life. I know how to do this. When Peter, at age 6 is competing with me, I find it extremely neccessary to show him that I can puzzle bricks better than him, because he is so damn talented that all that is called rational has gone haywire.

dedndave

just write the code in C and generate an assembler listing   :U

zemtex

Yes you could of course. Me, personally have no c++ experience. My high level language of choice have always been Turbo Pascal and Borland Delphi. In Delphi you can make a big program in minutes, but the size of the executable will grow to 700 KB almost instantly.  :bdg

A few years ago, I spent a month trying to learn c++, I managed to create a few smaller programs, but I quit after a month.
I have been puzzling with lego bricks all my life. I know how to do this. When Peter, at age 6 is competing with me, I find it extremely neccessary to show him that I can puzzle bricks better than him, because he is so damn talented that all that is called rational has gone haywire.

dedndave

i think i did the same thing about 25 years ago   :bg
i didn't care for C - but i had to have some familiarity for a job