starting assembly

Started by AJMB, May 16, 2008, 11:39:53 PM

AJMB

Hello!

I have just started asm coding. For my first attempt I tried to write a Fibonacci sequence generator.
After getting it to work, I wanted to see if it was faster than one written in C,
so I embedded the asm into this C code.
The code compiled by msvs8 was faster at first... after looking at vs8's assembly output, I made some changes and now my asm is faster. :bg
Here is the code. It doesn't actually output the Fibonacci sequence, just how many loops it takes before the 32-bit register overflows.
Is there anything I can do to make it better/faster (besides having it display more or less information)?
If one adds a signed (negative) number to a positive one, is the overflow flag still set?

-----Code-----
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <winbase.h>
#define asm_code //type whether you want c or assembly code
int main()
{
     int b=1,c=0;
     int a = 0;
     int big = 0;
     float seconds;
     LARGE_INTEGER beginticks, endticks;
     LARGE_INTEGER ticksPerSecond;

     QueryPerformanceFrequency(&ticksPerSecond);
     // %I64X prints a 64-bit value in hex with the Microsoft C runtime
     printf( "Starting. ticks per second: %I64X\n", ticksPerSecond.QuadPart);
     QueryPerformanceCounter(&beginticks);

#ifdef c_code
     printf("c_code");
     while(big!=10000000){
          a=0;
          b=1;
          c=0;
          while(b<2971215073){
               a++;
               c+=b;
               b+=c;
          };
     big++;
     };
#endif

#ifdef asm_code
     printf("asm_code");
     __asm{
          align 16
          mov esi, 10000000
          _label2:
               mov edx, 0
               mov eax, 1
               mov ecx, 0
               _label:
                    add edx, 1
                    add ecx, eax
                    add eax, ecx
               jno _label
               sub esi, 1
          jnz _label2
          mov [a], eax
     };
#endif

     QueryPerformanceCounter(&endticks);

     // subtract the 64-bit counts first so float rounding does not swallow the difference
     seconds = (float)(endticks.QuadPart-beginticks.QuadPart)/(float)ticksPerSecond.QuadPart;
     printf("Complete in %u loops. Time taken: %I64X hex ticks or %.10f seconds\n",
          a*2, endticks.QuadPart-beginticks.QuadPart, seconds);
return 0;
}

Don't worry about the colored code... it is a Python script that I wrote to do that (I didn't go through and color it by hand).
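
On the overflow-flag question in the post above: for a 32-bit add, the overflow flag (OF) is set only when both operands have the same sign and the result has the opposite sign, so adding a negative number to a positive one never sets it. A minimal C sketch of that rule follows. The helper name add_sets_of is made up for illustration, and it models the flag in portable C rather than reading it from the CPU.

```c
#include <stdint.h>

/* Returns 1 if a 32-bit ADD of a and b would set the overflow flag (OF),
   0 otherwise. Hypothetical helper, not part of any library. */
static int add_sets_of(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t sum = ua + ub;   /* unsigned addition wraps, which is well defined */

    /* The top bit of (ua ^ sum) & (ub ^ sum) is 1 only when both operands
       share a sign and the result's sign differs from it. */
    return (int)(((ua ^ sum) & (ub ^ sum)) >> 31);
}
```

With this model, add_sets_of(5, -3) is 0, while add_sets_of(INT32_MAX, 1) is 1. The second case is the kind of event the jno in the asm loop is waiting for.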

MichaelW

As coded, the C version will hang. What is the value 2971215073? A signed 32-bit integer will overflow before it reaches this value. I corrected this by changing a, b, and c to unsigned integers and changing the condition to b<2147483648. The C version is incrementing the variable a on each loop, but the ASM version is incrementing edx on each loop and then storing eax in a. I corrected this by changing the store instruction to mov [a], edx. After the changes, and running on a P3, I get 46 loops and ~3.66s for the C version, and 46 loops and ~1.22s for the ASM version. In the short time I spent on it I could not find any way to make the ASM version much faster, but substituting inc edx for add edx, 1 decreased the time by about 20ms.
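
The unsigned correction described above can be sketched as a standalone function. This is an illustration only: the function name count_fib_loops is made up, it assumes a 32-bit unsigned int, and it drops the outer timing loop.

```c
/* Inner loop of the C version with a, b and c made unsigned and the
   condition changed to b < 2147483648 (2^31), as described in the reply.
   Assumes unsigned int is 32 bits wide. */
static unsigned count_fib_loops(void)
{
    unsigned a = 0, b = 1, c = 0;

    while (b < 2147483648u) {   /* stop once b reaches 2^31 */
        a++;                    /* one pass of the loop */
        c += b;                 /* c and b each step forward ... */
        b += c;                 /* ... so a pass advances two Fibonacci terms */
    }
    return a;
}
```

Each pass advances two Fibonacci terms, which is why the program prints a*2. The function returns 23, matching the 46 loops reported above.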

eschew obfuscation

AJMB

Thanks for looking over my code!

2971215073 is the 46th (I think) number in the Fibonacci sequence. If the C code were to continue to the next loop after that number, it would overflow an unsigned int.

Hmm, that is odd; the C version does not hang for me. I would think that msvs8 would, at least, give me a warning about comparing a signed int with an unsigned int.

On the asm version outputting the wrong register: I had changed it once to see what the value of eax was and had forgotten to change it back to outputting edx... oops.

So "inc reg." is faster than "add reg., 1"? I thought they both took one clock cycle. Does this have something to do with the size of the opcode?

(I am on a PIII also.)

Thanks again!

xmetal

Quote from: AJMB on May 17, 2008, 05:02:10 PM
I would think that msvs8 would, at least, give me a warning about comparing a signed int with an unsigned int.

It generates a warning when both the operands involved are variables. When one of them is a constant and the other a variable, it just silently promotes one of them to unsigned without any warning.
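
The silent conversion can be reproduced in a few lines of C. In this sketch the u suffix is an addition that makes the constant's type explicit; without it, the type of 2971215073 depends on the compiler and the language version (under C99 rules it may become a 64-bit signed type, in which case a comparison against a 32-bit int is always true). The function name below_limit is hypothetical and a 32-bit int is assumed.

```c
/* Compares a signed int against an unsigned constant.
   Assumes int is 32 bits; the u suffix forces the constant to be unsigned. */
static int below_limit(int b)
{
    /* b is converted to unsigned before the comparison, so a negative b
       is treated as a large positive value. */
    return b < 2971215073u;
}
```

Here below_limit(5) is 1, but below_limit(-1) is 0, because -1 becomes 4294967295 after the conversion.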

MichaelW

The problem appears to be the Pelles C that I was using: 5.00.6 beta version #4. The code compiles with no warnings, but when executed it hangs. Checking in the debugger, big gets incremented once, and after that the inner loop runs continuously. Changing b to an unsigned int corrects the problem, and using the value 2971215073 I get a time of ~6.78s with no compiler optimization, and ~3.62s with. Using the original code, compiling with CL from the MSVC Toolkit 2003, I get no warning even with /W4, the code runs OK, and using /O2 (the best of the possible choices) I get a time of ~1.65s. If I switch to GCC, I get the not very clear warning "decimal constant is so large that it is unsigned", the code runs OK, and with -Os (the best of the possible choices) I get a time of ~1.65s.
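
The timings in this thread come from performance-counter tick counts converted to seconds. A minimal sketch of that conversion, assuming the counts are available as 64-bit integers (the helper name elapsed_seconds and its parameter names are made up): subtracting first and dividing in double precision keeps small differences between runs from being lost to rounding.

```c
/* Hypothetical helper: converts two performance-counter readings into
   elapsed seconds. The subtraction is done on the 64-bit integers, and
   only the (much smaller) difference is converted to floating point. */
static double elapsed_seconds(long long begin_ticks, long long end_ticks,
                              long long ticks_per_second)
{
    return (double)(end_ticks - begin_ticks) / (double)ticks_per_second;
}
```

On Windows the three arguments would be the QuadPart members filled in by QueryPerformanceCounter and QueryPerformanceFrequency, as in the original program.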

Quote
so "inc reg." is faster than "add reg., 1"

From this small difference in one test you can't really draw any conclusions regarding which instruction is faster, even for the processor that you are running the test on. I know that the add form is recommended for the P4, but in my tests on a P4 I never saw more than a very small difference.
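
On the opcode-size question raised earlier in the thread: in 32-bit code, inc edx encodes in one byte and add edx, 1 in three. The two also differ in flag behavior, since inc leaves the carry flag untouched while add writes all the arithmetic flags, which is the usual reason given for preferring add on the P4. Whether either difference shows up in a timing depends on the processor and the surrounding code. The encodings can be written out as C byte arrays (the array names are for illustration only):

```c
/* Machine-code bytes for the two instructions in 32-bit mode. */
static const unsigned char inc_edx[]   = { 0x42 };             /* inc edx    */
static const unsigned char add_edx_1[] = { 0x83, 0xC2, 0x01 }; /* add edx, 1 */
```

Here sizeof inc_edx is 1 and sizeof add_edx_1 is 3, so the loop body is two bytes shorter with inc.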