The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: MikeT on October 26, 2009, 12:56:39 AM

Title: An ASM clock counter for C++
Post by: MikeT on October 26, 2009, 12:56:39 AM
Anyone have a nice little snippet that can count clock cycles between two points in VS2008 C++ code?
I am dying to see how long the cast of a CHAR to INT takes...
Title: Re: An ASM clock counter for C++
Post by: hutch-- on October 26, 2009, 01:00:47 AM
Mike,

You don't really need asm to do this. You can use API code for timing, but I seriously doubt you can get the granularity to test something like this. Just look up your timing APIs and it should be easy enough to do.
Title: Re: An ASM clock counter for C++
Post by: BlackVortex on October 26, 2009, 05:44:33 AM
Yeah, better use QueryPerformanceCounter or whatever it's called. RDTSC is overkill and unreliable.
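
A rough sketch of the API-timing approach suggested in these replies. It uses `std::chrono::steady_clock` as a portable stand-in for QueryPerformanceCounter, which is an assumption on my part: it needs a C++11 compiler, so it will not build in VS2008 as-is. The helper name `average_ns_per_cast` and its parameters are made up for illustration. Since a single cast is far below the timer's granularity, the sketch times many passes over a buffer and reports an average. With optimization on, the compiler may transform the loop, so the figure describes the whole loop rather than one isolated cast.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `repeats` passes over `data` and returns the
// average time in nanoseconds per element. The running total is written to
// `sink` so the compiler cannot discard the loop as dead code.
double average_ns_per_cast(const char* data, std::size_t len,
                           int repeats, long long& sink)
{
    assert(len > 0 && repeats > 0);
    typedef std::chrono::steady_clock clock_type;

    long long total = 0;
    const clock_type::time_point start = clock_type::now();
    for (int r = 0; r < repeats; ++r)
        for (std::size_t i = 0; i < len; ++i)
            total += static_cast<int>(data[i]);   // the cast under test
    const clock_type::time_point stop = clock_type::now();

    sink = total;
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / (static_cast<double>(len) * repeats);
}
```

Calling it with a date string and a large `repeats` value gives a per-element average. Printing or otherwise using `sink` afterwards keeps the measured work observable.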
Title: Re: An ASM clock counter for C++
Post by: MikeT on October 26, 2009, 06:08:37 AM
Oh ok. I got used to using a nice piece of ASM in another language and I thought it would be worthwhile here.

Since C++ doesn't support the BYTE data type I have to convert the byte values to an int. That doesn't sound like it's going to be pretty, and since the function I am writing is in a heavily traveled path, I thought it might be a good idea to check it before looking for a better solution.

My code converts a string date like 12/29/2008 to a Julian date integer. To avoid using an expensive string conversion three times, I just find the byte values, subtract the ASCII code of '0' (48) from each and sum them for the day, month and year values.

   
for( i=0; i < sDate.size(); i++ )
{
   bVal = (int)a[i]; // Byte value
   if( bVal >= 48 || bVal <= 57 ) // 0 - 9
      Sum = Sum*10 + bVal - 48;
.
.


Do I need to replace that type conversion with ASM perhaps?
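
A minimal sketch of the field-summing idea described in this post, assuming the input is always in MM/DD/YYYY form. The `DateFields` struct and `parse_date_fields` name are hypothetical, and the final Julian date calculation is left out because the post does not show it. Two things differ from the snippet: the digit test uses `&&`, since `bVal >= 48 || bVal <= 57` holds for every value and would accumulate the '/' separators too, and `unsigned char` stands in for a byte type, which promotes to `int` without an explicit cast.

```cpp
#include <string>

// Hypothetical container for the three parsed fields.
struct DateFields { int month; int day; int year; };

// Parses "MM/DD/YYYY" by accumulating digit values directly from the bytes.
// Any non-digit character ends the current field.
DateFields parse_date_fields(const std::string& sDate)
{
    int fields[3] = { 0, 0, 0 };
    int index = 0;

    for (std::string::size_type i = 0; i < sDate.size() && index < 3; ++i)
    {
        // unsigned char serves as the byte type here.
        const unsigned char b = static_cast<unsigned char>(sDate[i]);

        if (b >= '0' && b <= '9')           // both bounds must hold, hence &&
            fields[index] = fields[index] * 10 + (b - '0');
        else
            ++index;                        // separator: move to the next field
    }

    DateFields result = { fields[0], fields[1], fields[2] };
    return result;
}
```

For "12/29/2008" this yields month 12, day 29 and year 2008 in a single pass, with no intermediate string conversions.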
Title: Re: An ASM clock counter for C++
Post by: MichaelW on October 26, 2009, 07:22:07 AM
Quote from: MikeT on October 26, 2009, 12:56:39 AM
I am dying to see how long the cast of a CHAR to INT takes...

Not very long. Compiling this code:

int main(void)
{
    int i, bVal, Sum = 0;
    char a[20];

    for( i=0; i < 20; i++ )
    {
        bVal = (int)a[i];                       // Byte value
        if( bVal >= 48 || bVal <= 57 )          // 0 - 9
            Sum = Sum*10 + bVal - 48;
    }

    printf("%d\n\n",Sum);

    getch();
    return 0;
}


With the Visual C++ Toolkit 2003 compiler, using:

cl /O2 /G6 /FA test.c

The relevant parts of the assembly output are:

_main PROC NEAR ; COMDAT
; Line 2
    sub esp, 20 ; 00000014H
; Line 3
    xor eax, eax
; Line 6
    xor edx, edx
$L545:
; Line 8
    movsx ecx, BYTE PTR _a$[esp+edx+20]
; Line 9
    cmp ecx, 48 ; 00000030H
    jge SHORT $L550
    cmp ecx, 57 ; 00000039H
    jg SHORT $L546
$L550:
; Line 10
    lea eax, DWORD PTR [eax+eax*4]
    lea eax, DWORD PTR [ecx+eax*2-48]
$L546:
    movsx ecx, BYTE PTR _a$[esp+edx+21]
    cmp ecx, 48 ; 00000030H
    jge SHORT $L566
; Line 9
    cmp ecx, 57 ; 00000039H
    jg SHORT $L567
$L566:
; Line 10
    lea eax, DWORD PTR [eax+eax*4]
    lea eax, DWORD PTR [ecx+eax*2-48]
$L567:
    movsx ecx, BYTE PTR _a$[esp+edx+22]
    cmp ecx, 48 ; 00000030H
    jge SHORT $L568
; Line 9
    cmp ecx, 57 ; 00000039H
    jg SHORT $L569
$L568:
; Line 10
    lea eax, DWORD PTR [eax+eax*4]
    lea eax, DWORD PTR [ecx+eax*2-48]
$L569:
    movsx ecx, BYTE PTR _a$[esp+edx+23]
    cmp ecx, 48 ; 00000030H
    jge SHORT $L570
; Line 9
    cmp ecx, 57 ; 00000039H
    jg SHORT $L571
$L570:
; Line 10
    lea eax, DWORD PTR [eax+eax*4]
    lea eax, DWORD PTR [ecx+eax*2-48]
$L571:
    movsx ecx, BYTE PTR _a$[esp+edx+24]
    cmp ecx, 48 ; 00000030H
    jge SHORT $L572
; Line 9
    cmp ecx, 57 ; 00000039H
    jg SHORT $L573
$L572:
; Line 10
    lea eax, DWORD PTR [eax+eax*4]
    lea eax, DWORD PTR [ecx+eax*2-48]
$L573:
    add edx, 5
    cmp edx, 20 ; 00000014H
    jl SHORT $L545


The cast essentially turns a MOV into a MOVSX. Note the unrolled loop, and the use of LEA to do the:

Sum = Sum*10 + bVal - 48;

For a good compiler, compiler-optimized code is typically somewhere close to optimal. If you use inline assembly it will probably limit the compiler's ability to optimize.
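
To make the LEA trick concrete, here is a small sketch that writes each of the two LEA instructions from the listing as one C++ statement and compares the result with the plain expression. The function names `accumulate_lea_style` and `accumulate_plain` are hypothetical and only serve the illustration.

```cpp
// Mirrors the two LEA instructions from the listing, one statement each.
int accumulate_lea_style(int sum, int bVal)
{
    int eax = sum + sum * 4;        // lea eax, [eax+eax*4]     -> sum * 5
    eax = bVal + eax * 2 - 48;      // lea eax, [ecx+eax*2-48]  -> sum * 10 + bVal - 48
    return eax;
}

// The expression as written in the C source.
int accumulate_plain(int sum, int bVal)
{
    return sum * 10 + bVal - 48;
}
```

Both functions return the same value for any inputs that do not overflow. For example, a running sum of 12 and the character '3' (ASCII 51) give 123. The multiply by 10 is split into a multiply by 5 and a multiply by 2 because an LEA address can only scale a register by 1, 2, 4 or 8.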
Title: Re: An ASM clock counter for C++
Post by: Kyle on October 29, 2009, 10:50:56 AM
Very true about inline assembly hindering the optimization, but if you're serious about optimization you can take the compiler's optimized output and try to improve on it further.