
An ASM clock counter for C++

Started by MikeT, October 26, 2009, 12:56:39 AM


MikeT

Anyone have a nice little snippet that can count clock cycles between two points in VS2008 C++ code?
I am dying to see how long the cast of a CHAR to INT takes...

hutch--

Mike,

You don't really need asm to do this; you can do the timing with API calls, but I seriously doubt you can get the granularity to time something this small. Just look up the timing APIs and it should be easy enough to do.

BlackVortex

Yeah, better to use QueryPerformanceCounter. RDTSC is overkill and unreliable.

MikeT

Oh, OK. I got used to using a nice piece of ASM in another language, and I thought it would be worthwhile here.

Since standard C++ has no BYTE data type, I have to convert the byte values to an int. That doesn't sound like it's going to be pretty, and since the function I am writing is on a heavily traveled path, I thought it would be a good idea to check its cost before looking for a better solution.

My code converts a string date like 12/29/2008 to a Julian date integer. To avoid calling an expensive string conversion three times, I just take the byte values, subtract the ASCII code of '0' (48) from each digit, and accumulate them into the day, month, and year values.

   
for( i=0; i < sDate.size(); i++ )
{
   bVal = (int)a[i];                // Byte value
   if( bVal >= 48 && bVal <= 57 )   // 0 - 9 ('0' is 48, '9' is 57)
      Sum = Sum*10 + bVal - 48;
   // ...
}


Do I need to replace that type conversion with ASM perhaps?

MichaelW

Quote from: MikeT on October 26, 2009, 12:56:39 AM
I am dying to see how long the cast of a CHAR to INT takes...

Not very long. Compiling this code:

int main(void)
{
    int i, bVal, Sum = 0;
    char a[20];

    for( i=0; i < 20; i++ )
    {
        bVal = (int)a[i];                 // Byte value
        if( bVal >= 48 || bVal <= 57 )    // 0 - 9 (always true as written; && would restrict it to '0'..'9')
            Sum = Sum*10 + bVal - 48;
    }

    printf("%d\n\n",Sum);

    getch();
    return 0;
}


With the Visual C++ Toolkit 2003 compiler, using:

cl /O2 /G6 /FA test.c

The relevant parts of the assembly output are:

_main PROC NEAR ; COMDAT
; Line 2
sub esp, 20 ; 00000014H
; Line 3
xor eax, eax
; Line 6
xor edx, edx
$L545:
; Line 8
movsx ecx, BYTE PTR _a$[esp+edx+20]
; Line 9
cmp ecx, 48 ; 00000030H
jge SHORT $L550
cmp ecx, 57 ; 00000039H
jg SHORT $L546
$L550:
; Line 10
lea eax, DWORD PTR [eax+eax*4]
lea eax, DWORD PTR [ecx+eax*2-48]
$L546:
movsx ecx, BYTE PTR _a$[esp+edx+21]
cmp ecx, 48 ; 00000030H
jge SHORT $L566
; Line 9
cmp ecx, 57 ; 00000039H
jg SHORT $L567
$L566:
; Line 10
lea eax, DWORD PTR [eax+eax*4]
lea eax, DWORD PTR [ecx+eax*2-48]
$L567:
movsx ecx, BYTE PTR _a$[esp+edx+22]
cmp ecx, 48 ; 00000030H
jge SHORT $L568
; Line 9
cmp ecx, 57 ; 00000039H
jg SHORT $L569
$L568:
; Line 10
lea eax, DWORD PTR [eax+eax*4]
lea eax, DWORD PTR [ecx+eax*2-48]
$L569:
movsx ecx, BYTE PTR _a$[esp+edx+23]
cmp ecx, 48 ; 00000030H
jge SHORT $L570
; Line 9
cmp ecx, 57 ; 00000039H
jg SHORT $L571
$L570:
; Line 10
lea eax, DWORD PTR [eax+eax*4]
lea eax, DWORD PTR [ecx+eax*2-48]
$L571:
movsx ecx, BYTE PTR _a$[esp+edx+24]
cmp ecx, 48 ; 00000030H
jge SHORT $L572
; Line 9
cmp ecx, 57 ; 00000039H
jg SHORT $L573
$L572:
; Line 10
lea eax, DWORD PTR [eax+eax*4]
lea eax, DWORD PTR [ecx+eax*2-48]
$L573:
add edx, 5
cmp edx, 20 ; 00000014H
jl SHORT $L545


The cast essentially turns a MOV into a MOVSX (sign-extending move). Note that the loop is unrolled five times per pass, and that LEA is used instead of a multiply to compute:

Sum = Sum*10 + bVal - 48;

With a good compiler, the optimized output is typically close to optimal. Using inline assembly will probably limit the compiler's ability to optimize the surrounding code.

Kyle

Very true about inline assembly hindering optimization, but if you're serious about optimizing, you can take the compiler's optimized output and attempt to improve on it further.