Timing Macros for PB

Started by MikeT, January 04, 2008, 03:01:01 PM


MikeT


hutch--

Hi Mike,

Welcome on board. As far as I know no-one has ever ported Michael's macros to PB and there is the problem that the two languages are not the same in their macro capacity.

What I would be inclined to do is write a normal function or two, code the timing method, then put it in a separate file so it can be included when needed.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

MikeT

Hi Hutch,

That would work. How hard is it to port that ASM to PB?
Are there many PB ASM members?

hutch--

Mike,

the timing macros that Michael has written are very useful in comparing and optimising low level algorithms but I seriously doubt they would be much use to you in application development. If you need to benchmark different algos there are many ways to do this without the level of specialty involved in these timing macros. I still personally time most algos on a large sample with GetTickCount but there are other API functions with better resolution and of course RDTSC if you know how to use its 64 bit resolution.

Perhaps let us know what you are trying to do and we can probably help you in the direction you need.
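To make the "64 bit resolution" point concrete: RDTSC returns the count split across two 32-bit registers, EDX (high dword) and EAX (low dword). The sketch below is only an illustration in Python, not PB or MASM code, and the names tsc_value and tsc_delta are made up for this example. It shows how the two halves join into one 64-bit value and how an elapsed count is taken as an unsigned 64-bit difference.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def tsc_value(edx, eax):
    """Join the high (EDX) and low (EAX) dwords into one 64-bit count."""
    return ((edx & MASK32) << 32) | (eax & MASK32)

def tsc_delta(start, end):
    """Elapsed count, taken modulo 2**64 like an unsigned 64-bit subtraction."""
    return (end - start) & MASK64

if __name__ == "__main__":
    # Hypothetical register contents from two RDTSC reads.
    first = tsc_value(0x00000000, 0xFFFFFFFF)
    second = tsc_value(0x00000001, 0x00000005)
    print(tsc_delta(first, second))
```

The storage you save the count into has to be 8 bytes wide, since the second store lands 4 bytes past the first.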

MichaelW

Mike,

I'm not sure if this is doable in PB, but I did manage to create a system that consists of one macro, one define, and two procedures that works well for FreeBASIC. Here are the essential parts of the code:

#include once "windows.bi"

dim shared _counter_tsc1_ as ulongint, _counter_tsc2_ as ulongint
dim shared _counter_overhead_ as ulongint, counter_cycles as ulongint
dim shared _counter_loop_counter_ as uinteger

sub _counter_code1_
    asm
        '
        ' Use same CPUID input value for each call.
        '
        xor eax, eax
        '
        ' Flush pipe and wait for pending ops to finish.
        '
        cpuid
        '
        ' Read Time Stamp Counter.
        '
        rdtsc
        '
        ' Save count.
        '
        mov [_counter_tsc1_], eax
        mov [_counter_tsc1_+4], edx
    end asm
end sub

sub _counter_code2_
    asm
        xor eax, eax
        cpuid
        rdtsc
        mov [_counter_tsc2_], eax
        mov [_counter_tsc2_+4], edx
    end asm
end sub

'' Unlike the #define directive, the #macro directive
'' allows inline asm.
''
#macro COUNTER_BEGIN( loop_count, priority_class )
    _counter_overhead_ = 2000000000
    counter_cycles = 2000000000
    SetPriorityClass( GetCurrentProcess(), priority_class )
    Sleep_(0)                 '' Start a new time slice
    ''
    '' The nops compensate for the 10-byte instruction (that
    '' initializes _counter_loop_counter_) between the alignment
    '' directive and the loop label, which ideally needs to be
    '' aligned on a 16-byte boundary.
    ''
    asm
      .balign 16
      nop
      nop
      nop
      nop
      nop
      nop
    end asm
    for _counter_loop_counter_ = 1 to loop_count
        _counter_code1_
        _counter_code2_
        if (_counter_tsc2_ - _counter_tsc1_) < _counter_overhead_ then
            _counter_overhead_ = _counter_tsc2_ - _counter_tsc1_
        endif
    next
    Sleep_(0)                 '' Start a new time slice
    asm
      .balign 16
      nop
      nop
      nop
      nop
      nop
      nop
    end asm
    for _counter_loop_counter_ = 1 to loop_count
        _counter_code1_
#endmacro
''
'' *** Note the open FOR loop ***
''
#define COUNTER_END _
        _counter_code2_ :_
        if (_counter_tsc2_ - _counter_tsc1_) < counter_cycles then :_
            counter_cycles = _counter_tsc2_ - _counter_tsc1_ :_
        endif :_
    next :_
    SetPriorityClass( GetCurrentProcess(), NORMAL_PRIORITY_CLASS ) :_
    counter_cycles -= _counter_overhead_
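For anyone who wants the logic without the asm, here is a rough sketch of the same idea in Python. It is an illustration only: min_cycles and its clock parameter are names invented for this example, and time.perf_counter_ns stands in for the CPUID/RDTSC pair, so the result is in nanoseconds rather than clock cycles. The structure mirrors the macros above: a reference loop finds the smallest cost of two back-to-back timer reads, a test loop finds the smallest cost with the code under test in between, and the difference is reported.

```python
import time

def min_cycles(code, loop_count, clock=time.perf_counter_ns):
    """Smallest observed cost of code(), minus the smallest timer overhead."""
    if loop_count < 1:
        raise ValueError("loop_count must be at least 1")

    # Reference loop: two timer reads with nothing in between.
    overhead = None
    for _ in range(loop_count):
        t1 = clock()
        t2 = clock()
        if overhead is None or t2 - t1 < overhead:
            overhead = t2 - t1

    # Test loop: the same pair of reads around the code under test.
    best = None
    for _ in range(loop_count):
        t1 = clock()
        code()
        t2 = clock()
        if best is None or t2 - t1 < best:
            best = t2 - t1

    return best - overhead

if __name__ == "__main__":
    print(min_cycles(lambda: sum(range(1000)), 1000))
```

Keeping the minimum rather than the average is what filters out runs that were interrupted or hit a cold cache. The sketch leaves out the priority-class change and the Sleep_(0) calls, which have no direct equivalent here.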

eschew obfuscation

MikeT

Hutch,
I am sure you are right, but I would like to explore this anyway. I learn by exploring :)

                         

     

#COMPILE EXE
#DIM ALL


GLOBAL hDbg AS LONG             
               
' #INCLUDE once "windows.bi"  ?

GLOBAL  CounterTSC1, CounterTSC2, CounterOverhead , CounterCycles AS LONG
GLOBAL  LoopCounter AS INTEGER
           

'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
SUB CounterCode1

        '
        ' Use same CPUID input value for each call.
        '
     !  XOR eax, eax
        '
        ' Flush pipe and wait for pending ops to finish.
        '
     !  cpuid
        '
        ' Read Time Stamp Counter.
        '
     !  rdtsc  ' <----------------  PB Compiler does not recognize this ***
        '
        ' Save count.
        '
     !  mov [CounterTSC1], eax
     !  mov [CounterTSC1+4], edx
END SUB
     
'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
SUB CounterCode2

  ! XOR eax, eax
  ! cpuid
  ! rdtsc       ' <----------------  PB Compiler does not recognize this ***
  ! mov [CounterTSC2], eax
  ! mov [CounterTSC2+4], edx

END SUB


'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
MACRO COUNTER_BEGIN( loop_count, priority_class ) 

    CounterOverhead = 2000000000
    CounterCycles   = 2000000000
    SetPriorityClass( GetCurrentProcess(), priority_class )
    Sleep_(0) '' Start a new time slice

    '' The nops compensate for the 10-byte instruction (that
    '' initializes LoopCounter) between the alignment
    '' directive and the loop label, which ideally needs to be
    '' aligned on a 16-byte boundary.


    !  .balign 16
    !  nop
    !  nop
    !  nop
    !  nop
    !  nop
    !  nop

    FOR LoopCounter = 1 TO loop_count
        CounterCode1
        CounterCode2
        IF (CounterTSC2 - CounterTSC1) < CounterOverhead THEN  CounterOverhead = CounterTSC2 - CounterTSC1

    NEXT   

    Sleep_(0)                 '' Start a new time slice

    !  .balign 16
    !  nop
    !  nop
    !  nop
    !  nop
    !  nop
    !  nop

    FOR LoopCounter = 1 TO loop_count  '' *** Note the open FOR loop ***
        GOSUB CounterCode1

END MACRO
           


       
'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
MACRO COUNTER_END

        GOSUB CounterCode2
        IF (CounterTSC2 - CounterTSC1) < CounterCycles THEN CounterCycles = CounterTSC2 - CounterTSC1
    NEXT
    CALL SetPriorityClass( GetCurrentProcess(), NORMAL_PRIORITY_CLASS )
    CounterCycles = CounterCycles - CounterOverhead       


END MACRO
             

           

'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
SUB time_stamp_count(tick AS QUAD) ' CPU Clock count    Charles E V Pegge

  '---------------------------'
  '                           ' approx because it is not a serialised instruction
  '                           ' it may execute before or after other instructions
  '                           ' in the pipeline.
  ! mov ebx,tick              ' var address where count is to be stored.
  ! db  &h0f,&h31             ' RDTSC read time-stamp counter into edx:eax hi lo.
  ! mov [ebx],eax             ' save low order 4 bytes.
  ! mov [ebx+4],edx           ' save high order 4 bytes.
  '---------------------------'

END SUB
           

'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
FUNCTION PBMAIN
               
  LOCAL i, Rounding, nLoops, RetVal AS LONG 
  LOCAL cBeg, cEnd AS QUAD  ' for time stamp, measuring cpu clock cycles
  LOCAL d AS DOUBLE
  LOCAL s, sTemp AS STRING

               
  hDbg = FREEFILE '
  OPEN "Debug.txt" FOR OUTPUT LOCK WRITE AS hDbg ' PRINT #hDbg, "MetersToFt="+STR$(MetersToFt)
             

    nLoops = 100000
                           
           
    d = 523.34#
   
    COUNTER_BEGIN '

      FOR i = 1 TO nLoops
        d = d * 3 
        d = d / 3
      NEXT               

    COUNTER_END ' 

    s = s + "VAL =" + STR$(d) + ",   Clock Cycles="+STR$( (CounterCycles)\nLoops ) + $CRLF + $CRLF
                 
PRINT #hDbg, s
                   
MSGBOX s,64,"All Done"  : EXIT FUNCTION


  CLOSE hDbg

END FUNCTION

'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤


My ASM knowledge is limited, but this will not compile.

MichaelW

Synthesizing unsupported instructions with db (e.g. ! db &H0F, &H31 for RDTSC) is nothing unusual.

The:

#INCLUDE once "windows.bi"

was necessary for FreeBASIC to pull in the header files and import libraries necessary to call the SetPriorityClass, GetCurrentProcess, and Sleep functions. I have no idea how you might do this with PB.

The Sleep_(0) calls the Windows API Sleep function passing a dwMilliseconds value of 0. The function was renamed to avoid a conflict with the FreeBASIC SLEEP function.

The .balign is a GAS alignment directive. Again, I have no idea how you might do this with PB.

hutch--

Mike,

here are two PB macros for the two instructions.


    MACRO rdtsc = ! db &H0F,&H31

    MACRO cpuid = ! db &H0F,&HA2


Note that you use them WITHOUT the leading "!" as they are macros, not inline assembler.

MikeT

Thx hutch.

I get a compiler error:
Assembler Syntax Error
for
     !  mov [CounterTSC1], eax
I suspect the square brackets. Can I replace them with round brackets?

Also, is the PB SLEEP interchangeable with Sleep_ in this code,
i.e. for starting a new time slice?



                         

     

#COMPILE EXE
#DIM ALL
       
#INCLUDE "WIN32API.inc"         ' Basic Win API definitions   

GLOBAL hDbg AS LONG             
               
' #INCLUDE once "windows.bi"  ?

GLOBAL  CounterTSC1, CounterTSC2, CounterOverhead , CounterCycles AS LONG
GLOBAL  LoopCounter AS INTEGER
           
                     
MACRO rdtsc = ! db &H0F,&H31

MACRO cpuid = ! db &H0F,&HA2

'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
SUB CounterCode1

        '
        ' Use same CPUID input value for each call.
        '
     !  XOR eax, eax
        '
        ' Flush pipe and wait for pending ops to finish.
        '
     !  cpuid
        '
        ' Read Time Stamp Counter.
        '
        rdtsc 
        '
        ' Save count.
        '
     !  mov [CounterTSC1], eax
     !  mov [CounterTSC1+4], edx
END SUB
     
'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
SUB CounterCode2

  ! XOR eax, eax
    cpuid
    rdtsc       
  ! mov [CounterTSC2], eax
  ! mov [CounterTSC2+4], edx

END SUB


'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
MACRO COUNTER_BEGIN( loop_count, priority_class ) 

    CounterOverhead = 2000000000
    CounterCycles   = 2000000000 
                         
    CALL SetPriorityClass( GetCurrentProcess, priority_class )

    SLEEP 0 ' Start a new time slice
             

    '' The nops compensate for the 10-byte instruction (that
    '' initializes LoopCounter) between the alignment
    '' directive and the loop label, which ideally needs to be
    '' aligned on a 16-byte boundary.


    !  .balign 16
    !  nop
    !  nop
    !  nop
    !  nop
    !  nop
    !  nop

    FOR LoopCounter = 1 TO loop_count
        CounterCode1
        CounterCode2
        IF (CounterTSC2 - CounterTSC1) < CounterOverhead THEN  CounterOverhead = CounterTSC2 - CounterTSC1

    NEXT   

    SLEEP 0 ' Start a new time slice

    !  .balign 16
    !  nop
    !  nop
    !  nop
    !  nop
    !  nop
    !  nop

    FOR LoopCounter = 1 TO loop_count  '' *** Note the open FOR loop ***
        GOSUB CounterCode1

END MACRO
           


       
'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
MACRO COUNTER_END

        GOSUB CounterCode2
        IF (CounterTSC2 - CounterTSC1) < CounterCycles THEN CounterCycles = CounterTSC2 - CounterTSC1
    NEXT
    CALL SetPriorityClass( GetCurrentProcess, %NORMAL_PRIORITY_CLASS )
    CounterCycles = CounterCycles - CounterOverhead       


END MACRO
             

           

'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
SUB time_stamp_count(tick AS QUAD) ' CPU Clock count    Charles E V Pegge

  '---------------------------'
  '                           ' approx because it is not a serialised instruction
  '                           ' it may execute before or after other instructions
  '                           ' in the pipeline.
  ! mov ebx,tick              ' var address where count is to be stored.
  ! db  &h0f,&h31             ' RDTSC read time-stamp counter into edx:eax hi lo.
  ! mov [ebx],eax             ' save low order 4 bytes.
  ! mov [ebx+4],edx           ' save high order 4 bytes.
  '---------------------------'

END SUB
           

'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
FUNCTION PBMAIN
               
  LOCAL i, Rounding, nLoops, RetVal AS LONG 
  LOCAL cBeg, cEnd AS QUAD  ' for time stamp, measuring cpu clock cycles
  LOCAL d AS DOUBLE
  LOCAL s, sTemp AS STRING

               
  hDbg = FREEFILE '
  OPEN "Debug.txt" FOR OUTPUT LOCK WRITE AS hDbg ' PRINT #hDbg, "MetersToFt="+STR$(MetersToFt)
             

    nLoops = 100000
                           
           
    d = 523.34#
   
    COUNTER_BEGIN(nLoops, %HIGH_PRIORITY_CLASS) ' REALTIME_PRIORITY_CLASS

      FOR i = 1 TO nLoops
        d = d * 3 
        d = d / 3
      NEXT               

    COUNTER_END ' 

    s = s + "VAL =" + STR$(d) + ",   Clock Cycles="+STR$( (CounterCycles)\nLoops ) + $CRLF + $CRLF
                 
PRINT #hDbg, s
                   
MSGBOX s,64,"All Done"  : EXIT FUNCTION


  CLOSE hDbg

END FUNCTION

'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤




MikeT

Still trying to figure out how to convert

! mov [CounterTSC2+4], edx

to PB. The Compiler baulks at the square brackets...

hutch--

Mike,

PB is a bit deviant in the notation for complex addressing mode.


replace this
! mov [CounterTSC2+4], edx
with
! mov CounterTSC2[4], edx


See if this works.

MikeT

Thx Hutch,
That seemed to keep the compiler happy, now it baulks at:
    !  .balign 16


hutch--

Mike,

Delete that line, PowerBASIC does not support code alignment.

MichaelW

And if the compiler cannot align the code there is no need for the two groups of 6 nops, and you should expect the cycle counts to vary somewhat with the memory address of the counting (FOR) loop labels. You should be able to minimize this effect by doing all of your comparison tests in the same macro at the same address, and by adding a sufficient number of nops ahead of the second Sleep(0) to ensure that both of the loop labels (the reference loop that determines the counter overhead and the test loop) are at the same relative alignment (i.e. that they are some multiple of 16 bytes apart).
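As a worked example of the padding arithmetic, here is a small sketch in Python. The function name nops_needed and the two addresses are hypothetical; you would read the real label addresses from a disassembly or a debugger. Since a nop is one byte, each one added ahead of the second Sleep(0) moves the test loop label forward by one byte, so the count needed is the difference between the two addresses taken modulo 16.

```python
def nops_needed(ref_label, test_label, boundary=16):
    """Padding bytes to add ahead of the test loop so that both loop
    labels sit at the same offset within a `boundary`-byte block."""
    return (ref_label - test_label) % boundary

if __name__ == "__main__":
    # Hypothetical addresses for the reference and test loop labels.
    ref_label = 0x401003
    test_label = 0x40104A
    pad = nops_needed(ref_label, test_label)
    print(pad, hex(test_label + pad))
```

After padding, the two labels are a multiple of 16 bytes apart, which is the "same relative alignment" condition described above.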