News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Bin2Time, Converts binary value to hours:min:sec.000

Started by Tight_Coder_Ex, February 01, 2008, 10:54:48 PM

Previous topic - Next topic

Tight_Coder_Ex

Entry ECX = Unsigned value of duration in milliseconds
EDX = Pointer to ASCII output.

Divisors dd 3600000, 60000, 1
 
  HrsFmt db '%d:', 0
  MinFmt db '%02d:', 0
  SecFmt db '%06d', 0
  Formats dd HrsFmt, MinFmt, SecFmt

Bin2Time

    0    57            push    edi
    1    56            push    esi
    2    53            push    ebx

; Epilog set up for applications loop

    3    52            push    edx ; Preserve initial value
    4    BE <->        mov     esi, Divisors
    9    8BFA          mov     edi, edx
    B    BB <->        mov     ebx, Formats
    10   8BD1          mov     edx, ecx
    12   33C9          xor     ecx, ecx
    14   880F          mov     [edi], cl ; Nullify previous contents
    16   B1 03         mov     cl, 3
    18   51            push    ecx

    19   AD            lodsd ; Get next divisor
    1A   8BC8          mov     ecx, eax
    1C   8BC2          mov     eax, edx
    1E   33D2          xor     edx, edx
    20   F7F9          idiv    ecx
    22   8BC8          mov     ecx, eax
    24   87F3          xchg    ebx, esi
    26   AD            lodsd ; Get pointer to next format string
    27   87F3          xchg    ebx, esi
    29   23C9          and     ecx, ecx
    2B   75 05         jnz     32 ; Was quotient null
    2D   803F 00       cmp     byte ptr [edi], 0 ; Has anything been written to output
    30   74 0F         je      59

Write next segment of string with applicable format

    32   52            push    edx
    33   51            push    ecx
    34   50            push    eax
    35   57            push    edi
    36   E8 <->        call    wsprintf ; Make this segment of string
    3B   83C4 0C       add     esp, 0C
    3E   5A            pop     edx
    3F   03F8          add     edi, eax
    41   59            pop     ecx
    42   E2 D4         loopd   19

The last part of output always has 6 characters and positions 1 & 2 must be moved to 0 & 1 and 2
replaced with the decimal point
       
    44   8BCF          mov     ecx, edi
    46   2BF8          sub     edi, eax
    48   66:8B47 01    mov     ax, [edi+1]
    4C   66:AB         stosw
    4E   C607 2E       mov     byte ptr [edi], 2E
    51   5F            pop     edi
    52   2BCF          sub     ecx, edi
    54   803F 30       cmp     byte ptr [edi], 30
    57   75 0A         jnz     63

The possibility that either hours or seconds can have a leading zero and they must
be removed from output by shifting everything upward by 1

    59   8BF7          mov     esi, edi
    5B   51            push    ecx
    5C   57            push    edi
    5D   46            inc     esi
    5E   F3:A4         rep     movsb
    60   5F            pop     edi
    61   59            pop     ecx
    62   49            dec     ecx

Epilog ECX = Number of characters in output
EDX = Unchanged, pointer to ASCII string

    63   8BD7          mov     edx, edi
    65   33C0          xor     eax, eax
    67   5B            pop     ebx
    68   5E            pop     esi
    69   5F            pop     edi
    6A   C3            ret

    6B   = 107 bytes

donkey

lods,movs and idiv are all pretty slow but since your using wsprintf you are not really looking for any conversion speed. I use the following function I wrote for WinExplorer...

UpTime FRAME pszTimeString
uses edx
LOCAL Minutes :D
LOCAL Hours :D
LOCAL Days :D

xor ebx,ebx
invoke GetTickCount

xor edx,edx
mov ecx,86400000 ; Milliseconds in one day
div ecx
mov [Days],eax

mov eax,edx
xor edx,edx
mov ecx,3600000 ; Milliseconds in one hour
div ecx
mov [Hours],eax

mov eax,edx
xor edx,edx
mov ecx,60000 ; Milliseconds in one minute

div ecx
mov [Minutes],eax

; convert to seconds with fraction
push edx
pop eax
xor edx,edx
mov ecx,1000
div ecx

CInvoke(wsprintf,[pszTimeString],offset szUptimeFmt,[Days],[Hours],[Minutes],eax,edx)

RET
szUptimeFmt: DB "%u days %02u:%02u:%02u.%u",0

ENDF
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

Tight_Coder_Ex

Interesting to see as to how you functionally did the same thing in a completely different way in an extra 2 bytes.  Of course you don't trim null information from your output string, but I don't think that would take anymore than 20 to 30 bytes.  I had contemplated accounting for days, but decided for my purposes hours would suffice.

lingo

"lods,movs and idiv are all pretty slow"

div too..  :lol

option prologue:none            ; turn it off
option epilogue:none            ;
align  16                       ;
d2dtl  proc    lpointTimeBuffer : DWORD, ntime:DWORD
       mov     eax, [esp+2*4]   ; eax-> time in miliseconds
       mov     ecx, 31B5D4h     ; 2^48 / 86400000
       mov     [esp+2*4], esi   ; saving esi 
       mul     ecx              ; high 16 bits of edx = days, dx:eax = fractional days(hours,mins,sec)
       mov     esi, [esp+1*4]   ; esi->lpTimeBuffer
       mov     ecx, edx         ; high 16 bits of ecx-> days
       mov     [esp+1*4], ebx   ; saving ebx
       shr     ecx, 16          ; edx-> days
       mov     [esi], ecx       ; saving days
       and     edx, 0FFFFh      ; edx-> fractional days
       mov     ebx, eax         ;
       mov     ecx, edx         ;
       add     eax, eax         ;
       adc     edx, edx         ;
       add     eax, ebx         ;
       adc     edx, ecx         ;
       shrd    eax, edx, 13     ;
       shr     edx, 13          ; edx = hours, eax = fractions(min,sec)
       mov     ebx, eax         ;
       shr     ebx, 4           ;     
       mov     [esi+4], edx     ; saving hours
       sub     eax, ebx         ;
       mov     ebx, eax         ;
       and     eax, 03FFFFFFh   ; eax-> low 26 bits;(min:sec)
       shr     ebx, 26          ;
       mov     edx, eax         ;       
       shr     edx, 4           ;
       mov     [esi+8], ebx     ; saving minutes
       sub     eax, edx         ;         
       mov     ebx, [esp+1*4]   ; restoring ebx
       shr     eax, 20          ; 
       mov     [esi+12], eax    ; saving seconds
       mov     esi, [esp+2*4]   ; restoring esi
       ret     2*4              ;     
d2dtl  endp                     ;     
option prologue:prologuedef     ;  turn back on the defaults
option epilogue:epiloguedef     ;


Regards,
Lingo     

NightWare

tight coder ex,
if you choose to use idiv, you should use "cdq" instruction instead of "xor edx,edx" (just a question of logic...).

another thing, xchg is also a well known slow instruction, so you should better use :
xor ebx,esi
xor esi,ebx
xor ebx,esi

here speed-ups are not essentials, but if you start to code like that, you will obtain very slow result on heavily used algo.


lingo,
:lol, nice, but you spent time for a proc that is only used once per global loop (so, few clock cycles more or less doesn't really matter here...)

Tight_Coder_Ex

Quote from: NightWare on February 09, 2008, 02:01:37 AM
tight coder ex,
if you choose to use idiv, you should use "cdq" instruction instead of "xor edx,edx" (just a question of logic...).

Only problem is that EAX is an unsigned value therefore CDQ sign extending into EDX produces inaccurate results.

Quote from: NightWare on February 09, 2008, 02:01:37 AM
another thing, xchg is also a well known slow instruction, so you should better use :
xor ebx,esi
xor esi,ebx
xor ebx,esi

here speed-ups are not essentials, but if you start to code like that, you will obtain very slow result on heavily used algo.
Do you have a resource where the timing of each instruction is depicted.  I've read a bit about the techs behind dual core and wonder if that has any significance in chip design.  I somehow have a problem thinking Intel would design something that is more efficient using 6 bytes versus one

MichaelW

The cycle counts disappeared from the Intel manuals somewhere between the P1 and the P4. Agner Fog's optimization manuals, available here, list cycle counts for some of the instructions on some of the processors, but the manuals deal more with optimization techniques, technical details,  problematic instructions, etc.

I attempted to time your code against Donkey's and lingo's, but it appears to have a stack imbalance when it reaches the return instruction. Although I did not try to debug the code, I did determine that adding two pops ahead of the return instruction allows it to return successfully with most input values, but there are still some input values < 86400000 (24 hours in ms) that will cause it to fail. The stack is not balanced within the loop, with 4 pushes and, effectively, 5 pops.
eschew obfuscation

Tight_Coder_Ex

MichealW:

Maybe due to the new technologies the specific clocks for each cycle are irrelevant due to pipelining etc.  This may be why Intel dropped that information.

I'm going to look through the code and see if I can spot the possible failure you depicted.  Stack imbalances are one of the things I use to my advantage or at least I like to thing it's an advantage.

        invoke   BeginPaint, hWnd, addr Ps
        sub esp, 8

       ...  Whatever code

        call    EndPaint

As you can see, the values I need for EndPaint are already there so I just adjust the stack and use call instead of invoke.

        push    edi
        push    esi
        push    ebx

        enter   20, 0

      .....  code that may or may not corrupt stack

        leave

        pop   ebx
        pop   esi
        pop   edi

        ret

A lot of my algos use additional stack space dependent upon conditionals.  So one time the procedure might need 1024 bytes and then another time 4096.  This method guaranttees me that the essential registers will always be visible at epilogue.

Another I'll do is push 12 parameters onto stack, call GetWindowEx and instead of pushing 12 params again, I'll just subtract 48 from stack, change what needs to be changed and then call CallWindowEx again.

I'm not trying to advocate these methods are sound.  There are a few API's where this can't be done as the stack is not returned with the same values as they were called with.  If I remember correctly, DrawFrame is one of them.  You can imagine what kind of fiasco that would be if M$ tweaked one of thier API's in a new version, it could render all the software I've done useless.

This may be why, whatever algo your using to detect stack imbalances is detecting them. 

Tight_Coder_Ex

MichealW:  I found the problem you encountered and corrected it in this new revision.  My algo was sensitive to leading null, but once a value greater than 1 was encountered it also suppressed trailing nulls.  Therefore anything evenly divisible by 60 was generating an error.

I wrote a test application of 96,000,000 iterations and they average out to 465 nanoseconds / call

  000     57               push    edi
  001     56               push    esi
  002     53               push    ebx

  003     52               push    edx

  004     BE <->    mov     esi, Divisors
  009     89D7             mov     edi, edx
  00B     BB <->           mov     ebx, Formats
  010     89CA             mov     edx, ecx
  012     31C9             xor     ecx, ecx
  014     880F             mov     [edi], cl
  016     B1 03            mov     cl, 3

  018     51               push    ecx
  019     AD               lodsd
  01A     89C1             mov     ecx, eax
  01C     89D0             mov     eax, edx
  01E     31D2             xor     edx, edx
  020     F7F9             idiv    ecx
  022     59               pop     ecx
  023     21C0             and     eax, eax
  025     75 05            jnz     02C

  027     F6C5 01          test    ch, 1
  02A     74 15            je      041

  02C     80CD 01          or      ch, 1
  02F     51               push    ecx
  030     52               push    edx
  031     50               push    eax
  032     FF33             push    dword ptr [ebx]
  034     57               push    edi
  035     E8 <.>           call    wsprintfA
  03A     83C4 0C          add     esp, 0C
  03D     01C7             add     edi, eax
  03F     5A               pop     edx
  040     59               pop     ecx

  041     83C3 04          add     ebx, 4
  044     FEC9             dec     cl
  046     75 D0            jnz     018

  048     5A               pop     edx

  049     29C7             sub     edi, eax
  04B     89FE             mov     esi, edi
  04D     46               inc     esi
  04E     66:AD            lodsw
  050     66:AB            stosw
  052     B0 2E            mov     al, 2E
  054     AA               stosb

  055     803A 30          cmp     byte ptr [edx], 30
  058     75 01            jnz     05B
  05A     42               inc     edx

  05B     5B               pop     ebx
  05C     5E               pop     esi
  05D     5F               pop     edi
  05E     C3               ret

  05F = 95 bytes

MichaelW

The corrected code works OK. Unlike yours and Donkey's code, lingo's code stores the hour, minute, and second values in the buffer. Running on a P3, the cycle count for yours and Donkey's code, with the calls to wsprintf, are around 1400 and 1800. With the calls to wsprintf commented out the cycle counts are 200 and 164, and for lingo's code 24, all with no run to run variation observed.


[attachment deleted by admin]
eschew obfuscation

NightWare

this speed test is a bit unfair... an asm beginner (he hasn't understood yet, he has to unroll his code when it's possible) and someone who said it's not an optimised algo, against a guy well know for the speed of his algos...

so here, donkey's approach (without the div instructions) added... lingo's algo still the fastest for exactly the same work, but it's a bit less unfair...

[attachment deleted by admin]

donkey

Hi,

I never really worried about speed for my algorithm, as a matter of fact I had mentioned that "using wsprintf you are not really looking for any conversion speed". But it is interesting, when I have a bit of time to spend on the problem I may revisit it, for now I am occupied with other programming stuff.

Donkey
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

MichaelW

The test wasn't intended to be unfair, it was intended to show how big of a difference optimization could make in the code that calculates the component times. I included the cycle counts for the code with the wsprintf calls to show why it makes little sense to optimize the code that calculates the component times. If I were doing this I would have done something similar to Donkey's code, straightforward and easy to understand.
eschew obfuscation

GregL

QuoteIf I were doing this I would have done something similar to Donkey's code, straightforward and easy to understand.

I agree, fast code is great but it's not everything. There's a lot to be said for stable, straightforward and easy to understand code. Once in a while, you really need to optimize for speed, but it's not that often.


NightWare

Quote from: MichaelW on February 12, 2008, 11:18:03 AM
The test wasn't intended to be unfair
perfectly understood, but if you show to a beginner an algo x time faster than the one he has developped, be sure he gonna try to use some pratice seen in the optimised algo (it's one of the goals of the forum after all...). here lingo has a long practice behind him,  he made some choice concerning his coding technics (for example esp+ use), and there is, imho, things that should be avoided. so it's essential to show there is alternatives...

Quote from: Greg on February 12, 2008, 08:30:13 PM
I agree, fast code is great but it's not everything. There's a lot to be said for stable, straightforward and easy to understand code. Once in a while, you really need to optimize for speed, but it's not that often.
i didn't say the contrary  :P