Title: masm.lib functions dw2ah and a2dw
Post by: Larry Hammick on November 30, 2007, 11:28:34 AM

These look to me like improvements:

dw2ah proc public uses ebx  dwValue:DWORD, lpBuffer:DWORD

    mov ebx,lpBuffer
    mov ecx,8      ;cheaper but slower is push 8 \ pop ecx.
    mov word ptr[ebx+ecx],"H"  ;puts "H" in the low byte and zero in the high.
    mov edx,dwValue    ;get the value from the stack onto the CPU just once.
@@: mov al,dl
    ror edx,4
    and al,0Fh
    cmp al,0Ah  ;A trick using binary coded decimal in AL.
    sbb al,69h
    das      ;das and other b.c.d. opcodes use the auxiliary carry flag AF.
    dec ecx
    mov [ebx+ecx], al
    jnz short @B
    ret

dw2ah endp
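
For readers who want to check the cmp / sbb / das trick without stepping through it in a debugger, here is a rough C model of dw2ah. It is only a sketch: the names nibble_to_hex and dw2ah_model are made up for illustration, and the flag handling is written out by hand on the assumption that only CF and AF matter for this sequence.

```c
#include <stdint.h>

/* Hypothetical model of: cmp al,0Ah / sbb al,69h / das
   for a nibble value 0..15 held in AL. */
char nibble_to_hex(uint8_t al)
{
    unsigned cf  = al < 0x0A;                   /* cmp al,0Ah: CF=1 for 0..9   */
    unsigned af  = (al & 0x0F) < (0x09u + cf);  /* sbb: borrow out of bit 3    */
    unsigned cf2 = al < (0x69u + cf);           /* sbb: borrow out of bit 7    */
    uint8_t old;

    al  = (uint8_t)(al - 0x69u - cf);           /* sbb al,69h                  */
    old = al;
    if ((al & 0x0F) > 9 || af)                  /* das, low-nibble adjust      */
        al = (uint8_t)(al - 0x06);
    if (old > 0x99 || cf2)                      /* das, high-nibble adjust     */
        al = (uint8_t)(al - 0x60);
    return (char)al;
}

/* Hypothetical model of dw2ah: 8 hex digits, then "H", then a zero byte. */
void dw2ah_model(uint32_t value, char *buf)
{
    int i;
    buf[8] = 'H';                               /* mov word ptr[ebx+8],"H"     */
    buf[9] = '\0';
    for (i = 8; i-- > 0; ) {                    /* fill from the right, like dec ecx */
        buf[i] = nibble_to_hex((uint8_t)(value & 0x0F));
        value >>= 4;                            /* the asm uses ror edx,4; a shift is enough here */
    }
}
```

Calling dw2ah_model(0x1234ABCD, buf) with a buffer of at least 10 bytes should leave "1234ABCDH" in it.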

a2dw proc uses ebx edi ecx edx String:DWORD
      ;----------------------------------------
      ; Convert decimal string into dword value
      ; return value in eax
      ;----------------------------------------
      ; Does not detect overflow of 32 bits, nor any invalid digit
      ;----------------------------------------
      mov ebx, 10
      xor ecx, ecx    ;ecx will accumulate the number
      mov edi, String
@@:   mov al,[edi]
      and eax,0FFh
      jz short @F
      sub al,"0"
      inc edi
      xchg eax, ecx
      mul ebx
      add ecx, eax
      jmp short @B
@@:   xchg eax, ecx
      ret

a2dw endp
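
Since a2dw checks neither for overflow nor for invalid digits, here is a C sketch of what those two checks could look like. This is not part of the posted routine; the name a2dw_checked and the bool/out-parameter interface are assumptions made for the example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical checked variant: returns false on a non-digit character
   or if the value does not fit in 32 bits; otherwise stores it in *out. */
bool a2dw_checked(const char *s, uint32_t *out)
{
    uint32_t acc = 0;                       /* plays the role of ecx */
    for (; *s != '\0'; ++s) {
        uint32_t digit;
        if (*s < '0' || *s > '9')
            return false;                   /* invalid digit */
        digit = (uint32_t)(*s - '0');
        if (acc > (UINT32_MAX - digit) / 10u)
            return false;                   /* acc*10 + digit would overflow */
        acc = acc * 10u + digit;
    }
    *out = acc;
    return true;
}
```

Like the assembly version, an empty string yields zero. In assembly the overflow test would more likely be a jc after the mul and another after the add.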

Here's a variation of a2dw for reading decimal quadwords:

a2qw:   ;read asciiz string to quadword edx:eax
    push ebp
    mov ebp,esp
    push esi
    mov esi,[ebp+8]
    push ebx
    push edi
    xor ebx,ebx   ;ebx:edi will accumulate the number
    xor edi,edi
    mov ecx,10
@@: xor eax,eax
    cdq
    lodsb
    or al,al
    jz short @F
    sub al,"0"
    xchg eax,ebx
    mul ecx
    ;jc Overflowed
    xchg eax,ebx
    xchg eax,edi
    mul ecx
    add edi,eax
    adc ebx,edx
    ;jc Overflowed
    jmp short @B
@@: mov edx,ebx
    xchg eax,edi
    pop edi
    pop ebx
    pop esi
    pop ebp
    ret 4
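
The register shuffling in a2qw is easier to follow when the two 32-bit halves are written out. Below is a C sketch of the same arithmetic, with hi standing in for ebx and lo for edi; the function name a2qw_model is invented for the example, and like the assembly it does no overflow or digit checking.

```c
#include <stdint.h>

/* Hypothetical model of a2qw: the 64-bit accumulator is kept as two
   32-bit halves and multiplied by 10 with two 32-bit multiplies. */
uint64_t a2qw_model(const char *s)
{
    uint32_t hi = 0, lo = 0;                /* ebx:edi in the assembly */
    for (; *s != '\0'; ++s) {
        uint32_t digit = (uint32_t)(*s - '0');
        uint64_t p;
        uint32_t plo, phi;

        hi  = hi * 10u;                     /* first mul ecx; upper half ignored */
        p   = (uint64_t)lo * 10u;           /* second mul ecx gives edx:eax      */
        plo = (uint32_t)p;
        phi = (uint32_t)(p >> 32);
        lo  = plo + digit;                  /* add edi,eax                       */
        hi  = hi + phi + (lo < plo);        /* adc ebx,edx (carry from the add)  */
    }
    return ((uint64_t)hi << 32) | lo;       /* returned as edx:eax               */
}
```

For example, a2qw_model("4294967296") should return 2^32, which is the first value where the carry into the high half matters.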

Title: Re: masm.lib functions dw2ah and a2dw
Post by: Larry Hammick on December 01, 2007, 08:56:40 AM

ltoa can also be slightly improved:

ltoa proc lValue:DWORD, lpBuffer:DWORD

comment * -------------------------------------------------------
        convert signed 32 bit integer "lValue" to zero terminated
        string and store string at address in "lpBuffer"
        ------------------------------------------------------- *

    push lValue
    call @F  ;a trick for pushing the address of a constant string or other read-only data
    db "%ld",0
  @@:
    push lpBuffer
    call wsprintf
    cmp eax, 3
    jge @F
    xor eax, eax    ; zero EAX on fail
  @@:               ; else EAX contains count of bytes written
    ret

ltoa endp

Title: Re: masm.lib functions dw2ah and a2dw
Post by: Vortex on December 01, 2007, 10:06:53 AM

Hi Larry,

You need to balance the stack after calling wsprintf :

.
.
push lpBuffer
call wsprintf
add esp,3*4  ;  three parameters are passed to wsprintf
.
.

wsprintf is a C function.

Title: Re: masm.lib functions dw2ah and a2dw
Post by: Larry Hammick on December 02, 2007, 12:04:11 AM

Quote from: Vortex on December 01, 2007, 10:06:53 AM
Hi Larry,

You need to balance the stack after calling wsprintf :
.
.
push lpBuffer
call wsprintf
add esp,3*4  ;  three parameters are passed to wsprintf
.
.

wsprintf is a C function.

True, wsprintf doesn't clean the stack, but the "proc" thingee looks after it, by using "leave". Here's the original disassembled:
0040106E  /. 55             PUSH EBP
0040106F  |. 8BEC           MOV EBP,ESP
00401071  |. EB 04          JMP SHORT c.00401077
00401073  |. 25 6C 64 00    ASCII "%ld",0
00401077  |> FF75 08        PUSH DWORD PTR SS:[EBP+8]
0040107A  |. 68 73104000    PUSH c.00401073
0040107F  |. FF75 0C        PUSH DWORD PTR SS:[EBP+C]
00401082  |. E8 9F010000    CALL <JMP.&user32.wsprintfA>
00401087  |. 83C4 0C        ADD ESP,0C
0040108A  |. 83F8 03        CMP EAX,3
0040108D  |. 7D 02          JGE SHORT c.00401091
0040108F  |. 33C0           XOR EAX,EAX
00401091  |> C9             LEAVE
00401092  \. C2 0800        RETN 8

Title: Re: masm.lib functions dw2ah and a2dw
Post by: Vortex on December 02, 2007, 10:48:41 AM

Interesting. Could you provide the full source code? This is my test:

test.asm :

.386
.model flat, stdcall
option casemap:none

include \masm32\include\user32.inc

.code

ltoa proc lValue:DWORD, lpBuffer:DWORD

comment * -------------------------------------------------------
        convert signed 32 bit integer "lValue" to zero terminated
        string and store string at address in "lpBuffer"
        ------------------------------------------------------- *

    push lValue
    call @F  ;a trick for pushing the address of a constant string or other read-only data
    db "%ld",0
  @@:
    push lpBuffer
    call wsprintf
    cmp eax, 3
    jge @F
    xor eax, eax    ; zero EAX on fail
  @@:               ; else EAX contains count of bytes written
    ret

ltoa endp

end

Disassembling with Agner Fog's tool objconv.exe :

objconv.exe -fasm test.obj disasm.asm


; Disassembly of file: test.obj
; Sun Dec 02 12:34:01 2007

; Mode: 32 bits
; Syntax: MASM/ML
; Instruction set: 80386

.386
option dotname
.model flat

public _ltoa@8

extern _wsprintfA: near


_text   SEGMENT DWORD PUBLIC 'CODE'                     ; section number 1

_ltoa@8 PROC NEAR
        push    ebp                                     ; 0000 _ 55
        mov     ebp, esp                                ; 0001 _ 8B. EC
        push    dword ptr [ebp + 08H]                   ; 0003 _ FF. 75, 08
        call    ?_002                                   ; 0006 _ E8, 00000004
?_001:
; Error: Instruction out of phase with next label
;       and     eax, 0FF00646CH                         ; 000B _ 25, FF00646C
        db 25H, 6CH, 64H, 00H

?_002   LABEL NEAR
        push    dword ptr [ebp + 0CH]                   ; 000F _ FF. 75, 0C
        call    _wsprintfA                              ; 0012 _ E8, 00000000(rel)
        cmp     eax, 3                                  ; 0017 _ 83. F8, 03
        jge     ?_003                                   ; 001A _ 7D, 02
        xor     eax, eax                                ; 001C _ 33. C0
?_003:  leave                                           ; 001E _ C9
        ret     8                                       ; 001F _ C2, 0008
_ltoa@8 ENDP
_text   ENDS

_data   SEGMENT DWORD PUBLIC 'DATA'                     ; section number 2

        db      34 dup (?)                              ; 0000 _ 
_data   ENDS

END

Title: Re: masm.lib functions dw2ah and a2dw
Post by: MichaelW on December 02, 2007, 07:16:55 PM

In my tests on a P3, the “improved” version is ~17 cycles slower than the original. The cycle count for wsprintf is some 33 times larger than the cycle count for the other code, so there is little benefit to be had from optimization.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      buffer db 16 dup(0)
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

align 4
_ltoa proc lValue:DWORD, lpBuffer:DWORD

comment * -------------------------------------------------------
        convert signed 32 bit integer "lValue" to zero terminated
        string and store string at address in "lpBuffer"
        ------------------------------------------------------- *

    push lValue
    call @F  ;a trick for pushing the address of a constant string or other read-only data
    db "%ld",0
  @@:
    push lpBuffer
    call wsprintf
    ;add esp, 12

    cmp eax, 3
    jge @F
    xor eax, eax    ; zero EAX on fail
  @@:               ; else EAX contains count of bytes written
    ret

_ltoa endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    invoke _ltoa, 12345678, ADDR buffer
    print ADDR buffer,13,10

    invoke Sleep, 3000

    counter_begin 100000,HIGH_PRIORITY_CLASS
      invoke _ltoa, 12345678, ADDR buffer
    counter_end
    print ustr$(eax)," cycles",13,10

    counter_begin 100000,HIGH_PRIORITY_CLASS
      invoke ltoa, 12345678, ADDR buffer
    counter_end
    print ustr$(eax)," cycles",13,10

    counter_begin 100000,HIGH_PRIORITY_CLASS
      invoke _ltoa, 12345678, ADDR buffer
    counter_end
    print ustr$(eax)," cycles",13,10

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

Title: Re: masm.lib functions dw2ah and a2dw
Post by: Larry Hammick on December 02, 2007, 09:03:30 PM

Quote from: MichaelW on December 02, 2007, 07:16:55 PM
In my tests on a P3, the “improved” version is ~17 cycles slower than the original. The cycle count for wsprintf is some 33 times larger than the cycle count for the other code, so there is little benefit to be had from optimization.

Surprising. I guess the "call" instruction costs mips, whereas the original uses a jump instead, and then loads a constant address from the code section. And no doubt the wsprintf swamps the rest of the code anyhow.
Here is Olly's disassembly of the new and the old ltoa.

00401000 55             PUSH EBP
00401001 8BEC           MOV EBP,ESP
00401003 FF75 08        PUSH DWORD PTR SS:[EBP+8]
00401006 E8 04000000    CALL c.0040100F
0040100B 25 6C 64 00    ASCII "%ld",0
0040100F FF75 0C        PUSH DWORD PTR SS:[EBP+C]
00401012 E8 07020000    CALL <JMP.&user32.wsprintfA>
00401017 83F8 03        CMP EAX,3
0040101A 7D 02          JGE SHORT c.0040101E
0040101C 33C0           XOR EAX,EAX
0040101E C9             LEAVE
0040101F C2 0800        RETN 8
;;;;;;;;
00401022 55             PUSH EBP
00401023 8BEC           MOV EBP,ESP
00401025 EB 04          JMP SHORT c.0040102B
00401027 25 6C 64 00    ASCII "%ld",0
0040102B FF75 08        PUSH DWORD PTR SS:[EBP+8]
0040102E 68 27104000    PUSH c.00401027
00401033 FF75 0C        PUSH DWORD PTR SS:[EBP+C]
00401036 E8 E3010000    CALL <JMP.&user32.wsprintfA>
0040103B 83C4 0C        ADD ESP,0C
     ;add esp,0Ch does not appear in the original asm file, but
     ;got inserted by "invoke", apparently
0040103E 83F8 03        CMP EAX,3
00401041 7D 02          JGE SHORT c.00401045
00401043 33C0           XOR EAX,EAX
00401045 C9             LEAVE
00401046 C2 0800        RETN 8

Vortex, objconv tried to interpret the format string as code rather than embedded data, but the revised version runs okay. (Ollydbg -- a remarkably smart program -- figured out that it was ascii.) The new version is five bytes smaller, if that counts for anything. Three of those bytes could be conserved in the old version by using CALL instead of INVOKE for wsprintf.

Title: Re: masm.lib functions dw2ah and a2dw
Post by: Vortex on December 02, 2007, 09:09:37 PM

Yes, it's true that objconv interprets the ASCII byte sequence as code, but it's a powerful tool for disassembling object code. It appears that leave does the stack balancing job. The Visual C++ compiler also applies this trick to optimize code. (It does not automatically insert the add esp,XX statement if it can find a case for this optimization.)

Title: Re: masm.lib functions dw2ah and a2dw
Post by: Mark Jones on December 03, 2007, 04:52:04 AM

If someone felt like making a small testbed app looping through these routines for a few seconds, I could feed it through CodeAnalyst and report exactly what was slowing it down, from cache misses, to branch mispredictions, to register stalls.

I fed it a simple console prime-number-factorization routine and found some interesting and unusual caveats, like a XOR in my code that was 15x slower than a DIV.

Title: Re: masm.lib functions dw2ah and a2dw
Post by: jj2007 on December 03, 2007, 09:00:45 AM

Quote from: Larry Hammick on December 02, 2007, 09:03:30 PM
0040102B FF75 08        PUSH DWORD PTR SS:[EBP+8]
0040102E 68 27104000    PUSH c.00401027
00401033 FF75 0C        PUSH DWORD PTR SS:[EBP+C]
00401036 E8 E3010000    CALL <JMP.&user32.wsprintfA>
0040103B 83C4 0C        ADD ESP,0C
     ;add esp,0Ch does not appear in the original asm file, but
     ;got inserted by "invoke", apparently

Balancing with LEAVE is ok if this happens in a subroutine without loops. Imagine what happens if you do the call 10,000 times before the code runs into LEAVE...
Stupid question: If I use

	invoke GetProcAddress,hWindowsDLL, chr$('AnyAPI_Export')
	push 123
	push NULL	; some params
	call eax
	add esp,8	; needed or not?

Do Windows APIs clean up the stack? Do MASM macros behave the same?
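
To put a number on the loop concern above, here is a toy C model of ESP and EBP around an unbalanced cdecl call. It is only a sketch: the Regs struct and the three helper names are invented for the example, the starting address is arbitrary, and the saved EBP value and the return address pushed and popped by call/ret are deliberately not modeled.

```c
#include <stdint.h>

/* Hypothetical register model; only ESP and EBP are tracked. */
typedef struct {
    uint32_t esp;   /* simulated stack pointer */
    uint32_t ebp;   /* simulated frame pointer */
} Regs;

/* push ebp / mov ebp,esp */
void proc_prologue(Regs *r)
{
    r->esp -= 4;
    r->ebp = r->esp;
}

/* Push nargs dwords and call a cdecl function without "add esp,n"
   afterwards: the arguments stay on the stack. */
void push_args_and_call_cdecl(Regs *r, uint32_t nargs)
{
    r->esp -= 4u * nargs;
}

/* leave  =  mov esp,ebp / pop ebp */
void proc_leave(Regs *r)
{
    r->esp = r->ebp;
    r->esp += 4;
}
```

With three arguments per call, 10,000 unbalanced calls inside one procedure leave 10,000 * 12 = 120,000 bytes on the stack until LEAVE runs. After proc_leave the model's ESP is back at its entry value, which is why the single call in ltoa gets away without the add esp,12.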

Title: Re: masm.lib functions dw2ah and a2dw
Post by: Larry Hammick on December 03, 2007, 12:12:04 PM

I'm not a great master of Windows API, but as far as I know, all the functions clean the stack except wsprintf, no doubt because that one takes a variable number of parameters.
While we're at it, here's a revision of getcl.asm in masm32.lib. Replace

    xor ecx, ecx            ; zero ecx & use as counter
    mov esi, lpCmdLine

    @@:
      lodsb
      cmp al, 0
      je @F
      cmp al, 34            ; [ " ] character
      jne @B
      inc ecx               ; increment counter
      jmp @B
    @@:

    push ecx                ; save count

    shr ecx, 1              ; integer divide ecx by 2
    shl ecx, 1              ; multiply ecx by 2 to get dividend

    pop eax                 ; put count in eax
    cmp eax, ecx            ; check if they are the same
    je @F

with

    xor ecx,ecx      ;cl is the number of " symbols mod 2
    mov esi, lpCmdLine

    @@:
      lodsb
      cmp al, 0     ;variant: cmp al,ch
      je @F
      cmp al,'"'
      jne @B
      not cl         ; flip the counter (mod 2)
      jmp @B
    @@:
    jecxz @F     ;compact, but not the fastest possible way

    xor ecx,ecx      ;ecx counts the number of " symbols
    mov esi, lpCmdLine

    @@:
      lodsb
      cmp al, 0
      je @F
      cmp al,'"'
      jne @B
      inc ecx
      jmp @B
    @@:
    shr ecx,1
    jnc @F
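
Both replacements compute the same thing as the original push / shr / shl / pop / cmp sequence, namely whether the number of quote characters is odd. Here is a C sketch of the two variants side by side; the function names are made up for the example, and true corresponds to the case where the assembly does not take the final jump to @F.

```c
#include <stdbool.h>

/* Model of the "not cl" variant: cl flips between 0 and 0FFh. */
bool quotes_odd_toggle(const char *s)
{
    unsigned char cl = 0;
    for (; *s != '\0'; ++s)
        if (*s == '"')
            cl = (unsigned char)~cl;    /* not cl */
    return cl != 0;                     /* jecxz @F is taken when this is false */
}

/* Model of the "inc ecx ... shr ecx,1 / jnc" variant. */
bool quotes_odd_count(const char *s)
{
    unsigned n = 0;
    for (; *s != '\0'; ++s)
        if (*s == '"')
            ++n;                        /* inc ecx */
    return (n & 1u) != 0;               /* the bit that shr ecx,1 moves into CF */
}
```

For a command line such as "a b" c (two quotes) both functions return false, and for one with an unmatched quote both return true.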

The MASM Forum Archive 2004 to 2012