Building my own library

Started by jdoe, April 16, 2006, 07:06:16 AM

jdoe

Hi,

I'm building my own library the same way as m32lib from MASM32, and something weird is happening.
To make sure I'm doing it correctly, I ran a test with szLen from m32lib. I put szLen.asm in my proc folder and built my library with it, but when I run a speed test I don't get the same result. On my AMD processor, when I use includelib \masm32\lib\masm32.lib I get 65 cycles for szLen, but when I use \myproject\mylib.lib I get 87 cycles for the same szLen.

What could make the results differ?


hutch--

As long as you put the identical algo in the test library, all I can think of is the location in the EXE file where the library code has been placed by the linker. I have seen minor differences with identical code but not on the scale you have mentioned here.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Mark Jones

Hmm, as Hutch said, perhaps the routine is not aligned to the same offset, or perhaps there is a PUSHAD and POPAD being added or something. Write two identical apps and compare the disassembly. :thumbu
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

MichaelW

Or just add code to display the offset addresses of the procedures.
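
A minimal sketch of that idea, assuming the masm32rt.inc runtime and the print and str$ macros used elsewhere in this thread. It prints the address the linker gave szLen (in decimal) and that address modulo 16, so two builds linked against different libraries can be compared directly. To check a procedure from your own library, add its include and includelib lines and substitute its name; those are left out here because the paths are specific to your project.

```masm
; Build with "CONSOLE ASSEMBLE AND LINK"
    include \masm32\include\masm32rt.inc

    .code

start:

    mov eax, OFFSET szLen           ; address the linker assigned to the library proc
    print str$(eax)," address of szLen (decimal)",13,10

    mov eax, OFFSET szLen
    and eax, 15                     ; position inside its 16-byte paragraph
    print str$(eax)," address of szLen mod 16",13,10

    inkey
    exit

end start
```

A result of 0 on the second line means the procedure starts on a 16-byte boundary in that build.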

eschew obfuscation

jdoe

Thanks for your efforts guys, but right now I'm stuck trying to understand what I don't understand.
I don't see how the routine could be aligned at the same offset when it is used from a different library.

I zipped the test I'm doing in the attachment. Maybe this will help you figure out where I'm stuck.


I have added this to my signature: "Giving up is not a solution"... it's my leitmotiv in everything I do. If I have to spend the next months trying to understand this, I WILL.



[attachment deleted by admin]

donkey

Alignment should not be an issue here; the linker should align library code at paragraph/page boundaries. So, for example, if you have two routines that only take a few bytes each, there will be a gap between them when they are linked. At least that is how it appears (using MASM/LINK) when I look at a test in Olly. This is the end of one lib function (from MASM32.LIB) and the beginning of the next; note the INT3 padding up to the next 16-byte boundary.

00401172  |.^EB B6          JMP SHORT TestAlig.0040112A
00401174  |> 8B85 F8FBFFFF  MOV EAX,DWORD PTR SS:[EBP-408]
0040117A  |> 5F             POP EDI
0040117B  |. 5E             POP ESI
0040117C  |. 5B             POP EBX
0040117D  |. C9             LEAVE
0040117E  \. C2 1400        RETN 14
00401181     CC             INT3
00401182     CC             INT3
00401183     CC             INT3
00401184     CC             INT3
00401185     CC             INT3
00401186     CC             INT3
00401187     CC             INT3
00401188     CC             INT3
00401189     CC             INT3
0040118A     CC             INT3
0040118B     CC             INT3
0040118C     CC             INT3
0040118D     CC             INT3
0040118E     CC             INT3
0040118F     CC             INT3
00401190  /$ 55             PUSH EBP
00401191  |. 8BEC           MOV EBP,ESP
00401193  |. 83C4 F4        ADD ESP,-0C
00401196  |. 53             PUSH EBX
00401197  |. 56             PUSH ESI
00401198  |. 57             PUSH EDI
00401199  |. 8B45 10        MOV EAX,DWORD PTR SS:[EBP+10]
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

hutch--

jdoe,

Here is a test piece that uses the szLen proc from the library and a local copy of the identical code. With only minor variations between the two I am not getting meaningful differences from one being in a library and the other being local.


48 szLen
48 szLen2
515 timing szLen
516 timing szLen2
516 timing szLen
531 timing szLen2
516 timing szLen
515 timing szLen2
516 timing szLen
515 timing szLen2
516 timing szLen
516 timing szLen2
515 timing szLen
532 timing szLen2
515 timing szLen
516 timing szLen2
515 timing szLen
516 timing szLen2
Press any key to continue ...




; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    szLen2 PROTO :DWORD

    .data
      align 4
      txt db "A sadder but a wiser man he woke the morrow morn",0

    .code

start:
   
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    call main
    inkey
    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

main proc

    push esi

    invoke szLen,ADDR txt
    print str$(eax)," szLen",13,10

    invoke szLen2,ADDR txt
    print str$(eax)," szLen2",13,10

    invoke SetPriorityClass,FUNC(GetCurrentProcess),REALTIME_PRIORITY_CLASS

    REPEAT 8
  ; ------------------------------------------
    invoke GetTickCount
    push eax

    mov esi, 10000000
  @@:
    invoke szLen,ADDR txt
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print str$(eax)," timing szLen",13,10

  ; ------------------------------------------
    invoke GetTickCount
    push eax

    mov esi, 10000000
  @@:
    invoke szLen2,ADDR txt
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print str$(eax)," timing szLen2",13,10

  ; ------------------------------------------
    ENDM

    invoke SetPriorityClass,FUNC(GetCurrentProcess),NORMAL_PRIORITY_CLASS

    pop esi

    ret

main endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

align 4

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

szLen2 proc src:DWORD

    mov eax, [esp+4]
    sub eax, 4

  @@:
    add eax, 4
    cmp BYTE PTR [eax], 0
    je lb1
    cmp BYTE PTR [eax+1], 0
    je lb2
    cmp BYTE PTR [eax+2], 0
    je lb3
    cmp BYTE PTR [eax+3], 0
    jne @B

    sub eax, [esp+4]
    add eax, 3
    ret 4
  lb3:
    sub eax, [esp+4]
    add eax, 2
    ret 4
  lb2:
    sub eax, [esp+4]
    add eax, 1
    ret 4
  lb1:
    sub eax, [esp+4]
    ret 4

szLen2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end start

GregL

Different ML and LINK options?


jdoe

SHAME ON ME   :(

It's a stupid error of mine. I was comparing two different algos with each other, szLen and StrLen. That's where the different timings come from. Now everything is logical and my mind is at peace.


I'm so sorry that I made you waste your time. Forgive me.  :red  :red  :red



hutch--

 :bg

No problems, as long as you write some rocket libraries and share them around.  :U

jdoe

Quote from: hutch-- on April 17, 2006, 04:06:30 AM
No problems, as long as you write some rocket libraries and share them around.  :U

Count on me  :wink

----------------------------------

I'm less confused about alignment now than I was. But how far can I go with it, knowing that some processors may not take advantage of it? Take this code, for example.


.586

.model flat, stdcall

option casemap:none

.code

option prologue:none
option epilogue:none

align 4
nop
;
; Return characters length of lpszStr excluding zero-terminated char
;
StrLenA proc p_lpszStr:dword

   mov eax, [esp+4]

   align 16
@@:
   mov edx, dword ptr [eax]
   add eax, 4
   test edx, 0FFh
   jz Lbl_0
   test edx, 0FF00h
   jz Lbl_1
   test edx, 0FF0000h
   jz Lbl_2
   test edx, 0FF000000h
   jnz @B

   sub eax, 1
   sub eax, [esp+4]
   ret 4

Lbl_2:

   sub eax, 2
   sub eax, [esp+4]
   ret 4

Lbl_1:

   sub eax, 3
   sub eax, [esp+4]
   ret 4

Lbl_0:

   sub eax, 4
   sub eax, [esp+4]
   ret 4

StrLenA endp

option prologue:prologuedef
option epilogue:epiloguedef

end


I aligned as much as I could, and this gives the best performance I can get on my AMD processor. But if I'm writing a program that I want to share, it may not be as fast for others. Is there a guideline for what works well across processors in general?


zooba

If you're keen to write a new and useful library, make one that can detect and store the processor type at the start and then automatically use the best procedure for the CPU.

By storing the CPU type (i.e. 0 = pre-P3, 1 = P3, 2 = P4, etc.) you only have to detect it once, and then it's a simple branch at the start of the proc. The branch will be just after a call, so branch prediction won't be an issue.

That way anyone can easily use the best code for any processor and you will be a cult hero :bg ... maybe :U
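
A rough sketch of the dispatch part only, assuming the masm32rt.inc runtime. The names g_cpuType and StrLenAuto are hypothetical, the detection step is a placeholder that always selects the generic path, and both branches forward to the library szLen purely so that the example builds; a real library would jump to a CPU-specific variant on the tuned path.

```masm
; Build with "CONSOLE ASSEMBLE AND LINK"
    include \masm32\include\masm32rt.inc

    StrLenAuto PROTO :DWORD         ; hypothetical dispatcher name

    .data
      g_cpuType dd 0                ; hypothetical code: 0 = generic, 1 = tuned variant
      txt db "A sadder but a wiser man he woke the morrow morn",0

    .code

start:

    ; Placeholder for a one-time detection step (for example via CPUID).
    ; Here the generic variant is selected unconditionally.
    mov g_cpuType, 0

    invoke StrLenAuto,ADDR txt
    print str$(eax)," StrLenAuto",13,10

    inkey
    exit

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

StrLenAuto proc src:DWORD

    cmp g_cpuType, 1
    je tuned                        ; no stack frame is built, so [esp+4] still holds src

    jmp szLen                       ; tail jump: szLen returns straight to our caller

  tuned:
    jmp szLen                       ; placeholder: a CPU-specific variant would go here

StrLenAuto endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

end start
```

Because the dispatcher has no prologue and jumps rather than calls, the selected routine sees exactly the stack layout a direct call would produce, so the cost is one compare and one jump per call.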

jdoe

Quote from: zooba on April 17, 2006, 06:00:49 AM
...and you will be a cult hero :bg ... maybe :U

In my dream  :P


-----------------------------------

I did a few tests building a LIB file, and I can't find any reason to add align X before a procedure, because once it is built it doesn't change anything. Procedures are always aligned on a 16-byte paragraph boundary, as donkey said previously. When I look at the masm32 library, align X is used before many procedures, so, since I'm new to MASM, I'm thinking there must be a reason for it.
What is this new mystery?


hutch--

jdoe,

The alignment for the start of a procedure in a library module is actually controlled by how the module is written with the particular compiler or assembler with COFF format and it ranges from 1 to 8192 byte alignment.

In the case of most of the modules in the masm32 library, I write them in an EXE file first to make sure they work properly, then copy them complete into a separate module that is built into an object module and combined with the main library. You could probably remove the leading alignment in the module source code, but I doubt it affects much.