Hi,
I'm building my own library the same way as m32lib from MASM32 and there is something weird.
To make sure I'm doing it correctly, I did a test with szLen from m32lib. I put szLen.asm in my proc folder and built my library with it, but when I run a speed test on it I don't get the same result. On my AMD processor, when I use includelib \masm32\lib\masm32.lib I get 65 cycles for szLen, but when I use \myproject\mylib.lib I get 87 cycles for the same szLen.
What could make the results differ?
As long as you put the identical algo in the test library, all I can think of is the location in the EXE file where the library code has been placed by the linker. I have seen minor differences with identical code but not on the scale you have mentioned here.
Hmm, as Hutch said, perhaps the routine is not aligned to the same offset, or perhaps there is a PUSHAD and POPAD being added or something. Write two identical apps and compare the disassembly. :thumbu
Or just add code to display the offset addresses of the procedures.
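A minimal sketch of that suggestion, assuming the program is built against masm32rt.inc as in the rest of this thread and that the hex$ macro from the MASM32 macro file is available. It prints the address the linker gave szLen and the low four bits of that address, so the same test can be built once against each library and the two outputs compared.

```asm
include \masm32\include\masm32rt.inc

.code
start:
    ; address the linker assigned to the library routine, in hex
    print hex$(OFFSET szLen)," address of szLen",13,10

    ; low four bits of the address: 0 means the entry point
    ; sits on a 16-byte boundary
    mov eax, OFFSET szLen
    and eax, 15
    print str$(eax)," offset within its 16-byte paragraph",13,10

    inkey
    exit
end start
```

If both builds report the same low bits, alignment of the entry point is unlikely to explain a timing difference.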
Thanks for your efforts guys, but right now I'm stuck trying to understand what I don't understand.
I don't see how the routine could be aligned at the same offset when it is used from a different library.
I zipped the test I'm doing in the attachment. Maybe this will help you figure out where I'm stuck.
I have added this in my signature "Giving up is not a solution"... it's my leitmotiv in all I'm doing. If I have to spend the next months trying to understand this, I WILL.
[attachment deleted by admin]
Alignment should not be an issue here; the linker should align library code at paragraph/page boundaries. So, for example, if you have two routines that only take a few bytes each, there will be a large gap between them when they are linked. At least that is how it appears (using MASM/LINK) when I look at a test in Olly. This is the end of one lib function (from MASM32.LIB) and the beginning of the next:
00401172 |.^EB B6 JMP SHORT TestAlig.0040112A
00401174 |> 8B85 F8FBFFFF MOV EAX,DWORD PTR SS:[EBP-408]
0040117A |> 5F POP EDI
0040117B |. 5E POP ESI
0040117C |. 5B POP EBX
0040117D |. C9 LEAVE
0040117E \. C2 1400 RETN 14
00401181 CC INT3
00401182 CC INT3
00401183 CC INT3
00401184 CC INT3
00401185 CC INT3
00401186 CC INT3
00401187 CC INT3
00401188 CC INT3
00401189 CC INT3
0040118A CC INT3
0040118B CC INT3
0040118C CC INT3
0040118D CC INT3
0040118E CC INT3
0040118F CC INT3
00401190 /$ 55 PUSH EBP
00401191 |. 8BEC MOV EBP,ESP
00401193 |. 83C4 F4 ADD ESP,-0C
00401196 |. 53 PUSH EBX
00401197 |. 56 PUSH ESI
00401198 |. 57 PUSH EDI
00401199 |. 8B45 10 MOV EAX,DWORD PTR SS:[EBP+10]
jdoe,
Here is a test piece that uses the szLen proc from the library and a local copy of the identical code. With only minor variations between the two I am not getting meaningful differences from one being in a library and the other being local.
48 szLen
48 szLen2
515 timing szLen
516 timing szLen2
516 timing szLen
531 timing szLen2
516 timing szLen
515 timing szLen2
516 timing szLen
515 timing szLen2
516 timing szLen
516 timing szLen2
515 timing szLen
532 timing szLen2
515 timing szLen
516 timing szLen2
515 timing szLen
516 timing szLen2
Press any key to continue ...
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *
szLen2 PROTO :DWORD
.data
align 4
txt db "A sadder but a wiser man he woke the morrow morn",0
.code
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
call main
inkey
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
main proc
push esi
invoke szLen,ADDR txt
print str$(eax)," szLen",13,10
invoke szLen2,ADDR txt
print str$(eax)," szLen2",13,10
invoke SetPriorityClass,FUNC(GetCurrentProcess),REALTIME_PRIORITY_CLASS
REPEAT 8
; ------------------------------------------
invoke GetTickCount
push eax
mov esi, 10000000
@@:
invoke szLen,ADDR txt
sub esi, 1
jnz @B
invoke GetTickCount
pop ecx
sub eax, ecx
print str$(eax)," timing szLen",13,10
; ------------------------------------------
invoke GetTickCount
push eax
mov esi, 10000000
@@:
invoke szLen2,ADDR txt
sub esi, 1
jnz @B
invoke GetTickCount
pop ecx
sub eax, ecx
print str$(eax)," timing szLen2",13,10
; ------------------------------------------
ENDM
invoke SetPriorityClass,FUNC(GetCurrentProcess),NORMAL_PRIORITY_CLASS
pop esi
ret
main endp
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
szLen2 proc src:DWORD
mov eax, [esp+4]
sub eax, 4
@@:
add eax, 4
cmp BYTE PTR [eax], 0
je lb1
cmp BYTE PTR [eax+1], 0
je lb2
cmp BYTE PTR [eax+2], 0
je lb3
cmp BYTE PTR [eax+3], 0
jne @B
sub eax, [esp+4]
add eax, 3
ret 4
lb3:
sub eax, [esp+4]
add eax, 2
ret 4
lb2:
sub eax, [esp+4]
add eax, 1
ret 4
lb1:
sub eax, [esp+4]
ret 4
szLen2 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
Different ML and LINK options?
SHAME ON ME :(
It's a stupid error of mine. I was comparing two different algos, szLen and StrLen. That is where the different timings come from. Now everything is logical and my mind is at peace.
I'm so sorry that I made you lose your time. Forgive me. :red :red :red
:bg
No problems, as long as you write some rocket libraries and share them around. :U
Quote from: hutch-- on April 17, 2006, 04:06:30 AM
No problems, as long as you write some rocket libraries and share them around. :U
Count on me :wink
----------------------------------
I'm less confused now than I was about alignment. But how far can I go with it, knowing that some processors may not take advantage of it? For example, this code:
.586
.model flat, stdcall
option casemap:none
.code
option prologue:none
option epilogue:none
align 4
nop
;
; Return the length of lpszStr in characters, excluding the terminating zero
;
StrLenA proc p_lpszStr:dword
mov eax, [esp+4]
align 16
@@:
mov edx, dword ptr [eax]
add eax, 4
test edx, 0FFh
jz Lbl_0
test edx, 0FF00h
jz Lbl_1
test edx, 0FF0000h
jz Lbl_2
test edx, 0FF000000h
jnz @B
sub eax, 1
sub eax, [esp+4]
ret 4
Lbl_2:
sub eax, 2
sub eax, [esp+4]
ret 4
Lbl_1:
sub eax, 3
sub eax, [esp+4]
ret 4
Lbl_0:
sub eax, 4
sub eax, [esp+4]
ret 4
StrLenA endp
option prologue:prologuedef
option epilogue:epiloguedef
end
I aligned as much as I could and this is the best performance I can get on my AMD processor. But if I'm writing a program that I want to share, it may not be as fast for others. Is there a guideline for what works well across processors in general?
If you're keen to write a new and useful library, make one that can detect and store the processor type at the start and then automatically use the best procedure for the CPU.
By storing the CPU type (i.e. 0 = pre-P3, 1 = P3, 2 = P4, etc.) you only have to detect it once and then it's a simple branch at the start of the proc. The branch will be just after a call, so branch prediction won't be an issue.
That way anyone can easily use the best code for any processor and you will be a cult hero :bg ... maybe :U
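A rough sketch of that idea, under some loud assumptions: the names g_cpuType, DetectCpu, StrLenAuto, StrLen_Generic and StrLen_SSE2 are all hypothetical, the two StrLen variants are assumed to exist elsewhere in the library, the CPU is assumed to support CPUID, and the CPUID feature bits for SSE and SSE2 are used as a stand-in for "P3 class" and "P4 class" rather than a real model check.

```asm
.686
.model flat, stdcall
option casemap:none

; hypothetical variants, assumed to be defined in other library modules
StrLen_Generic PROTO :DWORD
StrLen_SSE2    PROTO :DWORD

.data
g_cpuType dd 0          ; hypothetical: 0 = no SSE, 1 = SSE, 2 = SSE2

.code

; Call once at program start.
; CPUID function 1 returns the feature flags in EDX.
DetectCpu proc uses ebx         ; CPUID overwrites EBX, so preserve it
    mov eax, 1
    cpuid
    xor eax, eax                ; assume no SSE
    test edx, 02000000h         ; bit 25 = SSE
    jz @F
    mov eax, 1
    test edx, 04000000h         ; bit 26 = SSE2
    jz @F
    mov eax, 2
@@:
    mov g_cpuType, eax
    ret
DetectCpu endp

option prologue:none
option epilogue:none

; No stack frame is built, so the return address and the argument are
; still in place and the chosen variant can be entered with a plain JMP.
; The variant returns to the original caller with its own RET 4.
align 4
StrLenAuto proc p_lpszStr:dword
    cmp g_cpuType, 2
    jne @F
    jmp StrLen_SSE2
@@:
    jmp StrLen_Generic
StrLenAuto endp

option prologue:prologuedef
option epilogue:epiloguedef

end
```

The caller would invoke DetectCpu once and then call StrLenAuto everywhere; more CPU classes would just add more compares before the final jump.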
Quote from: zooba on April 17, 2006, 06:00:49 AM
...and you will be a cult hero :bg ... maybe :U
In my dream :P
-----------------------------------
I did a few tests building a LIB file and I don't find any reason to add align X before procedures, because once built it doesn't change anything. Procedures are always aligned on a 16-byte paragraph boundary, like donkey said previously. If I look at the masm32 library, align X is used before many procedures, so, since I'm new to MASM, I'm thinking there must be a reason to do so.
What is this new mystery?
jdoe,
The alignment for the start of a procedure in a library module is actually controlled by how the module is written with the particular compiler or assembler with COFF format and it ranges from 1 to 8192 byte alignment.
In the case of most of the modules in the masm32 library, I write them in an EXE file first to make sure they work properly, then copy them complete into a separate module that is built into an object module and combined with the main library. You could probably remove the leading alignment in the module source code, but I doubt it affects much.