Here is my GetFilePath function to extract path from filenames. The subroutine appends a slash symbol to the end
of the buffer receiving the path followed by a zero.
.386
.model flat,stdcall
option casemap:none
.code
GetFilePath PROC uses esi source:DWORD,dest:DWORD
mov esi,source
mov edx,dest
dec esi
dec edx
mov ecx,edx
@@:
add esi,1
add edx,1
mov al,BYTE PTR [esi]
mov BYTE PTR [edx],al
test al,al
jz @f
cmp al,'\'
jne @b
mov ecx,edx
jmp @b
@@:
mov BYTE PTR [ecx+1],0
ret
GetFilePath ENDP
END
[attachment deleted by admin]
Vortex,
This will be a nice addition to the library. :U
Paul
Hi Vortex! You have a point there, I will use it. Thanks!
Just two questions:
- isn't INC faster than ADD?
- isn't Or faster than TEST?
Or difrent manufacters provide diffrent speeds for this four instructions?
[EDIT]
A last one :)
- cmp al, '\'
jne @B
sub al, '\'
jnz @B
[/EDIT]
Regards,
Nick
Hi Paul,
Thanks for your kind words.
Hi Nick,
Thanks for your support.
add is faster than inc on P4 processors. test is also a fast instruction. I don't know about the situation of sub vs cmp here.
Hi again. Thanks for claring that up.
I really should search for that updated Agner manual ... :)
Nick
Nick,
Here are Agner Fog's optimization manuals :
http://www.agner.org/optimize/#manuals
Nick,
cmp is a nondistructive test, sub is destructive so care is needed. Sometimes it is okay to use, other times, no.
Paul
Thanks for that link, Vortex.
Hi, Paul! Yep, but in this particular case (cmp al,'\') is working. But I'm not shure about the speed.
Regards,
Nick
Hi Nick,
My routine is aimed for general purpose usage so small timing effects should not affect the functionnality of the algo.
In the same way, the "mov al,Byte Ptr [esi]" could be move before "add esi,1".
mov al,BYTE PTR [esi]
add esi,1
mov BYTE PTR [edx],al
add edx,1
test al,al
jz @f
But I don't understand why you are writing all of these functions which are already existing in the standard API. I am sure that programmers would need others functions.
Thanks, for all. I like this forum, there always is something interesting for evryone.
Hello, Grincheux!
Even if "reinventing the wheel" is considered by most people a waste of time, I (and, probably, Vortex, too) think that great things can came out of this. Think about speed, think about the size, about putting your brain to find a new solution to a problem already solved (wich may be harder than finding a solution without a previous existing answer), about the fact that reading source code for such a function may help others to find solutions to similar problems, ..., ..., ...
And think about the fact that anyone is free to do anything he/she likes with his/her time. :)
Regards,
Nick
You are right, I make the same thing too. In French, "Grincheux" means the man who is never pleased...
What I wanted to do is to get more functions about subjects like these : Images, internet, security, GUI...
For such functions where all are "reinventing the wheel". We are not all lucky, because of our own capabilities and knowledge.
You are right too, because I don't like to use code written by another than I. Except for subjects I don't know or don't understand.
I think that functions about databases (by example) would be more useful than Path and files functions that already are in the standard API.
Thanks
Here is an improved version :
.386
.model flat,stdcall
option casemap:none
.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
GetFilePath PROC source:DWORD,dest:DWORD
push esi
push edi
mov esi,DWORD PTR [esp+12] ; source
mov edi,DWORD PTR [esp+16] ; dest
xor edx,edx
xor ecx,ecx
@@:
mov al,BYTE PTR [esi+edx]
mov BYTE PTR [edi+edx],al
add edx,1
test al,al
jz @f
cmp al,'\'
jne @b
mov ecx,edx
jmp @b
@@:
mov BYTE PTR [edi+ecx],0
pop edi
pop esi
ret 8
GetFilePath ENDP
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
END
[attachment deleted by admin]
I think that something better can be made by removing PROLOGUE and EPILOGUE. Because we don't need locals variables EBP register can then be used like this we can remove one PUSH ESI/POP ESI. That could be
GetFilePath :
PUSH EDI
mov EBP,DWORD PTR [esp+12] ; source
XOR EDX,EDX
mov edi,DWORD PTR [esp+16] ; dest
MOV ECX,EDX
MOV AH,'\'
.
.
.
CMP AL,AH
.
.
.
POP EDI
RET 8
;===================A Suggestion :
Replacing
mov BYTE PTR [edi+edx],al
By
mov WORD PTR [edi+edx],ax ; In that case CMP AL,AH must be replaced by the original CMP AL,'\'
If so we can remove later
mov BYTE PTR [edi+ecx],0
That's all. In the same spirit why nt rewrite PathMakePretty...?
QuoteBecause we don't need locals variables EBP register can then be used like this we can remove one PUSH ESI/POP ESI.
Grincheux, I am sorry but your method would not work because you need to preserve the original value of EBP if you modify it. Iczelion's tutorial #1:
QuoteWhen you program under Win32, you must know some important rules. One such rule is that, Windows uses esi, edi, ebp and ebx internally and it doesn't expect the values in those registers to change. So remember this rule first: if you use any of those four registers in your callback function, don't ever forget to restore them before returning control to Windows.
In that case don't use PROC directive and define the procedure as a label.
What is the interest of using ESP rather than EBP in that case ?
Grincheux,
Even if you replace the PROC directive with a label to identify the procedure, you need to observe the register preservation rule in the module containing the procedure. Using esp has a meaning here : by reading the value stored in esp, you don't destroy the content of this register.
Lingo, here is the forum rule #2, probably you forgot it :
Quote2. The basis of the rules in the forum is respect for other members. Respect for new members who are struggling with assembler, respect for experienced members who are here to help and respect for differences between members. This cannot be done by rigid compliance to stated rules and is based on intent. Please remember that every person who posts in the forum is another human being and deserves to be treated like one.
[/color]
isslz macro
add ecx, 1
cmp byte ptr [eax+ecx],"\"
je K2
cmp byte ptr [eax+ecx], 0
je K3
endm
GetFilePath:
mov eax, [esp+1*4]
or ecx, -1
mov edx, [esp+2*4]
movq MM7, [eax]
movq [edx], MM7
repeat 8
isslz
endm
K1:
movq MM7, [eax+ecx]
movq [edx+ecx], MM7
repeat 8
isslz
endm
jne K1
K2:
add edx, ecx
add eax, ecx
xor ecx, ecx
je K1
K3:
mov byte ptr [edx], 0
ret 2*4
Regards,
Lingo
Yes, but no need of OPTION PROLOGUE/EPILOGUE:NONE and no other code is added. When looking at
mov esi,DWORD PTR [esp+12] ; source
mov edi,DWORD PTR [esp+16] ; dest
we can see that a push has been added. Otherwise ESP + 12 would refer to the return address ?
To Lingo :
Very very interesting !
I imagined somthing using these kind of instructions set but I was not sure.
I understand Vortex, and he is TRUE for all of its remarks, but the finality for me is to have the quickest code. But there is a limit, at the end except the author no one else can read the source code.
But, how to progress if we don't go at the end of a problem ?
IMHO what does it mean ?
To Lingo :
K2:
add edx, ecx
add eax, ecx
xor ecx, ecx
je K1
je or jz is the same thing.
XOR ECX,ECX set the ZERO flag so JE K1 always is TRUE, what is faster JE K1 or JMP K1 ?
Quote....what is faster JE K1 or JMP K1 ?
"For example, a jump to a jump should be replaced by a jump to the final target.
In some cases this is even possible with conditional jumps if the condition is the same or is known." :lol
from 22.3. Avoiding jumps (all processors)
How to optimize for the Pentium family of microprocessors by Agner Fog
> what is faster JE K1 or JMP K1 ?
Look at how much work the instruction performs, a JE must test a flag, a JMP does not. Both types of branches slow up a pipeline and this factor alone may mask any timing difference between the two but a JMP does less.
Grincheux,
With the condition that you introduce the procedure with a unique label identifier, you can remove the OPTION PROLOGUE/EPILOGUE:NONE statements if you wish. If you merge your main module with the procedure then you can safely remove the push statements but that would destroy the flexibility of a standalone procedure in its own module.
Lingo,
To refresh your memory, I would like to remind you that we had a deal : you would go in your own way and me in my way. I know, it's very difficult for you to understand that you should post your specialized codes to the algorithm section. This subforum is intended for beginners and I believe that very few people here would be interested in a super fast GetFilePath algorithm. With your manners, you are portraying a person far from professionality.
If you still have difficulties to undestand what I say, don't hesitate to send me a PM and I will reply all your questions.
Vortex I understand what you are saying about Lingo, I don't want to know who is right, but it seems that the algo built by Lingo is coming from the discuss we had with this topic. I am not a beginner, but I appreciate this forum, more than the one you are speaking about.
I think it is not a good thing, even for a beginner, to know that now the algo is OK but it cannot be improved because he is a beginner. I think to that in specialized forums there would be explanaition about algo. Why they have been written like this, and where did they start from.
Where is the frontier between them ? If you are a beginner today will be a beginner tomorrow ? Even if I know some parts of my job sometimes I consider I am less than a beginner that helps me to understand when I am lost.
I understand the ideas of everyone, but is it MASM Forum or OK CORRAL ?
Thank you Lingo and Vortex, somewhere everyone can advance and it is the most important thing.
Grincheux,
The distinction is easy : I would not consider an algorithm optimized with MMX instruction is targetting this subforum. The audience of this section are members who have some basic knowledge and experience with the assembly language. The MMX stuff would rather fall to a more advanced category. Like in other fields in life, there are phases in learning assembly and this subforum is an effort to support our members who are progressing in the learning curve. The Masm32 forum provides a place where to put advanced algorithms, it's the laboratory section.
God said :U
Another version :
.386
.model flat,stdcall
option casemap:none
.data
GFPtable db 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
GetFilePath PROC source:DWORD,dest:DWORD
push esi
push edi
push ebp
mov esi,DWORD PTR [esp+16] ; source
mov edi,DWORD PTR [esp+20] ; dest
xor edx,edx
xor ecx,ecx
mov ebp,OFFSET GFPtable
l0:
mov eax,edx
@@:
mov cl,BYTE PTR [esi+edx]
mov BYTE PTR [edi+edx],cl
add edx,1
cmp BYTE PTR [ebp+ecx],0
je @b
cmp cl,'\'
je l0
mov BYTE PTR [edi+eax],0
pop ebp
pop edi
pop esi
ret 8
GetFilePath ENDP
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
END
[attachment deleted by admin]
Sorry, but I don't understand. You gave too many comments I am lost.
I don't understand the meaning of '0' and '1' into your table.
Hi Grincheux,
You are right, I should probably explain how the algo works. OK, let me try to explain. Our procedure is supposed to copy the file path to the destination buffer and determine the last \ character to mark the end of the path pointing the file :
mov cl,BYTE PTR [esi+edx]
mov BYTE PTR [edi+edx],cl
add edx,1
It's clear that the characters are copied to the destination buffer pointed by edi. The most crucial part of the algo is to filter all the characters except the path terminator slash \ and the NULL ASCII string terminator. We create a lookup table to determine the candidates to be filtered by the algo :
cmp BYTE PTR [ebp+ecx],0
ebp points our lookup table and ecx holds the current character fetched from the source pointed by esi+edx The cmp instruction compares the element stored in ebp having the index ecx against zero. ecx is an ASCII value pointed by esi+edx so if this member ( BYTE PTR [ebp+ecx] ) is marked as NULL in the array then the ZERO flag is set to FALSE and then this ASCII character is skipped and the following jump instruction redirects the execution to the nearest anonymous label @@ Let's have a look at the lookup table, there are two exceptions : The first is the NULL terminator character ( ASCII 0 ) and the other is the slash character \ ( ASCII 92 ) Both of them are marked as 1 If ecx holds one of those values then the element located by ebp+ecx should match a TRUE value ( = 1 ) This means that the ZERO flag is going to be set to TRUE and we determined that ecx is equal to 0 or 92. The rest is easy : quit the loop if cl is NULL or go back if cl == 92
You may me ask why I used the lookup table method. It simplifies the design of conditional jumps in the loops and eliminates the forward jump in my previous version of my algo.
Did you compare the number of bytes and cycles from the first algo and the new one ?
If all the programs were written with keeping in mind the need of speed with very good algos did we need to often change our cpus ? Did we need to have very big memories quantities on our PCs ? This is not specially for Windows ?
Thanks
The first version was taking about 200 cycles on my P3 to process a 17-character string, including the call overhead, but I did not try to compare this to other similar string functions, or to the second version.
Grincheux,
Compared to the previous one, this new version runs faster. Testing a string sized 38 bytes :
The old version : 321 cycles
The new version : 280 cyles
This example code was a modest attempt to optimize the algo. You don't need to renew periodically your hardware or to buy expensive components. All what you need is to use the force of the assembly language where it's required.
using: (46 bytes string)
timer_begin 10000000, REALTIME_PRIORITY_CLASS
invoke(call) GetFilePath....
timer_end
I got:
1445 - Vortex_version_1 :(
1349 - Vortex_version_2 ::)
766 - Lingo's :wink
Using: counter_begin/end instead I got
275
273
151
Accordingly ...
QuoteThis example code was a modest attempt to optimize the algo. You don't need to renew periodically your hardware or to buy expensive components. All what you need is to use the force of the assembly language where it's required.
Excatly what I am thinking. :boohoo:
Not any windows function called. It can be used by the pinguins...
When do you create an "OPTIMIZED LIBRARY" including this function ?
Writing such functions could be a challenge :dance: ?
OK I cannot let win anyone of you... :U
Check out this kick-ass algo:
One disadvantage in my algo: destination size should be as large as source
GetFilePathR proc sour:DWORD, dest:DWORD
mov ecx, sour
mov edx, dest
@@:
mov al, BYTE PTR [ecx]
cmp al, 0
jz @F
mov BYTE PTR [edx], al
inc ecx
inc edx
jmp @B
@@:
cmp al, '\'
jz @F
dec edx
mov al, BYTE PTR [edx]
jmp @B
@@:
mov BYTE PTR [edx+1], 0
ret
GetFilePathR endp
Tested speed:
with timer_begin:
1469 - Vortex_v1
1431 - Vortex_v2
764 - Lingo's
652 - mine :8)
with counter_begin:
279 - Vortex_v1
275 - Vortex_v2
152 - Lingo's
130 - mine :8)
There could be a problem if using this function on a network path. \\Server\Dir1\SubDri\File.txt
The second '\' is not detected because the corresponding byte of the table is equal to '0'. Don't need to modify the function, just indicate that it cannot be used for network path.
Am I right ? :dazzled:
RamGuru, you don not preserver ESI and EDI.
fixed :P
Ramguru,
Compliments, nice idea but you must to be carefull with long file names giving different timing results. :wink
Quoted:\source\GetFilePath.asm
Grincheux,
Writing optimized algos is a nice idea but this task should not cross the aim of this forum. Regarding optimized algos, your ideas will be welcomed.
You're right vortex, further tests :U
Order: mine, lingo's, vortex_v2, vortex_v1
14 bytes - 48 39 89 129
28 bytes - 96 98 155 217
46 bytes - 129 151 273 262
83 bytes - 255 322 505 481
I've just modify RamGuru code and I got :
eax = 100, cycles
----------------------------------------
eax = 610, ms
----------------------------------------
GrincheuxGetFilePath proc sour:DWORD, dest:DWORD
push edi
push esi
mov esi, sour
mov edi, dest
mov edx,1
mov ah,'\'
@@:
mov al, BYTE PTR [esi]
test al,al
jz @F
mov BYTE PTR [edi], al
add esi,edx
add edi,edx
jmp @B
@@:
cmp al,ah
jz @F
sub edi,edx
mov al, BYTE PTR [edi]
jmp @B
@@:
mov BYTE PTR [edi+1], 0
pop edi
pop esi
ret
GrincheuxGetFilePath endp
damn I wrote kick-ass algo to kick someone's ass with it, not my own... :eek
Ramguru,
Believe me, it can be sometimes difficult to find the last step of those optimizations. :wink The collaboration of you and Grincheux gave a nice idea regarding code design. Thanks Grincheux and Ramguru.
Thanks Vortex.
Here is my last test from RamGuru :
eax = 98, cycles
file = C:\WINDOWS\system32\drivers\etc\hosts
path = C:\WINDOWS\system32\drivers\etc\
----------------------------------------
eax = 587, ms
----------------------------------------
I stop now, I have work to do...
Last try :
QuoteGrincheuxGetFilePath proc sour:DWORD, dest:DWORD
push edi
push esi
mov eax,'\\\\'
mov esi,sour
mov edi,dest
mov al,Byte Ptr [esi]
mov edx,1
@@ :
test al,al
jz @NextLoop
add esi,edx
mov Byte Ptr [edi],al
add edi,edx
mov al,Byte Ptr [esi]
jmp @B
@@ :
cmp al,ah
jz @EndOfJob
@NextLoop :
sub edi,edx
nop
mov al,Byte Ptr [edi]
jmp @B
@EndOfJob :
mov Byte Ptr [edi],0
@Exit :
pop esi
pop edi
ret
GrincheuxGetFilePath endp
file = C:\WINDOWS\system32\drivers\etc\hosts
path = C:\WINDOWS\system32\drivers\etc
----------------------------------------
eax = 95, cycles
eax = 580, ms
----------------------------------------
I really stop now.
Good Night
I got some really strange results using Agner Fog timer routines (TSCWin.zip):
Used 50 bytes path
; put your test code here. Example:
REPT 100 ; repeat code 100 times
invoke GetFilePath...
ENDM
30492 Vortex_v1
44604 Vortex_v2
17904 ramguru
18192 Grincheux_v1
18384 Grincheux_v2
1020 Lingo ....sorry I mixed push parameters 21480
Now who's timer routine is more trustful?! Michael's | Agner's
Here is what I use for measuring cycles :
Quotecounter_begin 10000000, REALTIME_PRIORITY_CLASS
invoke GrincheuxGetFilePath,ADDR file,ADDR path
counter_end
push eax
PrintString file
PrintString path
PrintLine
pop eax
PrintDec eax," cycles"
timer_begin 10000000, REALTIME_PRIORITY_CLASS
invoke GrincheuxGetFilePath,ADDR file,ADDR path
timer_end
PrintDec eax," ms"
PrintLine