The algo I had been working on works fine but there is a problem with almost all implimentations I have seen, they load the entire file into memory and this limits the size that can be loaded to physical memory limits. The task I have to address is a SHA1 algo that can handle the source file chopped into sections, process the data in each section until the end of the file is reached and produce the correct SHA1 hash at the end of it.
The application it will be used in needs to be able to handle much larger files than fit into memory. What I was hoping was that someone may have seen some source code that does this, I have a couple of C implimentations that work but none that deal with tiling the source into pieces.
Quote from: hutch-- on January 12, 2011, 05:26:19 AM
The algo I had been working on works fine but there is a problem with almost all implimentations I have seen, they load the entire file into memory and this limits the size that can be loaded to physical memory limits. The task I have to address is a SHA1 algo that can handle the source file chopped into sections, process the data in each section until the end of the file is reached and produce the correct SHA1 hash at the end of it.
Wow. I thought handling sections of files was typical. So most of what you see involves file mapping? Performance wise, I tried it and found no difference compared to caching the file as it is read. But for my testing, I found a buffer size of 64KB to be most efficient, so that's what I tend to use for most file reading operations.
The source & timings I've been using in posting to your other thread involves processing the file in 64KB sections. In fact, the procedure calls I've been using (both CAPI and my source) allow the passing of a buffer processing size, which makes either putting the whole file into memory or reading it in sections a possibility.
The Delphi I posted in the other thread illustrates reading the file in sections and processing it. Or if I'm misunderstanding, could you please clarify what you are talking about? Thanks.
The Microsoft CryptoAPI functions are designed to operate on discontinuous data streams, and judging from his SHA-1 hash so are the procedures in Drizz's CryptoHash library.
;==============================================================================
include \masm32\include\masm32rt.inc
include \masm32\include\advapi32.inc
includelib \masm32\lib\advapi32.lib
;--------------------------------------------------------------------
; This from Drizz's CryptoHash Library in the download section here:
; http://www.drizz.eu.pn/
;--------------------------------------------------------------------
include sha1.asm
;==============================================================================
;--------------------------------------------
; These derived from WinCrypt.h in the PSDK:
;--------------------------------------------
ALG_CLASS_HASH equ 4 SHL 13
ALG_TYPE_ANY equ 0
ALG_SID_SHA1 equ 4
CALG_SHA1 equ ALG_CLASS_HASH or ALG_TYPE_ANY or ALG_SID_SHA1
;==============================================================================
.data
hProv dd 0
hHash dd 0
hashBuff db 20 dup(0)
dwDataLen dd SIZEOF hashBuff
;-------------------------------------------------------
; Sample messages and digests from FIPS PUB 180-1 here:
; http://www.itl.nist.gov/fipspubs/fip180-1.htm
;-------------------------------------------------------
szSample1 db "abc",0
szDigest1 db "A9993E364706816ABA3E25717850C26C9CD0D89D",0
szSample2 db "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",0
szDigest2 db "84983E441C3BD26EBAAE4AA1F95129E5E54670F1",0
szSample3 db "1000000 * 'a'",0
szDigest3 db "34AA973CD4C4DAA4F61EEB2BDBAD27316534016F",0
.code
;==============================================================================
start:
;==============================================================================
mov esi, alloc( 1000000 )
mov eax, "aaaa"
invoke memfill, esi, 1000000, eax
lea edi, hashBuff
invoke CryptAcquireContext, ADDR hProv,
NULL,
NULL,
PROV_RSA_FULL,
CRYPT_VERIFYCONTEXT
print ADDR szSample1, 13,10
print ADDR szDigest1,13,10
invoke CryptCreateHash, hProv, CALG_SHA1, 0, 0, ADDR hHash
invoke CryptHashData, hHash, ADDR szSample1, SIZEOF szSample1 - 1, 0
invoke CryptGetHashParam, hHash, HP_HASHVAL, edi, ADDR dwDataLen, 0
invoke CryptDestroyHash, hHash
xor ebx, ebx
.WHILE ebx < 20
movzx edx, BYTE PTR [edi+ebx]
invoke crt_printf, cfm$("%02X"), edx
inc ebx
.ENDW
print chr$(13,10)
print ADDR szSample2, 13,10
print ADDR szDigest2,13,10
invoke CryptCreateHash, hProv, CALG_SHA1, 0, 0, ADDR hHash
invoke CryptHashData, hHash, ADDR szSample2, SIZEOF szSample2 - 1, 0
invoke CryptGetHashParam, hHash, HP_HASHVAL, edi, ADDR dwDataLen, 0
invoke CryptDestroyHash, hHash
xor ebx, ebx
.WHILE ebx < 20
movzx edx, BYTE PTR [edi+ebx]
invoke crt_printf, cfm$("%02X"), edx
inc ebx
.ENDW
print chr$(13,10)
print ADDR szSample3, 13,10
print ADDR szDigest3,13,10
invoke CryptCreateHash, hProv, CALG_SHA1, 0, 0, ADDR hHash
invoke CryptHashData, hHash, esi, 1000000, 0
invoke CryptGetHashParam, hHash, HP_HASHVAL, edi, ADDR dwDataLen, 0
invoke CryptDestroyHash, hHash
xor ebx, ebx
.WHILE ebx < 20
movzx edx, BYTE PTR [edi+ebx]
invoke crt_printf, cfm$("%02X"), edx
inc ebx
.ENDW
print chr$(13,10)
print ADDR szSample3, 13,10
print ADDR szDigest3,13,10
invoke CryptCreateHash, hProv, CALG_SHA1, 0, 0, ADDR hHash
mov ebx, esi
REPEAT 10
invoke CryptHashData, hHash, ebx, 100000, 0
add ebx, 100000
ENDM
invoke CryptGetHashParam, hHash, HP_HASHVAL, edi, ADDR dwDataLen, 0
invoke CryptDestroyHash, hHash
xor ebx, ebx
.WHILE ebx < 20
movzx edx, BYTE PTR [edi+ebx]
invoke crt_printf, cfm$("%02X"), edx
inc ebx
.ENDW
print chr$(13,10)
invoke CryptReleaseContext, hProv, 0
print chr$(13,10,13,10)
print ADDR szSample1,13,10
print ADDR szDigest1,13,10
invoke SHA1Init
invoke SHA1Update, ADDR szSample1, SIZEOF szSample1-1
invoke SHA1Final
xor ebx, ebx
.WHILE ebx < 20
movzx edx, BYTE PTR [SHA1Digest+ebx]
invoke crt_printf, cfm$("%02X"), edx
inc ebx
.ENDW
print chr$(13,10)
print ADDR szSample2, 13,10
print ADDR szDigest2,13,10
invoke SHA1Init
invoke SHA1Update, ADDR szSample2, SIZEOF szSample2-1
invoke SHA1Final
xor ebx, ebx
.WHILE ebx < 20
movzx edx, BYTE PTR [SHA1Digest+ebx]
invoke crt_printf, cfm$("%02X"), edx
inc ebx
.ENDW
print chr$(13,10)
print ADDR szSample3, 13,10
print ADDR szDigest3,13,10
invoke SHA1Init
invoke SHA1Update, esi, 1000000
invoke SHA1Final
xor ebx, ebx
.WHILE ebx < 20
movzx edx, BYTE PTR [SHA1Digest+ebx]
invoke crt_printf, cfm$("%02X"), edx
inc ebx
.ENDW
print chr$(13,10)
print ADDR szSample3, 13,10
print ADDR szDigest3,13,10
invoke SHA1Init
mov ebx, esi
REPEAT 10
invoke SHA1Update, ebx, 100000
add ebx, 100000
ENDM
invoke SHA1Final
xor ebx, ebx
.WHILE ebx < 20
movzx edx, BYTE PTR [SHA1Digest+ebx]
invoke crt_printf, cfm$("%02X"), edx
inc ebx
.ENDW
print chr$(13,10)
print chr$(13,10,13,10)
inkey "Press any key to exit..."
exit
;==============================================================================
end start
In my limited tests with a 10 MB buffer hashed in a single pass, and running on a P3, the CryptoAPI functions were slightly faster than Drizz's code.
Always thought SHA was agnostic to size, and could be done blockwise
sha_init(&sha);
while(FileRemains)
sha_update(&sha, Buffer, Length);
sha_final(digest, &sha);
ideally in a multiple of 64 bytes.
Quote from: clive on January 12, 2011, 08:36:56 PM
Always thought SHA was agnostic to size, and could be done blockwise
It is, you just got be sure you process the stream 64 bytes at a time. As long as you keep properly presenting 64 bytes to the calc routine, SHA-1 is fine. I guess there are a lot that either think file mapping is more efficient, or can't manage a correct algo to present the 64 byte blocks to calc?
Michael,
Thanks for decyphering the crypto API code. I have ported it to basic and it works reliably in the simple context of passing a single memory block to it. I confess to being near illiterate with crypto APIs so I have yet to digest how to stream the data. The folks who will be using this code will be doing checksums on very large files, > 1 gig so streaming the data in blocks is a necessity. The following code has been cleaned up a bit from the 1st version I posted.
' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
MACRO ALG_CLASS_HASH = 32768 ''' 4 SHL 13
MACRO ALG_TYPE_ANY = 0
MACRO ALG_SID_SHA1 = 4
MACRO CALG_SHA1 = ALG_CLASS_HASH or ALG_TYPE_ANY or ALG_SID_SHA1
MACRO PROV_RSA_FULL = 1
MACRO CRYPT_VERIFYCONTEXT = &HF0000000
MACRO HP_HASHVAL = 2
' ------------------------------------------
' these API DECLAREs are slightly renamed so
' they don't clash with future INCLUDE files
' ------------------------------------------
DECLARE FUNCTION Crypt_Acquire_Context LIB "ADVAPI32.DLL" ALIAS "CryptAcquireContextA"( _
ByVal phProv as DWORD,ByVal pszContainer as DWORD, _
ByVal pszProvider as DWORD,ByVal dwProvType as DWORD, _
ByVal dwFlags as DWORD) as DWORD
DECLARE FUNCTION Crypt_Create_Hash LIB "ADVAPI32.DLL" ALIAS "CryptCreateHash"( _
ByVal hProv as DWORD,ByVal Algid as DWORD, _
ByVal hKey as DWORD,ByVal dwFlags as DWORD, _
ByVal phHash as DWORD) as DWORD
DECLARE FUNCTION Crypt_Hash_Data LIB "ADVAPI32.DLL" ALIAS "CryptHashData"( _
ByVal hHash as DWORD,ByVal pbData as DWORD, _
ByVal dwDataLen as DWORD,ByVal dwFlags as DWORD) as DWORD
DECLARE FUNCTION Crypt_Get_Hash_Param LIB "ADVAPI32.DLL" ALIAS "CryptGetHashParam"( _
ByVal hHash as DWORD,ByVal dwParam as DWORD, _
ByVal pbData as DWORD,ByVal pdwDataLen as DWORD, _
ByVal dwFlags as DWORD) as DWORD
DECLARE FUNCTION Crypt_Destroy_Hash LIB "ADVAPI32.DLL" ALIAS "CryptDestroyHash"( _
ByVal hHash as DWORD) as DWORD
' ÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-÷-
FUNCTION API_sha1(src$) as STRING
#REGISTER NONE
LOCAL psrc as DWORD
LOCAL phsh as DWORD
LOCAL hProv as DWORD
LOCAL hHash as DWORD
LOCAL dwDataLen as DWORD
LOCAL phx as DWORD
psrc = StrPtr(src$) ' source address
hashbuff$ = space$(20) ' 20 byte working buffer
phsh = StrPtr(hashbuff$) ' working buffer pointer
hxout$ = space$(40) ' allocate 40 bytes for the output string
phx = StrPtr(hxout$) ' get its address
dwDataLen = 20 ' set data length to length of hashbuff$
Crypt_Acquire_Context VarPtr(hProv),%NULL,%NULL,PROV_RSA_FULL,CRYPT_VERIFYCONTEXT
Crypt_Create_Hash hProv, CALG_SHA1, 0, 0, VarPtr(hHash)
Crypt_Hash_Data hHash,psrc,len(src$),0
Crypt_Get_Hash_Param hHash, HP_HASHVAL, phsh, VarPtr(dwDataLen), 0
Crypt_Destroy_Hash hHash
' --------------------------------------
' convert bytes in phsh to hex in hxout$
' --------------------------------------
! xor ebx, ebx ' zero the counter
! mov esi, phsh ' load the address of bytes in phsh
! mov edi, phx ' load the output buffer address
oloop:
! movzx eax, BYTE PTR [esi+ebx]
! movzx ecx, WORD PTR hextbl[eax*2]
! mov [edi], cx
! add edi, 2
! add ebx, 1
! cmp ebx, 20
! jle oloop
FUNCTION = hxout$
Exit FUNCTION
#align 4
hextbl:
! dw "00","10","20","30","40","50","60","70","80","90","a0","b0","c0","d0","e0","f0"
! dw "01","11","21","31","41","51","61","71","81","91","a1","b1","c1","d1","e1","f1"
! dw "02","12","22","32","42","52","62","72","82","92","a2","b2","c2","d2","e2","f2"
! dw "03","13","23","33","43","53","63","73","83","93","a3","b3","c3","d3","e3","f3"
! dw "04","14","24","34","44","54","64","74","84","94","a4","b4","c4","d4","e4","f4"
! dw "05","15","25","35","45","55","65","75","85","95","a5","b5","c5","d5","e5","f5"
! dw "06","16","26","36","46","56","66","76","86","96","a6","b6","c6","d6","e6","f6"
! dw "07","17","27","37","47","57","67","77","87","97","a7","b7","c7","d7","e7","f7"
! dw "08","18","28","38","48","58","68","78","88","98","a8","b8","c8","d8","e8","f8"
! dw "09","19","29","39","49","59","69","79","89","99","a9","b9","c9","d9","e9","f9"
! dw "0a","1a","2a","3a","4a","5a","6a","7a","8a","9a","aa","ba","ca","da","ea","fa"
! dw "0b","1b","2b","3b","4b","5b","6b","7b","8b","9b","ab","bb","cb","db","eb","fb"
! dw "0c","1c","2c","3c","4c","5c","6c","7c","8c","9c","ac","bc","cc","dc","ec","fc"
! dw "0d","1d","2d","3d","4d","5d","6d","7d","8d","9d","ad","bd","cd","dd","ed","fd"
! dw "0e","1e","2e","3e","4e","5e","6e","7e","8e","9e","ae","be","ce","de","ee","fe"
! dw "0f","1f","2f","3f","4f","5f","6f","7f","8f","9f","af","bf","cf","df","ef","ff"
End FUNCTION
' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
The Delphi I used in the CAPI program mentioned in the other thread.
infile := CreateFile(PChar(OpenDialog1.FileName), GENERIC_READ,
FILE_SHARE_READ, nil, OPEN_EXISTING,
FILE_FLAG_SEQUENTIAL_SCAN, 0);
CryptAcquireContext(hProv, nil, nil, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT);
CryptCreateHash(hProv, CALG_SHA1, nil, 0, hHash);
ReadFile(infile, inbuffer, sizeof(inbuffer), amount_read, nil);
repeat
CryptHashData(hHash, @inbuffer, amount_read, 0);
ReadFile(infile, inbuffer, sizeof(inbuffer), amount_read, nil);
until amount_read = 0;
cbHashDataLen := 20;
CryptGetHashParam(hHash, HP_HASHVAL, @outHash, cbHashDataLen, 0);
CryptDestroyHash(hhash);
CryptReleaseContext(hprov, 0) ;
CloseHandle(infile);
The output value is in outHash. Run your ByteToHex against that to get your SHA1
Quotehttp://www.drizz.eu.pn/
drizz lives on Pitcairn ?
:bg
LATER: After slightly diluting my ignorance of Microsoft crypto API functions, Michael's code handles the streaming of blocks of data so all is well and it is even reasonably fast.
Gratsie again ! :U
Here is a test piece that uses the technique that Michael developed. Hope your PB is up to scratch :bg
I have also included the Microsoft fciv.exe and an old Unix port for comparison testing. The Microsoft version appears to be the fastest, this one is not far behind and the old Unix port is the slowest.
we don't allow powerbasic in this forum :lol
Quote from: hutch-- on January 14, 2011, 12:27:42 AM
I have also included the Microsoft fciv.exe and an old Unix port for comparison testing. The Microsoft version appears to be the fastest, this one is not far behind and the old Unix port is the slowest.
Actually, I find if you don't turn off the OS file caching (and you don't in your code), that 64KB is almost always best for a buffer size. You might consider adjusting/testing your code in this way, especially since using such a big buffer causes a performance hit in such an environment. That performance hit isn't as large on faster computers/drives, but it's still there.
Straight I/O test varying buffer sizes, file cache ON.
Processing against: D:\TEST2
1024 bytes 1062
2048 bytes 703
4096 bytes 531
8192 bytes 454
16384 bytes 421
32768 bytes 391
65536 bytes 375
131072 bytes 375
262144 bytes 359
524288 bytes 375
1048576 bytes 985
2097152 bytes 1015
4194304 bytes 1016
8388608 bytes 1031
16777216 bytes 1016
33554432 bytes 1031
67108864 bytes 1047
Glen,
The rough test I used was a 1 gig VOB file and the buffer size did not effect the timing much at all. I played with the CreateFile() flags but saw no timing change from any option. I would have tried the CreateFile() flag FILE_FLAG_NO_BUFFERING but I cannot garrantee the data alignment which is 4 byte aligned.
One it has been run it then lives in the OS file cache and you are effectively only timing the crypto routine.
The main factor at the moment is that its reliable and easy to use.
Glen,
I did some tests with your suggestion and the results have been very good. The smaller buffer, 16, 32 and 64k run about 25% faster so it appears to have something to do with the way the crypto algo works. Gratsie !
Quote from: hutch-- on January 16, 2011, 02:14:54 AM
Glen,
I did some tests with your suggestion and the results have been very good. The smaller buffer, 16, 32 and 64k run about 25% faster so it appears to have something to do with the way the crypto algo works. Gratsie !
More to do with the ReadFile API and how it is written, actually. Note in the test results above that read I/O was being benchmarked and not the CAPI routines. The delay varies based on CPU (I guess something in the file caching algorithm since I would presume in reading the file that it would already be cached in memory?) and drive speed/ability to handle reads. Your code actually was about 3-4x slower than FCIV in certain scenarios I ran it through when I was testing it, and I was able to trace it to the buffer size by duplicating the slower speed in my own CAPI program by changing the buffer size to 16MB.
Glad you were able to try it and find the suggestion helpful.
Edit: Actually looking at those test results, I'm curious as to how high you can push the buffer size without impacting performance. For example, it seems between the 512K and 1024K line is where the performance hit starts to take hold. Might have to test it sometime soon. Edit2: I tested another couple of drives and one of the lines was between 32-64K, and another one was slightly greater than that. So drive performance is playing into this somewhere, too, and not just the file caching algorithm in the OS. I'm wondering now how you determine what would be fully optimal besides testing/benchmarking like I've been doing.
i looked at the times over a range and while it started to get slower with buffer sizes under 16k, it did not start to slow on the other end until it was much larger so I took the small end (16 to 64k) as larger did not make it faster. I agree that hardware has much to do with the read speed, i am testing on WD 1tb green disks which burst at about 180 meg/sec and sustain at about 100 meg/sec. I guess it still has some to do with how both CreateFile() and ReadFile() interact with the way the crypto algos were written as in most instances where I have to work from disk, much larger buffers are faster and 32 meg is no big deal these days.
The disk IO is still a lot slower than processor and even though the crypto algos do a lot of work there will still be some lag time and I think it is here that the very large buffers impact on speed. It may be worth splitting the technique into different threads so that you some asynchronous overlap between disk IO and processor occurs.