I'm PB Coder, some day i write function
CRC calculation, by adopting Export Function from ntdll.dll (RtlComputeCrc32)
The calculation is really fast,
but i need other byte (data) calculation code,
any one can help me?
I need calculation faster than CRC, written ini PowerBasci with ASM inline. :bg :bg
welcome to the forum, hrxxx :U
sounds to me like you have been bitten by the ASM bug :bg
pretty soon, you will forget all about PB
Quote from: dedndave on February 20, 2010, 01:11:57 AM
welcome to the forum, hrxxx :U
sounds to me like you have been bitten by the ASM bug :bg
pretty soon, you will forget all about PB
Yeah :bg
I choose PB beacause i can fix basic with ASM, really nice, fast anda power :cheekygreen: :cheekygreen:
You could try an inline ASM implementation of Adler-32. A good one to start with would be the version included with Zlib.
Quote from: MichaelW on February 20, 2010, 05:28:07 AM
You could try an inline ASM implementation of Adler-32. A good one to start with would be the version included with Zlib.
Yeah, i must download zlib 1st. I hope Adler-32 faster that CRC........ :U
Now that I test, on a P3, the Zlib crc-32 (4-table) version is 2 cycles faster than the adler-32 version (157 cycles versus 159 cycles, for the string "my other brother darryl"). I started out trying to compare the adler-32 cycle count to that for RtlComputeCrc32, but the function is not included for Windows 2000 and my XP system is down.
Quote from: MichaelW on February 20, 2010, 06:38:15 AM
Now that I test, on a P3, the Zlib crc-32 (4-table) version is 2 cycles faster than the adler-32 version (157 cycles versus 159 cycles, for the string "my other brother darryl"). I started out trying to compare the adler-32 cycle count to that for RtlComputeCrc32, but the function is not included for Windows 2000 and my XP system is down.
How to perform test?
u call function from DLL, or u create ur own code in your lang.
This code is adopted from ntdll.dll XP, and write in PowerBasic with inline ASM
FUNCTION SysRtlComputeCrc32(BYVAL dwInitialize AS LONG, BYVAL pBufferData AS DWORD, BYVAL pBufferDataLen AS DWORD) AS LONG
'---adopsi fungsi RtlComputeCrc32 punya xp_ntdll.dll :) *---buanter banget,ngebutzabiz,wuzzwuzzwuzz...
!mov eax, dwInitialize
!xor ecx, ecx ;---reset:0.
!cmp pBufferDataLen, ecx
!not eax
!jbe local_7C961521
local_7C961502:
!mov edx, pBufferData
!movzx edx, Byte[ecx+edx]
!xor edx, eax
!and edx, &H000000FF
!shr eax, &H08
!xor eax, ASM_CRC32_TABLE_CODE[&H04*edx]
!inc ecx
!cmp ecx, pBufferDataLen
!jb local_7C961502
local_7C961521:
!not eax
!mov function, eax
EXIT FUNCTION
ASM_CRC32_TABLE_CODE:
!dd &H00000000 ;---ini yang pertama (crc32_table[0]).
!dd &H77073096, &HEE0E612C, &H990951BA, &H076DC419
!dd &H706AF48F, &HE963A535, &H9E6495A3, &H0EDB8832
!dd &H79DCB8A4, &HE0D5E91E, &H97D2D988, &H09B64C2B
!dd &H7EB17CBD, &HE7B82D07, &H90BF1D91, &H1DB71064
!dd &H6AB020F2, &HF3B97148, &H84BE41DE, &H1ADAD47D
!dd &H6DDDE4EB, &HF4D4B551, &H83D385C7, &H136C9856
!dd &H646BA8C0, &HFD62F97A, &H8A65C9EC, &H14015C4F
!dd &H63066CD9, &HFA0F3D63, &H8D080DF5, &H3B6E20C8
!dd &H4C69105E, &HD56041E4, &HA2677172, &H3C03E4D1
!dd &H4B04D447, &HD20D85FD, &HA50AB56B, &H35B5A8FA
!dd &H42B2986C, &HDBBBC9D6, &HACBCF940, &H32D86CE3
!dd &H45DF5C75, &HDCD60DCF, &HABD13D59, &H26D930AC
!dd &H51DE003A, &HC8D75180, &HBFD06116, &H21B4F4B5
!dd &H56B3C423, &HCFBA9599, &HB8BDA50F, &H2802B89E
!dd &H5F058808, &HC60CD9B2, &HB10BE924, &H2F6F7C87
!dd &H58684C11, &HC1611DAB, &HB6662D3D, &H76DC4190
!dd &H01DB7106, &H98D220BC, &HEFD5102A, &H71B18589
!dd &H06B6B51F, &H9FBFE4A5, &HE8B8D433, &H7807C9A2
!dd &H0F00F934, &H9609A88E, &HE10E9818, &H7F6A0DBB
!dd &H086D3D2D, &H91646C97, &HE6635C01, &H6B6B51F4
!dd &H1C6C6162, &H856530D8, &HF262004E, &H6C0695ED
!dd &H1B01A57B, &H8208F4C1, &HF50FC457, &H65B0D9C6
!dd &H12B7E950, &H8BBEB8EA, &HFCB9887C, &H62DD1DDF
!dd &H15DA2D49, &H8CD37CF3, &HFBD44C65, &H4DB26158
!dd &H3AB551CE, &HA3BC0074, &HD4BB30E2, &H4ADFA541
!dd &H3DD895D7, &HA4D1C46D, &HD3D6F4FB, &H4369E96A
!dd &H346ED9FC, &HAD678846, &HDA60B8D0, &H44042D73
!dd &H33031DE5, &HAA0A4C5F, &HDD0D7CC9, &H5005713C
!dd &H270241AA, &HBE0B1010, &HC90C2086, &H5768B525
!dd &H206F85B3, &HB966D409, &HCE61E49F, &H5EDEF90E
!dd &H29D9C998, &HB0D09822, &HC7D7A8B4, &H59B33D17
!dd &H2EB40D81, &HB7BD5C3B, &HC0BA6CAD, &HEDB88320
!dd &H9ABFB3B6, &H03B6E20C, &H74B1D29A, &HEAD54739
!dd &H9DD277AF, &H04DB2615, &H73DC1683, &HE3630B12
!dd &H94643B84, &H0D6D6A3E, &H7A6A5AA8, &HE40ECF0B
!dd &H9309FF9D, &H0A00AE27, &H7D079EB1, &HF00F9344
!dd &H8708A3D2, &H1E01F268, &H6906C2FE, &HF762575D
!dd &H806567CB, &H196C3671, &H6E6B06E7, &HFED41B76
!dd &H89D32BE0, &H10DA7A5A, &H67DD4ACC, &HF9B9DF6F
!dd &H8EBEEFF9, &H17B7BE43, &H60B08ED5, &HD6D6A3E8
!dd &HA1D1937E, &H38D8C2C4, &H4FDFF252, &HD1BB67F1
!dd &HA6BC5767, &H3FB506DD, &H48B2364B, &HD80D2BDA
!dd &HAF0A1B4C, &H36034AF6, &H41047A60, &HDF60EFC3
!dd &HA867DF55, &H316E8EEF, &H4669BE79, &HCB61B38C
!dd &HBC66831A, &H256FD2A0, &H5268E236, &HCC0C7795
!dd &HBB0B4703, &H220216B9, &H5505262F, &HC5BA3BBE
!dd &HB2BD0B28, &H2BB45A92, &H5CB36A04, &HC2D7FFA7
!dd &HB5D0CF31, &H2CD99E8B, &H5BDEAE1D, &H9B64C2B0
!dd &HEC63F226, &H756AA39C, &H026D930A, &H9C0906A9
!dd &HEB0E363F, &H72076785, &H05005713, &H95BF4A82
!dd &HE2B87A14, &H7BB12BAE, &H0CB61B38, &H92D28E9B
!dd &HE5D5BE0D, &H7CDCEFB7, &H0BDBDF21, &H86D3D2D4
!dd &HF1D4E242, &H68DDB3F8, &H1FDA836E, &H81BE16CD
!dd &HF6B9265B, &H6FB077E1, &H18B74777, &H88085AE6
!dd &HFF0F6A70, &H66063BCA, &H11010B5C, &H8F659EFF
!dd &HF862AE69, &H616BFFD3, &H166CCF45, &HA00AE278
!dd &HD70DD2EE, &H4E048354, &H3903B3C2, &HA7672661
!dd &HD06016F7, &H4969474D, &H3E6E77DB, &HAED16A4A
!dd &HD9D65ADC, &H40DF0B66, &H37D83BF0, &HA9BCAE53
!dd &HDEBB9EC5, &H47B2CF7F, &H30B5FFE9, &HBDBDF21C
!dd &HCABAC28A, &H53B39330, &H24B4A3A6, &HBAD03605
!dd &HCDD70693, &H54DE5729, &H23D967BF, &HB3667A2E
!dd &HC4614AB8, &H5D681B02, &H2A6F2B94, &HB40BBE37
!dd &HC30C8EA1, &H5A05DF1B, &H2D02EF8D, &HFFFFFFFF
' adopted by Pamzlogiz
END FUNCTION
Try it, it is very fast...
and compare with adler calculation write in ASM
which the faster one ? im sure is affected by efficienct code that we write
Normally adler really faster than CRC32, but i'm still adopting how zlib do it (calculate adler). May be u can help me? :green :green
we no comprende "!" :P
QuoteNormally adler really faster than CRC32
It turns out that the problem was the size of my test string. With a 324-byte string crc32 runs in 973 cycles and adler32 in 550 cycles. This is for compiler-optimized versions of the Zlib functions, using the Microsoft Visual C++ Toolkit 2003 compiler with /O2 /G6. For comparison, a 4-table procedure done in MASM ran in 887 cycles, a single-table procedure in 1895 cycles, a bit-by-bit procedure in 33900 cycles, and a MASM version of your inline assembly function in 2580 cycles. And in case you're thinking that my implementation must be wrong, it returned the correct CRC value. All tests were run on a P3. No assembly version of adler32 yet.
Edit:
I forgot to align the table for your inline assembly function, and after I did so the cycle count dropped to 2205.
I'm not a PowerBasic guy, so sorry if the syntax isn't perfect. Also not sure if ebx needs preserving, so stacked it. Here is a slightly more efficient CRC32 implementation which folds BYTEs into DWORDs. On a Pentium P4 (Prescott) using RDTSC to time cycles, we have about 262 cycles for the original code, and 157 cycles for this one, using the "my other brother darryl" string. I'd have to blow the dust off the P3 (Katmai) as it's got to be 11 year since I was hacking around with it.
Basically it will perform as many 32-bit operations as possible and then finish with 0 to 3 8-bit operations.
-Clive
!mov eax, dwInitialize
!mov ecx, pBufferDataLen
!jecxz local_500
!not eax
!push ebx
!mov edx, pBufferData
!shr ecx, 2
!jz local_200
local_100:
!xor eax, Dword[edx]
!movzx ebx, al
!shr eax, &H08
!xor eax, ASM_CRC32_TABLE_CODE[&H04*ebx]
!movzx ebx, al
!shr eax, &H08
!xor eax, ASM_CRC32_TABLE_CODE[&H04*ebx]
!add edx, 4
!movzx ebx, al
!shr eax, &H08
!xor eax, ASM_CRC32_TABLE_CODE[&H04*ebx]
!movzx ebx, al
!shr eax, &H08
!xor eax, ASM_CRC32_TABLE_CODE[&H04*ebx]
!dec ecx
!jnz local_100
local_200:
!mov ecx, pBufferDataLen
!and ecx,3
!jz local_400
local_300:
!movzx ebx, Byte[edx]
!xor eax, ebx
!movzx ebx, al
!shr eax, &H08
!inc edx
!xor eax, ASM_CRC32_TABLE_CODE[&H04*ebx]
!dec ecx
!jnz local_300
local_400:
!pop ebx
!not eax
local_500:
!mov function, eax
nice code, Clive - welcome to the forum :U
Quote from: dedndave on February 22, 2010, 01:34:17 AM
nice code, Clive - welcome to the forum
Thanks, known
hutch-- forever, kind of been off doing stuff with ARM and MIPS code.
Not sure this is even optimum, seem to think the shifts will kill you on a P4. If a 16-bit CRC was workable, an 8-bit xchg rather than a shift might favour more platforms. Personally I would recommend using CRCs for any integrity application. The size, shift direction and the polynomial are debatable. I've built hardware CRC engines that can clock entire words at a time, the logic cascades rather cleverly.
The SSE 4.2 (Nehalem) has a CRC32 instruction, but the polynomial might not suit everyone, its the one used by iSCSI (CASTAGNOLI), not the one used by PKZIP. I've seen people on Intel's site complain about not being able to set the polynomial, but their implementation is probably a single set of 32 flip flops, and a bunch of random logic that can operate very fast, not a ROM or table driven construct. For simple integrity purposes the polynomial probably isn't too important in the scale of things*, though it won't help if your software has to use a specific one. * Most common ones have been picked for specific performance characteristics, and may have known limitations/expectations/sensitivities.
I don't have a SSE 4.2 box to hand, but the opcodes should look like this.
00401B7C F20F38F00B crc32 ecx,byte ptr [ebx]
00401B81 F20F38F0CA crc32 ecx,dl
00401B86 F20F38F10B crc32 ecx,dword ptr [ebx]
00401B8B F20F38F1CA crc32 ecx,edx
00401B90 66F20F38F10B crc32 ecx,word ptr [ebx]
00401B96 66F20F38F1CA crc32 ecx,dx
-Clive
Thanks for ALL, :U :U