Does anyone have a genuinely fast SHA1 algo ?

Started by hutch--, January 01, 2011, 02:37:05 AM

hutch--

Glen,

A trailing "b" marks binary notation, so 1B is read as a binary number; the other way around, B1 is treated as a memory operand.

dedndave

i must be looking at the wrong stuff
i see "1Bh"


i don't think it matters, but he uses JLE - looks like JBE would make more sense

Glenn9999

Got done playing with this source.  Hopefully someone can see something else going on in it that can shave off some more time.  I cleaned it up and got it about 10ms faster than it was when I originally posted times for it.

It's about 300 lines of ASM, so I'm going to attach it to this post as a ZIP file.

Please let me know if it helps, and if anyone has any ideas to get it going faster.

For usage, see the Delphi code I originally posted in here - the ASM module call is intended as a direct Sha1Compress replacement.

Antariy

Quote from: Glenn9999 on January 06, 2011, 07:51:38 AM
Quote from: dedndave on January 06, 2011, 07:49:48 AM
oh - found it - page 1 of the thread
did you notice that he edited that post ?

Yes, I copied it when I made the other post in order to test it.  Tried 1B and B1 as well in the "Edited" Description.  1B compiled, B1 didn't.  Maybe he can clarify.

B1 is in hex. When you want to write a hex number in MASM and it starts with a letter rather than a digit, put a "0" in front of it, i.e. use 0B1h instead of B1. I just wrote it in a simpler manner.

OK, right code should be something like:

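@loop1:
PSHUFW MM0,QWORD PTR [EAX+EDX*8],0B1h   ; load 2 dwords, swap the 16-bit halves of each
PSHUFW MM1,QWORD PTR [EAX+EDX*8],0B1h   ; same shuffle into a second register
PSRLW MM0,8                             ; high byte of each word -> low byte
ADD EDX,1                               ; next qword index
PSLLW MM1,8                             ; low byte of each word -> high byte
CMP EDX,8                               ; 8 qwords = 64 bytes; sets the flags for JL
PADDD MM0,MM1                           ; merge; no bits overlap, so the add acts as OR
MOVQ QWORD PTR [EAX+EDX*4],MM0          ; store (EDX has already been incremented here)
JL @loop1                               ; MMX ops leave the flags from CMP untouched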


Not sure about the gain; it may not be much, since the memory bus is slow anyway.
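
For anyone who wants to check the shuffle constant without assembling anything, here is a small Python model of the four instructions used in the loop above. It is only a sketch: the helper names (pshufw, psrlw, psllw, paddd, bswap_two_dwords) are made up for illustration, each one imitates the documented lane behaviour of the instruction on a 64-bit integer, and it says nothing about speed.

```python
def words(q):
    """Split a 64-bit value into four 16-bit words, lowest word first."""
    return [(q >> (16 * i)) & 0xFFFF for i in range(4)]

def pack_words(ws):
    """Join four 16-bit words (lowest first) into one 64-bit value."""
    q = 0
    for i, w in enumerate(ws):
        q |= (w & 0xFFFF) << (16 * i)
    return q

def pshufw(src, imm8):
    """PSHUFW model: destination word i = source word (imm8 >> 2*i) & 3."""
    ws = words(src)
    return pack_words([ws[(imm8 >> (2 * i)) & 3] for i in range(4)])

def psrlw(q, n):
    """PSRLW model: logical right shift of each 16-bit word."""
    return pack_words([w >> n for w in words(q)])

def psllw(q, n):
    """PSLLW model: left shift of each 16-bit word, overflow discarded."""
    return pack_words([(w << n) & 0xFFFF for w in words(q)])

def paddd(a, b):
    """PADDD model: add the two 32-bit lanes independently, with wraparound."""
    lo = ((a & 0xFFFFFFFF) + (b & 0xFFFFFFFF)) & 0xFFFFFFFF
    hi = ((a >> 32) + (b >> 32)) & 0xFFFFFFFF
    return (hi << 32) | lo

def bswap_two_dwords(q, imm8=0xB1):
    """Same instruction sequence as the loop body, for one qword."""
    mm0 = psrlw(pshufw(q, imm8), 8)   # high bytes moved down
    mm1 = psllw(pshufw(q, imm8), 8)   # low bytes moved up
    return paddd(mm0, mm1)

# Low dword 11223344h, high dword 55667788h
sample = 0x5566778811223344
print(hex(bswap_two_dwords(sample)))        # shuffle constant 0B1h
print(hex(bswap_two_dwords(sample, 0x1B)))  # shuffle constant 1Bh
```

With 0B1h each dword comes out byte-swapped in place, which is what the big-endian SHA-1 message words need. Passing 1Bh to the same model reverses all four words, so the two dwords also trade places.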

Antariy

Quote from: dedndave on January 06, 2011, 07:56:43 AM
i must be looking at the wrong stuff
i see "1Bh"

i don't think it matters, but he uses JLE - looks like JBE would make more sense

I'm writing that code piece right over the original code, just inserting the other shuffling stuff, Dave :P

Glenn9999

I had a chance to play with this some more.  The main question that needed answering was whether what has been posted qualifies as a "genuinely fast SHA1 algo".

I looked for other common implementations and the Microsoft ones seemed to be a good start.  I tested:
1) Microsoft's FCIV command-line utility.  If you invoke it with -R it will put out approximate times in seconds.  Not desirable but good to see how close things are - whether they are similar or very far apart.
2) Microsoft's crypto API.  I wrote a program that invokes this API and mirrors the functionality of the testbed program I've been using to write/test these algorithms.

The SHA-1 implementation in FCIV seems to be very comparable to the one I posted here.  It is probably IA-32 code, and its times come out very similar in almost all instances, to the point that the two would have to be considered roughly even given run-to-run variations and such.

However, the crypto API was a different story.  On the IA-32 platform, the SHA-1 algorithm there ran slightly slower than what I posted here.  However, when moved to a platform that supports MMX & SSE, the speeds were radically different.

Athlon 2000XP CAPI SHA1 - Processed 681574400 bytes in  9813 ms.
Athlon 2000XP mine SHA1 - Processed 681574400 bytes in  9531 ms.
Core2Duo 2900 CAPI SHA1 - Processed 681574400 bytes in  2297 ms.
Core2Duo 2900 mine SHA1 - Processed 681574400 bytes in  3500 ms.

That said, since the times were similar on the IA-32 platform, what I posted is probably pretty close to the best you're going to get without MMX/SSE, but you're going to get a big return if you implement with those instructions.
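
To put the timings above on a common scale, the arithmetic can be sketched like this. 681574400 bytes is exactly 650 MiB; the labels and millisecond figures are copied from the timings above, and the function name is just for illustration.

```python
TOTAL_BYTES = 681574400  # 650 * 1024 * 1024

# Elapsed times in milliseconds, as posted
timings_ms = {
    "Athlon 2000XP CAPI SHA1": 9813,
    "Athlon 2000XP mine SHA1": 9531,
    "Core2Duo 2900 CAPI SHA1": 2297,
    "Core2Duo 2900 mine SHA1": 3500,
}

def mib_per_second(total_bytes, elapsed_ms):
    """Throughput in MiB/s for a run that took elapsed_ms milliseconds."""
    return (total_bytes / (1024 * 1024)) / (elapsed_ms / 1000.0)

for label, ms in timings_ms.items():
    print(f"{label}: {mib_per_second(TOTAL_BYTES, ms):7.2f} MiB/s")

# How much faster CAPI is than the posted code on the Core2Duo
core2_ratio = timings_ms["Core2Duo 2900 mine SHA1"] / timings_ms["Core2Duo 2900 CAPI SHA1"]
print(f"Core2Duo: CAPI is about {core2_ratio:.2f}x faster")
```

On these numbers the two implementations are within about 3% of each other on the Athlon, while CAPI is roughly 1.5 times faster on the Core2Duo.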

Hope that helps, and hope the code posted was useful.

EDIT: Added all timings.