News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

SHA-256

Started by Petroizki, March 20, 2006, 07:16:20 PM

Previous topic - Next topic

Petroizki

This is a SHA-256 hash algo. It has no unrolling to the infinite, or other complex macros, just the basic assembly. The clock count is about 1400 per block (64 bytes) on my machine.

info: http://en.wikipedia.org/wiki/SHA-256
dl: http://personal.inet.fi/atk/partsu/SHA256.zip

Program output:
QuoteMESSAGE: abc
HASH: BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD
CYCLES: 1453

MESSAGE: abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq
HASH: 248D6A61D20638B8E5C026930C3E6039A33CE45964FF2167F6ECEDD419DB06C1
CYCLES: 2882

MESSAGE: 1000000 times the letter 'a'
HASH: CDC76E5C9914FB9281A1C7E284D73E67F1809A48A497200E046D39CCC7112CD0
CYCLES: 23819820

Mark_Larson

 nice job Petroizki!  I walked through the code with my eyeballs and spotted some possible speed ups if you are still wanting to optimize further?

1) change loops that use INC/ADD to DEC/SUB to remove a CMP instruction from the loop.
here's some of the code where you do an INC/ADD to increment a register.  The trick is to fixup ECX by doing a NEG first.


; copy source 64 bytes to the workmem
mov esi, [$lpSource]
xor ecx, ecx
@@: mov eax, dword ptr [esi + ecx*4]
mov ebx, dword ptr [esi + ecx*4 + 4]
mov edx, dword ptr [esi + ecx*4 + 8]
mov edi, dword ptr [esi + ecx*4 + 12]
bswap eax
bswap ebx
bswap edx
bswap edi
mov dword ptr [$W + ecx*4], eax
mov dword ptr [$W + ecx*4 + 4], ebx
mov dword ptr [$W + ecx*4 + 8], edx
mov dword ptr [$W + ecx*4 + 12], edi
add ecx, 4
cmp ecx, 16
jne @B


Hutch has some sample code doing this from one of his routines in m32lib called addarr ( in addarr.asm).  Just need to apply it to your code wherever you do this.  The trick is in the "neg".  He also adds ECX to itself 4 times, which ends up being an ECX*4.  You also use ECX*4 in your code.


arr_add proc arr:DWORD,cnt:DWORD,plus:DWORD

    mov edx, plus
    mov eax, arr
    mov ecx, cnt
    add ecx, ecx
    add ecx, ecx
    add eax, ecx
    neg ecx
    jmp @F

  align 16
  @@:
    add [eax+ecx], edx
    add ecx, 4
    jnz @B




2) this is a small really loop. It only gets executed 4 times.  You might want to try unrolling it.


; copy source 64 bytes to the workmem
mov esi, [$lpSource]
xor ecx, ecx
@@: mov eax, dword ptr [esi + ecx*4]
mov ebx, dword ptr [esi + ecx*4 + 4]
mov edx, dword ptr [esi + ecx*4 + 8]
mov edi, dword ptr [esi + ecx*4 + 12]
bswap eax
bswap ebx
bswap edx
bswap edi
mov dword ptr [$W + ecx*4], eax
mov dword ptr [$W + ecx*4 + 4], ebx
mov dword ptr [$W + ecx*4 + 8], edx
mov dword ptr [$W + ecx*4 + 12], edi
add ecx, 4
cmp ecx, 16
jne @B




3) code alignment.  Try aligning your loops on 16 byte boundaries.  The way I usually do it, align one loop, time the code, and see if it's faster.  If it isn't, remove the align. And then try the next loop.

BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

Petroizki

Thanks Mark,

I have made changes, as you suggested. I also added my proctimers to the example. New upload on link at top post.

Speed improvements after changes: :8)
Quotev.0.01
1st MESSAGE: 1519 cycles
2nd MESSAGE: 3007 cycles
3rd MESSAGE: 24920978 cycles

v.0.02
1st MESSAGE: 1453 cycles
2nd MESSAGE: 2882 cycles
3rd MESSAGE: 23700356 cycles

Mark Jones

Hello Petroizki, great work!

Quote from: AMD XP 2500+ / XP SP2
MESSAGE: abc
HASH: BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD
CYCLES: 1812

MESSAGE: abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq
HASH: 248D6A61D20638B8E5C026930C3E6039A33CE45964FF2167F6ECEDD419DB06C1
CYCLES: 3594

MESSAGE: 1000000 times the letter 'a'
HASH: CDC76E5C9914FB9281A1C7E284D73E67F1809A48A497200E046D39CCC7112CD0
CYCLES: 27817112
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

Ghirai

Very nice :)

Results fom my notebook (Mobile AMD Sempron 3000+):

QuoteMESSAGE: abc
HASH: BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD
CYCLES: 1453

MESSAGE: abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq
HASH: 248D6A61D20638B8E5C026930C3E6039A33CE45964FF2167F6ECEDD419DB06C1
CYCLES: 2895

MESSAGE: 1000000 times the letter 'a'
HASH: CDC76E5C9914FB9281A1C7E284D73E67F1809A48A497200E046D39CCC7112CD0
CYCLES: 21830745
MASM32 Project/RadASM mirror - http://ghirai.com/hutch/mmi.html

PBrennick

Very nice;)
You guys rock!

Anyway AMD 1000 w/XPHE+SP2
Quote
MESSAGE: abc
HASH: BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD
CYCLES: 1812

MESSAGE: abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq
HASH: 248D6A61D20638B8E5C026930C3E6039A33CE45964FF2167F6ECEDD419DB06C1
CYCLES: 3594

MESSAGE: 1000000 times the letter 'a'
HASH: CDC76E5C9914FB9281A1C7E284D73E67F1809A48A497200E046D39CCC7112CD0
CYCLES: 27179203

Don't laugh at my old machine!
Paul
The GeneSys Project is available from:
The Repository or My crappy website

Mark_Larson

  I ran the code on my work computer.  It is a 3.2 GHz Nocona.  I recently converted to AMD for my home system ( got an X2). 


MESSAGE: abc
HASH: BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD
CYCLES: 2390

MESSAGE: abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq
HASH: 248D6A61D20638B8E5C026930C3E6039A33CE45964FF2167F6ECEDD419DB06C1
CYCLES: 4456

MESSAGE: 1000000 times the letter 'a'
HASH: CDC76E5C9914FB9281A1C7E284D73E67F1809A48A497200E046D39CCC7112CD0
CYCLES: 35971865
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

Mark Jones

Wow, it seems like the bigger AMD's are getting slower!
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

Mark_Larson

Quote from: Mark Jones on March 29, 2006, 06:54:58 PM
Wow, it seems like the bigger AMD's are getting slower!

The timings are from my work computer which is an Intel.

BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm