Code for starting multiple threads.

Started by hutch--, September 16, 2009, 12:51:08 PM

hutch--

What I have tried to do here is get the three test threads started as close together as possible. The 2 step call of the thread is to ensure that the arguments passed to each thread are copied locally before the caller overwrites the stack with the next lot. The thread termination approach is to set a counter in the main structure to zero then let each thread increment it on exit. Track it with a yielding loop and when the counter matches the number of threads, the thread handles are closed.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    start_thread PROTO :DWORD
    work_thread  PROTO :DWORD

    THREAD_BLOCK STRUCT
      flag  dd ?            ; thread start flag
      tcnt  dd ?            ; thread termination counter
      arg1  dd ?
      arg2  dd ?
      arg3  dd ?
      arg4  dd ?
      arg5  dd ?
      arg6  dd ?
      arg7  dd ?
      arg8  dd ?
    THREAD_BLOCK ENDS

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL tblk      :THREAD_BLOCK

    LOCAL thread1   :DWORD
    LOCAL thread2   :DWORD
    LOCAL thread3   :DWORD

  ; -------------------------------
  ; load the THREAD_BLOCK structure
  ; -------------------------------
    mov DWORD PTR tblk.tcnt, 0      ; zero the thread counter
    mov DWORD PTR tblk.arg1, 9
    mov DWORD PTR tblk.arg2, 8
    mov DWORD PTR tblk.arg3, 7
    mov DWORD PTR tblk.arg4, 6

  ; -----------------------
  ; start the three threads
  ; -----------------------
    mov thread1, rv(start_thread,ADDR tblk)
    mov thread2, rv(start_thread,ADDR tblk)
    mov thread3, rv(start_thread,ADDR tblk)

  ; ---------------------------
  ; idle until all 3 have ended
  ; ---------------------------
  @@:
    invoke SleepEx,0,0
    cmp DWORD PTR tblk.tcnt, 3      ; check terminated thread count
    jne @B

  ; ------------------------
  ; close the thread handles
  ; ------------------------
    invoke CloseHandle,thread1
    invoke CloseHandle,thread2
    invoke CloseHandle,thread3

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

start_thread proc pstruct:DWORD

    LOCAL tID       :DWORD
    LOCAL hThread   :DWORD

    push esi
    mov esi, pstruct
    mov (THREAD_BLOCK PTR [esi]).flag, 1    ; set the flag

    mov hThread, rv(CreateThread,NULL,NULL,ADDR work_thread,pstruct,NULL,ADDR tID)

  ; -------------------------------------------
  ; up the priority to make the spinlock faster
  ; reduces the start time for each thread.
  ; -------------------------------------------
    invoke SetPriorityClass,rv(GetCurrentProcess),REALTIME_PRIORITY_CLASS
  spinlock:
    cmp (THREAD_BLOCK PTR [esi]).flag, 0    ; loop until called thread clears the flag
    jne spinlock
    invoke SetPriorityClass,rv(GetCurrentProcess),NORMAL_PRIORITY_CLASS
  ; -------------------------------------------

    mov eax, hThread
    pop esi

    ret

start_thread endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

work_thread proc pstruct:DWORD

    LOCAL arg1  :DWORD
    LOCAL arg2  :DWORD
    LOCAL arg3  :DWORD
    LOCAL arg4  :DWORD

  ; ---------------------------------------------------------
  ; load the args to locals before the caller flag is cleared
  ; ---------------------------------------------------------
    mov eax, pstruct
    m2m arg1, (THREAD_BLOCK PTR [eax]).arg1
    m2m arg2, (THREAD_BLOCK PTR [eax]).arg2
    m2m arg3, (THREAD_BLOCK PTR [eax]).arg3
    m2m arg4, (THREAD_BLOCK PTR [eax]).arg4

    mov (THREAD_BLOCK PTR [eax]).flag, 0    ; clear the flag

    print str$(arg1)    ; ,13,10
    print str$(arg2)    ; ,13,10
    print str$(arg3)    ; ,13,10
    print str$(arg4)    ; ,13,10

    mov eax, pstruct
    lock add (THREAD_BLOCK PTR [eax]).tcnt, 1   ; atomically increment the terminated thread counter

    ret

work_thread endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

Quote
  ; -----------------------
  ; start the three threads
  ; -----------------------
    mov thread1, rv(start_thread,ADDR tblk)
    mov thread2, rv(start_thread,ADDR tblk)
    mov thread3, rv(start_thread,ADDR tblk)
you could use a single variable to start all three threads
i wrote some thread code that had handshaking
command, then acknowledge
i used different bits of the same variable to indicate different commands or ack's
one variable for commands written by the main proc
one variable for ack's/status written by the thread
in this case, each thread could have its own ack/status var

hutch--

Not sure what you mean, Dave. If you mean the thread handle, you will have problems with CloseHandle() if you don't have the correct handle for each thread.

dedndave

well - the threads can have an idle loop - waiting for a command
once all the threads are started, the main proc issues a "start" command through a common variable
several threads may READ the var at the same time - only one proc or thread should WRITE to any given var, however
use a different bit in the command word to instruct the threads to terminate
the threads can test the command word and react in a "state machine" fashion
once a thread sees the terminate command, it sends back a "terminate acknowledge" in its own status var
that is an example of command/state handshaking

TCommand dd 0
T1State dd 0
T2State dd 0
T3State dd 0

;TCommand
;bit     meaning
;0      start
;1      terminate

;TnState
;bit     meaning
;0      start ack
;1      terminate ack

or - use a different bit in TCommand to terminate each thread
the same idea can be used to do things other than terminating, of course
the important thing is, each of the variables is WRITTEN to by only one thread or proc

hutch--

I get you, you synch them with an idle loop then start them as close together as possible by changing the start flag that each thread is polling in the idle loop.

What I have in mind with the technique I am using here is parallel processing on different cores where I don't want any lag on startup as I am trying to reduce the thread overhead granularity.

I have used ADD for the counter. It needs an explicit LOCK prefix (LOCK ADD) to be safe if and when multiple threads try to write at exactly the same time, as only XCHG with a memory operand has an implicit LOCK. I am trying out methods like this to try and avoid system based critical section style code as it tends to be clunky and slow.
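
A minimal sketch of the locked counter on its own, assuming the usual masm32rt.inc setup. The names tcount, hThr, tid and count_thread are made up for this illustration and are not part of the test piece above. Three threads each add 1 to a shared DWORD 100000 times with LOCK ADD and the main code waits on the handles before printing the total.

```asm
    include \masm32\include\masm32rt.inc

    count_thread PROTO :DWORD

    .data?
      tcount dd ?                   ; shared counter (hypothetical name)
      hThr   dd 3 dup (?)           ; thread handles
      tid    dd ?                   ; receives the thread ID, not used further

    .code

start:
    mov tcount, 0

    xor ebx, ebx                    ; ebx = thread index
  mkthr:
    invoke CreateThread,NULL,0,ADDR count_thread,NULL,0,ADDR tid
    mov hThr[ebx*4], eax
    add ebx, 1
    cmp ebx, 3
    jb mkthr

  ; block until all three threads have returned
    invoke WaitForMultipleObjects,3,ADDR hThr,TRUE,INFINITE

    xor ebx, ebx
  clthr:
    invoke CloseHandle,hThr[ebx*4]
    add ebx, 1
    cmp ebx, 3
    jb clthr

    print str$(tcount),13,10        ; 300000 when no increment is lost
    inkey
    exit

count_thread proc param:DWORD

    mov ecx, 100000                 ; increments per thread
  @@:
    lock add tcount, 1              ; LOCK makes the read-modify-write atomic
    sub ecx, 1
    jnz @B

    xor eax, eax                    ; thread exit code
    ret

count_thread endp

end start
```

With the LOCK prefix removed the same code can lose increments on a multi-core machine, which is an easy way to see the difference. If a thread also needs the value the counter held before its own add, LOCK XADD returns it in the source register.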

The other factor in the design so far is to make the thread caller re-entrant, that is why I have avoided global variables so far. It's no big deal to use a couple of the locations in the structure for start and finish flags. A lot of this stuff gets a lot simpler if the duration is long enough. If you are working in seconds any of the OS synch methods are OK but when you get down to near the millisecond level they are too slow.

dedndave

at the start of each thread:

Start0: test byte ptr TCommand,1
        jz      Start0

        or byte ptr T(n)State,1

that loop executes pretty damn fast

hutch--

It's a good synch technique and the loop design is fine but it's doing something different. It's synching the thread starts, where I am trying to get them started faster without being synched.

dedndave

in the main proc....

        or byte ptr TCommand,1  ;issue start command

TsNotStarted:
        mov     eax,T1State
        and     eax,T2State
        and     eax,T3State
        test    al,1         ;start ack bit
        jz      TsNotStarted  ;wait for them all to get going
.
.

dedndave

oh - gotcha
well - you can only start a thread so fast - lol

thomas_remkus

I think I understand what you are trying to do and I think I do something similar:

1) Have a global CRITICAL_SECTION and in your main thread you InitializeCriticalSection (or InitializeCriticalSectionAndSpinCount).
2) Create a struct that will be all the input and output needed for any of your threads.
3) Have a DWORD that is the count of threads that you want.
4) Have a DWORD that is the current thread index that is loading. Initialize this to zero.
5) Dynamically allocate enough handles for threads as you have in your thread count.
6) Loop from 0 to [thread count] - 1 and use CreateThread for each of your threads and put the handle to your threads in the array.
7) Each thread uses EnterCriticalSection and will lock the global, check the current thread index and store that value locally, then increase the global thread index, LeaveCriticalSection. At this point you know what number you are in the pecking order and you can access your params from the struct array without bashing the other threads. You can also store your values back to the struct. Each thread will lock/getIndex/unlock the critical section so you will never stomp on another thread.
8) After all the threads are created they will run and do whatever.
9) Use WaitForMultipleObjects and pass in the array to your thread handles and the count. When all of your threads are closed/done then this API will return.
10) Make sure to close all the handles you created by looping and closing them again.
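
The steps above could be sketched in MASM32 roughly as follows. This is only an illustration under a few assumptions: the thread count is fixed at 3, the handle array is static rather than dynamically allocated (step 5), and the "work" is a placeholder that stores the index plus 100. The names NTHREADS, csLock, nextidx, slotval and idx_thread are hypothetical.

```asm
    include \masm32\include\masm32rt.inc

    idx_thread PROTO :DWORD

    NTHREADS equ 3                      ; hypothetical thread count

    .data?
      csLock  CRITICAL_SECTION <>       ; guards nextidx
      nextidx dd ?                      ; next free index (step 4)
      hThr    dd NTHREADS dup (?)       ; thread handles (static here)
      slotval dd NTHREADS dup (?)       ; one output slot per thread
      tid     dd ?

    .code

start:
    invoke InitializeCriticalSection,ADDR csLock
    mov nextidx, 0

    xor ebx, ebx
  mkthr:                                ; step 6, create the threads
    invoke CreateThread,NULL,0,ADDR idx_thread,NULL,0,ADDR tid
    mov hThr[ebx*4], eax
    add ebx, 1
    cmp ebx, NTHREADS
    jb mkthr

  ; step 9, returns once every thread has ended
    invoke WaitForMultipleObjects,NTHREADS,ADDR hThr,TRUE,INFINITE

    xor ebx, ebx
  show:                                 ; step 10, close handles, show results
    invoke CloseHandle,hThr[ebx*4]
    mov eax, slotval[ebx*4]
    print str$(eax),13,10
    add ebx, 1
    cmp ebx, NTHREADS
    jb show

    invoke DeleteCriticalSection,ADDR csLock
    inkey
    exit

idx_thread proc param:DWORD

    LOCAL idx :DWORD

  ; step 7, lock / take an index / bump the shared index / unlock
    invoke EnterCriticalSection,ADDR csLock
    mov eax, nextidx
    mov idx, eax
    add nextidx, 1
    invoke LeaveCriticalSection,ADDR csLock

    mov eax, idx
    lea ecx, [eax+100]                  ; placeholder for the real work
    mov slotval[eax*4], ecx             ; write only to this thread's own slot

    xor eax, eax
    ret

idx_thread endp

end start
```

Each slot ends up holding its index plus 100 whichever thread claimed it. Note that WaitForMultipleObjects takes at most MAXIMUM_WAIT_OBJECTS (64) handles in one call, so a larger thread count would need the wait split into batches.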

Let's say that you have 50 threads and all the threads need to initialize before they can get your signal. Then you can have a global CRITICAL_SECTION that will lock so you can increase a global count of how many threads are ready. Your primary thread would then loop and lock/check/unlock to see if all threads are ready. Once ready, the threads would sit in an idle loop, using a CRITICAL_SECTION to check another variable for the "all ok". At that point they will all start as close as they can to each other. Then you can use your WaitForMultipleObjects.

hutch--

Thomas,

Thanks for your info. What I am playing with at the moment is setting up 2 to 4 threads that match the processor core count and try for a low level technique of parallel processing of intense algorithms, effectively for each thread to come close to 100% core usage. The piece of genius that may or may not work is to split a task into 2 and perhaps later 4 and run the tasks in parallel cores to increase the processing speed.

I am happy enough at the moment to allow the OS thread engine to do the work of running each thread in an available core and once I have this working I am playing with an idea of staggered threads where as one thread finishes its task, the other processes the data supplied by the first. I still have memory access issues to deal with which will come with a few later test pieces.

I will also be interested to see if the OS based CRITICAL_SECTION syncing method is fast enough as I am a bit suspicious after having played with a few thread functions some time ago.

thomas_remkus

I understand better what you are trying to accomplish. Working with affinity is really interesting. If you look at CRITICAL_SECTION(s) then don't forget to check out "InitializeCriticalSectionAndSpinCount" because overall it has been able to help.

There are several "thread-safe" APIs that are not very safe. I avoid all of the Interlocked APIs because they do seem to have issues. But CS(s) are awesome and are far faster than Mutexes. For what you need I think that an Event might help. If you are also looking for something very lightweight, check out "fibers" as an alternative ... but working with those is far beyond my expertise.
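
As a rough idea of how an Event could be used for this, here is a sketch where three threads block on one manual-reset event and the main code releases them together with SetEvent. It is an illustration only, not a tested timing method. The names hStart, hThr, tstamp, tid and gated_thread are hypothetical, and GetTickCount is far too coarse for real measurements, so the printed delays will often just be 0.

```asm
    include \masm32\include\masm32rt.inc

    gated_thread PROTO :DWORD

    .data?
      hStart dd ?                   ; manual-reset event (hypothetical name)
      hThr   dd 3 dup (?)           ; thread handles
      tstamp dd 3 dup (?)           ; tick count each thread saw when released
      tid    dd ?

    .code

start:
    invoke CreateEvent,NULL,TRUE,FALSE,NULL   ; manual reset, not signalled
    mov hStart, eax

    xor ebx, ebx
  mkthr:
    invoke CreateThread,NULL,0,ADDR gated_thread,ebx,0,ADDR tid
    mov hThr[ebx*4], eax            ; thread gets its index as the argument
    add ebx, 1
    cmp ebx, 3
    jb mkthr

    invoke GetTickCount
    mov esi, eax                    ; tick count at release
    invoke SetEvent,hStart          ; open the gate for every waiting thread

    invoke WaitForMultipleObjects,3,ADDR hThr,TRUE,INFINITE

    xor ebx, ebx
  show:
    invoke CloseHandle,hThr[ebx*4]
    mov eax, tstamp[ebx*4]
    sub eax, esi                    ; ms between release and thread running
    print str$(eax),13,10
    add ebx, 1
    cmp ebx, 3
    jb show

    invoke CloseHandle,hStart
    inkey
    exit

gated_thread proc param:DWORD

    invoke WaitForSingleObject,hStart,INFINITE   ; park until released
    invoke GetTickCount
    mov ecx, param
    mov tstamp[ecx*4], eax          ; each thread writes only its own slot

    xor eax, eax
    ret

gated_thread endp

end start
```

Because the event is manual-reset it stays signalled after SetEvent, so a thread that reaches the wait late still passes straight through instead of hanging.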