What I have tried to do here is get the three test threads started as close together as possible. The two-step call of the thread ensures that the arguments passed to each thread are copied to locals before the caller overwrites the stack with the next lot. For termination, a counter in the main structure is set to zero and each thread increments it on exit. The main thread tracks the counter with a yielding loop, and when it matches the number of threads started, the thread handles are closed.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *
start_thread PROTO :DWORD
work_thread PROTO :DWORD
THREAD_BLOCK STRUCT
flag dd ? ; thread start flag
tcnt dd ? ; thread termination counter
arg1 dd ?
arg2 dd ?
arg3 dd ?
arg4 dd ?
arg5 dd ?
arg6 dd ?
arg7 dd ?
arg8 dd ?
THREAD_BLOCK ENDS
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
LOCAL tblk :THREAD_BLOCK
LOCAL thread1 :DWORD
LOCAL thread2 :DWORD
LOCAL thread3 :DWORD
; -------------------------------
; load the THREAD_BLOCK structure
; -------------------------------
mov DWORD PTR tblk.tcnt, 0 ; zero the thread counter
mov DWORD PTR tblk.arg1, 9
mov DWORD PTR tblk.arg2, 8
mov DWORD PTR tblk.arg3, 7
mov DWORD PTR tblk.arg4, 6
; -----------------------
; start the three threads
; -----------------------
mov thread1, rv(start_thread,ADDR tblk)
mov thread2, rv(start_thread,ADDR tblk)
mov thread3, rv(start_thread,ADDR tblk)
; ---------------------------
; idle until all 3 have ended
; ---------------------------
@@:
invoke SleepEx,0,0
cmp DWORD PTR tblk.tcnt, 3 ; check terminated thread count
jne @B
; ------------------------
; close the thread handles
; ------------------------
invoke CloseHandle,thread1
invoke CloseHandle,thread2
invoke CloseHandle,thread3
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
start_thread proc pstruct:DWORD
LOCAL tID :DWORD
LOCAL hThread :DWORD
push esi
mov esi, pstruct
mov (THREAD_BLOCK PTR [esi]).flag, 1 ; set the flag
mov hThread, rv(CreateThread,NULL,NULL,ADDR work_thread,pstruct,NULL,ADDR tID)
; -------------------------------------------
; up the priority to make the spinlock faster
; reduces the start time for each thread.
; -------------------------------------------
invoke SetPriorityClass,rv(GetCurrentProcess),REALTIME_PRIORITY_CLASS
spinlock:
cmp (THREAD_BLOCK PTR [esi]).flag, 0 ; loop until called thread clears the flag
jne spinlock
invoke SetPriorityClass,rv(GetCurrentProcess),NORMAL_PRIORITY_CLASS
; -------------------------------------------
mov eax, hThread
pop esi
ret
start_thread endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
work_thread proc pstruct:DWORD
LOCAL arg1 :DWORD
LOCAL arg2 :DWORD
LOCAL arg3 :DWORD
LOCAL arg4 :DWORD
; ---------------------------------------------------------
; load the args to locals before the caller flag is cleared
; ---------------------------------------------------------
mov eax, pstruct
m2m arg1, (THREAD_BLOCK PTR [eax]).arg1
m2m arg2, (THREAD_BLOCK PTR [eax]).arg2
m2m arg3, (THREAD_BLOCK PTR [eax]).arg3
m2m arg4, (THREAD_BLOCK PTR [eax]).arg4
mov (THREAD_BLOCK PTR [eax]).flag, 0 ; clear the flag
print str$(arg1) ; ,13,10
print str$(arg2) ; ,13,10
print str$(arg3) ; ,13,10
print str$(arg4) ; ,13,10
mov eax, pstruct
lock add (THREAD_BLOCK PTR [eax]).tcnt, 1 ; increment the terminated thread counter (LOCK makes the read-modify-write atomic across cores)
ret
work_thread endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
Quote:
; -----------------------
; start the three threads
; -----------------------
mov thread1, rv(start_thread,ADDR tblk)
mov thread2, rv(start_thread,ADDR tblk)
mov thread3, rv(start_thread,ADDR tblk)
you could use a single variable to start all three threads
i wrote some thread code that had handshaking
command, then acknowledge
i used different bits of the same variable to indicate different commands or ack's
one variable for commands written by the main proc
one variable for ack's/status written by the thread
in this case, each thread could have its own ack/status var
Not sure what you mean, Dave. If you mean the thread handle, you will have problems with CloseHandle() if you don't have the correct handle for each thread.
well - the threads can have an idle loop - waiting for a command
once all the threads are started, the main proc issues a "start" command through a common variable
several threads may READ the var at the same time - only one proc or thread should WRITE to any given var, however
use a different bit in the command word to instruct the threads to terminate
the threads can test the command word and react in a "state machine" fashion
once a thread sees the terminate command, it sends back a "terminate acknowledge" in its own status var
that is an example of command/state handshaking
TCommand dd 0
T1State dd 0
T2State dd 0
T3State dd 0
;TCommand
;bit meaning
;0 start
;1 terminate
;TnState
;bit meaning
;0 start ack
;1 terminate ack
or - use a different bit in TCommand to terminate each thread
the same idea can be used to do things other than terminating, of course
the important thing is, each of the variables is WRITTEN to by only one thread or proc
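A minimal sketch of the command/ack handshake described above, written in portable C++ rather than MASM so it can be compiled outside Windows. The names `Handshake`, `wait_for`, `worker` and `run_handshake` are hypothetical, and `std::atomic` is swapped in for the plain DWORD variables. The rule from the post is kept: `command` is written only by the main proc, and each `state[n]` is written only by thread n.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <thread>

constexpr unsigned BIT_START     = 1u << 0;  // bit 0: start / start ack
constexpr unsigned BIT_TERMINATE = 1u << 1;  // bit 1: terminate / terminate ack
constexpr int kThreads = 3;

struct Handshake {
    std::atomic<unsigned> command{0};                     // TCommand, written by main only
    std::array<std::atomic<unsigned>, kThreads> state{};  // TnState, each written by one thread
};

// Idle loop: yield until the given bit shows up in the word.
void wait_for(const std::atomic<unsigned>& word, unsigned bit) {
    while ((word.load(std::memory_order_acquire) & bit) == 0)
        std::this_thread::yield();
}

void worker(Handshake* h, int n) {
    assert(n >= 0 && n < kThreads);
    wait_for(h->command, BIT_START);                          // wait for "start"
    h->state[n].fetch_or(BIT_START, std::memory_order_release);      // start ack
    // ... the real work would go here ...
    wait_for(h->command, BIT_TERMINATE);                      // wait for "terminate"
    h->state[n].fetch_or(BIT_TERMINATE, std::memory_order_release);  // terminate ack
}

// Runs the whole handshake and returns the AND of all state words.
unsigned run_handshake() {
    Handshake h;
    for (auto& s : h.state) s.store(0);
    std::array<std::thread, kThreads> threads;
    for (int n = 0; n < kThreads; ++n) threads[n] = std::thread(worker, &h, n);

    h.command.fetch_or(BIT_START, std::memory_order_release);      // issue start
    for (auto& s : h.state) wait_for(s, BIT_START);                // all started
    h.command.fetch_or(BIT_TERMINATE, std::memory_order_release);  // issue terminate
    for (auto& s : h.state) wait_for(s, BIT_TERMINATE);            // all acknowledged

    for (auto& t : threads) t.join();
    unsigned all = ~0u;
    for (auto& s : h.state) all &= s.load();
    return all;
}
```

Calling `run_handshake()` returns once every thread has acknowledged both commands, so both ack bits are set in the result.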
I get you. You synch them with an idle loop, then start them as close together as possible by changing the start flag that each thread is polling in the idle loop.
What I have in mind with the technique I am using here is parallel processing on different cores, where I don't want any lag on startup, as I am trying to reduce the granularity of the thread overhead.
I have used ADD to bump the counter. Note that a plain ADD to memory has no implicit LOCK (only XCHG with a memory operand does), so it needs an explicit LOCK prefix to be safe if and when multiple threads on different cores try to write at exactly the same time. I am trying out methods like this to avoid system-based critical-section style code, as it tends to be clunky and slow.
The other factor in the design so far is to make the thread caller re-entrant, which is why I have avoided global variables so far. It's no big deal to use a couple of the locations in the structure for start and finish flags. A lot of this stuff gets a lot simpler if the duration is long enough. If you are working in seconds, any of the OS synch methods are OK, but when you get down to near the millisecond level they are too slow.
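A minimal sketch of the termination-counter idea, in portable C++ rather than MASM so it compiles outside Windows. `ThreadBlock`, `worker` and `run_workers` are hypothetical names, and `std::atomic` is swapped in for the raw ADD on `tcnt`. On x86 an atomic `fetch_add` typically compiles to a LOCK-prefixed instruction, which is what makes the increment safe when several cores hit the counter at once. The counter lives in a local structure, so the caller stays re-entrant.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the tcnt field of THREAD_BLOCK.
struct ThreadBlock {
    std::atomic<unsigned> tcnt{0};  // thread termination counter
};

// Each worker would do its job here, then report in by bumping the counter.
void worker(ThreadBlock* blk) {
    blk->tcnt.fetch_add(1, std::memory_order_release);
}

// Starts `count` workers, yields until all of them have reported in,
// then returns the final counter value.
unsigned run_workers(unsigned count) {
    ThreadBlock blk;  // local, no globals
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (unsigned i = 0; i < count; ++i)
        threads.emplace_back(worker, &blk);

    // Yielding loop, playing the role of the SleepEx,0,0 loop in main.
    while (blk.tcnt.load(std::memory_order_acquire) != count)
        std::this_thread::yield();

    // join() plays the role of CloseHandle on each thread handle.
    for (auto& t : threads) t.join();

    assert(blk.tcnt.load() == count);
    return blk.tcnt.load();
}
```

`run_workers(3)` mirrors the three-thread test piece; the same function works unchanged for a thread count that matches the core count.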
at the start of each thread:
Start0: test byte ptr TCommand,1
jz Start0
or byte ptr T(n)State,1
that loop executes pretty damn fast
It's a good synch technique and the loop design is fine, but it's doing something different: it's synching the thread starts, where I am trying to get them started faster without being synched.
in the main proc....
or byte ptr TCommand,1 ;issue start command
TsNotStarted:
mov eax,T1State
and eax,T2State
and eax,T3State
test al,1 ;start ack bit
jz TsNotStarted ;wait for them all to get going
.
.
oh - gotcha
well - you can only start a thread so fast - lol
I think I understand what you are trying to do and I think I do something similar:
1) Have a global CRITICAL_SECTION and in your main thread you InitializeCriticalSection (or InitializeCriticalSectionAndSpinCount).
2) Create a struct that will be all the input and output needed for any of your threads.
3) Have a DWORD that is the count of threads that you want.
4) Have a DWORD that is the current thread index that is loading. Initialize this to zero.
5) Dynamically allocate enough handles for threads as you have in your thread count.
6) Loop from 0 to [thread count] - 1 and use CreateThread for each of your threads and put the handle to your threads in the array.
7) Each thread uses EnterCriticalSection and will lock the global, check the current thread index and store that value locally, then increase the global thread index, LeaveCriticalSection. At this point you know what number you are in the pecking order and you can access your params from the struct array without bashing the other threads. You can also store your values back to the struct. Each thread will lock/getIndex/unlock the critical section so you will never stomp on another thread.
8) After all the threads are created they will run and do whatever.
9) Use WaitForMultipleObjects and pass in the array to your thread handles and the count. When all of your threads are closed/done then this API will return.
10) Make sure to close all the handles you created by looping and closing them again.
Let's say that you have 50 threads and all the threads need to initialize before they can get your signal. Then you can have a global CRITICAL_SECTION that will lock so you can increase a global count of how many threads are ready. Your primary thread would then loop and lock/check/unlock to see if all threads are ready. The threads would need to be in an idle loop after ready and be checking for a CRITICAL_SECTION for the "all ok" in another variable. At that point they will all start as close as they can to each other. Then you can use your WaitForMultipleObjects.
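The index-claiming part of the steps above can be sketched as follows. This is portable C++ rather than Win32 calls, so `std::mutex` stands in for the CRITICAL_SECTION and `join()` stands in for WaitForMultipleObjects plus CloseHandle. `Slot`, `Shared`, `worker` and `run_pool` are hypothetical names, and doubling the input is placeholder work.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Step 2: per-thread input and output (hypothetical layout).
struct Slot {
    int input = 0;
    int output = 0;
};

struct Shared {
    std::mutex lock;          // stands in for the global CRITICAL_SECTION
    unsigned next_index = 0;  // step 4: index of the next thread to load
    std::vector<Slot> slots;  // one slot per thread
};

void worker(Shared* s) {
    unsigned my_index;
    {
        // Step 7: lock, take the current index, bump it, unlock.
        std::lock_guard<std::mutex> guard(s->lock);
        my_index = s->next_index++;
    }
    assert(my_index < s->slots.size());
    // From here on the thread only touches its own slot, so no lock is needed.
    Slot& slot = s->slots[my_index];
    slot.output = slot.input * 2;  // placeholder work
}

std::vector<int> run_pool(unsigned count) {
    Shared s;
    s.slots.resize(count);
    for (unsigned i = 0; i < count; ++i)
        s.slots[i].input = static_cast<int>(i) + 1;

    // Steps 5 and 6: one handle per thread, created in a loop.
    std::vector<std::thread> handles;
    handles.reserve(count);
    for (unsigned i = 0; i < count; ++i)
        handles.emplace_back(worker, &s);

    // Steps 9 and 10: wait for every thread and release its handle.
    for (auto& h : handles) h.join();

    std::vector<int> out;
    for (const Slot& slot : s.slots) out.push_back(slot.output);
    return out;
}
```

Which thread ends up with which index depends on scheduling, but each slot is claimed exactly once, so the outputs do not depend on the order in which the threads start.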
Thomas,
Thanks for your info. What I am playing with at the moment is setting up 2 to 4 threads that match the processor core count and try for a low level technique of parallel processing of intense algorithms, effectively for each thread to come close to 100% core usage. The piece of genius that may or may not work is to split a task into 2 and perhaps later 4 and run the tasks in parallel cores to increase the processing speed.
I am happy enough at the moment to allow the OS thread engine to do the work of running each thread in an available core. Once I have this working, I am playing with an idea of staggered threads where, as one thread finishes its task, the other processes the data supplied by the first. I still have memory access issues to deal with, which will come with a few later test pieces.
I will also be interested to see if the OS based CRITICAL_SECTION syncing method is fast enough as I am a bit suspicious after having played with a few thread functions some time ago.
I understand better what you are trying to accomplish. Working with affinity is really interesting. If you look at CRITICAL_SECTION(s) then don't forget to check out "InitializeCriticalSectionAndSpinCount" because overall it has been able to help.
There are several "thread-safe" APIs that are not very safe. I avoid all of the Interlocked APIs because they do seem to have issues. But CS(s) are awesome and are far faster than mutexes. For what you need I think that an Event might help. If you are also looking for something very lightweight, check out "fibers" as an alternative ... but working with those is far beyond my expertise.