dual-core multi-threaded x86 programming

Started by bsdz, February 22, 2007, 02:25:21 PM


bsdz

Hi all

Probably a silly question, but I have had no luck answering it myself so far. Are there x86 assembly instructions that allow you to swap cores in a dual-core system? I am very interested in the possibility of writing some multi-threaded x86 code. If there are, is there any good online resource with examples covering memory sharing, locking, etc.? I am very familiar with writing multi-threaded Win32 applications, but I have a funny feeling that switching cores/threading is an OS function and therefore inaccessible to an x86 programmer?

Thanks in advance
Blair

TNick

Hello and welcome!

Quote from: bsdz on February 22, 2007, 02:25:21 PM
... but I have a funny feeling that switching cores/threading is an OS function and therefore inaccessible to an x86 programmer?

Your feeling is right! Use the search function of this forum to see a long list of discussions about this matter.  :bg

Regards,
Nick

u

First, call SetThreadAffinityMask() on the first (main) thread with a mask of 1. (The mask is a bit field: bit 0 selects the first CPU, so thread #1 will run only on CPU #1.)
Then use CreateEvent() to create one "event"; it will be used to synchronize the second thread with the first thread.
Then call CreateThread(); this will be your second thread.
Call SetThreadAffinityMask() on the second thread with a mask of 2 (bit 1, the second CPU).
Make the second thread wait for the event you created. Waiting is done via WaitForSingleObject().
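
The event handshake in these steps can be sketched in C. This is only an illustration, not the Win32 code itself: it uses POSIX threads so that it builds outside Windows, event_t, event_set() and event_wait() are hypothetical stand-ins for CreateEvent()/SetEvent()/WaitForSingleObject(), and the SetThreadAffinityMask() calls are left out because they have no portable equivalent.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a Win32 auto-reset event. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} event_t;

void event_init(event_t *e) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = false;
}

void event_destroy(event_t *e) {
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}

/* Rough counterpart of SetEvent(): mark the event and wake one waiter. */
void event_set(event_t *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_signal(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Rough counterpart of WaitForSingleObject(h, INFINITE). */
void event_wait(event_t *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    e->signaled = false;                 /* auto-reset: consume the signal */
    pthread_mutex_unlock(&e->lock);
}

static event_t g_start;
static int g_shared_input;               /* written by thread 1 before signaling */
static int g_result;                     /* written by thread 2 */

static void *second_thread(void *arg) {
    (void)arg;
    event_wait(&g_start);                /* block until the main thread says go */
    g_result = g_shared_input * 2;
    return NULL;
}

/* Starts the second thread, signals it, and returns what it computed. */
int run_handshake(int input) {
    pthread_t t;
    event_init(&g_start);
    if (pthread_create(&t, NULL, second_thread, NULL) != 0) {
        event_destroy(&g_start);
        return -1;
    }
    g_shared_input = input;              /* prepare the work, then signal */
    event_set(&g_start);
    pthread_join(t, NULL);
    event_destroy(&g_start);
    return g_result;
}
```

Calling run_handshake(21) starts the second thread, signals it, and returns the 42 it computed. The signal is safe even if it arrives before the second thread starts waiting, because the flag stays set until a waiter consumes it.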

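If you need extra-fine sync, use the "xchg" and "lock cmpxchg" instructions to update flag/counter global variables. Avoid a plain "mov"-based read-modify-write sequence, because it leaves a tiny but nonzero risk of error: both CPUs can read and write the flag/counter global variable at the same moment, and one update gets lost. "xchg" with a memory operand is always locked, and "cmpxchg" is atomic across CPUs when it carries the "lock" prefix, so they kill that risk. The Win32 API functions InterlockedExchange, InterlockedCompareExchange, InterlockedExchangeAdd and the other Interlocked*** functions are built on locked instructions like these.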
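
To make the difference concrete, here is a small sketch in C11 (the choice of C is an assumption of this example; the thread itself is about assembly). Two threads bump a shared counter through a compare-and-swap retry loop, which compilers for x86 normally turn into "lock cmpxchg". The names g_counter, cas_increment and run_counter_demo are made up for the illustration, and POSIX threads stand in for CreateThread().

```c
#include <pthread.h>
#include <stdatomic.h>

#define ITERATIONS 100000

static atomic_int g_counter;

/* Increment through a compare-and-swap retry loop. */
void cas_increment(atomic_int *p) {
    int old = atomic_load(p);
    /* On failure 'old' is refreshed with the current value, so we just retry. */
    while (!atomic_compare_exchange_weak(p, &old, old + 1)) { }
}

static void *counter_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        cas_increment(&g_counter);
    return NULL;
}

/* Runs two workers in parallel and returns the final counter value. */
int run_counter_demo(void) {
    pthread_t a, b;
    atomic_store(&g_counter, 0);
    if (pthread_create(&a, NULL, counter_worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, counter_worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&g_counter);
}
```

run_counter_demo() returns 200000 on every run, because no increment can be lost. With a plain non-atomic "load, add, store" in place of cas_increment(), two threads can read the same old value and one of the updates disappears.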

On a dual-core PC where no app other than yours is taking more than 5% CPU, you can achieve almost perfect sync if necessary (but if a third thread of your app does some long computations, it will also kill the balance). It's done by using the above-mentioned API and instructions, combined with a state-checking loop like "while(!g_thread2_ready){}".
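
A state-checking loop of that kind could look like the following C11 sketch. The flag name g_thread2_ready comes from the loop quoted above; g_payload and wait_for_thread2() are hypothetical, and POSIX threads stand in for the Win32 thread calls. The flag is declared atomic because a C compiler is otherwise free to read an ordinary variable once and spin forever.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool g_thread2_ready;
static int g_payload;                    /* hypothetical data published by thread 2 */

static void *thread2(void *arg) {
    (void)arg;
    g_payload = 42;                      /* do the work first... */
    /* ...then publish it: release ordering makes the payload visible
       to whoever observes the flag as true. */
    atomic_store_explicit(&g_thread2_ready, true, memory_order_release);
    return NULL;
}

/* Starts thread 2, spins until it reports ready, returns what it published. */
int wait_for_thread2(void) {
    pthread_t t;
    atomic_store(&g_thread2_ready, false);
    if (pthread_create(&t, NULL, thread2, NULL) != 0)
        return -1;

    /* The state-checking loop: a plain load per iteration, no locked instruction. */
    while (!atomic_load_explicit(&g_thread2_ready, memory_order_acquire)) { }

    int seen = g_payload;                /* safe to read after the acquire load */
    pthread_join(t, NULL);
    return seen;
}
```

wait_for_thread2() returns 42. A loop like this burns a whole core while it waits, so it only makes sense for very short waits; for anything longer the event-based wait is the better tool.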

I recommend reading the "Thread Scheduling" topic in "Inside Windows 2000". Without knowing how w2k/xp do thread scheduling, you'll hardly manage to find the perfect balance/sync (which, fortunately, is easy).
Please use a smaller graphic in your signature.

TNick

Well, looks like I got the wrong impression from other threads. Thanks for correcting me, Ultrano!

Regards,
Nick

bsdz

Thanks for the information.

It looks pretty much like the Win32 multi-threading I have used from C++. The ultra-fine syncing is very useful information. Thanks Ultrano++

u

Be aware, though, that xchg/cmpxchg do their trick by locking the data bus (which degrades the other CPU's performance), at least according to the docs. So using these two instructions in a tight waiting loop might not be a good idea; poll with "mov" or simply "cmp" instead. I haven't been able to code on a dual-core PC yet to measure the penalties and run benchmarks, so I'm simply retyping the theory ^^"
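
One common way to follow that advice is a "test and test-and-set" lock: wait with ordinary loads and issue the locked exchange only when the lock looks free. The C11 sketch below is an illustration under that assumption, not a measured recommendation; ttas_lock_t and its functions are made-up names.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int locked; } ttas_lock_t;   /* 0 = free, 1 = held */

void ttas_init(ttas_lock_t *s) { atomic_init(&s->locked, 0); }

/* One attempt: a single atomic exchange (an xchg on x86). */
bool ttas_trylock(ttas_lock_t *s) {
    return atomic_exchange_explicit(&s->locked, 1, memory_order_acquire) == 0;
}

void ttas_lock(ttas_lock_t *s) {
    for (;;) {
        /* Wait with plain loads only (the "mov"/"cmp" polling)... */
        while (atomic_load_explicit(&s->locked, memory_order_relaxed) != 0) { }
        /* ...and pay for the locked exchange only when the lock looks free. */
        if (ttas_trylock(s))
            return;
    }
}

void ttas_unlock(ttas_lock_t *s) {
    atomic_store_explicit(&s->locked, 0, memory_order_release);
}
```

While one thread holds the lock, ttas_trylock() on it fails; after ttas_unlock() it succeeds again. Waiters spend their time in the inner read-only loop, so the locked instruction runs roughly once per acquisition instead of once per spin.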

dsouza123

So with a quad core, with a main thread doing the window and message handling (needing very low CPU usage) and four computation threads (with high CPU usage), what about issues of data access? For example, using four sets of globals and/or four stacks?
What about issues of latency for regular data access, access to data for periodic backups (i.e. save files), and data access between threads (if needed)?

u

I only have experience (and very recent, too) with simultaneous access of a circular buffer, more specifically audio (see my site for info). On my single-cpu PC, that's running at 15ms/threadswitch , when I didn't use intelligent sync, it was easy to achieve 15-25ms latency.
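
For readers who have not seen one, a circular buffer shared by exactly one writer thread and one reader thread can be sketched in C11 as below. This is a generic single-producer/single-consumer ring, not the audio code mentioned above; ring_t, RING_SIZE and the 16-bit sample type are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8                      /* must be a power of two */

typedef struct {
    short         samples[RING_SIZE];
    atomic_size_t head;                  /* next slot to write; only the producer stores it */
    atomic_size_t tail;                  /* next slot to read; only the consumer stores it */
} ring_t;

void ring_init(ring_t *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false when the buffer is full. */
bool ring_push(ring_t *r, short v) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;
    r->samples[head & (RING_SIZE - 1)] = v;
    /* Publish the sample only after it has been written. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the buffer is empty. */
bool ring_pop(ring_t *r, short *out) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->samples[tail & (RING_SIZE - 1)];
    /* Hand the slot back to the producer only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Because each index is written by only one side, neither thread ever needs a lock or a locked instruction on the fast path; the design breaks as soon as a second producer or a second consumer is added.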

u

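About "periodic backup": the best approach is to base your code on this idea:

QuickCloneSObjectData proc ; ecx = _this
    local bufSize, pData, pThis

    mov pThis, ecx                  ; malloc/free/memmove may clobber ecx

    ; Read the current size under the lock.
    LockReadObject ecx
    m2m bufSize, [ecx].SObject.dwSize
    UnlockReadObject ecx

tryAgain:
    ; Allocate outside the lock so other threads are not held up.
    mov pData, malloc(bufSize)

    mov ecx, pThis
    LockReadObject ecx
    mov eax, [ecx].SObject.dwSize
    .if eax != bufSize
        ; The object was resized while unlocked: drop the buffer and retry.
        mov bufSize, eax
        UnlockReadObject ecx
        free pData
        jmp tryAgain
    .endif
    invoke memmove, pData, [ecx].SObject.pData, bufSize
    mov ecx, pThis
    UnlockReadObject ecx

    mov eax, pData                  ; returns eax = copy, edx = size
    mov edx, bufSize
    ret
QuickCloneSObjectData endp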
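
The same snapshot-and-retry idea, written out in C as a rough sketch: only the field names dwSize and pData come from the routine above; the SObject layout, the use of a pthread mutex in place of LockReadObject/UnlockReadObject, and the name quick_clone are assumptions made for the illustration.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C view of the object; the real SObject layout is not shown in the post. */
typedef struct {
    pthread_mutex_t lock;
    size_t          dwSize;
    void           *pData;
} SObject;

/* Returns a malloc'ed copy of the data and stores its size in *outSize.
   Returns NULL if the allocation fails. The caller frees the copy. */
void *quick_clone(SObject *obj, size_t *outSize) {
    pthread_mutex_lock(&obj->lock);
    size_t bufSize = obj->dwSize;            /* first guess at the size */
    pthread_mutex_unlock(&obj->lock);

    for (;;) {
        /* Allocate outside the lock so writers are blocked as briefly as possible. */
        void *copy = malloc(bufSize ? bufSize : 1);
        if (copy == NULL)
            return NULL;

        pthread_mutex_lock(&obj->lock);
        if (obj->dwSize == bufSize) {
            if (bufSize != 0)
                memcpy(copy, obj->pData, bufSize);
            pthread_mutex_unlock(&obj->lock);
            *outSize = bufSize;
            return copy;
        }
        bufSize = obj->dwSize;               /* size changed meanwhile: retry */
        pthread_mutex_unlock(&obj->lock);
        free(copy);
    }
}
```

The lock is held only for the size check and the copy itself, never across malloc() or free(), which is what keeps a periodic backup from stalling the computation threads. The backup thread can then write the returned copy to disk at its leisure.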