Core Blimey

Started by oex, February 13, 2010, 01:46:36 AM


oex

Hey guys,

I was just thinking about multicore (something I don't have but many now do).... I'm assuming if I create a thread it will be automatically executed on a free core by Windows? Or do I somehow have to specify which core to use?.... Also is there a quick and dirty way of finding out if a system is multicore? I did see a way in dedndave's CPUID prog but I didn't get what affinity was. I'm assuming threads only have a minor overhead but I'd rather not make an app multithreaded if it will only have a negative impact on execution.... Finally, assuming the above assumptions about how Windows manages threads are correct, will threads be executed on a single core or cross core? If I have an app with 2 threads, core 0 executes the main app code and cores 1 and 2 execute the 2 threads, so is core 3 just sitting there twiddling its bits, waiting for a kick up the arse from the main app, or will it help out cores 0, 1 and 2?
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

dedndave

in the version i am working on, i use GetProcAddress of SetProcessAffinityMask to test if the OS supports multiple cores
it is more reliable than getting the OS version - supposedly, NT4 and up support multiple cores
the tricky one is windows CE, for embedded systems
i found different answers to "does CE support multiple cores" - i suspect newer versions may - older ones may not
some versions of CE are "buildable" and the OEM may be able to eliminate the API's

once you see that the OS supports it, you can use GetProcessAffinityMask to get 2 masks; 1 for the process and 1 for the system
if you want to know how many (enabled) logical cores are in the system, just count the bits in the system affinity mask
they could be hyper-thread cores, physical cores in the same package and/or multiple packages
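
A minimal sketch in C of the "count the bits" step, assuming the mask has already been obtained (on Windows it would be the system mask returned by GetProcessAffinityMask, which is not called here so the snippet stays portable). The function name `count_logical_cores` is made up for illustration.

```c
#include <stdint.h>

/* Count the set bits in an affinity mask. Each set bit stands for one
   enabled logical core; the mask itself is assumed to come from the
   system-mask output of GetProcessAffinityMask. */
unsigned count_logical_cores(uintptr_t affinity_mask)
{
    unsigned count = 0;
    while (affinity_mask != 0) {
        affinity_mask &= affinity_mask - 1;  /* clear the lowest set bit */
        ++count;
    }
    return count;
}
```

A mask of 0x0F would give 4 logical cores. As noted above, that figure does not say whether they are hyper-thread cores, physical cores or separate packages.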

as for the thread assignment, the OS will assign a core for a new thread
there is no documented guarantee that it won't switch cores but if it does, i doubt it happens very often
for single-core or multi-core machines, the thread is given time-slices, just like any other process
unless you bind a thread to a specific core, it can operate on whichever core the OS sees fit
for threads, there are also GetThreadAffinityMask and SetThreadAffinityMask API's

remember there are other processes running all the time, so a "free core" probably doesn't happen very often
if the OS is busy or if you have multiple programs running - who knows how it will assign threads to cores
you may have 3 threads all running on the same core
but they each get separate registers, a separate stack, etc. (i.e. their own context)

oex

ty that helps solidify my understanding, my apps are rather intensive so being able to use multiple cores if available is a big plus

jj2007

Make sure to avoid multiple threads fighting for the reading head of your hard disk...

dedndave

yah - it probably makes sense to let the OS finish its work on files one at a time for large reads or writes
although, if you have an app that sparsely reads or writes small sections of different files, threading might make sense

hutch--

If you run an app like this you will see that the OS tends to distribute the load across the different cores anyway. JJ is right that you should not let different threads assault a disk at the same time. Try to do that from one thread alone, as it will be faster if the disk is not being thrashed between two or more threads. The thing that spreads the load around is normal OS time slicing, so depending on the thread duration(s), as one thread finishes its core will be re-used in the next time slice to carry the load of the other threads that are still running.

Something worth understanding is that the more threads you start, the harder each core must work, so you should keep the number of non-suspended threads down to the core count if possible.
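
As a rough sketch in C of that rule of thumb (the helper name `worker_thread_count` and its parameters are hypothetical, and the core count is assumed to be known already):

```c
/* Cap the number of runnable worker threads at the logical core count.
   pending_jobs is how many independent pieces of work are waiting. */
unsigned worker_thread_count(unsigned pending_jobs, unsigned logical_cores)
{
    if (logical_cores == 0)
        logical_cores = 1;          /* unknown core count: assume one */
    if (pending_jobs == 0)
        return 0;                   /* nothing to do, start no threads */
    return pending_jobs < logical_cores ? pending_jobs : logical_cores;
}
```

With 8 jobs on a 4 core machine this starts 4 threads; with 2 jobs it starts only 2.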
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

oex

The multithreading is mainly for in memory compression/decompression so this works for me, ty for the input guys

brethren

maybe the code in examples/exampl10/threads/mprocasm will be helpful

here's what it says in the comment
Quote
        The original design for this example was written by
        "c0d1f1ed" in Microsoft C++.

        It has been ported to MASM with a number of corrections
        and has been simplified to test on 1, 2 and 4 core
        processors. It is also one tenth of the size as is
        consistent with pure assembler programming.

        The design is to sequentially start 1 2 and 4 thread
        without using leading or interactive operating system
        thread synchronisation methods which removes a major
        timing delay and it uses an operating system
        synchronisation method on thread exit so the results
        can be displayed when all threads have terminated.

        On a single core machine the results of the two and four
        thread tests should be two and 4 times longer.

        On a dual core machine the two thread test should run in
        much the same time as the single thread test and the four
        thread test should be two times longer.

        On a quad core machine all three tests should have a
        similar timing.
*
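
The timings that comment predicts can be written as a small formula. This is only an idealised sketch in C, assuming every thread does the same amount of work and that each core is a full physical core; the name `expected_time_multiplier` is made up for illustration.

```c
/* Expected run time of a test relative to the single-thread test:
   the threads are dealt out across the cores, so the time grows with
   the number of threads stacked on the busiest core (rounded up). */
unsigned expected_time_multiplier(unsigned threads, unsigned cores)
{
    if (threads == 0 || cores == 0)
        return 0;                            /* nothing meaningful to report */
    return (threads + cores - 1) / cores;    /* ceil(threads / cores) */
}
```

For 1, 2 and 4 threads this gives 1, 2, 4 on a single core, 1, 1, 2 on a dual core and 1, 1, 1 on a quad core, which matches the comment.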

dedndave

i do find that interesting
i have a dual core (hyperthreaded - not two separate cores)
the results from that test indicate that i have a single core
i would not want to use the timing method to map cores to affinity mask bits,
but i may be able to adapt something like it to verify whatever method i do come up with

oex

Quote from: dedndave on February 13, 2010, 04:43:51 PM
i do find that interesting
i have a dual core (hyperthreaded - not two separate cores)
the results from that test indicate that i have a single core

That sounds bad, sounds like there is no way to force Windows to use an idle core for a thread? Any ideas on the logic used for multicore tasking? Maybe this example just isn't compatible with hyperthreading for some reason?

jj2007

Quote from: oex on February 13, 2010, 07:02:39 PM
That sounds bad, sounds like there is no way to force windows to use an idle core for a thread?

No, it seems there is no idle core with HT because there is only one core - it is just a bit more efficiently used, about 30% or so. Windows will choose an idle core automatically for you.

dedndave

no - that sounds like it ought to sound
my hyper-threaded core is really a single core and the test reveals that
a hyper-threaded core is essentially an additional set of registers and context - 2 "logical" cores sharing a single "physical" core

as for thread scheduling, the best you can do is divide your thread requirements equally amongst the cores that are present
you can use the SetThreadAffinityMask API for that
in reality, you can probably ignore affinity altogether and let the OS schedule them for you
it will probably do as well or better than you can

EDIT - i can see a case where i might want to manually control scheduling
let's say i have one thread that is extremely processor intensive
and a few other threads that are more-or-less "background" threads
i might bind the intensive thread to one core and the others to a different core
the only reason taking control makes sense is because i know in advance that one thread is intensive
the OS cannot make that kind of prediction
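
A sketch in C of how the two masks for that case might be worked out, assuming the process affinity mask is already known. The type `affinity_plan` and the function `plan_affinity` are hypothetical; the resulting masks are what would be handed to SetThreadAffinityMask, which is not called here. As a variation on the idea above, the background threads get every remaining core rather than exactly one.

```c
#include <stdint.h>

typedef struct {
    uintptr_t intensive;   /* mask for the processor-intensive thread */
    uintptr_t background;  /* mask shared by the background threads */
} affinity_plan;

affinity_plan plan_affinity(uintptr_t process_mask)
{
    affinity_plan plan;
    /* Isolate the lowest set bit: one core reserved for the heavy thread. */
    plan.intensive = process_mask & (~process_mask + 1);
    /* Every other allowed core goes to the background threads. */
    plan.background = process_mask & ~plan.intensive;
    /* With only one core available there is nothing left, so share it. */
    if (plan.background == 0)
        plan.background = process_mask;
    return plan;
}
```

For a process mask of 0x3 (two logical cores) the intensive thread gets 0x1 and the background threads get 0x2.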

oex

ok ty for that info I thought I'd wasted an evening :D

oex

hmm just had a thought.... it should be possible to write a macro that replaces the invoke call with something like invokethread proc,args.... This could be invaluable for multicore machines.... I'm rather busy atm, got a deadline next week, but I know some of you enjoy a challenge.... I've not thought it all through and I'm not sure how to get the size of macro args, so I got stopped at the first hurdle, but passing the data would be something like:


Invoke Macro:

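invokeThread MACRO FuncName:REQ,args:VARARG

mov esi, alloc(32)                      ; fixed 32 byte block: proc address + packed args
push esi                                ; keep the start of the block for CreateThread

arg equ <invoke FuncName>

mov DWORD PTR [esi], OFFSET FuncName    ; ADDR is only valid inside invoke
add esi, 4

; as written, mov [esi], var only assembles when var is a register;
; memory variables would have to be loaded into a register first
FOR var,<args>
IF issize(var, 1)
mov [esi], var
inc esi
ENDIF
IF issize(var, 2)
mov [esi], var
add esi, 2
ENDIF
IF issize(var, 4)
mov [esi], var
add esi, 4
ENDIF
ENDM

pop esi                                 ; esi points at the start of the block again
invoke CreateThread, 0, 0, offset MyThread, esi, 0, offset ThreadID
ENDM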


And then reading them back off in MyThread should be quite easy if you can get (and pass) arg sizes
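
The same pack-and-unpack idea, sketched in C rather than as a macro so the data flow is easier to see. Everything here is an assumption for illustration: the names `thread_call`, `pack_call`, `run_packed_call` and `add_two`, the fixed two DWORD arguments, and the `result` field. On Windows `run_packed_call` would play the part of MyThread and the block pointer would be the parameter passed to CreateThread; no thread is actually started in this sketch.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t (*worker_fn)(uint32_t a, uint32_t b);

typedef struct {
    worker_fn fn;       /* procedure to call on the new thread */
    uint32_t  args[2];  /* its arguments, stored as DWORD-sized values */
    uint32_t  result;   /* filled in once the call has run */
} thread_call;

/* Pack the procedure address and its arguments into one heap block.
   Returns the START of the block, which is what the thread needs. */
thread_call *pack_call(worker_fn fn, uint32_t a, uint32_t b)
{
    thread_call *call = (thread_call *)malloc(sizeof *call);
    if (call != NULL) {
        call->fn = fn;
        call->args[0] = a;
        call->args[1] = b;
        call->result = 0;
    }
    return call;
}

/* Stand-in for MyThread: unpack the block and make the real call. */
void run_packed_call(void *param)
{
    thread_call *call = (thread_call *)param;
    call->result = call->fn(call->args[0], call->args[1]);
}

/* Example procedure to be run through the block. */
uint32_t add_two(uint32_t a, uint32_t b)
{
    return a + b;
}
```

Storing every argument as a DWORD sidesteps the "size of each arg" problem at the cost of a little wasted space, since the reader of the block then knows the layout without being told the sizes.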

I dont like to waste these little sparks of inspiration ;) far better to share them

hutch--

The late PIVs were reasonably sophisticated for a single core. Hyperthreading worked OK on a late enough OS, but the availability of multiple core processors produced far better threaded performance. The next generation i7 series does both, hyperthreading AND multiple cores, and from the testing I have seen it makes a big difference again with multithreaded code. The real problem with later PIVs was the pipeline length: if you coded carefully for them you could get them to perform OK, but tangle your instruction sequence and you took some big performance hits for it.