Multicore theory proposal

hutch-- · May 31, 2008, 01:44:55 AM

John,

I just tweaked the alignment example so it makes a bit more sense. I have just come out of hospital after a day in and I am still a bit woozy from the general anaesthetic.

One question: why use a structure with only one member when a single unsigned 32 bit value would do the job fine and with less code?



; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    spinlock_t STRUCT
      _lock dd 0
    spinlock_t ENDS

    .code

start:
   
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    call main
    inkey
    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

main proc

    LOCAL pbuffer   :DWORD          ; buffer pointer
    LOCAL pstruct   :DWORD          ; start address for structure
    LOCAL buffer[1024]:BYTE

    lea eax, buffer                 ; get the buffer address
    mov pbuffer, eax                ; write it to buffer pointer

    push esi

    mov esi, pbuffer                ; load the buffer address (not the address of pbuffer) into ESI
    memalign esi, 128               ; round ESI up to the next 128 byte boundary inside buffer
    mov pstruct, esi                ; save the aligned structure address
    mov (spinlock_t PTR [esi])._lock, 12345678  ; < load your value here

  ; -----------------------
  ; test code for alignment
  ; -----------------------
    lea eax, (spinlock_t PTR [esi])._lock
    print str$(eax)," aligned spinlock_t structure member address", 13,10

    print str$(esi)," ESI aligned value",13,10
    memalign esi, 128
    print str$(esi)," ESI aligned value after realigning it to 128 bytes again",13,10

    pop esi
    ret

main endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end start
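For readers more at home in C, here is a rough sketch of the two points above, under stated assumptions. A one-member struct such as spinlock_t normally costs nothing over a bare 32 bit value, and aligning an address inside a buffer is the usual add-then-mask rounding, which is what we assume memalign does. The helpers align_up and place_aligned are hypothetical names, not MASM32 or library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One-member struct mirroring spinlock_t above. On common compilers it is
   exactly as large as the bare 32 bit value it wraps. */
typedef struct {
    uint32_t lock;
} spinlock_t;

_Static_assert(sizeof(spinlock_t) == sizeof(uint32_t),
               "single-member struct adds no padding here");

/* Round addr up to the next multiple of align (a power of two):
   add align-1, then clear the low bits (assumed to match memalign). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical helper: place a spinlock_t at the first 128 byte boundary
   inside buf. buf needs at least 127 + sizeof(spinlock_t) bytes. */
static spinlock_t *place_aligned(unsigned char *buf, size_t len)
{
    uintptr_t p = align_up((uintptr_t)buf, 128);
    assert(p + sizeof(spinlock_t) <= (uintptr_t)buf + len);
    return (spinlock_t *)p;
}
```

Realigning an already aligned address leaves it unchanged, which is what the second memalign in the MASM test code shows.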

johnsa · May 31, 2008, 02:48:28 PM

Hi,

Thanks for that example!

The reason I wanted to put the lock in a structure to start with is that I was thinking of expanding the structure with some other data to indicate an R/W condition on the lock, and possibly some sort of call-back queue that would be processed as the spin-lock times out.
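One way the R/W part of that idea could look, sketched with C11 atomics as an assumption rather than anything proposed in the thread: the single 32 bit lock word doubles as a reader count, with -1 meaning a writer holds it. The rw_* names are hypothetical, there is no pause hint or back-off, writers can starve under constant read traffic, and the call-back queue is left out.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical layout: state > 0 is the number of active readers,
   state == -1 means a writer holds the lock, 0 means free. */
typedef struct {
    atomic_int state;
} rw_spinlock_t;

static void rw_init(rw_spinlock_t *l) { atomic_init(&l->state, 0); }

static void rw_read_lock(rw_spinlock_t *l)
{
    for (;;) {
        int s = atomic_load_explicit(&l->state, memory_order_relaxed);
        /* Only join while no writer is active; CAS guards against races. */
        if (s >= 0 &&
            atomic_compare_exchange_weak_explicit(&l->state, &s, s + 1,
                memory_order_acquire, memory_order_relaxed))
            return;
    }
}

static void rw_read_unlock(rw_spinlock_t *l)
{
    int prev = atomic_fetch_sub_explicit(&l->state, 1, memory_order_release);
    assert(prev > 0);
    (void)prev;
}

static void rw_write_lock(rw_spinlock_t *l)
{
    for (;;) {
        int expected = 0;   /* a writer may only enter a completely free lock */
        if (atomic_compare_exchange_weak_explicit(&l->state, &expected, -1,
                memory_order_acquire, memory_order_relaxed))
            return;
    }
}

static void rw_write_unlock(rw_spinlock_t *l)
{
    assert(atomic_load_explicit(&l->state, memory_order_relaxed) == -1);
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

Keeping everything in one word means a reader or writer acquires the lock with a single compare-and-swap; extra fields such as a queue head would sit beside it in the same structure.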

c0d1f1ed · June 01, 2008, 10:22:37 PM

Quote from: hutch-- on May 31, 2008, 01:18:47 AM
I think you miss where I come from in relation to multicore or multiprocessor computers.

I don't think that's relevant, but ok...

Quote
Like most people with a reasonable grasp of modern computing, I see multicore processing as the future, but at the common desktop PC level I see it as somewhere in the future, as I don't see the capacity of current dual core hardware as being even vaguely near fast enough to do general purpose work.

Why? New projects appear every day reaching 90% higher performance on dual-core, proving the contrary.

Quote
To make a point, back in the 16 bit DOS days I had the technical data on the FLAT memory model even though it was another 7 years until an OS properly supported it at the PC level.

You're not making any point for multi-core here. Mainstream multi-core processors were supported the day they were available in stores.

Quote
I see multicore in much the same light as the FLAT memory model in 1988, something that will be useful in the future but not really viable at the moment.

Nostalgic references to unrelated technology are not an argument. It's 2008 and budget PCs come with a very capable dual-core CPU. If you want your software to run faster and continue to become faster with newer processors you have to go multi-core sooner rather than later.

Quote
Multithreaded multicore processing is already with us in terms of multiple concurrent threads for things like terminal servers and web servers, as they routinely handle that type of workload, but the hardware is not yet suitable for close range high performance computing.

Again you're trying to get away with this without any arguments. Inserting and removing elements from a lock-free list takes only a few dozen clock cycles, so current hardware is very capable of running close range high performance workloads. What exactly do you think is still missing?
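For reference, the classic lock-free stack (often called a Treiber stack) is one common form of the lock-free list mentioned above. This is a minimal C11 sketch, not code from the thread, and no cycle counts are implied. The lf_* names are hypothetical. lf_pop as written is exposed to the ABA problem and reads old->next from a node another thread may already have popped, so nodes must not be freed or reused while other threads can still pop; production versions add tagged pointers, hazard pointers or similar.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    int value;
} node_t;

typedef struct {
    _Atomic(node_t *) top;
} lf_stack_t;

static void lf_init(lf_stack_t *s) { atomic_init(&s->top, NULL); }

/* Push: point the new node at the current top, then try to swing top to it.
   If another thread changed top meanwhile, the CAS fails, reloads old,
   and the loop relinks and retries. No lock is ever taken. */
static void lf_push(lf_stack_t *s, node_t *n)
{
    assert(n != NULL);
    node_t *old = atomic_load_explicit(&s->top, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(&s->top, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Pop: try to swing top to top->next; on failure old holds the new top.
   Returns NULL when the stack is empty. See the ABA caveat above. */
static node_t *lf_pop(lf_stack_t *s)
{
    node_t *old = atomic_load_explicit(&s->top, memory_order_acquire);
    while (old != NULL &&
           !atomic_compare_exchange_weak_explicit(&s->top, &old, old->next,
                memory_order_acquire, memory_order_acquire)) {
        /* retry with the refreshed old */
    }
    return old;
}
```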

Quote
Now where this will make a difference is when you can approach a task that is by its layout not suitable for parallel processing. Compression comes to mind here, which affects not only simple data compression but formats like MP2 and MP4 video compression, which need to be linear (serial) in nature to achieve very high compression rates.

There will always be a number of algorithms not suited for parallel processing. That doesn't take away the vast benefits of multi-core for algorithms that do scale well with concurrent threads. By the way, compression parallelizes quite well: H.264, JPEG, Lempel-Ziv, ...
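A minimal illustration of one common way compressors are parallelized: cut the input into independent blocks and compress each on its own thread. Run-length encoding stands in for a real codec here, POSIX threads are assumed, and NCHUNKS, chunk_job, compress_parallel and the other names are hypothetical. Splitting costs a little ratio, because a run that crosses a block boundary is emitted as two runs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define NCHUNKS 4   /* hypothetical: one block per worker thread */

/* Encode src as (count, byte) pairs, counts capped at 255.
   dst must hold 2*n bytes in the worst case. Returns encoded length. */
static size_t rle_encode(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0, i = 0;
    while (i < n) {
        unsigned char b = src[i];
        size_t run = 1;
        while (i + run < n && src[i + run] == b && run < 255)
            run++;
        dst[out++] = (unsigned char)run;
        dst[out++] = b;
        i += run;
    }
    return out;
}

/* Expand (count, byte) pairs back into dst. Returns decoded length. */
static size_t rle_decode(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;
    assert(n % 2 == 0);
    for (size_t i = 0; i + 1 < n; i += 2) {
        memset(dst + out, src[i + 1], src[i]);
        out += src[i];
    }
    return out;
}

typedef struct {
    const unsigned char *src;
    size_t len;
    unsigned char *dst;   /* 2*len bytes reserved for this block */
    size_t out_len;
} chunk_job;

static void *compress_chunk(void *arg)
{
    chunk_job *job = arg;
    job->out_len = rle_encode(job->src, job->len, job->dst);
    return NULL;
}

/* Encode each block on its own thread. Blocks share no state, so no
   locking is needed. dst must hold 2*n bytes. Returns 0 on success. */
static int compress_parallel(const unsigned char *src, size_t n,
                             unsigned char *dst, chunk_job jobs[NCHUNKS])
{
    pthread_t tid[NCHUNKS];
    size_t base = n / NCHUNKS, off = 0;
    int started = 0, rc = 0;
    for (int k = 0; k < NCHUNKS; k++) {
        size_t len = (k == NCHUNKS - 1) ? n - off : base;
        jobs[k].src = src + off;
        jobs[k].len = len;
        jobs[k].dst = dst + 2 * off;
        jobs[k].out_len = 0;
        if (pthread_create(&tid[k], NULL, compress_chunk, &jobs[k]) != 0) {
            rc = -1;
            break;
        }
        started++;
        off += len;
    }
    for (int k = 0; k < started; k++)
        pthread_join(tid[k], NULL);
    return rc;
}
```

Decoding the blocks in order and concatenating the results reproduces the input; real tools that work this way apply the same idea with a proper codec per block.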

Quote
Think in terms of a 64 core x86 processor where the core design can not only handle current concurrent threads in the normal manner but can handle parallel processing on a single thread without the messy fudges that are currently required, and where you can get about a 1.9 times increase in computing power for the extra core used in the algorithm. By that measure, using 10 cores instead of one you will get about 8 to 9 times the processing power.
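Amdahl's law is not cited in the thread, but it is the usual back-of-envelope way to relate speedup figures like these. Assuming a fixed workload in which a fraction p of the work can run in parallel on N cores:

```latex
S(N) = \frac{1}{(1 - p) + p/N}

S(2) = 1.9 \;\Rightarrow\; 1 - \tfrac{p}{2} = \tfrac{1}{1.9} \;\Rightarrow\; p \approx 0.947

p = 0.947 \;\Rightarrow\; S(10) = \frac{1}{0.053 + 0.0947} \approx 6.8

S(10) = 8 \;\Rightarrow\; p \approx 0.972, \qquad S(10) = 9 \;\Rightarrow\; p \approx 0.988
```

Read this way, 1.9 times on two cores and 8 to 9 times on ten cores correspond to different parallel fractions. The model ignores memory bandwidth and synchronization overhead, so it is only a rough check.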

Please specify in great detail how that would work. Show me some references to related research.

Quote
The action is in interfacing multiple cores in an efficient manner, and this will only ever be done at a very high speed hardware level; emulating software cooperative multitasking at a core level is doomed to failure as it can never be fast enough.

Tell me again how you're going to achieve that.

The fact of the matter is that multi-core processor architectures will not change drastically for the foreseeable future. You'll still need significant changes to the software to make use of the extra cores. And you can't avoid the issue by referring to mythic hardware.

MichaelW · June 01, 2008, 11:59:08 PM

Quote from: c0d1f1ed on June 01, 2008, 10:22:37 PM
Quote from: hutch-- on May 31, 2008, 01:18:47 AM
To make a point, back in the 16 bit DOS days I had the technical data on the FLAT memory model even though it was another 7 years until an OS properly supported it at the PC level.

You're not making any point for multi-core here. Mainstream multi-core processors were supported the day they were available in stores.

As was the 386, but "proper" OS support for anything near its full capacity didn't appear until years later.

Quote
Nostalgic references to unrelated technology are not an argument. It's 2008 and budget PCs come with a very capable dual-core CPU. If you want your software to run faster and continue to become faster with newer processors you have to go multi-core sooner rather than later.

Quote
The fact of the matter is that multi-core processor architectures will not change drastically for the foreseeable future. You'll still need significant changes to the software to make use of the extra cores. And you can't avoid the issue by referring to mythic hardware.

Quote from: c0d1f1ed on May 14, 2008, 09:45:29 AM
Multi-core programming is in its infancy and it's going to require innovation on the software front (both low-end and high-end), and hardware front to achieve superior results.

hutch-- · June 02, 2008, 01:07:11 AM

c0d1f1ed,

Quote
The fact of the matter is that multi-core processor architectures will not change drastically for the foreseeable future. You'll still need significant changes to the software to make use of the extra cores. And you can't avoid the issue by referring to mythic hardware.

This is probably the major area of our disagreement. I see that multicore processing will change in almost unimaginable ways in the next 5 to 10 years, if there is not another technical breakthrough in between, and the techniques that are being proposed at the moment, as well as current OS design techniques, will end up in the dustbin of history.

I have known for a couple of years that both Intel and AMD have 32 and 64 core processors in the early design stages and this type of core arrangement is well beyond the co-operative multitasking you have alluded to over the range of this debate.

There is memory technology in the pipeline that is vastly faster than current memory, and it is also well known that memory is still the major bottleneck in current software performance, so there are major performance gains here.

It is simply a mistake to assume that current technology contains the architecture of the future; there are lessons from the past where making the same assumption has often failed: 64k of memory, why would you ever need to go faster than a 33 MHz 486, etc etc ....

Real multicore/multiprocessor computing is ALREADY WITH US, it's just that US$120,000 tops what most people wish to spend on a desktop. But the model is clear: forget trivial fudges of up to 1.9 times faster than a single core, think of dozens to hundreds of times faster than a single core and you will have some idea of where multicore hardware is going.

I have a wait and see approach as I don't see the magnitude of hardware improvement yet and I doubt I will see it for a few years. Let some other patsy waste their time and money on hardware and techniques that will not last, just like the Win32s guys did, just like the Itanium development guys did, RISC boxes etc .....

When the hardware and OS support is there, this stuff will work and it will be fast, not just a couple of cores but far more and far faster.

c0d1f1ed · June 02, 2008, 09:18:01 AM

Quote from: MichaelW on June 01, 2008, 11:59:08 PM
As was the 386, but "proper" OS support for anything near its full capacity didn't appear until years later.

Which again has nothing to do with multi-core. Multi-processor support has been widespread for some time so when multi-cores appeared they were fully supported. If you think there's still a lot missing, please specify.

Quote
Quote from: c0d1f1ed on May 14, 2008, 09:45:29 AM
Multi-core programming is in its infancy and it's going to require innovation on the software front (both low-end and high-end), and hardware front to achieve superior results.

Read the context. While multi-core hardware is going to make significant progress (more cores, faster inter-core communication, etc.), there is no magical technology that will make it as easy as sequential programming to make use of the full concurrent processing capacity, unless you use high-level tools that help extract the parallelism.

MichaelW · June 02, 2008, 10:37:30 AM

Quote
Which again has nothing to do with multi-core. Multi-processor support has been widespread for some time so when multi-cores appeared they were fully supported. If you think there's still a lot missing, please specify.

Please define "fully supported".

hutch-- · June 02, 2008, 12:41:17 PM

c0d1f1ed,

Have a look at this thread and you will see that your assumptions are unsound. In particular, look at the graphs that Greg has posted; they show clearly that a single thread is being processed by both cores. Forget abstraction, high-level tools to make it all easier and magical high-level libraries that will do it all for you; this IS being done in hardware on a modern dual core Intel processor. The future is more of the same but with many more cores interfaced in hardware.

http://www.masm32.com/board/index.php?topic=9297.0

PBrennick · June 02, 2008, 01:29:59 PM

Hutch,

By now, after all the experiences we have had with that type of scenario over the years, this has really become an axiom, hasn't it?

Quote
This is probably the major area of our disagreement. I see that multicore processing will change in almost unimaginable ways in the next 5 to 10 years, if there is not another technical breakthrough in between, and the techniques that are being proposed at the moment, as well as current OS design techniques, will end up in the dustbin of history.

Well said. Any way you look at it, this happens because (not when) hardware technology out-paces software development, doesn't it? And it has happened over and over again.

Paul

johnsa · June 03, 2008, 09:02:36 PM

If I am understanding correctly ... from this graph ... what is actually going on inside the cores/OS without us even knowing (perhaps in some undocumented attempt) is that Intel/AMD/MS are already looking at ways to get the cores to automatically handle the processing load of sequential code without actually having to "multi-thread" at all... which in my mind is the perfect solution... IE: no solution :)
Don't multi-thread, and let the CPU work out how best to split the instructions it receives amongst its cores.

hutch-- · June 04, 2008, 12:49:33 AM

It's my guess that over time, with the core count increase, we will see a technique of core clustering in much the same way as the 4 core parts built from two Core 2 Duo dies work at the moment, but on a much larger scale. It means that both ends of the spectrum are being approached, so that close range linear code will benefit from automatic scheduling between 2 or more cores in a cluster, while the number of clusters of cores will improve the performance of multithreaded code by at least a factor of the number of clusters.

I would expect to see later OS versions with far more accurate thread synchronisation methods that have far finer granularity than current OS versions, which should allow greater parallelism between multiple clusters, each in itself scheduling instructions within a single thread across the number of cores in a cluster.

NightWare · June 04, 2008, 02:07:14 AM

Quote from: hutch-- on June 04, 2008, 12:49:33 AM
I would expect to see later OS versions with far more accurate thread synchronisation methods that have far finer granularity than current OS versions, which should allow greater parallelism between multiple clusters, each in itself scheduling instructions within a single thread across the number of cores in a cluster.

:lol stop dreaming, it will not happen (I mean at OS level). MS has never used hardware improvements (like cmov, mmx, etc...) in their OS like they should, and only reserve that for extra stuff (like DirectX). Besides, threads at OS level are only made to deal with a lot of different programs, nothing more... on the contrary, at hardware level it's an attempt at parallelism... it's different. So if you put yourself in MS's point of view, why code something if it's automatically done at hardware level?

Now, it's good for us (asm coders): it's easy to develop a system to efficiently share a task with simple boolean operators (until it's ^2), something similar to Intel's hardware system... Besides, why code in asm if it's to let the OS do the job for us? (and remember, here we speak of the guys who coded win3.11, win95, win98, win98se, winMe and more recently winVista ! :bg)

hutch-- · June 04, 2008, 02:19:33 AM

:bg

We probably differ here; once a capacity is built into hardware, some years later MS tend to put it to use. 386DX multitasking was eventually put into early 32 bit NT, as the OS would not work on earlier stuff. I see multiprocessor/core hardware in much the same light: current OS versions only touch the fringe of its capacity, and until there are both major hardware and software changes, this will not change much.

Now while it may have to wait for Windows Galaxy to be properly implemented with a minimum hardware spec, it is inevitable that both approaches will see development, close range hardware controlled core synchronisation and independent threads on different clusters, and in this sense true parallelism will become a reality. Just don't hold your breath waiting. :P

c0d1f1ed · June 04, 2008, 09:44:11 PM

Quote from: hutch-- on June 02, 2008, 01:07:11 AM
This is probably the major area of our disagreement. I see that multicore processing will change in almost unimaginable ways in the next 5 to 10 years, if there is not another technical breakthrough in between, and the techniques that are being proposed at the moment, as well as current OS design techniques, will end up in the dustbin of history.

Please name these breakthroughs, or at least explain in detail how they would work in theory. You're claiming fantastic advancements in hardware that will make it unnecessary to explicitly do concurrent processing, so let's hear about them.

Quote
I have known for a couple of years that both Intel and AMD have 32 and 64 core processors in the early design stages and this type of core arrangement is well beyond the co-operative multitasking you have alluded to over the range of this debate.

Why? With high-level languages there is plenty of opportunity to extract a high level of parallelism. Again, look at RapidMind and SystemC for inspiration. Also please explain to me how you're going to program a 64 core in pure and straightforward assembly if you're not even considering programming much simpler dual- and quad-cores today. Extracting parallelism beyond just a few instructions is a high-level software problem. Only the developer (or a high-level tool) knows what tasks are independent and therefore can run concurrently. Single-threaded programming has no future.

Quote
There is memory technology in the pipeline that is vastly faster than current memory, and it is also well known that memory is still the major bottleneck in current software performance, so there are major performance gains here.

Again, name that technology. Are you talking about Z-RAM, embedded RAM, etc? That's all great stuff but it will equally benefit each core and won't help single-core software reach the performance of multi-core software.

Quote
It is simply a mistake to assume that current technology contains the architecture of the future; there are lessons from the past where making the same assumption has often failed: 64k of memory, why would you ever need to go faster than a 33 MHz 486, etc etc ....

You're referring to totally unrelated things. In the past, every upgrade of the CPU and memory made your software faster and made it easier to program for them. This just doesn't hold for multi-core. To put more transistors to work you have to explicitly make them do independent tasks.

Quote
Real multicore/multiprocessor computing is ALREADY WITH US, it's just that US$120,000 tops what most people wish to spend on a desktop. But the model is clear: forget trivial fudges of up to 1.9 times faster than a single core, think of dozens to hundreds of times faster than a single core and you will have some idea of where multicore hardware is going.

We're talking about mainstream systems here. But either way the number of cores is increasing. If you're going to wait till we have dozens of cores you'll have missed at the very least a decade of opportunity to write faster software than the competition.

Quote
I have a wait and see approach as I don't see the magnitude of hardware improvement yet and I doubt I will see it for a few years. Let some other patsy waste their time and money on hardware and techniques that will not last, just like the Win32s guys did, just like the Itanium development guys did, RISC boxes etc .....

Feel free to wait and see. But you'll be waiting forever to make your software faster. Amateurs using the tools and frameworks for multi-core programming will write faster software than you.

Quote
When the hardware and OS support is there, this stuff will work and it will be fast, not just a couple of cores but far more and far faster.

That's just wishful thinking. The brightest people in the industry and the academic world haven't come up yet with a realistic way to make single-core performance scale like multi-core performance could. They also don't think any significant O.S. support is missing. A big topic right now is transactional memory, but while it might allow software to scale beyond, say, eight cores, it's not a silver bullet by any stretch. In particular, you still have at least the same architectural complexity needed for efficient dual- and quad-core programming. It's just yet another synchronization primitive that might be added to the concurrent programming toolbox, out of necessity.

c0d1f1ed · June 04, 2008, 09:47:37 PM

Quote from: MichaelW on June 02, 2008, 10:37:30 AM
Please define "fully supported".

You can create threads, suspend them, and resume them. That's pretty much everything you need to have a thread per core and have them schedule and process tasks. Synchronization primitives can be implemented at the application level with no need for O.S. interaction.
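A small C sketch of what an application-level synchronization primitive can look like: a test-and-set spinlock built on a C11 atomic_flag and shared by ordinary POSIX threads. Only thread creation and joining go through the OS; taking and releasing the lock is a single atomic operation in user space. tas_lock_t, NTHREADS, ITERS and run_workers are hypothetical names, and a production lock would add a pause hint and back-off.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;
} tas_lock_t;

static void tas_lock(tas_lock_t *l)
{
    /* Spin until the flag was previously clear; no system call involved. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void tas_unlock(tas_lock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

#define NTHREADS 4      /* assumed: roughly one thread per core */
#define ITERS 10000

static tas_lock_t counter_lock = { ATOMIC_FLAG_INIT };
static long counter;    /* plain variable, protected by counter_lock */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        tas_lock(&counter_lock);
        counter++;
        tas_unlock(&counter_lock);
    }
    return NULL;
}

/* Start the workers, wait for them, and return the final count,
   or -1 if not every thread could be started. */
static long run_workers(void)
{
    pthread_t tid[NTHREADS];
    int started = 0;
    counter = 0;
    for (int k = 0; k < NTHREADS; k++) {
        if (pthread_create(&tid[k], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int k = 0; k < started; k++) {
        int rc = pthread_join(tid[k], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return started == NTHREADS ? counter : -1;
}
```

If the lock works, every increment survives and run_workers() returns NTHREADS * ITERS.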
