
why is multicore such a load of junk?

Started by johnsa, October 02, 2008, 10:20:42 AM


johnsa


With 1 thread the first core is maxed out (50% of the total available CPU). With both threads, both cores are fully maxed out (100% of the CPU used), and the final timings are identical (in fact 2 threads is slightly slower).

I've tried this same approach when testing various other algorithms, and so far, as I mentioned previously, not one has gained even 20% from having 2 cores thrown at it.

So if normalizing vectors is a bad example (I thought it was a pretty real-world one), then how about doing the same test but calculating Fresnel factors or something? I can tell you now... it will end in tears, with no performance gain from multi-core :)
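
For reference, the shape of the test is roughly this (an illustrative sketch rather than my exact code; the array size and names are made up):

Code:
// Normalize a large array of vectors; run the whole array on one
// thread, or split it in half across two threads. Sketch only.
#include <windows.h>
#include <math.h>

struct Vec3 { float x, y, z; };
const int COUNT = 1 << 22;                  // ~4M vectors (illustrative)
static Vec3 g_vecs[COUNT];

struct Range { int begin, end; };

DWORD WINAPI NormalizeRange(LPVOID p)
{
    Range* r = (Range*)p;
    for (int i = r->begin; i < r->end; ++i) {
        Vec3& v = g_vecs[i];
        float len = sqrtf(v.x*v.x + v.y*v.y + v.z*v.z);
        if (len > 0.0f) { v.x /= len; v.y /= len; v.z /= len; }
    }
    return 0;
}

int main()
{
    // Two threads, half the array each; the 1-thread case just calls
    // NormalizeRange once with the full range.
    Range lo = { 0, COUNT / 2 }, hi = { COUNT / 2, COUNT };
    HANDLE t[2];
    t[0] = CreateThread(NULL, 0, NormalizeRange, &lo, 0, NULL);
    t[1] = CreateThread(NULL, 0, NormalizeRange, &hi, 0, NULL);
    WaitForMultipleObjects(2, t, TRUE, INFINITE);
    CloseHandle(t[0]); CloseHandle(t[1]);
    return 0;
}

The measurement is just a wall-clock timer around the launch/wait pair.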

Mark_Larson

Quote from: johnsa on November 23, 2008, 11:38:32 AM

With 1 thread the first core is maxed out (50% of the total available CPU). With both threads, both cores are fully maxed out (100% of the CPU used), and the final timings are identical (in fact 2 threads is slightly slower).

I've tried this same approach when testing various other algorithms, and so far, as I mentioned previously, not one has gained even 20% from having 2 cores thrown at it.

So if normalizing vectors is a bad example (I thought it was a pretty real-world one), then how about doing the same test but calculating Fresnel factors or something? I can tell you now... it will end in tears, with no performance gain from multi-core :)

There is actually an open-source raytracer that is done as a tutorial on devmaster.net.

I'll send you the link. They actually have 7 tutorials with source code, but I could only get the first 3 to compile.


That way we have a real raytracing example, and then we can see what happens. Yeah, Quake 3 had the same problem: they only got a 20-30% speed-up from going multi-core. So in general your statement is accurate, but there are special cases.


This is the first tutorial; it links to the rest. The PDF I posted came from this tutorial.
http://www.devmaster.net/articles/raytracing_series/part1.php

BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
http://www.website.masmforum.com/mark/index.htm

vanjast

Has anybody mentioned yet that with a dual/quad core you can get the same amount of work done with less wasted power?
The CPU runs cooler...
That's if the OS is optimised for this, that is.
:8)

BogdanOntanu

Quote from: vanjast on November 24, 2008, 11:36:24 PM
Has anybody mentioned yet that with a dual/quad core you can get the same amount of work done with less wasted power?
The CPU runs cooler...
That's if the OS is optimised for this, that is.
:8)

No, nobody mentioned that, because it is NOT true.

Think a little... if you have a dual-core CPU with a fixed number of transistors, it consumes a certain amount of power, say 40W.

It is then simple logic that IF you cut the chip in half THEN it will consume 40W/2 = 20W, dissipate less power, and run much cooler than the dual core. The same logic works for 4x or 8x cores. Yes, you can stop parts of the CPU, BUT by design a multi-core CPU shares some parts, and you cannot stop those because they are needed by the core that is still running. And if you do stop them, then you lose the "multi-core advantage".

Hence what you say is illogical.

However, it is true that the faster but "older" P4 CPUs are usually made in a less advanced process technology than the "new" dual/quad cores, and because of this they do consume much more power. But this is done on purpose, in order to promote the new multi-core CPUs and to diminish the importance of faster single cores, because apparently they consume more and are harder to produce. Simple market manipulation.

Yes, in practice the faster you switch (depending on the technology), the more power "might" be lost in the transition phases. However, adding 2x or 4x the number of switching elements and packing them onto the same chip is not going to help either. Having shared parts that can never be switched off, but have to be designed to accommodate 2x or 4x the load, does not help either.

One way to reduce internet power consumption would be to use a much smaller picture in your signature, and perhaps a less offensive one. More space for text information... :P
Ambition is a lame excuse for the ones not brave enough to be lazy.
http://www.oby.ro

Mirno

The last comparable pair of single- and dual-core processors I could find was the Yonah core (not Core 2): Core Duo vs Core Solo.
The 65nm Core Solo is quoted with a TDP of 27 watts, while its dual-core equivalent is only 4 watts more.

So it appears that, while it may be "illogical", there are in fact power savings. Further to this, both Intel and AMD have learnt much from their initial forays into the dual-core world, and now have separate power planes for each core (as well as for the memory controller and the I/Os), so the power saving should be reflected more evenly when the load on the two cores is unbalanced.


Mirno

BogdanOntanu

Quote
So it appears that, while it may be "illogical", there are in fact power savings.

This is like this because "they" want it to be so ;)

The manufacturers can, and in fact will, promote what they want to produce in the future, and will "leave behind" in more or less "subtle" ways the products that they do not want to sell anymore.

Anyway, the "writing is on the wall": good or bad, the near future belongs to multi-core CPU designs, because the producers say so and the consumers have no way to influence this. And it might well be the "whole" future, not only the "near" future.

"Junk" or "not junk" does not really matter. Just accept it as it is and make the best of it, because you cannot produce your own CPU :D

Ambition is a lame excuse for the ones not brave enough to be lazy.
http://www.oby.ro

Mirno

If by "they", you mean the laws of physics, then yes.
Die shrinkage isn't the solution it once was - electrical leakage becomes a more and more dominant figure with each sucessive manufacturing process, and we can't just ramp the clocks up anymore.
Materials sciences have been pushed pretty far, but we simply cannot disipate the heat that the silicon would produce at the higher levels. Both Intel and AMD's latest architectures both hit 6+ GHz, but they need liquid nitrogen cooling to do it. There isn't any more speed available with air cooling.
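
To put rough numbers on why (a textbook first-order model, assuming dynamic switching power dominates and that supply voltage has to scale roughly with clock frequency):

P_dyn ≈ a·C·V^2·f, and with V ∝ f this gives P_dyn ∝ f^3

So doubling the clock of a single core costs roughly 8x the dynamic power, while adding a second core at the same clock costs roughly 2x. For workloads that parallelize, the multi-core route is far cheaper thermally; leakage and shared logic eat into the ideal numbers, but the direction is the same.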

The fact that even ARM are looking at dual-core processor designs says a lot about the direction of the industry, and I'm inclined to believe they (and the VHDL engineers I worked with, and all the other industry analysts) have good reasons for backing multi-core.

Mirno

vanjast

Quote from: BogdanOntanu on November 25, 2008, 07:57:02 AM
Yes, in practice the faster you switch (depending on the technology), the more power "might" be lost in the transition phases.
A Freudian slip here, maybe... :wink I realise the numbers, but haven't looked properly into this; I vaguely remember seeing it mentioned someplace.
Maybe it was power-dissipation efficiency... lower clock speeds + a slightly bigger die area = better cooling (i.e. running cooler).

Quote from: BogdanOntanu on November 25, 2008, 07:57:02 AM
One way to reduce internet power consumption would be to use a much smaller picture in your signature, and perhaps a less offensive one. More space for text information... :P
...the exact reason why I didn't post a pic of myself...  :bg :green2

Rockoon

It is not johnsa's benchmark that I disagree with, but rather his conclusion:

Quote... but if something like this has no benefit from 2+ cores... then it follows logically that 90% of all the code you're going to be using/writing, especially in the time-critical parts, won't either.

It follows logically that the specific algorithm in question is bottlenecked on something...

...and until we identify what that something is, we cannot attribute it to 90% of all algorithms, or even to 1%.

Both Intel and AMD provide tools for identifying such things (VTune and CodeAnalyst).


Someone asked whether the single-process version used 100% (50%) of a core...

...he was barking up the wrong tree, because no time at all is ever spent in the task scheduler's idle loop on the core running it (the thread never sleeps or waits for an event), so by definition it's at 100% usage until it's done.


When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

johnsa

Ok.. here is how I see it..

This debate has gone on for quite some time and across numerous threads. I am not saying that what I've said is gospel; it is merely my opinion (albeit based on a fair amount of experimentation).
I would like to approach this in a scientific manner, and am open to debate. I'll be the first to admit it if I am wrong (and gladly so, as it would imply a brighter future for development).
So with that said, the only thing I can suggest is that we come up with an agreed-upon test-bench to prove or disprove my argument. I have presented a small sample above, which you may or may not agree is a good test.
I believe it to be a real-world example; so far, no one who has disagreed with me has presented anything that we can all re-compile/assemble and test with multi-core options to prove that there is performance to be gained from doing so.

I'm up for the challenge. Anyone else want to jump in and help put something together that we can switch between 1/2/4 cores and see the benefit (something that everyone can agree is a "real-world" example)?

The real-time raytracer would be good, but it is quite a large undertaking just to prove the case; perhaps something on a smaller scale, but more inclusive than my simple vector sample, along the lines of the harness sketched below.
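
Something of this shape, perhaps (just a sketch of the harness; the Worker body is a placeholder kernel, to be swapped for whatever workload we agree on):

Code:
// Test-bench sketch: run the same fixed total workload with 1, 2
// and 4 threads and print the wall-clock timings. The Worker body
// is a placeholder kernel, not a proposed "real-world" workload.
#include <windows.h>
#include <cstdio>

const int MAX_THREADS = 4;
const int TOTAL_ITERS = 200000000;      // fixed total amount of work

DWORD WINAPI Worker(LPVOID param)
{
    int iters = (int)(INT_PTR)param;    // this thread's share of the work
    volatile double acc = 0.0;          // volatile so the loop isn't optimized away
    for (int i = 0; i < iters; ++i)
        acc += (double)i * 0.5;
    return 0;
}

double Run(int numThreads)
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);

    HANDLE h[MAX_THREADS];
    int perThread = TOTAL_ITERS / numThreads;
    for (int t = 0; t < numThreads; ++t)
        h[t] = CreateThread(NULL, 0, Worker, (LPVOID)(INT_PTR)perThread, 0, NULL);
    WaitForMultipleObjects(numThreads, h, TRUE, INFINITE);
    for (int t = 0; t < numThreads; ++t)
        CloseHandle(h[t]);

    QueryPerformanceCounter(&t1);
    return (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
}

int main()
{
    for (int n = 1; n <= MAX_THREADS; n *= 2)
        printf("%d thread(s): %.3f seconds\n", n, Run(n));
    return 0;
}

With perfect scaling the 2-thread run should take half the time of the 1-thread run, and the 4-thread run a quarter; whatever kernel we drop in, the harness itself stays the same.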

Rockoon

If you want to approach it scientifically, then fire up VTune or CodeAnalyst and gather some data. Scientists measure.
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

johnsa

I have been measuring... using VTune and custom timers.

At the end of the day, if an algorithm completes a finite amount of work in a set time, and that time decreases, it will be perceived as increased performance. That's a fact, and it can be measured in MANY ways: timers, visually, using a watch, or using profiling tools (which will obviously give more detail as to where and how the time is used).

Nobody has been able to prove me wrong, apart from little jests and sarcastic comments.

In any event, thanks for your very unhelpful post. :)

sinsi

Have a look at stuff like Hyper-V, with each server running by itself - that's why new CPUs have virtualization. Instead of 4 servers, each with its own machine, we have one quad-core running all 4. A single point of failure, sure, but hardware seems to last longer nowadays.
Light travels faster than sound, that's why some people seem bright until you hear them.

Rockoon

Quote from: johnsa on November 28, 2008, 07:54:14 AM
I have been measuring... using VTune and custom timers.

Timers cannot answer the question posed. VTune should be able to tell you how often cache misses are happening, how often branches are mispredicted, and so forth.

Where is the extra 100% of the time spent?


When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

nuckfuts

Quote from: sinsi on November 28, 2008, 08:07:22 AM
Have a look at stuff like Hyper-V, with each server running by itself - that's why new CPUs have virtualization. Instead of 4 servers, each with its own machine, we have one quad-core running all 4. A single point of failure, sure, but hardware seems to last longer nowadays.

A single point of failure, yeah, but if you *need* all four servers, 1 point is better than 4 points. Not to mention it's virtualized, so moving it onto other hardware could happen pretty darn fast, if not automatically.