The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: hutch-- on April 20, 2008, 10:21:19 AM

Title: Idea for multithread memfill.
Post by: hutch-- on April 20, 2008, 10:21:19 AM
I remember a thread recently that produced a faster memory fill algo, and at the time I made a suggestion that separating the memory fill from the work done on the memory may be faster if one could be done in a separate thread from the other.

Below is the rough idea of what I was suggesting. It is a long way from being fast enough due to creating a new thread for each memory fill, but there may be a way to keep the fill thread running and signal it by inter-thread communication when to fill a buffer.


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    tdata STRUCT
      pbuffer dd ?      ; buffer address
      bufsize dd ?      ; buffer size
    tdata ENDS

    start_fill_thread PROTO :DWORD
    fill_thread  PROTO :DWORD

    .data?
      signal dd ?
      loaded dd ?

    .code

start:
   
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    call main
    inkey
    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    memcnt equ <1024*1024*100>

main proc

    LOCAL pmem1 :DWORD
    LOCAL pmem2 :DWORD
    LOCAL tdat  :tdata

    mov pmem1, alloc(memcnt)                        ; allocate 2 blocks of memory
    mov pmem2, alloc(memcnt)

  ; ||||||||||||||||||||||||||||||||||||||||||||||||
  ; fill pmem1 while working on pmem2
  ; ||||||||||||||||||||||||||||||||||||||||||||||||

    mov eax, pmem1
    mov DWORD PTR tdat.pbuffer, eax
    mov DWORD PTR tdat.bufsize, memcnt              ; 100 meg buffer

    invoke start_fill_thread,ADDR tdat              ; start new thread

  ; ************************************
  ; do your own processing here on pmem2
  ; ************************************

  spinlock0:
    invoke SleepEx,1,0
    cmp signal, 0                                   ; test if thread is finished yet
    jne spinlock0
    print "Thread finished",13,10

  ; ||||||||||||||||||||||||||||||||||||||||||||||||
  ; fill pmem2 while working on pmem1
  ; ||||||||||||||||||||||||||||||||||||||||||||||||

    mov eax, pmem2
    mov DWORD PTR tdat.pbuffer, eax
    mov DWORD PTR tdat.bufsize, memcnt              ; 100 meg buffer

    invoke start_fill_thread,ADDR tdat              ; start new thread

  ; ************************************
  ; do your own processing here on pmem1
  ; ************************************

  spinlock1:
    invoke SleepEx,1,0
    cmp signal, 0                                   ; test if thread is finished yet
    jne spinlock1
    print "Thread finished",13,10

  ; ||||||||||||||||||||||||||||||||||||||||||||||||

    free pmem2
    free pmem1

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

start_fill_thread proc pstruct:DWORD

    LOCAL tID   :DWORD

    mov loaded, 0                       ; clear the loaded flag

    invoke CreateThread,0,0,fill_thread,pstruct,0,ADDR tID

  spinlock:
    invoke SleepEx,1,0
    cmp loaded, 1
    jne spinlock

    print "Thread started",13,10

    mov eax, tID

    ret

start_fill_thread endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

fill_thread proc pstruct:DWORD

    LOCAL pbuf  :DWORD
    LOCAL sbuf  :DWORD

    push ebx
    push esi
    push edi

    mov signal, 1                       ; set the signal flag
    mov edi, pstruct

    lea ecx, (tdata PTR [edi]).pbuffer  ; copy structure data to LOCAL
    mov ecx, [ecx]
    mov pbuf, ecx

    lea ecx, (tdata PTR [edi]).bufsize  ; copy structure data to LOCAL
    mov ecx, [ecx]
    mov sbuf, ecx

    mov loaded, 1                       ; set the loaded flag for the calling proc

    invoke memfill,pbuf,sbuf,0

    pop edi
    pop esi
    pop ebx

    mov signal, 0                       ; clear the signal flag

    ret

fill_thread endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
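The "keep the fill thread running" idea from the opening post — which the template above does not do yet, since it calls CreateThread for every fill — can be sketched in a few lines of a higher-level language. This is an illustrative Python sketch only (all names invented; a Win32 version would use CreateEvent/SetEvent/WaitForSingleObject in place of threading.Event):

```python
import threading

class FillWorker:
    """One long-lived worker thread that zero-fills buffers on request,
    so the cost of creating a thread is paid only once."""
    def __init__(self):
        self.work_ready = threading.Event()   # signal: "a buffer needs filling"
        self.work_done = threading.Event()    # signal: "the fill has finished"
        self.work_done.set()                  # worker starts out idle
        self.buffer = None
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            self.work_ready.wait()            # sleep until signalled, no spinning
            self.work_ready.clear()
            self.buffer[:] = bytes(len(self.buffer))  # the "memfill"
            self.work_done.set()

    def fill_async(self, buf):
        self.work_done.wait()                 # any previous fill must finish first
        self.work_done.clear()
        self.buffer = buf
        self.work_ready.set()                 # wake the worker

worker = FillWorker()
buf1 = bytearray(b"\xff" * 4096)
worker.fill_async(buf1)                       # fill buf1 in the worker...
# ...while the main thread would process the other buffer here
worker.work_done.wait()                       # equivalent of the spinlock0 wait
```

The two events replace the `signal`/`loaded` global flags and the SleepEx polling loops in the template; the worker blocks instead of being recreated per buffer.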

[attachment deleted by admin]
Title: Re: Idea for multithread memfill.
Post by: u on April 20, 2008, 01:36:36 PM
By the time the thread starts, the first thread would have filled dozens of megabytes.
Triple-buffering, with FIFO control of a second thread that stays alive (only on multicore systems), is the way.

By the way, why are we zeroing so much memory again and again? I see too few cases where zeroing MBs at a time, like 100 times/second, is relevant :P. Usually, just overwriting memory is enough.
Title: Re: Idea for multithread memfill.
Post by: hutch-- on April 20, 2008, 02:38:50 PM
 :bg

In ancient times (10 years ago) I remember sprites starting as blanks with progressive blitting onto the sprite and finally blitting the sprite to screen. The general idea with the example is to utilise multicore processors if a way can be found that is fast enough to communicate with a slave thread.
Title: Re: Idea for multithread memfill.
Post by: NightWare on April 20, 2008, 09:41:12 PM
Quote from: Ultrano on April 20, 2008, 01:36:36 PM
Usually, just overwriting memory is enough.
yep, it's always interesting to see another approach, but i think attributing a specific task to a cpu will certainly produce a better result... but i'm going to try it... just in case...   :wink

EDIT :

hutch,
    lea ecx,(tdata PTR [edi]).pbuffer  ; copy structure data to LOCAL
    mov ecx,[ecx]

can be reduced to :
    mov ecx,DWORD PTR (tdata PTR [edi]).pbuffer
or (shorter) :
    mov ecx,(tdata PTR [edi]).pbuffer
1 instruction less, + 1 stall avoided. and you can also directly use :
    invoke memfill,(tdata PTR [edi]).pbuffer,(tdata PTR [edi]).bufsize,0

now, concerning multithreading, i think it's not a good solution... i mean not in this form... creating a thread is far too slow... so it has to be created beforehand, but just one thread (or one extra thread), where all the "background" tasks are defined... (not fill only) and you jump to those tasks when needed, so you should have something like :
ThreadBackgroundTasks PROC
; some init

BackgroundTasksLoop:

NotYet:
cmp BackgroundThread,TRUE
jne NotYet

;here your tasks :

; mov eax,TasksFlags ; 32 bits so 32 tasks possible, it's certainly enough
; and eax,TaskNumberX ; keep the corresponding bit
; jz AvoidTaskNumberX ; if the bit isn't here, avoid the task
;TaskX:
;
;
;AvoidTaskNumberX:


cmp BackgroundThreadEnd,TRUE ; end of the loop
jne BackgroundTasksLoop

; here, to be sure the thread is finished before the main thread/program
invoke SendMessage,WinMainCopy,WM_CLOSE,NULL,NULL
; ret not needed
ThreadBackgroundTasks ENDP

while your program runs, here BackgroundThread (and BackgroundThreadEnd) and TasksFlags have to be set by your program or your main thread (and they must be global variables to be constantly accessible). unfortunately, the loop spins until it's ok (so there is useless cpu usage... unless someone knows a solution to avoid this...)
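On the "useless cpu usage" point: the spin can be avoided by blocking on a condition variable, so the background thread sleeps until a task bit is posted. A hypothetical sketch in Python (the task-bit word mirrors the TasksFlags idea above; everything else is invented for illustration):

```python
import threading

TASK_FILL   = 1 << 0      # hypothetical task bits, in the spirit of TasksFlags
TASK_DECODE = 1 << 1

task_flags = 0
shutting_down = False
cv = threading.Condition()
done = []                  # records which tasks ran, for demonstration only

def background_tasks():
    global task_flags
    while True:
        with cv:
            # Block instead of spinning: no cpu burned while idle.
            cv.wait_for(lambda: task_flags or shutting_down)
            flags, task_flags = task_flags, 0
            quit_now = shutting_down and not flags
        if quit_now:
            return
        if flags & TASK_FILL:       # TaskX body would go here
            done.append("fill")
        if flags & TASK_DECODE:
            done.append("decode")

def post_task(bit):
    global task_flags
    with cv:
        task_flags |= bit
        cv.notify()                 # wake the background thread

def shutdown():
    global shutting_down
    with cv:
        shutting_down = True
        cv.notify()

worker = threading.Thread(target=background_tasks)
worker.start()
post_task(TASK_FILL)
post_task(TASK_DECODE)
shutdown()                          # plays the role of BackgroundThreadEnd
worker.join()
```

MichaelW's numbers further down the thread suggest the trade-off: an event/condition wait costs a few microseconds to wake, versus burning a whole core to get instant response from a raw spin.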

Title: Re: Idea for multithread memfill.
Post by: hutch-- on April 21, 2008, 10:21:31 AM
You have two APIs, SuspendThread() and ResumeThread(), that would be useful to turn a thread on and off, but I have not yet seen a fast signalling process so a master process can signal a slave thread to do something.
Title: Re: Idea for multithread memfill.
Post by: MichaelW on April 21, 2008, 04:09:28 PM
The attachment is an attempt to determine how long it takes for a thread to respond to a global flag. Typical results on my Windows 2000 P3 system:

61139 us, mean waiting for end of time slice
20 us, mean using Sleep, 0
6 us, mean using event wait



[attachment deleted by admin]
Title: Re: Idea for multithread memfill.
Post by: NightWare on April 23, 2008, 01:56:03 AM
here is an attempt at parallelism... since you can't use 2 threads with the same data (or you have to use CreateProcess+CreateRemoteThread),
i've just used one thread to work with the main program; in all cases the cpus are 100% used.

Results of the tests :


Test 1 (default hardware), done in 4373819 cycles
Test 2 (parallelism attempt), done in 4584532 cycles
Test 3 (attempt with another algo), done in 4583756 cycles
Test 4, done in 12 cycles
Test 5, done in 12 cycles
Test 6, done in 12 cycles
Test 7, done in 12 cycles
Test 8, done in 12 cycles

Press ENTER to quit...




[attachment deleted by admin]
Title: Re: Idea for multithread memfill.
Post by: codewarp on May 07, 2008, 10:46:12 PM
Hutch,

First time back to this board after several years.  During those years, I have been doing a great deal of work to achieve maximum performance out of multi-core CPUs.  The conclusion, over and over again, is to minimize interaction between CPUs by keeping them fully occupied in separate memory to the greatest extent possible.  Synchronizing threads is necessary for logic, but a killer on performance.  So initializing data in a separate thread could really only benefit performance if you really have something else to do in the mean time, and often enough to make a difference.  Minimizing the rate of bus-locked instructions by making them unnecessary with better logic--this is the goal.

I am glad to see this site is still active.
Title: Re: Idea for multithread memfill.
Post by: evlncrn8 on May 08, 2008, 06:38:00 AM
Quote from: hutch-- on April 21, 2008, 10:21:31 AM
You have two APIs SuspendThread() and resumeThread() that would be useful to turn a thread on and off but I have not yet seen a fast signalling process so a master process can signal a slave thread to do something.

using events might do the trick, or a mutex?... either way, the delay caused in hopping to the other thread probably kills any possible benefit you might gain from having a separate proc doing the memfill... even using suspendthread/resumethread...
Title: Re: Idea for multithread memfill.
Post by: hutch-- on May 08, 2008, 11:14:19 AM
From what I have seen so far, the only place where multithreading seems to offer an advantage is with multiple processes that are reasonably sparse in terms of processor usage. Doing multiple downloads in separate threads is typical of this capacity.

Now with a memory intensive operation being performed on one core, I get the impression that the other core may not have completely free access at different memory addresses even if they are far enough apart. I know on a larger scale between two separate machines, if you use very large data sets you don't get any real problem with TCP/IP, but mechanisms like this are far too slow to be useful at close range with processor intensive work.

Ideas of this type may have to wait for different hardware with different capacity before they can be realised, as the current signalling methods between threads are too clunky and too slow to be of any real use.
Title: Re: Idea for multithread memfill.
Post by: codewarp on May 08, 2008, 04:52:42 PM
Hutch,

To use dual CPUs for performance you must (1) set them up to work independently, and (2) balance their loads so that they finish at the same time.  Of course (1) makes (2) difficult and elusive.   If they don't finish at the same time, you're back to single-processor performance for the remainder of the task--we don't want that.

My solution to this gets the performance equivalent of 1.9 CPUs :dance: (Opteron 170/2800).  I pair up two threads, and give them a queue of actions to perform.  The threads start from opposite ends of the queue, working their way toward the middle, so they both finish within one logical action of each other, no matter how uneven the workload is.  The beauty of this is that the entire thing is managed by a single, lock-free counter (of remaining actions).  The first thread to finish gets a count=0, the 2nd thread finishes with count=-1.  The only processor contention is on the counter, and none on the queue.
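The opposite-ends scheme is easy to sketch. CPython exposes no raw atomic integer, so a small mutex-guarded counter stands in below for the bus-locked `lock xadd`-style decrement being described; the structure — two threads, one shared count of remaining actions, no lock on the queue itself — is the point (names are invented for illustration):

```python
import threading

class AtomicCounter:
    """Stands in for a lock-free atomic decrement-and-fetch."""
    def __init__(self, n):
        self._n = n
        self._lock = threading.Lock()
    def dec(self):
        with self._lock:
            self._n -= 1
            return self._n

def run_pair(actions):
    """Two threads start at opposite ends of a fixed queue of actions
    and meet in the middle. The first finisher sees count 0, the
    second sees -1; since the claims always total len(actions), the
    two index ranges never overlap and the queue needs no locking."""
    remaining = AtomicCounter(len(actions))
    results = [None] * len(actions)

    def worker(start, step):
        i = start
        while remaining.dec() >= 0:   # claim one remaining action
            results[i] = actions[i]()
            i += step

    front = threading.Thread(target=worker, args=(0, 1))
    back = threading.Thread(target=worker, args=(len(actions) - 1, -1))
    front.start(); back.start()
    front.join(); back.join()
    return results

out = run_pair([(lambda k=k: k * k) for k in range(10)])
```

However uneven the per-action cost, neither thread idles until at most one action remains, which is the load-balancing property claimed above.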

Forget about calling system thread scheduling functions for managing anything resembling fine-grained computation, because they consume far too many clock cycles to be viable.  You must observe (1) and (2) if you want dual-processor performance from your own code.  The way you are doing it works fine, as long as you can tolerate the initial 6-8 microsecs of thread startup latency :eek.

There is a fundamental principle at work with broad relevance here:  Giving your synchronous duties away to asynchronous entities buys you a lot of waiting around.
Title: Re: Idea for multithread memfill.
Post by: NightWare on May 09, 2008, 01:25:46 AM
Quote from: hutch-- on May 08, 2008, 11:14:19 AM
Ideas of this type may have to wait for different hardware with different capacity before they can be realised, as the current signalling methods between threads are too clunky and too slow to be of any real use.
hmm... no and yes..., no, the ideas don't have to wait, i'll explain why after... and yes, signalling methods are deadly slow... now, reading you made me try a different approach... using different memory locations slows down the process because of the perpetual l2 shared cache exchanges... so i tried to do one task (3mb fillmem) but with the work shared between the main program and a thread... (with 4 dwords of decay), and i obtained a speed up of approximately 10% in a speed test against the same work made by the full equivalent algo... of course, here i just avoid the l2 cache exchanges (so there is benefit only in some cases...), but it's a start... unfortunately, instructions like rep stosd/stosd/movsd/movsb can't be used in this case...
Title: Re: Idea for multithread memfill.
Post by: bozo on May 09, 2008, 02:09:33 AM
Instead of using signalling, couldn't you use WaitForSingleObject() on the handle returned by CreateThread() ?
Or add ExitThread() to the fill_thread() routine? then use WaitForMultipleObjects() which would ensure all threads finish before continuing execution.
Title: Re: Idea for multithread memfill.
Post by: codewarp on May 09, 2008, 02:26:31 AM
Quote from: Kernel_Gaddafi on May 09, 2008, 02:09:33 AM
Instead of using signalling, couldn't you use WaitForSingleObject() on the handle returned by CreateThread() ?
Or add ExitThread() to the fill_thread() routine? then use WaitForMultipleObjects() which would ensure all threads finish before continuing execution.

WaitForAnyPrimitiveYouCanSyncOn( ) reduces your computational transaction speed to the speed of the Windows thread scheduler.  To get faster, you need to be running already and use lock-free methods.  Organize your methods so that the only time you wait is when you have nothing to do.
Title: Re: Idea for multithread memfill.
Post by: bozo on May 10, 2008, 05:13:23 AM
codewarp, WaitForSingleObject() on the handle returned by CreateThread() ?

where does it say use WaitForSingleObject inside fill_thread()?

if i were gonna implement such a routine, i'd distribute the total size of the buffer between N threads,
otherwise there is no point really... 1 routine is enough.
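The split-between-N-threads idea, sketched in Python purely for illustration (a Win32 version would CreateThread N times and then WaitForMultipleObjects on all the handles; here `join` plays that role):

```python
import threading

def parallel_fill(buf, value, nthreads=4):
    """Divide one buffer between N threads, each filling its own
    disjoint slice, then wait for all of them to finish."""
    n = len(buf)
    chunk = (n + nthreads - 1) // nthreads    # ceil(n / nthreads)

    def fill(lo, hi):
        buf[lo:hi] = bytes([value]) * (hi - lo)

    threads = [threading.Thread(target=fill,
                                args=(i * chunk, min((i + 1) * chunk, n)))
               for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:                         # the WaitForMultipleObjects step
        t.join()

buf = bytearray(b"\xaa" * 1000)
parallel_fill(buf, 0)
```

Because the slices are disjoint, no synchronization is needed beyond the final join — though, as other posts in this thread argue, a memory-bandwidth-bound fill may not actually get faster with more threads.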
Title: Re: Idea for multithread memfill.
Post by: Rockoon on May 10, 2008, 08:10:58 AM
The current x64 multi-core paradigm really isn't made for this.

This is in contrast to the current GPU's paradigm, where there are so many cores that having one sitting around waiting for the instant something in particular can be done is no big loss (For example, the 8800GT sells for ~US$150 these days and has a whopping 112 streaming cores.)

Both Intel and AMD are rapidly moving towards 8 cores, with an additional 32+ smaller streaming cores. This is when things will start to get interesting for tightly coupled threads.

The good news?

At least for Intel's Larrabee, these smaller streaming cores will use a subset of the x86 instruction set, so ASM programmers will be able to dive in quickly. They will begin by offering their streaming cores as an add-in card, but the current official roadmap has a 6x16 (6 CPU x 16 Streaming) single-chip codenamed Nehalem coming out sometime in 2009.

As far as AMD's Fusion goes, these streaming cores will simply be beefed up ATI GPU cores, which kind of stinks for ASM programmers. Probably the biggest mistake AMD will ever make, because the ATI GPU design isn't as good as NVidia's, meaning it can't compete on current graphics applications, while also being less than general purpose, so it cannot compete against Larrabee either. Things don't look good for long-term AMD competitiveness.

Edited to add:

NVidia is also playing this game, sort of. Their CUDA could potentially compete against Larrabee for the short term, but they can't integrate with the CPU, so in the end Intel is going to completely monopolize high performance desktop computing.
Title: Re: Idea for multithread memfill.
Post by: sinsi on May 10, 2008, 08:50:05 AM
Quote from: Rockoon on May 10, 2008, 08:10:58 AM
Edited to add:

NVidia is also playing this game, sort of. Their CUDA could potentially compete against Larrabee for the short term, but they can't integrate with the CPU, so in the end Intel is going to completely monopolize high performance desktop computing.

Intel to buy nvidia, like speculation a few months ago, like AMD buying ATI heh heh.

Let's face it, a GPU can kill a CPU (for certain things, fair enough), but look at gddr5 vs ddr3 - why can't intel/amd keep up with this?
Title: Re: Idea for multithread memfill.
Post by: codewarp on May 10, 2008, 05:39:08 PM
Quote from: Kernel_Gaddafi on May 10, 2008, 05:13:23 AM
codewarp, WaitForSingleObject() on the handle returned by CreateThread() ?

where does it say use WaitForSingleObject inside fill_thread()?

if i were gonna implement such a routine, i'd distribute the total size of buffer between N threads
otherwise there is no point really..1 routine is enough.
That's not the point.  The point is that every time you wait for a thread to do anything, you are adding tens of thousands of clock cycles to your own thread, giving them away to another process that IS ready to run now.  There is no benefit at all to memfills < 64kb, when it takes 5-10 microsecs of latency before your "helper" gets up to speed.  It's like hiring someone to do a job who takes a two-hour coffee break at the beginning of every task.

I can't believe anyone would even debate this point.  Multi-threading a memory fill to achieve better performance is simply not a well formed concept.  No one in their right mind would do this.  Rather, this example is an attempt to answer the larger question:  How can we take advantage of additional CPUs to enhance the performance of synchronous subtasks?  This is a good question to ask, but the answer is not a very satisfying or resounding yes.  Effective solutions come from understanding and working with the reality of multiple CPUs, not the fantasy.  Ultimately, real multi-cpu performance within an application can only come from free-running fully occupied threads that only minimally interact.  With a good lock-free queuing mechanism, you can benefit from giving other threads tiny jobs to do.  But the latency toll will still have to be paid--either from the WaitForStuff( ) calls, or from the jobs active on the queue remaining to be done at any given time.  And you still have to synchronize with the data state changes from other threads, before you can touch their results.

The resistance to this concept from the assembler point of view is understandable--queuing and related support mechanisms are complicated and substantial.  It's the kind of complexity that pushes software toward higher-level solutions and away from assembler.  Die-hard assembler programmers don't like to hear that (I used to be one, so I know).  Work with it and you can achieve, fight it and you won't.
Title: Re: Idea for multithread memfill.
Post by: c0d1f1ed on May 10, 2008, 08:37:13 PM
Quote from: Rockoon on May 10, 2008, 08:10:58 AM
For example, the 8800GT sells for ~US$150 these days and has a whopping 112 streaming cores.
That's 112 scalar units, not 112 cores. In fact it's 7 cores with SIMD units that are 16 elements wide.
Quote
Both Intel and AMD are rapidly moving towards 8 cores...
Plus the SIMD units are getting wider and deeper. AVX (http://softwareprojects.intel.com/avx/) extends the registers to 256-bit and SSE5 (http://developer.amd.com/cpu/SSE5/Pages/default.aspx) adds instructions that perform a multiplication and an addition in one. You do the math. :8)

What's still lacking though is parallel scatter/gather instructions. Writing and reading elements from different memory locations is quickly becoming the biggest bottleneck with arithmetic power going up. I'm curious when and how Intel/AMD will address this...
Title: Re: Idea for multithread memfill.
Post by: Rockoon on May 11, 2008, 04:45:19 AM
Quote from: sinsi on May 10, 2008, 08:50:05 AM
Let's face it, a GPU can kill a CPU (for certain things, fair enough), but look at gddr5 vs ddr3 - why can't intel/amd keep up with this?

This stuff has little benefit on CPUs.

Memory isn't accessed directly; instead memory is accessed from caches that already come close to keeping up with the CPU.

3 cycle latency or less from L1. DDRx doesn't impact this.
10 cycle latency or less from L2. DDRx doesn't impact this.

Title: Re: Idea for multithread memfill.
Post by: bozo on May 11, 2008, 09:47:50 PM
Quote from: c0dewarp
The point is that every time you wait for a thread to do anything, you are adding tens of thousands of clock cycles to your own thread, giving them away to another process that IS ready to run now.

so how else do you suggest waiting for a thread to finish execution in these circumstances? or at least knowing, in a multi-threaded process, when it finished? - normally i'd use WaitForSingleObject()/WaitForMultipleObjects() on the thread handle(s)... as in, inside my thread(s) will be ExitThread() or just a RET to tell the operating system "i'm done here", which in turn notifies the main process.

it would be more efficient to do this than to run a loop every N seconds checking a variable to see if the thread has terminated, because the operating system doesn't have to keep switching contexts.

Quote from: c0dewarp
There is no benefit at all to memfills < 64kb, when it takes 5-10 microsecs of latency before your "helper" gets up to speed

i don't recall ever saying that there was, or that the buffer being filled in hutch's code was < 64kb - as you can see, it's 100MB - the threads could be performing any number of operations on the same buffer, just at different offsets in different threads, not necessarily accessing the same data.

Quote from: c0dewarp
Die-hard assembler programmers don't like to hear that (I used to be one, so I know)

i have a feeling you haven't programmed assembly since the old MS-DOS days.
Title: Re: Idea for multithread memfill.
Post by: u on May 12, 2008, 12:31:55 AM
Quote from: codewarp on May 10, 2008, 05:39:08 PM
The resistance to this concept from the assembler point of view is understandable--queuing and related support mechanisms are complicated and substantial.  It's the kind of complexity that pushes software toward higher-level solutions and away from assembler.  Die-hard assembler programmers don't like to hear that (I used to be one, so I know).  Work with it and you can achieve, fight it and you won't.
This just in: ASM programmers have been using MASM for the last decade, which stands for "Macro assembler". OOP, vectors, HLL constructs, garbage-collection, automated memory management, automated code-generation, thinking from ALL points of view/levels of the software (lowest to highest), being able to easily and reliably check performance thus choose the best route, and so on - WE HAVE IT. Read up, try, and THINK!

And from the start, you want us to present you with THE ultimate and only universal way to solve ALL types of problems of multicore? This stinks of too little programming experience. Prove me wrong.
Title: Re: Idea for multithread memfill.
Post by: c0d1f1ed on May 12, 2008, 07:18:27 PM
Quote from: Kernel_Gaddafi on May 11, 2008, 09:47:50 PMso how else do you suggest waiting for a thread to finish execution in these circumstances? or atleast know in a multi-threaded process when it finished? - normally i'd use WaitForSingleObject()/WaitForMultipleObjects() on the thread handle(s)..as in inside my thread(s) will be ExitThread() or just a RET to tell the operating system "i'm done here" which in turn notifies the main process.

it would be more efficient to do this than run a loop every N seconds checking a variable to see if the thread is terminated, because the operating system doesn't have to keep switching contexts.
I think c0dewarp's suggestion is to not wait at all. Design your algorithms in such a way that each thread always has something to do, and use lock-free synchronization (http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms) to avoid the huge overhead of O.S. thread synchronization.
Title: Re: Idea for multithread memfill.
Post by: codewarp on May 15, 2008, 06:10:29 PM
Quote from: c0d1f1ed on May 12, 2008, 07:18:27 PM
I think c0dewarp's suggestion is to not wait at all. Design your algorithms in such a way that each thread always has something to do, and use lock-free synchronization (http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms) to avoid the huge overhead of O.S. thread synchronization.

Exactly.
Title: Re: Idea for multithread memfill.
Post by: codewarp on May 15, 2008, 06:23:37 PM
Quote from: u on May 12, 2008, 12:31:55 AM
And from the start, you want us to present you with THE ultimate and only universal way to solve ALL types of problems of multicore? This stinks of too little programming experience. Prove me wrong.
See my thread "Multiple CPUs and the limits to assembler"
Title: Re: Idea for multithread memfill.
Post by: Mark Jones on June 26, 2008, 02:51:39 PM
Michael, I got this from your thread-test app:


2 us, mean waiting for end of time slice
2 us, mean using Sleep, 0
2 us, mean using event wait
Press any key to exit...


New box, AMD Athlon X2 (dual-core) 64-bit @4GHz, Win XP Pro x32
Title: Re: Idea for multithread memfill.
Post by: Mark Jones on June 26, 2008, 02:58:45 PM
And here's Nightware's test:

Results of the tests :
Test 1 (default hardware), done in 7953689 cycles
Test 2 (parallelism attempt), done in 5320118 cycles
Test 3 (attempt with another algo), done in 5322003 cycles
Test 4, done in 11 cycles
Test 5, done in 11 cycles
Test 6, done in 10 cycles
Test 7, done in 11 cycles
Test 8, done in 11 cycles
Title: Re: Idea for multithread memfill.
Post by: NightWare on June 26, 2008, 10:01:58 PM
hi,
read http://www.masm32.com/board/index.php?topic=9255.0 - results with several threads are not representative, mine included...  :red