A New Testbed for MASM32 Forum

hutch-- · September 22, 2010, 03:24:16 AM

Frank,

For a testbed to be useful it must do the timings in REAL TIME; processors have not worked in clock ticks since the i486DX (1990).

frktons · September 22, 2010, 12:45:17 PM

Quote from: hutch-- on September 22, 2010, 03:24:16 AM
Frank,

For a testbed to be useful it must do the timings in REAL TIME; processors have not worked in clock ticks since the i486DX (1990).

Maybe somebody already has a routine to do it.
This part is quite difficult for me to manage for the time being.
Any help is welcome.

Frank

hutch-- · September 22, 2010, 12:59:18 PM

Frank,

Testing in real time has every irritation you can imagine: processor load variation, OS interference and considerable difficulty in getting reliable times. Its only saving grace is that it comes a lot closer to how algorithms get used. Basically you work out what the algo will be used for, design the data for test purposes, then run it for at least half a second to get something like reliable timings.

If the take-off of an algo is important you gear it for short repeated data; if sustained data rates are important you gear the data for long continuous processing with very large samples. Basically the difference is between ATTACK and SUSTAIN.
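
A minimal sketch of that idea in C (not code from this thread): keep calling the routine until at least the requested amount of real time has passed, then divide elapsed time by the call count. The names `time_for_at_least`, `sample_algo` and `sample_ctx` are hypothetical, and `timespec_get` assumes a C11 library.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds (C11 timespec_get; assumed to be available). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

typedef void (*algo_fn)(void *ctx);

typedef struct {
    unsigned long calls;   /* how many times the algo ran */
    double elapsed;        /* real time spent, in seconds */
} timing_result;

/* Hypothetical harness: run fn in batches until min_seconds of real time
   have passed. The clock is read once per batch so that very short algos
   are not swamped by the cost of reading the clock. */
static timing_result time_for_at_least(algo_fn fn, void *ctx,
                                       double min_seconds, unsigned long batch)
{
    timing_result r = { 0, 0.0 };
    double start;

    assert(fn != NULL && min_seconds > 0.0 && batch > 0);
    start = now_seconds();
    do {
        unsigned long i;
        for (i = 0; i < batch; i++)
            fn(ctx);
        r.calls += batch;
        r.elapsed = now_seconds() - start;
    } while (r.elapsed < min_seconds);
    return r;
}

/* Hypothetical workload: sum a byte buffer. */
typedef struct {
    const unsigned char *data;
    size_t len;
    volatile unsigned long sink;   /* volatile so the work is not optimized away */
} sample_ctx;

static void sample_algo(void *ctx)
{
    sample_ctx *c = (sample_ctx *)ctx;
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < c->len; i++)
        sum += c->data[i];
    c->sink = sum;
}
```

Calling `time_for_at_least(sample_algo, &ctx, 0.5, 1000)` and dividing `elapsed` by `calls` gives seconds per call. A small buffer with many calls leans toward ATTACK; a very large buffer leans toward SUSTAIN.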

frktons · September 22, 2010, 04:13:30 PM

Quote from: hutch-- on September 22, 2010, 12:59:18 PM
Frank,

Testing in real time has every irritation you can imagine: processor load variation, OS interference and considerable difficulty in getting reliable times. Its only saving grace is that it comes a lot closer to how algorithms get used. Basically you work out what the algo will be used for, design the data for test purposes, then run it for at least half a second to get something like reliable timings.

If the take-off of an algo is important you gear it for short repeated data; if sustained data rates are important you gear the data for long continuous processing with very large samples. Basically the difference is between ATTACK and SUSTAIN.

I'll gladly postpone this topic to a later date. :P

Frank

frktons · October 30, 2010, 10:30:14 AM

Hi all guys. :bg

I finally got a couple of free hours, and I coded some of the routines
needed to display the Screen Interface for the New Testbed.

It is just the grid structure to display the results of the tests.

Still missing:

1) Proc to Detect CPU type and PRINT it
2) Proc to Time the Algos and PRINT the elapsed CPU cycles
3) Proc to Format the numbers with point/comma thousand separator
4) Proc to Detect the Windows version and PRINT it
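
For item 3, a rough sketch in C of the usual approach (not the routine already posted elsewhere on the forum): peel off decimal digits from the right, drop in a separator after every third digit, then reverse into the output buffer. The name `format_thousands` and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes value in decimal into buf, with sep between groups of three
   digits. Returns the number of characters written (not counting the
   terminating NUL), or 0 if buf is too small. */
static size_t format_thousands(uint64_t value, char sep,
                               char *buf, size_t buf_size)
{
    char tmp[32];            /* 20 digits + 6 separators is the 64-bit maximum */
    size_t n = 0;
    size_t i;
    unsigned digits = 0;

    assert(buf != NULL);

    /* Build the string backwards: least significant digit first. */
    do {
        if (digits != 0 && digits % 3 == 0)
            tmp[n++] = sep;
        tmp[n++] = (char)('0' + (int)(value % 10));
        value /= 10;
        digits++;
    } while (value != 0);

    if (n + 1 > buf_size)
        return 0;

    /* Reverse into the caller's buffer. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

Passing `','` or `'.'` as `sep` covers the point/comma choice mentioned in the list.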

All the missing Procs are already around the forum, but I don't have
enough time at the moment to search for them, adapt them and code them
into the program. I'll do it as time permits.

Anybody who has already coded these Procs could adapt and insert them
inside the program structure attached and post the new version. :P

The current structure of the program uses an external file that is the TextScreen
to display. I prefer to have it external because it is easier to modify, if needed.

The program uses an Open File Dialog already set to pick the name of the
Text Screen File. In the final version both the external file and the Dialog
could possibly be unnecessary.

Any improvements in speed, completeness, or anything else are always welcome.

Frank

hutch-- · October 30, 2010, 01:54:50 PM

It looks good Frank, I have always liked crisp looking text mode interfaces.

frktons · October 31, 2010, 04:03:01 AM

I wonder how this PROC manages to work correctly and display the text
screen format. Looking at it, the content of eax is not cleared after the
first mov, so the next shl should leave 2 non-zero bytes in positions
where they shouldn't be. ::)

; -------------------------------------------------------------------------
; The data read from the file is converted into CHAR_INFO style. 
; ConsoleBuffer is filled with the content of FileBuffer, padding a zero
; after each byte making it Windows Console API displayable. 
; -------------------------------------------------------------------------

ConvertArea PROC

    lea  esi, FileBuffer
    lea  edi, ConsoleScreen

    mov  ecx, BufChars
    xor  eax, eax

    xor  ebx, ebx
    
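NextChar:

    mov  bx , WORD PTR [esi]    ; bl = character, bh = attribute
    mov  ah , bh                ; attribute into byte 1; other bytes of eax keep the previous cell
    shl  eax, 8                 ; attribute moves to byte 2; stale bytes end up in bytes 1 and 3
    mov  al , bl                ; character into byte 0

    mov  [edi], eax             ; store one 4-byte CHAR_INFO cell
    add  esi, 2                 ; next char/attribute pair
    add  edi, 4                 ; next CHAR_INFO cell

    dec  ecx

    jnz  NextChar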

    ret 
    
ConvertArea ENDP

Frank

Antariy · October 31, 2010, 10:37:40 PM

Quote from: frktons on October 31, 2010, 04:03:01 AM
I wonder how this PROC manages to work correctly and display the text
screen format. Looking at it, the content of eax is not cleared after the
first mov, so the next shl should leave 2 non-zero bytes in positions
where they shouldn't be. ::)

; -------------------------------------------------------------------------
; The data read from the file is converted into CHAR_INFO style.
; ConsoleBuffer is filled with the content of FileBuffer, padding a zero
; after each byte making it Windows Console API displayable.
; -------------------------------------------------------------------------

ConvertArea PROC

    lea  esi, FileBuffer
    lea  edi, ConsoleScreen

    mov  ecx, BufChars
    xor  eax, eax

    xor  ebx, ebx

NextChar:

    mov  bx , WORD PTR [esi]
    mov  ah , bh
    shl  eax, 8
    mov  al , bl

    mov  [edi], eax
    add  esi, 2
    add  edi, 4

    dec  ecx

    jnz  NextChar

    ret

ConvertArea ENDP

Frank

Frank, I guess you are repacking DOS-style screen contents into the Windows console screen-buffer format?

CHAR_INFO has this format:

typedef struct _CHAR_INFO { // chi  
    union {                /* Unicode or ANSI character  */ 
        WCHAR UnicodeChar; 
        CHAR AsciiChar; 
    } Char; 
    WORD Attributes;       // text and background colors 
} CHAR_INFO, *PCHAR_INFO;

For unified use with both Unicode and ASCII output, the first member is WORD-sized (2 bytes): it is a union that contains a WCHAR, which is a 2-byte UTF-16 character. The second member is also a WORD, but its high byte only has meaning for double-byte character sets (such as Japanese), i.e. complex scripts on Win9x-like implementations.

So, when your code runs, EAX contains:

EAX:
byte #0    |  byte #1  |  byte #2       |  byte #3
==================================================
char code  |  garbage  |  color format  |  garbage

Since you use WriteConsoleOutputA, the high byte of the first member (Char) of CHAR_INFO, byte #1 above, is ignored; it is used only in the Unicode version. Likewise the highest byte of EAX (#3) is ignored: it is simply not used, because all the color specifiers are placed in the lower byte.

So your code just happens to work on your system, which is what makes it look right. If you ran it on Japanese Win9x, or called WriteConsoleOutputW, the results would be "undefined".
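
To see where the garbage comes from, here is a small C model of the register operations in the loop as posted. It is a sketch for illustration only; `convert_as_posted` is a hypothetical name, and the 32-bit values stand for what `mov [edi], eax` would store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the NextChar loop of ConvertArea as posted: eax is cleared
   once before the loop and never again, so each cell inherits bytes
   from the previous one. */
static void convert_as_posted(const uint8_t *src, uint32_t *dst, size_t cells)
{
    uint32_t eax = 0;                         /* xor eax, eax (before the loop) */
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < cells; i++) {
        uint8_t bl = src[2 * i];              /* character byte */
        uint8_t bh = src[2 * i + 1];          /* attribute byte */

        eax = (eax & 0xFFFF00FFu) | ((uint32_t)bh << 8);  /* mov ah, bh      */
        eax <<= 8;                                        /* shl eax, 8      */
        eax = (eax & 0xFFFFFF00u) | bl;                   /* mov al, bl      */
        dst[i] = eax;                                     /* mov [edi], eax  */
    }
}
```

With the input bytes 'A', 07h, 'B', 1Fh the first cell comes out clean (00070041h) because eax started at zero, but the second is 071F4142h: byte #1 holds the previous character and byte #3 the previous attribute.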

Alex

frktons · October 31, 2010, 11:43:43 PM

Yes Alex, I was thinking about that. I'll change it anyway along the way.
The program will have to change many times before it is ready.

In the next release I'll correct it and make it a little bit faster :bg

Frank

Antariy · October 31, 2010, 11:49:35 PM

Quote from: frktons on October 31, 2010, 11:43:43 PM
Yes Alex, I was thinking about that. I'll change it anyway along the way.

Probably, something like this:

Quote
..........
NextChar:

movzx eax , WORD PTR [esi]
shl eax,8
shr ax,8
mov [edi], eax

...........
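
The same three instructions modelled in C, as a sketch to check the resulting byte layout. `pack_cell` and `convert_clean` are hypothetical names; the 16-bit `pair` stands for the word read from [esi], character in the low byte and attribute in the high byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models: movzx eax, WORD PTR [esi] / shl eax, 8 / shr ax, 8 */
static uint32_t pack_cell(uint16_t pair)
{
    uint32_t eax = pair;                                  /* movzx: upper 16 bits are zero */
    eax <<= 8;                                            /* attribute -> byte 2, char -> byte 1 */
    eax = (eax & 0xFFFF0000u) | ((eax & 0xFFFFu) >> 8);   /* shr ax, 8: char -> byte 0, byte 1 = 0 */
    return eax;
}

/* Converts char/attribute byte pairs into 4-byte cells with no stale bytes. */
static void convert_clean(const uint8_t *src, uint32_t *dst, size_t cells)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < cells; i++) {
        uint16_t pair = (uint16_t)(src[2 * i] | (src[2 * i + 1] << 8));
        dst[i] = pack_cell(pair);
    }
}
```

Because every cell starts from a zero-extended load, bytes #1 and #3 are always zero, whatever the previous cell contained.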

Alex

frktons · October 31, 2010, 11:57:57 PM

Quote from: Antariy on October 31, 2010, 11:49:35 PM

Probably, something like this:
Quote
..........
NextChar:

movzx eax , WORD PTR [esi]
shl eax,8
shr ax,8
mov [edi], eax

...........

Alex

That is one possibility. I have used a different approach some months ago,
reading 4 bytes at a time from the source string and writing them to the
destination string like this:

Convert:
    mov eax, [esi]
    mov [edi], al
    mov [edi+2], ah
    bswap eax
    mov [edi+6], al
    mov [edi+4], ah
    add esi, 4
    add edi, 8
    dec ecx
    jnz Convert

but for the time being I don't know which is faster.

Frank

Antariy · November 01, 2010, 12:05:27 AM

Quote from: frktons on October 31, 2010, 11:57:57 PM
That is one possibility. I have used a different approach some months ago,
reading 4 bytes at a time from the source string and writing them to the
destination string like this:

but for the time being I don't know which is faster.

Frank

This is almost the same, if you want to use that code, but it writes to consecutive places:

Convert:
    mov eax, [esi]
    mov [edi], al
    mov [edi+2], ah
    shr eax,16
    mov [edi+4], al
    mov [edi+6], ah
    add esi, 4
    add edi, 8
    dec ecx
    jnz Convert

Hard to say which is faster on the PIV: SHR is slow, and so is BSWAP :green2
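
Whichever is faster, the two loops should at least produce the same bytes. A sketch in C that models both, so the outputs can be compared (hypothetical names throughout). Note that neither version writes the odd offsets, so this assumes the destination buffer was zeroed beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* mov eax, [esi] on little-endian x86: the first byte lands in AL. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Models BSWAP: reverse the four bytes of a 32-bit value. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Model of the BSWAP loop: 4 source bytes -> 8 destination bytes. */
static void widen_bswap(const uint8_t *src, uint8_t *dst, size_t groups)
{
    assert(src != NULL && dst != NULL);
    while (groups-- > 0) {
        uint32_t eax = load_le32(src);
        dst[0] = (uint8_t)eax;          /* mov [edi],   al */
        dst[2] = (uint8_t)(eax >> 8);   /* mov [edi+2], ah */
        eax = bswap32(eax);             /* bswap eax       */
        dst[6] = (uint8_t)eax;          /* mov [edi+6], al */
        dst[4] = (uint8_t)(eax >> 8);   /* mov [edi+4], ah */
        src += 4;
        dst += 8;
    }
}

/* Model of the SHR loop: same stores, in ascending order. */
static void widen_shr(const uint8_t *src, uint8_t *dst, size_t groups)
{
    assert(src != NULL && dst != NULL);
    while (groups-- > 0) {
        uint32_t eax = load_le32(src);
        dst[0] = (uint8_t)eax;          /* mov [edi],   al */
        dst[2] = (uint8_t)(eax >> 8);   /* mov [edi+2], ah */
        eax >>= 16;                     /* shr eax, 16     */
        dst[4] = (uint8_t)eax;          /* mov [edi+4], al */
        dst[6] = (uint8_t)(eax >> 8);   /* mov [edi+6], ah */
        src += 4;
        dst += 8;
    }
}
```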

Alex

frktons · November 01, 2010, 12:13:35 AM

Quote from: Antariy on November 01, 2010, 12:05:27 AM
Quote from: frktons on October 31, 2010, 11:57:57 PM
That is one possibility. I have used a different approach some months ago,
reading 4 bytes at a time from the source string and writing them to the
destination string like this:

but for the time being I don't know which is faster.

Frank

This is almost the same, if you want to use that code, but it writes to consecutive places:

Convert:
    mov eax, [esi]
    mov [edi], al
    mov [edi+2], ah
    shr eax,16
    mov [edi+4], al
    mov [edi+6], ah
    add esi, 4
    add edi, 8
    dec ecx
    jnz Convert

Hard to say which is faster on the PIV: SHR is slow, and so is BSWAP :green2

Alex

We timed it some months ago, and if I remember correctly BSWAP is faster than
SHR on the machines we tested. And using MMX registers to manage 64 bits at a time
is complicated because we have to pad a zero byte after each CHAR/ATTRIBUTE byte.

At least we have not tried to do it with MMX or SSE instructions, so far.

Frank

Antariy · November 01, 2010, 12:19:09 AM

Quote from: frktons on November 01, 2010, 12:13:35 AM
We timed it some months ago, and if I remember correctly BSWAP is faster than
SHR on the machines we tested. And using MMX registers to manage 64 bits at a time
is complicated because we have to pad a zero byte after each CHAR/ATTRIBUTE byte.

At least from a hardware point of view, BSWAP ought to be slower than SHR, because SHR can and must work within the same physical register (set of bits), whereas BSWAP must use temporary storage or delay lines.
But we do not know how things are implemented in any given CPU :green2 Anyway, SHR on the PIV is incredibly designed.

Alex

Antariy · November 01, 2010, 12:22:30 AM

Quote from: frktons on November 01, 2010, 12:13:35 AM
And using MMX registers to manage 64 bits at a time
is complicated because we have to pad a zero byte after each CHAR/ATTRIBUTE byte.

At least we have not tried to do it with MMX or SSE instructions, so far.

Using PUNPCKLBW will help a lot.
EDITED for clarity: in this case it can interleave the data bytes with the padding zeros.
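
A sketch of the interleave in C, using the SSE2 intrinsic `_mm_unpacklo_epi8` (which maps to PUNPCKLBW) when the compiler targets SSE2, and a scalar model of the same result otherwise. The name `widen8` is hypothetical; it expands 8 source bytes (4 char/attribute pairs) into 16 bytes, i.e. 4 CHAR_INFO cells.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Expands 8 source bytes into 16 destination bytes: b0 0 b1 0 ... b7 0. */
static void widen8(const uint8_t *src, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
#if defined(__SSE2__)
    /* Load 8 bytes into the low half of an XMM register. */
    __m128i bytes = _mm_loadl_epi64((const __m128i *)src);
    /* PUNPCKLBW with a zero register: data byte, zero, data byte, zero... */
    __m128i wide = _mm_unpacklo_epi8(bytes, _mm_setzero_si128());
    /* Unaligned 16-byte store. */
    _mm_storeu_si128((__m128i *)dst, wide);
#else
    /* Scalar model of the same interleave for builds without SSE2. */
    size_t i;
    for (i = 0; i < 8; i++) {
        dst[2 * i] = src[i];
        dst[2 * i + 1] = 0;
    }
#endif
}
```

Unlike the byte-store loops, this writes the padding zeros itself, so the destination does not need to be cleared first.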

Alex
