A New Testbed for MASM32 Forum

Started by frktons, September 21, 2010, 05:25:26 PM

hutch--

Frank,

For a testbed to be useful it must do the timings in REAL TIME. Processors have not worked in clock ticks since the i486DX (1990).
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

frktons

Quote from: hutch-- on September 22, 2010, 03:24:16 AM
Frank,

For a testbed to be useful it must do the timings in REAL TIME. Processors have not worked in clock ticks since the i486DX (1990).

Maybe somebody already has a routine to do it.
This part is quite difficult for me to manage for the time being.
Any help is welcome.

Frank
Mind is like a parachute. You know what to do in order to use it :-)

hutch--

Frank,

Testing in real time has every irritation you can imagine: processor load variation, OS interference and considerable difficulty in getting reliable times. Its only saving grace is that it comes a lot closer to how algorithms actually get used. Basically, you work out what the algo will be used for, design the test data accordingly, then run it for at least half a second to get something like reliable timings.

If the take-off of an algo is important, you gear it for short repeated data; if sustained data rates are important, you gear the data for long continuous processing with very large samples. Basically, the difference is between ATTACK and SUSTAIN.
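
A minimal sketch of this advice, written in C rather than MASM so that it compiles anywhere. It repeats a routine until at least a given amount of wall-clock time has passed and reports the average time per call. The names `sum_bytes`, `now_seconds` and `time_algo`, the batch size of 1000 and the use of POSIX `clock_gettime` are all assumptions made for illustration; none of them come from the testbed itself.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical algorithm under test: sums the bytes of a buffer. */
static uint32_t sum_bytes(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Wall-clock time in seconds, read from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Calls sum_bytes in batches of 1000 until at least min_seconds of real
   time has elapsed. Returns the average seconds per call and stores the
   total number of calls in *calls_out. */
static double time_algo(const uint8_t *data, size_t len,
                        double min_seconds, unsigned long *calls_out)
{
    volatile uint32_t sink = 0;   /* gives every result a visible use */
    unsigned long calls = 0;
    double start = now_seconds();
    double elapsed;

    do {
        for (int i = 0; i < 1000; i++)
            sink += sum_bytes(data, len);
        calls += 1000;
        elapsed = now_seconds() - start;
    } while (elapsed < min_seconds);

    *calls_out = calls;
    return elapsed / (double)calls;
}
```

Calling `time_algo(buf, 64, 0.5, &calls)` with a small buffer exercises the ATTACK case, while passing a buffer of several megabytes exercises SUSTAIN.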

frktons

Quote from: hutch-- on September 22, 2010, 12:59:18 PM
Frank,

Testing in real time has every irritation you can imagine: processor load variation, OS interference and considerable difficulty in getting reliable times. Its only saving grace is that it comes a lot closer to how algorithms actually get used. Basically, you work out what the algo will be used for, design the test data accordingly, then run it for at least half a second to get something like reliable timings.

If the take-off of an algo is important, you gear it for short repeated data; if sustained data rates are important, you gear the data for long continuous processing with very large samples. Basically, the difference is between ATTACK and SUSTAIN.

I gladly postpone this argument to a later date.  :P

Frank

frktons

Hi guys.  :bg

I finally got a couple of free hours, and I coded some of the routines
needed to display the Screen Interface for the New Testbed.

It is just the grid structure to display the results of the tests.

Still missing:

1) Proc to Detect CPU type and PRINT it
2) Proc to Time the Algos and PRINT the elapsed CPU cycles
3) Proc to Format the numbers with point/comma thousand separator
4) Proc to Detect the Windows version and PRINT it
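
As an illustration of item 3 only, here is a minimal C sketch of thousand-separator formatting; the testbed itself is MASM and would need its own version. The function name `format_thousands` and its signature are assumptions made for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes value into out with sep inserted between groups of three digits.
   out must hold at least 27 bytes: 20 digits, 6 separators and the NUL. */
static void format_thousands(uint64_t value, char sep, char *out)
{
    char digits[21];
    snprintf(digits, sizeof digits, "%llu", (unsigned long long)value);

    size_t n = strlen(digits);
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        /* A separator goes before every digit that starts a full group. */
        if (i > 0 && (n - i) % 3 == 0)
            out[o++] = sep;
        out[o++] = digits[i];
    }
    out[o] = '\0';
}
```

Passing `','` or `'.'` as `sep` covers the point/comma choice mentioned in the list.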

All the missing Procs are already somewhere on the forum, but I don't have
enough time at the moment to search for them, adapt them and code them into the
program. I'll do it as time permits.

Anybody who has already coded these Procs could adapt and insert them
inside the program structure attached and post the new version.   :P

The current structure of the program uses an external file that holds the TextScreen
to display. I prefer to keep it external because it is easier to modify, if needed.

The program uses an Open File Dialog already set to pick the name of the
Text Screen File. In the final version both the external file and the Dialog
could possibly be unnecessary.

Any improvements in speed, completeness or anything else are always welcome.

Frank



hutch--

It looks good, Frank. I have always liked crisp-looking text-mode interfaces.

frktons

I wonder how this PROC manages to work correctly and how the text screen
still gets displayed properly. Looking at it, eax is not cleared after the
first mov, so the following shl should leave 2 non-zero bytes in positions
where they don't belong.  ::)


; -------------------------------------------------------------------------
; The data read from the file is converted into CHAR_INFO style.
; ConsoleBuffer is filled with the content of FileBuffer, padding a zero
; after each byte making it Windows Console API displayable.
; -------------------------------------------------------------------------

ConvertArea PROC

    lea  esi, FileBuffer
    lea  edi, ConsoleScreen

    mov  ecx, BufChars
    xor  eax, eax

    xor  ebx, ebx
   
NextChar:

    mov  bx , WORD PTR [esi]
    mov  ah , bh
    shl  eax, 8
    mov  al , bl

    mov  [edi], eax
    add  esi, 2
    add  edi, 4

    dec  ecx

    jnz  NextChar

    ret
   
ConvertArea ENDP


Frank

Antariy

Quote from: frktons on October 31, 2010, 04:03:01 AM
I wonder how this PROC manages to work correctly and how the text screen
still gets displayed properly. Looking at it, eax is not cleared after the
first mov, so the following shl should leave 2 non-zero bytes in positions
where they don't belong.  ::)


; -------------------------------------------------------------------------
; The data read from the file is converted into CHAR_INFO style.
; ConsoleBuffer is filled with the content of FileBuffer, padding a zero
; after each byte making it Windows Console API displayable.
; -------------------------------------------------------------------------

ConvertArea PROC

    lea  esi, FileBuffer
    lea  edi, ConsoleScreen

    mov  ecx, BufChars
    xor  eax, eax

    xor  ebx, ebx
   
NextChar:

    mov  bx , WORD PTR [esi]
    mov  ah , bh
    shl  eax, 8
    mov  al , bl

    mov  [edi], eax
    add  esi, 2
    add  edi, 4

    dec  ecx

    jnz  NextChar

    ret
   
ConvertArea ENDP


Frank


Frank, I guess you are repacking DOS-style screen contents into the Windows console screen-buffer format?

CHAR_INFO has this format:

typedef struct _CHAR_INFO { // chi 
    union {                /* Unicode or ANSI character  */
        WCHAR UnicodeChar;
        CHAR AsciiChar;
    } Char;
    WORD Attributes;       // text and background colors
} CHAR_INFO, *PCHAR_INFO;


For unified use with both *Unicode* and ASCII output, the first member has the size of a WORD (2 bytes): it is a union that contains a WCHAR, which is a 2-byte UTF-16 character. The second member is also a WORD, but its high byte only has meaning for double-byte character sets (like Japanese), i.e. complex scripts on Win9x-like implementations.

So, after your code runs, EAX contains:

EAX:
byte #0    | byte #1   | byte #2       | byte #3
================================================
char code  | garbage   | color format  | garbage


Since you use WriteConsoleOutputA, the high byte (AH) of the first member (Char) of CHAR_INFO is ignored - it is used only in the Unicode version. Likewise, the highest byte of EAX (#3) is ignored, since it is simply not used: all the color specifiers are in the low byte.

So your code just happens to work on your system. If you ran it on Japanese Win9x, or called WriteConsoleOutputW, the results would be "undefined".
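
To see where the garbage bytes come from, the NextChar loop can be modeled in C with plain 32-bit arithmetic. This is only a sketch: the function name `convert_original` is an assumption, and the model imitates the register operations without calling any console API.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the original loop. EAX is zeroed once before the loop and is
   then carried over from one iteration to the next, as in the PROC. */
static void convert_original(const uint8_t *src, uint32_t *dst, size_t cells)
{
    uint32_t eax = 0;                                     /* xor eax, eax */

    for (size_t i = 0; i < cells; i++) {
        uint8_t bl = src[2 * i];                          /* character    */
        uint8_t bh = src[2 * i + 1];                      /* attribute    */

        eax = (eax & 0xFFFF00FFu) | ((uint32_t)bh << 8);  /* mov ah, bh   */
        eax <<= 8;                                        /* shl eax, 8   */
        eax = (eax & 0xFFFFFF00u) | bl;                   /* mov al, bl   */
        dst[i] = eax;                                     /* mov [edi], eax */
    }
}
```

For the two cells ('A', 1Fh) and ('B', 2Eh) the model produces 001F0041h and then 1F2E4142h. The first cell is clean because EAX started at zero; in the second, byte #1 holds the previous character (41h) and byte #3 the previous attribute (1Fh), which matches the layout in the diagram.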



Alex

frktons

Yes Alex, I was thinking about that. I'll change it anyway along the way.
The program will have to change many times before it is ready.

In the next release I'll correct it and make it a little bit faster  :bg

Frank

Antariy

Quote from: frktons on October 31, 2010, 11:43:43 PM
Yes Alex, I was thinking about that. I'll change it anyway along the way.

Probably, something like this:
Quote
..........
NextChar:

    movzx  eax , WORD PTR [esi]
    shl eax,8
    shr ax,8
    mov  [edi], eax

...........
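
The three instructions above can be checked with a small C model. It is a sketch under the assumption that each source cell is a 16-bit word with the character in the low byte and the attribute in the high byte; the function name `pack_cell` is made up for this example.

```c
#include <stdint.h>

/* Model of movzx / shl / shr: turns a 16-bit DOS-style cell into a
   32-bit value with the character in the low word and the attribute
   in the high word, with both padding bytes zero. */
static uint32_t pack_cell(uint16_t cell)
{
    uint32_t eax = cell;                                 /* movzx eax, WORD PTR [esi] */
    eax <<= 8;                                           /* shl eax, 8 */
    eax = (eax & 0xFFFF0000u) | ((eax & 0xFFFFu) >> 8);  /* shr ax, 8  */
    return eax;
}
```

With the cell 1F41h (attribute 1Fh, character 'A') the model returns 001F0041h, so no bytes from a previous iteration can leak into the result.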




Alex

frktons

Quote from: Antariy on October 31, 2010, 11:49:35 PM

Probably, something like this:
Quote
..........
NextChar:

    movzx  eax , WORD PTR [esi]
    shl eax,8
    shr ax,8
    mov  [edi], eax

...........

Alex

That is one possibility. I used a different approach some months ago,
reading 4 bytes at a time from the source string and writing them to the
destination string like this:


Convert:
    mov eax, [esi]
    mov [edi], al
    mov [edi+2], ah
    bswap eax
    mov [edi+6], al
    mov [edi+4], ah
    add esi, 4
    add edi, 8
    dec ecx
    jnz Convert


but for the time being I don't know which is faster.

Frank

Antariy

Quote from: frktons on October 31, 2010, 11:57:57 PM
That is one possibility. I have used a different approach some months ago,
reading 4 bytes at a time from the source string and writing them to the
destination string like this:

but for the time being I don't know which is faster.

Frank

This is almost the same, if you want to use that code, but it writes to sequential locations:

Convert:
    mov eax, [esi]
    mov [edi], al
    mov [edi+2], ah
    shr eax,16
    mov [edi+4], al
    mov [edi+6], ah
    add esi, 4
    add edi, 8
    dec ecx
    jnz Convert



Hard to say which is faster on the P4 - SHR is slow there, and BSWAP too  :green2
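
A C model of both variants shows that they store the same bytes in the same places and differ only in the order of the last two stores. The function names are assumptions for this sketch. Note that both variants write only the even-numbered destination bytes, so the padding bytes are zero only if the destination was cleared beforehand, which the posted fragments do not show.

```c
#include <stdint.h>

/* Plain-C equivalent of the BSWAP instruction. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Model of the BSWAP variant for one group of 4 source bytes. */
static void convert4_bswap(uint32_t eax, uint8_t out[8])
{
    out[0] = (uint8_t)eax;            /* mov [edi], al   */
    out[2] = (uint8_t)(eax >> 8);     /* mov [edi+2], ah */
    eax = bswap32(eax);               /* bswap eax       */
    out[6] = (uint8_t)eax;            /* mov [edi+6], al */
    out[4] = (uint8_t)(eax >> 8);     /* mov [edi+4], ah */
}

/* Model of the SHR variant for one group of 4 source bytes. */
static void convert4_shr(uint32_t eax, uint8_t out[8])
{
    out[0] = (uint8_t)eax;            /* mov [edi], al   */
    out[2] = (uint8_t)(eax >> 8);     /* mov [edi+2], ah */
    eax >>= 16;                       /* shr eax, 16     */
    out[4] = (uint8_t)eax;            /* mov [edi+4], al */
    out[6] = (uint8_t)(eax >> 8);     /* mov [edi+6], ah */
}
```

Which one is faster still has to be measured on the target CPU; the model only confirms that the two produce identical output.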



Alex

frktons

Quote from: Antariy on November 01, 2010, 12:05:27 AM
Quote from: frktons on October 31, 2010, 11:57:57 PM
That is one possibility. I have used a different approach some months ago,
reading 4 bytes at a time from the source string and writing them to the
destination string like this:

but for the time being I don't know which is faster.

Frank

This is almost the same, if you want to use that code, but it writes to sequential locations:

Convert:
    mov eax, [esi]
    mov [edi], al
    mov [edi+2], ah
    shr eax,16
    mov [edi+4], al
    mov [edi+6], ah
    add esi, 4
    add edi, 8
    dec ecx
    jnz Convert


Hard to say which is faster on the P4 - SHR is slow there, and BSWAP too  :green2

Alex

We timed it some months ago and, if I remember correctly, BSWAP is faster than
SHR on the machines we tested. And using MMX registers to handle 64 bits at a time
is complicated because we have to pad a zero byte after each CHAR/ATTRIBUTE byte.

At least, we have not tried to do it with MMX or SSE instructions so far.

Frank

Antariy

Quote from: frktons on November 01, 2010, 12:13:35 AM
We timed it some months ago and, if I remember correctly, BSWAP is faster than
SHR on the machines we tested. And using MMX registers to handle 64 bits at a time
is complicated because we have to pad a zero byte after each CHAR/ATTRIBUTE byte.

At least from a hardware point of view, BSWAP ought to be slower than SHR, because SHR can and must work on the same physical register (set of bits), while BSWAP must use temporary storage or delay lines.
But we do not know how things are actually implemented in a given CPU  :green2 Anyway - the way SHR is designed on the P4 is unbelievable.



Alex

Antariy

Quote from: frktons on November 01, 2010, 12:13:35 AM
And using MMX registers to manage 64 bits at a time
is complicated because we have to pad a zero byte after each CHAR/ATTRIBUTE byte.

At least we have not tried to do it with MMX or SSE instructions, so far.


Using PUNPCKLBW will help a lot.
EDITED for clarity: in this case it can interleave the data bytes with the padding zeros.
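
What the instruction does can be shown with a scalar model in C. This is not MMX code, only a sketch of the 64-bit form of PUNPCKLBW, which interleaves the low 4 bytes of the destination with the low 4 bytes of the source. The function name `punpcklbw_model` is an assumption.

```c
#include <stdint.h>

/* Scalar model of 64-bit PUNPCKLBW: the result bytes are
   d0, s0, d1, s1, d2, s2, d3, s3 (lowest byte first), where d and s
   are the low 4 bytes of the destination and source operands. */
static uint64_t punpcklbw_model(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;

    for (int i = 0; i < 4; i++) {
        uint64_t d = (dst >> (8 * i)) & 0xFFu;
        uint64_t s = (src >> (8 * i)) & 0xFFu;
        result |= d << (16 * i);        /* data byte in the even position */
        result |= s << (16 * i + 8);    /* source byte in the odd position */
    }
    return result;
}
```

With a zeroed source operand, the 4 packed bytes 11h 22h 33h 44h become 11h 00h 22h 00h 33h 00h 44h 00h, which is exactly the zero padding after each CHAR/ATTRIBUTE byte.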



Alex