How to CMP to more than one value?

Started by joemc, April 03, 2010, 01:51:56 AM

joemc

I want to see if a character is whitespace or not...

Basically I want to do this:
    cmp eax, 20h
    cmp eax, 09h
    cmp eax, 0Dh
    cmp eax, 0Ah

and logically OR the flags of each result.  What is the proper way?


clive

You can't compound the comparisons; the code would end up looking like this:


cmp eax, 20h
jz whitespace
cmp eax, 09h
jz whitespace
cmp eax, 0Dh
jz whitespace
cmp eax, 0Ah
jz whitespace

; Not White Space
..
jmp @F

whitespace:

; White Space

@@:
It could be a random act of randomness. Those happen a lot as well.

joemc

Thank you.  Very quick response. That's what I was currently doing :(  I was hoping there was a better way.

clive

On the ARM you could cascade the comparisons using conditional execution: once one of them sets ZF, the others turn into NOPs. The x86 (from the Pentium Pro on) has some conditional instructions, but I don't think they help much here.

The non-whitespace path is pretty cheap in cycles, and as most characters are in this group that could be advantageous (branch prediction) although not drastically so. Aligning the branch targets on DWORD or PARA boundaries might provide slight improvements in performance if you are processing a lot of characters.

You could use a jump table, but that could be unwieldy unless, for example, you also want to expedite the parsing of other starting characters.

-Clive

hutch--

joemc,

If it fits your character-filtering design, do one compare against the space character (ASCII 32) and, if the character is 32 or below, branch to bypass it.


cmp al, 32
jle label
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

donkey

Sorry nothing to add here, read a post wrong.
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

clive

This might work well too.

cmp eax, 20h
ja notwhitespace
jz whitespace
cmp eax, 09h
jb notwhitespace
jz whitespace
cmp eax, 0Dh
jz whitespace
cmp eax, 0Ah
jz whitespace

; Drop through

notwhitespace:

; Not White Space

..

jmp @F

whitespace:

; White Space

@@:

clive

Looks like we all had similar thoughts. I guess I'd have to benchmark it, but my gut tells me 4 comparisons with branches not taken (well predicted) should be pretty efficient with CPU resources.

-Clive

sinsi

You might be able to use 'cmov' to avoid jumps, if all you want to know is whether it is whitespace or not.
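
For illustration only, here is a minimal C sketch of the "OR the results together" idea from the original question, written without short-circuit branches. The function name is hypothetical, and whether a compiler turns this into SETcc/CMOVcc or into jumps depends on the compiler and its options, so treat it as a model of the logic rather than a guaranteed branch-free result.

```c
#include <stdint.h>

/* Returns 1 if c is space, tab, CR or LF, otherwise 0.
   Each comparison evaluates to 0 or 1, and the bitwise OR combines
   all four results without short-circuit evaluation. */
int is_ws_branchless(uint32_t c)
{
    return (c == 0x20) | (c == 0x09) | (c == 0x0D) | (c == 0x0A);
}
```

All four comparisons are always evaluated, which is the price paid for not branching.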

hutch, should be 'jbe label' for unsigned compares.
Light travels faster than sound, that's why some people seem bright until you hear them.

lingo

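        ; Simultaneously test the 4 bytes in eax for any of 5 values (0, 09h, 0Ah, 0Dh, 20h).
        ; A byte equal to the target becomes 0 after the XOR; adding 0FEFEFEFFh
        ; (that is, subtracting 01010101h) then sets its top bit.
        ; Note: bytes of 80h and above also set a top bit, so this assumes 7-bit ASCII input.
        mov  edx, eax
        lea  ecx, [eax+0FEFEFEFFh]      ;  00h -> zero terminator
        xor  eax, 09090909h             ;  09h -> ASCII code of <tab>
        add  eax, 0FEFEFEFFh
        or   ecx, eax
        mov  eax, edx
        xor  eax, 0A0A0A0Ah             ;  0Ah -> ASCII code of LF
        add  eax, 0FEFEFEFFh
        or   ecx, eax
        mov  eax, edx
        xor  eax, 0D0D0D0Dh             ;  0Dh -> ASCII code of CR
        add  eax, 0FEFEFEFFh
        or   ecx, eax
        xor  edx, 20202020h             ;  20h -> ASCII code of Space
        add  edx, 0FEFEFEFFh
        or   ecx, edx
        test ecx, 80808080h
        jne  WhiteSpace
Not_White:
....
WhiteSpace: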
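
As a rough C model of the packed-byte test above (function names are hypothetical, not from the thread): this sketch uses the well-known "has zero byte" form with an extra `& ~v` term, so as a yes/no answer it is exact for all byte values. The posted assembly omits that term, which as far as I can tell is fine for 7-bit ASCII but reports a hit for bytes of 80h and above.

```c
#include <stdint.h>

/* Nonzero if any of the four bytes in v is 0. */
static uint32_t has_zero_byte(uint32_t v)
{
    return (v - 0x01010101u) & ~v & 0x80808080u;
}

/* Nonzero if any of the four bytes in v equals b:
   XOR with b replicated into every byte turns a match into 0. */
static uint32_t has_byte(uint32_t v, uint8_t b)
{
    return has_zero_byte(v ^ (0x01010101u * b));
}

/* 1 if any byte of v is NUL, tab, LF, CR or space, otherwise 0. */
int dword_has_ws_or_nul(uint32_t v)
{
    return (has_zero_byte(v) | has_byte(v, 0x09) | has_byte(v, 0x0A)
          | has_byte(v, 0x0D) | has_byte(v, 0x20)) != 0;
}
```

Note that this answers "does this dword contain one of these bytes?", not "which byte is it?", so a hit still needs a byte-wise follow-up.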


hutch--

sinsi,

Pure laziness; the byte range will never exceed the signed DWORD range, so there is no loss here.
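
The signed/unsigned point can be modelled in C. This is only a sketch with hypothetical function names: the first function mimics `cmp al, 32` followed by `jle` (a byte treated as signed), the second mimics `jbe`, and the third mimics a zero-extended value (as with `movzx eax, BYTE PTR [...]`) compared as a signed DWORD. The cast to `int8_t` assumes the usual two's-complement wraparound.

```c
#include <stdint.h>

/* Models `cmp al, 32` / `jle`: the byte is interpreted as signed,
   so 80h..FFh look negative and pass the "<= 32" test. */
int le32_signed_byte(uint8_t ch)
{
    return (int8_t)ch <= 32;
}

/* Models `cmp al, 32` / `jbe`: unsigned byte compare. */
int le32_unsigned_byte(uint8_t ch)
{
    return ch <= 32;
}

/* Models `movzx eax, byte` then `cmp eax, 32` / `jle`:
   after zero extension the value is 0..255, so a signed
   32-bit compare gives the same answer as an unsigned one. */
int le32_zero_extended(uint8_t ch)
{
    int32_t wide = ch;
    return wide <= 32;
}
```

So a signed jump is harmless once the byte has been zero-extended to a DWORD, but on a bare `al` compare it would classify characters such as E9h as "32 or below".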


drizz

Use a table.


CHR_UPPER equ 00000001b
CHR_LOWER equ 00000010b
CHR_DIGIT equ 00000100b
CHR_DELIM equ 00001000b
CHR_PUNCT equ 00010000b
CHR_CNTRL equ 00100000b
CHR_SPACE equ 01000000b
CHR_HEX   equ 10000000b
CHR_ALPHA equ CHR_LOWER or CHR_UPPER
CHR_ALNUM equ CHR_LOWER or CHR_UPPER or CHR_DIGIT
CHR_PRINT equ CHR_LOWER or CHR_UPPER or CHR_DIGIT or CHR_PUNCT or CHR_SPACE

PUBLIC ChrFlagTable

.data

ChrFlagTable label byte
db  9 dup  (00100000b); #0..
db  5 dup  (00101000b); #TAB,#LF,#CR,..
db 18 dup  (00100000b);
db  1 dup  (01001000b); #SPACE
db 15 dup  (00010000b); !"#$%&'()*+,-./
db 10 dup  (10000100b); 0123456789
db  7 dup  (00010000b); :;<=>?@
db  6 dup  (10000001b); ABCDEF
db 20 dup  (00000001b); GHIJKLMNOPQRSTUVWXYZ
db  6 dup  (00010000b); [\]^_`
db  6 dup  (10000010b); abcdef
db 20 dup  (00000010b); ghijklmnopqrstuvwxyz
db  4 dup  (00010000b); {|}~
db  1 dup  (00100000b); #DEL (entry 127, so the table is a full 256 bytes)
db 128 dup (0)

.code
.if byte ptr [ChrFlagTable+eax] & (CHR_CNTRL or CHR_SPACE)
.endif
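
A cut-down C sketch of the same table lookup, for comparison. It keeps only the two flags used in the `.if` test, with the same bit values as CHR_CNTRL and CHR_SPACE; the array and function names are hypothetical, the table is filled at run time rather than with initialised data, and treating DEL (127) as a control character is an assumption of this sketch.

```c
#include <stdint.h>

enum { CHR_CNTRL = 0x20, CHR_SPACE = 0x40 };   /* same bits as the equates */

static uint8_t chr_flags[256];                  /* one flag byte per character */

/* Fill the table once before use. */
void chr_flags_init(void)
{
    for (int c = 0; c < 32; ++c)
        chr_flags[c] = CHR_CNTRL;               /* 00h..1Fh, includes TAB, LF, CR */
    chr_flags[32]  = CHR_SPACE;
    chr_flags[127] = CHR_CNTRL;                 /* DEL: assumption, see above */
}

/* One load and one AND per character, no chain of compares. */
int is_cntrl_or_space(uint8_t c)
{
    return (chr_flags[c] & (CHR_CNTRL | CHR_SPACE)) != 0;
}
```

The index type is a full byte and the table has 256 entries, so no range check is needed before the lookup.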

The truth cannot be learned ... it can only be recognized.

hutch--

This is a test piece for the 2 tests, one for zero, the other for anything at or below ASCII 32. The test piece is a white space stripper that replaces any sequence of white space characters, including CRLF pairs, with a single space.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    nowhsp PROTO :DWORD

    .data
      text db "this  is",9,9,32," a",13,10," test  of",9,9,32,32,"  white",13,10," space  removal",13,10,0

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    invoke nowhsp,ADDR text

    print ADDR text,13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

nowhsp proc ptxt:DWORD

    mov ecx, [esp+4]        ; ptxt
    mov edx, [esp+4]        ; ptxt
    sub ecx, 1
    jmp lead

  pre:
    mov BYTE PTR [edx], al
    add edx, 1

  lead:
    add ecx, 1
    movzx eax, BYTE PTR [ecx]
    test eax, eax
    jz quit
    cmp eax, 32
    jg pre

  subloop:
    add ecx, 1
    movzx eax, BYTE PTR [ecx]
    test eax, eax
    jz quit
    cmp eax, 32
    jle subloop
    mov BYTE PTR [edx], 32
    add edx, 1
    jmp pre

  quit:
    mov BYTE PTR [edx], 0
    ret 4

nowhsp endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
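
For readers who want to check the logic of nowhsp outside the assembler, here is a minimal C sketch of what I understand the loop to do. The function name is hypothetical and this is a reading of the code above, not a replacement for it: anything above 32 is copied, each run of bytes in 1..32 becomes one space, and a run that reaches the terminator is dropped.

```c
/* In-place whitespace collapse modelled on nowhsp. */
void collapse_ws(char *text)
{
    const unsigned char *src = (const unsigned char *)text;
    unsigned char *dst = (unsigned char *)text;

    for (;;) {
        unsigned char c = *src++;
        if (c == 0)
            break;                      /* end of string */
        if (c > 32) {
            *dst++ = c;                 /* ordinary character: copy it */
            continue;
        }
        while (*src != 0 && *src <= 32) /* skip the rest of the run */
            ++src;
        if (*src == 0)
            break;                      /* trailing run: emit nothing */
        *dst++ = ' ';                   /* run followed by text: one space */
    }
    *dst = 0;
}
```

Applied to the test string from the listing, this yields "this is a test of white space removal".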

clive

Bit tables come to mind too, but I suspect the instruction/cycle count will be a wash. On other processor architectures I'd be more tempted.
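
A minimal C sketch of the bit-table idea, assuming the four whitespace codes from the original question. The names are hypothetical. Since all four codes are at or below 20h, one bit per code fits in 33 bits; on 32-bit x86 that means the mask does not fit in a single register, which may be part of why it comes out as a wash there.

```c
#include <stdint.h>

/* Bit n is set when character code n counts as whitespace. */
#define WS_MASK ((1ULL << 0x20) | (1ULL << 0x09) | (1ULL << 0x0A) | (1ULL << 0x0D))

/* Returns 1 for space, tab, LF, CR; 0 for everything else.
   The range check keeps the shift count within the mask. */
int is_ws_bitmask(uint32_t c)
{
    return c <= 0x20 && ((WS_MASK >> c) & 1u);
}
```

This costs one range compare plus a shift and mask, instead of up to four compares.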

The other thought, if you want to eat lots of white space rapidly, would be to look for multiples and skip forward appropriately, testing the long ones first:

0x20
0x2020
0x20202020
0x09
0x0909
0x0A
0x0D
0x0A0A ; If UNIX (LF-only line endings)
0x0A0D ; Natural pairing of CR,LF
0x0A0D0A0D

You could probably expand the list, I picked this based on gut feeling of potential distributions in a file.
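
A rough C sketch of that multi-byte skipping idea, under several assumptions: the function name and pattern list are hypothetical, only a subset of the patterns above is included, and `memcmp` stands in for the word/dword compares an assembly version would use. Whether this beats a byte-at-a-time loop is untested here.

```c
#include <stddef.h>
#include <string.h>

/* Returns the number of leading whitespace bytes in buf[0..len),
   trying longer patterns before shorter ones. */
size_t skip_ws(const char *buf, size_t len)
{
    static const struct { const char *bytes; size_t n; } pats[] = {
        { "    ", 4 }, { "\r\n\r\n", 4 },
        { "  ", 2 },   { "\t\t", 2 },   { "\r\n", 2 },
        { " ", 1 },    { "\t", 1 },     { "\n", 1 },  { "\r", 1 },
    };
    const size_t count = sizeof pats / sizeof pats[0];
    size_t pos = 0;

    for (;;) {
        size_t i;
        for (i = 0; i < count; ++i) {
            if (len - pos >= pats[i].n &&
                memcmp(buf + pos, pats[i].bytes, pats[i].n) == 0) {
                pos += pats[i].n;       /* matched: skip the whole pattern */
                break;
            }
        }
        if (i == count)
            return pos;                 /* no pattern matched: done */
    }
}
```

The single-byte patterns at the end guarantee that any run is eventually consumed, so the longer entries only change how many steps it takes.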

-Clive