Spaces to Tabs Conversion

Started by Mark Jones, April 08, 2006, 11:07:09 PM

Mark Jones

Hello group. Attached is a RadASM test piece for analyzing text-formatting conversion. Below is the algorithm for converting spaced text into tabbed text. Sorry, they are both a bit dodgy. I just can't seem to get the algorithm to work properly. Maybe it qualifies for the "Pearl Code Awards", but it's the best I can do. This one is very close but not exact. This is my fourth attempt and I'm really losing patience with it. Could anyone suggest something to try, or a better approach?


SpacesToTabs1 proc private uses esi edi lpDest:DWORD,lpSource:DWORD
    ;LOCAL inChrPos:DWORD                ; input line chr position counter
    ;LOCAL outChrPos:DWORD               ; output line chr pos counter

        cld                             ; always copy forwards
        mov esi,lpSource
        mov edi,lpDest

loopit: lodsd                           ; 4 bytes EAX<--[ESI] and ADD esi,4
        mov cl,1                        ; flag as "loop"
                                        ; always start after LF!
findLF: mov edx,eax
        cmp dl,0Ah                      ; search for linefeeds,
        jne @F
          sub esi,3                     ; copy any found and restart after
          jmp Out1
@@:     cmp dh,0Ah
        jne @F
          sub esi,2
          jmp Out2
@@:     ror edx,16                      ; swap words
        cmp dl,0Ah
        jne @F
          sub esi,1
          jmp Out3
@@:     cmp dh,0Ah
        je Out4         ; still a problem here?
        ror edx,16                      ; LF not found, look for spaces

findSP: mov cl,0                        ; flag as "tab"
        cmp eax,20202020h               ; "    "?
        je OutTab
        and edx,0FFFFFF00h              ; strip off LSB
        cmp edx,20202000h               ; "x   "?
        je Out1
        and edx,0FFFF0000h
        cmp edx,20200000h               ; "xx  "?
        je Out2
        and edx,0FF000000h
        cmp edx,20000000h               ; "xxx "?
        je Out3
        mov cl,1                        ; cannot tab, flag as "loop"

Out4:   test al,al                      ; copy all four bytes
        je OutNul                       ; check each for null
        stosb                           ; write to EDI and increment it
        ror eax,8                       ; rotate next byte into AL
Out3:   test al,al                      ; three bytes
        je OutNul
        stosb
        ror eax,8
Out2:   test al,al                      ; two bytes
        je OutNul
        stosb
        ror eax,8
Out1:   test al,al                      ; one byte
        je OutNul
        stosb
        cmp cl,1
        je loopit                       ; loop? or
       
OutTab: mov al,09h                      ; output tab
        stosb
        jmp loopit
OutNul: stosb
        ret
SpacesToTabs1 endp

[attachment deleted by admin]
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08
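For anyone wanting something to compare SpacesToTabs1 output against, here is a rough reference model in C rather than MASM. It is only a sketch of what the routine appears to be aiming for: within each 4-column cell of a line, a run of spaces that reaches the next tab stop becomes one tab, and anything else is copied as-is. The name entab4, the fixed tab width of 4, and the assumption that the input contains no tabs are all mine, not from the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference model (not the MASM routine itself).
   Assumes 4-column tab stops and that src contains no tab characters.
   dst must be at least as large as src, including the terminator.
   Returns the length of the converted string. */
size_t entab4(char *dst, const char *src)
{
    char *out = dst;
    size_t col = 0;      /* column within the current line */
    size_t pending = 0;  /* spaces seen but not yet written */

    assert(dst != NULL && src != NULL);
    for (; *src != '\0'; ++src) {
        if (*src == ' ') {
            ++pending;
            ++col;
            if ((col & 3) == 0) {   /* the run of spaces reached a tab stop */
                *out++ = '\t';      /* so the whole run collapses to one tab */
                pending = 0;
            }
        } else {
            memset(out, ' ', pending);  /* spaces that never reached a stop stay spaces */
            out += pending;
            pending = 0;
            *out++ = *src;
            col = (*src == '\n') ? 0 : col + 1;  /* a linefeed restarts the column count */
        }
    }
    memset(out, ' ', pending);          /* flush trailing spaces, if any */
    out += pending;
    *out = '\0';
    return (size_t)(out - dst);
}
```

Feeding the same test text to both this and the assembly version, then diffing the results, should show where the two disagree.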

Mark Jones

I think I understand the problem now: the algorithm treats each tab (09h) it reads as a single character wide, when a tab can actually span 1-4 columns depending on its position. Well, back to the drawing board. :)

EDIT: Adding a "tabs to spaces" conversion pass at the beginning of this algo fixes the issue. Now maybe this amalgamation can be simplified.


        cld                             ; always copy forwards
        mov esi,lpSource
        mov edi,lpDest
nop
        mov cl,4                        ; preset chr position counter+1
remTab: lodsb                           ; replace any tabs with spaces
        dec cl
        and cl,3                        ; make CL always 0-3
        cmp al,0Ah                      ; linefeed?
        jnz @F
        mov cl,4                        ; preset so CL becomes 3 on next char
@@:     cmp al,09h                      ; tab?
        jnz putchr
        mov al,20h
@@:     stosb
        dec cl
        jge @B
        inc cl
        jmp remTab
putchr: stosb                           ; else write it
        cmp al,00h                      ; was it a null?
        jne remTab                      ; loop until null
                                        ; fall through when null copied
        invoke lstrcpy,lpSource,lpDest  ; copy expanded text back over the source (buffer must be big enough)
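
The point about a tab spanning 1-4 columns can be checked with a small model. This is a sketch in C, not MASM, assuming 4-column tab stops. The names tab_span4 and display_width4 are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Columns a tab occupies when it starts at 0-based column `col`,
   assuming tab stops every 4 columns: col mod 4 of 0,1,2,3 gives 4,3,2,1. */
size_t tab_span4(size_t col)
{
    return 4 - (col & 3);
}

/* Display width of the first line of s, counting each tab by its
   real span instead of as one character. */
size_t display_width4(const char *s)
{
    size_t col = 0;

    assert(s != NULL);
    for (; *s != '\0' && *s != '\n'; ++s)
        col += (*s == '\t') ? tab_span4(col) : 1;
    return col;
}
```

Under these assumptions "a", tab, "b" and "abc", tab, "b" both end up five columns wide, even though the first string is three bytes long and the second is five. That mismatch is what trips up a converter that counts a tab as one character.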

Tedd

How about moving the 'and cl,3' to just before putting the spaces?
Increment ecx for each character to keep track of the line offset, and then only do 'and ecx,3' when you need to put the spaces for the tab (and don't forget to re-adjust ecx after doing the spaces!).
No snowflake in an avalanche feels responsible.
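
The suggestion above can be modelled roughly as follows. This is a C sketch rather than MASM: a plain counter tracks the offset within the line, the mask is applied only when a tab is hit, and the counter is advanced by the number of spaces written. The name detab4, the 4-column tab width, and the buffer-size requirement are assumptions for illustration. Resetting the counter on a linefeed is carried over from the earlier listing, not from the suggestion itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the suggested tab expansion, 4-column tab stops.
   dst must have room for up to 4 bytes per source byte plus the terminator.
   Returns the length of the expanded string. */
size_t detab4(char *dst, const char *src)
{
    char *out = dst;
    size_t col = 0;                     /* plays the role of ECX: offset within the line */

    assert(dst != NULL && src != NULL);
    for (; *src != '\0'; ++src) {
        if (*src == '\t') {
            size_t n = 4 - (col & 3);   /* the mask is applied only here */
            memset(out, ' ', n);        /* write the 1-4 spaces the tab stands for */
            out += n;
            col += n;                   /* re-adjust the counter after the spaces */
        } else {
            *out++ = *src;
            col = (*src == '\n') ? 0 : col + 1;  /* a linefeed restarts the offset */
        }
    }
    *out = '\0';
    return (size_t)(out - dst);
}
```

Keeping the full offset and masking only at the tab means the counter never needs to be forced into the 0-3 range on every character.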