text position

Started by jckl, April 14, 2007, 08:24:41 AM

jckl

I'm having trouble locating the end of some text within a block of text. I tried scanning through the text and keeping track of the last location where I found a pattern, but I couldn't get it to always work. Can someone show me how they would go about finding the end of this text in a block, where the text is not always the same length?



WPN_M4AUTO.-1.-1.-1.WPN_M4.-1.-1.-1.WPN_AT4.-1.-1.-1.WPN_GRENADEFB.-1.-1.-1.WPN_GRENADEHE.-1.-1.-1.WPN_GRENADESM.-1.-1.-1.WPN_KNIFE.-1.-1.-1



This is another way it could be listed:


WPN_CLAYMORE.-1.-1.-1PWPN_GRENADEFB.-1.-1.-1-WPN_GRENADEHE.-1.-1.-1-WPN_GRENADESM.-1.-1.-1-WPN_KNIFE.-1.-1.-1


The periods are actually 0 characters.

The text starts at the same point in the block, but the end would be in a different place. Two blocks of text would look like this:


..WPN_M4AUTO.-1.-1.-1.WPN_M4.-1.-1.-1.WPN_AT4.-1.-1.-1.WPN_GRENADEFB.-1.-1.-1.WPN_GRENADEHE.-1.-1.-1.WPN_GRENADESM.-1.-1.-1.WPN_KNIFE.-1.-1.-1..

..WPN_CLAYMORE.-1.-1.-1PWPN_GRENADEFB.-1.-1.-1-WPN_GRENADEHE.-1.-1.-1-WPN_GRENADESM.-1.-1.-1-WPN_KNIFE.-1.-1.-1E..



Again, the periods are 0 characters. When I scanned, I read 4 bytes and compared them to 12589, which worked for the first block but not for the second. I assume it's because of the letter E at the end instead of a 0 character.
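For reference, 12589 decimal is 312Dh, so a 4-byte compare against it only matches the byte sequence 2D 31 00 00, i.e. "-1" followed by two zero bytes. A minimal sketch, assuming the 4 bytes were read as a little-endian DWORD starting at the final "-1" (the labels tail1 and tail2 are made up for illustration):

```masm
    include \masm32\include\masm32rt.inc
    .data
      tail1 db "-1",0,0             ; end of the first block (hypothetical label)
      tail2 db "-1","E",0           ; end of the second block (hypothetical label)
    .code
start:
    mov eax, DWORD PTR tail1        ; bytes 2D 31 00 00 -> eax = 0000312Dh
    print uhex$(eax),13,10
    mov eax, DWORD PTR tail2        ; bytes 2D 31 45 00 -> eax = 0045312Dh
    print uhex$(eax),13,10
    mov eax, 12589                  ; the constant from the post, 312Dh
    print uhex$(eax),13,10
    exit
end start
```

The first tail loads as the same value as 12589, so the compare succeeds. The second tail carries the "E" (45h) in the third byte, so the same compare can never match there.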

dsouza123

The crucial issue: What defines the end of your block of text ?

Without this, how can the end be known ?

How many end sequences are there, and what are they ?

Once you have the answers about the end, the code will be straightforward.

Tedd

Your question is not entirely clear, so if you can correct my assumptions..

What you have is an array of zero-terminated strings (28 in total? I'm not sure if your examples are meant to be fully accurate or not - is it an exact number?).. and you're hoping to find the end of each array (weapons inventory)?

If that's the case then you can just count out 28 (or however many there are) items, each of which ends with the zero, then you're at the end of the array (and the start of the next one?)

Though the 'extra' letter E causes a problem, maybe that's because, either:
- the end of the array is not really where you think it should be;
- or there is some extra encoding which you don't know about - are the 'zeroes' always actually ZERO ?

P.S. What are you doing here - hacking a game?
No snowflake in an avalanche feels responsible.

jckl

No, I am not hacking a game. I actually really, really dislike people who write trainers and such. This is a map file. The map editor can make a .mis file, which is plain text, and a .bms file, which is read by the game. The map editor only reopens the .mis file. There have been programs in the past to convert a .bms file back to a .mis, but there isn't one for this series of the game, and many people want one. Most people want this so they can recover their maps when they lose the .mis file.

The zeros are hex 00, so a null character. The weapons are listed right before the items of the map, and I am not worried about converting the weapons. I just need to find where they end so I know where the items start. I assume the reason the map editor does not open the .bms file is that there is information the .bms file does not contain, like descriptions and such set by the user. When it saves the .mis, it has the descriptions saved in the file. The number of weapons seems to change from map to map. The list always starts at offset 616. Some end at around 756 and some at 725.

Here's a section in hex vs. its plain text:

5750 4E5F 4D34 4155 544F 002D 3100 2D31 002D 3100 5750 4E5F 4D34 002D 3100 2D31 002D 3100 5750 4E5F 4154 3400 2D31 002D 3100 2D31 0057 504E 5F47 5245 4E41 4445 4642 002D 3100 2D31 002D 3100 5750 4E5F 4752 454E 4144 4548 4500 2D31 002D 3100 2D31 0057 504E 5F47 5245 4E41 4445 534D 002D 3100 2D31 002D 3100 5750 4E5F 4B4E 4946 4500 2D31 002D 3100 2D31 0000

WPN_M4AUTO.-1.-1.-1.WPN_M4.-1.-1.-1.WPN_AT4.-1.-1.-1.WPN_GRENADEFB.-1.-1.-1.WPN_GRENADEHE.-1.-1.-1.WPN_GRENADESM.-1.-1.-1.WPN_KNIFE.-1.-1.-1..

Jimg

It looks to me like it's simply a variable-length weapon name followed by three parameters.  Each weapon name and parameter is terminated by a zero byte.  Repeat for however many weapons there are.  What's the problem?  Read until you find a zero byte, that's a weapon name.  Read until you find a zero byte, that's parameter 1.  Same for parameters 2 and 3.  Loop back and start over.
The big question is what do you want to do with this information?  Write it out in a different format?  What format?

MichaelW

jckl,

I'm not sure about the form of the input data, or exactly what component you are trying to find the end of, but this demonstrates one method of locating the double null byte, and one method of splitting the data into individual fields.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      .RADIX 16
      sample LABEL WORD
        dw 5750,4E5F,4D34,4155,544F,002D,3100,2D31,002D,3100,5750
        dw 4E5F,4D34,002D,3100,2D31,002D,3100,5750,4E5F,4154,3400
        dw 2D31,002D,3100,2D31,0057,504E,5F47,5245,4E41,4445,4642
        dw 002D,3100,2D31,002D,3100,5750,4E5F,4752,454E,4144,4548
        dw 4500,2D31,002D,3100,2D31,0057,504E,5F47,5245,4E41,4445
        dw 534D,002D,3100,2D31,002D,3100,5750,4E5F,4B4E,4946,4500
        dw 2D31,002D,3100,2D31,0000
     .RADIX 10
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    ; ------------------------------------------------------------
    ; Reverse the bytes and then replace the embedded nulls with a
    ; suitable character that will not act as a string terminator.
    ; ------------------------------------------------------------

    mov ebx, OFFSET sample
    .WHILE WORD PTR[ebx] != 0     ; while not double null byte
      mov ax, [ebx]
      .IF al == 0
        mov al, 0ffh
      .ENDIF
      .IF ah == 0
        mov ah, 0ffh
      .ENDIF
      xchg ah, al
      mov [ebx], ax
      add ebx, 2
    .ENDW

    ; ----------------------------------------------------
    ; Use the CRT strtok function to parse the data using
    ; the replacement character as a delimiter.
    ; ----------------------------------------------------

    mov ebx, rv(crt_strtok, ADDR sample, chr$(0ffh))
    .WHILE ebx
      print ebx,13,10
      mov ebx, rv(crt_strtok, NULL, chr$(0ffh))
    .ENDW

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

eschew obfuscation

Tedd

Well all that example shows is that it seems to be: (weapon, value, value, value), (weapon, value, value, value), ..., NULL
(the extra null at the end to indicate the end of the list.)

Want to post hex for one that messes up this assumption (maybe with a few extra bytes before and after)..?


Obviously there must be a structure to it since it's expected to be read back in again. So, either the messed up example is actually a corrupt file, or there's some value somewhere that indicates the real length of the list, or ...(something else)...

jckl



...?..o.....................................WPN_CLAYMORE.-1.-1.-1PWPN_GRENADEFB.-1.-1.-1-WPN_GRENADEHE.-1.-1.-1-WPN_GRENADESM.-1.-1.-1-WPN_KNIFE.-1.-1.-1E.........&...........2.....0.....d...d.......@.......d.d.................................................................null




0000003F09006F000A000000000FA0050000000000000000000000000000000000000000000000000000000057504E5F434C41594D4F5245002D31002D31002D315057504E5F4752454E4144454642002D31002D31002D312D57504E5F4752454E4144454845002D31002D31002D312D57504E5F4752454E414445534D002D31002D31002D312D57504E5F4B4E494645002D31002D31002D314500BF000000000000002600000000000000A1C3DAFF32DFDCFFE21530000A000000640000006400000010000000400100000000000064006400000000000000000003000500FFFF1E0000000000000000000F00000000000000000000000A0000000000000000000000000000000000000000000000000000006E756C6C



This is one that messes it up.


Jimg

Is this an actual file that was output from the game?  Can you attach a couple of example files to a post?

jckl

Here are a few of the .bms files.

[attachment deleted by admin]

MichaelW

Without knowing the exact structure of the files it's going to be difficult to do anything with them. I see from a previous post that you have an idea of the item structure:

http://www.masm32.com/board/index.php?topic=4570.0


jckl

Yeah, I have the structure of everything I needed to read in that file but can't get past this part. I can tell you that the items start right after the -1 in both examples. I'll attach the txt file where I stored my offsets, but it may not make sense to you since it's kind of notes to myself.

[attachment deleted by admin]

MichaelW

For the file that is dropped on the EXE, this code will display the offset from the start of the file of the last occurrence of "WPN_", or zero if there is no occurrence.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      pMem    dd 0
      memLen  dd 0
      cmd     db 128 dup(0)
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    invoke GetCL, 1, ADDR cmd
    invoke read_disk_file, ADDR cmd, ADDR pMem, ADDR memLen

    mov ebx, memLen
    sub ebx, 4
    mov esi, pMem
    .WHILE ebx
      mov eax, [esi+ebx]
      .IF eax == "_NPW"
        .BREAK
      .ENDIF
      dec ebx
    .ENDW
    print uhex$(ebx),13,10

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


Jimg

This will dump out the weapons contents, assuming good formatting.  It looks like the first three files are good, and test444 has problems.  This program makes no assumptions about the length of the parameters for each weapon.  If you could assume that each parameter was exactly two characters long, you could also process the test444 file.  Does the test444 file actually work as input to the game and show the correct information?

    include \masm32\include\masm32rt.inc
    .data
file1 db "1.bms",0
file2 db "2.bms",0
file3 db "Destroy All.bms",0
file4 db "test444.bms",0
mem   dd ?
flen  dd ?
   .code
Program:
    call Main
    invoke ExitProcess,0

ReadBmsFile proc filename:dword
    print "Processing file: "
    print filename,13,10
    invoke read_disk_file, filename, addr mem, addr flen
    or eax,eax                  ; zero return = read failed
    jz Failure
    mov esi,mem
    add esi,268h                ; weapon list starts at offset 616 (268h)
ReadLoop:
    cmp byte ptr [esi],0        ; a zero where a field should start ends the list
    je done
    print esi,13,10             ; show the current zero-terminated field
    .repeat                     ; step esi past the field and its terminator
      lodsb
    .until al==0
    jmp ReadLoop

done:
    print "--------",13,10
    invoke GlobalFree,mem
    ret
Failure:
    print "unable to read file",13,10
    ret
ReadBmsFile EndP

Main Proc
    invoke ReadBmsFile,addr file1
    invoke ReadBmsFile,addr file2
    invoke ReadBmsFile,addr file3
    invoke ReadBmsFile,addr file4

    inkey "Press any key to exit..."
    ret
Main EndP
end Program
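
A rough sketch of the fixed-width idea, not a confirmed description of the .bms format. It assumes the third parameter of every weapon is exactly two characters followed by a single separator byte whose value is ignored, and that the list ends with a zero byte where the next name would start. The sample data is the weapon block from the hex dump posted earlier in the thread; the labels are made up for illustration, and there is no bounds checking.

```masm
    include \masm32\include\masm32rt.inc
    .data
      ; weapon block from the problem file; separators are "P", "-", "E"
      sample db "WPN_CLAYMORE",0,"-1",0,"-1",0,"-1","P"
             db "WPN_GRENADEFB",0,"-1",0,"-1",0,"-1","-"
             db "WPN_GRENADEHE",0,"-1",0,"-1",0,"-1","-"
             db "WPN_GRENADESM",0,"-1",0,"-1",0,"-1","-"
             db "WPN_KNIFE",0,"-1",0,"-1",0,"-1","E"
             db 0, 0BFh             ; list terminator, then the following data
    .code
start:
    mov esi, OFFSET sample
    xor edi, edi                    ; edi = number of weapons seen
  NextRecord:
    cmp BYTE PTR [esi], 0           ; zero where a name should start = end of list
    je ListEnd
    print esi,13,10                 ; weapon name (zero terminated)
    mov ebx, 3                      ; skip the name and parameters 1 and 2
  SkipField:
    .REPEAT
      lodsb                         ; al = [esi], esi = esi + 1
    .UNTIL al == 0
    dec ebx
    jnz SkipField
    add esi, 3                      ; ASSUMPTION: parameter 3 = 2 chars + 1 separator byte
    inc edi
    jmp NextRecord
  ListEnd:
    inc esi                         ; step over the terminating zero
    sub esi, OFFSET sample          ; offset of the first byte after the list
    print "weapons: "
    print str$(edi),13,10
    print "list ends at offset "
    print str$(esi),13,10
    exit
end start
```

For the data above this should count 5 weapons and report offset 111 relative to the start of the list, which is the byte right after the terminating zero (616 + 111 = 727 in the file). To try it on a real file, point esi at the buffer returned by read_disk_file plus 268h instead of at sample. The same walk also handles the well-formed files, because a zero separator is skipped the same way as "P" or "E".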

jckl

Yeah, that file works fine. I was doing well until I ran into this file.