I have a string of two words in hex that I need to convert to ASCII. The only problem is that there are NULL chars in the string between each letter, and 3 NULL chars to identify the end of each word.
for example:
68 00 65 00 6c 00 6c 00 6f 00 00 00 77 00 6f 00 72 00 6c 00 64 00 00 00
would translate to:
hello world
Is there a quick, streamlined function to convert the string, return each word to its own destination buffer, and replace the 3 NULLs with a terminator for each string?
There are many solutions to this. Maybe you should use the szappend macro to append them one by one in a temporary buffer
EDIT: Errr, I didn't read your post very well, I believe. Those look like 2 unicode zstrings. There are unicode functions in masmlib.
the problem i run into is that the nulls terminate the string when trying to convert
so instead of returning 'hello' it just returns 'h'
Quote from: ChillyWilly on September 13, 2008, 08:43:19 AM
the problem i run into is that the nulls terminate the string when trying to convert
so instead of returning 'hello' it just returns 'h'
Use the unicode functions, not the asciiz functions. These are 2 unicode strings.
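To see why the ASCIIZ routines stop at 'h', here is a minimal sketch in C (not MASM) that measures the same bytes both ways. The function names are made up for this illustration, and it assumes the data is UTF-16LE as it appears in the hex dump.

```c
#include <stddef.h>
#include <string.h>

/* Length as an ASCIIZ routine sees it: stops at the first 00 byte,
   which is the high byte of the very first character. */
size_t len_as_bytes(const unsigned char *p)
{
    return strlen((const char *)p);
}

/* Length in 16-bit characters: stops only at a 00 00 pair that
   starts on an even offset. */
size_t len_as_utf16le(const unsigned char *p)
{
    size_t n = 0;
    while (p[2 * n] != 0 || p[2 * n + 1] != 0)
        n++;
    return n;
}
```

Fed the bytes 68 00 65 00 6c 00 6c 00 6f 00 00 00, the first function reports 1 and the second reports 5.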
Is there a function in masm that returns the string?
I don't know which unicode function will return them as ASCII
and separate both words.
In Delphi I would do this:
function ConvertDataToAscii(Buffer: pointer; Length: Word): string;
var
  Iterator: integer;
  AsciiBuffer: string;
begin
  AsciiBuffer := '';
  for Iterator := 0 to Length - 1 do
  begin
    if char(pointer(integer(Buffer) + Iterator)^) in [#32..#127] then
      AsciiBuffer := AsciiBuffer + ' ' + char(pointer(integer(Buffer) + Iterator)^) + ' '
    else
      AsciiBuffer := AsciiBuffer + ' . ';
  end;
  Result := AsciiBuffer;
end;
chilly,
You need to know what the text format is and what you want it to end up as. Normal unicode does not use 00 00 as a word separator so it tends to look like a unicode list with zero separation and zero termination which makes it difficult to process as you don't know where the text end is.
If you could get it to be consistent, it's an easy enough algorithm to write. The form that would be useful is like the common dialog box path word pairs that are zero separated and double zero terminated.
its a string in the registry
invoke WideCharToMultiByte, CP_ACP, 0, addr Source, -1, addr Destination, ecx, NULL, NULL
only returns the first string "hello"
Is there a way to separate the two unicode strings from the string before calling WideCharToMultiByte?
Quote from: ChillyWilly on September 13, 2008, 05:48:52 PM
invoke WideCharToMultiByte, CP_ACP, 0, addr Source, -1, addr Destination, ecx, NULL, NULL
only returns the first string "hello"
Is there a way to separate the two unicode strings from the string before calling WideCharToMultiByte?
Add the byte length of the first string, including its 00 00 terminator, to the source pointer and then call WideCharToMultiByte again.
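A rough sketch of that "convert, advance, convert again" idea, written in portable C rather than MASM so it can be run anywhere. It does not call WideCharToMultiByte; `narrow_one` is a hypothetical helper that keeps only the low byte of each 16-bit character, so it assumes plain ASCII text stored as UTF-16LE.

```c
#include <stddef.h>

/* Copy one zero-terminated UTF-16LE string to dst as single bytes
   (low byte only, so plain ASCII text is assumed). Output is truncated
   if dst is too small. Returns a pointer to the byte just past the
   00 00 terminator, i.e. where the next string would start. */
const unsigned char *narrow_one(const unsigned char *src,
                                char *dst, size_t dst_size)
{
    size_t i = 0;
    while (src[0] != 0 || src[1] != 0) {
        if (i + 1 < dst_size)
            dst[i++] = (char)src[0];
        src += 2;                 /* each character is two bytes */
    }
    if (dst_size > 0)
        dst[i] = '\0';
    return src + 2;               /* skip the terminator itself */
}
```

Calling it once on the source and then again on the returned pointer fills the two destination buffers. The caller still has to know that a second string really follows the first.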
ChillyWilly,
Quote
I have a string of two words in hex that i need to convert to ascii
the only problem is that there is NULL chars in the string between each letter
and 3 NULL chars to identify the end of each word ( ? )
for example:
68 00 65 00 6c 00 6c 00 6f 00 00 00 -> means hello (end in 00 00)
77 00 6f 00 72 00 6c 00 64 00 00 00 -> " world (end in 00 00)
;-----------------------------------------------------------------------
1st: each char is 1 word: ex: 68 00 is the char "h". Yes ?
So, each text word (ex: "hello") ends in one word 00 00. Yes ?
This is the way we can see your problem.
2nd: now, you want to convert each word (ex: 68 00) to a byte
and return each text word to its own destination buffer.
So, we should have 2 destination buffers.
You can do this with the code below. Each buffer ends with 00.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
_unicode db 68h, 00, 65h, 00, 6ch, 00, 6ch, 00, 6fh, 00, 00, 00 ; text word 1
db 77h, 00, 6fh, 00, 72h, 00, 6ch, 00, 64h, 00, 00, 00 ; text word 2
_ascii1 db 120 dup (?) ; buffer for text word 1
_ascii2 db 120 dup (?) ; buffer for text word 2
.code
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««
    mov edi, offset _ascii1       ; buffer for text word 1
    mov esi, offset _unicode
;
@@: movzx eax, word ptr [esi]     ; read one 16-bit char
    mov byte ptr [edi], al        ; keep only the low byte
    add esi, 2
    inc edi
    cmp ax, 0                     ; was it the 0000h terminator ?
    jne @B
    mov edi, offset _ascii2       ; buffer for text word 2
@@: movzx eax, word ptr [esi]     ; esi already points past the first terminator
    mov byte ptr [edi], al
    add esi, 2
    inc edi
    cmp ax, 0
    jne @B
;»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
Have a good work
RuiLoureiro
Alright, it is my turn for the stupid question of the day.
Why are the zeroes in there at all? I use the method described by Hutch in my tables. Using his method it is a simple matter to step around each zero and test for another zero. If not, then decode another word, concatenate it to the string you are building and so on. You could also replace the zeroes with spaces if you are truly building a string.
--Paul
ChillyWilly,
This is a unicode string array. The first string is "hello" and the second string is "world"
If you pass WideCharToMultiByte the full character count of the data (instead of -1, which stops at the first terminator), the destination buffer looks like this:
'hello',0,'world',0 (in hex: 68 65 6c 6c 6f 00 77 6f 72 6c 64 00)
So, read as a zero terminated string, the destination buffer is only "hello".
You can access "world" like this:
    lea esi, destination
    invoke lstrlen, addr destination
    add esi, eax
    inc esi
    ; esi now points to "world"
invoke MessageBox,0,esi,0,0
This technique when applied to the registry is usually used to hide other data behind the first pair of zeros. Now for someone who has a valid reason to do this, it will not be done with a standard API or simple reusable algo, it needs to be written to read a zero separated sequence while having some method of determining where the end of the string lies.
It's a problem of this type, using pseudo ANSI notation.
"string1",0,"hidden string2",0,"hidden string3",0 etc ....
If you know for certain that it is written like the common dialog path string pairs, you read the single zero as a separator and terminate the read at a pair of zeros. But if this registry entry is non standard, which appears to be the case, the person reading the data will need to know how many items are zero separated but not double zero terminated.
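For the well-behaved case (zero separated, double zero terminated, as in the ANSI notation above), the walk is short. This is only a sketch in C, and `count_items` is an invented name; it relies entirely on the final double zero being present.

```c
#include <stddef.h>
#include <string.h>

/* Count the items in a list laid out as "item1",0,"item2",0,...,0
   (single zero between items, one extra zero after the last item).
   The loop ends when an item starts with a zero byte, i.e. at the
   empty string that marks the end of the list. */
size_t count_items(const char *list)
{
    size_t n = 0;
    while (*list != '\0') {
        n++;
        list += strlen(list) + 1;   /* step over the item and its zero */
    }
    return n;
}
```

Without the closing double zero this loop runs off the end of the data, which is exactly why a non standard entry needs either an item count or a total length from somewhere else.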
There are two items in the registry entry. If I convert from unicode to hex
the string looks like this:
"68 00 65 00 6c 00 6c 00 6f 00 00 00 77 00 6f 00 72 00 6c 00 64 00 00 00"
where the first word 'hello' is 68 00 65 00 6c 00 6c 00 6f 00 00 00
so the string ends with 00 00 00
the second word is 77 00 6f 00 72 00 6c 00 64 00 00 00
and also ends with 00 00 00
is there some way to separate the two before calling WideCharToMultiByte
and also to check if there is indeed a second string
because sometimes it does not have one
Quote from: hutch-- on September 14, 2008, 03:36:56 AM
This technique when applied to the registry is usually used to hide other data behind the first pair of zeros.
If it ends with a double null (00 00 00 00) then it could be a REG_MULTI_SZ.
Quote from: ChillyWilly on September 14, 2008, 04:03:23 AM
is there some way to separate the two before calling WideCharToMultiByte
and also to check if there is indeed a second string
because sometimes it does not have one
If you are using RegQueryValueEx then you get the length of the data returned in lpcbData. The only way to see if there's more than one string is to step through each one until you reach the end.
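Stepping through with a known length could look roughly like this. It is a sketch in C, not tested against a real registry value: `cb` stands in for the byte count that RegQueryValueEx reports, and `find_strings` is a made-up helper that assumes UTF-16LE data.

```c
#include <stddef.h>

/* Walk cb bytes of UTF-16LE data and record the byte offset at which
   each non-empty string starts. Empty strings (a bare 00 00) are
   treated as separators or end markers and skipped. Returns the number
   of strings found; only the first max_out offsets are stored. */
size_t find_strings(const unsigned char *data, size_t cb,
                    size_t *starts, size_t max_out)
{
    size_t count = 0;
    size_t pos = 0;
    while (pos + 1 < cb) {
        if (data[pos] == 0 && data[pos + 1] == 0) {
            pos += 2;                       /* empty string, skip it */
            continue;
        }
        if (count < max_out)
            starts[count] = pos;
        count++;
        /* advance to this string's terminator or the end of the data */
        while (pos + 1 < cb && (data[pos] != 0 || data[pos + 1] != 0))
            pos += 2;
        pos += 2;                           /* step over the terminator */
    }
    return count;
}
```

A return value of 1 means there is no second string, so the second conversion can be skipped. Because the loop is bounded by `cb`, it does not matter whether the data ends with one 00 00 or two.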
Quote from: ChillyWilly on September 14, 2008, 04:03:23 AM
the string looks like this:
"68 00 65 00 6c 00 6c 00 6f 00 00 00 77 00 6f 00 72 00 6c 00 64 00 00 00"
where the first word 'hello' is 68 00 65 00 6c 00 6c 00 6f 00 00 00
so the string ends with 00 00 00
the second word is 77 00 6f 00 72 00 6c 00 64 00 00 00
and also ends with 00 00 00
ChillyWilly,
I think you are confused when you say «so the string ends with 00 00 00».
That would mean each char is 2 bytes except the last one, which is only 1 byte. Why ?
68 00 65 00 6c 00 6c 00 6f 00 00 00 (so it ends with 2 bytes)
h     e     l     l     o     0
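The same reading can be checked mechanically. A tiny C sketch (`unit_at` is just an illustrative name) that pulls 16-bit little-endian units out of the byte dump:

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index-th 16-bit character of a UTF-16LE byte sequence:
   low byte first, high byte second. */
uint16_t unit_at(const unsigned char *p, size_t index)
{
    return (uint16_t)(p[2 * index] | (p[2 * index + 1] << 8));
}
```

On the "hello" bytes, unit 4 is 006Fh ('o') and unit 5 is 0000h. The "third" zero belongs to the 'o', not to the terminator.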
Rui
So is there a function to separate a unicode string into 2 parts?
See MSDN: MultiByteToWideChar (http://msdn.microsoft.com/en-us/library/ms776413.aspx) & WideCharToMultiByte (http://msdn.microsoft.com/en-us/library/ms776420.aspx) for conversion between UNICODE and ASCII text formats.
See \masm32\help\MASMLIB.CHM for a list of library functions for manipulating string data, both UNICODE and ASCII.
ChillyWilly,
The separator between the two words says it's not a normal unicode string, so you need to write the algo yourself.
would it be easier to convert the unicode to hex then parse it?
The problem I'm running into is that, since the string contains nulls, it terminates the procedure, because the procedure searches for the 0 at the end of the string to stop the search.
Changing it to hex is not going to help you because you will still have an indeterminate number of zeroes.
-- Paul
Quote from: ChillyWilly on September 14, 2008, 07:47:27 PM
would it be easier to convert the unicode to hex then parse it?
ChillyWilly,
Sorry, I didn't understand your problem ! You haven't solved the
problem "how to convert the unicode to hex" and you already want
to think about "how to parse it". Is the example you gave us
complete, or is there more behind it ?
As Hutch said, «The separator between the two words says its not a
normal unicode string, you need to write the algo yourself.»
And if your case is not a normal unicode string, you should see
what your case is and then try to write your own procedure. Do you know
how many bytes/words it has ?
Take a look at the following code. Does it help you ?
;**************************************************************************
; Console Assemble & Link
; *******************
include \masm32\include\masm32rt.inc
CvWordAscii proto :DWORD,:DWORD,:DWORD
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
_unicode db 68h, 00, 65h, 00, 6ch, 00, 6ch, 00, 6fh, 00, 00, 00 ; text word 1
db 77h, 00, 6fh, 00, 72h, 00, 6ch, 00, 64h, 00, 00, 00 ; text word 2
_ascii1 db 120 dup (?) ; buffer for text word 1
_ascii2 db 120 dup (?) ; buffer for text word 2
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start: invoke CvWordAscii, offset _unicode, offset _ascii1, offset _ascii2
print offset _ascii1,13,10
print offset _ascii2,13,10
inkey
exit
; «««««««««««««««««« procedure «««««««««««««««««««««««««««««
; pSrc = pointer to Source -> example: offset _unicode
; pDst1= pointer to Destination buffer 1 -> example: offset _ascii1
; pDst2= pointer to Destination buffer 2
;
CvWordAscii proc pSrc:DWORD, pDst1:DWORD, pDst2:DWORD
    mov ecx, pDst1
    mov edx, pSrc
@@: movzx eax, word ptr [edx]     ; read one 16-bit char
    mov byte ptr [ecx], al        ; keep only the low byte
    add edx, 2
    inc ecx
    cmp ax, 0                     ; is 0000h ?
    jne @B
    mov ecx, pDst2                ; edx already points at text word 2
@@: movzx eax, word ptr [edx]
    mov byte ptr [ecx], al
    add edx, 2
    inc ecx
    cmp ax, 0                     ; is 0000h ?
    jne @B
    ret
CvWordAscii endp
; ««««««««««««««««««««««««««««««««««««
end start
Rui
yes this is exactly what i needed :U
Quote from: ChillyWilly on September 15, 2008, 04:59:39 PM
yes this is exactly what i needed :U
Have a good work :U
RuiLoureiro