IsTextUTF

Started by bomz, June 03, 2011, 09:19:57 PM

bomz

It works, but I'm not sure it is correct.
; ===============================================================
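    mov esi, fileSave           ; esi -> start of the zero-terminated buffer
Next3:
    lodsb                       ; al = next byte, esi advances
    cmp al, 0D0h                ; 208: UTF-8 lead byte for U+0400..U+043F
    je Found2
    cmp al, 0D1h                ; 209: UTF-8 lead byte for U+0440..U+047F
    je Found2
NoUTF:
    cmp al, 0                   ; end of buffer?
    je Exit
    jmp Next3
Found2:
    lodsb                       ; al = byte that follows the lead byte
    cmp al, 82h                 ; accept only 82h..9Fh as the second byte
    jb NoUTF
    cmp al, 9Fh
    ja NoUTF
    invoke UTF8ToChar, fileSave ; first plausible pair found: convert the buffer
Exit: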

; ===============================================================
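Checking only for a D0h/D1h lead byte followed by one byte in a fixed range covers Cyrillic but not UTF-8 in general. A stricter test walks every multi-byte sequence and checks its structure. Here is a minimal sketch of that idea, written in C to keep the logic easy to follow; the name is_valid_utf8 is hypothetical and this is not the UTF8ToChar routine used above. It assumes the caller knows the buffer length.

```c
#include <stddef.h>

/* Returns 1 if the n bytes at s are well-formed UTF-8, otherwise 0.
   Pure ASCII also returns 1, because ASCII is a subset of UTF-8. */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t extra;       /* number of continuation bytes expected */
        size_t k;
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) { i++; continue; }                 /* ASCII byte */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;      /* stray continuation byte or invalid lead byte */

        if (n - i - 1 < extra) return 0;                 /* sequence is cut off */
        for (k = 1; k <= extra; k++) {
            unsigned char t = s[i + k];
            if ((t & 0xC0) != 0x80) return 0;            /* must be 10xxxxxx */
            cp = (cp << 6) | (unsigned long)(t & 0x3F);
        }
        if (cp < min) return 0;                          /* overlong encoding */
        if (cp > 0x10FFFF) return 0;                     /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;      /* UTF-16 surrogate */
        i += extra + 1;
    }
    return 1;
}
```

A result of 1 only says the bytes are legal UTF-8. Text in a single-byte code page such as Windows-1251 almost never passes by accident, which is what makes this usable as a detector, but a pure ASCII buffer passes too and needs no conversion.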

baltoro

BOMZ,
Are you referring to UTF-8, UTF-16, UTF-32, or all of the preceding?
Baltoro

bomz

UTF-8 only. I tried IsTextUnicode, but I don't understand how it works: it recognizes all text as Unicode. If the text contains no UTF-8 sequences, plain ASCII text does not break anything, but I am curious how to recognize it properly.
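One thing worth keeping in mind: IsTextUnicode runs heuristics for UTF-16 (what Win32 calls "Unicode"), so it is not a UTF-8 detector. The cheapest first check for either encoding is a byte order mark at the start of the buffer. A minimal C sketch, assuming the hypothetical names detect_bom and bom_kind:

```c
#include <stddef.h>

enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE };

/* Looks at the first bytes of the buffer and reports which byte order
   mark, if any, is present. FF FE is reported as UTF-16 LE, although
   FF FE 00 00 would also be the UTF-32 LE mark. */
enum bom_kind detect_bom(const unsigned char *s, size_t n)
{
    if (n >= 3 && s[0] == 0xEF && s[1] == 0xBB && s[2] == 0xBF)
        return BOM_UTF8;
    if (n >= 2 && s[0] == 0xFF && s[1] == 0xFE)
        return BOM_UTF16_LE;
    if (n >= 2 && s[0] == 0xFE && s[1] == 0xFF)
        return BOM_UTF16_BE;
    return BOM_NONE;
}
```

Many UTF-8 files carry no mark at all, so BOM_NONE settles nothing by itself; the byte sequences still have to be inspected.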

http://narod.ru/disk/10932668001/%D0%A2%D0%B5%D0%BA%D1%81%D1%82%D0%BE%D0%B2%D1%8B%D0%B9%20%D0%B4%D0%BE%D0%BA%D1%83%D0%BC%D0%B5%D0%BD%D1%82%20%D0%A2%D0%B5%D0%BA%D1%81%D1%82%D0%BE%D0%B2%D1%8B%D0%B9%20%D0%B4%D0%BE%D0%BA%D1%83%D0%BC%D0%B5%D0%BD%D1%82%20%D0%A2%D0%B5%D0%BA%D1%81%D1%82%D0%BE%D0%B2%D1%8B%D0%B9%20%D0%B4%D0%BE%D0%BA%D1%83%D0%BC%D0%B5%D0%BD%D1%82.txt.html

I know of only one crazy server like this. It reports an ISO encoding. After UrlUnescape and three HTTP redirects, the problem is giving the file the right name.

hutch--

bomz,

Give this a blast.


    mov ucflg, IS_TEXT_UNICODE_CONTROLS or IS_TEXT_UNICODE_NULL_BYTES or \
               IS_TEXT_UNICODE_STATISTICS or IS_TEXT_UNICODE_ILLEGAL_CHARS

    invoke IsTextUnicode,pMem,ccnt,ADDR ucflg

bomz



include \MASM32\INCLUDE\Advapi32.inc
includelib \MASM32\LIB\Advapi32.lib

invoke lstrlen, fileSave
mov bwrite, IS_TEXT_UNICODE_CONTROLS or IS_TEXT_UNICODE_NULL_BYTES or \
IS_TEXT_UNICODE_STATISTICS or IS_TEXT_UNICODE_ILLEGAL_CHARS
invoke IsTextUnicode,fileSave,eax,ADDR bwrite
.if bwrite == 0
invoke MessageBox,0,0,0,0
invoke UTF8ToChar, fileSave
.endif


The text passes the test: does that mean the text is Unicode, or that it is not? My code works on the examples I tried, but if you search the internet for solutions to this problem, you will see that it may be more complicated.




The first file does not have a UTF name; the second does.

Why I don't like the IsTextUnicode API

baltoro

BOMZ,
I don't know if you solved your problem, but I read this blog entry: How do I convert a UTF-8 string to UTF-16 while rejecting illegal sequences? The idea is kind of crappy (you wouldn't want to use it in production code, but you could use it as a concept test): you could invoke MultiByteToWideChar and, presumably, WideCharToMultiByte to convert UTF-8 strings to UTF-16 and back, and then compare the output and the resulting errors, if any.
...If this sounds like a crappy idea, it's probably because it is.
Baltoro
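The round-trip idea can be tried without the Win32 calls. Here is a minimal C sketch, assuming two hand-written helpers (decode_one and encode_one, both hypothetical) in place of MultiByteToWideChar and WideCharToMultiByte. It goes through code points rather than UTF-16. The decoder is deliberately lenient, so the comparison after re-encoding is what catches overlong and otherwise non-canonical input.

```c
#include <stddef.h>
#include <string.h>

/* Lenient decoder: reads one code point from s (n bytes available).
   Returns the number of bytes consumed, or 0 on a structural error. */
static size_t decode_one(const unsigned char *s, size_t n, unsigned long *cp)
{
    size_t len, k;
    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;                           /* not a valid lead byte */
    if (n < len) return 0;                   /* sequence is cut off */
    for (k = 1; k < len; k++) {
        if ((s[k] & 0xC0) != 0x80) return 0; /* not a continuation byte */
        *cp = (*cp << 6) | (unsigned long)(s[k] & 0x3F);
    }
    return len;
}

/* Shortest-form encoder: writes cp into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp cannot be encoded. */
static size_t encode_one(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogate */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Returns 1 if decoding and re-encoding reproduces the input exactly. */
int utf8_round_trips(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned long cp;
        unsigned char buf[4];
        size_t used = decode_one(s + i, n - i, &cp);
        size_t made;
        if (used == 0) return 0;
        made = encode_one(cp, buf);
        if (made != used || memcmp(buf, s + i, used) != 0) return 0;
        i += used;
    }
    return 1;
}
```

With the real API, passing MB_ERR_INVALID_CHARS to MultiByteToWideChar makes the conversion fail on invalid input, which may make the second conversion and the comparison unnecessary.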