The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: hutch-- on April 24, 2011, 02:52:47 AM

Title: uniedit 2 beta.
Post by: hutch-- on April 24, 2011, 02:52:47 AM
I have a fair bit more work done on this version. The search and replace appear to be working properly, I have added a ZOOM menu that is working OK on my XP SP3 dev box, and so far all of the hot keys are working correctly here. It is an English language editor in that the menus and dialog boxes have English text, but the target is for it to be able to open, edit and save any Unicode text that the OS version it is running on supports, with any character set available as a Unicode font. I don't know enough about right-to-left languages like Arabic and Hebrew, and as far as I can tell it is not possible to get right-to-left text entry on my US English Windows version, but it appears to handle the rest OK, including Chinese, Japanese, Thai, Russian, Georgian, Greek and a few others from east Asia.

I am sorry to impose on members, but I need to be able to provide a Unicode editor so that Unicode RC files can be written using non-ANSI character sets for resource strings that are accessed by LoadStringW and displayed natively as Unicode.

At this stage of development no file save confirmations are done, as they are a pest while doing a lot of testing. It does not yet store any settings, as I need a pretty good idea of what to keep in INI files before I add that capacity.
Title: Re: uniedit 2 beta.
Post by: dedndave on April 24, 2011, 03:18:33 AM
it looks really nice, Hutch
the text even zooms smoothly with Ctrl-Wheel   :U

i was able to close after edit - no warning that i had not saved my work
i'm sure that's something you intend to add later
Title: Re: uniedit 2 beta.
Post by: hutch-- on April 24, 2011, 06:18:24 AM
Dave,

Thanks for testing this. Yes, that is the file save confirmations; while I am working on it they are a pain, and I don't want to do the settings until I know reasonably well what I want to save settings for.
Title: Re: uniedit 2 beta.
Post by: jj2007 on April 24, 2011, 06:59:06 AM
Hutch,

When I open an existing rc file, I get this:
(http://www.masm32.com/board/index.php?action=dlattach;topic=16537.0;id=9149)

Maybe you should test for, and save, the BOM? It is straightforward; see this snippet from the RichMasm source:
    mov SF_x, SF_TEXT                     ; default is plain text
    invoke ReadFile, hFile, addr Read4, 4, ADDR dwBytesRead, 0
    mov eax, Read4                        ; first 4 bytes of the file, little-endian
    or eax, 20200000h                     ; force the 'r' and 't' bytes to lowercase so {\RT also matches
    .if eax=="tr\{"                       ; file starts with {\rt
        mov SF_x, SF_RTF or SFF_PLAINRTF  ; rtf if you find the magic string
        mov NeedsCode, eax                ; tr\{ as flag for StreamIn
    .elseif ax==0FEFFh                    ; BOM (bytes FF FE)
        mov SF_x, SF_UNICODE or SF_TEXT
    .endif
    invoke SetFilePointer, hFile, 0, 0, FILE_BEGIN ; back to start of file
Title: Re: uniedit 2 beta.
Post by: dedndave on April 24, 2011, 07:26:49 AM
you probably already know this stuff   :P

(http://www.masm32.com/board/index.php?action=dlattach;topic=16537.0;id=9150)


the attachment is the above PNG image renamed to ZIP
Title: Re: uniedit 2 beta.
Post by: hutch-- on April 24, 2011, 07:34:17 AM
JJ,

There is an API test for Unicode, but I have not played with it yet. It apparently works on most text but is unreliable on very low character counts. This editor is designed specifically for Unicode plain text; I don't want it dependent on any other format or configuration files. It is humorous that ANSI text reads as Chinese characters in Unicode; it is just that the Chinese characters are high enough up in the 2-byte WORD range not to have a trailing 0 after each character, so each pair of ANSI bytes reads as one of them.
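To make the "ANSI reads as Chinese" effect concrete, here is a small C sketch (the function names are hypothetical, not from uniedit) that pairs two ANSI bytes the way a UTF-16LE reader does and checks the result against the CJK Unified Ideographs block, U+4E00 to U+9FFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Combine two consecutive ANSI bytes the way a UTF-16LE reader would:
   the first byte becomes the low half of the 16-bit code unit,
   the second byte becomes the high half. */
static uint16_t ansi_pair_as_utf16le(unsigned char first, unsigned char second)
{
    return (uint16_t)(first | (second << 8));
}

/* True when the code unit falls in the CJK Unified Ideographs block. */
static bool is_cjk_unified(uint16_t unit)
{
    return unit >= 0x4E00 && unit <= 0x9FFF;
}
```

Any two lowercase ASCII letters give a unit between U+6161 and U+7A7A, which is inside that block, so ordinary English text opened as UTF-16LE tends to display as Chinese. Real UTF-16LE text for the same letters has a zero high byte ('h' becomes 0x0068), which is the trailing 0 mentioned above.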
Title: Re: uniedit 2 beta.
Post by: jj2007 on April 24, 2011, 09:51:32 AM
Don't bother with the Unicode API. Here is a test that covers all cases: RTF, plain ANSI, Unicode with and without BOM.

    invoke ReadFile, hFile, addr Read4, 4, ADDR dwBytesRead, 0
    mov eax, Read4                        ; first 4 bytes of the file, little-endian
    m2m edx, SF_TEXT                      ; default is plain text
    or eax, 20200000h                     ; force the 'r' and 't' bytes to lowercase
    .if eax=="tr\{"
        mov edx, SF_RTF or SFF_PLAINRTF   ; rtf if you find the magic string {\rt
    .elseif ax==0FEFFh                    ; BOM (bytes FF FE)
        mov dl, SF_UNICODE or SF_TEXT
    .else
        and eax, 0CF00FF00h               ; high bytes of both UTF-16 units zero? (ignores the bit set by the OR)
        .if Zero?
            mov dl, SF_UNICODE or SF_TEXT ; Unicode without BOM
        .endif
    .endif
    mov SF_x, edx
    invoke SetFilePointer, hFile, 0, 0, FILE_BEGIN ; back to start of file
    mov editstream.pfnCallback, StreamInProc       ; stream in the text
    ...
    invoke SendMessage, hRE, EM_STREAMIN, SF_x, addr editstream
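The same 4-byte test, restated in portable C as a sketch for readers less at home with MASM's multi-character constants. The enum and function names are hypothetical; the logic follows the snippet above: the DWORD is read little-endian, the OR lowercases the 'r' and 't' bytes, and the 0CF00FF00h mask checks that the high bytes of both UTF-16 units are zero while ignoring the bit the OR just set. The guards for files shorter than 4 bytes are an assumption added here; the MASM version always reads 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Possible outcomes of sniffing the first four bytes (hypothetical names). */
enum text_kind { KIND_ANSI, KIND_RTF, KIND_UTF16LE_BOM, KIND_UTF16LE_NOBOM };

static enum text_kind sniff_first4(const unsigned char *p, size_t n)
{
    uint32_t d = 0;

    /* Build the DWORD little-endian, like "mov eax, Read4". */
    for (size_t i = 0; i < 4 && i < n; i++)
        d |= (uint32_t)p[i] << (8 * i);

    /* Lowercase the bytes at offsets 2 and 3 ('r' and 't'), like "or eax, 20200000h". */
    d |= 0x20200000u;

    /* "{\rt" in memory equals the MASM constant "tr\{" = 74725C7Bh. */
    if (n >= 4 && d == 0x74725C7Bu)
        return KIND_RTF;

    /* BOM bytes FF FE read as a little-endian WORD give 0FEFFh. */
    if (n >= 2 && (d & 0xFFFFu) == 0xFEFFu)
        return KIND_UTF16LE_BOM;

    /* Both high bytes zero (bit 5 of the top byte masked out because the OR set it). */
    if (n >= 4 && (d & 0xCF00FF00u) == 0)
        return KIND_UTF16LE_NOBOM;

    return KIND_ANSI;
}
```

As in the original, the BOM-less branch is a heuristic based on the first two characters only.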
Title: Re: uniedit 2 beta.
Post by: hutch-- on April 24, 2011, 10:13:50 AM
JJ,

This is the format in which the default EM_STREAMOUT writes the content to disk: byte pairs. For low ANSI characters the first byte contains the character and the second is NULL; for higher characters both bytes are non-NULL. What I don't want is any leading prefix, as the editor is designed to work in plain text only. QE opens anything, including binary; it displays garbage for binary and readable but unsavable, double-spaced text for Unicode, and that has never been a problem. I may end up testing for Unicode, as large image files are very slow to load even though they only display garbage.

What I am chasing at the moment is whether it performs search and replace correctly on Windows versions that use non-European character sets. I have tested it in Chinese, Japanese, Thai, Georgian, Russian, Greek and a few others and it seems to be working properly, but native users of non-European character sets may see something that I have not tested.


    db 47,0,47,0,32,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,13,0,10,0,13,0
    db 10,0,35,0,105,0,110,0,99,0,108,0,117,0,100,0
    db 101,0,32,0,34,0,92,0,109,0,97,0,115,0,109,0
    db 51,0,50,0,92,0,73,0,78,0,67,0,76,0,85,0
    db 68,0,69,0,92,0,82,0,101,0,115,0,111,0,117,0
    db 114,0,99,0,101,0,46,0,104,0,34,0,13,0,10,0
    db 13,0,10,0,47,0,47,0,32,0,164,0,164,0,164,0
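The byte pairs in this dump can be read back with a minimal sketch like the following (the helper names are hypothetical): each UTF-16LE code unit is the first byte plus the second byte shifted left by 8. The dump's 35,0,105,0 decodes to "#i", and 164,0 is U+00A4; a character such as U+4E2D would be stored as 2Dh, 4Eh, with neither byte NULL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index-th UTF-16LE code unit of a byte buffer:
   low byte first, high byte second, as in the db dump above.
   The caller must ensure the buffer holds at least 2 * (index + 1) bytes. */
static uint16_t utf16le_unit_at(const unsigned char *bytes, size_t index)
{
    return (uint16_t)(bytes[2 * index] | (bytes[2 * index + 1] << 8));
}

/* True for the "low ANSI" case described above: the second byte is NULL. */
static bool high_byte_is_null(uint16_t unit)
{
    return (unit >> 8) == 0;
}
```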
Title: Re: uniedit 2 beta.
Post by: jj2007 on April 24, 2011, 10:31:02 AM
Hutch,

For practical reasons, you should check for the BOM, as shown above. It is a de facto standard for plain text Unicode files, and not by accident, even \masm32\bin\rc dated 1998 knows what a BOM is.
Title: Re: uniedit 2 beta.
Post by: hutch-- on April 24, 2011, 02:20:31 PM
Interestingly, IsTextUnicode() worked better than I expected with a judicious choice of flags. The only problem was very short text in characters high enough to use both bytes, with no NULLs or CRLFs, and that was easy to fix: if the text is shorter than a normal long line, allow it, garbage and all. This way I preserve the original design of pure plain text. I don't want to go down the slippery slope of UTF-8/16, RTF, PDF, DOC(x), etc. If I was writing a word processor it may be viable to do that, but I want a pure plain text editor. The IsTextUnicode() function solved the problem of accidentally loading a binary file or something in non-Unicode format, which effectively locked up the editor while it chomped through the non-Unicode data.
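IsTextUnicode() itself is a Windows API, so the following is only a portable, hypothetical stand-in that illustrates the loading policy described above; it does not reproduce IsTextUnicode()'s actual tests. Short text is allowed as-is; otherwise the buffer must have an even byte count and contain almost no NUL or control code units, which binary files usually fail. The short_limit parameter and the 1% threshold are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Crude stand-in for an IsTextUnicode()-style check (NOT the Windows algorithm). */
static bool accept_as_utf16le(const unsigned char *buf, size_t n, size_t short_limit)
{
    size_t units = n / 2;

    /* Short text: allow it, garbage and all. short_limit is a hypothetical
       "normal long line" length, counted in UTF-16 code units. */
    if (units < short_limit)
        return true;

    /* UTF-16 text always has an even byte count. */
    if (n % 2 != 0)
        return false;

    /* Count code units that are NUL or C0 controls other than TAB, LF and CR;
       binary data tends to be full of them, plain text almost never. */
    size_t suspicious = 0;
    for (size_t i = 0; i < n; i += 2) {
        unsigned unit = (unsigned)buf[i] | ((unsigned)buf[i + 1] << 8);
        if (unit < 0x20 && unit != 0x09 && unit != 0x0A && unit != 0x0D)
            suspicious++;
    }

    /* Hypothetical threshold: reject if more than 1% of the units are suspicious. */
    return suspicious * 100 <= units;
}
```

In a real Win32 build the statistical part would be replaced by the IsTextUnicode() call itself, keeping only the short-text allowance around it.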
Title: Re: uniedit 2 beta.
Post by: dedndave on April 24, 2011, 02:25:56 PM
i think the problem is, Hutch is trying to create a unicode editor for resource files, not necessarily for general unicode text files
i suppose a unicode resource file will have the BOM (don't know if rc.exe uses that or not)
but, individual string definitions will not have the BOM
Title: Re: uniedit 2 beta.
Post by: hutch-- on April 24, 2011, 03:39:53 PM
Dave,

It will open anything it can write and happily opens the Unicode DrWatson log files and every other Microsoft Unicode file I could find. The problem was that if you tried to stream in a binary file by accident, the EM_STREAMIN processing was very slow and the editor just about locked up until it loaded. It's easy enough to put the file IO in separate threads, which is what I will do with it later, but the target is any pure Unicode text file, which it seems to handle fine now that it does not lock up for a while on binary files. I currently have a message box offering the choice of loading the non-compliant file, but it's only really there to catch incorrect file types.
Title: Re: uniedit 2 beta.
Post by: jj2007 on April 24, 2011, 03:46:09 PM
Quote from: dedndave on April 24, 2011, 02:25:56 PM
i suppose a unicode resource file will have the BOM (don't know if rc.exe uses that or not)
but, individual string definitions will not have the BOM

I use UC resource files all the time with RichMasm, without any problems. Microsoft's rc.exe does not need the BOM but is also not unhappy finding it. By the way, the BOM appears only once at the beginning of a UC text file and is a pretty unique 2-byte combination.
Title: Re: uniedit 2 beta.
Post by: aker on April 26, 2011, 02:36:43 AM
Hope it will support all Unicode encodings,  :lol

CodepageId   CodepageName
1200         Unicode
1201         Unicode (Big-Endian)
12000        Unicode UCS-4 Little-Endian
12001        Unicode UCS-4 Big-Endian
65000        Unicode (UTF-7)
65001        Unicode (UTF-8)
65005        Unicode (UTF-32)
65006        Unicode (UTF-32 Big-Endian)

and WideChar.
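Only some of these encodings can be recognized from the file itself. A minimal sketch (the function name is hypothetical) that maps the standard byte order marks to the code page IDs in the list above, returning 0 when no BOM is found. UTF-32LE must be tested before UTF-16LE because both start with FF FE, so a UTF-16LE file whose first character is U+0000 would be misread, an unlikely case for text files. UTF-7 and the 65005/65006 entries are left out.

```c
#include <stddef.h>

/* Map a leading byte order mark to a code page ID from the list above.
   Returns 0 when no BOM is recognised (a hypothetical convention). */
static unsigned bom_codepage(const unsigned char *p, size_t n)
{
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return 12000;   /* UTF-32 little-endian: FF FE 00 00 */
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return 12001;   /* UTF-32 big-endian: 00 00 FE FF */
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return 65001;   /* UTF-8: EF BB BF */
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return 1200;    /* UTF-16 little-endian: FF FE */
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return 1201;    /* UTF-16 big-endian: FE FF */
    return 0;           /* no BOM: fall back to content sniffing */
}
```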