uniedit 2 beta.

Started by hutch--, April 24, 2011, 02:52:47 AM

hutch--

I have a fair bit more work done on this version. Search and replace appear to be working properly, I have added a ZOOM menu that works OK on my XP SP3 dev box, and so far all of the hot keys work correctly here. It is an English language editor in that the menus and dialog boxes have English text, but the target is for it to open, edit and save any Unicode text that the OS version it is running on supports, with any character set available as a Unicode font. I don't know enough about right-to-left languages like Arabic and Hebrew, and as far as I can tell it is not possible to get right-to-left text entry on my US English Windows version, but it appears to handle the rest OK, and that includes Chinese, Japanese, Thai, Russian, Georgian, Greek and a few others from East Asia.

I am sorry to impose on members, but I need to be able to provide a Unicode editor so that Unicode RC files can be written using non-ANSI character sets, with resource strings that are accessed by LoadStringW and displayed natively as Unicode.

At this stage of development no file confirmations are done, as they are a pest while doing a lot of testing. It does not yet store any settings, as I need a pretty good idea of what to keep in INI files before I add the capacity.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

it looks really nice, Hutch
the text even zooms smoothly with Ctrl-Wheel   :U

i was able to close after edit - no warning that i had not saved my work
i'm sure that's something you intend to add later

hutch--

Dave,

Thanks for testing this. Yes, that is the file save confirmations. While I am working on it, it's a pain, and I don't want to do the settings until I know reasonably well what I want to save settings for.

jj2007

Hutch,

When I open an existing rc file, I get this:


Maybe you should test for, and save, the BOM? It is straightforward, see this snippet from the RichMasm source:
mov SF_x, SF_TEXT                       ; default is plain text
invoke ReadFile, hFile, addr Read4, 4, ADDR dwBytesRead, 0
mov eax, Read4
or eax, 20200000h                       ; force lowercase for tr
.if eax=="tr\{"
    mov SF_x, SF_RTF or SFF_PLAINRTF    ; rtf if you find the magic string
    mov NeedsCode, eax                  ; tr\{ as flag for StreamIn
.elseif ax==0FEFFh                      ; BOM
    mov SF_x, SF_UNICODE or SF_TEXT
.endif
invoke SetFilePointer, hFile, 0, 0, FILE_BEGIN

dedndave

you probably already know this stuff   :P




the attachment is the above PNG image renamed to ZIP

hutch--

JJ,

There is a test-for-Unicode API but I have not played with it yet. It apparently works on most text but is unreliable on very low character counts. This editor is designed specifically for Unicode plain text; I don't want it dependent on any other format or configuration files. It is humorous that ANSI characters read as Chinese characters in Unicode. It is just that the Chinese characters are high enough up in the 2-byte WORD range not to have a trailing 0 after each character.
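The "ANSI reads as Chinese" effect can be checked with a little arithmetic. The sketch below is only an illustration in portable C, not code from the editor, and the function names are made up for the example: it joins two bytes into one UTF-16 little-endian code unit and tests whether the result falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the two bytes at p as one UTF-16 little-endian code unit.
   (Hypothetical helper for this example.) */
uint16_t utf16le_unit(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* True if the code unit lies in the CJK Unified Ideographs block,
   U+4E00..U+9FFF. */
int is_cjk_ideograph(uint16_t u)
{
    return u >= 0x4E00 && u <= 0x9FFF;
}
```

For the ANSI bytes "#i" (23h, 69h) the code unit is 6923h, which is inside the CJK block, while a genuine UTF-16 "/" (2Fh, 00h) gives 002Fh and is not.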

jj2007

Don't bother with the Unicode API. Here is a test that covers all cases: RTF, plain ANSI, Unicode with and without BOM.

invoke ReadFile, hFile, addr Read4, 4, ADDR dwBytesRead, 0
mov eax, Read4
m2m edx, SF_TEXT                        ; default is plain text
or eax, 20200000h                       ; force lowercase for tr
.if eax=="tr\{"
    mov edx, SF_RTF or SFF_PLAINRTF     ; rtf if you find the magic string
.elseif ax==0FEFFh                      ; BOM
    mov dl, SF_UNICODE or SF_TEXT
.else
    and eax, 0CF00FF00h
    .if Zero?
        mov dl, SF_UNICODE or SF_TEXT   ; Unicode without BOM
    .endif
.endif
mov SF_x, edx
invoke SetFilePointer, hFile, 0, 0, FILE_BEGIN  ; back to start of file
mov editstream.pfnCallback, StreamInProc        ; stream in the text
...
invoke SendMessage, hRE, EM_STREAMIN, SF_x, addr editstream
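
For readers who do not follow MASM, the order of tests in the snippet above can be sketched in portable C. This is an illustration only, not the RichMasm source: the enum and function name are invented for the example, it returns its own result codes and not the SF_* flags, and the no-BOM test is simplified to require that the second and fourth bytes are exactly zero (the assembly mask is slightly more permissive on the fourth byte).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch (not the richedit.h SF_* flags). */
typedef enum { KIND_ANSI, KIND_RTF, KIND_UTF16_BOM, KIND_UTF16_NOBOM } file_kind;

/* Classify a file from its first bytes: RTF magic first, then the
   UTF-16LE BOM, then "zero high bytes" as a guess for BOM-less UTF-16LE. */
file_kind classify_head(const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);
    if (n >= 4 && p[0] == '{' && p[1] == '\\' &&
        (p[2] | 0x20) == 'r' && (p[3] | 0x20) == 't')
        return KIND_RTF;            /* "{\rt", case-insensitive on r and t */
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return KIND_UTF16_BOM;      /* FF FE is the UTF-16 little-endian BOM */
    if (n >= 4 && p[1] == 0 && p[3] == 0)
        return KIND_UTF16_NOBOM;    /* looks like two low-range UTF-16LE chars */
    return KIND_ANSI;               /* default: plain ANSI text */
}
```

Like the assembly, the last test is a guess from four bytes only, so text that opens with two characters above U+00FF and has no BOM falls through to the ANSI default.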

hutch--

JJ,

This is the format in which the default EM_STREAMOUT writes the content to disk. In byte pairs for low ANSI, the first byte contains the character and the second is a NULL. For higher characters, both the first and second bytes are non-NULL. What I don't want is any leading prefix, as the editor is designed to work in plain text only. QE opens anything including binary, and while it displays garbage for binary and shows Unicode as readable but not savable, with double spacing, it has never been a problem. I may end up testing for Unicode, as large image files are very slow to load even though they display garbage.

What I am chasing at the moment is whether it performs search and replace correctly on Windows versions that use non-European character sets. I have tested it in Chinese, Japanese, Thai, Georgian, Russian, Greek and a few others and it seems to be working properly, but native users of non-European character sets may see something that I have not tested.


    db 47,0,47,0,32,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,164,0,164,0,164,0
    db 164,0,164,0,164,0,164,0,164,0,13,0,10,0,13,0
    db 10,0,35,0,105,0,110,0,99,0,108,0,117,0,100,0
    db 101,0,32,0,34,0,92,0,109,0,97,0,115,0,109,0
    db 51,0,50,0,92,0,73,0,78,0,67,0,76,0,85,0
    db 68,0,69,0,92,0,82,0,101,0,115,0,111,0,117,0
    db 114,0,99,0,101,0,46,0,104,0,34,0,13,0,10,0
    db 13,0,10,0,47,0,47,0,32,0,164,0,164,0,164,0
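
The byte-pair layout in the dump above can be reproduced with a few lines of portable C. This is a sketch for illustration, with a made-up function name, and it is restricted to 7-bit ASCII so that no code page assumptions are needed: each character is written as its own byte followed by a zero.

```c
#include <assert.h>
#include <stddef.h>

/* Write each 7-bit ASCII character of `text` as a UTF-16LE byte pair
   (the character, then 0). Returns the number of bytes written, or 0 if
   `out` is too small or a non-ASCII byte is found.
   (Hypothetical helper for this example.) */
size_t ascii_to_utf16le(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;
    assert(text != NULL && out != NULL);
    for (; *text != '\0'; ++text) {
        unsigned char c = (unsigned char)*text;
        if (c > 0x7F || cap - n < 2)
            return 0;               /* not ASCII, or no room for the pair */
        out[n++] = c;               /* low byte: the character itself */
        out[n++] = 0;               /* high byte: zero for low characters */
    }
    return n;
}
```

Encoding "// " this way gives 47,0,47,0,32,0, which matches the first six bytes of the dump.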

jj2007

Hutch,

For practical reasons, you should check for the BOM, as shown above. It is a de facto standard for plain text Unicode files, and not by accident: even \masm32\bin\rc dated 1998 knows what a BOM is.

hutch--

Interestingly, IsTextUnicode() worked better than I expected with a judicious choice of flags. The only problem was very short text in characters high enough to use both bytes with no nulls or CRLFs, and that was easy to fix: if the text is shorter than a normal long line, allow it, garbage and all. This way I preserve the original design of pure plain text. I don't want to go down the slippery slope of UTF-8/16, RTF, PDF, DOC(x) etc. If I were writing a word processor it might be viable to do that, but I want a pure plain text editor. The IsTextUnicode() function solved the problem of accidentally trying to load a binary file or something in a non-Unicode format, which was effectively locking up the editor while it chomped through the non-Unicode data.
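
The load policy described here (trust the detector, except for text too short for it to judge) can be sketched as follows. Everything in this block is an assumption made for illustration: the names are invented, the length limit is a parameter because the post gives no figure, and `mostly_zero_high_bytes` is only a crude stand-in so the example runs anywhere. It is not what IsTextUnicode() does internally.

```c
#include <assert.h>
#include <stddef.h>

/* Signature of a pluggable "does this look like Unicode?" test.
   On Windows a wrapper around IsTextUnicode() could be passed here. */
typedef int (*unicode_test_fn)(const unsigned char *buf, size_t len);

/* Stand-in detector for this sketch: reports Unicode when at least half
   of the high (odd-offset) bytes are zero. */
int mostly_zero_high_bytes(const unsigned char *buf, size_t len)
{
    size_t pairs = len / 2, zeros = 0, i;
    for (i = 0; i < pairs; ++i)
        if (buf[2 * i + 1] == 0)
            ++zeros;
    return pairs > 0 && zeros * 2 >= pairs;
}

/* Accept text shorter than short_limit bytes unconditionally ("garbage and
   all"); otherwise defer to the detector. */
int accept_as_unicode_text(const unsigned char *buf, size_t len,
                           size_t short_limit, unicode_test_fn looks_unicode)
{
    assert(looks_unicode != NULL);
    if (len < short_limit)
        return 1;
    return looks_unicode(buf, len) != 0;
}
```

A caller would show the "load anyway?" message box only when `accept_as_unicode_text` returns 0.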

dedndave

i think the problem is, Hutch is trying to create a unicode editor for resource files, not necessarily for general unicode text files
i suppose a unicode resource file will have the BOM (don't know if rc.exe uses that or not)
but, individual string definitions will not have the BOM

hutch--

Dave,

It will open anything it can write and happily opens the Unicode DrWatson log files and every other Microsoft Unicode file I could find. The problem was that if you tried to stream in a binary file by accident, the EM_STREAMIN processing was very slow and the editor just about locked up until it loaded. It's easy enough to put the file IO in separate threads, which is what I will do later, but the target is any pure Unicode text file, which it seems to handle fine now that it does not lock up for a while on binary files. I currently have a message box with the choice of loading the non-compliant file, but it's only really there to catch incorrect file types.

jj2007

Quote from: dedndave on April 24, 2011, 02:25:56 PM
i suppose a unicode resource file will have the BOM (don't know if rc.exe uses that or not)
but, individual string definitions will not have the BOM

I use UC resource files all the time with RichMasm, without any problems. Microsoft's rc.exe does not need the BOM but is also not unhappy finding it. By the way, the BOM appears only once at the beginning of a UC text file and is a pretty unique 2-byte combination.

aker

I hope it will support all Unicode encodings  :lol

CodepageId CodepageName
1200 Unicode
1201 Unicode (Big-Endian)
12000 Unicode UCS-4 Little-Endian
12001 Unicode UCS-4 Big-Endian
65000 Unicode (UTF-7)
65001 Unicode (UTF-8)
65005 Unicode (UTF-32)
65006 Unicode (UTF-32 Big-Endian)

and WideChar.
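
Several of the encodings in this list can be told apart by their byte order mark. The sketch below is a portable C illustration with an invented function name, not part of uniedit: it maps the standard BOM byte sequences to the code page ids 1200, 1201, 12000, 12001 and 65001 from the list. UTF-7 and files without a BOM are not covered and return 0.

```c
#include <assert.h>
#include <stddef.h>

/* Return the Windows code page id implied by a byte order mark at the
   start of `p`, or 0 if no BOM is recognised.
   (Hypothetical helper for this example.) */
unsigned bom_codepage(const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);
    /* The 4-byte marks are tested first because FF FE is also a prefix
       of the UTF-32 little-endian mark. */
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0 && p[3] == 0)
        return 12000;   /* UTF-32 little-endian */
    if (n >= 4 && p[0] == 0 && p[1] == 0 && p[2] == 0xFE && p[3] == 0xFF)
        return 12001;   /* UTF-32 big-endian */
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return 65001;   /* UTF-8 */
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return 1200;    /* UTF-16 little-endian */
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return 1201;    /* UTF-16 big-endian */
    return 0;
}
```

One known ambiguity: FF FE 00 00 could also be a UTF-16 little-endian file whose first character is U+0000, which is rare in plain text, so this sketch prefers the UTF-32 reading.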