As a US user I have 26 letters in the alphabet. All the characters can be handled in the 256 values of the one-byte EBCDIC code.
How in the world do East Asian languages cram thousands of characters into this EBCDIC code?
i don't think you use ebcdic
i think you use ascii
anyways, they "cram" all those characters in by using unicode
in utf-16, most characters use 2 bytes (some need 4)
Did you know that you can print Japanese to the console if you use UTF-8 text?
I actually noticed that when doing CGI web programming. UTF-8 strings are not weird: no character other than NUL ever produces a zero byte, so they can still be zero terminated and we can use the good old byte-oriented ASCII APIs to print UTF-8 characters :green
(Of course, you won't see them in the old barebones Windows console, only on the web page.)
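To illustrate the point about zero-terminated APIs, here is a minimal Python sketch (the sample string is just an assumption for illustration). It checks that the ASCII part of a UTF-8 string keeps its ASCII byte values, that no zero byte appears, and that every byte of a non-ASCII character has its high bit set.

```python
# Sketch: why byte-oriented, zero-terminated string APIs can carry UTF-8.
text = "Hello, 日本語"              # sample string chosen for illustration
data = text.encode("utf-8")

# ASCII characters keep their single-byte ASCII values.
ascii_part = "Hello, ".encode("ascii")
starts_with_ascii = data.startswith(ascii_part)

# No byte of the encoded text is zero, so a C-style terminator stays unambiguous.
has_zero_byte = 0 in data

# Every byte belonging to a non-ASCII character is 0x80 or above.
non_ascii_bytes = data[len(ascii_part):]
all_high = all(b >= 0x80 for b in non_ascii_bytes)

print(len(text), "characters ->", len(data), "bytes")   # 10 characters -> 16 bytes
print(starts_with_ascii, has_zero_byte, all_high)        # True False True
```

This is also why a function that only copies or prints bytes up to a terminating zero does not need to know anything about UTF-8.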
Jack,
Dave is right: other alphabets and writing systems generally use Unicode. Its original 16-bit design could handle 64K characters (the current standard has room for over a million code points), not the 256 of an 8-bit ANSI code page or the 128 of plain ASCII. It covers traditional and simplified Chinese, Japanese kanji, Arabic, Greek and a whole host of other scripts that cannot all be handled in the 256 values available in an ASCII/ANSI code page.
There are various encodings of Unicode as well.
UTF-16 : the NT kernel operates in UTF-16 (2 bytes for most characters, 4 bytes for those outside the Basic Multilingual Plane). And so do all the Unicode APIs of Windows.
UTF-8 : Most web pages, if not all of the web, use UTF-8 (1 to 4 bytes per character).
UTF-32 : Fixed 4 bytes per character. Rarely used for files or web pages.
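The byte counts in the list above can be checked with a short Python sketch. The helper name `encoded_sizes` and the four sample characters are assumptions picked for illustration; the `-le` codec variants are used so that Python does not prepend a byte order mark.

```python
def encoded_sizes(ch):
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),   # -le avoids the 2-byte BOM
        "utf-32": len(ch.encode("utf-32-le")),   # -le avoids the 4-byte BOM
    }

# Latin letter, accented letter, a kanji, and a character outside the BMP.
for ch in ["A", "é", "漢", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
# U+0041 {'utf-8': 1, 'utf-16': 2, 'utf-32': 4}
# U+00E9 {'utf-8': 2, 'utf-16': 2, 'utf-32': 4}
# U+6F22 {'utf-8': 3, 'utf-16': 2, 'utf-32': 4}
# U+1F600 {'utf-8': 4, 'utf-16': 4, 'utf-32': 4}
```

The last line shows why UTF-16 is not a fixed 2-byte encoding: characters above U+FFFF take a pair of 16-bit units.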
Thanks for the explanation guys.
As an aside, I would think that would increase the size of programs by 2 to 4 times.
So on the Puter screen for English I can get 80 characters on one line and 40 or 20 in Chinese.
Is that a correct assumption?
Just curious.
normally, displayed text is only a small part of a program, so it doesn't affect the size that much
but - no - they get 80 chars on a line in a normal console
windows loads what is called a "code page" that translates characters for the display screen
it doesn't behave like the traditional DOS screens of old
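As a rough illustration of what a code page does, the Python sketch below decodes the same single byte under three different code pages. The choice of byte (0xE9) and of the three code pages is an assumption made for the example; Python's codecs stand in for the translation tables Windows would load.

```python
raw = bytes([0xE9])  # one byte, interpreted three different ways

# Each code page is a table mapping byte values to characters.
interpretations = {
    code_page: raw.decode(code_page)
    for code_page in ("cp1252", "cp437", "cp1251")
}

for code_page, ch in interpretations.items():
    print(code_page, "->", ch, f"(U+{ord(ch):04X})")
```

The byte stored in the program never changes; only the table used to turn it into a displayed character does. Under cp1252 (Western European) it comes out as é, while the old DOS page cp437 and the Cyrillic page cp1251 each give a different character.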
There are also traditional character encodings, such as JIS and Shift-JIS for Japanese, and Big5 for traditional Chinese.
I don't know the details.
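For anyone curious about those legacy encodings, here is a small Python sketch using the standard `shift_jis` and `big5` codecs. The sample strings are assumptions chosen for illustration. It shows that each ideograph takes two bytes, that ASCII stays at one byte, and that bytes written in one encoding do not decode correctly in the other.

```python
japanese = "日本語"
chinese = "中文"

sjis = japanese.encode("shift_jis")   # Shift-JIS bytes for the Japanese text
big5 = chinese.encode("big5")         # Big5 bytes for the Chinese text

print(sjis.hex(), len(sjis), "bytes")  # two bytes per character
print(big5.hex(), len(big5), "bytes")  # two bytes per character

# Plain ASCII letters still take a single byte in both encodings.
ascii_in_sjis = "A".encode("shift_jis")
ascii_in_big5 = "A".encode("big5")

# Decoding with the wrong table does not give back the original text.
misread = sjis.decode("big5", errors="replace")
print(misread == japanese)             # False
```

This incompatibility between national encodings is one of the problems Unicode was meant to remove.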
Ok, I can see how each character is coded as far as zeros and ones go, but how do you put
3,000 or so characters on a keyboard???
simple - they have small hands and can type twice as fast ? - lol
Quote from: shankle on August 28, 2009, 08:04:41 PM
Ok, I can see how each character is coded as far as zeros and ones go, but how do you put
3,000 or so characters on a keyboard???
Obviously, you don't
There's a common subset that you can get by with, which reduces it to maybe 200.
Then, for actually inputting symbols, you essentially spell them out (since each one represents a word/notion) and as you type they're auto-completed so you can pick out the one you want.
Thanks for the response guys.
Seems to me there are 2 solutions for these very difficult languages.
1: learn English
2: Get a Scottie Puter (like in Star Trek)
I would wonder how they ever get anything done on a Puter with such
difficult languages.
many of them know enough english to get past a command prompt - lol
windows "codepages" help them a lot, i am sure
Quote from: shankle on August 28, 2009, 08:04:41 PM
Ok, I can see how each character is coded as far as zeros and ones go, but how do you put
3,000 or so characters on a keyboard???
Multiple key combinations for the most common stuff. As I don't do Chinese, I don't know the logic of the key combinations.
In the case of Japanese, the IME uses "Romanized" notation (romaji). Enter the word using Latin letters. It will convert to hiragana, the preferred phonetic notation. Sometimes it will attempt to convert to kanji. At any time after conversion, you can hit the space bar to select alternate notations. It's similar to the way the Japanese wapuro (word processors) worked.
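The two IME steps described above (romaji to hiragana, then picking a kanji candidate) can be sketched as a toy in Python. Everything here is hypothetical and hand-picked for illustration: the tiny `ROMAJI_TO_HIRAGANA` table, the `CANDIDATES` list and the function name are assumptions, and a real IME uses complete syllable tables and large dictionaries.

```python
# Tiny hand-picked romaji table; a real IME covers every syllable.
ROMAJI_TO_HIRAGANA = {
    "ka": "か", "ji": "じ", "na": "な", "n": "ん",
}

# Hypothetical candidate list shown when the user presses the space bar.
CANDIDATES = {
    "かんじ": ["漢字", "感じ", "幹事"],
}

def romaji_to_hiragana(romaji):
    """Convert romaji to hiragana by greedy longest match against the table."""
    out = []
    i = 0
    while i < len(romaji):
        for size in (3, 2, 1):              # try the longest chunk first
            chunk = romaji[i:i + size]
            if chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                i += len(chunk)
                break
        else:
            out.append(romaji[i])           # pass unknown letters through unchanged
            i += 1
    return "".join(out)

kana = romaji_to_hiragana("kanji")
print(kana)                        # かんじ
print(CANDIDATES.get(kana, []))    # ['漢字', '感じ', '幹事']
```

The candidate list is the reason for the space bar: several different words share the same pronunciation, so the user has to pick the intended one.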
I bet in every language there would be fewer than 4 billion words used in daily conversation. If only we could make a word database where each unique word is represented by a dword. It would decrease the size of the text file.
Maybe :green
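The "Maybe" is justified. A minimal Python sketch of the dword-per-word idea follows; the function names and the whitespace-only word splitting are assumptions for illustration. It replaces each word with a 32-bit index into a vocabulary built on the fly.

```python
import struct

def encode_words(text):
    """Replace each whitespace-separated word with a 32-bit little-endian index."""
    vocabulary = {}                     # word -> index, built on the fly
    indices = []
    for word in text.split():
        index = vocabulary.setdefault(word, len(vocabulary))
        indices.append(index)
    packed = struct.pack(f"<{len(indices)}I", *indices)
    return packed, vocabulary

def decode_words(packed, vocabulary):
    """Rebuild the text from the packed indices and the vocabulary."""
    words = {index: word for word, index in vocabulary.items()}
    count = len(packed) // 4
    return " ".join(words[i] for i in struct.unpack(f"<{count}I", packed))

text = "the cat saw the other cat"
packed, vocabulary = encode_words(text)
print(len(text.encode("utf-8")), "bytes as text")   # 25 bytes as text
print(len(packed), "bytes as dwords")               # 24 bytes as dwords
print(decode_words(packed, vocabulary) == text)     # True
```

With short English words the saving is almost nothing, and the vocabulary itself still has to be stored or shared. A word of three letters plus a space already fits in four bytes, so the scheme only pays off for longer words.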
:bg
こんばんは。漢字タイピング平易。
Good evening. kanji typing is easy.
Load all of the East Asian fonts, get an IME and the biggest collection of dictionaries you can find and start typing. I found a program called WAKAN 1.76 that seems to work fine. Word order is SUBJECT, OBJECT, VERB [particle]. Particles will take some time to get the swing of. I have it set up so that I type romaji and it produces hiragana text; watch the word to see if you got it right, then press the space bar to insert the kanji character.
This stuff is confusing, but UTF-8 looks interesting. Do you have any C/asm/Delphi etc. functions that can convert them, or play with UTF-8?
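As a starting point for playing with UTF-8, here is a minimal sketch of an encoder written out by hand in Python. The function name `utf8_encode` is an assumption; the bit layout follows the standard UTF-8 rules, and the same shifts and masks should translate directly to C or assembly.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point (an int) as UTF-8 bytes."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x80:                       # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

# Compare the hand-written encoder with Python's built-in one.
for cp in (0x41, 0xE9, 0x6F22, 0x1F600):
    mine = utf8_encode(cp)
    print(f"U+{cp:04X}", mine.hex(), mine == chr(cp).encode("utf-8"))
```

The first byte tells a decoder how many bytes follow, and every continuation byte starts with the bits 10, which is what lets a program resynchronise in the middle of a string.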