Greenglish, Greeklish, Γρήκλης
Chatted with yorix from HelMUG today on AIM (using BitlBee on my side) and he mentioned that he had always wanted to write a program to translate Greeklish to real Greek characters and back. As it happened we were both using Greeklish and as it also happened I had been looking up information on Greeklish on the web the day before.
Greeklish (or Greenglish) is Greek written with Latin characters: for example, you write ellhnika instead of ελληνικά. Transliteration relies on the human ability to make sense of inexact rules. Machine transliteration would face difficulties, because everybody makes up their own Greeklish. I had some ideas about coding for transliteration already...
Some links: Scientific approach (Greek), a table for transliteration (English), sample from irc instructions.
Another example is part of our discussion in the chat:
11:41 [+yorixofhelm] betabug: Apo kairo hthela na kanw mia efarmogi pou na
metafrazei apo greeklish se ellinika...
11:41 [+yorixofhelm] betabug: Mporei loipon na fanei xrisimh :-)
11:41 [+yorixofhelm] betabug: Alla mporeis na kaneis kai ayto pou les...
11:42 [@betabug] yorixofhelm: nai, kai egw to skeftika ayto
Yorix writes the Greek θ (theta) as "th", while I tend to use the 8. But since nobody tells me to, I may forget and use "th" sometimes. Same game with other characters. Yorix uses "u" for the Greek "ypsilon", while I went with the Latin "y". His choice is better, as it is a closer visual match and leaves the Latin "y" free to be used for the Greek psi (ψ). I changed my habit.
Most disturbing for me is the use of "r" for ρ (Greek rho). I always want to use "p" for this, which is a better visual match. But there is no obvious visual choice for π (Greek pi), so it is stuck with "p".
Another problem with machine-supported Greeklish would of course be that it sits very high "up" in the human interface, close to the top of the application level. It could almost be implemented as a keyboard layout and font (so you could write and see proper Greek characters on your own side, while the other side gets Greeklish). This solution would presume an encoding where characters have a 1-to-1 relationship. Theta (θ) could not be translated to "th", because a font only shows a different character on our side; underneath it is still just one character.
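To make the 1-to-1 constraint concrete, here is a minimal Python sketch. The table is my own assumption, not a standard: it picks one Latin character per Greek letter ("8" for θ, "3" for ξ, "y" for ψ), ignores capitals and accents, and the function names are made up for illustration.

```python
# Hypothetical one-to-one table: every Greek letter maps to exactly one Latin
# character, so theta has to be "8" and cannot be "th".
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"
LATIN = "abgdezh8iklmn3oprstufxyw"

TO_GREEKLISH = str.maketrans(GREEK, LATIN)
TO_GREEK = str.maketrans(LATIN, GREEK)


def to_greeklish(text):
    """Replace each unaccented lowercase Greek letter with its Latin stand-in."""
    return text.translate(TO_GREEKLISH)


def to_greek(text):
    """Reverse mapping; only lossless because the table is strictly 1-to-1."""
    return text.translate(TO_GREEK)


if __name__ == "__main__":
    print(to_greeklish("ελληνικα"))
    print(to_greek("8elw"))
```

Because both directions are plain character lookups, a round trip gives back the original text. Anything outside the table, such as accented vowels, passes through untouched.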
If OTOH we code a routine that transliterates input and display, then my first question is: "How do we hook that up to our applications?" Where does iChat get it? How do I tell Terminal.app? When writing web pages in TextEdit, do I have to call an AppleScript every time I save?
Terminal.app might be the simplest, as in the shell I can put an Expect script between keyboard input (STDIN) and display (STDOUT) and translate all kinds of stuff. There is an example script for Expect that does a Dvorak keyboard layout that way.
Using a font/keyboard layout, we'd get the exact same transliteration always, like it or not. If you like to use some other transliteration, you have to change the font. And on the receiving side, if your partner uses some other transliteration, you might see partial garbage coming in. People will probably never change their habits for you, so you will for example have to live with the wrong Sigma at the end of words this way.
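A routine, unlike a font, can look at the neighbouring characters, so the sigma case at least could be repaired after the fact. A minimal sketch in Python, assuming the text has already been mapped to Greek letters and that a "σ" which ends a word of two or more letters should be the final form "ς" (the function name is made up):

```python
import re

# A "σ" that has a word character before it and a word boundary after it is
# taken to be a final sigma. A lone "σ" (as in the elided "σ'") is left alone.
_FINAL_SIGMA = re.compile(r"(?<=\w)σ\b")


def fix_final_sigma(text):
    """Rewrite word-final "σ" as "ς"; all other characters stay unchanged."""
    return _FINAL_SIGMA.sub("ς", text)


if __name__ == "__main__":
    print(fix_final_sigma("ο λογοσ μασ"))
```

This only fixes what arrives on my side; what the other side sees is still whatever their own mapping gives them.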
The program routine could employ extra logic to guess at the transliteration used on the other side. A lexicon of Greek words could be used to make best guesses at certain ambiguous characters. The result is that we don't get full round-trip fidelity, more of a very loose mapping... but that's what Greeklish is all about.
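A rough Python sketch of the lexicon idea, under loud assumptions: the candidate table and the three-word lexicon below are invented for illustration, accents are ignored, and final sigma is not handled. Each Greeklish token lists the Greek letters it might stand for. The routine tries every combination and keeps the first one found in the lexicon, falling back to the most likely letters otherwise.

```python
from itertools import product

# Hypothetical candidate table: each Greeklish token maps to the Greek letters
# it might stand for, most likely first. Deliberately incomplete.
CANDIDATES = {
    "th": ["θ"], "8": ["θ"], "ps": ["ψ"], "ks": ["ξ"],
    "a": ["α"], "b": ["β"], "g": ["γ"], "d": ["δ"], "e": ["ε"], "z": ["ζ"],
    "h": ["η"], "i": ["ι", "η"], "k": ["κ"], "l": ["λ"], "m": ["μ"],
    "n": ["ν"], "o": ["ο", "ω"], "p": ["π"], "r": ["ρ"], "s": ["σ"],
    "t": ["τ"], "u": ["υ"], "y": ["υ", "ψ"], "f": ["φ"], "x": ["χ"],
    "w": ["ω"],
}

# Stand-in lexicon of unaccented Greek words; a real one would be far larger.
LEXICON = {"ελληνικα", "χρησιμη", "καιρο"}


def tokenize(word):
    """Split a Greeklish word into tokens, preferring two-letter tokens."""
    tokens, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in CANDIDATES:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def guess_greek(word):
    """Return the first candidate spelling found in LEXICON, else the default."""
    options = [CANDIDATES.get(tok, [tok]) for tok in tokenize(word.lower())]
    default = None
    for combo in product(*options):
        candidate = "".join(combo)
        if default is None:
            default = candidate  # all first choices: the fallback guess
        if candidate in LEXICON:
            return candidate
    return default


if __name__ == "__main__":
    for w in ("ellinika", "xrisimh", "8elw"):
        print(w, guess_greek(w))
```

With this table, both "ellinika" and "xrisimh" from the chat above resolve their "i" to η where the lexicon says so. The number of combinations grows exponentially with the number of ambiguous letters, so a real version would need something smarter than brute force.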
Update: I'm now using my own Python Greek-to-Greeklish converter, which handles Unicode UTF-8 too - at least for reading mail (I don't convert Greeklish back to Greek).
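For the curious, a one-way converter of this kind can be quite short. The following is not the actual converter, just a sketch of the general approach under my own assumptions: decode the UTF-8 bytes, strip accents with the standard `unicodedata` module, then map letters through a fixed table (the table and the function names are illustrative).

```python
import unicodedata

# Illustrative table; both sigma forms map to "s", so this is one-way only.
GREEK = "αβγδεζηθικλμνξοπρσςτυφχψω"
LATIN = "abgdezh8iklmn3oprsstufxyw"
TABLE = str.maketrans(GREEK + GREEK.upper(), LATIN + LATIN.upper())


def strip_accents(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def greek_to_greeklish(data):
    """Convert UTF-8 encoded bytes containing Greek into a Greeklish string."""
    text = data.decode("utf-8")
    return strip_accents(text).translate(TABLE)


if __name__ == "__main__":
    print(greek_to_greeklish("Ελληνικά".encode("utf-8")))
```

Note that stripping accents this way also flattens accented Latin letters in the same text, which may or may not be what you want for mail.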