betabug... Sascha Welter


Entries : Category [ language ]
Language is one of the interesting factors living in a "foreign" city.

28 December 2004

Greenglish, Greeklish, Γρήκλης

It's a small world for writing in Greek without a Greek keyboard

Chatted with yorix from HelMUG today on AIM (using BitlBee on my side) and he mentioned that he had always wanted to write a program to translate Greeklish to real Greek characters and back. As it happened we were both using Greeklish and as it also happened I had been looking up information on Greeklish on the web the day before.

Greeklish (or Greenglish) is writing Greek with Latin characters; for example, you write ellhnika instead of ελληνικά. Transliteration relies on the human ability to make sense out of inexact rules. Machine transliteration would face difficulties, because everybody makes their own Greeklish. I had some ideas about coding for transliteration already...

Some links: Scientific approach (Greek), a table for transliteration (English), sample from irc instructions.

Another example is part of our discussion in the chat:

11:41 [+yorixofhelm] betabug: Apo kairo hthela na kanw mia efarmogi pou na
metafrazei apo greeklish se ellinika...
11:41 [+yorixofhelm] betabug: Mporei loipon na fanei xrisimh :-)
11:41 [+yorixofhelm] betabug: Alla mporeis na kaneis kai ayto pou les...
11:42 [@betabug] yorixofhelm: nai, kai egw to skeftika ayto

Yorix writes the Greek θ (theta) as "th", while I tend to use the 8. But since nobody tells me so, I may forget and use th sometimes. Same game with other characters. Yorix uses "u" for the Greek "ypsilon", while I go with the Latin "y". His choice is better, as it is a closer optical match and leaves Latin "y" open to be possibly used for Greek psi (ψ). I changed my habit.

Most disturbing for me is the setting of "r" for ρ (Greek rho). I always want to use "p" for this, which matches better optically. But there is no obvious optical choice for π (Greek pi), so it is stuck at "p".

Another problem with machine supported "Greeklish" would of course be that this is very much "up" in the human interface, close to the top of the application level. It could almost be implemented as a keyboard layout and font (so you could write and see proper Greek characters on your own side, while the other side gets Greeklish). This solution would presume an encoding where characters have a 1-to-1 relationship. Theta (θ) could not be translated to "th", because we would see a different character on our side, but it would still be just one character.
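To make the 1-to-1 constraint concrete, here is a minimal Python sketch. The table is a hypothetical excerpt made up for illustration (it uses "8" for θ, "u" for υ, "y" for ψ and the keyboard position "j" for ξ), and it ignores accents and the final sigma. It only shows why a single-character table can be inverted, while a "th" digraph collides with τ followed by η.

```python
# Hypothetical 1-to-1 table: every Greek letter maps to exactly one Latin
# character, so the mapping can be reversed without any guessing.
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"
LATIN = "abgdezh8iklmnjoprstufxyw"

TO_GREEKLISH = str.maketrans(GREEK, LATIN)
TO_GREEK = str.maketrans(LATIN, GREEK)

# Variant table where theta becomes the two-character sequence "th".
WITH_DIGRAPH = dict(TO_GREEKLISH)
WITH_DIGRAPH[ord("θ")] = "th"


def to_greeklish(text: str) -> str:
    """Greek to Greeklish using the 1-to-1 table."""
    return text.translate(TO_GREEKLISH)


def to_greek(text: str) -> str:
    """Greeklish back to Greek; only safe because the table is 1-to-1."""
    return text.translate(TO_GREEK)


round_trip = to_greek(to_greeklish("θελω"))    # back to the original word
theta_as_th = "θ".translate(WITH_DIGRAPH)      # "th"
tau_eta = "τη".translate(WITH_DIGRAPH)         # also "th": ambiguous
```

With the digraph table, the receiving side can no longer tell whether "th" was one letter or two, which is exactly where the font/keyboard approach stops working.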

If OTOH we code a routine that transliterates input and display, then my first question is: "How do we hook that up to our applications?" Where does iChat get it? When writing web pages in TextEdit, do I have to call an AppleScript every time I save? The shell might be the simplest case, since there I can put an Expect script between keyboard input (STDIN) and display (STDOUT) and translate all kinds of stuff. There is an example script for Expect that does Dvorak keyboard layout in that way.

Using a font/keyboard layout, we'd get the exact same transliteration always, like it or not. If you like to use some other transliteration, you have to change the font. And on the receiving side, if your partner uses some other transliteration, you might see partial garbage coming in. People will probably never change their habits for you, so you will for example have to live with the wrong Sigma at the end of words this way.

The program routine could employ extra logic to guess at the transliteration used on the other side. A lexicon of Greek words could be used to make best guesses at certain ambiguous characters. This will have the result that we don't get full round-trip fidelity, more of a very loose mapping... but that's what Greeklish is all about.
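A rough Python sketch of that lexicon idea follows. Everything in it is an assumption made for illustration: the `CANDIDATES` rules cover only a handful of spellings, the `LEXICON` holds four unaccented words, and the brute-force expansion would be far too slow for long words. It only shows how a word list can pick between competing readings such as "i" for ι, η or υ.

```python
from typing import Iterator

# Hypothetical, deliberately tiny rule set: each Latin spelling lists the
# Greek letters it might stand for. Real Greeklish has many more variants.
CANDIDATES = {
    "th": ["θ"], "8": ["θ"], "ps": ["ψ"], "ks": ["ξ"],
    "a": ["α"], "e": ["ε"], "o": ["ο", "ω"], "w": ["ω"],
    "i": ["ι", "η", "υ"], "h": ["η"], "u": ["υ"], "y": ["υ"],
    "k": ["κ"], "l": ["λ"], "m": ["μ"], "n": ["ν"], "r": ["ρ"],
    "s": ["σ"], "t": ["τ"], "x": ["χ"], "p": ["π"],
}

# Hypothetical lexicon of unaccented Greek words.
LEXICON = {"ελληνικα", "χρησιμη", "θελω", "καιρο"}


def expansions(word: str) -> Iterator[str]:
    """Yield every Greek spelling the rules allow for a Greeklish word."""
    if not word:
        yield ""
        return
    for size in (2, 1):  # try digraphs such as "th" before single letters
        head, tail = word[:size], word[size:]
        if len(head) != size:
            continue
        for greek in CANDIDATES.get(head, []):
            for rest in expansions(tail):
                yield greek + rest


def guess(word: str) -> str:
    """Return the first expansion found in the lexicon, else the input."""
    for candidate in expansions(word.lower()):
        if candidate in LEXICON:
            return candidate
    return word
```

Both `guess("ellinika")` and `guess("ellhnika")` end up at the same lexicon entry, so two different personal styles are read the same way, while a word the lexicon does not know is passed through untouched.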

Update: I'm now using my own Python Greek to Greeklish converter, which does Unicode UTF-8 too - at least for mail reading (I don't convert Greeklish back to Greek).

Posted by betabug at 22:30 | Comments (7) | Trackbacks (0)
19 January 2005

Call for a "Greek & Mac FAQ"

Working in Greek on a Mac needs many fixes yet

Setting up a professional work environment (for example for graphics professionals) on a Mac with Mac OS X in Greek is not there yet. At least not out of the box. There still is a lot of information needed. Even working with a system in English using Greek contents (fonts, text) has its problems still. I propose an FAQ on this topic.


And some applications give their users a hard time too. Examples:

My friend Spyros keeps bugging me with questions, while I keep bugging him to make the switch from 9 to X. And my friend Yorix tells me that many of the problems and questions are brought up again and again. A case for a FAQ. HelMUG - the Greek Mac User Group - is one of the information providers in the field of Mac OS X and Greek. But is it impolite to make a call for a big FAQ on that topic?

Posted by betabug at 12:32 | Comments (0) | Trackbacks (0)
13 February 2005

State of Greek and UTF-8 in the Mac OS X Terminal

Some of the setups I made to enjoy a foreign language

The Mac OS X Terminal application should be ready for utf-8 and thus for working in Greek. But that's not the complete truth. Applications to work in the terminal have to be ready too. The ones that come with Mac OS X are not. I mainly use vi / vim, mutt, lynx. The vim that comes with Mac OS X 10.3 (Panther) does not sport the multilingual abilities needed. I experimented with compiling my own, but in the end I used an old version I had downloaded from Marc Liyanage. I custom compiled my own mutt, same with lynx. The point is that you will have to look into your application's compile time and runtime settings.

As for the setup itself, in the "Terminal Inspector", the "Character Set Encoding" has to be set to "Unicode (UTF-8)" and "Wide glyphs for Japanese/Chinese/etc." should be set off (contrary to what Apple Help suggests).

What's really ugly is using vim to write in Greek: It's OK while you are in input mode. You just keep writing and input works fine. Then you hit Escape to switch to command mode and every key command either just beeps at you or does something unexpected. The problem is that vi does not know what to do with "ξ" (xi), which is what you get when you hit "j" while in Greek keyboard mode. So the thing is to type "i" (input), switch to Greek keyboard, type your text, switch to non-Greek keyboard again, hit "Escape" (command mode), and continue editing. Also vim wants to be told about using unicode utf-8 explicitly (:set encoding=utf-8).

One day I will set up my vimrc so that command mode will work in the Greek keyboard too; hopefully I just have to remap the Greek letters to vi commands. What I currently do is to type most Greek texts in TextEdit. Not only do I not have to struggle with switching keyboards for command mode, but I also enjoy the ASpell spellchecker. See, my spelling in Greek is pretty catastrophic, so I use a lot of suggestions.

Posted by betabug at 19:13 | Comments (6) | Trackbacks (0)
28 February 2005

Ξανά! Another Fight with Unicode

UTF-8 to the rescue - as soon as I've saved myself from itself

Αχ, ξανά τα Ελληνικά! (Ah, Greek again!) Unbelievably, I'm again fighting with Unicode to get Greek text to work in yet another place. This time it's with "NetNewsWire Lite", which I think should work, but it does not in this project. Problem should be on my site though.

Update: Another unbelievable, but not quite so unusual event. Right after I posted I discovered that text was entered in iso-8859-7 into the system. A common problem. Now http headers and xml tags are set to iso-8859-7 to match the content. Works.
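The mismatch behind that update can be reproduced in a few lines of Python. This is only an illustration of the byte-level problem, using the word from the title as sample data; the helper name `matches_charset` is made up here, and nothing in it describes how NetNewsWire or the blog software handle encodings.

```python
def matches_charset(raw: bytes, charset: str) -> bool:
    """True if the bytes decode cleanly under the declared charset."""
    try:
        raw.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# Text that was entered into the system as ISO-8859-7 ...
stored = "Ξανά".encode("iso-8859-7")

# ... does not survive being declared as UTF-8 in headers and XML tags.
declared_utf8_ok = matches_charset(stored, "utf-8")

# Declaring the charset the bytes really use gives the text back.
recovered = stored.decode("iso-8859-7")
```

Here `declared_utf8_ok` is false and `recovered` is the original word, which mirrors the fix of making the declared encoding match the content.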

Posted by betabug at 12:16 | Comments (0) | Trackbacks (0)
23 March 2005

mutt, ελληνικά, greeklish, OpenBSD and the Mac OS X Terminal

Teaching the dog fake greek

Just taught the mutt to autoconvert ISO-8859-7 Greek to "greeklish". On my Mac OS X PowerBook utf-8 works fine with mutt, but when I use the mutt on my OpenBSD server (through ssh) I get a mess. Now I can have all this transliterated to (fake, ugly, but working) greeklish. Read on...

The recipe so far consists of:

An example mailcap file with entries for text/plain encodings. From this I created 2 new lines in my mailcap file (~/.mutt/mailcap):

text/plain; gr2gr -l ; \
        test=test "`echo %{charset} | tr '[A-Z]' '[a-z]'`" = iso-8859-7; \
        copiousoutput
text/plain; cat ;copiousoutput

These will pipe text/plain content with iso-8859-7 encoding through gr2gr. I also had to put the line:
auto_view text/plain
into my muttrc file.

Next was of course this program "gr2gr", which is a perl script I found mentioned on the hellenic-howto. Download link is:


Subject: Greek                                                                  
Date: Wed, 23 Mar 2005 16:55:31 +0200                                           
X-Mailer: Apple Mail (2.619.2)                                                  
[-- Autoview using gr2gr -l --]                                                 
Ta ellhnika einai polu eukola!                                                  
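For readers who want to see what such a filter does, here is a small Python sketch. It is not gr2gr and not derived from it: the `TABLE` below is a hypothetical mapping picked so that the sample mail above comes out the same way (η as "h", υ as "u", accents dropped), and the function names are made up.

```python
import unicodedata

# Hypothetical transliteration table; gr2gr's real table may well differ.
TABLE = str.maketrans({
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "h", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "ks", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "u", "φ": "f", "χ": "x", "ψ": "ps",
    "ω": "w",
})


def strip_accents(text: str) -> str:
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def greeklish(raw: bytes, charset: str = "iso-8859-7") -> str:
    """Decode a mail body and transliterate Greek letters to Latin."""
    text = strip_accents(raw.decode(charset, errors="replace"))
    out = []
    for ch in text:
        latin = ch.lower().translate(TABLE)
        out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


def filter_stream(inp, out) -> None:
    """Read bytes from inp and write the transliterated text to out."""
    out.write(greeklish(inp.read()))
```

Calling `filter_stream(sys.stdin.buffer, sys.stdout)` at the end of a script (after `import sys`) would turn this into a stdin-to-stdout filter of the kind a mailcap entry expects.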

Update: I'm now using my own Python Greek to Greeklish converter, which does Unicode UTF-8 too

Posted by betabug at 16:00 | Comments (2) | Trackbacks (0)
09 April 2005

Chess in Greek

Alphabetically different

On Tuesday I went to play chess at a chess club in Kipseli. The confusion on my part was great. Greek chessboards are labeled with Latin letters, but Greek chess players of course use the Greek letters α, β, γ, δ, ε, ζ, η, θ to name the squares.

I had not seriously occupied myself with chess since October, and now I also have to think "around the corner". That can only turn out well. Of course I lost at first, but so what. It was an interesting evening. I just don't know yet where I'm supposed to find the time to play regularly.

Posted by betabug at 10:25 | Comments (0) | Trackbacks (0)
22 April 2005

Search in Greek Works Now

Hacked CJKSplitter to work with Greek in utf-8

Zope splitters are: a.) used to split text into words to make them indexable and b.) seemingly deep magic because documentation is nonexistent (or I couldn't find any). What I ended up with was the source code to CJKSplitter. So I hacked CJKSplitter into doing Greek. It was not especially difficult, mostly a "delete" job (Chinese is way more complex). Also I replaced the range of Chinese Unicode characters with the range for Greek. Might add iso-8859-1 character ranges too, especially for this blog.
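The core idea, splitting on a character range, can be sketched in plain Python. This is not the CJKSplitter code and it does not implement Zope's splitter interface; the function name and the exact ranges (ASCII letters and digits plus the modern Greek letters from U+0386 to U+03CE) are assumptions made for this illustration.

```python
import re
from typing import List

# Assumed word characters: ASCII letters and digits, plus modern Greek
# letters including the accented ones (U+0386, U+0388 to U+03CE).
WORD = re.compile(r"[0-9A-Za-z\u0386\u0388-\u03CE]+")


def split_words(text: str) -> List[str]:
    """Split text into lowercase words suitable for feeding an index."""
    return [word.lower() for word in WORD.findall(text)]


words = split_words("Search in Greek: Αχ, ξανά τα Ελληνικά!")
```

Anything outside the ranges (spaces, punctuation, the exclamation mark) acts as a separator, so mixed English and Greek text yields one word list for the index.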

Once I clean up the code, have a look at the license and ask on the mailing list if this is really the proper way to go, I will put this online. It might be a good solution for Greek in Zope with ZCTextIndex.

But in short: The result so far is that searching in Greek should now work in the search form on this weblog.

Posted by betabug at 14:54 | Comments (0) | Trackbacks (1)

Some Links for Searching and Greek

These might come in handy

Although this is not a "link blog", here are some references I found while fooling around with splitters for ZCTextIndex.

This is somehow the same problem I'm trying to solve, someone who wants to search using ZCTextIndex and Unicode characters:
Here are some greeks who want to search in iso-8859-7, an old entry into the Zope Collector:

(I'd suggest switching to UTF-8 and using something like the splitter I made with ZCTextIndex.)

The chapter about searching and indexes in Zope:
Unicode character table:
Had to do this on my source file to stop a warning (because python wants to know about unicode characters in source files):
This should really resolve all these issues and provide for better search functionality:

(I could not find any documentation about e.g. the supported languages on the TextIndexNG site, and I would have to change COREBlog, so I gave up on this for the moment.)

Posted by betabug at 15:59 | Comments (0) | Trackbacks (1)
25 May 2005

Adopt a Schedule

And confuse a Swiss guy!

A few weeks ago I learned the word "υιοθετώ" (to adopt)[1]. That is, I will learn it again in a few weeks, after I have forgotten it.

Today, though, I half-heard on TV[2] that they are looking for someone to "adopt the summer opening hours" ("υιοθετήσει το καλοκαιρινό ωράριο"). Excuse me, what? Where is its mother, then? But never mind, keep on confusing the Swiss guy!

[1] I had written it "ηοθετήζω", but I was corrected.

[2] Yes. I watch TV too. Sometimes even the Greek way, with it switched on while we ignore it. Usually I do it only to be "social". Otherwise I watch very little TV.

Posted by betabug at 22:15 | Comments (2) | Trackbacks (0)
15 June 2005

Greek Localization for 10.4 is Available

Greek Mac OS X without selling your soul

As mentioned at HelMUG, GR-X, the unofficial, free and working localization of Mac OS X, is available for 10.4 too. Download GR-X here for free from MacUpdate.

This is not the Greek localization from the "official" Apple Macintosh reseller in Greece (IMC) Rainbow SA. GRupdate, the patch from Rainbow, is out for 10.4 too, but the problems are even worse than they were with Panther. GRupdate for Tiger is definitely not recommended, even if you can get it (it's only available if you bought your Mac from Rainbow). If you want Greek menus and dialogs, get GR-X; if you only want to read and write Greek, use Cocoa applications and everything just works.

Posted by betabug at 09:19 | Comments (1) | Trackbacks (0)
20 July 2005

How do we spell "γριήτηκη" logically?

betabug learns Greek, episode 137

So, how do we spell the word "γριήτικη" ("old-ladyish", as in "What kind of old-lady pyjamas are those?!"). I keep getting confused by all these ι, η, υ, οι, ει, ιη, ηι, etc. Even so, the Greek language has logic. And it offers help to those in great need. As with some banks. How does that work? You understand it when you ride around on the bus.

Proof, part A: When a word refers to something feminine that is from Crete, it is... κρητική (Cretan). Right? Right!

Proof, part B: On Syngrou I passed by bus a bank from Crete... bank (τράπεζα)... (a feminine word)... from Crete... therefore: κρητική τράπεζα! Right? Wrong! "Παγκρητικιά Τράπεζα" says the sign. I (an illiterate Swiss) did not get it, until I figured that a "Παγκρητική τράπεζα" would not easily take part in the game of the "youthful" big companies. What luck that the Greek language helps.

Posted by betabug at 09:42 | Comments (0) | Trackbacks (0)
15 September 2005

Athens Weather and News Roundup

Small stuff you likely did not want to know anyway

Just a few remarks about the weather, dysfunctional computer kiosks, the effect of the Athens 2004 Olympics on the city, and the smells of small streets in the Athens suburbs...

The weather is going up and down here lately. A few weeks ago it was rather cold, I was switching from bermudas to long jeans. Then it got hot again, the last few days had us sweating with 30 degrees Celsius even at night. This morning some raindrops fell, it has cooled down a bit, but I'm still comfortable with the bermudas. Weather in September in Athens can be much steadier, just a softer version of summer.

Diomedes Spinellis (Author of "Code Reading") has a picture of a dysfunctional info kiosk on his weblog. I must admit that I've never seen any one of these computer kiosks in working order. But this may be because I was away from Athens during the Athens 2004 Olympics.

Because that was the time when everything worked in Athens. If you can read German, on a site called "Europolitan" there is a feature "Athens the city of contrasts", which is just so much bullshit. It sings the old song of Athens being this dirty city, but with the great background of the ancient Greeks. Right, but never heard of Byzantium and the eastern part of the Greek soul. Also they mention how the Olympics did not take over the city. Right, it was just the event that touched this city more than maybe anything in post WWII times (maybe except for the arrival of reinforced concrete).

As for the smelly part: There are three verbs for "smelling" in use in Greek. "Βρωμάω" - to stink, "μυρίζω" - to smell, "μοσχοβολώ" - to smell good, to have a good smell. Yesterday evening I was walking on a small sidestreet in Μοσχάτο (Moschato, a south eastern suburb). The small olive and orange trees were smelling very good. If there were cameras for smells, I would have posted a smelling picture. This morning after the few drops of rain, there was an earthy, rainy note about the warm tree smell. I enjoyed it a lot. Μοσχοβολούσε - it had that good smell.

Posted by betabug at 09:18 | Comments (0) | Trackbacks (0)
18 September 2005

We Play and Learn

Dirty crossword with betabug

When you learn a new language, it is very important to have a bit of fun as well. Nothing but "dry" studying will not do; even the best student will despair. Our betabug (that is, me) is not the best student, but this rule he always remembers...

Greek scrabble board with dirty words

One of my favorite games is Scrabble in Greek. Naturally I need help; the way you people have made Greek spelling, I cannot manage on my own. Luckily you never play Scrabble alone. The other players are forced to correct me.

What does one actually learn from Scrabble? It does not enrich the vocabulary, because I rarely remember the words that Mary in particular plays against me; she knows some crazy katharevousa-style stuff, and whatever letters she has, she comes up with a word like "ζάρες" or "ανιαρή". ("ζάρες" is something to do with crumpled clothes and "ανιαρή" means boring, did you know that?) We do better with spelling, because I am forced to think about how the words are written. Above all, though, it helps me reinforce the vocabulary I already have.

In other words: I have to recall all the words I once learned. Now... my first Greek I learned in a motorcycle workshop on Pireos Street. And it shows, as you can see in the photo. I won't tell you who played which word. But somehow the ζάρες became βυζάρες and the ανιαρή was turned into the κλανιάρη. Although it is not dirty, I am very proud of "ψιλοφύσηξα", which came from ΦΥΣΗ -> ΦΥΣΗΞΑ -> ΨΙΛΟΦΥΣΗΞΑ. At the end of the game you are ΠΙΤΑ (wiped out; middle left), only 5 points, but on a "triple word score" square. We play and learn!

Posted by betabug at 18:51 | Comments (1) | Trackbacks (0)
14 December 2005

Catering... L'Odeur

This smells of a fishy translation
l'Odeur Catering

This week's Αθηνόραμα (Athinorama)[1] has an advertising brochure bundled for a catering company called... (drumroll) l'Odeur. Don't believe me? Look at the scanned brochure (click on the image for the complete view). My French is not *that* good, but when I saw it I thought something smelled of a fishy translation. Either I was off or they don't have a good nose for the nuances of anything French. So I asked the experts at #bsdcow who assured me (repeating here for those of my readers - out of all three of them - who don't speak French too well) that l'Odeur can indeed be something that smells good (meaning smell in general then), but normally it is something that stinks. Oops. Catering "the stink", anyone? Also note the subtle use of French colors in the "waves" of smell going up. Vive la tricolore or something.

[1] Αθηνόραμα is the Athenian "going-out" paper, which lists cinemas, theaters, restaurants, clubs, etc.

Posted by betabug at 10:03 | Comments (0) | Trackbacks (0)
22 December 2005

The Dirtiest Scrabble (ever)


Usually, when we play Scrabble at my place, we don't play exactly by the rules of the game. That is, the first round we play more or less properly. Since most of us are still learning Greek, we don't play for points and we help each other. But often the second game goes a bit off the rails... it heads to the dirty side...

Dirty scrabble

Here we have an example of a game that started with simple words, "ΝΥΧΤΙΚΕΣ", "ΠΑΤΑΤΕΣ", and "ΦΥΣΗΞΑ", but then went downhill. It is no longer just "parental consent required", it is simply "unsuitable for minors". Not only because of the sexist content ("ΠΟΡΝΗ", "ΚΑΥΛΙ"), but also because it contains violence ("ΑΙΜΑΤΑ")!

I thought of writing out here all the words on the battlefield with explanations, because there are many people learning Greek who don't cover these in language school, but I won't, because I'm a good boy! Google already returns one of my pages among the top results for the word "μουνάρες", without that page having the porn that the searchers were after. I won't complain, but the poor souls with such needs wait in vain to download a page that has only a simple story about something I overheard on a bus!

Posted by betabug at 08:57 | Comments (0) | Trackbacks (0)
31 December 2005

Greek in vim with langmap - only ISO-8859-7

Still not there yet for UTF-8

In a comment to my old post about State of Greek and UTF-8 in the Mac OS X Terminal, Andreas Triantafillidis asks if I finally managed to use vim with Greek. Answer: The state is still somehow the same. Using a vim with multilingual capabilities compiled in, I can read and write in Greek, but not as well as I want, because mapping of keyboard commands does not work with Unicode. But read on for what I got with ISO-8859-7...

Tassos Pavlakos (in another comment to that post) had suggested vim's langmap command, which is the right tool for the job. But typing :help langmap in vim gives us the following information:

This only works for 8-bit characters. The value of 'langmap' may be specified with multi-byte characters (e.g., UTF-8), but only the lower 8 bits of each character will be used.
This matches my experience. Trying to use the langmap feature with Greek in UTF-8 will leave vim just beeping at you.
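A few lines of Python show why an 8-bit-only mapping and Greek in Unicode do not get along. This is plain byte arithmetic on the letter ξ, not a description of vim's internals; reading "lower 8 bits" as the code point masked with 0xFF is an assumption about what the help text means.

```python
letter = "ξ"  # what the Greek keyboard layout sends for the "j" key

as_iso = letter.encode("iso-8859-7")   # a single byte in ISO-8859-7
as_utf8 = letter.encode("utf-8")       # two bytes in UTF-8
code_point = ord(letter)               # U+03BE, does not fit in 8 bits
low_byte = code_point & 0xFF           # what is left after truncation

# In Latin-1 the truncated value is a completely different character.
truncated_char = bytes([low_byte]).decode("latin-1")
```

In ISO-8859-7 the letter is one byte and can take part in an 8-bit mapping as it is, while in Unicode the truncated value no longer identifies ξ at all.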

So today I tried a small experiment. The Mac OS X Terminal does not offer ISO-8859-7 in its list of default encodings (and at least for Mac OS X 10.3 there is no way to change that list through the GUI). So I tweaked the rules a bit to see if I could get it to work, here are the steps I took:

So, what does that show us? With a bit of quirky hacking and the right version / compilation settings of vim, we can read and write unhindered in Greek, as long as it's ISO-8859-7 Greek. Which is kind of funny, since vim claims to be UTF-8 internally. But I guess that's just the way the langmap command and its underlying architecture are implemented. If the terminal software you use supports ISO-8859-7 it's not even that difficult to try it out (I don't know about Linux/BSD terminal apps, my trials are on the Mac). I don't think I will like it though, since I believe the time for the ISO encodings is over; I became a Unicode believer. Last remark: This is vim 6.1, once I'm back on the net I will check out if anything has changed in newer versions.

Posted by betabug at 13:57 | Comments (1) | Trackbacks (0)