betabug... Sascha Welter


01 July 2005

Python Ελληνικά (Greek) to Greeklish Converter

Life is a one way street... so is this script

As mentioned in this post about my mutt mail setup for Greek mails and in this post about Greek in OS X Terminal, I have a special setup. On my OS X machine I taught mutt and vim to do Greek. But on my OpenBSD system I translate incoming Greek mails to "Greeklish" for viewing through ssh. Until yesterday I was using the gr2gr Perl script for this, but now I have finished my own Python version, with the advantage that it works for UTF-8 mails too. Read on for the source code...


I had the "ISO-8859-7 to greeklish" part done pretty fast (basically a minimalistic rip-off of gr2gr), but the Unicode UTF-8 part had me gnawing on the bone for some weeks. What helped me in the end was stumbling upon this article about encoding and decoding Unicode in Python. Great shit! If you do Python, read it right now. Also the rest of the article. I had haggled with my encode()s and decode()s for weeks and with this article I got it right in 5 minutes. Well, in my defense I must say that I still don't know why "decode" is named like that, since

unicode_string = input_string.decode('utf-8')
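at first sight does not seem to decode anything; it looked to me as if it merely declared the string to be UTF-8. What it actually does is interpret the UTF-8 bytes and return a unicode object, which can then be encoded into another charset.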
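
For what it's worth, the naming gets easier to live with once bytes and text are kept apart. Here is a minimal sketch, assuming Python 3 syntax (the script below is Python 2, where a plain `str` plays the role of `bytes`); the variable names are made up for illustration:

```python
# Python 3 sketch; variable names are illustrative only.
raw_utf8 = "καλημερα →".encode("utf-8")   # bytes, as they would arrive on STDIN

# decode: bytes -> text (the bytes are interpreted as UTF-8)
text = raw_utf8.decode("utf-8")

# encode: text -> bytes in another charset; the arrow has no
# ISO-8859-7 code point, so 'replace' turns it into '?'
raw_iso = text.encode("iso-8859-7", "replace")

print(len(raw_utf8), len(text), len(raw_iso))   # prints: 20 10 10
```

The same ten characters take 20 bytes in UTF-8 and 10 bytes in ISO-8859-7, which is why the byte string has to be decoded before it can be re-encoded.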

But whatever, here is the script. It's called el2gr (for ελληνικά to greek):

#! /usr/bin/env python
# -*- coding: utf-8 -*-

# el2gr
# A python script to transliterate
# greek (utf-8 or ISO-8859-7) to "greeklish".
# Of course uses my own preference for greeklish.
# 
# usage:
# pipe input into STDIN, read output from STDOUT
# use --unicode, -u, --utf8, --utf-8 if your input is in UTF-8
# otherwise it will assume iso-8859-7
# characters not in the iso-8859-7 range will be replaced by '?'

unioptions = ['--unicode', '-u', '--utf8', '--utf-8']

import string
import sys

if len(sys.argv) > 1:
    option = sys.argv[1]
else:
    option = '-iso'

input_string = sys.stdin.read()

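# Greek letters are written as text and converted to ISO-8859-7 bytes,
# so the table survives copy-pasting from a UTF-8 page.
from_chars = u'αβγδεζηθικλμνξοπρσςτυφχψωάέήίϊΐόύϋΰώΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩΆΈΉΊΪΌΎΫΏ'.encode('iso-8859-7')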
to_chars =   'abgdezh8iklmn3oprsstufxywaehiiiouuuwABGDEZH8IKLMNJOPRSTYFXCWAEHIIOUUW'

if option in unioptions:
    input_string = input_string.decode('utf-8')
    input_string = input_string.encode( 'iso-8859-7', 'replace' )
translation_table = string.maketrans( from_chars, to_chars )
input_string = string.translate( input_string, translation_table )
print input_string

It does STDIN/STDOUT only (like a well-behaved little Unix citizen); input is expected in ISO-8859-7 unless you tell it to do Unicode UTF-8. Output is ISO-8859-7, but with most of it transliterated to 7-bit characters. If I missed anything that you need, or if you want another flavour of greeklish, change from_chars and to_chars (the two strings must stay the same length). If from_chars does not survive your copy-pasting it, try the script file here.
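
For readers on a newer Python, the core of the script could look roughly like the sketch below. It assumes Python 3, where `str.maketrans` works on text instead of bytes; `GREEK`, `LATIN`, `TABLE` and `to_greeklish` are hypothetical names, and the mapping is copied from from_chars/to_chars above. The accented letters are written as escapes so that they survive copy-pasting.

```python
# Python 3 sketch (assumption: the original el2gr script is Python 2).
GREEK = (
    "αβγδεζηθικλμνξοπρσςτυφχψω"
    # ά έ ή ί ϊ ΐ ό ύ ϋ ΰ ώ
    "\u03ac\u03ad\u03ae\u03af\u03ca\u0390\u03cc\u03cd\u03cb\u03b0\u03ce"
    "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ"
    # Ά Έ Ή Ί Ϊ Ό Ύ Ϋ Ώ
    "\u0386\u0388\u0389\u038a\u03aa\u038c\u038e\u03ab\u038f"
)
LATIN = (
    "abgdezh8iklmn3oprsstufxyw"
    "aehiiiouuuw"
    "ABGDEZH8IKLMNJOPRSTYFXCW"
    "AEHIIOUUW"
)

# str.maketrans raises ValueError if the two strings differ in length.
TABLE = str.maketrans(GREEK, LATIN)


def to_greeklish(text):
    """Hypothetical helper: transliterate Greek text, leave the rest alone."""
    return text.translate(TABLE)


print(to_greeklish("Καλημερα κοσμε"))   # prints: Kalhmera kosme
```

Because the table works on decoded text, there is no need to go through ISO-8859-7 first; anything that is not in `GREEK` passes through unchanged instead of becoming '?'.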

Along with this, I also have these entries in my ~/.mutt/mailcap file:

text/plain; el2gr --utf-8 ; \
        test=test "`echo %{charset} | tr '[A-Z]' '[a-z]'`" = utf-8; \
        copiousoutput
text/plain; el2gr ; \
        test=test "`echo %{charset} | tr '[A-Z]' '[a-z]'`" = iso-8859-7; \
        copiousoutput
What else can I say? It works for me. Use at your own risk and do not blame me when you get flamed for using greeklish / greeklies / greenglish instead of proper ελληνικά. And don't expect to get proper Greek back out of the converted text, remember that greeklish is a one way road.
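
Why it is a one-way road can be seen by turning the table around. The sketch below (Python 3 again, as an assumption) takes a handful of pairs from from_chars/to_chars and lists which Greek letters each greeklish character could stand for; `PAIRS` and `candidates` are hypothetical names.

```python
from collections import defaultdict

# A handful of (Greek, greeklish) pairs taken from the script's
# from_chars/to_chars tables; accented letters are written as escapes.
PAIRS = [
    ("σ", "s"), ("ς", "s"),
    ("ι", "i"), ("\u03af", "i"), ("\u03ca", "i"), ("\u0390", "i"),  # ι ί ϊ ΐ
    ("ω", "w"), ("\u03ce", "w"),                                    # ω ώ
    ("θ", "8"), ("ξ", "3"),
]

# Invert the mapping: which Greek letters could a Latin character stand for?
candidates = defaultdict(list)
for greek, latin in PAIRS:
    candidates[latin].append(greek)

for latin, letters in sorted(candidates.items()):
    print(latin, "<-", " ".join(letters))
```

An "i" in the output could have been any of four letters, and an "8" or "3" could just as well have been a digit in the original mail, so the accents and some of the spelling are gone for good.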

Posted by betabug at 10:02 | Comments (6) | Trackbacks (0)
Comments
Re: Python Ελληνικά (Greek) to Greeklish Converter

nice ! I will try it as soon as possible

Posted by: topgan1 at July 01,2005 16:30
Re: Python Ελληνικά (Greek) to Greeklish Converter

Very cool!
I agree with your antipathy towards the misnomers that are 'encode' and 'decode', they confused me immediately. Pretty far from Python's "there should be one obvious way to do it" mantra.

Posted by: Paul Giannaros at March 06,2007 17:16
Re: Python Ελληνικά (Greek) to Greeklish Converter

> But whatever, here is the script.
> It's called el2gr (for ελληνικά to greek):

You mean "ελληνικά to greeklish", and not "ελληνικά to greek". "Greek" is the english word for the language spoken and written in Greece. "Greeklish" is an abomination which is sometimes used by some people due to necessity. It is an insult to equate the two.

Posted by: Διαγόρας at June 24,2007 15:22
Re: Python Ελληνικά (Greek) to Greeklish Converter

Great script thanx

Posted by: Bart at December 13,2009 11:36
Re: Python Ελληνικά (Greek) to Greeklish Converter

I tried to use the # -*- coding: utf-8 -*- declaration, but when I
give a string variable called strWord the value "καλημερα"
and print it, everything is fine. BUT when I print
strWord[2] it cannot print the letter λ, because a UTF-8
character is a 16-bit character and not an 8-bit one like in iso-8859-7.

So, how can I use iso-8859-7 coding in my .py file?

-*- coding: iso-8859-7 -*- doesn't work...

Can anyone help me?

Posted by: nikosokin at November 14,2010 01:28
Conversion of ISO-8859-7 to utf-8 character

Hi,
I am converting files from ISO Latin encodings to UTF-8 for various languages. For other countries it is working, but when I convert a file with Greek + English characters (i.e. a greekish file) into UTF-8, the converted file is not in the proper format. The command I am using is

iconv -cs -f ISO-8859-7 -cs -t utf-8 abc > def

Could you please help me in this scenario?

Posted by: shraddha gaikwad at January 09,2013 04:57