Python Ελληνικά (Greek) to Greeklish Converter
As mentioned in this post about my mutt mail setup for Greek mails and in this post about Greek in OS X Terminal, I have a special setup. On my OS X machine I taught mutt and vim to do Greek. But on my OpenBSD system I translate incoming Greek mails to "Greeklish" for viewing through ssh. Until yesterday I was using the gr2gr Perl script for this, but now I have finished my own Python version, with the advantage that it works for UTF-8 mails too. Read on for the source code...
I had the "ISO-8859-7 to greeklish" part pretty fast (basically a minimalistic rip-off of gr2gr), but the Unicode UTF-8 part had me gnawing on the bone for some weeks. What helped me in the end was stumbling upon this article about encoding and decoding Unicode in Python. Great shit! If you do Python, read it right now. Also the rest of the article. I had haggled with my encode()s and decode()s for weeks and with this article I got it right in 5 minutes. Well, in my defense I must say that I still don't know why "decode" is named like that, since
unicode_string = input_string.decode('utf-8')
does not decode anything, but instead somehow declares this string to be in utf-8 Unicode.
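One way to look at the naming, as a small sketch in Python 3 (not the Python 2 used by the script below): there, raw bytes and text are separate types, so it is easier to see that decode() goes from bytes to text and encode() goes from text back to bytes. The sample word αβγ and the variable names are only for illustration.

```python
# Python 3 sketch: bytes and text are different types here.
raw = b'\xce\xb1\xce\xb2\xce\xb3'           # six bytes: the UTF-8 encoding of "αβγ"

# decode(): interpret the bytes as UTF-8 and get three Unicode characters.
text = raw.decode('utf-8')

# encode(): turn the characters back into bytes, this time as ISO-8859-7.
# 'replace' substitutes '?' for anything ISO-8859-7 cannot represent.
iso = text.encode('iso-8859-7', 'replace')

print(len(raw), len(text), len(iso))
print(iso)
```

So the same three letters take six bytes in UTF-8 and three bytes in ISO-8859-7, and the Unicode string in the middle is what connects the two.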
But whatever, here is the script. It's called el2gr (for ελληνικά to greek):
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# el2gr
# A python script to transliterate
# greek (utf-8 or ISO-8859-7) to "greeklish".
# Of course uses my own preference for greeklish.
#
# usage:
# pipe input into STDIN, read output from STDOUT
# use --unicode, -u, --utf8, --utf-8 if your input is in UTF-8
# otherwise it will assume iso-8859-7
# characters not in the iso-8859-7 range will be replaced by '?'
unioptions = ['--unicode', '-u', '--utf8', '--utf-8']
import string
import sys
if len(sys.argv) > 1:
    option = sys.argv[1]
else:
    option = '-iso'
input_string = sys.stdin.read()
from_chars = 'áâãäåæçèéêëìíîïðñóòôõö÷øùÜÝÞßúÀüýûàþÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÓÔÕÖ×ØÙ¶¸¹ºÚ¼¾Û¿'
to_chars = 'abgdezh8iklmn3oprsstufxywaehiiiouuuwABGDEZH8IKLMNJOPRSTYFXCWAEHIIOUUW'
if option in unioptions:
    input_string = input_string.decode('utf-8')
    input_string = input_string.encode( 'iso-8859-7', 'replace' )
translation_table = string.maketrans( from_chars, to_chars )
input_string = string.translate( input_string, translation_table )
print input_string
It does STDIN/STDOUT only (like a well-behaved little Unix citizen). Input is expected in ISO-8859-7 unless you tell it to do Unicode UTF-8. Output is ISO-8859-7, but with most of it transliterated to 7-bit characters. If I missed anything that you need, or if you want another flavour of greeklish, change from_chars and to_chars. If from_chars does not survive your copy-pasting, try the script file here.
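For anyone on Python 3, where bytes and text are separate types, the same idea might look roughly like the sketch below. This is not the script above: the names GREEK, LATIN, TABLE and to_greeklish are made up for illustration, and only the unaccented lowercase letters are included to keep it short. The letter choices follow the lowercase part of to_chars.

```python
# Python 3 sketch of the el2gr idea, limited to unaccented lowercase letters.
GREEK = 'αβγδεζηθικλμνξοπρσςτυφχψω'
LATIN = 'abgdezh8iklmn3oprsstufxyw'

# Byte-to-byte table: ISO-8859-7 byte of each Greek letter -> ASCII byte.
TABLE = bytes.maketrans(GREEK.encode('iso-8859-7'), LATIN.encode('ascii'))


def to_greeklish(data, utf8=False):
    """Transliterate raw bytes; input is ISO-8859-7 unless utf8 is True."""
    if utf8:
        # Go through Unicode to ISO-8859-7; unknown characters become '?'.
        data = data.decode('utf-8').encode('iso-8859-7', 'replace')
    return data.translate(TABLE)


print(to_greeklish('καλημερα'.encode('utf-8'), utf8=True))
print(to_greeklish('καλημερα'.encode('iso-8859-7')))
```

To use something like this as a filter, one would read the bytes from sys.stdin.buffer and write the result to sys.stdout.buffer, so that nothing gets re-encoded behind your back.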
Along with this, I also have these entries in my ~/.mutt/mailcap file:
text/plain; el2gr --utf-8 ; \
test=test "`echo %{charset} | tr '[A-Z]' '[a-z]'`" = utf-8; \
copiousoutput
text/plain; el2gr ; \
test=test "`echo %{charset} | tr '[A-Z]' '[a-z]'`" = iso-8859-7; \
copiousoutput
What else can I say? It works for me. Use at your own risk and do not blame me when you get flamed for using greeklish / greeklies / greenglish instead of proper ελληνικά. And don't expect to get proper Greek back out of the converted text: remember that greeklish is a one-way road.
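Why it is a one-way road can be seen from the table itself: several Greek letters end up on the same Latin letter. A tiny Python 3 sketch, using a handful of pairs taken from from_chars/to_chars (the variable names are only for illustration):

```python
from collections import defaultdict

# A few letters from the table: sigma and final sigma, plain and accented iota/eta.
GREEK = 'σςιίϊηή'
LATIN = 'ssiiihh'

# Group the Greek letters by the Latin letter they are mapped to.
sources = defaultdict(list)
for greek_char, latin_char in zip(GREEK, LATIN):
    sources[latin_char].append(greek_char)

# Any Latin letter with more than one source cannot be mapped back uniquely.
ambiguous = {latin: greek for latin, greek in sources.items() if len(greek) > 1}
print(ambiguous)
```

Given just an "s" or an "i" in the output, there is no way to tell which of the original letters it came from.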