betabug... Sascha Welter


22 April 2005

Search in Greek Works Now

Hacked CJKSplitter to work with Greek in UTF-8

Zope splitters are a) used to split text into words so that they can be indexed, and b) seemingly deep magic, because documentation is nonexistent (or I couldn't find any). What I ended up with was the source code of CJKSplitter, so I hacked CJKSplitter into doing Greek. It was not especially difficult, mostly a "delete" job (Chinese is way more complex). I also replaced the range of Chinese Unicode characters with the range for Greek. I might add ISO-8859-1 character ranges too, especially for this blog.
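To give an idea of what "replacing the character range" means, here is a minimal sketch in plain Python. It is not the hacked CJKSplitter code and it is not registered with Zope in any way. The class name `GreekSplitterSketch` is made up for illustration, and the chosen ranges (Greek and Coptic, U+0370 to U+03FF, plus Greek Extended, U+1F00 to U+1FFF) are my assumption about what "the range for Greek" would cover. The `process` method taking a list of strings only mirrors the general shape of a ZCTextIndex pipeline element.

```python
import re
import unicodedata

# Assumed word characters: ASCII letters and digits, the Greek and Coptic
# block (U+0370-U+03FF) and the Greek Extended block (U+1F00-U+1FFF).
# This is a coarse range; it also lets through the few punctuation-like
# characters that live inside the Greek block.
WORD_RE = re.compile(r"[0-9A-Za-z\u0370-\u03FF\u1F00-\u1FFF]+")


class GreekSplitterSketch:
    """Hypothetical stand-in for a splitter; not a real Zope class."""

    def process(self, lst):
        """Split every string (or UTF-8 byte string) in lst into words."""
        words = []
        for text in lst:
            if isinstance(text, bytes):
                # Input is assumed to be UTF-8, as on this weblog.
                text = text.decode("utf-8", "replace")
            # Compose accented letters so they stay inside the Greek block.
            text = unicodedata.normalize("NFC", text)
            words.extend(WORD_RE.findall(text))
        return words
```

Calling `GreekSplitterSketch().process(["Αθήνα, Greece 2005"])` yields the three words "Αθήνα", "Greece" and "2005". Anything outside the listed ranges acts as a word boundary, which is why adding further ranges (for example for ISO-8859-1 accented letters) would only mean extending the character class.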

Once I have cleaned up the code, had a look at the license, and asked on the mailing list whether this is really the proper way to go, I will put it online. It might be a good solution for Greek in Zope with ZCTextIndex.

But in short: The result so far is that searching in Greek should now work in the search form on this weblog.

Posted by betabug at 14:54 | Comments (0) | Trackbacks (1)
Searching ZCTextIndex in Greek, properly

A long time ago I had made my own Greek Unicode splitter for ZCTextIndex. That worked fine, but it didn't take the accent marks into consideration (so searching for "ελληνικα" didn't find "ελληνικά"). Today I found through the Greek Plone

Read the linking post here: ch-athens at November 16, 2006 12:33
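The accent problem mentioned in that excerpt can be illustrated with a small sketch. This is not the solution from the linked post, only one common way to make "ελληνικα" and "ελληνικά" compare equal: decompose each word, drop the combining marks, and index the result. The function name `strip_accents` is made up for this example.

```python
import unicodedata


def strip_accents(word):
    """Return word with combining marks (such as the Greek tonos) removed."""
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", word)
    # Keep only the characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual precomposed form.
    return unicodedata.normalize("NFC", stripped)
```

If both the indexed words and the search terms go through the same function, a search for the unaccented form finds the accented one and vice versa.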