betabug... Sascha Welter


06 November 2007

MiniPlanet, a Mini Feed Aggregator for Zope

Released fresh from the tar presses!
 

Here it is: a minimal planet for collecting the RSS (and Atom) feeds of blogs and other sites you read, and for showing them off to visitors on your Zope site. Want to see an example? Just look at the lower right corner of my weblog's main page, there it is.

This thing has been in the making for a while, but that just means it has already been in production use for a while too (here, and recently on Wu's blog as well). In the last week or so I've cleaned it up and added a couple more features, tests, and release documentation.

The code tries to write to the ZODB as little as possible. It uses BTrees for almost everything and honors If-Modified-Since headers (actually it uses feedparser, and feedparser does that). It's also optimized to use very little screen space... it's a mini planet.
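To give an idea of how the conditional fetching and the "write as little as possible" part can fit together, here is a minimal sketch. It is not MiniPlanet's actual code: the function name fetch_if_changed and the plain dict used as cache are made up for illustration (MiniPlanet keeps its state in BTrees), and the parse argument exists only so the sketch can be tried without network access. What it relies on from feedparser is that parse() accepts etag and modified arguments and that the result carries status, etag and modified when the server sends them.

```python
def fetch_if_changed(url, cache, parse=None):
    """Return the parsed feed, or None if the server answered 304.

    cache is a hypothetical dict-like mapping of url -> (etag, modified).
    """
    if parse is None:
        import feedparser  # third-party library, imported lazily
        parse = feedparser.parse

    etag, modified = cache.get(url, (None, None))
    # feedparser sends If-None-Match / If-Modified-Since from these values.
    result = parse(url, etag=etag, modified=modified)

    if result.get("status") == 304:
        return None  # nothing new, so nothing needs to be stored

    new_state = (result.get("etag"), result.get("modified"))
    if new_state != (etag, modified):
        # Only touch the persistent store when the validators changed.
        cache[url] = new_state
    return result
```

A caller would skip all further processing (and any ZODB write) whenever this returns None.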

Get it from the MiniPlanet release page (with download link) on my wiki!


Posted by betabug at 20:51 | Comments (4) | Trackbacks (0)
Comments
Re: MiniPlanet, a Mini Feed Aggregator for Zope

nice!

a few observations from running a feedparser based aggregator for some years now:

- do not trust feeditem dates; there are still so many feeds that use modification time instead of publication date, thus items could show up multiple times in your feed listing -- use the guid or the link as the unique identifier of the feeditems instead.

- feedparser does not time out anymore; if a server does not respond to your request, feedparser will wait forever (and block a thread); best is setting the socket timeout to a sensible value (10 - 30 seconds) during the request.

- i don't know if feedparser switched to automatic UTF-8 conversion recently, but there is often a difference between the encoding a feed claims for its data and the encoding it actually uses.

- if feedparser fails to parse a feed it sets a bozo value and a bozo_exception object that holds the parser error message. you want to test for feed.bozo==0 before starting to access the feeditems; sometimes the exception is just reporting a wrong character encoding, sometimes no entries are accessible at all.
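
the bozo check and the guid-or-link identity from the list above could look roughly like this. it is only a sketch: the function names usable_entries and merge_entries and the plain dict called seen are invented for illustration, and the input is assumed to be a feedparser-style result (feedparser exposes an item's guid as entry id). since a bozo feed may still have readable entries, the sketch reports the error and keeps whatever entries are there.

```python
def usable_entries(parsed):
    """Return (entries, error) for a feedparser-style result.

    error is the bozo_exception if the feed was flagged, else None.
    """
    error = parsed.get("bozo_exception") if parsed.get("bozo") else None
    # A flagged feed can still carry entries (e.g. wrong declared encoding),
    # or it can carry none at all.
    return parsed.get("entries", []), error


def merge_entries(seen, entries):
    """Store entries keyed by guid (feedparser's 'id'), else by link."""
    for entry in entries:
        key = entry.get("id") or entry.get("link")
        if key is None:
            continue  # nothing stable to identify this item by
        seen[key] = entry  # a later copy replaces the earlier one
    return seen
```

keyed this way, an item whose date changes on the feed overwrites its earlier copy instead of showing up a second time.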

on the topic of creating a feed from your aggregated data i'd recommend looking into the django syndication feed framework (http://www.djangoproject.com/documentation/syndication_feeds/) which is in fact a very usable single module that lets you produce RSS2 and ATOM feeds from arbitrary data.

Posted by: d2m at November 06,2007 23:06
Re: MiniPlanet, a Mini Feed Aggregator for Zope

Hi d2m!

Thanks for your feedback! In fact it was you who pointed me to feedparser ("so it's all your fault!" haha). I'll look into these with more detail, but here are some first comments:

- I've noted dates jumping around (older posts "jumping" up) on some feeds, but no doubled posts yet. If there was one, my code would likely just use the 2nd one it encounters. Will think that one through.

- Haven't noticed any problems with timeouts or getting stuck. In fact I think I remember having tested for that when I started out, but I'll retest now.

- I might have been lucky with my feeds, since they use all kinds of weird stuff (like "Greek", who can read that stuff anyway?) maybe they were by luck all encoded right. I will add a unit test for a wrongly encoded feed and I'm thinking about adding a "manual encoding override" to the settings of a feed.

- Agreed, I should test for that bozo stuff!

I'll look at that django page with more detail, currently I failed to spot where I could use the stuff outside of django - but I'll find it!

Thanks again!

Posted by: betabug at November 07,2007 10:05
Re: MiniPlanet, a Mini Feed Aggregator for Zope

> I'll look at that django page with more detail, currently I failed to spot
> where I could use the stuff outside of django - but I'll find it!

My fault, the link should go to the low-level framework
http://www.djangoproject.com/documentation/syndication_feeds/#the-low-level-framework

the module is here
http://code.djangoproject.com/browser/django/trunk/django/utils/feedgenerator.py

the dependencies on django.utils are minimal and can be resolved by cut'n'paste easily

i haven't found an easier way to create feeds except for writing my own implementation in zope ;)

Posted by: d2m at November 07,2007 11:44
Re: MiniPlanet, a Mini Feed Aggregator for Zope

Ah yeah, I had spotted that link, but was unsure if it was what you meant. I'll have a look at it. I always found generating RSS in Zope to be plain and easy, but I'll look into a more organized approach this time :-)

As for the timeout, I checked it and indeed feedparser waits forever. I've got a patch ready and I think this evening I'll find time to update MiniPlanet.
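
For the curious, the general idea of d2m's suggestion (set the socket timeout during the request) can be sketched like this. This is not the patch itself, just an illustration: parse_with_timeout is a made-up name, the 20 second default is simply a value from the suggested 10 - 30 second range, and the parse argument is only there so the sketch can be tried without a network. Note that socket.setdefaulttimeout() is process-wide, so in a threaded Zope it also affects sockets that other threads create while the fetch is running.

```python
import socket


def parse_with_timeout(url, seconds=20, parse=None):
    """Call a feedparser-style parse() with a default socket timeout set."""
    if parse is None:
        import feedparser  # third-party library, imported lazily
        parse = feedparser.parse

    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)  # applies to sockets created from now on
    try:
        return parse(url)
    finally:
        # Put back whatever was configured before, even if parse() raised.
        socket.setdefaulttimeout(previous)
```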

Posted by: betabug at November 07,2007 11:56