betabug... Sascha Welter

30 October 2007

Quasi-Normal in Numbers

How much is that?

Zope programmers learn sooner or later that persistent objects don't like plain dictionaries and lists as attributes. Why? Because changes made inside such an attribute are not noticed by the persistence machinery: to keep the values around you have to assign them back to the object, and that writes a new revision comprising all of the object to the ZODB. That wastes space and can lead to more ConflictErrors. But how much space? Yesterday evening I found one such case in Zwiki, and while moving the dictionary in question to a BTree I wrote down some numbers, in the process also rediscovering the ZCatalog in there...


Zwiki visitors can "rate" pages with 1-5 stars. The code in question stored each visitor's personal rating in a dictionary on the page. At first sight there's nothing bad about that, as there are not that many ratings by that many people anyway. But the persistence problem with dictionary attributes mentioned above led me to rewrite the code. The diff will soon be in the Zwiki darcs repository, so I'm not pasting it here. It's not the interesting part. Let's look instead at the Data.fs growth, measured by a simple ls -l.

All numbers are in bytes. For reference I've rated pages from two browsers, one with a logged-in user, one with a quasi-anonymous user with a user name set in a cookie. I've always double-checked that simple page loads don't trigger anything that grows the Data.fs, but otherwise things might not be very scientific.
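The same before/after measurement can be scripted instead of read off ls -l. This is only a minimal sketch: the helper name measure_growth and the fake_vote action are made up for illustration, and a temporary file stands in for Data.fs. It relies on the storage file only growing by appending, which is what makes the size difference meaningful.

```python
import os
import tempfile

def measure_growth(path, action):
    """Return how many bytes the file at `path` grew while `action()` ran."""
    before = os.path.getsize(path)
    action()
    return os.path.getsize(path) - before

# Stand-in for Data.fs: an empty temporary file that we only append to.
with tempfile.NamedTemporaryFile(delete=False) as f:
    demo_path = f.name

def fake_vote():
    # Hypothetical action. In the real test this was a click on a rating
    # star in the browser, followed by Zope committing the transaction.
    with open(demo_path, "ab") as out:
        out.write(b"x" * 169)

growth = measure_growth(demo_path, fake_vote)
print(growth)  # 169

os.remove(demo_path)
```

Against a real instance the action would be the request that casts the vote, and path would point at the instance's Data.fs.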

comparison

For comparison, I've looked at some simple actions on a Zwiki page: adding a comment, and a visitor setting their name in the "options" page. Obviously the Data.fs growth from saving a comment is highly dependent on the size of the page and the comment.

comment add: ~ 26651
saving name in options: 970

As I was checking the numbers for the old code, I quickly discovered that some of the growth is due to the ZCatalog getting fed too. We can measure that effect.

old code

voting: 8755
voting from other account: 8568
voting on big page: 20264

old code without catalog reindex

voting: 5899
voting from other account: 5896
voting on big page: 17396
another vote on big page: 17414

Some of the growth clearly is coming from reindexing the object in the catalog. But the main observation here is that voting on a big page results in more bytes being written to Data.fs, since we are still writing down all of the object for each vote.
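Both effects can be read off the two tables by simple subtraction. The sketch below uses only the numbers listed above. Since each number is a single measurement, the results are rough estimates and nothing more.

```python
# Data.fs growth in bytes per vote, copied from the two "old code" tables.
old_with_catalog = {
    "voting": 8755,
    "voting from other account": 8568,
    "voting on big page": 20264,
}
old_without_catalog = {
    "voting": 5899,
    "voting from other account": 5896,
    "voting on big page": 17396,
}

# Bytes that can be attributed to reindexing the page in the ZCatalog.
catalog_overhead = {
    action: old_with_catalog[action] - old_without_catalog[action]
    for action in old_with_catalog
}

# Extra bytes written just because the page itself is bigger.
page_size_effect = (
    old_without_catalog["voting on big page"] - old_without_catalog["voting"]
)

print(catalog_overhead)
# {'voting': 2856, 'voting from other account': 2672, 'voting on big page': 2868}
print(page_size_effect)  # 11497
```

The catalog overhead stays in the same range for both page sizes, while the page size alone accounts for more than 11000 additional bytes per vote.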

new code

voting: 2556
voting from other account: 2553
voting on big page: 2949
another vote on big page: 2528

Here I have rewritten the code to store ratings in an OOBTree. The migration from dictionary to BTree is not reflected in the numbers, as it has to be done just once anyway, not for each rating being registered. We already see some reduction here, but the main observation is again the difference between the "normal" page (more or less a default Zwiki FrontPage) and a slightly bigger page: there are in fact no more bytes written when voting on the big page now. Obviously we do not write all of the object to the ZODB any more. But then we still write a lot of bytes for such a small vote. Where does it all go?
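Why the page size stops mattering can be illustrated without Zope at all. The ZODB stores a pickle of the state of each changed persistent object, and a BTree is kept in records of its own, separate from the page. The sketch below is only a simulation under those assumptions: plain dictionaries and pickle stand in for the page and the OOBTree, and the names and sizes are invented, so the byte counts are not comparable to the measurements above.

```python
import pickle

def record_size(state):
    """Size in bytes of the pickle that would be written for `state`."""
    return len(pickle.dumps(state, protocol=2))

# Invented example data: a small page, a bigger page, and a few ratings.
small_text = "x" * 1000
big_text = "x" * 12000
ratings = {"alice": 4, "bob": 5}

# Old layout: the ratings dictionary is part of the page state, so every
# vote rewrites the page text along with it.
old_small = record_size({"text": small_text, "ratings": ratings})
old_big = record_size({"text": big_text, "ratings": ratings})

# New layout: the ratings live in a record of their own, so a vote only
# rewrites that record, whatever the size of the page.
new_small = record_size(ratings)
new_big = record_size(ratings)

print(old_big - old_small)  # grows with the page text
print(new_big - new_small)  # 0
```

In the simulation, as in the measurements, the cost of a vote in the new layout no longer depends on the size of the page.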

new code without catalog reindex

voting: 169
voting from other account: 166
voting on big page: 169
another vote on big page: 166

Getting rid of reindexing for a moment and... we get very reasonable numbers all of a sudden. I can imagine a bit of overhead for writing objects to an OODB, so writing 169 bytes for a vote looks reasonable. You can even see that the 2nd account's user name is a bit shorter than the 1st. Lesson learned here: If you don't need to index your object in the ZCatalog, don't.

Reindexing all of the indexes in the ZCatalog is some overhead, but we do not change so much on the object. So why not just reindex only those indexes which we actually changed on our object?

new code reindexing only 2 indexes

voting: 2365
voting from other account: 2362
voting on big page: 2340
another vote on big page: 2337

The code for this looks basically like this:

catalog.catalog_object(object_that_changed, idxs=['rating', 'voteCount'])

We are getting down a bit again, but only slightly so. What might be the reason?

new code reindexing only 2 indexes, without metadata

voting: 608
voting from other account: 614
voting on big page: 591
another vote on big page: 514

Here we have used an optional parameter on the same line:

catalog.catalog_object(self, idxs=['rating', 'voteCount'],
                       update_metadata=0)

Here we told the catalog update not to update the metadata. Obviously in practice this might not be a good choice for our code, as index and metadata diverge now, but it can show us something in the numbers: they are very reasonable now. We're writing a few hundred bytes, and we've got an updated index and an updated object. Lesson learned here No. 1: If you don't need metadata in your ZCatalog, don't put it there.

Oh, and Lesson learned here No. 2: Even though the idxs parameter on catalog_object() will update only the specified indexes, the call will still update all the metadata. There is a comment in the Zope code (in Products/ZCatalog/Catalog.py on updateMetadata()), "Given an object and a uid, update the column data for the uid with the object data if the object has changed", which could be misread as saying that only the changed metadata is updated, but indeed all the metadata seems to be rewritten (I will need to grok that particular piece of code more).
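Putting the first "voting" row of each new-code table side by side gives a rough breakdown of where the 2556 bytes of a fully reindexed vote go. The sketch uses only numbers from the tables above. The variable names are mine, and since the differences come from single measurements they are estimates at best.

```python
# First "voting" row of each "new code" table, in bytes.
full_reindex = 2556          # new code
two_indexes = 2365           # new code reindexing only 2 indexes
two_indexes_no_meta = 608    # ... and without metadata
no_catalog = 169             # new code without catalog reindex

other_indexes = full_reindex - two_indexes        # indexes we no longer touch
metadata = two_indexes - two_indexes_no_meta      # catalog metadata
rating_indexes = two_indexes_no_meta - no_catalog # 'rating' and 'voteCount'
vote_itself = no_catalog                          # the BTree write

print(other_indexes, metadata, rating_indexes, vote_itself)
# 191 1757 439 169
print(round(100 * metadata / full_reindex))  # 69
```

By this estimate, roughly two thirds of the bytes written per vote are catalog metadata, which matches the observation that limiting the reindex to two indexes helps only slightly.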

Conclusions

For the moment we will go for the "new code reindexing only 2 indexes" version in Zwiki. BTrees are nice. Cataloging stuff is a tradeoff.

Back to our quasi-normal state of hacking, or as Saad reports about the strike in Paris:

Le trafic aérien à Air France devrait reprendre de façon quasi-normale... ("Air traffic at Air France should resume in a quasi-normal fashion...")

Posted by betabug at 11:34 | Comments (0) | Trackbacks (0)