Quasi-Normal in Numbers
Zope programmers learn sooner or later that persistent objects don't like plain dictionaries and lists as attributes. Why? Because a change inside them goes unnoticed by the persistence machinery: to keep the new values around you have to assign them back to the object, and that writes a new revision comprising all of the object to the ZODB. This wastes space and can lead to more ConflictErrors. But how much space? Yesterday evening I found one such case in Zwiki, and while moving the dictionary in question to a BTree I wrote down some numbers, in the process also rediscovering the ZCatalog in there...
Zwiki visitors can "rate" pages with 1-5 stars. The code in question stored each visitor's personal rating in a dictionary on the page. At first glance there's nothing bad about that; there are not that many ratings by that many people anyway. But the persistence problem with dictionary attributes mentioned above led me to rewrite the code. The diff will soon be in the Zwiki darcs repository, so I'm not pasting it here. It's not the interesting part. Let's look instead at the Data.fs growth, measured by a simple ls -l.
All numbers are in bytes. For reference I've rated pages from two browsers, one with a logged-in user, one with a quasi-anonymous user whose user name is set in a cookie. I've always double-checked that simple page loads don't trigger anything that grows the Data.fs, but otherwise things might not be very scientific.
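The same before-and-after check can be scripted instead of reading ls -l by hand. This is only a minimal sketch: `measure_growth` and `append_bytes` are hypothetical helper names, and the scratch file merely stands in for a Data.fs. It relies on the fact that a FileStorage only appends until it is packed, so the size difference is the number of bytes written.

```python
import os
import tempfile


def measure_growth(path, action):
    """Return how many bytes the file at `path` grew while `action()` ran."""
    before = os.path.getsize(path)
    action()
    return os.path.getsize(path) - before


def append_bytes(path, payload):
    """Stand-in for 'do something in the browser that commits a transaction'."""
    with open(path, "ab") as f:
        f.write(payload)


# Demo on a scratch file; with a real instance you would pass the path to
# its Data.fs and trigger the vote between the two size checks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"earlier transactions")
scratch = tmp.name

growth = measure_growth(scratch, lambda: append_bytes(scratch, b"x" * 169))
os.remove(scratch)
print(growth)
```

With a live Zope the "action" happens in the browser, so in practice you would note the size, click, and compare by hand or split the helper in two.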
For comparison, I've looked at some simple actions on a Zwiki page: adding a comment, and a visitor setting their name on the "options" page. Obviously the Data.fs growth from saving a comment depends heavily on the size of the page and of the comment.
As I was checking the numbers for the old code, I quickly discovered that some of the growth is due to the ZCatalog getting fed too. We can measure that effect.
old code without catalog reindex
Some of the growth clearly comes from reindexing the object in the catalog. But the main observation here is that voting on a big page results in more bytes being written to Data.fs, since we are still writing all of the object for each vote.
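To get a feeling for why the page size matters, here is a rough stdlib-only illustration. It is not ZODB code: the `Page` class is a hypothetical stand-in for a wiki page, and pickling its state only approximates what ends up in a database record. The point is that a plain dict is serialized as part of the object that owns it, so the page text travels along with every vote.

```python
import pickle


class Page:
    """Hypothetical stand-in for a wiki page; not Zwiki's actual class."""

    def __init__(self, text):
        self.text = text
        self.votes = {}  # plain dict, serialized together with the page state


def pickled_size(state):
    """Size in bytes of the pickled state, a rough proxy for record size."""
    return len(pickle.dumps(state, protocol=2))


normal_page = Page("x" * 2000)   # roughly a small front page
big_page = Page("x" * 20000)     # a somewhat bigger page

for page in (normal_page, big_page):
    page.votes["some_user"] = 4  # one vote on each page

normal_bytes = pickled_size(vars(normal_page))  # whole state, small page
big_bytes = pickled_size(vars(big_page))        # whole state, big page
votes_bytes = pickled_size(normal_page.votes)   # just the votes mapping

print(normal_bytes, big_bytes, votes_bytes)
```

The whole-state sizes follow the page text, while the votes mapping on its own stays tiny no matter which page it belongs to.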
Here I have rewritten the code to store ratings in an OOBTree. The migration from dictionary to BTree is not reflected in the numbers; that has to be done just once anyway, not for each rating being registered. We already see some reduction here, but the main observation is again the difference between the "normal" page (more or less a default Zwiki FrontPage) and a slightly bigger page: there are in fact fewer bytes written when voting on the big page now, so obviously we do not write all of the object to the ZODB any more. But then we still write a lot of bytes for such a small vote. Where does it all go?
new code without catalog reindex
Getting rid of reindexing for a moment and... we get very reasonable numbers all of a sudden. I can imagine a bit of overhead for writing objects to an OODB, so writing 169 bytes for a vote looks reasonable. You can even see that the second account's user name is a bit shorter than the first. Lesson learned here: If you don't need to index your object in the ZCatalog, don't.
Reindexing all of the indexes in the ZCatalog is some overhead, but we do not change that much on the object. So why not reindex only those indexes whose values we actually changed on our object?
new code reindexing only 2 indexes
The code for this looks basically like this:
catalog.catalog_object(object_that_changed, idxs=['rating', 'voteCount'])
We are getting down a bit again, but only slightly so. What might be the reason?
new code reindexing only 2 indexes, without metadata
Where we have used an optional parameter on the same line:
catalog.catalog_object(self, idxs=['rating', 'voteCount'], update_metadata=0)
Here we told the catalog not to update the metadata. Obviously in practice this might not be a good choice for our code, as index and metadata now diverge, but it shows us something in the numbers: they are very reasonable now. We're writing a few hundred bytes, yet we've got an updated index and an updated object. Lesson learned here No. 1: If you don't need metadata in your ZCatalog, don't put it there.
Oh, and Lesson learned here No. 2: Even though the idxs parameter of catalog_object() restricts the update to the specified indexes, the call will still update all the metadata. There is a comment in the Zope code (in Products/ZCatalog/Catalog.py on updateMetadata()), "Given an object and a uid, update the column data for the uid with the object data if the object has changed", which could be misread as saying that only the changed metadata is updated, but in fact all the metadata seems to be rewritten (I will need to grok that particular piece of code more).
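The two lessons can be played through with a toy model. To be clear, this is not ZCatalog code: `ToyCatalog`, its write log and the explicit `uid` argument are made up for illustration, and it simply assumes that the metadata of one object is kept as a single record that gets replaced as a whole, which would fit the numbers above but is not verified against Catalog.py.

```python
class ToyCatalog:
    """Toy model for counting writes; NOT the ZCatalog implementation."""

    def __init__(self, index_names, metadata_columns):
        self.indexes = {name: {} for name in index_names}
        self.metadata_columns = tuple(metadata_columns)
        self.metadata = {}  # uid -> tuple with one value per column
        self.writes = []    # log of what was (re)written

    def catalog_object(self, obj, uid, idxs=None, update_metadata=1):
        # Only the requested indexes are touched (all of them if idxs is empty).
        for name in (idxs or self.indexes):
            self.indexes[name][uid] = getattr(obj, name, None)
            self.writes.append("index:" + name)
        if update_metadata:
            # Assumption: one record per object, replaced as a whole as
            # soon as any single column value differs.
            record = tuple(getattr(obj, c, None) for c in self.metadata_columns)
            if self.metadata.get(uid) != record:
                self.metadata[uid] = record
                self.writes.append("metadata")


class Page:
    """Hypothetical page with a few cataloged attributes."""

    def __init__(self):
        self.title = "FrontPage"
        self.size = 2000
        self.rating = 0
        self.voteCount = 0


catalog = ToyCatalog(
    index_names=["title", "size", "rating", "voteCount"],
    metadata_columns=["title", "size", "rating", "voteCount"],
)
page = Page()
catalog.catalog_object(page, "FrontPage")  # initial full indexing
catalog.writes.clear()

page.rating, page.voteCount = 4, 1
catalog.catalog_object(page, "FrontPage", idxs=["rating", "voteCount"])
partial = list(catalog.writes)
catalog.writes.clear()

page.rating, page.voteCount = 5, 2
catalog.catalog_object(page, "FrontPage", idxs=["rating", "voteCount"],
                       update_metadata=0)
no_metadata = list(catalog.writes)

print(partial)
print(no_metadata)
```

In the toy, restricting idxs saves the writes to the untouched indexes, but the metadata record still goes out in full until update_metadata=0 is passed, at the price of metadata that no longer matches the object.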
For the moment we will go for the "new code reindexing only 2 indexes" version in Zwiki. BTrees are nice. Cataloging stuff is a tradeoff.
Back to our quasi-normal state of hacking, or as Saad reports about the strike in Paris:
Air traffic at Air France should resume in a quasi-normal fashion...