Monday, March 03, 2014

A little lesson about document database modelling

This is just a short thing that I experienced when doing the data layer for AptiTalk. AptiTalk is an attempt to create a corporate chat; we want something like Hangout, but better. Yeah, we're not only the best - we're the most humble as well!

Ok, in this simple setting I learned a thing about using document databases (Mongo in this case).
The data model is really simple. It's just Posts, Replies and Hashtags. So we created the Posts like this (using Mongoose):
Let's leave the Replies out of this discussion for now. I think I have more to say about that, but I haven't tried it yet. A Hashtag, in turn, is just the tag and references to all the posts that contain that hashtag. Like this:
Notice the reference to the Post-collection in the posts-array. Both Hugo and I were alright with that model. I started to implement the data layer and wrote tests for it.
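Roughly, the two kinds of documents had a shape like the one below. This is only a sketch using plain JavaScript objects, not the actual Mongoose schemas; the field names `message` and `tag` and the id values are assumptions made up for illustration. Only the `posts` array of references comes from the description above.

```javascript
// Sketch of the first model as plain objects (not real Mongoose schemas).
// Field names and ids are assumptions for illustration.
const post = { _id: 'post1', message: 'a #message like #this' };

// One Hashtag document per tag, each holding references (ids)
// to every post that uses the tag.
const hashtagDocs = [
  { tag: 'message', posts: ['post1'] },
  { tag: 'this', posts: ['post1'] },
];
```

The thing to notice is that one post is now spread over three documents that have to be kept in sync.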
But... after a while it started to squeak. A lot.

I noticed it when I started to put the whole thing together. In order to create a post I had to:
  • Save the Post to the database
  • Check for errors
  • Now loop over all the hashtags in the message ("a #message like #this #contains 3 #tags" for example)
  • For each of the tags... 
    • add it to the database...
      • check for error and ... ? remove the post too? 
    • but if it exists I should instead get it and then update the array of posts and update the Hashtag
      • check for error and ... remove the post and stop processing further?
  • When everything is completed, and no errors have occurred in any of the interactions with the database, we're done.
And then callbacks on top of that... It got hairy, I'm telling you.
Around this point I started to have second thoughts. Nothing had been this hard with Mongo before. And this is a really simple data model. It shouldn't be this hard.
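The bookkeeping in those steps can be sketched in memory, without a database and without the error handling. Everything here is an assumption for illustration: the two `Map`s stand in for the collections, and `extractHashtags` and `createPost` are hypothetical helper names, not code from AptiTalk.

```javascript
// In-memory stand-ins for the two collections (assumption: no real database).
const posts = new Map();       // post id -> post document
const hashtagDocs = new Map(); // tag -> { tag, posts: [post ids] }

// Pull the tags out of a message, e.g. "#this" -> "this".
function extractHashtags(message) {
  return (message.match(/#\w+/g) || []).map((t) => t.slice(1));
}

function createPost(id, message) {
  posts.set(id, { _id: id, message });           // save the post
  for (const tag of extractHashtags(message)) {  // loop over its tags
    const existing = hashtagDocs.get(tag);
    if (existing) {
      existing.posts.push(id);                   // known tag: update its post list
    } else {
      hashtagDocs.set(tag, { tag, posts: [id] }); // new tag: add it
    }
  }
}

createPost('post1', 'a #message like #this #contains 3 #tags');
createPost('post2', 'another #message');
```

Against a real database, every `set`, `get` and `push` above becomes a separate call that can fail on its own, which is where the "remove the post too?" questions come from.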

Often when I get that feeling I try to take a break. For about 24 h. And then approach the model again, as if for the first time. I did that this time and when I got back the answer was bright and clear to me: "we don't need a Hashtag-collection. It's just something an old relational dude like me wanted, to keep the 'tables' in order".

Instead, hashtags are just a function/index on the Post: keep the hashtags in an array on each post and query across the hashtags of all the posts. This can easily be accomplished with Mongoose like this:

The "trick" is the .find({hashtags : hashtag}) call, which is Mongoose speak for: "find all the documents that have the hashtag in the hashtags array".
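To show what that query matches, here is a minimal in-memory sketch of the single-collection model in plain JavaScript. It does not use Mongoose or a database; the array `posts` and the helpers `extractHashtags`, `createPost` and `findByHashtag` are hypothetical names for illustration. Only the `hashtags` array on the post comes from the query above.

```javascript
// In-memory stand-in for the Post collection (assumption: no real database).
const posts = [];

// Pull the tags out of a message, e.g. "#this" -> "this".
function extractHashtags(message) {
  return (message.match(/#\w+/g) || []).map((t) => t.slice(1));
}

// Saving a post is a single write: the tags live on the post itself.
function createPost(message) {
  const post = { message, hashtags: extractHashtags(message) };
  posts.push(post);
  return post;
}

// Plain-JavaScript equivalent of what find({ hashtags: hashtag }) matches:
// every post whose hashtags array contains the given value.
function findByHashtag(hashtag) {
  return posts.filter((post) => post.hashtags.includes(hashtag));
}

createPost('a #message like #this #contains 3 #tags');
createPost('another #message');
const tagged = findByHashtag('message'); // both posts
```

With the tags stored on the post there is only one document to save, so there is nothing to roll back if a later step fails.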

Summary 

I'm still learning document databases and have not yet harnessed their full power. I have a feeling that document databases are to relational databases what functional programming is to object orientation. That is, you can write the code the same way you always did, but then you're missing out on the whole idea.
