MongoDB and C# Dictionary Serialisation, Part 1 – The Problem

I’m building a web site at the moment, and as consequence, have been doing a lot of work with MongoDB. One of the interesting things I’ve come across is the C# driver’s, and I don’t know if this applies to other languages, approach to serialising generic dictionaries. i.e. Dictionary<TKey,TValue>.

The driver has two methods depending on the type of the key. If TKey is an object then the serialiser will create a nested array e.g.:

class Thread
{
 // Id of the Thread.
 ObjectId Id;

 // Dictionary<MemberId, Message>
 Dictionary<ObjectId, string> Posts;
}

serialises to:

{ "_id" : ObjectId("4e519fe15fc9d4099c01733a"),
	"Posts" : [
		[ ObjectId("4e3de6255fc9d40fd437a5ac"), "Is anyone there?" ],
		[ ObjectId("4e3de6305fc9d40fd437a5ae"), "Nobody but us chickens."]
	]
}

Alternatively if, we store the member’s id as a string:

class Thread
{
 // Id of the Thread.
 ObjectId Id;

 // Dictionary<MemberName, Message>
 Dictionary<string, string> Posts;
}

we get:

{ "_id" : ObjectId("4e519fe15fc9d4099c01733a"),
	"Posts" : {
		"4e3de6255fc9d40fd437a5ac" : "Is anyone there?"
		"4e3de6305fc9d40fd437a5ae" : "Nobody but us chickens."
	}
}

This latter format is wonderfully compact and seems like a smart way of leveraging the map like properties of the json/bson collection type.

Unfortunately, I think there are some significant problems with both approaches, which I’ll outline below. If I’m wrong about these let me know as I’d gladly be corrected!

Ability to search by the Dictionary’s Key

With the nested dictionary, we can’t search for documents containing posts by a specific MemberId. This syntax, which one might think could do this:

var q = Query("Posts.0", "4e3de6255fc9d40fd437a5ac")

Will actually search for records where the first element in Posts equals "4e3de6255fc9d40fd437a5ac" – it’s not going to find any matches even if the TKey of the first element matches. This is because the first element is actually a nested array of two elements i.e.

[ ObjectId("4e3de6255fc9d40fd437a5ac"), "Is anyone there?" ]

So you can’t search by MemberId, you can only search if you know the complete contents of the dictionary entry, and only if you know its specific position in the dictionary. That’s not too useful when dealing with serialsed dictionaries!

Info

As a brief aside, there is an open ticket on Mongo’s Jira to add support for a syntax like this:

var q = Query.EQ("Posts.$.0", "4e3de6255fc9d40fd437a5ac")

This would be great as it would solve our problem. I’m not sure it’s on the cards to be implemented anytime soon though.

If instead, we now look at our example with the string key, we can search with the following:

var q = Query.Exists("Posts.4e3de6255fc9d40fd437a5ac", true)

I don’t know about you, but I find it a little weird that the search value has become part of the key and we are having to use ‘Exists’ rather than ‘EQ’. Still… at least we can search by our dictionary’s key with this schema.

Ability to Index by the Dictionary’s Key

We are stuffed here. We can’t create an index with the nested array because we can’t even index the fields with current mongo syntax. Nor can we create a dictionary of all the field names in a document. This means our searches for dictionary key’s or elements are going to be expensive…

Ability to Atomically Update a Value in the Dictionary

Working with a database like MongoDb, one of the most important tools in our arsenal is the ability to make atomic updates to documents without having to pull them back into memory. This side steps a lot of concurrency issues that are otherwise difficult to manage without native transactions and locking. With a simple document this is pretty easy. What about modifying values in our dictionaries?

Well, with the nested array, it is again a bit of a disaster. The only options is to load the whole document, edit in memory and Update. So long atomicity!

With the nested document, we are in better shape and can use the following:

threadsCollection
        .FindAndModify(
		Query.Exists("Posts.4e3de6255fc9d40fd437a5ac", true),
                null,
                Update.Set("Posts.4e3de6255fc9d40fd437a5ac",
                       "A new message replacing the old"));

Summary of Issues

The nested array is not a well supported structure in mongo, and using it for dictionary persistence is very limiting. We lose the ability to search, index and update atomically.

The array of documents is more successful in that we can search and update atomically. We do however lose indexing, which is fairly catastrophic if we actually want to search and update across any reasonably sized collections.

An Alternative

Fortunately, there is an alternate approach which resolves all these problem. All we need to do is create a hybrid of the two approaches – an array, but this time, of specifically formatted documents with known fields for the key and value:

{ "_id" : ObjectId("4e519fe15fc9d4099c01733a"),
	"Posts" : [
		{ "k" : ObjectId("4e3de6255fc9d40fd437a5ac"), "v" : "Is anyone there?" },
		{ "k" : ObjectId("4e3de6305fc9d40fd437a5ae"), "v" : "Nobody but us chickens." }
	]
}

By having well known fields we can now search by the key, the value, or if we want to store documents as the value, the fields in that document.

This search will find all the documents in the collection with a given key :

var q = Query.EQ("Posts.K", "4e3de6255fc9d40fd437a5ac")

To update the value:

threadsCollection
	.Update(
		Query.EQ("Posts.K", "4e3de6255fc9d40fd437a5ac"),
		Update.Set("Posts.$.v", "A new message replacing the old");

Its worth noting that if our TValue is a class containing further properties rather than a literal type, then the above syntax can be extended to index into it using the standard dot notation. e.g.

threadsCollection
	.Update(
		Query.EQ("Posts.K", "4e3de6255fc9d40fd437a5ac"),
		Update.Set("Posts.$.v.Status", "New Status");

And because we can directly index the key, or indeed any value, we can index them!

We can also do funky things like replicating the addToSet functionality with our dictionary. Imagine we have a second dictionary on our thread, this one indexed by member id, but containing the members name so that we can cache the user names of everyone who has posted in the thread:

class Thread
{
 // Id of the Thread.
 ObjectId Id;

 // Dictionary<MemberId, Message>
 Dictionary<ObjectId, string> Posts;

 // Dictionary<MemberId, Name>
 Dictionary<ObjectId, string> NameLookup;

}

We don’t however want to have someone recorded in this lookup table style dictionary more than once, so addToSet is exactly the approach we need. To do this, we first have to create the element we would insert as a BsonDocument:

class NameLookupEntry
{
	public ObjectId k { get; set; }
	public Name v { get; set}
}

var entry = new NameLookupEntry() { k = "4e3de6255fc9d40fd437a5ac", v = "Joe Blow" };

In this instance we are searching for a specific thread by its Id, as we want to add the guy’s name to that one thread, and only if it isn’t already there:

threadsCollection
	.Update(
		Query.AND(
			Query.EQ("Id", "4e519fe15fc9d4099c01733a"
			Query.NE("NameLookup.K", "4e3de6255fc9d40fd437a5ac"),
		Update.Set("NameLookup.$.v", entry.ToBsonDocument());

In part 2, I’ll show you the code to build a custom serialiser to support this alternate scheme.

Thanks to Robert Stam at 10gen for assistance and feedback on this topic.

2 thoughts on “MongoDB and C# Dictionary Serialisation, Part 1 – The Problem

Leave a Reply