Monthly Archives: September 2011

MongoDB and C# Dictionary Serialisation, Part 1 – The Problem

I’m building a web site at the moment, and as consequence, have been doing a lot of work with MongoDB. One of the interesting things I’ve come across is the C# driver’s, and I don’t know if this applies to other languages, approach to serialising generic dictionaries. i.e. Dictionary<TKey,TValue>.

The driver has two methods depending on the type of the key. If TKey is an object then the serialiser will create a nested array e.g.:

class Thread
{
 // Id of the Thread.
 ObjectId Id;

 // Dictionary<MemberId, Message>
 Dictionary<ObjectId, string> Posts;
}

serialises to:

{ "_id" : ObjectId("4e519fe15fc9d4099c01733a"),
	"Posts" : [
		[ ObjectId("4e3de6255fc9d40fd437a5ac"), "Is anyone there?" ],
		[ ObjectId("4e3de6305fc9d40fd437a5ae"), "Nobody but us chickens."]
	]
}

Alternatively if, we store the member’s id as a string:

class Thread
{
 // Id of the Thread.
 ObjectId Id;

 // Dictionary<MemberName, Message>
 Dictionary<string, string> Posts;
}

we get:

{ "_id" : ObjectId("4e519fe15fc9d4099c01733a"),
	"Posts" : {
		"4e3de6255fc9d40fd437a5ac" : "Is anyone there?"
		"4e3de6305fc9d40fd437a5ae" : "Nobody but us chickens."
	}
}

This latter format is wonderfully compact and seems like a smart way of leveraging the map like properties of the json/bson collection type.

Unfortunately, I think there are some significant problems with both approaches, which I’ll outline below. If I’m wrong about these let me know as I’d gladly be corrected!

Ability to search by the Dictionary’s Key

With the nested dictionary, we can’t search for documents containing posts by a specific MemberId. This syntax, which one might think could do this:

var q = Query("Posts.0", "4e3de6255fc9d40fd437a5ac")

Will actually search for records where the first element in Posts equals "4e3de6255fc9d40fd437a5ac" – it’s not going to find any matches even if the TKey of the first element matches. This is because the first element is actually a nested array of two elements i.e.

[ ObjectId("4e3de6255fc9d40fd437a5ac"), "Is anyone there?" ]

So you can’t search by MemberId, you can only search if you know the complete contents of the dictionary entry, and only if you know its specific position in the dictionary. That’s not too useful when dealing with serialsed dictionaries!

Info

As a brief aside, there is an open ticket on Mongo’s Jira to add support for a syntax like this:

var q = Query.EQ("Posts.$.0", "4e3de6255fc9d40fd437a5ac")

This would be great as it would solve our problem. I’m not sure it’s on the cards to be implemented anytime soon though.

If instead, we now look at our example with the string key, we can search with the following:

var q = Query.Exists("Posts.4e3de6255fc9d40fd437a5ac", true)

I don’t know about you, but I find it a little weird that the search value has become part of the key and we are having to use ‘Exists’ rather than ‘EQ’. Still… at least we can search by our dictionary’s key with this schema.

Ability to Index by the Dictionary’s Key

We are stuffed here. We can’t create an index with the nested array because we can’t even index the fields with current mongo syntax. Nor can we create a dictionary of all the field names in a document. This means our searches for dictionary key’s or elements are going to be expensive…

Ability to Atomically Update a Value in the Dictionary

Working with a database like MongoDb, one of the most important tools in our arsenal is the ability to make atomic updates to documents without having to pull them back into memory. This side steps a lot of concurrency issues that are otherwise difficult to manage without native transactions and locking. With a simple document this is pretty easy. What about modifying values in our dictionaries?

Well, with the nested array, it is again a bit of a disaster. The only options is to load the whole document, edit in memory and Update. So long atomicity!

With the nested document, we are in better shape and can use the following:

threadsCollection
        .FindAndModify(
		Query.Exists("Posts.4e3de6255fc9d40fd437a5ac", true),
                null,
                Update.Set("Posts.4e3de6255fc9d40fd437a5ac",
                       "A new message replacing the old"));

Summary of Issues

The nested array is not a well supported structure in mongo, and using it for dictionary persistence is very limiting. We lose the ability to search, index and update atomically.

The array of documents is more successful in that we can search and update atomically. We do however lose indexing, which is fairly catastrophic if we actually want to search and update across any reasonably sized collections.

An Alternative

Fortunately, there is an alternate approach which resolves all these problem. All we need to do is create a hybrid of the two approaches – an array, but this time, of specifically formatted documents with known fields for the key and value:

{ "_id" : ObjectId("4e519fe15fc9d4099c01733a"),
	"Posts" : [
		{ "k" : ObjectId("4e3de6255fc9d40fd437a5ac"), "v" : "Is anyone there?" },
		{ "k" : ObjectId("4e3de6305fc9d40fd437a5ae"), "v" : "Nobody but us chickens." }
	]
}

By having well known fields we can now search by the key, the value, or if we want to store documents as the value, the fields in that document.

This search will find all the documents in the collection with a given key :

var q = Query.EQ("Posts.K", "4e3de6255fc9d40fd437a5ac")

To update the value:

threadsCollection
	.Update(
		Query.EQ("Posts.K", "4e3de6255fc9d40fd437a5ac"),
		Update.Set("Posts.$.v", "A new message replacing the old");

Its worth noting that if our TValue is a class containing further properties rather than a literal type, then the above syntax can be extended to index into it using the standard dot notation. e.g.

threadsCollection
	.Update(
		Query.EQ("Posts.K", "4e3de6255fc9d40fd437a5ac"),
		Update.Set("Posts.$.v.Status", "New Status");

And because we can directly index the key, or indeed any value, we can index them!

We can also do funky things like replicating the addToSet functionality with our dictionary. Imagine we have a second dictionary on our thread, this one indexed by member id, but containing the members name so that we can cache the user names of everyone who has posted in the thread:

class Thread
{
 // Id of the Thread.
 ObjectId Id;

 // Dictionary<MemberId, Message>
 Dictionary<ObjectId, string> Posts;

 // Dictionary<MemberId, Name>
 Dictionary<ObjectId, string> NameLookup;

}

We don’t however want to have someone recorded in this lookup table style dictionary more than once, so addToSet is exactly the approach we need. To do this, we first have to create the element we would insert as a BsonDocument:

class NameLookupEntry
{
	public ObjectId k { get; set; }
	public Name v { get; set}
}

var entry = new NameLookupEntry() { k = "4e3de6255fc9d40fd437a5ac", v = "Joe Blow" };

In this instance we are searching for a specific thread by its Id, as we want to add the guy’s name to that one thread, and only if it isn’t already there:

threadsCollection
	.Update(
		Query.AND(
			Query.EQ("Id", "4e519fe15fc9d4099c01733a"
			Query.NE("NameLookup.K", "4e3de6255fc9d40fd437a5ac"),
		Update.Set("NameLookup.$.v", entry.ToBsonDocument());

In part 2, I’ll show you the code to build a custom serialiser to support this alternate scheme.

Thanks to Robert Stam at 10gen for assistance and feedback on this topic.

DXGrid, Immediate Updates & Nested Attached Behaviours

With the Devexpress grid control, when you edit a row, it doesn’t update the view model until the row loses focus. I needed immediate update so that my view model would get notified when a checkbox in the grid was toggled as soon as it happened. So I wrote an attached behaviour to facilitate this without having to resort to code behind.

This example is also interesting as it shows how to maintain extra state on a control, using a second attached behaviour managed internally by the main one. In this case, this is necessary to manage the lifetime of the subscription to the controls events. I couldn’t find any info about this technique on the net, so not sure how well known it is.

The code is fairly well commented so should be self explanatory. You’ll also notice that the events are being converted to IObservable with Reactive extensions. I love the functional approach of reactive and the way that reactive returns an IDisposable makes the lifetime management really easy.

n.b. Since this class is dependent on Reactive Extensions, your references will need to look something like this:

image

You can grab Reactive (Rx) from nuget.

using System;
using System.Windows;
using DevExpress.Xpf.Grid;
using System.Reactive.Linq;
using System.Reactive;

namespace Utils.DevExpress
{
	/// <summary>
	/// Attached behaviour for Dev Express GridView's that forces the grid to immediately update the bound
	/// datasource row when values are changed in it. By default it will only update when the row loses
	/// focus.
	/// 
	/// Usage in xaml:
	/// 1. Add Reference
	///    xmlns:dxu="clr-namespace:Utils.DevExpress;assembly=Utils.DevExpress"
	/// 
	/// 2. Attach to GridView
	///    {dxg:TableView ShowGroupPanel="False"  MultiSelectMode="Row"
	///                   dxu:DXGridViewUpdateBehaviour.ImmediateUpdate="True"       
	///                   AllowBestFit="True" AutoWidth="True"/}
	///    {/dxg:GridControl.View}
	/// 
	/// Derived from DevExpress recommended work around which was based on code behind:
	///    http://www.devexpress.com/Support/Center/p/E2832.aspx
	/// 
	/// Author : Daniel Harman
	/// Date   : 07.09.2011
	/// </summary>

	public class DXGridViewUpdateBehaviour : DependencyObject
	{

		#region Immediate Update Attached Property

		public static bool GetImmediateUpdate(DependencyObject obj)
		{
			return (bool)obj.GetValue(ImmediateUpdateProperty);
		}

		public static void SetImmediateUpdate(DependencyObject obj, bool value)
		{
			obj.SetValue(ImmediateUpdateProperty, value);
		}

		/// Immediate update is a boolean attached DP that when set to true, will ensure that grid row updates
		/// are immediately propagated to the view model, rather than only when the row loses focus.
		public static readonly DependencyProperty ImmediateUpdateProperty = DependencyProperty.RegisterAttached(
				"ImmediateUpdate", typeof(bool), typeof(GridViewBase),
				new UIPropertyMetadata(false, OnImmediateUpdatePropertyChanged));

		#endregion

		#region Grid View Update Behaviour Subscription Attached Property

		public static DXGridViewUpdateBehaviourSubscription GetGridViewUpdateBehaviourSubscription(DependencyObject obj)
		{
			return (DXGridViewUpdateBehaviourSubscription)obj.GetValue(GridViewUpdateBehaviourProperty);
		}

		public static void SetGridViewUpdateBehaviourSubscription(
			DependencyObject obj, DXGridViewUpdateBehaviourSubscription value)
		{
			obj.SetValue(GridViewUpdateBehaviourProperty, value);
		}

		/// This property is used to store an instance of the subscription on the control. This instance
		/// contains the rxEventHandler IDisposable so that we can clean up when we want to change the binding
		/// etc.
		public static readonly DependencyProperty GridViewUpdateBehaviourProperty = DependencyProperty.RegisterAttached(
				 "GridViewUpdateBehaviour", typeof(DXGridViewUpdateBehaviourSubscription), typeof(DXGridViewUpdateBehaviour),
				 new UIPropertyMetadata(null));

		#endregion

		/// <summary>
		/// Handle changes to ImmediateUpdate.
		/// </summary>
		/// <param name="d">A child of GridViewBase</param>
		/// <param name="e">true/false for value of attached DP.</param>
		private static void OnImmediateUpdatePropertyChanged(DependencyObject d, DependencyPropertyChangedEventArgs e)
		{
			var gridViewBase = (GridViewBase)d;
			var oldValue = (bool)e.OldValue;
			var newValue = (bool)e.NewValue;

			if (oldValue == newValue)
				return;

			// Remove old sub if it exists.
			var oldSub = GetGridViewUpdateBehaviourSubscription(d);

			if (oldSub != null)
				oldSub.Dispose();

			// If ImmediateUpdate==true then create new sub.
			if (newValue)
				SetGridViewUpdateBehaviourSubscription(d, new DXGridViewUpdateBehaviourSubscription(gridViewBase));
		}

		/// <summary>
		/// Nested class to manage the lifetime of the reactive event handler. This is attached the GridView
		/// as an attached dependency property.
		/// </summary>
		public class DXGridViewUpdateBehaviourSubscription : IDisposable
		{
			IDisposable rxEventHandler;

			public DXGridViewUpdateBehaviourSubscription(GridViewBase gridViewBase)
			{
				// Create an rx observable of the events and subscribe handler to it.
				rxEventHandler = Observable
						 .FromEventPattern<CellValueChangedEventHandler, CellValueChangedEventArgs>(
								 h => gridViewBase.CellValueChanging += h,
								 h => gridViewBase.CellValueChanging -= h)
						 .Subscribe(OnCellValueChanging);
			}

			/// <summary>
			/// On notification that a cell value is changing, force the TableView to update the view model immediately.
			/// </summary>
			/// <param name="ep"></param>
			void OnCellValueChanging(EventPattern<CellValueChangedEventArgs> ep)
			{
				(ep.Sender as GridViewBase).PostEditor();
			}

			#region IDisposable Members

			public void Dispose()
			{
				rxEventHandler.Dispose();
			}

			#endregion
		}
	}
}