MongoDb Ninjitsu: Using ObjectId as a Timestamp

Whilst reading around yesterday I stumbled upon this little gem of knowledge:

Mongo’s ObjectIds contain a UTC timestamp with 1-second resolution.

This means that if we don’t need millisecond accuracy, we can drop all those “CreatedOn” fields from our schemas, and by doing so we win twice:

  • Storage – no need to store a separate time stamp field means less to store on disk and less for the server to shunt around.
  • Free Index – assuming your ObjectId is your primary key, it already has an index on it by default, so not only are you saving the space of having to store a separate time stamp, but it’s also indexed for free.
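
For the curious, here is a rough sketch of where that timestamp lives. The first four bytes of the 12-byte id hold the creation time as big-endian seconds since the Unix epoch, so it can be read straight out of the hex string without the driver. The class and method names are made up for illustration and are not part of any library:

```csharp
using System;
using System.Globalization;

public static class ObjectIdTimestampSketch
{
    static readonly DateTime UnixEpoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Reads the creation time out of a 24-character hex ObjectId string.
    // Only the first 8 hex digits (4 bytes, seconds since the epoch) are used.
    public static DateTime CreationTimeFromHex(string objectIdHex)
    {
        if (objectIdHex == null || objectIdHex.Length != 24)
            throw new ArgumentException("Expected a 24-character hex string.", "objectIdHex");

        uint seconds = uint.Parse(
            objectIdHex.Substring(0, 8), NumberStyles.HexNumber, CultureInfo.InvariantCulture);

        return UnixEpoch.AddSeconds(seconds);
    }
}
```

Calling ObjectIdTimestampSketch.CreationTimeFromHex("4e519fe15fc9d4099c01733a") gives back a UTC DateTime. The remaining 16 hex digits only exist to make the id unique, which is why they can be zeroed when all you want is a time comparison.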

To make it a little easier, I created an extension class to convert DateTimes to and from ObjectIds. This makes it insanely simple to query for objects created before/after a certain date. Here’s the code:

/// Author	: Daniel Harman (http://www.danharman.net)
/// Date	: 26.Oct.2011
/// License : Public Domain with no warranty given. Please maintain author credit.
using System;
using System.Collections.Generic;
using System.Linq;
using MongoDB.Bson;

namespace Mmphs.Utils.MongoDb
{
	public static class DateTimeExtensions
	{
		/// <summary>
		/// Converts a DateTime to an ObjectId.
		/// n.b. missing values that would ensure uniqueness. This is only intended for time comparisons.
		/// </summary>
		/// <param name="dateTime"></param>
		/// <returns></returns>
		public static ObjectId ToObjectId(this DateTime dateTime)
		{
			var timestamp = (int)(dateTime - BsonConstants.UnixEpoch).TotalSeconds;
			return new ObjectId(timestamp, 0, 0, 0);
		}

		/// <summary>
		/// Convert an ObjectId to a DateTime.
		/// </summary>
		/// <param name="id"></param>
		/// <returns></returns>
		public static DateTime ToDateTime(this ObjectId id)
		{
			return id.CreationTime;
		}

		static readonly int DATETIME_TRUNCATE_FACTOR = 10000;

		/// <summary>
		/// Truncate the accuracy of a datetime to the same resolution as a mongo db datetime.
		/// </summary>
		/// <param name="dateTime"></param>
		/// <returns></returns>
		public static DateTime MongoTruncate(this DateTime dateTime)
		{
			long ticks = dateTime.Ticks / DATETIME_TRUNCATE_FACTOR;
			return new DateTime(ticks * DATETIME_TRUNCATE_FACTOR, dateTime.Kind);
		}
	}
}

I’ve thrown in a little bonus there too – a method to truncate a DateTime in the same way MongoDb does when it persists one. This is handy for unit tests where you want to compare an object you’ve created and persisted and then loaded back up e.g. when testing a query brings back the right record.

Here is an example of using the ObjectId to get items after a certain date:

		public IEnumerable<Drop> GetStreamActivities(ObjectId streamId, DateTime utcSince)
		{
			return _drops
				.AsQueryable()
				.Where(a => a.StreamId == streamId && a.Id >= utcSince.ToObjectId())
				.OrderBy(a => a.Id);
		}

n.b. I’m using FluentMongo, but you certainly don’t need to.
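
If the query is being written somewhere the extension method isn’t available (the mongo shell, say), the same boundary id can be produced as a hex string by hand. This is only a sketch under that assumption; the names are hypothetical and the caller is assumed to pass a UTC DateTime:

```csharp
using System;

public static class ObjectIdBoundarySketch
{
    static readonly DateTime UnixEpoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Builds the smallest possible ObjectId (as hex) for a given UTC instant:
    // 4 timestamp bytes followed by 8 zero bytes. Not unique; only for comparisons.
    public static string LowerBoundHex(DateTime utc)
    {
        long seconds = (long)(utc - UnixEpoch).TotalSeconds;

        if (seconds < 0 || seconds > uint.MaxValue)
            throw new ArgumentOutOfRangeException("utc");

        // "x8" pads to 8 lowercase hex digits, i.e. the 4 timestamp bytes.
        return ((uint)seconds).ToString("x8") + new string('0', 16);
    }
}
```

Any real id created at or after that second compares greater than or equal to this value, which is what makes the >= filter above work.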

and here are the unit tests:

/// Author	: Daniel Harman (http://www.danharman.net)
/// Date	: 26.Oct.2011
/// License : Public Domain with no warranty given. Please maintain author credit.

using System;
using System.Collections.Generic;
using System.Linq;
using MbUnit.Framework;
using MongoDB.Bson;

namespace Mmphs.Utils.MongoDb.Test.DateTimeExtensions
{
	[TestFixture]
	public class DateTimeExtensionsTest
	{
		[Test]
		public void Can_Convert_DateTime_To_ObjectId()
		{
			// Arrange
			DateTime dateTime = DateTime.UtcNow;

			// Act
			ObjectId result = dateTime.ToObjectId();

			// Assert
			Assert.AreApproximatelyEqual(dateTime, result.CreationTime, TimeSpan.FromSeconds(1));
		}

		[Test]
		public void Can_Convert_ObjectId_To_DateTime()
		{
			// Arrange
			ObjectId objectId = ObjectId.GenerateNewId();
			DateTime dateTime = DateTime.UtcNow;

			// Act
			var result = objectId.ToDateTime();

			// Assert
			Assert.AreApproximatelyEqual(dateTime, result, TimeSpan.FromSeconds(1));
		}
	}
}
/// Author	: Daniel Harman (http://www.danharman.net)
/// Date	: 26.Oct.2011
/// License : Public Domain with no warranty given. Please maintain author credit.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using MbUnit.Framework;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;

namespace Mmphs.Utils.MongoDb.Test.DateTimeExtensions
{
	[TestFixture]
	public class When_Truncating_DateTime
	{
		[Test]
		public void Should_Match_Mongo()
		{
			// Arrange
			DateTime dt = DateTime.UtcNow;
			var asJson = dt.ToJson();

			// Act
			var asTrunc = dt.MongoTruncate();
			
			// Assert
			var fromJson = BsonSerializer.Deserialize<DateTime>(asJson);
			Assert.AreEqual(fromJson, asTrunc);
		}
	}
}

MongoDB and C# Dictionary Serialisation, Part 1 – The Problem

I’m building a web site at the moment and, as a consequence, have been doing a lot of work with MongoDB. One of the interesting things I’ve come across is the C# driver’s approach to serialising generic dictionaries, i.e. Dictionary<TKey,TValue>. I don’t know whether this applies to the drivers for other languages.

The driver has two methods depending on the type of the key. If TKey is an object then the serialiser will create a nested array e.g.:

class Thread
{
 // Id of the Thread.
 ObjectId Id;

 // Dictionary<MemberId, Message>
 Dictionary<ObjectId, string> Posts;
}

serialises to:

{ "_id" : ObjectId("4e519fe15fc9d4099c01733a"),
	"Posts" : [
		[ ObjectId("4e3de6255fc9d40fd437a5ac"), "Is anyone there?" ],
		[ ObjectId("4e3de6305fc9d40fd437a5ae"), "Nobody but us chickens."]
	]
}

Alternatively, if we store the member’s id as a string:

class Thread
{
 // Id of the Thread.
 ObjectId Id;

 // Dictionary<MemberName, Message>
 Dictionary<string, string> Posts;
}

we get:

{ "_id" : ObjectId("4e519fe15fc9d4099c01733a"),
	"Posts" : {
		"4e3de6255fc9d40fd437a5ac" : "Is anyone there?",
		"4e3de6305fc9d40fd437a5ae" : "Nobody but us chickens."
	}
}

This latter format is wonderfully compact and seems like a smart way of leveraging the map like properties of the json/bson collection type.

Unfortunately, I think there are some significant problems with both approaches, which I’ll outline below. If I’m wrong about these let me know as I’d gladly be corrected!

Ability to search by the Dictionary’s Key

With the nested array, we can’t search for documents containing posts by a specific MemberId. This syntax, which one might think could do this:

var q = Query.EQ("Posts.0", "4e3de6255fc9d40fd437a5ac")

Will actually search for records where the first element in Posts equals "4e3de6255fc9d40fd437a5ac" – it’s not going to find any matches even if the TKey of the first element matches. This is because the first element is actually a nested array of two elements i.e.

[ ObjectId("4e3de6255fc9d40fd437a5ac"), "Is anyone there?" ]

So you can’t search by MemberId. You can only search if you know the complete contents of the dictionary entry, and only if you know its specific position in the dictionary. That’s not too useful when dealing with serialised dictionaries!

Info

As a brief aside, there is an open ticket on Mongo’s Jira to add support for a syntax like this:

var q = Query.EQ("Posts.$.0", "4e3de6255fc9d40fd437a5ac")

This would be great as it would solve our problem. I’m not sure it’s on the cards to be implemented anytime soon though.

If instead, we now look at our example with the string key, we can search with the following:

var q = Query.Exists("Posts.4e3de6255fc9d40fd437a5ac", true)

I don’t know about you, but I find it a little weird that the search value has become part of the key and we are having to use ‘Exists’ rather than ‘EQ’. Still… at least we can search by our dictionary’s key with this schema.

Ability to Index by the Dictionary’s Key

We are stuffed here. We can’t create an index with the nested array because we can’t even address the fields with current mongo syntax. Nor, for the nested document, can we create an index over all the field names in a document. This means our searches for dictionary keys or elements are going to be expensive…

Ability to Atomically Update a Value in the Dictionary

Working with a database like MongoDb, one of the most important tools in our arsenal is the ability to make atomic updates to documents without having to pull them back into memory. This side steps a lot of concurrency issues that are otherwise difficult to manage without native transactions and locking. With a simple document this is pretty easy. What about modifying values in our dictionaries?

Well, with the nested array, it is again a bit of a disaster. The only option is to load the whole document, edit it in memory and call Update. So long atomicity!

With the nested document, we are in better shape and can use the following:

threadsCollection
        .FindAndModify(
		Query.Exists("Posts.4e3de6255fc9d40fd437a5ac", true),
                null,
                Update.Set("Posts.4e3de6255fc9d40fd437a5ac",
                       "A new message replacing the old"));

Summary of Issues

The nested array is not a well supported structure in mongo, and using it for dictionary persistence is very limiting. We lose the ability to search, index and update atomically.

The nested document is more successful in that we can search and update atomically. We do however lose indexing, which is fairly catastrophic if we actually want to search and update across any reasonably sized collections.

An Alternative

Fortunately, there is an alternate approach which resolves all these problems. All we need to do is create a hybrid of the two approaches – an array, but this time, of specifically formatted documents with known fields for the key and value:

{ "_id" : ObjectId("4e519fe15fc9d4099c01733a"),
	"Posts" : [
		{ "k" : ObjectId("4e3de6255fc9d40fd437a5ac"), "v" : "Is anyone there?" },
		{ "k" : ObjectId("4e3de6305fc9d40fd437a5ae"), "v" : "Nobody but us chickens." }
	]
}

By having well known fields we can now search by the key, the value, or if we want to store documents as the value, the fields in that document.
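
To make the shape concrete, here is a small sketch of flattening a dictionary into that array-of-entries form and back again, using nothing but plain C# objects. It is not the custom serialiser itself, just an illustration of the mapping; Entry, ToEntries and FromEntries are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entry type mirroring the { "k": ..., "v": ... } documents above.
public class Entry<TKey, TValue>
{
    public TKey k { get; set; }
    public TValue v { get; set; }
}

public static class DictionaryShapeSketch
{
    // Flattens a dictionary into the array-of-entries shape.
    public static List<Entry<TKey, TValue>> ToEntries<TKey, TValue>(
        IDictionary<TKey, TValue> source)
    {
        return source
            .Select(p => new Entry<TKey, TValue> { k = p.Key, v = p.Value })
            .ToList();
    }

    // Rebuilds the dictionary; ToDictionary throws if the array holds a duplicate key.
    public static Dictionary<TKey, TValue> FromEntries<TKey, TValue>(
        IEnumerable<Entry<TKey, TValue>> entries)
    {
        return entries.ToDictionary(e => e.k, e => e.v);
    }
}
```

One thing the round trip makes obvious: an array doesn’t enforce key uniqueness the way a dictionary does, so that has to be guarded at write time.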

This search will find all the documents in the collection with a given key:

var q = Query.EQ("Posts.k", new ObjectId("4e3de6255fc9d40fd437a5ac"))

To update the value:

threadsCollection
	.Update(
		Query.EQ("Posts.k", new ObjectId("4e3de6255fc9d40fd437a5ac")),
		Update.Set("Posts.$.v", "A new message replacing the old"));

It’s worth noting that if our TValue is a class containing further properties rather than a literal type, then the above syntax can be extended to index into it using the standard dot notation, e.g.

threadsCollection
	.Update(
		Query.EQ("Posts.k", new ObjectId("4e3de6255fc9d40fd437a5ac")),
		Update.Set("Posts.$.v.Status", "New Status"));

And because we can directly address the key, or indeed any value, with dot notation, we can index them!

We can also do funky things like replicating the addToSet functionality with our dictionary. Imagine we have a second dictionary on our thread, this one indexed by member id, but containing the member’s name so that we can cache the user names of everyone who has posted in the thread:

class Thread
{
 // Id of the Thread.
 ObjectId Id;

 // Dictionary<MemberId, Message>
 Dictionary<ObjectId, string> Posts;

 // Dictionary<MemberId, Name>
 Dictionary<ObjectId, string> NameLookup;

}

We don’t however want to have someone recorded in this lookup table style dictionary more than once, so addToSet is exactly the approach we need. To do this, we first have to create the element we would insert as a BsonDocument:

class NameLookupEntry
{
	public ObjectId k { get; set; }
	public string v { get; set; }
}

var entry = new NameLookupEntry() { k = new ObjectId("4e3de6255fc9d40fd437a5ac"), v = "Joe Blow" };

In this instance we are searching for a specific thread by its Id, as we want to add the guy’s name to that one thread, and only if it isn’t already there:

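threadsCollection
	.Update(
		Query.And(
			Query.EQ("_id", new ObjectId("4e519fe15fc9d4099c01733a")),
			Query.NE("NameLookup.k", new ObjectId("4e3de6255fc9d40fd437a5ac"))),
		// Push rather than a positional Set: the NE guard means no array element is matched.
		Update.Push("NameLookup", entry.ToBsonDocument()));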

In part 2, I’ll show you the code to build a custom serialiser to support this alternate scheme.

Thanks to Robert Stam at 10gen for assistance and feedback on this topic.

DXGrid, Immediate Updates & Nested Attached Behaviours

With the Devexpress grid control, when you edit a row, it doesn’t update the view model until the row loses focus. I needed immediate update so that my view model would get notified when a checkbox in the grid was toggled as soon as it happened. So I wrote an attached behaviour to facilitate this without having to resort to code behind.

This example is also interesting as it shows how to maintain extra state on a control, using a second attached behaviour managed internally by the main one. In this case, this is necessary to manage the lifetime of the subscription to the control’s events. I couldn’t find any info about this technique on the net, so I’m not sure how well known it is.

The code is fairly well commented so should be self explanatory. You’ll also notice that the events are being converted to IObservable with Reactive extensions. I love the functional approach of reactive and the way that reactive returns an IDisposable makes the lifetime management really easy.

n.b. This class depends on Reactive Extensions, so your project will need references to the Rx assemblies that provide the System.Reactive and System.Reactive.Linq namespaces used below.

You can grab Reactive (Rx) from nuget.

using System;
using System.Windows;
using DevExpress.Xpf.Grid;
using System.Reactive.Linq;
using System.Reactive;

namespace Utils.DevExpress
{
	/// <summary>
	/// Attached behaviour for Dev Express GridView's that forces the grid to immediately update the bound
	/// datasource row when values are changed in it. By default it will only update when the row loses
	/// focus.
	/// 
	/// Usage in xaml:
	/// 1. Add Reference
	///    xmlns:dxu="clr-namespace:Utils.DevExpress;assembly=Utils.DevExpress"
	/// 
	/// 2. Attach to GridView
	///    {dxg:GridControl.View}
	///    {dxg:TableView ShowGroupPanel="False"  MultiSelectMode="Row"
	///                   dxu:DXGridViewUpdateBehaviour.ImmediateUpdate="True"       
	///                   AllowBestFit="True" AutoWidth="True"/}
	///    {/dxg:GridControl.View}
	/// 
	/// Derived from DevExpress recommended work around which was based on code behind:
	///    http://www.devexpress.com/Support/Center/p/E2832.aspx
	/// 
	/// Author : Daniel Harman
	/// Date   : 07.09.2011
	/// </summary>

	public class DXGridViewUpdateBehaviour : DependencyObject
	{

		#region Immediate Update Attached Property

		public static bool GetImmediateUpdate(DependencyObject obj)
		{
			return (bool)obj.GetValue(ImmediateUpdateProperty);
		}

		public static void SetImmediateUpdate(DependencyObject obj, bool value)
		{
			obj.SetValue(ImmediateUpdateProperty, value);
		}

		/// Immediate update is a boolean attached DP that when set to true, will ensure that grid row updates
		/// are immediately propagated to the view model, rather than only when the row loses focus.
		public static readonly DependencyProperty ImmediateUpdateProperty = DependencyProperty.RegisterAttached(
				"ImmediateUpdate", typeof(bool), typeof(DXGridViewUpdateBehaviour),
				new UIPropertyMetadata(false, OnImmediateUpdatePropertyChanged));

		#endregion

		#region Grid View Update Behaviour Subscription Attached Property

		public static DXGridViewUpdateBehaviourSubscription GetGridViewUpdateBehaviourSubscription(DependencyObject obj)
		{
			return (DXGridViewUpdateBehaviourSubscription)obj.GetValue(GridViewUpdateBehaviourProperty);
		}

		public static void SetGridViewUpdateBehaviourSubscription(
			DependencyObject obj, DXGridViewUpdateBehaviourSubscription value)
		{
			obj.SetValue(GridViewUpdateBehaviourProperty, value);
		}

		/// This property is used to store an instance of the subscription on the control. This instance
		/// contains the rxEventHandler IDisposable so that we can clean up when we want to change the binding
		/// etc.
		public static readonly DependencyProperty GridViewUpdateBehaviourProperty = DependencyProperty.RegisterAttached(
				 "GridViewUpdateBehaviour", typeof(DXGridViewUpdateBehaviourSubscription), typeof(DXGridViewUpdateBehaviour),
				 new UIPropertyMetadata(null));

		#endregion

		/// <summary>
		/// Handle changes to ImmediateUpdate.
		/// </summary>
		/// <param name="d">A child of GridViewBase</param>
		/// <param name="e">true/false for value of attached DP.</param>
		private static void OnImmediateUpdatePropertyChanged(DependencyObject d, DependencyPropertyChangedEventArgs e)
		{
			var gridViewBase = (GridViewBase)d;
			var oldValue = (bool)e.OldValue;
			var newValue = (bool)e.NewValue;

			if (oldValue == newValue)
				return;

			// Remove old sub if it exists.
			var oldSub = GetGridViewUpdateBehaviourSubscription(d);

			if (oldSub != null)
				oldSub.Dispose();

			// If ImmediateUpdate==true then create new sub.
			if (newValue)
				SetGridViewUpdateBehaviourSubscription(d, new DXGridViewUpdateBehaviourSubscription(gridViewBase));
		}

		/// <summary>
		/// Nested class to manage the lifetime of the reactive event handler. This is attached to the GridView
		/// as an attached dependency property.
		/// </summary>
		public class DXGridViewUpdateBehaviourSubscription : IDisposable
		{
			IDisposable rxEventHandler;

			public DXGridViewUpdateBehaviourSubscription(GridViewBase gridViewBase)
			{
				// Create an rx observable of the events and subscribe handler to it.
				rxEventHandler = Observable
						 .FromEventPattern<CellValueChangedEventHandler, CellValueChangedEventArgs>(
								 h => gridViewBase.CellValueChanging += h,
								 h => gridViewBase.CellValueChanging -= h)
						 .Subscribe(OnCellValueChanging);
			}

			/// <summary>
			/// On notification that a cell value is changing, force the TableView to update the view model immediately.
			/// </summary>
			/// <param name="ep"></param>
			void OnCellValueChanging(EventPattern<CellValueChangedEventArgs> ep)
			{
				(ep.Sender as GridViewBase).PostEditor();
			}

			#region IDisposable Members

			public void Dispose()
			{
				rxEventHandler.Dispose();
			}

			#endregion
		}
	}
}

Binding a DevExpress Grid Context Menu to a MVVM ViewModel Command

One of the things I sometimes wrestle with is keeping to MVVM when working with DevExpress controls.

Today I was trying to bind the right click context menu on their grid control to a command on my view model. I started off working from some of their example code, which was event and code behind based. This led me to the EventToCommand pattern, as I didn’t want any code behind.

This is where the pain began! Firstly, their BarButtonItem is derived from FrameworkContentElement, not FrameworkElement, which makes it incompatible with MVVM Light. I posted a question about this on Stack Overflow, which was answered with a great workaround.

However before I received that answer, I had already decided to just use the more vanilla System.Windows.Interactivity InvokeCommandAction which doesn’t have the limitation of only binding to FrameworkElements.

Again I ran into problems. It seems that the context menus from DevExpress are in their own visual tree, so I tried all sorts of bindings to try and get back to my ViewModel. None of them worked!

It was at this point that my colleague pointed out that the menu items are actually buttons and have a command property. So all this wrestling with triggers was a complete waste of time – curses!

However, the binding to get back to the ViewModel is not at all obvious, as the DataContext on these popup menus is not what you might expect. Anyway it is possible, and the way to do it is as follows:

<dxg:TableView.RowCellMenuCustomizations>
    <dxb:BarButtonItem Name="deleteRowItem" Content="Delete"
        Command="{Binding View.DataContext.DeleteCommand}" />
</dxg:TableView.RowCellMenuCustomizations>

Binding WPF Events to MVVM ViewModel Commands

This article looks at binding events on WPF controls to commands in your MVVM view model.

A lot of MVVM examples show you how to bind a command in a view to an ICommand in your view model. What they sometimes skate over is how you get events into the view model. Prism, for example, recommends the use of an Event Aggregator.

However, there is an easier way! All we need is a class library that is part of Expression Blend. It’s called System.Windows.Interactivity and you can get hold of it from a variety of sources, but the one I recommend is the MVVM Light library.

This gives you the ability to create a trigger on an event and bind it to an ICommand on the view model.

xmlns:i="clr-namespace:System.Windows.Interactivity;assembly=System.Windows.Interactivity" 
<Button>
    <i:Interaction.Triggers>
        <i:EventTrigger EventName="MouseEnter" >
            <i:InvokeCommandAction Command="{Binding FooCommand}" />
        </i:EventTrigger>
    </i:Interaction.Triggers>
</Button>

Assuming the DataContext is your view model, then the above will map the ‘MouseEnter’ event to ‘FooCommand’ on the view model.

The only issue here, is that InvokeCommandAction doesn’t give you the event parameters.

I’ve already mentioned MVVM Light, and this library provides a solution by offering a different event to command behaviour that can optionally pass the parameters. Like this:

xmlns:i="clr-namespace:System.Windows.Interactivity;assembly=System.Windows.Interactivity" 
xmlns:cmd="clr-namespace:GalaSoft.MvvmLight.Command;assembly=GalaSoft.MvvmLight.Extras.WPF4"
<Button>
    <i:Interaction.Triggers>
        <i:EventTrigger EventName="MouseEnter" >
             <cmd:EventToCommand Command="{Binding FooCommand}"
                 PassEventArgsToCommand="True" />
        </i:EventTrigger>
    </i:Interaction.Triggers>
</Button>

With this, you’ll find the MouseEventArgs in the Object passed into your command.
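
On the view model side the parameter arrives as a plain object, so something has to cast it back. MVVM Light’s RelayCommand<T> does this for you; the sketch below is a hypothetical stand-in (ParameterCommand<T> is not a real library type) just to show what is going on under the covers:

```csharp
using System;
using System.Windows.Input;

// Hypothetical minimal command; MVVM Light's RelayCommand<T> plays this role in practice.
public class ParameterCommand<T> : ICommand where T : class
{
    readonly Action<T> _execute;

    public ParameterCommand(Action<T> execute)
    {
        if (execute == null) throw new ArgumentNullException("execute");
        _execute = execute;
    }

    // Never raised in this sketch, because CanExecute always returns true.
    public event EventHandler CanExecuteChanged { add { } remove { } }

    public bool CanExecute(object parameter) { return true; }

    // The event args arrive as object; cast them back and ignore anything unexpected.
    public void Execute(object parameter)
    {
        var args = parameter as T;
        if (args != null) _execute(args);
    }
}
```

In the view model, FooCommand could then be created along the lines of new ParameterCommand<MouseEventArgs>(e => { /* use e */ }), assuming the event being mapped really does carry MouseEventArgs.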

SEO Slugification in Dotnet aka Unicode to Ascii aka Diacritic Stripping

This article looks at SEO using url slugs, and how we can generate them in dotnet/asp.

I’ve recently conducted quite a lot of research, looking for a class to create url slugs in C#. If you are not familiar with these, they are when sentences are converted to a url friendly format. Normally this involves converting spaces and punctuation to hyphens and removing accents from characters. e.g. “This is my resumé” becomes “this-is-my-resume”.

Slugification of urls to make them human readable is considered good SEO strategy as it gets keywords into urls, although how much weight engines put into it these days is a matter of debate as it has been abused by spammers. Nonetheless, I think humans are more likely to click on a human readable link if a search engine throws it up, so there is no downside.

Before digging into the implementation aspects, one thing worth mentioning, as it took me a while to realise, is that it’s not really worth trying to use the slug as the unique key to whatever resource you want to offer. Dealing with the problem of collisions where you have matching slugs is just not worth the pain. Especially if you want your slugs to contain something highly repeatable like names.

What you will see the experts do is include the id of the resource (e.g. a GUID) and the slug. You can see it on both Stack Overflow and Facebook:

http://stackoverflow.com/questions/3769457/how-can-i-remove-accents-on-a-string

http://ms-my.facebook.com/people/Joe-Bloggs/12343243267683877

For StackOverflow, the id is actually 3769457, and if you change the slug string it makes no odds to what you get back.

Facebook is exactly the same except they place the real id at the end instead of the middle. This seems a little smarter to me, as Google etc. truncate very long URLs when they display them in search results. By making sure the human readable part is at the front, you keep that visible, and what gets cut off is only an id that means nothing to a human anyway.

They also both do permanent redirects if you type in a random slug. This tackles the canonical problem of having the same content on multiple links if people mess up the slug, which is pure SEO poison if not tackled.
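
As a sketch of that id-plus-slug idea, here is roughly what the lookup and redirect decision could look like for a Stack Overflow style path. The route shape, class name and the canonicalSlugForId delegate are all assumptions for illustration; a real site would hook this into its routing and issue the 301 itself:

```csharp
using System;

public static class SlugRouteSketch
{
    // Splits "/questions/{id}/{slug}" into its id and slug parts.
    // Returns false if the path does not have that shape.
    public static bool TryParse(string path, out int id, out string slug)
    {
        id = 0;
        slug = "";

        var parts = (path ?? "").Trim('/').Split('/');
        if (parts.Length < 2 || parts[0] != "questions") return false;
        if (!int.TryParse(parts[1], out id)) return false;

        slug = parts.Length > 2 ? parts[2] : "";
        return true;
    }

    // Returns the path to permanently redirect to, or null if the request
    // is already canonical (or isn't a question path at all).
    public static string RedirectTarget(string path, Func<int, string> canonicalSlugForId)
    {
        int id;
        string slug;
        if (!TryParse(path, out id, out slug)) return null;

        string canonical = canonicalSlugForId(id);
        return slug == canonical ? null : "/questions/" + id + "/" + canonical;
    }
}
```

The point is that only the id is ever used to find the resource; the slug is compared afterwards purely to decide whether to redirect.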

Right, to business – how do we do it? Well, the first part, punctuation removal and hyphenation, is trivial.

Removing accents however is rather more complex, and goes by many names including diacritic (accent) stripping and Unicode to ascii. My research took me to Stack Overflow, where it became apparent that whilst there are libraries to help, C# coverage is very weak.

Looking at other languages’ libraries, they tend to rely on lookup tables and regex. The tables are very difficult to make complete, and regex is quite an expensive operation.

There is also a more standards based way of doing this, which is known as Normalisation and is described in more detail on the Unicode website here. This describes the different forms of normalisation which I’m not going to go into here. What normalisation does, is split Unicode characters into the represented letter and a series of accents etc that follow (see the unicode link for more details).

Having got this string one then needs to remove the accent characters which should leave us with just the character and no accent. This is great in theory, unfortunately certain characters don’t map to a low Ascii character with normalise, so even with this approach one needs a lookup table for exceptions.
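
In isolation, the normalise-then-strip idea looks something like the sketch below. It uses canonical decomposition (FormD) and drops anything in the NonSpacingMark category; the class and method names are made up for illustration:

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticSketch
{
    // Decomposes each character, then drops the combining marks that carried the accents.
    public static string StripMarks(string value)
    {
        var sb = new StringBuilder(value.Length);

        foreach (char c in value.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        return sb.ToString();
    }
}
```

“resumé” comes out as “resume”, but a character such as ł has no decomposition, so it passes through untouched. That is exactly the gap the exceptions table has to fill.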

So the answer I’ve come to is based on two snippets which Jeff Atwood kindly shared on Stack Overflow. These are apparently the functions Stack Overflow uses for this very operation. You can find these here and here. Using these you have the warm glow of knowing they are production tested on a high volume site and performance is solid – note the lack of regex!

I have however, made a few changes:

  • The hyphens were appended in such a way that one could be added, and then need removing as it was the last character in the string, i.e. we never want “my-slug-”. This meant an extra string allocation. I’ve worked around this by delay-hyphening. If you compare my code to Jeff’s, the logic for this is easy to follow.
  • His approach is purely lookup based and missed a lot of characters I found in examples whilst researching on stack overflow. To counter this, I first perform a normalisation pass, and then ignore any characters outside the acceptable ranges. This works most of the time…
  • …For when it doesn’t I’ve also had to add a lookup table. As mentioned above, some characters don’t map to a low ascii value when normalised. Rather than drop these I’ve got a manual list of exceptions that is doubtless full of holes, but better than nothing. The normalisation code was inspired by Jon Hanna’s great post here.
  • The case conversion is now also optional.

The upshot of all this, is that my version has better coverage than Jeff’s original and is a bit smarter with the hyphenation. I have a suspicion that the Microsoft implemented normalisation is likely to be slower than Jeff’s lookup table, so we are trading completeness for performance on that aspect. The hyphenation I would expect to be a bit faster, but not much as his extra string copy was only an edge case.

Anyway here’s the code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

	public static class Slug
	{
		public static string Create(bool toLower, params string[] values)
		{
			return Create(toLower, String.Join("-", values));
		}

		/// <summary>
		/// Creates a slug.
		/// Author: Daniel Harman, based on original code by Jeff Atwood
		/// References:
		/// http://www.unicode.org/reports/tr15/tr15-34.html
		/// http://meta.stackoverflow.com/questions/7435/non-us-ascii-characters-dropped-from-full-profile-url/7696#7696
		/// http://stackoverflow.com/questions/25259/how-do-you-include-a-webpage-title-as-part-of-a-webpage-url/25486#25486
		/// http://stackoverflow.com/questions/3769457/how-can-i-remove-accents-on-a-string
		/// </summary>
		/// <param name="toLower"></param>
		/// <param name="value"></param>
		/// <returns></returns>
		public static string Create(bool toLower, string value)
		{
			if (value == null) return "";

			var normalised = value.Normalize(NormalizationForm.FormKD);

			const int maxlen = 80;
			int len = normalised.Length;
			bool prevDash = false;
			var sb = new StringBuilder(len);
			char c;

			for (int i = 0; i < len; i++)
			{
				c = normalised[i];
				if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
				{
					if (prevDash)
					{
						sb.Append('-');
						prevDash = false;
					}
					sb.Append(c);
				}
				else if (c >= 'A' && c <= 'Z')
				{
					if (prevDash)
					{
						sb.Append('-');
						prevDash = false;
					}
					// tricky way to convert to lowercase
					if (toLower)
						sb.Append((char)(c | 32));
					else
						sb.Append(c);
				}
				else if (c == ' ' || c == ',' || c == '.' || c == '/' || c == '\\' || c == '-' || c == '_' || c == '=')
				{
					if (!prevDash && sb.Length > 0)
					{
						prevDash = true;
					}
				}
				else
				{
					string swap = ConvertEdgeCases(c, toLower);

					if (swap != null)
					{
						if (prevDash)
						{
							sb.Append('-');
							prevDash = false;
						}
						sb.Append(swap);
					}
				}

				if (sb.Length >= maxlen) break;
			}

			return sb.ToString();
		}

		static string ConvertEdgeCases(char c, bool toLower)
		{
			string swap = null;
			switch (c)
			{
				case 'ı':
					swap = "i";
					break;
				case 'ł':
					swap = "l";
					break;
				case 'Ł':
					swap = toLower ? "l" : "L";
					break;
				case 'đ':
					swap = "d";
					break;
				case 'ß':
					swap = "ss";
					break;
				case 'ø':
					swap = "o";
					break;
				case 'Þ':
					swap = "th";
					break;
			}
			return swap;
		}
	}

and here are my mbunit tests:

	[TestFixture]
	public class When_Creating_Slug
	{
		[Test]
		[Row("ṃ,ỹ,ṛ,è,ş,ư,ḿ,ĕ", "m-y-r-e-s-u-m-e")]
		[Row("á-é-í-ó-ú", "a-e-i-o-u")]
		[Row("à,å,á,â,ä,ã,å,ą", "a-a-a-a-a-a-a-a")]
		[Row("è,é,ê,ë,ę", "e-e-e-e-e")]
		[Row("ì,í,î,ï,ı", "i-i-i-i-i")]
		[Row("ò,ó,ô,õ,ö,ø", "o-o-o-o-o-o")]
		[Row("ù,ú,û,ü", "u-u-u-u")]
		[Row("ç,ć,č", "c-c-c")]
		[Row("ż,ź,ž", "z-z-z")]
		[Row("ś,ş,š", "s-s-s")]
		[Row("ñ,ń", "n-n")]
		[Row("ý,Ÿ", "y-Y")]
		[Row("ł,Ł", "l-L")]
		[Row("đ", "d")]
		[Row("ß", "ss")]
		[Row("ğ", "g")]
		[Row("Þ", "th")]
		public void Should_Remove_Accents_Case_Invariant(string value, string expected)
		{
			var result = Slug.Create(false, value);
			
			Assert.AreEqual(expected, result);
		}

		[Test]
		[Row("ý,Ÿ", "y-y")]
		[Row("ł,Ł", "l-l")]
		public void Should_Remove_Accents_To_Lower(string value, string expected)
		{
			var result = Slug.Create(true, value);
			
			Assert.AreEqual(expected, result);
		}

		[Test]
		[Row("Slug Me ", "Slug-Me")]
		[Row("Slug Me,", "Slug-Me")]
		[Row("Slug Me.", "Slug-Me")]
		[Row("Slug Me/", "Slug-Me")]
		[Row("Slug Me\\", "Slug-Me")]
		[Row("Slug Me-", "Slug-Me")]
		[Row("Slug Me_", "Slug-Me")]
		[Row("Slug Me=", "Slug-Me")]
		[Row("Slug Me--", "Slug-Me")]
		[Row("Slug Me---,", "Slug-Me")]
		public void Should_Remove_Trailing_Punctuation(string value, string expected)
		{
			var result = Slug.Create(false, value);

			Assert.AreEqual(expected, result);
		}
	}

After all this, I’ve now realised I don’t really need this code where I thought I did. My use case was to convert people’s names into slugs for a name directory, but having done all this work, I had a look at how Facebook does it (after all, no harm copying the industry leaders). Well, I wish I’d done this first, as it turns out they just leave the accents in the names when they display them in a directory! Oh well, I’m pretty sure they will normalise for searching to maximise matches, and this code is sound for article type slugification rather than names.

With thanks to Tom Chantler for spotting the bug handling large whitespace strings.

Storing Custom Data in Forms Authentication Tickets

This article looks at storing custom data in asp.net forms authentication tickets. I recently updated the article to make the custom model binder generic, and add the necessary registration code which was missing from the first draft.

So you’ve decided to use FormsAuthentication, and perhaps enhanced it with your own custom providers. In your AccountController Login method you probably have a call along these lines:

FormsAuthentication.SetAuthCookie(account.Id.ToString(), model.RememberMe);

That all works great, but what if you need to store some extra data in the cookie? Perhaps the name you are passing into the AuthTicket isn’t actually the user’s name, but a GUID. Suddenly that built-in ASP.NET login widget in the top right of the page doesn’t seem so great when it looks like this:

Hello 5D1D4743-9941-40B5-8931-6BC12617946C

What we need to do is store some extra data in that AuthTicket cookie, right? That way we can keep the GUID as the authentication id, but still store things like the user’s first name in the cookie, saving an expensive round trip to the db each time we render the widget.

Hmmm… what’s this ‘UserData’ property we see on the AuthTicket? Perfect!

Erk… It’s read only?!?!?!

At least that’s how my thought process went.

So we need to make an authentication ticket ourselves:

public FormsAuthenticationTicket(int version, string name, DateTime issueDate,
	DateTime expiration, bool isPersistent, string userData, string cookiePath);

Unfortunately that’s quite a few more parameters than SetAuthCookie(…) required, and their values should be coming from the web.config rather than being hard-coded.

On the plus side, there is access to the UserData!

To avoid losing the web.config driven settings, we can do a little trick and get FormsAuthentication to do the parsing for us. All we need to do is ask it for an AuthTicket and copy the settings from that into a new one we create.

To do this, a few steps are required. First we ask FormsAuthentication for its auth cookie and decrypt the cookie’s value to get the ticket. Then we copy the ticket’s data into a new ticket along with our user data, and encrypt that. Finally we add it to the response.

Now before getting to the code, we should think about where it should live. It would seem logical to encapsulate this as an extension method on FormsAuthentication, but as that is a static class we can’t. Instead we can attach it to HttpResponseBase, which is not a bad home, especially as we have to add the cookie onto a response anyway. I’d recommend creating the following class in an ‘Infrastructure’ folder in your project:

	public static class HttpResponseBaseExtensions
	{
		public static int SetAuthCookie<T>(this HttpResponseBase responseBase, string name, bool rememberMe, T userData)
		{
			// In order to pick up the settings from config, we create a default cookie and use its values to create a
			// new one.
			var cookie = FormsAuthentication.GetAuthCookie(name, rememberMe);
			var ticket = FormsAuthentication.Decrypt(cookie.Value);
			
			var newTicket = new FormsAuthenticationTicket(ticket.Version, ticket.Name, ticket.IssueDate, ticket.Expiration,
				ticket.IsPersistent, userData.ToJson(), ticket.CookiePath);
			var encTicket = FormsAuthentication.Encrypt(newTicket);

			// Use the existing cookie. We could create a new one, but we would have to copy the settings over...
			cookie.Value = encTicket;

			responseBase.Cookies.Add(cookie);

			return encTicket.Length;
		}
	}

There are a couple of things of note here. Firstly, we are accepting a generic type for the UserData, and secondly, we are encoding it to JSON!

Why? Well, let’s think about the UserData field. Being on a cookie, it can only contain string data. Now we could do our own custom serialisation into this string, but my preference is to use JSON as it’s designed for the task. In this instance I’m using the serialiser from MongoDb as I happen to be using that in my project, but any JSON serialiser will do. You might like to try the ServiceStack implementation, for example.
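To illustrate the "any JSON serialiser will do" point, here is a minimal sketch of the round trip using System.Text.Json instead of the MongoDb serialiser. That swap is my assumption, not what the project uses (it is built into newer .NET versions and available as a NuGet package for the older framework). The UserDataJson helper is a hypothetical name, and UserData is a copy of the class defined further down so the snippet compiles on its own.

```csharp
using System;
using System.Text.Json;

// Copy of the article's UserData class, repeated so this compiles standalone.
public class UserData
{
	public string FirstName { get; set; }

	public UserData()
	{
		FirstName = "Unknown";
	}
}

// Hypothetical helper: stands in for the MongoDb ToJson()/BsonSerializer calls.
public static class UserDataJson
{
	// Serialise the user data to a compact JSON string for the ticket's UserData field.
	public static string Serialize<T>(T userData)
	{
		return JsonSerializer.Serialize(userData);
	}

	// Turn the ticket's UserData string back into an object.
	public static T Deserialize<T>(string json)
	{
		return JsonSerializer.Deserialize<T>(json);
	}
}
```

With this in place, UserDataJson.Serialize(new UserData { FirstName = "Dan" }) yields the string {"FirstName":"Dan"}, which is what would end up encrypted inside the ticket.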

I’m also returning the size of the cookie value. Cookies should stay under about 4000 bytes, as some browsers will simply discard anything larger. It’s worth keeping an eye on this, as it’s not just the size of your UserData that counts but the other mandatory parts of the cookie too.
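One way to keep an eye on the size is a small guard like the sketch below. CookieSizeGuard and its members are hypothetical names, the 4000 byte budget is simply the figure quoted above, and the estimate only counts the name and value, so treat it as a rough check rather than an exact measure of what the browser stores.

```csharp
using System;
using System.Text;

// Hypothetical helper for checking a cookie against a size budget.
public static class CookieSizeGuard
{
	// Conservative budget, taken from the 4000 byte figure in the text.
	public const int MaxCookieBytes = 4000;

	// Rough size of "name=value". Attributes such as path or expiry are not counted.
	public static int EstimateBytes(string cookieName, string cookieValue)
	{
		return Encoding.UTF8.GetByteCount(cookieName)
			+ 1 // the '=' separator
			+ Encoding.UTF8.GetByteCount(cookieValue);
	}

	// True when the estimated size is within the budget.
	public static bool Fits(string cookieName, string cookieValue)
	{
		return EstimateBytes(cookieName, cookieValue) <= MaxCookieBytes;
	}
}
```

In the login action you could pass FormsAuthentication.FormsCookieName and the encrypted ticket to Fits(…) and log a warning when it returns false, so oversized UserData shows up in development rather than as users who mysteriously can’t stay logged in.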

So let’s get this wired into our AccountController.

First we define a UserData class with a FirstName in it:

	public class UserData
	{
		public string FirstName { get; set; }

		public UserData()
		{
			FirstName = "Unknown";
		}
	}

Now here’s an example Login action. There are some extras in here around validation, but you can use whatever approach fits your project.

		[HttpPost]
		public ActionResult LogIn(AccountLoginVM model, string returnUrl)
		{
			try
			{
				if (ModelState.IsValid)
				{
					// Some code to validate and check authentication
					if (!Membership.ValidateUser(model.Email, model.Password))
						throw new RulesException("Incorrect username or password");

					Account account = _accounts.GetByEmail(model.Email);

					UserData userData = new UserData
					{
						FirstName = account.FirstName
					};

					Response.SetAuthCookie(account.Id.ToString(),
						model.RememberMe, userData);
				
					if (Url.IsLocalUrl(returnUrl))
					{
						return Redirect(returnUrl);
					}
					else
					{
						return RedirectToAction("Index", "Home");
					}
				}
			}
			catch (RulesException ex)
			{
				ex.CopyTo(ModelState);
			}

			model.Password = "";
			return View(model);
		}

That’s it. We’ve now got a cookie with our extra UserData in it.

Hang on… what about fixing that login widget in the top right?

One elegant way to crack this is to create a custom model binder. Then, if we swap the example widget from being a partial view to a partial action, all we need to do is demand a UserData object as an input param and the magic of binding will save us.

So, the custom model binder, again leveraging the MongoDb JSON deserialiser:

	/// <summary>
	/// Binder to pull the UserData out for any actions that may want it.
	/// </summary>
	public class UserDataModelBinder<T> : IModelBinder
	{
		public object BindModel(ControllerContext controllerContext,
			ModelBindingContext bindingContext)
		{
			if (bindingContext.Model != null)
				throw new InvalidOperationException("Cannot update instances");
			if (controllerContext.RequestContext.HttpContext.Request.IsAuthenticated)
			{
				var cookie = controllerContext
					.RequestContext
					.HttpContext
					.Request
					.Cookies[FormsAuthentication.FormsCookieName];

				if (null == cookie)
					return null;

				var decrypted = FormsAuthentication.Decrypt(cookie.Value);

				if (!string.IsNullOrEmpty(decrypted.UserData))
					return BsonSerializer.Deserialize<T>(decrypted.UserData);
			}
			return null;
		}
	}

The binder is generic, so you can use whatever class suits to store the user data. It then needs to be registered in Application_Start() in ‘Global.asax.cs’:

ModelBinders.Binders.Add(typeof(UserData), new UserDataModelBinder<UserData>());

Now our login widget action, which passes a UserData object into our view (wrapped in a view model, as we may not always want to pass all the UserData into the view):

		public ActionResult LoginWidget(UserData userData)
		{
			AccountLoginWidgetVM model = new AccountLoginWidgetVM();
			if (null != userData)
				model.UserData = userData;

			return PartialView(model);
		}

And the partial view it renders:

@model TestProj.Web.Models.AccountLoginWidgetVM
         
@if(Request.IsAuthenticated) {
    <text>Welcome <b>@Model.UserData.FirstName</b>!
    [ @Html.ActionLink("Logout", "Logout", "Account") ]</text>
}
else {
...
}
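The AccountLoginWidgetVM class itself isn’t shown above. A minimal sketch of what it might look like follows. The shape is my assumption, inferred from how the action and view use it, and UserData is repeated so the snippet compiles on its own. In the project it would live in the TestProj.Web.Models namespace that the view refers to.

```csharp
using System;

// Copy of the article's UserData class, repeated so this compiles standalone.
public class UserData
{
	public string FirstName { get; set; }

	public UserData()
	{
		FirstName = "Unknown";
	}
}

// Assumed shape of the widget's view model; only UserData is known from the article.
public class AccountLoginWidgetVM
{
	public UserData UserData { get; set; }

	public AccountLoginWidgetVM()
	{
		// Start with the "Unknown" defaults so the view never dereferences a null
		// UserData, even if the binder could not read the cookie.
		UserData = new UserData();
	}
}
```

Because the widget is now an action rather than a partial view, the layout would call it with something like @Html.Action("LoginWidget", "Account") in place of the old partial, assuming the action lives on the AccountController.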

We’ve covered quite a broad range of topics here, but hopefully it’s clear and of use. If you need any clarification, leave a comment.

Next time… a change of tack. I’m going to look at how to get some performance out of a DevExpress WPF grid.