Efficiently Using Lists as Dictionary Keys in C#

Updated on 2020-02-24

A relatively safe, simple, yet high performance technique for using lists as dictionary keys.

Introduction

When doing functional style programming in C#, you'll probably eventually run into a situation where you need to key a dictionary by an ordered list of items. There are good ways and not so good ways to do this. This article aims to present you with an efficient and relatively safe way to use lists as dictionary keys.

Conceptualizing this Mess

Dictionary keys must be comparable for equality, must present a proper hash code, and must be immutable. These requirements make it a little bit involved to use lists as dictionary keys since lists don't typically provide value semantics - in this case, item by item comparison and a proper hash code. Perhaps worse, lists allow you to modify them at any time.

In summary, there are two major issues here, and one is comparing lists item by item isn't very efficient, and using a standard IList implementation as a dictionary key is fundamentally unsafe since again it can be modified at any time.

We can solve the latter more or less by requiring the use of IReadOnlyList which provides list-like access without the methods that modify the list. We're going to take a liberty here though when we declare our KeyList class - we'll be adding an Add() method to it. The reason for this is to allow you to fill the list in the first place. If we wanted to be extra safe, we'd nix the Add() method and force you to pass the items as an array into the constructor of the KeyList. The reason this was not done was for performance. KeyList's primary purpose is performance, and passing in the data to the constructor would require a copy. Copies aren't horribly slow necessarily, but we don't need it here - we're okay here since anything that takes IReadOnlyList will not see our Add() method anyway and this way, we avoid spurious copies.

The former requires a little more work. We have to implement value equality semantics on our KeyList. Here, we pull an important trick though: We recompute the hash code as each item is added to the list, that way we don't have to compute it later. We then use that hash code as part of our equality check to short circuit if they aren't equal. Let's take a look at the code for the KeyList class next.

Coding this Mess

sealed class KeyList<T> : IReadOnlyList<T>, IEquatable<KeyList<T>>
{
    int _hashCode;
    List<T> _items;
    public KeyList()
    {
        _items = new List<T>();
        _hashCode = 0;
    }
    public KeyList(int capacity)
    {
        _items = new List<T>(capacity);
        _hashCode = 0;
    }
    public T this[int index] => _items[index];

    public int Count => _items.Count;

    public IEnumerator<T> GetEnumerator()
    {
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
    public void Add(T item)
    {
        _items.Add(item);
        if (null != item)
            _hashCode ^= item.GetHashCode();
    }
    public override int GetHashCode()
    {
        return _hashCode;
    }
    public bool Equals(KeyList<T> rhs)
    {
        if (ReferenceEquals(this, rhs))
            return true;
        if (ReferenceEquals(rhs, null))
            return false;
        if (rhs._hashCode != _hashCode)
            return false;
        var ic = _items.Count;
        if (ic != rhs._items.Count)
            return false;
        for(var i = 0;i<ic;++i)
            if (!Equals(_items[i], rhs._items[i]))
                return false;
        return true;
    }
    public override bool Equals(object obj)
    {
        return Equals(obj as KeyList<T>);
    }
}

The important methods here are Add() and the main Equals() method. You can see in Add() we're modifying the _hashCode field for each item added. You can see in the Equals() method we short circuit if the hash codes are not equal, before finally doing an item by item comparison. This can vastly improve performance. The other performance gain is in the GetHashCode() method which doesn't have to compute anything. Since these methods are typically called by the IDictionary<TKey,TValue> implementation, we want them to be as quick as possible, especially the GetHashCode() method which is called a lot by Dictionary<TKey, TValue>.

If you run the code in Release mode, you'll get accurate performance metrics. In Debug not so much, so the program will warn you if you're running in Debug. You can see we work the dictionaries, one filled with standard lists, the other filled with our special KeyList instances. You should see a 5x+ performance increase with the KeyList compared to using List and since it only implements IReadOnlyList for list access it's safer, too. Now both of our initial problems are solved. Enjoy.

Caveat

It has been pointed out that this is not correct if the T type is mutable. This is true. However, the same is true of Dictionary<TKey, TValue> and HashSet so the same caveat applies there as here - do not change the values of items that are added to these lists, just like you should not change key values added to dictionaries and hashsets.

Using Other Containers

It is possible to adapt this technique to other collections if you need to. You can adapt it to a HashSet if you need (relatively) efficient unordered comparisons. I've done this in some of my projects. Just make sure to use SetEquals() in your Equals() method to do the item by item comparison, because it's faster.

History

  • 24th February, 2020 - Initial submission