Unit testing Entity Framework repositories

Writing unit tests for repositories and other data access code can be tricky: the tests can quickly become integration tests and take more execution time than you want.

In this blog post, I will walk you through three different options for unit testing repositories. There isn't a right or wrong one; it all depends on your use case and current situation.

When writing unit tests I try to follow these rules:

  • don't test code you don't own
  • make a test run as fast as possible
  • unit test code should be easy (and quick) to understand

With this in mind, the parts of a repository we want to test are:

  • LINQ queries
  • Whether a call is performed to store, update, or save changes; I don't care whether it is actually persisted, that's the responsibility of the database.

We use xUnit.net as the test framework and FakeItEasy for mocking in this post, but you could just as easily use any other combination.

Working with a fake DbContext

One option is to write a custom DbContext that performs its actions on a regular List<T>, so we can prepare the state of our DbContext without needing an actual database.

This allows us to write tests that look like this:

Testing query logic:

[Fact]
public void Must_get_foo_by_id()  
{
    // Arrange
    var foo = new Foo { Id = 1 };
    _fooDbContext.Foos.Insert(foo);
    var sut = new FooRepository(_fooDbContext);

    // Act
    var result = sut.GetById(1);

    // Assert
    Assert.Equal(foo, result);
}

Adding, updating, or removing entities can be tested in two different ways.

One way is to look at the List object in our fake DbSet to see whether the item was removed, added, or updated, like below:

[Fact]
public void Must_add_one_item_to_the_data_store()  
{
    // Arrange
    var foo = new Foo { Id = 1 };
    var sut = new FooRepository(_fooDbContext);

    // Act
    sut.Add(foo);

    // Assert
    Assert.Equal(foo, _fooDbContext.Foos.First());
}

Another (better) way is to test whether the correct calls are made to delegate the action to the data store. It's the responsibility of the data store to perform the action correctly, not ours.

[Fact]
public void Must_delegate_the_removal_of_the_entity_to_the_data_store()
{
    // Arrange
    var context = A.Fake<MyContext>();
    var foo = new Foo { Id = 1 };
    var repo = new FooRepository(context);

    // Act
    repo.Remove(foo);

    // Assert
    A.CallTo(() => context.Foos.Remove(foo)).MustHaveHappened();
}

[Fact]
public void Must_persist_changes_to_data_store()
{
    // Arrange
    var context = A.Fake<MyContext>();
    var repo = new FooRepository(context);

    // Act
    repo.Remove(new Foo());

    // Assert
    A.CallTo(() => context.SaveChanges()).MustHaveHappened();
}

To do this you need to implement some helper classes once, which you can reuse each time:

Creating a fake IDbSet<T>

Create a class that implements IDbSet<T> which contains an inner List<T> and redirects all the calls to the inner list.

public class TestDbSet<T> : IDbSet<T>, IDbAsyncEnumerable<T> where T : class, new()  
{
    private readonly List<T> _data = new List<T>();

    public IEnumerator<T> GetEnumerator()
    {
        return _data.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return _data.GetEnumerator();
    }

    public Expression Expression => Expression.Constant(_data.AsQueryable());

    public Type ElementType => typeof(T);

    public IQueryProvider Provider => new TestDbAsyncQueryProvider<T>(_data.AsQueryable().Provider);

    public T Find(params object[] keyValues)
    {
        throw new NotImplementedException();
    }

    public T Add(T entity)
    {
        _data.Add(entity);
        return entity;
    }

    public T Remove(T entity)
    {
        _data.Remove(entity);
        return entity;
    }

    public T Attach(T entity)
    {
        _data.Add(entity);
        return entity;
    }

    public T Create()
    {
        return Activator.CreateInstance<T>();
    }

    public TDerivedEntity Create<TDerivedEntity>() where TDerivedEntity : class, T
    {
        return Activator.CreateInstance<TDerivedEntity>();
    }

    public ObservableCollection<T> Local => new ObservableCollection<T>(_data);

    IEnumerator<T> IEnumerable<T>.GetEnumerator()
    {
        return _data.GetEnumerator();
    }

    IDbAsyncEnumerator<T> IDbAsyncEnumerable<T>.GetAsyncEnumerator()
    {
        return new TestDbAsyncEnumerator<T>(_data.GetEnumerator());
    }

    public IDbAsyncEnumerator GetAsyncEnumerator()
    {
        return new TestDbAsyncEnumerator<T>(_data.AsQueryable().GetEnumerator());
    }
}

To make the above code compile, you also need to add the definitions for TestDbAsyncQueryProvider<TEntity>, TestDbAsyncEnumerator<T>, and TestDbAsyncEnumerable<T>.

Create a file named TestDbAsyncQueryProvider.cs and add the following content:

internal class TestDbAsyncQueryProvider<TEntity> : IDbAsyncQueryProvider  
{
    private readonly IQueryProvider _inner;

    internal TestDbAsyncQueryProvider(IQueryProvider inner)
    {
        _inner = inner;
    }

    public IQueryable CreateQuery(Expression expression)
    {
        return new TestDbAsyncEnumerable<TEntity>(expression);
    }

    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
    {
        return new TestDbAsyncEnumerable<TElement>(expression);
    }

    public object Execute(Expression expression)
    {
        return _inner.Execute(expression);
    }

    public TResult Execute<TResult>(Expression expression)
    {
        return _inner.Execute<TResult>(expression);
    }

    public Task<object> ExecuteAsync(Expression expression, CancellationToken cancellationToken)
    {
        return Task.FromResult(Execute(expression));
    }

    public Task<TResult> ExecuteAsync<TResult>(Expression expression, CancellationToken cancellationToken)
    {
        return Task.FromResult(Execute<TResult>(expression));
    }
}

Create a file named TestDbAsyncEnumerator.cs and add the following content:

internal class TestDbAsyncEnumerator<T> : IDbAsyncEnumerator<T>  
{
    private readonly IEnumerator<T> _inner;

    public TestDbAsyncEnumerator(IEnumerator<T> inner)
    {
        _inner = inner;
    }

    public void Dispose()
    {
        _inner.Dispose();
    }

    public Task<bool> MoveNextAsync(CancellationToken cancellationToken)
    {
        return Task.FromResult(_inner.MoveNext());
    }

    public T Current => _inner.Current;

    object IDbAsyncEnumerator.Current => Current;
}

Create a file named TestDbAsyncEnumerable.cs and add the following content:

internal class TestDbAsyncEnumerable<T> : EnumerableQuery<T>, IDbAsyncEnumerable<T>, IQueryable<T>  
{
    public TestDbAsyncEnumerable(IEnumerable<T> enumerable)
        : base(enumerable)
    { }

    public TestDbAsyncEnumerable(Expression expression)
        : base(expression)
    { }

    public IDbAsyncEnumerator<T> GetAsyncEnumerator()
    {
        return new TestDbAsyncEnumerator<T>(this.AsEnumerable().GetEnumerator());
    }

    IDbAsyncEnumerator IDbAsyncEnumerable.GetAsyncEnumerator()
    {
        return GetAsyncEnumerator();
    }

    IQueryProvider IQueryable.Provider => new TestDbAsyncQueryProvider<T>(this);
}

We will create an xUnit fixture to hold the code that prepares and creates the fake DbContext. The class uses FakeItEasy to create a fake DbContext and uses reflection to replace the properties of type IDbSet<T> with instances of the TestDbSet<T> class we created, which contains the inner list.

public class DbContextFixture  
{
    public T CreateDbContext<T>() where T : DbContext
    {
        var dbContext = A.Fake<T>();
        A.CallTo(() => dbContext.SaveChanges()).Returns(0);

        var properties = typeof(T).GetProperties(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);

        foreach (var property in properties)
        {
            if (property.PropertyType.IsGenericType &&
                property.PropertyType.GetGenericTypeDefinition() == typeof(IDbSet<>))
            {
                var innerType = property.PropertyType.GetGenericArguments()[0];
                property.SetValue(dbContext, CreateGeneric(typeof(TestDbSet<>), innerType));
            }
        }

        return dbContext;
    }

    private static object CreateGeneric(Type generic, Type innerType, params object[] args)
    {
        return Activator.CreateInstance(generic.MakeGenericType(innerType), args);
    }
}

Adding an extension method makes adding items easier and clearer:

public static class DbSetExtensions
{
    public static void Insert<T>(this IDbSet<T> dbset, params T[] range) where T : class
    {
        foreach (var item in range)
        {
            dbset.Add(item);
        }
    }
}

We are now able to set up our test class in the following way:

public class FooRepositoryTests  
{
    [Collection("DbContext")]
    public class GetById
    {
        private readonly FooDbContext _fooDbContext;

        public GetById(DbContextFixture fixture)
        {
            _fooDbContext = fixture.CreateDbContext<FooDbContext>();
        }

        [Fact]
        public void Must_get_foo_by_id()
        {
            // Arrange
            var foo1 = new Foo { Id = 1 };
            _fooDbContext.Foos.Insert(new Foo { Id = 2 }, foo1);

            var fooRepository = new FooRepository(_fooDbContext);

            // Act
            var result = fooRepository.GetById(1);

            // Assert
            Assert.Equal(foo1, result);
        }
    }
}

EF Core (previously named Entity Framework 7.0) InMemory data store

The rewritten Entity Framework bits used in .NET Core provide a useful alternative to the code above: the InMemory data store. It allows you to run your existing code against a data store that temporarily persists in memory for testing, which means less of the boilerplate above is needed to get started.

One downside is that to get a 'clean database' for each unit test, some time-consuming setup tasks need to be executed. When writing read-only unit tests this might not be a big deal, because the data store could be re-used. I am not sure this gives the clearest tests, though, because your database would also contain data for other unit test scenarios, so you end up with one big setup method adding a lot of data for every case.
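For read-only tests, one way to avoid paying the setup cost per test is to share a single seeded context through an xUnit class fixture. A minimal sketch, assuming a FooDbContext whose constructor takes the options produced by the CreateNewContextOptions helper shown below (the class names and seed data are illustrative):

```csharp
// Illustrative xUnit class fixture: the InMemory context is created and
// seeded once, then shared by every test in the class.
public class SeededContextFixture : IDisposable
{
    public FooDbContext Context { get; }

    public SeededContextFixture()
    {
        Context = new FooDbContext(CreateNewContextOptions());
        Context.Foos.AddRange(new Foo { Id = 1 }, new Foo { Id = 2 });
        Context.SaveChanges();
    }

    public void Dispose() => Context.Dispose();
}

public class FooReadOnlyTests : IClassFixture<SeededContextFixture>
{
    private readonly FooDbContext _context;

    public FooReadOnlyTests(SeededContextFixture fixture)
    {
        // Shared, already-seeded context; tests here must not mutate it.
        _context = fixture.Context;
    }
}
```

This only works safely if every test in the class treats the context as read-only; one mutating test would leak state into the others.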

The upside is that it allows you to use Entity Framework queries and perform actions just like you would in production. Actions could probably still be mocked to test only the delegation of actions, instead of testing whether Entity Framework does its job.

More can be found here: https://docs.efproject.net/en/latest/miscellaneous/testing.html

To create a clean InMemory data store you need the following boilerplate code:

private static DbContextOptions<MyContext> CreateNewContextOptions()  
{
    // Create a fresh service provider, and therefore a fresh 
    // InMemory database instance.
    var serviceProvider = new ServiceCollection()
        .AddEntityFrameworkInMemoryDatabase()
        .BuildServiceProvider();

    // Create a new options instance telling the context to use an
    // InMemory database and the new service provider.
    var builder = new DbContextOptionsBuilder<MyContext>();
    builder.UseInMemoryDatabase()
            .UseInternalServiceProvider(serviceProvider);

    return builder.Options;
}

Which can be used in almost the same way as the other scenario above:

public class FooRepositoryTests  
{
    public class GetById
    {
        private readonly FooDbContext _fooDbContext;

        public GetById()
        {
            _fooDbContext = new FooDbContext(CreateNewContextOptions());
        }

        [Fact]
        public void Must_get_foo_by_id()
        {
            // Arrange
            var foo1 = new Foo { Id = 1 };
            _fooDbContext.Foos.AddRange(new Foo { Id = 2 }, foo1);

            var fooRepository = new FooRepository(_fooDbContext);

            // Act
            var result = fooRepository.GetById(1);

            // Assert
            Assert.Equal(foo1, result);
        }
    }
}

A more domain-driven design-like approach

When using domain-driven design, we try to get our model very close to how our domain experts talk about the different 'things' in the bounded context. This makes it easy to introduce the specification pattern, because a specification is something a domain expert states to define a restriction or query.

For example:

I would like to give all the customers from the Netherlands a discount code to get a 10% discount on their next purchase when I press this button.

or

I would like to get a report every month of customers from the Netherlands that used a discount code to purchase goods.

or

Only customers from the Netherlands are allowed to submit an order with a discount code.

A specification could be implemented in the following way:

With a generic interface ISpecification of T

public interface ISpecification<T>  
{
    Expression<Func<T, bool>> Predicate { get; }

    bool IsSatisfiedBy(T item);
}

An abstract implementation of this interface implements the more generic IsSatisfiedBy part. One note: you might want to cache the compiled lambda expression for performance if it is executed often.

public abstract class Specification<T> : ISpecification<T>  
{
    public abstract Expression<Func<T, bool>> Predicate
    {
        get;
    }

    public bool IsSatisfiedBy(T item)
    {
        return this.Predicate.Compile().Invoke(item);
    }
}
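As a sketch of the caching note above, the expression can be compiled lazily, once per specification instance (named CachedSpecification here to keep it distinct from the class above):

```csharp
public abstract class CachedSpecification<T> : ISpecification<T>
{
    // Compile the predicate at most once, on first use.
    private readonly Lazy<Func<T, bool>> _compiled;

    protected CachedSpecification()
    {
        _compiled = new Lazy<Func<T, bool>>(() => Predicate.Compile());
    }

    public abstract Expression<Func<T, bool>> Predicate { get; }

    public bool IsSatisfiedBy(T item)
    {
        return _compiled.Value(item);
    }
}
```

Note the trade-off: the cache assumes the Predicate property always returns the same expression, which holds for the stateless specifications shown in this post.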

And the specification containing the actual logic:

public class CustomersFromTheNetherlandsSpecification : Specification<Customer>  
{
    public override Expression<Func<Customer, bool>> Predicate
    {
        get
        {
            return c => c.CountryCode.ToLower() == "nl";
        }
    }
}
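Because the condition now lives in its own class, a unit test for it needs no data store or mocking at all. Assuming Customer exposes a settable CountryCode property, it is as simple as:

```csharp
public class CustomersFromTheNetherlandsSpecificationTests
{
    [Fact]
    public void Must_be_satisfied_only_by_customers_from_the_netherlands()
    {
        var specification = new CustomersFromTheNetherlandsSpecification();

        // The condition is tested directly, without touching a database.
        Assert.True(specification.IsSatisfiedBy(new Customer { CountryCode = "NL" }));
        Assert.False(specification.IsSatisfiedBy(new Customer { CountryCode = "BE" }));
    }
}
```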

Because the query logic has now moved into a specification class, we need to change our repository implementation a bit. (In the cases above, one could also create a more generic repository that accepts a lambda, but that is still code that is hard to test without preparing the data store.)

public interface IRepository<T> where T : class
{
    Task Add(T item);

    Task Remove(T item);

    IEnumerable<T> Get(ISpecification<T> specification);

    Task<T> GetFirstOrDefault(ISpecification<T> specification);
}

And create a generic base class:

public abstract class GenericEFRepository<T, C> : IRepository<T>  
        where T : class 
        where C : DbContext
{
    private readonly C context;
    private readonly DbSet<T> set;

    public GenericEFRepository(C context)
    {
        this.context = context;
        this.set = this.context.Set<T>();
    }

    public async Task Add(T item)
    {
        this.set.Add(item);
        await this.Save();
    }

    public IEnumerable<T> Get(ISpecification<T> specification)
    {
        return this.set.Where(specification.Predicate);
    }

    public async Task<T> GetFirstOrDefault(ISpecification<T> specification)
    {
        return await this.set.FirstOrDefaultAsync(specification.Predicate);
    }

    public async Task Remove(T item)
    {
        this.set.Remove(item);
        await this.Save();
    }

    private async Task<int> Save()
    {
        return await this.context.SaveChangesAsync();
    }
}

You might expect the Save method on the repository to support batch operations, without performing a call to the data store for every action. A batch operation is a unit of work, because it groups more than one action. I've separated this into the following interface and use an abstract implementation that doesn't call Save from Add and Remove, but only persists when Commit is called:

public interface IUoWRepository<T> : IDisposable, IRepository<T> where T : class  
{
    Task<int> Commit();
}
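A sketch of such an abstract implementation, assuming the async Task signatures used by GenericEFRepository above (the class name is illustrative); Add and Remove only stage the change, and Commit is the single point that calls SaveChangesAsync:

```csharp
// Illustrative unit-of-work repository: nothing is persisted until Commit.
public abstract class GenericEFUoWRepository<T, C> : IUoWRepository<T>
    where T : class
    where C : DbContext
{
    private readonly C context;
    private readonly DbSet<T> set;

    protected GenericEFUoWRepository(C context)
    {
        this.context = context;
        this.set = this.context.Set<T>();
    }

    // Only stages the entity in the change tracker.
    public Task Add(T item)
    {
        this.set.Add(item);
        return Task.CompletedTask;
    }

    // Only stages the removal in the change tracker.
    public Task Remove(T item)
    {
        this.set.Remove(item);
        return Task.CompletedTask;
    }

    public IEnumerable<T> Get(ISpecification<T> specification)
    {
        return this.set.Where(specification.Predicate);
    }

    public async Task<T> GetFirstOrDefault(ISpecification<T> specification)
    {
        return await this.set.FirstOrDefaultAsync(specification.Predicate);
    }

    // The whole batch is persisted in a single SaveChangesAsync call.
    public Task<int> Commit()
    {
        return this.context.SaveChangesAsync();
    }

    public void Dispose()
    {
        this.context.Dispose();
    }
}
```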

This is probably the cleanest way of testing only the condition (query) that we own, because there is no logic other than the definition of the condition. Unit tests that check whether the add, remove, or update actions are delegated can still be done with mocking, as in the first scenario.
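As a sketch of such a delegation test, assuming a FooRepository deriving from GenericEFRepository<Foo, MyContext> and relying on the virtual Set<T>() and SaveChangesAsync() members of the faked context:

```csharp
[Fact]
public async Task Must_delegate_add_to_the_data_store()
{
    // Arrange
    var context = A.Fake<MyContext>();
    var set = A.Fake<DbSet<Foo>>();
    A.CallTo(() => context.Set<Foo>()).Returns(set);

    var repository = new FooRepository(context);
    var foo = new Foo();

    // Act
    await repository.Add(foo);

    // Assert: the repository only delegates; persisting is the store's job.
    A.CallTo(() => set.Add(foo)).MustHaveHappened();
    A.CallTo(() => context.SaveChangesAsync()).MustHaveHappened();
}
```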

Specifications can also be used to filter or apply restrictions to non-Entity Framework data, which reduces duplicated code.

For example (kept simple for demonstration purposes):

public void SubmitOrder(CustomerOrderViewModel view)  
{
    if (!new CustomersFromTheNetherlandsSpecification().IsSatisfiedBy(view.Customer))
    {
        throw new Exception("Discount code cannot be applied");
    }
}

If we look back at the rules, this is probably the best option to match them:

  • don't test code you don't own
    • we only need to test the specification condition.
  • make a test run as fast as possible
    • it's only one condition to test, which performs faster than creating a list and filling it.
  • unit test code should be easy (and quick) to understand
    • you now only need to know the condition, not look at what is currently in the data store and what is not in there.

One downside is that every query will result in a new class/ file on disk, which might not be that manageable if you stick them all in one folder. But you could also say it gives you a better view of the complexity of your application query logic because it is not hidden in business logic code or a large repository class with methods to query data.

Performance differences

As expected, the specification performs fastest, because it only needs to test one condition. The EF6 fake comes second, and the EFCore InMemory store is the slowest, mostly due to the creation of a clean data store each time. For read-only scenarios it would perform better.

10.000 iterations, values are in milliseconds.

Type             | Average | Max  | Min | Total
EF6 Fake         | 0,1358  | 459  | 0   | 1358
EFCore InMemory  | 6,2133  | 1059 | 4   | 62133
Specification    | 0,009   | 12   | 0   | 90

EF6

static void Main(string[] args)  
{
    List<long> timings = new List<long>();
    for (var i = 0; i < 10000; i++)
    {
        var sq = Stopwatch.StartNew();

        using (var context = CreateDbContext<MyContext>())
        {
            context.Foos.Add(new Foo { Id = 4, Name = "test4" });
            context.Foos.Add(new Foo { Id = 5, Name = "test5" });
            context.SaveChanges();

            var fooObj = context.Foos.FirstOrDefault(foo => foo.Name == "test4");
        }

        sq.Stop();
        timings.Add(sq.ElapsedMilliseconds);

    }

    Console.WriteLine("EF6Fake => Average: " + timings.Average() + " msec, max: " + timings.Max() + " msec, min: " + timings.Min() + " msec, total: " + timings.Sum() + " msec");
    Console.ReadLine();
}

EFCore

public static void Main(string[] args)  
{
    List<long> timings = new List<long>();
    for (var i = 0; i < 10000; i++)
    {
        var sq = Stopwatch.StartNew();
        using (var context = new MyContext(CreateNewContextOptions()))
        {
            context.Foos.Add(new Foo { Id = 4, Name = "test4" });
            context.Foos.Add(new Foo { Id = 5, Name = "test5" });
            context.SaveChanges();

            var fooObj = context.Foos.FirstOrDefault(foo => foo.Name == "test4");
        }

        sq.Stop();
        timings.Add(sq.ElapsedMilliseconds);

    }

    Console.WriteLine("EFCore => Average: " + timings.Average() + " msec, max: " + timings.Max() + " msec, min: " + timings.Min() + " msec, total: " + timings.Sum() + " msec");
    Console.ReadLine();
}

Specification code

static void Main(string[] args)  
{
    List<long> timings = new List<long>();
    for (var i = 0; i < 10000; i++)
    {
        var sq = Stopwatch.StartNew();
        var specification = new FooByNameSpecification("test4");
        var resultTrue = specification.IsSatisfiedBy(new Foo { Id=1, Name = "test4"});
        var resultFalse = specification.IsSatisfiedBy(new Foo { Id = 1, Name = "test5" });
        sq.Stop();
        timings.Add(sq.ElapsedMilliseconds);
    }

    Console.WriteLine("Specification => Average: " + timings.Average() + " msec, max:" + timings.Max() + " msec, min: " + timings.Min() + " msec, total: " + timings.Sum() + " msec");
    Console.ReadLine();
}

Conclusion

EFCore InMemory looks bad in the performance comparison, but if you owned around 1.000 unit tests, it would still only take around 6 to 7 seconds to run all of them. Once you re-use the data store context, it performs a lot faster, but might give slightly less clear tests in some cases. One downside is that it requires the existing code base to move to EF Core, which might be a big step to take.

The EF6 DbContext fake is probably the easiest way to start unit testing your existing code base. It performs well and won't require a lot of modifications to your code (if it is currently injected with a DbContext).

The DDD way requires a new way of thinking, and transforming an existing code base might be a lot harder. In a new code base, however, it makes unit testing the query conditions very smooth and fast. The result will probably be cleaner, more understandable code.