This post describes the use of yield and compares it to building and returning a list behind an IEnumerable<T> interface.
Setup
The example consists of a contact store that allows the client to retrieve a collection of contacts.
The IStore.GetEnumerator method must return IEnumerable<T>, a strongly typed generic interface that hands out an enumerator, which in turn fetches the next item in the collection.
The concrete store decides how the collection is actually implemented. For example, it could be backed by an array, a generic List<T> or yielded items.
using System;
using System.Collections.Generic;
using System.Linq;

public interface IStore<out T>
{
    IEnumerable<T> GetEnumerator();
}

public class ContactModel
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
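To show that IEnumerable<T> hides the backing implementation from the caller, here is a minimal sketch (assuming C# 9 or later for top-level statements) in which an array, a List<T> and an iterator all satisfy the same return type. Plain strings stand in for ContactModel, and the local functions FromArray, FromList and FromYield are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sources: strings stand in for ContactModel to keep the sketch short.
IEnumerable<string> FromArray() => new[] { "Bob", "Jim", "Susan" };

IEnumerable<string> FromList() => new List<string> { "Bob", "Jim", "Susan" };

IEnumerable<string> FromYield()
{
    yield return "Bob";
    yield return "Jim";
    yield return "Susan";
}

// Through the IEnumerable<string> interface, the caller cannot tell them apart.
IEnumerable<string>[] sources = { FromArray(), FromList(), FromYield() };

foreach (var source in sources)
{
    Console.WriteLine(string.Join(", ", source)); // prints "Bob, Jim, Susan"
}
```

The difference between them only shows up in *when* the items are built, which is what the rest of this post examines.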
Calling GetEnumerator
Let's create two different stores, call GetEnumerator on each one and compare the console output to determine whether there is a difference between the List Store and the Yield Store.
List Store
The code below is a common pattern I've observed during code reviews, where a list is instantiated, populated and returned once ALL of the records have been constructed.
public class ContactListStore : IStore<ContactModel>
{
    public IEnumerable<ContactModel> GetEnumerator()
    {
        var contacts = new List<ContactModel>();
        Console.WriteLine("ContactListStore: Creating contact 1");
        contacts.Add(new ContactModel() { FirstName = "Bob", LastName = "Blue" });
        Console.WriteLine("ContactListStore: Creating contact 2");
        contacts.Add(new ContactModel() { FirstName = "Jim", LastName = "Green" });
        Console.WriteLine("ContactListStore: Creating contact 3");
        contacts.Add(new ContactModel() { FirstName = "Susan", LastName = "Orange" });
        return contacts;
    }
}
static void Main(string[] args)
{
    var store = new ContactListStore();
    var contacts = store.GetEnumerator();
    Console.WriteLine("Ready to iterate through the collection.");
    Console.ReadLine();
}

Output:
ContactListStore: Creating contact 1
ContactListStore: Creating contact 2
ContactListStore: Creating contact 3
Ready to iterate through the collection.
Note: The entire collection was built in memory before the caller had asked for a single item.
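We can make this eager behaviour measurable with a counter. In this minimal sketch, the built counter and the GetContactsAsList local function are hypothetical instrumentation (strings again stand in for ContactModel): even though the caller only takes the first item, all three contacts are constructed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int built = 0; // hypothetical instrumentation: counts constructed contacts

// Eager version: the whole list is populated before the method returns.
IEnumerable<string> GetContactsAsList()
{
    var contacts = new List<string>();
    foreach (var name in new[] { "Bob", "Jim", "Susan" })
    {
        built++;
        contacts.Add(name);
    }
    return contacts;
}

// The caller only wants the first contact...
var first = GetContactsAsList().First();

// ...but all three were constructed anyway.
Console.WriteLine($"First: {first}, constructed: {built}"); // First: Bob, constructed: 3
```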
Yield Store
The yield alternative is shown below. Each contact is handed back to the caller as soon as it is produced, and the method pauses until the next one is requested.
public class ContactYieldStore : IStore<ContactModel>
{
    public IEnumerable<ContactModel> GetEnumerator()
    {
        Console.WriteLine("ContactYieldStore: Creating contact 1");
        yield return new ContactModel() { FirstName = "Bob", LastName = "Blue" };
        Console.WriteLine("ContactYieldStore: Creating contact 2");
        yield return new ContactModel() { FirstName = "Jim", LastName = "Green" };
        Console.WriteLine("ContactYieldStore: Creating contact 3");
        yield return new ContactModel() { FirstName = "Susan", LastName = "Orange" };
    }
}
static void Main(string[] args)
{
    var store = new ContactYieldStore();
    var contacts = store.GetEnumerator();
    Console.WriteLine("Ready to iterate through the collection.");
    Console.ReadLine();
}

Output:
Ready to iterate through the collection.
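Note: The body of GetEnumerator didn't execute at all. This is due to deferred execution: the compiler turns a method containing yield return into an iterator, and none of its code runs until the caller starts enumerating. An item is only constructed when it is actually requested.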
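Under the hood, foreach and LINQ operators such as First() drive an iterator through IEnumerator<T>.MoveNext() and Current. The minimal sketch below drives the enumerator by hand and records the order of events; the GetContacts local function and the log list are hypothetical, and strings stand in for contacts. Each MoveNext() call runs the method body up to the next yield return and then pauses there.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>(); // records the order of events

// Hypothetical iterator: strings stand in for ContactModel.
IEnumerable<string> GetContacts()
{
    log.Add("creating Bob");
    yield return "Bob";
    log.Add("creating Jim");
    yield return "Jim";
}

IEnumerable<string> contacts = GetContacts();
log.Add("method called");            // the body has not started yet

using IEnumerator<string> enumerator = contacts.GetEnumerator();
log.Add("enumerator obtained");      // still nothing constructed

enumerator.MoveNext();               // runs the body up to the first yield return
log.Add($"current = {enumerator.Current}");

enumerator.MoveNext();               // resumes right after the first yield return
log.Add($"current = {enumerator.Current}");

Console.WriteLine(string.Join(Environment.NewLine, log));
```

The log shows "method called" and "enumerator obtained" before "creating Bob", which is exactly the deferred behaviour seen in the console output above.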
Let's run it again and observe the behaviour when we fetch the first contact in the collection.
static void Main(string[] args)
{
    var store = new ContactYieldStore();
    var contacts = store.GetEnumerator();
    Console.WriteLine("Ready to iterate through the collection");
    Console.WriteLine("Hello {0}", contacts.First().FirstName);
    Console.ReadLine();
}

Output:
Ready to iterate through the collection
ContactYieldStore: Creating contact 1
Hello Bob
Nice! Only the first contact was constructed when the client "pulled" the item out of the collection.
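The same pull-based behaviour applies to any operator that stops early. In this minimal sketch, Take(2) pulls only two items, so the third contact is never constructed; the built counter and GetContacts are hypothetical instrumentation, with strings standing in for contacts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int built = 0; // hypothetical instrumentation

IEnumerable<string> GetContacts()
{
    foreach (var name in new[] { "Bob", "Jim", "Susan" })
    {
        built++;              // runs only when the caller pulls the next item
        yield return name;
    }
}

// Take(2) stops pulling after the second item, so "Susan" is never constructed.
List<string> firstTwo = GetContacts().Take(2).ToList();

Console.WriteLine($"{string.Join(", ", firstTwo)} (constructed: {built})"); // Bob, Jim (constructed: 2)
```

A foreach loop that exits with break behaves the same way: the iterator is disposed and the remaining items are never produced.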
Possible multiple enumeration of IEnumerable
Have you ever noticed the "possible multiple enumeration of IEnumerable" warning from ReSharper? ReSharper is warning us that the same work may be done twice. This matters for deferred-execution sequences such as yield iterators and LINQ queries, because every enumeration runs the underlying code again.
Have a look at the results produced from the code below.
static void Main(string[] args)
{
    var store = new ContactYieldStore();
    var contacts = store.GetEnumerator();
    Console.WriteLine("Ready to iterate through the collection");
    if (contacts.Any())
    {
        foreach (var contact in contacts)
        {
            Console.WriteLine("Hello {0}", contact.FirstName);
        }
    }
    Console.ReadLine();
}

Output:
Ready to iterate through the collection
ContactYieldStore: Creating contact 1
ContactYieldStore: Creating contact 1
Hello Bob
ContactYieldStore: Creating contact 2
Hello Jim
ContactYieldStore: Creating contact 3
Hello Susan
Note: The first contact was constructed twice, once by Any() and again by the foreach loop, hence the multiple enumeration warning. Where possible, avoid checking whether a collection has items before looping; a foreach over an empty collection simply does nothing. Checking that the collection is not null before looping is still highly recommended. Worse still, calling .Count() before the loop means that the entire collection will be built twice!
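One way to drop the Any() check while still handling an empty collection is to count items inside a single loop. This is a minimal sketch under the same assumptions as before (strings for contacts, a hypothetical built counter); each contact is constructed exactly once.

```csharp
using System;
using System.Collections.Generic;

int built = 0; // hypothetical instrumentation

IEnumerable<string> GetContacts()
{
    foreach (var name in new[] { "Bob", "Jim", "Susan" })
    {
        built++;
        yield return name;
    }
}

var contacts = GetContacts();
var greeted = 0;

// Single pass: no Any() check up front, so each contact is constructed once.
foreach (var contact in contacts)
{
    Console.WriteLine($"Hello {contact}");
    greeted++;
}

// The empty case is handled after the loop instead of before it.
if (greeted == 0)
{
    Console.WriteLine("No contacts found.");
}
```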
IEnumerable.ToList()
What if we have a requirement to materialize (build) the entire collection immediately? The answer is shown below.
static void Main(string[] args)
{
    var store = new ContactYieldStore();
    var contacts = store.GetEnumerator().ToList();
    Console.WriteLine("Ready to iterate through the collection");
    Console.ReadLine();
}

Output:
ContactYieldStore: Creating contact 1
ContactYieldStore: Creating contact 2
ContactYieldStore: Creating contact 3
Ready to iterate through the collection
Calling .ToList() on an IEnumerable<T> builds the entire collection up front. Any later enumeration reads from the resulting list instead of running the iterator again.
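Once the sequence has been materialized, repeated use no longer re-runs the iterator. In this minimal sketch (hypothetical counter, strings for contacts), both the Count property check and the foreach loop read from the in-memory List<string>, so each contact is still constructed only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int built = 0; // hypothetical instrumentation

IEnumerable<string> GetContacts()
{
    foreach (var name in new[] { "Bob", "Jim", "Susan" })
    {
        built++;
        yield return name;
    }
}

// Materialize once: the iterator runs to completion here.
List<string> contacts = GetContacts().ToList();

// Both uses below read the in-memory list; the iterator is not re-run.
if (contacts.Count > 0)
{
    foreach (var contact in contacts)
    {
        Console.WriteLine($"Hello {contact}");
    }
}

Console.WriteLine($"Constructed: {built}"); // Constructed: 3
```

The trade-off is memory: the whole list now lives in memory, which is fine for three contacts but is exactly what the deferred approach avoids for large result sets.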
Comparison
The list implementation loaded all of the contacts immediately whereas the yield implementation provided a deferred execution solution.
In the list example, the caller doesn't have the option to defer execution. The yield approach provides greater flexibility, since the caller can decide to pre-load the data (for example with .ToList()) or pull each record as required. A common trap to avoid is enumerating the same sequence multiple times, because yield iterators and LINQ queries repeat their work on every enumeration.
In practice, it is often desirable to perform the minimum amount of work needed in order to reduce the resource consumption of an application.
For example, we may have an application that processes millions of records from a database. The following benefits can be achieved when we use IEnumerable in a deferred execution pull-based model:
- Scalability, reliability and predictability are likely to improve, since the number of records does not significantly affect the application’s resource requirements.
- Performance and responsiveness are likely to improve, since processing can start immediately instead of waiting for the entire collection to be loaded first.
- Recoverability and utilisation are likely to improve. If the application is stopped, interrupted or fails, only the items in progress are lost, whereas pre-fetching loads all of the data even though only a portion of it may ever be used.
- Continuous processing is possible in environments where new work is constantly being added to the stream.
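As a rough illustration of the pull-based model at scale, the sketch below streams a million records through a LINQ pipeline without ever holding them in a list. ReadRecords is a hypothetical stand-in for a database reader; a real implementation would yield rows read from the database rather than generating ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database reader: records are produced one at a
// time, so only the record currently being processed has to be in memory.
IEnumerable<long> ReadRecords(long count)
{
    for (long id = 1; id <= count; id++)
    {
        yield return id;
    }
}

// The pipeline pulls records through Where and Sum one by one;
// no intermediate list of a million items is ever created.
long total = ReadRecords(1_000_000)
    .Where(id => id % 2 == 0) // keep even ids only
    .Sum();

Console.WriteLine(total); // 250000500000
```

Because the caller pulls the records, it can also stop early (for example with First() or a break in a foreach), and the remaining records are simply never read.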