Software Design Blog - IEnumerable
Simple solutions to solve complex problems
Jay Strydom

10x Performance Gain: IEnumerable vs IQueryable
<p>This post compares IEnumerable against IQueryable using an experiment to illustrate the behaviour and performance differences. A Func versus Expression&lt;Func&gt; filter bug is easy to miss: the caller’s syntax stays the same, but it can have a 10x performance impact on your application.</p>
<a href="/Downloads/IQueryableVsIEnumerable.zip" role="button" class="btn btn-primary btn-sm">Download Source Code</a>
<h3>Setup</h3>
SQL Server 2014 was used for hosting the database. The GeoAllCountries table content was sourced from <a href="http://www.geonames.org/" target="_blank">GeoNames</a> and contains just over 10 million rows. Entity Framework 6 was used for the LINQ to SQL integration.
<h3>Predicate Function</h3>
The code below queries the GeoAllCountries table and uses a predicate function to keep only the names starting with "Aus".
<pre class="brush: c-sharp;">
static void Main(string[] args)
{
    var stopWatch = Stopwatch.StartNew();
    var countryNames = GetCountryNames(name =&gt; name.StartsWith("Aus"));

    foreach (var name in countryNames)
    {
        Console.WriteLine(name);
    }

    stopWatch.Stop();
    Console.WriteLine("Running time: {0}", stopWatch.Elapsed.TotalSeconds);
    Console.ReadLine();
}

public static IEnumerable&lt;string&gt; GetCountryNames(Func&lt;string, bool&gt; filterFunc)
{
    using (var context = new TestDatabaseDataContext())
    {
        IQueryable&lt;string&gt; names = (from country in context.GeoAllCountries
                                    select country.Name);

        // A Func predicate binds to Enumerable.Where, so the filter runs in memory.
        foreach (var name in names.Where(filterFunc))
        {
            yield return name;
        }
    }
}
</pre>
<pre>Running time: 8.6558463
</pre>
SQL Server Profiler captured the following query between the application and the database:
<pre class="brush: sql;">SELECT [t0].[Name]
FROM [dbo].[GeoAllCountries] AS [t0]
</pre>
<div class="alert alert-danger" role="alert"> <b>Oops!</b> Filtering was not performed at the database server. The Func predicate passed to names.Where(filterFunc) binds to Enumerable.Where, which treats the IQueryable as an IEnumerable, so every name is fetched from the database and filtered in memory. </div>
<h3>Expression Predicate Function</h3>
The code below queries the GeoAllCountries table and uses an expression predicate function to keep only the names starting with "Aus".
<pre class="brush: c-sharp;">
static void Main(string[] args)
{
    var stopWatch = Stopwatch.StartNew();
    var countryNames = GetCountryNames(name =&gt; name.StartsWith("Aus"));

    foreach (var name in countryNames)
    {
        Console.WriteLine(name);
    }

    stopWatch.Stop();
    Console.WriteLine("Running time: {0}", stopWatch.Elapsed.TotalSeconds);
    Console.ReadLine();
}

public static IEnumerable&lt;string&gt; GetCountryNames(
    Expression&lt;Func&lt;string, bool&gt;&gt; filterFunc)
{
    using (var context = new TestDatabaseDataContext())
    {
        IQueryable&lt;string&gt; names = (from country in context.GeoAllCountries
                                    select country.Name);

        // An Expression predicate binds to Queryable.Where, so the filter is sent to the database.
        foreach (var name in names.Where(filterFunc))
        {
            yield return name;
        }
    }
}
</pre>
<pre>Running time: 0.8633603
</pre>
SQL Server Profiler captured the following query between the application and the database:
<pre class="brush: sql;">exec sp_executesql N'SELECT [t0].[Name]
FROM [dbo].[GeoAllCountries] AS [t0]
WHERE [t0].[Name] LIKE @p0',N'@p0 nvarchar(4000)',@p0=N'Aus%'
</pre>
<div class="alert alert-success" role="alert"> <b>Success!</b> Filtering was performed at the database server. The Expression predicate passed to names.Where(filterFunc) binds to Queryable.Where, which keeps the statement as an IQueryable and allows the query provider to add the filter to the select query. </div>
<b>Note that the client code did not change.</b> Wrapping the Func in an Expression made a world of difference. It is easy to add the expression syntax, but its absence is just as easy to miss in a code review unless you know to look for it and understand the implications.
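<p>The difference comes down to overload resolution. As a minimal sketch, assuming an in-memory array wrapped with AsQueryable as a stand-in for the database table (the class and method names below are hypothetical and not part of the downloadable source), the following shows that a Func predicate binds to Enumerable.Where and loses the IQueryable, while an Expression predicate binds to Queryable.Where and keeps it:</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical demo class; not part of the post's downloadable source code.
public static class WhereOverloadDemo
{
    // Assumption: in-memory data standing in for the GeoAllCountries names.
    public static IQueryable<string> Names()
    {
        return new[] { "Austria", "Australia", "Belgium" }.AsQueryable();
    }

    // Func predicate: the compiler picks Enumerable.Where, so the result is
    // a plain in-memory iterator that no query provider can translate.
    public static bool FuncKeepsQueryable(Func<string, bool> filter)
    {
        IEnumerable<string> result = Names().Where(filter);
        return result is IQueryable<string>;
    }

    // Expression predicate: the compiler picks Queryable.Where, so the
    // filter becomes part of the expression tree handed to the provider.
    public static bool ExpressionKeepsQueryable(Expression<Func<string, bool>> filter)
    {
        IQueryable<string> result = Names().Where(filter);
        return result is IQueryable<string>;
    }

    public static void Main()
    {
        // The caller's lambda is identical in both calls.
        Console.WriteLine(FuncKeepsQueryable(name => name.StartsWith("Aus")));       // False
        Console.WriteLine(ExpressionKeepsQueryable(name => name.StartsWith("Aus"))); // True
    }
}
```

<p>Declaring the local variable as IQueryable&lt;string&gt; rather than var is one possible guard: with a Func predicate the assignment no longer compiles, so the mistake surfaces at build time instead of in a profiler trace.</p>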
<h3>Summary</h3>
<p><b>IEnumerable</b> executes the select query at the database and filters the data in-memory at the application layer.</p>
<p><b>IQueryable</b> executes the select query and all of the filters at the database.</p>
<p>The database filtering reduced network traffic and application memory load, resulting in a 10x performance gain (roughly 8.66 seconds down to 0.86 seconds).</p>
<p>Posted by Jay Strydom on Tue, 08 Dec 2015 09:27:00 +1300. Categories and tags: Software Design, IEnumerable, Entity Framework, LINQ to SQL, Performance, .NET, C#.</p>

Butcher the LINQ to SQL Resource Hog
<p> Has your LINQ to SQL repository ever thrown a "<b>cannot access a disposed object</b>" exception? You can fix it by calling ToList on the LINQ query, but doing so will impede your application’s performance and scalability. </p>
<p> This post covers common pitfalls, and how to avoid them, when dealing with unmanaged resources such as a database connection in a pull-based IEnumerable repository. It investigates when Entity Framework and LINQ to SQL resources are disposed of and how to implement an effective solution. </p>
<a href="/Downloads/IEnumerableResourceExample.zip" role="button" class="btn btn-primary btn-sm">Download Source Code</a>
<h3>Setup</h3>
The following repository class will be used to model the same behaviour as an actual LINQ to SQL database repository.
<pre class="brush: c-sharp;">
public class Model
{
    public string Message { get; set; }
}

public class Repository : IDisposable
{
    public IEnumerable&lt;Model&gt; Records
    {
        get
        {
            if (_disposed) throw new InvalidOperationException("Disposed");
            Console.WriteLine("Building message one");
            yield return new Model() { Message = "Message one" };

            if (_disposed) throw new InvalidOperationException("Disposed");
            Console.WriteLine("Building message two");
            yield return new Model() { Message = "Message two" };
        }
    }

    private bool _disposed = false;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;
        _disposed = true;
    }
}
</pre>
<h3>LINQ to SQL: Cannot access a disposed object</h3>
Let's execute the LINQ query below to call the repository and write the results to the console.
<pre class="brush: c-sharp;">
static void Main(string[] args)
{
    var records = GetLinqRecords();

    foreach (var record in records)
    {
        Console.WriteLine(record);
    }

    Console.ReadLine();
}

private static IEnumerable&lt;string&gt; GetLinqRecords()
{
    using (var repository = new Repository())
    {
        // The query is deferred; the repository is disposed before it runs.
        return (from model in repository.Records select model.Message);
    }
}
</pre>
<div class="alert alert-danger" role="alert"> <b>Oops!</b> An InvalidOperationException was thrown by the first _disposed check in the Records getter of the Repository class.
</div>
<p> A LINQ to SQL application would raise the following exception: </p>
<b> An unhandled exception of type 'System.ObjectDisposedException' occurred in System.Data.Linq.dll<br> Additional information: Cannot access a disposed object.</b>
<h3>LINQ to SQL: ToList</h3>
Let's execute the LINQ query below by materialising the records to a list first:
<pre class="brush: c-sharp;">
static void Main(string[] args)
{
    var records = GetLinqRecordsToList();

    foreach (var record in records)
    {
        Console.WriteLine(record);
    }

    Console.ReadLine();
}

private static IEnumerable&lt;string&gt; GetLinqRecordsToList()
{
    using (var repository = new Repository())
    {
        // ToList runs the query now, while the repository is still alive.
        return (from model in repository.Records select model.Message).ToList();
    }
}
</pre>
<pre>Building message one
Building message two
Message one
Message two
</pre>
<div class="alert alert-warning" role="alert"> <b>Warning!</b> The code works but with bad side-effects. All of the records were loaded into memory immediately and the caller lost the ability to defer execution. The benefits of deferred execution are described in the <a href="http://www.webdevelopment.co.nz/post/yield-ienumerable-vs-list-building">Yield IEnumerable vs List Building</a> post. </div>
<h3>Yield to the rescue</h3>
Let's execute the code below using yield instead:
<pre class="brush: c-sharp;">
static void Main(string[] args)
{
    var records = GetYieldRecords();

    foreach (var record in records)
    {
        Console.WriteLine(record);
    }

    Console.ReadLine();
}

private static IEnumerable&lt;string&gt; GetYieldRecords()
{
    // The using block is part of the iterator, so the repository stays
    // alive until the caller has finished enumerating.
    using (var repository = new Repository())
    {
        foreach (var record in repository.Records)
        {
            yield return record.Message;
        }
    }
}
</pre>
<pre>Building message one
Message one
Building message two
Message two
</pre>
<div class="alert alert-success" role="alert"> <b>Success!</b> The connection was kept alive and the records were constructed in a deferred, pull-based manner.
</div>
<h3>Don’t refactor your code</h3>
Let's see what happens when we run a refactored version of the code:
<pre class="brush: c-sharp;">
static void Main(string[] args)
{
    var records = GetRefactoredYieldRecords();

    foreach (var record in records)
    {
        Console.WriteLine(record);
    }

    Console.ReadLine();
}

private static IEnumerable&lt;string&gt; GetRefactoredYieldRecords()
{
    using (var repository = new Repository())
    {
        // No yield here: this method returns immediately and disposes the repository.
        return YieldRecords(repository.Records);
    }
}

private static IEnumerable&lt;string&gt; YieldRecords(IEnumerable&lt;Model&gt; records)
{
    if (records == null) throw new ArgumentNullException("records");

    foreach (var record in records)
    {
        yield return record.Message;
    }
}
</pre>
<div class="alert alert-danger" role="alert"> <b>Oops!</b> An InvalidOperationException was thrown by the first _disposed check in the Records getter of the Repository class. </div>
<p>Déjà vu. The same error occurred as seen in the LINQ to SQL example. Take a closer look at the IL produced by the compiler using a tool such as <a href="http://ilspy.net/" target="_blank">ILSpy</a>.</p>
<p>In both the refactored and the LINQ to SQL versions, the function that owns the using block does not yield the items itself; it returns a second, deferred IEnumerable function. Effectively, it is an IEnumerable within an IEnumerable.
The connection lifecycle is managed in the first function, so the repository is disposed as soon as the second IEnumerable function is returned to the caller, before anything has been enumerated.</p>
<p>Keep it simple: yield the items directly from the function that owns the resource.</p>
<p>Posted by Jay Strydom on Sun, 06 Dec 2015 16:11:00 +1300. Categories and tags: Software Design, IEnumerable, LINQ to SQL, Entity Framework, Performance, C#, .NET.</p>

Yield IEnumerable vs List Building
<p> This post describes the use of yield and compares it to building and returning a list behind an IEnumerable&lt;T&gt; interface.</p>
<a href="/Downloads/EnumerableExample.zip" role="button" class="btn btn-primary btn-sm">Download Source Code</a>
<h3>Setup</h3>
The example consists of a contact store that will allow the client to retrieve a collection of contacts.
<p> The IStore.GetEnumerator method must return IEnumerable&lt;T&gt;, which is a strongly typed generic interface that describes the ability to fetch the next item in the collection. </p>
<p>The actual implementation of the collection can be decided by the concrete implementation. For example, the collection could consist of an array, generic list or yielded items. </p>
<pre class="brush: c-sharp;">
public interface IStore&lt;out T&gt;
{
    IEnumerable&lt;T&gt; GetEnumerator();
}

public class ContactModel
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
</pre>
<h3>Calling GetEnumerator</h3>
Let's create two different stores, call GetEnumerator on each store and evaluate the console logs to determine if there is a difference between the List Store and the Yield Store.
<h4>List Store</h4>
The code below is a common pattern I've observed during code reviews, where a list is instantiated, populated and returned once ALL of the records have been constructed.
<pre class="brush: c-sharp;">
public class ContactListStore : IStore&lt;ContactModel&gt;
{
    public IEnumerable&lt;ContactModel&gt; GetEnumerator()
    {
        var contacts = new List&lt;ContactModel&gt;();

        Console.WriteLine("ContactListStore: Creating contact 1");
        contacts.Add(new ContactModel() { FirstName = "Bob", LastName = "Blue" });

        Console.WriteLine("ContactListStore: Creating contact 2");
        contacts.Add(new ContactModel() { FirstName = "Jim", LastName = "Green" });

        Console.WriteLine("ContactListStore: Creating contact 3");
        contacts.Add(new ContactModel() { FirstName = "Susan", LastName = "Orange" });

        return contacts;
    }
}

static void Main(string[] args)
{
    var store = new ContactListStore();
    var contacts = store.GetEnumerator();

    Console.WriteLine("Ready to iterate through the collection.");
    Console.ReadLine();
}
</pre>
<pre>ContactListStore: Creating contact 1
ContactListStore: Creating contact 2
ContactListStore: Creating contact 3
Ready to iterate through the collection.
</pre>
<div class="alert alert-warning" role="alert"> <b>Note:</b> The entire collection was loaded into memory without even asking for a single item in the list. </div>
<h4>Yield Store</h4>
The yield alternative is shown below, where each instance is returned as soon as it is produced.
<pre class="brush: c-sharp;">
public class ContactYieldStore : IStore&lt;ContactModel&gt;
{
    public IEnumerable&lt;ContactModel&gt; GetEnumerator()
    {
        Console.WriteLine("ContactYieldStore: Creating contact 1");
        yield return new ContactModel() { FirstName = "Bob", LastName = "Blue" };

        Console.WriteLine("ContactYieldStore: Creating contact 2");
        yield return new ContactModel() { FirstName = "Jim", LastName = "Green" };

        Console.WriteLine("ContactYieldStore: Creating contact 3");
        yield return new ContactModel() { FirstName = "Susan", LastName = "Orange" };
    }
}

static void Main(string[] args)
{
    var store = new ContactYieldStore();
    var contacts = store.GetEnumerator();

    Console.WriteLine("Ready to iterate through the collection.");
    Console.ReadLine();
}
</pre>
<pre>Ready to iterate through the collection.
</pre>
<div class="alert alert-info" role="alert"> <b>Note:</b> None of the contacts were constructed. This is due to the "deferred execution" nature of yield: an item is only constructed when the caller asks for it. </div>
Let's call the collection again and observe the behaviour when we fetch the first contact in the collection.
<pre class="brush: c-sharp;">
static void Main(string[] args)
{
    var store = new ContactYieldStore();
    var contacts = store.GetEnumerator();

    Console.WriteLine("Ready to iterate through the collection");
    Console.WriteLine("Hello {0}", contacts.First().FirstName);
    Console.ReadLine();
}
</pre>
<pre>Ready to iterate through the collection
ContactYieldStore: Creating contact 1
Hello Bob
</pre>
<div class="alert alert-success" role="alert"> <b>Nice!</b> Only the first contact was constructed when the client "pulled" the item out of the collection. </div>
<h4>Possible multiple enumeration of IEnumerable</h4>
Have you ever noticed the "possible multiple enumeration of IEnumerable" warning from ReSharper? ReSharper is warning us about a potential double handling issue, particularly for deferred execution functions such as yield and LINQ.
Have a look at the results produced from the code below.
<pre class="brush: c-sharp;">
static void Main(string[] args)
{
    var store = new ContactYieldStore();
    var contacts = store.GetEnumerator();

    Console.WriteLine("Ready to iterate through the collection");

    // First enumeration: Any() pulls the first item.
    if (contacts.Any())
    {
        // Second enumeration: foreach starts again from the beginning.
        foreach (var contact in contacts)
        {
            Console.WriteLine("Hello {0}", contact.FirstName);
        }
    }

    Console.ReadLine();
}
</pre>
<pre>Ready to iterate through the collection
ContactYieldStore: Creating contact 1
ContactYieldStore: Creating contact 1
Hello Bob
ContactYieldStore: Creating contact 2
Hello Jim
ContactYieldStore: Creating contact 3
Hello Susan
</pre>
<div class="alert alert-warning" role="alert"> <b>Note:</b> The first contact was constructed twice, hence the multiple enumeration warning. Where possible, avoid checking if a collection has items before looping; looping through an empty collection does no harm. Checking that the collection is not null before looping is highly recommended. Worse yet, calling .Count() before looping means that the entire collection will be built twice! </div>
<h4>IEnumerable.ToList()</h4>
<p>What if we have a requirement to materialize (build) the entire collection immediately? The answer is shown below.</p>
<pre class="brush: c-sharp;">
static void Main(string[] args)
{
    var store = new ContactYieldStore();
    var contacts = store.GetEnumerator().ToList();

    Console.WriteLine("Ready to iterate through the collection");
    Console.ReadLine();
}
</pre>
<pre>ContactYieldStore: Creating contact 1
ContactYieldStore: Creating contact 2
ContactYieldStore: Creating contact 3
Ready to iterate through the collection
</pre>
<p>Calling .ToList() on IEnumerable will build the entire collection up front.</p>
<h3>Comparison</h3>
<p>The list implementation loaded all of the contacts immediately whereas the yield implementation provided a deferred execution solution.</p>
<p>In the list example, the caller doesn't have the option to defer execution.
The yield approach provides greater flexibility since the caller can decide to pre-load the data or pull each record as required. A common trap to avoid is enumerating the same collection more than once, since yield and LINQ functions repeat the same work on every enumeration.</p>
<p>In practice, it is often desirable to perform the minimum amount of work needed in order to reduce the resource consumption of an application.</p>
<p>For example, we may have an application that processes millions of records from a database. The following benefits can be achieved when we use IEnumerable in a deferred execution pull-based model:</p>
<ul>
<li> <b>Scalability, reliability and predictability</b> are likely to improve since the number of records does not significantly affect the application’s resource requirements. </li>
<li> <b>Performance and responsiveness</b> are likely to improve since processing can start immediately instead of waiting for the entire collection to be loaded first. </li>
<li> <b>Recoverability and utilisation</b> are likely to improve since the application can be stopped, restarted or interrupted, or can fail, and only the items in progress are lost. Pre-fetching all of the data, by contrast, wastes the work of loading any results that are never used. </li>
<li> <b>Continuous processing</b> is possible in environments where new work is constantly being added to the stream. </li>
</ul>
<p>Posted by Jay Strydom on Thu, 03 Dec 2015 07:28:00 +1300. Categories and tags: Software Design, IEnumerable, Yield, C#, IList&lt;T&gt;, IEnumerable&lt;T&gt;, .NET.</p>
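<p>The multiple enumeration trap can be measured directly. The sketch below is illustrative only: the class, the counter and the method names are hypothetical and are not part of the downloadable source. It counts how many items a yield-based sequence builds when the caller checks Any() before looping, compared with materialising the sequence once with ToList() when it really does need to be read more than once.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the post's downloadable source code.
public static class MultipleEnumerationDemo
{
    // Counts how many items the iterator has constructed.
    public static int ItemsBuilt;

    // Deferred sequence: the body only runs while a caller is enumerating.
    public static IEnumerable<string> GetNames()
    {
        foreach (var name in new[] { "Bob", "Jim", "Susan" })
        {
            ItemsBuilt++;
            yield return name;
        }
    }

    // Any() starts one enumeration and foreach starts a second one,
    // so the first item is built twice.
    public static int BuiltWhenCheckingAnyFirst()
    {
        ItemsBuilt = 0;
        var names = GetNames();
        if (names.Any())
        {
            foreach (var name in names) { /* process the name */ }
        }
        return ItemsBuilt;
    }

    // ToList() enumerates exactly once; Any() and foreach then read the list.
    public static int BuiltWhenMaterialisedOnce()
    {
        ItemsBuilt = 0;
        var names = GetNames().ToList();
        if (names.Any())
        {
            foreach (var name in names) { /* process the name */ }
        }
        return ItemsBuilt;
    }

    public static void Main()
    {
        Console.WriteLine(BuiltWhenCheckingAnyFirst());  // 4
        Console.WriteLine(BuiltWhenMaterialisedOnce());  // 3
    }
}
```

<p>Materialising gives up deferred execution, so it is a trade-off rather than a default: when the data only needs to be read once, a single foreach with no Any() check avoids both the duplicate work and the up-front load.</p>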