Reading Excel Files

There are multiple methods in HuLib for reading excel files. The latest addition is the BufferedExcelReader. BufferedExcelReader is meant to replace older methods as it is much much more performant. It achieves this by grabbing all excel data in bulk at the beginning and then providing methods to interact with the reader directly to get data out.

Creating an instance

To create an instance, just call the constructor with the filename:
BufferedExcelReader reader = new BufferedExcelReader(fileName);
Now you are ready to read from it! If you want to switch the sheet you are reading you can use:
reader.GoToSheet("MySheetName");
Note that initially, the sheet will be the active workbook sheet.

Reading data directly

There are two main ways to do the reading.

1. Using the reader

This is convenient when column numbers can change and you want to read data row by row and not have to keep track of the position manually:
string[] readLine = reader.ReadStrings();
object[] readData = reader.ReadLine();
ReadStrings() will convert everything to a string while ReadLine() will give you the raw types. Each time you call this the program will jump to the next line so you just have to deal with the data that comes out of it.

2. Use the sheet directly

If you want to use direct locations then you can use the sheet directly:
BufferedExcelReader.Sheet sheet = reader.SelectedSheet;
object read = sheet.Read("B8");
read = sheet.Read("B", 8);
read = sheet.Read(8, 2);
These 3 methods return the same thing, note that the third call is using the more common order of arguments where row is first (looks reversed from the first 2 methods).

Mapping with BufferedExcelReader

The quickest way to read tables off of Excel is to take advantage of the Mapping framework.
To use the mapping framework we need 2 things: A model and an object that implements the mappable interface. In this case a Sheet is a mappable object so we can use it to generate classes for us.

1. The Model

The model is just a class with properties, the only difference is we must define the Mapping attribute on each property to identify the column that will contain the data.
public class ReceiptLine
{ 
 [Mapping("A")]
 public string Store { get; set; }
 [Mapping("B")]
 public string Invoice { get; set; }
 [Mapping("C")]
 public string Date { get; set; }
}
You can see that the argument for the Mapping attribute is the column that the data will be pulled from.

2. The Mappable Object

Now we want to populate a collection of ReceiptLines from an excel file, this is trivial:
BufferedExcelReader reader = new BufferedExcelReader(fileName);
BufferedExcelMap bufferedExcelMap = reader.SelectedSheet.GetMap(9);
List receiptLines = bufferedExcelMap.CreateClasses();
First we create the reader, then we use the selected sheet and call GetMap - this returns an object that we use for mapping, the argument 9 is the row where the data starts (not the header). In this example A9 will be the start of the data. Using this BufferedExcelMap object we just use CreateClasses() to create all the rows of data as we would from other mappable objects.

Performance

Comparing this to using the older excel method that did not preload data on a file that had only about 200 rows:
Old method: 49 seconds
New method: 2 seconds - note that now the time is spent in the constructor, the mapping part is very quick
The difference is entirely due to reducing the number of interop calls into excel by fetching all data at the start.


Posted by Mike Hall on October 13, 2017 at 12:43 PM UTC
Edited on October 13, 2017 at 01:18 PM UTC
Public Comments
comments powered by Disqus