By Dany Hoter, DataRails Solutions Architect
Microsoft Excel is quick, but complex calculations can sometimes take more time than expected.
This was the problem that a customer faced when I was asked to step in and help. The original file took several minutes to calculate, and after some analysis, I concluded that the bulk of the processing time was spent calculating a single column. Using alternative methods, I was able to cut the time significantly, from 10 seconds to 0.1 second.
The task was to add a column that identified whether the current row contained the largest sale for the city in the row.
The file that simulates the original task contains 10,000 rows, each one representing a sales transaction. You can download the file here to see the various solutions I attempted.
I included the sales amount and the city in which the sales took place. Since I’m a fan of tables, I chose to implement all formulas using table notation, but doing so makes no difference in terms of performance.
Solution 1: Array Formula
The original method to accomplish the task used the following array formula:
Note that [City] references the entire column, while [@City] is the value of the column city in the current row. The formula used the IF function for the entire table, returned all amounts for the current city to the Max function, and the Max value was compared to the amount in the current row.
The formula worked correctly, but took 10 seconds to perform—a long time for a relatively small table.
Solution 2: MAXIFS
Many users are familiar with the SUMIFS and COUNTIFS functions, which were added to Excel in 2007 and provided the ability to apply multiple conditions. After Excel 2016, more functions were added, including the very useful MAXIFS.
This method, which took 2.7 seconds to perform, was faster than the array formula and was also easier to understand. The only problem with this method is that not everyone has the MAXIFS function in their versions of Excel.
Solution 3: GETPIVOTDATA
This solution took just 0.1 seconds to perform; however, it requires some preparation.
GETPIVOTDATA is a function that was introduced in Excel 2003. Most users hated it when they first saw it, and most of them have been trying to avoid it since. Admittedly, I didn’t find the function very useful until very recently. I even remember showing users that there is a tool that can turn off the feature of generating GETPIVOTDATA, and how grateful they were. (This feature can be found in PivotTable tools/Analyze/options/Generate GetPivotData.)
Unlike other new functions, this one actually changed the behavior of Excel. Since 2003, when you start writing a formula and point to a cell that contains a value in a pivot table, Excel creates a reference that looks like
instead of just
This particular GETPIVOTDATA function that I created returns the value of SalesAmount from a pivot that starts at cell A3 for customers in the city of Berkeley and black products.
There are three main drawbacks to this function:
- All the generated arguments are string literals within double quotes like “Black” and “Berkeley.” If you generate such a formula and try to drag it down, the generated copies return exactly the same value because the formula contains no relative references.
- All the referenced values from the pivot need to stay visible in the pivot. If you filter out any values that are referenced, the formula will return to #REF.
- The pivot table needs to be refreshed every time the data changes. If the data changes frequently, it may be advisable to refresh the pivot from VBA any time the data changes.
The first drawback can be remedied by replacing the static values with a regular reference to some cell, as you can see in the formula above when [@City] is used for the city in the current row. The second drawback has no real solution, and so one needs to make sure that all referenced values remain visible. In addition, the refresh issue has no easy solution.
On the other hand, the actual aggregation work is performed extremely fast and only during refresh, which is also very rapid. The larger the table and the greater the number of conditions applied, the bigger the difference. In this specific case, a 100X improvement in performance can definitely make a difference and is worth the extra cost of building the pivot and refreshing it.
In the above pivot, the value uses the MAX aggregation, but any other aggregation supported by pivot tables can be used.
About the author:
Dany Hoter is a renowned Excel guru who works as a solutions architect at DataRails. Dany has over two decades’ experience working as an expert for the Microsoft Excel team, and his online Excel classes reach over one million students across the globe.