London skyline
Join Us in London for GAIM Ops!
June 26, 2018
private equity
Digital Transformation in the Private Equity and Venture Capital Realms
July 11, 2018

Power Query Beyond the UI

Excel expert

by Dany Hoter, DataRails Solutions Architect

Experts debate endlessly about the direct use of M language, which is the script language behind Power Query.

The Power Query tool is a UI in front of the M language. This UI is covering more and more areas of functionality, but there is still a lot more power locked in the language. This leads to some people turning to straight use of the M language. A great example from way back when Power Query was not even called Power Query is this article from Chris Web (2013).

Many times in the past, I told people, “Instead of learning M, get a life.” What I meant was that M is hard, and Power Query is already very rich; therefore, spend your time learning more useful tools such as DAX.

Nowadays, I’m more open to a limited kind of using M that tweaks the generated M, as opposed to actually writing new M statements.

Anytime you find yourself opening the Advanced Editor, ask yourself if you have a good reason for that, and if the person that will need to maintain this solution after you will be able to follow what you did. If you apply some tweaking on the formula bar, you are probably in a safe place.

That is why I was happy to see that I can a solve a non-trivial problem without opening the Advanced Editor once.

Task on hand

I created a dataset with annual total sales for customers. The original dataset was much larger and came from a slow web source.

Every customer/year combination has a row, but many rows have zero as the value. I want to filter all the rows that have zero as the value, but after the filter, my pivoted results are skewed.

I pivot on year, and if the first customer has non-empty data for 2005 and 2008 and the second customer for 2006 and 2009, the order of columns of the result pivot will be 2005, 2008, 2006 and 2009. If no customers have data for a specific year, this year will be absent from the pivoted result. In this dataset, there are no sales for 2009.

I want all years to appear in the pivot and in the right order. On the other hand, I want to filter out the rows with zero values in order to read less rows from the source.

The solution I came up with  leaves all data for the first customer and removes empty rows from all other customers. In this way, all years are represented for the first customer and appear in the right order in the pivot. (Input data is sorted by customer and year.)

Steps

First, I want to extract the key of the first customer. The data is already in the Excel file, so I start a query from table. Instead of continuing with the rest of steps, I remove all but the first row, right-click the customer field and then right-click/drill down. This creates a simple value named Customer with a value of C1, which is the first customer key.

Now I need to continue with the regular steps, so I insert a new step from the Fx icon in the formula bar. I change the created line to = Source, which brings back the table with all rows in the original Excel table.

Now I filter on two columns: Amount <>  0 and Customer= “C1”

The generated line I see in the formula bar is:

= Table.SelectRows(Custom1, each ([Amount] <> 0) and ([Customer] = “C1”))

The tweak : I change the “and” to “or” and the second filter argument to use Customer, which is the variable I created in the beginning:

= Table.SelectRows(Custom1, each ([Amount] <> 0) or ([Customer] = Customer))

This filter cannot be created from the UI because it combines filtering on two different columns with an “or” condition and also uses a variable for comparison. It is still pretty easy to read and understand.

I want all rows from the first customer and all other non-zero rows. This guarantees that all years are found and in the right order, as you can see in the pivoted result.

So the two kinds of tweaks are the second use of source in the query and the manual change to the filter expression. The UI was nice enough to combine filtering on two columns to one filter statement and I changed the “and” between two columns to “or” and the filter operand from a fixed value to a variable one.

In my opinion, this level of tweaking is acceptable, and balances the need to create clear and maintainable solutions and the geeky urge to use more of the power of the M engine.

dany

 

About the author:

Dany Hoter is a renowned Excel guru who works as a solutions architect at DataRails. Dany has over two decades’ experience working as an expert for the Microsoft Excel team, and his online Excel classes reach over one million students across the globe.

Request a demo
[contact-form-7 404 "Not Found"]