I do some pretty advanced Power Query work (yes, I know they call it “Get and Transform”, but it will always be Power Query to me) and in order to remove duplicates, it is common to use the “Group By” feature. This has the unfortunate requirement of requiring you to select an aggregation for the grouping.
Using this method, you would need to then have an additional step where you would delete the aggregate column before continuing. Although this doesn’t sound like a big deal, if you do the group operations multiple times, it could get redundant and make the total number of steps grow unnecessarily.
So how do we group the result set without having to create an aggregate column?
Well, you can’t do this through the GUI, but you can do it through the “Advanced Editor”. If you’ve never used the Advanced Editor, this is really where the “Power” of “Power Query” comes out.
To do the grouping:
- First, pull in the data. For our example, we are pulling in from a SharePoint list titled “Product Sales” which stores product sale information. For our example, we are looking to get a unique listing of products.
Here’s our list:
Within Excel, we bring in the list by selecting Data Tab > From Other Sources > From SharePoint List:
Then type in the URL to your SharePoint list and click “OK”.
In the Navigator screen that appears, select the list you want to group:
Then in Excel, you will see the pull of the list in its raw form and a number of extra columns we don’t need.
To only return the column to group (Title in our case), we select it and remove all other columns:
The resulting set looks like this:
- Do the grouping like you would with the aggregate column (say Count).
With this single column, we would select the “Group By” Action and select the column to group by (Title in our case), and click “OK”.
In the preview panel, you will see the grouped list with the aggregated column:
- Open “Advanced Editor” and locate the step you just created.
Clicking on the “Advanced Editor”, you see the row that we just created. You’ll notice the reference to our aggregate column in the text ‘{“Count”, each Table.RowCount(_), type number}’
-
Remove the entry for the aggregate column and leave an empty set.
We simply remove this text so that we have an empty set in the step:
-
Re-run the query to see that the aggregate column is not created and what remains is a clean set without duplicates.
After we re-run, there is no “Count” column and our Power Query code is clean.
That’s all there is to it. Happy Power Querying (or Get and Transforming…. <grumble><grumble>) !