Satalyst Brilliance on 30 Apr 2015

Power Query: How to iterate over a list

If you’ve been using Power BI or you’ve seen the demos, then like me, you’re probably excited about Power Query. As a data nerd, I’ve been having a lot of fun pulling data sets off the web and mashing them up. Getting web data used to be quite a manual task for me, since I wasn’t about to write a custom screenscraper. I really love the “From Web” functionality that mines web pages for HTML tables. But I’m a developer, so of course I wanted to load a list of files, or a list of websites, and I had some trouble figuring out how to do it. Hopefully my experiences will help someone else.

One of the tricks that I found was that the list that you want to iterate over has to be pulled into Power Query designer. So let’s start with looping over a list of files – this is something that EVERYONE will want to do at some point. So first, launch the Power Query designer from the Power Query ribbon, and use the folder that you want as your data source:
Power Query Folder Source

As you can see, you get a lot of meta data about your folder contents. The most important column is the “Binary” column, which stores the actual data in the files.
SourceStep2
Another thing to look out for: all of your files must be in the same format, otherwise when you try to load them you get an error. I’ve got two that don’t conform, so I’ll filter those two rows out using the “Remove Rows” function on the Home ribbon of the Query editor.

So how do you load that data? You need a custom column to hold the data. When you use the designer, behind the scenes it’s writing M code, which you can access from the Advanced Editor. If you have the formula bar showing, it also shows you the M for each step that you use. So when you insert a custom column, this is what you see:

Custom  Column Editor

 

In the formula bar, you’ll see that Power Query has added a lot. Table.AddColumn is pretty self-explanatory. “#”Removed Other Columns” is the name of my last step as generated by the designer, where I removed all columns but the file name and the Content. With M, I can perform an operation on any step along the way, so it’s important to check that you’re using the right one. The tricky part is this – you would think you are writing your formula in the dialog, but really you’re adding whatever is in the dialog to the end of the formula in the formula bar, where the ellipsis is. So this step ends up looking like this behind the scenes:

#"Removed Other Columns" = Table.AddColumn(#"Filter Table", "Custom", each Table.PromoteHeaders(Csv.Document([Content])))

 


The meat of this is the each Csv.Document([Content]), which just gets the loads up the data from the Content column into a nested table in your new custom column. I’m promoting the headers at the same time just to make it cleaner.
How to expand the columnOnce the data is loaded into your new custom column, all you have to do is expand it by clicking on the icon to the right of your column name. Since we’ve promoted the headers already, you’ll end up with all of your files fully loaded, headers in place.

 

The final data looks like this:

Data Expanded

 

And the final M code for this project sample looks like this:

let
   Source = Folder.Files("C:\MyFolderPathHere\Data"),
   #"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Name"}),
   #"Filter Table" = Table.SelectRows(#"Removed Other Columns", each Text.StartsWith([Name], "FuelWatchRetail")),
   #"The Formula" = Table.AddColumn(#"Filter Table", "Custom", each Table.PromoteHeaders(Csv.Document([Content]))),
   #"Expand Custom" = Table.ExpandTableColumn(#"The Formula", "Custom", {"PUBLISH_DATE", "TRADING_NAME", "BRAND_DESCRIPTION", "PRODUCT_DESCRIPTION", "PRODUCT_PRICE", "ADDRESS", "LOCATION", "POSTCODE", ""}, {"PUBLISH_DATE", "TRADING_NAME", "BRAND_DESCRIPTION", "PRODUCT_DESCRIPTION", "PRODUCT_PRICE", "ADDRESS", "LOCATION", "POSTCODE", ""}),
   #"Removed Columns" = Table.RemoveColumns(#"Expand Custom",{"Content", "Name"}),
   #"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"PUBLISH_DATE", type date}, {"PRODUCT_PRICE", type number}})
in
#"Changed Type"

 

Next time: Creating a parameterized function in M to make multiple web calls.

 

Thanks for reading,

Leah

Categories:
Tags: