Shaping and Transforming Data in Power BI

Evaluating and profiling the data

We shall explore the ways in which we can evaluate our datasets using the Power Query Editor. This can be opened by clicking on transform the data as soon as the navigator window opens after getting your datasets on to Power BI. Alternatively, you can also open power query editor using the transform data option in the home ribbon of Power BI Desktop.

In the view option in Power BI, we select column distribution, column quality, and column profile.

Column Distribution

Column distribution lets us view the distribution of the columns, indicating the distinct and unique values within the individual columns. Distinct values show the total of all the different types of values within the columns, and unique shows the total of all the records (rows) that are unique and only appear once in the column.

Column Quality

The column quality feature allows us to view the percentage of valid, error, and empty values in the column.

Column Profile

This feature allows us to view the value distribution, count, error, empty, distinct, unique, max, min and average values within a specified column. These options would be different depending on whether the column has text data type or alphabetical data type. You can only view the column profile one column at a time.

By default, the Power Query editor only shows these features for the top 1000 rows. This can be changed by clicking the option at the bottom of the page.

Reducing inconsistencies, unexpected or null values, and data quality issues

Renaming queries: Renaming the queries in a more end user friendly way instead of the naming conventions used in data warehousing. For example, DimProduct can be changed to Product.

Removing Errors: Error shown in column quality can be removed by right clicking the column header and selecting remove errors.

Removing null values: Null values within the columns can be replaced with their intended value or can be filled using the fill down option or fill up option depending on what you require.

Removing Extra Spaces- To remove the leading or trailing spaces in the text columns, right-click and select transform option and click on trim.

Changing data types

To change the data types of the columns, right-click the column header and change type. Alternatively, you can also change the data type by going in the home ribbon and selecting the data type.

Removing excessive columns: Removing unnecessary rows and columns that are not pertinent to our analysis reduces the size of the dataset and thereby increases the performance of our reports.

We can remove a column by right clicking the header and selecting remove or remove other columns depending on the columns we want to remove.

Use first row as headers: To promote the first row as headers, select the use first row as headers option from the home ribbon. Alternatively, you can demote headers by clicking on the drop-down arrow next to the use first row as headers to select Use headers as first row.

What do you think?

Related articles