In this article, we shall look into the best practices to clean, prepare, and transform your data for analysis. Clean, transformed data before we model ensures that our data is error-free and makes it easier to form relationships and create measures.
Duplicate columns: You can duplicate the columns by right clicking the header of the column and selecting duplicate column.
Split columns: You can split the columns by right clicking the header and selecting split columns, you will be prompted to specify the delimiter. Delimiter is defined as the sequence of one or more characters used to specify the boundary between one or more characters. Alternatively, you can also split the column based on the number of characters or by position or by lowercase to uppercase or by digit to non-digit etc.
Remove duplicates and replace values: Right click the column header and then select Remove duplicates to remove duplicate values within the column. To replace values in the column, select the column header and click on Replace values. Replace values dialogue box will appear, and you will be prompted to specify the Values to find and the Values to replace.
Group by: Group by function can be used to aggregate values across multiple rows/columns into one. We can find the group by icon in the transform ribbon. You can perform the following operations using group by: Sum, Average, Median, Min and Max, Percentile, Count, Distinct etc.
- Select the Advanced option, so you can select multiple columns to group by.
- Select the column you need to group by.
- Select Add grouping.
Unpivot columns: Unpivoting columns is a data transformation process that converts wide-format data (where multiple columns represent different attributes) into a long-format structure. In this format, the unique values from the column headers become values in a new column, while their corresponding data is organized into another column. To unpivot columns in Power BI, select the columns you want to unpivot in the Power Query Editor. Right-click on the selected columns and choose “Unpivot Columns.” This transforms the data, converting column headers into values in a new column, making it easier to analyse in a tidy format.
Rename: You can rename a column by right clicking the column and selecting rename or double clicking the title to trigger an edit.
Uppercase, Lowercase, Trim, capitalize each word: Right click the column header and select Transform option to transform the text into uppercase or Lowercase. You can also trim trailing or leading whitespaces and capitalize each word.
Append: We use the append feature in Power BI to add rows into our queries. This option can be found in the home ribbon in Power Query Editor. You can select Append Queries which appends the query specified with the existing query. You can also select Append Queries as New to create a new query that has the rows from the combined queries. You will be prompted to select the query, and you can click on Add.
Merge: Merge Queries is a feature that allows you to combine data from two or more tables into a single table based on a related column. This is particularly useful for integrating data from different sources or enhancing your dataset with additional information. Like append, you can either merge queries within the existing query or create a new query that has the combined data within it (Merge Queries as New)