Data cleaning is an important aspect of data analysis. Most data analysts encounter lots of errors while getting data ready for analysis and spend more than 70% of their time during the data cleaning process. An error can lead to a greater problem for the company and as such, data analysts should make sure that their data set is correct, and accurate, before carrying out an analysis.
Before cleaning a #data set it is always important to make a copy of the data set.
1. Creating duplicate sheet
a. Right-click on the sheet i.e sample+ data
b. Click on Duplicate Sheet
c. Rename the sheet by double-clicking on the name of the sheet or right-clicking and selecting rename #worksheet.
2. Getting rid of extra spaces in cells with string data
In #Excel, you can use the TRIM command to get rid of white spaces.
Steps:
a. Highlight the data in the spreadsheet.
b. Click on the Data tab, then hover over Data cleanup and select Trim whitespace.
3. To eliminate rows with blank cells
Steps:
a. Highlight all cells in the #spreadsheet. You can highlight Columns A-G by clicking on the header of Column A, holding Shift, and clicking on the header of Column G.
b. Click on the Data tab and pick the Create a filter option.
c. Every column now shows a green triangle in the first row next to the column title. Click the green triangle in Column C to access a new menu.
d. On that new menu, click Filter by condition and open the dropdown menu to select Is empty. Click OK.
e. You can then review a list of all the rows with blank cells in that column. Select all these cells and delete the rows except the row of column headers.
f. Return to the Filter by condition and return it to None.
4. To Transpose Data
Steps:
a. You can do this by highlighting Columns A-G.
b. Right-click on the cell where you want the transposed data to start. Hover over Paste Special from the right-click menu. Select the Transposed option.
5. To Remove Duplicate Data
Steps:
a. Highlight the data in the spreadsheet.
b. Click on the Data tab, then hover over Data cleanup and select Trim whitespace.
c. Click on Data has a header row and the columns to remove duplicates.
CONCLUSION Knowing how to fix specific data either manually, with spreadsheet tools, or with #functions, is extremely valuable.
Once you’ve determined that the data is clean and compatible, you can start using it for analysis.
Data cleaning is so important because even one mistake can ripple throughout an organization, causing the same error to appear again and again. This leads to incorrect results.
Comments