Use tables and functions to clean up data

Jan 30
Use tables and functions to clean up data

Excel is often the go to application when it comes to moving data from one form to another. Excel is an excellent method of managing and formatting data. Data can be cleaned up from data exports. It can be used to report or analyze data using tables, charts and pivottables to get to the very depths of the data and answer almost any question. Often when data is removed from it's native storage application there can be many problems with the data. These complications can be caused by old applications, poor data entry practices, or not many options for exporting that data while maintaining formatting.

Excel has several features that help quickly clean up the data allowing you to more easily analyze and work with it. Often data exported from a database has the option of saving as a selection of file types. Going straight from a database into Excel is ideal but not always an option. Sometimes you must choose output like a CSV (Comma Seperated Value) or a Tab delimited file. Both are types of text files and both can be edited in Excel however, you want to make sure you save the file as Excel otherwise you may lose some of the edits you make to your data. If the file you are opeing is CSV or tab delimited start Excel and open the file by browsing to it using File, Open. If you attempt to open the file by double clicking it through an explorer window it may open another application related to a text file like WordPad or Note Pad so opening Excel first and opening the file into Excel is the recommended practice.

When you open a text file in Excel, a wizard will automatically start to help import the data.


On the first screen of the wizard you can choose the type of file either Delimited or Fixed width. You can also check the box if there is a header row in the data that is being imported. If that data to be imported is not in row 1 then you can also choose what row to start importing data. This is helpful if the file is coming from a report that had titles about the actual data. Don't forget to start the import on the row of the header data and not the first row of data.


The second screen of the import wizard allows you to choose what type of delimiter is used, and to select the checkbox to treat consecutive delimiters as one. Often in text files additional tabs or spaces are added to make the columns of data line up visually which could cause it to fall into the wrong column when the data is imported.


The final screen of the wizard allows you to change the data type of any columns that might need to be listed as date, currency, number etc. You can also skip importing select columns of data. Once the Import has been completed you may find that the data still needs a little work. Because the imported file is still in the text format it is a good idea to F12 and save the data as an Excel workbook before proceeding so that all of the changes to the data are supported.


For instance, this portion of data shown above has spaces in front of text and it also has some cells that contain multiple spaces between the words in a single cell. This type of issue can cause problems when consolidating data, or applying certain types of formatting like subtotals and conditional formatting. We will use the TRIM formula to fix this problem.


The blue box above shows the data as it was imported. The green box shows how the data looks after we have used the the TRIM formula. By using the TRIM function we can select the cell with the extra spaces and TRIM removes them from before, between, and behind the text in the cell. The formula can easily be copied down the entire column to quickly correct all of the imported data. Then the TRIM column needs to be copied and pasted using paste special values to replace the original column of data. The formula can then be deleted. If this step is skipped and the original data is deleted the formula will stop working.


Now that the spacing has been corrected we can turn our attention to other data problems like duplicates. In order to easily find and remove duplicates we have two options. We can highlight them using Conditional Formatting, or we can remove them altogther. Either way working with data in lists is eaiser to handle when formatted as a table so we will do that next.


Select the list and type Ctrl+T to make it a table. Check the box for My table has headers and that the selected range is correct then click OK. The data will now be formatted with alternating colors of blue and white rows by default.


Click the remove duplicates button on the Ribbon and select the column to look for duplicate data. The message will tell you how many lines are removed, but you do not get any indication of which lines have been deleted. Another alternative is simply to highlight the duplicate values using Conditional formatting.


 Select the Conditional Formatting button from the Home tabl and point to Highlight Cells Rules and Dublicate Values.


Now you can see the lines that are duplicated and can use the remove duplicates feature if you still want to remove the lines. Combining the use of Functions, Conditional Formatting and Tables are some great features in Excel that make data cleanup a much easier process.