Week 10: Dealing With Messy Data

This week, I finally found all the data sets and sources we will be using for the Freedom House project and started on the data cleaning process. Data cleaning and pre-processing is a crucial step in analytics; it is often said that data scientists spend around 80% of their time on it. Looks like I got a good taste of that. So, this is basically what I am doing in the data cleaning process:

  1. Merge the pre-2003 data with the post-2003 data (this is messy since they come in different formats and at different aggregation levels)
  2. Merge data related to freedom in disputed territories (e.g. West Bank and Gaza)
  3. Find and merge population data going back to 1973 and data about electoral democracies
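
To give a rough idea of what step 1 involves, here is a minimal Python sketch of bringing two differently shaped tables into one common format before combining them. Everything specific in it is an assumption made for illustration: the column names, the sample rows, and the idea that the newer data carries sub-scores that can be summed up to the level of the older data. It is not the actual Freedom House layout or the method used in this project.

```python
# Hypothetical sample rows; the real files have different columns and values.
old_rows = [  # assumed older shape: one aggregate score per country-year
    {"Country": "Exampleland ", "Year": "2001", "Score": "5"},
    {"Country": "Samplestan", "Year": "2002", "Score": "3"},
]
new_rows = [  # assumed newer shape: separate sub-scores per country-year
    {"country": "exampleland", "year": 2004, "sub_a": 2, "sub_b": 4},
    {"country": "samplestan", "year": 2004, "sub_a": 1, "sub_b": 1},
]


def normalize_old(row):
    """Map an older-format row onto the common schema."""
    return {
        "country": row["Country"].strip().lower(),
        "year": int(row["Year"]),
        "score": int(row["Score"]),
    }


def normalize_new(row):
    """Map a newer-format row onto the common schema.

    Summing the sub-scores is an illustrative assumption about how to
    aggregate up to the coarser level of the older data.
    """
    return {
        "country": row["country"].strip().lower(),
        "year": int(row["year"]),
        "score": row["sub_a"] + row["sub_b"],
    }


def merge_periods(old, new):
    """Combine both periods and fail loudly on duplicate country-years."""
    combined = [normalize_old(r) for r in old] + [normalize_new(r) for r in new]
    seen = set()
    for record in combined:
        key = (record["country"], record["year"])
        if key in seen:
            raise ValueError(f"duplicate country-year: {key}")
        seen.add(key)
    return sorted(combined, key=lambda r: (r["country"], r["year"]))


merged = merge_periods(old_rows, new_rows)
for record in merged:
    print(record)
```

Normalizing each source separately and only then combining keeps the format-specific quirks in one place per source, and the duplicate check catches overlapping years between the two files early.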

As for the MSI blog, we didn’t get any updates from the editor, which was, frankly speaking, very unprofessional. We have decided to reach out to BNE Intellinews, another news reporting site in Europe, to see if they are interested in publishing my blog. Let’s see what happens!

Published by Ashwed Patil

Graduate Student in Information Science interested in pursuing a career in data analytics for the public and non-profit sectors and international development.
