As I was going through my list of projects on Trello board, I discovered a card in my backlog with the title “Data Cleansing – Excel multiple sheets”. That card somehow reminded me about an article named “10 Python Skills They Don’t Teach in Bootcamp” that I just read this morning. What a better way to give it a try! And the result is fabulous because these tricks totally made my day. Kudos to Nicole Janeway Bills for writing an amazing piece of work!
The dataset contains the US Monthly Retail Trade Report downloaded from the US Census Bureau website. It was an Excel file containing multiple sheets. Each sheet includes monthly sales of a specific year between 1992 and 2020. If you are interested in exploring the data, you may find the raw Excel file, Python data cleansing script and the final dataset on my GitHub.
Repeat after me, everyone! “I am unstoppable.” Well, I know it sounds weird when I mentioned mantra (instead of some technical or factual insights) as my biggest takeaway here. But here is why. I downloaded the dataset 2 months ago. I worked on it and was frustrated because something didn’t work. So I left it there and that’s why it remains in my backlog. So what has changed since then?
It’s all about the shift in mindset. If something doesn’t work after trying a few times, don’t fret. Take a step back and consider another approach. There are so many ways to solve a problem. And if I think through hard enough, try to understand each red error a bit more, eventually something will click. And voila! The sense of enlightenment is so real and fulfilling. Hehehe.
- Import and merge multiple Excel sheets in 1 DataFrame
- Cleanse and unpivot DataFrames with pandas
- Data Analysis
- Beware of subtotals and different levels of granularity that might be included in the dataset. Nobody wants to double count something.
- Be patient and read the background information, definitions, methodology and how the data is collected (if any).