Overview

Part of the General Assembly Data Science Bootcamp Series

Week 2 - Exploratory Data Analysis in Pandas

The advantage of already being proficient in one or more programming languages is that learning Python (or any other programming language, for that matter) is relatively easy. I say that with a bit of a caveat, though. Python is quite simple to pick up; however, it has a reputation for being quite terse. But like anything, practice makes perfect.

Coming from a C/C++/C# background, I found that many of the language constructs also exist in Python, so no problem there. However, when learning list comprehensions, for example, the C# developer in me automatically defaults to loops (almost any problem can be solved with them) or LINQ. The Python way is to reach for a list comprehension as a first preference over looping constructs.
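To make the contrast concrete, here is a minimal sketch of the same task written both ways. The `numbers` list and the variable names are made up for illustration; they are not from the course material.

```python
# Hypothetical sample data, invented for this illustration.
numbers = [1, 2, 3, 4, 5, 6]

# The loop-first habit from C#: build the result step by step.
squares_of_evens_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_of_evens_loop.append(n * n)

# The list comprehension: filter and transform in a single expression.
squares_of_evens = [n * n for n in numbers if n % 2 == 0]

print(squares_of_evens_loop)  # [4, 16, 36]
print(squares_of_evens)       # [4, 16, 36]
```

For fellow C# developers, the comprehension reads roughly like `numbers.Where(n => n % 2 == 0).Select(n => n * n)` in LINQ, except that the transform comes first and the filter comes last.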

In the second week, we were introduced to Pandas, a general-purpose data analysis and manipulation library built on top of Python. This is one of the libraries that will be a staple for every data science professional. Like Python, it is open source and has a very active community, a tell-tale sign of a good library.

Before Python, I would have picked something like Excel or Google Sheets for data analysis. However, after having briefly used Pandas, I can already see the value this tool adds to the process. It is known in the community as the Swiss Army knife of data manipulation.

Unit Project due after this weekend

GA’s 10-week course is organized into 4 units. This week is the end of Unit 1, which means that the cohort will have to work on and submit the first Unit Project.

As I am in between jobs at the moment, I have lots of time on my hands. Believe me, I have already spent hours this week keeping up to date with my favorite streaming shows on Netflix, Amazon, and Apple TV! More importantly, I have also spent some quality time working on and completing the current Unit Project.

The Unit Project is a 4-problem Python coding exercise. The problems are nothing complicated, but they are not trivial either. One will need some quality time to focus and get lots of practice to build the confidence required as the course progresses. This is the best chance to devote time to learning the language, because from next week the course can get really fast and complicated, and I want the language issues out of the way by then.

Ooh, the Capstone Project…

The Final Project is the thing that excites me most in this course. I know it is difficult, especially for something one has not done before, but I am so looking forward to going all in and giving it all I’ve got. Bear in mind that this is still a part-time course, and the other aspects of my life (family, work, physical fitness, and so on) are still there and need a slice of my time. So it will be interesting, but I’m pumped and ready.

This early, we have been made aware that the Capstone Project proposal will be due by the end of Unit 2, and that we had better start thinking about what real-world problems we want to solve. The problem should be something we are really into and familiar with, since working on anything else will make the project more difficult than it needs to be. The course instructor and the instructor associate are there to help us choose an appropriately scoped project that can be completed in the short 10-week period.

We will really need guidance at this stage. As data science novices, we do not yet have the intuition to identify what class or scope of problem can be solved in this 10-week window.

There are countless Final Project possibilities based on readily available datasets, for example:

  • Amazon Pricing Data
  • Health Insurance Marketplace
  • Instacart Orders
  • Consumer Loan Data
  • Fuel Economy Data
  • Craft Beers Data
  • All Trump Tweets
  • Choose your own dataset

I will likely choose my own dataset, one that is more in line with my interests.
