Part of the General Assembly Data Science Bootcamp Series
Week 2 - Exploratory Data Analysis in Pandas
The advantage of already being proficient in one or more programming languages is that learning Python (or any other programming language, for that matter) is relatively easy. I say that with a bit of a caveat, though. Python is quite simple to pick up; however, it has a reputation for being quite terse. But as with anything, practice makes perfect.
Coming from a C/C++/C# background, I find that many of the language constructs also exist in Python, so no problem there. However, when learning list comprehensions, for example, the C# developer in me automatically defaults to loops (almost any problem can be solved with one) or LINQ. The Pythonic way, though, is to reach for a list comprehension first rather than a looping construct.
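To make the difference concrete, here is a minimal sketch using a made-up list of numbers: the same "square the even numbers" task written first as a C#-style loop and then as a list comprehension. The filter clause at the end of the comprehension plays roughly the role that `Where` plays in a LINQ `Where(...).Select(...)` chain.

```python
# Hypothetical toy data, chosen only for illustration.
numbers = [1, 2, 3, 4, 5, 6]

# Loop-first approach, the way a C# developer might instinctively write it.
squares_of_evens_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_of_evens_loop.append(n * n)

# Pythonic equivalent: a list comprehension with a filter clause.
squares_of_evens = [n * n for n in numbers if n % 2 == 0]

print(squares_of_evens)  # [4, 16, 36]
```

Both versions produce the same list; the comprehension just states the transformation and the filter in a single expression.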
In the second week, we were introduced to Pandas, a general-purpose data analysis and manipulation library built on top of Python. It is one of the libraries that will be a staple in every data science professional's toolkit. Like Python, it is open source and has a very active community, a tell-tale sign of a good library.
Before Python, I would have reached for something like Excel or Google Sheets to do data analysis. However, after briefly using Pandas, I can quickly see the value this tool adds to the process. It is known in the community as the Swiss Army knife of data manipulation.
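As a rough illustration of the kind of first-pass exploratory analysis Pandas makes easy, here is a small sketch. The dataset is entirely hypothetical (made-up cities, prices, and units) and the steps are just common EDA moves, not the exact exercises from the course: check the shape and types, count missing values, summarize the numeric columns, and build what is essentially a spreadsheet pivot table with `groupby`.

```python
import pandas as pd

# Hypothetical toy dataset, made up purely for illustration.
df = pd.DataFrame(
    {
        "city": ["Sydney", "Sydney", "Melbourne", "Melbourne", "Brisbane"],
        "price": [120.0, 80.0, 95.0, None, 60.0],
        "units": [3, 5, 2, 4, 6],
    }
)

# First look: number of rows and columns, and the type of each column.
print(df.shape)  # (5, 3)
print(df.dtypes)

# Count missing values per column (the None above becomes NaN in "price").
print(df.isna().sum())

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
print(df.describe())

# Spreadsheet-style pivot: mean price (NaN skipped) and total units per city.
summary = df.groupby("city").agg(
    mean_price=("price", "mean"),
    total_units=("units", "sum"),
)
print(summary)
```

The last step is the one that most reminds me of building a pivot table in Excel, except that it is a single line of code that can be rerun whenever the data changes.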
Unit Project due after this weekend
GA’s 10-week course is organized into 4 units. This week marks the end of Unit 1, which means the cohort has to work on and submit the first Unit Project.
As I am between jobs at the moment, I have lots of time on my hands. Believe me, I have already spent hours this week catching up on my favorite shows on Netflix, Amazon, and Apple TV! More importantly, I have also spent some quality time working on and completing the current Unit Project.
The Unit Project is a 4-problem Python coding exercise. The problems are nothing complicated, but they are not trivial either. It takes focused time and lots of practice to build the confidence required as the course progresses. This is the best chance to devote time to learning the language, because from next week things can get really fast and complicated, and I want the language issues out of the way by then.
Ooh, the Capstone Project…
The Final Project is what excites me most about this course. I know it will be difficult, especially as it is something I have not done before, but I am really looking forward to going all in and giving it all I’ve got. Bear in mind that this is still a part-time course, and the other aspects of my life (family, work, physical fitness, etc.) are still there and need a slice of my time. So it will be interesting, but I’m pumped and ready.
Even this early, we have been told that the Capstone Project proposal is due by the end of Unit 2, and that we had better start thinking about which real-world problems we want to solve. It should be something we are really into and familiar with, since working on something unfamiliar will make it more difficult than it needs to be. The course instructor and the instructional associate are there to help us choose an appropriately scoped project that can be completed in the short 10-week period.
We will really need guidance at this stage, since as data science novices we do not yet have the intuition to identify which class and scope of problems can be solved in this 10-week window.
There are countless Final Project possibilities based on readily available datasets, for example:
- Amazon Pricing Data
- Health Insurance Marketplace
- Instacart Orders
- Consumer Loan Data
- Fuel Economy Data
- Craft Beers Data
- All Trump Tweets
- Choose your own dataset
I will likely choose my own dataset, one that is more in line with my interests.