Data Science Bootcamp - Week 3 - Full Stack Developer Tips

Jose 𝒥𝒪 Reyes

Machine Learning ★ AWS Community Builder ★ Master of Data Science student @ UNSW ★ Author, fullstackdeveloper.tips

Part of the General Assembly Data Science Bootcamp Series

Not Hotdog App

I still remember the moment that piqued my interest in Data Science. It was probably a couple of years ago when I saw this segment from the popular HBO sitcom Silicon Valley, even though I wasn't following the series. I don't remember why, but I found myself watching it on YouTube. It was hilarious when Jian Yang first demoed the Not Hotdog app. When I discovered that it was actually a real app developed for the series, I couldn't resist: I had to find out exactly how they made it.

Week 3 - More Pandas

This week, we spent more time getting deeper experience with Pandas: how data scientists use it to slice and dice data, and how to use it effectively for exploratory data analysis. The more we use it, the more we appreciate how indispensable it is at this stage of the end-to-end data science process. One can also understand why data scientists love using Jupyter notebooks at this stage.
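
To make "slice and dice" a little more concrete, here is a minimal pandas sketch of the kind of first-pass exploration we practised. The DataFrame, its column names (state, dwelling, median_price) and all the numbers are made up for illustration; they are not from the course material or any real dataset.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame({
    "state": ["NSW", "NSW", "VIC", "VIC", "QLD", "QLD"],
    "dwelling": ["house", "unit", "house", "unit", "house", "unit"],
    "median_price": [1_200_000, 750_000, 900_000, 600_000, 700_000, 450_000],
})

# Slice: keep rows matching a boolean condition, and only the listed columns
houses = df.loc[df["dwelling"] == "house", ["state", "median_price"]]

# Dice: aggregate by group to compare states
by_state = df.groupby("state")["median_price"].mean()

# Summary statistics (count, mean, std, quartiles) for a quick first look
summary = df["median_price"].describe()

print(houses)
print(by_state)
print(summary)
```

In a Jupyter notebook each of these steps would typically live in its own cell, so the intermediate result can be inspected before moving on.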

The course instructor often says that in data science, one needs to build an intuition for finding problems that are worth solving, whose solutions have an impact, and for judging whether the data available to us is of good quality. We can have all the volume of data we want, but if it is no good, it still belongs in the rubbish bin. My goal by the end of this course is not only to complete the Capstone project but, more importantly, to understand how to build the intuition he keeps talking about.
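
Judging data quality starts with a few cheap checks before any modelling. The sketch below is one possible quick report, assuming pandas; the helper name quality_report and the tiny raw DataFrame are hypothetical, chosen only to show missing values, distinct counts and duplicate rows.

```python
import numpy as np
import pandas as pd

def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-column % missing, distinct values and dtype."""
    return pd.DataFrame({
        "missing_pct": df.isna().mean() * 100,  # share of NaN/None per column
        "n_unique": df.nunique(),               # distinct non-null values
        "dtype": df.dtypes.astype(str),
    })

# Made-up data with a missing suburb, a missing price and one duplicate row
raw = pd.DataFrame({
    "suburb": ["A", "B", None, "A"],
    "price": [500_000.0, np.nan, 650_000.0, 500_000.0],
})

report = quality_report(raw)
duplicate_rows = int(raw.duplicated().sum())  # rows identical to an earlier row

print(report)
print("Duplicate rows:", duplicate_rows)
```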

Capstone Project proposal due at the end of next week

With the Capstone Project proposal due at the end of next week, I've been thinking about different options, inspecting several public datasets, and researching problems that people (myself included) experience that data science could solve. Because my data science intuition still needs some work, I need to come up with several ideas: not all of them will be good, and some may not be achievable in the limited time I have before the end of the bootcamp.

The following are the possibilities:

Amazon Pricing Data

I've been interested in pricing-related Amazon data since I looked into Fulfilment by Amazon (FBA) a while back. There are several questions that third-party sellers would want answered, such as:

what the optimum selling price is for your chosen category
what the best strategy for product launch is
which version of a product description page will convert better
which keywords to bid on, and how much to bid, for the most effective PPC campaign
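
Comparing two product description pages is usually framed as an A/B test. One common way to check whether a difference in conversion rate is more than noise is a two-proportion z-test; the sketch below assumes reasonably large samples (so the normal approximation holds), and the visitor and conversion counts are hypothetical.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: page A and page B convert equally."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: 2,400 visitors per page variant
z, p = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (for example below 0.05) suggests page B's higher conversion rate is unlikely to be chance alone; it says nothing about whether the uplift is large enough to matter commercially.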

Residential Property Price Index Dataset

Australian house prices are notorious worldwide for being overpriced and out of reach for many. The Australian Bureau of Statistics (ABS) publishes a public dataset showing the historical property price index for each state from mid-2000 up to the present. Using this information, combined with data from other datasets, we want to:

Find out when and where best to purchase your residential property
Predict house/unit prices 3 months from now
Find out the best locality to purchase an investment property
Find out whether government-subsidised housing improves housing affordability in the long term
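
Before trying anything sophisticated for "3 months from now", it helps to have a naive baseline to beat. Assuming the index is published quarterly, 3 months ahead is one step, and the sketch below simply applies the average recent quarter-on-quarter growth to the latest value. The index values are hypothetical, not real ABS figures, and naive_next_quarter is an illustrative helper, not part of any library.

```python
import pandas as pd

def naive_next_quarter(index_values: pd.Series, window: int = 4) -> float:
    """Hypothetical baseline: apply the mean of the last `window`
    quarter-on-quarter growth rates to the latest index value."""
    growth = index_values.pct_change().dropna().tail(window).mean()
    return float(index_values.iloc[-1] * (1 + growth))

# Hypothetical quarterly index values, NOT real ABS figures
rppi = pd.Series(
    [100.0, 101.5, 103.0, 104.2, 105.9, 107.1],
    index=pd.period_range("2019Q1", periods=6, freq="Q"),
)

forecast = naive_next_quarter(rppi)
print(f"Naive forecast for {rppi.index[-1] + 1}: {forecast:.1f}")
```

Any model built for the capstone would need to beat a baseline like this on held-out quarters to be worth its extra complexity.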

Formula 1 Dataset

Ever since the first season of Drive to Survive, I've been captivated by the drama and excitement of Formula 1. I've consumed this public Formula 1 API in some of my past blog posts (DynamoDB and Single-Table Design, Simple GraphQL consumer with Apollo Client), and I thought it fitting to continue the trend and explore the insights that can be gleaned from it:

of course I would like to predict the winner of the next race
explore the effect of the weather on the outcome of the race
who wins the Constructors' Championship at the end of the season
who finishes last in the next race
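
Predicting the constructors' title first requires reproducing how it is scored: each team collects the points of both of its cars in every race. The sketch below uses the 25-18-15-12-10-8-6-4-2-1 scale for positions 1 to 10 and, as a simplifying assumption, ignores fastest-lap and sprint points. The race results and team names are hypothetical.

```python
import pandas as pd

# Points for finishing positions 1-10; fastest-lap and sprint points ignored
POINTS = {1: 25, 2: 18, 3: 15, 4: 12, 5: 10, 6: 8, 7: 6, 8: 4, 9: 2, 10: 1}

# Hypothetical results: one row per car per race
results = pd.DataFrame({
    "race": [1, 1, 1, 2, 2, 2],
    "constructor": ["Team A", "Team B", "Team A", "Team B", "Team A", "Team C"],
    "position": [1, 2, 3, 1, 2, 3],
})

def constructor_standings(results: pd.DataFrame) -> pd.Series:
    """Sum points per constructor; positions outside the top 10 score 0."""
    scored = results.assign(points=results["position"].map(POINTS).fillna(0))
    return scored.groupby("constructor")["points"].sum().sort_values(ascending=False)

standings = constructor_standings(results)
print(standings)
```

With standings computable race by race, a season-end prediction becomes a question of forecasting the remaining races' finishing positions.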

Marathon time Dataset

As I have been dabbling in marathons and triathlons on and off through the years, this is also one of my interests. For years I have been wondering:

what if I could accurately predict my marathon finishing time?
how about predicting middle distances like the 10K and 21K?
what is the effect of missed training sessions on my finishing time?
what is the optimum pace throughout the race to achieve my best time?
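
A common rule of thumb for the first two questions is Riegel's formula, T2 = T1 × (D2 / D1)^1.06, which extrapolates a finishing time from a known race at another distance. It is only a baseline a data-driven model would try to improve on, and the 1:45:00 half marathon below is a hypothetical input. The even-pace helper gives the seconds per kilometre needed for perfectly even splits, one simple reference point for the pacing question.

```python
MARATHON_KM = 42.195
HALF_MARATHON_KM = 21.0975

def riegel_predict(known_time_s: float, known_km: float, target_km: float,
                   exponent: float = 1.06) -> float:
    """Riegel's rule of thumb: T2 = T1 * (D2 / D1) ** exponent."""
    return known_time_s * (target_km / known_km) ** exponent

def even_pace_s_per_km(target_time_s: float, distance_km: float = MARATHON_KM) -> float:
    """Seconds per km required to hit the target time with perfectly even splits."""
    return target_time_s / distance_km

def format_hms(seconds: float) -> str:
    """Format a duration in seconds as h:mm:ss."""
    total = round(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"

# Hypothetical input: a 1:45:00 half marathon
half_time_s = 1 * 3600 + 45 * 60
predicted_s = riegel_predict(half_time_s, HALF_MARATHON_KM, MARATHON_KM)
pace = even_pace_s_per_km(predicted_s)

print("Predicted marathon:", format_hms(predicted_s))
print("Even pace per km:", format_hms(pace))
```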

I will be submitting my Capstone Project proposal at the end of next week, and one of the ideas presented above, in one form or another, will most probably be it!

Resources
