I love being involved in application development as a full stack developer. I get to experience a wide variety of technologies all the time. Sometimes you get the odd legacy application built on ancient technology (I once worked on a full jQuery application for a week!), and at other times something more recent and cool.
One such project was to provide modifications to an existing web application developed a couple of years back. It was a modern DynamoDB-based application using ReactJS for the front-end. I belong to a small team, so when working with small projects, we typically choose a general section to work on. One developer picks the front-end, while another works on the back-end, that sort of thing.
DynamoDB is a proprietary DB offering from AWS
In this particular instance, I declared dibs on the database/backend changes. Until then, I had never been involved with a commercial NoSQL project. A few years back, I dabbled in a couple of MongoDB tutorials, but that was it. I was already confident developing with SQL at this stage, so how could DynamoDB be any different?
Boy was I wrong!
In the end, I was able to complete the modifications to my satisfaction. However, it took an enormous amount of time and effort to forget what I already knew about relational databases and rewire my brain to think the DynamoDB way.
The following are the misconceptions about DynamoDB that I, and probably many other beginners or people who do not use it, have had. As I got more confident with the technology in the course of this project, I found that these misconceptions were all unfounded.
Read on to find out why.
I know that this is a very common misconception. It was especially true for me because, up to this point, I had never really seen DynamoDB used for a more complicated application. Even in the project I was working on, it was just a step above a simple key-value store.
Having since looked at DynamoDB in more detail while studying for an AWS certification (AWS Certified Data Analytics - Specialty), I now know that there is more to DynamoDB than a key-value store; part of the coursework is devoted to it.
While researching the topic, I stumbled upon The DynamoDB Book by Alex DeBrie. Another awesome resource is the AWS re:Invent 2019 talk on YouTube by Rick Houlihan, an AWS NoSQL Blackbelt. Both of these guys opened my eyes to what DynamoDB can do.
One-to-one relationships. One-to-many relationships. Many-to-many relationships. Single-digit millisecond latency regardless of the database size. What?!! I didn't know it could do that! Whatever SQL can do, DynamoDB can too, and more. However, the key to unlocking all this is to unlearn most of what you know about relational databases.
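As a rough illustration of how a one-to-many relationship can fit into a key-based model, here is a minimal in-memory sketch in Python. It is not the DynamoDB API: the `PK`/`SK` attribute names, the `CUSTOMER#`/`ORDER#` prefixes, and the `query` helper are all assumptions made for illustration, loosely following the composite-key style often used in single-table design. The idea is that a customer and its orders share a partition key, so one key-based lookup returns the parent and its children without a JOIN.

```python
from typing import Any

Item = dict[str, Any]

# Hypothetical items in one table: a customer and its orders share a partition key.
TABLE: list[Item] = [
    {"PK": "CUSTOMER#1", "SK": "PROFILE", "name": "Ada"},
    {"PK": "CUSTOMER#1", "SK": "ORDER#2021-01-05", "total": 40},
    {"PK": "CUSTOMER#1", "SK": "ORDER#2021-02-11", "total": 15},
    {"PK": "CUSTOMER#2", "SK": "PROFILE", "name": "Grace"},
    {"PK": "CUSTOMER#2", "SK": "ORDER#2021-01-20", "total": 99},
]


def query(table: list[Item], pk: str, sk_prefix: str = "") -> list[Item]:
    """In-memory stand-in for a key-based query (an assumption, not a real API):
    exact match on the partition key, optional prefix match on the sort key,
    results ordered by sort key."""
    matches = [
        item for item in table
        if item["PK"] == pk and item["SK"].startswith(sk_prefix)
    ]
    return sorted(matches, key=lambda item: item["SK"])


# One-to-many: all orders for customer 1, no JOIN needed.
orders = query(TABLE, "CUSTOMER#1", "ORDER#")

# Parent and children together in a single lookup.
everything = query(TABLE, "CUSTOMER#1")
```

In this sketch the access pattern ("get a customer and all of their orders") drives the key design, which is the reverse of how a normalized relational schema is usually approached.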
I became a DynamoDB convert then and there.
I am not sure where this misconception came from. For me, I guess it was my very limited exposure to the technology: I simply did not know its strengths and weaknesses. But it really is unfounded. Amazon.com, yes, the retail giant, requires all Tier 1 workloads and services to use DynamoDB.
Tier 1 services are those that will lose the company money if the system has downtime. You can imagine that Amazon.com's online ordering system is not exactly a simple, low-volume application. I rest my case.
This is a funny one, as it is the opposite of the previous point: the belief that DynamoDB can only be used for large and complex applications, and that you should not waste time learning it for simple applications and low-volume workloads. The truth is, if DynamoDB can handle large and complex application workloads like Amazon.com's, then anything less will be child's play.
It is true, though: from small, light workloads to large, complex and heavy applications, it will keep its single-digit millisecond performance. Whether you're a startup or a giant, when you get a large uptick in your workload you will be thankful you chose DynamoDB when you first started.
Personally, this is how I felt before I started working on the application. Not that I was an expert in SQL, but I already had the confidence to tackle any SQL-related task. As the technology has matured, almost every developer is expected to have SQL proficiency in their toolbox.
Here's a list of DynamoDB truths:
There are no JOINs in DynamoDB.
Before touching any DynamoDB table, you should know your access patterns up front.
Be careful when picking the primary key, as you might end up with hot partitions, which will drastically affect performance.
Avoid using Local Secondary Indexes.
Learn the advantages of Global Secondary Indexes and why they are the better choice.
Don't use SCAN, or if you have to use it, know the drawbacks.
Be careful when filtering your data, or else you're just wasting your money.
It is false that once you have picked your table/index strategy it's the end of the world and you are stuck with it forever.
Yes, you can use multiple tables and let the application aggregate the data; however, this is not optimal, and AWS's recommendation is to use Single-Table design.
The list goes on and on…
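To make the point about SCAN and filtering concrete, here is a back-of-the-envelope sketch in Python. It assumes the documented provisioned-capacity rule that an eventually consistent read consumes 0.5 read capacity units (RCUs) per 4 KB read (1 RCU if strongly consistent), and that capacity is charged on the data read, before any filter is applied. The function name, item counts and item size are made up for illustration, and the sketch ignores pagination, so treat the numbers as approximate.

```python
import math


def read_capacity_units(items_read: int, item_size_bytes: int,
                        strongly_consistent: bool = False) -> float:
    """Approximate RCUs for a Query or Scan (hypothetical helper).

    The total bytes READ are rounded up to 4 KB blocks. A filter does not
    reduce this number, because filtering happens after the read.
    """
    total_bytes = items_read * item_size_bytes
    blocks = math.ceil(total_bytes / 4096)
    return blocks * (1.0 if strongly_consistent else 0.5)


# Hypothetical table: 100,000 items of 1 KB each, of which we want 50.

# A Scan with a filter still reads every item in the table.
scan_cost = read_capacity_units(items_read=100_000, item_size_bytes=1024)

# A Query on a well-chosen key reads only the 50 matching items.
query_cost = read_capacity_units(items_read=50, item_size_bytes=1024)

print(f"Scan + filter: {scan_cost} RCUs, Query: {query_cost} RCUs")
```

Both calls hand back the same 50 items to the application, but under these assumptions the filtered scan pays for reading the whole table.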
To learn to use DynamoDB correctly, you have to unlearn things that you know about relational databases, and that entails a very steep learning curve.
No, DynamoDB is not easy, but boy is it powerful.
It is true that NoSQL databases, DynamoDB included, are not restricted by a schema the way relational (SQL) databases are. But that doesn't mean that you should not use one.
All this means is that DynamoDB does not enforce a schema at the database level. However, when you are working with it in your application, you need to organize your item collections and data structures into some sort of system; otherwise it would be very difficult to reason about the data and read it back from the database.
Today we covered the top 5 common misconceptions about NoSQL, and DynamoDB in particular, that have been circulating through the years. But misconceptions are just false statements spread by inaccurate information, intentionally or not. Dispelling this false information is quite simple: just start learning!
The two references I have listed in the Resources below taught me everything I needed to know about DynamoDB. However, I did my part too. Putting in a bit of effort to get to know DynamoDB a little better not only let me debunk these misconceptions, but also enabled me to use DynamoDB as AWS intended.
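One way to picture an application-level schema is a small typed model that owns the mapping to and from stored items. The sketch below is only an assumption about how this could look in Python: the `Order` class, its fields, the `entity_type` attribute and the `CUSTOMER#`/`ORDER#` key prefixes are hypothetical and not taken from the project described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical application-side schema for an order item."""
    customer_id: str
    order_date: str  # ISO date string, e.g. "2021-01-05"
    total: int

    def __post_init__(self) -> None:
        # The database will not reject bad data, so the application must.
        if not self.customer_id:
            raise ValueError("customer_id is required")
        if self.total < 0:
            raise ValueError("total must not be negative")

    def to_item(self) -> dict:
        """Build the item as it would be stored, including its keys."""
        return {
            "PK": f"CUSTOMER#{self.customer_id}",
            "SK": f"ORDER#{self.order_date}",
            "entity_type": "Order",
            "total": self.total,
        }

    @classmethod
    def from_item(cls, item: dict) -> "Order":
        """Rebuild an Order from a stored item, checking its type first."""
        if item.get("entity_type") != "Order":
            raise ValueError("item is not an Order")
        return cls(
            customer_id=item["PK"].split("#", 1)[1],
            order_date=item["SK"].split("#", 1)[1],
            total=item["total"],
        )


order = Order(customer_id="1", order_date="2021-01-05", total=40)
item = order.to_item()
restored = Order.from_item(item)
```

Keeping the key format and validation in one place like this means the rest of the application never has to know how items are laid out in the table.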
In the next articles about DynamoDB, we will explore:
The approach I use to effectively design my DynamoDB tables
Use DynamoDB how Amazon intended through an example project and walkthrough
Migrating an existing DynamoDB project when you need to add access patterns unknown when first developed
These picks are things that have had a positive impact on me in recent weeks: