I started my Master of Data Science coursework this month, and the first course in the program is all about Database Systems. I have been developing software as a full stack developer for many years now, and there have been many instances where a feature I was working on required a database to be designed and set up.
Conceptual Design with ER Diagrams
The very first topic we covered was the use of Entity Relationship Diagrams (ER diagrams) as part of the process of building an application, once the requirements have been gathered.
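To make this concrete, here is a small sketch of how a simple ER diagram might translate into relational tables. It assumes a hypothetical model with two entities, Customer and Order, in a one-to-many relationship. It uses Python's built-in sqlite3 module purely for illustration, and all table and column names are made up. Each entity becomes a table, and the relationship becomes a foreign key on the "many" side.

```python
import sqlite3

# Hypothetical ER diagram: CUSTOMER (1) ---places--- (N) ORDER
# Each entity becomes a table; the one-to-many relationship becomes a
# foreign key on the "many" side (orders.customer_id).
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT UNIQUE
);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT NOT NULL,
    total       REAL NOT NULL
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database from the schema above."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = build_database()
    conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")
    conn.execute("INSERT INTO orders VALUES (10, 1, '2023-03-01', 42.0)")
    # Customer 99 does not exist, so the relationship rejects this row.
    try:
        conn.execute("INSERT INTO orders VALUES (11, 99, '2023-03-02', 5.0)")
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

The point is that the diagram, not the code, drives the schema. The relationship drawn between the two boxes ends up as a constraint that the database itself enforces.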
Wait, when was the last time I had to make an ER diagram? I honestly cannot remember. Maybe in my undergraduate course, but possibly never in my professional career. Instead, I discuss the design with my team in a meeting or a chat session, draw it on a scrap of paper or in an exercise book, scan it and send it through in an email. I have to admit that my development style has been light on documentation; I prefer to let my code and a few comments convey my intentions as I go.
Am I alone in thinking this?
This exercise made me question whether I'm a real developer at all. However, I know that I am not alone in this. Many, if not all, of the developers I've worked with over the years have had the same experience.
Why ER Diagrams are not so common anymore
So what might be the reasons we are not using ER diagrams to develop relational databases as much as we used to? Here are my thoughts:
Agile development methodologies
With the rise of agile development methodologies and the waning of the waterfall model, there has been a push to prioritize rapid development and iteration over detailed up-front planning and design, valuing working code over documentation.
ORM frameworks
Object-relational mapping (ORM) frameworks have become more popular in recent years. These frameworks automatically map object-oriented code to relational databases (as in the code-first approach), so we don't have to design the database schema manually.
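For contrast, here is a minimal, hypothetical sketch of the code-first idea: the schema is derived from class definitions instead of being designed up front. `create_table_sql`, `save` and `Product` are illustrative names I made up. They are not the API of any real ORM, which does this far more thoroughly (relationships, migrations, type options). The sketch assumes a simple convention where a field named `id` is the primary key.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

# Hypothetical mini "code-first" mapper, for illustration only.
# None of these names come from a real ORM library.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


def create_table_sql(cls) -> str:
    """Derive a CREATE TABLE statement from a dataclass definition."""
    columns = []
    for f in fields(cls):
        sql_type = SQL_TYPES[f.type]  # map the Python annotation to a SQL type
        # Assumed convention: a field named "id" is the primary key.
        constraint = "PRIMARY KEY" if f.name == "id" else "NOT NULL"
        columns.append(f"{f.name} {sql_type} {constraint}")
    return f"CREATE TABLE {cls.__name__.lower()} ({', '.join(columns)})"


def save(conn: sqlite3.Connection, obj) -> None:
    """Insert a dataclass instance as a row in its table."""
    cls = type(obj)
    names = [f.name for f in fields(cls)]
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f"INSERT INTO {cls.__name__.lower()} ({', '.join(names)}) "
        f"VALUES ({placeholders})",
        astuple(obj),
    )


# The "model" lives in code; the table is generated from it.
@dataclass
class Product:
    id: int
    name: str
    price: float


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(create_table_sql(Product))
    save(conn, Product(1, "Widget", 9.5))
    print(create_table_sql(Product))
    print(conn.execute("SELECT * FROM product").fetchall())
```

Nobody drew a diagram here. The schema is simply whatever the class says it is, which is exactly why the up-front ER diagram tends to get skipped.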
NoSQL databases
NoSQL databases have become increasingly popular, and in some cases they offer more scalability and flexibility than traditional relational databases. ER diagrams are primarily used for relational databases and may not be as useful when working with NoSQL databases.
Keeping documentation up to date
As a conceptual model of the database, an ER diagram is a shared vision of your system that is available to all stakeholders, regardless of technical ability. However, as the project matures, more and more changes need to be implemented, which means the ER diagram has to be updated to reflect them. That is more documentation to maintain, and in practice it will most likely fall out of date as new changes are introduced.
Designing Data Warehouses
Data warehouses weave together an increasingly complex web of data sources, and organizations rely more and more on data for analytical reports. With their advent, there has been a recent push to bring back data modelling (data warehouses use dimensional models, a derivative of ER models). The motivation is not just to convey a shared vision of the data but, equally important, practical and financial reasons.
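A dimensional model is commonly laid out as a star schema: a central fact table of measurements surrounded by dimension tables that describe them. Below is a small sketch of a hypothetical sales star schema with date and product dimensions, again using Python's sqlite3 purely for illustration. The tables, columns and sample rows are all made up.

```python
import sqlite3

# Hypothetical star schema: one fact table (sales) surrounded by
# dimension tables (dates, products). Names are illustrative only.
STAR_SCHEMA = """
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- e.g. 20230301
    year     INTEGER NOT NULL,
    month    INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    revenue     REAL NOT NULL
);
"""

# A typical analytical query: aggregate the facts, slice by dimension attributes.
REVENUE_BY_CATEGORY_AND_MONTH = """
SELECT p.category, d.year, d.month, SUM(f.revenue) AS revenue
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_key = f.product_key
JOIN dim_date    AS d ON d.date_key    = f.date_key
GROUP BY p.category, d.year, d.month
ORDER BY p.category, d.year, d.month
"""


def load_sample(conn: sqlite3.Connection) -> None:
    """Create the star schema and load a few made-up rows."""
    conn.executescript(STAR_SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20230301, 2023, 3), (20230401, 2023, 4)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", "Hardware"), (2, "Licence", "Software")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(20230301, 1, 2, 2400.0),
                      (20230301, 2, 5, 500.0),
                      (20230401, 1, 1, 1200.0)])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    for row in conn.execute(REVENUE_BY_CATEGORY_AND_MONTH):
        print(row)
```

Because every report slices the same fact table by the same dimensions, queries stay short and predictable. This is where the practical and cost benefits of a good model start to show.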
Benefits of good data modelling
A good, correct data model means end users get accurate data, and that same data is what gets put in front of the decision makers. Correct data leads to better decisions.
Good data models help in building pipelines that use simpler queries, and simpler queries translate to cheaper compute. Good data models also avoid the need for duplicate pipelines that return the same data, which translates directly into savings in cloud compute costs.
As I review and relearn the use of ER diagrams, not only for building databases but also, by extension, for larger systems such as data warehouses and data lakes, I have to remind myself that we create them for more than just documentation. ER diagrams:
- serve as documentation and as a vessel for sharing the vision with all stakeholders, including non-technical ones
- can be used to define the data architecture for an organization
- can aid in building systems that use simpler and cheaper queries
- can help avoid creating pipelines that return duplicate data, saving on compute costs