After graduating with an accounting degree, I straight away started my first job in IT. I still remember my first project was to implement a data governance framework for a multinational company that sells medical devices and laboratory testing solutions. Being completely clueless when my senior spoke to me about database, schema, SQL and tons of mysterious jargons, I managed to make Google my closest friend ever.
Fast forward 4 years, data has always been part of my career in one way or another. Since going back to school to pursue my masters, I found myself reflecting a lot more about my past journey, the projects I have worked on, the lessons I have learned along the way, the struggles, the failures, the tiny sense of achievement and the value that I have created or failed to create.
This “Back to Basics” series is dedicated to my younger 24-year-old self who was naïve about the future, serious about the work and curious about why things work that way. In this first part, let’s join me and explore a universal topic that forms the backbone of every organisation: DATABASE.
Why do people need databases?
Whenever people mention databases, I think about my first giant black Sakos suitcase that my mom bought for me. It was the first time I ever left my home sweet home to study overseas, so mom was really worried about getting sufficient clothes, shoes, toiletries and so on. But she asked me to do one thing. “This is your suitcase. You have to pack it yourself to know where each item is. I won’t be there to help you find your things.”
So I packed my own suitcase and somehow managed to injure my big toe in the process. Don’t worry, I got a brand new shiny toenail after 6 months. When I finally arrived in Singapore and started unpacking. Guess what? My memory, which was equal to the gold fish three-second memory, couldn’t even recall where I had put half of my stuffs. I knew they were somewhere in my giant suitcase, but I just couldn’t locate them. So what did I do? Searched through every corner, found some stuffs that I need, bought new stuffs for things I couldn’t find and spent another 20 minutes or so to tidy up the mess I created. Not the smartest way to do it huh?
The world without database is exactly like the struggle I had with my suitcase.
- We have to spend ridiculous amount of time to find the information we need to do our day-to-day job (think about your customer’s contact details, your tax invoices, your employees’ payslips, etc.) yet we can’t find even half of that.
- We know that the data exists somewhere, but we can’t be sure whether it is still there or forever lost.
- We create and maintain multiple copies of the same data until we lose track and don’t know which is the correct version.
- As the business captures and generates more data in terms of volume and variety, we continuous try to play catch-up, clean up the data mess with all sorts of workaround again, again and again.
On top of those issues, if you are also wondering about how to have your precious data protected from accidental overwriting, or unauthorised access while keeping the maintenance cost affordable, that’s why you need a database.
So what exactly is a database?
By definition, a database is a shared collection of logically related data and its description, designed to meet the information needs of an organisation. I know it may sound like a bunch of jargons again, so let’s break it down together for better understanding.
- A shared collection: many people can have access and use it at the same time
- Logically related data and its description: contains details should be grouped together, together with rules to ensure the data makes sense from a business viewpoint (for example, the ‘Date of Birth’ column should only contain date, not random number, not weird symbols or anything else)
- Designed to meet the information needs of an organisation: created for a purpose, which is to fulfil the business activities (e.g. check whether an order has been shipped, approve a quotation, list all expenses to be paid by the end of the month, etc.)
In layman’s term, a database is a pool of data which have been arranged nicely so that many people in your organisation know what data is there and use it to perform their business activities.
I personally find it not very useful to just know or remember the entire definition. So what is the most useful point I can take away from this definition? For me, it has to be the notion of “meet the information needs of an organisation”. Why? Because it’s a gently reminder that database is not just a technical concept and should never be dealt with as a purely technical challenge. Database exists to serve the business and should always be designed and implemented with real benefits for the organisation such as additional profits, reduce time, mitigate risks or improve customer service. Starting a project to build new databases without having the business in mind doesn’t make sense and should require serious reconsideration.
What is the difference between Database Management System (DBMS) and database?
I remembered feeling so confused between DBMS and Database. So this is to clear the confusion once and for all.
In the above section, we understand that all data after being arranged nicely form databases. However, there is one problem: the data that lies in our spreadsheets, emails and other sources does not magically arrange themselves into databases. We need a tool to make it happen. And that tool is the DBMS. It is a software that does the following for you.
- Define how data should be arranged in the database through a Data Definition Language (DDL)
- Enable user to insert, update, delete and retrieve data from the database through Data Manipulation Language (DML)
- Manage everything related to protection of data in the database such as access control, integrity check, concurrency control (to allow multiple users to use the database at the same time) as well as backup and recovery
In short, database is a product that we want to create and DBMS is the tool to help us to create that product. Some popular DBMS include MySQL, PostgreSQL, MS Access, MS SQL Server, Oracle RDBMS, IBM DB2, Teradata, FileMaker, SQLite, Amazon Redshift, MongoDB, etc. The key takeaway is that some DBMS are better at something than the other. Therefore, it’s completely normal for companies to have more than 1 DBMS, which then increase the level of complexity to design and build a data pipeline for analysis and visualisation.
What is SQL?
SQL stands for Structured Query Language. And the key word that we need to focus on is “Language”.
Remember that we want to create, use and manage databases with a tool called DBMS. But how to we do it? How do we tell DBMS what we want so that the DBMS can do the hard work for us? That’s when we need a way to communicate, a specific language to tell DBMS what we want. As a result, SQL was born as a standardised language across the globe to define and manipulate databases using DBMS. In fact, all vendor must implement a set of features called Core SQL to claim conformance with the SQL standard.
When speaking about a language being used worldwide, we can’t avoid dialects. For example, some people might say “Hello” while other might use “Howdy” to greet each other. The same thing also occurs in SQL. As different vendors build different DBMS and develop their own functionalities, the standardised language grows and gets tweaked differently. Here are a few common ones.
- Transact-SQL (a.k.a T-SQL): procedural language for Microsoft SQL Server databases
- PL/SQL: procedural extension for Oracle relational databases
- PL/pgSQL: procedural language for PostgreSQL
As we are a bit overloaded with everything SQL in this section, let’s recap how to distinguish among 3 different things.
- SQL: a standardised language to define and manipulate databases
- T-SQL or PL/SQL or PL/pgSQL: different dialects implemented by different technology vendors to provide more functionalities on top of the standardised SQL
- MySQL or MS SQL Server or PostgreSQL: different DBMS (built by different vendors) which uses both SQL and its own dialect to manage databases
Wrapping Up
In this post, we have covered why we need databases, what is a database, a database management system and SQL. I do believe this is the absolute basic understanding required before talking about SQL queries and procedures in the next post. Till then, take care and be safe, everyone!