Database for Digital Marketers — 1.1 (filters, aggregates & joins)
Filters and Aggregates
To represent the data sets (discussed inthe previous article) in the database, we will need to create 3 tables:Customers — Holds customer InformationProducts — Holds Product InformationOrders — Hold information about orders of customers that bought the Company Products.Once the tables are created, create a database diagram and drag all the three tables to get an overview of all tables (above).
Customers
Products
Orders
Since these tables contain only a few records you can find your answers simply by looking at the tables. However, in the real world, you will have many more customers(hopefully) and you will need to query(search) your database to gather information about your customers. The programming language that is used for querying a database is known as SQL (Structured Query Language). The syntax of the query varies for the different database environments. In my articles, I will write the queries in T-SQL for the MS SQL database. If you are using any other database, there will be some difference but they are close enough to replicate in your database environment. In case you do not have access to any database, you will need to download and install eitherMySql Workbench orMS SQL Express. Both are available to download for free. Once you have downloaded SQL Express, refer to the above database diagram to create your database and tables.
When you have access to your customer database, the first question that comes to your mind is how many customers do you have. For that, you will need only 3 SQL clauses. SELECT, COUNT and FROM.
Use [DigitalMarketing]GoSelect count(*) from Customers— — — — — -14
To find how many customers are based in the state of California, you will need to use the WHERE clause;
Select count(*) from CustomersWhere state=’CA’— — — — — -5
From the Venn diagram, we can confirm that we have 5 Customers in California. If you want to see the details of these customers, you can choose the column names you want to see in the SELECT clause.
SELECT CustomerID,FirstName,LastName,StateFROM CustomersWHERE State =’CA’
You can also write SELECT * FROM Customers WHERE State =’CA’ and get the same results. However, many reporting tables have more than 100 columns and using * in those scenarios will not be an efficient call. Make it a practice to select only the columns/data you want to see.
You have now written your first SQL query. Congratulations! This is very simple and easy but you will be using these clauses all the time as we move towards more complicated concepts/queries.
Joins: Inner, Left, Outer
Join in a database, in fact, carries the same meaning as in the English dictionary. It connects two tables in a database through a common column or key. If we want to find the customers who live in California and have bought Product A, we will need to “Join” or connect the three tables(mentioned in myprevious article) through common keys. The common key that connects the Customers and Orders table is CustomerID and the common key that connects the Products table and Order table is ProductID.In this particular scenario, since we are looking for customers who live in ‘CA’ and bought Product ‘A’, (i.e. the intersection of two data sets), we will be using “Inner Join”. Let’s write the query, one step at a time. First, let’s join the Customer table with the Orders table.Select a.CustomerID,a.State,b.ProductIDFrom Customers aJoin Orders bOn a.CustomerID=b.CustomerIDWhere a.state=’CA’
The query above will return the following rows.
Here, ProductID does not really give any information about the Product. However, ProductID is the common key between the Orders Table and Products Table. And Product Table holds information about the product with ProductID 1. Let’s join the Products table to the above query.
Select a.CustomerID,a.State,b.ProductID,p.ProductFrom Customers aJoin Orders bOn a.CustomerID=b.CustomerIDJoin Products pOn b.ProductID = p.ProductIDWhere a.state=’CA’And Product=’A’
This above query will return the same number of records but notice the Product column (highlighted).
Try this exercise of using inner join with various small data sets. Write the SQL and then manually check against the Venn diagram representing the data sets.
Now let’s look at the second scenario where you want to find out what all customers living in California ordered.
Select a.CustomerID,a.State,b.ProductID,p.Product,b.QuantityFrom Customers aleft Join Orders bOn a.CustomerID=b.CustomerIDleft Join Products pOn b.ProductID = p.ProductIDWhere state=’CA’
Notice that you now see all 5 California customers on the list as well as the additional information of products each one has bought so far. A left join is primarily used to get additional optional information or to distinguish between customers who have certain data from those who do not. In this scenario, we learned that 2 of 5 California customers have bought product A and the other three customers have not.
You may initially find it challenging to decide on when to use left join. In that scenario, visualize the tables or data set and Venn diagram. If we go back to the Venn diagram, you see all the 5 CustomerID in California set and CustomerID { 3,12} from the Product A set.
Now, let’s look at the third scenario, where you want to see all the customers who live in California or have bought Product A. “or” is represented as a union in Set Theory which is equivalent to full outer join in database. As per the above diagram, the union of California and Product A will return all the CustomerID inside the two circles. {3,4,5,6,7,8,12}. The SQL query that represents the above scenario will look like this.
Select a.CustomerID,a.State,b.ProductID,p.Product,b.QuantityFrom Customers afull outer Join Orders bOn a.CustomerID=b.CustomerIDfull outer Join Products pOn b.ProductID = p.ProductIDWhere state=’CA’ or Product=’A’
Resulting rows contains all California customers as well as customers who bought product ‘A’
With this, I will end this session. As I mentioned before, JOIN is the most important thing you need to learn for the querying database. So, spend plenty of time writing queries for all the three join scenarios. Like Mathematics or Piano, the more you practice these joins, the more comfortable you will get. As we go along, I will show many more examples.
For our next session, be ready to learn something fun and interesting. GROUP BY & HAVING clause. It may sound geeky but these two clauses really are fun and prepare you to address more complicated queries. In my current job, almost every script I write, I use group by clause.