Complete SQL + Databases Bootcamp: Zero to Mastery [2020]

Complete SQL + Databases Bootcamp: Zero to Mastery [2020]

Section 1: Introduction

Live SQL Zoo Study Guide

4. The Big Data Breach At Keiko Corp

Somebody stole our trade secrets!

⬆ back to top

Section 2: History And Story Of Data

5. What Is a Database?

It’s a collection of data, a method for accessing and manipulating that data.

Data are collected by many sources

Drone
Mobile App
Web App
Videocam
Automobile

⬆ back to top

6. I Didn’t Learn Anything, Try Again…

Why is a database always represented with a cylinder?

Data are collected and use by different people

Product Managers: learn from data to improve the product
Marketers: analyse business decision
Web Developers
Data Analysts: analyse and build machine learning model
Data Engineer
Data Administrator

⬆ back to top

7. Database Management System (DBMS)

How to put data in DB
How to use/update/learn from data
How to remove data

Confusing Acronyms

DBMS: Database Management System
RDBMS: Relational Database Management System
SQL: Structured Query Language

Confusing Anagrams

⬆ back to top

5 data models

Relational: support asset transactions
Document: data is in a document, scalability
Key Value: simple way to access data
Graph: good for connected data
Wide Columnar

⬆ back to top

11. Exercise: What Is A Database?

A database is a hardware and software system that allows a user to store, organize and use data.
A database is a hardware and software system stores data and answer questions.

⬆ back to top

12. Section Review

Do you have useful data?
Install a DB to Store Data
Use SQL to Setup + Use Data

⬆ back to top

Section 3: Databases + SQL Fundamentals

13. SQL Playground

DB Fiddle - SQL Database Playground

⬆ back to top

14. Exercise: Setting Up Your First Database

CREATE TABLE Class (
  id character(255),
  year integer NOT NULL,
  PRIMARY KEY(id)
);
CREATE TABLE Student (
  id varchar(255) NOT NULL,
  class varchar(255) NOT NULL,
  firstName varchar(255) NOT NULL,
  lastName varchar(255) NOT NULL,
  dob date NOT NULL,
  sex varchar(1) NOT NULL,
  FOREIGN KEY (class) REFERENCES Class(id),
  PRIMARY KEY (id)
);
INSERT INTO Class(id, year) VALUES('c1', 1);
INSERT INTO Class(id, year) VALUES('c2', 2);
INSERT INTO Class(id, year) VALUES('c2a', 2);
INSERT INTO Student VALUES ('s1', 'c1', 'George', 'Jacobson', '1992-01-01', 'm');
INSERT INTO Student VALUES ('s2', 'c2', 'Macy', 'Waterson', '1992-01-01', 'f');
INSERT INTO Student VALUES ('s3', 'c1', 'Bill', 'Peters', '1992-01-01', 'm');
INSERT INTO Student VALUES ('s4', 'c1', 'Janine', 'Wilson', '1992-01-01', 'f');
INSERT INTO Student VALUES ('s5', 'c2', 'Jason', 'Lipton', '1992-01-01', 'm');

⬆ back to top

15. What Is SQL?

SQL is programming language to talk to database
Database is a structured set of data

⬆ back to top

16. What Is A Query?

Query is question written in SQL statement

CREATE TABLE User (
  id varchar(255) NOT NULL,
  name varchar(255) NOT NULL,
  lastName varchar(255) NOT NULL,
  dob date NOT NULL,
  sex varchar(1) NOT NULL,
  role varchar(255) NOT NULL,
  PRIMARY KEY (id)
);
INSERT INTO User VALUES ('u1', 'George', 'Jacobson', '1992-01-01', 'm', 'manager');
INSERT INTO User VALUES ('u2', 'Macy', 'Waterson', '1992-01-01', 'f', 'employee');
INSERT INTO User VALUES ('u3', 'Bill', 'Peters', '1992-01-01', 'm', 'employee');
INSERT INTO User VALUES ('u4','Janine', 'Wilson', '1992-01-01', 'f', 'manager');
INSERT INTO User VALUES ('u5', 'Jason', 'Lipton', '1992-01-01', 'm', 'manager');
SELECT * 
FROM User
WHERE role = 'employee';

⬆ back to top

17. Imperative vs Declarative

Declarative: What will happen?
Imperative: How it will happen?

⬆ back to top

18. History of SQL

⬆ back to top

19. Exercises: The Select Statement

Exercises: The Select Statement

-- Insert the missing statement to get all the columns from the Customers table.
SELECT * 
FROM Customers;
-- Write a statement that will select the City column from the Customers table.
SELECT City
FROM Customers;
-- Select all the different values from the Country column in the Customers table.
SELECT DISTINCT Country 
FROM Customers;

⬆ back to top

20. Optional: History of SQL Deep Dive

History of Databases

⬆ back to top

21. SQL Standards

The SQL Standard – ISO/IEC 9075:2016 (ANSI X3.135)

⬆ back to top

22. What Is A Database? Revisited

File processing systems

save data on individual files
no correlation between the files
no relationship between the data
update in one file is not reflected in data of relationed files
no common standards per system
no common models per data
data are isolated and duplicated across systems

Model: structure of data

⬆ back to top

23. Database Oriented Approach

⬆ back to top

24. Exercise: SQL Starter Quiz

SQL Starter Quiz

⬆ back to top

25. Database Models

How you organize and store your data ?

Hierarchical
Networking
Entity-Relationship
Relational
Object Oriented
Flat
Semi-Structured

⬆ back to top

26. Hierarchical And Networking Model

Hierarchical Model

each child can have only one parent
child is tightly couple with parent
represent 1 to many relationship

Networking Model

represent many to many relationship

⬆ back to top

27. Relational Model

⬆ back to top

28. DBMS Revisited

DBMS features

App <-> SQL <-> DBMS
CRUD operations
Manage your data
Secure your data
Transactional management

Codd’s Rules for RDBMS

⬆ back to top

29. Relational Model Revisited

⬆ back to top

30. Tables

⬆ back to top

31. Columns

⬆ back to top

32. Rows

⬆ back to top

33. Primary And Foreign Keys

⬆ back to top

34. OLTP vs OLAP

OLTP: Support day to day

OLAP: Support analysis

⬆ back to top

35. Exercise: OLTP vs OLAP

A database is being used to log orders and customers: OLTP
A database is being used to figure out what new products we should offer: OLAP
A database is being used to derive statistics for reporting to the executives: OLAP
A database is being used to keep track of logged in users: OLTP

⬆ back to top

36. Exercise: Relational Model Quiz

Exercise: Relational Model Quiz

⬆ back to top

Section 4: Environment Setup

38. Why PostgreSQL

PostgreSQL is the worlds’ best database

⬆ back to top

39. Environment Tooling

⬆ back to top

40. SQL Tooling Alternatives

our choice was based on multiple criteria

Is it cross-platform?
Is it easy to use?
Is it free?
Can it be extended to multiple database providers?

Alternatives

⬆ back to top

41. Command Line 101

@aechagen/mac-terminal-101-13a3e8e75d4c">Command Line 101

⬆ back to top

45. MAC Setup

⬆ back to top

46. MAC Commandline tools

CLI Tools

⬆ back to top

49. Importing Data On MAC

psql -U chesterheng -d Employees < employees.sql
psql -U chesterheng -d World < world.sql
psql -U chesterheng -d Store < store.sql
psql -U chesterheng -d France < france.sql

⬆ back to top

Section 5: SQL Deep Dive

52. Starting With Query

DCL: Data control language
DDL: Data definition language
DQL: Data query language
DML: Data modification language

What is a select? Retrieve data

⬆ back to top

53. Exercise: Simple Queries

SELECT * FROM "public"."employees";
SELECT * FROM "public"."departments";

⬆ back to top

54. Exercise: Renaming Columns

SELECT emp_no AS "Employee #",
  birth_date AS "Birthday",
  first_name AS "First Name"
FROM "public"."employees";

⬆ back to top

55. Concat Function

Make your data prettier/simpler

SELECT emp_no,
  CONCAT(first_name, ' ', last_name) AS "full name"
FROM "public"."employees";

⬆ back to top

56. What Is A Function In SQL?

input -> function -> output
a function is a set of steps that creates a single value

Types of functions

Aggregate: operate on many records to produce 1 value
Aggregate: produce a summary
Scalar: operate on each record independently

⬆ back to top

57. Aggregate Functions

Aggregate Functions

AVG()
COUNT
MIN()
MAX()
SUM()

SELECT COUNT(emp_no) FROM "public"."employees";
SELECT MAX(salary) FROM "public"."salaries";
SELECT SUM(salary) FROM "public"."salaries";

⬆ back to top

58. Exercise: Aggregate Functions

--
/*
 * What database should I use for these exercises?
 * Name: Employees
 */
-- 
-- Question 1: What is the average salary for the company?
-- Table: Salaries
-- Result: 63810.744836143706
select avg(salary) from salaries;
-- Question 2: What year was the youngest person born in the company?
-- Table: employees
select max(birth_date) from employees;
--
/*
 * What database should I use for these exercises?
 * Name: France
 */
-- 
-- Question 1: How many towns are there in france?
-- Table: Towns
-- Result: 36684
select count(id) from towns;
--
/*
 * What database should I use for these exercises?
 * Name: World
 */
-- 
-- Question 1: How many official languages are there?
-- Table: countrylanguage
-- Result: 238
select count(countrycode) from countrylanguage
where isofficial = true;
-- Question 2: What is the average life expectancy in the world?
-- Table: country
-- Result: 66.48603611164265
select avg(lifeexpectancy) from country;
-- Question 3: What is the average population for cities in the netherlands?
-- Table: city
-- Result: 185001
select AVG(population) from city
where countrycode = 'NLD';

⬆ back to top

59. Commenting Your Queries

How to make comments the most important ‘code’ you write

⬆ back to top

60. Common SELECT Mistakes

Ten Common SQL Programming Mistakes

Misspell commands
Use ; instead of , and vice versa
Use “ instead of ‘
“ is for tables
‘ is for text
Invalid column name

⬆ back to top

61. Filtering Data

SELECT first_name
FROM "public"."employees"
WHERE gender = 'F';

⬆ back to top

62. AND and OR

SELECT first_name, last_name, hire_date
FROM "public"."employees"
WHERE first_name = 'Georgi' 
  AND last_name = 'Facello'
  OR first_name = 'Bezalel' 
  AND last_name = 'Simmel';

⬆ back to top

63. Exercise: Filtering Data

SELECT firstname, lastname, gender, state 
FROM customers
WHERE gender = 'F' AND (state = 'OR' OR state = 'NY') ;

⬆ back to top

64. Exercise: The Where Clause

-- Select all records where the City column has the value "Berlin".
SELECT * 
FROM Customers
WHERE City = 'Berlin';
-- Use the NOT keyword to select all records where City is NOT "Berlin".
SELECT * 
FROM Customers
WHERE NOT City = 'Berlin';
-- Select all records where the CustomerID column has the value 32.
SELECT * 
FROM Customers
WHERE CustomerID = 32 ;
-- Select all records where the City column has the value 'Berlin' and the PostalCode column has the value 12209.
SELECT * 
FROM Customers
WHERE City = 'Berlin'
  AND PostalCode = 12209;
-- Select all records where the City column has the value 'Berlin' or 'London'.
SELECT * FROM Customers
WHERE City = 'Berlin'
  OR City = 'London';

⬆ back to top

65. The NOT Keyword

What if you want to filter on everything but … ?

SELECT age FROM customers WHERE NOT age = 55;
SELECT COUNT(age) FROM customers WHERE NOT age = 55;

⬆ back to top

66. Comparison Operators

Comparison Functions and Operators

⬆ back to top

67. Exercise: Comparison Operators

-- How many female customers do we have from the state of Oregon (OR)?
-- Result: 106
SELECT COUNT(firstName)
FROM customers
WHERE gender = 'F' and state = 'OR';
-- Who over the age of 44 has an income of 100 000 or more? (excluding 44)
-- Result: 2497
SELECT COUNT(income)
FROM customers
WHERE age > 44 and income >= 100000;
-- Who between the ages of 30 and 50 has an income less than 50 000?
-- (include 30 and 50 in the results)
-- Result: 2362
SELECT COUNT(income)
FROM customers
WHERE age >= 30 and age <= 50 AND income < 50000;
-- What is the average income between the ages of 20 and 50? (Excluding 20 and 50)
-- Result: 59409.926240780098
SELECT AVG(income)
FROM customers
WHERE age > 20 and age < 50;

⬆ back to top

68. Logical Operators

Logical Operators

Order of operations

FROM
WHERE
SELECT

⬆ back to top

69. Operator Precedence

A statement having multiple operators is evaluated based on the priority of operators

⬆ back to top

71. Exercise: Operator Precedence

/*
* DB: Store
* Table: Customers
* Question: 
* Select people either under 30 or over 50 with an income above 50000
* Include people that are 50
* that are from either Japan or Australia
*/
SELECT firstname, income, age 
FROM customers
WHERE income > 50000 
  AND (age < 30 OR age >= 50)
  AND (country = 'Japan' OR country = 'Australia')
/*
* DB: Store
* Table: Orders
* Question: 
* What was our total sales in June of 2004 for orders over 100 dollars?
*/
SELECT SUM(totalamount) 
FROM orders
WHERE (orderdate >= '2004-06-01' AND orderdate <= '2004-06-30') 
  AND totalamount > 100

⬆ back to top

72. Checking For NULL Values

When a record does not hava a value, it is considered empty
NULL is used to represent a missing/empty value
Operate with NULL will get NULL, eg. SELECT NULL = NULL
Best practice is to avoid NULL

⬆ back to top

73. IS Keyword

Database Design Follies: NULL vs. NOT NULL

Should you use NULL?

Optional or required?
Future info?
Retional? What impact?

Be defensive

Always check for NULL when necessary
Filter out NULL with IS operator
Clean up your data

In database table creation, do you allow NULL or use a default value to represent empty data?

Use default value: cannot filter NULL data
Use default value: cannot distinguish user entered or default

⬆ back to top

74. NULL Coalescing

Replace NULL values to operate on the data
Return the first non-null value in a list

SELECT COALESCE(<column>, 'Empty')
FROM <table>

⬆ back to top

75. Exercise: Null Value Coalescing

CREATE TABLE "Student" (
    id serial PRIMARY KEY,
  name varchar(255),
  lastName varchar(255),
  age int
);
INSERT INTO "Student" (name, lastName, age) VALUES ('STUDENT 1', NULL, NULL);
INSERT INTO "Student" (name, lastName, age) VALUES ('STUDENT 2', NULL, 25);
INSERT INTO "Student" (name, lastName, age) VALUES (null, 'LAST NAME 3', 20);
INSERT INTO "Student" (name, lastName, age) VALUES (null, null, 34);
INSERT INTO "Student" (name, lastName, age) VALUES ('STUDENT 5', NULL, 58);
-- Assuming a students minimum age for the class is 15, what is the average age of a student?
SELECT avg(coalesce(age, 15)) FROM "Student";
-- Replace all empty first or last names with a default?
SELECT id, coalesce(name, 'fallback'), coalesce(name, 'lastName'), age FROM "Student";

⬆ back to top

76. 3 Valued Logic

TRUE | NULL | FALSE
NULL value can be anything
NULL is always unknown
NULL = NULL (NULL)
IS NULL (TRUE)

⬆ back to top

77. Exercise: 3 Valued Logic

/*
* DB: Store
* Table: customers
* Question: adjust the following query to display the null values as "No Address"
*/
SELECT COALESCE(address2, 'No Address')
FROM customers
/*
* DB: Store
* Table: customers
* Question: Fix the following query to apply proper 3VL
*/
SELECT *
FROM customers
WHERE address2 IS NOT null;
/*
* DB: Store
* Table: customers
* Question: Fix the following query to apply proper 3VL
*/
SELECT coalesce(lastName, 'Empty'), * from customers
where (age IS NULL);

⬆ back to top

78. BETWEEN + AND

Shorthand to match against a range of value
More readability and maintainability

WHERE <column> >= X AND <column> <= Y
WHERE <column> BETWEEN X AND Y

⬆ back to top

79. IN Keyword

Filter multiple values
Check if a value matched any values in a list

WHERE <column> IN (value1, value2, value3)
WHERE <column> = value1 
  OR <column> = value2 
  OR <column> = value3

⬆ back to top

80. Exercise: IN Keyword

/*
* DB: Store
* Table: orders
* Question: How many orders were made by customer 7888, 1082, 12808, 9623
*/
SELECT COUNT(orderid)
FROM orders
WHERE customerid IN (7888, 1082, 12808, 9623)
/*
* DB: World
* Table: city
* Question: How many cities are in the district of Zuid-Holland, Noord-Brabant and Utrecht?
*/
SELECT COUNT(id)
FROM city
WHERE district IN ('Zuid-Holland', 'Noord-Brabant', 'Utrecht');

⬆ back to top

81. LIKE

partial lookups
What if you don’t know exactly what you are searching for?
You need to build patterns to match when use LIKE
postgres LIKE only does text comparison, so must cast whatever we use to text

Pattern wildcards

%: match any number of characters
-: match 1 character

⬆ back to top

82. Exercise: Like Keyword

/*
* DB: Employees
* Table: employees
* Question: Find the age of all employees who's name starts with M.
* Sample output: https://imgur.com/vXs4093
*/
SELECT emp_no, first_name, EXTRACT (YEAR FROM AGE(birth_date)) as "age" FROM employees
WHERE first_name ILIKE 'M%';
/*
* DB: Employees
* Table: employees
* Question: How many people's name start with A and end with R?
* Expected output: 1846
*/
SELECT count(emp_no) FROM employees
WHERE first_name ILIKE 'A%R';
/*
* DB: Store
* Table: customers
* Question: How many people's zipcode have a 2 in it?.
* Expected output: 4211 
*/
SELECT count(customerid) FROM customers
WHERE zip::text LIKE '%2%';
/*
* DB: Store
* Table: customers
* Question: How many people's zipcode start with 2 with the 3rd character being a 1.
* Expected output: 109 
*/
SELECT count(customerid) FROM customers
WHERE zip::text LIKE '2_1%';
/*
* DB: Store
* Table: customers
* Question: Which states have phone numbers starting with 302?
* Replace null values with "No State"                                                  
* Expected output: https://imgur.com/AVe6G4c
*/
SELECT coalesce(state, 'No State') as "State" FROM customers
WHERE phone::text LIKE '302%';

⬆ back to top

83. Dates And Timezones

UTC is enough for everyone…right?

Dates are hard
What is a timezone? set by Sir Sanford Fleming
Greenwich Mean Time (GMT): a time zone
Universal Coordinate Time (UTC): a time standard
No territories use UTC
GMT and UTC share the same current time

SHOW TIMEZONE;
SET TIME ZONE 'UTC';

⬆ back to top

84. Setting Up Timezones

ALTER USER chesterheng SET timezone='UTC';
SHOW TIMEZONE;

⬆ back to top

85. How Do We Format Date And Time?

postgres uses ISO-8601

How do dates look like?

YYYY-MM-DDTHHSS
2017-08-17T12:47:16+02:00

What is a format?

Is a way of representing a date and time

⬆ back to top

86. Timestamps

A timestamp is a date with time and timezone info

SELECT NOW();
CREATE TABLE timezones (
  ts TIMESTAMP WITHOUT TIME ZONE,
  tz TIMESTAMP WITH TIME ZONE
);
INSERT INTO timezones VALUES (
  TIMESTAMP WITHOUT TIME ZONE '2000-01-01 10:00:00',
  TIMESTAMP WITH TIME ZONE '2000-01-01 10:00:00-05'
);

timestamps or date ?

depends on what to store
normally date is enough

⬆ back to top

87. Date Functions

Data Type Formatting Functions

-- Current Date
SELECT NOW()::date;
SELECT CURRENT_DATE;
-- format
SELECT TO_CHAR(CURRENT_DATE, 'dd/mm/yyyy');
SELECT TO_CHAR(CURRENT_DATE, 'DDD');
SELECT TO_CHAR(CURRENT_DATE, 'WW');

⬆ back to top

88. Date Difference And Casting

Subtract dates returns the difference in days

SELECT NOW() - '1800/01/01';
SELECT date '1800/01/01';

⬆ back to top

89. Age Calculation

cast to date before calculate age

SELECT AGE(date '1800/01/01');
SELECT AGE(date '1992/11/13', date '1800/01/01');
SELECT EXTRACT (DAY FROM date '1992/11/13') DAY;

⬆ back to top

90. Extracting Information

SELECT EXTRACT (DAY FROM date '1992/11/13') AS DAY;
SELECT EXTRACT (MONTH FROM date '1992/11/13') AS MONTH;
SELECT EXTRACT (YEAR FROM date '1992/11/13') AS YEAR;
SELECT DATE_TRUNC ('year', date '1992/11/13');
SELECT DATE_TRUNC ('month', date '1992/11/13');
SELECT DATE_TRUNC ('day', date '1992/11/13');

⬆ back to top

91. Intervals

A Comprehensive Look at PostgreSQL Interval Data Type
can store and manipulate a period of time in years, months, days, hours, mins, secs

-- 30 days before given date
SELECT NOW() - interval '30 days';
SELECT *
FROM orders
WHERE purchaseDate <= now() - interval '30 days'
SELECT NOW() - INTERVAL '1 year 2 months 3 days';
SELECT NOW() - INTERVAL '2 weeks ago';
SELECT NOW() - INTERVAL '1 year 3 hours 20 minutes';

⬆ back to top

92. Exercise: Date and Timestamp

/*
* DB: Employees
* Table: employees
* Question: Get me all the employees above 60, use the appropriate date functions
*/
SELECT AGE(birth_date), * FROM employees
WHERE (
   EXTRACT (YEAR FROM AGE(birth_date))
) > 60 ;
-- alternative
SELECT count(birth_date) FROM employees
WHERE birth_date < now() - interval '61 years' -- 61 years before the current date
/*
* DB: Employees
* Table: employees
* Question: How many employees where hired in February?
*/
SELECT count(emp_no) FROM employees
where EXTRACT (MONTH FROM hire_date) = 2;
/*
* DB: Employees
* Table: employees
* Question: How many employees were born in november?
*/
SELECT COUNT(emp_no) FROM employees
WHERE EXTRACT (MONTH FROM birth_date) = 11;
/*
* DB: Employees
* Table: employees
* Question: Who is the oldest employee?
*/
SELECT MAX(AGE(birth_date)) FROM employees;
/*
* DB: Store
* Table: orders
* Question: How many orders were made in January 2004?
*/
SELECT COUNT(orderid)
FROM orders
WHERE DATE_TRUNC('month', orderdate) = date '2004-01-01';

⬆ back to top

93. DISTINCT

remove duplicate
keep one row from each group of duplicates
multiple columns will evaulate based on the combination of columns

⬆ back to top

94. Exercise: Distinct Keyword

/*
* DB: Employees
* Table: titles
* Question: What unique titles do we have?
*/
SELECT DISTINCT title FROM titles;
/*
* DB: Employees
* Table: employees
* Question: How many unique birth dates are there?
*/
SELECT COUNT(DISTINCT birth_date)
from employees;
/*
* DB: World
* Table: country
* Question: Can I get a list of distinct life expectancy ages
* Make sure there are no nulls
*/
SELECT DISTINCT lifeexpectancy FROM country
WHERE lifeexpectancy IS NOT NULL
ORDER BY lifeexpectancy;

⬆ back to top

95. Sorting Data

sort data ascending or descending by column
using expressions

SELECT * FROM customers
ORDER BY LENGTH(name)

⬆ back to top

96. Multi Table SELECT

A join combines columns from one table to another

SELECT a.emp_no, b.salary, b.from_date
FROM employees AS a, salaries AS b
WHERE a.emp_no = b.emp_no
ORDER BY a.emp_no;

⬆ back to top

97. Inner Join

Table A [Matched] Table B
Inner join syntax is generally considered a best pratice. It’s more readable than using the WHERE syntax. It shows you what’s being joined
can become more complicated as you add more tables to combine

SELECT a.emp_no, b.salary, b.from_date
FROM employees AS a
INNER JOIN salaries AS b
ON a.emp_no = b.emp_no
ORDER BY a.emp_no;

-- You want to know the original salary and also salary at a promotion
SELECT a.emp_no, 
  CONCAT(a.first_name, a.last_name) AS "name",
  b.salary, 
  COALESCE(c.title, 'No title change'),
  COALESCE(c.from_date::text, '-') AS "title taken on"
FROM employees AS a
INNER JOIN salaries AS b
ON a.emp_no = b.emp_no
INNER JOIN titles AS c
ON c.emp_no = a.emp_no AND (
  c.from_date = (b.from_date + interval '2 days') OR
  c.from_date = b.from_date
)
ORDER BY a.emp_no;

⬆ back to top

98. Self Join

join a table to itself
a table has a foreign reference its primary key
similar to inner join

id	name	startDate	supervisorId
1	Binni	1990/01/13	2
2	Andrei	1980/01/23	2

CREATE TABLE employee( 
  id varchar(5) NOT NULL,
  name varchar(20) NULL,
  start_date date NULL,
  supervisorId varchar(5) NULL,
  CONSTRAINT id PRIMARY KEY(id),
  CONSTRAINT supervisorId FOREIGN KEY(supervisorId) REFERENCES employee(id)
);
INSERT INTO employee VALUES ('1', 'Boss Man', date '1980/01/23', '1');
INSERT INTO employee VALUES ('1a', 'Andrei', date '1980/01/23', '1');
INSERT INTO employee VALUES ('1abc', 'Boss Man', date '1980/01/23', '1a');

SELECT a.id, 
  a.name AS "employee", 
  b.name AS "supervisor"
FROM employee AS a, 
  employee AS b
WHERE a.supervisorId = b.id;

SELECT a.id, 
  a.name AS "employee", 
  b.name AS "supervisor"
FROM employee AS a
INNER JOIN employee AS b
ON a.supervisorId = b.id;

⬆ back to top

99. Outer Join

What if I need the rows that don’t match?
Table A <-> Table B
Left outer add data that don’t have a match from table A
Right outer add data that don’t have a match from table B
Any value that does not match is made to be NULL

SELECT *
FROM <table A> AS a
LEFT [OUTER] JOIN <table B> AS b
ON a.id = b.id;

-- You want to know every salary and also salary at a promotion
SELECT a.emp_no, 
  CONCAT(a.first_name, a.last_name) AS "name",
  b.salary, 
  COALESCE(c.title, 'No title change'),
  COALESCE(c.from_date::text, '-') AS "title taken on"
FROM employees AS a
INNER JOIN salaries AS b
ON a.emp_no = b.emp_no
LEFT JOIN titles AS c
ON c.emp_no = a.emp_no AND (
  c.from_date = (b.from_date + interval '2 days') OR
  c.from_date = b.from_date
)
ORDER BY a.emp_no;

⬆ back to top

100. Less Common Joins

-- cross join: create a combination of every row
SELECT * FROM <tableA>
CROSS JOIN <tableB>;
-- full outer join: returns result from both table whether they match or not
SELECT * FROM <tableA> AS a
FULL JOIN <tableB> AS b
ON a.id = b.id;

⬆ back to top

101. Inner-Join Exercises

/*
* DB: Store
* Table: orders
* Question: Get all orders from customers who live in Ohio (OH), New York (NY) or Oregon (OR) state
* ordered by orderid
*/
SELECT c.firstname, c.lastname, o.orderid FROM orders AS o
INNER JOIN customers AS c ON o.customerid = c.customerid
WHERE c.state IN ('NY', 'OH', 'OR')
ORDER BY o.orderid;
/*
* DB: Store
* Table: products
* Question: Show me the inventory for each product
*/
SELECT p.prod_id, i.quan_in_stock
FROM products as p
INNER JOIN inventory AS i oN p.prod_id = i.prod_id 
/*
* DB: Employees
* Table: employees
* Question: Show me for each employee which department they work in
*/
SELECT e.first_name, dp.dept_name
FROM employees AS e
INNER JOIN dept_emp AS de ON de.emp_no = e.emp_no
INNER JOIN departments AS dp ON dp.dept_no = de.dept_no

⬆ back to top

102. USING Keyword

simplify the join syntax

SELECT * 
FROM <tableA>
INNER JOIN <tableB> USING(id)

SELECT e.emp_no, e.first_name, d.dept_name 
FROM employees AS e
INNER JOIN dept_emp AS de ON de.emp_no = e.emp_no
INNER JOIN departments AS d USING(dept_no);

⬆ back to top

Section 6: Advanced SQL

103. GROUP BY

summarise or aggregate data by groups
why group data? to get in-depth information by group
GROUP BY splits data into groups to apply functions to group rather than entire table
When we GROUP BY, we apply the function by group
reduce all records found in the group to a single record
GROUP BY utilizes SPLIT-APPLY-COMBINE strategy
GROUP BY happens after WHERE / FROM
FROM -> WHERE -> GROUP BY -> SELECT -> ORDER

⬆ back to top

104. HAVING Keyword

what if I want to filter groups?
HAVING apply filters to a group as a whole
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER

SELECT col1, COUNT(col2)
FROM <table>
WHERE col2 > X
GROUP BY col1
HAVING col1 === Y;

SELECT d.dept_name, 
  COUNT(e.emp_no) AS "# of employees"
FROM employees AS e
INNER JOIN dept_emp AS de
ON de.emp_no = e.emp_no
INNER JOIN departments AS d
ON de.dept_no = d.dept_no
WHERE e.gender = 'M'
GROUP BY d.dept_name
HAVING COUNT(e.emp_no) > 25000;

⬆ back to top

105. Ordering Grouped Data

SELECT d.dept_name, 
  COUNT(e.emp_no) AS "# of employees"
FROM employees AS e
INNER JOIN dept_emp AS de
ON de.emp_no = e.emp_no
INNER JOIN departments AS d
ON de.dept_no = d.dept_no
GROUP BY d.dept_name
ORDER BY COUNT(e.emp_no) DESC;

⬆ back to top

106. Group By Mental Model

GROUP BY emp_no
we can apply aggregate function to from_date and salary
however, we cannot get the salary of MAX(from_date)
or cannot get the from_date of MAX(salary)

SELECT emp_no, MAX(from_date), MAX(salary)
FROM salaries
GROUP BY emp_no;

⬆ back to top

107. Grouping Sets

What if we want to combine the results of multiple groupings?
UNION removes duplicate records
UNION ALL does not remove duplicate records
Grouping Sets is a subclause of GROUP BY that allows you to define multiple groupings

SELECT NULL AS "prod_id", SUM(ol.quantity)
FROM "public"."orderlines" AS ol
UNION
SELECT prod_id AS "prod_id", SUM(ol.quantity)
FROM "public"."orderlines" AS ol
GROUP BY prod_id
ORDER BY prod_id DESC;

SELECT prod_id AS "prod_id", SUM(ol.quantity)
FROM "public"."orderlines" AS ol
GROUP BY 
  GROUPING SETS (
    (),
    (prod_id)
  )
ORDER BY prod_id DESC;

SELECT prod_id AS "prod_id", orderlineid, SUM(ol.quantity)
FROM "public"."orderlines" AS ol
GROUP BY 
  GROUPING SETS (
    (),
    (prod_id),
    (orderlineid)
  )
ORDER BY prod_id DESC, orderlineid DESC;

⬆ back to top

108. Rollup

useful to replace GROUPING SETS that need all combinations

SELECT EXTRACT (YEAR FROM orderdate) AS "year",
  EXTRACT (MONTH FROM orderdate) AS "month",
  EXTRACT (DAY FROM orderdate) AS "day",
  sum(ol.quantity)
FROM orderlines AS ol
GROUP BY
  GROUPING SETS (
    (EXTRACT (YEAR FROM orderdate)),
    (
      EXTRACT (YEAR FROM orderdate),
      EXTRACT (MONTH FROM orderdate)
    ),
    (
      EXTRACT (YEAR FROM orderdate),
      EXTRACT (MONTH FROM orderdate),
      EXTRACT (DAY FROM orderdate)
    ),
    (
      EXTRACT (MONTH FROM orderdate),
      EXTRACT (DAY FROM orderdate)
    ),
    (EXTRACT (MONTH FROM orderdate)),
    (EXTRACT (DAY FROM orderdate)),
    ()
  )
ORDER BY
  EXTRACT (YEAR FROM orderdate),
  EXTRACT (MONTH FROM orderdate),
  EXTRACT (DAY FROM orderdate);

SELECT EXTRACT (YEAR FROM orderdate) AS "year",
  EXTRACT (MONTH FROM orderdate) AS "month",
  EXTRACT (DAY FROM orderdate) AS "day",
  sum(ol.quantity)
FROM orderlines AS ol
GROUP BY
  ROLLUP (
    EXTRACT (YEAR FROM orderdate),
    EXTRACT (MONTH FROM orderdate),
    EXTRACT (DAY FROM orderdate)
  )
ORDER BY
  EXTRACT (YEAR FROM orderdate),
  EXTRACT (MONTH FROM orderdate),
  EXTRACT (DAY FROM orderdate);

⬆ back to top

109. Group By Exercises

/*
*  How many people were hired on did we hire on any given hire date?
*  Database: Employees
*  Table: Employees
*/
SELECT a.hire_date, COUNT(b.hire_date) as "amount"
FROM employees as a, employees as b
WHERE a.hire_date = b.hire_date
GROUP BY a.hire_date
ORDER BY "amount" DESC;
/*
*  Show me all the employees, hired after 1991, that have had more than 2 titles
*  Database: Employees
*/
SELECT e.emp_no, count(t.title) as "amount of titles"
FROM employees as e
JOIN titles as t USING(emp_no)
WHERE EXTRACT (YEAR FROM e.hire_date) > 1991
GROUP BY e.emp_no
HAVING count(t.title) > 2
ORDER BY e.emp_no;
/*
*  Show me all the employees that have had more than 15 salary changes that work in the department development
*  Database: Employees
*/
SELECT e.emp_no, count(s.from_date) as "amount of raises"
FROM employees as e
JOIN salaries as s USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
WHERE de.dept_no = 'd005'
GROUP BY e.emp_no
HAVING count(s.from_date) > 15
ORDER BY e.emp_no;
/*
*  Show me all the employees that have worked for multiple departments
*  Database: Employees
*/
SELECT e.emp_no, count(de.dept_no) as "worked for # departments"
FROM employees as e
JOIN dept_emp AS de USING(emp_no)
GROUP BY e.emp_no
HAVING count(de.dept_no) > 1
ORDER BY e.emp_no;

⬆ back to top

110. Window What?

group data is useful
group happens after FROM / WHERE
HAVING is a special filter for groups
GROUPING SETS AND ROLLUP are useful for multiple groupings in a single query
Group data is not a silver bullet

Window functions

How do we apply functions against a set of rows related to the current row?
add average to every salary to see how much each employee is from average

⬆ back to top

111. Looking Through The Window

Window Functions
window functions create a new column based on functions performed on a subset or “window” of the data

window_function(arg1, arg2,..) OVER (
  [PARTITION BY partition_expression]
  [ORDER BY sort_expression [ASC | DESC] [NULLS {FIRST | LAST}]
)

-- how to display max salary in each indivodual row?
SELECT *, MAX(salary) OVER() 
FROM salaries
LIMIT 100;

⬆ back to top

112. PARTITION BY

divide rows into groups to apply the function against

SELECT *, AVG(salary) 
OVER (
  PARTITION BY d.dept_name
) 
FROM salaries
JOIN dept_emp AS de USING (emp_no)
JOIN departments AS d USING (dept_no);

⬆ back to top

113. Order By Acting Strange

ORDER BY: order the results

window function ORDER BY

ORDER BY changes the frame of the window function
accumulating: take account of everything before me and myself

SELECT emp_no, salary, COUNT(salary) 
OVER (ORDER BY emp_no) 
FROM salaries

⬆ back to top

114. Using Framing In Window Function

when use a frame, we can create a sub-range or frame
without ORDER BY, by default the frame is usually all partition rows
with ORDER BY, by default the frame is usually everything before current row and current row

Window Function Calls

Key: Meaning

Rows or range: use a range or rows as a frame
preceding: rows before the current one
following: rows after the current one
unbounded preceding or following: returns all before or after
current row: your current row

PARTITION BY category 
ORDER BY price 
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

SELECT emp_no, salary, 
COUNT(salary) OVER (ORDER BY emp_no) 
FROM salaries

SELECT emp_no, salary, 
COUNT(salary) OVER (
  PARTITION BY emp_no
  ORDER BY salary
) 
FROM salaries

SELECT emp_no, salary, 
COUNT(salary) OVER (
  PARTITION BY emp_no
  ORDER BY salary
  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) 
FROM salaries

⬆ back to top

115. Solving For Current Salary

Window Functions

SELECT e.emp_no, e.first_name, d.dept_name, MAX(s.salary)
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
GROUP BY e.emp_no, e.first_name, d.dept_name
ORDER BY e.emp_no;

SELECT DISTINCT e.emp_no, 
  LAST_VALUE(salary) OVER (
    PARTITION BY s.emp_no
    ORDER BY s.from_date
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;

SELECT DISTINCT e.emp_no, 
  LAST_VALUE(s.from_date) OVER (
    PARTITION BY s.emp_no
    ORDER BY s.from_date
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ),
  LAST_VALUE(salary) OVER (
    PARTITION BY s.emp_no
    ORDER BY s.from_date
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;

SELECT DISTINCT e.emp_no, 
  s.from_date,
  s.salary
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;

⬆ back to top

116. FIRST_VALUE

return a value evaluated against the first row within its partition

SELECT prod_id, price, category
  FIRST_VALUE(price) OVER (
    PARTITION BY category
    ORDER BY price
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS "cheapest in category"
FROM products

SELECT prod_id, price, category,
  MIN(price) OVER (
    PARTITION BY category
  ) AS "cheapest in category"
FROM products

⬆ back to top

117. LAST_VALUE

return a value evaluated against the last row within its partition

SELECT prod_id, price, category,
  LAST_VALUE(price) OVER (
    PARTITION BY category
    ORDER BY price
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS "most expensive in category"
FROM products

SELECT prod_id, price, category,
  MAX(price) OVER (
    PARTITION BY category
  ) AS "most expensive in category"
FROM products

⬆ back to top

118. SUM

sum values within a group depend on the frame

SELECT o.orderid, o.customerid, o.netamount,
  SUM(o.netamount) OVER (
    PARTITION BY o.customerid
    ORDER BY o.orderid
  ) AS "cum sum"
FROM orders as o
ORDER BY o.customerid

⬆ back to top

119. ROW_NUMBER

number the current row within the partition starting from 1 regardless of framing

SELECT prod_id, price, category,
  ROW_NUMBER() OVER(
    PARTITION BY category
    ORDER BY price
  ) AS "position in category by price"
FROM products

⬆ back to top

120. Window Function Exercises

/*
*  Show the population per continent
*  Database: World
*  Table: Country
*/
SELECT
  DISTINCT continent,
  SUM(population) OVER w1 as "continent population"
FROM country 
WINDOW w1 AS( PARTITION BY continent );
/*
*  To the previous query add on the ability to calculate the percentage of the world population
*  What that means is that you will divide the population of that continent by the total population and multiply by 100 to get a percentage.
*  Make sure you convert the population numbers to float using `population::float` otherwise you may see zero pop up
*
*  Database: World
*  Table: Country
*/
SELECT
  DISTINCT continent,
  SUM(population) OVER w1 as"continent population",
  CONCAT( 
      ROUND( 
          ( 
            SUM( population::float4 ) OVER w1 / 
            SUM( population::float4 ) OVER() 
          ) * 100    
      ),'%' ) as "percentage of population"
FROM country 
WINDOW w1 AS( PARTITION BY continent );
/*
*  Count the number of towns per region
*
*  Database: France
*  Table: Regions (Join + Window function)
*/
SELECT 
DISTINCT r.id, 
r."name", 
COUNT(t.id) OVER (
    PARTITION BY r.id
    ORDER BY r."name"
) AS "# of towns"
FROM regions AS r
JOIN departments AS d ON r.code = d.region 
JOIN towns AS t ON d.code = t.department
ORDER BY r.id;

⬆ back to top

121. Conditional Statements

What if you only want to select something when a certain criteria is met?
case statements can be used in multiple places in a query
each return must be a single output

Case statement

SELECT a,
  CASE 
    WHEN a=1 THEN 'one'
    WHEN a=2 THEN 'two'
  END
FROM test;

Render custom row data

SELECT 
  o.orderid,
  o.customerid,
  CASE 
    WHEN o.customerid = 1 
    THEN 'my first customer'
    ELSE 'not my first customer'
  END,
  o.netamount
FROM orders as o
ORDER BY o.customerid;

Filter in a where

SELECT 
  o.orderid,
  o.customerid,
  o.netamount
FROM orders as o
WHERE CASE 
  WHEN o.customerid > 10 
  THEN o.netamount < 100 
  ELSE o.netamount > 100 
END
ORDER BY o.customerid;

In an aggregate function

SELECT SUM(
  CASE 
    WHEN o.netamount <100
    THEN -100 
    ELSE o.netamount
  END
) AS "returns",
  SUM(o.netamount) AS "normal total"
FROM orders as o;

⬆ back to top

122. NULLIF

what if you want to return NULL if a condition is met?
if value 1 = value 2, return NULL
fill in empty spots with a NULL to avoid divide by zero issues

NULLIF(val_1, val_2)
NULLIF(0, 0)  -- NULL
NULLIF('ABC', 'DEF')  -- 'ABC'

⬆ back to top

123. Views…What Are They Good For?

What if you want to store the results of a query?
What if you want to query the results of a query?
view allow you to store and query previously run queries
there are 2 types of views materialized and non-materialized
non-materialized: query get re-run each time the view is called
materialized: store the data physically and periodically updates it when tables change

⬆ back to top

124. View Syntax

view is the output of the query we ran
view act like tables you can query them
view take very little space to store
we only store the definition of a view
not all of the data that it returns

CREATE VIEW view_name AS query;
CREATE OR REPLACE view_name AS query;
ALTER VIEW <view_name> RENAME TO <view_name>;
DROP VIEW [ IF EXISTS] <view_name>;

⬆ back to top

125. Using Views

get the most recent salary of an employee
use window functions
view is easier to reason then window functions

SELECT e.emp_no, e.first_name, d.dept_name, MAX(s.salary)
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
GROUP BY e.emp_no, e.first_name, d.dept_name
ORDER BY e.emp_no;

SELECT DISTINCT e.emp_no, 
  LAST_VALUE(salary) OVER (
    PARTITION BY s.emp_no
    ORDER BY s.from_date
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;

CREATE OR REPLACE VIEW last_salary_change AS
SELECT e.emp_no, 
  MAX(s.from_date)
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
GROUP BY e.emp_no
ORDER BY e.emp_no;

SELECT * FROM salaries
JOIN last_salary_change AS l USING(emp_no)
WHERE from_date = l.max
ORDER BY emp_no;

SELECT s.emp_no, 
  d.dept_name, 
  s.from_date, 
  s.salary
FROM last_salary_change
JOIN salaries AS s USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
WHERE max = s.from_date;

⬆ back to top

126. Indexes

Poor database indexing – a SQL query performance killer – recommendations

What is Indexes?

An index is a pointer to data in a table
An index is a construct to speed up data access in a table
An index is used to find data without having to scan the whole table
An index is used to improve query performance with filter conditions
An index work like a table of contents to help you find a piece of data

What does it do?

speed up queries
slow down data insertion and updates

Types of indexes

Single column
Multi column
Unique (PK)
Partial
Implicit Indexes (be default)

CREATE UNIQUE INDEX <name>
on <table> (column1, column2, ...)
DROP INDEX <name>

When to use

Index foreign keys
Index primary keys and unique columns
Index on columns that end up in the ORDER BY / WHERE clause often

When NOT to use

Don’t add an index just to add an index
Don’t use indexes on small tables
Don’t use on tables that are updated frequently
Don’t use on columns that can contain NULL values
Don’t use on columns that have large values

⬆ back to top

127. Index Types

Index Types

Single column
- Most frequently used column in a query
- when? retrieve data that satisfies one condition
Multi column
- Most frequently used columns in a query
- when? retrieve data that satisfies multiple conditions
Unique
- normally primary key or column with unique value
- for speed and integrity
```
CREATE UNIQUE INDEX <name>
ON <table> (column1);
```

Partial

index over a subset of a table

CREATE INDEX <name>
ON <table> (<expression>);

Implicit Indexes (be default)
- auto created by database
- PRIMARY KEY
- UNIQUE KEY

EXPLAIN ANALYZE
SELECT "name", "district", "countrycode"
FROM "public"."city"
WHERE countrycode IN ('TUN', 'BE', 'NL');

CREATE INDEX idx_countrycode
ON city (countrycode);

Without Index

With Index

CREATE INDEX idx_countrycode
ON city (countrycode)
WHERE countrycode IN ('TUN', 'BE', 'NL');

With Partial Index

⬆ back to top

128. Index Algorithms

Postgres provides several types of indexes

B-Tree: default algorithm
- best used for comparisons
- <, <=, =, >, >=, BETWEEN, IN, IS NULL, IS NOT NULL
Hash
- can only handle equality (=) operations
Gin: generalized inverted index
- best used when multiple values are stored in a single field
Gist: generalized search tree
- useful in index geometric data and full-text search

Each index type uses a different algorithm

CREATE [UNIQUE] INDEX <name>
ON <table> USING <method> (column1, ...)

⬆ back to top

129. What Are Subqueries?

a construct that allow you to build extremely complex query
also called inner query or inner select
is a query within another query
most often found in the WHERE clause
can also used it in SELECT, FROM and HAVING clause
return a single value the SELECT clause,
return a single column with single or multiple rows the WHERE and HAVING clause
return a table in the FROM or JOIN clause

SELECT *
FROM <table>
WHERE <column> <condition> (
  SELECT <column> 
  FROM <table>
  [WHERE | GROUP BY | ORDER BY | ...] 
)

SELECT (
  SELECT <column> 
  FROM <table>
  [WHERE | GROUP BY | ORDER BY | ...] 
)
FROM <table> AS <name>

SELECT *
FROM (
  SELECT <column>, <column>, ...
  FROM <table>
  [WHERE | GROUP BY | ORDER BY | ...] 
) AS <name>

SELECT *
FROM <table> AS <name>
GROUP BY <column>
HAVING (
  SELECT <column> 
  FROM <table>
  [WHERE | GROUP BY | ORDER BY | ...] 
)

⬆ back to top

130. Subqueries vs Joins

how a subquery different from a join?
both combine data from different tables
subquery could stand alone but not join
join combines rows from one or more tables based on match condition
subquery can return a single result or a row set(table)
joins can only return a row set(table)
a subquery’s results cannot be reference by outer query
a joined table can be reference by outer query
use a join if possible over subquery for performance reason

SELECT AVG(price) FROM products
SELECT title, price, 
  (SELECT AVG(price) FROM products) AS "global average price"
FROM products;

SELECT prod_id, title, price, i.quan_in_stock 
FROM products
JOIN inventory AS i USING(prod_id)

SELECT title, price, 
  (SELECT AVG(price) FROM products) AS "global average price"
FROM (
  SELECT * FROM products WHERE price < 10
) AS "products_sub";

⬆ back to top

131. Subquery Guidelines As Types

a subquery must be enclosed in parentheses
must be placed on the right side of the comparision operator
cannot manipulate results internally (order by ignored)
use single row operators with single-row subquery
subquery return NULL may not return results

SELECT *
FROM <table> AS <name>
WHERE X >= | <= | = | != (
  SELECT MAX(<column>)
  FROM <table>
  [WHERE | GROUP BY | ORDER BY | ...] 
)

Types of subquery

single row: return 0 or 1 row
multiple row: return 0 or more rows
multiple column: return 0 or more columns
correlated: reference 1 or more columns in the other statement - runs against each row
correlated: can have performance bottleneck
nested: subquery in a subquery

single row

SELECT name, salary
FROM salaries
WHERE salary = 
  (SELECT AVG(salary) FROM salaries);

SELECT name, salary,
  (SELECT AVG(salary) FROM salaries) 
  AS "Company average salary"
FROM salaries;

multiple row

SELECT title, price, category
FROM products
WHERE category IN (
  SELECT category FROM categories
  WHERE categoryname IN ('Comedy', 'Family', 'Classics')
);

multiple column

SELECT emp_no, salary, 
  dea.avg AS "Department average salary"
FROM salaries AS s
JOIN dept_emp AS de USING(emp_no)
JOIN (
  SELECT dept_no, AVG(salary) 
  FROM salaries AS s2
  JOIN dept_emp AS e USING(emp_no)
  GROUP BY dept_no
) AS dea USING(dept_no)
WHERE salary > dea.avg;

correlated

SELECT emp_no, salary, from_date
FROM salaries AS s
WHERE from_date = (
  SELECT MAX(s2.from_date) AS max
  FROM salaries AS s2
  WHERE s2.emp_no = s.emp_no
)
ORDER BY emp_no;

nested

SELECT orderlineid, prod_id, quantity
FROM orderlines
JOIN (
  SELECT prod_id
  FROM products
  WHERE category IN (
    SELECT category FROM categories
    WHERE categoryname IN ('Comedy', 'Family', 'Classics')
  )
) AS limited USING(prod_id)

⬆ back to top

132. Using Subqueries

show all employees older than the average age

SELECT 
  first_name, 
  last_name, 
  birth_date, 
  AGE(birth_date),
  (
    SELECT AVG(AGE(birth_date)) 
    FROM employees
  )
FROM employees
WHERE AGE(birth_date) > (
  SELECT AVG(AGE(birth_date)) 
  FROM employees
);

Show the title by salary for each employee

SELECT 
  emp_no, 
  salary, 
  from_date, 
  (
    SELECT title 
    FROM titles AS t 
    WHERE t.emp_no = s.emp_no 
      AND t.from_date = s.from_date
  )
FROM salaries AS s
ORDER BY emp_no;

use join instead of subquery if possible for performance reason

SELECT 
  emp_no, 
  salary, 
  from_date,
  t.title
FROM salaries AS s
JOIN titles AS t USING(emp_no, from_date)
ORDER BY emp_no;

⬆ back to top

133. Getting The Latest Salaries

Show the most recent employee salary

SELECT 
  emp_no, 
  salary AS "most recent salary", 
  from_date
FROM salaries AS s
WHERE from_date = (
  SELECT MAX(from_date)
  FROM salaries AS sp
  WHERE sp.emp_no = s.emp_no
)
ORDER BY emp_no ASC;

SELECT   
  emp_no, 
  salary AS "most recent salary", 
  from_date 
FROM salaries AS s
JOIN last_salary_change AS ls USING(emp_no)
WHERE from_date = ls.max
ORDER BY emp_no;

SELECT 
  emp_no, 
  salary AS "most recent salary", 
  from_date
FROM salaries AS s
JOIN (
  SELECT emp_no, MAX(from_date)
  FROM salaries AS sp
  GROUP BY emp_no
) AS ls USING(emp_no)
WHERE ls.max = from_date
ORDER BY emp_no ASC;

⬆ back to top

134. Subquery Operators

operators you can apply in the WHERE clause on subquery
EXISTS: check if subquery returns any rows
IN: check if value is equal to any of the rows in the return (NULL yields NULL)
NOT IN: check if value is not equal to any of the rows in the return (NULL yields NULL)
ANY/SOME: check each row against the operator and if any comparison matches return TRUE
ALL: check each row against the operator and if all comparisons match return TRUE
SINGLE VALUE COMPARISON: subquery must return a single row check comparatore against row

SELECT firstname, lastname, income
FROM customers AS c
WHERE EXISTS (
  SELECT * FROM orders AS o
  WHERE c.customerid = o.customerid 
    AND totalamount > 400 
) AND income > 90000;

SELECT prod_id
FROM products
WHERE category IN (
  SELECT category FROM categories
  WHERE categoryname IN ('Comedy', 'Family', 'Classics')
);

SELECT prod_id
FROM products
WHERE category IN (
  SELECT category FROM categories
  WHERE categoryname NOT IN ('Comedy', 'Family', 'Classics')
);

SELECT prod_id
FROM products
WHERE category = ANY (
  SELECT category FROM categories
  WHERE categoryname IN ('Comedy', 'Family', 'Classics')
);

SELECT prod_id, title, sales
FROM products
JOIN inventory AS i USING(prod_id)
WHERE i.sales > ALL (
  SELECT AVG(sales) FROM inventory
  JOIN products AS pl USING(prod_id)
  GROUP BY pl.category
);

SELECT prod_id
FROM products
WHERE category = (
  SELECT category FROM categories
  WHERE categoryname IN ('Comedy')
);

⬆ back to top

135. Subquery Exercises

/* TRY TO WRITE THESE AS JOINS FIRST */
/*
* DB: Store
* Table: orders
* Question: Get all orders from customers who live in Ohio (OH), New York (NY) or Oregon (OR) state
* ordered by orderid
*/
SELECT c.firstname, c.lastname, o.orderid 
FROM orders AS o, (
    SELECT customerid, state, firstname, lastname
    FROM customers
) AS c
WHERE  o.customerid = c.customerid AND 
c.state IN ('NY', 'OH', 'OR')
ORDER BY o.orderid;
/*
* DB: Employees
* Table: employees
* Question: Filter employees who have emp_no 110183 as a manager
*/
SELECT emp_no, first_name, last_name
FROM employees
WHERE emp_no IN (
    SELECT emp_no
    FROM dept_emp
    WHERE dept_no = (
        SELECT dept_no 
        FROM dept_manager
        WHERE emp_no = 110183
    )
)
ORDER BY emp_no
-- Written with JOIN
SELECT e.emp_no, first_name, last_name
FROM employees as e
JOIN dept_emp as de USING (emp_no)
JOIN dept_manager as dm USING (dept_no)
WHERE dm.emp_no = 110183

⬆ back to top

Section 7: Database Management

136. What we’ll be learning

different types of data
how to create a database
how to create tables
how to insert/update data
… so more more

⬆ back to top

137. Time To Create Some Stuff!

Why is it important?

create and maintain data is extremely hard

Data definition language (DDL)

commands to create, modify, delete different structures
commands for add, delete, modify data

⬆ back to top

138. Types Of Databases In A RDBMS

Types Of Databases

regular
template: build a regular database based on a tempate

Create a database

when you setup your postgres, 3 databases were created
- postgres
- template0
- template1
postgres is the default database that is created when you setup postgres (initdb)

⬆ back to top

139. Default PostgreSQL Database

Postgres Database

So the first time you initialize postgres, you can connect to postgres database to setup your first database
- psql -U <user> <database>
postgres by default will assume a connection to a database with the same name as the user if no database is supplied
- psql -U postgres
This command shows you your current connection
- \conninfo

⬆ back to top

140. Template Databases

Template0 Database

This is the template that is used to create template1 - never change it
It is in essence a backup template

Template1 Database

This is the template that is used to create new databases
Any changes we make to the template are automatically applied to all new databases
one thing to note is that becase template1 is the default template - if it is being accessed / connected to no new databases can be created until the connection is closed!
Case 1: connect to Template1 and try to create new database

psql -U postgres template1
template1=#

psql -U postgres Employees
Employees=# CREATE DATABASE myblockeddb;
ERROR:  source database "template1" is being accessed by other users
DETAIL:  There is 1 other session using the database.

Case 2: exit Template1 and try to create new database

psql -U postgres Employees
Employees=# CREATE DATABASE myblockeddb;
CREATE DATABASE

Case 3: connect to Template1 and try to create new database with template0

psql -U postgres template1
template1=#

psql -U postgres Employees
Employees=# CREATE DATABASE myblockeddb2 WITH TEMPLATE template0;
CREATE DATABASE

Template Databases

You can create your own template database to create other database with. But this however not very common!

CREATE DATABASE mysupertemplate;
CREATE TABLE supertable ();
CREATE DATABASE mysuperdatabase with TEMPLATE mysupertemplate;

⬆ back to top

141. Creating A Database

So how to create a database?
CREATE DATABASE

CREATE DATABASE name
  [ [ WITH ] 
    [ OWNER [=] user_name ]
    [ TEMPLATE [=] template ]
    [ ENCODING [=] encoding ]
    [ LC_COLLATE [=] lc_collate ]
    [ LC_CTYPE [=] lc_ctype ]
    [ TABLESPACE [=] tablespace ]
    [ CONNECTION_LIMIT [=] connlimit ] ]

Setting	Default
TEMPLATE	template1
ENCODING	UTF8
CONNECTION_LIMIT	100
OWNER	Current User

Lets create an exmple database to store courses!

psql -U postgres postgres
postgres=# CREATE DATABASE ztm;
CREATE DATABASE
postgres=# DROP DATABASE ztm;
DROP DATABASE

⬆ back to top

142. Database Organization

What is Database Schema
What is a Database Schema
Databases often contain many tables, view etc depending on how much you care, you may want to organize them in a logical way!

Postgres schema

Postgres offers the concept of schemas. Think of it like a box in which you can organise tables, views, indexes, etc
by default, each database gets a public schema
unless you specific a schema, the default us always assumed to be public!

SELECT * FROM employees
SELECT * FROM public.employees

postgres=# \dn
  List of schemas
  Name  |  Owner   
--------+----------
 public | postgres
(1 row)

Create a schema

postgres=# CREATE SCHEMA sales;
CREATE SCHEMA
postgres=# \dn
   List of schemas
  Name  |    Owner    
--------+-------------
 public | postgres
 sales  | postgres
(2 rows)

Reasons to use schemas

allow many users to use one database without interfering with each other
organise database objects into logical groups to make them more management
3rd party applications can be put into seperate schemas so they do not collide with the names of other objects

Restricted

create databases is a restricted action
not everyone is allowed to do it

⬆ back to top

143. Roles In Postgres

Roles

roles are vital to any dbms they determine what’s allowed
a role can be a user or a group
it depends on how you setup the role
roles have the ability to grant membership to another role
a role have attributes which define privileges

⬆ back to top

144. Role Attributes And Creation

Role Attribute

privileges of a role are determined in part by its attributes
attribute example
- SUPERUSER | NOSUPERUSER
- CREATEDB | NOCREATEDB
- CREATEROLE | NOCREATEROLE
- LOGIN | NOLOGIN

Create role

always encrypt when storing a role that can log in

CREATE ROLE readonly WITH LOGIN ENCRYPTED PASSWORD 'redaonly';

by default only the creator of the database or superuser has access to its onject
```
SELECT * FROM departments;
ERROR: permission denied for table departments
```

postgres=# \du
                                    List of roles
  Role name  |                         Attributes                         | Member of 
-------------+------------------------------------------------------------+-----------
 chesterheng | Superuser, Create role, Create DB                          | {}
 postgres    | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

⬆ back to top

A user is the same a role
It assumes login by default

CREATE USER user1 WITH ENCRYPTED PASSWORD 'user1';

postgres=# CREATE role test_role_with_login WITH LOGIN ENCRYPTED PASSWORD 'password';
CREATE ROLE
postgres=# \du
                                         List of roles
      Role name       |                         Attributes                         | Member of 
----------------------+------------------------------------------------------------+-----------
 chesterheng          | Superuser, Create role, Create DB                          | {}
 postgres             | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 test_role_with_login |    
postgres=# CREATE USER test_user_with_login WITH ENCRYPTED PASSWORD 'password';
CREATE ROLE
postgres=# \du
                                         List of roles
      Role name       |                         Attributes                         | Member of 
----------------------+------------------------------------------------------------+-----------
 chesterheng          | Superuser, Create role, Create DB                          | {}
 postgres             | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 test_role_with_login |                                                            | {}
 test_user_with_login |

createuser --interactive
Enter name of role to add: test_interactive
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) y

postgres=# \du
                                         List of roles
      Role name       |                         Attributes                         | Member of 
----------------------+------------------------------------------------------------+-----------
 chesterheng          | Superuser, Create role, Create DB                          | {}
 postgres             | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 test_interactive     | Create role                                                | {}
 test_role_with_login |                                                            | {}
 test_user_with_login |   
postgres=# ALTER ROLE test_interactive WITH ENCRYPTED PASSWORD 'password';
ALTER ROLE

Did not prompt for password login to Employees database with test_interactive user

psql -U test_interactive Employees
Employees=>

Force to prompt for password login to Employees database with test_interactive user
Can get in with any password
Configuration of postgres with trust all default connection by default

psql -U test_interactive Employees -W
Employees=>

show hba_file and config_file
hba_file
Password Authentication
config_file

postgres=# show hba_file;
                                  hba_file                                  
----------------------------------------------------------------------------
 /Users/chesterheng/Library/Application Support/Postgres/var-12/pg_hba.conf
(1 row)
postgres=# show config_file;
                                  config_file                                   
--------------------------------------------------------------------------------
 /Users/chesterheng/Library/Application Support/Postgres/var-12/postgresql.conf
(1 row)

Set method to scram-sha-256 in hba_file

# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     scram-sha-256
# IPv4 local connections:
host    all             all             127.0.0.1/32            scram-sha-256
# IPv6 local connections:
host    all             all             ::1/128                 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     scram-sha-256
host    replication     all             127.0.0.1/32            scram-sha-256
host    replication     all             ::1/128                 scram-sha-256

Login to Employees database with test_interactive user
will ask for password now

psql -U test_interactive Employees
Password for user test_interactive: 
psql: error: could not connect to server: FATAL:  password authentication failed for user "test_interactive"

need to tell postgres to use scram-sha-256 in config_file

password_encryption = scram-sha-256        # md5 or scram-sha-256

Set method back to trust in hba_file
Alter test_interactive to use password

# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
postgres=# ALTER USER test_interactive WITH PASSWORD 'password';
ALTER ROLE

Set method to scram-sha-256 in hba_file

# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     scram-sha-256
# IPv4 local connections:
host    all             all             127.0.0.1/32            scram-sha-256
# IPv6 local connections:
host    all             all             ::1/128                 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     scram-sha-256
host    replication     all             127.0.0.1/32            scram-sha-256
host    replication     all             ::1/128                 scram-sha-256

psql -U test_interactive Employees
Password for user test_interactive: 
Employees=>

set method trust for postgres user

# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             postgres                                trust

psql -U postgres postgres
postgres=# exit

⬆ back to top

146. Privileges

attributes can only define so many privileges
when you are not a superuser
what you can do is fairly limited
depending on what default provoleges are set
by default objects are only available to the one who creates them
privileges need to be granted for new roles and users to have access to certain data

Grant Privileges

be wise about the privileges you grant

GRANT ALL PRIVILEGES ON <table> TO <user>;
GRANT ALL ON ALL TABLES [IN SCHEMA <schema>] TO <user>;
GRANT [SELECT, INSERT, UPDATE, DELETE, ...] ON <table> [IN SCHEMA <schema>] TO <user>;

⬆ back to top

147. Granting Privileges and Role Management

Set method back to trust in hba_file for convenient

# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust

create new user privilegetest
privilegetest has no privilege to query titles table created by postgres

createuser --interactive
Enter name of role to add: privilegetest
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n
psql -U privilegetest Employees
Employees=# \conninfo
You are connected to database "Employees" as user "privilegetest" via socket in "/tmp" at port "5432".
Employees=# \dt
              List of relations
 Schema |     Name     | Type  |    Owner    
--------+--------------+-------+-------------
 public | departments  | table | postgres
 public | dept_emp     | table | postgres
 public | dept_manager | table | postgres
 public | employees    | table | postgres
 public | salaries     | table | postgres
 public | titles       | table | postgres
(6 rows)
Employees-# SELECT * FROM titles;
ERROR:  syntax error at or near "SELECT"
LINE 2: SELECT * FROM titles;
        ^

use postgres user to grant privilegetest the privilege to query titles table created by postgres

psql -U postgres Employees
Employees=# GRANT SELECT ON titles TO privilegetest;
GRANT

use privilegetest to query titles table created by postgres

psql -U privilegetest Employees
Employees=> SELECT * FROM titles;
 emp_no |       title        | from_date  |  to_date   
--------+--------------------+------------+------------
  10001 | Senior Engineer    | 1986-06-26 | 9999-01-01
  10002 | Staff              | 1996-08-03 | 9999-01-01
  10003 | Senior Engineer    | 1995-12-03 | 9999-01-01
  10004 | Engineer           | 1986-12-01 | 1995-12-01
  10004 | Senior Engineer    | 1995-12-01 | 9999-01-01
  10005 | Senior Staff       | 1996-09-12 | 9999-01-01
  10005 | Staff              | 1989-09-12 | 1996-09-12
  10006 | Senior Engineer    | 1990-08-05 | 9999-01-01
  10007 | Senior Staff       | 1996-02-11 | 9999-01-01
  10007 | Staff              | 1989-02-10 | 1996-02-11
  10008 | Assistant Engineer | 1998-03-11 | 2000-07-31
  10009 | Assistant Engineer | 1985-02-18 | 1990-02-18
  10009 | Engineer           | 1990-02-18 | 1995-02-18
  10009 | Senior Engineer    | 1995-02-18 | 9999-01-01
  10010 | Engineer           | 1996-11-24 | 9999-01-01
  10011 | Staff              | 1990-01-22 | 1996-11-09
  10012 | Engineer           | 1992-12-18 | 2000-12-18
  10012 | Senior Engineer    | 2000-12-18 | 9999-01-01
  10013 | Senior Staff       | 1985-10-20 | 9999-01-01
  10014 | Engineer           | 1993-12-29 | 9999-01-01
  10015 | Senior Staff       | 1992-09-19 | 1993-08-22
:

use postgres user to revoke privilegetest the privilege to query titles table created by postgres

psql -U postgres Employees
Employees=# REVOKE SELECT ON titles FROM privilegetest;
REVOKE

psql -U privilegetest Employees
Employees=> SELECT * FROM titles;
ERROR:  permission denied for table titles

grant all privileges to privilegetest in public schema

psql -U postgres Employees
Employees=# GRANT ALL ON ALL TABLES IN SCHEMA public TO privilegetest;
GRANT
Employees=# REVOKE ALL ON ALL TABLES IN SCHEMA public FROM privilegetest;
REVOKE

create role with granular option such as employee_read
grant role with granular option to specific user

Employees=# CREATE ROLE employee_read;
CREATE ROLE
Employees=# GRANT SELECT ON ALL TABLES IN SCHEMA public TO employee_read;
GRANT
Employees=# GRANT employee_read TO privilegetest;
GRANT ROLE
Employees=# \du
                                            List of roles
      Role name       |                         Attributes                         |    Member of    
----------------------+------------------------------------------------------------+-----------------
 chesterheng          | Superuser, Create role, Create DB                          | {}
 employee_read        | Cannot login                                               | {}
 postgres             | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 privilegetest        |                                                            | {employee_read}
 test_interactive     | Create role                                                | {}
 test_role_with_login |                                                            | {}
 test_user_with_login |                                                            | {}
Employees=# REVOKE employee_read FROM privilegetest;
REVOKE ROLE

⬆ back to top

148. Best Practices For Role Management

when manage roles and permissions, always go with the principle of least privilege
assign attributes and grant privileges to a role
assign a role to a specific user
don’t use admin / super user by default

⬆ back to top

149. Data Types & Boolean Type

Before we start

before we create a full-fledged database, it’s important to know what types of data we can store in a database

Data types

when we talk about types of data, we can store we categorize them into buckets or types
type: Numberic Types, Arrays, Character Types, Date/Time Types, Boolean Types, UUID Types
Data Types
A data type is nothing more than a constraint place on a field to only allow that type of data to be filled in
Specific a type is important because it tells the system how it can / should handle the data in that field
it’s also what allows the dbms to optimize it’s algorithms for processing data by knowing its type

Boolean

a boolean type can hold TRUE, FALSE or NULL
it can also do smart conversions
- 1, yes, y , t, true => TRUE
- 0, no, f false => FALSE

⬆ back to top

150. Storing Text

Character

postgres provides 3 character data types: CHAR(N), VARCHAR(N) and TEXT
CHAR(N): fixed length with space padding
VARCHAR(N): variable length with no padding
TEXT: unlimited length text

CREATE TABLE test_text (
  fixed char(4),
  variable varchar(20),
  unlimited text
);
INSERT INTO test_text VALUES ('mo', 'mo', 'I have unlimited space');
SELECT * FROM test_text;

⬆ back to top

151. Storing Numbers

there are 2 types of number in postgres: integers and floating points
integers: a whole number, a number that is not a fraction
3 types of integers: SMALLINT, INT, BIGINT
SMALLINT: number that can range from -32,768 to 32,767
INT: number that can range from -2,147,483,648 to 2,147,483,647
BIGINT: number that can range from -9223372036854775808 to 9223372036854775807
floating point: number that can contain a decimal point
FLOAT4: single or 6 digits precision
- 1.123456789 => 1.1234568 (reserve 6 digits and round off from 7th digits)
FLOAT8: double or 15 digits precision
- 1.12345678901234567 => 1.1234567890123457 (reserve 15 digits and round off from 16th digits)
DECIMAL / NUMBERIC: up to 131072 digits before the decimal point; up to 16383 digits after the demical point

CREATE TABLE test_float (
  four float4,
  eight float8,
  big decimal
);
INSERT INTO test_float VALUES (1.123456789, 1.12345678901234567, 1.12345678901234567);
SELECT * FROM test_float;

four	eight	big
1.1234568	1.1234567890123457	1.12345678901234567

⬆ back to top

152. Storing Arrays

an array is a group of elements of the same type
an array is denoted by a bracket syntax
every data type in postgres has an array equivalent
Example: store phone numbers

CREATE TABLE test_array (
  four char(2)[],
  eight text[],
  big float4[]
);
INSERT INTO test_array VALUES (ARRAY ['mo', 'm', 'm', 'd'], ARRAY ['test', 'long text', 'longer text'], ARRAY [1.23, 2.11, 3.23, 5.32]);
SELECT * FROM test_array;

four	eight	big
[“mo”,”m “,”m “,”d “]	[“test”,”long text”,”longer text”]	[1.23,2.11,3.23,5.32]

⬆ back to top

153. Data Models And Naming Conventions

Database Models

create a model before we create a database

What is a model?

a model is a design that is used to visualize what we are going to build

Naming conventions

table names must be singular!
columns must be lowercase with underscores
columns with mixed case are acceptable
columns with upper case are unacceptable
be consistence. write down your rules

⬆ back to top

154. CREATE TABLE

CREATE TABLE

CREATE TABLE <name> (
  <col1> TYPE [CONSTRAINT],
  table_constraint [CONSTRAINT]
) [INHERITS <existing_tables>];

Create Student table

CREATE TABLE student (
  student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL,
  date_of_birth DATE NOT NULL
);

psql -U postgres ztm
ztm=# \conninfo
You are connected to database "ztm" as user "postgres" via socket in "/tmp" at port "5432".
ztm=# CREATE TABLE student (
ztm(#   student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
ztm(#   first_name VARCHAR(255) NOT NULL,
ztm(#   last_name VARCHAR(255) NOT NULL,
ztm(#   email VARCHAR(255) NOT NULL,
ztm(#   date_of_birth DATE NOT NULL
ztm(# );
ERROR:  function uuid_generate_v4() does not exist
LINE 2:   student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
                                              ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
ztm=# CREATE extension if not exists "uuid-ossp";
CREATE EXTENSION
ztm=# CREATE TABLE student (
ztm(#   student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
ztm(#   first_name VARCHAR(255) NOT NULL,
ztm(#   last_name VARCHAR(255) NOT NULL,
ztm(#   email VARCHAR(255) NOT NULL,
ztm(#   date_of_birth DATE NOT NULL
ztm(# );
CREATE TABLE
ztm=# \dt
          List of relations
 Schema |  Name   | Type  |  Owner   
--------+---------+-------+----------
 public | student | table | postgres
(1 row)
ztm=# \d student
                               Table "public.student"
    Column     |          Type          | Collation | Nullable |      Default       
---------------+------------------------+-----------+----------+--------------------
 student_id    | uuid                   |           | not null | uuid_generate_v4()
 first_name    | character varying(255) |           | not null | 
 last_name     | character varying(255) |           | not null | 
 email         | character varying(255) |           | not null | 
 date_of_birth | date                   |           | not null | 
Indexes:
    "student_pkey" PRIMARY KEY, btree (student_id)

⬆ back to top

155. Extra information on CREATE TABLE

Temporary tables

They are a type of table that exist in a special schema, so you cannot define a schema name when declaring a temporary table.
These types of tables will be dropped at the end of your session. It’s important to also note that they are only visible to the creator.
Now you may be wondering, why would I ever use these? Well if you’re writing intensive queries against a data set it might be beneficial to temporarily create a table based off another table. This is because:
- Temporary tables behave just like normal ones
- Postgres will apply less “rules” (logging, transaction locking, etc.) to temporary tables so they execute more quickly
- You have full access rights to the data, if you otherwise didn’t so you can test things out.

CREATE TEMPORARY TABLE <name> (<columns>);

⬆ back to top

156. Column Constraints

On the Paramount Importance of Database Constraints

Constraints

Constraints are a tool to apply validation methods against data that will be inserted

Column Constraints

A column constraints is defined as part of a column definition

Examples

NOT NULL: cannot be NULL
PRIMARY KEY: column will be primary key
UNIQUE: can only contain unique values (NULL is unique)
CHECK: apply a special condition check against the values in the column
REFERENCES: constraints the values of the column to only values that exist in the column of another table (Foreign Key)

⬆ back to top

157. Table Constraints

Table Constraints

a table constraint definition is not tied to a particular column, and it can encompass more than one column

Examples

UNIQUE(column_list): can only contain unique values (NULL is unique)
PRIMARY KEY(column_list): columns that will be the primary key
CHECK(condition): a condition to check when insert or update
REFERENCES: Foreign Key relationship to a column
every column constraint can be written as a table constraint
use column constraint when involve one column only
use table constraint when involve many columns

CREATE TABLE student (
  student_id UUID DEFAULT uuid_generate_v4(),
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL,
  date_of_birth DATE NOT NULL,
  CONSTRAINT pk_student_id PRIMARY KEY (student_id)
);

CREATE TABLE category (
  cat_id SMALLINT PRIMARY KEY,
  type TEXT
);
CREATE TABLE column_constraints (
  cc_id SMALLINT PRIMARY KEY,
  something TEXT NOT NULL,
  email TEXT CHECK (email ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$'),
  cat_id SMALLINT REFERENCES category(cat_id)
);
CREATE TABLE table_constraints (
  cc_id SMALLINT,
  something TEXT NOT NULL,
  email TEXT,
  cat_id SMALLINT REFERENCES category(cat_id),
    CONSTRAINT pk_table_constraints PRIMARY KEY (cc_id),
    CONSTRAINT valid_email CHECK (email ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$')
);
INSERT INTO category VALUES (1, 'category 1');
INSERT INTO column_constraints VALUES (1, 'something', 'test@email.com', 1);
INSERT INTO table_constraints VALUES (1, 'something', 'test@email.com', 1);
SELECT * FROM category;
SELECT * FROM table_constraints;
SELECT * FROM column_constraints;

Query Error when insert wrong email

INSERT INTO column_constraints VALUES (1, 'something', '@email.com', 1);
Query Error: error: new row for relation "column_constraints" violates check constraint "column_constraints_email_check"
INSERT INTO table_constraints VALUES (1, 'something', '@email.com', 1);
Query Error: error: new row for relation "table_constraints" violates check constraint "valid_email"

⬆ back to top

158. Regexes!

Regex for regular folk - A fun, illustrative and simple guide to learning regex
Regex Crossword - Challenging puzzles to practice your Regex-fu
Regex Search - A tool to find quick and easy regex references
Regex Tester - A Regex testing tool that supports multiple flavours
Regexone

⬆ back to top

159. UUID Explained

need to install an extension
extension are pieces of software to expand what postgres can do or expand how certain processes run
UUID extension allows us to generate unique identifiers for PRIMARY KEY
UUID stands for universally unique identifier

CREATE extension if not exists "uuid-ossp";

PROS	CONS
unique everywhere	large values to store
easy to shard	can have performance impact
easy to merge / replicate	more difficult to debug
expose less info about your system

⬆ back to top

160. Custom Data Types And Domains

Custom Data Types

Feedback custom data type
postgres allow to create custom data types to store shapes of data that are more complex

Domains

domain is a alias for an exitsing type that can have a check constraint

CREATE DOMAIN Rating SMALLINT 
  CHECK (VALUE > 0 AND VALUE <= 5);
CREATE TYPE Feedback AS (
  student_id UUID,
  rating Rating,
  feedback TEXT
);

⬆ back to top

161. Creating The Tables For ZTM

CREATE TABLE student (
  student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL,
  date_of_birth DATE NOT NULL
);
CREATE TABLE subject (
  subject_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  subject TEXT NOT NULL,
  description TEXT
);
CREATE TABLE teacher (
  teacher_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL,
  date_of_birth DATE NOT NULL
);
CREATE TABLE course (
  course_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  "name" TEXT NOT NULL,
  description TEXT,
  subject_id UUID REFERENCES subject(subject_id),
  teacher_id UUID REFERENCES teacher(teacher_id),
  feedback feedback[]
);
CREATE TABLE enrollment (
  course_id UUID REFERENCES course(course_id),
  student_id UUID REFERENCES student(student_id),
  enrollment_date DATE NOT NULL,
  CONSTRAINT pk_enrollment PRIMARY KEY (course_id, student_id)
);

ALTER TABLE student
ADD COLUMN email VARCHAR(255) NOT NULL;

⬆ back to top

162. Extra information on ALTER TABLE

ALTER TABLE

Change the SCHEMA of a table
Change the name of a table-level CONSTRAINT
Add and remove constraints
Change a column level constraint

ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
  ADD COLUMN <col> <type> <constraint>
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
  ALTER COLUMN <name> TYPE <new type> [USING <expression>]
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
  RENAME COLUMN <old name> TO <new name>
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
  DROP COLUMN <col> [ RESTRICT | CASCADE ]

Our main advice when using this command is to make sure you communicate changes early and set in place a plan for other systems/software to be able to migrate, the last thing you want to happen is changing a column and suddenly all your services start breaking!

⬆ back to top

163. Adding Students And Teachers

INSERT

INSERT INTO student (first_name, last_name, email, date_of_birth) VALUES ( 'Mo', 'Binni', 'mo@binni.io', '1992-11-13'::DATE);
INSERT INTO teacher (first_name, last_name, email, date_of_birth) VALUES ( 'Mo', 'Binni', 'mo@binni.io', '1992-11-13'::DATE);

⬆ back to top

164. Creating A Course

INSERT INTO subject (subject, description) VALUES ('SQL', 'A database management language');
DELETE FROM subject WHERE subject = 'SQL';
INSERT INTO course ("name", description) VALUES ('SQL Zero to Mastery', 'The #1 resource for for SQL mastery');
UPDATE course
SET subject_id = 'e4a6e53f-db6f-438f-b4d0-80d4e6a9341e'
WHERE subject_id is NULL;
ALTER TABLE course 
ALTER COLUMN subject_id SET NOT NULL; 
UPDATE course
SET teacher_id = '90de5068-bc2f-4d37-b4f0-1364671fb89e'
WHERE teacher_id is NULL;
ALTER TABLE course 
ALTER COLUMN teacher_id SET NOT NULL; 
INSERT INTO course ("name", description, subject_id, teacher_id) VALUES ('SQL Zero to Mastery', 'The #1 resource for for SQL mastery', 'e4a6e53f-db6f-438f-b4d0-80d4e6a9341e', '90de5068-bc2f-4d37-b4f0-1364671fb89e');

⬆ back to top

165. Adding Feedback To A Course

problem is there is not data integrity check for student_id when append feedback object into feedback array
if we need to check student_id is valid in student table, we should use a feedback table instead of feedback array

INSERT INTO enrollment (student_id, course_id, enrollment_date) VALUES ('f914ba46-9ac4-42cf-b0ac-d10cfe625be8', '6174589a-7f97-4d40-b063-6a974a20a7ec', NOW()::DATE);
UPDATE course
SET feedback = array_append(
  feedback, ROW('f914ba46-9ac4-42cf-b0ac-d10cfe625be8', 5, 'Great course!')::feedback
)
WHERE course_id = '6174589a-7f97-4d40-b063-6a974a20a7ec';

⬆ back to top

166. A Tale Of 2 Feedbacks

Feedback is important
Store feedback in a table instead of feedback array data type
Deprecated feedback array data type by rename feedback to feedback_deprecated
Deprecated feedback field in course by rename feedback to feedback_deprecated
Use array data type for not so important data such as phone numbers

CREATE TABLE feedback (
  student_id UUID NOT NULL REFERENCES student(student_id),
  course_id UUID NOT NULL REFERENCES course(course_id),
  feedback TEXT,
  rating rating,
  CONSTRAINT pk_feedback PRIMARY KEY (student_id, course_id)
);
INSERT INTO feedback (student_id, course_id, feedback, rating) VALUES ('f914ba46-9ac4-42cf-b0ac-d10cfe625be8', '6174589a-7f97-4d40-b063-6a974a20a7ec', 'Interesting Course', 4)

⬆ back to top

167. SQL Exercises

SQL Exercises

⬆ back to top

168. SQL Quiz

SQL Quiz

⬆ back to top

169. Backups And Why They Are Important

disaster will strike
so it’s important

Have a plan

backup plan
disaster recovery plan
test your plans

What can go wrong?

hardware failures
viruses
power outages
hackers
human error

How do I make a plan?

1 determine what needs to be backed up
2 What’s the appropriate way to back up?
3 Decide how frequently you are going to back up
4 decide where you are going to store them
5 have a retention policy for the backup

What to backup?

Type	Purpose	Frequency
full backup	backup all the data	less often
incremental	backup since last incremental	often
differential	backup since last full backup	often
transaction log	backup of the database transactions	a lot

⬆ back to top

170. Backing Up In Postgres

⬆ back to top

171. Restoring A Database

psql -d postgres
postgres=# \conninfo
You are connected to database "postgres" as user "postgres" via socket in "/tmp" at port "5432".
postgres=# \i ~/Desktop/Employees_20200927_153756.sql

⬆ back to top

172. Transactions

Transactions

a database is a shared resource, so many users may access it concurrently
a transaction is a unit of instructions (query for instance)
so how do you keep things consistent?
dbms has a mechanism in place to manage transactions
dbms wrap INSERT, UPDATE and DELETE commands in transaction by default
Transaction lifecycle

session 1

Employees=# BEGIN;
BEGIN
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
DELETE 5
Employees=# SELECT * FROM employees;
 emp_no | birth_date |   first_name   |    last_name     | gender | hire_date  
--------+------------+----------------+------------------+--------+------------
  10006 | 1953-04-20 | Anneke         | Preusig          | F      | 1989-06-02
  10007 | 1957-05-23 | Tzvetan        | Zielinski        | F      | 1989-02-10
  10008 | 1958-02-19 | Saniya         | Kalloufi         | M      | 1994-09-15
  10009 | 1952-04-19 | Sumant         | Peac             | F      | 1985-02-18
  10010 | 1963-06-01 | Duangkaew      | Piveteau         | F      | 1989-08-24

session 2
session 2 hang when DELETE as session 1 lock employees table

Employees=# SELECT * FROM employees;
 emp_no | birth_date |   first_name   |    last_name     | gender | hire_date  
--------+------------+----------------+------------------+--------+------------
  10001 | 1953-09-02 | Georgi         | Facello          | M      | 1986-06-26
  10002 | 1964-06-02 | Bezalel        | Simmel           | F      | 1985-11-21
  10003 | 1959-12-03 | Parto          | Bamford          | M      | 1986-08-28
  10004 | 1954-05-01 | Chirstian      | Koblick          | M      | 1986-12-01
  10005 | 1955-01-21 | Kyoichi        | Maliniak         | M      | 1989-09-12
  10006 | 1953-04-20 | Anneke         | Preusig          | F      | 1989-06-02
  10007 | 1957-05-23 | Tzvetan        | Zielinski        | F      | 1989-02-10
  10008 | 1958-02-19 | Saniya         | Kalloufi         | M      | 1994-09-15
  10009 | 1952-04-19 | Sumant         | Peac             | F      | 1985-02-18
  10010 | 1963-06-01 | Duangkaew      | Piveteau         | F      | 1989-08-24
Employees=# BEGIN;
BEGIN
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;

session 1 rollback

Employees=# ROLLBACK;

session 2 complete after session 1 rollback

Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
DELETE 5

session 1 begin another transaction and try to DELETE
session 1 hang now as session 2 locked employees table

Employees=# BEGIN;
BEGIN
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;

session 2 commit and end transaction

Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
DELETE 5
Employees=# END;
COMMIT

session 1 delete no records as session 2 commit

Employees=# BEGIN;
BEGIN
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
DELETE 0
Employees=# END;
COMMIT

Transactions

Maintain the integrity of a database, all transactions must obey ACID properties
Atomicty: either execute entirely or not at all
Consistency: each transaction should leave the database in a consistent state (commit or rollback)
Isolation: transaction should be executed in isolation from other transactions
Durability: after completion of a transaction, the changes in the database should persist

⬆ back to top

Section 8: Solving The Mystery

173. Clue #1 and #2

Clue #1: Bruno insists it’s one of his employees
Clue #2: Data breach happened during a certain time

⬆ back to top

174. Getting ready to solve the Mystery!

We have two solid clues so far:

It was an inside job
It was done on 2020-06-23

For this part of the course you’ll need 2 databases:

Movr_Employees: This database contains all relevant information about the Employees of Keiko Corp.
Movr: A ride-sharing product that Keiko offers to consumers.

To restore the Movr backup make sure to do the following steps in order!

Create a database named Movr
Restore the schema.sql first to the Movr database
Restore the data.sql next

psql -U postgres Movr
Movr=# \i ~/Desktop/schema.sql
Movr=# \i ~/Desktop/data.sql

To restore the Movr_Employees backup make sure to do the following steps in order!

Create a database named Movr_Employees
Restore the movr_employees.sql

psql -U postgres Movr_Employees
Movr_Employees=# \i ~/Desktop/movr_employees.sql

⬆ back to top

175. Clue #3

You had to be on location to steal the data

⬆ back to top

176. Exercise: Solving The First Clues

Date of incident: 2020-06-23
Keiko Corp Latitude: -74.997 to -74.9968
Keiko Corp Longitude: 40.5 to 40.6

⬆ back to top

177. Solution: Solving The First Clues

CREATE VIEW suspected_rides AS
SELECT *
FROM "public"."vehicle_location_histories"
WHERE 
  "city" = 'new york' AND
    "timestamp"::date = '2020-06-23'::date AND
    "lat" BETWEEN -74.997 AND -74.9968 AND
    "long" BETWEEN 40.5 AND 40.6
ORDER BY long;
SELECT DISTINCT r.vehicle_id
FROM suspected_rides AS sr
JOIN rides AS r ON r.id = sr.ride_id; 
SELECT 
  DISTINCT r.vehicle_id, 
  u.name AS "owner name", 
  u.address, 
  v.status, 
  v.current_location
FROM suspected_rides AS sr
JOIN rides AS r ON r.id = sr.ride_id
JOIN vehicles AS v ON v.id = r.vehicle_id
JOIN users AS u ON u.id = v.owner_id;

⬆ back to top

178. Clue #4

It’s not the drivers

⬆ back to top

179. Exercise: Clue #4

So with our current setup, we go ahead and filter out all of the unique riders that were on those suspected rides on that horrible day of the theft.

⬆ back to top

180. Solution: Clue #4

SELECT DISTINCT r.vehicle_id, u.name AS "rider name", u.address
FROM suspected_rides AS sr
JOIN rides AS r ON r.id = sr.ride_id
JOIN users AS u ON u.id = r.rider_id;

⬆ back to top

181. Clue #5 and #6

Clue #5: It’s not a rider
Clue #6: It was an inside job

⬆ back to top

182. Exercise: Clue #5 and #6

We’re going to have to cross-reference data between 2 separate databases.

-- run this to run queries across databases
CREATE extension dblink; 
SELECT *
FROM dblink(
  'host=localhost user=postgres password=postgres dbname=Employees', 
  'SELECT <column> FROM employees;') 
AS t1(<column> NAME)

⬆ back to top

183. Solution: Clue #5 and #6

String Functions and Operators

CREATE VIEW suspected_rider_names AS
SELECT DISTINCT
    split_part(u.name, ' ', 1) AS "first_name",
    split_part(u.name, ' ', 2) AS "last_name"
FROM suspected_rides AS vlh
JOIN rides AS r ON r.id = vlh.ride_id
JOIN users AS u ON u.id = r.rider_id; 
SELECT * FROM suspected_rider_names;
SELECT DISTINCT
  CONCAT(t1.first_name, ' ', t1.last_name) AS "employee",
  CONCAT(u.first_name, ' ', u.last_name) AS "rider"
FROM dblink(
  'host=localhost user=chesterheng password=postgres dbname=Movr_Employees', 
  'SELECT first_name, last_name FROM employees;') 
AS t1(first_name NAME, last_name NAME)
JOIN suspected_rider_names AS u ON t1.last_name = u.last_name
ORDER BY "rider";

⬆ back to top

Section 9: Database Design

185. System Design And SDLC

Software development lifecycle

Phase 1: System planning and selection
Phase 2: System analysis
Phase 3: System design
Phase 4: System implementation and operation
Goal of SDLC is to design robust systems
The process can be implemented in different ways: agile, waterfall, v-model (for hardware system)

⬆ back to top

186. SDLC Phases

different models cycle through the steps at different paces
Phase 1: requirements analysis
- get the information we needs to be done (scope)
Phase 2: system analysis
- take in requirements and analyze if it can be done on time and on budget
Phase 3: system design
- design the system architecture for all related databasesm=, apps, etc
Phase 4: system implementation and operation
- build the software
there are more phases that can be added: testing, maintenance, … (inside phase 4)

⬆ back to top

187. System Design Deep Dive

From chaos to structure: create structure that can be understood and communicated

Database design

In any system, there is data
There are different techniques to design database
Top-Down vs Bottom-Up

⬆ back to top

188. Top-Down vs Bottom-Up

Top-Down

Start from 0
Optimal choice when create a new database
All requirements are gathered up-front

Bottom-Up

There is an exitsing system or specific data in place (exmaple: use by doctore)
Shape a new system around existing data
Optimal choice when migrate an exitsing database

What to use? Often you’ll end up using a bit of both

⬆ back to top

189. DRIVEME Academy

In top-down design, you create the universe based on the requirements
So, let’d do based on DRIVEME, an driving school
Every school has instructors on payroll and an inventory of cars, truck and motorcycles for teaching
Initial stake holders: CEO, CTO, BA’s, customers
Core mission: become a household name across USA for learning how to drive
Current situation: has an outdated website and their customer acquistion is mostly word of mouth
THey want to start gaining marketshare through an online presence
Core requirements are a well-thought out document with sections of functionality
- There is a vehicle inventory for students to rent
- Thare are employees at every branch
- There is maintenance for the vehicles
- There is an optional exam at the end of lessons
- You can only take the exam twice, fail twice and you must take more lessons

⬆ back to top

190. Top Down Design

Goal: create a data model based on requirements
What are requirements?
- high-level requirements
- user interviews
- data collection
- deep understanding of the business
one of the key method to top-down deisgn is ER modeling

⬆ back to top

191. ER Model

Phase 3: system design

How to design?
Top-down: ER modeling
Bottom-up: ???

ER Model: Entity-Relationship Model or diagram

A diagram that functions as a way to structure high-level requirments

⬆ back to top

192. Step 1: Determining Entities

Step 1: Determine what entities are in the system

What is an entity?

a person / place or a thing
has a singular name
has an identifier
should contain more than one instance of data

⬆ back to top

193. Tooling For Diagramming

⬆ back to top

194. DRIVEME Academy Entities

Core requirements

There is a vehicle inventory for students to rent
Thare are employees at every branch
There is maintenance for the vehicles
There is an optional exam at the end of lessons
You can only take the exam twice, fail twice and you must take more lessons

⬆ back to top

195. Step 2: Attributes

give entities the information they will store

Attributes

must be a property of the entity
must be atomic
- can be store in an individual field
- not address
- address can breakdown into street name, postal code, region
single / mutivalued (single or many phone numbers)
keys

⬆ back to top

196. Relational Model Extended

relation schema
attribute
degree
cardinality
tuple
column
relation key
domain
tables
relation instance

⬆ back to top

197. Relational Schema And Instance

Relation schema

table schema
represent data that is going into table

Relation instance

set of data related to relation schema

⬆ back to top

198. Super Key and Candidate Key

Concept of Keys in DBMS - Super, Primary, Candidate, Foreign Key, etc
DBMS Keys: Candidate, Super, Primary, Foreign (Example)
relation key: unique identify rows and relationship
relation key consists of super key and candidate key
super key: an attribute or a set of attributes that can be used to identify row of data in a table
candidate key: minimal subset of super key
candidate key: if any proper subset of a super key is a super key, then that key cannot be a candidate key

⬆ back to top

199. Primary Key and Foreign Key

primary key: uniquely identify a row
foreign key: reference a key in foreign table

⬆ back to top

200. Compound Composite And Surrogate Key

compound key: multiple keys to uniquely identify a row and include foreign key
composite key: multiple keys to uniquely identify a row without include foreign key
surrogate key: use to uniquely identify a row and has no meaning to the data in the table

⬆ back to top

201. DRIVEME Attributes

School:

school_id, street_name, street_nr, postal_code, state, city

Instructor:

instructor_id, first_name, last_name, date_of_birth, hiring_date, school_id

Student:

student_id, first_name, last_name, date_of_birth, enrollment_date, …

Exam:

student_id, teacher_id, date_taken, passed, lesson_id

Lesson:

lesson_id, date_of_enrollment, package, student_id

⬆ back to top

202. Step 3: Relationships

determine relationships between entities
links 2 entities together
- 1 to 1
- 1 to many
- many to many
connect entities with a line that tells is the relationship

⬆ back to top

203. DRIVEME Relationships

⬆ back to top

204. Step 4: Solving Many To Many

in relational model it isn’t possible to store a many to many relationship
technially you can do it, but you really don’t want to
you create more overhead: INSERT overhead, UPDATE overhead, DELETE overhead, potential redundancy
as a rule of thumb always try to resolve many to many
solution is use intermediate entities

⬆ back to top

205. Step 5: Subject Areas

divide entities into logical groups that are related (think schemas)
Subject Areas rules
- all entities must belong to one
- an entity can only belong to one
- you can nest subject areas

⬆ back to top

206. DRIVEME Subject Areas

⬆ back to top

207. Exercise: Painting Reservations

System requirements

a rich business man has tons of paintings
he wants to build a system to catalog and track where his art is
he lends it to museums all across the world
he even wants to see reservations

Some constraints to note:

a painting can only have one artist

Let’s answer some questions about the system:

what’s the goal of the system?
- track painting reservations for a wealthy man
who are our stakeholders
- owner, museums

5 steps of top-down to create an erd of the system

Step 1: entities

Painting
Reservation
Museum
Artist

Step 2: attributes

Entity	Attributes
Painting	name, creation_date, style
Reservation	creation_date, date_from, date_to, accepted
Artist	name, birth_date, email
Museum	name, address, phone_nr, email

Step 3: Relationships

Step 4: Solving Many To Many

⬆ back to top

208. Exercise: Movie Theatre

⬆ back to top

209. Bottom Up Design

to create a data model from specific details, existing systems, legacy systems, …

Steps

identify the data (attributes)
group them (entities)

Goal

create a perfect data model without redundancy and anomalies

⬆ back to top

210. Anomalies

modification anomalies are problems that arise when database is not structured correctly

There are 3 types

Insert anomalies
Update anomalies
Delete anomalies

⬆ back to top

211. Normalization

use Normalization is key to avoid anomalies
Edgar Codd proposed the theory of normalization

Goal

a design technique to reduce redundancy and anomalies

Normalization

to do Normalization we must understand 2 things
- functional dependencies
- normal forms

⬆ back to top

212. Functional Dependencies

A functional Dependency show a relationship between attributes
functional Dependency exists when a relationship between 2 attributes allows you to uniquely determine corresponding attribute’s value
we refer to an entity as R and attributes as A and B
A is functional dependent on B when a value of B determines a value of A: B -> A
it means that based on the value of B, you can determine the value of A: EMP_NO -> FIRST_NAME
DETERMINANT -> DEPENDANT
example: STUIDENT_ID -> BIRTH_DATE
create unique relationship between the data

⬆ back to top

213. Functional Dependencies 2

emp_no	proj_id	salary
1	1	300
2	1	500
1	2	400
3	2	300

Example:

EMP_NO -> SALARY (no)
PROJ_ID -> SALARY (no)
PROJ_ID, EMP_NO -> SALARY (yes)

student_no	activity	contribution
1	swimming	300
2	sailing	500
3	bungee	150
4	shark diving	200
1	shark diving	300
2	bungee	300

Example:

STUDENT_NO -> ACTIVITY (no)
ACTIVITY -> STUDENT_NO (no)
ACTIVITY -> CONTRIBUTION (no)
STUDENT_NO, ACTIVITY -> CONTRIBUTION (yes)

⬆ back to top

214. The Normal Forms

0 NF -> 1 NF -> 2 NF -> 3 NF -> BCNF -> 4 NF -> 5 NF -> 6 NF
normalization happens through a process of running attributes through the normal forms
each normal form aims to further seperate relationships into smaller instances as to create less redundancy and anomalies
normal forms 0 through BCNF are the most common normal forms to run through
NF 4 and 5 are there to further reduce anomalies and 6 NF is not yet standardized

⬆ back to top

215. Going from 0NF to 1NF

data is in 0 NF when it is unnormalized

repeating groups of fields
positional dependence of data
non-atomic data

1 NF

eliminate repeating columns of the same data
each attribute should contain a single value
determine a primary key

A repeating group means that a table contains 2 or more columns that are closely related

Example 1:

O NF

book	author 1	author 2	author 3
1	1	2	3
2	2	1	3
3	3	2	1

1 NF

BOOK

book_id	title

BOOK_AUTHOR

book_id	author_id	author_name	author_address	author_email

Example 2:

O NF

color	quantity	price
red, green, blue	20	9.99
yellow, orange, purple	10	10.99
blue, cyan	15	3.99
green, magento	200	15.99

1 NF

PRODUCT

prod_id	quantity	price

PRODUCT_COLOUR

prod_id	color

⬆ back to top

216. Going from 1NF to 2NF

it is in 1 NF
all non-key attributes are fully functional dependent on the primary key

Example:

O NF

book
title
author
author_name
author_address
author_email

1 NF

BOOK

book_id (pk)
title

BOOK_AUTHOR

book_id (pk)
author_id (pk)
author_name
author_address
author_email

2 NF

BOOK

book_id (pk)
title

BOOK_AUTHOR

book_id (pk / fk)
author_id (pk / fk)

AUTHOR

author_id (pk)
author_name
author_address
author_email

⬆ back to top

217. Going from 2NF to 3NF

It is in 2 NF
no transitive dependencies
A is functional dependent on B and B is functional dependent on C, C is transitively dependent on A via B
C -> B and B -> A means that C -> A

Example:

O NF

branch
first name
last name
title
hours

1 NF

EMPLOYEE

emp_no (pk)
first name
last name
title

BRANCH

branch_no (pk)
emp_no (fk)
street
street_no
province
postal_code
hours_logged
work_date
country

2 NF

EMPLOYEE

emp_no (pk)
first name
last name
title

BRANCH

branch_no (pk)
street
street_no
province
postal_code
country

TIMESHEET

branch_no (pk / fk)
emp_no (pk / fk)
hours_logged
work_date

3 NF

EMPLOYEE

emp_no (pk)
first name
last name
title

BRANCH

branch_no (pk)
street
street_no
postal_code
province_id (fk)

PROVINCE

province_id (pk)
province
country

TIMESHEET

branch_no (pk / fk)
emp_no (pk / fk)
hours_logged
work_date

⬆ back to top

218. Boyce-Codd Normal Form

it is in 3 NF
for any dependency A -> B, A should be a super key
most relationship in 3 NF are also in BCNF, but not all of them
3 NF allows attributes to be part of a candidate key that is not the primary key - BCNF does not

A relationship is not in BCNF if

the primary key is a composite key
there is more than one candidate key
some attributes have keys in common

3 NF

student_id	tutor_id	tutor_sin
1	998	838 383 494
2	234	343 535 352
3	999	112 134 134
4	1234	354 464 234

candidate keys

STUDENT_ID, TUTOR_ID
STUDENT_ID, TUTOR_SIN

functional dependency

TUTOR_ID -> TUTOR_SIN
TUTOR_SIN -> TUTOR_ID
STUDENT_ID, TUTOR_ID -> TUTOR_SIN
STUDENT_ID, TUTOR_SIN -> TUTOR_ID

BCNF

student_id	tutor_id
1	998
2	234
3	999
4	1234

tutor_id	tutor_sin
999	838 383 494
234	343 535 352
999	112 134 134
1234	354 464 234

⬆ back to top

219. Why 4NF And 5NF Are Not Useful

over normalization
incur functionality loss

⬆ back to top

220. Exercise: Database Design Quiz

Database Design Quiz

⬆ back to top

Section 10: Database Landscape, Performance and Security

222. Scalability

Scalability

capability of database to handle growing amount of data
more data to store into database
reach limit on how much data to store
more operations - lots of users query database for more information
how many operations per second

2 strategies

vertical scalability: add more capacity to a single database, add more disk space, add more memory, add physically device, normally for relation database
horizontal scalability: add more database and worked as single database, normally for NoSQL such as MongoDB

⬆ back to top

223. Sharding

split data into many databases
for example: 3 database
1st database store A to J
2nd database store K to T
3rd database store U to z
need a routing system to know data is located in which database
distribute query to different database

⬆ back to top

224. Replication

replicate data across mutiple machines and different locations
happen in real time
eventual consistency: user write data in one database and database updates all its replicas

2 strategies

synchronous: client update a database, database updates all its replicas before response to the client
asynchronous: client update a database and response to the client and updates all its replicas in the background

⬆ back to top

225. Backups

not done often
normally at night
dump data into a sql file “database backup “

⬆ back to top

226. Distributed vs Centralized Databases

Centralized: keep in single location on a given network or control by one company, centrally control
- data integrity is more easy
Distributed: compose of multiple databadr store in multiple locations or controlled by different organizations
- more complexity

⬆ back to top

227. Database Security

users can only see the data they are authorized to see
keep unauthorized uses from access the database
prevent data corruption
database availability, no system crashes
create database to store necessary info and give access to necessary users
principle of least privilege state to give privilege to exactly what is needed

⬆ back to top

232. Relational vs NoSQL, PostgreSQL vs MongoDB Databases

Big drop in developer job postings

NoSQL (MongoDB)
- duplicated data
- no standard way to query
- no need to have schema
- easy to scale horizontally
- related data in one locaton
- data is not linked to other location
Relational (PostgreSQL)
- normalisation, no data duplication
- standard way to query
- need a schema
- hard to scale horizontally
- need to get related data from different tables
- data is linked to many locations

⬆ back to top

233. Future Of Relational Databases

NewSQL

relational database and easy to scale horizontally

⬆ back to top

234. Elasticsearch

good for data need to search
build for full text search
less reliable than relational database
fast in search
relational database as primary data source
relational database to store important info
elasticsearch database as secondary data source
copy data to search to elasticsearch database

⬆ back to top

235. S3 Object Storage

document model
use for object
dump data and retrieve data

⬆ back to top

236. Top Databases To Use

Databases

Postgres
Microsoft SQL Server
MongoDB
Amazon DocumentDB
Firebase
Elasticsearch
Redis
Amazon S3

⬆ back to top

Section 14: Extra Bits: Databases In A Web App (with Node.js)

265. Setting Up Your Database

CREATE TABLE users (
  id serial PRIMARY KEY,
  "name" VARCHAR(100),
  email text UNIQUE NOT NULL,
  entries BIGINT DEFAULT 0,
  joined TIMESTAMP NOT NULL
);
CREATE TABLE login (
  id serial PRIMARY KEY,
  hash VARCHAR(100) NOT NULL,
  email text UNIQUE NOT NULL
);

⬆ back to top

266. Connecting To The Database

Knex.js

const postgres = knex({
  client: 'pg',
  connection: {
    host : 'localhost',
    user : 'postgres',
    password : '',
    database : 'smart-brain'
  }
});
postgres.select('*').from('users');

⬆ back to top

267. Registering A User Part 1

app.post('/register', (req, res) => {
  const { email, name } = req.body;
  db('users')
    .returning('*')
    .insert({
      name,
      email,
      joined: new Date()
  })
  .then(user => res.json(user[0]))
  .catch(err => res.status(400).json('unable to register'));
});

⬆ back to top

Complete SQL + Databases Bootcamp: Zero to Mastery [2020]

Table of Contents

Section 1: Introduction

4. The Big Data Breach At Keiko Corp

Section 2: History And Story Of Data

5. What Is a Database?

6. I Didn’t Learn Anything, Try Again…

7. Database Management System (DBMS)

8. Exercise: Building Amazon

9. Exercise: Building Amazon 2

10. 5 Types Of Databases

11. Exercise: What Is A Database?

12. Section Review

Section 3: Databases + SQL Fundamentals

13. SQL Playground

14. Exercise: Setting Up Your First Database

15. What Is SQL?

16. What Is A Query?

17. Imperative vs Declarative

18. History of SQL

19. Exercises: The Select Statement

20. Optional: History of SQL Deep Dive

21. SQL Standards

22. What Is A Database? Revisited

23. Database Oriented Approach

24. Exercise: SQL Starter Quiz

25. Database Models

26. Hierarchical And Networking Model

27. Relational Model

28. DBMS Revisited

29. Relational Model Revisited

30. Tables

31. Columns

32. Rows

33. Primary And Foreign Keys

34. OLTP vs OLAP

35. Exercise: OLTP vs OLAP

36. Exercise: Relational Model Quiz

Section 4: Environment Setup

38. Why PostgreSQL

39. Environment Tooling

40. SQL Tooling Alternatives

41. Command Line 101

45. MAC Setup

46. MAC Commandline tools

49. Importing Data On MAC

Section 5: SQL Deep Dive

52. Starting With Query

53. Exercise: Simple Queries

54. Exercise: Renaming Columns

55. Concat Function

56. What Is A Function In SQL?

57. Aggregate Functions

58. Exercise: Aggregate Functions

59. Commenting Your Queries

60. Common SELECT Mistakes

61. Filtering Data

62. AND and OR

63. Exercise: Filtering Data

64. Exercise: The Where Clause

65. The NOT Keyword

66. Comparison Operators

67. Exercise: Comparison Operators

68. Logical Operators

69. Operator Precedence

71. Exercise: Operator Precedence

72. Checking For NULL Values

73. IS Keyword

74. NULL Coalescing

75. Exercise: Null Value Coalescing

76. 3 Valued Logic

77. Exercise: 3 Valued Logic

78. BETWEEN + AND

79. IN Keyword

80. Exercise: IN Keyword

81. LIKE

82. Exercise: Like Keyword

83. Dates And Timezones

84. Setting Up Timezones

85. How Do We Format Date And Time?