Complete SQL + Databases Bootcamp: Zero to Mastery [2020]
Complete SQL + Databases Bootcamp: Zero to Mastery [2020]
Section 3: Databases + SQL Fundamentals
Section 7: Database Management
Section 8: Solving The Mystery
Section 10: Database Landscape, Performance and Security
It’s a collection of data, a method for accessing and manipulating that data.
Data are collected by many sources
Why is a database always represented with a cylinder?
Data are collected and use by different people
Confusing Acronyms
Confusing Anagrams
5 data models
DB Fiddle - SQL Database Playground
CREATE TABLE Class (
id character(255),
year integer NOT NULL,
PRIMARY KEY(id)
);
CREATE TABLE Student (
id varchar(255) NOT NULL,
class varchar(255) NOT NULL,
firstName varchar(255) NOT NULL,
lastName varchar(255) NOT NULL,
dob date NOT NULL,
sex varchar(1) NOT NULL,
FOREIGN KEY (class) REFERENCES Class(id),
PRIMARY KEY (id)
);
INSERT INTO Class(id, year) VALUES('c1', 1);
INSERT INTO Class(id, year) VALUES('c2', 2);
INSERT INTO Class(id, year) VALUES('c2a', 2);
INSERT INTO Student VALUES ('s1', 'c1', 'George', 'Jacobson', '1992-01-01', 'm');
INSERT INTO Student VALUES ('s2', 'c2', 'Macy', 'Waterson', '1992-01-01', 'f');
INSERT INTO Student VALUES ('s3', 'c1', 'Bill', 'Peters', '1992-01-01', 'm');
INSERT INTO Student VALUES ('s4', 'c1', 'Janine', 'Wilson', '1992-01-01', 'f');
INSERT INTO Student VALUES ('s5', 'c2', 'Jason', 'Lipton', '1992-01-01', 'm');
CREATE TABLE User (
id varchar(255) NOT NULL,
name varchar(255) NOT NULL,
lastName varchar(255) NOT NULL,
dob date NOT NULL,
sex varchar(1) NOT NULL,
role varchar(255) NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO User VALUES ('u1', 'George', 'Jacobson', '1992-01-01', 'm', 'manager');
INSERT INTO User VALUES ('u2', 'Macy', 'Waterson', '1992-01-01', 'f', 'employee');
INSERT INTO User VALUES ('u3', 'Bill', 'Peters', '1992-01-01', 'm', 'employee');
INSERT INTO User VALUES ('u4','Janine', 'Wilson', '1992-01-01', 'f', 'manager');
INSERT INTO User VALUES ('u5', 'Jason', 'Lipton', '1992-01-01', 'm', 'manager');
SELECT *
FROM User
WHERE role = 'employee';
Exercises: The Select Statement
-- Insert the missing statement to get all the columns from the Customers table.
SELECT *
FROM Customers;
-- Write a statement that will select the City column from the Customers table.
SELECT City
FROM Customers;
-- Select all the different values from the Country column in the Customers table.
SELECT DISTINCT Country
FROM Customers;
The SQL Standard – ISO/IEC 9075:2016 (ANSI X3.135)
File processing systems
Model: structure of data
How you organize and store your data ?
Hierarchical Model
Networking Model
DBMS features
Exercise: Relational Model Quiz
PostgreSQL is the worlds’ best database
our choice was based on multiple criteria
Alternatives
@aechagen/mac-terminal-101-13a3e8e75d4c">Command Line 101
psql -U chesterheng -d Employees < employees.sql
psql -U chesterheng -d World < world.sql
psql -U chesterheng -d Store < store.sql
psql -U chesterheng -d France < france.sql
What is a select? Retrieve data
SELECT * FROM "public"."employees";
SELECT * FROM "public"."departments";
SELECT emp_no AS "Employee #",
birth_date AS "Birthday",
first_name AS "First Name"
FROM "public"."employees";
SELECT emp_no,
CONCAT(first_name, ' ', last_name) AS "full name"
FROM "public"."employees";
Types of functions
SELECT COUNT(emp_no) FROM "public"."employees";
SELECT MAX(salary) FROM "public"."salaries";
SELECT SUM(salary) FROM "public"."salaries";
--
/*
* What database should I use for these exercises?
* Name: Employees
*/
--
-- Question 1: What is the average salary for the company?
-- Table: Salaries
-- Result: 63810.744836143706
select avg(salary) from salaries;
-- Question 2: What year was the youngest person born in the company?
-- Table: employees
select max(birth_date) from employees;
--
/*
* What database should I use for these exercises?
* Name: France
*/
--
-- Question 1: How many towns are there in france?
-- Table: Towns
-- Result: 36684
select count(id) from towns;
--
/*
* What database should I use for these exercises?
* Name: World
*/
--
-- Question 1: How many official languages are there?
-- Table: countrylanguage
-- Result: 238
select count(countrycode) from countrylanguage
where isofficial = true;
-- Question 2: What is the average life expectancy in the world?
-- Table: country
-- Result: 66.48603611164265
select avg(lifeexpectancy) from country;
-- Question 3: What is the average population for cities in the netherlands?
-- Table: city
-- Result: 185001
select AVG(population) from city
where countrycode = 'NLD';
How to make comments the most important ‘code’ you write
Ten Common SQL Programming Mistakes
SELECT first_name
FROM "public"."employees"
WHERE gender = 'F';
SELECT first_name, last_name, hire_date
FROM "public"."employees"
WHERE first_name = 'Georgi'
AND last_name = 'Facello'
OR first_name = 'Bezalel'
AND last_name = 'Simmel';
SELECT firstname, lastname, gender, state
FROM customers
WHERE gender = 'F' AND (state = 'OR' OR state = 'NY') ;
-- Select all records where the City column has the value "Berlin".
SELECT *
FROM Customers
WHERE City = 'Berlin';
-- Use the NOT keyword to select all records where City is NOT "Berlin".
SELECT *
FROM Customers
WHERE NOT City = 'Berlin';
-- Select all records where the CustomerID column has the value 32.
SELECT *
FROM Customers
WHERE CustomerID = 32 ;
-- Select all records where the City column has the value 'Berlin' and the PostalCode column has the value 12209.
SELECT *
FROM Customers
WHERE City = 'Berlin'
AND PostalCode = 12209;
-- Select all records where the City column has the value 'Berlin' or 'London'.
SELECT * FROM Customers
WHERE City = 'Berlin'
OR City = 'London';
What if you want to filter on everything but … ?
SELECT age FROM customers WHERE NOT age = 55;
SELECT COUNT(age) FROM customers WHERE NOT age = 55;
Comparison Functions and Operators
-- How many female customers do we have from the state of Oregon (OR)?
-- Result: 106
SELECT COUNT(firstName)
FROM customers
WHERE gender = 'F' and state = 'OR';
-- Who over the age of 44 has an income of 100 000 or more? (excluding 44)
-- Result: 2497
SELECT COUNT(income)
FROM customers
WHERE age > 44 and income >= 100000;
-- Who between the ages of 30 and 50 has an income less than 50 000?
-- (include 30 and 50 in the results)
-- Result: 2362
SELECT COUNT(income)
FROM customers
WHERE age >= 30 and age <= 50 AND income < 50000;
-- What is the average income between the ages of 20 and 50? (Excluding 20 and 50)
-- Result: 59409.926240780098
SELECT AVG(income)
FROM customers
WHERE age > 20 and age < 50;
Logical Operators
Order of operations
A statement having multiple operators is evaluated based on the priority of operators
/*
* DB: Store
* Table: Customers
* Question:
* Select people either under 30 or over 50 with an income above 50000
* Include people that are 50
* that are from either Japan or Australia
*/
SELECT firstname, income, age
FROM customers
WHERE income > 50000
AND (age < 30 OR age >= 50)
AND (country = 'Japan' OR country = 'Australia')
/*
* DB: Store
* Table: Orders
* Question:
* What was our total sales in June of 2004 for orders over 100 dollars?
*/
SELECT SUM(totalamount)
FROM orders
WHERE (orderdate >= '2004-06-01' AND orderdate <= '2004-06-30')
AND totalamount > 100
Database Design Follies: NULL vs. NOT NULL
Should you use NULL?
Be defensive
In database table creation, do you allow NULL or use a default value to represent empty data?
SELECT COALESCE(<column>, 'Empty')
FROM <table>
CREATE TABLE "Student" (
id serial PRIMARY KEY,
name varchar(255),
lastName varchar(255),
age int
);
INSERT INTO "Student" (name, lastName, age) VALUES ('STUDENT 1', NULL, NULL);
INSERT INTO "Student" (name, lastName, age) VALUES ('STUDENT 2', NULL, 25);
INSERT INTO "Student" (name, lastName, age) VALUES (null, 'LAST NAME 3', 20);
INSERT INTO "Student" (name, lastName, age) VALUES (null, null, 34);
INSERT INTO "Student" (name, lastName, age) VALUES ('STUDENT 5', NULL, 58);
-- Assuming a students minimum age for the class is 15, what is the average age of a student?
SELECT avg(coalesce(age, 15)) FROM "Student";
-- Replace all empty first or last names with a default?
SELECT id, coalesce(name, 'fallback'), coalesce(name, 'lastName'), age FROM "Student";
/*
* DB: Store
* Table: customers
* Question: adjust the following query to display the null values as "No Address"
*/
SELECT COALESCE(address2, 'No Address')
FROM customers
/*
* DB: Store
* Table: customers
* Question: Fix the following query to apply proper 3VL
*/
SELECT *
FROM customers
WHERE address2 IS NOT null;
/*
* DB: Store
* Table: customers
* Question: Fix the following query to apply proper 3VL
*/
SELECT coalesce(lastName, 'Empty'), * from customers
where (age IS NULL);
WHERE <column> >= X AND <column> <= Y
WHERE <column> BETWEEN X AND Y
WHERE <column> IN (value1, value2, value3)
WHERE <column> = value1
OR <column> = value2
OR <column> = value3
/*
* DB: Store
* Table: orders
* Question: How many orders were made by customer 7888, 1082, 12808, 9623
*/
SELECT COUNT(orderid)
FROM orders
WHERE customerid IN (7888, 1082, 12808, 9623)
/*
* DB: World
* Table: city
* Question: How many cities are in the district of Zuid-Holland, Noord-Brabant and Utrecht?
*/
SELECT COUNT(id)
FROM city
WHERE district IN ('Zuid-Holland', 'Noord-Brabant', 'Utrecht');
Pattern wildcards
/*
* DB: Employees
* Table: employees
* Question: Find the age of all employees who's name starts with M.
* Sample output: https://imgur.com/vXs4093
*/
SELECT emp_no, first_name, EXTRACT (YEAR FROM AGE(birth_date)) as "age" FROM employees
WHERE first_name ILIKE 'M%';
/*
* DB: Employees
* Table: employees
* Question: How many people's name start with A and end with R?
* Expected output: 1846
*/
SELECT count(emp_no) FROM employees
WHERE first_name ILIKE 'A%R';
/*
* DB: Store
* Table: customers
* Question: How many people's zipcode have a 2 in it?.
* Expected output: 4211
*/
SELECT count(customerid) FROM customers
WHERE zip::text LIKE '%2%';
/*
* DB: Store
* Table: customers
* Question: How many people's zipcode start with 2 with the 3rd character being a 1.
* Expected output: 109
*/
SELECT count(customerid) FROM customers
WHERE zip::text LIKE '2_1%';
/*
* DB: Store
* Table: customers
* Question: Which states have phone numbers starting with 302?
* Replace null values with "No State"
* Expected output: https://imgur.com/AVe6G4c
*/
SELECT coalesce(state, 'No State') as "State" FROM customers
WHERE phone::text LIKE '302%';
UTC is enough for everyone…right?
SHOW TIMEZONE;
SET TIME ZONE 'UTC';
ALTER USER chesterheng SET timezone='UTC';
SHOW TIMEZONE;
How do dates look like?
What is a format?
SELECT NOW();
CREATE TABLE timezones (
ts TIMESTAMP WITHOUT TIME ZONE,
tz TIMESTAMP WITH TIME ZONE
);
INSERT INTO timezones VALUES (
TIMESTAMP WITHOUT TIME ZONE '2000-01-01 10:00:00',
TIMESTAMP WITH TIME ZONE '2000-01-01 10:00:00-05'
);
timestamps or date ?
-- Current Date
SELECT NOW()::date;
SELECT CURRENT_DATE;
-- format
SELECT TO_CHAR(CURRENT_DATE, 'dd/mm/yyyy');
SELECT TO_CHAR(CURRENT_DATE, 'DDD');
SELECT TO_CHAR(CURRENT_DATE, 'WW');
SELECT NOW() - '1800/01/01';
SELECT date '1800/01/01';
SELECT AGE(date '1800/01/01');
SELECT AGE(date '1992/11/13', date '1800/01/01');
SELECT EXTRACT (DAY FROM date '1992/11/13') DAY;
SELECT EXTRACT (DAY FROM date '1992/11/13') AS DAY;
SELECT EXTRACT (MONTH FROM date '1992/11/13') AS MONTH;
SELECT EXTRACT (YEAR FROM date '1992/11/13') AS YEAR;
SELECT DATE_TRUNC ('year', date '1992/11/13');
SELECT DATE_TRUNC ('month', date '1992/11/13');
SELECT DATE_TRUNC ('day', date '1992/11/13');
-- 30 days before given date
SELECT NOW() - interval '30 days';
SELECT *
FROM orders
WHERE purchaseDate <= now() - interval '30 days'
SELECT NOW() - INTERVAL '1 year 2 months 3 days';
SELECT NOW() - INTERVAL '2 weeks ago';
SELECT NOW() - INTERVAL '1 year 3 hours 20 minutes';
/*
* DB: Employees
* Table: employees
* Question: Get me all the employees above 60, use the appropriate date functions
*/
SELECT AGE(birth_date), * FROM employees
WHERE (
EXTRACT (YEAR FROM AGE(birth_date))
) > 60 ;
-- alternative
SELECT count(birth_date) FROM employees
WHERE birth_date < now() - interval '61 years' -- 61 years before the current date
/*
* DB: Employees
* Table: employees
* Question: How many employees where hired in February?
*/
SELECT count(emp_no) FROM employees
where EXTRACT (MONTH FROM hire_date) = 2;
/*
* DB: Employees
* Table: employees
* Question: How many employees were born in november?
*/
SELECT COUNT(emp_no) FROM employees
WHERE EXTRACT (MONTH FROM birth_date) = 11;
/*
* DB: Employees
* Table: employees
* Question: Who is the oldest employee?
*/
SELECT MAX(AGE(birth_date)) FROM employees;
/*
* DB: Store
* Table: orders
* Question: How many orders were made in January 2004?
*/
SELECT COUNT(orderid)
FROM orders
WHERE DATE_TRUNC('month', orderdate) = date '2004-01-01';
/*
* DB: Employees
* Table: titles
* Question: What unique titles do we have?
*/
SELECT DISTINCT title FROM titles;
/*
* DB: Employees
* Table: employees
* Question: How many unique birth dates are there?
*/
SELECT COUNT(DISTINCT birth_date)
from employees;
/*
* DB: World
* Table: country
* Question: Can I get a list of distinct life expectancy ages
* Make sure there are no nulls
*/
SELECT DISTINCT lifeexpectancy FROM country
WHERE lifeexpectancy IS NOT NULL
ORDER BY lifeexpectancy;
SELECT * FROM customers
ORDER BY LENGTH(name)
SELECT a.emp_no, b.salary, b.from_date
FROM employees AS a, salaries AS b
WHERE a.emp_no = b.emp_no
ORDER BY a.emp_no;
SELECT a.emp_no, b.salary, b.from_date
FROM employees AS a
INNER JOIN salaries AS b
ON a.emp_no = b.emp_no
ORDER BY a.emp_no;
-- You want to know the original salary and also salary at a promotion
SELECT a.emp_no,
CONCAT(a.first_name, a.last_name) AS "name",
b.salary,
COALESCE(c.title, 'No title change'),
COALESCE(c.from_date::text, '-') AS "title taken on"
FROM employees AS a
INNER JOIN salaries AS b
ON a.emp_no = b.emp_no
INNER JOIN titles AS c
ON c.emp_no = a.emp_no AND (
c.from_date = (b.from_date + interval '2 days') OR
c.from_date = b.from_date
)
ORDER BY a.emp_no;
id | name | startDate | supervisorId |
---|---|---|---|
1 | Binni | 1990/01/13 | 2 |
2 | Andrei | 1980/01/23 | 2 |
CREATE TABLE employee(
id varchar(5) NOT NULL,
name varchar(20) NULL,
start_date date NULL,
supervisorId varchar(5) NULL,
CONSTRAINT id PRIMARY KEY(id),
CONSTRAINT supervisorId FOREIGN KEY(supervisorId) REFERENCES employee(id)
);
INSERT INTO employee VALUES ('1', 'Boss Man', date '1980/01/23', '1');
INSERT INTO employee VALUES ('1a', 'Andrei', date '1980/01/23', '1');
INSERT INTO employee VALUES ('1abc', 'Boss Man', date '1980/01/23', '1a');
SELECT a.id,
a.name AS "employee",
b.name AS "supervisor"
FROM employee AS a,
employee AS b
WHERE a.supervisorId = b.id;
SELECT a.id,
a.name AS "employee",
b.name AS "supervisor"
FROM employee AS a
INNER JOIN employee AS b
ON a.supervisorId = b.id;
SELECT *
FROM <table A> AS a
LEFT [OUTER] JOIN <table B> AS b
ON a.id = b.id;
-- You want to know every salary and also salary at a promotion
SELECT a.emp_no,
CONCAT(a.first_name, a.last_name) AS "name",
b.salary,
COALESCE(c.title, 'No title change'),
COALESCE(c.from_date::text, '-') AS "title taken on"
FROM employees AS a
INNER JOIN salaries AS b
ON a.emp_no = b.emp_no
LEFT JOIN titles AS c
ON c.emp_no = a.emp_no AND (
c.from_date = (b.from_date + interval '2 days') OR
c.from_date = b.from_date
)
ORDER BY a.emp_no;
-- cross join: create a combination of every row
SELECT * FROM <tableA>
CROSS JOIN <tableB>;
-- full outer join: returns result from both table whether they match or not
SELECT * FROM <tableA> AS a
FULL JOIN <tableB> AS b
ON a.id = b.id;
/*
* DB: Store
* Table: orders
* Question: Get all orders from customers who live in Ohio (OH), New York (NY) or Oregon (OR) state
* ordered by orderid
*/
SELECT c.firstname, c.lastname, o.orderid FROM orders AS o
INNER JOIN customers AS c ON o.customerid = c.customerid
WHERE c.state IN ('NY', 'OH', 'OR')
ORDER BY o.orderid;
/*
* DB: Store
* Table: products
* Question: Show me the inventory for each product
*/
SELECT p.prod_id, i.quan_in_stock
FROM products as p
INNER JOIN inventory AS i oN p.prod_id = i.prod_id
/*
* DB: Employees
* Table: employees
* Question: Show me for each employee which department they work in
*/
SELECT e.first_name, dp.dept_name
FROM employees AS e
INNER JOIN dept_emp AS de ON de.emp_no = e.emp_no
INNER JOIN departments AS dp ON dp.dept_no = de.dept_no
SELECT *
FROM <tableA>
INNER JOIN <tableB> USING(id)
SELECT e.emp_no, e.first_name, d.dept_name
FROM employees AS e
INNER JOIN dept_emp AS de ON de.emp_no = e.emp_no
INNER JOIN departments AS d USING(dept_no);
SELECT col1, COUNT(col2)
FROM <table>
WHERE col2 > X
GROUP BY col1
HAVING col1 === Y;
SELECT d.dept_name,
COUNT(e.emp_no) AS "# of employees"
FROM employees AS e
INNER JOIN dept_emp AS de
ON de.emp_no = e.emp_no
INNER JOIN departments AS d
ON de.dept_no = d.dept_no
WHERE e.gender = 'M'
GROUP BY d.dept_name
HAVING COUNT(e.emp_no) > 25000;
SELECT d.dept_name,
COUNT(e.emp_no) AS "# of employees"
FROM employees AS e
INNER JOIN dept_emp AS de
ON de.emp_no = e.emp_no
INNER JOIN departments AS d
ON de.dept_no = d.dept_no
GROUP BY d.dept_name
ORDER BY COUNT(e.emp_no) DESC;
SELECT emp_no, MAX(from_date), MAX(salary)
FROM salaries
GROUP BY emp_no;
SELECT NULL AS "prod_id", SUM(ol.quantity)
FROM "public"."orderlines" AS ol
UNION
SELECT prod_id AS "prod_id", SUM(ol.quantity)
FROM "public"."orderlines" AS ol
GROUP BY prod_id
ORDER BY prod_id DESC;
SELECT prod_id AS "prod_id", SUM(ol.quantity)
FROM "public"."orderlines" AS ol
GROUP BY
GROUPING SETS (
(),
(prod_id)
)
ORDER BY prod_id DESC;
SELECT prod_id AS "prod_id", orderlineid, SUM(ol.quantity)
FROM "public"."orderlines" AS ol
GROUP BY
GROUPING SETS (
(),
(prod_id),
(orderlineid)
)
ORDER BY prod_id DESC, orderlineid DESC;
SELECT EXTRACT (YEAR FROM orderdate) AS "year",
EXTRACT (MONTH FROM orderdate) AS "month",
EXTRACT (DAY FROM orderdate) AS "day",
sum(ol.quantity)
FROM orderlines AS ol
GROUP BY
GROUPING SETS (
(EXTRACT (YEAR FROM orderdate)),
(
EXTRACT (YEAR FROM orderdate),
EXTRACT (MONTH FROM orderdate)
),
(
EXTRACT (YEAR FROM orderdate),
EXTRACT (MONTH FROM orderdate),
EXTRACT (DAY FROM orderdate)
),
(
EXTRACT (MONTH FROM orderdate),
EXTRACT (DAY FROM orderdate)
),
(EXTRACT (MONTH FROM orderdate)),
(EXTRACT (DAY FROM orderdate)),
()
)
ORDER BY
EXTRACT (YEAR FROM orderdate),
EXTRACT (MONTH FROM orderdate),
EXTRACT (DAY FROM orderdate);
SELECT EXTRACT (YEAR FROM orderdate) AS "year",
EXTRACT (MONTH FROM orderdate) AS "month",
EXTRACT (DAY FROM orderdate) AS "day",
sum(ol.quantity)
FROM orderlines AS ol
GROUP BY
ROLLUP (
EXTRACT (YEAR FROM orderdate),
EXTRACT (MONTH FROM orderdate),
EXTRACT (DAY FROM orderdate)
)
ORDER BY
EXTRACT (YEAR FROM orderdate),
EXTRACT (MONTH FROM orderdate),
EXTRACT (DAY FROM orderdate);
/*
* How many people were hired on did we hire on any given hire date?
* Database: Employees
* Table: Employees
*/
SELECT a.hire_date, COUNT(b.hire_date) as "amount"
FROM employees as a, employees as b
WHERE a.hire_date = b.hire_date
GROUP BY a.hire_date
ORDER BY "amount" DESC;
/*
* Show me all the employees, hired after 1991, that have had more than 2 titles
* Database: Employees
*/
SELECT e.emp_no, count(t.title) as "amount of titles"
FROM employees as e
JOIN titles as t USING(emp_no)
WHERE EXTRACT (YEAR FROM e.hire_date) > 1991
GROUP BY e.emp_no
HAVING count(t.title) > 2
ORDER BY e.emp_no;
/*
* Show me all the employees that have had more than 15 salary changes that work in the department development
* Database: Employees
*/
SELECT e.emp_no, count(s.from_date) as "amount of raises"
FROM employees as e
JOIN salaries as s USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
WHERE de.dept_no = 'd005'
GROUP BY e.emp_no
HAVING count(s.from_date) > 15
ORDER BY e.emp_no;
/*
* Show me all the employees that have worked for multiple departments
* Database: Employees
*/
SELECT e.emp_no, count(de.dept_no) as "worked for # departments"
FROM employees as e
JOIN dept_emp AS de USING(emp_no)
GROUP BY e.emp_no
HAVING count(de.dept_no) > 1
ORDER BY e.emp_no;
Window functions
window_function(arg1, arg2,..) OVER (
[PARTITION BY partition_expression]
[ORDER BY sort_expression [ASC | DESC] [NULLS {FIRST | LAST}]
)
-- how to display max salary in each indivodual row?
SELECT *, MAX(salary) OVER()
FROM salaries
LIMIT 100;
SELECT *, AVG(salary)
OVER (
PARTITION BY d.dept_name
)
FROM salaries
JOIN dept_emp AS de USING (emp_no)
JOIN departments AS d USING (dept_no);
window function ORDER BY
SELECT emp_no, salary, COUNT(salary)
OVER (ORDER BY emp_no)
FROM salaries
Key: Meaning
PARTITION BY category
ORDER BY price
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
SELECT emp_no, salary,
COUNT(salary) OVER (ORDER BY emp_no)
FROM salaries
SELECT emp_no, salary,
COUNT(salary) OVER (
PARTITION BY emp_no
ORDER BY salary
)
FROM salaries
SELECT emp_no, salary,
COUNT(salary) OVER (
PARTITION BY emp_no
ORDER BY salary
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
FROM salaries
SELECT e.emp_no, e.first_name, d.dept_name, MAX(s.salary)
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
GROUP BY e.emp_no, e.first_name, d.dept_name
ORDER BY e.emp_no;
SELECT DISTINCT e.emp_no,
LAST_VALUE(salary) OVER (
PARTITION BY s.emp_no
ORDER BY s.from_date
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;
SELECT DISTINCT e.emp_no,
LAST_VALUE(s.from_date) OVER (
PARTITION BY s.emp_no
ORDER BY s.from_date
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
),
LAST_VALUE(salary) OVER (
PARTITION BY s.emp_no
ORDER BY s.from_date
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;
SELECT DISTINCT e.emp_no,
s.from_date,
s.salary
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;
SELECT prod_id, price, category
FIRST_VALUE(price) OVER (
PARTITION BY category
ORDER BY price
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS "cheapest in category"
FROM products
SELECT prod_id, price, category,
MIN(price) OVER (
PARTITION BY category
) AS "cheapest in category"
FROM products
SELECT prod_id, price, category,
LAST_VALUE(price) OVER (
PARTITION BY category
ORDER BY price
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS "most expensive in category"
FROM products
SELECT prod_id, price, category,
MAX(price) OVER (
PARTITION BY category
) AS "most expensive in category"
FROM products
SELECT o.orderid, o.customerid, o.netamount,
SUM(o.netamount) OVER (
PARTITION BY o.customerid
ORDER BY o.orderid
) AS "cum sum"
FROM orders as o
ORDER BY o.customerid
SELECT prod_id, price, category,
ROW_NUMBER() OVER(
PARTITION BY category
ORDER BY price
) AS "position in category by price"
FROM products
/*
* Show the population per continent
* Database: World
* Table: Country
*/
SELECT
DISTINCT continent,
SUM(population) OVER w1 as "continent population"
FROM country
WINDOW w1 AS( PARTITION BY continent );
/*
* To the previous query add on the ability to calculate the percentage of the world population
* What that means is that you will divide the population of that continent by the total population and multiply by 100 to get a percentage.
* Make sure you convert the population numbers to float using `population::float` otherwise you may see zero pop up
*
* Database: World
* Table: Country
*/
SELECT
DISTINCT continent,
SUM(population) OVER w1 as"continent population",
CONCAT(
ROUND(
(
SUM( population::float4 ) OVER w1 /
SUM( population::float4 ) OVER()
) * 100
),'%' ) as "percentage of population"
FROM country
WINDOW w1 AS( PARTITION BY continent );
/*
* Count the number of towns per region
*
* Database: France
* Table: Regions (Join + Window function)
*/
SELECT
DISTINCT r.id,
r."name",
COUNT(t.id) OVER (
PARTITION BY r.id
ORDER BY r."name"
) AS "# of towns"
FROM regions AS r
JOIN departments AS d ON r.code = d.region
JOIN towns AS t ON d.code = t.department
ORDER BY r.id;
Case statement
SELECT a,
CASE
WHEN a=1 THEN 'one'
WHEN a=2 THEN 'two'
END
FROM test;
Render custom row data
SELECT
o.orderid,
o.customerid,
CASE
WHEN o.customerid = 1
THEN 'my first customer'
ELSE 'not my first customer'
END,
o.netamount
FROM orders as o
ORDER BY o.customerid;
Filter in a where
SELECT
o.orderid,
o.customerid,
o.netamount
FROM orders as o
WHERE CASE
WHEN o.customerid > 10
THEN o.netamount < 100
ELSE o.netamount > 100
END
ORDER BY o.customerid;
In an aggregate function
SELECT SUM(
CASE
WHEN o.netamount <100
THEN -100
ELSE o.netamount
END
) AS "returns",
SUM(o.netamount) AS "normal total"
FROM orders as o;
NULLIF(val_1, val_2)
NULLIF(0, 0) -- NULL
NULLIF('ABC', 'DEF') -- 'ABC'
CREATE VIEW view_name AS query;
CREATE OR REPLACE view_name AS query;
ALTER VIEW <view_name> RENAME TO <view_name>;
DROP VIEW [ IF EXISTS] <view_name>;
SELECT e.emp_no, e.first_name, d.dept_name, MAX(s.salary)
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
GROUP BY e.emp_no, e.first_name, d.dept_name
ORDER BY e.emp_no;
SELECT DISTINCT e.emp_no,
LAST_VALUE(salary) OVER (
PARTITION BY s.emp_no
ORDER BY s.from_date
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;
CREATE OR REPLACE VIEW last_salary_change AS
SELECT e.emp_no,
MAX(s.from_date)
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
GROUP BY e.emp_no
ORDER BY e.emp_no;
SELECT * FROM salaries
JOIN last_salary_change AS l USING(emp_no)
WHERE from_date = l.max
ORDER BY emp_no;
SELECT s.emp_no,
d.dept_name,
s.from_date,
s.salary
FROM last_salary_change
JOIN salaries AS s USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
WHERE max = s.from_date;
Poor database indexing – a SQL query performance killer – recommendations
What is Indexes?
What does it do?
Types of indexes
CREATE UNIQUE INDEX <name>
on <table> (column1, column2, ...)
DROP INDEX <name>
When to use
When NOT to use
CREATE UNIQUE INDEX <name>
ON <table> (column1);
CREATE INDEX <name>
ON <table> (<expression>);
EXPLAIN ANALYZE
SELECT "name", "district", "countrycode"
FROM "public"."city"
WHERE countrycode IN ('TUN', 'BE', 'NL');
CREATE INDEX idx_countrycode
ON city (countrycode);
Without Index
With Index
CREATE INDEX idx_countrycode
ON city (countrycode)
WHERE countrycode IN ('TUN', 'BE', 'NL');
With Partial Index
Postgres provides several types of indexes
Each index type uses a different algorithm
CREATE [UNIQUE] INDEX <name>
ON <table> USING <method> (column1, ...)
SELECT *
FROM <table>
WHERE <column> <condition> (
SELECT <column>
FROM <table>
[WHERE | GROUP BY | ORDER BY | ...]
)
SELECT (
SELECT <column>
FROM <table>
[WHERE | GROUP BY | ORDER BY | ...]
)
FROM <table> AS <name>
SELECT *
FROM (
SELECT <column>, <column>, ...
FROM <table>
[WHERE | GROUP BY | ORDER BY | ...]
) AS <name>
SELECT *
FROM <table> AS <name>
GROUP BY <column>
HAVING (
SELECT <column>
FROM <table>
[WHERE | GROUP BY | ORDER BY | ...]
)
SELECT AVG(price) FROM products
SELECT title, price,
(SELECT AVG(price) FROM products) AS "global average price"
FROM products;
SELECT prod_id, title, price, i.quan_in_stock
FROM products
JOIN inventory AS i USING(prod_id)
SELECT title, price,
(SELECT AVG(price) FROM products) AS "global average price"
FROM (
SELECT * FROM products WHERE price < 10
) AS "products_sub";
SELECT *
FROM <table> AS <name>
WHERE X >= | <= | = | != (
SELECT MAX(<column>)
FROM <table>
[WHERE | GROUP BY | ORDER BY | ...]
)
Types of subquery
single row
SELECT name, salary
FROM salaries
WHERE salary =
(SELECT AVG(salary) FROM salaries);
SELECT name, salary,
(SELECT AVG(salary) FROM salaries)
AS "Company average salary"
FROM salaries;
multiple row
SELECT title, price, category
FROM products
WHERE category IN (
SELECT category FROM categories
WHERE categoryname IN ('Comedy', 'Family', 'Classics')
);
multiple column
SELECT emp_no, salary,
dea.avg AS "Department average salary"
FROM salaries AS s
JOIN dept_emp AS de USING(emp_no)
JOIN (
SELECT dept_no, AVG(salary)
FROM salaries AS s2
JOIN dept_emp AS e USING(emp_no)
GROUP BY dept_no
) AS dea USING(dept_no)
WHERE salary > dea.avg;
correlated
SELECT emp_no, salary, from_date
FROM salaries AS s
WHERE from_date = (
SELECT MAX(s2.from_date) AS max
FROM salaries AS s2
WHERE s2.emp_no = s.emp_no
)
ORDER BY emp_no;
nested
SELECT orderlineid, prod_id, quantity
FROM orderlines
JOIN (
SELECT prod_id
FROM products
WHERE category IN (
SELECT category FROM categories
WHERE categoryname IN ('Comedy', 'Family', 'Classics')
)
) AS limited USING(prod_id)
show all employees older than the average age
SELECT
first_name,
last_name,
birth_date,
AGE(birth_date),
(
SELECT AVG(AGE(birth_date))
FROM employees
)
FROM employees
WHERE AGE(birth_date) > (
SELECT AVG(AGE(birth_date))
FROM employees
);
Show the title by salary for each employee
SELECT
emp_no,
salary,
from_date,
(
SELECT title
FROM titles AS t
WHERE t.emp_no = s.emp_no
AND t.from_date = s.from_date
)
FROM salaries AS s
ORDER BY emp_no;
use join instead of subquery if possible for performance reason
SELECT
emp_no,
salary,
from_date,
t.title
FROM salaries AS s
JOIN titles AS t USING(emp_no, from_date)
ORDER BY emp_no;
Show the most recent employee salary
SELECT
emp_no,
salary AS "most recent salary",
from_date
FROM salaries AS s
WHERE from_date = (
SELECT MAX(from_date)
FROM salaries AS sp
WHERE sp.emp_no = s.emp_no
)
ORDER BY emp_no ASC;
SELECT
emp_no,
salary AS "most recent salary",
from_date
FROM salaries AS s
JOIN last_salary_change AS ls USING(emp_no)
WHERE from_date = ls.max
ORDER BY emp_no;
SELECT
emp_no,
salary AS "most recent salary",
from_date
FROM salaries AS s
JOIN (
SELECT emp_no, MAX(from_date)
FROM salaries AS sp
GROUP BY emp_no
) AS ls USING(emp_no)
WHERE ls.max = from_date
ORDER BY emp_no ASC;
SELECT firstname, lastname, income
FROM customers AS c
WHERE EXISTS (
SELECT * FROM orders AS o
WHERE c.customerid = o.customerid
AND totalamount > 400
) AND income > 90000;
SELECT prod_id
FROM products
WHERE category IN (
SELECT category FROM categories
WHERE categoryname IN ('Comedy', 'Family', 'Classics')
);
SELECT prod_id
FROM products
WHERE category IN (
SELECT category FROM categories
WHERE categoryname NOT IN ('Comedy', 'Family', 'Classics')
);
SELECT prod_id
FROM products
WHERE category = ANY (
SELECT category FROM categories
WHERE categoryname IN ('Comedy', 'Family', 'Classics')
);
SELECT prod_id, title, sales
FROM products
JOIN inventory AS i USING(prod_id)
WHERE i.sales > ALL (
SELECT AVG(sales) FROM inventory
JOIN products AS pl USING(prod_id)
GROUP BY pl.category
);
SELECT prod_id
FROM products
WHERE category = (
SELECT category FROM categories
WHERE categoryname IN ('Comedy')
);
/* TRY TO WRITE THESE AS JOINS FIRST */
/*
* DB: Store
* Table: orders
* Question: Get all orders from customers who live in Ohio (OH), New York (NY) or Oregon (OR) state
* ordered by orderid
*/
SELECT c.firstname, c.lastname, o.orderid
FROM orders AS o, (
SELECT customerid, state, firstname, lastname
FROM customers
) AS c
WHERE o.customerid = c.customerid AND
c.state IN ('NY', 'OH', 'OR')
ORDER BY o.orderid;
/*
* DB: Employees
* Table: employees
* Question: Filter employees who have emp_no 110183 as a manager
*/
SELECT emp_no, first_name, last_name
FROM employees
WHERE emp_no IN (
SELECT emp_no
FROM dept_emp
WHERE dept_no = (
SELECT dept_no
FROM dept_manager
WHERE emp_no = 110183
)
)
ORDER BY emp_no
-- Written with JOIN
SELECT e.emp_no, first_name, last_name
FROM employees as e
JOIN dept_emp as de USING (emp_no)
JOIN dept_manager as dm USING (dept_no)
WHERE dm.emp_no = 110183
Why is it important?
Data definition language (DDL)
Types Of Databases
Create a database
Postgres Database
psql -U <user> <database>
psql -U postgres
\conninfo
Template0 Database
Template1 Database
one thing to note is that becase template1 is the default template - if it is being accessed / connected to no new databases can be created until the connection is closed!
Case 1: connect to Template1 and try to create new database
psql -U postgres template1
template1=#
psql -U postgres Employees
Employees=# CREATE DATABASE myblockeddb;
ERROR: source database "template1" is being accessed by other users
DETAIL: There is 1 other session using the database.
psql -U postgres Employees
Employees=# CREATE DATABASE myblockeddb;
CREATE DATABASE
Case 3: connect to Template1 and try to create new database with template0
psql -U postgres template1
template1=#
psql -U postgres Employees
Employees=# CREATE DATABASE myblockeddb2 WITH TEMPLATE template0;
CREATE DATABASE
Template Databases
CREATE DATABASE mysupertemplate;
CREATE TABLE supertable ();
CREATE DATABASE mysuperdatabase with TEMPLATE mysupertemplate;
CREATE DATABASE name
[ [ WITH ]
[ OWNER [=] user_name ]
[ TEMPLATE [=] template ]
[ ENCODING [=] encoding ]
[ LC_COLLATE [=] lc_collate ]
[ LC_CTYPE [=] lc_ctype ]
[ TABLESPACE [=] tablespace ]
[ CONNECTION_LIMIT [=] connlimit ] ]
Setting | Default |
---|---|
TEMPLATE | template1 |
ENCODING | UTF8 |
CONNECTION_LIMIT | 100 |
OWNER | Current User |
Lets create an exmple database to store courses!
psql -U postgres postgres
postgres=# CREATE DATABASE ztm;
CREATE DATABASE
postgres=# DROP DATABASE ztm;
DROP DATABASE
Postgres schema
SELECT * FROM employees
SELECT * FROM public.employees
postgres=# \dn
List of schemas
Name | Owner
--------+----------
public | postgres
(1 row)
Create a schema
postgres=# CREATE SCHEMA sales;
CREATE SCHEMA
postgres=# \dn
List of schemas
Name | Owner
--------+-------------
public | postgres
sales | postgres
(2 rows)
Reasons to use schemas
Restricted
Roles
Role Attribute
Create role
CREATE ROLE readonly WITH LOGIN ENCRYPTED PASSWORD 'redaonly';
SELECT * FROM departments;
ERROR: permission denied for table departments
postgres=# \du
List of roles
Role name | Attributes | Member of
-------------+------------------------------------------------------------+-----------
chesterheng | Superuser, Create role, Create DB | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
CREATE USER user1 WITH ENCRYPTED PASSWORD 'user1';
postgres=# CREATE role test_role_with_login WITH LOGIN ENCRYPTED PASSWORD 'password';
CREATE ROLE
postgres=# \du
List of roles
Role name | Attributes | Member of
----------------------+------------------------------------------------------------+-----------
chesterheng | Superuser, Create role, Create DB | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
test_role_with_login |
postgres=# CREATE USER test_user_with_login WITH ENCRYPTED PASSWORD 'password';
CREATE ROLE
postgres=# \du
List of roles
Role name | Attributes | Member of
----------------------+------------------------------------------------------------+-----------
chesterheng | Superuser, Create role, Create DB | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
test_role_with_login | | {}
test_user_with_login |
createuser --interactive
Enter name of role to add: test_interactive
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) y
postgres=# \du
List of roles
Role name | Attributes | Member of
----------------------+------------------------------------------------------------+-----------
chesterheng | Superuser, Create role, Create DB | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
test_interactive | Create role | {}
test_role_with_login | | {}
test_user_with_login |
postgres=# ALTER ROLE test_interactive WITH ENCRYPTED PASSWORD 'password';
ALTER ROLE
psql -U test_interactive Employees
Employees=>
psql -U test_interactive Employees -W
Employees=>
postgres=# show hba_file;
hba_file
----------------------------------------------------------------------------
/Users/chesterheng/Library/Application Support/Postgres/var-12/pg_hba.conf
(1 row)
postgres=# show config_file;
config_file
--------------------------------------------------------------------------------
/Users/chesterheng/Library/Application Support/Postgres/var-12/postgresql.conf
(1 row)
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all scram-sha-256
# IPv4 local connections:
host all all 127.0.0.1/32 scram-sha-256
# IPv6 local connections:
host all all ::1/128 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all scram-sha-256
host replication all 127.0.0.1/32 scram-sha-256
host replication all ::1/128 scram-sha-256
psql -U test_interactive Employees
Password for user test_interactive:
psql: error: could not connect to server: FATAL: password authentication failed for user "test_interactive"
password_encryption = scram-sha-256 # md5 or scram-sha-256
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
host all all ::1/128 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all trust
host replication all 127.0.0.1/32 trust
host replication all ::1/128 trust
postgres=# ALTER USER test_interactive WITH PASSWORD 'password';
ALTER ROLE
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all scram-sha-256
# IPv4 local connections:
host all all 127.0.0.1/32 scram-sha-256
# IPv6 local connections:
host all all ::1/128 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all scram-sha-256
host replication all 127.0.0.1/32 scram-sha-256
host replication all ::1/128 scram-sha-256
psql -U test_interactive Employees
Password for user test_interactive:
Employees=>
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all postgres trust
psql -U postgres postgres
postgres=# exit
Grant Privileges
GRANT ALL PRIVILEGES ON <table> TO <user>;
GRANT ALL ON ALL TABLES [IN SCHEMA <schema>] TO <user>;
GRANT [SELECT, INSERT, UPDATE, DELETE, ...] ON <table> [IN SCHEMA <schema>] TO <user>;
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
host all all ::1/128 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all trust
host replication all 127.0.0.1/32 trust
host replication all ::1/128 trust
createuser --interactive
Enter name of role to add: privilegetest
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n
psql -U privilegetest Employees
Employees=# \conninfo
You are connected to database "Employees" as user "privilegetest" via socket in "/tmp" at port "5432".
Employees=# \dt
List of relations
Schema | Name | Type | Owner
--------+--------------+-------+-------------
public | departments | table | postgres
public | dept_emp | table | postgres
public | dept_manager | table | postgres
public | employees | table | postgres
public | salaries | table | postgres
public | titles | table | postgres
(6 rows)
Employees-# SELECT * FROM titles;
ERROR: syntax error at or near "SELECT"
LINE 2: SELECT * FROM titles;
^
psql -U postgres Employees
Employees=# GRANT SELECT ON titles TO privilegetest;
GRANT
psql -U privilegetest Employees
Employees=> SELECT * FROM titles;
emp_no | title | from_date | to_date
--------+--------------------+------------+------------
10001 | Senior Engineer | 1986-06-26 | 9999-01-01
10002 | Staff | 1996-08-03 | 9999-01-01
10003 | Senior Engineer | 1995-12-03 | 9999-01-01
10004 | Engineer | 1986-12-01 | 1995-12-01
10004 | Senior Engineer | 1995-12-01 | 9999-01-01
10005 | Senior Staff | 1996-09-12 | 9999-01-01
10005 | Staff | 1989-09-12 | 1996-09-12
10006 | Senior Engineer | 1990-08-05 | 9999-01-01
10007 | Senior Staff | 1996-02-11 | 9999-01-01
10007 | Staff | 1989-02-10 | 1996-02-11
10008 | Assistant Engineer | 1998-03-11 | 2000-07-31
10009 | Assistant Engineer | 1985-02-18 | 1990-02-18
10009 | Engineer | 1990-02-18 | 1995-02-18
10009 | Senior Engineer | 1995-02-18 | 9999-01-01
10010 | Engineer | 1996-11-24 | 9999-01-01
10011 | Staff | 1990-01-22 | 1996-11-09
10012 | Engineer | 1992-12-18 | 2000-12-18
10012 | Senior Engineer | 2000-12-18 | 9999-01-01
10013 | Senior Staff | 1985-10-20 | 9999-01-01
10014 | Engineer | 1993-12-29 | 9999-01-01
10015 | Senior Staff | 1992-09-19 | 1993-08-22
:
psql -U postgres Employees
Employees=# REVOKE SELECT ON titles FROM privilegetest;
REVOKE
psql -U privilegetest Employees
Employees=> SELECT * FROM titles;
ERROR: permission denied for table titles
psql -U postgres Employees
Employees=# GRANT ALL ON ALL TABLES IN SCHEMA public TO privilegetest;
GRANT
Employees=# REVOKE ALL ON ALL TABLES IN SCHEMA public FROM privilegetest;
REVOKE
Employees=# CREATE ROLE employee_read;
CREATE ROLE
Employees=# GRANT SELECT ON ALL TABLES IN SCHEMA public TO employee_read;
GRANT
Employees=# GRANT employee_read TO privilegetest;
GRANT ROLE
Employees=# \du
List of roles
Role name | Attributes | Member of
----------------------+------------------------------------------------------------+-----------------
chesterheng | Superuser, Create role, Create DB | {}
employee_read | Cannot login | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
privilegetest | | {employee_read}
test_interactive | Create role | {}
test_role_with_login | | {}
test_user_with_login | | {}
Employees=# REVOKE employee_read FROM privilegetest;
REVOKE ROLE
Before we start
Data types
Boolean
Character
CREATE TABLE test_text (
fixed char(4),
variable varchar(20),
unlimited text
);
INSERT INTO test_text VALUES ('mo', 'mo', 'I have unlimited space');
SELECT * FROM test_text;
CREATE TABLE test_float (
four float4,
eight float8,
big decimal
);
INSERT INTO test_float VALUES (1.123456789, 1.12345678901234567, 1.12345678901234567);
SELECT * FROM test_float;
four | eight | big |
---|---|---|
1.1234568 | 1.1234567890123457 | 1.12345678901234567 |
CREATE TABLE test_array (
four char(2)[],
eight text[],
big float4[]
);
INSERT INTO test_array VALUES (ARRAY ['mo', 'm', 'm', 'd'], ARRAY ['test', 'long text', 'longer text'], ARRAY [1.23, 2.11, 3.23, 5.32]);
SELECT * FROM test_array;
four | eight | big |
---|---|---|
[“mo”,”m “,”m “,”d “] | [“test”,”long text”,”longer text”] | [1.23,2.11,3.23,5.32] |
Database Models
What is a model?
Naming conventions
CREATE TABLE <name> (
<col1> TYPE [CONSTRAINT],
table_constraint [CONSTRAINT]
) [INHERITS <existing_tables>];
CREATE TABLE student (
student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
date_of_birth DATE NOT NULL
);
psql -U postgres ztm
ztm=# \conninfo
You are connected to database "ztm" as user "postgres" via socket in "/tmp" at port "5432".
ztm=# CREATE TABLE student (
ztm(# student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
ztm(# first_name VARCHAR(255) NOT NULL,
ztm(# last_name VARCHAR(255) NOT NULL,
ztm(# email VARCHAR(255) NOT NULL,
ztm(# date_of_birth DATE NOT NULL
ztm(# );
ERROR: function uuid_generate_v4() does not exist
LINE 2: student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
ztm=# CREATE extension if not exists "uuid-ossp";
CREATE EXTENSION
ztm=# CREATE TABLE student (
ztm(# student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
ztm(# first_name VARCHAR(255) NOT NULL,
ztm(# last_name VARCHAR(255) NOT NULL,
ztm(# email VARCHAR(255) NOT NULL,
ztm(# date_of_birth DATE NOT NULL
ztm(# );
CREATE TABLE
ztm=# \dt
List of relations
Schema | Name | Type | Owner
--------+---------+-------+----------
public | student | table | postgres
(1 row)
ztm=# \d student
Table "public.student"
Column | Type | Collation | Nullable | Default
---------------+------------------------+-----------+----------+--------------------
student_id | uuid | | not null | uuid_generate_v4()
first_name | character varying(255) | | not null |
last_name | character varying(255) | | not null |
email | character varying(255) | | not null |
date_of_birth | date | | not null |
Indexes:
"student_pkey" PRIMARY KEY, btree (student_id)
Temporary tables
CREATE TEMPORARY TABLE <name> (<columns>);
Constraints
Column Constraints
Examples
Table Constraints
Examples
CREATE TABLE student (
student_id UUID DEFAULT uuid_generate_v4(),
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
date_of_birth DATE NOT NULL,
CONSTRAINT pk_student_id PRIMARY KEY (student_id)
);
CREATE TABLE category (
cat_id SMALLINT PRIMARY KEY,
type TEXT
);
CREATE TABLE column_constraints (
cc_id SMALLINT PRIMARY KEY,
something TEXT NOT NULL,
email TEXT CHECK (email ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$'),
cat_id SMALLINT REFERENCES category(cat_id)
);
CREATE TABLE table_constraints (
cc_id SMALLINT,
something TEXT NOT NULL,
email TEXT,
cat_id SMALLINT REFERENCES category(cat_id),
CONSTRAINT pk_table_constraints PRIMARY KEY (cc_id),
CONSTRAINT valid_email CHECK (email ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$')
);
INSERT INTO category VALUES (1, 'category 1');
INSERT INTO column_constraints VALUES (1, 'something', 'test@email.com', 1);
INSERT INTO table_constraints VALUES (1, 'something', 'test@email.com', 1);
SELECT * FROM category;
SELECT * FROM table_constraints;
SELECT * FROM column_constraints;
INSERT INTO column_constraints VALUES (1, 'something', '@email.com', 1);
Query Error: error: new row for relation "column_constraints" violates check constraint "column_constraints_email_check"
INSERT INTO table_constraints VALUES (1, 'something', '@email.com', 1);
Query Error: error: new row for relation "table_constraints" violates check constraint "valid_email"
CREATE extension if not exists "uuid-ossp";
PROS | CONS |
---|---|
unique everywhere | large values to store |
easy to shard | can have performance impact |
easy to merge / replicate | more difficult to debug |
expose less info about your system |
Custom Data Types
Domains
CREATE DOMAIN Rating SMALLINT
CHECK (VALUE > 0 AND VALUE <= 5);
CREATE TYPE Feedback AS (
student_id UUID,
rating Rating,
feedback TEXT
);
CREATE TABLE student (
student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
date_of_birth DATE NOT NULL
);
CREATE TABLE subject (
subject_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
subject TEXT NOT NULL,
description TEXT
);
CREATE TABLE teacher (
teacher_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
date_of_birth DATE NOT NULL
);
CREATE TABLE course (
course_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
"name" TEXT NOT NULL,
description TEXT,
subject_id UUID REFERENCES subject(subject_id),
teacher_id UUID REFERENCES teacher(teacher_id),
feedback feedback[]
);
CREATE TABLE enrollment (
course_id UUID REFERENCES course(course_id),
student_id UUID REFERENCES student(student_id),
enrollment_date DATE NOT NULL,
CONSTRAINT pk_enrollment PRIMARY KEY (course_id, student_id)
);
ALTER TABLE student
ADD COLUMN email VARCHAR(255) NOT NULL;
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
ADD COLUMN <col> <type> <constraint>
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
ALTER COLUMN <name> TYPE <new type> [USING <expression>]
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
RENAME COLUMN <old name> TO <new name>
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
DROP COLUMN <col> [ RESTRICT | CASCADE ]
Our main advice when using this command is to make sure you communicate changes early and set in place a plan for other systems/software to be able to migrate, the last thing you want to happen is changing a column and suddenly all your services start breaking!
INSERT INTO student (first_name, last_name, email, date_of_birth) VALUES ( 'Mo', 'Binni', 'mo@binni.io', '1992-11-13'::DATE);
INSERT INTO teacher (first_name, last_name, email, date_of_birth) VALUES ( 'Mo', 'Binni', 'mo@binni.io', '1992-11-13'::DATE);
INSERT INTO subject (subject, description) VALUES ('SQL', 'A database management language');
DELETE FROM subject WHERE subject = 'SQL';
INSERT INTO course ("name", description) VALUES ('SQL Zero to Mastery', 'The #1 resource for for SQL mastery');
UPDATE course
SET subject_id = 'e4a6e53f-db6f-438f-b4d0-80d4e6a9341e'
WHERE subject_id is NULL;
ALTER TABLE course
ALTER COLUMN subject_id SET NOT NULL;
UPDATE course
SET teacher_id = '90de5068-bc2f-4d37-b4f0-1364671fb89e'
WHERE teacher_id is NULL;
ALTER TABLE course
ALTER COLUMN teacher_id SET NOT NULL;
INSERT INTO course ("name", description, subject_id, teacher_id) VALUES ('SQL Zero to Mastery', 'The #1 resource for for SQL mastery', 'e4a6e53f-db6f-438f-b4d0-80d4e6a9341e', '90de5068-bc2f-4d37-b4f0-1364671fb89e');
INSERT INTO enrollment (student_id, course_id, enrollment_date) VALUES ('f914ba46-9ac4-42cf-b0ac-d10cfe625be8', '6174589a-7f97-4d40-b063-6a974a20a7ec', NOW()::DATE);
UPDATE course
SET feedback = array_append(
feedback, ROW('f914ba46-9ac4-42cf-b0ac-d10cfe625be8', 5, 'Great course!')::feedback
)
WHERE course_id = '6174589a-7f97-4d40-b063-6a974a20a7ec';
CREATE TABLE feedback (
student_id UUID NOT NULL REFERENCES student(student_id),
course_id UUID NOT NULL REFERENCES course(course_id),
feedback TEXT,
rating rating,
CONSTRAINT pk_feedback PRIMARY KEY (student_id, course_id)
);
INSERT INTO feedback (student_id, course_id, feedback, rating) VALUES ('f914ba46-9ac4-42cf-b0ac-d10cfe625be8', '6174589a-7f97-4d40-b063-6a974a20a7ec', 'Interesting Course', 4)
Have a plan
What can go wrong?
How do I make a plan?
What to backup?
Type | Purpose | Frequency |
---|---|---|
full backup | backup all the data | less often |
incremental | backup since last incremental | often |
differential | backup since last full backup | often |
transaction log | backup of the database transactions | a lot |
psql -d postgres
postgres=# \conninfo
You are connected to database "postgres" as user "postgres" via socket in "/tmp" at port "5432".
postgres=# \i ~/Desktop/Employees_20200927_153756.sql
Transactions
Employees=# BEGIN;
BEGIN
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
DELETE 5
Employees=# SELECT * FROM employees;
emp_no | birth_date | first_name | last_name | gender | hire_date
--------+------------+----------------+------------------+--------+------------
10006 | 1953-04-20 | Anneke | Preusig | F | 1989-06-02
10007 | 1957-05-23 | Tzvetan | Zielinski | F | 1989-02-10
10008 | 1958-02-19 | Saniya | Kalloufi | M | 1994-09-15
10009 | 1952-04-19 | Sumant | Peac | F | 1985-02-18
10010 | 1963-06-01 | Duangkaew | Piveteau | F | 1989-08-24
Employees=# SELECT * FROM employees;
emp_no | birth_date | first_name | last_name | gender | hire_date
--------+------------+----------------+------------------+--------+------------
10001 | 1953-09-02 | Georgi | Facello | M | 1986-06-26
10002 | 1964-06-02 | Bezalel | Simmel | F | 1985-11-21
10003 | 1959-12-03 | Parto | Bamford | M | 1986-08-28
10004 | 1954-05-01 | Chirstian | Koblick | M | 1986-12-01
10005 | 1955-01-21 | Kyoichi | Maliniak | M | 1989-09-12
10006 | 1953-04-20 | Anneke | Preusig | F | 1989-06-02
10007 | 1957-05-23 | Tzvetan | Zielinski | F | 1989-02-10
10008 | 1958-02-19 | Saniya | Kalloufi | M | 1994-09-15
10009 | 1952-04-19 | Sumant | Peac | F | 1985-02-18
10010 | 1963-06-01 | Duangkaew | Piveteau | F | 1989-08-24
Employees=# BEGIN;
BEGIN
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
Employees=# ROLLBACK;
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
DELETE 5
Employees=# BEGIN;
BEGIN
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
DELETE 5
Employees=# END;
COMMIT
Employees=# BEGIN;
BEGIN
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
DELETE 0
Employees=# END;
COMMIT
Transactions
We have two solid clues so far:
For this part of the course you’ll need 2 databases:
To restore the Movr backup make sure to do the following steps in order!
psql -U postgres Movr
Movr=# \i ~/Desktop/schema.sql
Movr=# \i ~/Desktop/data.sql
To restore the Movr_Employees backup make sure to do the following steps in order!
psql -U postgres Movr_Employees
Movr_Employees=# \i ~/Desktop/movr_employees.sql
You had to be on location to steal the data
CREATE VIEW suspected_rides AS
SELECT *
FROM "public"."vehicle_location_histories"
WHERE
"city" = 'new york' AND
"timestamp"::date = '2020-06-23'::date AND
"lat" BETWEEN -74.997 AND -74.9968 AND
"long" BETWEEN 40.5 AND 40.6
ORDER BY long;
SELECT DISTINCT r.vehicle_id
FROM suspected_rides AS sr
JOIN rides AS r ON r.id = sr.ride_id;
SELECT
DISTINCT r.vehicle_id,
u.name AS "owner name",
u.address,
v.status,
v.current_location
FROM suspected_rides AS sr
JOIN rides AS r ON r.id = sr.ride_id
JOIN vehicles AS v ON v.id = r.vehicle_id
JOIN users AS u ON u.id = v.owner_id;
It’s not the drivers
So with our current setup, we go ahead and filter out all of the unique riders that were on those suspected rides on that horrible day of the theft.
SELECT DISTINCT r.vehicle_id, u.name AS "rider name", u.address
FROM suspected_rides AS sr
JOIN rides AS r ON r.id = sr.ride_id
JOIN users AS u ON u.id = r.rider_id;
We’re going to have to cross-reference data between 2 separate databases.
-- run this to run queries across databases
CREATE extension dblink;
SELECT *
FROM dblink(
'host=localhost user=postgres password=postgres dbname=Employees',
'SELECT <column> FROM employees;')
AS t1(<column> NAME)
CREATE VIEW suspected_rider_names AS
SELECT DISTINCT
split_part(u.name, ' ', 1) AS "first_name",
split_part(u.name, ' ', 2) AS "last_name"
FROM suspected_rides AS vlh
JOIN rides AS r ON r.id = vlh.ride_id
JOIN users AS u ON u.id = r.rider_id;
SELECT * FROM suspected_rider_names;
SELECT DISTINCT
CONCAT(t1.first_name, ' ', t1.last_name) AS "employee",
CONCAT(u.first_name, ' ', u.last_name) AS "rider"
FROM dblink(
'host=localhost user=chesterheng password=postgres dbname=Movr_Employees',
'SELECT first_name, last_name FROM employees;')
AS t1(first_name NAME, last_name NAME)
JOIN suspected_rider_names AS u ON t1.last_name = u.last_name
ORDER BY "rider";
Software development lifecycle
Database design
Top-Down
Bottom-Up
What to use? Often you’ll end up using a bit of both
Phase 3: system design
ER Model: Entity-Relationship Model or diagram
What is an entity?
Core requirements
Attributes
Relation schema
Relation instance
School:
Instructor:
Student:
Exam:
Lesson:
System requirements
Some constraints to note:
Let’s answer some questions about the system:
5 steps of top-down to create an erd of the system
Step 1: entities
Step 2: attributes
Entity | Attributes |
---|---|
Painting | name, creation_date, style |
Reservation | creation_date, date_from, date_to, accepted |
Artist | name, birth_date, email |
Museum | name, address, phone_nr, email |
Step 3: Relationships
Step 4: Solving Many To Many
Steps
Goal
There are 3 types
Goal
Normalization
emp_no | proj_id | salary |
---|---|---|
1 | 1 | 300 |
2 | 1 | 500 |
1 | 2 | 400 |
3 | 2 | 300 |
Example:
EMP_NO -> SALARY (no)
PROJ_ID -> SALARY (no)
PROJ_ID, EMP_NO -> SALARY (yes)
student_no | activity | contribution |
---|---|---|
1 | swimming | 300 |
2 | sailing | 500 |
3 | bungee | 150 |
4 | shark diving | 200 |
1 | shark diving | 300 |
2 | bungee | 300 |
Example:
STUDENT_NO -> ACTIVITY (no)
ACTIVITY -> STUDENT_NO (no)
ACTIVITY -> CONTRIBUTION (no)
STUDENT_NO, ACTIVITY -> CONTRIBUTION (yes)
data is in 0 NF when it is unnormalized
1 NF
A repeating group means that a table contains 2 or more columns that are closely related
Example 1:
O NF
book | author 1 | author 2 | author 3 |
---|---|---|---|
1 | 1 | 2 | 3 |
2 | 2 | 1 | 3 |
3 | 3 | 2 | 1 |
1 NF
BOOK
book_id | title |
---|---|
BOOK_AUTHOR
book_id | author_id | author_name | author_address | author_email |
---|---|---|---|---|
Example 2:
O NF
color | quantity | price |
---|---|---|
red, green, blue | 20 | 9.99 |
yellow, orange, purple | 10 | 10.99 |
blue, cyan | 15 | 3.99 |
green, magento | 200 | 15.99 |
1 NF
PRODUCT
prod_id | quantity | price |
---|---|---|
PRODUCT_COLOUR
prod_id | color |
---|---|
Example:
O NF
1 NF
BOOK
BOOK_AUTHOR
2 NF
BOOK
BOOK_AUTHOR
AUTHOR
Example:
O NF
1 NF
EMPLOYEE
BRANCH
2 NF
EMPLOYEE
BRANCH
TIMESHEET
3 NF
EMPLOYEE
BRANCH
PROVINCE
TIMESHEET
A relationship is not in BCNF if
3 NF
student_id | tutor_id | tutor_sin |
---|---|---|
1 | 998 | 838 383 494 |
2 | 234 | 343 535 352 |
3 | 999 | 112 134 134 |
4 | 1234 | 354 464 234 |
candidate keys
functional dependency
BCNF
student_id | tutor_id |
---|---|
1 | 998 |
2 | 234 |
3 | 999 |
4 | 1234 |
tutor_id | tutor_sin |
---|---|
999 | 838 383 494 |
234 | 343 535 352 |
999 | 112 134 134 |
1234 | 354 464 234 |
Scalability
2 strategies
2 strategies
Big drop in developer job postings
NoSQL (MongoDB)
Relational (PostgreSQL)
NewSQL
CREATE TABLE users (
id serial PRIMARY KEY,
"name" VARCHAR(100),
email text UNIQUE NOT NULL,
entries BIGINT DEFAULT 0,
joined TIMESTAMP NOT NULL
);
CREATE TABLE login (
id serial PRIMARY KEY,
hash VARCHAR(100) NOT NULL,
email text UNIQUE NOT NULL
);
const postgres = knex({
client: 'pg',
connection: {
host : 'localhost',
user : 'postgres',
password : '',
database : 'smart-brain'
}
});
postgres.select('*').from('users');
app.post('/register', (req, res) => {
const { email, name } = req.body;
db('users')
.returning('*')
.insert({
name,
email,
joined: new Date()
})
.then(user => res.json(user[0]))
.catch(err => res.status(400).json('unable to register'));
});