🖊️ 计算机学院数据库系统（Database System）

Basic info

课程名：数据库系统（Database System）

主讲教师： $\mathbf{Prof.}$ $Gang$ $Chen$

E-Mail：[email protected]$

分数占比：

Assignment $10\%$
Quiz $10\%$
Lab $30\%$
Final $50\%$ （Cheat sheet allowed）

教材：

$Abraham Silberschatz$, Database System Concepts (7th Edition).

Lec 1 : Introduction

Abstraction of a database system：

Physical Level: Describes how a record is stored
Logical Level: Describes data stored in database, and the relationships among the data on upper level.
View Level: Application programs hide details of data types.

► Schema $\leftrightarrow$ Type：The structure of the database on different level.

Physical Schema: database structure design at the physical level.

Logical Schema: database structure design at the logical level.

Subschema: schema at view level.

► Instance $\leftrightarrow$ Variable: The actual content of the database at a particular point in time.

Independence: Ability to modify a schema definition at one level without affecting a schema definition at a higher level.

Physical data independence: The ability to modify the physical schema without changing the logical schema.
1. Applications depend on the logical schema.
2. Applications are insulated (绝缘的) from how data is structured and stored.
3. One of the most important benefits of using a DBMS.
Logical data independence: Protect application programs from changes in logical structure of data.
1. Hard to achieve as the application programs are heavily dependent on the logical structure of data.

Database Languages:

Data Definition Language: Specification notation for defining the database schema.
1. Specifies a database schema as a set of definitions of relational schema
2. Also specifies storage structure, access methods, and consistency constraints.
3. DDL statements are compiled, resulting in a set of tables stored in a special file called data dictionary.
Data Manipulation Language: Language for accessing and manipulating the data organized by appropriate data model.
Data Control Language

Lec 2 : Relational Model

A Relation is a table with rows and columns.

A relational database is a collection of one or more relations, which are based on the relational model

Relations

Basic Definition of a relation:

Formally, given sets $D_1,D_2,\dots,D_n (D_i=a_{ij}|_{j=1\dots k})$, a relation $r$ is a subset of $D_1\times D_2\times \dots \times D_n$.

Attribute Types：

Each attribute of a relation has a name.
The set of allowed values for each attribute is called the domain of the attribute.
$1^{st} NF$: Attribute values are required to be atomic (indivisible)
The special value null is a member of every domain.

A relation is concerned with two concepts: relation schema and relation instance.

The relation schema describes the structure of the relation
The relation instance corresponds to the snapshot of the data in the relation at a given instant in time

Assume $A_1,A_2,\dots,A_n$ are attributes, then $\mathbf{R}=(A_1,A_2,\dots,A_n)$

is a relation schema
$r(\mathbf{R})$ is a relation on the relation schema $\mathbf{R}$

Properties of a relation:

The order of tuples is irrelevant.
No duplicated tuple in a relation.
Attribute values are atomic.

Database：

A database consists of multiple relations.

Key

$K\subseteq \mathbf{R}$. (An attribute of the relation schema)

Superkey: $K$ is a superkey of $\mathbf{R}$ if values for $K$ are sufficient to identify a unique tuple of each possible relation $r(\mathbf{R})$.
Candidate Key: $K$ is a candidate key if it is minimal superkey.
Primary key: $K$ is a Primary key if $K$ is a candidate key and is defined by user explicitly.
Foreign key: Assume there exists relations $r$ and $s$: $r(A,B,C),s(B,D)$, we can say that attribute $B$ in relation $r$ is foreign key referencing $s$ and $r$ is a referencing relation and $s$ is a referenced relation. There are two types of constraints:
1. Foreign-key constraint:
  1. the value of $B$ for each tuple in $r$ must also be the value of $B$ for some tuple in $s$.
  2. $B$ must be the primary key of $s$.
2. Referential-integrity constraint:
  1. the value of $B$ for each tuple in $r$ must also be the value of $B$ for some tuple in $s$.
  2. Usually not supported by Database Systems.

Take the table above as an example, id and name are both superkey. birth year and weight is a superkey when we combine them together.

Superkey id or name have only one attribute, thus they are both candidate key. The user can decide which one becomes the primary key.

If a connection should be established between the table and another table which have id as an attribute, then id is a foreign key.

Fundamental Relational-Algebra Operations

There are six basic operators:

Select

defined as $\sigma_p(r)=\{t|t\in r \land p(t)\}$, where $p$ is a formula in propositional calculas consisting of terms connected by $\land, \vee, \neg.$ e.g.

$\mathrm{\sigma_{customer-name='Dickson'\ \land\ type='debit'}(account).}$

Project

defined as $\prod_{A_1,A_2,\dots,A_k}(r).$ Where $A_1,A_2,\dots,A_k$ are attribute names and $r$ is a relation name. The result is defined as the relation of $k$ columns obtained by erasing the columns that are not listed. Note that duplicated rows must be removed from the result.

Union

defined as $r\cap s=\{t|t\in r\vee t\in s\}.$ where $r$ and $s$ must have the same number of attributes, and the attribute domains must be compatible.

Set Difference

defined as $r-s=\{t|t\in r \land t \notin s\}.$ where $r$ and $s$ must have the same number of attributes, and the attribute domains must be compatible.

Cartesian Product

defined as $r\times s=\{ \{t ,q\}|t\in r \land q\in s\}$. Assume that attributes of $r(\mathbf{R})$ and $s(\mathbf{S})$ are disjoint. If attributes of $r(\mathbf{R})$ and $s(\mathbf{S})$ are not disjoint, then renaming for attributes must be used.

Rename

defined as $\rho_x(E)$, essential when self-compare occurs, e.g. $\mathrm{\max(r)}$.

Advanced

Natural Join Operation

Notation: $r\bowtie s$
Example:$\mathrm{R=(A,B,C,D), S=(B,D,E)}$

$\mathrm{r\bowtie s=\prod_{r.A,r.B,r.C,r.D,s.E}(\sigma_{r.B=s.B\land r.D=s.D}(r\times s))}$

Theta join Operation

Notation: $r\bowtie_\theta s$
where $\theta$ is the predicate on attributes in the schema. note that $\mathrm{r\bowtie_\theta s=\sigma_\theta(r\times s)}$

Division Operation

Notation: $r\div s$
The backward operation of Cartesian Product.

$\mathrm{ r\div s=\prod_{R-S}(r)-\prod_{R-S}((\prod_{R-S}(r\times s)-\prod_{R-S,S}(r)) ) }.$

Assignment Operation

Notation: $r\leftarrow s$

Lec 3 : SQL

SQL /ˈsēkw(ə)l/ (Structured Query Language, 结构化查询语言)
SQL includes these parts:

Data-Definition Language
- CREATE TABLE;
- DROP TABLE;
Data-Manipulation Language
- SELECT * FROM TABLE;
- INSERT
Data-Control Language
- GRANT

Data-Definition Language

What can DDL do?

Create, drop or alter table and Define the schema for relations

for example,

CREATE TABLE dick
            (dick_owner  char(20) not null,
             dick_length numeric(2, 2),
             dick_radius numeric(2, 2),
             primary key (dick_owner));

creates a new table names dick , and contains three attribute dick_owner, dick_length and dick_radius. We set dick_owner the primary key as no dick can be owned by two.

1	DROP TABLE dick;

this command deletes all information about the dropped relation from the database.
One should be careful and responsible to use the DROP command.

1
2
3

ALTER TABLE dick ADD dick_angle numeric(2, 2);
ALTER TABLE dick DROP dick_angle; //not supported by many DBMS
ALTER TABLE dick MODIFY (dick_owner varchar(20));

this command is used to add, drop or edit attributes to an existing relation

Note:
“-“ character is invalid in names.
SQL names are case-insensitive.

Define the domain of values associated with each attribute
There are several types of domain in SQL

char(n) : Fixed length character string, the length must be n.

varchar(n) : Variable length character string, the maximum length is n.

int : Integer (depend on the machine)

smallint : Subset of Integer type (depend on the machine)

numeric(p, d) : Fixed point number, precision is p digits, d digits to the right of decimal point.

real, double precision : Floating point and double-precision floating point numbers.

float(n) : Floating point number, precision at least n digits.

NULL : nothing.

date : Contain year, month and day.

time : Time of a day. Contain hour, minute and second.

timestamp : date and time.

Functions can be used to handle the conversion of different types. such as Abs(), exp()…

Define the integrity constraints.

Integrity Constraints are the protocols that a table’s data columns must follow. Some integrity constraints are shown below:

Not null
Primary key (A, B, ..., Z), ( for SQL_92, this constraint ensures Not null )
Check (P), where P is a predicate, value of this attribute of each tuple must satisfy P.

Other

Define the physical storage structure of each relations on disk
Define the indices to be maintained for each relations
Define the view on relations

Basic Structure of command

select clause

A typical SQL query can have the form:

SELECT (DISTINCT) A, B, ... 
FROM   r, s, ...
WHERE  P
GROUP BY C, D, ... 
HAVING Q
ORDER BY E, F, ...

where A and B are attributes; r and s are relations; P is a predicate.

This query is equivalent to the relational algebra expression:

$\prod_{A, B,\dots}(\sigma_P(r\times s\times \dots)).$

the result of query is also a relation.

Distinct: In the result of query, duplicates is allowed by default, if we want it to be eliminated, key word distinct can be use:

1 2	SELECT distinct dick_length FROM dick

Note that distinct means every tuple is distinct in the result relation, but not every value of one attribute is distinctive.

Asterisk (*): An asterisk in the select clause denotes all attributes.

1	SELECT * FROM dick

The asterisk can also be used in arithmetic expressions, such as

1 2	SELECT dick_owner, dick_length * 10 FROM dick;

where clause: The where clause specifies conditions that the result must satisfy.

1
2
3

SELECT dick_owner
FROM dick
WHERE dick_length > 30;

the SQL query returns all the names of whom have a dick longer than 30. It’s equivalent to the relational algebra expression:

$\prod_{\mathrm{dick\_owner}}(\sigma_{\mathrm{dick\_length>30}}(\mathrm{dick})).$

The predicate, P can be a combination of several predicates connected with logical connectives including AND, OR and NOT.

from clause: the FROM clause lists the relations involved in the query. If more than one relation is specified in the clause, than select from the Cartesian product relation of all relations.

SELECT dick.dick_owner, dick_length, dick_used_count
FROM dick, dick_usage_record
WHERE dick.dick_owner = dick_usage_record.dick_owner AND
      dick_used_count > 114514 AND
      dick_length     > 30

The query selects the name of those who have a dick longer than 30 and used for more than 114,514 times. Note that dick.dick_owner is necessary because there are two dick_owner attribute in total.
rename(as): The SQL allows renaming relations and attributes using the as clause.

SELECT dick.dick_owner as excellent_dick_owner, 
       dick_length, dick_used_count
FROM dick, dick_usage_record
WHERE dick.dick_owner = dick_usage_record.dick_owner AND
      dick_used_count > 114514 AND
      dick_length     > 30

tuple variables: renaming the relation temporarily in the query.
for simplification:

SELECT r.dick_owner, dick_length, dick_used_count
FROM dick as r, dick_usage_record as s
WHERE r.dick_owner    = s.dick_owner AND
      dick_used_count > 114514 AND
      dick_length     > 30

for self-comparing, the following query selects all the name of those who’s dick is longer than cartman:

SELECT DISTINCT r.dick_owner, r.dick_length
FROM dick as r, dick as s
WHERE r.dick_length > s.dick_length AND
      s.dick_owner  = "cartman";

string operations

SQL includes a string-matching operator for comparisons on character strings

% matches any substring
_ matches any character
|| Concat strings
lower()
upper()

LIKE clause can be used to form a predicate
e.g. find all dick owner who’s name ends with ‘son’

1	WHERE dick_owner LIKE '%son'

e.g.

1
2
3

SELECT 'dick length of him is ' || dick_length
FROM dick
WHERE dick_owner LIKE '%son'

ordering of display of tuples

Though tuples in the databse is inordered, but we can define how it will display by using ORDER BY clause.

SELECT dick_owner
FROM dick
WHERE dick_length > 30
ORDER BY dick_length DESC

Set Operation

There are there basic type of Set operations:

UNION : $A\cap B$.
INTERSECT : $A\cup B$.
EXCEPT : $A - B$.

The three operations eliminates duplicates automatically, if we want every tuple, use ALL clause.
e.g.

(SELECT dick_owner FROM dick
 WHERE  dick_length > 30)
UNION
(SELECT dick_owner FROM dick_usage_record
 WHERE  dick_used_count > 114514)

Aggregate Functions

There are functions operate on the multiset values of a relation’s column, and returns a value.

avg(col) : average value
min(col) : minimum value
sum(col) : sum of values
count(col) : number of values

e.g. Find the average dick length of those who’s name ends with ‘son’

1
2
3

SELECT avg(dick_length) avg_dick_length
FROM dick
WHERE dick_owner LIKE '%son';

GROUP BY: if an attribute in the SELECT clause isn’t in the aggregate function, it shall be put into GROUP BY list.

SELECT dick_owner, avg(dick_length) avg_dick_length
FROM dick
WHERE dick_owner LIKE '%son'
GROUP BY dick_owner;

HAVING: Make predicates on the result of aggregate function.

1	HAVING avg(dick_length) > 10

Execution of SELECT

The execution order of SELECT clause is FROM$\rightarrow$WHERE$\rightarrow$GROUP(functions)$\rightarrow$HAVING$\rightarrow$SELECT$\rightarrow$DISTINCT$\rightarrow$ORDER BY

Note that predicates in the HAVING clause are applied after GROUP.
Aggregate functions can’t be used in WHERE clause directly.

NULL values

NULL is a special marker used in SQL, first introduced by $\mathrm{E.F.Codd.}$. It means ‘missing information’ or ‘unknown’

The result of any arithmetic expression involving NULL is NULL.
Any comparison with NULL returns UNKNOWN.
- NULL = NULL

Three valued logic : $\mathrm{\{true,false,UNKNOWN\}}$

UNKNOWN or TRUE = TRUE
UNKNOWN or FALSE = UNKNOWN
UNKNOWN or UNKNOWN = UNKNOWN
UNKNOWN and TRUE = UNKNOWN
UNKNOWN and FALSE = FALSE
UNKNOWN and UNKNOWN = UNKNOWN
not UNKNOWN = UNKNOWN

UNKNOWN is treated as FALSE in WHERE clauses.

P IS UNKNOWN predicate evaluates to true if predicate P equals to UNKNOWN

IS NULL and IS NOT NULL predicates:
e.g. find all valid tuples:

1
2
3

SELECT dick_owner
FROM dick
WHERE dick_length IS NOT NULL;

NULL will be ignored in aggregate functions except COUNT().

Nested Subqueries

A subquery is a SELECT_FROM_WHERE expression that is nested within another query. A common usage of subqueries is to perform tests for set membership and set comparisons.

e.g. Find all person who is both rich and stupid.

SELECT distinct names
FROM richie
WHERE names IN (SELECT names 
                FROM douchebag)

e.g. Find all person who is both rich and NOT stupid.

SELECT distinct names
FROM richie
WHERE names NOT IN (SELECT names 
                    FROM douchebag)

e.g. Find the owner of the biggest dick for every city

SELECT dick_owner B, dick_length
FROM dick A
WHERE dick_length >= (SELECT max(dick_length)
                      FROM dick B
                      WHERE A.city = B.city)
ORDER BY dick_length

Some and all clause

$C[comp]$ SOME $r\leftrightarrow \exist t\in r, C[comp]t$ holds, where $[comp]$ could be $<,>,\le,\ge,=,\neq.$

For example $6<$ SOME $\{2,4,5,7\}=$ TRUE.

$(=$ SOME $)\equiv$ IN.

$C[comp]$ ALL $r\leftrightarrow \forall t\in r, C[comp]t$ holds, where $[comp]$ could be $<,>,\le,\ge,=,\neq.$

For example $6<$ ALL $\{5,7,9,10\}=$ FALSE.

$(\neq$ ALL $)\equiv$ NOT IN.

Test for empty relations

EXISTS construct returns the value TRUE if the argument subquery is non-empty.

EXISTS $r\leftrightarrow r\neq \phi.$

Test for absence of Duplicate Tuples

UNIQUE construct tests whether a subquery has any duplicates in its result.
e.g. Find all person who have only one child

SELECT names
FROM child as T
WHERE UNIQUE (SELECT names
              FROM child AS P
              WHERE T.names = P.names)

Views

Views provide a mechanism to hide certain data from the view of certain users.

To create a view we use:

1 2	CREATE VIEW <name_v> AS SELECT ... FROM ... WHERE ...

To drop a view we use:

1	DROP VIEW <name_v>

Benefits of using a view

Easy to use.
Security.
Supports logical independent.

Derived Relations

WITH clause allows views to be defined locally for a query rather than globally.

Modification of the database

Deletion

The Delete command in formal form:

1 2	DELETE FROM <TABLE> WHERE <predicate>

Insertion

Lec 4 : Advanced SQL

SQL Data Types and schemas

1	CREATE DOMAIN Dllars as numeric(12, 2) NOT NULL