Basic info
课程名:数据库系统 (Database System)
主讲教师: $\mathbf{Prof.}$ $Gang$ $Chen$
E-Mail:[email protected]$
分数占比:
- Assignment $10\%$
- Quiz $10\%$
- Lab $30\%$
- Final $50\%$ (Cheat sheet allowed)
教材:
- $Abraham Silberschatz$, Database System Concepts (7th Edition).
Lec 1 : Introduction
Abstraction of a database system:
- Physical Level: Describes how a record is stored
- Logical Level: Describes data stored in database, and the relationships among the data on upper level.
- View Level: Application programs hide details of data types.
► Schema $\leftrightarrow$ Type:The structure of the database on different level.
- Physical Schema: database structure design at the physical level.
- Logical Schema: database structure design at the logical level.
- Subschema: schema at view level.
► Instance $\leftrightarrow$ Variable: The actual content of the database at a particular point in time.
Independence: Ability to modify a schema definition at one level without affecting a schema definition at a higher level.
- Physical data independence: The ability to modify the physical schema without changing the logical schema.
- Applications depend on the logical schema.
- Applications are insulated (绝缘的) from how data is structured and stored.
- One of the most important benefits of using a DBMS.
- Logical data independence: Protect application programs from changes in logical structure of data.
- Hard to achieve as the application programs are heavily dependent on the logical structure of data.
Database Languages:
- Data Definition Language: Specification notation for defining the database schema.
- Specifies a database schema as a set of definitions of relational schema
- Also specifies storage structure, access methods, and consistency constraints.
- DDL statements are compiled, resulting in a set of tables stored in a special file called data dictionary.
Data Manipulation Language: Language for accessing and manipulating the data organized by appropriate data model.
Data Control Language
Lec 2 : Relational Model
A Relation is a table with rows and columns.
A relational database is a collection of one or more relations, which are based on the relational model
Relations
Basic Definition of a relation:
Formally, given sets $D_1,D_2,\dots,D_n (D_i=a_{ij}|_{j=1\dots k})$, a relation $r$ is a subset of $D_1\times D_2\times \dots \times D_n$.
Attribute Types:
- Each attribute of a relation has a name.
- The set of allowed values for each attribute is called the domain of the attribute.
- $1^{st} NF$: Attribute values are required to be atomic (indivisible)
- The special value null is a member of every domain.
A relation is concerned with two concepts: relation schema and relation instance.
- The relation schema describes the structure of the relation
- The relation instance corresponds to the snapshot of the data in the relation at a given instant in time
Assume $A_1,A_2,\dots,A_n$ are attributes, then
is a relation schema
$r(\mathbf{R})$ is a relation on the relation schema $\mathbf{R}$
Properties of a relation:
- The order of tuples is irrelevant.
- No duplicated tuple in a relation.
- Attribute values are atomic.
Database:
- A database consists of multiple relations.
Key
$K\subseteq \mathbf{R}$. (An attribute of the relation schema)
- Superkey: $K$ is a superkey of $\mathbf{R}$ if values for $K$ are sufficient to identify a unique tuple of each possible relation $r(\mathbf{R})$.
- Candidate Key: $K$ is a candidate key if it is minimal superkey.
- Primary key: $K$ is a Primary key if $K$ is a candidate key and is defined by user explicitly.
- Foreign key: Assume there exists relations $r$ and $s$: $r(A,B,C),s(B,D)$, we can say that attribute $B$ in relation $r$ is foreign key referencing $s$ and $r$ is a referencing relation and $s$ is a referenced relation. There are two types of constraints:
- Foreign-key constraint:
- the value of $B$ for each tuple in $r$ must also be the value of $B$ for some tuple in $s$.
- $B$ must be the primary key of $s$.
- Referential-integrity constraint:
- the value of $B$ for each tuple in $r$ must also be the value of $B$ for some tuple in $s$.
- Usually not supported by Database Systems.
- Foreign-key constraint:
Take the table above as an example, id
and name
are both superkey. birth year
and weight
is a superkey when we combine them together.
Superkey id
or name
have only one attribute, thus they are both candidate key. The user can decide which one becomes the primary key.
If a connection should be established between the table and another table which have id
as an attribute, then id
is a foreign key.
Fundamental Relational-Algebra Operations
There are six basic operators:
Select
defined as $\sigma_p(r)=\{t|t\in r \land p(t)\}$, where $p$ is a formula in propositional calculas consisting of terms connected by $\land, \vee, \neg.$ e.g.
Project
defined as $\prod_{A_1,A_2,\dots,A_k}(r).$ Where $A_1,A_2,\dots,A_k$ are attribute names and $r$ is a relation name. The result is defined as the relation of $k$ columns obtained by erasing the columns that are not listed. Note that duplicated rows must be removed from the result.
Union
defined as $r\cap s=\{t|t\in r\vee t\in s\}.$ where $r$ and $s$ must have the same number of attributes, and the attribute domains must be compatible.
Set Difference
defined as $r-s=\{t|t\in r \land t \notin s\}.$ where $r$ and $s$ must have the same number of attributes, and the attribute domains must be compatible.
Cartesian Product
defined as $r\times s=\{ \{t ,q\}|t\in r \land q\in s\}$. Assume that attributes of $r(\mathbf{R})$ and $s(\mathbf{S})$ are disjoint. If attributes of $r(\mathbf{R})$ and $s(\mathbf{S})$ are not disjoint, then renaming for attributes must be used.
Rename
defined as $\rho_x(E)$, essential when self-compare occurs, e.g. $\mathrm{\max(r)}$.
Advanced
Natural Join Operation
Notation: $r\bowtie s$
Example:$\mathrm{R=(A,B,C,D), S=(B,D,E)}$
Theta join Operation
Notation: $r\bowtie_\theta s$
where $\theta$ is the predicate on attributes in the schema. note that
Division Operation
Notation: $r\div s$
The backward operation of Cartesian Product.
Assignment Operation
Notation: $r\leftarrow s$
Lec 3 : SQL
SQL /ˈsēkw(ə)l/
(Structured Query Language, 结构化查询语言)
SQL includes these parts:
- Data-Definition Language
CREATE TABLE;
DROP TABLE;
- Data-Manipulation Language
SELECT * FROM TABLE;
INSERT
- Data-Control Language
GRANT
Data-Definition Language
What can DDL do?
Create, drop or alter table and Define the schema for relations
for example,1
2
3
4
5CREATE TABLE dick
(dick_owner char(20) not null,
dick_length numeric(2, 2),
dick_radius numeric(2, 2),
primary key (dick_owner));
creates a new table names dick
, and contains three attribute dick_owner
, dick_length
and dick_radius
. We set dick_owner
the primary key as no dick can be owned by two.1
DROP TABLE dick;
this command deletes all information about the dropped relation from the database.
One should be careful and responsible to use the DROP
command.1
2
3ALTER TABLE dick ADD dick_angle numeric(2, 2);
ALTER TABLE dick DROP dick_angle; //not supported by many DBMS
ALTER TABLE dick MODIFY (dick_owner varchar(20));
this command is used to add, drop or edit attributes to an existing relation
Note:
“-“ character is invalid in names.
SQL names are case-insensitive.Define the domain of values associated with each attribute
There are several types of domain in SQL
char(n)
: Fixed length character string, the length must ben
.varchar(n)
: Variable length character string, the maximum length isn
.int
: Integer (depend on the machine)smallint
: Subset of Integer type (depend on the machine)numeric(p, d)
: Fixed point number, precision isp
digits,d
digits to the right of decimal point.real, double precision
: Floating point and double-precision floating point numbers.float(n)
: Floating point number, precision at leastn
digits.NULL
: nothing.date
: Contain year, month and day.time
: Time of a day. Contain hour, minute and second.timestamp
: date and time.
Functions can be used to handle the conversion of different types. such as Abs()
, exp()
…
Define the integrity constraints.
Integrity Constraints are the protocols that a table’s data columns must follow. Some integrity constraints are shown below:
Not null
Primary key (A, B, ..., Z)
, ( for SQL_92, this constraint ensuresNot null
)Check (P)
, whereP
is a predicate, value of this attribute of each tuple must satisfyP
.
Other
- Define the physical storage structure of each relations on disk
- Define the indices to be maintained for each relations
- Define the view on relations
Basic Structure of command
select clause
A typical SQL query can have the form:1
2
3
4
5
6SELECT (DISTINCT) A, B, ...
FROM r, s, ...
WHERE P
GROUP BY C, D, ...
HAVING Q
ORDER BY E, F, ...
where A
and B
are attributes; r
and s
are relations; P
is a predicate.
This query is equivalent to the relational algebra expression:
the result of query is also a relation.
Distinct: In the result of query, duplicates is allowed by default, if we want it to be eliminated, key word distinct
can be use:1
2SELECT distinct dick_length
FROM dick
Note that
distinct
means every tuple is distinct in the result relation, but not every value of one attribute is distinctive.
Asterisk (*): An asterisk in the select clause denotes all attributes.1
SELECT * FROM dick
The asterisk can also be used in arithmetic expressions, such as1
2SELECT dick_owner, dick_length * 10
FROM dick;
where clause: The where clause specifies conditions that the result must satisfy.1
2
3SELECT dick_owner
FROM dick
WHERE dick_length > 30;
the SQL query returns all the names of whom have a dick longer than 30. It’s equivalent to the relational algebra expression:
The predicate, P
can be a combination of several predicates connected with logical connectives including AND
, OR
and NOT
.
from clause: the FROM
clause lists the relations involved in the query. If more than one relation is specified in the clause, than select from the Cartesian product relation of all relations.1
2
3
4
5SELECT dick.dick_owner, dick_length, dick_used_count
FROM dick, dick_usage_record
WHERE dick.dick_owner = dick_usage_record.dick_owner AND
dick_used_count > 114514 AND
dick_length > 30
The query selects the name of those who have a dick longer than 30 and used for more than 114,514 times. Note that dick.dick_owner
is necessary because there are two dick_owner
attribute in total.
rename(as): The SQL allows renaming relations and attributes using the as
clause.1
2
3
4
5
6SELECT dick.dick_owner as excellent_dick_owner,
dick_length, dick_used_count
FROM dick, dick_usage_record
WHERE dick.dick_owner = dick_usage_record.dick_owner AND
dick_used_count > 114514 AND
dick_length > 30
tuple variables: renaming the relation temporarily in the query.
for simplification:1
2
3
4
5SELECT r.dick_owner, dick_length, dick_used_count
FROM dick as r, dick_usage_record as s
WHERE r.dick_owner = s.dick_owner AND
dick_used_count > 114514 AND
dick_length > 30
for self-comparing, the following query selects all the name of those who’s dick is longer than cartman:1
2
3
4SELECT DISTINCT r.dick_owner, r.dick_length
FROM dick as r, dick as s
WHERE r.dick_length > s.dick_length AND
s.dick_owner = "cartman";
string operations
SQL includes a string-matching operator for comparisons on character strings
%
matches any substring_
matches any character||
Concat stringslower()
upper()
LIKE
clause can be used to form a predicate
e.g. find all dick owner who’s name ends with ‘son’1
WHERE dick_owner LIKE '%son'
e.g.1
2
3SELECT 'dick length of him is ' || dick_length
FROM dick
WHERE dick_owner LIKE '%son'
ordering of display of tuples
Though tuples in the databse is inordered, but we can define how it will display by using ORDER BY
clause.1
2
3
4SELECT dick_owner
FROM dick
WHERE dick_length > 30
ORDER BY dick_length DESC
Set Operation
There are there basic type of Set operations:
UNION
: $A\cap B$.INTERSECT
: $A\cup B$.EXCEPT
: $A - B$.
The three operations eliminates duplicates automatically, if we want every tuple, use ALL
clause.
e.g.1
2
3
4
5(SELECT dick_owner FROM dick
WHERE dick_length > 30)
UNION
(SELECT dick_owner FROM dick_usage_record
WHERE dick_used_count > 114514)
Aggregate Functions
There are functions operate on the multiset values of a relation’s column, and returns a value.
avg(col)
: average valuemin(col)
: minimum valuesum(col)
: sum of valuescount(col)
: number of values
e.g. Find the average dick length of those who’s name ends with ‘son’1
2
3SELECT avg(dick_length) avg_dick_length
FROM dick
WHERE dick_owner LIKE '%son';
GROUP BY: if an attribute in the SELECT
clause isn’t in the aggregate function, it shall be put into GROUP BY
list.1
2
3
4SELECT dick_owner, avg(dick_length) avg_dick_length
FROM dick
WHERE dick_owner LIKE '%son'
GROUP BY dick_owner;
HAVING: Make predicates on the result of aggregate function.1
HAVING avg(dick_length) > 10
Execution of SELECT
The execution order of SELECT
clause is FROM
$\rightarrow$WHERE
$\rightarrow$GROUP(functions)
$\rightarrow$HAVING
$\rightarrow$SELECT
$\rightarrow$DISTINCT
$\rightarrow$ORDER BY
Note that predicates in the
HAVING
clause are applied afterGROUP
.
Aggregate functions can’t be used inWHERE
clause directly.
NULL values
NULL
is a special marker used in SQL, first introduced by $\mathrm{E.F.Codd.}$. It means ‘missing information’ or ‘unknown’
- The result of any arithmetic expression involving
NULL
isNULL
. - Any comparison with
NULL
returnsUNKNOWN
.NULL = NULL
Three valued logic : $\mathrm{\{true,false,UNKNOWN\}}$
UNKNOWN or TRUE = TRUE
UNKNOWN or FALSE = UNKNOWN
UNKNOWN or UNKNOWN = UNKNOWN
UNKNOWN and TRUE = UNKNOWN
UNKNOWN and FALSE = FALSE
UNKNOWN and UNKNOWN = UNKNOWN
not UNKNOWN = UNKNOWN
UNKNOWN
is treated as FALSE
in WHERE
clauses.
P IS UNKNOWN
predicate evaluates to true if predicate P
equals to UNKNOWN
IS NULL
and IS NOT NULL
predicates:
e.g. find all valid tuples:1
2
3SELECT dick_owner
FROM dick
WHERE dick_length IS NOT NULL;
NULL
will be ignored in aggregate functions except COUNT()
.
Nested Subqueries
A subquery is a SELECT_FROM_WHERE
expression that is nested within another query. A common usage of subqueries is to perform tests for set membership and set comparisons.
e.g. Find all person who is both rich and stupid.1
2
3
4SELECT distinct names
FROM richie
WHERE names IN (SELECT names
FROM douchebag)
e.g. Find all person who is both rich and NOT stupid.1
2
3
4SELECT distinct names
FROM richie
WHERE names NOT IN (SELECT names
FROM douchebag)
e.g. Find the owner of the biggest dick for every city1
2
3
4
5
6SELECT dick_owner B, dick_length
FROM dick A
WHERE dick_length >= (SELECT max(dick_length)
FROM dick B
WHERE A.city = B.city)
ORDER BY dick_length
Some and all clause
$C[comp]$ SOME
$r\leftrightarrow \exist t\in r, C[comp]t$ holds, where $[comp]$ could be $<,>,\le,\ge,=,\neq.$
For example $6<$ SOME
$\{2,4,5,7\}=$ TRUE
.
$(=$
SOME
$)\equiv$IN
.
$C[comp]$ ALL
$r\leftrightarrow \forall t\in r, C[comp]t$ holds, where $[comp]$ could be $<,>,\le,\ge,=,\neq.$
For example $6<$ ALL
$\{5,7,9,10\}=$ FALSE
.
$(\neq$
ALL
$)\equiv$NOT IN
.
Test for empty relations
EXISTS
construct returns the value TRUE
if the argument subquery is non-empty.
EXISTS
$r\leftrightarrow r\neq \phi.$
Test for absence of Duplicate Tuples
UNIQUE
construct tests whether a subquery has any duplicates in its result.
e.g. Find all person who have only one child1
2
3
4
5SELECT names
FROM child as T
WHERE UNIQUE (SELECT names
FROM child AS P
WHERE T.names = P.names)
Views
Views provide a mechanism to hide certain data from the view of certain users.
To create a view we use:1
2CREATE VIEW <name_v> AS
SELECT ... FROM ... WHERE ...
To drop a view we use:1
DROP VIEW <name_v>
Benefits of using a view
- Easy to use.
- Security.
- Supports logical independent.
Derived Relations
WITH
clause allows views to be defined locally for a query rather than globally.
Modification of the database
Deletion
The Delete command in formal form:1
2DELETE FROM <TABLE>
WHERE <predicate>
Insertion
Lec 4 : Advanced SQL
SQL Data Types and schemas
1 | CREATE DOMAIN Dllars as numeric(12, 2) NOT NULL |