Lecture 1
Introduction to database systems
Basic database concepts
A database stores data in operating system files.
A database management system (DBMS) is a
special software for managing large amount
of data stored in operating system files.
Disadvantages of the simplest DBMSs based files:
Problems in accessing data – users need special applications to operate on data stored in files.
Problems with integrity constraints – it is required to implement mechanisms which allow to ensure that the data meets some conditions.
Lack of concurrency control – one user can perform operations on data which are accessed by other
users at the same time.
Data inconsistency – without special mechanisms a database can be left in an inconsistent state (e.g.
some integrity constraints can not be preserved, data entered) after user modifications.
Lack of backup and restoring mechanisms –
database should posses mechanisms which allow to archive its content and restore backups in case of failure.
Contemporary database management systems provide the following features:
Higher level of abstractions – DBMS hides details connected with physical data representation and storage operations.
Efficient data access – DBMS provides diverse
mechanisms which allow to process data efficiently.
Data integrity – DBMS allows to define variety of integrity constraints and it has mechanisms to enforce these constraints.
Security – users can gain access to some (not to all) data through specific privileges or roles.
Concurrent access – DBMS provides very
sophisticated mechanisms to allow users work simultaneously on the same date.
Archiving and recovering mechanisms – DBMS is equipped with variety mechanisms to restore database from backup files in case of failure.
Architecture of database management system
The ANSI/SPARC architecture is the most known framework for database management systems.
Conceptual level
Internal level External level
User 1/
Application 1 ...
User 2/
Application 2
User n/
Application n
Stored data
The external level represents the data model dedicated to specific users and
client applications. In relational databases it involves user schemas, privileges, tables, views, etc.
The conceptual level defines the data model for the whole enterprise, not only to
particular users. In particular it contains the database catalog and its schema.
The internal level concerns physical data
storage.
The classical client-server architecture is based on the ANSI/SPARC model.
Users connect to the server through a
network and send their requests (e.g. SQL queries).
The server sends then the answer back to
the clients.
The main components of DBMS.
A query parser takes user queries to perform their syntactic and semantic analysis.
The parsed query is passed to a query optimizer which examines various execution plans for this query.
The plan with the lowest cost is presented to a query
evaluator which processes it and returns its result.
The query evaluator make use of a resource manager. It defines how records are arranged in files and allows access to desired sets of records.
A buffer manager is
responsible for moving pages from disk to memory buffers.
It cooperates with a storage manager to obtain data from disk.
A recovery manager is
responsible for preserving the effects of completed
transactions even if the system fails.
Storage (database files) Query parser
Query optimizer
Query evaluator
Resource manager
Buffer manager
Storage manager
Transaction manager Recovery
manager
User requests forms database transactions.
A transaction manager ensures atomicity and isolation of transactions.
Atomicity means that all operations of a
transaction must be
successfully completed;
otherwise the transaction is cancelled.
Isolation ensures that effects made by a
transaction are not visible to other transactions
Storage (database files) Query parser Query optimizer Query evaluator
Resource manager
Buffer manager
Storage manager
Transaction manager Recovery
manager
Relational database management systems (RDBMSs)
It was proposed in 1970 by Edgar Codd.
The model assumes that data are stored in relations which can be understood as two- dimensional tables with columns and rows.
The model also introduces the collection of operators (e.g. selection, projection, join) which act on relations and return also
relations.
Example
Assume that there is a need to store
information about departments and their
employees. The above information can be
kept in the following two relations (tables):
DEPARTMENTS
EMPLOYEES
The DEPT_ID column in both tables models the relationships between them.
The relationship is modelled by storing the values from a unique column (e.g.
DEPARTMENTS.DEPT_ID) of one table in a column of another table (e.g.
EMPLOYEES.DEPT_ID).
Relations can be considered as time-varying
sets of rows if the modifications operations are allowed on them. Hence the above two tables represent some states of the relations
EMPLOYEES and DEPARTMENTS. They are called
instances of these relations.
The concept of a schema is introduced to describe the structure of relations.
The schema involves the relation name and the names of its attributes (columns).
It specifies also data types for these attributes. For example, the previously considered relations can have the following schemas:
DEPARTMENTS(DEPT_ID:integer,DEPT_NAME:string, CITY:string,STREET:string,
POSTAL_CODE:string)
EMPLOYEES(EMP_ID:integer,FIRST_NAME:string, LAST_NAME:string,
SALARY:float,HIRE_DATE:date, DEPT_ID:integer)
The rows (tuples) of relations must confirm to their schemas. For example, the values of the DEPT_ID attribute of the DEPARTMENTS relation must be integers.
Schemas
Structured Query Language (SQL)
Each RDBMS provides a language to manipulate relations and their contents. The language which contemporary RDBMSs apply today is called SQL (Structured Query Language).
It has several dialects depending on the DBMS on which it is applied. SQL consists of the following:
Data Definition Language (DDL) – it is applied to create, modify, and remove database objects (e.g.
tables, indices, views). DDL mainly involves CREATE, ALTER, and DROP statements.
Data Manipulation Language (DML) – it is applied to retrieve, insert, update, and delete data from tables.
DML involves SELECT, INSERT, UPDATE, and DELETE statements.
Data Control Language (DCL) – it allows to set and remove database privileges. DCL involves GRANT, DENNY, and REVOKE statements.
Example 1.1: Sample SQL query.
The query below retrieves information (first name, last name and salary) about employees earning more than 4000.
SELECT FIRST_NAME, LAST_NAME, SALARY FROM EMPLOYEES
WHERE SALARY > 4000
The execution plan for this can involve the following steps:
examine all the rows of the EMPLOYEES relation which occurs in the FROM clause,
choose only the rows which satisfy the selection conditions occurring in the WHERE clause,
display the values of the attributes of those rows (attribute names are indicated in the SELECT
clause).
Main database objects
The following database objects are supported by contemporary RDBMSs:
Table – it is a collection of rows in which each row has the same number of fields (columns).
Constraint – it defines rules to enforce the integrity of a database.
View- it can be considered a virtual table because its rows are not explicitly stored in the database. If a user query addresses a view then its rows are obtained by executing the select
statement which is associated with this view.
Index – it is auxiliary (redundant) data structure that allows to speed up retrieval of data.
Procedure – it is a saved collection of SQL statements which perform specific tasks.
Function – it is a stored routine that must return a value.
Trigger – it is a special kind of the stored procedure which automatically runs whenever a specific event occurs in the database.
Examples of RDBMSs
Microsoft SQL Server
https://www.microsoft.com/sqlserver
Oracle Server
www.oracle.com/database
DB2 www.ibm.com/db2
Sybase www.sybase.com
MySQL http://www.mysql.com/
PostgreSQL http://www.postgresql.org/
Other database management systems
Object-oriented database systems
An object-oriented database management system
(OODBMS) is a database management system which is based on object-oriented data model.
This model introduces concepts such as complex
objects, classes, attributes, relationships, methods, typing, persistence, inheritance, polymorphism etc.
Object-oriented databases (object bases) consists of objects. The relationships between object are
modelled by pointer objects.
Object-oriented concepts
Objects are abstract entities which represent things or notions occurring in the real world.
For example, the following entities are objects:
an employee named John Smith,
a bank account with the number 5683487554867,
a car with VIN 678GH9056893.
Each object possess a unique internal identifier (OID), an external name (assigned by database designers) and a set of attributes. The state of an object is determined by the values stored in its attributes. Objects usually posses a set of methods (e.g. functions, procedures) which are applied to read and
modify their state.
Note that the concept of a primary key does not occur in an
object model. It is not necessary to add an additional attribute to an object which uniquely identifies it. In the object model the system automatically generates a unique internal identifier for each object. Internal identifiers are also used to model relations between objects. In the relational model a foreign key is applied to implement relationships between tables.
Classes
To capture the structure of objects the concept of classes is introduced.
A class is a container which holds information common to a specific group of objects.
The elements of the class are called invariants (e.g. attributes, methods).
An object assigned to a class is called an
instance of this class. All instances of a given class are equipped with its invariants.
The actual set of instances of the class is called
extent of this class.
Inheritance
An inheritance is a special type of relation between classes which introduces some hierarchy of classes.
One class C1 can be defined as a subclass of another class C2.
It means that instances of C1 will be equipped with all the variants of C2.
The class C1 can also stores additional invariants.
Multiple inheritance takes place if a class
inherits properties of at least two classes.
Example of classes hierarchy
Vehicle vin owneryear getYear()
getOwner() setOwner(...) displayDesc()
number_plateCar mark model
getNP() setNP(...) getMark() getModel()
displayDesc()
Bicycle
number_of_wheels type
getWheels() getType(...) displayDesc()
object object
object object object object
object object
i
Polymorphism
It is possible to equip an object with different methods which have the same names and signatures. Each such a method is the invariant of a specific class to which the object belongs. Polymorphism causes that sending a message to objects can result in different actions depending on the class of object which receives the message.
For instance, the class Vehicle contains a method
displayDesc() which display some information about its instances. The specialized classes Car and Bicycle also contain a method displayDesc() (each class posses own code of this method). When an object receives a message displayDesc() the system verifies to which class the object belongs. After detecting the object class it calls an appropriate method displayDesc().
Example of object-oriented database
The above class diagram (schema) introduces three classes:
PersonClass, EmpClass and DeptClass. Persistent instances of these classes can be accessed using their names Person,
Employee and Department. A Department object stores a collection (named employs) of references to the instances of EmpClass. An Employee object stores single reference (named worksIn) to the Department object.
employs
worksIn
EmpClass:Employee salary: float
hire_date: date DeptClass:Department
dept_name: string city: string
street: string postal_code: string
PersonClass:Person first_name: string last_name: string birth_date: date getFullName()
inherits
1 *
Each object possesses a unique internal identifier. Object identifiers will be
denoted by ij, j=1,2,...Three types of objects are used to model an object store for the above diagram: atomic, reference, and complex. They are modelled as the
following triples:
atomic object as <i, n, v> - an object has the internal identifier i, the external name n and the atomic value v.
reference object as <im, n, ip> - an object has the identifier im and the name n. The identifier ip is the value of this object and denotes a
reference (pointer) to another object.
complex object as <i, n, S> - an object has the identifier im and the name n. S denotes arbitrary set of objects.
i20 Department i21 dept_name “Marketing”
i22 city “Lodz”
i23 street “Piotrkowska 145”
i24 postal_code “90-024”
i25 employs i8 i26 employs i15
i8 Employee i9 first_name ”John”
i10 last_name ”Smith”
i11 birth_date 23-02-1984 i12 salary 6300
i13 hire_date 11-07-2013 i14 worksIn i20
i15 Employee i16 first_name ”John”
i17 last_name ”Smith”
i18 birth_date 23-02-1984 i19 salary 4800
i20 hire_date 12-10-2013 i21 worksIn i20
i3 EmpClass (invariants of EmpClass) i2 PersonClass
(invariants of PersonClass) i1 DeptClass
(invariants of DeptClass)
instance of instance of
instance of subclass of