6.1 Introduction To Database

A database, is just a location to store data, and a database management system (DBMS) is a collection of interrelated data and a set of programs to access those data.

Data models

For most database applications, we don't have to care about how data is stored physically. Instead, we only need to understand it in a logical view[1], and it is mainly described as a data model.

A collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.

There are a number of different data models, but in this book, we only focus on relational model, which is used in a vast majority of current database system. The relational model uses a collection of tables to represent both data and the relationships among those data. Tables are also known relations. Probably everyone knows what a table is, and you see digital tables in spreadsheet software such as Excel, Google Docs.

Figure 6.1 Digital tables.

Each table has multiple columns, and each column has a unique name. Each row of the table represents one piece of information. In general, a row in a table represents a relationship among a set of values. The following presents a sample relational database comprising two tables: one (book) shows details of books, and the other (author) shows details of authors. Note that the order of rows is not important for a relation.

ISBNtitleauthor_nameprice
9780446365383Gone with the WindMargaret Mitchell7.99
9780743273565The Great GatsbyF. Scott Fitzgerald15.00
97803000288431587, a Year of No SignificanceRay Huang27.00
namebirthcountry
Margaret Mitchell1900-11-08America
Ray Huang1918-06-25America
F. Scott Fitzgerald1896-09-24America
Lu Xun1881-09-25China

A relational database consists of a collection of tables, each of which is assigned a unique name. For example, consider the book table above, which stores information about books. The table has four column headers: ISBN, title, author_name, and price. Each row of this table records information about a book, consisting of the book's ISBN, title, author_name, and price. In the relational model, the term relation is used to refer to a table, while the term tuple is used to refer to a row. Similarly, the term attribute refers to a column of a table.

Keys

To describe how a relation looks like, we use the concept schema so that we can investigate relations while ignoring the real data in them. The schema for book is

book(ISBN, title, author_name, price)

We must have a way to specify how tuples within a given relation are distinguished. This is expressed in terms of their attributes. No two tuples in a relation are allowed to have exactly the same values for all attributes.

[!TIP] TL;DR. Super keys can uniquely identify a tuple; candidate key is the minimal super key; primary key is the one chosen from candidate keys by programmers.

A super key is a set of one or more attributes, that taken collectively, allow us to identify uniquely a tuple in the relation. For example, the ISBN attribute of the relation book is sufficient to distinguish one book tuple from another. Thus, ISBN is super key. The author attribute, on the other hand, is not a super key, because several books might have the same name. Clearly, if K is a super key, so is any superset of K. We are often interested in super keys for which no proper subset is a super key. Such minimal super keys are called candidate keys.

Candidate keys can be more than one. For example, suppose that given an author, all books written by her have different titles. So {author, title} is also a candidate key of book. We use the term primary key to denote a candidate key that is chosen by the database designers. Primary key attributes are also underlined: book(ISBN, title, author_name, price).

Database systems

There are a large number of commercial database systems in use today. The major ones include: Oracle, Microsoft SQL Server, and SAP HANA. There are also many free database systems; widely used ones include MySQL, PostgreSQL, and the embedded database SQLite.

In this book, for the ease of installing, we use SQLite. Note that although there are the differences between different databases, and it is almost unlikely to use SQLite for Java web application in the production environment, the basic concepts and practices are the same. So you can easily switch to other alternative database systems without much trouble. Additionally, SQLite is a popular storage choice for Android and iOS applications in mobile devices.

Users can request information from the database using query languages. And the SQL query language is the standard one in relational databases. Individual implementations of SQL may differ in details, and most basic SQL syntaxes are the same.

[!NOTE] Many SQL syntaxes in SQLite differ from the standard one. You can refer to Quirks, Caveats, and Gotchas In SQLite for more information.


[1] A major purpose of a database system is to provide users with an abstract view of data.