6.3 SQL (2)

Studying SQL would cost most time for undergraduates in the database course, and SELECT would take more than 90% of time. In this section, we will explore how to use SELECT clause in different contexts. Again, this book is not dedicated to discussing SQL and databases, so only basic syntaxes are covered.

Additional basic operations

The SELECT clause may contain arithmetic expressions involving +, -, *, and /. For example,

SELECT title, price * 0.9
FROM book;

Pattern matching can be performed on strings using the operator LIKE. We describe patterns by using two special characters:

  • Percent(%): matching any substring
  • Underscore(_): matching any character

For example, for a book's title Gone with the Wind, title LIKE 'Gone%' is true.

To simplify WHERE clauses that specify that a value be less than or equal to some value and greater than to equal to some other value, SQL provides a BETWEEN comparison operator.

SELECT title
FROM book
WHERE price BETWEEN 10 AND 50;

The ORDER BY clause causes the tuples in the result of a query to appear in sorted order. To list alphabetic order all authors from America, we write:

SELECT *
FROM author
WHERE country = 'America'
ORDER BY name;

Aggregate functions

Aggregate functions are functions that take a collection of values as input and return a single value as output. SQL offers five standard built-in aggregate functions:

  • Average: AVG
  • Minimum: MIN
  • Maximum: MAX
  • Total: SUM
  • Count: COUNT

Consider the query "find the average price of books written by Lu Xun". We can write the query as follows:

SELECT AVG(price)
FROM book
WHERE author_name = 'Lu Xun';

We often use COUNT to count the number of tuples in a relation. The notation for this function in SQL is COUNT(*). Thus, to find how many books written by Lu Xun, we write:

SELECT COUNT(*)
FROM book
WHERE author_name = 'Lu Xun';

The database system may give an awkward name to the result, and we can give a meaningful name to the attribute by using AS clause as follows:

SELECT COUNT(*) AS books_num
FROM book
WHERE author_name = 'Lu Xun';

We can use GROUP BY to apply the aggregate functions to a group of sets of tuples[1]. As an illustration, to find the average book price published by each author, we write this query as follows:

SELECT author_name, AVG(price)
FROM book
GROUP BY author_name;

Note that if GROUP BY is used, then the attributes shown in SELECT clause are either specified in GROUP BY or applied by an aggregate function. Thus, the following SQL is erroneous:

SELECT title, AVG(price)
FROM book
GROUP BY author_name;

Sometimes, it is useful to state a condition that applies to groups. SQL offers HAVING clause to achieve this query. For example, to find the author whose published books' average price is more than 25:

SELECT author_name
FROM book
GROUP BY author_name
HAVING AVG(price) > 25;

Nested queries

The IN connective tests for set membership[2]. As an illustration, consider the query "to find the books which were written by Chinese authors". We can divide this query into two sub queries

  • Find all Chinese authors, donated as A
  • To test if a book's author is a member of A.
SELECT *
FROM book
WHERE author IN (SELECT name FROM author WHERE country = 'China');

The IN operator can also be used on enumerate sets. For example, to find all authors from China and Japan, we can write the query:

SELECT name
FROM author
WHERE country IN ('China', 'Japan');

Queries on multiple relations

So far our example queries were on a single relation. Queries often need to access information from multiple relations[3]. For example, to retrieve the titles of books, along with their authors and countries, we can write:

SELECT title, author_name, country
FROM book, author
WHERE book.author_name = author.name;

[1] By default, the ORDER BY clause lists items in ascending order. To specify the sort order, we may specify DESC for descending order.

[2] The NOT IN connective tests for absence of set membership.

[3] Join expression is another way to express queries on multiple relations.