Grouping Results (GROUP BY Clause)
The above summary queries are similar to the totals at the bottom of a report. They condense
all the detailed data in the report into a single summary row of data. However, it is
often useful to have subtotals in reports. We can use the GROUP BY clause of the
SELECT statement to do this. A query that includes the GROUP BY clause is called a
grouped query, because it groups the data from the SELECT table(s) and produces a
single summary row for each group. The columns named in the GROUP BY clause are
called the grouping columns. The ISO standard requires the SELECT clause and the
GROUP BY clause to be closely integrated. When GROUP BY is used, each item in the
SELECT list must be single-valued per group. Further, the SELECT clause may contain
only:
n column names;
n aggregate functions;
n constants;
n an expression involving combinations of the above.
All column names in the SELECT list must appear in the GROUP BY clause unless the
name is used only in an aggregate function. The contrary is not true: there may be column
names in the GROUP BY clause that do not appear in the SELECT list. When the WHERE
clause is used with GROUP BY, the WHERE clause is applied first, then groups are
formed from the remaining rows that satisfy the search condition.
The ISO standard considers two nulls to be equal for purposes of the GROUP BY
clause. If two rows have nulls in the same grouping columns and identical values in all the
non-null grouping columns, they are combined into the same group.
Example 5.17 Use of GROUP BY
Find the number of staff working in each branch and the sum of their salaries.
SELECT branchNo, COUNT(staffNo) AS myCount, SUM(salary) AS mySum
FROM Staff
GROUP BY branchNo
ORDER BY branchNo;
It is not necessary to include the column names staffNo and salary in the GROUP BY list
because they appear only in the SELECT list within aggregate functions. On the other
hand, branchNo is not associated with an aggregate function and so must appear in the
GROUP BY list. The result table is shown in Table 5.17.
Conceptually, SQL performs the query as follows:
(1) SQL divides the staff into groups according to their respective branch numbers.
Within each group, all staff have the same branch number. In this example, we get
three groups:
(2) For each group, SQL computes the number of staff members and calculates the
sum of the values in the salary column to get the total of their salaries. SQL generates
a single summary row in the query result for each group.
(3) Finally, the result is sorted in ascending order of branch number, branchNo.
The SQL standard allows the SELECT list to contain nested queries (see Section 5.3.5).
Therefore, we could also express the above query as:
SELECT branchNo, (SELECT COUNT(staffNo) AS myCount
FROM Staff s
WHERE s.branchNo = b.branchNo),
(SELECT SUM(salary) AS mySum
FROM Staff s
WHERE s.branchNo = b.branchNo)
FROM Branch b
ORDER BY branchNo;
With this version of the query, however, the two aggregate values are produced for each
branch office in Branch, in some cases possibly with zero values.
Restricting groupings (HAVING clause)
The HAVING clause is designed for use with the GROUP BY clause to restrict the groups
that appear in the final result table. Although similar in syntax, HAVING and WHERE
serve different purposes. The WHERE clause filters individual rows going into the final
result table, whereas HAVING filters groups going into the final result table. The ISO
standard requires that column names used in the HAVING clause must also appear in the
GROUP BY list or be contained within an aggregate function. In practice, the search condition
in the HAVING clause always includes at least one aggregate function, otherwise
the search condition could be moved to the WHERE clause and applied to individual rows.
(Remember that aggregate functions cannot be used in the WHERE clause.)
The HAVING clause is not a necessary part of SQL – any query expressed using a
HAVING clause can always be rewritten without the HAVING clause.
Example 5.18 Use of HAVING
For each branch office with more than one member of staff, find the number of staff
working in each branch and the sum of their salaries.
SELECT branchNo, COUNT(staffNo) AS myCount, SUM(salary) AS mySum
FROM Staff
GROUP BY branchNo
HAVING COUNT(staffNo) > 1
ORDER BY branchNo;
This is similar to the previous example with the additional restriction that we want to
consider only those groups (that is, branches) with more than one member of staff. This
restriction applies to the groups and so the HAVING clause is used. The result table is
shown in Table 5.18.
all the detailed data in the report into a single summary row of data. However, it is
often useful to have subtotals in reports. We can use the GROUP BY clause of the
SELECT statement to do this. A query that includes the GROUP BY clause is called a
grouped query, because it groups the data from the SELECT table(s) and produces a
single summary row for each group. The columns named in the GROUP BY clause are
called the grouping columns. The ISO standard requires the SELECT clause and the
GROUP BY clause to be closely integrated. When GROUP BY is used, each item in the
SELECT list must be single-valued per group. Further, the SELECT clause may contain
only:
n column names;
n aggregate functions;
n constants;
n an expression involving combinations of the above.
All column names in the SELECT list must appear in the GROUP BY clause unless the
name is used only in an aggregate function. The contrary is not true: there may be column
names in the GROUP BY clause that do not appear in the SELECT list. When the WHERE
clause is used with GROUP BY, the WHERE clause is applied first, then groups are
formed from the remaining rows that satisfy the search condition.
The ISO standard considers two nulls to be equal for purposes of the GROUP BY
clause. If two rows have nulls in the same grouping columns and identical values in all the
non-null grouping columns, they are combined into the same group.
Example 5.17 Use of GROUP BY
Find the number of staff working in each branch and the sum of their salaries.
SELECT branchNo, COUNT(staffNo) AS myCount, SUM(salary) AS mySum
FROM Staff
GROUP BY branchNo
ORDER BY branchNo;
It is not necessary to include the column names staffNo and salary in the GROUP BY list
because they appear only in the SELECT list within aggregate functions. On the other
hand, branchNo is not associated with an aggregate function and so must appear in the
GROUP BY list. The result table is shown in Table 5.17.
Conceptually, SQL performs the query as follows:
(1) SQL divides the staff into groups according to their respective branch numbers.
Within each group, all staff have the same branch number. In this example, we get
three groups:
(2) For each group, SQL computes the number of staff members and calculates the
sum of the values in the salary column to get the total of their salaries. SQL generates
a single summary row in the query result for each group.
(3) Finally, the result is sorted in ascending order of branch number, branchNo.
The SQL standard allows the SELECT list to contain nested queries (see Section 5.3.5).
Therefore, we could also express the above query as:
SELECT branchNo, (SELECT COUNT(staffNo) AS myCount
FROM Staff s
WHERE s.branchNo = b.branchNo),
(SELECT SUM(salary) AS mySum
FROM Staff s
WHERE s.branchNo = b.branchNo)
FROM Branch b
ORDER BY branchNo;
With this version of the query, however, the two aggregate values are produced for each
branch office in Branch, in some cases possibly with zero values.
Restricting groupings (HAVING clause)
The HAVING clause is designed for use with the GROUP BY clause to restrict the groups
that appear in the final result table. Although similar in syntax, HAVING and WHERE
serve different purposes. The WHERE clause filters individual rows going into the final
result table, whereas HAVING filters groups going into the final result table. The ISO
standard requires that column names used in the HAVING clause must also appear in the
GROUP BY list or be contained within an aggregate function. In practice, the search condition
in the HAVING clause always includes at least one aggregate function, otherwise
the search condition could be moved to the WHERE clause and applied to individual rows.
(Remember that aggregate functions cannot be used in the WHERE clause.)
The HAVING clause is not a necessary part of SQL – any query expressed using a
HAVING clause can always be rewritten without the HAVING clause.
Example 5.18 Use of HAVING
For each branch office with more than one member of staff, find the number of staff
working in each branch and the sum of their salaries.
SELECT branchNo, COUNT(staffNo) AS myCount, SUM(salary) AS mySum
FROM Staff
GROUP BY branchNo
HAVING COUNT(staffNo) > 1
ORDER BY branchNo;
This is similar to the previous example with the additional restriction that we want to
consider only those groups (that is, branches) with more than one member of staff. This
restriction applies to the groups and so the HAVING clause is used. The result table is
shown in Table 5.18.
Grouping Results (GROUP BY Clause)
Reviewed by Shopping Sale
on
04:02
Rating:
No comments: