Aggregate functions
Overview
COUNT, SUM, AVG, MIN, and MAX are aggregate functions that can be used with or without the GROUP BY and HAVING clauses. When used without GROUP BY or HAVING, these functions perform the action that they are named after. For example, let’s look at our database from the joins page:
| id | description | customer_id | value | 
|---|---|---|---|
| 1 | Water bottle | 1 | 15.00 | 
| 2 | Key chain | 1 | 7.50 | 
| 3 | Computer | 3 | 2000.00 | 
| 4 | Thumb drive | 3 | 25.00 | 
| 5 | Notebook | 4 | 9.00 | 
| 6 | Shampoo | 5.00 | |
| 7 | Paper | 4.00 | 
| id | first_name | last_name | |
|---|---|---|---|
| 1 | Natalie | Wright | |
| 2 | Ana | Sousa | |
| 3 | Ben | Schwartz | |
| 4 | Chen | Xi | |
| 5 | Frank | Zhang | |
| 6 | Tianchi | Liu | |
| 7 | Jake | Jons | 
To count the number of rows in a table, for example, the orders table, you could run the following query.
SELECT COUNT(*) FROM orders;| COUNT(*) | 
|---|
| 7 | 
As you can see, by default the column header for the resulting row is COUNT(*). To rename the column, you can create an alias.
SELECT COUNT(*) AS 'number of results' FROM orders;| number of results | 
|---|
| 7 | 
COUNT only returns the number of rows returned by the query. If the query only returns 4 rows, 4 will be the result of COUNT.
SELECT COUNT(*) AS 'number of results' FROM orders WHERE value > 10;| number of results | 
|---|
| 3 | 
In COUNT(*), the * is a wildcard that returns all columns from the selected table. In our examples with the orders table, this is the same as running the following query:
SELECT id, description, customer_id, value FROM orders;You can, of course, selected a specific column, however, if there are any NULL values for the selected column, they will not be counted.
SELECT COUNT(customer_id) FROM orders;| COUNT(customer_id) | 
|---|
| 5 | 
The rest of the functions work the same way. For example, SUM returns the sum of all of the values in the selected rows.
SELECT SUM(value) AS 'our sum' FROM orders WHERE value > 10;| our sum | 
|---|
| 2040.0 | 
MIN, MAX, and AVG return the minimum, maximum, and average, respectively.
SELECT MIN(value) AS minimum, MAX(value) AS maximum, AVG(value) AS avg FROM orders WHERE value > 10;| minimum | maximum | avg | 
|---|---|---|
| 15.0 | 2000.0 | 680.0 | 
GROUP BY
The capabilities of the aggregate functions are not realized until used in conjunction with the GROUP BY clause. The GROUP BY clause goes after the WHERE clause (if WHERE clause is present). GROUP BY forces the SQL engine to calculate the aggregate function after the data has been grouped by one or more columns. For example, both of the following queries work. The first calculates the sum of the value of all orders by the category of the order. The second does the same thing, but excludes all rows where value ⇐ 10 before performing the operations.
SELECT category, SUM(value) FROM orders GROUP BY category;| category|SUM(value) | 
|---|
| beauty products|5.0 | 
| electronics|2025.0 | 
| misc|22.5 | 
| office|13.0 | 
SELECT category, SUM(value) FROM orders WHERE value > 10 GROUP BY category;| category | SUM(value) | 
|---|---|
| electronics | 2025.0 | 
| misc | 15.0 | 
This is powerful when you want to calculate values for many categories. For example, if we wanted to count the number of orders for each category, we could do so with the following query.
SELECT category, COUNT(*) FROM orders GROUP BY category;| category | COUNT(*) | 
|---|---|
| beauty products | 1 | 
| electronics | 2 | 
| misc | 2 | 
| office | 2 | 
Or, you could get all of the statistics when grouped by category.
SELECT category, COUNT(*), SUM(value), MIN(value), MAX(value), AVG(value) FROM orders GROUP BY category;| category | COUNT(*) | SUM(value) | MIN(value) | MAX(value) | AVG(value) | 
|---|---|---|---|---|---|
| beauty products | 1 | 5.0 | 5.0 | 5.0 | 5.0 | 
| electronics | 2 | 2025.0 | 25.0 | 2000.0 | 1012.5 | 
| misc | 2 | 22.5 | 7.5 | 15.0 | 11.25 | 
| office | 2 | 13.0 | 4.0 | 9.0 | 6.5 | 
HAVING
HAVING adds one more tool to the tool chest. HAVING is similar to the WHERE clause but instead of being applied to every row before grouping, it is applied on the aggregated data. A single query can have both a WHERE clause and a HAVING clause. The WHERE clause will filter the data before aggregation, and the HAVING clause will filter the data after aggregation. The following is an example that tries to demonstrate this slightly subtle difference.
In this example, the SUM of the value is calculated for each category, and afterwards, only results are included where the SUM of the value is greater than 10.
SELECT
	category,
	SUM(value)
FROM
	orders
GROUP BY
	category
HAVING
	value > 10;The following are also equivalent:
SELECT
	category,
	SUM(value)
FROM
	orders
GROUP BY
	category
HAVING
	SUM(value) > 10;SELECT
	category,
	SUM(value) AS 's_value'
FROM
	orders
GROUP BY
	category
HAVING
	s_value > 10;| category | sum(value) | 
|---|---|
| electronics | 2025.0 | 
| misc | 22.5 | 
In this example, all rows where value ⇐ 10 are excluded, then the remaining rows are grouped by category, and the SUM of value is calculated for the groups.
SELECT
	category,
	SUM(value)
FROM
	orders
WHERE
	value > 10
GROUP BY
	category;| category | sum(value) | 
|---|---|
| electronics | 2025.0 | 
| misc | 15.0 | 
Examples
All of the following examples are taken from the chinook database found at /anvil/projects/tdm/data/chinook/Chinook_Sqlite.sqlite
Find the average length (in seconds) of the tracks with genre Pop.
Click to see solution
SELECT
	AVG(Milliseconds / 1000.0) AS avg
FROM
	Track
WHERE
	genreId = 9;| avg | 
|---|
| 229.0341 | 
Figure out what the longest track (in seconds) is for the "Bossa Nova" genre.
Click to see solution
SELECT
	GenreId
FROM
	Genre
WHERE
	Name = 'Bossa Nova';| GenreId | 
|---|
| 11 | 
SELECT
	*,
	MAX(Milliseconds / 1000.0) AS Seconds
FROM
	Track
WHERE
	genreId = 11;| TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice | Seconds | 
|---|---|---|---|---|---|---|---|---|---|
| 646 | Samba Da Bênção | 52 | 1 | 11 | NA | 409965 | 13490008 | 0.99 | 409.965 | 
Get the average price per hour for Bossa Nova music (genreId of 11).
Click to see solution
SELECT
	AVG(UnitPrice / (Milliseconds / 1000.0 / 3600)) AS 'Price per Hour'
FROM
	Track
WHERE
	genreId = 11;| Price per Hour | 
|---|
| 17.7131539609759 | 
Get the average time (in seconds) for tracks by genre.
Click to see solution
SELECT
	genreId,
	AVG(Milliseconds / 1000.0) AS 'Average seconds per track'
FROM
	Track
GROUP BY
	genreId;| GenreId | Average seconds per track | 
|---|---|
| 1 | 283.9100 | 
| 2 | 291.7554 | 
| 3 | 309.7494 | 
| 4 | 234.3538 | 
| 5 | 134.6435 | 
| 6 | 270.3598 | 
| 7 | 232.8593 | 
| 8 | 247.1778 | 
| 9 | 229.0341 | 
| 10 | 244.3709 | 
You can take this 1 step further by getting the genre names. This requires using an inner join, however.
SELECT
	g.Name,
	track_time. 'Average seconds per track'
FROM
	Genre AS g
	INNER JOIN (
		SELECT
			genreId,
			AVG(Milliseconds / 1000.0) AS 'Average seconds per track'
		FROM
			Track
		GROUP BY
			genreId) AS track_time ON g.GenreId = track_time.GenreId
ORDER BY
	track_time. 'Average seconds per track' DESC;| Name | Average seconds per track | 
|---|---|
| Sci Fi & Fantasy | 2911.7830 | 
| Science Fiction | 2625.5491 | 
| Drama | 2575.2838 | 
| TV Shows | 2145.0410 | 
| Comedy | 1585.2637 | 
| Metal | 309.7494 | 
| Electronica/Dance | 302.9858 | 
| Heavy Metal | 297.4529 | 
| Classical | 293.8676 | 
| Jazz | 291.7554 | 
Figure out the average price per track for each genre.
Click to see solution
SELECT
	genreId,
	AVG(UnitPrice) AS 'Average price per track'
FROM
	Track
GROUP BY
	genreId;| GenreId | Average price per track | 
|---|---|
| 1 | 0.99 | 
| 2 | 0.99 | 
| 3 | 0.99 | 
| 4 | 0.99 | 
| 5 | 0.99 | 
| 6 | 0.99 | 
| 7 | 0.99 | 
| 8 | 0.99 | 
| 9 | 0.99 | 
| 10 | 0.99 | 
What is the average number of tracks per album?
Click to see solution
SELECT
	AVG(trackCount)
FROM (
	SELECT
		COUNT(*) AS trackCount
	FROM
		Track
	GROUP BY
		albumId) AS track_count;| AVG(trackCount) | 
|---|
| 10.0951 | 
Figure out the average number of tracks per album genre.
Click to see solution
SELECT
	genreId,
	AVG(trackCount)
FROM (
	SELECT
		genreId,
		COUNT(*) AS trackCount
	FROM
		Track
	GROUP BY
		albumId) AS track_count
GROUP BY
	genreId;| genreId | AVG(trackCount) | 
|---|---|
| 1 | 11.41379 | 
| 2 | 10.00000 | 
| 3 | 10.90625 | 
| 4 | 14.43478 | 
| 5 | 12.00000 | 
| 6 | 13.85714 | 
| 7 | 14.81579 | 
| 8 | 15.00000 | 
| 9 | 16.00000 | 
| 10 | 10.75000 | 
Of course, we can add the names of genres in, using joins.
SELECT
	Name,
	avg_track_count. 'Average Track Count'
FROM
	Genre AS g
	INNER JOIN (
		SELECT
			genreId,
			AVG(trackCount) AS 'Average Track Count'
		FROM (
			SELECT
				genreId,
				COUNT(*) AS trackCount
			FROM
				Track
			GROUP BY
				albumId) AS track_count
		GROUP BY
			genreId) AS avg_track_count ON g.GenreId = avg_track_count.genreId;| Name | Average Track Count | 
|---|---|
| Rock | 11.41379 | 
| Jazz | 10.00000 | 
| Metal | 10.90625 | 
| Alternative & Punk | 14.43478 | 
| Rock And Roll | 12.00000 | 
| Blues | 13.85714 | 
| Latin | 14.81579 | 
| Reggae | 15.00000 | 
| Pop | 16.00000 | 
| Soundtrack | 10.75000 |