Week 2 - Functions
Week 2 - Functions
This week we'll start by reviewing and expanding on SQL queries and their clauses. We'll introduce the concept of functions, and look at scalar functions in SQL.
Here's what we're going to do today:
Query basics review
SELECT | What columns do we want? |
FROM | What table's it going to come from? |
WHERE | How do we filter it? |
ORDER BY | How do we sort it? |
SELECT
* | Asterisk selects all columns. |
column1, column2 | Comma-separating column names lets us specify multiple columns. |
column1 || column2 | Pipes concatenate our data. |
'string' | Single quotes let us use literal strings. |
AS 'Custom Column Name' | The 'AS' clause creates an alias. |
SELECT DISTINCT column1 | 'DISTINCT' eliminates duplicates. |
Filtering
The most common method of filtering our data is by using the WHERE
clause. WHERE uses our three types of operators to set our conditions for including data in our results.
We can use comparison operators (=
, <>
, >=
, etc.) Note that >, <, etc. can be used with letters as well as numbers.
We can use arithmetic operators (+
, -
, *
, /
).
We can use logical operators AND
, OR
, NOT
.
IN
lets us check against a list.
BETWEEN
lets us specify a range (for alphanumeric characters).
IS NULL
lets us see if a cell has no data (not even zero or an empty string).
Note that IS NULL
, IN
, and BETWEEN
can be combined with the NOT
operator to form IS NOT NULL
, NOT IN
, and NOT BETWEEN
conditions.
Unfortunately, however, NOT IN
and NOT BETWEEN
will also exclude NULL
values. These "false positives" make these last two conditions not super useful, so they're not often used.
Order of operations
BEDMAS: Brackets, Exponents, Division and Multiplication, Addition and Subtraction.
We can use brackets (parentheses) to affect the order of operations for our logical operators as well.
IS NULL
This one is pretty straightforward - IS NULL
and IS NOT NULL
are conditions we can add to the WHERE
clause to return results that are (or aren't) null.
LIKE and wildcards
LIKE
is a powerful search tool for finding strings that match a pattern.
The %
symbol matches zero or more characters.
The _
(underscore) symbol matches one character.
DISTINCT
The DISTINCT
clause, placed after SELECT
, means our query only returns rows where the data in the selected column(s) is unique.
We can select multiple rows, and DISTINCT
will return rows where the combination of column data is unique.
ORDER BY
ORDER BY
lets us sort our table based on a specified column. We can specify more than one column (or an alias).
We can use ASC
and DESC
to say whether we want our results in ascending or descending order.
A few more things about ORDER BY
ASC
and DESC
are specific to a column, so the following two statements produce different results:
SELECT * FROM invoice
ORDER BY customerId DESC, total
Ordered by descending customerId, ascending total.SELECT * FROM invoice
ORDER BY customerId, total DESC
Ordered by ascending customerId, descending total.The order of precedence is...
- special characters
- numbers
- capital letters
- lowercase letters
As we talked about, you can order according to any column (except when using DISTINCT
, in which case it can only be a column you've selected).
We are also able to select using an expression, or with the column number.
SELECT *
FROM customer
ORDER BY firstName || lastName
SELECT *
FROM customer
ORDER BY 3,2
Wait, isn't that just 2 different ways of writing ORDER BY firstName, lastName
?
Yep! SQL is pretty generous in terms of how to accomplish things. Which is not to say you should be writing things three different ways if you don't need to, just a) that you should be able to read alternate methods when you come across them, and b) you know that you have options if you're in a situation where one method works better than another.
Now let's take single-table queries a few steps further.
More Query Basics
Limiting results with LIMIT and OFFSET
LIMIT
is a clause that lets us declare how many rows we want to return. OFFSET
lets us declare at what row to begin returning rows.
We write our LIMIT clause after our ORDER BY. If we want to offset, we append the OFFSET clause to LIMIT.
Our syntax is:
LIMIT n [OFFSET n]
Example:
SELECT * FROM artist
ORDER BY artistId
LIMIT 5 OFFSET 5
The order we write in.
We'll discuss this in depth next week, but this week, just know: if you put your clauses (WHERE
, ORDER BY
, etc.) in the wrong order you'll get an error. The order of our clauses (the ones we've learned so far) is this:
WHERE
ORDER BY
LIMIT
The logic behind this: if WHERE
did not come first, then the database would have to do extra work, ordering or limiting rows that won't show up in the results table; if LIMIT
did not come last, then it would have to change the rows that were included in the limited set when the rows were re-ordered or filtered.
To review:
Queries start with SELECT
...
...followed by the names of the columns that we want in the results table (or *
for all columns)...
...followed by FROM
and the name of the table.
If you want only unique rows of data in your results table, write the DISTINCT
clause after the SELECT
command.
If you want to filter out some rows, you can define conditions in the WHERE
clause. These conditions are defined with logical, arithmetic, and/or comparision operators. If column data meets the condition(s), then the row is included in the results table.
If you want your results in a different order than they are in the database, we use ORDER BY
.
Now that we know all the major clauses we can add to a query, we can look at functions.
So far we've just used our queries to choose which rows and columns to show in our results tables. Using scalar functions, we'll be able to transform individual cells of data.
What is a function?
In computer programming in general, a function can be a block of code that either:
- performs a task, or
- returns modified data.
SQL has a more strict definition.
In SQL, code blocks that can perform a task are called 'procedures'. We'll talk about those another week.
In SQL, when we call something a function, it must return data.
Since we're learning SQL, from now on, when I call something a function, I'm using only definition b) from the above list .
In a different week, we'll also talk about how to create our own functions, but for now we're just looking at "built-in" functions.
Scalar functions
Scalar functions are functions that operate on a single value. Later, we'll learn about aggregate functions, which can do things like sum an entire column, but for now, the functions we're working with will both consume and return single pieces of data.
having magnitude, but not direction.
Arguments
Some functions, like DATE()
have a default value that goes into them. If you want to get the value of today's date, you can just write SELECT DATE()
For most functions, however, we need to pass in one or more pieces of input data so it can give us an output. These are called 'arguments', and we write them inside the parentheses following the name of the function. If there is more than one argument, we separate the values with commas.
name | arguments | ||
SUBSTR | ( | '7th & 8th character', 7, 2 | ) |
Rather than hardcoding these values, most often we use the column names to pass information into these functions. Remember that what we're doing in SQL is manipulating data. It's very important that you understand where data comes from, what's happening to it, and where it is ending up.
So far, we have simply been getting data from point A (the database) to point B (the results table), with some sorting and filtering along the way.
Now that we've started looking at functions, we've begun to manipulate data - changing it from its original form in the database, while it passes through our query and ends in the results table.
SUBSTR
The SUBSTR()
function takes 3 arguments - string, position, and length.
Scalar functions are nestable
I know what you're thinking, "What if I want to know what the last letters in their names are?"
SELECT
lastName,
firstName,
UPPER(SUBSTR(REVERSE(firstName), 1, 1)) ||
UPPER(SUBSTR(REVERSE(lastName), 1, 1))
AS "Last Initials"
FROM customer
Again, think of the journey of the data. The original data comes from the database column name (i.e. firstName). Then it is changed by the REVERSE()
function, then the SUBSTR()
function, and finally the UPPER()
function.
Step | Result |
---|---|
Original data | Simon |
REVERSE() | nomiS |
SUBSTR({string}, 1, 1) | n |
UPPER() | N |
There are a ton of these functions.
Scalar functions cheat sheet | |||
---|---|---|---|
Name | Example | Example result | Description |
In String | INSTR('Demodogs' , 'dog') | 5 | Returns the location of a substring in a string |
Length | LENGTH('Mornings are for coffee and contemplation') | 41 | Returns the number of characters |
Right trim | RTRIM(' MKULTRA ') | ' MKULTRA' | Trims trailing whitespace |
Left trim | LTRIM(' MKULTRA ') | 'MKULTRA ' | Trims initial whitespace |
Replace | REPLACE('High score: DUSTIN' , 'DUSTIN' , 'MADMAX') | High score: MADMAX | Replaces all of one string value with another. |
Lowercase | LOWER('RAINBOW ROOM') | rainbow room | Converts string to lowercase |
Uppercase | UPPER('Why are you keeping this curiosity door locked?') | WHY ARE YOU KEEPING THIS CURIOSITY DOOR LOCKED? | Converts string to uppercase |
Coalesce | COALESCE(NULL,'Will Byers') | Will Byers | Replaces null values with specified value. |