Week 2 - Functions

This week we'll start by reviewing and expanding on SQL queries and their clauses. We'll introduce the concept of functions, and look at scalar functions in SQL.

Here's what we're going to do today:

Query basics review
More query basics
1. LIMIT and OFFSET
2. The order we write in
More single table query techniques
What is a function?
Scalar functions in SQL

Query basics review

`SELECT`	What columns* do we want?*
`FROM`	What table's it going to come from?
`WHERE`	How do we filter* it?*
`ORDER BY`	How do we sort* it?*

SELECT

`*`	Asterisk selects all columns.
`column1, column2`	Comma-separating column names lets us specify multiple columns.
`column1 \|\| column2`	Pipes concatenate our data.
`'string'`	Single quotes let us use literal strings.
`AS 'Custom Column Name'`	The 'AS' clause creates an alias.
`SELECT DISTINCT column1`	'DISTINCT' eliminates duplicates.

Filtering

The most common method of filtering our data is by using the WHERE clause. WHERE uses our three types of operators to set our conditions for including data in our results.

We can use comparison operators (=, <>, >=, etc.) Note that >, <, etc. can be used with letters as well as numbers.

We can use arithmetic operators (+, -, *, /).

We can use logical operators AND, OR, NOT.

IN lets us check against a list.

BETWEEN lets us specify a range (for alphanumeric characters).

IS NULL lets us see if a cell has no data (not even zero or an empty string).

Note that IS NULL, IN, and BETWEEN can be combined with the NOT operator to form IS NOT NULL, NOT IN, and NOT BETWEEN conditions.

Unfortunately, however, NOT IN and NOT BETWEEN will also exclude NULL values. These "false positives" make these last two conditions not super useful, so they're not often used.

Order of operations

BEDMAS: Brackets, Exponents, Division and Multiplication, Addition and Subtraction.

We can use brackets (parentheses) to affect the order of operations for our logical operators as well.

IS NULL

This one is pretty straightforward - IS NULL and IS NOT NULL are conditions we can add to the WHERE clause to return results that are (or aren't) null.

Remember that 0 is not null

LIKE and wildcards

LIKE is a powerful search tool for finding strings that match a pattern.

The % symbol matches zero or more characters.

The _ (underscore) symbol matches one character.

DISTINCT

The DISTINCT clause, placed after SELECT, means our query only returns rows where the data in the selected column(s) is unique.

We can select multiple rows, and DISTINCT will return rows where the combination of column data is unique.

ORDER BY

ORDER BY lets us sort our table based on a specified column. We can specify more than one column (or an alias).

We can use ASC and DESC to say whether we want our results in ascending or descending order.

A few more things about ORDER BY

ASC and DESC are specific to a column, so the following two statements produce different results:

SELECT * FROM invoice
ORDER BY customerId DESC, total

Ordered by descending customerId, ascending total.

SELECT * FROM invoice
ORDER BY customerId, total DESC

Ordered by ascending customerId, descending total.

The order of precedence is...

special characters
numbers
capital letters
lowercase letters

As you'd expect, this order is reversed when descending.

As we talked about, you can order according to any column (except when using DISTINCT, in which case it can only be a column you've selected).

We are also able to select using an expression, or with the column number.

SELECT *
FROM customer
ORDER BY firstName || lastName

SELECT *
FROM customer
ORDER BY 3,2

Wait, isn't that just 2 different ways of writing ORDER BY firstName, lastName?

Yep! SQL is pretty generous in terms of how to accomplish things. Which is not to say you should be writing things three different ways if you don't need to, just a) that you should be able to read alternate methods when you come across them, and b) you know that you have options if you're in a situation where one method works better than another.

Now let's take single-table queries a few steps further.

More Query Basics

Limiting results with LIMIT and OFFSET

LIMIT is a clause that lets us declare how many rows we want to return. OFFSET lets us declare at what row to begin returning rows.

We write our LIMIT clause after our ORDER BY. If we want to offset, we append the OFFSET clause to LIMIT.

Our syntax is:

LIMIT n [OFFSET n]

Example:

SELECT * FROM artist
    ORDER BY artistId
    LIMIT 5 OFFSET 5

The order we write in.

We'll discuss this in depth next week, but this week, just know: if you put your clauses (WHERE, ORDER BY, etc.) in the wrong order you'll get an error. The order of our clauses (the ones we've learned so far) is this:

WHERE
ORDER BY
LIMIT

The logic behind this: if WHERE did not come first, then the database would have to do extra work, ordering or limiting rows that won't show up in the results table; if LIMIT did not come last, then it would have to change the rows that were included in the limited set when the rows were re-ordered or filtered.

To review:

Queries start with SELECT...

...followed by the names of the columns that we want in the results table (or * for all columns)...

...followed by FROM and the name of the table.

If you want only unique rows of data in your results table, write the DISTINCT clause after the SELECT command.

If you want to filter out some rows, you can define conditions in the WHERE clause. These conditions are defined with logical, arithmetic, and/or comparision operators. If column data meets the condition(s), then the row is included in the results table.

If you want your results in a different order than they are in the database, we use ORDER BY.

Now that we know all the major clauses we can add to a query, we can look at functions.

So far we've just used our queries to choose which rows and columns to show in our results tables. Using scalar functions, we'll be able to transform individual cells of data.

What is a function?

In computer programming in general, a function can be a block of code that either:

performs a task, or
returns modified data.

SQL has a more strict definition.

In SQL, code blocks that can perform a task are called 'procedures'. We'll talk about those another week.

In SQL, when we call something a function, it must return data.

Since we're learning SQL, from now on, when I call something a function, I'm using only definition b) from the above list in the previous slide.

In a different week, we'll also talk about how to create our own functions, but for now we're just looking at "built-in" functions.

Scalar functions

Scalar functions are functions that operate on a single value. Later, we'll learn about aggregate functions, which can do things like sum an entire column, but for now, the functions we're working with will both consume and return single pieces of data.

In mathematics, scalar is defined as having magnitude, but not direction.

Arguments

Some functions, like DATE() have a default value that goes into them. If you want to get the value of today's date, you can just write SELECT DATE()

For most functions, however, we need to pass in one or more pieces of input data so it can give us an output. These are called 'arguments', and we write them inside the parentheses following the name of the function. If there is more than one argument, we separate the values with commas.

name		arguments
`SUBSTR`	`(`	`'7th & 8th character', 7, 2`	`)`

What do you think the result of the above function would be? What do the second and third arguments represent?

Rather than hardcoding these values, most often we use the column names to pass information into these functions. Remember that what we're doing in SQL is manipulating data. It's very important that you understand where data comes from, what's happening to it, and where it is ending up.

So far, we have simply been getting data from point A (the database) to point B (the results table), with some sorting and filtering along the way.

Now that we've started looking at functions, we've begun to manipulate data - changing it from its original form in the database, while it passes through our query and ends in the results table.

SUBSTR

The SUBSTR() function takes 3 arguments - string, position, and length.

Scalar functions are nestable

I know what you're thinking, "What if I want to know what the last letters in their names are?"

SELECT 
    lastName, 
    firstName, 
    UPPER(SUBSTR(REVERSE(firstName), 1, 1)) || 
        UPPER(SUBSTR(REVERSE(lastName), 1, 1)) 
    AS "Last Initials"
FROM customer

Again, think of the journey of the data. The original data comes from the database column name (i.e. firstName). Then it is changed by the REVERSE() function, then the SUBSTR() function, and finally the UPPER() function.

Step	Result
`Original data`	Simon
`REVERSE()`	nomiS
`SUBSTR({string}, 1, 1)`	n
`UPPER()`	N

See the notes for a handy cheat sheet of (nearly) all the scalar functions.

There are a ton of these functions.

Scalar functions cheat sheet
Name	Example	Example result	Description
In String	`INSTR('Demodogs' , 'dog')`	5	Returns the location of a substring in a string
Length	`LENGTH('Mornings are for coffee and contemplation')`	41	Returns the number of characters
Right trim	`RTRIM(' MKULTRA ')`	' MKULTRA'	Trims trailing whitespace
Left trim	`LTRIM(' MKULTRA ')`	'MKULTRA '	Trims initial whitespace
Replace	`REPLACE('High score: DUSTIN' , 'DUSTIN' , 'MADMAX')`	High score: MADMAX	Replaces all of one string value with another.
Lowercase	`LOWER('RAINBOW ROOM')`	rainbow room	Converts string to lowercase
Uppercase	`UPPER('Why are you keeping this curiosity door locked?')`	WHY ARE YOU KEEPING THIS CURIOSITY DOOR LOCKED?	Converts string to uppercase
Coalesce	`COALESCE(NULL,'Will Byers')`	Will Byers	Replaces null values with specified value.