Week 9 - Views & Custom Functions

This week we'll cover creating views and custom functions in MySQL

Saving (things that aren't data) in the database

For the next few weeks, we'll be looking at four things that you can save to your database that aren't tables, or rows or columns or data values.

Instead, they are custom tools for (ideally) making your database safer, simpler and easier to use.

This week we're looking at views & custom functions.

Views are essentially saved results tables that you can query.

Custom functions are just like the built-in scalar functions, like LOWER(), SUBSTR(), or ROUND(), except we get to define how the data is processed!

Next week, we'll look at triggers.

Triggers are things that automatically happen before or after data is changed in a table.

The week after, we'll look at stored procedures, which are programmed subroutines - real procedural programming!

Stored procedures are saved "scripts" that can run DML commands (CREATE, INSERT, UPDATE, DELETE, etc.) based on logic we create.

Views

What is a view?

Views are "virtual tables" - a saved query that returns a results table, which can then be used as if it were an existing table in the database.

You've probably noticed at this point that good database design requires splitting your data up into numerous tables, often requiring joins on three or more tables at a time. If you were required to write out these complex joins over and over again, that would be a lot of work!

That's where views come in - write your query once, and save it. Now you can not only easily fetch that data, but you can query the results table as if it were an existing database table.

Creating a VIEW

The syntax for creating a view is very simple:

CREATE VIEW {viewName} AS {query}

Let's create a view in our library table:

CREATE VIEW americanAuthors AS
SELECT * FROM authors
WHERE country IN ('USA', 'U.S.');

Then we can query the view in the same way we'd write a select statement, only using the view in place of a table:

SELECT * FROM americanAuthors
WHERE SUBSTR(first_name,1,1) = 'D';

So this is a new database table, right?

NOPE

A view is not a database table. A view is just a shortcut - a shortcut to running a query and getting back the results table.

Every time you interact with the view, you are running the saved query and getting fresh results. Views don't represent any stored data, just a stored query.

When to use views

You find yourself writing the same SELECT statements repeatedly.
You find yourself writing the same complex SELECT statements (i.e. with JOINs or complex WHERE clauses) occasionally.
You want to grant a user access to some of the columns in a table, but not all.

When not to use views

When you think you'll need to alter the table structure in the future, and you're worried you'll forget to update the views to match.

Do views impact performance?

Nope, views don't have an impact on performance, either good or bad. They're just stored queries. You can't index them, and they don't partition your tables. On the other hand, they don't eat up space or memory either, they're essentially no different than if you wrote out the queries long-hand.

The only consideration with views is whether they can make the database easier to use.

Can I use UPDATE, INSERT or DELETE with views?

You can use UPDATE, INSERT, and DELETE with some views.

There is a long list Opens in a new window of qualifications for a view that can can accept DML statements. What it boils down to is this: for a view to be "updateable", there must be a direct relationship between the rows in the view and the rows in the underlying table. This means:

No DISTINCT
No aggregate functions
No GROUP BY or HAVING
No outer joins

You can also only use DML on one of the underlying tables if your view contains an inner join.

And, of course all the constraints of the underlying table still apply.

Should I use UPDATE, INSERT or DELETE with views?

I'd argue that you should not. Whatever possible gains you might make in syntactical simplicity are outweighed by the risks of adding abstraction to your data manipulations.

Simpler to write? Maybe. Harder to keep safe and organized? Usually, yes.

How do I replace, alter or drop my view?

The syntax for replacing, altering and dropping views probably won't shock you...

Deleting a view:

DROP VIEW {viewName};

Altering a view:

ALTER VIEW {viewName}
AS {selectStatement};

Replacing a view (maybe slightly different than you'd expect):

CREATE OR REPLACE VIEW {viewName}
AS {selectStatement};

Running Scripts instead of statements

So far, the things we've been doing are SQL statements - telling our database engines to do one thing.

Today we're going to start talking about SQL scripts - telling our database engines to do multiple things.

This means we'll need to learn about delimiters, but first, we need to learn how to run scripts in DBeaver. Don't worry, it's just a different button!

You can highlight some code to run as a script, or you can press the 'run script' button, and DBeaver will execute all of the statements in your script (.sql document).

From now on, when you tell me "my code isn't working", the first thing I'm gonna ask is "did you run it as a script?"

What is a delimiter?

When we write in english, we write in sentences and paragraphs.

If you were to just write one sentence, you wouldn't really need something to say where the sentence stops, because there's only one sentence - it just ends

However, if you're writing a paragraph with multiple sentences, you need periods. This is how the reader knows where one sentence ends, and the next begins.

Periods are the default, but there are other ways to end a sentence, aren't there? There are exclamation points! And sometimes question marks? Or even ellipses...

In an SQL script, we have a default way of delimiting statements (or, in other words, showing where one statement ends and the next begins). We have the semicolon (;) as our default delimiter.

Today we'll be talking to the database about commands we want it to run later. These commands will have semicolons in them. However, when the database sees a semicolon, it will want to stop reading!

What we do to tackle this is to use the command DELIMITER to define a new delimiter symbol. Then, once we're done telling it about the commands we want it to run later, we'll use the command DELIMITER again to set the delimiter back to the default.

For example:

-- Set the delimiter to something 
-- other than a semicolon, in this case
-- two backslashes
DELIMITER //
CREATE FUNCTION capitalize (
  word VARCHAR(255)
)
RETURNS VARCHAR(255)
DETERMINISTIC
BEGIN
  -- Here's the code we're saving in the database
  -- that it will _eventually_ run
  -- but we don't want interpreted as code
  -- right now.
  DECLARE outputWord VARCHAR(255); 
  SET outputWord = CONCAT(UPPER(SUBSTR(word, 1, 1)), SUBSTR(word, 2));
  RETURN outputWord;
END; //
-- Now that we've finished our
-- CREATE FUNCTION statement, we can
-- end the statement with //, and
-- set the delimiter back 
-- to being a semicolon.
DELIMITER ;

Note the space before the semicolon when we set it back to the default!

DELIMITER ;

The space before the semicolon is very important - your DELIMITER statement won't work without it.

// This is WRONG and will be a hard thing to spot!
DELIMITER;

What is a function?

Very broadly, in programming a function is a block of code that does a job for us.

As you know by this point, SQL has lots of functions that are already available. They all have one thing in common - they return a single value (a single string, or number, or date, or time, etc.)

SQL has built-in scalar functions, that return one value, based on, at most, one row of data. Think of LOWER(), which we use to transform a string into all lower-case letters.

SQL also has built-in aggregate functions, which can take in a whole column of data. Think of SUM(), which adds up all the numbers in a column.

What is a custom function?

SQL also allows us to create our own functions. These functions can only be scalar functions. We get to decide what values (if any) can go into the function, and how they get transformed before they come out the other side, a.k.a get returned.

The syntax for creating a function

DELIMITER //
CREATE FUNCTION {functionName} ( 
  {parameterName} {parameterDataType} 
)
RETURNS {outputDataType}
BEGIN
-- Note: in this example, we're just adding five 
-- to whatever the input number is, to keep things 
-- simple. Obviously, you can do a lot more to 
-- transform the data between the "BEGIN" and "END", 
-- but we'll get to that later.
   RETURN {parameterName} + 5;
END; //
DELIMITER ;

Okay, that's a lot! Let's break it down.

What is a parameter?

A parameter is a variable (a named placeholder) that we use to represent the value that will get input into the function.

There are only two hard things in Computer Science: cache invalidation and naming things.
Phil Karlton

It's important to give your variables, including your parameters, good names. Good names can be hard to come up with, since you want people to be able to recognize your variables based on their names. You want them to be easy to type, so that you're not creating errors with typos all the time. You want them to be broad enough to describe all the data that could go in your variable, but not vague enough so that they could describe other variables as well.

There isn't a great system for naming stuff. Maybe you'll be lucky enough to be on a team that has a naming convention, and you can blame your frustrations on them. Anyway, just do your best, and know that this is something that even great developers chew their lips about.

Let's look at one of SQL's built-in functions:

SELECT UPPER('Simon');
-- Takes the input parameter, 
-- and outputs 'SIMON'

And of course, we can use the values from rows as our input paramters, with one value output for each row returned by the SELECT statement:

SELECT UPPER(first_name) FROM authors;

The input parameters in your custom functions are the placeholders for the values that get input between the parentheses when your function gets used in a SELECT statement. If your custom functions gets defined like this...

DELIMITER //
CREATE FUNCTION whoseWorld ( 
  personsName VARCHAR(255) 
)
RETURNS VARCHAR(263)
BEGIN
   RETURN CONCAT(personsName, '''s World');
END; //
DELIMITER ;

... you can use it in your SELECT statements like this:

SELECT whoseWorld(first_name) FROM authors;
-- Giovannino's World
-- Douglas's World
-- R.L.'s World
-- Denis's World
-- etc.

Notice in our whoseWorld() function, we're using the CONCAT() function. You can totally use functions within functions!

Okay, the punctuation nerd in me is looking at the whoseWorld() function and cringing, because if a name ends in the letter s, it shouldn't have an "'s" after it, just an apostrophe. Let's drop that function, and add to our function using an IF/ELSE statement!

Don't worry too much about picking up the IF/ELSE syntax this week - we'll learn all about how conditions & loops work in SQL next week.

DROP FUNCTION IF EXISTS whoseWorld;

DELIMITER //
CREATE FUNCTION whoseWorld ( 
  personsName VARCHAR(255) 
)
RETURNS VARCHAR(263)
BEGIN
-- Note: the 'RIGHT()' function works like SUBSTR(),
-- except it selects characters starting at the end,
-- and works backwards.
  IF (RIGHT(personsName, 1) = 's')
    THEN 
      RETURN CONCAT(personsName, ''' World');
  ELSE
    RETURN CONCAT(personsName, '''s World');
  END IF;
END; //
DELIMITER ;

SELECT whoseWorld(first_name) FROM authors;
-- Giovannino's World
-- Douglas' World
-- R.L.'s World
-- Denis' World
-- etc.

Much better!

That being said, our code is starting to look a little repetitious - writing and re-writing the whole formulation of the return value multiple times. Let's use a variable within the function to take care of that.

But first, let's take a little aside to see if we can use multiple parameters, and/or defaults for our parameters.

Multiple parameters

Can you have multiple input parameters for your custom function? Of course!

DELIMITER //
CREATE FUNCTION oneIntPlusTheSquareOfAnotherInt ( 
  theFirstNumber  INT,
  theNumberToSquare INT
)
RETURNS INT
BEGIN
  RETURN theFirstNumber + 
    (theNumberToSquare * theNumberToSquare);
END; //
DELIMITER ;

SELECT oneIntPlusTheSquareOfAnotherInt(2,3);
-- Returns 11

Default parameters

Want to set a default parameter in case someone forgets to pass in a value?

TOO BAD.

MySQL, unlike most other RDBMS' doesn't provide a way to set default values for parameters that don't get passed in - any missing parameter will trigger an error.

That being said, you can set null values to a default, but the null value still has to get passed in as an argument. I'll show you how to set a default for null values in a minute, after we discuss how to assign values to variables within the function with the SET command. But first:

The DECLARE command

Parameters aren't the only kind of variable we can use in our functions. We can also have "local" variables that have their values set not by an input, but within the function.

In order to use a local variable in your function, you have to give it a name and a data type. That's what the DECLARE statement is for. You have to do this first thing after the BEGIN command.

DELIMITER //
CREATE FUNCTION whoseWorld ( 
  personsName VARCHAR(255) 
)
RETURNS VARCHAR(263)
BEGIN
  DECLARE wrld VARCHAR(263);
  IF (RIGHT(personsName, 1) = 's')
    THEN 
      RETURN CONCAT(personsName, ''' World');
  ELSE
    RETURN CONCAT(personsName, '''s World');
  END IF;
END; //
DELIMITER ;

Hmmm, that's all well and good, but we never did anything with our wrld variable. To add a value to a local variable, we need to use the SET command.

The SET command

The SET command lets us assign a value to local variables, like so:

DROP FUNCTION IF EXISTS whoseWorld;
DELIMITER //
CREATE FUNCTION whoseWorld ( 
  personsName VARCHAR(255) 
)
RETURNS VARCHAR(263)
BEGIN
  DECLARE wrld VARCHAR(263);
  DECLARE str CHAR(6);
  SET str = ' World';
  IF (RIGHT(personsName, 1) = 's')
    THEN 
      SET wrld = CONCAT(personsName, '''', str);
  ELSE
    SET wrld = CONCAT(personsName, '''s', str);
  END IF;
 RETURN wrld;
END; //
DELIMITER ;

Okay, this may seem like we're making things more complicated... and we are. But with good reason! This will save us time, and help prevent errors, in the long run.

So, first we declare a variable called wrld. This holds the value that we ultimately output. This means we don't have to put our whole formula into a return statement, and we can manipulate our data in multiple steps before we output it.

Then we DECLARE and SET a variable called str. This holds a string that we were repeating in our code. Variables help reduce repetition, which means our code is easier to maintain (we only have to update it in one place), and easier to debug (either it's right or it's wrong, not a mix of the two).

Assigning a default to NULL arguments

Oh, by the way, an 'argument' is the value to pass into a function's parameter. I feel like I'd been writing code for a weirdly long time before I learned that distinction. Anyway...

So, you want to change null values passed in through your parameters into a default value? That's easy with the SET command:

DELIMITER //
CREATE FUNCTION oneIntSquared ( 
  theFirstNumber  INT
)
RETURNS INT
BEGIN
SET theFirstNumber = IFNULL(theFirstNumber, 0);
  RETURN theFirstNumber * theFirstNumber;
END; //
DELIMITER ;

SELECT oneIntSquared(NULL);
-- Returns 0

SELECT oneIntSquared(7);
-- Returns 49

Deterministic vs Non-deterministic functions

One way we can optimize the execution of our functions is by declaring them to be DETERMINISTIC.

A deterministic function is a function that always returns the same result based on the same inputs.

Now you may be thinking, "shouldn't my function always return the same results??" Well, consider this function:

DELIMITER //
CREATE FUNCTION xHoursFromNow ( numberOfHours INT )
RETURNS DATETIME
BEGIN
 RETURN DATE_ADD(NOW(), INTERVAL numberOfHours HOUR);
END; //
DELIMITER ;

If you run the following SELECT statement, wait one minute, and then run it again, you'll get two different results, despite using the same input:

SELECT xHoursFromNow(2);

That makes xHoursFromNow() a non-deterministic function.

Custom functions are NON-DETERMINISTIC by default, despite the fact that most functions you write (basically all those that don't use NOW(), RAND() - which returns a random number, or similar) will be DETERMINISTIC.

It's really easy to take advantage of the performance optimizations that MySQL performs on deterministic functions. Just write the word 'DETERMINISTIC' right before your 'BEGIN' statement, like so:

DELIMITER //
CREATE FUNCTION oneHourFromWhenever ( 
  whateverTime DATETIME 
)
RETURNS DATETIME
DETERMINISTIC
BEGIN
 RETURN DATE_ADD(whateverTime, INTERVAL 1 HOUR);
END; //
DELIMITER ;

How to use custom functions

I think you've picked up on this by now - you use them exactly the same as the built-in scalar functions you've already been using. You just write their name, and then an open parenthesis, and then any values you need to assign to the parameters. You get it.

Dropping a function

Okay, you've seen me do this, and I think you could've guessed it anyway:

DROP FUNCTION {functionName};

While you're developing a function, it might be handy to keep the DROP FUNCTION statement at the top of your script, so it automatically overwrites the existing definition of the function every time you run your script.

If you want to have it there the first time you run your script, you might want to add IF EXISTS, so that you won't get an error the first time you run your script:

DROP FUNCTION IF EXISTS {functionName};

Altering a function

You can't actually alter anything significant about a function after it's been created - just drop the function and recreate it with your alterations.

Finally, let's talk about your

Week 9 - Views & Custom Functions