{ Input Validation with JSONSchema. }

Objectives:

By the end of this chapter, you should be able to:

  • Describe the importance of server-side validation
  • Understand the basics of the JSONSchema standard
  • Implement JSON validation on your API endpoints

Server-side Data Validation

One of the most fundamental and important security measures you can take as an API developer is preventing bad user inputs from messing up your API. The overall goal is to identify bad inputs as early as possible, so that an invalid request gets a 400 - Bad Request response instead of surfacing later as a 500 - Internal Server Error. Recall that a 400 status code is issued when the server proactively identifies that the user has sent some invalid or incorrect data, while a 500 status code indicates that the server itself broke because it was unable to handle the request.

A server lacking adequate validation can result in:

  • corrupt or incomplete data in the database (this causes all kinds of errors down the road)
  • crashing or locking up the server
  • displaying vague, unhelpful errors to the frontend or users of the API
  • at a minimum, extra server or database CPU/memory load due to unsuccessfully trying to process bad data

In this section, we are particularly interested in validating data at any API endpoint that accepts a user-defined JSON payload (for example, a POST, PUT, or PATCH to /users). User-written payloads are often large and complex, and are frequently entered by hand (for example, through a form). This makes them especially prone to errors.

This is where JSON Schema comes in.

Why JSON Schema?

There are three main reasons for using a schema validation system:

  1. You want user data to fail fast, before the bad data even gets to your ORM or database.
  2. You want to reduce the amount of code you have to write for processing and validating data.
  3. You want a validation system that is easy to set up and maintain.

Before we get into how to use it, let's talk about rolling our own validation first.

Rolling Your Own Validation Doesn't Always Scale

Let's assume you have a /books endpoint, and the JSON payload to add a new book looks like this:

{
  "data": {
    "amazon-url": "http://a.co/eobPtX2",
    "author": "Matthew Lane",
    "isbn-10": "0691161518",
    "isbn-13": "978-0691161518",
    "language": "english",
    "pages": 264,
    "publisher": "Princeton University Press",
    "title": "Power-Up: Unlocking the Hidden Mathematics in Video Games",
    "year": 2017
  }
}

Your /books POST request handler might look like this:

router.route('/books').post((req, res, next) => {
  const book = req.body.data;

  if (!book) {
    // pass a 400 error to the error-handler
    let error = new Error('Book payload is required');
    error.status = 400;
    return next(error);
  }
  /* 
    (not implemented) insert the book into the database here
  */
  return res.status(201).json(book);
});

In the above example, there is some very light validation going on: the handler checks that req.body.data is present (not null, undefined, or another falsy value).

This is the bare minimum amount of validation you would need.

But what about if you want title and author to be required fields?

if (!book.author || !book.title) {
  let error = new Error('Book "author" and "title" are required fields.');
  error.status = 400;
  return next(error);
}

Not too bad, and we're getting more validation...

But what if users send invalid Amazon URLs, or ISBNs that are numbers instead of strings?

/**
 * let's assume you've written a validateUrl function
 */
if (book['amazon-url'] && !validateUrl(book['amazon-url'])) {
  let error = new Error('Amazon URL is not valid.');
  error.status = 400;
  return next(error);
}

if (book['isbn-10'] && typeof book['isbn-10'] !== 'string') {
  let error = new Error('ISBN-10 needs to be a string.');
  error.status = 400;
  return next(error);
}

As you can see in the above examples, if we want to roll our own validation this way, every request handler is just going to have tons of conditional logic checking for all the edge cases. And trust me, there are tons of edge cases! If this backend is powering a web form for example, you can count on getting tons of bad data just from bot spam.

While this can sometimes be a perfectly fine approach, it doesn't scale that well, unless you want to write your own extensive validation framework or constantly be adding more conditionals once you discover more loopholes.
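
To make the scaling problem concrete, here is a minimal sketch of where the hand-rolled approach tends to end up: one function that gathers every check for a book. The names collectBookErrors and isValidUrl are hypothetical (isValidUrl stands in for the validateUrl function assumed above), and the checks only cover the cases discussed so far.

```javascript
// Hypothetical helper standing in for validateUrl: accepts only http(s) URLs
function isValidUrl(value) {
  try {
    const { protocol } = new URL(value);
    return protocol === 'http:' || protocol === 'https:';
  } catch (err) {
    // new URL throws when the value cannot be parsed as a URL
    return false;
  }
}

// Returns an array of messages; an empty array means the book passed every check
function collectBookErrors(book) {
  if (!book || typeof book !== 'object') {
    return ['Book payload is required'];
  }
  const errors = [];
  if (!book.author || !book.title) {
    errors.push('Book "author" and "title" are required fields.');
  }
  if (book['amazon-url'] && !isValidUrl(book['amazon-url'])) {
    errors.push('Amazon URL is not valid.');
  }
  if (book['isbn-10'] && typeof book['isbn-10'] !== 'string') {
    errors.push('ISBN-10 needs to be a string.');
  }
  return errors;
}
```

Every new field or rule adds another branch here, and each handler that accepts a book has to remember to call the function and turn its result into a 400 response.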

JSON Schema Basics

JSON Schema is a standard specification for describing JSON documents in a human- and machine-readable format. The exact specification is published at json-schema.org, which also links to more readable guides explaining what the specification means.
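
As a rough illustration of what such a description looks like, the sketch below hand-writes a tiny schema using three standard keywords (type, properties, and required) and checks an object against it with a deliberately simplistic function. The checkAgainst helper is hypothetical and handles only these keywords for top-level string fields; it is not how a real validator is implemented.

```javascript
// A tiny schema: the instance must be an object, "title" and "author"
// must be strings when present, and "title" must be present
const tinySchema = {
  type: 'object',
  properties: {
    title: { type: 'string' },
    author: { type: 'string' }
  },
  required: ['title']
};

// Toy checker (hypothetical): supports only "required" and typeof-style
// "type" checks on top-level properties
function checkAgainst(schema, instance) {
  if (typeof instance !== 'object' || instance === null || Array.isArray(instance)) {
    return ['instance is not an object'];
  }
  const errors = [];
  for (const name of schema.required || []) {
    if (!(name in instance)) {
      errors.push(`missing required property "${name}"`);
    }
  }
  for (const [name, rule] of Object.entries(schema.properties || {})) {
    if (name in instance && typeof instance[name] !== rule.type) {
      errors.push(`"${name}" is not of type ${rule.type}`);
    }
  }
  return errors;
}
```

The point is that the rules live in data (the schema) rather than in conditionals, so adding a field means editing the schema, not the checking code.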

We're going to jump right into it using our previous example. Recall the example "Book" JSON payload using Matt's book:

{
  "data": {
    "amazon-url": "http://a.co/eobPtX2",
    "author": "Matthew Lane",
    "isbn-10": "0691161518",
    "isbn-13": "978-0691161518",
    "language": "english",
    "pages": 264,
    "publisher": "Princeton University Press",
    "title": "Power-Up: Unlocking the Hidden Mathematics in Video Games",
    "year": 2017
  }
}

JSONschema.net

Instead of manually writing a JSON schema doc, since we have this nice example already filled out we can head over to jsonschema.net to auto-generate a schema for us. Simply paste the JSON in the box on the left and click "SUBMIT":

[Screenshot: generating a JSON schema doc]

In the main box marked "HTML" we have the resulting JSON schema, inferred from our input.

The easiest way to customize this is to click on the "EDIT" tab and adjust the fields. Let's make "data", "author", and "title" required.

[Screenshot: marking JSON schema fields as required]

Click the save button at the top right (it looks like a floppy disk), and the schema should update.

This is what the resulting schema should look like (on JSONschema.net you can click the copy button and paste it into any .json file):

{
  "$id": "http://example.com/example.json",
  "type": "object",
  "properties": {
    "data": {
      "$id": "/properties/data",
      "type": "object",
      "properties": {
        "amazon-url": {
          "$id": "/properties/data/properties/amazon-url",
          "type": "string",
          "title": "The Amazon-url Schema ",
          "default": "",
          "examples": ["http://a.co/eobPtX2"]
        },
        "author": {
          "$id": "/properties/data/properties/author",
          "type": "string",
          "title": "The Author Schema ",
          "default": "",
          "examples": ["Matthew Lane"]
        },
        "isbn-10": {
          "$id": "/properties/data/properties/isbn-10",
          "type": "string",
          "title": "The Isbn-10 Schema ",
          "default": "",
          "examples": ["0691161518"]
        },
        "isbn-13": {
          "$id": "/properties/data/properties/isbn-13",
          "type": "string",
          "title": "The Isbn-13 Schema ",
          "default": "",
          "examples": ["978-0691161518"]
        },
        "language": {
          "$id": "/properties/data/properties/language",
          "type": "string",
          "title": "The Language Schema ",
          "default": "",
          "examples": ["english"]
        },
        "pages": {
          "$id": "/properties/data/properties/pages",
          "type": "integer",
          "title": "The Pages Schema ",
          "default": 0,
          "examples": [264]
        },
        "publisher": {
          "$id": "/properties/data/properties/publisher",
          "type": "string",
          "title": "The Publisher Schema ",
          "default": "",
          "examples": ["Princeton University Press"]
        },
        "title": {
          "$id": "/properties/data/properties/title",
          "type": "string",
          "title": "The Title Schema ",
          "default": "",
          "examples": [
            "Power-Up: Unlocking the Hidden Mathematics in Video Games"
          ]
        },
        "year": {
          "$id": "/properties/data/properties/year",
          "type": "integer",
          "title": "The Year Schema ",
          "default": 0,
          "examples": [2017]
        }
      },
      "required": ["author", "title"]
    }
  },
  "required": ["data"]
}

Great! We now have a massive blob of JSON Schema that we can use for validation (as well as testing, although we will not cover that in this section).
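
Note that the generated schema checks little beyond types and required keys. A minimal sketch of tightening it by hand follows, using the standard JSON Schema keywords format, pattern, and minimum. The specific constraints (the ISBN-10 regular expression, a minimum of one page) are illustrative assumptions rather than part of the generated output, and how strictly format is enforced depends on the validator you use.

```javascript
// Hypothetical replacements for three entries under
// properties.data.properties in the generated schema
const tightenedProperties = {
  'amazon-url': { type: 'string', format: 'uri' },
  // nine digits followed by a digit or an X check character
  'isbn-10': { type: 'string', pattern: '^[0-9]{9}[0-9X]$' },
  pages: { type: 'integer', minimum: 1 }
};

// "pattern" is an ordinary regular expression, so it can be tried out directly
const isbn10 = new RegExp(tightenedProperties['isbn-10'].pattern);
console.log(isbn10.test('0691161518')); // true
console.log(isbn10.test('691161518')); // false (only nine characters)
```

In the schema file itself these would be written as plain JSON, with the keys quoted.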

Using the JSONSchema NPM Package in Express

We'll be using the jsonschema npm package, which is published on npm and hosted on GitHub.

The package basically works like this:

  1. You import the validator.
  2. You supply the validator with a schema.
  3. You pass instances of user input to the validator.
  4. The validator checks if the user input is valid against the schema.
  5. If it's invalid, you respond with errors. Otherwise continue.

We install this with npm install jsonschema.

Once installed, we can use it in any file like so:

// import the validate function
const { validate } = require('jsonschema');

// require the book schema (a JSON file that we generated on jsonschema.net)
const bookSchema = require('./bookSchema.json');

router.route('/books').post((req, res, next) => {
  // check if the current req.body payload is a valid book
  const result = validate(req.body, bookSchema);

  // jsonschema validation results in a "valid" key being set to "false" if the instance doesn't match the schema
  if (!result.valid) {
    // pass the validation errors to the error handler
    //  the "stack" key is generally the most useful
    return next(result.errors.map(error => error.stack));
  }

  // at this point in the code, we know we have a valid payload with a data key
  const book = req.body.data;
  /* 
    (not implemented) insert the book into the database here
  */
  return res.status(201).json(book);
});

That's all there is to it! With an auto-generated schema from JSONschema.net (with perhaps a few minor tweaks) and the jsonschema npm package, you can easily add robust validation to your Node/Express API to prevent bad inputs.
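
If several endpoints need validation, the check can be factored into a small middleware factory. This is a sketch under assumptions: validateBody is a hypothetical name, and the validate function is passed in as an argument (in the app it would be the validate function imported from jsonschema), so the helper relies only on the result having the valid and errors keys used in the handler above.

```javascript
// Builds an Express-style middleware that validates req.body against a schema.
// "validate" is expected to behave like jsonschema's validate(instance, schema).
function validateBody(schema, validate) {
  return (req, res, next) => {
    const result = validate(req.body, schema);
    if (!result.valid) {
      // hand the readable messages to the error handler
      return next(result.errors.map(error => error.stack));
    }
    // payload is valid; continue to the route handler
    return next();
  };
}
```

A route could then be declared as router.route('/books').post(validateBody(bookSchema, validate), handler), where handler is whatever function inserts the book, leaving the handler itself free of validation logic.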

Final Note: you may have to make your error handler more robust to handle arrays of errors given to you by the validation result. Basically, the validate function will tell you everything wrong with the instance in relation to the supplied schema, so just make sure you have a way to tell the user all of their errors in the error handler:

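app.use((error, req, res, next) => {
  // validation failures arrive as an array of messages; other errors carry a message
  const isValidationError = Array.isArray(error);
  const body = isValidationError ? error : error.message;

  // for display purposes, if it's an array call it "errors"
  const key = isValidationError ? 'errors' : 'error';

  // a failed validation is the client's mistake, so default it to 400
  const status = error.status || (isValidationError ? 400 : 500);

  return res.status(status).json({ [key]: body });
});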
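
An alternative worth considering is to wrap the messages in an Error subclass before calling next, so the error handler always receives an Error that carries its own status. The sketch below assumes a hypothetical ValidationError class and a hypothetical toResponse helper; it is one possible design, not something the jsonschema package provides.

```javascript
// Hypothetical error type carrying every validation message plus a 400 status
class ValidationError extends Error {
  constructor(messages) {
    super('Payload failed validation');
    this.name = 'ValidationError';
    this.status = 400;
    this.errors = messages;
  }
}

// Builds the status and JSON body an error handler could send for any error
function toResponse(error) {
  const status = error.status || 500;
  const body = Array.isArray(error.errors)
    ? { errors: error.errors }
    : { error: error.message };
  return { status, body };
}
```

With this in place, the route would call next(new ValidationError(result.errors.map(error => error.stack))), and the error handler would reduce to reading status and body from toResponse(error) and sending them with res.status(status).json(body).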

When you're ready, move on to Environment Variables

Creative Commons License