How to best validate JSON on the server-side


What you are describing with POST, PUT, and PATCH sounds like you are implementing a REST API. Depending on your back-end platform, you can use libraries that map JSON to objects, which is very powerful and performs that validation for you. In Java, you can use Jersey, Spring, or Jackson. If you are using .NET, you can use Json.NET.
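For illustration, here is a minimal sketch of that mapping approach using Jackson; the Customer class, its fields, and the sample payload are hypothetical:

    import com.fasterxml.jackson.core.JsonProcessingException;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class JsonMappingExample {
        // Hypothetical domain object the JSON is mapped onto.
        public static class Customer {
            public String name;
            public int age;
        }

        public static void main(String[] args) {
            ObjectMapper mapper = new ObjectMapper();
            try {
                // Mapping fails with an exception if the JSON is malformed
                // or a value does not fit the target type.
                Customer c = mapper.readValue("{\"name\":\"Jane\",\"age\":34}", Customer.class);
                System.out.println(c.name);
            } catch (JsonProcessingException e) {
                // Reject the request here, e.g. with an HTTP 400 response.
                System.err.println("Invalid JSON: " + e.getMessage());
            }
        }
    }

With this pattern, the deserialization itself is the structural validation; anything beyond that (ranges, mandatory fields) is still your code's job.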

If efficiency is your goal and you want to validate every single request, it would be ideal to also validate on the front end; if you are using JavaScript, you can use json2.js.

As for comparing your two methods, here is a pros and cons list for each.

Method #1: Upon Request

Pros

  1. The business logic's integrity is maintained. As you mentioned, trying to validate while processing business logic could report data as invalid that is actually valid (and vice versa), and the validation could inadvertently impact the business logic itself.
  2. As Norbert mentioned, catching the errors beforehand improves efficiency. The logical question this poses is: why spend time processing at all if the input contains errors?
  3. The code will be cleaner and easier to read. Keeping validation and business logic separate results in cleaner, easier-to-read, and more maintainable code (see the sketch after these lists).

Cons

  1. It can result in redundant processing (the data is traversed once for validation and again for the business logic), which means longer computing time.

Method #2: Validation on the Go

Pros

  1. In theory it is more efficient, since validation and processing happen in a single pass, saving compute time.

Cons

  1. In reality, the processing time saved is likely negligible (as Norbert mentioned). You still run the validation checks either way, and on top of that, any processing already done is wasted when an error is found.
  2. The data integrity can be compromised. It is possible for the data to end up in a corrupt, half-processed state when an error is found partway through.
  3. The code is not as clear. When reading the business logic, it may not be as apparent what is happening, because validation logic is mixed in (contrast the two styles in the sketch below).
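
To make the code-clarity point concrete, here is a minimal sketch contrasting the two methods; Order, reserveStock, and charge are hypothetical placeholders:

    public class ValidationStyles {
        static class Order { String item; int quantity; }

        // Method #1: validate everything up front, then process.
        static void handleUpfront(Order o) {
            if (o.item == null || o.quantity <= 0)
                throw new IllegalArgumentException("Invalid order");
            reserveStock(o); // business logic only ever sees valid data
            charge(o);
        }

        // Method #2: checks are interleaved with the business logic.
        static void handleOnTheGo(Order o) {
            if (o.item == null)
                throw new IllegalArgumentException("Missing item");
            reserveStock(o); // this work is wasted if the next check fails
            if (o.quantity <= 0)
                throw new IllegalArgumentException("Invalid quantity");
            charge(o);
        }

        static void reserveStock(Order o) { /* placeholder */ }
        static void charge(Order o)       { /* placeholder */ }
    }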

What it really boils down to is accuracy vs. speed, and the two generally have an inverse relationship. As you become more accurate and validate your JSON more thoroughly, you may have to compromise some speed. This is really only noticeable on large data sets, as computers are very fast these days. It is up to you to decide what is more important, given how trustworthy you expect the incoming data to be and whether that extra second or so is crucial. In some cases both matter (e.g. in stock-market and healthcare applications, milliseconds count); in those cases, as you increase one, for example accuracy, you may have to recover the other by running on faster hardware.

Hope this helps.


The first approach is more robust, but it does not have to be noticeably more expensive, and it becomes far cheaper whenever you can abort the parsing process due to errors. Your business logic usually takes >90% of the resources in a process, so validation costs at most around 10%; if roughly 10% of requests fail validation and skip the business logic entirely, you are already resource neutral. If you optimize the validation process so that the validations from the business process are performed upfront, a much lower error rate (more like 1 in 20 to 1 in 100) is enough to stay resource neutral.

For an example of an implementation assuming upfront data validation, look at GSON (https://code.google.com/p/google-gson/):

GSON works as follows: every part of the JSON can be mapped to an object, and this object is typed or contains typed data. Sample object (Java used as the example language):

    public class someInnerDataFromJSON {
        String name;
        String address;
        int housenumber;
        String buildingType;

        // Getters and setters
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        // etc.
    }

Because GSON parses the data against the model you provide, the parsed data is already type checked. This is the first point where your code can abort.
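
A minimal sketch of that first abort point, assuming the model class above and a hypothetical input string:

    import com.google.gson.Gson;
    import com.google.gson.JsonSyntaxException;

    public class ParseExample {
        public static void main(String[] args) {
            String jsonInput = "{\"name\":\"Jane\",\"housenumber\":\"oops\"}";
            try {
                someInnerDataFromJSON data =
                        new Gson().fromJson(jsonInput, someInnerDataFromJSON.class);
                // If we get here, the JSON was well formed and type checked.
            } catch (JsonSyntaxException e) {
                // First exit point: malformed JSON or a type mismatch
                // (here, a non-numeric housenumber) is rejected.
            }
        }
    }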

After this exit point, assuming the data conformed to the model, you can validate whether the data is within certain limits. You can also write that into the model.

Assume that for this example buildingType must be one of:

  • Single family house
  • Multi family house
  • Apartment

You can check the data during parsing by creating a setter which checks it, or you can check it after parsing in a first pass of your business rule application. The benefit of checking the data first is that your later code needs less exception handling, which means less and easier-to-understand code. A sketch of such a setter follows.
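
Here is a sketch of such a checking setter, added to the model class above (the allowed strings are just the list items):

    // Requires imports: java.util.Arrays, java.util.List
    public void setBuildingType(String buildingType) {
        List<String> allowed = Arrays.asList(
                "Single family house", "Multi family house", "Apartment");
        if (!allowed.contains(buildingType))
            throw new IllegalArgumentException("Unknown building type: " + buildingType);
        this.buildingType = buildingType;
    }

Note that plain GSON assigns fields reflectively rather than through setters, so for this check to fire during parsing you would need a custom deserializer that calls it; otherwise, invoke the setter in a post-parse pass.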


In general, the first option is the way to go. The only reason you might need to consider the second option is if you are dealing with JSON data that is tens of MB or larger.

In other words, only if you are trying to stream JSON and process it on the fly do you need to think about the second option.

Assuming that you are dealing with a few hundred KB at most per JSON document, you can just go for option one.

Here are some steps you could follow:

  1. Go for a JSON parser like GSON that will convert your entire JSON input into the corresponding Java domain model object. (If GSON doesn't throw an exception, you can be sure that the JSON is perfectly valid.)
  2. Of course, the objects constructed by GSON in step 1 may not be in a functionally valid state. For example, functional checks like mandatory fields and limit checks still have to be done.
  3. For this, you could define a validateState method which recursively validates the state of the object itself and its child objects.

Here is an example of a validateState method:

    public void validateState() {
        // Assume this validateState is part of the Customer class.
        if (age < 12 || age > 150)
            throw new IllegalArgumentException("Age should be in the range 12 to 150");
        if (age < 18 && (guardianId == null || guardianId.trim().equals("")))
            throw new IllegalArgumentException("Guardian id is mandatory for minors");
        for (Account a : getAccounts()) {
            a.validateState(); // Throws appropriate exceptions on any inconsistent state
        }
    }
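
Putting the steps together, here is a sketch of the calling code; jsonInput is a hypothetical request body, and Customer is the domain class assumed in the steps above:

    import com.google.gson.Gson;
    import com.google.gson.JsonSyntaxException;

    public class RequestHandler {
        public void handle(String jsonInput) {
            try {
                Customer customer = new Gson().fromJson(jsonInput, Customer.class); // step 1
                customer.validateState();                                           // steps 2 and 3
                // Safe to run the business logic from here on.
            } catch (JsonSyntaxException e) {
                // Structurally invalid JSON.
            } catch (IllegalArgumentException e) {
                // Functionally invalid state, e.g. age out of range.
            }
        }
    }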