Skip to content

Fast JSON schema validation with RapidJSON in Ruby

Notifications You must be signed in to change notification settings

foxtacles/rj_schema

Repository files navigation

rj_schema Ruby Gem Version

Fast JSON schema validation with RapidJSON (https://github.com/Tencent/rapidjson)

Usage

require 'rj_schema'

Create an instance of RjSchema::Validator and provide a JSON schema and a JSON document. If you pass a File, it will be read and parsed as JSON. Otherwise, to_json will be called on the arguments internally:

RjSchema::Validator.new.validate(File.new("schema/my_schema.json"), '{"stuff": 1}')

It is possible to resolve remote schemas by specifying them in the initializer. For example, if your schema contains "$ref": "/path/to/generic#/definitions/something":

RjSchema::Validator.new(
  '/path/to/generic' => File.new("definitions/generic.json")
).validate(File.new("schema/my_schema.json"), '{"stuff": 1}')

validate will return a hash containing various descriptions of the errors (for details, see Options below). An ArgumentError exception will be raised if any of the inputs are malformed or missing.

You can also call valid?, which returns a boolean value indicating success/failure instead.

SAX parser validation

If you prefer SAX validation with a simple boolean value, which also dramatically reduces memory requirements, then you can also call sax_valid? and pass the filepath instead of the file reference for the document you want to validate, eg:

RjSchema::Validator.new(
  '/path/to/generic' => File.new("definitions/generic.json")
).sax_valid?(File.new("schema/my_schema.json"), '<<FILEPATH to Doc>>')

Options

validate currently offers three options to customize the validation process. They can be specified as keyword arguments:

RjSchema::Validator.new.validate(
  File.new("schema/my_schema.json"),
  '{"stuff": 1}', 
  continue_on_error: true, 
  machine_errors: false, 
  human_errors: true
)

continue_on_error (default: false).

When set to true, validation will not stop upon the first error. Instead, an attempt will be made to determine all errors in the document based on the provided schema.

machine_errors (default: true).

When set to true, the return value of validate will contain a symbol key called machine_errors, which is a structured hash describing the encountered errors. The hash will be empty if no errors were found. The documentation for the error codes can be found here. Example:

{:machine_errors=>{"maximum"=>{"actual"=>31, "expected"=>20, "errorCode"=>2, "instanceRef"=>"#/aaaa", "schemaRef"=>"#/patternProperties/aaa%2A"}}}

human_errors (default: false).

When set to true, the return value of validate will contain a symbol key called human_errors, which is a printable and human readable string describing the encountered errors. The string will be empty if no errors were found. Example:

{:human_errors=>"Error Name: maximum\nMessage: Number '31' is greater than the 'maximum' value '20'.\nInstance: #/aaaa\nSchema: #/patternProperties/aaa%2A\n\n"}

Caching

Another feature of rj_schema is the ability to preload schemas. This can boost performance by a lot, especially in applications that routinely perform validations against static schemas, i.e. validations of client inputs inside the endpoints of a web app. Add the schemas to preload into the initializer and pass a Symbol to the validation function:

RjSchema::Validator.new(
  '/path/to/generic' => File.new("definitions/generic.json"),
  '/schema/my_schema.json' => File.new("schema/my_schema.json")
).validate(:"/schema/my_schema.json", '{"stuff": 1}')

Limitations

Some limitations apply due to RapidJSON:

  • only JSON schema draft-04 is supported
  • the format keyword is not supported

Benchmark

The main motivation for this gem was that we needed a faster JSON schema validation for our Ruby apps. We have been using Ruby JSON Schema Validator for a while (https://github.com/ruby-json-schema/json-schema) but some of our endpoints became unacceptably slow.

A benchmark to compare various gem performances can be run with: rake benchmark. These are the results collected on my machine (with g++ (Debian 7.3.0-3) 7.3.0, ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]).

report i/s x
rj_schema (valid?) (cached) 370.9 1
rj_schema (validate) (cached) 187.5 1.98x slower
json_schemer (valid?) (cached) 135.6 2.73x slower
rj_schema (valid?) 132.7 2.79x slower
rj_schema (validate) 96.9 3.83x slower
json-schema 10.6 34.92x slower
json_schema 3.7 101.26x slower

The error reporting of rj_schema is implemented inefficiently at the time of writing, so in this benchmark environment (based on JSON Schema test suite which includes many failing validations) validate performs significantly worse than valid?. This may not be an issue in production environments though, where failing validations are usually the exception (the overhead is only incurred in case of an error).

About

Fast JSON schema validation with RapidJSON in Ruby

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published