New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
json_extract, json_extract_scalar, and json_size should support Jayway style JSON path syntax #7049
Comments
There was an issue opened in simdjson recently for supporting JSONPath, this would get us most of the way there, I think Jayway just extended this with aggregates that can be tacked on the end of the path. |
We will prioritize this feature in simdjson. |
Update: This is more ambitious a proposal than I considered at first. |
In our production workload, we are seeing the following JSONPaths that are currently not supported.
I believe we can support (1) and (2) with simple changes to JsonPathTokenizer and support (3) using something like "Tree Walking and JSON Element Types" in https://github.com/simdjson/simdjson/blob/master/doc/basics.md#json-path See prestodb/presto#22589 for additional context. |
Hi @mbasmanova, Spark's implementation is based on Jackson. Here are some findings.
Spark requires json path starts with "$".
Spark allows a mix also. |
@PHILO-HE Thank you for clarifying. |
…alar] Presto functions Summary: json_extract_scalar and json_extract Presto function allow paths like 'foo' and 'foo.bar'. These paths are non-standard. They are enabled via use of Jayway engine, which handles paths without leading '$' by prepending the path with '$.'. Path 'foo' becomes '$.foo'. However, this logic has some unexpected side effects. Path '.foo' becomes '$..foo', which uses deep scan operator '..' and matches 'foo' keys anywhere in the document, rather then just at the top level. This change allows Velox to support paths like 'foo' and 'foo.bar[2].bar', but not '.foo' or '[10]'. These paths are handled as if they contained leading '$.' characters. See prestodb/presto#22589 Part of facebookincubator#7049 Differential Revision: D56465662
…alar] Presto functions (facebookincubator#9584) Summary: json_extract_scalar and json_extract Presto function allow paths like 'foo' and 'foo.bar'. These paths are non-standard. They are enabled via use of Jayway engine, which handles paths without leading '$' by prepending the path with '$.'. Path 'foo' becomes '$.foo'. However, this logic has some unexpected side effects. Path '.foo' becomes '$..foo', which uses deep scan operator '..' and matches 'foo' keys anywhere in the document, rather then just at the top level. This change allows Velox to support paths like 'foo' and 'foo.bar[2].bar', but not '.foo' or '[10]'. These paths are handled as if they contained leading '$.' characters. See prestodb/presto#22589 Part of facebookincubator#7049 Differential Revision: D56465662
…alar] Presto functions (facebookincubator#9584) Summary: json_extract_scalar and json_extract Presto function allow paths like 'foo' and 'foo.bar'. These paths are non-standard. They are enabled via use of Jayway engine, which handles paths without leading '$' by prepending the path with '$.'. Path 'foo' becomes '$.foo'. However, this logic has some unexpected side effects. Path '.foo' becomes '$..foo', which uses deep scan operator '..' and matches 'foo' keys anywhere in the document, rather then just at the top level. This change allows Velox to support paths like 'foo' and 'foo.bar[2].bar', but not '.foo' or '[10]'. These paths are handled as if they contained leading '$.' characters. See prestodb/presto#22589 Part of facebookincubator#7049 Differential Revision: D56465662
…Presto Summary: json_extract_scalar and json_extract Presto function allow paths like '$.[0]. [1].[2].foo' that include "redundant" dots before opening brackets. These paths are non-standard. They are enabled via use of Jayway engine. This change allows Velox to support these paths by ignoring dots that appear before opening brackets. See prestodb/presto#22589 Part of facebookincubator#7049 Differential Revision: D56467690
…alar] Presto functions (facebookincubator#9584) Summary: json_extract_scalar and json_extract Presto function allow paths like 'foo' and 'foo.bar'. These paths are non-standard. They are enabled via use of Jayway engine, which handles paths without leading '$' by prepending the path with '$.'. Path 'foo' becomes '$.foo'. However, this logic has some unexpected side effects. Path '.foo' becomes '$..foo', which uses deep scan operator '..' and matches 'foo' keys anywhere in the document, rather then just at the top level. This change allows Velox to support paths like 'foo' and 'foo.bar[2].bar', but not '.foo' or '[10]'. These paths are handled as if they contained leading '$.' characters. See prestodb/presto#22589 Part of facebookincubator#7049 Differential Revision: D56465662
…Presto (facebookincubator#9585) Summary: json_extract_scalar and json_extract Presto function allow paths like '$.[0]. [1].[2].foo' that include "redundant" dots before opening brackets. These paths are non-standard. They are enabled via use of Jayway engine. This change allows Velox to support these paths by ignoring dots that appear before opening brackets. See prestodb/presto#22589 Part of facebookincubator#7049 Reviewed By: xiaoxmeng Differential Revision: D56467690
…alar] Presto functions (facebookincubator#9584) Summary: json_extract_scalar and json_extract Presto function allow paths like 'foo' and 'foo.bar'. These paths are non-standard. They are enabled via use of Jayway engine, which handles paths without leading '$' by prepending the path with '$.'. Path 'foo' becomes '$.foo'. However, this logic has some unexpected side effects. Path '.foo' becomes '$..foo', which uses deep scan operator '..' and matches 'foo' keys anywhere in the document, rather then just at the top level. This change allows Velox to support paths like 'foo' and 'foo.bar[2].bar', but not '.foo' or '[10]'. These paths are handled as if they contained leading '$.' characters. See prestodb/presto#22589 Part of facebookincubator#7049 Differential Revision: D56465662
…Presto (facebookincubator#9585) Summary: json_extract_scalar and json_extract Presto function allow paths like '$.[0]. [1].[2].foo' that include "redundant" dots before opening brackets. These paths are non-standard. They are enabled via use of Jayway engine. This change allows Velox to support these paths by ignoring dots that appear before opening brackets. See prestodb/presto#22589 Part of facebookincubator#7049 Reviewed By: xiaoxmeng Differential Revision: D56467690
…alar] Presto functions (#9584) Summary: Pull Request resolved: #9584 json_extract_scalar and json_extract Presto function allow paths like 'foo' and 'foo.bar'. These paths are non-standard. They are enabled via use of Jayway engine, which handles paths without leading '$' by prepending the path with '$.'. Path 'foo' becomes '$.foo'. However, this logic has some unexpected side effects. Path '.foo' becomes '$..foo', which uses deep scan operator '..' and matches 'foo' keys anywhere in the document, rather then just at the top level. This change allows Velox to support paths like 'foo' and 'foo.bar[2].bar', but not '.foo' or '[10]'. These paths are handled as if they contained leading '$.' characters. See prestodb/presto#22589 Part of #7049 Reviewed By: xiaoxmeng Differential Revision: D56465662 fbshipit-source-id: c32bfe6c1b2587b9f996605a169abb934fd3d95c
…Presto (#9585) Summary: Pull Request resolved: #9585 json_extract_scalar and json_extract Presto function allow paths like '$.[0]. [1].[2].foo' that include "redundant" dots before opening brackets. These paths are non-standard. They are enabled via use of Jayway engine. This change allows Velox to support these paths by ignoring dots that appear before opening brackets. See prestodb/presto#22589 Part of #7049 Reviewed By: xiaoxmeng Differential Revision: D56467690 fbshipit-source-id: 42866815ff186510679a0ab877e2796d324309c7
Description
The functions json_extract, json_extract_scalar, and json_size in Presto use the Jayway library to parse JSON paths and handle extraction https://github.com/json-path/JsonPath
Velox's version of these functions uses a JSON path tokenizer based on Presto's JsonPathTokenizer. JsonPathTokenizer supports a much simpler syntax, but is likely faster, Presto uses it if possible and falls back to Jayway's if it can parse the path.
Jayway's parser is quite extensive, supporting various operators, aggregates, regexes, filters.
It would be great if the Velox JSON path parser could support Jayway's syntax.
https://github.com/facebookincubator/velox/blob/e2ee0cad24d5407146d2da08b68c6701ee86e9da/velox/functions/prestosql/json/JsonPathTokenizer.h
The text was updated successfully, but these errors were encountered: