Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json_extract and json_extract_scalar allow invalid JsonPaths (without $ at the start) #22589

Open
mbasmanova opened this issue Apr 22, 2024 · 5 comments
Labels

Comments

@mbasmanova
Copy link
Contributor

Json Path seems to require all paths to start with '$': https://goessner.net/articles/JsonPath/

However, json_extract and json_extract_scalar functions treat leading $ as optional.

Looks like this "bug" was introduced in dbe2ed9#diff-c1770855d6e7c3f3e05245346d9f1ee514df7310d141019e51631124706109e6

Before this change there was an explicit check for leading $:


    @VisibleForTesting
    public static ImmutableList<Object> tokenizePath(String path)
    {
        checkCondition(!isNullOrEmpty(path), INVALID_FUNCTION_ARGUMENT, "Invalid JSON path: '%s'", path);
        checkCondition(path.charAt(0) == '$', INVALID_FUNCTION_ARGUMENT, "JSON path must start with '$': '%s'", path);
...

After, this change, leading $ became optional:

     public JsonPathTokenizer(String path)
    {
        this.path = checkNotNull(path, "path is null");

        // skip the start token
        match('$');
    } 

Velox currently requires JSON Paths to start with '$'.

CC: @tdcmeehan @Yuhta @kagamiori @kgpai @amitkdutta

@mbasmanova mbasmanova added the bug label Apr 22, 2024
@mbasmanova mbasmanova changed the title json_extract and json_extract_scalar allow invalid Json Paths (without $ at the start) json_extract and json_extract_scalar allow invalid JsonPaths (without $ at the start) Apr 22, 2024
@mbasmanova
Copy link
Contributor Author

Looks like the reason for Presto to allow such paths is fallback logic to Jayway:

presto-main/src/main/java/com/facebook/presto/operator/scalar/JsonPath.java

    public static JsonPath build(String pattern)
    {
        try {
            return buildPresto(pattern);
        }
        catch (PrestoException ex) {
            if (ex.getErrorCode() == INVALID_FUNCTION_ARGUMENT.toErrorCode()) {
                return buildJayway(pattern);
            }
            throw ex;
        }
    }

com.jayway.jsonpath.JsonPath.compile(path) happily translates 'foo' path into '$[foo]' path.

@mbasmanova
Copy link
Contributor Author

Jayway has logic to check if path starts with '$' or '@' and if not to prepend the path with '$.' before compiling.

com/jayway/jsonpath/json-path/2.6.0/json-path-2.6.0.jar!/com/jayway/jsonpath/internal/path/PathCompiler.class

    public static Path compile(String path, Predicate... filters) {
        try {
            CharacterIndex ci = new CharacterIndex(path);
            ci.trim();
            if (ci.charAt(0) != '$' && ci.charAt(0) != '@') {
                ci = new CharacterIndex("$." + path);
                ci.trim();
            }

            if (ci.lastCharIs('.')) {
                fail("Path must not end with a '.' or '..'");
            }

            LinkedList<Predicate> filterStack = new LinkedList(Arrays.asList(filters));
            Path p = (new PathCompiler(ci, filterStack)).compile();
            return p;

@mbasmanova
Copy link
Contributor Author

mbasmanova commented Apr 23, 2024

Also, JSON Path seems to allow either dot-notation ($.store.book[0].title) or bracket-notation ($['store']['book'][0]['title']), but Jayway allows a mix: $.['store'].['book'].[0].['title']

presto:di> SELECT json_extract_scalar('[{"foo": 1}, {"bar": 2}]', '$.0.foo');
 _col0
-------
 1

presto:di> SELECT json_extract_scalar('[{"foo": 1}, {"bar": 2}]', '$[0]foo');
 _col0
-------
 1

presto:di> SELECT json_extract_scalar('[{"foo": 1}, {"bar": 2}]', '$[0].foo');
 _col0
-------
 1

presto:di> SELECT json_extract_scalar('[{"foo": 1}, {"bar": 2}]', '$[0].[''foo'']');
 _col0
-------
 1

presto:di> SELECT json_extract_scalar('[{"foo": 1}, {"bar": 2}]', '$[0][''foo'']');
 _col0
-------
 1

@mbasmanova
Copy link
Contributor Author

Jayway has logic to check if path starts with '$' or '@' and if not to prepend the path with '$.' before compiling.

This logic seems problematic as it produces some unexpected results. ".[0]" path is converted to "..[0]" which performs deep scan and returns a list of first elements for ALL arrays in the JSON document.

presto:di> SELECT json_extract('[1, 2, [10, 20, [100]]]', '$[0]');
 _col0
-------
 1

presto:di> SELECT json_extract('[1, 2, [10, 20, [100]]]', '$.[0]');
 _col0
-------
 1

presto:di> SELECT json_extract('[1, 2, [10, 20, [100]]]', '.[0]');
   _col0
------------
 [1,10,100]

I think it is fine to change Velox to treat leading '$' as optional, but I don't think Velox should prepend '$.' to the path when leading '$' is missing.

CC: @sviscaino

mbasmanova added a commit to mbasmanova/velox-1 that referenced this issue Apr 23, 2024
…alar] Presto functions

Summary:
json_extract_scalar and json_extract Presto function allow paths like 'foo'
and 'foo.bar'. These paths are non-standard. They are enabled via use of Jayway
engine, which handles paths without leading '$' by prepending the path
with '$.'. Path 'foo' becomes '$.foo'. However, this logic has some unexpected
side effects. Path '.foo' becomes '$..foo', which uses deep scan operator '..'
and matches 'foo' keys anywhere in the document, rather then just at the top
level.

This change allows Velox to support paths like 'foo' and 'foo.bar[2].bar', but
not '.foo' or '[10]'. These paths are handled as if they contained leading '$.' characters.

See prestodb/presto#22589

Part of facebookincubator#7049

Differential Revision: D56465662
mbasmanova added a commit to mbasmanova/velox-1 that referenced this issue Apr 23, 2024
…alar] Presto functions (facebookincubator#9584)

Summary:

json_extract_scalar and json_extract Presto function allow paths like 'foo'
and 'foo.bar'. These paths are non-standard. They are enabled via use of Jayway
engine, which handles paths without leading '$' by prepending the path
with '$.'. Path 'foo' becomes '$.foo'. However, this logic has some unexpected
side effects. Path '.foo' becomes '$..foo', which uses deep scan operator '..'
and matches 'foo' keys anywhere in the document, rather then just at the top
level.

This change allows Velox to support paths like 'foo' and 'foo.bar[2].bar', but
not '.foo' or '[10]'. These paths are handled as if they contained leading '$.' characters.

See prestodb/presto#22589

Part of facebookincubator#7049

Differential Revision: D56465662
mbasmanova added a commit to mbasmanova/velox-1 that referenced this issue Apr 23, 2024
…alar] Presto functions (facebookincubator#9584)

Summary:

json_extract_scalar and json_extract Presto function allow paths like 'foo'
and 'foo.bar'. These paths are non-standard. They are enabled via use of Jayway
engine, which handles paths without leading '$' by prepending the path
with '$.'. Path 'foo' becomes '$.foo'. However, this logic has some unexpected
side effects. Path '.foo' becomes '$..foo', which uses deep scan operator '..'
and matches 'foo' keys anywhere in the document, rather then just at the top
level.

This change allows Velox to support paths like 'foo' and 'foo.bar[2].bar', but
not '.foo' or '[10]'. These paths are handled as if they contained leading '$.' characters.

See prestodb/presto#22589

Part of facebookincubator#7049

Differential Revision: D56465662
mbasmanova added a commit to mbasmanova/velox-1 that referenced this issue Apr 23, 2024
…Presto

Summary:
json_extract_scalar and json_extract Presto function allow paths like '$.[0].
[1].[2].foo' that include "redundant" dots before opening brackets. These paths
are non-standard. They are enabled via use of Jayway engine. 

This change allows Velox to support these paths by ignoring dots that appear
before opening brackets.

See prestodb/presto#22589 Part of
facebookincubator#7049

Differential Revision: D56467690
mbasmanova added a commit to mbasmanova/velox-1 that referenced this issue Apr 23, 2024
…alar] Presto functions (facebookincubator#9584)

Summary:

json_extract_scalar and json_extract Presto function allow paths like 'foo'
and 'foo.bar'. These paths are non-standard. They are enabled via use of Jayway
engine, which handles paths without leading '$' by prepending the path
with '$.'. Path 'foo' becomes '$.foo'. However, this logic has some unexpected
side effects. Path '.foo' becomes '$..foo', which uses deep scan operator '..'
and matches 'foo' keys anywhere in the document, rather then just at the top
level.

This change allows Velox to support paths like 'foo' and 'foo.bar[2].bar', but
not '.foo' or '[10]'. These paths are handled as if they contained leading '$.' characters.

See prestodb/presto#22589

Part of facebookincubator#7049

Differential Revision: D56465662
mbasmanova added a commit to mbasmanova/velox-1 that referenced this issue Apr 23, 2024
…Presto (facebookincubator#9585)

Summary:

json_extract_scalar and json_extract Presto function allow paths like '$.[0].
[1].[2].foo' that include "redundant" dots before opening brackets. These paths
are non-standard. They are enabled via use of Jayway engine. 

This change allows Velox to support these paths by ignoring dots that appear
before opening brackets.

See prestodb/presto#22589 Part of
facebookincubator#7049

Reviewed By: xiaoxmeng

Differential Revision: D56467690
mbasmanova added a commit to mbasmanova/velox-1 that referenced this issue Apr 23, 2024
…alar] Presto functions (facebookincubator#9584)

Summary:

json_extract_scalar and json_extract Presto function allow paths like 'foo'
and 'foo.bar'. These paths are non-standard. They are enabled via use of Jayway
engine, which handles paths without leading '$' by prepending the path
with '$.'. Path 'foo' becomes '$.foo'. However, this logic has some unexpected
side effects. Path '.foo' becomes '$..foo', which uses deep scan operator '..'
and matches 'foo' keys anywhere in the document, rather then just at the top
level.

This change allows Velox to support paths like 'foo' and 'foo.bar[2].bar', but
not '.foo' or '[10]'. These paths are handled as if they contained leading '$.' characters.

See prestodb/presto#22589

Part of facebookincubator#7049

Differential Revision: D56465662
mbasmanova added a commit to mbasmanova/velox-1 that referenced this issue Apr 23, 2024
…Presto (facebookincubator#9585)

Summary:

json_extract_scalar and json_extract Presto function allow paths like '$.[0].
[1].[2].foo' that include "redundant" dots before opening brackets. These paths
are non-standard. They are enabled via use of Jayway engine. 

This change allows Velox to support these paths by ignoring dots that appear
before opening brackets.

See prestodb/presto#22589 Part of
facebookincubator#7049

Reviewed By: xiaoxmeng

Differential Revision: D56467690
facebook-github-bot pushed a commit to facebookincubator/velox that referenced this issue Apr 23, 2024
…alar] Presto functions (#9584)

Summary:
Pull Request resolved: #9584

json_extract_scalar and json_extract Presto function allow paths like 'foo'
and 'foo.bar'. These paths are non-standard. They are enabled via use of Jayway
engine, which handles paths without leading '$' by prepending the path
with '$.'. Path 'foo' becomes '$.foo'. However, this logic has some unexpected
side effects. Path '.foo' becomes '$..foo', which uses deep scan operator '..'
and matches 'foo' keys anywhere in the document, rather then just at the top
level.

This change allows Velox to support paths like 'foo' and 'foo.bar[2].bar', but
not '.foo' or '[10]'. These paths are handled as if they contained leading '$.' characters.

See prestodb/presto#22589

Part of #7049

Reviewed By: xiaoxmeng

Differential Revision: D56465662

fbshipit-source-id: c32bfe6c1b2587b9f996605a169abb934fd3d95c
facebook-github-bot pushed a commit to facebookincubator/velox that referenced this issue Apr 23, 2024
…Presto (#9585)

Summary:
Pull Request resolved: #9585

json_extract_scalar and json_extract Presto function allow paths like '$.[0].
[1].[2].foo' that include "redundant" dots before opening brackets. These paths
are non-standard. They are enabled via use of Jayway engine.

This change allows Velox to support these paths by ignoring dots that appear
before opening brackets.

See prestodb/presto#22589 Part of
#7049

Reviewed By: xiaoxmeng

Differential Revision: D56467690

fbshipit-source-id: 42866815ff186510679a0ab877e2796d324309c7
@mbasmanova
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: 🆕 Unprioritized
Development

No branches or pull requests

1 participant