BUG: Unable to parse the JSON schema for LlamaIndex when there is a space in dictionary key #150

Open
prabhupant opened this issue Nov 21, 2023 · 2 comments

@prabhupant

The code below throws an error when this schema is passed to LlamaIndex's JSONQueryEngine. The error seems to occur because the key "Issue id" contains a space. When I replaced the space with an underscore, it worked.

json_schema = {
    "description": "Schema defining the jira tickets and their related data",
    "type": "object",
    "properties": {
        "issues": {
            "description": "List of Jira tickets with their related data",
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "Summary": {
                        "description": "Summary of an issue",
                        "type": "string"
                    },
                    "Issue id": {
                        "description": "Issue id of an issue",
                        "type": "integer"
                    }
                }, 
            }
        }
    },
}
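The underscore workaround mentioned above can be sketched as a small helper. This is only an illustration, not part of LlamaIndex or jsonpath-ng: the name `underscore_keys` is hypothetical, and it assumes the same renaming is applied to both the schema and the JSON data so that the two stay consistent.

```python
from typing import Any


def underscore_keys(node: Any) -> Any:
    """Return a copy of a JSON-like value with spaces in dict keys replaced by underscores.

    Hypothetical helper for illustration. Values (for example the
    "description" strings) are left untouched; only keys are renamed.
    """
    if isinstance(node, dict):
        return {
            (key.replace(" ", "_") if isinstance(key, str) else key): underscore_keys(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [underscore_keys(item) for item in node]
    return node


# The inner "properties" object from the schema above.
item_properties = {
    "Summary": {"description": "Summary of an issue", "type": "string"},
    "Issue id": {"description": "Issue id of an issue", "type": "integer"},
}

renamed = underscore_keys(item_properties)
print(sorted(renamed))
```

Note that two keys differing only by a space versus an underscore would collide after renaming, so this is only safe when such pairs do not occur in the data.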
@michaelmior
Collaborator

@prabhupant It would help if you could provide a code sample not using LlamaIndex which exhibits this problem.

@prabhupant
Author

@michaelmior I don't have a code sample without LlamaIndex right now; I only came across jsonpath-ng through LlamaIndex. In case it helps, here is the error I got from LlamaIndex when this library was called:

Doing this string -  
$.tech_issue[?(@.Status == 'Open')].* | $.tech_issue[?(@.Status == 'Closed')].* | $.tech_issue[?(@.Status == 'In Progress')].* | $.tech_issue[?(@.Status == 'Resolved')].* | count(@)
---------------------------------------------------------------------------
JsonPathParserError                       Traceback (most recent call last)
Cell In[43], line 1
----> 1 nl_response = nl_query_engine.query(
      2 "Group the issues according to the status field and give count of issues in each group. Also print the entire json data that you processed."
      3 )
      5 nl_response

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/indices/query/base.py:23, in BaseQueryEngine.query(self, str_or_query_bundle)
     21 if isinstance(str_or_query_bundle, str):
     22     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 23 response = self._query(str_or_query_bundle)
     24 return response

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/token_counter/token_counter.py:78, in llm_token_counter.<locals>.wrap.<locals>.wrapped_llm_predict(_self, *args, **kwargs)
     76 def wrapped_llm_predict(_self: Any, *args: Any, **kwargs: Any) -> Any:
     77     with wrapper_logic(_self):
---> 78         f_return_val = f(_self, *args, **kwargs)
     80     return f_return_val

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/indices/struct_store/json_query.py:120, in JSONQueryEngine._query(self, query_bundle)
    115     print_text(f"> JSONPath Prompt: {formatted_prompt}\n")
    116     print_text(
    117         f"> JSONPath Instructions:\n" f"```\n{json_path_response_str}\n```\n"
    118     )
--> 120 json_path_output = self._output_processor(
    121     json_path_response_str,
    122     self._json_value,
    123     **self._output_kwargs,
    124 )
    126 if self._verbose:
    127     print_text(f"> JSONPath Output: {json_path_output}\n")

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/indices/struct_store/json_query.py:47, in default_output_processor(llm_output, json_value)
     44 except ImportError as exc:
     45     raise ImportError(IMPORT_ERROR_MSG) from exc
---> 47 datum: List[DatumInContext] = parse(llm_output).find(json_value)
     48 return [d.value for d in datum]

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/jsonpath_ng/ext/parser.py:172, in parse(path, debug)
    171 def parse(path, debug=False):
--> 172     return ExtentedJsonPathParser(debug=debug).parse(path)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/jsonpath_ng/parser.py:46, in JsonPathParser.parse(self, string, lexer)
     44 lexer = lexer or self.lexer_class()
     45 print("Doing this string - ", string)
---> 46 return self.parse_token_stream(lexer.tokenize(string))

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/jsonpath_ng/parser.py:70, in JsonPathParser.parse_token_stream(self, token_iterator, start_symbol)
     60 # And we regenerate the parse table every time;
     61 # it doesn't actually take that long!
     62 new_parser = ply.yacc.yacc(module=self,
     63                            debug=self.debug,
     64                            tabmodule = parsing_table_module,
   (...)
     67                            start = start_symbol,
     68                            errorlog = logger)
---> 70 return new_parser.parse(lexer = IteratorToTokenStream(token_iterator))

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/ply/yacc.py:333, in LRParser.parse(self, input, lexer, debug, tracking, tokenfunc)
    331     return self.parseopt(input, lexer, debug, tracking, tokenfunc)
    332 else:
--> 333     return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/ply/yacc.py:1201, in LRParser.parseopt_notrack(self, input, lexer, debug, tracking, tokenfunc)
   1199     errtoken.lexer = lexer
   1200 self.state = state
-> 1201 tok = call_errorfunc(self.errorfunc, errtoken, self)
   1202 if self.errorok:
   1203     # User must have done some kind of panic
   1204     # mode recovery on their own.  The
   1205     # returned token is the next lookahead
   1206     lookahead = tok

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/ply/yacc.py:192, in call_errorfunc(errorfunc, token, parser)
    190 _token = parser.token
    191 _restart = parser.restart
--> 192 r = errorfunc(token)
    193 try:
    194     del _errok, _token, _restart

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/jsonpath_ng/parser.py:84, in JsonPathParser.p_error(self, t)
     83 def p_error(self, t):
---> 84     raise JsonPathParserError('Parse error at %s:%s near token %s (%s)'
     85                               % (t.lineno, t.col, t.value, t.type))

JsonPathParserError: Parse error at 2:179 near token ( (()
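To see which character the reported position 2:179 refers to, the query string from the "Doing this string -" output can be indexed directly. This is a rough sketch under two assumptions that the thread does not confirm: that the parser received the string with a single leading newline (which would explain "line 2"), and that the column is a character offset counted from that newline. The helper name `locate` is hypothetical.

```python
# Query string copied from the "Doing this string -" output above.
# Assumption: it reached the parser with one leading newline.
query = (
    "\n"
    "$.tech_issue[?(@.Status == 'Open')].* | "
    "$.tech_issue[?(@.Status == 'Closed')].* | "
    "$.tech_issue[?(@.Status == 'In Progress')].* | "
    "$.tech_issue[?(@.Status == 'Resolved')].* | "
    "count(@)"
)


def locate(text: str, line: int, col: int) -> int:
    """Map a (line, col) pair to an index in text.

    Assumes col is the offset from the newline character that
    precedes the given line (or from index 0 for the first line).
    """
    newline_pos = -1
    for _ in range(line - 1):
        newline_pos = text.index("\n", newline_pos + 1)
    return max(newline_pos, 0) + col


pos = locate(query, 2, 179)
print(repr(query[pos]))           # the character at the reported position
print(query[pos - 5 : pos + 3])   # a little surrounding context
```

If those assumptions hold, the reported position falls on the opening parenthesis of the trailing count(@), not on a key containing a space, so this particular traceback may reflect a different problem from the one in the title.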
