[R-228] Testset generation. TypeError: unsupported operand type(s) for -: 'str' and 'int' #900

Closed
GaalDorn1k opened this issue Apr 25, 2024 · 6 comments · Fixed by #980
Labels
bug Something isn't working linear Created by Linear-GitHub Sync

GaalDorn1k commented Apr 25, 2024

[x] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
TestsetGenerator.generate_with_langchain_docs() returns an empty TestDataset object with TypeError: unsupported operand type(s) for -: 'str' and 'int'

Ragas version: 0.1.8
Python version: 3.10.0

Code to Reproduce

from ragas import adapt
from ragas.testset.evolutions import simple, reasoning, multi_context, conditional
from ragas.testset.generator import TestsetGenerator
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

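# `inference_server_url` (the OpenAI-compatible endpoint) and `documents`
# (the LangChain documents) are defined elsewhere and are not shown here.
chat = ChatOpenAI(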
    model="gpt-3.5-turbo",
    openai_api_key="token",
    openai_api_base=inference_server_url,
    max_tokens=1000,
    temperature=0.2,
)
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

distributions = {
    simple: 0.6,
    multi_context: 0.2,
    reasoning: 0.2,
}

generator = TestsetGenerator.from_langchain(
    generator_llm=chat,
    critic_llm=chat,
    embeddings=embeddings,
)

generator.adapt(language='russian', evolutions=[simple, reasoning, conditional, multi_context])

testset = generator.generate_with_langchain_docs(documents, 1, distributions=distributions, raise_exceptions=False, with_debugging_logs=True)

Error trace

Runner in Executor raised an exception
Traceback (most recent call last):
  File "C:\Users\beznosov_pb\Documents\RAGAS\ragas3\ragas\src\ragas\executor.py", line 79, in _aresults
    r = await future
  File "c:\Users\beznosov_pb\AppData\Local\miniconda3\envs\llmetric\lib\asyncio\tasks.py", line 575, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "C:\Users\beznosov_pb\Documents\RAGAS\ragas3\ragas\src\ragas\executor.py", line 38, in sema_coro
    return await coro
  File "C:\Users\beznosov_pb\Documents\RAGAS\ragas3\ragas\src\ragas\executor.py", line 112, in wrapped_callable_async
    return counter, await callable(*args, **kwargs)
  File "C:\Users\beznosov_pb\Documents\RAGAS\ragas3\ragas\src\ragas\testset\evolutions.py", line 144, in evolve
    return await self.generate_datarow(
  File "C:\Users\beznosov_pb\Documents\RAGAS\ragas3\ragas\src\ragas\testset\evolutions.py", line 210, in generate_datarow
    selected_nodes = [
  File "C:\Users\beznosov_pb\Documents\RAGAS\ragas3\ragas\src\ragas\testset\evolutions.py", line 213, in <listcomp>
    if int(i) - 1 < len(current_nodes.nodes)
TypeError: unsupported operand type(s) for -: 'str' and 'int'
Generating: 100%|██████████| 1/1 [01:39<00:00, 99.74s/it]

Expected behavior
TestsetGenerator.generate_with_langchain_docs() returns a non-empty TestDataset object

Additional context
The error can actually be corrected by editing the file ragas\src\ragas\testset\evolutions.py:

210   selected_nodes = [
211            current_nodes.nodes[int(i) - 1]
212            for i in relevant_context_indices
213            if int(i) - 1 < len(current_nodes.nodes)
214    ]

If I were confident in the reliability of this fix, I would create a PR
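
A minimal sketch of the failure and of the cast shown above. It uses a plain list as a stand-in for `current_nodes.nodes` and hard-coded string indices in place of the model output; both are assumptions for illustration, not Ragas code.

```python
# Stand-in for current_nodes.nodes; the real items are Ragas node objects.
nodes = ["node-a", "node-b", "node-c"]

# Indices as they may arrive when the model output encodes them as strings.
relevant_context_indices = ["1", "3", "7"]


def select_uncast(nodes, indices):
    # Same shape as the comprehension without casting; fails on string indices.
    return [nodes[i - 1] for i in indices if i - 1 < len(nodes)]


def select_cast(nodes, indices):
    # Casts in both the element expression and the filter condition.
    return [nodes[int(i) - 1] for i in indices if int(i) - 1 < len(nodes)]


error_message = None
try:
    select_uncast(nodes, relevant_context_indices)
except TypeError as exc:
    error_message = str(exc)

# The out-of-range index "7" is dropped by the filter.
selected = select_cast(nodes, relevant_context_indices)
```

Note that the cast alone does not guard against an index of 0 or a non-numeric string; whether that can occur in practice depends on the model output.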

R-228

@GaalDorn1k GaalDorn1k added the bug Something isn't working label Apr 25, 2024
@jjmachan jjmachan added the linear Created by Linear-GitHub Sync label Apr 28, 2024
@jjmachan jjmachan changed the title from "Testset generation. TypeError: unsupported operand type(s) for -: 'str' and 'int'" to "[R-228] Testset generation. TypeError: unsupported operand type(s) for -: 'str' and 'int'" Apr 28, 2024
@jjmachan
Member

Hey @GaalDorn1k, I'm actually not sure about this. Could you give me some time to get back to you on it?

I've added this to our Linear workflow, but feel free to make a PR too if you want. We can work from there 🙂

HerrIvan commented May 1, 2024

I had the same issue and indeed the fix is the following:

             selected_nodes = [
                 current_nodes.nodes[i - 1]
                 for i in relevant_context_indices
-                if i - 1 < len(current_nodes.nodes)
+                if int(i) - 1 < len(current_nodes.nodes)
             ]
             relevant_context = (
                 CurrentNodes(root_node=selected_nodes[0], nodes=selected_nodes)

@GaalDorn1k it is, however, strange that your error trace already contains the fix. I am not sure how that can happen, but I assume the error was triggered because the ragas package that was actually executed did not contain the fix. I guess that is possible if you run the code within a notebook without restarting the kernel after changing the code.
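
One way to check for this kind of stale import, sketched with standard-library tools only. Python tracebacks print source lines read from the file on disk, so a trace can show edited code even when the running process still holds the old version. The helper names below are hypothetical, and the sketch is demonstrated on the `json` module; for this issue the module name of interest would be "ragas.testset.evolutions".

```python
import importlib
import importlib.util


def module_source_path(module_name: str):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


def reload_module(module_name: str):
    """Re-import a module so that edits on disk take effect in this session."""
    module = importlib.import_module(module_name)
    return importlib.reload(module)


# Demonstrated with a standard-library module so the sketch runs anywhere.
path = module_source_path("json")
reloaded = reload_module("json")
```

Reloading does not rebind names that were imported earlier with `from module import name`, so restarting the kernel remains the more reliable option.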

Btw, the error is not 100% deterministic. I think it has to do with the json parsing returning integers sometimes as integers and sometimes as strings.
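
A small sketch of how the same logical answer can come back with different types, depending on how the model writes the JSON. The key name `relevant_contexts` and both payloads are made up for this example and are not taken from the Ragas prompts.

```python
import json

# Two hypothetical model outputs that differ only in how indices are encoded.
as_ints = json.loads('{"relevant_contexts": [1, 2]}')
as_strs = json.loads('{"relevant_contexts": ["1", "2"]}')

int_types = {type(i).__name__ for i in as_ints["relevant_contexts"]}
str_types = {type(i).__name__ for i in as_strs["relevant_contexts"]}


def normalize_indices(raw):
    # Coerce every index to int so later arithmetic behaves the same way.
    return [int(i) for i in raw]


normalized_a = normalize_indices(as_ints["relevant_contexts"])
normalized_b = normalize_indices(as_strs["relevant_contexts"])
```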

@GaalDorn1k
Author

@HerrIvan In my case type casting is also needed on line 211. In addition, the error also occurs when not running in a notebook.

@GaalDorn1k
Author

@jjmachan After thinking a little, I began to suspect that the cause of the error was using FastChat to run the model. It seems that the FastChat server is no different in appearance from the OpenAI server. But this may not be the case, since the error is not observed when using the vLLM server. Also, I am running an openchat model named "gpt-3.5-turbo" for LangChain integration. This may be the reason for the error. Now, I'm not sure that it is a bug, since I haven't found information anywhere about whether Ragas should work with FastChat. In any case, I'm going to spend a little time finding the root cause of the error, since I need to work with FastChat.

HerrIvan commented May 2, 2024

> @HerrIvan In my case type casting is also needed on line 211. In addition, the error also occurs when not running in a notebook.

Oh yes, you are absolutely right about the additional type casting needed.

> @jjmachan After thinking a little, I began to suspect that the cause of the error was using FastChat to run the model. It seems that the FastChat server is no different in appearance from the OpenAI server. But this may not be the case, since the error is not observed when using the vLLM server. Also, I am running an openchat model named "gpt-3.5-turbo" for LangChain integration. This may be the reason for the error. Now, I'm not sure that it is a bug, since I haven't found information anywhere about whether Ragas should work with FastChat. In any case, I'm going to spend a little time finding the root cause of the error, since I need to work with FastChat.

Hey @GaalDorn1k. I also encountered the issue when running the testset generation from the terminal. I think the issue is (or was) that the JSON generation can happen in two steps: a direct parse and, if that fails, an additional request to the LLM. One of these steps was sometimes returning integers as strings. In the meantime this repo has had some updates to the output parsing, so maybe the issue is no longer there. But I think the fix with the type casting cannot hurt.

I made a PR, since I actually need this fix to be able to run my workflows without using a forked repo.
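
The two-step idea can be sketched roughly as follows. This is not the Ragas parser: `parse_indices` and `fake_repair` are hypothetical names, and `fake_repair` is a hard-coded stand-in for the second LLM request.

```python
import json
from typing import Callable, List


def parse_indices(raw_output: str, repair: Callable[[str], str]) -> List[int]:
    """Parse a JSON list of indices, falling back to `repair` on failure."""
    try:
        data = json.loads(raw_output)  # step 1: direct parse
    except json.JSONDecodeError:
        data = json.loads(repair(raw_output))  # step 2: parse the repaired text
    # Either step may yield strings, so coerce before any arithmetic.
    return [int(i) for i in data]


def fake_repair(text: str) -> str:
    # Hypothetical stand-in for a second LLM request that rewrites the output
    # as valid JSON; here it happens to return the indices as strings.
    return '["2", "3"]'


direct = parse_indices("[1, 2]", fake_repair)
repaired = parse_indices("1, 2 and 3", fake_repair)
```

Coercing after both branches means the caller sees integers regardless of which step produced the data.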

@GaalDorn1k
Author

Hi @HerrIvan. It would be great if your PR closes the issue. Until then I'm also forced to use a forked repo.
