[R-228] Testset generation. TypeError: unsupported operand type(s) for -: 'str' and 'int' #900

GaalDorn1k · 2024-04-25T12:32:05Z

[x] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
TestsetGenerator.generate_with_langchain_docs() returns an empty TestDataset object, and the run reports TypeError: unsupported operand type(s) for -: 'str' and 'int'

Ragas version: 0.1.8
Python version: 3.10.0

Code to Reproduce

from ragas import adapt
from ragas.testset.evolutions import simple, reasoning, multi_context, conditional
from ragas.testset.generator import TestsetGenerator
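from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Defined elsewhere in the reporter's script:
#   inference_server_url: URL of the OpenAI-compatible inference server
#   documents: the list of LangChain documents to generate questions from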

chat = ChatOpenAI(
    model="gpt-3.5-turbo",
    openai_api_key="token",
    openai_api_base=inference_server_url,
    max_tokens=1000,
    temperature=0.2,
)
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

distributions = {
    simple: 0.6,
    multi_context: 0.2,
    reasoning: 0.2
}

generator = TestsetGenerator.from_langchain(
    generator_llm=chat,
    critic_llm=chat,
    embeddings=embeddings,
)

generator.adapt(language='russian', evolutions=[simple, reasoning, conditional, multi_context])

testset = generator.generate_with_langchain_docs(documents, 1, distributions=distributions, raise_exceptions=False, with_debugging_logs=True)

Error trace

Runner in Executor raised an exception
Traceback (most recent call last):
  File "C:\Users\beznosov_pb\Documents\RAGAS\ragas3\ragas\src\ragas\executor.py", line 79, in _aresults
    r = await future
  File "c:\Users\beznosov_pb\AppData\Local\miniconda3\envs\llmetric\lib\asyncio\tasks.py", line 575, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "C:\Users\beznosov_pb\Documents\RAGAS\ragas3\ragas\src\ragas\executor.py", line 38, in sema_coro
    return await coro
  File "C:\Users\beznosov_pb\Documents\RAGAS\ragas3\ragas\src\ragas\executor.py", line 112, in wrapped_callable_async
    return counter, await callable(*args, **kwargs)
  File "C:\Users\beznosov_pb\Documents\RAGAS\ragas3\ragas\src\ragas\testset\evolutions.py", line 144, in evolve
    return await self.generate_datarow(
  File "C:\Users\beznosov_pb\Documents\RAGAS\ragas3\ragas\src\ragas\testset\evolutions.py", line 210, in generate_datarow
    selected_nodes = [
  File "C:\Users\beznosov_pb\Documents\RAGAS\ragas3\ragas\src\ragas\testset\evolutions.py", line 213, in <listcomp>
    if int(i) - 1 < len(current_nodes.nodes)
TypeError: unsupported operand type(s) for -: 'str' and 'int'
Generating: 100%|██████████| 1/1 [01:39<00:00, 99.74s/it]
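The failing expression can be reproduced without ragas. This is a minimal sketch, assuming the 1-based indices returned by the LLM arrived as strings such as "1"; the variable names mirror the traceback and the values are illustrative. Subtracting an int from a str raises exactly the TypeError shown above.

```python
# Illustrative stand-ins for the values in generate_datarow: the indices come
# from the critic LLM's parsed JSON, the nodes from current_nodes.nodes.
relevant_context_indices = ["1", "2"]  # 1-based indices that arrived as strings
nodes = ["node-a", "node-b", "node-c"]

error_message = None
try:
    # Same shape as the original comprehension: no int() cast anywhere.
    selected = [nodes[i - 1] for i in relevant_context_indices if i - 1 < len(nodes)]
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # unsupported operand type(s) for -: 'str' and 'int'
```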

Expected behavior
TestsetGenerator.generate_with_langchain_docs() returns a non-empty TestDataset object

Additional context
The error can be corrected by editing ragas\src\ragas\testset\evolutions.py:

210   selected_nodes = [
211            current_nodes.nodes[int(i) - 1]
212            for i in relevant_context_indices
213            if int(i) - 1 < len(current_nodes.nodes)
214    ]
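The same patch can be written so that each index is cast only once and reused for both the bounds check and the lookup. This is a minimal sketch, assuming the indices are 1-based and arrive as ints or numeric strings; `select_nodes` is a hypothetical helper, not part of ragas. The `0 <=` lower-bound check is an extra guard that the fix above does not include: without it, an index of 0 becomes -1 and silently selects the last node.

```python
from typing import List, Sequence, TypeVar, Union

T = TypeVar("T")


def select_nodes(nodes: Sequence[T], indices: Sequence[Union[int, str]]) -> List[T]:
    """Map 1-based indices (int or numeric str) to nodes, skipping out-of-range ones."""
    selected = []
    for raw in indices:
        idx = int(raw) - 1  # cast once, then reuse for both the check and the lookup
        if 0 <= idx < len(nodes):  # lower bound is an extra guard, not in the original fix
            selected.append(nodes[idx])
    return selected


print(select_nodes(["a", "b", "c"], ["1", 3, "7"]))  # ['a', 'c']
```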

If I were confident in the reliability of this fix, I would create a PR.



jjmachan · 2024-04-28T19:00:24Z

Hey @GaalDorn1k, I'm actually not sure about this. Could you give me some time to get back to you on it?

I've added this to our Linear workflow, but feel free to make a PR too if you want; we can work off from there 🙂

HerrIvan · 2024-05-01T13:46:51Z

I had the same issue and indeed the fix is the following:

             selected_nodes = [
                 current_nodes.nodes[i - 1]
                 for i in relevant_context_indices
-                if i - 1 < len(current_nodes.nodes)
+                if int(i) - 1 < len(current_nodes.nodes)
             ]
             relevant_context = (
                 CurrentNodes(root_node=selected_nodes[0], nodes=selected_nodes)

@GaalDorn1k It is strange, however, that your error trace already contains the fix... I am not sure how that can happen, but I assume the error was triggered because the ragas package that was executed did not contain the fix. I guess that is possible if you run the code in a notebook without restarting the kernel after changing the code.

Btw, the error is not 100% deterministic. I think it has to do with the JSON parsing sometimes returning integers as integers and sometimes as strings.
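The int-versus-string behaviour described above comes straight from JSON: a list written as `[1, 2]` parses to ints, while `["1", "2"]` parses to strings, and an LLM can produce either. This is a minimal sketch; the two outputs and the `relevant_context` key are hypothetical examples, not captured ragas output.

```python
import json

# Two plausible LLM outputs for the same answer (hypothetical examples).
as_ints = json.loads('{"relevant_context": [1, 2]}')["relevant_context"]
as_strs = json.loads('{"relevant_context": ["1", "2"]}')["relevant_context"]

print([type(i).__name__ for i in as_ints])  # ['int', 'int']
print([type(i).__name__ for i in as_strs])  # ['str', 'str']

# Normalizing with int() makes both forms usable as list indices.
assert [int(i) for i in as_ints] == [int(i) for i in as_strs] == [1, 2]
```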

GaalDorn1k · 2024-05-02T06:40:38Z

@HerrIvan In my case, type casting is also needed on line 211. In addition, the error also occurs when not running in a notebook.
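The need for the second cast can be checked in isolation: with a string index, the `int(i)` in the condition passes, but the lookup `nodes[i - 1]` still subtracts an int from a str. This is a minimal sketch with illustrative values; the comprehensions mirror HerrIvan's diff and the reporter's two-cast patch.

```python
nodes = ["node-a", "node-b"]
relevant_context_indices = ["1"]  # string index, as in the reported failure

# HerrIvan's diff casts in the condition only; the lookup still does i - 1.
try:
    selected = [nodes[i - 1] for i in relevant_context_indices if int(i) - 1 < len(nodes)]
    outcome = selected
except TypeError as exc:
    outcome = f"TypeError: {exc}"
print(outcome)  # TypeError: unsupported operand type(s) for -: 'str' and 'int'

# Casting in both places (lines 211 and 213 of the reporter's patch) succeeds.
selected = [nodes[int(i) - 1] for i in relevant_context_indices if int(i) - 1 < len(nodes)]
print(selected)  # ['node-a']
```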

GaalDorn1k · 2024-05-02T07:01:07Z

@jjmachan After thinking a little, I began to suspect that the cause of the error was using FastChat to serve the model. The FastChat server looks no different from the OpenAI server, but this may not actually be the case, since the error does not occur when using the vLLM server. Also, I am running an OpenChat model named "gpt-3.5-turbo" for the LangChain integration, which may also be the cause. So I'm not sure this is a bug, since I haven't found any information on whether Ragas is supposed to work with FastChat. In any case, I'm going to spend some time finding the root cause, since I need to work with FastChat.

HerrIvan · 2024-05-02T07:16:14Z

> @HerrIvan In my case, type casting is also needed on line 211. In addition, the error also occurs when not running in a notebook.

Oh yes, you are absolutely right about the additional type casting needed.

> @jjmachan After thinking a little, I began to suspect that the cause of the error was using FastChat to serve the model. The FastChat server looks no different from the OpenAI server, but this may not actually be the case, since the error does not occur when using the vLLM server. Also, I am running an OpenChat model named "gpt-3.5-turbo" for the LangChain integration, which may also be the cause. So I'm not sure this is a bug, since I haven't found any information on whether Ragas is supposed to work with FastChat. In any case, I'm going to spend some time finding the root cause, since I need to work with FastChat.

Hey @GaalDorn1k. I also encountered the issue when running the testset generation from the terminal. I think the issue is (or was) that the JSON parsing can happen in two steps: a direct parse of the output and, if that fails, an additional request to the LLM. One of these steps was sometimes returning integers as strings. In the meantime, this repo has had some updates to the output parsing, so maybe the issue is no longer there. But I think the fix with the type casting cannot hurt.
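The two-step parsing described above, and why normalizing afterwards is harmless, can be sketched as follows. This is a minimal sketch, assuming the model is asked for a JSON list of 1-based indices; `parse_indices` and `fix_with_llm` are hypothetical names, not the ragas output-parser API, and the repair step is faked with a fixed string.

```python
import json
from typing import Callable, List


def parse_indices(raw_output: str, fix_with_llm: Callable[[str], str]) -> List[int]:
    """Parse a JSON list of indices, retrying once via a repair callback.

    `fix_with_llm` is a hypothetical stand-in for the extra LLM request;
    it is not a ragas function.
    """
    try:
        parsed = json.loads(raw_output)  # step 1: direct parse
    except json.JSONDecodeError:
        parsed = json.loads(fix_with_llm(raw_output))  # step 2: ask the LLM to repair it
    # Either step may yield ints or numeric strings; normalize before use.
    return [int(i) for i in parsed]


def fake_fixer(text: str) -> str:
    # Pretend the repair request returned the indices as strings.
    return '["1", "3"]'


print(parse_indices("[1, 3]", fake_fixer))            # [1, 3]
print(parse_indices("indices: 1 and 3", fake_fixer))  # [1, 3]
```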

I made a PR, since I actually need this fix to run my workflows without using a forked repo.

GaalDorn1k · 2024-05-02T07:25:38Z

Hi @HerrIvan. It would be great if your PR closes the issue. Until then, I'm also forced to use a forked repo.


GaalDorn1k added the bug Something isn't working label Apr 25, 2024

jjmachan added the linear Created by Linear-GitHub Sync label Apr 28, 2024

jjmachan changed the title from "Testset generation. TypeError: unsupported operand type(s) for -: 'str' and 'int'" to "[R-228] Testset generation. TypeError: unsupported operand type(s) for -: 'str' and 'int'" Apr 28, 2024

HerrIvan mentioned this issue May 2, 2024

fix TypeError in evolutions.py generate_data_row #929

Closed

choshiho mentioned this issue May 17, 2024

Testset generation ValueError: invalid literal for int() with base 10: #966

Open

shahules786 self-assigned this May 21, 2024

jjmachan added this to the v.4 milestone May 21, 2024

shahules786 mentioned this issue May 21, 2024

fix: patch type issue in evolution parsing #980

Merged

shahules786 closed this as completed in #980 May 21, 2024

shahules786 added a commit that referenced this issue May 21, 2024

fix: patch type issue in evolution parsing (#980)

54e9f4d

fixes: #900


GaalDorn1k commented Apr 25, 2024 • edited by jjmachan
jjmachan commented Apr 28, 2024
HerrIvan commented May 1, 2024 • edited
GaalDorn1k commented May 2, 2024
GaalDorn1k commented May 2, 2024
HerrIvan commented May 2, 2024 • edited
GaalDorn1k commented May 2, 2024
