
[Feature Request] stream mode generator can't have final llm output as input to other nodes #3101

Open
vhan2kpmg opened this issue May 2, 2024 · 4 comments

@vhan2kpmg

Is your feature request related to a problem? Please describe.
We have a use case like: [llm_node] -> [save_complete_answer_in_external_history_node]. When [llm_node] has stream mode turned on, we can't save the history inside the DAG; instead we have to process the final output from llm_node outside of the DAG.

Describe the solution you'd like
Can we have an output parameter on the llm node that exposes the final output? Take this as an example:

dag.yaml

$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
inputs:
  chat_history:
    type: list
    default: []
  question:
    type: string
    is_chat_input: true
    default: What is ChatGPT?
outputs:
  answer:
    type: string
    reference: ${chat.output.answer}
    is_chat_output: true
  final_answer:
    type: string
    reference: ${chat.output.final_answer}
    is_chat_output: false
nodes:
- inputs:
    # This is to easily switch between openai and azure openai.
    # deployment_name is required by azure openai, model is required by openai.
    deployment_name: gpt-35-turbo
    model: gpt-3.5-turbo
    max_tokens: "256"
    temperature: "0.7"
    chat_history: ${inputs.chat_history}
    question: ${inputs.question}
  ##### can we have this  #####
  output:
    answer: answer_generator
    final_answer: final_answer_string
  #### can we have this finished  #####
  name: chat
  type: llm
  source:
    type: code
    path: chat.jinja2
  api: chat
  connection: open_ai_connection
 ##### below are consumer of final answer example  #####
- name: save_history
  type: python
  source:
    type: code
    path: save_history.py
  inputs:
    final_answer: ${chat.output.final_answer}
  ##### consumer of final answer example ends #####
node_variants: {}
environment:
    python_requirements_txt: requirements.txt

And

from promptflow import load_flow

f = load_flow(source="../../examples/flows/chat/chat-basic/")
f.context.streaming = True
result = f(
    chat_history=[
        {
            "inputs": {"chat_input": "Hi"},
            "outputs": {"chat_output": "Hello! How can I assist you today?"},
        }
    ],
    question="How are you?",
)


answer = ""
# the result will be a generator, iterate it to get the result
for r in result["answer"]:
    answer += r
    
# result['final_answer'] should be the same as "answer" after the generator finishes
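
save_history.py itself is not shown above; a minimal sketch of what it might contain under the proposed feature (the file name and the final_answer input come from the YAML above, the body is only an assumption):

from promptflow.core import tool

@tool
def save_history(final_answer: str):
    # Hypothetical consumer of the proposed final_answer output:
    # persist the fully assembled answer to an external history store
    # (a local file is used here purely for illustration).
    with open("history.txt", "a") as f:
        f.write(final_answer + "\n")
    return final_answer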

Describe alternatives you've considered
not sure
Additional context
not sure

@vhan2kpmg vhan2kpmg added the enhancement New feature or request label May 2, 2024
@0mza987
Contributor

0mza987 commented May 6, 2024

Hi @vhan2kpmg ,

Just use ${chat.output} as the input of the save_history node:

$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
inputs:
  chat_history:
    type: list
    default: []
  question:
    type: string
    is_chat_input: true
    default: What is ChatGPT?
outputs:
  answer:
    type: string
    reference: ${chat.output}
    is_chat_output: true
nodes:
- inputs:
    # This is to easily switch between openai and azure openai.
    # deployment_name is required by azure openai, model is required by openai.
    deployment_name: gpt-35-turbo
    model: gpt-3.5-turbo
    max_tokens: "256"
    temperature: "0.7"
    chat_history: ${inputs.chat_history}
    question: ${inputs.question}
  name: chat
  type: llm
  source:
    type: code
    path: chat.jinja2
  api: chat
  connection: open_ai_connection
- name: save_history
  type: python
  source:
    type: code
    path: save_history.py
  inputs:
    record: ${chat.output}


node_variants: {}
environment:
    python_requirements_txt: requirements.txt

And then use the flow as a function to run, with streaming mode enabled:

from promptflow import load_flow

f = load_flow(source=r"E:\programs\msft-promptflow\examples\flows\chat\chat-basic-streaming")
f.context.streaming = True

result = f(
    chat_history=[
        {
            "inputs": {"chat_input": "Hi"},
            "outputs": {"chat_output": "Hello! How can I assist you today?"},
        }
    ],
    question="How are you?",
)


answer = ""
# the result will be a generator, iterate it to get the result
for r in result["answer"]:
    answer += r

print(answer)

Inside the save_history node I save the record to a local txt file:

from promptflow.core import tool

@tool
def save(record: str):
    # append the record to the history file
    with open("history.txt", "a") as f:
        f.write(record + "\n")
    print(f"Recorded: {record}")

Every time I run this flow the record gets written to the txt file. Could you please provide more details about the statement:

when we have [llm_node] stream mode turned on, we can't save history in DAG

What's the error message, and do you have a sample to repro it?

@0mza987 0mza987 self-assigned this May 6, 2024
@vhan2kpmg
Author

vhan2kpmg commented May 13, 2024

Hi, thanks for your reply. Sorry, I may not have explained it clearly initially: we can save the history, but we lose the benefit of stream mode in the process. From what I tested,

  • if there is no save_history node in the DAG, the streamed result comes in chunks, e.g. result["answer"] is a generator.
  • However, if there is a node after the llm node, the streamed result is just the final output, e.g. result["answer"] is a string.

That makes sense in a way, because DAG outputs are only ready when all nodes have finished? But the purpose of stream mode is to get the answer chunk by chunk before the final result is complete. If there is a node after the llm node, does the DAG wait until all nodes finish? In our example the DAG waits until the response is saved, which only happens once all llm response chunks are finished.

Is there any way we can output the generator immediately, and leave saving the history as a background task?
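
For reference, a minimal way to observe the difference described above (a sketch assuming the same chat-basic flow; the GeneratorType check is only illustrative):

from types import GeneratorType

from promptflow import load_flow

f = load_flow(source="../../examples/flows/chat/chat-basic/")
f.context.streaming = True
result = f(chat_history=[], question="How are you?")

# With no node after the llm node, result["answer"] is a generator and can be
# consumed chunk by chunk; with a save_history node downstream it arrives as
# an already materialized string instead.
if isinstance(result["answer"], GeneratorType):
    for chunk in result["answer"]:
        print(chunk, end="", flush=True)
else:
    print(result["answer"])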

@wxpjimmy
Contributor

wxpjimmy commented May 21, 2024

“ if there is a node after llm node, stream result will be just final output, e.g. result["answer"] will be string.”

You can also return a generator from that node; the node then works just like a generator hook, and any iteration of the final node's output will trigger iteration of the llm output. Here's a code sample:

from promptflow.core import tool

@tool
def save(llm_output):
    data = []
    for chunk in llm_output:
        data.append(chunk)
        yield chunk
    # append the record to the history file
    with open("history.txt", "a") as f:
        f.write(''.join(data) + "\n")
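
Presumably the flow output would then need to reference the hook node instead of the llm node, so the generator is consumed exactly once and the flow output still streams; a hedged sketch of the relevant outputs section:

outputs:
  answer:
    type: string
    reference: ${save_history.output}
    is_chat_output: true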

Is this what you want? @vhan2kpmg

@0mza987
Contributor

0mza987 commented May 22, 2024

However, if there is a node after llm node, stream result will be just final output, e.g. result["answer"] will be string.

I think that's due to the nature of Python iterators: once you start reading the content the iteration has begun, and you cannot iterate the same iterator twice.
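
A quick plain-Python illustration of that point (unrelated to promptflow itself):

gen = (c for c in "hello")

print("".join(gen))  # prints "hello"; this pass consumes the generator
print("".join(gen))  # prints "" because the generator is already exhausted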
