Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor agent delegation and tweak micro agents #1910

Merged
merged 1 commit into from
May 29, 2024

Conversation

li-boxuan
Copy link
Collaborator

@li-boxuan li-boxuan commented May 20, 2024

This PR fixes #1897. In addition, this PR fixes and tweaks a few micro-agents.

For the first time, I am able to use ManagerAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds ManagerAgent as part of integration tests. test_write_simple_script involves delegation to CoderAgent while test_edits involves delegation to TypoFixerAgent.

Also for the first time, I am able to use DelegateAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds DelegateAgent as part of integration tests. It involves delegation to StudyRepoForTaskAgent, CoderAgent and VerifierAgent.

This PR is a blocker for #1735 and likely #1945.

TODOs:

  • Delegate controller starting from step 0
  • Parent history should include child's history
  • Child should not know parent's history
  • Make sure DelegateAgent work
  • Add sanity checks for subscriptions

Future work:

  • Add a config that controls global max steps

@li-boxuan li-boxuan marked this pull request as ready for review May 20, 2024 04:28
@li-boxuan li-boxuan requested a review from rbren May 20, 2024 04:28
@li-boxuan
Copy link
Collaborator Author

@rbren This might have some overlap/conflict with #1660, but I believe we anyways need to fix the current broken AgentDelegateAction. Your review is greatly appreciated!

@li-boxuan li-boxuan added agent framework strategies for prompting, agent, etc micro agent add or update a micro-agent labels May 20, 2024
Comment on lines 67 to 81
def subscribe(self, id: EventStreamSubscriber, callback: Callable):
if id in self._subscribers:
raise ValueError('Subscriber already exists: ' + id)
self._subscribers[id].append(callback)
else:
self._subscribers[id] = callback
self._subscribers[id] = [callback]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone back and forth on this, but I think we should raise an exception here. It's indicative of something going very wrong. Not raising here caused me to miss a pretty severe bug in another PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see the issue--you've got multiple AgentControllers registering themselves...

Let me think on this one for a minute.

Copy link
Collaborator Author

@li-boxuan li-boxuan May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s by design and unavoidable when you have (multi-layer) delegates - so I maintain a stack here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the parent AgentController just calls on_event to its child? Then we only subscribe once from the top-level AgentController.

We should probably pass None as the event_stream to the Delegate in order to prevent issues

This comment was marked as outdated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a viable approach I think, but it would be hard to implement/manage multi-layer agent delegate. If grandpa delegates to parent, which in turn delegates to a child, e.g. main agent -> database agent -> postgres migration agent, it would be hard/inelegant to let the root AgentController handle everything.

Can you explain more the issue here? Each AgentController would get to see every event, so it should be flexible enough to handle every possible case.

Copy link
Collaborator

@rbren rbren May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, the logic I'm envisioning is something like

def on_event(evt):
  if isinstance(evt, Foo):
    # ...all the normal logic...
  if self.delegate:
    self.delegate.on_event(evt)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that’s a great idea! Will try

Copy link
Collaborator Author

@li-boxuan li-boxuan May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I am not happy with the current implementation either (need to check whether itself is a delegate before it subscribe/unsubscribe) but looks like there's no perfect approach!

We cannot just pass None as event stream because AgentController needs to call event_stream.add_event.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it back to the original implementation - let each AgentController worry about their own subscriptions and listeners. I added an append parameter to event_stream.subscribe method, and only delegates could set this parameter to true. This should largely prevent bugs like duplicate subscription...

Copy link
Collaborator

@rbren rbren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM overall! Just one suggestion on history

@neubig neubig assigned li-boxuan and unassigned rbren May 23, 2024
@rbren
Copy link
Collaborator

rbren commented May 23, 2024

@li-boxuan for some reason I can't open a PR into your branch, but here's my repro:

https://github.com/OpenDevin/OpenDevin/tree/rb/delegate-repro

See the edits in DummyAgent--it builds a stack of delegates, which just finish immediately

This consistently trips over the error:

poetry run python opendevin/core/main.py -c DummyAgent -t "write a todo list app in react"

@li-boxuan
Copy link
Collaborator Author

for some reason I can't open a PR into your branch

this is on my fork, but you should have access to push to my branch directly. You just need to first do git remote add li-boxuan https://github.com/li-boxuan/OpenDevin.git

@li-boxuan
Copy link
Collaborator Author

li-boxuan commented May 24, 2024

I'll come back and work on this this weekend. Hope I could make some progress Friday evening (PT time).

@li-boxuan li-boxuan marked this pull request as draft May 26, 2024 04:51
{
"action": "read",
"args": {
"path": "good.txt"
Copy link
Collaborator Author

@li-boxuan li-boxuan May 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, verifier Agent has severely hallucinated that there's a "good.txt" - a name coming out of nowhere.

In the long term, we do need to think of some intermediate representation of code changes - e.g. ".patch" or ".diff" so that the verifier agent could really see the "updates". Now it only sees the snapshot after the "fix" is already applied.

Alternatively, maybe we should work on the prompts and don't assign tasks to verifier agent if it's not in a git repository.

@li-boxuan li-boxuan changed the title Revamp agent delegation Refactor agent delegation and tweak micro agents May 27, 2024
@li-boxuan li-boxuan marked this pull request as ready for review May 27, 2024 07:41
await self.event_stream.add_event(obs, EventSource.AGENT)

# update current controller's state
self.state.history += self.delegate.state.history
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please help me understand this a little, now that the delegate is done, does the parent need to send the child's historical messages to the LLM going forward?

Or is there another purpose for updating itself with its history?

Copy link
Collaborator Author

@li-boxuan li-boxuan May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally suggested by @rbren here: #1910 (comment)

I think the rationale is, the parent might be interested in learning what the child has done, to drive the task forward. It is debatable, though, that the parent may only need to know the result/summary of its child. But should parent trust child's conclusion? What if the child executes with a weaker LLM (not possible at the moment but maybe in the future to lower down the cost - parent agent using more powerful LLM while delegates use small models). Would a memory condenser be interesting? These open questions are hard to answer without benchmarking...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I've gone back and forth on this.

I think keeping the parent's memory out of the child makes a lot of sense. I could also see a case for keeping the child's memory out of the parent.

I forget what the problem was that led me to that original comment, but it definitely broke something I was working on...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say we get this in either way, and we can take another look at history management soon

Copy link
Collaborator Author

@li-boxuan li-boxuan May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah let's keep it this way at the moment.

Copy link
Collaborator Author

@li-boxuan li-boxuan May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed my mind when I was working on #2103. The parent agent would get confused when it sees the child agent's history because the parent doesn't know those commands used by child agent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now, the parent agent only learns about the result provided by the child agent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be pure luck, but after removing child history, I ran regenerate and ManagerAgent was able to finish the test tasks in fewer turns.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Thanks for the details. Hmm, food for thought... What LLM did you use in the tests?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using gpt-4o (and gpt-4-turbo, sometimes).


## Examples

Here is an example of how you can interact with the environment for task solving:
Copy link
Collaborator Author

@li-boxuan li-boxuan May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I had to do this one-shot learning otherwise StudyRepoForTaskAgent, backed by GPT-4o or GPT-4-turbo, consistently writes code to finish the task by itself, no matter how much I emphasize it should not make changes/write data/implement the task by itself.

What's more interesting is, if I copy the full prompt (without this example) to ChatGPT-4, it can give reasonable results - calling finish action once it sees the repository is empty.

@li-boxuan li-boxuan force-pushed the fix-delegation branch 2 times, most recently from dcd87b4 to 2ae6051 Compare May 28, 2024 07:08
@li-boxuan
Copy link
Collaborator Author

@rbren I believe there are still a few debatable designs such as history management, but I am going to merge this PR as it is, since agent delegation is broken at the moment, and this PR doesn't introduce regression. We could discuss & improve later.

@li-boxuan li-boxuan merged commit 9b371b1 into OpenDevin:main May 29, 2024
18 checks passed
@li-boxuan li-boxuan deleted the fix-delegation branch May 29, 2024 03:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent framework strategies for prompting, agent, etc micro agent add or update a micro-agent
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: AgentDelegateAction no more works
5 participants