Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add autogenerated messages #518

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

sorgfresser
Copy link

Adds OpenAI integration to automatically generate messages.

Right now, this only works for Immobilienscout as it requires get_expose_details to also return some kind of description and the lessor.

I simply added the functionality for OpenAI for personal use but thought others might be interested.
Certainly, there are a lot of improvements that should be done before merging.
Please point them out to me so I can edit whatever should be rewritten!

Features:

  • Adds OpenAIProcessor
  • Adds OpenAI yaml section and Env Vars
  • Adds a new hunter (debatable whether this is necessary, I did it because I needed crawl_expose_details)
  • Modifies the message format to also include application

@sorgfresser
Copy link
Author

sorgfresser commented Jan 21, 2024

Aaaand the Lockfile looks broken - I'll dive into it
Edit: should be fixed.

@codders
Copy link

codders commented Jan 24, 2024

Hi @sorgfresser ,

Thanks for the PR. I've got to say, I have pretty mixed feelings about this. On the one hand I'm a bit worried about hastening the time when all actual content on the internet is drowned out by LLM garbage - https://marketoonist.com/2023/03/ai-written-ai-read.html . On the other hand, the market for apartments is already a trash market full of trash and flathunter exists to make that problem worse.

I'll review the code anyway, and if we get it in a good shape I'm happy to merge it.

Copy link

@codders codders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good - thanks for this. Some changes I would like to see from my side, and please take a look at CI to see what pyright and pylint want.

@@ -31,7 +31,7 @@ def launch_flat_hunt(config, heartbeat: Heartbeat):

wait_during_period(time_from, time_till)

hunter = Hunter(config, id_watch)
hunter = OpenAIHunter(config, id_watch)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to create an OpenAIHunter here, instead of just adding generate_text to the filters for the default Hunter?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Sorry it's been a while...
It's not just about the generate_text, it's mostly about crawl_expose_details which is needed for the description of an exposé and the description in turn is needed to extract the features worth mentioning in an application (at least that was my idea).
I do agree that changing the Hunter is silly and maybe the whole Hunter itself, but I did not like the idea of crawling details everytime.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As another idea: I could introduce some additional logic in generate_text() for this. I was thinking something like

    def generate_text(self):
        """Add processor to generate text, if enabled"""
        if self.config.openai_enabled():
            if not any([isinstance(processor, CrawlExposeDetails) for processor in self.processors]):
                self.crawl_expose_details()
            self.processors.append(OpenAIProcessor(self.config))
        return self

How about this to prevent the need for a new hunter?

flathunter/chrome_wrapper.py Outdated Show resolved Hide resolved
@@ -181,12 +181,26 @@ def get_page(self, search_url, driver=None, page_no=None):

def get_expose_details(self, expose):
"""Loads additional details for an expose by processing the expose detail URL"""
soup = self.get_soup_from_url(expose['url'])
driver = self.get_driver()
if driver is not None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do I understand from this that you need to open a second tab to load the expose to get the details? Is there a reason to do this here instead of in the OpenAIProcessor? If we have to do it here, can we have a switch to skip this if the OpenAI feature is disabled? Otherwise everybody not using OpenAI will make a bunch of unnecessary fetches.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the whole get_expose_details is broken for me with immobilienscout if one does not add a driver. This is due to a request initially returning "Checking that you're not a robot" and only once this check is finished the final content of the page will be returned. But I can simply cherry-pick this into a different pull request.

from flathunter.processor import ProcessorChain
from flathunter.exceptions import BotBlockedException, UserDeactivatedException

class OpenAIHunter(Hunter):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine but per my other comment I think I would prefer just to change the default Hunter.

messages = [
{
"role": "system",
"content": f"You are helping in generating an application for an exposé. For this, you will be provided with a dictionary which contains information on the kind of flat you are applying for and a prewritten text. The application is supposed to be written in {self.language}. Fill in the blanks, marked by [] in the text with the information from the dictionary. Do not directly copy text (except room information or similar things) from the dictionary - instead, paraphrase it."
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering if it makes sense also to have a German option here - presumably a lot of people what to generate their texts in German.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea! I'm currently running this with a german template and it works just fine - nonetheless, I can add an option for German if you'd like me to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants