FileNotFoundError (demons.json) for custom knowledge base #28

Open
MathiasKraus opened this issue Oct 26, 2023 · 2 comments
MathiasKraus commented Oct 26, 2023

Hello,

First off, I'd like to say how much I appreciate this great package. I'm testing a scenario in which I evaluate the quality of generated summaries against a custom knowledge base, and any guidance or pointers would be greatly appreciated!

For this purpose, I created the following knowledge.jsonl file:

```
{"title": "Gravity", "text": "Gravity is a force by which a planet or other body draws objects toward its center. The force of gravity keeps all of the planets in orbit around the sun."}
{"title": "Photosynthesis", "text": ["Photosynthesis is the process by which green plants and some other organisms use sunlight to synthesize foods with the help of chlorophyll pigments.", "In simple words, it is the process where plants make their own food using sunlight."]}
{"title": "Pythagorean Theorem", "text": "In mathematics, the Pythagorean theorem, also known as Pythagoras's theorem, is a fundamental relation in Euclidean geometry among the three sides of a right triangle. It states that the square of the hypotenuse is equal to the sum of the squares of the other two sides."}
```
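Assuming the knowledge source must follow the format used above (one JSON object per line, with a string `title` and a `text` that is either a string or a list of strings), a quick pre-check like the following can catch malformed lines before the file is registered. `validate_knowledge_jsonl` is a hypothetical helper for illustration, not part of factscore.

```python
import json
from typing import Iterable, List, Tuple


def validate_knowledge_jsonl(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for lines that do not match the
    assumed {"title": str, "text": str | list[str]} format."""
    problems = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as e:
            problems.append((i, f"invalid JSON: {e.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((i, "not a JSON object"))
            continue
        if not isinstance(record.get("title"), str):
            problems.append((i, "missing or non-string 'title'"))
        text = record.get("text")
        is_str = isinstance(text, str)
        is_str_list = isinstance(text, list) and all(isinstance(t, str) for t in text)
        if not (is_str or is_str_list):
            problems.append((i, "'text' must be a string or a list of strings"))
    return problems


if __name__ == "__main__":
    # In-memory sample lines; with a real file, pass the open file object instead.
    sample = [
        '{"title": "Gravity", "text": "Gravity is a force ..."}',
        '{"title": "Photosynthesis", "text": ["sentence one", "sentence two"]}',
        '{"title": "Broken", "text": 42}',
    ]
    for lineno, problem in validate_knowledge_jsonl(sample):
        print(f"line {lineno}: {problem}")
```

Because a file object iterates line by line, the same function works on `open("/content/knowledge.jsonl")` without loading the whole file into memory.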

and, following the example in the README, run the code:

```python
from factscore.factscorer import FactScorer

# the traceback below shows this value is passed on as key_path
fs = FactScorer(openai_key="...")
fs.register_knowledge_source("science_knowledge_base",
                             data_path="/content/knowledge.jsonl",
                             db_path="/content/knowledge_db")
topics = ["Gravity", "Photosynthesis", "Pythagorean Theorem"]
generations = ["Gravity is a force that draws objects toward the center of a planet or body, keeping planets in orbit around the sun.",
               "Photosynthesis allows plants and certain organisms to create food using sunlight and chlorophyll.",
               "This theorem in Euclidean geometry relates the three sides of a right triangle, stating that the hypotenuse's square is the sum of the squares of the other sides."]

out = fs.get_score(topics, generations, knowledge_source="science_knowledge_base")
```

However, the last line raises the following error:

```
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-47-58ee9f60532e> in <cell line: 2>()
      1 # now, when you compute a score, specify knowledge source to use
----> 2 out = fs.get_score(topics, generations, knowledge_source="science_knowledge_base")
      3 print (out["score"]) # FActScore
      4 print (out["respond_ratio"]) # % of responding (not abstaining from answering)
      5 print (out["num_facts_per_response"]) # average number of atomic facts per response

/usr/local/lib/python3.10/dist-packages/factscore/factscorer.py in get_score(self, topics, generations, gamma, atomic_facts, knowledge_source, verbose)
    127         else:
    128             if self.af_generator is None:
--> 129                 self.af_generator = AtomicFactGenerator(key_path=self.openai_key,
    130                                                         demon_dir=os.path.join(self.data_dir, "demos"),
    131                                                         gpt3_cache_file=os.path.join(self.cache_dir, "InstructGPT.pkl"))

/usr/local/lib/python3.10/dist-packages/factscore/atomic_facts.py in __init__(self, key_path, demon_dir, gpt3_cache_file)
     27 
     28         # get the demos
---> 29         with open(self.demon_path, 'r') as f:
     30             self.demons = json.load(f)
     31 

FileNotFoundError: [Errno 2] No such file or directory: '.cache/factscore/demos/demons.json'
```
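Note that the failing path is relative. The traceback shows `demon_dir=os.path.join(self.data_dir, "demos")`, and the error message implies that `data_dir` was `.cache/factscore` in this run and that the file name is `demons.json`. Under those assumptions, the sketch below rebuilds the path and shows that it resolves against the current working directory (in Colab, typically `/content`), independently of where the knowledge base lives. `expected_demons_path` is a hypothetical helper.

```python
import os


def expected_demons_path(data_dir: str = ".cache/factscore") -> str:
    # Mirrors the traceback: demon_dir = os.path.join(data_dir, "demos").
    # The file name "demons.json" is taken from the error message.
    demon_dir = os.path.join(data_dir, "demos")
    return os.path.join(demon_dir, "demons.json")


rel = expected_demons_path()
print(rel)                   # .cache/factscore/demos/demons.json on POSIX systems
print(os.path.abspath(rel))  # the same path resolved against os.getcwd()
print(os.path.exists(rel))   # False until the demos have been downloaded or copied there
```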

I'm trying to understand the role of demons.json and why it is necessary. Despite combing through the code, I couldn't quite grasp its purpose. Could you shed some light on this?

System: I am running this on Colab and installed the factscore package with `pip install --upgrade factscore`.

Thank you very much in advance!

@MathiasKraus (Author)

I got it working by putting the demons.json file in the expected folder (.cache/factscore/demos/). However, I'm wondering why it is needed for a custom knowledge base. Could you help me understand this?
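The workaround described here, placing `demons.json` where the generator looks for it, could be scripted roughly as follows. This is a sketch under stated assumptions: `install_demons` is a hypothetical helper, the `<data_dir>/demos/demons.json` layout is taken from the error message above, and `src` must point to a copy of `demons.json` you already have.

```python
import os
import shutil


def install_demons(src: str, data_dir: str = ".cache/factscore") -> str:
    """Copy an existing demons.json into <data_dir>/demos/ and return the new path.

    Hypothetical helper: the target layout mirrors the path in the
    FileNotFoundError reported in this issue.
    """
    if not os.path.isfile(src):
        raise FileNotFoundError(f"source demons.json not found: {src}")
    demos_dir = os.path.join(data_dir, "demos")
    os.makedirs(demos_dir, exist_ok=True)  # create the demos folder if it is missing
    dst = os.path.join(demos_dir, "demons.json")
    shutil.copyfile(src, dst)  # overwrites any existing copy at dst
    return dst
```

Since the default `data_dir` is relative, the call should be made from the same working directory the notebook uses when it later calls `get_score`.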

@martiansideofthemoon (Collaborator)

martiansideofthemoon commented Oct 26, 2023

Hi @MathiasKraus, thanks a lot for your interest in our work!

The demonstrations (demons.json) are needed for atomic fact generation with InstructGPT (text-davinci-003), which is used regardless of the knowledge base. Did you run the data download command from the README in your setup? It downloads all the needed data for you: https://github.com/shmsw25/FActScore#download-the-data
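For readers wondering what "demonstrations" means here: they are few-shot examples placed in the prompt so the model sees how a sentence should be split into atomic facts before it is asked to do the same for a new sentence. Assuming `demons.json` maps each example sentence to its list of atomic facts, a prompt could be assembled as sketched below. The instruction wording, the made-up demonstration, and `build_atomic_fact_prompt` are illustrative assumptions, not factscore's exact prompt.

```python
from typing import Dict, List


def build_atomic_fact_prompt(demons: Dict[str, List[str]], sentence: str, k: int = 2) -> str:
    """Build a few-shot prompt from the first k demonstrations, then the query sentence."""
    instruction = "Please break down the following sentence into independent facts:"
    parts = []
    for demo_sentence in list(demons)[:k]:
        parts.append(f"{instruction} {demo_sentence}")
        parts.extend(f"- {fact}" for fact in demons[demo_sentence])
        parts.append("")  # blank line between demonstrations
    # The query sentence ends the prompt; the model is expected to continue with "- " facts.
    parts.append(f"{instruction} {sentence}")
    return "\n".join(parts)


if __name__ == "__main__":
    demons = {  # made-up demonstration, for illustration only
        "He was born in 1990 in Paris.": ["He was born in 1990.", "He was born in Paris."],
    }
    print(build_atomic_fact_prompt(
        demons, "Gravity keeps all of the planets in orbit around the sun.", k=1))
```

This is why the file is needed even with a custom knowledge base: the facts are extracted from the generations before any knowledge source is consulted.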

You can skip the --llama_7B_HF_path "llama-7B" flag in that command if you are only using OpenAI models.
