About the freebase_full.json file #10

Open
Gungnir2099 opened this issue Feb 8, 2021 · 4 comments

Comments

@Gungnir2099

You shared a link to your research data. I downloaded it and found a file called freebase_full.json. In this file, each element lists its neighbors and the paths leading to them. I checked the Freebase dump provided by Google using SPARQL, and the paths recorded for each element in your JSON file are not complete: each element in freebase_full.json should have more paths than are listed. How did you decide which paths or neighbors to include in each record? And how did you build the freebase_full.json file?

All the best, appreciate!
Siqi Lai

@hugochan
Owner

hugochan commented Feb 27, 2021

Hi Siqi,

Thank you for your interest in our work! The short answer is that freebase_full.json keeps only the 2-hop subgraphs surrounding the candidate topic entities that appear in the WebQuestions dataset. In other words, the file stores a small subset of Freebase: the minimum needed to answer the questions in WebQuestions. For more details, see the following data preprocessing scripts:

  1. get topics entity list
  2. get 2-hop subgraphs given a topic entity list

As for the candidate topic entities, please refer to #9.
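
To illustrate the idea of keeping only 2-hop subgraphs around topic entities, here is a minimal sketch. It is not the repository's preprocessing script: the function names `build_adjacency` and `k_hop_triples`, the in-memory triple list, and the toy entity IDs are all assumptions made for illustration, and only outgoing edges are followed.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Index (subject, predicate, object) triples by subject."""
    adj = defaultdict(list)
    for subj, pred, obj in triples:
        adj[subj].append((pred, obj))
    return adj


def k_hop_triples(adj, topic, hops=2):
    """Collect every triple reachable from `topic` within `hops` outgoing edges."""
    frontier = {topic}
    seen = {topic}
    kept = set()
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            # .get avoids creating empty entries in the defaultdict
            for pred, obj in adj.get(node, ()):
                kept.add((node, pred, obj))
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.add(obj)
        frontier = next_frontier
    return kept


# Toy data (hypothetical IDs): only the first two triples lie within 2 hops of "m.a".
toy_triples = [
    ("m.a", "r1", "m.b"),
    ("m.b", "r2", "m.c"),
    ("m.c", "r3", "m.d"),
    ("m.x", "r4", "m.y"),
]
toy_adj = build_adjacency(toy_triples)
subgraph = k_hop_triples(toy_adj, "m.a", hops=2)
```

Running the extraction once per candidate topic entity and merging the results would give a subset of the knowledge base, which is consistent with the file containing fewer paths per element than the full dump.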

@Gungnir2099
Author

Thank you. It helps a lot.

@Gungnir2099
Author

Gungnir2099 commented Feb 28, 2021 via email

@hugochan
Owner

hugochan commented Mar 1, 2021

Thanks for your reply. I'm also having problems loading the Freebase dump (the 2013-06-09 version, which is the one the WebQ dataset creators used). If a triple's object is empty, the load fails with an "Unrecognized: [DOT]" error; if a triple's predicate ID contains a "$" character, it fails with an "Unknown char : $" error, and so on. These errors interrupt the loading process. What should I do in this situation? Do you have any code that processes the dump so that it can be loaded into Apache Jena or Virtuoso? Thanks a lot. Appreciate! Siqi Lai

I used the Freebase dump released with this ACL 2014 paper [1]. The authors provide the results dumped from the Freebase Search API. Here is the link to the data: http://cs.jhu.edu/~xuchen/packages/freebase-data.tar

[1] Yao, Xuchen, and Benjamin Van Durme. "Information extraction over structured data: Question answering with freebase." Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2014.
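
For readers who still want to load the raw dump, one possible workaround for the two errors mentioned above is to drop the offending lines before loading. The sketch below is not something used in this project. It assumes each dump line is a tab-separated subject, predicate, and object followed by a terminating "."; the helper names `split_triple` and `clean_lines` are hypothetical. It discards triples rather than repairing them, so some data is lost.

```python
def split_triple(line):
    """Split one dump line into (subject, predicate, object), or return None.

    Assumes tab-separated fields and a trailing "." terminator.
    """
    body = line.strip()
    if body.endswith("."):
        body = body[:-1]
    body = body.rstrip("\t ")
    parts = body.split("\t", 2)
    if len(parts) != 3 or any(not p.strip() for p in parts):
        return None  # missing subject, predicate, or object
    return tuple(p.strip() for p in parts)


def clean_lines(lines):
    """Yield only lines with a non-empty object and no '$' in the predicate."""
    for line in lines:
        triple = split_triple(line)
        if triple is None:
            continue  # e.g. empty object
        if "$" in triple[1]:
            continue  # predicate ID contains '$'
        yield line


# Toy input (made-up lines in the assumed format).
sample = [
    'ns:m.1\tns:type.object.name\t"A"@en.\n',
    "ns:m.1\tns:some.pred\t.\n",
    "ns:m.2\tns:bad$pred\tns:m.3.\n",
]
kept = list(clean_lines(sample))
```

On a real dump the same generator could wrap an open file handle and write the surviving lines to a new file, which keeps memory use constant regardless of dump size.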
