About the freebase_full.json file #10

Gungnir2099 · 2021-02-08T03:08:51Z

You shared a link to your research data. I downloaded it and found a file called freebase_full.json, in which each element lists its neighbors and the paths leading to them. I queried the Freebase dump provided by Google using SPARQL, and the paths recorded for each element in your JSON file are not complete: the full dump contains more paths for each element than freebase_full.json does. How did you decide which paths or neighbors to include in each record? How did you build the freebase_full.json file?

All the best, and thanks!
Siqi Lai

hugochan · 2021-02-27T23:42:35Z


Hi Siqi,

Thank you for your interest in our work! The short answer is that freebase_full.json keeps only the 2-hop subgraphs surrounding the candidate topic entities that appear in the WebQuestions dataset. In other words, it stores a small subset of Freebase that is minimally necessary for answering the questions in WebQuestions. You might want to look into the following data preprocessing scripts for more details.

1. get topic entity list <https://github.com/hugochan/BAMnet/blob/master/src/core/build_data/webquestions.py#L28>
2. get 2-hop subgraphs given a topic entity list <https://github.com/hugochan/BAMnet/blob/master/src/run_freebase.py>

As for the candidate topic entities, please refer to #9.
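
To illustrate the idea of keeping only 2-hop subgraphs around topic entities, here is a minimal sketch. It is not the code from the scripts linked in this thread: the function name `build_two_hop_subgraphs`, the `(subject, predicate, object)` triple format, and the nested output layout are all assumptions made for illustration, and the actual schema of freebase_full.json may differ. The sketch follows outgoing edges only.

```python
from collections import defaultdict


def build_two_hop_subgraphs(triples, topic_entities):
    """Collect, for each topic entity, everything reachable within two
    outgoing hops. Hypothetical output layout:
    {topic: {pred1: {neighbor: {pred2: [second_hop_entities]}}}}
    """
    # Index outgoing edges by subject so each lookup is cheap.
    out_edges = defaultdict(list)
    for subj, pred, obj in triples:
        out_edges[subj].append((pred, obj))

    subgraphs = {}
    for topic in topic_entities:
        first_hop = {}
        for pred1, neighbor in out_edges.get(topic, []):
            second_hop = first_hop.setdefault(pred1, {}).setdefault(neighbor, {})
            for pred2, end in out_edges.get(neighbor, []):
                second_hop.setdefault(pred2, []).append(end)
        subgraphs[topic] = first_hop
    return subgraphs


# Toy data: "d" is three hops from "a", so it is left out.
toy_triples = [
    ("a", "p", "b"),
    ("b", "q", "c"),
    ("c", "r", "d"),
    ("x", "p", "y"),
]
toy_subgraphs = build_two_hop_subgraphs(toy_triples, ["a"])
```

Because anything beyond two hops from a topic entity is dropped, a record built this way will contain fewer paths than the same entity has in the full Freebase dump.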

Gungnir2099 · 2021-02-28T03:21:49Z

Thank you. It helps a lot.

Gungnir2099 · 2021-02-28T12:19:02Z

Thanks for your reply. I'm also having problems loading the Freebase dump (the 2013-06-09 version, which is the one the creators of the WebQ dataset used). If a triple's object is empty, the loader fails with an "Unrecognized: [DOT]" error; if a triple's predicate ID contains a "$" character, it fails with an "Unknown char : $" error, and so on. These errors interrupt the loading process. What should I do in this situation? Do you have any code that processes the dump so that it can be loaded into Apache Jena or Virtuoso?

Thanks a lot!
Siqi Lai

On Sun, Feb 28, 2021 at 7:42 AM Yu Chen wrote:

You might want to look into the following data preprocessing scripts for more details.

1. get topic entity list <https://github.com/hugochan/BAMnet/blob/master/src/core/build_data/webquestions.py#L28>
2. get 2-hop subgraphs given a topic entity list <https://github.com/hugochan/BAMnet/blob/master/src/run_freebase.py>

hugochan · 2021-03-01T04:19:12Z


I used the Freebase dump released with this ACL 2014 paper [1]. The authors provide results dumped from the Freebase Search API. Here is the link to the data: http://cs.jhu.edu/~xuchen/packages/freebase-data.tar

[1] Yao, Xuchen, and Benjamin Van Durme. "Information extraction over structured data: Question answering with freebase." Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2014.
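
If loading the raw dump is still required, one possible workaround is to pre-filter it before handing it to the triple store. The sketch below is not something that was used for this repository. It assumes a tab-separated dump with one triple per line and an optional trailing `.` field, and the helper names `is_loadable` and `filter_dump` are hypothetical. It simply drops the two kinds of lines mentioned above (empty object, `$` in the predicate), which loses those triples rather than repairing them.

```python
def is_loadable(line):
    """Return True if a dump line looks safe to pass to an RDF loader.

    Assumed line format: subject<TAB>predicate<TAB>object<TAB>.
    """
    stripped = line.strip()
    if not stripped:
        return False
    # Keep directives such as "@prefix ..." untouched.
    if stripped.startswith("@"):
        return True

    fields = line.rstrip("\n").split("\t")
    # Drop the statement terminator if it is its own field.
    if fields and fields[-1].strip() == ".":
        fields = fields[:-1]
    if len(fields) != 3:
        return False

    subj, pred, obj = (field.strip() for field in fields)
    # An empty object is what leaves a bare "." for the parser to trip on.
    if not subj or not pred or not obj:
        return False
    # Predicates containing "$" are rejected by the loader, so skip them.
    if "$" in pred:
        return False
    return True


def filter_dump(lines):
    """Yield only the lines that pass is_loadable."""
    for line in lines:
        if is_loadable(line):
            yield line
```

For a real dump, `filter_dump` could be applied to an open file object (for example one returned by `gzip.open(path, "rt", encoding="utf-8")`) and the surviving lines written to a new file. Other error types would need their own checks.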
