How to predict AMR graph for arbitrary sentences? #5

Open
TonalidadeHidrica opened this issue Jul 21, 2019 · 10 comments

Comments

@TonalidadeHidrica

Let me ask a question: is there any way to predict an AMR graph for sentences that are not in the AMR corpus? The README describes how to run prediction on the test sentences in the AMR corpus, but I couldn't find a way to apply the trained parameters to other sentences. As far as I can see, prediction consists of several steps: given an AMR file, annotate features (step 3), preprocess the graph (step 4), and run prediction (step 6). According to these steps, however, a sentence seems to need an accompanying AMR graph. Am I misunderstanding something? Is there any way to run prediction on other sentences, either from the command line or by calling the Python code?

@sheng-z
Owner

sheng-z commented Jul 21, 2019

Hi,

At test time, yes, we take the gold AMR file as input, but we do not use the gold graph at all. If you take a look at this line (https://github.com/sheng-z/stog/blob/master/stog/models/stog.py#L324), the gold graph is only used at the training stage.
We do so for evaluation convenience: people can directly use the AMR files for evaluation.

With modifications to the data loader, it should be able to take plain text as input.
We will probably push an update on it later this year.

@TonalidadeHidrica
Author

Thank you for your reply. So, as far as I can tell from reading the scripts, the following steps seem to be a way to achieve this with the code in its current state. Is that right? Are any necessary steps missing, or are any of these unnecessary?

  • Create an AMR file with the desired sentence, a dummy id, and a dummy AMR graph, like the following:
# ::id dummy_id_1
# ::snt This is the first test that I'm going to apply for this program.
(d / dummy)
  • Annotate features (stog.data.dataset_readers.amr_parsing.preprocess.feature_annotator)
  • Preprocess data
    • Clean input (stog.data.dataset_readers.amr_parsing.preprocess.input_cleaner)
    • Recategorize subgraph (stog.data.dataset_readers.amr_parsing.preprocess.text_anonymizor)
    • Remove senses (stog.data.dataset_readers.amr_parsing.preprocess.sense_remover)
  • Do prediction (stog.commands.predict)
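
The first step in the list above (wrapping plain sentences in an AMR file with a dummy id and a dummy graph) can be sketched as follows. This is only a minimal illustration of the file layout shown in the comment, not part of the stog code base; the function names `format_dummy_amr` and `write_dummy_amr` and the `id_prefix` parameter are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def format_dummy_amr(sentences: Iterable[str], id_prefix: str = "dummy_id") -> str:
    """Render each sentence as an AMR entry with a placeholder graph.

    Hypothetical helper: it only mirrors the three-line layout from the
    comment above (id line, sentence line, dummy graph).
    """
    entries = []
    for index, sentence in enumerate(sentences, start=1):
        # Collapse whitespace so each sentence stays on one "# ::snt" line.
        text = " ".join(sentence.split())
        entries.append(
            f"# ::id {id_prefix}_{index}\n"
            f"# ::snt {text}\n"
            "(d / dummy)\n"
        )
    # Each entry ends with a newline, so joining with "\n" leaves one
    # blank line between consecutive entries.
    return "\n".join(entries)


def write_dummy_amr(sentences: Iterable[str], path: str) -> None:
    """Write the rendered entries to `path` as UTF-8 text."""
    Path(path).write_text(format_dummy_amr(sentences), encoding="utf-8")
```

The resulting file would then go through the feature annotation, preprocessing, and prediction steps listed above, exactly as a corpus file would.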

@xdqkid

xdqkid commented Jul 29, 2019

That's OK. Remember to build the JSON file 'data/AMR/amr_2.0_utils/spotlight_wiki.json'; then you can apply the postprocessing step. The JSON file is simple: it just contains { "sentences":{}, ...}. With that it works, but the AMR quality is not good.

The AMR parses I have obtained for arbitrary sentences so far seem bad.

For now, this AMR parser is limited by its anonymization. I hope that improves in a future version.
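
A stub for the JSON file mentioned above could be written like this. It is a sketch under strong assumptions: the comment only confirms a top-level "sentences" key with an empty object, and whatever the "..." stands for is unknown and deliberately not filled in here. The function name `write_stub_wiki_json` is hypothetical.

```python
import json
from pathlib import Path


def write_stub_wiki_json(path: str) -> bool:
    """Create a minimal JSON stub at `path` unless a file already exists.

    Returns True if the stub was written, False if the file was left alone.
    Only the "sentences" key is known from the thread; any other keys the
    real file may need are NOT reproduced here.
    """
    target = Path(path)
    if target.exists():
        # Never overwrite a real spotlight_wiki.json.
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps({"sentences": {}}, indent=2), encoding="utf-8")
    return True
```

Calling it with 'data/AMR/amr_2.0_utils/spotlight_wiki.json' would place the stub where the comment says the postprocessing step looks for it; whether the stub alone is enough is not confirmed by the thread.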

@SimonWesterlind

@TonalidadeHidrica, I would be very interested to hear how this worked out for you. Did you succeed? Was the quality good? How long does a prediction take?

@sinhzun

sinhzun commented Jan 4, 2020

@xdqkid Could you explain how to build the json file? Do we need to write our own script to extract sentences with Wiki components?

@xdqkid

xdqkid commented Jan 5, 2020

Ahh, I forgot how to do it, and my modified code was deleted a week ago. I do not suggest following this work: 1. It relies excessively on anonymization, so predictions for arbitrary sentences may produce terrible AMR graphs. 2. Preprocessing is too difficult, and the processing tools in amr_2.0_utils appear to involve a lot of hand-crafted processing. (amr_2.0_utils for arbitrary sentences is still not public; I hope I'm wrong, and I apologize in advance if so.) 3. Even with ELMo, BERT, and similar information integrated, this work gets a score of 76.3 ± 0.1; I do not use BERT but some of my own non-public data, and I get about the same Smatch F1 score.

@sinhzun

sinhzun commented Jan 6, 2020

@xdqkid I agree with your second point about the preprocessing. Do you still have any of the scripts you used for this repo?

@xdqkid

xdqkid commented Jan 6, 2020

I'm sorry, I no longer have any scripts for this repo because it took up too much space. The script is not too difficult; I think you can write it within a few days. By the way, I think "Modeling Source Syntax and Semantics for Neural AMR Parsing" is a good paper for parsing arbitrary sentences, and it has a decent Smatch score. Maybe you can follow that work.

@sinhzun

sinhzun commented Jan 6, 2020

@xdqkid That paper's idea is very nice, but the author has not released his source code. It's hard to follow :)

@xdqkid

xdqkid commented Jan 6, 2020

All right, I'm sorry to hear that :(
