Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Svm context engine #635

Open
wants to merge 1,501 commits into
base: master
Choose a base branch
from
Open

Svm context engine #635

wants to merge 1,501 commits into from

Conversation

ShraddhaThumsi
Copy link

Preparing to merge SVMContextEngine to master

… the bug was arising. Hopefully, this patch does it.
… the bug was arising. Hopefully, this patch does it.
… the bug was arising. Hopefully, this patch does it.
… the bug was arising. Hopefully, this patch does it.
… the bug was arising. Hopefully, this patch does it.
… the bug was arising. Hopefully, this patch does it.
… the bug was arising. Hopefully, this patch does it.
… the bug was arising. Hopefully, this patch does it.
… the bug was arising. Hopefully, this patch does it.
… the bug was arising. Hopefully, this patch does it.
@enoriega
Copy link
Member

enoriega commented Sep 16, 2019

@ShraddhaThumsi , Travis is reporting some of the tests failing because it's searching for a file hardcoded to a path in your computer

For example: /home/sthumsi/enter/reach/main/src/main/resources/org/clulab/context/svmFeatures/svmTrainedModel.dat

That is not the only one, please take a look into Travis' build log to find all the pertinent file paths.

Can you fix those, please?

Copy link
Member

@enoriega enoriega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor clean up requests and I think its ready

.sbtopts Outdated Show resolved Hide resolved
logfile_IS_UNDEFINED Outdated Show resolved Hide resolved
main/build.sbt Outdated
@@ -35,6 +37,8 @@ libraryDependencies ++= {
// testing
"org.scalatest" %% "scalatest" % "3.0.1" % "test",
"com.typesafe.akka" %% "akka-testkit" % akkaV % "test"
//"org.ml4ai" %% "scalacontext" % "0.1.0-SNAPSHOT"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please erase this commented line

main/build.sbt Outdated Show resolved Hide resolved
main/build.sbt Outdated Show resolved Hide resolved
main/src/main/resources/application.conf Outdated Show resolved Hide resolved
# the output formats for mentions:
# "arizona" (column-based, one file per paper)
# "cmu" (column-based, one file per paper)
# "fries" (multiple JSON files per paper)
# "serial-json" (JSON serialization of mentions data structures. LARGE output!)
# "text" (non-JSON textual format)
outputTypes = ["fries"]
outputTypes = ["fries", "arizona", "cmu", "serial-json", "text"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please restore this setting to its original value

main/src/main/scala/org/clulab/reach/ReachSystem.scala Outdated Show resolved Hide resolved
…Will make another commit for that, and I'll be ready to push it
…. have been attended to. The log file is deleted.
Copy link
Member

@enoriega enoriega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good now

@enoriega
Copy link
Member

@MihaiSurdeanu let me know if you want to take a look, otherwise I consider this is ready to be merged.

@ShraddhaThumsi
Copy link
Author

ShraddhaThumsi commented Sep 28, 2019 via email

@MihaiSurdeanu
Copy link
Contributor

@enoriega, @ShraddhaThumsi, @cl4yton: this PR was never merged.
Should we merge now? Can one of you summarize what is included here?
Thanks!

@enoriega
Copy link
Member

enoriega commented Mar 7, 2020

@MihaiSurdeanu This pull request contains @ShraddhaThumsi 's implementation of the SVM context engine, a training script for a linear SVM and ensures no unit test is broken. You requested to hold back from merging because the validation of the trained model is still pending.

@MihaiSurdeanu
Copy link
Contributor

MihaiSurdeanu commented Mar 8, 2020 via email

Copy link

@cl4yton cl4yton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have merged master into this branch and all tests are currently passing. I have asked @enoriega to do another pass, and also added @MihaiSurdeanu to also review. I do not have authority to complete the pull request.

@MihaiSurdeanu
Copy link
Contributor

@enoriega: please let me know when you're done, and I'll do a pass too.

Copy link
Member

@enoriega enoriega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good. All tests pass

@enoriega
Copy link
Member

I think this is okay, however, I wonder if you want to merge until I write the README for the SVMContextEngine and co.

As long as you select the Policy4 context engine, nothing should be affected after the merge

@MihaiSurdeanu
Copy link
Contributor

Thanks @enoriega !

I'll wait for the README. Please let me know when it's available.

…onfigure the SVMContextEngine in application.conf
@MihaiSurdeanu
Copy link
Contributor

@enoriega, @ShraddhaThumsi, @cl4yton:
I propose to not merge this yet, until we solve the issue discussed in the email thread. Pasted here, for completeness:

ENRIQUE:
My apologies for getting back to you until today. I wrote a README on how to use the training script left by Shraddha and on how to configure the SVMContextEngine on the application.conf file.

By doing this I detected two annoying details I didn’t realize back then, but fortunately they’re easy to fix. The first is that the svm model file is hard coded and can’t be changed in the config file.

The second is that there’s a file that specifies a subset of features to use for training. This file is supposed to be a text file but instead is a serialized java string. This should be simple to solve too.

The features file is a product of the experiments Paul Hein ran for the paper. It is not a raw data file because several attempts that Shraddha made to generate it generated different results and instead of following that unreliable approach we decided to use the same file as before to aim to get the same results, and then go back into the feature parser code.

MIHAI:

  • Small issue: the feature file should be stored as a text file in resources/. This is hopefully an easy change.
  • The bigger issue: I am worried that if you train using the features from this file, but at testing time, or when Reach is used in the wild, you use features created in the Scala code, they won’t match. Based on your comments, this seems indeed to be the case. This is a big problem. It means that we have unexpected behavior in actual Reach usage. Should we work on training the model used in Reach using the same feature creation code used during testing?

@kwalcock
Copy link
Member

kwalcock commented Jan 6, 2021

Do I dare ask for an update? March 2020 was a long time ago. At the very least I need to resolve the merge conflicts that have developed in the interim. It's also very possible to keep a parallel branch around and not ever merge into master, but then to close the PR. There may be other users who now are concerned with the repercussions of these changes.

@MihaiSurdeanu
Copy link
Contributor

Still in progress... But I hope that @enoriega will wrap this up once he starts his new position.

@enoriega enoriega self-assigned this Jan 15, 2021
@enoriega
Copy link
Member

I'll take over it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants