
add metaexample for CHAIDTree Regression #5065

Merged: 11 commits into shogun-toolbox:develop on Jun 25, 2020

Conversation

@Hephaestus12 (Contributor):

Also removed the random forest regression undocumented example, as it has already been ported.

@gf712 (Member) left a comment:

Looks good, now you just have to add the data file :)


#![create_machine]
Machine chaidtree = create_machine("CHAIDTree", dependent_vartype=2, feature_types=ft, num_breakpoints=50)
chaidtree.set_labels(labels_train)
Member:

Could you pass labels either to the constructor or use put instead please?

Contributor Author:

Okay, I'll do that.

@Hephaestus12 (Contributor Author):

> Looks good, now you just have to add the data file :)

Isn't it already present?
I'm using the same dataset used for the CARTree meta example.

@gf712 (Member) commented Jun 12, 2020:

Have a look at https://github.com/shogun-toolbox/docs/blob/master/DEVELOPING.md#adding-tests-2

@Hephaestus12 (Contributor Author):

> Have a look at https://github.com/shogun-toolbox/docs/blob/master/DEVELOPING.md#adding-tests-2

This says I should copy the chaidtree.dat file from the build/tests/meta/generated_results/cpp/ directory, but running
make meta_examples
as well as
make build_cpp_meta_examples
isn't generating the .dat file for chaidtree. Only the chaidtree.cpp meta example file is getting generated (.dat files for other meta examples are getting generated, though). What might be the reason for this?

@gf712 (Member) commented Jun 13, 2020:

You also need to run the meta example (using ctest); that will generate the output file. You should use the cpp meta example output.
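
A minimal sketch of that workflow, assuming an already configured in-tree build directory (the test name is the one shown in the ctest output later in this thread):

cd build
make build_cpp_meta_examples
ctest -R generated_cpp-regression-chaidtree
# the serialised chaidtree.dat should then appear under
# tests/meta/generated_results/cpp/regression/ inside the build directory,
# ready to be copied into the data submodule as DEVELOPING.md describes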

#![set_feature_types]

#![create_machine]
Machine chaidtree = create_machine("CHAIDTree", labels=labels_train, dependent_vartype=2, feature_types=ft, num_breakpoints=50)
Member:

Actually the error is here: if you have a look at the parameter registration, the names all have an m_ prefix. You should remove the m_ from the parameter name:

SG_ADD(&m_weights,"m_weights", "weights", ParameterProperties::READONLY);
SG_ADD(&m_weights_set,"m_weights_set", "weights set", ParameterProperties::READONLY);
SG_ADD(&m_feature_types,"m_feature_types", "feature types", ParameterProperties::SETTING);
SG_ADD(&m_dependent_vartype,"m_dependent_vartype", "dependent variable type", ParameterProperties::SETTING);
SG_ADD(&m_max_tree_depth,"m_max_tree_depth", "max tree depth", ParameterProperties::HYPER);
SG_ADD(&m_min_node_size,"m_min_node_size", "min node size", ParameterProperties::SETTING);
SG_ADD(&m_alpha_merge,"m_alpha_merge", "alpha-merge", ParameterProperties::HYPER);
SG_ADD(&m_alpha_split,"m_alpha_split", "alpha-split", ParameterProperties::HYPER);
SG_ADD(&m_cont_breakpoints,"m_cont_breakpoints", "breakpoints in continuous attributes", ParameterProperties::SETTING);
SG_ADD(&m_num_breakpoints,"m_num_breakpoints", "number of breakpoints", ParameterProperties::HYPER);

Contributor Author:

Oh okay, I'll do that.

@Hephaestus12 (Contributor Author):

When I run make test I get the following error:

The following tests FAILED:
	 19 - python_legacy-distance_director_euclidean (Child aborted)
	105 - python_legacy-structure_factor_graph_model (NUMERICAL)
	106 - python_legacy-structure_graphcuts (NUMERICAL)
	233 - generated_cpp-regression-random_forest_regression (SEGFAULT)
	354 - generated_python-regression-random_forest_regression (SEGFAULT)

I removed the regression-random-forest undocumented example as the random forest regression meta example already exists. Is that what is leading to these errors?

@gf712 (Member) commented Jun 13, 2020:

> When I run make test I get the following error:
>
> The following tests FAILED:
> 	 19 - python_legacy-distance_director_euclidean (Child aborted)
> 	105 - python_legacy-structure_factor_graph_model (NUMERICAL)
> 	106 - python_legacy-structure_graphcuts (NUMERICAL)
> 	233 - generated_cpp-regression-random_forest_regression (SEGFAULT)
> 	354 - generated_python-regression-random_forest_regression (SEGFAULT)
>
> I removed the regression-random-forest undocumented example as the random forest regression meta example already exists. Is that what is leading to these errors?

No, that error is not related to this PR; see #5060.

@Hephaestus12 (Contributor Author) commented Jun 13, 2020:

I tried running make test and the generated_cpp-regression-chaidtree runs successfully:

        Start 227: generated_cpp-regression-chaidtree
227/358 Test #227: generated_cpp-regression-chaidtree .......   Passed    0.03 sec

However, the .dat file is still not getting created. (I have removed the m_ from the parameter names.)

(python3.5) tejsukhatme@hephaestus:~/shogun/build/tests/meta/generated_results/cpp/regression$ ls
cartree.cpp                             cpp-regression-kernel_ridge_regression_nystrom  cpp-regression-svrlight              least_angle_regression.dat    random_forest_regression.dat
cartree.dat                             cpp-regression-least_angle_regression           kernel_ridge_regression.cpp          linear_ridge_regression.cpp   support_vector_regression.cpp
chaidtree.cpp                           cpp-regression-linear_ridge_regression          kernel_ridge_regression.dat          linear_ridge_regression.dat   support_vector_regression.dat
cpp-regression-cartree                  cpp-regression-multiple_kernel_learning         kernel_ridge_regression_nystrom.cpp  multiple_kernel_learning.cpp  svrlight.cpp
cpp-regression-chaidtree                cpp-regression-random_forest_regression         kernel_ridge_regression_nystrom.dat  multiple_kernel_learning.dat  svrlight.dat
cpp-regression-kernel_ridge_regression  cpp-regression-support_vector_regression        least_angle_regression.cpp           random_forest_regression.cpp

When I run ctest for the single test I get the following error:

error while loading shared libraries: libhdf5.so.103: cannot open shared object file: No such file or directory

@geektoni libhdf5 strikes again, should I make a new environment and set everything up all over again?

@gf712 (Member) commented Jun 13, 2020:

You need to find libhdf5.so.103 on your system; I am guessing it is in the anaconda folder. Then you need to add its path to LD_LIBRARY_PATH.
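
A hedged sketch of that fix, assuming the library lives in an Anaconda install (the paths below are examples only, not taken from this thread):

# locate the missing shared library
find ~/anaconda3 -name 'libhdf5.so.103*' 2>/dev/null
# prepend its directory to the loader search path, then re-run ctest
export LD_LIBRARY_PATH=~/anaconda3/lib:$LD_LIBRARY_PATH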

@karlnapf (Member):

Could you name the files chaid_tree, i.e. with an underscore? Just to tidy up a bit (also for future PRs on examples). Thanks!
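
A possible way to do the rename, assuming the meta example lives under examples/meta/src/regression/ (the path follows shogun's usual layout and is not stated in this thread); the generated .dat file in the data PR would need the same rename:

git mv examples/meta/src/regression/chaidtree.sg examples/meta/src/regression/chaid_tree.sg
git commit -m "Rename chaidtree meta example to chaid_tree"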

@karlnapf (Member):

Just disable hdf5 in cmake... I do this locally as it causes problems otherwise


#![extract_weights_labels]
RealVector labels_vector = labels_predict.get_real_vector("labels")
RealVector weights = chaidtree.get_real_vector("weights")
Member:

These are set by the user, so I think they don't need to be extracted, as discussed in the data PR.
Once you have removed this, you have to regenerate the data, update the data PR, and then update this PR (including the submodule).

Member:

the weights are still extracted here, you will need to remove that

@Hephaestus12 (Contributor Author):

I have pushed the new data to the shogun-data PR.
After that gets merged, what do I have to do?

@gf712 (Member) commented Jun 23, 2020:

> I have pushed the new data to the shogun-data PR.
> After that gets merged, what do I have to do?

You need to first update the data commit hash here to use the latest commit you just pushed. And then when the CI passes we merge both PRs and that's it :)
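
A rough sketch of bumping the data submodule pointer, assuming the submodule is checked out under data/ and that <your-remote> and <your-commit> refer to the shogun-data PR you just pushed (both are placeholders, not from this thread):

cd data
git fetch <your-remote>
git checkout <your-commit>
cd ..
git add data
git commit -m "Update data submodule for CHAIDTree regression example"
git push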

SG_ADD(&m_weights,"weights", "weights", ParameterProperties::READONLY);
SG_ADD(&m_weights_set,"weights_set", "weights set", ParameterProperties::READONLY);
SG_ADD(&m_feature_types,"feature_types", "feature types", ParameterProperties::SETTING);
SG_ADD(&m_dependent_vartype,"dependent_vartype", "dependent variable type", ParameterProperties::SETTING);
Member:

This causes the notebooks to fail:
https://dev.azure.com/shogunml/shogun/_build/results?buildId=3629&view=logs&j=089c709a-44eb-5f6e-96e7-15e9ee1ff5bf&t=2da3e16b-a2b2-5f01-2cbe-a20d9528195b&l=1849

Should be simple to fix: open the notebook and edit the name in there as well :)

Contributor Author:

How do I test the notebooks on my local machine? Does make test do that?

Contributor:

There is a script for doing that: https://github.com/shogun-toolbox/shogun/blob/develop/scripts/test_notebooks.sh.

The link Heiko pasted above also shows how to use it.

Member:

It doesn't. You would have to compile shogun with the Python interface, make sure you can load it from Python, and then open the notebook in Jupyter Notebook.

However, you might be able to do a simple hack here:

  1. Open the notebook in a text editor
  2. Search for m_
  3. If it is one of the parameter names you renamed, change it to the new name
  4. Save the file in the text editor and submit

As this is such a simple change, that should do it without you needing to run it locally.
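
For step 2, a quick command-line alternative to searching in the editor (the notebook path is a placeholder):

# list every line of the notebook JSON that still mentions an m_-prefixed name
grep -n 'm_' path/to/the_notebook.ipynb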

Contributor Author:

Yes, I have already pushed the code after making those changes. :)

@karlnapf (Member):

@Hephaestus12 you also need to push the updated example here (as the code in the PR still extracts the weights).
And I just saw that a notebook fails from renaming the variable... easy to fix, see my other comment.

@Hephaestus12 (Contributor Author):

The file changes are showing that some other data file has been deleted.

@Hephaestus12 (Contributor Author):

Fixed.

@Hephaestus12 (Contributor Author):

@geektoni Now this PR shows, in the files changed tab, that I have deleted the weighted_degree_string.dat file.

@@ -1405,9 +1405,9 @@
"source": [
"def train_chaidtree(dependent_var_type,feature_types,num_bins,feats,labels):\n",
" # create CHAID tree object\n",
" c = sg.create_machine(\"CHAIDTree\", m_dependent_vartype=dependent_var_type,\n",
Member:

perfect!

Member:

I think you forgot some; I remember seeing one name in a "get" call... double check.

Contributor Author:

Are you referring to this:

" tree = sg.create_machine(\"C45ClassifierTree\", labels=labels, m_nominal=types)\n",

I don't think we should change this, right? It isn't related to the CHAID tree, and we would then have to make another change in the source code related to C45ClassifierTree.

Also, the other one is:

" output_certainty=tree.get('m_certainty')\n",

This one, too, is an instance of C45ClassifierTree. Should I change these two instances too?

Member:

Ah sorry, of course you are right!
We will see it in the CI for the notebooks.

@karlnapf (Member):

This is because I just merged another data PR.
You will need to rebase your data PR, force push it.
Then update this PR with the new submodule and force push as well
(also double check the notebook I mentioned)

@Hephaestus12 (Contributor Author):

> This is because I just merged another data PR.
> You will need to rebase your data PR, force push it.
> Then update this PR with the new submodule and force push as well
> (also double check the notebook I mentioned)

I did what you said. Now there are 4 data file changes; I don't understand why this is happening.

@karlnapf (Member):

as you can see from the merge conflict for the submodule here, you haven't updated the submodule in this PR to the version of your (refactored!) data PR

@Hephaestus12 (Contributor Author):

Now the merge conflict will resolve when we merge the shogun-data PR, right?

@karlnapf (Member):

Make sure to read about why merge conflicts happen in git and how to resolve them...

@karlnapf (Member):

All you really need to do is to update the data submodule to the latest version in your data PR (and then force push again)

@karlnapf (Member):

This shouldn't take so long to sort out @Hephaestus12. If you have questions, come to IRC and ask, we are happy to help! Please make this a priority.

@Hephaestus12 (Contributor Author):

Yes, I hope this is ready now?

@karlnapf (Member):

Almost! See comment in other PR (in shogun-data, you should always squash your commits, in shogun-dev that is not necessary as we can do it when merging)


#![set_feature_types]
IntVector ft(1)
ft[0] = 2
Member:

@karlnapf this is causing issues. In Octave this becomes a scalar value :( I wrote a fix for this, but it's in another branch, not yet merged. Also, this throws an error in the meta example, but ctest doesn't pick it up (I had this issue before) and I am not sure why. The test only fails when comparing the serialised outputs in the integration test, because nothing will have been serialised due to the exception thrown when you put ft.

Contributor:

So, unless we merge your fix, this meta example will fail only in Octave, right? Would it be possible to merge this PR anyway, but somehow exclude it from testing with Octave (since it is broken atm)? Just so we don't have to put this on hold indefinitely...

Member:

Maybe @Hephaestus12 can just fix it here? All you need to do is replace

#ifdef SWIGR

with

#if defined(SWIGR) || defined(SWIGOCTAVE)
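
If it helps, a one-line sketch of that substitution, assuming the guard sits in one of the SWIG interface files (the file name is a placeholder, not taken from this thread):

# widen the R-only guard so it also covers Octave
sed -i 's/#ifdef SWIGR/#if defined(SWIGR) || defined(SWIGOCTAVE)/' src/interfaces/swig/<interface-file>.i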

Contributor Author:

Yes, I'll do this.

Member:

I hope this works :D

@karlnapf (Member):

OK, the data is not all in sync... let's see what the CI says.

@karlnapf (Member):

Looks good, I'll merge :)

@karlnapf merged commit 8c83d7e into shogun-toolbox:develop on Jun 25, 2020
@karlnapf (Member):

thanks! This was a nice one! :)

@Hephaestus12 (Contributor Author) commented Jun 25, 2020:

Yayayay

^ sorry for this, I'm just really relieved. xD
