Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

With learning rebase #52

Draft
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

scottwittenburg
Copy link
Collaborator

Rebase the original work adding learning to the application, and fix a couple console errors

matthewma7 and others added 2 commits January 12, 2021 20:45
active learning with girder worker

Show good_prob row

Add active learning persisted state
@scottwittenburg
Copy link
Collaborator Author

@dzenanz This is more or less working for me, but I haven't really been able to test it since I don't know where to find a dataset with the new fields. Even if you import a file in the original csv format, I think there are going to be issues when it goes looking for the data in csv format in Girder, as I have not yet converted everything in the PR to work with JSON. In fact, I'm not sure that's even the right way to go for what it's doing. But I think the next step would be to put together a dataset in the format it expects and see where we start hitting errors. Once we see what's going on, we can decide if converting all internal expectations to JSON is what we want to do.

Looks like it wants two new fields in the data (associated with each scan): IQMs (if it's csv) or iqms if it's json, and good_prob. The latter is simply a floating point value, the former is a semicolon-separated string which corresponds to a list of key/value pairs. The key and value in each pair are separated by a colon. Futhermore, some special processing is done if the key has underscores in it.

But you can look at that expectation yourself if you check out the parseIQM method in server/miqa_server/session.py.

You can feel free to push more commits on this branch, or pull it to your fork and work there, up to you.

@scottwittenburg
Copy link
Collaborator Author

@dzenanz As I'm working more with this, I see there are still issues I can fix without a dataset containing the image quality metrics. I'm working on that now, and will push another commit when I sort out the issues.

@scottwittenburg
Copy link
Collaborator Author

The refactoring of some of the session.py methods to a utility module was necessary because they are used from the learning.py module.

@scottwittenburg
Copy link
Collaborator Author

All of the learning modules seem to assume the data format is csv. I'm considering writing the inverse of my converter so we can easily go back and forth between formats. It shouldn't take much time to write, and the bi-directional conversion would be easily tested. I think it's either that, or maybe we just abandon the JSON format, perhaps it's just not that useful.

Thoughts @dzenanz @curtislisle?

@dzenanz
Copy link
Member

dzenanz commented Jan 13, 2021

I don't have a strong preference. Do what you think is easier or better 😄 Or wait for Curt's opinion.

@scottwittenburg
Copy link
Collaborator Author

@dzenanz @curtislisle

I opted to write a converter from json back to csv. It seems to be working to pull the json representation out of girder, convert it to csv, and pass this into the learning component.

I learned more about how this workflow runs as I worked through issues today. I tried to encapsulate the key steps to reproduce it in the development.md at the root of the repo (latest commit). It's still not working as we had hoped unfortunately. The mriqc module is used to do the active learning, and there's documentation in various places in there saying you need to point to the MIQA_MRIQC_PATH, which it describes as needing to contain "all important directories like training_data.csv, model_weights, log files". Your guess as to what form those things are supposed to take is likely better than mine. If either of you know how to satisfy those requirements, let me know, maybe we can get it running.

Also, everytime I click the "RETRAIN" button in the application, I see the following errors in the celery output:

Traceback (most recent call last):
  File "/Users/scott/projects/miqa/miqa-venv/lib/python3.7/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/Users/scott/projects/miqa/miqa-venv/lib/python3.7/site-packages/girder_worker/task.py", line 173, in __call__
    self._maybe_cleanup(hook)
  File "/Users/scott/projects/miqa/miqa-venv/lib/python3.7/site-packages/girder_worker/task.py", line 146, in _maybe_cleanup
    arg.cleanup(**kwargs)
  File "/Users/scott/projects/miqa/miqa-venv/lib/python3.7/site-packages/girder_worker_utils/transforms/girder_io.py", line 161, in cleanup
    if os.path.isdir(self.output_file_path):
AttributeError: 'GirderUploadToItem' object has no attribute 'output_file_path'

That seems like it may be caused by having an older girder_worker_utils in my virtual environment, as newer code has a guard in that method. The other error I'm currently hitting is:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/scott/miqa/mriqc_master_folder/training_data.csv' 

... which is obviously because I don't have all the important files in the mriqc master directory.

scottwittenburg and others added 11 commits January 21, 2021 13:27
Add a note indicating that the slash is needed at the end of the mriqc
path (we should fix this, but it works this way for now).

Also, I added the --no-sub argument when running mriqc so that we avoid
submission of the computed metrics back to mriqc.nimh.nih.gov, at least
while we are doing a lot of testing of our workflow.
: used in time separation (e.g. 16:10:20) is not allowed in file name on Windows.
This adds a radio group to the retrain dialog, allowing the user to
select neural network or random forest classifaction.  Also adds a
placeholder for neural network learning in the tasks.py module.

Also fixes the bug where the retrain dialog "Cancel" button does
not work, and adds a note to dev doc on running "data2mriqc.py".
WARNING/MainProcess] m:\dev\zarr\miqa\mriqc\mriqc\data_loader.py:112: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.data['good_prob'][i] = predictions[ind]
@dzenanz
Copy link
Member

dzenanz commented Feb 4, 2021

NN classifier seems to be working on my computer. Can you check my work in #54?

@scottwittenburg
Copy link
Collaborator Author

Thanks @dzenanz I'll try to take a look at this tomorrow. In the meantime I'm just curious, why create a new PR?

@dzenanz
Copy link
Member

dzenanz commented Feb 5, 2021

I didn't want to impose on your branch. You can cherry pick commits from my branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Sprint Board (4 week)
  
Triage (Needs Grooming)
Development

Successfully merging this pull request may close these issues.

None yet

3 participants