Make GNN plots available as an SPT CLI command and API endpoint #319

Closed
CarlinLiao opened this issue May 9, 2024 · 7 comments · Fixed by #320

@CarlinLiao
Collaborator

This involves moving the current analysis_replication/gnn_figure/graph_plugin_plots.py out of analysis_replication/ and into spatialprofilingtoolbox/graphs/, adding a CLI command for it to scripts/, and adding an API endpoint that calls it to spatialprofilingtoolbox/apiserver/app/main.py.

@CarlinLiao CarlinLiao added the feature New feature label May 9, 2024
@CarlinLiao CarlinLiao self-assigned this May 9, 2024
@CarlinLiao
Collaborator Author

Here's an example configuration file for graph_plugin_plots.py as is.

{
    "study": "Melanoma intralesional IL2",
    "phenotypes": [
        "Tumor",
        "Adipocyte or Langerhans cell",
        "Nerve",
        "B cell",
        "Natural killer cell",
        "Natural killer T cell",
        "CD4+/CD8+ T cell",
        "CD4+ natural killer T cell",
        "CD4+ regulatory T cell",
        "CD4+ T cell",
        "CD8+ natural killer T cell",
        "CD8+ regulatory T cell",
        "CD8+ T cell",
        "Double negative regulatory T cell",
        "T cell/null phenotype",
        "CD163+MHCII- macrophage",
        "CD163+MHCII+ macrophage",
        "CD68+MHCII- macrophage",
        "CD68+MHCII+ macrophage",
        "Other macrophage/monocyte CD14+",
        "Other macrophage/monocyte CD4+"
    ],
    "attribute_order": [
        "Tumor",
        "Adipocyte or Langerhans cell",
        "Natural killer cell",
        "CD4+ T cell",
        "Nerve",
        "B cell",
        "CD4+/CD8+ T cell",
        "CD4+ regulatory T cell",
        "CD8+ natural killer T cell",
        "CD8+ regulatory T cell",
        "CD8+ T cell",
        "Double negative regulatory T cell",
        "T cell/null phenotype",
        "Natural killer T cell",
        "CD4+ natural killer T cell",
        "cohort"
    ],
    "cohorts": [
        {
            "index_int": 1,
            "label": "Non-responder"
        },
        {
            "index_int": 3,
            "label": "Responder"
        }
    ],
    "plugins": [
        "cg-gnn",
        "graph-transformer"
    ],
    "figure_size": [
        11,
        8
    ],
    "orientation": "horizontal"
}
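
A configuration like this could be sanity-checked before plotting. The following is only a minimal sketch: check_plot_config is a hypothetical helper, not part of graph_plugin_plots.py, and it assumes (as the example suggests) that every attribute_order entry other than "cohort" should also appear in phenotypes.

```python
REQUIRED_KEYS = {
    "study", "phenotypes", "attribute_order", "cohorts",
    "plugins", "figure_size", "orientation",
}


def check_plot_config(config: dict) -> list:
    """Return a list of problems found in a plot configuration (empty if none)."""
    problems = []

    # Every top-level key from the example configuration is treated as required.
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    # Assumption: "cohort" is the only non-phenotype entry allowed in attribute_order.
    allowed = set(config.get("phenotypes", [])) | {"cohort"}
    unknown = [a for a in config.get("attribute_order", []) if a not in allowed]
    if unknown:
        problems.append(f"attribute_order entries not in phenotypes: {unknown}")

    return problems
```

An empty return value means the file passed both checks; anything else is a human-readable description of what to fix.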

Translating this to an API call,

  • study we can ask the user to provide
  • phenotypes can be pulled from the database, but
  • attribute_order is tricky, since it's up to the user to decide what's a reasonable order for the phenotypes they want to display. Maybe we make the user provide this too? Also, this seems redundant with phenotypes, since it appears to just specify the subset of phenotypes to use, but I'll need to double-check.
  • cohorts can be pulled from the database, although it will likely have extra cohorts not used by the GNN that'll hopefully just fall out instead of causing an error
  • plugins I think we can ask the user to provide?
  • figure_size we could try to determine dynamically from the number of phenotypes and specimens, but that leaves out the length of the longest phenotype name... this could be tricky. I only arrived at the values we're using now by trial and error. Maybe we cache the results so the user can quickly try new values of their own?
  • orientation is a less complex version of figure_size
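
For the figure_size point, a rough heuristic could look like the sketch below. Everything here is an assumption for illustration: estimate_figure_size is a hypothetical name, the layout is assumed to put specimens along the width and phenotypes along the height, and the per-cell and per-character constants are guesses that would still need tuning against real plots.

```python
def estimate_figure_size(
    n_phenotypes: int,
    n_specimens: int,
    longest_label_chars: int,
    cell_inches: float = 0.3,   # guessed size of one heatmap cell
    char_inches: float = 0.08,  # guessed width of one label character
    label_height_inches: float = 1.0,  # guessed room for axis labels and title
) -> tuple:
    """Guess a (width, height) in inches from plot dimensions and label length."""
    # Width: one column per specimen, plus room for the longest phenotype label.
    width = n_specimens * cell_inches + longest_label_chars * char_inches
    # Height: one row per phenotype, plus a fixed margin.
    height = n_phenotypes * cell_inches + label_height_inches
    return (round(width, 2), round(height, 2))
```

A user-supplied figure_size would still take precedence; this would only provide a starting value.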

What do you think @jimmymathews?

@CarlinLiao
Collaborator Author

We have to look at this from the web application perspective as well if we go the caching route.

@jimmymathews
Collaborator

To keep the work here bounded, I propose that we make the new API endpoint have almost no parameters, maybe just the study. We can manually record (in that JSON format, I suppose) the detailed configuration for each study, and get the API handler to consult this configuration when regenerating the plot.

@CarlinLiao
Collaborator Author

Okay, so for the web API endpoint we expose only the study parameter, but for the actual function and CLI input we expose all parameters. The web API will look up a file or table with the parameters we've fixed for that study and call the function that way. (Where will that file or table be and how will it be looked up?)
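
To make the split concrete, here is a minimal sketch of the two layers. All names are hypothetical: make_gnn_plot stands in for the full-parameter function (it returns a summary string rather than a figure), and STUDY_PLOT_CONFIGS stands in for whatever file or table ends up holding the fixed per-study parameters.

```python
# Hypothetical store of fixed per-study parameters (abbreviated for the example).
STUDY_PLOT_CONFIGS = {
    "Melanoma intralesional IL2": {
        "phenotypes": ["Tumor", "B cell"],
        "attribute_order": ["Tumor", "B cell", "cohort"],
        "cohorts": [{"index_int": 1, "label": "Non-responder"}],
        "plugins": ["cg-gnn", "graph-transformer"],
        "figure_size": [11, 8],
        "orientation": "horizontal",
    },
}


def make_gnn_plot(study, phenotypes, attribute_order, cohorts, plugins,
                  figure_size, orientation):
    """Stand-in for the function/CLI layer, which exposes every parameter."""
    width, height = figure_size
    return f"{study}: {len(attribute_order)} attributes, {width}x{height} {orientation}"


def gnn_plot_for_study(study, configs=STUDY_PLOT_CONFIGS):
    """Stand-in for the web API handler, which exposes only the study."""
    config = configs.get(study)
    if config is None:
        raise KeyError(f"no GNN plot configuration recorded for study {study!r}")
    return make_gnn_plot(study=study, **config)
```

The handler never accepts plotting parameters from the caller, so the only failure mode it needs to report is an unknown study.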

@jimmymathews
Collaborator

Yup, that is what I had in mind.
How about we just add a little self-sufficient database table with study name and JSON blob contents? This way it will be sure to be available to the application.
Similar to the spt db upload-sync-findings functionality recently added (which takes a local source file and makes its contents available in the DB in a simple way), we could also have spt db upload-sync-gnn-plot-configurations?
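
A table of study name plus JSON blob could be as small as the sketch below. This is illustrative only: it uses the standard-library sqlite3 module so that it runs on its own, whereas the real implementation would go through the project's own database layer, and the table, column, and function names are all hypothetical.

```python
import json
import sqlite3


def upload_plot_configurations(connection, configs):
    """Create the table if needed and insert or overwrite one row per study."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS gnn_plot_configurations ("
        "study TEXT PRIMARY KEY, configuration TEXT NOT NULL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO gnn_plot_configurations (study, configuration) "
        "VALUES (?, ?)",
        [(study, json.dumps(config)) for study, config in configs.items()],
    )
    connection.commit()


def fetch_plot_configuration(connection, study):
    """Return the stored configuration for a study, or None if there is none."""
    row = connection.execute(
        "SELECT configuration FROM gnn_plot_configurations WHERE study = ?",
        (study,),
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

Because the study name is the primary key, re-running the upload with an edited local file simply overwrites the stored blob for that study.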

@jimmymathews
Collaborator

jimmymathews commented May 9, 2024

The script upload_sync_findings.py I mentioned above is here.

It creates an isolated SQL table and syncs it with a local file (local, that is, to the spt-data repo), which is similar to what we would need.

@CarlinLiao
Collaborator Author

My thought's been to replace analysis_replication.accessors.DataAccessor and the dependence on the host API with a db_config_file and use of either FeatureMatrixExtractor or raw SQL queries, but this is proving more complicated than expected.

With the apiserver, computing the phenotype counts per specimen is fast because of the encoding you did, but if I'm understanding this correctly, that functionality lives in ondemand.providers.provider and isn't meant to be used outside of that context. (I suppose I could copy the functionality of OnDemandProvider._get_data_array_from_db and CountsProvider.count_structures_of_partial_signed_signature, but that doesn't feel like a very modular solution either.)

What would you say is the right way to approach this?
