I have been running a PCV workflow on an HPC cluster. It currently writes results files the same way local parallelization would, although for the Slurm array I am running a single-image workflow over the whole array of images. The results files are named, for example, "IMG_2_IGP0001_results.txt" and "IMG_2_IGP0002_results.txt". There are 212 images. I run the following code over the results directory.
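`import os
import sys, traceback
from plantcv import plantcv as pcv
from plantcv.parallel import WorkflowInputs
from plantcv import parallel
import matplotlib
from matplotlib import pyplot as plt
parallel.process_results(job_dir="/home/m18c364/ondemand/data/sys/myjobs/projects/default/19/results", json_file="combined_output.txt")`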
This does output a very long file. I think this is where I am losing the individual image markers, because after I run the json2csv code locally in the terminal, the wide-format CSV file has only one row, called "default_1". This is probably just a simple thing I have to add to the function, but I haven't found it in the documentation. I have attached the files too. Thank you!
combined_output.txt
output.csv-multi-value-traits.csv
output.csv-single-value-traits.csv
Hi @connornelle, thanks for opening this issue. Since you are parallelizing without using plantcv-run-workflow, it looks like your outputs have no metadata. When json2csv creates the CSV file, there is no metadata (such as the image file name) to use as a unique data frame key. To resolve this, I believe you can use the pcv.outputs.add_metadata method that we recently added to store the image filename under the term filepath.
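As a rough sketch of that suggestion, the per-image workflow could record the image path just before saving its results. This assumes that `pcv.outputs.add_metadata` accepts `term`, `datatype`, and `value` keyword arguments and that results are written with `pcv.outputs.save_results`; check both against your installed PlantCV version. The helper names and the image filename in the usage note are hypothetical.

```python
import os


def filepath_metadata(image_path):
    """Build keyword arguments that record the image path under the term 'filepath'."""
    return {"term": "filepath", "datatype": str, "value": os.path.abspath(image_path)}


def tag_and_save(image_path, results_path):
    """Attach the image path to the outputs, then write the per-image results file."""
    # Imported here so the helper above can be used without PlantCV installed.
    from plantcv import plantcv as pcv

    # ... the single-image analysis steps would run before this point ...

    # Assumed signature: add_metadata(term, datatype, value).
    pcv.outputs.add_metadata(**filepath_metadata(image_path))
    pcv.outputs.save_results(filename=results_path, outformat="json")
```

Each Slurm array task would then call something like `tag_and_save("IMG_2_IGP0001.jpg", "IMG_2_IGP0001_results.txt")`, so that every results file carries its own image path before the files are combined.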
Thank you! This appears to be working well. I found that implementing the single-image workflow in parallel on our cluster was much easier than trying to use the built-in parallelization. Is there a resource on using the built-in version? I didn't know where to start to get that running.
I believe the best documentation page for our parallelization, which details how to set up a configuration file, is here. We'll likely add a Scribe doc page as well, since we are finding their format really useful for processes that involve switching between multiple applications.