FCMA feature selection excludes the best performed voxel #486

Open
peetal opened this issue Oct 29, 2020 · 2 comments
@peetal
peetal commented Oct 29, 2020

Hi,

I found a potential issue in the FCMA feature selection step that may exclude the best-performing voxel when selecting the top k voxels.

At the end of fcma_voxel_selection_cv.py:

with open(file_str + 'result_list.txt', 'w') as fp:
    for idx, tuple in enumerate(results):
        fp.write(str(tuple[0]) + ' ' + str(tuple[1]) + '\n')

        # Store the score for each voxel
        score[tuple[0]] = tuple[1]
        seq[tuple[0]] = idx

results is an iterable of tuples. tuple[0] is the voxel ID, which indexes the voxel, and tuple[1] is that voxel's score. The tuples are ranked so that the best-performing voxel comes first, so when they are enumerated the best-performing voxel gets idx = 0. As a result, seq[tuple[0]] = idx assigns the best-performing voxel a rank of 0.

fslmaths is then used to select the top k voxels, as in make_top_voxel_mask.sh:

for file in ${input_dir}/*_seq.nii.gz
do	
	# Preprocess the file name
	fbase=$(basename "$file")
	pref="${fbase%%.*}"
	
	# Create the voxel mask
	fslmaths $file -uthr $voxel_number -bin ${output_dir}/${pref}_top${voxel_number}.nii.gz

done

-uthr applies an upper threshold to the input file at voxel_number. For example, if k = 3000, -uthr keeps the voxels whose rank is between 0 and 3000, which covers the top-ranked voxels and all non-brain voxels, since those also have the value 0. Then -bin binarizes the file into a mask, dropping every voxel whose value is 0: the non-brain voxels and also the best-performing voxel, which has the value 0 because it is ranked 0. In this way, I believe FCMA feature selection excludes the top-performing voxel.
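
The effect described above can be reproduced without FSL in a few lines of Python. This is only a sketch under stated assumptions: uthr() and binarize() are hypothetical helpers that mimic what -uthr (zero everything above the threshold) and -bin (set every positive value to 1) are understood to do, and the six-voxel volume, the scores, and k = 2 are made up for illustration.

```python
# Toy emulation of the 0-based ranking followed by `-uthr k -bin`.
# uthr() and binarize() are hypothetical stand-ins for the fslmaths
# options, written from their described behavior; they are not FSL code.

def uthr(values, threshold):
    """Zero every value above the threshold (mimics -uthr)."""
    return [v if v <= threshold else 0 for v in values]


def binarize(values):
    """Map every positive value to 1 and everything else to 0 (mimics -bin)."""
    return [1 if v > 0 else 0 for v in values]


# Six-voxel "volume": voxels 0 and 5 are non-brain and are never scored.
n_voxels = 6
# (voxel ID, score) pairs, already sorted from best to worst (made-up data).
results = [(3, 0.91), (1, 0.85), (4, 0.80), (2, 0.62)]

seq = [0] * n_voxels
for idx, (voxel_id, _score) in enumerate(results):
    seq[voxel_id] = idx  # 0-based rank, as in the current script

k = 2
mask = binarize(uthr(seq, k))
selected = [v for v, m in enumerate(mask) if m == 1]

print(seq)       # [0, 1, 3, 0, 2, 0]
print(selected)  # [1, 4]
```

In this toy case the best voxel (ID 3) shares the value 0 with the non-brain voxels and is dropped from the mask, while the third-best voxel (ID 4) is kept in its place.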

If I am right about this issue, the fix should be simple: just add 1 to idx.

with open(file_str + 'result_list.txt', 'w') as fp:
    for idx, tuple in enumerate(results):
        fp.write(str(tuple[0]) + ' ' + str(tuple[1]) + '\n')

        # Store the score for each voxel
        score[tuple[0]] = tuple[1]
        seq[tuple[0]] = idx + 1
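
As a rough sanity check of the proposed change, the sketch below compares 0-based and 1-based ranks on made-up data. top_k_voxels() is a hypothetical helper, not part of BrainIAK or FSL; it folds the assumed behavior of -uthr (keep values at or below k) and -bin (keep only positive values) into a single comparison.

```python
def top_k_voxels(results, n_voxels, k, offset):
    """Return the voxel IDs kept by an emulated `-uthr k -bin` on the rank volume.

    offset=0 reproduces the current 0-based ranks; offset=1 is the proposed fix.
    This is an illustrative helper, not code from BrainIAK or FSL.
    """
    seq = [0] * n_voxels  # unscored (non-brain) voxels stay at 0
    for idx, (voxel_id, _score) in enumerate(results):
        seq[voxel_id] = idx + offset
    # Keep ranks <= k (as -uthr would), then only positive values (as -bin would).
    return [v for v, rank in enumerate(seq) if 0 < rank <= k]


# (voxel ID, score) pairs sorted from best to worst; voxels 0 and 5 are unscored.
results = [(3, 0.91), (1, 0.85), (4, 0.80), (2, 0.62)]

print(top_k_voxels(results, 6, 2, offset=0))  # [1, 4]
print(top_k_voxels(results, 6, 2, offset=1))  # [1, 3]
```

With offset=1 the two best voxels (IDs 3 and 1) are the ones selected, and the non-brain voxels are still excluded because they keep the value 0.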

Please let me know whether this makes sense, or whether I misunderstood the script and this is not actually an issue. Thank you all very much!

@CameronTEllis
Contributor

@yidawang are you able to look at this?

@yidawang
Member

yidawang commented Nov 2, 2020

The description makes sense to me. I don't remember all the details of how fslmaths works, but if it behaves as described above, I am fine with the proposed fix of switching to a 1-based index. Please submit a PR to fix it. Thanks!
