Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question regarding Ternary operator in clocal_exp_editsim #23

Open
KeitaW opened this issue May 31, 2020 · 7 comments
Open

Question regarding Ternary operator in clocal_exp_editsim #23

KeitaW opened this issue May 31, 2020 · 7 comments
Assignees
Labels
question Further information is requested

Comments

@KeitaW
Copy link
Member

KeitaW commented May 31, 2020

I got the following question from https://github.com/rcojocaru.

"""
I have a short question about the code. In editsim.py --> clocal_exp_editsim(_withbp) you have this:

for col1 in range(nrow):
for col2 in range(ncol):
match = 0
for row in range(nneuron):
match += mat1[row, col1] * mat2[row, col2]
match = -10 if match == 0 else match
...
dp[col1+1, col2+1] = max4(
0,
down_score,
right_score,
dp[col1, col2] + match

Do you remember why you introduced this if clause for the case in which match is 0? I think it can have radical effects on the edit similarity score. For example, even if comparing identical sequences, instead of getting the maximum edit similarity score, the result would be alpha dependent because of this if clause. I would really appreciate your input before modifying core things.
"""

@KeitaW KeitaW self-assigned this May 31, 2020
@KeitaW KeitaW added the question Further information is requested label May 31, 2020
@KeitaW
Copy link
Member Author

KeitaW commented May 31, 2020

This change was intentional. Let me explain why I made this change.
https://github.com/KeitaW/spykesim/blob/f095c24e20ef07a1029f422344f02ed0741059cc/spykesim/editsim.pyx#L291-L317

@KeitaW
Copy link
Member Author

KeitaW commented May 31, 2020

match == 0 means there is no neuron that fires in both time windows. In that case, we would like to penalize the score by either the down_score or right_score but since the minimum of the match score is zero, those two values are not considered in the previous version.
The ternary operator was introduced to circumvent the problem.

@rcojocaru
Copy link
Collaborator

rcojocaru commented Jun 22, 2020

I am still not convinced about this.

"match == 0 means there is no neuron that fires in both time windows" --> as I understand it, match is the scalar product between two specific columns of the time windows; so, if this is 0, it might mean there's no match between them (different neurons firing in one and the other) or no activity at all in those two columns (no neuron firing). Maybe we should penalize the first situation, but not the second one.

For example if you compare two identical windows of 100 ms with 10 neurons firing, this condition will systematically reduce the edit similarity score for these two identical windows - the reduction will depend on alpha. It is true that for low enough alpha the reduction might not be significant. But this is the main thing that bothers me, that even identical windows with sufficient activity can end up with lower scores, because it is penalizing "holes" or gaps in the profile.

Is it a good idea for this penalty to be fixed/always used? Maybe it should be parameterized and applied in function of the data set.

@KeitaW
Copy link
Member Author

KeitaW commented Jul 1, 2020

Thank you for the comment.

Yeah, I understand what you mean. We can of course do something like below:

nspikes1 = 0
nspikes2 = 0
match = 0
for row in range(nneuron):
    nspikes1 += mat1[row, col1]
    nspikes2 += mat2[row, col2]
    match += mat1[row, col1] * mat2[row, col2]  
match = -10 if match == 0 else match
if nspikes1 == 0 or nspikes2 == 0:
    down[col1+1, col2+1] = down[col1, col2+1]
    down_score = dp[col1-down[col1+1, col2+1]+1, col2+1] - cexp(a, down[col1+1, col2+1]) + 1
    # For Rightp
    right[col1+1, col2+1] = right[col1+1, col2]
    right_score = dp[col1+1, col2-right[col1+1, col2+1]+1] - cexp(a, right[col1+1, col2+1]) + 1
else:
    # For Down
    # Comparing options: newly extend a gap or extend the exsinting gap
    dirpair = pairmax2(
        dp[col1, col2+1] - cexp(a, 1) + 1,
        dp[col1-down[col1, col2+1], col2+1] - cexp(a, down[col1, col2+1] + 1) + 1
    )
    down[col1+1, col2+1] = 1 if dirpair.idx == 0 else down[col1, col2+1] + 1
    down_score = dp[col1-down[col1+1, col2+1]+1, col2+1] - cexp(a, down[col1+1, col2+1]) + 1
    # For Rightp
    # Comparing options: newly extend a gap or extend the exsinting gap
    dirpair = pairmax2(
        dp[col1+1, col2] - cexp(a, 1) + 1,
        dp[col1+1, col2-right[col1+1, col2]] - cexp(a, right[col1+1, col2] + 1) + 1
    )
    right[col1+1, col2+1] = 1 if dirpair.idx == 0 else right[col1+1, col2] + 1
    right_score = dp[col1+1, col2-right[col1+1, col2+1]+1] - cexp(a, right[col1+1, col2+1]) + 1
# Update dp
pair = pairmax4(
    0,
    down_score,
    right_score,
    dp[col1, col2] + match
)
bp[col1 + 1, col2 + 1] = pair.idx
dp[col1 + 1, col2 + 1] = pair.value

to avoid penalization when there is no spikes in a certain column.
How do you think?

@rcojocaru
Copy link
Collaborator

Thank you for looking into it. I think this solution could be a good compromise for the cases I am worried about (my initial solution was to remove this zero-match penalty completely, but that might be too drastic). Let me test this new code on some of the data I had issues with and get back to you!

@rcojocaru
Copy link
Collaborator

Hi. I checked, and this version of the code does what I was asking for. Still, this modification can significantly change the range of edit similarity values one can obtain on a given dataset, so maybe before committing to it (if you plan to modify the repository) it could be good to have another parameter that chooses this version or the previous one. For reproducibility if nothing else.

I was thinking something about like:

def clocal_exp_editsim(DBL_C[:, :] mat1, DBL_C[:, :] mat2, DBL_C a = 0.01, zerospikes_penalty = False):
....
if (nspikes1 == 0 or nspikes2 == 0) and not zerospikes_penalty:
...

same for clocal_exp_editsim_withbp. Please check if this makes sense and feel free to close the issue, I consider it solved. Thank you!

@KeitaW
Copy link
Member Author

KeitaW commented Jul 15, 2020

Thanks @rcojocaru for testing the idea. I agree that we should have an option to enable/disable the modification.
Okay, let me make a PR that introduce the change shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants