Background: To trace Shakespearean intertextuality, I tokenize Shakespeare texts (hypotexts) into 9-grams and align each n-gram (using align_local) with other texts, e.g. by Terry Pratchett or Charles Dickens (hypertexts). I loop through all the n-grams and only return alignments whose alignment score is above a certain threshold. To speed the process up a little, I only use every third n-gram, which should still leave enough overlap not to miss any potential quotes (WyrdSisters_Macbeth_minimal.R.zip).
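The every-third selection described above can be sketched in base R as follows. This is only an illustration under assumptions: `make_ngrams` is a hypothetical helper (not part of textreuse or of the attached script), and the sample line stands in for the real hypotext.

```r
# Hypothetical helper: build word n-grams from a string using base R only.
make_ngrams <- function(text, n = 9L) {
  words <- strsplit(tolower(text), "[^[:alnum:]']+")[[1]]
  words <- words[nzchar(words)]            # drop empty tokens
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1L)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1L)], collapse = " "),
         character(1))
}

# Sample input, standing in for a full hypotext.
hypotext <- "When shall we three meet again in thunder, lightning, or in rain? When the hurlyburly's done"
ngrams <- make_ngrams(hypotext, n = 9L)

# Keep n-grams 1, 4, 7, ...; logical indexing is also safe for empty input.
selected <- ngrams[seq_along(ngrams) %% 3L == 1L]
print(selected)
```

With a step of 3 and n = 9, consecutive selected n-grams still share 6 words, which is the overlap the approach relies on.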
Problem: However, I am occasionally getting the following error message:
Error in b_out[out_i] <- b_orig[row_i - 1] : replacement has length zero
Here is some more context via a screenshot from my console:
I cannot reliably reproduce the error, but it seems to depend on how I set the count variable, which determines the n-gram I start with. I assume the error has something to do with how the Smith-Waterman algorithm builds up its matrix of values or, looking into the textreuse code, more concretely with the construction of the output vectors:
# Place our first known values in the output vectors
b_out[out_i] <- b_orig[row_i - 1]
a_out[out_i] <- a_orig[col_i - 1]
out_i = out_i + 1L # Advance the out vector position
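One way to get exactly this message in R is an index of 0 on the right-hand side: `x[0]` selects nothing, so the assignment has nothing to store. The sketch below reproduces that in isolation. It assumes, without having confirmed it inside align_local, that `row_i` (or `col_i`) can be 1 when the traceback starts; the vectors are made-up stand-ins.

```r
# Stand-ins for the tokenized text and the pre-allocated output vector.
b_orig <- c("double", "double", "toil")
b_out  <- rep(NA_character_, 3L)
out_i  <- 1L

# Assumption: the traceback starts in the first matrix row.
row_i <- 1L
zero_len <- length(b_orig[row_i - 1L])   # indexing with 0 returns character(0)

# The assignment then fails; capture the message instead of stopping.
res <- tryCatch(
  {
    b_out[out_i] <- b_orig[row_i - 1L]
    "ok"
  },
  error = function(e) conditionMessage(e)
)
print(zero_len)
print(res)

# With any row index of 2 or more, the same assignment succeeds.
row_i <- 2L
b_out[out_i] <- b_orig[row_i - 1L]
```

If this is the mechanism, checking whether `row_i` or `col_i` equals 1 just before the failing line would confirm or rule it out.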
I assume a related problem is described on Stack Overflow, but without a real solution.
Since the overall approach seems to work pretty well for discovering verbatim or near-verbatim Shakespeare text reuse in other hypertexts, I would be really happy to understand what is happening here and how I can fix it.
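Until the cause is understood, a possible stopgap is to catch the error per n-gram so that the loop keeps running. The following is a minimal sketch, not a fix for the underlying problem: `safe_align` is a hypothetical wrapper, and `toy_align` is a stand-in that only imitates a failing aligner so the example runs without textreuse.

```r
# Hypothetical wrapper: returns NULL instead of stopping when align_fn fails.
safe_align <- function(align_fn, a, b, ...) {
  tryCatch(
    align_fn(a, b, ...),
    error = function(e) {
      message("Skipping n-gram: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in aligner (NOT the real one): fails on empty input, else returns a score.
toy_align <- function(a, b) {
  if (!nzchar(a)) stop("replacement has length zero")
  list(score = nchar(a))
}

ngrams <- c("fire burn and cauldron bubble", "", "by the pricking of my thumbs")
results <- lapply(ngrams, function(ng) safe_align(toy_align, ng, "hypertext"))

# Drop the n-grams whose alignment failed.
kept <- Filter(Negate(is.null), results)
print(length(kept))
```

In the real loop, `textreuse::align_local` would presumably take the place of `toy_align`. Logging the n-grams that get skipped would also give a set of inputs for reproducing the error.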