Shuffle the string sequences such that the k-mer frequency is preserved in each string

kchu25/SeqShuffle.jl

SeqShuffle

Shuffle a string so that the frequency of every k-mer (length-k substring) in it is preserved (k $\geq$ 1).
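
To make "preserves the k-mer frequency" concrete, here is a minimal sketch that counts overlapping k-mers and compares two strings. The helper `kmer_counts` is hypothetical and is not part of SeqShuffle.jl; it uses only Julia's Base. The two strings are the input and the k=2 output from this README's own non-DNA example.

```julia
# Hypothetical helper (not provided by SeqShuffle.jl):
# count every overlapping k-mer (length-k substring) in a string.
function kmer_counts(s::AbstractString, k::Integer)
    chars = collect(s)                  # work on characters, so non-ASCII input is safe
    counts = Dict{String,Int}()
    for i in 1:(length(chars) - k + 1)  # empty range when the string is shorter than k
        kmer = String(chars[i:i+k-1])
        counts[kmer] = get(counts, kmer, 0) + 1
    end
    return counts
end

original = "ababacraggrac"
shuffled = "ababaggracrac"   # k=2 shuffle of `original`, as shown in this README

# A k-mer-preserving shuffle leaves the count table unchanged.
println(kmer_counts(original, 2) == kmer_counts(shuffled, 2))   # true for this pair
```

The same comparison can serve as a sanity check on any shuffled output, since equal count tables are exactly what the package description promises.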

Installation

To install SeqShuffle.jl use Julia's package manager:

pkg> add SeqShuffle

Usage

using SeqShuffle

# an example string
str = "CAGCCCCGCAGGCCACTGCCTCGCC";

# shuffle the string such that it preserves the frequency of 2-mers
seq_shuffle(str; k=2)
> "CTGCCAGCCCCCAGCGCACGGCCTC"

# shuffle the string such that it preserves the frequency of 3-mers
seq_shuffle(str; k=3)
> "CAGCCAGGCCGCACTGCCCCTCGCC"

# k=1 is just the ordinary shuffle
seq_shuffle(str; k=1)
> "CGTTACCGCGCGGCCCACCCAGCCC"

# Shuffling is not restricted to DNA alphabets; other alphabets
# work as well
seq_shuffle("ababacraggrac"; k=2)
> "ababaggracrac"

# use Julia's dot (broadcast) syntax to shuffle every string in a vector
vec_str = ["GCCCCGCAGGCCACTG", "CGCAGGCCTG", "CGTTTTCGCCTCGAAAAG"];
seq_shuffle.(vec_str; k=2)
> 3-element Vector{String}:
  "GCCCCCGCAGGCACTG"
  "CGCCAGGCTG"
  "CCTCGAAAAGTTTTCGCG"

# shuffle every string in a FASTA file such that the frequency of 2-mers
# in each string is preserved, and save the result as a new FASTA file.
# Input and output are absolute file paths given as strings.
# (optional) Use a fixed seed for reproducibility.
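# Signature:
#   shuffle_fasta(fasta_location::String, fasta_output_location::String;
#                 k=2, seed::Union{Nothing, Int}=1234)
# Example call; both paths are placeholders:
shuffle_fasta("/path/to/input.fasta", "/path/to/output.fasta"; k=2, seed=1234)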