You should assemble two types of reads using the de Bruijn graph representation, then follow an Eulerian path through that graph to recover the original sequence.
Paired reads with a known distance between them
Input: Your program should be able to read a file.
- The first line gives the length of the sequence on each side of a pair and the length of the gap.
- Each read takes one line.
- The two reads of a pair are separated by "|". Example: AGCC|TTAA
Output: The program prints the assembled sequence to the screen.
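The input format described above could be parsed roughly as follows. This is only a sketch: the function name parse_paired_reads is made up for illustration, and the header is assumed to hold two whitespace-separated integers (read length, then gap length), which the description does not spell out.

```python
from typing import Iterable, List, Tuple


def parse_paired_reads(lines: Iterable[str]) -> Tuple[int, int, List[Tuple[str, str]]]:
    """Parse a header line followed by one 'LEFT|RIGHT' paired read per line."""
    rows = [line.strip() for line in lines if line.strip()]
    # Assumption: the header looks like "4 2" (read length, gap length).
    k, gap = (int(field) for field in rows[0].split())
    pairs = []
    for row in rows[1:]:
        left, right = row.split("|")
        if len(left) != k or len(right) != k:
            raise ValueError(f"read {row!r} does not have length {k} on each side")
        pairs.append((left, right))
    return k, gap, pairs


def read_paired_file(path: str) -> Tuple[int, int, List[Tuple[str, str]]]:
    """Convenience wrapper: an open file is just an iterable of lines."""
    with open(path) as handle:
        return parse_paired_reads(handle)
```

For example, parse_paired_reads(["4 2", "AGCC|TTAA"]) returns (4, 2, [("AGCC", "TTAA")]).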
READS
GACC|GCGC
ACCG|CGCC
CCGA|GCCG
CGAG|CCGG
GAGC|CGGA
GRAPH
(S)GAC,GCG --> ACC,CGC
ACC,CGC --> CCG,GCC
CCG,GCC --> CGA,CCG
CGA,CCG --> GAG,CGG
GAG,CGG --> AGC,GGA(E)
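The GRAPH edges above can be derived mechanically from the reads: each paired read contributes one edge from its pair of prefixes to its pair of suffixes. A minimal sketch, where build_paired_graph is a hypothetical helper name and nodes are represented as tuples of two strings:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, str]


def build_paired_graph(pairs: List[Tuple[str, str]]) -> Dict[Node, List[Node]]:
    """Build a paired de Bruijn graph: one edge per paired read."""
    graph = defaultdict(list)
    for left, right in pairs:
        prefix = (left[:-1], right[:-1])  # drop the last letter of each side
        suffix = (left[1:], right[1:])    # drop the first letter of each side
        graph[prefix].append(suffix)
    return dict(graph)


reads = [
    ("GACC", "GCGC"),
    ("ACCG", "CGCC"),
    ("CCGA", "GCCG"),
    ("CGAG", "CCGG"),
    ("GAGC", "CGGA"),
]
graph = build_paired_graph(reads)
for (a, b), successors in graph.items():
    for (c, d) in successors:
        print(f"{a},{b} --> {c},{d}")
```

Printed this way, the third edge is CCG,GCC --> CGA,CCG, since the suffix of GCCG is CCG.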
PATHS: take only the first letter of every node except the last one, which contributes all of its letters.
GAC -> ACC -> CCG -> CGA -> GAG -> AGC
Prefix = GACCGAGC
GCG -> CGC -> GCC -> CCG -> CGG -> GGA
Suffix = GCGCCGGA
Genome = GACCGAGCGCCGGA (the prefix string followed by the part of the suffix string that does not overlap it, GCCGGA)
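The path walk and the final merge can be sketched as below. Two things here are assumptions rather than part of the example as written: the Eulerian path is found with Hierholzer's algorithm (one common choice), and the gap length is taken to be 2, which is the value consistent with the reads and genome shown. The function names are illustrative.

```python
from collections import defaultdict


def eulerian_path(graph):
    """Hierholzer's algorithm on a dict mapping node -> list of successors."""
    out_edges = {node: list(succs) for node, succs in graph.items()}
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for node, succs in graph.items():
        balance[node] += len(succs)
        for succ in succs:
            balance[succ] -= 1
    # Start at the node with one surplus outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), next(iter(graph)))
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out_edges.get(node):
            stack.append(out_edges[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell_genome(path, k, gap):
    """First letter of every node, all letters of the last one, then merge."""
    prefix = "".join(a[0] for a, _ in path[:-1]) + path[-1][0]
    suffix = "".join(b[0] for _, b in path[:-1]) + path[-1][1]
    overlap = len(prefix) - (k + gap)  # the suffix string starts k + gap later
    if overlap < 0 or prefix[k + gap:] != suffix[:overlap]:
        raise ValueError("prefix and suffix strings do not overlap consistently")
    return prefix + suffix[overlap:]


graph = {
    ("GAC", "GCG"): [("ACC", "CGC")],
    ("ACC", "CGC"): [("CCG", "GCC")],
    ("CCG", "GCC"): [("CGA", "CCG")],
    ("CGA", "CCG"): [("GAG", "CGG")],
    ("GAG", "CGG"): [("AGC", "GGA")],
}
genome = spell_genome(eulerian_path(graph), k=4, gap=2)
print(genome)  # GACCGAGCGCCGGA
```

The overlap check matters when a graph admits more than one Eulerian path: a path whose prefix and suffix strings disagree on the shared region does not spell a valid genome and should be rejected.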