Skip to content

Representation of Bio Sequences via Chaos Game and using the same to find similarities

Notifications You must be signed in to change notification settings

Rohith-2/Chaos-Game-Representation_BioSeq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Chaos-Game-Representation_BioSeq

Representation of Bio Sequences via Chaos Game and using the same to find similarities.

Medium : https://rrohith2001.medium.com/chaos-game-representation-of-genetic-sequences-e0e6bdcfaf6c


Data

Each excel file is a collection of all sequences in the respective family.


GUI

https://share.streamlit.io/rohith-2/chaos-game-representation_bioseq/stream.py

Screenshot 2021-05-18 at 7 41 30 PM


Gene-Similarity

CGR Matrix is a 2D matrix => (x,y) which consists of normalised value ranging from 0 to 1, which depicts the intensity of a color at any given (x,y)
The first two rows are considered for similarity measurement:

cgr_vec = Empty Vector()
for i <- cgr matrix of SEQ_1 # i iterates row wise
  a = max(i)
  new_row = i/a # Element-Wise Division
  cgr_vector = cgr_vector + new_row
  
cgr_vec_2 = Empty Vector()
for i <- cgr matrix of SEQ_2 # i iterates row wise
  a = max(i)
  new_row = i/a # Element-Wise Division
  cgr_vector_2 = cgr_vector_2 + new_row

Correlation(cgr_vec,cgr_vec_2)

cgr_vec and cgr_vec_2 will be vectors which can be utilised for measuring similarity via Spearmans correlation.


Running the GUI

The GUI is universally accesible via the above mentioned link to run in locally :

git clone https://github.com/Rohith-2/Chaos-Game-Representation_BioSeq.git
pip install -r requirements.txt  
cd Chaos-Game-Representation_BioSeq/GUI/
streamlit run gui_v2.py