merlijnvb/8QA05

8QA05

Understanding how diseases develop and progress at the cellular level is an important research topic. Proteins play a major role in this, and their production is controlled by the genes expressed in the cell. Measuring the extent to which genes are expressed in a cell has therefore become a promising technique for understanding, and possibly even curing, some diseases. One technique for measuring gene expression is the microarray, which makes it possible to determine the expression values of a very large number of genes (on the order of 10,000) in a single experiment. Comparing gene expression values of, for example, "healthy" and "diseased" tissue can reveal which genes play a role in the development of the disease or in the healing process. If the measurements are repeated at different times, information about the course of the disease can also be extracted.

Processing and analyzing this much data is impossible without computers. The first step is preprocessing the data. Among other things, genes whose expression levels do not rise above the noise level are omitted. Then the relative expression can be calculated: a number that indicates how much more strongly a gene is expressed in healthy tissue than in diseased tissue.

After preprocessing, the data consists of the relative expression values of a large number of genes at 8 time points. Each gene therefore corresponds to a point in 8-dimensional space. Two genes with approximately the same expression profile over time correspond to points that lie close to each other in that space. To find groups of genes with the same behavior, the points are clustered, that is, divided into groups of points located close to each other. Various clustering algorithms have been described in the literature, some of which will be implemented in Python.
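
To make the preprocessing and clustering steps concrete, here is a minimal sketch in plain Python. It is not the repository's code, and several choices are assumptions made for illustration: the function names `relative_expression` and `k_means`, the `noise_floor` parameter, the rule that a gene is kept only if every measurement exceeds that floor, and the use of a log2 ratio as the relative expression. k-means is used only as one well-known clustering algorithm; the text does not say which algorithms are implemented.

```python
import math
import random


def relative_expression(healthy, diseased, noise_floor):
    """Return {gene: [log2(healthy / diseased) per time point]}.

    Assumption: a gene is dropped unless all of its measurements
    are strictly above noise_floor in both tissues.
    """
    result = {}
    for gene, h_values in healthy.items():
        d_values = diseased[gene]
        if min(h_values) <= noise_floor or min(d_values) <= noise_floor:
            continue  # signal too weak to trust
        result[gene] = [math.log2(h / d) for h, d in zip(h_values, d_values)]
    return result


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, k, iterations=100, seed=0):
    """Cluster points with basic k-means; return (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as starting centroids
    labels = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Made-up intensities for hypothetical genes at 8 time points.
    healthy = {"gene_a": [400.0] * 8, "gene_b": [380.0] * 8,
               "gene_c": [100.0] * 8, "gene_d": [5.0] * 8}
    diseased = {"gene_a": [100.0] * 8, "gene_b": [100.0] * 8,
                "gene_c": [400.0] * 8, "gene_d": [5.0] * 8}
    profiles = relative_expression(healthy, diseased, noise_floor=10.0)
    genes = list(profiles)
    labels, _ = k_means([profiles[g] for g in genes], k=2)
    for gene, label in zip(genes, labels):
        print(gene, "-> cluster", label)
```

Each gene's list of 8 relative expression values is treated as one point in 8-dimensional space, so genes with similar profiles over time end up in the same cluster. The result of k-means depends on the starting centroids, which is why the sketch takes a `seed`.
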

The final phase is the interpretation of the clustering results: analyzing how the different clusters relate to the biological processes in the tissue.