Annotation of the Genome



Once the raw sequence data is in hand, the primary task is to locate the genes in the genome. Structurally, genes can be identified by common features that genes share. For example, a gene is preceded by a promoter sequence, which is often conserved. About 50% of all promoters contain a TATA box roughly 30 bp upstream of the transcription start site, which is the most conserved signal for eukaryotic promoters. If the gene is eukaryotic, it will also contain exon and intron sequences. The exon-intron junctions share common sequences that help to identify these regions.

Identification of genes in prokaryotes is relatively easy, because the start and end points can be identified by finding the initiation and termination codons. However, eukaryotic genes are often long, contain many introns, and are dispersed in the genome. A human gene may span megabase pairs and contain more than fifty introns, which makes scanning such genes extremely difficult.

Several computational tools have been developed to identify gene sequences in raw genome sequence. These tools use predictive algorithms to identify sequences that could code for proteins. A stretch of sequence that can be translated into a potentially meaningful protein sequence is known as an open reading frame (ORF), whereas a sequence that has the structural features of a gene, but whose translated polypeptide does not match any known protein, is considered an unassigned reading frame (URF). The goal of structural and functional genomics, two important branches of genomics research, is to characterize these reading frames and extract information about how they contribute to cellular function.
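To make the idea of scanning for initiation and termination codons concrete, here is a minimal Python sketch of a prokaryote-style ORF search. It is only an illustration, not how any particular gene-finding tool works: the function name `find_orfs` and the `min_codons` parameter are hypothetical, it assumes the standard start codon ATG and stop codons TAA, TAG and TGA, and it reads only the forward strand. It ignores introns and promoter signals entirely, which is why this kind of scan is far less useful for eukaryotic genes.

```python
# Minimal ORF scan: forward strand only, three reading frames.
# Assumes the standard genetic code (ATG start; TAA, TAG, TGA stops).
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end, frame) tuples for ATG...stop stretches.

    start is the 0-based index of the A in ATG, end is the index just
    past the stop codon, and frame is 0, 1 or 2. min_codons counts the
    codons from ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


# Toy sequence: ATG AAA TTT GGG TAA, offset by two bases.
print(find_orfs("CCATGAAATTTGGGTAACC"))  # → [(2, 17, 2)]
```

A real annotation pipeline would also scan the reverse complement, apply a much larger minimum length, and compare the translated product against known proteins before calling a hit an ORF rather than a URF.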
