This Multiple Sequence Alignment using Genetic Algorithm project in Java aims to develop a tool for aligning multiple DNA sequences. DNA molecules are chains of nucleotides. There are four types of nucleotides, denoted by A, T, G and C. The primary structure of a protein is a linear chain of amino acids. There are twenty amino acids, denoted by A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y and V. Therefore, both proteins and DNA molecules can be represented as strings of letters from a relatively small alphabet.

Genetic Algorithm Project Code and Report in Java

PROBLEM DEFINITION:

Multiple Sequence Alignment (MSA) refers to the problem of optimally aligning three or more sequences of symbols by inserting gaps between the symbols. The objective is to maximize the number of matching symbols between the sequences while using as few gap insertions as possible. This problem appears in several fields, such as molecular biology, geology and computer science. In biology it is especially important for constructing evolutionary trees based on DNA sequences and for analyzing protein structures to help design new proteins. MSA belongs to the class of combinatorial optimization problems, whose known exact solutions take time exponential in the number of sequences.
To compare different alignments, a fitness function is defined based on the number of matching symbols and the number and size of gaps.
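A fitness function of this kind could be sketched in Java as follows. This is only an illustration under stated assumptions, not the project's actual code: the class name AlignmentFitness, the '-' gap character and the three weight constants are hypothetical, since the description does not give the values or the exact formula used. The sketch rewards every pair of rows that share the same symbol in a column, and penalizes each run of gaps (the number of gaps) and each gap symbol (the size of gaps).

```java
import java.util.List;

final class AlignmentFitness {
    // Assumed weights; the project description does not state the values it uses.
    static final int MATCH_REWARD = 2;      // per matching pair of symbols in a column
    static final int GAP_OPEN_PENALTY = 2;  // charged once per run of consecutive gaps
    static final int GAP_SIZE_PENALTY = 1;  // charged for every gap symbol

    /** Counts, over every pair of rows, the columns where both rows hold the same non-gap symbol. */
    static int countMatches(List<String> rows) {
        int matches = 0;
        for (int i = 0; i < rows.size(); i++) {
            for (int j = i + 1; j < rows.size(); j++) {
                String a = rows.get(i);
                String b = rows.get(j);
                for (int c = 0; c < a.length(); c++) {
                    char x = a.charAt(c);
                    if (x != '-' && x == b.charAt(c)) {
                        matches++;
                    }
                }
            }
        }
        return matches;
    }

    /** Counts the runs of consecutive '-' characters in one aligned row. */
    static int countGapRuns(String row) {
        int runs = 0;
        boolean inGap = false;
        for (int c = 0; c < row.length(); c++) {
            boolean gap = row.charAt(c) == '-';
            if (gap && !inGap) {
                runs++;
            }
            inGap = gap;
        }
        return runs;
    }

    /** Counts the '-' characters in one aligned row. */
    static int countGapSymbols(String row) {
        int symbols = 0;
        for (int c = 0; c < row.length(); c++) {
            if (row.charAt(c) == '-') {
                symbols++;
            }
        }
        return symbols;
    }

    /** Higher is better: match reward minus gap penalties. All rows must have equal length. */
    static int fitness(List<String> rows) {
        if (rows.isEmpty()) {
            return 0;
        }
        int width = rows.get(0).length();
        for (String row : rows) {
            if (row.length() != width) {
                throw new IllegalArgumentException("aligned rows must have equal length");
            }
        }
        int score = MATCH_REWARD * countMatches(rows);
        for (String row : rows) {
            score -= GAP_OPEN_PENALTY * countGapRuns(row)
                   + GAP_SIZE_PENALTY * countGapSymbols(row);
        }
        return score;
    }
}
```

With these assumed weights, aligning ACGT against ACTTGT as AC--GT / ACTTGT scores 4, while A-C-GT / ACTTGT scores 0, so the first alignment would be preferred: it has more matches and a single gap run instead of two.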

CONCLUSION:

This project is aimed at solving multiple sequence alignment using a novel genetic algorithm. The performance and efficiency of the algorithm were evaluated, and its results were compared with those of the existing ClustalX MSA tool for validation. The real data sets were provided by the Department of Biotechnology, Pondicherry University. The results were satisfactory, and the tool can be used for sequence analysis.

This project has helped in understanding the importance of genetic algorithms for solving NP-complete problems.

FORESEEABLE ENHANCEMENTS:

  • It can be enhanced to solve MSA in a grid environment, keeping the basic algorithm the same, so that the work is easier to distribute.
  • It can be further extended to find idle systems in the network before distributing tasks.
  • It can include a phylogenetic tree construction algorithm to extend the tool's functionality.
  • The project can be further enhanced by including functions to find motifs and conserved regions in the alignment.

Download Genetic Algorithm Project Code and Report in Java source code, project report, design details and database.