AlignAlign:  Software for optimally aligning alignments
    
    
   
    
   Alert
 The functionality of AlignAlign is now being provided by the software tool
Opal.
Because Opal is the active implementation of the algorithm for optimally aligning alignments, 
AlignAlign is no longer being maintained.
 
Opal can optimally align two alignments, and is faster than the implementation in AlignAlign because of a number of more recent speedups to the algorithm. To align alignments with Opal, run the command: 
    java -jar Opal.x.jar --in alignment1.fasta --in2 alignment2.fasta 
    
    Description
   
   The software AlignAlign
   takes as input two alignments (of disjoint sequences)
   and outputs one alignment (of all the sequences)
   that contains both input alignments and has optimal score.
   In the multiple sequence alignment that is output,
   the columns of the two input alignments are optimally aligned
   under the sum-of-pairs scoring function, correctly accounting for gaps.
   This software is free for noncommercial use.
   (The terms of use are below.)
    
   AlignAlign implements the algorithm of Kececioglu and Starrett
   from RECOMB 2004.
   (The citation is below.)
   This algorithm computes an optimal alignment of two multiple-sequence
   alignments under the sum-of-pairs objective with linear gap-costs,
   which is NP-complete.
   (In the sum-of-pairs objective, the score of a multiple-sequence alignment
   is the sum of the scores of the two-sequence alignments that it induces on all
   pairs of sequences.
   With linear gap costs, a run of x insertions or deletions
   in a pairwise alignment costs ax + b,
   where a and b are constants.)
   Extensive experiments show that on biological data the algorithm is fast
   and runs in quadratic time.
   More precisely, the theoretical worst-case time for the algorithm is
   O(5min(k,n) max(k,n)2)
   for two alignments each with k sequences and n columns,
   while in practice the observed running time is
   O(k2n2).
    
   AlignAlign implements two versions of the algorithm:
   one optimized for space, the other optimized for time.
   The space-optimized version computes an optimal alignment in space
   linear in the number of input columns.
   The time-optimized version computes an optimal alignment in quadratic
   space, but runs an order of magnitude faster in practice.
   Both can be invoked from AlignAlign. In addition,
   AlignAlign supports non-uniform sequence-pair weights,
   as well as terminal gap penalties that differ from the default penalties.
    
   Example
   
   Below is an example illustrating AlignAlign on a multiple alignment from the 
   PALI database of structural protein alignments.
   The example shows part of an optimal alignment of two 10-sequence subalignments from the Kunitz STI
   inhibitor dataset (accession number 409). The alignment was computed using optimal parameter 
   values (for both amino acid substitution scores and gap-open and -extension penalties) 
   obtained by the algorithm of Kececioglu and Kim (RECOMB 2006). The optimal alignment of 
   the two alignments (which costs 528533) improves on the score of the original PALI benchmark
   (which costs 529280). Computing the alignment took 0.2 seconds on a 2.4 GHz Pentium processor 
   with 512 MB RAM.
  
	 
	 
 
... 1avwb_409        31 AA-PTGNE--R-CPLTVVQSRNELDKGIGTIISSP----Y---R-IRFIAEGHPLSLKFD-SFAVIMLCVG-I--P--TE-WSVV--E-D-LP  100 ... 
... 1tie_409         31 LA-KTGEE--T-CPLTVVQSPNELSDGKPIRIESR----L---R-SAFIPDDDKVRIGFA-Y-A----PKCAP--S--PW-WTVV--------   92 ... 
... 1eyla_409        35 TA-KTGNE--P-CPLTVVRSPNEVSKGEPIRISSQ----F---L-SLFIPRGSLVALGFA-N-P----PSCAA--S--PW-WTVV----D-SP   99 ... 
... SPR1_IPOBA/4     32 LA-HLDM--MSKCATDVIVSPNDLDNGDPITITPA----TADPE-STVVMASTYQTFRFN-I-ATNKL--CVN--N--VN-WGIQ--H-D-SA  104 ... 
... KTI1_SOYBN/2     31 VD-STGKE--I-CPLTVVQSPNELDKGIGLVFTSP----L---H-ALFIAERYPLSIKFG-SFAVITLCAG-M--P--TE-WAIV----E-RE   99 ... 
... ITRA_SOYBN/2     30 AA-PTGNE--R-CPLTVVQSRNELDKGIGTIISSP----Y---R-IRFIAEGHPLSLKFD-SFAVIMLCVG-I--P--TE-WSVV--E-D-LP   99 ... 
... IT1A_PSOTE/2     31 AA-ATGTE--T-CPLTVVRSPNEVSVGEPLRISSQ----L---R-SGFIPDYSVVRIGFA-N-P----PKCAP--S--PW-WTVV--E-D-QP   96 ... 
... IDE3_ERYCA/1     35 LA-KTGEE--T-CPLTVVQSPNELSDGKPIRIESR----L---R-SAFIPDDDKVRIGFA-Y-A----PKCAP--S--PW-WTVV--E-D-EQ   96 ... 
... IECI_ERYVA/2     31 AA-RTGKE--T-CPLTVVQSPFEVSNGEPIRIASQ----F---L-STFIPDGSPYAIGFA-N-P----PSCAA--S--PW-WTVV----E-TS   95 ... 
... ICW3_PSOTE/2     31 TA-KTGNE--P-CPLTVVRSPNEVSKGEPIRISSQ----F---L-SLFIPRGSLVALGFA-N-P----PSCAA--S--PW-WTVV----D-SP   95 ... 
                        || ||||| ||||||||||||||||||||||||||    |   | ||||||||||||||| | ||||||||||  |  || ||||  | | ||  
... 1avac_409        35 MA-PGHG--RH-CPLFVSQDPNGQHDGFPVRITPYGV--A---PSDKIIRLSTDVRISFR-A-Y---TTC-LQ--S--TE-WHID--S-ELAA  105 ... 
... 1wba_409         31 VV-ATGNENPE-DPLSIVKST-RNI-MYATSISSEDKTPP---Q-PRNILENMRLKINFA-T-D---P--H-K--G--DV-WSVV----DFQP   99 ... 
... WIN3_POPSP/3     25 VVNATIN--PI-CNSDVILST-GIE-GLPVTFSPV--INS---T-DGVIREGTLITVSFD-A--S---TCGMA--GVTPM-WKIG--F-NSTA   95 ... 
... IAAS_ORYSA/4     31 MA-PRRV--IP-CPLLVAQETDERRKGFPVRFTPWGG--A---ASPRTLRVSTDVRIRFN-A--A---TLCVQ--S--TE-WHVG--D-EPLT  100 ... 
... IAAS_HORVU/2     31 MA-PGHG--RH-CPLFVSQDPNGQHDGFPVRITPYGV--A---PSDKIIRLSTDVRISFR-A-Y---TTC-LQ--S--TE-WHID--S-ELAA  100 ... 
... ASP_THECC/30     32 LGRATGQ---S-CPEIVVQRRSDLDNGTPVIFSNADS------K-DDVVRVSTDVNIEFV-P---IRDRLCST--S--TV-WRLD--NYDNSA  102 ... 
... MIRA_RICDU/3     31 VS-ATTPNGTFVCPPRVVQTRKEVDHDRPLAFFPE-N--P---K-EDVVRVSTDLNINFS-A-F--MPCRWTS--S--TV-WRLDKYD-E-ST  104 ... 
... IT2_PSOTE/2-     31 AA-KVGNE--D-CPLTVVKSLDENSNGEPIRIASR--L-----R-STFIPEYSLVNLGFA-D--P---PKCAP--S--PF-WTVV--K-D-QS   96 ... 
... ITRY_ACACO/2     31 LA-KTGDE--S-CPLTVVQAQSETKRGLPAVIWTP----P---K-IAILTPGFYLNFEFQPR-D---LPACLQKYS--TLPWKV---E-G---   98 ... 
... ALB1_PSOTE/4     29 VV-ATGNENPE-DPLSIVKST-RNI-MYATSISSEDKTPP---Q-PRNILENMRLKINFA-T-D---P--H-K--G--DV-WSVV----DFQP   96 ... 
	 
	
   
   Complete alignment available here
   
  
   The input alignments and parameter values for this example are provided with the software
   distribution below. 
    
   
    
   Distribution
   
   The source code is in C, distributed as Unix
   tar and
	zip files.
   All noteworthy uses of AlignAlign should cite the paper referenced below.
    
    
     
     
    Acknowledgements
    This research was supported by the US National Science Foundation
    through a grant from the Biological Databases and Informatics Program,
    Grant DBI-0317498, "Robust Tools for Biological Sequence Analysis."
     
     
     
    
    17 May 2007  | 
    John Kececioglu  | 
    Travis Wheeler  | 
    alignalign@cs.arizona.edu  | 
    alignalign.cs.arizona.edu
    
   |