AnCoRe aims to identify complementary residues in multiple sequence alignments (MSAs). This allows the prediction of potentially interacting intramolecular residues in homologous proteins that share a common fold, as in the GPHRs. The assumption is that if one of two interacting residues has changed its side chain properties through evolutionary mutation, the other residue has simultaneously changed its side chain to maintain the interaction through complementary properties. To find these residue positions in an MSA, the complementarity of the amino acids is evaluated by comparing the properties of the participating amino acids.




Objectives

Besides predicting potentially interacting intramolecular residues, AnCoRe can also be used to predict the residues responsible for the selective interaction of two sets of homologous proteins that share a conserved binding mechanism. For this application, AnCoRe is applied to two corresponding MSAs. To find these residue positions, the complementarity of the amino acids is evaluated by comparing the properties of the participating amino acids.


Description

Introduction

Most cellular processes depend largely on specific intra-protein interactions for proper protein folding and on inter-protein interactions for specific protein-protein binding. However, experimental analyses of these interactions (e.g. by mutation) are expensive and time-consuming. To reduce the number of wet-lab experiments needed, good working hypotheses are required. Such hypotheses can often be obtained by deciphering the information hidden in the vast amount of available sequence data. Alignments of homologous proteins sharing a common fold are especially valuable, and the approach presented here makes use of them.

Rationale

The specificity of intra-protein and protein-protein interactions is mediated for the most part by the properties of the amino acid side chains. Only complementary properties, suitable for the interaction between amino acid side chains, allow the respective polypeptide chains to bind. For the interaction to persist throughout the course of evolution, a change in the property of one of the interacting partners has to be compensated by a corresponding change in the other partner. These complementary mutations can be spotted in multiple sequence alignments (MSAs) and permit the identification of potentially interacting residues.

Prerequisites

  • Large family of homologous proteins

    Intra-molecular (fold stabilizing) interactions:

  • A common fold
  • A single, correct multiple sequence alignment

    Inter-molecular (protein-protein) interactions:

  • A conserved binding mechanism
  • Two correct multiple sequence alignments, one for each of the two binding partners, with corresponding partners in the same rows



Flow chart of the iterative process involving AnCoRe

Summary

AnCoRe is a fast, easy-to-use tool that helps the user analyze multiple sequence alignments for potentially interacting residues when no other prior knowledge is available. It can be applied to intra-protein interactions to evaluate the protein fold, e.g. in homology modeling, especially for segments with low sequence conservation. It can also be applied to protein-protein interactions, by comparing two sequence alignments of the participating proteins, to identify residues potentially responsible for the interaction.

Implementation

AnCoRe is implemented in Tcl/Tk for cross-platform use, so it should in principle run on every platform supported by Tcl/Tk. AnCoRe has been successfully tested on the following platforms using Tcl/Tk 8.3:

  • Linux
  • SGI Irix 6.5
  • Microsoft Windows NT/2000/XP


Method

The MSA consists of m sequences S, each containing n residues aa taken from the amino acid alphabet AA (1).

Amino acid alphabet

Amino acids are grouped into classes of similar amino acid properties. Using this reduced class alphabet, in which each member c represents a subset CD_c of the amino acid alphabet, the MSA is converted into m class sequences C (2).

Class definition - reduced amino acid alphabet
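
The conversion into a reduced class alphabet can be illustrated with a minimal Python sketch. The three-class charge alphabet, the class labels "+", "-" and "o", and the function name to_class_sequence are assumptions made for this illustration; they are not taken from the class definition files shipped with AnCoRe, and AnCoRe itself is written in Tcl/Tk.

```python
# Hypothetical three-class charge alphabet (an assumption for illustration,
# not the contents of any class definition file shipped with AnCoRe).
CLASS_DEFINITION = {
    "+": set("KR"),  # positively charged
    "-": set("DE"),  # negatively charged
}
DEFAULT_CLASS = "o"  # every other symbol is treated as non-charged


def to_class_sequence(sequence, class_definition=CLASS_DEFINITION):
    """Translate an amino acid sequence into its class sequence."""
    lookup = {aa: label
              for label, members in class_definition.items()
              for aa in members}
    return "".join(lookup.get(aa, DEFAULT_CLASS) for aa in sequence)


msa = ["AKDL", "AEKL", "ARDL"]  # toy alignment: m = 3 sequences, n = 4 residues
class_msa = [to_class_sequence(s) for s in msa]
print(class_msa)  # ['o+-o', 'o-+o', 'o+-o']
```

In this sketch any symbol that is not listed in a class, including gap characters, falls into the default class; how gaps are really handled is not stated here.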

The class break pattern is obtained by evaluating the changes of amino acid classes from sequence to sequence at a given homologous residue position. CB is then the set of sequence numbers for which the residue class c_s at a given residue position r differs from the residue class c_(s-1) of the preceding sequence (3).

Class break pattern
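
As a rough illustration of the class break pattern, the following Python sketch collects the sequence numbers at which the class changes relative to the preceding sequence. The function name class_breaks, the 1-based sequence numbering and the 0-based position index are assumptions of this sketch.

```python
def class_breaks(class_msa, position):
    """Return the set CB of 1-based sequence numbers s whose class at
    `position` (0-based column index) differs from that of sequence s-1."""
    column = [seq[position] for seq in class_msa]
    return {i + 1 for i in range(1, len(column)) if column[i] != column[i - 1]}


# Toy class sequences over a hypothetical "+", "-", "o" class alphabet.
class_msa = ["o+-o", "o-+o", "o+-o"]
print(class_breaks(class_msa, 1))  # {2, 3}
print(class_breaks(class_msa, 0))  # set(): the position is fully conserved
```

Because the pattern depends on which sequence precedes which, the order of the sequences in the alignment affects the result.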

The complementarity of two residue positions in one multiple-sequence alignment (intra-protein) or two corresponding multiple-sequence alignments (protein-protein) is evaluated by AnCoRe using three scores:

1. CIS - the amino acid class complementarity score: CIS is the mean score of all class pairs at the two residue positions (4).

CIS - class interaction score

In addition to predefined class definitions [Murphy et al. 2000], the user can supply customized ones. The class complementarity matrix M_CIS [Betancourt et al. 1999], which defines the corresponding complementarity scores, can also be customized by the user. This lets the user search for specific interaction types only (e.g. charged or aromatic interactions) while suppressing unwanted, less specific potential interactions.
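
The mean-of-pairs idea behind the CIS can be sketched in Python as follows. The matrix values, the class labels and the function name class_interaction_score are hypothetical; the sketch only rewards oppositely charged pairs and scores everything else as 0.0, which mimics a matrix that suppresses all other interaction types.

```python
# Hypothetical class complementarity matrix on a 0..1 scale (an assumption,
# not one of the matrices shipped with AnCoRe). Unlisted pairs score 0.0.
M_CIS = {
    ("+", "-"): 1.0,
    ("-", "+"): 1.0,
}


def class_interaction_score(column_a, column_b, matrix=M_CIS):
    """Mean matrix score over the class pairs found in the same sequence
    (row) at two residue positions."""
    if len(column_a) != len(column_b):
        raise ValueError("both positions must cover the same sequences")
    scores = [matrix.get((a, b), 0.0) for a, b in zip(column_a, column_b)]
    return sum(scores) / len(scores)


# Classes observed at two positions across four sequences.
print(class_interaction_score("+-+o", "-+-o"))  # 0.75
```

The same averaging applied to amino acid pairs with a 20 x 20 amino acid matrix would give a score of the AIS kind described in the next item.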

2. AIS - the amino acid complementarity score: AIS is the mean score of all amino acid pairs at two residue positions taken from an amino acid complementarity matrix M_AIS [Betancourt et al. 1999] (5).

AIS - amino acid interaction score

Different matrices can be used and even customized. Calculating the AIS in addition refines the coarse-grained results of the CIS evaluation while retaining the ability of the CIS to filter out less likely interactions.

3. CBS - the class break similarity score: The comparison of two class break patterns at two residue positions for similarity yields the CBS (6).

CBS - class break similarity score

The CBS emphasizes the simultaneity of property changes at the two residue positions.
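
The exact similarity measure behind the CBS is not given in this text. As one plausible stand-in, the following Python sketch compares two class break sets with the Jaccard index (shared breaks divided by all breaks). Both the choice of measure and the value returned for two positions without any break are assumptions of this sketch, not AnCoRe's definition.

```python
def class_break_similarity(breaks_a, breaks_b):
    """Jaccard index of two class break sets (assumed similarity measure)."""
    union = breaks_a | breaks_b
    if not union:
        # Neither position has a class break: the (empty) patterns are
        # identical, so this sketch scores them as fully similar.
        return 1.0
    return len(breaks_a & breaks_b) / len(union)


print(class_break_similarity({2, 3}, {2, 3}))  # 1.0: changes are simultaneous
print(class_break_similarity({2, 3}, {3, 5}))  # one of three breaks is shared
```

Under this assumed measure, two positions score highest when their classes change at exactly the same sequences.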

RIR - the three scores are combined into a composite RIR score, which is used to rank the potentially interacting residue pairs (7).

RIR - composite residue interaction rank

Additionally it is possible to analyze each score individually to focus only on a specific aspect.
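
How the three scores are combined is not spelled out in this text. The Python sketch below assumes a simple weighted sum and then sorts residue pairs by the result. The formula, the function name residue_interaction_rank and all score values are invented for illustration and are not AnCoRe results.

```python
def residue_interaction_rank(cis, ais, cbs, weights=(1.0, 1.0, 1.0)):
    """Combine the three scores as a weighted sum (assumed formula)."""
    w_cis, w_ais, w_cbs = weights
    return w_cis * cis + w_ais * ais + w_cbs * cbs


# Invented (CIS, AIS, CBS) scores for three hypothetical residue pairs.
pairs = {
    (1, 2): (1.0, 0.5, 1.0),
    (3, 4): (0.0, 0.75, 0.5),
    (5, 6): (0.5, 0.5, 0.5),
}

ranked = sorted(pairs,
                key=lambda pair: residue_interaction_rank(*pairs[pair]),
                reverse=True)
print(ranked)  # [(1, 2), (5, 6), (3, 4)]
```

Setting two of the weights to 0.0 in this sketch ranks the pairs by a single score, which corresponds to analyzing one aspect in isolation.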



Tutorial

To follow the tutorial, please download the synthetic example alignments tutmsa1.msf and tutmsa2.msf and install AnCoRe as described below. Then you can follow the tutorial outlined in the next section:



tutmsa1.msf    Synthetic multiple sequence alignment (MSA) for protein 1 of the tutorial
tutmsa2.msf    Synthetic multiple sequence alignment (MSA) for protein 2 of the tutorial


Prior to starting a new AnCoRe project:

  1. Decide which directory the AnCoRe analyses should go to, e.g. ~/ancoretut
  2. Place the newly downloaded tutmsa1.msf and tutmsa2.msf in this directory.

The project definition in the main AnCoRe window

Start AnCoRe and define a new project:

  1. Start AnCoRe (see below on how to install it).
  2. Click [New...] in order to create a new project.
  3. Provide the path to the directory under which you want the project to be created (for each project a new directory will be created), e.g. "ancoretut"
  4. Give the project a name, e.g. "tutmsa". This will be the name of the directory that is created for the project.
  5. Select the analysis mode "Inter-protein correlations" (This is settable in the top area of the AnCoRe main window).
  6. Provide an alignment for Protein 1: click [Choose...] and select the "tutmsa1.msf" provided with this tutorial.
  7. Provide an alignment for Protein 2: click [Choose...] and select the "tutmsa2.msf" provided with this tutorial.
  • For the "Intra-protein correlations" mode you need only to provide an alignment for Protein 1.
  • For "Inter-protein correlations" you have to provide an alignment for protein 1 AND protein 2. Both alignments MUST match each other in the way that the ith protein in alignment 1 interacts with the ith protein in alignment 2. This implies that both alignments MUST have the same number of proteins.

The selection of the matrices in the main AnCoRe window

Select the matrices to use for this analysis. The AnCoRe installation provides a basic set of matrices (see below in the installation section) which are installed in the AnCoRe installation path under "matrices".

  1. For the amino acid class definition (CDF), click [Choose...] and select "3_syn_charge.cdf" from "ancore/matrices". This class definition focuses on the interaction of charged residues. For further explanation of the matrices provided with AnCoRe see below in the installation section.
  2. For the amino acid class interaction matrix (CIF), click [Choose...] and select "3_syn_charge_syn.cif". Always make sure you select the class interaction matrix that corresponds to the specified class definition. If the two selections do not match, unpredictable results may be obtained.
  3. Finally, for the amino acid interaction matrix (AIF), click [Choose...] again and select "20_betancourt_b_scale01.aif".

The class definition edit window.
The class interaction matrix edit window.
The amino acid interaction matrix edit window.

An important feature of AnCoRe is the possibility to define custom matrices focusing on a specific type of interaction while suppressing many other potential interactions. The user can also customize the provided matrices. In order to edit a given matrix click on [Edit...]:

  1. To view the content of the chosen class definition, click [Edit...]. After inspecting it, close the window without saving.
  2. Repeat this for the class interaction matrix.
  3. Repeat this for the amino acid interaction matrix.

The setting of the analysis threshold values.

Set the thresholds and weights for the AnCoRe analysis and start the calculation.

  1. Click [Edit values...] to make the parameter section of the main window accessible.
  2. Set all thresholds to "0.0". You can also apply arbitrary thresholds after the analysis. Setting thresholds before the calculation is useful for larger alignments because it reduces the size of the final results.
  3. Set the weights to "1.0". Since the provided matrices are rescaled to the 0 to 1 range, their influence on the final residue interaction rank (RIR) is then equal. The weights can be used either to compensate for different scoring ranges of different matrices or to control the influence of a certain score on the total rank, thus focusing more on a specific score.
  4. Start the calculation by clicking [Calculate].

The AnCoRe analysis produces a number of files in the project directory.

  1. The project file: tutmsa.prj
    It contains the parameter settings used when calculating the project. It can be used as a reference or to reload an already calculated project.
  2. The cma files: tutmsa1.cma and tutmsa2.cma
    Each msf file is converted into an AnCoRe internal representation of the alignment which is saved as a cma file for future reference. The alignment is already transposed in these files.
  3. The cla files: tutmsa1.cla and tutmsa2.cla
    The cla files are very similar to the cma files but the amino acid sequence has been translated into the class sequence according to the given class definition. These are basically sequences using a reduced amino acid alphabet.
  4. The ccb files: tutmsa1.ccb and tutmsa2.ccb
    These files contain the class break patterns for the alignments.
  5. The cis file: tutmsa1_tutmsa2.cis
    It contains the class interaction score for those residue pairs that score above the given threshold.
  6. The cbs file: tutmsa1_tutmsa2.cbs
    It contains the class break similarity score for those residue pairs that score above the given threshold.
  7. The ais file: tutmsa1_tutmsa2.ais
    It contains the amino acid interaction score for those residue pairs that score above the given threshold.
  8. The rir file: tutmsa1_tutmsa2.rir
    It contains the residue interaction rank.
  9. The summary table files: tutmsa1_tutmsa2_v.tab and tutmsa1_tutmsa2_h.tab
    These files contain the summarized results in table form, with the residue pair records arranged either in rows or in columns. The fields are separated by tabs. These files facilitate data exchange with other software packages, e.g. Excel, R or relational databases, which can be very helpful for including further information (e.g. structural neighborhood) or for performing more complex queries.
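
Tab-separated tables of this kind can be read with standard tools. The Python sketch below parses a small in-memory stand-in with the standard csv module and filters it by one score. The column names, the presence of a header line and all values are assumptions for illustration; the real layout of the summary table files may differ.

```python
import csv
import io

# Hypothetical stand-in for a row-oriented summary table. The column names
# and values are invented; check an actual .tab file for its real layout.
SAMPLE = (
    "res1\tres2\tCIS\tAIS\tCBS\n"
    "1\t2\t1.0\t0.5\t1.0\n"
    "3\t4\t0.0\t0.75\t0.5\n"
)


def load_table(handle):
    """Read a tab-separated table with a header line into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = load_table(io.StringIO(SAMPLE))
# Keep only the residue pairs whose (assumed) CIS column is at least 0.5.
strong = [row for row in rows if float(row["CIS"]) >= 0.5]
print([(row["res1"], row["res2"]) for row in strong])  # [('1', '2')]
```

For a file on disk, the same function could be called with a handle from open(path, newline=""), where path is the location of the table file in the project directory.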

The results analysis window.

AnCoRe itself provides a tool to visualize and filter the results.

  1. Click [Show results] to open the Results window.

  2. A few example potential interactions can be summarized as follows:
  • The clearly complementary residues 11 and 4 rank first if all three scores are combined.
  • The potential interactions between the completely conserved residue 5 of protein 1 and residues 3, 8 and 14 of protein 2 also rank very high. This shows, however, that completely conserved residue positions do not carry much information. Such residues can be suppressed by requiring at least 1 class break. This option is available in the main AnCoRe window.
  • Remember that the class interaction matrix completely suppressed hydrophobic interactions (see CIS score of 0.0), which are probably the most difficult to handle. The amino acid interaction matrix still ranks these types of interactions high. To avoid this, the results list can be re-sorted using only the CIS score.
  • Now try other class definitions and class interaction matrices to filter for other types of interactions. Interactions like the one between residue 6 of protein 1 and residue 6 of protein 2 will be ranked better if, for example, the Betancourt HP interaction model is used (select 5_betancourt_hp.cdf and 5_betancourt_hp_scale01.cif). Residue positions in which an electrostatic interaction is replaced by a hydrophobic interaction require that hydrophobic interactions are considered as well. To then suppress the high number (due to the combinatorial increase) of suggested interactions between completely hydrophobic residue positions from the hydrophobic core of a protein, again require at least 1 class break.


You have successfully finished the tutorial!




Download

The current release, AnCoRe V.1.0.1, is available as a .tar.gz or .zip archive. The archives contain the AnCoRe script together with some example matrices in a predefined directory structure.



Installation

To install the AnCoRe package, please download one of the above archives and unpack it to a directory of your choice. Under Linux and Unix you can use the following command:

 % gunzip -c ancore_101.tar.gz | tar xvf -

Under Windows, use WinZip or a similar tool and make sure you also extract the directory structure.

The unpacking of the archive produces the following directory structure:

/ancore
    The main AnCoRe installation directory

    ancore.cfg
        The optional AnCoRe configuration file defining defaults for certain parameters

/ancore/bin
    Contains the Tcl/Tk script and shell scripts to start the program

    ancore.tcl
        The main AnCoRe script
    ancore
        The Bourne shell start script for Linux/Unix
    ancore.bat
        The Windows start script

/ancore/matrices
    Contains example matrices for CDF, CIF and AIF

    2_syn_aromat.cdf
        Class definition (CDF) filtering aromatic residues
        (aromatic/non-aromatic)
    2_syn_aromat.cif
        Synthetic class interaction matrix (CIF) for aromatic residues
    3_syn_charge.cdf
        Class definition (CDF) filtering charged residues
        (positively charged/negatively charged/non-charged)
    3_syn_charge_syn.cif
        Synthetic class interaction matrix (CIF) for charged residues
    4_syn_hbond.cdf
        Class definition (CDF) filtering charged and H-bond acceptor/donor residues
        (positively charged/negatively charged/polar/unpolar)
    4_syn_hbond_woWY.cdf
        Class definition (CDF) filtering charged and H-bond acceptor/donor residues, with W/Y as unpolar residues
        (positively charged/negatively charged/polar/unpolar)
    4_syn_hbond_syn.cif
        Synthetic class interaction matrix (CIF) for charged and H-bond acceptor/donor residues
    5_betancourt_hp.cdf
        Class definition (CDF) filtering the reduced hydrophobic/polar amino acid alphabet proposed by Betancourt et al.
        (positively charged/negatively charged/polar/C,H,P,G/unpolar)
    5_betancourt_hp_scale01.cif
        Calculated class interaction matrix (CIF) for the reduced hydrophobic/polar amino acid alphabet proposed by Betancourt et al.
    20_betancourt_b_scale01.aif
        Calculated amino acid interaction matrix (AIF) for the complete amino acid alphabet by Betancourt et al.

A working Tcl/Tk (version 8.3 or later) scripting environment is needed in order to run the AnCoRe script. Most Linux distributions are delivered with a current version of Tcl/Tk, but it may be necessary to install the package separately. For Windows and some other platforms the sources and for some platforms also pre-compiled binaries are available at:

A very good free development environment that can also be used as a runtime environment is

Especially for SGI Irix, the TclPro package provides a current pre-compiled binary.

For Unix/Linux, edit the /ancore/bin/ancore Bourne shell script as follows:
Provide the full path to the Tcl/Tk binary wish as the variable TCLBIN and set the variable ANCOREPATH to the main AnCoRe installation directory into which you unpacked the AnCoRe files.

For Windows, edit the /ancore/bin/ancore.bat batch file as follows:
Provide the full path to the Tcl/Tk binary wish.exe as the variable TCLBIN and set the variable ANCOREPATH to the main AnCoRe installation directory into which you unpacked the AnCoRe files. You can create a shortcut to this batch file and put it on your desktop or in your Start menu.

You should now be able to start AnCoRe using the start scripts:
Under Unix/Linux (you must have the ancore/bin directory in your path):

 % ancore

Under Windows:
Double click ancore.bat or the shortcut you have created.



References

  • Betancourt, M.R. and Thirumalai, D. 1999. Pair potentials for protein folding: Choice of reference states and sensitivity of predicted native states to variations in the interaction schemes. Protein Science 8: 361-369.
  • Murphy, L.R., Wallqvist, A. and Levy, R.M. 2000. Simplified amino acid alphabets for protein fold recognition and implications for folding. Protein Engineering 13: 149-152.