Introduction to Codon Substitution Matrices


     This bio-recipe shows how to use the codon substitution matrices to align DNA sequences.

Mutation Matrices

     Computing the transition counts, and from them the mutation matrices, is a lengthy process that is described in (xxx reference). Counts.drw contains the 64x64 matrix with all 8.4 million transition counts. PAM1.drw contains the logarithm of the 1 CodonPAM mutation matrix computed from these counts. In practice, the logarithm lnM of the mutation matrix is used because it allows the mutation matrix for any desired distance t to be computed simply as M(t):=exp(t*lnM). For example, the 100 CodonPAM mutation matrix can be computed in Darwin as follows:

ReadProgram('CodonMatrix/PAM1.drw');
CodonLogPAM1, CodonFrequencies

     This loads the logarithm of the 1 CodonPAM matrix and assigns it to the variable CodonLogPAM1.

M100:=exp(100*CodonLogPAM1):
transpose(M100)[1];
[0.3967, 0.02164916, 0.1999, 0.01741684, 0.02023601, 0.00504251, 0.00373650, 
0.00488295, 0.06966439, 0.00820036, 0.02267978, 0.00935011, 0.00700514, 
0.00219475, 0.00393033, 0.00274957, 0.02331321, 0.00400041, 0.02201862, 
0.00321225, 0.00227011, 0.00110470, 0.00082483, 0.00109037, 0.01009535, 
0.00650303, 0.00760544, 0.00399207, 0.00094996, 0.00072819, 0.00168992, 
0.00089541, 0.03355702, 0.00601959, 0.01371621, 0.00574102, 0.01014711, 
0.00312464, 0.00216805, 0.00208131, 0.00577375, 0.00229960, 0.00240056, 
0.00242762, 0.00289402, 0.00149123, 0.00233631, 0.00156308, 0, 0.00227151, 0, 
0.00215627, 0.00260469, 0.00185268, 0.00071089, 0.00184531, 0, 0.00057232, 
0.00084416, 0.00052018, 0.00092128, 0.00071734, 0.00087285, 0.00075607]

     The first column of this matrix gives the probabilities that the first codon mutates to each of the 64 codons. (The first codon is AAA because the codons are sorted alphabetically; use CIntToCodon(1) to check.) Note that the probabilities for codons 49, 51 and 57 (TAA, TAG and TGA) are 0. These are the stop codons, and mutations from sense codons to stop codons are not considered in this model.
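
     The codon numbering used above can also be reproduced outside Darwin. The following is a minimal Python sketch, assuming only that the 64 codons are sorted alphabetically over the letters A, C, G and T, as stated in the text. The helper names codon_to_index and index_to_codon are illustrative and are not Darwin functions.

```python
from itertools import product

# All 64 codons in alphabetical order (A < C < G < T).
# product() varies the last position fastest, which yields AAA, AAC, AAG, AAT, ACA, ...
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_to_index(codon):
    """Return the 1-based position of a codon in the alphabetical ordering."""
    return CODONS.index(codon.upper()) + 1


def index_to_codon(i):
    """Return the codon at 1-based position i."""
    return CODONS[i - 1]


# Stop codons named in the text.
STOP_CODONS = ("TAA", "TAG", "TGA")

print(index_to_codon(1))                           # AAA
print([codon_to_index(c) for c in STOP_CODONS])    # [49, 51, 57]
```

     The printed positions match the three zero entries in the vector shown above.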

The CodonMatrix data-structure

     The similarity scores can be calculated from a mutation matrix. (Please see the Dayhoff matrix bio-recipe for more information about scoring matrices.)
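
     The path from a log-matrix to similarity scores can be illustrated on a toy example. The following Python sketch is written under several assumptions: the 3-state frequencies and exchange rates are made up for illustration and are not calibrated to CodonPAM units, the matrix uses the same column convention as above (column j holds the probabilities that state j mutates to each state), and the scores use the usual Dayhoff-style log-odds form 10*log10(M[i][j]/f[i]), which is assumed here to carry over to codon matrices. All function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(a, squarings=10, terms=12):
    """exp(a) via scaling and squaring with a truncated Taylor series."""
    n = len(a)
    scaled = [[x / 2 ** squarings for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes scaled^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def mutation_matrix(log_m, t):
    """M(t) = exp(t * lnM)."""
    return mat_exp([[t * x for x in row] for row in log_m])


def similarity_scores(m, freqs):
    """Log-odds scores 10*log10(M[i][j] / f[i]); every entry of m must be positive."""
    n = len(m)
    return [[10 * math.log10(m[i][j] / freqs[i]) for j in range(n)]
            for i in range(n)]


def toy_log_matrix(freqs, exch):
    """Build a log-matrix whose columns sum to zero and whose stationary frequencies are freqs."""
    n = len(freqs)
    q = [[exch[i][j] * freqs[i] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for j in range(n):
        q[j][j] = -sum(q[i][j] for i in range(n) if i != j)
    return q


# Hypothetical toy data: 3 states instead of 64 codons.
FREQS = [0.5, 0.3, 0.2]
EXCH = [[0.0, 0.02, 0.01],
        [0.02, 0.0, 0.03],
        [0.01, 0.03, 0.0]]

LOG_M = toy_log_matrix(FREQS, EXCH)
M100 = mutation_matrix(LOG_M, 100)
SCORES = similarity_scores(M100, FREQS)
for row in SCORES:
    print(["%.2f" % s for s in row])
```

     In the real codon matrix the entries for the stop codons are 0, so their logarithm is undefined; those entries would need separate handling, which this sketch does not attempt.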

     The empirical codon matrices by Schneider et al. can be found here.

© 2013 by Adrian Schneider, Informatik, ETH Zurich


Last updated on Tue Nov 19 16:17:31 2013 by AS