This bio-recipe shows how to use the codon substitution matrices to align DNA sequences.

## Mutation Matrices

Computing the transition counts, and from them the mutation matrices, is a long process that has been described in (xxx reference). Counts.drw contains the 64x64 matrix with all 8.4 million transition counts. PAM1.drw contains the logarithm of the 1 CodonPAM mutation matrix computed from these counts. In practice the logarithm lnM of the mutation matrix is used, because a mutation matrix for any desired distance t can then be computed simply as M(t):=exp(t*lnM). For example, the 100 CodonPAM mutation matrix can be computed in Darwin as follows:

ReadProgram('CodonMatrix/PAM1.drw');

CodonLogPAM1, CodonFrequencies

This loads the logarithm of the 1 CodonPAM matrix and assigns it to the variable CodonLogPAM1.

M100:=exp(100*CodonLogPAM1): transpose(M100)[1];

[0.3967, 0.02164916, 0.1999, 0.01741684, 0.02023601, 0.00504251, 0.00373650, 0.00488295, 0.06966439, 0.00820036, 0.02267978, 0.00935011, 0.00700514, 0.00219475, 0.00393033, 0.00274957, 0.02331321, 0.00400041, 0.02201862, 0.00321225, 0.00227011, 0.00110470, 0.00082483, 0.00109037, 0.01009535, 0.00650303, 0.00760544, 0.00399207, 0.00094996, 0.00072819, 0.00168992, 0.00089541, 0.03355702, 0.00601959, 0.01371621, 0.00574102, 0.01014711, 0.00312464, 0.00216805, 0.00208131, 0.00577375, 0.00229960, 0.00240056, 0.00242762, 0.00289402, 0.00149123, 0.00233631, 0.00156308, 0, 0.00227151, 0, 0.00215627, 0.00260469, 0.00185268, 0.00071089, 0.00184531, 0, 0.00057232, 0.00084416, 0.00052018, 0.00092128, 0.00071734, 0.00087285, 0.00075607]

The first column of this matrix gives the probabilities that the first codon mutates into each of the 64 codons at a distance of 100 CodonPAM. The first codon is AAA, since the codons are sorted alphabetically; if in doubt, use CIntToCodon(1). Note that the probability for codons 49, 51 and 57 (TAA, TAG and TGA) is 0. These are the stop codons, and mutations from sense to stop codons are not considered in this model.
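The relation M(t):=exp(t*lnM) can be tried out on a small example. The following is a minimal sketch in Python rather than Darwin, under stated assumptions: `LOG_M1` is a made-up 3-state log-matrix (not codon data), and `mat_mul`, `mat_exp` and `mutation_matrix` are hypothetical helpers written for this illustration. It uses the same column convention as the text, so column j holds the probabilities of state j mutating into each state.

```python
# Toy illustration of M(t) = exp(t * lnM) with a 3-state chain (NOT codon data).
# Column convention as in the text: column j holds the probabilities of state j
# mutating into each state, so every column of M(t) should sum to 1.

def mat_mul(a, b):
    """Plain product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, squarings=10, terms=20):
    """exp(a) by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    scale = 2 ** squarings
    small = [[x / scale for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = mat_mul(term, small)                    # small^k / (k-1)!
        term = [[x / k for x in row] for row in term]  # small^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):                         # undo the scaling
        result = mat_mul(result, result)
    return result

# Hypothetical log-matrix: off-diagonal entries >= 0, every column sums to 0.
LOG_M1 = [[-0.020, 0.010, 0.000],
          [0.015, -0.030, 0.020],
          [0.005, 0.020, -0.020]]

def mutation_matrix(t):
    """Mutation matrix at distance t, i.e. exp(t * LOG_M1)."""
    return mat_exp([[t * x for x in row] for row in LOG_M1])

M100 = mutation_matrix(100)
first_column = [row[0] for row in M100]  # analogue of transpose(M100)[1]
column_sums = [sum(M100[i][j] for i in range(3)) for j in range(3)]
```

Two properties are worth checking on such a toy: each entry of `column_sums` should be 1 up to rounding, and multiplying `mutation_matrix(50)` by itself should reproduce `M100`, which is why storing only lnM is enough to obtain a matrix for any distance.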

## The CodonMatrix data-structure

From a mutation matrix it is possible to calculate the similarity scores. (Please see the Dayhoff matrix bio-recipe for more information about scoring matrices).
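As a sketch of what this calculation looks like, assuming the usual Dayhoff-style log-odds definition carries over to codons (the symbols below are introduced here for illustration and are not taken from the recipe): with M_ij(t) the probability that codon j mutates into codon i at distance t, and f_i the frequency of codon i, the similarity score would be

$$
S_{ij}(t) = 10 \, \log_{10} \frac{M_{ij}(t)}{f_i}
$$

A positive score means the pair is observed more often than expected by chance, a negative score less often. If the underlying model is reversible, so that f_j M_ij(t) = f_i M_ji(t), the resulting score matrix is symmetric.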

The empirical codon matrices by Schneider *et al.* can be found here.

© 2019 by Adrian Schneider, Informatik, ETH Zurich

Last updated on Wed Apr 3 16:29:15 2019 by AS

!!! This document is stored in the ETH Web archive and is no longer maintained !!!