Basic introduction to programming in Darwin

     Darwin is an interactive system which executes expressions or programs which are given as input. Expressions are evaluated directly, and the result(s) are printed for each statement.

1+2;  2*3;  3/4;  4-5;  5^6;

     Each Darwin statement is terminated with a semicolon (;). If you forget the semicolon, the system will do nothing until one is read. In some cases, it may result in an invalid input.


nothing happens until we enter the missing (;)
on line 12, syntax error:
forgetting the semi colon causes an error in the next statement.

     Functions are evaluated on the arguments which are enclosed in parenthesis. Darwin knows about most mathematical functions.

sqrt(0);  sqrt(1);  sqrt(2);  sqrt(36);
log(1);  log10(10);  log(2);  exp(log(2));
max(Pi,exp(1),sqrt(2));  abs(tan(2));
round(2.499999), round(2.5), round(2.50000001);
2, 2, 3
sin(0), cos(Pi), arctan(1000000000)/Pi;
0, -1, 0.5000

     The ditto symbol (") can be used to recall the previous expression evaluated. A double ditto ("") recalls the second back and a triple (""") the third back.

1;  20;  30;
("" + """) / ";

     Repetition of statements is done with the do ... od construct. Any number of statements or expressions can be placed in the body of the loop. The do part is preceeded by many optional parts, for, from, by, to and while. The following loop calculates an expression with four terms for each value of i from 1 to 5.

for i to 5 do  i, i^2, i^3, log(i)  od;
1, 1, 1, 0
2, 4, 8, 0.6931
3, 9, 27, 1.0986
4, 16, 64, 1.3863
5, 25, 125, 1.6094

     The following loop finds out prime numbers less than 100 (by taking a shortcut and testing only against the possible primes). The function mod computes the rest of the division by its second argument, so by testing that the mod is not zero we are testing for not divisible by the second argument.

for i from 101 by -2 to 70 do
      if mod(i,3) <> 0 and mod(i,5) <> 0 and mod(i,7) <> 0
            then lprint( i, 'is prime') fi od;
101 is prime
97 is prime
89 is prime
83 is prime
79 is prime
73 is prime
71 is prime

     The following loop, starts at 36 increasing by 3 while the square of this number is less than 3000. For each value it tests several conditions with an if statement with several branches and prints messages accordingly.

for i from 36 by 3 while i^2 < 3000 do
     if mod(i,13) = 0 then lprint( i,'is an unlucky number' )
     elif mod(i,7) = 0 then lprint( i,'is a lucky number' )
     elif mod(i,9) = 0 then
          lprint( i,'is easy to check for divisibility' )
     else lprint( i, 'is a boring number' ) fi
36 is easy to check for divisibility
39 is an unlucky number
42 is a lucky number
45 is easy to check for divisibility
48 is a boring number
51 is a boring number
54 is easy to check for divisibility

     Variables are like boxes where we can store values. Variables are identified by a name, which has to be composed of letters and numbers or underscores (numbers or underscores cannot be the first character). To store a value in a variable we use the assignment operator (:=). In Darwin, any name can be used for any value without any previous declaration.

a := 1;  b := 2;
a := 1
b := 2
x := [1,2,3];  MyFile := 'gonnet.drw';  An_extremely_long_name := 0;
x := [1, 2, 3]
MyFile := gonnet.drw
An_extremely_long_name := 0

     Assigned variables can be used later anywhere a constant can be used.


     Procedures or functions are declared enclosed between the tokens proc and end. The procedure is an object which can be assigned to a variable. Once this is done, the named variable can be used as a function.

p := proc() print('the procedure is being executed') end;
p := proc () print(the procedure is being executed) end
the procedure is being executed

     A function or procedure may have parameters. These are specified in parenthesis after the word proc, and can be used anywhere inside the procedure. The final value evaluated by a procedure is the value returned, and can be used as a function value.

q := proc(a,b,c) (a+b)*c end;
q := proc (a, b, c) (a+b)*c end

     In principle, the arguments can contain any types of values.

r := proc(a,b,c)
   lprint('the first argument is',a);
   lprint('the second argument is',b);
   lprint('the third argument is',c);
r(1/7,100*Pi,'this is a string of text');
the first argument is 0.1429
the second argument is 314.1593
the third argument is this is a string of text

     The following procedure does some computation which requires the use of a local variable res. The result is computed in this local variable and it is evaluated at the end of the function so that its value is returned as the value of the function. Since the second argument should be an integer, we force checking its type.

s := proc( a, b:integer )
     res := a;
     for j to b do res := res*(a+j) od;
Error, s expects a 2nd argument, b:integer, found: 2.5000

     The local variables of the function above res, j exist only inside the procedure, and do leave traces once the procedure finished execution.

res, j;
res, j

     Any variable which is assigned inside a procedure (or is the variable or a for loop) becomes local. To prevent an assigned variable to be local, it has to be explicitly declared global.

s2 := proc( a, b )
   global type_of_error;
   if b=0 then
	type_of_error := 'division by 0';  0
   else type_of_error := 'none';  a/b fi;
s2(1,3);  type_of_error;
s2(1,0);  type_of_error;
division by 0

     Several names have special meaning inside procedures. procname is the name of the function called,args contains all the argumentsnargs is the number of arguments used. This facilitates writing very generic functions.

t := proc()
  printf('the function %s was called with %d arguments\n',
     procname, nargs);
  for i to nargs do
     printf('argument %d --> %a of type %a\n',
        i, args[i], type(args[i]) ) od;
the function t was called with 0 arguments
the function t was called with 4 arguments
argument 1 --> 1 of type integer
argument 2 --> 3.1416 of type float
argument 3 --> [1, 2, 3] of type list
argument 4 --> abcd of type string

     The following is a classical example to show recursive computation; the computation of factorials.

fact := proc( n:integer )
  description 'compute the factorial of a positive integer';
  if n < 0 then error('negative argument')
  elif n < 2 then 1 else fact(n-1)*n fi
The following produce errors:
Error, fact expects a 1st argument, n:integer, found: 0.5000
Error, (in fact) negative argument
The description becomes a quick way of identifying the function:
fact: Usage: fact( n:integer )
compute the factorial of a positive integer

     The following function returns a list with three numbers. Lists are the same objects as arrays, and matrices are lists of lists.

three := proc( v:numeric )
  [ v+1,v+2,v+3 ]
[101, 102, 103]

     List of numbers allow normal arithmetic.

10 * three(100) + [0.1,0.11,0.111];
[1010.1000, 1020.1100, 1030.1110]

     Vectors multiplied by vectors (of the same length) are interpreted as inner products.

three(1) * three(1000);

     Data structures are identical to function calls, except that the function does not evaluate, it remains identical to the call itself.

me := Person( Gaston, 1948, true,
   [Ariana,Pedro,Julio,Ignacio] );
me := Person(Gaston,1948,true,[Ariana, Pedro, Julio, Ignacio])

     For further operations with data structures, please see the bio-recipes on classes (Counters and Poisson). Here we will just perform the most basic operations.

Person := proc( name:string, BirthYear:posint, married:boolean,
  children:list(string) )
  noeval( procname(args) )
CompleteClass( Person );
length(me), me[3];
4, true
me[name], length(me[children]);
Gaston, 4

     A comprehensive help system is available. By entering a name after a question mark (at the beginning of a line), the help system is activated:


Function Align - align sequences using various modes of dynamic programming

Calling Sequence:  Align(seq1,seq2,method,DayMat)

  Name     Type                          Description                            
  seq1     {ProbSeq,string}              pept, nucleot or probabilistic sequence
  seq2     {ProbSeq,string}              pept, nucleot or probabilistic sequence
  method   string                        the mode of dynamic programming to use 
  DayMat   {DayMatrix,list(DayMatrix)}   Dayhoff matrices used for alignment    

Synopsis:  Align does an alignment of two sequences using the similarity
  scores given in the DayMat and the given method.  If a single DayMatrix is
  given, the alignment is done using it.  If a list of DayMatrix is given, it
  is understood that the best PAM matrix be used.  In this case Align will
  also compute the PamDistance and PamVariance between the two sequences.  The
  method is optional, if not given it assumes Local.  The valid methods are:

Local   A local alignment will be performed, this means that the best
        subsequences of seq1 and seq2 will be selected to be aligned.  This
        type of alignment gives the highest possible similarity score of any
        alignment.  This is sometimes called the Smith & Watermann algorithm.
Global  A global alignment will be performed, this means that the entire seq1
        is aligned against the entire seq2.  This may result in a negative
        score if the sequences do not align very well.  This is sometimes
        called the Needleman & Wunsch algorithm.
CFE     A Cost-Free ends alignment is done.  This is like a Global alignment,
        but deletions of one of the sequences at each of the end are not
        penalized.  In some sense it is between a Local and a Global
Shake   A forward-backward alignment is performed.  This alignment iterates
        forward and backwards until the score cannot be increased.  In its
        forward phase will start at the given positions for seq1 and seq2 and
        find the ends which give a maximal score.  From this end, it will
        perform backwards dynamic programming to find the optimal beginning,
        and so on until convergence.  This type of alignment is quite similar
        to a Local alignment, but can be directed to focus on a particular
        alignment, even though it may not be the best of the two sequences.

If the DayMat is omitted, the global variable DM (if assigned a DayMatrix) is
  used, else a PAM-250 matrix is constructed.
If in addition to the method, the keyword "NoSelf" is included, when sequences
  of peptides or nucleotides are aligned (excluding ProbSeq), self-matches are
  not allowed.  That is, if a sequence is aligned to itself (being
  structurally the same string, this we call self-alignment), the self-match
  (which is trivial) will not be allowed.  This is done by giving the
  alignment of a position with itself a large penalty.  By doing this it is
  possible to find repeated patterns.  I.e. an alignment with itself, where
  the identity is ruled out, will show any repeated patterns.  In particular
  if the sequences align with an offset of k, then there is a k-long motif
  which is repeated in the sequence.
The method to find the approximate PamDistance and variance may not find the
  global maximum of the Score, it may find a local maximum.  By using the
  argument "ApproxPAM=ppp", the search for the maximum will be started at PAM
  distance ppp.  This may help when we know an approximation of the distance,
  or may provide a way of exploring the existence of other local maxima.

> Align(AC(P00083),AC(P00091));
> Align(Entry(1),Entry(2),Local,DMS);
> Align(AC(P13475),AC(P13475),Local,DMS,NoSelf);
See Also:
         ?Alignment         ?CodonAlign      ?DynProgStrings    ?MAlign
         ?CalculateScore    ?DynProgScore    ?EstimatePam              


     The help system uses approximate string matching to find appropriate topics:

?geriatic code

GGG  G  Gly    AGG  R  Arg    CGG  R  Arg    UGG  W  Trp    
GGA  G  Gly    AGA  R  Arg    CGA  R  Arg    UGA    Stop    
GGC  G  Gly    AGC  S  Ser    CGC  R  Arg    UGC  C  Cys    
GGU  G  Gly    AGU  S  Ser    CGU  R  Arg    UGU  C  Cys    

GAG  E  Glu    AAG  K  Lys    CAG  Q  Gln    UAG    Stop    
GAA  E  Glu    AAA  K  Lys    CAA  Q  Gln    UAA    Stop    
GAC  D  Asp    AAC  N  Asn    CAC  H  His    UAC  Y  Tyr    
GAU  D  Asp    AAU  N  Asn    CAU  H  His    UAU  Y  Tyr    

GCG  A  Ala    ACG  T  Thr    CCG  P  Pro    UCG  S  Ser    
GCA  A  Ala    ACA  T  Thr    CCA  P  Pro    UCA  S  Ser    
GCC  A  Ala    ACC  T  Thr    CCC  P  Pro    UCC  S  Ser    
GCU  A  Ala    ACU  T  Thr    CCU  P  Pro    UCU  S  Ser    

GUG  V  Val    AUG  M  Met    CUG  L  Leu    UUG  L  Leu    
GUA  V  Val    AUA  I  Ile    CUA  L  Leu    UUA  L  Leu    
GUC  V  Val    AUC  I  Ile    CUC  L  Leu    UUC  F  Phe    
GUU  V  Val    AUU  I  Ile    CUU  L  Leu    UUU  F  Phe    

See Also:
   ?AltGenCode      ?BaseToInt    ?CIntToAmino    ?CodonToInt     ?IntToBBB  
   ?AminoToInt      ?BBBToInt     ?CIntToCodon    ?Complement     ?IntToCInt 
   ?antiparallel    ?BToInt       ?CIntToInt      ?GeneticCode    ?IntToCodon
   ?AToCInt         ?CIntToA      ?CodonToA       ?IntToB         ?Reverse   
   ?AToCodon        ?CIntToAAA    ?CodonToCInt    ?IntToBase                 


     Next we will align two sequences of amino acids.

a := Align(s1,s2):
Warning: procedure s2 reassigned

The system warns the user that the variable s2 had been previously assigned to a procedure and now it is reassigned (and hence the procedure becomes inaccessible). We have terminated the last line with a colon (:). This makes the system not echo the value of the line. In this case we want to print the alignment, rather than to see the structure holding it.

lengths=27,24 simil=61.9, PAM_dist=250, identity=63.0%
:|||||||.||   ||||||!||.:::

     For further details on alignments and using a database please see the bio-recipe on Alignments.

© 2011 by Gaston Gonnet, Informatik, ETH Zurich

Index of bio-recipes

Last updated on Mon Oct 3 17:23:30 2011 by GhG

!!! Dieses Dokument stammt aus dem ETH Web-Archiv und wird nicht mehr gepflegt !!!
!!! This document is stored in the ETH Web archive and is no longer maintained !!!