# Basic introduction to programming in Darwin

Darwin is an interactive system which executes expressions or programs which are given as input. Expressions are evaluated directly, and the result(s) are printed for each statement.

`1+2;  2*3;  3/4;  4-5;  5^6;`
```3
6
0.7500
-1
15625
```
`(1+2)*3/4^5;`
```0.00878906
```

Each Darwin statement is terminated with a semicolon (;). If you forget the semicolon, the system will do nothing until one is read. In some cases, it may result in an invalid input.

`1+1`

nothing happens until we enter the missing (;)
`;`
```2
```
```355/113
1+2;```
```on line 12, syntax error:
1+2;
^
```
forgetting the semi colon causes an error in the next statement.

Functions are evaluated on the arguments which are enclosed in parenthesis. Darwin knows about most mathematical functions.

`sqrt(0);  sqrt(1);  sqrt(2);  sqrt(36);`
```0
1
1.4142
6
```
`log(1);  log10(10);  log(2);  exp(log(2));`
```0
1
0.6931
2
```
`max(Pi,exp(1),sqrt(2));  abs(tan(2));`
```3.1416
2.1850
```
`round(2.499999), round(2.5), round(2.50000001);`
```2, 2, 3
```
`sin(0), cos(Pi), arctan(1000000000)/Pi;`
```0, -1, 0.5000
```

The ditto symbol (") can be used to recall the previous expression evaluated. A double ditto ("") recalls the second back and a triple (""") the third back.

```1;  20;  30;
("" + """) / ";```
```1
20
30
0.7000
```

Repetition of statements is done with the do ... od construct. Any number of statements or expressions can be placed in the body of the loop. The do part is preceeded by many optional parts, for, from, by, to and while. The following loop calculates an expression with four terms for each value of i from 1 to 5.

`for i to 5 do  i, i^2, i^3, log(i)  od;`
```1, 1, 1, 0
2, 4, 8, 0.6931
3, 9, 27, 1.0986
4, 16, 64, 1.3863
5, 25, 125, 1.6094
```

The following loop finds out prime numbers less than 100 (by taking a shortcut and testing only against the possible primes). The function mod computes the rest of the division by its second argument, so by testing that the mod is not zero we are testing for not divisible by the second argument.

```for i from 101 by -2 to 70 do
if mod(i,3) <> 0 and mod(i,5) <> 0 and mod(i,7) <> 0
then lprint( i, 'is prime') fi od;```
```101 is prime
97 is prime
89 is prime
83 is prime
79 is prime
73 is prime
71 is prime
```

The following loop, starts at 36 increasing by 3 while the square of this number is less than 3000. For each value it tests several conditions with an if statement with several branches and prints messages accordingly.

```for i from 36 by 3 while i^2 < 3000 do
if mod(i,13) = 0 then lprint( i,'is an unlucky number' )
elif mod(i,7) = 0 then lprint( i,'is a lucky number' )
elif mod(i,9) = 0 then
lprint( i,'is easy to check for divisibility' )
else lprint( i, 'is a boring number' ) fi
od;```
```36 is easy to check for divisibility
39 is an unlucky number
42 is a lucky number
45 is easy to check for divisibility
48 is a boring number
51 is a boring number
54 is easy to check for divisibility
```

Variables are like boxes where we can store values. Variables are identified by a name, which has to be composed of letters and numbers or underscores (numbers or underscores cannot be the first character). To store a value in a variable we use the assignment operator (:=). In Darwin, any name can be used for any value without any previous declaration.

`a := 1;  b := 2;`
```a := 1
b := 2
```
`x := [1,2,3];  MyFile := 'gonnet.drw';  An_extremely_long_name := 0;`
```x := [1, 2, 3]
MyFile := gonnet.drw
An_extremely_long_name := 0
```

Assigned variables can be used later anywhere a constant can be used.

`(10*a+b)^b;`
```144
```

Procedures or functions are declared enclosed between the tokens proc and end. The procedure is an object which can be assigned to a variable. Once this is done, the named variable can be used as a function.

```p := proc() print('the procedure is being executed') end;
p();```
```p := proc () print(the procedure is being executed) end
the procedure is being executed
```

A function or procedure may have parameters. These are specified in parenthesis after the word proc, and can be used anywhere inside the procedure. The final value evaluated by a procedure is the value returned, and can be used as a function value.

```q := proc(a,b,c) (a+b)*c end;
q(1,2,3);
q(1/2,3,Pi);```
```q := proc (a, b, c) (a+b)*c end
9
10.9956
```

In principle, the arguments can contain any types of values.

```r := proc(a,b,c)
lprint('the first argument is',a);
lprint('the second argument is',b);
lprint('the third argument is',c);
end:
r(1/7,100*Pi,'this is a string of text');```
```the first argument is 0.1429
the second argument is 314.1593
the third argument is this is a string of text
```

The following procedure does some computation which requires the use of a local variable res. The result is computed in this local variable and it is evaluated at the end of the function so that its value is returned as the value of the function. Since the second argument should be an integer, we force checking its type.

```s := proc( a, b:integer )
res := a;
for j to b do res := res*(a+j) od;
res
end:
s(1,10);
s(10,3);```
```39916800
17160
```
`s(10,2.5);`
```Error, s expects a 2nd argument, b:integer, found: 2.5000
```

The local variables of the function above res, j exist only inside the procedure, and do leave traces once the procedure finished execution.

`res, j;`
```res, j
```

Any variable which is assigned inside a procedure (or is the variable or a for loop) becomes local. To prevent an assigned variable to be local, it has to be explicitly declared global.

```s2 := proc( a, b )
global type_of_error;
if b=0 then
type_of_error := 'division by 0';  0
else type_of_error := 'none';  a/b fi;
end:
s2(1,3);  type_of_error;```
```0.3333
none
```
`s2(1,0);  type_of_error;`
```0
division by 0
```

Several names have special meaning inside procedures. procname is the name of the function called,args contains all the argumentsnargs is the number of arguments used. This facilitates writing very generic functions.

```t := proc()
printf('the function %s was called with %d arguments\n',
procname, nargs);
for i to nargs do
printf('argument %d --> %a of type %a\n',
i, args[i], type(args[i]) ) od;
end:
t();```
```the function t was called with 0 arguments
```
`t(1,Pi,[1,2,3],'abcd');`
```the function t was called with 4 arguments
argument 1 --> 1 of type integer
argument 2 --> 3.1416 of type float
argument 3 --> [1, 2, 3] of type list
argument 4 --> abcd of type string
```

The following is a classical example to show recursive computation; the computation of factorials.

```fact := proc( n:integer )
description 'compute the factorial of a positive integer';
if n < 0 then error('negative argument')
elif n < 2 then 1 else fact(n-1)*n fi
end:
fact(0);
fact(10);```
```1
3628800
```
The following produce errors:
```fact(1/2);
fact(-1);```
```Error, fact expects a 1st argument, n:integer, found: 0.5000
Error, (in fact) negative argument
```
The description becomes a quick way of identifying the function:
`print(fact);`
```fact: Usage: fact( n:integer )
compute the factorial of a positive integer
```

The following function returns a list with three numbers. Lists are the same objects as arrays, and matrices are lists of lists.

```three := proc( v:numeric )
[ v+1,v+2,v+3 ]
end:
three(100);```
```[101, 102, 103]
```

List of numbers allow normal arithmetic.

`10 * three(100) + [0.1,0.11,0.111];`
```[1010.1000, 1020.1100, 1030.1110]
```

Vectors multiplied by vectors (of the same length) are interpreted as inner products.

`three(1) * three(1000);`
```9020
```

Data structures are identical to function calls, except that the function does not evaluate, it remains identical to the call itself.

```me := Person( Gaston, 1948, true,
[Ariana,Pedro,Julio,Ignacio] );```
```me := Person(Gaston,1948,true,[Ariana, Pedro, Julio, Ignacio])
```

For further operations with data structures, please see the bio-recipes on classes (Counters and Poisson). Here we will just perform the most basic operations.

```Person := proc( name:string, BirthYear:posint, married:boolean,
children:list(string) )
noeval( procname(args) )
end:
CompleteClass( Person );
type(me,Person);```
```true
```
`length(me), me;`
```4, true
```
`me[name], length(me[children]);`
```Gaston, 4
```

A comprehensive help system is available. By entering a name after a question mark (at the beginning of a line), the help system is activated:

`?Align;`
```
Function Align - align sequences using various modes of dynamic programming

Calling Sequence:  Align(seq1,seq2,method,DayMat)

Parameters:
Name     Type                          Description
------------------------------------------------------------------------------
seq1     {ProbSeq,string}              pept, nucleot or probabilistic sequence
seq2     {ProbSeq,string}              pept, nucleot or probabilistic sequence
method   string                        the mode of dynamic programming to use
DayMat   {DayMatrix,list(DayMatrix)}   Dayhoff matrices used for alignment

Returns:
Alignment
Synopsis:  Align does an alignment of two sequences using the similarity
scores given in the DayMat and the given method.  If a single DayMatrix is
given, the alignment is done using it.  If a list of DayMatrix is given, it
is understood that the best PAM matrix be used.  In this case Align will
also compute the PamDistance and PamVariance between the two sequences.  The
method is optional, if not given it assumes Local.  The valid methods are:

Local   A local alignment will be performed, this means that the best
subsequences of seq1 and seq2 will be selected to be aligned.  This
type of alignment gives the highest possible similarity score of any
alignment.  This is sometimes called the Smith & Watermann algorithm.
Global  A global alignment will be performed, this means that the entire seq1
is aligned against the entire seq2.  This may result in a negative
score if the sequences do not align very well.  This is sometimes
called the Needleman & Wunsch algorithm.
CFE     A Cost-Free ends alignment is done.  This is like a Global alignment,
but deletions of one of the sequences at each of the end are not
penalized.  In some sense it is between a Local and a Global
alignment.
Shake   A forward-backward alignment is performed.  This alignment iterates
forward and backwards until the score cannot be increased.  In its
forward phase will start at the given positions for seq1 and seq2 and
find the ends which give a maximal score.  From this end, it will
perform backwards dynamic programming to find the optimal beginning,
and so on until convergence.  This type of alignment is quite similar
to a Local alignment, but can be directed to focus on a particular
alignment, even though it may not be the best of the two sequences.

If the DayMat is omitted, the global variable DM (if assigned a DayMatrix) is
used, else a PAM-250 matrix is constructed.
If in addition to the method, the keyword "NoSelf" is included, when sequences
of peptides or nucleotides are aligned (excluding ProbSeq), self-matches are
not allowed.  That is, if a sequence is aligned to itself (being
structurally the same string, this we call self-alignment), the self-match
(which is trivial) will not be allowed.  This is done by giving the
alignment of a position with itself a large penalty.  By doing this it is
possible to find repeated patterns.  I.e. an alignment with itself, where
the identity is ruled out, will show any repeated patterns.  In particular
if the sequences align with an offset of k, then there is a k-long motif
which is repeated in the sequence.
The method to find the approximate PamDistance and variance may not find the
global maximum of the Score, it may find a local maximum.  By using the
argument "ApproxPAM=ppp", the search for the maximum will be started at PAM
distance ppp.  This may help when we know an approximation of the distance,
or may provide a way of exploring the existence of other local maxima.

Examples:
> Align(AC(P00083),AC(P00091));
Alignment(Sequence(AC('P00083'))[14..92],Sequence(AC('P00091'))[19..97],177.7799,DM,0,0,{Local})
> Align(Entry(1),Entry(2),Local,DMS);
Alignment(Sequence(AC('P15711'))[905..917],Sequence(AC('Q43495'))[13..25],45.1050,DMS,80,1153.8025,{Local})
> Align(AC(P13475),AC(P13475),Local,DMS,NoSelf);
Alignment(Sequence(AC('P13475'))[128..178],Sequence(AC('P13475'))[137..188],279.9088,DMS,42.1286,98.4150,{Local,NoSelf})
?Alignment         ?CodonAlign      ?DynProgStrings    ?MAlign
?CalculateScore    ?DynProgScore    ?EstimatePam

------------
```

The help system uses approximate string matching to find appropriate topics:

`?geriatic code`
```
GGG  G  Gly    AGG  R  Arg    CGG  R  Arg    UGG  W  Trp
GGA  G  Gly    AGA  R  Arg    CGA  R  Arg    UGA    Stop
GGC  G  Gly    AGC  S  Ser    CGC  R  Arg    UGC  C  Cys
GGU  G  Gly    AGU  S  Ser    CGU  R  Arg    UGU  C  Cys

GAG  E  Glu    AAG  K  Lys    CAG  Q  Gln    UAG    Stop
GAA  E  Glu    AAA  K  Lys    CAA  Q  Gln    UAA    Stop
GAC  D  Asp    AAC  N  Asn    CAC  H  His    UAC  Y  Tyr
GAU  D  Asp    AAU  N  Asn    CAU  H  His    UAU  Y  Tyr

GCG  A  Ala    ACG  T  Thr    CCG  P  Pro    UCG  S  Ser
GCA  A  Ala    ACA  T  Thr    CCA  P  Pro    UCA  S  Ser
GCC  A  Ala    ACC  T  Thr    CCC  P  Pro    UCC  S  Ser
GCU  A  Ala    ACU  T  Thr    CCU  P  Pro    UCU  S  Ser

GUG  V  Val    AUG  M  Met    CUG  L  Leu    UUG  L  Leu
GUA  V  Val    AUA  I  Ile    CUA  L  Leu    UUA  L  Leu
GUC  V  Val    AUC  I  Ile    CUC  L  Leu    UUC  F  Phe
GUU  V  Val    AUU  I  Ile    CUU  L  Leu    UUU  F  Phe

?AltGenCode      ?BaseToInt    ?CIntToAmino    ?CodonToInt     ?IntToBBB
?AminoToInt      ?BBBToInt     ?CIntToCodon    ?Complement     ?IntToCInt
?antiparallel    ?BToInt       ?CIntToInt      ?GeneticCode    ?IntToCodon
?AToCInt         ?CIntToA      ?CodonToA       ?IntToB         ?Reverse
?AToCodon        ?CIntToAAA    ?CodonToCInt    ?IntToBase

------------
```

Next we will align two sequences of amino acids.

```s1 := 'NMTTSRQLLFTFFFTTTFFFFFFQARGLPCSPTWC';
s2 := 'NQLLFTFFTTTFFFFQAKGLRSAS';
a := Align(s1,s2):```
```s1 := NMTTSRQLLFTFFFTTTFFFFFFQARGLPCSPTWC
Warning: procedure s2 reassigned
s2 := NQLLFTFFTTTFFFFQAKGLRSAS
```

The system warns the user that the variable s2 had been previously assigned to a procedure and now it is reassigned (and hence the procedure becomes inaccessible). We have terminated the last line with a colon (:). This makes the system not echo the value of the line. In this case we want to print the alignment, rather than to see the structure holding it.

`print(a);`
```lengths=27,24 simil=61.9, PAM_dist=250, identity=63.0%
RQLLFTFFFTTTFFFFFFQARGLPCSP
:|||||||.||   ||||||!||.:::
NQLLFTFFTTT___FFFFQAKGLRSAS
```

For further details on alignments and using a database please see the bio-recipe on Alignments.

© 2011 by Gaston Gonnet, Informatik, ETH Zurich

Index of bio-recipes

Last updated on Mon Oct 3 17:23:30 2011 by GhG

!!! Dieses Dokument stammt aus dem ETH Web-Archiv und wird nicht mehr gepflegt !!!
!!! This document is stored in the ETH Web archive and is no longer maintained !!!