ICEMAN Example of phylogenetic analysis

Contents

Introduction

Ötzi the Iceman is a well-preserved natural mummy of a man frozen for about 5300 years and found in 1991 in a glacier of the Ötztal Alps, near the border between Austria and Italy. Recently researchers found interesting informations about it from the exam of mitochondrial DNA taken from cells of the iceman intestine. In particular from phylogenetic analysis they made some hypotesis about its mitochondrial haplogroup. A haplogroup is defined by set of characteristic mutations on the mitochondrial genome. Therefore it can be traced along a person’s maternal line and it can be used to group populations by genetic features. The famous book “The Seven Daughters of Eve” by Bryan Sykes describes the classification of all modern humans into into mitochondrial haplogroups and links each haplogroup to a specific prehistoric woman (“clan mothers”). In fact the branches of the mtDNA tree (composed of groups of people with related haplogroups) are continent-specific. In this example, we analyze the statistical properties and perform phylogenetic analysis of the mtDNA of the iceman to investigate the relationship with modern humans of different geographical locations and to determine useful information about its haplogroup.

The mtDNA sequence of the iceman can be obtained from the Genbank database (accession number S69989). Some other sequence of modern humans from different parts of the world are considered. They have been extracted from the Max Ingman database (http://www.genpat.uu.se/mtDB/). All the sequences are stored in the same structure. The associated haplogrop is indicated.

fullset = {
'iceman'  'S69989' ''
'Aborigine'	'AF346963'	'A'
'Aborigine2'	'AF346965'	'A'
'Biaka Pygmy'	'AF346968'	'L1'
'Biaka Pygmy2'	'AF346969'	'L1'
'Buriat'	'AF346970'	'C'
'Chukchi'	'AF346971'	'A'
'Chinese'	'AF346972'	'F'
'Chinese2'	'AF346973'	'D'
'Crimean Tatar'	'AF346974'	'H'
'Dutch'	'AF346975'	'H'
'Effik'	'AF346976'	'L2'
'Effik2'	'AF346977'	'L2'
'English'	'AF346978'	'V'
'Evenki'	'AF346979'	'C'
'German'	'AF346983'	'J'
'Guarani'	'AF346984'  'D'
'Hausa'	'AF346985'	'L0'
'Ibo'	'AF346986'	'L1'
'Ibo2'	'AF346987'	'L1'
'Italian'	'AY882413'	'UK'
'Japanese'	'AF346989'	'D'
'Japanese2'	'AF346990'	'D'
'Kikuyu'	'AF346992'  'L1'
'Korean'	'AF346993'	'B'
'Mandenka'	'AF346995'	'L2'
'Mbenzele Pygmy'	'AF346996'	'L1'
'Mbenzele Pygmy2'	'AF346997'	'L1'
'Mbuti Pygmy'	'AF346998'	'L0'
'Mbuti Pygmy2'	'AF346999'	'L0'
'Piman'	'AF347001'	'B'
'PNG Coast'	'AF347002'	'N'
'PNG Highlands'	'AF347004'	'N'
'Samoan'	'AF347007'	'B'
'Saami'	'AF347006'	'V'
'San'	'AF347008'	'L0'
'San2'	'AF347009'	'L0'
'Siberian Inuit'	'AF347010'	'D'
'Warao'	'AF347012'	'C'
'Warao2'	'AF347013'	'C'
};

Since the mtDNA sequence of the Oetzi mummy is only partially available we consider the corresponding nucleotides in modern human sequences. To do that, we perform some local alignments. The scores of local alignments are stored in a vector indicating various degrees of similarity between sequences.

for ind = 1:length(fullset)
    fullPeople(ind).Header = [fullset{ind,1},' ', fullset{ind,3}];
end

fullPeople(1).Sequence = getgenbank(fullset{1,2},'sequenceonly','true');

for ind = 2:length(fullset)
    temp = getgenbank(fullset{ind,2},'sequenceonly','true');
	[score, localAlignment, Start] = swalign(fullPeople(1).Sequence,temp,'Alphabet', 'NT');
	sci(ind-1) = score;
	cutof = Start(2);
	lenK=length(localAlignment)+cutof;
	fullPeople(ind).Sequence=temp(cutof:lenK);
end

If you don’t have a live web connection, you can load the data from a MAT-file using the command.

% load bigPic        % <== Uncomment this if no internet connection

Statistical analysis

Before performing phylogenic analysis, we analyse some statistical properties of the iceman sequence.

S = fullPeople(1).Sequence;

The MATLAB function basecount reports nucleotide counts and displays them in pie chart.

figure;
basecount(S,'chart','pie');
title('Distribution of nucleotide bases of Iceman mtDNA HVR');

We can also plot the density of each nucleotide.

figure;
ntdensity(S);

The MATLAB function dimercount can be used to count dimers and plot them.

figure;
dimercount(S,'Chart','pie');
title('Dimer distribution for Iceman mtDNA')

The codons of each of reading frame of the positive strand are also counted and plotted. You can use the MATLAB function codoncount.

figure;
for f = 1 : 3,
    subplot(3,1,f);
    codons{1,f} = codoncount(S,'frame',f,'figure',true);
    title(sprintf('Codons for Frame %d of Iceman mtDNA ',f));
end

Phylogenetic analysis

The scores of local alignments between sequences previously computed are analyzed. We plot the associated histogram and we compute some statistics (mean, variance max, min).

figure
hist(sci,40)
title('Istogram of local alignment scores')

stats=[max(sci), min(sci), mean(sci) var(sci)]
stats =

  490.8493  455.9075  474.6299   75.5072

Few sequences have high scores and then are strongly related to the iceman sequence. The multiple alignment gives some information about the relationship between the analysed sequences.

MultiAligned = multialign(fullPeople);
showalignment(MultiAligned)

Then we calculate the pairwise distances using the Juckes Cantor correction. The MATLAB function seqpdist is used for that.

distances = seqpdist(fullPeople,'Method','Jukes-Cantor','Alphabet','NT');

The phylogenetic tree is built with the UPGMA algorithm an displayed.

UPGMAtree = seqlinkage(distances,'UPGMA',fullPeople);
h = plot(UPGMAtree,'orient','bottom');
title('UPGMA Distance Tree using Jukes-Cantor model');
ylabel('Evolutionary distance')
set(h.terminalNodeLabels,'Rotation',-45)

A neighbor-joining tree is also constructed using the seqneighjoin function. Neighbor-joining tree uses also the pairwise distance calculated above (using the seqpdist function).

NJtree = seqneighjoin(distances,'equivar',fullPeople);
h = plot(NJtree,'orient','left');
title('Neighbor-joining tree using Jukes-Cantor model');

The phylogenetic trees show that the iceman is more related to european population in particular to Italian belonging to the UK superhaplogroup. This is in accordance with the results presented in the paper of Rollo et al. that associates Otzi to the K haplogroup. Both trees show also some infomation about the relationship and the evolution of haplogroups in specific regions of the world. The neighbor joining algorithm seems more accurate. The tree depicts in a general manner also the relationship and the evolution of different populations. In Africa, the most ancient mtDNA haplogroups (L0, L1, L2), make up macrohaplogroup L. It radiated to form the Eurasian macrohaplogroups M and N. Among Europeans, haplogroups H, I, J, N1b, T, U, V, W, and X make up about 98% of the mtDNAs. These were derived primarily from macrohaplogroup N. In Asia, macrohaplogroups N and M contributed equally to mtDNA radiation. Here now the main haplogroups are A, C, D, G.

References

F. Rollo, L. Ermini, S. Luciani, I. Marota, C. Olivieri, D. Luiselli, Fine characterization of the Iceman’s mtDNA haplogroup, J Phys Anthropol. Jan 2006.

Torroni, K. Huoponen, P. Francalacci, M. Petrozzi, L. Morelli, R. Scozzari, D. Obinu, M. L. Savontaus, D.C. Wallace. Classification of European mtDNAs from an Analysis of Three European Populations, Genetics, Vol 144, 18351850, 1996.

M.Ingman, H. Kaessmann, S. Paabo, U. Gyllensten, Mitochondrial genome variation and the origin of modern humans. Nature 408, 708-713, 2000.

D.Mishmar, E. Ruiz-Pesini, P. Golik, V. Macaulay, A. Clark, S. Hosseini, M.Brandon, K. Easley, M. Brown, R.I. Sukernik, A. Olckers, D. Wallace. Natural selection shaped regional mtDNA variation in humans. Proc.Natl.Acad.Sci. 100, 171-176, 2003.