HUMANS Example of sequence comparison by genetic distance

Contents

Introduction

The discovery of Neanderthal skeletons in various parts of Europe raised many questions about human origis, among them the issue of our relation with these species. Now many questions about human and primate origins have been answered by the study of the mitochondrial genome and in particular of the hypervariable regions. These regions presents high sequence variability among humans, therefore they are ideal for studying the relationships among individuals. There are two hypervariable regions called HVR-I and HVR-II.

Load data into MATLAB workspace

In this demostration we first consider the HVR-1 and HVR-2 extracted from Nenderthal bones. Two Neanderthals are considered. The associated sequences can be download from the Genbank database.

% hvr-1
nea_1(1)=getgenbank('AF011222');
nea_1(2)=getgenbank('AF282971');


% hvr-2
nea_2(1)=getgenbank('AF142095');
nea_2(2)=getgenbank('AF282972');

If you don’t have a live web connection you can load them directly into the MATLAB workspace.

% load genbanknean    % <== Uncomment this if no internet connection

The mtDNA sequences of the Neanderthal are compared with 206 mtDNA sequences of modern humans. The complete D-loop is considered for each modern humans. The sequences have been extracted from the Max Ingman database. The associated web page can be explored by the URL.

web('http://www.genpat.uu.se/mtDB');

The data corresponding to both the D-loop are in the FASTA format and can be read using the MATLAB function fastaread.

human_dloop=fastaread('d_loop.fasta');
n=length(human_dloop);

[Note: the d_loop.fasta file can be found here]

Since the sequence of the Neanderthal are incomplete, the corresponding fragments of D-loop of modern human are extracted by local alignment. To this aim the HVR-1 and 2 of the first Neanderthal is considered.

nea.Header='First Neanderthal';
nea.Sequence=strcat(nea_1(1).Sequence,nea_2(1).Sequence);

score=zeros(n,1);
startat=zeros(n,1);

for i=1:n
      [sc,alignment,st]=swalign(human_dloop(i).Sequence,nea.Sequence,'Alphabet', 'NT');
      [m,offset]=size(alignment);                                             % calculate the offset
      startat(i,1)=st(1);                                                     % record the starting index
      endat(i,1)=min(startat(i,1)+offset-1,length(human_dloop(i).Sequence));       % record the ending index
      % the ending index is calculate as minimum between start + offset and length of the sequence because
      % the alignment can produce a lot of gaps
end

real_offset=min(min(endat-startat+1));

The final sequences of humans and Neanderthals are saved in the same structure.

for i=1:n
    hvr_tot(i).Sequence=human_dloop(i).Sequence(startat(i,1):startat(i,1)+real_offset-1);
end

hvr_tot(n+1).Sequence=strcat(nea_1(1).Sequence,nea_2(1).Sequence);
hvr_tot(n+2).Sequence=strcat(nea_1(2).Sequence,nea_2(2).Sequence);

Genetic distances

The genetic distances between each pair of sequences are computed with the Jukes-Cantor correction. It is a very time consuming task, therefore we give also the precomputed matrix. You can load it directly in the workspace.

%distances = seqpdist(hvr_tot,'Method','Jukes-Cantor','Alphabet', 'NT', 'PairwiseAlignment', true);

%D=squareform(distances);

load D

The distance matrix can be plotted. From the image it clearly appears that the zone corresponding to the higher values of distances (red) are those of distances between humans and Neanderthals.

imagesc(D);
legend

We can calculate also the mean distance among any two H. sapiens and between Neanderthal and any modern human.

D_human=reshape((D(1:n,1:n)),1,n^2);
human_dist=mean(D_human(find(D_human>0)))

D_mix=[D(1:n,n+1); D(1:n,n+2)];
mixed_dist=mean(D_mix)
human_dist =

    0.0890


mixed_dist =

    1.0739

The situation can be visualized also through multidimensional scaling, a statistical visualization method that enables us to enbed the datapoints on a plane in a way that respects their pairwise distances.

[Y,eigvals] = cmdscale(D);
plot(Y(1:n,1),Y(1:n,2),'.');
hold on
plot(Y(n+1:n+2,1),Y(n+1:n+2,2),'r*');
title('Relative Distances: 206 H.Sapiens and 2 H.Neanderthalensis');
hold off

From the plot it is evident that the Neanderthal sequences are very distant to the 206 Homo Sapiens distances. It means that Homo Neanderthalensis were a different species and are very different from modern humans.

Primate evolution

We study the relationship between primates, humans and Neanderthal through phylogenetic analysis. The Hypervariable region II of human, Neanderthal, chimpanzee, bonobo, gorilla, orangutan and gibbon are considered. The genetic distance are estimated as above with the Jukes-Cantor correction.

hvr_primates=fastaread('human_primates_hvr_2.fasta');
distances= seqpdist(hvr_primates,'Method','Jukes-Cantor','Alphabet', 'NT');
tree = seqlinkage(distances,'UPGMA',hvr_primates);

The phylogenetic tree is shown.

h1 = plot(tree,'orient','bottom');
ylabel('Genetic Distance')

From the tree it is evident that the Neanderthals are more closely related to modern humans than are any of the other primates.
With view you can further explore/edit the phylogenetic tree using an interactive tool.

view(tree)