OLFACTORYRECEPTORS Example of use of HMM in sequence analysis

Introduction

Proteins are composed of a sequence of amino acids. These amino acids differ in atomic composition and structure, which gives them different properties. The first part of this demo focuses on the property of hydrophobicity.

Olfactory receptors (OR) are part of a family of proteins that have 7 transmembrane regions. That is, they pass through the cell membrane 7 times. The interior of the cell membrane is hydrophobic, while both the exterior and the interior of the cell are hydrophilic. Therefore, the regions of the protein that pass through the membrane should contain mostly hydrophobic amino acids, while the portions outside the membrane should be mostly hydrophilic.

Segmenting odorant receptors

You will create a segmentation Hidden Markov Model (HMM) to predict the transmembrane (hydrophobic) regions of an olfactory receptor (OR) protein. In this example you will consider the 347 ORs reported by Zozulya et al., 2001. You can read the 347 amino acid sequences with the MATLAB function fastaread.

orseqs = fastaread('347OR.fasta');

Considering only the first sequence, you can plot various properties of it using the MATLAB tool proteinplot.

or1 = orseqs(1).Sequence

proteinplot(or1)
or1 =

MPNSTTVMEFLLMRFSDVWTLQILHSASFFMLYLVTLMGNILIVTVTTCDSSLHMPMYFFLRNLSILDACYISVTVPTSCVNSLLDSTTISKAGCVAQVFLVVFFVYVELLFLTIMAHDRYVAVCQPLHYPVIVNSRICIQMTLASLLSGLVYAGMHTGSTFQLPFCRSNVIHQFFCDIPSLLKLSCSDTFSNEVMIVVSALGVGGGCFIFIIRSYIHIFSTVLGFPRGADRTKAFSTCIPHILVVSVFLSSCSSVYLRPPAIPAATQDLILSGFYSIMPPLFNPIIYSLRNKQIKVAIKKIMKRIFYSENV

In the protein plot window, you must change the selected property to hydrophobicity (Kyte & Doolittle). Under Edit -> Change Configuration, you must change the window size to 19.0. Now you need to export the figure so that you can plot the results of the model over it. To do this, select File -> Export Figure. Then hold the figure for future use. Here the final profile is loaded for simplicity.

open('hydrophobicity.fig'); % Comment this out if you export the figure from proteinplot instead

hold on
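The profile in the figure can also be approximated without proteinplot. The following is a minimal sketch, assuming a plain unweighted moving average over 19 residues of the Kyte & Doolittle scale; proteinplot's own smoothing options may differ, so the curve need not match exactly. The names aaOrder, kd and hydroProfile are introduced here for illustration only, and just the first 48 residues of or1 are used to keep the example short.

```matlab
% Kyte & Doolittle hydropathy values, listed in the order
% A R N D C Q E G H I L K M F P S T W Y V
aaOrder = 'ARNDCQEGHILKMFPSTWYV';
kd = [ 1.8 -4.5 -3.5 -3.5  2.5 -3.5 -3.5 -0.4 -3.2  4.5 ...
       3.8 -3.9  1.9  2.8 -1.6 -0.8 -0.7 -0.9 -1.3  4.2];

% First 48 residues of or1 (illustrative excerpt)
seq = 'MPNSTTVMEFLLMRFSDVWTLQILHSASFFMLYLVTLMGNILIVTVTT';

% Map each letter to its position in aaOrder, then to its hydropathy value
[~, idx] = ismember(seq, aaOrder);
h = kd(idx);

% Unweighted moving average; 'valid' keeps only full windows
w = 19;
hydroProfile = conv(h, ones(1, w) / w, 'valid');

% Residue position at the centre of each window
centers = (1:numel(hydroProfile)) + (w - 1) / 2;
disp([centers(1) centers(end)])
```

Positive values of hydroProfile indicate windows that are hydrophobic on average, which is what a membrane-spanning segment should look like.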

Although you can pick out a number of hydrophobic and hydrophilic peaks in this graph simply by eye, an HMM can help to delineate the regions of interest more precisely. The model assumes that the protein sequence is generated by a stochastic process that alternates between two hidden states: “out of the membrane” and “in the membrane”. The HMM is trained using 20 OR sequences.

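intseqs = cell(1,20);  % preallocate one cell per training sequence
for i = 1:20
    intseqs{i} = aa2int(orseqs(i).Sequence);  % convert letters to integer codes
end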

The transition probability matrix is 2-by-2 for the 2 states: out of the membrane and in the membrane. You must enter an estimate of the transition probabilities as an initial guess.

T = [0.95 0.05;
     0.05 0.95];
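To get a feeling for what this initial guess implies, recall that in a Markov chain the number of consecutive steps spent in a state with self-transition probability p is geometrically distributed with mean 1/(1-p). The short sketch below checks that each row of T is a probability distribution and computes that mean; the variable name expectedDwell is introduced here for illustration only.

```matlab
T = [0.95 0.05;
     0.05 0.95];

% Each row must sum to 1 to be a valid set of transition probabilities
assert(all(abs(sum(T, 2) - 1) < 1e-12));

% Mean number of consecutive residues spent in each state: 1 / (1 - p_stay)
expectedDwell = 1 ./ (1 - diag(T));
disp(expectedDwell')
```

With these values the initial guess corresponds to segments of about 20 residues on average in either state.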

The emission probability matrix is 2-by-20 (2 states and 20 amino acids). Here too you must enter initial guesses of the emission probabilities. These guesses are based on the hydrophobicity of the amino acids. For example, in the ‘in the membrane’ state (second row), a hydrophobic amino acid has a higher probability of emission than one that is hydrophilic.

E = [0.018	0.067	0.067	0.067	0.018	0.067	0.067	0.067	0.067	0.01	0.01	0.067	0.018	0.018	0.067	0.067	0.067	0.067	0.067	0.01;
     0.114	0.007	0.007	0.007	0.114	0.007	0.007	0.025	0.007	0.114	0.114	0.007	0.114	0.114	0.025	0.025	0.025	0.025	0.025	0.114];
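A quick way to inspect such a guess is to list which residues each row favours. The sketch below assumes that the 20 columns follow the aa2int ordering A R N D C Q E G H I L K M F P S T W Y V; the names aaOrder, favoursRow2 and rowSums are introduced here for illustration only.

```matlab
aaOrder = 'ARNDCQEGHILKMFPSTWYV';   % assumed column order (aa2int codes 1..20)

E = [0.018 0.067 0.067 0.067 0.018 0.067 0.067 0.067 0.067 0.01  ...
     0.01  0.067 0.018 0.018 0.067 0.067 0.067 0.067 0.067 0.01;
     0.114 0.007 0.007 0.007 0.114 0.007 0.007 0.025 0.007 0.114 ...
     0.114 0.007 0.114 0.114 0.025 0.025 0.025 0.025 0.025 0.114];

% Residues that are more likely under the second state than under the first
favoursRow2 = aaOrder(E(2, :) > E(1, :));
disp(favoursRow2)

% Row sums show how close each row is to a proper probability distribution
rowSums = sum(E, 2);
disp(rowSums')
```

The residues favoured by the second row (A, C, I, L, M, F, V) are the hydrophobic ones, which matches the second row being the ‘in the membrane’ state. The rows are rounded guesses and sum to slightly less than 1.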

Now, starting from these initial guesses, the emission and transition matrices must be estimated from the sequences using the EM algorithm. The MATLAB function hmmtrain is used for this purpose.

[estT, estE] = hmmtrain(intseqs, T, E)
estT =

    0.7338    0.2662
    0.0711    0.9289


estE =

  Columns 1 through 11 

    0.0820    0.0971    0.0657    0.0408    0.0000    0.0429    0.0553    0.0555    0.0187    0.0242    0.0715
    0.0626    0.0181    0.0188    0.0248    0.0464    0.0153    0.0122    0.0416    0.0306    0.0902    0.1542

  Columns 12 through 20 

    0.1480    0.0874    0.0000    0.0366    0.0818    0.0228    0.0045    0.0000    0.0653
    0.0000    0.0424    0.0857    0.0427    0.0911    0.0763    0.0072    0.0583    0.0814

Finally, the Viterbi algorithm can be used to determine the most likely state path and so automatically segment the protein into its component regions. The MATLAB function hmmviterbi is used, taking as input the transition and emission matrices previously estimated by hmmtrain. You can also plot the states over the hydrophobicity plot.

estimatedStates = hmmviterbi(aa2int(or1),estT,estE);
plot(estimatedStates)
hold off
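For readers who want to see what the Viterbi step computes, here is a minimal log-space sketch in plain MATLAB. It is not the hmmviterbi implementation; it only follows the documented hmmviterbi convention that the model is in state 1 before the first emission. The two-symbol alphabet, the matrices Ttoy and Etoy, and the observation sequence are hypothetical toy values chosen to keep the example small.

```matlab
% Toy model: state 1 = out of membrane, state 2 = in membrane
Ttoy = [0.9 0.1; 0.1 0.9];
Etoy = [0.8 0.2;            % symbol 1 = polar, symbol 2 = hydrophobic
        0.2 0.8];
obs  = [1 1 1 2 2 2 2 1 1];

nStates = size(Ttoy, 1);
L = numel(obs);
logT = log(Ttoy);
logE = log(Etoy);

v   = -inf(nStates, L);     % best log-probability of any path ending in each state
ptr = zeros(nStates, L);    % back-pointers to the best previous state

% The model is assumed to be in state 1 before the first emission
v(:, 1) = logT(1, :)' + logE(:, obs(1));

for t = 2:L
    for s = 1:nStates
        [best, ptr(s, t)] = max(v(:, t-1) + logT(:, s));
        v(s, t) = best + logE(s, obs(t));
    end
end

% Trace back from the best final state
path = zeros(1, L);
[~, path(L)] = max(v(:, L));
for t = L-1:-1:1
    path(t) = ptr(path(t+1), t+1);
end

% Convert the state path into start/end positions of state-2 segments
d = diff([1 path 1]);
segStart = find(d == 1);
segEnd   = find(d == -1) - 1;
disp([segStart; segEnd])
```

The same diff-based step can be applied to estimatedStates to list the predicted transmembrane segments as residue ranges instead of reading them off the plot.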

Profile HMMs for odorant receptors

You will now turn to protein families and multiple alignment. Protein families are groups of proteins that have similar structure and function. Profile HMMs (pHMMs) for specific families can be developed from the multiple alignment of members of the family. A profile HMM provides a useful position-based scoring system against which candidate homologues can then be compared. In MATLAB, you can use profile HMMs to perform multiple sequence alignments.
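The idea of position-based scoring can be illustrated with a much simpler object than a profile HMM: a log-odds position-specific scoring matrix built from a gap-free alignment. The sketch below is a deliberate simplification (no insert or delete states, uniform background, pseudocount of 1) and the four-column alignment aln is hypothetical; it is not how PFAM models are built.

```matlab
aaOrder = 'ARNDCQEGHILKMFPSTWYV';
aln = ['GNLL'; 'GNIL'; 'GNVL'; 'ANLL'];   % hypothetical gap-free alignment
nCols = size(aln, 2);

% Count residues per column, starting from a pseudocount of 1
counts = ones(20, nCols);
for c = 1:nCols
    [~, idx] = ismember(aln(:, c)', aaOrder);
    for k = idx
        counts(k, c) = counts(k, c) + 1;
    end
end

% Column-wise frequencies and log-odds against a uniform background (1/20)
freq = counts ./ repmat(sum(counts, 1), 20, 1);
logOdds = log2(freq * 20);

% Score a family-like sequence and a shuffled version of it
[~, qi] = ismember('GNLL', aaOrder);
memberScore = sum(logOdds(sub2ind(size(logOdds), qi, 1:nCols)));
[~, ri] = ismember('LLNG', aaOrder);
shuffledScore = sum(logOdds(sub2ind(size(logOdds), ri, 1:nCols)));
disp([memberScore shuffledScore])
```

The family-like sequence receives a positive score and the shuffled one a negative score, because the score depends on which residue occurs at which position and not only on overall composition.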

Profile HMMs for many protein families can be found at the PFAM website.

web('http://www.sanger.ac.uk/Software/Pfam')

The first question to answer is which pHMM should be used for the olfactory receptors. You can compare the first OR sequence, together with a randomized version of it, against the pHMMs.

randor1 = randseq(length(or1), 'fromstructure', aacount(or1));
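A related control, sketched below in plain MATLAB, is to shuffle the residues of the sequence with randperm. This is a different technique from the randseq call: a shuffle preserves the amino acid composition exactly, whereas a sequence drawn at random from the composition only matches it on average. The names seq and shuffled are introduced here for illustration, and only an excerpt of or1 is used.

```matlab
% First 48 residues of or1 (illustrative excerpt)
seq = 'MPNSTTVMEFLLMRFSDVWTLQILHSASFFMLYLVTLMGNILIVTVTT';

% Random permutation of the residue positions
shuffled = seq(randperm(numel(seq)));

% Same length and same residue counts, but the order is destroyed
sameComposition = isequal(sort(shuffled), sort(seq));
disp(sameComposition)
```

Either kind of control serves the same purpose here: a sequence with realistic composition but no family-specific order should score poorly against every profile.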

There are over 7000 pHMMs available at the PFAM site. You can search all of them by entering the sequence under ‘search by protein sequence’ on the PFAM site. To save time, only the first 4 pHMMs are compared here. The MATLAB function gethmmprof retrieves a model, and hmmprofalign aligns a sequence to the selected profile HMM. For PFAM accession number PF00001, simply enter gethmmprof(1).

seqs = {or1,randor1};
for i = 1:4
    for j = 1:2
        score(i,j) = hmmprofalign(gethmmprof(i), seqs{j});  % pass the sequence itself, not a 1x1 cell
    end
end
score
score =

  111.2758 -112.1583
 -178.5775 -160.0956
 -157.0018 -161.5690
  -97.2529 -101.9993

You can see that the fake protein did not show a good alignment with any of the HMMs. The real olfactory receptor matches PF00001. You will use this HMM to align the olfactory receptors.
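The same choice can be made programmatically from the score matrix. The sketch below copies the values printed above (rows are the pHMMs 1 to 4, column 1 is the real receptor, column 2 the randomized sequence); the names bestProfile and margin are introduced here for illustration only.

```matlab
score = [ 111.2758 -112.1583;
         -178.5775 -160.0956;
         -157.0018 -161.5690;
          -97.2529 -101.9993];

% Profile with the highest score for the real receptor
[bestScore, bestProfile] = max(score(:, 1));

% How much better the real sequence scores than the randomized one, per profile
margin = score(:, 1) - score(:, 2);
disp(bestProfile)
disp(margin')
```

Only the first profile gives both a positive score for the real sequence and a large margin over the randomized control, which is the basis for selecting PF00001.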

First, get the pHMM from the PFAM database. Then retrieve the multiply aligned sequences from the PFAM database using the MATLAB function gethmmalignment.

hmm7tm = gethmmprof(1);
seqs = gethmmalignment(1, 'type', 'seed');
disp([char(seqs.Header) char(seqs.Sequence)])
O10J1_HUMAN/52-300 GNIIIVTIIRIDLHLH....TPMYFFLSMLSTSETVYTLVILPRMLSSLVG........MSQPMSL...AGCATQMFFFVTFGITNCFLLTAMGYDRYVAICNPLRYMVIMN..KRLRIQLVLGACSIGLIVAITQVTS.VFRLPFC.ARK.......VPHFFCDIR...............PVMKLSCIDTTVNEILTLIISVLVLVVPMGLVFISYVLIIS...................................................................................................................................TILKIASVEGRKKAFATCASHLTVVIVHYSCASIAYLKPKSENT............REHDQLISVTYTVITPLLNPVVY
OLF15_MOUSE/41-290 GNLTIILLSRLDARLH....TPMYFFLSNLSSLDLAFTTSSVPQMLKNLWG........PDKTISY...GGCVTQLYVFLWLGATECILLVVMAFDRYVAVCRPLHYMTVMN..PRLCWGLAAISWLGGLGNSVIQSTF.TLQLPFCGHRK.......VDNFLCEVP...............AMIKLACGDTSLNEAVLNGVCTFFTVVPVSVILVSYCFIAQ...................................................................................................................................AVMKIRSVEGRRKAFNTCVSHLVVVFLFYGSAIYGYLLPAKSSN............QSQGKFISLFYSVVTPMVNPLIY
OL287_RAT/44-293   GNLAIISLVGAHRCLQ....TPMYFFLCNLSFLEIWFTTACVPKTLATFAP........RGGVISL...AGCATQMYFVFSLGCTEYFLLAVMAYDRYLAICLPLRYGGIMT..PGLAMRLALGSWLCGFSAITVPATL.IARLSFCGSRV.......INHFFCDIS...............PWIVLSCTDTQVVELVSFGIAFCVILGSCGITLVSYAYIIT...................................................................................................................................TIIKIPSARGRHRAFSTCSSHLTVVLIWYGSTIFLHVRTSVESS............LDLTKAITVLNTIVTPVLNPFIY
OLF1_CHICK/41-290  TNLGLIALISVDLHLQ....TPMYIFLQNLSFTDAAYSTVITPKMLATFLE........ERKTISY...VGCILQYFSFVLLTVTESLLLAVMAYDRYVAICKPLLYPSIMT..KAVCWRLVESLYFLAFLNSLVHTSG.LLKLSFCYSNV.......VNHFFCDIS...............PLFQISSSSIAISELLVIISGSLFVMSSIIIILISYVFIIL...................................................................................................................................TVVMIRSKDGKYKAFSTCTSHLMAVSLFHGTVIFMYLRPVKLFS............LDTDKIASLFYTVVIPMLNPLIY
GU27_RAT/22-271    GNLLIILAVSSNSHLH....NLMYFFLSNLSFVDICFISTTIPKMLVNIHS........QTKDISY...IECLSQVYFLTTFGGMDNFLLTLMACDRYVAICHPLNYTVIMN..LQLCALLILMFWLIMFCVSLIHVLLMNELNFSRG.............TEIPHFFCELA..........QVLKVANSDTHINNVFMYVVTSLLGLIPMTGILMSYSQIAS...................................................................................................................................SLLKMSSSVSKYKAFSTCGSHLCVVSLFYGSATIVYFCSSVLHS............THKKMIASLMYTVISPMLNPFIY
MRGRF_RAT/61-291   GNGLVLWFFGFSIKRT.....PFSIYFLHLASADGIYLFSKAVIALLNMGT........FLGSFPD...YVRRVSRIVGLCTFFAGVSLLPAISIERCVSVIFPMWYWRRRP..KRLSAGVCALLWLLSFLVTSIHNYFCMFLGHEASG............TACLNMDISLG................ILLFFLFCPLMVLPCLALILHVECRARR................................................................................................................................................RQRSAKLNHVVLAIVSVFLVSSIYLGIDWFLFWVFQIP............APFPEYVTDLCICINSSAKPIVY
OPSB_HUMAN/51-303  LNAMVLVATLRYKKLR....QPLNYILVNVSFGGFLLCIFSVFPVFVASCNG........YFVFGR...HVCALEGFLGTVAGLVTGWSLAFLAFERYIVICKPFGNFRFSS...KHALTVVLATWTIGIGVSIPPFFG.WSRFIPEG...........LQCSCGPDWYTVG....TKYRSESYTWFLFIFCFIVPLSLICFSYTQLLRALKAVAAQQQE........................................................................................................................................SATTQKAEREVSRMVVVMVGSFCVCYVPYAAFAMYMVNNRNHG..........LDLRLVTIPSFFSKSACIYNPIIY
OPS3_DROME/75-338  GNGLVIWVFSAAKSLR....TPSNILVINLAFCDFMMMVKTPIFIYNSFHQG.........YALGH...LGCQIFGIIGSYTGIAAGATNAFIAYDRFNVITRPMEGKMTHG....KAIAMIIFIYMYATPWVVACYTETWGRFVPEG...........YLTSCTFDYLTDN......FDTRLFVACIFFFSFVCPTTMITYYYSQIVGHVFSHEKALRDQAKKMNV..........................................................................................................................ESLRSNVDKNKETAEIRIAKAAITICFLFFCSWTPYGVMSLIGAFGDKT..........LLTPGATMIPACACKMVACIDPFVY
OPSD_LOLFO/51-315  GNGVVIYLFTKTKSLQ....TPANMFIINLAFSDFTFSLVNGFPLMTISCFM.......KYWVFGN...AACKVYGLIGGIFGLMSIMTMTMISIDRYNVIGRPMSASKKMS..HRKAFIMIIFVWIWSTIWAIGPIFGWGAYTLEGV............LCNCSFDYITRD......TTTRSNILCMYIFAFMCPIVVIFFCYFNIVMSVSNHEKEMAAMAKRLN............................................................................................................................AKELRKAQAGANAEMKLAKISIVIVTQFLLSWSPYAVVALLAQFGPIEW..........VTPYAAQLPVMFAKASAIHNPMIY
OPS1_DROME/67-329  GNGVVIYIFATTKSLR....TPANLLVINLAISDFGIMITNTPMMGINLYF........ETWVLGP...MMCDIYAGLGSAFGCSSIWSMCMISLDRYQVIVKGMAGRPMTIP...LALGKIAYIWFMSSIWCLAPAFGWSRYVPEGN............LTSCGIDYLERDWN......PRSYLIFYSIFVYYIPLFLICYSYWFIIAAVSAHEKAMREQAKKMNV...........................................................................................................................KSLRSSEDAEKSAEGKLAKVALVTITLWFMAWTPYLVINCMGLFKFEG...........LTPLNTIWGACFAKSAACYNPIVY
V2R_HUMAN/54-325   SNGLVLAALARRGRRGH..WAPIHVFIGHLCLADLAVALFQVLPQLAWKAT.........DRFRGP..DALCRAVKYLQMVGMYASSYMILAMTLDRHRAICRPMLAYRHGS..GAHWNRPVLVAWAFSLLLSLPQLFIFAQRNVEGGSG..........VTDCWACFAEPWG.......RRTYVTWIALMVFVAPTLGIAACQVLIFREIHASLVPGPSERPGGRRRG.......................................................................................................................RRTGSPGEGAHVSAAVAKTVRMTLVIVVVYVLCWAPFFLVQLWAAWDPEAP..........LEGAPFVLLMLLASLNSCTNPWIY
FSHR_BOVIN/379-626 GNILVLVILITSQYKL....TVPRFLMCNLAFADLCIGIYLLLIASVDVHTKTEYHNYAIDWQTG....AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMQLECKVQ..LRHAASIMLVGWIFAFAVALFPIFGISSYMKVS...............ICLPMDIDSP........LSQLYVMSLLVLNVLAFVVICGCYTHIYLTVRNPNIT..............................................................................................................................................SSSSDTKIAKRMAMLIFTDFLCMAPISFFAISASLKVPL..........ITVSKSKILLVLFYPINSCANPFLY
TRFR_HUMAN/42-320  GNIMVVLVVMRTKHMR....TPTNCYLVSLAVADLMVLVAAGLPNITDSIYG........SWVYGY...VGCLCITYLQYLGINASSCSITAFTIERYIAICHPIKAQFLCT..FSRAKKIIIFVWAFTSLYCMLWFFLLDLNISTYKD.........AIVISCGYKISRN........YYSPIYLMDFGVFYVVPMILATVLYGFIARILFLNPIPSDPKENSKTWKNDSTH..............................................................................................................QNTNLNVNTSNRCFNSTVSSRKQVTKMLAVVVILFALLWMPYRTLVVVNSFLSSP..........FQENWFLLFCRICIYLNSAINPVIY
NTR1_HUMAN/80-364  GNTVTAFTLARKKSLQS.LQSTVHYHLGSLALSDLLTLLLAMPVELYNFIWV......HHPWAFGD...AGCRGYYFLRDACTYATALNVASLSVERYLAICHPFKAKTLMS..RSRTKKFISAIWLASALLAVPM.LFTMGEQNRSADGQH....AGGLVCTPTIHTATVK..........VVIQVNTFMSFIFPMVVISVLNTIIANKLTVMVRQAAEQGQVCTVGG......................................................................................................................EHSTFSMAIEPGRVQALRHGVRVLRAVVIAFVVCWLPYHVRRLMFCYISDE...QWTPFLYDFYHYFYMVTNALFYVSSTINPILY
NPY1R_HUMAN/57-320 GNLALIIIILKQKEMR....NVTNILIVNLSFSDLLVAIMCLPFTFVYTLMD........HWVFGE...AMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPRGWRPNNR....HAYVGIAVIWVLAVASSLPFLIYQVMTDEPFQNVTLD...AYKDKYVCFDQFPSDS.......HRLSYTTLLLVLQYFGPLCFIFICYFKIYIRLKRRNNMMDKMR.....................................................................................................................................DNKYRSSETKRINIMLLSIVVAFAVCWLPLTIFNTVFDWNHQIIA.......TCNHNLLFLLCHLTAMISTCVNPIFY
GPR83_MOUSE/88-345 GNVLVCHVIFKNQRMH....SATSLFIVNLAVADIMITLLNTPFTLVRFVN........STWVFGK...GMCHVSRFAQYCSLHVSALTLTAIAVDRHQVIMHPLKPRISIT....KGVIYIAVIWVMATFFSLPHAICQKLFTFKYSED........IVRSLCLPDFPEPAD.....LFWKYLDLATFILLYLLPLFIISVAYARVAKKLWLCNTIGDVTT....................................................................................................................................EQYLALRRKKKTTVKMLVLVVVLFALCWFPLNCYVLLLSSKAIH...........TNNALYFAFHWFAMSSTCYNPFIY
NK1R_CAVPO/49-305  GNVVVMWIILAHKRMR....TVTNYFLVNLAFAEASMAAFNTVVNFTYAVHN........EWYYGL...FYCKFHNFFPIAAVFASIYSMTAVAFDRYMAIIHPLQPRLSAT....ATKVVICVIWVLALLLAFPQGYYSTTETMPGR.............VVCMIEWPSHP....DKIYEKVYHICVTVLIYFLPLLVIGYAYTVVGITLWASEIPGDSSD.....................................................................................................................................RYHEQVSAKRKVVKMMIVVVCTFAICWLPFHIFFLLPYINPDLY.......LKKFIQQVYLAIMWLAMSSTMYNPIIY
TLR1_DROME/100-363 GNGIVLWIVTGHRSMR....TVTNYFLLNLSIADLLMSSLNCVFNFIFMLN........SDWPFGS...IYCTINNFVANVTVSTSVFTLVAISFDRYIAIVHPLKRRTSRR....KVRIILVLIWALSCVLSAPCLLYSSIMTKHYYNGKSR...TVCFMMWPDGRYPTSM.......ADYAYNLIILVLTYGIPMIVMLICYSLMGRVLWGSRSIGENTD.....................................................................................................................................RQMESMKSKRKVVRMFIAIVSIFAICWLPYHLFFIYAYHNNQV.......ASTKYVQHMYLGFYWLAMSNAMVNPLIY
NPYR_DROME/122-383 GNGTVCYIVYSTPRMR....TVTNYFIASLAIGDILMSFFCVPSSFISLFIL.......NYWPFGL...ALCHFVNYSQAVSVLVSAYTLVAISIDRYIAIMWPLKPRITKR....YATFIIAGVWFIALATALPIPIVSGLDIPMSP..WH....TKCEKYICREMWPSRT.......QEYYYTLSLFALQFVVPLGVLIFTYARITIRVWAKRPPGEAET....................................................................................................................................NRDQRMARSKRKMVKMMLTVVIVFTCCWLPFNILQLLLNDEEFAHW........DPLPYVWFAFHWLAMSHCCYNPIIY
CCKAR_HUMAN/58-370 GNTLVITVLIRNKRMR....TVTNIFLLSLAVSDLMLCLFCMPFNLIPNLLK........DFIFGS...AVCKTTTYFMGTSVSVSTFNLVAISLERYGAICKPLQSRVWQT..KSHALKVIAATWCLSFTIMTPYPIYSNLVPFTKNNNQ........TANMCRFLLPNDV.......MQQSWHTFLLLILFLIPGIVMMVAYGLISLELYQGIKFEASQKKSAKERKPSTTSSGKYEDSDGCYLQK.................................................................................TRPPRKLELRQLSTGSSSRANRIRSNSSAANLMAKKRVIRMLIVIVVLFFLCWMPIFSANAWRAYDTAS.......AERRLSGTPISFILLLSYTSSCVNPIIY
BRS3_CAVPO/64-330  GNAILIKVFFKTKSMQ....TVPNIFITSLALGDLLLLLTCVPVDATHYLA........EGWLFGR...IGCKVLSFIRLTSVGVSVFTLTILSADRYKAVVKPLERQPSNA..ILKTCAKAGCIWIMSMIFALPEAIFSNVHTLRDPNKN.......MTSEWCAFYPVSEK......LLQEIHALLSFLVFYIIPLSIISVYYSLIARTLYKSTLNIPTEEQ...................................................................................................................................SHARKQVESRKRIAKTVLVLVALFALCWLPNHLLNLYHSFTHKAYE.....DSSAIHFIVTIFSRVLAFSNSCVNPFAL
MC3R_MOUSE/55-299  ENILVILAVVRNGNLH....SPMYFFLCSLAAADMLVSLSNSLETIMIAVINSD..SLTLEDQFIQ...HMDNIFDSMICISLVASICNLLAIAIDRYVTIFYALRYHSIMT..VRKALTLIGVIWVCCGICGVMFIIYSESKM................VIVCLITMFFAM...........VLLMGTLYIHMFLFARLHVQRIAVLPPAGVVAPQ...............................................................................................................................................QHSCMKGAVTITILLGVFIFCWAPFFLHLVLIITCPTN.......PYCICYTAHFNTYLVLIMCNSVIDPLIY
ACM1_HUMAN/42-418  GNLLVLISFKVNTELK....TVNNYFLLSLACADLIIGTFSMNLYTTYLLMG........HWALGT...LACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLSYRAKRT..PRRAALMIGLAWLVSFVLWAPA.ILFWQYLVGERTVL.........AGQCYIQFLSQP..........IITFGTAMAAFYLPVTVMCTLYWRIYRETENRARELAALQGSETPGKGGGSSSSSERSQPGAEGSPETPPGRCCRCCRAPRLLQAYSWKEEEEEDEGSMESL........TSSEGEEPGSEVVIKMPMVDPEAQAPTKQPPRSSPNTVKRPTKKGRDRAGKGQKPRGKEQLAKRKTFSLVKEKKAARTLSAILLAFILTWTPYNIMVLVSTFCKDC...........VPETLWELGYWLCYVNSTINPMCY
5HT2A_CRIGR/91-380 GNILVIMAVSLEKKLQ....NATNYFLMSLAIADMLLGFLVMPVSMLTILYG.......YRWPLPS...KLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNS..RTKAFLKIIAVWTISVGVSMPIPVFGLQDDSKVFK...........QGSCLLADDNF.............VLIGSFVAFFIPLTIMVITYFLTIKSLQKEATLCVSDLSTRAKLASFSFLPQSSLS................................................................................................SEKLFQRSIHREPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESCN.......EHVIGALLNVFVWIGYLSSAVNPLVY
5HT5A_MOUSE/57-338 WNLLVLATILKVRTFH....RVPHNLVASMAISDVLVAVLVMPLSLVHELSG.......RRWQLGR...RLCQLWIACDVLCCTASIWNVTAIALDRYWSITRHLEYTLRTR..KRVSNVMILLTWALSTVISLAPLLFGWGETYSEP............SEECQVSREPS............YTVFSTVGAFYLPLWLVLFVYWKIYRAAKFRMGSRKTNSVSPVPEAVEVKNATQH....................................................................................................PQMVFTARHATVTFQTEGDTWREQKEQRAALMVGILIGVFVLCWFPFFVTELISPLCSWD...........VPAIWKSIFLWLGYSNSFFNPLIY
HRH2_CANFA/35-288  GNVVVCLAVGLNRRLR....SLTNCFIVSLAITDLLLGLLVLPFSAFYQLS........CRWSFGK...VFCNIYTSLDVMLCTASILNLFMISLDRYCAVTDPLRYPVLIT..PVRVAVSLVLIWVISITLSFLSIHLGWNSRNETSSF..........NHTIPKCKVQVN.........LVYGLVDGLVTFYLPLLVMCITYYRIFKIARDQAKRIHHMG.....................................................................................................................................SWKAATIGEHKATVTLAAVMGAFIICWFPYFTVFVYRGLKGDD..........AINEAFEAVVLWLGYANSALNPILY
DRD1_HUMAN/40-331  GNTLVCAAVIRFRHLR...SKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAG........FWPFG....SFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFRYERKMT..PKAAFILISVAWTLSVLISFIPVQLSWHKAKPTSPS..........DGNATSLAETIDNC..DSSLSRTYAISSSVISFYIPVAIMIVTYTRIYRIAQKQIRRIAALERAAVHAKNCQTTT...........................................................................................................GNGKPVECSQPESSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFILNCILPFCGSG.....ETQPFCIDSNTFDVFVWFGWANSSLNPIIY
ADRB1_HUMAN/75-377 GNVLVIVAIAKTPRLQ....TLTNLFIMSLASADLVMGLLVVPFGATIVVW........GRWEYGS...FFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFRYQSLLT..RARARGLVCTVWAISALVSFLPILMHWWRAESDE............ARRCYNDPKCCD....FVTN.RAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQVKKIDSCERRFLGGPARPPSPSPSPVPAPAPP.....................................................................................PGPPRPAAAAATAPLANGRAGKRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHREL...........VPDRLFVFFNWLGYANSAFNPIIY
5HT1R_DROME/179-507GNVLVCIAVCMVRKLR....RPCNYLLVSLALSDLCVALLVMPMALLYEVL........EKWNFGP...LLCDIWVSFDVLCCTASILNLCAISVDRYLAITKPLEYGVKRT..PRRMMLCVGIVWLAAACISLPP.LLILGNEHEDEEG..........QPICTVCQNFA............YQIYATLGSFYIPLSVMLFVYYQIFRAARRIVLEEKRAQTHLQQALNGTGSPSAPQAPPLGHTELASSGNGQRHSSVGN.....................................................TSLTYSTCGGLSSGGGALAGHGSGGGVSGSTGLLGSPHHKKLRFQLAKEKKASTTLGIIMSAFTVCWLPFFILALIRPFETMHV...........PASLSSLFLWLGYANSLLNPIIY
5HT7R_HUMAN/98-384 GNCLVVISVCFVKKLR....QPSNYLIVSLALADLSVAVAVMPFVSVTDLIG.......GKWIFGH...FFCNVFIAMDVMCCTASIMTLCVISIDRYLGITRPLTYPVRQN..GKCMAKMILSVWLLSASITLPP.LFGWAQNVNDDK.............VCLISQDFG............YTIYSTAVAFYIPMSVMLFMYYQIYKAARKSAAKHKFPGFPRVEPDSVIALNGIVKL.................................................................................................QKEVEECANLSRLLKHERKNISIFKREQKAATTLGIIVGAFTVCWLPFFLLSTARPFICGTSCSCI.......PLWVERTFLWLGYANSLINPFIY
5HT1B_HUMAN/66-369 SNAFVIATVYRTRKLH....TPANYLIASLAVTDLLVSILVMPISTMYTVT........GRWTLGQ...VVCDFWLSSDITCCTASILHLCVIALDRYWAITDAVEYSAKRT..PKRAAVMIALVWVFSISISLPP..FFWRQAKAEEEV...........SECVVNTDHIL...........YTVYSTVGAFYFPTLLLIALYGRIYVEARSRILKQTPNRTGKRLTRAQLITDSPGSTSSVTSINSR...............................................................................VPDVPSESGSPVYVNQVKVRVSDALLEKKKLMAARERKATKTLGIILGAFIVCWLPFFIISLVMPICKDACWF.........HLAIFDFFTWLGYLNSLINPIIY
5HT1A_HUMAN/53-400 GNACVVAAIALERSLQ....NVANYLIGSLAVTDLMVSVLVLPMAALYQVL........NKWTLGQ...VTCDLFIALDVLCCTSSILHLCAIALDRYWAITDPIDYVNKRT..PRRAAALISLTWLIGFLISIPP.MLGWRTPEDRSDP...........DACTISKDHG............YTIYSTFGAFYIPLLLMLVLYGRIFRAARFRIRKTVKKVEKTGADTRHGASPAPQPKKSVNGESGSRNWRLGVESKAGGALCANGAVR...................................QGDDGAALEVIEVHRVGNSKEHLPLPSEAGPTPCAPASFERKNERNAEAKRKMALARERKTVKTLGIIMGTFILCWLPFFIVALVLPFCESSCHM.........PTLLGAIINWLGYSNSLLNPVIY
DRD2_BOVIN/51-427  GNVLVCMAVSREKALQ....TTTNYLIVSLAVADLLVATLVMPWVVYLEVVG........EWKFSR...IHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMLYNTRYSS.KRRVTVMIAIVWVLSFTISCPMLFG.LNNTDQNE...............CIIANPAF.............VVYSSIVSFYVPFIVTLLVYIKIYIVLRRRRKRVNTKRSSRAFRANLKAPLKGNCTHPEDMKLCTVIMKSNGSFPVNRRRVEAARRAQELEMEMLSSTSPPERTRYSPIPPSHHQLTLPDPSHHGLHSTPDSPAKPEKNGHAKTVNPKIAKIFEIQSMPNGKTRTSLKTMSRRKLSQQKEKKATQMLAIVLGVFIICWLPFFITHILNIHCDCN...........IPPVLYSAFTWLGYVNSAVNPIIY
ADA1D_HUMAN/113-402GNLLVILSVACNRHLQ....TVTNYFIVNLAVADLLLSATVLPFSATMEVLG........FWAFGR...AFCDVWAAVDVLCCTASILSLCTISVDRYVGVRHSLKYPAIMT..ERKAAAILALLWVVALVVSVGP.LLGWKEPVPPD............ERFCGITEEAG............YAVFSSVCSFYLPMAVIVVMYCRVYVVARSTTRSLEAGVKRERGKASEVVLRIHCRGAAT...........................................................................................GADGAHGMRSAKGHTFRSSLSVRLLKFSREKKAAKTLAIVVGVFVLCWFPFFFVLPLGSLFPQLK..........PSEGVFKVIFWLGYFNSCVNPLIY
5HT6R_RAT/43-320   ANSLLIVLICTQPALR....NTSNFFLVSLFTSDLMVGLVVMPPAMLNALYG........RWVLAR...GLCLLWTAFDVMCCSASILNLCLISLDRYLLILSPLRYKLRMT..APRALALILGAWSLAALASFLPLLLGWHELGKARTPA.........PGQCRLLASLP............FVLVASGVTFFLPSGAICFTYCRILLAARKQAVQVASLTTGTAGQALETLQVP.........................................................................................................RTPRPGMESADSRRLATKHSRKALKASLTLGILLGMFFVTWLPFFVANIAQAVCDCIS............PGLFDVLTWLGYCNSTMNPIIY
AA1R_BOVIN/26-288  GNVLVIWAVKVNQALR....DATFCFIVSLAVADVAVGALVIPLAILINIG..........PRTYF...HTCLKVACPVLILTQSSILALLAMAVDRYLRVKIPLRYKTVVT..PRRAVVAITGCWILSFVVGLTP.MFGWNNLSAVERDWLANGSVGEPVIECQFEKVIS.........MEYMVYFNFFVWVLPPLLLMVLIYMEVFYLIRKQLSKKVSASS...................................................................................................................................GDPQKYYGKELKIAKSLALILFLFALSWLPLHILNCITLFCPSCH..........MPRILIYIAIFLSHGNSAMNPIVY
PTAFR_CAVPO/32-293 ANGYVLWVFARLYPSKK..LNEIKIFMVNLTVADLLFLITLPLWIVYYSNQ........GNWFLPK...FLCNLAGCLFFINTYCSVAFLGVITYNRFQAVKYPIKTAQATT..RKRGIALSLVIWVAIVAAASYFLVMDSTNVVSNKAGS.......GNITRCFEHYEKGS.......KPVLIIHICIVLGFFIVFLLILFCNLVIIHTLLRQPVKQQ...........................................................................................................................................RNAEVRRRALWMVCTVLAVFVICFVPHHMVQLPWTLAELG...MWPSSNHQAINDAHQVTLCLLSTNCVLDPVIY
PAR1_CRILO/122-374 LNILAIAVFVLKMKVK....KPAVVYMLHLAMADVLFVSVLPLKISYYFSG........SDWQFGS...GMCRFATAAFYCNMYASIMLMTVISIDRFLAVVYPIQSLSWRT..LGRANFTCLVIWVMAIMGVVPLLLKEQTTRVPGLN...........ITTCHDVLNETLLQG....FYSYYFSAFSAVFFLVPLIISTICYMSIIRCLSSSSVA..............................................................................................................................................NRSKKSRALFLSAAVFCVFIVCFGPTNVLLIMHYLLLSD......SPATEKAYFAYLLCVCVSSVSCCIDPLIY
P2RY5_CHICK/31-288 ANCVAIYIFTFTLKVR....NETTTYMLNLAISDLLFVFTLPFRIYYFVVR.........NWPFGD...VLCKISVTLFYTNMYGSILFLTCISVDRFLAIVHPFRSKTLRT..KRNARIVCVAVWITVLAGSTPASFFQSTNRQNNTE...........QRTCFENFPEST....WKTYLSRIVIFIEIVGFFIPLILNVTCSTMVLRTLNKPLTLS............................................................................................................................................RNKLSKKKVLKMIFVHLVIFCFCFVPYNITLILYSLMRTQTWIN..CSVVTAVRTMYPVTLCIAVSNCCFDPIVY
EBI2_HUMAN/48-308  GNLLALVVIVQ.NRKK...INSTTLYSTNLVISDILFTTALPTRIAYYAMG........FDWRIGD...ALCRITALVFYINTYAGVNFMTCLSIDRFIAVVHPLRYNKIKR..IEHAKGVCIFVWILVFAQTLPLLINPMSKQEAERI...........TCMEYPNFEETKS.......LPWILLGACFIGYVLPLIIILICYSQICCKLFRTAKQNPL.........................................................................................................................................TEKSGVNKKALNTIILIIVVFVLCFTPYHVAIIQHMIKKLRFSNFLECSQRHSFQISLHFTVCLMNFNCCMDPFIY
US28_HCMVT/50-291  GNFLVIFTITWRRRIQ....CSGDVYFINLAAADLLFVCTLPLWMQYLL..........DHNSLAS...VPCTLLTACFYVAMFASLCFITEIALDRYYAIV....YMRYRP..VKQACLFSIFWWIFAVIIAIPHFMVVTKKDNQ.................CMTDYDYLE....VS.YPIILNVELMLGAFVIPLSVISYCYYRISRIVAVSQS.................................................................................................................................................RHKGRIVRVLIAVVLVFIIFWLPYHLTLFVDTLKLLKWI.SSSCEFERSLKRALILTESLAFCHCCLNPLLY
CX3C1_RAT/49-294   GNLLVVLALTNSRKSK....SITDIYLLNLALSDLLFVATLPFWTHYLIS.........HEGLH.N...AMCKLTTAFFFIGFFGGIFFITVISIDRYLAIVLAANSMNNRT..VQHGVTISLGVWAAAILVASPQFMFTKRK.................DNECLGDYPEVL....QEIWPVLRNSEVNILGFVLPLLIMSFCYFRIVRTLFSCKN.................................................................................................................................................RKKARAIRLILLVVVVFFLFWTPYNIVIFLETLKFYNFF..PSCGMKRDLRWALSVTETVAFSHCCLNPFIY
CCR1_HUMAN/51-301  GNILVVLVLVQYKRLK....NMTSIYLLNLAISDLLFLFTLPFWIDYKLK.........DDWVFGD...AMCKILSGFYYTGLYSEIFFIILLTIDRYLAIVHAVFALRART..VTFGVITSIIIWALAILASMPGLYF.SKTQWEFT............HHTCSLHFPHES....LREWKLFQALKLNLFGLVLPLLVMIICYTGIIKILLRRPN.................................................................................................................................................EKKSKAVRLIFVIMIIFFLFWTPYNLTILISVFQDFLFT..HECEQSRHLDLAVQVTEVIAYTHCCVNPVIY
CCRL1_BOVIN/58-303 GNSTVVAIYAYYKKRR....TKTDVYILNLAVADLFLLFTLPFWAVNAVHG..........WVLGK...IMCKVTSALYTVNFVSGMQFLACISTDRYWAVTKAPSQSGVGK....PCWVICFCVWVAAILLSIPQLVFYTVNHKARCVPI.........FPYHLGTSMKAS...........IQILEICIGFIIPFLIMAVCYFITAKTLIKMPN.................................................................................................................................................IKKSQPLKVLFTVVIVFIVTQLPYNIVKFCQAIDIIYSL.ITDCDMSKRMDVAIQITESIALFHSCLNPVLY
CCR7_HUMAN/75-326  GNGLVVLTYIYFKRLK....TMTDTYLLNLAVADILFLLTLPFWAYSAAKS..........WVFGV...HFCKLIFAIYKMSFFSGMLLLLCISIDRYVAIVQAVSAHRHRARVLLISKLSCVGIWILATVLSIPELLYSDLQRSSSEQ...........AMRCSLITEHVEA.......FITIQVAQMVIGFLVPLLAMSFCYLVIIRTLLQARN.................................................................................................................................................FERNKAIKVIIAVVVVFIVFQLPYNGVVLAQTVANFNIT.SSTCELSKQLNIAYDVTYSLACVRCCVNPFLY
CXCR4_BOVIN/56-303 GNGLVILVMGYQKKLR....SMTDKYRLHLSVADLLFVLTLPFWAVDAVAN..........WYFGK...FLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATNSQKPRK..LLAEKVVYVGVWLPAVLLTIPDLIFADIKEVDER.............YICDRFYPSDL.......WLVVFQFQHIVVGLLLPGIVILSCYCIIISKLSHSKG.................................................................................................................................................YQKRKALKTTVILILTFFACWLPYYIGISIDSFILLEII.QQGCEFESTVHKWISITEALAFFHCCLNPILY
CXCR1_HUMAN/56-305 GNSLVMLVILYSRVGR....SVTDVYLLNLALADLLFALTLPIWAASKVNG..........WIFGT...FLCKVVSLLKEVNFYSGILLLACISVDRYLAIVHATRTLTQKR...HLVKFVCLGCWGLSMNLSLPFFLFRQAYHPNNSSP............VCYEVLGNDT.....AKWRMVLRILPHTFGFIVPLFVMLFCYGFTLRTLFKAHM.................................................................................................................................................GQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTQVI.QETCERRNNIGRALDATEILGFLHSCLNPIIY
CXCR5_HUMAN/68-322 GNVLVLVILERHRQTR....SSTETFLFHLAVADLLLVFILPFAVAEGSVG..........WVLGT...FLCKTVIALHKVNFYCSSLLLACIAVDRYLAIVHAVHAYRHRR..LLSIHITCGTIWLVGFLLALPEILFAKVSQGHHNNS..........LPRCTFSQENQA....ETHAWFTSRFLYHVAGFLLPMLVMGWCYVGVVHRLRQAQR................................................................................................................................................RPQRQKAVRVAILVTSIFFLCWSPYHIVIFLDTLARLKAV.DNTCKLNGSLPVAITMCEFLGLAHCCLNPMLY
APJ_HUMAN/45-309   GNGLVLWTVFRSSREK...RRSADIFIASLAVADLTFVVTLPLWATYTYRD........YDWPFGT...FFCKLSSYLIFVNMYASVFCLTGLSFDRYLAIVRPVANARLRL..RVSGAVATAVLWVLAALLAMPVMVLRTTGDLENTT...........KVQCYMDYSMVATVSSEWAWEVGLGVSSTTVGFVVPFTIMLTCYFFIAQTIAGHFRKER..........................................................................................................................................IEGLRKRRRLLSIIVVLVVTFALCWMPYHLVKTLYMLGSLLH...WPCDFDLFLMNIFPYCTCISYVNSCLNPFLY
BKRB2_HUMAN/74-332 ENIFVLSVFCLHKSSC....TVAEIYLGNLAAADLILACGLPFWAITISNN........FDWLFGE...TLCRVVNAIISMNLYSSICFLMLVSIDRYLALVKTMSMGRMRG..VRWAKLYSLVIWGCTLLLSSPMLVFRTMKEYSDEGHN.........VTACVISYPSLI.......WEVFTNMLLNVVGFLLPLSVITFCTMQIMQVLRNNEMQKF...........................................................................................................................................KEIQTERRATVLVLVVLLLFIICWLPFQISTFLDTLHRLGI..LSSCQDERIIDVITQIASFMAYSNSCLNPLVY
AGTR1_BOVIN/45-302 GNSLVVIVIYFYMKLK....TVASVFLLNLALADLCFLLTLPLWAVYTAMEY........RWPFGN...YLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMKSRLRRT..MLVAKVTCIIIWLLAGLASLPTIIHRNVFFIENTN...........ITVCAFHYESQN.....STLPVGLGLTKNILGFLFPFLIILTSYTLIWKTLKKAYEIQ............................................................................................................................................KNKPRKDDIFKIILAIVLFFFFSWVPHQIFTFMDVLIQLGL..IRDCKIEDIVDTAMPITICLAYFNNCLNPLFY
AGTR2_MOUSE/61-318 VNIVVVSLFCCQKGPK....KVSSIYIFNLALADLLLLATLPLWATYYSYR........YDWLFGP...VMCKVFGSFLTLNMFASIFFITCMSVDRYQSVIYPFLSQRRNP...WQASYVVPLVWCMACLSSLPTFYFRDVRTIEYLG...........VNACIMAFPPEK....YAQWSAGIALMKNILGFIIPLIFIATCYFGIRKHLLKTNSYG............................................................................................................................................KNRITRDQVLKMAAAVVLAFIICWLPFHVLTFLDALTWMGI..INSCEVIAVIDLALPFAILLGFTNSCVNPFLY
C5AR_CANFA/55-302  GNFLVVWVTGFEVRRT.....INAIWFLNLAVADLLSCLALPILFSSIVQQG........YWPFGN...AACRILPSLILLNMYASILLLTTISADRFVLVFNPIWCQNYRG..PQLAWAACSVAWAVALLLTVPSFIFRGVHTEYFPF...........WMTCGVDYSGVG.....VLVERGVAILRLLMGFLGPLVILSICYTFLLIRTWSRKA.................................................................................................................................................TRSTKTLKVVVAVVVSFFVLWLPYQVTGMMMALFYKHS......ESFRRVSRLDSLCVAVAYINCCINPIIY
SSR1_HUMAN/75-323  GNSMVIYVILRYAKMK....TATNIYILNLAIADELLMLSVPFLVTSTLL.........RHWPFGA...LLCRLVLSVDAVNMFTSIYCLTVLSVDRYVAVVHPIKAARYRR..PTVAKVVNLGVWVLSLLVILPIVVFSRTAANSDG............TVACNMLMPEPA.....QRWLVGFVLYTFLMGFLLPVGAICLCYVLIIAKMRMVALKAGW.........................................................................................................................................QQRKRSERKITLMVMMVVMVFVICWMPFYVVQLVNVFAEQD....D........ATVSQLSVILGYANSCANPILY
OPRD_MOUSE/66-318  GNVLVMFGIVRYTKLK....TATNIYIFNLALADALATSTLPFQSAKYLM.........ETWPFGE...LLCKAVLSIDYYNMFTSIFTLTMMSVDRYIAVCHPVKALDFRT..PAKAKLINICIWVLASGVGVPIMVM.AVTQPRDGAVV.......CMLQFPSPS..........WYWDTVTKICVFLFAFVVPILIITVCYGLMLLRLRSVRLLSGS.........................................................................................................................................KEKDRSLRRITRMVLVVVGAFVVCWAPIHIFVIVWTLVDINRRDPL.......VVAALHLCIALGYANSSLNPVLY
RDC1_CANFA/61-315  ANSVVVWVNIQAKTTG....YDTHCYILNLAIADLWVVVTIPVWVVSLVQHN........QWPMGE...LTCKITHLIFSINLFGSIFFLTCMSVDRYLSITYFASTSSRRK..KVVRRAVCVLVWLLAFCVSLPDTYYLKTVTSASNN...........ETYCRSFYPEHS....VKEWLISMELVSVVLGFAIPFCVIAVFYCLLARAISASSD.................................................................................................................................................QEKQSSRKIIFSYVVVFLVCWLPYHVVVLLDIFSILHYI.PFTCQLENFLFTALHVTQCLSLVHCCVNPVLY
ADMR_RAT/66-316    ENVLVICVNCR.RSGR...VGMLNLYILNMAVADLGIILSLPVWMLEVMLE........YTWLWGS...FSCRFIHYFYLANMYSSIFFLTCLSIDRYVTLTNTSPSWQRHQ..HRIRRAVCAGVWVLSAIIPLPEVVHIQLLDGSEP..............MCLFLAPFET....YSAWALAVALSATILGFLLPFPLIAVFNILSACRLRRQGQ.................................................................................................................................................TESRRHCLLMWAYIVVFVICWLPYHVTMLLLTLHTTHI..FLHCNLVNFLYFFYEIIDCFSMLHCVANPILY
US27_HCMVA/47-294  LNVLVITTILYYRRKK...KSPSDTYICNLAVADLLIVVGLPFFLEYAKH.........HPKLSRE...VVCSGLNACFYICLFAGVCFLINLSMDRYCVIVWGVELNRVRN..NKRATCWVVIFWILAVLMGMPHYLMYSHTNNECVGEF.........ANETSGWFPVF............LNTKVNICGYLAPIALMAYTYNRMVRFIINYVG.................................................................................................................................................KWHMQTLHVLLVVVVSFASFWFPFNLALFLESIRLLAGV..YNDTLQNVIIFCLYVGQFLAYVRACLNPGIY
EDG1_HUMAN/62-310  ENIFVLLTIWKTKKFH....RPMYYFIGNLALSDLLAGVAYTANLLLSGAT.........TYKLTP...AQWFLREGSMFVALSASVFSLLAIAIERYITMLKMKLHNGSNN...FRLFLLISACWVISLILGGLPIMGWNCISALSS...............CSTVLPLYH..........KHYILFCTTVFTLLLLSIVILYCRIYSLVRTRSRRLTFRKN...................................................................................................................................ISKASRSSENVALLKTVIIVLSVFIACWAPLFILLLLDVGCKVKT.........CDILFRAEYFLVLAVLNSGTNPIIY
CNR2_HUMAN/50-299  ENVAVLYLILSSHQLR...RKPSYLFIGSLAGADFLASVVFACSFVNFHVF.........HGVDSK...AVFLLKIGSVTMTFTASVGSLLLTAIDRYLCLRYPPSYKALLT..RGRALVTLGIMWVLSALVSYLPLMGWTCCPRP.................CSELFPLIP..........NDYLLSWLLFIAFLFSGIIYTYGHVLWKAHQHVASLSGHQDR.................................................................................................................................QVPGMARMRLDVRLAKTLGLVLAVLLICWFPVLALMAHSLATTLS.....DQ.....VKKAFAFCSMLCLINSMVNPVIY
CNR1_HUMAN/133-397 ENLLVLCVILHSRSLR...CRPSYHFIGSLAVADLLGSVIFVYSFIDFHVF.........HRKDSR...NVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLAYKRIVT..RPKAVVAFCLMWTIAIVIAVLPLLGWNCEKLQSV...............CSDIFPHID..........ETYLMFWIGVTSVLLLFIVYAYMYILWKAHSHAVRMIQRGTQKSIIIH....................................................................................................................TSEDGKVQVTRPDQARMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMN.....KL.....IKTVFAFCSMLCLLNSTVNPIIY
UL33_HCMVA/48-306  LNAIVLITQLLTNRVLG..YSTPTIYMTNLYSTNFLTLTVLPFIVLSNQWLL..........PAGV...ASCKFLSVIYYSSCTVGFATVALIAADRYRVLH..KRTYARQS..YRSTYMILLLTWLAGLIFSVPAAVYTTVVMHHDANDTN....NTNGHATCVLYFVAEE....VHTVLLSWKVLLTMVWGAAPVIMMTWFYAFFYSTVQRTSQ.................................................................................................................................................KQRSRTLTFVSVLLISFVALQTPYVSLMIFNSYATTAW..PMQCEHLTLRRTIGTLARVVPHLHCLINPILY
TA2R_HUMAN/41-308  SNLLALSVLAGARQGGSHTRSSFLTFLCGLVLTDFLGLLVTGTIVVSQHAAL.......FEWHAVDPGCRLCRFMGVVMIFFGLSPLLLGAAMASERYLGITRPFSRPAVAS..QRRAWATVGLVWAAALALGLLPLLGVGRYTVQYP............GSWCFLTLGAES......GDVAFGLLFSMLGGLSVGLSFLLNTVSVATLCHVYHGQEAAQ.........................................................................................................................................QRPRDSEVEMMAQLLGIMVVASVCWLPLLVFIAQTVLRNPP.AMSPAGQLSRTTEKELLIYLRVATWNQILDPWVY
PE2R4_HUMAN/34-329 GNLVAIVVLCKSRKEQK..ETTFYTLVCGLAVTDLLGTLLVSPVTIATYMKG........QWPGGQ...PLCEYSTFILLFFSLSGLSIICAMSVERYLAINHAYFYSHYVD..KRLAGLTLFAVYASNVLFCALPNMGLGSSRLQYP............DTWCFIDWTTNVTAHAAYSYMYAGFSSFLILATVLCNVLVCGALLRMHRQFMRRTSLGTEQHHAAAAASVASRGHPA.......................................................................................................ASPALPRLSDFRRRRSFRRIAGAEIQMVILLIATSLVVLICSIPLVVRVFVNQLYQPS.......LEREVSKNPDLQAIRIASVNPILDPWIY

As above, you can align the chosen pHMM with the first 30 of the 347 OR sequences. The MATLAB function hmmprofmerge merges the individual alignments into a single display that is easier to read.

for i=1:30
    [Score(i), Seqs(i).Aligned] = hmmprofalign(hmm7tm, orseqs(i).Sequence);
end
hmmprofmerge(Seqs,Score)

It is worth explaining why a profile HMM is preferable to a pairwise alignment. Each time you align a sequence to an HMM, it is like aligning it to the hundreds of sequences that were used to build the HMM, which gives you more confidence in the result. To confirm this, consider the following. The family to which we aligned the ORs is the Rhodopsin family. Try aligning an OR with just one of the sequences used to build the HMM. First, retrieve the sequences for the rhodopsin receptor and one of the odorant receptors.

rhod = getgenpept('NP_002368','sequenceonly',true);
or = orseqs(300).Sequence;

Now you can perform a pairwise global alignment of the two sequences with nwalign. BLOSUM30 is used as the scoring matrix, and the penalties for opening and for extending a gap are both set to 5.

[Score, Alignment] = nwalign(or, rhod, 'scoringmatrix','blosum30','gapopen',5,'extendgap',5)
Score =

   54.8000


Alignment =

MDVG-N-KS-TMSE--FVLLG--LS--NS-WELQMFFFMVFSLLYVATMVGNSLIVITVIVDPHLHSPMYFLLTNLSIIDMSLA--SFA-TPKMITDY-L-TGH-KTISFDGCLTQIFFLHLFTGTEIILLMAMSFDRYIAICKPLHYASVISPQVCVALVVA-SWIMGVMHSMSQVIFALTLPFCGPYEVDSFFCDLPVVFQLACVDTYVLGLFMISTSGIIALSCFIVLFNSYVIVLVTVKHHS-SRGSSKALSTCTAHFIVVFLFFG-PCIFIYM-W-PL-SSFLTDKILSV-FYTIFTPTLNPIIYTL-------RNQE-VKIAM-RKLKN----RF--LNFNKAM-PS--
|| | |  | :::|   : :|   |  |: :::::: ::: |:  |:  | |: |:: ::   :  :|  :::| |||:|:||    |: : : ::|| | :||  || :   |: :|:: : ||  : ||:|:|::| :::: |: |     |::  ||| |  |::: :  :: :: ::::  |    :|   :: :   : :|  : |: :| |:: :::::: :: |::| ::| | ::  : :  ||| : : ::  |::||:|: |  ::|: : :: |:| :   :|: | || : : ||:|| :       | :| :|::: |::|:    |    | | :: ::  
MD-GSNVTSFVVEEPTNISTGRNASVGNAHRQIPIVHWVIMSISPVG-FVENG-ILLWFLCFRMRRNPFTVYITHLSIADISLLFCIFILSIDYALDYELSSGHYYTI-VT--LSVTFLFGYNTG--LYLLTAISVERCLSVLYPIWY-RCHRPKYQSALVCALLWALSCL--VT-TM-EYVM--C----IDR--EE-ESHSRNDCR-A-VI-IF-IAILSFLVFTPLM-LVSSTILV-VKIRKNTWASHSSKLYIVIMVT-IIIFLIFAMPMRLLYLLYYEYWSTFGNLHHISLLFSTINS-SANPFIYFFVGSSKKKRFKESLKVVLTRAFKDEMQPRRQKDNCNTVTVETVV

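Now you must assess the significance of this alignment. One way is to shuffle the residues of the OR sequence with randperm, align the shuffled sequence to rhodopsin with the same settings, and compare the alignments and scores. In this run the scores are fairly close (54.8 for the real sequence, 43.8 for the shuffled one), so it is difficult to tell whether the alignment is significant.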

perm = randperm(length(or));
randor = or(perm);
[Score, Alignment] = nwalign(randor, rhod,'scoringmatrix','blosum30','gapopen',5,'extendgap',5)
Score =

   43.8000


Alignment =

LLLASVTYGMNSAGTE-CTFVLMPVAN-HIQVLVKSDH-IKMSIEIVLKPCIKSVGYMPTCSDV--SPIFTFFADYVIWIAVVEALV-NLISYVGMKLPFFFLSN-VF-INIFSLNHMQHYIADLLLHTVMIHFTVFQVFIAFFKTTMRPSNPVL-TIKICGFGMMFYFHFNFTIVFSAISVLMTSTQASSSDSPSKISLLGSFICVYLSTMNFLPTFLVFDCDYAYQSMGCLRAVSLIVSPCETALSFIVVIRM-FSLSG-CKKLTEV-IWQPFILWLHI-FL-DLI-T---PLA-FML-TDL---Y--S--V-LTFGLNPDGLGYRMLRATL-THSIMSKV
:  ::||  : :: |   |   : |:| | |: :   | : |||: |   ::  : :  :|  :  :| ||::: ::  || :: |: ::|  :  :| : : |: :: | ::|:: :  |:: |:| | :|  :| : : :::::  |   | : :: :|   ::::  ::  :| ::::::|   ::::| | ::   :  || : || : |:|: || :  :::: :  ::| |   |:: ::: ::::| : |::   :  |: :  | :|    || :| ::| :   |:: | : ::    :  |  | || :: :| :  |: :::: | :: : |
MDGSNVTSFVVEEPTNISTGRNASVGNAHRQIPI-V-HWVIMSISPVGFVENGILLWF-LCFRMRRNP-FTVYITHLS-IADISLLFCIFILSIDYALDYELSSGHYYTIVTLSVTFLFGYNTGLYLLT-AI--SVERCLSVLYPIWYRCHRPKYQSALVC---ALLW-ALS-CLV-TTMEYVMCIDREEESHSRNDCRAVIIFIAI-LSFLVFTPLMLVSSTILVVKIRKNTWA-SHS-SKLYIVIMVTIIIFLIFAMPMRLLYLLYYEYWSTFGNLHHISLLFSTINSSANPFIYFFVGSSKKKRFKESLKVVLTRAF-KDEMQPRRQKDNCNTVTVETVV
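
A single shuffle gives only one sample of the background score. A minimal sketch of a more robust check, assuming the Bioinformatics Toolbox is available, repeats the shuffle many times and reports how often a shuffled sequence scores at least as well as the real one. The variable names, the short stand-in sequences (prefixes of the two sequences aligned above), the number of shuffles, and the fixed random seed are illustrative choices, not part of the original demo.

```matlab
% Sketch: empirical significance of a global alignment score by shuffling.
rng(1);                                   % assumed seed, only for repeatability

% Stand-in inputs so the sketch runs on its own; in the demo use `or` and `rhod`.
seqA = 'MDVGNKSTMSEFVLLGLSNSWELQMFFFMVFSLLYVATMVGNSLIVITVIVDPHLHS';
seqB = 'MDGSNVTSFVVEEPTNISTGRNASVGNAHRQIPIVHWVIMSISPVGFVENGILLWF';

% Same alignment settings as in the demo.
opts = {'scoringmatrix','blosum30','gapopen',5,'extendgap',5};

obsScore   = nwalign(seqA, seqB, opts{:}); % score of the real pair
nShuffles  = 100;                          % illustrative number of shuffles
nullScores = zeros(1, nShuffles);          % preallocate background scores

for k = 1:nShuffles
    shuffled      = seqA(randperm(length(seqA)));  % same composition, random order
    nullScores(k) = nwalign(shuffled, seqB, opts{:});
end

% Distance of the observed score from the background, in standard deviations.
zScore = (obsScore - mean(nullScores)) / std(nullScores);

% Fraction of shuffles scoring at least as high (with a +1 correction).
pEmp = (1 + sum(nullScores >= obsScore)) / (1 + nShuffles);

fprintf('observed %.1f, shuffled mean %.1f, z = %.2f, empirical p = %.3f\n', ...
        obsScore, mean(nullScores), zScore, pEmp);
```

A z-score near zero or a large empirical p-value would support the impression that the pairwise score is not clearly above background.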

By contrast, if you align both the real and the shuffled sequence to the HMM, the significance of the alignment is much more apparent.

[score_or, align_or] = hmmprofalign(hmm7tm,or)
[score_rand, align_rand] = hmmprofalign(hmm7tm,randor)
score_or =

  156.9790


align_or =

GNSLIVITVIVDPHLHSPMYFLLTNLSIIDMSLASFATPKMITDYLTG-HKTISFDGCLTQIFFLHLFTGTEIILLMAMSFDRYIAICKPLHYASVIS-PQVCVALVVASWIMGVMHSMSQVIF-ALTLPFCGPYEVDSFFCDLPVVFQLACVDTYVLGLFMISTSGIIALSCFIVLFNSYVIVLVTVKHHSS--------RGSSKALSTCTAHFIVVFLFFGPC-IFIYMWPLSsFL--------------TDKILSVFYTIFTPTLNPIIY


score_rand =

 -136.5222


align_rand =

AVVEALVNLISYVGMK-LPFFFLSN---------VFINIFSLNHMQH-------YIADLLLHTVMIHFTVFQVF------------IAFFKTTMRPSN-PVLTIKICGFGmmfyfhFNFTIVFSAISVLMTSTQASSSDSPS--KISLLGSFICV----------YLSTMNFLPTFLVFDCDYAYQSMGCLRAVSLIVSPC----------ETALSFIVVIRMFSLSGCKKLTEVIWQPFIlWLHI-----FLDLITPLAFMLTDLYSVLTFGLNPdgLGY

Therefore, alignment to a pHMM has much more discriminating power than pairwise alignment, since the model captures the characteristics of all the sequences used to create it.
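
To make the comparison concrete, the short sketch below computes the gap between the real and the shuffled score for each method, using the four scores printed above. The nwalign and hmmprofalign scores are not necessarily on the same scale, so the sketch only compares the separation within each method; the variable names are illustrative.

```matlab
% Scores copied from the output above.
nwReal   = 54.8000;    nwShuffled  = 43.8000;     % pairwise alignment (nwalign)
hmmReal  = 156.9790;   hmmShuffled = -136.5222;   % profile HMM (hmmprofalign)

% Separation between the real and the shuffled sequence for each method.
pairwiseGap = nwReal  - nwShuffled;
hmmGap      = hmmReal - hmmShuffled;

fprintf('pairwise gap: %.4f\n', pairwiseGap);
fprintf('pHMM gap:     %.4f\n', hmmGap);

% The shuffled sequence still scores positively in the pairwise case,
% whereas its pHMM score changes sign.
fprintf('shuffled score positive? pairwise: %d, pHMM: %d\n', ...
        nwShuffled > 0, hmmShuffled > 0);
```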

Phylogenetic Tree

In the last part of this demo, you will create a phylogenetic tree from members of this protein family. The olfactory receptors are actually part of a much larger protein family known as the G-Protein-Coupled Receptors (GPCRs). All of these proteins have 7 transmembrane regions, but they detect molecules other than odorants. There are 5 main groups of GPCRs: Adhesion, Secretin, Glutamate, Frizzled/TAS2, and Rhodopsin (Fredriksson et al., 2003). You will use a few of these groups to create the tree. First, the sequences can be retrieved from the GenPept database using the getgenpept function.

data = {'Adhesion 1' 'NP_001775';
        'Adhesion 2' 'NP_001965';
        'Glutamate 1' 'NP_000830';
        'Glutamate 2' 'NP_000836';
        'Rhod-Alpha 1' 'NP_001051';
        'Rhod-Alpha 2' 'NP_000946';
        'Rhod-Delta 1' 'NP_002368';
        'Rhod-Delta 2' 'NP_473372'};

for prot = 1:8
    seqs(prot).Header   = data{prot,1};
    seqs(prot).Sequence = getgenpept(data{prot,2},'sequenceonly',true);
end

You can compute the pairwise distances between the sequences with the Jukes-Cantor correction and then build the tree with the UPGMA method.

distances = seqpdist(seqs,'Method','Jukes-Cantor');
tree = seqlinkage(distances,'UPGMA',seqs)
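
For reference, the Jukes-Cantor correction converts the observed fraction p of differing sites into an estimated number of substitutions per site. The worked sketch below assumes the standard 20-state (amino acid) form, d = -(19/20) ln(1 - (20/19) p), which is the form seqpdist is understood to apply to protein sequences; the value of p and the variable names are hypothetical.

```matlab
% Worked Jukes-Cantor example for amino acid sequences (20 states assumed).
p = 0.40;                                 % hypothetical fraction of differing sites

% Corrected distance: accounts for multiple substitutions at the same site.
dJC = -(19/20) * log(1 - (20/19) * p);

fprintf('p-distance = %.2f, Jukes-Cantor distance = %.4f\n', p, dJC);

% The correction is only defined while p < 19/20; beyond that the
% logarithm's argument is no longer positive.
isDefined = p < 19/20;
```

The corrected distance is always at least as large as p, and the difference grows as the sequences diverge.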

Now plot the tree.

h = plot(tree,'orient','bottom');
ylabel('Evolutionary distance')

Now add two of the olfactory receptor sequences and rebuild the tree.

data2 = {'Olfactory 1';'Olfactory 2'};
for prot = 1:2
    seqs(prot+8).Header   = data2{prot,1};
    seqs(prot+8).Sequence = orseqs(prot).Sequence;
end

distances = seqpdist(seqs,'Method','Jukes-Cantor');
tree = seqlinkage(distances,'UPGMA',seqs)
h = plot(tree,'orient','bottom');
ylabel('Evolutionary distance')

You can see that the members of each GPCR group cluster together and that the olfactory receptors fall within the Rhodopsin group. This agrees with the earlier result of matching the ORs to the correct profile HMM. However, UPGMA groups the ORs with the Alpha-Rhodopsins, while the maximum parsimony method used by Fredriksson et al. grouped them with the Delta-Rhodopsins.

References

Axel, R. 1995. The Molecular Logic of Smell. Scientific American 273(4):154-159.

Buck, L. and R. Axel. 1991. A Novel Multigene Family May Encode Odorant Receptors: A Molecular Basis for Odor Recognition. Cell 65:175-187.

Eddy, S. 1998. Profile Hidden Markov Models. Bioinformatics 14(9):755-763.

Fredriksson, R., M. Lagerstrom, L. Lundin and H. Schioth. 2003. The G-Protein-Coupled Receptors in the Human Genome Form Five Main Families. Phylogenetic Analysis, Paralogon Groups, and Fingerprints. Molecular Pharmacology 63:1256-1272.

Mombaerts, P. 2004. Genes and Ligands for Odorant, Vomeronasal, and Taste Receptors. Nature Reviews Neuroscience 5:263-278.

Zozulya, S., F. Echeverri and T. Nguyen. 2001. The human olfactory receptor repertoire. Genome Biology 2(6):research0018.1-0018.12.