From - Thu Dec 23 10:42:56 1999
Received: from dodo.cpmc.columbia.edu (dodo.cpmc.columbia.edu [156.111.190.78])
	by buda.ipbs.fr (8.9.3/8.9.3) with ESMTP id KAA10319
	for <gouet@ipbs.fr>; Thu, 23 Dec 1999 10:41:25 +0100
Received: (from phd@localhost) by dodo.cpmc.columbia.edu (980427.SGI.8.8.8/980728.SGI.AUTOCF) id EAA20843 for gouet@ipbs.fr; Thu, 23 Dec 1999 04:37:21 -0500 (EST)
Date: Thu, 23 Dec 1999 04:37:21 -0500 (EST)
From: phd@dodo.cpmc.columbia.edu (PredictProtein)
Message-Id: <199912230937.EAA20843@dodo.cpmc.columbia.edu>
To: gouet@ipbs.fr
Subject: PredictProtein
Status:   
X-Mozilla-Status: 0001
Content-Length: 102467




The following information has been received by the server:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

reference predict_h11817647 (Dec 23, 1999 04:29:56)
PPhdr from: gouet@ipbs.fr
PPhdr resp: MAIL
PPhdr orig: HTML
PPhdr want: ASCII
PPhdr password(###)
prediction of: - default prediction of: - PHDsec PHDacc PHDhtm ProSite SEG ProDom
return msf format
# default: single protein sequence description=VP2_ROTBU
MAYRKRGATVEADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDE
VKKSTKEESKQLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQT
KLFRIFEPRQLPIYRANGEKELRNRWYWKLKKDTLPDGDYDVREYFLNLYDQVLTEMPDY
LLLKDMAVENKNSRDAGKVVDSETASICDAIFQDEETEGAVRRFIAEMRQRVQADRNVVN
YPSILHPIDYAFNEYFLQHQLVEPLNNDIIFNYIPERIRNDVNYILNMDRNLPSTARYIR
PNLLQDRLNLHDNFESLWDTITTSNYILARSVVPDLKELVSTEAQIQKMSQDLQLEALTI
QSETQFLTGINSQAANDCFKTLIAAMLSQRTMSLDFVTTNYMSLISGMWLLTVVPNDMFI
RESLVACQLAIVNTIIYPAFGMQRMHYRNGDPQTPFQIAEQQIRKFSGSGIGWHFVNNNQ
FRQVVIDGVLNQVLNDNIRNVHVIKQLMQALMQLSRQQFPTMPVDYKRSIQRGILLLSNR
LGQLVDLTRLLAYNYETLMACVTMNMQHVQTLTTEKLQLTSVTSLCMLIGNATVIPSPQT
LFHYYNVNVNFHSNYNERINDAVAIITAANRLNLYQKKMKAIVEDFLKRLHIFDVARVPD
DQMYRLRDRLRLLPVEVRRLDIFNLILMNMDQIERASDKIAQGVIIAYRDMQLERDEMYG
YVNIARNLDGFQQINLEELMRTGDYAQITNMLLNNQPVALVGALPFVTDSSVISLIAKLD
ATVFAQIVKLRKVDTLKPILYKINSDSNDFYLVANYDWVPTSTTKVYKQVPQQFDFRNSM
HMLTSNLTFTVYSDLLAFVSADTVEPINAVAFDNMRIMNEL

________________________________________________________________________________





Result of PROSITE search (Amos Bairoch): 	
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

please quote: A Bairoch, P Bucher & K Hofmann: The PROSITE database,
its status in 1997. Nucl. Acids Res., 1997, 25, 217-221.

________________________________________________________________________________


--------------------------------------------------------
Pattern-ID: ASN_GLYCOSYLATION PS00001 PDOC00001
Pattern-DE: N-glycosylation site
Pattern:    N[^P][ST][^P]
   591      NATV
   846      NLTF

Pattern-ID: GLYCOSAMINOGLYCAN PS00002 PDOC00002
Pattern-DE: Glycosaminoglycan attachment site
Pattern:    SG.G
   467      SGSG

Pattern-ID: CAMP_PHOSPHO_SITE PS00004 PDOC00004
Pattern-DE: cAMP- and cGMP-dependent protein kinase phosphorylation site
Pattern:    [RK]{2}.[ST]
   62       KKST
   117      KKQT
   151      KKDT
   464      RKFS

Pattern-ID: PKC_PHOSPHO_SITE PS00005 PDOC00005
Pattern-DE: Protein kinase C phosphorylation site
Pattern:    [ST].[RK]
   37       SDK
   42       SKK
   64       STK
   295      TAR
   388      SQR
   538      SNR
   574      TEK
   697      SDK
   795      TLK
   823      TTK

Pattern-ID: CK2_PHOSPHO_SITE PS00006 PDOC00006
Pattern-DE: Casein kinase II phosphorylation site
Pattern:    [ST].{2}[DE]
   42       SKKE
   51       SQEE
   64       STKE
   78       TKEE
   154      TLPD
   206      SICD
   316      SLWD

Pattern-ID: TYR_PHOSPHO_SITE PS00007 PDOC00007
Pattern-DE: Tyrosine kinase phosphorylation site
Pattern:    [RK].{2,3}[DE].{2,3}Y
   277      RIRNDVNY
   657      RVPDDQMY

Pattern-ID: MYRISTYL PS00008 PDOC00008
Pattern-DE: N-myristoylation site
Pattern:    G[^EDRKHPFYW].{2}[STAGCN][^P]
   468      GSGIGW
   703      GVIIAY

Pattern-ID: LEUCINE_ZIPPER PS00029 PDOC00029
Pattern-DE: Leucine zipper pattern
Pattern:    L.{6}L.{6}L.{6}L
   537      LSNRLGQLVDLTRLLAYNYETL
   666      LRDRLRLLPVEVRRLDIFNLIL
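
The patterns listed above are already written in regular-expression form, so individual hits can be re-checked with a short scan. What follows is only a minimal sketch and not the server's own code: the function name `scan` and its `offset` argument are invented for illustration, and a lookahead is used so that overlapping hits are reported as well.

```python
import re

def scan(sequence, pattern, offset=0):
    """Return (1-based position, matched text) for each hit of pattern.

    `offset` (a hypothetical helper argument) is added to every position
    so that a fragment can be reported in full-length numbering.
    """
    # Wrap the pattern in a lookahead so overlapping hits are not skipped.
    regex = re.compile("(?=(" + pattern + "))")
    return [(m.start() + 1 + offset, m.group(1))
            for m in regex.finditer(sequence)]

# Residues 581-600 of the submitted sequence, copied from the echo above.
fragment = "SVTSLCMLIGNATVIPSPQT"
print(scan(fragment, "N[^P][ST][^P]", offset=580))
```

Run on this fragment, the N-glycosylation pattern gives the hit at 591 (NATV) shown in the list. Such short patterns match by chance very often, so a hit is a candidate site and not evidence of modification.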



________________________________________________________________________________





Result of SEG low-complexity search (JC Wootton & S Federhen):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

please quote: J C Wootton & S Federhen:  Analysis of compositionally
biased regions in sequence databases.  Meth. in Enzymol.
1996, 266, 554-571.					

NOTE 1:       regions of low-complexity ('simple sequence' or
              'composition biased regions') are marked by the letter
              'x' in the following output.
NOTE 2:       The dynamic programming algorithm (MaxHom) does NOT take
              the SEG information into account, nor do the PHD
              predictions!
		    	     		
!!! --> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! <--  !!!
!!! --> WE STRONGLY suggest that you resubmit the regions NOT marked by <--  !!!
!!! -->             'x' separately!!	                                <--  !!!
!!! --> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! <--  !!!
		    	
________________________________________________________________________________


>prot ,  (#) ppOld, default: single protein sequence description=vp2_rotbu /home/phd/server/work/predict_h11817647
MAYRKRGATVEADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDE
VKKSTKEESKQLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQT
KLFRIFEPRQLPIYRANGEKELRNRWYWKLKKDTLPDGDYDVREYFLNLYDQVLTEMPDY
LLLKDMAVENKNSRDAGKVVDSETASICDAIFQDEETEGAVRRFIAEMRQRVQADRNVVN
YPSILHPIDYAFNEYFLQHQLVEPLNNDIIFNYIPERIRNDVNYILNMDRNLPSTARYIR
PNLLQDRLNLHDNFESLWDTITTSNYILARSVVPDLKELVSTEAQIQKMSQDLQLEALTI
QSETQFLTGINSQAANDCFKTLIAAMLSQRTMSLDFVTTNYMSLISGMWLLTVVPNDMFI
RESLVACQLAIVNTIIYPAFGMQRMHYRNGDPQTPFQIAEQQIRKFSGSGIGWHFVNNNQ
FRQVVIDGVLNQVLNDNIRNVHVIKQLMQALMQLSRQQFPTMPVDYKRSIQRGILLLSNR
LGQLVDLTRLLAYNYETLMACVTMNMQHVQTLTTEKLQLTSVTSLCMLIGNATVIPSPQT
LFHYYNVNVNFHSNYNERINDAVAIITAANRLNLYQKKMKAIVEDFLKRLHIFDVARVPD
DQMYxxxxxxxxxxxxxxxxxIFNLILMNMDQIERASDKIAQGVIIAYRDMQLERDEMYG
YVNIARNLDGFQQINLEELMRTGDYAQITNMLLNNQPVALVGALPFVTDSSVISLIAKLD
ATVFAQIVKLRKVDTLKPILYKINSDSNDFYLVANYDWVPTSTTKVYKQVPQQFDFRNSM
HMLTSNLTFTVYSDLLAFVSADTVEPINAVAFDNMRIMNEL
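
To act on the suggestion to resubmit the regions not marked by 'x', the masked sequence can be split at the runs of 'x'. The sketch below is an illustration only: both function names are hypothetical, the sequence is assumed to be joined into one string with line breaks removed, and lowercase 'x' is assumed to occur only as the SEG mask character.

```python
import re

def masked_regions(masked, offset=0):
    """Return (start, end) for each run of lowercase 'x' (1-based, inclusive)."""
    return [(m.start() + 1 + offset, m.end() + offset)
            for m in re.finditer(r"x+", masked)]

def unmasked_segments(masked, offset=0):
    """Return (start, end, text) for each stretch not masked by 'x'."""
    return [(m.start() + 1 + offset, m.end() + offset, m.group())
            for m in re.finditer(r"[^x]+", masked)]

# Small made-up example; replace it with the full masked sequence.
example = "DQMYxxxxIFN"
print(masked_regions(example))
print(unmasked_segments(example))
```

Each tuple returned by `unmasked_segments` carries the coordinates and the text of one stretch that could be submitted on its own.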


________________________________________________________________________________





Result of ProDom domain search (Sonnhammer; Corpet, Gouzy, Kahn):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- please quote: ELL Sonnhammer & D Kahn, Prot. Sci., 1994, 3, 482-492

________________________________________________________________________________


--- ------------------------------------------------------------
--- Results from running BLAST against PRODOM domains
---
--- PLEASE quote:
---       F Corpet, J Gouzy, D Kahn (1998).  The ProDom database
---       of protein domain families. Nucleic Ac Res 26:323-326.
---
--- BEGIN of BLASTP output
BLASTP 1.4.7 [16-Oct-94] [Build 12:52:03 Oct 30 1994]

Reference:  Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
and David J. Lipman (1990).  Basic local alignment search tool.  J. Mol. Biol.
215:403-10.

Query=  prot ,  (#) ppOld, default: single protein sequence
     description=vp2_rotbu /home/phd/server/work/predict_h11817647
        (881 letters)

Database:  prodom_99_2
           157,167 sequences; 18,560,502 total letters.
Searching..................................................done

                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

 PD008866 p99.2 (8) VP2(5) O55591(1) Q86218(1)  // VP2 PR...  1972  0.0       3
 PD000002 p99.2 (2100) ATPF(44) MYSP(31) MYSA(29)  // PRO...   108  3.1e-06   1
 PD000422 p99.2 (108) TOP1(17) NFH(4) Q20007(3)  // PROTE...    91  0.0010    1
 PD140913 p99.2 (1) YMT4_YEAST // HYPOTHETICAL 55.4 KD PR...    56  0.0011    5
 PD001891 p99.2 (39) GRPE(22)  // HEAT SHOCK PROTEIN CHAP...    81  0.017     1
 PD000023 p99.2 (401) TPM1(11) Q25893(7) TPM2(6)  // PROT...    81  0.018     1
 PD074905 p99.2 (1) O42263_XENLA // KINESINRELATED PROTEI...    75  0.019     2
 PD031168 p99.2 (3) O06969(1) P54(1) US45(1)  // PROTEIN ...    64  0.027     2
 PD107575 p99.2 (1) YN8V_YEAST // HYPOTHETICAL 36.4 KD PR...    74  0.051     1



>PD008866 p99.2 (8) VP2(5) O55591(1) Q86218(1)  // VP2 PROTEIN RNABINDING MAJOR
  INTERNAL CORE NUCLEOCAPSID RNA COMPLETE CDS
  Length = 891

 Score = 1972 (903.1 bits), Expect = 0.0, Sum P(3) = 0.0
 Identities = 374/423 (88%), Positives = 399/423 (94%)

Query:    44 KEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLKTKEEHQKEIQYEILQKTIPTFEPKES 103
             K++  T  +EEIKI D+VK STKEESKQLLEVLKTKEEHQKE+QYEILQKTIPTFEP ES
Sbjct:    56 KDKKDTPPEEEIKINDQVKBSTKEESKQLLEVLKTKEEHQKEVQYEILQKTIPTFEPBES 115

Query:   104 ILKKLEDIKPEQAKKQTKLFRIFEPRQLPIYRANGEKELRNRWYWKLKKDTLPDGDYDVR 163
             ILKKLEDIKPEQ KKQ KLFR+FEP+QLPIYRANGEKELRNRWYWKLKKD LPDGDYDVR
Sbjct:   116 ILKKLEDIKPEQCKKQNKLFRLFEPKQLPIYRANGEKELRNRWYWKLKKDDLPDGDYDVR 175

Query:   164 EYFLNLYDQVLTEMPDYLLLKDMAVENKNSRDAGKVVDSETASICDAIFQDEETEGAVRR 223
             EYFLNLYDQVL EMPDYLLLKDMAVENKNSRDAGKVVDSETA ICD IFQDEETEG VRR
Sbjct:   176 EYFLNLYDQVLDEMPDYLLLKDMAVENKNSRDAGKVVDSETAEICDEIFQDEETEGYVRR 235

Query:   224 FIAEMRQRVQADRNVVNYPSILHPIDYAFNEYFLQHQLVEPLNNDIIFNYIPERIRNDVN 283
             FIA+MRQRVQADRN VNYP+ILHPID+ FNEYFL+HQL+EPL N+IIFNYIPER+RND N
Sbjct:   236 FIADMRQRVQADRNTVNYPAILHPIDHEFNEYFLEHQLIEPLTNEIIFNYIPERLRNDPN 295

Query:   284 YILNMDRNLPSTARYIRPNLLQDRLNLHDNFESLWDTITTSNYILARSVVPDLKELVSTE 343
             YILNMD NLP+TARYIRP LLQDRLNLHDNFES+WDTIT +NY+LARSVVPDLKELVSTE
Sbjct:   296 YILNMDANLPTTARYIRPYLLQDRLNLHDNFESIWDTITHANYVLARSVVPDLKELVSTE 355

Query:   344 AQIQKMSQDLQLEALTIQSETQFLTGINSQAANDCFKTLIAAMLSQRTMSLDFVTTNYMS 403
             AQIQKMSQDLQLEALTIQSETQFLTGINSQAANDCFKT+IA MLSQRTMSLDFVTTNYMS
Sbjct:   356 AQIQKMSQDLQLEALTIQSETQFLTGINSQAANDCFKTIIATMLSQRTMSLDFVTTNYMS 415

Query:   404 LISGMWLLTVVPNDMFIRESLVACQLAIVNTIIYPAFGMQRMHYRNGDPQTPFQIAEQQI 463
             LIS MWL+T+VPNDMFIRESLVACQLA++NTIIYPAFGMQRMHYRNGDP+TPFQIAEQQI
Sbjct:   416 LISCMWLMTIVPNDMFIRESLVACQLAVINTIIYPAFGMQRMHYRNGDPRTPFQIAEQQI 475

Query:   464 RKF 466
             + F
Sbjct:   476 QNF 478

 Score = 1904 (872.0 bits), Expect = 0.0, Sum P(3) = 0.0
 Identities = 369/407 (90%), Positives = 388/407 (95%)

Query:   474 HFVNNNQFRQVVIDGVLNQVLNDNIRNVHVIKQLMQALMQLSRQQFPTMPVDYKRSIQRG 533
             HF NNNQFRQVVIDGVLNQ LNDNIRN H++ QLM+ALMQLSRQQFPT P+DYKRS+QRG
Sbjct:   485 HFCNNNQFRQVVIDGVLNQTLNDNIRNGHIVNQLMEALMQLSRQQFPTYPIDYKRSVQRG 544

Query:   534 ILLLSNRLGQLVDLTRLLAYNYETLMACVTMNMQHVQTLTTEKLQLTSVTSLCMLIGNAT 593
             ILLLSNRLGQLVDLTRL+ YNYETLMAC+TMNMQHVQTLTTEKLQLTSVTSLCMLIGN T
Sbjct:   545 ILLLSNRLGQLVDLTRLICYNYETLMACITMNMQHVQTLTTEKLQLTSVTSLCMLIGNTT 604

Query:   594 VIPSPQTLFHYYNVNVNFHSNYNERINDAVAIITAANRLNLYQKKMKAIVEDFLKRLHIF 653
             VIP PQTLFHYYN NVNFHSNYNERINDAVAIITAANRL+LYQKKMK+IVEDFLKRLHIF
Sbjct:   605 VIPEPQTLFHYYNTNVNFHSNYNERINDAVAIITAANRLDLYQKKMKSIVEDFLKRLHIF 664

Query:   654 DVARVPDDQMYRLRDRLRLLPVEVRRLDIFNLILMNMDQIERASDKIAQGVIIAYRDMQL 713
             DV +VPDDQMYRLRDRLR LPVE RRLD+FN+I+ NMDQIERASDKI QGVIIAYRDMQL
Sbjct:   665 DVPKVPDDQMYRLRDRLRNLPVERRRLDVFNIIMNNMDQIERASDKICQGVIIAYRDMQL 724

Query:   714 ERDEMYGYVNIARNLDGFQQINLEELMRTGDYAQITNMLLNNQPVALVGALPFVTDSSVI 773
             E DEMYGYVNIAR+L+GFQQINLEELMRTGDY+QITNMLLNNQPVALVGA+PFVTDSSVI
Sbjct:   725 EYDEMYGYVNIARDLNGFQQINLEELMRTGDYSQITNMLLNNQPVALVGAIPFVTDSSVI 784

Query:   774 SLIAKLDATVFAQIVKLRKVDTLKPILYKINSDSNDFYLVANYDWVPTSTTKVYKQVPQQ 833
             SLIAKLDATVFAQIVK RKVDTLKPILYKINSDSNDFYLV NYDWVPTSTTKVYKQVPQQ
Sbjct:   785 SLIAKLDATVFAQIVKDRKVDTLKPILYKINSDSNDFYLVHNYDWVPTSTTKVYKQVPQQ 844

Query:   834 FDFRNSMHMLTSNLTFTVYSDLLAFVSADTVEPINAVAFDNMRIMNE 880
             FDFR+SMHMLTSNLTFTVY+DLL FVSADTVEPINAVAFDNMRIM E
Sbjct:   845 FDFRHSMHMLTSNLTFTVYNDLLKFVSADTVEPINAVAFDNMRIMQE 891

 Score = 92 (42.1 bits), Expect = 0.0, Sum P(3) = 0.0
 Identities = 18/32 (56%), Positives = 23/32 (71%)

Query:     1 MAYRKRGATVEADINNNDRMQEKDDEKQDQNN 32
             MAYRKRGA  E     N+R QEK+DEK+ +N+
Sbjct:     1 MAYRKRGANTEQKDAENERQQEKEDEKEIKND 32

 Score = 77 (35.3 bits), Expect = 0.0, Sum P(3) = 0.0
 Identities = 16/42 (38%), Positives = 25/42 (59%)

Query:    15 NNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIK 56
             N   + +E + E ++   + QL +KVLSKKE V+TD   + K
Sbjct:    17 NERQQEKEDEKEIKNDAKKQQLDEKVLSKKENVITDKDVKDK 58

 Score = 45 (20.6 bits), Expect = 0.00020, Sum P(2) = 0.00020
 Identities = 11/46 (23%), Positives = 21/46 (45%)

Query:   214 DEETEGAVRRFIAEMRQRVQADRNVVNYPSILHPIDYAFNEYFLQH 259
             D      + +  A +  ++  DR V     IL+ I+   N+++L H
Sbjct:   780 DSSVISLIAKLDATVFAQIVKDRKVDTLKPILYKINSDSNDFYLVH 825

 Score = 43 (19.7 bits), Expect = 0.0, Sum P(3) = 0.0
 Identities = 11/49 (22%), Positives = 23/49 (46%)

Query:    23 KDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQ 71
             KD +       ++++D+V    +E      E +K ++E +K  + E  Q
Sbjct:    56 KDKKDTPPEEEIKINDQVKBSTKEESKQLLEVLKTKEEHQKEVQYEILQ 104

 Score = 41 (18.8 bits), Expect = 2.7e-285, Sum P(3) = 2.7e-285
 Identities = 8/27 (29%), Positives = 14/27 (51%)

Query:   820 PTSTTKVYKQVPQQFDFRNSMHMLTSN 846
             P +  ++ +Q  Q F  RN +H   +N
Sbjct:   464 PRTPFQIAEQQIQNFQVRNWLHFCNNN 490

 Score = 38 (17.4 bits), Expect = 0.0, Sum P(3) = 0.0
 Identities = 9/39 (23%), Positives = 20/39 (51%)

Query:    17 NDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEI 55
             ND  +++ DEK        ++DK +  K++   + + +I
Sbjct:    31 NDAKKQQLDEKVLSKKENVITDKDVKDKKDTPPEEEIKI 69

 Score = 37 (16.9 bits), Expect = 0.0, Sum P(3) = 0.0
 Identities = 12/49 (24%), Positives = 22/49 (44%)

Query:     5 KRGATVEADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQE 53
             ++ A  E      D  + K+D K+ Q +   LS K     ++ V D ++
Sbjct:    12 QKDAENERQQEKEDEKEIKNDAKKQQLDEKVLSKKENVITDKDVKDKKD 60

 Score = 34 (15.6 bits), Expect = 0.0, Sum P(3) = 0.0
 Identities = 7/26 (26%), Positives = 14/26 (53%)

Query:     5 KRGATVEADINNNDRMQEKDDEKQDQ 30
             K+    E +I  ND++++   E+  Q
Sbjct:    58 KKDTPPEEEIKINDQVKBSTKEESKQ 83


>PD000002 p99.2 (2100) ATPF(44) MYSP(31) MYSA(29)  // PROTEIN COILED COIL CHAIN
  MYOSIN REPEAT HEAVY ATPBINDING FILAMENT HEPTAD
  Length = 239

 Score = 108 (49.5 bits), Expect = 3.1e-06, P = 3.1e-06
 Identities = 24/102 (23%), Positives = 59/102 (57%)

Query:    18 DRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLK 77
             + M+E ++EK+++  ++Q  +K   + EE + + QEE++ ++E  K  +EE ++ +E  +
Sbjct:    15 EEMEELEEEKEEEEEKLQELEKKKKELEEEMEELQEEMEEQEEKMKEEQEEKEEEMEEKE 74

Query:    78 TKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQ 119
              + E Q+E   E  ++     E +E   +++E+   E+ +++
Sbjct:    75 EEMEEQEEEMEEQKEEQEEKQEEQEEEQEEMEEEMEEEQEEE 116

 Score = 105 (48.1 bits), Expect = 8.2e-06, P = 8.2e-06
 Identities = 29/131 (22%), Positives = 62/131 (47%)

Query:    11 EADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESK 70
             E +    ++ +E +++K++Q  + +  ++   + EE + + QEE + + E +K   EE +
Sbjct:    72 EKEEEMEEQEEEMEEQKEEQEEKQEEQEEEQEEMEEEMEEEQEEEEEKQEEEKEEMEEEQ 131

Query:    71 QLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQTKLFRIFEPRQ 130
             + +E    ++E QKE   E  +K     E  E  +++ E+   E  ++  K     E ++
Sbjct:   132 KEMEEQMEEQEEQKEEMEEEQKKLEQELEELEEEMEEQEEEMEEMEEEMEKQQEEMEEQE 191

Query:   131 LPIYRANGEKE 141
                     EKE
Sbjct:   192 EEKEEQQEEKE 202

 Score = 103 (47.2 bits), Expect = 1.6e-05, P = 1.6e-05
 Identities = 21/68 (30%), Positives = 39/68 (57%)

Query:    18 DRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLK 77
             + M+E+++E ++    M+   + + ++EE   + QEE +   E K+  KEE +Q  E  K
Sbjct:   164 EEMEEQEEEMEEMEEEMEKQQEEMEEQEEEKEEQQEEKEEEQEEKEEQKEEQEQEQEEQK 223

Query:    78 TKEEHQKE 85
              +EE Q+E
Sbjct:   224 QQEEEQEE 231

 Score = 101 (46.3 bits), Expect = 3.0e-05, P = 3.0e-05
 Identities = 27/131 (20%), Positives = 66/131 (50%)

Query:    11 EADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESK 70
             E +    ++ +E ++++++   + +  ++   ++EE   + +EE++   E ++  +EE K
Sbjct:    65 EKEEEMEEKEEEMEEQEEEMEEQKEEQEEKQEEQEEEQEEMEEEMEEEQEEEEEKQEEEK 124

Query:    71 QLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQTKLFRIFEPRQ 130
             + +E  + + E Q E Q E  ++     +  E  L++LE+   EQ ++  ++    E +Q
Sbjct:   125 EEMEEEQKEMEEQMEEQEEQKEEMEEEQKKLEQELEELEEEMEEQEEEMEEMEEEMEKQQ 184

Query:   131 LPIYRANGEKE 141
               +     EKE
Sbjct:   185 EEMEEQEEEKE 195

 Score = 98 (44.9 bits), Expect = 7.9e-05, P = 7.9e-05
 Identities = 24/93 (25%), Positives = 52/93 (55%)

Query:    27 KQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLKTKEEHQKEI 86
             K+    +++  ++ + + EE   + +E+++  ++ KK  +EE ++L E ++ +EE  KE
Sbjct:     3 KEQMQEQLKELEEEMEELEEEKEEEEEKLQELEKKKKELEEEMEELQEEMEEQEEKMKEE 62

Query:    87 QYEILQKTIPTFEPKESILKKLEDIKPEQAKKQ 119
             Q E  ++     E  E   +++E+ K EQ +KQ
Sbjct:    63 QEEKEEEMEEKEEEMEEQEEEMEEQKEEQEEKQ 95

 Score = 97 (44.4 bits), Expect = 0.00011, P = 0.00011
 Identities = 21/99 (21%), Positives = 58/99 (58%)

Query:    21 QEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLKTKE 80
             +E+++E++ Q    +  ++   + EE + + +E+ +  +E +K  ++E ++L E ++ +E
Sbjct:   111 EEQEEEEEKQEEEKEEMEEEQKEMEEQMEEQEEQKEEMEEEQKKLEQELEELEEEMEEQE 170

Query:    81 EHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQ 119
             E  +E++ E+ ++     E +E   ++ E+ + EQ +K+
Sbjct:   171 EEMEEMEEEMEKQQEEMEEQEEEKEEQQEEKEEEQEEKE 209

 Score = 96 (44.0 bits), Expect = 0.00015, P = 0.00015
 Identities = 22/104 (21%), Positives = 58/104 (55%)

Query:    18 DRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLK 77
             +++QE + +K++    M+   + + ++EE + + QEE +   E K+   EE ++ +E  K
Sbjct:    29 EKLQELEKKKKELEEEMEELQEEMEEQEEKMKEEQEEKEEEMEEKEEEMEEQEEEMEEQK 88

Query:    78 TKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQTK 121
              ++E ++E Q E  ++     E ++   ++ ++ + E+ +++ K
Sbjct:    89 EEQEEKQEEQEEEQEEMEEEMEEEQEEEEEKQEEEKEEMEEEQK 132

 Score = 93 (42.6 bits), Expect = 0.00040, P = 0.00040
 Identities = 18/74 (24%), Positives = 45/74 (60%)

Query:    20 MQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLKTK 79
             ++E ++E ++Q   M+  ++ + K++E + + +EE + + E K+  +EE ++  E  + +
Sbjct:   159 LEELEEEMEEQEEEMEEMEEEMEKQQEEMEEQEEEKEEQQEEKEEEQEEKEEQKEEQEQE 218

Query:    80 EEHQKEIQYEILQK 93
             +E QK+ + E  +K
Sbjct:   219 QEEQKQQEEEQEEK 232

 Score = 91 (41.7 bits), Expect = 0.00077, P = 0.00077
 Identities = 22/119 (18%), Positives = 64/119 (53%)

Query:     4 RKRGATVEADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKK 63
             +K+    E +    +  ++++  K++Q  + +  ++   + EE   + +E+ + ++E ++
Sbjct:    37 KKKELEEEMEELQEEMEEQEEKMKEEQEEKEEEMEEKEEEMEEQEEEMEEQKEEQEEKQE 96

Query:    64 STKEESKQLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQTKL 122
               +EE +++ E ++ ++E ++E Q E  ++     +  E  +++ E+ K E  ++Q KL
Sbjct:    97 EQEEEQEEMEEEMEEEQEEEEEKQEEEKEEMEEEQKEMEEQMEEQEEQKEEMEEEQKKL 155

 Score = 88 (40.3 bits), Expect = 0.0020, P = 0.0020
 Identities = 23/109 (21%), Positives = 55/109 (50%)

Query:    11 EADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESK 70
             E      + M+EK++E ++Q   M+   +   +K+E   + QEE++   E ++  +EE +
Sbjct:    61 EEQEEKEEEMEEKEEEMEEQEEEMEEQKEEQEEKQEEQEEEQEEMEEEMEEEQEEEEEKQ 120

Query:    71 QLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQ 119
             +  +    +E+ + E Q E  ++     E ++  L++  +   E+ ++Q
Sbjct:   121 EEEKEEMEEEQKEMEEQMEEQEEQKEEMEEEQKKLEQELEELEEEMEEQ 169

 Score = 87 (39.8 bits), Expect = 0.0028, P = 0.0028
 Identities = 25/121 (20%), Positives = 62/121 (51%)

Query:    21 QEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLKTKE 80
             +EK++E++      +   ++  + EE+  + +E+ +   E ++  +EE ++  E ++ +E
Sbjct:    22 EEKEEEEEKLQELEKKKKELEEEMEELQEEMEEQEEKMKEEQEEKEEEMEEKEEEMEEQE 81

Query:    81 EHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQTKLFRIFEPRQLPIYRANGEK 140
             E  +E + E  +K     E +E + +++E+ + E+ +KQ +     E  Q  +     E+
Sbjct:    82 EEMEEQKEEQEEKQEEQEEEQEEMEEEMEEEQEEEEEKQEEEKEEMEEEQKEMEEQMEEQ 141

Query:   141 E 141
             E
Sbjct:   142 E 142

 Score = 86 (39.4 bits), Expect = 0.0039, P = 0.0039
 Identities = 21/119 (17%), Positives = 63/119 (52%)

Query:     1 MAYRKRGATVEADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDE 60
             M  +K     + +    ++ + +++ +++Q    +  ++   + EE   + +E+++ ++E
Sbjct:    84 MEEQKEEQEEKQEEQEEEQEEMEEEMEEEQEEEEEKQEEEKEEMEEEQKEMEEQMEEQEE 143

Query:    61 VKKSTKEESKQLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQ 119
              K+  +EE K+L + L+  EE  +E + E+ +      + +E + ++ E+ + +Q +K+
Sbjct:   144 QKEEMEEEQKKLEQELEELEEEMEEQEEEMEEMEEEMEKQQEEMEEQEEEKEEQQEEKE 202

 Score = 84 (38.5 bits), Expect = 0.0074, P = 0.0074
 Identities = 20/105 (19%), Positives = 59/105 (56%)

Query:    23 KDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLKTKEEH 82
             K+++++ +    +  +++  ++EE+    +E+ + ++E ++  +E  +++ E  + +EE
Sbjct:    60 KEEQEEKEEEMEEKEEEMEEQEEEMEEQKEEQEEKQEEQEEEQEEMEEEMEEEQEEEEEK 119

Query:    83 QKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQTKLFRIFE 127
             Q+E + E+ ++     E  E   ++ E+++ EQ K + +L  + E
Sbjct:   120 QEEEKEEMEEEQKEMEEQMEEQEEQKEEMEEEQKKLEQELEELEE 164

 Score = 83 (38.0 bits), Expect = 0.010, P = 0.010
 Identities = 19/102 (18%), Positives = 56/102 (54%)

Query:    18 DRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLK 77
             +  +E++++ Q+   + +  ++ + + +E + + +E++K   E K+   EE ++ +E  +
Sbjct:    22 EEKEEEEEKLQELEKKKKELEEEMEELQEEMEEQEEKMKEEQEEKEEEMEEKEEEMEEQE 81

Query:    78 TKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQ 119
              + E QKE Q E  ++     E  E  +++ ++ + E+ +++
Sbjct:    82 EEMEEQKEEQEEKQEEQEEEQEEMEEEMEEEQEEEEEKQEEE 123

 Score = 81 (37.1 bits), Expect = 0.020, P = 0.019
 Identities = 19/109 (17%), Positives = 58/109 (53%)

Query:    11 EADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESK 70
             E +    +  +E ++E+++   +M+  ++   + EE     ++E++  +E  +  +EE +
Sbjct:   115 EEEEKQEEEKEEMEEEQKEMEEQMEEQEEQKEEMEEEQKKLEQELEELEEEMEEQEEEME 174

Query:    71 QLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQ 119
             ++ E ++ ++E  +E + E  ++     E +E   ++ E+ + EQ +++
Sbjct:   175 EMEEEMEKQQEEMEEQEEEKEEQQEEKEEEQEEKEEQKEEQEQEQEEQK 223

 Score = 80 (36.6 bits), Expect = 0.027, P = 0.027
 Identities = 17/67 (25%), Positives = 38/67 (56%)

Query:    18 DRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLK 77
             + M+E ++E + Q   M+  ++   +++E   + QEE + + E ++  +EE KQ  E  +
Sbjct:   171 EEMEEMEEEMEKQQEEMEEQEEEKEEQQEEKEEEQEEKEEQKEEQEQEQEEQKQQEEEQE 230

Query:    78 TKEEHQK 84
              K++ +K
Sbjct:   231 EKKKQKK 237

 Score = 77 (35.3 bits), Expect = 0.071, P = 0.069
 Identities = 21/109 (19%), Positives = 53/109 (48%)

Query:    11 EADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESK 70
             E      + M+E+  E ++Q    +   + + ++++ +    EE++   E ++   EE +
Sbjct:   118 EKQEEEKEEMEEEQKEMEEQMEEQEEQKEEMEEEQKKLEQELEELEEEMEEQEEEMEEME 177

Query:    71 QLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQ 119
             + +E  + + E Q+E + E  ++     E KE   ++ E  + EQ +++
Sbjct:   178 EEMEKQQEEMEEQEEEKEEQQEEKEEEQEEKEEQKEEQEQEQEEQKQQE 226

 Score = 76 (34.8 bits), Expect = 0.098, P = 0.094
 Identities = 18/109 (16%), Positives = 60/109 (55%)

Query:    11 EADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESK 70
             E +     + +EK++ +++Q    +  ++   +KEE+  + ++  +  +E+++  +E+ +
Sbjct:   112 EQEEEEEKQEEEKEEMEEEQKEMEEQMEEQEEQKEEMEEEQKKLEQELEELEEEMEEQEE 171

Query:    71 QLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQ 119
             ++ E+ +  E+ Q+E++ +  +K     E +E   +K E  + ++ +++
Sbjct:   172 EMEEMEEEMEKQQEEMEEQEEEKEEQQEEKEEEQEEKEEQKEEQEQEQE 220


>PD000422 p99.2 (108) TOP1(17) NFH(4) Q20007(3)  // PROTEIN TOPOISOMERASE I DNA
  ISOMERASE REPEAT DNABINDING INTERMEDIATE FILAMENT HEPTAD
  Length = 805

 Score = 91 (41.7 bits), Expect = 0.0010, P = 0.0010
 Identities = 32/123 (26%), Positives = 54/123 (43%)

Query:    32 NRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLKTKEEHQKEIQYEIL 91
             N  + SD +  KKE+      ++ K +++ K  TKE+SK+        E+ +KE   E
Sbjct:   542 NPKKPSDDLFEKKEKDKEPKPKKEKHKEKAKPKTKEKSKKSSNSKAKSEKEKKEKSKEKK 601

Query:    92 QKTIPTFEPKESILKKLEDIKPEQAKKQTKLFRIFEPRQLPIYRANGEKELRNRWYWKLK 151
              K       K+   KK E+  PE   K+ K  +    +  P  +   E + ++    K K
Sbjct:   602 PKPKKKEAKKKKESKKKEEKPPESKSKKEKKEKESPEKSKPEEKEKKESKKKSSKPSKSK 661

Query:   152 KDT 154
             K+T
Sbjct:   662 KET 664


>PD140913 p99.2 (1) YMT4_YEAST // HYPOTHETICAL 55.4 KD PROTEIN IN MCM1NUP116
  INTERGENIC REGION
  Length = 434

 Score = 56 (25.6 bits), Expect = 0.0011, Sum P(5) = 0.0011
 Identities = 13/58 (22%), Positives = 31/58 (53%)

Query:    62 KKSTKEESKQLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQ 119
             K+ T+   +Q ++  + +EE ++E   E ++K     + K+ ++ K +     +AKK+
Sbjct:   187 KRVTRSTRQQAIDASEEEEEEEEEKVQEAVRKRPQRTKTKKVVVSKTKPNPKTKAKKE 244

 Score = 44 (20.2 bits), Expect = 0.0011, Sum P(5) = 0.0011
 Identities = 7/26 (26%), Positives = 17/26 (65%)

Query:     4 RKRGATVEADINNNDRMQEKDDEKQD 29
             +K    ++AD+ +  R +E+ +E++D
Sbjct:    12 KKANEDIDADMESEARDREQSEEEED 37

 Score = 40 (18.3 bits), Expect = 0.0011, Sum P(5) = 0.0011
 Identities = 12/52 (23%), Positives = 24/52 (46%)

Query:   317 LWDTITTSNYILARSVVPDLKELVSTEAQIQKMSQDLQLEALTIQSETQFLT 368
             L D +   N +L   +   + + VS +   + +  +LQ+   T  S  +F+T
Sbjct:   301 LEDKLAGINKLLCDVLCSAINQAVSIKDDFEIILDELQIALDTRGSRNEFIT 352

 Score = 38 (17.4 bits), Expect = 0.0011, Sum P(5) = 0.0011
 Identities = 8/28 (28%), Positives = 16/28 (57%)

Query:    44 KEEVVTDSQEEIKIRDEVKKSTKEESKQ 71
             +EE V + +EE +   +  + T+  S+Q
Sbjct:   140 EEEYVEEEEEENEPEKKAIRPTRSSSRQ 167

 Score = 36 (16.5 bits), Expect = 0.0083, Sum P(6) = 0.0083
 Identities = 8/28 (28%), Positives = 16/28 (57%)

Query:   213 QDEETEGAVRRFIAEMRQRVQADRNVVN 240
             ++EE E  V+  + +  QR +  + VV+
Sbjct:   204 EEEEEEEKVQEAVRKRPQRTKTKKVVVS 231

 Score = 35 (16.0 bits), Expect = 0.0011, Sum P(5) = 0.0011
 Identities = 5/22 (22%), Positives = 12/22 (54%)

Query:   651 HIFDVARVPDDQMYRLRDRLRL 672
             HI+    +PD + ++L   + +
Sbjct:   390 HIYSYQFIPDTEDWQLEQNMEI 411

 Score = 35 (16.0 bits), Expect = 0.0083, Sum P(6) = 0.0083
 Identities = 10/30 (33%), Positives = 13/30 (43%)

Query:   189 ENKNSRDAGKVVDSETASICDAIFQDEETE 218
             E  NS    +V  S      DA  ++EE E
Sbjct:   179 EGGNSNKRKRVTRSTRQQAIDASEEEEEEE 208


>PD001891 p99.2 (39) GRPE(22)  // HEAT SHOCK PROTEIN CHAPERONE GRPE HOMOLOG
  PRECURSOR MITOCHONDRION TRANSIT PEPTIDE
  Length = 189

 Score = 81 (37.1 bits), Expect = 0.017, P = 0.017
 Identities = 22/112 (19%), Positives = 52/112 (46%)

Query:    16 NNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEV 75
             N + ++E+++E+Q++ +  Q  ++V  + EE+  + +E  +  +E+K        +
Sbjct:     4 NAENLEEENEEEQEEESEEQEEEEVQEETEELEEELEELEEELEELKDKYLRLQAEFENY 63

Query:    76 LKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQTKLFRIFE 127
              K  +   +E +   +QK +    P    L++     PE A K  ++  + E
Sbjct:    64 RKRTQREMEEAKKYAVQKLLKDLLPVLDNLERALSAVPESASKNEEVKSLVE 115


>PD000023 p99.2 (401) TPM1(11) Q25893(7) TPM2(6)  // PROTEIN REPEAT TROPOMYOSIN
  COILED COIL ALTERNATIVE SPLICING SIGNAL PRECURSOR CHAIN
  Length = 210

 Score = 81 (37.1 bits), Expect = 0.018, P = 0.018
 Identities = 22/104 (21%), Positives = 47/104 (45%)

Query:    18 DRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLK 77
             ++ +E ++E ++   +MQ +++   K EE     ++E +  +E K+   EE +      +
Sbjct:    37 EKKKEMEEELEEMQKKMQKTEEEKEKSEEEKKKEEQEKEEEEEAKEEEAEEEQAAKNRRE 96

Query:    78 TKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQTK 121
                E +KE + E  +      E  E   ++ E  K E+  +  K
Sbjct:    97 QLAEEEKEKKEERQESAKQKAEEAEKAAEESERKKKEEESEAAK 140

 Score = 79 (36.2 bits), Expect = 0.035, P = 0.035
 Identities = 23/120 (19%), Positives = 55/120 (45%)

Query:     2 AYRKRGATVEADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEV 61
             A  +R    E +    +  QE   +K ++  +     +   K+EE     +EE K ++E
Sbjct:    91 AKNRREQLAEEEKEKKEERQESAKQKAEEAEKAAEESERKKKEEESEAAKEEEKKAKEEE 150

Query:    62 KKSTKEESKQLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQTK 121
             K+  +EE ++  E  + +++ +KE + +  ++       + +  K+ ++   E+ K+  K
Sbjct:   151 KEKEEEEEEKAEEKKEAEKQAKKEAEEKAKEEAEAKKAEEAAKAKEEKESVAEKTKEAEK 210

 Score = 76 (34.8 bits), Expect = 0.093, P = 0.089
 Identities = 19/101 (18%), Positives = 53/101 (52%)

Query:    21 QEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLKTKE 80
             QEK++E++ +    +      +++E++  + +E+ + R E  K   EE+++  E  + K+
Sbjct:    72 QEKEEEEEAKEEEAEEEQAAKNRREQLAEEEKEKKEERQESAKQKAEEAEKAAEESERKK 131

Query:    81 EHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQTK 121
             + ++    +  +K     E ++   ++ +  + ++A+KQ K
Sbjct:   132 KEEESEAAKEEEKKAKEEEKEKEEEEEEKAEEKKEAEKQAK 172

 Score = 76 (34.8 bits), Expect = 0.093, P = 0.089
 Identities = 18/111 (16%), Positives = 61/111 (54%)

Query:    11 EADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESK 70
             E + +  ++ +E+ ++++++  + + +++  + K      ++EE + ++E ++S K++++
Sbjct:    59 EKEKSEEEKKKEEQEKEEEEEAKEEEAEEEQAAKNRREQLAEEEKEKKEERQESAKQKAE 118

Query:    71 QLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQTK 121
             +  +  +  E  +KE + E  ++     + +E   ++ E+ K E+ K+  K
Sbjct:   119 EAEKAAEESERKKKEEESEAAKEEEKKAKEEEKEKEEEEEEKAEEKKEAEK 169


>PD074905 p99.2 (1) O42263_XENLA // KINESINRELATED PROTEIN MOTOR PROTEIN
  MICROTUBULES ATPBINDING COILED COIL
  Length = 161

 Score = 75 (34.3 bits), Expect = 0.019, Sum P(2) = 0.019
 Identities = 16/46 (34%), Positives = 27/46 (58%)

Query:   310 LHDNFESLWDTITTSNYILARSVVPDLKELVSTEAQIQKMSQDLQL 355
             L D+ +   +++ + N IL  ++   LK    T+AQ+QK  Q+LQL
Sbjct:    85 LKDDLQQKLESLLSENIILKENIDTTLKHHSDTQAQLQKTQQELQL 130

 Score = 38 (17.4 bits), Expect = 0.019, Sum P(2) = 0.019
 Identities = 7/41 (17%), Positives = 22/41 (53%)

Query:    16 NNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIK 56
             N   ++   +EK + +N++++  K +     +  D Q++++
Sbjct:    54 NQYLLERLQEEKLELSNKLEILQKEMETSVLLKDDLQQKLE 94


>PD031168 p99.2 (3) O06969(1) P54(1) US45(1)  // PROTEIN PRECURSOR SIGNAL P54
  CELL WALL SECRETED
  Length = 148

 Score = 64 (29.3 bits), Expect = 0.027, Sum P(2) = 0.027
 Identities = 15/62 (24%), Positives = 30/62 (48%)

Query:    20 MQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEVLKTK 79
             ++ +D E  +       + K +    + V D+  +++ + E    TKEE K+L + +K
Sbjct:    34 IEAQDKEITELQENQAKAQKQIKDLNDKVLDTSNKVEDKKEENDKTKEEIKKLKKEIKET 93

Query:    80 EE 81
             EE
Sbjct:    94 EE 95

 Score = 47 (21.5 bits), Expect = 0.027, Sum P(2) = 0.027
 Identities = 9/25 (36%), Positives = 18/25 (72%)

Query:   337 KELVSTEAQIQKMSQDLQLEALTIQ 361
             KE+  TE +I+K ++ L+ +A ++Q
Sbjct:    88 KEIKETEERIEKRNETLKKQARSLQ 112


>PD107575 p99.2 (1) YN8V_YEAST // HYPOTHETICAL 36.4 KD PROTEIN IN POP2HOL1
  INTERGENIC REGION
  Length = 80

 Score = 74 (33.9 bits), Expect = 0.053, P = 0.051
 Identities = 16/75 (21%), Positives = 42/75 (56%)

Query:    16 NNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDEVKKSTKEESKQLLEV 75
             N+D      DE+ DQ+N +  + K +S K+++ +   E+I+  +E     +++ ++  +V
Sbjct:     6 NSDFEDFSSDEETDQHNVLIQTKKKISSKDDIFSKKVEDIESENESDIEEEQKQEEKEDV 65

Query:    76 LKTKEEHQKEIQYEI 90
              +  +E+ +++  E+
Sbjct:    66 EQPDKENGEKLDREV 80


Parameters:
  E=0.1
  B=500

  V=500
  -ctxfactor=1.00

  Query                        -----  As Used  -----    -----  Computed  ----
  Frame  MatID Matrix name     Lambda    K       H      Lambda    K       H
   +0      0   BLOSUM62        0.317   0.134   0.370    same    same    same

  Query
  Frame  MatID  Length  Eff.Length   E    S W   T  X     E2  S2
   +0      0      881       881     0.10 77 3  11 22    0.20 34


Statistics:
  Query          Expected         Observed           HSPs       HSPs
  Frame  MatID  High Score       High Score       Reportable  Reported
   +0      0    67 (30.7 bits)  1972 (903.1 bits)      46         46

  Query         Neighborhd  Word      Excluded    Failed   Successful  Overlaps
  Frame  MatID   Words      Hits        Hits    Extensions Extensions  Excluded
   +0      0     17991    30225742     5859552    24306066    60098      1958

  Database:  prodom_99_2
    Release date:  unknown
    Posted date:  10:12 PM EDT Jul 29, 1999
  # of letters in database:  18,560,502
  # of sequences in database:  157,167
  # of database sequences satisfying E:  9
  No. of states in DFA:  569 (56 KB)
  Total size of DFA:  230 KB (256 KB)
  Time to generate neighborhood:  0.02u 0.00s 0.02t  Real: 00:00:00
  Time to search database:  40.91u 0.04s 40.95t  Real: 00:00:41
  Total cpu time:  40.95u 0.06s 41.01t  Real: 00:00:42
--- END of BLASTP output
--- ------------------------------------------------------------
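
The bit values printed next to the raw scores in the BLASTP output above
appear consistent with bits = Lambda * S / ln(2), using the Lambda listed
under "Parameters". The following is only a sketch of that arithmetic: the
function name is hypothetical, the formula is an assumption inferred from
the numbers rather than taken from the program, and agreement is limited to
about 0.1 bit because Lambda is printed with three decimals.

```python
import math


def approx_bits(raw_score: float, lam: float = 0.317) -> float:
    """Approximate bit score as Lambda * S / ln 2 (assumed relation)."""
    return lam * raw_score / math.log(2)


# (raw score, bits as reported in the alignments above)
for raw, reported in [(64, 29.3), (47, 21.5), (74, 33.9)]:
    print(raw, round(approx_bits(raw), 2), reported)
```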
---
--- Again: these results were obtained from the domain database
--- collected by Daniel Kahn and his coworkers in Toulouse.
---
--- PLEASE quote:
---       F Corpet, J Gouzy, D Kahn (1998).  The ProDom database
---       of protein domain families. Nucleic Ac Res 26:323-326.
---
--- The general WWW page is on:
----      ---------------------------------------
---       http://www.toulouse.inra.fr/prodom.html
----      ---------------------------------------
---
--- For WWW graphic interfaces to PRODOM, in particular for your
--- protein family, use the links below (each line is ONE single
--- link for your protein):
---
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD008866 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD008866
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD008866 ==> graphical output of all proteins having domain PD008866
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD000002 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD000002
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD000002 ==> graphical output of all proteins having domain PD000002
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD000422 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD000422
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD000422 ==> graphical output of all proteins having domain PD000422
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD140913 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD140913
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD140913 ==> graphical output of all proteins having domain PD140913
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD001891 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD001891
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD001891 ==> graphical output of all proteins having domain PD001891
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD000023 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD000023
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD000023 ==> graphical output of all proteins having domain PD000023
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD074905 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD074905
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD074905 ==> graphical output of all proteins having domain PD074905
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD031168 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD031168
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD031168 ==> graphical output of all proteins having domain PD031168
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD107575 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD107575
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD107575 ==> graphical output of all proteins having domain PD107575
---
--- NOTE: if you want to use the link, make sure the entire line
---       is pasted as URL into your browser!
---
--- END of PRODOM
--- ------------------------------------------------------------

________________________________________________________________________________





The alignment that has been used as input to the network is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

---
--- Version of database searched for alignment:
--- SWISS-PROT release 38.0 (7/99) with 80000 proteins
---

--- ------------------------------------------------------------
--- MAXHOM multiple sequence alignment
--- ------------------------------------------------------------
---
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
--- ID           : identifier of aligned (homologous) protein
--- STRID        : PDB identifier (only for known structures)
--- IDE (PIDE)   : percentage of pairwise sequence identity
--- WSIM         : percentage of weighted similarity
--- LALI         : number of residues aligned
--- NGAP         : number of insertions and deletions (indels)
--- LGAP         : number of residues in all indels
--- LEN2 (LSEQ2) : length of aligned sequence
--- ACCNUM       : SwissProt accession number
--- NAME         : one-line description of aligned protein
---
--- MAXHOM ALIGNMENT HEADER: SUMMARY
ID         STRID  IDE WSIM LALI NGAP LGAP LEN2 ACCNUM NAME
vp2_rotbu         100  100  881    0    0  881 P17462 RNA-BINDING PROTEIN VP2 (
vp2_rotbr          98   99  880    1    1  880 P12472 RNA-BINDING PROTEIN VP2 (
vp2_rots1          92   95  879    3    4  881 P22672 RNA-BINDING PROTEIN VP2 (
vp2_rothw          91   95  880    3   11  890 P11231 RNA-BINDING PROTEIN VP2 (
vp2_rotpc          46   66  863    7   25  872 P26191 RNA-BINDING PROTEIN VP2 (
---
--- MAXHOM ALIGNMENT: IN MSF FORMAT
MSF of: /home/phd/server/work/predict_h11817647.hsspFilter from:    1 to:  881
 /home/phd/server/work/predict_h11817647.msfRet  MSF:  881  Type: P 23-Dec-99  04:36:5  Check:  448  ..


 Name: predict_h1180    Len:   881  Check: 1498  Weight:  1.00
 Name: vp2_rotbu        Len:   881  Check: 1498  Weight:  1.00
 Name: vp2_rotbr        Len:   881  Check:  440  Weight:  1.00
 Name: vp2_rots1        Len:   881  Check: 8116  Weight:  1.00
 Name: vp2_rothw        Len:   881  Check: 9800  Weight:  1.00
 Name: vp2_rotpc        Len:   881  Check: 9096  Weight:  1.00

//


               1                                                   50
predict_h1180  MAYRKRGATV EADINNNDRM QEKDDEKQDQ NNRMQLSDKV LSKKEEVVTD
vp2_rotbu      MAYRKRGATV EADINNNDRM QEKDDEKQDQ NNRMQLSDKV LSKKEEVVTD
vp2_rotbr      MAYRKRGARR EANINNNDRM QEKDDEKQDQ NNRMQLSDKV LSKKEEVVTD
vp2_rots1      MAYRKRGARR ETNLKQDERM QEKEDSKNIN NdkSQLSEKV LSKKEEIITD
vp2_rothw      MAYRKRGAKR ENLPQQNERL QEKEIEKDvn NRKQQLSDKV LSQKEEIITD
vp2_rotpc      MISRNRRRNT QQKDAEKEKQ TENVEEKEIK EAKEQVKDEK QVITEENVDS

               51                                                 100
predict_h1180  SQEEIKIRDE VKKSTKEESK QLLEVLKTKE EHQKEIQYEI LQKTIPTFEP
vp2_rotbu      SQEEIKIRDE VKKSTKEESK QLLEVLKTKE EHQKEIQYEI LQKTIPTFEP
vp2_rotbr      SQEEIKIADE VKKSTKEESK QLLEVLKTKE EHQKEIQYEI LQKTIPTFEP
vp2_rots1      NQEEVKISDE VKKSNKEESK QLLEVLKTKE EHQKEVQYEI LQKTIPTFEP
vp2_rothw      AQDDIKIAGE IKKSSKEESK QLLEILKTKE DHQKEIQYEI LQKTIPTFES
vp2_rotpc      PKDVKEQSNT VNLQKNDLVK EVINilNTIV AENKVEIEEV VKKYIPSYST

               101                                                150
predict_h1180  KESILKKLED IKPEQAKKQT KLFRIFEPRQ LPIYRANGEK ELRNRWYWKL
vp2_rotbu      KESILKKLED IKPEQAKKQT KLFRIFEPRQ LPIYRANGEK ELRNRWYWKL
vp2_rotbr      KESILKKLED IKPEQAKKQT KLFRIFEPRQ LPIYRANGEK ELRNRWYWKL
vp2_rots1      KESILKKLED IKPEQAKKQT KLFRIFEPKQ LPIYRANGER ELRNRWYWKL
vp2_rothw      KESILKKLED IRPEQAKKQM KLFRIFEPKQ LPIYRANGEK ELRNRWYWKL
vp2_rotpc      DKLIVKNYRN SRIK.CQTYN KLFRLLHVKS Y.LYDVNGEK KLSTRWYWKL

               151                                                200
predict_h1180  KKDTLPDGDY DVREYFLNLY DQVLTEMPDY LLLKDMAVEN KNSRDAGKVV
vp2_rotbu      KKDTLPDGDY DVREYFLNLY DQVLTEMPDY LLLKDMAVEN KNSRDAGKVV
vp2_rotbr      KKDTLPDGDY DVREYFLNLY DQVLTEMPDY LLLKDMAVEN KNSRDAGKVV
vp2_rots1      KRDTLPDGDY DVREYFLNLY DQVLMEMPDY LLLKDMAVEN KNSRDAGKVV
vp2_rothw      KKDTLPDGDY DVREYFLNLY DQILIEMPDY LLLKDMAVEN KNSRDAGKVV
vp2_rotpc      LKDDLPAGDY SVRQFFLSLY LNVLDEMPDY VMLRDMAVDN PYSAEAGKIV

               201                                                250
predict_h1180  DSETASICDA IFQDEETEGA VRRFIAEMRQ RVQADRNVVN YPSILHPIDY
vp2_rotbu      DSETASICDA IFQDEETEGA VRRFIAEMRQ RVQADRNVVN YPSILHPIDY
vp2_rotbr      DSETASICDA IFQDEETEGA VRRFIAEMRQ RVQADRNVVN YPSILHPIDY
vp2_rots1      DSETAAICDA IFQDE.EPKA VRRFIAEMRQ RVQADRNVVN YPSILHPIDH
vp2_rothw      DSETANICDA IFQDEETEGV VRRFIADMRQ QVQADRNIVN YPSILHPIDH
vp2_rotpc      DEKSKEILVE IYQDQMTEGY IRRYMSDLRH RISGETNTAK YPAILHPVDE

               251                                                300
predict_h1180  AFNEYFLQHQ LVEPLNNDII FNYIPERIRN DVNYILNMDR NLPSTARYIR
vp2_rotbu      AFNEYFLQHQ LVEPLNNDII FNYIPERIRN DVNYILNMDR NLPSTARYIR
vp2_rotbr      AFNEYFLQHQ LVEPLNNDII FNYIPERIRN DVNYILNMDR NLPSTARYIR
vp2_rots1      AFNEYFLQHQ LVEPLNNVYI FNYIPERIRN DVNYILNMDR NLPSTARYIR
vp2_rothw      AFNEYFLNHQ LVEPLNNEII FNYIPERIRN DVNYILNMDM NLPSTARYIR
vp2_rotpc      ELNKYFLEHQ LIQPLTTRNI AELIPTQLYH DPNYVFNIDA AFLTNSRFVP

               301                                                350
predict_h1180  PNLLQDRLNL HDNFESLWDT ITTSNYILAR SVVPDLKELV STEAQIQKMS
vp2_rotbu      PNLLQDRLNL HDNFESLWDT ITTSNYILAR SVVPDLKELV STEAQIQKMS
vp2_rotbr      PNLLQDRLNL HDNFESLWDT ITTSNYILAR SVVPDLKELV STEAQIQKMS
vp2_rots1      PNLLQDRLNL HDNFESLWDT ITTSNYILAR SVVPDLKELV STEAQIQKMS
vp2_rothw      PNLLQDRLNL HDNFESLWDT ITTSNYILAR SVVPDLkeLV STEAQIQKMS
vp2_rotpc      PYLTQDRIGL HDGFESIWDA KTHADYVSAR RFVPDLTELV DAEKQMKEML

               351                                                400
predict_h1180  QDLQLEALTI QSETQFLTGI NSQAANDCFK TLIAAMLSQR TMSLDFVTTN
vp2_rotbu      QDLQLEALTI QSETQFLTGI NSQAANDCFK TLIAAMLSQR TMSLDFVTTN
vp2_rotbr      QDLQLEALTI QSETQFLTGI NSQAANDCFK TLIAAMLSQR TMSLDFVTTN
vp2_rots1      QDLQLEALTI QSETQFLTGI NSQAANDCFK TLIAAMLSQR TMSLDFVTTN
vp2_rothw      QDLQLEALTI QSETQFLAGI NSQAANDCFK TLIAAMLSQR TMSLDFVTTN
vp2_rotpc      QC....KLNH NSWQELVHGR .....NEAFK FIIGTVLSTR TIAVEFITSN

               401                                                450
predict_h1180  YMSLISGMWL LTVVPNDMFI RESLVACQLA IVNTIIYPAF GMQRMHYRNG
vp2_rotbu      YMSLISGMWL LTVVPNDMFI RESLVACQLA IVNTIIYPAF GMQRMHYRNG
vp2_rotbr      YMSLISGMWL LTVVPNDMFI RESLVACQLA IVNTIIYPAF GMQRMHYRNG
vp2_rots1      YMSLISGMWL LTVIPNDMFI RESLVACQLA IINTIVYPAF GMQRMHYRNG
vp2_rothw      YMSLISGMWL LTVIPNDMFL RESLVACELA IINTIVYPAF GMQRMHYRNG
vp2_rotpc      YMSLASCMYL MTIMPSEIFL RESLVAMQLA VINTLIYPAL GLAQMHYQAG

               451                                                500
predict_h1180  DPQTPFQIAE QQIRKFSGSG IGWHFVNNNQ FRQVVIDGVL NQVLNDNIRN
vp2_rotbu      DPQTPFQIAE QQIRKFSGSG IGWHFVNNNQ FRQVVIDGVL NQVLNDNIRN
vp2_rotbr      DPQRPFQIAE QQIQNFQVAN W.LHFVNNNQ FRQVVIDGVL NQVLNDNIRN
vp2_rots1      DPQTPFQIAE QQIQNFQVAN W.LHFVNYNQ FRQVVIDGVL NQVLNDNIRN
vp2_rothw      DPQTPFQIAE QQIQNFQVAN W.LHFINNNR FRQVVIDGVL NQTLNDNIRN
vp2_rotpc      EIRreMQVAN RPIRQWL... ...HHCNTLQ FGRQVTEGVT HLRFTNDIMT

               501                                                550
predict_h1180  VHVIKQLMQA LMQLSRQQFP TMPVDYKRSI QRGILLLSNR LGQLVDLTRL
vp2_rotbu      VHVIKQLMQA LMQLSRQQFP TMPVDYKRSI QRGILLLSNR LGQLVDLTRL
vp2_rotbr      GHVINQLMEA LMQLSRQQFP TMPVDYKRSI QRGILLLSNR LGQLVDLTRL
vp2_rots1      GHVVNQLMEA LMQLSRQQFP TMPVDYKRSI QRGIFLLSNR LGQLVDLTRL
vp2_rothw      GQVINQLMEA LMQLSRQQFP TMPVDYKRSI QRGILLLSNR LGQLVDLTRL
vp2_rotpc      GRIVNLFSTM LVALSSQPFA TYPLDYKRSV QRALQLLSNR TAQIADLTRL

               551                                                600
predict_h1180  LAYNYETLMA CVTMNMQHVQ TLTTEKLQLT SVTSLCMLIG NATVIPSPQT
vp2_rotbu      LAYNYETLMA CVTMNMQHVQ TLTTEKLQLT SVTSLCMLIG NATVIPSPQT
vp2_rotbr      LAYNYETLMA CVTMNMQHVQ TLTTEKLQLT SVTSLCMLIG NATVIPSPQT
vp2_rots1      LSYINETLMA CITMNMQHVQ TLTTEKLQLT SVTSLCMLIG NATVIPSPQT
vp2_rothw      VSYNYETLMA CVTMNMQHVQ TLTTEKLQLT SVTSLCMLIG NTTVIPSPQT
vp2_rotpc      IVYNYTTLSA CIVMNMHLVG TLTVERIQAT ALTSLIMLIS NKTVIPEPSS

               601                                                650
predict_h1180  LFHYYNVNVN FHSNYNERIN DAVAIITAAN RLNLYQKKMK AIVEDFLKRL
vp2_rotbu      LFHYYNVNVN FHSNYNERIN DAVAIITAAN RLNLYQKKMK AIVEDFLKRL
vp2_rotbr      LFHYYNVNVN FHSNYNERIN DAVAIITGAN RLNLYQKKMK AIVEDFLKRL
vp2_rots1      LFHYYNVNVN FHSNYNERIN DAVAIITAAN RLNLYQKKMK SIVEDFLKRL
vp2_rothw      LFHYYNINVN FHSNYNERIN DAVAIITAAN RLNLYQKKMK SIVEDFLKRL
vp2_rotpc      LFSYFSSNIN FLTNYNEQID NVVAEIMAAY RLDLYQQKML MLVTRFVSRL

               651                                                700
predict_h1180  HIFDVARVPD DQMYRLRDRL RLLPVEVRRL DIFNLILMNM DQIERASDKI
vp2_rotbu      HIFDVARVPD DQMYRLRDRL RLLPVEVRRL DIFNLILMNM DQIERASDKI
vp2_rotbr      HIFDVARVPD DQMYRLRDRL RLLPVEVRRL DIFNLILMNM DQIERASDKI
vp2_rots1      QIFDVARVPD DQMYRLRDRL RLLPVEIRRL DIFNLIAMNM EQIERASDKI
vp2_rothw      QIFDVPRVPD DQMYRLRDRL RLLPVERRRL DIFNLILMNM EQIERASDKI
vp2_rotpc      YIFDAPKIPP DQMYRLRNRL RNIPVERRRA DVFRIIMNNR DLIEKTSERI

               701                                                750
predict_h1180  AQGVIIAYRD MQLERDEMYG YVNIARNLDG FQQINLEELM RTGDYAQITN
vp2_rotbu      AQGVIIAYRD MQLERDEMYG YVNIARNLDG FQQINLEELM RTGDYAQITN
vp2_rotbr      AQGVIIAYRD MQLERDEMYG YVNIARNLDG FQQINLEELM RTGDYAQITN
vp2_rots1      AQGVIIAYRD MQLERDEMYG YVNIARNLDG FQQINLEELM RSGDYAQITN
vp2_rothw      AQGVIIAYRD MQLERDEMYG YVNIARNLDG YQQINLEELM RTGDYGQITN
vp2_rotpc      CQGVLLSYSP MPLTYVEDVG LTNVVNDTNG FQIINIEEIE KTGDYSAITN

               751                                                800
predict_h1180  MLLNNQPVAL VGALPFVTDS SVISLIAKLD ATVFAQIVKL RKVDTLKPIL
vp2_rotbu      MLLNNQPVAL VGALPFVTDS SVISLIAKLD ATVFAQIVKL RKVDTLKPIL
vp2_rotbr      MLLNNQPVAL VGALPFVTDS SVISLIANVD ATVFAQIVKL RKVDTLKPIL
vp2_rots1      MLLNNQPVAL VGALPFITDS SVISLIAKLD ATVFAQIVKL RKVDTLKPIL
vp2_rothw      MLLNNQPVAL VGALPFVTDS SVISLIAKLD ATVFAQIVKL RKVDTLKPIL
vp2_rotpc      ALLRDTPIIL KGAIPYVTNS SVIDVLSKID TTVFASIVKD RDISKLKPIK

               801                                                850
predict_h1180  YKINSDSNDF YLVANYDWVP TSTTKVYKQV PQQFDFRNSM HMLTSNLTFT
vp2_rotbu      YKINSDSNDF YLVANYDWVP TSTTKVYKQV PQQFDFRNSM HMLTSNLTFT
vp2_rotbr      YKINSDSNDF YLVANYDWVP TSTTKVYKQV PQQFDFRNSM HMLTSNLTFT
vp2_rots1      YKINSDSNDF YLVANYDWIP TSTTKVYKQV PQQFDFRASM HMLTSNLTFT
vp2_rothw      YKINSDSNDF YLVANYDWIP TSTTKVYKQV PQPFDFRASM HMLTSNLTFT
vp2_rotpc      FTINSDSSEY YLVHNNKWTP TTTTAVYKAR SQQFNIQHSV SMLESNLFFV

               851                            881
predict_h1180  VYSDLLAFVS ADTVEPINAV AFDNMRIMNE L
vp2_rotbu      VYSDLLAFVS ADTVEPINAV AFDNMRIMNE L
vp2_rotbr      VYSDLLAFVS ADTVEPINAV AFDNMRIMNE L
vp2_rots1      VYSDLLAFVS ADTVEPINAV AFDNMRIMNE L
vp2_rothw      VYSDLLSFVS ADTVEPINAV AFDNMRIMNE L
vp2_rotpc      VYNDLFKYIK TTTVLPINAV SYDGARIMQE .
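
The identity percentages in the summary can be approximated from the MSF
rows above. The sketch below counts identical residues over the columns in
which neither row has a gap ("."). It is an illustration only: the function
name is hypothetical, lower-case letters are simply treated as their
upper-case residues, and MaxHom's own definition of the identity column may
differ in detail.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of identical residues over columns without a gap."""
    a = row_a.replace(" ", "").upper()
    b = row_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns where both sequences have a residue.
    aligned = [(x, y) for x, y in zip(a, b) if x != "." and y != "."]
    if not aligned:
        return 0.0
    same = sum(1 for x, y in aligned if x == y)
    return 100.0 * same / len(aligned)


# Columns 1-50 of the query and of vp2_rotbr, copied from the block above.
query = "MAYRKRGATV EADINNNDRM QEKDDEKQDQ NNRMQLSDKV LSKKEEVVTD"
rotbr = "MAYRKRGARR EANINNNDRM QEKDDEKQDQ NNRMQLSDKV LSKKEEVVTD"
print(percent_identity(query, rotbr))  # 94.0 for these 50 columns
```

Applied to full-length rows, the same function gives a value that can be
compared with the summary table.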


________________________________________________________________________________





Result of COILS prediction (Andrei Lupas):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A Lupas: Methods in Enzymology, 1996, 266, 513-525.

version 2.2: Rob B. Russell & Andrei N. Lupas, 1999

________________________________________________________________________________




--- COILS HEADER: SUMMARY

COILS version 2.2: R.B. Russell, A.N. Lupas, 1999
using MTK matrix.
weights: a,d=2.5 and b,c,e,f,g=1.0

For the threshold of 5 (probability > 0.5):
window size = 14         0  residues in coiled coil domain
window size = 21         0  residues in coiled coil domain
window size = 28        28  residues in coiled coil domain

               .    :    .    :    .    :    .    :    .    5
seq        MAYRKRGATVEADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTD
frame-14   aaaaabcdeabcdefgabcabcdefgabcdefgbcdefggabcdefgabc
frame-21   aaabcdefgabcdefgabcdefgabcdefgefgabcdefgabcabcdefg
frame-28   aaababcdeabcdefgabcabcdefgabcdefgabcdefgabcdefgabc
prob-14    ---------111111111111111111111111-----------------
prob-21    --------------------------------------------------
prob-28    --------------------------111111111111111111111111

               .    :    .    :    .    :    .    :    .    10
seq        SQEEIKIRDEVKKSTKEESKQLLEVLKTKEEHQKEIQYEILQKTIPTFEP
frame-14   defgefabcdeabcdabcdefgabcdefgbcdefggbcdefgfgabcabc
frame-21   abcdefgabcdabcabcdefgabcdefgabcdefggbcdefgggefgabc
frame-28   defgefgabcdefgabcdefgabcdefgabcdefgdefgefggdefgfgc
prob-14    ---------------33333333333333111111---------------
prob-21    -----------1111111111111111111111111--------------
prob-28    1111---55555555555555555555555555551111-----------

               .    :    .    :    .    :    .    :    .    15
seq        KESILKKLEDIKPEQAKKQTKLFRIFEPRQLPIYRANGEKELRNRWYWKL
frame-14   abcdefgabcdefgdefggfggabcdefgbabcdefgabcdefgcdefge
frame-21   abcdefgabcdefgabcdefgdefggfgefgbabcdefgabcdefgabcd
frame-28   defgabcdefgabcdefgabcdefgefgefggdefgabcdefggcdeaab
prob-14    --------------------------------------------------
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    20
seq        KKDTLPDGDYDVREYFLNLYDQVLTEMPDYLLLKDMAVENKNSRDAGKVV
frame-14   fgggabcdefgaabcdefgabcdefgabcabcabcdefgabcdefgefga
frame-21   efggaabcdefgabcdefgabcdefgabcdefabcabcdefgabcdefga
frame-28   cdefgabcdefgabcdefgabcdefgababcdabcabcdefgabcdefga
prob-14    --------------------------------------------------
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    25
seq        DSETASICDAIFQDEETEGAVRRFIAEMRQRVQADRNVVNYPSILHPIDY
frame-14   bcdefgcdefgababcdefgabcdabcdefgabcdefgbcdefggbcabc
frame-21   bcdefgabcdabcdefabcdefgabcdefgabcdefggabcdefgbcabc
frame-28   bcdefgabcdefgabcdefgabcdefgdefgabcdefgbcdefggbcdef
prob-14    --------------------------------------------------
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    30
seq        AFNEYFLQHQLVEPLNNDIIFNYIPERIRNDVNYILNMDRNLPSTARYIR
frame-14   defabcdefgabcdefggdefgababcabcdefgabcdefggbcdefgda
frame-21   defgabcdefgabcdefgdefgcdefgabcdefgabcdefgabcdefgga
frame-28   gabcdefgabcdefgabcdeabcdefgaabcdefgabcdefgabcdefga
prob-14    --------------------------------------------------
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    35
seq        PNLLQDRLNLHDNFESLWDTITTSNYILARSVVPDLKELVSTEAQIQKMS
frame-14   bcdabcdefgabcdefgabcdefgeabcabcabcdabcabcdefgabcde
frame-21   bcdabcdefgabcdefgabcdefgabcdabcabcdabcdefgabcdefga
frame-28   bcdabcdefgabcdefgabcdefgabcdabcdefgabcdefgabcdefga
prob-14    --------------------------------------111111111111
prob-21    -----------------------------------111111111111111
prob-28    -----------------------------------222222222222222

               .    :    .    :    .    :    .    :    .    40
seq        QDLQLEALTIQSETQFLTGINSQAANDCFKTLIAAMLSQRTMSLDFVTTN
frame-14   fgdefgabcdefgcdefgcdeabcdabcdefgabcdefgefggabcdefg
frame-21   bcdefgabcdefgabcdefgabcdefggabcdefgabcdefggabcdefg
frame-28   bcdefgabcdefgabcdefgabcdefgabcdefggabcdefgabcdefgg
prob-14    11------------------------------------------------
prob-21    111111--------------------------------------------
prob-28    2222222222222-------------------------------------

               .    :    .    :    .    :    .    :    .    45
seq        YMSLISGMWLLTVVPNDMFIRESLVACQLAIVNTIIYPAFGMQRMHYRNG
frame-14   abcdefgabcdefgeabcdabcdefgabcdefgdefgbcdefgabcdeab
frame-21   abcdefgabcdefgababcdefgabcdefgabcdefggabcabcdefgab
frame-28   abcdefgabcdefgabcdefgfgabcdefgabcdefggabcabcdefgab
prob-14    --------------------------------------------------
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    50
seq        DPQTPFQIAEQQIRKFSGSGIGWHFVNNNQFRQVVIDGVLNQVLNDNIRN
frame-14   cdefgabcdefgabcdefgfggcabcaaabcdefgabcdeabcdefgaba
frame-21   cdefgabcabcdefgabcdefgabcdabcdefgababcdeabcdefgabc
frame-28   cdefgabcdefgabcdefgabcdefgababcdeababcdeabcdefgabc
prob-14    --------------------------------------------------
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    55
seq        VHVIKQLMQALMQLSRQQFPTMPVDYKRSIQRGILLLSNRLGQLVDLTRL
frame-14   bcdabcdefgabcdefggfgggabcaabcdefgabcdefgabcdefgefg
frame-21   defgabcdefgabcdefgfggdeabaabcdefgabcdefgabcdefgdef
frame-28   defgabcdefgabcdefgfggdabcdabcdefgabcdabcdefgabcdef
prob-14    ---33333333333333---------------------------------
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    60
seq        LAYNYETLMACVTMNMQHVQTLTTEKLQLTSVTSLCMLIGNATVIPSPQT
frame-14   gabcdefgabcdefgabcabcdefgabcdefgefgefgabcdefgbcdef
frame-21   gabcdefgabcdefgabcdefgbcdefgabcdefgabcdefgefgfgdef
frame-28   gabcdefgabcdefgabcdefgabcdefgefgefgabcdefgefgbcdef
prob-14    --------------------------------------------------
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    65
seq        LFHYYNVNVNFHSNYNERINDAVAIITAANRLNLYQKKMKAIVEDFLKRL
frame-14   abcdefgaabcabcdefgabcdefabcdefgabcdefgabcdefgdefga
frame-21   aabcdefgabcabcdefgabcdefabcdefgabcdefgabcdefgdefga
frame-28   aabcabcdefgabcdefgabcabcdefgabcdefgabcdefgabcdefga
prob-14    -------------------------------11111111111111-----
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    70
seq        HIFDVARVPDDQMYRLRDRLRLLPVEVRRLDIFNLILMNMDQIERASDKI
frame-14   bcdefgbcdefgabcdefgabcdefgdefabcabcdefgabcdefgabcd
frame-21   bcdefgbcdefgabcdefgabcdeaabcabcdabcdefgabcdefgabcd
frame-28   bcdefgefgdefabcdefgababcabcdaabcabcdabcabcdefgabcd
prob-14    --------------------------------------------------
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    75
seq        AQGVIIAYRDMQLERDEMYGYVNIARNLDGFQQINLEELMRTGDYAQITN
frame-14   efgdefgabcdefgabcdefabcdefgabcdefgabcdefggabcdefga
frame-21   efgabcdabcdefgabcdefgabcdefgbcdefgaabcdefgabcdefga
frame-28   efgabcdefgabcdefgdefgabcdefgbcdefgabcdefgabcdefgab
prob-14    --------------------------------------------------
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    80
seq        MLLNNQPVALVGALPFVTDSSVISLIAKLDATVFAQIVKLRKVDTLKPIL
frame-14   bcdefgabcdefgaaaababcabcdabcdefgabcdefgdefgbcdefga
frame-21   bcdefgefgdefgabcababcdabcabcdefgabcdefgabcdefgefga
frame-28   cdefggabcdefgababcabcdefgabcdefgabcdefgabcdefgabcd
prob-14    --------------------------------------------------
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    85
seq        YKINSDSNDFYLVANYDWVPTSTTKVYKQVPQQFDFRNSMHMLTSNLTFT
frame-14   bcdefgabcdefggfggfgabcdabcabcdefabcdefgabcdefgabab
frame-21   bcdefgabcdefgefggfgabcdabaabcdefabcdefgababcdefgab
frame-28   efgefgefgdefgdefgfabcdefgabcdefgabcdabcdeabcdefgab
prob-14    --------------------------------------------------
prob-21    --------------------------------------------------
prob-28    --------------------------------------------------

               .    :    .    :    .    :    .    :    .    90
seq        VYSDLLAFVSADTVEPINAVAFDNMRIMNEL
frame-14   cdefgabcdefgcdefaabcdefgabcdefg
frame-21   cdefgabcdeabcdefgabcdefgabcdefg
frame-28   cdefgabcdefgabcdefgabcdefgcdefg
prob-14    -------------------------------
prob-21    -------------------------------
prob-28    -------------------------------
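
The residue counts in the COILS summary can be checked against the prob-14,
prob-21 and prob-28 lines above. The sketch below counts the positions whose
digit is at or above the threshold of 5. It assumes that each digit is a
coarse probability class and that "-" means no score was printed; the
function name is hypothetical.

```python
def residues_at_or_above(prob_line: str, threshold: int = 5) -> int:
    """Count positions whose digit is >= threshold; '-' is ignored."""
    return sum(1 for ch in prob_line if ch.isdigit() and int(ch) >= threshold)


# Shape of the prob-28 line for residues 51-100, written compactly.
prob_28_block = "1111---" + "5" * 28 + "1111" + "-" * 11
print(len(prob_28_block), residues_at_or_above(prob_28_block))  # 50 28
```

No other prob-28 line contains a digit of 5 or more, which agrees with the
28 residues reported for window size 28.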


________________________________________________________________________________




   Prediction of:			
	- secondary structure,   		by PHDsec		
	- solvent accessibility, 		by PHDacc		

   PHD: Profile fed neural network systems from HeiDelberg
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   Author:             Burkhard Rost		
                       EMBL, Heidelberg, FRG
                       Meyerhofstrasse 1, 69 117 Heidelberg
                       Internet: Predict-Help@EMBL-Heidelberg.DE

   All rights reserved.



   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	
   Secondary structure prediction by PHDsec:
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	

   Author:             Burkhard Rost		
                       EMBL, Heidelberg, FRG
                       Meyerhofstrasse 1, 69 117 Heidelberg
                       Internet: Rost@EMBL-Heidelberg.DE 		

   All rights reserved.




About the network method
~~~~~~~~~~~~~~~~~~~~~~~

The network procedure is described in detail in:
1) Rost, Burkhard; Sander, Chris:
  Prediction of protein structure at better than 70% accuracy.
  J. Mol. Biol., 1993, 232, 584-599.        	

A brief description is given in:
  Rost, Burkhard; Sander, Chris:
  Improved prediction of protein secondary structure by use of
  sequence profiles and neural networks.
  Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562.   		

The PHD mail server is described in:
2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard:
  PHD - an automatic mail server for protein secondary structure
  prediction.
  CABIOS, 1994, 10, 53-60.

The latest improvement steps (up to 72%) are explained in:
3) Rost, Burkhard; Sander, Chris:
  Combining evolutionary information and neural networks to predict
  protein secondary structure.
  Proteins, 1994,  19, 55-72.

To be quoted for publications of PHD output:
  Papers 1-3 for the prediction of secondary structure and the
  prediction server.



About the input to the network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The prediction is performed by a system of neural networks.
The input is a multiple sequence alignment. It is taken from an HSSP
file produced by the program MaxHom:
  Sander, Chris & Schneider, Reinhard: Database of Homology-Derived
  Structures and the Structural Meaning of Sequence Alignment.
  Proteins, 1991, 9, 56-68.

For optimal results the alignment should contain sequences with varying
degrees of sequence similarity relative to the input protein.
The following is an ideal situation:

+-----------------+----------------------+
|   sequence:     |  sequence identity   |
+-----------------+----------------------+
| target sequence |  100 %               |
| aligned seq. 1  |   90 %               |
| aligned seq. 2  |   80 %               |
|      ...        |   ...                |
| aligned seq. 7  |   30 %               |
+-----------------+----------------------+



Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A careful cross validation test on some 250 protein chains (in total
about 55,000 residues) with less than 25% pairwise sequence identity
gave the following results:

++================++-----------------------------------------+
|| Qtotal = 72.1% ||      ("overall three state accuracy")   |
++================++-----------------------------------------+

+----------------------------+-----------------------------+
| Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% |
| Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% |
| Qloop  (% of observed)=79% | Qloop  (% of predicted)=72% |
+----------------------------+-----------------------------+
..........................................................................

These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

|                    number of correctly predicted residues
|Qtotal =            ---------------------------------------      (*100)
|                          number of all residues
|
|                    no of res correctly predicted to be in helix
|Qhelix (% of obs) = -------------------------------------------- (*100)
|                    no of all res observed to be in helix
|
|
|                    no of res correctly predicted to be in helix
|Qhelix (% of pred)= -------------------------------------------- (*100)
|                    no of all residues predicted to be in helix

..........................................................................

Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most reasonable way to compute the overall accuracies is the above
quoted percentage of correctly predicted residues.  However, since the
user is mainly interested in the expected performance of the prediction
for a particular protein, the mean value when averaging over protein
chains might be of help as well.  Computing first the three state
accuracy for each protein chain, and then averaging over 250 chains
yields the following average:

+-------------------------------====--+
| Qtotal/averaged over chains = 72.2% |
+-------------------------------====--+
| standard deviation          =  9.3% |
+-------------------------------------+

..........................................................................
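
The difference between the two ways of averaging described above can be
sketched as follows. The per-chain counts are hypothetical and serve only
as an illustration; the use of a population standard deviation is likewise
an assumption, since the text does not state which variant was used.

```python
from statistics import mean, pstdev

# Hypothetical (correctly predicted, total) residue counts per chain.
chains = [(90, 100), (30, 50), (160, 200)]


def pooled_q(chains):
    """Accuracy over all residues pooled together (per-residue average)."""
    return 100.0 * sum(c for c, _ in chains) / sum(n for _, n in chains)


def per_chain_q(chains):
    """Mean and standard deviation of the per-chain accuracies."""
    qs = [100.0 * c / n for c, n in chains]
    return mean(qs), pstdev(qs)


print(pooled_q(chains))
print(per_chain_q(chains))
```

Long chains dominate the pooled value, whereas every chain counts equally
in the per-chain mean.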

Further measures of performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Matthews correlation coefficient:

+---------------------------------------------+
| Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 |
+---------------------------------------------+
..........................................................................

Average length of predicted secondary structure segments:

.           +------------+----------+
.           |  predicted | observed |
+-----------+------------+----------+
| Lhelix  = |    10.3    |    9.3   |
| Lstrand = |     5.0    |    5.3   |
| Lloop   = |     7.2    |    5.9   |
+-----------+------------+----------+
..........................................................................
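
Average segment lengths such as those above can be derived from a string of
per-residue states. The sketch below is a minimal illustration; the example
string is made up, and the function name is hypothetical.

```python
from itertools import groupby


def mean_segment_lengths(states: str) -> dict:
    """Mean length of consecutive runs for each state (e.g. H, E, L)."""
    lengths = {}
    for state, run in groupby(states):
        lengths.setdefault(state, []).append(len(list(run)))
    return {s: sum(v) / len(v) for s, v in lengths.items()}


# Made-up state string: loop, helix, loop, strand, loop, helix, loop.
example = "LL" + "H" * 6 + "LLL" + "E" * 4 + "LL" + "H" * 4 + "LL"
print(mean_segment_lengths(example))
```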

The accuracy matrix in detail:

+---------------------------------------+
|    number of residues with H, E, L    |
+---------+------+------+------+--------+
|         |net H |net E |net L |sum obs |
+---------+------+------+------+--------+
| obs H   |12447 | 1255 | 3990 |  17692 |
| obs E   |  949 | 7493 | 3750 |  12192 |
| obs L   | 2604 | 2875 |19962 |  25441 |
+---------+------+------+------+--------+
| sum Net |16000 |11623 |27702 |  55325 |
+---------+------+------+------+--------+

Note: This table is to be read in the following manner:
     Of the 16000 residues predicted to be in helix (column "net H"),
     12447 were observed to be in helix, 949 belong to observed
     strands, and 2604 to observed loop regions.  The term "observed"
     refers to the DSSP assignment of secondary structure calculated
     from 3D coordinates of experimentally determined structures
     (Dictionary of Secondary Structure of Proteins: Kabsch & Sander
     (1983) Biopolymers, 22, 2577-2637).
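
The accuracy measures defined earlier can be recomputed from the matrix
above. The following sketch uses the counts from the table; the function
names are hypothetical, and the Matthews correlation is computed per state
by treating that state as "positive" and the other two as "negative".

```python
import math

STATES = ("H", "E", "L")
# Rows: observed state (DSSP); columns: state predicted by the network.
MATRIX = {
    "H": {"H": 12447, "E": 1255, "L": 3990},
    "E": {"H": 949, "E": 7493, "L": 3750},
    "L": {"H": 2604, "E": 2875, "L": 19962},
}


def total(m):
    return sum(m[o][p] for o in STATES for p in STATES)


def q_total(m):
    """Overall three-state accuracy in percent."""
    return 100.0 * sum(m[s][s] for s in STATES) / total(m)


def q_obs(m, s):
    """Correct predictions as a percentage of residues observed in s."""
    return 100.0 * m[s][s] / sum(m[s][p] for p in STATES)


def q_pred(m, s):
    """Correct predictions as a percentage of residues predicted in s."""
    return 100.0 * m[s][s] / sum(m[o][s] for o in STATES)


def matthews(m, s):
    """Matthews correlation coefficient for state s versus the rest."""
    tp = m[s][s]
    fn = sum(m[s][p] for p in STATES) - tp
    fp = sum(m[o][s] for o in STATES) - tp
    tn = total(m) - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(round(q_total(MATRIX), 1))  # 72.1
print(round(q_obs(MATRIX, "H"), 1), round(q_pred(MATRIX, "H"), 1))
print(round(matthews(MATRIX, "H"), 2))
```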



Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The network predicts the three secondary structure types using real
numbers from the output units. The prediction is assigned by choosing
the maximal unit ("winner takes all").  However, the real numbers
contain additional information.
E.g. the difference between the maximal and the second largest output
unit can be used to derive a "reliability index".  This index is given
for each residue along with the prediction.  The index is scaled to
have values between 0 (lowest reliability), and 9 (highest).
The accuracies (Qtot) to be expected for residues with an index at or
above a particular value are given below, together with the fraction
of such residues (%res):

+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |     |
| Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2|
|      |     |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4|
| E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1|
|      |     |     |     |     |     |     |     |     |     |     |
| H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4|
| E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

The above table gives the cumulative results, e.g. 62.5% of all
residues have a reliability of at least 5.  The overall three-state
accuracy for this subset of almost two thirds of all residues is 82.9%.
For this subset, e.g., 83.1% of the observed helices are correctly
predicted, and 86.9% of all residues predicted to be in helix are
correct.
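The "winner takes all" assignment and the reliability index described
at the start of this section can be sketched as follows.  The scaling
used here (ten times the difference between the two largest outputs,
truncated and capped at 9) is an assumption made for illustration; the
text above only states that the index is derived from this difference
and scaled to 0-9.

```python
STATES = ("H", "E", "L")


def predict_with_reliability(outputs):
    """outputs: three output values in [0, 1], ordered H, E, L.

    Returns the winning state and a reliability index between 0 and 9.
    """
    ranked = sorted(zip(outputs, STATES), reverse=True)
    (top, state), (second, _) = ranked[0], ranked[1]
    # Assumed scaling: difference of the two largest outputs, times 10.
    reliability = min(9, int(10 * (top - second)))
    return state, reliability
```

For instance, outputs of (0.05, 0.15, 0.80) give loop with index 6,
while two nearly equal top outputs give an index of 0.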

..........................................................................

The following table gives the non-cumulative quantities, i.e. the
values per reliability index range.  These numbers answer the question:
how reliable is the prediction for all residues labeled with the
particular index i.

+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |  8.8|  9.5|  9.3|  9.1|  9.7| 10.5| 12.5| 15.7| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |
| Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2|
|      |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4|
| E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1|
|      |     |     |     |     |     |     |     |     |     |
| H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4|
| E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+

For example, for residues with Relindex = 5, 64% of all predicted
beta-strand residues are correctly identified.
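The cumulative and the non-cumulative tables are related by a weighted
average.  The sketch below rebuilds cumulative %res and Qtot values
from the per-index rows; the numbers are copied from the non-cumulative
table above, which starts at index 1, and the function name is chosen
for illustration.

```python
# (index, %res, Qtot) for each reliability index, from the non-cumulative table.
PER_INDEX = [
    (1, 8.8, 46.6), (2, 9.5, 50.6), (3, 9.3, 57.7),
    (4, 9.1, 62.6), (5, 9.7, 67.9), (6, 10.5, 74.2),
    (7, 12.5, 82.2), (8, 15.7, 88.3), (9, 14.1, 94.2),
]


def cumulative(per_index, threshold):
    """Coverage and accuracy for all residues with index >= threshold."""
    rows = [(res, q) for idx, res, q in per_index if idx >= threshold]
    coverage = sum(res for res, _ in rows)
    # Accuracy of the pooled subset: average weighted by residue fraction.
    accuracy = sum(res * q for res, q in rows) / coverage
    return coverage, accuracy


coverage5, accuracy5 = cumulative(PER_INDEX, 5)
coverage8, accuracy8 = cumulative(PER_INDEX, 8)
```

Up to rounding, this should reproduce the cumulative entries, e.g.
62.5% of the residues and 82.9% accuracy for an index of at least 5.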





   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~		
   Solvent accessibility prediction by PHDacc:
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~		

   Author:             Burkhard Rost		
                       EMBL, Heidelberg, FRG
                       Meyerhofstrasse 1, 69 117 Heidelberg
                       Internet: Rost@EMBL-Heidelberg.DE 		

   All rights reserved.




About the network method
~~~~~~~~~~~~~~~~~~~~~~~

The network for prediction of secondary structure is described in
detail in:
  Rost, Burkhard; Sander, Chris:
  Prediction of protein structure at better than 70% accuracy.
  J. Mol. Biol., 1993, 232, 584-599.

The analysis of the prediction of solvent exposure is given in:
  Rost, Burkhard; Sander, Chris:
  Conservation and prediction of solvent accessibility in protein
  families.  Proteins, 1994, 20, 216-226.

To be quoted for publications of PHD exposure prediction:
  Both papers quoted above.



Definition of accessibility
~~~~~~~~~~~~~~~~~~~~~~~~~~

The network was trained on the accessible surface area values from DSSP
(Dictionary of Secondary Structure of Proteins; Kabsch & Sander (1983)
Biopolymers, 22, 2577-2637).  The prediction provides values for the
relative solvent accessibility.  The normalisation is the following:

|                           ACCESSIBILITY (from DSSP in Angstrom^2)
|RELATIVE_ACCESSIBILITY =   ------------------------------------- * 100
|                               MAXIMAL_ACC (amino acid type i)

where MAXIMAL_ACC (i) is the maximal accessibility of amino acid type i.
The maximal values are:

+----+----+----+----+----+----+----+----+----+----+----+----+
|  A |  B |  C |  D |  E |  F |  G |  H |  I |  K |  L |  M |
| 106| 160| 135| 163| 194| 197|  84| 184| 169| 205| 164| 188|
+----+----+----+----+----+----+----+----+----+----+----+----+
|  N |  P |  Q |  R |  S |  T |  V |  W |  X |  Y |  Z |
| 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196|
+----+----+----+----+----+----+----+----+----+----+----+

Notation: one letter code for amino acid, B stands for D or N; Z stands
  for E or Q; and X stands for undetermined.

The solvent accessibility (the DSSP value in Angstrom^2, not the
relative value) can be used to estimate the number of water molecules
(W) in contact with the residue:

W = ACCESSIBILITY / 10

The prediction is given in 10 states for relative accessibility, with

RELATIVE_ACCESSIBILITY = (PREDICTED_ACC * PREDICTED_ACC)

where PREDICTED_ACC = 0 - 9.
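The normalisation and the 10-state encoding above can be sketched as
follows.  Mapping a relative accessibility onto a state by taking the
integer part of its square root is an assumption, chosen so that state
n covers values from n*n % upwards; the function names are illustrative
only.

```python
import math

# Maximal accessibility per amino acid type, from the table above.
MAX_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197,
    "G": 84, "H": 184, "I": 169, "K": 205, "L": 164, "M": 188,
    "N": 157, "P": 136, "Q": 198, "R": 248, "S": 130, "T": 142,
    "V": 142, "W": 227, "X": 180, "Y": 222, "Z": 196,
}


def relative_accessibility(acc, aa):
    """Relative accessibility in percent from a DSSP accessibility value."""
    return 100.0 * acc / MAX_ACC[aa]


def to_state(rel_acc):
    """Assumed encoding: integer part of sqrt(rel_acc), capped at 9."""
    return min(9, int(math.sqrt(max(rel_acc, 0.0))))


def from_state(state):
    """Relative accessibility (percent) represented by a predicted state."""
    return state * state


def water_molecules(acc):
    """Estimated number of water molecules in contact with the residue."""
    return acc / 10
```

An alanine with a DSSP accessibility of 53, for example, has a relative
accessibility of 50% and falls into state 7 (7*7 = 49%).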



Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A careful cross validation test on some 238 protein chains (in total
about 62,000 residues) with less than 25% pairwise sequence identity
gave the following results:


Correlation
...........

The correlation between observed and predicted solvent accessibility
is:

-----------
corr = 0.53
-----------

This value ought to be compared to the worst and best case prediction
scenario: random prediction (corr = 0.0) and homology modelling
(corr = 0.66).  (Note: homology modelling yields a relatively accurate
prediction in 3D if, and only if, a sequence with significant identity
has a known 3D structure.)


3-state accuracy
................

Often the relative accessibility is projected onto, e.g., 3 states:
  b  = buried       (here defined as < 9% relative accessibility),
  i  = intermediate ( 9% <= rel. acc. < 36% ),
  e  = exposed      ( rel. acc. >= 36% ).
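The projection onto three states can be sketched as below, using the
thresholds just listed.  The helper that converts a line of 10-state
digits assumes that digit n stands for n*n % and that intermediate is
written as a blank, as in the PHD output lines; both function names are
illustrative.

```python
def three_state(rel_acc):
    """Project a relative accessibility (percent) onto b, i or e."""
    if rel_acc < 9:
        return "b"
    if rel_acc < 36:
        return "i"
    return "e"


def project_digits(ten_state, intermediate=" "):
    """Convert a string of 10-state digits into a 3-state string."""
    symbols = []
    for digit in ten_state:
        symbol = three_state(int(digit) ** 2)
        symbols.append(intermediate if symbol == "i" else symbol)
    return "".join(symbols)
```

With these thresholds, states 0-2 are buried, 3-5 intermediate and 6-9
exposed.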

A projection onto 3 states or 2 states (buried/exposed) enables the
compilation of a 3- and 2-state prediction accuracy.  PHD reaches an
overall 3-state accuracy of:
  Q3 = 57.5%
(compared to 35% for random prediction and 70% for homology modelling).

In detail:

+-----------------------------------+-------------------------+
| Qburied       (% of observed)=77% | Qb (% of predicted)=60% |
| Qintermediate (% of observed)= 9% | Qi (% of predicted)=44% |
| Qexposed      (% of observed)=78% | Qe (% of predicted)=56% |
+-----------------------------------+-------------------------+


10-state accuracy
.................

The network predicts relative solvent accessibility in 10 states, with
state i (i = 0-9) corresponding to a relative solvent accessibility of
i*i %.  The 10-state accuracy of the network is:

  Q10 = 24.5%

..........................................................................

These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

|                     number of correctly predicted residues
|Q3                 = --------------------------------------- (*100)
|                     number of all residues
|
|                     no of res. correctly predicted to be buried
|Qburied (% of obs) = ------------------------------------------- (*100)
|                     no of all res. observed to be buried
|
|
|                     no of res. correctly predicted to be buried
|Qburied (% of pred)= ------------------------------------------- (*100)
|                     no of all residues predicted to be buried
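A minimal sketch of these three definitions, assuming observed and
predicted states are given as two strings of equal length over b, i
and e.  The example strings are invented for illustration.

```python
def q3(observed, predicted):
    """Percentage of residues predicted in the correct state."""
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def q_of_observed(observed, predicted, state):
    """Of all residues observed in `state`, percentage predicted correctly."""
    correct = sum(o == p == state for o, p in zip(observed, predicted))
    return 100.0 * correct / observed.count(state)


def q_of_predicted(observed, predicted, state):
    """Of all residues predicted in `state`, percentage that are correct."""
    correct = sum(o == p == state for o, p in zip(observed, predicted))
    return 100.0 * correct / predicted.count(state)


# Hypothetical ten-residue example.
obs = "bbbbiieeee"
prd = "bbbeibeeeb"
```

For this toy example, 7 of 10 residues are correct, 3 of the 4 observed
buried residues are found, and 3 of the 5 residues predicted as buried
are right.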

..........................................................................

Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most reasonable way to compute the overall accuracies is the above
quoted percentage of correctly predicted residues.  However, since the
user is mainly interested in the expected performance of the prediction
for a particular protein, the mean value when averaging over protein
chains might be of help as well.  Computing first the correlation
between observed and predicted accessibility for each protein chain, and
then averaging over all 238 chains yields the following average:

+-------------------------------====--+
| corr/averaged over chains   = 0.53  |
+-------------------------------====--+
| standard deviation          = 0.11  |
+-------------------------------------+
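The per-chain averaging can be sketched as follows.  The use of the
Pearson correlation and of the population standard deviation are
assumptions, since the report does not state which variants were used;
the two toy chains are invented for illustration.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_over_chains(chains):
    """chains: list of (observed, predicted) pairs, one pair per chain."""
    correlations = [pearson(obs, prd) for obs, prd in chains]
    return mean(correlations), pstdev(correlations)


# Two hypothetical chains with observed and predicted accessibilities.
toy_chains = [
    ([1, 2, 3, 4], [2, 4, 6, 8]),
    ([1, 2, 3, 4], [2, 1, 4, 3]),
]
avg_corr, sd_corr = average_over_chains(toy_chains)
```

Each chain contributes equally to this average regardless of its
length, which is what distinguishes it from the per-residue value.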

..........................................................................

Further details of performance accuracy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The accuracy matrix in detail:
..............................

-------+----------------------------------------------------+-----------
\ PHD |    0    1   2   3    4    5     6     7    8    9  |  SUM  %obs
-------+----------------------------------------------------+-----------
OBS  0 | 8611  140   8  44   82  169   772   334   27    0  | 10187 16.6
OBS  1 | 4367  164   0  50  106  231   738   346   44    3  |  6049  9.8
OBS  2 | 3194  168   1  68  125  303   951   513   42    7  |  5372  8.7
OBS  3 | 2760  159   8  80  136  327  1246   746   58   19  |  5539  9.0
OBS  4 | 2312  144   2  72  166  396  1615  1245  124   19  |  6095  9.9
OBS  5 | 1873   96   3  84  138  425  1979  1834  187   27  |  6646 10.8
OBS  6 | 1387   67   1  60   80  278  2237  2627  231   51  |  7019 11.4
OBS  7 | 1082   35   0  32   56  225  1871  3107  302   60  |  6770 11.0
OBS  8 |  660   25   0  27   43  136  1206  2374  325   87  |  4883  7.9
OBS  9 |  325   20   2  27   29   74   648  1159  366  214  |  2864  4.7
-------+----------------------------------------------------+-----------
SUM    |26571 1018  25 544  961 2564 13263 14285 1706  487  |
%pred  | 43.3  1.7 0.0 0.9  1.6  4.2  21.6  23.3  2.8  0.8  |
-------+----------------------------------------------------+-----------

Note: This table is to be read in the following manner:
     of all residues predicted to have 0% relative accessibility, 8611
     were observed with 0% relative accessibility.  However, 325 of all
     residues predicted to have 0% are observed as completely exposed
     (obs = 9 -> rel. acc. >= 81%).  The term "observed" refers to the
     DSSP compilation of area of solvent accessibility calculated from
     3D coordinates of experimentally determined structures (Dictionary
     of Secondary Structure of Proteins: Kabsch & Sander (1983)
     Biopolymers, 22, 2577-2637).
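The 3-state figures quoted earlier can be recovered from this 10-state
matrix by pooling states.  The sketch below assumes the grouping 0-2 =
buried, 3-5 = intermediate, 6-9 = exposed, which follows from the 9%
and 36% thresholds when state n stands for n*n %; names are chosen for
illustration.

```python
# Rows: observed state 0-9, columns: predicted state 0-9 (from the table above).
MATRIX = [
    [8611, 140, 8, 44, 82, 169, 772, 334, 27, 0],
    [4367, 164, 0, 50, 106, 231, 738, 346, 44, 3],
    [3194, 168, 1, 68, 125, 303, 951, 513, 42, 7],
    [2760, 159, 8, 80, 136, 327, 1246, 746, 58, 19],
    [2312, 144, 2, 72, 166, 396, 1615, 1245, 124, 19],
    [1873, 96, 3, 84, 138, 425, 1979, 1834, 187, 27],
    [1387, 67, 1, 60, 80, 278, 2237, 2627, 231, 51],
    [1082, 35, 0, 32, 56, 225, 1871, 3107, 302, 60],
    [660, 25, 0, 27, 43, 136, 1206, 2374, 325, 87],
    [325, 20, 2, 27, 29, 74, 648, 1159, 366, 214],
]
# Assumed pooling of the 10 states into buried, intermediate, exposed.
GROUPS = {"b": range(0, 3), "i": range(3, 6), "e": range(6, 10)}


def block(obs_state, pred_state):
    """Number of residues observed in one 3-state class and predicted in another."""
    return sum(
        MATRIX[o][p] for o in GROUPS[obs_state] for p in GROUPS[pred_state]
    )


total = sum(sum(row) for row in MATRIX)
q3 = 100.0 * sum(block(s, s) for s in GROUPS) / total
qb_obs = 100.0 * block("b", "b") / sum(block("b", p) for p in GROUPS)
qb_pred = 100.0 * block("b", "b") / sum(block(o, "b") for o in GROUPS)
```

Under this grouping the sketch yields Q3 of about 57.5%, with about 77%
of the observed buried residues found and about 60% of the buried
predictions correct, in line with the values given above.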


Accuracy for each amino acid:
.............................

+---+------------------------------+-----+-------+------+
|AA |   Q3 b%o b%p i%o i%p e%o e%p | Q10 |  corr |    N |
+---+------------------------------+-----+-------+------+
| A | 59.0  87  60   2  38  66  57 |  31 | 0.530 | 5054 |
| C | 62.0  91  67   5  39  25  21 |  34 | 0.244 |  893 |
| D | 56.5  21  45   6  49  94  57 |  20 | 0.321 | 3536 |
| E | 60.8   9  40   3  41  98  61 |  21 | 0.347 | 3743 |
| F | 63.3  94  67   9  46  29  37 |  27 | 0.366 | 2436 |
| G | 52.1  75  51   1  31  67  53 |  22 | 0.405 | 4787 |
| H | 50.9  63  53  23  45  71  50 |  18 | 0.442 | 1366 |
| I | 64.9  95  68   6  41  30  38 |  34 | 0.360 | 3437 |
| K | 66.6   2  11   2  37  98  67 |  23 | 0.267 | 3652 |
| L | 61.6  93  65   8  44  31  40 |  31 | 0.368 | 5016 |
| M | 60.1  92  64   5  39  45  44 |  29 | 0.452 | 1371 |
| N | 55.5  45  45   8  38  87  59 |  17 | 0.410 | 2923 |
| P | 53.0  48  48   9  39  83  56 |  18 | 0.364 | 2920 |
| Q | 54.3  27  44   7  44  92  56 |  20 | 0.344 | 2225 |
| R | 49.9  15  47  36  47  76  51 |  18 | 0.372 | 2765 |
| S | 55.6  69  53   3  51  81  56 |  22 | 0.464 | 3981 |
| T | 51.8  61  51   8  38  78  53 |  21 | 0.432 | 3740 |
| V | 61.1  93  65   5  40  39  42 |  34 | 0.418 | 4156 |
| W | 56.2  85  62  20  49  29  27 |  21 | 0.318 |  891 |
| Y | 49.7  73  52  33  49  36  38 |  19 | 0.359 | 2301 |
+---+------------------------------+-----+-------+------+

Abbreviations:

AA:   amino acid in one-letter code
b%o, i%o, e%o:   = Qburied, Qintermediate, Qexposed (% of observed),
     i.e. percentage of the residues observed in each state that are
     correctly predicted, see above
b%p, i%p, e%p:   = Qburied, Qintermediate, Qexposed (% of predicted),
     i.e. probability that a prediction of each state is correct,
     see above
Q10:  percentage of correctly predicted residues in each of the 10
     states of predicted relative accessibility.
corr: correlation between predicted and observed rel. acc.
N:    number of residues in data set


Accuracy for different secondary structure:
...........................................

+--------+------------------------------+----+-------+-------+
| type   |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |     N |
+--------+------------------------------+----+-------+-------+
| helix  | 59.5  79  64   8  44  80  56 | 27 | 0.574 | 20100 |
| strand | 61.3  84  73   9  46  69  37 | 35 | 0.524 | 13356 |
| loop   | 54.4  64  43  11  44  78  61 | 18 | 0.442 | 27968 |
+--------+------------------------------+----+-------+-------+

Abbreviations as before.



Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The network predicts the 10 states for relative accessibility using real
numbers from the output units. The prediction is assigned by choosing
the maximal unit ("winner takes all").  However, the real numbers
contain additional information.
E.g. the difference between the maximal and the second largest output
unit (with the constraint that the second largest output is compiled
among all units at least 2 positions off the maximal unit) can be used
to derive a "reliability index".  This index is given for each residue
along with the prediction.  The index is scaled to have values between
0 (lowest reliability), and 9 (highest).
The accuracies (Q3, corr, etc.) to be expected for residues with an
index at or above a particular value are given below, together with the
fraction of such residues (%res):

+---+------------------------------+----+-------+-------+
|RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |
+---+------------------------------+----+-------+-------+
| 0 | 57.5  77  60   9  44  78  56 | 24 | 0.535 | 100.0 |
| 1 | 59.1  76  63   9  45  82  57 | 25 | 0.560 |  91.2 |
| 2 | 61.7  79  66   4  47  87  58 | 27 | 0.594 |  77.1 |
| 3 | 66.6  87  70   1  51  89  63 | 30 | 0.650 |  57.1 |
| 4 | 70.0  89  72   0  83  91  67 | 32 | 0.686 |  45.8 |
| 5 | 72.9  92  75   0   0  93  70 | 34 | 0.722 |  35.6 |
| 6 | 76.3  95  77   0   0  93  75 | 36 | 0.769 |  24.7 |
| 7 | 79.0  97  79   0   0  93  78 | 39 | 0.803 |  16.0 |
| 8 | 80.9  98  80   0   0  91  81 | 43 | 0.824 |   9.6 |
| 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |
+---+------------------------------+----+-------+-------+

Abbreviations as before.

The above table gives the cumulative results, e.g. 45.8% of all
residues have a reliability of at least 4.  The correlation for this
most reliably predicted half of the residues is 0.686, i.e. a value
comparable to what could be expected if homology modelling were
possible.  For this subset of 45.8% of all residues, 89% of the buried
residues are correctly predicted, and 72% of all residues predicted to
be buried are correct.

..........................................................................

The following table gives the non-cumulative quantities, i.e. the
values per reliability index range.  These numbers answer the question:
how reliable is the prediction for all residues labeled with the
particular index i.

+---+------------------------------+----+-------+-------+
|RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |
+---+------------------------------+----+-------+-------+
| 0 | 40.9  79  40  16  41  21  40 | 14 | 0.175 |   8.8 |
| 1 | 45.4  61  46  28  44  48  44 | 17 | 0.278 |  14.1 |
| 2 | 47.4  53  52  10  46  80  44 | 19 | 0.343 |  19.9 |
| 3 | 52.9  75  59   4  50  77  47 | 23 | 0.439 |  11.4 |
| 4 | 60.0  81  63   0  83  84  56 | 25 | 0.547 |  10.1 |
| 5 | 65.2  82  70   0   0  93  62 | 28 | 0.607 |  10.9 |
| 6 | 71.3  90  72   0   0  94  70 | 31 | 0.692 |   8.8 |
| 7 | 76.0  94  76   0   0  95  75 | 34 | 0.762 |   6.3 |
| 8 | 80.5  97  81   0   0  94  79 | 39 | 0.808 |   3.8 |
| 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |
+---+------------------------------+----+-------+-------+

For example, of all residues with RI = 4 that are predicted as
intermediate, 83% are correctly predicted as such.






The resulting network (PHD) prediction is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________



 PHD: Profile fed neural network systems from HeiDelberg
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Prediction of:			
	secondary structure,   			   by PHDsec		
	solvent accessibility, 			   by PHDacc		
	and helical transmembrane regions, 	   by PHDhtm		

 Author:             						
	Burkhard Rost							
    EMBL, 69012 Heidelberg, Germany					
    Internet: Rost@EMBL-Heidelberg.DE				

 All rights reserved.



 The network systems are described in:   	                     	

 PHDsec:    B Rost & C Sander: JMB, 1993, 232, 584-599.		
 		B Rost & C Sander: Proteins, 1994, 19, 55-72.		
 PHDacc:  	B Rost & C Sander: Proteins, 1994, 20, 216-226.		
 PHDhtm:  	B Rost et al.: 	   Prot. Science, 1995, 4, 521-533.	



 Some statistics
 ~~~~~~~~~~~~~~~

 Percentage of amino acids:
 +--------------+--------+--------+--------+--------+--------+
 | AA:          |    L   |    N   |    V   |    I   |    D   |
 | % of AA:     |   10.8 |    7.4 |    7.2 |    7.2 |    6.9 |
 +--------------+--------+--------+--------+--------+--------+
 | AA:          |    Q   |    R   |    E   |    T   |    A   |
 | % of AA:     |    6.8 |    6.1 |    5.9 |    5.7 |    5.7 |
 +--------------+--------+--------+--------+--------+--------+
 | AA:          |    K   |    S   |    Y   |    M   |    F   |
 | % of AA:     |    5.4 |    4.9 |    4.0 |    3.9 |    3.9 |
 +--------------+--------+--------+--------+--------+--------+
 | AA:          |    P   |    G   |    H   |    W   |    C   |
 | % of AA:     |    3.4 |    2.4 |    1.4 |    0.7 |    0.6 |
 +--------------+--------+--------+--------+--------+--------+

 Percentage of secondary structure predicted:
 +--------------+--------+--------+--------+
 | SecStr:      |    H   |    E   |    L   |
 | % Predicted: |   61.4 |    8.6 |   30.0 |
 +--------------+--------+--------+--------+

 According to the following classes:
    all-alpha:   %H>45 and %E< 5; all-beta : %H<5 and %E>45
    alpha-beta : %H>30 and %E>20; mixed:    rest,
 this means that the predicted class is:           mixed class
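 The class assignment above can be sketched as follows, assuming the
 predicted secondary structure is available as one string in which a
 blank stands for loop (as in the PHD sec lines below).  Function names
 are illustrative only.

```python
def composition(sec):
    """Percentage of helix (H), strand (E) and other residues in a string."""
    n = len(sec)
    pct_h = 100.0 * sec.count("H") / n
    pct_e = 100.0 * sec.count("E") / n
    return {"H": pct_h, "E": pct_e, "L": 100.0 - pct_h - pct_e}


def structural_class(pct_h, pct_e):
    """Apply the four class rules listed above, in the order given."""
    if pct_h > 45 and pct_e < 5:
        return "all-alpha"
    if pct_h < 5 and pct_e > 45:
        return "all-beta"
    if pct_h > 30 and pct_e > 20:
        return "alpha-beta"
    return "mixed"
```

 With 61.4% helix and 8.6% strand, the strand content is too high for
 all-alpha and too low for alpha-beta, so the result is "mixed".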



 PHD output for your protein
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Thu Dec 23 04:37:09 1999
 Jury on:       10    different architectures (version   5.94_317 ).
 Note: differently trained architectures, i.e., different versions can
 result in different predictions.



 About the protein
 ~~~~~~~~~~~~~~~~~

 HEADER     /home/phd/server/work/predict_h11817647.
 COMPND
 SOURCE
 AUTHOR
 SEQLENGTH   881
 NCHAIN        1 chain(s) in predict_h11817647 data set
 NALIGN        1
 (=number of aligned sequences in HSSP file)



 WARNING
 ~~~~~~~

 Expected accuracy is about 72% if, and only if, the alignment contains
 sufficient information.  For your sequence no homologue was detected
 in the current version of Swissprot.  This implies that the expected
 accuracy is about 6-10 percentage points lower!



 Abbreviations: PHDsec
 ~~~~~~~~~~~~~~~~~~~~~

 sequence:
    AA : amino acid sequence
 secondary structure:
    HEL: H=helix, E=extended (sheet), blank=other (loop)
    PHD: Profile network prediction HeiDelberg
    Rel: Reliability index of prediction (0-9)
 detail:
    prH: 'probability' for assigning helix
    prE: 'probability' for assigning strand
    prL: 'probability' for assigning loop
         note: the 'probabilities' are scaled to the interval 0-9, e.g.,
               prH=5 means that the first output node is 0.5-0.6
 subset:
    SUB: a subset of the prediction, for all residues with an expected
         average accuracy > 82% (tables in header)
         note: for this subset the following symbols are used:
      L: is loop (for which above " " is used)
    ".": means that no prediction is made for this residue, as the
         reliability is:  Rel < 5

 Abbreviations: PHDacc
 ~~~~~~~~~~~~~~~~~~~~~

    SS : secondary structure
    HEL: H=helix, E=extended (sheet), blank=other (loop)
 solvent accessibility:
    3st: relative solvent accessibility (acc) in 3 states:
         b = 0-9%, i = 9-36%, e = 36-100%.
    PHD: Profile network prediction HeiDelberg
    Rel: Reliability index of prediction (0-9)
    O_3: observed relative acc. in 3 states: B, I, E
         note: for convenience a blank is used for intermediate (i).
    P_3: predicted relative accessibility in 3 states
    10st:relative accessibility in 10 states:
         = n corresponds to a relative acc. of n*n %
 subset:
    SUB: a subset of the prediction, for all residues with an expected
         average correlation > 0.69 (tables in header)
         note: for this subset the following symbols are used:
    "I": is intermediate (for which above " " is used)
    ".": means that no prediction is made for this residue, as the
         reliability is: Rel < 4
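 The SUB lines for both PHDsec and PHDacc can be derived from the
 prediction line and the Rel line.  A minimal sketch, assuming both
 lines are given as strings of equal length without the enclosing "|"
 characters; the function name and parameters are illustrative.

```python
def subset_line(prediction, reliability, min_rel, blank_symbol):
    """Build a SUB line from a prediction line and its Rel line.

    Residues with Rel < min_rel become ".", and a blank in the
    prediction is written as blank_symbol ("L" for sec, "I" for acc).
    """
    out = []
    for symbol, rel in zip(prediction, reliability):
        if int(rel) < min_rel:
            out.append(".")
        elif symbol == " ":
            out.append(blank_symbol)
        else:
            out.append(symbol)
    return "".join(out)


# Start of the first output block of this report (PHDsec, cut-off Rel < 5).
sub_sec = subset_line("         HHHHHHH", "9996578721368545", 5, "L")
# Start of the first output block of this report (PHDacc, cut-off Rel < 4).
sub_acc = subset_line("eee eeeeeeeeeeeee", "00114123314142345", 4, "I")
```

 The two calls use the first residues of the first output block and
 should give the same characters as its SUB sec and SUB acc lines.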



 protein:       predict        length      881



                  ....,....1....,....2....,....3....,....4....,....5....,....6
         AA      |MAYRKRGATVEADINNNDRMQEKDDEKQDQNNRMQLSDKVLSKKEEVVTDSQEEIKIRDE|
         PHD sec |         HHHHHHHHHHHHH HHHHHHHHHHHHH      EEEEE   HHHHHHHHHH|
         Rel sec |999657872136854543433314699667556443241344122122890435556724|
 detail:
         prH sec |000000013567876765656546799778777665324222110100004666767856|
         prE sec |000121000000000000000000000000000000101111445454100000010000|
         prL sec |988778875432123233333353200221222223564556333435894232221133|
 subset: SUB sec |LLLLLLLL...HHH.H........HHHHHHHHH...............LL...HHHHH..|
 accessibility
 3st:    P_3 acc |eee eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeebeeeeeeeeeeebeeeeeeeeeeee|
 10st:   PHD acc |996576877778777787777777777777777760777777777860777877777787|
         Rel acc |001141233141423453334242413325613312425411351101532451271234|
 subset: SUB acc |....e.....e.e..ee...e.e.e....ee.....e.ee...e....e..ee..e...e|


                  ....,....7....,....8....,....9....,....10...,....11...,....12
         AA      |VKKSTKEESKQLLEVLKTKEEHQKEIQYEILQKTIPTFEPKESILKKLEDIKPEQAKKQT|
         PHD sec |H     HHHHHHHHHHHHHHHHHHHHHHHHHHHH      HHHHHHHHH   HHHHHHHH|
         Rel sec |305543588999999928999627899999999547978637999866615388876778|
 detail:
         prH sec |542233688999999958899758889999999731010167998877742388887888|
         prE sec |110000000000000000000000000000000000000000000000000000000000|
         prL sec |346665211000000041000241100000000268988731000122257511112111|
 subset: SUB sec |..LL..HHHHHHHHHH.HHHHH.HHHHHHHHHHH.LLLLL.HHHHHHHH.L.HHHHHHHH|
 accessibility
 3st:    P_3 acc |beeeeeeebeebbebbebeeeeeeeebeebbee bbe eeeeebbee eeeeeeebeeee|
 10st:   PHD acc |077779780770070070677777760770077500747777700773787777707777|
         Rel acc |433343510235434431132314312224733040302145064240430336205610|
 subset: SUB acc |b...e.e....bb.bb.......e.....bb...b.....ee.bb.e.e....e..ee..|


                  ....,....13...,....14...,....15...,....16...,....17...,....18
         AA      |KLFRIFEPRQLPIYRANGEKELRNRWYWKLKKDTLPDGDYDVREYFLNLYDQVLTEMPDY|
         PHD sec |HHHHHH              HHHHHHHHHHHH       HHHHHHHHHHHHHHHHH   H|
         Rel sec |876643258998556688511467777668827997877158999999999999982236|
 detail:
         prH sec |877655211000000001234677777778851001011578999999999999984437|
         prE sec |011121210000221100011100000110000000000000000000000000000000|
         prL sec |111122468898667788643221111110137997887421000000000000015551|
 subset: SUB sec |HHHH...LLLLLLLLLLLL...HHHHHHHHH.LLLLLLL.HHHHHHHHHHHHHHHH...H|
 accessibility
 3st:    P_3 acc |ebbbbbeeee ebbe ebeeebee bbbeb eeebeebe ebbebbbebbeebbeebbeb|
 10st:   PHD acc |600000777747006570777077500060577707706370060006007600770070|
         Rel acc |157043201311212013233222152424023310400128117961810286414200|
 subset: SUB acc |.bb.b....................b.b.b......e....b..bbb.b...bbe.b...|


                  ....,....19...,....20...,....21...,....22...,....23...,....24
         AA      |LLLKDMAVENKNSRDAGKVVDSETASICDAIFQDEETEGAVRRFIAEMRQRVQADRNVVN|
         PHD sec |HHHHH                HHHHHHHHHHHH HHHHHHHHHHHHHHHHHHH       |
         Rel sec |775351136999724477760679999999950415688999999999998626567434|
 detail:
         prH sec |776563431000143311114789999999864347788999999999998751111233|
         prE sec |112311011000000001100000000000010000000000000000000010001100|
         prL sec |001114557998856677774210000000024642211000000000001137777656|
 subset: SUB sec |HHH.H...LLLLL...LLLL.HHHHHHHHHHH...HHHHHHHHHHHHHHHHH.LLLL...|
 accessibility
 3st:    P_3 acc |bbbeebbbebeebeebbebbeeeeeebbebbbeeeebeb bbbbbbeb e beeeebeee|
 10st:   PHD acc |000660007077079007007776770070006677070400000070574077770677|
         Rel acc |779106541052122652251541455500930022211080166017101430210013|
 subset: SUB acc |bbb..bbb..e....bb..b.ee.eebb..b.........b..bb..b...b........|


                  ....,....25...,....26...,....27...,....28...,....29...,....30
         AA      |YPSILHPIDYAFNEYFLQHQLVEPLNNDIIFNYIPERIRNDVNYILNMDRNLPSTARYIR|
         PHD sec |        HHHHHHHHHHHHHH    HHHHHHHHHHHH     EEEE             |
         Rel sec |633466134799999998775443885999998389752698724411111333685357|
 detail:
         prH sec |233111436899999998876633116899998688863100110123444333200000|
         prE sec |001211000000000000000000000000000000001000056641100000002321|
         prL sec |755567563100000000112366882100000310124798733234345656786578|
 subset: SUB sec |L...LL...HHHHHHHHHHHH...LLHHHHHHH.HHHH.LLLL...........LLL.LL|
 accessibility
 3st:    P_3 acc |bbbbbbbbbeebbebbbebbbbebbeee bbebbbeebee bbbbbbbbeebeeeb bbe|
 10st:   PHD acc |000000000770070006000060067750070008607750000000077077705006|
         Rel acc |233860340225351791227520112308341713141100139544011112121251|
 subset: SUB acc |...bb..b...b.e.bb...bb.......b.e.b...b......bbbb..........b.|


                  ....,....31...,....32...,....33...,....34...,....35...,....36
         AA      |PNLLQDRLNLHDNFESLWDTITTSNYILARSVVPDLKELVSTEAQIQKMSQDLQLEALTI|
         PHD sec |    HHHHH       E    HHHHHHHHHHH HHHHHHHHHHHHHHHHHHHHHHHHHHH|
         Rel sec |824214666169822020232244344556616459999999999999988994598735|
 detail:
         prH sec |033356777420033323223566555667741679999999999999988996688866|
         prE sec |000000000000011342211000011111110000000000000000000001000000|
         prL sec |856543222579855333455433322121147320000000000000011002201132|
 subset: SUB sec |L.....HHH.LLL..............HHHH.L.HHHHHHHHHHHHHHHHHHH.HHHH.H|
 accessibility
 3st:    P_3 acc |bebeeb beb eebebbbebebeeebbbb ebbeebeebbeeeeebeebbeebebeebee|
 10st:   PHD acc |070670306047606000607067700003700670770077776077006606077067|
         Rel acc |114000031401171523111001123190115017324440141357700111521722|
 subset: SUB acc |..b......b...b.b............b...b..b..bbe..e..eeb.....b..b..|


                  ....,....37...,....38...,....39...,....40...,....41...,....42
         AA      |QSETQFLTGINSQAANDCFKTLIAAMLSQRTMSLDFVTTNYMSLISGMWLLTVVPNDMFI|
         PHD sec |HHHHHHHE   HHHHHHHHHHHHHHHHHH  EEEEE HHHHHHHHH  EEEEE    HHH|
         Rel sec |226786106671166999999999998761526451122235576331336645762123|
 detail:
         prH sec |557787431115577899999999998774201114354456677534231100013345|
         prE sec |000001430000000000000000000000157664211110111112557762101222|
         prL sec |342100137784422000000000001124631211333322211353110126874332|
 subset: SUB sec |..HHHH..LLL..HHHHHHHHHHHHHHHH.L.E.E......HHHH.....EE.LLL....|
 accessibility
 3st:    P_3 acc |ebeeebbebeebebbbebbeebbbbbbbe bbbbebbbbbbbbbbbbbbbbbbbbeebbb|
 10st:   PHD acc |707760070660700070066000000074000060000000000000000000076000|
         Rel acc |341315232110146036910696156430151516760438585758398542410147|
 subset: SUB acc |.b...b.......bb..bb..bbb.bbb...b.b.bbb.b.bbbbbbb.bbbb.b...bb|


                  ....,....43...,....44...,....45...,....46...,....47...,....48
         AA      |RESLVACQLAIVNTIIYPAFGMQRMHYRNGDPQTPFQIAEQQIRKFSGSGIGWHFVNNNQ|
         PHD sec |HHHHHHHHHHHHHHHE  HHHHHHHHHH        HHHHHHHHHH     HHHHHHHHH|
         Rel sec |458899999876451134234899997449987711259999999729994225522386|
 detail:
         prH sec |568889998887663112566899988630000143578999998740003456754587|
         prE sec |210000000000113421000000000000001011000000000000000101111100|
         prL sec |121100000011212455433100001368988844420000000159996342133311|
 subset: SUB sec |.HHHHHHHHHHH.H.......HHHHHH..LLLLL...HHHHHHHHH.LLL...HH...HH|
 accessibility
 3st:    P_3 acc |bbbbbbbbbbbbbbbbbbbbbbbebbbeeeeeeeebebbeeeb ebeeeebbbbbbbeee|
 10st:   PHD acc |000000000000000000000006000679777770600767057069790000000676|
         Rel acc |008788749869677957765430721110332241056121604300102543253010|
 subset: SUB acc |..bbbbbbbbbbbbbbbbbbbb..b.........e..bb...b.e......bb..b....|


                  ....,....49...,....50...,....51...,....52...,....53...,....54
         AA      |FRQVVIDGVLNQVLNDNIRNVHVIKQLMQALMQLSRQQFPTMPVDYKRSIQRGILLLSNR|
         PHD sec |HHHHHHH    EE  HHH  HHHHHHHHHHHHHHHHH         HHHHHHHHHHHHHH|
         Rel sec |546544413211111121115899999999999865378889983217888999999742|
 detail:
         prH sec |666666643223334454336899999999999877510000013458888999999865|
         prE sec |001222011334320111221100000000000000000000000000000000000000|
         prL sec |221111245442234433331000000000000122388889986541111000000134|
 subset: SUB sec |H.HH................HHHHHHHHHHHHHHHH.LLLLLLL...HHHHHHHHHHH..|
 accessibility
 3st:    P_3 acc |beebbbebbeeeebeeeb eb bbebbbebbbebbeeebeb  bebe bbebbbbbbbe |
 10st:   PHD acc |076000700666606770570500700070007007760704526075006000000065|
         Rel acc |612070246000041214011088307246741553114010000110540075056400|
 subset: SUB acc |b...b..bb....b...b....bb..b.ebbb.bb...b.........bb..bb.bbb..|


                  ....,....55...,....56...,....57...,....58...,....59...,....60
         AA      |LGQLVDLTRLLAYNYETLMACVTMNMQHVQTLTTEKLQLTSVTSLCMLIGNATVIPSPQT|
         PHD sec |HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEEE    EE     |
         Rel sec |899999999999969988887898668876221367655342332111215751259812|
 detail:
         prH sec |899999999999979988887898778887544577666554555444312100000053|
         prE sec |000000000000000000011000100000112200111111123444431024520000|
         prL sec |100000000000020001000000110111333211122223221111246864479845|
 subset: SUB sec |HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH....HHHHH...........LLL..LLL..|
 accessibility
 3st:    P_3 acc |ebbbbbbb bbbbbbebbbbbbbbbbebbebbbeeebebbbbbbbbbbbbbebbbeebee|
 10st:   PHD acc |600000005000000700000000006007000666060000000000000700077077|
         Rel acc |030510860764142257289939550280232011311124489796711026301121|
 subset: SUB acc |...b..bb.bbb.b..bb.bbb.bbb..b............bbbbbbbb....b......|


                  ....,....61...,....62...,....63...,....64...,....65...,....66
         AA      |LFHYYNVNVNFHSNYNERINDAVAIITAANRLNLYQKKMKAIVEDFLKRLHIFDVARVPD|
         PHD sec |  EEEEEEEE    HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEEE      H|
         Rel sec |114322214315632688999999999999999999999999999988642462576994|
 detail:
         prH sec |332112221001135788989999999999999999999999999988763101212006|
         prE sec |235544445542000000000000000000000000000000000000014673000000|
         prL sec |421232332346764201000000000000000000000000000010122215787993|
 subset: SUB sec |...........LL..HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH...E.LLLLL.|
 accessibility
 3st:    P_3 acc |bbebbbbbbbbbebbbeebeebbbbbbbbbbbebbbeebebbbeebbe bbbbbbbebbe|
 10st:   PHD acc |007000000000600066077000000000007000760700077007400000007009|
         Rel acc |551231126162100011612599393861072420317105931643090830103102|
 subset: SUB acc |bb......b.b.......b..bbb.b.bb..b.b....b..bb..bb..b.b........|


                  ....,....67...,....68...,....69...,....70...,....71...,....72
         AA      |DQMYRLRDRLRLLPVEVRRLDIFNLILMNMDQIERASDKIAQGVIIAYRDMQLERDEMYG|
         PHD sec |HHHHHHHHHHHH    HHHHHHHHHHH  HHHHHHHHHHHHHHHEEEEE    EEE   E|
         Rel sec |899999999852988225799999998132999954599951112686189742216451|
 detail:
         prH sec |899999999875000246789999998435898876799864443100000000000101|
         prE sec |000000000000000121000000000000000000000000125787410125541224|
         prL sec |100000000124988532100000001464000023200024431001488764447664|
 subset: SUB sec |HHHHHHHHHHH.LLL..HHHHHHHHHH...HHHHH.HHHHH....EEE.LLL....L.L.|
 accessibility
 3st:    P_3 acc |ebbb b e b ebbbebb bebbebbbeb eebeebbeebbbbbbbbbeebebeeeee b|
 10st:   PHD acc |700050474057000700506006000604660670077000000000670707676750|
         Rel acc |107008111611233100000780987140119130342651894751122212110204|
 subset: SUB acc |..b..b...b...........bb.bbb.b...b....e.bb.bbbbb............b|


                  ....,....73...,....74...,....75...,....76...,....77...,....78
         AA      |YVNIARNLDGFQQINLEELMRTGDYAQITNMLLNNQPVALVGALPFVTDSSVISLIAKLD|
         PHD sec |EEEEE       EEEHHHHHH   HHHHHHHHHH    EEE   EE   HHHHHHHHHHH|
         Rel sec |265524688611211578985586699999996449927524551047378999999993|
 detail:
         prH sec |111010001111110688987212799999997630000100012211388899999986|
         prE sec |576653100033544000000000000000000000037652214421000000000000|
         prL sec |312236788744344210012787100000002369951136763357611000000013|
 subset: SUB sec |.EEE..LLLL.....HHHHHHLLLHHHHHHHHH..LL.EE..LL...L.HHHHHHHHHH.|
 accessibility
 3st:    P_3 acc |bbbbbeeeebbbbbbbeebeeebebbebbbbbbeeebbbbebbbbbbbebbbbebbbebe|
 10st:   PHD acc |000007697000000066067606006000000786000070000000700007000706|
         Rel acc |012522004240263710512011131953245330233814732163045880872220|
 subset: SUB acc |...b....e.b..b.b..b........bb..bb......b.bb...b..bbbb.bb....|


                  ....,....79...,....80...,....81...,....82...,....83...,....84
         AA      |ATVFAQIVKLRKVDTLKPILYKINSDSNDFYLVANYDWVPTSTTKVYKQVPQQFDFRNSM|
         PHD sec |HHHHHHHHHHHHH     EEEEEE     EEEEE        HHHHHHHHHH E  HHHH|
         Rel sec |399999996442435893377783899647887478898862254653323211516799|
 detail:
         prH sec |699999987665632000000000000100000000000013566765555432147899|
         prE sec |000000000000000003588786100127888610001110001100111123100000|
         prL sec |300000002224357896311113898761111378898875322123233343651100|
 subset: SUB sec |.HHHHHHHH.....LLL..EEEE.LLLL.EEEE.LLLLLLL..H.HH.......L.HHHH|
 accessibility
 3st:    P_3 acc |bbbbbbbbee ebeebebbebeb bebeebbbbbb ebbeeebbebbee eeebebebbb|
 10st:   PHD acc |000000007747077060060604060770000005700777007007757660607000|
         Rel acc |228880852113034410414170212015776010500010113712301103132153|
 subset: SUB acc |..bbb.bb......eb..b.b.b......bbbb...e........b............b.|


                  ....,....85...,....86...,....87...,....88...,....89...,....90
         AA      |HMLTSNLTFTVYSDLLAFVSADTVEPINAVAFDNMRIMNEL|
         PHD sec |HHHHH   EEHHHHHHHHHH             HHHHHH  |
         Rel sec |99852651325998999982167412455455558899339|
 detail:
         prH sec |99875112226888999985410001122221268888530|
         prE sec |00000013551000000000001343211111000000000|
         prL sec |00124764221000000013578644666666621000369|
 subset: SUB sec |HHHH.LL...HHHHHHHHH..LL....LL.LLLHHHHH..L|
 accessibility
 3st:    P_3 acc |ebbebbb bbbbbbbbebbeeebbebbbbbbbbbb bbeee|
 10st:   PHD acc |70060005000000007007770060000000000400779|
         Rel acc |25714070929301843353212415638852002076310|
 subset: SUB acc |.bb.b.b.b.b...bb..b....b.bb.bbb.....bb...|

________________________________________________________________________________





The resulting prediction of globularity is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

---
--- GLOBE: prediction of protein globularity
---
--- nexp =   402    (number of predicted exposed residues)
--- nfit =   307    (number of expected exposed residues)
--- diff =    95.00 (difference nexp-nfit)
--- =====> your protein appears compact, like a globular domain
---
---
--- GLOBE: further explanations preliminarily in:
---        http://www.columbia.edu/~rost/Papers/98globe.html
---
--- END of GLOBE
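
The three GLOBE numbers above are related by simple arithmetic: diff is
nexp minus nfit (402 - 307 = 95). The sketch below is not part of the
PredictProtein output; it is a minimal illustration, assuming the report
lines keep the "--- name = value" layout shown above. The function name
parse_globe and the variable GLOBE_TEXT are hypothetical, chosen only for
this example.

```python
import re

# Report lines copied from the GLOBE block above (layout assumed stable).
GLOBE_TEXT = """\
--- nexp =   402    (number of predicted exposed residues)
--- nfit =   307    (number of expected exposed residues)
--- diff =    95.00 (difference nexp-nfit)
"""

def parse_globe(text):
    """Return a dict with the nexp, nfit and diff values found in a GLOBE report."""
    values = {}
    pattern = r"^---\s+(nexp|nfit|diff)\s*=\s*([-+]?\d+(?:\.\d+)?)"
    for match in re.finditer(pattern, text, re.MULTILINE):
        values[match.group(1)] = float(match.group(2))
    return values

globe = parse_globe(GLOBE_TEXT)

# Recompute the difference and compare it with the value GLOBE reported.
recomputed = globe["nexp"] - globe["nfit"]
print(f"nexp - nfit = {recomputed:.2f} (reported diff = {globe['diff']:.2f})")
# prints: nexp - nfit = 95.00 (reported diff = 95.00)
```

This only checks the reported arithmetic; how GLOBE derives nfit or decides
on the "compact" verdict is described at the URL given in the report.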

________________________________________________________________________________


-----------------------------------------------------------------------------
- PredictProtein (PP): News 1999                                            -
-----------------------------------------------------------------------------
-                                                                           -
- PP home:                                                                  -
  New York            http://dodo.cpmc.columbia.edu/predictprotein
-                                                                           -
- PP mirrors:                                                               -
  Australia (ANGIS)   http://molmod.angis.org.au//predictprotein
  England (EBI)       http://www.ebi.ac.uk/~rost/predictprotein
  Germany (EMBL)      http://www.embl-heidelberg.de/predictprotein
  India (CDFD)        http://iris.cdfd.org.in/~www/pp/predictprotein
  India (Pune)        http://202.41.70.33/predictprotein
  Israel (Beer-Sheva) http://www.cs.bgu.ac.il/~dfischer/predictprotein
  Italy (Rome)        http://obelix.bio.uniroma2.it/www/predictprotein
  Singapore (BIC)     http://embl.bic.nus.edu.sg/predictprotein
  Spain (CNB)         http://www.es.embnet.org/Services/MolBio/PredictProtein
  Switzerland (Glaxo) http://www.gwer.ch/tools/predictprotein
-                                                                           -
- Tools to post-process PP results:                                         -
-                                                                           -
- Generate a PostScript (or GIF, or TIFF):                                  -
  ESPript (New York)  http://dodo.cpmc.columbia.edu/cgi/pp/ESPript
  ESPript (Toulouse)  http://www-pgm1.ipbs.fr:8080/ESPript
-                                                                           -
-----------------------------------------------------------------------------

