J Mol Evol (1996) 42:247-256
Journal of Molecular Evolution © Springer-Verlag New York Inc. 1996
Evolution of the src-Related Protein Tyrosine Kinases
Austin L. Hughes
Department of Biology and Institute of Molecular Evolutionary Genetics, 208 Mueller Lab, The Pennsylvania State University, University Park, PA 16802, USA
Received: 5 August 1995 / Accepted: 24 August 1995
Abstract. A phylogenetic analysis of src-related protein tyrosine kinases (PTKs) showed that one group of these genes is quite ancient in the animals, its divergence predating the divergence of the diploblast and triploblast phyla. Three other major groupings of genes were found to predate the divergence of protostome and deuterostome phyla. Most known src-related PTKs of mammals were found to belong to five well-differentiated families: srcA, srcB, abl, csk, and tec. One srcA gene (fyn) has an alternatively spliced seventh exon which shows a different pattern of relationship from the remainder of the gene; this suggests that this exon may have been derived by a recombinational event with another gene, perhaps one related to fgr. The recently published claim that mammalian members of this family expressed in the nervous system evolve more slowly at nonsynonymous nucleotide sites than do those expressed in the immune system was not supported by an analysis of 13 pairs of human and mouse orthologues. Rather, T-cell-specific src-related PTKs were found to have higher rates of nonsynonymous substitution than those having broader expression. This effect was particularly marked in the peptide binding site of the SH2 domain. While the SH2 binding site was highly conserved among paralogous mammalian members of the srcA and srcB subfamilies, no such effect was seen in the comparison of paralogous members of the csk and tec subfamilies. This suggests that, while the peptide binding function of SH2 is conserved within both srcA and srcB subfamilies, paralogous members of the csk and tec subfamilies have diverged functionally with respect to peptide recognition by SH2.
Key words: Tyrosine kinases - Protein - Phylogenetic analysis
Introduction
Phosphorylation of tyrosine residues of proteins, catalyzed by a superfamily of enzymes known as protein tyrosine kinases (PTKs), plays an important role in the regulation of animal cell differentiation and proliferation (Hanks et al. 1988; Cantley et al. 1991; Klausner and Samelson 1991; Mustelin 1994; Pawson 1995; Taniguchi 1995). Functionally, PTKs of animals can be divided into two major categories: (1) receptor PTKs, which include a transmembrane portion and an extracellular receptor, as well as an intracellular enzyme domain; and (2) nonreceptor or cytoplasmic PTKs, which are entirely cytoplasmic in location and interact with cytoplasmic portions of adhesion receptors or other cell surface receptors. The prototypic nonreceptor PTK gene was the src oncogene, originally discovered as a viral gene and later found to have a cellular counterpart (Czernilofsky et al. 1980). Recently, a number of genes related to src have been described in a variety of animal species, all characterized by possession of the following three domain types (listed here in order from N- to C-terminal): (1) src-homology region 3 (SH3); (2) src-homology region 2 (SH2); and (3) the enzymatic kinase domain (sometimes called SH1) (Bolen 1993). To determine the evolutionary relationships of these genes, I conducted a phylogenetic analysis of src-related
PTKs, operationally defined by the presence of these three domain types. In the case of several of these genes, alternative transcripts, derived by alternative splicing of the primary transcript of a single gene, have been described. To understand how alternative splicing has evolved, I analyzed in detail two alternatively spliced members of this family. In a recent paper, Kuma et al. (1995) argued that the rate of nonsynonymous (amino acid-altering) nucleotide substitution is tissue dependent, with brain-specific genes evolving more slowly than those expressed in other tissues, particularly tissues of the immune system. Their conclusion was based on comparison of putatively orthologous genes of human and mouse, including six src-related PTKs. Because this number represents less than half the number of orthologous pairs belonging to this family for which both human and mouse sequences are available, I extended Kuma et al.'s (1995) analysis to additional genes. Comparing rates of nonsynonymous evolution in functionally distinct gene regions can provide evidence of how related genes have diverged functionally from each other after gene duplication (Hughes 1993). I used this approach to test for functional divergence among related groups of mammalian src-related PTK genes.
Table 1. Sequences used in analyses

Species                          Symbol       Accession no.
Viruses
  Avian sarcoma virus            ASV-src      L21974
                                 ASV-yes      V01170, J02027
  Rous sarcoma virus             RSV-src      D10652
  Feline sarcoma virus           FeSV-abl     M15805
Porifera (sponges)
  Spongilla lacustris            Sl-srk1      X61601
                                 Sl-srk4      X61604
Cnidaria
  Hydra attenuata                Ha-stk       M25245
Arthropoda
  Drosophila melanogaster        Dm-src       M11917
                                 Dm-abl       M19692
                                 Dm-src28C    M16599
Chordata
 Osteichthyes
  Xiphophorus helleri            Xh-yes       X54970
                                 Xh-fyn       X54971
  Xiphophorus xiphidium          Xx-src       X64658
 Amphibia
  Xenopus laevis                 Xl-src       M24704, J04822
                                 Xl-yes       X14377
                                 Xl-fyn       M27502
 Aves
  Chicken Gallus gallus          Ch-src       U00402, J00844
                                 Ch-yes       X13207
                                 Ch-yrk       X67786, X68973
                                 Ch-fyn       X52841
                                 Ch-tkl       J03579
                                 Ch-csk       M85039
 Mammalia
  Mouse Mus musculus             M-src        M17031
                                 M-yes        M67677
                                 M-fgr        X16440
                                 M-fyn        X52481
                                 M-blk        M30903
                                 M-lyn        M64608
                                 M-hck        Y00487
                                 M-lck        X03533
                                 M-srm        D26186
                                 M-abl        J02995, J02996
                                 M-atk        L08967
                                 M-tec        S53716
                                 M-lyk        D14042
                                 M-ntk        L27738
                                 M-csk        U05244
  Mus cookii                     Mco-txk      L35268
  Rat Rattus norvegicus          R-fgr        X57018
                                 R-lyn        L14782
                                 R-hck        X66245
                                 R-ntk        L34542
                                 R-csk        X58631
  Human Homo sapiens             H-src        K03218
                                 H-yes        M15990
                                 H-fgr        M19722, J03429
                                 H-syn        M14333
                                 H-slk        M14676
                                 H-lyn        M16038
                                 H-hck        M16591
                                 H-YT16       X05027
                                 H-lck        X14055
                                 H-frk        U00803
                                 H-arg        M35926
                                 H-abl        X16416
                                 H-atk        X16416
                                 H-tec        D29767
                                 H-txk        L27071
                                 H-csk        X60114

Methods
Sequences Analyzed. DNA and deduced amino acid sequences were analyzed for genes listed in Table 1. The nomenclature of these genes in the literature is complex and sometimes inconsistent. Many authors, following the usage prevalent soon after the discovery of src and related genes, have prefixed gene names with "c-" and "v-" to indicate cellular and viral genes, respectively. In this paper, because only a few of the genes analyzed were of viral origin, these prefixes were not used. Rather, each gene symbol was given a prefix to indicate the species of origin (Table 1). The protein products of src-related genes are sometimes named on the basis of the molecular mass of the protein product; e.g., the 62-kDa product of the yes gene may be referred to as p62-yes. However, since molecular mass differences were not directly relevant to the present study, for simplicity the protein product was referred to by the nonitalicized form of the gene symbol.

Mustelin (1994) classified nonreceptor PTKs in a number of families. The genes analyzed here include representatives of his abl, csk, src, and tec families. He further divided the src family into two subfamilies: A (including src, yes, yrk, fyn, and fgr) and B (including lyn, hck, lck, and blk). However, this classification was not based on a formal phylogenetic analysis of sequence data, and the relationship among the families was not examined. The phylogenetic analyses reported here tested whether these families correspond to natural groupings of sequences descended from a common ancestor, and examined evolutionary relationships of sequences within families and among families.

To study the evolution of alternative splicing among nonreceptor PTKs, I focused on the cases of lyn and fyn genes from mammals. In the case of lyn, alternative splicing leads to the expression of two major protein forms, lynA and lynB, which differ in that the former includes 21 additional amino acids in the amino-terminal portion of the protein (Yi et al. 1991). In the case of fyn, two alternatively spliced forms have been reported in mammals: (1) a form expressed in thymocytes, splenocytes, and some hematolymphoid cell lines; and (2) a form found in brain. These differ in having alternative forms of the seventh exon (Fig. 1), which encodes the C-terminal portion of the SH2 domain and the N-terminal portion of the kinase domain (Cooke and Perlmutter 1989). This phenomenon was discovered by comparison of the former type of mRNA from mouse with the latter type from human (Cooke and Perlmutter 1989).

SH3 and SH2 domains are both involved in signal transduction through interactions with cytoplasmic domains of cell surface receptors (Pawson 1995). SH3 domains bind proline-rich peptides (Feng et al. 1994). SH2 domains bind phosphotyrosine-containing peptides, and the structure of this complex has also been determined (Waksman et al. 1993). The SH2 domain includes two α helices, encoded in the N-terminal and C-terminal portions of the domain (Fig. 1). The central portion of the domain consists of a continuous β meander, involving two β sheets (Waksman et al. 1993). The N-terminal α helix, along with one side of the N-terminal β sheet, is involved in binding the phosphotyrosine, whereas the C-terminal α helix, along with the opposite side of the N-terminal β sheet and the C-terminal β sheet, provides a binding site for the three residues following the phosphotyrosine (Waksman et al. 1993).
Statistical Methods. Src-related PTKs were aligned at the amino acid level by means of the CLUSTAL V program (Higgins et al. 1992), and the alignments were corrected in some instances by eye. Outside of the SH3, SH2, and kinase domains, there is no discernible sequence homology among all sequences analyzed; therefore, only these regions were used for phylogenetic analysis. An alignment of these domains from representative sequences is shown in Fig. 1; the complete alignment is available from the author on request. In all analyses reported here, any codon at which the alignment postulated a gap in any one of a set of sequences being compared was excluded from all pairwise comparisons so that a comparable set of data was used for each pairwise comparison. A phylogenetic tree of src-related PTKs was constructed by the neighbor-joining method (Saitou and Nei 1987) on the basis of the proportion of amino acid differences, and the statistical significance of internal branches was tested by Rzhetsky and Nei's (1992) method. In the analysis of alternative splicing of fyn, neighbor-joining trees for this and closely related genes were constructed for the seventh exon and for the remainder of the gene on the basis of the number of nonsynonymous nucleotide substitutions per site (dN), estimated by Nei and Gojobori's (1986) method. Because the standard error of branch lengths is complicated when dN is used as a distance, the reliability of branching patterns in these trees was assessed by bootstrapping (1,000 replications) (Felsenstein 1985). On the basis of the phylogenetic analysis, 13 putative pairs of orthologous human and mouse (Mus musculus or Mus cookii) genes were selected. The orthologous nature of these pairs was assessed by computing the number of synonymous substitutions per site (dS) between the human and mouse genes.
The dS values ranged from 0.408 to 0.874, which is well within the range of typical values of the number of synonymous substitutions per site between these two orders (e.g., Wolfe et al. 1989). In the case of these 13 gene pairs, estimation of dN between human and mouse was used to assess the relative degree of
functional constraint on different gene regions. It has been argued that when there is a bias toward transitions at twofold-degenerate sites, a method like Nei and Gojobori's (1986) may overestimate the rate of synonymous substitution per site and slightly underestimate the rate of nonsynonymous substitution per site (Li 1993). In the present case, however, there was little difference between the results of Li's (1993) and Nei and Gojobori's (1986) methods (data not shown), and only the latter are reported here. The standard error of mean dN for a set of pairwise comparisons was estimated by Nei and Jin's (1989) method. In examining the degree of conservation of these 13 pairs of orthologous genes, dN was computed separately in the SH3 domain, in the kinase domain, in the peptide-binding site residues of the SH2 domain (Fig. 1), and in the remainder of the SH2 domain. In addition, estimation of dN in these regions for comparisons among paralogous mammalian src-related PTK genes was used to test for evidence of functional divergence among these genes.
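Two of the distance computations described in this section can be illustrated with a minimal sketch. It assumes that gaps are written as "-" or "." and uses hypothetical toy sequences; the function names are illustrative and are not taken from any program used in the study. The first two functions show the exclusion of any site with a gap in any sequence, followed by the proportion of differences (p) between two sequences. The third shows the Jukes-Cantor formula that Nei and Gojobori's (1986) method applies to the proportion of nonsynonymous (or synonymous) differences; the counting of synonymous and nonsynonymous sites, which is the larger part of that method, is not reproduced here.

```python
import math

GAP_CHARS = {"-", "."}  # assumed gap symbols; adjust to the alignment format in use


def complete_deletion(seqs):
    """Remove every alignment column in which at least one sequence has a gap."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    keep = [i for i in range(length) if all(s[i] not in GAP_CHARS for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def p_distance(a, b):
    """Proportion of sites at which two gap-free aligned sequences differ."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) for a nucleotide proportion p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy alignment (hypothetical sequences, not taken from Table 1).
toy = ["MGCIKSK", "MGCVKSR", "MG-MKSK"]
trimmed = complete_deletion(toy)        # the third column is dropped for all sequences
p = p_distance(trimmed[0], trimmed[1])  # 2 differences over 6 retained sites
d = jukes_cantor(0.3)                   # e.g. a proportion of nonsynonymous differences of 0.3
```

Removing gapped columns once for the whole set, rather than pair by pair, is what keeps every pairwise comparison on the same set of sites.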
Results
Gene Phylogeny
The unrooted phylogenetic tree of src-related PTKs (Fig. 2) showed a number of subfamilies defined by highly significant internal branches. Nearly all of the available vertebrate genes fell into five subfamilies, which in Fig. 2 are designated srcA, srcB, abl, tec, and csk. These subfamilies, each of which was supported by a statistically significant internal branch, correspond to the subfamilies defined by Mustelin (1994), although they include additional sequences not mentioned by that author. Moreover, significant internal branches defined three major subdivisions of the tree, each of which included both vertebrate and invertebrate sequences. The branch labelled 1 in Fig. 2 established a group including a Drosophila sequence (Dm-src), sequences from the sponge Spongilla lacustris and the coelenterate Hydra attenuata, as well as the srcA and srcB subfamilies from vertebrates (Fig. 2). The abl family also included a Drosophila gene, grouped with vertebrate abl and arg by a highly significant internal branch (Fig. 2). Finally, the Drosophila Dm-src28C gene was grouped with the vertebrate tec family by a highly significant internal branch (Fig. 2). Therefore, the phylogenetic analysis indicated that at least these three major groups of src-related PTKs arose prior to the divergence of vertebrates and invertebrates. Furthermore, since each of these groups included sequences from Drosophila and from vertebrates, they must have arisen prior to the divergence of the protostome phyla (such as Arthropoda) from the deuterostome phyla (such as Chordata). The only two available vertebrate sequences falling outside the five subfamilies indicated in Fig. 2 were M-srm and H-frk.
Fig. 1. Alignment of SH3, SH2, and kinase domains of selected src-related PTKs. In SH2 of H-src, residues in α helices are italicized, while those in β strands are underlined. Asterisks (*) indicate residues in the peptide binding site.

The mouse protein M-srm fell in the major grouping with abl, tec, and csk subfamilies, being like them separated from the srcA and srcB families by branch 1 (Fig. 2). However, no other sequences closely related to M-srm were available. The human sequence H-frk grouped with sequences from the sponge S. lacustris. Note that there are also two partial sequences for two additional src-related PTKs from S. lacustris (Ottilie et al. 1992). In a tree based on these partial sequences and the corresponding regions of all sequences listed in Table 1, these two additional genes clustered with the two complete S. lacustris genes and with H-frk (data not shown). Thus H-frk seems to be a member of a very ancient subfamily of src-related PTKs whose origin predates the divergence of diploblast animal phyla (such as sponges) from triploblast phyla (higher Metazoa). Mustelin (1994) presented a hypothetical phylogeny of srcA and srcB subfamilies. Some of the groupings proposed by that author received strong statistical support by the present formal phylogenetic analysis; namely, the grouping of src with yes, yrk with fyn, and lyn with hck (Fig. 2). However, the phylogenetic analysis rejected the hypothesis of Mustelin (1994) that blk is a sister group to lck. In fact, blk was seen to be an outgroup to other members of the srcB subfamily. Because the relevant branches were quite short, the phylogeny suggested that the viral src-related PTKs are quite closely related to corresponding vertebrate genes. To examine whether such similarity is found at the DNA level as well as at the amino acid level, numbers of
synonymous (dS) and nonsynonymous (dN) nucleotide substitutions per site were estimated for comparisons between chicken and viral src and yes genes and human and viral abl genes (Table 2). The viral src and yes genes showed remarkable similarity to their chicken counterparts at both synonymous and nonsynonymous sites (Table 2). This suggests that these viral genes have been derived by quite recent capture of chicken genes. By contrast, dS in the comparison of human and FeSV abl genes was much higher than in the comparison of chicken and related viral genes. However, since this dS value is comparable to those between the mammalian orders Primates and Carnivora (Li et al. 1990), it is consistent with the very recent capture of the FeSV-abl gene from a feline source.
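The contrast between synonymous and nonsynonymous divergence in Table 2 can be checked roughly from the tabulated estimates and standard errors. The sketch below assumes a normal-approximation test that treats the two estimates as independent; this is an illustrative assumption, not necessarily the test used for the table footnote, and the function names are hypothetical.

```python
import math


def z_difference(x1, se1, x2, se2):
    """z statistic for the difference of two estimates, assuming independence."""
    return (x1 - x2) / math.sqrt(se1 ** 2 + se2 ** 2)


def two_sided_p(z):
    """Two-sided P value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values per 100 sites, read from Table 2.
z_abl = z_difference(40.7, 4.7, 0.3, 0.2)  # H-abl vs FeSV-abl: dS against dN
z_src = z_difference(0.3, 0.3, 0.4, 0.2)   # Ch-src vs ASV-src: dS against dN

print(round(z_abl, 2), two_sided_p(z_abl) < 0.001)
print(round(z_src, 2), two_sided_p(z_src) > 0.05)
```

Under these assumptions the abl comparison gives a z value above 8, in line with significance at the 0.1% level, whereas the chicken and viral src comparison shows no detectable difference between dS and dN.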
Alternative Splicing
Figure 3 illustrates the region of alternative splicing of lyn transcripts. The 21-amino-acid region present in lynA but absent in lynB showed evidence of homology to the corresponding region of related genes. A clearly homologous region was present in the hck proteins, which
Fig. 2. Unrooted neighbor-joining tree of src-related PTKs based on the proportion of amino acid difference (p) at 375 aligned sites. Tests of the significance of internal branches: *P < 0.05; **P < 0.01; ***P < 0.001.
Table 2. Numbers of synonymous (dS) and nonsynonymous (dN) nucleotide substitutions per 100 sites (±SE) in comparisons between vertebrate and related viral src-related PTKs

Comparison             dS            dN
Ch-src vs ASV-src      0.3 ± 0.3     0.4 ± 0.2
Ch-src vs RSV-src      1.0 ± 0.6     0.4 ± 0.2
Ch-yes vs ASV-yes      1.4 ± 0.7     0.3 ± 0.2
H-abl vs FeSV-abl      40.7 ± 4.7    0.3 ± 0.2 a

a dS and dN are significantly different at the 0.1% level
formed an outgroup to the lyn proteins (Fig. 3). Thus, it seems likely that this region was present in the common ancestor of these two groups of genes. In addition, an NJ tree based on these 21 residues of the sequences in Fig. 3 showed the same topology as that based on the remainder of the sequence (data not shown). Therefore, the alternatively spliced portion of lyn genes has evidently evolved along with the remainder of the gene. By contrast, the alternatively spliced exon 7 of fyn genes revealed a pattern of relationship that is quite different from the remainder of the gene. Figure 4 shows phylogenetic trees of exon 7 and of the remainder of the gene for srcA subfamily members; the tree is rooted based on Fig. 2. In exon 7, the mouse and chicken fyn
H-lynA  MGCIKSKGKDSLSDDGVDL-KTQPVRNTERTIYVRDPTSNKQQRPVPES
M-lynA  MGCIKSKRKDNLNDDEVDS-KTQPVRNTDRTIYVRDPTSNKQQRPVPEF
R-lynA  MGCIKSKRKDNLNDDGVDM-KTQPVRNTDRTIYVRDPTSNKQQRPVPES
H-lynB  MGCIKSKGKDSLSDDGVDL-KTQPV.....................PES
M-lynB  MGCIKSKRKDNLNDDEVDS-KTQPV.....................PEF
R-lynB  MGCIKSKRKDNLNDDGVDM-KTQPV.....................PES
H-hck   MGSMKSK---FLQVGGNTFSKTETSASPHCPVYVPDPTSTIKPGPNSHN
M-hck   MGCVKSR---FLRDGSKA-SKTEPSANQKGPVYVPDPTSSSKLGPNNSN
R-hck   MGCVKSR---FLREGSKA-SKIEPNANQKGPVYVPDPTSPKKLGPNSIN

Fig. 3. Alignment of N-terminal portion of lyn and related members of the srcB subfamily, illustrating the alternatively spliced domain found in lynA but not in lynB.
DNA sequences clustered together, apart from other fyn and syn sequences (Fig. 4A). In this region, mouse and chicken fyn clustered with fgr sequences, although this clustering pattern did not receive strong bootstrap support (Fig. 4A). In the remainder of the gene, however, mouse and chicken fyn sequences clustered with other fyn and syn sequences (Fig. 4B). Furthermore, this phylogeny suggested that, outside of exon 7, the genes in the fyn/syn cluster are orthologous, because their phylogeny corresponded to the species' phylogeny; that is, a fish sequence clustered outside of tetrapod sequences, while the frog sequence clustered outside amniote sequences (Fig. 4B). An additional anomaly in these trees was that in the exon 7 region Ch-yrk clustered with fyn and syn genes, a pattern that received strong bootstrap support,
Fig. 4. Neighbor-joining trees for srcA genes (A) in exon 7 and (B) in the remainder of the coding region, based on the number of nonsynonymous substitutions per site (dN). Numbers on branches represent the percent of 1,000 bootstraps supporting the branch.
whereas in the remainder of the gene Ch-yrk clustered with fgr genes, a pattern for which support was also fairly strong (Fig. 4). Table 3 shows numbers of nonsynonymous nucleotide substitutions per site (dN) in comparisons between mouse and chicken src, yes, and fyn genes. These comparisons show that, even though the exon 7 regions of mouse and chicken fyn genes clustered together in the phylogenetic tree, the exon 7 region has actually diverged at nonsynonymous sites between these two species both to a greater extent than have other regions of the fyn gene and to a greater extent than have the exon 7 regions of src and yes genes (Table 3). Thus, this alternatively spliced exon seems to have been subject to reduced functional constraint in comparison with other regions of the fyn gene and in comparison with corresponding regions of related genes.
Table 3. Numbers of nonsynonymous substitutions per 100 sites (dN ± SE) in different regions of orthologous mouse and chicken srcA genes*

Gene   SH3          Exon 7       Remainder SH2   Remainder kinase
src    0.7 ± 0.7    1.7 ± 1.2    2.2 ± 1.1       1.4 ± 0.5
yes    2.5 ± 1.4    0.0 ± 0.0    1.6 ± 0.9       1.7 ± 0.6 b
fyn    1.4 ± 1.0    8.6 ± 2.8    0.5 ± 0.5 b     1.6 ± 0.6 a

* Tests of the hypothesis that dN equals the corresponding value for exon 7: a P < 0.05; b P < 0.01

Tissue Specific Constraints
For 13 orthologous pairs of human and mouse genes selected on the basis of the phylogenetic tree (Fig. 2), dN was estimated in the SH3, SH2, and kinase domains, and the pattern was compared with information from the literature on tissue expression (Table 4). It was found that the members of this sample could not readily be classified in the same way as was done by Kuma et al. (1995). These authors placed PTKs in two mutually exclusive categories: those having expression in the "immune system" and those having expression in "neural/brain systems." However, according to the published literature, several src-related PTKs should properly be assigned to both of these categories. First, there are certain genes (src, yes, and abl) which have an essentially universal expression (Table 4), thus presumably including cells of both the nervous and immune systems. Furthermore, certain genes with more restricted expression were found to be expressed in both nervous and immune system cells. For example, ntk is expressed in both brain and T cells, and lyn is expressed both in brain and in certain immune system cells such as macrophages and B cells (Table 4). To test Kuma et al.'s (1995) hypothesis that nervous system proteins are subject to greater constraint than immune system proteins, I subjected dN in different gene regions to a two-way analysis of variance using a general linear models procedure; the factors tested were expression in the nervous system, expression in the immune
Table 4. Tissue expression and numbers of nonsynonymous substitutions per 100 sites (dN ± SE) in different regions of orthologous human and mouse src-related PTKs

Gene   Tissue [ref.]                        SH3            SH2 binding site   SH2 remainder   Kinase
Universal expression
src    All, incl. Br He Ki Lu Ne Sp [1]     0.0 ± 0.0      0.0 ± 0.0          0.0 ± 0.0       0.4 ± 0.3
yes    All, incl. Br Ki Li Ne Tc [2-4]      0.8 ± 0.8      0.0 ± 0.0          0.0 ± 0.0       0.3 ± 0.2
abl    All, incl. Sp Te Th [5,6]            0.0 ± 0.0      0.0 ± 0.0          0.0 ± 0.0       0.2 ± 0.2
Broad expression
fgr    Gr Ma Mo [7]                         6.9 ± 2.5 b    0.0 ± 0.0          8.0 ± 2.3 c     5.6 ± 1.0 c
lyn    Bc Br Ma Sp [8,9]                    2.4 ± 1.4      0.0 ± 0.0          0.6 ± 0.6       1.1 ± 0.4 b
lck    Bc Tc [10,11]                        0.8 ± 0.8      0.0 ± 0.0          1.6 ± 1.0       1.3 ± 0.5 b
hck    Hm Ma Mo [12,13]                     3.8 ± 1.8      0.0 ± 0.0          1.6 ± 1.0       1.3 ± 0.5 b
atk    Bc Mk [14,15]                        1.6 ± 1.2      0.0 ± 0.0          0.0 ± 0.0       0.5 ± 0.3
tec    He Li [16]                           4.0 ± 1.9 a    0.0 ± 0.0          3.2 ± 1.5 a     2.8 ± 0.7 c
ntk    Br Tc [17]                           9.9 ± 3.0 c    0.0 ± 0.0          2.5 ± 1.3 a     3.2 ± 0.8 c
csk    Br Ki Th [18]                        0.0 ± 0.0      0.0 ± 0.0          1.3 ± 0.9       1.1 ± 0.5 a
T cell specific
lyk    Tc [19,20]                           5.1 ± 2.1      6.3 ± 5.2          8.0 ± 2.3       4.0 ± 0.9
txk    Tc [21]                              6.0 ± 2.3      10.4 ± 6.7         11.8 ± 2.9      6.0 ± 1.1
One-way ANOVA: F(2,10) (P)                  2.49 (n.s.)    70.27 (<.0001)     12.34 (<.005)   6.17 (<.05)

Tissue abbreviations: Bc B cells; Br brain; Gr granulocytes; He heart; Hm hematopoietic cells; Ki kidney; Li liver; Lu lung; Ma macrophages; Mk megakaryocytes; Mo monocytes; Ne neurons; Sp spleen; Tc T cells; Te testis; Th thymus.
References: [1] Martinez et al. 1987; [2] Kawakami et al. 1986; [3] Semba et al. 1986; [4] Zhao et al. 1990; [5] Wang and Baltimore 1983; [6] Oppi et al. 1987; [7] Ley et al. 1989; [8] Yi et al. 1991; [9] Yamanishi et al. 1989; [10] Marth et al. 1985; [11] Veillette et al. 1988; [12] Holtzmann et al. 1987; [13] Ziegler et al. 1987; [14] Bennett et al. 1994; [15] Tsukada et al. 1993; [16] Mano et al. 1993; [17] Chow et al. 1994; [18] Partanen et al. 1991; [19] Tanaka et al. 1993; [20] Gibson et al. 1993; [21] Haire et al. 1994.
Tests of the hypothesis that dN equals the corresponding value for the SH2 binding site: a P < 0.05; b P < 0.01; c P < 0.001
system, and their interaction. No statistically significant effects were detected for any of these factors in SH3, in the binding sites or the remainder of SH2, structurally defined regions of SH2, or in the kinase domain. Therefore, the results provide no support for Kuma et al.'s (1995) hypothesis. In addition, d u in these three regions was subjected to analysis of variance on the basis of the level of expression of the genes. Genes were categorized as follows: (1) those having universal or nearly universal expression; (2) those expressed in a number of tissues, and (3) those having T-cell-specific expression (Table 4). There were significant differences with respect to d N among these groups in both the binding site and the remainder of SH2 and in the kinase domain (Table 4). In each of these three regions, d N was higher in the case of T-cell-specific genes than in the case of genes with a broader expression (Table 4). The effect was particularly marked in the binding site of SH2. In the binding site, no nonsynonymous substitutions were observed in any of the universally or broadly expressed genes, whereas d N values for the T-cell-specific genes are among the highest values observed in any domain analyzed (Table 4).
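The one-way analysis of variance by breadth of expression can be roughly re-derived from the rounded dN values printed in Table 4. The following is a minimal sketch, not the procedure actually used in the study (which is described only as a general linear models procedure): the function name `one_way_anova_f` and the dictionary `TABLE4_DN` are hypothetical, and the grouping of genes simply follows the three categories of Table 4.

```python
# Minimal sketch: one-way ANOVA F statistic computed by hand (no external
# libraries). dN values (per 100 sites) are transcribed from Table 4 in the
# order universal, broad, T-cell-specific. Names here are illustrative only.

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group's mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

TABLE4_DN = {
    "SH3": ([0.0, 0.8, 0.0],
            [6.9, 2.4, 0.8, 3.8, 1.6, 4.0, 9.9, 0.0],
            [5.1, 6.0]),
    "SH2 binding site": ([0.0, 0.0, 0.0],
                         [0.0] * 8,
                         [6.3, 10.4]),
    "SH2 remainder": ([0.0, 0.0, 0.0],
                      [8.0, 0.6, 1.6, 1.6, 0.0, 3.2, 2.5, 1.3],
                      [8.0, 11.8]),
    "Kinase": ([0.4, 0.3, 0.2],
               [5.6, 1.1, 1.3, 1.3, 0.5, 2.8, 3.2, 1.1],
               [4.0, 6.0]),
}

for region, groups in TABLE4_DN.items():
    f_stat, df_b, df_w = one_way_anova_f(list(groups))
    print(f"{region}: F({df_b},{df_w}) = {f_stat:.2f}")
```

With these rounded inputs the F values come out close to those in Table 4 (about 70.2 for the SH2 binding site, against the reported 70.27); small discrepancies are to be expected because the tabulated dN values are rounded to one decimal place.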
Patterns of Nonsynonymous Divergence
When different pairs of paralogous mammalian src-related PTK genes were compared, striking differences were seen in the patterns of nonsynonymous substitution in different gene regions (Table 5). In comparisons among members of the srcA subfamily (src, yes, and fgr), dN was generally quite high in SH2 outside the binding site; but within the SH2 binding site no nonsynonymous differences were observed among these three pairs of genes (Table 5). This indicates that all three of these genes share some strong functional constraint in the SH2 binding site. A similar pattern was observed in the case of the srcB subfamily. In comparisons among paralogous srcB genes, dN values in the SH2 binding site were in every case significantly lower than those in the other domains analyzed (Table 5). Again these genes seem to share some functional constraint in the SH2 binding site. By contrast, in the csk and tec subfamilies, in all comparisons among pairs of paralogous mammalian genes, dN in the SH2 binding site was not significantly different from that in the remainder of the SH2 domain (Table 5). In these subfamilies, dN in the SH2 binding domain was generally not significantly different from that in any other domain analyzed, except for two comparisons in the tec subfamily in which dN in SH3 was significantly higher than that in the SH2 binding site (Table 5). Note, however, that this does not imply that the SH2 binding site is not subject to functional constraint in these subfamilies. In several orthologous comparisons involving members of this family, dN in the SH2 binding site was significantly lower than that in the remainder of SH2 or in the kinase domain (Table 4). Rather, the fact that paralogous comparisons do not show the same degree of conservation as seen in orthologous comparisons indicates that the binding sites of paralogous members of these subfamilies do not share the same functional constraints.

Table 5. Mean numbers of nonsynonymous substitutions per 100 sites (dN ± SE) in comparisons of different regions of paralogous mammalian src-related PTK genes*

                                   SH2
Comparison      SH3                Binding site     Remainder      Kinase
srcA
src vs yes      17.1 ± 4.1 c       0.0 ± 0.0        24.5 ± 4.5 c   7.2 ± 1.2 c
src vs fgr      18.7 ± 4.1 c       0.0 ± 0.0        35.6 ± 5.5 c   16.2 ± 1.8 c
yes vs fgr      16.9 ± 3.8 c       0.0 ± 0.0        21.0 ± 3.8 c   15.9 ± 1.7 c
srcB
lck vs hck      37.0 ± 6.[?] c     4.1 ± 4.2        26.7 ± 4.6 c   16.1 ± 1.8 b
lck vs lyn      40.2 ± 7.0 c       5.2 ± 4.7        28.5 ± 4.8 c   20.2 ± 2.1 b
hck vs lyn      23.4 ± 4.7 c       2.1 ± 1.7        30.6 ± 5.0 c   12.5 ± 1.5 c
csk
csk vs ntk      66.4 ± 10.6        41.8 ± 16.3      34.4 ± 5.5     31.4 ± 2.8
tec
txk vs tec      34.6 ± 6.1         21.0 ± 9.5       38.0 ± 5.6     20.2 ± 2.0
txk vs lyk      57.6 ± 9.2 a       23.4 ± 9.7       41.8 ± 5.9     28.1 ± 2.4
txk vs atk      43.1 ± 7.3         48.0 ± 17.4      36.9 ± 5.5     30.4 ± 2.6
tec vs lyk      47.7 ± 7.8 a       20.7 ± 9.7       37.0 ± 5.6     26.7 ± 2.4
tec vs atk      31.3 ± 5.8         18.5 ± 9.7       32.0 ± 5.2     27.4 ± 2.5
lyk vs atk      53.0 ± 8.6         38.1 ± 15.1      43.6 ± 6.4     35.3 ± 3.0

* Tests of the hypothesis that dN equals the corresponding value for the SH2 binding site: a P < 0.05; b P < 0.01; c P < 0.001
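The significance markers in Tables 4 and 5 compare dN in each region with dN in the SH2 binding site. The test statistic is not restated in this part of the paper, so the sketch below assumes a two-tailed z-test on the difference between two estimates, using the square root of the summed squared standard errors as the standard error of the difference. The function names `z_test` and `significance_marker` are hypothetical, and the example inputs are values from Table 5.

```python
import math

# Minimal sketch, ASSUMING a two-tailed z-test; the paper's exact test is not
# restated in this section. Function names are illustrative only.

def z_test(d1, se1, d2, se2):
    """Return (z, two-tailed P) for the difference between two dN estimates."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    if se_diff == 0.0:
        raise ValueError("both standard errors are zero; z is undefined")
    z = (d1 - d2) / se_diff
    # Two-tailed P value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

def significance_marker(p):
    """Map a P value onto the footnote markers used in Tables 4 and 5."""
    if p < 0.001:
        return "c"
    if p < 0.01:
        return "b"
    if p < 0.05:
        return "a"
    return "n.s."

# (label, region dN, region SE, binding-site dN, binding-site SE), from Table 5
examples = [
    ("src vs yes, SH3", 17.1, 4.1, 0.0, 0.0),
    ("lck vs hck, kinase", 16.1, 1.8, 4.1, 4.2),
    ("csk vs ntk, SH3", 66.4, 10.6, 41.8, 16.3),
]

for label, d1, se1, d2, se2 in examples:
    z, p = z_test(d1, se1, d2, se2)
    print(f"{label}: z = {z:.2f}, marker = {significance_marker(p)}")
```

Under this assumption the three examples give the markers c, b, and n.s., which agree with the corresponding entries of Table 5. The srcA and srcB comparisons differ sharply from the binding site, whereas the csk vs ntk comparison does not.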
Discussion
Phylogenetic analyses indicated that the src-related PTKs are an ancient gene family in animals. At least one group (corresponding to the clade in Fig. 2 including the human frk gene) predates the divergence of diploblasts and triploblasts, which is believed to have occurred very early in the history of multicellular animals (Willmer 1990). In addition, three major groupings of genes were shown to predate the protostome/deuterostome divergence. By contrast, viral members of this family have been captured very recently from their vertebrate hosts. This recent viral capture contrasts markedly with the case of homologues of vertebrate genes found in pox viruses, where gene capture seems to have occurred in the distant past (Hughes, unpublished).
Kuma et al. (1995) presented data suggesting that, in this and in certain other families of genes, genes with expression in the immune system have evolved more rapidly than genes expressed in the nervous system. Analysis of a more extensive set of src-related PTK genes than that analyzed by Kuma et al. (1995) failed to support this hypothesis. Rather, the best predictor of the rate of evolution of these genes was breadth of tissue expression. Hughes and Hughes (1995) recently compared rates of amino acid evolution between human and murine rodent (mouse or rat) in a random sample of 120 proteins. They categorized these proteins into three groups on the basis of their breadth of tissue expression, as reported in the literature: (1) restricted, being expressed in only one cell type or tissue type; (2) broad, being expressed in a variety of tissues; and (3) universal, being expressed in all or nearly all cell types. They found that universally and broadly expressed proteins were much more conserved than were those with restricted expression (Hughes and Hughes 1995). The present study yielded a similar result in the case of the src-related PTKs. Hughes and Hughes (1995) hypothesized that a protein expressed in a wide variety of tissues must interact with a wide variety of other proteins and thus may be less free to vary than a protein that is expressed in a single cell type and encounters a smaller set of proteins. The results of the present study are consistent with this hypothesis. Not only were the highest rates of nonsynonymous evolution observed in genes (lyk and txk) with T-cell-restricted expression, but this effect was most dramatic in the binding site of SH2, which is known to interact with other proteins (Waksman et al. 1993). Note that, under this hypothesis, it is the fact that these two genes are restricted to a specific cell type, rather than the fact that they are expressed in T cells per se, that accounts for the reduction in constraint at the amino acid level.
Comparison of rates of nonsynonymous evolution among pairs of paralogous mammalian src-related PTK genes revealed that these rates were not uniform across all groups. In particular, there was evidence of shared strong functional constraints on the SH2 binding site within both srcA and srcB subfamilies, whereas such strong constraints were absent in paralogous comparisons in the csk and tec subfamilies. Different SH2 domains are known to bind different phosphopeptide sequences (Songyang et al. 1993). Conservation of the SH2 binding site between paralogous loci would thus seem to imply a conservation of binding function. The absence of such conservation, as seen in the csk and tec subfamilies, would in turn imply that paralogous loci in these subfamilies have diverged functionally with respect to the phosphopeptide sequences which they recognize.
The alternatively spliced portion of the fyn gene (exon 7) includes the C-terminal portion of SH2 and the N-terminal portion of the kinase domain. The N-terminal portion of the kinase domain of lck is important for its interaction with the IL-2 receptor (Hatakeyama et al. 1991; Minami et al. 1993), and this region may be important for similar interactions in the case of other src-related PTKs. The fact that the alternatively spliced forms of this exon are remarkably divergent from each other suggests that alternative splicing may be a mechanism for producing functionally distinct transcripts by providing alternative sequences of this region. Furthermore, the alternatively spliced exon 7 does not show the same pattern of relationship as the rest of the gene (Fig. 4). This suggests that in the case of the fyn gene alternative splicing may have arisen as a result of an ancient recombinational ("exon shuffling") event by which the gene acquired an alternative exon 7 from another gene, perhaps one related to fgr. The fact that the chicken yrk gene also shows a different phylogeny in this region from that seen in the rest of the gene suggests that in this case also a recombinational event may have been involved in generating a novel PTK.
Acknowledgments. This research was supported by grants R01GM34940 and K04-GM00614 from the National Institutes of Health. I am grateful for technical assistance by G. Lovgreen.
References
Bennett BD, Cowley S, Jiang S, London R, Deng B, Grabarek J, Groopman JE, Goeddel DV, Avraham H (1994) Identification and characterization of a novel tyrosine kinase from megakaryocytes. J Biol Chem 269:1068-1074
Bolen JB (1993) Nonreceptor tyrosine protein kinases. Oncogene 8:2025-2031
Cantley LC, Auger KR, Carpenter C, Duckworth D, Graziani A, Kapeller R, Soltoff S (1991) Oncogenes and signal transduction. Cell 64:281-302
Chow LML, Jarvis C, Hu Q, Nye SH, Gervais FG, Veillette A, Matis LA (1994) Ntk: a Csk-related protein-tyrosine kinase expressed in brain and T lymphocytes. Proc Natl Acad Sci USA 91:4975-4979
Cooke MP, Perlmutter RM (1989) Expression of a novel form of the fyn proto-oncogene in hematopoietic cells. New Biol 1:66-74
Czernilofsky AP, Levinson AD, Varmus HE, Bishop JM, Tischler E, Goodman HM (1980) Nucleotide sequence of an avian sarcoma virus oncogene (src) and proposed amino acid sequence for gene product. Nature 287:198-203
Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791
Feng S, Chen JK, Yu H, Simon JA, Schreiber SL (1994) Two binding orientations for peptides to the src SH3 domain: development of a general model for SH3-ligand interactions. Science 266:1241-1247
Gibson S, Leung B, Squire JA, Hill M, Arima N, Goss P, Hogg D, Mills GB (1993) Identification, cloning, and characterization of a novel human T-cell-specific tyrosine kinase located at the hematopoietin complex on chromosome 5q. Blood 82:1561-1572
Haire RN, Ohta Y, Lewis JE, Fu SM, Kroisel P, Litman GW (1994) TXK, a novel human tyrosine kinase expressed in T cells, shares sequence identity with Tec family kinases and maps to 4p12. Hum Mol Genet 3:897-901
Hanks SK, Quinn AM, Hunter T (1988) The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science 241:42-51
Hatakeyama M, Kono T, Kobayashi N, Kawahara A, Levin SD, Perlmutter RM, Taniguchi T (1991) Interaction of the IL-2 receptor with the src-family kinase p56lck: identification of novel intermolecular association. Science 252:1523-1528
Higgins DG, Bleasby AJ, Fuchs R (1992) Clustal V: improved software for multiple sequence alignment. Comput Appl Biosci 8:189-191
Holtzman DA, Cook WD, Dunn AR (1987) Isolation and sequence of a cDNA corresponding to a src-related gene expressed in human hematopoietic cells. Proc Natl Acad Sci USA 84:8325-8329
Hughes AL (1993) Nonlinear relationships among evolutionary rates identify regions of functional divergence in heat-shock protein 70 genes. Mol Biol Evol 10:243-255
Hughes AL, Hughes MK (1995) Self peptides bound by HLA class I molecules are derived from highly conserved regions of a set of evolutionarily conserved proteins. Immunogenetics 41:257-262
Kawakami T, Pennington CY, Robbins KC (1986) Isolation and oncogenic potential of a novel human src-like gene. Mol Cell Biol 6:4195-4201
Klausner RD, Samelson LE (1991) T cell antigen receptor activation pathways: the tyrosine kinase connection. Cell 64:875-878
Kuma K, Iwabe N, Miyata T (1995) Functional constraints against variations on molecules from the tissue level: slowly evolving brain-specific genes demonstrated by protein kinase and immunoglobulin supergene families. Mol Biol Evol 12:123-130
Ley TJ, Connolly NL, Katamine S, Cheah MSC, Senior RM, Robbins KC (1989) Tissue-specific expression and developmental regulation of the human fgr proto-oncogene. Mol Cell Biol 9:92-99
Li WH (1993) Unbiased estimates of the rates of synonymous and nonsynonymous substitution. J Mol Evol 36:96-99
Li WH, Gouy M, Sharp PM, O'hUigin C, Yang YW (1990) Molecular phylogeny of Rodentia, Lagomorpha, Primates, Artiodactyla and Carnivora and molecular clocks. Proc Natl Acad Sci USA 87:6703-6707
Mano H, Mano K, Tang B, Koehler TY, Gilbert DJ, Jenkins NA, Copeland NG, Ihle JN (1993) Expression of a novel form of tec kinase in hematopoietic cells and mapping of the gene to chromosome 5 near kit. Oncogene 8:417-424
Marth JD, Peet R, Krebs EG, Perlmutter RM (1985) A lymphocyte-specific protein-tyrosine kinase gene is rearranged and overexpressed in the murine T cell lymphoma LSTRA. Cell 43:393-404
Martinez R, Mathey-Prevot B, Bernards A, David B (1987) Neuronal pp60c-src contains a six-amino acid insertion relative to its nonneuronal counterpart. Science 237:411-415
Minami Y, Kono T, Yamada K, Kobayashi N, Kawahara A, Perlmutter RM, Taniguchi T (1993) Association of p56lck with IL-2 receptor β chain is critical for the IL-2-induced activation of p56lck. EMBO J 12:759-768
Mustelin T (1994) Src family tyrosine kinases in leukocytes. Landes, Austin, TX
Nei M, Gojobori T (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 3:418-426
Nei M, Jin L (1989) Variances of the average numbers of nucleotide substitutions between populations. Mol Biol Evol 6:290-300
Oppi C, Shore SK, Reddy EP (1987) Nucleotide sequence of testis-derived c-abl cDNAs: implications for testis-specific transcription and abl oncogene activation. Proc Natl Acad Sci USA 84:8200-8204
Ottilie S, Raulf F, Barnekow A, Hannig G, Schartl M (1992) Multiple src-related kinase genes, srk1-4, in the fresh water sponge Spongilla lacustris. Oncogene 7:1625-1630
Partanen J, Armstrong E, Bergman M, Maekelae P, Hirvonen H, Huebner K, Alitalo K (1991) Cyl encodes a putative cytoplasmic tyrosine kinase lacking the conserved tyrosine autophosphorylation site (Y416src). Oncogene 6:2013-2018
Pawson T (1995) Protein modules and signalling networks. Nature 373:573-580
Rzhetsky A, Nei M (1992) A simple method for estimating and testing minimum evolution trees. Mol Biol Evol 9:945-967
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406-425
Semba K, Nishizawa M, Noboyuki M, Yoshida MC, Sukegawa J, Yamanashi Y, Sasaki M, Yamamoto T, Toyoshima K (1986) yes-related protooncogene, syn, belongs to the protein-tyrosine kinase family. Proc Natl Acad Sci USA 83:5459-5463
Songyang Z, Shoelson SE, Chaudhuri M, Gish G, Pawson T, Haser WG, King F, Roberts T, Ratnofsky S, Lechleider R, Neel BG, Birge RB, Fajardo JE, Chou MM, Hanafusa H, Schaffhausen B, Cantley LC (1993) SH2 domains recognize specific phosphopeptide sequences. Cell 72:767-778
Tanaka N, Asao H, Ohtani K, Nakamura M, Sugamura K (1993) A novel human tyrosine kinase gene inducible in T cells by interleukin 2. FEBS Lett 324:1-5
Taniguchi T (1995) Cytokine signaling through nonreceptor protein tyrosine kinases. Science 268:251-255
Tsukada S, Saffran DC, Rawlings DJ, Parolini O, Allen RC, Klisak I, Sparkes RS, Kubagawa H, Mohandas T, Quan S, Belmont JW, Cooper MD, Conley ME, Witte ON (1993) Deficient expression of a B cell cytoplasmic tyrosine kinase in human X-linked agammaglobulinemia. Cell 72:279-290
Veillette A, Bookman MA, Horak EM, Bolen JB (1988) The CD4 and CD8 T cell surface antigens are associated with the internal membrane tyrosine-protein kinase p56lck. Cell 55:301-308
Waksman G, Shoelson SE, Pant N, Cowburn D, Kuriyan J (1993) Binding of a high affinity phosphotyrosyl peptide to the src SH2 domain: crystal structures of the complexed and peptide-free forms. Cell 72:779-790
Wang JYJ, Baltimore D (1983) Cellular RNA homologous to the Abelson murine leukemia virus transforming gene: expression and relationship to the viral sequence. Mol Cell Biol 3:773-779
Willmer P (1990) Invertebrate relationships. Cambridge University Press, Cambridge
Wolfe KH, Sharp PM, Li WH (1989) Mutation rates vary among regions of the mammalian genome. Nature 337:283-285
Yamanashi Y, Mori S, Yoshida M, Kishimoto T, Inoue K, Yamamoto T, Toyoshima K (1989) Selective expression of a protein-tyrosine kinase, p56lyn, in hematopoietic cells and association with production of human T-cell lymphotropic virus type I. Proc Natl Acad Sci USA 86:6538-6542
Yi T, Bolen JB, Ihle JN (1991) Hematopoietic cells express two forms of lyn kinase differing by 21 amino acids in the amino terminus. Mol Cell Biol 11:2391-2398
Zhao YH, Krueger JG, Sudol M (1990) Expression of cellular-yes protein in mammalian tissues. Oncogene 5:1624-1635
Ziegler SF, Marth JD, Lewis DB, Perlmutter RM (1987) Novel protein-tyrosine kinase gene (hck) preferentially expressed in cells of hematopoietic origin. Mol Cell Biol 7:2276-2285