Application
No. WO 1995016793
IPC G01N33/50

COMPOSITIONS AND METHODS RELATING TO DNA MISMATCH REPAIR GENES

Inventors (first 3 of 5):
BAKER, Sean, M.; BOLLAG, Roni, J.; KOLODNER, Richard, D.
Application number: US9414746
Filing date: 16.12.1994
Published: 22.06.1995
Country: WO
Priority date: 14.12.2025
Abstract


Genomic sequences of human mismatch repair genes are described, as are methods of detecting mutations and/or polymorphisms in those genes. Also described are methods of diagnosing cancer susceptibility in a subject, and methods of identifying and classifying mismatch-repair-defective tumors. In particular, sequences and methods relating to human mutL homologs, hMLH1 and hPMS1 genes are provided.


Claims

WE CLAIM:

1. A method of diagnosing cancer susceptibility in a subject comprising detecting a mutation in a mutL homolog gene or gene product in a tissue of the subject, the mutation being indicative of the subject's susceptibility to cancer.

2. A method of identifying and classifying a DNA mismatch-repair-defective tumor comprising detecting in a tumor a mutation in a mutL homolog gene or gene product, the mutation being indicative of a defect in a mismatch repair system of the tumor.

3. The method of claim 1 or claim 2 wherein the step of detecting comprises detecting a mutation in hMLH1 or hPMS1.

4. The method of claim 1 or claim 2 wherein the step of detecting comprises isolating nucleic acid from the subject; amplifying a segment of the mismatch repair gene or gene product from the isolated nucleic acid; comparing the amplified segment with an analogous segment of a wild-type allele of the mismatch repair gene or gene product; and detecting a difference between the amplified segment and the analogous segment, the difference being indicative of a mutation in the mismatch repair gene or gene product.

5. The method of claim 4 wherein the step of detecting comprises determining whether the difference between the amplified segment and the analogous segment causes an affected phenotype.

6. The method of claim 4 wherein the difference in nucleotide sequence is selected from the group consisting of deletions of at least one nucleotide, insertions of at least one nucleotide, substitutions of at least one nucleotide and nucleotide rearrangements.

7. The method of claim 4 wherein the step of amplifying comprises: reverse transcribing all or a portion of an RNA mismatch repair gene product to DNA; and amplifying a segment of the DNA produced by reverse transcription.

8. The method of claim 4 wherein the step of amplifying comprises: selecting a pair of oligonucleotide primers capable of hybridizing to opposite strands of the mismatch repair gene, and in opposite orientation; performing a polymerase chain reaction utilizing the oligonucleotide primers such that nucleic acid of the mismatch repair chain intervening between the primers is amplified to become the amplified segment.

9. The method of claim 8 wherein the intervening nucleic acid comprises at least a fragment of at least one exon of the mismatch repair gene.

10. The method of claim 9 wherein the at least one exon has a nucleotide sequence selected from the group consisting of SEQ ID NOS: 25-43.

11. The method of claim 1 or claim 2 wherein the step of detecting comprises detecting a mutation in a mutL homolog mismatch repair protein.

12. The method of claim 4 wherein the analogous segment of a wild-type allele of the mismatch repair gene or gene product comprises a wild-type hMLH1 gene fragment having a unique portion of nucleotide sequence selected from the group consisting of: SEQ ID NOS: 6-24.

13. The method of claim 8 wherein the step of selecting comprises selecting a pair of oligonucleotide primers, each primer of the pair comprising a nucleotide sequence selected from the group consisting of: SEQ ID NOS: 44-82.

14. The method of claim 8 wherein the intervening nucleotide sequence that is amplified comprises a unique portion of at least one nucleotide sequence selected from the group consisting of: SEQ ID NOS: 6-24.

15. The method of claim 4 wherein the step of detecting a difference comprises detecting an hMLH1 mutation characterized by a C to T transition mutation which produces a non-conservative amino acid substitution at position 44 of the hMLH1 protein.

16. The method of claim 5 wherein the step of determining comprises: deriving a yeast strain that is deleted for its hMLH1 gene; constructing a yeast homolog of the amplified segment including the difference; introducing the yeast homolog of the amplified segment into the yeast strain; and assaying the yeast strain's ability to correct DNA mispairs.

17. The method of claim 5 wherein the step of determining comprises producing an hMLH1 protein including amino acids corresponding to the difference; and determining the extent of interaction between the hMLH1 protein and an hPMS1 protein compared to the degree of protein-protein interaction observed with wild-type hMLH1 and hPMS1 proteins.

18. An isolated oligonucleotide primer capable of hybridizing specifically to all or a fragment of an hMLH1 genomic sequence with a Tm of greater than about 55 °C.

19. The isolated oligonucleotide primer of claim 18, the oligonucleotide primer being extendable by a DNA polymerase.

20. The isolated oligonucleotide primer of claim 19, the oligonucleotide primer being capable of amplifying at least a portion of an hMLH1 gene when used in a polymerase chain reaction including another primer.

21. The isolated oligonucleotide primer of claim 20, the oligonucleotide primer being at least 13 nucleotides in length.

22. The isolated oligonucleotide primer of claim 21 comprising a nucleotide sequence selected from the group consisting of SEQ ID NOS: 44-82.

23. An isolated nucleic acid including a segment having a nucleotide sequence substantially identical to a nucleotide sequence selected from the group consisting of SEQ ID NOS: 6-24.

24. An isolated nucleic acid including a segment having a nucleotide sequence substantially identical to a nucleotide sequence selected from the group consisting of SEQ ID NOS: 25-43.

25. A unique fragment of the nucleic acid of claim 23 or claim 24.

26. A method of detecting a mutation in a eukaryotic mutL homolog gene or fragment thereof comprising the steps of: isolating a eukaryotic mutL homolog gene or fragment thereof; and detecting a difference in activity between the isolated gene or fragment thereof and a wild-type allele of the gene or fragment thereof; the difference in activity being indicative of a mutation in the eukaryotic mutL homolog gene or fragment thereof.

27. A method of detecting a mutation in a eukaryotic mutL homolog gene or gene product comprising detecting a difference in activity between the gene or gene product and a wild-type version of the gene or gene product, the difference in activity being indicative of a mutation in the mutL homolog gene or gene product.

28. The method of claim 26 wherein the eukaryotic mutL homolog gene or fragment thereof comprises a human gene or fragment thereof.

29. The method of claim 27 wherein the mutL homolog gene or gene product comprises a human gene or gene product.

30. The method of claim 28 or claim 29 wherein the gene comprises an hMLH1 and the wild-type version of the gene comprises a wild-type allele of the hMLH1 gene.

31. The method of claim 28 or claim 29 wherein the gene comprises an hPMS1 and the wild-type version of the gene comprises a wild-type allele of the hPMS1 gene.

32. The method of claim 30 wherein the wild-type version of the hMLH1 gene comprises a nucleotide sequence substantially identical to a nucleotide sequence selected from the group consisting of SEQ ID NOS: 6-24, and unique fragments thereof.

33. The method of claim 30 wherein the wild-type version of the hMLH1 gene encodes a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 5 and unique fragments thereof.

34. The method of claim 28 or claim 29 wherein the human mismatch repair gene product comprises an hMLH1 protein or unique fragment thereof.

35. The method of claim 34 wherein the hMLH1 protein comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 5 and unique fragments thereof.

36. An isolated nucleotide or protein structure including a segment sequentially corresponding to a unique portion of a human mutL homolog gene or gene product.

37. The nucleotide of claim 36 wherein the mutL homolog gene is hMLH1 or hPMS1.

38. A pair of oligonucleotide primers capable of being used together in a polymerase chain reaction to amplify specifically a unique segment of a human mutL homolog gene.

39. The pair of oligonucleotide primers of claim 38 wherein the mutL homolog gene is hMLH1 or hPMS1.

40. A probe comprising a nucleotide sequence capable of binding specifically by Watson/Crick pairing to complementary bases in a portion of a human mutL homolog gene; and a label-moiety attached to the sequence, wherein the label-moiety has a property selected from the group consisting of fluorescent, radioactive and chemiluminescent.

41. The probe of claim 40 wherein the human mutL homolog gene is hMLH1 or hPMS1.

42. An amplified quantity of a nucleotide including a segment corresponding to a unique portion of a human mutL homolog gene.

43. The nucleotide of claim 42 wherein the human mutL homolog gene is hMLH1 or hPMS1.

44. A pair of oligonucleotide primers capable of being employed in a polymerase chain reaction to amplify specifically a single exon from a human mutL homolog gene along with selected portions of flanking upstream and downstream introns.

45. The primers of claim 44 wherein the human mutL homolog gene is hMLH1 or hPMS1.

46. The method of claim 1 wherein the detecting step comprises detecting a mutation in a portion of the individual's hMLH1 gene, the portion being homologous to the DNA sequence including and between the two sets of underlined bases in Figure 3.

47. The nucleotide of claim 37 wherein the segment is homologous to the DNA sequence including and between the two sets of underlined bases in Figure 3.

48. An isolated nucleotide or protein structure including a segment substantially corresponding to a unique portion of a mouse mutL homolog gene or gene product.

49. The structure of claim 48 wherein the segment substantially corresponds to a unique portion of a mammalian MLH1 or PMS1 gene or protein.

50. Purified antibodies binding specifically to a MutL homolog protein.

51. The antibodies of claim 50 wherein the antibodies are monoclonal antibodies.

52. The antibodies of claim 50 wherein the MutL homolog protein is a human protein.

53. The antibodies of claim 52 wherein the protein is hMLH1 or hPMS1.

54. The antibodies of claim 50 wherein the MutL homolog protein is a mouse protein.

55. The antibodies of claim 54 wherein the protein is mMLH1 or mPMS1.

Description

[0001]

COMPOSITIONS AND METHODS RELATING TO DNA MISMATCH REPAIR GENES

[0002]

This invention was made with government support under Agreement No. GM 32741 and Agreement No. HG00395/GM50006 awarded by the National Institutes of Health in the General Sciences Division. The government has certain rights in the invention.

[0004]

This application is a continuation-in-part from U.S. Patent Application Serial No. 08/209,521, titled: MAMMALIAN DNA MISMATCH REPAIR GENES PMS1 AND MLH1, filed on March 8, 1994, which is a continuation-in-part from U.S. Patent Application Serial No. 08/168,877, filed on December 17, 1993. All of the above patent applications are incorporated by reference.

[0006]

Field of the Invention

The present invention involves DNA mismatch repair genes. In particular, the invention relates to identification of mutations and polymorphisms in DNA mismatch repair genes, to identification and characterization of DNA mismatch-repair-defective tumors, and to detection of genetic susceptibility to cancer.

[0007]

Background

[0008]

In recent years, with the development of powerful cloning and amplification techniques such as the polymerase chain reaction (PCR), in combination with a rapidly accumulating body of information concerning the structure and location of numerous human genes and markers, it has become practical and advisable to collect and analyze samples of DNA or RNA from individuals who are members of families which are identified as exhibiting a high frequency of certain genetically transmitted disorders. For example, screening procedures are routinely used to screen for genes involved in sickle cell anemia, cystic fibrosis, fragile X chromosome syndrome and multiple sclerosis. For some types of disorders, early diagnosis can greatly improve the person's long-term prognosis by, for example, adopting an aggressive diagnostic routine, and/or by making lifestyle changes if appropriate to either prevent or prepare for an anticipated problem.

[0012]

Once a particular human gene mutation is identified and linked to a disease, development of screening procedures to identify high-risk individuals can be relatively straightforward. For example, after the structure and abnormal phenotypic role of the mutant gene are understood, it is possible to design primers for use in PCR to obtain amplified quantities of the gene from individuals for testing. However, initial discovery of a mutant gene, i.e., its structure, location and linkage with a known inherited health problem, requires substantial experimental effort and creative research strategies.

[0013]

One approach to discovering the role of a mutant gene in causing a disease begins with clinical studies on individuals who are in families which exhibit a high frequency of the disease. In these studies, the approximate location of the disease-causing locus is determined indirectly by searching for a chromosome marker which tends to segregate with the locus. A principal limitation of this approach is that, although the approximate genomic location of the gene can be determined, it does not generally allow actual isolation or sequencing of the gene. For example, Lindblom et al.³ reported results of linkage analysis studies performed with SSLP (simple sequence length polymorphism) markers on individuals from a family known to exhibit a high incidence of hereditary non-polyposis colon cancer (HNPCC). Lindblom et al. found a "tight linkage" between a polymorphic marker on the short arm of human chromosome 3 (3p21-23) and a disease locus apparently responsible for increasing an individual's risk of developing colon cancer. Even though 3p21-23 is a fairly specific location relative to the entire genome, it represents a huge DNA region relative to the probable size of the mutant gene. The mutant gene could be separated from the markers identifying the locus by millions of bases. At best, such linkage studies have only limited utility for screening purposes because, in order to predict one person's risk, genetic analysis must be performed with tightly linked genetic markers on a number of related individuals in the family. It is often impossible to obtain such information, particularly if affected family members are deceased. Also, informative markers may not exist in the family under analysis. Without knowing the gene's structure, it is not possible to sample, amplify, sequence and determine directly whether an individual carries the mutant gene.

[0015]

Another approach to discovering a disease-causing mutant gene begins with design and trial of PCR primers, based on known information about the disease, for example, theories for disease state mechanisms, related protein structures and function, possible analogous genes in humans or other species, etc. The objective is to isolate and sequence candidate normal genes which are believed to sometimes occur in mutant forms rendering an individual disease-prone. This approach is highly dependent on how much is known about the disease at the molecular level, and on the investigator's ability to construct strategies and methods for finding candidate genes. Association of a mutation in a candidate gene with a disease must ultimately be demonstrated by performing tests on members of a family which exhibits a high incidence of the disease. The most direct and definitive way to confirm such linkage in family studies is to use PCR primers which are designed to amplify portions of the candidate gene in samples collected from the family members. The amplified gene products are then sequenced and compared to the normal gene structure for the purpose of finding and characterizing mutations. A given mutation is ultimately implicated by showing that affected individuals have it while unaffected individuals do not, and that the mutation causes a change in protein function which is not simply a polymorphism.
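The segregation criterion stated above can be expressed as a simple check over a genotyped pedigree. This is a minimal sketch, assuming each family member has already been genotyped for the candidate variant; the `FamilyMember` record, its fields and the example family are hypothetical. It applies only the strict rule in the text (affected members carry the variant, unaffected members do not) and does not model incomplete penetrance or statistical linkage scoring.

```python
from dataclasses import dataclass


@dataclass
class FamilyMember:
    member_id: str          # hypothetical pedigree identifier
    affected: bool          # clinical status (e.g. HNPCC-affected)
    carries_variant: bool   # genotype at the candidate site


def discordant_members(members):
    """Return IDs of members whose genotype does not match their affection status."""
    return [m.member_id for m in members if m.carries_variant != m.affected]


def cosegregates(members):
    """Strict rule: every affected member carries the variant, no unaffected member does."""
    return not discordant_members(members)


# Hypothetical family: two affected carriers and two unaffected non-carriers.
family = [
    FamilyMember("II-1", affected=True, carries_variant=True),
    FamilyMember("II-2", affected=False, carries_variant=False),
    FamilyMember("III-1", affected=True, carries_variant=True),
    FamilyMember("III-2", affected=False, carries_variant=False),
]
print(cosegregates(family))  # True
```

Because HNPCC penetrance is incomplete, an unaffected carrier reported by `discordant_members` does not by itself exclude a variant; the strict rule is shown only because it is the one stated above.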

[0019]

Another way to show a high probability of linkage between a candidate gene mutation and disease is by determining the chromosome location of the gene, then comparing the gene's map location to known regions of disease-linked loci such as the one identified by Lindblom et al. Coincident map location of a candidate gene in the region of a previously identified disease-linked locus may strongly suggest an association between a mutation in the candidate gene and the disease.

[0021]

There are other ways to show that mutations in a candidate gene may be linked to the disease. For example, artificially produced mutant forms of the gene can be introduced into animals. Incidence of the disease in animals carrying the mutant gene can then be compared to animals with the normal genotype. Significantly elevated incidence of disease in animals with the mutant genotype, relative to animals with the wild-type gene, may support the theory that mutations in the candidate gene are sometimes responsible for occurrence of the disease.

[0022]

One type of disease which has recently received much attention because of the discovery of disease-linked gene mutations is Hereditary Nonpolyposis Colon Cancer (HNPCC).¹,² Members of HNPCC families also display increased susceptibility to other cancers including endometrial, ovarian, gastric and breast cancer. Approximately 10% of colorectal cancers are believed to be HNPCC. Tumors from HNPCC patients display an unusual genetic defect in which short, repeated DNA sequences, such as the dinucleotide repeat sequences found in human chromosomal DNA ("microsatellite DNA"), appear to be unstable. This genomic instability of short, repeated DNA sequences, sometimes called the "RER+" phenotype, is also observed in a significant proportion of a wide variety of sporadic tumors, suggesting that many sporadic tumors may have acquired mutations that are similar (or identical) to mutations that are inherited in HNPCC.

[0024]

Genetic linkage studies have identified two HNPCC loci thought to account for as much as 90% of HNPCC. The loci map to human chromosomes 2p15-16 (2p21) and 3p21-23. Subsequent studies have identified the human DNA mismatch repair gene hMSH2 as the gene on chromosome 2p21 in which mutations account for a significant fraction of HNPCC cancers.¹,² hMSH2 is one of several genes whose normal function is to identify and correct DNA mispairs, including those that follow each round of chromosome replication.

[0026]

The best defined mismatch repair pathway is the E. coli MutHLS pathway that promotes a long-patch (approximately 3 kb) excision repair reaction which is dependent on the mutH, mutL, mutS and mutU (uvrD) gene products. The MutHLS pathway appears to be the most active mismatch repair pathway in E. coli and is known both to increase the fidelity of DNA replication and to act on recombination intermediates containing mispaired bases. The system has been reconstituted in vitro, and requires the mutH, mutL, mutS and uvrD (helicase II) proteins along with DNA polymerase III holoenzyme, DNA ligase, single-stranded DNA binding protein (SSB) and one of the single-stranded DNA exonucleases, Exo I, Exo VII or RecJ. hMSH2 is homologous to the bacterial mutS gene. A similar pathway in yeast includes the yeast MSH2 gene and two mutL-like genes referred to as PMS1 and MLH1.

[0027]

With the knowledge that mutations in a human mutS-type gene (hMSH2) sometimes cause cancer, and the discovery that HNPCC tumors exhibit microsatellite DNA instability, interest in other DNA mismatch repair genes and gene products, and their possible roles in HNPCC and/or other cancers, has intensified. It is estimated that as many as 1 in 200 individuals carry a mutation in either the hMSH2 gene or other related genes which encode other proteins in the same DNA mismatch repair pathway.

[0029]

An important objective of our work has been to identify human genes which are useful for screening and identifying individuals who are at elevated risk of developing cancer. Other objects are: to determine the sequences of exons and flanking intron structures in such genes; to use the structural information to design testing procedures for the purpose of finding and characterizing mutations which result in an absence of or defect in a gene product which confers cancer susceptibility; and to distinguish such mutations from "harmless" polymorphic variations. Another object is to use the structural information relating to exon and flanking intron sequences of a cancer-linked gene to diagnose tumor types and prescribe appropriate therapy. Another object is to use the structural information relating to a cancer-linked gene to identify other related candidate human genes for study.


[0033]

Summary of the Invention

Based on our knowledge of DNA mismatch repair mechanisms in bacteria and yeast, including conservation of mismatch repair genes, we reasoned that human DNA mismatch repair homologs should exist, and that mutations in such homologs affecting protein function would be likely to cause genetic instability, possibly leading to an increased risk of developing certain forms of human cancer. We have isolated and sequenced two human genes, hPMS1 and hMLH1, each of which encodes a protein involved in DNA mismatch repair. hPMS1 and hMLH1 are homologous to mutL genes found in E. coli. Our studies strongly support an association between mutations in DNA mismatch repair genes and susceptibility to HNPCC. Thus, the DNA mismatch repair gene sequence information of the present invention, namely cDNA and genomic structures relating to hMLH1 and hPMS1, makes possible a number of useful methods relating to cancer risk determination and diagnosis. The invention also encompasses a large number of nucleotide and protein structures which are useful in such methods.

[0035]

We mapped the location of hMLH1 to human chromosome 3p21-23. This is a region of the human genome that, based upon family studies, harbors a locus that predisposes individuals to HNPCC. Additionally, we have found a mutation in a conserved region of the hMLH1 cDNA in HNPCC-affected individuals from a Swedish family. The mutation is not found in unaffected individuals from the same family, nor is it a simple polymorphism. We have also found that a homologous mutation in yeast results in a defective DNA mismatch repair protein. We have also found a frameshift mutation in hMLH1 of affected individuals from an English family. Our discovery of cancer-linked mutations in hMLH1, combined with the gene's map position, which is coincident with a previously identified HNPCC-linked locus, plus the likely role of the hMLH1 gene in mutation avoidance, makes the hMLH1 gene a prime candidate for underlying one form of common inherited human cancer, and a prime candidate to screen and identify individuals who have an elevated risk of developing cancer. hMLH1 has 19 exons and 18 introns. We have determined the location of each of the 18 introns relative to hMLH1 cDNA. We have also determined the structure of all intron/exon boundary regions of hMLH1. Knowledge of the intron/exon boundary structures makes possible efficient screening regimes to locate mutations which negatively affect the structure and function of gene products. Further, we have designed complete sets of oligonucleotide primer pairs which can be used in PCR to amplify individual complete exons together with surrounding intron boundary structures. We mapped the location of hPMS1 to human chromosome 7. Subsequent studies by others³⁹ have confirmed our prediction that mutations in this gene are linked to HNPCC.
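Because each exon is amplified together with its surrounding intron boundary structures, a screening workflow needs to cut an exon plus a fixed amount of flanking intron out of a genomic sequence. The sketch below assumes 0-based, end-exclusive exon coordinates and a hypothetical toy sequence; the real hMLH1 exon and flanking sequences are those of SEQ ID NOS: 6-24 and are not reproduced here. It also reports whether the flanking intron carries the common AG (acceptor) and GT (donor) dinucleotides, and prints the exon in lower case against upper-case intron, echoing the lower-case exon convention described for Figure 4A.

```python
def exon_with_flanks(genomic, exon_start, exon_end, flank):
    """Return an exon plus up to `flank` bases of intron on each side.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    genomic = genomic.upper()
    upstream = genomic[max(0, exon_start - flank):exon_start]
    exon = genomic[exon_start:exon_end]
    downstream = genomic[exon_end:exon_end + flank]
    return {
        "upstream_intron": upstream,
        "exon": exon,
        "downstream_intron": downstream,
        # Most introns end in AG (acceptor side) and begin with GT (donor side).
        "acceptor_AG": upstream.endswith("AG"),
        "donor_GT": downstream.startswith("GT"),
        # Exon shown in lower case, flanking intron in upper case.
        "display": upstream + exon.lower() + downstream,
    }


# Hypothetical toy locus: intron ...AG | exon | GT... intron
UPSTREAM_INTRON = "TTTTCAG"
EXON = "ATGTCGTTC"
DOWNSTREAM_INTRON = "GTAAGTTTT"
genomic = UPSTREAM_INTRON + EXON + DOWNSTREAM_INTRON
start = len(UPSTREAM_INTRON)
result = exon_with_flanks(genomic, start, start + len(EXON), flank=4)
print(result["display"])  # TCAGatgtcgttcGTAA
```

The first and last exons of a gene lack an upstream acceptor and a downstream donor respectively, so the two splice-site flags are only meaningful for internal exons.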

[0036]

The most immediate use of the present invention will be in screening tests on human individuals who are members of families which exhibit an unusually high frequency of early onset cancer, for example HNPCC. Accordingly, one aspect of the invention comprises a method of diagnosing cancer susceptibility in a subject by detecting a mutation in a mismatch repair gene or gene product in a tissue from the subject, wherein the mutation is indicative of the subject's susceptibility to cancer. In a preferred embodiment of the invention, the step of detecting comprises detecting a mutation in a human mutL homolog gene, for example, hMLH1 or hPMS1.

[0037]

The method of diagnosing preferably comprises the steps of: 1) amplifying a segment of the mismatch repair gene or gene product from an isolated nucleic acid; 2) comparing the amplified segment with an analogous segment of a wild-type allele of the mismatch repair gene or gene product; and 3) detecting a difference between the amplified segment and the analogous segment, the difference being indicative of a mutation in the mismatch repair gene or gene product which confers cancer susceptibility. Another aspect of the invention provides methods of determining whether the difference between the amplified segment and the analogous wild-type segment causes an affected phenotype, i.e., whether the sequence alteration affects the individual's ability to repair DNA mispairs.
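Steps 2) and 3) above amount to aligning the sequenced amplicon against the wild-type segment and listing the differences. A minimal sketch follows, using Python's standard `difflib.SequenceMatcher` as a stand-in for a proper sequence aligner (an assumption: it is not a biological alignment algorithm and may place an indel anywhere within a repeat). Differences are labelled with the categories of claim 6 (substitution, insertion, deletion), with unequal-length replacements reported as "complex", which would include rearrangements. The sequences are hypothetical.

```python
from difflib import SequenceMatcher


def find_differences(wild_type, sample):
    """List differences between a wild-type segment and an amplified sample segment.

    Each entry is (kind, position, wild_type_bases, sample_bases); position is
    1-based on the wild-type sequence. For insertions, position is the wild-type
    base immediately after the inserted bases.
    """
    wt, smp = wild_type.upper(), sample.upper()
    matcher = SequenceMatcher(None, wt, smp, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace":
            kind = "substitution" if (i2 - i1) == (j2 - j1) else "complex"
        elif tag == "delete":
            kind = "deletion"
        else:  # tag == "insert"
            kind = "insertion"
        differences.append((kind, i1 + 1, wt[i1:i2], smp[j1:j2]))
    return differences


WILD_TYPE = "GGATCCTTAGGC"  # hypothetical wild-type segment
print(find_differences(WILD_TYPE, "GGATTCTTAGGC"))  # [('substitution', 5, 'C', 'T')]
print(find_differences(WILD_TYPE, "GGATCTTAGGC"))   # [('deletion', 5, 'C', '')]
```

Whether a reported difference is a harmless polymorphism or confers an affected phenotype is a separate question that sequence comparison alone does not answer.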

[0038]

The method of diagnosing may include the steps of: 1) reverse transcribing all or a portion of an RNA copy of a DNA mismatch repair gene; and 2) amplifying a segment of the DNA produced by reverse transcription. An amplifying step in the present invention may comprise: selecting a pair of oligonucleotide primers capable of hybridizing to opposite strands of the mismatch repair gene, in an opposite orientation; and performing a polymerase chain reaction utilizing the oligonucleotide primers such that nucleic acid of the mismatch repair chain intervening between the primers is amplified to become the amplified segment. In preferred embodiments of the methods summarized above, the DNA mismatch repair gene is hMLH1 or hPMS1. The segment of DNA corresponds to a unique portion of a nucleotide sequence selected from the group consisting of SEQ ID NOS: 6-24. "First stage" oligonucleotide primers selected from the group consisting of SEQ ID NOS: 44-82 are used in PCR to amplify the DNA segment. The invention also provides a method of using "second stage" nested primers (SEQ ID NOS: 83-122) together with the first stage primers to allow more specific amplification and conservation of template DNA.
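The primer geometry described above (one primer on each strand, pointing toward each other) and the nested "second stage" strategy can be illustrated with a toy in-silico PCR. This is a minimal sketch, assuming exact primer matches and a template written 5'→3' as its top strand; the sequences are hypothetical and are not the primers of SEQ ID NOS: 44-122. The forward primer matches the top strand directly, while the reverse primer matches the bottom strand, so its reverse complement is what appears on the top strand downstream.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse) -> Optional[str]:
    """Return the predicted amplicon (primer sites included), or None if no product.

    Assumes exact matches: `forward` is read on the top strand, and `reverse`
    (written 5'->3' on the bottom strand) is located on the top strand via its
    reverse complement, downstream of the forward site.
    """
    template = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse)
    start = template.find(fwd)
    if start == -1:
        return None
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Hypothetical template: outer (first stage) sites flank inner (nested) sites.
OUTER_FWD_SITE, INNER_FWD_SITE = "GATTACAGGT", "AGCTTGACCA"
INNER_REV_SITE, OUTER_REV_SITE = "GCGTAATCGA", "CATGGTACTT"
CORE = "TTGGGAAACC"
template = ("TTTT" + OUTER_FWD_SITE + "CC" + INNER_FWD_SITE + CORE
            + INNER_REV_SITE + "AA" + OUTER_REV_SITE + "GGGG")

# First stage: outer primer pair on the original template.
outer_product = in_silico_pcr(template, OUTER_FWD_SITE, reverse_complement(OUTER_REV_SITE))
# Second stage: nested primer pair on the first-stage product.
inner_product = in_silico_pcr(outer_product, INNER_FWD_SITE, reverse_complement(INNER_REV_SITE))
print(inner_product)  # AGCTTGACCATTGGGAAACCGCGTAATCGA
```

Running the inner pair on the outer product, rather than on the original template, is what makes the reaction nested: the inner primers only need to be specific within the already-enriched first-stage product.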

[0040]

Another aspect of the present invention provides a method of identifying and classifying a DNA mismatch repair defective tumor comprising detecting in a tumor a mutation in a mismatch repair gene or gene product, preferably a mutL homolog (hMLH1 or hPMS1), the mutation being indicative of a defect in a mismatch repair system of the tumor.

[0041]

The present invention also provides useful nucleotide and protein compositions. One such composition is an isolated nucleotide or protein structure including a segment sequentially corresponding to a unique portion of a human mutL homolog gene or gene product, preferably derived from either hMLH1 or hPMS1.

[0042]

Other composition aspects of the invention comprise oligonucleotide primers capable of being used together in a polymerase chain reaction to amplify specifically a unique segment of a human mutL homolog gene, preferably hMLH1 or hPMS1.
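Claim 18 characterizes such primers by a melting temperature (Tm) above about 55 °C, but the text does not state how Tm is calculated. As an assumption, the sketch below uses two common rough approximations: the Wallace rule, Tm = 2(A+T) + 4(G+C), for oligonucleotides shorter than 14 nucleotides, and Tm = 64.9 + 41(G+C - 16.4)/N for longer ones, where N is the primer length. Both ignore salt and primer concentration; nearest-neighbor thermodynamic models are more accurate. The example primers are hypothetical.

```python
def estimate_tm(primer):
    """Rough melting temperature in degrees C (salt and concentration ignored)."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T sequence")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if len(seq) < 14:
        return 2.0 * at + 4.0 * gc                   # Wallace rule
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)       # basic GC-content formula


def exceeds_threshold(primer, threshold_c=55.0):
    """True if the estimated Tm is above the threshold (about 55 C in claim 18)."""
    return estimate_tm(primer) > threshold_c


print(round(estimate_tm("GCGCGCGCGCGCATATATAT"), 2))  # 55.88
print(exceeds_threshold("ACGTACGTACGTACGTACGT"))     # False (estimate 51.78)
```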

[0043]

Another aspect of the present invention provides a probe including a nucleotide sequence capable of binding specifically by Watson/Crick pairing to complementary bases in a portion of a human mutL homolog gene; and a label-moiety attached to the sequence, wherein the label-moiety has a property selected from the group consisting of fluorescent, radioactive and chemiluminescent.
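Watson/Crick binding of a probe means that the probe pairs antiparallel with the target strand, so the probe's reverse complement must appear (exactly or nearly) in the target sequence. The sketch below scans a target for such sites and counts mismatches. The allowed number of mismatches and all sequences are hypothetical, and the label moiety is outside what this string model represents.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def binding_sites(probe, target, max_mismatches=0):
    """Return (start, mismatches) for each target window the probe can pair with.

    `start` is the 0-based position on the target strand (written 5'->3').
    Pairing is antiparallel, so the probe's reverse complement is compared
    base by base with each window of the target.
    """
    site = probe.upper().translate(_COMPLEMENT)[::-1]
    target = target.upper()
    hits = []
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(site, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


probe = "TGTAATC"  # hypothetical probe, 5'->3'
print(binding_sites(probe, "TTTTGATTACACCCC"))                    # [(4, 0)]
print(binding_sites(probe, "TTTTGATCACACCCC", max_mismatches=1))  # [(4, 1)]
```

Allowing no mismatches models the "specific" binding of the text in its simplest form; a single tolerated mismatch shows how a point mutation in the target changes the match.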

[0044]

We have also isolated and sequenced mouse MLH1 (mMLH1) and PMS1 (mPMS1) genes. We have used our knowledge of mouse mismatch repair genes to construct animal models for studying cancer. The models will be useful to identify additional oncogenes and to study environmental effects on mutagenesis. We have produced polyclonal antibodies directed to a portion of the protein encoded by mPMS1 cDNA. The antibodies also react with hPMS1 protein and are useful for detecting the presence of the protein encoded by a normal hPMS1 gene. We are also producing monoclonal antibodies directed to hMLH1 and hPMS1.

[0046]

In addition to diagnostic and therapeutic uses for the genes, our knowledge of hMLH1 and hPMS1 can be used to search for other genes of related function which are candidates for playing a role in certain forms of human cancer.

[0047]

Description of the Figures

[0048]

Figure 1 is a flow chart showing an overview of the sequence of experimental steps we used to isolate, characterize and use human and mouse PMS1 and MLH1 genes.

Figure 2 is an alignment of protein sequences for mutL homologs (SEQ ID NOS: 1-3) showing two highly-conserved regions (underlined) which we used to create degenerate PCR oligonucleotides for isolating additional mutL homologs.
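Degenerate oligonucleotides such as those designed from the conserved regions of Figure 2 are conventionally written with IUPAC ambiguity codes, each code standing for a set of bases. The sketch below expands such a primer into the individual sequences it represents and reports its degeneracy (the number of distinct sequences in the mixture). The primer shown is hypothetical; the actual degenerate primers are not given in this section.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count


def expand_degenerate(primer):
    """All concrete sequences encoded by a degenerate primer, in itertools.product order."""
    choices = [IUPAC[code] for code in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]


primer = "GAYTTYGG"  # hypothetical degenerate primer
print(degeneracy(primer))             # 4
print(expand_degenerate(primer)[:2])  # ['GACTTCGG', 'GACTTTGG']
```

Degeneracy grows multiplicatively with each ambiguous position, which is why degenerate primers are usually drawn from the most highly conserved stretches of an alignment.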

[0053]

Figure 3 shows the entire cDNA nucleotide sequence (SEQ ID NO: 4) for the human MLH1 gene, and the corresponding predicted amino acid sequence (SEQ ID NO: 5) for the human MLH1 protein. The underlined DNA sequences are the regions of cDNA that correspond to the degenerate PCR primers that were originally used to amplify a portion of the MLH1 gene (nucleotides 118-135 and 343-359).

Figure 4A shows the nucleotide sequences of the 19 exons which collectively correspond to the entire hMLH1 cDNA structure. The exons are flanked by intron boundary structures. Primer sites are underlined. The exons with their flanking intron structures correspond to SEQ ID NOS: 6-24. The exons, shown in non-underlined lower-case letters, correspond to SEQ ID NOS: 25-43.

[0054]

Figure 4B shows nucleotide sequences of primer pairs which have been used in PCR to amplify the individual exons. The "second stage" amplification primers (SEQ ID NOS: 83-122) are "nested" primers which are used to amplify target exons from the amplification product obtained with corresponding "first stage" amplification primers (SEQ ID NOS: 44-82). The structures in Figure 4B correspond to the structures in Tables 2 and 3.

Figure 5 is an alignment of the predicted amino acid sequences for human and yeast MLH1 proteins (SEQ ID NOS: 5 and 123, respectively). Amino acid identities are indicated by boxes and gaps are indicated by dashes.

Figure 6 is a phylogenetic tree of MutL-related proteins.

Figure 7 is a two-panel photograph. The first panel (A) is a metaphase spread showing hybridization of the hMLH1 gene to chromosome 3.

[0055]

The second panel (B) is a composite of chromosome 3 from multiple metaphase spreads aligned with a human chromosome 3 ideogram. The region of hybridization is indicated in the ideogram by a vertical bar.

[0056]

Figure 8 is a comparison of sequence chromatograms from affected and unaffected individuals showing identification of a C to T transition mutation that produces a non-conservative amino acid substitution at position 44 of the hMLH1 protein.
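The change described for Figure 8 combines two simple classifications: a C to T change is a transition (pyrimidine to pyrimidine), and its effect on the protein follows from translating the codon before and after the change. The sketch below assumes coding-sequence numbering that starts at the A of the ATG start codon, under which amino acid 44 is encoded by nucleotides 130-132; position 131 is used only to illustrate that arithmetic. The codon TCC is a hypothetical serine codon chosen to reproduce the serine to phenylalanine substitution described for Figure 9, not a sequence read from Figure 3.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}

PURINES, PYRIMIDINES = set("AG"), set("CT")


def substitution_type(ref, alt):
    """Classify a single-base change as a transition or a transversion."""
    pair = {ref.upper(), alt.upper()}
    if len(pair) != 2:
        raise ValueError("reference and alternate bases must differ")
    return "transition" if pair <= PURINES or pair <= PYRIMIDINES else "transversion"


def locate_in_codon(cds_position):
    """Map a 1-based coding position to (codon number, 0-based offset within codon)."""
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3


def amino_acid_change(codon, offset, alt):
    """Translate a codon before and after replacing the base at `offset`."""
    codon = codon.upper()
    mutant = codon[:offset] + alt.upper() + codon[offset + 1:]
    return CODON_TABLE[codon], CODON_TABLE[mutant]


codon_number, offset = locate_in_codon(131)   # second base of codon 44
print(codon_number, offset)                    # 44 1
print(substitution_type("C", "T"))             # transition
print(amino_acid_change("TCC", offset, "T"))   # ('S', 'F'), serine -> phenylalanine
```

Whether a substitution is "non-conservative" is a judgment about the chemistry of the two residues (here a small polar serine replaced by a bulky aromatic phenylalanine); the code only identifies which residues are involved.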

[0057]

Figure 9 is an amino acid sequence alignment (SEQ ID NOS: 124-131) of the highly conserved region of the MLH family of proteins surrounding the site of the predicted amino acid substitution. Bold type indicates the position of the predicted serine to phenylalanine amino acid substitution in affected individuals. Also highlighted are the serine or alanine residues conserved at this position in MutL-like proteins. Bullets indicate positions of highest amino acid conservation. For the MLH1 protein, the dots indicate that the sequence has not been obtained. Sequences were aligned as described below in reference to the phylogenetic tree of Figure 6.

[0058]

Figure 10 shows the entire nucleotide sequence for hPMS1 (SEQ ID NO: 132).

[0059]

Figure 11 is an alignment of the predicted amino acid sequences for human and yeast PMS1 proteins (SEQ ID NOS: 133 and 134, respectively). Amino acid identities are indicated by boxes and gaps are indicated by dashes.

[0060]

Figure 12 is a partial nucleotide sequence of mouse MLH1 (mMLH1) cDNA (SEQ ID NO: 135).

[0061]

Figure 13 is a comparison of the predicted amino acid sequences for mMLH1 and hMLH1 proteins (SEQ ID NOS: 136 and 5, respectively).

[0062]

Figure 14 shows the cDNA nucleotide sequence for mouse PMS1 (mPMS1) (SEQ ID NO: 137).

[0063]

Figure 15 is a comparison of the predicted amino acid sequences for mPMS1 and hPMS1 proteins (SEQ ID NOS: 138 and 133, respectively).

[0064]

Definitions

gene - "Gene" means a nucleotide sequence that contains a complete coding sequence. Generally, "genes" also include nucleotide sequences found upstream (e.g., promoter sequences, enhancers) or downstream (e.g., transcription termination signals, polyadenylation sites) of the coding sequence that affect the expression of the encoded polypeptide.

[0065]

gene product - A "gene product" is either a DNA or RNA (mRNA) copy of a portion of a gene, or a corresponding amino acid sequence translated from mRNA.

[0066]

wild-type - The term "wild-type", when applied to nucleic acids and proteins of the present invention, means a version of a nucleic acid or protein that functions in a manner indistinguishable from a naturally-occurring, normal version of that nucleic acid or protein (i.e. a nucleic acid or protein with wild-type activity). For example, a "wild-type" allele of a mismatch repair gene is capable of functionally replacing a normal, endogenous copy of the same gene within a host cell without detectably altering mismatch repair in that cell. Different wild-type versions of the same nucleic acid or protein may or may not differ structurally from each other.

[0067]

non-wild-type - The term "non-wild-type", when applied to nucleic acids and proteins of the present invention, means a version of a nucleic acid or protein that functions in a manner distinguishable from a naturally-occurring, normal version of that nucleic acid or protein. Non-wild-type alleles of a nucleic acid of the invention may differ structurally from wild-type alleles of the same nucleic acid in any of a variety of ways including, but not limited to, differences in the amino acid sequence of an encoded polypeptide and/or differences in the expression levels of an encoded nucleotide transcript or polypeptide product.

[0068]

For example, the nucleotide sequence of a non-wild-type allele of a nucleic acid of the invention may differ from that of a wild-type allele by addition, deletion, substitution, and/or rearrangement of nucleotides. Similarly, the amino acid sequence of a non-wild-type mismatch repair protein may differ from that of a wild-type mismatch repair protein by, for example, addition, substitution, and/or rearrangement of amino acids.

[0069]

Particular non-wild-type nucleic acids or proteins that, when introduced into a normal host cell, interfere with the endogenous mismatch repair pathway, are termed "dominant negative" nucleic acids or proteins.

[0070]

homologous - The term "homologous" refers to nucleic acids or polypeptides that are highly related at the level of nucleotide or amino acid sequence. Nucleic acids or polypeptides that are homologous to each other are termed "homologues".

[0071]

The term "homologous" necessarily refers to a comparison between two sequences. In accordance with the invention, two nucleotide sequences are considered to be homologous if the polypeptides they encode are at least about 50-60% identical, preferably about 70% identical, over at least one stretch of at least 20 amino acids. Preferably, homologous nucleotide sequences are also characterized by the ability to encode a stretch of at least 4-5 uniquely specified amino acids. Both the identity and the approximate spacing of these amino acids relative to one another must be considered for nucleotide sequences to be considered homologous. For nucleotide sequences less than 60 nucleotides in length, homology is determined by the ability to encode a stretch of at least 4-5 uniquely specified amino acids.

upstream/downstream - The terms "upstream" and "downstream" are art-understood terms referring to the position of an element of nucleotide sequence. "Upstream" signifies an element that is more 5' than the reference element. "Downstream" refers to an element that is more 3' than a reference element.

[0072]

intron/exon - The terms "exon" and "intron" are art-understood terms referring to various portions of genomic gene sequences. "Exons" are those portions of a genomic gene sequence that encode protein. "Introns" are sequences of nucleotides found between exons in genomic gene sequences.

[0073]

affected - The term "affected", as used herein, refers to those members of a kindred who have developed a characteristic cancer (e.g., colon cancer in an HNPCC lineage) and/or are predicted, on the basis of, for example, genetic studies, to carry an inherited mutation that confers susceptibility to cancer.

[0074]

unique - A "unique" segment, fragment or portion of a gene or protein means a portion of a gene or protein whose sequence differs from that of any other gene or protein segment in an individual's genome. As a practical matter, a unique segment or fragment of a gene will typically be a nucleotide sequence at least about 13 bases in length that is sufficiently different from other gene segments that oligonucleotide primers may be designed and used to selectively and specifically amplify the segment. A unique segment of a protein is typically an amino acid sequence which can be translated from a unique segment of a gene.
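As an illustration of the "unique segment" criterion, the Python sketch below checks whether a candidate segment of at least 13 bases occurs exactly once, on either strand, in a reference sequence. The toy reference string and all function names are hypothetical. A real uniqueness check would search the whole genome with a dedicated alignment or k-mer index tool rather than a linear scan.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_occurrences(segment: str, reference: str) -> int:
    """Count matches of `segment` on both strands of `reference` (overlaps allowed)."""
    seg, ref = segment.upper(), reference.upper()
    total = 0
    # A set avoids double-counting a segment that is its own reverse complement.
    for probe in {seg, reverse_complement(seg)}:
        total += sum(1 for i in range(len(ref) - len(probe) + 1) if ref.startswith(probe, i))
    return total


def is_unique_segment(segment: str, reference: str, min_length: int = 13) -> bool:
    """True if the segment is long enough and occurs exactly once in the reference."""
    return len(segment) >= min_length and count_occurrences(segment, reference) == 1


candidate = "ACCGGTACGTTGC"  # hypothetical 13-base segment
toy_reference = "TTTT" + candidate + "TTTT"
print(is_unique_segment(candidate, toy_reference))  # True
# Adding the reverse complement elsewhere makes the segment occur twice.
print(is_unique_segment(candidate, toy_reference + reverse_complement(candidate)))  # False
```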

[0075]

References

[0076]

The following publications are referred to by number in the text of the application. Each of the publications is incorporated here by reference.

[0077]

1. Fishel, R., et al. Cell 75, 1027-1038 (1993).

[0078]

2. Leach, F., et al. Cell 75, 1215-1225 (1993).
3. Lindblom, A., Tannergard, P., Werelius, B. & Nordenskjold, M. Nature Genetics 5, 279-282 (1993).

[0079]

4. Prolla, T.A., Christie, D.M. & Liskay, R.M. Molec. and Cell. Biol. 14, 407-415 (1994).
5. Strand, M., Prolla, T.A., Liskay, R.M. & Petes, T.D. Nature 365, 274-276 (1993).

[0080]

6. Aaltonen, L.A., et al. Science 260, 812-816 (1993).

[0081]

7. Han, H.J., Yanagisawa, A., Kato, Y., Park, J.G. & Nakamura, Y. Cancer Res. 53, 5087-5089 (1993).

[0082]

8. Ionov, Y., Peinado, M.A., Malkhosyan, S., Shibata, D. & Perucho, M. Nature 363, 558-561 (1993).

[0083]

9. Risinger, J.I., et al. Cancer Res. 53, 5100-5103 (1993).

[0084]

10. Thibodeau, S.N., Bren, G. & Schaid, D. Science 260, 816-819 (1993).
11. Levinson, G. & Gutman, G.A. Nucleic Acids Res. 15, 5323-5338 (1987).

[0085]

12. Parsons, R., et al. Cell 75, 1227-1236 (1993).

[0086]

13. Modrich, P. Annu. Rev. Genet. 25, 229-253 (1991).

[0087]

14. Reenan, R.A. & Kolodner, R.D. Genetics 132, 963-973 (1992).

[0088]

15. Bishop, D.K., Anderson, J. & Kolodner, R.D. PNAS 86, 3713-3717 (1989).

[0089]

16. Kramer, W., Kramer, B., Williamson, M.S. & Fogel, S. J. Bacteriol. 171, 5339-5346 (1989).

[0090]

17. Williamson, M.S., Game, J.C. & Fogel, S., Genetics 110, 609-646 (1985).

[0091]

18. Prudhomme, M., Martin, B., Mejean, V. & Claverys, J. J. Bacteriol. 171, 5332-5338 (1989).

[0092]

19. Mankovich, J.A., McIntyre, C.A. & Walker, G.C. J. Bacteriol. 171, 5325-5331 (1989).

[0093]

20. Lichter, P., et al. Science 247, 64-69 (1990).

[0094]

21. Boyle, A., Feltquite, D.M., Dracopoli, N., Housman, D. & Ward, D.C. Genomics 12, 106-115 (1992).
22. Lyon, M.F. & Kirby, M.C. Mouse Genome 91, 40-80 (1993).

[0095]

23. Reenan, R.A. & Kolodner, R.D. Genetics 132, 975-985 (1992).

[0096]

24. Latif, F. et al. Cancer Research 52, 1451-1456 (1992).

[0097]

25. Naylor, S.L., Johnson, B.E., Minna, J.D. & Sakaguchi, A.Y. Nature 329, 451-454 (1987).

[0098]

26. Ali, I.U., Lidereau, R. & Callahan, R. Journal of the National Cancer Institute 81, 1815-1820 (1989).
27. Higgins, D., Bleasby, A. & Fuchs, R. Comput. Appl. Biosci. 8, 189-191 (1992).

[0099]

28. Fields, S. & Song, O.K. Nature 340, 245-246 (1989).

[0100]

29. Lynch, H.T., et al. Gastroenterology 104, 1535-1549 (1993).

[0101]

30. Elledge, S.J., Mulligan, J.T., Ramer, S.W., Spottswood, M. & Davis, R.W. Proc. Natl. Acad. Sci. U.S.A. 88, 1731-1735 (1991).

[0102]

31. Frohman, M. Amplifications, a forum for PCR users 1, 11-15 (1990).

[0103]

32. Powell, S.M., et al. New England Journal of Medicine 329, 1982-1987 (1993).

[0104]

33. Wu, D.Y., Nozari, G., Schold, M., Conner, B.J. & Wallace, R.B. DNA 8, 135-142 (1989).

[0105]

34. Mullis, K.B. & Faloona, F.A. Methods in Enzymology 155, 335-350 (1987).

[0106]

35. Bishop, T.D. & Thomas, H. Cancer Surv. 9, 585-604 (1990).
36. Capecchi, M.R. Scientific American 52-59 (March 1994).

[0107]

37. Erlich, H.A. PCR Technology, Principles and Applications for DNA Amplification (1989).

[0108]

38. Papadopoulos et al. Science 263, 1625-1629 (March 1994).

[0109]

39. Nicolaides et al. Nature 371, 75-80 (September 1994).
40. Tong et al. Anal. Chem. 64, 2672-2677 (1992).

[0110]

41. Debuire et al. Clin. Chem. 39, 1682-1685 (1993).

[0111]

42. Wahlberg et al. Electrophoresis 13, 547-551 (1992).

[0112]

43. Kaneoka et al. Biotechniques 10, 30, 32, 34 (1991).

[0113]

44. Huhman et al. Biotechniques 10, 84-93 (1991).
45. Hultman et al. Nuc. Acid. Res. 17, 4937-4946 (1989).

[0114]

46. Zu et al. Mutn. Res. 288, 232-248 (1993).

[0115]

47. Espelund et al. Biotechniques 13, 74-81 (1992).

[0116]

48. Prolla et al. Science 265, 1091-1093 (1994).

[0117]

49. Bishop et al. Mol. Cell. Biol. 6, 3401-3409 (1986).
50. Folger et al. Mol. Cell. Biol. 5, 70-74 (1985).

[0118]

51. T.C. Brown et al. Cell 54, 705-711 (1988).

[0119]

52. T.C. Brown et al. Genome 31, 578-583 (1989).
53. C. Muster-Nassal et al. Proc. Natl. Acad. Sci. U.S.A. 83, 7618-7622 (1986).

[0120]

54. I. Varlet et al. Proc. Natl. Acad. Sci. U.S.A. 87, 7883-7887 (1990).

[0121]

55. D.C. Thomas et al. J. Biol. Chem. 266, 3744-3751 (1991).

[0122]

56. JJ. Holmes et al. Proc. Natl. Acad. Sci. U.S.A. 87, 5837-5841 (1990).
57. P. Branch et al. Nature 362, 652-654 (1993).

[0123]

58. A. Kat et al. Proc. Natl. Acad. Sci. U.S.A. 90, 6424-6428 (1993).

[0124]

59. K. Wiebauer et al. Nature 339, 234-236 (1989).

[0125]

60. K. Wiebauer et al. Proc. Natl. Acad. Sci. U.S.A. 87, 5842-5845 (1990).

[0126]

61. P. Neddermann et al. J. Biol. Chem. 268, 21218-21224 (1993).
62. Kramer et al. Mol. Cell. Biol. 9, 4432-4440 (1989).

[0127]

63. Kramer et al. J. Bacteriol. 171, 5339-5346 (1989).

[0128]

Description of the Invention

We have discovered mammalian genes which are involved in DNA mismatch repair. One of the genes, hPMS1, encodes a protein which is homologous to the yeast DNA mismatch repair protein PMS1. We have mapped the location of hPMS1 to human chromosome 7 and the mouse PMS1 gene to mouse chromosome 5, band G. Another gene, hMLH1 (MutL Homolog), encodes a protein which is homologous to the yeast DNA mismatch repair protein MLH1. We have mapped hMLH1 to human chromosome 3p21.3-23 and the mouse MLH1 gene to mouse chromosome 9, band E.

[0129]

Studies (refs. 1, 2) have demonstrated the involvement of a human DNA mismatch repair gene homolog, hMSH2, on chromosome 2p in HNPCC. Based on linkage data, a second HNPCC locus has been assigned to chromosome 3p21-23 (ref. 3). Examination of tumor DNA from the chromosome 3-linked kindreds revealed dinucleotide repeat instability similar to that observed for other HNPCC families (ref. 6) and several types of sporadic tumors (refs. 7-10). Because dinucleotide repeat instability is characteristic of a defect in DNA mismatch repair (refs. 5, 11, 12), we reasoned that HNPCC linked to chromosome 3p21-23 could result from a mutation in a second DNA mismatch repair gene.

[0130]

Repair of mismatched DNA in Escherichia coli requires a number of genes, including mutS, mutL and mutH; a defect in any one of them results in elevated spontaneous mutation rates (ref. 13). Genetic analysis in the yeast Saccharomyces cerevisiae has identified three DNA mismatch repair genes: a mutS homolog, MSH2 (ref. 14), and two mutL homologs, PMS1 (ref. 16) and MLH1 (ref. 4). Each of these three genes plays an indispensable role in DNA replication fidelity, including the stabilization of dinucleotide repeats (ref. 5).

[0131]

We believe that hMLH1 is the HNPCC gene previously linked to chromosome 3p, based on the similarity of the hMLH1 gene product to the yeast DNA mismatch repair protein MLH1 (ref. 4), the coincident location of the hMLH1 gene and the HNPCC locus on chromosome 3, and the hMLH1 missense mutations which we found in affected individuals from chromosome 3-linked HNPCC families.

[0133]

Our knowledge of the human and mouse MLH1 and PMS1 gene structures has many important uses. The gene sequence information can be used to screen individuals for cancer risk. Knowledge of the gene structures makes it possible to easily design PCR primers which can be used to selectively amplify portions of the hMLH1 and hPMS1 genes for subsequent comparison to the normal sequence and cancer risk analysis. This type of testing also makes it possible to search for and characterize hMLH1 and hPMS1 cancer-linked mutations, with the aim of eventually focusing the cancer screening effort on specific gene loci. Specific characterization of cancer-linked mutations in hMLH1 and hPMS1 makes possible the production of other valuable diagnostic tools, such as allele-specific probes, which may be used in screening tests to determine the presence or absence of specific gene mutations.

[0134]

Additionally, the gene sequence information for hMLH1 and/or hPMS1 can be used, for example in a two-hybrid system, to search for other genes of related function which are candidates for cancer involvement.

[0135]

The hMLH1 and hPMS1 gene structures are useful for making proteins which can be used to develop antibodies directed to specific portions of, or to the complete, hMLH1 and hPMS1 proteins. Such antibodies can then be used to isolate the corresponding proteins, and possibly related proteins, for research and diagnostic purposes. The mouse MLH1 and PMS1 gene sequences are useful for producing mice that have mutations in the respective genes. The mutant mice are useful for studying the function of each gene, particularly its relationship to cancer.

[0136]

Methods for Isolating and Characterizing Mammalian MLH1 and PMS1 Genes

[0138]

We have isolated and characterized four mammalian genes, i.e., human MLH1 (hMLH1), human PMS1 (hPMS1), mouse MLH1 (mMLH1) and mouse PMS1 (mPMS1). Because these genes are structurally similar, the methods we have employed to isolate and characterize them are generally the same. Figure 1 shows, in broad terms, the experimental approach which we used to isolate and characterize the four genes. The following discussion refers to the step-by-step procedure shown in Figure 1.

Step 1 Design of degenerate oligonucleotide pools for PCR

Earlier reports indicated that portions of three MutL-like proteins, two from bacteria (MutL and HexB) and one from yeast (PMS1), are highly conserved (refs. 16, 18, 19). After inspecting the amino acid sequences of the HexB, MutL and PMS1 proteins, as shown in Figure 2, we designed pools of degenerate oligonucleotide pairs corresponding to two highly conserved regions, KELVEN and GFRGEA, of the MutL-like proteins. The sequences (SEQ ID NOS: 139 and 140, respectively) of the degenerate oligonucleotides which we used to isolate the four genes are:

[0139]

5'-CTTGATTCTAGAGC(T/C)TCNCCNC(T/G)(A/G)AANCC-3' and 5'-AGGTCGGAGCTCAA(A/G)GA(A/G)(T/C)TNGTNGANAA-3'. The sequences TCTAGA and GAGCTC within the primers are XbaI and SacI restriction endonuclease sites, respectively. They were introduced to facilitate cloning of the PCR-amplified fragments. In designing the oligonucleotides, we took into account the fact that a given amino acid can be encoded by more than one DNA triplet (codon). The degeneracy within these sequences is indicated by multiple nucleotides within parentheses, or by N for the presence of any base at that position.

Step 2 Reverse transcription and PCR on poly(A)+ selected mRNA isolated from human cells

We isolated messenger (poly(A)+ enriched) RNA from cultured human cells, synthesized double-stranded cDNA from the mRNA, and performed PCR with the degenerate oligonucleotides (ref. 4). After trying a number of different PCR conditions, for example adjusting the annealing temperature, we successfully amplified a DNA fragment of the size predicted (~210 bp) for a MutL-like protein.

Step 3 Cloning and sequencing of PCR-generated fragments; identification of two gene fragments representing human PMS1 and MLH1

[0141]

We isolated the PCR-amplified material (~210 bp) from an agarose gel and cloned it into a plasmid (pUC19). We determined the DNA sequence of several different clones. The amino acid sequences inferred from the DNA sequences of two clones showed strong similarity to other known MutL-like proteins (refs. 4, 16, 18, 19). The predicted amino acid sequence for one of the clones was most similar to the yeast PMS1 protein; we therefore named it hPMS1, for human PMS1. The second clone was found to encode a polypeptide that most closely resembles the yeast MLH1 protein and was named hMLH1, for human MLH1.

Step 4 Isolation of complete human and mouse PMS1 and MLH1 cDNA clones using the PCR fragments as probes

We used the 210-bp PCR-generated fragments of the hMLH1 and hPMS1 cDNAs as probes to screen both human and mouse cDNA libraries (from Stratagene, or as described in reference 30). A number of cDNAs corresponding to these two genes were isolated. Many of the cDNAs were truncated at the 5' end. Where necessary, PCR techniques (ref. 31) were used to obtain the 5' end of the gene, in addition to further screening of cDNA libraries. Complete composite cDNA sequences were used to predict the amino acid sequences of the human and mouse MLH1 and PMS1 proteins.

Step 5 Isolation of human and mouse PMS1 and MLH1 genomic clones

Information on the genomic and cDNA structure of the human MLH1 and PMS1 genes is necessary in order to thoroughly screen for mutations in cancer-prone families. We have used human cDNA sequences as probes to isolate the genomic sequences of human PMS1 and MLH1. We have isolated four cosmids and two P1 clones for hPMS1 that together are likely to contain most, if not all, of the cDNA (exon) sequence. For hMLH1 we have isolated four overlapping λ-phage clones containing 5'-MLH1 genomic sequences and four P1 clones (two full-length clones and two which include the 5' coding end plus portions of the promoter region). PCR analysis using pairs of oligonucleotides specific to the 5' and 3' ends of the hMLH1 cDNA clearly indicates that the P1 clone contains the complete hMLH1 cDNA information. Similarly, genomic clones for the mouse PMS1 and MLH1 genes have been isolated and partially characterized (described in Step 8).

[0143]

Step 6 Chromosome positional mapping of the human and mouse PMS1 and MLH1 genes by fluorescence in situ hybridization

[0145]

We used genomic clones isolated from human and mouse PMS1 and MLH1 for chromosomal localization by fluorescence in situ hybridization (FISH) (refs. 20, 21). We mapped the human MLH1 gene to chromosome 3p21.3-23, as shown in Figure 7 and discussed in more detail below. We mapped the mouse MLH1 gene to chromosome 9, band E, a region of synteny between mouse and human (ref. 22). In addition to FISH, we used PCR with a pair of hMLH1-specific oligonucleotides to analyze DNA from a rodent/human somatic cell hybrid mapping panel (Coriell Institute for Medical Research, Camden, N.J.). Our PCR results with the panel clearly indicate that hMLH1 maps to chromosome 3. The position of hMLH1 at 3p21.3-23 coincides with a region known, from linkage data, to harbor a second locus for HNPCC.
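Assignment with a somatic cell hybrid panel rests on concordance: the gene-specific PCR product should be present in exactly those hybrids that retain the human chromosome carrying the gene. The Python sketch below scores each human chromosome by how often its presence or absence agrees with the PCR result. The hybrid names, their chromosome contents and the PCR calls are invented for illustration and are not the Coriell panel data.

```python
def concordance(panel: dict[str, set[str]], pcr_positive: dict[str, bool]) -> dict[str, float]:
    """For each human chromosome, the fraction of hybrids in which the chromosome's
    presence agrees with a positive PCR result (and its absence with a negative one)."""
    chromosomes = sorted(set().union(*panel.values()))
    return {
        chrom: sum((chrom in panel[h]) == pcr_positive[h] for h in pcr_positive) / len(pcr_positive)
        for chrom in chromosomes
    }


# Hypothetical four-hybrid panel: human chromosomes retained by each rodent/human hybrid.
panel = {"H1": {"3", "7", "12"}, "H2": {"1", "3"}, "H3": {"7", "12"}, "H4": {"1", "5"}}
# Hypothetical PCR results with a gene-specific primer pair.
pcr_positive = {"H1": True, "H2": True, "H3": False, "H4": False}

scores = concordance(panel, pcr_positive)
print(max(scores, key=scores.get), scores)
```

In this toy panel only chromosome 3 agrees with the PCR result in every hybrid. With a real panel, discordant hybrids are expected for every chromosome except the one carrying the gene.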

[0147]

We mapped the hPMS1 gene, as shown in Figure 12, to the long (q) arm of chromosome 7 (either 7q11 or 7q22) and the mouse PMS1 gene to chromosome 5, band G, two regions of synteny between human and mouse (ref. 22). We performed PCR using oligonucleotides specific to hPMS1 on DNA from a rodent/human cell panel. In agreement with the FISH data, this confirmed that hPMS1 is located on chromosome 7. These observations assure us that our human map position for hPMS1 on chromosome 7 is correct. The physical localization of hPMS1 is useful for identifying families which may have a cancer-linked mutation in hPMS1.

[0149]

Step 7 Using genomic and cDNA sequences to identify mutations in hPMS1 and hMLH1 genes from HNPCC families

We have analyzed samples collected from individuals in HNPCC families in order to identify mutations in the hPMS1 or hMLH1 genes. Our approach is to design PCR primers, based on our knowledge of the gene structures, that amplify exon/intron segments which we can compare to the known normal sequences. We refer to this approach as "exon screening".

[0150]

Using cDNA sequence information, we have designed, and are continuing to design, hPMS1- and hMLH1-specific oligonucleotides to delineate exon/intron boundaries within genomic sequences. The hPMS1- and hMLH1-specific oligonucleotides were used to probe genomic clones for the presence of exons containing that sequence. Oligonucleotides that hybridized were used as primers for DNA sequencing from the genomic clones. Exon-intron junctions were identified by comparing genomic with cDNA sequences. Amplification of specific exons from genomic DNA by PCR, followed by sequencing of the products, is one method of screening HNPCC families for mutations (refs. 1, 2). We have identified genomic clones containing hMLH1 cDNA information and have determined the structures of all intron/exon boundary regions flanking the 19 exons of hMLH1. We have used the exon-screening approach to examine the MLH1 gene of individuals from HNPCC families showing linkage to chromosome 3 (ref. 3). As discussed in more detail below, we identified a mutation in the MLH1 gene of one such family, consisting of a C to T substitution. We predict that this C to T mutation causes a serine to phenylalanine substitution in a highly conserved region of the protein. We are continuing to identify HNPCC families from whom we can obtain samples in order to find additional mutations in the hMLH1 and hPMS1 genes.

We are also using a second approach to identify mutations in hPMS1 and hMLH1: hPMS1- or hMLH1-specific oligonucleotide primers are used to produce first-strand cDNA by reverse transcription from RNA. PCR using gene-specific primers will allow us to amplify specific regions of these genes, and DNA sequencing of the amplified fragments will allow us to detect mutations.
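The comparison step in exon screening reports each base difference between an amplified segment and the wild-type sequence, together with its effect on the encoded codon, as with the C to T change predicted to replace serine with phenylalanine. The Python sketch below does this for substitutions only, numbering from the A of the ATG as this application does. The 9-nucleotide example is hypothetical and is not an hMLH1 sequence. Insertions, deletions and splice effects would need a real alignment step and are out of scope.

```python
import itertools

# Standard genetic code, built in TCAG order (third codon position varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def find_substitutions(wild_type: str, sample: str) -> list[dict]:
    """Report single-base differences between two equal-length coding sequences.

    Position 1 is the A of the ATG. If two changes fall in one codon, each is
    reported with the fully mutated sample codon.
    """
    if len(wild_type) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    wt, mut = wild_type.upper(), sample.upper()
    changes = []
    for i, (ref, alt) in enumerate(zip(wt, mut)):
        if ref == alt:
            continue
        codon_number = i // 3 + 1
        start = (codon_number - 1) * 3
        ref_aa = CODON_TABLE.get(wt[start:start + 3], "?")
        alt_aa = CODON_TABLE.get(mut[start:start + 3], "?")
        changes.append({
            "position": i + 1,
            "change": f"{ref}>{alt}",
            "codon": codon_number,
            "amino_acid": f"{ref_aa}{codon_number}{alt_aa}",
        })
    return changes


# Hypothetical Met-Ser-Lys example with a TCC -> TTC (Ser -> Phe) change.
print(find_substitutions("ATGTCCAAA", "ATGTTCAAA"))
# [{'position': 5, 'change': 'C>T', 'codon': 2, 'amino_acid': 'S2F'}]
```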

[0151]

Step 8 Design targeting vectors to disrupt mouse PMS1 and MLH1 genes in ES cells; study mice deficient in mismatch repair.

[0152]

We constructed a gene targeting vector based on our knowledge of the genomic structure of mouse PMS1 DNA. We used the vector to disrupt the PMS1 gene in mouse embryonic stem cells (ref. 36). The cells were injected into mouse blastocysts, which developed into mice that are chimeric (mixtures) for cells carrying the PMS1 mutation. The chimeric animals will be used to breed mice that are heterozygous and homozygous for the PMS1 mutation. These mice will be useful for studying the role of the PMS1 gene in the whole organism.

[0154]

Human MLH1

[0155]

The following discussion explains our experimental work relating to hMLH1 in more detail. As mentioned above, to clone mammalian MLH genes we used PCR techniques like those used to identify the yeast MSH1, MSH2 and MLH1 genes and the human MSH2 gene (refs. 1, 2, 4, 14). As template for PCR, we used double-stranded cDNA synthesized from poly(A)+ enriched RNA prepared from cultured primary human fibroblasts. The degenerate oligonucleotides were targeted at the N-terminal amino acid sequences KELVEN and GFRGEA (see Figure 3), two of the most conserved regions of the MutL family of proteins previously described for bacteria and yeast (refs. 16, 18, 19). Two PCR products of the predicted size were identified, cloned and shown to encode predicted amino acid sequences with homology to MutL-like proteins. These two PCR-generated fragments were used to isolate human cDNA and genomic DNA clones.

[0157]

The oligonucleotide primers which we used to amplify human MutL-related sequences were 5'-CTTGATTCTAGAGC(T/C)TCNCCNC(T/G)(A/G)AANCC-3' (SEQ ID NO: 139) and 5'-AGGTCGGAGCTCAA(A/G)GA(A/G)(T/C)TNGTNGANAA-3' (SEQ ID NO: 140). PCR was carried out in 50 μL reactions containing cDNA template, 1.0 μM of each primer, 5 IU of Taq polymerase (C), 50 mM KCl, 10 mM Tris buffer (pH 7.5) and 1.5 mM MgCl2. PCR was run for 35 cycles of 1 minute at 94 °C, 1 minute at 43 °C and 1.5 minutes at 62 °C. Fragments of the expected size, approximately 212 bp, were cloned into pUC19 and sequenced. The cloned MLH1 PCR products were labeled with a random primer labeling kit (RadPrime, Gibco BRL) and used to probe human cDNA and genomic cosmid libraries by standard procedures. DNA sequencing of double-stranded plasmid DNAs was performed as previously described (ref. 1).
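The size of each primer pool follows from its degenerate positions. Every two-base "(X/Y)" position doubles the number of distinct oligonucleotides, and every N quadruples it. Each of SEQ ID NOS: 139 and 140 is therefore a pool of 2 × 2 × 2 × 4 × 4 × 4 = 512 different 29-mers. The Python sketch below parses the notation used in this application and enumerates a pool. The function names are illustrative.

```python
import itertools
import math
import re

# A token is either a parenthesised alternative such as "(T/C)" or a single base letter.
_TOKEN = re.compile(r"\(([ACGT](?:/[ACGT])+)\)|([ACGTN])")


def positions(primer: str) -> list[list[str]]:
    """Allowed bases at each position of a degenerate primer (N = any base)."""
    allowed = []
    for group, base in _TOKEN.findall(primer):
        if group:
            allowed.append(group.split("/"))
        elif base == "N":
            allowed.append(list("ACGT"))
        else:
            allowed.append([base])
    return allowed


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in the pool."""
    return math.prod(len(options) for options in positions(primer))


def expand(primer: str):
    """Yield every concrete sequence in the pool."""
    for combo in itertools.product(*positions(primer)):
        yield "".join(combo)


SEQ_139 = "CTTGATTCTAGAGC(T/C)TCNCCNC(T/G)(A/G)AANCC"  # 5' tail carries XbaI (TCTAGA)
SEQ_140 = "AGGTCGGAGCTCAA(A/G)GA(A/G)(T/C)TNGTNGANAA"  # 5' tail carries SacI (GAGCTC)

print(degeneracy(SEQ_139), degeneracy(SEQ_140))  # 512 512
```

Because the restriction-site tails sit in the fixed 5' portion of each primer, every oligonucleotide in a pool carries the same cloning site.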

[0159]

The hMLH1 cDNA nucleotide sequence shown in Figure 3 encodes an open reading frame of 2268 bp. Also shown in Figure 3 is the predicted protein sequence encoded by the hMLH1 cDNA. The underlined DNA sequences are the regions of cDNA that correspond to the degenerate PCR primers that were originally used to amplify a portion of the MLH1 gene (nucleotides 118-135 and 343-359).
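Coordinates in this application count from the A of the ATG. The 2268-bp open reading frame therefore corresponds to 2268 / 3 = 756 codons, matching the 756-amino-acid hMLH1 protein, with the stop codon at 2269-2271. The Python sketch below finds the longest ATG-initiated open reading frame on the given strand of a cDNA string and converts an ORF length into a protein length. The function names and the toy sequence are hypothetical. The sketch does not search the reverse strand or handle non-ATG starts.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna: str) -> Optional[Tuple[int, int]]:
    """Longest ATG-initiated open reading frame on the given strand.

    Returns 0-based (start, stop): `start` is the A of the ATG and `stop` is the
    first base of the in-frame stop codon, so the ORF length without the stop
    codon is stop - start. ORFs that run off the end of the sequence are ignored.
    """
    seq = cdna.upper()
    best = None
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if best is None or pos - start > best[1] - best[0]:
                    best = (start, pos)
                break
    return best


def protein_length(orf_length_nt: int) -> int:
    """Number of amino acids encoded by an ORF given without its stop codon."""
    if orf_length_nt % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_length_nt // 3


print(longest_orf("CCATGAAATTTTAGCC"))  # (2, 11)
print(protein_length(2268))             # 756
```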

[0160]

Figure 4A shows 19 nucleotide sequences corresponding to portions of hMLH1. Each sequence includes one of the 19 exons, in its entirety, surrounded by flanking intron sequences. Target PCR primer sites are underlined. More details relating to the derivation and uses of the sequences shown in Figure 4A are set forth below.

[0161]

As shown in Figure 5, the hMLH1 protein consists of 756 amino acids and shares 41% identity with the protein product of the yeast DNA mismatch repair gene MLH1 (ref. 4). The regions of hMLH1 most similar to yeast MLH1 are amino acids 11 through 317, which show 55% identity, and the last 13 amino acids, which are identical between the two proteins. Figure 5 shows an alignment of the predicted human MLH1 and S. cerevisiae MLH1 protein sequences. Amino acid identities are indicated by boxes, and gaps are indicated by dashes. The pairwise protein sequence alignment was performed with DNAStar MegAlign using the Clustal method (ref. 27). Pairwise alignment parameters were a ktuple of 1, a gap penalty of 3, a window of 5 and diagonals of 5. Furthermore, as shown in Figure 13, the predicted amino acid sequences of the human and mouse MLH1 proteins show at least 74% identity.
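Identity figures depend on how identical positions are counted. Examples here include the 41% overall identity, the 55% identity over residues 11-317, and the "at least one stretch of at least 20 amino acids" test in the definition of "homologous". The Python sketch below assumes the two sequences are already aligned (equal length, '-' for gaps) and divides identical columns by all alignment columns. That denominator is one common convention among several and is an assumption, not necessarily the one MegAlign used. The window function applies the 20-residue stretch test. The example fragments are hypothetical.

```python
def _identity_flags(aligned_a: str, aligned_b: str) -> list[int]:
    """1 for each column where both residues are identical (gap columns never count)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [int(a == b and a != "-") for a, b in zip(aligned_a, aligned_b)]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns as a percentage of all alignment columns (gap columns included)."""
    flags = _identity_flags(aligned_a, aligned_b)
    return 100.0 * sum(flags) / len(flags)


def max_window_identity(aligned_a: str, aligned_b: str, window: int = 20) -> float:
    """Best percent identity over any run of `window` consecutive alignment columns."""
    flags = _identity_flags(aligned_a, aligned_b)
    if len(flags) < window:
        raise ValueError("alignment is shorter than the window")
    current = best = sum(flags[:window])
    for i in range(window, len(flags)):
        current += flags[i] - flags[i - window]  # slide the window one column to the right
        best = max(best, current)
    return 100.0 * best / window


human = "ACDEFGHIKLMNPQRSTVWY" + "AAAAAAAAAA"  # hypothetical aligned fragments
other = "ACDEFGHIKLMNPQ------" + "CCCCCCCCCC"
print(round(percent_identity(human, other), 1), max_window_identity(human, other))  # 46.7 70.0
```

The example shows why a local window matters. The pair is below 50% identical overall, yet one 20-residue stretch reaches 70%, which would meet the preferred level in the definition of "homologous".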

[0162]

Figure 6 shows a phylogenetic tree of MutL-related proteins. The tree was constructed using the predicted amino acid sequences of seven MutL-related proteins: human MLH1; mouse MLH1; S. cerevisiae MLH1; S. cerevisiae PMS1; E. coli MutL; S. typhimurium MutL; and S. pneumoniae HexB. The required sequences were obtained from GenBank release 7.3. The tree was generated with the PILEUP program of the Genetics Computer Group software using a gap penalty of 3 and a length penalty of 0.1. The DNA sequences of hMLH1 and hPMS1 have been submitted to GenBank.
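The GCG PILEUP program is generally described as building its dendrogram by UPGMA clustering of pairwise similarity scores. The Python sketch below shows UPGMA clustering itself on a small, made-up distance table. It does not reproduce PILEUP's scoring, its gap and length penalties, or the actual distances behind Figure 6, and it omits branch lengths.

```python
def upgma(distances: dict[tuple[str, str], float]) -> str:
    """Cluster taxa by UPGMA and return the tree topology as a Newick string.

    `distances` lists every unordered pair of taxa exactly once.
    """
    size = {}  # current cluster label -> number of original taxa it contains
    for a, b in distances:
        size[a] = size[b] = 1
    d = {frozenset(pair): value for pair, value in distances.items()}
    while len(size) > 1:
        closest = min(d, key=d.get)  # pair of clusters with the smallest distance
        a, b = sorted(closest)
        merged = f"({a},{b})"
        for c in size:
            if c not in closest:
                # UPGMA: size-weighted average of the distances from the two merged clusters.
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        del size[a], size[b]
        d = {pair: value for pair, value in d.items() if a not in pair and b not in pair}
    return next(iter(size)) + ";"


# Made-up distances between four MutL-family proteins (illustration only).
toy = {
    ("hMLH1", "mMLH1"): 2, ("hMLH1", "yMLH1"): 6, ("mMLH1", "yMLH1"): 6,
    ("hMLH1", "EcMutL"): 10, ("mMLH1", "EcMutL"): 10, ("yMLH1", "EcMutL"): 10,
}
print(upgma(toy))  # (((hMLH1,mMLH1),yMLH1),EcMutL);
```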

[0163]

hMLH1 Intron Location and Intron/Exon Boundary Structures

[0164]

In our previous U.S. Patent Application No. 08/209,521, we described the nucleotide sequence of a complementary DNA (cDNA) clone of a human gene, hMLH1. The cDNA sequence of hMLH1 (SEQ ID NO: 4) is presented in this application in Figure 3. We note that there may be some variability between individuals' hMLH1 cDNA structures, resulting from polymorphisms within the human population and the degeneracy of the genetic code. In the present application, we report the results of our genomic sequencing studies. Specifically, we have cloned the human genomic region that includes the hMLH1 gene, with a specific focus on individual exons and the surrounding intron/exon boundary structures. Toward the ultimate goal of designing a comprehensive and efficient approach to identify and characterize mutations which confer susceptibility to cancer, we believe it is important to know the wild-type sequences of the intron structures which flank exons in the hMLH1 gene. One advantage of knowing the sequence of introns near the exon boundaries is that it makes it possible to design primer pairs for selectively amplifying entire individual exons. More importantly, a mutation in an intron region, which may, for example, cause an mRNA splicing error, could result in a defective gene product, and hence susceptibility to cancer, without any abnormality in an exon region of the gene. We believe a comprehensive screening approach requires searching for mutations not only in the exons (cDNA) but also in the intron structures which flank the exon boundaries.

[0165]

We have cloned the human genomic region that includes hMLH1 using approaches known in the art; other known approaches could also have been used. We used PCR to screen a P1 human genomic library for the hMLH1 gene. We obtained four clones, two that contained the whole gene and two which lacked the C-terminus. We characterized one of the full-length clones by cycle sequencing, which resulted in our definition of all intron/exon junction sequences on both sides of the 19 hMLH1 exons. We then designed multiple sets of PCR primers to amplify each individual exon (first stage primers) and verified the sequence of each exon and flanking intron sequence by amplifying several different genomic DNA samples and sequencing the resulting fragments on an ABI 373 sequencer. In addition, we have determined the size of each hMLH1 exon using PCR methods. Finally, we devised a set of nested PCR primers (second stage primers) for reamplification of individual exons. We have used the second stage primers in a multiplex method for analyzing HNPCC families and tumors for hMLH1 mutations. Generally, in the nested PCR approach, we perform a first multiplex amplification with four to eight sets of "first stage" primers, each directed to a different exon. We then reamplify individual exons from the product of the first amplification step, using a single set of second stage primers. Examples and further details relating to our use of the first and second stage primers are set forth below.
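When several "first stage" primer pairs share one multiplex reaction, it helps if their annealing temperatures are similar. The Python sketch below applies the common rule of thumb of 2 °C per A/T plus 4 °C per G/C (the Wallace rule) to the exon 1 primers listed in Table 2. This is an illustrative check under that simplified rule, not the method used to design the primers in this application. The rule ignores salt, Mg2+ and primer concentrations and is only a rough guide for oligonucleotides of about 20 nt or less.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in deg C: 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


# Exon 1 "first stage" pair from Table 2 (primers 18442 and 19109).
exon1_pair = {"18442": "aggcactgaggtgattggc", "19109": "tcgtagcccttaagtgagc"}
for name, seq in exon1_pair.items():
    print(name, len(seq), f"GC={gc_fraction(seq):.2f}", f"Tm~{wallace_tm(seq)} C")
```

Under this rule the two exon 1 primers come out at about 60 °C and 58 °C, close enough to share an annealing step.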

[0166]

Through our genomic sequencing studies, we have identified all nineteen exons within the hMLH1 gene and have mapped the intron/exon boundaries. One aspect of the invention, therefore, is the individual exons of the hMLH1 gene. Table 1 presents the nucleotide coordinates (i.e., the point of insertion of each intron within the coding region of the gene) of the hMLH1 exons (SEQ ID NOS: 25-43). The presented coordinates are based on the hMLH1 cDNA sequence, assigning position "1" to the "A" of the start "ATG" (this A is nucleotide 1 in SEQ ID NO: 4).

Table 1

Intron number   cDNA sequence coordinates
intron 1        116 & 117
intron 2        207 & 208
intron 3        306 & 307
intron 4        380 & 381
intron 5        453 & 454
intron 6        545 & 546
intron 7        592 & 593
intron 8        677 & 678
intron 9        790 & 791
intron 10       884 & 885
intron 11       1038 & 1039
intron 12       1409 & 1410
intron 13       1558 & 1559
intron 14       1667 & 1668
intron 15       1731 & 1732
intron 16       1896 & 1897
intron 17       1989 & 1990
intron 18       2103 & 2104
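Because Table 1 gives each intron as the pair of cDNA nucleotides it separates, the coding contribution of each exon follows by subtraction. The Python sketch below assumes that the coding sequence ends at nucleotide 2271 (the last base of the stop codon at 2269-2271 noted for Fig. 4A) and counts coding nucleotides only, ignoring any untranslated sequence in the first and last exons; the names are illustrative.

```python
# cDNA nucleotide immediately 5' of each intron in Table 1 (A of ATG = 1).
INTRON_AFTER = [116, 207, 306, 380, 453, 545, 592, 677, 790, 884,
                1038, 1409, 1558, 1667, 1731, 1896, 1989, 2103]
CDS_END = 2271  # last base of the stop codon at 2269-2271


def coding_lengths(intron_after, cds_end):
    """Coding nucleotides contributed by each exon, in 5'-to-3' order."""
    starts = [1] + [pos + 1 for pos in intron_after]
    ends = intron_after + [cds_end]
    return [end - start + 1 for start, end in zip(starts, ends)]


lengths = coding_lengths(INTRON_AFTER, CDS_END)
for exon, n in enumerate(lengths, start=1):
    print(f"exon {exon:2d}: {n:4d} coding nt")
print("total:", sum(lengths))  # total: 2271
```

By this count exon 12 contributes the most coding sequence (371 nt), and the per-exon values sum to the 2271-nt open reading frame.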

[0181]

We have also determined the nucleotide sequences of the intron regions that flank the exons of the hMLH1 gene. SEQ ID NOS: 6-24 are the individual exon sequences bounded by their respective upstream and downstream intron sequences. The same sequences are shown in Fig. 4A, where the exons are numbered from N-terminus to C-terminus with respect to the chromosomal locus. The 5-digit numbers indicate the primers used to amplify each exon. All sequences are numbered with the A of the ATG codon as nucleotide 1, and the numbers in parentheses are the coordinates of the coding sequence found in the indicated exon. Uppercase letters denote intron sequence; lowercase letters denote exon sequence or non-translated sequence found in the mRNA/cDNA clone. Lowercase, underlined sequences correspond to primers. The stop codon at 2269-2271 is italicized and underlined.
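The case convention used for SEQ ID NOS: 6-24 and Fig. 4A lends itself to simple parsing. A minimal Python sketch, assuming the sequence is available as plain text with case preserved; underlining (which marks primer sites) is lost in plain text and is not handled, and the fragment in the example is a made-up toy, not one of the listed sequences.

```python
import re


def split_by_case(seq):
    """Split a Fig. 4A-style sequence into runs labelled by letter case.

    Uppercase runs are intron; lowercase runs are exon or other mRNA/cDNA
    sequence. Digits, spaces and parentheses (coordinate labels) are skipped.
    """
    runs = []
    for match in re.finditer(r"[A-Z]+|[a-z]+", seq):
        segment = match.group()
        kind = "intron" if segment.isupper() else "exon/mRNA"
        runs.append((kind, segment))
    return runs


# Hypothetical toy fragment, not taken from SEQ ID NOS: 6-24.
for kind, segment in split_by_case("CTTTTCAG (42) gcaaaatcc GTAAGT"):
    print(kind, len(segment), segment)
```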

[0182]

Table 2 presents the sequences of the primer pairs ("first stage" primers) that we have used to amplify individual exons together with their flanking intron sequences.

[0183]

Table 2

Exon no.  Location    Primer no.  SEQ ID NO  Primer nucleotide sequence
1         upstream    18442       44         5'aggcactgaggtgattggc
1         downstream  19109       45         5'tcgtagcccttaagtgagc
2         upstream    19689       46         5'aatatgtacattagagtagttg
2         downstream  19688       47         5'cagagaaaggtcctgactc
3         upstream    19687       48         5'agagatttggaaaatgagtaac
3         downstream  19786       49         5'acaatgtcatcacaggagg
4         upstream    18492       50         5'aacctttccctttggtgagg
4         downstream  18421       51         5'gattactctgagacctaggc
5         upstream    18313       52         5'gattttctcttttccccttggg
5         downstream  18179       53         5'caaacaaagcttcaacaatttac
6         upstream    18318       54         5'gggttttattttcaagtacttctatg
6         downstream  18317       55         5'gctcagcaactgttcaatgtatgagc
7         upstream    19009       56         5'ctagtgtgtgtttttggc
7         downstream  19135       57         5'cataaccttatctccacc
8         upstream    18197       58         5'ctcagccatgagacaataaatcc
8         downstream  18924       59         5'ggttcccaaataatgtgatgg
9         upstream    18765       60         5'caaaagcttcagaatctc
9         downstream  18198       61         5'ctgtgggtgtttcctgtgagtgg
10        upstream    18305       62         5'catgactttgtgtgaatgtacacc
10        downstream  18306       63         5'gaggagagcctgatagaacatctg
11        upstream    18182       64         5'gggctttttctccccctccc
11        downstream  19041       65         5'aaaatctgggctctcacg
12        upstream    18579       66         5'aattatacctcatactagc
12        downstream  18178       67         5'gttttattacagaataaaggagg
12        downstream  19070       68         5'aagccaaagttagaaggca
13        upstream    18420       69         5'tgcaacccacaaaatttggc
13        downstream  18443       70         5'ctttctccatttccaaaacc
14        upstream    19028       71         5'tggtgtctctagttctgg
14        downstream  18897       72         5'cattgttgtagtagctctgc
15        upstream    19025       73         5'cccatttgtcccaactgg
15        downstream  18575       74         5'cggtcagttgaaatgtcag
16        upstream    18184       75         5'catttggatgctccgttaaagc
16        downstream  18314       76         5'cacccggctggaaattttatttg
17        upstream    18429       77         5'ggaaaggcactggagaaatggg
17        downstream  18315       78         5'ccctccagcacacatgcatgtaccg
18        upstream    18444       79         5'taagtagtctgtgatctccg
18        downstream  18581       80         5'atgtatgaggtcctgtcc
19        upstream    18638       81         5'gacaccagtgtatgttgg
19        downstream  18637       82         5'gagaaagaagaacacatccc

[0224]

Additionally, we have designed a set of "second stage" amplification primers, whose sequences are shown below in Table 3. We use the second stage primers together with the first stage primers in a nested amplification protocol, as described below.

[0225]

Table 3

Exon no.  Location    Primer no.  SEQ ID NO  Primer nucleotide sequence
1         upstream    19295       83         5'tgtaaaacgacggccagtcactgaggtgattggctgaa
1         downstream  19446       84         *5'tagcccttaagtgagcccg
2         upstream    18685       85         5'tgtaaaacgacggccagttacattagagtagttgcaga
2         downstream  19067       86         *5'aggtcctgactcttccatg
3         upstream    18687       87         5'tgtaaaacgacggccagtttggaaaatgagtaacatgatt
3         downstream  19068       88         *5'tgtcatcacaggaggatat
4         upstream    19294       89         5'tgtaaaacgacggccagtctttccctttggtgaggtga
4         downstream  19077       90         *5'tactctgagacctaggccca
5         upstream    19301       91         5'tgtaaaacgacggccagttctcttttccccttgggattag
5         downstream  19046       92         *5'acaaagcttcaacaatttactct
6         upstream    19711       93         5'tgtaaaacgacggccagtgttttattttcaagtacttctatgaatt
6         downstream  19079       94         *5'cagcaactgttcaatgtatgagcact
7         upstream    19293       95         5'tgtaaaacgacggccagtgtgtgtgtttttggcaac
7         downstream  19435       96         *5'aaccttatctccaccagc
8         upstream    19329       97         5'tgtaaaacgacggccagtagccatgagacaataaatccttg
8         downstream  19450       98         *5'tcccaaataatgtgatggaatg
9         upstream    19608       99         5'tgtaaaacgacggccagtaagcttcagaatctctttt
9         downstream  19449       100        *5'tgggtgtttcctgtgagtggatt
10        upstream    19297       101        5'tgtaaaacgacggccagtactttgtgtgaatgtacacctgtg
10        downstream  19081       102        *5'gagagcctgatagaacatctgttg
11        upstream    19486       103        5'tgtaaaacgacggccagtctttttctccccctcccacta
11        downstream  19455       104        *5'tctgggctctcacgtct
12        upstream    20546       105        *5'cttattctgagtctctcc
12        downstream  20002       106        5'tgtaaaacgacggccagtgtttgctcagaggctgc
12        upstream    19829       107        *5'gatggttcgtacagattcccg
12        downstream  19385       108        5'tgtaaaacgacggccagtttattacagaataaaggaggtag
13        upstream    19300       109        5'tgtaaaacgacggccagtaacccacaaaatttggctaag
13        downstream  19078       110        *5'tctccatttccaaaaccttg
14        upstream    19456       111        *5'tgtctctagttctggtgc
14        downstream  19472       112        5'tgtaaaacgacggccagttgttgtagtagctctgcttg
15        upstream    19697       113        *5'atttgtcccaactggttgta
15        downstream  19466       114        5'tgtaaaacgacggccagttcagttgaaatgtcagaaagtg
16        upstream    19269       115        5'tgtaaaacgacggccagt
16        downstream  19047       116        *5'ccggctggaaattttatttggag
17        upstream    19298       117        5'tgtaaaacgacggccagtaggcactggagaaatgggatttg
17        downstream  19080       118        *5'tccagcacacatgcatgtaccgaaat
18        upstream    19436       119        *5'gtagtctgtgatctccgttt
18        downstream  19471       120        5'tgtaaaacgacggccagttatgaggtcctgtcctag
19        upstream    19447       121        *5'accagtgtatgttgggatg
19        downstream  19330       122        5'tgtaaaacgacggccagtgaaagaagaacacatcccaca

[0267]

In Table 3, an asterisk (*) indicates that the 5' nucleotide is biotinylated. Exons 1-7, 10, 13 and 16-19 can be specifically amplified in PCR reactions containing either 1.5 mM or 3 mM MgCl2. Exons 11 and 14 can be specifically amplified only in reactions containing 1.5 mM MgCl2, and exons 8, 9, 12 and 15 only in reactions containing 3 mM MgCl2. For exon 12, the second stage amplification primers were designed so that the exon is reamplified in two halves: primer set 20546/20002 amplifies the N-terminal half, and primer set 19829/19385 amplifies the C-terminal half. An alternate primer for 18178 is 19070. The hMLH1 sequence information provided by our studies, and disclosed in this application and in preceding related applications, may be used to design a large number of different oligonucleotide primers for identifying hMLH1 mutations that correlate with cancer susceptibility and/or with tumor development in an individual, including primers that amplify more than one exon (and/or flanking intron sequences) in a single product band.
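The MgCl2 requirements above constrain which exons could share a multiplex reaction if one assumes (an assumption for illustration, not a statement in the text) that a pooled reaction must run at a single MgCl2 concentration acceptable to every exon in the pool. A short Python sketch of that compatibility check; the table and function names are hypothetical:

```python
# MgCl2 concentrations (mM) at which each exon is reported to amplify specifically.
BOTH = {1.5, 3.0}
MGCL2_OK = {exon: BOTH for exon in (1, 2, 3, 4, 5, 6, 7, 10, 13, 16, 17, 18, 19)}
MGCL2_OK.update({exon: {1.5} for exon in (11, 14)})
MGCL2_OK.update({exon: {3.0} for exon in (8, 9, 12, 15)})


def common_mgcl2(pool):
    """MgCl2 concentrations compatible with every exon in `pool` (may be empty)."""
    allowed = set(BOTH)  # fresh copy, so the shared BOTH set is never modified
    for exon in pool:
        allowed &= MGCL2_OK[exon]
    return allowed


print(common_mgcl2([1, 2, 8, 12]))  # {3.0}
print(common_mgcl2([8, 11]))        # set(): no single concentration works
```

Under this assumption, exons 11 and 14 could only be pooled at 1.5 mM and exons 8, 9, 12 and 15 only at 3 mM, while the remaining exons could join either pool.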

[0268]

One of ordinary skill in the art would be familiar with the considerations important to the design of PCR primers for amplifying a desired fragment or gene.37 These considerations may be similar, though not necessarily identical, to those involved in the design of sequencing primers, as discussed above. Generally, it is important that primers hybridize relatively specifically (i.e., have a Tm greater than about 55°C, and preferably around 60°C). In most cases, primers between about 17 and 25 nucleotides in length work well; longer primers can be useful for amplifying longer fragments. In all cases, it is desirable to avoid primers that are complementary to more than one sequence in the human genome, so that each pair of PCR primers amplifies only a single, correct fragment. Nevertheless, it is only strictly necessary that the correct band be distinguishable from other product bands in the PCR reaction. The exact PCR conditions (e.g., salt concentration, number of cycles, type of DNA polymerase) can be varied as known in the art to improve, for example, the yield or specificity of the reaction. In particular, we have found it valuable to use nested primers in PCR reactions to reduce the amount of DNA substrate required and to improve amplification specificity. Two examples follow. The first example illustrates use of a first stage primer pair (SEQ ID NOS: 69 and 70) to amplify an intron/exon segment (SEQ ID NO: 18). The second example illustrates use of second stage primers to amplify a target intron/exon segment from the product of a first PCR amplification step employing first stage primers.

EXAMPLE 1: Amplification of hMLH1 genomic clones from a P1 phage library

25 ng of genomic DNA (or 1 ng of a P1 phage clone) was used in PCR reactions containing:

0.05 mM dNTPs
50 mM KCl
3 mM Mg
10 mM Tris-HCl, pH 8.5
0.01% gelatin
5 µM primers

Reactions were performed on a Perkin-Elmer Cetus model 9600 thermal cycler. Reactions were incubated at 95°C for 5 minutes, followed by 35 cycles (30 cycles when amplifying from a P1 phage) of:

94°C for 30 seconds
55°C for 30 seconds
72°C for 1 minute

A final 7 minute extension was then performed at 72°C.

Desirable P1 clones were those that produced a product band of approximately … bp.
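The primer guidelines given before Example 1 (about 17 to 25 nucleotides, Tm above about 55°C) can be screened quickly. This Python sketch uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a rough rule of thumb for short oligonucleotides chosen here for illustration; the text does not say how Tm was estimated, and nearest-neighbor models are more accurate.

```python
def wallace_tm(primer):
    """Approximate Tm (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def check_primer(primer, min_len=17, max_len=25, min_tm=55):
    """Return (length, Wallace Tm, passes length window and Tm floor)."""
    tm = wallace_tm(primer)
    ok = min_len <= len(primer) <= max_len and tm > min_tm
    return len(primer), tm, ok


# First stage primers for exon 13 (SEQ ID NOS: 69 and 70, Table 2).
for name, seq in [("SEQ ID NO: 69", "tgcaacccacaaaatttggc"),
                  ("SEQ ID NO: 70", "ctttctccatttccaaaacc")]:
    print(name, check_primer(seq))  # (20, 58, True) and (20, 56, True)
```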

[0272]

EXAMPLE 2: Amplification of hMLH1 sequences from genomic DNA using nested PCR primers

We performed two-step PCR amplification of hMLH1 sequences from genomic DNA as follows. Typically, the first amplification was performed in a 25 microliter reaction containing:

25 ng of chromosomal DNA
Perkin-Elmer PCR buffer II (any suitable buffer could be used)
3 mM MgCl2
50 µM each dNTP
Taq DNA polymerase
5 µM primers (SEQ ID NOS: 69, 70)

The reaction was incubated at 95°C for 5 minutes, followed by 20 cycles of:

94°C for 30 seconds
55°C for 30 seconds

The product band was typically small enough (less than approximately 500 bp) that separate extension steps were not performed as part of each cycle. Instead, a single extension step was performed at 72°C for 7 minutes after the 20 cycles were completed. Reaction products were stored at 4°C.

The second amplification reaction, usually 25 or 50 microliters in volume, contained:

1 or 2 microliters (depending on the reaction volume) of the first amplification reaction product
Perkin-Elmer PCR buffer II (any suitable buffer could be used)
3 mM or 1.5 mM MgCl2
50 µM each dNTP
Taq DNA polymerase
5 µM nested primers (SEQ ID NOS: 109, 110)

The reaction was incubated at 95°C for 5 minutes, followed by 20-25 cycles of:

94°C for 30 seconds
55°C for 30 seconds

A single extension step was performed at 72°C for 7 minutes after the cycles were completed. Reaction products were stored at 4°C.
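The thermal programs in Examples 1 and 2 reduce to simple arithmetic. A Python sketch of total hold time, assuming instantaneous temperature changes (a real cycler adds ramp time) and excluding the 4°C storage step; the function name is illustrative.

```python
def program_seconds(initial_s, cycles, step_s, final_s):
    """Total hold time of a simple PCR program, ignoring ramp times."""
    return initial_s + cycles * sum(step_s) + final_s


# Example 1: 95 C 5 min; 35 x (94 C 30 s, 55 C 30 s, 72 C 60 s); 72 C 7 min.
ex1 = program_seconds(5 * 60, 35, [30, 30, 60], 7 * 60)
# Example 2, first reaction: 95 C 5 min; 20 x (94 C 30 s, 55 C 30 s); 72 C 7 min.
ex2 = program_seconds(5 * 60, 20, [30, 30], 7 * 60)
print(ex1 / 60, ex2 / 60)  # 82.0 32.0 (minutes)
```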

[0282]

Any set of primers capable of amplifying a target hMLH1 sequence can be used in the first amplification reaction. We have used each of the primer sets presented in Table 2 to amplify an individual hMLH1 exon in the first amplification reaction. We have also used combinations of those primer sets, thereby amplifying multiple individual hMLH1 exons in the first amplification reaction. The nested primers used in the second amplification step were designed relative to the primers used in the first amplification reaction. That is, where a single set of primers is used in the first amplification reaction, the primers used in the second reaction should be identical to those used in the first reaction, except that they should omit the 5'-most nucleotides of the first reaction primers and should extend far enough at the 3' end that the Tm of the second amplification primers is approximately the same as that of the first amplification primers. Our second reaction primers typically lacked the three 5'-most nucleotides of the first reaction primers and extended approximately 3-6 nucleotides farther at the 3' end. SEQ ID NOS: 109 and 110 are examples of nested primers that could be used in a second amplification reaction when SEQ ID NOS: 69 and 70 were used in the first amplification reaction.
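The nesting rule just described can be checked mechanically against Tables 2 and 3. A minimal Python sketch; the function name is hypothetical and its defaults (trim 3 bases, extend 3 to 6) are the "typical" values stated above. For SEQ ID NOS: 109 and 85, only the gene-specific part following the shared 5' tail is compared.

```python
def follows_nesting_rule(outer, nested, trim=3, min_ext=3, max_ext=6):
    """True if `nested` equals `outer` minus its `trim` 5'-most bases,
    extended by min_ext..max_ext extra bases at the 3' end."""
    core = outer[trim:]
    if not nested.startswith(core):
        return False
    extension = len(nested) - len(core)
    return min_ext <= extension <= max_ext


# Exon 13: first stage SEQ ID NOS: 69/70, second stage SEQ ID NOS: 109/110.
print(follows_nesting_rule("tgcaacccacaaaatttggc", "aacccacaaaatttggctaag"))  # True
print(follows_nesting_rule("ctttctccatttccaaaacc", "tctccatttccaaaaccttg"))   # True
# Exon 2 upstream (SEQ ID NOS: 46 and 85) trims six bases rather than three.
print(follows_nesting_rule("aatatgtacattagagtagttg", "tacattagagtagttgcaga"))          # False
print(follows_nesting_rule("aatatgtacattagagtagttg", "tacattagagtagttgcaga", trim=6))  # True
```

The exon 2 result illustrates why the text says "typically": not every pair in Table 3 uses exactly a three-base trim.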

[0283]

We have also found it valuable to include a standard sequence at the 5' end of one of the second amplification reaction primers to prime sequencing reactions. Additionally, we have found it useful to biotinylate the 5' nucleotide of one or both of the second amplification reaction primers, so that the product band can easily be purified using magnetic beads40 and sequencing reactions can then be performed directly on the bead-associated products.41-45
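In Table 3, one primer of each second stage pair carries the same 18-nucleotide 5' extension, tgtaaaacgacggccagt, which appears to be the standard sequencing-primer site described here; its sequence matches the widely used M13(-21) forward sequencing primer, although the text does not name it. The other primer of each pair is marked with an asterisk (5' biotin). A minimal Python sketch that splits a Table 3 entry into these parts, assuming entries are written as in the table (optional "*", then "5'", then the sequence); the names are hypothetical.

```python
SEQ_TAIL = "tgtaaaacgacggccagt"  # shared 5' tail seen in Table 3 (18 nt)


def parse_second_stage(entry):
    """Parse a Table 3 entry such as "*5'tctccatttccaaaaccttg".

    Returns (biotinylated, tail, gene_specific). A leading "*" marks a
    5'-biotinylated primer, following the Table 3 legend.
    """
    biotinylated = entry.startswith("*")
    seq = entry.lstrip("*").replace("5'", "", 1).replace(" ", "")
    if seq.startswith(SEQ_TAIL):
        return biotinylated, SEQ_TAIL, seq[len(SEQ_TAIL):]
    return biotinylated, "", seq


print(parse_second_stage("5'tgtaaaacgacggccagtaacccacaaaatttggctaag"))  # SEQ ID NO: 109
print(parse_second_stage("*5'tctccatttccaaaaccttg"))                     # SEQ ID NO: 110
```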

[0284]

For additional discussion of multiplex amplification and sequencing methods, see the references by Zu et al. and Espelund et al.46,47

[0285]

hMLH1 Link to Cancer

[0286]

As a first step in determining whether hMLH1 was a candidate for the HNPCC locus on human chromosome 3p21-23,3 we mapped hMLH1 by fluorescence in situ hybridization (FISH).20,21 We used two separate genomic fragments of the hMLH1 gene (data not shown) in the FISH analysis. Examination of several metaphase chromosome spreads localized hMLH1 to chromosome 3p21.3-23.

[0287]

Panel A of Figure 7 shows hybridization of hMLH1 probes to a metaphase spread. Biotinylated hMLH1 genomic probes were hybridized to banded human metaphase chromosomes as previously described.20,21 Detection was performed with fluorescein isothiocyanate (FITC)-conjugated avidin (green signal); chromosomes, shown in blue, were counterstained with 4',6-diamidino-2-phenylindole (DAPI). Images were obtained with a cooled CCD camera, enhanced, pseudocoloured and merged using the following programs, respectively: CCD Image Capture, NIH Image 1.4, Adobe Photoshop and Genejoin Maxpix. Panel B of Figure 7 shows a composite of chromosome 3 from multiple metaphase spreads aligned with the human chromosome 3 ideogram. The region of hybridization (distal portion of 3p21.3-23) is indicated on the ideogram by a vertical bar.

[0289]

As independent confirmation of the location of hMLH1 on chromosome 3, we used both PCR with a pair of hMLH1-specific oligonucleotides and Southern blotting with an hMLH1-specific probe to analyze DNA from the NIGMS2 rodent/human cell panel (Coriell Inst. for Med. Res., Camden, NJ, USA). The results of both techniques indicated localization to chromosome 3. We also mapped the mouse MLH1 gene by FISH to chromosome 9, band E, a position syntenic with human chromosome 3p.22 Therefore, the hMLH1 gene localizes to 3p21.3-23, within the genomic region implicated in chromosome 3-linked HNPCC families.3

[0290]

Next, we analyzed blood samples from affected and unaffected individuals in two candidate chromosome 3-linked HNPCC families3 for mutations. One family, Family 1, showed significant linkage (lod score = 3.01 at a recombination fraction of 0) between HNPCC and a marker on 3p. For the second family, Family 2, the reported lod score (1.02) was below the commonly accepted level of significance and thus only suggested linkage to the same marker on 3p. Subsequent linkage analysis of Family 2 with the microsatellite marker D3S1298 on 3p21.3 gave a more significant lod score of 1.88 at a recombination fraction of 0. Initially, we screened two PCR-amplified exons of the hMLH1 gene for mutations by direct DNA sequencing (Figure 4). We examined these two exons in three affected individuals from Family 1 and did not detect any differences from the expected sequence. In Family 2, we observed that four individuals affected with colon cancer were heterozygous for a C to T substitution in an exon encoding amino acids 41-69, which corresponds to a highly conserved region of the protein (Figure 9). For one affected individual, we screened PCR-amplified cDNA for additional sequence differences. The combined sequence information obtained from the two exons and the cDNA of this individual covers 95% (i.e., all but the first 116 bp) of the open reading frame.

[0292]

We observed no nucleotide changes other than the C to T substitution. In addition, four individuals from Family 2 who were predicted by linkage data to be carriers, but who were as yet unaffected with colon cancer, were found to be heterozygous for the same C to T substitution. Two of these predicted carriers are below, and two are above, the mean age of onset (50 years) in this family. Two unaffected individuals from the same family, both predicted by linkage data to be non-carriers, showed the expected normal sequence at this position. Linkage analysis that includes the C to T substitution in Family 2 gives a lod score of 2.23 at a recombination fraction of 0; using low-stringency cancer diagnostic criteria, we calculated a lod score of 2.53. These data indicate that the C to T substitution shows significant linkage to HNPCC in Family 2.
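The lod scores above come from pedigree linkage analysis whose details are not given here. For orientation only, the textbook case of n phase-known, fully informative meioses with r recombinants has a closed form, LOD(θ) = log10[θ^r (1 - θ)^(n - r) / 0.5^n]. The Python sketch below implements that idealized formula; it is not the computation used for Families 1 and 2, and the function name is illustrative.

```python
import math


def lod_phase_known(n, r, theta):
    """LOD for n phase-known, fully informative meioses with r recombinants.

    LOD(theta) = log10( theta**r * (1 - theta)**(n - r) / 0.5**n )
    """
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if r > 0 and theta == 0:
        return float("-inf")  # any recombinant is impossible at theta = 0
    likelihood = (theta ** r) * ((1 - theta) ** (n - r))
    return math.log10(likelihood / 0.5 ** n)


# With no recombinants, each informative meiosis adds log10(2) at theta = 0.
for n in (4, 7, 10):
    print(n, round(lod_phase_known(n, 0, 0.0), 2))  # 1.2, 2.11, 3.01
```

In this idealized setting each non-recombinant meiosis adds about 0.30, so reaching the conventional threshold of 3 takes about ten such meioses; real pedigrees, with unknown phase and incomplete penetrance, typically contribute less per meiosis.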

[0293]

Figure 8 shows sequence chromatograms revealing a C to T transition mutation that produces a non-conservative amino acid substitution at position 44 of the hMLH1 protein. Sequence analyses of one unaffected individual (top panels, plus and minus strands) and one affected individual (lower panels, plus and minus strands) are presented, and the position of the heterozygous nucleotide is indicated by an arrow. Analysis of the chromatograms shows sufficient T signal in the C peak, and sufficient A signal in the G peak, for the affected individuals to be heterozygous at this site.

[0294]

To determine whether this C to T substitution was a polymorphism, we sequenced the same exon amplified from the genomic DNA of 48 unrelated individuals and observed only the normal sequence. We examined an additional 26 unrelated individuals using allele-specific oligonucleotide (ASO) hybridization analysis.33 The ASO sequences we used (SEQ ID NOS: 141 and 142, respectively) are 5'-ACTTGTGGATTTTGC-3' and 5'-ACTTGTGAATTTTGC-3'.
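The two ASO probes can be related to the codon change described for Figure 8 with a few lines of Python. The sketch compares the probes, takes their reverse complements, and translates them with the standard genetic code; it assumes (an inference, not stated in the text) that the reverse complement of each 15-mer begins on a codon boundary.

```python
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def translate(seq):
    """Translate complete codons from the first base."""
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


normal, variant = "ACTTGTGGATTTTGC", "ACTTGTGAATTTTGC"  # SEQ ID NOS: 141, 142
diffs = [(i + 1, a, b) for i, (a, b) in enumerate(zip(normal, variant)) if a != b]
print(diffs)                                          # [(8, 'G', 'A')]
print(revcomp(normal), translate(revcomp(normal)))    # GCAAAATCCACAAGT AKSTS
print(revcomp(variant), translate(revcomp(variant)))  # GCAAAATTCACAAGT AKFTS
```

Under that frame assumption the probes differ only at the middle base of codon TCC, which becomes TTC, replacing serine with phenylalanine; this matches the Ser to Phe change at position 44 discussed in this section.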

[0296]

Based on direct DNA sequencing and ASO analysis, none of these 74 unrelated individuals carries the C to T substitution. Therefore, the C to T substitution observed in Family 2 individuals is not likely to be a polymorphism. As mentioned above, we did not detect this same C to T substitution in affected individuals from the other chromosome 3-linked family, Family 1.3 We are continuing to study individuals of Family 1 for mutations in hMLH1. Table 4 below summarizes our analysis of blood samples from affected and unaffected individuals of Family 2 and from unrelated individuals.

[0297]

Table 4

Status                      Individuals with C to T mutation / individuals tested
Family 2
  Affected                  4/4
  Predicted carriers        4/4
  Predicted non-carriers    0/2
Unrelated individuals       0/74
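Finding the substitution in 0 of 74 unrelated individuals also bounds how common it could plausibly be in the population sampled. A Python sketch of the exact one-sided 95% upper bound for zero observed events, assuming the 74 individuals are an independent random sample (an assumption for illustration only); the familiar "rule of three", 3/n, gives nearly the same value.

```python
def upper_bound_zero_events(n, confidence=0.95):
    """Exact one-sided upper bound on a proportion when 0 of n are positive.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1 - (1 - confidence) ** (1 / n)


n_unrelated = 74
print(round(upper_bound_zero_events(n_unrelated), 4))  # 0.0397
print(round(3 / n_unrelated, 4))                       # 0.0405 (rule of three)
```

This bound speaks only to frequency among carriers in such a sample; it does not by itself establish that the change is pathogenic.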

[0307]

Based on several criteria, we suggest that the observed C to T substitution in the coding region of hMLH1 represents the mutation that is the basis for HNPCC in Family 2.3 First, DNA sequence and ASO analysis did not detect the C to T substitution in 74 unrelated individuals; thus, the substitution is not simply a polymorphism. Second, the C to T substitution is expected to produce a serine to phenylalanine change at position 44 (see Figure 9), a non-conservative change in a conserved region of the protein (Figures 3 and 9). Secondary structure predictions using Chou-Fasman parameters suggest a helix-turn-beta sheet structure with position 44 located in the turn. The Ser to Phe substitution at position 44 lowers the prediction for this turn considerably, suggesting that the substitution alters the conformation of the hMLH1 protein. The suggestion that the Ser to Phe substitution is a mutation conferring cancer susceptibility is further supported by our experiments showing that an analogous substitution (alanine to phenylalanine) in the yeast MLH1 gene results in a nonfunctional mismatch repair protein. In bacteria and yeast, mutations affecting DNA mismatch repair cause comparable increases in the rate of spontaneous mutation, including additions and deletions within dinucleotide repeats.4,5,11,13-16 In humans, mutation of hMSH2 is the basis of chromosome 2-linked HNPCC,1,2 whose tumors show microsatellite instability and an apparent defect in mismatch repair.12 Chromosome 3-linked HNPCC is also associated with instability of dinucleotide repeats.3 Taken together with these observations, the high degree of conservation between the human MLH1 protein and the yeast DNA mismatch repair protein MLH1 suggests that hMLH1 is likely to function in DNA mismatch repair. During isolation of the hMLH1 gene, we also identified the hPMS1 gene. This observation suggests that mammalian DNA mismatch repair, like that in yeast,4 may require at least two MutL-like proteins.
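The non-conservative character of the Ser to Phe change can be read off the amino acid groups listed in this document's discussion of "conservative" substitutions. A minimal Python sketch encoding exactly those groups; it ignores the side-chain size consideration also mentioned there, and the function names are illustrative.

```python
# Amino acid groups as listed in the text (one-letter codes).
GROUPS = {
    "non-polar (hydrophobic)": set("ALIVPFWM"),
    "polar neutral": set("GSTCYNQ"),
    "positively charged (basic)": set("RKH"),
    "negatively charged (acidic)": set("DE"),
}


def group_of(aa):
    """Return the group name for a one-letter amino acid code."""
    for name, members in GROUPS.items():
        if aa in members:
            return name
    raise ValueError(f"unknown amino acid code: {aa!r}")


def is_conservative(old, new):
    """Same-group substitution; side-chain size is not considered here."""
    return group_of(old) == group_of(new)


print(is_conservative("S", "F"))  # False: polar neutral -> non-polar
print(is_conservative("K", "R"))  # True: both positively charged
```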

[0308]

It should be noted that different HNPCC families appear to carry different mutations in the MLH1 gene. As explained above, affected individuals in Family 1 showed "tight linkage" between HNPCC and a locus in the 3p21-23 region, yet they do not carry the C to T mutation found in Family 2; they appear to have a different mutation in their MLH1 gene. Further, we have used the structural information and methods described in this application to find and characterize another hMLH1 mutation, which apparently confers cancer susceptibility on heterozygous carriers of the mutant gene in a large English HNPCC family. The hMLH1 mutation in the English family is a +1 T frameshift, which is predicted to lead to synthesis of a truncated hMLH1 protein. Unlike, for example, sickle cell anemia, in which essentially all known affected individuals carry the same mutation, multiple hMLH1 mutations have been discovered and linked to cancer. Therefore, knowledge of the entire hMLH1 cDNA sequence (and probably that of hPMS1), as well as of genomic sequences, particularly those surrounding the exons, will be useful and important for characterizing mutations in families identified as exhibiting a high frequency of cancer.
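Why a +1 frameshift is expected to truncate the protein can be shown on a toy open reading frame. The Python sketch below uses a made-up 21-nucleotide sequence (not hMLH1) and the standard genetic code: inserting one T shifts every downstream codon and, in this example, immediately creates a TGA stop codon.

```python
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate_to_stop(cds):
    """Translate from the first base up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


wild_type = "ATGTTTGACTGGCCCAAGTAA"          # hypothetical toy ORF, not hMLH1
mutant = wild_type[:6] + "T" + wild_type[6:]  # +1 T inserted after codon 2
print(translate_to_stop(wild_type))  # MFDWPK
print(translate_to_stop(mutant))     # MF
```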

[0309]

Subsequent to our discovery of a cancer-conferring mutation in hMLH1, studies by others have characterized at least 5 additional mutations in hMLH1, each of which appears to have conferred cancer susceptibility on individuals in at least one HNPCC family. For example, Papadopoulos et al. identified one such mutation, an in-frame deletion of 165 base pairs spanning codons 578 to 632. In another family, Papadopoulos et al. observed an hMLH1 mutation that causes a frameshift and substitution of new amino acids, namely a 4 base pair deletion between codons 727 and 728. Papadopoulos et al. also report a cancer-linked hMLH1 mutation that extends the COOH terminus, namely a 4 base pair insertion between codons 755 and 750.38 In summary, we have shown that the DNA mismatch repair gene hMLH1 is likely to be the hereditary nonpolyposis colon cancer gene previously localized by linkage analysis to chromosome 3p21-23.3 Availability of the hMLH1 gene sequence will facilitate screening of HNPCC families for cancer-linked mutations. In addition, although loss of heterozygosity (LOH) of linked markers is not a feature of either the 2p or the 3p form of HNPCC,3,6 LOH involving the 3p21.3-23 region has been observed in several human cancers.24-26 This raises the possibility that hMLH1 mutation plays some role in these tumors.
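Whether an insertion or deletion preserves the reading frame depends only on its length modulo 3, which is why the 165 bp deletion above is in-frame while the 4 bp changes shift the frame. A short Python sketch using the sizes and codon numbers quoted above, with codon n covering cDNA nucleotides 3n - 2 to 3n (A of ATG = 1); the function names are illustrative.

```python
def indel_effect(length_bp):
    """In-frame if the indel length is a multiple of 3, otherwise frameshift."""
    return "in-frame" if length_bp % 3 == 0 else "frameshift"


def codon_of(nt):
    """Codon number containing cDNA nucleotide `nt` (A of ATG = 1)."""
    return (nt + 2) // 3


for label, size in [("165 bp deletion", 165),
                    ("4 bp deletion", 4),
                    ("4 bp insertion", 4)]:
    print(label, "->", indel_effect(size))

print((632 - 578 + 1) * 3)             # 165: codons 578-632 inclusive, in bp
print(codon_of(1732), codon_of(1896))  # 578 632: exon 16 bounds from Table 1
```

The deleted span, codons 578-632, coincides with the coding extent of exon 16 computed from the Table 1 coordinates (nucleotides 1732-1896).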

[0310]

Human PMS1

[0311]

Human PMS1 was isolated using the procedures discussed with reference to Figure 1. Figure 10 shows the entire hPMS1 cDNA nucleotide sequence, and Figure 11 shows an alignment of the predicted human and yeast PMS1 protein sequences. We determined by FISH analysis that human PMS1 is located on chromosome 7. Subsequent to our discovery of hPMS1, others have identified mutations in the gene that appear to confer HNPCC susceptibility.39

[0312]

Mouse MLH1

[0313]

Using the procedure outlined above with reference to Figure 1, we have determined a partial nucleotide sequence of the mouse MLH1 cDNA, as shown in Figure 12 (SEQ ID NO: 135). Figure 13 shows the corresponding predicted amino acid sequence of the mMLH1 protein (SEQ ID NO: 136) compared with the predicted hMLH1 protein sequence (SEQ ID NO: 5). Comparison of the mouse and human MLH1 proteins, as well as comparison of hMLH1 with the yeast MLH1 protein (Figure 9), indicates a high degree of conservation.

[0314]

Mouse PMS1

[0315]

Using the procedures discussed above with reference to Figure 1, we isolated and sequenced the mouse PMS1 gene, as shown in Figure 14 (SEQ ID NO: 137). This cDNA sequence encodes a predicted protein of 864 amino acids (SEQ ID NO: 138), shown in Figure 15, where it is compared with the predicted amino acid sequence of hPMS1 (SEQ ID NO: 133). The degree of identity between the predicted mouse and human PMS1 proteins is high, as would be expected between two mammals. Similarly, as noted above, there is strong similarity between the human PMS1 protein and the yeast DNA mismatch repair protein PMS1 (Figure 11). The fact that yeast PMS1 and MLH1 function in yeast to repair DNA mismatches strongly suggests that the human and mouse PMS1 and MLH1 proteins are also mismatch repair proteins.
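The "degree of identity" between two aligned proteins is commonly summarized as percent identity. Conventions differ, particularly in how gaps enter the denominator; the Python sketch below, an illustration only, counts identical residues over aligned columns in which neither sequence has a gap, using a made-up toy alignment rather than the alignments in Figures 11, 13 or 15.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Made-up toy alignment: 9 ungapped columns, 8 identical.
print(round(percent_identity("ACDEFGHIKL", "ACDQFGHI-L"), 1))  # 88.9
```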

[0317]

Uses for Mouse MLH1 and PMS1

We believe that our isolation and characterization of the mMLH1 and mPMS1 genes will have many research applications. For example, as already discussed above, we have used our knowledge of the mPMS1 gene to produce antibodies that react specifically with hPMS1. We have already explained that antibodies directed to the human MLH1 or PMS1 proteins may be used for both research and diagnostic purposes.

[0318]

We also believe that our knowledge of mPMS1 and mMLH1 will be useful for constructing mouse models to study the consequences of DNA mismatch repair defects. We expect mPMS1- or mMLH1-defective mice to be highly prone to cancer, because chromosome 2p- and 3p-associated HNPCC are each due to a defect in a mismatch repair gene.1,2 As noted above, we have already produced chimeric mice that carry a defective mPMS1 gene. We are currently constructing mice heterozygous for an mPMS1 or mMLH1 mutation. These heterozygous mice should provide useful animal models for studying human cancer, in particular HNPCC, and for analyzing both the intrinsic and extrinsic factors that determine cancer risk and progression. In addition, cancers associated with mismatch repair deficiency may respond differently to conventional therapy than other cancers do. Such animal models will be useful for determining whether such differences exist and for developing regimens for the effective treatment of these tumors. They may also be used to study the relationship between hereditary and dietary factors in carcinogenesis.

[0319]

Distinguishing Mutations From Polymorphisms

[0320]

For studies of cancer susceptibility and for tumor identification and characterization, it is important to distinguish "mutations" from "polymorphisms". A "mutation" produces a "non-wild-type allele" of a gene. A non-wild-type allele of a gene produces a transcript and/or a protein product that does not function normally within a cell. "Mutations" can be any alteration in nucleotide sequence including insertions, deletions, substitutions, and rearrangements.

[0321]

"Polymorphisms", on the other hand, are sequence differences found within the population of normally functioning (i.e., "wild-type") genes. Some polymorphisms result from the degeneracy of the genetic code: because most amino acids are encoded by more than one triplet codon, many different nucleotide sequences can encode the same polypeptide. Other polymorphisms are simply sequence differences that do not significantly affect the function of the gene or the encoded polypeptide. For example, polypeptides can often tolerate small insertions or deletions, or "conservative" substitutions in their amino acid sequence, without significant change in function.

[0322]

"Conservative" substitutions are those in which an amino acid is replaced by another amino acid with similar chemical characteristics. For example, amino acids are often classified as "non-polar (hydrophobic)", including alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan, and methionine; "polar neutral", including glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine; "positively charged (basic)", including arginine, lysine, and histidine; and "negatively charged (acidic)", including aspartic acid and glutamic acid. A substitution of one amino acid for another in the same group is generally considered "conservative", particularly if the side chains of the two amino acids are of similar size.

[0323]

The first step in identifying a mutation or polymorphism in a mismatch repair gene is to identify, using available techniques including those described herein, a mismatch repair gene (or gene fragment) sequence that differs from a known normal (e.g., wild-type) sequence of the same gene (or gene fragment). For example, an hMLH1 gene (or gene fragment) sequence could be identified that differs in at least one nucleotide position from a known normal (e.g., wild-type) hMLH1 sequence, such as any of SEQ ID NOS: 6-24.
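The first step just described, finding where a test sequence departs from a reference, can be sketched with Python's standard-library difflib, which reports substitution, insertion and deletion blocks. This is a generic text-diff tool used here for illustration, under the assumption of short, nearly identical sequences; it is not a biological aligner, and the strings in the example are toy data.

```python
from difflib import SequenceMatcher


def list_differences(reference, query):
    """List (kind, 1-based reference position, reference bases, query bases)."""
    kinds = {"replace": "substitution", "insert": "insertion", "delete": "deletion"}
    matcher = SequenceMatcher(None, reference, query, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((kinds[tag], i1 + 1, reference[i1:i2], query[j1:j2]))
    return diffs


reference = "GCAAAATCCACAAGT"  # toy reference string
print(list_differences(reference, "GCAAAATTCACAAGT"))  # [('substitution', 8, 'C', 'T')]
print(list_differences(reference, "GCAAAACCACAAGT"))   # [('deletion', 7, 'T', '')]
```

For real sequencing data, a dedicated pairwise aligner with a scoring scheme suited to DNA would be the appropriate tool.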

[0324]

Mutations can be distinguished from polymorphisms using any of a variety of methods, perhaps the most direct of which is data collection and correlation with tumor development. For example, a subject might be identified whose hMLH1 gene sequence differs from a sequence reported in SEQ ID NOS: 6-24, but who does not have cancer and has no family history of cancer. Particularly if other, preferably older, members of that subject's family have hMLH1 gene sequences that differ from SEQ ID NOS: 6-24 in the same way(s), the subject's hMLH1 sequence could likely be categorized as a "polymorphism". If other, unrelated individuals with no family history of cancer are identified with the same hMLH1 sequence, the categorization may be confirmed. Mutations that confer genetic susceptibility to cancer can be identified because, among other things, such mutations are likely to be present in all tissues of an affected individual and in the germ line of at least one of that individual's parents, and are unlikely to be found in unrelated families with no history of cancer. When distinguishing mutations from polymorphisms, it can sometimes be valuable to evaluate a particular sequence difference in the presence of at least one known mismatch repair gene mutation. In some instances, a particular sequence change will have no detectable effect (i.e., will appear to be a polymorphism) when assayed alone but will, for example, increase the penetrance of a known mutation, so that individuals carrying both the apparent polymorphism and a known mutation have a higher probability of developing cancer than individuals carrying only the mutation. Sequence differences that have such an effect are properly considered mutations, albeit weak ones.

[0325]

As discussed above and previously (U.S. Patent Application Nos. 08/168,877 and 08/209,521), mutations in mismatch repair genes or gene products produce non-wild-type versions of those genes or gene products. Some mutations can therefore be distinguished from polymorphisms by their functional characteristics in in vivo or in vitro mismatch repair assays. Any available mismatch repair assay can be used to analyze these characteristics.49-63 It is generally desirable to use more than one mismatch repair assay before classifying a sequence change as a polymorphism, since some mutations have effects that are not observed in all assays.
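The rule stated above, that a change should be called a polymorphism only after more than one assay looks normal, can be written as a small decision function. This is a sketch under stated assumptions: the assay names are hypothetical labels, and `min_assays=2` is simply the smallest number satisfying "more than one".

```python
def classify_sequence_change(assay_normal: dict, min_assays: int = 2) -> str:
    """Classify a sequence change from functional mismatch repair assays.

    assay_normal maps an assay name to True if repair looked wild-type in
    that assay. Any defective result points to a mutation; a polymorphism
    is called only when at least min_assays assays all look normal.
    """
    if any(not normal for normal in assay_normal.values()):
        return "mutation"
    if len(assay_normal) >= min_assays:
        return "polymorphism"
    return "unclassified"  # too few assays to rule out a mutation


# Hypothetical assay labels.
print(classify_sequence_change({"in vivo complementation": True}))
print(classify_sequence_change({"in vivo complementation": True, "in vitro repair": True}))
print(classify_sequence_change({"in vivo complementation": True, "in vitro repair": False}))
```

The asymmetry is deliberate: one defective assay is enough to flag a mutation, but a single normal assay is not enough to clear one.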

[0326]

For example, a mismatch repair gene containing a mutation would not be expected to be able to replace an endogenous copy of the same gene in a host cell without detectably affecting mismatch repair in that cell; whereas a mismatch repair gene containing a sequence polymorphism would be expected to be able to replace an endogenous copy of the same gene in a host cell without detectably affecting mismatch repair in that cell. We note that for such "replacement" studies, it is generally desirable to introduce the gene to be tested into a host cell of the same (or at least closely related) species as the cell from which the test gene was derived, to avoid complications due to, for example, the inability of a gene product from one species to interact with other mismatch repair gene products from another species. Similarly, a mutant mismatch repair protein would not be expected to function normally in an in vitro mismatch repair system (preferably from a related organism); whereas a polymorphic mismatch repair protein would be expected to function normally.

[0327]

The methods described herein and previously allow identification of different kinds of mismatch repair gene mutations. The following examples illustrate protocols for distinguishing mutations from polymorphisms in DNA mismatch repair genes.

[0328]

EXAMPLE 3: We have developed a system for testing, in the yeast S. cerevisiae, the functional significance of mutations found in either the hMLH1 or the hPMS1 gene. The system is described here using, as an example, the mutation causing a serine (SER) to phenylalanine (PHE) substitution in hMLH1 that we found in a family with HNPCC, as described above. We have derived a yeast strain that is essentially deleted for its MLH1 gene and hence is a strong mutator (i.e., 1000-fold above the normal rate in a simple genetic marker assay involving reversion from growth dependence on a given amino acid to independence: reversion of the hom3-10 allele; Prolla, Christie and Liskay, Mol Cell Biol, 14:407-415, 1994). When we placed the normal yeast MLH1 gene (complete with all known control regions) on a yeast plasmid that is stably maintained as a single copy and introduced it into the MLH1-deleted strain, the mutator phenotype was fully corrected in the reversion-to-amino-acid-independence assay. However, introducing a deleted copy of the yeast MLH1 gene gave no correction. We next tested the alteration that caused the SER to PHE change in the HNPCC family, engineered into the yeast gene. We found that the resulting mutant yeast protein cannot correct the mutator phenotype, strongly suggesting that this departure from the wild-type gene sequence confers cancer susceptibility; it is therefore classified as a mutation, not a polymorphism. We subsequently tested proteins engineered to contain other amino acids at the serine position and found that most changes result in a fully mutant, or at least partially mutant, phenotype.
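The "1000-fold above the normal rate" comparison, and the three outcomes described (full correction, no correction, partial correction), can be sketched as follows. This is a simplification: it compares revertant frequencies (revertants per viable cell) rather than true mutation rates, which are normally estimated by fluctuation analysis, and the revertant counts and fold cutoffs below are hypothetical, chosen only for illustration.

```python
def reversion_frequency(revertants: int, viable_cells: int) -> float:
    """Revertant colonies per viable cell plated (a frequency, not a
    fluctuation-test rate)."""
    if viable_cells <= 0:
        raise ValueError("viable_cells must be positive")
    return revertants / viable_cells


def complementation_call(test_freq: float, wild_type_freq: float,
                         full_at: float = 2.0, fail_at: float = 100.0) -> str:
    """Compare a plasmid-complemented strain with wild type.

    full_at and fail_at are illustrative fold cutoffs, not values from
    the text.
    """
    fold = test_freq / wild_type_freq
    if fold <= full_at:
        return "complements"
    if fold >= fail_at:
        return "fails to complement"
    return "partial"


# Hypothetical counts: 2 vs 2000 revertants per 1e8 viable cells.
wt = reversion_frequency(2, 100_000_000)
mlh1_deleted = reversion_frequency(2000, 100_000_000)
print(round(mlh1_deleted / wt))  # 1000
print(complementation_call(mlh1_deleted, wt))  # fails to complement
```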

[0329]

As other "point" mutations in the MLH1 and PMS1 genes are found in cancer families, they can be engineered into the appropriate yeast homolog gene and their consequences for protein function studied. In addition, we have identified a number of highly conserved amino acids in both MLH1 and PMS1. We also have evidence that hMLH1 interacts with yeast PMS1. This finding raises the possibility that mutations observed in the hMLH1 gene can be tested more directly in the yeast system. We plan to systematically make mutations that alter the amino acid at these conserved positions and determine which amino acid substitutions are tolerated and which are not. By collecting mutation information relating to hMLH1 and hPMS1, both by determining and documenting actual mutations found in HNPCC families and by synthesizing mutants for testing in experimental systems, it may eventually be possible to practice a cancer susceptibility testing protocol which, once an individual's hMLH1 or hPMS1 structure is determined, only requires comparison of that structure with known mutation versus polymorphism data.
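Once enough variants have been classified, the comparison step envisioned here reduces to a table lookup. A minimal sketch follows. The table entries are placeholders (gene, cDNA position, reference base, variant base) invented for illustration, not real HNPCC variants, and anything absent from the table falls back to functional testing.

```python
from typing import Dict, Tuple

# Hypothetical entries: (gene, cDNA position, reference base, variant base).
# A real table would be built from HNPCC family data and functional assays.
KNOWN_VARIANTS: Dict[Tuple[str, int, str, str], str] = {
    ("hMLH1", 100, "C", "T"): "mutation",
    ("hMLH1", 250, "A", "G"): "polymorphism",
    ("hPMS1", 75, "G", "A"): "mutation",
}


def interpret(gene: str, variants: list) -> dict:
    """Look up each observed (position, ref, alt) variant for one gene."""
    return {
        variant: KNOWN_VARIANTS.get(
            (gene, *variant), "unclassified: needs functional testing"
        )
        for variant in variants
    }


print(interpret("hMLH1", [(100, "C", "T"), (300, "T", "C")]))
```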

[0330]

EXAMPLE 4: Another method that we have employed to study physical interactions between hMLH1 and hPMS1 can also be used to study whether a particular alteration in a gene product changes the degree of protein-protein interaction. Information on changes in protein-protein interaction may demonstrate or confirm whether a particular genomic variation is a mutation or a polymorphism. Following our laboratory's findings on the interaction between the yeast MLH1 and PMS1 proteins in vitro and in vivo (U.S. Patent Application Serial No. 08/168,877), the interaction between the human counterparts of these two DNA mismatch repair proteins was tested.

[0331]

The human MLH1 and human PMS1 proteins were tested for in vitro interaction using maltose binding protein (MBP) affinity chromatography. hMLH1 protein was prepared as an MBP fusion protein, immobilized on an amylose resin column via the MBP, and tested for binding to hPMS1 synthesized in vitro. The hPMS1 protein bound to the MBP-hMLH1 matrix, whereas control proteins showed no affinity for the matrix. Conversely, when hMLH1 protein translated in vitro was passed over an MBP-hPMS1 fusion protein matrix, the hMLH1 protein bound to the matrix, whereas control proteins did not.

[0332]

Potential in vivo interactions between hMLH1 and hPMS1 were tested using the yeast "two-hybrid" system.28 Our initial results indicate that hMLH1 and hPMS1 interact in vivo in yeast. The same system can also be used to detect changes in protein-protein interaction that result from changes in gene or gene product structure that have yet to be classified as either a polymorphism or a mutation conferring cancer susceptibility.

Detection of HNPCC Families and Their Mutation(s)

[0333]

It has been estimated that approximately 1,000,000 individuals in the United States carry (are heterozygous for) an HNPCC mutant gene.29

[0335]

Furthermore, estimates suggest that 50-60% of HNPCC families segregate mutations in the hMSH2 gene on chromosome 2p.1,2 Another significant fraction appear to be associated with the HNPCC gene that maps to chromosome 3p21-22, presumably because of hMLH1 mutations such as the C to T transition discussed above. Identifying families that segregate mutant alleles of either the hMSH2 or the hMLH1 gene, and determining which individuals in these families actually carry the mutation, will be of great utility for early intervention in the disease. Such early intervention will likely include early detection through screening and aggressive follow-up treatment of affected individuals. In addition, determining the genetic basis of both familial and sporadic tumors could guide the choice of therapy for the primary tumor or for recurrences.

[0336]

Initially, HNPCC candidate families will be diagnosed partly through the study of family histories, most likely at the local level, e.g., by hospital oncologists.

[0337]

One criterion for HNPCC is the observation of microsatellite instability in an individual's tumors.3-6 The presenting patient would be tested for mutations in hMSH2, hMLH1, hPMS1 and other genes involved in DNA mismatch repair as they are identified. This is most easily done by sampling blood from the individual. Freshly frozen tumor tissue would also be highly useful. For the screening procedure, it is important to note that affected individuals are heterozygous for the offending mutation in their normal tissues. The available tissues, e.g., blood and tumor, are worked up for PCR-based mutation analysis using one or more of the following procedures:

[0339]

1) Linkage analysis with a microsatellite marker tightly linked to the hMLH1 gene.

[0340]

One approach to identifying cancer-prone families with an hMLH1 mutation is to perform linkage analysis with a highly polymorphic marker located within or tightly linked to hMLH1. Microsatellites are highly polymorphic and therefore very useful as markers in linkage analysis. Because we possess the hMLH1 gene on a single large genomic fragment in a P1 phage clone (~100 kbp), it is very likely that one or more microsatellites, e.g., tracts of dinucleotide repeats, exist within, or very close to, the hMLH1 gene. At least one such microsatellite has been reported.38 Once such markers have been identified, PCR primers will be designed to amplify the stretches of DNA containing the microsatellites. DNA of affected and unaffected individuals from a family with a high frequency of cancer will be screened to determine the segregation of the hMLH1 markers and the presence of cancer. The resulting data can be used to calculate a lod score and hence determine the likelihood of linkage between hMLH1 and the occurrence of cancer. Once linkage is established in a given family, the same polymorphic marker can be used to test other members of the kindred for the likelihood of their carrying the hMLH1 mutation.

2) Sequencing of reverse transcribed cDNA.

a) RNA from affected, unaffected and unrelated individuals is reverse transcribed (RT), followed by PCR to amplify the cDNA in 4-5 overlapping portions.34-37 It should be noted that, for the purposes of PCR, many different oligonucleotide primer pair sequences may potentially be used to amplify relevant portions of an individual's hMLH1 or hPMS1 gene for genetic screening purposes. With knowledge of the cDNA structures of the genes, it is a straightforward exercise to construct primer pairs that are likely to be effective for specifically amplifying selected portions of the gene.
While primer sequences are typically between 20 and 30 bases long, it may be possible to use shorter primers, potentially as short as approximately 13 bases, to specifically amplify selected gene segments. The principal limitation on how short a primer may be is that it must be long enough to hybridize specifically to the targeted gene segment. The specificity of PCR is generally improved by lengthening primers and/or employing nested pairs of primers.
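Returning to procedure 1, the lod score used to assess linkage between a marker such as an hMLH1 microsatellite and cancer can be sketched for the simplest case: fully informative meioses of known phase, each scored as recombinant (R) or nonrecombinant (N). The two-point score is LOD(θ) = log10[θ^R (1 − θ)^N / 0.5^(R+N)]. This sketch ignores incomplete penetrance, phenocopies and unknown phase, all of which real HNPCC pedigrees involve and which dedicated linkage software handles; it is an illustration of the formula, not a pedigree analysis.

```python
import math


def lod(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point lod score for phase-known, fully informative meioses.

    LOD(theta) = log10[theta^R * (1 - theta)^N / 0.5^(R + N)]
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if recombinants and theta == 0.0:
        return float("-inf")  # a single recombinant excludes theta = 0
    score = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        score += recombinants * math.log10(theta)
    return score - (recombinants + nonrecombinants) * math.log10(0.5)


def max_lod(recombinants: int, nonrecombinants: int, steps: int = 50):
    """Maximize the lod score over an evenly spaced grid of theta in [0, 0.5]."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max((lod(recombinants, nonrecombinants, t), t) for t in grid)


score, theta = max_lod(0, 10)
print(f"{score:.2f} at theta = {theta}")  # 3.01 at theta = 0.0
```

By convention a lod score of 3 or more (odds of about 1000:1 in favor of linkage) is taken as significant evidence of linkage, so ten informative meioses with no recombinants just reach that threshold.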

[0341]

The PCR products, which together represent the entire cDNA, are then sequenced and compared with known wild-type sequences. In most cases a mutation will be observed in the affected individual. Ideally, the nature of the mutation will indicate that it is likely to inactivate the gene product. Otherwise, it must be determined whether the alteration is simply a polymorphism.

b) Certain mutations, e.g., those affecting splicing or resulting in translation stop codons, can destabilize the messenger RNA produced from the mutant gene and hence compromise the normal RT-based mutation detection method. One recently reported technique can circumvent this problem by testing whether the mutant cDNA can direct the synthesis of normal-length protein in a coupled in vitro transcription/translation system.32
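The normal-length-protein test in b) can be mirrored in silico once a cDNA has been sequenced: translate from the start codon and see where the first stop codon falls. The sketch below is an analogy to the biochemical assay, not a replacement for it. It assumes the cDNA begins at the ATG in frame and contains only A, C, G and T, and the example sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_to_stop(cdna: str) -> str:
    """Translate in frame from the first base up to (not including) the first stop."""
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        residue = CODON_TABLE[cdna[i:i + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def is_truncated(test_cdna: str, wild_type_cdna: str) -> bool:
    """True if the test cDNA encodes a shorter polypeptide than wild type."""
    return len(translate_to_stop(test_cdna)) < len(translate_to_stop(wild_type_cdna))


# Hypothetical sequences: a TTT -> TAG change creates a premature stop codon.
wt_cdna = "ATGAAATTTGGGTAA"
mutant_cdna = "ATGAAATAGGGGTAA"
print(translate_to_stop(wt_cdna), translate_to_stop(mutant_cdna))  # MKFG MK
print(is_truncated(mutant_cdna, wt_cdna))  # True
```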

[0342]

3) Direct sequencing of genomic DNA. A second route to detecting mutations relies on examining the exons and the intron/exon boundaries by PCR cycle sequencing directly from a genomic DNA template.1,2 This method requires oligonucleotide pairs, such as those described in Tables 2 and 3 above, that amplify individual exons for direct PCR cycle sequencing. The method depends on genomic DNA sequence information at each intron/exon boundary (50 bp or more for each boundary).
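Because each amplicon spans one exon plus some flanking intron, a variant's position determines whether exon-by-exon sequencing can see it at all. A minimal sketch, assuming hypothetical exon coordinates and a 50-base intronic flank (the boundary length cited above). When introns are short enough for flanks to overlap, the first matching exon is reported.

```python
def locate_variant(position: int, exons: list, flank: int = 50) -> str:
    """Classify a genomic position relative to exon-by-exon amplicons.

    exons: (start, end) genomic coordinates, 1-based and inclusive.
    flank: intronic bases read on each side of an exon.
    """
    for number, (start, end) in enumerate(exons, start=1):
        if start <= position <= end:
            return f"exon {number}"
        if start - flank <= position < start or end < position <= end + flank:
            return f"intron flanking exon {number}"
    return "not covered by exon amplicons"


# Hypothetical exon coordinates, for illustration only.
exons = [(1000, 1100), (2000, 2150)]
for pos in (1050, 1995, 1500):
    print(pos, locate_variant(pos, exons))
```

Variants in the flanking intron are the ones most likely to affect splicing, which is why the boundary sequence is needed for primer design.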

[0343]

The advantage of the technique is twofold. First, because DNA is more stable than RNA, the condition of the material used for PCR is not as important as it is for RNA-based protocols. Second, almost any mutation within the actual transcribed region of the gene, including those in an intron affecting splicing, will be detectable.

[0344]

For each candidate gene, mutation detection may require knowledge of both the entire cDNA structure, and all intron/exon boundaries of the genomic structure. With such information, the type of causal mutation in a particular family can be determined. In turn, a more specific and efficient mutation detection scheme can be adapted for the particular family. Screening for the disease (HNPCC) is complex because it has a genetically heterogeneous basis in the sense that more than one gene is involved, and for each gene, multiple types of mutations are involved.2 Any given family is highly likely to segregate one particular mutation. However, as the nature of the mutation in multiple families is determined, the spectrum of the most prevalent mutations in the population will be determined. In general, determination of the most frequent mutations will direct and streamline mutation detection. Because HNPCC is so prevalent in the human population, carrier detection at birth could become part of standardized neonatal testing. Families at risk can be identified and all members not previously tested can be tested. Eventually, all affected kindreds could be determined.

[0345]

Mode of Mutation Screening and Testing

DNA-based Testing

[0346]

Initial testing, including identifying likely HNPCC families by standard diagnosis and family history study, will likely be done in local and smaller DNA diagnosis laboratories. However, large scale testing of multiple family members, and certainly population wide testing, will ultimately require large efficient centralized commercial facilities.

[0347]

Tests will be developed based on the determination of the most common mutations for the major genes underlying HNPCC, including at least the hMSH2 gene on chromosome 2p and the hMLH1 gene on chromosome 3p. A variety of tests are likely to be developed. For example, one possibility is a set of tests employing oligonucleotide hybridizations that distinguish the normal vs. mutant alleles.33 As already noted, our knowledge of the nucleotide structures of the hMLH1, hPMS1 and hMSH2 genes makes possible the design of numerous oligonucleotide primer pairs that may be used to amplify specific portions of an individual's mismatch repair gene for genetic screening and cancer risk analysis.
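Why primer length governs specificity (the limit of roughly 13 bases mentioned earlier) can be estimated with back-of-envelope arithmetic. In random sequence of G bases with equal base frequencies, an exact n-base match is expected about 2G/4^n times, counting both strands. The sketch assumes G ≈ 3 × 10^9 for the human haploid genome. Real genomes are not random (repeats and composition bias raise the count), so treat the results as orders of magnitude only.

```python
def expected_random_sites(primer_length: int, genome_size: float = 3.0e9) -> float:
    """Expected exact matches of one n-mer in random DNA, both strands counted.

    Assumes equal, independent base frequencies, which real genomes violate.
    """
    return 2 * genome_size / 4 ** primer_length


for n in (13, 16, 20, 25):
    print(n, f"{expected_random_sites(n):.3g}")
```

A single 13-mer is expected at roughly 89 sites, whereas a 20-mer is expected well under once. A primer pair adds further specificity because both primers must anneal in opposite orientations within amplifiable distance, which is why short primers can sometimes still give a specific product.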

[0348]

Our knowledge of the genes' structures also makes possible the design of labeled probes that can be used to quickly determine the presence or absence of all or a portion of one of the DNA mismatch repair genes. For example, allele-specific oligomer (ASO) probes may be designed to distinguish between alleles. ASOs are short DNA segments, used in pairs, that are identical in sequence except for a single base difference reflecting the difference between the normal and mutant alleles. Under appropriate DNA hybridization conditions, these probes can recognize a single base difference between two otherwise identical DNA sequences. Probes can be labeled radioactively or with a variety of non-radioactive reporter molecules, for example, fluorescent or chemiluminescent moieties. Labeled probes are then used to analyze the PCR sample for the presence of the disease-causing allele. The presence or absence of several different disease-causing mutations can readily be determined in a single sample. The probe must be long enough to avoid non-specific binding to nucleotide sequences other than the target. All tests will ultimately depend on accurate and complete structural information relating to hMLH1, hMSH2, hPMS1 and other DNA mismatch repair genes implicated in HNPCC.
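Designing an ASO pair for a known single-base substitution amounts to taking a window centered on the variant from the normal sequence and swapping in the variant base. A minimal sketch follows. The 17-base length is an illustrative assumption rather than a value from this application, and the input sequence is hypothetical.

```python
def aso_pair(sequence: str, variant_index: int, alt_base: str, length: int = 17):
    """Return (normal, mutant) allele-specific oligos centered on a substitution.

    variant_index is 0-based; length must be odd so the variant sits in
    the middle of the probe.
    """
    if length % 2 == 0:
        raise ValueError("use an odd length so the variant sits in the middle")
    half = length // 2
    start, end = variant_index - half, variant_index + half + 1
    if start < 0 or end > len(sequence):
        raise ValueError("variant too close to the end of the sequence")
    normal = sequence[start:end]
    mutant = normal[:half] + alt_base + normal[half + 1:]
    return normal, mutant


# Hypothetical sequence with a C>T change at index 10.
normal_probe, mutant_probe = aso_pair("GGATCCTAGTCAGGTACCAGT", 10, "T")
print(normal_probe)
print(mutant_probe)
```

Placing the mismatch at the center generally maximizes its destabilizing effect on the mismatched duplex, which helps each probe discriminate against the other allele under stringent hybridization conditions.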

[0349]

Protein Detection-Based Screening

[0350]

Tests based on the functionality of the protein product, per se, may also be used. Such protein-based tests will most likely use antibody reagents specific to the hMLH1, hPMS1 or hMSH2 proteins, or to other related "cancer" gene products as they are identified.

[0351]

For example, a frozen tumor specimen can be cross-sectioned and prepared for antibody staining using indirect fluorescence techniques. Certain gene mutations are expected to alter or destabilize the protein structure sufficiently to give an altered or reduced signal after antibody staining.

[0352]

It is likely that such tests will be performed in cases where the gene involved in a family's cancer has yet to be established. We are in the process of developing diagnostic monoclonal antibodies against the human MLH1 and PMS1 proteins. We are overexpressing the human MLH1 and PMS1 proteins in bacteria. We will purify the proteins, inject them into mice and derive protein-specific monoclonal antibodies that can be used for diagnostic and research purposes.

[0353]

Identification and Characterization of DNA Mismatch Repair Tumors

[0354]

In addition to their usefulness in diagnosing cancer susceptibility in a subject, nucleotide sequences that are homologous to a bacterial mismatch repair gene can be valuable for, among other things, use in the identification and characterization of mismatch-repair-defective tumors. Such identification and characterization is valuable because mismatch-repair-defective tumors may respond better to particular therapy regimens. For example, mismatch-repair-defective tumors might be sensitive to DNA damaging agents, especially when administered in combination with other therapeutic agents. Defects in mismatch repair genes need not be present throughout an individual's tissues to contribute to tumor formation in that individual. Spontaneous mutation of a mismatch repair gene in a particular cell or tissue can contribute to tumor formation in that tissue. In fact, at least in some cases, a single mutation in a mismatch repair gene is not sufficient for tumor development.

[0355]

In such instances, an individual with a single mutation in a mismatch repair gene is susceptible to cancer, but will not develop a tumor until a secondary mutation occurs. Additionally, in some instances, the same mismatch repair gene mutation that is strictly tumor-associated in an individual will be responsible for conferring cancer susceptibility in a family with a hereditary predisposition to cancer development.

[0356]

In yet another aspect of the invention, the sequence information we have provided can be used with methods known in the art to analyze tumors (or tumor cell lines) and to identify tumor-associated mutations in mismatch repair genes. Preferably, it can be demonstrated that these tumor-associated mutations are not present in non-tumor tissues from the same individual. The information described in this application is particularly useful for identifying mismatch repair gene mutations within tumors (or tumor cell lines) that display genomic instability of short repeated DNA elements. The sequence information and testing protocols of the present invention can also be used to determine whether two tumors are related, i.e., whether a second tumor is the result of metastasis from an earlier-found first tumor that exhibits a particular DNA mismatch repair gene mutation.
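Tumors with instability of short repeated elements are typically recognized by comparing microsatellite allele sizes in tumor DNA with those in matched normal DNA from the same individual. A minimal sketch follows, with hypothetical marker names and allele sizes. The scoring rule (a marker is unstable if the tumor shows any allele size absent from the normal tissue) and any cutoff applied to the resulting fraction are illustrative assumptions, not criteria from this application.

```python
def msi_fraction(normal: dict, tumor: dict) -> float:
    """Fraction of markers showing novel allele sizes in tumor vs. normal.

    normal and tumor map a marker name to the set of observed allele sizes
    (bp). Only markers typed in both samples are counted; loss of an allele
    alone is not scored as instability.
    """
    shared = [marker for marker in normal if marker in tumor]
    if not shared:
        raise ValueError("no markers typed in both samples")
    unstable = sum(1 for marker in shared if tumor[marker] - normal[marker])
    return unstable / len(shared)


# Hypothetical markers and allele sizes.
normal_tissue = {"marker_A": {120, 124}, "marker_B": {98},
                 "marker_C": {150, 152}, "marker_D": {210}}
tumor_tissue = {"marker_A": {116, 120, 124}, "marker_B": {98},
                "marker_C": {146, 150}, "marker_D": {210}}
fraction = msi_fraction(normal_tissue, tumor_tissue)
print(fraction)  # 0.5
print("unstable" if fraction >= 0.3 else "stable")  # 0.3 is an illustrative cutoff
```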

[0357]

Isolating Additional Genes of Related Function

[0358]

Proteins that interact physically with hMLH1 and/or hPMS1 are likely to be involved in DNA mismatch repair. By analogy to hMLH1 and hMSH2, mutations in the genes encoding such proteins would be strong candidates for potential cancer linkage. A powerful molecular genetic approach using yeast, referred to as a "two-hybrid system", allows the relatively rapid detection and isolation of genes encoding proteins that interact with a gene product of interest, e.g., hMLH1.7 1 The two-hybrid system involves two plasmid vectors, each intended to encode a fusion protein. Each of the two vectors encodes a portion, or domain, of a transcriptional activator. The yeast cell used in the detection scheme contains a "reporter" gene. Neither domain alone can activate transcription of the reporter. However, if the two domains are brought into close proximity, transcription may occur. The cDNA for the protein of interest, e.g., hMLH1, is inserted in frame with the domain-coding sequence of one of the vectors; this construct is termed the "bait". A library of human cDNAs, inserted into a second plasmid vector so as to make fusions with the other domain of the transcriptional activator, is introduced into the yeast cells harboring the "bait" vector. If a particular yeast cell receives a library member containing a human cDNA that encodes a protein interacting with hMLH1 protein, this interaction will bring the two domains of the transcriptional activator into close proximity and activate transcription of the reporter gene, and the yeast cell will turn blue. Next, the insert is sequenced to determine whether it is related to any sequence in the database. The same procedure can be used to identify yeast proteins involved in DNA mismatch repair or a related process. Performing the yeast and human "hunts" in parallel has certain advantages. The function of novel yeast homologs can be quickly determined in yeast by gene disruption and subsequent examination of the genetic consequences of being defective in the newly found gene.
These yeast studies will help guide the analysis of novel human "hMLH1- or hPMS1-interacting" proteins in much the same way that the yeast studies on PMS1 and MLH1 have influenced our studies of the human MLH1 and PMS1 genes.

[0359]

Production of Antibodies

By using our knowledge of the DNA sequences of hMLH1 and hPMS1, we can synthesize all or portions of the predicted proteins for the purpose of producing antibodies. One important use for antibodies directed to the hMLH1 and hPMS1 proteins will be capturing other proteins that may be involved in DNA mismatch repair. For example, using coimmunoprecipitation techniques, hMLH1 or hPMS1 may be precipitated with antibodies directed to either protein, along with other associated proteins that are functionally and/or physically related. Another important use for antibodies will be isolating the hMLH1 and hPMS1 proteins from tumor tissue. The hMLH1 and hPMS1 proteins from tumors can then be characterized for the purpose of determining appropriate treatment strategies.

[0360]

We are in the process of developing monoclonal antibodies directed to the hMLH1 and hPMS1 proteins.

[0361]

EXAMPLE 5: We have also used the following procedure to produce polyclonal antibodies directed to the human and mouse forms of the PMS1 protein.

[0362]

We inserted a 3' fragment of the mouse PMS1 cDNA into the bacterial expression plasmid vector pET (Novagen, Madison, WI). The expected expressed portion of the mouse PMS1 protein corresponds to a region of approximately 200 amino acids at the carboxyl end of the PMS1 protein. This portion of mPMS1 is conserved with yeast PMS1 but not with either the human or the mouse MLH1 protein. One reason we selected this portion of the PMS1 protein for producing antibodies is that we did not want the resulting antibodies to cross-react with MLH1. The mouse PMS1 protein fragment was highly expressed in E. coli and purified from a polyacrylamide gel, and the eluted protein was then prepared for animal injections. Approximately 2 mg of the PMS1 protein fragment was sent to the Pocono Rabbit Farm (PA) for injection into rabbits. Sera collected from the rabbits multiple times were titered against the PMS1 antigen using standard ELISA techniques. Rabbit antibodies specific to mouse PMS1 protein were affinity-purified using columns containing immobilized mouse PMS1 protein. The affinity-purified polyclonal antibody preparation was further tested using Western blotting and dot blotting. We found that the polyclonal antibodies recognized not only the mouse PMS1 protein but also the human PMS1 protein, which is very similar.

[0363]

Based upon the Western blots, there is no indication that other proteins, including the human or mouse MLH1 proteins, were recognized strongly by our antibody.

[0364]

DNA Mismatch Repair Defective Mice

[0365]

EXAMPLE 6: To create an experimental model system for studying DNA mismatch repair defects and the resulting cancer in a whole animal, we have derived DNA mismatch repair defective mice using embryonic stem (ES) cell technology. Using genomic DNA containing a portion of the mPMS1 gene, we constructed a vector that, upon homologous recombination, disrupts the chromosomal mPMS1 gene. Mouse ES cells from the 129 mouse strain were confirmed to contain a disrupted mPMS1 allele. The ES cells were injected into C57BL/6 host blastocysts to produce chimeric animals, i.e., mixtures of 129 and C57BL/6 cells. The incorporation of the ES cells was determined by the presence of patches of agouti coat coloring (indicative of ES cell contribution). All male chimeras were bred with C57BL/6 female mice.

[0366]

Subsequently, twelve offspring (F2) were born in which agouti coat color was detected, indicating germline transmission of genetic material from the ES cells. Analysis of DNA extracted from the tail tips of the twelve offspring indicated that six of the animals were heterozygous (contained one wild-type and one mutant allele) for the mPMS1 mutation. Of the six heterozygous animals, three were female (animals F2-8, F2-11 and F2-12) and three were male (F2-6, F2-10 and F2-13). Four breeding pens were set up to obtain mice homozygous for the mPMS1 mutation, as well as additional heterozygous mice. Breeding pen #1, which contained animals F2-11 and F2-10, yielded a total of thirteen mice in three litters, four of which have been genotyped. Breeding pen #2 (animals F2-8 and F2-13) gave twenty-two animals in three litters, three of which have been genotyped.

[0367]

Of the seven animals genotyped, three homozygous female animals have been identified. One animal died at six weeks of age from unknown causes. The remaining homozygous females are alive and healthy at twelve weeks of age. The results indicate that mPMS1 homozygous defective mice are viable.
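The breeding scheme rests on simple Mendelian expectations for a single autosomal locus: crossing two heterozygotes (pens #1 and #2) is expected to give one quarter homozygous mutants if homozygotes are fully viable, while crossing a heterozygote with a wild-type C57BL/6 mouse (the backcross pens) gives half heterozygotes and no homozygotes. A minimal sketch of those expectations, assuming one locus, full viability and unbiased transmission. The allele symbols are hypothetical notation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict:
    """Expected offspring genotype frequencies for a single autosomal locus.

    Each parent is written as two allele symbols, here "+" for wild type
    and "-" for the targeted mPMS1 allele.
    """
    offspring = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(offspring.values())
    return {genotype: Fraction(n, total) for genotype, n in offspring.items()}


print(cross("+-", "+-"))  # intercross of heterozygotes
print(cross("+-", "++"))  # backcross to wild type
```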

[0368]

Breeding pens #3 and #4 were used to backcross the mPMS1 mutation into the C57BL/6 background. Breeding pen #3 (animal F2-12 crossed to a C57BL/6 mouse) produced twenty-one animals in two litters, nine of which have been genotyped. Breeding pen #4 (animal F2-6 crossed with a C57BL/6 mouse) gave eight mice. In addition, the original male chimera (breeding pen #5) has produced thirty-one additional offspring.

[0369]

To genotype the animals, a series of PCR primers has been developed to identify the mutant and wild-type mPMS1 genes. They are (SEQ ID NOS: 143-148, respectively):

Primer 1: 5'-TTCGGTGACAGATTTGTAAATG-3'
Primer 2: 5'-TTTACGGAGCCCTGGC-3'
Primer 3: 5'-TCACCATAAAAATAGTTTCCCG-3'
Primer 4: 5'-TCCTGGATCATATTTTCTGAGC-3'
Primer 5: 5'-TTTCAGGTATGTCCTGTTACCC-3'
Primer 6: 5'-GAGGCAGCTTTTAAGAAACTC-3'

[0370]

Primers 1 + 2 (5' targeted)
Primers 1 + 3 (5' untargeted)
Primers 4 + 5 (3' targeted)
Primers 4 + 6 (3' untargeted)

The mice we have developed provide an animal model system for studying the consequences of defects in DNA mismatch repair and the resulting HNPCC. The long-term survival of mice homozygous and heterozygous for the mPMS1 mutation, and the types and timing of tumors in these mice, will be determined. The mice will be screened daily for any indication of cancer onset, such as a hunched appearance in combination with deterioration in coat condition. Mice carrying the mPMS1 mutation will be used to test the effects of other factors, environmental and genetic, on tumor formation. For example, the effect of diet on colon and other types of tumors can be compared between normal mice and mice carrying the mPMS1 mutation in either the heterozygous or the homozygous state. In addition, the mPMS1 mutation can be put into different genetic backgrounds to learn about interactions between genes of the mismatch repair pathway and other genes involved in human cancer, for example, p53. Mice carrying mPMS1 mutations will also be useful for testing the efficacy of somatic gene therapy on the cancers that arise in these mice, for example, the expected colon cancers. Further, isogenic fibroblast cell lines from the homozygous and heterozygous mPMS1 mice can be established for use in various cellular studies, including the determination of spontaneous mutation rates. We are currently constructing a vector for disrupting the mouse mMLH1 gene to derive mice carrying a mutation in mMLH1. We will compare mice carrying defects in mPMS1 with mice carrying defects in mMLH1. In addition, we will construct mice that carry mutations in both genes to see whether having mutations in two HNPCC genes has a synergistic effect. Other studies on the mMLH1 mutant mice will be as described above for the mPMS1 mutant mice.
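The four primer pairs can be read as two independent genotyping assays, one at each end of the targeted region. The sketch below calls a genotype from which PCR products are present. It rests on an interpretation that the text does not spell out: that a "targeted" pair amplifies only the disrupted allele and an "untargeted" pair only the intact allele. Requiring the 5' and 3' calls to agree is likewise a design choice added for illustration.

```python
def call_genotype(bands: dict) -> str:
    """Call an mPMS1 genotype from PCR product presence.

    bands maps each primer pair ("1+2", "1+3", "4+5", "4+6") to True if
    its product was seen. Assumes targeted pairs detect only the disrupted
    allele and untargeted pairs only the intact allele.
    """
    calls = {
        (True, True): "heterozygous",
        (False, True): "wild-type",
        (True, False): "homozygous mutant",
        (False, False): "no call",
    }
    five_prime = calls[(bands["1+2"], bands["1+3"])]   # 5' targeted, 5' untargeted
    three_prime = calls[(bands["4+5"], bands["4+6"])]  # 3' targeted, 3' untargeted
    return five_prime if five_prime == three_prime else "discordant: repeat PCR"


print(call_genotype({"1+2": True, "1+3": True, "4+5": True, "4+6": True}))
```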

[0371]

SEQUENCE LISTING (1) GENERAL INFORMATION:

[0372]

(i) APPLICANT: Liskay, Robert M.; Bronner, C. Eric; Baker, Sean M.; Bollag, Roni J.; Kolodner, Richard D.
(ii) TITLE OF INVENTION: COMPOSITIONS AND METHODS RELATING TO DNA MISMATCH REPAIR GENES

[0373]

(iii) NUMBER OF SEQUENCES: 148
(iv) CORRESPONDENCE ADDRESS:

[0374]

(A) ADDRESSEE: Kolisch, Hartwell, Dickinson, McCormack & Heuser

[0375]

(B) STREET: 520 S.W. Yamhill Street, Suite 200

[0376]

(C) CITY: Portland

[0377]

(D) STATE: Oregon

[0378]

(E) COUNTRY: U.S.A.

[0379]

(F) ZIP: 97204

[0380]

(v) COMPUTER READABLE FORM:

[0381]

(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible

[0382]

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

[0383]

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25
(vi) CURRENT APPLICATION DATA:

[0384]

(A) APPLICATION NUMBER:

[0385]

(B) FILING DATE:

[0386]

(C) CLASSIFICATION:

[0387]

(viii) ATTORNEY/AGENT INFORMATION:

[0388]

(A) NAME: Van Rysselberghe, Pierre C.

[0389]

(B) REGISTRATION NUMBER: 33,557

[0390]

(C) REFERENCE/DOCKET NUMBER: OHSU 306B
(ix) TELECOMMUNICATION INFORMATION:

[0391]

(A) TELEPHONE: (503) 224-6655

[0392]

(B) TELEFAX: (503) 295-6679

[0393]

(C) TELEX: 360619

[0394]

(2) INFORMATION FOR SEQ ID NO:1:

[0395]

(i) SEQUENCE CHARACTERISTICS:

[0396]

(A) LENGTH: 361 amino acids

[0397]

(B) TYPE: amino acid

[0398]

(C) STRANDEDNESS: single

[0399]

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA

[0400]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

[0401]

Met Pro Ile Gln Val Leu Pro Pro Gln Leu Ala Asn Gln Ile Ala Ala
1 5 10 15

Gly Glu Val Val Glu Arg Pro Ala Ser Val Val Lys Glu Leu Val Glu
20 25 30

Asn Ser Leu Asp Ala Gly Ala Thr Arg Val Asp Ile Asp Ile Glu Arg
35 40 45

Gly Gly Ala Lys Leu Ile Arg Ile Arg Asp Asn Gly Cys Gly Ile Lys
50 55 60

Lys Glu Glu Leu Ala Leu Ala Leu Ala Arg His Ala Thr Ser Lys Ile
65 70 75 80

Ala Ser Leu Asp Asp Leu Glu Ala Ile Ile Ser Leu Gly Phe Arg Gly
85 90 95

Glu Ala Leu Ala Ser Ile Ser Ser Val Ser Arg Leu Thr Leu Thr Ser
100 105 110

Arg Thr Ala Glu Gln Ala Glu Ala Trp Gln Ala Tyr Ala Glu Gly Arg
115 120 125

Asp Met Asp Val Thr Val Lys Pro Ala Ala His Pro Val Gly Thr Thr
130 135 140

Leu Glu Val Leu Asp Leu Phe Tyr Asn Thr Pro Ala Arg Arg Lys Phe
145 150 155 160

Met Arg Thr Glu Lys Thr Glu Phe Asn His Ile Asp Glu Ile Ile Arg
165 170 175

Arg Ile Ala Leu Ala Arg Phe Asp Val Thr Leu Asn Leu Ser His Asn
180 185 190

Gly Lys Leu Val Arg Gln Tyr Arg Ala Val Ala Lys Asp Gly Gln Lys
195 200 205

Glu Arg Arg Leu Gly Ala Ile Cys Gly Thr Pro Phe Leu Glu Gln Ala
210 215 220

Leu Ala Ile Glu Trp Gln His Gly Asp Lys Thr Lys Arg Gly Trp Val
225 230 235 240

Ala Asp Pro Asn His Thr Thr Thr Ala Leu Thr Glu Ile Gln Tyr Cys
245 250 255

Tyr Val Asn Gly Arg Met Met Arg Asp Arg Leu Ile Asn His Ala Ile
260 265 270

Arg Gln Ala Cys Glu Asp Lys Leu Gly Ala Asp Gln Gln Pro Ala Phe
275 280 285

Val Leu Tyr Leu Glu Ile Asp Pro His Gln Val Asp Val Asn Val His
290 295 300

Pro Ala Lys His Glu Val Arg Phe His Gln Ser Arg Leu Val His Asp
305 310 315 320

Phe Ile Tyr Gln Gly Val Leu Ser Val Leu Gln Gln Gln Thr Glu Thr
325 330 335

Ala Leu Pro Leu Glu Glu Ile Ala Pro Ala Pro Arg His Val Gln Glu
340 345 350

Asn Arg Ile Ala Ala Gly Arg Asn His
355 360

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 538 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ser His Ile Ile Glu Leu Pro Glu Met Leu Ala Asn Gln Ile Ala
1 5 10 15

Ala Gly Glu Val Ile Glu Arg Pro Ala Ser Val Cys Lys Glu Leu Val
20 25 30

Glu Asn Ala Ile Asp Ala Gly Ser Ser Gln Ile Ile Ile Glu Ile Glu
35 40 45

Glu Ala Gly Leu Lys Lys Val Gln Ile Thr Asp Asn Gly His Gly Ile
50 55 60

Ala His Asp Glu Val Glu Leu Ala Leu Arg Arg His Ala Thr Ser Lys
65 70 75 80

Ile Lys Asn Gln Ala Asp Leu Phe Arg Ile Arg Thr Leu Gly Phe Arg
85 90 95

Gly Glu Ala Leu Pro Ser Ile Ala Ser Val Ser Val Leu Thr Leu Leu
100 105 110

Thr Ala Val Asp Gly Ala Ser His Gly Thr Lys Leu Val Ala Arg Gly
115 120 125

Gly Glu Val Glu Glu Val Ile Pro Ala Thr Ser Pro Val Gly Thr Lys
130 135 140

Val Cys Val Glu Asp Leu Phe Phe Asn Thr Pro Ala Arg Leu Lys Tyr
145 150 155 160

Met Lys Ser Gln Gln Ala Glu Leu Ser His Ile Ile Asp Ile Val Asn
165 170 175

Arg Leu Gly Leu Ala His Pro Glu Ile Ser Phe Ser Leu Ile Ser Asp
180 185 190

Gly Lys Glu Met Thr Arg Thr Ala Gly Thr Gly Gln Leu Arg Gln Ala
195 200 205

Ile Ala Gly Ile Tyr Gly Leu Val Ser Ala Lys Lys Met Ile Glu Ile
210 215 220

Glu Asn Ser Asp Leu Asp Phe Glu Ile Ser Gly Phe Val Ser Leu Pro
225 230 235 240

Glu Leu Thr Arg Ala Asn Arg Asn Tyr Ile Ser Leu Phe Ile Asn Gly
245 250 255

Arg Tyr Ile Lys Asn Phe Leu Leu Asn Arg Ala Ile Leu Asp Gly Phe
260 265 270

Gly Ser Lys Leu Met Val Gly Arg Phe Pro Leu Ala Val Ile His Ile
275 280 285

His Ile Asp Pro Tyr Leu Ala Asp Val Asn Val His Pro Thr Lys Gln
290 295 300

Glu Val Arg Ile Ser Lys Glu Lys Glu Leu Met Thr Leu Val Ser Glu
305 310 315 320

Ala Ile Ala Asn Ser Leu Lys Glu Gln Thr Leu Ile Pro Asp Ala Leu
325 330 335

Glu Asn Leu Ala Lys Ser Thr Val Arg Asn Arg Glu Lys Val Glu Gln
340 345 350

Thr Ile Leu Pro Leu Ser Phe Pro Glu Leu Glu Phe Phe Gly Gln Met
355 360 365

His Gly Thr Tyr Leu Phe Ala Gln Gly Arg Asp Gly Leu Tyr Ile Ile
370 375 380

Asp Gln His Ala Ala Gln Glu Arg Val Lys Tyr Glu Glu Tyr Arg Glu
385 390 395 400

Ser Ile Gly Asn Val Asp Gln Ser Gln Gln Gln Leu Leu Val Pro Tyr
405 410 415

Ile Phe Glu Phe Pro Ala Asp Asp Ala Leu Arg Leu Lys Glu Arg Met
420 425 430

Pro Leu Leu Glu Glu Val Gly Val Phe Leu Ala Glu Tyr Gly Glu Asn
435 440 445

Gln Phe Ile Leu Arg Glu His Pro Ile Trp Met Ala Glu Glu Glu Ile
450 455 460

Glu Ser Gly Ile Tyr Glu Met Cys Asp Met Leu Leu Leu Thr Lys Glu
465 470 475 480

Val Ser Ile Lys Lys Tyr Arg Ala Glu Leu Ala Ile Met Met Ser Cys
485 490 495

Lys Arg Ser Ile Lys Ala Asn His Arg Ile Asp Asp His Ser Ala Arg
500 505 510

Gln Leu Leu Tyr Gln Leu Ser Gln Cys Asp Asn Pro Tyr Asn Cys Pro
515 520 525

His Gly Arg Pro Val Leu Val His Phe Thr
530 535

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 607 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Met Phe His His Ile Glu Asn Leu Leu Ile Glu Thr Glu Lys Arg Cys
1 5 10 15

Lys Gln Lys Glu Gln Arg Tyr Ile Pro Val Lys Tyr Leu Phe Ser Met
20 25 30

Thr Gln Ile His Gln Ile Asn Asp Ile Asp Val His Arg Ile Thr Ser
35 40 45

Gly Gln Val Ile Thr Asp Leu Thr Thr Ala Val Lys Glu Leu Val Asp
50 55 60

Asn Ser Ile Asp Ala Asn Ala Asn Gln Ile Glu Ile Ile Phe Lys Asp
65 70 75 80

Tyr Gly Leu Glu Ser Ile Glu Cys Ser Asp Asn Gly Asp Gly Ile Asp
85 90 95

Pro Ser Asn Tyr Glu Phe Leu Ala Leu Lys His Tyr Thr Ser Lys Ile
100 105 110

Ala Lys Phe Gln Asp Val Ala Lys Val Gln Thr Leu Gly Phe Arg Gly
115 120 125

Glu Ala Leu Ser Ser Leu Cys Gly Ile Ala Lys Leu Ser Val Ile Thr
130 135 140

Thr Thr Ser Pro Pro Lys Ala Asp Lys Glu Leu Tyr Asp Met Val Gly
145 150 155 160

His Ile Thr Ser Lys Thr Thr Thr Ser Arg Asn Lys Gly Thr Thr Val
165 170 175

Leu Val Ser Gln Leu Phe His Asn Leu Pro Val Arg Gln Lys Glu Phe
180 185 190

Ser Lys Thr Phe Lys Arg Gln Phe Thr Lys Cys Leu Thr Val Ile Gln
195 200 205

Gly Tyr Ala Ile Ile Asn Ala Ala Ile Lys Phe Ser Val Trp Asn Ile
210 215 220

Thr Pro Lys Gly Lys Lys Asn Leu Ile Leu Ser Thr Met Arg Asn Ser
225 230 235 240

Ser Met Arg Lys Asn Ile Ser Ser Val Phe Gly Ala Gly Gly Met Arg
245 250 255

Gly Glu Leu Glu Val Asp Leu Val Leu Asp Leu Asn Pro Phe Lys Asn
260 265 270

Arg Met Leu Gly Lys Tyr Thr Asp Asp Pro Asp Phe Leu Asp Leu Asp
275 280 285

Tyr Lys Ile Arg Val Lys Gly Tyr Ile Ser Gln Asn Ser Phe Gly Cys
290 295 300

Gly Arg Asn Ser Lys Asp Arg Gln Phe Ile Tyr Val Asn Lys Arg Pro
305 310 315 320

Val Glu Tyr Ser Thr Leu Leu Lys Cys Cys Asn Glu Val Tyr Lys Thr
325 330 335

Phe Asn Asn Val Gln Phe Pro Ala Val Phe Leu Asn Leu Glu Leu Pro
340 345 350

Met Ser Leu Ile Asp Val Asn Val Thr Pro Asp Lys Arg Val Ile Leu
355 360 365

Leu His Asn Glu Arg Ala Val Ile Asp Ile Phe Lys Thr Thr Leu Ser
370 375 380

Asp Tyr Tyr Asn Arg Gln Glu Leu Ala Leu Pro Lys Arg Met Cys Ser
385 390 395 400

Gln Ser Glu Gln Gln Ala Gln Lys Arg Leu Leu Thr Glu Val Phe Asp
405 410 415

Asp Asp Phe Lys Lys Met Glu Val Val Gly Gln Phe Asn Leu Gly Phe
420 425 430

Ile Ile Val Thr Arg Lys Val Asp Asn Lys Ser Asp Leu Phe Ile Val
435 440 445

Asp Gln His Ala Ser Asp Glu Lys Tyr Asn Phe Glu Thr Leu Gln Ala
450 455 460

Val Thr Val Phe Lys Ser Gln Lys Leu Ile Ile Pro Gln Pro Val Glu
465 470 475 480

Leu Ser Val Ile Asp Glu Leu Val Val Leu Asp Asn Leu Pro Val Phe
485 490 495

Glu Lys Asn Gly Phe Lys Leu Lys Ile Asp Glu Glu Glu Glu Phe Gly
500 505 510

Ser Arg Val Lys Leu Leu Ser Leu Pro Thr Ser Lys Gln Thr Leu Phe
515 520 525

Asp Leu Gly Asp Phe Asn Glu Leu Ile His Leu Ile Lys Glu Asp Gly
530 535 540

Gly Leu Arg Arg Asp Asn Ile Arg Cys Ser Lys Ile Arg Ser Met Phe
545 550 555 560

Ala Met Arg Ala Cys Arg Ser Ser Ile Met Ile Gly Lys Pro Leu Asn
565 570 575

Lys Lys Thr Met Thr Arg Val Val His Asn Leu Ser Glu Leu Asp Lys
580 585 590

Pro Trp Asn Cys Pro His Gly Arg Pro Thr Met Arg His Leu Met
595 600 605

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2484 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

CTTGGCTCTT CTGGCGCCAA AATGTCGTTC GTGGCAGGGG TTATTCGGCG GCTGGACGAG 60

ACAGTGGTGA ACCGCATCGC GGCGGGGGAA GTTATCCAGC GGCCAGCTAA TGCTATCAAA 120

GAGATGATTG AGAACTGTTT AGATGCAAAA TCCACAAGTA TTCAAGTGAT TGTTAAAGAG 180

GGAGGCCTGA AGTTGATTCA GATCCAAGAC AATGGCACCG GGATCAGGAA AGAAGATCTG 240

GATATTGTAT GTGAAAGGTT CACTACTAGT AAACTGCAGT CCTTTGAGGA TTTAGCCAGT 300

ATTTCTACCT ATGGCTTTCG AGGTGAGGCT TTGGCCAGCA TAAGCCATGT GGCTCATGTT 360

ACTATTACAA CGAAAACAGC TGATGGAAAG TGTGCATACA GAGCAAGTTA CTCAGATGGA 420

AAACTGAAAG CCCCTCCTAA ACCATGTGCT GGCAATCAAG GGACCCAGAT CACGGTGGAG 480

GACCTTTTTT ACAACATAGC CACGAGGAGA AAAGCTTTAA AAAATCCAAG TGAAGAATAT 540

GGGAAAATTT TGGAAGTTGT TGGCAGGTAT TCAGTACACA ATGCAGGCAT TAGTTTCTCA 600

GTTAAAAAAC AAGGAGAGAC AGTAGCTGAT GTTAGGACAC TACCCAATGC CTCAACCGTG 660

GACAATATTC GCTCCATCTT TGGAAATGCT GTTAGTCGAG AACTGATAGA AATTGGATGT 720

GAGGATAAAA CCCTAGCCTT CAAAATGAAT GGTTACATAT CCAATGCAAA CTACTCAGTG 780

AAGAAGTGCA TCTTCTTACT CTTCATCAAC CATCGTCTGG TAGAATCAAC TTCCTTGAGA 840

AAAGCCATAG AAACAGTGTA TGCAGCCTAT TTGCCCAAAA ACACACACCC ATTCCTGTAC 900

CTGAGTTTAG AAATCAGTCC CCAGAATGTG GATGTTAATG TGCACCCCAC AAAGCATGAA 960

GTTCACTTCC TGCACGAGGA GAGCATCCTG GAGCGGGTGC AGCAGCACAT CGAGAGCAAG 1020

CTCCTGGGCT CCAATTCCTC CAGGATGTAC TTCACCCAGA CTTTGCTACC AGGACTTGCT 1080

GGCCCCTCTG GGGAGATGGT TAAATCCACA ACAAGTCTGA CCTCGTCTTC TACTTCTGGA 1140

AGTAGTGATA AGGTCTATGC CCACCAGATG GTTCGTACAG ATTCCCGGGA ACAGAAGCTT 1200

GATGCATTTC TGCAGCCTCT GAGCAAACCC CTGTCCAGTC AGCCCCAGGC CATTGTCACA 1260

GAGGATAAGA CAGATATTTC TAGTGGCAGG GCTAGGCAGC AAGATGAGGA GATGCTTGAA 1320

CTCCCAGCCC CTGCTGAAGT GGCTGCCAAA AATCAGAGCT TGGAGGGGGA TACAACAAAG 1380

GGGACTTCAG AAATGTCAGA GAAGAGAGGA CCTACTTCCA GCAACCCCAG AAAGAGACAT 1440

CGGGAAGATT CTGATGTGGA AATGGTGGAA GATGATTCCC GAAAGGAAAT GACTGCAGCT 1500

TGTACCCCCC GGAGAAGGAT CATTAACCTC ACTAGTGTTT TGAGTCTCCA GGAAGAAATT 1560

AATGAGCAGG GACATGAGGT TCTCCGGGAG ATGTTGCATA ACCACTCCTT CGTGGGCTGT 1620

GTGAATCCTC AGTGGGCCTT GGCACAGCAT CAAACCAAGT TATACCTTCT CAACACCACC 1680

AAGCTTAGTG AAGAACTGTT CTACCAGATA CTCATTTATG ATTTTGCCAA TTTTGGTGTT 1740

CTCAGGTTAT CGGAGCCAGC ACCGCTCTTT GACCTTGCCA TGCTTGCCTT AGATAGTCCA 1800

GAGAGTGGCT GGACAGAGGA AGATGGTCCC AAAGAAGGAC TTGCTGAATA CATTGTTGAG 1860

TTTCTGAAGA AGAAGGCTGA GATGCTTGCA GACTATTTCT CTTTGGAAAT TGATGAGGAA 1920

GGGAACCTGA TTGGATTACC CCTTCTGATT GACAACTATG TGCCCCCTTT GGAGGGACTG 1980

CCTATCTTCA TTCTTCGACT AGCCACTGAG GTGAATTGGG ACGAAGAAAA GGAATGTTTT 2040

GAAAGCCTCA GTAAAGAATG CGCTATGTTC TATTCCATCC GGAAGCAGTA CATATCTGAG 2100

GAGTCGACCC TCTCAGGCCA GCAGAGTGAA GTGCCTGGCT CCATTCCAAA CTCCTGGAAG 2160

TGGACTGTGG AACACATTGT CTATAAAGCC TTGCGCTCAC ACATTCTGCC TCCTAAACAT 2220

TTCACAGAAG ATGGAAATAT CCTGCAGCTT GCTAACCTGC CTGATCTATA CAAAGTCTTT 2280

GAGAGGTGTT AAATATGGTT ATTTATGCAC TGTGGGATGT GTTCTTCTTT CTCTGTATTC 2340

CGATACAAAG TGTTGTATCA AAGTGTGATA TACAAAGTGT ACCAACATAA GTGTTGGTAG 2400

CACTTAAGAC TTATACTTGC CTTCTGATAG TATTCCTTTA TACACAGTGG ATTGATTATA 2460

AATAAATAGA TGTGTCTTAA CATA 2484
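
As a consistency check between SEQ ID NO:4 and SEQ ID NO:5, the minimal Python sketch below locates the first ATG in positions 1-60 of SEQ ID NO:4 and the first in-frame stop codon in a 3' window (positions 2221-2340), both copied from the listing above. It assumes, rather than verifies, that the unscanned middle of the sequence contains no in-frame stop; the helper name first_in_frame_stop and the window boundaries are illustrative choices, not part of the application. Under that assumption the reading frame runs from position 22 to a TAA at positions 2290-2292, i.e. 756 codons, which agrees with the 756-residue length given for SEQ ID NO:5.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Positions 1-60 of SEQ ID NO:4 (spaces are listing formatting only)
five_prime = "CTTGGCTCTT CTGGCGCCAA AATGTCGTTC GTGGCAGGGG TTATTCGGCG GCTGGACGAG".replace(" ", "")

# Positions 2221-2340 of SEQ ID NO:4
three_prime = ("TTCACAGAAG ATGGAAATAT CCTGCAGCTT GCTAACCTGC CTGATCTATA CAAAGTCTTT "
               "GAGAGGTGTT AAATATGGTT ATTTATGCAC TGTGGGATGT GTTCTTCTTT CTCTGTATTC").replace(" ", "")
THREE_PRIME_FIRST_POS = 2221  # 1-based position of three_prime[0]


def first_in_frame_stop(segment: str, first_pos: int, start_pos: int):
    """Return the 1-based position of the first stop codon in frame with start_pos, or None."""
    for pos in range(first_pos, first_pos + len(segment) - 2):
        if (pos - start_pos) % 3 == 0:       # codon boundary in the start codon's frame
            i = pos - first_pos               # index of this position within segment
            if segment[i:i + 3] in STOP_CODONS:
                return pos
    return None


start_pos = five_prime.find("ATG") + 1        # 1-based position of the initiator ATG
stop_pos = first_in_frame_stop(three_prime, THREE_PRIME_FIRST_POS, start_pos)
n_codons = (stop_pos - start_pos) // 3        # sense codons before the stop
print(start_pos, stop_pos, n_codons)          # prints: 22 2290 756
```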

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 756 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Ser Phe Val Ala Gly Val Ile Arg Arg Leu Asp Glu Thr Val Val
1 5 10 15

Asn Arg Ile Ala Ala Gly Glu Val Ile Gln Arg Pro Ala Asn Ala Ile
20 25 30

Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys Ser Thr Ser Ile Gln
35 40 45

Val Ile Val Lys Glu Gly Gly Leu Lys Leu Ile Gln Ile Gln Asp Asn
50 55 60

Gly Thr Gly Ile Arg Lys Glu Asp Leu Asp Ile Val Cys Glu Arg Phe
65 70 75 80

Thr Thr Ser Lys Leu Gln Ser Phe Glu Asp Leu Ala Ser Ile Ser Thr
85 90 95

Tyr Gly Phe Arg Gly Glu Ala Leu Ala Ser Ile Ser His Val Ala His
100 105 110

Val Thr Ile Thr Thr Lys Thr Ala Asp Gly Lys Cys Ala Tyr Arg Ala
115 120 125

Ser Tyr Ser Asp Gly Lys Leu Lys Ala Pro Pro Lys Pro Cys Ala Gly
130 135 140

Asn Gln Gly Thr Gln Ile Thr Val Glu Asp Leu Phe Tyr Asn Ile Ala
145 150 155 160

Thr Arg Arg Lys Ala Leu Lys Asn Pro Ser Glu Glu Tyr Gly Lys Ile
165 170 175

Leu Glu Val Val Gly Arg Tyr Ser Val His Asn Ala Gly Ile Ser Phe
180 185 190

Ser Val Lys Lys Gln Gly Glu Thr Val Ala Asp Val Arg Thr Leu Pro
195 200 205

Asn Ala Ser Thr Val Asp Asn Ile Arg Ser Ile Phe Gly Asn Ala Val
210 215 220

Ser Arg Glu Leu Ile Glu Ile Gly Cys Glu Asp Lys Thr Leu Ala Phe
225 230 235 240

Lys Met Asn Gly Tyr Ile Ser Asn Ala Asn Tyr Ser Val Lys Lys Cys
245 250 255

Ile Phe Leu Leu Phe Ile Asn His Arg Leu Val Glu Ser Thr Ser Leu
260 265 270

Arg Lys Ala Ile Glu Thr Val Tyr Ala Ala Tyr Leu Pro Lys Asn Thr
275 280 285

His Pro Phe Leu Tyr Leu Ser Leu Glu Ile Ser Pro Gln Asn Val Asp
290 295 300

Val Asn Val His Pro Thr Lys His Glu Val His Phe Leu His Glu Glu
305 310 315 320

Ser Ile Leu Glu Arg Val Gln Gln His Ile Glu Ser Lys Leu Leu Gly
325 330 335

Ser Asn Ser Ser Arg Met Tyr Phe Thr Gln Thr Leu Leu Pro Gly Leu
340 345 350

Ala Gly Pro Ser Gly Glu Met Val Lys Ser Thr Thr Ser Leu Thr Ser
355 360 365

Ser Ser Thr Ser Gly Ser Ser Asp Lys Val Tyr Ala His Gln Met Val
370 375 380

Arg Thr Asp Ser Arg Glu Gln Lys Leu Asp Ala Phe Leu Gln Pro Leu
385 390 395 400

Ser Lys Pro Leu Ser Ser Gln Pro Gln Ala Ile Val Thr Glu Asp Lys
405 410 415

Thr Asp Ile Ser Ser Gly Arg Ala Arg Gln Gln Asp Glu Glu Met Leu
420 425 430

Glu Leu Pro Ala Pro Ala Glu Val Ala Ala Lys Asn Gln Ser Leu Glu
435 440 445

Gly Asp Thr Thr Lys Gly Thr Ser Glu Met Ser Glu Lys Arg Gly Pro
450 455 460

Thr Ser Ser Asn Pro Arg Lys Arg His Arg Glu Asp Ser Asp Val Glu
465 470 475 480

Met Val Glu Asp Asp Ser Arg Lys Glu Met Thr Ala Ala Cys Thr Pro
485 490 495

Arg Arg Arg Ile Ile Asn Leu Thr Ser Val Leu Ser Leu Gln Glu Glu
500 505 510

Ile Asn Glu Gln Gly His Glu Val Leu Arg Glu Met Leu His Asn His
515 520 525

Ser Phe Val Gly Cys Val Asn Pro Gln Trp Ala Leu Ala Gln His Gln
530 535 540

Thr Lys Leu Tyr Leu Leu Asn Thr Thr Lys Leu Ser Glu Glu Leu Phe
545 550 555 560

Tyr Gln Ile Leu Ile Tyr Asp Phe Ala Asn Phe Gly Val Leu Arg Leu
565 570 575

Ser Glu Pro Ala Pro Leu Phe Asp Leu Ala Met Leu Ala Leu Asp Ser
580 585 590

Pro Glu Ser Gly Trp Thr Glu Glu Asp Gly Pro Lys Glu Gly Leu Ala
595 600 605

Glu Tyr Ile Val Glu Phe Leu Lys Lys Lys Ala Glu Met Leu Ala Asp
610 615 620

Tyr Phe Ser Leu Glu Ile Asp Glu Glu Gly Asn Leu Ile Gly Leu Pro
625 630 635 640

Leu Leu Ile Asp Asn Tyr Val Pro Pro Leu Glu Gly Leu Pro Ile Phe
645 650 655

Ile Leu Arg Leu Ala Thr Glu Val Asn Trp Asp Glu Glu Lys Glu Cys
660 665 670

Phe Glu Ser Leu Ser Lys Glu Cys Ala Met Phe Tyr Ser Ile Arg Lys
675 680 685

Gln Tyr Ile Ser Glu Glu Ser Thr Leu Ser Gly Gln Gln Ser Glu Val
690 695 700

Pro Gly Ser Ile Pro Asn Ser Trp Lys Trp Thr Val Glu His Ile Val
705 710 715 720

Tyr Lys Ala Leu Arg Ser His Ile Leu Pro Pro Lys His Phe Thr Glu
725 730 735

Asp Gly Asn Ile Leu Gln Leu Ala Asn Leu Pro Asp Leu Tyr Lys Val
740 745 750

Phe Glu Arg Cys
755

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 397 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

TGGCTGGATG CTAAGCTACA GCTGAAGGAA GAACGTGAGC ACGAGGCACT GAGGTGATTG 60

GCTGAAGGCA CTTCCGTTGA GCATCTAGAC GTTTCCTTGG CTCTTCTGGC GCCAAAATGT 120

CGTTCGTGGC AGGGGTTATT CGGCGGCTGG ACGAGACAGT GGTGAACCGC ATCGCGGCGG 180

GGGAAGTTAT CCAGCGGCCA GCTAATGCTA TCAAAGAGAT GATTGAGAAC TGGTACGGAG 240

GGAGTCGAGC CGGGCTCACT TAAGGGCTAC GACTTAACGG GCCGCGTCAC TCAATGGCGC 300

GGACACGCCT CTTTCCCCGG GCAGAGGCAT GTACAGCGCA TGCCCACAAC GGCGGAGGCC 360

GCCGGGTTCC CTACGTGCCA TAAGCCTTCT CCTTTTC 397

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 393 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

AAACACGTTA ATGAGGCACT ATTGTTTGTA TTTGGAGTTT GTTATCATTG CTTGGCTCAT 60

ATTAAAATAT GTACATTAGA GTAGTTGCAG ACTGATAAAT TATTTTCTGT TTGATTTGCC 120

AGTTTAGATG CAAAATCCAC AAGTATTCAA GTGATTGTTA AAGAGGGAGG CCTGAAGTTG 180

ATTCAGATCC AAGACAATGG CACCGGGATC AGGGTAAGTA AAACCTCAAA GTAGCAGGAT 240

GTTTGTGCGC TTCATGGAAG AGTCAGGACC TTTCTCTGTT CTGGAAACTA GGCTTTTGCA 300

GATGGGATTT TTTCACTGAA AAATTCAACA CCAACAATAA ATATTTATTG AGTACCTATT 360

ATTTGCGGGG CACTGTTCAG GGGATGTGTC AGT 393
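
The genomic fragments in this listing contain exon sequence flanked by intron DNA, and the short cDNA entries listed further below (SEQ ID NO:25 onward) appear to correspond to individual exons. The minimal Python sketch below assumes that SEQ ID NO:26 is the exon carried in SEQ ID NO:7: it searches positions 101-240 of SEQ ID NO:7 (copied from the listing above) for the exact SEQ ID NO:26 sequence and reports the exon coordinates and the two intronic bases on each side, which would be expected to follow the common AG (acceptor) / GT (donor) splice-site consensus. The window boundaries and variable names are illustrative, not part of the application.

```python
# Positions 101-240 of SEQ ID NO:7 (genomic); spaces are listing formatting only
genomic = ("TATTTTCTGT TTGATTTGCC AGTTTAGATG CAAAATCCAC AAGTATTCAA GTGATTGTTA "
           "AAGAGGGAGG CCTGAAGTTG ATTCAGATCC AAGACAATGG CACCGGGATC AGGGTAAGTA "
           "AAACCTCAAA GTAGCAGGAT").replace(" ", "")
GENOMIC_FIRST_POS = 101  # 1-based position of genomic[0] within SEQ ID NO:7

# SEQ ID NO:26 (cDNA, 91 nt)
exon = ("TTTAGATGCA AAATCCACAA GTATTCAAGT GATTGTTAAA GAGGGAGGCC TGAAGTTGAT "
        "TCAGATCCAA GACAATGGCA CCGGGATCAG G").replace(" ", "")

i = genomic.find(exon)
if i < 0:
    raise ValueError("exon not found in the genomic window")

exon_start = GENOMIC_FIRST_POS + i                # first exon base, SEQ ID NO:7 coordinates
exon_end = exon_start + len(exon) - 1             # last exon base
acceptor = genomic[i - 2:i]                       # last two intron bases before the exon
donor = genomic[i + len(exon):i + len(exon) + 2]  # first two intron bases after the exon
print(exon_start, exon_end, acceptor, donor)      # prints: 123 213 AG GT
```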

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 352 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

TTTCCTGGAT TAATCAAGAA ATGGAATTCA AAGAGATTTG GAAAATGAGT AACATGATTA 60

TTTACTCATC TTTTTGGTAT CTAACAGAAA GAAGATCTGG ATATTGTATG TGAAAGGTTC 120

ACTACTAGTA AACTGCAGTC CTTTGAGGAT TTAGCCAGTA TTTCTACCTA TGGCTTTCGA 180

GGTGAGGTAA GCTAAAGATT CAAGAAATGT GTAAAATATC CTCCTGTGAT GACATTGTCT 240

GTCATTTGTT AGTATGTATT TCTCAACATA GATAAATAAG GTTTGGTACC TTTTACTTGT 300

TAAATGTATG CAAATCTGAG CAAACTTAAT GAACTTTAAC TTTCAAAGAC TG 352

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 287 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

TGGAAGCAGC AGCAGATAAC CTTTCCCTTT GGTGAGGTGA CAGTGGGTGA CCCAGCAGTG 60

AGTTTTTCTT TCAGTCTATT TTCTTTTCTT CCTTAGGCTT TGGCCAGCAT AAGCCATGTG 120

GCTCATGTTA CTATTACAAC GAAAACAGCT GATGGAAAGT GTGCATACAG GTATAGTGCT 180

GACTTCTTTT ACTCATATAT ATTCATTCTG AAATGTATTT TGGGCCTAGG TCTCAGAGTA 240

ATCCTGTCTC AACACCAGTG TTATCTTTGG CAGAGATCTT GAGTACG 287

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 336 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

TTGATATGAT TTTCTCTTTT CCCCTTGGGA TTAGTATCTA TCTCTCTACT GGATATTAAT 60

TTGTTATATT TTCTCATTAG AGCAAGTTAC TCAGATGGAA AACTGAAAGC CCCTCCTAAA 120

CCATGTGCTG GCAATCAAGG GACCCAGATC ACGGTAAGAA TGGTACATGG GAGAGTAAAT 180

TGTTGAAGCT TTGTTTGTAT AAATATTGGA ATAAAAAATA AAATTGCTTC TAAGTTTTCA 240

GGGTAATAAT AAAATGAATT TGCACTAGTT AATGGAGGTC CCAAGATATC CTCTAAGCAA 300

GATAAATGAC TATTGGCTTT TTGGCATGGC AGCCTG 336

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 275 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

GCTTTTGCCA GGACCATCTT GGGTTTTATT TTCAAGTACT TCTATGAATT TACAAGAAAA 60

ATCAATCTTC TGTTCAGGTG GAGGACCTTT TTTACAACAT AGCCACGAGG AGAAAAGCTT 120

TAAAAAATCC AAGTGAAGAA TATGGGAAAA TTTTGGAAGT TGTTGGCAGG TACAGTCCAA 180

AATCTGGGAG TGGGTCTCTG AGATTTGTCA TCAAAGTAAT GTGTTCTAGT GCTCATACAT 240

TGAACAGTTG CTGAGCTAGA TGGTGAAAAG TAAAA 275

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 389 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

CAGCAACCTA TAAAAGTAGA GAGGAGTCTG TGTTTTGACG CAGCACCTTT AGCATTTTTA 60

TTTGGATGAA GTTTCTGCTG GTTTATTTTT CTGTGGGTAA AATATTAATA GGCTGTATGG 120

AGATATTTTT CTTTATATGT ACCTTTGTTT AGATTACTCA ACTCCACTAA TTTATTTAAC 180

TAAAAGGGGG CTCTGACATC TAGTGTGTGT TTTTGGCAAC TCTTTTCTTA CTCTTTTGTT 240

TTTCTTTTCC AGGTATTCAG TACACAATGC AGGCATTAGT TTCTCAGTTA AAAAAGTAAG 300

TTCTTGGTTT ATGGGGGATG GTTTTGTTTT ATGAAAAGAA AAAAGGGGAT TTTTAATAGT 360

TTGCTGGTGG AGATAAGGTT ATGATGTTT 389

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 381 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

ATGTTTCAGT CTCAGCCATG AGACAATAAA TCCTTGTGTC TTCTGCTGTT TGTTTATCAG 60

CAAGGAGAGA CAGTAGCTGA TGTTAGGACA CTACCCAATG CCTCAACCGT GGACAATATT 120

CGCTCCATCT TTGGAAATGC TGTTAGTCGG TATGTCGATA ACCTATATAA AAAAATCTTT 180

TACATTTATT ATCTTGGTTT ATCATTCCAT CACATTATTT GGGAACCTTT CAAGATATTA 240

TGTGTGTTAA GAGTTTGCTT TAGTCAAATA CACAGGCTTG TTTTATGCTT CAGATTTGTT 300

AATGGAGTTC TTATTTCACG TAATCAACAC TTTCTAGGTG TATGTAATCT CCTAGATTCT 360

GTGGCGTGAA TCATGTGTTC T 381

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 526 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

ACTGAGTAGG GTAGGTGGGT GAGTGGGTGG GTGGGTGGGT GGGTGGATGG ATGGATGGGA 60

GGATGGGTGG GTGAATGGGT GAACAGACAA ATGGATGGAT GAATGGACAG GCACAGGAGG 120

ACCTCAAATG GACCAAGTCT TCGGGGCCCT CATTTCACAA AGTTAGTTTA TGGGAAGGAA 180

CCTTGTGTTT TTAAATTCTG ATTCTTTTGT AATGTTTGAG TTTTGAGTAT TTTCAAAAGC 240

TTCAGAATCT CTTTTCTAAT AGAGAACTGA TAGAAATTGG ATGTGAGGAT AAAACCCTAG 300

CCTTCAAAAT GAATGGTTAC ATATCCAATG CAAACTACTC AGTGAAGAAG TGCATCTTCT 360

TACTCTTCAT CAACCGTAAG TTAAAAAGAA CCACATGGGA AATCCACTCA CAGGAAACAC 420

CCACAGGGAA TTTTATGGGA CCATGGAAAA ATTTCTGAGT CCATAGGTTT GATTAAACAT 480

GGAGAAACCT CATGGCAAAG TTTGGTTTTA TTGGGAAGCA TGTATA 526

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 434 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

ATAGTGGGCT GGAAAGTGGC CACAGGTAAA GGTGCACCTT TCTTCCTGGG GATGTGATGT 60

GCATATCACT ACAGAAATGT CTTTCCTGAG GTGATGTCAT GACTTTGTGT GAATGTACAC 120

CTGTGACCTC ACCCCTCAGG ACAGTTTTGA ACTGGTTGCT TTCTTTTTAT TGTTTAGATC 180

GTCTGGTAGA ATCAACTTCC TTGAGAAAAG CCATAGAAAC AGTGTATGCA GCCTATTTGC 240

CCAAAAACAC ACACCCATTC CTGTACCTCA GGTAATGTAG CACCAAACTC CTCAACCAAG 300

ACTCACAAGG AACAGATGTT CTATCAGGCT CTCCTCTTTG AAAGAGATGA GCATGCTAAT 360

AGTACAATCA GAGTGAATCC CATACACCAC TGGCAAAAGG ATGTTCTGTC CCTTCTTACA 420

GGTACAAGGC ACAG 434

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 458 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

CTTACGCAAA GCTACACAGC TCTTAAGTAG CAGTGCCAAT ATTTGAACAC ACTCAGACTC 60

GAGCCTGAGG TTTTGACCAC TGTGTCATCT GGCCTCAAAT CTTCTGGCCA CCACATACAC 120

CATATGTGGG CTTTTTCTCC CCCTCCCACT ATCTAAGGTA ATTGTTCTCT CTTATTTTCC 180

TGACAGTTTA GAAATCAGTC CCCAGAATGT GGATGTTAAT GTGCACCCCA CAAAGCATGA 240

AGTTCACTTC CTGCACGAGG AGAGCATCCT GGAGCGGGTG CAGCAGCACA TCGAGAGCAA 300

GCTCCTGGGC TCCAATTCCT CCAGGATGTA CTTCACCCAG GTCAGGGCGC TTCTCATCCA 360

GCTACTTCTC TGGGGCCTTT GAAATGTGCC CGGCCAGACG TGAGAGCCCA GATTTTTGCT 420

GTTATTTAGG AACTTTTTTT GAAGTATTAC CTGGATAG 458

(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 618 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

GATAATTATA CCTCATACTA GCTTCTTTCT TAGTACTGCT CCATTTGGGG ACCTGTATAT 60

CTATACTTCT TATTCTGAGT CTCTCCACTA TATATATATA TATATATATA TTTTTTTTTT 120

TTTTTTTTTT TAATACAGAC TTTGCTACCA GGACTTGCTG GCCCCTCTGG GGAGATGGTT 180

AAATCCACAA CAAGTCTGAC CTCGTCTTCT ACTTCTGGAA GTAGTGATAA GGTCTATGCC 240

CACCAGATGG TTCGTACAGA TTCCCGGGAA CAGAAGCTTG ATGCATTTCT GCAGCCTCTG 300

AGCAAACCCC TGTCCAGTCA GCCCCAGGCC ATTGTCACAG AGGATAAGAC AGATATTTCT 360

AGTGGCAGGG CTAGGCAGCA AGATGAGGAG ATGCTTGAAC TCCCAGCCCC TGCTGAAGTG 420

GCTGCCAAAA ATCAGAGCTT GGAGGGGGAT ACAACAAAGG GGACTTCAGA AATGTCAGAG 480

AAGAGAGGAC CTACTTCCAG CAACCCCAGG TATGGCCTTT TGGGAAAAGT ACAGCCTACC 540

TCCTTTATTC TGTAATAAAA CTGCCTTCTA ACTTTGGCTT TTCATGAATC ACTTGCATCT 600

TCTCTCTGCC GACTTCCC 618

(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 478 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

CTGTGCTCCA GCACAGGTCA TCCAGCTCTG TAGACCAGCG CAGAGAAGTT GCTTGCTCCC 60

AAATGCAACC CACAAAATTT GGCTAAGTTT AAAAACAAGA ATAATAATGA TCTGCACTTC 120

CTTTTCTTCA TTGCAGAAAG AGACATCGGG AAGATTCTGA TGTGGAAATG GTGGAAGATG 180

ATTCCCGAAA GGAAATGACT GCAGCTTGTA CCCCCCGGAG AAGGATCATT AACCTCACTA 240

GTGTTTTGAG TCTCCAGGAA GAAATTAATG AGCAGGGACA TGAGGGTACG TAAACGCTGT 300

GGCCTGCCTG GGATGCATAG GGCCTCAACT GCCAAGGTTT TGGAAATGGA GAAAGCAGTC 360

ATGTTGTCAG AGTGGCACTA CAGTTTTGAT GGGCAAGCTC CTCTTCCTTT ACTAACCCAC 420

AATAGCATCA GCTTAAAGAC AATTTTTGAT TGGGAGAAAA GGGAGAAAAT AATCTCTG 478

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 377 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

CAGTTTTCAC CAGGAGGCTC AAATCAGGCC TTTGCTTACT TGGTGTCTCT AGTTCTGGTG 60

CCTGGTGCTT TGGTCAATGA AGTGGGGTTG GTAGGATTCT ATTACTTACC TGTTTTTTGG 120

TTTTATTTTT TGTTTTGCAG TTCTCCGGGA GATGTTGCAT AACCACTCCT TCGTGGGCTG 180

TGTGAATCCT CAGTGGGCCT TGGCACAGCA TCAAACCAAG TTATACCTTC TCAACACCAC 240

CAAGCTTAGG TAAATCAGCT GAGTGTGTGA ACAAGCAGAG CTACTACAAC AATGGTCCAG 300

GGAGCACAGG CACAAAAGCT AAGGAGAGCA GCATGAAGGT AGTTGGGAAG GGCACAGGCT 360

TTGGAGTCAG CACATGT 377

(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 325 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

CCCCTGGTTG AAGCGTTGGA ATCCCACTCT TTGGAAGATT GTGTTAGACT GTTAACCAGA 60

TTCCACAGCC AGGCAGAACT ATGTCTGTCT CATCCATGTG TCAGGGATTA CGTCTCCCAT 120

TTGTCCCAAC TGGTTGTATC TCAAGCATGA ATTCAGCTTT TCCTTAAAGT CACTTCATTT 180

TTATTTTCAG TGAAGAACTG TTCTACCAGA TACTCATTTA TGATTTTGCC AATTTTGGTG 240

TTCTCAGGTT ATCGGTAAGT TTAGATCCTT TTCACTTCTG ACATTTCAAC TGACCGCCCC 300

GCAAACAGTA GCTCTCCACT AAATA 325

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 341 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

CATTTATGGT TTCTCACCTG CCATTCTGAT AGTGGATTCT TGGGAATTCA GGCTTCATTT 60

GGATGCTCCG TTAAAGCTTG CTCCTTCATG TTCTTGCTTC TTCCTAGGAG CCAGCACCGC 120

TCTTTGACCT TGCCATGCTT GCCTTAGATA GTCCAGAGAG TGGCTGGACA GAGGAAGATG 180

GTCCCAAAGA AGGACTTGCT GAATACATTG TTGAGTTTCT GAAGAAGAAG GCTGAGATGC 240

TTGCAGACTA TTTCTCTTTG GAAATTGATG AGGTGTGACA GCCATTCTTA TACTTCTGTT 300

GTATTCTCCA AATAAAATTT CCAGCCGGGT GCATTGGCTC A 341

(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 260 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

CAGATAGGAG GCACAAGGCC TGGGAAAGGC ACTGGAGAAA TGGGATTTGT TTAAACTATG 60

ACAGCATTAT TTCTTGTTCC CTTGTCCTTT TTCCTGCAAG CAGGAAGGGA ACCTGATTGG 120

ATTACCCCTT CTGATTGACA ACTATGTGCC CCCTTTGGAG GGACTGCCTA TCTTCATTCT 180

TCGACTAGCC ACTGAGGTCA GTGATCAAGC AGATACTAAG CATTTCGGTA CATGCATGTG 240

TGCTGGAGGG AAAGGGCAAA 260

(2) INFORMATION FOR SEQ ID NO:23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 340 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:

CTATATCTTC CCAGCAATAT TCACAGTCCG TTTACAGTTT TAACGCCTAA AGTATCACAT 60

TTCGTTTTTT AGCTTTAAGT AGTCTGTGAT CTCCGTTTAG AATGAGAATG TTTAAATTCG 120

TACCTATTTT GAGGTATTGA ATTTCTTTGG ACCAGGTGAA TTGGGACGAA GAAAAGGAAT 180

GTTTTGAAAG CCTCAGTAAA GAATGCGCTA TGTTCTATTC CATCCGGAAG CAGTACATAT 240

CTGAGGAGTC GACCCTCTCA GGCCAGCAGG TACAGTGGTG ATGCACACTG GCACCCCAGG 300

ACTAGGACAG GACCTCATAC ATCTTAGGAG ATGAAACTTG 340

(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 563 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:

AATCCTCTTG TGTTCAGGCC TGTGGATCCC TGAGAGGCTA GCCCACAAGA TCCACTTCAA 60

AAGCCCTAGA TAACACCAAG TCTTTCCAGA CCCAGTGCAC ATCCCATCAG CCAGGACACC 120

AGTGTATGTT GGGATGCAAA CAGGGAGGCT TATGACATCT AATGTGTTTT CCAGAGTGAA 180

GTGCCTGGCT CCATTCCAAA CTCCTGGAAG TGGACTGTGG AACACATTGT CTATAAAGCC 240

TTGCGCTCAC ACATTCTGCC TCCTAAACAT TTCACAGAAG ATGGAAATAT CCTGCAGCTT 300

GCTAACCTGC CTGATCTATA CAAAGTCTTT GAGAGGTGTT AAATATGGTT ATTTATGCAC 360

TGTGGGATGT GTTCTTCTTT CTCTGTATTC CGATACAAAG TGTTGTATCA AAGTGTGATA 420

TACAAAGTGT ACCAACATAA GTGTTGGTAG CACTTAAGAC TTATACTTGC CTTCTGATAG 480

TATTCCTTTA TACACAGTGG ATTGATTATA AATAAATAGA TGTGTCTTAA CATAATTTCT 540

TATTTAATTT TATTATGTAT ATA 563

(2) INFORMATION FOR SEQ ID NO:25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 137 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:

CTTGGCTCTT CTGGCGCCAA AATGTCGTTC GTGGCAGGGG TTATTCGGCG GCTGGACGAG 60

ACAGTGGTGA ACCGCATCGC GGCGGGGGAA GTTATCCAGC GGCCAGCTAA TGCTATCAAA 120

GAGATGATTG AGAACTG 137
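
SEQ ID NO:25 is identical to positions 1-137 of SEQ ID NO:4. The minimal Python sketch below, assuming the standard genetic code and that translation begins at the first ATG, translates SEQ ID NO:25 and compares the result with the first 38 residues of SEQ ID NO:5, written here in one-letter code (M = Met, I = Ile, Q = Gln, and so on). The codon-table construction and the helper translate are illustrative and are not taken from the application; the trailing two bases (TG), which do not form a complete codon, are ignored.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...),
# one-letter amino acids in the same order, "*" marking stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# SEQ ID NO:25 as listed (137 nt); spaces are listing formatting only
seq25 = ("CTTGGCTCTT CTGGCGCCAA AATGTCGTTC GTGGCAGGGG TTATTCGGCG GCTGGACGAG "
         "ACAGTGGTGA ACCGCATCGC GGCGGGGGAA GTTATCCAGC GGCCAGCTAA TGCTATCAAA "
         "GAGATGATTG AGAACTG").replace(" ", "")

# Residues 1-38 of SEQ ID NO:5 in one-letter code
seq5_start = "MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIEN"


def translate(dna: str) -> str:
    """Translate complete codons only; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


start = seq25.find("ATG")          # 0-based index of the first ATG
peptide = translate(seq25[start:])
print(start + 1, peptide)          # 1-based start position and translated peptide
print(peptide == seq5_start)       # prints: True
```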

(2) INFORMATION FOR SEQ ID NO:26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 91 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:

TTTAGATGCA AAATCCACAA GTATTCAAGT GATTGTTAAA GAGGGAGGCC TGAAGTTGAT 60

TCAGATCCAA GACAATGGCA CCGGGATCAG G 91

(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 99 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

AAAGAAGATC TGGATATTGT ATGTGAAAGG TTCACTACTA GTAAACTGCA GTCCTTTGAG 60

GATTTAGCCA GTATTTCTAC CTATGGCTTT CGAGGTGAG 99

[0929]

(2) INFORMATION FOR SEQ ID NO:28: (i) SEQUENCE CHARACTERISTICS:

[0930]

(A) LENGTH: 74 base pairs

[0931]

(B) TYPE: nucleic acid

[0932]

(C) STRANDEDNESS: single

[0933]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[0934]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: GCTTTGGCCA GCATAAGCCA TGTGGCTCAT GTTACTATTA CAACGAAAAC AGCTGATGGA 60 AAGTGTGCAT ACAG 74

[0935]

(2) INFORMATION FOR SEQ ID NO:29: (i) SEQUENCE CHARACTERISTICS:

[0936]

(A) LENGTH: 73 base pairs

[0937]

(B) TYPE: nucleic acid

[0938]

(C) STRANDEDNESS: single

[0939]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[0940]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: AGCAAGTTAC TCAGATGGAA AACTGAAAGC CCCTCCTAAA CCATGTGCTG GCAATCAAGG 60 GACCCAGATC ACG 73

[0941]

(2) INFORMATION FOR SEQ ID NO:30: (i) SEQUENCE CHARACTERISTICS:

[0942]

(A) LENGTH: 92 base pairs

[0943]

(B) TYPE: nucleic acid

[0944]

(C) STRANDEDNESS: single

[0945]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[0946]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: GTGGAGGACC TTTTTTACAA CATAGCCACG AGGAGAAAAG CTTTAAAAAA TCCAAGTGAA 60 GAATATGGGA AAATTTTGGA AGTTGTTGGC AG 92

(2) INFORMATION FOR SEQ ID NO:31: (i) SEQUENCE CHARACTERISTICS:

[0947]

(A) LENGTH: 43 base pairs

[0948]

(B) TYPE: nucleic acid

[0949]

(C) STRANDEDNESS: single

[0950]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[0951]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: GTATTCAGTA CACAATGCAG GCATTAGTTT CTCAGTTAAA AAA 43

[0952]

(2) INFORMATION FOR SEQ ID NO:32: (i) SEQUENCE CHARACTERISTICS:

[0953]

(A) LENGTH: 89 base pairs

[0954]

(B) TYPE: nucleic acid

[0955]

(C) STRANDEDNESS: single

[0956]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[0957]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: CAAGGAGAGA CAGTAGCTGA TGTTAGGACA CTACCCAATG CCTCAACCGT GGACAATATT 60 CGCTCCATCT TTGGAAATGC TGTTAGTCG 89

[0958]

(2) INFORMATION FOR SEQ ID NO:33: (i) SEQUENCE CHARACTERISTICS:

[0959]

(A) LENGTH: 113 base pairs

[0960]

(B) TYPE: nucleic acid

[0961]

(C) STRANDEDNESS: single

[0962]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[0963]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: AGAACTGATA GAAATTGGAT GTGAGGATAA AACCCTAGCC TTCAAAATGA ATGGTTACAT 60 ATCCAATGCA AACTACTCAG TGAAGAAGTG CATCTTCTTA CTCTTCATCA ACC 113

[0964]

(2) INFORMATION FOR SEQ ID NO:34: (i) SEQUENCE CHARACTERISTICS:

[0965]

(A) LENGTH: 94 base pairs

[0966]

(B) TYPE: nucleic acid

[0967]

(C) STRANDEDNESS: single

[0968]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[0969]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: ATCGTCTGGT AGAATCAACT TCCTTGAGAA AAGCCATAGA AACAGTGTAT GCAGCCTATT 60 TGCCCAAAAA CACACACCCA TTCCTGTACC TCAG 94

(2) INFORMATION FOR SEQ ID NO:35: (i) SEQUENCE CHARACTERISTICS:

[0970]

(A) LENGTH: 154 base pairs

[0971]

(B) TYPE: nucleic acid

[0972]

(C) STRANDEDNESS: single

[0973]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[0974]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:

[0975]

TTTAGAAATC AGTCCCCAGA ATGTGGATGT TAATGTGCAC CCCACAAAGC ATGAAGTTCA 60

[0976]

CTTCCTGCAC GAGGAGAGCA TCCTGGAGCG GGTGCAGCAG CACATCGAGA GCAAGCTCCT 120

[0977]

GGGCTCCAAT TCCTCCAGGA TGTACTTCAC CCAG 154

[0978]

(2) INFORMATION FOR SEQ ID NO:36: (i) SEQUENCE CHARACTERISTICS:

[0979]

(A) LENGTH: 371 base pairs

[0980]

(B) TYPE: nucleic acid

[0981]

(C) STRANDEDNESS: single

[0982]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[0983]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:

[0984]

ACTTTGCTAC CAGGACTTGC TGGCCCCTCT GGGGAGATGG TTAAATCCAC AACAAGTCTG 60

[0985]

ACCTCGTCTT CTACTTCTGG AAGTAGTGAT AAGGTCTATG CCCACCAGAT GGTTCGTACA 120

[0986]

GATTCCCGGG AACAGAAGCT TGATGCATTT CTGCAGCCTC TGAGCAAACC CCTGTCCAGT 180

[0987]

CAGCCCCAGG CCATTGTCAC AGAGGATAAG ACAGATATTT CTAGTGGCAG GGCTAGGCAG 240

[0988]

CAAGATGAGG AGATGCTTGA ACTCCCAGCC CCTGCTGAAG TGGCTGCCAA AAATCAGAGC 300

[0989]

TTGGAGGGGG ATACAACAAA GGGGACTTCA GAAATGTCAG AGAAGAGAGG ACCTACTTCC 360

[0990]

AGCAACCCCA G 371

[0991]

(2) INFORMATION FOR SEQ ID NO:37: (i) SEQUENCE CHARACTERISTICS:

[0992]

(A) LENGTH: 149 base pairs

[0993]

(B) TYPE: nucleic acid

[0994]

(C) STRANDEDNESS: single

[0995]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[0996]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:

[0997]

AAAGAGACAT CGGGAAGATT CTGATGTGGA AATGGTGGAA GATGATTCCC GAAAGGAAAT 60

[0998]

GACTGCAGCT TGTACCCCCC GGAGAAGGAT CATTAACCTC ACTAGTGTTT TGAGTCTCCA 120

[0999]

GGAAGAAATT AATGAGCAGG GACATGAGG 149

(2) INFORMATION FOR SEQ ID NO:38: (i) SEQUENCE CHARACTERISTICS:

[1000]

(A) LENGTH: 109 base pairs

[1001]

(B) TYPE: nucleic acid

[1002]

(C) STRANDEDNESS: single

[1003]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[1004]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: TTCTCCGGGA GATGTTGCAT AACCACTCCT TCGTGGGCTG TGTGAATCCT CAGTGGGCCT 60 TGGCACAGCA TCAAACCAAG TTATACCTTC TCAACACCAC CAAGCTTAG 109

[1005]

(2) INFORMATION FOR SEQ ID NO:39: (i) SEQUENCE CHARACTERISTICS:

[1006]

(A) LENGTH: 64 base pairs

[1007]

(B) TYPE: nucleic acid

[1008]

(C) STRANDEDNESS: single

[1009]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[1010]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: TGAAGAACTG TTCTACCAGA TACTCATTTA TGATTTTGCC AATTTTGGTG TTCTCAGGTT 60 ATCG 64

[1011]

(2) INFORMATION FOR SEQ ID NO:40: (i) SEQUENCE CHARACTERISTICS:

[1012]

(A) LENGTH: 165 base pairs

[1013]

(B) TYPE: nucleic acid

[1014]

(C) STRANDEDNESS: single

[1015]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[1016]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:

[1017]

GAGCCAGCAC CGCTCTTTGA CCTTGCCATG CTTGCCTTAG ATAGTCCAGA GAGTGGCTGG 60

[1018]

ACAGAGGAAG ATGGTCCCAA AGAAGGACTT GCTGAATACA TTGTTGAGTT TCTGAAGAAG 120

[1019]

AAGGCTGAGA TGCTTGCAGA CTATTTCTCT TTGGAAATTG ATGAG 165
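Each sequence line in this listing ends with a running base count, and each entry declares a LENGTH, so an entry can be checked mechanically. The following is a minimal sketch assuming the block layout used here (space-separated bases followed by the running count); `check_entry` is an illustrative name, shown on SEQ ID NO:40.

```python
def check_entry(declared_length: int, lines: list[str]) -> str:
    """Join the base blocks, verifying each line's running count and the total."""
    sequence = ""
    for line in lines:
        *blocks, running_count = line.split()
        sequence += "".join(blocks)
        if len(sequence) != int(running_count):
            raise ValueError(
                f"line ends at {running_count} but {len(sequence)} bases were read")
    if len(sequence) != declared_length:
        raise ValueError(f"LENGTH {declared_length} != {len(sequence)} bases")
    if set(sequence) - set("ACGT"):
        raise ValueError("unexpected symbol in sequence")
    return sequence


SEQ40 = check_entry(165, [
    "GAGCCAGCAC CGCTCTTTGA CCTTGCCATG CTTGCCTTAG ATAGTCCAGA GAGTGGCTGG 60",
    "ACAGAGGAAG ATGGTCCCAA AGAAGGACTT GCTGAATACA TTGTTGAGTT TCTGAAGAAG 120",
    "AAGGCTGAGA TGCTTGCAGA CTATTTCTCT TTGGAAATTG ATGAG 165",
])
```

Applied to SEQ ID NO:80 later in this listing, `check_entry(18, ["ATGTATGAGG TCC GTCC 18"])` raises `ValueError`: only 17 bases are printed against a declared length of 18, and the missing base cannot be recovered from the listing itself.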

[1020]

(2) INFORMATION FOR SEQ ID NO:41: (i) SEQUENCE CHARACTERISTICS:

[1021]

(A) LENGTH: 93 base pairs

[1022]

(B) TYPE: nucleic acid

[1023]

(C) STRANDEDNESS: single

[1024]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[1025]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: GAAGGGAACC TGATTGGATT ACCCCTTCTG ATTGACAACT ATGTGCCCCC TTTGGAGGGA 60 CTGCCTATCT TCATTCTTCG ACTAGCCACT GAG 93

(2) INFORMATION FOR SEQ ID NO:42: (i) SEQUENCE CHARACTERISTICS:

[1026]

(A) LENGTH: 114 base pairs

[1027]

(B) TYPE: nucleic acid

[1028]

(C) STRANDEDNESS: single

[1029]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[1030]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: GTGAATTGGG ACGAAGAAAA GGAATGTTTT GAAAGCCTCA GTAAAGAATG CGCTATGTTC 60 TATTCCATCC GGAAGCAGTA CATATCTGAG GAGTCGACCC TCTCAGGCCA GCAG 114

[1031]

(2) INFORMATION FOR SEQ ID NO:43: (i) SEQUENCE CHARACTERISTICS:

[1032]

(A) LENGTH: 360 base pairs

[1033]

(B) TYPE: nucleic acid

[1034]

(C) STRANDEDNESS: single

[1035]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA

[1036]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:

[1037]

AGTGAAGTGC CTGGCTCCAT TCCAAACTCC TGGAAGTGGA CTGTGGAACA CATTGTCTAT 60

[1038]

AAAGCCTTGC GCTCACACAT TCTGCCTCCT AAACATTTCA CAGAAGATGG AAATATCCTG 120

[1039]

CAGCTTGCTA ACCTGCCTGA TCTATACAAA GTCTTTGAGA GGTGTTAAAT ATGGTTATTT 180

[1040]

ATGCACTGTG GGATGTGTTC TTCTTTCTCT GTATTCCGAT ACAAAGTGTT GTATCAAAGT 240

[1041]

GTGATATACA AAGTGTACCA ACATAAGTGT TGGTAGCACT TAAGACTTAT ACTTGCCTTC 300

[1042]

TGATAGTATT CCTTTATACA CAGTGGATTG ATTATAAATA AATAGATGTG TCTTAACATA 360
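SEQ ID NO:43 (listed as cDNA) appears to be a contiguous stretch of SEQ ID NO:24 (listed as genomic DNA). A short check, not part of the original listing, locates it; `join_blocks` is an illustrative helper that drops the running counts.

```python
def join_blocks(text: str) -> str:
    """Keep the bases, dropping whitespace and the numeric running counts."""
    return "".join(token for token in text.split() if not token.isdigit())


SEQ24 = join_blocks("""
AATCCTCTTG TGTTCAGGCC TGTGGATCCC TGAGAGGCTA GCCCACAAGA TCCACTTCAA 60
AAGCCCTAGA TAACACCAAG TCTTTCCAGA CCCAGTGCAC ATCCCATCAG CCAGGACACC 120
AGTGTATGTT GGGATGCAAA CAGGGAGGCT TATGACATCT AATGTGTTTT CCAGAGTGAA 180
GTGCCTGGCT CCATTCCAAA CTCCTGGAAG TGGACTGTGG AACACATTGT CTATAAAGCC 240
TTGCGCTCAC ACATTCTGCC TCCTAAACAT TTCACAGAAG ATGGAAATAT CCTGCAGCTT 300
GCTAACCTGC CTGATCTATA CAAAGTCTTT GAGAGGTGTT AAATATGGTT ATTTATGCAC 360
TGTGGGATGT GTTCTTCTTT CTCTGTATTC CGATACAAAG TGTTGTATCA AAGTGTGATA 420
TACAAAGTGT ACCAACATAA GTGTTGGTAG CACTTAAGAC TTATACTTGC CTTCTGATAG 480
TATTCCTTTA TACACAGTGG ATTGATTATA AATAAATAGA TGTGTCTTAA CATAATTTCT 540
TATTTAATTT TATTATGTAT ATA 563
""")

SEQ43 = join_blocks("""
AGTGAAGTGC CTGGCTCCAT TCCAAACTCC TGGAAGTGGA CTGTGGAACA CATTGTCTAT 60
AAAGCCTTGC GCTCACACAT TCTGCCTCCT AAACATTTCA CAGAAGATGG AAATATCCTG 120
CAGCTTGCTA ACCTGCCTGA TCTATACAAA GTCTTTGAGA GGTGTTAAAT ATGGTTATTT 180
ATGCACTGTG GGATGTGTTC TTCTTTCTCT GTATTCCGAT ACAAAGTGTT GTATCAAAGT 240
GTGATATACA AAGTGTACCA ACATAAGTGT TGGTAGCACT TAAGACTTAT ACTTGCCTTC 300
TGATAGTATT CCTTTATACA CAGTGGATTG ATTATAAATA AATAGATGTG TCTTAACATA 360
""")

start = SEQ24.find(SEQ43)  # -1 would mean SEQ43 is not contained in SEQ24
print(start + 1, start + len(SEQ43))  # 1-based first and last positions
```

Run as written, it prints `175 534`, so SEQ ID NO:43 corresponds to positions 175 to 534 of SEQ ID NO:24.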

[1043]

(2) INFORMATION FOR SEQ ID NO:44: (i) SEQUENCE CHARACTERISTICS:

[1044]

(A) LENGTH: 19 base pairs

[1045]

(B) TYPE: nucleic acid

[1046]

(C) STRANDEDNESS: single

[1047]

(D) TOPOLOGY: linear (ix) FEATURE:

[1048]

(A) NAME/KEY: misc_feature

[1049]

(B) LOCATION: 1

[1050]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: AGGCACTGAG GTGATTGGC 19

(2) INFORMATION FOR SEQ ID NO:45: (i) SEQUENCE CHARACTERISTICS:

[1051]

(A) LENGTH: 19 base pairs

[1052]

(B) TYPE: nucleic acid

[1053]

(C) STRANDEDNESS: single

[1054]

(D) TOPOLOGY: linear (ix) FEATURE:

[1055]

(A) NAME/KEY: misc_feature

[1056]

(B) LOCATION: 1

[1057]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: TCGTAGCCCT TAAGTGAGC 19

[1058]

(2) INFORMATION FOR SEQ ID NO:46: (i) SEQUENCE CHARACTERISTICS:

[1059]

(A) LENGTH: 22 base pairs

[1060]

(B) TYPE: nucleic acid

[1061]

(C) STRANDEDNESS: single

[1062]

(D) TOPOLOGY: linear (ix) FEATURE:

[1063]

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

[1064]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: AATATGTACA TTAGAGTAGT TG 22

[1065]

(2) INFORMATION FOR SEQ ID NO:47: (i) SEQUENCE CHARACTERISTICS:

[1066]

(A) LENGTH: 19 base pairs

[1067]

(B) TYPE: nucleic acid

[1068]

(C) STRANDEDNESS: single

[1069]

(D) TOPOLOGY: linear (ix) FEATURE:

[1070]

(A) NAME/KEY: misc_feature

[1071]

(B) LOCATION: 1

[1072]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: CAGAGAAAGG TCCTGACTC 19

(2) INFORMATION FOR SEQ ID NO:48: (i) SEQUENCE CHARACTERISTICS:

[1073]

(A) LENGTH: 22 base pairs

[1074]

(B) TYPE: nucleic acid

[1075]

(C) STRANDEDNESS: single

[1076]

(D) TOPOLOGY: linear (ix) FEATURE:

[1077]

(A) NAME/KEY: misc_feature

[1078]

(B) LOCATION: 1

[1079]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: AGAGATTTGG AAAATGAGTA AC 22

[1080]

(2) INFORMATION FOR SEQ ID NO:49: (i) SEQUENCE CHARACTERISTICS:

[1081]

(A) LENGTH: 19 base pairs

[1082]

(B) TYPE: nucleic acid

[1083]

(C) STRANDEDNESS: single

[1084]

(D) TOPOLOGY: linear (ix) FEATURE:

[1085]

(A) NAME/KEY: misc_feature

[1086]

(B) LOCATION: 1

[1087]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: ACAATGTCAT CACAGGAGG 19

[1088]

(2) INFORMATION FOR SEQ ID NO:50: (i) SEQUENCE CHARACTERISTICS:

[1089]

(A) LENGTH: 20 base pairs

[1090]

(B) TYPE: nucleic acid

[1091]

(C) STRANDEDNESS: single

[1092]

(D) TOPOLOGY: linear (ix) FEATURE:

[1093]

(A) NAME/KEY: misc_feature

[1094]

(B) LOCATION: 1

[1095]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:50: AACCTTTCCC TTTGGTGAGG 20

(2) INFORMATION FOR SEQ ID NO:51: (i) SEQUENCE CHARACTERISTICS:

[1096]

(A) LENGTH: 20 base pairs

[1097]

(B) TYPE: nucleic acid

[1098]

(C) STRANDEDNESS: single

[1099]

(D) TOPOLOGY: linear (ix) FEATURE:

[1100]

(A) NAME/KEY: misc_feature

[1101]

(B) LOCATION: 1

[1102]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:51: GATTACTCTG AGACCTAGGC 20

[1103]

(2) INFORMATION FOR SEQ ID NO:52: (i) SEQUENCE CHARACTERISTICS:

[1104]

(A) LENGTH: 22 base pairs

[1105]

(B) TYPE: nucleic acid

[1106]

(C) STRANDEDNESS: single

[1107]

(D) TOPOLOGY: linear (ix) FEATURE:

[1108]

(A) NAME/KEY: misc_feature

[1109]

(B) LOCATION: 1

[1110]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: GATTTTCTCT TTTCCCCTTG GG 22

[1111]

(2) INFORMATION FOR SEQ ID NO:53: (i) SEQUENCE CHARACTERISTICS:

[1112]

(A) LENGTH: 23 base pairs

[1113]

(B) TYPE: nucleic acid

[1114]

(C) STRANDEDNESS: single

[1115]

(D) TOPOLOGY: linear (ix) FEATURE:

[1116]

(A) NAME/KEY: misc_feature

[1117]

(B) LOCATION: 1

[1118]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: CAAACAAAGC TTCAACAATT TAC 23

(2) INFORMATION FOR SEQ ID NO:54: (i) SEQUENCE CHARACTERISTICS:

[1119]

(A) LENGTH: 26 base pairs

[1120]

(B) TYPE: nucleic acid

[1121]

(C) STRANDEDNESS: single

[1122]

(D) TOPOLOGY: linear (ix) FEATURE:

[1123]

(A) NAME/KEY: misc_feature

[1124]

(B) LOCATION: 1

[1125]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:54: GGGTTTTATT TTCAAGTACT TCTATG 26

[1126]

(2) INFORMATION FOR SEQ ID NO:55: (i) SEQUENCE CHARACTERISTICS:

[1127]

(A) LENGTH: 26 base pairs

[1128]

(B) TYPE: nucleic acid

[1129]

(C) STRANDEDNESS: single

[1130]

(D) TOPOLOGY: linear (ix) FEATURE:

[1131]

(A) NAME/KEY: misc_feature

[1132]

(B) LOCATION: 1

[1133]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:55: GCTCAGCAAC TGTTCAATGT ATGAGC 26

[1134]

(2) INFORMATION FOR SEQ ID NO:56: (i) SEQUENCE CHARACTERISTICS:

[1135]

(A) LENGTH: 18 base pairs

[1136]

(B) TYPE: nucleic acid

[1137]

(C) STRANDEDNESS: single

[1138]

(D) TOPOLOGY: linear (ix) FEATURE:

[1139]

(A) NAME/KEY: misc_feature

[1140]

(B) LOCATION: 1

[1141]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:56: CTAGTGTGTG TTTTTGGC 18

(2) INFORMATION FOR SEQ ID NO:57: (i) SEQUENCE CHARACTERISTICS:

[1142]

(A) LENGTH: 18 base pairs

[1143]

(B) TYPE: nucleic acid

[1144]

(C) STRANDEDNESS: single

[1145]

(D) TOPOLOGY: linear (ix) FEATURE:

[1146]

(A) NAME/KEY: misc_feature

[1147]

(B) LOCATION: 1

[1148]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: CATAACCTTA TCTCCACC 18

[1149]

(2) INFORMATION FOR SEQ ID NO:58: (i) SEQUENCE CHARACTERISTICS:

[1150]

(A) LENGTH: 23 base pairs

[1151]

(B) TYPE: nucleic acid

[1152]

(C) STRANDEDNESS: single

[1153]

(D) TOPOLOGY: linear (ix) FEATURE:

[1154]

(A) NAME/KEY: misc_feature

[1155]

(B) LOCATION: 1

[1156]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:58: CTCAGCCATG AGACAATAAA TCC 23

[1157]

(2) INFORMATION FOR SEQ ID NO:59: (i) SEQUENCE CHARACTERISTICS:

[1158]

(A) LENGTH: 21 base pairs

[1159]

(B) TYPE: nucleic acid

[1160]

(C) STRANDEDNESS: single

[1161]

(D) TOPOLOGY: linear (ix) FEATURE:

[1162]

(A) NAME/KEY: misc_feature

[1163]

(B) LOCATION: 1

[1164]

(D) OTHER INFORMATION: /note= "primers directed to genomic DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: GGTTCCCAAA TAATGTGATG G 21

(2) INFORMATION FOR SEQ ID NO:60: (i) SEQUENCE CHARACTERISTICS:

[1165]

(A) LENGTH: 18 base pairs

[1166]

(B) TYPE: nucleic acid

[1167]

(C) STRANDEDNESS: single

[1168]

(D) TOPOLOGY: linear (ix) FEATURE:

[1169]

(A) NAME/KEY: misc_feature

[1170]

(B) LOCATION: 1

[1171]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:60: CAAAAGCTTC AGAATCTC 18

[1172]

(2) INFORMATION FOR SEQ ID NO:61: (i) SEQUENCE CHARACTERISTICS:

[1173]

(A) LENGTH: 23 base pairs

[1174]

(B) TYPE: nucleic acid

[1175]

(C) STRANDEDNESS: single

[1176]

(D) TOPOLOGY: linear (ix) FEATURE:

[1177]

(A) NAME/KEY: misc_feature

[1178]

(B) LOCATION: 1

[1179]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:61: CTGTGGGTGT TTCCTGTGAG TGG 23

[1180]

(2) INFORMATION FOR SEQ ID NO:62: (i) SEQUENCE CHARACTERISTICS:

[1181]

(A) LENGTH: 24 base pairs

[1182]

(B) TYPE: nucleic acid

[1183]

(C) STRANDEDNESS: single

[1184]

(D) TOPOLOGY: linear (ix) FEATURE:

[1185]

(A) NAME/KEY: misc_feature

[1186]

(B) LOCATION: 1

[1187]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:62: CATGACTTTG TGTGAATGTA CACC 24

(2) INFORMATION FOR SEQ ID NO:63: (i) SEQUENCE CHARACTERISTICS:

[1188]

(A) LENGTH: 24 base pairs

[1189]

(B) TYPE: nucleic acid

[1190]

(C) STRANDEDNESS: single

[1191]

(D) TOPOLOGY: linear (ix) FEATURE:

[1192]

(A) NAME/KEY: misc_feature

[1193]

(B) LOCATION: 1

[1194]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:63: GAGGAGAGCC TGATAGAACA TCTG 24

[1195]

(2) INFORMATION FOR SEQ ID NO:64: (i) SEQUENCE CHARACTERISTICS:

[1196]

(A) LENGTH: 20 base pairs

[1197]

(B) TYPE: nucleic acid

[1198]

(C) STRANDEDNESS: single

[1199]

(D) TOPOLOGY: linear (ix) FEATURE:

[1200]

(A) NAME/KEY: misc_feature

[1201]

(B) LOCATION: 1

[1202]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:64: GGGCTTTTTC TCCCCCTCCC 20

[1203]

(2) INFORMATION FOR SEQ ID NO:65: (i) SEQUENCE CHARACTERISTICS:

[1204]

(A) LENGTH: 18 base pairs

[1205]

(B) TYPE: nucleic acid

[1206]

(C) STRANDEDNESS: single

[1207]

(D) TOPOLOGY: linear (ix) FEATURE:

[1208]

(A) NAME/KEY: misc_feature

[1209]

(B) LOCATION: 1

[1210]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:65: AAAATCTGGG CTCTCACG 18

(2) INFORMATION FOR SEQ ID NO:66: (i) SEQUENCE CHARACTERISTICS:

[1211]

(A) LENGTH: 19 base pairs

[1212]

(B) TYPE: nucleic acid

[1213]

(C) STRANDEDNESS: single

[1214]

(D) TOPOLOGY: linear (ix) FEATURE:

[1215]

(A) NAME/KEY: misc_feature

[1216]

(B) LOCATION: 1

[1217]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:66: AATTATACCT CATACTAGC 19

[1218]

(2) INFORMATION FOR SEQ ID NO:67: (i) SEQUENCE CHARACTERISTICS:

[1219]

(A) LENGTH: 23 base pairs

[1220]

(B) TYPE: nucleic acid

[1221]

(C) STRANDEDNESS: single

[1222]

(D) TOPOLOGY: linear (ix) FEATURE:

[1223]

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

[1224]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:67: GTTTTATTAC AGAATAAAGG AGG 23

[1225]

(2) INFORMATION FOR SEQ ID NO:68: (i) SEQUENCE CHARACTERISTICS:

[1226]

(A) LENGTH: 19 base pairs

[1227]

(B) TYPE: nucleic acid

[1228]

(C) STRANDEDNESS: single

[1229]

(D) TOPOLOGY: linear (ix) FEATURE:

[1230]

(A) NAME/KEY: misc_feature

[1231]

(B) LOCATION: 1

[1232]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:68: AAGCCAAAGT TAGAAGGCA 19

(2) INFORMATION FOR SEQ ID NO:69: (i) SEQUENCE CHARACTERISTICS:

[1233]

(A) LENGTH: 20 base pairs

[1234]

(B) TYPE: nucleic acid

[1235]

(C) STRANDEDNESS: single

[1236]

(D) TOPOLOGY: linear (ix) FEATURE:

[1237]

(A) NAME/KEY: misc_feature

[1238]

(B) LOCATION: 1

[1239]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:69: TGCAACCCAC AAAATTTGGC 20

[1240]

(2) INFORMATION FOR SEQ ID NO:70: (i) SEQUENCE CHARACTERISTICS:

[1241]

(A) LENGTH: 20 base pairs

[1242]

(B) TYPE: nucleic acid

[1243]

(C) STRANDEDNESS: single

[1244]

(D) TOPOLOGY: linear (ix) FEATURE:

[1245]

(A) NAME/KEY: misc_feature

[1246]

(B) LOCATION: 1

[1247]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:70: CTTTCTCCAT TTCCAAAACC 20

[1248]

(2) INFORMATION FOR SEQ ID NO:71: (i) SEQUENCE CHARACTERISTICS:

[1249]

(A) LENGTH: 18 base pairs

[1250]

(B) TYPE: nucleic acid

[1251]

(C) STRANDEDNESS: single

[1252]

(D) TOPOLOGY: linear (ix) FEATURE:

[1253]

(A) NAME/KEY: misc_feature

[1254]

(B) LOCATION: 1

[1255]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:71: TGGTGTCTCT AGTTCTGG 18

(2) INFORMATION FOR SEQ ID NO:72: (i) SEQUENCE CHARACTERISTICS:

[1256]

(A) LENGTH: 20 base pairs

[1257]

(B) TYPE: nucleic acid

[1258]

(C) STRANDEDNESS: single

[1259]

(D) TOPOLOGY: linear (ix) FEATURE:

[1260]

(A) NAME/KEY: misc_feature

[1261]

(B) LOCATION: 1

[1262]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: CATTGTTGTA GTAGCTCTGC 20

[1263]

(2) INFORMATION FOR SEQ ID NO:73: (i) SEQUENCE CHARACTERISTICS:

[1264]

(A) LENGTH: 18 base pairs

[1265]

(B) TYPE: nucleic acid

[1266]

(C) STRANDEDNESS: single

[1267]

(D) TOPOLOGY: linear (ix) FEATURE:

[1268]

(A) NAME/KEY: misc_feature

[1269]

(B) LOCATION: 1

[1270]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:73: CCCATTTGTC CCAACTGG 18

[1271]

(2) INFORMATION FOR SEQ ID NO:74: (i) SEQUENCE CHARACTERISTICS:

[1272]

(A) LENGTH: 19 base pairs

[1273]

(B) TYPE: nucleic acid

[1274]

(C) STRANDEDNESS: single

[1275]

(D) TOPOLOGY: linear (ix) FEATURE:

[1276]

(A) NAME/KEY: misc_feature

[1277]

(B) LOCATION: 1

[1278]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:74: CGGTCAGTTG AAATGTCAG 19

(2) INFORMATION FOR SEQ ID NO:75: (i) SEQUENCE CHARACTERISTICS:

[1279]

(A) LENGTH: 22 base pairs

[1280]

(B) TYPE: nucleic acid

[1281]

(C) STRANDEDNESS: single

[1282]

(D) TOPOLOGY: linear (ix) FEATURE:

[1283]

(A) NAME/KEY: misc_feature

[1284]

(B) LOCATION: 1

[1285]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:75: CATTTGGATG CTCCGTTAAA GC 22

[1286]

(2) INFORMATION FOR SEQ ID NO:76: (i) SEQUENCE CHARACTERISTICS:

[1287]

(A) LENGTH: 23 base pairs

[1288]

(B) TYPE: nucleic acid

[1289]

(C) STRANDEDNESS: single

[1290]

(D) TOPOLOGY: linear (ix) FEATURE:

[1291]

(A) NAME/KEY: misc_feature

[1292]

(B) LOCATION: 1

[1293]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:76: CACCCGGCTG GAAATTTTAT TTG 23

[1294]

(2) INFORMATION FOR SEQ ID NO:77: (i) SEQUENCE CHARACTERISTICS:

[1295]

(A) LENGTH: 22 base pairs

[1296]

(B) TYPE: nucleic acid

[1297]

(C) STRANDEDNESS: single

[1298]

(D) TOPOLOGY: linear (ix) FEATURE:

[1299]

(A) NAME/KEY: misc_feature

[1300]

(B) LOCATION: 1

[1301]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:77: GGAAAGGCAC TGGAGAAATG GG 22

(2) INFORMATION FOR SEQ ID NO:78: (i) SEQUENCE CHARACTERISTICS:

[1302]

(A) LENGTH: 25 base pairs

[1303]

(B) TYPE: nucleic acid

[1304]

(C) STRANDEDNESS: single

[1305]

(D) TOPOLOGY: linear (ix) FEATURE:

[1306]

(A) NAME/KEY: misc_feature

[1307]

(B) LOCATION: 1

[1308]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:78: CCCTCCAGCA CACATGCATG TACCG 25

[1309]

(2) INFORMATION FOR SEQ ID NO:79: (i) SEQUENCE CHARACTERISTICS:

[1310]

(A) LENGTH: 20 base pairs

[1311]

(B) TYPE: nucleic acid

[1312]

(C) STRANDEDNESS: single

[1313]

(D) TOPOLOGY: linear (ix) FEATURE:

[1314]

(A) NAME/KEY: misc_feature

[1315]

(B) LOCATION: 1

[1316]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:79: TAAGTAGTCT GTGATCTCCG 20

[1317]

(2) INFORMATION FOR SEQ ID NO:80: (i) SEQUENCE CHARACTERISTICS:

[1318]

(A) LENGTH: 18 base pairs

[1319]

(B) TYPE: nucleic acid

[1320]

(C) STRANDEDNESS: single

[1321]

(D) TOPOLOGY: linear (ix) FEATURE:

[1322]

(A) NAME/KEY: misc_feature

[1323]

(B) LOCATION: 1

[1324]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:80: ATGTATGAGG TCC GTCC 18

(2) INFORMATION FOR SEQ ID NO:81: (i) SEQUENCE CHARACTERISTICS:

[1325]

(A) LENGTH: 18 base pairs

[1326]

(B) TYPE: nucleic acid

[1327]

(C) STRANDEDNESS: single

[1328]

(D) TOPOLOGY: linear (ix) FEATURE:

[1329]

(A) NAME/KEY: misc_feature

[1330]

(B) LOCATION: 1

[1331]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:81: GACACCAGTG TATGTTGG 18

[1332]

(2) INFORMATION FOR SEQ ID NO:82: (i) SEQUENCE CHARACTERISTICS:

[1333]

(A) LENGTH: 20 base pairs

[1334]

(B) TYPE: nucleic acid

[1335]

(C) STRANDEDNESS: single

[1336]

(D) TOPOLOGY: linear (ix) FEATURE:

[1337]

(A) NAME/KEY: misc_feature

[1338]

(B) LOCATION: 1

[1339]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:82: GAGAAAGAAG AACACATCCC 20
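The listing does not say which primers were used together. As an illustration only, SEQ ID NO:81 occurs on the listed strand of SEQ ID NO:24 and the reverse complement of SEQ ID NO:82 occurs further downstream, so the two would bracket a segment of SEQ ID NO:24. The sketch below locates both sites; `reverse_complement` and the variable names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


# SEQ ID NO:24 (563 bases) with the running counts removed.
SEQ24 = "".join(token for token in """
AATCCTCTTG TGTTCAGGCC TGTGGATCCC TGAGAGGCTA GCCCACAAGA TCCACTTCAA 60
AAGCCCTAGA TAACACCAAG TCTTTCCAGA CCCAGTGCAC ATCCCATCAG CCAGGACACC 120
AGTGTATGTT GGGATGCAAA CAGGGAGGCT TATGACATCT AATGTGTTTT CCAGAGTGAA 180
GTGCCTGGCT CCATTCCAAA CTCCTGGAAG TGGACTGTGG AACACATTGT CTATAAAGCC 240
TTGCGCTCAC ACATTCTGCC TCCTAAACAT TTCACAGAAG ATGGAAATAT CCTGCAGCTT 300
GCTAACCTGC CTGATCTATA CAAAGTCTTT GAGAGGTGTT AAATATGGTT ATTTATGCAC 360
TGTGGGATGT GTTCTTCTTT CTCTGTATTC CGATACAAAG TGTTGTATCA AAGTGTGATA 420
TACAAAGTGT ACCAACATAA GTGTTGGTAG CACTTAAGAC TTATACTTGC CTTCTGATAG 480
TATTCCTTTA TACACAGTGG ATTGATTATA AATAAATAGA TGTGTCTTAA CATAATTTCT 540
TATTTAATTT TATTATGTAT ATA 563
""".split() if not token.isdigit())

FORWARD = "GACACCAGTGTATGTTGG"    # SEQ ID NO:81, matches the listed strand
REVERSE = "GAGAAAGAAGAACACATCCC"  # SEQ ID NO:82, matches the opposite strand

fwd_start = SEQ24.find(FORWARD)
rev_site = SEQ24.find(reverse_complement(REVERSE))
product = SEQ24[fwd_start:rev_site + len(REVERSE)]
print(fwd_start + 1, rev_site + len(REVERSE), len(product))
```

Run as written, it prints `115 383 269`: on a SEQ ID NO:24 template, this primer combination would be expected to span positions 115 to 383, a 269-bp segment. Whether the patent actually used them as a pair is not stated.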

[1340]

(2) INFORMATION FOR SEQ ID NO:83: (i) SEQUENCE CHARACTERISTICS:

[1341]

(A) LENGTH: 38 base pairs

[1342]

(B) TYPE: nucleic acid

[1343]

(C) STRANDEDNESS: single

[1344]

(D) TOPOLOGY: linear (ix) FEATURE:

[1345]

(A) NAME/KEY: misc_feature

[1346]

(B) LOCATION: 1

[1347]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:83: TGTAAAACGA CGGCCAGTCA CTGAGGTGAT TGGCTGAA 38

(2) INFORMATION FOR SEQ ID NO:84: (i) SEQUENCE CHARACTERISTICS:

[1348]

(A) LENGTH: 19 base pairs

[1349]

(B) TYPE: nucleic acid

[1350]

(C) STRANDEDNESS: single

[1351]

(D) TOPOLOGY: linear (ix) FEATURE:

[1352]

(A) NAME/KEY: misc_feature

[1353]

(B) LOCATION: 1

[1354]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:84: TAGCCCTTAA GTGAGCCCG 19
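Many of the longer primers in this listing (for example SEQ ID NO:83, 85 and 87) begin with the same 18 nucleotides, TGTAAAACGACGGCCAGT, which is the sequence of the commonly used M13(-21) forward sequencing primer; the listing itself does not say why the tail is there. The sketch below, with illustrative helper names `split_tail` and `longest_common_substring`, strips that prefix and compares the remainder with an untailed primer from earlier in the listing. The pairings it checks are inferred from sequence overlap only, not stated in the source.

```python
M13_FORWARD_21 = "TGTAAAACGACGGCCAGT"  # 18-nt prefix shared by the tailed primers


def split_tail(primer: str) -> tuple[str, str]:
    """Return (tail, gene_specific_part); the tail is empty if absent."""
    if primer.startswith(M13_FORWARD_21):
        return M13_FORWARD_21, primer[len(M13_FORWARD_21):]
    return "", primer


def longest_common_substring(a: str, b: str) -> str:
    """Dynamic program over lengths of common suffixes, O(len(a) * len(b))."""
    best_len, best_end = 0, 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_end = current[j], i
        previous = current
    return a[best_end - best_len:best_end]


# Hypothetical pairings, inferred from overlap; the listing does not state them.
PAIRS = {
    "SEQ ID NO:83 vs NO:44": ("TGTAAAACGACGGCCAGTCACTGAGGTGATTGGCTGAA",
                              "AGGCACTGAGGTGATTGGC"),
    "SEQ ID NO:85 vs NO:46": ("TGTAAAACGACGGCCAGTTACATTAGAGTAGTTGCAGA",
                              "AATATGTACATTAGAGTAGTTG"),
}

for label, (tailed, untailed) in PAIRS.items():
    tail, specific = split_tail(tailed)
    print(label, len(tail), specific, longest_common_substring(specific, untailed))
```

In both examples the gene-specific part shares a 16-nt stretch with the untailed primer (CACTGAGGTGATTGGC and TACATTAGAGTAGTTG respectively), which suggests the tailed primers are shifted versions of the untailed ones aimed at the same intron sites.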

[1355]

(2) INFORMATION FOR SEQ ID NO:85: (i) SEQUENCE CHARACTERISTICS:

[1356]

(A) LENGTH: 38 base pairs

[1357]

(B) TYPE: nucleic acid

[1358]

(C) STRANDEDNESS: single

[1359]

(D) TOPOLOGY: linear (ix) FEATURE:

[1360]

(A) NAME/KEY: misc_feature

[1361]

(B) LOCATION: 1

[1362]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:85: TGTAAAACGA CGGCCAGTTA CATTAGAGTA GTTGCAGA 38

[1363]

(2) INFORMATION FOR SEQ ID NO:86: (i) SEQUENCE CHARACTERISTICS:

[1364]

(A) LENGTH: 19 base pairs

[1365]

(B) TYPE: nucleic acid

[1366]

(C) STRANDEDNESS: single

[1367]

(D) TOPOLOGY: linear (ix) FEATURE:

[1368]

(A) NAME/KEY: misc_feature

[1369]

(B) LOCATION: 1

[1370]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:86: AGGTCCTGAC TCTTCCATG 19

(2) INFORMATION FOR SEQ ID NO:87: (i) SEQUENCE CHARACTERISTICS:

[1371]

(A) LENGTH: 40 base pairs

[1372]

(B) TYPE: nucleic acid

[1373]

(C) STRANDEDNESS: single

[1374]

(D) TOPOLOGY: linear (ix) FEATURE:

[1375]

(A) NAME/KEY: misc_feature

[1376]

(B) LOCATION: 1

[1377]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:87: TGTAAAACGA CGGCCAGTTT GGAAAATGAG TAACATGATT 40

[1378]

(2) INFORMATION FOR SEQ ID NO:88: (i) SEQUENCE CHARACTERISTICS:

[1379]

(A) LENGTH: 19 base pairs

[1380]

(B) TYPE: nucleic acid

[1381]

(C) STRANDEDNESS: single

[1382]

(D) TOPOLOGY: linear (ix) FEATURE:

[1383]

(A) NAME/KEY: misc_feature

[1384]

(B) LOCATION: 1

[1385]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:88: TGTCATCACA GGAGGATAT 19

[1386]

(2) INFORMATION FOR SEQ ID NO:89: (i) SEQUENCE CHARACTERISTICS:

[1387]

(A) LENGTH: 38 base pairs

[1388]

(B) TYPE: nucleic acid

[1389]

(C) STRANDEDNESS: single

[1390]

(D) TOPOLOGY: linear (ix) FEATURE:

[1391]

(A) NAME/KEY: misc_feature

[1392]

(B) LOCATION: 1

[1393]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:89: TGTAAAACGA CGGCCAGTCT TTCCCTTTGG TGAGGTGA 38

(2) INFORMATION FOR SEQ ID NO:90: (i) SEQUENCE CHARACTERISTICS:

[1394]

(A) LENGTH: 20 base pairs

[1395]

(B) TYPE: nucleic acid

[1396]

(C) STRANDEDNESS: single

[1397]

(D) TOPOLOGY: linear (ix) FEATURE:

[1398]

(A) NAME/KEY: misc_feature

[1399]

(B) LOCATION: 1

[1400]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:90: TACTCTGAGA CCTAGGCCCA 20

[1401]

(2) INFORMATION FOR SEQ ID NO:91: (i) SEQUENCE CHARACTERISTICS:

[1402]

(A) LENGTH: 40 base pairs

[1403]

(B) TYPE: nucleic acid

[1404]

(C) STRANDEDNESS: single

[1405]

(D) TOPOLOGY: linear (ix) FEATURE:

[1406]

(A) NAME/KEY: misc_feature

[1407]

(B) LOCATION: 1

[1408]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:91: TGTAAAACGA CGGCCAGTTC TCTTTTCCCC TTGGGATTAG 40

[1409]

(2) INFORMATION FOR SEQ ID NO:92: (i) SEQUENCE CHARACTERISTICS:

[1410]

(A) LENGTH: 23 base pairs

[1411]

(B) TYPE: nucleic acid

[1412]

(C) STRANDEDNESS: single

[1413]

(D) TOPOLOGY: linear (ix) FEATURE:

[1414]

(A) NAME/KEY: misc_feature

[1415]

(B) LOCATION: 1

[1416]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:92: ACAAAGCTTC AACAATTTAC TCT 23

(2) INFORMATION FOR SEQ ID NO:93: (i) SEQUENCE CHARACTERISTICS:

[1417]

(A) LENGTH: 46 base pairs

[1418]

(B) TYPE: nucleic acid

[1419]

(C) STRANDEDNESS: single

[1420]

(D) TOPOLOGY: linear (ix) FEATURE:

[1421]

(A) NAME/KEY: misc_feature

[1422]

(B) LOCATION: 1

[1423]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:93: TGTAAAACGA CGGCCAGTGT TTTATTTTCA AGTACTTCTA TGAATT 46

[1424]

(2) INFORMATION FOR SEQ ID NO:94: (i) SEQUENCE CHARACTERISTICS:

[1425]

(A) LENGTH: 26 base pairs

[1426]

(B) TYPE: nucleic acid

[1427]

(C) STRANDEDNESS: single

[1428]

(D) TOPOLOGY: linear (ix) FEATURE:

[1429]

(A) NAME/KEY: misc_feature

[1430]

(B) LOCATION: 1

[1431]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:94: CAGCAACTGT TCAATGTATG AGCACT 26

[1432]

(2) INFORMATION FOR SEQ ID NO:95: (i) SEQUENCE CHARACTERISTICS:

[1433]

(A) LENGTH: 36 base pairs

[1434]

(B) TYPE: nucleic acid

[1435]

(C) STRANDEDNESS: single

[1436]

(D) TOPOLOGY: linear (ix) FEATURE:

[1437]

(A) NAME/KEY: misc_feature

[1438]

(B) LOCATION: 1

[1439]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:95: TGTAAAACGA CGGCCAGTGT GTGTGTTTTT GGCAAC 36

(2) INFORMATION FOR SEQ ID NO:96: (i) SEQUENCE CHARACTERISTICS:

[1440]

(A) LENGTH: 18 base pairs

[1441]

(B) TYPE: nucleic acid

[1442]

(C) STRANDEDNESS: single

[1443]

(D) TOPOLOGY: linear (ix) FEATURE:

[1444]

(A) NAME/KEY: misc_feature

[1445]

(B) LOCATION: 1

[1446]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:96: AACCTTATCT CCACCAGC 18

[1447]

(2) INFORMATION FOR SEQ ID NO:97: (i) SEQUENCE CHARACTERISTICS:

[1448]

(A) LENGTH: 41 base pairs

[1449]

(B) TYPE: nucleic acid

[1450]

(C) STRANDEDNESS: single

[1451]

(D) TOPOLOGY: linear (ix) FEATURE:

[1452]

(A) NAME/KEY: misc_feature

[1453]

(B) LOCATION: 1

[1454]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:97: TGTAAAACGA CGGCCAGTAG CCATGAGACA ATAAATCCTT G 41

[1455]

(2) INFORMATION FOR SEQ ID NO:98: (i) SEQUENCE CHARACTERISTICS:

[1456]

(A) LENGTH: 22 base pairs

[1457]

(B) TYPE: nucleic acid

[1458]

(C) STRANDEDNESS: single

[1459]

(D) TOPOLOGY: linear (ix) FEATURE:

[1460]

(A) NAME/KEY: misc_feature

[1461]

(B) LOCATION: 1

[1462]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:98: TCCCAAATAA TGTGATGGAA TG 22

(2) INFORMATION FOR SEQ ID NO:99: (i) SEQUENCE CHARACTERISTICS:

[1463]

(A) LENGTH: 37 base pairs

[1464]

(B) TYPE: nucleic acid

[1465]

(C) STRANDEDNESS: single

[1466]

(D) TOPOLOGY: linear (ix) FEATURE:

[1467]

(A) NAME/KEY: misc_feature

[1468]

(B) LOCATION: 1

[1469]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:99: TGTAAAACGA CGGCCAGTAA GCTTCAGAAT CTCTTTT 37

[1470]

(2) INFORMATION FOR SEQ ID NO:100: (i) SEQUENCE CHARACTERISTICS:

[1471]

(A) LENGTH: 23 base pairs

[1472]

(B) TYPE: nucleic acid

[1473]

(C) STRANDEDNESS: single

[1474]

(D) TOPOLOGY: linear (ix) FEATURE:

[1475]

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

[1476]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:100:

[1477]

TGGGTGTTTC CTGTGAGTGG ATT 23

[1478]

(2) INFORMATION FOR SEQ ID NO:101: (i) SEQUENCE CHARACTERISTICS:

[1479]

(A) LENGTH: 42 base pairs

[1480]

(B) TYPE: nucleic acid

[1481]

(C) STRANDEDNESS: single

[1482]

(D) TOPOLOGY: linear (ix) FEATURE:

[1483]

(A) NAME/KEY: misc_feature

[1484]

(B) LOCATION: 1

[1485]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:101: TGTAAAACGA CGGCCAGTAC TTTGTGTGAA TGTACACCTG TG 42

(2) INFORMATION FOR SEQ ID NO:102: (i) SEQUENCE CHARACTERISTICS:

[1486]

(A) LENGTH: 24 base pairs

[1487]

(B) TYPE: nucleic acid

[1488]

(C) STRANDEDNESS: single

[1489]

(D) TOPOLOGY: linear (ix) FEATURE:

[1490]

(A) NAME/KEY: misc_feature

[1491]

(B) LOCATION: 1

[1492]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:102: GAGAGCCTGA TAGAACATCT GTTG 24

[1493]

(2) INFORMATION FOR SEQ ID NO:103: (i) SEQUENCE CHARACTERISTICS:

[1494]

(A) LENGTH: 39 base pairs

[1495]

(B) TYPE: nucleic acid

[1496]

(C) STRANDEDNESS: single

[1497]

(D) TOPOLOGY: linear (ix) FEATURE:

[1498]

(A) NAME/KEY: misc_feature

[1499]

(B) LOCATION: 1

[1500]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:103: TGTAAAACGA CGGCCAGTCT TTTTCTCCCC CTCCCACTA 39

[1501]

(2) INFORMATION FOR SEQ ID NO:104: (i) SEQUENCE CHARACTERISTICS:

[1502]

(A) LENGTH: 17 base pairs

[1503]

(B) TYPE: nucleic acid

[1504]

(C) STRANDEDNESS: single

[1505]

(D) TOPOLOGY: linear (ix) FEATURE:

[1506]

(A) NAME/KEY: misc_feature

[1507]

(B) LOCATION: 1

[1508]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:104: TCTGGGCTCT CACGTCT 17

(2) INFORMATION FOR SEQ ID NO:105: (i) SEQUENCE CHARACTERISTICS:

[1509]

(A) LENGTH: 18 base pairs

[1510]

(B) TYPE: nucleic acid

[1511]

(C) STRANDEDNESS: single

[1512]

(D) TOPOLOGY: linear (ix) FEATURE:

[1513]

(A) NAME/KEY: misc_feature

[1514]

(B) LOCATION: 1

[1515]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:105: CTTATTCTGA GTCTCTCC 18

[1516]

(2) INFORMATION FOR SEQ ID NO:106: (i) SEQUENCE CHARACTERISTICS:

[1517]

(A) LENGTH: 35 base pairs

[1518]

(B) TYPE: nucleic acid

[1519]

(C) STRANDEDNESS: single

[1520]

(D) TOPOLOGY: linear (ix) FEATURE:

[1521]

(A) NAME/KEY: misc_feature

[1522]

(B) LOCATION: 1

[1523]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:106: TGTAAAACGA CGGCCAGTGT TTGCTCAGAG GCTGC 35

[1524]

(2) INFORMATION FOR SEQ ID NO:107: (i) SEQUENCE CHARACTERISTICS:

[1525]

(A) LENGTH: 21 base pairs

[1526]

(B) TYPE: nucleic acid

[1527]

(C) STRANDEDNESS: single

[1528]

(D) TOPOLOGY: linear (ix) FEATURE:

[1529]

(A) NAME/KEY: misc_feature

[1530]

(B) LOCATION: 1

[1531]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:107: GATGGTTCGT ACAGATTCCC G 21

(2) INFORMATION FOR SEQ ID NO:108: (i) SEQUENCE CHARACTERISTICS:

[1532]

(A) LENGTH: 41 base pairs

[1533]

(B) TYPE: nucleic acid

[1534]

(C) STRANDEDNESS: single

[1535]

(D) TOPOLOGY: linear (ix) FEATURE:

[1536]

(A) NAME/KEY: misc_feature

[1537]

(B) LOCATION: 1

[1538]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:108: TGTAAAACGA CGGCCAGTTT ATTACAGAAT AAAGGAGGTA G 41

[1539]

(2) INFORMATION FOR SEQ ID NO:109: (i) SEQUENCE CHARACTERISTICS:

[1540]

(A) LENGTH: 39 base pairs

[1541]

(B) TYPE: nucleic acid

[1542]

(C) STRANDEDNESS: single

[1543]

(D) TOPOLOGY: linear (ix) FEATURE:

[1544]

(A) NAME/KEY: misc_feature

[1545]

(B) LOCATION: 1

[1546]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:109: TGTAAAACGA CGGCCAGTAA CCCACAAAAT TTGGCTAAG 39

[1547]

(2) INFORMATION FOR SEQ ID NO:110: (i) SEQUENCE CHARACTERISTICS:

[1548]

(A) LENGTH: 20 base pairs

[1549]

(B) TYPE: nucleic acid

[1550]

(C) STRANDEDNESS: single

[1551]

(D) TOPOLOGY: linear (ix) FEATURE:

[1552]

(A) NAME/KEY: misc_feature

[1553]

(B) LOCATION: 1

[1554]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:110: TCTCCATTTC CAAAACCTTG 20

(2) INFORMATION FOR SEQ ID NO:111: (i) SEQUENCE CHARACTERISTICS:

[1555]

(A) LENGTH: 18 base pairs

[1556]

(B) TYPE: nucleic acid

[1557]

(C) STRANDEDNESS: single

[1558]

(D) TOPOLOGY: linear (ix) FEATURE:

[1559]

(A) NAME/KEY: misc_feature

[1560]

(B) LOCATION: 1

[1561]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:111: TGTCTCTAGT TCTGGTGC 18

[1562]

(2) INFORMATION FOR SEQ ID NO:112: (i) SEQUENCE CHARACTERISTICS:

[1563]

(A) LENGTH: 38 base pairs

[1564]

(B) TYPE: nucleic acid

[1565]

(C) STRANDEDNESS: single

[1566]

(D) TOPOLOGY: linear (ix) FEATURE:

[1567]

(A) NAME/KEY: misc_feature

[1568]

(B) LOCATION: 1

[1569]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:112: TGTAAAACGA CGGCCAGTTG TTGTAGTAGC TCTGCTTG 38

[1570]

(2) INFORMATION FOR SEQ ID NO:113: (i) SEQUENCE CHARACTERISTICS:

[1571]

(A) LENGTH: 20 base pairs

[1572]

(B) TYPE: nucleic acid

[1573]

(C) STRANDEDNESS: single

[1574]

(D) TOPOLOGY: linear (ix) FEATURE:

[1575]

(A) NAME/KEY: misc_feature

[1576]

(B) LOCATION: 1

[1577]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:113: ATTTGTCCCA ACTGGTTGTA 20

[1578]

(2) INFORMATION FOR SEQ ID NO:114: (i) SEQUENCE CHARACTERISTICS:

[1579]

(A) LENGTH: 39 base pairs

[1580]

(B) TYPE: nucleic acid

[1581]

(C) STRANDEDNESS: single

[1582]

(D) TOPOLOGY: linear (ix) FEATURE:

[1583]

(A) NAME/KEY: misc_feature

[1584]

(B) LOCATION: 1

[1585]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:114: TGTAAAACGA CGGCCAGTTC AGTTGAAATG TCAGAAGTG 39

[1586]

(2) INFORMATION FOR SEQ ID NO:115: (i) SEQUENCE CHARACTERISTICS:

[1587]

(A) LENGTH: 18 base pairs

[1588]

(B) TYPE: nucleic acid

[1589]

(C) STRANDEDNESS: single

[1590]

(D) TOPOLOGY: linear (ix) FEATURE:

[1591]

(A) NAME/KEY: misc_feature

[1592]

(B) LOCATION: 1

[1593]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:115: TGTAAAACGA CGGCCAGT 18
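SEQ ID NO:115 is the 18-base sequence TGTAAAACGACGGCCAGT. This matches the widely used M13(-21) forward sequencing primer, and it recurs at the 5' end of many primers in this listing (for example, SEQ ID NOS:97, 99, 101 and 103). The listing does not state the purpose of this sequence, so treating it as a sequencing tail is an assumption. The sketch below uses hypothetical helper names. It joins the ten-base groups of an entry, checks the result against the declared LENGTH, and splits off that tail.

```python
# Sketch only: helper names are hypothetical, not from the patent.
# Assumption: TGTAAAACGACGGCCAGT (SEQ ID NO:115) is an M13(-21) sequencing
# tail prepended to the gene-specific part of some primers.
from __future__ import annotations

M13_TAIL = "TGTAAAACGACGGCCAGT"


def parse_listing_sequence(text: str) -> str:
    """Join the space-separated base groups of an entry, dropping the trailing length."""
    tokens = text.split()
    if tokens and tokens[-1].isdigit():  # entries end with the base count, e.g. "41"
        tokens = tokens[:-1]
    return "".join(tokens).upper()


def check_entry(declared_length: int, text: str) -> bool:
    """True if the entry has exactly the declared number of bases."""
    return len(parse_listing_sequence(text)) == declared_length


def split_primer(seq: str) -> tuple[str, str]:
    """Return (tail, gene_specific_part); tail is empty if the M13 tail is absent."""
    if seq.startswith(M13_TAIL):
        return M13_TAIL, seq[len(M13_TAIL):]
    return "", seq


entry_97 = "TGTAAAACGA CGGCCAGTAG CCATGAGACA ATAAATCCTT G 41"  # SEQ ID NO:97
print(check_entry(41, entry_97))
print(split_primer(parse_listing_sequence(entry_97)))
```

For SEQ ID NO:97, the length check passes. The split leaves a 23-base gene-specific part, AGCCATGAGACAATAAATCCTTG.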

[1594]

(2) INFORMATION FOR SEQ ID NO:116: (i) SEQUENCE CHARACTERISTICS:

[1595]

(A) LENGTH: 23 base pairs

[1596]

(B) TYPE: nucleic acid

[1597]

(C) STRANDEDNESS: single

[1598]

(D) TOPOLOGY: linear (ix) FEATURE:

[1599]

(A) NAME/KEY: misc_feature

[1600]

(B) LOCATION: 1

[1601]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:116: CCGGCTGGAA ATTTTATTTG GAG 23

(2) INFORMATION FOR SEQ ID NO:117: (i) SEQUENCE CHARACTERISTICS:

[1602]

(A) LENGTH: 41 base pairs

[1603]

(B) TYPE: nucleic acid

[1604]

(C) STRANDEDNESS: single

[1605]

(D) TOPOLOGY: linear (ix) FEATURE:

[1606]

(A) NAME/KEY: misc_feature

[1607]

(B) LOCATION: 1

[1608]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:117: TGTAAAACGA CGGCCAGTAG GCACTGGAGA AATGGGATTT G 41

[1609]

(2) INFORMATION FOR SEQ ID NO:118: (i) SEQUENCE CHARACTERISTICS:

[1610]

(A) LENGTH: 26 base pairs

[1611]

(B) TYPE: nucleic acid

[1612]

(C) STRANDEDNESS: single

[1613]

(D) TOPOLOGY: linear (ix) FEATURE:

[1614]

(A) NAME/KEY: misc_feature

[1615]

(B) LOCATION: 1

[1616]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:118: TCCAGCACAC ATGCATGTAC CGAAAT 26

[1617]

(2) INFORMATION FOR SEQ ID NO:119: (i) SEQUENCE CHARACTERISTICS:

[1618]

(A) LENGTH: 20 base pairs

[1619]

(B) TYPE: nucleic acid

[1620]

(C) STRANDEDNESS: single

[1621]

(D) TOPOLOGY: linear (ix) FEATURE:

[1622]

(A) NAME/KEY: misc_feature

[1623]

(B) LOCATION: 1

[1624]

(D) OTHER INFORMATION: /note= "primer directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:119: GTAGTCTGTG ATCTCCGTTT 20

(2) INFORMATION FOR SEQ ID NO:120: (i) SEQUENCE CHARACTERISTICS:

[1625]

(A) LENGTH: 36 base pairs

[1626]

(B) TYPE: nucleic acid

[1627]

(C) STRANDEDNESS: single

[1628]

(D) TOPOLOGY: linear (ix) FEATURE:

[1629]

(A) NAME/KEY: misc_feature

[1630]

(B) LOCATION: 1

[1631]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:120: TGTAAAACGA CGGCCAGTTA TGAGGTCCTG TCCTAG 36

[1632]

(2) INFORMATION FOR SEQ ID NO:121: (i) SEQUENCE CHARACTERISTICS:

[1633]

(A) LENGTH: 19 base pairs

[1634]

(B) TYPE: nucleic acid

[1635]

(C) STRANDEDNESS: single

[1636]

(D) TOPOLOGY: linear (ix) FEATURE:

[1637]

(A) NAME/KEY: misc_feature

[1638]

(B) LOCATION: 1

[1639]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:121: ACCAGTGTAT GTTGGGATG 19

[1640]

(2) INFORMATION FOR SEQ ID NO:122: (i) SEQUENCE CHARACTERISTICS:

[1641]

(A) LENGTH: 39 base pairs

[1642]

(B) TYPE: nucleic acid

[1643]

(C) STRANDEDNESS: single

[1644]

(D) TOPOLOGY: linear (ix) FEATURE:

[1645]

(A) NAME/KEY: misc_feature

[1646]

(B) LOCATION: 1

[1647]

(D) OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:122: TGTAAAACGA CGGCCAGTGA AAGAAGAACA CATCCCACA 39

(2) INFORMATION FOR SEQ ID NO:123: (i) SEQUENCE CHARACTERISTICS:

[1648]

(A) LENGTH: 770 amino acids

[1649]

(B) TYPE: amino acid

[1650]

(C) STRANDEDNESS: single

[1651]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

[1652]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:123:

[1653]

Met Ser Leu Arg Ile Lys Ala Leu Asp Ala Ser Val Val Asn Lys Ile 1 5 10 15

[1654]

Ala Ala Gly Glu Ile Ile Ile Ser Pro Val Asn Ala Leu Lys Glu Met

[1655]

20 25 30

[1656]

Met Glu Asn Ser Ile Asp Ala Asn Ala Thr Met Ile Asp Ile Leu Val

[1657]

35 40 45

[1658]

Lys Glu Gly Gly Ile Lys Val Leu Gln Ile Thr Asp Asn Gly Ser Gly

[1659]

50 55 60

Ile Asn Lys Ala Asp Leu Pro Ile Leu Cys Glu Arg Phe Thr Thr Ser 65 70 75 80

[1660]

Lys Leu Gln Lys Phe Glu Asp Leu Ser Gln Ile Gln Thr Tyr Gly Phe

[1661]

85 90 95

[1662]

Arg Gly Glu Ala Leu Ala Ser Ile Ser His Val Ala Arg Val Thr Val

[1663]

100 105 110

[1664]

Thr Thr Lys Val Lys Glu Asp Arg Cys Ala Trp Arg Val Ser Tyr Ala

[1665]

115 120 125

[1666]

Glu Gly Lys Met Leu Glu Ser Pro Lys Pro Val Ala Gly Lys Asp Gly

[1667]

130 135 140

[1668]

Thr Thr Ile Leu Val Glu Asp Leu Phe Phe Asn Ile Pro Ser Arg Leu 145 150 155 160

[1669]

Arg Ala Leu Arg Ser His Asn Asp Glu Tyr Ser Lys Ile Leu Asp Val

[1670]

165 170 175

[1671]

Val Gly Arg Tyr Ala Ile His Ser Lys Asp Ile Gly Phe Ser Cys Lys

[1672]

180 185 190

[1673]

Lys Phe Gly Asp Ser Asn Tyr Ser Leu Ser Val Lys Pro Ser Tyr Thr

[1674]

195 200 205

[1675]

Val Gln Asp Arg Ile Arg Thr Val Phe Asn Lys Ser Val Ala Ser Asn

[1676]

210 215 220

[1677]

Leu Ile Thr Phe His Ile Ser Lys Val Glu Asp Leu Asn Leu Glu Ser 225 230 235 240

[1678]

Val Asp Gly Lys Val Cys Asn Leu Asn Phe Ile Ser Lys Lys Ser Ile

[1679]

245 250 255

[1680]

Ser Leu Ile Phe Phe Ile Asn Asn Arg Leu Val Thr Cys Asp Leu Leu

[1681]

260 265 270

[1682]

Arg Arg Ala Leu Asn Ser Val Tyr Ser Asn Tyr Leu Pro Lys Gly Phe 275 280 285

Arg Pro Phe Ile Tyr Leu Gly Ile Val Ile Asp Pro Ala Ala Val Asp

[1683]

290 295 300

[1684]

Val Asn Val His Pro Thr Lys Arg Glu Val Arg Phe Leu Ser Gln Asp 305 310 315 320

[1685]

Glu Ile Ile Glu Lys Ile Ala Asn Gln Leu His Ala Glu Leu Ser Ala

[1686]

325 330 335

Ile Asp Thr Ser Arg Thr Phe Lys Ala Ser Ser Ile Ser Thr Asn Lys

[1687]

340 345 350

[1688]

Pro Glu Ser Leu Ile Pro Phe Asn Asp Thr Ile Glu Ser Asp Arg Asn

[1689]

355 360 365

[1690]

Arg Lys Ser Leu Arg Gln Ala Gln Val Val Glu Asn Ser Tyr Thr Thr

[1691]

370 375 380

[1692]

Ala Asn Ser Gln Leu Arg Lys Ala Lys Arg Gln Glu Asn Lys Leu Val 385 390 395 400

[1693]

Arg Ile Asp Ala Ser Gln Ala Lys Ile Thr Ser Phe Leu Ser Ser Ser

[1694]

405 410 415

[1695]

Gln Gln Phe Asn Phe Glu Gly Ser Ser Thr Lys Arg Gln Leu Ser Glu

[1696]

420 425 430

[1697]

Pro Lys Val Thr Asn Val Ser His Ser Gln Glu Ala Glu Lys Leu Thr

[1698]

435 440 445

[1699]

Leu Asn Glu Ser Glu Gln Pro Arg Asp Ala Asn Thr Ile Asn Asp Asn

[1700]

450 455 460

[1701]

Asp Leu Lys Asp Gln Pro Lys Lys Lys Gln Lys Gln Leu Gly Asp Tyr 465 470 475 480

[1702]

Lys Val Pro Ser Ile Ala Asp Asp Glu Lys Asn Ala Leu Pro Ile Ser

[1703]

485 490 495

[1704]

Lys Asp Gly Tyr Ile Arg Val Pro Lys Glu Arg Val Asn Val Asn Leu

[1705]

500 505 510

[1706]

Thr Ser Ile Lys Lys Leu Arg Glu Lys Val Asp Asp Ser Ile His Arg

[1707]

515 520 525

[1708]

Glu Leu Thr Asp Ile Phe Ala Asn Leu Asn Tyr Val Gly Val Val Asp

[1709]

530 535 540

[1710]

Glu Glu Arg Arg Leu Ala Ala Ile Gln His Asp Leu Lys Leu Phe Leu 545 550 555 560

Ile Asp Tyr Gly Ser Val Cys Tyr Glu Leu Phe Tyr Gln Ile Gly Leu

[1711]

565 570 575

[1712]

Thr Asp Phe Ala Asn Phe Gly Lys Ile Asn Leu Gln Ser Thr Asn Val

[1713]

580 585 590

[1714]

Ser Asp Asp Ile Val Leu Tyr Asn Leu Leu Ser Glu Phe Asp Glu Leu

[1715]

595 600 605

[1716]

Asn Asp Asp Ala Ser Lys Glu Lys Ile Ile Ser Lys Ile Trp Asp Met

[1717]

610 615 620

[1718]

Ser Ser Met Leu Asn Glu Tyr Tyr Ser Ile Glu Leu Val Asn Asp Gly 625 630 635 640

Leu Asp Asn Asp Leu Lys Ser Val Lys Leu Lys Ser Leu Pro Leu Leu

[1719]

645 650 655

[1720]

Leu Lys Gly Tyr Ile Pro Ser Leu Val Lys Leu Pro Phe Phe Ile Tyr

[1721]

660 665 670

[1722]

Arg Leu Gly Lys Glu Val Asp Trp Glu Asp Glu Gln Glu Cys Leu Asp

[1723]

675 680 685

[1724]

Gly Ile Leu Arg Glu Ile Ala Leu Leu Tyr Ile Pro Asp Met Val Pro

[1725]

690 695 700

[1726]

Lys Val Asp Thr Leu Asp Ala Ser Leu Ser Glu Asp Glu Lys Ala Gln 705 710 715 720

[1727]

Phe Ile Asn Arg Lys Glu His Ile Ser Ser Leu Leu Glu His Val Leu

[1728]

725 730 735

[1729]

Phe Pro Cys Ile Lys Arg Arg Phe Leu Ala Pro Arg His Ile Leu Lys

[1730]

740 745 750

[1731]

Asp Val Val Glu Ile Ala Asn Leu Pro Asp Leu Tyr Lys Val Phe Glu

[1732]

755 760 765

[1733]

Arg Cys 770
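The protein entries give residues in three-letter code, with position numbers interleaved between them. Converting an entry to one-letter code makes it easier to compare with other listings or to pass to alignment tools. The sketch below assumes only the 20 standard residues appear. Its names are illustrative and not part of the source. An unrecognized token raises KeyError instead of being silently dropped.

```python
# Sketch: three-letter to one-letter conversion for the protein entries.
# Names are illustrative; only the 20 standard residues are handled.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(listing: str) -> str:
    """Convert residues to one-letter code, skipping interleaved position numbers."""
    residues = []
    for token in listing.split():
        if token.isdigit():
            continue  # position markers such as "1 5 10 15"
        residues.append(THREE_TO_ONE[token])  # KeyError flags an unexpected token
    return "".join(residues)


# First row of SEQ ID NO:123
print(to_one_letter("Met Ser Leu Arg Ile Lys Ala Leu Asp Ala Ser Val Val Asn Lys Ile 1 5 10 15"))
```

Applied to the first row of SEQ ID NO:123, this yields MSLRIKALDASVVNKI.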

[1734]

(2) INFORMATION FOR SEQ ID NO:124: (i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 64 amino acids

[1735]

(B) TYPE: amino acid

[1736]

(C) STRANDEDNESS: single

[1737]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

[1738]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:124:

[1739]

Val Asn Arg Ile Ala Ala Gly Glu Val Ile Gln Arg Pro Ala Asn Ala 1 5 10 15

Ile Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys Phe Thr Ser Ile

[1740]

20 25 30

[1741]

Gln Val Ile Val Lys Glu Gly Gly Leu Lys Leu Ile Gln Ile Gln Asp

[1742]

35 40 45

[1743]

Asn Gly Thr Gly Ile Arg Lys Glu Asp Leu Asp Ile Val Cys Glu Arg 50 55 60

[1744]

(2) INFORMATION FOR SEQ ID NO:125: (i) SEQUENCE CHARACTERISTICS:

[1745]

(A) LENGTH: 64 amino acids

[1746]

(B) TYPE: amino acid

[1747]

(C) STRANDEDNESS: single

[1748]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:125:

[1749]

Val Asn Arg Ile Ala Ala Gly Glu Val Ile Gln Arg Pro Ala Asn Ala 1 5 10 15

Ile Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys Ser Thr Ser Ile

[1750]

20 25 30

[1751]

Gln Val Ile Val Lys Glu Gly Gly Leu Lys Leu Ile Gln Ile Gln Asp

[1752]

35 40 45

[1753]

Asn Gly Thr Gly Ile Arg Lys Glu Asp Leu Asp Ile Val Cys Glu Arg 50 55 60

[1754]

(2) INFORMATION FOR SEQ ID NO:126: (i) SEQUENCE CHARACTERISTICS:

[1755]

(A) LENGTH: 52 amino acids

[1756]

(B) TYPE: amino acid

[1757]

(C) STRANDEDNESS: single

[1758]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

[1759]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:126:

Pro Ala Asn Ala Ile Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys 1 5 10 15

[1760]

Ser Thr Asn Ile Gln Val Val Val Lys Glu Gly Gly Leu Lys Leu Ile

[1761]

20 25 30

[1762]

Gln Ile Gln Asp Asn Gly Thr Gly Ile Arg Lys Glu Asp Leu Asp Ile

[1763]

35 40 45

[1764]

Val Cys Glu Arg 50

[1765]

(2) INFORMATION FOR SEQ ID NO:127: (i) SEQUENCE CHARACTERISTICS:

[1766]

(A) LENGTH: 64 amino acids

[1767]

(B) TYPE: amino acid

[1768]

(C) STRANDEDNESS: single

[1769]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

[1770]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:127:

[1771]

Val Asn Lys Ile Ala Ala Gly Glu Ile Ile Ile Ser Pro Val Asn Ala 1 5 10 15

[1772]

Leu Lys Glu Met Met Glu Asn Ser Ile Asp Ala Asn Ala Thr Met Ile

[1773]

20 25 30

[1774]

Asp Ile Leu Val Lys Glu Gly Gly Ile Lys Val Leu Gln Ile Thr Asp

[1775]

35 40 45

[1776]

Asn Gly Ser Gly Ile Asn Lys Ala Asp Leu Pro Ile Leu Cys Glu Arg 50 55 60

(2) INFORMATION FOR SEQ ID NO:128: (i) SEQUENCE CHARACTERISTICS:

[1777]

(A) LENGTH: 64 amino acids

[1778]

(B) TYPE: amino acid

[1779]

(C) STRANDEDNESS: single

[1780]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

[1781]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:128:

[1782]

Val His Arg Ile Thr Ser Gly Gln Val Ile Thr Asp Leu Thr Thr Ala 1 5 10 15

[1783]

Val Lys Glu Leu Val Asp Asn Ser Ile Asp Ala Asn Ala Asn Gln Ile

[1784]

20 25 30

[1785]

Glu Ile Ile Phe Lys Asp Tyr Gly Leu Glu Ser Ile Glu Cys Ser Asp

[1786]

35 40 45

[1787]

Asn Gly Asp Gly Ile Asp Pro Ser Asn Tyr Glu Phe Leu Ala Leu Lys 50 55 60

[1788]

(2) INFORMATION FOR SEQ ID NO:129: (i) SEQUENCE CHARACTERISTICS:

[1789]

(A) LENGTH: 64 amino acids

[1790]

(B) TYPE: amino acid

[1791]

(C) STRANDEDNESS: single

[1792]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

[1793]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:129:

[1794]

Ala Asn Gln Ile Ala Ala Gly Glu Val Val Glu Arg Pro Ala Ser Val 1 5 10 15

[1795]

Val Lys Glu Leu Val Glu Asn Ser Leu Asp Ala Gly Ala Thr Arg Ile

[1796]

20 25 30

[1797]

Asp Ile Asp Ile Glu Arg Gly Gly Ala Lys Leu Ile Arg Ile Arg Asp

[1798]

35 40 45

[1799]

Asn Gly Cys Gly Ile Lys Lys Asp Glu Leu Ala Leu Ala Leu Ala Arg 50 55 60

[1800]

(2) INFORMATION FOR SEQ ID NO:130: (i) SEQUENCE CHARACTERISTICS:

[1801]

(A) LENGTH: 64 amino acids

[1802]

(B) TYPE: amino acid

[1803]

(C) STRANDEDNESS: single

[1804]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

[1805]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:130:

[1806]

Ala Asn Gln Ile Ala Ala Gly Glu Val Val Glu Arg Pro Ala Ser Val 1 5 10 15

Val Lys Glu Leu Val Glu Asn Ser Leu Asp Ala Gly Ala Thr Arg Val

[1807]

20 25 30

[1808]

Asp Ile Asp Ile Glu Arg Gly Gly Ala Lys Leu Ile Arg Ile Arg Asp

[1809]

35 40 45

[1810]

Asn Gly Cys Gly Ile Lys Lys Glu Glu Leu Ala Leu Ala Leu Ala Arg 50 55 60

[1811]

(2) INFORMATION FOR SEQ ID NO:131: (i) SEQUENCE CHARACTERISTICS:

[1812]

(A) LENGTH: 64 amino acids

[1813]

(B) TYPE: amino acid

[1814]

(C) STRANDEDNESS: single

[1815]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

[1816]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:131:

[1817]

Ala Asn Gln Ile Ala Ala Gly Glu Val Ile Glu Arg Pro Ala Ser Val 1 5 10 15

[1818]

Cys Lys Glu Leu Val Glu Asn Ala Ile Asp Ala Gly Ser Ser Gln Ile

[1819]

20 25 30

[1820]

Ile Ile Glu Ile Glu Glu Ala Gly Leu Lys Lys Val Gln Ile Thr Asp

[1821]

35 40 45

[1822]

Asn Gly His Gly Ile Ala His Asp Glu Val Glu Leu Ala Leu Arg Arg 50 55 60
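SEQ ID NOS:124, 125 and 127 to 131 are each 64 residues long. SEQ ID NO:124 and NO:125 differ at a single position: Phe versus Ser at residue 29. Two equal-length segments can be compared position by position, provided they are already aligned without gaps. The listing does not state this explicitly, so treat it as an assumption. The sketch below uses hypothetical helper names and a hand conversion of SEQ ID NOS:124 and 125 to one-letter code.

```python
# Sketch: position-by-position comparison of two pre-aligned, equal-length
# protein segments (no gap handling). Names are illustrative, not from the source.
from __future__ import annotations


def mismatches(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return 1-based (position, residue_a, residue_b) for each differing site."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length (no gap handling)")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(a: str, b: str) -> float:
    """Identical positions as a percentage of segment length."""
    diffs = mismatches(a, b)
    return 100.0 * (len(a) - len(diffs)) / len(a)


# SEQ ID NO:124 and NO:125, converted by hand to one-letter code (16 residues per row).
SEQ_124 = "VNRIAAGEVIQRPANA" "IKEMIENCLDAKFTSI" "QVIVKEGGLKLIQIQD" "NGTGIRKEDLDIVCER"
SEQ_125 = "VNRIAAGEVIQRPANA" "IKEMIENCLDAKSTSI" "QVIVKEGGLKLIQIQD" "NGTGIRKEDLDIVCER"

print(mismatches(SEQ_124, SEQ_125))        # [(29, 'F', 'S')]
print(percent_identity(SEQ_124, SEQ_125))  # 98.4375
```

Aligning segments that contain insertions or deletions would require a gapped method such as Needleman-Wunsch. This simple per-position count does not handle that case.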

[1823]

(2) INFORMATION FOR SEQ ID NO:132: (i) SEQUENCE CHARACTERISTICS:

[1824]

(A) LENGTH: 2687 base pairs

[1825]

(B) TYPE: nucleic acid

[1826]

(C) STRANDEDNESS: single

[1827]

(D) TOPOLOGY: linear

[1828]

(ii) MOLECULE TYPE: DNA (genomic) (viii) POSITION IN GENOME:

[1829]

(B) MAP POSITION: 7q

[1830]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:132:

[1831]

CCATGGAGCG AGCTGAGAGC TCGAGTACAG AACCTGCTAA GGCCATCAAA CCTATTGATC 60

[1832]

GGAAGTCAGT CCATCAGATT TGCTCTGGGC AGGTGGTACT GAGTCTAAGC ACTGCGGTAA 120

[1833]

AGGAGTTAGT AGAAAACAGT CTGGATGCTG GTGCCACTAA TATTGATCTA AAGCTTAAGG 180

[1834]

ACTATGGAGT GGATCTTATT GAAGTTTCAG ACAATGGATG TGGGGTAGAA GAAGAAAACT 240

[1835]

TCGAAGGCTT AACTCTGAAA CATCACACAT CTAAGATTCA AGAGTTTGCC GACCTAACTC 300

[1836]

AGGTTGAAAC TTTTGGCTTT CGGGGGGAAG CTCTGAGCTC ACTTTGTGCA CTGAGCGATG 360

[1837]

TCACCATTTC TACCTGCCAC GCATCGGCGA AGGTTGGAAC TCGACTGATG TTTGATCACA 420

[1838]

ATGGGAAAAT TATCCAGAAA ACCCCCTACC CCCGCCCCAG AGGGACCACA GTCAGCGTGC 480

[1839]

AGCAGTTATT TTCCACACTA CCTGTGCGCC ATAAGGAATT TCAAAGGAAT ATTAAGAAGG 540

[1840]

AGTATGCCAA AATGGTCCAG GTCTTACATG CATACTGTAT CATTTCAGCA GGCATCCGTG 600

[1841]

TAAGTTGCAC CAATCAGCTT GGACAAGGAA AACGACAGCC TGTGGTATGC ACAGGTGGAA 660

GCCCCAGCAT AAAGGAAAAT ATCGGCTCTG TGTTTGGGCA GAAGCAGTTG CAAAGCCTCA 720

TTCCTTTTGT TCAGCTGCCC CCTAGTGACT CCGTGTGTGA AGAGTACGGT TTGAGCTGTT 780

CGGATGCTCT GCATAATCTT TTTTACATCT CAGGTTTCAT TTCACAATGC ACGCATGGAG 840

TTGGAAGGAG TTCAACAGAC AGACAGTTTT TCTTTATCAA CCGGCGGCCT TGTGACCCAG 900

CAAAGGTCTG CAGACTCGTG AATGAGGTCT ACCACATGTA TAATCGACAC CAGTATCCAT 960

TTGTTGTTCT TAACATTTCT GTTGATTCAG AATGCGTTGA TATCAATGTT ACTCCAGATA 1020

AAAGGCAAAT TTTGCTACAA GAGGAAAAGC TTTTGTTGGC AGTTTTAAAG ACCTCTTTGA 1080

TAGGAATGTT TGATAGTGAT GTCAACAAGC TAAATGTCAG TCAGCAGCCA CTGCTGGATG 1140

TTGAAGGTAA CTTAATAAAA ATGCATGCAG CGGATTTGGA AAAGCCCATG GTAGAAAAGC 1200

AGGATCAATC CCCTTCATTA AGGACTGGAG AAGAAAAAAA AGACGTGTCC ATTTCCAGAC 1260

TGCGAGAGGC CTTTTCTCTT CGTCACACAA CAGAGAACAA GCCTCACAGC CCAAAGACTC 1320

CAGAACCAAG AAGGAGCCCT CTAGGACAGA AAAGGGGTAT GCTGTCTTCT AGCACTTCAG 1380

GTGCCATCTC TGACAAAGGC GTCCTGAGAT CTCAGAAAGA GGCAGTGAGT TCCAGTCACG 1440

GACCCAGTGA CCCTACGGAC AGAGCGGAGG TGGAGAAGGA CTCGGGGCAC GGCAGCACTT 1500

CCGTGGATTC TGAGGGGTTC AGCATCCCAG ACACGGGCAG TCACTGCAGC AGCGAGTATG 1560

CGGCCAGCTC CCCAGGGGAC AGGGGCTCGC AGGAACATGT GGACTCTCAG GAGAAAGCGC 1620

CTGAAACTGA CGACTCTTTT TCAGATGTGG ACTGCCATTC AAACCAGGAA GATACCGGAT 1680

GTAAATTTCG AGTTTTGCCT CAGCCAACTA ATCTCGCAAC CCCAAACACA AAGCGTTTTA 1740

AAAAAGAAGA AATTCTTTCC AGTTCTGACA TTTGTCAAAA GTTAGTAAAT ACTCAGGACA 1800

TGTCAGCCTC TCAGGTTGAT TGAGCTGTGA AAATTAATAA GAAAGTTGTG CCCCTGGACT 1860

TTTCTATGAG TTCTTTAGCT AAACGAATAA AGCAGTTACA TCATGAAGCA CAGCAAAGTG 1920

AAGGGGAACA GAATTACAGG AAGTTTAGGG CAAAGATTTG TCCTGGAGAA AATCAAGCAG 1980

CCGAAGATGA ACTAAGAAAA GAGATAAGTA AAACGATGTT TGCAGAAATG GAAATCATTG 2040

GTCAGTTTAA CCTGGGATTT ATAATAACCA AACTGAATGA GGATATCTTC ATAGTGGACC 2100

AGCATGCCAC GGACGAGAAG TATAACTTCG AGATGCTGCA GCAGCACACC GTGCTCCAGG 2160

GGCAGAGGCT CATAGCACCT CAGACTCTCA ACTTAACTGC TGTTAATGAA GCTGTTCTGA 2220

TAGAAAATCT GGAAATATTT AGAAAGAATG GCTTTGATTT TGTTATCGAT GAAAATGCTC 2280

CAGTCACTGA AAGGGCTAAA CTGATTTCCT TGCCAACTAG TAAAAACTGG ACCTTCGGAC 2340

CCCAGGACGT CGATGAACTG ATCTTCATGC TGAGCGACAG CCCTGGGGTC ATGTGCCGCC 2400

CTTCCCGAGT CAAGCAGATG TTTGCCTCCA GAGCCTGCCG GAAGTCGGTG ATGATTGGGA 2460

CTGCTCTCAA CACAAGCGAA TGAAGAAACT GATCACCCAC ATGGGGGAGA TGGGCCACCC 2520

CTGGAACTGT CCCCATGGAA GGCCACCATG AGACACATCG CCAACCTGGG TGTCATTTCT 2580

CAGAACTGAC CGTAGTCACT GTATGGAATA ATTGGTTTTA TCGCAGATTT TTATGTTTTG 2640

AAAGACAGAG TCTTCACTAA CCTTTTTTGT TTTAAAATGA AACCTGC 2687
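The ATG at positions 3 to 5 of SEQ ID NO:132 begins a reading frame. Its opening codons (ATG GAG CGA GCT GAG AGC ...) encode Met Glu Arg Ala Glu Ser ..., which is the start of SEQ ID NO:133. The sketch below checks this with the standard genetic code. The function name and the choice to stop at the first in-frame stop codon are illustrative assumptions. Codons that contain ambiguous bases are reported as X.

```python
# Sketch: translate a DNA reading frame with the standard genetic code.
# The function name and stop-codon handling are illustrative choices.
from itertools import product

BASES = "TCAG"
# Amino acids for codons enumerated TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, start: int = 0, to_stop: bool = True) -> str:
    """Translate from a 0-based offset; codons with non-ACGT bases become 'X'."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break  # stop at the first in-frame stop codon
        protein.append(aa)
    return "".join(protein)


# First 60-base row of SEQ ID NO:132, read from the ATG at 0-based offset 2
row_60 = "CCATGGAGCG AGCTGAGAGC TCGAGTACAG AACCTGCTAA GGCCATCAAA CCTATTGATC"
print(translate(row_60, start=2))
```

With `start=2`, the first row translates to MERAESSSTEPAKAIKPID, which matches residues 1 to 19 of SEQ ID NO:133.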

[1842]

(2) INFORMATION FOR SEQ ID NO:133: (i) SEQUENCE CHARACTERISTICS:

[1843]

(A) LENGTH: 862 amino acids

[1844]

(B) TYPE: amino acid

[1845]

(C) STRANDEDNESS: single

[1846]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

[1847]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:133:

[1848]

Met Glu Arg Ala Glu Ser Ser Ser Thr Glu Pro Ala Lys Ala Ile Lys 1 5 10 15

Pro Ile Asp Arg Lys Ser Val His Gln Ile Cys Ser Gly Gln Val Val

[1849]

20 25 30

[1850]

Leu Ser Leu Ser Thr Ala Val Lys Glu Leu Val Glu Asn Ser Leu Asp

[1851]

35 40 45

[1852]

Ala Gly Ala Thr Asn Ile Asp Leu Lys Leu Lys Asp Tyr Gly Val Asp

[1853]

50 55 60

[1854]

Leu Ile Glu Val Ser Asp Asn Gly Cys Gly Val Glu Glu Glu Asn Phe 65 70 75 80

[1855]

Glu Gly Leu Thr Leu Lys His His Thr Ser Lys Ile Gln Glu Phe Ala

[1856]

85 90 95

[1857]

Asp Leu Thr Gln Val Glu Thr Phe Gly Phe Arg Gly Glu Ala Leu Ser

[1858]

100 105 110

[1859]

Ser Leu Cys Ala Leu Ser Asp Val Thr Ile Ser Thr Cys His Ala Ser

[1860]

115 120 125

[1861]

Ala Lys Val Gly Thr Arg Leu Met Phe Asp His Asn Gly Lys Ile Ile

[1862]

130 135 140

[1863]

Gln Lys Thr Pro Tyr Pro Arg Pro Arg Gly Thr Thr Val Ser Val Gln 145 150 155 160

[1864]

Gln Leu Phe Ser Thr Leu Pro Val Arg His Lys Glu Phe Gln Arg Asn

[1865]

165 170 175

[1866]

Ile Lys Lys Glu Tyr Ala Lys Met Val Gln Val Leu His Ala Tyr Cys

[1867]

180 185 190

[1868]

Ile Ile Ser Ala Gly Ile Arg Val Ser Cys Thr Asn Gln Leu Gly Gln

[1869]

195 200 205

[1870]

Gly Lys Arg Gln Pro Val Val Cys Ile Gly Gly Ser Pro Ser Ile Lys

[1871]

210 215 220

[1872]

Glu Asn Ile Gly Ser Val Phe Gly Gln Lys Gln Leu Gln Ser Leu Ile 225 230 235 240

[1873]

Pro Phe Val Gln Leu Pro Pro Ser Asp Ser Val Cys Glu Glu Tyr Gly

[1874]

245 250 255

[1875]

Leu Ser Cys Ser Asp Ala Leu His Asn Leu Phe Tyr Ile Ser Gly Phe

[1876]

260 265 270

[1877]

Ile Ser Gln Cys Thr His Gly Val Gly Arg Ser Ser Thr Asp Arg Gln

[1878]

275 280 285

[1879]

Phe Phe Phe Ile Asn Arg Arg Pro Cys Asp Pro Ala Lys Val Cys Arg

[1880]

290 295 300

[1881]

Leu Val Asn Glu Val Tyr His Met Tyr Asn Arg His Gln Tyr Pro Phe 305 310 315 320

[1882]

Val Val Leu Asn Ile Ser Val Asp Ser Glu Cys Val Asp Ile Asn Val

[1883]

325 330 335

[1884]

Thr Pro Asp Lys Arg Gln Ile Leu Leu Gln Glu Glu Lys Leu Leu Leu

[1885]

340 345 350

[1886]

Ala Val Leu Lys Thr Ser Leu Ile Gly Met Phe Asp Ser Asp Val Asn 355 360 365

Lys Leu Asn Val Ser Gln Gln Pro Leu Leu Asp Val Glu Gly Asn Leu

[1887]

370 375 380

[1888]

Ile Lys Met His Ala Ala Asp Leu Glu Lys Pro Met Val Glu His Gln 385 390 395 400

[1889]

Asp Gln Ser Pro Ser Leu Arg Ile Gly Glu Glu Lys Lys Asp Val Ser

[1890]

405 410 415

[1891]

Ile Ser Arg Leu Arg Glu Ala Phe Ser Leu Arg His Thr Thr Glu Asn

[1892]

420 425 430

[1893]

Lys Pro His Ser Pro Lys Thr Pro Glu Pro Arg Arg Ser Pro Leu Gly

[1894]

435 440 445

[1895]

Gln Lys Arg Gly Met Leu Ser Ser Ser Thr Ser Gly Ala Ile Ser Asp

[1896]

450 455 460

[1897]

Lys Gly Val Leu Arg Ser Gln Lys Glu Ala Val Ser Ser Ser His Gly 465 470 475 480

[1898]

Pro Ser Asp Pro Thr Asp Arg Ala Glu Val Glu Lys Asp Ser Gly His

[1899]

485 490 495

[1900]

Gly Ser Thr Ser Val Asp Ser Glu Gly Phe Ser Ile Pro Asp Thr Gly

[1901]

500 505 510

[1902]

Ser His Cys Ser Ser Glu Tyr Ala Ala Ser Ser Pro Gly Asp Arg Gly

[1903]

515 520 525

[1904]

Ser Gln Glu His Val Asp Ser Gln Glu Lys Ala Pro Glu Thr Asp Asp

[1905]

530 535 540

[1906]

Ser Phe Ser Asp Val Asp Cys His Ser Asn Gln Glu Asp Thr Gly Cys 545 550 555 560

[1907]

Lys Phe Arg Val Leu Pro Gln Pro Ile Asn Leu Ala Thr Pro Asn Thr

[1908]

565 570 575

[1909]

Lys Arg Phe Lys Lys Glu Glu Ile Leu Ser Ser Ser Asp Ile Cys Gln

[1910]

580 585 590

[1911]

Lys Leu Val Asn Thr Gln Asp Met Ser Ala Ser Gln Val Asp Val Ala

[1912]

595 600 605

[1913]

Val Lys Ile Asn Lys Lys Val Val Pro Leu Asp Phe Ser Met Ser Ser

[1914]

610 615 620

[1915]

Leu Ala Lys Arg Ile Lys Gln Leu His His Glu Ala Gln Gln Ser Glu 625 630 635 640

[1916]

Gly Glu Gln Asn Tyr Arg Lys Phe Arg Ala Lys Ile Cys Pro Gly Glu

[1917]

645 650 655

[1918]

Asn Gln Ala Ala Glu Asp Glu Leu Arg Lys Glu Ile Ser Lys Thr Met

[1919]

660 665 670

[1920]

Phe Ala Glu Met Glu Ile Ile Gly Gln Phe Asn Leu Gly Phe Ile Ile

[1921]

675 680 685

[1922]

Thr Lys Leu Asn Glu Asp Ile Phe Ile Val Asp Gln His Ala Thr Asp

[1923]

690 695 700

[1924]

Glu Lys Tyr Asn Phe Glu Met Leu Gln Gln His Thr Val Leu Gln Gly 705 710 715 720

Gln Arg Leu Ile Ala Pro Gln Thr Leu Asn Leu Thr Ala Val Asn Glu

[1925]

725 730 735

[1926]

Ala Val Leu Ile Glu Asn Leu Glu Ile Phe Arg Lys Asn Gly Phe Asp

[1927]

740 745 750

[1928]

Phe Val Ile Asp Glu Asn Ala Pro Val Thr Glu Arg Ala Lys Leu Ile

[1929]

755 760 765

[1930]

Ser Leu Pro Thr Ser Lys Asn Trp Thr Phe Gly Pro Gln Asp Val Asp

[1931]

770 775 780

[1932]

Glu Leu Ile Phe Met Leu Ser Asp Ser Pro Gly Val Met Cys Arg Pro 785 790 795 800

[1933]

Ser Arg Val Lys Gln Met Phe Ala Ser Arg Ala Cys Arg Lys Ser Val

[1934]

805 810 815

[1935]

Met Ile Gly Thr Ala Leu Asn Thr Ser Glu Met Lys Lys Leu Ile Thr

[1936]

820 825 830

[1937]

His Met Gly Glu Met Gly His Pro Trp Asn Cys Pro His Gly Arg Pro

[1938]

835 840 845

[1939]

Thr Met Arg His Ile Ala Asn Leu Gly Val Ile Ser Gln Asn 850 855 860

[1940]

(2) INFORMATION FOR SEQ ID NO:134: (i) SEQUENCE CHARACTERISTICS:

[1941]

(A) LENGTH: 903 amino acids

[1942]

(B) TYPE: amino acid

[1943]

(C) STRANDEDNESS: single

[1944]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

[1945]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:134:

[1946]

Met Phe His His Ile Glu Asn Leu Leu Ile Glu Thr Glu Lys Arg Cys 1 5 10 15

[1947]

Lys Gln Lys Glu Gln Arg Tyr Ile Pro Val Lys Tyr Leu Phe Ser Met

[1948]

20 25 30

[1949]

Thr Gln Ile His Gln Ile Asn Asp Ile Asp Val His Arg Ile Thr Ser

[1950]

35 40 45

[1951]

Gly Gln Val Ile Thr Asp Leu Thr Thr Ala Val Lys Glu Leu Val Asp

[1952]

50 55 60

[1953]

Asn Ser Ile Asp Ala Asn Ala Asn Gln Ile Glu Ile Ile Phe Lys Asp 65 70 75 80

[1954]

Tyr Gly Leu Glu Ser Ile Glu Cys Ser Asp Asn Gly Asp Gly Ile Asp

[1955]

85 90 95

[1956]

Pro Ser Asn Tyr Glu Phe Leu Ala Leu Lys His Tyr Thr Ser Lys Ile

[1957]

100 105 110

[1958]

Ala Lys Phe Gln Asp Val Ala Lys Val Gln Thr Leu Gly Phe Arg Gly

[1959]

115 120 125

[1960]

Glu Ala Leu Ser Ser Leu Cys Gly Ile Ala Lys Leu Ser Val Ile Thr 130 135 140

Thr Thr Ser Pro Pro Lys Ala Asp Lys Leu Glu Tyr Asp Met Val Gly 145 150 155 160

[1961]

His Ile Thr Ser Lys Thr Thr Ser Arg Asn Lys Gly Thr Thr Val Leu

[1962]

165 170 175

[1963]

Val Ser Gln Leu Phe His Asn Leu Pro Val Arg Gln Lys Glu Phe Ser

[1964]

180 185 190

[1965]

Lys Thr Phe Lys Arg Gln Phe Thr Lys Cys Leu Thr Val Ile Gln Gly

[1966]

195 200 205

[1967]

Tyr Ala Ile Ile Asn Ala Ala Ile Lys Phe Ser Val Trp Asn Ile Thr

[1968]

210 215 220

[1969]

Pro Lys Gly Lys Lys Asn Leu Ile Leu Ser Thr Met Arg Asn Ser Ser 225 230 235 240

[1970]

Met Arg Lys Asn Ile Ser Ser Val Phe Gly Ala Gly Gly Met Phe Gly

[1971]

245 250 255

[1972]

Leu Glu Glu Val Asp Leu Val Leu Asp Leu Asn Pro Phe Lys Asn Arg

[1973]

260 265 270

[1974]

Met Leu Gly Lys Tyr Thr Asp Asp Pro Asp Phe Leu Asp Leu Asp Tyr

[1975]

275 280 285

[1976]

Lys Ile Arg Val Lys Gly Tyr Ile Ser Gln Asn Ser Phe Gly Cys Gly

[1977]

290 295 300

[1978]

Arg Asn Ser Lys Asp Arg Gln Phe Ile Tyr Val Asn Lys Arg Pro Val 305 310 315 320

[1979]

Glu Tyr Ser Thr Leu Leu Lys Cys Cys Asn Glu Val Tyr Lys Thr Phe

[1980]

325 330 335

[1981]

Asn Asn Val Gln Phe Pro Ala Val Phe Leu Asn Leu Glu Leu Pro Met

[1982]

340 345 350

[1983]

Ser Leu Ile Asp Val Asn Val Thr Pro Asp Lys Arg Val Ile Leu Leu

[1984]

355 360 365

[1985]

His Asn Glu Arg Ala Val Ile Asp Ile Phe Lys Thr Thr Leu Ser Asp

[1986]

370 375 380

[1987]

Tyr Tyr Asn Arg Gln Glu Leu Ala Leu Pro Lys Arg Met Cys Ser Gln 385 390 395 400

[1988]

Ser Glu Gln Gln Ala Gln Lys Arg Leu Lys Thr Glu Val Phe Asp Asp

[1989]

405 410 415

[1990]

Arg Ser Thr Thr His Glu Ser Asp Asn Glu Asn Tyr His Thr Ala Arg

[1991]

420 425 430

[1992]

Ser Glu Ser Asn Gln Ser Asn His Ala His Phe Asn Ser Thr Thr Gly

[1993]

435 440 445

[1994]

Val Ile Asp Lys Ser Asn Gly Thr Glu Leu Thr Ser Val Met Asp Gly

[1995]

450 455 460

[1996]

Asn Tyr Thr Asn Val Thr Asp Val Ile Gly Ser Glu Cys Glu Val Ser 465 470 475 480

[1997]

Val Asp Ser Ser Val Val Leu Asp Glu Gly Asn Ser Ser Thr Pro Thr 485 490 495

Lys Lys Leu Pro Ser Ile Lys Thr Asp Ser Gln Asn Leu Ser Asp Leu

[1998]

500 505 510

[1999]

Asn Leu Asn Asn Phe Ser Asn Pro Glu Phe Gln Asn Ile Thr Ser Pro

[2000]

515 520 525

[2001]

Asp Lys Ala Arg Ser Leu Glu Lys Val Val Glu Glu Pro Val Tyr Phe

[2002]

530 535 540

[2003]

Asp Ile Asp Gly Glu Lys Phe Gln Glu Lys Ala Val Leu Ser Gln Ala 545 550 555 560

[2004]

Asp Gly Leu Val Phe Val Asp Asn Glu Cys His Glu His Thr Asn Asp

[2005]

565 570 575

[2006]

Cys Cys His Gln Glu Arg Arg Gly Ser Thr Asp Ile Glu Gln Asp Asp

[2007]

580 585 590

[2008]

Glu Ala Asp Ser Ile Tyr Ala Glu Ile Glu Pro Val Glu Ile Asn Val

[2009]

595 600 605

[2010]

Arg Thr Pro Leu Lys Asn Ser Arg Lys Ser Ile Ser Lys Asp Asn Tyr

[2011]

610 615 620

[2012]

Arg Ser Leu Ser Asp Gly Leu Thr His Arg Lys Phe Glu Asp Glu Ile 625 630 635 640

[2013]

Leu Glu Tyr Asn Leu Ser Thr Lys Asn Phe Lys Glu Ile Ser Lys Asn

[2014]

645 650 655

[2015]

Gly Lys Gln Met Ser Ser Ile Ile Ser Lys Arg Lys Ser Glu Ala Gln

[2016]

660 665 670

[2017]

Glu Asn Ile Ile Lys Asn Lys Asp Glu Leu Glu Asp Phe Glu Gln Gly

[2018]

675 680 685

[2019]

Glu Lys Tyr Leu Thr Leu Thr Val Ser Lys Asn Asp Phe Lys Lys Met

[2020]

690 695 700

[2021]

Glu Val Val Gly Gln Phe Asn Leu Gly Phe Ile Ile Val Thr Arg Lys 705 710 715 720

[2022]

Val Asp Asn Lys Ser Lys Leu Phe Ile Val Asp Gln His Ala Ser Asp

[2023]

725 730 735

[2024]

Glu Lys Tyr Asn Phe Glu Thr Leu Gln Ala Val Thr Val Phe Lys Ser

[2025]

740 745 750

[2026]

Gln Lys Leu Ile Ile Pro Gln Pro Val Glu Leu Ser Val Ile Asp Glu

[2027]

755 760 765

[2028]

Leu Val Val Leu Asp Asn Leu Pro Val Phe Glu Lys Asn Gly Phe Lys

[2029]

770 775 780

[2030]

Leu Lys Ile Asp Glu Glu Glu Glu Phe Gly Ser Arg Val Lys Leu Leu 785 790 795 800

[2031]

Ser Leu Pro Thr Ser Lys Gln Thr Leu Phe Asp Leu Gly Asp Phe Asn

[2032]

805 810 815

[2033]

Glu Leu Ile His Leu Ile Lys Glu Asp Gly Gly Leu Arg Arg Asp Asn

[2034]

820 825 830

[2035]

Ile Arg Cys Ser Lys Ile Arg Ser Met Phe Ala Met Arg Ala Cys Arg

835 840 845

Ser Ser Ile Met Ile Gly Lys Pro Leu Asn Lys Lys Thr Met Thr Arg

[2036]

850 855 860

[2037]

Val Val His Asn Leu Ser Glu Leu Asp Lys Pro Trp Asn Cys Pro His 865 870 875 880

[2038]

Gly Arg Pro Thr Met Arg His Leu Met Glu Ile Arg Asp Trp Ser Ser

[2039]

885 890 895

[2040]

Phe Ser Lys Asp Tyr Glu Ile 900

[2041]

(2) INFORMATION FOR SEQ ID NO:135: (i) SEQUENCE CHARACTERISTICS:

[2042]

(A) LENGTH: 2577 base pairs

[2043]

(B) TYPE: nucleic acid

[2044]

(C) STRANDEDNESS: single

[2045]

(D) TOPOLOGY: linear

[2046]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:135:

TTCCGGCCAA TGCTATCAAA GAGATGATAG AAAACTGTTT AGATGCAAAA TCTACAAATA 60
TTCAAGTGGT TGTTAAGGAA GGTGGCCTGA AGCTAATTCA GATCCAAGAC AATGGCACTG 120
GAATCAGGAA GGAAGATCTG GATATTGTGT GTGAGAGGTT CACTACGAGT AAACTGCAGA 180
CTTTTGAGGA TTTAGCCAGT ATTTCTACCT ATGGCTTTCG TGGTGAGCAT TTGGCAAGCA 240
TAAGTCATGT GGCCCATGTC ACTATTACAA CCAAAACAGC TGATGGGAAA TGTGCGTACA 300
GAGCAAGTTA CTCAGATGGA AAGCTGCAAG CCCCTCCTAA ACCCTGTGCA GGCAACCAGG 360
GCACCCTGAT CACGGTGGAA GACCTTTTTT ACAACATAAT CACAAGGAGG AAAGCTTTAA 420
AAAATCCAAG TGAAGAGTAC GGAAAAATTT TGGAAGTTGT TGGCAGGTAT TCAATACACA 480
ATTCAGGCAT TAGTATCTCA GTTAAAAAAC AAGGTGAGAC AGTATCTGAT GTCAGAACAC 540
TGCCCAATGC CACAACCGTG GACAACATTC GCTCCATCTT TGGAAATGCG GTTAGTCGAG 600
AACTGATAGA AGTTGGGTGT GAGGATAAAA CCCTAGCTTT CAAAATGAAT GGCTATATAT 660
CGAATGCAAA GTATTCAGTG AAGAAGTGCA TTTTCCTACT CTTCATCAAC CACCGTCTGG 720
TAGAATCAGC TGCCTTGAGA AAAGCCATTG AAACTGTATA TGCAGCATAC TTGCCAAAAA 780
CACACACCCA TTCCTGTACC TCAGTTTGAA ATCAGCCCTC AGAACGTGAC GTCAATGTAC 840
ACCCCACCAA GACAGAAGTT CATTTTCTGC ACGAGGAGAG CATTCTGCAG CGTGTGCAGC 900
AGCACATTGA GAGCAAGCTG CTGGGCTCCA ATTCCTCCAG GATGTATTTC ACCCAGACCT 960
TGCTTCCAGG ACTTGCTGGG CCTCTGGGGA GGCAGCTAGA CCCACGACAG GGGTGGCTTC 1020
CTCATCCACT AGTGGAAGTG GCGACAAGGT CTACGCTTAC CAGATGTCGC GTACGGACTC 1080
CCGGGATCAG AAGCTTGACG CCTTTCTGCA GCCTGTAACC AGCCTTGTGC CCAGCCAGCC 1140
CCAGGACCCT CGCCCTGTCC GAGGGGCCAG GACAGAGGGC TCTCCTGAAA GGGCCACGCG 1200
GGAGGATGAG GAGATGCTTG CTCTCCCAGC CCCCGCTGAA GCAGCTGCTG AGAGTGAGAA 1260
CTTGGAGAGG GAATCACTAA TGGAGACTTC AGACGCAGCC CAGAAAGCGG CACCCACTTC 1320
CAGTCCAGGA AGCTCCAGAA AGAGTCATCG GGAGGACTCT GATGTGGAAA TGGTGGAAAA 1380
TGCTTCCGGG AAGGAAATGA CAGCTGCTTG CTACCCCAGG AGGAGGATCA TTAACCTCAC 1440
CAGCGTCTTG AGTCTCCAGG AAGAGATTAG TGAGCGGTGC CATGAGACTC TCCGGGAGAT 1500
ACTCCGTAAC CATTCCTTTG TGGGCTGTGT GAATCCTCAG TGGGCCTTGG CACAGCACCA 1560
GACCAAGCTA TACCTCCTCA ACACTACCAA GCTCAGTGAA GAGCTGTTCT ACCAGATACT 1620
CATTTATGAT TTTGCCAACT TTGGTGTTCT GAGGTTATCG GAACCAGCGC CACTCTTCGA 1680
CCTGGCCATG CTGGCTTAGA CAGTCCTGAA AGTGGCTGGA CAGAGGACGA CGGCCCGAAG 1740
AAGGGCTTGC AGAGTACATT GTCGAGTTTC TGAAGAGAAG CGAGATGCTT GCAGACTATT 1800

[2047]

CTCTGTGAGA TCGATGAGAA GGGAACCTGA TTGATTACTC TTCTGATGAC AGCTATGTGC 1860

[2048]

CACCTTTGGA GGGACTGCCT ATCTTCATTC TTCGACTGGC CACTGAGGTG AATTGGGTGA 1920

[2049]

AGAAAAGGAG TGTTTTGAAA GTCTCAGTAA AGAATGTGCT ATGTTTTACT CCATTCGGAA 1980

[2050]

GCAGTATATA CTGGAGGAGT CGACCCTCTC AGGCCAGCAG AGTGACATGC CTGGCTCCAC 2040

[2051]

GTCAAAGCCC TGGAAGTGGA CTGTGGAGCA CATTATCTAT AAAGCCTTCC GCTCACACCT 2100

[2052]

CCTACCTCCG AAGCATTTCA CAGAAGATGG CAATGTCCTG CAGCTTGCCA ACCTGCCAGA 2160

[2053]

TCTATACAAA GTCTTTGAGC GGTGTTAAAT ACAATCATAG CCACCGTAGA GACTGCATGA 2220

[2054]

CCATCCAAGG CGAAGTGTAT GGTACTAATC TGGAAGCCAC AGAATAGGAC ACTTGGTTTC 2280

[2055]

AGCTCCAGGG TTTTCAGTGC TCACTATTCT TGTTCTGTAT CCCAGTATTG GTGCTGCAAC 2340

[2056]

TTAATGTACT TCACCTGTGG ATTGGCTGCA AATAAACTCA CGTGTATTGG AAAAAAGGAA 2400

[2057]

TTCCTGCAGC CCGGGGGATC CACTAGTTCT AGAGCGGCCG CCACCGGTGG AGCTCCAGCT 2460

[2058]

TTTGTTCCCT TTAGTGAGGG TTAATTTCGA GCTTGGCGTA ATCATGGTCA TAGCTGTTTC 2520

[2059]

CTGTGTGAAA TTGTTATCCG CTCACAATTC CACACAACAT ACGAGCCGGA AGCATAA 2577
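Nucleotide entries such as SEQ ID NO:135 are printed in rows of six 10-base groups, each row ending with the cumulative position of its last base. A minimal Python sketch, assuming exactly that layout, joins such rows back into one sequence and flags any row whose printed position disagrees with the running count. The function name `parse_rows` and its `offset` parameter are illustrative only, not part of any listing standard or tool.

```python
# Symbols allowed in a nucleotide row (IUPAC codes, upper case).
IUPAC_DNA = set("ACGTRYSWKMBDHVN")

def parse_rows(rows, offset=0):
    """Join listing rows into one sequence.

    Each row is assumed to be whitespace-separated base groups followed by
    the cumulative position of its last base. Returns the joined sequence and
    a list of (row, printed_position, computed_position) for rows that disagree.
    """
    parts, problems = [], []
    total = offset
    for row in rows:
        *groups, pos = row.split()
        chunk = "".join(groups).upper()
        if not set(chunk) <= IUPAC_DNA:
            raise ValueError(f"unexpected symbol in row: {row!r}")
        total += len(chunk)
        if int(pos) != total:
            problems.append((row, int(pos), total))
        parts.append(chunk)
    return "".join(parts), problems

# The last two rows of SEQ ID NO:135; the preceding row ends at position 2460.
TAIL = [
    "TTTGTTCCCT TTAGTGAGGG TTAATTTCGA GCTTGGCGTA ATCATGGTCA TAGCTGTTTC 2520",
    "CTGTGTGAAA TTGTTATCCG CTCACAATTC CACACAACAT ACGAGCCGGA AGCATAA 2577",
]
seq, problems = parse_rows(TAIL, offset=2460)
print(len(seq), problems)  # 117 []
```

Passing `offset` lets the check start partway through an entry; for a whole entry it stays at 0.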

[2060]

(2) INFORMATION FOR SEQ ID NO:136: (i) SEQUENCE CHARACTERISTICS:

[2061]

(A) LENGTH: 728 amino acids

[2062]

(B) TYPE: amino acid

[2063]

(C) STRANDEDNESS: single

[2064]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

[2065]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:136:

[2066]

Pro Ala Asn Ala Ile Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys 1 5 10 15

[2067]

Ser Thr Asn Ile Gln Val Val Val Lys Glu Gly Gly Leu Lys Leu Ile

[2068]

20 25 30

[2069]

Gln Ile Gln Asp Asn Gly Thr Gly Ile Arg Lys Glu Asp Leu Asp Ile

[2070]

35 40 45

[2071]

Val Cys Glu Arg Phe Thr Thr Ser Lys Leu Gln Thr Phe Glu Asp Leu

[2072]

50 55 60

[2073]

Ala Ser Ile Ser Thr Tyr Gly Phe Arg Gly Glu His Leu Ala Ser Ile 65 70 75 80

[2074]

Ser His Val Ala His Val Thr Ile Thr Thr Lys Thr Ala Asp Gly Lys

[2075]

85 90 95

[2076]

Cys Ala Tyr Arg Ala Ser Tyr Ser Asp Gly Lys Leu Gln Ala Pro Pro

[2077]

100 105 110

[2078]

Lys Pro Cys Ala Gly Asn Gln Gly Thr Leu Ile Thr Val Glu Asp Leu

[2079]

115 120 125

[2080]

Phe Tyr Asn Ile Ile Thr Arg Arg Lys Ala Leu Lys Asn Pro Ser Glu

[2081]

130 135 140

[2082]

Glu Tyr Gly Lys Ile Leu Glu Val Val Gly Arg Tyr Ser Ile His Asn 145 150 155 160

[2083]

Ser Gly Ile Ser Ile Ser Val Lys Lys Gln Gly Glu Thr Val Ser Asp

165 170 175

Val Arg Thr Leu Pro Asn Ala Thr Thr Val Asp Asn Ile Arg Ser Ile

[2084]

180 185 190

[2085]

Phe Gly Asn Ala Val Ser Arg Glu Leu Ile Glu Val Gly Cys Glu Asp

[2086]

195 200 205

[2087]

Lys Thr Leu Ala Phe Lys Met Asn Gly Tyr Ile Ser Asn Ala Lys Tyr

[2088]

210 215 220

[2089]

Ser Val Lys Lys Cys Ile Phe Leu Leu Phe Ile Asn His Arg Leu Val 225 230 235 240

[2090]

Glu Ser Ala Ala Leu Arg Lys Ala Ile Glu Thr Val Tyr Ala Ala Tyr

[2091]

245 250 255

[2092]

Leu Pro Lys Thr His Thr His Ser Cys Thr Ser Val Glx Asn Gln Pro

[2093]

260 265 270

[2094]

Ser Glu Arg Asp Val Asn Val His Pro Thr Lys Thr Glu Val His Phe

[2095]

275 280 285

[2096]

Leu His Glu Glu Ser Ile Leu Gln Arg Val Gln Gln His Ile Glu Ser

[2097]

290 295 300

[2098]

Lys Leu Leu Gly Ser Asn Ser Ser Arg Met Val Phe His Pro Asp Leu 305 310 315 320

[2099]

Ala Ser Arg Thr Cys Trp Ala Ser Gly Glu Ala Ala Arg Pro Thr Thr

[2100]

325 330 335

[2101]

Gly Val Ala Ser Ser Ser Thr Ser Gly Ser Gly Asp Lys Val Tyr Ala

[2102]

340 345 350

[2103]

Tyr Gln Met Ser Arg Thr Asp Ser Arg Asp Gln Lys Leu Asp Ala Phe

[2104]

355 360 365

[2105]

Leu Gln Pro Val Ser Ser Leu Val Pro Ser Gln Pro Gln Asp Pro Arg

[2106]

370 375 380

[2107]

Pro Val Arg Gly Ala Arg Thr Glu Gly Ser Pro Glu Arg Ala Thr Arg 385 390 395 400

[2108]

Glu Asp Glu Glu Met Leu Ala Leu Pro Ala Pro Ala Glu Ala Ala Ala

[2109]

405 410 415

[2110]

Glu Ser Glu Asn Leu Glu Arg Glu Ser Leu Met Glu Thr Ser Asp Ala

[2111]

420 425 430

[2112]

Ala Gln Lys Ala Ala Pro Thr Ser Ser Pro Gly Ser Ser Arg Lys Ser

[2113]

435 440 445

[2114]

His Arg Glu Asp Ser Asp Val Glu Met Val Glu Asn Ala Ser Gly Lys

[2115]

450 455 460

[2116]

Glu Met Thr Ala Ala Cys Tyr Pro Arg Arg Arg Ile Ile Asn Leu Thr 465 470 475 480

[2117]

Ser Val Leu Ser Leu Gln Glu Glu Ile Ser Glu Arg Cys His Glu Thr

[2118]

485 490 495

[2119]

Leu Arg Glu Ile Leu Arg Asn His Ser Phe Val Gly Cys Val Asn Pro

[2120]

500 505 510

[2121]

Gln Trp Ala Leu Ala Gln His Gln Thr Lys Leu Tyr Leu Leu Asn Thr

515 520 525

Thr Lys Leu Ser Glu Glu Leu Phe Tyr Gln Ile Leu Ile Tyr Asp Phe

[2122]

530 535 540

[2123]

Ala Asn Phe Gly Val Leu Arg Leu Ser Glu Pro Ala Pro Leu Phe Asp 545 550 555 560

[2124]

Leu Ala Met Leu Ala Glx Thr Val Leu Lys Val Ala Gly Gln Arg Thr

[2125]

565 570 575

[2126]

Thr Ala Arg Arg Arg Ala Cys Arg Val His Cys Arg Val Ser Glu Glu

[2127]

580 585 590

[2128]

Lys Arg Asp Ala Cys Arg Leu Phe Ser Val Arg Ser Met Arg Arg Glu

[2129]

595 600 605

[2130]

Pro Asp Glx Leu Leu Phe Glx Glx Gln Leu Cys Ala Thr Phe Gly Gly

[2131]

610 615 620

[2132]

Thr Ala Tyr Leu His Ser Ser Thr Gly His Glx Gly Glu Leu Gly Glu 625 630 635 640

[2133]

Glu Lys Glu Cys Phe Glu Ser Leu Ser Lys Glu Cys Ala Met Phe Tyr

[2134]

645 650 655

[2135]

Ser Ile Arg Lys Gln Tyr Ile Leu Glu Glu Ser Thr Leu Ser Gly Gln

[2136]

660 665 670

[2137]

Gln Ser Asp Met Pro Gly Ser Thr Ser Lys Pro Trp Lys Trp Thr Val

[2138]

675 680 685

[2139]

Glu His Ile Ile Tyr Lys Ala Phe Arg Ser His Leu Leu Pro Pro Lys

[2140]

690 695 700

[2141]

His Phe Thr Glu Asp Gly Asn Val Leu Gln Leu Ala Asn Leu Pro Asp 705 710 715 720

[2142]

Leu Tyr Lys Val Phe Glu Arg Cys 725
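Comparing the first row of SEQ ID NO:136 with the start of SEQ ID NO:135 suggests that the protein is a conceptual translation of the nucleotide sequence read from nucleotide 3 (CCG GCC AAT GCT ... gives Pro Ala Asn Ala ...). The Python sketch below spot-checks only that first row: it translates the first 50 nucleotides of SEQ ID NO:135 with the standard genetic code (an assumption; the listing does not name a code table) and converts the three-letter row to one-letter codes so the two strings can be compared. The helper names `translate` and `three_to_one` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Three-letter residue codes used in the listing, plus common ambiguity codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Asx": "B", "Glx": "Z", "Xaa": "X",
}

def translate(dna, start=0):
    """Translate codon by codon from a 0-based start; unknown codons become X."""
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(start, len(dna) - 2, 3))

def three_to_one(row):
    """Convert a listing row to one-letter codes, skipping position numbers."""
    return "".join(THREE_TO_ONE[tok] for tok in row.split() if not tok.isdigit())

SEQ135_HEAD = "TTCCGGCCAA TGCTATCAAA GAGATGATAG AAAACTGTTT AGATGCAAAA"
SEQ136_ROW1 = ("Pro Ala Asn Ala Ile Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys "
               "1 5 10 15")

print(translate(SEQ135_HEAD, start=2))  # PANAIKEMIENCLDAK (frame from nucleotide 3)
print(three_to_one(SEQ136_ROW1))        # PANAIKEMIENCLDAK
```

Checking further rows the same way would require the full sequence; this sketch makes no claim about the frame beyond the first 16 codons.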

[2143]

(2) INFORMATION FOR SEQ ID NO:137: (i) SEQUENCE CHARACTERISTICS:

[2144]

(A) LENGTH: 3065 base pairs

[2145]

(B) TYPE: nucleic acid

[2146]

(C) STRANDEDNESS: single

[2147]

(D) TOPOLOGY: linear

[2148]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:137:

[2149]

CGGTGAAGGT CCTGAAGAAT TTCCAGATTC CTGAGTATCA TTGGAGGAGA CAGATAACCT 60

[2150]

GTCGTCAGGT AACGATGGTG TATATGCAAC AGAAATGGGT GTTCCTGGAG ACGCGTCTTT 120

[2151]

TCCCGAGAGC GGCACCGCAA CTCTCCCGCG GTGACTGTGA CTGGAGGAGT CCTGCATCCA 180

[2152]

TGGAGCAAAC CGAAGGCGTG AGTACAGAAT GTGCTAAGGC CATCAAGCCT ATTGATGGGA 240

[2153]

AGTCAGTCCA TCAAATTTGT TCTGGGCAGG TGATACTCAG TTTAAGCACC GCTGTGAAGG 300

[2154]

AGTTGATAGA AAATAGTGTA GATGCTGGTG CTACTACTAT TGATCTAAGG CTTAAAGACT 360

[2155]

ATGGGGTGGA CCTCATTGAA GTTTCAGACA ATGGATGTGG GGTAGAAGAA GAAAACTTTG 420

[2156]

AAGGTCTAGC TCTGAAACAT CACACATCTA AGATTCAAGA GTTTGCCGAC CTCACGCAGG 480

[2157]

TTGAAACTTT CGGCTTTCGG GGGGAAGCTC TGAGCTCTCT GTGTGCACTA AGTGATGTCA 540

[2158]

CTATATCTAC CTGCCACGGG TCTGCAAGCG TTGGGACTCG ACTGGTGTTT GACCATAATG 600

[2159]

GGAAAATCAC CCAGAAAACT CCCTACCCCC GACCTAAAGG AACCACAGTC AGTGTGCAGC 660
ACTTATTTTA TACACTACCC GTGCGTTACA AAGAGTTTCA GAGGAACATT AAAAAGGAGT 720
ATTCCAAAAT GGTGCAGGTC TTACAGGCGT ACTGTATCAT CTCAGCAGGC GTCCGTGTAA 780
GCTGCACTAA TCAGCTCGGA CAGGGGAAGC GGCACGCTGT GGTGTGCACA AGCGGCACGT 840
CTGGCATGAA GGAAAATATC GGGTCTGTGT TTGGCCAGAA GCAGTTGCAA AGCCTCATTC 900
CTTTTGTTCA GCTGCCCCCT AGTGACGCTG TGTGTGAAGA GTACGGCCTG AGCACTTCAG 960
GACGCCACAA AACCTTTTCT ACGTTTTCGG GCTTCATTTC ACAGTGCACG CACGGCGCCG 1020
GGAGGAGTGC AACAGACAGG CAGTTTTTCT TCATCAATCA GAGGCCCTGT GACCCAGCAA 1080
AGGTCTCTAA GCTTGTCAAT GAGGTTTATC ACATGTATAA CCGGCATCAG TACCCATTTG 1140
TCGTCCTTAA CGTTTCCGTT GACTCAGAAT GTGTGGATAT TAATGTAACT CCAGATAAAA 1200
GGCAAATTCT ACTACAAGAA GAGAAGCTAT TGCTGGCCGT TTTAAAGACC TCCTTGATAG 1260
GAATGTTTGA CAGTGATGCA AACAAGCTTA ATGTCAACCA GCAGCCACTG CTAGATGTTG 1320
AAGGTAACTT AGTAAAGTCG CATACTGCAG AACTAGAAAA GCCTGTGCCA GGAAAGCAAG 1380
ATAACTCTCC TTCACTGAAG AGCACAGCAG ACGAGAAAAG GGTAGCATCC ATCTCCAGGC 1440
TGAGAGAGGC CTTTTCTCTT CATCCTACTA AAGAGATCAA GTCTAGGGGT CCAGAGACTG 1500
CTGAACTGAC ACGGAGTTTT CCAAGTGAGA AAAGGGGCGT GTTATCCTCT TATCCTTCAG 1560
ACGTCATCTC TTACAGAGGC CTCCGTGGCT CGCAGGACAA ATTGGTGAGT CCCACGGACA 1620
GCCCTGGTGA CTGTATGGAC AGAGAGAAAA TAGAAAAAGA CTCAGGGCTC AGCAGCACCT 1680
CAGCTGGCTC TGAGGAAGAG TTCAGCACCC CAGAAGTGGC CAGTAGCTTT AGCAGTGACT 1740
ATAACGTGAG CTCCCTAGAA GACAGACCTT CTCAGGAAAC CATAAACTGT GGTGACCTGC 1800
TGCCGTCCTC CAGGTACAGG ACAGTCCTTG AAGCCAGAAG ACCATGGATA TCAATGCAAA 1860
GCTCTACCTC TAGCTCGTCT GTCACCCACA AATGCCAAGC GCTTCAAGAC AGAGGAAGAC 1920
CCTCAAATGT CAACATATCT CAAAGATTGC CTGGTCCTCA GAGCACCTCA GCAGCTGAGG 1980
TCGATGTAGC CATAAAAATG AATAAGAGAT CGTGCTCCTC GAGTTCTCTA GCTAAGCGAA 2040
TGAAGCAGTT ACAGCACCTA AAGGCGCAGA ACAAACATGA ACTGAGTTAC AGAAAATTTA 2100
GGGCCAAGAT TTGCCCTGGA GAAAACCAAG CAGCAGAAGA TGAACTCAGA AAAGAGATTA 2160
GTAAATCGAT GTTTGCAGAG ATGGAGATCT TGGGTCAGTT TAACCTGGGA TTTATAGTAA 2220
CCAAACTGAA AGAGGACCTC TTCCTGGTGG ACCAGCATGC TGCGGATGAG AAGTACAACT 2280
TTGAGATGCT GCAGCAGCAC ACGGTGCTCC AGGCGCAGAG GCTCATCACG TGGGTGCACA 2340
CAGGCTTCAG AGTTCCCAGA CCCCAGACTC TGAACTTAAC TGCTGTCAAT GAAGCTGTAC 2400
TGATAGAAAA TCTGGAAATA TTCAGAAAGA ATGGCTTTGA CTTTGTCATT GATGAGGATG 2460
CTCCAGTCAC TGAAAGGGCT AAATTGATTT CCTTACCAAC TAGTAAAAAC TGGACCTTTG 2520
GACCCCAAGA TATAGATGAA CTGATCTTTA TGTTAAGTGA CAGCCCTGGG GTCATGTGCC 2580
GGCCCTCACG AGTCAGACAG ATGTTTGCTT CCAGAGCCTG TCGGAAGTCA GTGATGATTG 2640
GAACGGCGCT CAATGCGAGC GAGATGAAGA AGCTCATCAC CCACATGGGT GAGATGGACC 2700
ACCCCTGGAA CTGCCCCCAC GGCAGGCCAA CCATGAGGCA CGTTGCCAAT CTGGATGTCA 2760
TCTCTCAGAA CTGACACACC CCTTGTAGCA TAGAGTTTAT TACAGATTGT TCGGTTCGCA 2820
AAGAGAAGGT TTTAAGTAAT CTGATTATCG TTGTACAAAA ATTAGCATGC TGCTTTAATG 2880
TACTGGATCC ATTTAAAAGC AGTGTTAAGG CAGGCATGAT GGAGTGTTCC TCTAGCTCAG 2940
CTACTTGGGT GATCCGGTGG GAGCTCATGT GAGCCCAGGA CTTTGAGACC ACTCCGAGCC 3000
ACATTCATGA GACTCAATTC AAGGACAAAA AAAAAAAGAT ATTTTTGAAG CCTTTTAAAA 3060
AAAAA 3065

(2) INFORMATION FOR SEQ ID NO:138: (i) SEQUENCE CHARACTERISTICS:

[2160]

(A) LENGTH: 864 amino acids

[2161]

(B) TYPE: amino acid

[2162]

(C) STRANDEDNESS: single

[2163]

(D) TOPOLOGY: linear (ii) MOLECULE TYPE: protein

[2164]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:138:

[2165]

Met Glu Gln Thr Glu Gly Val Ser Thr Glu Cys Ala Lys Ala Ile Lys 1 5 10 15

[2166]

Pro Ile Asp Gly Lys Ser Val His Gln Ile Cys Ser Gly Gln Val Ile

[2167]

20 25 30

[2168]

Leu Ser Leu Ser Thr Ala Val Lys Glu Leu Ile Glu Asn Ser Val Asp

[2169]

35 40 45

[2170]

Ala Gly Ala Thr Thr Ile Asp Leu Arg Leu Lys Asp Tyr Gly Val Asp

[2171]

50 55 60

[2172]

Leu Ile Glu Val Ser Asp Asn Gly Cys Gly Val Glu Glu Glu Asn Phe 65 70 75 80

[2173]

Glu Gly Leu Ala Leu Lys His His Thr Ser Lys Ile Gln Glu Phe Ala

[2174]

85 90 95

[2175]

Asp Leu Thr Gln Val Glu Thr Phe Gly Phe Arg Gly Glu Ala Leu Ser

[2176]

100 105 110

[2177]

Ser Leu Cys Ala Leu Ser Asp Val Thr Ile Ser Thr Cys His Gly Ser

[2178]

115 120 125

[2179]

Ala Ser Val Gly Thr Arg Leu Val Phe Asp His Asn Gly Lys Ile Thr

[2180]

130 135 140

[2181]

Gln Lys Thr Pro Tyr Pro Arg Pro Lys Gly Thr Thr Val Ser Val Gln 145 150 155 160

[2182]

His Leu Phe Tyr Thr Leu Pro Val Arg Tyr Lys Glu Phe Gln Arg Asn

[2183]

165 170 175

[2184]

Ile Lys Lys Glu Tyr Ser Lys Met Val Gln Val Leu Gln Ala Tyr Cys

[2185]

180 185 190

[2186]

Ile Ile Ser Ala Gly Val Arg Val Ser Cys Thr Asn Gln Leu Gly Gln

[2187]

195 200 205

[2188]

Gly Lys Arg His Ala Val Val Cys Thr Ser Gly Thr Ser Gly Met Lys

[2189]

210 215 220

[2190]

Glu Asn Ile Gly Ser Val Phe Gly Gln Lys Gln Leu Gln Ser Leu Ile 225 230 235 240

[2191]

Pro Phe Val Gln Leu Pro Pro Ser Asp Ala Val Cys Glu Glu Tyr Gly

[2192]

245 250 255

[2193]

Leu Ser Thr Ser Gly Arg His Lys Thr Phe Ser Thr Phe Ser Gly Phe

[2194]

260 265 270

[2195]

Ile Ser Gln Cys Thr His Gly Ala Gly Arg Ser Ala Thr Asp Arg Gln

275 280 285

Phe Phe Phe Ile Asn Gln Arg Pro Cys Asp Pro Ala Lys Val Ser Lys

[2196]

290 295 300

[2197]

Leu Val Asn Glu Val Tyr His Met Tyr Asn Arg His Gln Tyr Pro Phe 305 310 315 320

[2198]

Val Val Leu Asn Val Ser Val Asp Ser Glu Cys Val Asp Ile Asn Val

[2199]

325 330 335

[2200]

Thr Pro Asp Lys Arg Gln Ile Leu Leu Gln Glu Glu Lys Leu Leu Leu

[2201]

340 345 350

[2202]

Ala Val Leu Lys Thr Ser Leu Ile Gly Met Phe Asp Ser Asp Ala Asn

[2203]

355 360 365

[2204]

Lys Leu Asn Val Asn Gln Gln Pro Leu Leu Asp Val Glu Gly Asn Leu

[2205]

370 375 380

[2206]

Val Lys Ser His Thr Ala Glu Leu Glu Lys Pro Val Pro Gly Lys Gln 385 390 395 400

[2207]

Asp Asn Ser Pro Ser Leu Lys Ser Thr Ala Asp Glu Lys Arg Val Ala

[2208]

405 410 415

[2209]

Ser Ile Ser Arg Leu Arg Glu Ala Phe Ser Leu His Pro Thr Lys Glu

[2210]

420 425 430

[2211]

Ile Lys Ser Arg Gly Pro Glu Thr Ala Glu Leu Thr Arg Ser Phe Pro

[2212]

435 440 445

[2213]

Ser Glu Lys Arg Gly Val Leu Ser Ser Tyr Pro Ser Asp Val Ile Ser

[2214]

450 455 460

[2215]

Tyr Arg Gly Leu Arg Gly Ser Gln Asp Lys Leu Val Ser Pro Thr Asp 465 470 475 480

[2216]

Ser Pro Gly Asp Cys Met Asp Arg Glu Lys Ile Glu Lys Asp Ser Gly

[2217]

485 490 495

[2218]

Leu Ser Ser Thr Ser Ala Gly Ser Glu Glu Glu Phe Ser Thr Pro Glu

[2219]

500 505 510

[2220]

Val Ala Ser Ser Phe Ser Ser Asp Tyr Asn Val Ser Ser Leu Glu Asp

[2221]

515 520 525

[2222]

Arg Pro Ser Gln Glu Thr Ile Asn Cys Gly Asp Leu Leu Pro Ser Ser

[2223]

530 535 540

[2224]

Arg Tyr Arg Thr Val Leu Glu Ala Arg Arg Pro Trp Ile Ser Met Gln 545 550 555 560

[2225]

Ser Ser Thr Ser Ser Ser Ser Val Thr His Lys Cys Gln Ala Leu Gln

[2226]

565 570 575

[2227]

Asp Arg Gly Arg Pro Ser Asn Val Asn Ile Ser Gln Arg Leu Pro Gly

[2228]

580 585 590

[2229]

Pro Gln Ser Thr Ser Ala Ala Glu Val Asp Val Ala Ile Lys Met Asn

[2230]

595 600 605

[2231]

Lys Arg Ser Cys Ser Ser Ser Ser Leu Ala Lys Arg Met Lys Gln Leu

[2232]

610 615 620

[2233]

Gln His Leu Lys Ala Gln Asn Lys His Glu Leu Ser Tyr Arg Lys Phe 625 630 635 640

Arg Ala Lys Ile Cys Pro Gly Glu Asn Gln Ala Ala Glu Asp Glu Leu

[2234]

645 650 655

[2235]

Arg Lys Glu Ile Ser Lys Ser Met Phe Ala Glu Met Glu Ile Leu Gly

[2236]

660 665 670

[2237]

Gln Phe Asn Leu Gly Phe Ile Val Thr Lys Leu Lys Glu Asp Leu Phe

[2238]

675 680 685

[2239]

Leu Val Asp Gln His Ala Ala Asp Glu Lys Tyr Asn Phe Glu Met Leu

[2240]

690 695 700

[2241]

Gln Gln His Thr Val Leu Gln Ala Gln Arg Leu Ile Thr Trp Val His 705 710 715 720

[2242]

Thr Gly Phe Arg Val Pro Arg Pro Gln Thr Leu Asn Leu Thr Ala Val

[2243]

725 730 735

[2244]

Asn Glu Ala Val Leu Ile Glu Asn Leu Glu Ile Phe Arg Lys Asn Gly

[2245]

740 745 750

[2246]

Phe Asp Phe Val Ile Asp Glu Asp Ala Pro Val Thr Glu Arg Ala Lys

[2247]

755 760 765

[2248]

Leu Ile Ser Leu Pro Thr Ser Lys Asn Trp Thr Phe Gly Pro Gln Asp

[2249]

770 775 780

[2250]

Ile Asp Glu Leu Ile Phe Met Leu Ser Asp Ser Pro Gly Val Met Cys 785 790 795 800

[2251]

Arg Pro Ser Arg Val Arg Gln Met Phe Ala Ser Arg Ala Cys Arg Lys

[2252]

805 810 815

[2253]

Ser Val Met Ile Gly Thr Ala Leu Asn Ala Ser Glu Met Lys Lys Leu

[2254]

820 825 830

[2255]

Ile Thr His Met Gly Glu Met Asp His Pro Trp Asn Cys Pro His Gly

[2256]

835 840 845

[2257]

Arg Pro Thr Met Arg His Val Ala Asn Leu Asp Val Ile Ser Gln Asn 850 855 860
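The ends of SEQ ID NO:138 can be matched against SEQ ID NO:137 in the same way. Its first 16 and last 10 residues correspond to a reading frame that starts at the ATG at nucleotides 180-182 and ends at the TGA at nucleotides 2772-2774, which is consistent with the declared 864 residues (864 codons plus the stop). The Python sketch below translates from a chosen start codon until the first in-frame stop. It assumes the standard genetic code, checks only two short excerpts rather than the whole entry, and `orf_from` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def orf_from(dna, start):
    """Translate from a 0-based start codon up to (not including) the first in-frame stop."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Nucleotides 171-227 of SEQ ID NO:137 (start region).
FIVE_PRIME = "CCTGCATCCATGGAGCAAACCGAAGGCGTGAGTACAGAATGTGCTAAGGCCATCAAG"
# Nucleotides 2742-2780 of SEQ ID NO:137 (end region, includes the TGA stop).
THREE_PRIME = "GTTGCCAATCTGGATGTCATCTCTCAGAACTGACACACC"

print(orf_from(FIVE_PRIME, FIVE_PRIME.find("ATG")))  # MEQTEGVSTECAKAIK
print(orf_from(THREE_PRIME, 0))                       # VANLDVISQN
```

Note that the first ATG in the whole of SEQ ID NO:137 lies upstream, near nucleotide 114, so "first ATG" is not by itself a reliable way to locate this coding region; the sketch takes the start as given.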

[2258]

(2) INFORMATION FOR SEQ ID NO:139: (i) SEQUENCE CHARACTERISTICS:

[2259]

(A) LENGTH: 29 base pairs

[2260]

(B) TYPE: nucleic acid

[2261]

(C) STRANDEDNESS: single

[2262]

(D) TOPOLOGY: linear

[2263]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:139: CTTGATTCTA GAGCYTCNCC NCKRAANCC 29

[2264]

(2) INFORMATION FOR SEQ ID NO:140: (i) SEQUENCE CHARACTERISTICS:

[2265]

(A) LENGTH: 29 base pairs

[2266]

(B) TYPE: nucleic acid

[2267]

(C) STRANDEDNESS: single

[2268]

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:140: AGGTCGGAGC TCAARGARYT NGTNGANAA 29
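SEQ ID NO:139 and NO:140 are degenerate oligonucleotides written with IUPAC ambiguity codes (Y = C or T, R = A or G, K = G or T, N = any base). Each one therefore stands for a pool of distinct sequences, whose size is the product of the number of bases allowed at each position: 2 × 4 × 4 × 2 × 2 × 4 = 512 for NO:139 and 2 × 2 × 2 × 4 × 4 × 4 = 512 for NO:140. A short Python sketch that computes this degeneracy and can enumerate the pool; `degeneracy` and `expand` are illustrative names.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct concrete sequences a degenerate primer represents."""
    return prod(len(IUPAC[b]) for b in primer.upper())

def expand(primer):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for combo in product(*(IUPAC[b] for b in primer.upper())):
        yield "".join(combo)

SEQ139 = "CTTGATTCTAGAGCYTCNCCNCKRAANCC"
SEQ140 = "AGGTCGGAGCTCAARGARYTNGTNGANAA"
print(degeneracy(SEQ139), degeneracy(SEQ140))  # 512 512
```

`expand` is a generator, so even highly degenerate primers can be scanned without building the whole pool in memory.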

[2269]

(2) INFORMATION FOR SEQ ID NO:141: (i) SEQUENCE CHARACTERISTICS:

[2270]

(A) LENGTH: 15 base pairs

[2271]

(B) TYPE: nucleic acid

[2272]

(C) STRANDEDNESS: single

[2273]

(D) TOPOLOGY: linear

[2274]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:141: ACTTGTGGAT TTTGC 15

[2275]

(2) INFORMATION FOR SEQ ID NO:142: (i) SEQUENCE CHARACTERISTICS:

[2276]

(A) LENGTH: 15 base pairs

[2277]

(B) TYPE: nucleic acid

[2278]

(C) STRANDEDNESS: single

[2279]

(D) TOPOLOGY: linear

[2280]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:142: ACTTGTGAAT TTTGC 15
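SEQ ID NO:141 and NO:142 are 15-mers that differ at a single position: position 8 is G in NO:141 and A in NO:142. The listing itself does not say how they were used; one plausible reading, offered here only as an assumption, is a matched pair of allele-specific probes for a G-to-A substitution of the kind recited in the claims. The minimal Python sketch below simply reports every position at which two equal-length oligonucleotides differ; `mismatches` is a hypothetical helper.

```python
def mismatches(a, b):
    """List (1-based position, base in a, base in b) wherever two equal-length oligos differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

SEQ141 = "ACTTGTGGATTTTGC"
SEQ142 = "ACTTGTGAATTTTGC"
print(mismatches(SEQ141, SEQ142))  # [(8, 'G', 'A')]
```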

[2281]

(2) INFORMATION FOR SEQ ID NO:143: (i) SEQUENCE CHARACTERISTICS:

[2282]

(A) LENGTH: 22 base pairs

[2283]

(B) TYPE: nucleic acid

[2284]

(C) STRANDEDNESS: single

[2285]

(D) TOPOLOGY: linear

[2286]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:143: TTCGGTGACA GATTTGTAAA TG 22

[2287]

(2) INFORMATION FOR SEQ ID NO:144: (i) SEQUENCE CHARACTERISTICS:

[2288]

(A) LENGTH: 16 base pairs

[2289]

(B) TYPE: nucleic acid

[2290]

(C) STRANDEDNESS: single

[2291]

(D) TOPOLOGY: linear

[2292]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:144: TTTACGGAGC CCTGGC 16

(2) INFORMATION FOR SEQ ID NO:145: (i) SEQUENCE CHARACTERISTICS:

[2293]

(A) LENGTH: 22 base pairs

[2294]

(B) TYPE: nucleic acid

[2295]

(C) STRANDEDNESS: single

[2296]

(D) TOPOLOGY: linear

[2297]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:145: TCACCATAAA AATAGTTTCC CG 22

[2298]

(2) INFORMATION FOR SEQ ID NO:146: (i) SEQUENCE CHARACTERISTICS:

[2299]

(A) LENGTH: 22 base pairs

[2300]

(B) TYPE: nucleic acid

[2301]

(C) STRANDEDNESS: single

[2302]

(D) TOPOLOGY: linear

[2303]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:146: TCCTGGATCA TATTTTCTGA GC 22

[2304]

(2) INFORMATION FOR SEQ ID NO:147: (i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

[2305]

(B) TYPE: nucleic acid

[2306]

(C) STRANDEDNESS: single

[2307]

(D) TOPOLOGY: linear

[2308]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:147: TTTCAGGTAT GTCCTGTTAC CC 22

[2309]

(2) INFORMATION FOR SEQ ID NO:148: (i) SEQUENCE CHARACTERISTICS:

[2310]

(A) LENGTH: 22 base pairs

[2311]

(B) TYPE: nucleic acid

[2312]

(C) STRANDEDNESS: single

[2313]

(D) TOPOLOGY: linear

[2314]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:148: TGAGGCAGCT TTTAAGAAAC TC 22
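SEQ ID NO:143 to NO:148 are short single-stranded oligonucleotides of 16 or 22 bases. A quick sanity check is to confirm that each sequence matches its declared LENGTH and to look at base composition. The Python sketch below does both and adds a rough melting-temperature estimate using the Wallace rule (2 °C per A or T, 4 °C per G or C). That rule is a common rule of thumb for short oligonucleotides and is not taken from this document; actual primer design would normally use a nearest-neighbor thermodynamic model.

```python
# Sequences and declared lengths copied from SEQ ID NO:143-148.
PRIMERS = {
    143: ("TTCGGTGACAGATTTGTAAATG", 22),
    144: ("TTTACGGAGCCCTGGC", 16),
    145: ("TCACCATAAAAATAGTTTCCCG", 22),
    146: ("TCCTGGATCATATTTTCTGAGC", 22),
    147: ("TTTCAGGTATGTCCTGTTACCC", 22),
    148: ("TGAGGCAGCTTTTAAGAAACTC", 22),
}

def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)

def wallace_tm(seq):
    """Rough Tm in deg C: 2 per A/T plus 4 per G/C (rule of thumb for short oligos)."""
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc

for seq_id, (seq, declared) in PRIMERS.items():
    status = "ok" if len(seq) == declared else "LENGTH MISMATCH"
    print(f"SEQ ID NO:{seq_id}  {len(seq):>2} nt  {status}  "
          f"GC={gc_fraction(seq):.0%}  Tm~{wallace_tm(seq)} C")
```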
