Application
No. US 20210193267
IPC G16B40/00

METHODS, SYSTEMS, AND RELATED COMPUTER PROGRAM PRODUCTS FOR EVALUATING CANCER MODEL FIDELITY

Inventors:
Patrick Cahan
Application number
17123591
Filing date
16.12.2020
Published
24.06.2021
Country
US
Drawings
11
Abstract

[0000]

Provided herein are methods of generating training classifiers and/or evaluating cancer models. Related systems and computer program products are also provided.


Claims

1. A method of generating a training classifier at least partially using a computer, the method comprising:

generating, by the computer, one or more training data sets, wherein a given training data set comprises gene expression profiles of subjects having a given tumor type;

identifying, by the computer, intersecting genes between the training data sets and one or more query samples to produce one or more intersecting gene sets;

partitioning, by the computer, the intersecting gene sets into training subsets and validation subsets for a given tumor type;

identifying, by the computer, one or more groups of differentially over-expressed genes, differentially under-expressed genes, and/or least differentially expressed genes in the training subsets to produce one or more baseline gene sets;

generating, by the computer, one or more gene-pairs for one or more of the tumor types from the baseline gene sets;

pair-transforming, by the computer, the gene-pairs to produce one or more binarized training data sets;

selecting, by the computer, one or more discriminatory gene-pairs for at least some of the tumor types;

generating, by the computer, one or more random gene-pair profiles through random permutations of the training data sets, which gene-pair profiles lack tumor type annotation; and,

selecting, by the computer, one or more of the gene-pairs as features to produce a random forest classifier, thereby generating the training classifier.

2. The method of claim 1, wherein the query samples comprise cancer cell line (CCL) samples, patient derived xenograft (PDX) samples, and/or genetically engineered mouse model (GEMM) samples.

3. The method of claim 1, wherein the partitioning step comprises randomly sampling the gene expression profiles for the given tumor type.

4. The method of claim 1, comprising evaluating performance of the training classifier using precision-recall curve and area under the precision-recall curve (AUPR).

5. The method of claim 1, comprising repeating one or more steps of generating the training classifier.

6. The method of claim 1, wherein the gene-pairs are selected from genes listed in Table 1.

7. The method of claim 1, comprising adding one or more additional features to produce the random forest classifier.

8. The method of claim 1, comprising evaluating one or more cancer cell line (CCL) expression profiles, patient derived xenograft (PDX) expression profiles, and/or genetically engineered mouse model (GEMM) expression profiles using the training classifier.

9. The method of claim 1, wherein the gene-pairs comprise genes from different species.

10. The method of claim 1, wherein gene expression profiles comprise RNA-seq and/or microarray gene expression profiles.

11. The training classifier generated by the method of claim 1.

12. The method of claim 1, further comprising generating one or more tumor sub-type classifiers.

13. The method of claim 12, wherein the tumor sub-type classifiers comprise one or more gene pairs selected from genes listed in Tables 2-12.

14. A method of evaluating a cancer model at least partially using a computer, the method comprising:

generating, by the computer, one or more training data sets, wherein a given training data set comprises gene expression profiles of subjects having a given tumor type;

identifying, by the computer, intersecting genes between the training data sets and one or more query samples to produce one or more intersecting gene sets;

partitioning, by the computer, the intersecting gene sets into training subsets and validation subsets for a given tumor type;

identifying, by the computer, one or more groups of differentially over-expressed genes, differentially under-expressed genes, and/or least differentially expressed genes in the training subsets to produce one or more baseline gene sets;

generating, by the computer, one or more gene-pairs for one or more of the tumor types from the baseline gene sets;

pair-transforming, by the computer, the gene-pairs to produce one or more binarized training data sets;

selecting, by the computer, one or more discriminatory gene-pairs for at least some of the tumor types;

generating, by the computer, one or more random gene-pair profiles through random permutations of the training data sets, which gene-pair profiles lack tumor type annotation;

selecting, by the computer, one or more of the gene-pairs as features to produce a random forest classifier; and,

evaluating one or more cancer models using the random forest classifier.

15. A system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least:

generating one or more training data sets, wherein a given training data set comprises gene expression profiles of subjects having a given tumor type;

identifying intersecting genes between the training data sets and one or more query samples to produce one or more intersecting gene sets;

partitioning the intersecting gene sets into training subsets and validation subsets for a given tumor type;

identifying one or more groups of differentially over-expressed genes, differentially under-expressed genes, and/or least differentially expressed genes in the training subsets to produce one or more baseline gene sets;

generating one or more gene-pairs for one or more of the tumor types from the baseline gene sets;

pair-transforming the gene-pairs to produce one or more binarized training data sets;

selecting one or more discriminatory gene-pairs for at least some of the tumor types;

generating one or more random gene-pair profiles through random permutations of the training data sets, which gene-pair profiles lack tumor type annotation; and,

selecting one or more of the gene-pairs as features to produce a random forest classifier, thereby generating the training classifier.

16. The system of claim 15, comprising stratifying sampling when selecting gene-pairs as features to produce the random forest classifier.

17. The system of claim 15, comprising repeating one or more steps of generating the training classifier.

18. The system of claim 15, wherein the gene-pairs are selected from genes listed in Table 1.

19. The system of claim 15, further comprising generating one or more tumor sub-type classifiers.

20. The system of claim 19, wherein the tumor sub-type classifiers comprise one or more gene pairs selected from genes listed in Tables 2-12.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001]

This application claims priority to, and the benefit of, U.S. Provisional Patent Application No. 62/949,295 entitled “METHODS, SYSTEMS, AND RELATED COMPUTER PROGRAM PRODUCTS FOR EVALUATING CANCER MODEL FIDELITY” filed Dec. 17, 2019, the disclosure of which is hereby incorporated by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

[0002]

This invention was made with government support under grant number CA228991 awarded by the National Institutes of Health. The government has certain rights in the invention.

BACKGROUND

[0003]

Models are widely used to investigate cancer biology and to identify potential therapeutics. Popular modeling modalities are cancer cell lines (CCLs), genetically engineered mouse models (GEMMs), and patient derived xenografts (PDXs). These classes of models differ in the types of questions that they are designed to address. CCLs are often used to address cell intrinsic mechanistic questions, GEMMs to chart progression of molecularly defined-disease, and PDXs to explore patient-specific response to therapy in a physiologically relevant context. Models also differ in the extent to which they represent specific aspects of a cancer type. Even with this intra- and inter-class model variation, all models should represent the tumor type or sub-type under investigation, and not another type of tumor, and not a non-cancerous tissue. Therefore, cancer-models should be selected not only based on the specific biological question but also based on the similarity of the model to the cancer type under investigation (Mouradov et al. (2014) “Colorectal cancer cell lines are representative models of the main molecular subtypes of primary cancer,” Cancer Research, 74(12):3238-3247; Stuckelberger et al. (2018) “Precious GEMMs: emergence of faithful models for ovarian cancer research,” The Journal of Pathology, 245(2):129-131).

[0004]

Various methods have been proposed to determine the similarity of cancer models to their intended subjects. Domcke et al. devised a ‘suitability score’ as a metric of the molecular similarity of CCLs to high grade serous ovarian carcinoma based on a heuristic weighting of copy number alterations, mutation status of several genes that distinguish ovarian cancer subtypes, and hypermutation status (Domcke et al. (2013) “Evaluating cell lines as tumour models by comparison of genomic profiles,” Nature Communications, 4:2126). Other studies have taken analogous approaches by either focusing on transcriptomic or ensemble molecular profiles (e.g., transcriptomic and copy number alterations) to quantify the similarity of cell lines to tumors (Jiang et al. (2016) “Comprehensive comparison of molecular portraits between cell lines and tumors in breast cancer,” BMC Genomics 17 Suppl 7:525; Chen (2015) “Relating hepatocellular carcinoma tumor samples and cell lines using gene expression data in translational research,” BMC Medical Genomics 8 Suppl 2:S5; Vincent et al. (2015) “Assessing breast cancer cell lines as tumour models by comparison of mRNA expression profiles,” Breast Cancer Research 17:114). These studies were tumor-type specific, focusing on CCLs that model, for example, hepatocellular carcinoma or breast cancer. More recently, Yu et al. compared the transcriptomes of CCLs to The Cancer Genome Atlas (TCGA) by correlation analysis, resulting in a panel of CCLs recommended as most representative of 22 tumor types (Yu et al. (2019) “Comprehensive transcriptomic analysis of cell lines as models of primary tumors across 22 tumor types,” Nature Communications 10(1):3574). While all of these studies have provided valuable information, they leave at least two major challenges unmet. The first challenge is to determine the fidelity of GEMMs and PDXs and whether there are stark differences between these classes of models and CCLs. The other major unmet challenge is to allow for rapid assessment of new, emerging cancer models. This challenge is especially relevant now as technical barriers to model generation have been substantially lowered, and because each PDX can be considered a distinct entity requiring validation.

SUMMARY

[0005]

The present disclosure relates, in certain aspects, to a computational software tool, called CancerCellNet (CCN), which can be used for several purposes in the clinical and research settings of cancer. A function of the tool is to classify biological samples according to their similarity to over two dozen well-defined cancer tumor types (e.g., breast invasive carcinoma), and sub-types thereof (e.g., ‘luminal A’). This tool is especially useful in cases where the tumor type is difficult for pathologists to determine, such as when the cancer has metastasized and the origin of the primary tumor is unknown. The tool is also useful as a means to gauge the similarity of cancer models to naturally occurring disease. Researchers will be able to use CancerCellNet to determine the model that is most appropriate for their research or translational question.

[0006]

CancerCellNet uses various types of data, including gene expression or transcriptomic data in certain applications. In some embodiments, the software uses the Random Forest machine learning classification technique. In certain of these embodiments, the training data used to train the algorithm are derived from The Cancer Genome Atlas (TCGA) and/or other data sources. As described herein, CancerCellNet's performance has been assessed on both held-out TCGA data and a host of well-annotated tumor data from other sources. The methods and related aspects of the present disclosure also provide a way to transform the data that enables CancerCellNet to be ‘agnostic’ with regard to the type of transcriptomic or other data. Therefore, the methods are not limited to either microarray data or RNA-Seq data. In addition, the present disclosure also provides a means of quickly identifying relevant features, which shortens the classifier training time and makes classification rapid.
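As a minimal illustration of why a pairwise transformation can make a classifier platform agnostic, the following Python sketch binarizes an expression profile by comparing the two genes of each pair. The gene names, expression values, and the function name pair_transform are hypothetical and chosen only for illustration; the sketch assumes that a pair is encoded as 1 when the first gene is expressed more highly than the second, and 0 otherwise.

```python
from itertools import combinations


def pair_transform(profile, gene_pairs):
    """Binarize one expression profile.

    Each feature is 1 if the first gene of the pair is expressed more
    highly than the second gene in the same sample, and 0 otherwise.
    """
    return {f"{a}_{b}": int(profile[a] > profile[b]) for a, b in gene_pairs}


# Hypothetical measurements of one sample on two platforms.
rnaseq = {"GENE_A": 250.0, "GENE_B": 12.0, "GENE_C": 90.0}   # count-like scale
microarray = {"GENE_A": 9.1, "GENE_B": 5.3, "GENE_C": 7.8}   # intensity-like scale

# All unordered pairs of the three hypothetical genes.
pairs = list(combinations(sorted(rnaseq), 2))

binary_rnaseq = pair_transform(rnaseq, pairs)
binary_microarray = pair_transform(microarray, pairs)
```

Because only the within-sample ordering of the two genes matters, both hypothetical profiles yield the same binary features even though their numeric scales differ.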

[0007]

In certain aspects, the present disclosure provides a method of generating a training classifier at least partially using a computer. The method includes generating, by the computer, one or more training data sets, wherein a given training data set comprises gene expression profiles of subjects having a given tumor type. The method also includes identifying, by the computer, intersecting genes between the training data sets and one or more query samples to produce one or more intersecting gene sets, and partitioning, by the computer, the intersecting gene sets into training subsets and validation subsets for a given tumor type. The method also includes identifying, by the computer, one or more groups of differentially over-expressed genes, differentially under-expressed genes, and/or least differentially expressed genes in the training subsets to produce one or more baseline gene sets, and generating, by the computer, one or more gene-pairs for one or more of the tumor types from the baseline gene sets. The method also includes pair-transforming, by the computer, the gene-pairs to produce one or more binarized training data sets, and selecting, by the computer, one or more discriminatory gene-pairs for at least some of the tumor types. In addition, the method also includes generating, by the computer, one or more random gene-pair profiles through random permutations of the training data sets, which gene-pair profiles lack tumor type annotation, and selecting, by the computer, one or more of the gene-pairs as features to produce a random forest classifier, thereby generating the training classifier.
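Two of the steps summarized above, identifying intersecting genes and generating random profiles that lack tumor type annotation, can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names, gene names, and values are hypothetical, and the permutation scheme shown (shuffling the values of a randomly chosen training profile across its genes) is one plausible reading of "random permutations of the training data sets," not necessarily the scheme used in any particular embodiment.

```python
import random


def intersecting_genes(training_genes, query_genes):
    """Return the genes present in both the training data and the query samples."""
    return sorted(set(training_genes) & set(query_genes))


def random_profiles(profiles, n_random, seed=0):
    """Build unannotated profiles by shuffling expression values across genes.

    Assumption: each random profile is made by picking one training
    profile at random and permuting its values among its genes.
    """
    rng = random.Random(seed)
    shuffled_profiles = []
    for _ in range(n_random):
        source = rng.choice(profiles)
        genes = list(source)
        values = list(source.values())
        rng.shuffle(values)
        shuffled_profiles.append(dict(zip(genes, values)))
    return shuffled_profiles


# Hypothetical training profiles and query gene list.
training = [
    {"GENE_A": 5.0, "GENE_B": 1.0, "GENE_C": 3.0, "GENE_D": 0.5},
    {"GENE_A": 4.0, "GENE_B": 2.0, "GENE_C": 6.0, "GENE_D": 0.1},
]
query_genes = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]

shared = intersecting_genes(training[0], query_genes)
restricted = [{g: p[g] for g in shared} for p in training]
rand = random_profiles(restricted, n_random=3)
```

In this sketch the random profiles carry no tumor type label, so a classifier trained with them could, in principle, assign a query sample to a "random" category when it resembles none of the annotated tumor types.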

[0008]

In other aspects, the present disclosure provides a method of evaluating a cancer model at least partially using a computer. The method includes generating, by the computer, one or more training data sets, wherein a given training data set comprises gene expression profiles of subjects having a given tumor type, and identifying, by the computer, intersecting genes between the training data sets and one or more query samples to produce one or more intersecting gene sets. The method also includes partitioning, by the computer, the intersecting gene sets into training subsets and validation subsets for a given tumor type, and identifying, by the computer, one or more groups of differentially over-expressed genes, differentially under-expressed genes, and/or least differentially expressed genes in the training subsets to produce one or more baseline gene sets. The method also includes generating, by the computer, one or more gene-pairs for one or more of the tumor types from the baseline gene sets, and pair-transforming, by the computer, the gene-pairs to produce one or more binarized training data sets. The method also includes selecting, by the computer, one or more discriminatory gene-pairs for at least some of the tumor types, and generating, by the computer, one or more random gene-pair profiles through random permutations of the training data sets, which gene-pair profiles lack tumor type annotation. In addition, the method also includes selecting, by the computer, one or more of the gene-pairs as features to produce a random forest classifier, and evaluating one or more cancer models using the random forest classifier.

[0009]

In some embodiments of the methods, the query samples comprise cancer cell line (CCL) samples, patient derived xenograft (PDX) samples, and/or genetically engineered mouse model (GEMM) samples, or data derived from such sample types. In certain embodiments, the partitioning step comprises randomly sampling the gene expression profiles for the given tumor type. In some embodiments, the methods include down-sampling, up-sampling, and/or log transforming one or more of the training subsets. In certain embodiments, the methods include using log transformed down-sampled counts to produce the baseline gene sets. In some embodiments, the methods include stratifying sampling when selecting gene-pairs as features to produce the random forest classifier. In certain embodiments, the methods include validating the training classifier using the validation subsets. In some embodiments, the methods include pair-transforming the validation subsets.
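The partitioning embodiment described above, in which profiles are randomly sampled for each tumor type, can be sketched in Python as follows. The function name partition_by_tumor_type, the parameter n_train, the sample identifiers, and the choice of a fixed number of training profiles per tumor type are all assumptions made for illustration; the disclosure does not fix these details in this passage.

```python
import random
from collections import defaultdict


def partition_by_tumor_type(samples, n_train, seed=0):
    """Split (sample_id, tumor_type) records into training and validation sets.

    For each tumor type, up to n_train profiles are drawn at random for
    training; the remaining profiles of that type are held out for validation.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for sample_id, tumor_type in samples:
        by_type[tumor_type].append(sample_id)

    train, validation = [], []
    for tumor_type in sorted(by_type):
        ids = by_type[tumor_type]
        chosen = set(rng.sample(ids, min(n_train, len(ids))))
        for sample_id in ids:
            target = train if sample_id in chosen else validation
            target.append((sample_id, tumor_type))
    return train, validation


# Hypothetical sample annotations: six BRCA profiles and four LUAD profiles.
samples = [(f"BRCA_{i}", "BRCA") for i in range(6)]
samples += [(f"LUAD_{i}", "LUAD") for i in range(4)]

train, validation = partition_by_tumor_type(samples, n_train=3)
```

Sampling the same number of profiles per tumor type is one simple way to keep the training subsets balanced across tumor types; other sampling schemes would fit the same description.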

[0010]

In some embodiments, the methods include evaluating performance of the training classifier using precision-recall curve and area under the precision-recall curve (AUPR). In certain embodiments, the methods include repeating one or more steps of generating the training classifier. In some embodiments, the methods include using gene-pairs selected from genes listed in Table 1. In certain embodiments, the methods include adding one or more additional features to produce the random forest classifier. In some embodiments, the methods include evaluating one or more cancer cell line (CCL) expression profiles, patient derived xenograft (PDX) expression profiles, and/or genetically engineered mouse model (GEMM) expression profiles using the training classifier. In some embodiments of the methods, the gene-pairs comprise genes from different species.
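As an illustration of the AUPR evaluation mentioned above, the following Python sketch estimates the area under the precision-recall curve for one tumor type using average precision, a commonly used step-wise estimator. The use of average precision, the function name, and the example labels and scores are assumptions for illustration only; the sketch does not treat tied scores specially, and other estimators of the area (for example, trapezoidal interpolation) may give slightly different values.

```python
def average_precision(labels, scores):
    """Estimate AUPR as the mean precision at the rank of each true positive.

    labels: 1 if the held-out sample truly belongs to the tumor type, else 0.
    scores: classification scores for that tumor type (higher = more similar).
    """
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("at least one positive label is required")

    # Rank samples from the highest to the lowest classification score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_positive


# Hypothetical validation labels and classifier scores for one tumor type.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
aupr = average_precision(labels, scores)
```

For these hypothetical values, the two true positives fall at ranks 1 and 3, giving precisions of 1 and 2/3 and an average of 5/6.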

[0011]

In certain embodiments of the methods, gene expression profiles comprise RNA-seq and/or microarray gene expression profiles. In some embodiments, the methods also include generating one or more tumor sub-type classifiers. In certain embodiments, the tumor sub-type classifiers comprise one or more gene pairs selected from genes listed in Tables 2-12.

[0012]

In other aspects, the present disclosure provides a system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: generating one or more training data sets, wherein a given training data set comprises gene expression profiles of subjects having a given tumor type, and identifying intersecting genes between the training data sets and one or more query samples to produce one or more intersecting gene sets. The electronic processor also performs partitioning the intersecting gene sets into training subsets and validation subsets for a given tumor type, and identifying one or more groups of differentially over-expressed genes, differentially under-expressed genes, and/or least differentially expressed genes in the training subsets to produce one or more baseline gene sets. The electronic processor also performs generating one or more gene-pairs for one or more of the tumor types from the baseline gene sets, and pair-transforming the gene-pairs to produce one or more binarized training data sets. The electronic processor also performs selecting one or more discriminatory gene-pairs for at least some of the tumor types, and generating one or more random gene-pair profiles through random permutations of the training data sets, which gene-pair profiles lack tumor type annotation. In addition, the electronic processor also performs selecting one or more of the gene-pairs as features to produce a random forest classifier, thereby generating the training classifier.

[0013]

In other aspects, the present disclosure also provides computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: generating one or more training data sets, wherein a given training data set comprises gene expression profiles of subjects having a given tumor type, and identifying intersecting genes between the training data sets and one or more query samples to produce one or more intersecting gene sets. The electronic processor also performs partitioning the intersecting gene sets into training subsets and validation subsets for a given tumor type, and identifying one or more groups of differentially over-expressed genes, differentially under-expressed genes, and/or least differentially expressed genes in the training subsets to produce one or more baseline gene sets. The electronic processor also performs generating one or more gene-pairs for one or more of the tumor types from the baseline gene sets, and pair-transforming the gene-pairs to produce one or more binarized training data sets. The electronic processor also performs selecting one or more discriminatory gene-pairs for at least some of the tumor types, and generating one or more random gene-pair profiles through random permutations of the training data sets, which gene-pair profiles lack tumor type annotation. In addition, the electronic processor also performs selecting one or more of the gene-pairs as features to produce a random forest classifier, thereby generating the training classifier.

[0014]

In some embodiments of the systems or computer readable media, the query samples comprise cancer cell line (CCL) samples, patient derived xenograft (PDX) samples, and/or genetically engineered mouse model (GEMM) samples. In certain embodiments of the systems or computer readable media, the partitioning step comprises randomly sampling the gene expression profiles for the given tumor type. In some embodiments, the systems or computer readable media include down-sampling, up-sampling, and/or log transforming one or more of the training subsets. In some embodiments, the systems or computer readable media include using log transformed down-sampled counts to produce the baseline gene sets. In some embodiments, the systems or computer readable media include stratifying sampling when selecting gene-pairs as features to produce the random forest classifier. In some embodiments, the systems or computer readable media include validating the training classifier using the validation subsets. In some embodiments, the systems or computer readable media include pair-transforming the validation subsets. In some embodiments, the systems or computer readable media include evaluating performance of the training classifier using precision-recall curve and area under the precision-recall curve (AUPR). In some embodiments, the systems or computer readable media include repeating one or more steps of generating the training classifier.

[0015]

In some embodiments of the systems or computer readable media, the gene-pairs are selected from genes listed in Table 1. In some embodiments, the systems or computer readable media include adding one or more additional features to produce the random forest classifier. In some embodiments, the systems or computer readable media include evaluating one or more cancer cell line (CCL) expression profiles, patient derived xenograft (PDX) expression profiles, and/or genetically engineered mouse model (GEMM) expression profiles using the training classifier. In some embodiments of the systems or computer readable media, the gene-pairs comprise genes from different species. In some embodiments of the systems or computer readable media, the gene expression profiles comprise RNA-seq and/or microarray gene expression profiles. In some embodiments, the systems or computer readable media further include generating one or more tumor sub-type classifiers. In some embodiments of the systems or computer readable media, the tumor sub-type classifiers comprise one or more gene pairs selected from genes listed in Tables 2-12.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016]

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate certain embodiments, and together with the written description, serve to explain certain principles of the methods, systems, and related computer readable media disclosed herein. The description provided herein is better understood when read in conjunction with the accompanying drawings which are included by way of example and not by way of limitation. It will be understood that like reference numerals identify like components throughout the drawings, unless the context indicates otherwise. It will also be understood that some or all of the figures may be schematic representations for purposes of illustration and do not necessarily depict the actual relative sizes or locations of the elements shown.

[0017]

FIG. 1 is a flow chart that schematically depicts exemplary method steps according to some aspects disclosed herein.

[0018]

FIG. 2 is a schematic diagram of an exemplary system suitable for use with certain aspects disclosed herein.

[0019]

FIG. 3A schematically depicts exemplary method steps according to some aspects disclosed herein.

[0020]

FIG. 3B is a plot of mean area under the precision-recall curve (AUPR) (y-axis) for various cancer types (x-axis).

[0021]

FIG. 4A shows plots of the performance of a classifier according to certain embodiments disclosed herein for various cancer types, in which precision is represented on the y-axis and recall is represented on the x-axis.

[0022]

FIG. 4B is a plot of AUPR (y-axis) for various cancer types (x-axis).

[0023]

FIG. 4C is a plot of AUPR of Cross-Species Testing Data with AUPR represented on the y-axis for various cell types represented on the x-axis.

[0024]

FIG. 4D schematically depicts exemplary method steps according to some aspects disclosed herein.

[0025]

FIG. 4E is a plot of cancer subtypes (y-axis) versus mean AUPR (x-axis).

[0026]

FIG. 5A is a plot of RNA-seq expression data of 657 different cell lines mined across 20 cancer types.

[0027]

FIG. 5B is a plot of CCN profiles.

[0028]

FIG. 5C is a plot of classifications.

[0029]

FIG. 5D is a plot of sub-type classification of Lung Squamous Cell Carcinoma (LUSC) cell lines.

[0030]

FIG. 5E is a plot of sub-type classification of Lung Adenocarcinoma (LUAD) cell lines.

[0031]

FIG. 5F is a plot of normalized citation count (y-axis) versus general classification score (x-axis).

[0032]

FIG. 6A is a plot of AUPR of Microarray Testing Data with AUPR represented on the y-axis for various cancer types represented on the x-axis.

[0033]

FIG. 6B is a plot of microarray expression data for cancer cell lines mined across various cancer types.

[0034]

FIG. 6C shows plots comparing CCLE classification scores between microarray (y-axis) and RNA-seq data (x-axis).

[0035]

FIG. 7A is a plot of expression data mined across various cancer types.

[0036]

FIG. 7B is a plot of CCN profiles.

[0037]

FIG. 7C is a plot of classifications.

[0038]

FIG. 7D is a plot of classifications.

[0039]

FIG. 7E is a plot of classifications.

[0040]

FIG. 8A is a plot of expression data mined across various cancer types.

[0041]

FIG. 8B is a plot of CCN profiles.

[0042]

FIG. 8C is a plot of classifications.

[0043]

FIG. 8D is a plot of classifications.

[0044]

FIG. 9 is a plot of classifications.

[0045]

FIG. 10 shows plots of general CCN scores of cancer models compared on a per tumor type basis.

[0046]

FIG. 11 shows plots of sub-type classifications.

DEFINITIONS

[0047]

In order for the present disclosure to be more readily understood, certain terms are first defined below. Additional definitions for the following terms and other terms may be set forth through the specification. If a definition of a term set forth below is inconsistent with a definition in an application or patent that is incorporated by reference, the definition set forth in this application should be used to understand the meaning of the term.

[0048]

As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, a reference to “a method” includes one or more methods, and/or steps of the type described herein and/or which will become apparent to those persons skilled in the art upon reading this disclosure and so forth.

[0049]

It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. Further, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. In describing and claiming the methods, systems, and component parts, the following terminology, and grammatical variants thereof, will be used in accordance with the definitions set forth below.

[0050]

About: As used herein, “about” or “approximately” as applied to one or more values or elements of interest, refers to a value or element that is similar to a stated reference value or element. In certain embodiments, the term “about” or “approximately” refers to a range of values or elements that falls within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value or element unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value or element).

[0051]

Cancer Type: As used herein, “cancer type” or “tumor type” refers to a type or subtype of cancer defined, e.g., by histopathology. Cancer type can be defined by any conventional criterion, such as on the basis of occurrence in a given tissue (e.g., blood cancers, CNS, brain cancers, lung cancers (small cell and non-small cell), skin cancers, nose cancers, throat cancers, liver cancers, bone cancers, lymphomas, pancreatic cancers, bowel cancers, rectal cancers, thyroid cancers, bladder cancers, kidney cancers, mouth cancers, stomach cancers, breast cancers, prostate cancers, ovarian cancers, lung cancers, intestine cancers, soft tissue cancers, thyroid cancers, neuroendocrine cancers, gastroesophageal cancers, head and neck cancers, gynecological cancers, colorectal cancers, urothelial cancers, solid state cancers, heterogeneous cancers, homogenous cancers), unknown primary origin and the like, and/or of the same cell lineage (e.g., carcinoma, sarcoma, lymphoma, cholangiocarcinoma, leukemia, mesothelioma, melanoma, or glioblastoma) and/or cancer markers, such as Her2, CA15-3, CA19-9, CA-125, CEA, AFP, PSA, HCG, hormone receptor and NMP-22. Cancers can also be classified by stage (e.g., stage 1, 2, 3, or 4) and whether of primary or secondary origin.

[0052]

Classifier: As used herein, “classifier” generally refers to an algorithm or computer code that receives, as input, test data and produces, as output, a classification of the input data as belonging to one or another class.

[0053]

Machine Learning Algorithm: As used herein, “machine learning algorithm” generally refers to an algorithm, executed by a computer, that automates analytical model building, e.g., for clustering, classification or pattern recognition. Machine learning algorithms may be supervised or unsupervised. Learning algorithms include, for example, artificial neural networks (e.g., back propagation networks), discriminant analyses (e.g., Bayesian classifier or Fisher analysis), support vector machines, decision trees (e.g., recursive partitioning processes such as CART, i.e., classification and regression trees, or random forests), linear classifiers (e.g., multiple linear regression (MLR), partial least squares (PLS) regression, and principal components regression), hierarchical clustering, and cluster analysis. A dataset on which a machine learning algorithm learns can be referred to as “training data.”

[0054]

Sample: As used herein, “sample” means anything capable of being analyzed by the methods and/or systems disclosed herein.

[0055]

Subject: As used herein, “subject” refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species. More specifically, a subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian or a human. Animals include farm animals (e.g., production cattle, dairy cattle, poultry, horses, pigs, and the like), sport animals, and companion animals (e.g., pets or support animals). A subject can be a healthy individual, an individual that has or is suspected of having a disease or a predisposition to the disease, or an individual that is in need of therapy or suspected of needing therapy. The terms “individual” or “patient” are intended to be interchangeable with “subject.” For example, a subject can be an individual who has been diagnosed with having a cancer, is going to receive a cancer therapy, and/or has received at least one cancer therapy. The subject can be in remission of a cancer.

DETAILED DESCRIPTION

[0056]

Cancer researchers use, for example, cell lines, patient derived xenografts, and genetically engineered mice as models to investigate tumor biology and to identify therapeutics. The generalizability and power of a model derive from the fidelity with which it represents the tumor type under investigation; however, the extent to which a given model is faithful is often unclear. The preponderance of models and the ability to readily generate new ones have created a demand for tools that can measure the extent and ways in which cancer models resemble or diverge from native tumors. In certain aspects, the present disclosure relates to a computational tool, called CancerCellNet (CCN), which measures the similarity of cancer models, in some embodiments, to 25 naturally occurring tumor types and 46 sub-types, in a platform- and species-agnostic manner. As illustrated in the Examples provided herein, this tool was applied to 657 cancer cell lines, 415 patient derived xenografts, and 26 distinct genetically engineered mouse models, documenting the most faithful models, identifying cancers underserved by adequate models, and finding models with annotations that do not match their classification. By comparing models across modalities, the illustrative Examples further show that genetically engineered mice have higher transcriptional fidelity than patient derived xenografts and cell lines in four out of five tumor types.

[0057]

Exemplary Methods

[0058]

The present disclosure provides various methods of generating training classifiers and/or evaluating cancer models. To illustrate, FIG. 1 is a flow chart that schematically depicts exemplary method steps according to some aspects disclosed herein. As shown, method 100 includes generating training data sets in which a given training data set includes gene expression profiles of subjects having a given tumor type (step 102). Typically, one or more of the steps of method 100 are computer implemented. Exemplary systems and computers are described further herein. Method 100 also includes identifying intersecting genes between the training data sets and query samples to produce intersecting gene sets (step 104), and partitioning the intersecting gene sets into training subsets and validation subsets for a given tumor type (step 106). Method 100 also includes identifying groups of differentially over-expressed genes, differentially under-expressed genes, and/or least differentially expressed genes in the training subsets to produce baseline gene sets (step 108), and generating gene-pairs for the tumor types from the baseline gene sets (step 110). Method 100 also includes pair-transforming the gene-pairs to produce binarized training data sets (step 112), and selecting discriminatory gene-pairs for at least some of the tumor types (step 114). In addition, method 100 includes generating random gene-pair profiles through random permutations of the training data sets (step 116). Typically, these gene-pair profiles lack tumor type annotation. Method 100 also includes selecting gene-pairs as features to produce a random forest classifier, thereby generating the training classifier (step 118). Typically, the methods disclosed herein include evaluating cancer models using the random forest training classifier generated by method 100. Aspects of the methods are described further herein, including in the Example.
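
To illustrate steps 104 and 106, and not by way of limitation, the following is a minimal Python sketch of identifying intersecting genes and partitioning samples per tumor type. The function names, the sample identifiers, and the use of a two-thirds training fraction as a default are illustrative assumptions; this is not the CCN implementation.

```python
import random


def intersecting_genes(training_genes, query_gene_sets):
    """Return the sorted genes found in the training data and in every query set."""
    common = set(training_genes)
    for genes in query_gene_sets:
        common &= set(genes)
    return sorted(common)


def split_by_tumor_type(sample_labels, train_fraction=2 / 3, seed=0):
    """Randomly split sample ids into training and validation lists, per tumor type.

    sample_labels maps a sample id to its tumor type annotation.
    The fixed seed is only there to make the sketch reproducible.
    """
    rng = random.Random(seed)
    by_type = {}
    for sample_id, tumor_type in sample_labels.items():
        by_type.setdefault(tumor_type, []).append(sample_id)

    train, validation = [], []
    for tumor_type in sorted(by_type):
        ids = sorted(by_type[tumor_type])
        rng.shuffle(ids)
        n_train = round(len(ids) * train_fraction)
        train.extend(ids[:n_train])
        validation.extend(ids[n_train:])
    return train, validation


# Hypothetical toy inputs.
shared = intersecting_genes(
    ["TP53", "EGFR", "MYC"],
    [["EGFR", "TP53", "KRAS"], ["TP53", "EGFR"]],
)
labels = {f"BRCA_{i}": "BRCA" for i in range(6)}
labels.update({f"LUAD_{i}": "LUAD" for i in range(3)})
train_ids, validation_ids = split_by_tumor_type(labels)
```

Splitting within each tumor type, rather than across the pooled samples, keeps every tumor type represented in both subsets.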

[0059]

In some embodiments of the methods, the query samples comprise cancer cell line (CCL) samples, patient derived xenograft (PDX) samples, and/or genetically engineered mouse model (GEMM) samples. In certain embodiments, the partitioning step comprises randomly sampling the gene expression profiles for the given tumor type. In some embodiments, the methods include down-sampling, up-sampling, and/or log transforming one or more of the training subsets. In certain embodiments, the methods include using log transformed down-sampled counts to produce the baseline gene sets. In some embodiments, the methods include stratifying sampling when selecting gene-pairs as features to produce the random forest classifier. In certain embodiments, the methods include validating the training classifier using the validation subsets. In some embodiments, the methods include pair-transforming the validation subsets.
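
By way of a hedged illustration, down-sampling, scaling, and log transforming a single expression profile might be sketched in Python as follows. The multinomial draw, the pseudocount of 1 inside the logarithm, and the function names are assumptions made for this sketch; the disclosure does not specify these details.

```python
import math
import random


def downsample_counts(counts, target_total=500_000, seed=0):
    """Draw target_total reads with probability proportional to each gene's count.

    Assumption: a multinomial draw (sampling with replacement). Profiles that
    are already at or below the target depth are returned unchanged.
    """
    total = sum(counts.values())
    if total <= target_total:
        return dict(counts)
    rng = random.Random(seed)
    genes = list(counts)
    drawn = rng.choices(genes, weights=[counts[g] for g in genes], k=target_total)
    sampled = dict.fromkeys(genes, 0)
    for gene in drawn:
        sampled[gene] += 1
    return sampled


def scale_and_log(counts, scale_total=100_000):
    """Rescale a profile to a fixed total, then apply log(1 + x)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile has no counts")
    return {g: math.log(1 + c * scale_total / total) for g, c in counts.items()}


# Hypothetical toy profile; small totals keep the example fast.
profile = {"GENE_A": 600, "GENE_B": 300, "GENE_C": 100}
down = downsample_counts(profile, target_total=100, seed=1)
normalized = scale_and_log(down, scale_total=1000)
```

Bringing every profile to a common depth before the log transform is one way to reduce the influence of sequencing depth on downstream gene comparisons.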

[0060]

In some embodiments, the methods include evaluating performance of the training classifier using a precision-recall curve and the area under the precision-recall curve (AUPR). In certain embodiments, the methods include repeating one or more steps of generating the training classifier. In some embodiments, the gene-pairs are selected from genes listed in Table 1. In certain embodiments, the methods include adding one or more additional features to produce the random forest classifier. In some embodiments, the methods include evaluating one or more cancer cell line (CCL) expression profiles, patient derived xenograft (PDX) expression profiles, and/or genetically engineered mouse model (GEMM) expression profiles using the training classifier. In some embodiments of the methods, the gene-pairs comprise genes from different species.

[0061]

In certain embodiments of the methods, gene expression profiles comprise RNA-seq and/or microarray gene expression profiles. In some embodiments, the methods also include generating one or more tumor sub-type classifiers. In certain embodiments, the tumor sub-type classifiers comprise one or more gene pairs selected from genes listed in Tables 2-12.

[0062]

Exemplary Systems and Computer Readable Media

[0063]

The present disclosure also provides various systems and computer program products or machine readable media. In some aspects, for example, the methods described herein are optionally performed or facilitated at least in part using systems, distributed computing hardware and applications (e.g., cloud computing services), electronic communication networks, communication interfaces, computer program products, machine readable media, electronic storage media, software (e.g., machine-executable code or logic instructions) and/or the like. To illustrate, FIG. 2 provides a schematic diagram of an exemplary system suitable for use with implementing at least aspects of the methods disclosed in this application. As shown, system 200 includes at least one controller or computer, e.g., server 202 (e.g., a search engine server), which includes processor 204 and memory, storage device, or memory component 206, and one or more other communication devices 214 (e.g., client-side computer terminals, telephones, tablets, laptops, other mobile devices, etc.) positioned remote from and in communication with the remote server 202, through electronic communication network 212, such as the Internet or other internetwork. Communication device 214 typically includes an electronic display (e.g., an internet enabled computer or the like) in communication with, e.g., server 202 computer over network 212 in which the electronic display comprises a user interface (e.g., a graphical user interface (GUI), a web-based user interface, and/or the like) for displaying results upon implementing the methods described herein. In certain aspects, communication networks also encompass the physical transfer of data from one location to another, for example, using a hard drive, thumb drive, or other data storage mechanism. 
System 200 also includes program product 208 stored on a computer or machine readable medium, such as, for example, one or more of various types of memory, such as memory 206 of server 202, that is readable by the server 202, to facilitate, for example, a guided search application or other executable by one or more other communication devices, such as 214 (schematically shown as a desktop or personal computer). In some aspects, system 200 optionally also includes at least one database server, such as, for example, server 210 associated with an online website having data stored thereon (e.g., control sample or comparator result data, indexed customized therapies, etc.) searchable either directly or through search engine server 202. System 200 optionally also includes one or more other servers positioned remotely from server 202, each of which are optionally associated with one or more database servers 210 located remotely or located local to each of the other servers. The other servers can beneficially provide service to geographically remote users and enhance geographically distributed operations.

[0064]

As understood by those of ordinary skill in the art, memory 206 of the server 202 optionally includes volatile and/or nonvolatile memory including, for example, RAM, ROM, and magnetic or optical disks, among others. It is also understood by those of ordinary skill in the art that although illustrated as a single server, the illustrated configuration of server 202 is given only by way of example and that other types of servers or computers configured according to various other methodologies or architectures can also be used. Server 202 shown schematically in FIG. 2, represents a server or server cluster or server farm and is not limited to any individual physical server. The server site may be deployed as a server farm or server cluster managed by a server hosting provider. The number of servers and their architecture and configuration may be increased based on usage, demand and capacity requirements for the system 200. As also understood by those of ordinary skill in the art, other user communication device 214 in these aspects, for example, can be a laptop, desktop, tablet, personal digital assistant (PDA), cell phone, server, or other types of computers. As known and understood by those of ordinary skill in the art, network 212 can include an internet, intranet, a telecommunication network, an extranet, or world wide web of a plurality of computers/servers in communication with one or more other computers through a communication network, and/or portions of a local or other area network.

[0065]

As further understood by those of ordinary skill in the art, exemplary program product or machine readable medium 208 is optionally in the form of microcode, programs, cloud computing format, routines, and/or symbolic languages that provide one or more sets of ordered operations that control the functioning of the hardware and direct its operation. Program product 208, according to an exemplary aspect, also need not reside in its entirety in volatile memory, but can be selectively loaded, as necessary, according to various methodologies as known and understood by those of ordinary skill in the art.

[0066]

As further understood by those of ordinary skill in the art, the term “computer-readable medium” or “machine-readable medium” refers to any medium that participates in providing instructions to a processor for execution. To illustrate, the term “computer-readable medium” or “machine-readable medium” encompasses distribution media, cloud computing formats, intermediate storage media, execution memory of a computer, and any other medium or device capable of storing program product 208 implementing the functionality or processes of various aspects of the present disclosure, for example, for reading by a computer. A “computer-readable medium” or “machine-readable medium” may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks. Volatile media includes dynamic memory, such as the main memory of a given system. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise a bus. Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications, among others. Exemplary forms of computer-readable media include a floppy disk, a flexible disk, hard disk, magnetic tape, a flash drive, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.

[0067]

Program product 208 is optionally copied from the computer-readable medium to a hard disk or a similar intermediate storage medium. When program product 208, or a portion thereof, is to be run, it is optionally loaded from its distribution medium, its intermediate storage medium, or the like into the execution memory of one or more computers, configuring the computer(s) to act in accordance with the functionality or method of various aspects. All such operations are well known to those of ordinary skill in the art of, for example, computer systems.

[0068]

To further illustrate, in certain aspects, this application provides systems that include one or more processors, and one or more memory components in communication with the processor. The memory component typically includes one or more instructions that, when executed, cause the processor to provide information that causes at least one CCN model or component thereof, and/or the like to be displayed (e.g., via communication device 214 or the like) and/or receive information from other system components and/or from a system user (e.g., via communication device 214 or the like).

[0069]

In some aspects, program product 208 includes non-transitory computer-executable instructions which, when executed by electronic processor 204 perform at least: generating one or more training data sets, wherein a given training data set comprises gene expression profiles of subjects having a given tumor type; identifying intersecting genes between the training data sets and one or more query samples to produce one or more intersecting gene sets; partitioning the intersecting gene sets into training subsets and validation subsets for a given tumor type; identifying one or more groups of differentially over-expressed genes, differentially under-expressed genes, and/or least differentially expressed genes in the training subsets to produce one or more baseline gene sets; generating one or more gene-pairs for one or more of the tumor types from the baseline gene sets; pair-transforming the gene-pairs to produce one or more binarized training data sets; selecting one or more discriminatory gene-pairs for at least some of the tumor types; generating one or more random gene-pair profiles through random permutations of the training data sets, which gene-pair profiles lack tumor type annotation; and selecting one or more of the gene-pairs as features to produce a random forest classifier, thereby generating the training classifier.

[0070]

System 200 also typically includes additional system components that are configured to perform various aspects of the methods described herein. In some of these aspects, one or more of these additional system components are positioned remote from and in communication with the remote server 202 through electronic communication network 212, whereas in other aspects, one or more of these additional system components are positioned local, and in communication with server 202 (i.e., in the absence of electronic communication network 212) or directly with, for example, desktop computer 214.

[0071]

Additional details relating to computer systems and networks, databases, and computer program products are also provided in, for example, Peterson, Computer Networks: A Systems Approach, Morgan Kaufmann, 5th Ed. (2011), Kurose, Computer Networking: A Top-Down Approach, Pearson, 7th Ed. (2016), Elmasri, Fundamentals of Database Systems, Addison Wesley, 6th Ed. (2010), Coronel, Database Systems: Design, Implementation, & Management, Cengage Learning, 11th Ed. (2014), Tucker, Programming Languages, McGraw-Hill Science/Engineering/Math, 2nd Ed. (2006), and Rhoton, Cloud Computing Architected: Solution Design Handbook, Recursive Press (2011), which are each incorporated by reference in their entirety.

Example

[0072]

This example presents various exemplary aspects of CancerCellNet (CCN). Details of CCN are also described in Peng et al. “Evaluating the transcriptional fidelity of cancer models.” bioRxiv (2020) (10.1101/2020.03.27.012757), the entire disclosure of which, including all supplemental material, is incorporated by reference in its entirety.

[0073]

Training Broad CancerCellNet

[0074]

To generate training data sets, 9288 patient tumor non-normalized RNA-seq expression profiles and their corresponding sample tables annotating each patient profile to a cancer type across 25 different tumor types were downloaded from TCGA using the TCGAWorkflowData, TCGAbiolinks (Silva et al. (2016) “TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages,” [version 2; peer review: 1 approved, 2 approved with reservations]. F1000Research 5:1542) and SummarizedExperiment (Morgan et al. (2018) SummarizedExperiment: SummarizedExperiment container) packages. After compiling the patient tumor dataset, the intersecting genes between the TCGA dataset and all the query samples (CCLs, PDXs, GEMMs) were found, and only those genes were used as features for building the classifier. Two-thirds of the patient tumor profiles from each cancer category were randomly sampled as the training set, and the rest were used as a validation set to measure the classifier's performance (step 1). The training subset was then down-sampled to 500,000 counts per cell (weightedDown_total=5e5), then scaled up such that the total expression per cell was 100,000 (transprop_xFact=1e5), and log transformed (step 2). Using log-transformed down-sampled counts, the top 25 differentially over-expressed genes, top 25 differentially under-expressed genes and 25 least differentially expressed genes were found as baseline genes for generating gene-pairs per cancer type (nTopgenes=25) (step 3). A quicker version of pair-transform (quickPairs=TRUE), different from that of Tan et al. (Tan et al. (2018) “SingleCellNet: a computational tool to classify single cell RNA-Seq data across platforms and across species,” BioRxiv), was performed by generating gene-pairs among the 75 genes found in step 3 for each cancer type (step 4). The normalized training data were binarized through pair-transformation inspired by the top-pair classifier (Geman et al. (2004) “Classifying gene expression profiles from pairwise mRNA comparisons,” Statistical Applications in Genetics and Molecular Biology 3, p. Article19.). The top 70 most discriminatory gene-pairs for each cancer type were then selected (step 5) (Table 1). Additionally, 70 random gene-pair profiles were generated through random permutations of existing training data (nrand=70) and annotated as a “rand” or “Unknown” category, which is designed to capture cases where query samples are not represented by any of the cancer categories in the classifier (step 6). Using the selected top gene-pairs as features, a CCN random forest classifier of 1000 trees (nTrees=1000) was constructed (step 7). Additionally, stratified sampling with a strata size of 60 (stratify=TRUE, samplesize=60) was used in the construction of the random forest classifier to address the imbalance in the number of profiles across different cancer types.
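
As a non-limiting illustration of steps 4 and 6, pair-transformation and the generation of a permuted profile might be sketched in Python as follows. Treating gene-pairs as unordered combinations and building a random profile by shuffling expression values across genes within one profile are assumptions made for this sketch, and all names are hypothetical.

```python
import random
from itertools import combinations


def make_gene_pairs(genes):
    """Enumerate unordered gene-pairs among a list of baseline genes."""
    return list(combinations(genes, 2))


def pair_transform(profile, gene_pairs):
    """Binarize a profile: 1 if the first gene of a pair exceeds the second, else 0."""
    return [1 if profile[a] > profile[b] else 0 for a, b in gene_pairs]


def random_profile(profile, rng):
    """Shuffle expression values across genes to mimic an 'Unknown' profile."""
    values = list(profile.values())
    rng.shuffle(values)
    return dict(zip(profile, values))


# Hypothetical toy profile with three baseline genes.
profile = {"G1": 5.0, "G2": 2.0, "G3": 7.0}
pairs = make_gene_pairs(list(profile))
binarized = pair_transform(profile, pairs)  # [1, 0, 0]
shuffled = random_profile(profile, random.Random(0))
shuffled_binarized = pair_transform(shuffled, pairs)
```

Because each feature depends only on the relative order of two genes within one sample, the binarized profile is unchanged by any monotonic rescaling of that sample, which is one reason a pair-based representation lends itself to comparisons across platforms. Under the unordered-pair assumption, 75 baseline genes yield 2,775 candidate pairs per cancer type.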

[0075]

After the CCN classifier was built, 35 held-out samples from each of the cancer categories were randomly sampled from the held-out data, and 40 “Unknown” profiles were generated for validation (step 8). The held-out data were gene-pair transformed for assessment based on the top gene-pairs selected (step 9). The performance of the classifier was assessed using the precision-recall curve and the area under the precision-recall curve (AUPR) (step 10). The process of randomly sampling a training set from all patient tumor data, training the classifier, and validating it using the validation set (steps 1-10) was repeated 50 times to obtain a robust assessment of the classifier, represented in FIG. 3B and FIG. 4A. After the parameters were tuned based on the performance of the classifier on held-out data, a final version of the CCN classifier was trained using all the TCGA patient tumor data and 2000 trees (nTrees=2000), with all the other parameters staying the same, to improve overall robustness and classification power. The specific parameters for the final CCN classifier and the gene-pairs can be found in Table 1. The parameters used to train CCN are provided in Table 13.
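
To illustrate step 10 without limitation, the area under a precision-recall curve for one cancer category might be computed as in the following Python sketch. Integrating the curve as a step function (often called average precision) is an assumption of this sketch, tied scores are not grouped, and the function names are hypothetical.

```python
def precision_recall_points(scores, labels):
    """Return (recall, precision) at each rank, ordered by decreasing score.

    labels are 1 for samples of the category of interest and 0 otherwise.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        points.append((true_positives / n_pos, true_positives / rank))
    return points


def aupr(scores, labels):
    """Step-wise area under the precision-recall curve (average precision)."""
    area = 0.0
    previous_recall = 0.0
    for recall, precision in precision_recall_points(scores, labels):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


# Hypothetical classification scores for one category on four held-out samples.
example_aupr = aupr([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])  # (1/1 + 2/3) / 2
perfect_aupr = aupr([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

In a multi-class setting, such a value can be computed once per category by treating that category as positive and all others as negative, and then averaged across repeated train and validation splits.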

[0076]

Classifying Query Data into Broad Class

[0077]

The cancer cell lines expression profiles and sample table were downloaded from a portal at the Broad Institute. PDX expression profiles and a sample table were obtained from Gao et al (Gao et al. (2015) “High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response,” Nature Medicine 21(11):1318-1325). GEMM expression profiles were obtained from 10 different studies on GEO database (Adeegbe et al. (2018) “BET Bromodomain Inhibition Cooperates with PD-1 Blockade to Facilitate Antitumor Response in Kras-Mutant Non-Small Cell Lung Cancer,” Cancer immunology research 6(10):1234-1245; Blaisdell et al. (2015), “Neutrophils oppose uterine epithelial carcinogenesis via debridement of hypoxic tumor cells,” Cancer Cell 28(6):785-799; Fitamant et al. (2015) “YAP inhibition restores hepatocyte differentiation in advanced HCC, leading to tumor regression,” Cell reports 10(10):1692-1707; Jia et al. (2018) “Crebbp loss drives small cell lung cancer and increases sensitivity to HDAC inhibition,” Cancer discovery 8(11):1422-1437; Kress et al. (2016) “Identification of MYC-Dependent Transcriptional Programs in Oncogene-Addicted Liver Tumors,” Cancer Research 76(12):3463-3472; Li et al. (2018) “GKAP acts as a genetic modulator of NMDAR signaling to govern invasive tumor growth,” Cancer Cell 33(4):736-751.e5; Mollaoglu et al. (2018) “The Lineage-Defining Transcription Factors SOX2 and NKX2-1 Determine Lung Cancer Cell Fate and Shape the Tumor Immune Microenvironment,” Immunity 49(4):764-779.e9; Pan et al. (2017) “Whole tumor RNA-sequencing and deconvolution reveal a clinically-prognostic PTEN/PI3K-regulated glioma transcriptional signature,” Oncotarget 8(32):52474-52487; Lissanu Deribe et al. (2018) “Mutations in the SWI/SNF complex induce a targetable dependence on oxidative phosphorylation in lung cancer,” Nature Medicine 24(7):1047-1057). 
To use the CCN classifier on GEMM data, the mouse genes in the GEMM expression profiles were converted into their human orthologs. Once a final classifier was trained with all the patient tumor samples, the query samples were gene-pair transformed with the gene-pairs selected in the training step, and the query samples were classified using CCN. The results were analyzed using R, and the classification results were visualized through heatmaps and attribution plots generated using the R package ggplot2 (Wickham (2016) ggplot2: Elegant Graphics for Data Analysis. New York, N.Y.: Springer-Verlag New York).
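
As a hedged illustration of preparing a GEMM query sample, the following Python sketch converts mouse gene symbols to human orthologs and then applies gene-pairs chosen during training. The ortholog table shown is a small hand-written example, the unmapped mouse identifier is hypothetical, and keeping the larger value when two mouse genes map to one human gene is an assumption of this sketch.

```python
def to_human_orthologs(mouse_profile, ortholog_map):
    """Re-key a mouse expression profile by human ortholog symbol.

    Mouse genes without an entry in ortholog_map are dropped. If several mouse
    genes map to one human gene, the larger value is kept (an assumption).
    """
    human_profile = {}
    for mouse_gene, value in mouse_profile.items():
        human_gene = ortholog_map.get(mouse_gene)
        if human_gene is None:
            continue
        human_profile[human_gene] = max(value, human_profile.get(human_gene, value))
    return human_profile


def pair_transform(profile, gene_pairs):
    """Binarize a profile using gene-pairs selected during training."""
    return [1 if profile[a] > profile[b] else 0 for a, b in gene_pairs]


# Small illustrative ortholog table and query profile.
ortholog_map = {"Trp53": "TP53", "Kras": "KRAS", "Egfr": "EGFR"}
mouse_profile = {"Trp53": 4.0, "Kras": 9.0, "Egfr": 1.5, "Gm_unmapped": 3.0}

human_profile = to_human_orthologs(mouse_profile, ortholog_map)
training_pairs = [("KRAS", "TP53"), ("EGFR", "TP53")]  # hypothetical selected pairs
query_features = pair_transform(human_profile, training_pairs)  # [1, 0]
```

Restricting the classifier to genes shared by the training data and the converted query samples ensures that every selected gene-pair can be evaluated on every query sample.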

[0078]

Cross-Species Assessment

[0079]

Among the innovative aspects of the CCN tool is the ability to perform cross-species analysis. To assess the performance of cross-species classification, 1003 labelled human tissue/cell type and 1993 labelled mouse tissue/cell type RNA-seq expression profiles were downloaded from GitHub. The mouse genes were converted into human orthologous genes. Then the intersecting genes were found between the mouse tissue/cell expression profiles and the human tissue/cell expression profiles. Using the intersecting genes, a CCN classifier was trained with all the human tissue/cell expression profiles. The parameters can be found in Table 3. After the classifier was trained, 75 samples were randomly sampled from each tissue category in the mouse tissue/cell data and the classifier was applied to those samples to assess performance. The AUPR is depicted in FIG. 4C.

[0080]

Cross-Technology Assessment

[0081]

To assess the performance of CCN when applied to microarray data, 6219 patient tumor microarray profiles across 12 different cancer types were gathered from more than 100 different projects in the GEO database. The intersecting genes between the microarray profiles and the TCGA patient RNA-seq profiles were identified. Using those genes as features, a CCN classifier was created with all the TCGA patient profiles using the hyper-parameters listed in Table 4. The parameters used to train CCN are provided in Table 13. After the microarray-specific classifier was trained, 60 microarray patient samples were randomly sampled from each cancer category, and the CCN classifier was applied to them as an assessment of the cross-technology performance. The same CCN classifier was used to classify microarray CCL samples.

[0082]

Training Sub-Type CancerCellNet

[0083]

Eleven cancer types (BRCA, COAD, ESCA, HNSC, KIRC, LGG, PAAD, UCEC, STAD, LUAD, LUSC) were found to have meaningful subtypes based on either histology or expression and sufficient samples in every subtype to train a sub-type classifier with high AUPR. Normal tissue samples from BRCA, COAD, HNSC, KIRC, and UCEC were also included to create a normal tissue category in the construction of their sub-type classifiers. To train a sub-type classifier, a sample table was manually curated annotating each sample as either a cancer sub-type or “Unknown,” representing other cancer types. Similar to training for the broad class classifier, ⅔ of all samples in each sub-type (and the “Unknown” category) were randomly sampled as training data. Expression down-sampling, gene selection, gene-pair transformation and gene-pair selection (steps 2-5 from broad training) were performed using just the samples labelled as a cancer sub-type (excluding samples labelled as “Unknown”) to find discriminating gene pairs that can differentiate sub-types within the broad cancer type. Unlike in the broad class CCN training, the quick version of pair-transform was not used for creating gene-pairs for feature selection. In addition to having gene-pairs as features, the final broad class classifier was applied to all the training samples and the classification scores were added as features, mainly to discriminate between the broad cancer type of interest and other cancer types. For some sub-type classifiers, the weight of the broad classification scores as features was increased to fine tune the sub-type classifiers. Some random permutation samples were also generated and added to the “Unknown” training data along with expression profiles of other cancer types. The specific parameters used to train individual sub-type classifiers can be found in Table 5. The parameters used to train CCN are provided in Table 13.

[0084]

An equal number of samples from each sub-type and from the Unknown category in the held-out data was then sampled for assessing the sub-type classifiers through AUPR. The process was repeated 20 times for robust assessment of the sub-type classifiers. The results are shown in FIG. 4E. For the final sub-type classifiers of the 11 broad categories, all of the TCGA data was used.

[0085]

Classifying Query Data into Sub-Type

[0086]

The 11 sub-type classifiers were applied to query samples when available. Heatmap visualizations were generated using the ComplexHeatmap package (Gu et al. (2016) “Complex heatmaps reveal patterns and correlations in multidimensional genomic data,” Bioinformatics 32(18):2847-2849) and other analyses were done in R.

[0087]

Results

[0088]

CancerCellNet Classifies Samples Accurately Across Species and Technologies

[0089]

A computational tool was previously developed using the random forest classification method to measure the similarity of engineered cell populations with their in vivo counterparts based on transcriptional profiles (Cahan et al. (2014) “CellNet: network biology applied to stem cell engineering,” Cell 158(4):903-915; Radley et al. (2017) “Assessment of engineered cells using CellNet and RNA-seq,” Nature Protocols 12(5):1089-1102). This approach was recently elaborated to allow for classification of single cell RNA-Seq data in a manner that allows for cross-platform and cross-species analysis (Tan et al. (2018) “SingleCellNet: a computational tool to classify single cell RNA-Seq data across platforms and across species,” BioRxiv). In the present example, an approach was used to quantitatively compare cancer models to naturally occurring patient tumors (FIG. 3A). In brief, The Cancer Genome Atlas (TCGA) expression data from 25 solid tumor types was used to train a top-pair multi-class random forest classifier. The approach also included an ‘Unknown’ category trained on a random shuffling and sampling of profiles from the remaining 24 tumor types in the training data to identify query samples that are not reflective of any of the training data.

[0090]

The performance of this approach was assessed by computing the area under the precision-recall curves derived by k-fold cross validation (n=50) (FIG. 3B and FIG. 4A). In the k-fold cross validation, the mean AUPR exceeded 0.95 in most of the tumor types and was below 0.7 only for the READ and COAD categories. This is not surprising, as READ and COAD are considered to be the same disease. In addition to achieving high mean AUPRs on held-out TCGA data, it was found that CCN also achieved high AUPR (above 0.9) when it was applied to independent testing data from ICGC consisting of RNA-Seq data from 886 tumors across 5 tumor types (FIG. 4B) (Zhang et al. (2011) “International Cancer Genome Consortium Data Portal—a one-stop shop for cancer genomics data,” Database: the Journal of Biological Databases and Curation, p. bar026).

[0091]

Because one of the aims of the study was to compare distinct cancer models, including GEMMs, the exemplary method needed to be able to classify mouse and human samples equivalently. The Top-Pair transform, previously described (Tan et al. (2018) “SingleCellNet: a computational tool to classify single cell RNA-Seq data across platforms and across species,” BioRxiv), was used to achieve this, and the feasibility of this approach was tested by assessing the performance of a normal (i.e., non-tumor) human tissue classifier as applied to mouse tissues. Consistent with prior applications, it was found that the cross-species classifier performed well, achieving a mean AUPR of 0.93 when applied to mouse data (FIG. 4C).

[0092]

To evaluate cancer models at a finer resolution, an approach was developed to perform tumor sub-type classifications (FIG. 4D). Eleven different cancer sub-type classifiers were constructed based on the availability of expression or histological subtype information (Cancer Genome Atlas Network (2012), “Comprehensive molecular portraits of human breast tumours,” Nature 490(7418):61-70; Parker et al. (2009), “Supervised risk predictor of breast cancer based on intrinsic subtypes,” Journal of Clinical Oncology 27(8): 1160-1167; Cancer Genome Atlas Network (2012), “Comprehensive molecular characterization of human colon and rectal cancer,” Nature 487(7407):330-337; Cancer Genome Atlas Research Network (2017), “Integrated genomic characterization of pancreatic ductal adenocarcinoma,” Cancer Cell 32(2):185-203.e13; Cancer Genome Atlas Network (2015), “Comprehensive genomic characterization of head and neck squamous cell carcinomas,” Nature 517(7536):576-582; Cancer Genome Atlas Research Network (2013), “Comprehensive molecular characterization of clear cell renal cell carcinoma,” Nature 499(7456):43-49; Verhaak et al. (2010), “Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1,” Cancer Cell 17(1):98-110; Cancer Genome Atlas Research Network (2014), “Comprehensive molecular profiling of lung adenocarcinoma,” Nature 511(7511): 543-550; Wilkerson et al. (2010), “Lung squamous cell carcinoma mRNA expression subtypes are reproducible, clinically important, and correspond to normal cell types,” Clinical Cancer Research 16(19):4864-4875; Cancer Genome Atlas Research Network, Analysis Working Group: Asan University, BC Cancer Agency, et al. (2017), “Integrated genomic characterization of oesophageal carcinoma,” Nature 541(7636):169-175; Hu et al. 2012; Cancer Genome Atlas Research Network, Kandoth et al. 
(2013) “Integrated genomic characterization of endometrial carcinoma,” Nature 497(7447):67-73). Non-cancerous, normal tissues were also included when available for several sub-type classifiers (BRCA, COAD, HNSC, KIRC and UCEC). The 11 sub-type classifiers all achieved high overall AUPRs ranging from 0.78 to 0.98 (FIG. 4E).

[0093]

Fidelity of Cancer Cell Lines

[0094]

Having validated the performance of CCN, it was then used to determine the fidelity of CCLs. RNA-seq expression data for 657 different cell lines across 20 cancer types were mined from the Cancer Cell Line Encyclopedia (CCLE), and CCN was applied to them, revealing a wide classification range for the cell lines of each tumor type (FIG. 5A). To verify the classification results, CCN was applied to CCLE expression profiles generated through microarray expression profiling. To ensure that CCN would function on microarray data, CCN was first applied to 720 expression profiles of 12 tumor types from GEO. The cross-platform CCN classifier performed well, based on comparison to study-provided annotation, achieving a mean AUPR of 0.94 (FIG. 6A). Next, this cross-platform classifier was applied to the microarray expression profiles of CCLE (FIG. 6B). From the classification results of 571 cell lines that have both RNA-seq and microarray expression profiles, a strong positive association was found between the classification scores from RNA-seq and those from microarray (FIG. 6C). This comparison supports the notion that the classification scores for each cell line are not artifacts of profiling methodology. Moreover, this comparison shows that the scores are consistent between the time that the cell lines were first assayed by microarray expression profiling in 2012 and the time that they were assayed by RNA-Seq in 2019, further validating the robustness of the CCN results.

[0095]

Next, the CCN scores of CCLE cell lines were categorized based on the proportion of lines associated with each tumor type that were correctly classified. A decision threshold of 0.266 was set because it represents the 5th percentile of all TCGA held-out classification scores, which ensures a true positive rate of at least 95% for the held-out data. Each cell line was placed into one of five categories based on its CCN profile: correctly classified, mix-correctly classified, not classified, mix-incorrectly classified, and incorrectly classified (FIG. 5B). Cell lines originally annotated as BRCA, CESC, SKCM, and SARC had a high proportion of lines correctly classified. The COAD_READ cell lines had a high proportion of lines with mixed classification, reflecting the similarities of the tumor samples in the COAD and READ training data. Seventeen out of twenty tumor types had greater than 25% of lines that received no classification. In particular, no ESCA, GBM, or LGG cell lines were classified as such, suggesting that these tumor types need more faithful cell line models (FIGS. 5A and 5B).
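The thresholding and five-way categorization described above can be sketched as follows. This is only an illustration under stated assumptions: the nearest-rank percentile definition, the exact rule mapping a score profile to each of the five categories, the function names, and all score values other than the 0.266 threshold are hypothetical and are not specified in this section.

```python
import math
from typing import Dict, List


def percentile_threshold(scores: List[float], pct: float = 5.0) -> float:
    """Nearest-rank percentile of held-out scores (assumed definition)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def categorize(profile: Dict[str, float], annotated: str,
               threshold: float) -> str:
    """Assign one of the five categories (assumed rule).

    A tumor type counts as 'called' when its score meets the threshold.
    """
    called = {t for t, s in profile.items() if s >= threshold}
    if not called:
        return "not classified"
    if annotated in called:
        return ("correctly classified" if len(called) == 1
                else "mix-correctly classified")
    return ("incorrectly classified" if len(called) == 1
            else "mix-incorrectly classified")


# Hypothetical held-out scores (20 values); the lowest one is the 5th percentile.
held_out = [0.266, 0.41, 0.52, 0.55, 0.6, 0.63, 0.67, 0.7, 0.72, 0.75,
            0.78, 0.8, 0.82, 0.85, 0.87, 0.9, 0.92, 0.94, 0.96, 0.98]
threshold = percentile_threshold(held_out)

examples = {
    "a": categorize({"BRCA": 0.71, "OV": 0.05}, "BRCA", threshold),
    "b": categorize({"COAD": 0.40, "READ": 0.35}, "COAD", threshold),
    "c": categorize({"LUAD": 0.10, "LUSC": 0.20}, "LUSC", threshold),
    "d": categorize({"HNSC": 0.50, "LUSC": 0.10}, "LUSC", threshold),
    "e": categorize({"HNSC": 0.50, "ESCA": 0.30, "LUSC": 0.10}, "LUSC", threshold),
}
```

Choosing the threshold as a low percentile of held-out tumor scores ties the decision boundary to the classifier's behavior on real tumors, so that a model scoring below it resembles its annotated tumor type less than nearly all held-out tumors do.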

[0096]

One way to explain low classification scores is that some cell lines are derived from and represent sub-types of tumors that are not well-represented in TCGA. To explore this hypothesis, tumor sub-type classification was first performed on the CCLE lines from the 11 tumor types for which sub-type classifiers had been trained. It was reasoned that if a cell line was a good model for a rarer sub-type, then it would receive a poor general classification but a high classification for the sub-type that it models well. Therefore, the number of lines that fit this pattern was counted. It was found that of the 198 lines with no general classification, 52 (26%) were classified as a specific sub-type, suggesting that derivation from rare sub-types is not the major contributor to poor overall CCL classification.

[0097]

Another potential contributor to low scoring cell lines could be the intra-tumor impurity in the training data. If impurity were such a confounder of CCN scoring, then a positive correlation between mean purity and mean CCN classification of CCLE per general tumor type would be expected. However, a low Pearson correlation of 0.076 was found between the mean purity and the mean CCN classification scores of CCLE, suggesting that tumor purity is not a major contributor to the low scoring of CCLE lines (FIG. 5D).
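The purity check above amounts to computing a Pearson correlation across tumor types between two per-type means. A minimal sketch follows, with hypothetical input values; the actual purity estimates and scores are not reproduced here.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-tumor-type means: tumor purity of the training data
# and CCN classification score of the corresponding cell lines.
mean_purity = [0.62, 0.71, 0.80, 0.55, 0.90]
mean_ccn_score = [0.30, 0.12, 0.25, 0.18, 0.22]
r = pearson(mean_purity, mean_ccn_score)
```

A coefficient near zero, as reported for the CCLE data, would indicate that tumor types with purer training samples do not systematically yield higher cell line scores.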

[0098]

Next, the sub-type classification of CCLs from three general tumor types was explored in more depth, focusing first on Uterine Corpus Endometrial Carcinoma (UCEC). The histologically based sub-types of UCEC, endometrioid and serous, differ in prevalence, molecular properties, prognosis, and treatment (Black et al. (2014), “Targeted therapy in uterine serous carcinoma: an aggressive variant of endometrial cancer,” Women's health (London, England) 10(1):45-57; Yang et al. (2011), “Progesterone: the ultimate endometrial tumor suppressor,” Trends in Endocrinology and Metabolism 22(4):145-152). CCN classified the majority of the UCEC cell lines as serous. All of the other lines were classified as ‘unknown’ except for JHUEM-1 and HEC-265, which received a mixed serous and endometrioid classification, meaning that the classification score of each sub-type exceeded the 5th percentile of TCGA held-out classification scores (FIG. 5C). The preponderance of serous over endometrioid classifications may be due to properties of serous cancer cells that aid propagation in vitro, such as upregulation of cell adhesion (Huszar et al. (2010), “Up-regulation of L1CAM is linked to loss of hormone receptors and E-cadherin in aggressive subtypes of endometrial carcinomas,” The Journal of Pathology 220(5):551-561), which helps the derivation of CCLs. Some of the sub-type classification results are consistent with prior observations. For example, HEC-1A, HEC-1B, and KLE were previously characterized as endometrial (Kozak et al. (2018) “A guide for endometrial cancer cell lines functional assays using the measurements of electronic impedance,” Cytotechnology 70(1):339-350). On the other hand, the sub-type classification results contradict prior observations in at least one case. For example, although Ishikawa ER− has been used as a model of endometrioid cancer (Korch et al. (2012), “DNA profiling analysis of endometrial and ovarian cell lines reveals misidentification, redundancy and contamination,” Gynecologic Oncology 127(1):241-248; Kozak et al. (2018) “A guide for endometrial cancer cell lines functional assays using the measurements of electronic impedance,” Cytotechnology 70(1):339-350), CCN classified the Ishikawa 02 ER− cell line strongly as serous. This could be a result of ER negativity being a characteristic of type 2 endometrial cancer (Black et al. (2014), “Targeted therapy in uterine serous carcinoma: an aggressive variant of endometrial cancer,” Women's health (London, England) 10(1): 45-57). Taken together, these results indicate a need for more endometrioid-like CCLs.

[0099]

Next, the sub-type classification of Lung Squamous Cell Carcinoma (LUSC) cell lines (FIG. 5D) was examined. It was found that of the 19 lines unclassified or misclassified in the general classifier, 16 (84%) were considered to be the unknown sub-type. The remaining three lines had general classification scores modestly below the threshold; two had sub-type classification as primitive, and one as a mix of basal, primitive and secretory. Among the LUSC cell lines that were classified, all have an underlying primitive sub-type classification. This is consistent either with the ease of deriving lines from tumors with a primitive character, or with a process by which cell line derivation promotes similarity to the primitive sub-type, which is marked by increased cellular proliferation (Wilkerson et al. (2010), “Lung squamous cell carcinoma mRNA expression subtypes are reproducible, clinically important, and correspond to normal cell types,” Clinical Cancer Research 16(19):4864-4875). The results are consistent with prior reports that have investigated the resemblance of some lines to LUSC sub-types. For example, HCC-95, classified as classical and primitive sub-type, has previously been characterized as classical (Wu et al. (2013), “Gene-expression data integration to squamous cell lung cancer subtypes reveals drug sensitivity,” British Journal of Cancer 109(6):1599-1608; Wilkerson et al. (2010), “Lung squamous cell carcinoma mRNA expression subtypes are reproducible, clinically important, and correspond to normal cell types,” Clinical Cancer Research 16(19):4864-4875). Further, LUDLU-1, classified as a mix of primitive, basal and classical, was previously characterized as resembling both basal and classical (Wu et al. (2013), “Gene-expression data integration to squamous cell lung cancer subtypes reveals drug sensitivity,” British Journal of Cancer 109(6):1599-1608). 
Lung Adenocarcinoma (LUAD) cell lines had classification results similar to those of LUSC: most lines did not classify as LUAD in the general classifier (53 of 76), and most of the remaining lines exhibited mixed sub-type classification (FIG. 5E). RERF-LC-Ad1 had the highest general classification score and the highest proximal inflammation sub-type classification score. Taken together, these sub-type classification results have revealed an absence of cell line models for basal, classical, and secretory LUSC, and for the TRU LUAD sub-type.

[0100]

Finally, it was sought to measure the extent to which cell line transcriptional fidelity related to model use. The number of papers in which a model was mentioned, normalized by the number of years since the cell line was derived, was used as a rough approximation of model usage. To explore this metric, the normalized citation count was plotted versus general classification score, labeling the highest cited and highest classified cell lines from each general tumor type (FIG. 5F). For most of the general tumor types, the highest cited cell line is not the highest classified cell line; the exceptions are Hep G2 and ML-1, representing LIHC and THCA, respectively. On the other hand, the general scores of the highest cited cell lines representing BRCA, LUAD, OV, PRAD and SKCM fall below the classification threshold of 0.266. Notably, each of these tumor types has lines with scores exceeding 0.5, suggesting that these lines should be considered as more faithful transcriptional models when selecting lines for a study.
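The usage metric described above, papers mentioning a line divided by years since its derivation, can be sketched together with the per-tumor-type comparison of the most cited and the highest scoring line. The record layout, function names, cell line names, and numbers below are hypothetical placeholders rather than data from the study.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CellLine:
    name: str
    tumor_type: str
    papers: int        # number of papers mentioning the line
    year_derived: int
    ccn_score: float   # general classification score


def normalized_citations(line: CellLine, current_year: int) -> float:
    """Papers per year since derivation (at least one year is assumed)."""
    years = max(1, current_year - line.year_derived)
    return line.papers / years


def most_cited_and_best(lines: List[CellLine],
                        current_year: int) -> Dict[str, Tuple[str, str]]:
    """For each tumor type, return (most cited line, highest scoring line)."""
    by_type: Dict[str, List[CellLine]] = {}
    for line in lines:
        by_type.setdefault(line.tumor_type, []).append(line)
    result = {}
    for tumor_type, group in by_type.items():
        cited = max(group, key=lambda l: normalized_citations(l, current_year))
        best = max(group, key=lambda l: l.ccn_score)
        result[tumor_type] = (cited.name, best.name)
    return result


# Hypothetical records for illustration only.
lines = [
    CellLine("line_A", "BRCA", papers=4000, year_derived=1980, ccn_score=0.20),
    CellLine("line_B", "BRCA", papers=300, year_derived=2000, ccn_score=0.62),
    CellLine("line_C", "LIHC", papers=2500, year_derived=1975, ccn_score=0.81),
    CellLine("line_D", "LIHC", papers=100, year_derived=1995, ccn_score=0.40),
]
summary = most_cited_and_best(lines, current_year=2020)
```

In this toy data the most used BRCA line is not the best scoring one, while the two coincide for LIHC, mirroring the kind of mismatch the paragraph above describes.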

[0101]

Evaluation of Patient Derived Xenografts

[0102]

Next, it was sought to evaluate a more recent class of cancer models: PDX. To do so, the RNA-Seq expression profiles of 415 PDX models from 13 different cancer types generated previously (Gao et al. (2015), “High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response,” Nature Medicine 21(11):1318-1325) were subjected to CCN. Similar to the results of CCLE, the PDXs exhibited a wide range of classification scores (FIG. 7A). By categorizing the CCN scores of PDX based on the proportion of samples associated with each tumor type that were correctly classified, it was found that SARC, SKCM and BRCA have a higher proportion of correctly classified PDXs than other cancer categories (FIG. 7B). In contrast to CCLE, a higher proportion of correctly classified PDXs was found in STAD and KIRC (FIG. 7B). However, similar to CCLE, no ESCA PDXs were correctly classified. This held true when sub-type classification was performed on PDX samples: none of the ESCA PDXs were classified as any rare ESCA sub-type (FIG. 11). UCEC PDXs included endometrioid, serous, and mixed sub-types, which provides broader representation than in CCLE (FIG. 8C). A large proportion of LUSC PDXs were misclassified as HNSC, yet received strong basal and classical sub-type classifications (FIG. 8D). This could result from the similarity in expression profiles of the basal and classical sub-types of HNSC and LUSC (Walter et al. (2013), “Molecular subtypes in head and neck cancer exhibit distinct patterns of chromosomal gain and loss of canonical cancer genes,” Plos One 8(2):e56823; Wickham (2016) ggplot2: Elegant Graphics for Data Analysis, New York, N.Y.: Springer-Verlag New York). No LUSC PDXs were classified as the secretory sub-type (FIG. 8D). 
While 9 of the LUAD PDX samples were classified as the unknown sub-type, the remaining 5 were classified as proximal proliferative or as mixed proximal proliferative and proximal inflammatory (FIG. 9). Finally, similar to the CCLE, there were no TRU sub-types in the PDX cohort (FIG. 9). Collectively, these results indicate that PDXs can have very high transcriptional fidelity to both general tumor types and sub-types.

[0103]

Evaluation of GEMMs

[0104]

Next, CCN was used to evaluate GEMMs of six general tumor types from ten studies for which expression data was publicly available (Adeegbe et al. (2018) “BET Bromodomain Inhibition Cooperates with PD-1 Blockade to Facilitate Antitumor Response in Kras-Mutant Non-Small Cell Lung Cancer,” Cancer immunology research 6(10):1234-1245; Blaisdell et al. (2015), “Neutrophils oppose uterine epithelial carcinogenesis via debridement of hypoxic tumor cells,” Cancer Cell 28(6):785-799; Fitamant et al. (2015) “YAP inhibition restores hepatocyte differentiation in advanced HCC, leading to tumor regression,” Cell reports 10(10):1692-1707; Jia et al. (2018) “Crebbp loss drives small cell lung cancer and increases sensitivity to HDAC inhibition,” Cancer discovery 8(11):1422-1437; Kress et al. (2016) “Identification of MYC-Dependent Transcriptional Programs in Oncogene-Addicted Liver Tumors,” Cancer Research 76(12):3463-3472; Li et al. (2018) “GKAP acts as a genetic modulator of NMDAR signaling to govern invasive tumor growth,” Cancer Cell 33(4):736-751.e5; Mollaoglu et al. (2018) “The Lineage-Defining Transcription Factors SOX2 and NKX2-1 Determine Lung Cancer Cell Fate and Shape the Tumor Immune Microenvironment,” Immunity 49(4):764-779.e9; Pan et al. (2017) “Whole tumor RNA-sequencing and deconvolution reveal a clinically-prognostic PTEN/PI3K-regulated glioma transcriptional signature,” Oncotarget 8(32):52474-52487; Lissanu Deribe et al. (2018) “Mutations in the SWI/SNF complex induce a targetable dependence on oxidative phosphorylation in lung cancer,” Nature Medicine 24(7):1047-1057). As was true for CCLs and PDXs, GEMMs also had a wide range of CCN scores (FIG. 8A). The CCN scores were next categorized based on the proportion of samples associated with each tumor type that were correctly classified (FIG. 8B). In contrast to CCLs and PDXs, the GEMM dataset included multiple replicates per model, which allowed for the examination of intra-GEMM variability. 
Both at the level of CCN score and at the level of categorization, GEMMs were highly invariant. For example, replicates of LUAD GEMMs (driven by Kras mutation and loss of p53 (Adeegbe et al. (2018) “BET Bromodomain Inhibition Cooperates with PD-1 Blockade to Facilitate Antitumor Response in Kras-Mutant Non-Small Cell Lung Cancer,” Cancer immunology research 6(10):1234-1245), and Smarca4 loss (Lissanu Deribe et al. (2018) “Mutations in the SWI/SNF complex induce a targetable dependence on oxidative phosphorylation in lung cancer,” Nature Medicine 24(7):1047-1057), or overexpression of Sox2 and loss of Lkb1 (Mollaoglu et al. (2018) “The Lineage-Defining Transcription Factors SOX2 and NKX2-1 Determine Lung Cancer Cell Fate and Shape the Tumor Immune Microenvironment,” Immunity 49(4):764-779.e9)) were all correctly classified (FIG. 8B). GEMMs sharing genotypes across studies, such as Pgr(cre/+)Pten(lox/lox)-driven UCEC (Blaisdell et al. (2015), “Neutrophils oppose uterine epithelial carcinogenesis via debridement of hypoxic tumor cells,” Cancer Cell 28(6):785-799; Daikoku et al. (2008) “Conditional loss of uterine Pten unfailingly and rapidly induces endometrial cancer in mice,” Cancer Research 68(14):5619-5627), received highly similar general and sub-type classification scores (FIG. 9). Even GEMMs with mixed classifications received consistent CCN scores. For example, LGG GEMMs, generated by Nf1 mutations expressed in different neural progenitors in combination with Pten deletion (Pan et al. (2017) “Whole tumor RNA-sequencing and deconvolution reveal a clinically-prognostic PTEN/PI3K-regulated glioma transcriptional signature,” Oncotarget 8(32):52474-52487), consistently received mixed classification as both LGG and GBM (FIG. 8A).

[0105]

To explore the extent to which driver genotype impacts sub-type classification, two general tumor types were examined in which there were GEMMs with different tumor drivers: LUSC and LUAD. The LUSC GEMMs were generated using loss of Lkb1 and either overexpression of Sox2 (via two distinct mechanisms) or loss of Pten (Mollaoglu et al. (2018) “The Lineage-Defining Transcription Factors SOX2 and NKX2-1 Determine Lung Cancer Cell Fate and Shape the Tumor Immune Microenvironment,” Immunity 49(4):764-779.e9). It was found that most of the lenti-Sox2-Cre-infected;Lkb1fl/fl samples were classified as LUSC, whereas the majority of the Rosa26LSL-Sox2-IRES-GFP;Lkb1fl/fl samples were classified as either LUAD or a mixture of LUAD and LUSC (FIG. 8C). It is possible that the distinct transcriptional programs result from differing levels of exogenous Sox2 expression in these models, and that the samples with mixed classification results reflect an adenosquamous carcinoma phenotype. Most of the Lkb1fl/fl;Ptenfl/fl GEMMs were classified as ‘unknown’. Moreover, the sub-type classification indicated that this GEMM was either unknown or of mixed secretory/primitive sub-type, in contrast to prior reports suggesting that it is most similar to a basal sub-type (Xu et al. (2014) “Loss of Lkb1 and Pten leads to lung squamous cell carcinoma with elevated PD-L1 expression,” Cancer Cell 25(5):590-604). The results show that Lkb1fl/fl;Ptenfl/fl GEMMs are mostly classified as unknown or as primitive and secretory sub-types, which correlates with the general classification scores. The lenti-Sox2-Cre-infected;Lkb1fl/fl samples were more strongly classified as the secretory sub-type, whereas the Rosa26LSL-Sox2-IRES-GFP;Lkb1fl/fl samples were classified as a more balanced mix of secretory and primitive sub-types. None of the three LUSC GEMMs were sub-typed as classical or basal. 
All of the LUAD GEMMs, which were generated using various combinations of activating Kras mutation, loss of Trp53, loss of Lkb1, and loss of Smarca4 (Lissanu Deribe et al. (2018) “Mutations in the SWI/SNF complex induce a targetable dependence on oxidative phosphorylation in lung cancer,” Nature Medicine 24(7):1047-1057; Adeegbe et al. (2018) “BET Bromodomain Inhibition Cooperates with PD-1 Blockade to Facilitate Antitumor Response in Kras-Mutant Non-Small Cell Lung Cancer,” Cancer immunology research 6(10):1234-1245; Mollaoglu et al. (2018) “The Lineage-Defining Transcription Factors SOX2 and NKX2-1 Determine Lung Cancer Cell Fate and Shape the Tumor Immune Microenvironment,” Immunity 49(4):764-779.e9), were correctly classified (FIG. 8D). There were no substantial differences in general or sub-type classification across driver genotypes. Notably, the sub-types tended to be a mixture of proximal proliferation, proximal inflammation and TRU. Taken together, this analysis suggests that there is a degree of similarity, and perhaps plasticity, between the primitive and secretory (but not basal or classical) sub-types of LUSC. On the other hand, while the LUAD GEMMs classify strongly as LUAD, all have a mixed sub-type classification, a result that does not vary by genotype.

[0106]

Comparison of CCLs, PDXs, and GEMMs

[0107]

Finally, it was sought to estimate the comparative transcriptional fidelity of the three cancer model modalities, limiting the comparison to those five general tumor types for which there were at least two examples per modality: UCEC, PAAD, LUSC, LUAD, and LIHC. The general CCN scores of each model were compared on a per tumor type basis (FIG. 10). In the case of GEMMs, the mean classification score of all samples with shared genotypes was used. It was found that GEMMs had the highest median general classification scores in four out of the five tumor types. However, some PDXs achieved the highest classification scores. In UCEC, LUAD and LIHC, the maximum classification score of PDXs exceeded 0.75 and was thus comparable to the majority of scores on held-out TCGA data, highlighting the potential for PDXs to mirror the transcriptional state of natural tumors (FIG. 10).
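The comparison above can be sketched in two steps: GEMM replicates are first collapsed to one mean score per genotype, and the median score is then taken per modality within a tumor type. The function names, genotype labels, and scores below are hypothetical and serve only to illustrate the aggregation.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, List, Tuple


def collapse_gemm_replicates(samples: List[Tuple[str, float]]) -> List[float]:
    """Mean general score per shared genotype (one value per genotype)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for genotype, score in samples:
        grouped[genotype].append(score)
    return [mean(scores) for scores in grouped.values()]


def median_by_modality(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Median general score for each model modality."""
    return {modality: median(values) for modality, values in scores.items()}


# Hypothetical LUAD scores for illustration only.
gemm_samples = [("Kras;Trp53", 0.80), ("Kras;Trp53", 0.90), ("Kras;Smarca4", 0.70)]
luad_scores = {
    "CCL": [0.05, 0.20, 0.45],
    "PDX": [0.10, 0.30, 0.95],
    "GEMM": collapse_gemm_replicates(gemm_samples),
}
medians = median_by_modality(luad_scores)
top_modality = max(medians, key=medians.get)
```

In this toy example the GEMMs have the highest median while a single PDX has the highest individual score, which is the distinction the paragraph above draws between median and maximum fidelity.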

[0108]

It was also sought to compare model modalities in terms of the diversity of sub-types that they represent. As a reference, the overall sub-type incidence was also included in this analysis, as approximated by incidence in TCGA. In models of UCEC, there is a notable difference between endometrioid incidence and the proportion of models classified as endometrioid, with only PDX having any representatives (FIG. 10). The vast majority of CCLE and all of the GEMM models of PAAD have an unknown sub-type classification. However, the PDXs are sub-typed as either a mixture of basal and classical, or classical alone. No model of LUSC was sub-typed exclusively as secretory, and only PDXs were sub-typed exclusively as basal. No model of LUAD was sub-typed exclusively as TRU, but there were models that were sub-typed exclusively as proximal proliferative in both PDXs and GEMMs. Taken together, these results indicate that only a few CCLs are good transcriptional exemplars of natural tumor sub-types, that GEMMs are typically mixtures of sub-types, and that PDXs are the modality that can best reflect specific sub-types.

[0109]

Discussion

[0110]

A major goal in the field of cancer biology is to develop models that mimic naturally occurring tumors with enough fidelity to enable therapeutic discoveries. However, methods to measure the extent to which cancer models resemble or diverge from native tumors are lacking. This is especially problematic now because there are many existing models from which to choose, and it has become easier to generate new models. Accordingly, in certain aspects, this disclosure presents CancerCellNet (CCN), a computational tool that measures the similarity of cancer models to 25 naturally occurring tumor types and 46 sub-types. Because CCN is platform and species agnostic, it can be applied across many model modalities, including CCLs, PDXs, and GEMMs, and thus it represents a consistent platform to compare models across modalities. In this example, CCN was applied to 657 cancer cell lines, 415 patient derived xenografts, and 26 distinct genetically engineered mouse models. Several exemplary lessons emerged from these computational analyses that have implications for the field of cancer biology.

[0111]

First, CancerCellNet indicates that GEMMs are transcriptionally the most faithful models of four out of five general tumor types for which data from all modalities was available. This is consistent with the fact that GEMMs are typically derived by recapitulating well defined driver mutations of natural tumors, and thus this observation corroborates the importance of genetics in the etiology of cancer. Moreover, in contrast to PDXs, GEMMs are typically generated in immune replete (complete) hosts. Therefore, the higher fidelity of GEMMs may also be a result of the influence of a native immune system on GEMM tumors. Second, PDXs and CCLs have lower scores that are comparable to each other. This is consistent with the observation that PDXs can undergo selective pressures in the host that distort the progression of genomic alterations away from what is observed in natural tumors (Ben-David et al. (2017) “Patient-derived xenografts undergo mouse-specific tumor evolution,” Nature Genetics 49(11):1567-1575). Furthermore, the observation that a few PDXs have very high classification scores, approaching a level that is indistinguishable from held-out TCGA data, suggests that under certain conditions, PDXs can almost perfectly mimic natural tumors transcriptionally. It is unclear what these conditions are; it may be that these few PDXs were profiled prior to the acquisition of non-typical genomic alterations. Third, it was found that none of the samples that were evaluated here are transcriptionally adequate models of ESCA, and therefore this tumor type requires further attention to derive new models. Fourth, it was found that in several tumor types, GEMMs tend to reflect mixtures of sub-types rather than conforming to single sub-types. The reasons for this are not clear, but it is possible that in the cases that were examined, the histologically defined sub-types have a degree of plasticity that is exacerbated in the murine host environment.

[0112]

CCN includes various embodiments or aspects. For example, CCN is based on transcriptomic data in some embodiments, but other molecular readouts of tumor state, such as profiles of the proteome, epigenome, non-coding RNA-ome, and genome, among others, which can also be mimicked in a model system, are optionally utilized in lieu of, or in combination with, transcriptomic data. It is possible that some models reflect tumor behavior well, and because this behavior is not well predicted by transcriptome alone, these models have lower CCN scores. To both measure the extent that such situations exist, and to correct for them, other omic data is optionally incorporated into CCN so as to make more accurate and integrated model evaluation possible. Further, in the cross-species analysis, CCN generally implicitly assumes that homologs are functionally equivalent. The extent to which they are not functionally equivalent determines how confounded the CCN results will be. However, this possibility may be of limited consequence based on the high performance of the normal tissue cross-species classifier, and based on the fact that GEMMs have the highest median CCN scores. In addition, the TCGA training data is made up of RNA-Seq from bulk tumor samples, which necessarily includes non-tumor cells, whereas the CCLs are by definition cell lines of tumor origin. Therefore, CCLs theoretically could have artificially low CCN scores due to the presence of non-tumor cells in the training data. This potential problem appears to be limited, as no correlation between tumor purity and CCN score was found in the CCLE samples. However, this potential problem may be related to the question of intra-tumor heterogeneity. Thus, in certain embodiments, CCN can be extended to interpret single cell RNA-Seq data. A sufficient amount of training single cell RNA-Seq data enables CCN to not only evaluate models on a per cell type basis, but also based on cellular composition.

[0000]

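BRCA | GBM | OV | LUAD | UCEC
BRCA_1 | BRCA_2 | GBM_1 | GBM_2 | OV_1 | OV_2 | LUAD_1 | LUAD_2 | UCEC_1 | UCEC_2
LMX1B | MIB2 | PSRC1 | FLNB | WT1 | TAF15 | NAPSA | PPP2R1A | DLX5 | PRNP
LMX1B | ANKS6 | KLHDC8A | FLNB | WT1 | SUN2 | SFTA2 | ITPK1 | DLX6 | NR3C1
LMX1B | ID1 | C21orf62 | NET1 | WT1 | DST | SFTA2 | OAF | DLX5 | SBDS
TRPS1 | ODC1 | NR2E1 | NET1 | KCNK15 | ORMDL3 | SFTA2 | PLCD3 | DLX5 | RNF13
PRLR | ETS2 | LCTL | FAM83H | KLHL14 | ORMDL3 | NAPSA | PTMS | MSX1 | SBDS
AARD | ANKS6 | GAP43 | NUCKS1 | ZNF503 | TAF15 | NAPSA | HNRNPC | DLX6 | TBC1D2B
TRPS1 | HADHA | PSRC1 | TRIM27 | KCNK15 | RETSAT | ROS1 | SLC16A1 | DLX6 | LYPLAL1
TRPS1 | EIF3L | CNR1 | NET1 | KLHL14 | USP47 | SFTPD | CELSR2 | MSX1 | CALCOCO2
PRLR | ODC1 | PSRC1 | HTATSF1 | KCNK15 | DNAJC3 | ROS1 | CELSR2 | MSX1 | TACC1
IRX5 | ESRRA | RNASE2 | FAM83H | KLHL14 | DNAJC7 | SCGB3A2 | SLC16A1 | MAP2K6 | TAOK3
AARD | PSAT1 | C21orf62 | DSTYK | ZNF503 | NAP1L4 | SFTPA1 | CELSR2 | STX18 | CALCOCO2
EFHD1 | ITM2C | RFX4 | HTATSF1 | DOK5 | DST | ROS1 | PHGDH | STX18 | SERINC3
IRX5 | MIB2 | RNASE2 | DSP | ATP6V1B1 | ORMDL3 | SFTPA1 | PHGDH | SOX17 | CREBL2
IRX5 | ID1 | NR2E1 | NT5DC1 | DOK5 | SPAG9 | SFTPC | HR | STX18 | TM9SF4
AARD | FZD5 | PLA2G5 | MYO1D | ATP6V1B1 | NAP1L4 | BPIFA1 | HR | SOX17 | PRNP
PRLR | ETFB | NR2E1 | LSR | ZNF503 | NBR1 | SFTPA1 | SOX9 | CCDC157 | LYPLAL1
GATA3 | GSTP1 | LCTL | BAIAP2L1 | DOK5 | ABR | SFTPD | ECSIT | TEKT2 | LYPLAL1
GATA3 | ITM2C | C21orf62 | MYO1D | ATP6V1B1 | TAF15 | SFTPD | TIMM44 | SOX17 | TBC1D2B
TBC1D9 | HADHA | LCTL | DSP | PNOC | PPP3CC | COL6A5 | HR | MAP2K6 | SBDS
PIP | PSAT1 | PLA2G5 | LSR | NPR1 | NBR1 | SCGB3A2 | PHGDH | FGF18 | PLSCR4
GATA3 | ETS2 | PLA2G5 | KIAA1217 | LYPD1 | DST | SFTPC | SLC16A1 | FGF18 | NR3C1
EFHD1 | CKB | HEPACAM | HTATSF1 | LYPD1 | NAP1L4 | SFTPC | SYNGR1 | MAP2K6 | RNF13
PLEKHF2 | ODC1 | RNASE2 | LSR | NPR1 | SUN2 | SCGB3A2 | LARP6 | ARMC3 | PLSCR4
CILP | ITM2C | POU3F2 | BRD3 | PNOC | ELL2 | LGSN | PLEKHH1 | ARMC3 | NR3C1
SLC16A6 | ANKS6 | GAP43 | CNDP2 | PNOC | NIPA1 | TREM1 | OAF | FGF18 | NEDD4
NAT1 | UBE2E3 | KLHDC8A | JUP | NPR1 | SPAG9 | TREM1 | PLCD3 | HOXB6 | CALCOCO2
ESR1 | GSTP1 | CNR1 | FLNB | LYPD1 | SPAG9 | LGSN | PPT2 | TEKT2 | PLSCR4
FSIP1 | STARD4 | RNASE3 | HOOK1 | DOK7 | LRP11 | CCNJL | ECSIT | EMX2 | PRNP
PIP | FZD5 | MT3 | NUCKS1 | DOK7 | TMEM181 | CCNJL | TIMM44 | RNF183 | ADCY9
PIP | MID1 | POU3F2 | DSTYK | RSPO1 | LRP11 | SFTPB | PPP2R1A | ELP3 | SERINC3
SERTAD4 | RNF145 | KLHDC8A | MYO1C | RSPO1 | PPP3CC | LPCAT1 | PPP2R1A | RNF183 | TAOK3
NAT1 | RNF145 | RNASE3 | ARHGEF5 | RSPO1 | STK39 | SFTPB | PTMS | EMX2 | MAF
NAT1 | PPARA | CNR1 | DSTYK | DOK7 | TOM1 | TBX4 | SYNGR1 | EMX2 | CREBL2
FSIP1 | PRKCA | DBX2 | DSP | MEIS1 | NBR1 | SFTPB | HNRNPC | C2orf88 | TACC1
FSIP1 | PSAT1 | RFX4 | BRD3 | CTU1 | STK39 | NKX2-1 | WIZ | RNF183 | ELL2
CILP | RNF145 | RNASE3 | BAIAP2L1 | MEIS1 | SUN2 | LGSN | OAF | DACT2 | FKBP5
SLC16A6 | PPARA | DBX2 | HOOK1 | SOX17 | ABR | NKX2-1 | TIMM44 | ASRGL1 | SERINC3
SLC16A6 | SLC9A6 | S100B | NUCKS1 | SOX17 | DNAJC7 | NKX2-1 | ECSIT | C2orf88 | TAOK3
TBC1D9 | EIF3L | GAP43 | WFS1 | CTU1 | LRP11 | TBX4 | LARP6 | HOXB6 | TACC1
CILP | ETS2 | GFAP | JUP | SOX17 | GGNBP2 | MUC21 | PLCD3 | HOXB6 | SETD7
EFHD1 | BIN1 | DBX2 | NT5DC1 | HTR3A | STK39 | BMP5 | PPT2 | TEKT2 | ADCY9
ST8SIA6 | PRKCA | GFAP | MYO1C | HTR3A | ELL2 | LPCAT1 | HNRNPC | DACT2 | CREBL2
LRRC15 | PFKP | GFAP | STAT6 | KLK7 | ABR | CCNJL | ERF | ARMC3 | NEDD4
SERTAD4 | UBE2E3 | PMP2 | PERP | HTR3A | NIPA1 | BMP5 | LDLRAD3 | C2orf88 | RNF13
ST8SIA6 | FZD5 | PMP2 | MYO1D | MAMSTR | PPP3CC | BMP5 | PLEKHH1 | ASRGL1 | USP22
SERTAD4 | PITPNM1 | POU3F2 | GTF3C4 | IMPG2 | GALK2 | BPIFA1 | LARP6 | DACT2 | SETD7
ST8SIA6 | STARD4 | PMP2 | LTBR | UPK3B | ELL2 | XKRX | PLEKHH1 | CCDC157 | ADCY9
LRRC15 | PITPNM1 | MT3 | STAT6 | MEIS1 | DNAJC3 | TREM1 | ERF | CCDC157 | NEDD4
LRRC15 | GSTP1 | MT3 | MYO1C | LRRTM1 | NIPA1 | BPIFA1 | SYNGR1 | HOXB8 | SETD7
SCUBE2 | CKB | RFX4 | BAIAP2L1 | CTU1 | GALK2 | MUC21 | KAZN | ASRGL1 | TM9SF4
TFAP2B | BMP2 | MLC1 | JUP | UPK3B | TOM1 | TBX4 | PPT2 | FOXJ1 | MFSD1
GFRA1 | BIN1 | HEPACAM | FAM83H | KLK7 | AHR | LPCAT1 | PTMS | CCDC114 | RAB8B
TFAP2B | BIN1 | MLC1 | SPINT2 | KLK7 | CALCOCO2 | MUC21 | SOX9 | CCDC114 | FKBP5
STC2 | PFKP | HEPACAM | KRT8 | MAMSTR | GALK2 | MBIP | GTF2F1 | HOXB8 | MFSD1
GFRA1 | HADHA | MLC1 | LTBR | LRRTM1 | USP47 | SCGB3A1 | KAZN | CCDC114 | MAF
TFAP2B | CAPN5 | AQP4 | DDX5 | LRRTM1 | CAMK2D | MBIP | ITPK1 | FOXJ1 | FKBP5
TBC1D9 | LAMC1 | SCRG1 | LTBR | UPK3B | DNAJC7 | PIP5KL1 | KAZN | ELP3 | TM9SF4
PLEKHF2 | UBE2E3 | FOXG1 | KRT8 | FGF18 | HARS2 | MBIP | ERF | FOXJ1 | ELL2
GFRA1 | CKB | AQP4 | SPINT2 | FGF18 | CAMK2D | PIP5KL1 | LDLRAD3 | HOXB8 | FBXL3
PLEKHF2 | PFKP | SCRG1 | STAT6 | FGF18 | TMEM181 | SCGB3A1 | LDLRAD3 | C20orf85 | ELL2
ESR1 | LAMC1 | FOXG1 | BRD3 | RPL17 | CALCOCO2 | COL6A5 | PSPN | C20orf85 | MAF
ESR1 | EIF3L | FOXG1 | KRT18 | CLDN16 | CAMK2D | XKRX | TLN2 | C20orf85 | TBC1D2B
SCUBE2 | ID1 | SCRG1 | SPINT2 | CLDN16 | RETSAT | SCGB3A1 | SOX9 | ELP3 | MFSD1
STC2 | LAMC1 | ST8SIA5 | NT5DC1 | IMPG2 | SLC38A9 | C16orf89 | TLN2 | CCDC33 | CHRNE
SCUBE2 | ETFB | AQP4 | KRT18 | CTCFL | AHRR | CXCL17 | ITPK1 | CCDC33 | ARHGEF33
AZGP1 | ETFB | BAALC | B4GALT1 | CLDN16 | USP47 | C16orf89 | GTF2F1 | CCDC33 | ZNF519
AZGP1 | ESRRA | ST8SIA5 | KRT18 | RPL17 | DNAJC3 | C16orf89 | STK11 | TEKT4 | WDR44
AZGP1 | CAPN5 | BAALC | CNDP2 | IMPG2 | CHIC1 | CXCL17 | GTF2F1 | TEKT4 | RAB8B
STC2 | PITPNM1 | KCNIP1 | KRT8 | CTCFL | TNFRSF4 | CXCL17 | WIZ | TEKT4 | TUBD1
DCAF10 | SSBP1 | KCNIP1 | PERP | MAMSTR | HARS2 | COL6A5 | IL24 | WFDC2 | USP22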
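KIRC | HNSC | LGG | THCA | LUSC
KIRC_1 | KIRC_2 | HNSC_1 | HNSC_2 | LGG_1 | LGG_2 | THCA_1 | THCA_2 | LUSC_1 | LUSC_2
TLR3 | SMARCD2 | ALOXE3 | ACP6 | KCNJ10 | ANXA2 | TG | RPN2 | SFTPA1 | SORBS2
ENPP3 | FASN | SDR9C7 | SVIP | KCNJ10 | CLIC1 | TG | PRKCSH | EGFL6 | CXXC5
TLR3 | RBM15B | HEPHL1 | ACP6 | KCNJ10 | MYL12B | TG | YWHAG | SFTPA1 | ME3
SEMA5B | FASN | SDR9C7 | ACP6 | KCNJ9 | PDLIM1 | TPO | PYCR1 | ABCA13 | KIF13B
GAL3ST1 | GIPC1 | HEPHL1 | DDAH1 | CDH20 | OSTC | TPO | TMEM97 | SFTPA1 | FHIT
SEMA5B | SCAP | HEPHL1 | ADCY6 | GPR37L1 | TAGLN2 | TPO | METTL8 | ABCA13 | CXXC5
TLR3 | FOXK2 | SDR9C7 | ICA1 | IL17D | OSTC | CRYGN | RACGAP1 | ABCA13 | MAGI1
ENPP3 | SMARCD2 | ALOXE3 | SVIP | OLIG2 | CLIC1 | CRYGN | TMEM97 | RASSF9 | CXXC5
GAL3ST1 | SCAP | KRTDAP | SVIP | OLIG1 | ANXA2 | CRYGN | NUSAP1 | ABCC5 | PEBP1
ENPP3 | HMGA1 | KRTDAP | FN3K | KCNJ9 | TEAD3 | IYD | SCD | RASSF9 | ALDH7A1
GAL3ST1 | RANGAP1 | KRTDAP | FARP1 | APC2 | TAGLN2 | DAPK2 | SCD | RASSF9 | CRIP2
ESM1 | SEC13 | FAM25A | FN3K | PSD2 | OSTC | MUC15 | SCD | TP63 | CST3
SEC14L6 | HMGB3 | ALOXE3 | ICA1 | CDH20 | PPCS | IYD | IRAKI | TP63 | PEBP1
ESM1 | FASN | SLC10A6 | ICA1 | PSD2 | MYL12A | DAPK2 | IDH2 | ADH7 | KIF13B
SEMA5B | RANGAP1 | FAM25A | PPP1R9A | OLIG2 | ANXA2 | DAPK2 | MPZL1 | ADH7 | OASL
MTCP1 | ARHGAP39 | SBSN | FARP1 | OLIG1 | TAGLN2 | IYD | IDH2 | EGFL6 | CRIP2
ESM1 | SCAP | IL36G | FN3K | CDH20 | PDLIM1 | MUC15 | IRAKI | ADH7 | ALDH7A1
CLEC18B | ARHGAP39 | IL36G | HNMT | CACNG7 | CLIC1 | HHEX | IRAKI | ADAM23 | CRIP2
SLC5A10 | BDH1 | CNFN | PKN1 | KCNJ9 | F11R | TCERG1L | SLC6A8 | GPR87 | CAMK2N1
ENPEP | SEC13 | RNF222 | PPP1R9A | PSD2 | S100A11 | HHEX | IDH2 | TP63 | THRAP3
CUBN | GIPC1 | PLA2G4E | HNMT | MMD2 | TEAD3 | INPP5J | PAICS | FBXO27 | ALDH7A1
ENPEP | RANGAP1 | IL36RN | HNMT | APC2 | MYL12B | HHEX | PAICS | ADAM23 | MGRN1
MTCP1 | BDH1 | IL36RN | DDAH1 | ZDHHC22 | PDLIM1 | MUC15 | PAICS | NTS | PLEKHA6
ALPK2 | HMGA1 | SLC10A6 | ZNF253 | MMD2 | SERINC2 | SRL | SLC6A8 | ADAM23 | MXD4
SEC14L6 | HMGA1 | IL36G | ZNF253 | TNR | MYL12B | SLC26A7 | SLC6A8 | GPR87 | PLEKHA6
CLEC18B | BDH1 | SBSN | PKN1 | OLIG2 | S100A11 | INPP5J | MPZL1 | B3GNT5 | PNPLA2
SLC5A10 | HMGB3 | CNFN | PATZ1 | RFX4 | S100A11 | LCN12 | DNPEP | ABCC5 | GDI2
ALPK2 | GNL3 | IL36RN | ADCY6 | IL17D | PPCS | TCERG1L | UCK2 | FBXO27 | PNPLA2
CUBN | MAVS | BNC1 | IVD | MMD2 | TES | TCERG1L | FAM189B | EGFL6 | MAGI1
CD70 | HMGB3 | SBSN | IVD | ZDHHC22 | MYO1C | MGAT4C | ARHGEF9 | NTS | PNPLA2
ALPK2 | FOXK2 | DSG1 | DDAH1 | GPR37L1 | MYL12A | WDR86 | FAM189B | GPR87 | KIF13B
CUBN | SMARCD2 | CNFN | FARP1 | RFX4 | MYL12A | INPP5J | KIAA0930 | ABCC5 | CST3
COL23A1 | GIPC1 | PLA2G4E | ADCY6 | TNR | WBP11 | SLC26A7 | TMEM97 | ARTN | HDAC11
SLC5A10 | RRP9 | FAM25A | ZNF253 | ATP6V1G2 | MYO1C | SLC26A7 | NUSAP1 | B3GNT5 | DDAH1
ENPEP | COX7A2L | BNC1 | PKN1 | DSCAM | TEAD3 | LCN12 | TTLL12 | NTS | SORBS2
TMEM72 | ARHGAP39 | DSG1 | TCTA | IL17D | MAP2K3 | ZCCHC12 | KIAA0930 | DSG3 | CAMK2N1
CLEC18B | ZMYND19 | BNC1 | PATZ1 | ATP6V1G2 | TMEM214 | SRL | RACGAP1 | DSG3 | MGRN1
SLC5A12 | ZNRF1 | PLA2G4E | P4HTM | ZDHHC22 | PPCS | WDR86 | TTLL12 | ARTN | MAGI1
ZNF395 | EIF3E | KRT75 | CHKA | DSCAM | MAP2K3 | SRL | EIF4EBP1 | DSG3 | RNPEP
COL23A1 | SEC13 | DSG1 | CRELD1 | ATP6V1G2 | ZDHHC5 | C2orf40 | TMCO3 | DCUN1D1 | MGRN1
ASPA | FOXK2 | SPRR2D | P4HTM | CMTM5 | LTBR | ZBED2 | TTLL12 | ARTN | SDSL
SLC5A12 | ZMYND19 | DSG3 | GNB1 | DSCAM | MYO1C | LCN12 | KIAA0930 | GCLC | CST3
SLC5A12 | DOLPP1 | SLC10A6 | PPP1R9A | TNR | TMEM214 | C2orf40 | DNPEP | PTHLH | CAMK2N1
SLC22A2 | DOLPP1 | FAM83C | P4HTM | CMTM5 | MAP2K3 | S100A5 | NUSAP1 | GCLC | PEBP1
TMEM72 | PWWP2B | SPRR2D | PATZ1 | CACNG7 | TMEM214 | WDR86 | MRPL16 | PTHLH | RPS27L
ASPA | KIF22 | KRT75 | TMEM8B | RFX4 | VAMP8 | TMEM233 | EIF4EBP1 | KRT74 | RAPSN
SLC22A2 | ZMYND19 | NIPAL4 | SCCPDH | CACNG7 | PRKAG1 | NKX2-1 | YWHAG | FBXO27 | HDAC11
SLC22A2 | RRP9 | SPRR2D | MGAT4A | CMTM5 | VAMP8 | C2orf40 | FAM189B | B3GNT5 | RPS27L
SEC14L6 | ZNRF1 | FGFBP1 | GORASP2 | OLIG1 | STAT6 | ZCCHC12 | YWHAG | PTHLH | PGPEP1
COL23A1 | DYNLRB1 | KRT75 | PGPEP1 | CRB1 | LTBR | ZCCHC12 | ATAD1 | WDR53 | HDAC11
CD70 | RRP9 | FGFBP1 | IVD | CRB1 | TBCCD1 | SLC26A4 | TMCO3 | SOST | KIF9
SLC17A3 | KIF22 | SPRR1B | GORASP2 | GPR37L1 | JUP | SLC26A4 | PYCR1 | SOST | BTD
TMEM174 | PACRG | KRT16 | GNB1 | PMP2 | WBP11 | ZBED2 | MPZL1 | SOST | SDSL
ASPA | GNL3 | FGFBP1 | THRAP3 | SHISA7 | LTBR | ZBED2 | DNPEP | TBCCD1 | RUFY1
CD70 | KIF22 | DSG3 | CHCHD2 | CRB1 | TES | S100A5 | NUDT2 | ACTL6A | THRAP3
TMEM174 | CCDC151 | IVL | SCCPDH | PMP2 | CDC25B | TMEM233 | RACGAP1 | DCUN1D1 | MXD4
SLC6A13 | RBM15B | DSG3 | THRAP3 | PMP2 | STAT6 | CITED1 | TMCO3 | LSG1 | WIPI2
SLC17A3 | RBM15B | FAM83C | SCCPDH | NCAN | JUP | RXRG | UCK2 | LSG1 | THRAP3
SLC17A3 | GNL3 | SPRR1B | CHCHD2 | SHISA7 | FBXL15 | RXRG | MRPL16 | TBCCD1 | RPS27L
SLC6A13 | ZNRF1 | SPRR1B | THRAP3 | NCAN | CDC25B | CITED1 | PYCR1 | GCLC | GDI2
TMEM72 | PYCR1 | TGM1 | TCTA | GFAP | JUP | TMEM233 | UCK2 | KRT74 | ADAM11
MTCP1 | PWWP2B | FAM83C | CHKA | LRRTM3 | TES | SLC26A4 | UCHL5 | DCUN1D1 | DDAH1
TMEM174 | CNKSR1 | GSDMC | CHKA | GFAP | STAT6 | RXRG | UCHL5 | PARL | WIPI2
NAT8 | TXN2 | TGM1 | TMEM39A | NCAN | ZDHHC5 | NKX2-1 | VWA1 | WDR53 | BTD
SLC6A13 | ZADH2 | GSDMC | MGAT4A | GFAP | B4GALT1 | MGAT4C | ATAD1 | PARL | MXD4
SLC3A1 | COX7A2L | TGM1 | CRELD1 | SHISA7 | SERINC2 | CITED1 | MRPL16 | ACTL6A | RPS10
SLC3A1 | MAVS | GSDMC | TCTA | PCDH15 | SERINC2 | GABRB2 | ARHGEF9 | TBCCD1 | PGPEP1
SLC3A1 | DYNLRB1 | NIPAL4 | CRELD1 | LRRTM3 | F11R | MGAT4C | UCHL5 | WDR53 | PLEKHA6
NAT8 | FGFRL1 | KRT16 | CHCHD2 | APC2 | WBP11 | S100A5 | EIF4EBP1 | KRT74 | KIF9
NAT8 | COX7A2L | IVL | GORASP2 | PCDH15 | B4GALT1 | NKX2-1 | SP3 | ACTL6A | RNPEP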
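PRAD | SKCM | COAD | STAD | BLCA
PRAD_1 | PRAD_2 | SKCM_1 | SKCM_2 | COAD_1 | COAD_2 | STAD_1 | STAD_2 | BLCA_1 | BLCA_2
NKX3-1 | TAGLN2 | MLANA | TOR1AIP1 | NOX1 | ZNF362 | ZFPM1 | B3GAT3 | UPK2 | ALDH7A1
KLK3 | TAGLN2 | MLANA | VOPP1 | CDX2 | PACS1 | ZFPM1 | CD2BP2 | UPK2 | NEO1
KLK3 | LASP1 | MLANA | MYO6 | NOX1 | TCEA2 | ZFPM1 | UROD | UPK2 | ST6GAL1
SLC45A3 | LASP1 | PAX3 | PBX1 | NOX1 | BCAM | ZBTB7A | PRDX5 | PLA2G2F | ALDH2
NKX3-1 | LASP1 | SLC45A2 | DDAH1 | CDX1 | PACS1 | GATA4 | UROD | UPK1A | ST6GAL1
KLK3 | INTS1 | PMEL | TMSB4X | CDX2 | TRIM56 | ZBTB7A | MRFAP1 | UPK1A | HIPK2
ACPP | TAGLN2 | DCT | MYO6 | CDX2 | ZC3H3 | GATA6 | TMEM9 | UPK1A | SH3BP4
ACPP | OGDH | TRPM1 | DDAH1 | GPA33 | PACS1 | GATA4 | TMEM9 | PLA2G2F | STXBP1
ACPP | INTS1 | TRPM1 | PBX1 | GPA33 | ZC3H3 | GATA4 | DNAJB2 | VGLL1 | NFIX
SLC45A3 | YWHAH | TRPM1 | RAB3IP | CCL24 | C20orf194 | GNL3L | TSR2 | PLA2G2F | CERK
NKX3-1 | KIAA0100 | PMEL | PTPRF | GPA33 | CLU | GATA6 | UBXN6 | SNX31 | CERK
SLC45A3 | OGDH | PAX3 | NFYB | CDX1 | BCAM | ZBTB7A | RNF187 | VGLL1 | ST6GAL1
CHRNA2 | TNFAIP8L1 | DCT | VOPP1 | CDX1 | ZC3H3 | ZBTB20 | RNF215 | PPARG | ALDH2
KLK4 | OGDH | PAX3 | VOPP1 | CCL24 | TCEA2 | GATA6 | CD2BP2 | SNX31 | SH3BP4
CHRNA2 | OSBPL3 | DCT | PBX1 | CDH17 | CLU | GNL3L | FN3KRP | SNX31 | OAT
OR51E2 | OSBPL3 | SLC45A2 | PAWR | CCL24 | KRBA1 | GNL3L | ADPRHL2 | VGLL1 | OAT
CHRNA2 | CIT | SLC45A2 | NET1 | MEPIA | SMARCA1 | CLDN18 | CIRBP | PM20D1 | PARD3B
KLK4 | YWHAH | PMEL | NET1 | GUCY2C | BCAM | CLDN18 | PRDX5 | UPK3A | IQGAP2
OR51E2 | TNFAIP8L1 | C10orf90 | DDAH1 | EPS8L3 | CLU | CLDN18 | DNAJB2 | PM20D1 | STXBP1
KLK4 | LAPTM4B | C10orf90 | MAGI1 | GUCY2C | NR3C1 | ZBTB20 | TMED1 | ACER2 | PTPRJ
OR51E2 | CIT | C10orf90 | RAB3IP | GUCY2C | MYH10 | NKX6-3 | TMED1 | BTBD16 | CERK
SLC30A4 | ANP32E | ALX1 | RAB3IP | MEPIA | TCEA2 | ZBTB20 | MYL6B | UPK3A | COBL
HOXB13 | YWHAH | ALX1 | SGMS1 | MEPIA | ABHD8 | NKX6-3 | RNF215 | UPK3B | NFIX
SLC30A4 | FAM49B | C19orf71 | SGMS1 | CDH17 | OST4 | CCDC68 | HSDL1 | BTBD16 | COBL
HOXB13 | INTS1 | ALX1 | NFYB | PHGR1 | BCL6 | ONECUT2 | DNAJB2 | BTBD16 | RAPGEF5
HOXB13 | KIAA0100 | TYRP1 | SLC38A1 | PHGR1 | ZNF362 | NKX6-3 | HSDL1 | PM20D1 | COBL
ANO7 | SERPINB1 | FCRLA | SLC38A1 | PHGR1 | PTPRS | CCDC68 | COQ5 | ACER2 | HIPK2
SLC30A4 | LAPTM4B | TYRP1 | PTPRF | CDH17 | TRIM56 | ONECUT2 | UROD | UPK3B | NFIC
ANO7 | S100A16 | TYRP1 | TJP2 | MYO1A | BCL6 | PABPC3 | TMED1 | GRHL3 | KLF13
TRPV6 | S100A16 | TRIM63 | OCIAD2 | NR1I2 | NR3C1 | ONECUT2 | TMEM9 | SNCG | NFIC
ANO7 | CDC25B | CAPN3 | SGMS1 | NR1I2 | SMARCA1 | CCDC68 | MYL6B | ACER2 | AGAP1
BEND4 | CIT | C19orf71 | MAGI1 | MYO1A | NR3C1 | ONECUT3 | B3GAT3 | UPK3A | NFIX
FOLH1 | LAPTM4B | CAPN3 | MAGI1 | ATOH1 | KRBA1 | MUC13 | PRDX5 | ACOXL | IQGAP2
BEND4 | OSBPL3 | TRIM63 | TJP2 | ATOH1 | LDOC1 | ONECUT3 | MYL6B | GDPD3 | ALDH2
TMEFF2 | FSCN1 | IRF4 | PTPRF | DPEP1 | OBSL1 | ONECUT3 | ADPRHL2 | UPK3B | SH3BP4
BEND4 | FSCN1 | TSPAN10 | TJP2 | PPP1R14D | BCL6 | C6orf222 | APOBR | ACOXL | PTPRJ
NWD1 | ANP32E | TRIM63 | PAWR | MYO1A | SMARCA1 | TFF2 | ING4 | PPARG | KLF13
NWD1 | ARHGEF2 | IRF4 | SPINT2 | ISX | TMEM25 | REG4 | B3GAT3 | IL9R | SYBU
CHRM1 | FSCN1 | CAPN3 | MYO6 | BCL2L14 | AMOTL1 | REG4 | RNF215 | NIPAL4 | KLF13
TRPV6 | CTSC | IRF4 | SLC38A1 | ASCL2 | PTPRS | REG4 | ING4 | ACOXL | STXBP1
FOLH1 | S100A16 | TSPAN10 | NET1 | BCL2L14 | C20orf194 | CTSE | MRFAP1 | IL9R | IQGAP2
FOLH1 | ANP32E | FOXD3 | NFYB | SLC26A3 | C20orf194 | CTSE | CIRBP | PPARG | GSE1
CHRM1 | CERK | FCRLA | PTPRK | ATOH1 | TMEM25 | MUC5AC | CD2BP2 | IL9R | RASGEF1B
ADRB1 | C1GALT1 | ENTHD1 | RBM47 | SLC26A3 | TMEM25 | VSIG1 | ZMAT2 | OR13A1 | SYBU
TMEFF2 | ARHGEF2 | TSPAN10 | PSD4 | ISX | LDOC1 | TFF2 | COQ5 | PSCA | NFIC
ZNF613 | AGPS | MMP8 | EPCAM | DPEP1 | MYH10 | TFF2 | HSDL1 | GRHL3 | OAT
TRPV6 | ARHGEF2 | ENTHD1 | CDS1 | BCL2L14 | PTPRS | MUC5AC | ADPRHL2 | SNCG | PHC2
ZNF613 | CDC25B | FCRLA | PSD4 | ASCL2 | EVL | MUC5AC | GMPR2 | FCRLB | PTPRJ
OR51E1 | TNFAIP8L1 | MMP8 | CDS1 | SLC26A3 | KRBA1 | MUC13 | UBXN6 | SNCG | PBXIP1
CHRM1 | CDC25B | EXTL1 | PTPRK | ISX | ABHD8 | CTSE | UBXN6 | GDPD3 | GSE1
LMAN1L | RELT | GPR143 | OCIAD2 | GPR35 | RDX | VSIG1 | ING4 | GDPD3 | HIPK2
ZNF613 | DERA | SNCA | PFN2 | EPS8L3 | TRIM56 | C6orf222 | COQ5 | PSCA | SLC25A23
ADRB1 | RHBDF2 | FOXD3 | SPINT2 | NR1I2 | RDX | PDX1 | APOBR | FCRLB | RASGEF1B
ADRB1 | DERA | ENTHD1 | PAWR | GPR35 | EVL | MUC13 | CIRBP | OR13A1 | RASGEF1B
STEAP2 | KIAA0100 | MMP8 | RBM47 | PPP1R14D | EVL | VSIG1 | TSR2 | PSCA | GSE1
NWD1 | AGPS | GPR143 | USP39 | PPP1R14D | RDX | C6orf222 | TSR2 | SYT8 | SLC25A23
MSMB | CTSC | GPR143 | SPINT2 | ASCL2 | ZNF362 | TM4SF20 | APOBR | NIPAL4 | RAPGEF5
OR51E1 | RHBDF2 | EXTL1 | BTBD1 | GPR35 | OBSL1 | TM4SF20 | F10 | FCRLB | ALDH7A1
MSMB | SERPINB1 | MMP17 | USP39 | KRT20 | AMOTL1 | PGC | MRFAP1 | TMEM40 | RAPGEF5
LMAN1L | GHRL | FOXD3 | OCIAD2 | KRT20 | TUSC3 | PGC | SNX17 | PADI3 | UTRN
DNASE2B | RELT | CA14 | PTPRK | KRT20 | OBSL1 | PGC | ZMAT2 | SYT8 | ALDH7A1
OR51E1 | AGPS | SNCA | BTBD1 | DPEP1 | BNIP3 | TM4SF20 | SLC25A34 | SYT8 | UTRN
MSMB | KPNA2 | EXTL1 | USP39 | FAM3D | AMOTL1 | PDX1 | UCK1 | PADI3 | ATXN1
TMEFF2 | CLCN6 | CA14 | CFL2 | VIL1 | OST4 | GJD3 | SLC25A34 | NIPAL4 | ATXN1
POTEH | GHRL | MMP17 | BTBD1 | FAM3D | MYH10 | PDX1 | SLC25A34 | GRHL3 | UTRN
DNASE2B | ST6GALNAC4 | SNCA | TOR1AIP1 | EPS8L3 | OST4 | POTEE | PAK6 | TMEM40 | ATXN1
LMAN1L | HS3ST2 | ABCB5 | RBM47 | ATP10B | GALNT1 | GJD3 | TTLL10 | UPK1B | SLC25A23
STEAP2 | TXLNA | CA14 | TOR1AIP1 | ATP10B | FMRI | GJD3 | PAK6 | PADI3 | SMARCA5
POTEH | RELT | MMP17 | CCDC12 | FAM3D | TUSC3 | POTEE | TTLL10 | UPK1B | NEO1
STEAP2 | LIMA1 | ABCB5 | PSD4 | ATP10B | AKIRIN1 | POTEE | LRRC8E | TNNI2 | SYBU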
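LIHC | CESC | KIRP | SARC | ESCA
LIHC_1 | LIHC_2 | CESC_1 | CESC_2 | KIRP_1 | KIRP_2 | SARC_1 | SARC_2 | ESCA_1 | ESCA_2
C8B | IGF1R | ARHGEF33 | ZNF608 | LRRN4 | EMP2 | TWIST2 | ERBB3 | ANKRD11 | CD63
SERPINC1 | FAR1 | SYCP2 | INSR | KCP | NOTCH3 | TWIST2 | DSP | ZBTB7A | APH1A
C8B | FAR1 | ARHGEF33 | ZNF773 | LRRN4 | TP53I11 | TWIST2 | FAM83H | ANKRD11 | CD81
SERPINC1 | EXOC1 | SYCP2 | TBC1D16 | SMTNL2 | TP53I11 | C1QTNF2 | RAB11FIP4 | ZBTB7A | PEBP1
ASGR2 | MAPRE1 | KCNS1 | PTPRM | LRRN4 | NOTCH3 | FAM180A | ERBB3 | ZBTB7A | PPIB
C8B | CTBP2 | CDKN2A | GRINA | TPK1 | UAP1 | RAB23 | TPD52 | EIF3C | NUDT16L1
SERPINC1 | SLC25A36 | ARHGEF33 | ZC4H2 | PKHD1 | NOTCH3 | IL17B | CAMSAP3 | RC3H1 | UFC1
APOC3 | IQGAP1 | SYCP2 | CREB3L2 | LYG1 | TP53I11 | FAM180A | WWC1 | FBRSL1 | PEBP1
ASGR1 | HK1 | KCNS1 | PKIG | SMTNL2 | EMP2 | CCDC36 | CAMSAP3 | FBRSL1 | APH1A
KNG1 | HK1 | ZNF541 | PTPRM | SMTNL2 | MFGE8 | CDK15 | ERBB3 | GNL3L | TSR2
CPB2 | HK1 | KCNS1 | PTPRG | TPK1 | ZDHHC20 | C1QTNF2 | PRKCZ | FBXL18 | NUDT16L1
C8A | SLC25A12 | RIBC2 | PKIG | MYL3 | DPYSL3 | SHOX2 | TPD52 | RC3H1 | ANP32A
AGXT | FAR1 | EPHX3 | CCND1 | TPK1 | EMP2 | CDK15 | CAMSAP3 | GNL3L | TEX264
AGXT | SLC25A36 | ZNF541 | MOCS1 | LYG1 | NEURL1B | C1QTNF2 | FAM84B | EIF3C | ING4
ASGR1 | TBC1D10B | RIBC2 | ZBTB10 | PTH1R | MFGE8 | FAM180A | RAB11FIP4 | RC3H1 | TMEM9
ASGR2 | PLEKHB2 | ZNF541 | TMEM150A | MYL3 | COL5A3 | TWIST1 | TPD52 | ANKRD11 | MRFAP1
AGXT | ABR | RIBC2 | PTPRM | EMX1 | NEURL1B | MRGPRF | LSR | FBRSL1 | PPIB
HAO1 | ZNF827 | SOX30 | ZNF608 | ENAM | COL5A3 | CDK15 | MARVELD2 | HCFC1 | CD81
ASGR1 | ABR | C19orf57 | TBC1D16 | MYL3 | LTBP1 | IL17B | MARVELD2 | NRARP | ANP32A
ITIH3 | IQGAP1 | SERPINB3 | CCND1 | KCP | MFGE8 | TWIST1 | F11R | MAPK6 | APH1A
C8A | ZNF827 | HMSD | ZNF608 | EMX1 | UAP1 | CCDC36 | MARVELD2 | MAPK6 | PPIB
APOC3 | PLEKHB2 | HMSD | ZC4H2 | KCP | MARCKSL1 | TWIST1 | DSP | EIF3C | STK16
APOC3 | CHD3 | TAF7L | ZNF773 | ENAM | NEURL1B | TBXA2R | FAM84B | NRARP | ARF5
APOA5 | ZNF827 | SOX30 | ZC4H2 | SYPL2 | UAP1 | CCDC36 | PRKCZ | GNL3L | PDHB
F2 | IQGAP1 | PRDM15 | TBC1D16 | DYNC2LI1 | AZIN1 | TNFAIP8L3 | WWC1 | HCFC1 | CD63
F2 | ARF3 | HMSD | ZNF773 | DYNC2LI1 | SAE1 | TNFAIP8L3 | FAM84B | FBXL18 | ING4
ASGR2 | SLC44A2 | C19orf57 | PKIG | PTH1R | DPYSL3 | IL17B | HOOK1 | KLHL11 | TMED1
F2 | PLEKHB2 | TAF7L | ZNF43 | ENAM | LDLR | MRGPRF | SPINT2 | MAPK6 | MRFAP1
HRG | IGF1R | C19orf57 | FERMT2 | COQ9 | SERP1 | EBF3 | DSP | FBXL18 | STK16
HRG | SLC25A36 | TAF7L | MOCS1 | EMX1 | PCDH1 | MRGPRF | F11R | PABPC3 | TMED1
ITIH2 | CLSTN1 | EPHX3 | CREB3L2 | SYPL2 | PCDH1 | TBXA2R | WWC1 | RBM15 | TSR2
KNG1 | IGF1R | IL20RB | CCND1 | LYG1 | LTBP1 | ADAM33 | MYH14 | ATAD5 | ING4
CPB2 | CTBP2 | CENPK | PTPRG | CYS1 | SERP1 | EBF3 | FAM83H | CLSPN | TSR2
KNG1 | METTL9 | CDC7 | INSR | PTH1R | SAE1 | ADAM33 | LSR | NRARP | CD2BP2
CPB2 | METTL9 | WDR76 | INSR | SULT1C4 | AZIN1 | EBF3 | PRKCZ | KLHL11 | GPANK1
APOH | CLSTN1 | RFC4 | AP2B1 | HOGA1 | SERINC5 | MFAP4 | PTPRF | ZFPM1 | NUDT16L1
C8G | ABR | MEI1 | FERMT2 | HOGA1 | SAE1 | ADAM33 | SPINT2 | RBM15 | PEX11B
ITIH3 | CLSTN1 | SERPINB3 | GRINA | DYNC2LI1 | SERP1 | SHOX2 | CXADR | HCFC1 | ILF3
ITIH2 | CCNI | EPHX3 | SNX19 | SLC13A1 | COL5A3 | TNFAIP8L3 | RAB11FIP4 | CLSPN | ELOF1
ITIH2 | ARF3 | SOX30 | PARD3B | SULT1C4 | DPYSL3 | SCARA5 | MYH14 | RBM15 | UROD
ITIH3 | CHD3 | LY6K | CREB3L2 | SYPL2 | SERINC5 | RAB23 | PTPRF | ZFPM1 | PEX11B
APOH | MAPRE1 | MEI1 | TNS3 | HOGA1 | PCDH1 | LGI2 | LSR | ATAD5 | UROD
AMBP | CCNI | SERPINB3 | MTPN | PKHD1 | AZIN1 | SHOX2 | MYH14 | FAM83B | ZMAT2
APOH | ARF3 | MEI1 | SIAE | CYS1 | MARCKSL1 | PTGFR | MAL2 | CLSPN | UROD
HAO1 | SLC25A12 | IL20RB | GRINA | SLC13A1 | LTBP1 | HSPB6 | PTPRF | ZFPM1 | STK16
SERPINA10 | METTL9 | PSMC3IP | TMEM150A | CYS1 | BAZ2A | LGI2 | CXADR | ATAD5 | PEX11B
HRG | CTBP2 | LY6K | ZBTB10 | SULT1C4 | MARCKSL1 | LGI2 | SPINT2 | FAM83B | DNAJB2
SERPINA10 | CHMP3 | CDC7 | PTPRG | PKHD1 | BAZ2A | SCARA5 | MAL2 | FAM83B | ANP32A
C8G | SLC44A2 | WDR76 | SNX19 | SLC17A1 | SERINC5 | PTGFR | CXADR | REL | PDHB
C8G | DCTN5 | CDKN2A | TNS3 | SLC13A1 | PRRX1 | PTGFR | CDH1 | REL | TEX264
SERPINA10 | PRKRA | GPR87 | SNX19 | SLC17A1 | LDLR | RAB23 | RNF11 | REL | TMEM9
APOC2 | SLC25A12 | LY6K | FERMT2 | SLC17A1 | SLC22A23 | PTX3 | MAL2 | PABPC3 | GPANK1
C8A | MTMR2 | CDKN2A | AP2B1 | SLCO4C1 | ZDHHC20 | TBXA2R | CDH1 | TMPPE | TMED1
AHSG | DCTN5 | WDR76 | TNS3 | PAX2 | ZDHHC20 | SCARA5 | FAM83H | MXD1 | ARF5
APOA2 | CCNI | CENPK | SIAE | SLCO4C1 | BAZ2A | EBF1 | F11R | MXD1 | MRFAP1
AHSG | CHD3 | CENPK | ZBTB10 | MIOX | TSPAN13 | EBF1 | CTSO | MXD1 | PARK7
AHSG | SLC44A2 | IL20RB | AP2B1 | SLC3A1 | LDLR | PTX3 | HOOK1 | GJD3 | SOWAHA
HAO1 | MTMR2 | S1PR5 | SIAE | SLCO4C1 | TSPAN13 | PTX3 | CDH1 | TMPPE | GPANK1
APOC2 | MTMR2 | GPR87 | MARVELD1 | PAX2 | TSPAN13 | SYDE1 | KRT18 | GJD3 | ATRIP
APOA5 | C6orf203 | PSMC3IP | MOCS1 | MIOX | SQLE | HSPA12B | DDX54 | PABPC3 | SLC11A1
APOA5 | EFCAB2 | KLHDC7B | TMEM150A | MIOX | INTS7 | HSPB6 | KRT18 | GJD3 | NLRP14
APOA2 | MAPRE1 | KLHDC7B | MARVELD1 | PAX2 | BCL6 | EBF1 | RNF11 | POTEE | ATRIP
APOC2 | PRKRA | CDC7 | CRY1 | CDH16 | PIH1D1 | HSPA12B | UBN1 | KLHL11 | ZFYVE28
VTN | WBP2 | GPR87 | PRKCD | SLC3A1 | SQLE | MFAP4 | KRT18 | POTEE | SOWAHA
APOA2 | DCTN5 | KLHDC7B | PRKCD | SLC3A1 | PIH1D1 | HSPA12B | MAP3K7 | PLEC | CD63
AMBP | WBP2 | S1PR5 | ZNF43 | CDH16 | SQLE | MFAP4 | KRT8 | POTEE | WNT16
ALB | WBP2 | PSMC3IP | EPDR1 | CDH16 | MTHFD2 | SYDE1 | KRT8 | PLEC | CD81
VTN | CHMP3 | S1PR5 | EPDR1 | GLYAT | BCL6 | HSPB6 | KRT8 | PLEC | PEBP1
VTN | PRMT2 | RFC4 | FOXJ3 | GLYAT | SLC22A23 | SYDE1 | SPINT1 | TMPPE | ZFYVE28
AMBP | PRMT2 | CENPW | PRKCD | GLYAT | ITGAL | KANK2 | SPINT1 | C11orf91 | NLRP14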
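PAAD | PCPG | READ | TGCT | THYM
PAAD_1 | PAAD_2 | PCPG_1 | PCPG_2 | READ_1 | READ_2 | TGCT_1 | TGCT_2 | THYM_1 | THYM_2
GCG | FOXRED2 | CHRNA3 | YBX1 | LY6G6D | SNX24 | VRTN | MFSD6 | PAX1 | DSTN
GCG | ORC3 | SLC18A1 | TMEM63A | CDX2 | DTX3L | LIN28A | EFNA1 | PRSS16 | NCKAP1
GCG | MCUR1 | CHRNA3 | SERBP1 | CDX2 | NFIC | LIN28A | CHMP3 | PRSS16 | DSTN
CPA1 | FOXRED2 | PHOX2A | LSR | LY6G6D | KRBA1 | VRTN | TICAM1 | PAX1 | NCKAP1
CPA1 | MCUR1 | CHRNA3 | IDH2 | LY6G6D | KCTD1 | LIN28A | ELOVL1 | FOXN1 | DHCR24
CPA1 | TMEM69 | TH | ERBB2 | NOX1 | GPD2 | VRTN | MBNL2 | PRSS16 | CALU
G6PC2 | KCNAB1 | TH | YBX1 | NOX1 | SS18 | DPPA4 | EXOC3 | PAX1 | CALU
CLPS | MMACHC | TH | ANXA11 | NOX1 | STOM | DPPA4 | KLHDC10 | RAG1 | ZDHHC9
CLPS | SUV39H2 | PHOX2A | NOTCH2 | CDX2 | STOM | TRIM71 | IRF2BP2 | CHRM4 | CAMK2N1
CLPS | RFC5 | DBH | KIF1C | CCL24 | RNF144B | TRIM71 | PGRMC1 | GRAP2 | DHCR24
G6PC2 | L2HGDH | DRD2 | IDH2 | GPA33 | NFIC | DPPA4 | COMT | CCR9 | CAMK2N1
CPA2 | FOXRED2 | DBH | IDH2 | CCL24 | C20orf194 | GDF3 | TICAM1 | SLC46A2 | EPS8
CASR | L2HGDH | DBH | ZFP36L1 | GPR35 | NFIC | GDF3 | AIG1 | RAG1 | DHCR24
G6PC2 | SUV39H2 | HAND2 | YBX1 | GPA33 | STOM | GDF3 | EFNA1 | FOXN1 | NCKAP1
CPA2 | RFC5 | SLC18A1 | TRAF4 | AIFM3 | EVL | TRIM71 | PHC2 | RAG1 | SLC31A1
CASR | CLPB | PHOX2A | ERBB2 | GPA33 | DTX3L | POU5F1 | TMEM59 | PTCRA | PCDH1
CASR | CELSR2 | SLC18A1 | PTGFRN | CCL24 | KCTD1 | POU5F1 | DAZAP2 | PTCRA | SOX13
CPA2 | TMEM69 | HAND2 | ZFP36L1 | AIFM3 | BCL6 | POU5F1 | CAST | FOXN1 | BAG3
CHST4 | RFC5 | MAB21L1 | NOTCH2 | RXFP4 | KRBA1 | FOXH1 | EFNA1 | LAT | CAMK2N1
PNLIPRP2 | ARMC6 | DRD2 | PTGFRN | SLC26A3 | NR3C1 | TRIML2 | KDSR | SLC46A2 | PCDH1
PLA2G1B | PCCB | MAB21L1 | REST | CDX1 | SS18 | TRIML2 | TICAM1 | PTCRA | ZDHHC9
CHST4 | MMACHC | MAB21L1 | TRAF4 | ASCL2 | SS18 | TRIML2 | FBXO3 | GRAP2 | ZDHHC9
PLA2G1B | ATPAF1 | DGKK | NOTCH2 | PPP1R14D | NR3C1 | ZSCAN10 | AIG1 | GRAP2 | BAG3
PNLIPRP2 | PCCB | PENK | ZFP36L1 | SLC26A3 | KCTD1 | VENTX | PPA2 | CCR9 | SOX13
PNLIPRP2 | TMEM209 | HAND2 | SERBP1 | PPP1R14D | BCL6 | FOXH1 | MBNL2 | CHRM4 | EPS8
PLA2G1B | BTBD6 | TLX2 | RCC1 | ISX | NR3C1 | VENTX | CHMP3 | CD3D | EFHD2
CHST4 | CLPB | TLX2 | TMEM63A | SLC26A3 | RAB12 | L1TD1 | CAST | UBASH3A | BAG3
CUZD1 | CLPB | TLX2 | REST | CDX1 | PTPRS | L1TD1 | TMEM59 | CCR9 | MANSC1
CUZD1 | TMEM209 | DRD2 | ERBB2 | ISX | SMARCA1 | ZFP42 | ELOVL1 | APOBEC2 | PCDH1
SLC30A8 | CELSR2 | INSM2 | LRRC1 | CDX1 | SART1 | SLC2A14 | AIG1 | MEIG1 | MANSC1
CUZD1 | ORC3 | DRGX | RCC1 | PPP1R14D | TANC2 | VENTX | ELOVL1 | TRAT1 | FAM114A1
SCTR | SOX12 | DRGX | RPS6KA1 | MEPIA | WWTR1 | FOXH1 | MFSD6 | CD3D | JTB
FOXL1 | BTBD6 | DRGX | NEK6 | MEPIA | BCL6 | HYAL4 | MFSD6 | ZAP70 | EFHD2
SCTR | BTBD6 | SLC18A2 | VAMP8 | GUCY2C | WWTR1 | SLC2A14 | KLHDC10 | SH2D1A | PLBD2
GPBAR1 | SUV39H2 | SLC18A2 | NEK6 | ASCL2 | EVL | ZFP42 | PTPRK | SLC46A2 | MANSC1
SCTR | MCUR1 | NEUROD4 | LRRC1 | MEP1A | EVL | ZFP42 | MBNL2 | SH2D1A | CALU
SFRP5 | CELSR2 | SLC18A2 | TMEM63A | AIFM3 | RAB12 | ZSCAN10 | PTPRK | CCL25 | DSTN
GPBAR1 | MMACHC | TBX20 | LRRC1 | MYO1A | WWTR1 | L1TD1 | DAZAP2 | SH2D1A | DUSP3
SFRP5 | SOX12 | DGKK | TRAF4 | GUCY2C | RDX | SLC2A14 | ZADH2 | CD3G | ADAM9
FOXL1 | PPIL1 | INSM2 | NEK6 | DPEP1 | MYH10 | HYAL4 | ZADH2 | UBASH3A | CDC42EP1
TFF2 | PPIL1 | PENK | SERBP1 | ISX | C20orf194 | HYAL4 | FBXO3 | CD3G | PTK2
SLC30A8 | SOX12 | CHGB | B2M | R3HDML | KRBA1 | ZSCAN10 | ZADH2 | CHRM4 | SOX13
SFRP5 | ATPAF1 | DGKK | TSPAN6 | ASCL2 | SART1 | DPPA2 | PTPRK | UBASH3A | FAM114A1
TFF2 | TMEM69 | NEUROD4 | TSPAN6 | DPEP1 | ECH1 | SLC7A3 | NFIC | SIT1 | CDC42EP1
FOXL1 | ARMC6 | NEUROD4 | RCC1 | GUCY2C | CDC23 | SLC7A3 | KLHDC10 | APOBEC2 | CDC42EP1
TFF2 | PCCB | FAM163A | ANXA11 | CDH17 | ZFP36 | SLC7A3 | KDSR | SIT1 | B4GALT2
SLC30A8 | TMEM209 | HAND1 | CDH1 | NR1I2 | SMARCA1 | NODAL | SETD7 | ZAP70 | PLBD2
GLP2R | L2HGDH | RTL1 | YAP1 | PHGR1 | PTPRS | NANOS3 | EXOC3 | CD3G | DUSP3
REG1B | CSE1L | RTL1 | TGIF1 | PHGR1 | RNF144B | NANOS3 | PPA2 | CD247 | PLBD2
REG1B | GLO1 | PENK | SF3B2 | PHGR1 | RAB12 | NANOS3 | CHMP3 | ZAP70 | JTB
REG1B | MTCH2 | VWA5B2 | ANXA11 | DPEP1 | RDX | CLEC4D | SETD7 | SLAMF1 | DUSP3
TM4SF4 | ATPAF1 | RTL1 | LSR | CDH17 | DTX3L | NLRP9 | SETD7 | TRAT1 | SLC31A1
CFC1 | GNMT | TBX20 | STXBP2 | CDH17 | ECH1 | OOEP | FBXO3 | CCL25 | ERBB3
TM4SF4 | ARMC6 | SLC6A2 | LSR | GUCA2A | TMEM25 | NLRP9 | LRRCC1 | CD247 | CD276
TM4SF4 | TRUB2 | SLC6A2 | VAMP8 | RXFP4 | CLIP4 | NLRP9 | PPA2 | APOBEC2 | FAM114A1
ANXA10 | PPIL1 | KCNG4 | STXBP2 | NR1I2 | GNB5 | RNF17 | KDSR | CCL25 | EFHD2
ANXA10 | TRUB2 | HAND1 | REST | GPR35 | NAGA | RNF17 | PGRMC1 | CD3D | YARS
RBPJL | METTL4 | INSM2 | TSPAN6 | NR1I2 | RDX | DPPA2 | IL13RA1 | SLAMF1 | ADAM9
RBPJL | SNRNP25 | SLC6A2 | KIF1C | MYO1A | GNB5 | RNF17 | EXOC3 | TRAT1 | EPS8
RBPJL | PCBD2 | CHGA | B2M | MYO1A | SMARCA1 | CLEC4D | LRRCC1 | SLAMF1 | SLC31A1
ANXA10 | SNRNP25 | FAM163A | KIF1C | EPS8L3 | ZFP36 | CLEC4D | RPIA | CD8B | CD276
FFAR1 | GNMT | HAND1 | STXBP2 | GUCA2A | C20orf194 | DPPA2 | PGRMC1 | CD8B | JTB
FFAR1 | KCNAB1 | TBX20 | CDH1 | GUCA2A | B3GALNT1 | NODAL | LRRCC1 | CD247 | PTK2
FFAR1 | PCBD2 | KCNG4 | CDH1 | FAM3D | PTPRS | NODAL | RPIA | SIT1 | PRKAR2A
C1orf127 | PCBD2 | FAM163A | PLIN3 | GPR35 | MYH10 | OOEP | RPIA | CD8B | PTK2
C1orf127 | GPN3 | KCNG4 | PHF7 | RXFP4 | B3GALNT1 | OOEP | IL13RA1 | LAT | CCDC142
CFC1 | PLCXD2 | VWA5B2 | DBNL | FAM3D | ZNF532 | RPL10L | RHOF | TTC24 | CCDC142
GLP2R | KCNAB1 | CARTPT | YAP1 | EPS8L3 | NAGA | ZNF99 | IL13RA1 | TTC24 | WWC1
GPBAR1 | GPN3 | VWA5B2 | PTGFRN | FAM3D | GPD2 | HOXB1 | RHOF | MEIG1 | WWC1
C1orf127 | SNRNP25 | CARTPT | RPS6KA1 | EPS8L3 | MYH10 | HOXB1 | MATN3 | LAT | ADAM9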

[0000]

Gene Pairs For UCEC Sub-Types
Solid Tissue Normal_1 | Solid Tissue Normal_2 | Endometrioid_1 | Endometrioid_2 | Serous_1 | Serous_2
RERG | MKI67 | FOXA2 | MAGEH1 | L1CAM | CDKN1A
RERG | TMEM132A | KIAA1324 | NPR1 | L1CAM | MOB3A
SLC22A3 | MYBL2 | SPDEF | NPR1 | L1CAM | NFIC
PLSCR4 | ZDHHC16 | SPDEF | HIF3A | CLDN6 | CDKN1A
PLSCR4 | NUP43 | FOXA2 | HIF3A | CLDN6 | MOB3A
TCF23 | MYBL2 | FOXA2 | PNMA3 | CLDN6 | NFIC
MAMDC2 | MYBL2 | NANS | NPR1 | GRB7 | CDKN1A
GATA6 | TK1 | SPDEF | MAGEH1 | GRB7 | MOB3A
PLSCR4 | FTSJ1 | MYBL2 | L1CAM | PNMA3 | IL20RA
RSPO1 | MKI67 | BSPRY | L1CAM | MYBL2 | KIAA1324
BCHE | MKI67 | KIAA1324 | HIF3A | SLC6A12 | IL20RA
SLC22A3 | CDC20 | NANS | ARHGAP23 | CDC20 | KIAA1324
RERG | TK1 | GALNT10 | ARHGAP23 | GPRIN2 | IL20RA
GATA6 | CDC20 | CDC20 | L1CAM | UNK | KIAA1324
RSPO1 | CDC20 | KIAA1324 | FBXO17 | GRB7 | PGR
RSPO1 | TK1 | BSPRY | SLC6A12 | PNMA3 | PGR
GATA6 | ZDHHC16 | OSTF1 | FBXO17 | SLC6A12 | PGR
MAGEH1 | FTSJ1 | BSPRY | FAM110B | CTCFL | NIPAL1
ASPA | EME1 | MLPH | ARHGAP23 | SLC6A12 | PXK
BCHE | TBC1D7 | OSTF1 | MAGEH1 | TBC1D7 | SPDEF

[0000]

Gene Pairs For STAD Sub-Types
Intestinal_1 | Intestinal_2 | Diffuse_1 | Diffuse_2
HOOK1 | JAM2 | ABCA8 | SHPRH
BUB1 | OGN | CHRDL1 | TNIK
HOOK1 | CHRDL1 | OGN | VPS37A
HOOK1 | OGN | NGFR | LYRM4
FAM136A | GYPC | JAM2 | LYRM4
AURKA | OGN | CHRDL1 | TRAFD1
BUB1 | NGFR | JAM2 | STIM2
DSN1 | JAM2 | JAM2 | VPS37A
BUB1 | JAM2 | NGFR | SHPRH
DSN1 | SELP | CADM3 | ZNF112
DSN1 | ABCA8 | SRPX | STIM2
PIGU | GYPC | ABCA8 | LYRM4
RAE1 | BOC | CHRDL1 | VPS37A
AURKA | NGFR | OGN | TRAFD1
UBE2C | GYPC | PKNOX2 | ZNF112
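
Each `_1`/`_2` column pair in these tables names the two genes of one gene pair, and the claims describe pair-transforming such pairs into binarized data. The sketch below shows one way that could look for a single sample. It assumes the rule is "1 if the first gene is expressed higher than the second, else 0", which this table does not spell out; `pair_transform` is a hypothetical helper name and the expression values are made up for illustration.

```python
from typing import Dict, List, Tuple

# A few Intestinal gene pairs copied from the STAD sub-type table above.
INTESTINAL_PAIRS: List[Tuple[str, str]] = [
    ("HOOK1", "JAM2"),
    ("BUB1", "OGN"),
    ("AURKA", "OGN"),
]


def pair_transform(expr: Dict[str, float],
                   pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Binarize one sample: 1 if gene_1 > gene_2, else 0 (assumed rule)."""
    return {f"{g1}_{g2}": int(expr[g1] > expr[g2]) for g1, g2 in pairs}


# Made-up expression values for a single query sample (illustration only).
sample = {"HOOK1": 8.1, "JAM2": 2.3, "BUB1": 5.0, "OGN": 6.4, "AURKA": 7.2}
features = pair_transform(sample, INTESTINAL_PAIRS)
print(features)
```

Because each feature depends only on the ordering of two genes within the same sample, a transform of this kind does not change when a sample's values are rescaled monotonically, which is one plausible reason to compare query samples from different platforms this way.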

[0000]

Gene Pairs For PAAD Sub-Types
LowPurity_1 | LowPurity_2 | basal_1 | basal_2 | classical_1 | classical_2
RHOJ | EFNA4 | BCAR3 | BTG2 | LRRC66 | LDLRAD3
JAM2 | SAMD10 | GPR87 | FRZB | IHH | DSE
PREX1 | PTK6 | COX6B2 | NOSTRIN | LRRC66 | TTC7B
FBLN5 | MANBAL | FBXL2 | FRZB | ZFPM1 | RDX
CYYR1 | EFNA4 | COX6B2 | FMO5 | IHH | CAMK1D
ERG | EFNA4 | BEAN1 | NOSTRIN | SPIRE2 | CHST11
FBLN5 | ICA1 | MET | CAPRIN1 | FMO5 | PTPRS
CXCL12 | KRTCAP3 | GPR87 | NOSTRIN | FMO5 | MYO5A
ST8SIA4 | SAMD10 | RYK | BTG2 | TM4SF5 | CAMK1D
BCL2 | SAMD10 | GPR87 | FMO5 | C9orf152 | CITED2
SAMHD1 | MST1R | COX6B2 | BLNK | TM4SF5 | PTPRS
FBLN5 | ELMO3 | NT5E | BTG2 | C9orf152 | PTPRS
SAMHD1 | B3GNT3 | BCAR3 | TMEM98 | IHH | MYO5A
MPP1 | SPIRE2 | BEAN1 | KALRN | TM4SF5 | MCC
JAM2 | NXT1 | FBXL2 | RAI2 | C9orf152 | PHLDB2
BCL2 | PORCN | FBXL2 | PDX1 | SPIRE2 | FMNL1
PRCP | OCIAD2 | ANXA8 | ARHGAP24 | AGR3 | EVL
PRCP | SSH3 | ANXA8 | RAI2 | SPIRE2 | RDX
PRCP | B3GNT3 | SIX4 | CHN2 | ZFPM1 | FMNL1
GNG2 | NXT1 | NT5E | TMEM98 | LRRC66 | SACS
GIMAP4 | IGSF9 | BEAN1 | PDX1 | ZFPM1 | CHST3
RASSF2 | ADAP1 | ANXA8 | BLNK | ANKS4B | CAMK1D
ADPRH | C1D | TNNT1 | EXOC6 | AGR3 | RDX
CELF2 | PITX1 | ARNTL2 | MAPRE2 | AGR3 | DENND5A
BCL2 | C1D | PORCN | KALRN | FMO5 | PHLDB2
JAM2 | IGSF9 | BCAR3 | MAPRE2 | FOXA3 | EFEMP1
SAMHD1 | OCIAD2 | TNNT1 | KALRN | TRIM15 | PHLDB2
CYYR1 | IGSF9 | PORCN | C1orf115 | FOXA3 | NDST1
METTL7A | TSPAN15 | ADAMTSL5 | FMO5 | TRIM15 | CHST3
ST8SIA4 | C1D | SIX4 | ASRGL1 | NPAS1 | P2RY6
GIMAP4 | PITX1 | PTK6 | ATP2A3 | ICA1 | ELL2
CD8A | ADAMTSL5 | PORCN | ARL15 | KALRN | EVL
CD8A | CENPE | PLXNA1 | CTSS | ADAP1 | DNAJC13
CERKL | CENPE | PLXNA1 | ATP2A3 | CRB3 | NIN
ST8SIA4 | PORCN | FSCN1 | ATP2A3 | ANKS4B | DYSF
ERG | NXT1 | TNNT1 | PDX1 | ADAP1 | EVL
RASSF2 | PTK6 | SIX4 | ARHGAP24 | USH1C | CNN3
CXCL12 | SH3RF1 | C16orf74 | CEBPA | ADAP1 | CHST11
CXCL12 | PREB | MET | CTSS | LRCH1 | DENND5A
PREX1 | ICA1 | FAM83A | METTL7A | KALRN | NIN
RHOJ | SPIRE2 | ARNTL2 | IQGAP2 | BDH1 | DYSF
AOAH | ADAMTSL5 | PTK6 | EPS8L3 | USH1C | ETS1
GAB3 | ADAMTSL5 | C16orf74 | ASRGL1 | APOBEC1 | P2RY6
MPP1 | PITX1 | SNCG | LPAR6 | TRIM15 | DYSF
PREX1 | ADAP1 | SNCG | C1orf115 | FOXA3 | FMNL1
CD8A | CHEK2 | PTK6 | IQGAP2 | EPS8L3 | ETS1
EVL | PREB | SNCG | ARL15 | SLC45A3 | NDST1
GIMAP6 | CENPV | PRRC1 | METTL7A | TJP3 | ETS1
GIMAP4 | VAMP4 | FAM3C | METTL7A | CYP251 | CNN3
GIMAP8 | RBFA | ITGA3 | R13516 | ITPKA | SLC37A2

[0000]

Gene Pairs For LUSC Sub-Types
primitive_1 | primitive_2 | secretory_1 | secretory_2 | basal_1 | basal_2 | classical_1 | classical_2
SBK1 | MAFB | CIITA | PIR | SERPINB3 | TXNRD1 | TMEM116 | GPSM3
ATAT1 | IL1RN | FMNL1 | FBXO45 | HES2 | MEGF9 | MRAP2 | ACSL5
MEX3A | MAFB | TNFRSF1B | SIAH2 | IL1RN | TXNRD1 | CYP4F3 | KRT7
CSTF1 | RIN2 | TNFRSF1B | POLR2H | CXCL1 | CDK5RAP2 | TSPAN7 | FAM107B
SBK1 | IL1RN | TNFRSF1B | ZNF639 | SERPINB3 | EPCAM | TMEM116 | ZFAND2B
SBK1 | S100A8 | RFTN1 | FBXO45 | FAM83A | CDK5RAP2 | MRAP2 | PDZD2
FAM184A | RAB27B | FMNL1 | MRPL47 | CXCL1 | RIT1 | OSGIN1 | CXXC5
FAM184A | CIITA | ABI3BP | ECE2 | PTPRH | FANCC | OSGIN1 | CRIP2
HES6 | MAFB | ANXA6 | ACTL6A | PTK6 | MAFG | TMEM116 | CXXC5
HES6 | S100A8 | FLI1 | DENND2C | CXCL1 | ME1 | ME1 | PHC2
FAM184A | ABI3BP | SELPLG | ECE2 | PTK6 | CDK5RAP2 | ADAM23 | PHC2
TOX3 | TMEM116 | ANXA6 | PCYT1A | FABP5 | STARD7 | MRAP2 | TMEM51
VIL1 | SERPINB3 | ANXA6 | GMPS | FAM83A | GTF3C4 | MAFG | FAM107B
HES6 | GJB3 | BIRC3 | ZNF639 | GPR153 | CTNNAL1 | CYP4F11 | CRIP2
MEX3A | PHLDA3 | ETS1 | PCYT1A | GPR153 | GTF3C4 | TSPAN7 | PMEPA1
SRCIN1 | ANXA8 | TGM2 | PFN2 | FAM83A | MAFG | TSPAN7 | CRIP2
MEX3A | TUBB6 | ABI3BP | MOB2 | FABP5 | TXNRD1 | SCN9A | CXXC5
TUBB2B | RAC2 | ABI3BP | DENND2C | SERPINB3 | ME1 | SCN9A | SLC43A3
VIL1 | S100A8 | C1orf162 | DENND2C | CXCL6 | WASF1 | SCN9A | GPSM3
SRCIN1 | RAB27B | FLI1 | WDR53 | S100A8 | TALDO1 | CYP4F11 | PHC2
VIL1 | ANXA8 | SLCO2A1 | PIR | GJB3 | CBX1 | CYP4F11 | KRT7
ATAT1 | RAB27B | CIITA | MAFG | FABP5 | PGD | PIR | TRIM8
TUBB2B | TNFRSF1B | LTB | GPX2 | EPS8L1 | CTNNAL1 | ME1 | PTP4A2
TOX3 | PDZK1IP1 | TSPAN4 | FBXO45 | HES2 | GTF3C4 | OSGIN1 | TMEM51
ATAT1 | GJB3 | BIRC3 | RIT1 | HES2 | MAFG | TXN | SDC4

[0000]

Gene Pairs For LUAD Sub-Types
prox.-inflam_1 | prox.-inflam_2 | TRU_1 | TRU_2 | prox.-prolif._1 | prox.-prolif._2
CD274 | KIAA1324 | PLA2G4F | NUF2 | CABYR | PER3
BEND6 | GJB1 | SCTR | CEP55 | FGL1 | PER3
TNFSF4 | GJB1 | SCTR | KIF2C | C2CD4D | HPGDS
SPHK1 | C9orf152 | SCTR | KIF4A | FGL1 | TLR2
RGS10 | RAP1GAP | PLA2G4F | NEK2 | FGL1 | CIITA
PLAU | MTUS1 | PLLP | BIRC5 | CABYR | ARHGAP20
NTAN1 | FAM174B | PLA2G4F | PRR11 | SLC16A14 | CIITA
PDCD1LG2 | GJB1 | HLF | KIF11 | CABYR | MAML2
DSE | RAP1GAP | PLLP | CDK1 | SLC16A14 | MAML2
CMTM3 | RAP1GAP | HLF | CEP55 | VAX2 | HPGDS
ANLN | GPT2 | SUSD2 | KPNA2 | FGA | DPYD
CTHRC1 | CIT | INMT | BIRC5 | FGA | HLA-DMB
ANLN | CABLES1 | ADAMTS8 | CENPA | SLC48A1 | TLR2
CD274 | INMT | HLF | BUB1 | SLC16A14 | ATP10A
TPX2 | GPT2 | ADAMTS8 | PBK | ABCB6 | FAS
RGS10 | FAM174B | ADAMTS8 | NUF2 | GPT2 | EMP1
DSE | CABLES1 | INMT | KIF11 | FGA | CIITA
NTAN1 | KIAA1324 | TNXB | KIF11 | GPT2 | HLA-DMB
DSE | SLC48A1 | SCN4B | CKAP2L | PBK | ATP10A
CD109 | TOB1 | INMT | CDK1 | ENO3 | ARHGAP20
CD109 | FAM174B | RTN4RL1 | CENPA | S100P | EMP1
RGS10 | SLC48A1 | TMPRSS2 | KPNA2 | PBK | DAPP1
CD109 | KIAA1324 | SCN4B | CENPA | ENO3 | PER3
CD274 | C9orf152 | CBX7 | CEP55 | PBK | FAS
ANLN | SORBS2 | NFIX | KPNA2 | GPT2 | SPRED1

[0000]

Gene Pairs For LGG Sub-Types
ME_1 | ME_2 | PN_1 | PN_2 | CL_1 | CL_2 | NE_1 | NE_2
IL1R1 | KLHL23 | SLCO5A1 | NIPAL2 | MEOX2 | NALCN | NAPB | LIMA1
IL1R1 | BCL7A | FERMT1 | KCNAB2 | IGFBP2 | ACTR1A | NAPB | MIDN
IL1R1 | DSCAM | DSCAM | SYNPO | MEOX2 | REPS2 | CAMKK1 | NKIRAS2
TYMP | CRTC1 | FAM110B | SYNPO | MEOX2 | GNAI1 | GDA | NKIRAS2
TYMP | BCL7A | FERMT1 | SYNPO | TLK1 | RAB18 | MAL2 | NUBP1
TYMP | RUNDC3A | SHD | NAPB | FBXO17 | TMEFF2 | KCNAB2 | MIDN
CD3D | TBR1 | GPR173 | UGP2 | HS3ST3B1 | PCBP3 | KCNAB2 | LIMA1
GPR65 | ANAPC1 | SLCO5A1 | OCIAD2 | PIPOX | MAGEH1 | KCNAB2 | CDC42SE1
RAB27A | MEIS1 | BCL7A | UGP2 | PIPOX | DNM3 | SULT4A1 | PPP1R18
GPR65 | PTS | SLCO5A1 | RGS14 | SHOX2 | H2AFY2 | SULT4A1 | LIMA1
MYO1G | EDN3 | PCGF2 | FAM131A | HS3ST3B1 | H2AFY2 | SV2B | NUP188
TNFAIP8 | EDN3 | SHD | KCNAB2 | MEIS1 | GNAI1 | GDA | WDR81
RAB27A | ANAPC1 | SHD | UGP2 | MEIS1 | ASB13 | SULT4A1 | NUP188
GPRC5A | RCOR2 | FERMT1 | SIPA1L1 | SH2D4A | PCBP3 | CAMKK1 | TRAFD1
FAM20A | KLHL23 | DSCAM | SIPA1L1 | OCIAD2 | TMEFF2 | SV2B | PPP1R18
CD3D | GABRA1 | RCOR2 | FAM131A | SHOX2 | PCBP3 | GABRA1 | NKIRAS2
RAB27A | KLHL23 | RCOR2 | RALB | PIPOX | ARL3 | CACNG3 | DDX19B
KYNU | EDN3 | GPR173 | HOPX | HS3ST3B1 | TMEFF2 | SYNPR | BAZ1A
CD3G | TBR1 | GPR173 | FAM131A | IGFBP2 | WAC | RBFOX1 | BAZ1A
CD96 | CACNG3 | BCL7A | SIPA1L1 | IGFBP2 | SAR1A | MAL2 | ANAPC1
PTPN22 | CACNG3 | JPH4 | NAPB | FBXO17 | GNAI1 | TBR1 | DDX19B
PTPN22 | RYR2 | H2AFY2 | CAMKK1 | DMRTA2 | AIFM2 | NAPB | PTBP1
CD96 | TBR1 | DSCAM | HOPX | MCCC1 | ARL3 | CAMKK1 | ARHGAP17
TNFAIP8 | AIFM2 | ZNF74 | CYB561 | MEIS1 | GALNT13 | PTER | DDX19B
GPRC5A | CAMKK1 | USP49 | CYB561 | FBXO17 | REPS2 | PTER | NUBP1
TREM1 | SYNPR | TMEFF2 | CAMKK1 | DMRTA2 | DDX19B | GDA | STK10
GPRC5A | ZNF74 | RCOR2 | HOPX | DMRTA2 | TTN | SV2B | TRAFD1
MYO1G | AMY2B | PCGF2 | RALB | MCCC1 | DNM3 | PTER | INTS9
FAM20A | ZNF74 | USP49 | CXCL14 | ARAP3 | DNM3 | RYR2 | BAZ1A
FAM20A | DSCAM | ZNF74 | LGALS8 | SHOX2 | TTN | CCK | STK10
CD3D | RBP4 | JPH4 | KCNAB2 | NPNT | JPH4 | CPNE6 | MAN2B1
CD96 | MAL2 | USP49 | DYNLT3 | ARAP3 | GALNT13 | CACNG3 | NUBP1
GPR65 | MEIS1 | ZNF74 | DYNLT3 | SHROOM3 | REPS2 | RBFOX1 | STK10
SNX20 | AIFM2 | GALNT13 | NAPB | OTX1 | SH3GL2 | CACNG3 | ANAPC1
TREM1 | GABRA1 | PTS | KLHL26 | PDPN | JPH4 | CPNE6 | WDR81
TREM1 | RYR2 | KLHL23 | RALB | TNFAIP6 | H2AFY2 | RBFOX1 | MAN2B1
CD3G | SH2D7 | PCGF2 | CXCL14 | WIPF3 | SH3GL2 | FAM131A | TRAFD1
PTPN22 | HCN1 | AMOTL2 | ANKRD11 | PDPN | MXI1 | SYNPR | ANAPC1
IL15 | PCDH8 | H2AFY2 | CPNE6 | EMP3 | KCNAB2 | CCK | INTS9
MYO1G | TMIE | OLIG2 | NDST1 | ARAP3 | ASB13 | CCK | MAN2B1
TNFAIP8 | TTN | OLIG2 | CLSTN1 | EMP3 | ASB13 | GABRA1 | PPP1R18
MMP19 | TTN | TMEFF2 | GDA | EMP3 | GALNT13 | GABRA1 | INTS9
IL15 | GABRA1 | PTS | DYNLT3 | MCCC1 | MAGEH1 | CPNE6 | ARHGAP17
LCK | PPP1R1C | SOX6 | TMEM127 | PDPN | WAC | FAM131A | NUP188
CD3G | CACNG3 | PTS | WIPF3 | HOPX | ACTR1A | SYNPR | ARHGAP17
MMP19 | SLC25A32 | EBF1 | OCIAD2 | TLK1 | MXI1 | UGP2 | PTBP1
MMP19 | AIFM2 | TMEFF2 | RBFOX1 | TLK1 | MICU1 | SYNPO | HNRNPAB
BATF | SYNPR | PATZ1 | TMEM127 | NPNT | SH3GL2 | SLC6A7 | TTN
LY96 | MEIS1 | H2AFY2 | GDA | FABP5 | NALCN | CRTC1 | MIDN
BATF | RBP4 | FAM110B | TECPR2 | WIPF3 | KCNAB2 | UGP2 | HNRNPAB

[0000]

Gene Pairs For KIRC Sub-Types
Solid Tissue Normal_1 | Solid Tissue Normal_2 | 3_1 | 3_2 | 1_1 | 1_2 | 2_1 | 2_2 | 4_1 | 4_2
PIK3C2G | SIGLEC10 | ADAM12 | FAAH | ATP11A | PPIA | TAZ | POP4 | TIMM8B | ATG2B
FXYD4 | COL23A1 | ADAM12 | CCDC130 | TOLLIP | SLC25A39 | TUBGCP6 | TSN | MTX1 | RAD54L2
FXYD4 | NDUFA4L2 | ADAM12 | CRB3 | ATP11A | OAZ1 | TUBGCP6 | STRAP | POP4 | TAF1
CLDN8 | DDB2 | ARL4C | SHMT1 | SPATA18 | MRPS34 | CCDC130 | COPS4 | TIMM8B | ZFHX3
CLDN8 | SEMA5B | CTHRC1 | ACADL | OSBPL1A | SLC25A39 | TUBGCP6 | MMADHC | MRPS34 | UBR5
CLDN8 | STC2 | IL2RA | TMEM171 | ITGA6 | OAZ1 | ZNF692 | COPS4 | POP4 | PRDM2
PIK3C2G | CXXC4 | TRAM2 | PRKAB1 | RAPGEF2 | SLC25A39 | CCDC84 | POP4 | MRPS34 | HERC1
PLA2G4F | STC2 | PLAUR | ACADL | PRUNE2 | OAZ1 | CCDC84 | PIGC | MRPS34 | ARID1B
GGT6 | STC2 | ARL4C | IMPA2 | SPATA18 | PSMB3 | TAZ | PIGC | MRPL17 | NEK9
GGT6 | HILPDA | SAP30 | ACADL | SPATA18 | GNG5 | ZNF276 | COPS4 | POP4 | ZFHX3
FAM3B | SPAG4 | ADAMTS12 | TRPM3 | DIP2B | PNKD | ZNF276 | PIGC | GRB2 | MACF1
FAM3B | SAP30 | ARL4C | ACAA2 | BCL2 | TMEM219 | ZNF276 | SPTY2D1 | MTX1 | NEMF
FAM3B | TRDMT1 | PODNL1 | C16orf86 | DIP2B | SEC13 | CHKB | LSM11 | ORAI3 | ZFHX3
SLC26A7 | SCARB1 | RUNX1 | PDZK1 | TMCC3 | SEC13 | CCDC130 | POP4 | MTX1 | ZNF445
TMPRSS2 | SCARB1 | ADAMTS12 | FAAH | TMCC3 | PSMB3 | LCAT | GPN3 | LSM4 | ARID1A
TMPRSS2 | EGLN3 | CALU | PEBP1 | RIT1 | GTF3A | CHKB | KIAA0391 | TXNDC17 | NR2C2
FXYD4 | BHLHE41 | ADAMTS12 | PTH2R | TMCC3 | GNG5 | GPS2 | HSF2 | CLPP | HERC1
PIK3C2G | CENPP | BCAT1 | ETFDH | ARHGAP42 | PNKD | CCDC130 | MMGT1 | ORAI3 | DICER1
PLA2G4F | SEMA5B | RUNX1 | RIT1 | LYSMD3 | LSM4 | TAZ | USP39 | PRELID1 | ARID1A
PLA2G4F | COL23A1 | RUNX1 | TOLLIP | RAVER2 | SLC50A1 | CCDC84 | MMGT1 | MRPL51 | UBR5

[0000]

Gene Pairs For HNSC Sub-Types
Solid Tissue Normal_1 | Solid Tissue Normal_2 | Atypical_1 | Atypical_2 | Classical_1 | Classical_2
FAM3D | TGFB1 | ME11 | VEGFC | ASNS | SAMHD1
FAM107A | LOXL2 | ME11 | PDGFC | TMEM116 | CCDC69
CLEC3B | NID2 | FOXRED2 | PRSS23 | SCN9A | APOL3
EMCN | NID2 | ZNF541 | VEGFC | OSGIN1 | SAMHD1
GPD1L | ELF4 | ZNF541 | DACT1 | ARTN | MOB3B
FAM3D | TTYH3 | SYCP2 | PODNL1 | SCN9A | CCDC69
CLEC3B | ASPN | MEI1 | FSTL3 | EPCAM | SAMHD1
SH3BGRL2 | TGFB1 | FOXRED2 | USP10 | B4GALNT4 | CCDC69
SH3BGRL2 | TTYH3 | SYNGR3 | FSTL3 | GUI | APOL3
SH3BGRL2 | DNAJC13 | SYCP2 | VEGFC | TMEM116 | ARHGEF10L
CLEC3B | PCDH12 | FOXRED2 | FBLIM1 | SCN9A | UBA7
FAM107A | ADAMTS2 | ZNF541 | P4HA3 | CYP4F11 | IL4R
FAM3D | TPX2 | SYNGR3 | FBXO44 | TMEM116 | UBA7
GPD1L | MYBL2 | SYNGR3 | PRR5 | PANX2 | TMEM51
NRG2 | NOX4 | CEP70 | PDGFC | ARTN | APOL3
GPD1L | FOXM1 | SYCP2 | F2RL1 | CYP4F11 | RAP1A
FAM107A | OLFML2B | ILDR1 | PDGFC | GLI2 | TMEM51
ATP6V0A4 | LOXL2 | C19orf57 | UBTD1 | CYP4F11 | PRDM2
PLIN4 | LOXL2 | FAM83E | PAQR5 | RIT1 | RAP1A
NDRG2 | LAMC2 | FAM83E | RUSC2 | OSGIN1 | CASP4

Mesenchymal_1 | Mesenchymal_2 | Basal_1 | Basal_2
ASPN | RAPGEFL1 | RGS20 | ZDHHC2
POSTN | CD9 | TRPV3 | ZDHHC2
OLFML2B | MAPK13 | TRPV3 | GPRC5B
OLFML2B | RAPGEFL1 | HTR7 | GPRC5B
TGFB3 | ERBB3 | TRPV3 | PBX1
ASPN | ERBB3 | HTR7 | EPS8
PCOLCE | MAPK13 | RGS20 | GPRC5B
ADAMTS2 | SLC9A3R1 | FLRT3 | PTPRS
PCOLCE | RAPGEFL1 | GOLGA7B | NTRK2
ASPN | ELF3 | FLRT3 | PBX1
PCOLCE | RAB25 | HTR7 | ZDHHC2
OLFML2B | STAP2 | RGS20 | EPS8
DACT1 | CAMSAP3 | FLRT3 | LTBP3
OLFML3 | STAP2 | SLC6A11 | PBX1
FAP | LLGL2 | SH2D5 | EPS8
GLT8D2 | CAMSAP3 | CDSN | ARHGAP24
OLFML3 | LLGL2 | SLC6A11 | NTRK2
TGFB3 | STAP2 | MOB3B | NTRK2
ADAMTS2 | MAPK13 | TSPAN10 | ARHGAP24
ADAMTS2 | CLDN4 | SH2D5 | TTC28

[0000]

Gene Pairs For ESCA Sub-Types
AC_1 | AC_2 | ESCC_1 | ESCC_2
HNF4A | TFAP2C | TP63 | YKT6
HNF4A | RNF217 | TP63 | BRD2
HNF4A | GPR87 | TP63 | ATG3
MUC13 | BNC1 | ZNF385A | YKT6
MUC13 | SOX15 | S1PR5 | CD68
MUC13 | TP63 | EFS | MRPL1
EPS8L3 | LPAR3 | S1PR5 | PDF
EPS8L3 | S1PR5 | S1PR5 | ECM2
EPS8L3 | GPR87 | SOX15 | TIMM8A
USH1C | LPAR3 | EFS | ECM2
USH1C | MRPL1 | DSC3 | YKT6
TSPAN8 | MCC | TFAP2C | MCTP2
TSPAN8 | RNF217 | PKP1 | BRD2
TSPAN8 | EFS | EFS | MRPL23
LGALS4 | CALML3 | SOX15 | MCTP2
LGALS4 | TP63 | SNAI2 | TM2D2
TMC5 | SOX15 | PARD6G | MRPL1
GPR35 | S1PR5 | BNC1 | TIMM8A
PLEKHA6 | EFS | SNAI2 | MRPL1
PRR15L | EFS | DSC3 | ATG3
VIL1 | LPAR3 | LPAR3 | CD68
VIL1 | S1PR5 | CALML3 | MCTP2
LGALS4 | BNC1 | CALML3 | MRPL23
TMC5 | TFAP2C | CALML3 | TM2D2
TMC5 | MCC | PKP1 | SEC31A
HNF1A | PDF | BNC1 | MRPL23
PLEKHA6 | MCC | DSC3 | BRD2
PRR15L | SOX15 | BNC1 | CD68
SEMA4G | GPR87 | FRMD6 | ATG3
USH1C | PARD6G | GPR87 | ECM2
PLEKHA6 | TP63 | SOX15 | IFIT2
PRR15L | TFAP2C | GPR87 | TIMM8A
VIL1 | TIMM8A | RNF217 | TM2D2
ICA1 | PARD6G | FSCN1 | SEC31A
HNF1A | CD68 | GPR87 | PDF
HNF1A | CYB5D1 | LPAR3 | PDF
RHPN2 | BNC1 | LPAR3 | CYB5D1
GPR35 | PARD6G | S100A2 | SEC31A
GPR35 | TIMM8A | SNAI2 | MRPL18
HNF1B | TIMM8A | FRMD6 | ANGPTL2
SEMA4G | SNAI2 | PKP1 | MRPL18
SLC44A4 | RNF217 | S100A2 | MRPL18
CGN | FRMD6 | PARD6G | IFIT2
RHPN2 | SNAI2 | RHPN2 | SLC44A4
ICA1 | SNAI2 | S100A2 | ANGPTL2
RHPN2 | FRMD6 | RNF217 | IFIT2
SLC44A4 | FRMD6 | GPR35 | VIL1
SLC44A4 | CALML3 | MCC | ANGPTL2
FOXA3 | CHST6 | RNF217 | SIGLEC1
CGN | ZNF385A | SEMA4G | SLC44A4

[0000]

Gene Pairs For COAD Sub-Types
Solid Tissue Normal_1 | Solid Tissue Normal_2 | CIN_1 | CIN_2 | MSI/CIMP_1 | MSI/CIMP_2 | Invasive_1 | Invasive_2
ABCA8 | URB2 | TNNC2 | CCL5 | ADAMTS2 | SLC39A5 | APOBEC1 | FGFR1
ABCA8 | SLCO4A1 | GDPD5 | TRIM69 | ADAM12 | SGK2 | QPCT | SIRPA
ABCA8 | TRIB3 | GDPD5 | ICAM1 | TREM1 | SLC19A3 | QPCT | AQP1
CA7 | FTSJ1 | TTI1 | LHFPL2 | ADAMTS2 | IHH | IL33 | TNS1
CA7 | GTF2IRD1 | SLC5A6 | LGMN | OLR1 | SLC19A3 | QPCT | TNS1
CA7 | KRT80 | MOCS3 | TRIM69 | SLC11A1 | PPP1R14C | COMMD10 | AQP1
SCARA5 | SLC7A5 | TGIF2 | TRIM69 | ADAM12 | PPP1R14C | APOBEC1 | SIRPA
SCARA5 | FTSJ1 | CDK5RAP1 | LHFPL2 | SLC11A1 | PLA2G4F | APOBEC1 | CCDC80
SCARA5 | GTF2IRD1 | PIGU | LHFPL2 | HAPLN3 | ABAT | IL33 | SIRPA
CLEC3B | KRT80 | TNNC2 | TNFAIP8 | ITGAX | SGK2 | SLC11A2 | AEBP1
CLEC3B | SLCO4A1 | GNG4 | SGMS2 | ICAM1 | SLC39A5 | SMAGP | AEBP1
CLEC3B | TEAD4 | TNNC2 | HPSE | CLEC5A | SLC19A3 | PPA2 | TIMP2
SPIB | URB2 | SLC5A6 | VAPA | NCF2 | SGK2 | RAB32 | AQP1
SPIB | SLCO4A1 | GNG4 | ABHD3 | OSM | RNLS | CYP39A1 | GPR161
SPIB | TEAD4 | SLC35C2 | LGMN | SPP1 | CXCL14 | COMMD10 | TNS1
GLP2R | KRT80 | SLC13A3 | FCGR3A | TREM1 | RNLS | IL33 | EHD2
GLP2R | CLDN1 | GDPD5 | TRIB2 | SLC11A1 | PRRG2 | HSD17B4 | VIM
GLP2R | ETV4 | GNG4 | CD163 | C5AR1 | PPP1R14C | SLC11A2 | IGFBP5
TMIGD1 | URB2 | FITM2 | ABHD3 | SPHK1 | PRRG2 | SLC11A2 | TIMP2
TMIGD1 | TEAD4 | SLC13A3 | TAGAP | ITGAX | ABAT | HCN1 | CCDC8

[0000]

Gene Pairs For BRCA Sub-Types
Solid Tissue Normal_1 | Solid Tissue Normal_2 | LumA_1 | LumA_2 | Basal_1 | Basal_2
CD300LG | MMP11 | DEGS2 | PHGDH | FOXC1 | AR
TMEM132C | COL10A1 | AGR3 | AIF1L | NEK2 | FOXA1
CA4 | COL10A1 | TMC4 | PHGDH | FAM171A1 | AR
ABCA10 | MMP11 | DEGS2 | AIF1L | BCL11A | AGR2
ARHGAP20 | MMP11 | AGR3 | PHGDH | NUSAP1 | MLPH
FXYD1 | COL10A1 | ZMYND10 | PSAT1 | CDK1 | FOXA1
PAMR1 | SLC35A2 | FGD3 | IFRD1 | ZWINT | MLPH
CD300LG | PAFAH1B3 | MAPT | AIF1L | FOXC1 | MAGI1
TSLP | NEK2 | AGR3 | ID4 | CDK1 | MLPH
PAMR1 | PSENEN | DEGS2 | MCCC1 | NUSAP1 | FOXA1
PAMR1 | PYCR1 | ABAT | LPIN1 | FOXC1 | EZH1
CD300LG | TK1 | THSD4 | EGFR | CDCA7 | AR
SCARA5 | CENPF | ZMYND10 | CENPW | KCNK5 | AGR2
BTNL9 | SLC50A1 | ZMYND10 | CENPN | NEK2 | AGR2
MAMDC2 | SLC50A1 | FGD3 | TTLL4 | CENPW | SIDT1
ARHGAP20 | TPX2 | FGD3 | LBR | BCL11A | SPDEF
MAMDC2 | PYCR1 | ESR1 | CX3CL1 | ORC1 | SIDT1
ARHGAP20 | ZWINT | ABAT | MCCC1 | BCL11A | VIPR1
MAMDC2 | SLC35A2 | ESR1 | EGFR | NEK2 | SPDEF
SCARA5 | SLC50A1 | GATA3 | YBX1 | CENPA | SIDT1
LYVE1 | TK1 | NAT1 | LBR | KCNK5 | FBP1
SCARA5 | TIMELESS | SUSD3 | MCCC1 | KCNK5 | THSD4
FXYD1 | NEK2 | KCNJ11 | PSAT1 | CDCA7 | SPDEF
CA4 | NEK2 | ABAT | IFRD1 | SKP2 | CMBL
LYVE1 | MKI67 | KCNJ11 | DSCC1 | SRSF12 | DNALI1
LYVE1 | LMNB1 | ESR1 | ANO6 | MTHFD1L | CMBL
CLEC3B | PAFAH1B3 | FOXA1 | PGRMC1 | CDCA7 | FBP1
BTNL9 | SLC35A2 | MAPT | EGFR | SFT2D2 | REEP5
CLEC3B | TK1 | MLPH | HNRNPD | MTHFD1L | FBP1
CA4 | ASF1B | CA12 | CX3CL1 | PSAT1 | CMBL
TSLP | CCNE2 | EVL | KARS | CENPF | GATA3
BTNL9 | PAFAH1B3 | NAT1 | SKP2 | TPX2 | GATA3
TSLP | CENPK | KCNJ11 | PIR | CHODL | DNALI1
C1QTNF9 | CDC25C | SUSD3 | RGMA | SFT2D2 | RHOB
ABCA10 | TPX2 | SLC44A4 | KCMF1 | TPX2 | TBC1D9
ABCA10 | ZWINT | NAT1 | IFRD1 | PPP1R14C | THSD4
ASPA | ASF1B | SLC44A4 | LPIN1 | VGLL1 | DNALI1
C1QTNF9 | TAS1R3 | SUSD3 | TTLL4 | VGLL1 | VIPR1
ASPA | DTL | GATA3 | HNRNPD | KRT16 | THSD4
GLYAT | ASF1B | TMC4 | KCMF1 | LMNB1 | TBC1D9
ASPA | CDK1 | CA12 | YBX1 | FAM171A1 | EZH1
CLEC3B | PYCR1 | EVL | HNRNPD | MKI67 | GATA3
C1QTNF9 | CENPA | MAPT | LPIN1 | PPP1R14C | VIPR1
ACVR1C | TPX2 | MLPH | CX3CL1 | NUSAP1 | TBC1D9
GLYAT | DTL | SLC44A4 | TOMM22 | EN1 | TMEM86A
ACVR1C | CENPF | MLPH | ORMDL3 | KARS | REEP5
TMEM132C | CDK1 | GATA3 | ARL6IP1 | TPX2 | CA12
ITM2A | UBE2E1 | DNALI1 | RGMA | EN1 | CROT
GLYAT | CDK1 | FOXA1 | TRIM29 | UGT8 | CROT
TMEM132C | ZWINT | FOXA1 | STAU1 | CDK1 | CA12

LumB_1 | LumB_2 | Normal_1 | Normal_2 | Her2_1 | Her2_2
MCM10 | SFRP1 | CFI | HLTF | MPHOSPH6 | ASB13
CENPA | FOXC1 | LZTS1 | HLTF | GRB7 | IGF1R
ESPL1 | SFRP1 | COL17A1 | PEX19 | SIDT1 | IGF1R
ESPL1 | CX3CL1 | SERPINF2 | LYSMD1 | MPHOSPH6 | SCARB1
DSCC1 | SFRP1 | COL17A1 | OTUD7B | MPHOSPH6 | SMAD4
CCNE2 | EGFR | LZTS1 | PIGM | PGAP3 | IGF1R
CDC25C | TRIM29 | IL3RA | ERI2 | PNMT | ZNF516
CENPK | ID4 | CX3CL1 | ZNF664 | KMO | ASB13
ESPL1 | SLC25A37 | ITM2A | COG2 | PNMT | GREB1
CCNE2 | TRIM29 | PPM1F | OTUD7B | KMO | BCL2
MCM10 | CRYAB | ITM2A | STRBP | PNMT | C1orf226
EME1 | TRIM29 | CFI | COG2 | MFSD2A | RARG
DSCC1 | FAM171A1 | CX3CL1 | SDHC | TMEM86A | ASB13
CDC25C | RGMA | NGFR | COG2 | FA2H | C1orf226
MCM10 | FAM171A1 | CX3CL1 | PEX19 | TCAP | NUDT6
ORC1 | FOXC1 | ITM2A | KLHL12 | SPINK8 | RERG
WDR76 | EGFR | PPM1F | HLTF | KMO | EZH1
CENPN | FAM171A1 | NGFR | EZH1 | TMEM86A | SCARB1
CENPA | ID4 | CFI | OTUD7B | MFSD2A | SCARB1
NEK2 | SLC25A37 | PPM1F | KLHL12 | SPINK8 | ZNF516
DSCC1 | CRYAB | LZTS1 | RBBP5 | TMEM86A | BCL2
CDC25C | ID4 | COL17A1 | STRBP | ZP2 | EDN3
CCNE2 | CRYAB | PTN | RBBP5 | FGFR4 | STC2
CENPA | RGMA | NGFR | MAGI1 | GRB7 | STC2
NEK2 | GSTP1 | PTN | PEX19 | SPINK8 | GREB1
CDK1 | GSTP1 | PAMR1 | LYSMD1 | MFSD2A | RERG
TPX2 | GSTP1 | RHOJ | WDR19 | NUDT8 | C1orf226
CDC25A | FOXC1 | MAMDC2 | LYSMD1 | FA2H | ZNF516
ORC1 | RGMA | RHOJ | ERI2 | FA2H | RERG
WDR76 | SLC25A37 | PTN | PIGM | GRB7 | SMAD4
PRIM1 | EGFR | EGFR | GNPAT | SIDT1 | BCL2
WDR76 | TINAGL1 | IL3RA | TADA1 | ZP2 | NUDT6
NEK2 | CX3CL1 | RHOJ | PIGM | SOX11 | RARG
RACGAP1 | PNRC1 | PAMR1 | TADA1 | ZP2 | MRGPRX3
DTL | PNRC1 | CHST3 | RBBP5 | FGFR4 | RARG
CENPK | ANXA3 | PAMR1 | MBOAT1 | B4GALNT2 | MBOAT1
CENPN | TCF7L1 | PDGFA | PCCB | FGFR4 | EZH1
FANCI | PNRC1 | TINAGL1 | STRBP | TCAP | KIAA0391
CENPN | CHST3 | TRIM29 | GNPAT | DEGS2 | ESR1
DTL | CX3CL1 | SERPINF2 | MBOAT1 | SOX11 | SMAD4
EME1 | ANXA3 | TRIM29 | RRM1 | TCAP | GREB1
PRIM1 | TINAGL1 | PGC | IARS2 | NUDT8 | STC2
PRIM1 | TCF7L1 | PGC | PGRMC1 | CCNE2 | MBOAT1
BRCA1 | TINAGL1 | PGC | HNRNPD | PSMD3 | RPS19
ORC1 | ANXA3 | CADM3 | EPS15 | ABCC2 | NUDT6
DSN1 | PPM1F | EDN3 | NUDT6 | NUDT8 | EZH1
CDC25A | TCF7L1 | TINAGL1 | KLHL12 | SLC44A4 | ESR1
BRCA1 | PDZRN3 | PNRC1 | SDHC | TAS1R3 | PMAIP1
TMEM106C | ZFP36L2 | PDGFA | RRM1 | CDK1 | ESR1
CENPK | BOC | EGFR | RRM1 | ORC1 | PMAIP1

[0000]

Parameters | final general CCN | CCN cross species validation | CCN cross technology validation | BRCA | COAD | ESCA | HNSC
nTopGenes | 25 | 25 | 25 | 20 | 20 | 20 | 20
nTopGenePairs | 70 | 70 | 70 | 50 | 20 | 50 | 20
nRand | 70 | 38 | 70 | 20 | 20 | 20 | 15
nTrees | 2000 | 2000 | 2000 | 2000 | 2000 | 1000 | 2000
stratify | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE
sampsize | 60 | 25 | 60 | 20 | 24 | 70 | 40
weightedDown_total | 5.00E+05 | 5.00E+05 | 5.00E+05 | 5.00E+05 | 5.00E+05 | 5.00E+05 | 5.00E+05
weightedDown_dThresh | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25
transprop_xFact | 1.00E+05 | 1.00E+05 | 1.00E+05 | 1.00E+05 | 1.00E+05 | 1.00E+05 | 1.00E+05
weight_broadClass | NA | NA | NA | 1 | 1 | 5 | 5
quickPairs | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE

Parameters | KIRC | LGG | UCEC | PAAD | STAD | LUAD | LUSC
nTopGenes | 20 | 20 | 10 | 30 | 20 | 20 | 20
nTopGenePairs | 20 | 50 | 20 | 50 | 15 | 25 | 25
nRand | 15 | 15 | 15 | 20 | 55 | 600 | 600
nTrees | 2000 | 2000 | 1000 | 2000 | 1000 | 2000 | 2000
stratify | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE
sampsize | 70 | 30 | 15 | 30 | 55 | 60 | 27
weightedDown_total | 5.00E+05 | 5.00E+05 | 5.00E+05 | 5.00E+05 | 5.00E+05 | 5.00E+05 | 5.00E+05
weightedDown_dThresh | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25
transprop_xFact | 1.00E+05 | 1.00E+05 | 1.00E+05 | 1.00E+05 | 1.00E+05 | 1.00E+05 | 1.00E+05
weight_broadClass | 1 | 15 | 10 | 5 | 10 | 5 | 5
quickPairs | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE
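
The `stratify` and `sampsize` rows suggest that each tree of the random forest is grown on a class-balanced draw of training samples. The sketch below illustrates that idea only. It assumes `sampsize` means "samples drawn per class", that drawing is without replacement, and that a class smaller than `sampsize` contributes all of its members; none of these details are stated in the table. `stratified_draw` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_draw(labels: Sequence[str], sampsize: int,
                    rng: random.Random) -> List[int]:
    """Return sample indices with at most `sampsize` drawn per class."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen: List[int] = []
    for label in sorted(by_class):
        members = by_class[label]
        k = min(sampsize, len(members))  # small classes contribute everything
        chosen.extend(rng.sample(members, k))
    return chosen


# Toy labels: two tumor types plus a small "rand" class of random profiles.
labels = ["BRCA"] * 100 + ["COAD"] * 80 + ["rand"] * 30
picked = stratified_draw(labels, sampsize=60, rng=random.Random(0))
counts = {c: sum(labels[i] == c for i in picked) for c in sorted(set(labels))}
print(counts)
```

With the toy labels above, the two large classes are capped at 60 samples each while the 30 random profiles are all kept, so no single tumor type dominates a tree's training draw.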

[0113]

While the foregoing disclosure has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be clear to one of ordinary skill in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the disclosure, and that such changes may be practiced within the scope of the appended claims. For example, all the methods, systems, computer program products, and/or component parts or other aspects thereof can be used in various combinations. All patents, patent applications, websites, other publications or documents, and the like cited herein are incorporated by reference in their entirety for all purposes to the same extent as if each individual item were specifically and individually indicated to be so incorporated by reference.
