Patent
No. US 0011913957
IPC G01N33/574

Compositions, methods and kits for diagnosis of lung cancer

Inventors:
Paul E. Kearney, Kenneth C. Fang, Xiao-Jun Li
Application number
15786924
Filing date
18.10.2017
Published
27.02.2024
Country
US
Abstract

[0000]

Methods are provided for identifying biomarker proteins that exhibit differential expression in subjects with a first lung condition versus healthy subjects or subjects with a second lung condition. Also provided are compositions comprising these biomarker proteins and methods of using these biomarker proteins or panels thereof to diagnose, classify, and monitor various lung conditions. The methods and compositions provided herein may be used to diagnose or classify a subject as having lung cancer or a non-cancerous condition, and to distinguish between different types of cancer (e.g., malignant versus benign, SCLC versus NSCLC).

Claims

1. A method of treating a pulmonary nodule in a subject, comprising:

(a) performing an immunoassay to measure expression levels of a panel of proteins present in a blood sample of the subject, wherein the panel comprises LG3BP and C163A;

(b) determining a probability of lung cancer score based on the measurements of step (a); and

(c) treating the subject having a score in step (b) that is equal to or greater than a predetermined score by surgery, chemotherapy, radiotherapy, or any combination thereof.

2. The method of claim 1, wherein the immunoassay is an enzyme-linked immunosorbent assay (ELISA).

3. The method of claim 1, wherein the panel of proteins further comprises at least one of ALDOA, FRIL, TSP1, COIA1, PEDF, MASP1, GELS, LUM, PTPRJ, IBP3, LRP1, ISLR, GRP78, TETN, PRDX1, CD14, BGH3, FIBA, and GSLG1.

4. The method of claim 1, wherein the pulmonary nodule of the subject has a diameter of less than or equal to 3 cm.

5. The method of claim 1, wherein the pulmonary nodule of the subject has a diameter of about 0.8 cm to 3.0 cm.

6. The method of claim 1, wherein the subject is at risk of developing lung cancer.

7. The method of claim 1, wherein the subject is 40 years or older.

8. The method of claim 1, further comprising a clinical assessment.

9. The method of claim 8, wherein clinical assessment comprises a procedure selected from a PFT, pulmonary imaging, a biopsy, a surgery, and any combination thereof.

10. The method of claim 9, wherein the pulmonary imaging is an x-ray, a chest computed tomography (CT) scan, or a positron emission tomography (PET) scan.

11. The method of claim 1, wherein the panel of proteins consists essentially of, or consists of, LG3BP and C163A.

12. The method of claim 1, wherein the step (a) comprises contacting the blood sample with a LG3BP antibody and a C163A antibody.

Description

RELATED APPLICATIONS

[0001]

This application is a continuation-in-part of U.S. application Ser. No. 15/051,153, filed Feb. 23, 2016, now U.S. Pat. No. 10,388,074, which is a continuation of U.S. application Ser. No. 13/775,494, filed Feb. 25, 2013, now U.S. Pat. No. 9,304,137, which is a continuation-in-part of U.S. application Ser. No. 13/724,823, filed Dec. 21, 2012, now U.S. Pat. No. 9,201,044, which claims priority to, and the benefit of, U.S. Application No. 61/578,712, filed Dec. 21, 2011, U.S. Application No. 61/589,920, filed Jan. 24, 2012, U.S. Application No. 61/676,859, filed Jul. 27, 2012 and U.S. Application No. 61/725,153, filed Nov. 12, 2012, the contents of each of which are incorporated herein by reference in their entireties.

INCORPORATION-BY-REFERENCE OF SEQUENCE LISTING

[0002]

The contents of the text file named “IDIA-005_XO2US_Sequence Listing_ST25.txt”, which was created on Feb. 27, 2015 and is 14 KB in size, are hereby incorporated by reference in their entireties.

BACKGROUND

[0003]

Lung conditions, and particularly lung cancer, present significant diagnostic challenges. In many asymptomatic patients, radiological screens such as computed tomography (CT) scanning are a first step in the diagnostic paradigm. Pulmonary nodules (PNs) or indeterminate nodules are located in the lung and are often discovered either during screening of high-risk patients or incidentally. The number of PNs identified is expected to rise due to increased numbers of patients with access to health care, the rapid adoption of screening techniques and an aging population. It is estimated that over 3 million PNs are identified annually in the US. Although the majority of PNs are benign, some are malignant, leading to additional interventions. For patients considered low risk for malignant nodules, current medical practice dictates scans every three to six months for at least two years to monitor for lung cancer. The time period between identification of a PN and diagnosis is a time of medical surveillance or “watchful waiting” and may induce stress on the patient and lead to significant risk and expense due to repeated imaging studies. If a biopsy is performed on a patient who is found to have a benign nodule, the costs and potential for harm to the patient increase unnecessarily. Major surgery is indicated in order to excise a specimen for tissue biopsy and diagnosis. All of these procedures are associated with risk to the patient, including illness, injury and death, as well as high economic costs.

[0004]

Frequently, PNs cannot be biopsied to determine if they are benign or malignant due to their size and/or location in the lung. However, PNs are connected to the circulatory system, and so if malignant, protein markers of cancer can enter the blood and provide a signal for determining if a PN is malignant or not.

[0005]

Diagnostic methods that can replace or complement current diagnostic methods for patients presenting with PNs are needed to improve diagnostics, reduce costs and minimize invasive procedures and complications to patients. The present invention provides novel compositions, methods and kits for identifying protein markers to identify, diagnose, classify and monitor lung conditions, and particularly lung cancer. The present invention uses a blood-based multiplexed assay to distinguish benign pulmonary nodules from malignant pulmonary nodules to classify patients with or without lung cancer. The present invention may be used in patients who present with symptoms of lung cancer, but do not have pulmonary nodules.

SUMMARY

[0006]

The present invention provides a method of determining the likelihood that a lung condition in a subject is cancer by measuring an abundance of a panel of proteins in a sample obtained from the subject; calculating a probability of cancer score based on the protein measurements and ruling out cancer for the subject if the score is lower than a pre-determined score. When cancer is ruled out, the subject does not receive a treatment protocol. Treatment protocols include for example pulmonary function test (PFT), pulmonary imaging, a biopsy, a surgery, a chemotherapy, a radiotherapy, or any combination thereof. In some embodiments, the imaging is an x-ray, a chest computed tomography (CT) scan, or a positron emission tomography (PET) scan.

[0007]

The present invention further provides a method of ruling in the likelihood of cancer for a subject by measuring the abundance of a panel of proteins in a sample obtained from the subject, calculating a probability of cancer score based on the protein measurements and ruling in the likelihood of cancer for the subject if the score is higher than a pre-determined score.

[0008]

In another aspect, the invention further provides a method of determining the likelihood of the presence of a lung condition in a subject by measuring the abundance of a panel of proteins in a sample obtained from the subject, calculating a probability of cancer score based on the protein measurements and concluding the presence of said lung condition if the score is equal to or greater than a pre-determined score. The lung condition is lung cancer such as, for example, non-small cell lung cancer (NSCLC). The subject is at risk of developing lung cancer.

[0009]

In another aspect, the invention provides a method of determining the likelihood that a pulmonary nodule in a subject is not lung cancer, comprising: (a) measuring the expression levels of a panel of proteins present in a blood sample obtained from the subject, wherein the panel of proteins comprises, consists essentially of, or consists of LG3BP and C163A; (b) calculating a probability of lung cancer score based on the expression levels of the panel of proteins of step (a); and (c) ruling out lung cancer for the subject if the score in step (b) is lower than a pre-determined score.

[0010]

In some embodiments, the panel includes at least 3 proteins selected from ALDOA, FRIL, LG3BP, IBP3, LRP1, ISLR, TSP1, COIA1, GRP78, TETN, PRDX1 and CD14. Optionally, the panel further includes at least one protein selected from BGH3, COIA1, TETN, GRP78, PRDX1, FIBA and GSLG1.

[0011]

In some embodiments, the panel includes at least 4 proteins selected from ALDOA, FRIL, LG3BP, IBP3, LRP1, ISLR, TSP1, COIA1, GRP78, TETN, PRDX1 and CD14.

[0012]

In a preferred embodiment, the panel comprises LRP1, COIA1, ALDOA, and LG3BP.

[0013]

In another preferred embodiment, the panel comprises LRP1, COIA1, ALDOA, LG3BP, BGH3, PRDX1, TETN, and ISLR.

[0014]

In yet another preferred embodiment, the panel comprises LRP1, COIA1, ALDOA, LG3BP, BGH3, PRDX1, TETN, ISLR, TSP1, GRP78, FRIL, FIBA and GSLG1.

[0015]

The subject has or is suspected of having a pulmonary nodule. The pulmonary nodule has a diameter of less than or equal to 3 cm. In one embodiment, the pulmonary nodule has a diameter of about 0.8 cm to 2.0 cm.

[0016]

The score is calculated from a logistic regression model applied to the protein measurements. For example, the score is determined as $P_s = 1/[1 + \exp(-\alpha - \sum_{i=1}^{N} \beta_i \check{I}_{i,s})]$, where $\check{I}_{i,s}$ is the logarithmically transformed and normalized intensity of transition $i$ in said sample ($s$), $\beta_i$ is the corresponding logistic regression coefficient, $\alpha$ is a panel-specific constant, and $N$ is the total number of transitions in said panel.
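The logistic score described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the patent: the function names `cancer_probability` and `rule_out`, and every numeric value in the usage lines, are hypothetical. The inputs are assumed to be intensities that have already been log-transformed and normalized, one per transition.

```python
import math


def cancer_probability(intensities, coefficients, alpha):
    """Logistic score: 1 / (1 + exp(-alpha - sum_i beta_i * I_i)).

    intensities  -- log-transformed, normalized transition intensities
    coefficients -- one logistic regression coefficient per transition
    alpha        -- panel-specific constant
    """
    if len(intensities) != len(coefficients):
        raise ValueError("one coefficient is required per transition")
    linear = alpha + sum(b * x for b, x in zip(coefficients, intensities))
    return 1.0 / (1.0 + math.exp(-linear))


def rule_out(score, threshold):
    """True when the score is lower than the pre-determined score."""
    return score < threshold


# Hypothetical values, chosen only to exercise the functions.
example_score = cancer_probability([0.2, -0.5], [1.5, 0.8], -0.1)
example_decision = rule_out(example_score, 0.5)
```

Scores near 1.0 point toward cancer and scores near 0.0 toward a benign nodule; the threshold passed to `rule_out` stands in for the pre-determined score and must be chosen separately.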

[0017]

In various embodiments, the method of the present invention further comprises normalizing the protein measurements. For example, the protein measurements are normalized by one or more proteins selected from PEDF, MASP1, GELS, LUM, C163A and PTPRJ.

[0018]

The biological sample is, for example, tissue, blood, plasma, serum, whole blood, urine, saliva, genital secretion, cerebrospinal fluid, sweat or excreta.

[0019]

In one aspect, the likelihood of cancer is determined by the sensitivity, specificity, negative predictive value or positive predictive value associated with the score. The score determined has a negative predictive value (NPV) of at least about 80%.
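Negative and positive predictive values depend on the cancer prevalence in the tested population as well as on sensitivity and specificity. The following Python sketch applies the standard definitions; the function name `predictive_values` and the numbers in the usage line are illustrative assumptions, not figures reported in this document.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (NPV, PPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    npv = true_neg / (true_neg + false_neg)  # P(benign | negative test)
    ppv = true_pos / (true_pos + false_pos)  # P(cancer | positive test)
    return npv, ppv


# Illustrative inputs only: 90% sensitivity, 50% specificity, 20% prevalence.
npv, ppv = predictive_values(0.90, 0.50, 0.20)
```

With these illustrative inputs the NPV is about 0.95, which shows how a test with modest specificity can still support a rule-out decision when sensitivity is high.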

[0020]

The measuring step is performed by selected reaction monitoring mass spectrometry, using a compound that specifically binds the protein being detected or a peptide transition. In one embodiment, the compound that specifically binds to the protein being measured is an antibody or an aptamer.

BRIEF DESCRIPTION OF THE DRAWINGS

[0021]

FIG. 1 is a line graph showing area under the curve for a receiver operating characteristic (ROC) curve for 15 protein LC-SRM-MS panels.

[0022]

FIG. 2 shows six line graphs each showing area under the curve for a receiver operating characteristic (ROC) curve for 15 protein LC-SRM-MS panels for different patient populations and for subjects with large and small PNs.

[0023]

FIG. 3 is a graph showing variability among three studies used to evaluate 15 protein panels.

[0024]

FIG. 4 is a line graph showing area under the curve for a receiver operating characteristic (ROC) curve for a 15 protein LC-SRM-MS panel.

[0025]

FIG. 5 shows three line graphs each showing area under the curve for a receiver operating characteristic (ROC) curve for a 15 protein LC-SRM-MS panel for a different patient population.

[0026]

FIG. 6 shows the results of a query of blood proteins used to identify lung cancer using the Ingenuity® program.

[0027]

FIG. 7 is a bar diagram showing Pearson correlations for peptides from the same peptide, from the same protein and from different proteins.

[0028]

FIG. 8 is a graph showing performance of the classifier on the training samples, validation samples and all samples combined.

[0029]

FIG. 9 is a graph showing clinical and molecular factors.

[0030]

FIG. 10 is a schematic showing the molecular network containing the 13 classifier proteins (green), 5 transcription factors (blue) and the three networks (orange lines) of lung cancer, response to oxidative stress and lung inflammation.

[0031]

FIG. 11 is a graph depicting interpretation of classifier score in terms of risk.

[0032]

FIG. 12 is a graph showing performance of the classifier on the discovery samples (n=143) and validation samples (n=104). Negative predictive value (NPV) and specificity (SPC) are presented in terms of classifier score. A cancer prevalence of 20% was assumed.

[0033]

FIG. 13 is a graph showing multivariate analysis of clinical (smoking, nodule size) and molecular (classifier score) factors as they relate to cancer and benign samples (n=247) in the discovery and validation studies. Smoking is measured by pack-years on the vertical. Nodule size is represented by circle diameter. A reference value of 0.43 is presented to illustrate the discrimination between low numbers of cancer samples less than the reference value as compared to the high number of cancer samples above the reference value.

[0034]

FIG. 14 is a graph showing the 13 classifier proteins (green), 4 transcription regulators (blue) and the three networks (orange lines) of lung cancer, oxidative stress response and lung inflammation. All references are human UniProt identifiers.

[0035]

FIG. 15 is a scatter plot of nodule size vs. classifier score for all 247 patients, demonstrating the lack of correlation between the two variables.

[0036]

FIG. 16 is a diagram showing the Pearson correlations for peptides from the same peptide (blue), from the same protein (green) and from different proteins (red).

[0037]

FIG. 17 is a graph showing the correlation of Log2 ELISA concentration ratio (Galectin 3BP/CD163A) vs Log2 of mass spectrometry ratio (Galectin 3BP/CD163A).

[0038]

FIG. 18 is a graph showing XL1 Wcalibrated historical distribution.

[0039]

FIG. 19 is a graph showing XL2 reversal score historical distribution.

DETAILED DESCRIPTION

[0040]

The disclosed invention derives from the surprising discovery that, in patients presenting with pulmonary nodule(s), protein markers in the blood exist that specifically identify and classify lung cancer. Accordingly, the invention provides unique advantages to the patient associated with early detection of lung cancer, including increased life span, decreased morbidity and mortality, decreased exposure to radiation during screening and repeat screenings and a minimally invasive diagnostic model. Importantly, the methods of the invention allow for a patient to avoid invasive procedures.

[0041]

The routine clinical use of chest computed tomography (CT) scans identifies millions of pulmonary nodules annually, of which only a small minority are malignant but contribute to the dismal 15% five-year survival rate for patients diagnosed with non-small cell lung cancer (NSCLC). The early diagnosis of lung cancer in patients with pulmonary nodules is a top priority, as decision-making based on clinical presentation, in conjunction with current non-invasive diagnostic options such as chest CT and positron emission tomography (PET) scans, and other invasive alternatives, has not altered the clinical outcomes of patients with Stage I NSCLC. The subgroup of pulmonary nodules between 8 mm and 20 mm in size is increasingly recognized as being “intermediate” relative to the lower rate of malignancies below 8 mm and the higher rate of malignancies above 20 mm [9]. Invasive sampling of the lung nodule by biopsy using transthoracic needle aspiration or bronchoscopy may provide a cytopathologic diagnosis of NSCLC, but is also associated with both false-negative and non-diagnostic results. In summary, a key unmet clinical need for the management of pulmonary nodules is a non-invasive diagnostic test that discriminates between malignant and benign processes in patients with indeterminate pulmonary nodules (IPNs), especially between 8 mm and 20 mm in size.

[0042]

The clinical decision to be more or less aggressive in treatment is based on risk factors, primarily nodule size, smoking history and age [9] in addition to imaging. As these are not conclusive, there is a great need for a molecular-based blood test that would be both non-invasive and provide complementary information to risk factors and imaging.

[0043]

Accordingly, these and related embodiments will find uses in screening methods for lung conditions, and particularly lung cancer diagnostics. More importantly, the invention finds use in determining the clinical management of a patient. That is, the method of invention is useful in ruling in or ruling out a particular treatment protocol for an individual subject.

[0044]

Cancer biology requires a molecular strategy to address the unmet medical need for an assessment of lung cancer risk. The field of diagnostic medicine has evolved with technology and assays that provide sensitive mechanisms for detection of changes in proteins. The methods described herein use a LC-SRM-MS technology for measuring the concentration of blood plasma proteins that are collectively changed in patients with a malignant PN. This protein signature is indicative of lung cancer. LC-SRM-MS is one method that provides for both quantification and identification of circulating proteins in plasma. Changes in protein expression levels, such as but not limited to signaling factors, growth factors, cleaved surface proteins and secreted proteins, can be detected using such a sensitive technology to assay cancer. Presented herein is a blood-based classification test to determine the likelihood that a patient presenting with a pulmonary nodule has a nodule that is benign or malignant. The present invention presents a classification algorithm that predicts the relative likelihood of the PN being benign or malignant.

[0045]

More broadly, it is demonstrated that there are many variations on this invention that are also diagnostic tests for the likelihood that a PN is benign or malignant. These are variations on the panel of proteins, protein standards, measurement methodology and/or classification algorithm.

[0046]

As disclosed herein, archival plasma samples from subjects presenting with PNs were analyzed for differential protein expression by mass spectrometry and the results were used to identify biomarker proteins and panels of biomarker proteins that are differentially expressed in conjunction with various lung conditions (cancer vs. non-cancer).

[0047]

In one aspect of the invention, one hundred and sixty-three panels were discovered that allow for the classification of PN as being benign or malignant. These panels include those listed in Table 1. In some embodiments the panel according to the invention includes measuring 1, 2, 3, 4, 5 or more proteins selected from ISLR, ALDOA, KIT, GRP78, AIFM1, CD14, COIA1, IBP3, TSP1, BGH3, TETN, FRIL, LG3BP, GGH, PRDX1 or LRP1. In other embodiments, the panel includes any panel or protein exemplified in Table 1. For example, the panel includes ALDOA, GRP78, CD14, COIA1, IBP3, FRIL, LG3BP, and LRP1.

[0048]

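TABLE 1
Identifier | Number of Proteins | pAUC Factor | ISLR | ALDOA | KIT | GRP78 | AIFM1 | CD14 | COIA1
1 | 9 | 4.562 | 0 | 1 | 0 | 1 | 0 | 1 | 1
2 | 8 | 4.488 | 0 | 1 | 0 | 1 | 0 | 1 | 1
3 | 11 | 4.451 | 1 | 1 | 0 | 1 | 0 | 0 | 1
4 | 11 | 4.357 | 1 | 1 | 0 | 1 | 0 | 0 | 1
5 | 11 | 4.331 | 1 | 1 | 0 | 0 | 0 | 1 | 1
6 | 13 | 4.324 | 1 | 1 | 0 | 0 | 0 | 1 | 1
7 | 10 | 4.205 | 1 | 1 | 0 | 1 | 0 | 0 | 1
8 | 11 | 4.193 | 1 | 1 | 0 | 0 | 0 | 0 | 1
9 | 12 | 4.189 | 1 | 1 | 0 | 1 | 0 | 0 | 1
10 | 12 | 4.182 | 1 | 0 | 0 | 0 | 0 | 1 | 1
11 | 12 | 4.169 | 1 | 1 | 0 | 1 | 0 | 0 | 1
12 | 8 | 4.107 | 1 | 1 | 0 | 1 | 0 | 1 | 1
13 | 13 | 4.027 | 0 | 1 | 1 | 1 | 0 | 1 | 1
14 | 10 | 3.994 | 0 | 1 | 1 | 1 | 0 | 1 | 1
15 | 11 | 3.979 | 1 | 1 | 1 | 1 | 0 | 1 | 1
16 | 10 | 3.932 | 1 | 1 | 0 | 1 | 0 | 1 | 1
17 | 11 | 3.926 | 1 | 1 | 0 | 0 | 0 | 1 | 1
18 | 12 | 3.913 | 1 | 0 | 1 | 1 | 0 | 0 | 1
19 | 12 | 3.872 | 0 | 1 | 1 | 1 | 0 | 1 | 1
20 | 12 | 3.864 | 1 | 1 | 1 | 0 | 0 | 1 | 1
21 | 14 | 3.853 | 1 | 1 | 0 | 1 | 0 | 1 | 1
22 | 9 | 3.849 | 1 | 1 | 0 | 1 | 0 | 0 | 1
23 | 12 | 3.846 | 1 | 1 | 1 | 1 | 0 | 0 | 1
24 | 10 | 3.829 | 0 | 1 | 1 | 1 | 0 | 1 | 0
25 | 10 | 3.829 | 0 | 1 | 1 | 1 | 0 | 1 | 1
26 | 12 | 3.826 | 1 | 0 | 0 | 0 | 1 | 0 | 1
27 | 7 | 3.804 | 1 | 1 | 0 | 1 | 0 | 1 | 1
28 | 10 | 3.802 | 0 | 1 | 0 | 1 | 0 | 1 | 1
29 | 10 | 3.787 | 0 | 1 | 0 | 1 | 0 | 1 | 0
30 | 9 | 3.779 | 1 | 1 | 0 | 1 | 0 | 1 | 1
31 | 11 | 3.774 | 0 | 1 | 0 | 1 | 0 | 1 | 1
32 | 8 | 3.759 | 1 | 1 | 0 | 0 | 0 | 0 | 1
33 | 13 | 3.758 | 1 | 1 | 0 | 0 | 0 | 1 | 1
34 | 11 | 3.757 | 1 | 1 | 0 | 1 | 0 | 0 | 0
35 | 12 | 3.754 | 0 | 1 | 1 | 1 | 0 | 1 | 1
36 | 10 | 3.750 | 1 | 1 | 0 | 1 | 0 | 1 | 1
37 | 11 | 3.747 | 0 | 1 | 1 | 1 | 0 | 1 | 1
38 | 12 | 3.744 | 1 | 0 | 1 | 1 | 0 | 0 | 1
39 | 11 | 3.742 | 1 | 1 | 0 | 1 | 0 | 1 | 1
40 | 9 | 3.740 | 1 | 1 | 0 | 1 | 0 | 1 | 1
41 | 12 | 3.740 | 1 | 1 | 1 | 1 | 0 | 1 | 1
42 | 12 | 3.739 | 1 | 1 | 0 | 1 | 0 | 1 | 1
43 | 9 | 3.734 | 1 | 1 | 0 | 0 | 0 | 0 | 1
44 | 12 | 3.730 | 1 | 1 | 0 | 1 | 0 | 0 | 1
45 | 11 | 3.725 | 0 | 1 | 1 | 1 | 0 | 1 | 1
46 | 12 | 3.717 | 0 | 1 | 0 | 0 | 1 | 1 | 1
47 | 9 | 3.713 | 0 | 1 | 0 | 1 | 0 | 1 | 1
48 | 9 | 3.713 | 1 | 1 | 1 | 1 | 0 | 1 | 1
49 | 10 | 3.709 | 0 | 1 | 0 | 1 | 0 | 1 | 1
50 | 11 | 3.709 | 1 | 1 | 0 | 1 | 0 | 1 | 1
51 | 11 | 3.701 | 0 | 1 | 1 | 1 | 1 | 1 | 1
52 | 12 | 3.685 | 1 | 1 | 0 | 1 | 0 | 1 | 1
53 | 10 | 3.680 | 0 | 0 | 0 | 1 | 0 | 1 | 0
54 | 11 | 3.676 | 1 | 1 | 1 | 1 | 0 | 0 | 1
55 | 9 | 3.668 | 0 | 1 | 0 | 1 | 0 | 1 | 1
56 | 9 | 3.659 | 0 | 0 | 0 | 1 | 0 | 1 | 0
57 | 14 | 3.657 | 1 | 1 | 0 | 1 | 1 | 1 | 1
58 | 10 | 3.655 | 1 | 1 | 0 | 1 | 0 | 0 | 1
59 | 11 | 3.643 | 0 | 1 | 1 | 1 | 0 | 1 | 1
60 | 9 | 3.643 | 0 | 1 | 0 | 1 | 0 | 1 | 0
61 | 8 | 3.640 | 1 | 1 | 0 | 1 | 0 | 1 | 0
62 | 12 | 3.640 | 1 | 1 | 1 | 1 | 0 | 1 | 1
63 | 10 | 3.638 | 1 | 1 | 0 | 1 | 0 | 0 | 1
64 | 12 | 3.633 | 1 | 0 | 0 | 1 | 1 | 0 | 1
65 | 10 | 3.632 | 1 | 1 | 0 | 1 | 0 | 1 | 1
66 | 11 | 3.627 | 1 | 1 | 0 | 1 | 0 | 1 | 0
67 | 10 | 3.627 | 1 | 1 | 0 | 0 | 0 | 1 | 0
68 | 10 | 3.623 | 1 | 1 | 1 | 0 | 0 | 0 | 1
69 | 11 | 3.619 | 1 | 0 | 0 | 1 | 0 | 1 | 1
70 | 6 | 3.617 | 1 | 1 | 0 | 1 | 0 | 0 | 1
71 | 12 | 3.617 | 1 | 0 | 0 | 1 | 0 | 1 | 1
72 | 11 | 3.613 | 1 | 1 | 0 | 1 | 0 | 1 | 0
73 | 11 | 3.608 | 1 | 1 | 0 | 1 | 0 | 1 | 0
74 | 13 | 3.608 | 1 | 1 | 1 | 1 | 0 | 1 | 1
75 | 11 | 3.605 | 0 | 1 | 1 | 1 | 0 | 1 | 1
76 | 11 | 3.602 | 0 | 1 | 1 | 1 | 0 | 1 | 1
77 | 10 | 3.600 | 1 | 1 | 0 | 1 | 0 | 0 | 0
78 | 11 | 3.596 | 1 | 1 | 0 | 1 | 0 | 0 | 1
79 | 10 | 3.592 | 1 | 1 | 0 | 1 | 0 | 1 | 0
80 | 11 | 3.587 | 1 | 0 | 1 | 0 | 0 | 0 | 1
81 | 13 | 3.584 | 1 | 1 | 0 | 1 | 1 | 1 | 1
82 | 8 | 3.584 | 0 | 1 | 0 | 1 | 0 | 1 | 0
83 | 11 | 3.581 | 1 | 1 | 1 | 1 | 0 | 1 | 0
84 | 13 | 3.578 | 1 | 1 | 0 | 1 | 0 | 1 | 0
85 | 9 | 3.573 | 1 | 1 | 1 | 0 | 0 | 1 | 1
86 | 9 | 3.572 | 1 | 1 | 0 | 1 | 0 | 0 | 1
87 | 13 | 3.571 | 1 | 1 | 1 | 1 | 0 | 1 | 0
88 | 10 | 3.569 | 1 | 1 | 0 | 1 | 0 | 0 | 1
89 | 9 | 3.569 | 0 | 1 | 0 | 1 | 0 | 1 | 0
90 | 8 | 3.559 | 0 | 1 | 0 | 1 | 0 | 1 | 0
91 | 10 | 3.558 | 0 | 1 | 0 | 1 | 0 | 1 | 0
92 | 12 | 3.554 | 1 | 1 | 0 | 1 | 0 | 1 | 1
93 | 11 | 3.552 | 0 | 1 | 0 | 1 | 0 | 1 | 0
94 | 12 | 3.549 | 0 | 1 | 0 | 1 | 0 | 1 | 0
95 | 8 | 3.547 | 1 | 1 | 1 | 0 | 0 | 1 | 1
96 | 12 | 3.545 | 1 | 1 | 1 | 1 | 0 | 1 | 1
97 | 8 | 3.542 | 1 | 1 | 1 | 0 | 0 | 0 | 0
98 | 11 | 3.536 | 1 | 1 | 1 | 1 | 0 | 0 | 1
99 | 14 | 3.530 | 1 | 1 | 1 | 1 | 0 | 1 | 1
100 | 9 | 3.527 | 1 | 1 | 0 | 1 | 0 | 1 | 1
101 | 10 | 3.522 | 0 | 1 | 1 | 0 | 1 | 1 | 1
102 | 12 | 3.509 | 1 | 1 | 0 | 1 | 0 | 1 | 1
103 | 5 | 3.505 | 0 | 1 | 0 | 0 | 0 | 1 | 0
104 | 11 | 3.500 | 1 | 1 | 0 | 0 | 1 | 0 | 1
105 | 11 | 3.497 | 1 | 1 | 1 | 1 | 0 | 0 | 1
106 | 9 | 3.491 | 1 | 1 | 0 | 0 | 0 | 1 | 0
107 | 7 | 3.489 | 0 | 1 | 1 | 0 | 0 | 1 | 0
108 | 13 | 3.486 | 1 | 1 | 1 | 1 | 0 | 1 | 1
109 | 11 | 3.483 | 1 | 1 | 1 | 1 | 0 | 0 | 1
110 | 10 | 3.477 | 1 | 1 | 1 | 1 | 0 | 1 | 1
111 | 10 | 3.473 | 1 | 1 | 0 | 0 | 0 | 1 | 1
112 | 15 | 3.468 | 1 | 1 | 0 | 1 | 1 | 1 | 1
113 | 10 | 3.467 | 0 | 1 | 0 | 0 | 1 | 1 | 0
114 | 12 | 3.467 | 1 | 1 | 0 | 0 | 1 | 1 | 1
115 | 13 | 3.467 | 1 | 1 | 0 | 1 | 1 | 0 | 1
116 | 10 | 3.467 | 0 | 1 | 0 | 1 | 0 | 1 | 0
117 | 8 | 3.465 | 1 | 1 | 0 | 1 | 0 | 0 | 1
118 | 10 | 3.464 | 0 | 1 | 0 | 1 | 1 | 1 | 1
119 | 15 | 3.464 | 1 | 1 | 0 | 1 | 1 | 1 | 1
120 | 11 | 3.462 | 1 | 1 | 0 | 1 | 0 | 1 | 1
121 | 9 | 3.460 | 1 | 1 | 0 | 0 | 0 | 1 | 0
122 | 13 | 3.453 | 1 | 1 | 0 | 1 | 0 | 1 | 1
123 | 12 | 3.449 | 1 | 1 | 1 | 0 | 0 | 1 | 0
124 | 10 | 3.448 | 1 | 1 | 0 | 1 | 0 | 1 | 0
125 | 10 | 3.445 | 0 | 1 | 1 | 1 | 0 | 1 | 0
126 | 6 | 3.441 | 0 | 1 | 0 | 0 | 0 | 1 | 0
127 | 11 | 3.440 | 1 | 1 | 0 | 1 | 0 | 1 | 0
128 | 12 | 3.440 | 1 | 1 | 0 | 1 | 1 | 0 | 0
129 | 11 | 3.439 | 1 | 1 | 0 | 1 | 0 | 1 | 0
130 | 10 | 3.426 | 0 | 1 | 0 | 0 | 1 | 1 | 0
131 | 11 | 3.423 | 1 | 1 | 0 | 0 | 0 | 0 | 1
132 | 10 | 3.420 | 1 | 1 | 0 | 0 | 0 | 1 | 0
133 | 10 | 3.419 | 1 | 1 | 1 | 1 | 0 | 1 | 0
134 | 11 | 3.417 | 1 | 1 | 0 | 1 | 1 | 0 | 1
135 | 12 | 3.414 | 0 | 1 | 0 | 1 | 1 | 1 | 1
136 | 10 | 3.413 | 0 | 1 | 1 | 1 | 0 | 1 | 0
137 | 11 | 3.400 | 0 | 1 | 0 | 0 | 1 | 1 | 0
138 | 12 | 3.398 | 1 | 1 | 0 | 1 | 0 | 1 | 0
139 | 13 | 3.396 | 1 | 1 | 0 | 1 | 0 | 1 | 0
140 | 9 | 3.386 | 1 | 1 | 0 | 0 | 0 | 1 | 0
141 | 9 | 3.373 | 1 | 1 | 0 | 1 | 0 | 1 | 0
142 | 12 | 3.363 | 1 | 1 | 0 | 0 | 1 | 0 | 1
143 | 8 | 3.362 | 0 | 1 | 0 | 1 | 0 | 1 | 0
144 | 10 | 3.360 | 1 | 1 | 0 | 1 | 0 | 1 | 1
145 | 9 | 3.359 | 1 | 1 | 1 | 0 | 0 | 1 | 0
146 | 7 | 3.349 | 0 | 1 | 0 | 0 | 0 | 0 | 0
147 | 7 | 3.348 | 1 | 1 | 0 | 0 | 0 | 1 | 1
148 | 9 | 3.340 | 1 | 0 | 0 | 0 | 0 | 1 | 0
149 | 9 | 3.335 | 1 | 1 | 0 | 1 | 0 | 1 | 0
150 | 11 | 3.333 | 0 | 1 | 1 | 1 | 0 | 1 | 0
151 | 9 | 3.333 | 0 | 0 | 0 | 1 | 0 | 1 | 0
152 | 10 | 3.328 | 1 | 1 | 0 | 1 | 0 | 1 | 0
153 | 7 | 3.315 | 0 | 1 | 0 | 1 | 0 | 1 | 0
154 | 11 | 3.311 | 1 | 1 | 0 | 1 | 1 | 1 | 1
155 | 11 | 3.293 | 1 | 1 | 0 | 1 | 0 | 1 | 0
156 | 8 | 3.292 | 1 | 1 | 0 | 1 | 0 | 0 | 0
157 | 9 | 3.289 | 0 | 1 | 0 | 1 | 0 | 1 | 0
158 | 7 | 3.229 | 0 | 1 | 0 | 0 | 0 | 1 | 0
159 | 7 | 3.229 | 1 | 1 | 0 | 0 | 0 | 1 | 0
160 | 7 | 3.203 | 1 | 1 | 0 | 1 | 0 | 0 | 0
161 | 12 | 3.161 | 1 | 1 | 1 | 0 | 1 | 1 | 0
162 | 9 | 3.138 | 1 | 1 | 0 | 0 | 1 | 0 | 1
163 | 13 | 3.078 | 1 | 1 | 0 | 0 | 1 | 0 | 1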
Proteins
IdentifierIBP3TSP1BGH3TETNFRILLG3BPGGHPRDX1LRP1
 1100011001
 2100011001
 3111110011
 4110011111
 5011110111
 6111111111
 7011110011
 8011110111
 9111110011
10111111011
11110011111
12000011001
13110011111
14100011001
15000011101
16000111001
17111110011
18110011111
19100011111
20011111011
21111111011
22011110001
23110011111
24100011111
25100011101
26111110111
27000001001
28100011111
29110011111
30000011001
31100011111
32001110011
33111111011
34111111011
35110011111
36100011011
37110011110
38111110011
39110111001
40100011001
41100111001
42110011111
43011110011
44111111011
45100111001
46111111110
47100011011
48000011001
49100011101
50011111001
51100011001
52111111001
53111111011
54011110011
55100011101
56110011110
57111110011
58010011101
59100011111
60101011001
61100011001
62000111011
63011111001
64111110011
65100011001
66111111001
67111111001
68011111001
69111011001
70000001001
71111110011
72110011111
73111011011
74110011011
75100011011
76100011101
77111111010
78111110101
79110011011
80111101011
81111111001
82110011010
83110011110
84111111011
85100011000
86010011001
87110011111
88110110011
89110011011
90100011001
91100111111
92011110111
93110011111
94111111111
95110001000
96100011101
97110101000
98100011111
99110111110
100 010011001
101 110011010
102 001111011
103 110001000
104 111110110
105 110011001
106 110001110
107 110001010
108 100111011
109 100011101
110 100011001
111 001111001
112 111110111
113 111111010
114 111101011
115 111110011
116 110011101
117 010011001
118 100011001
119 111111110
120 000111011
121 111101010
122 111111110
123 110111110
124 110011110
125 110011011
126 110001000
127 110011101
128 111110011
129 100011111
130 111101010
131 111111110
132 110111110
133 100011001
134 001110011
135 110111001
136 110011010
137 111111010
138 101111111
139 111111111
140 110011110
141 100011001
142 111111110
143 100011011
144 000111010
145 110011000
146 111101000
147 110001000
148 111101010
149 110011001
150 110011011
151 111011001
152 100011101
153 100011001
154 000111100
155 101011011
156 110011001
157 110011010
158 110011000
159 110001010
160 100011001
161 111111010
162 001111000
163 111111110
1 = in the panel;
0 = not in the panel.
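A row of Table 1 is read by concatenating its membership flags from the two parts of the table in column order. The Python sketch below is illustrative only (the function name `decode_panel` is hypothetical); it decodes the flags listed for panel 1, namely 0101011 in the first part and 100011001 in the second, into protein names.

```python
# Column order of Table 1: seven proteins in the first part, nine in the second.
PROTEINS = [
    "ISLR", "ALDOA", "KIT", "GRP78", "AIFM1", "CD14", "COIA1",
    "IBP3", "TSP1", "BGH3", "TETN", "FRIL", "LG3BP", "GGH", "PRDX1", "LRP1",
]


def decode_panel(first_part, second_part):
    """Map a row's 0/1 flags (given as strings) to the proteins in the panel."""
    bits = first_part + second_part
    if len(bits) != len(PROTEINS) or set(bits) - {"0", "1"}:
        raise ValueError("expected 16 flags, each 0 or 1")
    return [name for name, bit in zip(PROTEINS, bits) if bit == "1"]


panel_1 = decode_panel("0101011", "100011001")
print(panel_1)
# ['ALDOA', 'GRP78', 'CD14', 'COIA1', 'IBP3', 'FRIL', 'LG3BP', 'LRP1']
```

The decoded list matches the example panel named in the text introducing Table 1.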

[0049]

The one hundred best random panels of proteins out of the million generated are shown in Table 2.

[0050]

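TABLE 2
1 | IBP3 | TSP1 | CO6A3 | PDIA3 | SEM3G | SAA | 6PGD | EF1A1 | PRDX1 | TERA
2 | EPHB6 | CNTN1 | CLUS | IBP3 | BGH3 | 6PGD | FRIL | LRP1 | TBB3 | ERO1A
3 | PPIB | LG3BP | MDHC | DSG2 | BST1 | CD14 | DESP | PRDX1 | CDCP1 | MMP9
4 | TPIS | COIA1 | IBP3 | GGH | ISLR | MMP2 | AIFM1 | DSG2 | 1433T | CBPB2
5 | TPIS | IBP3 | CH10 | SEM3G | 6PGD | FRIL | ICAM3 | TERA | FINC | ERO1A
6 | BGH3 | ICAM1 | MMP12 | 6PGD | CD14 | EF1A1 | HYOU1 | PLXC1 | PROF1 | ERO1A
7 | KIT | LG3BP | TPIS | IBP3 | LDHB | GGH | TCPA | ISLR | CBPB2 | EF1A1
8 | LG3BP | IBP3 | LDHB | TSP1 | CRP | ZA2G | CD14 | LRP1 | PLIN2 | ERO1A
9 | COIA1 | TSP1 | ISLR | TFR1 | CBPB2 | FRIL | LRP1 | UGPA | PTPA | ERO1A
10 | CO6A3 | SEM3G | APOE | FRIL | ICAM3 | PRDX1 | EF2 | HS90B | NCF4 | PTPA
11 | PPIB | LG3BP | COIA1 | APOA1 | DSG2 | APOE | CD14 | PLXC1 | NCF4 | GSLG1
12 | SODM | EPHB6 | C163A | COIA1 | LDHB | TETN | 1433T | CD14 | PTPA | ERO1A
13 | SODM | KPYM | IBP3 | TSP1 | BGH3 | SEM3G | 6PGD | CD14 | RAP2B | EREG
14 | EPHB6 | ALDOA | MMP7 | COIA1 | TIMP1 | GRP78 | MMP12 | CBPB2 | G3P | PTPA
15 | KIT | TSP1 | SCF | TIMP1 | OSTP | PDIA3 | GRP78 | TNF12 | PRDX1 | PTPA
16 | IBP2 | LG3BP | GELS | HPT | FIBA | GGH | ICAM1 | BST1 | HYOU1 | GSLG1
17 | KIT | CD44 | CH10 | PEDF | ICAM1 | 6PGD | S10A1 | ERO1A | GSTP1 | MMP9
18 | LG3BP | C163A | GGH | ERBB3 | TETN | BGH3 | ENOA | GDIR2 | LRP1 | ERO1A
19 | SODM | KPYM | BGH3 | FOLH1 | 6PGD | DESP | LRP1 | TBA1B | ERO1A | GSTP1
20 | CNTN1 | TETN | ICAM1 | K1C19 | ZA2G | 6PGD | EF2 | RAN | ERO1A | GSTP1
21 | GELS | ENPL | OSTP | PEDF | ICAM1 | BST1 | TNF12 | GDIR2 | LRP1 | ERO1A
22 | KIT | LDHA | IBP3 | PEDF | DSG2 | FOLH1 | CD14 | LRP1 | UGPA | ERO1A
23 | KIT | TSP1 | ISLR | BGH3 | COF1 | PTPRJ | 6PGD | LRP1 | S10A6 | MPRI
24 | LG3BP | C163A | GGH | DSG2 | ICAM1 | 6PGD | GDIR2 | HYOU1 | EREG | ERO1A
25 | IBP2 | C163A | ENPL | FIBA | BGH3 | CERU | 6PGD | LRP1 | PRDX1 | MMP9
26 | LG3BP | C163A | TENX | PDIA3 | SEM3G | BST1 | VTNC | FRIL | PRDX1 | ERO1A
27 | ALDOA | COIA1 | TETN | 1433T | CBPB2 | CD14 | G3P | CD59 | ERO1A | MMP9
28 | IBP3 | TENX | CRP | TETN | MMP2 | SEM3G | VTNC | CD14 | PROF1 | ERO1A
29 | SODM | EPHB6 | TPIS | TENX | ERBB3 | SCF | TETN | FRIL | LRP1 | ERO1A
30 | LG3BP | IBP3 | POSTN | DSG2 | MDHM | 1433Z | CD14 | EF1A1 | PLXC1 | ERO1A
31 | IBP2 | LG3BP | COIA1 | CNTN1 | IBP3 | POSTN | TETN | BGH3 | 6PGD | ERO1A
32 | PVR | TSP1 | GGH | CYTB | AIFM1 | ICAM1 | MDHM | 1433Z | 6PGD | FRIL
33 | LYOX | GELS | COIA1 | IBP3 | AIFM1 | ICAM1 | FRIL | PRDX1 | RAP2B | NCF4
34 | KIT | AMPN | TETN | TNF12 | 6PGD | FRIL | LRP1 | EF2 | ERO1A | MMP9
35 | LG3BP | GELS | COIA1 | CLUS | CALU | AIFM1 | 1433T | CD14 | UGPA | S10A1
36 | ALDOA | IBP3 | TSP1 | TETN | SEM3G | ICAM1 | EF1A1 | G3P | RAP2B | NCF4
37 | ALDOA | COIA1 | CH10 | TETN | PTPRJ | SEM3G | 1433T | 6PGD | FRIL | ERO1A
38 | LG3BP | COIA1 | PLSL | FIBA | TENX | POSTN | CD14 | LRP1 | NCF4 | ERO1A
39 | LUM | IBP3 | CH10 | AIFM1 | MDHM | 6PGD | PLXC1 | EF2 | CD59 | GSTP1
40 | SODM | LG3BP | LUM | LDHA | MDHC | GGH | ICAM1 | LRP1 | TBA1B | ERO1A
41 | LG3BP | CD44 | IBP3 | CALU | CERU | 1433T | CD14 | CLIC1 | NCF4 | ERO1A
42 | LG3BP | TPIS | COIA1 | HPT | FIBA | AIFM1 | 1433Z | 6PGD | CD14 | EF2
43 | ALDOA | CD44 | MMP2 | CD14 | FRIL | PRDX1 | RAN | NCF4 | MPRI | PTPA
44 | COIA1 | CLUS | OSTP | ICAM1 | 1433T | PLXC1 | PTGIS | RAP2B | PTPA | GSTP1
45 | KIT | LYOX | IBP3 | GRP78 | FOLH1 | MASP1 | CD14 | LRP1 | ERO1A | GSTP1
46 | LG3BP | GGH | CRP | SCF | ICAM1 | ZA2G | 1433T | RAN | NCF4 | ERO1A
47 | LG3BP | C163A | BGH3 | MMP2 | GRP78 | LRP1 | RAN | ITA5 | HS90B | PTPA
48 | ALDOA | CLUS | TENX | ICAM1 | K1C19 | MASP1 | 6PGD | CBPB2 | PRDX1 | PTPA
49 | IBP3 | PDIA3 | PEDF | FOLH1 | ICAM1 | NRP1 | 6PGD | UGPA | RAN | ERO1A
50 | ENPL | FIBA | ISLR | SAA | 6PGD | PRDX1 | EF2 | PLIN2 | HS90B | GSLG1
51 | LG3BP | COIA1 | CO6A3 | GGH | ERBB3 | FOLH1 | ICAM1 | RAN | CDCP1 | ERO1A
52 | GELS | ENPL | A1AG1 | SCF | COF1 | ICAM1 | 6PGD | RAP2B | EF2 | HS90B
53 | SODM | IBP2 | COIA1 | CLUS | IBP3 | ENPL | PLSL | TNF12 | 6PGD | ERO1A
54 | KIT | MMP7 | COIA1 | TSP1 | CO6A3 | GGH | PDIA3 | ICAM1 | LRP1 | GSLG1
55 | ALDOA | COIA1 | TSP1 | CH10 | NRP1 | CD14 | DESP | LRP1 | CLIC1 | ERO1A
56 | C163A | GELS | CALU | A1AG1 | AIFM1 | DSG2 | ICAM1 | 6PGD | RAP2B | NCF4
57 | PPIB | LG3BP | IBP3 | TSP1 | PLSL | GRP78 | FOLH1 | 6PGD | HYOU1 | RAP2B
58 | KIT | LG3BP | LUM | GELS | OSTP | ICAM1 | CD14 | EF1A1 | NCF4 | MMP9
59 | KIT | PPIB | LG3BP | GELS | FOLH1 | ICAM1 | MASP1 | GDIR2 | ITA5 | NCF4
60 | IBP3 | ENPL | ERBB3 | BGH3 | VTNC | 6PGD | EF1A1 | TBA1B | S10A6 | HS90B
61 | LG3BP | CLUS | IBP3 | SCF | TCPA | ISLR | GRP78 | 6PGD | ERO1A | GSTP1
62 | LG3BP | LEG1 | GELS | GGH | TETN | ENOA | ICAM1 | MASP1 | FRIL | NCF4
63 | LG3BP | CD44 | TETN | BGH3 | G3P | LRP1 | PRDX1 | CDCP1 | PTPA | MMP9
64 | CALU | ENPL | ICAM1 | VTNC | FRIL | LRP1 | PROF1 | TBB3 | GSLG1 | ERO1A
65 | PPIB | PLSL | TENX | A1AG1 | COF1 | 6PGD | FRIL | LRP1 | CLIC1 | ERO1A
66 | IBP2 | IBP3 | CERU | ENOA | 6PGD | CD14 | LRP1 | PDGFB | ERO1A | GSTP1
67 | COIA1 | 1433T | CD14 | DESP | GDIR2 | PLXC1 | PROF1 | RAP2B | RAN | ERO1A
68 | LYOX | OSTP | TETN | SEM3G | ICAM1 | ZA2G | FRIL | EREG | RAN | ERO1A
69 | LG3BP | IBP3 | TSP1 | PEDF | FOLH1 | MDHM | TNF12 | NRP1 | S10A6 | RAP2B
70 | KIT | ALDOA | LG3BP | COIA1 | TSP1 | A1AG1 | BGH3 | SEM3G | FOLH1 | RAN
71 | ALDOA | OSTP | BST1 | CD14 | G3P | PRDX1 | PTGIS | FINC | PTPA | MMP9
72 | EPHB6 | TETN | PEDF | ICAM1 | APOE | PROF1 | UGPA | NCF4 | GSLG1 | PTPA
73 | LG3BP | COIA1 | ENPL | MMP2 | 1433T | EF1A1 | LRP1 | HS90B | GSLG1 | ERO1A
74 | KIT | IBP3 | CYTB | MMP2 | 1433Z | 6PGD | CLIC1 | EF2 | NCF4 | PTPA
75 | SODM | LYOX | IBP3 | TETN | SEM3G | CD14 | PRDX1 | PTPA | ERO1A | GSTP1
76 | SODM | KPYM | COIA1 | MDHC | TCPA | CD14 | FRIL | LRP1 | EF2 | ERO1A
77 | PPIB | LG3BP | FIBA | GRP78 | AIFM1 | ICAM1 | 6PGD | NCF4 | GSLG1 | PTPA
78 | LG3BP | C163A | PVR | MDHC | TETN | SEM3G | AIFM1 | 6PGD | EREG | ERO1A
79 | GELS | ISLR | BGH3 | DSG2 | ICAM1 | SAA | HYOU1 | ICAM3 | PTGIS | RAP2B
80 | KPYM | TPIS | IBP3 | TIMP1 | GRP78 | ICAM1 | LRP1 | TERA | ERO1A | MMP9
81 | IBP3 | HPT | TSP1 | GRP78 | SAA | MMP12 | 1433Z | 6PGD | CD14 | S10A6
82 | TENX | A1AG1 | ENOA | AIFM1 | 6PGD | CD14 | FRIL | LRP1 | RAP2B | CD59
83 | ALDOA | KPYM | ISLR | TETN | BGH3 | VTNC | LRP1 | ITA5 | PTPA | MMP9
84 | SODM | TENX | ISLR | TETN | VTNC | 6PGD | LRP1 | EF2 | ERO1A | MMP9
85 | LG3BP | C163A | COIA1 | FOLH1 | CD14 | LRP1 | TBA1B | GSLG1 | ERO1A | GSTP1
86 | SODM | PVR | COIA1 | ISLR | PDIA3 | APOE | CD14 | FRIL | LRP1 | CDCP1
87 | ALDOA | PEDF | ICAM1 | 6PGD | CD14 | FINC | RAN | NCF4 | GSLG1 | PTPA
88 | LG3BP | KPYM | GELS | COIA1 | IBP3 | CD14 | EF1A1 | PLIN2 | HS90B | ERO1A
89 | LG3BP | PVR | CLUS | TETN | COF1 | SEM3G | DESP | EF2 | HS90B | ERO1A
90 | LG3BP | COIA1 | FIBA | TETN | TFR1 | ICAM1 | MDHM | CD14 | PLXC1 | ERO1A
91 | PPIB | LG3BP | GELS | CLUS | TENX | ICAM1 | SAA | NCF4 | PTPA | ERO1A
92 | COIA1 | TSP1 | ISLR | BGH3 | SAA | 6PGD | LRP1 | PROF1 | EREG | ERO1A
93 | CALU | FIBA | OSTP | ISLR | PDIA3 | SEM3G | K1C19 | 6PGD | HYOU1 | RAP2B
94 | FIBA | CH10 | GRP78 | SEM3G | AIFM1 | ICAM1 | MDHM | FRIL | UGPA | GSTP1
95 | COIA1 | IBP3 | PDIA3 | ICAM1 | K1C19 | CD14 | EF1A1 | FRIL | PTGIS | PDGFB
96 | LG3BP | C163A | COIA1 | LDHA | 1433T | 1433Z | FRIL | LRP1 | ERO1A | MMP9
97 | LG3BP | GELS | COIA1 | GRP78 | SEM3G | FRIL | PLXC1 | PROF1 | S10A1 | ERO1A
98 | LG3BP | COIA1 | ENPL | GRP78 | AIFM1 | ICAM1 | 1433Z | CD14 | LRP1 | ERO1A
99 | COIA1 | PLSL | NRP1 | 1433T | CD14 | FRIL | LRP1 | RAP2B | PDGFB | ERO1A
100 | IBP2 | COIA1 | TETN | DSG2 | FOLH1 | 1433T | CD14 | FRIL | LRP1 | ERO1A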

Preferred panels for ruling in treatment for a subject include the panels listed in Tables 3 and 4. In various other embodiments, the panels according to the invention include measuring at least 2, 3, 4, 5, 6, 7, or more of the proteins listed in Tables 2 and 3.

[0051]

TABLE 3
ERO1A | ERO1A | ERO1A
6PGD | 6PGD | 6PGD
FRIL | FRIL | FRIL
GSTP1 | GSTP1 | GSTP1
COIA1 | COIA1 | COIA1
GGH | GGH | GGH
PRDX1 | PRDX1 | PRDX1
LRP1 | CD14 | SEM3G
ICAM1 | LRP1 | GRP78
CD14 | LG3BP | TETN
LG3BP | PTPA | AIFM1
PTPA | ICAM1 | TSP1
TETN | TSP1 | MPRI
GRP78 | IBP3 | TNF12
AIFM1 | FOLH1 | MMP9
SEM3G | SODM | OSTP
BGH3 | FIBA
PDIA3 | GSLG1
FINC | RAP2B
C163A

[0052]

TABLE 4
LRP1 | LRP1 | LRP1
BGH3 | COIA1 | COIA1
COIA1 | TETN | TETN
TETN | TSP1 | TSP1
TSP1 | ALDOA | ALDOA
PRDX1 | GRP78 | GRP78
PROF1 | FRIL | FRIL
GRP78 | LG3BP | APOE
FRIL | BGH3 | TBB3
LG3BP | ISLR
CD14 | PRDX1
GGH | FIBA
AIFM1 | GSLG1

A preferred normalizer panel is listed in Table 5.

[0053]

PEDF
MASP1
GELS
LUM
C163A
PTPRJ

[0054]

The term “pulmonary nodules” (PNs) refers to lung lesions that can be visualized by radiographic techniques. A pulmonary nodule is any nodule less than or equal to three centimeters in diameter. In one example, a pulmonary nodule has a diameter of about 0.8 cm to 2 cm.

[0055]

The term “masses” or “pulmonary masses” refers to lung nodules that are greater than three centimeters in maximal diameter.

[0056]

The term “blood biopsy” refers to a diagnostic study of the blood to determine whether a patient presenting with a nodule has a condition that may be classified as either benign or malignant.

[0057]

The term “acceptance criteria” refers to the set of criteria to which an assay, test, diagnostic or product should conform to be considered acceptable for its intended use. As used herein, acceptance criteria are a list of tests, references to analytical procedures, and appropriate measures, which are defined for an assay or product that will be used in a diagnostic. For example, the acceptance criteria for the classifier refers to a set of predetermined ranges of coefficients.

[0058]

The term “average maximal AUC” refers to the methodology of calculating performance. For the present invention, in the process of defining the set of proteins that should be in a panel by forward or backward selection, proteins are added or removed one at a time. A plot can be generated with performance (AUC or partial AUC score) on the Y axis and proteins on the X axis; the point that maximizes performance indicates the number and set of proteins that gives the best result.
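The selection procedure described here can be sketched as a greedy forward selection. The Python below is an assumption-laden illustration rather than the procedure actually used: `forward_selection` is a hypothetical name, `score_panel` is a caller-supplied function standing in for whatever AUC or partial AUC evaluation is used, and the toy scoring function in the usage lines is not a real performance measure.

```python
def forward_selection(candidates, score_panel):
    """Add one protein at a time, always the one that most improves the score.

    Returns (best_panel, best_score, history), where history holds the panel
    and its performance after each addition (the points of the plot).
    """
    if not candidates:
        raise ValueError("at least one candidate protein is required")
    selected, remaining, history = [], list(candidates), []
    while remaining:
        best = max(remaining, key=lambda p: score_panel(selected + [p]))
        selected = selected + [best]
        remaining.remove(best)
        history.append((list(selected), score_panel(selected)))
    # The point that maximizes performance gives the number and set of proteins.
    best_panel, best_score = max(history, key=lambda item: item[1])
    return best_panel, best_score, history


# Toy stand-in for an AUC evaluation; names and weights are hypothetical.
toy_weights = {"A": 0.3, "B": 0.2, "C": -0.1}
toy_score = lambda panel: 0.5 + sum(toy_weights[p] for p in panel)
toy_panel, toy_best, toy_history = forward_selection(["C", "B", "A"], toy_score)
```

In the toy run the third protein lowers the score, so the maximum of the performance curve is reached with two proteins.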

[0059]

The term “partial AUC factor” or “pAUC factor” refers to how many times greater the partial AUC is than that expected by random prediction. At sensitivity=0.90, the pAUC factor is the trapezoidal area under the ROC curve from 0.9 to 1.0 specificity, divided by (0.1*0.1/2).
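A worked sketch may help make the definition concrete. The Python below assumes the ROC curve is supplied as points of (false positive rate, true positive rate) sorted by increasing false positive rate, and that the region of interest is specificity 0.9 to 1.0, that is, false positive rate 0 to 0.1. The function name `pauc_factor` is hypothetical, and the handling of the region is one reading of the definition above.

```python
def pauc_factor(fpr, tpr, max_fpr=0.1):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr],
    divided by the area a random classifier gives there (max_fpr**2 / 2)."""
    area = 0.0
    points = list(zip(fpr, tpr))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / (max_fpr * max_fpr / 2.0)


random_factor = pauc_factor([0.0, 1.0], [0.0, 1.0])             # diagonal ROC
perfect_factor = pauc_factor([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])  # ideal ROC
```

Under these assumptions a random classifier has a factor of about 1 and a perfect classifier a factor of about 20.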

[0060]

The term “incremental information” refers to information that may be used with other diagnostic information to enhance diagnostic accuracy. Incremental information is independent of clinical factors such as nodule size, age, or gender.

[0061]

The term “score” or “scoring” refers to calculating a probability likelihood for a sample. For the present invention, values closer to 1.0 represent the likelihood that a sample is cancer, and values closer to 0.0 represent the likelihood that a sample is benign.

[0062]

The term “robust” refers to a test or procedure that is not seriously disturbed by violations of the assumptions on which it is based. For the present invention, a robust test is a test wherein the proteins or transitions of the mass spectrometry chromatograms have been manually reviewed and are “generally” free of interfering signals.

[0063]

The term “coefficients” refers to the weight assigned to each protein used in the logistic regression equation to score a sample.
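
A minimal sketch of how per-protein coefficients might enter a logistic regression score is given below. The coefficient values, the intercept, and the function name `probability_score` are hypothetical illustrations; they are not the coefficients of any classifier described herein.

```python
import math


def probability_score(levels, coefficients, intercept=0.0):
    """Logistic regression score in the range (0, 1).

    levels: mapping of protein name to (normalized) expression level.
    coefficients: mapping of protein name to its model weight.
    """
    z = intercept + sum(
        coefficients[name] * levels[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and levels, for illustration only.
example_coefficients = {"LG3BP": 0.8, "C163A": -0.5}
example_levels = {"LG3BP": 1.2, "C163A": 0.4}
score = probability_score(example_levels, example_coefficients, intercept=-0.1)
```

A score near 1.0 would then be read as a higher likelihood of cancer and a score near 0.0 as a higher likelihood of a benign sample.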

[0064]

In certain embodiments of the invention, it is contemplated that in terms of the logistic regression model of MC CV, the model coefficient and the coefficient of variation (CV) of each protein's model coefficient may increase or decrease, dependent upon the method (or model) of measurement of the protein classifier. For each of the listed proteins in the panels, there is about, at least, at least about, or at most about a 2-, 3-, 4-, 5-, 6-, 7-, 8-, 9-, or 10-fold change, or any range derivable therein, for each of the coefficient and CV. Alternatively, it is contemplated that quantitative embodiments of the invention may be discussed in terms of about, at least, at least about, or at most about 10, 20, 30, 40, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99% or more, or any range derivable therein.

[0065]

The term “best team players” refers to the proteins that rank the best in the random panel selection algorithm, i.e., perform well on panels. When combined into a classifier these proteins can segregate cancer from benign samples. “Best team player” proteins are synonymous with “cooperative proteins”. The term “cooperative proteins” refers to proteins that appear more frequently on high performing panels of proteins than expected by chance. This gives rise to a protein's cooperative score, which measures how (in)frequently it appears on high performing panels. For example, a protein with a cooperative score of 1.5 appears on high performing panels 1.5× more than would be expected by chance alone.
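
One way to sketch a cooperative score is as the ratio of a protein's observed frequency on high performing panels to its frequency across all randomly drawn panels. This ratio form, the use of the overall frequency as the chance expectation, and the function name `cooperative_score` are assumptions made for illustration; the exact computation is not specified here.

```python
def cooperative_score(protein, all_panels, top_panels):
    """Observed frequency on top panels divided by chance frequency.

    all_panels: every randomly drawn panel (each a collection of names).
    top_panels: the subset of panels judged to be high performing.
    """
    observed = sum(protein in panel for panel in top_panels) / len(top_panels)
    expected = sum(protein in panel for panel in all_panels) / len(all_panels)
    if expected == 0:
        raise ValueError("protein never appears on any panel")
    return observed / expected


# Hypothetical panels, for illustration only.
all_panels = [{"A", "B"}, {"A", "C"}, {"A", "D"}, {"B", "C"}, {"B", "D"}, {"C", "D"}]
top_panels = [{"A", "B"}, {"A", "C"}, {"A", "D"}, {"B", "C"}]
score_a = cooperative_score("A", all_panels, top_panels)
```

In this toy example protein A appears on half of all panels but on three quarters of the high performing panels, giving a score of 1.5.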

[0066]

The term “classifying” as used herein with regard to a lung condition refers to the act of compiling and analyzing expression data using statistical techniques to provide a classification to aid in diagnosis of a lung condition, particularly lung cancer.

[0067]

The term “classifier” as used herein refers to an algorithm that discriminates between disease states with a predetermined level of statistical significance. A two-class classifier is an algorithm that uses data points from measurements from a sample and classifies the data into one of two groups. In certain embodiments, the data used in the classifier is the relative expression of proteins in a biological sample. Protein expression levels in a subject can be compared to levels in patients previously diagnosed as disease free or with a specified condition.

[0068]

The “classifier” maximizes the probability of distinguishing a randomly selected cancer sample from a randomly selected benign sample, i.e., the AUC of the ROC curve.
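
The equivalence between the AUC and the probability of correctly ordering a random cancer/benign pair can be illustrated with a short sketch. The function name `pairwise_auc` and the scores are hypothetical; counting ties as one half is a common convention and is assumed here.

```python
def pairwise_auc(cancer_scores, benign_scores):
    """Fraction of cancer/benign pairs in which the cancer sample scores higher.

    Ties contribute one half, so the result equals the area under the
    empirical ROC curve.
    """
    total = 0.0
    for c in cancer_scores:
        for b in benign_scores:
            if c > b:
                total += 1.0
            elif c == b:
                total += 0.5
    return total / (len(cancer_scores) * len(benign_scores))


# Hypothetical classifier scores, for illustration only.
auc_value = pairwise_auc([0.8, 0.4], [0.6, 0.2])
```

Here three of the four cancer/benign pairs are ordered correctly, so the sketch returns 0.75.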

[0069]

In addition to the classifier's constituent proteins with differential expression, it may also include proteins with minimal or no biologic variation to enable assessment of variability, or the lack thereof, within or between clinical specimens; these proteins may be termed endogenous proteins and serve as internal controls for the other classifier proteins.

[0070]

The term “normalization” or “normalizer” as used herein refers to the expression of a differential value in terms of a standard value to adjust for effects which arise from technical variation due to sample handling, sample preparation and mass spectrometry measurement rather than biological variation of protein concentration in a sample. For example, when measuring the expression of a differentially expressed protein, the absolute value for the expression of the protein can be expressed in terms of an absolute value for the expression of a standard protein that is substantially constant in expression. This prevents the technical variation of sample preparation and mass spectrometry measurement from impeding the measurement of protein concentration levels in the sample.
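
As a hedged sketch of the idea, a measured intensity can be expressed relative to the intensities of normalizer proteins measured in the same run. Dividing by the mean normalizer intensity is an assumption chosen for simplicity and is not asserted to be the normalization procedure used herein; the function name `normalize` and all intensities are hypothetical.

```python
def normalize(raw_intensities, normalizer_names):
    """Express each non-normalizer protein relative to the normalizers.

    raw_intensities: mapping of protein name to measured intensity.
    normalizer_names: names of proteins assumed constant in expression.
    """
    normalizer_values = [raw_intensities[name] for name in normalizer_names]
    scale = sum(normalizer_values) / len(normalizer_values)
    return {
        name: value / scale
        for name, value in raw_intensities.items()
        if name not in normalizer_names
    }


# Hypothetical intensities: run_b is run_a with a twofold technical gain.
run_a = {"LG3BP": 200.0, "PEDF": 100.0, "GELS": 300.0}
run_b = {name: 2.0 * value for name, value in run_a.items()}
normalized_a = normalize(run_a, ("PEDF", "GELS"))
normalized_b = normalize(run_b, ("PEDF", "GELS"))
```

Because the technical gain affects the analyte and the normalizers alike, both runs give the same normalized value in this example.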

[0071]

The term “condition” as used herein refers generally to a disease, event, or change in health status.

[0072]

The term “treatment protocol” as used herein includes further diagnostic testing typically performed to determine whether a pulmonary nodule is benign or malignant. Treatment protocols include diagnostic tests typically used to diagnose pulmonary nodules or masses such as, for example, CT scan, positron emission tomography (PET) scan, bronchoscopy or tissue biopsy. Treatment protocol as used herein is also meant to include therapeutic treatments typically used to treat malignant pulmonary nodules and/or lung cancer such as, for example, chemotherapy, radiation or surgery.

[0073]

The terms “diagnosis” and “diagnostics” also encompass the terms “prognosis” and “prognostics”, respectively, as well as the applications of such procedures over two or more time points to monitor the diagnosis and/or prognosis over time, and statistical modeling based thereupon. Furthermore, the term diagnosis includes: (a) prediction (determining if a patient will likely develop a hyperproliferative disease); (b) prognosis (predicting whether a patient will likely have a better or worse outcome at a pre-selected time in the future); (c) therapy selection; (d) therapeutic drug monitoring; and (e) relapse monitoring.

[0074]

In some embodiments, for example, classification of a biological sample as being derived from a subject with a lung condition may refer to the results and related reports generated by a laboratory, while diagnosis may refer to the act of a medical professional in using the classification to identify or verify the lung condition.

[0075]

The term “providing” as used herein with regard to a biological sample refers to directly or indirectly obtaining the biological sample from a subject. For example, “providing” may refer to the act of directly obtaining the biological sample from a subject (e.g., by a blood draw, tissue biopsy, lavage and the like). Likewise, “providing” may refer to the act of indirectly obtaining the biological sample. For example, providing may refer to the act of a laboratory receiving the sample from the party that directly obtained the sample, or to the act of obtaining the sample from an archive.

[0076]

As used herein, “lung cancer” preferably refers to cancers of the lung, but may include any disease or other disorder of the respiratory system of a human or other mammal. Respiratory neoplastic disorders include, for example, small cell carcinoma or small cell lung cancer (SCLC), non-small cell carcinoma or non-small cell lung cancer (NSCLC), squamous cell carcinoma, adenocarcinoma, broncho-alveolar carcinoma, mixed pulmonary carcinoma, malignant pleural mesothelioma, undifferentiated large cell carcinoma, giant cell carcinoma, synchronous tumors, large cell neuroendocrine carcinoma, adenosquamous carcinoma, undifferentiated carcinoma; and small cell carcinoma, including oat cell cancer, mixed small cell/large cell carcinoma, and combined small cell carcinoma; as well as adenoid cystic carcinoma, hamartomas, mucoepidermoid tumors, typical carcinoid lung tumors, atypical carcinoid lung tumors, peripheral carcinoid lung tumors, central carcinoid lung tumors, pleural mesotheliomas, undifferentiated pulmonary carcinoma, and cancers that originate outside the lungs such as secondary cancers that have metastasized to the lungs from other parts of the body. Lung cancers may be of any stage or grade. Preferably the term may be used to refer collectively to any dysplasia, hyperplasia, neoplasia, or metastasis in which the protein biomarkers are expressed above normal levels as may be determined, for example, by comparison to adjacent healthy tissue.

[0077]

Examples of non-cancerous lung conditions include chronic obstructive pulmonary disease (COPD), benign tumors or masses of cells (e.g., hamartoma, fibroma, neurofibroma), granuloma, sarcoidosis, and infections caused by bacterial (e.g., tuberculosis) or fungal (e.g., histoplasmosis) pathogens. In certain embodiments, a lung condition may be associated with the appearance of radiographic PNs.

[0078]

As used herein, “lung tissue” and “lung cancer” refer to tissue or cancer, respectively, of the lungs themselves, as well as the tissue adjacent to and/or within the strata underlying the lungs and supporting structures such as the pleura, intercostal muscles, ribs, and other elements of the respiratory system. The respiratory system itself is taken in this context as representing nasal cavity, sinuses, pharynx, larynx, trachea, bronchi, lungs, lung lobes, alveoli, alveolar ducts, alveolar sacs, alveolar capillaries, bronchioles, respiratory bronchioles, visceral pleura, parietal pleura, pleural cavity, diaphragm, epiglottis, adenoids, tonsils, mouth and tongue, and the like. The tissue or cancer may be from a mammal and is preferably from a human, although monkeys, apes, cats, dogs, cows, horses and rabbits are within the scope of the present invention. The term “lung condition” as used herein refers to a disease, event, or change in health status relating to the lung, including for example lung cancer and various non-cancerous conditions.

[0079]

“Accuracy” refers to the degree of conformity of a measured or calculated quantity (a test reported value) to its actual (or true) value. Clinical accuracy relates to the proportion of true outcomes (true positives (TP) or true negatives (TN)) versus misclassified outcomes (false positives (FP) or false negatives (FN)), and may be stated as a sensitivity, specificity, positive predictive value (PPV) or negative predictive value (NPV), or as a likelihood or odds ratio, among other measures.

[0080]

The term “biological sample” as used herein refers to any sample of biological origin potentially containing one or more biomarker proteins. Examples of biological samples include tissue, organs, or bodily fluids such as whole blood, plasma, serum, lavage or any other specimen used for detection of disease.

[0081]

The term “subject” as used herein refers to a mammal, preferably a human.

[0082]

The term “biomarker protein” as used herein refers to a polypeptide that exhibits differential expression in a biological sample from a subject with a lung condition versus a biological sample from a control subject. A biomarker protein includes not only the polypeptide itself, but also minor variations thereof, including for example one or more amino acid substitutions or modifications such as glycosylation or phosphorylation.

[0083]

The term “biomarker protein panel” as used herein refers to a plurality of biomarker proteins. In certain embodiments, the expression levels of the proteins in the panels can be correlated with the existence of a lung condition in a subject. In certain embodiments, biomarker protein panels comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 90 or 100 proteins. In certain embodiments, the biomarker protein panels comprise from 100-125 proteins, 125-150 proteins, 150-200 proteins or more.

[0084]

“Treating” or “treatment” as used herein with regard to a condition may refer to preventing the condition, slowing the onset or rate of development of the condition, reducing the risk of developing the condition, preventing or delaying the development of symptoms associated with the condition, reducing or ending symptoms associated with the condition, generating a complete or partial regression of the condition, or some combination thereof.

[0085]

The term “ruling out” as used herein means that the subject is selected not to receive a treatment protocol.

[0086]

The term “ruling-in” as used herein means that the subject is selected to receive a treatment protocol.

[0087]

Biomarker levels may change due to treatment of the disease. The changes in biomarker levels may be measured by the present invention. Changes in biomarker levels may be used to monitor the progression of disease or therapy.

[0088]

“Altered”, “changed” or “significantly different” refer to a detectable change or difference from a reasonably comparable state, profile, measurement, or the like. One skilled in the art should be able to determine a reasonable measurable change. Such changes may be all or none. They may be incremental and need not be linear. They may be by orders of magnitude. A change may be an increase or decrease by 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 100%, or more, or any value in between 0% and 100%. Alternatively, the change may be 1-fold, 1.5-fold, 2-fold, 3-fold, 4-fold, 5-fold or more, or any value in between 1-fold and 5-fold. The change may be statistically significant with a p value of 0.1, 0.05, or 0.0001.

[0089]

Using the methods of the current invention, a clinical assessment of a patient is first performed. If there is a higher likelihood for cancer, the clinician may rule in the disease, which will require the pursuit of diagnostic testing options yielding data that increase and/or substantiate the likelihood of the diagnosis. “Rule in” of a disease requires a test with a high specificity.

[0090]

“FN” is false negative, which for a disease state test means classifying a disease subject incorrectly as non-disease or normal.

[0091]

“FP” is false positive, which for a disease state test means classifying a normal subject incorrectly as having disease.

[0092]

The term “rule in” refers to a diagnostic test with high specificity that coupled with a clinical assessment indicates a higher likelihood for cancer. If the clinical assessment is a lower likelihood for cancer, the clinician may adopt a stance to rule out the disease, which will require diagnostic tests which yield data that decrease the likelihood of the diagnosis. “Rule out” requires a test with a high sensitivity.

[0093]

The term “rule out” refers to a diagnostic test with high sensitivity that coupled with a clinical assessment indicates a lower likelihood for cancer.

[0094]

The term “sensitivity of a test” refers to the probability that a patient with the disease will have a positive test result. This is derived from the number of patients with the disease who have a positive test result (true positive) divided by the total number of patients with the disease, including those with true positive results and those patients with the disease who have a negative result, i.e. false negative.

[0095]

The term “specificity of a test” refers to the probability that a patient without the disease will have a negative test result. This is derived from the number of patients without the disease who have a negative test result (true negative) divided by all patients without the disease, including those with a true negative result and those patients without the disease who have a positive test result, i.e., false positive. While the sensitivity, specificity, true or false positive rate, and true or false negative rate of a test provide an indication of a test's performance, e.g., relative to other tests, to make a clinical decision for an individual patient based on the test's result, the clinician requires performance parameters of the test with respect to a given population.

[0096]

The term “positive predictive value” (PPV) refers to the probability that a positive result correctly identifies a patient who has the disease, which is the number of true positives divided by the sum of true positives and false positives.

[0097]

The term “negative predictive value” or “NPV” is calculated by TN/(TN+FN) or the true negative fraction of all negative test results. It also is inherently impacted by the prevalence of the disease and pre-test probability of the population intended to be tested.
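
The four measures defined above follow directly from the counts of true and false outcomes. The sketch below restates those definitions; the function name `diagnostic_metrics` and the counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from outcome counts."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among diseased patients
        "specificity": tn / (tn + fp),  # negatives among disease-free patients
        "ppv": tp / (tp + fp),          # true fraction of positive results
        "npv": tn / (tn + fn),          # true fraction of negative results
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
```

With these made-up counts the sensitivity is 0.9 and the specificity is 0.8, while the PPV and NPV depend on how many diseased and disease-free patients were tested.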

[0098]

The term “disease prevalence” refers to the number of all new and old cases of a disease or occurrences of an event during a particular period. Prevalence is expressed as a ratio in which the number of events is the numerator and the population at risk is the denominator. The term “disease incidence” refers to a measure of the risk of developing some new condition within a specified period of time. Although it can be stated as the number of new cases during some time period, it is better expressed as a proportion or a rate with a denominator.

[0099]

Lung cancer risk according to the “National Lung Screening Trial” is classified by age and smoking history. High risk: age ≥55 and ≥30 pack-years smoking history; moderate risk: age ≥50 and ≥20 pack-years smoking history; low risk: age <50 or <20 pack-years smoking history.
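
The three risk categories above can be written as a simple decision rule. The function name `lung_cancer_risk` is hypothetical, and the sketch assumes the high-risk test is applied first so that a patient meeting both the high and moderate criteria is reported as high risk.

```python
def lung_cancer_risk(age, pack_years):
    """Classify risk by age (years) and smoking history (pack-years)."""
    if age >= 55 and pack_years >= 30:
        return "high"
    if age >= 50 and pack_years >= 20:
        return "moderate"
    # Remaining cases: age < 50 or fewer than 20 pack-years.
    return "low"


example_risks = [
    lung_cancer_risk(60, 35),
    lung_cancer_risk(52, 25),
    lung_cancer_risk(45, 40),
]
```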

[0100]

The term “negative predictive value” (NPV) refers to the probability that a negative test correctly identifies a patient without the disease, which is the number of true negatives divided by the sum of true negatives and false negatives. A positive result from a test with a sufficient PPV can be used to rule in the disease for a patient, while a negative result from a test with a sufficient NPV can be used to rule out the disease, if the disease prevalence for the given population, of which the patient can be considered a part, is known.

[0101]

The clinician must decide on using a diagnostic test based on its intrinsic performance parameters, including sensitivity and specificity, and on its extrinsic performance parameters, such as positive predictive value and negative predictive value, which depend upon the disease's prevalence in a given population.
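
The dependence of PPV and NPV on prevalence follows from Bayes' rule, which the sketch below applies for a fixed sensitivity and specificity. The function name `predictive_values` and the numerical inputs are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population.

    All three arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test applied at two different prevalences.
ppv_low, npv_low = predictive_values(0.9, 0.8, prevalence=0.2)
ppv_high, npv_high = predictive_values(0.9, 0.8, prevalence=0.5)
```

For the same intrinsic performance, the lower-prevalence population gives a lower PPV and a higher NPV, which is why prevalence must be known before a result is used to rule in or rule out disease.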

[0102]

Additional parameters which may influence clinical assessment of disease likelihood include the prior frequency and closeness of a patient to a known agent, e.g. exposure risk, that directly or indirectly is associated with disease causation, e.g. second hand smoke, radiation, etc., and also the radiographic appearance or characterization of the pulmonary nodule exclusive of size. A nodule's description may include solid, semi-solid or ground glass which characterizes it based on the spectrum of relative gray scale density employed by the CT scan technology.

[0103]

“Mass spectrometry” refers to a method comprising employing an ionization source to generate gas phase ions from an analyte presented on a sample presenting surface of a probe and detecting the gas phase ions with a mass spectrometer.

[0104]

Liquid chromatography selected reaction monitoring mass spectrometry (LC-SRM-MS) was used to assay the expression levels of a cohort of 388 proteins in the blood to identify differences for individual proteins which may correlate with the absence or presence of the disease. The individual proteins have not only been implicated in lung cancer biology, but are also likely to be present in plasma based on their expression as membrane-anchored or secreted proteins. An analysis of epithelial and endothelial membranes of resected lung cancer tissues (including the subtypes of adenocarcinoma, squamous, and large cell) identified 217 tissue proteins. A review of the scientific literature with search terms relevant to lung cancer biology identified 319 proteins. There was an overlap of 148 proteins between proteins identified by cancer tissue analysis or literature review, yielding a total of 388 unique proteins as candidates. The majority of candidate proteins included in the multiplex LC-SRM-MS assay were discovered following proteomics analysis of secretory vesicle contents from fresh NSCLC resections and from adjacent non-malignant tissue. The secretory proteins reproducibly upregulated in the tumor tissue were identified and prioritized for inclusion in the LC-SRM-MS assay using extensive bioinformatic and literature annotation. An additional set of proteins that were present in relevant literature was also added to the assay. In total, 388 proteins associated with lung cancer were prioritized for SRM assay development. Of these, 371 candidate protein biomarkers were ultimately included in the assay. These are listed in Table 6, below.
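
The candidate count stated above can be checked by inclusion-exclusion: proteins found by both sources are counted once. The variable names in this short sketch are hypothetical; the numbers are those given in the paragraph above.

```python
tissue_proteins = 217       # identified by cancer tissue analysis
literature_proteins = 319   # identified by literature review
overlap = 148               # identified by both sources

# Inclusion-exclusion: count the shared proteins only once.
unique_candidates = tissue_proteins + literature_proteins - overlap

included_in_assay = 371
not_included = unique_candidates - included_in_assay
```

The result reproduces the 388 unique candidates, of which 17 were not ultimately included in the assay.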

[0105]

1433B_HUMAN | 14-3-3 protein beta/alpha | YWHAB | Secreted, EPI | LungCancers | Cytoplasm. Melanosome. Note = Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | Literature, Detection
1433E_HUMAN | 14-3-3 protein epsilon | YWHAE | ENDO | LungCancers, Benign-Nodules | Cytoplasm (By similarity). Melanosome. Note = Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | Literature, Detection
1433S_HUMAN | 14-3-3 protein sigma | SFN | Secreted, EPI | LungCancers | Cytoplasm. Nucleus (By similarity). Secreted. Note = May be secreted by a non-classical secretory pathway. | UniProt, Literature, Detection
1433T_HUMAN | 14-3-3 protein theta | YWHAQ | EPI | LungCancers, Benign-Nodules | Cytoplasm. Note = In neurons, axonally transported to the nerve terminals. | Detection
1433Z_HUMAN | 14-3-3 protein zeta/delta | YWHAZ | EPI | LungCancers, Benign-Nodules | Cytoplasm. Melanosome. Note = Located to stage I to stage IV melanosomes. | Detection
6PGD_HUMAN | 6-phosphogluconate dehydrogenase, decarboxylating | PGD | EPI, ENDO | | Cytoplasm (By similarity). | Detection
A1AG1_HUMAN | Alpha-1-acid glycoprotein 1 | ORM1 | EPI | Symptoms | Secreted. | UniProt, Literature, Detection, Prediction
ABCD1_HUMAN | ATP-binding cassette sub-family D member 1 | ABCD1 | ENDO | | Peroxisome membrane; Multi-pass membrane protein. | Detection, Prediction
ADA12_HUMAN | Disintegrin and metalloproteinase domain-containing protein 12 | ADAM12 | | LungCancers, Benign-Nodules, Symptoms | Isoform 1: Cell membrane; Single-pass type I membrane protein. / Isoform 2: Secreted. / Isoform 3: Secreted (Potential). / Isoform 4: Secreted (Potential). | UniProt, Detection, Prediction
ADML_HUMAN | ADM | ADM | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Detection, Prediction
AGR2_HUMAN | Anterior gradient protein 2 homolog | AGR2 | EPI | LungCancers | Secreted. Endoplasmic reticulum (By similarity). | UniProt, Prediction
AIFM1_HUMAN | Apoptosis-inducing factor 1, mitochondrial | AIFM1 | EPI, ENDO | LungCancers | Mitochondrion intermembrane space. Nucleus. Note = Translocated to the nucleus upon induction of apoptosis. | Detection, Prediction
ALDOA_HUMAN | Fructose-bisphosphate aldolase A | ALDOA | Secreted, EPI | LungCancers, Symptoms | | Literature, Detection
AMPN_HUMAN | Aminopeptidase N | ANPEP | EPI, ENDO | LungCancers, Benign-Nodules, Symptoms | Cell membrane; Single-pass type II membrane protein. Cytoplasm, cytosol (Potential). Note = A soluble form has also been detected. | UniProt, Detection
ANGP1_HUMAN | Angiopoietin-1 | ANGPT1 | | LungCancers, Benign-Nodules | Secreted. | UniProt, Literature, Prediction
ANGP2_HUMAN | Angiopoietin-2 | ANGPT2 | | LungCancers, Benign-Nodules | Secreted. | UniProt, Literature, Prediction
APOA1_HUMAN | Apolipoprotein A-I | APOA1 | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Detection, Prediction
APOE_HUMAN | Apolipoprotein E | APOE | EPI, ENDO | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Detection, Prediction
ASM3B_HUMAN | Acid sphingomyelinase-like phosphodiesterase 3b | SMPDL3B | EPI, ENDO | | Secreted (By similarity). | UniProt, Prediction
AT2A2_HUMAN | Sarcoplasmic/endoplasmic reticulum calcium ATPase 2 | ATP2A2 | EPI, ENDO | LungCancers, Benign-Nodules | Endoplasmic reticulum membrane; Multi-pass membrane protein. Sarcoplasmic reticulum membrane; Multi-pass membrane protein. | Detection
ATS1_HUMAN | A disintegrin and metalloproteinase with thrombospondin motifs 1 | ADAMTS1 | | LungCancers, Benign-Nodules, Symptoms | Secreted, extracellular space, extracellular matrix (By similarity). | UniProt, Literature, Prediction
ATS12_HUMAN | A disintegrin and metalloproteinase with thrombospondin motifs 12 | ADAMTS12 | | LungCancers | Secreted, extracellular space, extracellular matrix (By similarity). | UniProt, Detection, Prediction
ATS19_HUMAN | A disintegrin and metalloproteinase with thrombospondin motifs 19 | ADAMTS19 | | LungCancers | Secreted, extracellular space, extracellular matrix (By similarity). | UniProt, Prediction
BAGE1_HUMAN | B melanoma antigen 1 | BAGE | | LungCancers | Secreted (Potential). | UniProt, Prediction
BAGE2_HUMAN | B melanoma antigen 2 | BAGE2 | | LungCancers | Secreted (Potential). | UniProt, Prediction
BAGE3_HUMAN | B melanoma antigen 3 | BAGE3 | | LungCancers | Secreted (Potential). | UniProt, Prediction
BAGE4_HUMAN | B melanoma antigen 4 | BAGE4 | | LungCancers | Secreted (Potential). | UniProt, Prediction
BAGE5_HUMAN | B melanoma antigen 5 | BAGE5 | | LungCancers | Secreted (Potential). | UniProt, Prediction
BASP1_HUMAN | Brain acid soluble protein 1 | BASP1 | Secreted, EPI | | Cell membrane; Lipid-anchor. Cell projection, growth cone. Note = Associated with the membranes of growth cones that form the tips of elongating axons. | Detection
BAX_HUMAN | Apoptosis regulator BAX | BAX | EPI | LungCancers, Benign-Nodules | Isoform Alpha: Mitochondrion membrane; Single-pass membrane protein. Cytoplasm. Note = Colocalizes with 14-3-3 proteins in the cytoplasm. Under stress conditions, redistributes to the mitochondrion membrane through the release from JNK-phosphorylated 14-3-3 proteins. / Isoform Beta: Cytoplasm. / Isoform Gamma: Cytoplasm. / Isoform Delta: Cytoplasm (Potential). | UniProt, Literature, Prediction
BDNF_HUMAN | Brain-derived neurotrophic factor | BDNF | | Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Prediction
BGH3_HUMAN | Transforming growth factor-beta-induced protein igh3 | TGFBI | | LungCancers, Benign-Nodules | Secreted, extracellular space, extracellular matrix. Note = May be associated both with microfibrils and with the cell surface. | UniProt, Detection
BMP2_HUMAN | Bone morphogenetic protein 2 | BMP2 | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature
BST1_HUMAN | ADP-ribosyl cyclase 2 | BST1 | EPI | Symptoms | Cell membrane; Lipid-anchor, GPI-anchor. | Detection, Prediction
C163A_HUMAN | Scavenger receptor cysteine-rich type 1 protein M130 | CD163 | EPI | Symptoms | Soluble CD163: Secreted. / Cell membrane; Single-pass type I membrane protein. Note = Isoform 1 and isoform 2 show a lower surface expression when expressed in cells. | UniProt, Detection
C4BPA_HUMAN | C4b-binding protein alpha chain | C4BPA | | LungCancers, Symptoms | Secreted. | UniProt, Detection, Prediction
CAH9_HUMAN | Carbonic anhydrase 9 | CA9 | | LungCancers, Benign-Nodules, Symptoms | Nucleus. Nucleus, nucleolus. Cell membrane; Single-pass type I membrane protein. Cell projection, microvillus membrane; Single-pass type I membrane protein. Note = Found on the surface microvilli and in the nucleus, particularly in nucleolus. | UniProt
CALR_HUMAN | Calreticulin | CALR | EPI | Symptoms | Endoplasmic reticulum lumen. Cytoplasm, cytosol. Secreted, extracellular space, extracellular matrix. Cell surface. Note = Also found in cell surface (T cells), cytosol and extracellular matrix. Associated with the lytic granules in the cytolytic T-lymphocytes. | UniProt, Literature, Detection, Prediction
CALU_HUMAN | Calumenin | CALU | EPI | Symptoms | Endoplasmic reticulum lumen. Secreted. Melanosome. Sarcoplasmic reticulum lumen (By similarity). Note = Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | UniProt, Detection, Prediction
CALX_HUMAN | Calnexin | CANX | Secreted, EPI, ENDO | Benign-Nodules | Endoplasmic reticulum membrane; Single-pass type I membrane protein. Melanosome. Note = Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | UniProt, Literature, Detection
CAP7_HUMAN | Azurocidin | AZU1 | EPI | Symptoms | Cytoplasmic granule. Note = Cytoplasmic granules of neutrophils. | Prediction
CATB_HUMAN | Cathepsin B | CTSB | Secreted | LungCancers | Lysosome. Melanosome. Note = Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | Literature, Detection, Prediction
CATG_HUMAN | Cathepsin G | CTSG | Secreted, ENDO | Benign-Nodules | Cell surface. | Detection, Prediction
CBPB2_HUMAN | Carboxypeptidase B2 | CPB2 | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Detection, Prediction
CCL22_HUMAN | C-C motif chemokine 22 | CCL22 | | LungCancers, Benign-Nodules | Secreted. | UniProt, Prediction
CD14_HUMAN | Monocyte differentiation antigen CD14 | CD14 | EPI | LungCancers, Benign-Nodules, Symptoms | Cell membrane; Lipid-anchor, GPI-anchor. | Literature, Detection, Prediction
CD24_HUMAN | Signal transducer CD24 | CD24 | | LungCancers, Benign-Nodules | Cell membrane; Lipid-anchor, GPI-anchor. | Literature
CD2A2_HUMAN | Cyclin-dependent kinase inhibitor 2A, isoform 4 | CDKN2A | | LungCancers, Benign-Nodules | Cytoplasm. Nucleus. / Nucleus, nucleolus (By similarity). | Literature, Prediction
CD38_HUMAN | ADP-ribosyl cyclase 1 | CD38 | EPI, ENDO | Symptoms | Membrane; Single-pass type II membrane protein. | UniProt, Literature
CD40L_HUMAN | CD40 ligand | CD40LG | | LungCancers, Benign-Nodules, Symptoms | Cell membrane; Single-pass type II membrane protein. / CD40 ligand, soluble form: Secreted. | UniProt, Literature
CD44_HUMAN | CD44 antigen | CD44 | EPI | LungCancers, Benign-Nodules, Symptoms | Membrane; Single-pass type I membrane protein. | UniProt, Literature, Detection, Prediction
CD59_HUMAN | CD59 glycoprotein | CD59 | | LungCancers, Benign-Nodules, Symptoms | Cell membrane; Lipid-anchor, GPI-anchor. Secreted. Note = Soluble form found in a number of tissues. | UniProt, Literature, Detection, Prediction
CD97_HUMAN | CD97 antigen | CD97 | EPI, ENDO | Symptoms | Cell membrane; Multi-pass membrane protein. / CD97 antigen subunit alpha: Secreted, extracellular space. | UniProt
CDCP1_HUMAN | CUB domain-containing protein 1 | CDCP1 | | LungCancers | Isoform 1: Cell membrane; Single-pass membrane protein (Potential). Note = Shedding may also lead to a soluble peptide. / Isoform 3: Secreted. | UniProt, Prediction
CDK4_HUMAN | Cell division protein kinase 4 | CDK4 | | LungCancers, Symptoms | | Literature
CEAM5_HUMAN | Carcinoembryonic antigen-related cell adhesion molecule 5 | CEACAM5 | EPI | LungCancers, Benign-Nodules, Symptoms | Cell membrane; Lipid-anchor, GPI-anchor. | Literature, Prediction
CEAM8_HUMAN | Carcinoembryonic antigen-related cell adhesion molecule 8 | CEACAM8 | EPI | LungCancers | Cell membrane; Lipid-anchor, GPI-anchor. | Detection, Prediction
CERU_HUMAN | Ceruloplasmin | CP | EPI | LungCancers, Symptoms | Secreted. | UniProt, Literature, Detection, Prediction
CH10_HUMAN | 10 kDa heat shock protein, mitochondrial | HSPE1 | ENDO | LungCancers | Mitochondrion matrix. | Literature, Detection, Prediction
CH60_HUMAN | 60 kDa heat shock protein, mitochondrial | HSPD1 | Secreted, EPI, ENDO | LungCancers, Symptoms | Mitochondrion matrix. | Literature, Detection
CKAP4_HUMAN | Cytoskeleton-associated protein 4 | CKAP4 | EPI, ENDO | LungCancers | Endoplasmic reticulum-Golgi intermediate compartment membrane; Single-pass membrane protein (Potential). | UniProt
CL041_HUMAN | Uncharacterized protein C12orf41 | C12orf41 | ENDO | | | Prediction
CLCA1_HUMAN | Calcium-activated chloride channel regulator 1 | CLCA1 | | LungCancers, Benign-Nodules | Secreted, extracellular space. Cell membrane; Peripheral membrane protein; Extracellular side. Note = Protein that remains attached to the plasma membrane appeared to be predominantly localized to microvilli. | UniProt, Prediction
CLIC1_HUMAN | Chloride intracellular channel protein 1 | CLIC1 | EPI | | Nucleus. Nucleus membrane; Single-pass membrane protein (Probable). Cytoplasm. Cell membrane; Single-pass membrane protein (Probable). Note = Mostly in the nucleus including in the nuclear membrane. Small amount in the cytoplasm and the plasma membrane. Exists both as soluble cytoplasmic protein and as membrane protein with probably a single transmembrane domain. | UniProt, Literature, Detection
CLUS_HUMAN | Clusterin | CLU | EPI, ENDO | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Detection, Prediction
CMGA_HUMAN | Chromogranin-A | CHGA | | LungCancers, Benign-Nodules | Secreted. Note = Neuroendocrine and endocrine secretory granules. | UniProt, Literature, Detection, Prediction
CNTN1_HUMAN | Contactin-1 | CNTN1 | | LungCancers | Isoform 1: Cell membrane; Lipid-anchor, GPI-anchor; Extracellular side.|Isoform 2: Cell membrane; Lipid-anchor, GPI-anchor; Extracellular side. | Detection, Prediction
CO4A1_HUMAN | Collagen alpha-1(IV) chain | COL4A1 | | LungCancers | Secreted, extracellular space, extracellular matrix, basement membrane. | UniProt, Detection, Prediction
CO5A2_HUMAN | Collagen alpha-2(V) chain | COL5A2 | | LungCancers | Secreted, extracellular space, extracellular matrix (By similarity). | UniProt, Detection, Prediction
CO6A3_HUMAN | Collagen alpha-3(VI) chain | COL6A3 | Secreted | Symptoms | Secreted, extracellular space, extracellular matrix (By similarity). | UniProt, Detection, Prediction
COCA1_HUMAN | Collagen alpha-1(XII) chain | COL12A1 | ENDO | LungCancers, Symptoms | Secreted, extracellular space, extracellular matrix (By similarity). | UniProt, Prediction
COF1_HUMAN | Cofilin-1 | CFL1 | Secreted, EPI | LungCancers, Benign-Nodules | Nucleus matrix. Cytoplasm, cytoskeleton. Note = Almost completely in nucleus in cells exposed to heat shock or 10% dimethyl sulfoxide. | Detection, Prediction
COIA1_HUMAN | Collagen alpha-1(XVIII) chain | COL18A1 | | LungCancers, Benign-Nodules | Secreted, extracellular space, extracellular matrix (By similarity). | UniProt, Literature, Detection, Prediction
COX5A_HUMAN | Cytochrome c oxidase subunit 5A, mitochondrial | COX5A | Secreted, ENDO | | Mitochondrion inner membrane. | Prediction
CRP_HUMAN | C-reactive protein | CRP | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Detection, Prediction
CS051_HUMAN | UPF0470 protein C19orf51 | C19orf51 | ENDO | | | Prediction
CSF1_HUMAN | Macrophage colony-stimulating factor 1 | CSF1 | | LungCancers, Benign-Nodules | Cell membrane; Single-pass membrane protein (By similarity).|Processed macrophage colony-stimulating factor 1: Secreted, extracellular space (By similarity). | UniProt, Literature, Detection
CSF2_HUMAN | Granulocyte-macrophage colony-stimulating factor | CSF2 | | LungCancers, Benign-Nodules | Secreted. | UniProt, Literature, Prediction
CT085_HUMAN | Uncharacterized protein C20orf85 | C20orf85 | | LungCancers, Benign-Nodules | | Prediction
CTGF_HUMAN | Connective tissue growth factor | CTGF | | LungCancers, Benign-Nodules | Secreted, extracellular space, extracellular matrix (By similarity). Secreted (By similarity). | UniProt, Literature, Detection, Prediction
CYR61_HUMAN | Protein CYR61 | CYR61 | | LungCancers, Benign-Nodules | Secreted. | UniProt, Prediction
CYTA_HUMAN | Cystatin-A | CSTA | | LungCancers | Cytoplasm. | Literature, Detection
CYTB_HUMAN | Cystatin-B | CSTB | Secreted | | Cytoplasm. Nucleus. | Literature, Detection
DDX17_HUMAN | Probable ATP-dependent RNA helicase DDX17 | DDX17 | ENDO | LungCancers, Benign-Nodules | Nucleus. | Detection, Prediction
DEFB1_HUMAN | Beta-defensin 1 | DEFB1 | | LungCancers, Benign-Nodules | Secreted. | UniProt, Prediction
DESP_HUMAN | Desmoplakin | DSP | EPI, ENDO | LungCancers | Cell junction, desmosome. Cytoplasm, cytoskeleton. Note = Innermost portion of the desmosomal plaque. | Detection
DFB4A_HUMAN | Beta-defensin 4A | DEFB4A | | LungCancers, Benign-Nodules | Secreted. | UniProt
DHI1L_HUMAN | Hydroxysteroid 11-beta-dehydrogenase 1-like protein | HSD11B1L | | LungCancers | Secreted (Potential). | UniProt, Prediction
DMBT1_HUMAN | Deleted in malignant brain tumors 1 protein | DMBT1 | | LungCancers, Benign-Nodules | Secreted (By similarity). Note = Some isoforms may be membrane-bound. Localized to the lumenal aspect of crypt cells in the small intestine. In the colon, seen in the lumenal aspect of surface epithelial cells. Formed in the ducts of von Ebner gland, and released into the fluid bathing the taste buds contained in the taste papillae (By similarity). | UniProt, Detection, Prediction
DMKN_HUMAN | Dermokine | DMKN | | LungCancers | Secreted. | UniProt, Detection, Prediction
DPP4_HUMAN | Dipeptidyl peptidase 4 | DPP4 | EPI | LungCancers, Benign-Nodules, Symptoms | Dipeptidyl peptidase 4 soluble form: Secreted.|Cell membrane; Single-pass type II membrane protein. | UniProt, Detection
DSG2_HUMAN | Desmoglein-2 | DSG2 | ENDO | Symptoms | Cell membrane; Single-pass type I membrane protein. Cell junction, desmosome. | UniProt, Detection
DX39A_HUMAN | ATP-dependent RNA helicase DDX39A | DDX39A | EPI | | Nucleus (By similarity). | Prediction
DX39B_HUMAN | Spliceosome RNA helicase DDX39B | DDX39B | EPI | | Nucleus. Nucleus speckle. | Prediction
DYRK2_HUMAN | Dual specificity tyrosine-phosphorylation-regulated kinase 2 | DYRK2 | ENDO | LungCancers | Cytoplasm. Nucleus. Note = Translocates into the nucleus following DNA damage. | Literature
EDN2_HUMAN | Endothelin-2 | EDN2 | | LungCancers | Secreted. | UniProt, Prediction
EF1A1_HUMAN | Elongation factor 1-alpha 1 | EEF1A1 | Secreted, EPI | LungCancers, Benign-Nodules | Cytoplasm. | Detection
EF1D_HUMAN | Elongation factor 1-delta | EEF1D | Secreted, EPI | LungCancers | | Prediction
EF2_HUMAN | Elongation factor 2 | EEF2 | Secreted, EPI | | Cytoplasm. | Literature, Detection
EGF_HUMAN | Pro-epidermal growth factor | EGF | | LungCancers, Benign-Nodules, Symptoms | Membrane; Single-pass type I membrane protein. | UniProt, Literature
EGFL6_HUMAN | Epidermal growth factor-like protein 6 | EGFL6 | | LungCancers | Secreted, extracellular space, extracellular matrix, basement membrane (By similarity). | UniProt, Detection, Prediction
ENOA_HUMAN | Alpha-enolase | ENO1 | Secreted, EPI, ENDO | LungCancers, Benign-Nodules, Symptoms | Cytoplasm. Cell membrane. Cytoplasm, myofibril, sarcomere, M-band. Note = Can translocate to the plasma membrane in either the homodimeric (alpha/alpha) or heterodimeric (alpha/gamma) form. ENO1 is localized to the M-band.|Isoform MBP-1: Nucleus. | Literature, Detection, Prediction
ENOG_HUMAN | Gamma-enolase | ENO2 | EPI | LungCancers, Symptoms | Cytoplasm (By similarity). Cell membrane (By similarity). Note = Can translocate to the plasma membrane in either the homodimeric (alpha/alpha) or heterodimeric (alpha/gamma) form (By similarity). | Literature, Detection, Prediction
ENOX2_HUMAN | Ecto-NOX disulfide-thiol exchanger 2 | ENOX2 | | LungCancers | Cell membrane. Secreted, extracellular space. Note = Extracellular and plasma membrane-associated. | UniProt, Detection
ENPL_HUMAN | Endoplasmin | HSP90B1 | Secreted, EPI, ENDO | LungCancers, Benign-Nodules, Symptoms | Endoplasmic reticulum lumen. Melanosome. Note = Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | Literature, Detection, Prediction
EPHB6_HUMAN | Ephrin type-B receptor 6 | EPHB6 | | LungCancers | Membrane; Single-pass type I membrane protein.|Isoform 3: Secreted (Probable). | UniProt, Literature
EPOR_HUMAN | Erythropoietin receptor | EPOR | | LungCancers, Benign-Nodules, Symptoms | Cell membrane; Single-pass type I membrane protein.|Isoform EPOR-S: Secreted. Note = Secreted and located to the cell surface. | UniProt, Literature, Detection
ERBB3_HUMAN | Receptor tyrosine-protein kinase erbB-3 | ERBB3 | | LungCancers, Benign-Nodules | Isoform 1: Cell membrane; Single-pass type I membrane protein.|Isoform 2: Secreted. | UniProt, Literature, Prediction
EREG_HUMAN | Proepiregulin | EREG | | LungCancers | Epiregulin: Secreted, extracellular space.|Proepiregulin: Cell membrane; Single-pass type I membrane protein. | UniProt
ERO1A_HUMAN | ERO1-like protein alpha | ERO1L | Secreted, EPI, ENDO | Symptoms | Endoplasmic reticulum membrane; Peripheral membrane protein; Lumenal side. Note = The association with ERP44 is essential for its retention in the endoplasmic reticulum. | Prediction
ESM1_HUMAN | Endothelial cell-specific molecule 1 | ESM1 | | LungCancers, Benign-Nodules | Secreted. | UniProt, Prediction
EZRI_HUMAN | Ezrin | EZR | Secreted | LungCancers, Benign-Nodules | Apical cell membrane; Peripheral membrane protein; Cytoplasmic side. Cell projection. Cell projection, microvillus membrane; Peripheral membrane protein; Cytoplasmic side. Cell projection, ruffle membrane; Peripheral membrane protein; Cytoplasmic side. Cytoplasm, cell cortex. Cytoplasm, cytoskeleton. Note = Localization to the apical membrane of parietal cells depends on the interaction with MPP5. Localizes to cell extensions and peripheral processes of astrocytes (By similarity). Microvillar peripheral membrane protein (cytoplasmic side). | Literature, Detection, Prediction
F10A1_HUMAN | Hsc70-interacting protein | ST13 | EPI | | Cytoplasm (By similarity).|Cytoplasm (Probable). | Detection, Prediction
FAM3C_HUMAN | Protein FAM3C | FAM3C | EPI, ENDO | | Secreted (Potential). | UniProt, Detection
FAS_HUMAN | Fatty acid synthase | FASN | EPI | LungCancers, Benign-Nodules, Symptoms | Cytoplasm. Melanosome. Note = Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | Literature, Detection
FCGR1_HUMAN | High affinity immunoglobulin gamma Fc receptor I | FCGR1A | EPI | LungCancers, Benign-Nodules, Symptoms | Cell membrane; Single-pass type I membrane protein. Note = Stabilized at the cell membrane through interaction with FCER1G. | UniProt
FGF10_HUMAN | Fibroblast growth factor 10 | FGF10 | | LungCancers | Secreted (Potential). | UniProt, Prediction
FGF2_HUMAN | Heparin-binding growth factor 2 | FGF2 | | LungCancers, Benign-Nodules, Symptoms | | Literature
FGF7_HUMAN | Keratinocyte growth factor | FGF7 | | LungCancers, Benign-Nodules | Secreted. | UniProt, Literature, Prediction
FGF9_HUMAN | Glia-activating factor | FGF9 | | LungCancers | Secreted. | UniProt, Literature, Prediction
FGFR2_HUMAN | Fibroblast growth factor receptor 2 | FGFR2 | | LungCancers, Benign-Nodules | Cell membrane; Single-pass type I membrane protein.|Isoform 14: Secreted.|Isoform 19: Secreted. | UniProt, Literature, Prediction
FGFR3_HUMAN | Fibroblast growth factor receptor 3 | FGFR3 | | LungCancers | Membrane; Single-pass type I membrane protein. | UniProt, Literature, Prediction
FGL2_HUMAN | Fibroleukin | FGL2 | | Benign-Nodules, Symptoms | Secreted. | UniProt, Detection, Prediction
FHIT_HUMAN | Bis(5′-adenosyl)-triphosphatase | FHIT | | LungCancers, Benign-Nodules, Symptoms | Cytoplasm. | Literature
FIBA_HUMAN | Fibrinogen alpha chain | FGA | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Detection, Prediction
FINC_HUMAN | Fibronectin | FN1 | Secreted, EPI, ENDO | LungCancers, Benign-Nodules, Symptoms | Secreted, extracellular space, extracellular matrix. | UniProt, Literature, Detection, Prediction
FKB11_HUMAN | Peptidyl-prolyl cis-trans isomerase FKBP11 | FKBP11 | EPI, ENDO | | Membrane; Single-pass membrane protein (Potential). | UniProt, Prediction
FOLH1_HUMAN | Glutamate carboxypeptidase 2 | FOLH1 | ENDO | LungCancers, Symptoms | Cell membrane; Single-pass type II membrane protein.|Isoform PSMA′: Cytoplasm. | UniProt, Literature
FOLR1_HUMAN | Folate receptor alpha | FOLR1 | | LungCancers | Cell membrane; Lipid-anchor, GPI-anchor. Secreted (Probable). | UniProt
FOXA2_HUMAN | Hepatocyte nuclear factor 3-beta | FOXA2 | | LungCancers | Nucleus. | Detection, Prediction
FP100_HUMAN | Fanconi anemia-associated protein of 100 kDa | C17orf70 | ENDO | Symptoms | Nucleus. | Prediction
FRIH_HUMAN | Ferritin heavy chain | FTH1 | EPI | LungCancers, Benign-Nodules | | Literature, Detection, Prediction
FRIL_HUMAN | Ferritin light chain | FTL | Secreted, EPI, ENDO | Benign-Nodules, Symptoms | | Literature, Detection
G3P_HUMAN | Glyceraldehyde-3-phosphate dehydrogenase | GAPDH | Secreted, EPI, ENDO | LungCancers, Benign-Nodules, Symptoms | Cytoplasm. Cytoplasm, perinuclear region. Membrane. Note = Postnuclear and Perinuclear regions. | Detection
G6PD_HUMAN | Glucose-6-phosphate 1-dehydrogenase | G6PD | Secreted, EPI | LungCancers, Symptoms | | Literature, Detection
G6PI_HUMAN | Glucose-6-phosphate isomerase | GPI | Secreted, EPI | Symptoms | Cytoplasm. Secreted. | UniProt, Literature, Detection
GA2L1_HUMAN | GAS2-like protein 1 | GAS2L1 | ENDO | | Cytoplasm, cytoskeleton (Probable). | Prediction
GALT2_HUMAN | Polypeptide N-acetylgalactosaminyltransferase 2 | GALNT2 | EPI, ENDO | | Golgi apparatus, Golgi stack membrane; Single-pass type II membrane protein. Secreted. Note = Resides preferentially in the trans and medial parts of the Golgi stack. A secreted form also exists. | UniProt, Detection
GAS6_HUMAN | Growth arrest-specific protein 6 | GAS6 | | LungCancers | Secreted. | UniProt, Detection, Prediction
GDIR2_HUMAN | Rho GDP-dissociation inhibitor 2 | ARHGDIB | EPI | | Cytoplasm. | Detection
GELS_HUMAN | Gelsolin | GSN | | LungCancers, Benign-Nodules | Isoform 2: Cytoplasm, cytoskeleton.|Isoform 1: Secreted. | UniProt, Literature, Detection, Prediction
GGH_HUMAN | Gamma-glutamyl hydrolase | GGH | | LungCancers | Secreted, extracellular space. Lysosome. Melanosome. Note = While its intracellular location is primarily the lysosome, most of the enzyme activity is secreted. Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | UniProt, Detection, Prediction
GPC3_HUMAN | Glypican-3 | GPC3 | | LungCancers, Symptoms | Cell membrane; Lipid-anchor, GPI-anchor; Extracellular side (By similarity).|Secreted glypican-3: Secreted, extracellular space (By similarity). | UniProt, Literature, Prediction
GRAN_HUMAN | Grancalcin | GCA | EPI | | Cytoplasm. Cytoplasmic granule membrane; Peripheral membrane protein; Cytoplasmic side. Note = Primarily cytosolic in the absence of calcium or magnesium ions. Relocates to granules and other membranes in response to elevated calcium and magnesium levels. | Prediction
GREB1_HUMAN | Protein GREB1 | GREB1 | ENDO | | Membrane; Single-pass membrane protein (Potential). | UniProt, Prediction
GREM1_HUMAN | Gremlin-1 | GREM1 | | LungCancers, Benign-Nodules | Secreted (Probable). | UniProt, Prediction
GRP_HUMAN | Gastrin-releasing peptide | GRP | | LungCancers, Symptoms | Secreted. | UniProt, Prediction
GRP78_HUMAN | 78 kDa glucose-regulated protein | HSPA5 | Secreted, EPI, ENDO | LungCancers, Benign-Nodules | Endoplasmic reticulum lumen. Melanosome. Note = Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | Detection, Prediction
GSLG1_HUMAN | Golgi apparatus protein 1 | GLG1 | EPI, ENDO | Benign-Nodules | Golgi apparatus membrane; Single-pass type I membrane protein. | UniProt
GSTP1_HUMAN | Glutathione S-transferase P | GSTP1 | Secreted | LungCancers, Benign-Nodules, Symptoms | | Literature, Detection, Prediction
GTR1_HUMAN | Solute carrier family 2, facilitated glucose transporter member 1 | SLC2A1 | EPI, ENDO | LungCancers, Benign-Nodules, Symptoms | Cell membrane; Multi-pass membrane protein (By similarity). Melanosome. Note = Localizes primarily at the cell surface (By similarity). Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | Literature
GTR3_HUMAN | Solute carrier family 2, facilitated glucose transporter member 3 | SLC2A3 | EPI | | Membrane; Multi-pass membrane protein. | Detection
H2A1_HUMAN | Histone H2A type 1 | HIST1H2AG | Secreted | | Nucleus. | Detection, Prediction
H2A1B_HUMAN | Histone H2A type 1-B/E | HIST1H2AB | Secreted | | Nucleus. | Detection, Prediction
H2A1C_HUMAN | Histone H2A type 1-C | HIST1H2AC | Secreted | | Nucleus. | Literature, Detection, Prediction
H2A1D_HUMAN | Histone H2A type 1-D | HIST1H2AD | Secreted | | Nucleus. | Detection, Prediction
HG2A_HUMAN | HLA class II histocompatibility antigen gamma chain | CD74 | | LungCancers, Benign-Nodules, Symptoms | Membrane; Single-pass type II membrane protein (Potential). | UniProt, Literature
HGF_HUMAN | Hepatocyte growth factor | HGF | | LungCancers, Benign-Nodules, Symptoms | | Literature, Prediction
HMGA1_HUMAN | High mobility group protein HMG-I/HMG-Y | HMGA1 | | LungCancers, Benign-Nodules, Symptoms | Nucleus. | Literature
HPRT_HUMAN | Hypoxanthine-guanine phosphoribosyltransferase | HPRT1 | EPI | | Cytoplasm. | Detection, Prediction
HPSE_HUMAN | Heparanase | HPSE | | LungCancers, Benign-Nodules, Symptoms | Lysosome membrane; Peripheral membrane protein. Secreted. Note = Secreted, internalised and transferred to late endosomes/lysosomes as a proheparanase. In lysosomes, it is processed into the active form, the heparanase. The uptake or internalisation of proheparanase is mediated by HSPGs. Heparin appears to be a competitor and retain proheparanase in the extracellular medium. | UniProt, Prediction
HPT_HUMAN | Haptoglobin | HP | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Detection, Prediction
HS90A_HUMAN | Heat shock protein HSP 90-alpha | HSP90AA1 | Secreted, EPI | LungCancers, Symptoms | Cytoplasm. Melanosome. Note = Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | Literature, Detection
HS90B_HUMAN | Heat shock protein HSP 90-beta | HSP90AB1 | Secreted, EPI | LungCancers | Cytoplasm. Melanosome. Note = Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | Literature, Detection
HSPB1_HUMAN | Heat shock protein beta-1 | HSPB1 | Secreted, EPI | LungCancers, Benign-Nodules | Cytoplasm. Nucleus. Cytoplasm, cytoskeleton, spindle. Note = Cytoplasmic in interphase cells. Colocalizes with mitotic spindles in mitotic cells. Translocates to the nucleus during heat shock. | Literature, Detection, Prediction
HTRA1_HUMAN | Serine protease HTRA1 | HTRA1 | | LungCancers | Secreted. | UniProt, Prediction
HXK1_HUMAN | Hexokinase-1 | HK1 | ENDO | Symptoms | Mitochondrion outer membrane. Note = Its hydrophobic N-terminal sequence may be involved in membrane binding. | Literature, Detection
HYAL2_HUMAN | Hyaluronidase-2 | HYAL2 | | LungCancers | Cell membrane; Lipid-anchor, GPI-anchor. | Prediction
HYOU1_HUMAN | Hypoxia up-regulated protein 1 | HYOU1 | EPI, ENDO | Symptoms | Endoplasmic reticulum lumen. | Detection
IBP2_HUMAN | Insulin-like growth factor-binding protein 2 | IGFBP2 | | LungCancers | Secreted. | UniProt, Literature, Detection, Prediction
IBP3_HUMAN | Insulin-like growth factor-binding protein 3 | IGFBP3 | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Detection, Prediction
ICAM1_HUMAN | Intercellular adhesion molecule 1 | ICAM1 | | LungCancers, Benign-Nodules, Symptoms | Membrane; Single-pass type I membrane protein. | UniProt, Literature, Detection
ICAM3_HUMAN | Intercellular adhesion molecule 3 | ICAM3 | EPI, ENDO | LungCancers, Benign-Nodules, Symptoms | Membrane; Single-pass type I membrane protein. | UniProt, Detection
IDHP_HUMAN | Isocitrate dehydrogenase [NADP], mitochondrial | IDH2 | Secreted, ENDO | | Mitochondrion. | Prediction
IF4A1_HUMAN | Eukaryotic initiation factor 4A-I | EIF4A1 | Secreted, EPI, ENDO | | | Detection, Prediction
IGF1_HUMAN | Insulin-like growth factor I | IGF1 | | LungCancers, Benign-Nodules, Symptoms | Secreted.|Secreted. | UniProt, Literature, Detection, Prediction
IKIP_HUMAN | Inhibitor of nuclear factor kappa-B kinase-interacting protein | IKIP | ENDO | Symptoms | Endoplasmic reticulum membrane; Single-pass membrane protein. Note = Isoform 4 deletion of the hydrophobic, or transmembrane region between AA 45-63 results in uniform distribution throughout the cell, suggesting that this region is responsible for endoplasmic reticulum localization. | UniProt, Prediction
IL18_HUMAN | Interleukin-18 | IL18 | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Prediction
IL19_HUMAN | Interleukin-19 | IL19 | | LungCancers | Secreted. | UniProt, Detection, Prediction
IL22_HUMAN | Interleukin-22 | IL22 | | LungCancers, Benign-Nodules | Secreted. | UniProt, Prediction
IL32_HUMAN | Interleukin-32 | IL32 | | LungCancers, Benign-Nodules | Secreted. | UniProt, Prediction
IL7_HUMAN | Interleukin-7 | IL7 | | LungCancers, Benign-Nodules | Secreted. | UniProt, Literature, Prediction
IL8_HUMAN | Interleukin-8 | IL8 | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature
ILEU_HUMAN | Leukocyte elastase inhibitor | SERPINB1 | Secreted, EPI | | Cytoplasm (By similarity). | Detection, Prediction
ILK_HUMAN | Integrin-linked protein kinase | ILK | Secreted | LungCancers, Benign-Nodules, Symptoms | Cell junction, focal adhesion. Cell membrane; Peripheral membrane protein; Cytoplasmic side. | Literature, Detection
INHBA_HUMAN | Inhibin beta A chain | INHBA | | LungCancers, Benign-Nodules | Secreted. | UniProt, Literature, Prediction
ISLR_HUMAN | Immunoglobulin superfamily containing leucine-rich repeat protein | ISLR | | LungCancers | Secreted (Potential). | UniProt, Detection, Prediction
ITA5_HUMAN | Integrin alpha-5 | ITGA5 | EPI | LungCancers, Benign-Nodules, Symptoms | Membrane; Single-pass type I membrane protein. | UniProt, Literature, Detection
ITAM_HUMAN | Integrin alpha-M | ITGAM | EPI, ENDO | LungCancers, Benign-Nodules, Symptoms | Membrane; Single-pass type I membrane protein. | UniProt, Literature
K0090_HUMAN | Uncharacterized protein KIAA0090 | KIAA0090 | EPI | Symptoms | Membrane; Single-pass type I membrane protein (Potential). | UniProt, Prediction
K1C18_HUMAN | Keratin, type I cytoskeletal 18 | KRT18 | Secreted | LungCancers, Benign-Nodules | Cytoplasm, perinuclear region. | Literature, Detection, Prediction
K1C19_HUMAN | Keratin, type I cytoskeletal 19 | KRT19 | | LungCancers, Benign-Nodules | | Literature, Detection, Prediction
K2C8_HUMAN | Keratin, type II cytoskeletal 8 | KRT8 | EPI | LungCancers | Cytoplasm. | Literature, Detection
KIT_HUMAN | Mast/stem cell growth factor receptor | KIT | | LungCancers | Membrane; Single-pass type I membrane protein. | UniProt, Literature, Detection
KITH_HUMAN | Thymidine kinase, cytosolic | TK1 | | LungCancers | Cytoplasm. | Literature, Prediction
KLK11_HUMAN | Kallikrein-11 | KLK11 | | LungCancers | Secreted. | UniProt, Literature, Prediction
KLK13_HUMAN | Kallikrein-13 | KLK13 | | LungCancers | Secreted (Probable). | UniProt, Literature, Detection, Prediction
KLK14_HUMAN | Kallikrein-14 | KLK14 | | LungCancers, Symptoms | Secreted, extracellular space. | UniProt, Literature, Prediction
KLK6_HUMAN | Kallikrein-6 | KLK6 | | LungCancers, Benign-Nodules, Symptoms | Secreted. Nucleus, nucleolus. Cytoplasm. Mitochondrion. Microsome. Note = In brain, detected in the nucleus of glial cells and in the nucleus and cytoplasm of neurons. Detected in the mitochondrial and microsomal fractions of HEK-293 cells and released into the cytoplasm following cell stress. | UniProt, Literature, Detection, Prediction
KNG1_HUMAN | Kininogen-1 | KNG1 | | LungCancers, Benign-Nodules, Symptoms | Secreted, extracellular space. | UniProt, Detection, Prediction
KPYM_HUMAN | Pyruvate kinase isozymes M1/M2 | PKM2 | Secreted, EPI | LungCancers, Symptoms | Cytoplasm. Nucleus. Note = Translocates to the nucleus in response to different apoptotic stimuli. Nuclear translocation is sufficient to induce cell death that is caspase independent, isoform-specific and independent of its enzymatic activity. | Literature, Detection
KRT35_HUMAN | Keratin, type I cuticular Ha5 | KRT35 | ENDO | | | Detection, Prediction
LAMB2_HUMAN | Laminin subunit beta-2 | LAMB2 | ENDO | LungCancers, Symptoms | Secreted, extracellular space, extracellular matrix, basement membrane. Note = S-laminin is concentrated in the synaptic cleft of the neuromuscular junction. | UniProt, Detection, Prediction
LDHA_HUMAN | L-lactate dehydrogenase A chain | LDHA | Secreted, EPI, ENDO | LungCancers | Cytoplasm. | Literature, Detection, Prediction
LDHB_HUMAN | L-lactate dehydrogenase B chain | LDHB | EPI | LungCancers | Cytoplasm. | Detection, Prediction
LEG1_HUMAN | Galectin-1 | LGALS1 | Secreted | LungCancers | Secreted, extracellular space, extracellular matrix. | UniProt, Detection
LEG3_HUMAN | Galectin-3 | LGALS3 | | LungCancers, Benign-Nodules | Nucleus. Note = Cytoplasmic in adenomas and carcinomas. May be secreted by a non-classical secretory pathway and associate with the cell surface. | Literature, Detection, Prediction
LEG9_HUMAN | Galectin-9 | LGALS9 | ENDO | Symptoms | Cytoplasm (By similarity). Secreted (By similarity). Note = May also be secreted by a non-classical secretory pathway (By similarity). | UniProt
LG3BP_HUMAN | Galectin-3-binding protein | LGALS3BP | Secreted | LungCancers, Benign-Nodules, Symptoms | Secreted. Secreted, extracellular space, extracellular matrix. | UniProt, Literature, Detection, Prediction
LPLC3_HUMAN | Long palate, lung and nasal epithelium carcinoma-associated protein 3 | C20orf185 | | LungCancers | Secreted (By similarity). Cytoplasm. Note = According to PubMed: 12837268 it is cytoplasmic. | UniProt, Prediction
LPLC4_HUMAN | Long palate, lung and nasal epithelium carcinoma-associated protein 4 | C20orf186 | | LungCancers | Secreted (By similarity). Cytoplasm. | UniProt, Prediction
LPPRC_HUMAN | Leucine-rich PPR motif-containing protein, mitochondrial | LRPPRC | Secreted, ENDO | LungCancers, Symptoms | Mitochondrion. Nucleus, nucleoplasm. Nucleus inner membrane. Nucleus outer membrane. Note = Seems to be predominantly mitochondrial. | Prediction
LRP1_HUMAN | Prolow-density lipoprotein receptor-related protein 1 | LRP1 | EPI | LungCancers, Symptoms | Low-density lipoprotein receptor-related protein 1 85 kDa subunit: Cell membrane; Single-pass type I membrane protein. Membrane, coated pit.|Low-density lipoprotein receptor-related protein 1 515 kDa subunit: Cell membrane; Peripheral membrane protein; Extracellular side. Membrane, coated pit.|Low-density lipoprotein receptor-related protein 1 intracellular domain: Cytoplasm. Nucleus. Note = After cleavage, the intracellular domain (LRPICD) is detected both in the cytoplasm and in the nucleus. | UniProt, Detection
LUM_HUMAN | Lumican | LUM | Secreted, EPI | LungCancers, Benign-Nodules, Symptoms | Secreted, extracellular space, extracellular matrix (By similarity). | UniProt, Detection, Prediction
LY6K_HUMAN | Lymphocyte antigen 6K | LY6K | | LungCancers, Symptoms | Secreted. Cytoplasm. Cell membrane; Lipid-anchor, GPI-anchor (Potential). | UniProt, Prediction
LYAM2_HUMAN | E-selectin | SELE | | LungCancers, Benign-Nodules, Symptoms | Membrane; Single-pass type I membrane protein. | UniProt, Literature, Detection
LYAM3_HUMAN | P-selectin | SELP | | LungCancers, Benign-Nodules, Symptoms | Membrane; Single-pass type I membrane protein. | UniProt, Literature, Detection
LYOX_HUMAN | Protein-lysine 6-oxidase | LOX | | LungCancers, Benign-Nodules | Secreted, extracellular space. | UniProt, Detection, Prediction
LYPD3_HUMAN | Ly6/PLAUR domain-containing protein 3 | LYPD3 | | LungCancers | Cell membrane; Lipid-anchor, GPI-anchor. | Detection, Prediction
MAGA4_HUMAN | Melanoma-associated antigen 4 | MAGEA4 | | LungCancers | | Literature, Prediction
MASP1_HUMAN | Mannan-binding lectin serine protease 1 | MASP1 | | LungCancers, Symptoms | Secreted. | UniProt, Detection, Prediction
MDHC_HUMAN | Malate dehydrogenase, cytoplasmic | MDH1 | Secreted | | Cytoplasm. | Literature, Detection, Prediction
MDHM_HUMAN | Malate dehydrogenase, mitochondrial | MDH2 | ENDO | LungCancers | Mitochondrion matrix. | Detection, Prediction
MIF_HUMAN | Macrophage migration inhibitory factor | MIF | Secreted | LungCancers, Benign-Nodules, Symptoms | Secreted. Cytoplasm. Note = Does not have a cleavable signal sequence and is secreted via a specialized, non-classical pathway. Secreted by macrophages upon stimulation by bacterial lipopolysaccharide (LPS), or by M. tuberculosis antigens. | UniProt, Literature, Prediction
MLH1_HUMAN | DNA mismatch repair protein Mlh1 | MLH1 | ENDO | LungCancers, Benign-Nodules, Symptoms | Nucleus. | Literature
MMP1_HUMAN | Interstitial collagenase | MMP1 | | LungCancers, Benign-Nodules, Symptoms | Secreted, extracellular space, extracellular matrix (Probable). | UniProt, Literature, Prediction
MMP11_HUMAN | Stromelysin-3 | MMP11 | | LungCancers, Symptoms | Secreted, extracellular space, extracellular matrix (Probable). | UniProt, Literature, Prediction
MMP12_HUMAN | Macrophage metalloelastase | MMP12 | | LungCancers, Benign-Nodules, Symptoms | Secreted, extracellular space, extracellular matrix (Probable). | UniProt, Literature, Prediction
MMP14_HUMAN | Matrix metalloproteinase-14 | MMP14 | ENDO | LungCancers, Benign-Nodules, Symptoms | Membrane; Single-pass type I membrane protein (Potential). Melanosome. Note = Identified by mass spectrometry in melanosome fractions from stage I to stage IV. | UniProt, Literature, Detection
MMP2_HUMAN | 72 kDa type IV collagenase | MMP2 | | LungCancers, Benign-Nodules, Symptoms | Secreted, extracellular space, extracellular matrix (Probable). | UniProt, Literature, Detection, Prediction
MMP26_HUMAN | Matrix metalloproteinase-26 | MMP26 | | LungCancers | Secreted, extracellular space, extracellular matrix. | UniProt, Prediction
MMP7_HUMAN | Matrilysin | MMP7 | | LungCancers, Benign-Nodules, Symptoms | Secreted, extracellular space, extracellular matrix (Probable). | UniProt, Literature, Prediction
MMP9_HUMAN | Matrix metalloproteinase-9 | MMP9 | | LungCancers, Benign-Nodules, Symptoms | Secreted, extracellular space, extracellular matrix (Probable). | UniProt, Literature, Detection, Prediction
MOGS_HUMAN | Mannosyl-oligosaccharide glucosidase | MOGS | ENDO | | Endoplasmic reticulum membrane; Single-pass type II membrane protein. | UniProt, Prediction
MPRI_HUMAN | Cation-independent mannose-6-phosphate receptor | IGF2R | EPI, ENDO | LungCancers, Symptoms | Lysosome membrane; Single-pass type I membrane protein. | UniProt, Literature, Detection
MRP3_HUMAN | Canalicular multispecific organic anion transporter 2 | ABCC3 | EPI | LungCancers | Membrane; Multi-pass membrane protein. | Literature, Detection
MUC1_HUMAN | Mucin-1 | MUC1 | EPI | LungCancers, Benign-Nodules, Symptoms | Apical cell membrane; Single-pass type I membrane protein. Note = Exclusively located in the apical domain of the plasma membrane of highly polarized epithelial cells. After endocytosis, internalized and recycled to the cell membrane. Located to microvilli and to the tips of long filopodial protrusions.|Isoform 5: Secreted.|Isoform 7: Secreted.|Isoform 9: Secreted.|Mucin-1 subunit beta: Cell membrane. Cytoplasm. Nucleus. Note = On EGF and PDGFRB stimulation, transported to the nucleus through interaction with CTNNB1, a process which is stimulated by phosphorylation. On HRG stimulation, colocalizes with JUP/gamma-catenin at the nucleus. | UniProt, Literature, Prediction
MUC16_HUMAN | Mucin-16 | MUC16 | | LungCancers | Cell membrane; Single-pass type I membrane protein. Secreted, extracellular space. Note = May be liberated into the extracellular space following the phosphorylation of the intracellular C-terminus which induces the proteolytic cleavage and liberation of the extracellular domain. | UniProt, Detection
MUC4_HUMAN | Mucin-4 | MUC4 | | LungCancers, Benign-Nodules | Membrane; Single-pass membrane protein (Potential). Secreted. Note = Isoforms lacking the Cys-rich region, EGF-like domains and transmembrane region are secreted. Secretion occurs by splicing or proteolytic processing.|Mucin-4 beta chain: Cell membrane; Single-pass membrane protein.|Mucin-4 alpha chain: Secreted.|Isoform 3: Cell membrane; Single-pass membrane protein.|Isoform 15: Secreted. | UniProt
MUC5B_HUMAN | Mucin-5B | MUC5B | | LungCancers, Benign-Nodules | Secreted. | UniProt, Detection, Prediction
MUCL1_HUMAN | Mucin-like protein 1 | MUCL1 | | LungCancers | Secreted (Probable). Membrane (Probable). | UniProt, Prediction
NAMPT_HUMAN | Nicotinamide phosphoribosyltransferase | NAMPT | EPI | LungCancers, Benign-Nodules, Symptoms | Cytoplasm (By similarity). | Literature, Detection
NAPSA_HUMAN | Napsin-A | NAPSA | Secreted | LungCancers | | Prediction
NCF4_HUMAN | Neutrophil cytosol factor 4 | NCF4 | ENDO | | Cytoplasm. | Prediction
NDKA_HUMAN | Nucleoside diphosphate kinase A | NME1 | Secreted | LungCancers, Benign-Nodules, Symptoms | Cytoplasm. Nucleus. Note = Cell-cycle dependent nuclear localization which can be induced by interaction with Epstein-Barr viral proteins or by degradation of the SET complex by GzmA. | Literature, Detection
NDKB_HUMAN | Nucleoside diphosphate kinase B | NME2 | Secreted, EPI | Benign-Nodules | Cytoplasm. Nucleus. Note = Isoform 2 is mainly cytoplasmic and isoform 1 and isoform 2 are excluded from the nucleolus. | Literature, Detection
NDUS1_HUMAN | NADH-ubiquinone oxidoreductase 75 kDa subunit, mitochondrial | NDUFS1 | Secreted, ENDO | Symptoms | Mitochondrion inner membrane. | Prediction
NEBL_HUMAN | Nebulette | NEBL | ENDO | | | Prediction
NEK4_HUMAN | Serine/threonine-protein kinase Nek4 | NEK4 | ENDO | LungCancers | Nucleus (Probable). | Prediction
NET1_HUMAN | Netrin-1 | NTN1 | | LungCancers, Benign-Nodules | Secreted, extracellular space, extracellular matrix (By similarity). | UniProt, Literature, Prediction
NEU2_HUMAN | Vasopressin-neurophysin 2-copeptin | AVP | | LungCancers, Symptoms | Secreted. | UniProt, Prediction
NGAL_HUMAN | Neutrophil gelatinase-associated lipocalin | LCN2 | EPI | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Detection, Prediction
NGLY1_HUMAN | Peptide-N(4)-(N-acetyl-beta-glucosaminyl)asparagine amidase | NGLY1 | ENDO | | Cytoplasm. | Detection, Prediction
NHRF1_HUMAN | Na(+)/H(+) exchange regulatory cofactor NHE-RF1 | SLC9A3R1 | EPI | Benign-Nodules | Endomembrane system; Peripheral membrane protein. Cell projection, filopodium. Cell projection, ruffle. Cell projection, microvillus. Note = Colocalizes with actin in microvilli-rich apical regions of the syncytiotrophoblast. Found in microvilli, ruffling membrane and filopodia of HeLa cells. Present in lipid rafts of T-cells. | Detection
NIBAN_HUMAN | Protein Niban | FAM129A | EPI | | Cytoplasm. | Literature, Detection
NMU_HUMAN | Neuromedin-U | NMU | | LungCancers | Secreted. | UniProt, Prediction
NRP1_HUMAN | Neuropilin-1 | NRP1 | | LungCancers, Benign-Nodules, Symptoms | Cell membrane; Single-pass type I membrane protein.|Isoform 2: Secreted. | UniProt, Literature, Detection, Prediction
ODAM_HUMAN | Odontogenic ameloblast-associated protein | ODAM | | LungCancers | Secreted (By similarity). | UniProt, Prediction
OSTP_HUMAN | Osteopontin | SPP1 | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Detection, Prediction
OVOS2_HUMAN | Ovostatin homolog 2 | OVOS2 | ENDO | | Secreted (By similarity). | UniProt, Prediction
P5CS_HUMAN | Delta-1-pyrroline-5-carboxylate synthase | ALDH18A1 | ENDO | | Mitochondrion inner membrane. | Prediction
PA2GX_HUMAN | Group 10 secretory phospholipase A2 | PLA2G10 | | Symptoms | Secreted. | UniProt
PAPP1_HUMAN | Pappalysin-1 | PAPPA | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Prediction
PBIP1_HUMAN | Pre-B-cell leukemia transcription factor-interacting protein 1 | PBXIP1 | EPI | | Cytoplasm, cytoskeleton. Nucleus. Note = Shuttles between the nucleus and the cytosol. Mainly localized in the cytoplasm, associated with microtubules. Detected in small amounts in the nucleus. | Prediction
PCBP1_HUMAN | Poly(rC)-binding protein 1 | PCBP1 | EPI, ENDO | | Nucleus. Cytoplasm. Note = Loosely bound in the nucleus. May shuttle between the nucleus and the cytoplasm. | Detection, Prediction
PCBP2_HUMAN | Poly(rC)-binding protein 2 | PCBP2 | EPI | | Nucleus. Cytoplasm. Note = Loosely bound in the nucleus. May shuttle between the nucleus and the cytoplasm. | Detection, Prediction
PCD15_HUMAN | Protocadherin-15 | PCDH15 | ENDO | | Cell membrane; Single-pass type I membrane protein (By similarity).|Isoform 3: Secreted. | UniProt, Detection
PCNA_HUMAN | Proliferating cell nuclear antigen | PCNA | EPI | LungCancers, Benign-Nodules, Symptoms | Nucleus. | Literature, Prediction
PCYOX_HUMAN | Prenylcysteine oxidase 1 | PCYOX1 | Secreted | LungCancers, Symptoms | Lysosome. | Detection, Prediction
PDGFA_HUMAN | Platelet-derived growth factor subunit A | PDGFA | | LungCancers | Secreted. | UniProt, Literature, Prediction
PDGFB_HUMAN | Platelet-derived growth factor subunit B | PDGFB | | LungCancers, Benign-Nodules, Symptoms | Secreted. | UniProt, Literature, Detection, Prediction
PDGFD_HUMAN | Platelet-derived growth factor D | PDGFD | | LungCancers | Secreted. | UniProt, Prediction
PDIA3_HUMANProteinPDIA3ENDOLungCancersEndoplasmicDetection,
disulfide-reticulumPrediction
isomeraselumen
A3(By similarity).
Melanosome.
Note = Identified
by mass
spectrometry
in melanosome
fractions
from stage I
to stage IV.
PDIA4_HUMANProteinPDIA4Secreted,EndoplasmicDetection,
disulfide-EPI, ENDOreticulumPrediction
isomeraselumen.
A4Melanosome.
Note = Identified
by mass
spectrometry
in melanosome
fractions
from stage I
to stage IV.
PDIA6_HUMANProteinPDIA6Secreted,EndoplasmicDetection,
disulfide-EPI, ENDOreticulumPrediction
isomeraselumen
A6(By similarity).
Melanosome.
Note = Identified
by mass
spectrometry
in melanosome
fractions
from stage I
to stage IV.
PECA1_HUMANPlateletPECAM1LungCancers,Membrane;UniProt, Literature,
endothelialBenign-Single-passDetection
cellNodules,type I membrane
adhesionSymptomsprotein.
molecule
PEDF_HUMANPigmentSERPINF1LungCancers,Secreted.UniProt, Literature,
epithelium-SymptomsMelanosome.Detection,
derivedNote = EnrichedPrediction
factorin stage I
melanosomes.
PERM_HUMANMyeloperoxidaseMPOSecreted,LungCancers,Lysosome.Literature,
EPI, ENDOBenign-Detection,
Nodules,Prediction
Symptoms
PERP1_HUMANPlasmaPACAPEPI, ENDOSecretedUniProt, Detection,
cell-(Potential).Prediction
inducedCytoplasm.
residentNote = In
endoplasmic(Pub-
reticulumMed: 11350957)
proteindiffuse
granular
localization
in the cytoplasm
surrounding
the
nucleus.
PGAM1_HUMANPhospho-PGAM1Secreted,LungCancers,Detection
glycerateEPISymptoms
mutase 1
PLAC1_HUMANPlacenta-PLAC1LungCancersSecretedUniProt, Prediction
specific(Probable).
protein 1
PLACL_HUMANPlacenta-PLAC1LLungCancersSecretedUniProt, Prediction
specific 1-(Potential).
like protein
PLIN2_HUMANPerilipin-2ADFPENDOLungCancersMembrane;Prediction
Peripheral
membrane
protein.
PLIN3_HUMANPerilipin-3M6PRBP1EPICytoplasm.Detection,
EndosomePrediction
membrane;
Peripheral
membrane
protein; Cytoplasmic
side (Potential).
Lipid
droplet (Potential).
Note = Membrane
associated
on endosomes.
Detected in
the envelope
and the core
of lipid bodies
and in
lipid sails.
PLOD1_HUMANProcollagen-PLOD1EPI, ENDORough endoplasmicPrediction
lysine, 2-reticulum
oxoglutaratemembrane;
5-Peripheral
dioxygenase 1membrane
protein;
Lumenal
side.
PLOD2_HUMANProcollagen-PLOD2ENDOBenign-Rough endoplasmicPrediction
lysine,Nodules,reticulum
2-Symptomsmembrane;
oxoglutaratePeripheral
5-membrane
dioxygenase 2protein;
Lumenal
side.
PLSL_HUMANPlastin-2LCP1Secreted,LungCancersCytoplasm,Detection,
EPIcytoskeleton.Prediction
Cell
junction.
Cell projection.
Cell
projection,
ruffle membrane;
Peripheral
membrane
protein; Cytoplasmic
side (By
similarity).
Note = Relocalizes
to the
immunological
synapse
between
peripheral
blood T
lymphocytes
and antibody-
presenting
cells in response
to
costimulation
through
TCR/CD3
and CD2 or
CD28. Associated
with the
actin cytoskeleton
at
membrane
ruffles (By
similarity).
Relocalizes
to actin-rich
cell projections
upon
serine phosphorylation.
PLUNC_HUMANProteinPLUNCLungCancers,Secreted (ByUniProt, Prediction
PluncBenign-similarity).
NodulesNote = Found
in the nasal
mucus (By
similarity).
Apical side
of airway
epithelial
cells. Detected
in
nasal mucus
(By similarity).
PLXB3_HUMANPlexin-B3PLXNB3ENDOMembrane;UniProt, Detection,
Single-passPrediction
type I membrane
protein.
PLXC1_HUMANPlexin-C1PLXNC1EPIMembrane;UniProt, Detection
Single-pass
type I membrane
protein
(Potential).
POSTN_HUMANPeriostinPOSTNSecreted,LungCancers,Secreted,UniProt, Literature,
ENDOBenign-extracellularDetection,
Nodules,space, extra-Prediction
Symptomscellular matrix.
PPAL_HUMANLysosomalACP2EPISymptomsLysosomeUniProt, Prediction
acidmembrane;
phosphataseSingle-pass
membrane
protein;
Lumenal
side. Lysosome
lumen.
Note = The
soluble form
arises by
proteolytic
processing
of the membrane-
bound
form.
PPBT_HUMANAlkalineALPLEPILungCancers,Cell membrane;Literature,
phosphatase,Benign-Lipid-Detection,
tissue-Nodules,anchor,Prediction
nonspecificSymptomsGPI-anchor.
isozyme
PPIB_HUMANPeptidyl-PPIBSecreted,EndoplasmicDetection,
prolyl cis-EPI, ENDOreticulumPrediction
trans isomerase Blumen.
Melanosome.
Note = Identified
by mass
spectrometry
in melanosome
fractions
from stage I
to stage IV.
PRDX1_HUMANPeroxiredoxin-1PRDX1EPILungCancersCytoplasm.Detection,
Melanosome.Prediction
Note = Identified
by mass
spectrometry
in melanosome
fractions
from stage I
to stage IV.
PRDX4_HUMANPeroxiredoxin-4PRDX4Secreted,Cytoplasm.Literature,
EPI, ENDODetection,
Prediction
PROF1_HUMANProfilin-1PFN1Secreted,LungCancersCytoplasm,Detection
EPIcytoskeleton.
PRP31_HUMANU4/U6PRPF31ENDONucleusPrediction
small nuclearspeckle.
ribo-Nucleus,
nucleo-Cajal body.
proteinNote = Predominantly
Prp31found in
speckles and
in Cajal
bodies.
PRS6A_HUMAN26S proteasePSMC3EPIBenign-CytoplasmDetection
regulatoryNodules(Potential).
subunitNucleus
6A(Potential).
PSCA_HUMANProstatePSCALungCancersCell membrane;Literature,
stem cellLipid-Prediction
antigenanchor,
GPI-anchor.
PTGIS_HUMANProstacyclinPTGISEPILungCancers,EndoplasmicUniProt, Detection,
synthaseBenign-reticulumPrediction
Nodulesmembrane;
Single-
pass
membrane
protein.
PTPA_HUMANSerine/PPP2R4ENDOSymptomsDetection,
threonine-Prediction
protein
phosphatase
2A
activator
PTPRC_HUMANReceptor-PTPRCSecreted,LungCancersMembrane;UniProt, Detection,
type tyrosine-EPI, ENDOSingle-passPrediction
proteintype I membrane
phosphatase Cprotein.
PTPRJ_HUMANReceptor-PTPRJEPILungCancers,Membrane;UniProt, Detection,
type tyrosine-SymptomsSingle-passPrediction
proteintype I membrane
phosphataseprotein.
eta
PVR_HUMANPoliovirusPVRSymptomsIsoform Alpha:UniProt, Detection,
receptorCellPrediction
membrane;
Single-pass
type I membrane
protein.
|Isoform
Delta: Cell
membrane;
Single-pass
type I membrane
protein.
|Isoform
Beta: Secreted.
|Isoform
Gamma:
Secreted.
RAB32_HUMANRas-RAB32EPIMitochondrion.Prediction
related
protein
Rab-32
RAGE_HUMANAdvancedAGERSecretedLungCancers,Isoform 1:UniProt, Literature
glycosylationBenign-Cell membrane;
endNodulesSingle-
product-pass
specifictype I membrane
receptorprotein.
|Isoform
2: Secreted.
RAN_HUMANGTP-RANSecreted,LungCancers,Nucleus.Detection,
bindingEPIBenign-Cytoplasm.Prediction
nuclearNodulesMelanosome.
proteinNote = Becomes
Randispersed
throughout
the cytoplasm
during
mitosis.
Identified by
mass spectrometry
in
melanosome
fractions
from stage I
to stage IV.
RAP2B_HUMANRas-RAP2BEPICell membrane;Prediction
relatedLipid-
proteinanchor;
Rap-2bCytoplasmic side
(Potential).
RAP2C_HUMANRas-RAP2CEPICell membrane;Prediction
relatedLipid-
proteinanchor;
Rap-2cCytoplasmic
side (Potential).
RCN3_HUMANReticulocalbin-3RCN3EPISymptomsEndoplasmicPrediction
reticulum
lumen
(Potential).
RL24_HUMAN60S ribosomalRPL24EPIPrediction
protein
L24
S10A1_HUMANProteinS100A1SymptomsCytoplasm.Literature,
S100-A1Prediction
S10A6_HUMANProteinS100A6SecretedLungCancersNucleusLiterature,
S100-A6envelope.Detection,
Cytoplasm.Prediction
S10A7_HUMANProteinS100A7LungCancersCytoplasm.UniProt, Literature,
S100-A7Secreted.Detection,
Note = SecretedPrediction
by a non-
classical
secretory
pathway.
SAA_HUMANSerumSAA1SymptomsSecreted.UniProt, Literature,
amyloid ADetection,
proteinPrediction
SCF_HUMANKit ligandKITLGLungCancers,Isoform 1:UniProt, Literature
SymptomsCell membrane;
Single-
pass
type I membrane
protein
(By
similarity).
Secreted (By
similarity).
Note = Also
exists as a
secreted
soluble form
(isoform 1
only) (By
similarity).
|Isoform
2: Cell
membrane;
Single-pass
type I membrane
protein
(By
similarity).
Cytoplasm,
cytoskeleton
(By similarity).
SDC1_HUMANSyndecan-1SDC1LungCancers,Membrane;UniProt, Literature,
Benign-Single-passDetection
Nodules,type I membrane
Symptomsprotein.
SEM3G_HUMANSemaphorin-SEMA3GLungCancersSecreted (ByUniProt, Prediction
3Gsimilarity).
SEPR_HUMANSepraseFAPENDOSymptomsCell membrane;UniProt, Literature,
Single-Detection
pass
type II
membrane
protein. Cell
projection,
lamellipodium
membrane;
Single-
pass
type II
membrane
protein. Cell
projection,
invadopodium
membrane;
Single-
pass
type II
membrane
protein.
Note = Found
in cell surface
lamellipodia,
invadopodia
and on shed
vesicles.
SERPH_HUMANSerpin H1SERPINH1Secreted,LungCancers,EndoplasmicDetection,
EPI, ENDOBenign-reticulumPrediction
Noduleslumen.
SFPA2_HUMANPulmonarySFTPA2SecretedLungCancers,Secreted,UniProt, Prediction
surfactant-Benign-extracellular
associatedNodulesspace, extra-
protein A2cellular matrix.
Secreted,
extracellular
space,
surface film.
SFTA1_HUMANPulmonarySFTPA1SecretedLungCancers,Secreted,UniProt, Prediction
surfactant-Benign-extracellular
associatedNodules,space, extra-
protein A1Symptomscellular matrix.
Secreted,
extracellular
space,
surface film.
SG3A2_HUMANSecreto-SCGB3A2LungCancers,Secreted.UniProt, Prediction
globinBenign-
family 3ANodules
member 2
SGPL1_HUMANSphingosine-SGPL1ENDOEndoplasmicUniProt, Prediction
1-reticulum
phosphatemembrane;
lyase 1Single-
pass
type III
membrane
protein.
SIAL_HUMANBone sialoprotein 2IBSPLungCancersSecreted.UniProt, Literature,
Prediction
SLPI_HUMANAntileukoproteinaseSLPILungCancers,Secreted.UniProt, Literature,
Benign-Detection,
NodulesPrediction
SMD3_HUMANSmallSNRPD3SecretedBenign-Nucleus.Prediction
nuclearNodules
ribonucleoprotein
Sm D3
SMS_HUMANSomatostatinSSTLungCancersSecreted.UniProt, Literature,
Prediction
SODM_HUMANSuperoxideSOD2SecretedLungCancers,MitochondrionLiterature,
dismutaseBenign-matrix.Detection,
[Mn],Nodules,Prediction
mitochondrialSymptoms
SORL_HUMANSortilin-SORL1EPILungCancers,Membrane;UniProt, Detection
relatedSymptomsSingle-pass
receptortype I membrane
protein
(Potential).
SPB3_HUMANSerpin B3SERPINB3LungCancers,Cytoplasm.Literature,
Benign-Note = SeemsDetection
Nodulesto also be
secreted in
plasma by
cancerous
cells but at a
low level.
SPB5_HUMANSerpin B5SERPINB5LungCancersSecreted,UniProt, Detection
extracellular
space.
SPON2_HUMANSpondin-2SPON2LungCancers,Secreted,UniProt, Prediction
Benign-extracellular
Nodulesspace, extra-
cellular matrix
(By similarity).
SPRC_HUMANSPARCSPARCLungCancers,Secreted,UniProt, Literature,
Benign-extracellularDetection,
Nodules,space, extra-Prediction
Symptomscellular matrix,
basement
membrane.
Note = In or
around the
basement
membrane.
SRC_HUMANProto-SRCENDOLungCancers,Literature
oncogeneBenign-
tyrosine-Nodules,
proteinSymptoms
kinase Src
SSRD_HUMANTranslocon-SSR4Secreted,EndoplasmicUniProt, Prediction
associatedENDOreticulum
proteinmembrane;
subunitSingle-
deltapass
type I membrane
protein.
STAT1_HUMANSignalSTAT1EPILungCancers,Cytoplasm.Detection
transducerBenign-Nucleus.
and activatorNodulesNote = Translocated
ofinto
transcriptionthe nucleus
1-in response
alpha/betato IFN-
gamma-
induced tyrosine
phosphorylation
and dimerization.
STAT3_HUMANSignalSTAT3ENDOLungCancers,Cytoplasm.Prediction
transducerBenign-Nucleus.
and activatorNodules,Note = Shuttles
ofSymptomsbetween
transcription 3the nucleus
and the cytoplasm.
Constitutive
nuclear
presence is
independent
of tyrosine
phosphorylation.
STC1_HUMANStanniocalcin-1STC1LungCancers,Secreted.UniProt, Prediction
Symptoms
STT3A_HUMANDolichyl-STT3AEPISymptomsEndoplasmicLiterature
diphosphooligo-reticulum
saccharide--membrane;
proteinMulti-
glycosyl-pass
transferasemembrane
subunitprotein.
STT3A
TAGL_HUMANTransgelinTAGLNEPILungCancersCytoplasmLiterature,
(Probable).Prediction
TARA_HUMANTRIO andTRIOBPENDONucleus.Detection,
F-actin-Cytoplasm,Prediction
bindingcytoskeleton.
proteinNote = Localized
to F-
actin in a
periodic
pattern.
TBA1B_HUMANTubulinTUBA1BEPILungCancersDetection
alpha-1B
chain
TBB2A_HUMANTubulinTUBB2AEPILungCancers,Detection,
beta-2ABenign-Prediction
chainNodules
TBB3_HUMANTubulinTUBB3EPILungCancers,Detection
beta-3Benign-
chainNodules
TBB5_HUMANTubulinTUBBEPILungCancers,Detection
beta chainBenign-
Nodules
TCPA_HUMANT-TCP1EPICytoplasm.Prediction
complex
protein 1
subunit
alpha
TCPD_HUMANT-CCT4EPICytoplasm.Detection,
complexMelanosome.Prediction
protein 1Note = Identified
subunitby mass
deltaspectrometry
in melanosome
fractions
from stage I
to stage IV.
TCPQ_HUMANT-CCT8Secreted,Cytoplasm.Prediction
complexEPI
protein 1
subunit
theta
TCPZ_HUMANT-CCT6ASecreted,Cytoplasm.Detection
complexEPI
protein 1
subunit
zeta
TDRD3_HUMANTudorTDRD3ENDOCytoplasm.Prediction
domain-Nucleus.
containingNote = Predominantly
protein 3cytoplasmic.
Associated
with actively
translating
polyribosomes
and
with mRNA
stress granules.
TENA_HUMANTenascinTNCENDOLungCancers,Secreted,UniProt, Literature,
Benign-extracellularDetection
Nodules,space, extra-
Symptomscellular matrix.
TENX_HUMANTenascin-XTNXBENDOLungCancers,Secreted,UniProt, Detection,
SymptomsextracellularPrediction
space, extra-
cellular matrix.
TERA_HUMANTransitionalVCPEPILungCancers,Cytoplasm,Detection
endoplasmicBenign-cytosol. Nucleus.
reticulumNodulesNote = Present
ATPasein the neuronal
hyaline
inclusion
bodies
specifically
found in
motor neurons
from
amyotrophic
lateral sclerosis
patients.
Present
in the
Lewy bodies
specifically
found in
neurons
from Parkinson
disease
patients.
TETN_HUMANTetranectinCLEC3BLungCancersSecreted.UniProt, Literature,
Detection,
Prediction
TF_HUMANTissueF3LungCancers,Membrane;UniProt, Literature
factorBenign-Single-pass
Nodules,type I membrane
Symptomsprotein.
TFR1_HUMANTransferrinTFRCSecreted,LungCancers,Cell membrane;UniProt, Literature,
receptorEPI, ENDOBenign-Single-Detection
protein 1Nodules,pass
Symptomstype II
membrane
protein.
Melanosome.
Note = Identified
by mass
spectrometry
in melanosome
fractions
from stage I
to stage
IV.|Transferrin
receptor
protein 1,
serum form:
Secreted.
TGFA_HUMANProtransformingTGFALungCancers,TransformingUniProt, Literature
growthBenign-growth
factorNodulesfactor alpha:
alphaSecreted,
extracellular
space.|Protransforming
growth factor
alpha:
Cell membrane;
Single-
pass
type I membrane
protein.
THAS_HUMANThromboxane-ATBXAS1EPI, ENDOLungCancers,Membrane;Prediction
synthaseBenign-Multi-pass
Nodules,membrane
Symptomsprotein.
THY1_HUMANThy-1THY1EPISymptomsCell membrane;Detection,
membraneLipid-Prediction
glycoproteinanchor,
GPI-anchor
(By similarity).
TIMP1_HUMANMetallo-TIMP1LungCancers,Secreted.UniProt, Literature,
proteinaseBenign-Detection,
inhibitor 1Nodules,Prediction
Symptoms
TIMP3_HUMANMetallo-TIMP3LungCancers,Secreted,UniProt, Literature,
proteinaseBenign-extracellularPrediction
inhibitor 3Nodulesspace, extra-
cellular matrix.
TLL1_HUMANTolloid-TLL1ENDOSecretedUniProt, Prediction
like protein 1(Probable).
TNF12_HUMANTumorTNFSF12LungCancers,Cell membrane;UniProt
necrosisBenign-Single-
factorNodulespass
ligandtype II
super-membrane
familyprotein.
member|Tumor
12necrosis
factor ligand
superfamily
member 12,
secreted
form: Secreted.
TNR6_HUMANTumorFASLungCancers,Isoform 1:UniProt, Literature,
necrosisBenign-Cell membrane;Prediction
factorNodules,Single-
receptorSymptomspass
super-type I membrane
familyprotein.
member 6|Isoform
2: Secreted.
|Isoform
3: Secreted.
|Isoform
4: Secreted.
|Isoform
5: Secreted.
|Isoform
6: Secreted.
TPIS_HUMANTri-TPI1Secreted,SymptomsLiterature,
osephosphateEPIDetection,
isomerasePrediction
TRFL_HUMANLacto-LTFSecreted,LungCancers,Secreted.UniProt, Literature,
transferrinEPI, ENDOBenign-Detection,
Nodules,Prediction
Symptoms
TSP1_HUMANThrombospondin-1THBS1LungCancers,Literature,
Benign-Detection,
Nodules,Prediction
Symptoms
TTHY_HUMANTransthyretinTTRLungCancers,Secreted.UniProt, Literature,
Benign-Cytoplasm.Detection,
NodulesPrediction
TYPH_HUMANThymidineTYMPEPILungCancers,Literature,
phosphorylaseBenign-Detection,
Nodules,Prediction
Symptoms
UGGG1_HUMANUDP-UGGT1Secreted,EndoplasmicDetection,
glucose:glycoENDOreticulumPrediction
proteinlumen.
glucosyl-Endoplasmic
transferase 1reticulum-
Golgi
intermediate
compartment.
UGGG2_HUMANUDP-UGGT2ENDOEndoplasmicPrediction
glucose:glycoreticulum
proteinlumen.
glucosyl-Endoplasmic
transferase 2reticulum-
Golgi
intermediate
compartment.
UGPA_HUMANUTP--UGP2EPISymptomsCytoplasm.Detection
glucose-1-
phosphate
uridyl-
yltransferase
UPAR_HUMANUrokinasePLAURLungCancers,Isoform 1:UniProt, Literature,
plasminogenBenign-Cell membrane;Prediction
activatorNodules,Lipid-
surfaceSymptomsanchor,
receptorGPI-anchor.
|Isoform
2: Secreted
(Probable).
UTER_HUMANUtero-SCGB1A1LungCancers,Secreted.UniProt, Literature,
globinBenign-Detection,
Nodules,Prediction
Symptoms
VA0D1_HUMANV-typeATP6V0D1EPIPrediction
proton
ATPase
subunit d1
VAV3_HUMANGuanineVAV3ENDOPrediction
nucleotide
exchange
factor
VAV3
VEGFA_HUMANVascularVEGFALungCancers,Secreted.UniProt, Literature,
endothelialBenign-Note = VEGFPrediction
growthNodules,121 is acidic
factor ASymptomsand freely
secreted.
VEGF165 is
more basic,
has heparin-
binding
properties
and, although a
significant
proportion
remains cell-
associated,
most is
freely secreted.
VEGF189 is
very basic, it
is cell-
associated
after secretion
and is
bound avidly
by heparin
and the
extracellular
matrix, although
it
may be released
as a
soluble form
by heparin,
heparinase
or plasmin.
VEGFC_HUMANVascularVEGFCLungCancers,Secreted.UniProt, Literature,
endothelialBenign-Prediction
growthNodules
factor C
VEGFD_HUMANVascularFIGFLungCancersSecreted.UniProt, Literature,
endothelialPrediction
growth
factor D
VGFR1_HUMANVascularFLT1LungCancers,IsoformUniProt, Literature,
endothelialBenign-Flt1: CellDetection,
growthNodules,membrane;Prediction
factorSymptomsSingle-pass
receptor 1type I membrane
protein.
|Isoform
sFlt1: Secreted.
VTNC_HUMANVitronectinVTNENDOSymptomsSecreted,UniProt, Literature,
extracellularDetection,
space.Prediction
VWC2_HUMANBrorinVWC2LungCancersSecreted,UniProt, Prediction
extracellular
space, extra-
cellular matrix,
basement
membrane
(By
similarity).
WNT3A_HUMANProteinWNT3ALungCancers,Secreted,UniProt, Prediction
Wnt-3aSymptomsextracellular
space, extra-
cellular matrix.
WT1_HUMANWilmsWT1LungCancers,Nucleus.Literature,
tumorBenign-CytoplasmPrediction
proteinNodules,(By similarity).
SymptomsNote = Shuttles
between
nucleus and
cytoplasm
(By similarity).
|Isoform
1: Nucleus
speckle.
|Isoform
4: Nucleus,
nucleoplasm.
ZA2G_HUMANZinc-AZGP1LungCancers,Secreted.UniProt, Literature,
alpha-2-SymptomsDetection,
glycoproteinPrediction
ZG16B_HUMANZymogenZG16BLungCancersSecretedUniProt, Prediction
granule(Potential).
protein 16
homolog B

[0106]

190 of these candidate protein biomarkers were shown to be measured reproducibly in blood. A moderately powered multisite and unbiased study of 242 blood samples from patients with PN was designed to determine whether a statistically significant subpanel of proteins could be identified to distinguish benign and malignant nodules of sizes under 2 cm. The three sites contributing samples and clinical data to this study were the University of Laval, University of Pennsylvania and New York University.

[0107]

In an embodiment of the invention, a panel of 15 proteins effectively distinguished between samples derived from patients with benign and malignant nodules less than 2 cm diameter.

[0108]

Bioinformatic and biostatistical analyses were used first to identify individual proteins with statistically significant differential expression, and then to derive from these proteins one or more combinations or panels of proteins that collectively demonstrated superior discriminatory performance compared to any individual protein. Bioinformatic and biostatistical methods are used to derive a coefficient (C) for each individual protein in the panel that reflects its relative expression level, i.e., increased or decreased, and its weight or importance with respect to the panel's net discriminatory ability, relative to the other proteins. The quantitative discriminatory ability of the panel can be expressed as a mathematical algorithm with a term for each of its constituent proteins being the product of its coefficient and the protein's plasma expression level (P) (as measured by LC-SRM-MS), e.g., C×P, with an algorithm consisting of n proteins described as: C1×P1+C2×P2+C3×P3+ . . . +Cn×Pn. An algorithm that discriminates between disease states with a predetermined level of statistical significance may be referred to as a “disease classifier”. In addition to the classifier's constituent proteins with differential expression, it may also include proteins with minimal or no biologic variation to enable assessment of variability, or the lack thereof, within or between clinical specimens; these proteins may be termed typical native proteins and serve as internal controls for the other classifier proteins.
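
As an illustration only, the weighted sum C1×P1+ . . . +Cn×Pn described above can be sketched in Python as follows. The function name, the protein labels, and all numeric values are hypothetical placeholders and are not coefficients or measurements from this disclosure.

```python
from typing import Mapping


def classifier_score(coefficients: Mapping[str, float],
                     levels: Mapping[str, float]) -> float:
    """Return the sum of C_i * P_i over every protein in the panel.

    coefficients: hypothetical per-protein coefficients (C).
    levels: measured expression levels (P) keyed by the same protein labels.
    """
    missing = set(coefficients) - set(levels)
    if missing:
        raise ValueError(f"no expression level for: {sorted(missing)}")
    return sum(c * levels[name] for name, c in coefficients.items())


# Hypothetical three-protein panel; the values are illustrative, not real data.
example_coefficients = {"PROT_A": 0.5, "PROT_B": -1.0, "PROT_C": 2.0}
example_levels = {"PROT_A": 4.0, "PROT_B": 1.5, "PROT_C": 0.25}
example_score = classifier_score(example_coefficients, example_levels)
```

A negative coefficient in this sketch corresponds to a protein whose decreased expression contributes to the classification, consistent with the description of the coefficient above.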

[0109]

In certain embodiments, expression levels are measured by MS. MS analyzes the mass spectrum produced by an ion after its production by the vaporization of its parent protein and its separation from other ions based on its mass-to-charge ratio. The most common modes of acquiring MS data are 1) full scan acquisition resulting in the typical total ion current plot (TIC), 2) selected ion monitoring (SIM), and 3) selected reaction monitoring (SRM).

[0110]

In certain embodiments of the methods provided herein, biomarker protein expression levels are measured by LC-SRM-MS. LC-SRM-MS is a highly selective method of tandem mass spectrometry which has the potential to effectively filter out all molecules and contaminants except the desired analyte(s). This is particularly beneficial if the analysis sample is a complex mixture which may comprise several isobaric species within a defined analytical window. LC-SRM-MS methods may utilize a triple quadrupole mass spectrometer which, as is known in the art, includes three quadrupole rod sets. A first stage of mass selection is performed in the first quadrupole rod set, and the selectively transmitted ions are fragmented in the second quadrupole rod set. The resultant transition (product) ions are conveyed to the third quadrupole rod set, which performs a second stage of mass selection. The product ions transmitted through the third quadrupole rod set are measured by a detector, which generates a signal representative of the numbers of selectively transmitted product ions. The RF and DC potentials applied to the first and third quadrupoles are tuned to select (respectively) precursor and product ions that have m/z values lying within narrow specified ranges. By specifying the appropriate transitions (m/z values of precursor and product ions), a peptide corresponding to a targeted protein may be measured with high degrees of sensitivity and selectivity. Signal-to-noise ratio is superior to conventional tandem mass spectrometry (MS/MS) experiments, which select one mass window in the first quadrupole and then measure all generated transitions in the ion detector.

[0111]

In certain embodiments, an SRM-MS assay for use in diagnosing or monitoring lung cancer as disclosed herein may utilize one or more peptides and/or peptide transitions derived from the proteins set forth in Table 6. In certain embodiments, the assay may utilize peptides and/or peptide transitions from 100 or more, 150 or more, 200 or more, 250 or more, 300 or more, 345 or more, or 371 or more biomarker proteins. In certain embodiments, two or more peptides may be utilized per biomarker protein, and in certain of these embodiments three or more or four or more peptides may be utilized. Similarly, in certain embodiments two or more transitions may be utilized per peptide, and in certain of these embodiments three or more, four or more, or five or more transitions may be utilized per peptide. In one embodiment, an LC-SRM-MS assay for use in diagnosing lung cancer may measure the intensity of five transitions that correspond to selected peptides associated with each biomarker protein. The achievable limit of quantification (LOQ) may be estimated for each peptide according to the observed signal intensities during this analysis. For examples of sets of target proteins associated with lung cancer, see Table 12.

[0112]

The expression level of a biomarker protein can be measured using any suitable method known in the art, including but not limited to mass spectrometry (MS), reverse transcriptase-polymerase chain reaction (RT-PCR), microarray, serial analysis of gene expression (SAGE), gene expression analysis by massively parallel signature sequencing (MPSS), immunoassays (e.g., ELISA), immunohistochemistry (IHC), transcriptomics, and proteomics.

[0113]

When ELISA is used to measure the expression level of a biomarker protein, an antibody that specifically binds the biomarker protein can be used. For example, an LG3BP antibody is used for measuring the expression level of LG3BP; a C163A antibody is used for measuring the expression level of C163A. In some embodiments, the method includes contacting a blood sample obtained from the subject with an LG3BP antibody and a C163A antibody.

[0114]

To evaluate the diagnostic performance of a particular set of peptide transitions, a ROC curve is generated for each significant transition.

[0115]

An “ROC curve” as used herein refers to a plot of the true positive rate (sensitivity) against the false positive rate (1 − specificity) for a binary classifier system as its discrimination threshold is varied. A ROC curve can be represented equivalently by plotting the fraction of true positives out of the positives (TPR=true positive rate) versus the fraction of false positives out of the negatives (FPR=false positive rate). Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. FIGS. 7 and 9 provide a graphical representation of the functional relationship between the distribution of biomarker or biomarker panel sensitivity and specificity values in a cohort of diseased subjects and in a cohort of non-diseased subjects.
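
By way of a minimal sketch, the points of such a curve may be computed by sweeping the decision threshold over the observed scores. The function name, the convention that a score at or above the threshold is called positive, and the example scores and labels are assumptions made for illustration and are not data from this disclosure.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs, starting at (0, 0), one per distinct threshold.

    labels: 1 for a diseased subject, 0 for a non-diseased subject.
    A sample is called positive when its score is >= the threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both diseased and non-diseased samples are required")
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step moves the point toward (1, 1).
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical scores for two diseased (1) and two non-diseased (0) samples.
example_points = roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

Each returned pair corresponds to one decision threshold, with the first coordinate equal to 1 − specificity and the second equal to sensitivity.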

[0116]

AUC represents the area under the ROC curve. The AUC is an overall indication of the diagnostic accuracy of 1) a biomarker or a panel of biomarkers and 2) a ROC curve. AUC is determined by the “trapezoidal rule.” For a given curve, the data points are connected by straight line segments, perpendiculars are erected from the abscissa to each data point, and the sum of the areas of the triangles and trapezoids so constructed is computed. In certain embodiments of the methods provided herein, a biomarker protein has an AUC in the range of about 0.75 to 1.0. In certain of these embodiments, the AUC is in the range of about 0.8 to 0.8, 0.9 to 0.95, or 0.95 to 1.0.
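
The trapezoidal computation described above may be sketched as follows. The function name and the example points are hypothetical; the points are assumed to be (false positive rate, true positive rate) pairs that include the end points (0, 0) and (1, 1).

```python
from typing import Sequence, Tuple


def trapezoidal_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve through (x, y) points."""
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        # Each pair of neighbouring points bounds one trapezoid
        # (a triangle when one of the two heights is zero).
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ROC points for illustration only.
example_roc = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
example_auc = trapezoidal_auc(example_roc)
```

In this sketch a curve lying on the diagonal from (0, 0) to (1, 1) yields an area of 0.5, which corresponds to a classifier with no discriminatory ability.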

[0117]

The methods provided herein are minimally invasive and pose little or no risk of adverse effects. As such, they may be used to diagnose, monitor and provide clinical management of subjects who do not exhibit any symptoms of a lung condition and subjects classified as low risk for developing a lung condition. For example, the methods disclosed herein may be used to diagnose lung cancer in a subject who does not present with a PN and/or has not presented with a PN in the past, but who is nonetheless deemed at risk of developing a PN and/or a lung condition. Similarly, the methods disclosed herein may be used as a strictly precautionary measure to diagnose healthy subjects who are classified as low risk for developing a lung condition.

[0118]

The present invention provides a method of determining the likelihood that a lung condition in a subject is cancer by measuring an abundance of a panel of proteins in a sample obtained from the subject; calculating a probability of cancer score based on the protein measurements; and ruling out cancer for the subject if the score is lower than a pre-determined score, wherein when cancer is ruled out the subject does not receive a treatment protocol. Treatment protocols include, for example, a pulmonary function test (PFT), pulmonary imaging, a biopsy, a surgery, a chemotherapy, a radiotherapy, or any combination thereof. In some embodiments, the imaging is an x-ray, a chest computed tomography (CT) scan, or a positron emission tomography (PET) scan.

[0119]

The present invention further provides a method of ruling in the likelihood of cancer for a subject by measuring an abundance of a panel of proteins in a sample obtained from the subject, calculating a probability of cancer score based on the protein measurements and ruling in the likelihood of cancer for the subject if the score is higher than a pre-determined score.
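
The rule-out and rule-in comparisons described above may be sketched as a simple threshold check. The use of two separate pre-determined scores, the "indeterminate" outcome for scores between them, and the threshold values in the example are assumptions made for illustration; the disclosure does not prescribe them.

```python
def triage(score: float,
           rule_out_threshold: float,
           rule_in_threshold: float) -> str:
    """Classify a probability of cancer score against two thresholds.

    Returns "rule_out" when the score is lower than rule_out_threshold,
    "rule_in" when it is higher than rule_in_threshold, and
    "indeterminate" otherwise (an assumed third outcome).
    """
    if rule_out_threshold > rule_in_threshold:
        raise ValueError("rule_out_threshold must not exceed rule_in_threshold")
    if score < rule_out_threshold:
        return "rule_out"
    if score > rule_in_threshold:
        return "rule_in"
    return "indeterminate"


# Hypothetical thresholds; actual pre-determined scores are panel specific.
example_decision = triage(0.15, rule_out_threshold=0.3, rule_in_threshold=0.7)
```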

[0120]

In another aspect, the invention further provides a method of determining the likelihood of the presence of a lung condition in a subject by measuring an abundance of a panel of proteins in a sample obtained from the subject, calculating a probability of cancer score based on the protein measurements and concluding the presence of said lung condition if the score is equal to or greater than a pre-determined score. The lung condition is lung cancer, such as, for example, non-small cell lung cancer (NSCLC).

[0121]

The panel includes at least 4 proteins selected from ALDOA, FRIL, LG3BP, IBP3, LRP1, ISLR, TSP1, COIA1, GRP78, TETN, PRDX1 and CD14. Optionally, the panel further includes at least one protein selected from BGH3, COIA1, TETN, GRP78, PRDX, FIBA and GSLG1.

[0122]

Alternatively, the panel includes at least 3 proteins selected from ALDOA, FRIL, LG3BP, IBP3, LRP1, ISLR, TSP1, COIA1, GRP78, TETN, PRDX1 and CD14. In some embodiments, the panel comprises at least 1, 2, 3, or 4 proteins selected from LRP1, COIA1, ALDOA, and LG3BP. In some embodiments, the panel comprises at least 1, 2, 3, 4, 5, 6, 7, or 8 proteins selected from LRP1, COIA1, ALDOA, LG3BP, BGH3, PRDX1, TETN, and ISLR. In some embodiments, the panel comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 proteins selected from LRP1, COIA1, ALDOA, LG3BP, BGH3, PRDX1, TETN, ISLR, TSP1, GRP78, FRIL, FIBA, and GSLG1.

[0123]

Optionally, the panel includes at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or 36 proteins selected from TSP1, COIA1, ISLR, TETN, FRIL, GRP78, ALDOA, BGH3, LG3BP, LRP1, FIBA, PRDX1, GSLG1, KIT, CD14, EF1A1, TENX, AIFM1, GGH, IBP3, ENPL, ERO1A, 6PGD, ICAM1, PTPA, NCF4, SEM3G, 1433T, RAP2B, MMP9, FOLH1, GSTP1, EF2, RAN, SODM, and DSG2.

[0124]

Optionally, the panel includes at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, or 44 proteins selected from FRIL, TSP1, LRP1, PRDX1, TETN, TBB3, COIA1, GGH, A1AG1, AIFM1, AMPN, CRP, GSLG1, IBP3, KIT, NRP1, 6PGD, CH10, CLIC1, COF1, CSF1, CYTB, DMKN, DSG2, EREG, ERO1A, FOLH1, ILEU, K1C19, LYOX, MMP7, NCF4, PDIA3, PTGIS, PTPA, RAN, SCF, SEM3G, TBA1B, TCPA, TERA, TIMP1, TNF12, and UGPA.

[0125]

The subject has or is suspected of having a pulmonary nodule. The pulmonary nodule has a diameter of less than or equal to 3 cm. In one embodiment, the pulmonary nodule has a diameter of about 0.8 cm to 2.0 cm. The subject may have stage IA lung cancer (i.e., the tumor is smaller than 3 cm).

[0126]

The score is calculated from a logistic regression model applied to the protein measurements. For example, the score is determined as $P_s = 1/[1+\exp(-a-\sum_{i=1}^{N}\beta_i \cdot \check{I}_{i,s})]$, where $\check{I}_{i,s}$ is the logarithmically transformed and normalized intensity of transition $i$ in said sample ($s$), $\beta_i$ is the corresponding logistic regression coefficient, $a$ is a panel-specific constant, and $N$ is the total number of transitions in said panel.
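
The logistic score described above may be sketched in Python as follows. The function and parameter names are hypothetical, and the example constant, coefficients, and intensities are illustrative values rather than fitted parameters from this disclosure. The intensities are assumed to have been logarithmically transformed and normalized beforehand.

```python
import math
from typing import Sequence


def probability_score(a: float,
                      betas: Sequence[float],
                      intensities: Sequence[float]) -> float:
    """Logistic score 1 / (1 + exp(-(a + sum_i beta_i * I_i))).

    a: panel-specific constant.
    betas: logistic regression coefficients, one per transition.
    intensities: log-transformed, normalized transition intensities.
    """
    if len(betas) != len(intensities):
        raise ValueError("one coefficient is required per transition")
    linear = a + sum(b * i for b, i in zip(betas, intensities))
    # Two algebraically equal forms are used so that exp() never overflows.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


# Hypothetical two-transition panel whose linear term sums to zero.
example_probability = probability_score(0.0, [1.0, -1.0], [2.0, 2.0])
```

The returned value lies between 0 and 1 and increases with the linear term, so a larger weighted sum of intensities gives a higher score.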

[0127]

In various embodiments, the method of the present invention further comprises normalizing the protein measurements. For example, the protein measurements are normalized by one or more proteins selected from PEDF, MASP1, GELS, LUM, C163A and PTPRJ.
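
The paragraph above does not specify how the normalization is computed. One possible scheme, shown here purely as an assumption, is to subtract the mean log-intensity of the normalizer proteins from the log-intensity of each remaining protein. The function name and the example intensities are hypothetical.

```python
import math
from typing import Dict, Iterable, Mapping


def normalize_log_intensities(raw: Mapping[str, float],
                              normalizers: Iterable[str]) -> Dict[str, float]:
    """Log-transform intensities and subtract the normalizer mean.

    raw: positive raw intensities keyed by protein label.
    normalizers: labels of the proteins used as internal references.
    Returns normalized log-intensities for the non-normalizer proteins.
    """
    present = [name for name in normalizers if name in raw]
    if not present:
        raise ValueError("no normalizer protein was measured")
    offset = sum(math.log(raw[name]) for name in present) / len(present)
    return {name: math.log(value) - offset
            for name, value in raw.items() if name not in present}


# Hypothetical intensities chosen so that the logs are 2, 4 and 5.
example_raw = {"PEDF": math.exp(2.0), "LUM": math.exp(4.0), "LG3BP": math.exp(5.0)}
example_normalized = normalize_log_intensities(example_raw, ["PEDF", "LUM"])
```

Other schemes, such as a median or a ratio to a single reference protein, would fit the same description equally well.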

[0128]

The biological sample is, for example, tissue, blood, plasma, serum, whole blood, urine, saliva, genital secretion, cerebrospinal fluid, sweat or excreta.

[0129]

In one aspect, the likelihood of cancer is determined by the sensitivity, specificity, negative predictive value or positive predictive value associated with the score. The score has a negative predictive value (NPV) of at least about 60%, at least 70% or at least 80%.
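
For reference, the four performance measures named above follow from the counts of true and false calls at a given threshold. The sketch below uses standard definitions; the function name and the example counts are hypothetical and do not represent results from this disclosure.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    tp/fn: diseased subjects called positive/negative.
    tn/fp: non-diseased subjects called negative/positive.
    """
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("a required denominator is zero")
    return {
        "sensitivity": tp / (tp + fn),  # diseased subjects called positive
        "specificity": tn / (tn + fp),  # non-diseased subjects called negative
        "ppv": tp / (tp + fp),          # positive calls that are diseased
        "npv": tn / (tn + fn),          # negative calls that are non-diseased
    }


# Hypothetical counts for illustration only.
example_metrics = diagnostic_metrics(tp=40, fp=30, tn=70, fn=10)
```

Unlike sensitivity and specificity, the predictive values depend on the proportion of diseased subjects in the cohort, so an NPV applies to the population in which it was measured.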

[0130]

The measuring step is performed by selected reaction monitoring mass spectrometry, using a compound that specifically binds the protein being detected or a peptide transition. In one embodiment, the compound that specifically binds to the protein being measured is an antibody or an aptamer.

[0131]

In specific embodiments, the diagnostic methods disclosed herein are used to rule out a treatment protocol for a subject by measuring the abundance of a panel of proteins in a sample obtained from the subject, calculating a probability of cancer score based on the protein measurements and ruling out the treatment protocol for the subject if the score determined in the sample is lower than a pre-determined score. In some embodiments the panel contains at least 3 proteins selected from ALDOA, FRIL, LG3BP, IBP3, LRP1, ISLR, TSP1, COIA1, GRP78, TETN, PRDX1 and CD14.

[0132]

Optionally, the panel further comprises one or more proteins selected from ERO1A, 6PGD, GSTP1, GGH, PRDX1, CD14, PTPA, ICAM1, FOLH1, SODM, FIBA, GSLG1, RAP2B, or C163A or one or more proteins selected from LRP1, COIA1, TSP1, ALDOA, GRP78, FRIL, LG3BP, BGH3, ISLR, PRDX1, FIBA, or GSLG. In preferred embodiments, the panel contains at least TSP1, LG3BP, LRP1, ALDOA, and COIA1. In a more preferred embodiment, the panel contains at least TSP1, LRP1, ALDOA and COIA1.

[0133]

In specific embodiments, the diagnostic methods disclosed herein are used to rule in a treatment protocol for a subject by measuring the abundance of a panel of proteins in a sample obtained from the subject, calculating a probability of cancer score based on the protein measurements and ruling in the treatment protocol for the subject if the score determined in the sample is greater than a pre-determined score. In some embodiments the panel contains at least 3 proteins selected from ALDOA, FRIL, LG3BP, IBP3, LRP1, ISLR or TSP1 or from ALDOA, FRIL, LG3BP, IBP3, LRP1, ISLR, TSP1, COIA1, GRP78, TETN, PRDX1 and CD14. Optionally, the panel further comprises one or more proteins selected from ERO1A, 6PGD, GSTP1, COIA1, GGH, PRDX1, SEM3G, GRP78, TETN, AIFM1, MPR1, TNF12, MMP9 or OSTP or from COIA1, TETN, GRP78, APOE or TBB3.

[0134]

In some embodiments, the panel comprises LG3BP and C163A.

[0135]

In certain embodiments, the diagnostic methods disclosed herein can be used in combination with other clinical assessment methods, including for example various radiographic and/or invasive methods. Similarly, in certain embodiments, the diagnostic methods disclosed herein can be used to identify candidates for other clinical assessment methods, or to assess the likelihood that a subject will benefit from other clinical assessment methods.

[0136]

The high abundance of certain proteins in a biological sample such as plasma or serum can hinder the ability to assay a protein of interest, particularly where the protein of interest is expressed at relatively low concentrations. Several methods are available to circumvent this issue, including enrichment, separation, and depletion. Enrichment uses an affinity agent to extract proteins from the sample by class, e.g., removal of glycosylated proteins by glycocapture. Separation uses methods such as gel electrophoresis or isoelectric focusing to divide the sample into multiple fractions that largely do not overlap in protein content. Depletion typically uses affinity columns to remove the most abundant proteins in blood, such as albumin, by utilizing advanced technologies such as IgY14/Supermix (Sigma, St. Louis, MO) that enable the removal of the majority of the most abundant proteins.

[0137]

In certain embodiments of the methods provided herein, a biological sample may be subjected to enrichment, separation, and/or depletion prior to assaying biomarker or putative biomarker protein expression levels. In certain of these embodiments, blood proteins may be initially processed by a glycocapture method, which enriches for glycosylated proteins, allowing quantification assays to detect proteins in the high pg/ml to low ng/ml concentration range. Exemplary methods of glycocapture are well known in the art (see, e.g., U.S. Pat. No. 7,183,188; U.S. Patent Appl. Publ. No. 2007/0099251; U.S. Patent Appl. Publ. No. 2007/0202539; U.S. Patent Appl. Publ. No. 2007/0269895; and U.S. Patent Appl. Publ. No. 2010/0279382). In other embodiments, blood proteins may be initially processed by a protein depletion method, which allows for detection of commonly obscured biomarkers in samples by removing abundant proteins. In one such embodiment, the protein depletion method is a Supermix (Sigma) depletion method.

[0138]

In certain embodiments, a biomarker protein panel comprises two to 100 biomarker proteins. In certain of these embodiments, the panel comprises 2 to 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 5 to 25, 26 to 30, 31 to 40, 41 to 50, 25 to 50, 51 to 75, or 76 to 100 biomarker proteins. In certain embodiments, a biomarker protein panel comprises one or more subpanels of biomarker proteins that each comprise at least two biomarker proteins. For example, a biomarker protein panel may comprise a first subpanel made up of biomarker proteins that are overexpressed in a particular lung condition and a second subpanel made up of biomarker proteins that are under-expressed in a particular lung condition.

[0139]

In certain embodiments of the methods, compositions, and kits provided herein, a biomarker protein may be a protein that exhibits differential expression in conjunction with lung cancer. For example, in certain embodiments a biomarker protein may be one of the proteins associated with lung cancer set forth in Table 6.

[0140]

In other embodiments, the diagnosis methods disclosed herein may be used to distinguish between two different lung conditions. For example, the methods may be used to classify a lung condition as malignant lung cancer versus benign lung cancer, NSCLC versus SCLC, or lung cancer versus non-cancer condition (e.g., inflammatory condition).

[0141]

In certain embodiments, kits are provided for diagnosing a lung condition in a subject. These kits are used to detect expression levels of one or more biomarker proteins. Optionally, a kit may comprise instructions for use in the form of a label or a separate insert. The kits can contain reagents that specifically bind to proteins in the panels described herein. These reagents can include antibodies. The kits can also contain reagents that specifically bind to mRNA expressing proteins in the panels described herein. These reagents can include nucleotide probes. The kits can also include reagents for the detection of reagents that specifically bind to the proteins in the panels described herein. These reagents can include fluorophores.

[0142]

The following examples are provided to better illustrate the claimed invention and are not to be interpreted as limiting the scope of the invention. To the extent that specific materials are mentioned, it is merely for purposes of illustration and is not intended to limit the invention. One skilled in the art may develop equivalent means or reactants without the exercise of inventive capacity and without departing from the scope of the invention.

EXAMPLES

Example 1: Identification of lung cancer biomarker proteins

[0143]

A retrospective, case-control study design was used to identify biomarker proteins and panels thereof for diagnosing various lung diseases in pre-defined control and experimental groups. The first goal of these studies was to demonstrate statistically significant differential expression for individual proteins between control and experimental groups. The second goal was to identify a panel of proteins which all individually demonstrate statistically significant differential expression between control and experimental groups. This panel of proteins can then be used collectively to distinguish between dichotomous disease states.

[0144]

Specific study comparisons may include 1) cancer vs. non-cancer, 2) small cell lung cancer versus non-small cell lung cancer (NSCLC), 3) cancer vs. inflammatory disease state (e.g., infectious granuloma), or 4) different nodule size, e.g., <10 mm versus ≥10 mm (alternatively using 10, 15 or 20 mm cut-offs depending upon sample distributions).

[0145]

Data for each subject consisted of the following:

[0146]

Archived plasma samples from subjects previously enrolled in Institutional Review Board (IRB)-approved studies were used to identify biomarker proteins and biomarker panels for distinguishing lung malignancies from non-malignancies. Plasma samples were originally obtained by routine phlebotomy, aliquoted, and stored at −80° C. or lower. Sample preparation, assignment of subject identification codes, initial subject record entry, and specimen storage were performed as per IRB study protocols. Sample eligibility is based on clinical parameters, including the subject, PN, and clinical staging parameters. Parameters for inclusion and exclusion are set forth in Table 7.

[0147]

Sample Inclusion Criteria
 Sample eligibility will be based on clinical parameters, including the following subject, nodule and clinical staging parameters:
 Subject
  age ≥40
  any smoking status, e.g. current, former, or never
  co-morbid conditions, e.g. COPD
  prior malignancy - only skin carcinomas - squamous or basal cell
 Nodule
  radiology
   size ≥4 mm and ≤30 mm
   solid, semi-solid or non-solid
   any spiculation or ground glass opacity
  pathology
   malignant - e.g. adenocarcinoma, squamous, or large cell
   benign - inflammatory (e.g. granulomatous, infectious) or non-inflammatory (e.g. hamartoma)
   confirmed by biopsy, surgery or stability of lung nodule for 2 years or more.
 Clinical stage
  Primary tumor: ≤T1 (e.g. 1A, 1B)
  Regional lymph nodes: N0 or N1 only
  Distant metastasis: M0 only
Sample Exclusion Criteria
 Subject
  prior malignancy within 5 years of lung nodule diagnosis
 Nodule
  size data unavailable
  for cancer or benign nodule, no pathology or follow-up CT data available
 Clinical stage
  Primary tumor: ≥T2
  Regional lymph nodes: ≥N2
  Distant metastasis: ≥M1

[0148]

The assignment of a sample to a control or experimental group, and its further stratification or matching to other samples within and between these groups, is dependent on various clinical data about the subject. This data includes, for example, demographic information such as age, gender, and clinical history (e.g., smoking status), co-morbid conditions, PN characterization, and pathologic interpretation of resected lesions and tissues (Table 8).

[0149]

1. Enrollment Data
 a. Demographics - age, birth date, gender, ethnicity
 b. Measurements - height (cm) and weight (kg)
 c. Smoking history - never, former, or current with pack-year estimation
 d. Medical history - details of co-morbid conditions, e.g. chronic obstructive pulmonary disease (COPD), inflammatory or autoimmune diseases, endocrine (diabetes), and cardiovascular
 e. Medication history - current medications, dosages and indications
 f. Radiographic data and nodule characteristics
  1) nodule size in millimeters (width × height × length)
  2) location, e.g. right or left and upper, lower or middle
  3) quality, e.g. solid, semi-solid, ground glass, calcified, etc.
2. Diagnostic Evaluation Data
 a. Primary diagnosis and associated reports (clinical history, physical exam, and laboratory tests report)
 b. Pulmonary Function Tests (PFTs), if available
 c. Follow-up CT scans - subsequent nodule evaluations by chest CT
 d. PET scan
 e. Clinical Staging
 f. Biopsy procedures
  1) FNA or TTNA
  2) bronchoscopy with transbronchial or needle biopsy
  3) surgical diagnostic procedures, e.g. VATS and/or thoracotomy
3. Radiology Report(s)
4. Pathology Report(s)
5. Blood Sample Collection Information
6. Reporting of Adverse Events
 a. AEs resulting from center's SOC, e.g. procedural morbidity.

Subject
 demographics - e.g. age, gender, ethnicity
 smoking status - e.g. never-, former- (“ex-”) or current-smoker; pack-years
 clinical history - e.g. co-morbid conditions, e.g. COPD, infection
Nodule
 size - e.g. planar (width × height × length) and volume dimensions
 appearance - e.g. calcifications, ground glass appearance, eccentricity
Pathology
 primary lung vs. systemic disorder
 malignancy status - malignant vs. benign (vs. indeterminate)
 histopathology - e.g. small cell lung cancer (SCLC) vs. non-small cell lung cancer (NSCLC - adenocarcinoma, squamous carcinoma, large cell carcinoma); other types, e.g. hematologic, carcinoid, etc.
 immunologically quiescent, e.g. hamartoma, vs. inflammatory, e.g. granulomatous and/or infectious, e.g. fungal

[0150]

The study design and analytical plan prioritizes the control:experimental group pairings set forth in Table 9. Additional clinical and molecular insights may be gained by selective inclusion of phenotypes, e.g. effect of smoking, in the assignment of experimental and control groups. Demographic information available in the clinical database will enable further refinements in sample selection via the stratification or matching of samples in the case-control analyses with respect to clinical parameters, e.g., age and nodule size.

[0151]

Assignment of Experimental and Control Groups to Achieve Proteomic Analysis Objectives
Analysis Objective | Experimental Group | Control Group
1. Differentiate cancer from benign lung nodule | A. Cancer nodule | Any non-malignant (benign) phenotype with nodule ≥4 mm in diameter
2. Differentiate cancer from non-malignant (inflammatory, infectious) lung nodule | A. Cancer nodule | Non-malignant (non-benign) lung disorder, e.g. granulomatous (fungal) disease, with nodule

[0152]

LC-SRM-MS is performed to identify and quantify various plasma proteins in the plasma samples. Prior to LC-SRM-MS analysis, each sample is depleted using IgY14/Supermix (Sigma) and then trypsin-digested. Samples from each control or experimental group are batched randomly and processed together on a QTrap 5500 instrument (AB SCIEX, Foster City, CA) for unbiased comparisons. Each sample analysis takes approximately 30 minutes. Peak areas for two transitions (native and heavy label) are collected and reported for all peptides and proteins. The data output for each protein analyzed by LC-SRM-MS typically yields four measurements consisting of two transition measurements from each of two peptides from the same protein. These measurements enable an inference of the relative abundance of the target protein, which will be used as its expression level in the bioinformatics and statistical analyses.

[0153]

Identification of biomarker proteins having differential expression levels between the control and experimental groups yields one or more novel proteomic profiles. For example, biomarker proteins are identified with expression levels that differ in subjects with PNs who are diagnosed with NSCLC versus those without an NSCLC diagnosis, or in subjects with PNs who are diagnosed with NSCLC versus an inflammatory disorder. Panels of biomarker proteins are also identified which can collectively discriminate between dichotomous disease states.

[0154]

Analyses may be (a priori) powered appropriately to control type 1 and type 2 errors at 0.05 and to detect inter-cohort differences of 25% per analyte. The diagnostic power of individual proteins is generally assessed to distinguish between two cohorts, assuming a one-sided paired non-parametric test is used. This provides a lower bound on the sample size required to demonstrate differential expression between experimental and control groups. Multiple testing effects apply for the identification of panels of proteins for assessing diagnostic efficacy, which requires larger sample sizes.

[0155]

The sequence of steps for determining statistical significance for differential expression of an individual protein includes the following: 1) assessing and correlating the calibrated values of transitions of a single protein (a quality control measure); 2) comparing paired analysis of groups to control for other influences using the Mann-Whitney U-test (rank sum) to determine statistical significance; and 3) determining its significance based on a pre-defined significance threshold. Transitions within a protein that are not correlated across samples (e.g., Pearson correlation <0.5) will be deemed unreliable and excluded from the analysis.
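
The transition quality-control step described above can be sketched in Python as follows. This is a minimal illustration under one stated assumption: a transition is kept when it reaches a Pearson correlation of at least 0.5 with at least one other transition of the same protein. The text does not specify how more than two transitions are compared, and all names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def reliable_transitions(transitions, min_r=0.5):
    """transitions: dict mapping transition name -> calibrated values across
    the same ordered set of samples, all for a single protein.

    Assumption: keep a transition if it correlates (r >= min_r) with at least
    one other transition of that protein; the rest are deemed unreliable.
    """
    names = list(transitions)
    kept = []
    for name in names:
        others = (o for o in names if o != name)
        if any(pearson(transitions[name], transitions[o]) >= min_r for o in others):
            kept.append(name)
    return kept
```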

[0156]

Comparison of calibrated samples between two cohorts, e.g., cancer and non-cancer, requires pairing or matching using a variety of clinical parameters such as nodule size, age and gender. Such pairing controls for the potential influence of these other parameters on the actual comparison goal, e.g. cancer and non-cancer. A non-parametric test such as the Mann-Whitney U-test (rank sum) will then be applied to measure the statistical difference between the groups. The resulting p value can be adjusted using multiple testing corrections such as the false discovery rate. Permutation tests can be used for further significance assessments.
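
The multiple testing correction mentioned above can be illustrated with the Benjamini-Hochberg procedure. The text refers only to "the false discovery rate" and does not name a specific procedure, so the choice of Benjamini-Hochberg is an assumption, as are the function and variable names in this sketch.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A protein would then be retained if its adjusted p-value satisfies the pre-defined significance threshold.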

[0157]

Significance will be determined by the satisfaction of a pre-defined threshold, such as 0.05, to filter out assays, with the potential use of higher threshold values for additional filtering. An additional significance criterion is that two of three replicate assays must individually be significant in order for the assay, e.g., single protein, to be significant.
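
The replicate criterion above reduces to a simple count, sketched here in Python. The strict comparison (p below the threshold) and the names used are assumptions made for illustration.

```python
def assay_is_significant(replicate_p_values, threshold=0.05, min_significant=2):
    """Return True if enough replicate assays are individually significant.

    Assumption: a replicate is significant when its p-value is below the
    threshold; by default two of the replicates must qualify.
    """
    n_significant = sum(1 for p in replicate_p_values if p < threshold)
    return n_significant >= min_significant
```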

[0158]

Panels of proteins that individually demonstrate statistically significant differential expression as defined above and which can collectively be used to distinguish dichotomous disease states are identified using statistical methods described herein. This requires developing multivariate classifiers and assessing sensitivity, specificity, and ROC AUC for panels. In addition, protein panels with optimal discriminatory performance, e.g., ROC AUC, are identified and may be sufficient for clinical use in discriminating disease states.

[0159]

The sequence of steps for determining the statistical significance of the discriminatory ability of a panel of proteins includes 1) developing multivariate classifiers for protein panels, and 2) identifying a protein panel with optimal discriminatory performance, e.g. ROC AUC, for a set of disease states.

[0160]

A multivariate classifier (e.g., majority rule) will be developed for protein panels, including single protein assays deemed to be significant. The sensitivity and specificity of each classifier will be determined and used to generate a receiver operating characteristics (ROC) curve and its AUC to assess a given panel's discriminatory performance for a specific comparison, e.g. cancer versus non-cancer.
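
A majority rule classifier of the kind mentioned above can be sketched as follows. The per-protein cut-offs, the direction flags, and the protein labels P1 to P3 are hypothetical placeholders, not values from this disclosure.

```python
def majority_rule_classifier(sample, cutoffs, higher_in_cancer):
    """Return True (cancer call) when more than half of the proteins vote cancer.

    sample           -- dict protein -> measured expression level
    cutoffs          -- dict protein -> decision cut-off (hypothetical values)
    higher_in_cancer -- dict protein -> True if the protein is over-expressed
                        in cancer, False if under-expressed
    """
    votes = 0
    for protein, cutoff in cutoffs.items():
        value = sample[protein]
        if higher_in_cancer[protein]:
            if value > cutoff:
                votes += 1
        elif value < cutoff:
            votes += 1
    return votes * 2 > len(cutoffs)

# Hypothetical three-protein panel.
cutoffs = {"P1": 1.0, "P2": 1.0, "P3": 2.0}
direction = {"P1": True, "P2": False, "P3": True}
call = majority_rule_classifier({"P1": 2.0, "P2": 0.5, "P3": 1.0}, cutoffs, direction)
```

Applying the classifier to samples of known status yields the sensitivity and specificity used to assess the panel.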

[0000]

Protocol

[0161]

1. Review clinical data from a set of subjects presenting with lung disease.

[0162]

2. Provide plasma samples from the subjects, wherein each subject has a benign condition, cancer, COPD or another lung disease.

[0163]

3. Group the plasma samples that are benign or cancerous by PNs that are separated by size of the nodule.

[0164]

4. Target a pool of 371 putative lung cancer biomarker proteins consisting of at least two peptides per protein and at least two LC-SRM-MS transitions per peptide. Measure the LC-SRM-MS transitions in each specimen along with 5 synthetic internal standards consisting of 10 transitions, so that peptide transitions from the plasma can be compared to the synthetic internal standards by LC-SRM-MS.

[0165]

5. Quantitate the intensity of each transition.

[0166]

6. Normalize the quantitated transitions to internal standards to obtain a normalized intensity.

[0167]

7. Review the measured peptide transitions for correlations from the same peptide, rejecting discordant transitions.

[0168]

8. Generate an ROC curve for each transition by comparing cancerous with benign samples. (An ROC curve compares sensitivity (true positive rate) to 1−specificity (false positive rate).)

[0169]

9. Define the AUC for each transition. (An AUC of 0.5 is a random classifier; 1.0 is a perfect classifier).

[0170]

10. Determine an AUC cut-off point to determine transitions that are statistically significant.

[0171]

11. Define the transitions that exceed the AUC cutoff point.

[0172]

12. Combine all pairings of significant transitions.

[0173]

13. Define a new AUC for each transition pair by means of logistic regression.

[0174]

14. Repeat the combinations for triples, quadruples, etc., defining a new AUC based upon the logistic regression of the combined transitions, until a panel of biomarker transitions with the desired combined performance (sensitivity and specificity) has been achieved.

[0175]

15. Verify the panel of biomarker transitions against a previously unused set of plasma samples.
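
Steps 8 to 12 of the protocol above can be sketched in Python as follows. The sketch computes the AUC of each transition as the fraction of cancer/benign sample pairs in which the cancer sample has the higher intensity (ties counted as one half), keeps transitions whose AUC exceeds a cut-off, and enumerates candidate combinations. Fitting the logistic regression for each combination (steps 13 and 14) is not shown, and all names and the example data are hypothetical.

```python
from itertools import combinations

def roc_auc(cancer_values, benign_values):
    """AUC as the probability that a cancer sample outranks a benign sample."""
    if not cancer_values or not benign_values:
        raise ValueError("both groups must contain at least one sample")
    wins = 0.0
    for c in cancer_values:
        for b in benign_values:
            if c > b:
                wins += 1.0
            elif c == b:
                wins += 0.5
    return wins / (len(cancer_values) * len(benign_values))

def significant_transitions(cancer, benign, auc_cutoff):
    """cancer, benign: dict transition -> list of normalized intensities.
    Returns dict transition -> AUC for transitions whose AUC exceeds the cut-off."""
    aucs = {t: roc_auc(cancer[t], benign[t]) for t in cancer}
    return {t: a for t, a in aucs.items() if a > auc_cutoff}

def candidate_panels(transitions, size):
    """All combinations of the given size (pairs, triples, ...)."""
    return list(combinations(sorted(transitions), size))
```

Transitions that are lower in cancer would have an AUC below 0.5 under this convention; handling them with a mirrored cut-off is a design choice the protocol does not spell out.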

Example 2: Diagnosis/Classification of Lung Disease Using Biomarker Proteins

[0176]

Plasma samples will be obtained from one or more subjects presenting with PNs to evaluate whether the subjects have a lung condition. The plasma samples will be depleted using IgY14/Supermix (Sigma) and optionally subjected to one or more rounds of enrichment and/or separation, and then trypsinized. The expression level of one or more biomarker proteins previously identified as differentially expressed in subjects with the lung condition will be measured using an LC-SRM-MS assay. The LC-SRM-MS assay will utilize two to five peptide transitions for each biomarker protein. For example, the assay may utilize one or more of the peptide transitions generated from any of the proteins listed in Table 6. Subjects will be classified as having the lung condition if one or more of the biomarker proteins exhibit expression levels that differ significantly from the pre-determined control expression level for that protein.

Example 3: Blood-Based Diagnostic Test to Determine the Likelihood that a Pulmonary Nodule (PN) is Benign or Malignant

[0177]

A panel of 15 proteins was created where the concentration of these 15 proteins relative to the concentration of 6 protein standards is indicative of likelihood of cancer. The relative concentration of these 15 proteins to the 6 protein standards was measured using a mass spectrometry methodology. A classification algorithm is used to combine these relative concentrations into a relative likelihood of the PN being benign or malignant. Further it has been demonstrated that there are many variations on these panels that are also diagnostic tests for the likelihood that a PN is benign or malignant. Variations on the panel of proteins, protein standards, measurement methodology and/or classification algorithm are described herein.

[0000]

Study Design

[0178]

A Selected Reaction Monitoring (SRM) mass spectrometry (MS) assay was developed consisting of 1550 transitions from 345 lung cancer associated proteins. The SRM-MS assay and methodology are described above. The goal of this study was to develop a blood-based diagnostic for classifying PNs under 2 cm in size as benign or malignant. The study design appears in Table 10.

[0179]

Study Design
          | Small (<2 cm)         | Large (>2 cm)
          | Laval | UPenn | NYU   | Laval | UPenn | NYU
Benign    | 14    | 29    | 29    | 13    | 21    | 15
Malignant | 14    | 29    | 29    | 13    | 21    | 15
Batches   | 1     | 2     | 2     | 1     | 2     | 1
          | 72 vs. 72 (94% power) | 49 vs. 49 (74% power)

[0180]

The study consisted of 242 plasma samples from three sites (Laval, UPenn and NYU). The number of benign and malignant samples from each site is indicated in Table 10. The study consisted of 144 plasma samples from patients with PNs of size 2 cm or less and 98 samples from patients with PNs of size larger than 2 cm. This resulted in an estimated power of 94% for discovering proteins whose blood concentrations differ 1.5-fold or more between benign and malignant samples with PNs of size 2 cm or less. Power is 74% for PNs of size larger than 2 cm.

[0181]

This study was a retrospective multisite study that was intended to derive protein biomarkers of lung cancer that are robust to site-to-site variation. The study included samples larger than 2 cm to ensure that proteins not detectable due to the limit of detection of the measurement technology (LC-SRM-MS) for tumors of size 2 cm or less could still be detected in tumors larger than 2 cm.

[0182]

Samples from each site and in each size class (above and below 2 cm) were matched on nodule size, age and gender.

[0000]

Sample Analysis

[0183]

Each sample was analyzed using the LC-SRM-MS measurement methodology as follows:

[0184]

1. Samples were depleted of high abundance proteins using the IgY14 and Supermix depletion columns from Sigma-Aldrich.

[0185]

2. Samples were digested using trypsin into tryptic peptides.

[0186]

3. Samples were analyzed by LC-SRM-MS using a 30-minute gradient on a Waters nanoACQUITY LC system followed by SRM-MS analysis of the 1550 transitions on an AB SCIEX 5500 triple quad device.

[0187]

4. Raw transition ion counts were obtained and recorded for each of the 1550 transitions.

[0188]

It is important to note that matched samples were processed at each step either in parallel (steps 2 and 4) or back-to-back serially (steps 1 and 3). This minimizes analytical variation. Finally, steps 1 and 2 of the sample analysis are performed in batches of samples according to day of processing. There were five batches of ‘small’ samples and four batches of ‘large’ samples as denoted in Table 10.

[0000]

Protein Shortlist

[0189]

A shortlist of 68 proteins reproducibly diagnostic across sites was derived as follows. Note that each protein can be measured by multiple transitions.

[0190]

Step 1: Normalization

[0191]

Six proteins were identified that had a transition detected in all samples of the study and with low coefficient of variation. For each protein the transition with highest median intensity across samples was selected as the representative transition for the protein. These proteins and transitions are found in Table 11.

[0192]

Normalizing Factors
Protein (Uniprot ID) | Peptide (Amino Acid Sequence) | Transition (m/z)
CD44_HUMAN | YGFIEGHVVIPR (SEQ ID NO: 1) | 272.2
TENX_HUMAN | YEVTVVSVR (SEQ ID NO: 2) | 759.5
CLUS_HUMAN | ASSIIDELFQDR (SEQ ID NO: 3) | 565.3
IBP3_HUMAN | FLNVLSPR (SEQ ID NO: 4) | 685.4
GELS_HUMAN | TASDFITK (SEQ ID NO: 5) | 710.4
MASP1_HUMAN | TGVITSPDFPNPYPK (SEQ ID NO: 6) | 258.10

[0193]

We refer to the transitions in Table 11 as normalizing factors (NFs). Each of the 1550 transitions was normalized by each of the six normalizing factors, where the new intensity of a transition t in a sample s by NF f, denoted New(s,t,f), is calculated as follows:
New(s,t,f)=Raw(s,t)*Median(f)/Raw(s,f)

[0194]

where Raw(s,t) is the original intensity of transition t in sample s; Median(f) is the median intensity of the NF f across all samples; and Raw(s,f) is the original intensity of the NF f in sample s.
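
The normalization formula above can be written out as the following Python sketch. The nested-dictionary data layout and all names are assumptions made for illustration.

```python
from statistics import median

def normalize(raw, transition, nf):
    """Compute New(s,t,f) = Raw(s,t) * Median(f) / Raw(s,f) for every sample s.

    raw        -- dict sample -> dict transition name -> raw intensity
    transition -- name of the transition t to normalize
    nf         -- name of the normalizing-factor transition f
    """
    # Median intensity of the normalizing factor across all samples.
    nf_median = median(raw[s][nf] for s in raw)
    return {s: raw[s][transition] * nf_median / raw[s][nf] for s in raw}
```

For example, a sample whose NF intensity is half the median has its transition intensities doubled.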

[0195]

For each protein and normalized transition, the AUC of each batch was calculated. The NF that minimized the coefficient of variation across the nine batches was selected as the NF for that protein and for all transitions of that protein. Consequently, every protein (and all of its transitions) is now normalized by a single NF.
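
The NF selection can be sketched as below, assuming that for each candidate NF a single list of batch AUC values is available per protein and that the coefficient of variation is the population standard deviation divided by the mean. How the AUCs of several transitions of one protein are combined is not specified in the text, so this is an illustration only; all names and values are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed definition)."""
    return pstdev(values) / mean(values)

def select_normalizing_factor(batch_aucs_by_nf):
    """batch_aucs_by_nf: dict NF name -> list of batch AUCs for one protein.
    Returns the NF whose batch AUCs vary least across batches."""
    return min(
        batch_aucs_by_nf,
        key=lambda nf: coefficient_of_variation(batch_aucs_by_nf[nf]),
    )
```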

[0196]

Step 2: Reproducible Diagnostic Proteins

[0197]

For each normalized transition its AUC for each of the nine batches in the study is calculated as follows. If the transition is detected in fewer than half of the cancer samples and in fewer than half of the benign samples then the batch AUC is ‘ND’. Otherwise, the batch AUC is calculated comparing the benign and cancer samples in the batch.
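
The batch AUC rule above, including the 'ND' case, can be sketched as follows. Undetected measurements are represented by None and are dropped before the AUC is computed; that handling, and the names used, are assumptions rather than details stated in the text.

```python
def batch_auc(cancer, benign):
    """Return the AUC for one transition in one batch, or "ND".

    cancer, benign -- lists of normalized intensities; None marks a sample
                      in which the transition was not detected.
    """
    cancer_seen = [v for v in cancer if v is not None]
    benign_seen = [v for v in benign if v is not None]
    # 'ND' when detected in fewer than half of the cancer samples AND
    # in fewer than half of the benign samples.
    if len(cancer_seen) < len(cancer) / 2 and len(benign_seen) < len(benign) / 2:
        return "ND"
    if not cancer_seen or not benign_seen:
        return "ND"  # assumption: no AUC can be computed from an empty group
    wins = 0.0
    for c in cancer_seen:
        for b in benign_seen:
            if c > b:
                wins += 1.0
            elif c == b:
                wins += 0.5
    return wins / (len(cancer_seen) * len(benign_seen))
```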

[0198]

The batch AUC values are transformed into percentile AUC scores for each transition. That is, if a normalized transition is in the 82nd percentile of AUC scores for all transitions then it is assigned percentile AUC 0.82 for that batch.
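
A possible implementation of the percentile transformation is sketched below. It assumes that the percentile of a transition is the fraction of detected transitions in the batch whose AUC is less than or equal to its own, and that 'ND' transitions receive no percentile; neither detail is stated in the text, and the names are hypothetical.

```python
from bisect import bisect_right

def percentile_aucs(batch_aucs):
    """batch_aucs: dict transition -> batch AUC, or the string "ND".
    Returns dict transition -> percentile AUC in (0, 1] for detected transitions."""
    detected = {t: a for t, a in batch_aucs.items() if a != "ND"}
    ordered = sorted(detected.values())
    n = len(ordered)
    # bisect_right counts how many AUCs are <= the transition's own AUC.
    return {t: bisect_right(ordered, a) / n for t, a in detected.items()}
```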

[0199]

Reproducible transitions are those satisfying at least one of the following criteria:

[0200]

1. In at least four of the five small batches the percentile AUC is 75% or more (or 25% and less).

[0201]

2. In at least three of the five small batches the percentile AUC is 80% or more (or 20% and less) AND the remaining percentile AUCs in the small batches are above 50% (below 50%).

[0202]

3. In all five small batches the percentile AUC is above 50% (below 50%).

[0203]

4. In at least three of the four large batches the percentile AUC is 85% or more (or 15% and less).

[0204]

5. In at least three of the four large batches the percentile AUC is 80% or more (or 20% and less) AND the remaining percentile AUCs in the large batches are above 50% (below 50%).

[0205]

6. In all four large batches the percentile AUC is above 50% (below 50%).
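
The six criteria above can be checked programmatically as in the following sketch. Percentile AUCs are given as fractions (0.75 for 75%), None stands for an 'ND' batch and is assumed to fail every condition, and each criterion is tested in the high direction and in the mirrored low direction. The names are hypothetical.

```python
def _passes(pcts, n_strong, high_cut, low_cut, rest_beyond_half):
    """Check one criterion in the high direction or, mirrored, the low direction."""
    if not pcts:
        return False
    directions = (
        (lambda p: p >= high_cut, lambda p: p > 0.5),
        (lambda p: p <= low_cut, lambda p: p < 0.5),
    )
    for strong, beyond_half in directions:
        n_hits = sum(1 for p in pcts if p is not None and strong(p))
        # A strong batch is also beyond 50%, so checking every batch here is
        # equivalent to checking only the remaining batches.
        rest_ok = all(p is not None and beyond_half(p) for p in pcts)
        if n_hits >= n_strong and (rest_ok or not rest_beyond_half):
            return True
    return False

def is_reproducible(small, large):
    """small: percentile AUCs in the five small batches; large: in the four large."""
    return (
        _passes(small, 4, 0.75, 0.25, False)     # criterion 1
        or _passes(small, 3, 0.80, 0.20, True)   # criterion 2
        or _passes(small, 0, 1.0, 0.0, True)     # criterion 3
        or _passes(large, 3, 0.85, 0.15, False)  # criterion 4
        or _passes(large, 3, 0.80, 0.20, True)   # criterion 5
        or _passes(large, 0, 1.0, 0.0, True)     # criterion 6
    )
```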

[0206]

These criteria result in a list of 67 proteins with at least one transition satisfying one or more of the criteria. These proteins appear in Table 12.

[0207]

G3P_HUMAN | 113 | 86% | P04406
 Glyceraldehyde-3-phosphate dehydrogenase
 Short name = GAPDH
 Alternative name(s):
 Peptidyl-cysteine 5-nitrosylase GAPDH
FRIL_HUMAN | 107 | 82% | P02792
 Recommended name: Ferritin light chain
 Short name = Ferritin L subunit
HYOU1_HUMAN | 69 | 53% | Q9Y4L1
 Recommended name: Hypoxia up-regulated protein 1
 Alternative name(s):
 150 kDa oxygen-regulated protein
 Short name = ORP-150
 170 kDa glucose-regulated protein
 Short name = GRP-170
ALDOA_HUMAN | 66 | 50% | P04075
 Recommended name: Fructose-bisphosphate aldolase A
 EC = 4.1.2.13
 Alternative name(s):
 Lung cancer antigen NY-LU-1
 Muscle-type aldolase
HXK1_HUMAN | 65 | 50% | P19367
 Recommended name: Hexokinase-1
 EC = 2.7.1.1
 Alternative name(s):
 Brain form hexokinase
 Hexokinase type I
 Short name = HK I
APOE_HUMAN | 63 | 48% | P02649
 Recommended name: Apolipoprotein E
 Short name = Apo-E
TSP1_HUMAN | 63 | 48% | P07996
 Recommended name: Thrombospondin-1
FINC_HUMAN | 62 | 47% | P02751
 Recommended name: Fibronectin
 Short name = FN
 Alternative name(s):
 Cold-insoluble globulin
 Short name = CIG
 Cleaved into the following 4 chains:
 1. Anastellin
 2. Ugl-Y1
 3. Ugl-Y2
 4. Ugl-Y3
LRP1_HUMAN | 58 | 44% |
 Recommended name: Prolow-density lipoprotein receptor-related protein 1
 Short name = LRP-1
 Alternative name(s):
 Alpha-2-macroglobulin receptor
 Short name = A2MR
 Apolipoprotein E receptor
 Short name = APOER
 CD_antigen = CD91
 Cleaved into the following 3 chains:
 1. Low-density lipoprotein receptor-related protein 1 85 kDa subunit
 Short name = LRP-85
 2. Low-density lipoprotein receptor-related protein 1 515 kDa subunit
 Short name = LRP-515
 3. Low-density lipoprotein receptor-related protein 1 intracellular domain
 Short name = LRPICD
6PGD_HUMAN | 50 | 38% | P52209
 Recommended name: 6-phosphogluconate dehydrogenase, decarboxylating
S10A6_HUMAN | 47 | 36% | P06703
 Recommended name: Protein S100-A6
 Alternative name(s):
 Calcyclin
 Growth factor-inducible protein 2A9
 MLN 4
 Prolactin receptor-associated protein
 Short name = PRA
 S100 calcium-binding protein A6
CALU_HUMAN | 45 | 34% | O43852
 Recommended name: Calumenin
 Alternative name(s):
 Crocalbin
 IEF SSP 9302
PRDX1_HUMAN | 45 | 34% | Q06830
 Recommended name: Peroxiredoxin-1
 EC = 1.11.1.15
 Alternative name(s):
 Natural killer cell-enhancing factor A
 Short name = NKEF-A
 Proliferation-associated gene protein
 Short name = PAG
 Thioredoxin peroxidase 2
 Thioredoxin-dependent peroxide reductase 2
RAN_HUMAN | 45 | 34% | P62826
 Recommended name: GTP-binding nuclear protein Ran
 Alternative name(s):
 Androgen receptor-associated protein 24
 GTPase Ran
 Ras-like protein TC4
 Ras-related nuclear protein
CD14_HUMAN | 43 | 33% | P08571
 Recommended name: Monocyte differentiation antigen CD14
 Alternative name(s):
 Myeloid cell-specific leucine-rich glycoprotein
 CD_antigen = CD14
 Cleaved into the following 2 chains:
 1. Monocyte differentiation antigen CD14, urinary form
 2. Monocyte differentiation antigen CD14, membrane-bound form
AMPN_HUMAN | 41 | 31% | P15144
 Recommended name: Aminopeptidase N
 Short name = AP-N
 Short name = hAPN
 EC = 3.4.11.2
 Alternative name(s):
 Alanyl aminopeptidase
 Aminopeptidase M
 Short name = AP-M
 Microsomal aminopeptidase
 Myeloid plasma membrane glycoprotein CD13
 gp150
 CD_antigen = CD13
GSLG1_HUMAN | 36 | 27% | Q92896
 Recommended name: Golgi apparatus protein 1
 Alternative name(s):
 CFR-1
 Cysteine-rich fibroblast growth factor receptor
 E-selectin ligand 1
 Short name = ESL-1
 Golgi sialoglycoprotein MG-160
1433Z_HUMAN | 32 | 24% | P63104
 Recommended name: 14-3-3 protein zeta/delta
 Alternative name(s):
 Protein kinase C inhibitor protein 1
 Short name = KCIP-1
IBP3_HUMAN | 31 | 24% | P17936
 Recommended name: Insulin-like growth factor-binding protein 3
 Short name = IBP-3
 Short name = IGF-binding protein 3
 Short name = IGFBP-3
ILK_HUMAN | 31 | 24% | Q13418
 Recommended name: Integrin-linked protein kinase
 EC = 2.7.11.1
 Alternative name(s):
 59 kDa serine/threonine-protein kinase
 ILK-1
 ILK-2
 p59ILK
LDHB_HUMAN | 30 | 23% | P07195
 Recommended name: L-lactate dehydrogenase B chain
 Short name = LDH-B
 EC = 1.1.1.27
 Alternative name(s):
 LDH heart subunit
 Short name = LDH-H
 Renal carcinoma antigen NY-REN-46
MPRI_HUMAN | 29 | 22% | P11717
 Recommended name: Cation-independent mannose-6-phosphate receptor
 Short name = CI Man-6-P receptor
 Short name = CI-MPR
 Short name = M6PR
 Alternative name(s):
 300 kDa mannose 6-phosphate receptor
 Short name = MPR 300
 Insulin-like growth factor 2 receptor
 Insulin-like growth factor II receptor
 Short name = IGF-II receptor
 M6P/IGF2 receptor
 Short name = M6P/IGF2R
 CD_antigen = CD222
PROF1_HUMAN | 29 | 22% | P07737
 Recommended name: Profilin-1
 Alternative name(s):
 Profilin I
PEDF_HUMAN | 28 | 21% | P36955
 Recommended name: Pigment epithelium-derived factor
 Short name = PEDF
 Alternative name(s):
 Cell proliferation-inducing gene 35 protein
 EPC-1
 Serpin F1
CLIC1_HUMAN | 26 | 20% | O00299
 Recommended name: Chloride intracellular channel protein 1
 Alternative name(s):
 Chloride channel ABP
 Nuclear chloride ion channel 27
 Short name = NCC27
 Regulatory nuclear chloride ion channel protein
 Short name = hRNCC
GRP78_HUMAN | 25 | 19% | P11021
 Recommended name: 78 kDa glucose-regulated protein
 Short name = GRP-78
 Alternative name(s):
 Endoplasmic reticulum lumenal Ca(2+)-binding protein grp78
 Heat shock 70 kDa protein 5
 Immunoglobulin heavy chain-binding protein
 Short name = BiP
CEAM8_HUMAN | 24 | 18% | P31997
 Recommended name: Carcinoembryonic antigen-related cell adhesion molecule 8
 Alternative name(s):
 CD67 antigen
 Carcinoembryonic antigen CGM6
Non-specific cross-reacting antigen NCA-95
CD_antigen = CD66b
VTNC_HUMAN2418%Recommended name:P04004
Vitronectin
Alternative name(s):
S-protein
Serum-spreading factor
V75
Cleaved into the following 3 chains:
1. Vitronectin V65 subunit
2. Vitronectin V10 subunit
3. Somatomedin-B
CERU_HUMAN2217%Recommended name:P00450
Ceruloplasmin
EC = 1.16.3.1
Alternative name(s):
Ferroxidase
DSG2_HUMAN2217%Recommended name:Q14126
Desmoglein-2
Alternative name(s):
Cadherin family member 5
HDGC
KIT_HUMAN2217%Recommended name:P10721
Mast/stem cell growth factor receptor Kit
Short name = SCFR
EC = 2.7.10.1
Alternative name(s):
Piebald trait protein
Short name = PBT
Proto-oncogene c-Kit
Tyrosine-protein kinase Kit
p145 c-kit
v-kit Hardy-Zuckerman 4 feline sarcoma
viral oncogene homolog
CD_antigen = CD117
TBB3_HUMAN2217%Recommended name:Q13509
Tubulin beta-3 chain
Alternative name(s):
Tubulin beta-4 chain
Tubulin beta-III
CH10_HUMAN2116%Recommended name:P61604
10 kDa heat shock protein, mitochondrial
Short name = Hsp10
Alternative name(s):
10 kDa chaperonin
Chaperonin 10
Short name = CPN10
Early-pregnancy factor
Short name = EPF
ISLR_HUMAN2116%Immunoglobulin superfamily containingO14498
leucine-rich repeat protein
MASP1_HUMAN2116%Recommended name:P48740
Mannan-binding lectin serine protease 1
EC = 3.4.21.-
Alternative name(s):
Complement factor MASP-3
Complement-activating component of Ra-
reactive factor
Mannose-binding lectin-associated serine
protease 1
Short name = MASP-1
Mannose-binding protein-associated serine
protease
Ra-reactive factor serine protease p100
Short name = RaRF
Serine protease 5
Cleaved into the following 2 chains:
1. Mannan-binding lectin serine protease 1
heavy chain
2. Mannan-binding lectin serine protease 1
light chain
ICAM3_HUMAN2015%Recommended name:P32942
Intercellular adhesion molecule 3
Short name = ICAM-3
Alternative name(s):
CDw50
ICAM-R
CD_antigen = CD50
PTPRJ_HUMAN2015%Recommended name:Q12913
Receptor-type tyrosine-protein
phosphatase eta
Short name = Protein-tyrosine phosphatase
eta
Short name = R-PTP-eta
EC = 3.1.3.48
Alternative name(s):
Density-enhanced phosphatase 1
Short name = DEP-1
HPTP eta
Protein-tyrosine phosphatase receptor type
J
Short name = R-PTP-J
CD_antigen = CD148
A1AG1_HUMAN1915%Recommended name:P02763
Alpha-1-acid glycoprotein 1
Short name = AGP 1
Alternative name(s):
Orosomucoid-1
Short name = OMD 1
CD59_HUMAN1814%Recommended name:P13987
CD59 glycoprotein
Alternative name(s):
1F5 antigen
20 kDa homologous restriction factor
Short name = HRF-20
Short name = HRF20
MAC-inhibitory protein
Short name = MAC-IP
MEM43 antigen
Membrane attack complex inhibition
factor
Short name = MACIF
Membrane inhibitor of reactive lysis
Short name = MIRL
Protectin
CD_antigen = CD59
MDHM_HUMAN1814%Recommended name:P40926
Malate dehydrogenase, mitochondrial
PVR_HUMAN1814%Recommended name:P15151
Poliovirus receptor
Alternative name(s):
Nectin-like protein 5
Short name = NECL-5
CD_antigen = CD155
SEM3G_HUMAN1814%Recommended name:Q9NS98
Semaphorin-3G
Alternative name(s):
Semaphorin sem2
CO6A3_HUMAN1713%Collagen alpha-3(VI) chainP12111
MMP9_HUMAN1713%Recommended name:P14780
Matrix metalloproteinase-9
Short name = MMP-9
EC = 3.4.24.35
Alternative name(s):
92 kDa gelatinase
92 kDa type IV collagenase
Gelatinase B
Short name = GELB
Cleaved into the following 2 chains:
1. 67 kDa matrix metalloproteinase-9
2. 82 kDa matrix metalloproteinase-9
TETN_HUMAN1713%Recommended name:P05452
Tetranectin
Short name = TN
Alternative name(s):
C-type lectin domain family 3 member B
Plasminogen kringle 4-binding protein
TNF12_HUMAN1713%Recommended name:O43508
Tumor necrosis factor ligand superfamily
member 12
Alternative name(s):
APO3 ligand
TNF-related weak inducer of apoptosis
Short name = TWEAK
Cleaved into the following 2 chains:
1. Tumor necrosis factor ligand superfamily
member 12, membrane form
2. Tumor necrosis factor ligand superfamily
member 12, secreted form
BST1_HUMAN1612%Recommended name:Q10588
ADP-ribosyl cyclase 2
EC = 3.2.2.5
Alternative name(s):
Bone marrow stromal antigen 1
Short name = BST-1
Cyclic ADP-ribose hydrolase 2
Short name = cADPr hydrolase 2
CD_antigen = CD157
COIA1_HUMAN1612%Recommended name:P39060
Collagen alpha-1(XVIII) chain
Cleaved into the following chain:
1. Endostatin
CRP_HUMAN1612%Recommended name:P02741
C-reactive protein
Cleaved into the following chain:
1.C-reactive protein(1-205)
PLSL_HUMAN1612%Recommended name:P13796
Plastin-2
Alternative name(s):
L-plastin
LC64P
Lymphocyte cytosolic protein 1
Short name = LCP-1
BGH3_HUMAN1511%Recommended name:Q15582
Transforming growth factor-beta-induced
protein ig-h3
Short name = Beta ig-h3
Alternative name(s):
Kerato-epithelin
RGD-containing collagen-associated
protein
Short name = RGD-CAP
CD44_HUMAN1511%Recommended name:P16070
CD44 antigen
Alternative name(s):
CDw44
Epican
Extracellular matrix receptor III
Short name = ECMR-III
GP90 lymphocyte homing/adhesion
receptor
HUTCH-I
Heparan sulfate proteoglycan
Hermes antigen
Hyaluronate receptor
Phagocytic glycoprotein 1
Short name = PGP-1
Phagocytic glycoprotein I
Short name = PGP-I
CD_antigen = CD44
ENOA_HUMAN1511%Recommended name:P06733
Alpha-enolase
EC = 4.2.1.11
Alternative name(s):
2-phospho-D-glycerate hydro-lyase
C-myc promoter-binding protein
Enolase 1
MBP-1
MPB-1
Non-neural enolase
Short name = NNE
Phosphopyruvate hydratase
Plasminogen-binding protein
LUM_HUMAN1511%
SCF_HUMAN1511%Recommended name:P21583
Kit ligand
Alternative name(s):
Mast cell growth factor
Short name = MGF
Stem cell factor
Short name = SCF
c-Kit ligand
Cleaved into the following chain:
1. Soluble KIT ligand
Short name = sKITLG
UGPA_HUMAN1511%Recommended name:Q16851
UTP--glucose-1-phosphate
uridylyltransferase
EC = 2.7.7.9
Alternative name(s):
UDP-glucose pyrophosphorylase
Short name = UDPGP
Short name = UGPase
ENPL_HUMAN1411%Recommended name:P14625
Endoplasmin
Alternative name(s):
94 kDa glucose-regulated protein
Short name = GRP-94
Heat shock protein 90 kDa beta member 1
Tumor rejection antigen 1
gp96 homolog
GDIR2_HUMAN1411%Recommended name:P52566
Rho GDP-dissociation inhibitor 2
Short name = Rho GDI 2
Alternative name(s):
Ly-GDI
Rho-GDI beta
GELS_HUMAN1411%Recommended name:P06396
Gelsolin
Alternative name(s):
AGEL
Actin-depolymerizing factor
Short name = ADF
Brevin
SODM_HUMAN1411%Recommended name:P04179
Superoxide dismutase [Mn], mitochondrial
TPIS_HUMAN1411%Recommended name:P60174
Triosephosphate isomerase
Short name = TIM
EC = 5.3.1.1
Alternative name(s):
Triose-phosphate isomerase
TENA_HUMAN1310%Recommended name:P24821
Tenascin
Short name = TN
Alternative name(s):
Cytotactin
GMEM
GP 150-225
Glioma-associated-extracellular matrix
antigen
Hexabrachion
JI
Myotendinous antigen
Neuronectin
Tenascin-C
Short name = TN-C
ZA2G_HUMAN1310%Recommended name:P25311
Zinc-alpha-2-glycoprotein
Short name = Zn-alpha-2-GP
Short name = Zn-alpha-2-glycoprotein
LEG1_HUMAN11 8%Recommended name:P09382
Galectin-1
Short name = Gal-1
Alternative name(s):
14 kDa laminin-binding protein
Short name = HLBP14
14 kDa lectin
Beta-galactoside-binding lectin L-14-I
Galaptin
HBL
HPL
Lactose-binding lectin 1
Lectin galactoside-binding soluble 1
Putative MAPK-activating protein PM12
S-Lac lectin 1
FOLH1_HUMAN9 7%Recommended name:Q04609
Glutamate carboxypeptidase 2
EC = 3.4.17.21
Alternative name(s):
Cell growth-inhibiting gene 27 protein
Folate hydrolase 1
Folylpoly-gamma-glutamate
carboxypeptidase
Short name = FGCP
Glutamate carboxypeptidase II
Short name = GCPII
Membrane glutamate carboxypeptidase
Short name = mGCP
N-acetylated-alpha-linked acidic
dipeptidase I
Short name = NAALADase I
Prostate-specific membrane antigen
Short name = PSM
Short name = PSMA
Pteroylpoly-gamma-glutamate
carboxypeptidase
PLXC1_HUMAN9 7%
PTGIS_HUMAN9 7%Recommended name:Q16647
Prostacyclin synthase
EC = 5.3.99.4
Alternative name(s):
Prostaglandin I2 synthase

[0208]

Step 3: Significance and Occurrence

[0209]

To find high-performing panels, 10,000 trials were performed; in each trial, the combined AUC of a random panel of 15 proteins selected from Table 12 was estimated. To calculate the combined AUC of each panel of 15 proteins, the highest-intensity normalized transition was utilized. Logistic regression was used to calculate the AUC of the panel of 15 across all small samples. A total of 131 panels of 15 proteins had a combined AUC above 0.80, as shown in FIG. 1. (The significance by study, separated into small (<2.0 cm) and large (>2.0 cm) PN, is shown in FIG. 2.) The resilience of the panels persisted despite site-based variation in the samples, as shown in FIG. 3. The panels are listed in Table 13.
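The random panel search described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function names `auc` and `random_panel_search` are hypothetical, the AUC is computed with a plain rank-based estimate, and the step that fits a logistic regression to a panel is left to a caller-supplied `score_panel` function, since the text does not specify its implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: chance that a random positive (label 1) outscores
    a random negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def random_panel_search(
    proteins: Sequence[str],
    score_panel: Callable[[Tuple[str, ...]], float],  # hypothetical: returns the panel's combined AUC
    n_trials: int = 10_000,
    panel_size: int = 15,
    threshold: float = 0.80,
    seed: int = 0,
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Draw random panels and keep those whose combined AUC exceeds the threshold."""
    rng = random.Random(seed)
    hits = []
    for _ in range(n_trials):
        panel = tuple(sorted(rng.sample(list(proteins), panel_size)))
        combined_auc = score_panel(panel)
        if combined_auc > threshold:
            hits.append((combined_auc, panel))
    hits.sort(reverse=True)  # best panels first
    return hits
```

In practice `score_panel` would fit a logistic regression on the panel's normalized transitions and return the AUC of its predictions; any fitting routine with that interface could be plugged in.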

[0210]

AUCP1P2P3P4P5P6P7P8
0.8282CD59CALULDHBALDOA DSG2MDHMTENA6PGD
0.8255CD59TSP1KITISLRALDOADSG21433ZCD14
0.8194S10A6ALDOAPVRTSP1CD44CH10PEDFAPOE
0.8189ALDOALEG1CALULDHBTETNFOLH1MASP11433Z
0.8187PVRCD59CRPALDOAGRP78DSG26PGDCD14
0.8171AMPNIBP3CALUCD44BGH3GRP781433Z6PGD
0.8171CALUCH10ALDOABST1MDHMVTNCAPOECD14
0.8165LDHBCO6A3CD44A1AG1GRP78DSG2MDHMVTNC
0.8163TPISCD59S10A6CALUENPLCH10ALDOADSG2
0.8163LEG1AMPNS10A6CALUISLRENOAVTNC6PGD
0.8161AMPNS10A6TSP1MPRIVTNCLUM6PGDAPOE
0.8159ALDOAAMPNTSP1BGH3GRP78PTPRJMASP1CERU
0.8159ALDOACO6A3MPRISEM3GCERULUMAPOECD14
0.8159AMPNCALUISLRSODMCERULUM6PGDAPOE
0.8159CALUPEDFCRPGRP78VTNC1433ZCD14FRIL
0.8157TPISLEG1S10A6LDHBTSP1ENPLMDHM6PGD
0.8155CALUCRPALDOASODMSEM3G1433ZFRILG3P
0.8153CALUMPRIALDOAPEDFDSG2CERUAPOEG3P
0.814LEG1COIA1AMPNS10A6TSP1MPRIPEDFGRP78
0.8138TSP1KITCERU6PGDAPOECD14FRILG3P
0.8132S10A6COIA1AMPNTSP1PEDFISLRPTPRJCERU
0.8128TPISLEG1AMPNS10A6IBP3CALUDSG2PTPRJ
0.8128TPISAMPNTSP1PEDFA1AG1MPRIALDOAVTNC
0.8124ALDOACALULDHBPLSLPEDFMASP16PGDAPOE
0.8124AMPNS10A6TSP1ENOAGRP786PGDAPOEFRIL
0.812IBP3TSP1CRPA1AG1SCFALDOAPEDFDSG2
0.8106COIA1CALUCD44BGH3ALDOATETNBST1LUM
0.8106TSP1PLSLCRPALDOAGRP78MDHMAPOEFRIL
0.8099CD59CALUENPLCD44ALDOATENA6PGDFRIL
0.8097AMPNS10A6IBP3A1AG1MPRIALDOAGRP78FRIL
0.8093ALDOAS10A6TSP1ENPLPEDFA1AG1GRP78APOE
0.8093PVRIBP3LDHBSCFTNF12LUM1433ZFRIL
0.8093CALULDHBCO6A3PEDFCH10BGH3PTPRJALDOA
0.8087ALDOAAMPNENPLKITMPRIGRP78LUM1433Z
0.8087CD59S10A6IBP3TSP1ENPLSODMMDHM6PGD
0.8083ALDOAAMPNS10A6IBP3PLSLCRPSCFMPRI
0.8081PVRIBP3TSP1CRPALDOASODMMDHMTNF12
0.8081S10A6LDHBENPLPLSLCH10CERUFRILG3P
0.8081IBP3LDHBPEDFMPRISEM3GVTNCAPOECD14
0.8079ALDOAAMPNCALUPLSLPEDFCH10MASP1TNF12
0.8077S10A6IBP3LDHBMDHMZA2GFRILG3PHYOU1
0.8077CD59S10A6LDHBTSP1CD44ISLRCERU1433Z
0.8077AMPNCALULDHBTSP1PLSLCD44ALDOATETN
0.8075TPISAMPNS10A6TSP1CH10COIA1CERUZA2G
0.8073CALUPEDFMPRIISLRBGH3ENOACERU1433Z
0.8071TPISCALUCO6A3KITDSG2MASP16PGDAPOE
0.8071LEG1COIA1TSP1CD44MPRIALDOAFOLH1TNF12
0.8065AMPNS10A6CALUCO6A3TSP1PLSLKITMASP1
0.8063S10A6TSP1A1AG1BGH3ZA2G1433ZFRILG3P
0.8063CALUKITENOA6PGDAPOECD14G3PICAM3
0.8061AMPNMPRIGRP78DSG2TENAAPOECD14FRIL
0.8059TPISIBP3TSP1PEDFTNF121433Z6PGDAPOE
0.8059CALULDHBPLSLCRPPEDFSEM3GMDHMAPOE
0.8058ALDOATSP1PLSLCD44KITCRPISLRTNF12
0.8058TPISTSP1MPRIISLRALDOAPEDFGRP78SEM3G
0.8054ALDOAS10A6CALUCRPA1AG1VTNCTENAZA2G
0.8054TPISCO6A3TSP1MPRIDSG2TNF12FRILG3P
0.8054CALULDHBDSG21433ZCD14FRILG3PHYOU1
0.805CALUMPRIENOAFOLH1LUMZA2GAPOECD14
0.8048PVRS10A6IBP3PEDFALDOABST1MDHMVTNC
0.8048AMPNCALUCH10DSG2TNF12CERU6PGDAPOE
0.8046ALDOALDHBTSP1KITISLRDSG2MASP11433Z
0.8046ALDOACOIA1CD59IBP3PTPRJSEM3GCERUCD14
0.8046PVRCD59S10A6PLSLPEDFCH10SCFBST1
0.8046COIA1IBP3MASP1DSG2TENAZA2G1433ZAPOE
0.8042BGH3CD59CALULDHBCO6A3SODMTENAAPOE
0.8042IBP3TSP1ENPLCH10CD14FRILG3PHYOU1
0.8042IBP3TSP1KITZA2G6PGDAPOECD14FRIL
0.804TPISBGH3S10A6LDHBCO6A3CH10PEDFTENA
0.804CALULDHBBGH3TETNFOLH1TNF12VTNCFRIL
0.8038TPISPVRCOIA1CALUSCFMPRIALDOAENOA
0.8036S10A6TPISCOIA1CD59CO6A3TSP1MPRIALDOA
0.8036LEG1CD59AMPNCALUCH10GRP78SEM3GTETN
0.8036AMPNS10A6TSP1ENPLPEDFSODMFOLH16PGD
0.8036S10A6CALUMASP1A1AG1MPRIALDOAVTNCTENA
0.8036IBP3CALUPLSLCD44KITCERU6PGDCD14
0.8036TSP1PLSLFOLH1COIA1TNF12VTNC6PGDFRIL
0.8034ALDOABGH3CD59TSP1KITCH10SODMVTNC
0.8034S10A6CALULDHBTSP1GRP781433Z6PGDG3P
0.8032S10A6CALUTSP1KITCH10PEDFGRP78SEM3G
0.8032TSP1MASP1CRPALDOAGRP78TETNTNF121433Z
0.803AMPNTSP1KITMPRISEM3GTETNDSG21433Z
0.803CALUCO6A3PLSLA1AG1ALDOAGRP786PGDAPOE
0.8028COIA1CD59AMPNTSP1KITISLRALDOAMDHM
0.8024S10A6CD44SCFMPRIISLRALDOAAPOEFRIL
0.8024S10A6TSP1ALDOASODMENOABST1FRILHYOU1
0.8024IBP3TSP1SCFALDOASODMDSG2VTNC1433Z
0.802ALDOATSP1PLSLCD44CH10A1AG1ENOATETN
0.802LEG1CALULDHBTSP1CH10ALDOAMDHMAPOE
0.802CD59IBP3TSP1A1AG1MPRIPTPRJ6PGDAPOE
0.802IBP3TSP1CRPBST1TNF12VTNC1433ZFRIL
0.8018LEG1S10A6IBP3CALUTSP1MASP1A1AG1SCF
0.8018COIA1CD59AMPNCALUMASP1BST1VTNCCERU
0.8018AMPNALDOASODMGRP78MDHMVTNC6PGDFRIL
0.8018LDHBCO6A3ALDOASEM3GDSG26PGDAPOEFRIL
0.8016S10A6LDHBSCFMPRIALDOAPEDFENOASEM3G
0.8016LDHBCO6A3TSP11433ZAPOECD14FRILG3P
0.8014ALDOAPEDFMPRIISLRFOLH1TNF12MASP1CERU
0.8014COIA1PEDFCRPA1AG1ENOACERUFRILG3P
0.8014CD59IBP3TSP1KITMASP1ENOATNF12CD14
0.8014LDHBKITSCFBGH3SEM3GVTNC1433ZFRIL
0.8013PVRAMPNLDHBCD44DSG2TETNMDHMFRIL
0.8013S10A6LDHBTSP1ISLRLUMG3PHYOU1ICAM3
0.8013CALUA1AG1MPRIALDOAPEDFDSG2VTNCZA2G
0.8013TSP1ENPLKITSODMSEM3GDSG2TETNLUM
0.8013TSP1PLSLISLRALDOAENOAMDHMAPOEG3P
0.8011ALDOAAMPNCO6A3SEM3GAPOECD14FRILG3P
0.8011TPISBGH3AMPNS10A6CALULDHBKITTENA
0.8011COIA1IBP3TSP1A1AG1TETNDSG26PGDFRIL
0.8011AMPNS10A6IBP3CALUKITSCFALDOAAPOE
0.8011IBP3A1AG1PEDFSEM3GMDHMTNF12VTNC1433Z
0.8009ALDOABGH3AMPNLDHBTSP1PLSLMPRIISLR
0.8009LEG1COIA1IBP3CH10MASP1SCFALDOATNF12
0.8009AMPNENPLALDOATETNFOLH1BST1ZA2G6PGD
0.8009CALUCO6A3ENPLALDOAGRP78PTPRJVTNCAPOE
0.8009TSP1CH10PTPRJTETNTNF12VTNCTENA1433Z
0.8007CD59S10A6IBP3CO6A3TSP1KITISLRGRP78
0.8007AMPNTSP1KITSCFTETNZA2G1433Z6PGD
0.8007S10A6IBP3TSP1CD44PEDFA1AG1PTPRJSODM
0.8007CALUCO6A3TSP1CH10SCFBGH3ALDOAENOA
0.8007ENPLCD44MASP1GRP781433ZCD14FRILG3P
0.8005TPISLEG1LDHBTSP1MASP1A1AG1MPRIALDOA
0.8005PEDFCRPISLRALDOAGRP78PTPRJZA2G6PGD
0.8003ALDOAS10A6CALUCRPBGH3TETN6PGDCD14
0.8003AMPNTSP1A1AG1MPRIISLRALDOAMASP1LUM
0.8003CO6A3TSP1SCFMPRIISLRFOLH11433ZAPOE
0.8001S10A6IBP3TSP1KITTETNCOIA1CERU6PGD
0.8001S10A6CALUCH10ISLRALDOASODMPTPRJMDHM
0.8001IBP3TSP1ENPLCH10CRPISLRALDOASODM
0.8001IBP3TSP1PTPRJALDOABST1LUM1433ZAPOE
0.8001LDHBTSP1MPRIGRP78SEM3GLUMZA2GFRIL
AUCP9P10P11P12P13P14P15
0.8282APOEFRILG3PHYOU1LRP1RANHXK1
0.8255FRILHYOU1LRP1PROF1TBB3FINCCEAM8
0.8194FRILG3PHYOU1LRP1TBB3CLIC1RAN
0.8189APOEG3PHYOU1PRDX1PROF1ILKHXK1
0.8187FRILG3PPRDX1ILKFINCGSLG1HXK1
0.8171CD14FRILG3PLRP1TBB3FINCRAN
0.8171FRILG3PICAM3PRDX1PROF1PVRHXK1
0.81651433ZFRILG3PS10A6FINCGSLG1HXK1
0.81636PGDFRILG3PHYOU1ICAM3PRDX1FINC
0.8163APOEG3PLRP1UGPARANCEAM8HXK1
0.8161CD14FRILG3PLRP1PROF1RANCEAM8
0.81596PGDFRILG3PHYOU1LRP1PRDX1CEAM8
0.8159FRILG3PLRP1TBB3FINCGSLG1HXK1
0.8159CD14FRILG3PPRDX1CLIC1ILKHXK1
0.8159G3PTBB3ILKGELSFINCRANGSLG1
0.8157APOEFRILG3PHYOU1CLIC1ILKHXK1
0.8155HYOU1LRP1PRDX1PROF1FINCRANGSLG1
0.8153HYOU1PLXC1PRDX1ILKCEAM8HXK1BST1
0.814CERUFRILG3PPLXC1PRDX1ILKHXK1
0.8138HYOU1PLXC1RANCEAM8HXK1BST1MMP9
0.81326PGDCD14FRILHYOU1FINCGSLG1BST1
0.8128BST16PGDG3PHYOU1ILKFINCHXK1
0.81281433ZAPOEFRILG3PLRP1PTGISRAN
0.8124CD14FRILG3PGDIR2FINCGSLG1HXK1
0.8124GDIR2LRP1CLIC1FINCGSLG1HXK1BST1
0.8121433ZAPOEFRILLRP1PRDX1PROF1FINC
0.81061433Z6PGDFRILG3PHYOU1PRDX1CLIC1
0.8106G3PPRDX1UGPAILKCEAM8GSLG1HXK1
0.8099G3PHYOU1PRDX1PROF1FINCGSLG1HXK1
0.8097G3PHYOU1LRP1PTGISILKFINCMMP9
0.8093CD14FRILG3PLRP1PLXC1CLIC1GSLG1
0.8093G3PGDIR2PRDX1UGPACLIC1FINCHXK1
0.8093SEM3GMASP1G3PHYOU1FINCCEAM8HXK1
0.80876PGDCD14FRILHYOU1TBB3CLIC1FINC
0.8087FRILG3PHYOU1LRP1FINCCEAM8HXK1
0.8083GRP78CERUCD14FRILLRP1FINCCEAM8
0.8081TENAFRILG3PHYOU1PROF1RANHXK1
0.8081HYOU1ICAM3PLXC1CLIC1ILKFINCGSLG1
0.8081FRILG3PHYOU1S10A6CEAM8GSLG1HXK1
0.8079LUM6PGDAPOEFRILHYOU1RANHXK1
0.8077LRP1PTGISCLIC1FINCRANGSLG1MMP9
0.8077FRILG3PHYOU1LRP1ILKGSLG1HXK1
0.8077APOECD14FRILG3PLRP1PRDX1GSLG1
0.80756PGDFRILG3PLRP1UGPAILKHXK1
0.80736PGDFRILG3PHYOU1LRP1PRDX1FINC
0.8071CD14FRILG3PLRP1AMPNRANHXK1
0.8071APOEFRILHYOU1LRP1PTGISCLIC1AMPN
0.8065ALDOAAPOEFRILG3PTBB3RANHXK1
0.8063LRP1PROF1TBB3UGPACLIC1AMPNRAN
0.8063LRP1PLXC1PROF1FINCRANHXK1MMP9
0.8061G3PLRP1PLXC1PROF1PVRFINCCEAM8
0.8059CD14FRILG3PLRP1TBB3RANGSLG1
0.8059G3PHYOU1PRDX1TBB3ILKRANHXK1
0.8058APOECD14FRILG3PHYOU1RANHXK1
0.8058FRILG3PHYOU1PROF1GELSPVRRAN
0.80546PGDFRILG3PHYOU1ILKGSLG1HXK1
0.8054HYOU1ICAM3PLXC1TBB3GELSRANBST1
0.8054PLXC1PRDX1PROF1FINCCEAM8GSLG1MMP9
0.805G3PHYOU1ICAM3PRDX1UGPAILKHXK1
0.8048CD14FRILG3PHYOU1PTGISFINCRAN
0.8048FRILG3PLRP1PRDX1UGPARANCEAM8
0.8046FRILG3PGDIR2HYOU1RANGSLG1HXK1
0.8046FRILG3PLRP1PRDX1FINCGSLG1MMP9
0.8046FRILG3PCLIC1ILKAMPNFINCHXK1
0.8046CD14FRILG3PICAM3AMPNFINCHXK1
0.8042G3PHYOU1S10A6ILKFINCRANHXK1
0.8042ICAM3LRP1PRDX1PROF1GELSFINCGSLG1
0.8042GDIR2HYOU1LRP1PRDX1PROF1CLIC1HXK1
0.804FRILG3PHYOU1LRP1PRDX1ILKGSLG1
0.804G3PGDIR2PRDX1CLIC1GELSFINCHXK1
0.8038MASP1APOEFRILG3PPRDX1FINCHXK1
0.8036ENOA6PGDFRILG3PGDIR2LRP1PRDX1
0.8036APOEG3PHYOU1ICAM3RANCEAM8HXK1
0.8036APOEFRILG3PHYOU1LRP1HXK1MMP9
0.8036FRILG3PPROF1PTGISFINCCEAM8HXK1
0.8036FRILG3PHYOU1PRDX1FINCCEAM8HXK1
0.8036G3PLRP1PRDX1PROF1GELSFINCRAN
0.8034TENA6PGDG3PHYOU1LRP1TBB3ILK
0.8034HYOU1ICAM3PROF1ILKGELSAMPNFINC
0.8032MASP16PGDCD14FRILG3PHYOU1ILK
0.8032APOECD14G3PHYOU1PVRRANHXK1
0.803APOEFRILG3PTBB3UGPAPVRRAN
0.803CD14FRILG3PHYOU1ICAM3PRDX1RAN
0.8028CERULUMZA2GAPOEFRILLRP1MMP9
0.8024G3PHYOU1PRDX1GELSFINCCEAM8HXK1
0.8024LRP1PROF1CLIC1GELSFINCCEAM8GSLG1
0.8024APOEFRILG3PLRP1PRDX1UGPAPTPRJ
0.802TENAAPOEFRILG3PTBB3AMPNGSLG1
0.802FRILG3PHYOU1ILKPVRGSLG1PTPRJ
0.802FRILG3PLRP1ILKRANCEAM8MMP9
0.802G3PGDIR2HYOU1LRP1PRDX1TBB3FINC
0.8018ALDOASEM3GVTNCFRILG3PLRP1CLIC1
0.80186PGDAPOECD14FRILHYOU1PROF1GSLG1
0.8018G3PHYOU1LRP1PTGISGELSFINCRAN
0.8018G3PHYOU1ICAM3PROF1FINCPTPRJHXK1
0.8016APOEFRILG3PHYOU1PRDX1CLIC1GSLG1
0.8016HYOU1PROF1UGPACLIC1RANCEAM8PTPRJ
0.80146PGDFRILG3PHYOU1PRDX1FINCHXK1
0.8014GDIR2LRP1S10A6GELSFINCGSLG1HXK1
0.8014FRILG3PPRDX1UGPAFINCPTPRJHXK1
0.8014G3PHYOU1LRP1PRDX1PROF1FINCHXK1
0.8013G3PLRP1PRDX1ILKFINCHXK1MMP9
0.8013LRP1PROF1UGPAILKFINCPTPRJHXK1
0.80136PGDFRILG3PCLIC1S10A6ILKPVR
0.8013APOEFRILG3PHYOU1CLIC1RANHXK1
0.8013GDIR2LRP1PTGISFINCRANHXK1MMP9
0.8011GDIR2HYOU1ICAM3PRDX1FINCHXK1MMP9
0.80116PGDAPOEG3PLRP1PROF1GELSMMP9
0.8011GDIR2HYOU1LRP1CLIC1S10A6PVRGSLG1
0.8011G3PICAM3LRP1GELSFINCRANCEAM8
0.8011G3PHYOU1PRDX1FINCGSLG1PTPRJHXK1
0.8009APOEFRILLRP1PVRFINCRANPTPRJ
0.8009CERUAPOECD14FRILTBB3ILKFINC
0.8009CD14FRILCLIC1S10A6ILKFINCMMP9
0.8009CD14G3PTBB3CLIC1GELSRANHXK1
0.80096PGDFRILG3PHYOU1RANHXK1MMP9
0.8007MDHMCD14FRILG3PHYOU1GSLG1HXK1
0.8007APOEG3PGDIR2LRP1PRDX1TBB3RAN
0.8007CERUAPOEFRILICAM3LRP1UGPAGSLG1
0.8007TETNLUMAPOEFRILG3PRANHXK1
0.8007GDIR2ICAM3LRP1PRDX1PROF1FINCHXK1
0.8005ENOAFRILG3PLRP1UGPAILKFINC
0.8005G3PHYOU1PRDX1TBB3FINCRANCEAM8
0.8003FRILG3PCLIC1FINCGSLG1HXK1MMP9
0.80036PGDAPOEFRILICAM3TBB3GSLG1BST1
0.8003G3PHYOU1ICAM3PRDX1UGPARANHXK1
0.8001CD14FRILG3PPROF1FINCHXK1MMP9
0.8001VTNCFRILG3PCLIC1ILKAMPNHXK1
0.80011433ZG3PHYOU1LRP1PRDX1PROF1CEAM8
0.8001G3PHYOU1LRP1PTGISTBB3PVRRAN
0.8001G3PICAM3PROF1TBB3FINCRANGSLG1

[0211]

To calculate the combined AUC of each panel of 15 proteins, the highest intensity normalized transition was utilized. Logistic regression was used to calculate the AUC of the panel of 15 across all small samples. 5 panels of 15 proteins had combined AUC above 0.80.

[0212]

Finally, the frequency of each of the 67 proteins on the 131 panels listed in Table 13 is presented in Table 12, both as raw counts (column 2) and as percentages (column 3). Importantly, the panel size of 15 was pre-selected to demonstrate that diagnostic proteins and panels exist, and that there are numerous such panels. Smaller panels selected from the list of 67 proteins can also be formed and can be generated using the same methods described here.
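As an illustration of how the counts and percentages in Table 12 relate to the panels in Table 13, the tally can be sketched as below. The function name `protein_frequencies` is hypothetical, and the rounding to whole percentages is an assumption that happens to agree with the listed values (for example, 50 of 131 panels gives 38%).

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def protein_frequencies(panels: Sequence[Iterable[str]]) -> Dict[str, Tuple[int, int]]:
    """Map each protein to (number of panels containing it, percentage of all panels)."""
    # Count each protein at most once per panel.
    counts = Counter(protein for panel in panels for protein in set(panel))
    total = len(panels)
    return {
        protein: (count, round(100 * count / total))
        for protein, count in counts.most_common()
    }


# Toy input: three hypothetical panels of three proteins each.
example = protein_frequencies([
    ("ALDOA", "FRIL", "TSP1"),
    ("ALDOA", "FRIL", "G3P"),
    ("ALDOA", "LRP1", "G3P"),
])
```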

Example 4: A Diagnostic Panel of 15 Proteins for Determining the Probability that a Blood Sample from a Patient with a PN of Size 2 cm or Less is Benign or Malignant

[0213]

In Table 14 a logistic regression classifier trained on all small samples is presented.

[0214]

ALDOA_HUMAN | ALQASALK_401.25_617.40 | 7 | YGFIEGHVVIPR_462.92_272.20 | 1 | −1.96079
BGH3_HUMAN | LTLLAPLNSVFK_658.40_804.50 | 8 | YEVTVVSVR_526.29_759.50 | 2 | 2.21074
CLIC1_HUMAN | LAALNPESNTAGLDIFAK_922.99_256.20 | 9 | ASSIIDELFQDR_465.24_565.30 | 3 | 0.88028
CO6A3_HUMAN | VAVVQYSDR_518.77_767.40 | 10 | ASSIIDELFQDR_465.24_565.30 | 3 | −1.52046
COIA1_HUMAN | AVGLAGTFR_446.26_721.40 | 11 | YGFIEGHVVIPR_462.92_272.20 | 1 | −0.76786
FINC_HUMAN | VPGTSTSATLTGLTR_487.94_446.30 | 12 | FLNVLSPR_473.28_685.40 | 4 | 0.98842
G3P_HUMAN | GALQNIIPASTGAAK_706.40_815.50 | 13 | TASDFITK_441.73_710.40 | 5 | 0.58843
ISLR_HUMAN | ALPGTPVASSQPR_640.85_841.50 | 14 | FLNVLSPR_473.28_685.40 | 4 | 1.02005
LRP1_HUMAN | TVLWPNGLSLDIPAGR_855.00_400.20 | 15 | YEVTVVSVR_526.29_759.50 | 2 | −2.14383
PRDX1_HUMAN | QITVNDLPVGR_606.30_428.30 | 16 | YGFIEGHVVIPR_462.92_272.20 | 1 | −1.38044
PROF1_HUMAN | STGGAPTFNVTVTK_690.40_503.80 | 17 | TASDFITK_441.73_710.40 | 5 | −1.78666
PVR_HUMAN | SVDIVVLR_444.75_702.40 | 18 | TASDFITK_441.73_710.40 | 5 | 2.26338
TBB3_HUMAN | ISVYYNEASSHK_466.60_458.20 | 19 | FLNVLSPR_473.28_685.40 | 4 | −0.46786
TETN_HUMAN | LDTLAQEVALLK_657.39_330.20 | 20 | TASDFITK_441.73_710.40 | 5 | −1.99972
TPIS_HUMAN | VVFEQTK_425.74_652.30 | 21 | YGFIEGHVVIPR_462.92_272.20 | 1 | 2.65334
Constant (C0) | 21.9997

[0215]

The classifier has the structure

[0216]

[0217]

where C0 and Ci are logistic regression coefficients and Pi are logarithmically transformed normalized transition intensities. Samples are predicted as cancer if Probability ≥0.5 or as benign otherwise. In Table 14 the coefficients Ci appear in the sixth column, C0 in the last row, and the normalized transitions for each protein are defined by column 2 (protein transition) and column 4 (the normalizing factor).
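A minimal sketch of how a classifier of this kind could be evaluated is given below. The equation itself is not reproduced in this text, so the sketch assumes the standard logistic form, Probability = exp(W)/(1 + exp(W)) with W = C0 + ΣCi·Pi. The coefficients are those of Table 14; the function names `probability` and `classify` are hypothetical, and the inputs are assumed to be already normalized and logarithmically transformed (the base of the logarithm is not stated here).

```python
import math
from typing import Mapping

# Constant term and per-protein coefficients taken from Table 14.
C0 = 21.9997
COEFFS = {
    "ALDOA": -1.96079, "BGH3": 2.21074, "CLIC1": 0.88028,
    "CO6A3": -1.52046, "COIA1": -0.76786, "FINC": 0.98842,
    "G3P": 0.58843, "ISLR": 1.02005, "LRP1": -2.14383,
    "PRDX1": -1.38044, "PROF1": -1.78666, "PVR": 2.26338,
    "TBB3": -0.46786, "TETN": -1.99972, "TPIS": 2.65334,
}


def probability(log_intensities: Mapping[str, float]) -> float:
    """Assumed logistic form: W = C0 + sum(Ci * Pi), Probability = 1 / (1 + exp(-W))."""
    w = C0 + sum(c * log_intensities[protein] for protein, c in COEFFS.items())
    # Numerically stable evaluation of the logistic function.
    if w >= 0:
        return 1.0 / (1.0 + math.exp(-w))
    e = math.exp(w)
    return e / (1.0 + e)


def classify(log_intensities: Mapping[str, float], cutoff: float = 0.5) -> str:
    """Predict 'cancer' when the probability reaches the cutoff, else 'benign'."""
    return "cancer" if probability(log_intensities) >= cutoff else "benign"
```

A missing protein raises a `KeyError` rather than being silently treated as zero, which seems the safer choice for a 15-protein panel.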

[0218]

The performance of this classifier, presented as a ROC plot, appears in FIG. 4. Overall AUC is 0.81. The performance can also be assessed by applying the classifier to each study site individually which yields the three ROC plots appearing in FIG. 5. The resulting AUCs are 0.79, 0.88 and 0.78 for Laval, NYU and UPenn, respectively.

Example 5: The Program “Ingenuity”® was Used to Query the Blood Proteins that are Used to Identify Lung Cancer in Patients with Nodules that were Identified Using the Methods of the Present Invention

[0219]

Using a subset of 35 proteins (Table 15) from the 67 proteins identified as a diagnostic panel (Table 13), a backward systems analysis was performed. Two networks identified as cancer networks were queried with the 35 proteins. When the proteins found in the blood of patients were traced back to the level of the nucleus, the networks with the highest percentage of “hits” were those initiated by transcription factors that are regulated by cigarette smoke or lung cancer, among others. See also Table 16 and FIG. 6.

[0220]

These results are further evidence that the proteins that were identified using the methods of the invention as diagnostic for lung cancer are prognostic and relevant.

[0221]

1 | 6PGD_HUMAN | 6-phosphogluconate dehydrogenase, decarboxylating | PGD | phosphogluconate dehydrogenase
2 | AIFM1_HUMAN | Apoptosis-inducing factor 1, mitochondrial | AIFM1 | apoptosis-inducing factor, mitochondrion-associated, 1
3 | ALDOA_HUMAN | Fructose-bisphosphate aldolase A | ALDOA | aldolase A, fructose-bisphosphate
4 | BGH3_HUMAN | Transforming growth factor-beta-induced protein ig-h3 | TGFBI | transforming growth factor, beta-induced, 68 kDa
5 | C163A_HUMAN | Scavenger receptor cysteine-rich type 1 protein M130 | CD163 | CD163 molecule
6 | CD14_HUMAN | Monocyte differentiation antigen CD14 | CD14 | CD14 molecule
7 | COIA1_HUMAN | Collagen alpha-1(XVIII) chain | COL18A1 | collagen, type XVIII, alpha 1
8 | ERO1A_HUMAN | ERO1-like protein alpha | ERO1L | ERO1-like (S. cerevisiae)
9 | FIBA_HUMAN | Fibrinogen alpha chain | FGA | fibrinogen alpha chain
10 | FINC_HUMAN | Fibronectin | FN1 | fibronectin 1
11 | FOLH1_HUMAN | Glutamate carboxypeptidase 2 | FOLH1 | folate hydrolase (prostate-specific membrane antigen) 1
12 | FRIL_HUMAN | Ferritin light chain | FTL | ferritin, light polypeptide
13 | GELS_HUMAN | Gelsolin | GSN | gelsolin (amyloidosis, Finnish type)
14 | GGH_HUMAN | Gamma-glutamyl hydrolase | GGH | gamma-glutamyl hydrolase (conjugase, folylpolygammaglutamyl hydrolase)
15 | GRP78_HUMAN | 78 kDa glucose-regulated protein | HSPA5 | heat shock 70 kDa protein 5 (glucose-regulated protein, 78 kDa)
16 | GSLG1_HUMAN | Golgi apparatus protein 1 | GLG1 | golgi apparatus protein 1
17 | GSTP1_HUMAN | Glutathione S-transferase P | GSTP1 | glutathione S-transferase pi 1
18 | IBP3_HUMAN | Insulin-like growth factor-binding protein 3 | IGFBP3 | insulin-like growth factor binding protein 3
19 | ICAM1_HUMAN | Intercellular adhesion molecule 1 | ICAM1 | intercellular adhesion molecule 1
20 | ISLR_HUMAN | Immunoglobulin superfamily containing leucine-rich repeat protein | ISLR | immunoglobulin superfamily containing leucine-rich repeat
21 | LG3BP_HUMAN | Galectin-3-binding protein | LGALS3BP | lectin, galactoside-binding, soluble, 3 binding protein
22 | LRP1_HUMAN | Prolow-density lipoprotein receptor-related protein 1 | LRP1 | low density lipoprotein-related protein 1 (alpha-2-macroglobulin receptor)
23 | LUM_HUMAN | Lumican | LUM | lumican
24 | MASP1_HUMAN | Mannan-binding lectin serine protease 1 | MASP1 | mannan-binding lectin serine peptidase 1 (C4/C2 activating component of Ra-reactive factor)
25 | PDIA3_HUMAN | Protein disulfide-isomerase A3 | PDIA3 | protein disulfide isomerase family A, member 3
26 | PEDF_HUMAN | Pigment epithelium-derived factor | SERPINF1 | serpin peptidase inhibitor, clade F (alpha-2 antiplasmin, pigment epithelium derived factor), member 1
27 | PRDX1_HUMAN | Peroxiredoxin-1 | PRDX1 | peroxiredoxin 1
28 | PROF1_HUMAN | Profilin-1 | PFN1 | profilin 1
29 | PTPA_HUMAN | Serine/threonine-protein phosphatase 2A activator | PPP2R4 | protein phosphatase 2A activator, regulatory subunit 4
30 | PTPRJ_HUMAN | Receptor-type tyrosine-protein phosphatase eta | PTPRJ | protein tyrosine phosphatase, receptor type, J
31 | RAP2B_HUMAN | Ras-related protein Rap-2b | RAP2B | RAP2B, member of RAS oncogene family
32 | SEM3G_HUMAN | Semaphorin-3G | SEMA3G | sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3G
33 | SODM_HUMAN | Superoxide dismutase [Mn], mitochondrial | SOD2 | superoxide dismutase 2, mitochondrial
34 | TETN_HUMAN | Tetranectin | CLEC3B | C-type lectin domain family 3, member B
35 | TSP1_HUMAN | Thrombospondin-1 | THBS1 | thrombospondin 1

[0222]

NFE2L2 (NRF2) | nuclear factor (erythroid-derived 2)-like 2 | 92 | transcription factor protecting cell from oxidative stress | Cigarette Smoking Blocks the Protective Expression of Nrf2/ARE Pathway . . .; Molecular mechanisms for the regulation of Nrf2-mediated cell proliferation in non-small-cell lung cancers . . .
EGR1 | early growth response | 38 | transcription factor involved oxidative stress | Cigarette smoke-induced Egr-1 upregulates proinflammatory cytokines in pulmonary epithelial cells . . .; EGR-1 regulates Ho-1 expression induced by cigarette smoke . . .; Chronic hypoxia induces Egr-1 via activation of ERK1/2 and contributes to pulmonary vascular remodeling.; Early growth response-1 induces and enhances vascular endothelial growth factor-A expression in lung cancer cells . . .

Example 6: Cooperative Proteins for Diagnosing Pulmonary Nodules

[0223]

To achieve unbiased discovery of cooperative proteins, selected reaction monitoring (SRM) mass spectrometry (Addona, Abbatiello et al. 2009) was utilized. SRM is a form of mass spectrometry that monitors predetermined and highly specific mass products of particularly informative (proteotypic) peptides of selected proteins. These peptides are recognized as specific transitions in mass spectra. SRM possesses the following required features that other technologies, notably antibody-based technologies, do not possess:

    • Highly multiplexed SRM assays can be rapidly and cost-effectively developed for tens or hundreds of proteins.
    • The assays developed are for proteins of one's choice and are not restricted to a catalogue of pre-existing assays. Furthermore, the assays can be developed for specific regions of a protein, such as the extracellular portion of a transmembrane protein on the cell surface of a tumor cell, or for a specific isoform.
    • SRM technology can be used from discovery to clinical testing. Peptide ionization, the foundation of mass spectrometry, is remarkably reproducible. Using a single technology platform avoids the common problem of translating an assay from one technology platform to another.

[0227]

SRM has been used for clinical testing of small molecule analytes for many years, and recently in the development of biologically relevant assays [10].

[0228]

Labeled and unlabeled SRM peptides are commercially available, together with an open-source library and data repository of mass spectra for the design and conduct of SRM analyses. Exceptional public resources exist to accelerate assay development, including the PeptideAtlas and the Plasma Proteome Project [12, 13], the SRM Atlas, and PASSEL (the PeptideAtlas SRM Experimental Library).

[0229]

Two SRM strategies that enhance technical performance were introduced. First, large scale SRM assay development introduces the possibility of monitoring false signals. Using an extension of expression correlation techniques [14], the rate of false signal monitoring was reduced to below 3%. This is comparable and complementary to the approach used by mProphet (Reiter, Rinner et al. 2011).

[0230]

Second, a panel of endogenous proteins was used for normalization. However, whereas such proteins are typically selected as “housekeeping” proteins (Lange, Picotti et al. 2008), here proteins that were strong normalizers for the technology platform were identified, that is, proteins that monitored the effects of technical variation so that it could be controlled effectively. This resulted, for example, in the reduction of technical variation due to sample depletion of high abundance proteins from 23.8% to 9.0%. The benefits of endogenous signal normalization have been previously discussed (Price, Trent et al. 2007).
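The effect of endogenous normalization on technical variation can be illustrated with a small sketch. It assumes, for illustration only, that normalization is a simple ratio of the target transition intensity to the normalizer transition intensity and that technical variation is summarized as a coefficient of variation (CV); the intensities below are made-up toy values, not measurements from the study, and the function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation, in percent."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def normalize(target: Sequence[float], normalizer: Sequence[float]) -> List[float]:
    """Divide each target intensity by the normalizer intensity from the same run."""
    return [t / n for t, n in zip(target, normalizer)]


# Toy replicate runs of one sample: each run is scaled by a run-specific
# technical factor that affects target and normalizer alike.
run_factors = [0.8, 1.0, 1.3]
target_raw = [100.0 * f for f in run_factors]
normalizer_raw = [50.0 * f for f in run_factors]

cv_before = cv_percent(target_raw)
cv_after = cv_percent(normalize(target_raw, normalizer_raw))
```

In this idealized case the shared run factor cancels completely; with real data the normalizer tracks technical variation only partially, so the CV falls without reaching zero.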

[0231]

The final component of the strategy was to carefully design the discovery and validation studies using emerging best practices. Specifically, the cases (malignant nodules) and controls (benign nodules) were pairwise matched on age, nodule size, gender and participating clinical site. This ensures that the candidate markers discovered are not markers of age or of variations in sample collection from site to site. The studies were well-powered and included multiple sites, a new site participated in the validation study, and, importantly, the studies were designed to address the intended use of the test. The careful selection and matching of samples resulted in an exceptionally valuable feature of the classifier: it generates a score that is independent of nodule size and smoking status. As these are currently used risk factors for clinical management of IPNs, the classifier is a complementary molecular tool for use in the diagnosis of IPNs.

[0232]

Selection of Biomarker Candidates for Assay Development

[0233]

To identify lung cancer biomarkers in blood that originate from lung tumor cells, resected lung tumors and distal normal tissue of the same lobe were obtained. Plasma membranes were isolated from both endothelial and epithelial cells and analyzed by tandem mass spectrometry to identify cell surface proteins over expressed on tumor cells. Similarly, Golgi apparatus were isolated to identify over-secreted proteins from tumor cells. Proteins with evidence of being present in blood or secreted were prioritized resulting in a set of 217 proteins. See Example 7: Materials and Methods for details.

[0234]

To ensure other viable lung cancer biomarkers were not overlooked, a literature search was performed and manually curated for lung cancer markers. As above, proteins with evidence of being present in blood or secreted were prioritized. This resulted in a set of 319 proteins. See Example 7: Materials and Methods for details.

[0235]

The tissue (217) and literature (319) candidates overlapped by 148 proteins resulting in a final candidate list of 388 protein candidates. See Example 7: Materials and Methods.

[0236]

Development of SRM Assays

[0237]

SRM assays for the 388 proteins were developed using standard synthetic peptide techniques (See Example 7: Materials and Methods). Of the 388 candidates, SRM assays were successfully developed for 371 candidates. The 371 SRM assays were applied to benign and lung cancer plasma samples to evaluate detection rate in blood. 190 (51% success rate) of the SRM assays were detected. This success rate compares favorably to similar attempts to develop large scale SRM assays for detection of cancer markers in plasma. Recently 182 SRM assays for general cancer markers were developed from 1172 candidates (16% success rate) [15]. Despite focusing only on lung cancer markers, the 3-fold increase in efficiency is likely due to sourcing candidates from cancer tissues with prior evidence of presence in blood. Those proteins of the 371 that were previously detected by mass spectrometry in blood had a 64% success rate of detection in blood whereas those without had a 35% success rate. Of the 190 proteins detected in blood, 114 were derived from the tissue-sourced candidates and 167 derived from the literature-sourced candidates (91 protein overlap). See Example 7: Materials and Methods and Table 6.
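The counts and rates quoted in the preceding paragraphs can be checked with a few lines of arithmetic. The following is a minimal sketch only; the variable names are illustrative and do not come from the study.

```python
# Candidate list: union of tissue- and literature-sourced proteins
# (inclusion-exclusion on the counts given in the text).
tissue, literature, overlap = 217, 319, 148
candidates = tissue + literature - overlap

# Detection rate of the developed SRM assays in plasma.
assays_developed, assays_detected = 371, 190
detection_rate = assays_detected / assays_developed

# Comparison study cited in the text: 182 assays from 1172 candidates.
reference_rate = 182 / 1172

# Detected proteins by source, again via inclusion-exclusion.
detected_union = 114 + 167 - 91

print(candidates, round(detection_rate, 2), round(reference_rate, 2), detected_union)
```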

[0238]

Typically, SRM assays are manually curated to ensure assays are monitoring the intended peptide. However, this becomes unfeasible for large scale SRM assays such as this 371 protein assay. More recently, computational tools such as mProphet (Reiter, Rinner et al. 2011) enable automated qualification of SRM assays. A complementary strategy to mProphet was introduced that does not require customization for each dataset. It utilizes correlation techniques (Kearney, Butler et al. 2008) to confirm the identity of protein transitions with high confidence. FIG. 7 presents a histogram of the Pearson correlations between every pair of transitions in the assay. The correlation between a pair of transitions is obtained from their expression profiles over all 143 samples in the discovery study detailed below. As expected, transitions from the same peptide are highly correlated. Similarly, transitions from different peptide fragments of the same protein are also highly correlated. In contrast, transitions from different proteins are not highly correlated, which enables a statistical analysis of the quality of a protein's SRM assay. For example, if the correlation of transitions from two peptides from the same protein is above 0.5 then there is less than a 3% probability that the assay is false. See Example 7: Materials and Methods.
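The correlation check described above can be sketched as follows. This is a hedged illustration, not the implementation used in the study: the function names and the example intensity profiles are hypothetical, and only the 0.5 cutoff is taken from the text.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length intensity profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def assay_passes(profile_a, profile_b, cutoff=0.5):
    """True if transitions from two peptides of one protein correlate above the cutoff."""
    return pearson(profile_a, profile_b) > cutoff

# Hypothetical log-intensity profiles over six samples (not data from the study).
peptide_1 = [10.1, 11.3, 9.8, 12.0, 10.7, 11.5]
peptide_2 = [8.9, 10.2, 8.7, 10.8, 9.6, 10.1]   # tracks peptide_1 closely
unrelated = [7.0, 6.1, 7.4, 6.8, 6.0, 7.2]      # profile from a different protein

print(assay_passes(peptide_1, peptide_2))
print(assay_passes(peptide_1, unrelated))
```

In practice the profiles would span all samples in the study, and the cutoff would be calibrated against the observed distribution of correlations between transitions of different proteins.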

[0239]

Classifier Discovery

[0240]

A summary of the 143 samples used for classifier discovery appears in Table 17. Samples were obtained from three sites to avoid overfitting to a single site. Participating sites were Laval (Institut Universitaire de Cardiologie et de Pneumologie de Quebec), NYU (New York University) and UPenn (University of Pennsylvania). Samples were also selected to be representative of the intended use population in terms of nodule size (diameter), age and smoking status.

[0241]

Benign and cancer samples were paired by matching on age, gender, site and nodule size (benign and cancer samples were required to have a nodule identified radiologically). The benign and cancer samples display a bias in smoking (pack years); however, the majority of benign and cancer samples were current or past smokers. In comparing malignant and benign samples, the intent was to find proteins that were markers of lung cancer, not markers of age, nodule size or differences in site sample collection. Note that cancer samples were pathologically confirmed and benign samples were either pathologically confirmed or radiologically confirmed (no tumor growth demonstrated over two years of CT scan surveillance).

[0242]

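Clinical data summaries and demographic analysis for discovery and validation sets.

Characteristic | Category | Discovery Cancer | Discovery Benign | Discovery P value | Validation Cancer | Validation Benign | Validation P value
Sample (total) | | 72 | 71 | | 52 | 52 |
Sample (Center) | Laval | 14 | 14 | 1.00† | 13 | 12 | 0.89†
Sample (Center) | NYU | 29 | 28 | | 6 | 9 |
Sample (Center) | UPenn | 29 | 29 | | 14 | 13 |
Sample (Center) | Vanderbilt | 0 | 0 | | 19 | 18 |
Sample (Gender) | Male | 29 | 28 | 1.00† | 25 | 27 | 0.85†
Sample (Gender) | Female | 43 | 43 | | 27 | 25 |
Sample (Smoking History) | Never | 5 | 19 | 0.006† | 3 | 15 | 0.006†
Sample (Smoking History) | Past | 60 | 44 | | 38 | 29 |
Sample (Smoking History) | Current | 6 | 6 | | 11 | 7 |
Sample (Smoking History) | No data | 1 | 2 | | 0 | 1 |
Age | Median (quartile range) | 65 (59-72) | 64 (52-71) | 0.46‡ | 63 (60-73) | 62 (56-67) | 0.03‡
Nodule Size (mm) | Median (quartile range) | 13 (10-16) | 13 (10-18) | 0.69‡ | 16 (13-20) | 15 (12-22) | 0.68‡
Pack-year§ | Median (quartile range) | 37 (20-52) | 20 (0-40) | 0.001‡ | 40 (19-50) | 27 (0-50) | 0.09‡

†Based on Fisher's exact test.
‡Based on Mann-Whitney test.
§No data (cancer, benign): Discovery (4, 6), Validation (2, 3)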

[0243]

The processing of samples was conducted in batches. Each batch contained a set of randomly selected cancer-benign pairs and three plasma standards, included for calibration and quality control purposes.

[0244]

All plasma samples were immunodepleted, trypsin digested and analyzed by reverse phase HPLC-SRM-MS. Protein transitions were normalized using an endogenous protein panel. The normalization procedure was designed to reduce overall variability, but in particular, the variability introduced by the depletion step. Overall technical variability was reduced from 32.3% to 25.1% and technical variability due to depletion was reduced from 23.8% to 9.0%. Details of the sample analysis and normalization procedure are available in Example 7: Materials and Methods.

[0245]

To assess panels of proteins, they were fit to a logistic regression model. Logistic regression was chosen to avoid the overfitting that can occur with non-linear models, especially when the number of variables measured (transitions) is similar to or larger than the number of samples in the study. The performance of a panel was measured by partial area under the curve (AUC) with sensitivity fixed at 90% (McClish 1989). Partial AUC correlates to high NPV performance while maximizing ROR.
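One way to compute a sensitivity-restricted partial AUC is sketched below. The exact definition used in the study is not spelled out in this section, so the sketch assumes the partial AUC is the area under the empirical ROC curve that lies at or above 90% sensitivity, with ties between a cancer and a benign score counted as one half. The function name is hypothetical.

```python
def partial_auc_high_sensitivity(cancer_scores, benign_scores, min_sensitivity=0.90):
    """Area under the empirical ROC curve restricted to sensitivity >= min_sensitivity.

    Assumed definition: the ROC region above the horizontal line at
    min_sensitivity, so a perfect classifier scores 1 - min_sensitivity.
    """
    n_pos, n_neg = len(cancer_scores), len(benign_scores)
    area = 0.0
    # The k-th highest cancer score lifts sensitivity from (k-1)/n_pos to k/n_pos.
    for k, s in enumerate(sorted(cancer_scores, reverse=True), start=1):
        lo, hi = (k - 1) / n_pos, k / n_pos
        band = max(0.0, hi - max(lo, min_sensitivity))  # part of the step above the floor
        if band == 0.0:
            continue
        # False positive rate at which this step occurs (ties count one half).
        above = sum(b > s for b in benign_scores)
        ties = sum(b == s for b in benign_scores)
        fpr = (above + 0.5 * ties) / n_neg
        area += band * (1.0 - fpr)
    return area
```

Setting `min_sensitivity=0.0` recovers the ordinary AUC, which is a convenient sanity check on the restricted version.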

[0246]

To derive the 13 protein classifier, four criteria were used:

    • The protein must have transitions that are reliably detected above noise across samples in the study.
    • The protein must be highly cooperative.
    • The protein must have transitions that are robust (high signal to noise, no interference, etc.)
    • The protein's coefficient within the logistic regression model must have low variability during cross validation, that is, it must be stable.
      Details of how each of these criteria were applied appear in Example 7: Materials and Methods.

[0251]

Finally, the 13 protein classifier was trained to a logistic regression model by Monte Carlo cross validation (MCCV) with a hold out rate of 20% and 20,000 iterations. The thirteen proteins for the rule-out classifier are listed in Table 18 along with their highest intensity transition and model coefficient.

[0252]

The 13 protein classifier.

Protein | Transition | SEQ ID NO | Coefficient
Constant (α) | | | 36.16
LRP1_HUMAN | TVLWPNGLSLDIPAGR_855.00_400.20 | 15 | −1.59
BGH3_HUMAN | LTLLAPLNSVFK_658.40_804.50 | 8 | 1.73
COIA1_HUMAN | AVGLAGTFR_446.26_721.40 | 11 | −1.56
TETN_HUMAN | LDTLAQEVALLK_657.39_330.20 | 20 | −1.79
TSP1_HUMAN | GFLLLASLR_495.31_559.40 | 22 | 0.53
ALDOA_HUMAN | ALQASALK_401.25_617.40 | 7 | −0.80
GRP78_HUMAN | TWNDPSVQQDIK_715.85_260.20 | 23 | 1.41
ISLR_HUMAN | ALPGTPVASSQPR_640.85_841.50 | 14 | 1.40
FRIL_HUMAN | LGGPEAGLGEYLFER_804.40_913.40 | 24 | 0.39
LG3BP_HUMAN | VEIFYR_413.73_598.30 | 25 | −0.58
PRDX1_HUMAN | QITVNDLPVGR_606.30_428.30 | 16 | −0.34
FIBA_HUMAN | NSLFEYQK_514.76_714.30 | 26 | 0.31
GSLG1_HUMAN | IIIQESALDYR_660.86_338.20 | 27 | −0.70

[0253]

Validation of the Rule-Out Classifier

[0254]

52 cancer and 52 benign samples (see Table 17) were used to validate the performance of the 13 protein classifier. All samples were independent of the discovery samples. In addition, about 36% of the validation samples (37 of 104) were sourced from a new fourth site (Vanderbilt University). Samples were selected to be consistent with intended use and matched in terms of gender, clinical site and nodule size. We note a slight age bias, which is due to 5 benign samples from young patients. Anticipating an NPV of 90%, the 95% confidence interval is ±5%.

[0255]

At this point we refer to the 13 protein classifier trained on 143 samples as the Discovery classifier. However, once validation was completed, the classifier was retrained on all 247 samples (discovery and validation sets) to find its optimal coefficients, as this is most predictive of future performance. We refer to this classifier as the Final classifier. The coefficients of the Final classifier appear in Table 21.

[0256]

The performance of the Discovery and Final classifiers is summarized in FIG. 8. Reported are the NPV and ROR for the Discovery classifier when applied to the discovery set and to the validation set. The NPV and ROR for the Final classifier are reported for all samples and also for all samples restricted to nodule size 8 mm to 20 mm (191 samples).

[0257]

NPV and ROR are each reported as a fraction from 0 to 1. Similarly, the classifier produces a score between 0 and 1, which is the probability of cancer predicted by the classifier.

[0258]

The discovery and validation curves for NPV and ROR are similar, with the discovery curves superior as expected. This demonstrates the reproducibility of performance on an independent set of samples. A Discovery classifier rule-out threshold of 0.40 achieves NPV of 96% and 90%, and ROR of 33% and 23%, for the discovery samples and the validation samples, respectively. A Final classifier rule-out threshold of 0.60 achieves NPV of 91% and 90%, and ROR of 45% and 43%, for all samples and for all samples restricted to nodule size 8 mm to 20 mm, respectively.
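Because NPV depends on disease prevalence, it is typically derived from sensitivity and specificity at a chosen threshold together with an assumed prevalence (a cancer prevalence of 25% is assumed elsewhere in this example). The sketch below applies the standard Bayes relationship; the operating point shown is hypothetical and is not a result of the study.

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from test characteristics and disease prevalence."""
    true_neg = specificity * (1.0 - prevalence)    # benign and called benign
    false_neg = (1.0 - sensitivity) * prevalence   # cancer but called benign
    return true_neg / (true_neg + false_neg)

# Hypothetical operating point (illustrative values only).
print(round(npv(sensitivity=0.90, specificity=0.45, prevalence=0.25), 3))
```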

[0259]

Applications of the Classifier

[0260]

FIG. 9 presents the application of the final classifier to all 247 samples from the discovery and validation sets. The intent of FIG. 9 is to contrast the clinical risk factors of smoking (measured in pack years) and nodule size (proportional to the size of each circle) to the classifier score assigned to each sample.

[0261]

First, note the density of cancer samples with high classifier scores. The classifier has been designed to detect a cancer signature in blood with high sensitivity. As a consequence, to the left of the rule out threshold (0.60) there are very few (<10%) cancer samples, assuming cancer prevalence of 25% [16, 17].

[0262]

Second is the observation that nodule size does not appear to increase with the classifier score. Both large and small nodules are spread across the classifier score spectrum. Similarly, although there are a few very heavy smokers with very high classifier scores, smoking does not seem to increase with classifier score. To quantify this observation, the correlations between the classifier score and nodule size, smoking and age were calculated and appear in Table 19. With one exception, there is no significant relationship between the classifier score and the risk factors. The exception is a weak correlation between benign classifier scores and benign ages. However, this correlation is so weak that the classifier score increases by only 0.04 every 10 years.

[0263]

Correlation between classifier scores and clinical risk factors.

Group | Age | Nodule Size | Smoking
Benign | 0.25 | −0.06 | 0.11
Cancer | 0.01 | −0.01 | 0.06

[0264]

This lack of correlation has clinical utility. It implies that the classifier provides molecular information about the disease status of an IPN that is incremental upon risk factors such as nodule size and smoking status. Consequently, it is a clinical tool for physicians to make more informed decisions around the clinical management of an IPN.

[0265]

To visualize how this might be accomplished, we demonstrate how the cancer probability score generated by the classifier can be related to cancer risk (see FIG. 11).

[0266]

At a given classifier score, some percentage of all cancer nodules will have a smaller score. This percentage is the complement of the sensitivity of the classifier at that score. For example, at classifier score 0.8, 47% of cancer patients have a lower score; at classifier score 0.7, 28% have a lower score; at classifier score 0.5, only 9% are lower; and finally at score 0.25, only 4% are lower. This enables a physician to interpret a patient's classifier score in terms of relative risk.
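The percentages above are values of the empirical distribution of cancer scores. A minimal sketch of that calculation follows; the score list is hypothetical and is not data from the study.

```python
def fraction_below(cancer_scores, score):
    """Fraction of cancer samples whose classifier score is below the given score."""
    return sum(s < score for s in cancer_scores) / len(cancer_scores)

# Hypothetical cancer classifier scores (illustrative values only).
cancer_scores = [0.22, 0.48, 0.63, 0.71, 0.77, 0.81, 0.85, 0.90, 0.93, 0.97]

for cutoff in (0.25, 0.5, 0.7, 0.8):
    print(cutoff, fraction_below(cancer_scores, cutoff))
```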

[0267]

The Molecular Foundations of the Classifier

[0268]

The goal was to identify the molecular signature of a malignant pulmonary nodule by selecting proteins that were cooperative, robustly detected by SRM and stable within the classifier. How well associated with lung cancer is the derived classifier? Is there a molecular foundation for the perturbation of these 13 proteins in blood? And finally, how unique is the classifier among other possible protein combinations?

[0269]

To answer these questions the 13 proteins of the classifier were submitted for pathway analysis using IPA (Ingenuity Systems). The first step was to work from outside the cell inwards to identify the transcription factors most likely to cause a modulation of these 13 proteins. The five most significant were FOS, NRF2, AHR, HD and MYC. FOS is common to many forms of cancer. However, NRF2 and AHR are associated with lung cancer, response to oxidative stress and lung inflammation. MYC is associated with lung cancer and response to oxidative stress while HD is associated with lung inflammation and response to oxidative stress.

[0270]

The 13 classifier proteins are also highly specific to these three networks (lung cancer, response to oxidative stress and lung inflammation). This is summarized in FIG. 10 where the classifier proteins (green), transcription factors (blue) and the three merged networks (orange) are depicted. Only ISLR is not connected through these three lung specific networks to the other proteins, although it is connected through cancer networks not specific to lung cancer. In summary, the modulation of the 13 classifier proteins can be tracked back to a few transcription factors specific to lung cancer, lung inflammation and oxidative stress networks.

[0271]

To address the question of classifier uniqueness, every classifier from the 21 robust and cooperative proteins was formed (Table 20). Due to the computational overhead, these classifiers could not be fully trained by Monte Carlo cross validation; consequently, only estimates of their performance could be obtained. Five high performing alternative classifiers were identified and then fully trained. The classifier and the five high performing alternatives appear in Table 20. The frequency of each protein appears in the tally column; in particular, the first 11 proteins appear in at least 4 out of the 6 classifiers. These 11 proteins have significantly higher cooperative scores than the remaining proteins. By this analysis it appears that there is a core group of proteins that form the blood signature of a malignant nodule.

[0272]

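The classifier and the high performing alternatives; coefficients for proteins on the respective panels are shown.

Protein | Coefficients on the panels containing the protein, in column order (Classifier, Panel 110424, Panel 130972, Panel 126748, Panel 109919, Panel 60767) | Protein Tally | Cooperative Score
Constant | 36.16; 27.72; 27.69; 23.47; 21.32; 23.17 | |
ALDOA | −0.8; −0.67; −0.87; −0.83; −0.64; −0.68 | 6 | 1.3
COIA1 | −1.56; −1.04; −1.68; −1.37; −0.94; −1.2 | 6 | 3.7
TSP1 | 0.53; 0.53; 0.39; 0.42; 0.47; 0.41 | 6 | 1.8
FRIL | 0.39; 0.45; 0.39; 0.41; 0.41; 0.41 | 6 | 2.8
LRP1 | −1.59; −0.84; −1.32; 1.15; −0.84; −0.87 | 6 | 4.0
GRP78 | 1.41; 1.14; 1.31; −0.34; 0.78; 0.6 | 6 | 1.4
ISLR | 1.4; 1.03; 1.08; 0.75; 0.74 | 5 | 1.4
IBP3 | −0.23; −0.21; −0.38; −0.33; −0.54 | 5 | 3.4
TETN | −1.79; −1.23; −1.99; −1.26 | 4 | 2.5
PRDX1 | −0.34; −0.38; −0.36; −0.4 | 4 | 1.5
LG3BP | −0.58; −0.61; −0.38; −0.48 | 4 | 4.3
CD14 | 0.99; 1.08; 1.4 | 3 | 4.0
BGH3 | 1.73; 1.67; −0.83 | 3 | 1.8
KIT | −0.31; −0.56 | 3 | 1.4
GGH | 0.44; 0.52 | 3 | 1.3
AIFM1 | −0.51 | 1 | 1.4
FIBA | 0.31 | 1 | 1.1
GSLG1 | −0.7 | 1 | 1.2
ENPL | | 0 | 1.1
EF1A1 | | 0 | 1.2
TENX | | 0 | 1.1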

[0273]

This result suggests that there is a core group of proteins that define a high performance classifier, but alternative panels exist. However, changes in panel membership affect the tradeoff between NPV and ROR.

Example 7: Materials and Methods

[0274]

Assay Development Candidates Sourced from Tissue

[0275]

Patient samples obtained from fresh lung tumor resections were collected from Centre Hospitalier de l′Universite de Montreal and McGill University Health Centre under IRB approval and with informed patient consent. Samples were obtained from the tumor as well as from distal normal tissue in the same lung lobe. Plasma membranes of each pair of samples were then isolated from the epithelial cells of 30 patients (19 adenocarcinoma, 6 squamous, 5 large cell carcinoma) and endothelial cells of 38 patients (13 adenocarcinoma, 18 squamous, 7 large cell carcinoma) using immune-affinity protocols. Golgi apparatus were isolated from each pair of samples from 33 patients (18 adenocarcinoma, 14 squamous, 1 adenosquamous) using isopycnic centrifugation followed by ammonium carbonate extraction. Plasma membrane isolations and Golgi isolations were then analyzed by tandem mass spectrometry to identify proteins overexpressed in lung cancer tissue over normal tissue, for both plasma membranes and Golgi.

[0276]

Assay Development Candidates Sourced from Literature

[0277]

Candidate lung cancer biomarkers were identified from two public and one commercial database: Entrez, NBK3836, UniProt and NextBio. Terminologies were predefined for the database queries which were automated using PERL scripts. The mining was carried out on May 6, 2010 (UniProt), May 17, 2010 (Entrez) and Jul. 8, 2010 (NextBio), respectively. Biomarkers were then assembled and mapped to UniProt identifiers.

[0278]

Evidence of Presence in Blood

[0279]

The tissue-sourced and literature-sourced biomarker candidates were required to have evidence of presence in blood. For evidence by mass spectrometry detection, three datasets were used. HUPO9504 contains 9504 human proteins identified by tandem mass spectrometry [13]. HUPO889, a higher confidence subset of HUPO9504, contains 889 human proteins [18]. The PeptideAtlas (November 2009 build) was also used. A biomarker candidate was marked as previously detected if it contained at least one HUPO889 peptide, or at least two HUPO9504 peptides, or at least two PeptideAtlas peptides.
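The detection rule in the last sentence above can be expressed as a simple predicate. This is a sketch only; the function and parameter names are hypothetical, and the peptide counts for each dataset are assumed to have been tallied beforehand.

```python
def previously_detected(n_high_confidence, n_large_set, n_peptide_atlas):
    """Apply the evidence rule from the text.

    n_high_confidence: peptides found in the 889-protein high-confidence dataset.
    n_large_set: peptides found in the 9504-protein dataset.
    n_peptide_atlas: peptides found in the PeptideAtlas build.
    """
    return (n_high_confidence >= 1
            or n_large_set >= 2
            or n_peptide_atlas >= 2)

print(previously_detected(0, 1, 1))  # one peptide in each larger set is not enough
print(previously_detected(1, 0, 0))  # a single high-confidence peptide suffices
```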

[0280]

In addition to direct evidence of detection in blood by mass spectrometry, annotation as secreted proteins or as single-pass membrane proteins in UniProt, or designation as plasma proteins, was also accepted as evidence of presence in blood. Furthermore, three programs for predicting whether or not a protein is secreted into the blood were used. These programs were TMHMM [20], SignalP and SecretomeP [22]. A protein was predicted as secreted if TMHMM predicted the protein had one transmembrane domain and SignalP predicted the transmembrane domain was cleaved; or TMHMM predicted the protein had no transmembrane domain and either SignalP or SecretomeP predicted the protein was secreted.
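The rule for combining the three predictors can be written as a small decision function. The sketch below does not call TMHMM, SignalP or SecretomeP; it assumes their outputs have already been reduced to a transmembrane domain count and two boolean calls, and the function and parameter names are hypothetical.

```python
def predicted_secreted(n_tm_domains, signalp_positive, secretomep_positive):
    """Combine precomputed TMHMM, SignalP and SecretomeP calls as described in the text.

    n_tm_domains: number of transmembrane domains predicted by TMHMM.
    signalp_positive: True if SignalP predicts cleavage / secretion.
    secretomep_positive: True if SecretomeP predicts secretion.
    """
    if n_tm_domains == 1:
        # Single predicted domain: secreted only if SignalP predicts it is cleaved.
        return signalp_positive
    if n_tm_domains == 0:
        # No transmembrane domain: either predictor is sufficient.
        return signalp_positive or secretomep_positive
    return False  # multi-pass membrane proteins are not predicted as secreted
```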

[0281]

SRM Assay Development

[0282]

SRM assays for 388 targeted proteins were developed based on synthetic peptides, using a protocol similar to those described in the literature [15, 23, 24]. Up to five SRM suitable peptides per protein were identified from public sources such as the PeptideAtlas, Human Plasma Proteome Database or by proteotypic prediction tools and synthesized. SRM triggered MS/MS spectra were collected on an ABSciex 5500 QTrap for both doubly and triply charged precursor ions. The obtained MS/MS spectra were assigned to individual peptides using MASCOT (cutoff score ≥15) [26]. Up to four transitions per precursor ion were selected for optimization. The resulting corresponding optimal retention time, declustering potential and collision energy were assembled for all transitions. Optimal transitions were measured on a mixture of all synthetic peptides, a pooled sample of benign patients and a pooled sample of cancer patients. Transitions were analyzed in batches, each containing up to 1750 transitions. Both biological samples were immuno-depleted and digested by trypsin and were analyzed on an ABSciex 5500 QTrap coupled with a reversed-phase (RP) high-performance liquid chromatography (HPLC) system. The obtained SRM data were manually reviewed to select the two best peptides per protein and the two best transitions per peptide. Transitions having interference with other transitions were not selected. Ratios between intensities of the two best transitions of peptides in the synthetic peptide mixture were also used to assess the specificity of the transitions in the biological samples. The intensity ratio was considered as an important metric defining the SRM assays.

[0283]

Processing of Plasma Samples

[0284]

Plasma samples were sequentially depleted of high- and medium-abundance proteins using immuno-depletion columns packed with the IgY14-Supermix resin from Sigma. The depleted plasma samples were then denatured, digested by trypsin and desalted. Peptide samples were separated using a capillary reversed-phase LC column (Thermo BioBasic 18 KAPPA; column dimensions: 320 μm×150 mm; particle size: 5 μm; pore size: 300 Å) and a nano-HPLC system (nanoACQUITY, Waters Inc.). The mobile phases were (A) 0.2% formic acid in water and (B) 0.2% formic acid in acetonitrile. The samples were injected (8 μl) and separated using a linear gradient (98% A to 70% A over 19 minutes, 5 μl/minute). Peptides were eluted directly into the electrospray source of the mass spectrometer (5500 QTrap LC/MS/MS, AB Sciex) operating in scheduled SRM positive-ion mode (Q1 resolution: unit; Q3 resolution: unit; detection window: 180 seconds; cycle time: 1.5 seconds). Transition intensities were then integrated by software MultiQuant (AB Sciex). An intensity threshold of 10,000 was used to filter out noisy data and undetected transitions.

[0285]

Plasma Samples Used for Discovery and Validation Studies

[0286]

Aliquots of plasma samples were provided by the Institut Universitaire de Cardiologie et de Pneumologie de Quebec (IUCPQ, Hospital Laval), New York University, the University of Pennsylvania, and Vanderbilt University (see Table 17). Subjects were enrolled in clinical studies previously approved by their Ethics Review Board (ERB) or Institutional Review Boards (IRB), respectively. In addition, plasma samples were provided by study investigators after review and approval of the sponsor's study protocol by the respective institution's IRB as required. Sample eligibility for the proteomic analysis was based on the satisfaction of the study inclusion and exclusion criteria, including the subject's demographic information, the subject's corresponding lung nodule radiographic characterization by chest computed tomography (CT), and the histopathology of the lung nodule obtained at the time of diagnostic surgical resection. Cancer samples had a histopathologic diagnosis of non-small cell lung cancer (NSCLC), including adenocarcinoma, squamous cell, large cell, or bronchoalveolar cell carcinoma, and a radiographic nodule of 30 mm or smaller. Benign samples, including granulomas, hamartomas and scar tissue, were also required to have a radiographic nodule of 30 mm or smaller and either histopathologic confirmation of being non-malignant or radiological confirmation in alignment with clinical guidelines. To ensure the accuracy of the clinical data, independent monitoring and verification of the clinical data associated with both the subject and lung nodule were performed in accordance with the guidance established by the Health Insurance Portability and Accountability Act (HIPAA) of 1996 to ensure subject privacy.

[0287]

Study Design

[0288]

The objective of the study design was to eliminate clinical and technical bias. Clinically, cancer and benign samples were paired so that they were from the same site, same gender, nodule sizes within 10 mm, age within 10 years, and smoking history within 20 pack years. Up to pairs of matched cancer and benign samples per batch were assigned iteratively to processing batches until no statistical bias was demonstrable based on age, gender or nodule size.

[0289]

Paired samples within each processing batch were further randomly and repeatedly assigned to positions within the processing batch, until the absolute values of the corresponding Pearson correlation coefficients between position and gender, nodule size, and age were less than Afterwards, each pair of cancer and benign samples was randomized to their relative positions. To provide a control for sample batching, three 200 μl aliquots of a pooled human plasma standard (HPS) (Bioreclamation, Hicksville, NY) were positioned at the beginning, middle and end of each processing batch, respectively. Samples within a batch were analyzed together.

[0290]

Logistic Regression Model

[0291]

The logistic regression classification method was used to combine a panel of transitions into a classifier and to calculate a classification probability score between 0 and 1 for each sample. The probability score ($P_s$) of a sample was determined as $P_s = 1/\left[1 + \exp\left(-\alpha - \sum_{i=1}^{N} \beta_i \check{I}_{i,s}\right)\right]$, where $\check{I}_{i,s}$ was the logarithmically transformed (base 2), normalized intensity of transition $i$ in sample $s$, $\beta_i$ was the corresponding logistic regression coefficient, $\alpha$ was a classifier-specific constant, and $N$ was the total number of transitions in the classifier. A sample was classified as benign if $P_s$ was less than a decision threshold. The decision threshold can be increased or decreased depending on the desired NPV. To define the classifier, the panel of transitions (i.e. proteins), their coefficients, the normalization transitions, classifier coefficient $\alpha$ and the decision threshold must be learned (i.e. trained) from the discovery study and then confirmed using the validation study.
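The probability score defined above can be evaluated directly. The following is a minimal sketch: the function names are hypothetical, and the two-transition classifier and sample intensities are illustrative values rather than the coefficients of the 13 protein classifier.

```python
import math

def probability_score(log2_intensities, coefficients, alpha):
    """P_s = 1 / (1 + exp(-alpha - sum_i beta_i * I_is))."""
    linear = alpha + sum(b * x for b, x in zip(coefficients, log2_intensities))
    # Numerically stable evaluation of the logistic function.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)

def is_ruled_out(score, threshold):
    """A sample is classified as benign when its score is below the threshold."""
    return score < threshold

# Hypothetical two-transition classifier and sample (illustrative values only).
alpha = 1.0
coefficients = [0.5, -0.8]
sample = [2.0, 3.0]  # log2-transformed, normalized transition intensities
score = probability_score(sample, coefficients, alpha)
print(round(score, 3), is_ruled_out(score, threshold=0.60))
```

Raising the threshold rules out more samples at the cost of a lower NPV, which is the tradeoff the decision threshold controls.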

[0292]

Discovery of the Rule-Out Classifier

[0293]

The 143 samples used for classifier discovery are summarized in Table 17 and were processed as described above.

[0294]

Protein transitions were normalized as described above. Transitions that were not detected in at least 50% of the cancer samples or 50% of the benign samples were eliminated leaving 117 transitions for further consideration. Missing values for these transitions were replaced by half the minimum detected value over all samples for that transition.
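The filtering and imputation step can be sketched as below. Two assumptions are made here: undetected values are represented as `None`, and a transition is kept when it is detected in at least 50% of the cancer samples or at least 50% of the benign samples (one reading of the sentence above). The function name and the example data are hypothetical.

```python
def filter_and_impute(intensities, labels, min_rate=0.5):
    """Drop rarely detected transitions and impute the rest.

    intensities: dict mapping transition name -> list of values aligned with
                 labels, where None marks an undetected value.
    labels: list of 'cancer' / 'benign', one per sample (both groups non-empty).
    """
    kept = {}
    for name, values in intensities.items():
        rates = []
        for group in ("cancer", "benign"):
            group_values = [v for v, g in zip(values, labels) if g == group]
            rates.append(sum(v is not None for v in group_values) / len(group_values))
        if max(rates) < min_rate:
            continue  # detected too rarely in both groups
        # Missing values become half the minimum detected value for this transition.
        floor = 0.5 * min(v for v in values if v is not None)
        kept[name] = [floor if v is None else v for v in values]
    return kept

labels = ["cancer", "cancer", "benign", "benign", "benign"]
data = {
    "transition_1": [100.0, None, 80.0, 60.0, None],
    "transition_2": [None, None, None, None, 50.0],
}
print(filter_and_impute(data, labels))
```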

[0295]

The next step was finding the set of most cooperative proteins. The cooperative score of a protein is the number of high performing panels it participates in divided by the number of such panels it could appear on by chance alone. Hence, a cooperative score above 1 is good, and a score below 1 is not. The cooperative score for each protein is estimated by the following procedure:

[0296]

One million random panels of 10 proteins each, selected from the 117 candidates, were generated. Each panel of 10 proteins was trained using the Monte Carlo cross validation (MCCV) method (with a 20% hold-out rate and one hundred sample permutations per panel) to fit a logistic regression model, and its performance was assessed by partial AUC [28].
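The resampling part of MCCV can be sketched as a generator of random train and hold-out splits. The sketch assumes simple random splitting of individual samples; whether the study stratified by class or kept matched pairs together is not stated here. The function name and the `seed` parameter are hypothetical.

```python
import random

def mccv_splits(n_samples, n_iterations, holdout_rate=0.2, seed=0):
    """Yield (train_indices, holdout_indices) pairs for Monte Carlo cross validation."""
    rng = random.Random(seed)
    n_holdout = max(1, round(n_samples * holdout_rate))
    indices = list(range(n_samples))
    for _ in range(n_iterations):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_holdout:]), sorted(shuffled[:n_holdout])

# 143 discovery samples, one hundred permutations, 20% held out each time.
splits = list(mccv_splits(143, 100))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each split would then be used to fit the logistic regression model on the training indices and to score the held-out indices, with performance averaged over all permutations.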

[0297]

By generating such a large number of panels, we sample the space of classifiers sufficiently well to find some high performers by chance. The one hundred best random panels (see Table 2) out of the million generated were kept and for each of the 117 proteins we determined how frequently each occurred on these top panels. Of the 117 proteins, 36 had frequency more than expected by chance, after endogenous normalizers were removed (Table 22). The expected number of panels on which a protein would appear by chance is 100*10/117=8.33. The cooperative score for a protein is the number of panels it appears on divided by 8.33.
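The cooperative score calculation can be sketched as follows. The function name is hypothetical, and the number of candidates is left as a parameter: evaluated literally, 100*10/117 is about 8.55, whereas the stated divisor of 8.33 corresponds to 100*10/120, so the appropriate candidate count should be confirmed before reproducing the published scores.

```python
from collections import Counter

def cooperative_scores(top_panels, n_candidates):
    """Observed appearances on top panels divided by the number expected by chance.

    top_panels: list of panels, each a list of protein names of equal length.
    n_candidates: size of the pool from which the random panels were drawn.
    """
    panel_size = len(top_panels[0])
    expected = len(top_panels) * panel_size / n_candidates
    counts = Counter(protein for panel in top_panels for protein in panel)
    return {protein: count / expected for protein, count in counts.items()}

# Toy example: two top panels of two proteins drawn from four candidates.
print(cooperative_scores([["A", "B"], ["A", "C"]], n_candidates=4))
```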

[0298]

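Category | Protein | Gene | Cooperative Score | | | Transition | SEQ ID NO | Coefficient | | |
Classifier | TSP1_HUMAN | THBS1 | 1.8 | 0.25 | 0.24 | GFLLLASLR_495.31_559.40 | 22 | 0.53 | 0.44 | | 510
Classifier | COIA1_HUMAN | COL18A1 | 3.7 | 0.16 | 0.25 | AVGLAGTFR_446.26_721.40 | 11 | −1.56 | −0.91 | | 35
Classifier | ISLR_HUMAN | ISLR | 1.4 | 0.32 | 0.25 | ALPGTPVASSQPR_640.85_841.50 | 14 | 1.40 | 0.83 | |
Classifier | TETN_HUMAN | CLEC3B | 2.5 | 0.26 | 0.26 | LDTLAQEVALLK_657.39_330.20 | 20 | −1.79 | −1.02 | | 58000
Classifier | FRIL_HUMAN | FTL | 2.8 | 0.31 | 0.26 | LGGPEAGLGEYLFER_804.40_913.40 | 24 | 0.39 | 0.17 | Secreted, Epi, Endo | 12
Classifier | GRP78_HUMAN | HSPA5 | 1.4 | 0.27 | 0.27 | TWNDPSVQQDIK_715.85_260.20 | 23 | 1.41 | 0.55 | Secreted, Epi, Endo | 100
Classifier | ALDOA_HUMAN | ALDOA | 1.3 | 0.26 | 0.28 | ALQASALK_401.25_617.40 | 7 | −0.80 | −0.26 | Secreted, Epi | 250
Classifier | BGH3_HUMAN | TGFBI | 1.8 | 0.21 | 0.28 | LTLLAPLNSVFK_658.40_804.50 | 8 | 1.73 | 0.54 | Epi | 140
Classifier | LG3BP_HUMAN | LGALS3BP | 4.3 | 0.29 | 0.29 | VEIFYR_413.73_598.30 | 25 | −0.58 | −0.21 | Secreted | 440
Classifier | LRP1_HUMAN | LRP1 | 4.0 | 0.13 | 0.32 | TVLWPNGLSLDIPAGR_855.00_400.20 | 15 | −1.59 | −0.83 | Epi | 20
Classifier | FIBA_HUMAN | FGA | 1.1 | 0.31 | 0.35 | NSLFEYQK_514.76_714.30 | 26 | 0.31 | 0.13 | | 130000
Classifier | PRDX1_HUMAN | PRDX1 | 1.5 | 0.32 | 0.37 | QITVNDLPVGR_606.30_428.30 | 16 | −0.34 | −0.26 | Epi | 60
Classifier | GSLG1_HUMAN | GLG1 | 1.2 | 0.34 | 0.45 | IIIQESALDYR_660.86_338.20 | 27 | −0.70 | −0.44 | Epi, Endo |
Robust | KIT_HUMAN | KIT | 1.4 | 0.33 | 0.46 | | | | | | 8.2
Robust | CD14_HUMAN | CD14 | 4.0 | 0.33 | 0.48 | | | | | Epi | 420
Robust | EF1A1_HUMAN | EEF1A1 | 1.2 | 0.32 | 0.56 | | | | | Secreted, Epi | 61
Robust | TENX_HUMAN | TNXB | 1.1 | 0.30 | 0.56 | | | | | Endo | 70
Robust | AIFM1_HUMAN | AIFM1 | 1.4 | 0.32 | 0.70 | | | | | Epi, Endo | 1.4
Robust | GGH_HUMAN | GGH | 1.3 | 0.32 | 0.81 | | | | | | 250
Robust | IBP3_HUMAN | IGFBP3 | 3.4 | 0.32 | 1.82 | | | | | | 5700
Robust | ENPL_HUMAN | HSP90B1 | 1.1 | 0.29 | 5.90 | | | | | Secreted, Epi, Endo | 88
Non-Robust | ERO1A_HUMAN | ERO1L | 6.2 | | | | | | | Secreted, Epi, Endo |
Non-Robust | 6PGD_HUMAN | PGD | 4.3 | | | | | | | Epi, Endo | 29
Non-Robust | ICAM1_HUMAN | ICAM1 | 3.9 | | | | | | | | 71
Non-Robust | PTPA_HUMAN | PPP2R4 | 2.1 | | | | | | | Endo | 3.3
Non-Robust | NCF4_HUMAN | NCF4 | 2.0 | | | | | | | Endo |
Non-Robust | SEM3G_HUMAN | SEMA3G | 1.9 | | | | | | | |
Non-Robust | 1433T_HUMAN | YWHAQ | 1.5 | | | | | | | Epi | 180
Non-Robust | RAP2B_HUMAN | RAP2B | 1.5 | | | | | | | Epi |
Non-Robust | MMP9_HUMAN | MMP9 | 1.4 | | | | | | | | 28
Non-Robust | FOLH1_HUMAN | FOLH1 | 1.3 | | | | | | | |
Non-Robust | GSTP1_HUMAN | GSTP1 | 1.3 | | | | | | | Endo | 32
Non-Robust | EF2_HUMAN | EEF2 | 1.3 | | | | | | | Secreted, Epi | 30
Non-Robust | RAN_HUMAN | RAN | 1.2 | | | | | | | Secreted, Epi | 4.6
Non-Robust | SODM_HUMAN | SOD2 | 1.2 | | | | | | | Secreted | 7.1
Non-Robust | DSG2_HUMAN | DSG2 | 1.1 | | | | | | | Endo | 2.7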

The 36 most cooperative proteins are listed in Table 22.

[0300]

The set of 36 cooperative proteins was further reduced to a set of 21 proteins by manually reviewing raw SRM data and eliminating proteins that did not have robust SRM transitions because of low signal-to-noise ratio or interference.

[0301]

Proteins were iteratively eliminated from the set of 21 proteins until a classifier with the optimal partial AUC was obtained. The criterion for elimination was coefficient stability. In a logistic regression model each protein has a coefficient, which is determined when the model is trained. When training is performed using Monte Carlo cross validation (MCCV), hundreds of coefficient estimates are derived for each protein. The variability of these estimates is a measure of the stability of the protein. At each step the proteins were trained to a logistic regression model using MCCV (hold out rate 20%, ten thousand sample permutations per panel) and their stability was measured. The least stable protein was eliminated. This process continued until a 13 protein classifier with optimal partial AUC was reached.
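One elimination step can be sketched as follows. This is a minimal illustration, not the procedure as implemented in the study: it assumes that the coefficient estimates from the MCCV splits are already available and that stability is measured by the coefficient of variation (standard deviation divided by absolute mean), which is an assumption. The function names and the toy numbers are hypothetical.

```python
from statistics import mean, stdev


def coefficient_cv(estimates):
    """Coefficient of variation of one protein's coefficient estimates across MCCV splits."""
    m = mean(estimates)
    if m == 0:
        return float("inf")  # a zero-mean coefficient is treated as maximally unstable
    return stdev(estimates) / abs(m)


def least_stable_protein(coefficients_by_protein):
    """Return the protein whose coefficient varies most across splits (assumed criterion)."""
    return max(
        coefficients_by_protein,
        key=lambda protein: coefficient_cv(coefficients_by_protein[protein]),
    )


# Toy coefficient estimates, illustrative only (not data from the study).
toy_coefficients = {
    "PROTEIN_A": [1.0, 1.1, 0.9, 1.0],
    "PROTEIN_B": [0.5, -0.4, 1.2, 0.1],
    "PROTEIN_C": [-2.0, -2.1, -1.9, -2.0],
}
to_eliminate = least_stable_protein(toy_coefficients)
```

In an iterative procedure the eliminated protein would be dropped, the remaining panel retrained by MCCV, and the step repeated while tracking partial AUC.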

[0302]

Finally, the 13 protein classifier was trained to a logistic regression model by MCCV (hold out rate 20%, twenty thousand sample permutations). The thirteen proteins for the rule-out classifier are listed in Table 18 along with their highest intensity transition and model coefficient.

[0303]

Selection of a Decision Threshold

[0304]

Assuming the cancer prevalence of lung nodules is prev, the performance of a classifier (NPV and ROR) on the patient population with lung nodules was calculated from sensitivity (sens) and specificity (spec) as follows:

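[0305]

$$NPV = \frac{(1 - prev) \cdot spec}{prev \cdot (1 - sens) + (1 - prev) \cdot spec}$$
$$ROR = prev \cdot (1 - sens) + (1 - prev) \cdot spec$$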

[0306]

The threshold separating calls for cancer or benign samples was then selected as the probability score with NPV ≥90% and ROR ≥20%. As we expect the classifier's performance measured on the discovery set to be an overestimate, the threshold is selected to be a range, as performance will usually degrade on an independent validation set.
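The calculation and the threshold selection can be sketched as follows. The sketch assumes the relations that follow from inverting the sensitivity and specificity expressions given in the Power Analysis for the Validation Study section, namely ROR = prev·(1 − sens) + (1 − prev)·spec and NPV = (1 − prev)·spec/ROR. The function names and the example operating points are hypothetical.

```python
def npv_ror(sens, spec, prev):
    """Negative predictive value and rule-out rate for a given prevalence."""
    ror = prev * (1.0 - sens) + (1.0 - prev) * spec
    npv = (1.0 - prev) * spec / ror if ror > 0 else float("nan")
    return npv, ror


def thresholds_meeting_targets(points, prev, min_npv=0.90, min_ror=0.20):
    """Keep thresholds whose (sens, spec) give NPV >= min_npv and ROR >= min_ror.

    points: iterable of (threshold, sens, spec) tuples, e.g. read off a ROC curve.
    """
    kept = []
    for threshold, sens, spec in points:
        npv, ror = npv_ror(sens, spec, prev)
        if npv >= min_npv and ror >= min_ror:
            kept.append(threshold)
    return kept


# Illustrative operating points only (not data from the study).
example_points = [(0.3, 0.99, 0.10), (0.5, 0.90, 0.40), (0.7, 0.60, 0.80)]
acceptable = thresholds_meeting_targets(example_points, prev=0.285)
```

Because the result is a list, the acceptable thresholds naturally form a range rather than a single value.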

[0307]

Validation of the Rule-Out Classifier

[0308]

52 cancer and 52 benign samples (see Table 17) were used to validate the performance of the 13 protein classifier. Half of the samples were placed in pre-determined processing batches analyzed immediately after the discovery samples and the other half of samples were analyzed at a later date. This introduced variability one would expect in practice. More specifically, the three HPS samples run in each processing batch were utilized as external calibrators. Details on HPS calibration are described below.

[0309]

Calibration by HPS Samples

[0310]

For a label-free MS approach, variation in signal intensity between different experiments is expected. To reduce this variation, we utilized HPS samples as an external standard and calibrated the intensity between the discovery and validation studies. Assume that $\check{I}_{i,s}$ is the logarithmically transformed (base 2), normalized intensity of transition $i$ in sample $s$, and that $\check{I}_{i,dis}$ and $\check{I}_{i,val}$ are the corresponding median values of HPS samples in the discovery and the validation studies, respectively. Then the HPS corrected intensity is
$$\tilde{I}_{i,s} = \check{I}_{i,s} - \check{I}_{i,val} + \check{I}_{i,dis}$$
Consequently, assume that the probability for cancer of a clinical sample in the validation study is predicted as prob by the classifier. Then the HPS corrected probability of cancer of the clinical sample is calculated as follows:

[0311]

[0312]

Here $S_{HPS,dis}$ and $S_{HPS,val}$ were the median values of $S$ of all HPS samples in the discovery and validation studies, respectively.
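A minimal sketch of the two HPS corrections follows. The intensity correction mirrors the formula stated above. The probability correction is an assumption: it takes S to be the logit of the classifier probability and shifts it in the same way as the intensities, which is consistent with the definitions of the two HPS medians but is not confirmed by the surviving text. All function names are hypothetical.

```python
import math
from statistics import median


def hps_corrected_intensity(log2_intensity, hps_val_log2, hps_dis_log2):
    """Shift a log2 intensity by the difference of the HPS medians (validation vs. discovery)."""
    return log2_intensity - median(hps_val_log2) + median(hps_dis_log2)


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def hps_corrected_probability(prob, hps_probs_val, hps_probs_dis):
    """Assumed form: correct the logit score S by the HPS medians, then map back to a probability."""
    s = logit(prob)
    s_val = median([logit(p) for p in hps_probs_val])
    s_dis = median([logit(p) for p in hps_probs_dis])
    s_corrected = s - s_val + s_dis
    return 1.0 / (1.0 + math.exp(-s_corrected))
```

When the HPS medians of the two studies coincide, both corrections leave the input unchanged.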

[0313]

Statistical Analysis

[0314]

All statistical analyses were performed with Stata, R and/or MatLab.

[0315]

Depletion Column Drift

[0316]

We observed an increase of signal intensity as more and more samples were depleted by the same column. We used transition intensity in HPS samples to quantify this technical variability. Assuming $I_{i,s}$ was the intensity of transition $i$ in an HPS sample $s$, the drift of the sample was defined as

[0317]


where $\hat{I}_i$ was the mean value of $I_{i,s}$ among all HPS samples that were depleted by the same column and the median was taken over all detected transitions in the sample. Then the drift of the column was defined as
$$drift_{col} = \operatorname{median}(drift_s > 0) - \operatorname{median}(drift_s < 0).$$

[0318]

Here the median was taken over all HPS samples depleted by the column. If no sample drift was greater or less than zero, the corresponding median was taken as 0. The median column drift was the median of drifts of all depletion columns used in the study.
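The drift calculation can be sketched as follows. The column drift follows the definition given above. The per-sample drift is an assumption: it is taken here as the median relative deviation of each transition from its column mean, (I − Î)/Î, since the exact expression is not shown in the text. Function names and data layout are hypothetical.

```python
from statistics import mean, median


def sample_drift(sample, column_samples):
    """Median relative deviation of a sample's transitions from the column means (assumed form).

    sample: dict mapping transition -> intensity for one HPS sample.
    column_samples: list of such dicts for all HPS samples depleted by the same column.
    """
    deviations = []
    for transition, intensity in sample.items():
        values = [s[transition] for s in column_samples if transition in s]
        reference = mean(values)
        deviations.append((intensity - reference) / reference)
    return median(deviations)


def column_drift(sample_drifts):
    """median(drifts > 0) - median(drifts < 0); a missing side contributes 0."""
    positive = [d for d in sample_drifts if d > 0]
    negative = [d for d in sample_drifts if d < 0]
    upper = median(positive) if positive else 0.0
    lower = median(negative) if negative else 0.0
    return upper - lower
```

The median column drift reported for a study would then be the median of `column_drift` over all depletion columns.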

[0319]

Identification of Endogenous Normalizing Proteins

[0320]

The following criteria were used to identify a transition as a normalizer:

    • Possessed the highest median intensity of all transitions from the same protein.
    • Detected in all samples.
    • Ranked high in reducing median technical CV (median CV of transition intensities that were measured on HPS samples) as a normalizer.
    • Ranked high in reducing median column drift that was observed in sample depletion.
    • Possessed low median technical CV and low median biological CV (median CV of transition intensities that were measured on clinical samples).

[0326]

Six transitions were selected and appear in Table 23.

[0327]

Panel of endogenous normalizers.
Normalizer | Transition | SEQ ID NO | Median Technical CV (%) | Median Column Drift (%)
PEDF_HUMAN | LQSLFDSPDFSK_692.34_593.30 | 28 | 25.8 | 6.8
MASP1_HUMAN | TGVITSPDFPNPYPK_816.92_258.10 | 6 | 26.5 | 18.3
GELS_HUMAN | TASDFITK_441.73_710.40 | 5 | 27.1 | 16.8
LUM_HUMAN | SLEDLQLTHNK_433.23_499.30 | 29 | 27.1 | 16.1
C163A_HUMAN | INPASLDK_429.24_630.30 | 30 | 26.6 | 14.6
PTPRJ_HUMAN | VITEPIPVSDLR_669.89_896.50 | 31 | 27.2 | 18.2
Normalization by Panel of Transitions | | | 25.1 | 9.0
Without Normalization | | | 32.3 | 23.8

[0328]

Data Normalization

[0329]

A panel of six normalization transitions (see Table 23) was used to normalize raw SRM data for two purposes: (A) to reduce sample-to-sample intensity variations within the same study and (B) to reduce intensity variations between different studies. For the first purpose, a scaling factor was calculated for each sample so that the intensities of the six normalization transitions of the sample were aligned with the corresponding median intensities of all HGS samples. Assuming that $N_{i,s}$ is the intensity of a normalization transition $i$ in sample $s$ and $\hat{N}_i$ the corresponding median intensity of all HGS samples, then the scaling factor for sample $s$ is given by $\hat{S}/S_s$, where

[0330]


$$S_s = \operatorname{median}\left(\frac{N_{1,s}}{\hat{N}_1}, \frac{N_{2,s}}{\hat{N}_2}, \ldots, \frac{N_{6,s}}{\hat{N}_6}\right)$$
is the median of the intensity ratios and $\hat{S}$ is the median of $S_s$ over all samples in the study. For the second purpose, a scaling factor was calculated between the discovery and the validation studies so that the median intensities of the six normalization transitions of all HGS samples in the validation study were comparable with the corresponding values in the discovery study. Assuming that the median intensities of all HGS samples in the two studies are $\hat{N}_{i,dis}$ and $\hat{N}_{i,val}$, respectively, the scaling factor for the validation study is given by

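[0331]

$$R = \operatorname{median}\left(\frac{\hat{N}_{1,dis}}{\hat{N}_{1,val}}, \frac{\hat{N}_{2,dis}}{\hat{N}_{2,val}}, \ldots, \frac{\hat{N}_{6,dis}}{\hat{N}_{6,val}}\right)$$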

[0332]

Finally, for each transition of each sample, its normalized intensity was calculated as
$$\tilde{I}_{i,s} = I_{i,s} \cdot R \cdot \hat{S}/S_s$$
where $I_{i,s}$ was the raw intensity.
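The normalization can be sketched as follows. The sketch assumes that the per-sample factor is the median of the ratios of each normalizer intensity to its reference median, and that the between-study factor R is the median of the discovery-to-validation ratios of the reference medians; both directions are inferred from the stated goal of aligning intensities and are assumptions. Function names and dictionary layouts are hypothetical.

```python
from statistics import median


def sample_ratio(normalizer_intensities, reference_medians):
    """S_s: median ratio of a sample's normalizer intensities to the reference medians."""
    return median(
        [normalizer_intensities[t] / reference_medians[t] for t in reference_medians]
    )


def study_factor(reference_dis, reference_val):
    """R: median ratio of discovery to validation reference medians (assumed direction)."""
    return median([reference_dis[t] / reference_val[t] for t in reference_dis])


def normalize(raw_intensity, s_sample, s_hat, r=1.0):
    """Normalized intensity = raw * R * S_hat / S_s; r defaults to 1 within a single study."""
    return raw_intensity * r * s_hat / s_sample
```

For example, a sample whose normalizers all read twice the reference medians has S_s = 2, so with Ŝ = 1 and R = 1 its raw intensities are halved.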

[0333]

Isolation of Membrane Proteins from Tissues

[0334]

Endothelial plasma membrane proteins were isolated from normal and tumor lung tissue samples that were obtained from fresh lung resections. Briefly, tissues were washed in buffer and homogenates were prepared by disrupting the tissues with a Polytron. Homogenates were filtered through a 180-μm mesh and filtrates were centrifuged at 900×g for 10 min at 4° C. Supernatants were centrifuged on top of a 50% (w:v) sucrose cushion at 218,000×g for 60 min at 4° C. to pellet the membranes. Pellets were resuspended and treated with micrococcal nuclease. Membranes from endothelial cells were incubated with a combination of anti-thrombomodulin, anti-ACE, anti-CD34 and anti-CD144 antibodies, and then centrifuged on top of a 50% (w:v) sucrose cushion at 280,000×g for 60 min at 4° C. After pellets were resuspended, endothelial cell plasma membranes were isolated using MACS microbeads and treated with potassium iodide to remove cytoplasmic peripheral proteins.

[0335]

Epithelial plasma membrane proteins from normal and tumor lung tissue samples were isolated from fresh lung resections. Tissues were washed and homogenates were prepared as described above for the endothelial plasma membrane protein preparation. Membranes from epithelial cells were labeled with a combination of anti-ESA, anti-CEA, anti-CD66c and anti-EMA antibodies, and then centrifuged on top of a 50% (w:v) sucrose cushion at 218,000×g for 60 min at 4° C. Epithelial cell plasma membranes were isolated using MACS microbeads and the eluate was centrifuged at 337,000×g for 30 minutes at 4° C. over a 33% (w:v) sucrose cushion. After removing the supernatant and sucrose cushion, the pellet was resuspended in Laemmli/Urea/DTT.

[0336]

Isolation of Secreted Proteins from Tissues

[0337]

Secreted proteins were isolated from normal and tumor lung tissue samples that were obtained from fresh lung resections. Tissues were washed and homogenized using a Polytron homogenizer. The density of the homogenates was adjusted to 1.4 M with concentrated sucrose prior to isolating the secretory vesicles by isopycnic centrifugation at 100,000×g for 2 hr at 4° C. on a 0.8 and 1.2 M discontinuous sucrose gradient. Vesicles concentrating at the 0.8/1.2 M interface were collected and further incubated for 25 minutes with 0.5 M KCl (final concentration) to remove loosely bound peripheral proteins. Vesicles were recovered by ultracentrifugation at 150,000×g for one hour at 4° C. and then opened with 100 mM ammonium carbonate pH 11.0 for 30 minutes at 4° C. Secreted proteins were recovered in the supernatant following a 1-hour ultracentrifugation at 150,000×g at 4° C.

[0338]

Preparation of IgY14-SuperMix Immunoaffinity Columns

[0339]

Immunoaffinity columns were prepared in-house using a slurry containing a 2:1 ratio of IgY14 and SuperMix immunoaffinity resins, respectively (Sigma Aldrich). Briefly, a slurry (10 ml, 50%) of mixed immunoaffinity resins was added to a glass chromatography column (Tricorn, GE Healthcare) and the resin was allowed to settle under gravity flow, resulting in a 5 ml resin volume in the column. The column was capped and placed on an Agilent 1100 series HPLC system for further packing (20 minutes, 0.15M ammonium bicarbonate, 2 ml/min). The performance of each column used in the study was then assessed by replicate injections of aliquots of HPS sample. Column performance was assessed prior to beginning immunoaffinity separation of each batch of clinical samples.

[0340]

IgY14-SuperMix Immunoaffinity Chromatography

[0341]

Plasma samples (60 μl) were diluted (0.15M ammonium bicarbonate, 1:2 v/v, respectively) and filtered (0.2 μm AcroPrep 96-well filter plate, Pall Life Sciences) prior to immunoaffinity separation. Dilute plasma (90 μl) was separated on the IgY14-SuperMix column connected to an Agilent 1100 series HPLC system using three buffers (loading/washing: 0.15M ammonium bicarbonate; stripping/elution: 0.1M glycine, pH 2.5; neutralization: 0.01M Tris-HCl, 0.15M NaCl, pH 7.4) with a load-wash-elute-neutralization-re-equilibration cycle (36 minutes total time). The unbound and bound fractions were monitored using UV absorbance (280 nm) and were baseline resolved after separation. Only the unbound fraction containing the low abundance proteins was collected for downstream processing and analysis. Unbound fractions were lyophilized prior to enzymatic digestion.

[0342]

Enzymatic Digestion of Low Abundance Proteins

[0343]

Low abundance proteins were reconstituted under mild denaturing conditions (200 μl of 1:1 0.1M ammonium bicarbonate/trifluoroethanol v/v) and allowed to incubate (30 minutes, room temperature, orbital shaker). Samples were then diluted (800 μl of 0.1M ammonium bicarbonate) and digested with trypsin (Princeton Separations; 0.4 μg trypsin per sample, 37° C., 16 hours). Digested samples were lyophilized prior to solid-phase extraction.

[0344]

Solid-Phase Extraction

[0345]

Solid phase extraction was used to reduce salt and buffer contents in the samples prior to mass spectrometry. The lyophilized samples containing tryptic peptides were reconstituted (350 μl 0.01M ammonium bicarbonate) and allowed to incubate (15 minutes, room temperature, orbital shaker). A reducing agent was then added to the samples (30 μl 0.05M TCEP) and the samples were incubated (60 minutes, room temperature). Dilute acid and a low percentage of organic solvent (375 μl 90% water/10% acetonitrile/0.2% trifluoroacetic acid) were added to optimize the solid phase extraction of peptides. The extraction plate (Empore C18, 3M Bioanalytical Technologies) was conditioned according to manufacturer protocol. Samples were loaded onto the solid phase extraction plate, washed (500 μl 95% water/5% acetonitrile/0.1% trifluoroacetic acid) and eluted (200 μl 52% water/48% acetonitrile/0.1% trifluoroacetic acid) into a collection plate. The eluate was split into two equal aliquots and each aliquot was taken to dryness in a vacuum concentrator. One aliquot was used immediately for mass spectrometry, while the other was stored (−80° C.) and used as needed. Samples were reconstituted (12 μl 90% water/10% acetonitrile/0.2% formic acid) just prior to LC-SRM MS analysis.

[0346]

Inclusion and Exclusion Criteria

[0347]

Plasma samples were eligible for the studies if they were (A) obtained in EDTA tubes, (B) obtained from subjects previously enrolled in IRB-approved studies at the participating institutions, and (C) archived, e.g. labeled, aliquotted and frozen, as stipulated by the study protocols. The samples must also satisfy the following inclusion and exclusion criteria:

    • 1) Inclusion Criteria: Sample eligibility was based on clinical parameters, including the following subject, nodule and clinical staging parameters:
      • a) Subject
        • i) age ≥40
        • ii) any smoking status, e.g. current, former, or never
        • iii) co-morbid conditions, e.g. COPD
        • iv) prior malignancy with a minimum of 5 years in clinical remission
        • v) prior history of skin carcinomas (squamous or basal cell)
      • b) Nodule
        • i) Radiology
          • (1) size ≥4 mm and ≤70 mm (up to Stage 2B eligible)
          • (2) any spiculation or ground glass opacity
        • ii) Pathology
          • (1) malignant: adenocarcinoma, squamous, or large cell
          • (2) benign: inflammatory (e.g. granulomatous, infectious) or non-inflammatory (e.g. hamartoma)
      • c) Clinical stage
        • i) Primary tumor: ≤T2 (e.g. 1A, 1B, 2A and 2B)
        • ii) Regional lymph nodes: N0 or N1 only
        • iii) Distant metastasis: M0 only
    • 2) Exclusion Criteria
      • a) Subject: prior malignancy within 5 years of IPN diagnosis
      • b) Nodule:
        • i) size data unavailable
        • ii) for cancer or benign SPNs, no pathology data available
        • iii) pathology—small cell lung cancer
      • c) Clinical stage
        • i) Primary tumor: ≥T3
        • ii) Regional lymph nodes: ≥N2
        • iii) Distant metastasis: ≥M1

[0377]

Power Analysis for the Discovery Study

[0378]

The power analysis for the discovery study was based on the following assumptions: 1) The overall false positive rate (α) was set to 0.05. 2) Sidak correction for multiple testing was used to calculate the effective $\alpha_{eff}$ for testing 200 proteins, i.e.,

[0379]


$$\alpha_{eff} = 1 - (1 - \alpha)^{1/200}$$

3) The effective sample size was reduced by a factor of 0.864 to account for the larger sample requirement for the Mann-Whitney test than for the t-test. 4) The overall coefficient of variation was set to 0.43 based on previous experience. 5) The power (1 − β) of the study was calculated based on the formula for the two-sample, two-sided t-test, using the effective $\alpha_{eff}$ and the effective sample size. The power for the discovery study is tabulated in Table 24 by the sample size per cohort and the detectable fold difference between control and disease samples.
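The first two adjustments can be sketched numerically. The Sidak formula used here is the standard one, α_eff = 1 − (1 − α)^(1/m); applying the 0.864 factor as a simple multiplier on the cohort size is an assumption about how the reduction was carried out. The function names are hypothetical, and the power calculation itself (which needs a noncentral t distribution) is not reproduced.

```python
def sidak_alpha(alpha, n_tests):
    """Per-test significance level that keeps the overall false positive rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def effective_sample_size(cohort_size, efficiency=0.864):
    """Cohort size discounted for using the Mann-Whitney test instead of the t-test."""
    return efficiency * cohort_size


alpha_eff = sidak_alpha(0.05, 200)      # roughly 2.56e-4
n_eff = effective_sample_size(50)       # 43.2 for a cohort of 50
```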

[0380]

Cohort size required to detect protein fold changes with a given probability.
Cohort Size | Detectable Protein Fold Difference: 1.25 | 1.5 | 1.75 | 2
20 | 0.011 | 0.112 | 0.368 | 0.653
30 | 0.025 | 0.277 | 0.698 | 0.925
40 | 0.051 | 0.495 | 0.905 | 0.992
50 | 0.088 | 0.687 | 0.977 | 0.999
60 | 0.129 | 0.812 | 0.994 | 1
70 | 0.183 | 0.902 | 0.999 | 1
80 | 0.244 | 0.953 | 1 | 1
90 | 0.302 | 0.977 | 1 | 1
100 | 0.369 | 0.99 | 1 | 1

[0381]

Power Analysis for the Validation Study

[0382]

Sufficient cancer and benign samples are needed in the validation study to confirm the performance of the rule-out classifier obtained from the discovery study. We are interested in obtaining the 95% confidence intervals (CIs) on NPV and ROR for the rule-out classifier. Using the Equations in the Selection of a Decision Threshold section herein, one can derive sensitivity (sens) and specificity (spec) as functions of NPV and ROR, i.e.,
sens=1−ROR*(1−NPV)/prev,
spec=ROR*NPV/(1−prev),
where prev is the cancer prevalence in the intended use population. Assume that the validation study contains $N_C$ cancer samples and $N_B$ benign samples. Based on the binomial distribution, the variances of sensitivity and specificity are given by
var(sens)=sens*(1−sens)/N_C
var(spec)=spec*(1−spec)/N_B
Using the Equations in the Selection of a Decision Threshold section herein, the corresponding variances of NPV and ROR can be derived under the large-sample, normal-distribution approximation as

[0383]


The two-sided 95% CIs of NPV and ROR are then given by $\pm z_{\alpha/2}\sqrt{\mathrm{var}(NPV)}$ and $\pm z_{\alpha/2}\sqrt{\mathrm{var}(ROR)}$, respectively, where $z_{\alpha/2} = 1.959964$ is the 97.5% quantile of the normal distribution. The anticipated 95% CIs for the validation study are tabulated in Table 25 by the sample size ($N_C = N_B = N$) per cohort.
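The confidence interval calculation can be sketched as follows. The variance expressions are a reconstruction by first-order error propagation (delta method), assuming ROR = prev·(1 − sens) + (1 − prev)·spec and NPV = (1 − prev)·spec/ROR, which follow from the sens and spec relations above; they are not copied from the original equations. With NPV = 90%, ROR = 52% and a prevalence of 28.5%, the sketch gives about ±3.9% for NPV and ±7.0% for ROR at N = 100 per cohort. Function names are hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution


def sens_spec_from(npv, ror, prev):
    """Sensitivity and specificity implied by a target NPV and ROR."""
    sens = 1.0 - ror * (1.0 - npv) / prev
    spec = ror * npv / (1.0 - prev)
    return sens, spec


def ci_half_widths(npv, ror, prev, n_cancer, n_benign):
    """Half-widths of the two-sided 95% CIs of NPV and ROR (delta-method approximation)."""
    sens, spec = sens_spec_from(npv, ror, prev)
    var_sens = sens * (1.0 - sens) / n_cancer
    var_spec = spec * (1.0 - spec) / n_benign
    # ROR is linear in sens and spec, so its variance follows directly.
    var_ror = prev ** 2 * var_sens + (1.0 - prev) ** 2 * var_spec
    # Partial derivatives of NPV = (1 - prev) * spec / ROR.
    d_sens = prev * (1.0 - prev) * spec / ror ** 2
    d_spec = prev * (1.0 - prev) * (1.0 - sens) / ror ** 2
    var_npv = d_sens ** 2 * var_sens + d_spec ** 2 * var_spec
    return Z_975 * math.sqrt(var_npv), Z_975 * math.sqrt(var_ror)


npv_ci_100, ror_ci_100 = ci_half_widths(0.90, 0.52, 0.285, 100, 100)
```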

[0384]

The 95% confidence interval (CI) of NPV as a function of cohort size. The corresponding 95% CI of ROR is also listed. The prevalence was set at 28.5%. The expected NPV and ROR were set to values in the discovery study, i.e., 90% and 52%, respectively.
Cohort Size | 95% CI of NPV (± %) | 95% CI of ROR (± %)
10 | 12.5 | 22.1
20 | 8.8 | 15.7
30 | 7.2 | 12.8
40 | 6.2 | 11.1
50 | 5.6 | 9.9
60 | 5.1 | 9.0
70 | 4.7 | 8.4
80 | 4.4 | 7.8
90 | 4.2 | 7.4
100 | 3.9 | 7.0
150 | 3.2 | 5.7
200 | 2.8 | 5.0

[0385]

Calculation of Q-Values of Peptide and Protein Assays

[0386]

To determine the false positive assay rate, the q-values of peptide SRM assays were calculated as follows. Using the distribution of Pearson correlations between transitions from different proteins as the null distribution (FIG. 7), an empirical p-value was assigned to each pair of transitions from the same peptide that were detected in at least five common samples; otherwise a value of ‘NA’ was assigned. The empirical p-value was converted to a q-value using the “qvalue” package in Bioconductor. Peptide q-values were below 0.05 for all SRM assays presented in Table 6.

[0387]

The q-values of protein SRM assays were calculated in the same way except Pearson correlations of individual proteins were calculated as those between two transitions from different peptides of the protein. For proteins not having two peptides detected in five or more common samples, their q-values could not be properly evaluated and were assigned ‘NA’.
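The empirical p-value step can be sketched as follows. The sketch assumes a one-sided test in which the p-value is the fraction of null correlations at least as large as the observed correlation; the exact convention is not stated in the text. The conversion to q-values (done with the Bioconductor "qvalue" package) is not reproduced. Function names and the dictionary layout (sample identifier to intensity) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def empirical_p_value(observed_r, null_correlations):
    """Fraction of null correlations that are at least as large as the observed one."""
    hits = sum(1 for r in null_correlations if r >= observed_r)
    return hits / len(null_correlations)


def transition_pair_p_value(intensities_a, intensities_b, null_correlations, min_common=5):
    """Empirical p-value for two transitions, or None ('NA') with too few common samples."""
    common = sorted(set(intensities_a) & set(intensities_b))
    if len(common) < min_common:
        return None
    r = pearson([intensities_a[s] for s in common], [intensities_b[s] for s in common])
    return empirical_p_value(r, null_correlations)
```

For the protein-level q-values the same function would be applied to two transitions from different peptides of the same protein.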

[0388]

Impact of Categorical Confounding Factors

[0389]

Impact of categorical confounding factors on classifier score.
Factor | Group | Cancer: n | Cancer: median score (quartile range) | Cancer: p-value | Benign: n | Benign: median score (quartile range) | Benign: p-value
Gender | Female | 70 | 0.701 (0.642-0.788) | 0.786* | 68 | 0.570 (0.390-0.70) | 0.387*
Gender | Male | 54 | 0.736 (0.628-0.802) | | 55 | 0.621 (0.459-0.723) |
Smoking Status | Never | 8 | 0.664 (0.648-0.707) | 0.435** | 34 | 0.554 (0.452-0.687) | 0.365**
Smoking Status | Past | 98 | 0.703 (0.618-0.802) | | 73 | 0.586 (0.428-0.716) |
Smoking Status | Current | 17 | 0.749 (0.657-0.789) | | 13 | 0.638 (0.619-0.728) |
*p-value by Mann-Whitney test
**p-value by Kruskal-Wallis test

[0390]

Impact of Continuous Confounding Factors

[0391]

Impact of continuous confounding factors on classifier score.
Factor | Group | Correlation | Coefficient of linear fit (95% CI) | p-value
Age | All | 0.198 | 0.003 (0.001 to 0.005) | 0.002
Age | Cancer | 0.012 | 0.000 (−0.003 to 0.003) | 0.893
Age | Benign | 0.248 | 0.004 (0.001 to 0.007) | 0.006
Nodule size | All | −0.057 | −0.002 (−0.005 to 0.002) | 0.372
Nodule size | Cancer | −0.013 | 0.000 (−0.005 to 0.004) | 0.889
Nodule size | Benign | −0.055 | −0.001 (−0.006 to 0.003) | 0.542
Pack-year | All | 0.154 | 0.001 (0.00 to 0.002) | 0.019
Pack-year | Cancer | 0.060 | 0.000 (−0.001 to 0.001) | 0.520
Pack-year | Benign | 0.108 | 0.001 (0.00 to 0.002) | 0.254

Example 8: A Systems Biology-Derived, Blood-Based Proteomic Classifier for the Molecular Characterization of Pulmonary Nodules

SUMMARY

[0392]

Each year millions of pulmonary nodules are discovered by computed tomography but remain undiagnosed as malignant or benign. As the majority of these nodules are benign, many patients undergo unnecessary and costly invasive procedures. This invention presents a 13-protein blood-based classifier for the identification of benign nodules. Using a systems biology strategy, 388 protein candidates were identified, and selected reaction monitoring (SRM) assays were developed for 371 of them. The SRM assays were applied in a multisite discovery study (n=143) with benign and cancer plasma samples matched on nodule size, age, gender and clinical site. Rather than identifying the proteins with the best individual performance, the 13-protein classifier was formed from the proteins that performed best on panels. The classifier was validated on an independent set of plasma samples (n=104), demonstrating a high negative predictive value (92%) and a specificity (27%) sufficiently high to spare one in four patients with benign nodules from invasive procedures. Importantly, validation performance on a non-discovery clinical site showed an NPV of 100% and a specificity of 28%, arguing for the general effectiveness of the classifier. A pathway analysis demonstrated that the classifier proteins are likely modulated by a few transcription regulators (NF2L2, AHR, MYC, FOS) highly associated with lung cancer, lung inflammation and oxidative stress networks. Remarkably, the classifier score was independent of patient nodule size, smoking history and age. As these are the currently used risk factors for clinical management of pulmonary nodules, the application of this molecular test would provide a powerful complementary tool for physicians to use in lung cancer diagnosis.

[0393]

Rationale

[0394]

Computed tomography (CT) identifies millions of pulmonary nodules annually with many being undiagnosed as malignant or benign. The vast majority of these nodules are benign, but due to the threat of cancer, a significant number of patients with benign nodules undergo unnecessary invasive medical procedures costing the healthcare system billions of dollars annually. Consequently, there is a high unmet need for a non-invasive clinical test that can identify benign nodules with high probability.

[0395]

Presented is a 13-protein plasma test, or classifier, for identifying benign nodules. To develop the classifier, a systems biology approach based on the supposition that biological networks in tumors become disease-perturbed and alter the expression of their cognate proteins was adopted. This systems approach employs a variety of strategies to identify blood proteins that directly reflect lung cancer-perturbed networks.

[0396]

First, candidate biomarkers prioritized for inclusion on the classifier were those proteins secreted by or shed from the cell surface of lung cancer cells in contrast to normal lung cells. These are proteins both associated with lung cancer and most likely to be emitted by a malignant pulmonary nodule into blood. The literature was also surveyed to identify blood proteins associated with lung cancer. In total, an initial list of 388 protein candidates for inclusion on the classifier was derived from these three sources.

[0397]

Another system-driven approach was to prioritize the 388 protein candidates for inclusion on the classifier by how frequently they appear on high performing protein panels, as opposed to their individual diagnostic performance. This strategy is motivated by the intent to capture the integrated behavior of proteins within lung cancer-perturbed networks. Proteins that appear frequently on high performing panels are called cooperative proteins. This is a defining step in the discovery of the classifier as the most cooperative proteins are often not the proteins with best individual performance.

[0398]

Third, the classifier is deconstructed in terms of its relationship to lung cancer networks. Ideally, the classifier consists of multiple proteins from multiple lung cancer-perturbed networks. We conjecture that measuring multiple proteins from the same lung cancer associated pathway increases the signal-to-noise ratio thus enhancing performance of the classifier.

[0399]

Selected reaction monitoring (SRM) mass spectrometry (MS) was utilized to measure the concentrations of the candidate proteins in plasma. SRM is a form of MS that monitors predetermined and highly specific mass products, called transitions, of particularly informative (proteotypic or protein-specific) peptides of targeted proteins. Briefly, SRM assays for proteins are based on the high reproducibility of peptide ionization, the foundation of MS. During a SRM analysis, the mass spectrometer is programmed to monitor for transitions of the specific protein(s) being assayed. The resulting chromatograms are integrated to provide quantitative or semi-quantitative protein abundance information. The benefits of SRM assays include high protein specificity, large multiplexing capacity, and both rapid and reliable assay development and deployment. SRM has been used for clinical testing of small molecule analytes for many years, and recently in the development of biologically relevant assays. Exceptional public resources exist to accelerate SRM assay development including the PeptideAtlas, the Plasma Proteome Project, the SRM Atlas and the PeptideAtlas SRM Experimental Library.

[0400]

In accordance with evolving guidelines for clinical test development, the classifier was discovered (n=143) and validated (n=104) using independent plasma sets from multiple clinical sites consistent with an intended use population of patients with lung nodules, defined as round opacities up to 30 mm in size. In contrast to other biomarker studies, which utilized biospecimens associated with the broad clinical spectrum of lung cancer (Stages I to IV), the cancer plasma samples analyzed were limited to Stage IA, which corresponds to the intended use population of lung nodules of size 30 mm or less. The classifier yielded a performance amenable to further clinical stratification of the intended use by parameters such as age, smoking history or nodule size, as guided by a clinician's diagnostic needs.

[0401]

Validated performance of the 13-protein classifier demonstrated a negative predictive value (NPV) of 92% and a specificity of 27%. For clinical utility, the classifier must reliably and frequently provide information that can participate in a physician's decision to avoid an invasive procedure. High NPV is required to ensure that the classifier reliably identifies benign nodules. Equivalently, malignant nodules are rarely (8% or less) reported as benign by the classifier. A specificity of 27% implies that one-in-four patients with a benign nodule can avoid invasive procedures, and so, frequently provides information of clinical utility. All validation samples were independent of discovery samples, and 37 came from a new clinical site. Performance on the samples from the new site demonstrated a NPV of 100% and a specificity of 28% suggesting that the classifier performance extends to new clinical settings. Remarkably, the classifier score is demonstrated to be independent of the patient's age, smoking history and nodule size, thereby complementing current clinical risk factors with an informative molecular dimension for evaluating the disease status of a pulmonary nodule.

[0402]

Results

[0403]

Table 28 presents the steps taken in the refinement of the initial 388 protein candidates down to the set of 13 classifier proteins used for validation and performance assessment. The results are presented in the same sequence.

[0404]

Steps in refining the 388 candidates down to the 13-protein classifier
Number of Proteins | Refinement
388 | Lung cancer associated protein candidates sourced from tissue and literature.
371 | Number of the 388 protein candidates successfully developed into a SRM assay.
190 | Number of the 371 SRM protein assays detected in plasma.
125 | Number of the 190 SRM protein assays detected in at least 50% of cancer or 50% of benign discovery samples.
36 | Number of the 125 detected proteins that were cooperative.
21 | Number of the 36 cooperative proteins with robust SRM assays (i.e. no interfering signals, good signal-to-noise, etc.)
13 | Number of the 21 robust and cooperative proteins with stable logistic regression coefficients.

[0405]

Selection of Biomarker Candidates for Assay Development. To identify lung cancer biomarkers in blood that are shed or secreted from lung tumor cells, proteins overexpressed on the cell surface or over-secreted from lung cancer tumor cells relative to normal lung cells were identified from freshly resected lung tumors using organelle isolation techniques combined with mass spectrometry. In addition, an extensive literature search for lung cancer biomarkers was performed using public and private resources. Both the tissue-sourced biomarkers and literature-sourced biomarkers were required to have evidence of previous detection in blood. The tissue (217) and literature (319) candidates overlapped by 148 proteins, resulting in a list of 388 protein candidates.

[0406]

Development of SRM Assays. Standard synthetic peptide techniques were used to develop a 371-protein multiplexed SRM assay from the 388 protein candidates. For 17 of the candidates, appropriate synthetic peptides could not be developed or confidently identified. The 371 SRM assays were applied to plasma samples from patients with pathologically confirmed benign nodules and pathologically confirmed malignant lung nodules to determine how many of the 371 proteins could be detected in plasma. A total of 190 SRM assays were able to detect their target proteins in plasma (51% success rate). This success rate (51%) compares very favorably to similar efforts (16%) to develop large scale SRM assays for the detection of diverse cancer markers in blood. Of the 190 proteins detected in blood, 114 were derived from the tissue-sourced candidates and 167 derived from the literature-sourced candidates (91 protein overlap). It is conjectured that the 49% of candidate proteins not detected in blood were present, but below the level of detection of the technology.
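The candidate counts quoted in this and the preceding paragraph can be checked with a short inclusion-exclusion calculation. The helper name is hypothetical; the numbers are those stated in the text.

```python
def union_size(count_a, count_b, overlap):
    """Size of the union of two lists that share `overlap` members."""
    return count_a + count_b - overlap


# Tissue-sourced (217) and literature-sourced (319) candidates, 148 in common.
total_candidates = union_size(217, 319, 148)   # 388

# Proteins detected in plasma: 114 tissue-sourced, 167 literature-sourced, 91 in common.
detected_in_plasma = union_size(114, 167, 91)  # 190

# Fraction of the 371 SRM assays that detected their target in plasma.
success_rate = detected_in_plasma / 371
```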

[0407]

Classifier Discovery. A summary of the features of the 143 samples used for classifier discovery appears in Table 29. Samples were obtained from three clinical sites to avoid overfitting to a single clinical site. Participating clinical sites were Institut Universitaire de Cardiologie et de Pneumologie de Quebec (IUCPQ), New York University (NYU) and University of Pennsylvania (UPenn). All samples were selected to be consistent with intended use, specifically, having nodule size 30 mm or less. Cancer and benign samples were pathologically confirmed.

[0408]

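Table 29. Clinical characteristics of subjects and nodules in the discovery and validation studies
Characteristics | Discovery Cancer (n) | Discovery Benign (n) | Discovery p value | Validation Cancer (n) | Validation Benign (n) | Validation p value
Subjects | 72 | 71 | | 52 | 52 |
Age (year)* | 65 (59-72) | 64 (52-71) | 0.46 | 63 (60-73) | 62 (56-67) | 0.03
Gender | | | 1.00 | | | 0.85
Male | 29 | 28 | | 25 | 27 |
Female | 43 | 43 | | 27 | 25 |
Smoking History
Status | | | 0.006 | | | 0.006
Never§ | 5 | 19 | | 3 | 15 |
Former | 60 | 44 | | 38 | 29 |
Current | 6 | 6 | | 11 | 7 |
No Data | 1 | 2 | | 0 | 1 |
Pack-Year* | 37 (20-52) | 20 (0-40) | 0.001 | 40 (19-50) | 27 (0-50) | 0.09
Nodules
Size (mm)* | 13 (10-16) | 13 (10-18) | 0.69 | 16 (13-20) | 15 (12-22) | 0.68
Source | | | 1.00 | | | 0.89
IUCPQ|| | 14 | 14 | | 13 | 12 |
New York | 29 | 28 | | 6 | 9 |
Pennsylvania | 29 | 29 | | 14 | 13 |
Vanderbilt | 0 | 0 | | 19 | 18 |
Histopathology
Benign Diagnosis
Granuloma | | 48 | | | 26 |
Hamartoma | | 9 | | | 6 |
Scar | | 2 | | | 2 |
Other** | | 12 | | | 18 |
Cancer Diagnosis
Adenocarcinoma | 41 | | | 25 | |
Squamous Cell | 3 | | | 15 | |
Large Cell | 0 | | | 2 | |
Bronchioloalveolar (BAC) | 3 | | | 0 | |
Adenocarcinoma/BAC | 21 | | | 5 | |
Other†† | 4 | | | 5 | |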
*Data shown are median values with quartile ranges indicated in parentheses.
Mann-Whitney test.
Fisher's exact test.
§A never smoker is defined as an individual who has a lifetime history of smoking less than 100 cigarettes.
A pack-year is defined as the product of the total number of years of smoking and the average number of packs of cigarettes smoked daily. Pack-year data were not available for 4 cancer and 6 benign subjects in the discovery set and 2 cancer and 3 benign subjects in the validation set.
||IUCPQ is the Institut Universitaire de Cardiologie et de Pneumologie de Quebec.
**For the discovery study, the Benign Diagnosis “Other” category included: amyloidosis, n = 2; fibroelastic nodule, n = 1; fibrosis, n = 1; hemorrhagic infarct, n = 1; lymphoid aggregate, n = 1; organizing pneumonia, n = 3; pulmonary infarct, n = 1; sclerosing hemangioma, n = 1; and subpleural fibrosis with benign lymphoid hyperplasia, n = 1. For the validation study, the Benign Diagnosis “Other” category included: amyloidosis, n = 1; bronchial epithelial cells, n = 4; bronchiolitis interstitial fibrosis, n = 1; emphysematous lung, n = 1; fibrotic inflammatory lesion, n = 1; inflammation, n = 1; parenchymal intussusception, n = 1; lymphangioma, n = 1; mixed lymphocytes and histiocytes, n = 1; normal parenchyma, n = 1; organizing pneumonia, n = 1; pulmonary infarct, n = 2; respiratory bronchiolitis, n = 1; and squamous metaplasia, n = 1.
††For the discovery study, the non-small cell lung cancer (NSCLC) Diagnosis “Other” category included: adenocarcinoma squamous cell mixed, n = 1; large cell squamous cell mixed, n = 1; pleomorphic carcinoma, n = 1, and not specified, n = 1. For the validation study, the NSCLC Diagnosis “Other” category included: carcinoid, n = 2; large cell squamous cell mixed, n = 1; and not specified, n = 2.

[0409]

Benign and cancer samples were paired by matching on age, gender, nodule size and clinical site to avoid bias during SRM analysis and also to ensure that the biomarkers discovered were not markers of age, gender, nodule size or clinical site.

[0410]

The 371-protein SRM assay was applied to the 143 discovery samples and the resulting transition data were analyzed to derive a 13-protein classifier using a logistic regression model (Table 30). The key step in this refinement (Table 28) was the identification of 36 cooperative proteins of which 21 had robust SRM signal. A protein was deemed cooperative if found more frequently on the best performing panels than expected by chance alone, with the significance determined using the following statistical estimation procedure. Briefly, a million random 10-protein panels were generated and the frequency of each protein among the best performing panels (p value ≤10^-4) was calculated. These proteins were sampled from the list of 125 proteins reproducibly detected in either benign samples or in cancer samples (see Table 28). Full details of the estimation procedure and the full discovery process are described in Materials and Methods in Example 9. Importantly, the 13-protein classifier was fully defined before validation was performed.

[0411]

Table 30. The 13-protein logistic regression classifier
Constant (α) equals 36.16.
Protein (Human) | Transition | SEQ ID NO | Coefficient
LRP1 | TVLWPNGLSLDIPAGR_855.00_400.20 | 15 | −1.59
BGH3 | LTLLAPLNSVFK_658.40_804.50 | 8 | 1.73
COIA1 | AVGLAGTFR_446.26_721.40 | 11 | −1.56
TETN | LDTLAQEVALLK_657.39_330.20 | 20 | −1.79
TSP1 | GFLLLASLR_495.31_559.40 | 22 | 0.53
ALDOA | ALQASALK_401.25_617.40 | 7 | −0.80
GRP78 | TWNDPSVQQDIK_715.85_260.20 | 23 | 1.41
ISLR | ALPGTPVASSQPR_640.85_841.50 | 14 | 1.40
FRIL | LGGPEAGLGEYLFER_804.40_913.40 | 24 | 0.39
LG3BP | VEIFYR_413.73_598.30 | 25 | −0.58
PRDX1 | QITVNDLPVGR_606.30_428.30 | 16 | −0.34
FIBA | NSLFEYQK_514.76_714.30 | 26 | 0.31
GSLG1 | IIIQESALDYR_660.86_338.20 | 27 | −0.70

[0412]

Classifier Validation. A total of 52 cancer and 52 benign samples (Table 29) were used to validate the performance of the 13-protein classifier. All validation samples were from different patients than the discovery samples. In addition, 36% of the validation samples were sourced from a new fourth clinical site, Vanderbilt University (Vanderbilt). A new clinical site participating in the validation study provides greater confidence that the classifier's performance generalizes beyond the discovery study. The remaining validation samples were selected randomly from the discovery sites. Samples were selected to be consistent with intended use and matched as in the discovery study.

[0413]

The classifier was applied to the validation samples and analyzed (Materials and Methods in Example 9). The performance of the classifier is presented in FIG. 12 in terms of negative predictive value (NPV) and specificity (SPC), as these are the two most clinically relevant measures. NPV is the population-based probability that a nodule predicted to be benign by the classifier is truly benign. As the NPV is representative of the classifier's performance on the intended use population, it can be calculated from the classifier's sensitivity, specificity and the estimated cancer prevalence (20%) in the intended use population. Specificity is the percentage of benign nodules that are predicted to be benign by the classifier. The classifier generates a cancer probability score, ranging from 0 to 1. Any reference value in this range can be defined so that a sample is predicted to be benign if the sample's classifier score is below the reference value, or predicted to be malignant if the sample's classifier score is above the reference value. The reference value used in practice depends primarily on the physician and his/her minimum required NPV. For the purposes of illustration we assume that the NPV requirement is 90%.

[0414]

At reference value 0.43, the classifier has NPV of 96%+/−4% and specificity of 45%+/−13% on the discovery samples, where 95% confidence intervals are reported. At the same reference value of 0.43, the classifier has NPV of 92%+/−7% and specificity of 27%+/−12% on the validation samples. Table 31 reports the classifier's performance for discovery and validation sample sets and for multiple lung cancer prevalences. For each lung cancer prevalence, the reference value was selected to ensure NPV is 90% or more.

[0415]

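Table 31. Performance of the classifier in discovery and validation at three cancer prevalences
Dataset | Prevalence (%) | Reference Value | Sensitivity (%) | Specificity (%) | NPV (%) | PPV (%)
Discovery (n = 143) | 20 | 0.43 | 93 | 45 | 96 | 30
Discovery (n = 143) | 25 | 0.37 | 96 | 38 | 96 | 34
Discovery (n = 143) | 30 | 0.33 | 96 | 34 | 95 | 38
Validation (n = 104) | 20 | 0.43 | 90 | 27 | 92 | 24
Validation (n = 104) | 25 | 0.37 | 92 | 23 | 90 | 29
Validation (n = 104) | 30 | 0.33 | 94 | 21 | 90 | 34
Vanderbilt (n = 37) | 20 | 0.43 | 100 | 28 | 100 | 26
Vanderbilt (n = 37) | 25 | 0.37 | 100 | 22 | 100 | 30
Vanderbilt (n = 37) | 30 | 0.33 | 100 | 17 | 100 | 34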
NPV is negative predictive value. PPV is positive predictive value.
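
As a minimal sketch of how NPV and PPV follow from sensitivity, specificity and an assumed cancer prevalence, the snippet below applies Bayes' rule to two rows of Table 31 (20% prevalence, reference value 0.43). The function names are illustrative and do not come from the source. Because the table reports rounded sensitivity and specificity, values recomputed this way for other rows can differ from the tabulated ones by about one percentage point.

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from Bayes' rule."""
    true_negative = specificity * (1.0 - prevalence)
    false_negative = (1.0 - sensitivity) * prevalence
    return true_negative / (true_negative + false_negative)


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Rounded inputs taken from Table 31 at 20% prevalence.
discovery_npv = npv(0.93, 0.45, 0.20)
discovery_ppv = ppv(0.93, 0.45, 0.20)
validation_npv = npv(0.90, 0.27, 0.20)
validation_ppv = ppv(0.90, 0.27, 0.20)

print(f"discovery:  NPV {discovery_npv:.1%}, PPV {discovery_ppv:.1%}")
print(f"validation: NPV {validation_npv:.1%}, PPV {validation_ppv:.1%}")
```

Raising the assumed prevalence lowers NPV for fixed sensitivity and specificity, which is consistent with the lower reference values reported at 25% and 30% prevalence.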

[0417]

The performance of the 13-protein classifier on validation samples from the new clinical site (Vanderbilt) is a great indicator of the classifier's performance on future samples, and a strong sign that the classifier is not overfit to the three discovery sites. The NPV and specificity on the Vanderbilt samples are 100% and 28%, respectively, at the same reference value 0.43.

[0418]

FIG. 13 presents the application of the classifier to all 247 discovery and validation samples. FIG. 13 compares the clinical risk factors of smoking (measured in pack years) and nodule size (proportional to the diameter of each circle) to the classifier score assigned to each sample. Nodule size does not appear to increase with the classifier score. Indeed, both large and small nodules are spread across the classifier score spectrum. To quantify this observation, the Pearson correlations between the classifier score and nodule size, smoking history pack-years and age were calculated and found to be insignificant (Table 32). The implication of this observation is remarkable. The classifier provides information on the disease status of a pulmonary nodule that is independent of the three currently used risk factors for malignancy (age, smoking history and nodule size), and thus provides incremental molecular information of great added clinical value. For a similar plot of nodule size vs. classifier score, see FIG. 15.

[0419]

Table 32
Continuous Clinical Characteristics
Characteristics | Sample Group | Pearson Correlation | Coefficient of Linear Fit | 95% CI* of Coefficient | p-value on Coefficient
Subject
Age | All | 0.190 | 0.005 | (0.002, 0.008) | 0.003
Age | Cancer | 0.015 | 0.000 | (−0.004, 0.004) | 0.871
Age | Benign | 0.227 | 0.005 | (0.001, 0.010) | 0.012
Smoking History Pack-Years | All | 0.185 | 0.002 | (0.000, 0.003) | 0.005
Smoking History Pack-Years | Cancer | 0.089 | 0.001 | (−0.001, 0.002) | 0.339
Smoking History Pack-Years | Benign | 0.139 | 0.001 | (0.000, 0.003) | 0.140
Nodule
Size | All | −0.071 | −0.003 | (−0.008, 0.002) | 0.267
Size | Cancer | −0.081 | −0.003 | (−0.009, 0.003) | 0.368
Size | Benign | −0.035 | −0.001 | (−0.008, 0.005) | 0.700
Categorical Clinical Characteristics
Characteristics | Classifier Score | Cancer | p-value on Cancer | Benign | p-value on Benign
Gender | | | 0.477† | | 0.110†
Female | Median (quartile range) | 0.786 (0.602-0.894) | | 0.479 (0.282-0.721) |
Male | Median (quartile range) | 0.815 (0.705-0.885) | | 0.570 (0.329-0.801) |
Smoking History Status | | | 0.652‡ | | 0.539‡
Never | Median (quartile range) | 0.707 (0.558-0.841) | | 0.468 (0.317-0.706) |
Past | Median (quartile range) | 0.804 (0.616-0.892) | | 0.510 (0.289-0.774) |
Current | Median (quartile range) | 0.790 (0.597-0.876) | | 0.672 (0.437-0.759) |

[0420]

The Molecular Foundations of the Classifier. To address the biological relevance of the 13 classifier proteins, they were submitted for pathway analysis using IPA (Ingenuity Systems). The analysis identified the transcription regulators most likely to cause a modulation of these 13 proteins. Using standard IPA analysis parameters, the four most significant (see Materials and Methods in Example 9) nuclear transcription regulators were FOS (proto-oncogene c-Fos), NF2L2 (nuclear factor erythroid 2-related factor 2), AHR (aryl hydrocarbon receptor) and MYC (myc proto-oncogene protein). These proteins regulate 12 of the 13 classifier proteins, with ISLR being the exception (see below).

[0421]

FOS is common to many forms of cancer. NF2L2 and AHR are associated with lung cancer, oxidative stress response and lung inflammation. MYC is associated with lung cancer and oxidative stress response. These four transcription regulators and the 13 classifier proteins, collectively, are also highly associated (p-value 1.0e-07) with the same three biological networks, namely, lung cancer, lung inflammation and oxidative stress response. This is summarized in FIG. 14 where the classifier proteins (green), transcription regulators (blue) and the three merged networks (orange) are depicted. Only ISLR (Immunoglobulin superfamily containing leucine-rich repeat protein) is not connected through these three networks to other classifier proteins, although it is connected through cancer networks not specific to lung. In summary, the modulation of the 13 classifier proteins can be linked back to a few transcription regulators highly associated with lung cancer, lung inflammation and oxidative stress response networks; three biological processes reflecting aspects of lung cancer.

[0422]

The present invention distinguishes itself in multiple ways. First, the 13-protein classifier achieves intended use performance requirements with NPV (and sensitivity) of 90% or higher in validation, across multiple prevalence estimates (see Table 31). Second, intended use population samples (nodule size 30 mm or less and/or Stage IA) were used in discovery and validation, in contrast to prior studies where non-intended use samples ranging from Stage I to Stage IV were used. In some cases, nodule size information was not disclosed in prior work. Third, the 13-protein classifier was demonstrated to provide a score that is independent of the currently used cancer risk parameters of nodule size, smoking history and age.

[0423]

The utilization of SRM technology enables global interrogation of proteins associated with lung cancer processes in contrast to technologies such as those that multiplex antibodies where it is often not feasible to multiplex hundreds of candidate markers for a specific disease.

[0424]

Clinical Study Designs. The design and conduct of biomarker studies are necessarily impacted by the eventual intended use population and performance requirements for the clinical test. Emerging guidelines help in the design of studies that have a greater chance of translating into clinical impact. In the design of the discovery and validation studies presented here, four requirements were especially important. First, conducting a multiple clinical site discovery study enabled us to determine those proteins robust to variations introduced by differences in site-to-site sample processing and management, as well as from any biological differences in the populations being served by the different site hospitals. Such a design is critical as site-to-site sources of variation can often exceed biological signal. Second, utilizing intended use samples, as defined by age, smoking history and nodule size, in discovery and validation phases enabled us to obtain a realistic estimate of the performance envelope of the classifier. Third, careful matching of cancer and benign cohorts on age, gender, nodule size and clinical site was critical not only in avoiding bias, but also in the discovery and validation of a classifier that provides a score independent of these clinical factors as well as smoking history. Fourth, validation samples were from different patients than the discovery samples. Furthermore, 36% of the validation samples were from an entirely new clinical site, a critical validation step to show that results are not overfit to the sites used in the discovery phase. Performance on samples from the new clinical site was exceptionally high (NPV of 100%, specificity of 28%), yielding a high level of confidence in the performance of the test in clinical practice.

[0425]

Systems Biology and Blood Signatures. The integration of a systems biology approach to biomarker discovery with SRM technology enabled the simultaneous exploration of a large number of lung cancer relevant proteins, resulting in a highly sensitive classifier. The systems approach employed several strategies.

[0426]

First, proteins secreted or shed from the cell surface of lung cancer cells were identified (i.e. tissue-sourced) as these are likely lung cancer perturbed proteins to be detected in blood. Of the classifier's 13 proteins, seven were tissue-sourced, demonstrating that tissue-sourcing is an effective method for prioritizing proteins for SRM assay development.

[0427]

A second systems driven approach was the identification of the most cooperative protein biomarkers. Cooperative proteins are those that may not be the best individual performers but appear frequently on high performance panels. Motivating this approach is the desire to derive a classifier with multiple proteins from multiple lung cancer associated networks. By monitoring multiple proteins and networks, it was expected that the classifier would be highly sensitive to the circulating signature of a malignant nodule, as demonstrated in validation.

[0428]

There are two confirmations of the effectiveness of the cooperative protein approach. A pathway analysis demonstrated that the classifier proteins are likely modulated by a small number of transcription regulators (AHR, NF2L2, MYC, FOS) highly associated with lung cancer, lung inflammation and oxidative stress response networks/processes. Chronic lung inflammation and oxidative stress response are both linked to NSCLC development. A strength of the classifier is that it monitors multiple proteins from these multiple lung cancer associated processes. This multiple protein, multiple process survey accounts for the high sensitivity of the classifier for detecting the circulating signature emitted by malignant nodules, and so, high NPV when the classifier calls a nodule benign.

[0429]

The second validation of the cooperative approach is a direct comparison to traditional biomarker strategies. Typically, proteins are shortlisted in the discovery process by filtering on individual diagnostic performance. To contrast the difference between filtering proteins based on strong individual performance as opposed to frequency on high performance panels, we calculated a p-value for each protein using the Mann-Whitney non-parametric test. Only 2 of the 36 cooperative proteins had a p-value below 0.05, a commonly used significance threshold for measuring individual performance. More importantly, we derived a “p-classifier” using the same steps for the 13-protein classifier derivation (see Table 28 and Materials and Methods in Example 9) except that the Mann Whitney p-value was used in place of cooperative score. The p-classifier achieved NPV 96% and specificity 18% in discovery and NPV 91% and specificity 19% in validation as compared to the 13-protein classifier performance of NPV 96% and specificity 45% in discovery and NPV 92% and specificity 27% in validation. Note that the reference value thresholds were selected to ensure NPV of at least 90%. Hence, we expect similar high NPV performance between the 13-protein cooperative classifier and the p-classifier. Specificity is the performance measure where a comparison can be made. This is where a significant drop in performance from the 13-protein cooperative classifier to the p-classifier is observed. This confirms that the best individual protein performers are not necessarily the best proteins for classifiers.

[0430]

Most Informative Proteins. Which proteins in the classifier are most informative? To answer this question all possible classifiers were constructed from the set of robust cooperative proteins and their performance measured. The frequency of each protein among the 100 best performing panels was determined. Four proteins (LRP1, COIA1, ALDOA, LG3BP) were highly enriched with 95% of the 100 best classifiers having at least three of these four proteins (p-value <1.0e-100). Seven of eight proteins (LRP1, COIA1, ALDOA, LG3BP, BGH3, PRDX1, TETN, ISLR) appeared together on over half of all the best classifiers (p-value <1.0e-100). Note that the 13-protein classifier contains additional proteins as they further increase performance, likely by measuring proteins in the same three lung cancer networks (lung cancer, lung inflammation and oxidative stress). The conclusion is that high performance panels of cooperative proteins for pulmonary nodule characterization are similar in composition to one another with a preference for a set of particularly informative (cooperative) proteins.

[0431]

In summary, by integrating systems biology strategies for biomarker discovery (tissue-sourced candidates with cancer relevance, cooperative proteins, multiple proteins from multiple lung cancer associated networks), enabling technologies (SRM for global proteomic interrogation) and clinical focus (designing studies for intended use), this invention identifies a 13-protein proteomic classifier that provides molecular insight into the disease status of pulmonary nodules.

Example 9: Materials and Methods

[0432]

Identification of Candidate Plasma Proteins. Two approaches were employed to identify candidate proteins for a lung cancer classifier, including analysis of the proteome of lung tissues with a histopathologic diagnosis of NSCLC and a search of literature databases for lung cancer-associated proteins. All candidate proteins were also assessed for evidence of blood circulation and satisfied one or more requirement(s) for the evidence.

[0433]

Analysis of Plasma Samples Using SRM-MS. Briefly, the protocol for SRM-MS analysis of plasma aliquots included immunodepletion on IgY14-Supermix resin columns (Sigma) of medium- and high-abundance proteins, denaturation, trypsin digestion, and desalting, followed by reversed-phase liquid chromatography and SRM-MS analysis of the obtained peptide samples.

[0434]

Development of SRM Assays. SRM assays for candidate proteins were developed based on synthetic peptides, as previously described. After identification and synthesis of up to five suitable peptides per protein, SRM triggered MS/MS spectra were collected on a 5500 QTrap® mass spectrometer for both doubly and triply charged precursor ions. The obtained MS/MS spectra were assigned to individual peptides using MASCOT and with a minimum cutoff score of 15. Up to four transitions per precursor ion were then selected for optimization. The resulting corresponding optimal retention time, declustering potential and collision energy were assembled for all transitions. Optimal transitions were measured on a mixture of all synthetic peptides and on two pooled plasma samples, each obtained from ten subjects with either benign or malignant, i.e. NSCLC, lung nodules at the Institut Universitaire de Cardiologie et de Pneumologie de Quebec (IUCPQ, Quebec, Canada). All subjects provided informed consent and contributed biospecimens in studies approved by the institution's Ethics Review Board (ERB). Plasma samples were processed as described above. Batches of 1750 transitions were analyzed by SRM-MS, with SRM-MS data manually reviewed to select the two best peptides per protein and the two best transitions per peptide. The intensity ratio, defined as the ratio between the intensities of the two best transitions of a peptide in the synthetic peptide mixture, was used to assess the specificity of the transitions in a biological sample. Transitions demonstrating interference with other transitions were not selected. A method to ensure the observed transitions corresponded to the peptides and proteins they were intended to measure was developed. In particular, 93% of peptide transitions developed had an error rate below 5%.

[0435]

Discovery Study Design. A retrospective, multi-center, case-control study was performed using archival K2-EDTA plasma aliquots previously obtained from subjects who provided informed consent and contributed biospecimens in studies approved by the Ethics Review Board (ERB) or the Institutional Review Boards (IRB) at the IUCPQ or New York University (New York, NY) and the University of Pennsylvania (Philadelphia, PA), respectively. In addition, plasma samples were provided by study investigators after review and approval of the sponsor's study protocol by the respective institution's ERB or IRB, as required. Sample eligibility for the proteomic analysis was based on the satisfaction of the study inclusion and exclusion criteria, including the subject's demographic information; the subject's corresponding lung nodule radiographic characterization by chest CT scan and a maximal linear dimension of 30 mm; and the histopathology of the lung nodule obtained at the time of diagnostic surgical resection, i.e. either NSCLC or a benign, i.e. non-malignant, process. Each cancer-benign sample pair was matched, as much as possible among eligible samples, by gender, nodule size (±10 mm), age (±10 years), smoking history pack-years (±20 pack-years), and by center. Independent monitoring and verification of the clinical data associated with both the subject and lung nodule were performed in accordance with the guidance established by the Health Insurance Portability and Accountability Act (HIPAA) of 1996 to ensure subject privacy. The study was powered with a probability of 92% to detect 1.5 fold differences in protein abundance between malignant and benign lung nodules.
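
The matching tolerances listed above can be expressed as a simple predicate. This is a sketch only: the `Subject` fields and the function name are hypothetical, and it treats every criterion as strict, whereas the study matched pairs "as much as possible among eligible samples".

```python
from dataclasses import dataclass


@dataclass
class Subject:
    # Hypothetical record layout; field names are not from the source.
    gender: str
    nodule_size_mm: float
    age_years: float
    pack_years: float
    center: str


def is_matched_pair(cancer: Subject, benign: Subject) -> bool:
    """True if the pair satisfies all matching tolerances stated in the text."""
    return (
        cancer.gender == benign.gender
        and cancer.center == benign.center
        and abs(cancer.nodule_size_mm - benign.nodule_size_mm) <= 10  # +/-10 mm
        and abs(cancer.age_years - benign.age_years) <= 10            # +/-10 years
        and abs(cancer.pack_years - benign.pack_years) <= 20          # +/-20 pack-years
    )


# Illustrative subjects only; these are not study data.
case = Subject("F", 14.0, 66.0, 35.0, "IUCPQ")
control_ok = Subject("F", 12.0, 60.0, 20.0, "IUCPQ")
control_bad = Subject("F", 12.0, 60.0, 5.0, "IUCPQ")

print(is_matched_pair(case, control_ok), is_matched_pair(case, control_bad))
```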

[0436]

Logistic Regression Model. The logistic regression classification method was used to combine a panel of transitions into a classifier and to calculate a classification probability score between 0 and 1 for each sample. The probability score ($P_s$) of a sample was determined as
$$P_s = \frac{1}{1 + \exp\left(-\alpha - \sum_{i=1}^{N} \beta_i \check{I}_{i,s}\right)}, \qquad (1)$$
where $\check{I}_{i,s}$ was the logarithmically transformed (base 2), normalized intensity of transition $i$ in sample $s$, $\beta_i$ was the corresponding logistic regression coefficient, $\alpha$ was a classifier-specific constant, and $N$ was the total number of transitions in the classifier. A sample was classified as benign if $P_s$ was less than a reference value or cancer otherwise. The reference value can be increased or decreased depending on the desired NPV. To define the classifier, the panel of transitions (i.e. proteins), their coefficients, the normalization transitions, classifier coefficient $\alpha$ and the reference value must be learned (i.e. trained) from the discovery study and then confirmed using the validation study.
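
Equation (1) can be sketched in a few lines of code. The constant and coefficients below are those listed in Table 30, and 0.43 is the reference value used at 20% prevalence. The function names and the input intensities are hypothetical, chosen only to show the arithmetic, and the sketch omits the normalization step that produces the intensities.

```python
import math

ALPHA = 36.16  # classifier constant from Table 30
COEFFICIENTS = {  # logistic regression coefficients from Table 30
    "LRP1": -1.59, "BGH3": 1.73, "COIA1": -1.56, "TETN": -1.79,
    "TSP1": 0.53, "ALDOA": -0.80, "GRP78": 1.41, "ISLR": 1.40,
    "FRIL": 0.39, "LG3BP": -0.58, "PRDX1": -0.34, "FIBA": 0.31,
    "GSLG1": -0.70,
}


def probability_score(log2_intensities, coefficients=COEFFICIENTS, alpha=ALPHA):
    """Equation (1): logistic function of alpha plus the weighted intensities."""
    linear = alpha + sum(
        beta * log2_intensities[protein] for protein, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def classify(score, reference_value=0.43):
    """Benign if the score is below the reference value, cancer otherwise."""
    return "benign" if score < reference_value else "cancer"


# Hypothetical inputs: every transition given the same log2 normalized intensity.
sample_a = {protein: 23.0 for protein in COEFFICIENTS}
sample_b = {protein: 22.0 for protein in COEFFICIENTS}
score_a = probability_score(sample_a)
score_b = probability_score(sample_b)
print(classify(score_a), classify(score_b))
```

Because the thirteen coefficients sum to −1.59, the uniform inputs above reduce the exponent to 36.16 − 1.59x, which makes the example easy to check by hand.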

[0437]

Lung Nodule Classifier Development. The goal of the discovery study was to derive a multivariate classifier with a target performance sufficient for clinical utility in the intended use population, i.e. a classifier having an NPV of 90% or higher. This goal was incorporated in the data analysis strategies. The classifier development included the following: normalization and filtering of raw SRM-MS data; identification of candidate proteins that occurred with a high frequency in top-performing panels; evaluation of candidate proteins based on SRM-MS signal quality; selection of candidate proteins for the final classifier based on their stability in performance; and training to a logistic regression model to derive the final classifier. Table 28 provides a summary overview of the primary steps.

[0438]

Normalization of raw SRM-MS data was performed to reduce sample-to-sample intensity variations using a panel of six endogenous proteins. After data normalization, SRM-MS data were filtered down to transitions having the highest intensities of the corresponding proteins and satisfying the criterion for detection in a minimum of 50% of the cancer or 50% of the benign samples. A total of 125 proteins satisfied these criteria of reproducible detection. Missing values were replaced by half the minimum detected values of the corresponding transitions in all samples.
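
The detection filter and the missing-value rule described above can be illustrated as follows. This is a minimal sketch assuming that a missing measurement is represented by `None`; the function names are hypothetical and the normalization against the six endogenous proteins is not shown.

```python
def passes_detection_filter(cancer_values, benign_values, min_fraction=0.5):
    """Keep a transition detected in at least 50% of cancer or 50% of benign samples."""
    def detected_fraction(values):
        return sum(v is not None for v in values) / len(values)

    return (
        detected_fraction(cancer_values) >= min_fraction
        or detected_fraction(benign_values) >= min_fraction
    )


def impute_half_minimum(values):
    """Replace missing values with half the minimum detected value of the transition."""
    detected = [v for v in values if v is not None]
    fill = min(detected) / 2
    return [fill if v is None else v for v in values]


# Illustrative intensities only; these are not study data.
imputed = impute_half_minimum([4.0, None, 10.0])
print(imputed)
```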

[0439]

Remaining transitions were then used to identify proteins, defined as cooperative proteins, that occurred with high frequency on top-performing protein panels. The cooperative proteins were derived using the following estimation procedure as it is not computationally feasible to evaluate the performance of all possible protein panels.

[0440]

Monte Carlo cross validation (MCCV) (36) was performed on 1×10^6 panels, each panel comprised of 10 randomly selected proteins and fitted to a logistic regression model, as described above, using a 20% holdout rate and 10^2 sample permutations. The receiver operating characteristic (ROC) curve of each panel was generated and the corresponding partial area under the ROC curve (AUC) but above the boundary of sensitivity being 90%, defined as the partial AUC (37, 38), was used to assess the performance of the panel. By focusing on the performance of individual panels in the high sensitivity region, the partial AUC allows for the identification of panels with high and reliable performance on NPV. The candidate proteins that occurred in the top 100 performing panels with a frequency greater than that expected by chance were identified as cooperative proteins. For each protein the cooperative score is defined as its frequency on the 100 high performance panels divided by the expected frequency. Highly cooperative proteins had a score of 1.75 or higher (the corresponding one-sided p value <0.05) while non-cooperative proteins had a score of 1 or less. Note that one million panels were sampled to ensure that the 100 top performing panels were exceptional (empirical p value ≤10^-4). In addition, panels of size 10 were used in this procedure based on empirical evidence that larger panels did not change the resulting list of cooperative proteins. We also wanted to avoid overfitting the logistic regression model. In total, 36 cooperative proteins were identified, including 15 highly cooperative proteins.
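
The cooperative score can be illustrated with a small simulation. This is a sketch under stated assumptions: the panel scoring function is a toy stand-in for the MCCV partial AUC, the protein names are placeholders, and only 5,000 panels are sampled instead of one million so that it runs quickly. With 125 proteins, panels of 10 and 100 top panels, the expected frequency by chance is 100 × 10 / 125 = 8 appearances per protein.

```python
import random
from collections import Counter


def cooperative_scores(proteins, panel_score, n_panels=5000, panel_size=10,
                       n_top=100, seed=0):
    """Observed frequency on the top panels divided by the chance expectation."""
    rng = random.Random(seed)
    panels = [tuple(rng.sample(proteins, panel_size)) for _ in range(n_panels)]
    top_panels = sorted(panels, key=panel_score, reverse=True)[:n_top]
    counts = Counter(protein for panel in top_panels for protein in panel)
    expected = n_top * panel_size / len(proteins)
    return {protein: counts[protein] / expected for protein in proteins}


# Toy setup: placeholder protein names, three of which drive the toy panel score.
proteins = [f"P{i}" for i in range(125)]
informative = {"P0", "P1", "P2"}


def toy_panel_score(panel):
    # Stand-in for the partial AUC: counts informative proteins on the panel.
    return sum(protein in informative for protein in panel)


scores = cooperative_scores(proteins, toy_panel_score)
highly_cooperative = sorted(p for p, s in scores.items() if s >= 1.75)
print(highly_cooperative)
```

In this toy setting the informative proteins score far above the 1.75 threshold, while the scores of all 125 proteins still sum to 125, because the top panels contain a fixed total of 1,000 protein slots.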

[0441]

Raw chromatograms of all transitions of cooperative proteins were then manually reviewed. Proteins with low signal-to-noise ratios and/or showing evidence of any interference were removed from further consideration for the final classifier. In total, 21 cooperative and robust proteins were identified.

[0442]

Remaining candidate proteins were then evaluated in an iterative, stepwise procedure to derive the final classifier. In each step, MCCV was performed using a holdout rate of 20% and 10^4 sample permutations to train the remaining candidate proteins to a logistic regression model and to assess the variability, i.e. stability, of the coefficient derived for each protein by the model. The protein having the least stable coefficient was identified and removed. Proteins for the final classifier were identified when the corresponding partial AUC was optimal. Seven of the 13 proteins in the final classifier were highly cooperative.
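
The stepwise procedure can be sketched as a backward elimination loop. The `evaluate` callback is hypothetical: in the study its role is played by MCCV, which yields both a partial AUC and a measure of coefficient variability for each protein, whereas here it is replaced by a toy function so the control flow can be run.

```python
def backward_eliminate(candidates, evaluate):
    """Repeatedly drop the least stable protein and keep the best-performing panel.

    evaluate(panel) must return (performance, {protein: coefficient_variability}).
    """
    panel = list(candidates)
    best_panel, best_performance = None, float("-inf")
    while panel:
        performance, variability = evaluate(panel)
        if performance > best_performance:
            best_panel, best_performance = list(panel), performance
        least_stable = max(panel, key=lambda protein: variability[protein])
        panel.remove(least_stable)
    return best_panel, best_performance


# Toy stand-in for MCCV: fixed variabilities, performance peaks at three proteins.
TOY_VARIABILITY = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4, "E": 0.5}


def toy_evaluate(panel):
    performance = -abs(len(panel) - 3)
    return performance, {protein: TOY_VARIABILITY[protein] for protein in panel}


final_panel, final_performance = backward_eliminate("ABCDE", toy_evaluate)
print(final_panel, final_performance)
```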

[0443]

Proteins in the final classifier were further trained to a logistic regression model by MCCV with a holdout rate of 20% and 2×10^4 sample permutations.

[0444]

Lung Nodule Classifier Validation. The design of the validation study was identical to that of the discovery study, but involved K2-EDTA plasma samples associated with independent subjects and independent lung nodules not evaluated in the discovery study. Additional specimens were obtained from Vanderbilt University (Nashville, TN) with similar requirements for patient consent, IRB approval, and satisfaction of HIPAA requirements. Of the 104 total cancer and benign samples in the validation study, half were analyzed immediately after the discovery study, while the other half was analyzed later. The study was powered to observe the expected 95% confidence interval (CI) of NPV being 90±8%.

[0445]

The raw SRM-MS dataset in the validation study was normalized in the same way as the discovery dataset. Variability between the discovery and the validation studies was mitigated by utilizing human plasma standard (HPS) samples in both studies as external calibrator. Missing data in the validation study were then replaced by half the minimum detected values of the corresponding transitions in the discovery study. Transition intensities were applied to the logistic regression model of the final classifier learned previously in the training phase, from which classifier scores were assigned to individual samples. The performance of the lung nodule classifier on the validation samples was then assessed based on the classifier scores.

[0446]

IPA Pathway Analysis. Standard parameters were used. Specifically, in the search for nuclear transcription regulators, requirements were p-value <0.01 with a minimum of 3 proteins modulated. Significance was determined using a right-tailed Fisher's exact test using the IPA Knowledge Database as background.
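
A right-tailed Fisher's exact test on a 2×2 overlap table is equivalent to the upper tail of a hypergeometric distribution, which can be computed with the standard library alone. The sketch below is not the IPA implementation; the function name and the example counts (background size, number of regulator targets) are hypothetical.

```python
from math import comb


def right_tailed_fisher(overlap, panel_size, targets, background):
    """P(X >= overlap) for X ~ Hypergeometric(background, targets, panel_size)."""
    upper = min(panel_size, targets)
    favourable = sum(
        comb(targets, i) * comb(background - targets, panel_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background, panel_size)


# Hypothetical example: 3 of 13 panel proteins fall among 200 known targets
# of a regulator, against a background of 20,000 proteins.
p_value = right_tailed_fisher(overlap=3, panel_size=13, targets=200, background=20000)
print(f"p = {p_value:.2e}")
```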

[0447]

Candidate Biomarker Identification.

[0448]

Candidate Biomarkers Identified by Tissue Proteomics. Specimens of resected NSCLC (adenocarcinoma, squamous cell and large cell) lung tumors and non-adjacent normal tissue in the same lobe were obtained from patients who provided informed consent in studies approved by the Ethics Review Boards at the Centre Hospitalier de l'Université de Montréal and the McGill University Health Centre.

[0449]

The proteomic analyses of lung tumor tissues targeted membrane-associated proteins on endothelial cells (adenocarcinoma, n=13; squamous cell, n=18; and large cell, n=7) and epithelial cells (adenocarcinoma, n=19; squamous cell, n=6; and large cell, n=5), and those associated with the Golgi apparatus (adenocarcinoma, n=13; squamous cell, n=15; and large cell, n=5).

[0450]

Membrane proteins from endothelial cells or epithelial cells and secreted proteins were isolated from normal or tumor tissues from fresh lung resections after washing in buffer and disruption with a Polytron to prepare homogenates. The cell membrane protocol included filtration using 180 μm mesh and centrifugation at 900×g for 10 min at 4° C., with supernatants collected prior to layering on 50% (w:v) sucrose and centrifugation at 218,000×g for 1 h at 4° C. to pellet the membranes. Membrane pellets were resuspended, treated with micrococcal nuclease, and incubated with the following antibodies specified by plasma membrane type: endothelial membranes (anti-thrombomodulin, anti-ACE, anti-CD34 and anti-CD144 antibodies); epithelial membranes (anti-ESA, anti-CEA, anti-CD66c and anti-EMA antibodies), prior to centrifugation on top of a 50% (w:v) sucrose cushion at 280,000×g (endothelial) or 218,000×g (epithelial) for 1 h at 4° C. After pellet resuspension, plasma membranes were isolated using MACS microbeads. Endothelial plasma membranes were treated with KI to remove cytoplasmic peripheral proteins. The eluate of epithelial plasma membranes was centrifuged at 337,000×g for 30 min at 4° C. over a 33% (w:v) sucrose cushion, with resuspension of the pellet in Laemmli/Urea/DTT after removal of the supernatant and sucrose cushion.

[0451]

To isolate secreted tissue proteins, the density of the tissue homogenates (prepared as described above) was adjusted to 1.4 M sucrose prior to isolating the secretory vesicles by isopycnic centrifugation at 100,000×g for 2 h at 4° C. on a 0.8 and 1.2 M discontinuous sucrose gradient. Vesicles concentrating at the 0.8/1.2 M interface were collected and further incubated for 25 min with 0.5 M KCl to remove loosely bound peripheral proteins. Vesicles were recovered by ultracentrifugation at 150,000×g for 1 h at 4° C. and then opened with 100 mM (NH4)HCO3 (pH 11.0) for 30 min at 4° C. Secreted proteins were recovered in the supernatant following ultracentrifugation at 150,000×g for 1 h at 4° C.

[0452]

Membrane or secreted proteins were then analyzed by Cell Carta® (Caprion, Montreal, Québec) proteomics platform, including digestion by trypsin, separation by strong cation exchange chromatography, and analysis by reversed-phase liquid chromatography coupled with electrospray tandem mass spectrometry (MS/MS). Peptides in the samples were identified by database searching of MS/MS spectra using MASCOT and quantified by a label-free approach based on their signal intensity in the samples, similar to those described in the literature. Proteins whose tumor-to-normal abundance ratio was either ≥1.5 or ≤⅔ were then identified as candidate biomarkers.

[0453]

Candidate Biomarkers Identified by Literature Searches. Automated literature searches using predefined terms and automated PERL scripts were performed on the following databases: UniProt on May 6, 2010, Entrez, NBK3836 on May 17, 2010, and NextBio on Jul. 8, 2010. Biomarker candidates were compiled and mapped to UniProt identifiers using the UniProt Knowledge Base.

[0454]

Presence of Candidate Biomarkers in the Blood. The tissue- and literature-identified biomarker candidates were required to demonstrate documented evidence in the literature or a database as a soluble or solubilized circulating protein. The first criterion was evidence by mass spectrometry detection, with a candidate designated as previously detected by the following database-specific criteria: a minimum of 2 peptides in HUPO9504, which contains 9,504 human proteins identified by MS/MS; a minimum of 1 peptide in HUPO889, which is a higher confidence subset of HUPO9504 containing 889 human proteins; or at least 2 peptides in Peptide Atlas (November 2009 build). The second criterion was annotation as either a secreted or single-pass membrane protein in UniProt. The third criterion was designation as a plasma protein in the literature. The fourth criterion was prediction as a secreted protein based on the use of various programs: prediction by TMHMM as a protein with one transmembrane domain that is nevertheless cleaved based on prediction by SignalP; or prediction by TMHMM as having no transmembrane domain and prediction by either SignalP or SecretomeP as a secreted protein. All candidate proteins satisfied one or more of the criteria.
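The four blood-presence criteria can be written as a small rule set. The record layout and field names below are hypothetical and only illustrate how the stated thresholds combine; they do not reproduce any database query or prediction program.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Evidence:
    hupo9504_peptides: int = 0
    hupo889_peptides: int = 0
    peptide_atlas_peptides: int = 0
    uniprot_secreted_or_single_pass: bool = False
    literature_plasma_protein: bool = False
    tmhmm_tm_domains: Optional[int] = None  # None means no prediction available
    signalp_positive: bool = False          # SignalP predicts cleavage / secretion
    secretomep_positive: bool = False

def criteria_met(e: Evidence) -> List[int]:
    """Return the numbers (1 to 4) of the criteria satisfied by a candidate."""
    met = []
    if (e.hupo9504_peptides >= 2 or e.hupo889_peptides >= 1
            or e.peptide_atlas_peptides >= 2):
        met.append(1)
    if e.uniprot_secreted_or_single_pass:
        met.append(2)
    if e.literature_plasma_protein:
        met.append(3)
    one_tm_cleaved = e.tmhmm_tm_domains == 1 and e.signalp_positive
    no_tm_secreted = e.tmhmm_tm_domains == 0 and (
        e.signalp_positive or e.secretomep_positive)
    if one_tm_cleaved or no_tm_secreted:
        met.append(4)
    return met
```

A candidate is retained when `criteria_met` returns a non-empty list.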

[0455]

Study Designs and Power Analyses.

[0456]

Sample, Subject and Lung Nodule Inclusion and Exclusion Criteria. The inclusion criteria for plasma samples were collection in EDTA-containing blood tubes; obtained from subjects previously enrolled in the Ethics Review Board (ERB) or the Institutional Review Boards (IRB) approved studies at the participating institutions; and archived, e.g. labeled, aliquoted and frozen, as stipulated by the study protocols.

[0457]

The inclusion criteria for subjects were the following: age ≥40; any smoking status, e.g. current, former, or never; any co-morbid conditions, e.g. chronic obstructive pulmonary disease (COPD); any prior malignancy with a minimum of 5 years in clinical remission; any prior history of skin carcinomas, e.g. squamous or basal cell. The only exclusion criterion was prior malignancy within 5 years of lung nodule diagnosis.

[0458]

The inclusion criteria for the lung nodules included radiologic, histopathologic and staging parameters. The radiologic criteria included size ≥4 mm and ≤30 mm, and any spiculation or ground glass opacity. The histopathologic criteria included either diagnosis of malignancy, e.g. non-small cell lung cancer (NSCLC), including adenocarcinoma (and bronchioloalveolar carcinoma (BAC)), squamous, or large cell, or a benign process, including inflammatory (e.g. granulomatous, infectious) or non-inflammatory (e.g. hamartoma) processes. The clinical staging parameters included: primary tumor: ≤T1 (e.g. 1A and 1B); regional lymph nodes: N0 or N1 only; distant metastasis: M0 only. The exclusion criteria for lung nodules included the following: nodule size data unavailable; no pathology data available; histopathologic diagnosis of small cell lung cancer; and the following clinical staging parameters: primary tumor: ≥T2, regional lymph nodes: ≥N2, and distant metastasis: ≥M1.

[0459]

Sample Layout. Up to 15 paired samples per batch were assigned randomly and iteratively to experimental processing batches until no statistical bias was demonstrable on age, gender or nodule size. Paired samples within each processing batch were further randomly and repeatedly assigned to positions within the processing batch until the absolute values of the corresponding Pearson correlation coefficients between position and age, gender and nodule size were less than 0.1. Each pair of cancer and benign samples was then randomized to their relative positions in the batch. To provide a positive control for quality assessment, three 200 μl aliquots of a pooled human plasma standard (HPS) (Bioreclamation, Hicksville, NY) were positioned at the beginning, middle and end of each processing batch, respectively. Samples within a batch were analyzed together: sequentially during immunodepletion and SRM-MS analysis but in parallel during denaturing, digestion, and desalting.
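The within-batch position randomization can be sketched as a rejection loop: shuffle, check the correlations, and repeat until all are below 0.1 in absolute value. This is a simplified illustration in which each record stands for one unit to be placed; the function names, the fixed seed, and the retry limit are assumptions.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def randomize_positions(units, covariates, limit=0.1, max_tries=100_000, seed=0):
    """Shuffle units until |r| between position and every covariate is below `limit`."""
    rng = random.Random(seed)
    order = list(units)
    positions = list(range(len(order)))
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(abs(pearson(positions, [u[c] for u in order])) < limit
               for c in covariates):
            return order
    raise RuntimeError("no acceptable layout found within max_tries")
```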

[0460]

Power Analysis for the Classifier Discovery Study. The power analysis for the discovery study was based on the following assumptions: (A) The overall false positive rate (α) was set to 0.05. (B) Šidák correction for multiple testing was used to calculate the effective $\alpha_{\mathrm{eff}}$ for testing 200 proteins, i.e.

[0461]

$$\alpha_{\mathrm{eff}} = 1 - (1-\alpha)^{1/200}.$$

(C) The effective sample size was reduced by a factor of 0.864 to account for the larger sample requirement for the Mann-Whitney test than for the t-test (13). (D) The overall coefficient of variation was set to 0.43 based on previous experience. (E) The power (1−β) of the study was calculated based on the formula for the two-sample, two-sided t-test, using the effective $\alpha_{\mathrm{eff}}$ and the effective sample size.
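As a worked illustration of these assumptions, the sketch below computes the Šidák-corrected significance level for 200 tests (about 2.56×10⁻⁴ for α = 0.05) and an approximate power. The power function uses a normal approximation in place of the t-test formula, and the way the effect size is formed from a relative difference and the coefficient of variation is an assumption made for the example, not the calculation used in the study.

```python
from math import sqrt
from statistics import NormalDist

def sidak_alpha(alpha, n_tests):
    """Per-test significance level under the Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)

def approx_power(n_per_group, relative_difference, cv=0.43,
                 alpha=0.05, n_tests=200, efficiency=0.864):
    """Normal approximation to two-sample, two-sided power.
    The sample size is scaled by `efficiency` (Mann-Whitney vs. t-test)."""
    alpha_eff = sidak_alpha(alpha, n_tests)
    n_eff = efficiency * n_per_group
    effect = relative_difference / cv  # standardized effect (assumed form)
    z_crit = NormalDist().inv_cdf(1.0 - alpha_eff / 2.0)
    return NormalDist().cdf(effect * sqrt(n_eff / 2.0) - z_crit)
```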

[0462]

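Power Analysis for the Classifier Validation Study. Sufficient cancer and benign samples are needed in the validation study to confirm the performance of the lung nodule classifier obtained from the discovery study. We are interested in obtaining the 95% confidence intervals (CIs) on NPV and specificity for the classifier. Assuming the cancer prevalence of lung nodules is prev, the negative predictive value (NPV) and the positive predictive value (PPV) of a classifier on the patient population with lung nodules were calculated from sensitivity (sens) and specificity (spec) as follows:

[0463]

$$\mathrm{NPV}=\frac{(1-\mathrm{prev})\cdot\mathrm{spec}}{\mathrm{prev}\cdot(1-\mathrm{sens})+(1-\mathrm{prev})\cdot\mathrm{spec}} \tag{S1}$$

$$\mathrm{PPV}=\frac{\mathrm{prev}\cdot\mathrm{sens}}{\mathrm{prev}\cdot\mathrm{sens}+(1-\mathrm{prev})\cdot(1-\mathrm{spec})} \tag{S2}$$

[0464]

Using Eq. (S1) above, one can derive sensitivity as a function of NPV and specificity, i.e.

[0465]

$$\mathrm{sens}=1-\frac{(1-\mathrm{prev})\cdot\mathrm{spec}\cdot(1-\mathrm{NPV})}{\mathrm{prev}\cdot\mathrm{NPV}} \tag{S3}$$

[0466]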

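Assume that the validation study contains $N_C$ cancer samples and $N_B$ benign samples. Based on the binomial distribution, the variances of sensitivity and specificity are given by

$$\mathrm{var}(\mathrm{sens})=\frac{\mathrm{sens}\cdot(1-\mathrm{sens})}{N_C} \tag{S4}$$

$$\mathrm{var}(\mathrm{spec})=\frac{\mathrm{spec}\cdot(1-\mathrm{spec})}{N_B} \tag{S5}$$

Using Eqs. (S1, S2) above, the corresponding variances of NPV and PPV can be derived under the large-sample, normal-distribution approximation as

[0467]

$$\mathrm{var}(\mathrm{NPV})=\mathrm{NPV}^2(1-\mathrm{NPV})^2\left[\frac{\mathrm{var}(\mathrm{sens})}{(1-\mathrm{sens})^2}+\frac{\mathrm{var}(\mathrm{spec})}{\mathrm{spec}^2}\right] \tag{S6}$$

$$\mathrm{var}(\mathrm{PPV})=\mathrm{PPV}^2(1-\mathrm{PPV})^2\left[\frac{\mathrm{var}(\mathrm{sens})}{\mathrm{sens}^2}+\frac{\mathrm{var}(\mathrm{spec})}{(1-\mathrm{spec})^2}\right] \tag{S7}$$

The half-widths of the two-sided 95% CIs of sensitivity, specificity, NPV and PPV are then given by $\pm z_{\alpha/2}\sqrt{\mathrm{var}(\mathrm{sens})}$, $\pm z_{\alpha/2}\sqrt{\mathrm{var}(\mathrm{spec})}$, $\pm z_{\alpha/2}\sqrt{\mathrm{var}(\mathrm{NPV})}$ and $\pm z_{\alpha/2}\sqrt{\mathrm{var}(\mathrm{PPV})}$, respectively, where $z_{\alpha/2}=1.959964$ is the 97.5% quantile of the standard normal distribution.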
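The following sketch shows how such a validation power calculation could be carried out numerically. It assumes the standard definitions of NPV and PPV in terms of sensitivity, specificity and prevalence, and a first-order (delta-method) propagation of the binomial variances; the function names and the example inputs are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def npv(sens, spec, prev):
    return (1 - prev) * spec / (prev * (1 - sens) + (1 - prev) * spec)

def ppv(sens, spec, prev):
    return prev * sens / (prev * sens + (1 - prev) * (1 - spec))

def sens_from_npv(npv_value, spec, prev):
    """Invert the NPV definition to obtain sensitivity."""
    return 1 - (1 - prev) * spec * (1 - npv_value) / (prev * npv_value)

def npv_ci_half_width(sens, spec, prev, n_cancer, n_benign, level=0.95):
    """Half-width of the two-sided CI on NPV (requires 0 < sens < 1 and spec > 0)."""
    var_sens = sens * (1 - sens) / n_cancer   # binomial variance, Eq. (S4)
    var_spec = spec * (1 - spec) / n_benign   # binomial variance, Eq. (S5)
    v = npv(sens, spec, prev)
    var_npv = (v * (1 - v)) ** 2 * (
        var_sens / (1 - sens) ** 2 + var_spec / spec ** 2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sqrt(var_npv)
```

Calling `npv_ci_half_width` over a grid of candidate sample sizes shows how many cancer and benign samples are needed to reach a target interval width.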

[0468]

Experimental Procedures.

[0469]

Immunoaffinity Chromatography. An immunoaffinity column was prepared by adding 10 ml of a 50% slurry containing a 2:1 ratio of IgY14 and SuperMix resins (Sigma Aldrich), respectively, to a glass chromatography column (Tricorn, GE Healthcare) and allowed to settle by gravity, yielding a 5 ml volume of resin in the column. The column was capped and placed on an HPLC system (Agilent 1100 series) for further packing with 0.15 M (NH4)HCO3 at 2 ml/min for 20 min, with performance assessed by replicate injections of HPS aliquots. Column performance was assessed prior to immunoaffinity separation of each sample batch.

[0470]

To isolate low abundance proteins, 60 μl of plasma were diluted in 0.15 M (NH4)HCO3 (1:2 v/v) to a 180 μl final volume and filtered using a 0.2 μm AcroPrep 96-well filter plate (Pall Life Sciences). Immunoaffinity separation was conducted on an IgY14-SuperMix column connected to an HPLC system (Agilent 1100 series) using 3 buffers (loading/washing: 0.15 M (NH4)HCO3; stripping/elution: 0.1 M glycine, pH 2.5; and neutralization: 0.01 M Tris-HCl and 0.15 M NaCl, pH 7.4) with a cycle consisting of load, wash, elute, neutralization and re-equilibration lasting 36 min. The unbound and bound fractions were monitored at 280 nm and were baseline resolved after separation. Unbound fractions (containing the low abundance proteins) were collected for downstream processing and analysis, and lyophilized prior to enzymatic digestion.

[0471]

Enzymatic Digestion and Solid-Phase Extraction. Lyophilized fractions containing low abundance proteins were digested with trypsin after being reconstituted under mild denaturing conditions in 200 μl of 1:1 0.1 M (NH4)HCO3/trifluoroethanol (TFE) (v/v) and then allowed to incubate on an orbital shaker for 30 min at RT. Samples were diluted in 800 μl of 0.1 M (NH4)HCO3 and digested with 0.4 μg trypsin (Princeton Separations) per sample for 16 h at 37° C. and lyophilized. Lyophilized tryptic peptides were reconstituted in 350 μl of 0.01 M (NH4)HCO3 and incubated on an orbital shaker for 15 min at RT, followed by reduction using 30 μl of 0.05 M TCEP and incubation for 1 h at RT and dilution in 375 μl of 90% water/10% acetonitrile/0.2% trifluoroacetic acid. The extraction plate (Empore C18, 3M Bioanalytical Technologies) was conditioned according to the manufacturer's protocol, and after sample loading the wells were washed in 500 μl of 95% water/5% acetonitrile/0.1% trifluoroacetic acid and eluted with 200 μl of 52% water/48% acetonitrile/0.1% trifluoroacetic acid into a collection plate. The eluate was split into 2 equal aliquots, and each was taken to dryness in a vacuum concentrator. One aliquot was used immediately for mass spectrometry, while the other was stored at −80° C. Samples were reconstituted in 12 μl of 90% water/10% acetonitrile/0.2% formic acid just prior to LC-SRM MS analysis.

[0472]

SRM-MS Analysis. Peptide samples were separated using a capillary reversed-phase LC column (Thermo BioBasic 18 KAPPA; column dimensions: 320 μm×150 mm; particle size: 5 μm; pore size: 300 Å) and a nano-HPLC system (nanoACQUITY, Waters Inc.). The mobile phases were (A) 0.2% formic acid in water and (B) 0.2% formic acid in acetonitrile. The samples were injected (8 μl) and separated using a linear gradient (98% A to 70% A) at 5 μl/minute for 19 min. Peptides were eluted directly into the electrospray source of the mass spectrometer (5500 QTrap LC/MS/MS, AB Sciex) operating in scheduled SRM positive-ion mode (Q1 resolution: unit; Q3 resolution: unit; detection window: 180 seconds; cycle time: 1.5 seconds). Transition intensities were then integrated by software MultiQuant (AB Sciex). An intensity threshold of 10,000 was used to filter out non-specific data and undetected transitions.

[0473]

Normalization and Calibration of Raw SRM-MS Data.

[0474]

Definition of Depletion Column Drift. Due to changes in observed signal intensity after repetitive use of each immunoaffinity column, the column's performance was assessed by quantifying the transition intensity in the control HPS samples. Assuming $I_{i,s}$ was the intensity of transition i in an HPS sample s, the drift of the sample was defined as

[0475]

$$\mathrm{drift}_s=\operatorname{median}_i\left(\frac{I_{i,s}-\hat{I}_i}{\hat{I}_i}\right) \tag{S8}$$

where $\hat{I}_i$ was the mean value of $I_{i,s}$ among all HPS samples that were depleted by the same column, and the median was taken over all detected transitions in the sample. The column variability, or drift, was defined as

$$\mathrm{drift}_{col}=\operatorname{median}(\mathrm{drift}_s>0)-\operatorname{median}(\mathrm{drift}_s<0). \tag{S9}$$

Here each median was taken over the HPS samples depleted by the column that satisfied the indicated condition. If no sample drift was greater than (or less than) zero, the corresponding median was taken as 0. The median column drift was the median of the drifts of all depletion columns used in the study.
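A minimal sketch of the drift calculation follows. It assumes that the sample drift is the median, over detected transitions, of the relative deviation of each transition intensity from its mean across the HPS samples of the same column; the data layout (one dictionary of transition intensities per HPS sample) and function names are illustrative.

```python
from statistics import mean, median

def sample_drifts(hps_samples):
    """Per-sample drift for HPS samples depleted on the same column.
    Each sample is a dict mapping transition name to detected intensity."""
    transitions = set().union(*hps_samples)
    means = {t: mean(s[t] for s in hps_samples if t in s) for t in transitions}
    return [median((s[t] - means[t]) / means[t] for t in s) for s in hps_samples]

def column_drift(drifts):
    """Median of positive drifts minus median of negative drifts (0 if none)."""
    pos = [d for d in drifts if d > 0]
    neg = [d for d in drifts if d < 0]
    return (median(pos) if pos else 0.0) - (median(neg) if neg else 0.0)
```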

[0476]

Identification of Endogenous Normalizing Proteins. The following criteria were used to identify a transition of a normalization protein: (A) possession of the highest median intensity of all transitions from the same protein; (B) detected in all samples; (C) ranking high in reducing median technical coefficient of variation (CV), i.e. median CV of transition intensities that were measured on HPS samples, as a normalizer; (D) ranking high in reducing median column drift that was observed in sample depletion; and (E) possession of low median technical CV and low median biological CV, i.e. median CV of transition intensities that were measured on clinical samples. Six endogenous normalizing proteins were identified and are listed in Table 33.

[0477]

List of endogenous normalizing proteins

| Normalizing Protein | Transition | SEQ ID NO | Median Technical CV (%) | Median Column Drift (%) |
|---|---|---|---|---|
| PEDF_HUMAN | LQSLFDSPDFSK_692.34_593.30 | 28 | 25.8 | 6.8 |
| MASP1_HUMAN | TGVITSPDFPNPYPK_816.92_258.10 | 6 | 26.5 | 18.3 |
| GELS_HUMAN | TASDFITK_441.73_710.40 | 5 | 27.1 | 16.8 |
| LUM_HUMAN | SLEDLQLTHNK_433.23_499.30 | 29 | 27.1 | 16.1 |
| C163A_HUMAN | INPASLDK_429.24_630.30 | 30 | 26.6 | 14.6 |
| PTPRJ_HUMAN | VITEPIPVSDLR_669.89_896.50 | 31 | 27.2 | 18.2 |
| Normalization by Panel of Transitions | | | 25.1 | 9.0 |
| Without Normalization | | | 32.3 | 23.8 |

[0478]

Normalization of Raw SRM-MS Data. Six normalization transitions were used to normalize raw SRM-MS data to reduce sample-to-sample intensity variations within the same study. A scaling factor was calculated for each sample so that the intensities of the six normalization transitions of the sample were aligned with the corresponding median intensities of all HPS samples. Assuming that $N_{i,s}$ is the intensity of a normalization transition i in sample s and $\hat{N}_i$ the corresponding median intensity of all HPS samples, then the scaling factor for sample s is given by $\hat{S}/S_s$, where

[0479]

$$S_s=\operatorname{median}\left(\frac{N_{1,s}}{\hat{N}_1},\frac{N_{2,s}}{\hat{N}_2},\ldots,\frac{N_{6,s}}{\hat{N}_6}\right) \tag{S10}$$

is the median of the intensity ratios and $\hat{S}$ is the median of $S_s$ over all samples in the study. Finally, for each transition of each sample, its normalized intensity was calculated as

$$\tilde{I}_{i,s}=I_{i,s}\cdot\hat{S}/S_s \tag{S11}$$

where $I_{i,s}$ was the raw intensity.
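The normalization can be illustrated with a short sketch. It assumes that the per-sample quantity is the median of the ratios of the normalization-transition intensities to their HPS medians, and that every raw intensity is then multiplied by the study-wide median ratio divided by the sample's own ratio. The data layout and names are hypothetical.

```python
from statistics import median

def sample_ratio(sample, hps_medians, normalizers):
    """Median ratio of a sample's normalizer intensities to the HPS medians."""
    return median(sample[t] / hps_medians[t] for t in normalizers)

def normalize(samples, hps_medians, normalizers):
    """Scale every transition of every sample by (study median ratio) / (sample ratio)."""
    ratios = [sample_ratio(s, hps_medians, normalizers) for s in samples]
    s_hat = median(ratios)
    return [{t: v * s_hat / r for t, v in s.items()}
            for s, r in zip(samples, ratios)]
```

With this scaling, two injections of the same material that differ only by an overall intensity factor yield identical normalized intensities.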

[0480]

Calibration by Human Plasma Standard (HPS) Samples. For a label-free MS approach, variation in signal intensity between different experiments is expected. To reduce this variation, we utilized HPS samples as an external standard and calibrated the intensity between the discovery and validation studies. Assume that $\check{I}_{i,s}$ is the logarithmically transformed (base 2), normalized intensity of transition i in sample s, and that $\check{I}_{i,dis}$ and $\check{I}_{i,val}$ are the corresponding median values of HPS samples in the discovery and the validation studies, respectively. Then the HPS corrected intensity is

$$\tilde{I}_{i,s}=\check{I}_{i,s}-\check{I}_{i,val}+\check{I}_{i,dis} \tag{S12}$$
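As a worked illustration, the HPS correction amounts to an additive shift on the log2 scale. The sketch assumes that Eq. (S12) subtracts the validation-study HPS median and adds the discovery-study HPS median for each transition; the function name and dictionary layout are hypothetical.

```python
def hps_calibrate(log2_intensities, hps_median_val, hps_median_dis):
    """Shift log2 normalized intensities of a validation sample so that the
    validation HPS median of each transition maps onto the discovery HPS median."""
    return {
        t: v - hps_median_val[t] + hps_median_dis[t]
        for t, v in log2_intensities.items()
    }
```

Under this assumption, a validation sample whose intensity equals the validation HPS median is mapped exactly onto the discovery HPS median.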

[0481]

Calculation of q-Values of Peptide and Protein Assays. In the development of SRM assays, it is important to ensure that the transitions detected correspond to the peptides and proteins they were intended to measure. Computational tools such as mProphet (15) enable automated qualification of SRM assays. We introduced a complementary strategy to mProphet that does not require customization for each dataset. It utilizes expression correlation techniques (16) to confirm the identity of transitions from the same peptide and protein with high confidence. In FIG. 16, a histogram of the Pearson correlations between every pair of transitions in the assay is presented. The correlation between a pair of transitions is obtained from their expression profiles over all samples in the discovery study. As expected, transitions from the same peptide are highly correlated. Similarly, transitions from different peptide fragments of the same protein are also highly correlated. In contrast, transitions from different proteins are not highly correlated, which enables a statistical analysis of the quality of a protein's SRM assay.

[0482]

To determine the false positive assay rate we calculated the q-values (17) of peptide SRM assays. Using the distribution of Pearson correlations between transitions from different proteins as the null distribution (FIG. 16), an empirical p-value was assigned to a pair of transitions from the same peptide, detected in at least five common samples. A value of 'NA' was assigned if the pair of transitions was detected in fewer than five common samples. The empirical p-value was converted to a q-value using the "qvalue" package in Bioconductor. We calculated the q-values of protein SRM assays in the same way, except that Pearson correlations of individual proteins were calculated as those between two transitions from different peptides of the protein. For proteins not having two peptides detected in five or more common samples, the q-values could not be properly evaluated and were assigned 'NA'. If the correlation of transitions from two peptides of the same protein was above 0.5, there was less than a 3% probability that the assay was false.
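The empirical p-value step can be sketched as below. This is an illustration only: the exact tail convention used for the empirical p-value is an assumption, the function names are hypothetical, and the conversion to q-values (performed with the Bioconductor "qvalue" package in the text) is not reproduced here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def empirical_p(r, null_correlations):
    """Fraction of null (different-protein) correlations at least as large as r."""
    return sum(1 for x in null_correlations if x >= r) / len(null_correlations)

def pair_p_value(profile_a, profile_b, null_correlations, min_common=5):
    """Empirical p-value for two transitions; None stands for 'NA'.
    Profiles map sample id to intensity and omit samples with no detection."""
    common = sorted(set(profile_a) & set(profile_b))
    if len(common) < min_common:
        return None
    r = pearson([profile_a[s] for s in common], [profile_b[s] for s in common])
    return empirical_p(r, null_correlations)
```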

[0483]

The 36 most cooperative proteins are shown in the table below.

[0484]

| Category | Protein (UniProt) | Official Gene Name | Cooperative Score | Partial AUC | Coefficient CV | Frequency | Transition for Quantitation |
|---|---|---|---|---|---|---|---|
| Classifier | TSP1_HUMAN | THBS1 | 1.8 | 0.25 | 0.24 | 59 | GFLLLASLR_495.31_559.40 |
| Classifier | COIA1_HUMAN | COL18A1 | 3.7 | 0.16 | 0.25 | 91 | AVGLAGTFR_446.26_721.40 |
| Classifier | ISLR_HUMAN | ISLR | 1.4 | 0.32 | 0.25 | 64 | ALPGTPVASSQPR_640.85_841.50 |
| Classifier | TETN_HUMAN | CLEC3B | 2.5 | 0.26 | 0.26 | 67 | LDTLAQEVALLK_657.39_330.20 |
| Classifier | FRIL_HUMAN | FTL | 2.8 | 0.31 | 0.26 | 53 | LGGPEAGLGEYLFER_804.40_913.40 |
| Classifier | GRP78_HUMAN | HSPA5 | 1.4 | 0.27 | 0.27 | 40 | TWNDPSVQQDIK_715.85_260.20 |
| Classifier | ALDOA_HUMAN | ALDOA | 1.3 | 0.26 | 0.28 | 88 | ALQASALK_401.25_617.40 |
| Classifier | BGH3_HUMAN | TGFBI | 1.8 | 0.21 | 0.28 | 69 | LTLLAPLNSVFK_658.40_804.50 |
| Classifier | LG3BP_HUMAN | LGALS3BP | 4.3 | 0.29 | 0.29 | 76 | VEIFYR_413.73_598.30 |
| Classifier | LRP1_HUMAN | LRP1 | 4.0 | 0.13 | 0.32 | 93 | TVLWPNGLSLDIPAGR_855.00_400.20 |
| Classifier | FIBA_HUMAN | FGA | 1.1 | 0.31 | 0.35 | 11 | NSLFEYQK_514.76_714.30 |
| Classifier | PRDX1_HUMAN | PRDX1 | 1.5 | 0.32 | 0.37 | 68 | QITVNDLPVGR_606.30_428.30 |
| Classifier | GSLG1_HUMAN | GLG1 | 1.2 | 0.34 | 0.45 | 23 | IIIQESALDYR_660.86_338.20 |
| Robust | KIT_HUMAN | KIT | 1.4 | 0.33 | 0.46 | 28 | YVSELHLTR_373.21_263.10 |
| Robust | CD14_HUMAN | CD14 | 4.0 | 0.33 | 0.48 | 73 | ATVNPSAPR_456.80_527.30 |
| Robust | EF1A1_HUMAN | EEF1A1 | 1.2 | 0.32 | 0.56 | 52 | IGGIGTVPVGR_513.30_428.30 |
| Robust | TENX_HUMAN | TNXB | 1.1 | 0.30 | 0.56 | 22 | YEVTVVSVR_526.29_759.50 |
| Robust | AIFM1_HUMAN | AIFM1 | 1.4 | 0.32 | 0.70 | 6 | ELWFSDDPNVTK_725.85_558.30 |
| Robust | GGH_HUMAN | GGH | 1.3 | 0.32 | 0.81 | 43 | YYIAASYVK_539.28_638.40 |
| Robust | IBP3_HUMAN | IGFBP3 | 3.4 | 0.32 | 1.82 | 58 | FLNVLSPR_473.28_685.40 |
| Robust | ENPL_HUMAN | HSP90B1 | 1.1 | 0.29 | 5.90 | 22 | SGYLLPDTK_497.27_460.20 |
| Non-Robust | ERO1A_HUMAN | ERO1L | 6.2 | | | | VLPFFERPDFQLFTGNK_685.70_318.20 |
| Non-Robust | 6PGD_HUMAN | PGD | 4.3 | | | | LVPLLDTGDIIIDGGNSEYR_1080.60_897.40 |
| Non-Robust | ICAM1_HUMAN | ICAM1 | 3.9 | | | | VELAPLPSWQPVGK_760.93_342.20 |
| Non-Robust | PTPA_HUMAN | PPP2R4 | 2.1 | | | | FGSLLPIHPVTSG_662.87_807.40 |
| Non-Robust | NCF4_HUMAN | NCF4 | 2.0 | | | | GATGIFPLSFVK_618.85_837.50 |
| Non-Robust | SEM3G_HUMAN | SEMA3G | 1.9 | | | | LFLGGLDALYSLR_719.41_837.40 |
| Non-Robust | 1433T_HUMAN | YWHAQ | 1.5 | | | | TAFDEAIAELDTLNEDSYK_1073.00_748.40 |
| Non-Robust | RAP2B_HUMAN | RAP2B | 1.5 | | | | VDLEGER_409.21_603.30 |
| Non-Robust | MMP9_HUMAN | MMP9 | 1.4 | | | | AFALWSAVTPLTFTR_840.96_290.20 |
| Non-Robust | FOLH1_HUMAN | FOLH1 | 1.3 | | | | LGSGNDFEVFFQR_758.37_825.40 |
| Non-Robust | GSTP1_HUMAN | GSTP1 | 1.3 | | | | ALPGQLKPFETLLSQNQGGK_709.39_831.40 |
| Non-Robust | EF2_HUMAN | EEF2 | 1.3 | | | | FSVSPVVR_445.76_470.30 |
| Non-Robust | RAN_HUMAN | RAN | 1.2 | | | | LVLVGDGGTGK_508.29_591.30 |
| Non-Robust | SODM_HUMAN | SOD2 | 1.2 | | | | NVRPDYLK_335.52_260.20 |
| Non-Robust | DSG2_HUMAN | DSG2 | 1.1 | | | | GQIIGNFQAFDEDTGLPAHAR_753.04_299.20 |

(continued, rows in the same order as above)

| Category | SEQ ID NO | P Value (Mann-Whitney test) | Transition for Qualification | Peptide Q Value | Tissue Candidate | Predicted Concentration (ng/ml) |
|---|---|---|---|---|---|---|
| Classifier | 22 | 0.23 | GFLLLASLR_495.31_318.20 | 1.90E-05 | | 510 |
| Classifier | 11 | 0.16 | AVGLAGTFR_446.26_551.30 | 6.70E-04 | | 35 |
| Classifier | 14 | 0.74 | ALPGTPVASSQPR_640.85_440.30 | 4.40E-03 | | |
| Classifier | 20 | 0.14 | LDTLAQEVALLK_657.39_871.50 | 3.70E-05 | | 58000 |
| Classifier | 24 | 0.19 | LGGPEAGLGEYLFER_804.40_525.30 | 4.30E-05 | Secreted, Epi, Endo | 12 |
| Classifier | 23 | 0.44 | TWNDPSVQQDIK_715.85_288.10 | 1.80E-03 | Secreted, Epi, Endo | 100 |
| Classifier | 7 | 0.57 | ALQASALK_401.25_489.30 | 3.70E-05 | Secreted, Epi | 250 |
| Classifier | 8 | 0.57 | LTLLAPLNSVFK_658.40_875.50 | 1.40E-04 | | 140 |
| Classifier | 25 | 0.45 | VEIFYR_413.73_485.30 | 2.80E-05 | Secreted | 440 |
| Classifier | 15 | 0.26 | TVLWPNGLSLDIPAGR_855.00_605.30 | 1.40E-04 | Epi | 20 |
| Classifier | 26 | 0.57 | NSLFEYQK_514.76_315.20 | 1.90E-05 | | 130000 |
| Classifier | 16 | 0.24 | QITVNDLPVGR_606.30_770.40 | 1.90E-05 | Epi | 60 |
| Classifier | 27 | 0.27 | IIIQESALDYR_660.86_724.40 | 6.70E-03 | Epi, Endo | |
| Robust | 32 | 0.27 | YVSELHLTR_373.21_526.30 | 2.40E-03 | | 8.2 |
| Robust | 33 | 0.72 | ATVNPSAPR_456.80_386.20 | 4.30E-04 | Epi | 420 |
| Robust | 34 | 0.53 | IGGIGTVPVGR_513.30_628.40 | 4.50E-04 | Secreted, Epi | 61 |
| Robust | 2 | 0.54 | YEVTVVSVR_526.29_660.40 | 1.10E-03 | Endo | 70 |
| Robust | 35 | 0.20 | ELWFSDDPNVTK_725.85_875.40 | 3.70E-02 | Epi, Endo | 1.4 |
| Robust | 36 | 0.24 | YYIAASYVK_539.28_567.30 | 1.70E-03 | | 250 |
| Robust | 4 | 0.04 | FLNVLSPR_473.28_359.20 | 2.80E-05 | | 5700 |
| Robust | 37 | 0.57 | SGYLLPDTK_497.27_573.30 | 1.10E-03 | Secreted, Epi, Endo | 88 |
| Non-Robust | 38 | 0.06 | VLPFFERPDFQLFTGNK_685.70_419.20 | 1.20E-02 | Secreted, Epi, Endo | |
| Non-Robust | 39 | 0.03 | LVPLLDTGDIIIDGGNSEYR_1080.60_974.50 | 5.50E-03 | Epi, Endo | 29 |
| Non-Robust | 40 | 0.31 | VELAPLPSWQPVGK_760.93_413.20 | 2.80E-02 | | 71 |
| Non-Robust | 41 | 0.26 | FGSLLPIHPVTSG_662.87_292.10 | 1.90E-03 | Endo | 3.3 |
| Non-Robust | 42 | 0.11 | GATGIFPLSFVK_618.85_690.40 | 7.90E-04 | Endo | |
| Non-Robust | 43 | 0.20 | LFLGGLDALYSLR_719.41_538.30 | 1.10E-03 | | |
| Non-Robust | 44 | 0.69 | TAFDEAIAELDTLNEDSYK_1073.00_969.50 | 1.10E-02 | Epi | 180 |
| Non-Robust | 45 | 0.34 | VDLEGER_409.21_361.20 | 1.20E-03 | Epi | |
| Non-Robust | 46 | 0.36 | AFALWSAVTPLTFTR_840.96_589.30 | 4.00E-03 | | 28 |
| Non-Robust | 47 | 0.06 | LGSGNDFEVFFQR_758.37_597.30 | 5.80E-03 | | |
| Non-Robust | 48 | 0.46 | ALPGQLKPFETLLSQNQGGK_709.39_261.20 | 1.70E-04 | Endo | 32 |
| Non-Robust | 49 | 0.79 | FSVSPVVR_445.76_557.30 | 1.10E-02 | Secreted, Epi | 30 |
| Non-Robust | 50 | 0.27 | LVLVGDGGTGK_508.29_326.20 | 2.80E-03 | Secreted, Epi | 4.6 |
| Non-Robust | 51 | 0.86 | NVRPDYLK_335.52_423.30 | 2.40E-02 | Secreted | 7.1 |
| Non-Robust | 52 | 0.08 | GQIIGNFQAFDEDTGLPAHAR_753.04_551.30 | 5.70E-03 | Endo | 2.7 |

[0485]

A P-classifier was also derived using the same steps as for the 13-protein classifier derivation (see Table 28 and Materials and Methods in Example 9), except that the Mann-Whitney p-value was used in place of the cooperative score.

[0486]

P-Classifiers

| Category | Protein (UniProt) | Official Gene Name | Transition for Quantitation | SEQ ID NO | P Value (Mann-Whitney test) | Coefficient (α = 27.24) | Coefficient CV | Cooperative Protein |
|---|---|---|---|---|---|---|---|---|
| P-Classifier | FRIL_HUMAN | FTL | LGGPEAGLGEYLFER_804.40_913.40 | 24 | 0.19 | 0.39 | 0.21 | Yes |
| P-Classifier | TSP1_HUMAN | THBS1 | GFLLLASLR_495.31_559.40 | 22 | 0.23 | 0.48 | 0.21 | Yes |
| P-Classifier | LRP1_HUMAN | LRP1 | TVLWPNGLSLDIPAGR_855.00_400.20 | 15 | 0.26 | −0.81 | 0.22 | Yes |
| P-Classifier | PRDX1_HUMAN | PRDX1 | QITVNDLPVGR_606.30_428.30 | 16 | 0.24 | −0.51 | 0.24 | Yes |
| P-Classifier | TETN_HUMAN | CLEC3B | LDTLAQEVALLK_657.39_330.20 | 20 | 0.14 | −1.08 | 0.27 | Yes |
| P-Classifier | TBB3_HUMAN | TUBB3 | ISVYYNEASSHK_466.60_458.20 | 19 | 0.08 | −0.21 | 0.29 | No |
| P-Classifier | COIA1_HUMAN | COL18A1 | AVGLAGTFR_446.26_721.40 | 11 | 0.16 | −0.72 | 0.29 | Yes |
| P-Classifier | GGH_HUMAN | GGH | YYIAASYVK_539.28_638.40 | 36 | 0.24 | 0.74 | 0.33 | Yes |
| P-Classifier | A1AG1_HUMAN | ORM1 | YVGGQEHFAHLLILR_584.99_263.10 | 53 | 0.27 | 0.30 | 0.36 | No |
| Robust | AIFM1_HUMAN | AIFM1 | ELWFSDDPNVTK_725.85_558.30 | 35 | 0.20 | | | Yes |
| Robust | AMPN_HUMAN | ANPEP | DHSAIPVINR_374.54_402.20 | 54 | 0.16 | | | No |
| Robust | CRP_HUMAN | CRP | ESDTSYVSLK_564.77_347.20 | 55 | 0.17 | | | No |
| Robust | GSLG1_HUMAN | GLG1 | IIIQESALDYR_660.86_338.20 | 27 | 0.27 | | | Yes |
| Robust | IBP3_HUMAN | IGFBP3 | FLNVLSPR_473.28_685.40 | 4 | 0.04 | | | Yes |
| Robust | KIT_HUMAN | KIT | YVSELHLTR_373.21_263.10 | 32 | 0.27 | | | Yes |
| Robust | NRP1_HUMAN | NRP1 | SFEGNNNYDTPELR_828.37_514.30 | 56 | 0.22 | | | No |
| Non-Robust | 6PGD_HUMAN | PGD | LVPLLDTGDIIIDGGNSEYR_1080.60_897.40 | 39 | 0.03 | | | Yes |
| Non-Robust | CH10_HUMAN | HSPE1 | VLLPEYGGTK_538.80_751.40 | 57 | 0.07 | | | No |
| Non-Robust | CLIC1_HUMAN | CLIC1 | FSAYIK_364.70_581.30 | 9 | 0.14 | | | No |
| Non-Robust | COF1_HUMAN | CFL1 | YALYDATYETK_669.32_827.40 | 58 | 0.08 | | | No |
| Non-Robust | CSF1_HUMAN | CSF1 | ISSLRPQGLSNPSTLSAQPQLSR_813.11_600.30 | 59 | 0.23 | | | No |
| Non-Robust | CYTB_HUMAN | CSTB | SQVVAGTNYFIK_663.86_315.20 | 60 | 0.16 | | | No |
| Non-Robust | DMKN_HUMAN | DMKN | VSEALGQGTR_509.27_631.40 | 61 | 0.17 | | | No |
| Non-Robust | DSG2_HUMAN | DSG2 | GQIIGNFQAFDEDTGLPAHAR_753.04_299.20 | 52 | 0.08 | | | Yes |
| Non-Robust | EREG_HUMAN | EREG | VAQVSITK_423.26_448.30 | 62 | 0.16 | | | No |
| Non-Robust | ERO1A_HUMAN | ERO1L | VLPFFERPDFQLFTGNK_685.70_318.20 | 38 | 0.06 | | | Yes |
| Non-Robust | FOLH1_HUMAN | FOLH1 | LGSGNDFEVFFQR_758.37_825.40 | 47 | 0.06 | | | Yes |
| Non-Robust | ILEU_HUMAN | SERPINB1 | TYNFLPEFLVSTQK_843.94_379.20 | 63 | 0.09 | | | No |
| Non-Robust | K1C19_HUMAN | KRT19 | FGAQLAHIQALISGIEAQLGDVR_803.11_274.20 | 64 | 0.17 | | | No |
| Non-Robust | LYOX_HUMAN | LOX | TPILLIR_413.28_514.40 | 65 | 0.22 | | | No |
| Non-Robust | MMP7_HUMAN | MMP7 | LSQDDIK_409.72_705.30 | 66 | 0.23 | | | No |
| Non-Robust | NCF4_HUMAN | NCF4 | GATGIFPLSFVK_618.85_837.50 | 42 | 0.11 | | | Yes |
| Non-Robust | PDIA3_HUMAN | PDIA3 | ELSDFISYLQR_685.85_779.40 | 67 | 0.04 | | | No |
| Non-Robust | PTGIS_HUMAN | PTGIS | LLLFPFLSPQR_665.90_340.30 | 68 | 0.06 | | | No |
| Non-Robust | PTPA_HUMAN | PPP2R4 | FGSLLPIHPVTSG_662.87_807.40 | 41 | 0.26 | | | Yes |
| Non-Robust | RAN_HUMAN | RAN | LVLVGDGGTGK_508.29_591.30 | 50 | 0.27 | | | Yes |
| Non-Robust | SCF_HUMAN | KITLG | LFTPEEFFR_593.30_261.20 | 69 | 0.16 | | | No |
| Non-Robust | SEM3G_HUMAN | SEMA3G | LFLGGLDALYSLR_719.41_837.40 | 43 | 0.20 | | | Yes |
| Non-Robust | TBA1B_HUMAN | TUBA1B | AVFVDLEPTVIDEVR_851.50_928.50 | 70 | 0.15 | | | No |
| Non-Robust | TCPA_HUMAN | TCP1 | IHPTSVISGYR_615.34_251.20 | 71 | 0.17 | | | No |
| Non-Robust | TERA_HUMAN | VCP | GILLYGPPGTGK_586.80_284.20 | 72 | 0.29 | | | No |
| Non-Robust | TIMP1_HUMAN | TIMP1 | GFQALGDAADIR_617.32_717.40 | 73 | 0.26 | | | No |
| Non-Robust | TNF12_HUMAN | TNFSF12 | AAPFLTYFGLFQVH_805.92_700.40 | 74 | 0.29 | | | No |
| Non-Robust | UGPA_HUMAN | UGP2 | LVEIAQVPK_498.80_784.50 | 75 | 0.08 | | | No |

Example 10. XL2 ELISA Results

[0487]

Xpresys Lung has been developed to differentiate benign from malignant lung nodules. Xpresys Lung is a blood test for proteins that combines expertise in proteomics and computer science using large data sets. Mass spectrometry has been employed as a technology for molecular diagnostics for decades, and recent advances in instrumentation allow measurement of hundreds of proteins at a time. Cancers secrete and shed proteins that are different from those of normal cells, and some of these proteins circulate in the blood. InDi started with 388 protein candidates and stored blood samples from patients with either benign or malignant lung nodules. The initial analyses discovered and validated a predictor for benign nodules using a combination of 11 proteins. Xpresys Lung version one (XL1) provided a significant performance improvement over the clinical risk factors physicians use to differentiate benign from malignant lung nodules. InDi has now completed further work with protocol-collected blood samples to refine a second version of Xpresys Lung (XL2), which is a robust test for determining which nodules are benign. This new version, XL2, improves on XL1 in four ways: 1) a refined intended user population; 2) the identification of the 2 of the prior 11 proteins that are most accurate in identifying benign lung nodules; 3) the incorporation of five clinical risk factors; and 4) discovery and validation based on two large prospective studies where samples were collected using a uniform protocol rather than from archival biobanks.

[0488]

XL2 is intended for the evaluation of 8-30 mm lung nodules in patients 40 years or older where the physician estimates a lower cancer risk (pretest probability of cancer is 0 to 50%). The goal for Xpresys Lung is to identify those nodules that are likely benign so those nodules can be safely observed by CT surveillance rather than undergo costly and risky invasive procedures such as biopsy and surgery.

[0489]

The current study incorporates results for the two proteins used in XL2, C163A and LG3BP, using multiple reaction monitoring mass spectrometry (MRM MS) compared to ELISA measurements. Protein measurements from the two techniques are compared using correlation and statistical analysis.

[0490]

MRM MS: The eighteen plasma samples used in this study were analyzed by multiple reaction monitoring mass spectrometry (MRM MS). Each plasma sample was analyzed five times in order to generate a mean XL2 result.

[0491]

ELISA: The human soluble CD163 ELISA kit, catalog number CSB-E14050h, was purchased from CUSABIO through American Research Products, Inc., Waltham, MA 02452. The human Galectin 3BP ELISA kit, catalog number ab213784, was purchased from Abcam, Cambridge, MA 02139.

[0492]

Plasma samples were analyzed according to the manufacturers' protocols. A seven-point standard curve was generated in duplicate, ranging from 100 ng/mL to 1.56 ng/mL for the human soluble CD163 protein and from 4,000 pg/mL to 62.5 pg/mL for the human Galectin 3BP protein. Negative controls were also created in duplicate. Plasma samples were thawed and diluted using the sample diluent supplied with each ELISA kit to create sufficient sample volume to assess in duplicate. After addition of the diluted samples to the plate, the human soluble CD163 ELISA plate was incubated for 2 hours at 37° C. and the human Galectin-3BP ELISA plate was incubated for 90 minutes at 37° C. Following the incubation, the plate contents were discarded and 100 μL of the biotinylated detection antibody was added to each well on the ELISA plate and incubated for 60 minutes at 37° C. Following incubation, the plate contents were discarded and the plates were washed 3 times with 200 μL of the appropriate wash buffer. After washing, 100 μL of the avidin detection reagent was added to each well and incubated for 1 hour at 37° C. for the human soluble CD163 ELISA plate and for 30 minutes at 37° C. for the human Galectin-3BP ELISA plate. Following incubation, the plate contents were discarded and the plates were washed 5 times with 200 μL of wash buffer. Following the wash, 90 μL of TMB substrate was added to each well of the ELISA plates and the plates were developed for 15 to 30 minutes until a sufficient number of the samples were detected by the presence of the blue substrate indicator. The developing reaction was then stopped by adding 100 μL of the stop solution to each well. The plates were then read on a Molecular Devices Spectra Max 190 UV/Vis plate reader at 450 nm and 540 nm within 30 minutes of stopping the reaction. Throughout the entire process, care was taken to avoid allowing the ELISA plates to dry out between washes or additions of reagents.
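The stated end points of the two standard curves are consistent with a two-fold serial dilution over seven points, which the short sketch below reproduces. The two-fold factor is inferred from the end points rather than stated in the text, and the function name is illustrative.

```python
def serial_dilution(top, points=7, factor=2.0):
    """Concentrations of a serial dilution starting at `top`."""
    return [top / factor ** i for i in range(points)]

cd163_standards_ng_ml = serial_dilution(100.0)      # soluble CD163, ng/mL
gal3bp_standards_pg_ml = serial_dilution(4000.0)    # Galectin 3BP, pg/mL
```

Under this assumption the lowest CD163 standard is 1.5625 ng/mL, which rounds to the 1.56 ng/mL given above, and the lowest Galectin 3BP standard is 62.5 pg/mL.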

[0493]

Results

[0494]

XL2 is defined as:

[0495]

[0496]

Where t=0.38 and is the threshold for the reversal score, Age is the age of the subject in years, Smoker is 1 if the subject is a former or current smoker (otherwise 0), Diameter is the size of the lung nodule in mm, Spiculation is 1 if the lung nodule is spiculated (otherwise 0), and Location is 1 if the lung nodule is located in an upper lung lobe (otherwise 0).

[0497]

In this analysis we focus only on the reversal score, defined as

[0498]


as the clinical factors contained in X will not influence the comparison of the results.

[0499]

FIG. 17 shows the comparison of the MRM MS and ELISA data. The thick horizontal line indicates the XL2 threshold t of 0.38. The thick dashed line indicates a hypothetical threshold for the ELISA data. The data points in the lower left quadrant and the upper right quadrant show concordance between the MRM MS and ELISA methods. Using these two thresholds to compare the results, we observe that 16/18 samples (89%) are concordant between the two methods. Fisher's exact test for agreement between the MRM MS and ELISA results gives p=0.0077, indicating that the concordance is statistically significant.
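The concordance and Fisher's exact test calculation can be sketched as follows. The text reports only the totals (16 of 18 concordant), not the count in each quadrant of FIG. 17, so the 2x2 table below is hypothetical and chosen only to match those totals; the function names are likewise illustrative. The two-sided p-value is computed directly from the hypergeometric distribution.

```python
from math import comb


def concordance(table):
    """Fraction of samples on the diagonal of a 2x2 table."""
    (a, b), (c, d) = table
    return (a + d) / (a + b + c + d)


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# HYPOTHETICAL counts: rows = MRM MS below/above t, columns = ELISA below/above
# its threshold. Only the totals (18 samples, 16 concordant) come from the text.
counts = [[12, 1], [1, 4]]

print(f"concordance = {concordance(counts):.2f}")
print(f"p = {fisher_exact_two_sided(counts):.4f}")
```

With these illustrative counts the sketch gives a p-value of about 0.0077, in line with the reported figure, but this does not establish the actual per-quadrant counts.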

Example 11. XL1 and XL2 Alternative Assessment Testing (AAT) Characterization Study Design

[0500]

Definitions.

[0501]

Acceptable Range: Reference result ± 3 standard deviations.
XL1 Wcalibrated: Wcalibrated = W − WMedian_batch_pc + Wcalibration factor.
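A minimal sketch of the two definitions above. It assumes that the median term is the median W of the positive-control ("pc") samples run in the same batch, which the definition does not spell out; the function and parameter names are hypothetical.

```python
from statistics import median


def w_calibrated(w, batch_pc_w, calibration_factor):
    """W - (median W of the batch's positive controls) + calibration factor.

    Assumption: `batch_pc_w` holds the W values of the positive-control
    samples processed in the same batch as the sample being calibrated.
    """
    return w - median(batch_pc_w) + calibration_factor


def acceptable_range(reference, sd):
    """Reference result plus or minus 3 standard deviations."""
    return reference - 3 * sd, reference + 3 * sd


# Illustrative numbers only
print(w_calibrated(1.0, [0.2, 0.4, 0.9], 0.1))
print(acceptable_range(1.0, 0.2))
```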

[0502]

Characterization: Establishing the mean and standard deviation of a sample's XL1 Wcalibrated and XL2 Reversal Score from the analysis of at least 3 aliquots.

[0503]

XL2 Reversal Score:

[0504]

[0505]

XL1: Xpresys Lung test version 1.

[0506]

XL2: Xpresys Lung test version 2.

[0507]

Sample Selection for Characterization.

[0508]

A set of 18 samples meeting the following criteria is selected for characterization. Samples selected for characterization must have a residual volume of at least 1 mL to be used for replicate testing during characterization and for future use in AAT events. The list of selected samples is included in the final report.

[0509]

XL1 Sample Selection. Previously analyzed samples collected after 1 Jun. 2015 with an XL1 Wcalibrated between −2.83 and 2.93 (±3 standard deviations of the mean of the historical Wcalibrated distribution in FIG. 18) are eligible to be selected for characterization. XL2 Sample Selection. Previously analyzed samples collected after 1 Jun. 2015 with an XL2 Reversal Score between −1.08 and 3.49 (±3 standard deviations of the mean of the historical XL2 Reversal Score distribution in FIG. 19) are eligible to be selected for characterization.
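The selection criteria can be expressed as a simple eligibility check. This is a sketch: it assumes that "between" includes the endpoints and that the 1 mL residual-volume requirement is checked at the same time; the names are hypothetical.

```python
from datetime import date

XL1_W_RANGE = (-2.83, 2.93)    # historical Wcalibrated mean +/- 3 SD
XL2_RS_RANGE = (-1.08, 3.49)   # historical Reversal Score mean +/- 3 SD
COLLECTED_AFTER = date(2015, 6, 1)
MIN_RESIDUAL_ML = 1.0


def eligible(collected, value, value_range, residual_ml):
    """True if a previously analyzed sample may be selected for characterization."""
    low, high = value_range
    return (
        collected > COLLECTED_AFTER
        and low <= value <= high          # assumption: endpoints included
        and residual_ml >= MIN_RESIDUAL_ML
    )


print(eligible(date(2016, 3, 14), 0.75, XL1_W_RANGE, 1.2))   # True
print(eligible(date(2016, 3, 14), 3.60, XL2_RS_RANGE, 1.2))  # False, score out of range
```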

[0510]

Characterization Process.

[0511]

Characterization is performed in a clinical LIMS study for tracking purposes. Samples selected for characterization are accessioned into the characterization clinical study in the LIMS system. A minimum of seven 80 microliter aliquots of each selected sample are accessioned.

[0512]

Analysis of characterization study samples follows established SOPs for the XL1 assay. At least 3 aliquots of each sample are processed in separate batches on separate depletion columns (i.e., no two aliquots of the same sample are processed in the same batch or on the same column). A randomized sample processing order for each batch is generated by QA after sample selection and is included in the final study report. Each batch of the characterization study can be processed on the same depletion column used to process commercial or other clinical samples; however, commercial and clinical study samples cannot be processed within an AAT characterization batch.
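One way to satisfy the batching constraint is to place exactly one aliquot of every sample in each batch and to run each batch on its own depletion column. The sketch below makes that simplifying assumption (one column per batch) and uses hypothetical names; it is not the QA procedure itself, which is not described in detail.

```python
import random


def plan_batches(sample_ids, replicates=3, seed=None):
    """Build `replicates` batches, each holding one aliquot of every sample.

    Because a sample appears once per batch and each batch gets its own
    column, no two aliquots of a sample share a batch or a column.
    """
    rng = random.Random(seed)
    batches = []
    for batch_no in range(1, replicates + 1):
        order = list(sample_ids)
        rng.shuffle(order)  # randomized processing order within the batch
        batches.append(
            {"batch": batch_no, "column": f"column-{batch_no}", "order": order}
        )
    return batches


plan = plan_batches(["S01", "S02", "S03", "S04"], replicates=3, seed=7)
for batch in plan:
    print(batch)
```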

[0513]

XL1 Wcalibrated values for at least three aliquots are averaged, and the mean and standard deviation of the XL1 Wcalibrated are used to determine suitability for use in the AAT sample archive. The mean of the results defines the reference result for each AAT sample. The acceptable range (the maximum upper and lower limits for Wcalibrated [Wcalibrated,UL and Wcalibrated,LL, respectively]) is defined as three standard deviations on either side of the reference result. However, because of the small sample size, a minimum standard deviation for Wcalibrated is set at σw=0.1927476. This minimum value is based on the standard deviation of fifteen replicate Positive Control samples that were part of the Xpresys Lung analytical validation study. A standard deviation smaller than this is not expected and would be the result of undersampling during characterization.

[0514]

XL2 Reversal Scores for at least three aliquots are averaged, and the mean and standard deviation of the XL2 Reversal Scores are used to determine suitability for use in the AAT sample archive. The mean of the results defines the reference result for each AAT sample. The acceptable range (the maximum upper and lower limits for Reversal Score [RSUL and RSLL, respectively]) is defined as three standard deviations on either side of the reference result. However, because of the small sample size, a minimum standard deviation for the XL2 Reversal Score is set at 0.216887. This minimum value is based on the standard deviation of fifteen replicate Positive Control samples that were part of the Xpresys Lung analytical validation study. A standard deviation smaller than this is not expected and would be the result of undersampling during characterization.
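The reference result and acceptable range described for XL1 and XL2 can be sketched as follows. Two points are assumptions rather than statements from the text: the standard deviation is taken as the sample (n−1) standard deviation, and the minimum standard deviation is applied as a floor before the ±3 SD limits are computed. Function and variable names are hypothetical.

```python
from statistics import mean, stdev

# Minimum standard deviations stated in the text
MIN_SD = {"XL1": 0.1927476, "XL2": 0.216887}


def characterize(values, assay):
    """Return (reference result, SD used, (lower limit, upper limit))."""
    if len(values) < 3:
        raise ValueError("at least 3 aliquots are required")
    reference = mean(values)
    # Assumption: the observed SD is floored at the stated minimum
    sd = max(stdev(values), MIN_SD[assay])
    return reference, sd, (reference - 3 * sd, reference + 3 * sd)


# Illustrative Reversal Scores for three aliquots of one sample
reference, sd, (rs_ll, rs_ul) = characterize([1.10, 1.15, 1.05], "XL2")
print(reference, sd, rs_ll, rs_ul)
```

In this illustrative case the observed spread is smaller than the minimum, so the limits are set from the floor value of 0.216887 rather than from the three aliquots alone.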

[0515]

Acceptance Criteria.

[0516]

The Technical Supervisor and Quality Assurance will review the final results in order to select samples for use in the AAT archive. To be eligible for the AAT archive, the following general acceptance criteria must be met: (1) Samples tested must pass quality control as defined in approved SOPs; (2) At least 2 aliquots of 80 microliters must remain after characterization testing is complete; and (3) At least 3 aliquots must be acceptable for use in the following calculations.

[0517]

In addition to the general acceptance criteria above, the following acceptance criterion applies to XL1: the standard deviation for Wcalibrated must be less than the maximum σw=0.3855. This maximum value for σw is based on twice the standard deviation of fifteen replicate Positive Control samples that were part of the Xpresys Lung analytical validation study. A standard deviation larger than this is not expected and would be the result of undersampling during characterization.

[0518]

In addition to the general acceptance criteria above, the following acceptance criterion applies to XL2: the standard deviation for the Reversal Score must be less than the maximum σ=0.4338. This maximum value for σ is based on twice the standard deviation of fifteen replicate Positive Control samples that were part of the Xpresys Lung analytical validation study. A standard deviation larger than this is not expected and would be the result of undersampling during characterization.
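The general and assay-specific acceptance criteria can be combined into a single check. This is a sketch with hypothetical names; it assumes the sample (n−1) standard deviation and treats the quality-control outcome as a single pass/fail flag. The maxima are twice the stated minimum standard deviations (2 × 0.1927476 ≈ 0.3855 and 2 × 0.216887 ≈ 0.4338).

```python
from statistics import stdev

# Maximum standard deviations stated in the text
MAX_SD = {"XL1": 0.3855, "XL2": 0.4338}


def meets_acceptance(values, assay, passed_qc, aliquots_remaining):
    """True if a characterized sample is eligible for the AAT archive."""
    return (
        passed_qc                        # (1) passes QC per approved SOPs
        and aliquots_remaining >= 2      # (2) at least two 80 microliter aliquots left
        and len(values) >= 3             # (3) at least three acceptable aliquots
        and stdev(values) < MAX_SD[assay]
    )


print(meets_acceptance([1.10, 1.15, 1.05], "XL2", True, 4))  # True
print(meets_acceptance([0.2, 1.0, 1.9], "XL1", True, 4))     # False, SD too large
```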

[0519]

Sample Storage Plan.

[0520]

All samples selected for use in the AAT sample archive are stored in a separate sample storage box in a −80° C. freezer. Access to this storage is limited to laboratory personnel and quality assurance.

REFERENCES

[0000]

  • 1. Albert & Russell Am Fam Physician 80:827-831 (2009)
  • 2. Gould et al. Chest 132:108S-130S (2007)
  • 3. Kitteringham et al. J Chromatogr B Analyt Technol Biomed Life Sci 877:1229-1239 (2009)
  • 4. Lange et al. Mol Syst Biol 4:222 (2008)
  • 5. Lehtio & De Petris J Proteomics 73:1851-1863 (2010)
  • 6. MacMahon et al. Radiology 237:395-400 (2005)
  • 7. Makawita Clin Chem 56:212-222 (2010)
  • 8. Ocak et al. Proc Am Thorac Soc 6:159-170 (2009)
  • 9. Ost, D. E. and M. K. Gould, Decision making in patients with pulmonary nodules. Am J Respir Crit Care Med, 2012. 185(4): p. 363-72.
  • 10. Cima, I., et al., Cancer genetics-guided discovery of serum biomarker signatures for diagnosis and prognosis of prostate cancer. Proc Natl Acad Sci USA, 2011. 108(8): p. 3342-7.
  • 11. Desiere, F., et al., The PeptideAtlas project. Nucleic Acids Res, 2006. 34 (Database issue): p. D655-8.
  • 12. Farrah, T., et al., A high-confidence human plasma proteome reference set with estimated concentrations in PeptideAtlas. Mol Cell Proteomics, 2011. 10(9): p. M110 006353.
  • 13. Omenn, G. S., et al., Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics, 2005. p. 3226-45.
  • 14. Kearney, P., et al., Protein identification and Peptide expression resolver: harmonizing protein identification with protein expression data. J Proteome Res, 2008. 7(1): p. 234-44.
  • 15. Huttenhain, R., et al., Reproducible quantification of cancer-associated proteins in body fluids using targeted proteomics. Sci Transl Med, 2012. 4(142): p. 142ra94.
  • 16. Henschke, C. I., et al., CT screening for lung cancer: suspiciousness of nodules according to size on baseline scans. Radiology, 2004. 231(1): p. 164-8.
  • 17. Henschke, C. I., et al., Early Lung Cancer Action Project: overall design and findings from baseline screening. Lancet, 1999. 354(9173): p. 99-105.
  • 18. States, D. J., et al., Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nat Biotechnol, 2006. 24(3): p. 333-8.
  • 19. Polanski, M. and N. L. Anderson, A list of candidate cancer biomarkers for targeted proteomics. Biomark Insights, 2007. 1: p. 1-48.
  • 20. Krogh, A., et al., Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol, 2001. 305(3): p. 567-80.
  • 21. Bendtsen, J. D., et al., Improved prediction of signal peptides: SignalP 3.0. J Mol Biol, 2004. 340(4): p. 783-95.
  • 22. Bendtsen, J. D., et al., Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Des Sel, 2004. 17(4): p. 349-56.
  • 23. Lange, V., et al., Selected reaction monitoring for quantitative proteomics: a tutorial. Mol Syst Biol, 2008. 4: p. 222.
  • 24. Picotti, P., et al., High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat Methods, 2010. 7(1): p. 43-6.
  • 25. Mallick, P., et al., Computational prediction of proteotypic peptides for quantitative proteomics. Nat Biotechnol, 2007. 25(1): p. 125-31.
  • 26. Perkins, D. N., et al., Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 1999. 20(18): p. 3551-67.
  • 27. Hastie, T., R. Tibshirani, and J. H. Friedman, The elements of statistical learning: data mining, inference, and prediction: with 200 full-color illustrations. Springer series in statistics. 2001, New York: Springer. xvi, 533 p.
  • 28. McClish, D. K., Analyzing a portion of the ROC curve. Med Decis Making, 1989. 9(3): p. 190-5.
  • 29. X.-J. Li, C. Hayward, P.-Y. Fong, M. Dominguez, S. W. Hunsucker, L. W. Lee, M. McLean, S. Law, H. Butler, M. Schirm, O. Gingras, J. Lamontagne, R. Allard, D. Chelsky, N. D. Price, S. Lam, P. P. Massion, H. Pass, W. N. Rom, A. Vachani, K. C. Fang, L. Hood and P. Kearney, “A Blood-Based Proteomic Classifier for the Molecular Characterization of Pulmonary Nodules,” Science Translational Medicine, vol. 5, no. 207, p. 207ra142, 2013.
  • 30. A. Vachani, H. I. Pass, W. N. Rom, D. E. Medthun, E. S. Edell, M. Laviolette, X.-J. Li, P.-Y. Fong, S. W. Hunsucker, C. Hayward, P. J. Mazzone, D. K. Madtes, Y. E. Miller, M. G. Walker, J. Shi, P. Kearney, K. C. Fang and P. P. Massion, “Validation of a Multiprotein Plasma Classifier to Identify Benign Lung Nodules,” Journal of Thoracic Oncology, vol. 10, no. 4, pp. 629-637, 2015.
