WO2003021229A2 - Diagnostic and prognostic tests - Google Patents

Diagnostic and prognostic tests

Publication number
WO2003021229A2
Authority
WO
WIPO (PCT)
Prior art keywords
expression
nucleic acid
genes
gene
cancer
Prior art date
Application number
PCT/US2002/028203
Other languages
French (fr)
Other versions
WO2003021229A3 (en)
Inventor
Gavin J. Gordon
Roderick V. Jensen
Steven R. Gullans
Raphael Bueno
Original Assignee
The Brigham And Women's Hospital, Inc.
Wesleyan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Brigham And Women's Hospital, Inc., Wesleyan University filed Critical The Brigham And Women's Hospital, Inc.
Priority to AU2002324881A priority Critical patent/AU2002324881A1/en
Priority claimed from US10/236,031 external-priority patent/US7622260B2/en
Publication of WO2003021229A2 publication Critical patent/WO2003021229A2/en
Publication of WO2003021229A3 publication Critical patent/WO2003021229A3/en

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6834 Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837 Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/52 Predicting or monitoring the response to treatment, e.g. for selection of therapy based on assay results in personalised medicine; Prognosis

Definitions

  • the invention relates to methods for diagnosing conditions, predicting prognoses and optimizing treatment strategies using ratios of gene expression data.
  • the invention also relates to nucleic acid markers for cancer, particularly for distinguishing malignant pleural mesothelioma from other lung cancers or from normal lung tissue, and for distinguishing between subclasses of malignant pleural mesothelioma.
  • Subclassification of cancer has typically relied on the grouping of tumors based on tissue of origin, histology, cytogenetics, immunohistochemistry, and known biological behavior. The pathologic diagnosis used to classify the tumor taken together with the stage of the cancer is then used to predict prognosis and direct therapy.
  • current methods of cancer classification and staging are not completely reliable.
  • MPM malignant pleural mesothelioma
  • RT-PCR real time quantitative reverse-transcriptase polymerase chain reaction
  • diagnostic assays for classification of tumors, prediction of tumor outcome, selecting and monitoring treatment regimens, and monitoring tumor progression/regression can now be based on the ratios of expression of a small number of genes.
  • the gene expression ratio concept can be applied to other tissues to diagnose or distinguish between tissues in different biological states, such as tissues from subjects having disease and not having disease, subjects that vary in response to a pharmaceutical or that metabolize a pharmaceutical at different rates, subjects that vary in disease susceptibility or predisposition, and the like.
  • a subject's prognosis or response to treatments can be determined through analysis of a limited set of genes in particular biological samples.
  • the gene expression data can be obtained from, and comparisons can be made between, a number of different methods including nucleic acid hybridization (e.g., microarrays) and nucleic acid amplification methods (e.g., RT-PCR).
  • nucleic acid hybridization e.g., microarrays
  • nucleic acid amplification methods e.g., RT-PCR
  • methods for diagnosing the presence of tissue in a first biological state, preferably cancer cells, in a tissue sample.
  • the methods include providing a set of two or more genes, wherein the set comprises at least one upregulated gene that is expressed in greater amounts in a tissue in a first biological state (preferably cancer cells) than in a second biological state (preferably corresponding non-cancer cells) and at least one downregulated gene that is expressed in lesser amounts in a tissue in the first biological state (preferably cancer cells) than in the second biological state (preferably corresponding non-cancer cells)
  • the methods also include determining the expression levels of the set of two or more genes, and calculating a ratio of the expression level of the upregulated gene to the expression level of the downregulated gene, wherein the ratio is indicative of the presence of tissue in the first biological state (preferably cancer cells) in the tissue sample.
  • Another preferred diagnostic use for the method is to identify non-cancer tissues or diseases.
  • two or more expression ratios are calculated.
  • the two or more expression ratios are combined, preferably by calculating the geometric mean of the two or more expression ratios.
  • the ratio is calculated by division of the expression level of one upregulated gene by the expression level of one downregulated gene, or by division of the expression levels of two or more upregulated genes by the expression level of one downregulated gene, or by division of the expression level of one upregulated gene by the expression levels of two or more downregulated genes, or by division of the expression levels of two or more upregulated genes by the expression levels of two or more downregulated genes.
  • the methods also include transforming the expression level data for the upregulated and/or downregulated genes prior to calculating the ratio.
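
As a minimal sketch of the calculation described in the preceding passages, the following Python computes the ratio of one upregulated gene to one downregulated gene and combines several such ratios by their geometric mean. The function names and the expression values are hypothetical illustrations and are not taken from the patent.

```python
import math


def expression_ratio(up_level, down_level):
    """Ratio of an upregulated gene's expression to a downregulated gene's."""
    if up_level <= 0 or down_level <= 0:
        raise ValueError("expression levels must be positive")
    return up_level / down_level


def geometric_mean(ratios):
    """Combine ratios by their geometric mean, computed in log space."""
    if not ratios:
        raise ValueError("at least one ratio is required")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical expression levels (arbitrary units) for a single sample.
ratios = [expression_ratio(800.0, 100.0), expression_ratio(400.0, 200.0)]
combined = geometric_mean(ratios)  # geometric mean of 8 and 2
```

Working in log space is one possible choice of data transformation; it keeps a single very large ratio from dominating the combined value.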
  • the expression levels are determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification.
  • the nucleic acid hybridization is performed using a solid-phase nucleic acid molecule array.
  • the nucleic acid amplification method is real-time PCR.
  • the expression levels are determined by an immunological method, preferably using a solid-phase antibody array, an ELISA or ELISPOT assay.
  • the cancer is selected from the group consisting of malignant pleural mesothelioma, lung adenocarcinoma, squamous carcinoma, medulloblastoma, prostate cancer, breast cancer, diffuse large B-cell lymphoma, follicular lymphoma and ovarian cancer.
  • the at least one ratio is indicative of the presence of cancer cells in the tissue sample. In other embodiments, the at least one ratio is indicative of the presence of non-cancer cells in the tissue sample.
  • kits for cancer diagnosis include a set of one or more ratios applicable to the analysis of gene expression data, wherein the ratio is calculated from the expression levels of at least one upregulated gene that is expressed in greater amounts in the cancer cells than in corresponding non-cancer cells and at least one downregulated gene that is expressed in lesser amounts in cancer cells than in corresponding non-cancer cells.
  • the kit also includes instructions for the use of the one or more ratios in the diagnosis of the presence of cancer cells in a biological sample.
  • diagnostic systems include a measurement device that measures gene expression level data of a set of two or more genes, wherein the set comprises at least one upregulated gene that is expressed in greater amounts in a tissue in a first biological state (preferably cancer cells) than in a second biological state (preferably corresponding non-cancer cells) and at least one downregulated gene that is expressed in lesser amounts in the tissue in the first biological state (preferably cancer cells) than in the second biological state (preferably corresponding non-cancer cells).
  • the system also includes a data transformation device that acquires the gene expression data from the measurement device and performs data transformation to calculate a ratio of the gene expression levels of the upregulated and downregulated genes.
  • the data transformation device selects gene expression data of a selected set of genes from the measurement device for calculating the ratio of the selected set of genes, wherein the ratio calculated from the gene expression data of the selected set of genes is diagnostic for a selected biological state, such as a condition, preferably cancer.
  • the cancer diagnostic system also includes a user interface output device to output the ratio to a user.
  • the cancer diagnostic system also includes a database of ratios of gene expression that are diagnostic for cancers, and a comparison device that compares the ratio calculated from the measured gene expression to the diagnostic ratios stored in the database and outputs the comparison to the user interface output device.
  • the cancer diagnostic system also includes a database of treatment information for specific cancers, wherein the comparison device identifies treatment information in the database for the specific cancer for which the diagnostic ratio matches the calculated ratio, and wherein the comparison outputs the treatment information to the user interface output device.
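
The data transformation and comparison steps of such a system might be sketched as follows. This is an illustration only: the `DiagnosticRatio` record, the gene names, the cutoff of 1.0 and the state labels are all assumptions made for the example, not values or structures disclosed in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticRatio:
    """One hypothetical database entry: a gene pair, a cutoff, and two calls."""
    up_gene: str
    down_gene: str
    cutoff: float
    call_above: str
    call_below: str


def calculate_ratio(levels, entry):
    """Data transformation step: select the entry's two genes and divide."""
    return levels[entry.up_gene] / levels[entry.down_gene]


def compare(levels, database):
    """Comparison step: evaluate every stored diagnostic ratio for one sample."""
    report = []
    for entry in database:
        ratio = calculate_ratio(levels, entry)
        call = entry.call_above if ratio > entry.cutoff else entry.call_below
        report.append((entry.up_gene, entry.down_gene, ratio, call))
    return report


# Placeholder database and measured levels; GENE_C is measured but not selected.
database = [DiagnosticRatio("GENE_A", "GENE_B", 1.0, "state 1", "state 2")]
measured = {"GENE_A": 500.0, "GENE_B": 50.0, "GENE_C": 75.0}
report = compare(measured, database)
```

The report returned by `compare` stands in for what would be passed to the user interface output device.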
  • methods for diagnosing malignant pleural mesothelioma in a subject suspected of having malignant pleural mesothelioma include obtaining from the subject a tissue sample suspected of being cancerous, determining the expression of a set of nucleic acid molecules or expression products thereof in the tissue sample, wherein the set of nucleic acid molecules includes at least two nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
  • the set of nucleic acids includes at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
  • the methods include determining the expression of the set of nucleic acid molecules or expression products thereof in a non-cancerous tissue sample, and comparing the expression of the set of nucleic acid molecules or expression products thereof in the tissue sample suspected of being cancerous and the non-cancerous tissue sample. In other embodiments, the methods include calculating a ratio of the expression of at least two genes among the set of nucleic acid molecules.
  • Methods for selecting a course of treatment of a subject having or suspected of having malignant pleural mesothelioma are provided in another aspect of the invention.
  • the methods include obtaining from the subject a tissue sample suspected of being cancerous, determining the expression of a set of nucleic acid markers or expression products thereof which are differentially expressed in malignant pleural mesothelioma tumor tissue samples, and selecting a course of treatment appropriate to the malignant pleural mesothelioma of the subject.
  • the methods also include calculating a ratio of the expression of at least two genes among the set of nucleic acid markers or expression products thereof.
  • the methods include determining the expression of the set of nucleic acid molecules or expression products thereof in a non-cancerous tissue sample.
  • the expression of a set of nucleic acid markers is determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification. More preferably, the nucleic acid hybridization is performed using a solid-phase nucleic acid molecule array, and the nucleic acid amplification method is real-time PCR.
  • methods for evaluating treatment of malignant pleural mesothelioma include obtaining a first determination of the expression of a set of nucleic acid molecules, or expression products thereof, which are differentially expressed in a malignant pleural mesothelioma tumor tissue sample from a subject undergoing treatment for cancer, obtaining a second determination of the expression of the set of nucleic acid molecules, or expression products thereof, in a second malignant pleural mesothelioma tumor tissue sample from the subject after obtaining the first determination, and comparing the first determination of expression to the second determination of expression as an indication of evaluation of the treatment.
  • the determinations of expressions are used to calculate a ratio of gene expression.
  • the methods include determining the expression of a set of nucleic acid markers which are differentially expressed in non-cancerous tissue samples.
  • the expression of a set of nucleic acid markers is determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification.
  • the nucleic acid hybridization is performed using a solid-phase nucleic acid molecule array and the nucleic acid amplification method is real-time PCR.
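
The comparison of a first and a second determination, each expressed as a ratio, could look like the sketch below. The gene names and expression levels are invented, and how a given change should be interpreted clinically is deliberately left open because the text above does not specify it.

```python
def ratio_change(first, second, up_gene, down_gene):
    """Return the ratio at each determination and the fold change between them."""
    first_ratio = first[up_gene] / first[down_gene]
    second_ratio = second[up_gene] / second[down_gene]
    return first_ratio, second_ratio, second_ratio / first_ratio


# Hypothetical expression levels from two samples of the same subject.
first_determination = {"GENE_A": 600.0, "GENE_B": 100.0}
second_determination = {"GENE_A": 300.0, "GENE_B": 150.0}

first_ratio, second_ratio, fold_change = ratio_change(
    first_determination, second_determination, "GENE_A", "GENE_B"
)
```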
  • a solid-phase nucleic acid molecule array which consists essentially of at least two nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 fixed to a solid substrate.
  • the solid-phase nucleic acid molecule array also includes at least one control nucleic acid molecule.
  • the set of nucleic acid molecules comprises at least 3, 4, 5,
  • nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
  • the solid substrate comprises a material selected from the group consisting of glass, silica, aluminosilicates, borosilicates, metal oxides such as alumina and nickel oxide, various clays, nitrocellulose, or nylon.
  • the nucleic acid molecules are fixed to the solid substrate by covalent bonding.
  • solid-phase protein microarrays include at least two antibodies or antigen-binding fragments thereof, that specifically bind at least two different polypeptides selected from the group consisting of SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, fixed to a solid substrate.
  • the microarray further comprises an antibody or antigen-binding fragment thereof, that binds specifically to a cancer-associated polypeptide other than those selected from the group consisting of SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78.
  • the protein microarray also includes at least one control polypeptide molecule.
  • the antibodies are monoclonal antibodies, or polyclonal antibodies.
  • Methods for identifying lead compounds for a pharmacological agent useful in the treatment of malignant pleural mesothelioma are provided in another aspect of the invention.
  • the methods include contacting a malignant pleural mesothelioma cell or tissue with a candidate pharmacological agent, determining the expression of a set of nucleic acid molecules in the malignant pleural mesothelioma cell or tissue sample under conditions which, in the absence of the candidate pharmacological agent, permit a first amount of expression of the set of nucleic acid molecules wherein the set of nucleic acid molecules comprises at least two nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, and detecting a test amount of the expression of the set of nucleic acid molecules, wherein a decrease in the test amount of expression in the presence of the candidate pharmacological agent relative to the first amount of expression indicates that the candidate pharmac
  • Fig. 1 shows tumor diagnosis using expression ratios.
  • Fig. 1A: patterns of relative expression levels for the 8 genes selected from the training set can be extended to the remaining samples. Relative expression levels increase from low to high per legend.
  • Fig. 1B: graphic depiction of the magnitude and direction, in all 149 samples comprising the test set, of the value for two independent ratios (calretinin/claudin-7 and VAC-β/TACSTD1) chosen for further study.
  • Fig. 1C: the 8 individual samples (represented by colored bars) that were misdiagnosed using one ratio or the other from Fig. 1B (blue bars for misdiagnosed MPM samples, red bars for misdiagnosed ADCA samples).
  • Fig. 2 depicts validation of microarray data and ratio-based diagnosis. Quantitative RT-PCR was used to obtain ratio values for 12 MPM and 12 ADCA tumors. In this case, the two ratios correctly identified 23/24 samples with one no-call.
  • Fig. 4 shows the validation of microarray-based analysis of gene expression using real time quantitative RT-PCR.
  • Fig. 4A shows that the average expression levels of CFB, transgelin, and fibronectin are significantly (P < 0.05) different in tumor samples from Subclass 1 and Subclass 2.
  • Fig. 4B shows that the expression level ratios remain consistent in distinguishing epithelial tumor samples from all others using data obtained from either microarray analysis or RT-PCR. Ratio represents the average gene expression level in epithelial subtype tumors relative to the average expression level of all other tumors combined. Error bars, SEM; M, data from microarray analysis; RT-PCR, data from quantitative RT-PCR analysis.
  • FIG. 5 depicts prediction of outcome in MPM using expression ratios or tumor histology.
  • Fig. 5A: survival of 31 MPM patients whose outcome was predicted using a 4-gene expression ratio model.
  • Fig. 5B: survival of the 31 MPM patient samples from Fig. 5A plus 5 additional samples (36 total) as a function of tumor histological subtype. Prediction of outcome using the geometric mean value of 3 expression ratios is more accurate than the use of histological appearance alone at identifying patients with widely divergent outcome (Fig. 5A). Although patients with epithelial histology tumors tend to survive longer, predicting prognosis in this manner is highly inexact for any individual patient (Fig. 5B). Each data point represents a single sample. Circles enclose tumor samples from patients with survival at or near the median for MPM. Horizontal bars depict median survival for each group. *, geometric mean calculated from the 3 most accurate expression ratios used to predict outcome (using data from a total of 4 genes).
  • Fig. 6 shows Kaplan-Meier survival predictions for mesothelioma patients and verification of microarray data.
  • Fig. 6A: overall survival for all 31 patients from which the training set was chosen. The estimated median survival for the entire cohort was 11 months.
  • Fig. 6C geometric mean values obtained for
  • Fig. 7 depicts independent validation of the 4-gene expression ratio model.
  • Fig. 7A: overall survival for 29 independent mesothelioma patients. Similar to the initial 31 samples, the estimated median survival for this cohort was 12 months.
  • Fig. 9 shows Kaplan-Meier survival predictions of test set samples for adenocarcinoma patients as described in Example 5. Time to relapse for patients predicted to be good prognosis (top line) and poor prognosis (bottom line) using a 3-ratio model in the test set data of Bhattacharjee et al. Hash marks indicate censored data.
  • microarrays have been used to distinguish between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) (Golub, 1999), to explore molecular differences within the AML group of diseases (Virtaneva, 2001), to identify subclasses of breast cancer (Perou, 2000) and ovarian carcinoma (Welsh, 2001), and to define the metastatic phenotype of melanoma (Clark, 2000).
  • AML acute myeloid leukemia
  • ALL acute lymphoblastic leukemia
  • Although microarray-based analysis of gene expression in cancer has yielded a wealth of information, effective clinical applications have not followed for several reasons.
  • There are no universally accepted and applicable computational methods to analyze microarray data (Quackenbush, 2001).
  • studies utilizing microarrays have lacked a comprehensive clinical database linking patient characteristics to their tumors' gene expression patterns.
  • the prospect of having to use large numbers of genes to diagnose a disease subclass would require a relatively expensive analytical approach such as microarrays.
  • sophisticated computer algorithms currently used for analysis of microarrays have placed the practical use of the resulting data beyond the reach of many biomedical scientists.
  • Subclassification using unsupervised clustering also presents a more biologically relevant organization. It has been shown that similar tumor appearance in itself does not necessitate similar patterns of gene expression nor final clinical outcome. For example, it is not unusual for patients with lung cancers of identical histology, differentiation, location, and stage to have diverging survival (Mountain, 1997). In the experiments described herein, one subclass contained tumors of all three major histological subtypes, suggesting (i) that tumors of diverse appearance are more similar than originally thought, (ii) that all subtypes of MPM are correctly classified as a single disease, and (iii) that histology alone is not sufficient to determine prognosis.
  • ratios of gene expression are used as indicia of cancer type, cancer class, and/or cancer prognosis, all of which are useful for determining a course of treatment of a patient.
  • Changes in cell phenotype in cancer are often the result of one or more changes in the genome expression of the cell. Some genes are expressed in tumor cells, and not in normal cells. Other genes are expressed at higher or lower levels in cancer cells than in normal counterparts. In addition, certain genes are expressed at different levels in different subgroups of cancers, which have different prognoses and require different treatment regimens to optimize patient outcome. The differential expression of such genes can be examined by the assessment of nucleic acid or protein expression in the cancer tissue.
  • microarrays to measure simultaneously the expression of hundreds or thousands of genes.
  • Practical application of this technology requires that researchers or laboratories have a sophisticated knowledge of molecular biology to generate gene expression data, and of computer algorithms for analysis of the large quantities of data generated by the use of the microarrays.
  • the requirements for such knowledge make the use of microarrays impractical in the clinical setting, and difficult even for research laboratories.
  • the use of gene expression ratios in the diagnosis and prediction of prognosis in cancer overcomes several major obstacles to the clinical use of microarray data.
  • the methodology described herein avoids the technical difficulties described above. It generates a simple numerical measure that can be used to predict various aspects of patient clinical data (such as histological subtype and survival) using a single patient biopsy sample. Since this non-linear function of gene expression is a unit-less number, expression levels can be measured using any reliable method such as quantitative RT-PCR or microarrays (nucleic acid or protein) regardless of the type of data capture equipment.
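
One way to see why a unit-less ratio carries across measurement methods: if an instrument reports every expression level multiplied by some platform-specific factor, that factor cancels in the ratio. The sketch below illustrates this under the simplifying assumption of a purely multiplicative scale difference; real platforms can differ in other ways, and the gene names and numbers are invented.

```python
import math


def ratio(levels, up_gene, down_gene):
    """Unit-less ratio of two expression levels from the same measurement."""
    return levels[up_gene] / levels[down_gene]


def rescale(levels, factor):
    """Simulate a platform that reports all levels in different units."""
    return {gene: value * factor for gene, value in levels.items()}


# Hypothetical underlying levels and two simulated platforms.
underlying = {"GENE_A": 120.0, "GENE_B": 30.0}
platform_1 = rescale(underlying, 1.0)
platform_2 = rescale(underlying, 0.004)

r1 = ratio(platform_1, "GENE_A", "GENE_B")
r2 = ratio(platform_2, "GENE_A", "GENE_B")
same_ratio = math.isclose(r1, r2)  # the scale factor cancels
```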
  • the present invention permits the diagnosis of cancer by clinical laboratories using standard equipment without the requirement for sophisticated data analysis.
  • the diagnostic/prognostic accuracy of ratios permits an earlier definitive diagnosis using initial biopsy samples and reveals important clues about anticipated patient outcome prior to the assignment of treatment strategies.
  • an initial diagnosis is usually made for patients presenting with a malignant pleural effusion. Typically, this diagnosis is confirmed prior to subjecting patients to major surgical resections.
  • standard pathological techniques for diagnosis even at this point may be inadequate due to a lack of suitable quantities of tissue.
  • the histological subtype of the tumor initially diagnosed may not always be the same as that conclusively determined at the time of surgery (samples analyzed in this study were obtained at surgery when sufficient amount of tissue was available for a definitive pathological diagnosis).
  • Ratios obtained using tumor tissues taken at the time of initial biopsy can provide a firm diagnosis, determine subclass, and predict outcome after therapy when current pathological techniques are insufficient.
  • the invention also provides a new, more powerful method of stratifying patients with MPM (and more generally, is applicable to other cancers and other biological states and conditions). It has been previously documented that patients with the epithelial subtype generally enjoy a better prognosis than patients with non-epithelial histology (regardless of treatment strategy) and benefit from aggressive surgical resection. However, this is not an all-inclusive phenomenon; some patients with non-epithelial histology enjoy a longer survival than those with epithelial histology. These factors make it difficult to design clinical studies to explore alternative treatment strategies based on histological subtype.
  • results presented herein provide a basis for at least one rational explanation of the aforementioned phenomenon: within MPM, there are actually two classes of epithelial tumors and two classes of mixed tumors.
  • a series of simple tests utilizing ratios of gene expression is proposed that can determine with a high degree of accuracy the correct tumor histological subtype/subclass, and the likely clinical outcome of the patient.
  • This information can be produced from a small tissue biopsy and does not require major surgery. Such classification is useful in the development of meaningful clinical trials in MPM.
  • ratios including different mathematical combinations and/or more than two genes.
  • the ratio concept described herein is simply the relationship between the expression levels of multiple genes that vary in expression between two different sample types, i.e., samples that have different biological properties or were obtained from subjects having different phenotypes, such as cancer/non-cancer phenotypes, responsive/non-responsive to stimuli, susceptible/not susceptible to disease, different metabolic functions, etc.
  • Non-linear unit-less ratios, in any form, can still remain simple if a relatively small number of genes are used in such a way as to not require complex computational software. Therefore, expression ratios of selected genes that vary in expression in two different biological samples may be used to translate complex data sets into simple tests that give clinically useful information for the diagnosis and prediction of prognosis of cancers.
  • Ratios of gene expression levels can be calculated from expression data of two or more genes at the mRNA level and/or protein level. Expression levels of two or more isoforms or variants of the same gene (e.g., splice variants or post-translationally modified variants) also can be used in the ratios. In contrast to prior methods for comparing gene expression, which compared the expression levels of genes relative to a gene having substantially unchanging expression (e.g., a housekeeping gene), the present method compares the expression of two or more genes that differ in expression between two (or more) biological states.
  • ratios are calculated from expression data of two or more genes, wherein one of the two or more genes is expressed at higher levels in a first biological state relative to the second biological state (upregulated in the first biological state), and a second of the two or more genes is expressed at lower levels in the first biological state relative to the second biological state (downregulated in the first biological state). Examples of this are demonstrated herein, wherein the expression levels of two or more genes that differ in expression in mesothelioma and normal tissue, or in subclasses of mesothelioma, are used to calculate ratios that effectively predict the phenotype of unknown tissue samples.
  • The ratios can be simple ratios (e.g., x/y) or more complex ratios that include mathematical manipulation of gene expression levels, for example, (x+a)/(y+b) or x y 3 , wherein x and y represent the expression level data for genes X and Y, and a and b can be either expression level data for genes A and B, or mathematical factors.
  • The use of the ratios is not limited to one set of two genes. Additional sets of genes (two sets, three sets, or more sets) may be required to provide an optimally accurate diagnosis of certain biological states or conditions (e.g., cancers) based on the expression of certain sets of genes.
  • The methods are not limited to a ratio of two genes; a total of 4, 6, or more genes and various ratios of them may be used. Further transformation of the data in the form of multiple gene expression ratios also can be performed. In certain preferred embodiments, the geometric mean of multiple gene ratios is calculated.
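The ratio calculations described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used by the inventors: the gene names and expression values are hypothetical placeholders, and the functions cover only the simple ratio x/y, the offset form (x+a)/(y+b), and the geometric mean of several ratios.

```python
import math

def simple_ratio(x, y):
    """Unit-less ratio x / y of two expression levels."""
    if y == 0:
        raise ValueError("denominator expression level must be non-zero")
    return x / y

def offset_ratio(x, y, a=0.0, b=0.0):
    """Ratio of the form (x + a) / (y + b); a and b may be other
    expression levels or plain mathematical factors."""
    return simple_ratio(x + a, y + b)

def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed in log space."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical expression levels for one sample (illustrative values only).
expression = {
    "GENE_UP_1": 400.0,
    "GENE_UP_2": 450.0,
    "GENE_DOWN_1": 100.0,
    "GENE_DOWN_2": 50.0,
}

# Two ratios, each pairing a gene that is higher in one state with a gene
# that is higher in the other state.
ratios = [
    simple_ratio(expression["GENE_UP_1"], expression["GENE_DOWN_1"]),
    simple_ratio(expression["GENE_UP_2"], expression["GENE_DOWN_2"]),
]
combined = geometric_mean(ratios)
```

Working in log space keeps the geometric mean numerically stable when many ratios are combined; any decision threshold applied to the combined value would have to come from validated training data and is not assumed here.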
  • The expression data used to calculate the ratios may be obtained using any art-known method for analyzing gene expression including microarrays (e.g., standard or custom arrays; nucleic acid, protein or antibody arrays), quantitative RT-PCR, antibody or other immunoassay measurements, etc.
  • the ratios can be used to diagnose any condition having a genetic component in which two or more genes are differentially expressed in two or more biological states.
  • Conditions include diseases, susceptibility to diseases, metabolic functions (e.g., variability in the metabolism of drugs), response to injury, responses to local cellular environments and the like.
  • the condition is a disease.
  • any diseases that are characterized by (1) the relative increase in the expression of a first gene in a first disease state, and (2) the relative increase in the expression of a second gene in a second disease state or nondisease state can be diagnosed using ratios of gene expression.
  • Preferred examples of such diseases are cancers, as demonstrated herein for malignant pleural mesothelioma.
  • the ratios of gene expression also can be used to predict a condition outcome or condition prognosis, to monitor onset of a condition, to monitor treatment, and to select a course of treatment for a condition.
  • the gene expression data for calculation of the ratios may be obtained from analysis of biological samples including tissue, blood, urine, cerebrospinal fluid or other bodily fluids of a subject (e.g., humans or other animals).
  • the expression data can be used without any transformation to calculate a simple ratio of two or more genes as exemplified in the Examples, or data transformation can be applied prior to, or as a part of, calculating the ratios.
  • The ratio calculation and/or data transformation can be performed by the device that captures the expression data (e.g., a device for performing real-time PCR or a microarray reader), or can be performed by a separate computer running appropriate software.
  • Software for calculating ratios as described herein can be provided on a computer connected by data link to a data generating device, such as a microarray reader or PCR machine.
  • Any standard data link can be used, including serial or parallel cables, radio frequency or infrared telemetry links, LAN connections, WAN connections, etc.
  • Data can be transferred by computer-readable medium (e.g., magnetic or optical medium) and read by the software.
  • The data also can be entered directly by the user via a user interface, such as a keyboard, monitor, mouse, or graphical user interface such as a touch screen.
  • the computer may be contained within the data generating device, providing an integrated system for generating raw data, calculating ratios, and displaying such ratios.
  • One or more computers also may be linked to one or more data generating devices and one or more display devices, such as in a local area network or wide area network.
  • the data for the variables examined can be used to calculate gene expression ratios in accordance with the methods of the invention.
  • The software can allow the user to select a number of genes preferred for diagnosis or prognosis, or the software may calculate ratios for a standardized set or sets of genes (e.g., genes known to be useful for classification of a tissue type or set of tissue types).
  • the software can execute data transformation algorithms from a preselected group, or can allow the user to input other algorithms.
  • the ratio data can be stored in a data file, printed, and/or directly displayed to the user on a graphical user interface.
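The software behavior described above (choosing genes, applying a transformation from a preselected group, and storing the ratio data in a file) might look like the following sketch. The sample identifiers, gene names, transformation names, and CSV layout are all assumptions made for illustration; the source does not specify a file format or any particular transformation.

```python
import csv
import io
import math

# Preselected group of data transformations (hypothetical names).
TRANSFORMS = {
    "none": lambda value: value,
    "log2": math.log2,
}

def ratio_table(samples, numerator, denominator, transform="none"):
    """Compute one transformed ratio per sample for a user-selected gene pair."""
    fn = TRANSFORMS[transform]
    rows = []
    for sample_id, levels in samples.items():
        ratio = levels[numerator] / levels[denominator]
        rows.append((sample_id, fn(ratio)))
    return rows

def write_ratio_csv(rows, handle):
    """Store ratio data in a data file (any text handle works)."""
    writer = csv.writer(handle)
    writer.writerow(["sample_id", "ratio"])
    writer.writerows(rows)

# Hypothetical expression data keyed by sample, then by gene.
samples = {
    "S1": {"GENE_X": 640.0, "GENE_Y": 80.0},
    "S2": {"GENE_X": 50.0, "GENE_Y": 200.0},
}
rows = ratio_table(samples, "GENE_X", "GENE_Y", transform="log2")

buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_ratio_csv(rows, buffer)
```

A log transformation makes ratios above and below 1 symmetric around zero, which can simplify display; whether to transform at all is left to the user, as the text indicates.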
  • A visual display is used to display the ratio data for the classification, diagnosis and/or prediction of prognosis.
  • the visual display can be a graphical user interface, such as a monitor, or a printer.
  • the invention also relates to the identification of a set of genes that permit confirmation of the presence of malignant pleural mesothelioma cells in biological samples.
  • Probes for the expression of the genes can be incorporated into a custom array for diagnosis of malignant pleural mesothelioma.
  • The genes identified permit, inter alia, rapid screening of cancer samples by nucleic acid microarray hybridization or protein expression technology to determine the expression of the specific genes and thereby to predict the outcome of the cancer.
  • A microarray also can be used to diagnose malignant pleural mesothelioma and to distinguish it from lung cancer (adenocarcinoma and squamous carcinoma), normal lung tissue, and/or pleura.
  • Screening is beneficial, for example, in selecting the course of treatment to provide to the cancer patient (i.e., directing therapy), and to monitor the efficacy of a treatment.
  • the invention differs from traditional cancer diagnostic and classification techniques with respect to the speed, simplicity, and reproducibility of the cancer diagnostic assay.
  • The invention also differs from other microarray-based diagnostic methods in that it does not require extensive data analysis or data transformation employing complex algorithms.
  • the invention differs from other cancer diagnostic methods in that it permits accurate diagnosis and classification of tumors by the analysis of a limited set of genes.
  • the use of a limited set of genes in the methods permits the use of simpler methods for acquisition of data, e.g., nucleic acid hybridization based methods such as RT-PCR, that do not generate massive quantities of data from parallel analysis of a large number of genes.
  • the invention also presents targets for drug development because it identifies genes that are differentially expressed in tumors, which can be utilized in the development of drugs to treat such tumors, e.g., by reducing expression of the genes or reducing activity of proteins encoded by the genes.
  • the invention simplifies prognosis determination by providing an identified set of a small number of genes whose level of expression in malignant pleural mesothelioma predicts clinical outcome as defined by, e.g., patient survival times.
  • RNA expression phenotyping was performed using high density microarrays generated from quantitative expression data on over 12,000 genes, which have been analyzed to identify specific probe sets (genes).
  • the expression gene set has multifold uses including, but not limited to, the following examples.
  • the expression gene set may be used as a prognostic tool for malignant pleural mesothelioma patients, to make possible more finely tuned diagnosis of malignant pleural mesothelioma and allow healthcare professionals to tailor treatment to individual patients' needs.
  • the invention can also assess the efficacy of cancer treatment by determining progression or regression of malignant pleural mesothelioma cancer in patients before, during, and after treatment.
  • Another utility of the expression gene set is in the biotechnology and pharmaceutical industries' research on disease pathway discovery for therapeutic targeting.
  • the invention can identify alterations in gene expression in malignant pleural mesothelioma and can also be used to uncover and test candidate pharmaceutical agents to treat malignant pleural mesothelioma.
  • A subject is a human, non-human primate, cow, horse, pig, sheep, goat, dog, cat, or rodent. In all embodiments human subjects are preferred.
  • the subject is a human either suspected of having malignant pleural mesothelioma, or having been diagnosed with malignant pleural mesothelioma.
  • the subject preferably is a human suspected of having cancer, or a human having been previously diagnosed as having cancer.
  • Methods for identifying subjects suspected of having cancer may include physical examination, subject's family medical history, subject's medical history, biopsy, or a number of imaging technologies such as ultrasonography, computed tomography, magnetic resonance imaging, magnetic resonance spectroscopy, or positron emission tomography. Diagnostic methods for cancer and the clinical delineation of cancer diagnoses are well known to those of skill in the medical arts.
  • tissue sample is tissue obtained from a tissue biopsy using methods well known to those of ordinary skill in the related medical arts.
  • the phrase "suspected of being cancerous" as used herein means a cancer tissue sample believed by one of ordinary skill in the medical arts to contain cancerous cells. Methods for obtaining the sample from the biopsy include gross apportioning of a mass, microdissection, laser-based microdissection, or other art-known cell-separation methods.
  • the sample size required for analysis may range from 1, 10, 50, 100, 200, 300, 500, 1000, 5000, 10,000, to 50,000 or more cells.
  • the appropriate sample size may be determined based on the cellular composition and condition of the biopsy and the standard preparative steps for this determination and subsequent isolation of the nucleic acid for use in the invention are well known to one of ordinary skill in the art.
  • RNA conversion and/or amplification methods or other methods to enhance resolution of the nucleic acid molecules.
  • Such methods, which allow use of limited biopsy materials, are well known to those of ordinary skill in the art and include, but are not limited to: direct RNA amplification, reverse transcription of RNA to cDNA, amplification of cDNA, or the generation of radio-labeled nucleic acids.
  • the phrase determining the expression of a set of nucleic acid molecules in the tissue means identifying RNA transcripts in the tissue sample by analysis of nucleic acid or protein expression in the tissue sample.
  • set refers to a group of nucleic acid molecules that include 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 different nucleic acid sequences from the group of 26 nucleic acid sequences in Table 1 (SEQ ID NOs: 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57 and 59) and/or from the group of 11 nucleic acid sequences in Table 3 (SEQ ID Nos: 43, 45, 61, 63, 65, 67, 69, 71, 73, 75 and 77).
  • non-cancerous tissue means tissue determined by one of ordinary skill in the medical art to have no evidence of malignant pleural mesothelioma based on standard diagnostic methods including, but not limited to, histologic staining and microscopic analysis.
  • Nucleic acid markers for cancer are nucleic acid molecules that by their presence or absence indicate the presence or absence of malignant pleural mesothelioma. In tissue, certain nucleic acid molecules are expressed at different levels depending on whether tissue is non-cancerous or cancerous.
  • Hybridization methods for nucleic acids are well known to those of ordinary skill in the art (see, e.g. Molecular Cloning: A Laboratory Manual, J. Sambrook, et al., eds., Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989, or Current Protocols in Molecular Biology, F.M. Ausubel, et al., eds., John Wiley & Sons, Inc., New York).
  • the nucleic acid molecules from a malignant pleural mesothelioma tissue sample hybridize under stringent conditions to nucleic acid markers expressed in malignant pleural mesothelioma.
  • the markers are sets of two or more of the nucleic acid molecules as set forth in Table 1 (SEQ ID NOs: 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57 and 59) or Table 3 (SEQ ID Nos: 43, 45, 61, 63, 65, 67, 69, 71, 73, 75, 77).
  • The malignant pleural mesothelioma nucleic acid markers disclosed herein are known genes and fragments thereof. It may be desirable to identify variants of those genes, such as allelic variants or single nucleotide polymorphisms (SNPs) in tissues. Accordingly, methods for identifying malignant pleural mesothelioma nucleic acid markers, including variants of the disclosed full-length cDNAs, genomic DNAs, and SNPs are also included in the invention.
  • the methods include contacting a nucleic acid sample (such as a cDNA library, genomic library, genomic DNA isolate, etc.) with a nucleic acid probe or primer derived from one of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
  • the nucleic acid sample and the probe or primer hybridize to complementary nucleotide sequences of nucleic acids in the sample, if any are present, allowing detection of nucleic acids related to SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
  • the probe or primer is detectably labeled.
  • the specific conditions, reagents, and the like can be selected by one of ordinary skill in the art to selectively identify nucleic acids related to sets of two or more of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
  • the isolated nucleic acid molecule can be sequenced according to standard procedures.
  • the invention also includes degenerate nucleic acids that include alternative codons to those present in the native materials.
  • Serine residues are encoded by the codons TCA, AGT, TCC, TCG, TCT, and AGC. Each of the six codons is equivalent for the purposes of encoding a serine residue.
  • nucleotide sequence triplets that encode other amino acid residues include, but are not limited to: CCA, CCC, CCG, and CCT (proline codons); CGA, CGC, CGG, CGT, AGA, and AGG (arginine codons); ACA, ACC, ACG, and ACT (threonine codons); AAC and AAT (asparagine codons); and ATA, ATC, and ATT (isoleucine codons).
  • Other amino acid residues may be encoded similarly by multiple nucleotide sequences.
  • the invention embraces degenerate nucleic acids that differ from the biologically isolated nucleic acids in codon sequence due to the degeneracy of the genetic code.
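The degeneracy described above can be illustrated with a short Python sketch that checks whether two coding sequences differ in codon sequence yet encode the same residues. The lookup table is deliberately partial: it contains only the codons listed in the text (serine, proline, arginine, threonine, asparagine, and isoleucine), so sequences using other codons raise an error. The function names are hypothetical.

```python
# Synonymous codons exactly as listed in the text (partial genetic code).
SYNONYMOUS_CODONS = {
    "Ser": ["TCA", "AGT", "TCC", "TCG", "TCT", "AGC"],
    "Pro": ["CCA", "CCC", "CCG", "CCT"],
    "Arg": ["CGA", "CGC", "CGG", "CGT", "AGA", "AGG"],
    "Thr": ["ACA", "ACC", "ACG", "ACT"],
    "Asn": ["AAC", "AAT"],
    "Ile": ["ATA", "ATC", "ATT"],
}

# Invert the table: codon -> residue.
CODON_TO_RESIDUE = {
    codon: residue
    for residue, codons in SYNONYMOUS_CODONS.items()
    for codon in codons
}

def translate(sequence):
    """Translate a DNA coding sequence using the partial table above."""
    sequence = sequence.upper()
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    residues = []
    for i in range(0, len(sequence), 3):
        codon = sequence[i:i + 3]
        if codon not in CODON_TO_RESIDUE:
            raise KeyError(f"codon {codon} is not in the partial table")
        residues.append(CODON_TO_RESIDUE[codon])
    return residues

def is_degenerate_variant(seq_a, seq_b):
    """True if the sequences differ but encode the same residues."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)
```

For example, `is_degenerate_variant("TCAAGACCT", "AGCCGGCCC")` compares two sequences that both read Ser-Arg-Pro through different codons.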
  • the invention also provides modified nucleic acid molecules, which include additions, substitutions, and deletions of one or more nucleotides such as the allelic variants and SNPs described above.
  • these modified nucleic acid molecules and/or the polypeptides they encode retain at least one activity or function of the unmodified nucleic acid molecule and/or the polypeptides, such as hybridization, antibody binding, etc.
  • the modified nucleic acid molecules encode modified polypeptides, preferably polypeptides having conservative amino acid substitutions.
  • a "conservative amino acid substitution” refers to an amino acid substitution which does not alter the relative charge or size characteristics of the protein in which the amino acid substitution is made.
  • Conservative substitutions of amino acids include substitutions made amongst amino acids within the following groups: (a) M, I, L, V; (b) F, Y, W; (c) K, R, H; (d) A, G; (e) S, T; (f) Q, N; and (g) E, D.
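The groups (a) through (g) listed above translate directly into a membership test. The following Python sketch, with a hypothetical function name, treats a substitution as conservative when both one-letter residue codes fall within the same group; it is an illustration of the grouping only, not a predictor of protein function.

```python
# Groups (a) through (g) from the text, as one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    set("MILV"),  # (a)
    set("FYW"),   # (b)
    set("KRH"),   # (c)
    set("AG"),    # (d)
    set("ST"),    # (e)
    set("QN"),    # (f)
    set("ED"),    # (g)
]

def is_conservative(original, replacement):
    """True if replacing `original` with `replacement` stays within one group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )
```

Under this table, lysine to arginine (`is_conservative("K", "R")`) is conservative, while leucine to aspartate (`is_conservative("L", "D")`) is not.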
  • The modified nucleic acid molecules are structurally related to the unmodified nucleic acid molecules and in preferred embodiments are sufficiently structurally related to the unmodified nucleic acid molecules so that the modified and unmodified nucleic acid molecules hybridize under stringent conditions known to one of skill in the art.
  • modified nucleic acid molecules that encode polypeptides having single amino acid changes can be prepared for use in the methods and products disclosed herein.
  • Each of these nucleic acid molecules can have one, two, or three nucleotide substitutions exclusive of nucleotide changes corresponding to the degeneracy of the genetic code as described herein.
  • modified nucleic acid molecules that encode polypeptides having two amino acid changes can be prepared, which have, e.g., 2-6 nucleotide changes.
  • Numerous modified nucleic acid molecules like these will be readily envisioned by one of skill in the art, including for example, substitutions of nucleotides in codons encoding amino acids 2 and 3, 2 and 4, 2 and 5, 2 and 6, and so on.
  • each combination of two amino acids is included in the set of modified nucleic acid molecules, as well as all nucleotide substitutions that code for the amino acid substitutions.
  • Additional nucleic acid molecules that encode polypeptides having additional substitutions (i.e., 3 or more), additions or deletions [e.g., by introduction of a stop codon or a splice site(s)] also can be prepared and are embraced by the invention as readily envisioned by one of ordinary skill in the art. Any of the foregoing nucleic acids can be tested by routine experimentation for retention of structural relation to or activity similar to the nucleic acids disclosed herein.
  • Microarray technology, which is also known by other names including DNA chip technology, gene chip technology, and solid-phase nucleic acid array technology, is well known to those of ordinary skill in the art and is based on, but not limited to, obtaining an array of identified nucleic acid probes on a fixed substrate, labeling target molecules with reporter molecules (e.g., radioactive, chemiluminescent, or fluorescent tags such as fluorescein, Cy3-dUTP, or Cy5-dUTP), hybridizing target nucleic acids to the probes, and evaluating target-probe hybridization.
  • Microarray substrates may include but are not limited to glass, silica, aluminosilicates, borosilicates, metal oxides such as alumina and nickel oxide, various clays, nitrocellulose, or nylon. In all embodiments a glass substrate is preferred.
  • probes are selected from the group of nucleic acids including, but not limited to: DNA, genomic DNA, cDNA, and oligonucleotides; and may be natural or synthetic.
  • Oligonucleotide probes preferably are 20 to 25-mer oligonucleotides and DNA/cDNA probes preferably are 500 to 5000 bases in length, although other lengths may be used. Appropriate probe length may be determined by one of ordinary skill in the art by following art-known procedures.
  • Preferred probes are sets of two or more of the nucleic acid molecules set forth as SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 (see also Table 1 and Table 3). Probes may be purified to remove contaminants using standard methods known to those of ordinary skill in the art such as gel filtration or precipitation.
  • The microarray substrate may be coated with a compound to enhance synthesis of the probe on the substrate.
  • Compounds that enhance synthesis of the probe on the substrate include, but are not limited to, oligoethylene glycols.
  • coupling agents or groups on the substrate can be used to covalently link the first nucleotide or oligonucleotide to the substrate. These agents or groups may include, but are not limited to: amino, hydroxy, bromo, and carboxy groups. These reactive groups are preferably attached to the substrate through a hydrocarbyl radical such as an alkylene or phenylene divalent radical, one valence position occupied by the chain bonding and the remaining attached to the reactive groups.
  • hydrocarbyl groups may contain up to about ten carbon atoms, preferably up to about six carbon atoms.
  • Alkylene radicals containing two to four carbon atoms in the principal chain are usually preferred.
  • probes are synthesized directly on the substrate in a predetermined grid pattern using methods such as light-directed chemical synthesis, photochemical deprotection, or delivery of nucleotide precursors to the substrate and subsequent probe production.
  • the substrate may be coated with a compound to enhance binding of the probe to the substrate.
  • Compounds that enhance binding of the probe to the substrate include, but are not limited to: polylysine, amino silanes, amino-reactive silanes (Chipping Forecast, 1999) or chromium (Gwynne and Page, 2000).
  • presynthesized probes are applied to the substrate in a precise, predetermined volume and grid pattern, utilizing a computer-controlled robot to apply probe to the substrate in a contact-printing manner or in a non-contact manner such as ink jet or piezo-electric delivery.
  • Probes may be covalently linked to the substrate with methods that include, but are not limited to, UV-irradiation.
  • probes are linked to the substrate with heat.
  • Targets are nucleic acids selected from the group including, but not limited to: DNA, genomic DNA, cDNA, RNA, and mRNA; and may be natural or synthetic.
  • Nucleic acid molecules from human tissue are preferred.
  • the tissue may be obtained from a subject or may be grown in culture (e.g., from a malignant pleural mesothelioma cell line).
  • one or more control nucleic acid molecules are attached to the substrate.
  • control nucleic acid molecules allow determination of factors including but not limited to: nucleic acid quality and binding characteristics; reagent quality and effectiveness; hybridization success; and analysis thresholds and success.
  • Control nucleic acids may include but are not limited to expression products of genes such as housekeeping genes or fragments thereof.
  • Expression of nucleic acid markers is used to select clinical treatment paradigms for cancers, such as malignant pleural mesothelioma.
  • Treatment options may include but are not limited to: radiotherapy, chemotherapy, adjuvant therapy, or any combination of the aforementioned methods.
  • Aspects of treatment that may vary include, but are not limited to: dosages, timing of administration, or duration of therapy; and may or may not be combined with other treatments, which may also vary in dosage, timing, or duration.
  • Another treatment for malignant pleural mesothelioma is surgery, which can be utilized either alone or in combination with any of the aforementioned treatment methods.
  • One of ordinary skill in the medical arts may determine an appropriate treatment paradigm based on evaluation of differential expression of sets of two or more genes, such as those set forth as SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 for malignant pleural mesothelioma. Cancers that express markers that are indicative of a more aggressive cancer or poor prognosis may be treated with more aggressive therapies.
  • Progression or regression of malignant pleural mesothelioma is determined by comparison of two or more different malignant pleural mesothelioma tissue samples taken at two or more different times from a subject.
  • Progression or regression may be evaluated by assessments of expression of sets of two or more of the nucleic acid targets, preferably using ratios of expression, including but not limited to SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, in a malignant pleural mesothelioma tissue sample from a subject before, during, and following treatment for malignant pleural mesothelioma. Progression or regression of other cancers or disease states would be determined similarly.
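The comparison of samples taken at two or more times can be sketched as a time course of expression ratios, each later sample expressed as a fold change relative to the first. The marker names, time labels, and values below are hypothetical, and the sketch deliberately does not say which direction of change corresponds to progression or regression, since that depends on the marker pair chosen and is not assumed here.

```python
def ratio_time_course(samples, numerator, denominator):
    """Expression ratio per sample.

    `samples` is a chronologically ordered list of (label, {gene: level}).
    """
    return [
        (label, levels[numerator] / levels[denominator])
        for label, levels in samples
    ]

def fold_changes(time_course):
    """Fold change of each later ratio relative to the first (baseline) ratio."""
    if len(time_course) < 2:
        raise ValueError("at least two time points are required")
    baseline = time_course[0][1]
    return [(label, ratio / baseline) for label, ratio in time_course[1:]]

# Hypothetical samples from one subject before, during, and after treatment.
samples = [
    ("before", {"MARKER_A": 900.0, "MARKER_B": 100.0}),
    ("during", {"MARKER_A": 600.0, "MARKER_B": 200.0}),
    ("after", {"MARKER_A": 300.0, "MARKER_B": 300.0}),
]
course = ratio_time_course(samples, "MARKER_A", "MARKER_B")
changes = fold_changes(course)
```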
  • novel pharmacological agents useful in the treatment of malignant pleural mesothelioma can be identified by assessing variations in the expression of sets of two or more malignant pleural mesothelioma nucleic acid markers (preferably, variations in the ratios of expression), from among SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, prior to and after contacting malignant pleural mesothelioma cells or tissues with candidate pharmacological agents for the treatment of malignant pleural mesothelioma.
  • The cells may be grown in culture (e.g., from a malignant pleural mesothelioma cell line), or may be obtained from a subject (e.g., in a clinical trial of candidate pharmaceutical agents to treat malignant pleural mesothelioma).
  • the invention further provides efficient methods of identifying pharmacological agents or lead compounds for agents active at the level of malignant pleural mesothelioma cellular function.
  • the screening methods involve assaying for compounds that beneficially alter malignant pleural mesothelioma nucleic acid molecule expression. Such methods are adaptable to automated, high-throughput screening of compounds.
  • the assay mixture comprises a candidate pharmacological agent.
  • a plurality of assay mixtures are run in parallel with different agent concentrations to obtain a different response to the various concentrations.
  • one of these concentrations serves as a negative control, i.e., at zero concentration of agent or at a concentration of agent below the limits of assay detection.
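A parallel concentration series with a zero-concentration negative control can be summarized by expressing each response relative to the control. The sketch below assumes, purely for illustration, that the measured response is an expression ratio and that concentrations are given in arbitrary units; both the values and the function name are hypothetical.

```python
def response_relative_to_control(measurements):
    """Normalize each response to the zero-concentration negative control.

    `measurements` maps agent concentration -> measured response
    (for example, an expression ratio). Concentration 0.0 is the control.
    """
    if 0.0 not in measurements:
        raise ValueError("a zero-concentration negative control is required")
    control = measurements[0.0]
    if control == 0:
        raise ValueError("control response must be non-zero")
    return {
        concentration: value / control
        for concentration, value in sorted(measurements.items())
        if concentration != 0.0
    }

# Hypothetical assay mixtures run in parallel at different concentrations.
measurements = {0.0: 8.0, 0.1: 6.0, 1.0: 4.0, 10.0: 2.0}
relative = response_relative_to_control(measurements)
```

A value near 1.0 indicates no change from the control at that concentration; values further from 1.0 indicate a larger effect of the candidate agent.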
  • Candidate agents encompass numerous chemical classes, although typically they are organic compounds.
  • the candidate pharmacological agents are small organic compounds, i.e., those having a molecular weight of more than 50 yet less than about 2500, preferably less than about 1000 and, more preferably, less than about 500.
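The molecular weight ranges above can be expressed as a simple tiered check. In this sketch the word "about" in the text is treated as a hard cutoff, which is an assumption made only to keep the example concrete; the tier labels and function name are likewise hypothetical.

```python
def molecular_weight_tier(molecular_weight):
    """Classify a candidate small organic compound by molecular weight.

    Cutoffs follow the text: more than 50 and less than about 2500,
    preferably less than about 1000, more preferably less than about 500.
    """
    if molecular_weight <= 50 or molecular_weight >= 2500:
        return "outside range"
    if molecular_weight < 500:
        return "more preferred"
    if molecular_weight < 1000:
        return "preferred"
    return "within range"
```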
  • Candidate agents comprise functional chemical groups necessary for structural interactions with polypeptides and/or nucleic acids, and typically include at least an amine, carbonyl, hydroxyl, or carboxyl group, preferably at least two of the functional chemical groups and more preferably at least three of the functional chemical groups.
  • the candidate agents can comprise cyclic carbon or heterocyclic structure and/or aromatic or polyaromatic structures substituted with one or more of the above-identified functional groups.
  • Candidate agents also can be biomolecules such as peptides, saccharides, fatty acids, sterols, isoprenoids, purines, pyrimidines, derivatives or structural analogs of the above, or combinations thereof and the like.
  • the agent typically is a DNA or RNA molecule, although modified nucleic acids as defined herein are also contemplated.
  • Candidate agents are obtained from a wide variety of sources including libraries of synthetic or natural compounds. For example, numerous means are available for random and directed synthesis of a wide variety of organic compounds and biomolecules, including expression of randomized oligonucleotides, synthetic organic combinatorial libraries, phage display libraries of random peptides, and the like.
  • libraries of natural compounds in the form of bacterial, fungal, plant, and animal extracts are available or readily produced.
  • Natural and synthetically produced libraries and compounds can readily be modified through conventional chemical, physical, and biochemical means.
  • known pharmacological agents may be subjected to directed or random chemical modifications such as acylation, alkylation, esterification, amidification, etc. to produce structural analogs of the agents.
  • reagents such as salts, buffers, neutral proteins (e.g., albumin), detergents, etc. which may be used to facilitate optimal protein-protein and/or protein-nucleic acid binding. Such a reagent may also reduce non-specific or background interactions of the reaction components.
  • Reagents that improve the efficiency of the assay, such as protease inhibitors, nuclease inhibitors, antimicrobial agents, and the like, may also be used.
  • The mixture of the foregoing assay materials is incubated under conditions whereby the anti-cancer candidate agent specifically binds the cellular binding target, a portion thereof or analog thereof.
  • the order of addition of components, incubation temperature, time of incubation, and other parameters of the assay may be readily determined. Such experimentation merely involves optimization of the assay parameters, not the fundamental composition of the assay. Incubation temperatures typically are between 4°C and 40°C. Incubation times preferably are minimized to facilitate rapid, high throughput screening, and typically are between 0.1 and 10 hours.
  • a separation step is often used to separate bound from unbound components.
  • the separation step may be accomplished in a variety of ways. Conveniently, at least one of the components is immobilized on a solid substrate, from which the unbound components may be easily separated.
  • the solid substrate can be made of a wide variety of materials and in a wide variety of shapes, e.g., microtiter plate, microbead, dipstick, resin particle, etc.
  • the substrate preferably is chosen to maximize signal-to-noise ratios, primarily to minimize background binding, as well as for ease of separation and cost.
  • Separation may be effected, for example, by removing a bead or dipstick from a reservoir, emptying or diluting a reservoir such as a microtiter plate well, or rinsing a bead, particle, chromatographic column, or filter with a wash solution or solvent.
  • the separation step preferably includes multiple rinses or washes.
  • When the solid substrate is a microtiter plate, the wells may be washed several times with a washing solution, which typically includes those components of the incubation mixture that do not participate in specific bindings such as salts, buffer, detergent, non-specific protein, etc.
  • When the solid substrate is a magnetic bead, the beads may be washed one or more times with a washing solution and isolated using a magnet.
  • Detection may be effected in any convenient way for cell-based assays such as two- or three-hybrid screens.
  • The transcript resulting from a reporter gene transcription assay of the anti-cancer agent binding to a target molecule typically encodes a directly or indirectly detectable product, e.g., β-galactosidase activity, luciferase activity, and the like.
  • one of the components usually comprises, or is coupled to, a detectable label.
  • Labels can be used, such as those that provide direct detection (e.g., radioactivity, luminescence, optical or electron density, etc.) or indirect detection (e.g., epitope tag such as the FLAG epitope, enzyme tag such as horseradish peroxidase, etc.).
  • The label may be bound to an anti-cancer agent binding partner, or incorporated into the structure of the binding partner.
  • the label may be detected while bound to the solid substrate or subsequent to separation from the solid substrate.
  • Labels may be directly detected through optical or electron density, radioactive emissions, nonradiative energy transfers, etc. or indirectly detected with antibody conjugates, streptavidin-biotin conjugates, etc. Methods for detecting the labels are well known in the art.
  • the invention thus generally provides cancer gene- or protein-specific binding agents, methods of identifying and making such agents, and their use in diagnosis, therapy and pharmaceutical development.
  • malignant pleural mesothelioma gene- or protein-specific pharmacological agents are useful in a variety of diagnostic and therapeutic applications as described herein.
  • the specificity of a cancer gene or protein binding to a binding agent is shown by binding equilibrium constants.
  • Targets that are capable of selectively binding a cancer gene preferably have binding equilibrium constants of at least about 10⁷ M⁻¹, more preferably at least about 10⁸ M⁻¹, and most preferably at least about 10⁹ M⁻¹.
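As a rough illustration of the thresholds above: an association (binding equilibrium) constant in M⁻¹ is the reciprocal of the dissociation constant in M, so 10⁷ M⁻¹ corresponds to a dissociation constant of 100 nM. The sketch below converts between the two and sorts a measured constant into the three tiers named in the text. The function names and tier labels are illustrative only, and treating "at least about" as a hard cutoff is a simplifying assumption.

```python
def ka_to_kd_nanomolar(ka_per_molar: float) -> float:
    """Convert an association constant (M^-1) to a dissociation constant in nM."""
    if ka_per_molar <= 0:
        raise ValueError("association constant must be positive")
    # Kd (M) = 1 / Ka; multiply by 1e9 to express it in nanomolar.
    return 1e9 / ka_per_molar


def preference_tier(ka_per_molar: float) -> str:
    """Label a binder by the tiers in the text (hard cutoffs are an assumption)."""
    if ka_per_molar >= 1e9:
        return "most preferred"
    if ka_per_molar >= 1e8:
        return "more preferred"
    if ka_per_molar >= 1e7:
        return "preferred"
    return "below preferred range"
```

For instance, a binder with a measured constant of 5 × 10⁸ M⁻¹ would fall in the middle tier under these assumptions.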
  • a wide variety of cell-based and cell-free assays may be used to demonstrate cancer gene-specific binding.
  • Cell-based assays include one, two and three hybrid screens, assays in which cancer gene-mediated transcription is inhibited or increased, etc.
  • Cell-free assays include cancer gene-protein binding assays, immunoassays, etc.
  • Other assays useful for screening agents which bind cancer polypeptides include fluorescence resonance energy transfer (FRET), and electrophoretic mobility shift analysis (EMSA).
  • pre- and post-treatment alterations in expression of two or more sets of cancer nucleic acid markers for example malignant pleural mesothelioma cancer nucleic acid markers including, but not limited to, SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, in cancer cells or tissues
  • treatment parameters including, but not limited to: dosage, method of administration, timing of administration, and combination with other treatments as described herein.
  • Candidate pharmacological agents may include antisense oligonucleotides that selectively bind to a cancer-related nucleic acid marker molecule, as identified herein, to reduce the expression of the marker molecules in cancer cells and tissues.
  • One of ordinary skill in the art can test the effects of a reduction of expression of cancer nucleic acid marker sequences in vivo or in vitro, to determine the efficacy of one or more antisense oligonucleotides.
  • antisense oligonucleotide or “antisense” describes an oligonucleotide that is an oligoribonucleotide, oligodeoxyribonucleotide, modified oligoribonucleotide, or modified oligodeoxyribonucleotide, which hybridizes under physiological conditions to DNA comprising a particular gene or to an mRNA transcript of that gene and, thereby, inhibits the transcription of that gene and/or the translation of that mRNA.
  • the antisense molecules are designed so as to interfere with transcription or translation of a target gene upon hybridization with the target gene or transcript.
  • the exact length of the antisense oligonucleotide and its degree of complementarity with its target will depend upon the specific target selected, including the sequence of the target and the particular bases which comprise that sequence. It is preferred that the antisense oligonucleotide be constructed and arranged so as to bind selectively with the target under physiological conditions, i.e., to hybridize substantially more to the target sequence than to any other sequence in the target cell under physiological conditions.
  • Based upon the sequences of cancer expressed nucleic acids, or upon allelic or homologous genomic and/or cDNA sequences, one of skill in the art can easily choose and synthesize any of a number of appropriate antisense molecules for use in accordance with the present invention.
  • antisense oligonucleotides should comprise at least 10 and, more preferably, at least 15 consecutive bases that are complementary to the target, although in certain cases modified oligonucleotides as short as 7 bases in length have been used successfully as antisense oligonucleotides (Wagner et al., 1996). Most preferably, the antisense oligonucleotides comprise a complementary sequence of 20-30 bases.
  • Although oligonucleotides may be chosen that are antisense to any region of the gene or mRNA transcripts, in preferred embodiments the antisense oligonucleotides correspond to N-terminal or 5' upstream sites such as translation initiation, transcription initiation, or promoter sites. In addition, 3'-untranslated regions may be targeted. Targeting to mRNA splicing sites has also been used in the art but may be less preferred if alternative mRNA splicing occurs.
  • the antisense is targeted, preferably, to sites in which mRNA secondary structure is not expected (see, e.g., Sainio et al., 1994) and at which proteins are not expected to bind.
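The length and complementarity guidance above can be sketched in a few lines. The example below builds a DNA antisense oligonucleotide as the reverse complement of a chosen window of a target transcript, and enforces the stated minimum of 10 bases (20-30 being the most preferred range in the text). The function name, the default length of 20, and any sequence passed to it are hypothetical; the sketch does not check secondary structure, protein binding sites, or selectivity against other sequences in the cell.

```python
# Watson-Crick partners for building a DNA oligo against a DNA or RNA target.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def antisense_oligo(target: str, start: int, length: int = 20) -> str:
    """Return the DNA antisense oligo (5'->3') for target[start:start+length]."""
    if length < 10:
        raise ValueError("the text calls for at least 10 complementary bases")
    window = target.upper()[start:start + length]
    if start < 0 or len(window) < length:
        raise ValueError("window extends past the ends of the target")
    try:
        # Reverse the window, then complement each base.
        return "".join(COMPLEMENT[base] for base in reversed(window))
    except KeyError as bad:
        raise ValueError(f"unexpected base in target: {bad}") from None
```

Choosing `start` at or near a translation initiation codon would follow the preference stated above for 5' sites.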
  • the listed sequences are cDNA sequences
  • one of ordinary skill in the art may easily derive the genomic DNA corresponding to the cDNA of a cancer expressed polypeptide.
  • the present invention also provides for antisense oligonucleotides that are complementary to the genomic DNA corresponding to cancer expressed nucleic acids, e.g., the malignant pleural mesothelioma nucleic acid markers described herein.
  • the use of antisense to allelic or homologous cDNAs and genomic DNAs is enabled without undue experimentation.
  • the antisense oligonucleotides of the invention may be composed of "natural" deoxyribonucleotides, ribonucleotides, or any combination thereof. That is, the 5' end of one native nucleotide and the 3' end of another native nucleotide may be covalently linked, as in natural systems, via a phosphodiester internucleoside linkage.
  • These oligonucleotides may be prepared by art-recognized methods, which may be carried out manually or by an automated synthesizer. They also may be produced recombinantly by vectors.
  • the antisense oligonucleotides of the invention also may include "modified" oligonucleotides. That is, the oligonucleotides may be modified in a number of ways that do not prevent them from hybridizing to their target but which enhance their stability or targeting or which otherwise enhance their therapeutic effectiveness.
  • modified oligonucleotide as used herein describes an oligonucleotide in which (1) at least two of its nucleotides are covalently linked via a synthetic internucleoside linkage (i.e., a linkage other than a phosphodiester linkage between the 5' end of one nucleotide and the 3' end of another nucleotide) and/or (2) a chemical group not normally associated with nucleic acids has been covalently attached to the oligonucleotide.
  • Preferred synthetic internucleoside linkages are phosphorothioates, alkylphosphonates, phosphorodithioates, phosphate esters, alkylphosphonothioates, phosphoramidates, carbamates, carbonates, phosphate triesters, acetamidates, carboxymethyl esters, and peptides.
  • modified oligonucleotide also encompasses oligonucleotides with a covalently modified base and/or sugar.
  • modified oligonucleotides include oligonucleotides having backbone sugars that are covalently attached to low molecular weight organic groups other than a hydroxyl group at the 3' position and other than a phosphate group at the 5' position.
  • modified oligonucleotides may include a 2'-O-alkylated ribose group.
  • modified oligonucleotides may include sugars such as arabinose instead of ribose.
  • the present invention contemplates pharmaceutical preparations containing modified antisense molecules that are complementary to and hybridizable with, under physiological conditions, malignant pleural mesothelioma expressed nucleic acids, together with pharmaceutically acceptable carriers.
  • Antisense oligonucleotides may be administered as part of a pharmaceutical composition.
  • a pharmaceutical composition may include the antisense oligonucleotides in combination with any standard physiologically and/or pharmaceutically acceptable carriers which are known in the art.
  • the compositions should be sterile and contain a therapeutically effective amount of the antisense oligonucleotides in a unit of weight or volume suitable for administration to a patient.
  • pharmaceutically acceptable means a non-toxic material that does not interfere with the effectiveness of the biological activity of the active ingredients.
  • physiologically acceptable refers to a non-toxic material that is compatible with a biological system such as a cell, cell culture, tissue, or organism. The characteristics of the carrier will depend on the route of administration.
  • Physiologically and pharmaceutically acceptable carriers include diluents, fillers, salts, buffers, stabilizers, solubilizers, and other materials, which are well known in the art.
  • Expression of cancer nucleic acid molecules can also be determined using protein measurement methods, e.g., for use in the ratio-based diagnostic and prognostic methods described herein.
  • the expression of malignant pleural mesothelioma genes such as SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77
  • exemplary translations are provided herein as SEQ ID NOs: 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34
  • Preferred methods of specifically and quantitatively measuring proteins include, but are not limited to: mass spectroscopy-based methods such as surface enhanced laser desorption ionization (SELDI; e.g., Ciphergen ProteinChip System), non-mass spectroscopy-based methods, immunoassay methods such as ELISA and immunohistochemistry-based methods such as 2-dimensional gel electrophoresis.
  • SELDI methodology may, through procedures known to those of ordinary skill in the art, be used to vaporize microscopic amounts of tumor protein and to create a "fingerprint" of individual proteins, thereby allowing simultaneous measurement of the abundance of many proteins in a single sample.
  • SELDI-based assays may be utilized to classify tumors. Such assays preferably include, but are not limited to, the following examples. Gene products discovered by RNA microarrays may be selectively measured by specific (antibody mediated) capture to the SELDI protein disc (e.g., selective SELDI).
  • Gene products discovered by protein screening may be resolved by "total protein SELDI" optimized to visualize those particular markers of interest from among polypeptides encoded by SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 (e.g., SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78).
  • the invention also involves agents such as polypeptides that bind to malignant pleural mesothelioma-associated polypeptides, e.g., SEQ ID NOs: 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78.
  • binding agents can be used, for example, in screening assays to detect the presence or absence of malignant pleural mesothelioma-associated polypeptides and complexes of malignant pleural mesothelioma-associated polypeptides and their binding partners and in purification protocols to isolate malignant pleural mesothelioma-associated polypeptides and complexes of malignant pleural mesothelioma-associated polypeptides and their binding partners.
  • Such agents also may be used to inhibit the native activity of the malignant pleural mesothelioma-associated polypeptides, for example, by binding to such polypeptides.
  • the invention, therefore, embraces peptide binding agents which, for example, can be antibodies or fragments of antibodies having the ability to selectively bind to malignant pleural mesothelioma-associated polypeptides.
  • Antibodies include polyclonal and monoclonal antibodies, prepared according to conventional methodology.
  • an antibody from which the pFc' region has been enzymatically cleaved, or which has been produced without the pFc' region, designated an F(ab')₂ fragment
  • an antibody from which the Fc region has been enzymatically cleaved, or which has been produced without the Fc region, designated an Fab fragment
  • Fab fragments consist of a covalently bound antibody light chain and a portion of the antibody heavy chain denoted Fd.
  • the Fd fragments are the major determinant of antibody specificity (a single Fd fragment may be associated with up to ten different light chains without altering antibody specificity) and Fd fragments retain epitope-binding ability in isolation.
  • CDRs complementarity determining regions
  • FRs framework regions
  • CDR1 through CDR3 complementarity determining regions
  • non-CDR regions of a mammalian antibody may be replaced with similar regions of conspecific or heterospecific antibodies while retaining the epitopic specificity of the original antibody.
  • This is most clearly manifested in the development and use of "humanized" antibodies in which non-human CDRs are covalently joined to human FR and/or Fc/pFc' regions to produce a functional antibody. See, e.g., U.S. patents 4,816,567, 5,225,539, 5,585,089, 5,693,762 and 5,859,205.
  • Fully human monoclonal antibodies also can be prepared by immunizing mice transgenic for large portions of human immunoglobulin heavy and light chain loci. Following immunization of these mice (e.g., XenoMouse (Abgenix), HuMAb mice (Medarex/GenPharm)), monoclonal antibodies can be prepared according to standard hybridoma technology. These monoclonal antibodies will have human immunoglobulin amino acid sequences and therefore will not provoke human anti-mouse antibody (HAMA) responses when administered to humans.
  • HAMA human anti-mouse antibody
  • the present invention also provides for F(ab')₂, Fab, Fv and Fd fragments; chimeric antibodies in which the Fc and/or FR and/or CDR1 and/or CDR2 and/or light chain CDR3 regions have been replaced by homologous human or non-human sequences; chimeric F(ab')₂ fragment antibodies in which the FR and/or CDR1 and/or CDR2 and/or light chain CDR3 regions have been replaced by homologous human or non-human sequences; chimeric Fab fragment antibodies in which the FR and/or CDR1 and/or CDR2 and/or light chain CDR3 regions have been replaced by homologous human or non-human sequences; and chimeric Fd fragment antibodies in which the FR and/or CDR1 and/or CDR2 regions have been replaced by homologous human or non-human sequences.
  • the present invention also includes so-called single chain antibodies.
  • the invention involves the use of polypeptides of numerous size and type that bind specifically to polypeptides selected from those encoded by SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 (e.g., SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78), and complexes of both malignant pleural mesothelioma-associated polypeptides and their binding partners.
  • polypeptides may be derived also from sources other than antibody technology.
  • polypeptide binding agents can be provided by degenerate peptide libraries which can be readily prepared in solution, in immobilized form or as phage display libraries.
  • Combinatorial libraries also can be synthesized of peptides containing one or more amino acids. Libraries further can be synthesized of peptoids and non-peptide synthetic moieties.
  • Phage display can be particularly effective in identifying binding peptides useful according to the invention. Briefly, one prepares a phage library (using, e.g., M13, fd, or lambda phage), displaying inserts from 4 to about 80 amino acid residues using conventional procedures.
  • the inserts may represent, for example, a completely degenerate or biased array.
  • DNA sequence analysis can be conducted to identify the sequences of the expressed polypeptides.
  • the minimal linear portion of the sequence that binds to the malignant pleural mesothelioma-associated polypeptide can be determined.
  • Yeast two-hybrid screening methods also may be used to identify polypeptides that bind to the malignant pleural mesothelioma-associated polypeptides.
  • the malignant pleural mesothelioma-associated polypeptides of the invention can be used to screen peptide libraries, including phage display libraries, to identify and select peptide binding partners of the malignant pleural mesothelioma-associated polypeptides of the invention.
  • Such molecules can be used, as described, for screening assays, for purification protocols, for interfering directly with the functioning of malignant pleural mesothelioma-associated polypeptides and for other purposes that will be apparent to those of ordinary skill in the art.
  • isolated malignant pleural mesothelioma-associated polypeptides can be attached to a substrate (e.g., chromatographic media, such as polystyrene beads, a filter, or an array substrate), and then a solution suspected of containing the binding partner may be applied to the substrate. If a binding partner that can interact with malignant pleural mesothelioma-associated polypeptides is present in the solution, then it will bind to the substrate-malignant pleural mesothelioma-associated polypeptide. The binding partner then may be isolated.
  • antibodies and other binding molecules may be used, for example, to identify tissues expressing protein or to purify protein.
  • Antibodies also may be coupled to specific diagnostic labeling agents for imaging of cells and tissues that express malignant pleural mesothelioma-associated polypeptides or to therapeutically useful agents according to standard coupling procedures.
  • Diagnostic agents include, but are not limited to, barium sulfate, iocetamic acid, iopanoic acid, ipodate calcium, diatrizoate sodium, diatrizoate meglumine, metrizamide, tyropanoate sodium and radiodiagnostics including positron emitters such as fluorine-18 and carbon-11, gamma emitters such as iodine-123, technetium-99m, iodine-131 and indium-111, and nuclides for nuclear magnetic resonance such as fluorine and gadolinium.
  • the invention further includes protein microarrays for analyzing expression of malignant pleural mesothelioma-associated peptides selected from those encoded by SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 (e.g., SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78).
  • Protein microarray technology, which is also known by other names including protein chip technology and solid-phase protein array technology, is well known to those of ordinary skill in the art and is based on, but not limited to, obtaining an array of identified peptides or proteins on a fixed substrate, binding target molecules or biological constituents to the peptides, and evaluating such binding. See, e.g., G. MacBeath and S.L. Schreiber, "Printing Proteins as Microarrays for High-Throughput Function Determination," Science 289(5485):1760-1763, 2000.
  • control peptide or protein molecules are attached to the substrate.
  • control peptide or protein molecules allow determination of factors such as peptide or protein quality and binding characteristics, reagent quality and effectiveness, hybridization success, and analysis thresholds and success.
  • MPM Malignant pleural mesothelioma
  • adenocarcinoma (ADCA) metastatic to the pleura, of lung or other origin, is a far more common etiology for patients presenting with a unilateral pleural effusion (1).
  • the ultimate treatment strategies depend on the correct pathological diagnosis.
  • Early MPM is best treated with extrapleural pneumonectomy followed by chemoradiation, whereas metastatic lung cancer is treated with chemotherapy alone (2). Not infrequently, distinguishing MPM from ADCA of the lung is challenging from both clinical and pathological perspectives (3).
  • Fluid cytology is diagnostic in only 33% of the cases (2, 4) and sufficient additional tissue from an open surgical biopsy is often required for immunohistochemistry and cytogenetic analysis (1).
  • Current bioinformatics tools recently applied to microarray data have shown utility in predicting both cancer diagnosis (5) and outcome (6). Though highly accurate, their widespread clinical relevance and applicability are unresolved. The minimum number of predictor genes is not known, and the discrimination function can vary (for the same genes) based on the location and protocol used for sample preparation (5).
  • Profiling with microarrays requires relatively large quantities of RNA, making the process inappropriate for certain applications. Also, it has yet to be determined whether these approaches can utilize relatively low-cost and widely available data acquisition platforms such as RT-PCR and still retain significant predictive capabilities.
  • the major limitation in translating microarray profiling to patient care is that this approach cannot currently be used to diagnose individual samples independently and without comparison to a predictor model generated from samples whose data were acquired on the same platform.
  • Tumor tissues. A combined total of 245 discarded MPM and lung ADCA surgical specimens were freshly collected (and snap frozen) from patients who underwent surgery at Brigham and Women's Hospital (BWH) between 1993 and 2001. Lung ADCA tumors consisted of both primary malignancies and metastatic ADCAs of breast and colon origin. All MPM samples used in these studies contained relatively pure tumor (greater than 50% tumor cells in a high power field examined in a section adjacent to the tissue used). Linked clinical and pathological data were obtained for all patients who contributed tumor specimens and rendered anonymous to protect patient confidentiality. Studies utilizing human tissues were approved by and conducted in accordance with the policies of the Institutional Review Board at BWH.
  • Microarray experiments.
  • RNA (7 µg) was prepared from whole tumor blocks using Trizol Reagent (Invitrogen Life Technologies, Carlsbad, California) and processed as described (8-10).
  • cRNA was hybridized to human U95A oligonucleotide probe arrays (Affymetrix, Santa Clara, CA) using a protocol described previously (10).
  • Data from 64 of 245 samples were discarded after visual inspection of hybridization data revealed obvious scanning artifacts, leaving a total of 31 MPM samples and 150 ADCA samples (139 patient tumors and 11 duplicates).
  • Microarrays for all ADCA samples and 12 MPM samples were processed at the Dana-Farber Cancer Institute and the Whitehead Institute. The remaining 19 MPM samples were processed separately at BWH.
  • RNA (2 µg) was reverse-transcribed into cDNA using Taq-Man Reverse Transcription reagents (Applied Biosystems, Foster City, CA) and quantified using all recommended controls for SYBR Green-based detection.
  • Primers amplifying portions of claudin-7, VAC-β, TACSTD1, and calretinin cDNA had the following sequences (forward and reverse): claudin-7 5'-GTTCCTGTCCTGGGAATGAG-3' (SEQ ID NO:87) and
  • This sample was originally obtained from a patient with the clinical and pathological diagnosis of ADCA. This specimen was annotated by a pathologist reviewing frozen sections of all specimens prior to RNA preparation as having unusual histological features raising suspicion of a "germ cell tumor or sarcoma”.
  • the gene expression ratio method by virtue of the fact that it is a ratio (i) negates the need for a third reference gene when determining expression levels, (ii) is independent of platform used for data acquisition, (iii) requires only small quantities of RNA (as little as 10 pg using RT-PCR), (iv) does not require the coupling of transcription to translation for chosen genes, and (v) permits analysis of individual samples without reference to additional "training samples” whose data was acquired on the same platform. For these reasons, expression ratios are more likely to find immediate use in clinical settings since they confer several advantages compared to other equally accurate techniques, such as linear discriminant analysis.
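The platform independence claimed above can be illustrated with a minimal sketch: a ratio of two genes measured in the same sample can be formed from array intensities or from RT-PCR threshold cycles, and in neither case is a third reference gene needed. The function names, the cutoff of 1.0, and the class labels are assumptions for illustration, not values taken from this section. The conversion from threshold cycles assumes perfect doubling per cycle for both amplicons.

```python
def expression_ratio(level_a: float, level_b: float) -> float:
    """Ratio of two expression levels measured in the same sample."""
    if level_a <= 0 or level_b <= 0:
        raise ValueError("expression levels must be positive")
    return level_a / level_b


def ratio_from_ct(ct_a: float, ct_b: float) -> float:
    """Ratio A/B from RT-PCR threshold cycles, assuming doubling every cycle.

    A gene that crosses the threshold earlier (lower CT) started with more
    template, so the ratio is 2 ** (ct_b - ct_a).
    """
    return 2.0 ** (ct_b - ct_a)


def call_from_ratio(ratio: float, cutoff: float = 1.0) -> str:
    """Assign a sample to one of two classes (cutoff and labels are hypothetical)."""
    return "A-high" if ratio > cutoff else "B-high"
```

Because the call depends only on the two measurements from the sample itself, a single specimen can be evaluated without a training set acquired on the same platform, which is the point made in items (ii) and (v) above.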
  • the discovery of diagnostic gene ratios is likely to make possible future clinical tests to definitively diagnose MPM and ADCA using smaller tissue specimens and perhaps pleural effusions. In this way the need for diagnostic surgery in many of these patients may be eliminated.
  • the expression ratio technique represents a substantial improvement over past efforts to translate the strengths of expression profiling into simple tests with clinical relevancy. Many bioinformatics tools under development and testing are quite complex and/or rely upon data from large numbers of "training samples" to establish a diagnosis for unknown samples. The end result is that the practical use of microarray data remains beyond the scope of many scientists and clinicians. Similarly, no comprehensive method has been proposed to translate the results of tumor profiling to the analysis of individual tissues. As a consequence, no simple yet effective clinical applications have resulted from microarray research.
  • the expression ratio technique represents a powerful use of microarray data that can be easily adapted and extended to routine clinical application without the need for additional sophisticated analysis.
  • Example 1 Aisner, J. Diagnosis, staging, and natural history of pleural mesothelioma. In: J. Aisner, R. Arriagada, M. R. Green, N. Martini, and M. C. Perry (eds.), Comprehensive Textbook of Thoracic Oncology, pp. 799-785. Baltimore: Williams and Wilkins, 1996.
  • TACSTD1 (alias TROP1, M4S1) to human chromosome 2p21 and the refinement of mapping of TACSTD2 (alias TROP2, M1S1) to human chromosome 1p32 by in situ hybridization, Cytogenet Cell Genet. 92: 164-165, 2001.
  • Example 2 Molecular Markers for Malignant Pleural Mesothelioma
  • MPM tissues. Discarded malignant pleural mesothelioma (MPM) surgical specimens were freshly collected from patients undergoing pleurectomy or extrapleural pneumonectomy at Brigham and Women's Hospital (Boston, MA) from 1992 to 1998 and flash frozen. All tissues were obtained from patients who did not receive pre-operative treatment. Standard tissue banking procedures were followed throughout. Once brought to the Hospital Tumor Bank, tissues were sliced into 3 mm³ portions and each was assigned an identifier to catalogue its position in the original specimen. Hematoxylin-stained slides were generated from each MPM specimen in the Tumor Bank and reviewed by a pathologist for tumor content and histological subtype.
  • a total of 80 specimens have been identified to date that contain relatively pure tumor (greater than 50% of cells in a high power field are tumor cells). Of these, 24 were chosen for microarray analysis. Linked clinical, epidemiological, outcome, and pathological data were obtained for all patients who contributed tumor specimens and rendered anonymous to protect patient confidentiality. Studies utilizing human tissues were approved by and conducted in accordance with the policies of the Institutional Review Board at Brigham and Women's Hospital.
  • RNA was isolated from frozen tumor blocks using Trizol solution (Invitrogen Life Technologies, Carlsbad, California) exactly per the manufacturer's recommended protocol. To initially assess total RNA degradation, a portion of RNA from each sample was resolved on a 1% agarose/formaldehyde gel using standard procedures (Ausubel, 1998). Gels were stained with ethidium bromide and bands representing ribosomal subunits 28S and 18S were visualized. Approximately 10% of samples collected were discarded secondary to unsatisfactory quality.
  • RNA (7 µg) was amplified and the product labeled with biotin following a procedure previously described (Wang, 1999; Warrington, 2000; O'Dell, 1999). Briefly, double-stranded cDNA was synthesized using the Superscript Choice System (Invitrogen Life Technologies) and a T7-(dT)-24 first strand primer (Geneset Oligos, La Jolla, California). The cDNA was purified by phenol/chloroform/isoamyl alcohol extraction using a phase lock gel (5 Prime-3 Prime, Inc., Boulder, Colorado) and concentrated by ethanol precipitation.
  • Hybridization of RNA to high density oligonucleotide microarrays. Prior to hybridization to experimental arrays, the quality of cRNA was assessed for approximately half of all samples using test arrays (Affymetrix Test2 gene arrays) designed to compare relative expression levels of β-actin and GAPDH by using oligonucleotide probes complementary to both the 3' and 5' ends of gene products. Hybridization of test arrays was done as detailed below for experimental arrays with minor modifications as suggested by the manufacturer (Affymetrix).
  • Biotinylated cRNA (20 µg) was fragmented and hybridized to microarrays containing oligonucleotide probe-sets representing approximately 12,000 known human genes (Affymetrix U95A human array, HG-95Av2) according to Affymetrix protocols using a protocol described previously (O'Dell, 1999). Essentially, the hybridization mixture was incubated at 99 °C for 5 min. followed by incubation at 45 °C for 5 min. before injection of the sample into the probe array cartridge. Hybridization was performed at 45 °C for 16-18 hours.
  • the array was stained with streptavidin-phycoerythrin (Molecular Probes, Eugene, Oregon) and the hybridization signal amplified using a biotinylated anti-streptavidin antibody (Vector Laboratories, Inc., Burlingame, California) before subsequent scanning in a HP GeneArray scanner (Affymetrix).
  • the intensities of all features of the microarrays were captured and examined for artifacts using Affymetrix GeneChip® Software v. 4.0, according to standard Affymetrix procedures (O'Dell, 1999).
  • the "target intensity" was set to 100 for all samples.
  • Each array contained several prokaryotic genes which served as internal hybridization controls for RNA spiked into experimental samples. Data from 5 arrays were uninterpretable and discarded, leaving a total of 19 samples in the final analysis. Of these 19, 2 were tested in duplicate and 1 in triplicate.
  • GeneChip® Software was used to generate quantitative gene expression values (measured by average differences).
  • RNA (2 µg) was reverse-transcribed into cDNA using Taq-Man Reverse Transcription reagents and random hexamers as the primer (Applied Biosystems). PCR reactions were set up in a 25 µl reaction volume using SYBR Green PCR Master Mix (Applied Biosystems).
  • Optimized primers amplifying portions of fibronectin, transgelin, complement factor B (CFB), and L32 ribosomal protein cDNA were designed according to recommended specifications (Applied Biosystems), synthesized by Invitrogen Life Technologies, and used at a final concentration of 900 nM in the reaction mixture.
  • fibronectin 5'-GCCATGACAATGGTGTGAAC-3' SEQ ID NO:1
  • amplification reactions are characterized by the point in time during cycling when amplification of a PCR product is first detected rather than by the amount of PCR product accumulated after a fixed number of cycles.
  • fluorescence signal which defines the baseline for the amplification plot.
  • An increase in fluorescence above the baseline indicates the detection of accumulated PCR product.
  • a fixed fluorescence threshold can be set above the baseline.
  • CT threshold cycle
  • the parameter CT is defined as the fractional cycle number at which the fluorescence passes the fixed threshold. The higher the starting copy number of the nucleic acid target, the sooner an increase in fluorescence past the selected threshold is observed.
  • a plot of the log of initial target copy number for a set of standards versus CT is a straight line. Therefore, quantification of the amount of target in unknown samples is accomplished by measuring CT and using the standard curve to determine starting copy number.
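The standard-curve quantification described above can be sketched as follows. This is a minimal illustration, assuming a plain least-squares fit of CT against log10 of the starting copy number; the dilution series, the CT values, and the function names `fit_standard_curve` and `copies_from_ct` are hypothetical and are not taken from the source.

```python
import math


def fit_standard_curve(copy_numbers, ct_values):
    """Fit CT = slope * log10(copy number) + intercept by least squares."""
    xs = [math.log10(n) for n in copy_numbers]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ct_values) / len(ct_values)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series with idealized CT values (not source data).
standard_copies = [1e3, 1e4, 1e5, 1e6]
standard_cts = [30.0, 26.68, 23.36, 20.04]

slope, intercept = fit_standard_curve(standard_copies, standard_cts)
unknown_copies = copies_from_ct(25.0, slope, intercept)
```

Because a higher starting copy number crosses the threshold sooner, the fitted slope is negative, and a lower measured CT maps to a larger estimated copy number.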
  • the L32 ribosomal gene was used for this purpose since its expression levels did not vary substantially over all samples (from microarray data).
  • Data analysis. A hierarchical clustering algorithm (AGNES) in the statistical package S-PLUS (Venables, 1997) was used to classify all 19 MPM tumors according to relative variation in gene expression patterns. All linked clinical data was held exclusively by one investigator and revealed only after cluster analysis was completed.
  • Gene hybridization intensities (from GeneChip® Software) were appropriately scaled to a "target intensity" of 100 to facilitate comparison of data from all arrays. To minimize contamination from signal background and saturation effects (Hsiao, 2001), only genes with an expression value between 1,000 and 5,000 were considered in the unsupervised cluster analysis.
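The scaling and filtering step described above might look roughly like the sketch below. It rests on several assumptions that the source does not state: scaling is done here by a plain per-array mean (the vendor software may use a different average), a gene qualifies when its average value across samples falls in the window, and the window bounds are inclusive. The function names and example values are hypothetical.

```python
def scale_to_target(intensities, target=100.0):
    """Rescale one array so that its mean intensity equals `target`.

    Assumption: a plain mean is used as the scaling reference.
    """
    mean = sum(intensities.values()) / len(intensities)
    factor = target / mean
    return {gene: value * factor for gene, value in intensities.items()}


def filter_genes(samples, low=1000.0, high=5000.0):
    """Keep genes whose average value across samples lies in [low, high].

    Assumption: the source does not say whether the window applies to the
    average or to every sample; the average is used here.
    """
    shared = set.intersection(*(set(sample) for sample in samples))
    kept = []
    for gene in sorted(shared):
        average = sum(sample[gene] for sample in samples) / len(samples)
        if low <= average <= high:
            kept.append(gene)
    return kept


# Hypothetical values for illustration only.
scaled = scale_to_target({"a": 50.0, "b": 150.0, "c": 400.0})
kept = filter_genes([
    {"g1": 1200.0, "g2": 300.0, "g3": 6000.0},
    {"g1": 1800.0, "g2": 500.0, "g3": 4500.0},
])
```

Only the genes that pass the filter would then be handed to the clustering step.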
  • a set of 26 genes was identified as providing a redundant set of diagnostic gene expression ratios that can be used in different combinations (Table 2). There is some overlap to ensure complete coverage of samples to correct for "no-calls" for any one ratio.
  • Genes overexpressed in Subclasses 2 and/or 3 consisted predominantly of cytoskeletal and ECM-related genes such as actin, vimentin, tubulin, myosin, cofilin, osteonectin, and others.
  • Samples in Subclasses 1 and 2 consisted exclusively of specimens histologically classified as epithelial and mixed subtypes, respectively.
  • The analysis of cancer-related clinical outcome (using nonparametric ANOVA) revealed that the median survival (19 months) of patients in Subclass 1 (all epithelial subtype) was significantly higher (P<0.01) than the median survival (2 months) of patients in Subclass 2 (all mixed subtype). The median survival of patients with varied histology in Subclass 3 (11 months) was intermediate to Subclasses 1 and 2, but nonetheless was not significantly different (P>0.05) from that of either subclass. There was no significant difference in survival between patients with epithelial histology classified as either Subclass 1 or 3.
  • SEQ ID Nos are given as nucleotide sequence, amino acid sequence
  • Expression levels for all three genes were significantly (P<0.05) correlated with survival irrespective of histological subtype.
  • expression level ratios of transgelin/CFB and fibronectin/CFB were also significantly correlated with survival independent of histology. Levels of transgelin and fibronectin were approximately equal in individual samples; accordingly, the fibronectin/transgelin ratio remains close to 1 for all samples and is not correlated with survival.
  • RT-PCR data from these patients was combined with that from patients whose tumors were subjected to microarray analysis, for a total of 32 samples.
  • median patient survival is inversely proportional to the value of the transgelin/CFB expression ratio.
  • Table footnotes: a, value of transgelin/CFB gene expression level ratio; b, in months
  • prognostic markers were originally selected by examining gene expression level differences between samples in subclasses with the greatest difference in median patient survival (Subclass 1 and Subclass 2, see Table 3). It also was found that these genes could distinguish tumors of the epithelial subtype from all others. Then, using the larger cohort of samples, we examined whether there was evidence that the transgelin/CFB expression ratio provided a valuable diagnostic tool in addition to a predictor of prognosis. We found that the transgelin/CFB expression ratio identified the histological subtypes of tumors with a high degree of accuracy.
  • transgelin and CFB are not random predictors of diagnosis/prognosis, but have notable biological relevance to carcinogenesis.
  • CFB is significantly overexpressed in epithelial tumors while transgelin is significantly underexpressed in the same samples.
  • Transgelin binds to native actin filament bundles and gels actin in vitro (Shapland, 1993) and has been proposed as a marker of neoplastic transformation (Lawson, 1997).
  • CFB has been implicated in tumor apoptosis in a manner independent of TNF/TNFR or FasL/Fas interactions (Uwai, 2000).
  • CFB's pro-apoptotic function is consistent with the observations showing high levels of this gene significantly correlated with relatively good prognosis (i.e. survival).
  • M62895 Human lipocortin (LIP) 2 pseudogene mRNA, complete cds-like region
  • L11566 Homo sapiens ribosomal protein L18 (RPL18) mRNA, complete cds
  • X16064 Human mRNA for translationally controlled tumor protein
  • X55954 Human mRNA for HL23 ribosomal protein homologue
  • X53777 Human L23 mRNA for putative ribosomal protein
  • L01124 Human ribosomal protein S13 (RPS13) mRNA, complete cds
  • D23661 Human mRNA for ribosomal protein L37, complete cds
  • L38941 Homo sapiens ribosomal protein L34 (RPL34) mRNA, complete cds
  • X52851 Human cyclophilin gene for cyclophilin (EC 5.2.1.8)
  • AF037643 Homo sapiens 60S ribosomal protein L12 (RPL12) pseudogene, partial sequence
  • X15940 Human mRNA for ribosomal protein L31
  • M94314 Homo sapiens ribosomal protein L30 mRNA, complete cds
  • X04098 Human mRNA for cytoskeletal gamma-actin
  • M58458 Human ribosomal protein S4 (RPS4X) isoform mRNA, complete cds
  • X55715 Human Hums3 mRNA for 40S ribosomal protein s3
  • X03342 Human mRNA for ribosomal protein L32
  • X06617 Human mRNA for ribosomal protein S11
  • M26252 Human TCB gene encoding cytosolic thyroid hormone-binding protein, complete cds
  • D23660 Human mRNA for ribosomal protein, complete cds
  • M16660 Human 90-kDa heat-shock protein gene, cDNA, complete cds
  • M22919 Human nonmuscle/smooth muscle alkali myosin light chain gene, complete cds
  • J03077 Human co-beta glucosidase (proactivator) mRNA, complete cds
  • X56009 Human GSA mRNA for alpha subunit of GsGTP binding protein
  • X04409 Human mRNA for coupling protein G(s) alpha-subunit (alpha-S1)
  • AI201310 qf71b11.x1 Homo sapiens cDNA, 3' end
  • J04182 Homo sapiens lysosomal membrane glycoprotein-1 (LAMP1) mRNA, complete cds
  • J03592 Human ADP/ATP translocase mRNA, 3' end
  • X13546 Human HMG-17 gene for non-histone chromosomal protein HMG-17
  • D32129 Human mRNA for HLA class-I (HLA-A26) heavy chain, complete cds (clone cMIY-1)
  • X57352 Human 1-8U gene from interferon-inducible gene family
  • AI557852 P6test.G05.r Homo sapiens cDNA, 5' end
  • M33680 Human 26-kDa cell surface protein TAPA-1 mRNA, complete cds
  • M63573 Human secreted cyclophilin-like protein (SCYLP) mRNA, complete cds
  • M22806 Human prolyl 4-hydroxylase beta-subunit and disulfide isomerase (P4HB) gene
  • X58536 Human mRNA for HLA class I locus C heavy chain
  • J04599 Human hPGI mRNA encoding bone small proteoglycan I (biglycan), complete cds
  • J02984 Human insulinoma rig-analog mRNA encoding DNA-binding protein, complete cds
  • L41498 Homo sapiens elongation factor 1-alpha 1 (PTI-1) mRNA, complete cds
  • X56681 Human junD mRNA
  • M94046 Human zinc finger protein (MAZ) mRNA
  • M55914 HUMCMYCQ Human c-myc binding protein (MBP-1) mRNA, complete cds
  • M64241 HUMQM Human Wilm's tumor-related protein (QM) mRNA, complete cds
  • X58965 HSNM23H2G H.sapiens RNA for nm23-H2 gene
  • D11139 HUMTIMP Human gene for tissue inhibitor of metalloproteinases, partial sequence
  • M55409 HUMPANCAN Homo sapiens pancreatic tumor-related protein mRNA, partial cds
  • M84711 HUMFTE1A Human v-fos transformation effector protein (Fte-1), mRNA complete cds
  • X56681 HSJUNDR Human junD mRNA
  • M26880 HUMUBI13 Human ubiquitin mRNA, complete cds
  • X04803 HSYUBG1 Homo sapiens ubiquitin gene
  • D78361 HUMODAZ Human mRNA for ornithine decarboxylase antizyme, ORF 1 and ORF 2
  • J04617 HUMEF1A Human elongation factor EF-1-alpha gene, complete cds
  • J04988 HUMHSP90B Human 90 kD heat shock protein gene, complete cds
  • D00017 HUMLIC Homo sapiens mRNA for lipocortin II, complete cds
  • J03040 HUMSPARC Human SPARC/osteonectin mRNA, complete cds
  • J04164 HUM927A Human interferon-inducible protein 9-27 mRNA, complete cds
  • V00567 HSMGLO Human messenger RNA fragment for the beta-2 microglobulin
  • D14530 HUMRSPT Human homolog of yeast ribosomal protein S28, complete cds
  • M14199 HUMLAMR Human laminin receptor (2H5 epitope) mRNA, 5' end
  • M63138 HUMCATD5 Human cathepsin D (catD) gene, exons 7, 8, and 9
  • V00599 HSTUB2 Human mRNA fragment encoding beta-tubulin. (from clone D-beta-1)
  • Malignant pleural mesothelioma is an asbestos-related, lethal neoplastic disease of the pleura (median survival between 4 and 12 months) subdivided into three major histological subtypes: epithelial, mixed, and sarcomatoid (1-4). Compared to patients with non-epithelial subtypes, patients with the epithelial subtype show a survival benefit from a variety of treatment strategies, including aggressive multi-modality therapy (5-7). Currently, patients who present to our unit with unilateral mesothelioma without extrapleural invasion undergo complete surgical resection (extra-pleural pneumonectomy) followed by chemoradiation. The 5-year survival for those patients with stage I and epithelial histology is 40%.
  • RNA (2 µg) isolated from 29 tumors in the test set was reverse-transcribed into cDNA using Taq-Man Reverse Transcription reagents (Applied Biosystems, Foster City, CA) and quantified using all recommended controls.
  • Primer sequences (synthesized by Invitrogen Life Technologies) were as follows (forward and reverse):
  • PCR amplification was performed using an Applied Biosystems 5700 Sequence Detector. No-template (negative) controls containing H2O substituted for template were run in multiple wells on every reaction plate. An automatically calculated melting point disassociation curve generated after every assay was examined to ensure the presence of a single PCR species and a lack of primer-dimer formation in each well.
  • the Comparative CT method (Applied Biosystems) was used with minor modifications to obtain quantitative values for gene expression ratios in all samples. Calculation of an expression ratio using data from two genes in any single sample negates the need for a calibrator sample and a reference gene to standardize for different starting template amounts. Therefore, to form expression ratios of two genes, we merely stated the expression level of one gene relative to the other.
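As an illustration of stating one gene's expression relative to another, the sketch below converts two CT values from the same sample into a unit-less ratio. It assumes the usual exponential model in which product roughly doubles each cycle (amplification efficiency of 2 for both primer pairs), so that the ratio of gene 1 to gene 2 is 2 raised to the power -(CT(gene 1) - CT(gene 2)). The function name and the example CT values are hypothetical.

```python
def expression_ratio_from_ct(ct_gene1, ct_gene2, efficiency=2.0):
    """Return the gene 1 / gene 2 expression ratio from two CT values.

    Assumption: both amplicons amplify with the same per-cycle efficiency
    (2.0 means perfect doubling). A lower CT means more starting template.
    """
    delta_ct = ct_gene1 - ct_gene2
    return efficiency ** (-delta_ct)


# Hypothetical CT values: gene 1 crosses the threshold 3 cycles earlier.
ratio = expression_ratio_from_ct(22.0, 25.0)
```

No calibrator sample or reference gene enters the calculation, which matches the simplification described in the text.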
  • the ΔCT value in the Comparative CT equation reduces to: [CT(gene 1) - CT(gene 2)].
  • Data and statistical analysis.
  • a two-sided Student's (parametric) t-test was used for pair-wise comparisons of average gene expression levels among multiple groups and the Significance Analysis of Microarrays (SAM) algorithm (18) was used to estimate the false discovery rate.
  • SAM Significance Analysis of Microarrays
  • Kaplan-Meier curves were used to estimate survival in each group.
  • the log-rank test was used to statistically assess differences among multiple survival curves.
  • a Cox proportional-hazards regression model was used for multivariate analysis.
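For readers unfamiliar with the Kaplan-Meier estimate named above, the following is a minimal sketch of the standard product-limit calculation. It is not the authors' code; the function name and the follow-up data are hypothetical, and censoring is encoded as `False` in the `events` list.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time for each patient (for example, months)
    events -- True if the event was observed, False if the patient was censored
    """
    deaths = Counter(t for t, observed in zip(times, events) if observed)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data: five patients, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
```

Censored patients contribute to the at-risk count until their last follow-up but never trigger a drop in the curve.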
  • CTHBP cytosolic thyroid hormone-binding protein
  • IGFBP-3 insulin-like growth factor-binding protein-3
  • PLCl plectin
  • IGFBP-3 is listed twice in the lower portion of the table because this gene is represented by multiple Affymetrix probe sets.
  • Current methods of prognosis in mesothelioma include stage and histology at the time of surgery. However, these techniques are not completely reliable and accurate staging usually requires extensive surgery (3,8,9). Recently, we discovered that simple ratios of gene expression can be used to accurately diagnose cancer (17) while successfully avoiding many of the shortcomings which preclude the use of other microarray analytical techniques in wider clinical applications (10,20). In this study, we describe a technique that uses expression data from four genes to independently predict outcome in mesothelioma patients who undergo extrapleural pneumonectomy followed by standard chemoradiation therapy. Although this analysis only utilized four genes, the expression ratio technique can easily incorporate larger numbers of genes when required for acceptable accuracy.
  • the prognostic tool described herein could dramatically impact the current clinical treatment of mesothelioma by identifying preoperatively patients not likely to respond to conventional treatment modalities, thus sparing them from radical surgery. It is currently our practice to obtain a tissue diagnosis prior to recommending therapy for patients with mesothelioma, but the absence of suitable prognostic molecular markers makes it difficult to assign optimal treatments or investigate new modalities. The results of this work, if confirmed prospectively in a larger patient population, should prove helpful in the development of meaningful clinical trials for patients with mesothelioma.
  • the use of gene expression ratios to predict patient outcome in mesothelioma and other cancers (17) overcomes several major obstacles to the clinical use of microarray data.
  • the expression ratio method generates a simple numerical measure that can be used to predict clinical outcome using a single biopsy specimen. Since this non-linear function of gene expression is a unit-less number and does not require data from additional training samples or from additional reference genes, expression levels can be measured using any reliable method including quantitative RT-PCR, cDNA and oligonucleotide microarrays, SAGE, or perhaps ELISAs for encoded proteins.
  • the expression ratio technique can also facilitate examination of microarray data by investigators without direct access to sophisticated analytical tools. Using previously published data, we have created ratio-based tests using small numbers of genes that can diagnose localized prostate cancer and predict clinical outcome in breast cancer (see Example 4).
  • Aisner J. Diagnosis, staging, and natural history of pleural mesothelioma. In: Aisner J, Arriagada R, Green MR, et al., eds.
  • Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001;98:5116-5121.
  • Current gene expression profiling-based bioinformatics tools are highly accurate in the diagnosis and prognosis of cancer (1-6).
  • the widespread clinical applicability of these techniques is currently limited owing largely to a lack of a practical method for translating complex profiling analyses to functional clinical tests.
  • Our method utilizes a supervised comparison of extensive gene profiling data to identify differentially expressed genes between two groups. Carefully chosen genes are then used to calculate simple expression ratios which in turn are set to predict (in a binary numerical manner) the clinical parameter in question.
  • Tumor tissues. Ten sets of matched normal adjacent prostate and malignant prostate cancer (20 specimens total) were obtained from the Tumor Bank at Brigham and Women's Hospital. Studies utilizing human tissues were approved by and conducted in accordance with the policies of the Institutional Review Board at Brigham and Women's Hospital.
  • Microarray data for prostate tissues were obtained from two sources.
  • PCA 9,984-element cDNA microarray
  • we scaled gene hybridization intensities (i.e., ".cel" files)
  • Gene expression data for breast cancer tissues were obtained from a single source using a microarray containing approximately 25,000 genes (6).
  • the "training set" consisted of two groups of samples: those from 44 patients with greater than 5 years disease-free survival (i.e., relatively good outcome) and those from 34 patients with less than 5 years disease-free survival (i.e., relatively poor outcome).
  • the "test set" consisted of 19 additional profiled patient samples.
  • these genes were not reliably detected (i.e., average expression levels >600) in at least one group of the test set and were not given further consideration.
  • Using these 11 genes, we calculated 30 expression ratios per sample by dividing the expression value of each of the 6 genes expressed at relatively higher levels in NAP/BPH by the expression value of each of the 5 genes expressed at relatively higher levels in PCA. Then, we tested the diagnostic accuracy of these ratios in the 28 training set samples obtained from the same study. Samples with ratio values >1 were called NAP/BPH and those with ratio values <1 were called PCA.
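The pairing of 6 numerator genes with 5 denominator genes to give 30 ratios per sample can be sketched as below. The gene labels `N1` to `N6` and `C1` to `C5`, the expression values, and the function name are placeholders for illustration; treating a ratio of exactly 1 as a "no call" is an assumption, since the source only defines calls for values above or below 1.

```python
def ratio_calls(sample, high_in_normal, high_in_cancer, cutoff=1.0):
    """Form every (normal-high gene) / (cancer-high gene) ratio and call each one."""
    calls = {}
    for numerator in high_in_normal:
        for denominator in high_in_cancer:
            ratio = sample[numerator] / sample[denominator]
            if ratio > cutoff:
                call = "NAP/BPH"
            elif ratio < cutoff:
                call = "PCA"
            else:
                call = "no call"  # assumption: the source does not cover ratio == 1
            calls[(numerator, denominator)] = (ratio, call)
    return calls


# Hypothetical gene labels and expression values.
normal_genes = ["N1", "N2", "N3", "N4", "N5", "N6"]
cancer_genes = ["C1", "C2", "C3", "C4", "C5"]
sample = {gene: 200.0 for gene in normal_genes}
sample.update({gene: 800.0 for gene in cancer_genes})

calls = ratio_calls(sample, normal_genes, cancer_genes)
```

Each of the 30 entries can then be compared against the known tissue type to score the accuracy of the individual ratios.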
  • We performed Kaplan-Meier time-to-relapse analysis using predictions made from this test in the 19 test set samples and found a significant difference (P = 0.0197, Fig. 8) between groups predicted to have widely divergent disease-free survival times.
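The source does not say at this point which test produced the P value. Assuming a two-group log-rank test, which the document names elsewhere for comparing survival curves, the comparison of predicted groups could be sketched as follows. The function name and the relapse times are hypothetical, and the P value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    all_times = list(times1) + list(times2)
    all_events = list(events1) + list(events2)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for u in times1 if u >= t)  # at risk in group 1
        n2 = sum(1 for u in times2 if u >= t)  # at risk in group 2
        d1 = sum(1 for u, e in zip(times1, events1) if e and u == t)
        d2 = sum(1 for u, e in zip(times2, events2) if e and u == t)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("groups never overlap in follow-up; test undefined")
    chi2 = (observed1 - expected1) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical relapse times (months); every relapse observed.
good = [6, 7, 10, 15]
poor = [1, 2, 3, 4]
chi2, p_value = logrank_test(good, [True] * 4, poor, [True] * 4)
```

With real data the two input groups would be the samples predicted to have good and poor outcome by the ratio-based test.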
  • We have not proposed an exact protocol for developing and testing ratio-based predictor models. In fact, we discovered in this study and others (7) that multiple combinations of genes, in the form of ratios, can achieve similarly accurate results. We merely assert that simple ratios can be a highly accurate means of predicting clinical parameters using very small numbers of genes and simpler data acquisition platforms, such as quantitative RT-PCR and/or custom microarrays. Furthermore, this strategy can be used to analyze microarrays without the need for additional reference samples. In the case of prostate cancer, we envision that diagnosis using mRNA obtained from fine needle aspirations would be less invasive than current biopsy techniques and would likely increase compliance and reduce discomfort in men whose prostate-specific antigen levels mandate frequent screening.
  • women with breast cancer undergoing initial diagnostic biopsy could have tissue saved for a similar gene expression ratio based test using quantitative RT-PCR or a custom microarray. Women found to be at high risk for recurrence may be selected for either neo-adjuvant chemotherapy or post-surgical adjuvant therapy.
  • the gene ratio method thus presents an opportunity to translate initial microarray based gene expression profiling to simple clinical tests that are performed using quantitative RT-PCR, microarrays, or other platforms on material obtained surgically or from fine needle aspirations.
  • Example 5 Prediction of Outcomes of Lung Adenocarcinoma Using Expression Profiling Data
  • This example describes the use of published data relating gene expression profiles and outcome in lung adenocarcinoma. A set of gene ratios was generated by analyzing the data from Beer et al. (Nature Med. 8: 816-824, 2002), who used smaller chips (6800 genes), as a training set.
  • the training set ratios were tested using the published data set derived from expression profiling experiments using 12,000 genes (Bhattacharjee et al., Proc. Natl. Acad. Sci. USA. 98: 13790-13795, 2001). The object was to predict good outcome versus recurrence in stage I lung cancer after surgery. As shown below, the ratios derived from the training set data (Beer et al.) could differentiate significantly between good and poor outcomes in the test set data (Bhattacharjee et al.).
  • Gene selection criteria. Genes having a >2-fold higher expression in good or poor outcome samples, and the lowest (best) P values, were selected.
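A selection step of this kind could be sketched as below. The source does not state here which test produced the P values; this sketch assumes a two-sample Student's t statistic with pooled variance and ranks genes by its absolute value, which gives the same ordering as ranking by two-sided P value only when every gene has the same group sizes. The function name, the gene labels, and the expression values are hypothetical, and expression values are assumed to be positive.

```python
import math


def select_genes(good, poor, min_fold=2.0, top_n=10):
    """Pick genes with > min_fold difference in mean expression, ranked by |t|.

    good, poor -- dicts mapping gene name to a list of positive expression
                  values in good-outcome and poor-outcome samples.
    Returns [(gene, group with higher expression)], strongest first.
    """
    candidates = []
    for gene in good:
        g, p = good[gene], poor[gene]
        mean_g, mean_p = sum(g) / len(g), sum(p) / len(p)
        fold = max(mean_g, mean_p) / min(mean_g, mean_p)
        if fold <= min_fold:
            continue
        # Pooled-variance two-sample t statistic (assumed test).
        ss = sum((x - mean_g) ** 2 for x in g) + sum((x - mean_p) ** 2 for x in p)
        pooled = ss / (len(g) + len(p) - 2)
        se = math.sqrt(pooled * (1.0 / len(g) + 1.0 / len(p)))
        t = abs(mean_g - mean_p) / se if se > 0 else math.inf
        higher_in = "good" if mean_g > mean_p else "poor"
        candidates.append((t, gene, higher_in))
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [(gene, higher_in) for _, gene, higher_in in candidates[:top_n]]


# Hypothetical expression values for three placeholder genes.
good = {"A": [10, 12, 11], "B": [100, 110, 90], "C": [5, 6, 7]}
poor = {"A": [30, 33, 27], "B": [105, 95, 100], "C": [20, 40, 12]}
selected = select_genes(good, poor)
```

Genes higher in one outcome group would supply numerators and genes higher in the other group would supply denominators for candidate ratios.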
  • Table 14 Training set gene ratios for predicting outcome
  • Error good: number of errors in predicting good outcome in training set
  • Error poor: number of errors in predicting poor outcome in training set
  • Error total: total number of errors in predicting outcome in training set
  • Error good: number of errors in predicting good outcome in test set
  • Error poor: number of errors in predicting poor outcome in test set
  • Error total: total number of errors in predicting outcome in test set
  • Table 16: Predictions and status for individual tumor samples
  • Stage 1 adenocarcinoma only; excluded patient tissue samples of <40% tumor cell and/or mixed histology.
  • Example 6 Analysis of gene expression data in various cancers for diagnosis and prognosis
  • This example represents analyses of gene expression profiling data presented in the literature for several different types of cancer. Each chart has several lists of genes that are increased in expression or decreased in expression in a given diagnosis or prognosis.
  • the method applied to the analyses of the data uses a combination of ratios of genes from one set always in the numerator and a second set always in the denominator to determine diagnosis or prognosis.
  • the genes used in the ratios for determination of diagnosis or prognosis are numbered.
  • Training set 109 samples
  • Test set 58 samples
  • Table 17 Genes overexpressed in germinal-center B-cell-like (GCB) or type III in diffuse large B-cell lymphoma (DLBCL)
  • Table 18 Genes overexpressed in tumor or normal ovarian epithelium
  • Table 20 Genes overexpressed in diffuse large B-cell lymphoma (DLBCL) or follicular lymphoma (FL)
  • Table 21 Genes overexpressed in subtype activated B-cell-like (ABC) or type III diffuse large-B-cell lymphoma
  • Example 7 Prognosis of lung adenocarcinoma

Abstract

The invention provides methods for diagnosing biological states or conditions based on ratios of gene expression data from tissue samples, such as cancer tissue samples. The invention also provides sets of genes that are expressed differentially in malignant pleural mesothelioma. These sets of genes can be used to discriminate between normal and malignant tissues, and between classes of malignant tissues. Accordingly, diagnostic assays for classification of tumors, prediction of tumor outcome, selecting and monitoring treatment regimens and monitoring tumor progression/regression also are provided.

Description

DIAGNOSTIC AND PROGNOSTIC TESTS
Related Applications
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. provisional application 60/317,389, filed September 5, 2001, and U.S. provisional application 60/ filed August 30, 2002, the entire disclosures of which are incorporated herein by reference.
Government Support
This invention was made in part with government support under grant number DK58849 from the National Institutes of Health. The United States government may have certain rights in this invention.
Field of the Invention
The invention relates to methods for diagnosing conditions, predicting prognoses and optimizing treatment strategies using ratios of gene expression data. The invention also relates to nucleic acid markers for cancer, particularly for distinguishing malignant pleural mesothelioma from other lung cancers or from normal lung tissue, and for distinguishing between subclasses of malignant pleural mesothelioma.
Background of the Invention
Although much progress has been made toward understanding the biological basis of cancer and in its diagnosis and treatment, it is still one of the leading causes of death in the United States. Inherent difficulties in the diagnosis and treatment of cancer include among other things, the existence of many different subgroups of cancer and the concomitant variation in appropriate treatment strategies to maximize the likelihood of positive patient outcome.
Subclassification of cancer has typically relied on the grouping of tumors based on tissue of origin, histology, cytogenetics, immunohistochemistry, and known biological behavior. The pathologic diagnosis used to classify the tumor taken together with the stage of the cancer is then used to predict prognosis and direct therapy. However, current methods of cancer classification and staging are not completely reliable.
Gene expression profiling using microarrays is likely to result in improvements in cancer classification and prediction of prognosis (Golub, 1999; Perou, 2000; Hedenfalk, 2001; Khan, 2001). Still, the wealth of information garnered using microarrays has, thus far, not yielded effective clinical applications. Global expression analysis has led to the development of sophisticated computer algorithms seeking to extend data analysis beyond simple expression profiles (Quackenbush, 2001; Khan, 2001). At this time, however, no clear consensus exists regarding which computational tools are optimal for the analysis of large gene expression profiling data sets, particularly in the clinical setting. Moreover, many of these bioinformatics tools under development and testing are quite complex, leaving the practical use of microarray data beyond the scope of many biomedical scientists and/or clinicians. With rare exceptions (e.g. PSA and prostate cancer), it is generally assumed that expression levels of any one gene are insufficient in the diagnosis and/or prognosis of cancer. However, it is equally erroneous to assume a priori that the expression profiles of large numbers of genes are explicitly required for this purpose.
It is difficult to predict from standard clinical and pathologic features the clinical course of cancer. However, it is very important in the treatment of cancer to select and implement an appropriate combination of therapeutic approaches. The available methods for designing strategies for treating cancer patients are complex and time consuming. The wide range of cancer subgroups and variations in disease progression limit the predictive ability of the healthcare professional. In addition, continuing development of novel treatment strategies and therapeutics will result in the addition of more variables to the already complex decision-making process involving matching the cancer patient with a treatment regimen that is appropriate and optimized for the cancer stage, tumor growth rate, and other factors central to the individual patient's prognosis. Because of the critical importance of selecting appropriate treatment regimens for cancer patients, the development of guidelines for treatment selection is of key interest to those in the medical community and their patients. Thus, there presently is a need for objective, reproducible, and sensitive methods for diagnosing cancer, predicting cancer patient prognosis and outcome, and selecting and monitoring optimal treatment regimens.
Summary of the Invention
Using focused microarray-based expression profiling, a simple method was developed to diagnose and predict outcome in patients with malignant pleural mesothelioma (MPM). MPM is a mesodermally derived, neoplastic disease that arises in the pleura and relentlessly grows into adjacent structures until it ultimately results in the death of the patient. There are three distinct histological subtypes of MPM: epithelial, mixed, and sarcomatoid (Corson, 1996). Tumor specimens that are linked to a comprehensive clinical database were utilized to be able to directly correlate gene expression data to clinical variables such as survival and develop and test novel prognostic and diagnostic tests for MPM and other cancers. Additional tests have proven the applicability to cancers other than MPM, including lung adenocarcinoma, squamous carcinoma, medulloblastoma, prostate cancer, breast cancer, ovarian cancer, leukemias and lymphomas. The diagnostic and prognostic methods that were developed utilize gene expression data from as few as two genes through the use of expression level ratios and rationally chosen thresholds. The effectiveness of unit-less ratios in diagnosing cancer types was demonstrated and confirmed using real time quantitative reverse-transcriptase polymerase chain reaction (RT-PCR). This is a simple, but powerful, use of microarray data that can be easily adapted to a clinical setting to diagnose cancer (and non-cancer tissue or diseases) and predict patient outcome without complex computer software or hardware. Accordingly, diagnostic assays for classification of tumors, prediction of tumor outcome, selecting and monitoring treatment regimens, and monitoring tumor progression/regression can now be based on the ratios of expression of a small number of genes. 
The gene expression ratio concept can be applied to other tissues to diagnose or distinguish between tissues in different biological states, such as tissues from subjects having disease and not having disease, subjects that vary in response to pharmaceutical or that metabolize pharmaceutical at different rates, subjects that vary in disease susceptibility or predisposition, and the like. Thus a subject's prognosis or response to treatments, inter alia, can be determined through analysis of a limited set of genes in particular biological samples. Moreover, the gene expression data can be obtained from, and comparisons can be made between, a number of different methods including nucleic acid hybridization (e.g., microarrays) and nucleic acid amplification methods (e.g., RT-PCR).
According to one aspect of the invention, methods for diagnosing the presence in a biological sample of tissue in a first biological state, preferably cancer cells, in a tissue sample are provided. The methods include providing a set of two or more genes, wherein the set comprises at least one upregulated gene that is expressed in greater amounts in a tissue in a first biological state (preferably cancer cells) than in a second biological state (preferably corresponding non-cancer cells) and at least one downregulated gene that is expressed in lesser amounts in a tissue in the first biological state (preferably cancer cells) than in the second biological state (preferably corresponding non-cancer cells). The methods also include determining the expression levels of the set of two or more genes, and calculating a ratio of the expression level of the upregulated gene to the expression level of the downregulated gene, wherein the ratio is indicative of the presence of tissue in the first biological state (preferably cancer cells) in the tissue sample. Another preferred diagnostic use for the method is to identify non-cancer tissues or diseases.
In certain preferred embodiments, there is at least a 2-fold difference in mean expression levels between the at least one upregulated gene and the at least one downregulated gene. In other preferred embodiments, two or more expression ratios are calculated. In certain embodiments, the two or more expression ratios are combined, preferably by calculating the geometric mean of the two or more expression ratios.
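Combining several expression ratios by their geometric mean can be illustrated with the short sketch below. The function name and the ratio values are hypothetical. The geometric mean is computed through logarithms, which treats a ratio of 4 and a ratio of 1/4 as equal and opposite evidence.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive expression ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical upregulated/downregulated ratios from one sample.
combined = geometric_mean([2.0, 8.0])
balanced = geometric_mean([4.0, 0.25])
```

An arithmetic mean of 4 and 0.25 would be pulled well above 1, whereas the geometric mean of the same pair stays at 1, which is why it suits quantities that are naturally compared on a fold-change scale.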
In certain embodiments, the ratio is calculated by division of the expression level of one upregulated gene by the expression level of one downregulated gene, or by division of the expression levels of two or more upregulated genes by the expression level of one downregulated gene, or by division of the expression level of one upregulated gene by the expression levels of two or more downregulated genes, or by division of the expression levels of two or more upregulated genes by the expression levels of two or more downregulated genes.
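A minimal sketch of the simplest case described above, one upregulated gene divided by one downregulated gene, might look as follows. The function names, the example expression values, and the decision threshold of 1.0 are illustrative assumptions and are not taken from the specification.

```python
def expression_ratio(up_level: float, down_level: float) -> float:
    """Divide the expression level of an upregulated gene by that of a downregulated gene."""
    if up_level <= 0 or down_level <= 0:
        raise ValueError("expression levels must be positive")
    return up_level / down_level


def call_state(ratio: float, threshold: float = 1.0) -> str:
    """Assign a sample to a biological state from a single ratio.

    The threshold of 1.0 is an assumption made for illustration only.
    """
    if ratio > threshold:
        return "first biological state"
    if ratio < threshold:
        return "second biological state"
    return "no call"


# Hypothetical expression levels in arbitrary units.
ratio = expression_ratio(up_level=800.0, down_level=100.0)
print(ratio, call_state(ratio))
```

A ratio exactly at the threshold is reported as a no-call here; in practice the cut-off and any no-call band would have to be chosen from training data.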
In other embodiments, the methods also include transforming the expression level data for the upregulated and/or downregulated genes prior to calculating the ratio.
In still other embodiments, the expression levels are determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification. In preferred embodiments, the nucleic acid hybridization is performed using a solid-phase nucleic acid molecule array. In other preferred embodiments, the nucleic acid amplification method is real-time PCR.
In yet other embodiments, the expression levels are determined by an immunological method, preferably using a solid-phase antibody array, an ELISA or ELISPOT assay. According to preferred embodiments of the foregoing methods, the cancer is selected from the group consisting of malignant pleural mesothelioma, lung adenocarcinoma, squamous carcinoma, medulloblastoma, prostate cancer, breast cancer, diffuse large B-cell lymphoma, follicular lymphoma and ovarian cancer. In certain embodiments, the at least one ratio is indicative of the presence of cancer cells in the tissue sample. In other embodiments, the at least one ratio is indicative of the presence of non-cancer cells in the tissue sample.
Similar methods as those described above, for determining prognosis of a cancer patient, are also provided according to the invention.
According to another aspect of the invention, kits for cancer diagnosis are provided. The kits include a set of one or more ratios applicable to the analysis of gene expression data, wherein the ratio is calculated from the expression levels of at least one upregulated gene that is expressed in greater amounts in the cancer cells than in corresponding non-cancer cells and at least one downregulated gene that is expressed in lesser amounts in cancer cells than in corresponding non-cancer cells. In certain embodiments, the kit also includes instructions for the use of the one or more ratios in the diagnosis of the presence of cancer cells in a biological sample.
According to a further aspect of the invention, diagnostic systems are provided. The diagnostic systems include a measurement device that measures gene expression level data of a set of two or more genes, wherein the set comprises at least one upregulated gene that is expressed in greater amounts in a tissue in a first biological state (preferably cancer cells) than in a second biological state (preferably corresponding non-cancer cells) and at least one downregulated gene that is expressed in lesser amounts in the tissue in the first biological state (preferably cancer cells) than in the second biological state (preferably corresponding non-cancer cells). The system also includes a data transformation device that acquires the gene expression data from the measurement device and performs data transformation to calculate a ratio of the gene expression levels of the upregulated and downregulated genes.
In certain embodiments, the data transformation device selects gene expression data of a selected set of genes from the measurement device for calculating the ratio of the selected set of genes, wherein the ratio calculated from the gene expression data of the selected set of genes is diagnostic for a selected biological state, such as a condition, preferably cancer.
In other embodiments, the cancer diagnostic system also includes a user interface output device to output the ratio to a user. In preferred embodiments, the cancer diagnostic system also includes a database of ratios of gene expression that are diagnostic for cancers, and a comparison device that compares the ratio calculated from the measured gene expression to the diagnostic ratios stored in the database and outputs the comparison to the user interface output device. In other preferred embodiments, the cancer diagnostic system also includes a database of treatment information for specific cancers, wherein the comparison device identifies treatment information in the database for the specific cancer for which the diagnostic ratio matches the calculated ratio, and wherein the comparison outputs the treatment information to the user interface output device.
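The data flow of such a system (measured expression levels, a stored set of diagnostic ratios, and a comparison step) can be sketched roughly as below. This is only an illustration under stated assumptions: the class `DiagnosticRatio`, the function `compare`, the gene names, and the cut-off value are all hypothetical and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticRatio:
    """One hypothetical database entry: a gene pair and the cut-off for its ratio."""
    condition: str
    up_gene: str
    down_gene: str
    cutoff: float  # a ratio above this value is taken to suggest the condition


def compare(measured: dict, database: list) -> list:
    """Compute each stored ratio from measured levels and compare it with its cut-off.

    Returns (condition, ratio, is_above_cutoff) tuples; entries whose genes
    were not measured are skipped.
    """
    results = []
    for entry in database:
        if entry.up_gene not in measured or entry.down_gene not in measured:
            continue
        ratio = measured[entry.up_gene] / measured[entry.down_gene]
        results.append((entry.condition, ratio, ratio > entry.cutoff))
    return results


# Hypothetical database and measurement.
database = [DiagnosticRatio("condition X", "GENE_A", "GENE_B", cutoff=1.0)]
measured = {"GENE_A": 50.0, "GENE_B": 10.0}
for condition, ratio, positive in compare(measured, database):
    print(condition, ratio, positive)
```

The list returned by `compare` stands in for what the comparison device would pass to the user interface output device.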
According to yet another aspect of the invention, methods for diagnosing malignant pleural mesothelioma in a subject suspected of having malignant pleural mesothelioma are provided. The methods include obtaining from the subject a tissue sample suspected of being cancerous, determining the expression of a set of nucleic acid molecules or expression products thereof in the tissue sample, wherein the set of nucleic acid molecules includes at least two nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77. Preferably the set of nucleic acids includes at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
In certain embodiments, the methods include determining the expression of the set of nucleic acid molecules or expression products thereof in a non-cancerous tissue sample, and comparing the expression of the set of nucleic acid molecules or expression products thereof in the tissue sample suspected of being cancerous and the non-cancerous tissue sample. In other embodiments, the methods include calculating a ratio of the expression of at least two genes among the set of nucleic acid molecules.
Methods for selecting a course of treatment of a subject having or suspected of having malignant pleural mesothelioma are provided in another aspect of the invention. The methods include obtaining from the subject a tissue sample suspected of being cancerous, determining the expression of a set of nucleic acid markers or expression products thereof which are differentially expressed in malignant pleural mesothelioma tumor tissue samples, and selecting a course of treatment appropriate to the malignant pleural mesothelioma of the subject. In some embodiments the methods also include calculating a ratio of the expression of at least two genes among the set of nucleic acid markers or expression products thereof. In further embodiments, the methods include determining the expression of the set of nucleic acid molecules or expression products thereof in a non-cancerous tissue sample. In preferred embodiments, the expression of a set of nucleic acid markers is determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification. More preferably, the nucleic acid hybridization is performed using a solid-phase nucleic acid molecule array, and the nucleic acid amplification method is real-time PCR.
In another aspect of the invention, methods for evaluating treatment of malignant pleural mesothelioma are provided. The methods include obtaining a first determination of the expression of a set of nucleic acid molecules, or expression products thereof, which are differentially expressed in a malignant pleural mesothelioma tumor tissue sample from a subject undergoing treatment for cancer, obtaining a second determination of the expression of the set of nucleic acid molecules, or expression products thereof, in a second malignant pleural mesothelioma tumor tissue sample from the subject after obtaining the first determination, and comparing the first determination of expression to the second determination of expression as an indication of evaluation of the treatment. In some embodiments, the determinations of expression are used to calculate a ratio of gene expression. In other embodiments, the methods include determining the expression of a set of nucleic acid markers which are differentially expressed in non-cancerous tissue samples.
In preferred embodiments, the expression of a set of nucleic acid markers is determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification. Preferably, the nucleic acid hybridization is performed using a solid-phase nucleic acid molecule array and the nucleic acid amplification method is real-time PCR.
According to a further aspect of the invention, a solid-phase nucleic acid molecule array is provided which consists essentially of at least two nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 fixed to a solid substrate. In some embodiments, the solid-phase nucleic acid molecule array also includes at least one control nucleic acid molecule. In preferred embodiments, the set of nucleic acid molecules comprises at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
In certain embodiments, the solid substrate comprises a material selected from the group consisting of glass, silica, aluminosilicates, borosilicates, metal oxides such as alumina and nickel oxide, various clays, nitrocellulose, or nylon. In other embodiments, the nucleic acid molecules are fixed to the solid substrate by covalent bonding.
According to still another aspect of the invention, solid-phase protein microarrays are provided that include at least two antibodies or antigen-binding fragments thereof, that specifically bind at least two different polypeptides selected from the group consisting of SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, fixed to a solid substrate.
In some embodiments, the microarray further comprises an antibody or antigen-binding fragment thereof, that binds specifically to a cancer-associated polypeptide other than those selected from the group consisting of SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78. In other embodiments, the protein microarray also includes at least one control polypeptide molecule.
In preferred embodiments, the antibodies are monoclonal antibodies, or polyclonal antibodies.
Methods for identifying lead compounds for a pharmacological agent useful in the treatment of malignant pleural mesothelioma are provided in another aspect of the invention. The methods include contacting a malignant pleural mesothelioma cell or tissue with a candidate pharmacological agent, determining the expression of a set of nucleic acid molecules in the malignant pleural mesothelioma cell or tissue sample under conditions which, in the absence of the candidate pharmacological agent, permit a first amount of expression of the set of nucleic acid molecules wherein the set of nucleic acid molecules comprises at least two nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, and detecting a test amount of the expression of the set of nucleic acid molecules, wherein a decrease in the test amount of expression in the presence of the candidate pharmacological agent relative to the first amount of expression indicates that the candidate pharmacological agent is a lead compound for a pharmacological agent which is useful in the treatment of malignant pleural mesothelioma. In preferred embodiments, the methods also include calculating a ratio of gene expression.
These and other aspects of the invention will be described in greater detail below.
Brief Description of the Drawing
Fig. 1 shows tumor diagnosis using expression ratios. Fig. 1A, patterns of relative expression levels for the 8 genes selected from the training set can be extended to the remaining samples. Relative expression levels increase from low to high per legend. Fig. 1B, graphic depiction of the magnitude and direction, in all 149 samples comprising the test set, of the value for two independent ratios (calretinin/claudin-7 and VAC-β/TACSTD1) chosen for further study. Fig. 1C, the 8 individual samples (represented by colored bars) that were misdiagnosed using one ratio or the other from Fig. 1B (blue bars for misdiagnosed MPM samples, red bars for misdiagnosed ADCA samples).
Fig. 2 depicts validation of microarray data and ratio-based diagnosis. Quantitative RT-PCR was used to obtain ratio values for 12 MPM and 12 ADCA tumors. In this case, the two ratios correctly identified 23/24 samples with one no-call.
Fig. 3 shows Kaplan-Meier survival predictions for medulloblastoma patients. Overall survival for patients predicted to be treatment responders (top line) and treatment failures (bottom line) using a 6-gene (5-ratio) model in a test set of samples (n=40). Hash marks indicate censored data.
Fig. 4 shows the validation of microarray-based analysis of gene expression using real time quantitative RT-PCR. Fig. 4A shows that the average expression levels of CFB, transgelin, and fibronectin are significantly (P<0.05) different in tumor samples from Subclass 1 and Subclass 2. Fig. 4B shows that the expression level ratios remain consistent in distinguishing epithelial tumor samples from all others using data obtained from either microarray analysis or RT-PCR. Ratio represents the average gene expression level in epithelial subtype tumors relative to the average expression level of all other tumors combined. Error bars, SEM; M, data from microarray analysis; RT-PCR, data from quantitative RT-PCR analysis.
Fig. 5 depicts prediction of outcome in MPM using expression ratios or tumor histology. Fig. 5A, survival of 31 MPM patients whose outcome was predicted using a 4-gene expression ratio model. Fig. 5B, survival of the 31 MPM patient samples from Fig. 5A plus 5 additional samples (36 total) as a function of tumor histological subtype. Prediction of outcome using the geometric mean value of 3 expression ratios is more accurate than the use of histological appearance alone at identifying patients with widely divergent outcome (Fig. 5A). Although patients with epithelial histology tumors tend to survive longer, predicting prognosis in this manner is highly inexact for any individual patient (Fig. 5B). Each data point represents a single sample. Circles enclose tumor samples from patients with survival at or near the median for MPM. Horizontal bars depict median survival for each group. *, geometric mean calculated from the 3 most accurate expression ratios used to predict outcome (using data from a total of 4 genes).
Fig. 6 shows Kaplan-Meier survival predictions for mesothelioma patients and verification of microarray data. Fig. 6A, overall survival for all 31 patients from which the training set was chosen. The estimated median survival for the entire cohort was 11 months. Fig. 6B, overall survival based on the histological subtype of the tumor. The top line represents epithelial subtype tumors (median survival=17 months) and the bottom line represents non-epithelial subtype tumors (median survival=8.5 months). Although epithelial subtype tumors tend to favor longer survival, prediction of outcome in this manner is highly inexact and not accurate for individual samples. Fig. 6C, geometric mean values obtained for 6 randomly chosen samples (3 each from good and poor outcome groups) using quantitative RT-PCR confirmed microarray data (M).
Fig. 7 depicts independent validation of the 4-gene expression ratio model. Fig. 7A, overall survival for 29 independent mesothelioma patients. Similar to the initial 31 samples, the estimated median survival for this cohort was 12 months. Fig. 7B, overall survival based on the histological subtype of the tumor. The median survival of epithelial subtype tumors (top line, median survival=17 months) and non-epithelial subtype tumors (bottom line, median survival=12 months) in the new sample set was identical to that for the previous 31 samples and was equally insufficient for predicting outcome. Fig. 7C, overall survival in the new set of samples for good outcome (top line, median survival=36 months) and poor outcome (bottom line, median survival=7 months) groups as defined by the 4-gene expression ratio model (utilizing RT-PCR for data acquisition). The 4-gene expression ratio model significantly (P=0.0035) predicts outcome in mesothelioma in an independent set of 29 samples.
Fig. 8 shows Kaplan-Meier disease-free survival predictions for breast cancer patients. Time to relapse for patients predicted to be good prognosis (top line) and poor prognosis (bottom line) using a 6-ratio model in the test set of samples (n=19). Hash marks indicate censored data.
Fig. 9 shows Kaplan-Meier survival predictions of test set samples for adenocarcinoma patients as described in Example 5. Time to relapse for patients predicted to be good prognosis (top line) and poor prognosis (bottom line) using a 3-ratio model in the test set data of Bhattacharjee et al. Hash marks indicate censored data.
Detailed Description of the Invention
Gene expression profiling using high density oligonucleotide arrays has figured prominently in recent studies using gene expression patterns in cancer to improve diagnosis and subclassification. Specifically, microarrays have been used to distinguish between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) (Golub, 1999), to explore molecular differences within the AML group of diseases (Virtaneva, 2001), to identify subclasses of breast cancer (Perou, 2000) and ovarian carcinoma (Welsh, 2001), and to define the metastatic phenotype of melanoma (Clark, 2000).
Although microarray-based analysis of gene expression in cancer has yielded a wealth of information, effective clinical applications have not followed for several reasons. There are no universally accepted and applicable computational methods to analyze microarray data (Quackenbush, 2001). Also, studies utilizing microarrays have lacked a comprehensive clinical database linking patient characteristics to their tumors' gene expression patterns. Furthermore, the prospect of having to use large numbers of genes to diagnose a disease subclass would require a relatively expensive analytical approach such as microarrays. Finally, sophisticated computer algorithms currently used for analysis of microarrays (Quackenbush, 2001; Khan, 2001) have placed the practical use of the resulting data beyond the reach of many biomedical scientists. These limitations were addressed using focused gene expression profiling of MPM in combination with an extensive clinical database to create an unexpectedly simple and effective ratio method with general clinical applicability (i.e., for cancers beyond MPM) for performing relatively low cost diagnosis and prediction of prognosis in cancer. In contrast to many microarray-based studies seeking to compare gene expression patterns between two or more predefined groups, unsupervised clustering was first used for class discovery in MPM. In this way, the introduction of experimental bias that follows from assuming that tumors of the same histological subtype necessarily possess similar gene expression profiles was avoided. By extension, prognostic genes were identified based on differential expression levels between tumors that were members of the two subclasses with the best and worst prognoses, and not based simply on tumor histology.
The fact that the prognostic genes so identified also distinguish epithelial tumors is coincidental, though not surprising since patients with epithelial subtype tumors tend to survive longer than those with mixed subtype tumors.
Subclassification using unsupervised clustering also presents a more biologically relevant organization. It has been shown that similar tumor appearance in itself does not necessitate similar patterns of gene expression nor final clinical outcome. For example, it is not unusual for patients with lung cancers of identical histology, differentiation, location, and stage to have diverging survival (Mountain, 1997). In the experiments described herein, one subclass contained tumors of all three major histological subtypes, suggesting (i) that tumors of diverse appearance are more similar than originally thought, (ii) that all subtypes of MPM are correctly classified as a single disease, and (iii) that histology alone is not sufficient to determine prognosis.
Patient outcome depends on the phenotype of individual tumors at the molecular level, and this is reflected directly in gene expression. The recent explosion of bioinformatics has facilitated exploration of complex patterns of gene expression in human tissues (Fodor, 1997). However, exact relationships between gene expression patterns in cancer and clinical data remain largely undefined. Sophisticated computer algorithms have been recently developed capable of molecular diagnosis of tumors using the immense data sets generated by expression profiling (Khan, 2001). Though valid, the widespread clinical applicability of these techniques in the foreseeable future is questionable. The study described herein shows that diagnosis and prognosis of cancer using data originally obtained from microarrays is not explicitly dependent on the use of increasingly complex technology or complicated methods. Microarrays themselves are evolving at a rapid pace and gene expression analysis in this manner remains an expensive endeavor. Therefore, comparing historical data to that obtained from new generation microarrays remains a priority for most investigators. Yet there are no satisfactory solutions to date that adequately address all of the normalization issues encountered when attempting to merge data from older microarrays, or those from multiple manufacturers. Examination of ratios of gene expression, as described herein, as opposed to absolute expression levels, also assists in the practical use of data from the older generation of commercially obtained microarrays. The invention described herein also relates to the identification of a set of genes expressed in cancer tissue that are diagnostic for the cancer and/or predictive of the clinical outcome of the cancer. In one aspect, ratios of gene expression are used as indicia of cancer type, cancer class, and/or cancer prognosis, all of which are useful for determining a course of treatment of a patient.
Changes in cell phenotype in cancer are often the result of one or more changes in the genome expression of the cell. Some genes are expressed in tumor cells, and not in normal cells. Other genes are expressed at higher or lower levels in cancer cells than in normal counterparts. In addition, certain genes are expressed at different levels in different subgroups of cancers, which have different prognoses and require different treatment regimens to optimize patient outcome. The differential expression of such genes can be examined by the assessment of nucleic acid or protein expression in the cancer tissue.
One of the recent developments in gene expression analysis involves the use of microarrays to measure simultaneously the expression of hundreds or thousands of genes. Practical application of this technology requires that researchers or laboratories have a sophisticated knowledge of molecular biology to generate gene expression data, and of computer algorithms for analysis of the large quantities of data generated by the use of the microarrays. The requirements for such knowledge make the use of microarrays impractical in the clinical setting, and difficult even for research laboratories. In addition, one must account for differences in microarray architecture, sample preparation, and analytical equipment that captures the signals from the microarrays.
The use of gene expression ratios in the diagnosis and prediction of prognosis in cancer overcomes several major obstacles to the clinical use of microarray data. The methodology described herein avoids the technical difficulties described above. It generates a simple numerical measure that can be used to predict various aspects of patient clinical data (such as histological subtype and survival) using a single patient biopsy sample. Since this non-linear function of gene expression is a unit-less number, expression levels can be measured using any reliable method such as quantitative RT-PCR or microarrays (nucleic acid or protein) regardless of the type of data capture equipment. Thus, the present invention permits the diagnosis of cancer by clinical laboratories using standard equipment without the requirement for sophisticated data analysis.
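The unit-less property can be illustrated with a short sketch: if two platforms report the same sample on scales that differ only by a common multiplicative factor, the ratio of two genes is unchanged. The gene names, expression values, and scale factor below are hypothetical, and the sketch assumes that both genes share the same factor; additive background offsets or gene-specific probe effects would not cancel in this way.

```python
import math


def ratio(x: float, y: float) -> float:
    """Simple expression ratio x/y of two genes measured on the same platform."""
    return x / y


def rescale(levels: dict, factor: float) -> dict:
    """Mimic a second platform whose signal differs by one multiplicative factor."""
    return {gene: level * factor for gene, level in levels.items()}


# Hypothetical readings for one sample, in arbitrary units.
platform_a = {"GENE_X": 1200.0, "GENE_Y": 300.0}
platform_b = rescale(platform_a, factor=0.5)

ratio_a = ratio(platform_a["GENE_X"], platform_a["GENE_Y"])
ratio_b = ratio(platform_b["GENE_X"], platform_b["GENE_Y"])
print(ratio_a, ratio_b, math.isclose(ratio_a, ratio_b))
```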
Importantly, the diagnostic/prognostic accuracy of ratios permits an earlier definitive diagnosis using initial biopsy samples and reveals important clues about anticipated patient outcome prior to the assignment of treatment strategies. Considering the clinical treatment of MPM, for example, an initial diagnosis is usually made for patients presenting with a malignant pleural effusion. Typically, this diagnosis is confirmed prior to subjecting patients to major surgical resections. Unfortunately, standard pathological techniques for diagnosis even at this point may be inadequate due to a lack of suitable quantities of tissue. As a consequence, the histological subtype of the tumor initially diagnosed may not always be the same as that conclusively determined at the time of surgery (samples analyzed in this study were obtained at surgery when a sufficient amount of tissue was available for a definitive pathological diagnosis). This makes it difficult, if not impossible, to stratify treatment based on histological subtyping by prevailing methods. Diagnosis of other cancers is hampered by similar problems. Ratios obtained using tumor tissues taken at the time of initial biopsy can provide a firm diagnosis, determine subclass, and predict outcome after therapy when current pathological techniques are insufficient.
The invention also provides a new, more powerful method of stratifying patients with MPM (and more generally, is applicable to other cancers and other biological states and conditions). It has been previously documented that patients with the epithelial subtype generally enjoy a better prognosis than patients with non-epithelial histology (regardless of treatment strategy) and benefit from aggressive surgical resection. However, this is not an all-inclusive phenomenon; some patients with non-epithelial histology enjoy a longer survival than those with epithelial histology. These factors make it difficult to design clinical studies to explore alternative treatment strategies based on histological subtype.
The results presented herein provide a basis for at least one rational explanation of the aforementioned phenomenon: within MPM, there are actually two classes of epithelial tumors and two classes of mixed tumors. A series of simple tests utilizing ratios of gene expression is proposed that can determine with a high degree of accuracy the correct tumor histological subtype/subclass, and the likely clinical outcome of the patient. This information can be produced from a small tissue biopsy and does not require major surgery. Such classification is useful in the development of meaningful clinical trials in MPM. It therefore can be hypothesized that patients found to have tumors representative of those in Subclass 2 (short-lived mixed subtype) are excellent candidates for neo-adjuvant chemotherapy protocols as they are unlikely to benefit from surgery, whereas patients in Subclasses 1 and 3 are more likely to enjoy long term survival after surgical therapy. Expression ratios involving two genes that vary in expression between different sample types (e.g., cancer/non-cancer) were used to diagnose and predict prognosis in MPM. Diagnostic and/or prognostic genes in general can be initially identified from microarray analysis and then be tested for clinical relevancy using simpler methods such as RT-PCR. To accomplish similar feats in other biological conditions or states, including other cancers, it may be necessary to use expression ratios including different mathematical combinations and/or more than two genes. The ratio concept described herein (e.g., for clinical use) is simply the relationship between the expression levels of multiple genes that vary in expression between two different sample types, i.e., samples that have different biological properties or were obtained from subjects having different phenotypes, such as cancer/non-cancer phenotypes, responsive/non-responsive to stimuli, susceptible/not susceptible to disease, different metabolic functions, etc.
Non-linear unit-less ratios, in any form, can still remain simple if a relatively small number of genes are used in such a way as to not require complex computational software. Therefore, expression ratios of selected genes that vary in expression in two different biological samples may be used to translate complex data sets into simple tests that give clinically useful information for the diagnosis and prediction of prognosis of cancers.
Ratios of gene expression levels can be calculated from expression data of two or more genes at the mRNA level and/or protein level. Expression levels of two or more isoforms or variants of the same gene (e.g., splice variants or post-translationally modified variants) also can be used in the ratios. In contrast to prior methods for comparing gene expression, which compared the expression levels of genes relative to a gene having substantially unchanging expression (e.g., a housekeeping gene), the present method compares the expression of two or more genes that differ in expression between two (or more) biological states. Thus in a preferred embodiment, ratios are calculated from expression data of two or more genes, wherein one of the two or more genes is expressed at higher levels in a first biological state relative to the second biological state (upregulated in the first biological state), and a second of the two or more genes is expressed at lower levels in the first biological state relative to the second biological state (downregulated in the first biological state). Examples of this are demonstrated herein, wherein the expression levels of two or more genes that differ in expression in mesothelioma and normal tissue, or in subclasses of mesothelioma, are used to calculate ratios that effectively predict the phenotype of unknown tissue samples. The ratios can be simple ratios (e.g., x/y) or more complex ratios that include mathematical manipulation of gene expression levels, for example, (x+a)/(y+b) or x³/y³, wherein x and y represent the expression level data for genes X and Y, and a and b can be either expression level data for genes A and B, or mathematical factors. The use of the ratios is not limited to one set of two genes. Additional sets of genes (two sets, three sets, or more sets) may be required to provide an optimally accurate diagnosis of certain biological states or conditions (e.g., cancers) based on the expression of certain sets of genes.
Thus the methods are not limited to a ratio of two genes; a total of 4, 6, or more genes and various ratios of them may be used. Further transformation of the data in the form of multiple gene expression ratios also can be performed. In certain preferred embodiments, the geometric mean of multiple gene ratios is calculated. The expression data used to calculate the ratios may be obtained using any art-known method for analyzing gene expression including microarrays (e.g., standard or custom arrays; nucleic acid, protein or antibody arrays), quantitative RT-PCR, antibody or other immunoassay measurements, etc.
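As a hedged illustration of combining several gene ratios, the sketch below computes their geometric mean in log space. The function name and the example ratios are hypothetical, and the requirement that every ratio be positive is an assumption of this sketch, not a statement from the text.

```python
import math
from typing import Sequence


def geometric_mean_of_ratios(ratios: Sequence[float]) -> float:
    """Geometric mean of positive ratios, computed via the mean of logarithms."""
    if not ratios:
        raise ValueError("at least one ratio is required")
    if any(r <= 0 for r in ratios):
        raise ValueError("all ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Three hypothetical gene-pair ratios; their product is 64, so the
# geometric mean is the cube root of 64, i.e. approximately 4.0.
print(geometric_mean_of_ratios([2.0, 8.0, 4.0]))
```

Unlike an arithmetic mean, the geometric mean treats a two-fold increase and a two-fold decrease symmetrically, which suits quantities that are themselves ratios.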
The ratios can be used to diagnose any condition having a genetic component in which two or more genes are differentially expressed in two or more biological states. Conditions include diseases, susceptibility to diseases, metabolic functions (e.g., variability in the metabolism of drugs), response to injury, responses to local cellular environments and the like. In preferred embodiments, the condition is a disease. For example, any diseases that are characterized by (1) the relative increase in the expression of a first gene in a first disease state, and (2) the relative increase in the expression of a second gene in a second disease state or nondisease state, can be diagnosed using ratios of gene expression. Preferred examples of such diseases are cancers, as demonstrated herein for malignant pleural mesothelioma. The ratios of gene expression also can be used to predict a condition outcome or condition prognosis, to monitor onset of a condition, to monitor treatment, and to select a course of treatment for a condition.
The gene expression data for calculation of the ratios may be obtained from analysis of biological samples including tissue, blood, urine, cerebrospinal fluid or other bodily fluids of a subject (e.g., humans or other animals). The expression data can be used without any transformation to calculate a simple ratio of two or more genes as exemplified in the Examples, or data transformation can be applied prior to, or as a part of, calculating the ratios. The ratio calculation and/or data transformation can be performed by the device that captures the expression data (e.g., a device for performing real-time PCR or a microarray reader), or can be performed by a separate computer running appropriate software.
In certain embodiments, software for calculating ratios as described herein can be provided on a computer connected by data link to a data generating device, such as a microarray reader or PCR machine. Any standard data link can be used, including serial or parallel cables, radio frequency or infrared telemetry links, LAN connections, WAN connections, etc. Alternatively, data can be transferred by computer-readable medium (e.g., magnetic or optical medium) and read by the software. The data also can be entered directly by the user via user interface, such as a keyboard, monitor, mouse, graphical user interface such as touch screen, etc. The computer may be contained within the data generating device, providing an integrated system for generating raw data, calculating ratios, and displaying such ratios. One or more computers also may be linked to one or more data generating devices and one or more display devices, such as in a local area network or wide area network.
After acquiring the raw gene expression data from the data generating device, the data for the variables examined can be used to calculate gene expression ratios in accordance with the methods of the invention. The software can allow the user to select a number of genes preferred for diagnosis or prognosis, or the software may calculate ratios for a standardized set or sets of genes (e.g., genes known to be useful for classification of a tissue type or set of tissue types). The software can execute data transformation algorithms from a preselected group, or can allow the user to input other algorithms. The ratio data can be stored in a data file, printed, and/or directly displayed to the user on a graphical user interface.
In one embodiment of the invention, a visual display is used to display the ratio data for the classification, diagnosis and/or prediction of prognosis. The visual display can be a graphical user interface, such as a monitor, or a printer.
The invention also relates to the identification of a set of genes that permit confirmation of the presence of malignant pleural mesothelioma cells in biological samples. Probes for the expression of the genes can be incorporated into a custom array for diagnosis of malignant pleural mesothelioma. The genes identified permit, inter alia, rapid screening of cancer samples by nucleic acid microarray hybridization or protein expression technology to determine the expression of the specific genes and thereby to predict the outcome of the cancer. A microarray also can be used to diagnose malignant pleural mesothelioma, distinguish it from lung cancer (adenocarcinoma and squamous carcinoma), normal lung tissue and/or pleura. One also can use the custom arrays (or standard arrays that contain the genes identified herein) to identify the histological subtype of MPM, the subclass of MPM for determining prognosis. Such screening is beneficial, for example, in selecting the course of treatment to provide to the cancer patient (i.e., directing therapy), and to monitor the efficacy of a treatment.
The invention differs from traditional cancer diagnostic and classification techniques with respect to the speed, simplicity, and reproducibility of the cancer diagnostic assay. The invention also differs from other microarray-based diagnostic methods in that it does not require extensive data analysis or data transformation employing complex algorithms.
Further, the invention differs from other cancer diagnostic methods in that it permits accurate diagnosis and classification of tumors by the analysis of a limited set of genes. The use of a limited set of genes in the methods permits the use of simpler methods for acquisition of data, e.g., nucleic acid hybridization based methods such as RT-PCR, that do not generate massive quantities of data from parallel analysis of a large number of genes. The invention also presents targets for drug development because it identifies genes that are differentially expressed in tumors, which can be utilized in the development of drugs to treat such tumors, e.g., by reducing expression of the genes or reducing activity of proteins encoded by the genes. The invention simplifies prognosis determination by providing an identified set of a small number of genes whose level of expression in malignant pleural mesothelioma predicts clinical outcome as defined by, e.g., patient survival times. In developing the invention, RNA expression phenotyping was performed using high density microarrays generated from quantitative expression data on over 12,000 genes, which have been analyzed to identify specific probe sets (genes). The expression gene set has multifold uses including, but not limited to, the following examples. The expression gene set may be used as a prognostic tool for malignant pleural mesothelioma patients, to make possible more finely tuned diagnosis of malignant pleural mesothelioma and allow healthcare professionals to tailor treatment to individual patients' needs. The invention can also assess the efficacy of cancer treatment by determining progression or regression of malignant pleural mesothelioma cancer in patients before, during, and after treatment. Another utility of the expression gene set is in the biotechnology and pharmaceutical industries' research on disease pathway discovery for therapeutic targeting. 
The invention can identify alterations in gene expression in malignant pleural mesothelioma and can also be used to uncover and test candidate pharmaceutical agents to treat malignant pleural mesothelioma.
As used herein, a subject is a human, non-human primate, cow, horse, pig, sheep, goat, dog, cat, or rodent. In all embodiments human subjects are preferred. In aspects of the invention pertaining to diagnosis of malignant pleural mesothelioma, the subject is a human either suspected of having malignant pleural mesothelioma, or having been diagnosed with malignant pleural mesothelioma. In aspects of the invention pertaining to cancer diagnosis in general, using the non-linear methods employing ratios of gene expression described herein, the subject preferably is a human suspected of having cancer, or a human having been previously diagnosed as having cancer. Methods for identifying subjects suspected of having cancer may include physical examination, subject's family medical history, subject's medical history, biopsy, or a number of imaging technologies such as ultrasonography, computed tomography, magnetic resonance imaging, magnetic resonance spectroscopy, or positron emission tomography. Diagnostic methods for cancer and the clinical delineation of cancer diagnoses are well known to those of skill in the medical arts.
As used herein, a tissue sample is tissue obtained from a tissue biopsy using methods well known to those of ordinary skill in the related medical arts. The phrase "suspected of being cancerous" as used herein means a cancer tissue sample believed by one of ordinary skill in the medical arts to contain cancerous cells. Methods for obtaining the sample from the biopsy include gross apportioning of a mass, microdissection, laser-based microdissection, or other art-known cell-separation methods.
Because of the variability of the cell types in diseased-tissue biopsy material, and the variability in sensitivity of the diagnostic methods used, the sample size required for analysis may range from 1, 10, 50, 100, 200, 300, 500, 1000, 5000, 10,000, to 50,000 or more cells. The appropriate sample size may be determined based on the cellular composition and condition of the biopsy and the standard preparative steps for this determination and subsequent isolation of the nucleic acid for use in the invention are well known to one of ordinary skill in the art. An example of this, although not intended to be limiting, is that in some instances a sample from the biopsy may be sufficient for assessment of RNA expression without amplification, but in other instances the lack of suitable cells in a small biopsy region may require use of RNA conversion and/or amplification methods or other methods to enhance resolution of the nucleic acid molecules. Such methods, which allow use of limited biopsy materials, are well known to those of ordinary skill in the art and include, but are not limited to: direct RNA amplification, reverse transcription of RNA to cDNA, amplification of cDNA, or the generation of radio-labeled nucleic acids.
As used herein, the phrase determining the expression of a set of nucleic acid molecules in the tissue means identifying RNA transcripts in the tissue sample by analysis of nucleic acid or protein expression in the tissue sample. As used herein for diagnosis of MPM and/or determination of outcome of MPM patients, "set" refers to a group of nucleic acid molecules that include 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 different nucleic acid sequences from the group of 26 nucleic acid sequences in Table 1 (SEQ ID NOs: 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57 and 59) and/or from the group of 11 nucleic acid sequences in Table 3 (SEQ ID NOs: 43, 45, 61, 63, 65, 67, 69, 71, 73, 75 and 77). Other sets will be used for other malignancies or other disorders to determine gene ratios for diagnosis, outcome determination and the like; some of these data sets are described in the Examples below. The expression of the set of nucleic acid molecules in the sample from the patient suspected of having malignant pleural mesothelioma can be compared to the expression of the set of nucleic acid molecules in a sample of tissue that is non-cancerous. As used herein with respect to diagnosis of malignant pleural mesothelioma, non-cancerous tissue means tissue determined by one of ordinary skill in the medical art to have no evidence of malignant pleural mesothelioma based on standard diagnostic methods including, but not limited to, histologic staining and microscopic analysis.
Nucleic acid markers for cancer are nucleic acid molecules that by their presence or absence indicate the presence or absence of malignant pleural mesothelioma. In tissue, certain nucleic acid molecules are expressed at different levels depending on whether tissue is non-cancerous or cancerous.
Hybridization methods for nucleic acids are well known to those of ordinary skill in the art (see, e.g. Molecular Cloning: A Laboratory Manual, J. Sambrook, et al., eds., Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989, or Current Protocols in Molecular Biology, F.M. Ausubel, et al., eds., John Wiley & Sons, Inc., New York). The nucleic acid molecules from a malignant pleural mesothelioma tissue sample hybridize under stringent conditions to nucleic acid markers expressed in malignant pleural mesothelioma. In one embodiment the markers are sets of two or more of the nucleic acid molecules as set forth in Table 1 (SEQ ID NOs: 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57 and 59) or Table 3 (SEQ ID Nos: 43, 45, 61, 63, 65, 67, 69, 71, 73, 75, 77).
The malignant pleural mesothelioma nucleic acid markers disclosed herein are known genes and fragments thereof. It may be desirable to identify variants of those genes, such as allelic variants or single nucleotide polymorphisms (SNPs) in tissues. Accordingly, methods for identifying malignant pleural mesothelioma nucleic acid markers, including variants of the disclosed full-length cDNAs, genomic DNAs, and SNPs are also included in the invention. The methods include contacting a nucleic acid sample (such as a cDNA library, genomic library, genomic DNA isolate, etc.) with a nucleic acid probe or primer derived from one of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77. The nucleic acid sample and the probe or primer hybridize to complementary nucleotide sequences of nucleic acids in the sample, if any are present, allowing detection of nucleic acids related to SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77. Preferably the probe or primer is detectably labeled. The specific conditions, reagents, and the like can be selected by one of ordinary skill in the art to selectively identify nucleic acids related to sets of two or more of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77. The isolated nucleic acid molecule can be sequenced according to standard procedures.
In addition to native nucleic acid markers (SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77), the invention also includes degenerate nucleic acids that include alternative codons to those present in the native materials. For example, serine residues are encoded by the codons TCA, AGT, TCC, TCG, TCT, and AGC. Each of the six codons is equivalent for the purposes of encoding a serine residue. Similarly, nucleotide sequence triplets that encode other amino acid residues include, but are not limited to: CCA, CCC, CCG, and CCT (proline codons); CGA, CGC, CGG, CGT, AGA, and AGG (arginine codons); ACA, ACC, ACG, and ACT (threonine codons); AAC and AAT (asparagine codons); and ATA, ATC, and ATT (isoleucine codons). Other amino acid residues may be encoded similarly by multiple nucleotide sequences. Thus, the invention embraces degenerate nucleic acids that differ from the biologically isolated nucleic acids in codon sequence due to the degeneracy of the genetic code. The invention also provides modified nucleic acid molecules, which include additions, substitutions, and deletions of one or more nucleotides such as the allelic variants and SNPs described above. In preferred embodiments, these modified nucleic acid molecules and/or the polypeptides they encode retain at least one activity or function of the unmodified nucleic acid molecule and/or the polypeptides, such as hybridization, antibody binding, etc. In certain embodiments, the modified nucleic acid molecules encode modified polypeptides, preferably polypeptides having conservative amino acid substitutions. As used herein, a "conservative amino acid substitution" refers to an amino acid substitution which does not alter the relative charge or size characteristics of the protein in which the amino acid substitution is made.
Conservative substitutions of amino acids include substitutions made amongst amino acids within the following groups: (a) M, I, L, V; (b) F, Y, W; (c) K, R, H; (d) A, G; (e) S, T; (f) Q, N; and (g) E, D. The modified nucleic acid molecules are structurally related to the unmodified nucleic acid molecules and in preferred embodiments are sufficiently structurally related to the unmodified nucleic acid molecules so that the modified and unmodified nucleic acid molecules hybridize under stringent conditions known to one of skill in the art.
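The codon degeneracy and the conservative substitution groups (a) through (g) described above can be expressed as simple lookups. The Python sketch below is illustrative only: the function and variable names are hypothetical, the codon table contains only the codons named in the text rather than the full genetic code, and treating an unchanged residue as "conservative" is an assumption of this sketch.

```python
# Codons named in the text (DNA alphabet); not the complete genetic code.
CODONS = {
    "Ser": {"TCA", "AGT", "TCC", "TCG", "TCT", "AGC"},
    "Pro": {"CCA", "CCC", "CCG", "CCT"},
    "Arg": {"CGA", "CGC", "CGG", "CGT", "AGA", "AGG"},
    "Thr": {"ACA", "ACC", "ACG", "ACT"},
    "Asn": {"AAC", "AAT"},
    "Ile": {"ATA", "ATC", "ATT"},
}

# Conservative substitution groups (a)-(g), one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    set("MILV"), set("FYW"), set("KRH"), set("AG"),
    set("ST"), set("QN"), set("ED"),
]


def are_synonymous(codon_1: str, codon_2: str) -> bool:
    """True if both codons encode the same residue in the partial table."""
    return any(codon_1 in s and codon_2 in s for s in CODONS.values())


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall within the same listed group."""
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


print(are_synonymous("TCA", "AGC"))  # True: both are serine codons
print(are_synonymous("TCA", "CCA"))  # False: serine vs. proline
print(is_conservative("K", "R"))     # True: both in group (c)
print(is_conservative("K", "E"))     # False: groups (c) and (g)
```

Residues absent from the listed groups, such as cysteine and proline, never count as conservative substitutions in this sketch.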
For example, modified nucleic acid molecules that encode polypeptides having single amino acid changes can be prepared for use in the methods and products disclosed herein. Each of these nucleic acid molecules can have one, two, or three nucleotide substitutions exclusive of nucleotide changes corresponding to the degeneracy of the genetic code as described herein. Likewise, modified nucleic acid molecules that encode polypeptides having two amino acid changes can be prepared, which have, e.g., 2-6 nucleotide changes. Numerous modified nucleic acid molecules like these will be readily envisioned by one of skill in the art, including for example, substitutions of nucleotides in codons encoding amino acids 2 and 3, 2 and 4, 2 and 5, 2 and 6, and so on. In the foregoing example, each combination of two amino acids is included in the set of modified nucleic acid molecules, as well as all nucleotide substitutions that code for the amino acid substitutions. Additional nucleic acid molecules that encode polypeptides having additional substitutions (i.e., 3 or more), additions or deletions [e.g., by introduction of a stop codon or a splice site(s)] also can be prepared and are embraced by the invention as readily envisioned by one of ordinary skill in the art. Any of the foregoing nucleic acids can be tested by routine experimentation for retention of structural relation to or activity similar to the nucleic acids disclosed herein. In the invention, standard hybridization techniques of microarray technology are utilized to assess patterns of nucleic acid expression and identify nucleic acid marker expression.
Microarray technology, which is also known by other names including: DNA chip technology, gene chip technology, and solid-phase nucleic acid array technology, is well known to those of ordinary skill in the art and is based on, but not limited to, obtaining an array of identified nucleic acid probes on a fixed substrate, labeling target molecules with reporter molecules (e.g., radioactive, chemiluminescent, or fluorescent tags such as fluorescein, Cy3-dUTP, or Cy5-dUTP), hybridizing target nucleic acids to the probes, and evaluating target-probe hybridization. A probe with a nucleic acid sequence that perfectly matches the target sequence will, in general, result in detection of a stronger reporter-molecule signal than will probes with less perfect matches. Many components and techniques utilized in nucleic acid microarray technology are presented in The Chipping Forecast, Nature Genetics, Vol. 21, Jan 1999, the entire contents of which are incorporated by reference herein. According to the present invention, microarray substrates may include but are not limited to glass, silica, aluminosilicates, borosilicates, metal oxides such as alumina and nickel oxide, various clays, nitrocellulose, or nylon. In all embodiments a glass substrate is preferred. According to the invention, probes are selected from the group of nucleic acids including, but not limited to: DNA, genomic DNA, cDNA, and oligonucleotides; and may be natural or synthetic. Oligonucleotide probes preferably are 20 to 25-mer oligonucleotides and DNA/cDNA probes preferably are 500 to 5000 bases in length, although other lengths may be used. Appropriate probe length may be determined by one of ordinary skill in the art by following art-known procedures.
In one embodiment, preferred probes are sets of two or more of the nucleic acid molecules set forth as SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 (see also Table 1 and Table 3). Probes may be purified to remove contaminants using standard methods known to those of ordinary skill in the art such as gel filtration or precipitation.
In one embodiment, the microarray substrate may be coated with a compound to enhance synthesis of the probe on the substrate. Such compounds include, but are not limited to, oligoethylene glycols. In another embodiment, coupling agents or groups on the substrate can be used to covalently link the first nucleotide or oligonucleotide to the substrate. These agents or groups may include, but are not limited to: amino, hydroxy, bromo, and carboxy groups. These reactive groups are preferably attached to the substrate through a hydrocarbyl radical such as an alkylene or phenylene divalent radical, one valence position occupied by the chain bonding and the remaining attached to the reactive groups. These hydrocarbyl groups may contain up to about ten carbon atoms, preferably up to about six carbon atoms. Alkylene radicals are usually preferred containing two to four carbon atoms in the principal chain. These and additional details of the process are disclosed, for example, in U.S. Patent 4,458,066, which is incorporated by reference in its entirety.
In one embodiment, probes are synthesized directly on the substrate in a predetermined grid pattern using methods such as light-directed chemical synthesis, photochemical deprotection, or delivery of nucleotide precursors to the substrate and subsequent probe production.
In another embodiment, the substrate may be coated with a compound to enhance binding of the probe to the substrate. Such compounds include, but are not limited to: polylysine, amino silanes, amino-reactive silanes (Chipping Forecast, 1999) or chromium (Gwynne and Page, 2000). In this embodiment, presynthesized probes are applied to the substrate in a precise, predetermined volume and grid pattern, utilizing a computer-controlled robot to apply probe to the substrate in a contact-printing manner or in a non-contact manner such as ink jet or piezo-electric delivery. Probes may be covalently linked to the substrate with methods that include, but are not limited to, UV-irradiation. In another embodiment probes are linked to the substrate with heat.
Targets are nucleic acids selected from the group, including but not limited to: DNA, genomic DNA, cDNA, RNA, mRNA and may be natural or synthetic. In all embodiments, nucleic acid molecules from human tissue are preferred. The tissue may be obtained from a subject or may be grown in culture (e.g., from a malignant pleural mesothelioma cell line). In embodiments of the invention one or more control nucleic acid molecules are attached to the substrate. Preferably, control nucleic acid molecules allow determination of factors including but not limited to: nucleic acid quality and binding characteristics; reagent quality and effectiveness; hybridization success; and analysis thresholds and success. Control nucleic acids may include but are not limited to expression products of genes such as housekeeping genes or fragments thereof.
In one embodiment of the invention, expression of nucleic acid markers is used to select clinical treatment paradigms for cancers, such as malignant pleural mesothelioma. Treatment options, as described herein, may include but are not limited to: radiotherapy, chemotherapy, adjuvant therapy, or any combination of the aforementioned methods. Aspects of treatment that may vary include, but are not limited to: dosages, timing of administration, or duration of therapy; and may or may not be combined with other treatments, which may also vary in dosage, timing, or duration. Another treatment for malignant pleural mesothelioma is surgery, which can be utilized either alone or in combination with any of the aforementioned treatment methods. One of ordinary skill in the medical arts may determine an appropriate treatment paradigm based on evaluation of differential expression of sets of two or more genes, such as those set forth as SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 for malignant pleural mesothelioma. Cancers that express markers that are indicative of a more aggressive cancer or poor prognosis may be treated with more aggressive therapies.
Progression or regression of malignant pleural mesothelioma is determined by comparison of two or more different malignant pleural mesothelioma tissue samples taken at two or more different times from a subject. For example, progression or regression may be evaluated by assessments of expression of sets of two or more of the nucleic acid targets, preferably using ratios of expression, including but not limited to SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, in a malignant pleural mesothelioma tissue sample from a subject before, during, and following treatment for malignant pleural mesothelioma. Progression or regression of other cancers or disease states would be determined similarly.
In another embodiment, novel pharmacological agents useful in the treatment of malignant pleural mesothelioma can be identified by assessing variations in the expression of sets of two or more malignant pleural mesothelioma nucleic acid markers (preferably, variations in the ratios of expression), from among SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, prior to and after contacting malignant pleural mesothelioma cells or tissues with candidate pharmacological agents for the treatment of malignant pleural mesothelioma. The cells may be grown in culture (e.g., from a malignant pleural mesothelioma cell line), or may be obtained from a subject (e.g., in a clinical trial of candidate pharmaceutical agents to treat malignant pleural mesothelioma). Alterations in expression of two or more sets of nucleic acid markers, from among SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, in malignant pleural mesothelioma cells or tissues tested before and after contact with a candidate pharmacological agent to treat malignant pleural mesothelioma, indicate progression, regression, or stasis of the malignant pleural mesothelioma, thereby indicating efficacy of candidate agents and concomitant identification of lead compounds for therapeutic use in malignant pleural mesothelioma.
The invention further provides efficient methods of identifying pharmacological agents or lead compounds for agents active at the level of malignant pleural mesothelioma cellular function. Generally, the screening methods involve assaying for compounds that beneficially alter malignant pleural mesothelioma nucleic acid molecule expression. Such methods are adaptable to automated, high-throughput screening of compounds.
The assay mixture comprises a candidate pharmacological agent. Typically, a plurality of assay mixtures are run in parallel with different agent concentrations to obtain a different response to the various concentrations. Typically, one of these concentrations serves as a negative control, i.e., at zero concentration of agent or at a concentration of agent below the limits of assay detection. Candidate agents encompass numerous chemical classes, although typically they are organic compounds. Preferably, the candidate pharmacological agents are small organic compounds, i.e., those having a molecular weight of more than 50 yet less than about 2500, preferably less than about 1000 and, more preferably, less than about 500. Candidate agents comprise functional chemical groups necessary for structural interactions with polypeptides and/or nucleic acids, and typically include at least an amine, carbonyl, hydroxyl, or carboxyl group, preferably at least two of the functional chemical groups and more preferably at least three of the functional chemical groups. The candidate agents can comprise cyclic carbon or heterocyclic structure and/or aromatic or polyaromatic structures substituted with one or more of the above-identified functional groups. Candidate agents also can be biomolecules such as peptides, saccharides, fatty acids, sterols, isoprenoids, purines, pyrimidines, derivatives or structural analogs of the above, or combinations thereof and the like. Where the agent is a nucleic acid, the agent typically is a DNA or RNA molecule, although modified nucleic acids as defined herein are also contemplated. Candidate agents are obtained from a wide variety of sources including libraries of synthetic or natural compounds. 
For example, numerous means are available for random and directed synthesis of a wide variety of organic compounds and biomolecules, including expression of randomized oligonucleotides, synthetic organic combinatorial libraries, phage display libraries of random peptides, and the like. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant, and animal extracts are available or readily produced. Additionally, natural and synthetically produced libraries and compounds can readily be modified through conventional chemical, physical, and biochemical means. Further, known pharmacological agents may be subjected to directed or random chemical modifications such as acylation, alkylation, esterification, amidification, etc. to produce structural analogs of the agents.
A variety of other reagents also can be included in the mixture. These include reagents such as salts, buffers, neutral proteins (e.g., albumin), detergents, etc. which may be used to facilitate optimal protein-protein and/or protein-nucleic acid binding. Such a reagent may also reduce non-specific or background interactions of the reaction components. Other reagents that improve the efficiency of the assay such as protease inhibitors, nuclease inhibitors, antimicrobial agents, and the like may also be used.
The mixture of the foregoing assay materials is incubated under conditions whereby the anti-cancer candidate agent specifically binds the cellular binding target, a portion thereof or analog thereof. The order of addition of components, incubation temperature, time of incubation, and other parameters of the assay may be readily determined. Such experimentation merely involves optimization of the assay parameters, not the fundamental composition of the assay. Incubation temperatures typically are between 4°C and 40°C. Incubation times preferably are minimized to facilitate rapid, high throughput screening, and typically are between 0.1 and 10 hours.
After incubation, the presence or absence of specific binding between the anti-malignant pleural mesothelioma candidate agent and one or more binding targets is detected by any convenient method available to the user. For cell-free binding type assays, a separation step is often used to separate bound from unbound components. The separation step may be accomplished in a variety of ways. Conveniently, at least one of the components is immobilized on a solid substrate, from which the unbound components may be easily separated. The solid substrate can be made of a wide variety of materials and in a wide variety of shapes, e.g., microtiter plate, microbead, dipstick, resin particle, etc. The substrate preferably is chosen to maximize signal-to-noise ratios, primarily to minimize background binding, as well as for ease of separation and cost.
Separation may be effected, for example, by removing a bead or dipstick from a reservoir, emptying or diluting a reservoir such as a microtiter plate well, or rinsing a bead, particle, chromatographic column or filter with a wash solution or solvent. The separation step preferably includes multiple rinses or washes. For example, when the solid substrate is a microtiter plate, the wells may be washed several times with a washing solution, which typically includes those components of the incubation mixture that do not participate in specific binding, such as salts, buffer, detergent, non-specific protein, etc. Where the solid substrate is a magnetic bead, the beads may be washed one or more times with a washing solution and isolated using a magnet.
Detection may be effected in any convenient way for cell-based assays such as two- or three-hybrid screens. The transcript resulting from a reporter gene transcription assay of the anti-cancer agent binding to a target molecule typically encodes a directly or indirectly detectable product, e.g., β-galactosidase activity, luciferase activity, and the like. For cell-free binding assays, one of the components usually comprises, or is coupled to, a detectable label. A wide variety of labels can be used, such as those that provide direct detection (e.g., radioactivity, luminescence, optical or electron density, etc.) or indirect detection (e.g., epitope tag such as the FLAG epitope, enzyme tag such as horseradish peroxidase, etc.). The label may be bound to an anti-cancer agent binding partner, or incorporated into the structure of the binding partner.
A variety of methods may be used to detect the label, depending on the nature of the label and other assay components. For example, the label may be detected while bound to the solid substrate or subsequent to separation from the solid substrate. Labels may be directly detected through optical or electron density, radioactive emissions, nonradiative energy transfers, etc. or indirectly detected with antibody conjugates, streptavidin-biotin conjugates, etc. Methods for detecting the labels are well known in the art.
The invention thus generally provides cancer gene- or protein-specific binding agents, methods of identifying and making such agents, and their use in diagnosis, therapy and pharmaceutical development. For example, malignant pleural mesothelioma gene- or protein-specific pharmacological agents are useful in a variety of diagnostic and therapeutic applications as described herein. In general, the specificity of a cancer gene or protein binding to a binding agent is shown by binding equilibrium constants. Targets that are capable of selectively binding a cancer gene preferably have binding equilibrium constants of at least about 10⁷ M⁻¹, more preferably at least about 10⁸ M⁻¹, and most preferably at least about 10⁹ M⁻¹. A wide variety of cell-based and cell-free assays may be used to demonstrate cancer gene-specific binding. Cell-based assays include one-, two- and three-hybrid screens, assays in which cancer gene-mediated transcription is inhibited or increased, etc. Cell-free assays include cancer gene-protein binding assays, immunoassays, etc. Other assays useful for screening agents which bind cancer polypeptides include fluorescence resonance energy transfer (FRET) and electrophoretic mobility shift analysis (EMSA).
In another aspect of the invention, pre- and post-treatment alterations in expression of two or more sets of cancer nucleic acid markers, for example malignant pleural mesothelioma cancer nucleic acid markers including, but not limited to, SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, in cancer cells or tissues may be used to assess treatment parameters including, but not limited to: dosage, method of administration, timing of administration, and combination with other treatments as described herein.
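The binding equilibrium constants given earlier in this passage are association constants (units of M⁻¹). As a minimal sketch only, the following Python fragment converts such a constant into the equivalent dissociation constant and reports which of the three stated preference levels it meets. The function names and the use of hard cutoffs are illustrative assumptions; the text itself says "at least about", so the boundaries are not meant to be exact.

```python
# Association-constant thresholds taken from the text (units: M^-1).
PREFERRED = 1e7
MORE_PREFERRED = 1e8
MOST_PREFERRED = 1e9


def dissociation_constant(ka_per_molar):
    """Return Kd (in M) as the reciprocal of an association constant Ka (in M^-1)."""
    if ka_per_molar <= 0:
        raise ValueError("association constant must be positive")
    return 1.0 / ka_per_molar


def binding_tier(ka_per_molar):
    """Name the highest preference level met by Ka (hypothetical labels)."""
    if ka_per_molar >= MOST_PREFERRED:
        return "most preferred"
    if ka_per_molar >= MORE_PREFERRED:
        return "more preferred"
    if ka_per_molar >= PREFERRED:
        return "preferred"
    return "below preferred range"


if __name__ == "__main__":
    ka = 5e8  # hypothetical measured association constant
    print(binding_tier(ka), dissociation_constant(ka))
```

Under these assumptions, a larger association constant corresponds to a smaller dissociation constant, so the 10⁹ M⁻¹ level corresponds to nanomolar-range binding.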
Candidate pharmacological agents may include antisense oligonucleotides that selectively bind to a cancer-related nucleic acid marker molecule, as identified herein, to reduce the expression of the marker molecules in cancer cells and tissues. One of ordinary skill in the art can test the effects of a reduction of expression of cancer nucleic acid marker sequences in vivo or in vitro, to determine the efficacy of one or more antisense oligonucleotides.
As used herein, the term "antisense oligonucleotide" or "antisense" describes an oligonucleotide that is an oligoribonucleotide, oligodeoxyribonucleotide, modified oligoribonucleotide, or modified oligodeoxyribonucleotide, which hybridizes under physiological conditions to DNA comprising a particular gene or to an mRNA transcript of that gene and, thereby, inhibits the transcription of that gene and/or the translation of that mRNA. The antisense molecules are designed so as to interfere with transcription or translation of a target gene upon hybridization with the target gene or transcript. Those skilled in the art will recognize that the exact length of the antisense oligonucleotide and its degree of complementarity with its target will depend upon the specific target selected, including the sequence of the target and the particular bases which comprise that sequence. It is preferred that the antisense oligonucleotide be constructed and arranged so as to bind selectively with the target under physiological conditions, i.e., to hybridize substantially more to the target sequence than to any other sequence in the target cell under physiological conditions.
Based upon the sequences of cancer expressed nucleic acids, or upon allelic or homologous genomic and/or cDNA sequences, one of skill in the art can easily choose and synthesize any of a number of appropriate antisense molecules for use in accordance with the present invention. In order to be sufficiently selective and potent for inhibition, such antisense oligonucleotides should comprise at least 10 and, more preferably, at least 15 consecutive bases that are complementary to the target, although in certain cases modified oligonucleotides as short as 7 bases in length have been used successfully as antisense oligonucleotides (Wagner et al., 1996). Most preferably, the antisense oligonucleotides comprise a complementary sequence of 20-30 bases. Although oligonucleotides may be chosen that are antisense to any region of the gene or mRNA transcripts, in preferred embodiments the antisense oligonucleotides correspond to N-terminal or 5' upstream sites such as translation initiation, transcription initiation, or promoter sites. In addition, 3'-untranslated regions may be targeted. Targeting to mRNA splicing sites has also been used in the art but may be less preferred if alternative mRNA splicing occurs. In addition, the antisense is targeted, preferably, to sites in which mRNA secondary structure is not expected (see, e.g., Sainio et al., 1994) and at which proteins are not expected to bind. Finally, although the listed sequences are cDNA sequences, one of ordinary skill in the art may easily derive the genomic DNA corresponding to the cDNA of a cancer expressed polypeptide. Thus, the present invention also provides for antisense oligonucleotides that are complementary to the genomic DNA corresponding to cancer expressed nucleic acids, e.g., the malignant pleural mesothelioma nucleic acid markers described herein. Similarly, the use of antisense to allelic or homologous cDNAs and genomic DNAs is enabled without undue experimentation.
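The selection of complementary stretches described above can be illustrated with a short Python sketch. It is not a design tool: it only enumerates fixed-length windows of a target sequence and returns the reverse complement of each as a candidate antisense sequence. The function names, the default length of 20 (the lower end of the most preferred 20-30 base range), and the example target are illustrative assumptions; secondary structure, protein binding sites and selectivity against other sequences in the cell are not considered.

```python
# Map each base to its DNA complement; U (RNA) pairs with A like T does.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    seq = seq.upper()
    if set(seq) - set("ACGTU"):
        raise ValueError("sequence contains characters other than A, C, G, T, U")
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_antisense(target, length=20):
    """Return (offset, oligo) pairs, one for every window of `length` bases in `target`.

    Each oligo is complementary to the window starting at `offset`.
    """
    if length < 1 or length > len(target):
        raise ValueError("length must be between 1 and len(target)")
    return [
        (offset, reverse_complement(target[offset:offset + length]))
        for offset in range(len(target) - length + 1)
    ]


if __name__ == "__main__":
    # Hypothetical 10-base target, with a short window so the output stays small.
    for offset, oligo in candidate_antisense("ATGGCCAAGT", length=6):
        print(offset, oligo)
```

In practice each candidate would still need to be screened against the considerations listed in the text before synthesis.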
In one set of embodiments, the antisense oligonucleotides of the invention may be composed of "natural" deoxyribonucleotides, ribonucleotides, or any combination thereof. That is, the 5' end of one native nucleotide and the 3' end of another native nucleotide may be covalently linked, as in natural systems, via a phosphodiester internucleoside linkage. These oligonucleotides may be prepared by art-recognized methods, which may be carried out manually or by an automated synthesizer. They also may be produced recombinantly by vectors.
In preferred embodiments, however, the antisense oligonucleotides of the invention also may include "modified" oligonucleotides. That is, the oligonucleotides may be modified in a number of ways that do not prevent them from hybridizing to their target but which enhance their stability or targeting or which otherwise enhance their therapeutic effectiveness. The term "modified oligonucleotide" as used herein describes an oligonucleotide in which (1) at least two of its nucleotides are covalently linked via a synthetic internucleoside linkage (i.e., a linkage other than a phosphodiester linkage between the 5' end of one nucleotide and the 3' end of another nucleotide) and/or (2) a chemical group not normally associated with nucleic acids has been covalently attached to the oligonucleotide. Preferred synthetic internucleoside linkages are phosphorothioates, alkylphosphonates, phosphorodithioates, phosphate esters, alkylphosphonothioates, phosphoramidates, carbamates, carbonates, phosphate triesters, acetamidates, carboxymethyl esters, and peptides.
The term "modified oligonucleotide" also encompasses oligonucleotides with a covalently modified base and/or sugar. For example, modified oligonucleotides include oligonucleotides having backbone sugars that are covalently attached to low molecular weight organic groups other than a hydroxyl group at the 3' position and other than a phosphate group at the 5' position. Thus modified oligonucleotides may include a 2'-O-alkylated ribose group. In addition, modified oligonucleotides may include sugars such as arabinose instead of ribose. The present invention, thus, contemplates pharmaceutical preparations containing modified antisense molecules that are complementary to and hybridizable with, under physiological conditions, malignant pleural mesothelioma expressed nucleic acids, together with pharmaceutically acceptable carriers.
Antisense oligonucleotides may be administered as part of a pharmaceutical composition. Such a pharmaceutical composition may include the antisense oligonucleotides in combination with any standard physiologically and/or pharmaceutically acceptable carriers which are known in the art. The compositions should be sterile and contain a therapeutically effective amount of the antisense oligonucleotides in a unit of weight or volume suitable for administration to a patient. The term "pharmaceutically acceptable" means a non-toxic material that does not interfere with the effectiveness of the biological activity of the active ingredients. The term "physiologically acceptable" refers to a non-toxic material that is compatible with a biological system such as a cell, cell culture, tissue, or organism. The characteristics of the carrier will depend on the route of administration. Physiologically and pharmaceutically acceptable carriers include diluents, fillers, salts, buffers, stabilizers, solubilizers, and other materials, which are well known in the art.
Expression of cancer nucleic acid molecules can also be determined using protein measurement methods, e.g., for use in the ratio-based diagnostic and prognostic methods described herein. For example, the expression of malignant pleural mesothelioma genes such as SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, can be determined by examining the expression of polypeptides encoded by SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 (exemplary translations are provided herein as SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78). Preferred methods of specifically and quantitatively measuring proteins include, but are not limited to: mass spectroscopy-based methods such as surface enhanced laser desorption ionization (SELDI; e.g., Ciphergen ProteinChip System), non-mass spectroscopy-based methods, immunoassay methods such as ELISA and immunohistochemistry-based methods such as 2-dimensional gel electrophoresis.
SELDI methodology may, through procedures known to those of ordinary skill in the art, be used to vaporize microscopic amounts of tumor protein and to create a "fingerprint" of individual proteins, thereby allowing simultaneous measurement of the abundance of many proteins in a single sample. Preferably SELDI-based assays may be utilized to classify tumors. Such assays preferably include, but are not limited to, the following examples. Gene products discovered by RNA microarrays may be selectively measured by specific (antibody mediated) capture to the SELDI protein disc (e.g., selective SELDI). Gene products discovered by protein screening (e.g., with 2-D gels) may be resolved by "total protein SELDI" optimized to visualize those particular markers of interest from among polypeptides encoded by SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 (e.g., SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78). Predictive models of tumor classification from SELDI measurement of multiple markers from among polypeptides encoded by SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 (e.g., SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78) may be utilized for the SELDI strategies. The invention also involves agents such as polypeptides that bind to malignant pleural mesothelioma-associated polypeptides, e.g., SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78.
Such binding agents can be used, for example, in screening assays to detect the presence or absence of malignant pleural mesothelioma-associated polypeptides and complexes of malignant pleural mesothelioma-associated polypeptides and their binding partners and in purification protocols to isolate malignant pleural mesothelioma-associated polypeptides and complexes of malignant pleural mesothelioma-associated polypeptides and their binding partners. Such agents also may be used to inhibit the native activity of the malignant pleural mesothelioma-associated polypeptides, for example, by binding to such polypeptides.
The invention, therefore, embraces peptide binding agents which, for example, can be antibodies or fragments of antibodies having the ability to selectively bind to malignant pleural mesothelioma-associated polypeptides. Antibodies include polyclonal and monoclonal antibodies, prepared according to conventional methodology.
Significantly, as is well-known in the art, only a small portion of an antibody molecule, the paratope, is involved in the binding of the antibody to its epitope (see, in general, Clark, W.R. (1986) The Experimental Foundations of Modern Immunology, Wiley & Sons, Inc., New York; Roitt, I. (1991) Essential Immunology, 7th Ed., Blackwell Scientific Publications, Oxford). The pFc' and Fc regions, for example, are effectors of the complement cascade but are not involved in antigen binding. An antibody from which the pFc' region has been enzymatically cleaved, or which has been produced without the pFc' region, designated an F(ab')2 fragment, retains both of the antigen binding sites of an intact antibody. Similarly, an antibody from which the Fc region has been enzymatically cleaved, or which has been produced without the Fc region, designated an Fab fragment, retains one of the antigen binding sites of an intact antibody molecule. Proceeding further, Fab fragments consist of a covalently bound antibody light chain and a portion of the antibody heavy chain denoted Fd. The Fd fragments are the major determinant of antibody specificity (a single Fd fragment may be associated with up to ten different light chains without altering antibody specificity) and Fd fragments retain epitope-binding ability in isolation.
Within the antigen-binding portion of an antibody, as is well-known in the art, there are complementarity determining regions (CDRs), which directly interact with the epitope of the antigen, and framework regions (FRs), which maintain the tertiary structure of the paratope (see, in general, Clark, 1986; Roitt, 1991). In both the heavy chain Fd fragment and the light chain of IgG immunoglobulins, there are four framework regions (FR1 through FR4) separated respectively by three complementarity determining regions (CDR1 through CDR3). The CDRs, and in particular the CDR3 regions, and more particularly the heavy chain CDR3, are largely responsible for antibody specificity. It is now well-established in the art that the non-CDR regions of a mammalian antibody may be replaced with similar regions of conspecific or heterospecific antibodies while retaining the epitopic specificity of the original antibody. This is most clearly manifested in the development and use of "humanized" antibodies in which non-human CDRs are covalently joined to human FR and/or Fc/pFc' regions to produce a functional antibody. See, e.g., U.S. patents 4,816,567, 5,225,539, 5,585,089, 5,693,762 and 5,859,205.
Fully human monoclonal antibodies also can be prepared by immunizing mice transgenic for large portions of human immunoglobulin heavy and light chain loci. Following immunization of these mice (e.g., XenoMouse (Abgenix), HuMAb mice (Medarex/GenPharm)), monoclonal antibodies can be prepared according to standard hybridoma technology. These monoclonal antibodies will have human immunoglobulin amino acid sequences and therefore will not provoke human anti-mouse antibody (HAMA) responses when administered to humans.
Thus, as will be apparent to one of ordinary skill in the art, the present invention also provides for F(ab')2, Fab, Fv and Fd fragments; chimeric antibodies in which the Fc and/or FR and/or CDR1 and/or CDR2 and/or light chain CDR3 regions have been replaced by homologous human or non-human sequences; chimeric F(ab')2 fragment antibodies in which the FR and/or CDR1 and/or CDR2 and/or light chain CDR3 regions have been replaced by homologous human or non-human sequences; chimeric Fab fragment antibodies in which the FR and/or CDR1 and/or CDR2 and/or light chain CDR3 regions have been replaced by homologous human or non-human sequences; and chimeric Fd fragment antibodies in which the FR and/or CDR1 and/or CDR2 regions have been replaced by homologous human or non-human sequences. The present invention also includes so-called single chain antibodies.
Thus, the invention involves the use of polypeptides of numerous size and type that bind specifically to polypeptides selected from those encoded by SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 (e.g., SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78), and complexes of both malignant pleural mesothelioma-associated polypeptides and their binding partners. These polypeptides may be derived also from sources other than antibody technology. For example, such polypeptide binding agents can be provided by degenerate peptide libraries which can be readily prepared in solution, in immobilized form or as phage display libraries. Combinatorial libraries also can be synthesized of peptides containing one or more amino acids. Libraries further can be synthesized of peptoids and non-peptide synthetic moieties.
Phage display can be particularly effective in identifying binding peptides useful according to the invention. Briefly, one prepares a phage library (using, e.g., M13, fd, or lambda phage), displaying inserts from 4 to about 80 amino acid residues using conventional procedures. The inserts may represent, for example, a completely degenerate or biased array. One then can select phage-bearing inserts which bind to the malignant pleural mesothelioma-associated polypeptide. This process can be repeated through several cycles of reselection of phage that bind to the malignant pleural mesothelioma-associated polypeptide. Repeated rounds lead to enrichment of phage bearing particular sequences. DNA sequence analysis can be conducted to identify the sequences of the expressed polypeptides. The minimal linear portion of the sequence that binds to the malignant pleural mesothelioma-associated polypeptide can be determined. One can repeat the procedure using a biased library containing inserts containing part or all of the minimal linear portion plus one or more additional degenerate residues upstream or downstream thereof. Yeast two-hybrid screening methods also may be used to identify polypeptides that bind to the malignant pleural mesothelioma-associated polypeptides.
Thus, the malignant pleural mesothelioma-associated polypeptides of the invention, including fragments thereof, can be used to screen peptide libraries, including phage display libraries, to identify and select peptide binding partners of the malignant pleural mesothelioma-associated polypeptides of the invention. Such molecules can be used, as described, for screening assays, for purification protocols, for interfering directly with the functioning of malignant pleural mesothelioma-associated polypeptides and for other purposes that will be apparent to those of ordinary skill in the art. For example, isolated malignant pleural mesothelioma-associated polypeptides can be attached to a substrate (e.g., chromatographic media, such as polystyrene beads, a filter, or an array substrate), and then a solution suspected of containing the binding partner may be applied to the substrate. If a binding partner that can interact with malignant pleural mesothelioma-associated polypeptides is present in the solution, then it will bind to the substrate-bound malignant pleural mesothelioma-associated polypeptide. The binding partner then may be isolated.
As detailed herein, the foregoing antibodies and other binding molecules may be used, for example, to identify tissues expressing protein or to purify protein. Antibodies also may be coupled to specific diagnostic labeling agents for imaging of cells and tissues that express malignant pleural mesothelioma-associated polypeptides or to therapeutically useful agents according to standard coupling procedures. Diagnostic agents include, but are not limited to, barium sulfate, iocetamic acid, iopanoic acid, ipodate calcium, diatrizoate sodium, diatrizoate meglumine, metrizamide, tyropanoate sodium and radiodiagnostics including positron emitters such as fluorine-18 and carbon-11, gamma emitters such as iodine-123, technetium-99m, iodine-131 and indium-111, and nuclides for nuclear magnetic resonance such as fluorine and gadolinium.
The invention further includes protein microarrays for analyzing expression of malignant pleural mesothelioma-associated peptides selected from those encoded by SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 (e.g., SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78). In this aspect of the invention, standard techniques of microarray technology are utilized to assess expression of the malignant pleural mesothelioma-associated polypeptides and/or identify biological constituents that bind such polypeptides. The constituents of biological samples include antibodies, lymphocytes (particularly T lymphocytes), and the like. Protein microarray technology, which is also known by other names including protein chip technology and solid-phase protein array technology, is well known to those of ordinary skill in the art and is based on, but not limited to, obtaining an array of identified peptides or proteins on a fixed substrate, binding target molecules or biological constituents to the peptides, and evaluating such binding. See, e.g., G. MacBeath and S.L. Schreiber, "Printing Proteins as Microarrays for High-Throughput Function Determination," Science 289(5485):1760-1763, 2000.
Preferably antibodies or antigen binding fragments thereof that specifically bind polypeptides selected from the group consisting of those encoded by SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 (e.g., SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78) are attached to the microarray substrate in accordance with standard attachment methods known in the art. These arrays can be used to quantify the expression of the polypeptides identified herein.
In some embodiments of the invention, one or more control peptide or protein molecules are attached to the substrate. Preferably, control peptide or protein molecules allow determination of factors such as peptide or protein quality and binding characteristics, reagent quality and effectiveness, hybridization success, and analysis thresholds and success. The use of such methods to determine expression of malignant pleural mesothelioma nucleic acids from among SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 and/or proteins encoded by SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 (e.g., SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78) can be done with routine methods known to those of ordinary skill in the art and the expression determined by protein measurement methods may be used as a prognostic method for selecting treatment strategies for malignant pleural mesothelioma patients.
Examples
Example 1: Diagnosis of Thoracic Malignancies Using Gene Expression Ratios
Introduction
Malignant pleural mesothelioma (MPM) is a highly lethal pleural malignancy (1). Patients with MPM frequently present with a malignant unilateral pleural effusion or pleural thickening. However, adenocarcinoma (ADCA) metastatic to the pleura of lung or other origin is a far more common etiology for patients presenting with a unilateral pleural effusion (1). The ultimate treatment strategies depend on the correct pathological diagnosis. Early MPM is best treated with extrapleural pneumonectomy followed by chemoradiation, whereas metastatic lung cancer is treated with chemotherapy alone (2). Not infrequently, distinguishing MPM from ADCA of the lung is challenging from both clinical and pathological perspectives (3). Fluid cytology is diagnostic in only 33% of the cases (2, 4) and sufficient additional tissue from an open surgical biopsy is often required for immunohistochemistry and cytogenetic analysis (1). Current bioinformatics tools recently applied to microarray data have shown utility in predicting both cancer diagnosis (5) and outcome (6). Though highly accurate, their widespread clinical relevance and applicability are unresolved. The minimum number of predictor genes is not known, and the discrimination function can vary (for the same genes) based on the location and protocol used for sample preparation (5). Profiling with microarrays requires relatively large quantities of RNA, making the process inappropriate for certain applications. Also, it has yet to be determined if these approaches can utilize relatively low-cost and widely available data acquisition platforms such as RT-PCR and still retain significant predictive capabilities. Finally, the major limitation in translating microarray profiling to patient care is that this approach cannot currently be used to diagnose individual samples independently and without comparison to a predictor model generated from samples whose data was acquired on the same platform.
In this study we have explored an alternative approach using gene expression measurements to predict clinical parameters in cancer. Specifically, we have explored the feasibility of a simple, inexpensive test with widespread applicability that utilizes ratios of gene expression levels and rationally chosen thresholds to accurately distinguish between genetically disparate tissues. This approach circumvents many of the problems that prevent the penetration of expression profiling research into the clinical setting. We found that expression ratio-based diagnosis of MPM and lung cancer was similarly accurate compared to standard statistical methods of class discrimination such as linear discrimination analysis (7) and similar models (5) while addressing many of their deficiencies.
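The ratio-and-threshold idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the threshold of 1, the class labels assigned to each side of the threshold, the handling of a ratio exactly at the threshold, and the example expression values are all hypothetical.

```python
def expression_ratio(level_a, level_b):
    """Ratio of the expression level of gene A to that of gene B in one sample."""
    if level_a <= 0 or level_b <= 0:
        raise ValueError("expression levels must be positive")
    return level_a / level_b


def classify_by_ratio(level_a, level_b, threshold=1.0,
                      label_above="MPM", label_below="ADCA"):
    """Assign a class from a single gene-pair ratio and a fixed threshold.

    Gene A is assumed to be more highly expressed in `label_above` tumors and
    gene B in `label_below` tumors (an assumption made for this sketch).
    """
    ratio = expression_ratio(level_a, level_b)
    if ratio > threshold:
        return label_above
    if ratio < threshold:
        return label_below
    return "undetermined"


if __name__ == "__main__":
    # Hypothetical expression levels for one sample.
    print(classify_by_ratio(1200.0, 150.0))
```

Because only the ratio of two measurements from the same sample is used, the call needs no reference set measured on the same platform, which is the practical advantage the passage points to.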
Materials and Methods
Tumor tissues. A combined total of 245 discarded MPM and lung ADCA surgical specimens were freshly collected (and snap frozen) from patients who underwent surgery at Brigham and Women's Hospital (BWH) between 1993 and 2001. Lung ADCA tumors consisted of both primary malignancies and metastatic ADCAs of breast and colon origin. All MPM samples used in these studies contained relatively pure tumor (greater than 50% tumor cells in a high power field examined in a section adjacent to the tissue used). Linked clinical and pathological data were obtained for all patients who contributed tumor specimens and rendered anonymous to protect patient confidentiality. Studies utilizing human tissues were approved by and conducted in accordance with the policies of the Institutional Review Board at BWH.
Microarray experiments. Total RNA (7 μg) was prepared from whole tumor blocks using Trizol Reagent (Invitrogen Life Technologies, Carlsbad, California) and processed as described (8-10). cRNA was hybridized to human U95A oligonucleotide probe arrays (Affymetrix, Santa Clara, CA) using a protocol described previously (10). Data from 64 of 245 samples were discarded after visual inspection of hybridization data revealed obvious scanning artifacts, leaving a total of 31 MPM samples and 150 ADCA samples (139 patient tumors and 11 duplicates). Microarrays for all ADCA samples and 12 MPM samples were processed at the Dana-Farber Cancer Institute and the Whitehead Institute. The remaining 19 MPM samples were processed separately at BWH. Microarray data for the ADCA samples have been previously published (11). Bhattacharjee and colleagues used microarray data from ADCAs utilized in this study, in combination with additional samples but not MPM, to identify distinct subclasses within ADCA of the lung and to search for prognostic markers. However, their study did not provide any comparison of gene expression between ADCA and MPM.
Real time quantitative RT-PCR. Total RNA (2 μg) was reverse-transcribed into cDNA using Taq-Man Reverse Transcription reagents (Applied Biosystems, Foster City, CA) and quantified using all recommended controls for SYBR Green-based detection. Primers amplifying portions of claudin-7, VAC-β, TACSTD1, and calretinin cDNA (synthesized by Invitrogen Life Technologies) had the following sequences (forward and reverse):
claudin-7: 5'-GTTCCTGTCCTGGGAATGAG-3' (SEQ ID NO:87) and 5'-AAGGAGATCCCAGGTCACAC-3' (SEQ ID NO:88);
VAC-β: 5'-CCAGCCTTTCGGTCTTCTAT-3' (SEQ ID NO:89) and 5'-CTGGAGGAAGTTGGGAAGAG-3' (SEQ ID NO:90);
TACSTD1: 5'-AGCAGCTTGAAACTGGCTTT-3' (SEQ ID NO:91) and 5'-AACGATGGAGTCCAAGTTCTG-3' (SEQ ID NO:92); and
calretinin: 5'-AGGACCTGGAGATTGTGCTC-3' (SEQ ID NO:93) and 5'-GAGTCTGGGTAGACGCATCA-3' (SEQ ID NO:94).
Data analysis. Gene expression levels were appropriately scaled to facilitate comparison of data from arrays hybridized at different times and/or using multiple scanners. When the "average difference" was negative (i.e. negligible expression level), the absolute value was used. A two-tailed Student's t-test was used to compare the log(gene expression levels) for all 12,600 genes on the microarray between samples from a training set consisting of 16 MPM and 16 ADCA samples. Differences in the mean log(expression levels) between the two groups in the training set were considered statistically significant if P<2×10⁻⁶. Statistical comparisons (including linear discrimination analysis) were performed using S-PLUS (12). To generate the graphical representations of relative gene expression levels, all expression levels were first normalized within samples by setting the average (mean) to 0 and the standard deviation to 1. Scaled levels were assigned RGB values (representing 20 shades) for colorimetric display as a spectrum representing relative gene expression levels.
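The within-sample normalization and shade assignment described above can be illustrated with a minimal Python sketch. This is not the authors' S-PLUS code: the function names, the clipping range `z_limit`, and the example expression values are hypothetical, and the text does not say whether a population or sample standard deviation was used.

```python
import statistics

def standardize(levels):
    """Scale one sample's expression levels to mean 0 and standard deviation 1."""
    mean = statistics.fmean(levels)
    sd = statistics.pstdev(levels)  # assumption: population SD
    return [(x - mean) / sd for x in levels]

def shade_index(z, n_shades=20, z_limit=3.0):
    """Map a standardized level to one of n_shades bins (0 = lowest).

    z_limit is a hypothetical clipping range; the source does not state one.
    """
    clipped = max(-z_limit, min(z_limit, z))
    fraction = (clipped + z_limit) / (2 * z_limit)  # 0.0 to 1.0
    return min(n_shades - 1, int(fraction * n_shades))

# Made-up expression levels for a single sample (absolute values already taken).
sample = [120.0, 850.0, 40.0, 2300.0, 610.0]
z_scores = standardize(sample)
shades = [shade_index(z) for z in z_scores]
```

Each shade index could then be looked up in a 20-entry RGB palette to draw the display.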
Results
Identification of Diagnostic Molecular Markers. We searched all of the genes represented on the microarray for those with a highly significant difference (P<2×10⁻⁶, >8-fold) in average expression levels between the two tumor types in the training set of 16 ADCA and 16 MPM samples. For further analysis, we chose the 8 genes with the most statistically significant differences and a mean expression level >600 in at least one of the two training sample sets (gene name, GenBank Accession #): calretinin, X56667, (P=8×10⁻¹²), VAC-β, X16662, (P=8×10⁻¹³), TACSTD1, M93036, (P=6×10⁻¹²), claudin-7, AJ011497, (P=2×10⁻⁹), thyroid transcription factor-1 (TITF-1), U43203, (P=10⁻⁹), MRC OX-2 antigen, X05323, (P=5×10⁻¹³), prostacyclin synthase (PTGIS), D83402, (P=10⁻¹⁰), and hypothetical protein KIAA0977, AB023194, (P=9×10⁻¹¹). Five of these genes were expressed at relatively higher levels in MPM tumors (calretinin, VAC-β, MRC OX-2, PTGIS, and KIAA0977) and 3 were expressed at relatively higher levels in ADCA tumors (TACSTD1, claudin-7, and TITF-1). We then investigated whether expression patterns of these genes extended to all samples (Fig. 1A).
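As a rough illustration of the filtering step, the following Python sketch applies the three stated criteria (P value, fold change, minimum mean expression) to precomputed per-gene statistics and keeps the most significant genes. The function name, the input layout, and the gene statistics are all hypothetical; the P values are assumed to come from a separate t-test on log expression levels, and both group means are assumed to be positive.

```python
def select_markers(stats, p_max=2e-6, min_fold=8.0, min_mean=600.0, top_n=8):
    """Return up to top_n gene names passing all filters, most significant first.

    stats maps gene name -> (mean in MPM, mean in ADCA, P value).
    """
    passing = []
    for gene, (mean_mpm, mean_adca, p_value) in stats.items():
        high = max(mean_mpm, mean_adca)
        low = min(mean_mpm, mean_adca)
        fold = high / low  # assumes low > 0
        if p_value < p_max and fold > min_fold and high > min_mean:
            passing.append((p_value, gene))
    return [gene for _, gene in sorted(passing)[:top_n]]

# Made-up statistics for four placeholder genes.
example_stats = {
    "gene_A": (5000.0, 300.0, 1e-12),   # passes all filters
    "gene_B": (700.0, 650.0, 1e-8),     # fold change too small
    "gene_C": (400.0, 40.0, 1e-9),      # neither mean exceeds 600
    "gene_D": (100.0, 2000.0, 5e-10),   # passes all filters
}
markers = select_markers(example_stats)
```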
Diagnostic Accuracy of Gene Expression Ratios. Using the 8 genes identified in the initial training set, we calculated 15 expression ratios per sample by dividing the expression value of each of the 5 genes expressed at relatively higher levels in MPM by the expression value of each of the 3 genes expressed at relatively higher levels in ADCA. Then, we tested the diagnostic accuracy of these ratios in the 149 remaining samples not included in the training set (i.e. 15 MPM and 134 ADCA). Samples with ratio values >1 were called MPM and those with ratio values <1 were called ADCA. We found that these ratios could be used to correctly distinguish between ADCA and MPM tumors with a high degree of accuracy (Table 1).
Table 1. Accuracy of all ratio combinations in predicting tumor diagnosis in test set

              Claudin-7        TACSTD1          TITF-1
Calretinin    97% (145/149)    98% (146/149)    91% (136/149)
VAC-β         97% (144/149)    97% (145/149)    94% (140/149)
MRC OX-2      97% (145/149)    97% (145/149)    95% (142/149)
KIAA0977      97% (145/149)    95% (142/149)    94% (140/149)
PTGIS         97% (145/149)    97% (144/149)    96% (143/149)
Eight candidate diagnostic genes were identified in a training set of samples as described in the Methods. A total of fifteen possible expression ratios (column/row intersection) were calculated where both genes used to form the ratio possessed inversely correlated expression levels in both tumor types. The accuracy of each ratio in predicting diagnosis was examined in the 149 remaining tumor specimens not included in the training set (15 mesothelioma and 134 adenocarcinoma). Predictions are stated as the fraction diagnosed correctly.
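The ratio test itself is simple enough to sketch in a few lines of Python. The gene lists follow the text; the function name, the dictionary layout, and the expression values are hypothetical, and nonzero expression values are assumed.

```python
MPM_GENES = ["calretinin", "VAC-beta", "MRC OX-2", "PTGIS", "KIAA0977"]
ADCA_GENES = ["claudin-7", "TACSTD1", "TITF-1"]

def ratio_calls(expression):
    """Return {(mpm_gene, adca_gene): call} for all 15 ratios of one sample."""
    calls = {}
    for numerator in MPM_GENES:
        for denominator in ADCA_GENES:
            # Absolute values, as the Methods do for negative average differences.
            ratio = abs(expression[numerator]) / abs(expression[denominator])
            if ratio > 1:
                calls[(numerator, denominator)] = "MPM"
            elif ratio < 1:
                calls[(numerator, denominator)] = "ADCA"
            else:
                calls[(numerator, denominator)] = "no-call"
    return calls

# Made-up expression values for a sample that looks like mesothelioma.
sample = {
    "calretinin": 2400.0, "VAC-beta": 1800.0, "MRC OX-2": 900.0,
    "PTGIS": 700.0, "KIAA0977": 650.0,
    "claudin-7": 60.0, "TACSTD1": 45.0, "TITF-1": 30.0,
}
calls = ratio_calls(sample)
```

Because each call depends only on two measurements from the same sample, no training data need be consulted at test time.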
To incorporate data from multiple ratios, we then randomly chose a pair of independent ratios (calretinin/claudin-7 and VAC-β/TACSTD1) and examined their predictive accuracy in the test set. Each ratio (calretinin/claudin-7 and VAC-β/TACSTD1) was 97% (145/149) accurate with 4 errors (Figs. 1B and 1C). Thus, a total of 8 samples were incorrectly diagnosed using either ratio. However, these two ratios were in disagreement for all 8 incorrectly diagnosed samples (Fig. 1C). When the diagnostic call of both ratios is combined, the final analysis results in 95% (141/149) of tumors correctly diagnosed with 0 errors and 8 no-calls. No-calls were conservatively made for samples when both ratios did not return the same diagnosis (Fig. 1C). To predict a diagnosis for the 8 no-calls, we randomly chose an additional ratio (MRC OX-2/TITF-1, Table 1). The addition of a third ratio established a majority diagnosis for the 8 samples that could not previously be determined using only two ratios. Using all 3 ratios (i.e. 6 genes), 99% (148/149) of tumors were correctly diagnosed; 7 no-calls were resolved and 1 sample was incorrectly diagnosed.
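The combination rule described here (agreement for two ratios, majority for three) reduces to a strict-majority vote. A minimal Python sketch follows; the function name is hypothetical and the inputs are illustrative.

```python
from collections import Counter

def combine_calls(calls):
    """Return the diagnosis held by a strict majority of ratios, else 'no-call'.

    With two ratios this requires agreement; with three it is "majority rules".
    """
    diagnosis, votes = Counter(calls).most_common(1)[0]
    return diagnosis if votes > len(calls) / 2 else "no-call"

two_agree = combine_calls(["MPM", "MPM"])
two_disagree = combine_calls(["MPM", "ADCA"])
three_ratios = combine_calls(["MPM", "ADCA", "ADCA"])
```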
Comparison with Linear Discrimination Analysis. Standard statistical methods of class discrimination (7), such as linear discrimination analysis, can also be used to achieve similar results for these three pairs of genes. We first determined a linear combination of measured expression levels for each pair of genes that provided maximal discrimination between the two sets of tumor samples in the training set. When applied to the test set samples, the linear discrimination functions for the (calretinin, claudin-7), (VAC-β, TACSTD1), and (MRC OX-2, TITF-1) pairs gave 6, 5, and 4 misclassifications, respectively. However, only one sample was incorrectly diagnosed in all three tests combined. In fact, the same errant sample was identified in the application of both the three ratio tests and the three linear discriminant tests. This sample was originally obtained from a patient with the clinical and pathological diagnosis of ADCA. This specimen was annotated by a pathologist reviewing frozen sections of all specimens prior to RNA preparation as having unusual histological features raising suspicion of a "germ cell tumor or sarcoma".
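For comparison, a two-gene linear discriminant can be sketched in plain Python. This is a generic Fisher discriminant, offered as an assumption about what a "linear combination providing maximal discrimination" might look like; it is not the S-PLUS procedure used in the study, and the training values (log expression of an MPM-high gene and an ADCA-high gene) are made up.

```python
def fisher_discriminant(class_a, class_b):
    """Return (weights, threshold); score > threshold predicts class_a."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sxx, sxy, syy = sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2]
    det = sxx * syy - sxy * sxy          # pooled scatter must be invertible
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # weights = inverse(pooled scatter) * (mean_a - mean_b)
    weights = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    mid = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)
    threshold = weights[0] * mid[0] + weights[1] * mid[1]
    return weights, threshold

def classify(point, weights, threshold):
    score = weights[0] * point[0] + weights[1] * point[1]
    return "MPM" if score > threshold else "ADCA"

# Made-up training data: (log level of MPM-high gene, log level of ADCA-high gene).
mpm_train = [(3.1, 1.9), (3.3, 2.2), (2.9, 2.0), (3.2, 1.7)]
adca_train = [(2.0, 3.0), (1.8, 3.2), (2.2, 2.9), (1.9, 3.3)]
weights, threshold = fisher_discriminant(mpm_train, adca_train)
```

Unlike the ratio test, the weights and threshold here must be estimated from training samples measured on the same platform.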
Verification of Microarray Data and Validation of Ratio-Based Diagnosis. We utilized real time quantitative RT-PCR (i) to confirm gene expression levels of diagnostic molecular markers identified in microarray-based analysis and (ii) to demonstrate that ratio-based diagnosis of MPM and lung cancer is equally accurate using data obtained from another platform. We randomly chose 12 tumor samples each of MPM and ADCA from those used in microarray analysis, then calculated expression ratios for calretinin/claudin-7 and VAC-β/TACSTD1. Expression ratios correctly diagnosed 96% (23/24) of samples, with 0 errors and 1 no-call (Fig. 2).
We have also explored the usefulness of expression ratios in predicting clinical parameters under more challenging circumstances, i.e. when predictor genes have substantially higher P values and smaller fold-change differences in average expression levels. In this analysis we used previously published microarray data (6) for a set of 60 medulloblastoma tumors with linked clinical data (Dataset "C") to create a ratio-based test designed to predict patient outcome after treatment. Of these 60 samples, 39 and 21 originated from patients classified as "treatment responders" and "treatment failures", respectively. We used a training set composed of 20 randomly chosen samples (10 responders and 10 failures) to identify predictor genes. A total of 10 genes fit our filtering criteria (P<0.05, >2-fold change in average expression levels, at least one mean >200) and we chose the most significant three genes expressed at relatively higher levels in each group for further analysis (gene name, GenBank Accession #): histone 2A, M37583, (P=0.012), GTPase rho C, L25081, (P=0.026), protein gene product 9.5, X04741, (P=0.046), neurofilament-66, S78296, (P=0.0025), sulfonylurea receptor, U63455, (P=0.0067), cell surface protein HCAR, U90716, (P=0.030). Using the previously stated diagnostic criteria, we calculated a total of 9 possible expression ratios using data from these 6 genes and examined their predictive accuracy in the remaining samples (i.e., the test set, n=29 responders and n=11 failures). A total of 5 ratios were equally accurate (75%, 30/40) in predicting test set samples and, in combination, utilized all 6 predictor genes. Our accuracy rate in a true test set of samples is similar to that reported by Pomeroy and colleagues (78%, 47/60) using all 60 samples to develop an 8-gene k-nearest neighbor predictor model (6).
To incorporate the predictive accuracy of multiple ratios (and genes), we calculated the geometric mean of these 5 ratios to give equal weight to ratios with identical magnitude but opposite direction. Finally, we performed Kaplan-Meier survival analysis using predictions made from the geometric mean value. We found that a 6-gene (5-ratio) model could significantly (P=0.00357, log-rank test) predict patient outcome after treatment in the test set of samples (Fig. 3). This P value is moderately lower than that reported by Pomeroy et al. (P=0.009) using all 60 samples to assess their 8-gene k-nearest neighbor predictor model (6). There was no overlap in the list of genes comprising our model and that of these investigators, suggesting that multiple genes are present in this malignancy that have similar predictive capability.
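The reason a geometric mean weights opposite ratios equally can be seen in a short Python sketch: a ratio of 2 and a ratio of 0.5 cancel to 1, whereas their arithmetic mean would be 1.25. The function names and example ratios are hypothetical, and the text does not state which outcome group supplies the numerator genes, so the call is expressed generically.

```python
import math

def geometric_mean(ratios):
    """Geometric mean computed as the exponential of the mean log ratio."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

def predict_group(ratios):
    """Call the numerator-gene group if the geometric mean exceeds 1."""
    value = geometric_mean(ratios)
    if value > 1:
        return "numerator group"
    if value < 1:
        return "denominator group"
    return "no-call"

balanced = geometric_mean([2.0, 0.5])          # opposite directions cancel
five_ratio_call = predict_group([3.0, 1.5, 0.8, 2.2, 1.1])  # made-up ratios
```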
Discussion
Accurate diagnosis of cancer (or any disease) is the first critical step in choosing appropriate treatments that will hopefully result in the best possible outcome. We propose that the ratio-based method described herein that utilizes expression levels of carefully chosen genes can be a simple, inexpensive, and highly accurate means to distinguish MPM from ADCA of the lung and that this method is applicable to many other clinical scenarios. We have also shown that multiple highly accurate ratios can be combined to form a simple diagnostic tool using the ratio direction ("majority rules" approach, e.g., MPM and lung cancer diagnosis) or the ratio magnitude (calculation of the geometric mean, e.g., prediction of outcome in medulloblastoma). The gene expression ratio method, by virtue of the fact that it is a ratio, (i) negates the need for a third reference gene when determining expression levels, (ii) is independent of platform used for data acquisition, (iii) requires only small quantities of RNA (as little as 10 pg using RT-PCR), (iv) does not require the coupling of transcription to translation for chosen genes, and (v) permits analysis of individual samples without reference to additional "training samples" whose data were acquired on the same platform. For these reasons, expression ratios are more likely to find immediate use in clinical settings since they confer several advantages compared to other equally accurate techniques, such as linear discriminant analysis. The small P values and large fold-differences in average expression levels between genes used in expression ratio-based diagnosis of MPM and lung cancer are not surprising given that both tumor types have different cell types of origin. It is important to note that we have not determined in the current study the exact magnitude and consistency by which gene expression needs to differ between any two groups to allow the usage of a simple ratio test.
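Point (i) can be checked numerically. In the sketch below, each gene's level relative to a reference gene is taken as 2 raised to the negative difference in threshold cycles, which assumes perfect doubling per PCR cycle; the function names and CT values are hypothetical. The reference term then cancels when two genes from the same sample are divided.

```python
def relative_level(ct_gene, ct_reference):
    """Expression relative to a reference gene, assuming doubling every cycle."""
    return 2.0 ** -(ct_gene - ct_reference)

def expression_ratio(ct_gene_a, ct_gene_b, ct_reference):
    """Ratio of gene A to gene B after each is normalized to the reference."""
    return relative_level(ct_gene_a, ct_reference) / relative_level(ct_gene_b, ct_reference)

# Made-up threshold cycles: gene A crosses threshold 3 cycles before gene B.
ratio_with_ref_18 = expression_ratio(22.0, 25.0, 18.0)
ratio_with_ref_30 = expression_ratio(22.0, 25.0, 30.0)
ratio_without_ref = 2.0 ** (25.0 - 22.0)
```

All three values are identical, so the choice (or absence) of a reference gene does not change the within-sample ratio.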
In other clinical scenarios the differences in gene expression patterns between groups to be distinguished may be more subtle, thus necessitating relaxed filtering criteria in choosing potential predictor genes. Even in these cases, simple ratios can still be a highly accurate means of predicting clinical parameters. We have also found that expression ratios are useful in predicting outcome after therapy in MPM using genes with considerably higher P values and lower fold-differences in average expression levels than those used in the current study (Gordon et al., manuscript submitted). In the current study, we have used previously published microarray data (6) to identify a small number of predictor genes that were able to significantly predict outcome after therapy in medulloblastoma in a true test set of samples using simple expression ratios. Nevertheless, in some cases larger numbers of genes (and perhaps sophisticated software) and/or initial expression profiling of a larger number of specimens for the training set may be required to achieve acceptable predictive power.
The selection of diagnostic genes for MPM and lung cancer was based solely on our stated criteria. Nevertheless, many of the molecular markers with the lowest P values and greatest difference in average expression levels have notable cancer relevance and/or are known to have tissue specific expression patterns. Calretinin (13, 14) and TITF-1 (15, 16) are part of several immunohistochemical panels currently used in the diagnosis of MPM and lung cancer. Claudin family members are expressed in various cancers (17, 18) and TACSTD1 (alias TROP1) is a recently described marker for carcinoma cells and, as a cell surface receptor protein, has been postulated to play a role in growth regulation of tumor cells (19, 20). The discovery of diagnostic gene ratios is likely to make possible future clinical tests to definitively diagnose MPM and ADCA using smaller tissue specimens and perhaps pleural effusions. In this way the need for diagnostic surgery in many of these patients may be eliminated. The expression ratio technique represents a substantial improvement over past efforts to translate the strengths of expression profiling into simple tests with clinical relevancy. Many bioinformatics tools under development and testing are quite complex and/or rely upon data from large numbers of "training samples" to establish a diagnosis for unknown samples. The end result is that the practical use of microarray data remains beyond the scope of many scientists and clinicians. Similarly, no comprehensive method has been proposed to translate the results of tumor profiling to the analysis of individual tissues. As a consequence, no simple yet effective clinical applications have resulted from microarray research. The expression ratio technique represents a powerful use of microarray data that can be easily adapted and extended to routine clinical application without the need for additional sophisticated analysis.
References for Example 1
1. Aisner, J. Diagnosis, staging, and natural history of pleural mesothelioma. In: J. Aisner, R. Arriagada, M. R. Green, N. Martini, and M. C. Perry (eds.), Comprehensive Textbook of Thoracic Oncology, pp. 799-785. Baltimore: Williams and Wilkins, 1996.
2. Pass, H. Malignant pleural mesothelioma: Surgical roles and novel therapies, Clin Lung Cancer. 3: 102-117, 2001.
3. Ordonez, N. G. The immunohistochemical diagnosis of epithelial mesothelioma, Hum Pathol. 30: 313-323, 1999.
4. Nguyen, G.-K., Akin, M.-R. M., Villanueva, R. R., and Slatnik, J. Cytopathology of malignant mesothelioma of the pleura in fine-needle aspiration biopsy, Diagn Cytopathol. 21: 253-259, 1999.
5. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science. 286: 531-537, 1999.
6. Pomeroy, S. L., Tamayo, P., Gaasenbeek, M., Sturla, L. M., Angelo, M., McLaughlin, M. E., Kim, J. Y. H., Goumnerova, L. C., Black, P. M., Lau, C., Allen, J. C., Zagzag, D., Olson, J. M., Curran, T., Wetmore, C., Biegel, J. A., Poggio, T., Mukherjee, S., Rifkin, R., Califano, A., Stolovitzky, G., Louis, D. N., Mesirov, J. P., Lander, E. S., and Golub, T. R. Prediction of central nervous system embryonal tumor outcome based on gene expression, Nature. 415: 436-442, 2002.
7. Dudoit, S., Fridlyand, J., and Speed, T. P. Comparison of discrimination methods for the classification of tumors using gene expression data, J Am Stat Assoc. In Press, 2002.
8. Wang, K., Gan, L., Jeffery, E., Gayle, M., Gown, A. M., Skelly, M., Nelson, P. S., Ng, W. V., Schummer, M., Hood, L., and Mulligan, J. Monitoring gene expression profile changes in ovarian carcinomas using cDNA microarrays, Gene. 229: 101-108, 1999.
9. Warrington, J. A., Nair, A., Mahadevappa, M., and Tsyganskaya, M. Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes, Physiol Genomics. 2: 143-147, 2000.
10. O'Dell, S. D., Bujac, S. R., Miller, G. J., and Day, I. N. Associations of IGF2 ApaI RFLP and INS VNTR class I allele size with obesity, Eur J Hum Genet. 7: 565-576, 1999.
11. Bhattacharjee, A., Richards, W. G., Staunton, J., Li, C., Monti, S., Vasa, P., Ladd, C., Beheshti, J., Bueno, R., Gillette, M., Loda, M., Weber, G., Mark, E. J., Lander, E. S., Wong, W., Johnson, B. E., Golub, T. R., Sugarbaker, D. J., and Meyerson, M. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma sub-classes, Proc Natl Acad Sci USA. 98: 13790-13795, 2001.
12. Venables, W. N. and Ripley, B. D. Modern Applied Statistics with S-Plus. New York: Springer, 1997.
13. Lozano, M. D., Panizo, A., Toledo, G. R., Sola, J. J., and Pardo-Mindan, J. Immunocytochemistry in the differential diagnosis of serous effusions: a comparative evaluation of eight monoclonal antibodies in Papanicolaou stained smears, Cancer. 93: 68-72, 2001.
14. Sato, S., Okamoto, S., Ito, K., Konno, R., and Yajima, A. Differential diagnosis of mesothelial and ovarian cancer cells in ascites by immunocytochemistry using Ber-EP4 and calretinin, Acta Cytol. 44: 485-488, 2000.
15. Di Loreto, C., Puglisi, F., Di Lauro, V., Damante, G., and Beltrami, C. A. TTF-1 protein expression in malignant pleural mesotheliomas and adenocarcinomas of the lung, Cancer Lett. 124: 73-78, 1998.
16. Ordonez, N. G. The value of antibodies 44-36A, SM3, HBME-1, and thrombomodulin in differentiating epithelial pleural mesothelioma from lung adenocarcinoma, Am J Surg Pathol. 21: 1399-1408, 1997.
17. Michl, P., Buchholz, M., Rolke, M., Kunsch, S., Lohr, M., McClane, B., Tsukita, S., Leder, G., Adler, G., and Gress, T. M. Claudin-4: a new target for pancreatic cancer treatment using Clostridium perfringens enterotoxin, Gastroenterology. 121: 678-684, 2001.
18. Hough, C. D., Sherman-Baust, C. A., Pizer, E. S., Montz, F. J., Im, D. D., Rosenshein, N. B., Cho, K. R., Riggins, G. J., and Morin, P. J. Large-scale serial analysis of gene expression reveals genes differentially expressed in ovarian cancer, Cancer Res. 60: 6281-6287, 2000.
19. Alberti, S., Nutini, M., and Herzenberg, L. A. DNA methylation prevents the amplification of TROP1, a tumor-associated cell surface antigen gene, Proc Natl Acad Sci USA. 91: 5833-5837, 1994.
20. Calabrese, G., Crescenzi, C., Morizio, E., Palka, G., Guerra, E., and Alberti, S. Assignment of TACSTD1 (alias TROP1, M4S1) to human chromosome 2p21 and the refinement of mapping of TACSTD2 (alias TROP2, M1S1) to human chromosome 1p32 by in situ hybridization, Cytogenet Cell Genet. 92: 164-165, 2001.
Example 2: Molecular Markers for Malignant Pleural Mesothelioma
Introduction
In this study we have refined a gene expression measurements approach to predict clinical parameters in cancer, including distinguishing between subclasses of malignant pleural mesothelioma and distinguishing between malignant pleural mesothelioma and lung adenocarcinoma. We have found that ratios of gene expression levels can accurately distinguish between genetically disparate tissues.
Methods
MPM tissues. Discarded malignant pleural mesothelioma (MPM) surgical specimens were freshly collected from patients undergoing pleurectomy or extrapleural pneumonectomy at Brigham and Women's Hospital (Boston, MA) from 1992 to 1998 and flash frozen. All tissues were obtained from patients who did not receive pre-operative treatment. Standard tissue banking procedures were followed throughout. Once brought to the Hospital Tumor Bank, tissues were sliced into 3 mm³ portions, each assigned an identifier to catalogue its position in the original specimen. Hematoxylin-stained slides were generated from each MPM specimen in the Tumor Bank and reviewed by a pathologist for tumor content and histological subtype. A total of 80 specimens have been identified to date that contain relatively pure tumor (greater than 50% of cells in a high power field are tumor cells). Of these, 24 were chosen for microarray analysis. Linked clinical, epidemiological, outcome, and pathological data were obtained for all patients who contributed tumor specimens and rendered anonymous to protect patient confidentiality. Studies utilizing human tissues were approved by and conducted in accordance with the policies of the Institutional Review Board at Brigham and Women's Hospital.
Tissue processing and RNA preparation. Total RNA was isolated from frozen tumor blocks using Trizol solution (Invitrogen Life Technologies, Carlsbad, California) exactly per the manufacturer's recommended protocol. To initially assess total RNA degradation, a portion of RNA from each sample was resolved on a 1% agarose/formaldehyde gel using standard procedures (Ausubel, 1998). Gels were stained with ethidium bromide and bands representing ribosomal subunits 28S and 18S were visualized. Approximately 10% of samples collected were discarded secondary to unsatisfactory quality.
Total RNA (7 μg) was amplified and the product labeled with biotin following a procedure previously described (Wang, 1999; Warrington, 2000; O'Dell, 1999). Briefly, double-stranded cDNA was synthesized using the Superscript Choice System (Invitrogen Life Technologies) and a T7-(dT)-24 first strand primer (Geneset Oligos, La Jolla, California). The cDNA was purified by phenol/chloroform/isoamyl alcohol extraction using a phase lock gel (5 Prime-3 Prime, Inc., Boulder, Colorado) and concentrated by ethanol precipitation. In vitro transcription was performed to produce biotin-labeled cRNA using a BioArray High Yield RNA Transcript Labeling Kit (Affymetrix, Santa Clara, California) according to the manufacturer's instructions. Linearly amplified cRNA was obtained by incubation with T7 RNA polymerase. Final cRNA preparations were cleaned with RNeasy Mini kit (Qiagen, Valencia, California).
Hybridization of RNA to high density oligonucleotide microarrays. Prior to hybridization to experimental arrays, the quality of cRNA was assessed for approximately half of all samples using test arrays (Affymetrix Test2 gene arrays) designed to compare relative expression levels of β-actin and GAPDH by using oligonucleotide probes complementary to both the 3' and 5' ends of gene products. Hybridization of test arrays was done as detailed below for experimental arrays with minor modifications as suggested by the manufacturer (Affymetrix). Biotinylated cRNA (20 μg) was fragmented and hybridized to microarrays containing oligonucleotide probe-sets representing approximately 12,000 known human genes (Affymetrix U95A human array, HG-95Av2) according to Affymetrix protocols, as described previously (O'Dell, 1999). Essentially, the hybridization mixture was incubated at 99 °C for 5 min. followed by incubation at 45 °C for 5 min. before injection of the sample into the probe array cartridge. Hybridization was performed at 45 °C for 16-18 hours. After washing, the array was stained with streptavidin-phycoerythrin (Molecular Probes, Eugene, Oregon) and the hybridization signal amplified using a biotinylated anti-streptavidin antibody (Vector Laboratories, Inc., Burlingame, California) before subsequent scanning in a HP GeneArray scanner (Affymetrix).
The intensities of all features of the microarrays were captured and examined for artifacts using Affymetrix GeneChip® Software v. 4.0, according to standard Affymetrix procedures (O'Dell, 1999). The "target intensity" was set to 100 for all samples. Each array contained several prokaryotic genes which served as internal hybridization controls for RNA spiked into experimental samples. Data from 5 arrays were uninterpretable and discarded, leaving a total of 19 samples in the final analysis. Of these 19, 2 were tested in duplicate and 1 in triplicate. GeneChip® Software was used to generate quantitative gene expression values (measured by average differences).
Real time quantitative RT-PCR. Gene expression data obtained from microarrays were verified using real time quantitative RT-PCR. PCR reactions were set up, optimized, and performed precisely following the manufacturer's recommended protocol (Sequence Detection System, Applied Biosystems, Foster City, California). Total RNA (2 μg) was reverse-transcribed into cDNA using Taq-Man Reverse Transcription reagents and random hexamers as the primer (Applied Biosystems). PCR reactions were set up in a 25 μl reaction volume using SYBR Green PCR Master Mix (Applied Biosystems). Optimized primers amplifying portions of fibronectin, transgelin, complement factor B (CFB), and L32 ribosomal protein cDNA were designed according to recommended specifications (Applied Biosystems), synthesized by Invitrogen Life Technologies, and used at a final concentration of 900 nM in the reaction mixture.
Primer sequences were as follows:
fibronectin: 5'-GCCATGACAATGGTGTGAAC-3' (SEQ ID NO:1) and 5'-GCAAATGGCACCGAGATATT-3' (SEQ ID NO:2);
transgelin: 5'-AGGACTCTGGGGTCATCAAG-3' (SEQ ID NO:3) and 5'-AGTTGGGATCTCCACGGTAG-3' (SEQ ID NO:4);
CFB: 5'-TGAGGCTTCCTCCAACTACC-3' (SEQ ID NO:5) and 5'-TGCCTTTCTTATCCCCATTC-3' (SEQ ID NO:6);
L32: 5'-AACCCAGAGGCATTGACAAC-3' (SEQ ID NO:7) and 5'-ACTTCCAGCTCCTTGACGTT-3' (SEQ ID NO:8).
PCR amplification was performed in a 96-well format using optical plates and covers (Applied Biosystems) in an Applied Biosystems 5700 Sequence Detector. To confirm the absence of non-specific amplification in PCR reactions, no-template controls containing H2O substituted for template were run in multiple wells on every reaction plate. In addition, a melting point dissociation curve was automatically generated after every experiment to confirm the presence of a single PCR species in all experimental wells. The Comparative CT method was used to obtain quantitative values for gene expression levels in all samples (Applied Biosystems, see http://www.appliedbiosystems.com for details). This method normalizes expression levels between samples using another (housekeeping) gene as a reference to standardize for different starting template amounts.
Briefly, amplification reactions are characterized by the point in time during cycling when amplification of a PCR product is first detected rather than by the amount of PCR product accumulated after a fixed number of cycles. In the initial cycles of PCR, there is little change in fluorescence signal, which defines the baseline for the amplification plot. An increase in fluorescence above the baseline indicates the detection of accumulated PCR product. A fixed fluorescence threshold can be set above the baseline. The parameter CT (threshold cycle) is defined as the fractional cycle number at which the fluorescence passes the fixed threshold. The higher the starting copy number of the nucleic acid target, the sooner an increase in fluorescence past the selected threshold is observed. A plot of the log of initial target copy number for a set of standards versus CT is a straight line. Therefore, quantification of the amount of target in unknown samples is accomplished by measuring CT and using the standard curve to determine starting copy number. The L32 ribosomal gene was used as the housekeeping reference gene since its expression levels did not vary substantially over all samples (from microarray data).
Data analysis. A hierarchical clustering algorithm (AGNES) in the statistical package S-PLUS (Venables, 1997) was used to classify all 19 MPM tumors according to relative variation in gene expression patterns. All linked clinical data were held exclusively by one investigator and revealed only after cluster analysis was completed. Gene hybridization intensities (from GeneChip® Software) were appropriately scaled to a "target intensity" of 100 to facilitate comparison of data from all arrays. To minimize contamination from signal background and saturation effects (Hsiao, 2001), only genes with an expression value between 1,000 and 5,000 were considered in the unsupervised cluster analysis.
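The standard-curve quantification described in the real time RT-PCR paragraph above (CT is linear in the log of starting copy number) can be sketched in Python as follows. The standards, their CT values, and the function names are made up for illustration; they are not measurements from this study.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical dilution series: known starting copies and observed CT values.
standard_copies = [1e3, 1e4, 1e5, 1e6]
standard_ct = [28.1, 24.8, 21.5, 18.2]
slope, intercept = fit_line([math.log10(c) for c in standard_copies], standard_ct)

def copies_from_ct(ct):
    """Invert the standard curve to estimate starting copy number."""
    return 10 ** ((ct - intercept) / slope)

unknown_copies = copies_from_ct(23.15)
```

A negative slope reflects the statement that higher starting copy numbers cross the threshold sooner.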
The significance of observed differences in gene expression levels between selected MPM subclasses was assessed using a Kruskal-Wallis test (nonparametric ANOVA) followed by Dunn's Multiple Comparison test. A Mann-Whitney test was used for selective pairwise comparisons, such as median patient survival. The degree of correlation between patient survival and matched gene expression levels was examined using Spearman correlation calculations and trendlines were obtained by generating a LOWESS curve. Contingency tables were analyzed using a chi-square test for independence and a chi-square test for trend. All differences were determined to be statistically significant if P<0.05. Calculations and statistical comparisons were generated using GraphPad Prism v.3.02 (GraphPad Software, San Diego, California). To generate the graphical representations of relative gene expression levels (log 2 scale), all expression levels were first normalized within samples by setting the average (mean) to 0 and the standard deviation to 1. Scaled levels were assigned RGB values (representing 17 shades) for colorimetric display as a spectrum representing relative gene expression levels.
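As an illustration of the Spearman calculation mentioned above, the following Python sketch ranks both variables (averaging tied ranks) and computes the Pearson correlation of the ranks. It is a generic implementation rather than the GraphPad Prism routine used in the study, and the survival and expression values are made up.

```python
def ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up data: survival in months and expression level of one gene.
survival_months = [4.0, 9.0, 12.0, 20.0, 31.0]
expression_level = [2100.0, 1800.0, 1500.0, 1600.0, 900.0]
rho = spearman(survival_months, expression_level)
```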
Results
Identification of MPM subclasses using hierarchical cluster analysis. Variation in the patterns of gene expression levels in 19 MPM tumors was examined using unsupervised cluster analysis, and distinct classes were identified based on similar expression profiles. The dendrogram specifying the grouping of samples shows that MPM tumors segregate into 2 major subclasses. A major division in the distribution of samples separates 6 tumor specimens from all others (designated Subclass 1). The remaining 13 samples form 2 distinct subclasses that cluster tightly on opposite sub-branches (designated Subclasses 2 and 3). A maximum of 145 genes with relatively high expression levels between 1,000 and 5,000 (i.e., average difference) was sufficient to accurately define MPM subclasses. Only a portion of the 145 genes used for analysis are identified here. The complete set is available in Table 6.
A set of 26 genes was identified as providing a redundant set of diagnostic gene expression ratios that can be used in different combinations (Table 2). There is some overlap to ensure complete coverage of samples to correct for "no-calls" for any one ratio.
Table 2

Gene elevated in  Accession #  Symbol    Name                                                   SEQ ID NOs:*

Meso/Adeno
adeno             J02761       SFTBP     surfactant pulmonary associated protein B              9, 10
adeno             M93036       TACSTD1   tumor associated calcium signal transducer 1           11, 12
adeno             AJ011497     CLDN7     claudin 7                                              13, 14
meso              X56667       CALB2     calbindin 2 (calretinin)                               15, 16
meso              X16662       ANXA8     annexin 8                                              17, 18
meso              M21389       KRT5      keratin 5                                              19, 20

Meso/Normal
meso              X03168       VTN       vitronectin                                            21, 22
meso              X76029       NMU       neuromedin U                                           23, 24
meso              X56667       CALB2     calbindin 2 (calretinin)                               15, 16
normal            U43203       TITF1     thyroid transcription factor 1                         25, 26
normal            M18728       CEACAM6   carcinoembryonic Ag-related cell adhesion molecule 6   27, 28
normal            T92248       UGB       uteroglobin                                            29, 30

Meso/Squamous
meso              AI651806     LOC51232  cysteine-rich repeat containing protein S52 precursor  31, 32
meso              X56667       CALB2     calbindin 2 (calretinin)                               15, 16
meso              D83402       PTGIS     prostacyclin synthase                                  33, 34
squa              U42408       LAD1      ladinin 1                                              35, 36
squa              L33930       CD24      CD24 antigen                                           37, 38
squa              AI539439     S100A2    S100 calcium-binding protein A2                        39, 40

Ept/All Other
ept               AL049963     LOC64116  up-regulated by BCG-CWS                                41, 42
ept               L15702       CFB       complement factor B (CFB)                              43, 44
other             M95787       TAGLN     transgelin                                             45, 46

Subclass 1/2 (also see ept/other)
1                 AL049963     LOC64116  up-regulated by BCG-CWS                                41, 42
2                 NM_001953    ECGF1     endothelial cell growth factor 1                       47, 48

Subclass 1/3
1                 M22919       MYL6      myosin, light polypeptide 6                            49, 50
3                 X06256       ITGA5     integrin, alpha 5                                      51, 52

Subclass 2/3
2                 Z98946       MSN       moesin                                                 53, 54
2                 AI540958     PIN       dynein                                                 55, 56
3                 AI677689     KIAA0685  KIAA0685 gene product                                  57, 58
3                 M31932       FCGR2A    Fc fragment of IgG, low affinity IIa                   59, 60

meso = malignant pleural mesothelioma
adeno = adenocarcinoma
normal = normal lung tissue
squa = squamous carcinoma
ept = epithelial
* SEQ ID NOs are given as nucleotide sequence, amino acid sequence
Genes that serve multiple purposes (and are listed more than once) are italicized. There are a total of 26 genes.
The functional distribution of these reliably expressed genes can be roughly classified as follows: 33% ribosomal, 7% cytoskeletal, 6% inflammatory/immune, 3% extracellular matrix (ECM), 3% intracellular signaling, 2% proliferation, and 46% other, multiple, or unknown function. Approximately two-thirds of the 145 genes were expressed at substantially higher levels in samples from Subclass 1; the remaining one-third were expressed at relatively higher levels in samples from Subclasses 2 and 3. Ribosomal proteins account for approximately 50% of all genes overexpressed in Subclass 1. Genes overexpressed in Subclasses 2 and/or 3 consisted predominantly of cytoskeletal and ECM-related genes such as actin, vimentin, tubulin, myosin, cofilin, osteonectin, and others. The organization of MPM subclasses was extremely robust and reproducible. For example, samples assigned to Subclasses 1, 2, and 3 remain in the same subclass when cluster analysis is repeated incorporating data from 3 samples chosen at random for duplicate (n=2) and triplicate (n=1) hybridization experiments on different microarrays.
Clinical characteristics of MPM subclasses. Linked clinical data for individual samples are presented in tabular format and arranged according to subclass membership (Table 3).
Samples in Subclasses 1 and 2 consisted exclusively of specimens histologically classified as epithelial and mixed subtypes, respectively. Subclass 3 consisted of members of all major histological subtypes: epithelial (n=3), mixed (n=4), and sarcomatoid (n=2). Two samples were excluded from survival analysis since they originated from patients whose status was either unknown (sample 116, Subclass 1) or who did not die from disease (sample 118, Subclass 2). The analysis of cancer-related clinical outcome (using nonparametric ANOVA) revealed that the median survival (19 months) of patients in Subclass 1 (all epithelial subtype) was significantly higher (P<0.01) than the median survival (2 months) of patients in Subclass 2 (all mixed subtype). The median survival of patients with varied histology in Subclass 3 (11 months) was intermediate to Subclasses 1 and 2, but nonetheless was not significantly different (P>0.05) from that of either subclass. There was no significant difference in survival between patients with epithelial histology classified as either Subclass 1 or 3. However, patients with mixed histology classified as Subclass 2 samples (all "short-lived" mixed) had significantly shorter (P=0.029) median survival (2 months) when compared to the median survival (11.5 months) of patients with mixed histology classified as Subclass 3 (all "long-lived" mixed). Although asbestos count and exposure history appeared to be lower in members of Subclass 1, there are insufficient data at this time to draw meaningful conclusions. Beyond patient survival, there were no other significant associations between samples in MPM subclasses and any other aspect of the clinical data.
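The median survival figures quoted above can be re-derived from the survival column of Table 3. The sketch below is a reader's check, not part of the original analysis: the tuples are transcribed from Table 3 on the assumption that the last two values in each row are survival (months) and status, and the helper name `median_survival` is hypothetical. Samples 116 and 118 are left out, as stated in the text.

```python
from statistics import median

# (sample, subclass, histology, survival in months, status), transcribed from Table 3.
TABLE3 = [
    (76, 1, "ept", 17, "3"), (86, 1, "ept", 9, "3"), (116, 1, "ept", 9, "U"),
    (90, 1, "ept", 28, "2"), (68, 1, "ept", 21, "3"), (109, 1, "ept", 19, "3"),
    (89, 2, "mixed", 3, "3"), (118, 2, "mixed", 7, "4"),
    (133, 2, "mixed", 2, "3"), (114, 2, "mixed", 2, "3"),
    (101, 3, "ept", 11, "3"), (93, 3, "ept", 15, "3"), (229, 3, "ept", 5, "3"),
    (105, 3, "mixed", 12, "3"), (72, 3, "mixed", 53, "3"),
    (213, 3, "mixed", 11, "1"), (130, 3, "mixed", 6, "3"),
    (159, 3, "sarc", 2, "3"), (166, 3, "sarc", 6, "3"),
]

# Samples excluded from survival analysis in the text (status unknown / died of other causes).
EXCLUDED = {116, 118}


def median_survival(subclass, histology=None):
    """Median survival (months) for a subclass, optionally restricted to one histology."""
    months = [
        surv for sample, sub, hist, surv, _status in TABLE3
        if sub == subclass and sample not in EXCLUDED
        and (histology is None or hist == histology)
    ]
    return median(months)


if __name__ == "__main__":
    print("Subclass 1:", median_survival(1))
    print("Subclass 2:", median_survival(2))
    print("Subclass 3:", median_survival(3))
    print("Subclass 3, mixed only:", median_survival(3, "mixed"))
```

Under this reading of the table, the computed medians agree with the 19, 2, 11, and 11.5 months reported in the text.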
Table 3
Sample # Age Sex Histology(a) Smoking Hx Pack Years(b) Asbestos Exposure Hx Asbestos Fiber Count(c) Survival(d) Status(e)
Group 1
76 67 m ept. yes - neg. - 17 3
86 42 f ept. no 0 neg. 0 9 3
116 70 m ept. yes - neg. 50 9 U
90 48 m ept. no 0 pos. - 28 2
68 61 m ept. yes 30 neg. 168 21 3
109 62 m ept. yes 14 pos. - 19 3
Group 2
89 55 m mixed yes 4.5 pos. 147 3 3
118 74 m mixed yes 73.8 pos. 119 7 4
133 69 m mixed no 0 pos. 17,547 2 3
114 51 m mixed yes 3.75 neg. 2919 2 3
Group 3
101 71 f ept. yes 15 pos. - 11 3
93 52 m ept. yes 113 pos. - 15 3
229 33 f ept. no 0 pos. 13 5 3
105 66 m mixed yes 38 pos. 290 12 3
72 46 m mixed yes 25 pos. 6540 53 3
213 55 m mixed yes 5 pos. 266 11 1
130 55 m mixed yes 10 pos. 69 6 3
159 62 m sarc. no 0 pos. - 2 3
166 66 m sarc. yes 30 pos. 451 6 3
-, data unavailable
a, ept., epithelial; sarc., sarcomatoid
b, packs per day x years smoking
c, per gram of lung tissue; control median value ~70
d, in months
e, 1, alive without disease; 2, alive with disease; 3, dead with disease; 4, dead other causes; U, unknown
Identification of prognostic and diagnostic molecular markers for MPM. To identify candidate prognostic molecular markers for MPM, all 12,000 genes on the microarray were searched to find those with expression levels that were significantly different between tumors of Subclass 1 (best prognosis) and Subclass 2 (worst prognosis). Not surprisingly, many of these genes also distinguished samples of epithelial histology from tumors of all other subtypes, and for this reason may further serve as diagnostic molecular markers. Approximately 400 genes on the microarray fit our selection criteria (average expression level >1000 in at least one group), and the 11 most statistically significant genes (P=10⁻³ to 10⁻⁶) are listed in Table 4.
Table 4
Accession # Name SEQ ID NOs:*
M95787 SM22 (transgelin) 45, 46
Z82215 DNA sequence from PAC 6802 61, 62
X13839 vascular smooth muscle alpha-actin 63, 64
L19182 MAC25 65, 66
J04599 proteoglycan I (biglycan) 67, 68
X02761 fibronectin 69, 70
X15882 collagen VI 71, 72
Y14690 procollagen alpha 2(V) 73, 74
L15702 complement factor B (CFB) 43, 44
N47307 cDNA, 3' end /clone=IMAGE-280506 75, 76
L38941 ribosomal protein L34 (RPL34) 77, 78
* SEQ ID NOs are given as nucleotide sequence, amino acid sequence
The fact that these genes also distinguish samples of the epithelial subtype suggests that tumors of the same histological subtype possess certain similarities in gene expression despite being assigned to different classes using hierarchical clustering. Another set of genes that distinguishes between Subclass 1 and Subclass 2 is presented in Table 7. Of the 11 genes in Table 4, the first 8 are expressed at relatively lower levels and the final 3 are expressed at higher levels in epithelial subtype samples (Subclass 1) compared to all others.
From this set, two genes with relatively low levels of expression in epithelial subtype samples (transgelin and fibronectin, P=10⁻⁴ and 0.0028, respectively) and one with relatively high levels of expression in epithelial subtype samples (complement factor B (CFB), P=10⁻⁴) were chosen. Expression levels for all three genes were significantly (P<0.05) correlated with survival irrespective of histological subtype. Furthermore, expression level ratios of transgelin/CFB and fibronectin/CFB were also significantly correlated with survival independent of histology. Levels of transgelin and fibronectin were approximately equal in individual samples; accordingly, the fibronectin/transgelin ratio remains close to 1 for all samples and is not correlated with survival.
When individual patient ratios of either transgelin/CFB or fibronectin/CFB were 10 or greater, survival was lowest (median survival=2 months, range 2-6 months). Conversely, when these ratios were 1 or less, patient survival was substantially higher (median survival=13 months, range 5-28 months). Additionally, 80% (8/10) of patients whose ratio was less than 1 survived at least 9 months.
Another set of genes was identified to distinguish between adenocarcinoma and malignant pleural mesothelioma in accordance with the procedures described above. These genes are presented in Table 8.
Validation of microarray-based analysis of gene expression. Quantitative RT-PCR was utilized to verify gene expression levels of the molecular markers identified in microarray-based analysis. As expected, average expression levels for CFB were significantly higher (P=0.019) and average expression levels for transgelin and fibronectin significantly lower (P=0.038 and P=0.024, respectively) in samples from Subclass 1 (good prognosis) compared to Subclass 2 (poor prognosis) (Fig. 4A).
We then determined whether expression ratios created using microarray data could be accurately reproduced using quantitative RT-PCR data. Ratios were created by expressing individual gene levels in epithelial subtype samples relative to levels in all other subtypes combined (Fig. 4B). We found that ratios created using data from both platforms were in relative agreement for all 3 genes. Expression level ratios (and individual expression levels) obtained from RT-PCR were also significantly correlated with survival for transgelin/CFB (P=0.0015) and fibronectin/CFB (P=0.009) independent of the histological subtype.
Verification of expression level ratios as prognostic and diagnostic molecular markers. To verify the prognostic capability of gene expression ratios, quantitative RT-PCR was used to obtain expression level values for transgelin and CFB in 17 additional tumor samples not subjected to microarray analysis. (Two of these samples were omitted because they did not express detectable levels of one or both genes.) Based on the prior analysis of samples using microarrays, we hypothesized for the remaining 15 samples that patients with transgelin/CFB ratios above 1 (n=6) would have generally poor prognosis, and those with ratios below 1 (n=9) would have generally good prognosis. In this case, ratios correctly identified the 3 individuals with the best clinical outcome (20-, 21-, and 51-month survival) and the 3 individuals with the worst clinical outcome (2-, 2-, and 4-month survival).
To increase the sample size for statistical considerations, RT-PCR data from these patients were combined with data from patients whose tumors were subjected to microarray analysis, for a total of 32 samples. In this larger sample set, patient survival was significantly (P=0.0011) correlated with matched values for transgelin/CFB expression ratios. As expected, median patient survival decreased as the value of the transgelin/CFB expression ratio increased.
Next, we formed a contingency table by sorting the number of patient samples with transgelin/CFB expression ratios either above 1 or below 1 into groups representing 5-month survival increments (Table 5). In this case, the prognostic value of the transgelin/CFB expression ratio was again confirmed. Statistical analysis revealed that survival and ratio value were significantly associated (P=0.007) and that this association follows a significant linear trend (P=0.0076). Still, prediction of prognosis was most efficient at either survival extreme (<5 months and >15 months), with 100% (7/7) of samples from patients with the poorest survival having ratios >1 and 83% (5/6) of samples from patients with the longest survival having ratios <1 (Table 5).
Table 5
T/C Ratio(a) Median Patient Survival(b)
Greater than 10 4 (n=8)
Greater than 1 5 (n=14)
Less than 1 12 (n=18)
Less than 0.5 14 (n=12)
Less than 0.1 17 (n=4)
All samples 9 (n=32)
Survival(b) Samples with T/C ratio <1 Samples with T/C ratio >1
<5 0/7 (0%) 7/7 (100%)
5-10 8/11 (73%) 3/11 (27%)
10-15 5/8 (63%) 3/8 (37%)
>15 5/6 (83%) 1/6 (17%)
a, Value of transgelin/CFB gene expression level ratio
b, in months
As mentioned previously, prognostic markers were originally selected by examining gene expression level differences between samples in subclasses with the greatest difference in median patient survival (Subclass 1 and Subclass 2, see Table 3). It also was found that these genes could distinguish tumors of the epithelial subtype from all others. Then, using the larger cohort of samples, we examined whether there was evidence that the transgelin/CFB expression ratio provided a valuable diagnostic tool in addition to a predictor of prognosis. We found that the transgelin/CFB expression ratio identified the histological subtypes of tumors with a high degree of accuracy. All epithelial subtype tumors (16/16, 100%) had ratio values <1 and nearly all mixed/sarcomatoid subtype tumors (14/16, 88%) had ratio values >1. The 2 non-epithelial subtype samples that were incorrectly diagnosed in this case originated from patients with atypically long survival (12 and 13 months), much longer than the median survival (6 months) of all non-epithelial subtype samples, thereby reflecting the original intent of the ratio (i.e., prediction of prognosis). Following the use of appropriate filtering techniques (e.g., Hsiao, 2001), expression level ratios were found to be extremely robust in differentiating the epithelial subtype of MPM using raw data obtained from Affymetrix arrays with probe sets representing 6,800 genes (n=11) and from Affymetrix arrays with probe sets representing 12,000 genes hybridized and scanned by another laboratory (n=13). In these 24 samples, the transgelin/CFB ratio correctly predicted histological subtype in 18, with 2 errors and 4 marginal calls. (The marginal calls were conservatively made when the ratio value was between 0.5 and 2.)
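The ratio-based call described above (ratio <1 suggests epithelial subtype, ratio >1 suggests non-epithelial subtype, with values between 0.5 and 2 treated as marginal) can be written as a small decision rule. This is a minimal sketch, not the authors' implementation: the function name, the handling of a ratio of exactly 1, and the inclusive treatment of the 0.5 and 2 boundaries are assumptions, since the text does not specify them.

```python
def call_subtype(tc_ratio, marginal_low=0.5, marginal_high=2.0):
    """Return (call, is_marginal) for one transgelin/CFB expression ratio.

    Assumptions: boundaries of the marginal zone are inclusive, and a ratio
    of exactly 1 is reported as "indeterminate".
    """
    if tc_ratio <= 0:
        raise ValueError("expression ratio must be positive")
    if tc_ratio < 1:
        call = "epithelial"
    elif tc_ratio > 1:
        call = "non-epithelial"
    else:
        call = "indeterminate"
    is_marginal = marginal_low <= tc_ratio <= marginal_high
    return call, is_marginal


if __name__ == "__main__":
    for ratio in (0.1, 0.8, 1.5, 12.0):  # made-up ratio values
        call, marginal = call_subtype(ratio)
        flag = " (marginal)" if marginal else ""
        print(f"T/C ratio {ratio:>5}: {call}{flag}")
```

Reporting the marginal flag separately from the call keeps the conservative zone visible without discarding the direction of the ratio.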
The genes used to create expression ratios (e.g., transgelin and CFB) are not random predictors of diagnosis/prognosis, but have notable biological relevance to carcinogenesis. CFB is significantly overexpressed in epithelial tumors while transgelin is significantly underexpressed in the same samples. Transgelin binds to native actin filament bundles and gels actin in vitro (Shapland, 1993) and has been proposed as a marker of neoplastic transformation (Lawson, 1997). CFB has been implicated in tumor apoptosis in a manner independent of TNF/TNFR or FasL/Fas interactions (Uwai, 2000). Although not determined in this study to have a functional role in MPM carcinogenesis, CFB's pro-apoptotic function is consistent with the observations showing high levels of this gene significantly correlated with relatively good prognosis (i.e., survival).
Table 6 Accession No.:Description
M81757:H.sapiens S19 ribosomal protein mRNA, complete cds
U14969:Human ribosomal protein L28 mRNA, complete cds
M62895:Human lipocortin (LIP) 2 pseudogene mRNA, complete cds-like region
AL022097:Homo sapiens DNA sequence from PAC 256G22 on chromosome 6p24.1-25.3.
Z28407:H.sapiens mRNA for ribosomal protein L8
X64707:H.sapiens BBC1 mRNA
U14971:Human ribosomal protein S9 mRNA, complete cds
X17206:Human mRNA for LLRep3
M17885:Human acidic ribosomal phosphoprotein P0 mRNA, complete cds
AL031228:dJ1033B10.4 (40S ribosomal protein S18 (RPS18, KE-3
L11566:Homo sapiens ribosomal protein L18 (RPL18) mRNA, complete cds
M17733:Human thymosin beta-4 mRNA, complete cds
U14972:Human ribosomal protein S10 mRNA, complete cds
M64716:Human ribosomal protein S25 mRNA, complete cds
X67247:H.sapiens rpS8 gene for ribosomal protein S8
X16064:Human mRNA for translationally controlled tumor protein
J04755:Human ferritin H processed pseudogene, complete cds
L05095:Homo sapiens ribosomal protein L30 mRNA, complete cds
AL022326:dJ333H23.1.1 (60S Ribosomal Protein L3)
Z48501:H.sapiens mRNA for polyadenylate binding protein II
X69391:H.sapiens mRNA for ribosomal protein L6
X65923:H.sapiens fau mRNA
M17886:Human acidic ribosomal phosphoprotein P1 mRNA, complete cds
M17886:Human acidic ribosomal phosphoprotein P1 mRNA, complete cds
L06499:Homo sapiens ribosomal protein L37a (RPL37A) mRNA, complete cds
X55954:Human mRNA for HL23 ribosomal protein homologue
M13934:Human ribosomal protein S14 gene, complete cds
X63527:H.sapiens mRNA for ribosomal protein L19
U14968:Human ribosomal protein L27a mRNA, complete cds
U14970:Human ribosomal protein S5 mRNA, complete cds
L06498:Homo sapiens ribosomal protein S20 (RPS20) mRNA, complete cds
X53777:Human L23 mRNA for putative ribosomal protein
Z12962:H.sapiens mRNA for homologue to yeast ribosomal protein L41
AB002533:Homo sapiens mRNA for Qip1, complete cds
X80822:H.sapiens mRNA for ORF
L01124:Human ribosomal protein S13 (RPS13) mRNA, complete cds
D23661:Human mRNA for ribosomal protein L37, complete cds
L38941:Homo sapiens ribosomal protein L34 (RPL34) mRNA, complete cds
X95404:H.sapiens mRNA for non-muscle type cofilin
U14966:Human ribosomal protein L5 mRNA, complete cds
X52851:Human cyclophilin gene for cyclophilin (EC 5.2.1.8)
AF037643:Homo sapiens 60S ribosomal protein L12 (RPL12) pseudogene, partial sequence
Z49148:H.sapiens mRNA for ribosomal protein L29
X15940:Human mRNA for ribosomal protein L31
M94314:Homo sapiens ribosomal protein L30 mRNA, complete cds
Z26876:H.sapiens gene for ribosomal protein L38
Z19554:H.sapiens vimentin gene
X04098:Human mRNA for cytoskeletal gamma-actin
M13932:Human ribosomal protein S17 mRNA, complete cds
M13932:Human ribosomal protein S17 mRNA, complete cds
M24194:Human MHC protein homologous to chicken B complex protein mRNA, complete cds
M24194:Human MHC protein homologous to chicken B complex protein mRNA, complete cds
M58458:Human ribosomal protein S4 (RPS4X) isoform mRNA, complete cds
AB021288:Homo sapiens mRNA for beta 2-microglobulin, complete cds
X55715:Human Hums3 mRNA for 40S ribosomal protein s3
AL031670:dJ681N20.2 (ferritin, light polypeptide-like 1)
X56932:H.sapiens mRNA for 23 kD highly basic protein
X67309:H.sapiens gene for ribosomal protein S6
X57958:H.sapiens mRNA for ribosomal protein L7
U09953:Human ribosomal protein L9 mRNA, complete cds
K00558:human alpha-tubulin mRNA, complete cds
X03342:Human mRNA for ribosomal protein L32
M31520:Human ribosomal protein S24 mRNA
X63432:H.sapiens ACTB mRNA for mutant beta-actin (beta-actin)
X06617:Human mRNA for ribosomal protein S11
AB009010:Homo sapiens mRNA for polyubiquitin UbC, complete cds
AB009010:Homo sapiens mRNA for polyubiquitin UbC, complete cds
U37230:Human ribosomal protein L23a mRNA, complete cds
M26252:Human TCB gene encoding cytosolic thyroid hormone-binding protein, complete cds
D23660:Human mRNA for ribosomal protein, complete cds
L20941:Human ferritin heavy chain mRNA, complete cds
M16660:Human 90-kDa heat-shock protein gene, cDNA, complete cds
M22919:Human nonmuscle/smooth muscle alkali myosin light chain gene, complete cds
U34995:Human normal keratinocyte substraction library mRNA, clone H22a, complete sequence
Z23090:H.sapiens mRNA for 28 kDa heat shock protein
J03077:Human co-beta glucosidase (proactivator) mRNA, complete cds
X56009:Human GSA mRNA for alpha subunit of GsGTP binding protein
X04409:Human mRNA for coupling protein G(s) alpha-subunit (alpha-S1)
M14630:Human prothymosin alpha mRNA, complete cds
AB011114:Homo sapiens mRNA for KIAA0542 protein, complete cds
AI201310:qf71b11.x1 Homo sapiens cDNA, 3' end
AI525834:PT1.3_06_D01.r Homo sapiens cDNA, 5' end
AF054187:Homo sapiens alpha NAC mRNA, complete cds
AF054187:Homo sapiens alpha NAC mRNA, complete cds
J04182:Homo sapiens lysosomal membrane glycoprotein-1 (LAMP1) mRNA, complete cds
R87876:yo45h01.r1 Homo sapiens cDNA, 5' end
J03592:Human ADP/ATP translocase mRNA, 3' end
T89651:yd99a05.s1 Homo sapiens cDNA, 3' end
X79234:H.sapiens mRNA for ribosomal protein L11
X13546:Human HMG-17 gene for non-histone chromosomal protein HMG-17
D32129:Human mRNA for HLA class-I (HLA-A26) heavy chain, complete cds (clone cMIY-1)
X57352:Human 1-8U gene from interferon-inducible gene family
U73824:Human p97 mRNA, complete cds
U49869:Human ubiquitin gene, complete cds
AI526078:DU3.2-7.G08.r Homo sapiens cDNA, 5' end
AI557852:P6test.G05.r Homo sapiens cDNA, 5' end
X58965:H.sapiens RNA for nm23-H2 gene
X74929:H.sapiens KRT8 mRNA for keratin 8
W52024:zd13a03.s1 Homo sapiens cDNA, 3' end
AL050224:Homo sapiens mRNA; cDNA DKFZp586L2123 (from clone DKFZp586L2123)
AI541542:libtest16.A02.r Homo sapiens cDNA, 5' end
M33680:Human 26-kDa cell surface protein TAPA-1 mRNA, complete cds
M63573:Human secreted cyclophilin-like protein (SCYLP) mRNA, complete cds
Z11692:H.sapiens mRNA for elongation factor 2
M22806:Human prolyl 4-hydroxylase beta-subunit and disulfide isomerase (P4HB) gene
X62654:H.sapiens gene for Me491/CD63 antigen
X13710:H.sapiens unspliced mRNA for glutathione peroxidase
J00194:human hla-dr antigen alpha-chain mrna & ivs fragments
X58536:Human mRNA for HLA class I locus C heavy chain
U15131:Human p126 (ST5) mRNA, complete cds
L13210:Human Mac-2 binding protein mRNA, complete cds
AI541256:pec1.2-3.F11.r Homo sapiens cDNA, 5' end
J04599:Human hPGI mRNA encoding bone small proteoglycan I (biglycan), complete cds
AA044823:zk72a10.s1 Homo sapiens cDNA, 3' end /clone=IMAGE-488346
J02984:Human insulinoma rig-analog mRNA encoding DNA-binding protein, complete cds
AF095154:Homo sapiens C1q-related factor mRNA, complete cds
L41498:Homo sapiens longation factor 1-alpha 1 (PTI-1) mRNA, complete cds
X56681:Human junD mRNA
M94046:Human zinc finger protein (MAZ) mRNA
AA977163:oq25a04.s1 Homo sapiens cDNA, 3' end
AA977163:oq25a04.s1 Homo sapiens cDNA, 3' end
M55914 =HUMCMYCQ Human c-myc binding protein (MBP-1) mRNA, complete cds
M64241 =HUMQM Human Wilm's tumor-related protein (QM) mRNA, complete cds
X58965 =HSNM23H2G H.sapiens RNA for nm23-H2 gene
D11139 =HUMTIMP Human gene for tissue inhibitor of metalloproteinases, partial sequence
M55409 =HUMPANCAN Homo sapiens pancreatic tumor-related protein mRNA, partial cds
M84711 =HUMFTE1A Human v-fos transformation effector protein (Fte-1), mRNA complete cds
X56681 =HSJUNDR Human junD mRNA
M26880 =HUMUBI13 Human ubiquitin mRNA, complete cds
X04803 =HSYUBG1 Homo sapiens ubiquitin gene
D78361 =HUMODAZ Human mRNA for ornithine decarboxylase antizyme, ORF 1 and ORF 2
J04617 =HUMEF1A Human elongation factor EF-1-alpha gene, complete cds
J04988 =HUMHSP90B Human 90 kD heat shock protein gene, complete cds
D00017 =HUMLIC Homo sapiens mRNA for lipocortin II, complete cds
J03040 =HUMSPARC Human SPARC/osteonectin mRNA, complete cds
J04164 =HUM927A Human interferon-inducible protein 9-27 mRNA, complete cds
V00567 =HSMGLO Human messenger RNA fragment for the beta-2 microglobulin
D14530 =HUMRSPT Human homolog of yeast ribosomal protein S28, complete cds
Ribosomal Protein S20
M14199 =HUMLAMR Human laminin receptor (2H5 epitope) mRNA, 5' end
M63138 =HUMCATD5 Human cathepsin D (catD) gene, exons 7, 8, and 9
S82297 =S82297 beta 2-microglobulin
V00599 =HSTUB2 Human mRNA fragment encoding beta-tubulin. (from clone D-beta-1)
Table 7
Symbol Description
BF B-factor, properdin
MSLN mesothelin
TM4SF1 transmembrane 4 superfamily member 1
CYC1 cytochrome c-1
RPL12 ribosomal protein L12
POLR2L polymerase (RNA) II (DNA directed) polypeptide L (7.6kD)
RPL18 ribosomal protein L18
RPL18A ribosomal protein L18a
RPS23 ribosomal protein S23
RPS21 ribosomal protein S21
RPL27 ribosomal protein L27
K-ALPHA-1 tubulin, alpha, ubiquitous
ARHGAP1 Rho GTPase activating protein 1
TPM1 tropomyosin 1 (alpha)
APOL apolipoprotein L
TPM1 tropomyosin 1 (alpha)
SPARC secreted protein, acidic, cysteine-rich (osteonectin)
COL1A2 collagen, type I, alpha 2
FN1 fibronectin 1
NA Fibronectin, Alt. Splice 1
FN1 fibronectin 1
COL5A2 collagen, type V, alpha 2
COL1A2 collagen, type I, alpha 2
ACTA2 actin, alpha 2, smooth muscle, aorta
TAGLN transgelin
Table 8
Accession # Symbol Description
U38980 PMS2L11 postmeiotic segregation increased 2-like 11
J04152 TACSTD2 tumor-associated calcium signal transducer 2
AI820718 Homo sapiens cDNA, 5' end
U43203 TITF1 thyroid transcription factor 1
AB000714 CLDN3 claudin 3
AJ002308 SYNGR2 synaptogyrin 2
AB000712 CLDN4 claudin 4
AF015128 Homo sapiens IgG heavy chain variable region (Vh26) mRNA
M18728 CEACAM6 carcinoembryonic antigen-related cell adhesion molecule 6
D83402 Homo sapiens gene for prostacyclin synthase
J02761 SFTPB surfactant, pulmonary-associated protein B
X56667 CALB2 calbindin 2, (29kD, calretinin)
X16662 ANXA8 annexin A8 (vascular anticoagulant-beta (VAC-beta))
AB016789 GFPT2 glutamine-fructose-6-phosphate transaminase 2
Z93930 XBP1 X-box binding protein 1
AI651806 LOC51232 cysteine-rich repeat-containing protein S52 precursor
AW024285 Homo sapiens cDNA, 3' end
AI445461 TM4SF1 transmembrane 4 superfamily member 1
M93036 TACSTD1 tumor-associated calcium signal transducer 1
M21389 KRT5 keratin 5
References for Example 2
1. Golub, T.R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-537 (1999).
2. Perou, C.M. et al. Molecular portraits of human breast tumours. Nature 406, 747-752 (2000).
3. Hedenfalk, I. et al. Gene expression profiles in hereditary breast cancer. N Engl J Med 344, 539-548 (2001).
4. Khan, J. et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7, 673-679 (2001).
5. Quackenbush, J. Computational analysis of microarray data. Nat Rev Genet 2, 418-427 (2001).
6. Corson, J.M. & Renshaw, A.A. Pathology of mesothelioma. In Comprehensive Textbook of Thoracic Oncology (eds Aisner, J., Arriagada, R., Green, M.R., Martini, N. & Perry, M.C.) 757-758 (Williams and Wilkins, Baltimore, MD, 1996).
7. Virtaneva, K. et al. Expression profiling reveals fundamental biological differences in acute myeloid leukemia with isolated trisomy 8 and normal cytogenetics. Proc Natl Acad Sci USA 98, 1124-1129 (2001).
8. Welsh, J.B. et al. Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc Natl Acad Sci USA 98, 1176-1181 (2001).
9. Clark, E.A., Golub, T.R., Lander, E.S. & Hynes, R.O. Genomic analysis of metastasis reveals an essential role for RhoC. Nature 406, 532-535 (2000).
10. Mountain, C.F. Revisions in the international system for staging lung cancer. Chest 111, 1710-1717 (1997).
11. Fodor, S.A. Massively parallel genomics. Science 277, 393-395 (1997).
12. Lawson, D., Harrison, M. & Shapland, C. Fibroblast transgelin and smooth muscle SM22α are the same protein, the expression of which is down-regulated in many cell lines. Cell Motil Cytoskeleton 38, 250-257 (1997).
13. Shapland, C., Hsuan, J.J., Totty, N.F. & Lawson, D. Purification and properties of transgelin: A transformation and shape change sensitive actin-gelling protein. J Cell Biol 121, 1065-1073 (1993).
14. Uwai, M. et al. A new apoptotic pathway for the complement factor B-derived fragment Bb. J Cell Physiol 185, 280-292 (2000).
15. Sugarbaker, D.J. et al. Extrapleural pneumonectomy in the multimodality therapy of malignant pleural mesothelioma. Results in 120 consecutive patients. Ann Surg 224, 288-294 (1996).
16. Wang, K. et al. Monitoring gene expression profile changes in ovarian carcinomas using cDNA microarrays. Gene 229, 101-108 (1999).
17. Warrington, J.A., Nair, A., Mahadevappa, M. & Tsyganskaya, M. Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiol Genomics 2, 143-147 (2000).
18. O'Dell, S.D., Bujac, S.R., Miller, G.J. & Day, I.N. Associations of IGF2 ApaI RFLP and INS VNTR class I allele size with obesity. Eur J Hum Genet 7, 565-576 (1999).
19. Venables, W.N. & Ripley, B.D. Modern Applied Statistics with S-Plus (Springer, New York, 1997).
20. Harrison's Principles of Internal Medicine, 14/e (McGraw-Hill Companies, New York, 1998).
21. The Chipping Forecast, Nature Genetics 21(1), 1-60 (1999).
22. Gwynne, P. and Page, G. Microarray Analysis: the next revolution in Molecular Biology. Science eMarketplace, Science, August 6 (1999). (sciencemag.org/feature/e-market/benchtop/micro.shl)
23. Molecular Cloning: A Laboratory Manual, J. Sambrook, et al., eds., Second Edition (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989).
24. Current Protocols in Molecular Biology, F.M. Ausubel, et al., eds. (John Wiley & Sons, Inc., New York, 1999).
25. Wagner et al., Nature Biotechnol. 14, 840-844 (1996).
26. Sainio, K., Saarma, M., Nonclercq, D., Paulin, __., and Sariola, H. Antisense inhibition of low-affinity nerve growth factor receptor in kidney cultures: power and pitfalls. Cell Mol. Neurobiol. 14(5), 439-457 (1994).
Example 3: Prediction of Outcome in Mesothelioma Using Gene Expression Ratios
Introduction
Malignant pleural mesothelioma is an asbestos-related, lethal neoplastic disease of the pleura (median survival between 4 and 12 months) subdivided into three major histological subtypes: epithelial, mixed, and sarcomatoid (1-4). Compared to patients with non-epithelial subtypes, patients with the epithelial subtype show a survival benefit from a variety of treatment strategies, including aggressive multi-modality therapy (5-7). Currently, patients who present to our unit with unilateral mesothelioma without extrapleural invasion undergo complete surgical resection (extra-pleural pneumonectomy) followed by chemoradiation. The 5-year survival for those patients with stage I and epithelial histology is 40%. However, there are no predictive factors, prognostic molecular markers, or genetic abnormalities other than histological subtype to preoperatively identify these (or other) long-term survivors. In addition, established methods to predict outcome in mesothelioma based on histological appearance are somewhat subjective, prone to human error, and are ineffective for small patient cohorts or in extreme cases for individual patients (3,8,9).
Gene expression profiling using microarrays holds promise to improve strategies for tumor classification as well as for prediction of response to therapy and survival in cancer (10-16). Nevertheless, no clear consensus exists regarding which computational tools are optimal for the analysis of large gene expression profiling data sets, particularly when predicting outcome. As a result, microarray-based research has not yet significantly impacted the clinical treatment of disease. Recently, we have shown that simple ratios of gene expression using as few as four to six genes are highly accurate in the diagnosis of cancer, and we hypothesized that this technique was equally useful in additional clinical applications (17). To explore this further, we used gene expression profiling data (17) of mesothelioma samples from patients with widely divergent survival to create an expression ratio-based test capable of predicting outcome in mesothelioma in a manner independent of the histological subtype of the tumor. We found that a simple test (based on the expression levels of four genes) can (i) predict outcome in mesothelioma with high accuracy, (ii) use relatively inexpensive data acquisition platforms, and (iii) analyze individual patients without reference to additional samples.
Methods
Mesothelioma tumor tissues. Discarded mesothelioma surgical specimens were freshly collected and flash frozen from patients undergoing definitive surgery for mesothelioma at Brigham and Women's Hospital who did not receive pre-operative treatment (6). To train an outcome predictor model in this study, we used previously published microarray data (17) to identify a subset of mesothelioma samples obtained from patients with widely divergent survival (n=17 total). An additional 29 samples (i.e., the test set) were used for quantitative RT-PCR analysis only. Each tumor specimen contained greater than 50% tumor cells. Linked clinical and pathological data were obtained for all patients who contributed tumor specimens and rendered anonymous to protect patient confidentiality. Studies utilizing human tissues were approved by and conducted in accordance with the policies of the Institutional Review Board at Brigham and Women's Hospital.
Real time quantitative RT-PCR. Total RNA (2 μg) isolated from 29 tumors in the test set was reverse-transcribed into cDNA using Taq-Man Reverse Transcription reagents (Applied Biosystems, Foster City, CA) and quantified using all recommended controls. Primer sequences (synthesized by Invitrogen Life Technologies) were as follows (forward and reverse):
L6 5'-TTCCATTCCACAATGTGCTT-3' (SEQ ID NO:79) and 5'-GGCCAGTGGAACTACACCTT-3' (SEQ ID NO:80);
KIAA0977 5'-AACCGAAGCCTAACCTGAGA-3' (SEQ ID NO:81) and 5'-GTCATTTTGGGAGCAGGTTT-3' (SEQ ID NO:82);
GDIA1 5'-AGAAGCAGTCGTTTGTGCTG-3' (SEQ ID NO:83) and 5'-TGTACTTCATGCCGGACACT-3' (SEQ ID NO:84); and
CTHBP 5'-ATCTGAAGTTTGGGGTCGAG-3' (SEQ ID NO:85) and 5'-TCTCTCCCAGGACCTTCCTA-3' (SEQ ID NO:86).
PCR amplification was performed using an Applied Biosystems 5700 Sequence Detector. No-template (negative) controls containing H2O substituted for template were run in multiple wells on every reaction plate. An automatically calculated melting point dissociation curve generated after every assay was examined to ensure the presence of a single PCR species and a lack of primer-dimer formation in each well. The Comparative CT method (Applied Biosystems) was used with minor modifications to obtain quantitative values for gene expression ratios in all samples. Calculation of an expression ratio using data from two genes in any single sample negates the need for a calibrator sample and a reference gene to standardize for different starting template amounts. Therefore, to form expression ratios of two genes, we merely stated the expression level of one gene relative to the other. In this case, the ΔΔCT value in the Comparative CT equation reduces to [CT(gene 1) - CT(gene 2)].
Data and statistical analysis. A two-sided Student's (parametric) t-test was used for pair-wise comparisons of average gene expression levels among multiple groups, and the Significance Analysis of Microarrays (SAM) algorithm (18) was used to estimate the false discovery rate. Kaplan-Meier curves were used to estimate survival in each group. The log-rank test was used to statistically assess differences among multiple survival curves. A Cox proportional-hazards regression model was used for multivariate analysis. The "leave-one-out" method of cross validation (16,19,20) was used to assess internal consistency of the predictor model, and the result was analyzed using Fisher's exact test (i.e., a 2x2 contingency table). All differences were determined to be statistically significant if P<0.05. Data from three highly accurate gene expression ratios were combined by calculating the geometric mean, (R1 x R2 x R3)^(1/3), where Ri represents a single ratio value. This is mathematically equivalent to averaging [log2(R1), log2(R2), log2(R3)] and converting the result back to a linear scale, thereby giving equal weight to ratio fold-changes of identical magnitude but opposite direction. All calculations and statistical comparisons were generated using S-PLUS (21).
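The ratio and geometric-mean calculations described above can be illustrated with a minimal Python sketch. It assumes roughly 100% amplification efficiency (one doubling per cycle), so that a two-gene ratio is 2 raised to the negative CT difference; the text states only that the Comparative CT method was used "with minor modifications", so this is an approximation rather than the exact procedure. The function names and CT values are hypothetical.

```python
import math

def expression_ratio_from_ct(ct_gene1, ct_gene2):
    """Ratio gene1/gene2 from threshold cycles.

    Assumption: product doubles every cycle, so ratio = 2 ** -(CT1 - CT2).
    """
    return 2.0 ** -(ct_gene1 - ct_gene2)

def geometric_mean(ratios):
    """Geometric mean computed in log2 space: 2 ** mean(log2(R_i))."""
    logs = [math.log2(r) for r in ratios]
    return 2.0 ** (sum(logs) / len(logs))

# Hypothetical CT values for one sample (not taken from the study).
ct = {"KIAA0977": 24.0, "L6": 23.0, "GDIA1": 25.0, "CTHBP": 22.0}

ratios = [
    expression_ratio_from_ct(ct["KIAA0977"], ct["GDIA1"]),  # 2 ** 1
    expression_ratio_from_ct(ct["L6"], ct["CTHBP"]),        # 2 ** -1
    expression_ratio_from_ct(ct["L6"], ct["GDIA1"]),        # 2 ** 2
]
score = geometric_mean(ratios)  # cube root of (2 * 0.5 * 4)
```

Working in log2 space makes the symmetry explicit: a 2-fold increase and a 2-fold decrease contribute +1 and -1 and cancel.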
Results
Identification of prognostic molecular markers in mesothelioma. We have previously identified for study a representative cohort of 31 mesothelioma tumors obtained at pneumonectomy (17). The estimated median patient survival (11 months, Fig. 6A) and histological distribution of this group mirror those of mesothelioma patients in our practice (6). The histological subtype of the tumor was not predictive of outcome for these samples (P=0.129, log-rank test, Fig. 6B), even though the estimated median survival of epithelial subtype samples (17 months) was longer than that for non-epithelial subtype samples (8.5 months). To identify genes that are discriminatory between tumors from patients with widely divergent survival and to create an expression ratio-based predictor model, we utilized microarray data (17) for mesothelioma samples that originated from patients whose survival was within the 25th percentile of both disease-related survival extremes irrespective of tumor histological subtype (i.e., the training set, n=17, Table 9A). We formed two groups using these samples: relatively good outcome (survival > 17 months, n=8) and relatively poor outcome (survival < 6 months, n=9). The most accurate model developed in the training set was subsequently tested in an independent cohort of samples (i.e., the test set, n=29, Table 9B). We searched all of the genes represented on the microarray for those with a statistically significant >2-fold difference in average expression levels between good outcome and poor outcome tumors in the training set of samples. To minimize the effects of background noise, the list of distinguishing genes was further refined by requiring that the mean expression level be >500 in at least one of the two sample sets. We identified a total of 46 prognostic genes in this analysis with an estimated false discovery rate of 10%-20%. The 10 genes with the lowest P values overexpressed in each group are listed in Table 10.
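The gene selection criteria described above (statistical significance, a >2-fold difference in group means, and a mean expression level >500 in at least one group) can be sketched as a simple filter. This is an illustration only: the function name and example values are hypothetical, and the P value is assumed to have been computed separately (for instance by a two-sided t-test), since that step is not reproduced here.

```python
from statistics import mean

def passes_filter(good, poor, p_value, alpha=0.05, min_fold=2.0, min_mean=500.0):
    """Return True if a gene meets the three selection criteria.

    good, poor: expression values in the good and poor outcome groups.
    p_value: precomputed significance of the group difference (assumption).
    """
    m_good, m_poor = mean(good), mean(poor)
    # Noise filter: the mean must exceed min_mean in at least one group.
    if max(m_good, m_poor) <= min_mean:
        return False
    # Guard against non-positive means before forming a fold change.
    if min(m_good, m_poor) <= 0:
        return False
    fold = max(m_good, m_poor) / min(m_good, m_poor)
    return p_value < alpha and fold > min_fold

# Hypothetical expression values, not taken from the study.
keep = passes_filter([1200, 1000, 1400], [400, 500, 300], p_value=0.003)
```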
Table 9A. Clinical characteristics of MPM tumors, Training Set
Sample  Age (years)  Sex  Histology^a  BWH Stage  Survival (months)  Status^b
72 46 m mixed 2 53 3
74 40 f ept 1 51 2
90 48 m ept 2 28 2
2 44 f ept 2 26 2
68 61 m ept 2 21 3
33 60 f ept 2 20 3
109 62 m ept 2 19 3
76 67 m ept 1 17 3
130 55 m mixed 2 6 3
166 66 m sarc 2 6 3
67 49 f ept 2 6 3
229 33 f ept 2 5 3
6 39 m ept 2 5 3
89 55 m mixed 2 3 3
133 69 m mixed 2 2 3
114 51 m mixed 2 2 3
159 62 m sarc 2 2 3
a, ept, epithelial; sarc, sarcomatoid
b, 1, alive without disease; 2, alive with disease; 3, dead with disease; 4, dead other causes; U, unknown

Table 9B. Clinical characteristics of MPM tumors, Test Set
Sample  Age (years)  Sex  Histology^a  BWH Stage  Survival (months)  Status^b
169 46 m ept 2 7 3
146 67 m ept 2 7 3
219 39 m ept 2 6 1
104 40 m ept 2 5 3
110 64 m ept 2 5 3
112 31 m ept 2 55 3
165 51 m ept 2 27 2
5 51 m ept 2 8 3
148 51 m ept 2 17 3
96 40 m ept 2 1 3
134 56 m ept 2 1 4
216 43 f ept 2 8 1
208 63 f ept 2 7 1
224 68 f ept 2 6 1
225 35 f ept 2 42 2
163 68 f ept 2 25 1
235 46 m mixed 2 24 3
206 45 m mixed 2 45 2
107 69 m mixed 2 16 3
302 55 m mixed 2 13 3
161 59 m mixed 2 12 3
220 71 m mixed 2 12 3
217 57 m mixed 1 5 1
150 58 m mixed 2 3.6 3
44 57 m mixed 2 2 4
222 57 m mixed 2 1 U
154 56 f mixed 2 9 3
70 57 m sarc. 2 8 3
228 73 m sarc. 2 4 3
a, ept, epithelial; sarc., sarcomatoid
b, 1, alive without disease; 2, alive with disease; 3, dead with disease; 4, dead other causes; U, unknown
Prediction of outcome using gene expression ratios. We chose the four genes most significantly overexpressed in each group (Table 10) to determine whether expression ratios could accurately classify the 17 samples used to train the model. We calculated a total of 16 possible expression ratios per sample by dividing the expression value of each of the 4 genes (i.e., SBP, KIAA0977 protein, L6 EST, LAR) expressed at relatively higher levels in good outcome samples by the expression value of each of the 4 genes (i.e., CTHBP, calgizzarin, IGFBP-3, GDIA1) expressed at relatively higher levels in poor outcome samples. Samples with ratio values >1 were predicted to be "good outcome" and those with ratio values <1 were predicted to be "poor outcome". The five most accurate ratios each correctly identified 88% (15/17) of the samples used to train the model. To incorporate the predictive accuracy of multiple ratios, we calculated the geometric mean (see Methods) for all possible 3-ratio combinations (formed using these 5 ratios) and found that we could identify training samples with accuracy that met or exceeded that of any of the gene pair ratios when used alone (average=94%, range 88%-100%). For further analysis, we chose one of the two 3-ratio combinations that correctly classified 100% (17/17) of the training samples. A total of 4 genes were used in this 3-ratio test: KIAA0977/GDIA1, L6/CTHBP, and L6/GDIA1.
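The ratio construction described above can be sketched in Python as follows. The gene groupings and the three ratios of the final test come from the text; the expression values, function names, and the handling of a ratio exactly equal to 1 (which the text does not address) are assumptions made for illustration.

```python
import math
from itertools import combinations, product

# Gene groups as named in the text; expression values below are hypothetical.
GOOD_GENES = ["SBP", "KIAA0977", "L6", "LAR"]
POOR_GENES = ["CTHBP", "calgizzarin", "IGFBP-3", "GDIA1"]
MODEL = [("KIAA0977", "GDIA1"), ("L6", "CTHBP"), ("L6", "GDIA1")]

def all_ratios(expr):
    """All 4 x 4 = 16 good-gene/poor-gene ratios for one sample."""
    return {(g, p): expr[g] / expr[p] for g, p in product(GOOD_GENES, POOR_GENES)}

def call_outcome(value):
    # A value of exactly 1 is treated as "poor" here (assumption).
    return "good" if value > 1 else "poor"

def combined_call(expr, pairs):
    """Classify one sample by the geometric mean of the given ratios."""
    logs = [math.log2(expr[g] / expr[p]) for g, p in pairs]
    return call_outcome(2.0 ** (sum(logs) / len(logs)))

def accuracy(samples, labels, pairs):
    """Fraction of samples whose combined call matches the known label."""
    hits = sum(combined_call(e, pairs) == y for e, y in zip(samples, labels))
    return hits / len(samples)

sample = {"SBP": 800, "KIAA0977": 600, "L6": 900, "LAR": 500,
          "CTHBP": 300, "calgizzarin": 400, "IGFBP-3": 1000, "GDIA1": 450}

single_calls = {pair: call_outcome(r) for pair, r in all_ratios(sample).items()}
# Number of distinct 3-ratio combinations that can be drawn from 5 ratios.
n_triplets = len(list(combinations(range(5), 3)))
```

Because each sample is scored from its own expression values alone, a new sample can be classified without reference to the training data once the ratios are fixed.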
Table 10. Mesothelioma prognostic genes
Accession # P value Ratio^a Description
Expressed at relatively higher levels in good outcome tumors
U29091 0.0033 2.8 selenium-binding protein (SBP)
AB023194 0.0065 2.1 KIAA0977 protein
AI445461 0.0073 3.0 EST (similar to L6 tumor antigen)
Y00815 0.0077 2.0 leukocyte antigen related protein (LAR)
D84424 0.0094 6.0 hyaluronan synthase
Y00318 0.0103 3.6 complement control protein factor I
AL049963 0.0103 3.7 EST
AJ223352 0.0142 3.5 histone H2B
AB000220 0.0181 2.3 semaphorin E
L39945 0.0182 2.5 cytochrome b5 (CYB5)
M90657 0.0256 2.8 L6 tumor antigen
AB002301 0.0257 2.1 KIAA0303 protein
Expressed at relatively higher levels in poor outcome tumors
M26252 0.0013 0.38 cytosolic thyroid hormone-binding protein (CTHBP)
D38583 0.0041 0.43 calgizzarin
*M35878 0.0046 0.35 insulin-like growth factor-binding protein-3 (IGFBP-3)
X69550 0.0063 0.47 GDP-dissociation inhibitor 1 (GDIA1)
M95787 0.0068 0.33 22kDa smooth muscle protein (SM22), AKA transgelin
AB023208 0.0069 0.49 KIAA0991 protein
X95735 0.0105 0.43 zyxin
AA976838 0.0131 0.40 EST
U90878 0.0132 0.49 carboxyl terminal LIM domain protein (CLIM1)
*M35878 0.0135 0.30 insulin-like growth factor-binding protein-3 (IGFBP-3)
U53204 0.0169 0.39 plectin (PLEC1)
M95178 0.0215 0.40 non-muscle alpha-actinin
a, average expression level in good outcome samples/average expression level in poor outcome samples
*, IGFBP-3 is listed twice in the lower portion of the table because this gene is represented by multiple Affymetrix probe sets.
Verification of microarray data. Next, we utilized quantitative RT-PCR to verify gene expression levels measured in microarray-based analysis. We randomly chose 3 samples from each group: the good outcome group (samples 74, 33, and 68) and the poor outcome group (samples 89, 229, and 67). Using RT-PCR, we determined the relative expression level of all 4 prognostic genes (L6, GDIA1, CTHBP, and KIAA0977 protein) in these 6 samples. Then, we calculated the 3 individual ratios previously used to predict outcome: KIAA0977/GDIA1, L6/CTHBP, and L6/GDIA1. Finally, we calculated the geometric mean of these 3 ratios and compared the magnitude and direction (i.e., >1 or <1) of this number to that obtained using microarray analysis. We found that the classifications based on the 3-ratio geometric means calculated with data from both platforms were in perfect agreement for all 6 samples (Fig. 6C).
Validation of the model. We utilized a "leave-one-out" cross validation technique (16,19,20) to assess the internal variation of a 3-ratio predictor model. For this analysis, we analyzed 17 different training sets, each formed by withholding 1 of the 17 samples, constructing a new expression ratio-based classifier from the remaining 16 samples exactly as before, and then predicting the class (either good or poor outcome) of the withheld sample. This process was repeated until every sample had been withheld once. We found that 88% (15/17) of the samples were correctly identified in this analysis (P=0.0034, Fisher's exact test).
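As an illustration of how a 2x2 contingency table from such a cross validation can be assessed, the following Python sketch implements a two-sided Fisher's exact test using only the standard library. The text reports 15 of 17 correct calls but not how the two errors were distributed; the table below assumes one misclassified sample in each outcome group, which is a hypothetical split. Under that assumption the computed value is about 0.0034, in line with the reported P value, although this does not establish the actual table used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def weight(x):
        # Unnormalized hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)

# Rows: actual good (n=8), actual poor (n=9); columns: predicted good, predicted poor.
# Assumed split of the two errors: one in each group (not stated in the text).
p = fisher_exact_two_sided(7, 1, 1, 8)
```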
Verification of expression level ratios as outcome predictors. Finally, we tested the ability of expression ratios to predict outcome in a new cohort of mesothelioma tumor samples not subjected to microarray analysis (n=29, the test set, Table 9B). The histological distribution and the estimated median patient survival (12 months, Fig. 7A) of the test set of samples were also representative of those of mesothelioma patients in our practice (6). As before, we found that histological subtype was not strongly predictive of survival in the new cohort of samples (P=0.3.45, log-rank test, Fig. 7B). We used quantitative RT-PCR to determine relative expression levels for the 4 predictor genes and calculated the geometric mean of 3 prognostic ratios: KIAA0977/GDIA1, L6/CTHBP, and L6/GDIA1. As before, samples with geometric means >1 and <1 were assigned to good outcome and poor outcome groups, respectively. A total of 11 samples were assigned to the good outcome group and 18 to the poor outcome group. The number of test set samples "correctly" classified was estimated using the median survival (12 months) of the entire cohort as a cut-off to form 2 groups: relatively good outcome (>12 month survival) and relatively poor outcome (<12 month survival). Only those 17 samples from patients who died from disease were considered (status 3, Table 9B). We found that the same proportion of test set samples was classified correctly in this analysis (88%, 15/17) as in the analysis of the training set. To include all samples in the assessment of the model, we performed Kaplan-Meier survival analysis using expression ratio predictions made for the test set of samples. The estimated median survival for the good outcome group (36 months) was over 5-fold higher than that for the poor outcome group (7 months). In addition, we found that the 3-ratio geometric mean model significantly (P=0.0035, log-rank test, Fig. 7C) predicted outcome in the new set of samples. 
Since it has been demonstrated in very large sample cohorts that patients with epithelial histology generally enjoy significantly longer disease-free survival than patients with non-epithelial histology (22), we used multivariate analysis to examine whether our results using expression ratios were independent of the histological subtype of the tumor. By fitting a Cox proportional-hazards regression model, we found that the (3-ratio) geometric mean value significantly predicts outcome (P=0.0094, hazard ratio=2.6) independent of the histological subtype of the tumor (P=0.75, hazard ratio=0.32). Expression ratios correctly predicted outcome independently of the histological subtype of the tumor in the new set of samples, indicating the ratio method is a better prognostic tool.
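The Kaplan-Meier estimates of median survival referred to above can be illustrated with a small product-limit estimator. This is a sketch under stated assumptions: the follow-up times are hypothetical, and treating only deaths as events while censoring all other patients is an assumption, since the censoring rules are not spelled out in the text.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up in months; events: True if death was observed,
    False if the patient was censored at that time.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve

def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached

# Hypothetical group: four patients, one censored at 5 months.
curve = kaplan_meier([3, 5, 7, 9], [True, False, True, True])
median = median_survival(curve)
```

Running the estimator separately on the predicted good and poor outcome groups gives the two curves that a log-rank test would then compare.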
Discussion
Current methods of prognosis in mesothelioma include stage and histology at the time of surgery. However, these techniques are not completely reliable, and accurate staging usually requires extensive surgery (3,8,9). Recently, we discovered that simple ratios of gene expression can be used to accurately diagnose cancer (17) while successfully avoiding many of the shortcomings which preclude the use of other microarray analytical techniques in wider clinical applications (10,20). In this study, we describe a technique that uses expression data from four genes to independently predict outcome in mesothelioma patients who undergo extrapleural pneumonectomy followed by standard chemoradiation therapy. Although this analysis only utilized four genes, the expression ratio technique can easily incorporate larger numbers of genes when required for acceptable accuracy. To our knowledge, this is the first study in human cancer to use expression profiling techniques to identify treatment-related prognostic markers for use in the development of an outcome predictor model, and to validate the model in an independent cohort using a simpler data acquisition platform such as RT-PCR. Other investigators have tested outcome predictor models in independent samples (16), but studies of this sort continue to be hindered in their clinical applicability through their reliance on relatively large numbers of genes, costly data acquisition platforms (i.e., microarrays), the need for sophisticated algorithms/software, and the inability to analyze a sample independently and without reference to other samples. The prognostic tool described herein could dramatically impact the current clinical treatment of mesothelioma by identifying preoperatively patients not likely to respond to conventional treatment modalities, thus sparing them from radical surgery. 
It is currently our practice to obtain a tissue diagnosis prior to recommending therapy for patients with mesothelioma, but the absence of suitable prognostic molecular markers makes it difficult to assign optimal treatments or investigate new modalities. The results of this work, if confirmed prospectively in a larger patient population, should prove helpful in the development of meaningful clinical trials for patients with mesothelioma. We hypothesize that patients whose tumors are analyzed using gene expression ratios and predicted to have relatively poor outcomes are excellent candidates for neo-adjuvant chemotherapy protocols as they are unlikely to benefit from upfront surgery, whereas patients predicted to have relatively good outcomes are more likely to enjoy long term survival after conventional surgical and adjuvant chemoradiation.
The use of gene expression ratios to predict patient outcome in mesothelioma and other cancers (17) overcomes several major obstacles to the clinical use of microarray data. Unlike other widely accepted supervised learning techniques with similar predictive accuracy (10,16,20), the expression ratio method generates a simple numerical measure that can be used to predict clinical outcome using a single biopsy specimen. Since this non-linear function of gene expression is a unit-less number and does not require data from additional training samples or from additional reference genes, expression levels can be measured using any reliable method including quantitative RT-PCR, cDNA and oligonucleotide microarrays, SAGE, or perhaps ELISAs for encoded proteins. The expression ratio technique can also facilitate examination of microarray data by investigators without direct access to sophisticated analytical tools. Using previously published data, we have created ratio-based tests using small numbers of genes that can diagnose localized prostate cancer and predict clinical outcome in breast cancer (see Example 4).
We believe that attempts to bridge the gap between expression profiling studies in cancer and meaningful clinical applications should follow the general spirit of Occam's Razor principle: "among a set of otherwise equal models, choose the simplest". Although other microarray-based predictor models in cancer may utilize relatively small numbers of genes to accurately predict outcome (16,19,20), these approaches continue to be limited in their clinical applicability. Furthermore, it has yet to be determined if these approaches can utilize relatively low-cost and widely available data acquisition platforms such as RT-PCR and retain significant survival predictions. The expression ratio technique is fundamentally similar to other widely accepted bioinformatics techniques (10) in that it utilizes genes with inversely correlated expression levels in multiple groups. The principal advantages of using expression ratios to predict clinical parameters are their relative simplicity, platform independence for data acquisition, and requirement for only small quantities of fresh or frozen tissue for analysis. In addition, these tests are relatively low cost and can be used to analyze samples independent of a training set. For these reasons, it is likely that the expression ratio technique will find additional uses in the clinical management of other cancers and diseases.
References for Example 3
1. Pass H. Malignant pleural mesothelioma: Surgical roles and novel therapies. Clin Lung Cancer 2001;3:102-117.
2. Aisner J. Diagnosis, staging, and natural history of pleural mesothelioma. In: Aisner J, Arriagada R, Green MR, et al., eds. Comprehensive Textbook of Thoracic Oncology. Baltimore (MD): Williams and Wilkins; 1996. 799-785.
3. Ong S-T, Vogelsang NJ. Current therapeutic approaches to unresectable (primary and recurrent) disease. In: Aisner J, Arriagada R, Green MR, et al., eds. Comprehensive Textbook of Thoracic Oncology. Baltimore (MD): Williams and Wilkins; 1996. 799-814.
4. Peto J, Hodgson JT, Matthews FE, Jones JR. Continuing increase in mesothelioma mortality in Britain. Lancet 1995;345:535-539.
5. Sugarbaker DJ, Flores RM, Jaklitsch MT, Richards WG, Strauss GM, Corson JM, et al. Resection margins, extrapleural nodal status, and cell type determine postoperative long-term survival in trimodality therapy of malignant pleural mesothelioma: results in 183 patients. J Thorac Cardiovasc Surg 1999;117:54-65.
6. Sugarbaker DJ, Garcia JP, Richards WG, Harpole DH, Jr., Healy-Baldini E, DeCamp MM, Jr., et al. Extrapleural pneumonectomy in the multimodality therapy of malignant pleural mesothelioma. Results in 120 consecutive patients. Ann Surg 1996;224:288-294.
7. Sugarbaker D, Strauss GM, Lynch TJ, Richards W, Mentzer SJ, Lee TH, et al. Node status has prognostic significance in the multimodality therapy of diffuse, malignant mesothelioma. J Clin Oncol 1993;11:1172-1178.
8. Corson JM, Renshaw AA. Pathology of mesothelioma. In: Aisner J, Arriagada R, Green MR, et al., eds. Comprehensive Textbook of Thoracic Oncology. Baltimore (MD): Williams and Wilkins; 1996. 757-758.
9. Ordonez NG. The value of antibodies 44-36A, SM3, HBME-1, and thrombomodulin in differentiating epithelial pleural mesothelioma from lung adenocarcinoma. Am J Surg Pathol 1997;21:1399-1408.
10. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999;286:531-537.
11. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al. Molecular portraits of human breast tumours. Nature 2000;406:747-752.
12. Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M, Simon R, et al. Gene expression profiles in hereditary breast cancer. N Engl J Med 2001;344:539-548.
13. Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 2001;7:673-679.
14. Welsh JB, Sapinoso LM, Su AI, Kern SG, Wang-Rodriguez J, Moskaluk CA, et al. Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res 2001;61:5974-5978.
15. Dhanasekaran SM, Barrette TR, Ghosh D, Shah R, Varambally S, Kurachi K, et al. Delineation of prognostic biomarkers in prostate cancer. Nature 2001;412:822-826.
16. van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415:530-536.
17. Gordon GJ, Jensen RV, Hsiao L-L, Gullans SR, Blumenstock JE, Ramaswami S, et al. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res 2002; In Press.
18. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001;98:5116-5121.
19. Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, et al. Prediction of central nervous system embryonal tumor outcome based on gene expression. Nature 2002;415:436-442.
20. Shipp MA, Ross KA, Tamayo P, Weng AP, Kutok JL, Aguiar RCT, et al. Diffuse large B-cell lymphoma outcome prediction by gene expression profiling and supervised machine learning. Nat Med 2002;8:68-74.
21. Venables WN, Ripley BD. Modern Applied Statistics with S-Plus. New York (NY): Springer; 1997.
22. Sugarbaker DJ, Liptay MJ. Therapeutic approaches in malignant mesothelioma. In: Aisner J, Arriagada R, Green MR, et al., eds. Comprehensive Textbook of Thoracic Oncology. Baltimore (MD): Williams and Wilkins; 1996. 786-798.
Example 4: Diagnostic and Prognostic Tests in Prostate and Breast Cancer From Expression Profiling Data
Current gene expression profiling-based bioinformatics tools are highly accurate in the diagnosis and prognosis of cancer (1-6). However, the widespread clinical applicability of these techniques is currently limited owing largely to a lack of a practical method for translating complex profiling analyses to functional clinical tests. To address this issue, we have created a simple yet effective technique with broad and immediate clinical applicability for performing relatively low cost diagnosis and prediction of prognosis in cancer (see Examples above and reference 7). Our method utilizes a supervised comparison of extensive gene profiling data to identify differentially expressed genes between two groups. Carefully chosen genes are then used to calculate simple expression ratios which in turn are set to predict (in a binary numerical manner) the clinical parameter in question. To date, we have demonstrated the applicability of this method in distinguishing mesothelioma from lung adenocarcinoma (see Examples above and reference 7), in identifying patients with favorable prognosis after surgery for mesothelioma, and in predicting patients with favorable outcome after treatment for medulloblastoma (see Examples above and reference 7). In this study, we have tested the accuracy of ratio-based predictions in two separate applications: the diagnosis of prostate cancer and the prediction of clinical outcome in early stage, node-negative resected breast cancer. By using multiple previously published datasets to train and validate our predictor models, we have also directly tested the hypothesis that this gene expression ratio technique is platform independent and can be utilized in widespread fashion by large numbers of clinical and translational investigators.
Methods
Tumor tissues. Ten sets of matched normal adjacent prostate and malignant prostate cancer (20 specimens total) were obtained from the Tumor Bank at Brigham and Women's Hospital. Studies utilizing human tissues were approved by and conducted in accordance with the policies of the Institutional Review Board at Brigham and Women's Hospital.
Expression profiling data. Microarray data for prostate tissues were obtained from two sources. Gene expression data composing the initial "training set" were obtained using a 9,984-element cDNA microarray (12) and consisted of PCA (n=14) and a group (n=18 total) composed of both NAP (n=4) and BPH (n=14) tissues (Supplemental Figure 8 Data). When there was no data for a given gene due to a technical artifact, we conservatively assumed no change in expression level from the pooled reference mRNA. Gene expression data composing the initial "test set" were obtained using Affymetrix high-density oligonucleotide microarrays with probe sets representing approximately 12,000 genes (13) and consisted of NAP (n=9) and PCA (n=25) tissues. For this dataset, we scaled gene hybridization intensities (i.e., ".cel" files) to a "target intensity" of 100 using Affymetrix GeneChip® Software, v.5.0 (Affymetrix, Santa Clara, CA). Gene expression data for breast cancer tissues were obtained from a single source using a microarray containing approximately 25,000 genes (6). The "training set" consisted of two groups of samples: those from 44 patients with greater than 5 years disease-free survival (i.e., relatively good outcome) and those from 34 patients with less than 5 years disease-free survival (i.e., relatively poor outcome). The "test set" consisted of 19 additional profiled patient samples.
Real time quantitative RT-PCR. Quantitative RT-PCR was performed as described in the examples above and in reference (7). Primer sequences were as follows:
HPN 5'-AATACATCCAGCCTGTGTGC-3' (SEQ ID NO:95) and 5'-TGGCCATAGTACTGCGTGTT-3' (SEQ ID NO:96);
MEIS2 5'-TTAGCGCAAGACACAGGACT-3' (SEQ ID NO:97) and 5'-CACTCGTCGATTTGACTGGT-3' (SEQ ID NO:98);
C7 5'-TCAAAATGGTGGTTTGGCTA-3' (SEQ ID NO:99) and 5'-CCTACGAGGACTCCTTGCTC-3' (SEQ ID NO:100); and
FN1 5'-GCCATGACAATGGTGTGAAC-3' (SEQ ID NO:101) and 5'-GCAAATGGCACCGAGATATT-3' (SEQ ID NO:102).
Data and statistical analysis. The selection of predictor genes for use in expression ratio-based diagnosis and prognosis was performed essentially as described in the examples above and in reference (7). Briefly, a two-sided Student's (parametric) t-test was used for pair-wise comparisons of average gene expression levels among multiple groups to select predictor genes that have highly significant, inversely correlated average expression levels. Data from multiple highly accurate gene expression ratios were combined by calculating the geometric mean, thereby giving equal weight to ratio fold-changes of identical magnitude but opposite direction. The classification accuracy of selected ratios was assessed using Fisher's exact test. Kaplan-Meier time-to-relapse analysis was used to assess disease-free survival. The log-rank test was used to statistically assess differences among multiple survival curves. All differences were determined to be statistically significant if P<0.05. All calculations and statistical comparisons were generated using S-PLUS (14).
Results and Discussion
Diagnosis of prostate cancer using gene expression ratios. Prostate cancer is exceedingly common among males in the U.S. (8). Unfortunately, widespread serum prostate-specific antigen (PSA) screening has been found to present major drawbacks (9-11). For this reason, patients who are PSA-positive and at a moderate or high risk for prostate cancer undergo a core needle biopsy of the prostate for definitive diagnosis, a procedure associated with substantial patient discomfort. To improve the diagnostic accuracy, decrease the discomfort, and reduce the resulting non-compliance associated with current methodology, we explored the feasibility of designing a less invasive diagnostic test for prostate cancer. To accomplish this, we designed an expression ratio-based test which would utilize RT-PCR for data acquisition, and by virtue of the quantity of RNA needed (e.g., <100 pg), would likely support sample attainment using fine needle aspirations (FNA).
We identified two published reports that provide extensive gene profiling data from prostate cancer and non-malignant prostate tissues (12, 13). We used data from one manuscript to develop our training set and data from the other as our test set (see Methods for details). To create an expression ratio-based diagnostic test capable of distinguishing prostate cancer (PCA) from either normal adjacent prostate or benign prostatic hypertrophy (NAP and BPH, respectively), we first identified a total of 19 known genes with inversely correlated average expression levels in the training set that matched our filtering criteria (P<0.01, at least a 2-fold difference in mean expression levels between PCA and NAP/BPH). We chose 11 of these genes for further analysis since they were also represented on the expression profiling platform of the test set (Table 11).
Table 11. Prostate cancer diagnostic genes
Accession #  P value (training set)  P value (test set)  LocusLink Symbol  Description
Expressed at relatively higher levels in NAP/BPH
AA424743  1.5 x 10^-7  -  BRF1  butyrate response factor 1 (EGF-response factor 1)
AA418773  2.3 x 10^-7  -  HPS  Hermansky-Pudlak syndrome
AA148641  2.8 x 10^-7  1.9 x 10^-10  MEIS2  Meis (mouse) homolog 2
R98851  0.0012  -  CALLA  common acute lymphoblastic leukemia antigen
AA598478  0.0036  0.015  C7  complement component 7
R62612  0.0070  9.8 x 10^-5  FN1  fibronectin 1
Expressed at relatively higher levels in PCA
H50323  2.5 x 10^-8  7.0 x 10^-5  FASN  fatty acid synthase
H62162  1.0 x 10^-6  1.4 x 10^-8  HPN  hepsin
AA460115  2.8 x 10^-6  0.22  ODC1  ornithine decarboxylase 1
N26311  3.9 x 10^-5  2.5 x 10^-4  PLAB  prostate differentiation factor
AA454207  3.4 x 10^-[?]  -  LABH2  putative transmembrane protein
-, these genes were not reliably detected (i.e., average expression levels <600) in at least one group of the test set and were not given further consideration.
Using these 11 genes, we calculated 30 expression ratios per sample by dividing the expression value of each of the 6 genes expressed at relatively higher levels in NAP/BPH by the expression value of each of the 5 genes expressed at relatively higher levels in PCA. Then, we tested the diagnostic accuracy of these ratios in the 28 training set samples obtained from the same study. Samples with ratio values >1 were called NAP/BPH and those with ratio values <1 were called PCA. Not surprisingly, we found that these 30 ratios could be used to correctly distinguish between non-malignant tissues and PCA with a high degree of accuracy (average=86%, range 76%-100%). To further refine our diagnostic tool, we examined the expression patterns of the 11 genes identified in the training set in a new cohort of samples (i.e., the test set) for which published data were available from another laboratory. Four genes were discarded because they were not reliably detected in at least one group on the profiling platform of the test set (i.e., average expression level <600 in both NAP and PCA samples). Of the remaining 7 genes, only one (ODC1) was not expressed at significantly different levels in test set samples and was not given further consideration (Table 11). We formed a total of 9 possible ratios from the remaining 6 genes and found that all possessed similarly high accuracy in diagnosing test set samples (average=93%, range 88%-100%). To utilize more than two discriminating genes, we calculated the geometric mean of the 3 most accurate individual ratios and examined the ability of this 3-ratio (4-gene) test, C7/HPN, MEIS2/HPN, and FN1/HPN, to diagnose test set samples. As expected, we found that the accuracy of this 3-ratio test remained high (91%, 33/34). The HPN gene is used in all 3 of the ratios, indirectly corroborating the results of the original analyses of these datasets showing this gene to be highly expressed in PCA (12, 13). Finally, we validated this 3-ratio diagnostic test in an independent set of discarded NAP (n=10) and PCA (n=10) patient specimens using quantitative RT-PCR performed at our institution. We found this technique to be highly accurate (90%, 18/20) in classifying these samples (P=0.0007, Fisher's exact test). In both misclassifications, normal prostate was diagnosed as PCA, but no PCA specimens were diagnosed as non-cancer.
Despite the fact that diagnostic genes were chosen from a training set in which non-malignant tissues were composed primarily of BPH, the gene ratios remain accurate in distinguishing cancer from the non-malignant samples in the test set, which were exclusively NAP. To see if the inverse is true, we reversed the training and test sets and identified predictor genes in exactly the same manner as above. Of the 4 genes used in the diagnostic test from above, only HPN was listed among the 10 most significant genes overexpressed in either group in the new training set. This finding may be attributed to the larger numbers of genes on this training set profiling platform and/or the fact that NAP and BPH do not have perfectly overlapping expression patterns, but enough similarities in key genes to be mutually distinctive from PCA. To test the discriminating nature of these new predictor genes, we chose the 4 genes most significantly overexpressed in each group (8 genes total) and present on both profiling platforms (LocusLink symbol, P value in new training set): DJ742C19.2, P=10^-13; FHL1, P=4.8 x 10^-12; SEC23A, P=7.5 x 10^-11; ATP2A2, P=10^-10; HPN, P=1.3 x 10^-8; KLK3, P=1.3 x 10^-6; LU, P=3.7 x 10^-6; LIM, P=4.0 x 10^-6.
In the new test set, we discovered that 4 of these 8 genes were expressed at significantly different levels: HPN (P=10^-6), SEC23A (P=3Λ x 10^-4), LIM (P=10^-4), and KLK3 (P=0.049). As before, a total of 16 possible ratios were calculated using these 8 genes and used to diagnose samples in the new test set. The accuracy of these ratios varied greatly (average=75%, range 41%-91%). We combined the four most accurate individual ratios (>=88% accuracy) and found that this 4-ratio test was actually slightly less accurate (84%, 27/32) than any single ratio. No normal samples were misdiagnosed in this test and three of the five errors resulted from BPH samples diagnosed as PCA. These observations combine to suggest that genes found to be discriminatory between BPH and PCA are effective in distinguishing between NAP and PCA, but the reverse is less likely to result in accurate stratification. Another possible explanation is that the platform used in the second set of experiments is not sufficiently extensive to include the best diagnostic genes for this application. Nevertheless, the gene ratio technique was effective in producing relatively accurate and cancer-sensitive diagnostic tests across two platforms and in both directions.
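The multi-ratio prostate test described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (per-sample ratios, geometric mean, threshold of 1), not the authors' software. The function names and the expression values are hypothetical; only the three gene pairs are taken from the text. The text does not say how a value of exactly 1 is called, so the sketch assigns it to the PCA class as an assumption.

```python
import math

def geometric_mean(values):
    """Geometric mean computed in log space; all values must be positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def classify_sample(expression, ratios, high_label="NAP/BPH", low_label="PCA"):
    """Call a sample from the geometric mean of several expression ratios.

    expression: dict mapping gene symbol -> positive expression value
    ratios: list of (numerator_gene, denominator_gene) pairs
    A score of exactly 1 falls to low_label (an assumption, not from the text).
    """
    values = [expression[num] / expression[den] for num, den in ratios]
    score = geometric_mean(values)
    return (high_label if score > 1 else low_label), score

# The 3-ratio (4-gene) prostate test named in the text.
PROSTATE_RATIOS = [("C7", "HPN"), ("MEIS2", "HPN"), ("FN1", "HPN")]

# Hypothetical expression values, for illustration only.
benign_like = {"C7": 900.0, "MEIS2": 1200.0, "FN1": 2000.0, "HPN": 400.0}
tumor_like = {"C7": 300.0, "MEIS2": 500.0, "FN1": 1500.0, "HPN": 2500.0}

print(classify_sample(benign_like, PROSTATE_RATIOS))
print(classify_sample(tumor_like, PROSTATE_RATIOS))
```

Because each ratio is formed within a single sample, the call needs no reference samples, which is the property the text emphasizes.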
Prediction of prognosis in breast cancer using gene expression ratios. Breast cancer is the most common malignancy in women (in 2001) and is the second highest cause of cancer death in North American women (www.cancer.org). Breast cancer gene expression signatures have recently been used to stratify tumor samples into prognostic groups based on cancer recurrence (6). In this context, tumors were obtained from women who underwent surgical resection for lymph node negative breast cancer. "Good prognosis" was defined as disease-free survival for at least 5 years and "poor prognosis" was defined as the development of distant metastases within a 5 year period. An optimal 70-gene classifier was identified and validated in an independent set of tumors. Although the classifier described in this manuscript appears highly accurate and reproducible, there are several limitations to the rapid incorporation of these results into a clinically relevant test. For one, van 't Veer and colleagues ranked tumors for comparison to classification thresholds by comparing the correlation of predictor genes to the average "good prognosis" profile taken from data acquired on the same platform. This relative measure of contrast relies upon absolute expression levels obtained using microarrays. Unfortunately, it is not certain that an alternative data acquisition platform will produce similarly accurate results. Also, by definition, this technique cannot classify an individual sample without reference to data from additional samples.
We hypothesized that the expression ratio technique could classify samples with similar or greater accuracy to that described by van 't Veer et al. while requiring substantially fewer genes. To test this hypothesis, we identified predictor genes with inversely correlated average expression levels in the same training set as used by van 't Veer et al., composed of good prognosis samples (n=44) and poor prognosis samples (n=34). We found 8 genes that fit our filtering criteria (P<0.01, at least a 2-fold difference in mean expression levels), 4 genes each overexpressed in good and poor prognosis samples (Table 12). We calculated all 16 possible expression ratios per training set sample by dividing the expression value of each of the 4 genes expressed at relatively higher levels in good prognosis samples by the expression value of each of the 4 genes expressed at relatively higher levels in poor prognosis samples. Samples with ratio values >1 were classified as good prognosis and those with ratio values <1 were classified as poor prognosis. The classification accuracy of these 16 ratios in the training set varied widely (average=70%, range 59%-80%), so we determined the classification accuracy of multiple ratios combined in a single test. Beginning with the three most accurate ratios, we added additional ratios in descending order of accuracy to form a total of three multiple-ratio tests. These tests used 3, 4, and 6 individual ratios and were 85%, 83%, and 84% accurate in the training set, respectively, demonstrating that the combination of multiple ratios in this analysis exceeds the classification accuracy of the single most accurate individual ratio. (The 6-ratio test incorporated two additional equally accurate ratios.) Only one of eight predictor genes (ASAH2) was not used in any of the three multiple-ratio tests.
Table 12. Breast cancer prognostic genes
Gene ID^a         P value       LocusLink Symbol   Description
Expressed at relatively higher levels in good prognosis
NM_003862         1.7 x 10^-4   FGF18              fibroblast growth factor 18
*Contig47178_RC   3.6 x 10^-3   -                  EST
NM_003147         4.9 x 10^-3   SSX2               synovial sarcoma, X breakpoint 2
NM_019893         6.5 x 10^-3   ASAH2              N-acylsphingosine amidohydrolase (non-lysosomal ceramidase) 2
Expressed at relatively higher levels in poor prognosis
AL080059          2.0 x 10^-6   KIAA1750           KIAA1750 protein
NM_006681         4.0 x 10^-5   NMU                neuromedin U
*Contig29050_RC   5.1 x 10^-3   -                  EST
NM_000340         7.7 x 10^-3   SLC2A2             solute carrier family 2 (facilitated glucose transporter), member 2
a, sequences in the training set expression profiling platform were identified by their GenBank Accession number or EST contig number (6).
-, not available
*, these sequences were not homologous to any known genes at the time of this study (BLAST search, http://www.ncbi.nlm.nih.gov/BLAST/)
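The step of forming all 16 pairwise ratios from the Table 12 genes and ranking them by training set accuracy can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy expression values are hypothetical, the P value and fold-change filtering that produced the gene list is not reproduced, and a ratio of exactly 1 is called "poor" because the text leaves that case undefined.

```python
from itertools import product

def ratio_accuracy(samples, labels, num_gene, den_gene):
    """Fraction of samples called correctly by one ratio (>1 means 'good')."""
    correct = 0
    for expr, label in zip(samples, labels):
        call = "good" if expr[num_gene] / expr[den_gene] > 1 else "poor"
        correct += (call == label)
    return correct / len(samples)

def rank_ratios(samples, labels, good_genes, poor_genes):
    """Score every good/poor gene pair and sort from most to least accurate."""
    scored = [
        ((g, p), ratio_accuracy(samples, labels, g, p))
        for g, p in product(good_genes, poor_genes)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Gene identifiers from Table 12.
GOOD_GENES = ["FGF18", "Contig47178_RC", "SSX2", "ASAH2"]
POOR_GENES = ["KIAA1750", "NMU", "Contig29050_RC", "SLC2A2"]

def toy_sample(good_level, poor_level):
    """Hypothetical sample: one level for all 'good' genes, one for all 'poor' genes."""
    expr = {g: good_level for g in GOOD_GENES}
    expr.update({p: poor_level for p in POOR_GENES})
    return expr

# Hypothetical training data, for illustration only.
samples = [toy_sample(8.0, 2.0), toy_sample(6.0, 3.0),
           toy_sample(2.0, 7.0), toy_sample(5.0, 4.0)]
samples[3]["NMU"] = 9.0  # only NMU-based ratios call this 'poor' sample correctly
labels = ["good", "good", "poor", "poor"]

ranking = rank_ratios(samples, labels, GOOD_GENES, POOR_GENES)
for (num, den), acc in ranking[:4]:
    print(f"{num}/{den}: {acc:.2f}")
```

Multiple-ratio tests would then be assembled by taking ratios from the top of this ranking, as the text describes.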
We then examined these three multiple-ratio tests in the same test set used by van 't Veer et al., which consisted of 19 additional samples: 7 good prognosis samples and 12 poor prognosis samples (6). We discovered that all 3 sets of gene ratio tests were able to distinguish test set samples with at least 80% accuracy. The most successful gene ratio combination correctly identified 84% (16/19) of the test set samples (P=0.0055, Fisher's exact test) utilizing 5 genes (from Table 12) in 4 ratios: SSX2/KIAA1750, Contig47178_RC/KIAA1750, FGF18/KIAA1750, and FGF18/NMU. These results are nearly as accurate as those obtained with the optimized 70-gene classifier developed by van 't Veer and colleagues for the same dataset (6). Their classifier correctly identified 17/19 samples but required 65 additional genes.
We performed a final analysis of the gene profiling data obtained from breast cancer tissues to develop a model optimized for sensitivity. As noted by van 't Veer and colleagues (6), it is desirable for therapeutic purposes to minimize the number of poor prognosis samples assigned to the good prognosis category in order to ultimately capture all patients at risk of recurrence for adjuvant systemic therapy. Since all three misclassified samples in our best 4-ratio test were samples obtained from patients with poor prognosis, we analyzed multiple-ratio tests exactly as above, with the exception that individual ratios were ranked according to classification accuracy in the poor prognosis group only. As predicted, these tests accurately classified poor prognosis samples, but we discovered that they also remained relatively accurate overall (poor prognosis accuracy, overall accuracy): 3 ratios (91%, 75%), 5 ratios (88%, 82%), and 6 ratios (91%, 83%). (Two equally accurate ratios were added to the 3-ratio test.) In the test set of samples, the 3- and 5-ratio tests misclassified only 1 of 12 poor prognosis samples, but each resulted in 4 misclassifications overall (79%, 15/19). The 6-ratio test also resulted in accurate identification of 11 of 12 poor prognosis patients and only 3 overall errors (84%, 16/19, P=0.00954, Fisher's exact test) using 6 genes (from Table 12): FGF18/SLC2A2, FGF18/Contig29050_RC, SSX2/SLC2A2, SSX2/Contig29050_RC, FGF18/KIAA1750, and FGF18/NMU. We performed Kaplan-Meier time-to-relapse analysis using predictions made from this test in the 19 test set samples and found a significant difference (P=0.0197, Fig. 8) between groups predicted to have widely divergent disease-free survival times. These results indicate that ratios chosen for enhanced sensitivity perform similarly well in the test set samples without any substantial sacrifice in overall accuracy.
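The counts reported for the 6-ratio test can be checked with a short calculation. The sketch below rebuilds the 2x2 table implied by the text (12 poor prognosis samples with 11 called correctly; 3 errors overall, hence 2 of the 7 good prognosis samples misclassified) and computes the poor-prognosis accuracy, the overall accuracy, and a one-sided Fisher's exact P value. The function name is hypothetical, and the use of a one-sided test is an assumption: the text does not state sidedness, although the one-sided value (about 0.0095) is in line with the reported figure.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of tables with the same margins whose
    top-left cell is at least as large as the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, col1)
    favorable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favorable / total

# Counts implied by the text for the 6-ratio test in the 19 test set samples.
good_as_good, good_as_poor = 5, 2    # 7 good prognosis samples
poor_as_good, poor_as_poor = 1, 11   # 12 poor prognosis samples

n = good_as_good + good_as_poor + poor_as_good + poor_as_poor
sensitivity = poor_as_poor / (poor_as_good + poor_as_poor)
overall = (good_as_good + poor_as_poor) / n
p_value = fisher_one_sided(good_as_good, good_as_poor, poor_as_good, poor_as_poor)

print(f"poor-prognosis accuracy: {sensitivity:.3f}")
print(f"overall accuracy: {overall:.3f}")
print(f"one-sided Fisher P: {p_value:.5f}")
```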
There are two individual ratios in common between this 6-ratio test and the best 4-ratio test we used to initially develop a classifier based only on overall accuracy. Although both tests resulted in only 3 misclassifications in the test set (n=19), we found the 6-ratio test to be more sensitive.
It is important to note that we have not proposed an exact protocol for developing and testing ratio-based predictor models. In fact, we discovered in this study and others (7) that multiple combinations of genes, in the form of ratios, can achieve similarly accurate results. We merely assert that simple ratios can be a highly accurate means of predicting clinical parameters using very small numbers of genes and simpler data acquisition platforms, such as quantitative RT-PCR and/or custom microarrays. Furthermore, this strategy can be used to analyze microarrays without the need for additional reference samples. In the case of prostate cancer, we envision that diagnosis using mRNA obtained from fine needle aspirations would be less invasive than current biopsy techniques and would likely increase compliance and reduce discomfort in men whose prostate-specific antigen levels mandate frequent screening. Similarly, women with breast cancer undergoing initial diagnostic biopsy could have tissue saved for a similar gene expression ratio based test using quantitative RT-PCR or a custom microarray. Women found to be at high risk for recurrence may be selected for either neo-adjuvant chemotherapy or post-surgical adjuvant therapy. The gene ratio method thus presents an opportunity to translate initial microarray based gene expression profiling to simple clinical tests that are performed using quantitative RT-PCR, microarrays, or other platforms on material obtained surgically or from fine needle aspirations.
References for Example 4
1. Shipp, M. A., Ross, K. A., Tamayo, P., Weng, A. P., Kutok, J. L., Aguiar, R. C. T., Gaasenbeek, M., Angelo, M., Reich, M., Pinkus, G. S., Ray, T. S., Koval, M. A., Last, K. M., Norton, A., Lister, T. A., Mesirov, J., Neuberg, D. S., Lander, E. S., Aster, J. C., and Golub, T. R. Diffuse large B-cell lymphoma outcome prediction by gene expression profiling and supervised machine learning, Nat. Med. 8: 68-74, 2002.
2. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science. 286: 531-537, 1999.
3. Perou, C. M., Sorlie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A., Pollack, J. R., Ross, D. T., Johnsen, H., Akslen, L. A., Fluge, O., Pergamenschikov, A., Williams, C., Zhu, S. X., Lonning, P. E., Borresen-Dale, A.-L., Brown, P. O., and Botstein, D. Molecular portraits of human breast tumours, Nature. 406: 747-752, 2000.
4. Hedenfalk, I., Duggan, D., Chen, Y., Radmacher, M., Bittner, M., Simon, R., Meltzer, P., Gusterson, B., Esteller, M., Kallioniemi, O.-P., Wilfond, B., Borg, A., and Trent, J. Gene expression profiles in hereditary breast cancer, N. Engl. J. Med. 344: 539-548, 2001.
5. Khan, J., Wei, J. S., Ringner, M., Saal, L. H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C. R., Peterson, C., and Meltzer, P. S. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nat. Med. 7: 673-679, 2001.
6. van 't Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A. M., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., and Friend, S. Gene expression profiling predicts clinical outcome of breast cancer, Nature. 415: 530-536, 2002.
7. Gordon, G. J., Jensen, R. V., Hsiao, L.-L., Gullans, S. R., Blumenstock, J. E., Ramaswami, S., Richards, W. G., Sugarbaker, D. J., and Bueno, R. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma, Cancer Res. 62: TBD (Sept. 1 issue), 2002.
8. Jemal, A., Thomas, A., Murray, T., and Thun, M. Cancer statistics, 2002, CA Cancer J. Clin. 52: 23-47, 2002.
9. Etzioni, R., Penson, D. F., Legler, J. M., Tommaso, D., Boer, R., Gann, P. H., and Feuer, E. J. Overdiagnosis due to prostate-specific antigen screening: Lessons from U.S. prostate cancer incidence trends, J. Natl. Cancer Inst. 94: 981-990, 2002.
10. Djavan, B., Zlotta, A., Kratzik, C., Remzi, M., Seitz, C., Schulman, C. C., and Marberger, M. PSA, PSA density, PSA density of transition zone, free/total PSA ratio, and PSA velocity for early detection of prostate cancer in men with serum PSA 2.5 to 4.0 ng/mL, Urology. 54: 517-522, 2001.
11. Pannek, J. and Partin, A. W. The role of PSA and percent free PSA for staging and prognosis prediction in clinically localized prostate cancer, Semin. Urol. Oncol. 16: 100-105, 1998.
12. Dhanasekaran, S. M., Barrette, T. R., Ghosh, D., Shah, R., Varambally, S., Kurachi, K., Pienta, K. J., Rubin, M. A., and Chinnaiyan, A. M. Delineation of prognostic biomarkers in prostate cancer, Nature. 412: 822-826, 2001.
13. Welsh, J. B., Sapinoso, L. M., Su, A. I., Kern, S. G., Wang-Rodriguez, J., Moskaluk, C. A., Frierson, H. F., and Hampton, G. M. Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer, Cancer Res. 61: 5974-5978, 2001.
14. Venables, W. N. and Ripley, B. D. Modern Applied Statistics with S-Plus. New York: Springer, 1997.
Example 5: Prediction of Outcomes of Lung Adenocarcinoma Using Expression Profiling Data
This example describes the use of published data relating gene expression profiles and outcome in lung adenocarcinoma. A set of gene ratios was generated by analyzing the data from Beer et al. (Nature Med. 8: 816-824, 2002), who used smaller chips (6800 genes), as a training set.
The training set ratios were tested using the published data set derived from expression profiling experiments using 12,000 genes (Bhattacharjee et al., Proc. Natl. Acad. Sci. USA. 98: 13790-13795, 2001). The object was to predict good outcome versus recurrence in stage I lung cancer after surgery. As shown below, the ratios derived from the training set data (Beer et al.) could differentiate significantly between good and poor outcomes in the test set data (Bhattacharjee et al.).
The analysis in the other direction (using the Bhattacharjee et al. expression data as the training set and the Beer et al. expression data as the test set) did not work because the best genes in the analysis of the Bhattacharjee et al. expression data were not present among the genes (6800 gene chips) analyzed by Beer et al.
Gene selection criteria: Genes having a >2-fold higher expression in good or poor outcome samples, and the lowest (best) P values, were selected.
Training Set (Beer et al. data); good outcome (n=21) means alive at 5 years; poor outcome (n=11) means disease recurrence within 4 years.
Table 13: Genes overexpressed in tumors of different outcome
Figure imgf000091_0001
Gene ratios were calculated as follows: (genes overexpressed in good outcome) / (genes overexpressed in poor outcome)
The application of the ratios is shown in Table 14.
Table 14: Training set gene ratios for predicting outcome
Figure imgf000092_0001
Error good = number of errors in predicting good outcome in training set
Error poor = number of errors in predicting poor outcome in training set
Error total = number of total errors in predicting outcome in training set
The top 3 ratios from the training set (1/6, 4/5, 4/7) were chosen according to the following criteria: poor >80% correctly identified and overall >75% correctly identified. The combination of these three ratios resulted in the correct prediction of 20/21 good outcome tumors and 9/11 poor outcome tumors (29/32 = 91%).
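The selection criteria stated above (more than 80% of poor outcome samples and more than 75% of all samples correctly identified) amount to a simple filter over per-ratio error counts. The sketch below illustrates that filter. The function name, the ratio names, and the error counts are hypothetical, since the actual counts appear only in Table 14, which is an image in the source; only the group sizes (21 good, 11 poor) and the two cutoffs come from the text.

```python
def select_ratios(error_counts, n_good, n_poor, min_poor=0.80, min_overall=0.75):
    """Keep ratios whose poor-outcome and overall accuracies exceed the cutoffs.

    error_counts: dict mapping ratio name -> (errors_good, errors_poor)
    Returns a list of (name, poor_accuracy, overall_accuracy) tuples.
    """
    selected = []
    n_total = n_good + n_poor
    for name, (err_good, err_poor) in error_counts.items():
        poor_acc = (n_poor - err_poor) / n_poor
        overall_acc = (n_total - err_good - err_poor) / n_total
        if poor_acc > min_poor and overall_acc > min_overall:
            selected.append((name, poor_acc, overall_acc))
    return selected

# Hypothetical error counts, for illustration only.
hypothetical_errors = {
    "ratio_a": (4, 2),  # 9/11 poor correct, 26/32 overall: passes both cutoffs
    "ratio_b": (3, 4),  # 7/11 poor correct: fails the poor-outcome cutoff
    "ratio_c": (9, 1),  # 10/11 poor correct, 22/32 overall: fails the overall cutoff
}

chosen = select_ratios(hypothetical_errors, n_good=21, n_poor=11)
for name, poor_acc, overall_acc in chosen:
    print(f"{name}: poor {poor_acc:.0%}, overall {overall_acc:.0%}")
```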
The three ratio combination was applied to the test set data of Bhattacharjee et al. The results are shown in Table 15:
Table 15: Application of gene expression ratios to test set
Test Set (Bhattacharjee et al. data); good outcome (n=28) means alive at 5 years; poor outcome (n=19) means disease recurrence within 4 years.
Figure imgf000092_0002
Error good = number of errors in predicting good outcome in test set
Error poor = number of errors in predicting poor outcome in test set
Error total = number of total errors in predicting outcome in test set
Table 16: Predictions and status for individual tumor samples
Stage 1 adenocarcinoma only; excluded patient tissue samples of <40% tumor cell and/or mixed histology.
Figure imgf000093_0001
Figure imgf000094_0001
Survival: in months
Status: 1 = alive without disease; 2 = alive with disease; 3 = dead from disease; 4 = dead from other causes; A = alive, disease status unknown; D = dead, reason unknown.
Censor: for Kaplan-Meier analysis (see Fig. 9); 0 = no censoring event; 1 = presence of censoring event
% tumor: in sample on slide
Group: good or poor outcome as predicted by gene ratios
Example 6: Analysis of gene expression data in various cancers for diagnosis and prognosis
This example represents analyses of gene expression profiling data presented in the literature for several different types of cancer. Each chart has several lists of genes that are increased in expression or decreased in expression in a given diagnosis or prognosis.
The method applied to the analyses of the data uses a combination of ratios of genes from one set always in the numerator and a second set always in the denominator to determine diagnosis or prognosis. The genes used in the ratios for determination of diagnosis or prognosis are numbered.
A. Rosenwald et al. (N Engl J Med 346(25):1937-1947, 2002), Diagnosis of subtype germinal-center B-cell-like (GCB) vs type III in diffuse large B-cell lymphoma (DLBCL). Genes having a >2-fold higher expression in different diagnosis samples, and the lowest (best) P values, are shown in Table 17.
Training set = 109 samples, Test set = 58 samples
Table 17: Genes overexpressed in germinal-center B-cell-like (GCB) or type III in diffuse large B-cell lymphoma (DLBCL)
Figure imgf000095_0001
Figure imgf000096_0001
B. Welsh et al. (Proc Natl Acad Sci USA 98(3): 1176-1181, 2001), Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Genes having a >2-fold higher expression in ovarian tumor or normal samples, and the lowest (best) P values, are shown in Table 18.
Training set = 22 samples, Test set = 12 samples
Table 18: Genes overexpressed in tumor or normal ovarian epithelium
Figure imgf000096_0002
Figure imgf000097_0001
C. Rosenwald et al. (N Engl J Med 346(25): 1937-1947, 2002), Diagnosis of subtypes germinal-center B-cell-like (GCB) vs activated B-cell-like (ABC) in diffuse large-B-cell lymphoma (DLBCL). Genes having a >2-fold higher expression in different diagnosis samples, and the lowest (best) P values, are shown in Table 19.
Training set = 129 samples, test set = 59 samples
Table 19: Genes overexpressed in germinal-center B-cell-like (GCB) or activated B-cell-like (ABC) in diffuse large-B-cell lymphoma (DLBCL)
Figure imgf000097_0002
Figure imgf000098_0001
D. Shipp et al. (Nat Med 8(1):68-74, 2002), Diagnosis of diffuse large B-cell lymphoma (DLBCL) vs. follicular lymphoma (FL). Genes having a >2-fold higher expression in different diagnosis samples, and the lowest (best) P values, are shown in Table 20.
Training set = 39 samples, Test set = 38 samples
Table 20: Genes overexpressed in diffuse large B-cell lymphoma (DLBCL) or follicular lymphoma (FL)
Figure imgf000098_0002
Figure imgf000099_0001
E. Rosenwald et al. (N Engl J Med 346(25): 1937-1947, 2002), Diagnosis of subtype activated B-cell-like (ABC) vs type III diffuse large-B-cell lymphoma. Genes having a >2-fold higher expression in different diagnosis samples, and the lowest (best) P values, are shown in Table 21.
Training set = 82 samples, test set = 43 samples
Table 21: Genes overexpressed in subtype activated B-cell-like (ABC) or type III diffuse large-B-cell lymphoma
Figure imgf000099_0002
Figure imgf000100_0001
F. Shipp et al. (Nat Med 8(1):68-74, 2002), Prognosis of diffuse large B-cell lymphoma (DLBCL). Good prognosis was defined by Shipp as no disease recurrence; bad prognosis was defined by Shipp as recurrence of disease. Genes having a >2-fold higher expression in good or poor prognosis samples, and the lowest (best) P values, are shown in Table 22.
Training set, n = 29; Test set, n = 29
Table 22: Genes overexpressed in diffuse large B-cell lymphoma (DLBCL) of good and poor outcome
Figure imgf000100_0002
Figure imgf000101_0001
Example 7: Prognosis of lung adenocarcinoma
Data from Bhattacharjee et al. for Stage 1 lung cancer was used as in Example 5, except that: 1) only samples with >50% tumor were used, and 2) a 5 year survival cutoff was used instead of 4 year survival. Thus the criteria for prognosis were: good = alive, survival >60 mos; poor = dead, survival <60 mos. This reduced the sample numbers to: n=12 for good prognosis, n=17 for poor prognosis. Genes having a >2-fold higher expression in good or poor prognosis samples, and the lowest (best) P values, are shown in Table 23.
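The sample labeling rule stated above can be written out as a small function. This is a sketch only: the function name, the record fields, and the example records are hypothetical. Samples that meet neither definition (for example, alive with less than 60 months of follow-up, or survival of exactly 60 months) are treated here as excluded, which is an assumption because the text does not address them.

```python
def assign_prognosis(sample, min_tumor_pct=50, cutoff_months=60):
    """Return 'good', 'poor', or None (excluded) for one patient record.

    sample: dict with keys 'alive' (bool), 'survival_months', 'tumor_pct'.
    """
    if sample["tumor_pct"] <= min_tumor_pct:
        return None  # only samples with >50% tumor are used
    if sample["alive"] and sample["survival_months"] > cutoff_months:
        return "good"
    if not sample["alive"] and sample["survival_months"] < cutoff_months:
        return "poor"
    return None  # neither definition applies (assumed to be excluded)

# Hypothetical patient records, for illustration only.
records = [
    {"alive": True, "survival_months": 72, "tumor_pct": 80},
    {"alive": False, "survival_months": 20, "tumor_pct": 60},
    {"alive": True, "survival_months": 30, "tumor_pct": 90},
    {"alive": False, "survival_months": 15, "tumor_pct": 40},
]

labels = [assign_prognosis(r) for r in records]
print(labels)
```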
Table 23: Genes overexpressed in good or poor outcome
Figure imgf000101_0002
The present invention is not limited in scope by the examples provided, since the examples are intended as illustrations of various aspects of the invention and other functionally equivalent embodiments are within the scope of the invention. Various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description and fall within the scope of the appended claims. The advantages and objects of the invention are not necessarily encompassed by each embodiment of the invention.
All references, patents, and patent publications that are recited in this application are incorporated in their entirety herein by reference.

Claims

1. A method for diagnosing the presence of cancer cells or non-cancer cells in a tissue sample, comprising providing a set of two or more genes, wherein the set comprises at least one upregulated gene that is expressed in greater amounts in the cancer cells than in corresponding non-cancer cells and at least one downregulated gene that is expressed in lesser amounts in cancer cells than in corresponding non-cancer cells, determining the expression levels of the set of two or more genes, calculating at least one ratio of the expression level of the at least one upregulated gene to the expression level of the at least one downregulated gene, wherein the at least one ratio is indicative of the presence of cancer cells or non-cancer cells in the tissue sample.
2. The method of claim 1, wherein there is at least a 2-fold difference in mean expression levels between the at least one upregulated gene and the at least one downregulated gene.
3. The method of claim 1, wherein two or more expression ratios are calculated.
4. The method of claim 3, further comprising combining the two or more expression ratios.
5. The method of claim 4, wherein the step of combining the two or more expression ratios comprises calculating the geometric mean of the two or more expression ratios.
6. The method of claim 1, wherein the ratio is calculated by division of the expression level of one upregulated gene by the expression level of one downregulated gene.
7. The method of claim 1, wherein the ratio is calculated by division of the expression levels of two or more upregulated genes by the expression level of one downregulated gene.
8. The method of claim 1, wherein the ratio is calculated by division of the expression level of one upregulated gene by the expression levels of two or more downregulated genes.
9. The method of claim 1, wherein the ratio is calculated by division of the expression levels of two or more upregulated genes by the expression levels of two or more downregulated genes.
10. The method of claim 1, further comprising transforming the expression level data for the upregulated and/or downregulated genes prior to calculating the ratio.
11. The method of claim 1, wherein the expression levels are determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification.
12. The method of claim 11, wherein the nucleic acid hybridization is performed using a solid-phase nucleic acid molecule array.
13. The method of claim 11, wherein the nucleic acid amplification method is real-time PCR.
14. The method of claim 1, wherein the expression levels are determined by an immunological method.
15. The method of claim 14, wherein the immunological method is performed using a solid-phase antibody array.
16. The method of claim 14, wherein the immunological method is an ELISA or ELISPOT assay.
17. The method of claim 1, wherein the cancer is selected from the group consisting of malignant pleural mesothelioma, lung adenocarcinoma, squamous carcinoma, medulloblastoma, prostate cancer, breast cancer, diffuse large B-cell lymphoma, follicular lymphoma and ovarian cancer.
18. The method of claim 1, wherein the at least one ratio is indicative of the presence of cancer cells in the tissue sample.
19. The method of claim 1, wherein the at least one ratio is indicative of the presence of non-cancer cells in the tissue sample.
20. A method for determining prognosis or outcome of a cancer patient, comprising providing a set of two or more genes, wherein the set comprises at least one upregulated gene that is expressed in greater amounts in a tumor sample from a cancer patient having a good prognosis or outcome than in a tumor sample from a cancer patient having a poor prognosis or outcome and at least one downregulated gene that is expressed in lesser amounts in the tumor sample from the cancer patient having a good prognosis or outcome than in the tumor sample from the cancer patient having a poor prognosis or outcome, determining the expression levels of the set of two or more genes, calculating at least one ratio of the expression level of the at least one upregulated gene to the expression level of the at least one downregulated gene, wherein the at least one ratio is indicative of the prognosis or outcome of the cancer patient.
21. The method of claim 20, wherein there is at least a 2-fold difference in mean expression levels between the at least one upregulated gene and the at least one downregulated gene.
22. The method of claim 20, wherein two or more expression ratios are calculated.
23. The method of claim 22, further comprising combining the two or more expression ratios.
24. The method of claim 23, wherein the step of combining the two or more expression ratios comprises calculating the geometric mean of the two or more expression ratios.
25. The method of claim 20, wherein the ratio is calculated by division of the expression level of one upregulated gene by the expression level of one downregulated gene.
26. The method of claim 20, wherein the ratio is calculated by division of the expression levels of two or more upregulated genes by the expression level of one downregulated gene.
27. The method of claim 20, wherein the ratio is calculated by division of the expression level of one upregulated gene by the expression levels of two or more downregulated genes.
28. The method of claim 20, wherein the ratio is calculated by division of the expression levels of two or more upregulated genes by the expression levels of two or more downregulated genes.
29. The method of claim 20, further comprising transforming the expression level data for the upregulated and/or downregulated genes prior to calculating the ratio.
30. The method of claim 20, wherein the expression levels are determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification.
31. The method of claim 30, wherein the nucleic acid hybridization is performed using a solid-phase nucleic acid molecule array.
32. The method of claim 30, wherein the nucleic acid amplification method is real-time PCR.
33. The method of claim 20, wherein the expression levels are determined by an immunological method.
34. The method of claim 33, wherein the immunological method is performed using a solid-phase antibody array.
35. The method of claim 33, wherein the immunological method is an ELISA or ELISPOT assay.
36. The method of claim 20, wherein the cancer is selected from the group consisting of malignant pleural mesothelioma, lung adenocarcinoma, squamous carcinoma, medulloblastoma, prostate cancer, breast cancer, diffuse large B-cell lymphoma, follicular lymphoma and ovarian cancer.
37. A kit for cancer diagnosis comprising, a set of one or more ratios applicable to the analysis of gene expression data, wherein the ratio is calculated from the expression levels of at least one upregulated gene that is expressed in greater amounts in the cancer cells than in corresponding non-cancer cells and at least one downregulated gene that is expressed in lesser amounts in cancer cells than in corresponding non-cancer cells.
38. The kit of claim 37, further comprising instructions for the use of the one or more ratios in the diagnosis of the presence of cancer cells or non-cancer cells in a biological sample.
39. A cancer diagnostic system comprising a measurement device that measures gene expression level data of a set of two or more genes, wherein the set comprises at least one upregulated gene that is expressed in greater amounts in the cancer cells than in corresponding non-cancer cells and at least one downregulated gene that is expressed in lesser amounts in cancer cells than in corresponding non-cancer cells, a data transformation device that acquires the gene expression data from the measurement device and performs data transformation to calculate a ratio of the gene expression levels of the upregulated and downregulated genes.
40. The cancer diagnostic system of claim 39, wherein the data transformation device selects gene expression data of a selected set of genes from the measurement device for calculating the ratio of the selected set of genes, wherein the ratio calculated from the gene expression data of the selected set of genes is diagnostic for a selected cancer.
41. The cancer diagnostic system of claim 39, further comprising a user interface output device to output the ratio to a user.
42. The cancer diagnostic system of claim 41, further comprising a database of ratios of gene expression that are diagnostic for cancers, and a comparison device that compares the ratio calculated from the measured gene expression to the diagnostic ratios stored in the database and outputs the comparison to the user interface output device.
43. The cancer diagnostic system of claim 41, further comprising a database of treatment information for specific cancers, wherein the comparison device identifies treatment information in the database for the specific cancer for which the diagnostic ratio matches the calculated ratio, and wherein the comparison outputs the treatment information to the user interface output device.
44. A method for diagnosing malignant pleural mesothelioma in a subject suspected of having malignant pleural mesothelioma comprising: obtaining from the subject a tissue sample suspected of being cancerous, determining the expression of a set of nucleic acid molecules or expression products thereof in the tissue sample, wherein the set of nucleic acid molecules comprises at least two nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
45. The method of claim 44, wherein the set of nucleic acid molecules comprises at least 3 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
46. The method of claim 44, wherein the set includes at least 4 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
47. The method of claim 44, wherein the set includes at least 5 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
48. The method of claim 44, wherein the set includes at least 10 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
49. The method of claim 44, wherein the set includes at least 12 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
50. The method of claim 44, wherein the set includes at least 15 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
51. The method of claim 44, wherein the set includes at least 20 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
52. The method of claim 44, wherein the set includes at least 25 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
53. The method of claim 44, further comprising: determining the expression of the set of nucleic acid molecules or expression products thereof in a non-cancerous tissue sample, and comparing the expression of the set of nucleic acid molecules or expression products thereof in the tissue sample suspected of being cancerous and the non-cancerous tissue sample.
54. The method of claim 44, further comprising calculating a ratio of the expression of at least two genes among the set of nucleic acid molecules.
55. A method for selecting a course of treatment of a subject having or suspected of having malignant pleural mesothelioma, comprising: obtaining from the subject a tissue sample suspected of being cancerous, determining the expression of a set of nucleic acid markers or expression products thereof which are differentially expressed in malignant pleural mesothelioma tumor tissue samples, and selecting a course of treatment appropriate to the malignant pleural mesothelioma of the subject.
56. The method of claim 55, further comprising calculating a ratio of the expression of at least two genes among the set of nucleic acid markers or expression products thereof.
57. The method of claim 55, further comprising: determining the expression of the set of nucleic acid molecules or expression products thereof in a non-cancerous tissue sample.
58. The method of claim 55, wherein the expression of a set of nucleic acid markers is determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification.
59. The method of claim 58, wherein the nucleic acid hybridization is performed using a solid-phase nucleic acid molecule array.
60. The method of claim 58, wherein the nucleic acid amplification method is real-time PCR.
61. A method for evaluating treatment of malignant pleural mesothelioma, comprising: obtaining a first determination of the expression of a set of nucleic acid molecules, or expression products thereof, which are differentially expressed in a malignant pleural mesothelioma tumor tissue sample from a subject undergoing treatment for cancer, obtaining a second determination of the expression of the set of nucleic acid molecules, or expression products thereof, in a second malignant pleural mesothelioma tumor tissue sample from the subject after obtaining the first determination, comparing the first determination of expression to the second determination of expression as an indication of evaluation of the treatment.
62. The method of claim 61, wherein the determinations of expressions are used to calculate at least one ratio of gene expression.
63. The method of claim 61, further comprising: determining the expression of a set of nucleic acid markers which are differentially expressed in non-cancerous tissue samples.
64. The method of claim 61, wherein the expression of a set of nucleic acid markers is determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification.
65. The method of claim 64, wherein the nucleic acid hybridization is performed using a solid-phase nucleic acid molecule array.
66. The method of claim 64, wherein the nucleic acid amplification method is real-time PCR.
67. A solid-phase nucleic acid molecule array consisting essentially of at least two nucleic acid molecules selected from the group consisting of SEQ ID NOs: 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 fixed to a solid substrate.
68. The solid-phase nucleic acid molecule array of claim 67, further comprising at least one control nucleic acid molecule.
69. The solid-phase nucleic acid molecule array of claim 67, wherein the set of nucleic acid molecules comprises at least 3 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
70. The solid-phase nucleic acid molecule array of claim 67, wherein the set includes at least 4 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
71. The solid-phase nucleic acid molecule array of claim 67, wherein the set includes at least 5 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
72. The solid-phase nucleic acid molecule array of claim 67, wherein the set includes at least 10 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
73. The solid-phase nucleic acid molecule array of claim 67, wherein the set includes at least 12 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
74. The solid-phase nucleic acid molecule array of claim 67, wherein the set includes at least 15 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
75. The solid-phase nucleic acid molecule array of claim 67, wherein the set includes at least 20 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
76. The solid-phase nucleic acid molecule array of claim 67, wherein the set includes at least 25 nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77.
77. A solid-phase protein microarray comprising at least two antibodies or antigen-binding fragments thereof, that specifically bind at least two different polypeptides selected from the group consisting of SEQ ID NOs: 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, fixed to a solid substrate.
78. The protein microarray of claim 77, further comprising at least one control polypeptide molecule.
79. The protein microarray of claim 77, wherein the antibodies are monoclonal antibodies.
80. The protein microarray of claim 77, wherein the antibodies are polyclonal antibodies.
81. A method for identifying lead compounds for a pharmacological agent useful in the treatment of malignant pleural mesothelioma, comprising: contacting a malignant pleural mesothelioma cell or tissue with a candidate pharmacological agent, determining the expression of a set of nucleic acid molecules in the malignant pleural mesothelioma cell or tissue sample under conditions which, in the absence of the candidate pharmacological agent, permit a first amount of expression of the set of nucleic acid molecules wherein the set of nucleic acid molecules comprises at least two nucleic acid molecules selected from the group consisting of SEQ ID NOs:9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, and detecting a test amount of the expression of the set of nucleic acid molecules, wherein a decrease in the test amount of expression in the presence of the candidate pharmacological agent relative to the first amount of expression indicates that the candidate pharmacological agent is a lead compound for a pharmacological agent which is useful in the treatment of malignant pleural mesothelioma.
82. The method of claim 81, further comprising calculating a ratio of gene expression.
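
The ratio calculation recited in claims 37 to 43, 54, 56, 62 and 82 may be illustrated by the following minimal sketch. It is not the claimed implementation. The gene names, the expression values, the threshold of 1.0, the function names, and the use of a geometric mean to combine more than one gene on each side of the ratio are all assumptions made for illustration only.

```python
from math import prod


def expression_ratio(levels, upregulated, downregulated):
    """Ratio of the geometric-mean expression of the upregulated genes
    to that of the downregulated genes (combination rule is an assumption)."""

    def geometric_mean(genes):
        values = [levels[g] for g in genes]
        if not values or any(v <= 0 for v in values):
            raise ValueError("expression levels must be positive and non-empty")
        return prod(values) ** (1.0 / len(values))

    return geometric_mean(upregulated) / geometric_mean(downregulated)


def classify(ratio, threshold=1.0):
    """Hypothetical decision rule: a ratio above the threshold suggests cancer cells."""
    return "cancer" if ratio > threshold else "non-cancer"


# Hypothetical measurements (arbitrary units) for two placeholder genes.
sample = {"GENE_UP": 800.0, "GENE_DOWN": 100.0}
ratio = expression_ratio(sample, ["GENE_UP"], ["GENE_DOWN"])
print(ratio, classify(ratio))
```

In this sketch a single upregulated gene and a single downregulated gene reduce to a plain quotient of the two expression levels; a diagnostic threshold for any real gene pair would have to be taken from measured data.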
PCT/US2002/028203 2001-09-05 2002-09-05 Diagnostic and prognostic tests WO2003021229A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002324881A AU2002324881A1 (en) 2001-09-05 2002-09-05 Diagnostic and prognostic tests

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US31738901P 2001-09-05 2001-09-05
US60/317,389 2001-09-05
US10/236,031 2002-08-30
US10/236,031 US7622260B2 (en) 2001-09-05 2002-09-05 Diagnostic and prognostic tests

Publications (2)

Publication Number Publication Date
WO2003021229A2 (en) 2003-03-13
WO2003021229A3 (en) 2004-11-25

Family

ID=26929397

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/028203 WO2003021229A2 (en) 2001-09-05 2002-09-05 Diagnostic and prognostic tests

Country Status (2)

Country Link
AU (1) AU2002324881A1 (en)
WO (1) WO2003021229A2 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005005661A2 (en) * 2003-07-02 2005-01-20 Novartis Ag Genes regulated in ovarian cancer as prognostic and therapeutic targets
WO2005015236A2 (en) * 2003-07-18 2005-02-17 Roche Diagnostics Gmbh A method for predicting the progression of adenocarcinoma
EP1533619A3 (en) * 2003-11-20 2005-10-05 F. Hoffmann-La Roche Ag Specific markers for metabolic syndrome
EP1664762A2 (en) * 2003-09-03 2006-06-07 THE GOVERNMENT OF THE UNITED STATES OF AMERICA, as represented by THE SECRETARY, DEPARTMENT OF HEALTH AND HUMAN SERVICES Methods for identifying, diagnosing, and predicting survival of lymphomas
JP2009514553A (en) * 2005-11-03 2009-04-09 アラビタ・ファーマシューティカルズ・インコーポレーテッド Modified annexin protein and use of the protein in organ transplantation
US7622260B2 (en) 2001-09-05 2009-11-24 The Brigham And Women's Hospital, Inc. Diagnostic and prognostic tests
US8131475B2 (en) 2003-09-03 2012-03-06 The United States Of America As Represented By The Secretary, Department Of Health And Human Services Methods for identifying, diagnosing, and predicting survival of lymphomas
EP2474629A1 (en) * 2007-02-21 2012-07-11 Oslo Universitetssykehus HF New markers for cancer
US8321137B2 (en) 2003-09-29 2012-11-27 Pathwork Diagnostics, Inc. Knowledge-based storage of diagnostic models
US8450057B2 (en) 2006-08-14 2013-05-28 The Brigham And Women's Hospital, Inc. Diagnostic tests using gene expression ratios
CN104368012A (en) * 2014-08-20 2015-02-25 中国人民解放军第三〇七医院 Uses and relevant drugs of human RPL34 (ribosomal protein L34) gene
US8977506B2 (en) 2003-09-29 2015-03-10 Response Genetics, Inc. Systems and methods for detecting biological features
US20170045519A1 (en) * 2014-04-22 2017-02-16 Shanghai Kexin Biotech Co., Ltd. Method and biomarker for detecting cancer
CN107312846A (en) * 2017-07-12 2017-11-03 北京赛尔维康生物医学科技有限公司 Application of the CAPG and PTGIS genes in scoliosis detection kit is prepared
US9970059B2 (en) 2008-06-06 2018-05-15 The United States of America, as represented by the Secretary, Department of Human Services Survival predictor for diffuse large B cell lymphoma
US10370715B2 (en) 2003-09-03 2019-08-06 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Methods for identifying, diagnosing, and predicting survival of lymphomas
CN111912987A (en) * 2020-08-25 2020-11-10 北京信诺卫康科技有限公司 Combination of FGF18 and HE4 as early ovarian cancer biomarker and kit

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5840484A (en) * 1992-07-17 1998-11-24 Incyte Pharmaceuticals, Inc. Comparative gene transcript analysis
US6040138A (en) * 1995-09-15 2000-03-21 Affymetrix, Inc. Expression monitoring by hybridization to high density oligonucleotide arrays
WO2000055633A2 (en) * 1999-03-15 2000-09-21 Eos Biotechnology, Inc. Methods of screening for colorectal cancer modulators

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7622260B2 (en) 2001-09-05 2009-11-24 The Brigham And Women's Hospital, Inc. Diagnostic and prognostic tests
US8551700B2 (en) 2001-09-05 2013-10-08 The Brigham And Women's Hospital, Inc. Diagnostic and prognostic tests
WO2005005661A2 (en) * 2003-07-02 2005-01-20 Novartis Ag Genes regulated in ovarian cancer as prognostic and therapeutic targets
WO2005005661A3 (en) * 2003-07-02 2005-07-14 Novartis Ag Genes regulated in ovarian cancer as prognostic and therapeutic targets
WO2005015236A2 (en) * 2003-07-18 2005-02-17 Roche Diagnostics Gmbh A method for predicting the progression of adenocarcinoma
WO2005015236A3 (en) * 2003-07-18 2005-08-11 Roche Diagnostics Gmbh A method for predicting the progression of adenocarcinoma
US10697975B2 (en) 2003-09-03 2020-06-30 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Methods for identifying, diagnosing, and predicting survival of lymphomas
EP1664762A2 (en) * 2003-09-03 2006-06-07 THE GOVERNMENT OF THE UNITED STATES OF AMERICA, as represented by THE SECRETARY, DEPARTMENT OF HEALTH AND HUMAN SERVICES Methods for identifying, diagnosing, and predicting survival of lymphomas
EP1664762A4 (en) * 2003-09-03 2008-08-13 Us Gov Health & Human Serv Methods for identifying, diagnosing, and predicting survival of lymphomas
US10370715B2 (en) 2003-09-03 2019-08-06 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Methods for identifying, diagnosing, and predicting survival of lymphomas
US7711492B2 (en) 2003-09-03 2010-05-04 The United States Of America As Represented By The Department Of Health And Human Services Methods for diagnosing lymphoma types
US8131475B2 (en) 2003-09-03 2012-03-06 The United States Of America As Represented By The Secretary, Department Of Health And Human Services Methods for identifying, diagnosing, and predicting survival of lymphomas
US8321137B2 (en) 2003-09-29 2012-11-27 Pathwork Diagnostics, Inc. Knowledge-based storage of diagnostic models
US8977506B2 (en) 2003-09-29 2015-03-10 Response Genetics, Inc. Systems and methods for detecting biological features
EP1533619A3 (en) * 2003-11-20 2005-10-05 F. Hoffmann-La Roche Ag Specific markers for metabolic syndrome
JP2009514553A (en) * 2005-11-03 2009-04-09 アラビタ・ファーマシューティカルズ・インコーポレーテッド Modified annexin protein and use of the protein in organ transplantation
US8450057B2 (en) 2006-08-14 2013-05-28 The Brigham And Women's Hospital, Inc. Diagnostic tests using gene expression ratios
EP2474629A1 (en) * 2007-02-21 2012-07-11 Oslo Universitetssykehus HF New markers for cancer
US9970059B2 (en) 2008-06-06 2018-05-15 The United States of America, as represented by the Secretary, Department of Human Services Survival predictor for diffuse large B cell lymphoma
US11028444B2 (en) 2008-06-06 2021-06-08 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Survival predictor for diffuse large B cell lymphoma
US20170045519A1 (en) * 2014-04-22 2017-02-16 Shanghai Kexin Biotech Co., Ltd. Method and biomarker for detecting cancer
US10641774B2 (en) * 2014-04-22 2020-05-05 Shanghai Kexin Biotech Co., Ltd. Method for diagnosing cancer by detecting C-terminal segment of moesin in blood
CN104368012B (en) * 2014-08-20 2019-03-22 中国人民解放军第三〇七医院 The purposes and its related drugs of people's RPL34 gene
CN104368012A (en) * 2014-08-20 2015-02-25 中国人民解放军第三〇七医院 Uses and relevant drugs of human RPL34 (ribosomal protein L34) gene
CN107312846A (en) * 2017-07-12 2017-11-03 北京赛尔维康生物医学科技有限公司 Application of the CAPG and PTGIS genes in scoliosis detection kit is prepared
CN111912987A (en) * 2020-08-25 2020-11-10 北京信诺卫康科技有限公司 Combination of FGF18 and HE4 as early ovarian cancer biomarker and kit
CN111912987B (en) * 2020-08-25 2023-07-21 北京信诺卫康科技有限公司 FGF18 and HE4 combined used as early ovarian cancer biomarker and kit

Also Published As

Publication number Publication date
WO2003021229A3 (en) 2004-11-25
AU2002324881A8 (en) 2003-03-18
AU2002324881A1 (en) 2003-03-18

Similar Documents

Publication Publication Date Title
US7622260B2 (en) Diagnostic and prognostic tests
US7803552B2 (en) Biomarkers for predicting prostate cancer progression
US8450057B2 (en) Diagnostic tests using gene expression ratios
US20020106662A1 (en) Prognostic classification of endometrial cancer
US7741034B2 (en) Detection and diagnosis of smoking related cancers
EP2726635B1 (en) Multigene prognostic assay for lung cancer
US20070015148A1 (en) Gene expression profiles in breast tissue
US20040033502A1 (en) Gene expression profiles in esophageal tissue
WO2003068054A2 (en) Identification of ovarian cancer tumor markers and therapeutic targets
WO2003021229A2 (en) Diagnostic and prognostic tests
EP2754720A1 (en) Prostate cancer survival and recurrence
JP2011525106A (en) Markers for diffuse B large cell lymphoma and methods of use thereof
WO2002010436A2 (en) Prognostic classification of breast cancer
WO2010108638A1 (en) Tumour gene profile
WO2001074405A9 (en) Gene expression profiles in esophageal tissue
US20100221722A1 (en) Methods for evaluating breast cancer prognosis
AU2017201343A1 (en) Molecular biomarker set for early detection of ovarian cancer
CN101457254B (en) Gene chip and kit for liver cancer prognosis
JP5688497B2 (en) Methods and compositions for predicting postoperative prognosis in patients with lung adenocarcinoma
Fey The impact of chip technology on cancer medicine
EP1682906A2 (en) Method for distinguishing aml subtypes with differents gene dosages

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KR LC LK LR LS LT LU LV MA MD MG MN MW MX MZ NO NZ OM PH PL PT RU SD SE SG SI SK SL TJ TM TN TR TZ UA UG US UZ VC VN YU ZA ZM

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase in:

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP