WO2011106738A2 - Use of tcr clonotypes as biomarkers for disease - Google Patents

Use of tcr clonotypes as biomarkers for disease

Info

Publication number
WO2011106738A2
Authority
WO
WIPO (PCT)
Prior art keywords
sequences
disease
biomarker
cells
subject
Prior art date
Application number
PCT/US2011/026373
Other languages
French (fr)
Other versions
WO2011106738A3 (en
Inventor
Edus H. Warren
Christopher Scott Carlson
Harlan S. Robins
Original Assignee
Fred Hutchinson Cancer Research Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fred Hutchinson Cancer Research Center
Publication of WO2011106738A2
Publication of WO2011106738A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6881: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for tissue or cell typing, e.g. human leukocyte antigen [HLA] probes
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers
    • C12Q2600/16: Primer sets for multiplex assays


Definitions

  • B cells and T cells in the adaptive immune system form long lasting memory responses to protect against future exposures which can be measured.
  • the response specific to antigens is based on the highly polymorphic receptors encoded by B cells (immunoglobulins (IG)) and T cells (T cell receptors (TCR)).
  • TCRs are heterodimeric proteins consisting of either an α chain and a β chain or a γ chain and a δ chain.
  • a similar structure in B cell receptors is present where IGs are also a heterodimer, consisting of one light and one heavy chain.
  • Receptor diversity is further augmented by the deletion of nucleotides adjacent to the recombination signal sequences (RSS) of the V, D, and J segments, and template-independent insertion of nucleotides at the V-D, D-J, and V-J junctions.
  • the present invention provides a method for identifying a biomarker for a disease comprising a) providing isolated polynucleotide sequences from immune cells from a group of patients with the disease; b) performing a nucleotide amplification reaction to produce a set of clonotype sequences; c) identifying clonotype sequences enriched within the group of patients; d) providing isolated polynucleotide sequences from immune cells from a group of normal subjects without the disease, and amplifying the polynucleotides; e) removing sequences present in the normal subject group, which are obtained in step d), from the exposure-specific clonotype sequences, which are obtained in step c).
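  • Steps (c) to (e) of the method above can be read as a set operation: keep the clonotype sequences that recur across patients, then subtract anything seen in the normal group. The following is a minimal sketch under that reading. The function name, the `min_patients` threshold, and the representation of each repertoire as a Python set are illustrative assumptions, not part of the disclosure.

```python
from collections import Counter


def candidate_biomarkers(patient_repertoires, normal_repertoires, min_patients=2):
    """Return clonotypes seen in >= min_patients patients and in no normal subject.

    Each repertoire is an iterable of clonotype sequences (hypothetical input format).
    """
    # Count in how many patients each clonotype occurs (step c, simplified).
    patient_counts = Counter()
    for repertoire in patient_repertoires:
        patient_counts.update(set(repertoire))
    enriched = {seq for seq, n in patient_counts.items() if n >= min_patients}

    # Pool every clonotype observed in the normal group (step d).
    normal_pool = set().union(*normal_repertoires)

    # Remove sequences present in normal subjects (step e).
    return enriched - normal_pool


patients = [
    {"CASSLGETQYF", "CASSPGTEAFF"},
    {"CASSLGETQYF", "CASSIRSSYEQYF"},
    {"CASSLGETQYF", "CASSPGTEAFF"},
]
normals = [{"CASSPGTEAFF"}]
result = candidate_biomarkers(patients, normals)
```

    Here a simple "seen in at least two patients" rule stands in for enrichment; a statistical test could replace it without changing the subtraction step.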
  • the polynucleotide sequences are rearranged genomic sequences.
  • the nucleotide amplification reaction comprises (i) a multiplicity of V-segment primers, where each primer comprises a sequence that is complementary to a single functional V-segment or a small family of V-segments; and (ii) a multiplicity of J-segment primers, where each primer comprises a sequence that is complementary to a J segment; and where the V-segment and J-segment primers amplify a TCR CDR3 region.
  • the V-segments are selected from the group consisting of Vα, Vβ, Vγ and Vδ, and the J-segments are selected from the group consisting of Jα, Jβ, Jγ and Jδ.
  • the immune cell samples are selected from the group consisting of CD45RO+ CD45RA(int/neg) CD8+ T cells and CD45RO- CD45RA(hi) CD62L(hi) CD8+ T cells.
  • the T cells share one or more HLA alleles.
  • V-segment primers wherein each primer comprises a sequence that is complementary to a single functional V-segment or a small family of V-segments; and (ii) a multiplicity of J primers, wherein each primer comprises a sequence that is complementary to a J-segment; wherein the V-segment and J-segment primers amplify an IGH, IGL or IGK CDR3 region.
  • the V-segment primers forward primers
  • recombination signal sequence within the V segment.
  • Another embodiment can include where the J-segment primers (reverse primers) are at a position about 30 base pair 3' of the J gene RSS site.
  • Certain other embodiments may include where the exposure-specific clonotype sequences have an insertion of less than six nucleotides.
  • certain diseases are of importance and include diseases selected from the group consisting of an autoimmune disease, an inflammatory disease, an immune deficiency, a bacterial infection, a viral infection, a fungal infection, or a parasitic infection.
  • Another embodiment of the present invention provides a method comprising an additional step prior to step (a) of isolating polynucleotide sequences from the immune cell samples.
  • Further embodiments include polynucleotide sequences isolated from an immune cell sample, where the sample is a tissue comprising hematopoietic lineage cells.
  • the present invention also provides biomarkers produced by the methods described herein.
  • the biomarker is a polypeptide encoded by the exposure-specific clonotype sequence.
  • the polypeptide biomarker is an antibody.
  • the present invention provides a diagnostic kit for detecting a disease or risk for a disease comprising one or more biomarkers described herein.
  • the biomarker is selected from the group consisting of a polynucleotide biomarker, a labeled polypeptide biomarker, and an antibody biomarker.
  • the biomarker detects a disease or risk of a disease selected from the group consisting of an autoimmune disease, an inflammatory disease, an immune deficiency, a bacterial infection, a viral infection, a fungal infection, a parasite infection, and a prenatal disease.
  • the present invention provides a method for detecting a disease or a risk of a disease in a subject comprising: a) selecting a biomarker for the disease; b) detecting the presence of the biomarker in the polypeptide sequences of the subject.
  • the method further comprises determining that the presence of the biomarker in the polypeptide sequences of the subject is indicative of the disease or the risk of the disease.
  • the selection uses a panel of biomarkers, and (c) comprises determining if the frequency of the exposure-specific clonotype sequence is indicative of disease or risk of disease.
  • the clonotype sequences are genomic sequences.
  • It is another aspect of the present invention to provide a method for detecting disease or a risk of disease in a subject comprising (a) providing isolated clonotype sequences from the subject's immune cell sample; (b) detecting the presence of an exposure-specific clonotype sequence using the claimed biomarkers described herein; and (c) determining if the frequency of the exposure-specific clonotype sequence from the subject is at a frequency indicative of disease or a risk of disease.
  • the method comprises using a panel of biomarkers and (c) comprises determining if the frequency of the exposure-specific clonotype sequence is indicative of disease or risk of disease.
  • the present invention provides a method for selecting a biomarker for Type I diabetes or a risk for Type I diabetes comprising selecting a biomarker according to the method described herein, where the exposure-specific clonotype sequences are selected from the group consisting of SEQ ID NOS: 75 to 763.
  • the present invention provides a method for detecting Type I diabetes or a risk for Type I diabetes in a subject comprising (a) selecting a disease-specific biomarker from SEQ ID NOS: 75 to 763, where the biomarker consists of a disease-specific clonotype sequence or a panel of such sequences; (b) measuring the frequency of the clonotype sequence(s) in the subject; and (c) determining if the frequency of the diagnostic clonotype sequences in the subject is more consistent with frequencies observed in disease cases or controls.
  • the present invention provides a method for detecting multiple sclerosis or a risk for multiple sclerosis in a subject comprising selecting a biomarker according to the methods described herein, where the exposure-specific clonotype sequences are selected from the group consisting of SEQ ID. NOS: 764 to 1166.
  • the present invention provides a method for detecting multiple sclerosis or a risk for multiple sclerosis in a subject comprising (a) selecting a disease-specific biomarker from SEQ ID NOS: 764 to 1166, where the biomarker consists of a disease-specific clonotype sequence or a panel of such sequences; (b) measuring the frequency of the clonotype sequence(s) in the subject; and (c) determining if the frequency of the diagnostic clonotype sequences in the subject is more consistent with frequencies observed in disease cases or controls.
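  • Step (c) of the detection methods above does not say how "more consistent with" cases or controls is decided. As one purely illustrative possibility, the subject's frequency can be compared with the mean frequency of each group. The nearest-mean rule, the function name, and the numbers below are all assumptions made for the sake of the example.

```python
from statistics import mean


def closer_group(subject_frequency, case_frequencies, control_frequencies):
    """Label a subject by whichever group mean its clonotype frequency is nearer to.

    This nearest-mean rule is an assumed stand-in for the unspecified decision in step (c).
    """
    distance_to_cases = abs(subject_frequency - mean(case_frequencies))
    distance_to_controls = abs(subject_frequency - mean(control_frequencies))
    return "case" if distance_to_cases < distance_to_controls else "control"


# Hypothetical frequencies of a diagnostic clonotype panel.
label_a = closer_group(0.04, [0.05, 0.06], [0.0, 0.01])
label_b = closer_group(0.01, [0.05, 0.06], [0.0, 0.01])
```

    A real diagnostic would need a validated threshold or model; this only shows the shape of the comparison.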
  • the present invention provides a method for selecting a biomarker for Epstein Barr Virus infection or a risk for Epstein Barr Virus infection comprising selecting a biomarker according to the method described herein, wherein the exposure-specific clonotype sequences are selected from the group consisting of SEQ ID NOS: 71 to 74 and 1167.
  • FIG. 1 Pairwise shared CD8+ TCR beta amino acid sequences.
  • the T cell repertoires from seven healthy people were compared. Each point on the graph represents a pairwise comparison between two of the seven donors for a total of 21 comparisons.
  • the Y-axis value is number of matching TCR beta chains from the naive T cells found in the blood.
  • the X-axis value is the number of matching TCR beta chains from the memory T cells in the blood.
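  • The comparison behind FIG. 1 can be sketched as follows: every unordered pair of donors is compared once, which for seven donors gives 7 × 6 / 2 = 21 points. The donor labels and toy sequences are hypothetical, and representing each repertoire as a set of amino acid sequences is an assumption.

```python
from itertools import combinations


def pairwise_shared(repertoires):
    """Map each unordered donor pair to the number of sequences the two donors share."""
    return {
        (a, b): len(repertoires[a] & repertoires[b])
        for a, b in combinations(sorted(repertoires), 2)
    }


# Seven hypothetical donors; every donor carries one common sequence plus a private one.
donors = {f"donor{i}": {"CASSLGETQYF", f"PRIVATE{i}"} for i in range(1, 8)}
shared = pairwise_shared(donors)
```

    Running this once on naive-compartment sequences and once on memory-compartment sequences would give the Y and X values for each of the 21 points.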
  • Nucleotide or “nucleotide sequence” as used herein refers to polynucleotides, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), oligonucleotides, fragments generated by the polymerase chain reaction (PCR), and fragments generated by any of ligation, scission, endonuclease action, and exonuclease action.
  • Nucleic acid molecules can be composed of monomers that are naturally occurring nucleotides (such as DNA and RNA), or analogs of naturally occurring nucleotides (e.g., α-enantiomeric forms of naturally occurring nucleotides), or a combination of both, unless otherwise specified.
  • Modified nucleotides can have alterations in sugar moieties and/or in pyrimidine or purine base moieties.
  • The "complement" of a nucleic acid molecule refers to a nucleic acid molecule having a complementary nucleotide sequence and reverse orientation as compared to a reference nucleotide sequence.
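  • A sequence with complementary bases and reverse orientation relative to a reference is what is usually called the reverse complement. A minimal sketch for DNA written with the letters A, C, G and T follows; the function name is illustrative.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the complementary sequence read in the opposite orientation."""
    return sequence.translate(_COMPLEMENT)[::-1]


example = reverse_complement("ATGC")
```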
  • a "primer” defines an oligonucleotide which is capable of annealing to (hybridizing with) a target sequence, thereby creating a double stranded region which can serve as an initiation point for polynucleotide synthesis under suitable conditions.
  • nucleotide amplification reaction refers to any suitable procedure that amplifies a specific region of polynucleotides (target) using primers. See generally Kwoh et al., Am. Biotechnol. Lab. 8:14 (1990); Kwoh et al., Proc. Natl. Acad. Sci. USA 86:1173-1177 (1989); Lizardi et al., BioTechnology 6:1197-1202 (1988); Malek et al., Methods Mol. Biol. 25:253-260 (1994); and Sambrook et al., "Molecular Cloning: A Laboratory Manual" (1989).
  • a "polypeptide" is a polymer of amino acid residues joined by peptide bonds. As used herein, the term refers to any two or more joined amino acid residues and includes peptides and proteins whether produced naturally or synthetically, unless specifically stated otherwise.
  • Antigen as used herein denotes a molecule that binds antibodies. As used herein, antigen also includes molecules that can elicit an adaptive immune response, also referred to as "immunogens".
  • Immunoid cells denotes cells derived from a pluripotent hematopoietic stem cell and differentiated from a common lymphoid stem cell into cells of the lymphoid cell lineage. Lymphoid lineage cells give rise to B cells and T cells and are located in the lymphoid organs and tissues, which include bone marrow, blood, thymus, spleen, lymph nodes and mucosal lymphoid tissues.
  • B cell denotes a lymphocyte derived from a B cell progenitor in bone marrow.
  • the receptor on the B cell is referred to as a B cell receptor or "immunoglobulin" (IG) and consists of an effector region and a two component homodimer, each having two
  • IGH immunoglobulin heavy
  • IGL immunoglobulin light
  • the IGH is organized into five classes, μ, δ, γ, α, and ε
  • the IGL is organized into two classes, κ and λ. Naive B cells have not encountered specific antigens and are naive when leaving the bone marrow. Memory B cells upon activation by antigen produce antibodies specific to that antigen.
  • T cell denotes a lymphocyte that is maintained in the thymus and has either an α:β or a γ:δ heterodimeric receptor. There are Vα, Vβ, Vγ and Vδ, Jα, Jβ, Jγ and Jδ, and Dβ and Dδ loci. Naive T cells have not encountered specific antigens and T cells are naive when leaving the thymus. Naive T cells are identified as CD45RO-, CD45RA(hi), CD62L+. Memory T cells mediate immunological memory to respond rapidly on re-exposure to the antigen that originally induced their expansion and can be "CD4+" (T helper cells) or "CD8+" (T cytotoxic cells). Memory CD4 T cells are identified as CD4+ CD45RO+ cells and memory CD8 cells are identified as CD8+ CD45RO+.
  • MHC molecule denotes a highly polymorphic glycoprotein encoded by major histocompatibility complex (MHC) class I and II genes, and plays a role in presenting antigen to the T cell.
  • a “detectable label” is a molecule or atom which can be conjugated to a nucleic acid sequence, polypeptide sequence or antibody moiety to produce a molecule useful for diagnosis.
  • detectable labels include chelators, photoactive agents, radioisotopes, fluorescent agents, paramagnetic ions, or other marker moieties.
  • T cells which primarily recognize peptide antigens presented by MHC molecules on the surface of specialized antigen presenting cells
  • most of this receptor diversity is contained within the third complementarity-determining region (CDR3) of the T cell receptor (TCR) α and β chains.
  • B cell receptor antigenic specificity is determined in the third complementarity-determining region (CDR3) of the immunoglobulin heavy (IGH) and immunoglobulin light (IGL) peptides, and is not MHC dependent.
  • Two properties of B and T cells that make IGs and TCRs very enticing biomarkers are (1) clonal expansion upon recognition of an antigen and (2) the causal association of the cells with disease resistance (for pathogens) or disease pathology (in autoimmune disease).
  • Upon binding to a target antigen, the immune system amplifies these biomarkers through rapid cell division, allowing these receptors to be readily detectable even if the target is rare (e.g. an early stage tumor).
  • the immune cells useful as biomarkers might additionally be a causative component of the disease and, therefore, a drug target.
  • dysregulated T cells specifically attacking self tissues are thought to play a major role in multiple sclerosis, Type I diabetes, and rheumatoid arthritis.
  • the present invention is based on assessing the TCRβ CDR3 repertoires realized in the naive and memory CD8+ compartments of multiple donors, and for the first time reading and comparing approximately 3 million unique CDR3 sequences from ~40 million primary sequences from the repertoires of these different individuals.
  • These unique CDR3 sequences identified from TCR or IG are defined as "clonotype sequences". The results unexpectedly showed half of all TCRβ in each individual were derived from a small and specific subset of all possible TCRβ, characterized by less than about five inserted nucleotides, and biased toward specific VJ pair usage.
  • TCRβ was in response to the same pathogen, Epstein-Barr Virus.
  • The frequency of these T cells where donors were exposed to the Epstein Barr Virus was small enough to identify clonotype sequences as biomarkers reflecting Epstein Barr Virus infection.
  • the clonotype sequences associated with a single disease or pathogen within an individual are defined as "exposure-specific clonotype sequences".
  • Exposure-specific clonotype sequences observed in common across multiple individuals are defined as "public antigen receptors", and the cells carrying these receptors are "public T cells" or "public B cells”.
  • the present specification discloses that there is a large overlap in TCR sequences between unrelated individuals, providing a pool of public antigen receptors between individuals that are enriched in response to a shared pathogenic exposure.
  • the present invention provides in one aspect a method for identifying public antigen receptors as biomarkers, based on the discovery that specific amino acid sequences of TCR or IG loci are recurrently produced in multiple individuals, and may be used in responding to the same pathogen or environmental exposure.
  • the rearranged VDJ regions from millions of rearranged TCRβ genes in the naive and memory CD8 compartments were sequenced and the unexpected result was a high overlap between individuals, where more than half of all TCRβ clonotype sequences in each individual were derived from a small but specific subset of all possible TCRβ clonotype sequences.
  • the proportion of public polypeptide sequences specific to an immune response carried in the memory compartment or the overall repertoire of an individual is defined as the "frequency" of public antigen receptor sequences for a specific immune response, within that individual.
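  • The "frequency" defined above is a proportion, so it can be computed directly from per-clonotype counts. The sketch below assumes a repertoire is given as a mapping from sequence to observed count and that the public sequences for the immune response of interest are already known; both the input format and the function name are illustrative.

```python
def public_frequency(repertoire_counts, public_sequences):
    """Fraction of an individual's observed receptors that are public for one response."""
    total = sum(repertoire_counts.values())
    if total == 0:
        return 0.0  # an empty repertoire carries no public sequences
    public = sum(
        count for sequence, count in repertoire_counts.items()
        if sequence in public_sequences
    )
    return public / total


# Hypothetical memory-compartment counts for one individual.
counts = {"CASSLGETQYF": 3, "CASSPGTEAFF": 1}
frequency = public_frequency(counts, {"CASSLGETQYF"})
```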
  • tissue sample might be whole blood or subsets thereof, biopsy from a specific tissue or lesion, or specific cell types extracted from a biopsy specimen. Sequences might be derived from the tissue sample on the basis of either DNA or RNA.
  • the present invention provides biomarkers generated from exposure-specific clonotype sequences that are identified as a result of using the methods of the present invention.
  • the exposure-specific clonotype sequences are detected from immune cells taken from a group of subjects with a disease.
  • a group of subjects is two or more people and is also referred to as a "population".
  • the immune cells can originate from a tissue where a mixture of immune cells is present and T cells or B cells are isolated by any suitable means.
  • Polynucleotide sequences, such as DNA or RNA, are isolated and prepared in a manner suitable for amplification using oligonucleotide primers in a nucleotide amplification reaction.
  • the biomarker is defined as the panel of specific antigen receptor sequences that are significantly enriched in the cases as compared to controls.
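  • The text does not state which statistic decides that a sequence is "significantly enriched" in cases. One common choice for case/control carrier counts is a one-sided Fisher exact test, sketched here with only the standard library. The choice of test and the function name are assumptions, not the disclosed method.

```python
from math import comb


def enrichment_p_value(cases_with, n_cases, controls_with, n_controls):
    """One-sided Fisher exact p-value for a clonotype being over-represented in cases.

    Computes P(X >= cases_with) where X is hypergeometric: the number of carriers
    falling among the cases if carriers were distributed at random.
    """
    total = n_cases + n_controls
    carriers = cases_with + controls_with
    upper = min(carriers, n_cases)
    tail = sum(
        comb(carriers, x) * comb(total - carriers, n_cases - x)
        for x in range(cases_with, upper + 1)
    )
    return tail / comb(total, n_cases)


# Hypothetical clonotype carried by all 5 cases and none of 5 controls.
p = enrichment_p_value(5, 5, 0, 5)
```

    A panel would then be the set of sequences whose p-value passes a threshold chosen with multiple testing in mind.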
  • the present invention provides a method for using these biomarkers or exposure-specific clonotype sequences as diagnostics for disease or risk of disease in normal subjects, to detect a disease or risk of a disease.
  • a panel of exposure-specific clonotype sequences could be used to monitor either the initiation or the progression of an autoimmune disease.
  • Clonotype sequences may refer to either amino acid or nucleotide sequences.
  • Clonotype amino acid sequences can be encoded by multiple possible clonotype nucleotide sequences e.g., degenerate nucleotide sequences, so the diagnostic criteria include the amino acid sequence and all possible nucleotide sequences giving rise to that amino acid sequence.
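  • Because the genetic code is degenerate, one amino acid clonotype corresponds to many nucleotide clonotypes. The sketch below builds the standard codon table, translates an in-frame DNA sequence, and counts how many nucleotide sequences encode a given peptide. The helper names and the short example peptide are illustrative.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
CODONS_PER_AMINO_ACID = Counter(CODON_TABLE.values())


def translate(dna):
    """Translate an in-frame DNA sequence, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def degeneracy(peptide):
    """Number of distinct nucleotide sequences that encode the peptide."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in peptide)


same_peptide = translate("TGTGCCAGCAGC") == translate("TGCGCTTCATCG")
n_encodings = degeneracy("CASS")
```

    Matching at the amino acid level therefore means translating each observed nucleotide clonotype before comparing it with the diagnostic sequence.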
  • the method provides for collecting samples of immune cells from the subject.
  • the samples are collected and prepared independently and are not necessary for practicing the invention. In either case, one way of collecting samples uses whole blood from an individual where Peripheral Blood Mononuclear Cells (PBMCs)
  • the cells can be cryopreserved for later use, and screened for tetramer positive T cells for a specific antigen.
  • Whole blood is processed to isolate memory B cells (CD19+, CD27+), memory CD4 T cells (CD4+, CD45RO+) and memory CD8 T cells (CD8+ CD45RO+).
  • Apparatus for separation are known, for example the AutoMACS® magnetic bead cell separator (Miltenyi Biotec, Auburn, CA).
  • B cell or T cell nucleotides are sequenced. If total genomic DNA is to be used, it is extracted from cells, e.g., by using the QIAamp® DNA blood Mini Kit (QIAGEN®). The approximate mass of a single haploid genome is 3 pg. Using PBMCs as a source, the number of T cells can be estimated to be about 30% of total cells. Alternatively, total nucleic acid can be isolated from cells, including both genomic DNA and mRNA. Methods for extracting nucleotides from immune cells and tissue are well known to those of ordinary skill in the art.
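  • The two figures given above (about 3 pg per haploid genome and about 30% T cells among PBMCs) allow a rough estimate of how many T cell genomes a DNA sample contains. The sketch assumes diploid cells (two haploid genomes each), which the text does not state explicitly; the function and constant names are illustrative.

```python
HAPLOID_GENOME_PG = 3.0   # approximate mass of one haploid genome, from the text
T_CELL_FRACTION = 0.30    # approximate T cell share of PBMCs, from the text


def estimate_t_cells(dna_ng, ploidy=2):
    """Rough number of T cells represented in dna_ng nanograms of PBMC genomic DNA.

    ploidy=2 is an assumption: each cell is taken to contribute two haploid genomes.
    """
    total_pg = dna_ng * 1000.0                      # 1 ng = 1000 pg
    cells = total_pg / (HAPLOID_GENOME_PG * ploidy)
    return cells * T_CELL_FRACTION


t_cells = estimate_t_cells(600)
```

    With these inputs, 600 ng of PBMC DNA corresponds to roughly 100,000 cells, of which roughly 30,000 would be T cells.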
  • a variety of methods are useful to detect diagnostic clonotype sequences of either TCR or IG for a given disease state. Methods include the comprehensive sequencing strategy disclosed herein and used to discover exposure-specific clonotype sequences, but are not limited to this method as other sequence-specific detection methods available should work, including hybridization, TaqMan, and other technologies that detect the presence or absence of specific nucleotide sequences. Additionally, the peptides encoded by the diagnostic TCR or IG might be detected directly by immunoassay with either the appropriate antigen or monoclonal antibodies developed for this purpose.
  • the immunoassays which can be used include, but are not limited to, competitive assay systems using techniques such as western blots, radioimmunoassays, ELISA, "sandwich" immunoassays, immunoprecipitation assays, precipitin assays, gel diffusion precipitin assays, immunoradiometric assays, fluorescent immunoassays, protein A
  • nucleic acid based methods are preferred over protein based assays, but either can distinguish individuals carrying the clonotype-specific sequence at high frequency from individuals not carrying the diagnostic clonotype.
  • the present inventors previously disclosed a computational method to measure TCR CDR3 diversity based on single molecule DNA sequencing in WO2010/151416, which is hereby incorporated by reference in its entirety.
  • a method was developed based on single molecule DNA sequencing and an analytic computational approach to estimation of repertoire diversity using diversity measurements in finite samples.
  • the analysis demonstrated that the number of unique TCRβ CDR3 sequences in the adult repertoire significantly exceeds previous estimates based on exhaustive capillary sequencing of small segments of the repertoire.
  • the TCRβ chain diversity in the CD45RO- population (enriched for naive T cells) observed using the methods described herein was five-fold larger than previously reported.
  • the number of unique TCRβ CDR3 sequences expressed in antigen-experienced CD45RO+ T cells was between 10 and 20 times larger than expected.
  • CD45RO+ cells suggested that the T cell repertoire contains a large number of clones with a small clone size. Furthermore, it was determined that the realized set of TCRβ chains were sampled non-uniformly from the huge potential space of sequences. In particular, the β chain sequences closer to germ line (few insertions and deletions at the V-D and D-J boundaries) were created at a relatively high frequency. TCR sequences close to germ line were shared between different people because the germ line sequences for the V's, D's, and J's are shared, modulo a small number of polymorphisms, among the human population.
  • the degree of overlap did not appear to depend strongly on the extent of HLA-A, -B, or -C matching.
  • Convergent evolution is the possibility that a diverse set of TCRs rearrange in the thymus, and the positive and negative selection process favors the same lower diversity subset of TCRs in each individual.
  • the small effective size of the TCR repertoire was primarily a result of the stochastic process of insertion of non-templated nucleotides.
  • the number of nucleotides inserted into a given junction varied according to a probability distribution, which was empirically determined from the data.
  • the sequences with low numbers of insertions were selected from a small number of possibilities and were likely to be created in multiple people.
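  • The "small number of possibilities" follows from simple counting: a run of n non-templated nucleotides can take 4^n forms, so short inserts leave few alternatives. The sketch below counts the possible inserts at a single junction up to a maximum length. It ignores deletions and treats all inserts as equally likely, which is a simplification rather than the empirical distribution mentioned above.

```python
def possible_insertions(max_length):
    """Number of distinct insert sequences of length 0..max_length at one junction."""
    return sum(4 ** n for n in range(max_length + 1))


# Inserts of fewer than six nucleotides, i.e. lengths 0 through 5.
few = possible_insertions(5)
# For comparison, allowing up to 15 inserted nucleotides.
many = possible_insertions(15)
```

    With at most five inserted nucleotides there are 1,365 possible inserts per junction, so independent rearrangements in different people can plausibly land on the same sequence.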
  • the process of β-chain rearrangement occurs during T cell development, before the functional αβ receptor is displayed on the surface of the T cell, prior to thymic selection.
  • The large shared overlap of TCR sequences suggested that multiple people were able to respond to the same antigen with identical (or very similar) T cell clones. These public responses are believed to be common, as supported by the EBV example.
  • the clinical application is the diagnostic utility of these overlapping TCR sequences. With the technology presented, one can readily detect the repertoire of expanded clones in the memory compartment. A set of "public" TCR clones induced by a particular antigen can be used to diagnose a disease state associated with the antigen.
  • the HLA variability is a consideration in the application of such a diagnostic, but many diseases, such as Type 1 Diabetes are strongly associated with particular HLA alleles.
  • biomarkers and methods of the present invention allow one of skill in the art to identify, diagnose, or otherwise assess subjects who do not exhibit any symptoms of exposure to a pathogen, or initiation of an autoimmune disease.
  • a subject may not have clinically symptomatic diabetes, in particular Type I diabetes, but nonetheless may be at risk for developing diabetes or experiencing symptoms characteristic of a diabetic condition.
  • Other autoimmune diseases include, but are not limited to: multiple sclerosis, rheumatoid arthritis, scleroderma, Primary CNS Vasculitis, Rasmussen's Encephalitis, Autoimmune Peripheral Neuropathy, Autoimmune Cerebellar Degeneration, Gait Ataxia with Late Age Onset Polyneuropathy (GALOP), Stiff Person Syndrome, Chronic Inflammatory Demyelinating Polyneuropathy, Myasthenia Gravis, Lambert Eaton Myasthenic Syndrome, HTLV-1-Associated Myelopathy (HAM) / Tropical Spastic Paraparesis (TSP), Opsoclonus / Myoclonus (Anti-Ri), Neuromyelitis Optica, Graves' disease, Hashimoto's thyroiditis, Celiac disease, Crohn's disease, Ulcerative colitis, Systemic Lupus Erythematosus, Goodpasture's syndrome, Wegener's granulomatosis, Polymyalgia Rheumatica, Guillain-Barre syndrome, Addison's disease, Ankylosing Spondylitis, and Psoriasis.
  • Infectious diseases that might generate useful biomarkers for exposure status include bacterial infections such as: Escherichia coli, Haemophilus influenzae, Actinomycosis, Clostridium, Staphylococcus, Streptococcus; viral infections including: Hepatitis A, Hepatitis B, Hepatitis C, Hantavirus, Dengue Fever, Herpes Simplex Virus, Herpes Zoster, Cytomegalovirus (CMV), Epstein-Barr virus, Ebola virus, Marburg Virus, SARS; fungal infections including: Aspergillosis, Blastomycosis, Candidiasis, Coccidioidomycosis, Cryptococcosis, Histoplasmosis, Mucormycosis, Paracoccidioidomycosis, Sporotrichosis; and parasitic infections including: Amebiasis, Amebic Infections, Ascariasis, Babesiosis, Cryptosporidiosis, Dracunculiasis, Giardiasis, Hookworm Infection, Leishmaniasis, Malaria, Microsporidiosis, Onchocerciasis, Pinworm Infection, Schistosomiasis, Tapeworm Infection, Toxocariasis, Toxoplasmosis, Trichinosis, Whipworm Infection.
  • the present claims embody methods wherein other non-specific inflammatory diseases or immune dysfunction syndromes can include useful disease-specific antigen receptor sequences, such as: Gouty arthritis,
  • Identifying a subject exposed to a pathogen, particularly a pathogenic antigen enables the selection and initiation of various therapeutic interventions or treatment regimens in order to delay, reduce or prevent that subject's onset of a disease state.
  • the present invention can also be used to screen subject populations in any number of settings.
  • a health maintenance organization, public health entity or school health program can screen a group of subjects to identify those requiring interventions, as described above, or for the collection of epidemiological data.
  • Tests to measure biomarkers and biomarker panels can be implemented on a wide variety of diagnostic test systems.
  • Amplification of a selected nucleic acid sequence may be carried out by a number of suitable methods. See generally Kwoh et al., Am. Biotechnol. Lab. 8:14 (1990). Numerous amplification techniques have been described and can be readily adapted to suit particular needs of a person of ordinary skill. Non-limiting examples of amplification techniques include polymerase chain reaction (PCR), ligase chain reaction (LCR), strand displacement amplification (SDA), transcription-based amplification, the Qβ replicase system and NASBA, all known to those skilled in the art.
  • Useful assays include, for example, an enzyme immune assay (EIA) such as enzyme-linked immunosorbent assay (ELISA), a radioimmune assay (RIA), a Western blot assay, or a slot blot assay (see, e.g., Diamandis, Immunoassay (Academic Press, Inc. 1996); U.S. Pat. Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168).
  • the subjects were positive for Epstein Barr Virus.
  • four hundred four (404) biomarkers were identified as being "public" across subjects who have Multiple Sclerosis (MS) or exhibit symptoms characteristic of MS.
  • Six hundred eighty-nine (689) biomarkers were also identified as being "public" in subjects who have Type 1 diabetes, or who exhibit symptoms characteristic of diabetes, including, for example, insulin resistance or altered beta cell function.
  • Example 1 Sample acquisition, PBMC isolation, FACS sorting and genomic DNA extraction
  • the T-lymphocytes were flow sorted into four compartments for each subject:
  • CD8+CD45RO+/- and CD4+CD45RO+/-. The characterization of lymphocytes used the following conjugated anti-human antibodies: CD4 FITC (clone M-T466, Miltenyi Biotec), CD8 PE (clone RPA-T8, BD Biosciences), CD45RO ECD (clone UCHL-1, Beckman Coulter), and CD45RO APC (clone UCHL-1, BD Biosciences). Staining of total PBMCs was done with the appropriate combination of antibodies for 20 minutes at 4°C, and stained cells were washed once before analysis. Lymphocyte subsets were isolated by FACS sorting in the BD FACSAria™ cell-sorting system (BD Biosciences). Data were analyzed with FlowJo software (Treestar Inc.).
  • Total genomic DNA was extracted from sorted cells using the QIAamp ® DNA blood Mini Kit (QIAGEN ® ). The approximate mass of a single haploid genome is 3 pg. In order to sample millions of rearranged TCRB in each T cell compartment, 6 to 27 micrograms of template DNA from each compartment was isolated, as shown in Table 1.
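The relationship between template mass and the number of genomes sampled can be checked with a short calculation. The sketch below is illustrative only: it uses the 3 pg per haploid genome figure stated in the text, and the function name is a hypothetical helper, not part of the described method. It counts haploid genome equivalents only and makes no claim about how many of them carry a rearranged TCRB locus.

```python
# Approximate mass of one haploid human genome in picograms (figure from the text).
PG_PER_HAPLOID_GENOME = 3.0


def haploid_genomes(template_micrograms: float) -> float:
    """Return the approximate number of haploid genome equivalents in a DNA template."""
    picograms = template_micrograms * 1e6  # 1 microgram = 1e6 picograms
    return picograms / PG_PER_HAPLOID_GENOME


# The two ends of the 6 to 27 microgram range quoted in the text.
for mass in (6.0, 27.0):
    print(f"{mass} ug -> {haploid_genomes(mass):.1e} haploid genome equivalents")
```

Under these assumptions, 6 to 27 micrograms corresponds to roughly 2 to 9 million haploid genome equivalents, which is consistent with the stated goal of sampling millions of rearranged loci.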
  • TCR β chain spectratyping was performed by methods known to those skilled in the art. Complementary DNA was synthesized from RNA extracted from sorted T cell populations and used as template for multiplex PCR amplification of the rearranged TCR β chain CDR3 region. Each multiplex reaction contained a 6-FAM-labeled antisense primer specific for the TCR β chain constant region, and two to five TCR β chain variable (TRBV) gene-specific sense primers. All 23 functional Vβ families were studied.
  • PCR reactions were carried out on a Hybaid PCR Express thermal cycler (Hybaid, Ashford, UK) under the following cycling conditions: 1 cycle at 95°C for 6 minutes, 40 cycles at 94°C for 30 seconds, 58°C for 30 seconds, and 72°C for 40 seconds, followed by 1 cycle at 72°C for 10 minutes.
  • Each reaction contained cDNA template, 500 μM dNTPs, 2 mM MgCl2 and 1 unit of AmpliTaq Gold DNA polymerase (Perkin Elmer) in
  • fluorescence intensity vs. time which was converted to a distribution of fluorescence intensity vs. length by comparison with the fluorescence intensity trace of a reference sample containing known size standards.
  • the multiplex PCR system uses 45 forward primers (SEQ ID NOS: 1-45), each specific to a functional TCR Vβ segment, and 13 reverse primers (SEQ ID NOS: 46-57), each specific to a TCR Jβ segment.
  • Primer design 45 forward PCR primers (SEQ ID NOS: 1-45) complementary to each of the 48 functional Variable segments, and 13 reverse PCR primers (SEQ ID NOS: 46-57) complementary to each of the 13 functional Joining genes (TRBJ) from the TRB locus, as listed in the international ImMunoGeneTics information system® (See, Lefranc, M.-P. et al, Nucleic Acids Research, 27, 209-212 (1999); Ruiz, M.
  • the forward primers are modified at the 5' end with the universal forward primer sequence compatible with the GA2 cluster station solid-phase PCR.
  • all of the reverse primers are modified with the GA2 universal reverse primer sequence.
  • the information used to assign the J and V segment of a sequence read was entirely contained within the amplified sequence, and did not rely upon the identity of the PCR primers.
  • the sequencing oligonucleotides were designed such that promiscuous priming of a sequencing reaction for one J segment by an oligonucleotide specific to another J segment would generate sequence data starting at exactly the same nucleotide as sequence data from the correct sequencing oligo. In this way, promiscuous annealing of the sequencing oligonucleotides should not impact the quality of the sequence data generated.
  • the average length of the CDR3 region, defined following convention as the nucleotides between the second conserved cysteine of the V segment and the conserved phenylalanine of the J segment, is 35 +/- 3, so sequences starting from our Jβ segment tag nearly always capture the complete VNDNJ junction in a 50 bp read.
  • TCR Jβ gene segments are roughly 50 bp in length.
  • PCR primers that anneal and extend to mismatched sequences are referred to as promiscuous primers.
  • the design addressed the risk of promiscuous priming in the context of multiplex PCR, especially in the context of a gene family, the TCR Jβ reverse PCR primers to minimize overlap with the sequencing
  • the 13 TCR Jβ reverse primers are anchored at the 3' end on the consensus splice site motif, with minimal overlap of the sequencing primers.
  • the TCR Jβ primers were designed for a consistent annealing temperature (58°C in 50 mM salt) using the OligoCalc program under default parameters (Kibbe, Nucleic Acids Res. 35 (Suppl. 2): W43-46 (2007)).
  • the 45 TCR Vβ forward primers were designed to anneal to the Vβ segments in a region of relatively strong sequence conservation between Vβ segments, for two express purposes. First, maximizing the conservation of sequence among these primers minimizes the potential for differential annealing properties of each primer. Second, the primers were chosen such that the amplified region between V and J primers will contain sufficient TCR Vβ sequence information to identify the specific Vβ gene segment used. This obviates the risk of erroneous TCR Vβ gene segment assignment, in the event of promiscuous priming by the TCR Vβ primers. TCR Vβ forward primers were designed for all known non-pseudogenes in the TCRβ locus.
  • the total PCR product for a successfully rearranged TCRβ CDR3 region using this system was expected to be approximately 200 bp long.
  • Genomic templates were PCR amplified using an equimolar pool of the 45 TCR Vβ F primers (the "VF pool") and an equimolar pool of the 13 TCR Jβ R primers (the "JR pool").
  • 50 μl PCR reactions were set up at 1.0 μM VF pool (22 nM for each unique TCR Vβ F primer), 1.0 μM JR pool (77 nM for each unique TCR Jβ R primer), 1X QIAGEN Multiplex PCR master mix (QIAGEN part number 206145), 10% Q-solution (QIAGEN), and 16 ng/μl gDNA.
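The per-primer concentrations quoted for the two equimolar pools follow from dividing the pool concentration by the number of primers. The sketch below assumes the pool concentration is 1.0 micromolar (the unit symbol is damaged in the source, and micromolar is the reading under which the quoted 22 nM and 77 nM figures are reproduced). The function name is a hypothetical helper.

```python
def per_primer_nanomolar(pool_micromolar: float, n_primers: int) -> float:
    """Concentration of each primer (nM) in an equimolar pool of n_primers primers."""
    return pool_micromolar * 1000.0 / n_primers  # 1 micromolar = 1000 nanomolar


# 45 forward (V) primers and 13 reverse (J) primers, as stated in the text.
vf_each = per_primer_nanomolar(1.0, 45)
jr_each = per_primer_nanomolar(1.0, 13)
print(f"VF pool: {vf_each:.1f} nM per primer")
print(f"JR pool: {jr_each:.1f} nM per primer")
```

Rounded to the nearest integer, these values are 22 nM and 77 nM, matching the figures in the reaction setup.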
  • thermal cycling was performed in a PCR Express thermal cycler (Hybaid, Ashford, UK) under the following conditions: 1 cycle at 95°C for 15 minutes, 25 to 40 cycles at 94°C for 30 seconds, 59°C for 30 seconds and 72°C for 1 minute, followed by one cycle at 72°C for 10 minutes. 12-20 wells of PCR were performed for each library, in order to sample hundreds of thousands to millions of rearranged TCRβ CDR3 loci.
  • Example 4 Estimating relative CDR3 sequence abundance in PCR pools and blood samples
  • the underlying distribution of T-cell sequences in the blood was reconstructed from the sequence data output by the GA experiments.
  • the experimental procedure used three steps: 1) flow sorting T-cells drawn from peripheral blood, 2) PCR amplification, and 3) sequencing.
  • the analysis worked backwards from the observed data of step 3) to first determine the ratio of sequences in the PCR distribution of step 2) before estimating the true distribution of clonotype sequences in blood.
  • the mathematical solution devised is as follows: for a total number of TCRβ "species" or clonotypes, S, a sequencing experiment observes x_s copies of sequence s. For all of the unobserved clonotypes, x_s equals 0, and each TCR clonotype is "captured" in a blood draw according to a Poisson process with parameter λ_s. If the number of T cell genomes sequenced in the first experiment is 1, then the number sequenced in a second experiment is t (where t = 1 is the exact same experiment).
  • G(λ) is the empirical distribution function of the parameters λ_1, ..., λ_S
  • n_x is the number of clonotypes sequenced exactly x times
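As an illustration of how counts of clonotypes seen exactly x times can be turned into an estimate of unseen diversity, the sketch below uses the classical Good-Toulmin estimator for the unseen species problem under a Poisson capture model. This is an assumption on our part: the source does not reproduce its own formula at this point, so the code should be read as a textbook stand-in rather than the inventors' exact calculation. The names `n_x` and `t` mirror the text (number of clonotypes observed exactly x times; size of a second experiment relative to the first).

```python
def expected_new_clonotypes(n_x: dict[int, int], t: float = 1.0) -> float:
    """Good-Toulmin estimate of previously unseen clonotypes in a second sample.

    n_x maps an observation count x to the number of clonotypes seen exactly
    x times in the first experiment; t is the size of the second experiment
    relative to the first (t = 1.0 repeats the same experiment).
    """
    # Alternating series: sum over x of (-1)^(x+1) * t^x * n_x
    return sum((-1) ** (x + 1) * (t ** x) * count for x, count in n_x.items())


# Hypothetical toy counts: 3 clonotypes seen once, 1 clonotype seen twice.
toy_counts = {1: 3, 2: 1}
print(expected_new_clonotypes(toy_counts, t=1.0))
```

The alternating series is known to become numerically unstable when t is much larger than 1, so extrapolation to the whole repertoire generally needs a smoothed or fitted form of the distribution rather than this raw sum.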
  • the CDR3 region in each TCR β chain included sequences derived from one of the 13 Jβ gene segments. Analysis of the CDR3 sequences in the four different T cell populations from the two donors demonstrated that the fraction of total sequences which incorporated sequences derived from the 13 different Jβ gene segments varied more than 20-fold.
  • the gene segment usage pattern observed in the four different T cell populations was relatively constant within a given donor.
  • the Jβ usage patterns observed in two donors, which were inferred from analysis of genomic DNA from T cells sequenced using the GA, are qualitatively similar to those observed in T cells from umbilical cord blood and from healthy adult donors, both of which were inferred from analysis of cDNA from T cells sequenced using exhaustive capillary-based techniques.
  • the TCR β chain sequences were translated to amino acids and then compared pairwise between our two donors. Many thousands of exact sequence matches were found. For example, comparing the CD4+CD45RO- sub-compartments, approximately 8000 of the 250,000 unique amino acid sequences from donor 1 were exact matches to donor 2. Many of these matching sequences at the amino acid level have multiple nucleotide differences at third codon positions. Following the example mentioned above, 1500/8000 identical amino acid matches had >5 nucleotide mismatches. Between any two T cell sub-types, we find 4-5% of the unique TCRβ sequences have identical amino acid matches.
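The translate-then-compare step can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it builds the standard genetic code table, translates in-frame nucleotide sequences, and intersects the resulting amino acid sets. The two input sequences are hypothetical and were chosen so that they differ at the nucleotide level but encode the same peptide.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate an in-frame DNA sequence, ignoring any trailing partial codon."""
    usable = len(nucleotides) - len(nucleotides) % 3
    return "".join(CODON_TABLE[nucleotides[i:i + 3]] for i in range(0, usable, 3))


def shared_amino_acid_sequences(donor_a, donor_b) -> set:
    """Amino acid sequences encoded by at least one read in each donor."""
    return {translate(s) for s in donor_a} & {translate(s) for s in donor_b}


# Hypothetical reads: different codons, identical encoded peptide (CASS).
donor_1 = ["TGTGCCAGCAGT"]
donor_2 = ["TGCGCTAGCTCC"]
print(shared_amino_acid_sequences(donor_1, donor_2))
```

Comparing at the amino acid level rather than the nucleotide level is what allows matches such as this one, where synonymous codons differ between donors, to be counted as shared.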
  • the number of unique CDR3 sequences observed in each lane of the GA flow cell routinely exceeded 1 x 10^5.
  • the total number of unique TCRβ CDR3 sequences in the entire T cell repertoire of each individual is likely to be far higher. Estimating the number of unique sequences in the entire repertoire, therefore, requires an estimate of the number of additional unique CDR3 sequences that exist in the blood but were not observed in the sample.
  • the estimation of total species diversity in a large, complex population using measurements of the species diversity present in a finite sample has historically been called the "unseen species problem", an analytic solution to which was developed over 60 years ago. The solution starts with determining the number of new species, or TCRβ CDR3 sequences, that are observed if the experiment is repeated, i.e., if the GA
  • the total TCRβ diversity in these populations is between 4 and 5 million unique sequences in the peripheral blood.
  • the CD45RO+, or antigen-experienced, compartment constitutes approximately 1.5 million of these sequences. This is at least an order of magnitude larger than expected. This discrepancy is likely attributable to the large number of these sequences observed at low relative frequency, which could only be detected through deep sequencing.
  • the estimated TCRβ CDR3 repertoire sizes of each compartment in the two donors are within 20% of each other.
  • Example 10 Sample acquisition, PBMC isolation, FACS sorting and genomic DNA extraction
  • 100 blood samples of 200 ml whole blood from each individual are collected and antibody titers are measured at several laboratories.
  • DNA is amplified and sequenced as described in Example 1.
  • HLA-A, -B, -C typing data
  • EBV: Epstein-Barr virus
  • Donor characteristics: “+” indicates detectable serum IgG against the viral capsid antigen of EBV.
  • Donors 1 and 3 were full siblings and the daughters of donor 2; the other four donors were unrelated.
  • Table 3 Donor characteristics
  • PBMCs are isolated from the 50 ml whole blood aliquots on a Ficoll gradient. Cryopreserved PBMCs are prepared and can be screened for tetramer positive T cells for a specific antigen. Buffy coat DNA is isolated from the second syringe, for use in screening for specific sequences in the naive cells.
  • the third syringe of whole blood is processed on the AutoMACS magnetic bead cell separator to isolate approximately 10^6 memory B cells (CD19+, CD27+), 10^6 memory CD4 T cells (CD4+, CD45RO+) and 10^6 memory CD8 T cells (CD8+, CD45RO+).
  • DNA is prepared from each of the three cell subpopulations: B cell and CD4 T cell DNAs are sequenced.
  • HLA typing: Because CD4 sequences are included in the adaptive profile, high-resolution typing at the HLA-DRB1 gene is performed to identify carriers of high-frequency alleles at this locus, which may constrain the CD4 memory sequences.
  • the IgM class is typically associated with very recent exposures.
  • a panel of pathogenic exposures with seroprevalence between 10% and 90% is expected. These are detailed in Table 11. The panel encompasses 13 pathogens, and 40 serological subtypes.
  • the potential β chain repertoire in humans has been calculated as 10^11 possible amino acid sequences.
  • Much of the predicted TCRβ diversity derives from non-templated nucleotide insertions at the V-D and D-J junctions. The prediction assumes six insertions at each of the two junctions, for a total of 12 non-templated nucleotides.
  • the cumulative distribution of TCR sequences is plotted as a function of number of insertions for the naive and memory compartments, respectively, from seven donors. Although examples of beta chains with higher numbers of insertions were observed, these were rare; less than 5% of human TCRβ chain sequences had 12 or more insertions.
  • TCRβ chain sequences had either zero or one inserted nucleotide, and almost half of the TCRβ sequences in each of our donors had five or fewer total insertions, including both junctions. This shows that a large fraction of the TCRβ repertoire is sampled from a small fraction of the 10^11 possible sequences.
  • the model was governed by a set of rules that were empirically determined from the observed TCRβ CDR3 sequence data from the seven donors.
  • the model allowed all possible V-D-J combinations, deletion of up to 10 nucleotides from the 3' end of the Vβ segment and from the 5' end of the Jβ segment, deletion of nucleotides from the 5' and 3' ends, up to complete deletion, of the Dβ segment, and up to 7 total junctional insertions.
  • the total length of the CDR3 sequence, defined as the interval from the codon for the conserved cysteine at the 3' end of the Vβ gene segment to the codon for the conserved phenylalanine in the 5' portion of the Jβ gene segment, was constrained such that it could encode from 9 to 23 amino acid residues.
  • Generation of the CDR3 amino acid sequences containing a total of 7 junctional insertions was performed on a 50-node Linux cluster with 8 CPUs per node, and required 24 hours to complete.
  • Each CDR3 nucleotide sequence was translated into the corresponding amino acid sequence and then stored in a binary tree organized by alphabet. For each new sequence not found in the tree, a new leaf was created with that sequence. For each sequence already found in the tree, the count for the corresponding leaf was incremented by one.
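The counting structure described above can be sketched as a binary search tree keyed on the amino acid string, with each node holding an occurrence count. This is a minimal illustration under our own assumptions: the class and method names are hypothetical, the tree is unbalanced, and the example sequences are placeholders rather than data from the study.

```python
class Node:
    """One distinct amino acid sequence and the number of times it was seen."""

    def __init__(self, sequence: str):
        self.sequence = sequence
        self.count = 1
        self.left = None   # sequences that sort before this one
        self.right = None  # sequences that sort after this one


class SequenceTree:
    """Binary search tree ordered alphabetically by sequence."""

    def __init__(self):
        self.root = None

    def add(self, sequence: str) -> None:
        """Insert a new leaf, or increment the count of an existing sequence."""
        if self.root is None:
            self.root = Node(sequence)
            return
        node = self.root
        while True:
            if sequence == node.sequence:
                node.count += 1
                return
            side = "left" if sequence < node.sequence else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(sequence))
                return
            node = child

    def count(self, sequence: str) -> int:
        """Return how many times the sequence was added (0 if never seen)."""
        node = self.root
        while node is not None:
            if sequence == node.sequence:
                return node.count
            node = node.left if sequence < node.sequence else node.right
        return 0


tree = SequenceTree()
for seq in ["CASSL", "CASSF", "CASSL"]:  # placeholder sequences
    tree.add(seq)
print(tree.count("CASSL"), tree.count("CASSF"), tree.count("CASSQ"))
```

In everyday Python a `collections.Counter` would give the same counts with less code; the explicit tree is shown only because it mirrors the structure named in the text.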
  • Much of the predicted diversity in the TCRβ CDR3 repertoire is generated by non-templated nucleotide insertions at the Vβ-Dβ and Dβ-Jβ junctions.
  • the cumulative distribution of TCRβ CDR3 sequences observed in the CD8+ naive and memory compartments, respectively, of the seven donors as a function of the number of junctional insertions demonstrates that sequences with 12 or more insertions were observed, but constitute only 10% of the total. In contrast, more than 10% of the observed sequences had zero, one, or two insertions, and 50% of the sequences in each donor had six or fewer total insertions at the two junctions.
  • the expected overlap was approximately 2 sequences, substantially less than the observed > 10,000.
  • the expected overlap O is the sum of the expected overlaps from the sequences with different numbers of insertions labeled by k.
  • the outcome Y is the observed number of overlaps between the n_1 sequences from donor 1 and the n_2 sequences from donor 2.
  • the model parameters (a,b) are chosen to minimize the error between the observed number of overlaps and the regression model using the non-linear least squares method.
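One simple way to picture an expected overlap that is summed over insertion classes k is sketched below. It rests on an assumption that is ours, not the source's: within each class k, both donors are treated as sampling uniformly and independently from a pool of `pool_size_by_k[k]` possible sequences, so the expected number of coincidences in that class is approximately n1_k * n2_k / N_k. The source does not state its per-class formula, and all numbers in the example are hypothetical.

```python
def expected_overlap(n1_by_k: dict, n2_by_k: dict, pool_size_by_k: dict) -> float:
    """Sum over insertion classes k of n1_k * n2_k / N_k (uniform-sampling assumption)."""
    total = 0.0
    for k, pool_size in pool_size_by_k.items():
        total += n1_by_k.get(k, 0) * n2_by_k.get(k, 0) / pool_size
    return total


# Hypothetical counts of sequences with k = 0 or 1 insertions in two donors,
# and hypothetical numbers of possible sequences in each class.
donor_1 = {0: 100, 1: 1000}
donor_2 = {0: 200, 1: 1000}
possible = {0: 10_000, 1: 1_000_000}
print(expected_overlap(donor_1, donor_2, possible))
```

The approximation holds only when each sample is small relative to the pool; real repertoires are far from uniform, which is one reason an observed overlap can greatly exceed this kind of naive expectation.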
  • A previously identified lower bound of 1 x 10^6 for the number of unique TCR CDR3 amino acid sequences in the naive CD8+ T cell compartment of a healthy adult was applied (Robins, supra).
  • Vβ-Jβ utilization was surprisingly consistent between individuals, especially for the rare Vβ-Jβ pairs, as reflected by the fact that the variance in Vβ-Jβ utilization was proportional to mean utilization.
  • a fraction of the TCRβ CDR3 sequences in the genomic DNA from the naive and memory CD8+ T cells of each of the seven donors was predicted to generate out-of-frame TCRβ transcripts that do not encode functional TCRβ chains (Table 5).
  • Vβ-Jβ utilization in the out-of-frame CDR3 sequences was highly non-uniform and qualitatively similar to that observed for in-frame transcripts.
  • the variability of Vβ-Jβ utilization in the out-of-frame CDR3 sequences cannot be attributed to positive or negative selection in the thymus of T cells bearing specific receptors, because these sequences do not generate proteins that participate in the selection process.
  • the similarity in the utilization of specific Vβ-Jβ combinations in out-of-frame, nonfunctional and in-frame, functional TCRβ transcripts therefore suggests that the variability in Vβ-Jβ utilization in both sets of sequences is attributable, at least in part, to mechanisms that operate before the stage of thymic selection.
  • Vβ-Dβ-Jβ combinations suggest that rearrangement between Vβ and Dβ gene segments is random, while that between Dβ and Jβ gene segments is not.
  • the apparent non-random association between specific Dβ and Jβ gene segments is likely attributable to the organization of the TCRβ locus, in which Dβ1 lies 5' of all 13 Jβ segments, while Dβ2 lies 3' of the 6 members of the Jβ1 cluster but 5' of the 7 members of the Jβ2 cluster.
  • the Dβ1 segment is observed at roughly equal frequency with all 13 Jβ's, while Dβ2 is much more frequently paired with members of the Jβ2 compared with the Jβ1 family.
  • Dβ2 is observed with members of the Jβ1 family about a third (0.30 +/- 0.05) as often as would be expected if the pairing were random.
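A ratio of this kind compares the observed frequency of a D-J pairing with the frequency expected if D and J choices were independent, i.e. the product of the two marginal frequencies. The sketch below shows the arithmetic on hypothetical counts; the function name, the segment labels used as dictionary keys, and the numbers are all illustrative assumptions and do not reproduce the study's data.

```python
def pairing_ratio(pair_counts: dict, d_segment: str, j_family: str) -> float:
    """Observed frequency of a (D, J) pairing divided by its expectation under independence."""
    total = sum(pair_counts.values())
    d_total = sum(n for (d, _), n in pair_counts.items() if d == d_segment)
    j_total = sum(n for (_, j), n in pair_counts.items() if j == j_family)
    observed = pair_counts.get((d_segment, j_family), 0) / total
    expected = (d_total / total) * (j_total / total)  # product of marginal frequencies
    return observed / expected


# Hypothetical counts of rearrangements by (D segment, J family).
counts = {("D1", "J1"): 30, ("D1", "J2"): 30, ("D2", "J1"): 6, ("D2", "J2"): 34}
print(round(pairing_ratio(counts, "D2", "J1"), 2))
```

A ratio well below 1, as in this toy example, indicates that the pairing occurs less often than random association would predict.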
  • EBV Epstein-Barr virus
  • β CDR3 sequence CASSLGQAYEQYF derived from V#, D# and J# with # inserted nucleotides (Argaet et al., J Exp Med 180:2335-2340, 1994, hereby incorporated by reference in its entirety).
  • A2 specific clonotype sequences found with high frequency were:
  • a B8 specific sequence found with high frequency was CASSLGQAYEQYF (SEQ ID NO: 1167). Of the seven donors, this TCRβ clonotype was observed in the naive CD8+ compartment of two individuals, but in the memory compartment of only one donor, where it accounted for over 1% of all clonotype sequences.
  • a comparison of our donors' sequences with results from a study that identified 50 TCRs that interact with known EBV epitopes presented by HLA-A2 tetramers was performed. Four individuals matched at least one of these sequences in their naive compartments.
  • mice deficient for terminal deoxynucleotidyl transferase, the enzyme that catalyzes the template-independent insertion of nucleotides at the junctions, have 10-fold less diversity in their TCR CDR3 repertoires, with few insertions, yet these mice appear healthy, make efficient and specific immune responses, and display no increased susceptibility to infection (Gilfillan et al., Eur J Immunol 25, 3115-3122 (1995); Cabaniols et al., J Exp Med 194, 1385-1390 (2001)). Sequences with fewer insertions and deletions have receptor sequences closer to germline.
  • EBV Epstein Barr virus
  • HLA-A*0201- and HLA-B*0801-associated, EBV-specific CDR3 sequences only in the three donors expressing one of the associated HLA alleles was statistically significant (P 0.0002 by two-tailed Fisher exact test).
  • Table 6 presents the probability of finding a false positive as a function of signal size. For the expected range of signal, a proposed study is sufficiently powered to limit false positives and to detect any sequence that is found in 20% or more of the cases. Note that Table 6 is the marginal calculation for each disease. Taking all the diseases together, a multiple hypothesis test correction is needed. The Benjamini-Hochberg FDR analysis is sufficient.
  • Type 1 diabetes associates strongly with the class 2 HLA-DRB1*03/04 genotype, so we sorted for CD4 memory (CD4+, CD45RO+) cells, and screened the CD4 memory cells for public sequences shared between three cases of T1D. These "public" TCR sequences were found in fewer than 5 of 10 HLA-matched controls without T1D. DNA from all 13 donors was sequenced for both memory and naive genotypes. The clonotype sequences that distinguish the T1D cases from the controls are shown as SEQ ID. NOS: 75-763.
  • CD4 memory cells were sorted from three MS cases carrying the DRB1*1501 allele, as well as three HLA-matched controls, to identify public sequences that might be enriched among MS cases.
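For readers unfamiliar with the correction named above, the sketch below implements the standard Benjamini-Hochberg step-up procedure in plain Python. It is a generic illustration: the p-values are hypothetical, the 0.05 level is an assumed default, and nothing here reproduces Table 6 or the study's own calculations.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at false discovery rate alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, index in enumerate(order, start=1):
        # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m.
        if p_values[index] <= alpha * rank / m:
            largest_passing_rank = rank
    # Reject every hypothesis ranked at or below that k.
    return sorted(order[:largest_passing_rank])


# Hypothetical per-sequence p-values from marginal tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p_values))
```

Because the procedure controls the expected proportion of false discoveries rather than the chance of any single false positive, it is less conservative than a Bonferroni correction when many candidate sequences are tested together.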
  • a set of "public" T cells were selected as TCR sequences found in all three cases and no more than one of the three controls.
  • DNA from all donors was sequenced for both memory and naive genotypes.
  • the clonotype sequences that distinguish the MS cases from the controls are shown as SEQ ID. NOS: 764-1166.
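The selection rule described for the MS cases (present in every case, present in no more than one control) maps naturally onto set operations. The sketch below is a minimal illustration with hypothetical placeholder identifiers standing in for clonotype sequences; the function name and the `max_control_hits` parameter are our own labels, with the default of 1 taken from the rule stated in the text.

```python
def public_sequences(cases, controls, max_control_hits=1):
    """Sequences found in all case repertoires and in at most max_control_hits controls."""
    case_sets = [set(c) for c in cases]
    control_sets = [set(c) for c in controls]
    shared_by_all_cases = set.intersection(*case_sets)
    return {
        seq
        for seq in shared_by_all_cases
        if sum(seq in control for control in control_sets) <= max_control_hits
    }


# Hypothetical repertoires for three cases and three HLA-matched controls.
cases = [
    {"SEQ_A", "SEQ_B", "SEQ_C"},
    {"SEQ_A", "SEQ_B", "SEQ_C"},
    {"SEQ_A", "SEQ_B", "SEQ_C", "SEQ_D"},
]
controls = [{"SEQ_B", "SEQ_C"}, {"SEQ_B"}, set()]
print(sorted(public_sequences(cases, controls)))
```

Here SEQ_B is dropped because it appears in two controls, SEQ_D because it is missing from two cases, while SEQ_A and SEQ_C pass the filter.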

Abstract

A method of detecting disease or risk of disease in a subject using a biomarker or panel of biomarkers identified from exposure-specific clonotype sequences is described. The method for identifying the exposure-specific clonotype sequences from a subset of B cells or T cells from an immune cell sample is also described.

Description

USE OF TCR CLONOTYPES AS BIOMARKERS FOR DISEASE
REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application No. 61/308,261 filed February 25, 2010, which is hereby incorporated by reference in its entirety.
BACKGROUND
Identifying exposure to specific environmental agents, ranging from bacteria to chemical carcinogens, is important for detecting, treating and monitoring a variety of human diseases. As part of the immune response to pathogens, B cells and T cells in the adaptive immune system form long-lasting memory responses to protect against future exposures, and these responses can be measured. The response specific to antigens is based on the highly polymorphic receptors encoded by B cells (immunoglobulins (IG)) and T cells (T cell receptors (TCR)). TCRs are heterodimeric proteins consisting of either an α chain and a β chain or a γ chain and a δ chain. A similar structure is present in B cell receptors, where IGs are also heterodimers, consisting of one light and one heavy chain.
Diversity in lymphocyte antigen receptors is generated by somatic rearrangement of TCR and IG genes, and is concentrated within the third complementarity-determining region (CDR3) of each chain of the receptor heterodimer. In TCRB and IGH, the nucleotide sequences that encode the CDR3 regions are noncontiguous variable (V), diversity (D), and joining (J) region gene segments, while only V and J segments are used for the TCRA and IGL chains. The existence of multiple V, D, and J gene segments in germline DNA permits substantial combinatorial diversity in receptor composition. Receptor diversity is further augmented by the deletion of nucleotides adjacent to the recombination signal sequences (RSS) of the V, D, and J segments, and template-independent insertion of nucleotides at the V-D, D-J, and V-J junctions.
Infectious disease diagnostics presently rely upon some similarity of symptoms as a basis to screen for exposure to single pathogens or small panels of pathogens. Furthermore, in the past, discoveries of associations between exposure and disease state have required prior hypotheses as to the identity of the exposure. The ability to screen for environmental exposures that correlate with disease states without pre-specifying the exposure would be invaluable to public health efforts by improving measurements of environmental exposures. The sequencing depth of capillary-based technologies does not make a comprehensive comparison of memory IG and TCR variable sequences between individuals feasible. The present invention provides methods and compositions for the identification of sequences specific to pathogenic exposures and other uses that should be apparent to those skilled in the art from the teachings herein.
SUMMARY
In one aspect, the present invention provides a method for identifying a biomarker for a disease comprising a) providing isolated polynucleotide sequences from immune cells from a group of patients with the disease; b) performing a nucleotide amplification reaction to produce a set of clonotype sequences; c) identifying clonotype sequences enriched within the group of patients; d) providing isolated polynucleotide sequences from immune cells from a group of normal subjects without the disease, and amplifying the polynucleotides; e) removing sequences present in the normal subject group, which are obtained in step d), from the exposure-specific clonotype sequences, which are obtained in step c).
Included in the method are certain embodiments as described herein. In one embodiment, the polynucleotide sequences are rearranged genomic sequences. In another embodiment, the nucleotide amplification reaction comprises (i) a multiplicity of V-segment primers, where each primer comprises a sequence that is complementary to a single functional V-segment or a small family of V-segments; and (ii) a multiplicity of J-segment primers, where each primer comprises a sequence that is complementary to a J segment; and where the V-segment and J-segment primers amplify a TCR CDR3 region. In another embodiment, the V-segments are selected from the group consisting of Vα, Vβ, Vγ and Vδ, and the J-segments are selected from the group consisting of Jα, Jβ, Jγ and Jδ. In other embodiments, the immune cell samples are selected from the group consisting of CD45RO+, CD45RAint/neg CD8+ T cells and CD45RO- CD45RAhi CD62Lhi CD8+ T cells. In certain embodiments, the T cells share one or more HLA alleles. In another embodiment, the nucleotide amplification reaction comprises (i) a multiplicity of V-segment primers, wherein each primer comprises a sequence that is complementary to a single functional V-segment or a small family of V-segments; and (ii) a multiplicity of J primers, wherein each primer comprises a sequence that is complementary to a J-segment; wherein the V-segment and J-segment primers amplify an IGH, IGL or IGK CDR3 region. In another embodiment, the V-segment primers (forward primers) are anchored at a position between 40 and 60 base pairs 5' of the recombination signal sequence (RSS), within the V segment. Another embodiment can include where the J-segment primers (reverse primers) are at a position about 30 base pairs 3' of the J gene RSS site. Certain other embodiments may include where the exposure-specific clonotype sequences have an insertion of less than six nucleotides.
In other embodiments of the present invention, certain diseases are of importance and include diseases selected from the group consisting of an autoimmune disease, an inflammatory disease, an immune deficiency, a bacterial infection, a viral infection, a fungal infection, or a parasitic infection.
Another embodiment of the present invention provides a method comprising an additional step prior to step (a) of isolating polynucleotide sequences from the immune cell samples. Further embodiments include polynucleotide sequences isolated from an immune cell sample, where the sample is a tissue comprising hematopoietic lineage cells.
The present invention also provides biomarkers produced by the methods described herein. In certain embodiments, the biomarker is a polypeptide encoded by the exposure-specific clonotype sequence. In another embodiment, the polypeptide biomarker is an antibody.
In another aspect, the present invention provides a diagnostic kit for detecting a disease or risk for a disease comprising one or more biomarkers described herein. In certain embodiments, the biomarker is selected from the group consisting of a polynucleotide biomarker, a labeled polypeptide biomarker, and an antibody biomarker. In embodiments of the present invention, the biomarker detects a disease or risk of a disease selected from the group consisting of an autoimmune disease, an inflammatory disease, an immune deficiency, a bacterial infection, a viral infection, a fungal infection, a parasite infection, and a prenatal disease.
In another aspect, the present invention provides a method for detecting a disease or a risk of a disease in a subject comprising: a) selecting a biomarker for the disease; b) detecting the presence of the biomarker in the polypeptide sequences of the subject. In other embodiments, the method further comprises determining that the presence of the biomarker in the polypeptide sequences of the subject is indicative of the disease or the risk of the disease.
In certain embodiments, the selection uses a panel of biomarkers, and (c) comprises determining if the frequency of the exposure-specific clonotype sequence is indicative of disease or risk of disease. In other embodiments, the clonotype sequences are genomic sequences.
It is another aspect of the present invention to provide a method for detecting disease or a risk of disease in a subject comprising (a) providing isolated clonotype sequences from the subject's immune cell sample; (b) detecting the presence of an exposure-specific clonotype sequence using the claimed biomarkers described herein; and (c) determining if the frequency of the exposure-specific clonotype sequence from the subject is at a frequency indicative of disease or a risk of disease. In certain embodiments, the method comprises using a panel of biomarkers and (c) comprises determining if the frequency of the exposure-specific clonotype sequence is indicative of disease or risk of disease.
In another aspect, the present invention provides a method for selecting a biomarker for Type I diabetes or a risk for Type I diabetes comprising selecting a biomarker according to the method described herein, where the exposure-specific clonotype sequences are selected from the group consisting of SEQ ID. NOS: 75 to 763.
In another aspect, the present invention provides a method for detecting diabetes Type I or a risk for diabetes Type I in a subject comprising (a) selecting a disease-specific biomarker from SEQ ID. NOS: 75 to 763, where the biomarker consists of a disease-specific clonotype sequence or a panel of such sequences; (b) measuring the frequency of the clonotype sequence(s) in the subject; and (c) determining if the frequency of the diagnostic clonotype sequences in the subject is more consistent with frequencies observed in disease cases or controls.
In another aspect, the present invention provides a method for detecting multiple sclerosis or a risk for multiple sclerosis in a subject comprising selecting a biomarker according to the methods described herein, where the exposure-specific clonotype sequences are selected from the group consisting of SEQ ID. NOS: 764 to 1166.
In another aspect, the present invention provides a method for detecting multiple sclerosis or a risk for multiple sclerosis in a subject comprising (a) selecting a disease-specific biomarker from SEQ ID. NOS: 764 to 1166, where the biomarker consists of a disease-specific clonotype sequence or a panel of such sequences; (b) measuring the frequency of the clonotype sequence(s) in the subject; and (c) determining if the frequency of the diagnostic clonotype sequences in the subject is more consistent with frequencies observed in disease cases or controls.
In another aspect, the present invention provides a method for selecting a biomarker for Epstein Barr Virus or a risk for Epstein Barr Virus comprising selecting a biomarker according to the method described herein, wherein the exposure-specific clonotype sequences are selected from the group consisting of SEQ ID. NOS: 71 to 74 and 1167.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1: Pairwise shared CD8+ TCR beta amino acid sequences. The T cell repertoires from seven healthy people were compared. Each point on the graph represents a pairwise comparison between two of the seven donors, for a total of 21 comparisons. The Y-axis value is the number of matching TCR beta chains from the naive T cells found in the blood. The X-axis value is the number of matching TCR beta chains from the memory T cells in the blood.
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
The following definitions are intended to assist in understanding the present invention: "Nucleotide" or "nucleotide sequence" as used herein refers to polynucleotides, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), oligonucleotides, fragments generated by the polymerase chain reaction (PCR), and fragments generated by any of ligation, scission, endonuclease action, and exonuclease action. Nucleic acid molecules can be composed of monomers that are naturally occurring nucleotides (such as DNA and RNA), or analogs of naturally occurring nucleotides (e.g., α-enantiomeric forms of naturally occurring nucleotides), or a combination of both, unless otherwise specified. Those skilled in the art will recognize that considerable sequence variation is possible among nucleotide molecules encoding the same amino acids, based on the degeneracy of the genetic code. Modified nucleotides can have alterations in sugar moieties and/or in pyrimidine or purine base moieties.
The term "complement of a nucleic acid molecule" refers to a nucleic acid molecule having a complementary nucleotide sequence and reverse orientation as compared to a reference nucleotide sequence.
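By way of illustration only, the definition above can be sketched in a few lines of Python. The helper name `reverse_complement` is hypothetical, and the sketch assumes an unambiguous DNA alphabet (A, C, G, T); neither is part of the original disclosure.

```python
# Hypothetical helper illustrating the "complement" definition above.
# Assumption: the input uses only the unambiguous DNA bases A, C, G, T;
# any other symbol raises KeyError.
_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the complementary nucleotide sequence in reverse orientation."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))
```

Applying the function twice returns the original sequence, which is a convenient sanity check.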
As used herein, a "primer" defines an oligonucleotide which is capable of annealing to (hybridizing with) a target sequence, thereby creating a double stranded region which can serve as an initiation point for polynucleotide synthesis under suitable conditions.
The term "nucleotide amplification reaction" refers to any suitable procedure that amplifies a specific region of polynucleotides (target) using primers. See generally Kwoh et al, Am. Biotechnol. Lab. 8:14 (1990); Kwoh et al, Proc. Natl. Acad. Sci. USA 86, 1173-1177 (1989); Lizardi et al, BioTechnology 6:1197-1202 (1988); Malek et al, Methods Mol Biol, 25:253-260 (1994); and Sambrook et al, "Molecular Cloning: A Laboratory Manual" (1989).
A "polypeptide" is a polymer of amino acid residues joined by peptide bonds. As used herein, the term refers to any two or more joined amino acid residues and includes peptides and proteins whether produced naturally or synthetically, unless specifically stated otherwise. "Antigen" as used herein denotes a molecule that binds antibodies. As used herein, antigen also includes molecules that can elicit an adaptive immune response, also referred to as "immunogens".
"Immune cells" as used herein denotes cells derived from a pluripotent hematopoietic stem cell and differentiated from a common lymphoid stem cell into cells of the lymphoid cell lineage. Lymphoid lineage cells give rise to B cells and T cells and are located in the lymphoid organs and tissues, which include bone marrow, blood, thymus, spleen, lymph nodes and mucosal lymphoid tissues.
"B cell" as used herein denotes a lymphocyte derived from a B cell progenitor in bone marrow. The receptor on the B cell is referred to as a B cell receptor or "immunoglobulin" (IG) and consists of an effector region and a two component homodimer, each having two heterodimeric chains with an immunoglobulin heavy (IGH) and immunoglobulin light (IGL). The IGH is organized into five classes, Hα, Hδ, Hε, Hγ, and Hμ, and the IGL is organized into two classes, Lκ and Lλ. Naive B cells have not encountered specific antigens and are naive when leaving the bone marrow. Memory B cells upon activation by antigen produce antibodies specific to that antigen.
"T cell" as used herein denotes a lymphocyte that is maintained in the thymus and has either an α:β or a γ:δ heterodimeric receptor. There are Vα, Vβ, Vγ and Vδ, Jα, Jβ, Jγ and Jδ, and Dβ and Dδ loci. Naive T cells have not encountered specific antigens and T cells are naive when leaving the thymus. Naive T cells are identified as CD45RO-, CD45RA+, CD62L+. Memory T cells mediate immunological memory to respond rapidly on re-exposure to the antigen that originally induced their expansion and can be "CD4+" (T helper cells) or "CD8+" (T cytotoxic cells). Memory CD4 T cells are identified as CD4+, CD45RO+ cells and memory CD8 T cells are identified as CD8+, CD45RO+ cells.
"MHC molecule" as used herein denotes a highly polymorphic glycoprotein encoded by major histocompatibility complex (MHC) class I and II genes, and plays a role in presenting antigen to the T cell.
A "detectable label" is a molecule or atom which can be conjugated to a nucleic acid sequence, polypeptide sequence or antibody moiety to produce a molecule useful for diagnosis. Examples of detectable labels include chelators, photoactive agents, radioisotopes, fluorescent agents, paramagnetic ions, or other marker moieties.
All references cited herein are incorporated by reference in their entirety.
A person's ability to defend against specific environmental agents and pathogens, including viruses, bacteria and chemical carcinogens, requires recognizing the pathogen and responding in ways that are both immediate and long-lasting. Recognition is a function of both the innate and adaptive immune systems, but it is the B cells and T cells in the adaptive immune system that form long-lasting memory responses to protect against future exposures to the same pathogen. To achieve this, the adaptive immune system employs several ingenious strategies to generate a repertoire of T- and B-cell antigen receptors with sufficient diversity to recognize the universe of potential pathogens. In T cells, which primarily recognize peptide antigens presented by MHC molecules on the surface of specialized antigen presenting cells, most of this receptor diversity is contained within the third complementarity-determining region (CDR3) of the T cell receptor (TCR) α and β chains. In B cell receptors, antigenic specificity is determined in the third complementarity-determining region of the immunoglobulin heavy (IGH) and immunoglobulin light (IGL) peptides, and is not MHC dependent. Two properties of B and T cells that make IGs and TCRs very enticing biomarkers are (1) clonal expansion upon recognition of an antigen and (2) the causal association of the cells with disease resistance (for pathogens) or disease pathology (in autoimmune disease). First, upon binding to a target antigen, the immune system amplifies these biomarkers through rapid cell division, allowing these receptors to be readily detectable even if the target is rare (e.g., an early stage tumor). Second, in autoimmune diseases, the immune cells useful as biomarkers might additionally be a causative component of the disease and, therefore, a drug target.
For example, dysregulated T cells specifically attacking self tissues are thought to play a major role in multiple sclerosis, Type I diabetes, and rheumatoid arthritis.
Published studies of the human TCRβ CDR3 sequence repertoire with high-throughput sequencing technologies have used a variety of pre-sequencing molecular strategies, sequencing platforms, and analytical approaches to assess the repertoire realized in single individuals or a pool of 550 individuals (Robins, et al, Blood 114, 4099-4107 (2009); Venturi, et al, Nat Rev Immunol 8, 231-238 (2008); Wang, et al, Proc Natl Acad Sci USA 107, 1518-1523 (2010)). Due to the different molecular strategies employed in the three studies, the number of unique TCRβ CDR3 nucleotide sequences observed varied widely, from a low of ~34,000, derived from analysis of 40.5 million primary sequence reads (Freeman, et al. Genome Res 19, 1817-1824 (2009)), to more than 500,000 unique sequences (Robins, supra). The present invention is based on assessing the TCRβ CDR3 repertoires realized in the naive and memory CD8+ compartments of multiple donors, and for the first time reading and comparing approximately 3 million unique CDR3 sequences from ~40 million primary sequences from the repertoires of these different individuals. These unique CDR3 sequences identified from TCR or IG are defined as "clonotype sequences". The results unexpectedly showed that half of all TCRβ in each individual were derived from a small and specific subset of all possible TCRβ, characterized by less than about five inserted nucleotides, and biased toward specific VJ pair usage. Moreover, a subset of TCRβ from the donors was in response to the same pathogen, Epstein-Barr Virus. The frequency of these T cells where donors were exposed to the Epstein Barr Virus was small enough to identify clonotype sequences as biomarkers reflecting Epstein Barr Virus infection. The clonotype sequences associated with a single disease or pathogen within an individual are defined as "exposure-specific clonotype sequences".
Exposure-specific clonotype sequences observed in common across multiple individuals are defined as "public antigen receptors", and the cells carrying these receptors are "public T cells" or "public B cells". The present specification discloses that there is a large overlap in TCR sequences between unrelated individuals, providing a pool of public antigen receptors between individuals that are enriched in response to a shared pathogenic exposure.
Thus, the present invention provides in one aspect a method for identifying public antigen receptors as biomarkers, based on the discovery that specific amino acid sequences of TCR or IG loci are recurrently produced in multiple individuals, and may be used in responding to the same pathogen or environmental exposure. The rearranged VDJ regions from millions of rearranged TCRβ genes in the naive and memory CD8+ compartments were sequenced and the unexpected result was a high overlap between individuals, where more than half of all TCRβ clonotype sequences in each individual were derived from a small but specific subset of all possible TCRβ clonotype sequences. The proportion of public polypeptide sequences specific to an immune response carried in the memory compartment or the overall repertoire of an individual is defined as the "frequency" of public antigen receptor sequences for a specific immune response, within that individual.
Comparison of shared sequences across individuals with common exposures allows determination of panels of public antigen receptors specific for a given exposure or disease. The joint frequency of sequences associated with a specific immune response in the overall repertoire of unexposed individuals is adequately small, allowing discrimination between exposed and unexposed individuals. These individuals could also be individuals who have broken tolerance to self in autoimmune disease, as compared to individuals who have not yet initiated the immune response to self. In a given tissue sample, frequency can be measured as the number or proportion of total sequences, the number or proportion of unique sequences, or even the identity of specific sequences, and can be compared to each sequence observed in individuals experiencing a given immune response. The tissue sample might be whole blood or subsets thereof, a biopsy from a specific tissue or lesion, or specific cell types extracted from a biopsy specimen. Sequences might be derived from the tissue sample on the basis of either DNA or RNA.
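The two proportional measures of frequency described above (proportion of total sequences and proportion of unique sequences) can be sketched as follows. This is a minimal illustration, not the inventors' implementation: the function name `panel_frequency` is hypothetical, and it assumes that a sample is available as a list of clonotype sequence strings, one entry per sequence read.

```python
from collections import Counter


def panel_frequency(reads, panel):
    """Return (fraction of total reads, fraction of unique sequences)
    that match any sequence in the biomarker panel.

    reads: iterable of clonotype sequence strings, one per read (assumed input).
    panel: iterable of panel clonotype sequence strings.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    panel = set(panel)
    # Reads (with multiplicity) whose sequence is in the panel.
    hits_total = sum(n for seq, n in counts.items() if seq in panel)
    # Distinct sequences in the sample that are in the panel.
    hits_unique = sum(1 for seq in counts if seq in panel)
    return hits_total / total, hits_unique / len(counts)
```

Matching here is by exact string identity; whether amino acid or nucleotide sequences are compared is left to the caller.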
In another aspect, the present invention provides biomarkers generated from exposure-specific clonotype sequences that are identified as a result of using the methods of the present invention. The exposure-specific clonotype sequences are detected from immune cells taken from a group of subjects with a disease. For the purposes herein, a group of subjects is two or more people and is also referred to as a "population". The immune cells can originate from a tissue where a mixture of immune cells is present and T cells or B cells are isolated by any suitable means. Polynucleotide sequences, such as DNA or RNA, are isolated and prepared in a manner suitable for amplification using oligonucleotide primers in a nucleotide amplification reaction. The clonotype sequences that are shared in common between individuals sharing either exposure or autoimmune disease ("cases") are identified. The process is repeated with a control population identified as either unexposed to a given pathogen, not experiencing autoimmune disease, or even a cohort of individuals with unknown exposure status in the case of rare exposures, to establish which sequences shared among the cases are specific to the exposure/condition. The biomarker is defined as the panel of specific antigen receptor sequences that are significantly enriched in the cases as compared to controls.
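One way the case/control comparison above might be carried out is sketched below. The specification does not name a statistical test; the one-sided Fisher's exact test used here, the significance threshold `alpha`, and the function names are all assumptions made for illustration, and no correction for testing many sequences at once is applied.

```python
from math import comb


def fisher_enrichment_p(case_hits, n_cases, control_hits, n_controls):
    """One-sided Fisher's exact p-value that a sequence is carried by
    more cases than expected by chance (assumed test, not from the source)."""
    total_hits = case_hits + control_hits
    n = n_cases + n_controls
    denom = comb(n, total_hits)
    upper = min(n_cases, total_hits)
    # Sum hypergeometric probabilities for outcomes at least as extreme.
    tail = sum(
        comb(n_cases, k) * comb(n_controls, total_hits - k)
        for k in range(case_hits, upper + 1)
    )
    return tail / denom


def select_panel(carriers, n_cases, n_controls, alpha=0.05):
    """carriers maps sequence -> (cases carrying it, controls carrying it).
    Returns the sorted list of sequences enriched in cases at level alpha."""
    return sorted(
        seq
        for seq, (case_hits, control_hits) in carriers.items()
        if fisher_enrichment_p(case_hits, n_cases, control_hits, n_controls) < alpha
    )
```

In practice, a panel screened across thousands of candidate sequences would likely need a multiple-comparison adjustment, which is omitted from this sketch.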
In another aspect, the present invention provides a method for using these biomarkers or exposure-specific clonotype sequences as diagnostics for disease or risk of disease in normal subjects, to detect a disease or risk of a disease. For example, a panel of exposure-specific clonotype sequences could be used to monitor either the initiation or the progression of an autoimmune disease.
Clonotype sequences may refer to either amino acid or nucleotide sequences. Clonotype amino acid sequences can be encoded by multiple possible clonotype nucleotide sequences e.g., degenerate nucleotide sequences, so the diagnostic criteria include the amino acid sequence and all possible nucleotide sequences giving rise to that amino acid sequence.
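The relationship between a clonotype amino acid sequence and the nucleotide sequences that encode it can be illustrated with the standard genetic code. The sketch below is illustrative only: the names `translate` and `count_encodings` are hypothetical, and it assumes in-frame, unambiguous DNA input.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
DEGENERACY = Counter(CODON_TABLE.values())  # number of codons per amino acid


def translate(nt: str) -> str:
    """Translate an in-frame DNA sequence; trailing partial codons are ignored."""
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


def count_encodings(aa: str) -> int:
    """Number of distinct nucleotide sequences encoding the amino acid sequence."""
    n = 1
    for residue in aa:
        n *= DEGENERACY[residue]
    return n
```

Two different nucleotide clonotypes that translate to the same amino acid sequence would therefore count as the same diagnostic clonotype at the amino acid level.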
In certain embodiments of the present invention the method provides for collecting samples of immune cells from the subject. In other embodiments, the samples are collected and prepared independently and are not necessary for practicing the invention. In either case, one way of collecting samples uses whole blood from an individual where Peripheral Blood Mononuclear Cells (PBMCs) are isolated, for example, on a Ficoll gradient. The cells can be cryopreserved for later use, and screened for tetramer positive T cells for a specific antigen. Whole blood is processed to isolate memory B cells (CD19+, CD27+), memory CD4 T cells (CD4+, CD45RO+) and memory CD8 T cells (CD8+, CD45RO+). Apparatus for separation are known, for example the AutoMACS® magnetic bead cell separator (Miltenyi Biotec, Auburn, CA).
B cell or T cell nucleotides are sequenced. If total genomic DNA is to be used, it is extracted from cells, e.g., by using the QIAamp® DNA blood Mini Kit (QIAGEN®). The approximate mass of a single haploid genome is 3 pg. Using PBMCs as a source, the number of T cells can be estimated to be about 30% of total cells. Alternatively, total nucleic acid can be isolated from cells, including both genomic DNA and mRNA. Methods for extracting nucleotides from immune cells and tissue are well known to those of ordinary skill in the art.
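The arithmetic implied by the figures above (about 3 pg per haploid genome, and T cells at about 30% of PBMCs) can be sketched as follows. The function names are hypothetical, and the sketch assumes that all of the measured mass is genomic DNA.

```python
PG_PER_HAPLOID_GENOME = 3.0   # approximate mass stated in the text, in picograms
T_CELL_FRACTION_PBMC = 0.30   # approximate T cell fraction of PBMCs stated in the text


def haploid_genomes(dna_micrograms: float) -> float:
    """Approximate number of haploid genome copies in a DNA sample."""
    picograms = dna_micrograms * 1e6  # 1 microgram = 1e6 picograms
    return picograms / PG_PER_HAPLOID_GENOME


def estimated_t_cell_genomes(dna_micrograms: float,
                             t_cell_fraction: float = T_CELL_FRACTION_PBMC) -> float:
    """Approximate haploid genome copies contributed by T cells
    when unsorted PBMCs are the DNA source."""
    return haploid_genomes(dna_micrograms) * t_cell_fraction
```

For example, under these assumptions 6 micrograms of DNA corresponds to roughly two million haploid genome copies.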
A variety of methods are useful to detect diagnostic clonotype sequences of either TCR or IG for a given disease state. Methods include the comprehensive sequencing strategy disclosed herein and used to discover exposure-specific clonotype sequences, but are not limited to this method, as other available sequence-specific detection methods should work, including hybridization, TaqMan, and other technologies that detect the presence or absence of specific nucleotide sequences. Additionally, the peptides encoded by the diagnostic TCR or IG might be detected directly by immunoassay with either the appropriate antigen or monoclonal antibodies developed for this purpose. The immunoassays which can be used include, but are not limited to, competitive assay systems using techniques such as western blots, radioimmunoassays, ELISA, "sandwich" immunoassays, immunoprecipitation assays, precipitin assays, gel diffusion precipitin assays, immunoradiometric assays, fluorescent immunoassays, protein A immunoassays, and complement-fixation assays. Such assays are routine and well known in the art (see, e.g., Ausubel et al, eds, 1994 Current Protocols in Molecular Biology, Vol. 1, John Wiley & Sons, Inc., New York). Additionally, routine cross-blocking assays such as those described in Antibodies, A Laboratory Manual (Cold Spring Harbor Laboratory, Ed Harlow and David Lane, 1988) can be performed. In certain embodiments, nucleic acid based methods are preferred over protein based assays, but either can distinguish individuals carrying the clonotype-specific sequence at high frequency from individuals not carrying the diagnostic clonotype.
The present inventors previously disclosed a computational method to measure TCR CDR3 diversity based on single molecule DNA sequencing in WO2010/151416, which is hereby incorporated by reference in its entirety. In those studies, a method was developed based on single molecule DNA sequencing and an analytic computational approach to estimation of repertoire diversity using diversity measurements in finite samples. The analysis demonstrated that the number of unique TCRβ CDR3 sequences in the adult repertoire significantly exceeds previous estimates based on exhaustive capillary sequencing of small segments of the repertoire. The TCRβ chain diversity in the CD45RO- population (enriched for naive T cells) observed using the methods described herein was five-fold larger than previously reported. The number of unique TCRβ CDR3 sequences expressed in antigen-experienced CD45RO+ T cells was between 10 and 20 times larger than expected. The frequency distribution of CDR3 sequences in CD45RO+ cells suggested that the T cell repertoire contains a large number of clones with a small clone size. Furthermore, it was determined that the realized set of TCRβ chains was sampled non-uniformly from the huge potential space of sequences. In particular, the β chain sequences closer to germ line (few insertions and deletions at the V-D and D-J boundaries) were created at a relatively high frequency. TCR sequences close to germ line were shared between different people because the germ line sequences for the V's, D's, and J's are shared, modulo a small number of polymorphisms, among the human population.
Additionally, the insertion rates of nucleotides were strongly biased. The frequency distribution of CDR3 sequences in CD45RO+ cells was notable for a significant number of sequences that were observed with low copy counts, suggesting that the αβ CD45RO+ T cell repertoire contains a large number of clones with a small clone size.
Previous work demonstrated that the number of unique CDR3 sequences found in the repertoire of a single individual is 3-4 x 10^6 (WO2010/151416 and Robins, et al, Blood 114, 4099-4107 (2009)), which, although large, comprises a negligible fraction of the estimated 5 x 10^11 theoretically possible TCRβ CDR3 sequences. Based on this, one would expect fewer than 5 sequences shared between any two individual repertoires assuming the 3-4 x 10^6 sequences present in each individual were randomly chosen from a uniform distribution of 5 x 10^11 sequences. The present inventors compared the TCRβ CDR3 sequence repertoires present in pairs of healthy adults and unexpectedly found that the actual repertoire overlap between any two individuals is several thousand-fold larger than predicted. Moreover, the degree of overlap did not appear to depend strongly on the extent of HLA-A, -B, or -C matching. The finding implied either (1) that individual TCRβ sequence repertoires were not randomly selected from the space of possible sequences, or (2) that the sequences in this space were not uniformly probable, or both. Distinguishing between these two possibilities was important, because the first possibility would be consistent with convergent evolution during T cell development and the second implies that the sequence space of receptors in the cellular adaptive immune system is much smaller than presently believed. Convergent evolution is the possibility that a diverse set of TCRs rearrange in the thymus, and the positive and negative selection process favors the same lower diversity subset of TCRs in each individual. Arranging the TCRβ CDR3 sequences observed in each donor according to the number of junctional insertions demonstrated that the sequences were not randomly sampled from the set of possible sequences.
This led to the conclusion that the majority of sequences in each donor sampled were from a small corner of the set of possible sequences, and that the distribution of sequences in the possible sequence space is not uniform. To determine whether the distribution was sufficiently non-uniform to account for the larger than expected observed overlaps between individual repertoires, a model was created that assumes that the distribution is uniform within each subset of sequences carrying a specified number of insertions. The model let the fraction of each subset (with a specified number of insertions) equal the empirically determined value. Using the model, the overlap expected when the CDR3 sequences are randomly sampled from this distribution was predicted. It was determined that the computed overlap was ~14,000, which was indistinguishable from the observed overlap in the cohort. Thus, one does not need to invoke non-random selection of the sequences in the possible sequence space to account for the observed overlap. In other words, there was no evidence of convergent evolution.
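The structure of such a model can be sketched as follows. This is a simplified illustration under stated assumptions, not the inventors' computation: the function name `expected_overlap` is hypothetical, each stratum (one per insertion count) is described by an assumed pair of values (fraction of a repertoire falling in the stratum, number of possible sequences in the stratum), and the numbers in the usage example are toy values rather than the empirically determined ones. It relies on the fact that two sets of sizes a and b drawn uniformly at random from N possibilities share a·b/N elements on average.

```python
def expected_overlap(n1, n2, strata):
    """Expected number of sequences shared by two repertoires of n1 and n2
    unique sequences, sampled uniformly within each insertion-count stratum.

    strata: iterable of (fraction, n_possible) pairs, where `fraction` is the
    share of each repertoire in the stratum and `n_possible` is the number of
    distinct sequences the stratum can produce (both are assumed inputs).
    """
    strata = list(strata)
    if abs(sum(f for f, _ in strata) - 1.0) > 1e-9:
        raise ValueError("stratum fractions must sum to 1")
    total = 0.0
    for fraction, n_possible in strata:
        a = n1 * fraction  # sequences from repertoire 1 in this stratum
        b = n2 * fraction  # sequences from repertoire 2 in this stratum
        if max(a, b) > n_possible:
            raise ValueError("stratum cannot hold that many unique sequences")
        total += a * b / n_possible  # mean overlap of two uniform random subsets
    return total


# Toy values: spreading 1,000 sequences uniformly over 1e6 possibilities gives
# an expected overlap of 1, while placing half of them in a small stratum of
# 1,000 possibilities raises the expected overlap to about 250.
uniform = expected_overlap(1000, 1000, [(1.0, 1e6)])
skewed = expected_overlap(1000, 1000, [(0.5, 1e3), (0.5, 1e9)])
```

The toy comparison shows the qualitative point of the paragraph above: concentrating a large share of each repertoire in a small stratum greatly increases the expected overlap without any non-random selection.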
The results determined that the small effective size of the TCR repertoire was primarily a result of the stochastic process of insertion of non-templated nucleotides. The number of nucleotides inserted into a given junction varied according to a probability distribution, which was empirically determined from the data. The sequences with low numbers of insertions were selected from a small number of possibilities and were likely to be created in multiple people. The process of β-chain rearrangement occurs during T cell development, before the functional β receptor is displayed on the surface of the T cell, prior to thymic selection.
The large shared overlap of TCR sequences suggested that multiple people were able to respond to the same antigen with identical (or very similar) T cell clones. These public responses are believed to be common, as supported by the EBV example. The clinical application is the diagnostic utility of these overlapping TCR sequences. With the technology presented, one can readily detect the repertoire of expanded clones in the memory compartment. A set of "public" TCR clones induced by a particular antigen can be used to diagnose a disease state associated with the antigen. The HLA variability is a consideration in the application of such a diagnostic, but many diseases, such as Type 1 Diabetes, are strongly associated with particular HLA alleles.
The biomarkers and methods of the present invention allow one of skill in the art to identify, diagnose, or otherwise assess subjects who do not exhibit any symptoms of exposure to a pathogen, or initiation of an autoimmune disease. For example, a subject may not have clinically symptomatic diabetes, in particular Type I diabetes, but nonetheless may be at risk for developing diabetes or experiencing symptoms characteristic of a diabetic condition. Other autoimmune diseases include, but are not limited to: multiple sclerosis, rheumatoid arthritis, scleroderma, Primary CNS Vasculitis, Rasmussen's Encephalitis, Autoimmune Peripheral Neuropathy, Autoimmune Cerebellar Degeneration, Gait Ataxia with Late Age Onset Polyneuropathy (GALOP), Stiff Person Syndrome, Chronic Inflammatory Demyelinating Polyneuropathy, Myasthenia Gravis, Lambert Eaton Myasthenic Syndrome, HTLV-1-Associated Myelopathy (HAM) / Tropical Spastic Paraparesis (TSP), Opsoclonus / Myoclonus (Anti-Ri), Neuromyelitis Optica, Graves' disease, Hashimoto's thyroiditis, Celiac disease, Crohn's disease, Ulcerative colitis, Systemic Lupus Erythematosus, Goodpasture's syndrome, Wegener's granulomatosis, Polymyalgia Rheumatica, Guillain-Barre syndrome, Addison's disease, Ankylosing Spondylitis, and Psoriasis.
Infectious diseases that might generate useful biomarkers for exposure status include bacterial infections such as: Escherichia coli, Haemophilus influenzae, Actinomycosis, Listeriosis, Meningococcus, Pneumococcus, Pseudomonas, Salmonella, Shigellosis, Clostridium, Staphylococcus, Streptococcus; viral infections including: Hepatitis A, Hepatitis B, Hepatitis C, Hantavirus, Dengue Fever, Herpes Simplex Virus, Herpes Zoster, Cytomegalovirus (CMV), Epstein-Barr virus (EBV), Ebola virus, Marburg Virus, SARS; fungal infections including: Aspergillosis, Blastomycosis, Candidiasis, Coccidioidomycosis, Cryptococcosis, Histoplasmosis, Mucormycosis, Paracoccidioidomycosis, Sporotrichosis; and parasitic infections including: Amebiasis, Amebic Infections, Ascariasis, Babesiosis, Cryptosporidiosis, Dracunculiasis, Giardiasis, Hookworm Infection, Leishmaniasis, Malaria, Microsporidiosis, Onchocerciasis, Pinworm Infection, Schistosomiasis, Tapeworm Infection, Toxocariasis, Toxoplasmosis, Trichinosis, Whipworm Infection.
In addition to autoimmune disease and infectious disease, the present claims embody methods wherein other non-specific inflammatory diseases or immune dysfunction syndromes can include useful disease-specific antigen receptor sequences, such as: Gouty arthritis, Polymyalgia rheumatica, Kawasaki disease, juvenile dermatomyositis, DiGeorge Syndrome, severe combined immunodeficiency (SCID), common variable immunodeficiency (CVID), Bruton's agammaglobulinemia, and AIDS caused by retroviruses HIV-1 or HIV-2. These lists of diseases and antigenic agents are illustrative, and in some cases preferred, but are not intended to be limiting.
Identifying a subject exposed to a pathogen, particularly a pathogenic antigen, enables the selection and initiation of various therapeutic interventions or treatment regimens in order to delay, reduce or prevent that subject's onset of a disease state.
The present invention can also be used to screen subject populations in any number of settings. For example, a health maintenance organization, public health entity or school health program can screen a group of subjects to identify those requiring interventions, as described above, or for the collection of epidemiological data.
Tests to measure biomarkers and biomarker panels can be implemented on a wide variety of diagnostic test systems. Amplification of a selected nucleic acid sequence may be carried out by a number of suitable methods. See generally Kwoh et al, Am. Biotechnol. Lab. 8:14 (1990). Numerous amplification techniques have been described and can be readily adapted to suit particular needs of a person of ordinary skill. Non-limiting examples of amplification techniques include polymerase chain reaction (PCR), ligase chain reaction (LCR), strand displacement amplification (SDA), transcription-based amplification, the Qβ replicase system and NASBA, all known to those skilled in the art.
Methods for performing immunoassays are well-established. Useful assays include, for example, an enzyme immune assay (EIA) such as enzyme-linked immunosorbent assay (ELISA), a radioimmune assay (RIA), a Western blot assay, or a slot blot assay (see, e.g., Diamandis, Immunoassay (Academic Press, Inc. 1996); U.S. Pat. Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168).
The following examples describe the methods and compositions disclosed herein. In one example the subjects were positive for Epstein Barr Virus. In other examples, four hundred four (404) biomarkers were identified as being found to be "public" across subjects who have Multiple Sclerosis (MS) or exhibit symptoms characteristic of MS. Six hundred eighty-nine (689) biomarkers were also identified as being found to be "public" in subjects who have Type 1 diabetes, or who exhibit symptoms characteristic of diabetes, including, for example, insulin resistance or altered beta cell function.
The invention is illustrated by the following non-limiting examples.
EXAMPLES
I. Two donor study
Example 1: Sample acquisition, PBMC isolation, FACS sorting and genomic DNA extraction
The T-lymphocytes were flow sorted into four compartments for each subject: CD8+CD45RO+/- and CD4+CD45RO+/-. The characterization of lymphocytes used the following conjugated anti-human antibodies: CD4 FITC (clone M-T466, Miltenyi Biotec), CD8 PE (clone RPA-T8, BD Biosciences), CD45RO ECD (clone UCHL-1, Beckman Coulter), and CD45RO APC (clone UCHL-1, BD Biosciences). Staining of total PBMCs was done with the appropriate combination of antibodies for 20 minutes at 4°C, and stained cells were washed once before analysis. Lymphocyte subsets were isolated by FACS sorting in the BD FACSAria™ cell-sorting system (BD Biosciences). Data were analyzed with FlowJo software (Treestar Inc.).
Total genomic DNA was extracted from sorted cells using the QIAamp® DNA blood Mini Kit (QIAGEN®). The approximate mass of a single haploid genome is 3 pg. In order to sample millions of rearranged TCRB in each T cell compartment, 6 to 27 micrograms of template DNA from each compartment was isolated, as shown in Table 1.
Table 1
[Table 1 is provided as an image (imgf000017_0001) in the original publication.]
Example 2: T cell receptor β chain spectratyping
TCR β chain spectratyping was performed by methods known to those skilled in the art. Complementary DNA was synthesized from RNA extracted from sorted T cell populations and used as template for multiplex PCR amplification of the rearranged TCR β chain CDR3 region. Each multiplex reaction contained a 6-FAM-labeled antisense primer specific for the TCR β chain constant region, and two to five TCR β chain variable (TRBV) gene-specific sense primers. All 23 functional Vβ families were studied. PCR reactions were carried out on a Hybaid PCR Express thermal cycler (Hybaid, Ashford, UK) under the following cycling conditions: 1 cycle at 95°C for 6 minutes; 40 cycles of 94°C for 30 seconds, 58°C for 30 seconds, and 72°C for 40 seconds; followed by 1 cycle at 72°C for 10 minutes. Each reaction contained cDNA template, 500 μM dNTPs, 2 mM MgCl2 and 1 unit of AmpliTaq Gold DNA polymerase (Perkin Elmer) in AmpliTaq Gold buffer, in a final volume of 20 μl. After completion, an aliquot of the PCR product was diluted 1:50 and analyzed via capillary electrophoresis using a 3730xl DNA Analyzer (Applied Biosystems). The output of the DNA Analyzer is a distribution of fluorescence intensity vs. time, which was converted to a distribution of fluorescence intensity vs. length by comparison with the fluorescence intensity trace of a reference sample containing known size standards.
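The conversion from migration time to fragment length can be illustrated with a short sketch. It assumes simple piecewise-linear interpolation between size standards, which is a simplification chosen for illustration; the text does not state which sizing method was used, and the function name and example values are hypothetical.

```python
from bisect import bisect_right


def time_to_length(t, standard_times, standard_lengths):
    """Linearly interpolate fragment length (bp) at migration time t.

    standard_times must be strictly increasing and aligned with
    standard_lengths (the known sizes of the reference peaks).
    """
    if not standard_times[0] <= t <= standard_times[-1]:
        raise ValueError("time outside the range covered by the size standards")
    # Index of the standard just above t, clamped to the last standard.
    i = min(bisect_right(standard_times, t), len(standard_times) - 1)
    t0, t1 = standard_times[i - 1], standard_times[i]
    l0, l1 = standard_lengths[i - 1], standard_lengths[i]
    return l0 + (l1 - l0) * (t - t0) / (t1 - t0)


# Hypothetical standards: peaks at 10, 20 and 30 minutes for 100, 200 and 300 bp.
length_bp = time_to_length(15.0, [10.0, 20.0, 30.0], [100, 200, 300])
```

Applying the same mapping to every time point of a sample trace yields the intensity vs. length distribution described above.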
Example 3: Multiplex PCR amplification of TCRβ CDR3 regions
In order to generate the template library for the Genome Analyzer, a multiplex PCR system was designed to amplify rearranged TCRβ loci from genomic DNA. The multiplex PCR system uses 45 forward primers (SEQ ID NOS: 1-45), each specific to a functional TCR Vβ segment, and 13 reverse primers (SEQ ID NOS: 46-57), each specific to a TCR Jβ segment.
Primer design: 45 forward PCR primers (SEQ ID NOS: 1-45) complementary to each of the 48 functional Variable segments, and 13 reverse PCR primers (SEQ ID NOS: 46-57) complementary to each of the 13 functional Joining genes (TRBJ) from the TRB locus, were designed using the gene segments listed in the international ImMunoGeneTics information system® (See, Lefranc, M.-P. et al, Nucleic Acids Research, 27, 209-212 (1999); Ruiz, M. et al, Nucleic Acids Research, 28, 219-221 (2000); Lefranc, M.-P., Nucleic Acids Research, 29, 207-209 (2001); Lefranc, M.-P., Nucleic Acids Res., 31, 307-310 (2003); Lefranc, M.-P. et al, In Silico Biol, 5, 0006 (2004); Lefranc, M.-P. et al, Nucleic Acids Res., 33, D593-D597 (2005); Lefranc, M.-P. et al, Nucleic Acids Research 37: D1006-D1012; 2009). The primers have been designed such that adequate information is present within the amplified sequence to identify both the V and J genes uniquely (>40 base pairs of sequence upstream of the V gene recombination signal sequence (RSS), and >30 base pairs downstream of the J gene RSS).
The forward primers are modified at the 5' end with the universal forward primer sequence compatible with the GA2 cluster station solid-phase PCR. Similarly, all of the reverse primers are modified with the GA2 universal reverse primer sequence.
Analysis of existing data in the IMGT database shows that the average J deletion is 4 bp +/- 2.5 bp, which implies that J deletions greater than 10 nucleotides occur in less than 1% of sequences. The 13 different TCR Jβ gene segments each have a unique four base tag at positions +11 through +14 downstream of the RSS site. Thus, sequencing oligos are designed to anneal to a consensus nucleotide motif observed just downstream of this "tag", so that the first four bases of a sequence read will uniquely identify the J segment (SEQ ID NOS: 58-70).
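The "less than 1%" figure can be checked with a short calculation. The sketch below assumes that J deletion lengths are approximately normally distributed with the stated mean and standard deviation; that distributional assumption is introduced here for illustration and is not stated in the text.

```python
import math


def normal_upper_tail(threshold, mean, sd):
    """P(X > threshold) for a normal distribution with the given mean and sd."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


# Mean 4 bp, sd 2.5 bp: a 10-nucleotide deletion lies 2.4 sd above the mean.
p_deletion_over_10 = normal_upper_tail(10, 4.0, 2.5)
```

Under the normal approximation the tail probability is a little over 0.8%, in line with the statement that such deletions occur in less than 1% of sequences, so a tag placed at positions +11 through +14 is rarely lost.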
The information used to assign the J and V segment of a sequence read was entirely contained within the amplified sequence, and did not rely upon the identity of the PCR primers. The sequencing oligonucleotides were designed such that promiscuous priming of a sequencing reaction for one J segment by an oligonucleotide specific to another J segment would generate sequence data starting at exactly the same nucleotide as sequence data from the correct sequencing oligonucleotide. In this way, promiscuous annealing of the sequencing oligonucleotides should not impact the quality of the sequence data generated.
The average length of the CDR3 region, defined following convention as the nucleotides between the 2nd conserved cysteine of the V segment and the conserved phenylalanine of the J segment, is 35 +/- 3 nucleotides, so sequences starting from the Jβ segment tag nearly always capture the complete VNDNJ junction in a 50 bp read.
TCR β J gene segments are roughly 50 bp in length. PCR primers that anneal and extend to mismatched sequences are referred to as promiscuous primers. Because of the risk of promiscuous priming in the context of multiplex PCR, especially in the context of a gene family, the TCR Jβ reverse PCR primers were designed to minimize overlap with the sequencing oligonucleotides. Thus, the 13 TCR Jβ reverse primers are anchored at the 3' end on the consensus splice site motif, with minimal overlap of the sequencing primers. The TCR Jβ primers were designed for a consistent annealing temperature (58°C in 50 mM salt) using the OligoCalc program under default parameters (Kibbe, Nucleic Acids Res. 35 (Suppl. 2), 43-46 (2007)).
The 45 TCR Vβ forward primers were designed to anneal to the Vβ segments in a region of relatively strong sequence conservation between Vβ segments, for two express purposes. First, maximizing the conservation of sequence among these primers minimizes the potential for differential annealing properties of each primer. Second, the primers were chosen such that the amplified region between V and J primers will contain sufficient TCR Vβ sequence information to identify the specific Vβ gene segment used. This obviates the risk of erroneous TCR Vβ gene segment assignment in the event of promiscuous priming by the TCR Vβ primers. TCR Vβ forward primers were designed for all known non-pseudogenes in the TCRβ locus.
The total PCR product for a successfully rearranged TCRβ CDR3 region using this system was expected to be approximately 200 bp long. Genomic templates were PCR amplified using an equimolar pool of the 45 TCR Vβ F primers (the "VF pool") and an equimolar pool of the 13 TCR Jβ R primers (the "JR pool"). 50 μl PCR reactions were set up at 1.0 μM VF pool (22 nM for each unique TCR Vβ F primer), 1.0 μM JR pool (77 nM for each unique TCR Jβ R primer), 1X QIAGEN Multiplex PCR master mix (QIAGEN part number 206145), 10% Q-solution (QIAGEN), and 16 ng/μl gDNA. The following thermal cycling conditions were used in a PCR Express thermal cycler (Hybaid, Ashford, UK): 1 cycle at 95°C for 15 minutes; 25 to 40 cycles of 94°C for 30 seconds, 59°C for 30 seconds and 72°C for 1 minute; followed by one cycle at 72°C for 10 minutes. 12-20 wells of PCR were performed for each library, in order to sample hundreds of thousands to millions of rearranged TCRβ CDR3 loci.
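The per-primer concentrations and the template mass per well follow directly from the pool concentrations and reaction volume given above. A brief sketch of that arithmetic, with illustrative function and variable names:

```python
def per_primer_nanomolar(pool_micromolar: float, n_primers: int) -> float:
    """Concentration of each primer in an equimolar pool, in nM."""
    return pool_micromolar * 1000.0 / n_primers


vf_each = per_primer_nanomolar(1.0, 45)  # about 22 nM per forward primer
jr_each = per_primer_nanomolar(1.0, 13)  # about 77 nM per reverse primer

# 16 ng/ul of genomic DNA in a 50 ul reaction.
template_ng_per_well = 16 * 50  # 800 ng
```

At 800 ng of template per well, 12 to 20 wells correspond to roughly 10 to 16 micrograms of genomic DNA per library.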
Example 4: Estimating relative CDR3 sequence abundance in PCR pools and blood samples
After collapsing the data, the underlying distribution of T-cell sequences in the blood was reconstructed from the sequence data output from the GA experiments. The experimental procedure used three steps: 1) flow sorting T-cells drawn from peripheral blood, 2) PCR amplification, and 3) sequencing. The analysis worked backwards from the observed sequence data of step 3) to first determine the ratio of sequences in the PCR distribution of step 2) before estimating the true distribution of clonotype sequences in blood.
For each sequence observed a given number of times in the data, estimates of the probability that that sequence was sampled from a PCR pool of a particular size were made. Because the CDR3 regions sequenced are sampled randomly from a massive pool of PCR products, the number of observations for each sequence is drawn from a Poisson distribution. The Poisson parameters are quantized according to the number of T cell genomes that provided the template for PCR. A simple Poisson mixture model both estimates these parameters and assigns a pairwise probability for each sequence being drawn from each distribution. This is an expectation maximization method which reconstructs the abundances of each sequence that was drawn from blood.
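The Poisson mixture model described above can be sketched as a small expectation maximization loop. This is an illustrative sketch under stated assumptions, not the procedure actually used: it assumes component k has rate k times a shared base rate (one interpretation of "quantized"), fixes the maximum number of template genomes at a hypothetical k_max, and ignores the fact that sequences with zero reads are never observed.

```python
import math


def poisson_pmf(x, rate):
    """Poisson probability of observing x events at the given rate."""
    return math.exp(x * math.log(rate) - rate - math.lgamma(x + 1))


def em_quantized_poisson(counts, k_max=3, base_rate=1.0, n_iter=200):
    """Fit mixing weights and a shared base rate; component k has rate k * base_rate.

    Returns (weights, base_rate, responsibilities), where responsibilities[i][k-1]
    is the probability that sequence i arose from k template genomes.
    """
    weights = [1.0 / k_max] * k_max
    resp = []
    for _ in range(n_iter):
        # E-step: probability of each component for each observed read count.
        resp = []
        for x in counts:
            joint = [weights[k] * poisson_pmf(x, (k + 1) * base_rate)
                     for k in range(k_max)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: update mixing weights and the shared base rate.
        weights = [sum(r[k] for r in resp) / len(counts) for k in range(k_max)]
        expected_templates = sum(r[k] * (k + 1) for r in resp for k in range(k_max))
        base_rate = sum(counts) / expected_templates
    return weights, base_rate, resp


# Hypothetical read counts for a handful of distinct CDR3 sequences.
example_counts = [5, 4, 6, 5, 10, 11, 9, 15, 16]
weights, rate, responsibilities = em_quantized_poisson(example_counts)
```

The responsibilities play the role of the pairwise probabilities mentioned in the text: each row gives, for one sequence, the probability of having come from each pool size.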
Example 5: Unseen species model for estimation of true diversity
Limited sample volumes from donors required a mixture model to reconstruct the frequency of each TCRβ CDR3 species drawn from the blood, particularly for patients undergoing treatment.
The mathematical solution devised provided that for a total number of TCRβ "species" or clonotypes, S, a sequencing experiment observes x_s copies of sequence s. For all of the unobserved clonotypes, x_s equals 0, and each TCR clonotype is "captured" in a blood draw according to a Poisson process with parameter λ_s. If the number of T cell genomes sequenced in the first experiment is taken as 1, then the number sequenced in a second experiment is t (t = 1 if the second experiment is an exact repeat of the first).
Since there are a large number of unique sequences, an integral was used instead of a sum. If G(λ) is the empirical distribution function of the parameters λ_1, ..., λ_S, and n_x is the number of clonotypes sequenced exactly x times, then
Figure imgf000021_0001
The quantity of interest is Δ(t), the number of new clonotype sequences observed in the second sequencing experiment:
Figure imgf000021_0002
Taylor expansion of 1 - e^(-λt) gives Δ(t) = E(n_1)t - E(n_2)t^2 + E(n_3)t^3 - ..., which can be approximated by replacing the expectations E(n_x) with the observed numbers in the first experiment. Substituting in the numbers observed in experiment 1, this formula predicted that 1.6 × 10^5 new unique sequences should be observed in experiment 2. The actual observation was 1.8 × 10^5 new TCRβ sequences, a valid lower bound on total diversity. The expression for Δ(t) oscillates wildly as t goes to infinity, however, so to produce a lower bound for Δ(∞), Δ(t) needs to be regularized. There are many known methods to accomplish this, for example Euler's transformation.
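The alternating series for the number of new clonotypes can be evaluated directly from the observed frequency-of-frequencies table. The sketch below uses hypothetical counts and an illustrative function name; it implements only the unregularized series, which is meaningful for t up to about 1, and does not attempt the regularization (such as Euler's transformation) needed for large t.

```python
def predicted_new_clonotypes(n_by_count, t=1.0):
    """Evaluate sum over x of (-1)^(x+1) * n_x * t^x.

    n_by_count maps x (times a clonotype was sequenced) to n_x
    (number of clonotypes sequenced exactly x times).
    """
    return sum((-1) ** (x + 1) * n_x * t ** x for x, n_x in n_by_count.items())


# Hypothetical table: 100 singletons, 30 doubletons, 5 seen three times.
observed = {1: 100, 2: 30, 3: 5}
new_in_repeat = predicted_new_clonotypes(observed, t=1.0)  # 100 - 30 + 5
```

For an exact repeat of the experiment (t = 1), the prediction is simply the number of singletons minus the number of doubletons plus the number of tripletons, and so on.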
Example 6: Jβ gene segment usage
The CDR3 region in each TCR β chain included sequences derived from one of the 13 Jβ gene segments. Analysis of the CDR3 sequences in the four different T cell populations from the two donors demonstrated that the fraction of total sequences which incorporated sequences derived from the 13 different Jβ gene segments varied more than 20-fold. The Jβ gene segment usage pattern observed in the four different T cell populations was relatively constant within a given donor. Moreover, the Jβ usage patterns observed in two donors, which were inferred from analysis of genomic DNA from T cells sequenced using the GA, are qualitatively similar to those observed in T cells from umbilical cord blood and from healthy adult donors, both of which were inferred from analysis of cDNA from T cells sequenced using exhaustive capillary-based techniques.
Example 7: TCR β chains with identical amino acid sequences found in different people
The TCR β chain sequences were translated to amino acids and then compared pairwise between our two donors. Many thousands of exact sequence matches were found. For example, comparing the CD4+ CD45RO- sub-compartments, approximately 8000 of the 250,000 unique amino acid sequences from donor 1 were exact matches to donor 2. Many of these matching sequences at the amino acid level have multiple nucleotide differences at third codon positions. Following the example mentioned above, 1500/8000 identical amino acid matches had >5 nucleotide mismatches. Between any two T cell sub-types, we find 4-5% of the unique TCRβ sequences have identical amino acid matches.
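The comparison described above, translating nucleotide sequences and then matching at the amino acid level, can be sketched as follows. The function names and the short example sequences are hypothetical; the codon table is the standard genetic code, and reads are assumed to be in frame from their first base.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate an in-frame nucleotide sequence, ignoring trailing partial codons."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


def shared_amino_acid_sequences(donor1, donor2):
    """Map each amino acid sequence found in both donors to its encoding reads."""
    by_aa_1, by_aa_2 = {}, {}
    for table, reads in ((by_aa_1, donor1), (by_aa_2, donor2)):
        for nt in reads:
            table.setdefault(translate(nt), []).append(nt)
    return {aa: (by_aa_1[aa], by_aa_2[aa]) for aa in by_aa_1.keys() & by_aa_2.keys()}


def mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical reads: the first pair encodes the same peptide with different codons.
shared = shared_amino_acid_sequences(["TGTGCCAGCAGT"], ["TGCGCTAGCTCA", "TGTGCCAGCTTT"])
```

Counting nucleotide mismatches within each shared entry separates matches that arise from independent rearrangements (different codons, same peptide) from matches that are identical at the nucleotide level.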
To confirm our finding of thousands of identical TCR β chain amino acid sequences, we compared our two donors to the CD8+ CD62L+ CD45RA+ (naive-like) TCRs from a third donor who was also Caucasian, but a 44 year old CMV+ female. Identical pairwise matches of many thousands of sequences at the amino acid level between our third donor and each of our original two donors were found (See Figure 1).
Example 8: Larger clonotypes are closer to germline
The variation in copy number between different sequences within every T cell sub- compartment that we studied ranged by over a factor of 10,000. The only property identified that correlated with copy number was the number of insertions + the number of deletions (with an inverse correlation).
Example 9: Estimation of total CDR3 sequence diversity
The number of unique CDR3 sequences observed in each lane of the GA flow cell routinely exceeded 1 × 10^5. Given that the PCR products sequenced in each lane were necessarily derived from a small fraction of the T cell genomes present in each of the two donors, the total number of unique TCRβ CDR3 sequences in the entire T cell repertoire of each individual is likely to be far higher. Estimating the number of unique sequences in the entire repertoire, therefore, requires an estimate of the number of additional unique CDR3 sequences that exist in the blood but were not observed in the sample. The estimation of total species diversity in a large, complex population using measurements of the species diversity present in a finite sample has historically been called the "unseen species problem", an analytic solution to which was developed over 60 years ago. The solution starts with determining the number of new species, or TCRβ CDR3 sequences, that are observed if the experiment is repeated, i.e., if the GA sequencing is repeated on an identical sample of peripheral blood T cells. For our purposes, this amounts to sequencing an identically prepared library of TCRβ CDR3 PCR products in a different lane of the GA flow cell and counting the number of new CDR3 sequences. For CD8+CD45RO- cells from donor 2, the predicted and observed number of new CDR3 sequences in a second lane are within 5%, suggesting that this analytic solution can, in fact, be used to estimate the total number of unique TCRβ CDR3 sequences in the entire repertoire.
The resulting estimates of the total number of unique TCRβ CDR3 sequences in the four flow cytometrically-defined T cell compartments are shown in Table 2.
Table 2: TCR repertoire diversity
Donor   CD8   CD4   CD45RO   Diversity      GA*
1       +     -     +        6.3 × 10^5     2
1       +     -     -        1.24 × 10^6    2
1       -     +     +        8.2 × 10^5     2
1       -     +     -        1.28 × 10^6    2
2       +     -     +        4.4 × 10^5     1
2       +     -     -        9.7 × 10^5     1
2       -     +     +        8.7 × 10^5     2
2       -     +     -        1.03 × 10^6    2
Of note, the total TCRβ diversity in these populations is between 4 and 5 million unique sequences in the peripheral blood. Surprisingly, the CD45RO+, or antigen-experienced, compartment constitutes approximately 1.5 million of these sequences. This is at least an order of magnitude larger than expected. This discrepancy is likely attributable to the large number of these sequences observed at low relative frequency, which could only be detected through deep sequencing. The estimated TCRβ CDR3 repertoire sizes of each compartment in the two donors are within 20% of each other.
II. Seven Donor Study
Example 10: Sample acquisition, PBMC isolation, FACS sorting and genomic DNA extraction
A 100 blood samples of 200 ml whole blood from each individual are collected and antibody titers are measured at several laboratories. DNA is sequenced and amplified as described in Example 1.
The characteristics and demographic information of the donors were identified, and included HLA-A, -B, -C typing data and Epstein-Barr virus (EBV) serostatus ("+" indicates detectable serum IgG against the viral capsid antigen of EBV). Donors 1 and 3 were full siblings and the daughters of donor 2; the other four donors were unrelated.
Table 3: Donor characteristics
Figure imgf000024_0001
The serum tubes are processed according to standard methods known in the art for each of the serology assays specified in Table 3 above. PBMCs are isolated from the 50 ml whole blood aliquots on a Ficoll gradient. Cryopreserved PBMCs are prepared and can be screened for tetramer positive T cells for a specific antigen. Buffy coat DNA is isolated from the second syringe, for use in screening for specific sequences in the naive cells. The third syringe of whole blood is processed on the AutoMACS magnetic bead cell separator to isolate approximately 10^6 memory B cells (CD19+, CD27+), 10^6 memory CD4 T cells (CD4+, CD45RO+) and 10^6 memory CD8 T cells (CD8+, CD45RO+). DNA is prepared from each of the three cell subpopulations; B cell and CD4 T cell DNAs are sequenced.
HLA typing: Because CD4 sequences are included in the adaptive profile, high-resolution typing at the HLA-DRB1 gene is performed to identify carriers of high-frequency alleles at this locus, which may constrain the CD4 memory sequences.
To profile past and ongoing exposures, not only acute exposures, only titer information for each pathogen in the IgG class of antibodies is analyzed. The IgM class is typically associated with very recent exposures. In order to maximize the statistical power for associating exposure status with adaptive immune profile, a panel of pathogenic exposures expected to be between 10% and 90% seroprevalent is used. These are detailed in Table 11. The panel encompasses 13 pathogens and 40 serological subtypes.
Table 4: pathogen serology
Figure imgf000025_0001
* Seroprevalence data ( Mandell, Douglas, Bennett, 7th edition 9) and Taylor and Blaser, Epidemiol Rev 13, 42 (1991).
Example 11: Explicit generation of TCRβ CDR3 sequences
The potential β chain repertoire in humans has been calculated as 10^11 possible amino acid sequences. Much of the predicted TCRβ diversity derives from non-templated nucleotide insertions at the V-D and D-J junctions. The prediction assumes six insertions at each of the two junctions, for a total of 12 non-templated nucleotides. The cumulative distribution of TCR sequences is plotted as a function of number of insertions for the naive and memory compartments, respectively, from seven donors. Although examples of beta chains with higher numbers of insertions were observed, these were rare; less than 5% of human TCRβ chain sequences had 12 or more insertions. Surprisingly, over 10% of all observed TCRβ chain sequences had either zero or one inserted nucleotide, and almost half of the TCRβ sequences in each of our donors had five or fewer total insertions, including both junctions. This shows that a large fraction of the TCRβ repertoire is sampled from a small fraction of the 10^11 possible sequences.
To determine the full set of TCRβ sequences with a fixed maximum number of insertions, explicit generation of the sequences for models with different numbers of insertions was performed, allowing for deletions of zero to ten nucleotides at each of the recombination signal sequences. Upon comparing these generated sequence lists to the real sequence data, it was found that over half of all TCRβ sequences in the population are contained on the list of sequences generated from a model with five or fewer insertions. The total number of sequences on this list is ~2 × 10^8, so it appears that more than half of the TCRβ repertoire is sampled from less than 0.2% of the 10^11 possible sequences in a model allowing 12 or fewer insertions. The small effective size of the potential TCR repertoire implies that the overlap of the cellular adaptive immune system between individuals is far greater than present dogma would suggest.
All possible TCRβ CDR3 nucleotide sequences containing a total of N = 0, 1, 2, 3, ..., 7 nucleotide insertions at the VβDβ and DβJβ junctions were systematically generated with the model of VDJ rearrangement. The model was governed by a set of rules that were empirically determined from the observed TCRβ CDR3 sequence data from the seven donors. The model allowed all possible V-D-J combinations, deletion of up to 10 nucleotides from the 3' end of the Vβ segment and from the 5' end of the Jβ segment, deletion of nucleotides from the 5' and 3' ends, up to complete deletion, of the Dβ segment, and up to 7 total junctional insertions. The total length of the CDR3 sequence, defined as the interval from the codon for the conserved cysteine at the 3' end of the Vβ gene segment to the codon for the conserved phenylalanine in the 5' portion of the Jβ gene segment, was constrained such that it could encode from 9 to 23 amino acid residues. Generation of the CDR3 amino acid sequences containing a total of 7 junctional insertions was performed on a 50-node Linux cluster with 8 CPUs per node, and required 24 hours to complete. Each CDR3 nucleotide sequence was translated into the corresponding amino acid sequence and then stored in a binary tree organized by alphabet. For each new sequence not found in the tree, a new leaf was created with that sequence. For each sequence already found in the tree, the count for the corresponding leaf was incremented by one.
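The storage scheme described above, a binary tree ordered alphabetically with a count on each node, can be sketched in a few lines. This is a minimal illustration with hypothetical class and function names, not the implementation used on the cluster; it is unbalanced and holds everything in memory.

```python
class Node:
    """One tree node: a CDR3 amino acid sequence and how often it was generated."""

    def __init__(self, seq):
        self.seq = seq
        self.count = 1
        self.left = None
        self.right = None


def insert(root, seq):
    """Add seq to the tree; create a new leaf or increment an existing count."""
    if root is None:
        return Node(seq)
    node = root
    while True:
        if seq == node.seq:
            node.count += 1
            return root
        branch = "left" if seq < node.seq else "right"
        child = getattr(node, branch)
        if child is None:
            setattr(node, branch, Node(seq))
            return root
        node = child


def count_of(root, seq):
    """Return the stored count for seq, or 0 if it is not in the tree."""
    node = root
    while node is not None:
        if seq == node.seq:
            return node.count
        node = node.left if seq < node.seq else node.right
    return 0


root = None
for aa in ["CASSLGQAYEQYF", "CSARDRTGNGYT", "CASSLGQAYEQYF"]:
    root = insert(root, aa)
```

In Python, a dictionary or `collections.Counter` would give the same counts with less code; the explicit tree is shown only because it mirrors the description in the text.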
Example 12: Utilization of V-J Usage
To determine whether the CD8+ TCRβ CDR3 repertoire in each individual is randomly sampled from the set of 5 × 10^11 possible sequences, the complete set of CDR3 sequences predicted by a model of VDJ recombination was explicitly enumerated. The model allowed up to ten nucleotide deletions from the ends of the Vβ, Dβ and Jβ gene segments adjacent to the RSS, followed by insertion of a total of up to seven non-templated nucleotides at the Vβ-Dβ and Dβ-Jβ junctions. The model allowed the total CDR3 length to range from 27 to 69 nucleotides, encoding 9 to 23 amino acids, consistent with the experimentally observed sequence data.
Generation of all unique CDR3 amino acid sequences containing a total of 7 or fewer nucleotide insertions at the Vβ-Dβ and Dβ-Jβ junctions required approximately 10,000 CPU hours on a 2.3 GHz processor (Robins supra). Comparison of the set of sequences observed in the naive and memory CD8+ repertoires of each of the seven donors with the full set of sequences generated by the model revealed that 51.1 ± 3.5% of all the sequences observed in the naive CD8+ compartment of each donor are found in the subset of 5.7 × 10^8 predicted sequences containing six or fewer total insertions. Thus, the majority of TCRβ CDR3 amino acid sequences are drawn from a defined subset of less than 0.11% of the estimated 5 × 10^11 possible TCRβ sequences.
Much of the predicted diversity in the TCRβ CDR3 repertoire is generated by non-templated nucleotide insertions at the Vβ-Dβ and Dβ-Jβ junctions. The cumulative distribution of TCRβ CDR3 sequences observed in the CD8+ naive and memory compartments, respectively, of the seven donors as a function of the number of junctional insertions demonstrates that sequences with 12 or more insertions were observed, but constitute only 10% of the total. In contrast, more than 10% of the observed sequences had zero, one, or two insertions, and 50% of the sequences in each donor had six or fewer total insertions at the two junctions.
Example 13: Determining the Overlap
The pairwise overlap of the naive CD8+ subsets of the seven donors predicted by our model of TCRβ VDJ rearrangement is 1.44 × 10^4 ± 1.66 × 10^3 sequences, which agrees closely with the overlap calculated between all 21 possible pairs of the seven individuals studied (Figure 5A). CDR3 sequences that were shared between individuals had fewer inserted nucleotides than the mean number observed across the entire repertoire.
Calculation of expected overlap between TCRβ CDR3 amino acid sequence repertoires
(1) Expected overlap if all 5 × 10^11 TCRβ CDR3 amino acid sequences are equally likely: The general calculation is in section (2), below. Here n ≈ 1 × 10^6 is the total number of unique TCRβ CDR3 amino acid sequences in the naive CD8+ compartment and f is n/(5 × 10^11).
E[O] ≈ n × f
Therefore, the expected overlap was approximately 2 sequences, substantially less than the observed > 10,000.
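The figure of approximately 2 sequences follows from the values given in (1). A short sketch of the arithmetic, with an illustrative function name:

```python
def expected_overlap_uniform(n, total_possible):
    """Expected shared sequences when two repertoires of size n are each
    drawn uniformly from total_possible equally likely sequences."""
    f = n / total_possible  # chance that a given sequence is in one repertoire
    return n * f


# n of about 1e6 unique naive CD8+ sequences, 5e11 possible sequences.
uniform_overlap = expected_overlap_uniform(1e6, 5e11)  # about 2
```

The observed overlap of more than 10,000 sequences is therefore several thousand times larger than the uniform model predicts.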
(2) Estimate of expected overlap from the empirical distribution of sequences by number of insertions: The expected overlap O is the sum of the expected overlaps from the sequences with different numbers of insertions labeled by k.
Figure imgf000028_0001
The set of all TCRβ sequences with a fixed number k of insertions was generated explicitly (Table 1, Example 1), and its size was called T_k. An empirical determination of the fraction f_k of the number of TCRβ CDR3 sequences found in blood carrying k junctional insertions was made. Multiplying f_k × n ≡ n_k, where n is the total number of TCRβ CDR3 sequences in the naive CD8+ compartment as determined in (2), resulted in n_k, the number of unique TCRβ CDR3 sequences in the blood with fixed number of insertions k.
The estimate of the number of sequences with k insertions that overlap between two individuals was equivalent to the problem of drawing n_k elements twice from a total distribution with T_k elements and determining the number of matches. (The assumption was that each sequence with a fixed number of insertions is equally likely.) The expectation was determined from an approximately binomial distribution:
Figure imgf000028_0002
We calculate this using the generating function:
M(t) = Σ_x e^(tx) C(n_k, x) f_k^x (1 - f_k)^(n_k - x) = [f_k e^t + (1 - f_k)]^(n_k)
Figure imgf000029_0001
Inserting the empirical data, it was determined that E[O] ≈ 1.54 × 10^4, which is consistent with the observed pairwise overlap.
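The structure of the calculation in (2) can be sketched as a sum over insertion classes. The sketch assumes that drawing n_k sequences twice from T_k equally likely sequences gives an expected n_k × n_k / T_k matches (the mean of the approximately binomial distribution described above); the function name and all numerical inputs below are hypothetical and are not the empirical values used to obtain the reported result.

```python
def expected_overlap_by_insertions(n, fractions, set_sizes):
    """Sum expected overlaps over insertion classes k.

    n          total unique sequences per repertoire
    fractions  maps k to f_k, the fraction of sequences with k insertions
    set_sizes  maps k to T_k, the number of possible sequences with k insertions
    """
    total = 0.0
    for k, f_k in fractions.items():
        n_k = f_k * n  # sequences in this repertoire with k insertions
        total += n_k * n_k / set_sizes[k]  # binomial mean: n_k draws, p = n_k / T_k
    return total


# Hypothetical toy inputs, for illustration only.
toy_overlap = expected_overlap_by_insertions(
    n=100, fractions={0: 0.5, 1: 0.5}, set_sizes={0: 1e3, 1: 1e4}
)
```

Because T_k grows rapidly with k, the classes with few insertions dominate the sum, which is why the stratified estimate is far larger than the uniform one.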
Example 14: Estimation of the actual overlap between different TCRβ CDR3 repertoires
A lower bound was previously established for the total number of unique TCRβ CDR3 amino acid sequences expressed in the naive and memory CD8+ T cells circulating in the peripheral blood of a healthy adult (H. S. Robins, et al., Blood 114, 4099-4107 (2009)). Only a fraction of these sequences can be observed experimentally, because the volume of blood that can be sampled at one time represents a small fraction of the total blood volume. The sets of TCRβ CDR3 sequences that are observed experimentally in each donor are enriched for sequences expressed in CD8+ T cell clones that are common in the blood. Estimating the CDR3 sequence overlap between two entire T-cell repertoires must therefore take into account the shape of the empirically observed distribution to avoid over-estimation. To estimate the total number of TCRβ CDR3 sequences that are shared between the CD8+ T cell compartments of two individuals, a non-linear regression approach was used in which the observed overlap data were fitted with a simple two-parameter model, Y = aX^b, where X is the input variable described below, Y is the number of overlapping sequences, and (a, b) are parameters to be estimated from the data.
First, the unique TCRβ CDR3 amino acid sequences encoded by the in-frame, read-through TCRβ CDR3 nucleotide sequences observed in each donor were sorted in descending order according to their observed relative frequency. Let N_j be the total number of amino acid sequences in donor j (j = 1, 2), and let n_ij be the top (or most frequent) n_i sequences from donor j, where i indexes the ith sequence. Then define the ith input variable, X_i = n_i1 × n_i2, as the area equal to the maximum possible number of overlaps between n_i1 and n_i2 sequences. The outcome Y_i is the observed number of overlaps between the n_i1 sequences from donor 1 and the n_i2 sequences from donor 2. The model parameters (a, b) are chosen to minimize the error between the observed number of overlaps and the regression model using the non-linear least squares method. The previously identified lower bound of ~1 × 10^6 for the number of unique TCRβ CDR3 amino acid sequences in the naive CD8+ T cell compartment of a healthy adult was applied (Robins supra). To estimate the minimum overlap between the entire naive CD8+ repertoires of two donors, compute Y = aX^b for n_i = 1 × 10^6 and therefore X_i = n_i1 × n_i2 = 1 × 10^12. To estimate the overlap between the naive and memory CD8+ compartments of a single donor, however, use n_i1 = 1 × 10^6 for the naive compartment and n_i2 = 3 × 10^5 for the memory compartment, since the lower bound for the number of unique TCRβ CDR3 amino acid sequences in the memory CD8+ compartment is ~3 × 10^5 (2). The calculated overlaps between the naive and memory CD8+ TCRβ CDR3 sequence repertoires for the full set of 7!/(2!5!) = 21 pairwise comparisons are summarized in Figure 1.
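The two-parameter model Y = aX^b can be fitted and extrapolated with a short routine. Note that the sketch below substitutes ordinary least squares on log-transformed data for the non-linear least squares fit described in the text; the two approaches weight errors differently and generally give somewhat different parameters. The function names and data points are hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit Y = a * X**b by linear least squares on (log X, log Y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
         / sum((u - mean_x) ** 2 for u in lx))
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_overlap(a, b, n1, n2):
    """Extrapolated number of shared sequences for repertoire sizes n1 and n2."""
    return a * (n1 * n2) ** b


# Hypothetical points lying exactly on Y = 2 * X**0.5.
xs = [1e2, 1e4, 1e6, 1e8]
ys = [2 * x ** 0.5 for x in xs]
a_hat, b_hat = fit_power_law(xs, ys)
full_repertoire_overlap = predict_overlap(a_hat, b_hat, 1e6, 1e6)
```

With real data, X_i would be the product of the top-n_i list sizes from the two donors and Y_i the number of sequences those lists share.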
Example 15: Differential VJ Usage
The frequency with which specific Vβ-Jβ combinations were utilized was highly variable in each of the seven individuals. Although every possible Vβ-Jβ combination was observed, the frequency of specific combinations varied by more than 10,000-fold. Vβ-Jβ utilization was surprisingly consistent between individuals, especially for the rare Vβ-Jβ pairs, as reflected by the fact that the variance in Vβ-Jβ utilization was proportional to mean utilization. A fraction of the TCRβ CDR3 sequences in the genomic DNA from the naive and memory CD8+ T cells of each of the seven donors was predicted to generate out-of-frame TCRβ transcripts that do not encode functional TCRβ chains (Table 5). The Vβ-Jβ utilization in the out-of-frame CDR3 sequences was highly non-uniform and qualitatively similar to that observed for in-frame transcripts. The variability of Vβ-Jβ utilization in the out-of-frame CDR3 sequences cannot be attributed to positive or negative selection in the thymus of T cells bearing specific receptors, because these sequences do not generate proteins that participate in the selection process. The similarity in the utilization of specific Vβ-Jβ combinations in out-of-frame, nonfunctional and in-frame, functional TCRβ transcripts therefore suggests that the variability in Vβ-Jβ utilization in both sets of sequences is attributable, at least in part, to mechanisms that operate before the stage of thymic selection.
The observed frequencies of specific Vβ-Dβ-Jβ combinations suggest that rearrangement between Vβ and Dβ gene segments is random, while that between Dβ and Jβ gene segments is not. The apparent non-random association between specific Dβ and Jβ gene segments is likely attributable to the organization of the TCRβ locus, in which Dβ1 lies 5' of all 13 Jβ segments, while Dβ2 lies 3' of the 6 members of the Jβ1 cluster but 5' of the 7 members of the Jβ2 cluster. The Dβ1 segment is observed at roughly equal frequency with all 13 Jβ segments, while Dβ2 is much more frequently paired with members of the Jβ2 family than with members of the Jβ1 family. Dβ2 is observed with members of the Jβ1 family about a third (0.30 ± 0.05) as often as would be expected if the pairing were random.
Table 5
Figure imgf000031_0001
Example 16: Epstein Barr Virus exposure-specific clonotype sequences
A wide variety of pathogens are endemic to the human population, some of which are quite pervasive. A well studied example is the Epstein-Barr virus (EBV). A common, public TCR that responds to EBV in HLA B8 individuals has been described, with the Vβ CDR3 sequence CASSLGQAYEQYF, derived from V#, D# and J# with # inserted nucleotides (Argaet et al., J Exp Med 180:2335-40, 1994, hereby incorporated by reference in its entirety). A2 specific clonotype sequences found with high frequency were:
CSARDRTGNGYT (SEQ ID. NO:71)
CSARDRVGNTIY (SEQ ID. NO:72)
CSARIGVGNTIY (SEQ ID. NO:73)
CSVGTGGTNEKLF (SEQ ID. NO:74)
A B8 specific sequence found with high frequency was CASSLGQAYEQYF (SEQ ID. NO: 1167). Of the seven donors, this TCRβ clonotype was observed in the naive CD8+ compartment of two individuals, but in the memory compartment of only one donor, where it accounted for over 1% of all clonotype sequences. The donors' sequences were also compared with results from a study that identified 50 TCRs that interact with known EBV epitopes presented by HLA-A2 tetramers. Four individuals matched at least one of these sequences in their naive compartments. The two donors in our study that carried HLA A2 were the only individuals with matches to the public EBV responses in their memory compartment, and each of these two matched three of the 50 sequences. Thus, a total of four to seven HLA-restricted, public EBV responses were observed in our sample, and few had fewer than five inserted nucleotides.
Mice deficient for terminal deoxynucleotidyl transferase, the enzyme that catalyzes the template-independent insertion of nucleotides at the junctions, have 10-fold less diversity in their TCR CDR3 repertoires, with few insertions, yet these mice appear healthy, make efficient and specific immune responses, and display no increased susceptibility to infection (Gilfillan, et al, Eur J Immunol 25, 3115-3122 (1995); Cabaniols, et al, J Exp Med 194, 1385-1390 (2001)). Sequences with fewer insertions and deletions have receptor sequences closer to germ line. One possibility for the increased number of sequences closer to germ line is that they are created multiple times during T cell development, and that Vβ, Dβ, and Jβ segment sequences contributing to recurrently generated TCRs may be subject to evolutionary pressures favoring sequences recognizing antigens from common pathogens, as these sequences are present in the germline. Since germ line sequences are shared between people, it was hypothesized that shared TCRβ chains are created by TCRs with a small number of insertions and deletions. Components of the CD8+ T cell response to ubiquitous pathogens such as Epstein Barr virus (EBV) are characterized by highly conserved TCRβ CDR3 amino acid sequences that are found in multiple individuals and encoded by nucleotide sequences with few junctional insertions (Venturi, et al. Nat Rev Immunol 8, 231-238 (2008); Argaet, et al, J Exp Med 180, 2335-2340 (1994); Venturi, et al, J Immunol 181, 7853-7862 (2008)).
Example 17: Correlations with Disease
To find a correlation with disease, 12 "public" TCRβ CDR3 sequences that have been associated with the CD8+ response to EBV in individuals who express either HLA-A*0201 or HLA-B*0801 were investigated. Five HLA-A*0201-associated, EBV-specific CDR3 sequences were detected in the memory CD8+ compartments of donors 1 and/or 3, both of whom are HLA-A*0201+, and an HLA-B*0801-associated, EBV-specific CDR3 sequence was detected in the memory compartment of donor 7, who is HLA-B*0801+. None of these responses were detected in the other four donors, all of whom were HLA-A*0201− and HLA-B*0801−. The observation of the HLA-A*0201- and HLA-B*0801-associated, EBV-specific CDR3 sequences only in the three donors expressing one of the associated HLA alleles was statistically significant (P = 0.0002 by two-tailed Fisher exact test).
Evaluation of correlation between observation of "public" EBV-associated TCRβ CDR3 sequences and expression of the corresponding class I MHC allele.
A two-sided Fisher Exact Test was utilized to test the hypothesis that there is a correlation in our sequence data between the observation of 12 "public" (i.e., found in multiple individuals) EBV-associated TCRβ CDR3 amino acid sequences reported in the literature and expression of their associated class I MHC restricting elements. Previous studies have identified at least 11 public TCRβ CDR3 amino acid sequences used by CD8+ T cells specific for EBV-encoded peptides presented by HLA-A*0201 (Venturi, et al, J Immunol 181, 7853-7862 (2008)), and one public CDR3 sequence specific for an EBV-encoded peptide presented by HLA-B*0801 (Argaet supra). The seven individuals studied included two individuals expressing HLA-A*0201 and one individual expressing HLA-B*0801 (Table 1, Example 1). Thus, there were 2*11=22 CDR3 sequence:HLA-A*0201 combinations and one CDR3 sequence:HLA-B*0801 combination possible in the dataset, for a total of 23 combinations, that would support a correlation between the observation of a public EBV-specific CDR3 sequence and expression of the associated MHC restricting allele. Similarly, there were 5*11=55 possible CDR3 sequence:HLA-A*0201 combinations and 6*1=6 possible CDR3 sequence:HLA-B*0801 combinations, for a total of 61 combinations, that would not support a correlation. Five of the public EBV-specific CDR3 sequences associated with HLA-A*0201 were observed in the memory CD8+ compartments of one or both of the HLA-A*0201-positive donors but in none of the five HLA-A*0201-negative donors, and the public EBV-specific CDR3 sequence associated with HLA-B*0801 was observed in the memory CD8+ compartment of the sole HLA-B*0801-positive donor but in none of the HLA-B*0801-negative donors. Thus, 6 of the 23 possible public CDR3 sequence:MHC allele combinations that would support a correlation, and none of the 61 possible public CDR3 sequence:MHC allele combinations that would not support a correlation, were observed in the data.
To test the null hypothesis of no correlation, the Fisher Exact Test was used; it yields a p-value of 0.0002, which provides confidence to reject the null hypothesis.
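The reported p-value can be checked from the counts given above. The following is a minimal sketch, assuming the 2x2 table is laid out as supporting versus non-supporting combinations (rows) against observed versus not observed (columns). The function name `fisher_exact_two_sided` is illustrative and is not taken from the source; the test is computed directly from the hypergeometric distribution rather than with any particular statistics package.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

# Counts taken from the text above.
supporting = 2 * 11 + 1 * 1       # 23 combinations that would support a correlation
non_supporting = 5 * 11 + 6 * 1   # 61 combinations that would not
observed_supporting = 6
observed_non_supporting = 0

p_value = fisher_exact_two_sided(
    observed_supporting, supporting - observed_supporting,
    observed_non_supporting, non_supporting - observed_non_supporting,
)
print(f"{p_value:.4f}")  # prints 0.0002
```

Under this layout the observed table is the most extreme one possible, so the two-sided p-value equals the probability of that single table, which rounds to the 0.0002 stated in the text.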
Table 6: False positive rates for sample size
Number of cases    Fraction with "public" BCR    N sharing "public" BCR    P(false positive)
30                 25%                           7.5                       0.0119
50                 25%                           12.5                      0.0172
70                 25%                           17.5                      0.1946
30                 50%                           15                        1.43E-06
50                 50%                           25                        2.98E-06
70                 50%                           35                        3.79E-04
Table 6 presents the probability of finding a false positive as a function of signal size. For the expected range of signal, a proposed study is sufficiently powered to limit false positives and to detect any sequence that is found in 20% or more of the cases. Note that Table 6 is the marginal calculation for each disease. Taking all the diseases together, a multiple hypothesis test correction is needed. The Benjamini-Hochberg FDR analysis is sufficient.
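For illustration, the Benjamini-Hochberg step-up procedure mentioned above can be sketched as follows. This is a generic textbook version, not code from the source; the function name, the FDR level `alpha=0.05`, and the example p-values are all assumptions chosen only to show the mechanics.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k (step-up rule).
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected

# Hypothetical per-disease p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.30]
decisions = benjamini_hochberg(example_p, alpha=0.05)
print(decisions)  # prints [True, True, False, False, False]
```

Because the rule is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, provided a larger-ranked p-value meets its threshold.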
III. Additional Testing
Example 18: Diabetes
A study in Type 1 diabetic patients was done using protocols described above. Type 1 diabetes (T1D) associates strongly with the class II HLA-DRB1*03/04 genotype, so we sorted for CD4 memory (CD4+, CD45RO+) cells and screened them for public sequences shared among three cases of T1D. These "public" TCR sequences were found in fewer than 5 of 10 HLA-matched controls without T1D. DNA from all 13 donors was sequenced for both memory and naive genotypes. The clonotype sequences that distinguish the T1D cases from the controls are shown as SEQ ID. NOS: 75-763.
Example 19: Multiple Sclerosis
Similar to the example above, CD4 memory cells were sorted from three MS cases carrying the DRB1*1501 allele, as well as three HLA-matched controls, to identify public sequences that might be enriched among MS cases. A set of "public" TCR sequences was selected as those found in all three cases and in no more than one of the three controls. DNA from all donors was sequenced for both memory and naive genotypes. The clonotype sequences that distinguish the MS cases from the controls are shown as SEQ ID. NOS: 764-1166.
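The selection rule used in Examples 18 and 19 (sequences present in all cases and in at most a set number of controls) can be sketched as a simple counting filter. This is a hypothetical illustration and not the authors' pipeline: the function name, its parameters, and the placeholder clonotype labels are assumptions, and each repertoire is assumed to be available as a collection of CDR3 sequence strings.

```python
from collections import Counter

def public_clonotypes(case_repertoires, control_repertoires, min_cases, max_controls):
    """Return clonotypes seen in at least min_cases cases and at most max_controls controls."""
    # Count each clonotype once per donor, regardless of its frequency within that donor.
    case_counts = Counter(seq for rep in case_repertoires for seq in set(rep))
    control_counts = Counter(seq for rep in control_repertoires for seq in set(rep))
    return {
        seq for seq, n in case_counts.items()
        if n >= min_cases and control_counts[seq] <= max_controls
    }

# Placeholder labels stand in for CDR3 sequences; they are not real data.
cases = [{"clonotype_A", "clonotype_B", "clonotype_C"},
         {"clonotype_A", "clonotype_B"},
         {"clonotype_A", "clonotype_B", "clonotype_D"}]
controls = [{"clonotype_B"},
            {"clonotype_B", "clonotype_C"},
            {"clonotype_E"}]

# Rule from the multiple sclerosis example: all three cases, no more than one of three controls.
selected = public_clonotypes(cases, controls, min_cases=3, max_controls=1)
print(selected)  # prints {'clonotype_A'}
```

For the diabetes example, the same function would be called with `min_cases=3` and `max_controls=4`, corresponding to "fewer than 5 of 10" controls.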

Claims

What is Claimed:
1. A method for identifying a biomarker for a disease comprising:
a) providing isolated polynucleotide sequences from immune cells from a group of patients with the disease;
b) performing a nucleotide amplification reaction to produce a set of clonotype sequences;
c) identifying clonotype sequences enriched within the group of patients;
d) providing isolated polynucleotide sequences from immune cells from a group of normal subjects without the disease, and amplifying the polynucleotides;
e) removing sequences present in the normal subject group, which are obtained in step d), from the exposure-specific clonotype sequences, which are obtained in step c).
2. The method of claim 1, wherein the polynucleotide sequences are rearranged genomic sequences.
3. The method according to claim 1, wherein the nucleotide amplification reaction comprises:
(i) a multiplicity of V-segment primers, wherein each primer comprises a sequence that is complementary to a single functional V-segment or a small family of V-segments; and
(ii) a multiplicity of J-segment primers, wherein each primer comprises a sequence that is complementary to a J segment;
wherein the V-segment and J-segment primers amplify a TCR CDR3 region.
4. The method according to claim 3, wherein the V-segments are selected from the group consisting of Vα, Vβ, Vγ and Vδ, and the J-segments are selected from the group consisting of
Figure imgf000035_0001
5. The method of claim 3, wherein the immune cell samples are selected from the group consisting of CD45RO+, CD45RAint/neg CD8+ T cells and CD45RO− CD45RAhi CD62Lhi CD8+ T cells.
6. The method of claim 5, wherein the CD8 cells are CD45RO+, CD45RAint/neg CD8+ T cells.
7. The method of claim 5, wherein the T cells share one or more HLA alleles.
8. The method according to claim 1, wherein the nucleotide amplification reaction comprises:
(i) a multiplicity of V-segment primers, wherein each primer comprises a sequence that is complementary to a single functional V-segment or a small family of V-segments; and
(ii) a multiplicity of J primers, wherein each primer comprises a sequence that is complementary to a J-segment,
wherein the V-segment and J-segment primers amplify an IGH, IGL or IGK CDR3 region.
9. The method of claim 3 or 8, wherein the V-segment primers (forward primers) are anchored at a position between 40 and 60 base pairs 5' of the recombination signal sequence (RSS), within the V segment.
10. The method of claim 3 or 8, wherein the J-segment primers (reverse primers) are at a position about 30 base pairs 3' of the J gene RSS site.
11. The method of claim 1, wherein the exposure-specific clonotype sequences have an insertion of less than six nucleotides.
12. The method of claim 1, wherein the disease is selected from the group consisting of an autoimmune disease, an inflammatory disease, an immune deficiency, a bacterial infection, a viral infection, a fungal infection, and a parasitic infection.
13. The method of claim 1, further comprising an additional step prior to step (a) of isolating polynucleotide sequences from the immune cell samples.
14. The method of claim 13, wherein the polynucleotide sequences are isolated from an immune cell sample, wherein the sample is a tissue comprising hematopoietic lineage cells.
15. A biomarker produced by the method of claim 1.
16. A polypeptide encoded by the exposure-specific clonotype sequence of claim 15.
17. The polypeptide of claim 16, wherein the polypeptide is an antibody.
18. A diagnostic kit for detecting a disease or risk for a disease comprising one or more biomarkers produced by claim 1.
19. The diagnostic kit of claim 18, wherein the biomarker is selected from the group consisting of a polynucleotide biomarker, a labeled polypeptide biomarker, and an antibody biomarker.
20. The diagnostic kit of claim 18, wherein the biomarker detects a disease or risk of a disease selected from the group consisting of an autoimmune disease, an inflammatory disease, an immune deficiency, a bacterial infection, a viral infection, a fungal infection, a parasite infection, and a prenatal disease.
21. A method for detecting a disease or a risk for a disease in a subject comprising:
a) selecting a biomarker for the disease according to the method of claim 1, wherein the biomarker consists of at least one exposure-specific clonotype sequence;
b) providing isolated polynucleotide sequences from immune cells of the subject, and amplifying the polynucleotides,
c) determining if the biomarker is present in the amplified polynucleotides of the subject.
22. The method of claim 21, further comprising determining that the presence of the biomarker in the amplified polynucleotides of the subject is indicative of the disease or the risk of the disease.
23. The method of claim 21, wherein the clonotype sequences are genomic sequences.
24. A method for detecting a disease or a risk of a disease in a subject comprising:
a) selecting a biomarker for the disease according to the method of claim 16;
b) detecting the presence of the biomarker in the polypeptide sequences of the subject.
25. The method of claim 24, further comprising determining that the presence of the biomarker in the polypeptide sequences of the subject is indicative of the disease or the risk of the disease.
26. A method for selecting a biomarker for diabetes or a risk for Type I diabetes comprising selecting a biomarker according to the method of claim 1, wherein the exposure-specific clonotype sequences are selected from the group consisting of SEQ ID. NOS: 75 to 763.
27. A method for detecting diabetes Type I or a risk for diabetes Type I in a subject comprising:
a) selecting a disease-specific biomarker according to claim 26, wherein the biomarker consists of a disease specific clonotype sequence or a panel of such sequences;
b) providing isolated polynucleotide sequences from immune cells of the subject, and amplifying the polynucleotides,
c) determining if the diagnostic clonotype sequences are present in the amplified polynucleotides of the subject.
28. A method for detecting multiple sclerosis or a risk for multiple sclerosis in a subject comprising selecting a biomarker according to the method of claim 1, wherein the exposure-specific clonotype sequences are selected from the group consisting of SEQ ID. NOS: 764 to 1166.
29. A method for detecting multiple sclerosis or a risk for multiple sclerosis in a subject comprising:
a) selecting a disease-specific biomarker according to claim 28, wherein the biomarker consists of a disease specific clonotype sequence or a panel of such sequences;
b) providing isolated polynucleotide sequences from immune cells of the subject, and amplifying the polynucleotides,
c) determining if the diagnostic clonotype sequences are present in the amplified polynucleotides of the subject.
PCT/US2011/026373 2010-02-25 2011-02-25 Use of tcr clonotypes as biomarkers for disease WO2011106738A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US30826110P 2010-02-25 2010-02-25
US61/308,261 2010-02-25

Publications (2)

Publication Number Publication Date
WO2011106738A2 true WO2011106738A2 (en) 2011-09-01
WO2011106738A3 WO2011106738A3 (en) 2011-12-01

Family

ID=43901343

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/026373 WO2011106738A2 (en) 2010-02-25 2011-02-25 Use of tcr clonotypes as biomarkers for disease

Country Status (1)

Country Link
WO (1) WO2011106738A2 (en)

Cited By (62)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8236503B2 (en) 2008-11-07 2012-08-07 Sequenta, Inc. Methods of monitoring conditions by sequence analysis
EP2567226A1 (en) * 2010-05-06 2013-03-13 Sequenta, Inc. Monitoring health and disease status using clonotype profiles
WO2013086450A1 (en) 2011-12-09 2013-06-13 Adaptive Biotechnologies Corporation Diagnosis of lymphoid malignancies and minimal residual disease detection
WO2013169957A1 (en) 2012-05-08 2013-11-14 Adaptive Biotechnologies Corporation Compositions and method for measuring and calibrating amplification bias in multiplexed pcr reactions
WO2013188831A1 (en) 2012-06-15 2013-12-19 Adaptive Biotechnologies Corporation Uniquely tagged rearranged adaptive immune receptor genes in a complex gene set
US8628927B2 (en) 2008-11-07 2014-01-14 Sequenta, Inc. Monitoring health and disease status using clonotype profiles
US8691510B2 (en) 2008-11-07 2014-04-08 Sequenta, Inc. Sequence analysis of complex amplicons
WO2014055561A1 (en) 2012-10-01 2014-04-10 Adaptive Biotechnologies Corporation Immunocompetence assessment by adaptive immune receptor diversity and clonality characterization
US8748103B2 (en) 2008-11-07 2014-06-10 Sequenta, Inc. Monitoring health and disease status using clonotype profiles
US20140186848A1 (en) * 2011-10-21 2014-07-03 Fred Hutchinson Cancer Research Center Quantification of Adaptive Immune Cell Genomes in a Complex Mixture of Cells
WO2014145992A1 (en) 2013-03-15 2014-09-18 Adaptive Biotechnologies Corporation Uniquely tagged rearranged adaptive immune receptor genes in a complex gene set
CN104560978A (en) * 2015-01-20 2015-04-29 中国人民解放军第三军医大学 Multiplex-polymerase chain reaction (PCR) primer and method for constructing human B cell receptor (BCR) heavy-chain library based on high-throughput sequencing
US9043160B1 (en) 2009-11-09 2015-05-26 Sequenta, Inc. Method of determining clonotypes and clonotype profiles
WO2016069886A1 (en) 2014-10-29 2016-05-06 Adaptive Biotechnologies Corporation Highly-multiplexed simultaneous detection of nucleic acids encoding paired adaptive immune receptor heterodimers from many samples
US9365901B2 (en) 2008-11-07 2016-06-14 Adaptive Biotechnologies Corp. Monitoring immunoglobulin heavy chain evolution in B-cell acute lymphoblastic leukemia
US9394567B2 (en) 2008-11-07 2016-07-19 Adaptive Biotechnologies Corporation Detection and quantification of sample contamination in immune repertoire analysis
WO2016138122A1 (en) 2015-02-24 2016-09-01 Adaptive Biotechnologies Corp. Methods for diagnosing infectious disease and determining hla status using immune repertoire sequencing
WO2016144996A1 (en) * 2015-03-09 2016-09-15 Cb Biotechnologies, Inc. Method for identifying disease-associated cdr3 patterns in an immune repertoire
WO2016161054A1 (en) * 2015-04-01 2016-10-06 Pharmacyclics Llc Massive parallel primer dimer-mediated multiplexed single cell-based amplification for concurrent evaluation of multiple target sequences in complex cell mixtures
WO2016161273A1 (en) 2015-04-01 2016-10-06 Adaptive Biotechnologies Corp. Method of identifying human compatible t cell receptors specific for an antigenic target
US9499865B2 (en) 2011-12-13 2016-11-22 Adaptive Biotechnologies Corp. Detection and measurement of tissue-infiltrating lymphocytes
US9506119B2 (en) 2008-11-07 2016-11-29 Adaptive Biotechnologies Corp. Method of sequence determination using sequence tags
US9528160B2 (en) 2008-11-07 2016-12-27 Adaptive Biotechnolgies Corp. Rare clonotypes and uses thereof
US9708657B2 (en) 2013-07-01 2017-07-18 Adaptive Biotechnologies Corp. Method for generating clonotype profiles using sequence tags
US9809813B2 (en) 2009-06-25 2017-11-07 Fred Hutchinson Cancer Research Center Method of measuring adaptive immunity
US10066265B2 (en) 2014-04-01 2018-09-04 Adaptive Biotechnologies Corp. Determining antigen-specific t-cells
US10077478B2 (en) 2012-03-05 2018-09-18 Adaptive Biotechnologies Corp. Determining paired immune receptor chains from frequency matched subunits
US10150996B2 (en) 2012-10-19 2018-12-11 Adaptive Biotechnologies Corp. Quantification of adaptive immune cell genomes in a complex mixture of cells
US10246701B2 (en) 2014-11-14 2019-04-02 Adaptive Biotechnologies Corp. Multiplexed digital quantitation of rearranged lymphoid receptors in a complex mixture
US10323276B2 (en) 2009-01-15 2019-06-18 Adaptive Biotechnologies Corporation Adaptive immunity profiling and methods for generation of monoclonal antibodies
EP3498866A1 (en) 2014-11-25 2019-06-19 Adaptive Biotechnologies Corp. Characterization of adaptive immune response to vaccination or infection using immune repertoire sequencing
US10385475B2 (en) 2011-09-12 2019-08-20 Adaptive Biotechnologies Corp. Random array sequencing of low-complexity libraries
US10428325B1 (en) 2016-09-21 2019-10-01 Adaptive Biotechnologies Corporation Identification of antigen-specific B cell receptors
CN112143777A (en) * 2020-08-18 2020-12-29 北京臻知医学科技有限责任公司 Primer group for constructing CDR3 region high-throughput sequencing library of human TCR beta and application thereof
US10941396B2 (en) 2012-02-27 2021-03-09 Becton, Dickinson And Company Compositions and kits for molecular counting
US10954570B2 (en) 2013-08-28 2021-03-23 Becton, Dickinson And Company Massively parallel single cell analysis
US11220685B2 (en) 2016-05-31 2022-01-11 Becton, Dickinson And Company Molecular indexing of internal sequences
USRE48913E1 (en) 2015-02-27 2022-02-01 Becton, Dickinson And Company Spatially addressable molecular barcoding
US11248253B2 (en) 2014-03-05 2022-02-15 Adaptive Biotechnologies Corporation Methods using randomer-containing synthetic molecules
US11254980B1 (en) 2017-11-29 2022-02-22 Adaptive Biotechnologies Corporation Methods of profiling targeted polynucleotides while mitigating sequencing depth requirements
US11319583B2 (en) 2017-02-01 2022-05-03 Becton, Dickinson And Company Selective amplification using blocking oligonucleotides
US11332776B2 (en) 2015-09-11 2022-05-17 Becton, Dickinson And Company Methods and compositions for library normalization
US11365409B2 (en) 2018-05-03 2022-06-21 Becton, Dickinson And Company Molecular barcoding on opposite transcript ends
US11390921B2 (en) 2014-04-01 2022-07-19 Adaptive Biotechnologies Corporation Determining WT-1 specific T cells and WT-1 specific T cell receptors (TCRs)
US11390914B2 (en) 2015-04-23 2022-07-19 Becton, Dickinson And Company Methods and compositions for whole transcriptome amplification
US11460468B2 (en) 2016-09-26 2022-10-04 Becton, Dickinson And Company Measurement of protein expression using reagents with barcoded oligonucleotide sequences
US11492660B2 (en) 2018-12-13 2022-11-08 Becton, Dickinson And Company Selective extension in single cell whole transcriptome analysis
US11525157B2 (en) 2016-05-31 2022-12-13 Becton, Dickinson And Company Error correction in amplification of samples
US11535882B2 (en) 2015-03-30 2022-12-27 Becton, Dickinson And Company Methods and compositions for combinatorial barcoding
US11639517B2 (en) 2018-10-01 2023-05-02 Becton, Dickinson And Company Determining 5′ transcript sequences
US11649497B2 (en) 2020-01-13 2023-05-16 Becton, Dickinson And Company Methods and compositions for quantitation of proteins and RNA
US11661631B2 (en) 2019-01-23 2023-05-30 Becton, Dickinson And Company Oligonucleotides associated with antibodies
US11661625B2 (en) 2020-05-14 2023-05-30 Becton, Dickinson And Company Primers for immune repertoire profiling
EP4201954A1 (en) * 2021-12-22 2023-06-28 Christian-Albrechts-Universität zu Kiel Proteins and t-cells involved in chronic inflammatory diseases
US11739443B2 (en) 2020-11-20 2023-08-29 Becton, Dickinson And Company Profiling of highly expressed and lowly expressed proteins
US11773436B2 (en) 2019-11-08 2023-10-03 Becton, Dickinson And Company Using random priming to obtain full-length V(D)J information for immune repertoire sequencing
US11773441B2 (en) 2018-05-03 2023-10-03 Becton, Dickinson And Company High throughput multiomics sample analysis
US11845986B2 (en) 2016-05-25 2023-12-19 Becton, Dickinson And Company Normalization of nucleic acid libraries
US11932849B2 (en) 2018-11-08 2024-03-19 Becton, Dickinson And Company Whole transcriptome analysis of single cells using random priming
US11932901B2 (en) 2020-07-13 2024-03-19 Becton, Dickinson And Company Target enrichment using nucleic acid probes for scRNAseq
US11939622B2 (en) 2019-07-22 2024-03-26 Becton, Dickinson And Company Single cell chromatin immunoprecipitation sequencing assay
US11970737B2 (en) 2019-08-26 2024-04-30 Becton, Dickinson And Company Digital counting of individual molecules by stochastic attachment of diverse labels

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4366241A (en) 1980-08-07 1982-12-28 Syva Company Concentrating zone method in heterogeneous immunoassays
US4376110A (en) 1980-08-04 1983-03-08 Hybritech, Incorporated Immunometric assays using monoclonal antibodies
US4517288A (en) 1981-01-23 1985-05-14 American Hospital Supply Corp. Solid phase system for ligand assay
US4837168A (en) 1985-12-23 1989-06-06 Janssen Pharmaceutica N.V. Immunoassay using colorable latex particles
WO2010151416A1 (en) 2009-06-25 2010-12-29 Fred Hutchinson Cancer Research Center Method of measuring adaptive immunity

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7375211B2 (en) * 2005-11-18 2008-05-20 Kou Zhong C Method for detection and quantification of T-cell receptor Vβ repertoire

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4376110A (en) 1980-08-04 1983-03-08 Hybritech, Incorporated Immunometric assays using monoclonal antibodies
US4366241A (en) 1980-08-07 1982-12-28 Syva Company Concentrating zone method in heterogeneous immunoassays
US4366241B1 (en) 1980-08-07 1988-10-18
US4517288A (en) 1981-01-23 1985-05-14 American Hospital Supply Corp. Solid phase system for ligand assay
US4837168A (en) 1985-12-23 1989-06-06 Janssen Pharmaceutica N.V. Immunoassay using colorable latex particles
WO2010151416A1 (en) 2009-06-25 2010-12-29 Fred Hutchinson Cancer Research Center Method of measuring adaptive immunity

Non-Patent Citations (28)

* Cited by examiner, † Cited by third party
Title
"Antibodies, A Laboratory Manual", 1988, COLD SPRING HARBOR LABORATORY
"Current Protocols in Molecular Biology", vol. 1, 1994, JOHN WILEY & SONS, INC.
ARGAET ET AL., J EXP MED, vol. 180, 1994, pages 2335 - 40
ARGAET, J EXP MED, vol. 180, 1994, pages 2335 - 2340
CABANIOLS, J EXP MED, vol. 194, 2001, pages 1385 - 1390
DIAMANDIS: "Immunoassay", 1996, ACADEMIC PRESS, INC.
FREEMAN ET AL., GENOME RES, vol. 19, 2009, pages 1817 - 1824
GILFILLAN ET AL., EUR J IMMUNOL, vol. 25, 1995, pages 3115 - 3122
H. S. ROBINS ET AL., BLOOD, vol. 114, 2009, pages 4099 - 4107
KIBBE, NUCLEIC ACID RES., vol. 35, no. 2, 2007, pages 43 - 46
KWOH ET AL., AM. BIOTECHNOL. LAB., vol. 8, 1989, pages 14
KWOH ET AL., PROC. NATL. ACAD. SCI. USA, vol. 86, 1997, pages 1173 - 1177
LEFRANC, M.-P. ET AL., IN SILICO BIOL., vol. 5, 2004, pages 0006
LEFRANC, M.-P. ET AL., NUCLEIC ACIDS RES., vol. 33, 2005, pages D593 - D597
LEFRANC, M.-P. ET AL., NUCLEIC ACIDS RESEARCH, vol. 27, 1999, pages 209 - 212
LEFRANC, M.-P. ET AL., NUCLEIC ACIDS RESEARCH, vol. 37, 2009, pages D1006 - D1012
LEFRANC, M.-P., NUCLEIC ACIDS RES., vol. 31, 2003, pages 307 - 310
LEFRANC, M.-P., NUCLEIC ACIDS RESEARCH, vol. 29, 2001, pages 207 - 209
LIZARDI ET AL., BIOTECHNOLOGY, vol. 6, 1988, pages 1197 - 1202
MALEK ET AL., METHODS MOL. BIOL., vol. 28, 1994, pages 253 - 260
ROBINS ET AL., BLOOD, vol. 114, 2009, pages 4099 - 4107
RUIZ, M. ET AL., NUCLEIC ACIDS RESEARCH, vol. 28, 2000, pages 219 - 221
SAMBROOK ET AL.: "Molecular Cloning: A laboratory Manual", 1989
VENTURI ET AL., J IMMUNOL, vol. 181, 2008, pages 7853 - 7862
VENTURI ET AL., J IMMUNOL, vol. 181, 2008, pages 7853 - 7862
VENTURI ET AL., NAT REV IMMUNOL, vol. 8, 2008, pages 231 - 238
VENTURI, NAT REV IMMUNOL, vol. 8, 2008, pages 231 - 238
WANG ET AL., PROC NATL ACAD SCI USA, vol. 107, 2010, pages 1518 - 1523

Cited By (116)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10246752B2 (en) 2008-11-07 2019-04-02 Adaptive Biotechnologies Corp. Methods of monitoring conditions by sequence analysis
US9365901B2 (en) 2008-11-07 2016-06-14 Adaptive Biotechnologies Corp. Monitoring immunoglobulin heavy chain evolution in B-cell acute lymphoblastic leukemia
US11001895B2 (en) 2008-11-07 2021-05-11 Adaptive Biotechnologies Corporation Methods of monitoring conditions by sequence analysis
US8507205B2 (en) 2008-11-07 2013-08-13 Sequenta, Inc. Single cell analysis by polymerase cycling assembly
US8795970B2 (en) 2008-11-07 2014-08-05 Sequenta, Inc. Methods of monitoring conditions by sequence analysis
US10865453B2 (en) 2008-11-07 2020-12-15 Adaptive Biotechnologies Corporation Monitoring health and disease status using clonotype profiles
US10760133B2 (en) 2008-11-07 2020-09-01 Adaptive Biotechnologies Corporation Monitoring health and disease status using clonotype profiles
US8628927B2 (en) 2008-11-07 2014-01-14 Sequenta, Inc. Monitoring health and disease status using clonotype profiles
US8691510B2 (en) 2008-11-07 2014-04-08 Sequenta, Inc. Sequence analysis of complex amplicons
US10519511B2 (en) 2008-11-07 2019-12-31 Adaptive Biotechnologies Corporation Monitoring health and disease status using clonotype profiles
US8748103B2 (en) 2008-11-07 2014-06-10 Sequenta, Inc. Monitoring health and disease status using clonotype profiles
US8236503B2 (en) 2008-11-07 2012-08-07 Sequenta, Inc. Methods of monitoring conditions by sequence analysis
US9394567B2 (en) 2008-11-07 2016-07-19 Adaptive Biotechnologies Corporation Detection and quantification of sample contamination in immune repertoire analysis
US11021757B2 (en) 2008-11-07 2021-06-01 Adaptive Biotechnologies Corporation Monitoring health and disease status using clonotype profiles
US9528160B2 (en) 2008-11-07 2016-12-27 Adaptive Biotechnolgies Corp. Rare clonotypes and uses thereof
US10155992B2 (en) 2008-11-07 2018-12-18 Adaptive Biotechnologies Corp. Monitoring health and disease status using clonotype profiles
US10266901B2 (en) 2008-11-07 2019-04-23 Adaptive Biotechnologies Corp. Methods of monitoring conditions by sequence analysis
US9347099B2 (en) 2008-11-07 2016-05-24 Adaptive Biotechnologies Corp. Single cell analysis by polymerase cycling assembly
US9416420B2 (en) 2008-11-07 2016-08-16 Adaptive Biotechnologies Corp. Monitoring health and disease status using clonotype profiles
US9523129B2 (en) 2008-11-07 2016-12-20 Adaptive Biotechnologies Corp. Sequence analysis of complex amplicons
US9217176B2 (en) 2008-11-07 2015-12-22 Sequenta, Llc Methods of monitoring conditions by sequence analysis
US9228232B2 (en) 2008-11-07 2016-01-05 Sequenta, LLC. Methods of monitoring conditions by sequence analysis
US9512487B2 (en) 2008-11-07 2016-12-06 Adaptive Biotechnologies Corp. Monitoring health and disease status using clonotype profiles
US9506119B2 (en) 2008-11-07 2016-11-29 Adaptive Biotechnologies Corp. Method of sequence determination using sequence tags
US10323276B2 (en) 2009-01-15 2019-06-18 Adaptive Biotechnologies Corporation Adaptive immunity profiling and methods for generation of monoclonal antibodies
US9809813B2 (en) 2009-06-25 2017-11-07 Fred Hutchinson Cancer Research Center Method of measuring adaptive immunity
US11214793B2 (en) 2009-06-25 2022-01-04 Fred Hutchinson Cancer Research Center Method of measuring adaptive immunity
US9043160B1 (en) 2009-11-09 2015-05-26 Sequenta, Inc. Method of determining clonotypes and clonotype profiles
EP3144673B1 (en) 2010-05-06 2019-07-10 Adaptive Biotechnologies Corporation Monitoring lymphoid neoplasm status using clonotype profiles
EP3144673A1 (en) * 2010-05-06 2017-03-22 Adaptive Biotechnologies Corporation Monitoring lymphoid neoplasm status using clonotype profiles
EP2567226A4 (en) * 2010-05-06 2013-08-28 Sequenta Inc Monitoring health and disease status using clonotype profiles
EP2567226A1 (en) * 2010-05-06 2013-03-13 Sequenta, Inc. Monitoring health and disease status using clonotype profiles
US10385475B2 (en) 2011-09-12 2019-08-20 Adaptive Biotechnologies Corp. Random array sequencing of low-complexity libraries
US20140186848A1 (en) * 2011-10-21 2014-07-03 Fred Hutchinson Cancer Research Center Quantification of Adaptive Immune Cell Genomes in a Complex Mixture of Cells
US20150051089A1 (en) * 2011-10-21 2015-02-19 Adaptive Biotechnologies Corporation Quantification of Adaptive Immune Cell Genomes in a Complex Mixture of Cells
US9279159B2 (en) 2011-10-21 2016-03-08 Adaptive Biotechnologies Corporation Quantification of adaptive immune cell genomes in a complex mixture of cells
US9181590B2 (en) * 2011-10-21 2015-11-10 Adaptive Biotechnologies Corporation Quantification of adaptive immune cell genomes in a complex mixture of cells
US9181591B2 (en) * 2011-10-21 2015-11-10 Adaptive Biotechnologies Corporation Quantification of adaptive immune cell genomes in a complex mixture of cells
WO2013086450A1 (en) 2011-12-09 2013-06-13 Adaptive Biotechnologies Corporation Diagnosis of lymphoid malignancies and minimal residual disease detection
EP3388535A1 (en) 2011-12-09 2018-10-17 Adaptive Biotechnologies Corporation Diagnosis of lymphoid malignancies and minimal residual disease detection
US9824179B2 (en) 2011-12-09 2017-11-21 Adaptive Biotechnologies Corp. Diagnosis of lymphoid malignancies and minimal residual disease detection
EP3904536A1 (en) 2011-12-09 2021-11-03 Adaptive Biotechnologies Corporation Diagnosis of lymphoid malignancies and minimal residual disease detection
AU2012347460B2 (en) * 2011-12-09 2017-05-25 Adaptive Biotechnologies Corporation Diagnosis of lymphoid malignancies and minimal residual disease detection
US9499865B2 (en) 2011-12-13 2016-11-22 Adaptive Biotechnologies Corp. Detection and measurement of tissue-infiltrating lymphocytes
US11634708B2 (en) 2012-02-27 2023-04-25 Becton, Dickinson And Company Compositions and kits for molecular counting
US10941396B2 (en) 2012-02-27 2021-03-09 Becton, Dickinson And Company Compositions and kits for molecular counting
US10077478B2 (en) 2012-03-05 2018-09-18 Adaptive Biotechnologies Corp. Determining paired immune receptor chains from frequency matched subunits
WO2013169957A1 (en) 2012-05-08 2013-11-14 Adaptive Biotechnologies Corporation Compositions and method for measuring and calibrating amplification bias in multiplexed pcr reactions
US9371558B2 (en) 2012-05-08 2016-06-21 Adaptive Biotechnologies Corp. Compositions and method for measuring and calibrating amplification bias in multiplexed PCR reactions
US10214770B2 (en) 2012-05-08 2019-02-26 Adaptive Biotechnologies Corp. Compositions and method for measuring and calibrating amplification bias in multiplexed PCR reactions
US9150905B2 (en) 2012-05-08 2015-10-06 Adaptive Biotechnologies Corporation Compositions and method for measuring and calibrating amplification bias in multiplexed PCR reactions
US10894977B2 (en) 2012-05-08 2021-01-19 Adaptive Biotechnologies Corporation Compositions and methods for measuring and calibrating amplification bias in multiplexed PCR reactions
WO2013188831A1 (en) 2012-06-15 2013-12-19 Adaptive Biotechnologies Corporation Uniquely tagged rearranged adaptive immune receptor genes in a complex gene set
US11180813B2 (en) 2012-10-01 2021-11-23 Adaptive Biotechnologies Corporation Immunocompetence assessment by adaptive immune receptor diversity and clonality characterization
US10221461B2 (en) 2012-10-01 2019-03-05 Adaptive Biotechnologies Corp. Immunocompetence assessment by adaptive immune receptor diversity and clonality characterization
WO2014055561A1 (en) 2012-10-01 2014-04-10 Adaptive Biotechnologies Corporation Immunocompetence assessment by adaptive immune receptor diversity and clonality characterization
EP3330384A1 (en) 2012-10-01 2018-06-06 Adaptive Biotechnologies Corporation Immunocompetence assessment by adaptive immune receptor diversity and clonality characterization
EP3640343A1 (en) 2012-10-01 2020-04-22 Adaptive Biotechnologies Corporation Immunocompetence assessment by adaptive immune receptor diversity and clonality characterization
US10150996B2 (en) 2012-10-19 2018-12-11 Adaptive Biotechnologies Corp. Quantification of adaptive immune cell genomes in a complex mixture of cells
WO2014145992A1 (en) 2013-03-15 2014-09-18 Adaptive Biotechnologies Corporation Uniquely tagged rearranged adaptive immune receptor genes in a complex gene set
US9708657B2 (en) 2013-07-01 2017-07-18 Adaptive Biotechnologies Corp. Method for generating clonotype profiles using sequence tags
US10077473B2 (en) 2013-07-01 2018-09-18 Adaptive Biotechnologies Corp. Method for genotyping clonotype profiles using sequence tags
US10526650B2 (en) 2013-07-01 2020-01-07 Adaptive Biotechnologies Corporation Method for genotyping clonotype profiles using sequence tags
US11702706B2 (en) 2013-08-28 2023-07-18 Becton, Dickinson And Company Massively parallel single cell analysis
US10954570B2 (en) 2013-08-28 2021-03-23 Becton, Dickinson And Company Massively parallel single cell analysis
US11618929B2 (en) 2013-08-28 2023-04-04 Becton, Dickinson And Company Massively parallel single cell analysis
US11248253B2 (en) 2014-03-05 2022-02-15 Adaptive Biotechnologies Corporation Methods using randomer-containing synthetic molecules
US10435745B2 (en) 2014-04-01 2019-10-08 Adaptive Biotechnologies Corp. Determining antigen-specific T-cells
US11261490B2 (en) 2014-04-01 2022-03-01 Adaptive Biotechnologies Corporation Determining antigen-specific T-cells
US11390921B2 (en) 2014-04-01 2022-07-19 Adaptive Biotechnologies Corporation Determining WT-1 specific T cells and WT-1 specific T cell receptors (TCRs)
US10066265B2 (en) 2014-04-01 2018-09-04 Adaptive Biotechnologies Corp. Determining antigen-specific t-cells
EP3715455A1 (en) 2014-10-29 2020-09-30 Adaptive Biotechnologies Corp. Highly-multiplexed simultaneous detection of nucleic acids encoding paired adaptive immune receptor heterodimers from many samples
US10392663B2 (en) 2014-10-29 2019-08-27 Adaptive Biotechnologies Corp. Highly-multiplexed simultaneous detection of nucleic acids encoding paired adaptive immune receptor heterodimers from a large number of samples
WO2016069886A1 (en) 2014-10-29 2016-05-06 Adaptive Biotechnologies Corporation Highly-multiplexed simultaneous detection of nucleic acids encoding paired adaptive immune receptor heterodimers from many samples
US10246701B2 (en) 2014-11-14 2019-04-02 Adaptive Biotechnologies Corp. Multiplexed digital quantitation of rearranged lymphoid receptors in a complex mixture
US11066705B2 (en) 2014-11-25 2021-07-20 Adaptive Biotechnologies Corporation Characterization of adaptive immune response to vaccination or infection using immune repertoire sequencing
EP3498866A1 (en) 2014-11-25 2019-06-19 Adaptive Biotechnologies Corp. Characterization of adaptive immune response to vaccination or infection using immune repertoire sequencing
CN104560978A (en) * 2015-01-20 2015-04-29 中国人民解放军第三军医大学 Multiplex-polymerase chain reaction (PCR) primer and method for constructing human B cell receptor (BCR) heavy-chain library based on high-throughput sequencing
WO2016138122A1 (en) 2015-02-24 2016-09-01 Adaptive Biotechnologies Corp. Methods for diagnosing infectious disease and determining hla status using immune repertoire sequencing
EP3591074A1 (en) 2015-02-24 2020-01-08 Adaptive Biotechnologies Corp. Methods for diagnosing infectious disease and determining hla status using immune repertoire sequencing
US11047008B2 (en) 2015-02-24 2021-06-29 Adaptive Biotechnologies Corporation Methods for diagnosing infectious disease and determining HLA status using immune repertoire sequencing
EP3262196A4 (en) * 2015-02-24 2018-11-21 Adaptive Biotechnologies Corp. Methods for diagnosing infectious disease and determining hla status using immune repertoire sequencing
USRE48913E1 (en) 2015-02-27 2022-02-01 Becton, Dickinson And Company Spatially addressable molecular barcoding
WO2016144996A1 (en) * 2015-03-09 2016-09-15 Cb Biotechnologies, Inc. Method for identifying disease-associated cdr3 patterns in an immune repertoire
US11535882B2 (en) 2015-03-30 2022-12-27 Becton, Dickinson And Company Methods and compositions for combinatorial barcoding
US11041202B2 (en) 2015-04-01 2021-06-22 Adaptive Biotechnologies Corporation Method of identifying human compatible T cell receptors specific for an antigenic target
WO2016161054A1 (en) * 2015-04-01 2016-10-06 Pharmacyclics Llc Massive parallel primer dimer-mediated multiplexed single cell-based amplification for concurrent evaluation of multiple target sequences in complex cell mixtures
WO2016161273A1 (en) 2015-04-01 2016-10-06 Adaptive Biotechnologies Corp. Method of identifying human compatible t cell receptors specific for an antigenic target
US11390914B2 (en) 2015-04-23 2022-07-19 Becton, Dickinson And Company Methods and compositions for whole transcriptome amplification
US11332776B2 (en) 2015-09-11 2022-05-17 Becton, Dickinson And Company Methods and compositions for library normalization
US11845986B2 (en) 2016-05-25 2023-12-19 Becton, Dickinson And Company Normalization of nucleic acid libraries
US11525157B2 (en) 2016-05-31 2022-12-13 Becton, Dickinson And Company Error correction in amplification of samples
US11220685B2 (en) 2016-05-31 2022-01-11 Becton, Dickinson And Company Molecular indexing of internal sequences
US10428325B1 (en) 2016-09-21 2019-10-01 Adaptive Biotechnologies Corporation Identification of antigen-specific B cell receptors
US11460468B2 (en) 2016-09-26 2022-10-04 Becton, Dickinson And Company Measurement of protein expression using reagents with barcoded oligonucleotide sequences
US11467157B2 (en) 2016-09-26 2022-10-11 Becton, Dickinson And Company Measurement of protein expression using reagents with barcoded oligonucleotide sequences
US11782059B2 (en) 2016-09-26 2023-10-10 Becton, Dickinson And Company Measurement of protein expression using reagents with barcoded oligonucleotide sequences
US11319583B2 (en) 2017-02-01 2022-05-03 Becton, Dickinson And Company Selective amplification using blocking oligonucleotides
US11254980B1 (en) 2017-11-29 2022-02-22 Adaptive Biotechnologies Corporation Methods of profiling targeted polynucleotides while mitigating sequencing depth requirements
US11773441B2 (en) 2018-05-03 2023-10-03 Becton, Dickinson And Company High throughput multiomics sample analysis
US11365409B2 (en) 2018-05-03 2022-06-21 Becton, Dickinson And Company Molecular barcoding on opposite transcript ends
US11639517B2 (en) 2018-10-01 2023-05-02 Becton, Dickinson And Company Determining 5′ transcript sequences
US11932849B2 (en) 2018-11-08 2024-03-19 Becton, Dickinson And Company Whole transcriptome analysis of single cells using random priming
US11492660B2 (en) 2018-12-13 2022-11-08 Becton, Dickinson And Company Selective extension in single cell whole transcriptome analysis
US11661631B2 (en) 2019-01-23 2023-05-30 Becton, Dickinson And Company Oligonucleotides associated with antibodies
US11939622B2 (en) 2019-07-22 2024-03-26 Becton, Dickinson And Company Single cell chromatin immunoprecipitation sequencing assay
US11970737B2 (en) 2019-08-26 2024-04-30 Becton, Dickinson And Company Digital counting of individual molecules by stochastic attachment of diverse labels
US11773436B2 (en) 2019-11-08 2023-10-03 Becton, Dickinson And Company Using random priming to obtain full-length V(D)J information for immune repertoire sequencing
US11649497B2 (en) 2020-01-13 2023-05-16 Becton, Dickinson And Company Methods and compositions for quantitation of proteins and RNA
US11661625B2 (en) 2020-05-14 2023-05-30 Becton, Dickinson And Company Primers for immune repertoire profiling
US11932901B2 (en) 2020-07-13 2024-03-19 Becton, Dickinson And Company Target enrichment using nucleic acid probes for scRNAseq
CN112143777A (en) * 2020-08-18 2020-12-29 北京臻知医学科技有限责任公司 Primer group for constructing CDR3 region high-throughput sequencing library of human TCR beta and application thereof
CN112143777B (en) * 2020-08-18 2022-07-01 北京臻知医学科技有限责任公司 Primer group for constructing CDR3 region high-throughput sequencing library of human TCR beta and application thereof
US11739443B2 (en) 2020-11-20 2023-08-29 Becton, Dickinson And Company Profiling of highly expressed and lowly expressed proteins
WO2023118489A1 (en) * 2021-12-22 2023-06-29 Christian-Albrechts-Universität Zu Kiel Proteins and T-cells involved in chronic inflammatory diseases
EP4201954A1 (en) * 2021-12-22 2023-06-28 Christian-Albrechts-Universität zu Kiel Proteins and t-cells involved in chronic inflammatory diseases

Also Published As

Publication number Publication date
WO2011106738A3 (en) 2011-12-01

Similar Documents

Publication Publication Date Title
WO2011106738A2 (en) Use of tcr clonotypes as biomarkers for disease
US11905511B2 (en) Method of measuring adaptive immunity
US20170335386A1 (en) Method of measuring adaptive immunity
DK2281065T3 (en) Procedure to evaluate and compare immune repertoires
WO2015134787A2 (en) Methods using randomer-containing synthetic molecules
WO2020056451A1 (en) Phenotypic and molecular characterisation of single cells
US11066705B2 (en) Characterization of adaptive immune response to vaccination or infection using immune repertoire sequencing
Nozuma et al. Immunopathogenic CSF TCR repertoire signatures in virus-associated neurologic disease
US20220259670A1 (en) Kit and method for analyzing single t cells
Sidahmed et al. A novel HLA-B*18 allele, HLA-B*18:124, identified in a German volunteer bone marrow donor.
Gonçalves et al. B-Cell Gene Expression and Microbiota Prior Immunization Profile Vaccine Humoral Responsiveness
Delbos et al. A novel allele HLA-C*07:445 identified in a French hematopoietic stem cell donor.
Tu Recovery of T cell receptor variable sequences from 3' barcoded single-cell RNA sequencing libraries
WO2022207682A1 (en) Immune cell counting of sars-cov-2 patients based on immune repertoire sequencing
Kirpach Erforschung der Einsatzmöglichkeiten von B Zellen für die Diagnose der Borreliose: Investigation of the possible use of B cells for the diagnosis of acute Lyme disease
Becker et al. The novel alleles HLA-B*44:101 and HLA-B*57:48 of Caucasian origin are characterized by amino acid substitutions in the alpha 2 domain.
Ritter Illuminating T-cell repertoires in health and disease by ultra-deep sequencing

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 11706728

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 11706728

Country of ref document: EP

Kind code of ref document: A2