CA2185379A1 - Method for serial analysis of gene expression - Google Patents

Method for serial analysis of gene expression

Info

Publication number
CA2185379A1
Authority
CA
Canada
Prior art keywords
tag
sequence
oligonucleotide
tags
ditags
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002185379A
Other languages
French (fr)
Inventor
Kenneth W. Kinzler
Bert Vogelstein
Victor E. Velculescu
Lin Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
School of Medicine of Johns Hopkins University
Original Assignee
Kenneth W. Kinzler
Bert Vogelstein
Victor E. Velculescu
Lin Zhang
The Johns Hopkins University School Of Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US08/527,154 external-priority patent/US5695937A/en
Application filed by Kenneth W. Kinzler, Bert Vogelstein, Victor E. Velculescu, Lin Zhang, The Johns Hopkins University School Of Medicine filed Critical Kenneth W. Kinzler
Publication of CA2185379A1 publication Critical patent/CA2185379A1/en
Abandoned legal-status Critical Current

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096: Processes for the isolation, preparation or purification of DNA or RNA; cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809: Methods for determination or identification of nucleic acids involving differential detection
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Abstract

Serial analysis of gene expression (SAGE), a method for the rapid quantitative and qualitative analysis of transcripts, is provided. Short defined sequence tags corresponding to expressed genes are isolated and analyzed. Sequencing of over 1,000 defined tags in a short period of time (e.g., hours) reveals a gene expression pattern characteristic of the function of a cell or tissue. Moreover, SAGE is useful as a gene discovery tool for the identification and isolation of novel sequence tags corresponding to novel transcripts and genes.

Description

PATENT
ATTORNEY DOCKET NO: 07~65/078001

METHOD FOR SERIAL ANALYSIS OF GENE EXPRESSION

This invention was made with support from National Institutes of Health Grant Nos. CA57345, CA35494, and GM07309. The Government has certain rights in this invention.

This application is a continuation-in-part application of Serial No. 08/527,154, filed September 12, 1995.

Field of the Invention

The present invention relates generally to the field of gene expression and specifically to a method for the serial analysis of gene expression (SAGE) for the analysis of a large number of transcripts by identification of a defined region of a transcript which corresponds to a region of an expressed gene.

Background of the Invention

Determination of the genomic sequence of higher organisms, including humans, is now a real and attainable goal. However, this analysis only represents one level of genetic complexity. The ordered and timely expression of genes represents another level of complexity equally important to the definition and biology of the organism.

The role of sequencing complementary DNA (cDNA), reverse transcribed from mRNA, as part of the human genome project has been debated as proponents of genomic sequencing have argued the difficulty of finding every mRNA expressed in all tissues, cell types, and developmental stages and have pointed out that much valuable information from intronic and intergenic regions, including control and regulatory sequences, will be missed by cDNA sequencing (Report of the Committee on Mapping and Sequencing the Human Genome, National Academy Press, Washington, D.C., 1988). Sequencing of transcribed regions of the genome using cDNA libraries has heretofore been considered unsatisfactory. Libraries of cDNA are believed to be dominated by repetitive elements, mitochondrial genes, ribosomal RNA genes, and other nuclear genes comprising common or housekeeping sequences. It is believed that cDNA libraries do not provide all sequences corresponding to structural and regulatory polypeptides or peptides (Putney, et al., Nature, 302:718, 1983).

Another drawback of standard cDNA cloning is that some mRNAs are abundant while others are rare. The cellular quantities of mRNA from various genes can vary by several orders of magnitude.

Techniques based on cDNA subtraction or differential display can be quite useful for comparing gene expression differences between two cell types (Hedrick, et al., Nature, 308:149, 1984; Liang and Pardee, Science, 257:967, 1992), but provide only a partial analysis, with no direct information regarding abundance of messenger RNA. The expressed sequence tag (EST) approach has been shown to be a valuable tool for gene discovery (Adams, et al., Science, 252:1656, 1991; Adams, et al., Nature, 355:632, 1992; Okubo, et al., Nature Genetics, 2:173, 1992), but like Northern blotting, RNase protection, and reverse transcriptase-polymerase chain reaction (RT-PCR) analysis (Alwine, et al., Proc. Natl. Acad. Sci. U.S.A., 74:5350, 1977; Zinn, et al., Cell, 34:865, 1983; Veres, et al., Science, 237:415, 1987), only evaluates a limited number of genes at a time. In addition, the EST approach preferably employs nucleotide sequences of 150 base pairs or longer for similarity searches and mapping.

Sequence tagged sites (STSs) (Olson, et al., Science, 245:1434, 1989) have also been utilized to identify genomic markers for the physical mapping of the genome. These short sequences from physically mapped clones represent uniquely identified map positions in the genome. In contrast, the identification of expressed genes relies on expressed sequence tags which are markers for those genes actually transcribed and expressed in vivo.

There is a need for an improved method which allows rapid, detailed analysis of thousands of expressed genes for the investigation of a variety of biological applications, particularly for establishing the overall pattern of gene expression in different cell types or in the same cell type under different physiologic or pathologic conditions. Identification of different patterns of expression has several utilities, including the identification of appropriate therapeutic targets, candidate genes for gene therapy (e.g., gene replacement), tissue typing, forensic identification, mapping locations of disease-associated genes, and for the identification of diagnostic and prognostic indicator genes.

SUMMARY OF THE INVENTION

The present invention provides a method for the rapid analysis of numerous transcripts in order to identify the overall pattern of gene expression in different cell types or in the same cell type under different physiologic, developmental or disease conditions. The method is based on the identification of a short nucleotide sequence tag at a defined position in a messenger RNA. The tag is used to identify the corresponding transcript and gene from which it was transcribed. By utilizing dimerized tags, termed a "ditag", the method of the invention allows elimination of certain types of bias which might occur during cloning and/or amplification and possibly during data evaluation. Concatenation of these short nucleotide sequence tags allows the efficient analysis of transcripts in a serial manner by sequencing multiple tags on a single DNA molecule, for example, a DNA molecule inserted in a vector or in a single clone.

The method described herein is the serial analysis of gene expression (SAGE), a novel approach which allows the analysis of a large number of transcripts. To demonstrate this strategy, short cDNA sequence tags were generated from mRNA isolated from pancreas, randomly paired to form ditags, concatenated, and cloned. Manual sequencing of 1,000 tags revealed a gene expression pattern characteristic of pancreatic function. Identification of such patterns is important diagnostically and therapeutically, for example. Moreover, the use of SAGE as a gene discovery tool was documented by the identification and isolation of new pancreatic transcripts corresponding to novel tags. SAGE provides a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGURE 1 shows a schematic of SAGE. The first restriction enzyme, or anchoring enzyme, is NlaIII and the second enzyme, or tagging enzyme, is FokI in this example. Sequences represent primer derived sequences, and transcript derived sequences with "X" and "O" representing nucleotides of different tags.

FIGURE 2 shows a comparison of transcript abundance. Bars represent the percent abundance as determined by SAGE (dark bars) or hybridization analysis (light bars). SAGE quantitations were derived from Table 1 as follows: TRY1/2 includes the tags for trypsinogen 1 and 2, PROCAR indicates tags for procarboxypeptidase A1, CHYMO indicates tags for chymotrypsinogen, and ELA/PRO includes the tags for elastase IIIB and protease E. Error bars represent the standard deviation determined by taking the square root of counted events and converting it to a percent abundance (assumed Poisson distribution).

FIGURE 3 shows the results of screening a cDNA library with SAGE tags. P1 and P2 show typical hybridization results obtained with 13 bp oligonucleotides as described in the Examples. P1 and P2 correspond to the transcripts described in Table 2. Images were obtained using a Molecular Dynamics PhosphorImager and the circle indicates the outline of the filter membrane to which the recombinant phage were transferred prior to hybridization.

FIGURE 4 is a block diagram of a tag code database access system in accordance with the present invention.
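The error bars described for FIGURE 2 (square root of the counted events, converted to a percent abundance under an assumed Poisson distribution) can be sketched as follows. This is a minimal illustration only; the function name and the example counts are hypothetical and are not taken from Table 1.

```python
import math


def percent_abundance_with_error(tag_count, total_tags):
    """Return (percent abundance, one standard deviation in percent).

    Assumes tag counts are Poisson distributed, so the standard
    deviation of a count is its square root.
    """
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    if tag_count < 0:
        raise ValueError("tag_count must be non-negative")
    percent = 100.0 * tag_count / total_tags
    sd_percent = 100.0 * math.sqrt(tag_count) / total_tags
    return percent, sd_percent


# Hypothetical example: a tag seen 100 times among 1,000 sequenced tags.
abundance, error = percent_abundance_with_error(100, 1000)
print(f"{abundance:.1f}% +/- {error:.1f}%")
```

Because the relative error shrinks as the count grows, rare transcripts carry proportionally wider error bars than abundant ones at the same sequencing depth.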


DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention provides a rapid, quantitative process for determining the abundance and nature of transcripts corresponding to expressed genes. The method, termed serial analysis of gene expression (SAGE), is based on the identification of and characterization of partial, defined sequences of transcripts corresponding to gene segments. These defined transcript sequence "tags" are markers for genes which are expressed in a cell, a tissue, or an extract, for example.

SAGE is based on several principles. First, a short nucleotide sequence tag (9 to 10 bp) contains sufficient information content to uniquely identify a transcript provided it is isolated from a defined position within the transcript. For example, a sequence as short as 9 bp can distinguish 262,144 transcripts (4^9) given a random nucleotide distribution at the tag site, whereas estimates suggest that the human genome encodes about 80,000 to 200,000 transcripts (Fields, et al., Nature Genetics, 7:345, 1994). The size of the tag can be shorter for lower eukaryotes or prokaryotes, for example, where the number of transcripts encoded by the genome is lower. For example, a tag as short as 6-7 bp may be sufficient for distinguishing transcripts in yeast.
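The tag-capacity arithmetic above can be checked with a short sketch. The function names are hypothetical, and the yeast transcript count used in the example is an outside assumption rather than a figure from this document. Note that 4^n only counts the distinct sequences available; it does not by itself guarantee that every transcript receives a unique tag.

```python
def distinct_tags(length_bp):
    """Number of distinct nucleotide sequences of the given length."""
    return 4 ** length_bp


def min_tag_length(n_transcripts):
    """Smallest tag length whose sequence space is at least n_transcripts."""
    length = 1
    while distinct_tags(length) < n_transcripts:
        length += 1
    return length


print(distinct_tags(9))         # sequence space of a 9 bp tag
print(min_tag_length(200_000))  # upper human estimate quoted in the text
print(min_tag_length(6_000))    # assumed order of magnitude for yeast
```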

Second, random dimerization of tags allows a procedure for reducing bias (caused by amplification and/or cloning). Third, concatenation of these short sequence tags allows the efficient analysis of transcripts in a serial manner by sequencing multiple tags within a single vector or clone. As with serial communication by computers, wherein information is transmitted as a continuous string of data, serial analysis of the sequence tags requires a means to establish the register and boundaries of each tag. All of these principles may be applied independently, in combination, or in combination with other known methods of sequence identification.
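One way to picture how register and boundaries are established is to treat the anchoring enzyme site as punctuation between ditags. The sketch below assumes the NlaIII site CATG as the separator and a fixed 9 bp tag length; the function names and toy sequences are hypothetical. It ignores sequencing errors and ditags of irregular length, so it is an illustration rather than a complete analysis program.

```python
ANCHOR = "CATG"  # assumed anchoring-enzyme site separating ditags
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_concatemer(seq, anchor=ANCHOR):
    """Split a concatemer read into ditags at each anchoring site."""
    return [frag for frag in seq.upper().split(anchor) if frag]


def tags_from_ditag(ditag, tag_len=9):
    """Return the two tags of a ditag, each read away from its anchor site.

    The second tag lies on the opposite strand (tags are joined tail to
    tail), so it is reported as a reverse complement.
    """
    if len(ditag) < 2 * tag_len:
        raise ValueError("ditag shorter than two tags")
    return ditag[:tag_len], revcomp(ditag[-tag_len:])


# Toy concatemer built from two hypothetical ditags.
d1 = "ACGTTGCAA" + revcomp("GGATCCTTA")
d2 = "TTTGGGCCC" + revcomp("AAATTTGGG")
read = ANCHOR + d1 + ANCHOR + d2 + ANCHOR
for ditag in split_concatemer(read):
    print(tags_from_ditag(ditag))
```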

In a first embodiment, the invention provides a method for the detection of gene expression in a particular cell or tissue, or cell extract, for example, including at a particular developmental stage or in a particular disease state. The method comprises producing complementary deoxyribonucleic acid (cDNA) oligonucleotides, isolating a first defined nucleotide sequence tag from a first cDNA oligonucleotide and a second defined nucleotide sequence tag from a second cDNA oligonucleotide, linking the first tag to a first oligonucleotide linker, wherein the first oligonucleotide linker comprises a first sequence for hybridization of an amplification primer and linking the second tag to a second oligonucleotide linker, wherein the second oligonucleotide linker comprises a second sequence for hybridization of an amplification primer, and determining the nucleotide sequence of the tag(s), wherein the tag(s) correspond to an expressed gene.

Figure 1 shows a schematic representation of the analysis of messenger RNA (mRNA) using SAGE as described in the method of the invention. mRNA is isolated from a cell or tissue of interest for in vitro synthesis of a double-stranded DNA sequence by reverse transcription of the mRNA. The double-stranded DNA complement of mRNA formed is referred to as complementary DNA (cDNA).

The term "oligonucleotide" as used herein refers to primers or oligomer fragments comprised of two or more deoxyribonucleotides or ribonucleotides, preferably more than three. The exact size will depend on many factors, which in turn depend on the ultimate function or use of the oligonucleotide.

The method further includes ligating the first tag linked to the first oligonucleotide linker to the second tag linked to the second oligonucleotide linker and forming a "ditag". Each ditag represents two defined nucleotide sequences of at least one transcript, representative of at least one gene. Typically, a ditag represents two transcripts from two distinct genes. The presence of a defined cDNA tag within the ditag is indicative of expression of a gene having a sequence of that tag.

The analysis of ditags, formed prior to any amplification step, provides a means to eliminate potential distortions introduced by amplification, e.g., PCR. The pairing of tags for the formation of ditags is a random event. The number of different tags is expected to be large; therefore, the probability of any two tags being coupled in the same ditag is small, even for abundant transcripts. Therefore, repeated ditags potentially produced by biased standard amplification and/or cloning methods are excluded from analysis by the method of the invention.
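The exclusion of repeated ditags can be sketched as a simple de-duplication step before tags are counted. This is an illustrative sketch under stated assumptions: a fixed 9 bp tag length, ditags supplied as plain strings, and a hypothetical function name. A ditag and its reverse complement are treated as the same molecule.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_tags_from_unique_ditags(ditags, tag_len=9):
    """Count tags, letting each distinct ditag contribute only once."""
    seen = set()
    counts = Counter()
    for ditag in ditags:
        ditag = ditag.upper()
        # Orientation-independent key: the same ditag read from either strand.
        key = min(ditag, revcomp(ditag))
        if key in seen:
            continue  # repeated ditag, likely an amplification artifact
        seen.add(key)
        counts[ditag[:tag_len]] += 1
        counts[revcomp(ditag[-tag_len:])] += 1
    return counts


# Hypothetical input: one ditag seen three times (once on the other strand).
d1 = "ACGTTGCAA" + revcomp("GGATCCTTA")
d2 = "TTTGGGCCC" + revcomp("AAATTTGGG")
print(count_tags_from_unique_ditags([d1, d1, revcomp(d1), d2]))
```

The design choice here is to discard exact repeats entirely rather than down-weight them, which matches the text's statement that repeated ditags are excluded from analysis.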

The term "defined" nucleotide sequence, or "defined" nucleotide sequence tag, refers to a nucleotide sequence derived from either the 5' or 3' terminus of a transcript. The sequence is defined by cleavage with a first restriction endonuclease, and represents nucleotides either 5' or 3' of the first restriction endonuclease site, depending on which terminus is used for capture (e.g., 3' when oligo-dT is used for capture as described herein).

As used herein, the terms "restriction endonucleases" and "restriction enzymes" refer to bacterial enzymes which bind to a specific double-stranded DNA sequence termed a recognition site or recognition nucleotide sequence, and cut double-stranded DNA at or near the specific recognition site.

The first endonuclease, termed "anchoring enzyme" or "AE" in Figure 1, is selected by its ability to cleave a transcript at least one time and therefore produce a defined sequence tag from either the 5' or 3' end of a transcript. Preferably, a restriction endonuclease having at least one recognition site and therefore having the ability to cleave a majority of cDNAs is utilized. For example, as illustrated herein, enzymes which have a 4 base pair recognition site are expected to cleave every 256 base pairs (4^4) on average while most transcripts are considerably larger. Restriction endonucleases which recognize a 4 base pair site include NlaIII, as exemplified in the EXAMPLES of the present invention. Other similar endonucleases having at least one recognition site within a DNA molecule (e.g., cDNA) will be known to those of skill in the art (see for example, Current Protocols in Molecular Biology, Vol. 2, 1995, Ed. Ausubel, et al., Greene Publish. Assoc. & Wiley Interscience, Unit 3.1.15; New England Biolabs Catalog, 1995).
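The expected cutting frequency can be made concrete with a rough calculation. The sketch below assumes random, equiprobable nucleotides and treats each position as independent, which is only an approximation; the function names and the 1,500 bp transcript length are hypothetical choices for illustration.

```python
def expected_site_spacing(site_len):
    """Average distance in bp between occurrences of a site of this length."""
    return 4 ** site_len


def prob_at_least_one_site(transcript_len, site_len):
    """Approximate chance that a random sequence contains the site at least once."""
    positions = transcript_len - site_len + 1
    if positions <= 0:
        return 0.0
    p_site = 0.25 ** site_len
    return 1.0 - (1.0 - p_site) ** positions


print(expected_site_spacing(4))                    # 4 bp site
print(round(prob_at_least_one_site(1500, 4), 3))   # 4 bp cutter, assumed 1.5 kb cDNA
print(round(prob_at_least_one_site(1500, 8), 3))   # 8 bp cutter, same cDNA
```

Under these assumptions a 4 bp cutter is expected to cleave nearly every cDNA of this size, whereas an 8 bp cutter would cleave only a small fraction.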

After cleavage with the anchoring enzyme, the most 5' or 3' region of the cleaved cDNA can then be isolated by binding to a capture medium. For example, as illustrated in the present EXAMPLES, streptavidin beads are used to isolate the defined 3' nucleotide sequence tag when the oligo dT primer for cDNA synthesis is biotinylated. In this example, cleavage with the first or anchoring enzyme provides a unique site on each transcript which corresponds to the restriction site located closest to the poly-A tail. Likewise, the 5' cap of a transcript (the cDNA) can be utilized for labeling or binding a capture means for isolation of a 5' defined nucleotide sequence tag. Those of skill in the art will know other similar capture systems (e.g., biotin/streptavidin, digoxigenin/anti-digoxigenin) for isolation of the defined sequence tag as described herein.
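For the 3' capture case, the defined position is the anchoring site closest to the poly-A tail. A minimal in silico sketch of that rule follows; it assumes the transcript is given as a sense-strand string read 5' to 3', that the anchoring site is CATG, and that the tag is the 9 bases immediately downstream. The function name and the toy sequence are hypothetical.

```python
def three_prime_tag(cdna, anchor="CATG", tag_len=9):
    """Return the tag adjacent to the 3'-most anchoring site, or None.

    None is returned when the sequence has no anchoring site or when fewer
    than tag_len bases follow the last site.
    """
    seq = cdna.upper()
    pos = seq.rfind(anchor)  # 3'-most occurrence of the site
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


# Toy transcript with two CATG sites; only the last one defines the tag.
toy = "GGCATGAAACCCGGGTTTCATGACGTACGTAGGGAAAAAA"
print(three_prime_tag(toy))
```

The same function can be run over a database of known transcript sequences to build a tag-to-gene lookup, which is one plausible way to map observed tags back to genes.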

The invention is not limited to use of a single "anchoring" or first restriction endonuclease. It may be desirable to perform the method of the invention sequentially, using different enzymes on separate samples of a preparation, in order to identify a complete pattern of transcription for a cell or tissue. In addition, the use of more than one anchoring enzyme provides confirmation of the expression pattern obtained from the first anchoring enzyme. Therefore, it is also envisioned that the first or anchoring endonuclease may rarely cut cDNA such that few or no cDNA representing abundant transcripts are cleaved. Thus, transcripts which are cleaved represent "unique" transcripts. Restriction enzymes that have a 7-8 bp recognition site, for example, would be enzymes that would rarely cut cDNA. Similarly, more than one tagging enzyme, described below, can be utilized in order to identify a complete pattern of transcription.

The term "isolated" as used herein includes polynucleotides substantially free of other nucleic acids, proteins, lipids, carbohydrates or other materials with which it is naturally associated. cDNA is not naturally occurring as such, but rather is obtained via manipulation of a partially purified naturally occurring mRNA. Isolation of a defined sequence tag refers to the purification of the 5' or 3' tag from other cleaved cDNA.

In one embodiment, the isolated defined nucleotide sequence tags are separated into two pools of cDNA, when the linkers have different sequences. Each pool is ligated via the anchoring, or first restriction endonuclease site to one of two linkers. When the linkers have the same sequence, it is not necessary to separate the tags into pools. The first oligonucleotide linker comprises a first sequence for hybridization of an amplification primer and the second oligonucleotide linker comprises a second sequence for hybridization of an amplification primer. In addition, the linkers further comprise a second restriction endonuclease site, also termed the "tagging enzyme" or "TE". The method of the invention does not require, but preferably comprises amplifying the ditag oligonucleotide after ligation.

The second restriction endonuclease cleaves at a site distant from or outside of the recognition site. For example, the second restriction endonuclease can be a type IIS restriction enzyme. Type IIS restriction endonucleases cleave at a defined distance up to 20 bp away from their asymmetric recognition sites (Szybalski, W., Gene, 40:169, 1985). Examples of type IIS restriction endonucleases include BsmFI and FokI. Other similar enzymes will be known to those of skill in the art (see, Current Protocols in Molecular Biology, supra).

The first and second "linkers" which are ligated to the defined nucleotide sequence tags are oligonucleotides having the same or different nucleotide sequences. For example, the linkers illustrated in the Examples of the present invention include linkers having the following sequences:

5'-TTTTACCAGCTTATTCAATTCGGTCCTCTCGCACAGGGACATG-3' (SEQ ID NO:1)
3'-TGGTCGAATAAGTTAAGCCAGGAGAGCGTGTCCCT-5' (SEQ ID NO:2)

and

5'-TTTTTGTAGACATTCTAGTATCTCGTCAAGTCGGAAGGGACATG-3' (SEQ ID NO:3)
3'-AACATCTGTAAGATCATAGAGCAGTTCAGCCTTCCCT-5' (SEQ ID NO:4),

wherein A is a dideoxy nucleotide (e.g., dideoxy A). Other similar linkers can be utilized in the method of the invention; those of skill in the art can design such alternate linkers.

The linkers are designed so that cleavage of the ligation products with the second restriction enzyme, or tagging enzyme, results in release of the linker having a defined nucleotide sequence tag (e.g., 3' of the restriction endonuclease cleavage site as exemplified herein). The defined nucleotide sequence tag may be from about 6 to 30 base pairs. Preferably, the tag is about 9 to 11 base pairs. Therefore, a ditag is from about 12 to 60 base pairs, and preferably from 18 to 22 base pairs.

The pool of defined tags ligated to linkers having the same sequence, or the two pools of defined nucleotide sequence tags ligated to linkers having different nucleotide sequences, are randomly ligated to each other "tail to tail". The portion of the cDNA tag furthest from the linker is referred to as the "tail". As illustrated in FIGURE 1, the ligated tag pair, or ditag, has a first restriction endonuclease site upstream (5') and a first restriction endonuclease site downstream (3') of the ditag; a second restriction endonuclease cleavage site upstream and downstream of the ditag, and a linker oligonucleotide containing both a second restriction enzyme recognition site and an amplification primer hybridization site upstream and downstream of the ditag. In other words, the ditag is flanked by the first restriction endonuclease site, the second restriction endonuclease cleavage site and the linkers, respectively.

The ditag can be amplified by utilizing primers which specifically hybridize to one strand of each linker. Preferably, the amplification is performed by standard polymerase chain reaction (PCR) methods as described (U.S. Patent No. 4,683,195). Alternatively, the ditags can be amplified by cloning in prokaryotic-compatible vectors or by other amplification methods known to those of skill in the art.

The term "primer" as used herein refers to an oligonucleotide, whether occurring naturally or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of primer extension product which is complementary to a nucleic acid strand is induced, i.e., in the presence of nucleotides and an agent for polymerization such as DNA polymerase and at a suitable temperature and pH. The primer is preferably single stranded for maximum efficiency in amplification. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the agent for polymerization. The exact lengths of the primers will depend on many factors, including temperature and source of primer.

The primers herein are selected to be "substantially" complementary to the different strands of each specific sequence to be amplified. This means that the primers must be sufficiently complementary to hybridize with their respective strands. Therefore, the primer sequence need not reflect the exact sequence of the template. In the present invention, the primers are substantially complementary to the oligonucleotide linkers.

Primers useful for amplification of the linkers exemplified herein as SEQ ID NO:1-4 include 5'-CCAGCTTATTCAATTCGGTCC-3' (SEQ ID NO:5) and 5'-GTAGACATTCTAGTATCTCGT-3' (SEQ ID NO:6). Those of skill in the art can prepare similar primers for amplification based on the nucleotide sequence of the linkers without undue experimentation.
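The relationship between the exemplified primers and linkers can be checked with a short string comparison. In the sketch below the primer strings are as printed above; the linker top-strand strings are this sketch's reading of SEQ ID NO:1 and SEQ ID NO:3 from a damaged scan and should be treated as an assumption. The identification of GGGAC as the BsmFI recognition sequence is likewise outside knowledge, not a statement from this document.

```python
# Top strands of the two linkers, as read from the scan (assumed reading).
LINKER1_TOP = "TTTTACCAGCTTATTCAATTCGGTCCTCTCGCACAGGGACATG"
LINKER2_TOP = "TTTTTGTAGACATTCTAGTATCTCGTCAAGTCGGAAGGGACATG"

# Amplification primers as printed (SEQ ID NO:5 and SEQ ID NO:6).
PRIMER1 = "CCAGCTTATTCAATTCGGTCC"
PRIMER2 = "GTAGACATTCTAGTATCTCGT"


def primer_offset(linker_top, primer):
    """0-based offset at which the primer matches the linker top strand, or -1.

    A primer identical to part of the top strand anneals to the
    complementary (bottom) strand during amplification.
    """
    return linker_top.find(primer)


for name, linker, primer in (("linker 1", LINKER1_TOP, PRIMER1),
                             ("linker 2", LINKER2_TOP, PRIMER2)):
    print(name,
          "primer offset:", primer_offset(linker, primer),
          "| ends with CATG:", linker.endswith("CATG"),
          "| contains GGGAC:", "GGGAC" in linker)
```

Both linkers end in CATG, consistent with ligation to NlaIII-cleaved cDNA, and each carries its primer-binding region upstream of the tagging enzyme site.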

Cleavage of the amplified PCR product with the first restriction endonuclease allows isolation of ditags which can be concatenated by ligation. After ligation, it may be desirable to clone the concatemers, although it is not required in the method of the invention. Analysis of the ditags or concatemers, whether or not amplification was performed, is by standard sequencing methods. Concatemers generally consist of about 2 to 200 ditags and preferably from about 8 to 20 ditags. While these are preferred concatemers, it will be apparent that the number of ditags which can be concatenated will depend on the length of the individual tags and can be readily determined by those of skill in the art without undue experimentation. After formation of concatemers, multiple tags can be cloned into a vector for sequence analysis, or alternatively, ditags or concatemers can be directly sequenced without cloning by methods known to those of skill in the art.

Among the standard procedures for cloning the defined nucleotide sequence tags of the invention is insertion of the tags into vectors such as plasmids or phage. The ditag or concatemers of ditags produced by the method described herein are cloned into recombinant vectors for further analysis, e.g., sequence analysis, plaque/plasmid hybridization using the tags as probes, by methods known to those of skill in the art.

The term "recombinant vector" refers to a plasmid, virus or other vehicle known in the art that has been manipulated by insertion or incorporation of the ditag genetic sequences. Such vectors contain a promoter sequence which facilitates the efficient transcription of a marker genetic sequence, for example. The vector typically contains an origin of replication, a promoter, as well as specific genes which allow phenotypic selection of the transformed cells. Vectors suitable for use in the present invention include, for example, pBlueScript (Stratagene, La Jolla, CA); pBC, pSL301 (Invitrogen) and other similar vectors known to those of skill in the art. Preferably, the ditags or concatemers thereof are ligated into a vector for sequencing purposes.

Vectors in which the ditags are cloned can be transferred into a suitable host cell. "Host cells" are cells in which a vector can be propagated and its DNA expressed. The term also includes any progeny of the subject host cell. It is understood that all progeny may not be identical to the parental cell since there may be mutations that occur during replication. However, such progeny are included when the term "host cell" is used. Methods of stable transfer, meaning that the foreign DNA is continuously maintained in the host, are known in the art.

Transformation of a host cell with a vector containing ditag(s) may be carried out by conventional techniques as are well known to those skilled in the art. Where the host is prokaryotic, such as E. coli, competent cells which are capable of DNA uptake can be prepared from cells harvested after exponential growth phase and subsequently treated by the CaCl2 method using procedures well known in the art. Alternatively, MgCl2 or RbCl can be used. Transformation can also be performed by electroporation or other commonly used methods in the art.

The ditags present in a particular clone can be sequenced by standard methods (see for example, Current Protocols in Molecular Biology, supra, Unit 7) either manually or using automated methods.

In another embodiment, the present invention provides a kit useful for detection of gene expression wherein the presence of a defined nucleotide tag or ditag is indicative of expression of a gene having a sequence of the tag, the kit comprising one or more containers comprising a first container containing a first oligonucleotide linker having a first sequence useful for hybridization of an amplification primer; a second container containing a second oligonucleotide linker having a second sequence useful for hybridization of an amplification primer, wherein the linkers further comprise a restriction endonuclease site for cleavage of DNA at a site distant from the restriction endonuclease recognition site; and a third and fourth container having a nucleic acid primer for hybridization to the first and second unique sequence of the linker. It is apparent that if the oligonucleotide linkers comprise the same nucleotide sequence, only one container containing linkers is necessary in the kit of the invention.

In yet another embodiment, the invention provides an oligonucleotide composition having at least two defined nucleotide sequence tags, wherein at least one of the sequence tags corresponds to at least one expressed gene. The composition consists of about 1 to 200 ditags, and preferably about 8 to 20 ditags. Such compositions are useful for the analysis of gene expression by identifying the defined nucleotide sequence tag corresponding to an expressed gene in a cell, tissue or cell extract, for example.

It is envisioned that the identification of differentially expressed genes using the SAGE technique of the invention can be used in combination with other genomics techniques. For example, individual tags, and preferably ditags, can be hybridized with oligonucleotides immobilized on a solid support (e.g., nitrocellulose filter, glass slide, silicon chip). Such techniques include "parallel sequence analysis" or PSA, as described below. The sequence of the ditags formed by the method of the invention can also be determined using limiting dilutions by methods including clonal sequencing (CS).

Briefly, PSA is performed after ditag preparation, wherein the oligonucleotide sequences to which the ditags are hybridized are preferably unlabeled and the ditag is preferably detectably labeled. Alternatively, the oligonucleotide can be labeled rather than the ditag. The ditags can be detectably labeled, for example, with a radioisotope, a fluorescent compound, a bioluminescent compound, a chemiluminescent compound, a metal chelator, or an enzyme. Those of ordinary skill in the art will know of other suitable labels for binding to the ditag, or will be able to ascertain such, using routine experimentation. For example, PCR can be performed with labeled (e.g., fluorescein tagged) primers. Preferably, the ditag contains a fluorescent end label.

The labeled or unlabeled ditags are separated into single-stranded molecules which are preferably serially diluted and added to a solid support (e.g., a silicon chip as described by Fodor, et al., Science, 251:767, 1991) containing oligonucleotides representing, for example, every possible permutation of a 10-mer (e.g., in each grid of a chip). The solid support is then used to determine differential expression of the tags contained within that support (e.g., on a grid on a chip) by hybridization of the oligonucleotides on the solid support with tags produced from cells under different conditions (e.g., different stage of development, growth of cells in the absence and presence of a growth factor, normal versus transformed cells, comparison of different tissue expression, etc). In the case of fluoresceinated end labeled ditags, analysis of fluorescence is indicative of hybridization to a particular 10-mer. When the immobilized oligonucleotide is fluoresceinated, for example, a loss of fluorescence due to quenching (by the proximity of the hybridized ditag to the labeled oligo) is observed and is analyzed for the pattern of gene expression.
An illustrative example of the method is shown in Example 4 herein.

The SAGE method of the invention is also useful for clonal sequencing, similar to limiting dilution techniques used in cloning of cell lines. For example, ditags or concatemers thereof are diluted and added to individual receptacles such that each receptacle contains less than one DNA molecule per receptacle. DNA in each receptacle is amplified and sequenced by standard methods known in the art, including mass spectroscopy. Assessment of differential expression is performed as described above for SAGE.

Those of skill in the art can readily determine other methods of analysis for ditags or individual tags produced by SAGE as described in the present invention, without resorting to undue experimentation.

The concept of deriving a defined tag from a sequence in accordance with the present invention is useful in matching tags of samples to a sequence database. In the preferred embodiment, a computer method is used to match a sample sequence with known sequences.


In one embodiment, a sequence tag for a sample is compared to corresponding information in a sequence database to identify known sequences that match the sample sequence. One or more tags can be determined for each sequence in the sequence database as the N base pairs adjacent to each anchoring enzyme site within the sequence. However, in the preferred embodiment, only the first anchoring enzyme site from the 3' end is used to determine a tag. In the preferred embodiment, the adjacent base pairs defining a tag are on the 3' side of the anchoring enzyme site, and N is preferably 9.

A linear search through such a database may be used. However, in the preferred embodiment, a sequence tag from a sample is converted to a unique numeric representation by converting each base pair (A, C, G, or T) of an N-base tag to a number or "tag code" (e.g., A=0, C=1, G=2, T=3, or any other suitable mapping). A tag is determined for each sequence of a sequence database as described above, and the tag is converted to a tag code in a similar manner. In the preferred embodiment, a set of tag codes for a sequence database is stored in a pointer file. The tag code for a sample sequence is compared to the tag codes in the pointer file to determine the location in the sequence database of the sequence corresponding to the sample tag code. (Multiple corresponding sequences may exist if the sequence database has redundancies.)
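The tag derivation and numeric conversion described above can be sketched as follows. This is a minimal illustration, not the program used by the inventors: the function names are hypothetical, the anchoring enzyme site is assumed to be CATG (the NlaIII site used in the Examples), and a sequence whose 3'-most site has fewer than N bases after it is simply treated as yielding no tag.

```python
ANCHOR_SITE = "CATG"  # assumed anchoring enzyme site (NlaIII, as in the Examples)
TAG_LENGTH = 9        # N, the preferred tag length
BASE_TO_DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}  # example mapping from the text


def extract_tag(sequence, anchor=ANCHOR_SITE, n=TAG_LENGTH):
    """Return the n bases on the 3' side of the 3'-most anchor site, or None."""
    seq = sequence.upper()
    site = seq.rfind(anchor)  # first anchoring enzyme site counted from the 3' end
    if site == -1:
        return None
    start = site + len(anchor)
    tag = seq[start:start + n]
    return tag if len(tag) == n else None


def tag_code(tag):
    """Convert a tag to an integer by reading each base as one base-4 digit."""
    code = 0
    for base in tag:
        code = code * 4 + BASE_TO_DIGIT[base]
    return code
```

With this mapping every 9-base tag corresponds to exactly one integer between 0 and 4^9 - 1 (262,143), which is what allows a tag code to be used later as a position in a table.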

FIGURE 4 is a block diagram of a tag code database access system in accordance with the present invention. A sequence database 10 (e.g., the Human Genome Sequence Database) is processed as described above, such that each sequence has a tag code determined and stored in a pointer file 12. A sample tag code X for a sample is determined as described above, and stored within a memory location 14 of a computer. The sample tag code X is compared to the pointer file 12 for a matching sequence tag code. If a match is found, a pointer associated with the matching sequence tag code is used to access the corresponding sequence in the sequence database 10.

The pointer file 12 may be in any of several formats. In one format, each entry of the pointer file 12 comprises a tag code and a pointer to a corresponding record in the sequence database 10. The sample tag code X can be compared to sequence tag codes in a linear search. Alternatively, the sequence tag codes can be sorted and a binary search used. As another alternative, the sequence tag codes can be structured in a hierarchical tree structure (e.g., a B-tree), or as a singly or doubly linked list, or in any other conveniently searchable data structure or format.
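As an illustration of the sorted format with a binary search, the sketch below keeps (tag code, pointer) pairs in a sorted list and uses Python's standard `bisect` module to locate a sample tag code. The function names and the use of integers as record pointers are assumptions made for the example only.

```python
from bisect import bisect_left


def build_sorted_index(entries):
    """Sort (tag_code, record_pointer) pairs so they can be binary searched."""
    return sorted(entries)


def find_records(index, sample_code):
    """Return the pointers of all entries whose tag code equals sample_code."""
    # (sample_code,) sorts before every (sample_code, pointer) pair, so
    # bisect_left lands on the first entry carrying this tag code, if any.
    i = bisect_left(index, (sample_code,))
    pointers = []
    while i < len(index) and index[i][0] == sample_code:
        pointers.append(index[i][1])
        i += 1
    return pointers
```

Returning a list rather than a single pointer covers the case noted earlier in which a redundant database holds several sequences with the same tag.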

In the preferred embodiment, each entry of the pointer file 12 comprises only a pointer to a corresponding record in the sequence database 10. In building the pointer file 12, each sequence tag code is assigned to an entry position in the pointer file 12 corresponding to the value of the tag code. For example, if a sequence tag code was "1043", a pointer to the corresponding record in the sequence database 10 would be stored in entry 1043 of the pointer file 12. The value of a sample tag code X can be used to directly address the location in the pointer file 12 that corresponds to the sample tag code X, and thus rapidly access the pointer stored in that location in order to address the sequence database 10.
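Direct addressing of this kind can be sketched with an in-memory list standing in for the pointer file. The names below are hypothetical, and one departure from the text is made deliberately: each slot holds a list of pointers rather than a single pointer, so that redundant database entries sharing a tag are not lost.

```python
TAG_LENGTH = 9
TABLE_SIZE = 4 ** TAG_LENGTH  # one slot for every possible 9-base tag code


def build_pointer_table(entries, size=TABLE_SIZE):
    """Place each record pointer in the slot whose position equals its tag code."""
    table = [None] * size
    for code, pointer in entries:
        if table[code] is None:
            table[code] = [pointer]
        else:
            table[code].append(pointer)
    return table


def lookup(table, sample_code):
    """Address the table directly with the sample tag code; no search is needed."""
    return table[sample_code] or []
```

A lookup costs the same regardless of how many sequences the database holds, at the price of reserving a slot for every possible tag code whether or not it occurs.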

Because only four values are needed to represent all possible base pairs, using binary coded decimal (BCD) numbers for tag codes in conjunction with the preferred pointer file 12 structure leads to a "sparse" pointer file 12 that wastes memory or storage space. Accordingly, the present invention transforms each tag code to number base 4 (i.e., 2 bits per code digit), in known fashion, resulting in a compact pointer file 12 structure. For example, for tag sequence "ACGT", with A=00₂, C=01₂, G=10₂, T=11₂, the base four representation in binary would be "00011011". In contrast, the BCD representation would be "00000000 00000001 00000010 00000011". Of course, it should be understood that other mappings of base pairs to codes would provide equivalent function.
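The two-bits-per-base packing can be checked with a short sketch. The helper names are hypothetical, and the one-byte-per-digit string is included only to reproduce the contrasting representation quoted in the text for a four-base tag read in the order A, C, G, T.

```python
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_base4(tag):
    """Pack a tag into an integer using two bits per base."""
    code = 0
    for base in tag:
        code = (code << 2) | BASE_TO_BITS[base]
    return code


def base4_bitstring(tag):
    """Show the packed code as a bit string of exactly 2 * len(tag) bits."""
    return format(pack_base4(tag), "0{}b".format(2 * len(tag)))


def byte_per_digit_bitstring(tag):
    """Show the same digits stored one per byte, for comparison."""
    return " ".join(format(BASE_TO_BITS[base], "08b") for base in tag)
```

For a 9-base tag the packed form needs 18 bits instead of 72, so a directly addressed table needs only 4^9 = 262,144 entries.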

The concept of deriving a defined tag from a sample sequence in accordance with the present invention is also useful in comparing different samples for similarity. In the preferred embodiment, a computer method is used to match sequence tags from different samples. For example, in comparing materials having a large number of sequences (e.g., tissue), the frequency of occurrence of the various tags in a first sample can be mapped out as tag codes stored in a distribution or histogram-type data structure. For example, a table structured similar to pointer file 12 in FIGURE 4 can be used where each entry comprises a frequency of occurrence value. Thereafter, the various tags in a second sample can be generated, converted to tag codes, and compared to the table by directly addressing table entries with the tag code. A count can be kept of the number of matches found, as well as the location of the matches, for output in text or graphic form on an output device, and/or for storage in a data storage system for later use.
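A minimal sketch of this comparison is given below, using a `collections.Counter` keyed by tag code in place of a fixed-size frequency table. The function names are hypothetical and the example mapping A=0, C=1, G=2, T=3 is assumed.

```python
from collections import Counter

BASE_TO_DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}


def tag_code(tag):
    """Convert a tag to an integer by reading each base as one base-4 digit."""
    code = 0
    for base in tag:
        code = code * 4 + BASE_TO_DIGIT[base]
    return code


def tag_histogram(tags):
    """Frequency of occurrence of each tag code in one sample."""
    return Counter(tag_code(tag) for tag in tags)


def compare_samples(first_tags, second_tags):
    """Count second-sample tags that occur in the first sample.

    Returns (number_of_matches, sorted_list_of_matching_tag_codes).
    """
    table = tag_histogram(first_tags)
    matches = 0
    locations = set()
    for tag in second_tags:
        code = tag_code(tag)
        if table[code] > 0:  # a Counter returns 0 for codes it has never seen
            matches += 1
            locations.add(code)
    return matches, sorted(locations)
```

The per-code counts from `tag_histogram` for two samples can also be compared entry by entry to look for tags whose frequency differs between them.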

The tag comparison aspects of the invention may be implemented in hardware or software, or a combination of both. Preferably, these aspects of the invention are implemented in computer programs executing on a programmable computer comprising a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Data input through one or more input devices for temporary or permanent storage in the data storage system includes sequences, and may include previously generated tags and tag codes for known and/or unknown sequences. Program code is applied to the input data to perform the functions described above and generate output information. The output information is applied to one or more output devices, in known fashion.

Each such computer program is preferably stored on a storage media or device (e.g., ROM or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.

The following examples are intended to illustrate but not limit the invention. While they are typical of those that might be used, other procedures known to those skilled in the art may alternatively be used.

EXAMPLES

For exemplary purposes, the SAGE method of the invention was used to characterize gene expression in the human pancreas. NlaIII was employed as the first restriction endonuclease, or anchoring enzyme, and BsmFI as the second restriction endonuclease, or tagging enzyme, yielding a 9 bp tag (BsmFI was predicted to cleave the complementary strand 14 bp 3' to the recognition site GGGAC and to yield a 4 bp 5' overhang (New England BioLabs)). Overlapping the BsmFI and NlaIII (CATG) sites as indicated (GGGACATG) would be predicted to result in an 11 bp tag. However, analysis suggested that under the cleavage conditions used (37°C), BsmFI often cleaved closer to its recognition site, leaving a minimum of 12 bp 3' of its recognition site. Therefore, only the 9 bp closest to the anchoring enzyme site was used for analysis of tags. Cleavage at 65°C results in a more consistent 11 bp tag.

Computer analysis of human transcripts from GenBank indicated that greater than 95% of tags of 9 bp in length were likely to be unique and that inclusion of two additional bases provided little additional resolution. Human sequences (84,300) were extracted from the GenBank 87 database using the Findseq program provided on the IntelliGenetics Bionet on-line service. All further analysis was performed with a SAGE program group written in Microsoft Visual Basic for the Microsoft Windows operating system. The SAGE database analysis program was set to include only sequences noted as "RNA" in the locus description and to exclude entries noted as "EST", resulting in a reduction to 13,241 sequences. Analysis of this subset of sequences using NlaIII as anchoring enzyme indicated that 4,127 nine bp tags were unique while 1,511 tags were found in more than one entry. Nucleotide comparison of a randomly chosen subset (100) of the latter entries indicated that at least 83% were due to redundant database entries for the same gene or highly related genes (>95% identity over at least 250 bp). This suggested that 5,381 of the 9 bp tags (95.5%) were unique to a transcript or highly conserved transcript family. Likewise, analysis of the same subset of GenBank with an 11 bp tag resulted only in a 6% decrease in repeated tags (1,511 to 1,425) instead of the 94% decrease expected if the repeated tags were due to unrelated transcripts.
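The figures in this paragraph can be retraced with a few lines of arithmetic. The sketch below is an interpretation rather than part of the original analysis: it assumes the 5,381 figure is the 4,127 unique tags plus 83% of the 1,511 repeated tags, and it reads the "94% expected" figure as the drop in chance matches when two extra bases make the tag space 4² = 16 times larger. The function name is hypothetical.

```python
def tag_uniqueness_summary(unique=4127, repeated=1511, redundant_fraction=0.83,
                           repeated_11bp=1425, extra_bases=2):
    """Retrace the uniqueness figures quoted for 9 bp and 11 bp tags."""
    total = unique + repeated
    # Repeated tags attributed to redundant entries for the same or related genes.
    redundant = round(redundant_fraction * repeated)
    effectively_unique = unique + redundant
    return {
        "total_tags": total,
        "effectively_unique": effectively_unique,
        "percent_unique": 100 * effectively_unique / total,
        "observed_decrease_percent": 100 * (repeated - repeated_11bp) / repeated,
        # Assumed model: chance matches fall in proportion to the tag space.
        "expected_decrease_percent": 100 * (1 - 1 / 4 ** extra_bases),
    }
```

Under these assumptions the function gives 5,638 tags in total, 5,381 effectively unique (a little over 95%), an observed decrease of about 6%, and an expected decrease of about 94%.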

EXAMPLE 1

As outlined above, mRNA from human pancreas was used to generate ditags. Briefly, five ug mRNA from total pancreas (Clontech) was converted to double stranded cDNA using a BRL cDNA synthesis kit following the manufacturer's protocol, using the primer biotin-5'T18-3'. The cDNA was then cleaved with NlaIII and the 3' restriction fragments isolated by binding to magnetic streptavidin beads (Dynal). The bound DNA was divided into two pools, and one of the following linkers ligated to each pool:

5'-TTTTACCAGCTTATTCAATTCGGTCCTCTCGCACAGGGACATG-3'
3'-   NTGGTCGAATAAGTTAAGCCAGGAGAGCGTGTCCCT-5'
(SEQ ID NO:1 and 2)

5'-TTTTTGTAGACATTCTAGTATCTCGTCAAGTCGGAAGGGACATG-3'
3'-    NCATCTGTAAGATCATAGAGCAGTTCAGCCTTCCCT-5'
(SEQ ID NO:3 and 4),

where N is a dideoxy nucleotide (e.g., dideoxy A).

After extensive washing to remove unligated linkers, the linkers and adjacent tags were released by cleavage with BsmFI. The resulting overhangs were filled in with T4 polymerase and the pools combined and ligated to each other. The desired ligation product was then amplified for 25 cycles using 5'-CCAGCTTATTCAATTCGGTCC-3' and 5'-GTAGACATTCTAGTATCTCGT-3' (SEQ ID NO:5 and 6, respectively) as primers. The PCR reaction was then analyzed by polyacrylamide gel electrophoresis and the desired product excised. An additional 15 cycles of PCR were then performed to generate sufficient product for efficient ligation and cloning.

The PCR ditag products were cleaved with NlaIII and the band containing the ditags was excised and self-ligated. After ligation, the concatenated ditags were separated by polyacrylamide gel electrophoresis and products greater than 200 bp were excised. These products were cloned into the SphI site of pSL301 (Invitrogen). Colonies were screened for inserts by PCR using T7 and T3 sequences outside the cloning site as primers. Clones containing at least 10 tags (range 10 to 50 tags) were identified by PCR amplification and manually sequenced as described (Del Sal, et al., Biotechniques 7:514, 1989) using 5'-GACGTCGACCTGAGGTAATTATAACC-3' (SEQ ID NO:7) as primer. Sequence files were analyzed using the SAGE software group, which identifies the anchoring enzyme site with the proper spacing, extracts the two intervening tags and records them in a database. The 1,000 tags were derived from 413 unique ditags and 87 repeated ditags. The latter were only counted once to eliminate potential PCR bias of the quantitation. The function of SAGE software is merely to optimize the search for gene sequences.
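The tag extraction step performed by the software can be approximated as below. This is a hedged sketch and not the SAGE software group itself: the function names are hypothetical, the spacing limits of 18 to 26 bases between adjacent CATG sites are assumed values, and a ditag is treated as identical to its reverse complement when repeats are discarded.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_tags(concatemer, anchor="CATG", tag_len=9,
                 min_body=18, max_body=26, seen=None):
    """Extract the two tags of each ditag lying between adjacent anchor sites.

    min_body and max_body are assumed spacing limits. A ditag already in
    `seen` (in either orientation) is skipped so repeats are counted once.
    """
    seen = set() if seen is None else seen
    tags = []
    pieces = concatemer.upper().split(anchor)
    for body in pieces[1:-1]:  # keep only pieces with an anchor site on both sides
        if not (min_body <= len(body) <= max_body):
            continue  # improper spacing
        if set(body) - set("ACGT"):
            continue  # sequence ambiguity
        key = min(body, revcomp(body))  # orientation-independent ditag identity
        if key in seen:
            continue
        seen.add(key)
        tags.append(body[:tag_len])            # tag next to the 5' anchor site
        tags.append(revcomp(body[-tag_len:]))  # tag next to the 3' anchor site
    return tags
```

Passing the same `seen` set across all sequence files would extend the counted-once rule to the whole library rather than to a single clone.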

Table 1 shows analysis of the first 1,000 tags. Sixteen percent were eliminated because they either had sequence ambiguities or were derived from linker sequences. The remaining 840 tags included 351 tags that occurred once and 77 tags that were found multiple times. Nine of the ten most abundant tags matched at least one entry in GenBank R87. The remaining tag was subsequently shown to be derived from amylase. All ten transcripts were derived from genes of known pancreatic function and their prevalence was consistent with previous analyses of pancreatic RNA using conventional approaches (Han, et al., Proc. Natl. Acad. Sci. U.S.A. 83:110, 1986; Takeda, et al., Hum. Mol. Gen., 2:1793, 1993).

TABLE 1
Pancreatic SAGE Tags

Tag             Gene                                                N     Percent
GAGCACACC       Procarboxypeptidase A1 (X67318)                     64    7.6
TTCTGTGTG       Pancreatic Trypsinogen 2 (M27602)                   46    5.5
GAACACAAA       Chymotrypsinogen (M24400)                           37    4.4
TCAGGGTGA       Pancreatic Trypsinogen 1 (M22612)                   31    3.7
GCGTGACCA       Elastase IIIB (M18692)                              20    2.4
GTGTGTGCT       Protease E (D00306)                                 16    1.9
TCATTGGCC       Pancreatic Lipase (M93285)                          16    1.9
CCAGAGAGT       Procarboxypeptidase B (M81057)                      14    1.7
TCCTCAAAA       No Match, See Table 2, P1                           14    1.7
AGCCTTGGT       Bile Salt Stimulated Lipase (X54457)                12    1.4
GTGTGCGCT       No Match                                            11    1.3
TGCGAGACC       No Match, See Table 2, P2                           9     1.1
GTGAAACCC       21 Alu entries                                      8     1.0
GGTGACTCT       No Match                                            8     1.0
AAGGTAACA       Secretory Trypsin Inhibitor (M11949)                6     0.7
TCCCCTGTG       No Match                                            5     0.6
GTGACCACG       No Match                                            5     0.6
CCTGTAATC       M91159, M29366, 11 Alu entries                      5     0.6
CACGTTGGA       No Match                                            5     0.6
AGCCCTACA       No Match                                            5     0.6
AGCACCTCC       Elongation Factor 2 (Z11692)                        5     0.6
ACGCAGGGA       No Match, See Table 2, P3                           5     0.6
AATTGAAGA       No Match, See Table 2, P4                           5     0.6
[illegible]GG   No Match                                            4     0.5
TTCATACAC       No Match                                            4     0.5
GTGGCAGGC       NF-kB (X61499), Alu entry (S94541)                  4     0.5
GTAAAACCC       TNF receptor II (M55994), Alu entry (X01448)        4     0.5
GAACACACA       No Match                                            4     0.5
CCTGGGAAG       Pancreatic Mucin (J05582)                           4     0.5
CCCATCGTC       Mitochondrial Cytochrome c Oxidase (X15759)         4     0.5
(SEQ ID NO:8-37)

Summary
SAGE tags occurring                                                 N     Percent
Greater than three times                                            380   45.2
Three times (15 x 3 =)                                              45    5.4
Two times (32 x 2 =)                                                64    7.6
One time                                                            351   41.8
Total SAGE tags                                                     840   100.0

"Tag" indicates the 9 bp sequence unique to each tag, adjacent to the 4 bp anchoring NlaIII site. "N" and "Percent" indicate the number of times the tag was identified and its frequency, respectively. "Gene" indicates the accession number and description of GenBank R87 entries found to match the indicated tag using the SAGE software group, with the following exceptions. When multiple entries were identified because of duplicated entries, only one entry is listed. In the cases of chymotrypsinogen and trypsinogen 1, other genes were identified that were predicted to contain the same tags, but subsequent hybridization and sequence analysis identified the listed genes as the source of the tags. "Alu entry" indicates a match with a GenBank entry for a transcript that contained at least one copy of the alu consensus sequence (Deininger, et al., J. Mol. Biol., 151:17, 1981).

EXAMPLE 2

The quantitative nature of SAGE was evaluated by construction of an oligo-dT primed pancreatic cDNA library which was screened with cDNA probes for trypsinogen 1/2, procarboxypeptidase A1, chymotrypsinogen and elastase IIIB/protease E. Pancreatic mRNA from the same preparation as used for SAGE in Example 1 was used to construct a cDNA library in the ZAP Express vector using the ZAP Express cDNA Synthesis kit following the manufacturer's protocol (Stratagene). Analysis of 15 randomly selected clones indicated that 100% contained cDNA inserts. Plates containing 250 to 500 plaques were hybridized as previously described (Ruppert, et al., Mol. Cell. Biol. 8:3104, 1988). cDNA probes for trypsinogen 1, trypsinogen 2, procarboxypeptidase A1, chymotrypsinogen, and elastase IIIB were derived by RT-PCR from pancreas RNA. The trypsinogen 1 and 2 probes were 93% identical and hybridized to the same plaques under the conditions used. Likewise, the elastase IIIB probe and protease E probe were over 95% identical and hybridized to the same plaques.

The relative abundance of the SAGE tags for these transcripts was in excellent agreement with the results obtained with library screening (Figure 2). Furthermore, whereas neither trypsinogen 1 and 2 nor elastase IIIB and protease E could be distinguished by the cDNA probes used to screen the library, all four transcripts could readily be distinguished on the basis of their SAGE tags (Table 1).

EXAMPLE 3

In addition to providing quantitative information on the abundance of known transcripts, SAGE could be used to identify novel expressed genes. While for the purposes of the SAGE analysis in this example only the 9 bp sequence unique to each transcript was considered, each SAGE tag defined a 13 bp sequence composed of the anchoring enzyme (4 bp) site plus the 9 bp tag. To illustrate this potential, 13 bp oligonucleotides were used to isolate the transcripts corresponding to four unassigned tags (P1 to P4), that is, tags without corresponding entries from GenBank R87 (Table 1). In each of the four cases, it was possible to isolate multiple cDNA clones for the tag by simply screening the pancreatic cDNA library using a 13 bp oligonucleotide as hybridization probe (examples in Figure 3).

Plates containing 250 to 2,000 plaques were hybridized to oligonucleotide probes using the same conditions previously described for standard probes except that the hybridization temperature was reduced to room temperature. Washes were performed in 6xSSC/0.1% SDS for 30 minutes at room temperature. The probes consisted of 13 bp oligonucleotides which were labeled with γ-32P-ATP using T4 polynucleotide kinase. In each case, sequencing of the derived clones identified the correct SAGE tag at the predicted 3' end of the identified transcript. The abundance of plaques identified by hybridization with the 13-mers was in good agreement with that predicted by SAGE (Table 2). Tags P1 and P2 were found to correspond to amylase and preprocarboxypeptidase A2, respectively. No entry for preprocarboxypeptidase A2 and only a truncated entry for amylase was present in GenBank R87, thus accounting for their unassigned characterization. Tag P3 did not match any genes of known function in GenBank but did match numerous ESTs, providing further evidence that it represented a bona fide transcript. The cDNA identified by P4 showed no significant homology, suggesting that it represented a previously uncharacterized pancreatic transcript.

TABLE 2
Characterization of Unassigned SAGE Tags

                      Abundance
      TAG         SAGE    13mer Hyb          SAGE Tag   Description
P1    TCCTCAAAA   1.7%    1.5% (6/388)       +          3' end of Pancreatic Amylase (M28443) (SEQ ID NO:38)
P2    TGCGAGACC   1.1%    1.2% (43/3700)     +          3' end of Preprocarboxypeptidase A2 (U19977) (SEQ ID NO:39)
P3    ACGCAGGGA   0.6%    0.2% (5/2772)      +          EST match (R45808) (SEQ ID NO:40)
P4    AATTGAAGA   0.6%    0.4% (6/1587)      +          no match (SEQ ID NO:41)

"Tag" and "SAGE Abundance" are described in Table 1; "13mer Hyb" indicates the results obtained by screening a cDNA library with a 13mer, as described above. The number of positive plaques divided by the total plaques screened is indicated in parentheses following the percent abundance. A positive in the "SAGE Tag" column indicates that the expected SAGE tag sequence was identified near the 3' end of isolated clones. "Description" indicates the results of BLAST searches of the daily updated GenBank entries at NCBI in June 1995 (Altschul, et al., J. Mol. Biol., 215:403, 1990). A description and accession number are given for the most significant matches. P1 was found to match a truncated entry for amylase, and P2 was found to match an unpublished entry for preprocarboxypeptidase A2 which was entered after GenBank R87.

EXAMPLE 4

Ditags produced by SAGE can be analyzed by PSA or CS, as described in the specification. In a preferred embodiment of PSA, the following steps are carried out with ditags:

Ditags are prepared, amplified and cleaved with the anchoring enzyme as described in the previous examples.

5'-OOOOOOOOOOXXXXXXXXXXCATG-3'
3'-GTACOOOOOOOOOOXXXXXXXXXX-5'

Four-base oligomers containing an identifier (e.g., a fluorescent moiety, FL) are prepared that are complementary to the overhangs, for example, FL-CATG. The FL-CATG oligomers (in excess) are ligated to the ditags as shown below:

5'-FL-CATGOOOOOOOOOOXXXXXXXXXXCATG
      GTACOOOOOOOOOOXXXXXXXXXXGTAC-FL-5'

The ditags are then purified and melted to yield single-stranded DNAs having the formula:

5'-FL-CATGOOOOOOOOOOXXXXXXXXXXCATG and GTACOOOOOOOOOOXXXXXXXXXXGTAC-FL-5', for example. The mixture of single-stranded DNAs is preferably serially diluted. Each serial dilution is hybridized under appropriate stringency conditions with solid matrices containing gridded single-stranded oligonucleotides; all of the oligonucleotides contain a half-site of the anchoring enzyme cleavage sequence. In the example used herein, the oligonucleotide sequences contain a CATG sequence at the 5' end:

CATGOOOOOOOOOO, CATGXXXXXXXXXX, etc.
(or alternatively a CATG sequence at the 3' end: OOOOOOOOOOCATG)

The matrices can be constructed of any material known in the art and the oligonucleotide-bearing chips can be generated by any procedure known in the art, e.g., silicon chips containing oligonucleotides prepared by the VLSIP procedure (Fodor et al., supra).

The oligonucleotide-bearing matrices are evaluated for the presence or absence of a fluorescent ditag at each position in the grid.
In a preferred embodiment, there are 4^10, or 1,048,576, oligonucleotides on the grid(s) of the general sequence CATGOOOOOOOOOO, such that every possible 10-base sequence is represented 3' to the CATG, where CATG is used as an example of an anchoring enzyme half site that is complementary to the anchoring enzyme half site at the 3' end of the ditag. Since there are estimated to be no more than 100,000 to 200,000 different expressed genes in the human genome, there are enough oligonucleotide sequences to detect all of the possible sequences adjacent to the 3'-most anchoring enzyme site observed in the cDNAs from the expressed genes in the human genome.

In yet another embodiment, structures as described above containing the sequences

PRIMER A- GGAGCATG (X)10 (O)10 CATGCATCC -PRIMER B
PRIMER A- CCTCGTAC (X)10 (O)10 GTACGTAGG -PRIMER B

are amplified, cleaved with tagging enzyme and thereafter with anchoring enzyme to generate tag complements of the structure:

(O)10 CATG-3', which can then be labeled, melted, and hybridized with oligonucleotides on a solid support.

A determination is made of differential expression by comparing the fluorescence profile on the grids at different dilutions among different libraries (representing differential screening probes). For example:

[Grid diagrams: Library A and Library B, each with ditags diluted 1:10, 1:50 and 1:100. Each grid has columns A to E and rows 1 to 5, with "FL" marking each position at which fluorescence is observed.]
~ .. , . v~ JU;~ c K~ UI`~ 038 2 1 ~537~

The individ~.~aI oligonucleotides ~us h~brid~e to di~ags with the following charactens-~s Table 3 Dilution ~:~Q 1:50 I:IW
Lib A Lib 13 !ib A Li~ E~ Lib A Lib B
S IA + + + + +

2~ ~ +
3B + + + + + +
3C ~ ~ +
0 4D + + +
~A ~ ~ +
SE

Table 3 summarizes the results of the differential hybridization. Tags hybridizing to 1A and 3B reflect highly abundant mRNAs that are not differentially expressed (since the tags hybridize to both libraries at all dilutions); tag 2C identifies a highly abundant mRNA, but only in Library B. 2E reflects a low abundance transcript (since it is only detected at the lowest dilution) that is not found to be differentially expressed; 3C reflects a moderately abundant transcript (since it is expressed at the lower two dilutions) in Library B that is expressed at low abundance in Library A. 4D reflects a differentially expressed, high abundance transcript restricted to Library A; 5A reflects a transcript that is expressed at high abundance in Library A but only at low abundance in Library B; and 5E reflects a differentially expressed transcript that is detectable only in Library B.
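The reasoning used to read a dilution series can be written down as a small rule set. The sketch below is one possible formalization, with hypothetical function names: detection at all three dilutions is called high abundance, at the two lowest dilutions moderate, and at the 1:10 dilution alone low. Any pattern lacking a signal at 1:10 is reported as not detected, which is a simplifying assumption.

```python
DILUTIONS = ("1:10", "1:50", "1:100")


def abundance_call(detected):
    """Classify one grid position in one library from its dilution series.

    `detected` maps a dilution label to True where fluorescence was observed.
    """
    hits = [bool(detected.get(dilution, False)) for dilution in DILUTIONS]
    if all(hits):
        return "high"
    if hits[0] and hits[1]:
        return "moderate"
    if hits[0]:
        return "low"
    return "not detected"


def compare_libraries(lib_a, lib_b):
    """Return (call for A, call for B, whether the two calls differ)."""
    call_a = abundance_call(lib_a)
    call_b = abundance_call(lib_b)
    return call_a, call_b, call_a != call_b
```

Applied to position 3C as described above (Library A at 1:10 only, Library B at 1:10 and 1:50), the rules give low in Library A, moderate in Library B, and a differential call.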

In another PSA embodiment, step 3 above does not involve the use of a fluorescent or other identifier; instead, at the last round of amplification of the ditags, labeled dNTPs are used so that after melting, half of all molecules are labeled and can serve as probes for hybridization to oligonucleotides fixed on the chips.

In yet another PSA embodiment, instead of ditags, a particular portion of the transcript is used, e.g., the sequence between the 3' terminus of the transcript and the first anchoring enzyme site. In that particular case, a double-stranded cDNA reverse transcript is generated as described in the Detailed Description. The transcripts are cut with the anchoring enzyme, a linker is added containing a PCR primer and amplification is initiated (using the primer at one end and the poly A tail at the other) while the transcripts are still on the streptavidin bead. At the last round of amplification, fluoresceinated dNTPs are used so that half of the molecules are labeled. The linker-primer can be optionally removed by use of the anchoring enzyme at this point in order to reduce the size of the fragments. The soluble fragments are then melted and captured on solid matrices containing CATGOOOOOOOOOO, as in the previous example. Analysis and scoring (only of the half of the fragments which contain fluoresceinated bases) is as described above.

For use in clonal sequencing, ditags or concatemers would be diluted and added to wells of multiwell plates, for example, or other receptacles so that on average the wells would contain, statistically, less than one DNA molecule per well (as is done in limiting dilution for cell cloning). Each well would then receive reagents for PCR or another amplification process and the DNA in each receptacle would be sequenced, e.g., by mass spectroscopy. The results will either be a single sequence (there having been a single sequence in that receptacle), a "null" sequence (no DNA present) or a double sequence (more than one DNA molecule), which would be eliminated from consideration during data analysis. Thereafter, assessment of differential expression would be the same as described herein.
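The three possible outcomes per well can be estimated if the number of molecules per well is assumed to follow a Poisson distribution. That model is an assumption introduced here for illustration and is not stated in the text. The function name is hypothetical.

```python
import math


def well_outcome_probabilities(mean_molecules_per_well):
    """Probability of a null, single or multiple-molecule well (Poisson model)."""
    lam = mean_molecules_per_well
    p_null = math.exp(-lam)          # no DNA present
    p_single = lam * math.exp(-lam)  # exactly one molecule: a usable sequence
    p_multiple = 1.0 - p_null - p_single  # more than one molecule: discarded
    return {"null": p_null, "single": p_single, "multiple": p_multiple}
```

Under this model, lowering the mean number of molecules per well makes mixed wells rarer but leaves more wells empty, so the choice of dilution trades discarded reads against wasted reactions.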

These results demonstrate that SAGE provides both quantitative and qualitative data about gene expression. The use of different anchoring enzymes and/or tagging enzymes with various recognition elements lends great flexibility to this strategy. In particular, since different anchoring enzymes cleave cDNA at different sites, the use of at least 2 different AEs on different samples of the same cDNA preparation allows confirmation of results and analysis of sequences that might not contain a recognition site for one of the enzymes.

As efforts to fully characterize the genome near completion, SAGE should allow a direct readout of expression in any given cell type or tissue. In the interim, a major application of SAGE will be the comparison of gene expression patterns among tissues and in various developmental and disease states in a given cell or tissue. One of skill in the art with the capability to perform PCR and manual sequencing could perform SAGE for this purpose. Adaptation of this technique to an automated sequencer would allow the analysis of over 1,000 transcripts in a single 3 hour run. An ABI 377 sequencer can produce a 451 bp readout for 36 templates in a 3 hour run (451 bp / 11 bp per tag x 36 = 1,476 tags). The appropriate number of tags to be determined will depend on the application. For example, the definition of genes expressed at relatively high levels (0.5% or more) in one tissue, but low in another, would require only a single day. Determination of transcripts expressed at greater than 100 mRNAs per cell (0.025% or more) should be quantifiable within a few months by a single investigator. Use of two different anchoring enzymes will ensure that virtually all transcripts of the desired abundance will be identified. The genes encoding those tags found to be most interesting on the basis of their differential representation can be positively identified by a combination of database searching, hybridization, and sequence analysis as demonstrated in Table 2. Obviously, SAGE could also be applied to the analysis of organisms other than humans, and could direct investigation towards genes expressed in specific biologic states.
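The throughput figure in parentheses can be reproduced directly. The second function below is an inference rather than a statement from the text: it derives the total number of mRNA molecules per cell that is implied if 100 copies per cell correspond to an abundance of 0.025%. Both function names are hypothetical.

```python
def tags_per_run(read_length_bp=451, bp_per_tag=11, templates=36):
    """Tags obtainable from one sequencing run, counting whole tags per read."""
    return (read_length_bp // bp_per_tag) * templates


def implied_transcripts_per_cell(copies_per_cell=100, abundance_percent=0.025):
    """Total mRNA molecules per cell implied by a copy number and its abundance."""
    return copies_per_cell / (abundance_percent / 100)
```

With the default values the first function returns 1,476 tags per run and the second about 400,000 mRNA molecules per cell.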

SAGE, as described herein, allows comparison of expression of numerous genes among tissues or among different states of development of the same tissue, or between pathologic tissue and its normal counterpart. Such analysis is useful for identifying therapeutically, diagnostically and prognostically relevant genes, for example. Among the many utilities for SAGE technology is the identification of appropriate antisense or triple helix reagents which may be therapeutically useful. Further, gene therapy candidates can also be identified by the SAGE technology. Other uses include diagnostic applications for identification of individual genes or groups of genes whose expression is shown to correlate to predisposition to disease, the presence of disease, and prognosis of disease, for example. An abundance profile, such as that depicted in Table 1, is useful for the above described applications. SAGE is also useful for detection of an organism (e.g., a pathogen) in a host or detection of infection-specific genes expressed by a pathogen in a host.

The ability to identify a large number of expressed genes in a short period of time, as described by SAGE in the present invention, provides unlimited uses.

Although the invention has been described with reference to the presently preferred embodiment, it should be understood that various modifications can be made without departing from the spirit of the invention. Accordingly, the invention is limited only by the following claims.

Claims (45)

1. An isolated oligonucleotide composition having at least two defined nucleotide sequence tags, wherein at least one tag corresponds to at least one expressed gene.
2. The composition of claim 1, wherein the oligonucleotide consists of about 1 to 200 ditags.
3. The composition of claim 2, wherein the oligonucleotide consists of about 8 to 20 ditags.
4. A method for the detection of gene expression comprising:
producing complementary deoxyribonucleic acid (cDNA) oligonucleotides;
isolating a first defined nucleotide sequence tag from a first cDNA oligonucleotide and a second defined nucleotide sequence tag from a second cDNA oligonucleotide;
linking the first tag to a first oligonucleotide linker, wherein the first oligonucleotide linker comprises a first sequence for hybridization of an amplification primer and linking the second tag to a second oligonucleotide linker, wherein the second oligonucleotide linker comprises a second sequence for hybridization of an amplification primer; and
determining the nucleotide sequence of the tag(s), wherein the tag(s) correspond to an expressed gene.
5. The method of claim 4, further comprising ligating the first tag linked to the first oligonucleotide linker to the second tag linked to the second oligonucleotide linker and forming a ditag.
6. The method of claim 5, further comprising amplifying the ditag oligonucleotide.
7. The method of claim 5, further comprising producing concatemers of the ditags.
8. The method of claim 7, wherein the concatemer consists of about 2 to 200 ditags.
9. The method of claim 8, wherein the concatemer consists of about 8 to 20 ditags.
10. The method of claim 4, wherein the first and second oligonucleotide linkers comprise the same nucleotide sequence.
11. The method of claim 4, wherein the first and second oligonucleotide linkers comprise different nucleotide sequences.
12. The method of claim 11, wherein the first and second oligonucleotide linkers have a sequence:
5'-TTTTACCAGCTTATTCAATTCGGTCCTCTCGCACAGGGACATG -3' 3'- ATGGTCGAATAAGTTAAGCCAGGAGAGCGTGTCCCT -5' or 5'-TTTTTGTAGACATTCTAGTATCTCGTCAAGTCGGAAGGGACATG -3' 3'- AACATCTGTAAGATCATAGAGCAGTTCAGCCTTCCCT -5', wherein A is dideoxy A.
13. The method of claim 4, wherein the linkers comprise a second restriction endonuclease recognition site which allows cleavage at a site distant from the recognition site.
14. The method of claim 13, wherein the second restriction endonuclease is a type IIS endonuclease.
15. The method of claim 14, wherein the type IIS endonuclease is selected from the group consisting of BsmFI and FokI.
16. The method of claim 5, wherein the ditag is about 12 to 60 base pairs.
17. The method of claim 16, wherein the ditag is about 18 to 22 base pairs.
18. The method of claim 6, wherein the amplifying is by polymerase chain reaction (PCR).
19. The method of claim 18, wherein primers for PCR are selected from the group consisting of 5'-CCAGCTTATTCAATTCGGTCC-3' and 5'-GTAGACATTCTAGTATCTCGT-3'.
20. A method for detection of gene expression comprising:
cleaving a cDNA sample with a first restriction endonuclease, wherein the endonuclease cleaves the cDNA at a defined position at the 5' or 3' terminus of the cDNA thereby producing a defined sequence tag;
isolating the defined 5' or 3' cDNA tag;
ligating a first pool of tags with a first oligonucleotide linker having a first sequence useful for hybridization of an amplification primer and ligating a second pool of tags with a second oligonucleotide linker having a second sequence useful for hybridization of an amplification primer;
cleaving the tags with a second restriction endonuclease;
ligating the two pools of tags to produce a ditag; and
determining the nucleotide sequence of the tag(s), wherein the tag(s) correspond to a mRNA from an expressed gene.
21. The method of claim 20, further comprising amplifying the ditag.
22. The method of claim 20, wherein the first restriction endonuclease has at least one recognition site in the cDNA.
23. The method of claim 22, wherein the first restriction enzyme has a four base pair recognition site.
24. The method of claim 23, wherein the restriction endonuclease is NlaIII.
25. The method of claim 20, wherein the cDNA comprises a means for capture.
26. The method of claim 25, wherein the means for capture is a binding element.
27. The method of claim 26, wherein the binding element is biotin.
28. The method of claim 20, wherein the first and second oligonucleotide linkers comprise the same nucleotide sequence.
29. The method of claim 20, wherein the first and second oligonucleotide linkers comprise different nucleotide sequences.
30. The method of claim 29, wherein the first and second oligonucleotide linkers have a sequence:
5'-TTTTACCAGCTTATTCAATTCGGTCCTCTCGCACAGGGACATG -3' 3'- ATGGTCGAATAAGTTAAGCCAGGAGAGCGTGTCCCT -5' or 5'- TTTTTGTAGACATTCTAGTATCTCGTCAAGTCGGAAGGGACATG -3' 3'- AACATCTGTAAGATCATAGAGCAGTTCAGCCTTCCCT -5', wherein A is dideoxy A.
31. The method of claim 20, wherein the second restriction endonuclease cleaves at a site distant from the recognition site.
32. The method of claim 31, wherein the second restriction endonuclease is a type IIS endonuclease.
33. The method of claim 32, wherein the type IIS endonuclease is selected from the group consisting of BsmFI and FokI.
34. The method of claim 20, wherein the ditag is about 12 to 60 base pairs.
35. The method of claim 34, wherein the ditag is about 14 to 22 base pairs.
36. The method of claim 20, further comprising ligating the ditags to produce a concatemer.
37. The method of claim 36, wherein the concatemer consists of about 2 to 200 ditags.
38. The method of claim 37, wherein the concatemer consists of about 8 to 20 ditags.
39. The method of claim 20, wherein the amplifying is by polymerase chain reaction (PCR).
40. The method of claim 39, wherein primers for PCR are selected from the group consisting of 5'-CCAGCTTATTCAATTCGGTCC-3' and 5'-GTAGACATTCTAGTATCTCGT-3'.
41. A kit useful for detection of gene expression wherein the presence of a cDNA ditag is indicative of expression of a gene having a sequence of a tag of the ditag, the kit comprising one or more containers comprising a first container containing a first oligonucleotide linker having a first sequence useful for hybridization of an amplification primer; a second container containing a second oligonucleotide linker having a second sequence useful for hybridization of an amplification primer, wherein the linkers further comprise a restriction endonuclease site for cleavage of DNA at a site distant from the restriction endonuclease recognition site; and a third and fourth container having nucleic acid primers for hybridization to the first and second unique sequences of the linker.
42. The kit of claim 41, wherein the linkers have a sequence 5'-TTTTACCAGCTTATTCAATTCGGTCCTCTCGCACAGGGACATG -3' 3'- ATGGTCGAATAAGTTAAGCCAGGAGAGCGTGTCCCT -5' or 5'- TTTTTGTAGACATTCTAGTATCTCGTCAAGTCGGAAGGGACATG -3' 3'- AACATCTGTAAGATCATAGAGCAGTTCAGCCTTCCCT -5', wherein A is dideoxy A.
43. The kit of claim 41, wherein the restriction endonuclease is a type IIS endonuclease.
44. The kit of claim 43, wherein the type IIS endonuclease is BsmFI.
45. The kit of claim 41, wherein the primers for amplification are selected from the group consisting of 5'-CCAGCTTATTCAATTCGGTCC-3' and 5'-GTAGACATTCTAGTATCTCGT-3'.
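The linker and primer sequences recited in the claims can be checked for internal consistency with a short script. The sketch below is illustrative only and is not part of the claims. The helper names are hypothetical. The sequences are copied from claims 12 and 19. The recognition sequences CATG (NlaIII) and GGGAC (BsmFI) are supplied from general knowledge of those enzymes rather than from the claim text.

```python
# Translation table for base-by-base complementation of DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, preserving the written orientation."""
    return seq.translate(_COMPLEMENT)


# Top strands are written 5'->3'; bottom strands are written 3'->5',
# exactly as they appear in the claims.
LINKERS = {
    "linker 1": {
        "top": "TTTTACCAGCTTATTCAATTCGGTCCTCTCGCACAGGGACATG",
        "bottom_3to5": "ATGGTCGAATAAGTTAAGCCAGGAGAGCGTGTCCCT",
        "primer": "CCAGCTTATTCAATTCGGTCC",
    },
    "linker 2": {
        "top": "TTTTTGTAGACATTCTAGTATCTCGTCAAGTCGGAAGGGACATG",
        "bottom_3to5": "AACATCTGTAAGATCATAGAGCAGTTCAGCCTTCCCT",
        "primer": "GTAGACATTCTAGTATCTCGT",
    },
}


def check_linker(top: str, bottom_3to5: str, primer: str) -> dict:
    """Run simple consistency checks on one linker duplex."""
    return {
        # The bottom strand should pair with a contiguous stretch of the top.
        "bottom_pairs_with_top": complement(bottom_3to5) in top,
        # The amplification primer should match part of the top strand.
        "primer_within_top": primer in top,
        # Assumption: a CATG 3' end is what pairs with NlaIII-cut cDNA.
        "ends_with_CATG": top.endswith("CATG"),
        # Assumption: GGGAC is the BsmFI recognition sequence.
        "contains_GGGAC": "GGGAC" in top,
    }


if __name__ == "__main__":
    for name, linker in LINKERS.items():
        print(name, check_linker(**linker))
```

A check of this kind is a quick way to catch transcription errors when the same linker or primer sequence is repeated across several claims.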
CA002185379A 1995-09-12 1996-09-12 Method for serial analysis of gene expression Abandoned CA2185379A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US08/527,154 US5695937A (en) 1995-09-12 1995-09-12 Method for serial analysis of gene expression
US08/544,861 US5866330A (en) 1995-09-12 1995-10-18 Method for serial analysis of gene expression
US08/527,154 1995-10-18
US08/544,861 1995-10-18

Publications (1)

Publication Number Publication Date
CA2185379A1 true CA2185379A1 (en) 1997-03-13

Family

ID=27062344

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002185379A Abandoned CA2185379A1 (en) 1995-09-12 1996-09-12 Method for serial analysis of gene expression

Country Status (12)

Country Link
US (3) US5866330A (en)
EP (2) EP1231284A3 (en)
JP (3) JP3334806B2 (en)
AT (1) ATE239093T1 (en)
AU (2) AU7018896A (en)
CA (1) CA2185379A1 (en)
DE (2) DE69627768T2 (en)
DK (1) DK0761822T3 (en)
ES (1) ES2194957T3 (en)
GB (1) GB2305241B (en)
IE (1) IE80465B1 (en)
WO (1) WO1997010363A1 (en)

Families Citing this family (124)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5459037A (en) * 1993-11-12 1995-10-17 The Scripps Research Institute Method for simultaneous identification of differentially expressed mRNAs and measurement of relative concentrations
US6379897B1 (en) 2000-11-09 2002-04-30 Nanogen, Inc. Methods for gene expression monitoring on electronic microarrays
US5866330A (en) * 1995-09-12 1999-02-02 The Johns Hopkins University School Of Medicine Method for serial analysis of gene expression
US6418382B2 (en) 1995-10-24 2002-07-09 Curagen Corporation Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing
US5871697A (en) 1995-10-24 1999-02-16 Curagen Corporation Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing
US5972693A (en) * 1995-10-24 1999-10-26 Curagen Corporation Apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing
GB9618544D0 (en) * 1996-09-05 1996-10-16 Brax Genomics Ltd Characterising DNA
US5981190A (en) * 1997-01-08 1999-11-09 Ontogeny, Inc. Analysis of gene expression, methods and reagents therefor
US5968784A (en) * 1997-01-15 1999-10-19 Chugai Pharmaceutical Co., Ltd. Method for analyzing quantitative expression of genes
US6461814B1 (en) * 1997-01-15 2002-10-08 Dominic G. Spinella Method of identifying gene transcription patterns
US6143496A (en) 1997-04-17 2000-11-07 Cytonix Corporation Method of sampling, amplifying and quantifying segment of nucleic acid, polymerase chain reaction assembly having nanoliter-sized sample chambers, and method of filling assembly
JP2002502237A (en) * 1997-05-12 2002-01-22 ライフ テクノロジーズ,インコーポレイテッド Methods for generation and purification of nucleic acid molecules
WO1998053319A2 (en) * 1997-05-21 1998-11-26 The Johns Hopkins University Gene expression profiles in normal and cancer cells
AU750187B2 (en) 1997-09-17 2002-07-11 Johns Hopkins University, The P53-induced apoptosis
US6399334B1 (en) 1997-09-24 2002-06-04 Invitrogen Corporation Normalized nucleic acid libraries and methods of production thereof
US6297010B1 (en) * 1998-01-30 2001-10-02 Genzyme Corporation Method for detecting and identifying mutations
US6054276A (en) 1998-02-23 2000-04-25 Macevicz; Stephen C. DNA restriction site mapping
US6136537A (en) * 1998-02-23 2000-10-24 Macevicz; Stephen C. Gene expression analysis
DE19822287C2 (en) * 1998-05-18 2003-04-24 Switch Biotech Ag Cloning vector, its production and use for the analysis of mRNA expression patterns
AU4825699A (en) * 1998-06-19 2000-01-05 Genzyme Corporation Identification and use of differentially expressed genes and polynucleotide sequences
AU1478200A (en) * 1998-11-16 2000-06-05 Genelabs Technologies, Inc. Method for measuring target polynucleotides and novel asthma biomolecules
EP1024201B1 (en) * 1999-01-27 2003-11-26 Commissariat A L'energie Atomique Microassay for serial analysis of gene expression and applications thereof
JP3924976B2 (en) * 1999-02-17 2007-06-06 味の素株式会社 Gene frequency analysis method
AU3237600A (en) * 1999-02-23 2000-09-14 Warner-Lambert Company System and method for managing and presenting information derived from gene expression profiling
US7008768B1 (en) * 1999-02-26 2006-03-07 The United States Of America As Represented By The Department Of Health And Human Services Method for detecting radiation exposure
AU7569600A (en) 1999-05-20 2000-12-28 Illumina, Inc. Combinatorial decoding of random nucleic acid arrays
US20060115826A1 (en) * 1999-06-28 2006-06-01 Michael Bevilacqua Gene expression profiling for identification monitoring and treatment of multiple sclerosis
US20040225449A1 (en) * 1999-06-28 2004-11-11 Bevilacqua Michael P. Systems and methods for characterizing a biological condition or agent using selected gene expression profiles
US6692916B2 (en) * 1999-06-28 2004-02-17 Source Precision Medicine, Inc. Systems and methods for characterizing a biological condition or agent using precision gene expression profiles
US20080183395A1 (en) * 1999-06-28 2008-07-31 Michael Bevilacqua Gene expression profiling for identification, monitoring and treatment of multiple sclerosis
US6960439B2 (en) * 1999-06-28 2005-11-01 Source Precision Medicine, Inc. Identification, monitoring and treatment of disease and characterization of biological condition using gene expression profiles
US20050060101A1 (en) * 1999-06-28 2005-03-17 Bevilacqua Michael P. Systems and methods for characterizing a biological condition or agent using precision gene expression profiles
WO2001009384A2 (en) * 1999-07-29 2001-02-08 Genzyme Corporation Serial analysis of genetic alterations
US6306628B1 (en) * 1999-08-25 2001-10-23 Ambergen, Incorporated Methods for the detection, analysis and isolation of Nascent proteins
US6376177B1 (en) 1999-10-06 2002-04-23 Virtual Pro, Inc. Apparatus and method for the analysis of nucleic acids hybridization on high density NA chips
GB9923790D0 (en) * 1999-10-08 1999-12-08 Isis Innovation Immunoregulatory compositions
US6221600B1 (en) 1999-10-08 2001-04-24 Board Of Regents, The University Of Texas System Combinatorial oligonucleotide PCR: a method for rapid, global expression analysis
CA2395920A1 (en) * 1999-12-29 2001-07-05 Arch Development Corporation Method for generation of longer cdna fragments from sage tags for gene identification
US6566130B1 (en) 2000-01-28 2003-05-20 Henry M. Jackson Foundation For The Advancement Of Military Medicine Androgen-regulated gene expressed in prostate tissue
US20090176722A9 (en) 2000-01-28 2009-07-09 Shiv Srivastava Androgen-regulated PMEPA1 gene and polypeptides
US6618679B2 (en) 2000-01-28 2003-09-09 Althea Technologies, Inc. Methods for analysis of gene expression
AU2001234769A1 (en) * 2000-02-04 2001-08-14 Genzyme Corporation Isolation and identification of secreted proteins
US6897020B2 (en) 2000-03-20 2005-05-24 Newlink Genetics Inc. Methods and compositions for elucidating relative protein expression levels in cells
CA2403567A1 (en) * 2000-03-20 2001-09-27 Newlink Genetics Methods and compositions for elucidating protein expression profiles in cells
US6468749B1 (en) 2000-03-30 2002-10-22 Quark Biotech, Inc. Sequence-dependent gene sorting techniques
AU2001262152A1 (en) * 2000-03-31 2001-10-08 Memorec Stoffel Gmbh Method for extracting nucleic acids
DK2206791T3 (en) 2000-04-10 2016-10-24 Taxon Biosciences Inc Methods of study and genetic analysis of populations
WO2001084148A2 (en) 2000-04-28 2001-11-08 Sangamo Biosciences, Inc. Pharmacogenomics and identification of drug targets by reconstruction of signal transduction pathways based on sequences of accessible regions
AU2001257331A1 (en) 2000-04-28 2001-11-12 Sangamo Biosciences, Inc. Methods for designing exogenous regulatory molecules
US7923542B2 (en) 2000-04-28 2011-04-12 Sangamo Biosciences, Inc. Libraries of regulatory sequences, methods of making and using same
GB2365011A (en) * 2000-04-28 2002-02-13 Sangamo Biosciences Inc Methods for the characterisation of regulatory DNA sequences
CN100497655C (en) * 2000-05-01 2009-06-10 荣研化学株式会社 Method for detecting product of nucleic acid synthesizing reaction
DE10027218A1 (en) * 2000-05-31 2001-12-06 Hubert Bernauer Detecting heterogeneous nucleic acid sequences in organisms and cells, useful for detecting and identifying genetically modified organisms or their products
US7300751B2 (en) * 2000-06-30 2007-11-27 Syngenta Participations Ag Method for identification of genetic markers
AU7970401A (en) * 2000-06-30 2002-01-14 Syngenta Participations Ag Method for identification, separation and quantitative measurement of nucleic acid fragments
WO2002010438A2 (en) * 2000-07-28 2002-02-07 The Johns Hopkins University Serial analysis of transcript expression using long tags
US7257562B2 (en) * 2000-10-13 2007-08-14 Thallion Pharmaceuticals Inc. High throughput method for discovery of gene clusters
DE10100121A1 (en) * 2001-01-03 2002-08-01 Henkel Kgaa Method for determining skin stress or skin aging in vitro
DE10100127A1 (en) * 2001-01-03 2002-10-02 Henkel Kgaa Procedure for determining the homeostasis of the skin
US7754208B2 (en) 2001-01-17 2010-07-13 Trubion Pharmaceuticals, Inc. Binding domain-immunoglobulin fusion proteins
WO2002059359A2 (en) 2001-01-24 2002-08-01 Syngenta Participations Ag Method for non-redundant library construction
WO2002059357A2 (en) * 2001-01-24 2002-08-01 Genomic Expression Aps Assay and kit for analyzing gene expression
US20030165865A1 (en) * 2001-01-29 2003-09-04 Hinkel Christopher A. Methods of analysis of nucleic acids
FR2821087B1 (en) * 2001-02-16 2004-01-02 Centre Nat Rech Scient PROCESS FOR QUALITATIVE AND QUANTITATIVE ANALYSIS OF A POPULATION OF NUCLEIC ACIDS CONTAINED IN A SAMPLE
GB0104993D0 (en) * 2001-02-28 2001-04-18 Isis Innovations Ltd Methods for analysis of RNA
US6850930B2 (en) 2001-03-13 2005-02-01 Honeywell International Inc. Method for transforming words to unique numerical representation
JPWO2002074951A1 (en) * 2001-03-15 2005-05-19 呉羽化学工業株式会社 Method for creating cDNA tag for identification of expressed gene and method for gene expression analysis
AU2002245988A1 (en) * 2001-04-18 2002-10-28 Ulrich J. Krull Gradient resolved hybridisation platform
JP2004533245A (en) * 2001-05-04 2004-11-04 ヘルス リサーチ インコーポレイテッド High-throughput assays to identify gene expression modifiers
US20030082584A1 (en) * 2001-06-29 2003-05-01 Liang Shi Enzymatic ligation-based identification of transcript expression
US20030170695A1 (en) * 2001-06-29 2003-09-11 Liang Shi Enzymatic ligation-based identification of nucleotide sequences
US7026123B1 (en) 2001-08-29 2006-04-11 Pioneer Hi-Bred International, Inc. UTR tag assay for gene function discovery
AU2002350131A1 (en) * 2001-11-09 2003-05-26 Gene Logic Inc. System and method for storage and analysis of gene expression data
EP1451340B1 (en) * 2001-11-09 2014-01-08 Life Technologies Corporation Identification, monitoring and treatment of disease and characterization of biological condition using gene expression profiles
US20030190618A1 (en) * 2002-03-06 2003-10-09 Babru Samal Method for generating five prime biased tandem tag libraries of cDNAs
DE60326224D1 (en) * 2002-04-26 2009-04-02 Solexa Inc SIGNATURES OF CONSTANT LENGTH FOR THE PARALLEL SEQUENCING OF POLYNUCLEOTIDES
US7115370B2 (en) 2002-06-05 2006-10-03 Capital Genomix, Inc. Combinatorial oligonucleotide PCR
US20050250100A1 (en) * 2002-06-12 2005-11-10 Yoshihide Hayashizaki Method of utilizing the 5'end of transcribed nucleic acid regions for cloning and analysis
JP2004097158A (en) * 2002-09-12 2004-04-02 Kureha Chem Ind Co Ltd METHOD FOR PRODUCING cDNA TAG FOR IDENTIFICATION OF EXPRESSION GENE AND METHOD FOR ANALYZING GENE EXPRESSION BY USING THE cDNA TAG
AU2003295692A1 (en) * 2002-11-15 2004-06-15 Sangamo Biosciences, Inc. Methods and compositions for analysis of regulatory sequences
GB0228289D0 (en) 2002-12-04 2003-01-08 Genome Inst Of Singapore Nat U Method
DE10260928A1 (en) * 2002-12-20 2004-07-08 Henkel Kgaa Method for the determination of markers of human facial skin
DE10260931B4 (en) * 2002-12-20 2006-06-01 Henkel Kgaa Method for determining the homeostasis of hairy skin
EP1587914A4 (en) * 2003-01-16 2007-06-27 Health Research Inc Method for comprehensive identification of cell lineage specific genes
US20100216649A1 (en) * 2003-05-09 2010-08-26 Pruitt Steven C Methods for protein interaction determination
AU2004239760A1 (en) * 2003-05-09 2004-11-25 Health Research Inc. Improved methods for protein interaction determination
US8222005B2 (en) * 2003-09-17 2012-07-17 Agency For Science, Technology And Research Method for gene identification signature (GIS) analysis
EP2202322A1 (en) 2003-10-31 2010-06-30 AB Advanced Genetic Analysis Corporation Methods for producing a paired tag from a nucleic acid sequence and methods of use thereof
JP3845416B2 (en) * 2003-12-01 2006-11-15 株式会社ポストゲノム研究所 Gene tag acquisition method
EP1718765A2 (en) * 2004-01-26 2006-11-08 Isis Innovation Limited Molecular analysis
US20050266447A1 (en) * 2004-04-19 2005-12-01 Pioneer Hi-Bred International, Inc. Method for identifying activators of gene transcription
US20070003924A1 (en) * 2004-06-18 2007-01-04 The Ohio State University Research Foundation Serial analysis of ribosomal and other microbial sequence tags
US8005621B2 (en) * 2004-09-13 2011-08-23 Agency For Science Technology And Research Transcript mapping method
CN105012953B (en) 2005-07-25 2018-06-22 阿普泰沃研发有限责任公司 B- cells are reduced with CD37- specificity and CD20- specific binding molecules
CN101395281B (en) * 2006-01-04 2013-05-01 骆树恩 Methods for nucleic acid mapping and identification of fine-structural-variations in nucleic acids and utilities
US8071296B2 (en) * 2006-03-13 2011-12-06 Agency For Science, Technology And Research Nucleic acid interaction analysis
WO2007111937A1 (en) 2006-03-23 2007-10-04 Applera Corporation Directed enrichment of genomic dna for high-throughput sequencing
US20080124707A1 (en) * 2006-06-09 2008-05-29 Agency For Science, Technology And Research Nucleic acid concatenation
CA2654317A1 (en) 2006-06-12 2007-12-21 Trubion Pharmaceuticals, Inc. Single-chain multivalent binding proteins with effector function
EP2167130A2 (en) * 2007-07-06 2010-03-31 Trubion Pharmaceuticals, Inc. Binding peptides having a c-terminally disposed specific binding domain
JP2011509095A (en) 2008-01-09 2011-03-24 ライフ テクノロジーズ コーポレーション Method for producing a library of paired tags for nucleic acid sequencing
WO2012044847A1 (en) 2010-10-01 2012-04-05 Life Technologies Corporation Nucleic acid adaptors and uses thereof
US8263367B2 (en) * 2008-01-25 2012-09-11 Agency For Science, Technology And Research Nucleic acid interaction analysis
US9328172B2 (en) * 2008-04-05 2016-05-03 Single Cell Technology, Inc. Method of obtaining antibodies of interest and nucleotides encoding same
EP2365003A1 (en) * 2008-04-11 2011-09-14 Emergent Product Development Seattle, LLC CD37 immunotherapeutic and combination with bifunctional chemotherapeutic thereof
WO2009137369A1 (en) * 2008-05-03 2009-11-12 Tufts Medical Center, Inc. Neonatal salivary genomics
US8362318B2 (en) * 2008-12-18 2013-01-29 Board Of Trustees Of Michigan State University Enzyme directed oil biosynthesis in microalgae
WO2010127186A1 (en) 2009-04-30 2010-11-04 Prognosys Biosciences, Inc. Nucleic acid constructs and methods of use
EP2910649A1 (en) 2009-08-24 2015-08-26 National University Corporation Kanazawa University Detection of pancreatic cancer by gene expression profiling
WO2011082253A2 (en) 2009-12-30 2011-07-07 Board Of Trustees Of Michigan State University A method to produce acetyldiacylglycerols (ac-tags) by expression ofan acetyltransferase gene isolated from euonymus alatus (burning bush)
WO2011137368A2 (en) 2010-04-30 2011-11-03 Life Technologies Corporation Systems and methods for analyzing nucleic acid sequences
DK2582846T3 (en) 2010-06-16 2019-02-04 Taxon Biosciences Inc COMPOSITIONS AND PROCEDURES FOR IDENTIFICATION AND MODIFICATION OF CARBON CONTAINING COMPOSITIONS
US9268903B2 (en) 2010-07-06 2016-02-23 Life Technologies Corporation Systems and methods for sequence data alignment quality assessment
CN110016499B (en) 2011-04-15 2023-11-14 约翰·霍普金斯大学 Safety sequencing system
JP6366580B2 (en) 2012-06-22 2018-08-01 エイチティージー モレキュラー ダイアグノスティクス, インコーポレイテッド Molecular malignancy in melanocytic lesions
EP2694669B1 (en) 2012-06-28 2017-05-17 Taxon Biosciences, Inc. Methods for making or creating a synthetic microbial consortium identified by computational analysis of amplicon sequences
CN109457030B (en) 2012-10-29 2022-02-18 约翰·霍普金斯大学 Papanicolaou test for ovarian and endometrial cancer
US10392629B2 (en) 2014-01-17 2019-08-27 Board Of Trustees Of Michigan State University Increased caloric and nutritional content of plant biomass
WO2017027653A1 (en) 2015-08-11 2017-02-16 The Johns Hopkins University Assaying ovarian cyst fluid
EP3347466B1 (en) 2015-09-08 2024-01-03 Cold Spring Harbor Laboratory Genetic copy number determination using high throughput multiplex sequencing of smashed nucleotides
EA201890613A1 (en) 2015-09-21 2018-10-31 Аптево Рисёрч Энд Девелопмент Ллс POLYPEPTIDES CONNECTING CD3
CA3000405A1 (en) 2015-09-29 2017-04-06 Htg Molecular Diagnostics, Inc. Methods for subtyping diffuse large b-cell lymphoma (dlbcl)
CN109023536A (en) * 2018-06-28 2018-12-18 河南师范大学 A kind of plant degradation group library constructing method
JP7445334B1 (en) 2022-09-05 2024-03-07 株式会社キュービクス Detection of pancreatic cancer by combined detection of gene expression pattern specific to pancreatic cancer and measurement of CA19-9

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2036946C (en) * 1990-04-06 2001-10-16 Kenneth V. Deugau Indexing linkers
WO1993000353A1 (en) * 1991-06-20 1993-01-07 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Sequences characteristic of human gene transcription product
AU3665893A (en) * 1992-02-12 1993-09-03 United States Of America, As Represented By The Secretary, Department Of Health And Human Services, The Sequences characteristic of human gene transcription product
US5665544A (en) * 1992-05-27 1997-09-09 Amersham International Plc RNA fingerprinting to determine RNA population differences
US5840484A (en) * 1992-07-17 1998-11-24 Incyte Pharmaceuticals, Inc. Comparative gene transcript analysis
US6114114A (en) * 1992-07-17 2000-09-05 Incyte Pharmaceuticals, Inc. Comparative gene transcript analysis
US5756291A (en) * 1992-08-21 1998-05-26 Gilead Sciences, Inc. Aptamers specific for biomolecules and methods of making
US5652128A (en) * 1993-01-05 1997-07-29 Jarvik; Jonathan Wallace Method for producing tagged genes, transcripts, and proteins
US5459037A (en) * 1993-11-12 1995-10-17 The Scripps Research Institute Method for simultaneous identification of differentially expressed mRNAs and measurement of relative concentrations
WO1995014772A1 (en) * 1993-11-12 1995-06-01 Kenichi Matsubara Gene signature
EP1813684A3 (en) * 1994-02-14 2009-11-18 Smithkline Beecham Corporation Differently expressed genes in healthy and diseased subjects
US5552278A (en) * 1994-04-04 1996-09-03 Spectragen, Inc. DNA sequencing by stepwise ligation and cleavage
US5866330A (en) * 1995-09-12 1999-02-02 The Johns Hopkins University School Of Medicine Method for serial analysis of gene expression
US5658736A (en) * 1996-01-16 1997-08-19 Genetics Institute, Inc. Oligonucleotide population preparation

Also Published As

Publication number Publication date
AU6561496A (en) 1997-03-20
DE69627768D1 (en) 2003-06-05
DE69627768T2 (en) 2004-04-08
US6746845B2 (en) 2004-06-08
AU7018896A (en) 1997-04-01
JP2001155035A (en) 2001-06-08
JPH10511002A (en) 1998-10-27
EP1231284A2 (en) 2002-08-14
WO1997010363A1 (en) 1997-03-20
IE80465B1 (en) 1998-08-12
US5866330A (en) 1999-02-02
JP3334806B2 (en) 2002-10-15
GB9619024D0 (en) 1996-10-23
DE761822T1 (en) 2001-01-11
DK0761822T3 (en) 2003-08-18
US6383743B1 (en) 2002-05-07
EP1231284A3 (en) 2003-02-26
EP0761822A3 (en) 1998-08-05
GB2305241A (en) 1997-04-02
GB2305241B (en) 1999-11-10
ES2194957T3 (en) 2003-12-01
AU707846B2 (en) 1999-07-22
US20030049653A1 (en) 2003-03-13
EP0761822B1 (en) 2003-05-02
JP2001145495A (en) 2001-05-29
EP0761822A2 (en) 1997-03-12
ATE239093T1 (en) 2003-05-15

Legal Events

Date Code Title Description
FZDE Discontinued

Effective date: 20051114