CA2264581C - Characterising dna - Google Patents

Publication number
CA2264581C
Authority
CA
Canada
Prior art keywords
sub
fragments
sequence
site
sticky end
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA002264581A
Other languages
French (fr)
Other versions
CA2264581A1 (en
Inventor
Gunter Schmidt
Andrew Hugin Thompson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xzillion GmbH and Co KG
Original Assignee
Xzillion GmbH and Co KG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xzillion GmbH and Co KG
Publication of CA2264581A1
Application granted
Publication of CA2264581C
Anticipated expiration
Expired - Fee Related

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096 Processes for the isolation, preparation or purification of DNA or RNA; cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6834 Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837 Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips

Abstract

A method for characterising cDNA, which comprises: (a) cutting a sample comprising a population of one or more cDNAs or isolated fragments thereof, each having a strand complementary to the 3' poly-A terminus of an mRNA and bearing a tail, with a first sampling endonuclease at a first sampling site of known displacement from a reference site proximal to the tail to generate from each cDNA or isolated fragment thereof a first and second sub-fragment, each comprising a sticky end sequence of predetermined length and unknown sequence, the first sub-fragment bearing the tail; (b) sorting either the first or second sub-fragments into sub-populations according to their sticky end sequence and recording the sticky end sequence of each sub-population as the first sticky end; (c) cutting the sub-fragments in each sub-population with a second sampling endonuclease, which is the same as or different from the first sampling endonuclease, at a second sampling site of known displacement from the first sampling site to generate from each sub-fragment a further sub-fragment comprising a second sticky end sequence of predetermined length and unknown sequence; and (d) determining each second sticky end sequence; wherein the aggregate length of the first and second sticky end sequences of each sub-fragment is from 6 to 10; and wherein the sequences and relative positions of the reference site and first and second sticky ends characterise the or each cDNA.

Description

CA 02264581 1999-03-02   WO 98/10095   PCT/GB97/02403

CHARACTERISING DNA

Field of the Invention

The present invention relates to a method for characterising DNA, especially cDNA, so that the DNA may be identified, for example, from a population of DNAs. The invention also relates to a method for assaying the DNA.

Background to the Invention

Analysis of complex nucleic acid populations is a common problem in many areas of molecular biology, nowhere more so than in the analysis of patterns of gene expression. Various methods have been developed to allow simultaneous analysis of entire mRNA populations, or their corresponding cDNA populations, to enable us to begin to understand patterns of gene expression in vivo.

The method of "subtractive cloning" (Lee et al, Proc. Nat. Acad. Sci. USA 88, 2825-2829) allows identification of mRNAs, or rather, their corresponding cDNAs, that are differentially expressed in two related cell types. One can selectively eliminate cDNAs common to two related cell types by hybridising cDNAs from a library derived from one cell type to a large excess of mRNA from a related, but distinct cell type. mRNAs in the second cell type complementary to cDNAs from the first type will form double-stranded hybrids. Various enzymes exist which degrade such ds-hybrids, allowing these to be eliminated and thus enriching the remaining population in cDNAs unique to the first cell type. This method allows highly specific comparative information about differences in gene expression between related cell types to be derived and has had moderate success in isolating rare cDNAs.

The method of "differential display" (Liang and Pardee, Science 257, 967-971, 1992) sorts mRNAs using PCR primers to amplify selectively specific subsets of an mRNA population. An mRNA population is primed with a general poly-T primer to amplify one strand and a specific primer, of perhaps 10 nucleotides or so, to amplify the reverse strand with greater specificity.
In this way only mRNAs bearing the second primer sequence are amplified; the longer the second primer, the smaller the proportion of the total cDNA population that is amplified for any given sequence of that length used. The resultant amplified sub-population can then be cloned for screening or sequencing, or the fragments can simply be separated on a sequencing gel. Low copy number mRNAs are less likely to get lost in this sort of scheme in comparison with subtractive cloning, for example, and it is probably more reproducible. Whilst this method is more general than subtractive cloning, time-consuming analysis is required.

The method of "molecular indexing" (PCT/GB93/01452) uses populations of adaptor molecules to hybridise to the ambiguous sticky-ends generated by cleavage of a nucleic acid with a type IIs restriction endonuclease to categorise the cleavage fragments. Using specifically engineered adaptors one can specifically immobilise or amplify or clone specific subsets of fragments in a manner similar to differential display but achieving a greater degree of control. Again, time-consuming analysis is required.

The method of Kato (Nucleic Acids Research 12, 3685-3690, 1995) exemplifies the above molecular indexing approach and effects cDNA population analysis by sorting terminal cDNA fragments into sub-populations followed by selective amplification of specific subsets of cDNA fragments. Sorting is effected by using type IIs restriction endonucleases and adaptors. The adaptors also carry primer sites which, in conjunction with general poly-T primers, allow selective amplification of terminal cDNA fragments as in differential display.
It is possibly more precise than differential display in that it effects greater sorting: only about 100 cDNAs will be present in a given subset and sorting can be related to specific sequence features rather than using primers chosen by trial and error.

The method of "serial analysis of gene expression" (SAGE, Science 270, 484-487, 1995) allows identification of mRNAs, or rather, their corresponding cDNAs, that are expressed in a given cell type. It gives quantitative information about the levels of those cDNAs as well. The process involves isolating a "tag" from every cDNA in a population using adaptors and type IIs restriction endonucleases. A tag is a sample of a cDNA sequence of a fixed number of nucleotides sufficient to identify uniquely that cDNA in the population. Tags are then ligated together and sequenced. The method gives quantitative data on gene expression and will readily identify novel cDNAs. However, the method is extremely time-consuming in view of the large amount of sequencing required.

All of the above methods are relatively laborious and rely upon sequencing by traditional gel methods. Moreover, the methods require amplification by PCR, which is prone to produce artefacts.

Methods involving hybridisation grids, chips and arrays are advantageous in that they avoid gel methods for sequencing and are quantitative. They can be performed entirely in solution, and thus are readily automatable. These methods come in two forms. The first involves immobilisation of target nucleic acids to an array of oligonucleotides complementary to the terminal sequences of the target nucleic acid. Immobilisation is followed by partial sequencing of those fragments by a single base method, e.g. using type IIs restriction endonucleases and adaptors. This particular approach is advocated by Brenner in PCT/US95/12678.

The second form involves arrays of oligonucleotides of N bp length.
The array carries all 4^N possible oligonucleotides at specific points on the grid. Nucleic acids are hybridised as single strands to the array. Detection of hybridisation is achieved by fluorescently labelling each nucleic acid and determining from where on the grid the fluorescence arises, which determines the oligonucleotide to which the nucleic acid has bound. The fluorescent labels also give quantitative information about how much nucleic acid has hybridised to a given oligonucleotide. This information and knowledge of the relative quantities of individual nucleic acids should be sufficient to reconstruct the sequences and quantities of the hybridising population. This approach is advocated by Lehrach in numerous papers and Nucleic Acids Research 22, 3423 contains a recent discussion. A disadvantage of this approach is that the construction of large arrays of oligonucleotides is extremely technically demanding and expensive.

Summary of the Invention

The present invention provides a method for characterising cDNA, which comprises:
(a) cutting a sample comprising a population of one or more cDNAs or isolated fragments thereof, each having a strand complementary to the 3' poly-A terminus of an mRNA and bearing a tail, with a first sampling endonuclease at a first sampling site of known displacement from a reference site proximal to the tail to generate from each cDNA or isolated fragment thereof a first and second sub-fragment, each comprising a sticky end sequence of predetermined length and unknown sequence, the first sub-fragment bearing the tail;
(b) sorting either the first or second sub-fragments into sub-populations according to their sticky end sequence and recording the sticky end sequence of each sub-population as the first sticky end;
(c) cutting the sub-fragments in each sub-population with a second sampling endonuclease, which is the same as or different from the first sampling endonuclease, at a second sampling site of known
displacement from the first sampling site to generate from each sub-fragment a further sub-fragment comprising a second sticky end sequence of predetermined length and unknown sequence; and
(d) determining each second sticky end sequence;
wherein the aggregate length of the first and second sticky end sequences of each sub-fragment is from 6 to 10; and wherein the sequences and relative positions of the reference site and first and second sticky ends characterise the or each cDNA.

Optionally, the sample cut with the first sampling endonuclease comprises isolated fragments of the cDNAs produced by cutting a sample comprising a population of one or more cDNAs with a restriction endonuclease and isolating fragments whose restriction site is at the reference site.

This invention involves a process that allows a cDNA population, generated by various means, to be sorted into sub-populations or subsets. The process also allows the identification of individual molecules within a subset and it allows the quantity of those individual molecules to be determined. More specifically, this invention is capable of analysing a population of cDNAs derived from a specific cell type to generate a profile of gene expression for that cell. This profile would reveal which cDNAs are present and how much of each is present. From this it should then be possible to determine initial quantities of mRNA present in the cell, possibly by calibrating cDNA quantities against the expression of a known house-keeping gene whose in vivo levels could be determined directly.

It is not necessary to sequence an entire cDNA to identify uniquely its presence; only a short 'signature' of a few base pairs should be sufficient to identify uniquely all cDNAs, given, for example, a total cDNA population of about 80 000 in the human genome.
Given also that in the next few years the entire human genome will have been sequenced, it should be possible to use such signatures derived by this process to acquire the entire sequence of the original cDNAs from a sequence database. With the incomplete database that already exists, signatures that return no sequence from the database will probably be novel and this process will readily allow them to be isolated for complete sequencing. If a given signature returns more than one sequence then this process can readily resolve the returned sequences by acquiring further sequence data specifically from the sequence of interest. This is a feature of this process that is of great advantage over other methods such as SAGE.

Velculescu et al, Science 270, 484-487 (1995), have tested human sequences in release version 87 of the GenBank sequence database with every possible 9 bp sequence starting from a particular reference point, their 'anchoring enzyme' cutting site. Their results indicated that with a 9 bp sequence 95.5 % of tags corresponded to a unique transcript or highly conserved (> 95 % sequence identity over at least 250 bp) transcript family. Increasing the number of bp in the tags used to test the database to 11 bp resulted in only a 6 % decrease in the number of tags returning more than 1 sequence from the database.

Statistically, the odds that 2 sequences with the same signature are identical sequences can be calculated using Bayes' Theorem:

\[ P(\text{Identical} \mid \text{Same Signature}) = \frac{P(\text{Same Signature} \mid \text{Identical}) \times P(\text{Identical})}{P(\text{Same Signature})} \tag{1} \]

where "|" means "given that" and, similarly:

\[ P(\text{Not Identical} \mid \text{Same Signature}) = \frac{P(\text{Same Signature} \mid \text{Not Identical}) \times P(\text{Not Identical})}{P(\text{Same Signature})} \tag{2} \]

(1) divided by (2) gives:

\[ \text{Posterior Odds Identical} = \frac{P(\text{Same Signature} \mid \text{Identical})}{P(\text{Same Signature} \mid \text{Not Identical})} \times \text{Prior Odds Identical} = 4^{N} \times \text{Prior Odds Identical} \]

where N is the number of bases in the signature.
4^N clearly will rise very quickly with N. The Prior Odds Identical are the known odds of two random sequences being identical. In terms of a non-redundant sequence database this is actually zero. Thus we have 4^N signatures available to search a human sequence database. This analysis assumes equiprobable and spatially uncorrelated bases, which is clearly not true for real sequences. If there is spatial correlation of bases etc., much larger signatures might be necessary, but the analysis of Velculescu et al suggests this is not the case: longer signatures do not give greater resolution of sequences. 9 bp is sufficient as the human genome probably contains of the order of 80 000 sequences, of which a large number are closely related, as defined above. An 8 bp signature gives 65536 distinct signatures. For experimental purposes, i.e. for analysing tissue samples, this will be enough to resolve the estimated 15000 distinct cDNAs that are expected in the average cell, but one might expect that a number of signatures might return more than 1 sequence. These can fortunately be readily resolved by further analysis, as discussed below.

Thus, at least for human cDNAs, the aggregate length of the first and second sticky-end sequences of each sub-fragment is preferably 8, and conveniently, the length of each sticky end is 4.

cDNAs from species other than humans can also be readily analysed by the process of the present invention. The aggregate length of the first and second sticky-end sequences can be tailored to the size of the cDNA population expected for a particular species with similar optimization procedures as discussed below. The size of the signature may vary depending on the size of the genome to be analysed. More general nucleic acid populations may also be analysed, such as restriction fragments generated from plasmids or small bacterial or viral genomes.
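The signature-space arithmetic above can be checked with a short calculation. The following is a minimal sketch, assuming equiprobable and uncorrelated bases (an idealisation which, as noted above, does not hold for real sequences). The function names are illustrative and are not part of the described method.

```python
def signature_space(n_bases: int) -> int:
    """Number of distinct signatures of n_bases over the alphabet A, C, G, T."""
    return 4 ** n_bases


def min_signature_length(population: int) -> int:
    """Smallest signature length whose space is at least the population size."""
    n = 1
    while signature_space(n) < population:
        n += 1
    return n


def expected_shared_fraction(population: int, n_bases: int) -> float:
    """Fraction of cDNAs expected to share a signature with at least one other.

    Assumes every cDNA draws its signature uniformly and independently,
    which is an idealisation of real sequence data.
    """
    space = signature_space(n_bases)
    return 1.0 - (1.0 - 1.0 / space) ** (population - 1)


if __name__ == "__main__":
    print(signature_space(8))                    # size of the 8 bp signature space
    print(min_signature_length(80_000))          # length needed for ~80 000 sequences
    print(expected_shared_fraction(15_000, 8))   # idealised collision estimate
```

Under these idealised assumptions roughly a fifth of 15000 cDNAs would share an 8 bp signature with at least one other cDNA, which is in line with the expectation that some signatures return more than one sequence and need further resolution.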
Other similarly generated populations could similarly be analysed.

When the restriction endonuclease is used to produce fragments from the cDNAs, it is preferred that the first sampling endonuclease binds to a first recognition site and cuts at the first sampling site at a predetermined displacement from the restriction site of the restriction endonuclease. Preferably, the first recognition site is provided in a first adaptor oligonucleotide which is hybridised or ligated to the restriction site of the isolated fragments. In this way, the fragments need contain no recognition site for the first sampling endonuclease. Preferably, a low stringency restriction endonuclease is used to generate the cDNA fragments, such as one which recognises a 4 base pair binding site (e.g. NlaIII, which cuts at CATG leaving a 4 bp sticky-end). If too large a binding site needs to be recognised, the probability that no recognisable binding site is present in a specific cDNA is too great.

As an alternative to using the restriction endonuclease, the first sampling endonuclease may bind to the reference site and cut at the first sampling site at a predetermined displacement from the reference site. In either arrangement, it is necessary that a reference site be used because this site contributes to the information required to establish each "signature".

The importance of this step should be noted with regard to analysing a population of cDNAs. Cleaving the immobilised cDNAs with the 'reference enzyme' (i.e. the restriction endonuclease or first sampling endonuclease) will leave fragments that are known to be terminated by the reference site that is most 3' on the cDNA. With the purpose of searching a database in mind, this greatly reduces the search by starting from the restriction site nearest the 3' terminus (see Figure 8).
It also gives additional spatial information regarding the positions of the 'signature', in that there is a defined spacing between an 8 bp signature, say of two quadrats, and the reference site. There is a lower probability of an 8 bp signature occurring with a given spatial relationship to a defined restriction site than for a given 8 bp sequence to appear at a random position in the whole cDNA or in the genome as a whole. In this way the determinative power of an 8 bp signature is increased so that it is sufficient to identify uniquely all or at least the vast majority of cDNAs.

It is also important to ensure no sampling endonuclease recognition sites are present in the cDNA fragments prior to addition of adaptors bearing the sampling endonuclease recognition site. To avoid this problem the cDNA can be pre-treated with the sampling endonuclease before use of the restriction endonuclease or, for that matter, the sampling endonuclease and restriction endonuclease can be the same enzyme. This will generate fragments with ambiguous sticky-ends. If a different 'reference enzyme' is to be used, the majority of these sticky-ends will be removed by the subsequent cleavage with the 'reference enzyme', as this would be chosen to cut more frequently. Those that remain will be accounted for in the sorting process. This means that there will effectively be two 'reference enzymes' and this must be taken into account in the subsequent database searching by searching for both possible reference sequences. This might return more sequences for each region of 8 bp of variable sequence, thus use of two reference enzymes would preferably be avoided.

As a preferred alternative, to ensure the sampling endonuclease binds only to occurrences of its recognition sequence within an adaptor rather than to occurrences which may occur in the cDNA, one can synthesise the cDNA with 5-methyl cytosine and use adaptors synthesised with ordinary cytosine nucleotides.
As long as one uses a sampling endonuclease that is methylation sensitive, the sampling endonuclease will only bind to occurrences of its recognition sequence in an adaptor.

Preferably, the second sampling endonuclease binds to a second recognition site and cuts at the second sampling site at a predetermined displacement from the first sampling site. In this way, information (in the form of the first and second sticky-end sequences) is derived from first and second sampling sites and, additionally, their displacement from one another and from the reference site is known. Preferably, the first and second sampling endonucleases each comprise a type IIs endonuclease, which may be the same as or different from one another. The second recognition site may be provided in a second adaptor oligonucleotide which is hybridised or ligated to the first sticky-end.

The process of the present invention acquires minimal sequence data so that it is not reliant on excessive sequencing. It does not require traditional gel methods to acquire minimal sequence information. Since the entire process takes place in solution, the steps involved could be performed by a liquid-handling robot; hence this process is highly automatable. Sequence data in an automated system can then be acquired in parallel for the entire cDNA population of a cell.

Mixed nucleic acid population
-> Sort molecules into subsets
-> Sample sequence or otherwise characterise molecules within subsets simultaneously

The process avoids excessive sequencing using a sampling procedure, above, to generate signatures for each cDNA in a population. The preferred form of these signatures would be:

5' - CATGNNNNNXXXXNNNNNYYYYNNN....NNN - 3'
Reference ... space ... Sample 1 ... space ... Sample 2 ... unknown space ... poly-A tail

This sort of signature would preferably be acquired from an immobilised cDNA population. Clearly a signature could be acquired from anywhere in a sequence, but it must be taken from the same defined reference point in each sequence to be compared if minimal sequence data is to be usable. The cDNA population is preferably immobilised using the poly-A tail, at the 3' terminus, using, for example, a solid phase matrix. The first 4 bp of the signature (CATG above) is known, as this corresponds to the reference site, which could be from a low stringency ordinary type II restriction endonuclease. This may be used to fragment the cDNA population initially to generate a reference point from which samples are taken to generate unique signature information for every cDNA in a cell. The next 4 bp (XXXX above) are acquired at a known number of bp, which is the same for every cDNA in a population, from the 'reference site' by the 'first sampling endonuclease', which preferably is a type IIs restriction endonuclease. These 4 bp are unknown, but obviously only 256 possibilities exist. These may be determined by pulling out subsets corresponding to each of the possible 4 bp sequences using beads with oligonucleotides complementary to one of the possible sequences, as described below for the sorting procedure. The next 4 bp (YYYY above) are again generated at a known distance, which is the same for every cDNA in a population, from the first sampled sequence, possibly by the same type IIs 'sampling enzyme', and may be determined by the 'adaptor cycle', as described below. Thus for every cDNA, we have a known restriction site that is the last one of its kind on the cDNA before the poly-A tail, separated by a known distance from a sample of the cDNA sequence of known length.
This sample in turn is separated from the next sample by a known number of bp and the second sample length is again defined.

The sample lengths can be up to 5 bp as determined by the enzymes presently available. The distances between the samples or between the first sample and the reference site can be up to 20 bases but the actual distance does not matter except that it must be known. The restriction endonuclease cutting sequence can be of any length as long as it is a sequence that is recognised by a type IIs restriction endonuclease, but practically speaking it must be such as to ensure that the enzyme actually cuts every cDNA and that the terminal fragments of the cDNAs that remain are of a reasonable length to sample subsequently with the sampling endonuclease.

Clearly if a nucleic acid population is subjected to cleavage with a restriction endonuclease there will be sticky-ends at both termini of the nucleic acid fragments, which in most cases would be different at each end. This would cause problems to this sorting process.

For the purposes of this invention use of mRNA avoids this problem, since the 3' terminus of the UTR of a mRNA is characterised by the presence of a poly-A tail. This can be used to immobilise one terminus of each mRNA present to a matrix with a complementary poly-T oligonucleotide attached to its surface. This ensures only one terminus is exposed to subsequent cleavage by the type IIs restriction enzyme after cDNA synthesis. After restriction all non-immobilised fragments, i.e. those without a poly-A tail, are to be washed away, leaving only the immobilised terminal fragments. The purpose of this process is to derive sufficient information to identify uniquely each cDNA molecule present in a population.
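The signature layout described above (reference site, known spacing, first sample, known spacing, second sample) can be read off a known sequence in silico. The following is a minimal sketch under stated assumptions: the cDNA is given as its sense strand written 5' to 3', the reference site is CATG, and both spacings are 5 bases as drawn in the signature diagram. The names `Signature` and `extract_signature` and the default spacings are illustrative only; the actual spacings depend on the enzymes and adaptors used, and the sketch does not model the enzymology or exclude the poly-A tail.

```python
from typing import NamedTuple, Optional


class Signature(NamedTuple):
    reference: str
    first_sample: str
    second_sample: str


def extract_signature(
    cdna: str,
    reference_site: str = "CATG",  # e.g. the NlaIII site named in the text
    spacer1: int = 5,              # illustrative spacing, taken from the diagram
    sample_len: int = 4,
    spacer2: int = 5,              # illustrative spacing, taken from the diagram
) -> Optional[Signature]:
    """Read a signature from the sense strand of a cDNA, written 5' to 3'.

    The reference site used is the occurrence nearest the 3' terminus.
    Returns None if there is no site or too little sequence after it.
    """
    seq = cdna.upper()
    ref_start = seq.rfind(reference_site)  # last occurrence = most 3'
    if ref_start == -1:
        return None
    first_start = ref_start + len(reference_site) + spacer1
    second_start = first_start + sample_len + spacer2
    second_end = second_start + sample_len
    if second_end > len(seq):
        return None
    return Signature(
        reference_site,
        seq[first_start:first_start + sample_len],
        seq[second_start:second_end],
    )
```

A database search along the lines discussed in the text would apply the same function to each database entry and compare the resulting tuples with the experimentally determined signature.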
As long as the terminal fragments are of the order of about 10 to 20 nucleotides from the termination codon, this should be sufficient to obtain a unique signature for every cDNA, given a maximum total population of about 100 000 cDNAs in the human genome.

Type IIs restriction endonucleases, the 'sampling endonucleases', have the property that they recognise and bind to a specific sequence within a target DNA molecule, but they cut at a defined distance away from that sequence, generating single-stranded sticky-ends of known length but unknown sequence at the cleavage termini of the restriction products.

For example, the enzyme FokI generates an ambiguous (i.e. unknown) sticky-end of 4 bp, 9 bp downstream of its recognition sequence. This ambiguous sticky-end could thus be one of 256 possible 4 bp oligonucleotides (see Figure 1). Numerous other type IIs restriction endonucleases exist and could be used for this process, as discussed below in the section on restriction endonucleases. Their binding site can be provided by the adaptors used, as shown in Figure 2, for example.

Numerous type IIs restriction endonucleases exist and could be used as sampling enzymes for this process. Table 1 below gives a list of examples but is by no means comprehensive. A literature review of restriction endonucleases can be found in Roberts, R.J., Nucl. Acids Res. 18, 2351-2365, 1988. New enzymes are discovered at an increasing rate and more up to date listings are recorded in specialist databases such as REBase, which is readily accessible on the internet using software packages such as Netscape or Mosaic and is found at the World Wide Web address: http://www.neb.com/rebase/; Roberts R.J., Macelis D. (1996) Nucleic Acids Res. 24(1):223-25, "REBASE - restriction enzymes and methylases". REBase lists all restriction enzymes as they are discovered and is updated regularly; moreover it lists recognition sequences and isoschizomers of each enzyme and manufacturers and suppliers.
The spacing of recognition sites for a given enzyme within an adaptor can be tailored according to requirements and the enzyme's cutting behaviour (see Figure 2 above).

Enzyme name    Recognition sequence    Cutting site
FokI           GGATG                   9/13
BstF5I         GGATG                   2/0
SfaNI          GCATC                   5/9
HgaI           GACGC                   5/10
BbvI           GCAGC                   8/12

Table 1: Some typical type IIs restriction endonucleases

The requirement of the process is the generation of ambiguous sticky-ends at the termini of the nucleic acids being analysed. This could also be achieved by controlled use of 5' to 3' exonucleases. Clearly any method that achieves the creation of such sticky-ends will suffice for the process.

Similarly the low stringency restriction endonuclease is necessary only to cleave each cDNA once, preferably leaving sticky-ends. Any means, however, of cleaving the immobilised nucleic acid would suffice for this invention. Site specific chemical cleavage has been reported in Chu, B.C.F. and Orgel, L.E., Proc. Natl. Acad. Sci. USA, 1985, 963-967. Use of a non-specific nuclease to generate blunt ended fragments might also be used. Preferably, though, a type II restriction endonuclease would be used, chosen for accuracy of recognition of its site, maximal processivity and cheap and ready availability.

The first or second sub-fragments may be sorted in step (b) by any sorting method suitable to generate sub-populations according to their sticky-end sequence.
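The "top/bottom" cutting-site notation of Table 1 can be read as offsets counted from the end of the recognition sequence on each strand. The following is a minimal sketch, assuming a single forward-orientation recognition site on the top strand and an enzyme whose bottom-strand offset exceeds its top-strand offset (a 5' overhang, as with FokI at 9/13). The function name `type_iis_cut` is illustrative; enzymes such as the 2/0 entry of Table 1 fall outside this simplified model.

```python
from typing import Optional, Tuple


def type_iis_cut(
    top_strand: str,
    recognition: str = "GGATG",  # FokI recognition sequence from Table 1
    top_offset: int = 9,         # cut position on the top strand
    bottom_offset: int = 13,     # cut position on the bottom strand
) -> Optional[Tuple[str, str, str]]:
    """Model one cut by a type IIs enzyme on a sequence written 5' to 3'.

    Returns (upstream part, sticky end, downstream part), all read on the
    top strand, or None if the site is absent or the cut falls off the end.
    """
    if top_offset >= bottom_offset:
        raise ValueError("this sketch only models 5' overhangs")
    seq = top_strand.upper()
    site = seq.find(recognition)
    if site == -1:
        return None
    site_end = site + len(recognition)
    top_cut = site_end + top_offset
    bottom_cut = site_end + bottom_offset
    if bottom_cut > len(seq):
        return None
    # The bases between the two cut positions form the ambiguous sticky end.
    return seq[:top_cut], seq[top_cut:bottom_cut], seq[bottom_cut:]
```

With the FokI defaults the sticky end is the 4 bases that begin 9 bases after the recognition sequence, so its sequence depends only on the fragment and not on the adaptor that carries the recognition site.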
One method comprises dividing the sub-fragments into an array of samples, each sample in a separate container; contacting the array of samples with an array of solid phase affinity matrices, each solid phase affinity matrix bearing a unique base sequence of same predetermined length as the first sticky end, so that each sample is contacted with one of the possible base sequences and the array of samples is contacted with all possible base sequences of that predetermined length for hybridisation to occur only between each unique base sequence and first sticky end complementary with one another; and washing unhybridised material from the containers.

Thus, a heterogeneous population of nucleic acids derived by cleavage with the sampling endonuclease, such as FokI, can be sorted into sub-populations by 'pulling out' subsets of nucleic acids characterised by a particular sequence at the sticky-ends. One can isolate the sub-populations using, for example, beads coated with an oligonucleotide carrying a sticky-end complementary to that on the target subset of nucleic acids. The beads can then be isolated, washed and released into a clean container, which for the purposes of this process would preferably be a well in an array. Clearly any means of isolating cDNAs is usable in this invention, which includes immobilising complementary oligonucleotides onto any insoluble, solid phase support. This might for example include affinity chromatography, inert beads and centrifugation or any similar means, but beads, magnetic or not, are preferred.
Any appropriate container could be used but an array of wells would be preferred for use with liquid handling robots in an automated embodiment of the process.

In an alternative embodiment, cDNA fragments generated by the first cleavage with a type IIs restriction endonuclease to generate ambiguous sticky-ends can be sorted into sub-populations according to their sticky-ends using a hybridisation array. Typically, this method comprises (i) binding the sub-fragments to a hybridisation array comprising an array of oligonucleotide sets, each set bearing a unique base sequence of the same predetermined length as the first sticky end and identifiable by location in the array, all possible base sequences of that predetermined length being present in the array, so that each sub-population bearing its unique first sticky end is hybridised at an identifiable location in the array; and (ii) determining the location to identify the first sticky end sequence.

For a 4 bp ambiguous sticky-end, every possible combination of bases can be accounted for with an array of 256 oligonucleotide sets.

Ideally, the fragments to be used would be the fragments free in solution generated by the first sampling endonuclease cleavage. These fragments would carry an adaptor at the 5' terminus. To allow for a second cleavage with a sampling endonuclease, the oligonucleotides on the array would have to carry a recognition site for the second sampling endonuclease.

The step of determining each second sticky-end sequence may be accomplished in a number of ways. By the use of the second sampling endonuclease, two further sub-fragments are generated. Generally, immobilized fragments and fragments free in solution will have been generated.
Either set of fragments, both bearing ambiguous sticky-ends, could be analysed to determine additional sequence information.

Where a hybridisation array has been used to sort sub-fragments, the sub-fragments cut in step (c) are preferably those bound to the hybridisation array so that the further sub-fragments generated thereby remain bound to the hybridisation array. In this embodiment, the step (d) of determining each second sticky-end sequence comprises contacting the further sub-fragments under hybridisation conditions with an array of adaptor oligonucleotides, each adaptor oligonucleotide bearing a label and a unique base sequence of the same predetermined length as the second sticky end, the array containing all possible base sequences of that predetermined length, removing any unhybridised adaptor oligonucleotide, and determining the location of any hybridised adaptor oligonucleotide by detection of the label. This embodiment is particularly advantageous because such arrays of oligonucleotides can be constructed in very small chips of perhaps 2 mm² or less.
This enables minimal quantities of reagents to be used and so high concentrations can be used to increase the hybridisation rate of adaptors, which is the rate limiting step of this process.

As an alternative, where sub-populations of sub-fragments have been sorted, the step of determining each second sticky-end sequence comprises isolating the further sub-fragments from step (c) and contacting the further sub-fragments with an array of adaptor oligonucleotides in a cycle, each adaptor oligonucleotide bearing a label and a unique base sequence of the same predetermined length as the second sticky end, the array containing all possible base sequences of that predetermined length; wherein the cycle comprises sequentially contacting each adaptor oligonucleotide of the array with each sub-population of isolated sub-fragments under hybridisation conditions, removing any unhybridised adaptor oligonucleotide and determining the presence of any hybridised adaptor oligonucleotide by detection of the label, then repeating the cycle, until all of the adaptors in the array have been tested.

This particular part of the process may be termed "the adaptor cycle".

This part of the process is essentially sequencing by hybridisation and can be understood first by explaining it for the case of a single nucleic acid. Consider a single nucleic acid, immobilised at one terminus to a fixed insoluble matrix, that has been cleaved at the free terminus, as above, with FokI, thus generating a 4 bp ambiguous sticky-end.

To determine the sequence of that sticky-end one can probe the immobilised nucleic acid with an adaptor molecule. This would be an oligonucleotide carrying a sticky-end with one, known, sequence of 4 bp of the possible 256. The adaptor would additionally carry a fluorescent probe (and possibly a binding site for the sampling endonuclease).
If the adaptor is complementary to the ambiguous end of the target nucleic acid, it will hybridise and it will then be possible to ligate the adaptor to the target. The immobilised matrix can then be washed to remove any unbound adaptor. To determine whether the adaptor has hybridised to the immobilised target, one need only measure the fluorescence of the matrix. This will also reveal how much of the adaptor has hybridised, hence the amount of immobilised cDNA. Other means of detecting hybridisation may be used in this invention. Radio-labeled adaptors could be used as an alternative to a fluorescent probe, as could dyes, stable isotopes, tagging oligonucleotides, enzymes, carbohydrates and biotin, amongst others.

The construction of adaptor oligonucleotides is well known and details and reviews are available in numerous texts, including: Gait, M.J., editor, 'Oligonucleotide Synthesis: A Practical Approach', IRL Press, Oxford, 1990; Eckstein, editor, 'Oligonucleotides and Analogues: A Practical Approach', IRL Press, Oxford, 1991; Kricka, editor, 'Nonisotopic DNA Probe Techniques', Academic Press, San Diego, 1992; Haugland, 'Handbook of Fluorescent Probes and Research Chemicals', Molecular Probes, Inc., Eugene, 1992; Keller and Manack, 'DNA Probes, 2nd Edition', Stockton Press, New York, 1993; and Kessler, editor, 'Nonradioactive Labeling and Detection of Biomolecules', Springer-Verlag, Berlin, 1992.

Conditions for using such adaptors are also well known.
Details on the effects of hybridisation conditions for nucleic acid probes are available, for example, in any one of the following texts: Wetmur, Critical Reviews in Biochemistry and Molecular Biology, 26, 227-259, 1991; Sambrook et al, 'Molecular Cloning: A Laboratory Manual, 2nd Edition', Cold Spring Harbour Laboratory, New York, 1989; and Hames, B.D., Higgins, S.J., 'Nucleic Acid Hybridisation: A Practical Approach', IRL Press, Oxford, 1988.

Likewise, ligation of adaptors is well known and chemical methods of ligation are discussed, for example, in Ferris et al, Nucleosides and Nucleotides 8, 407-414, 1989; and Shabarova et al, Nucleic Acids Research 19, 4247-4251, 1991.

Preferably, enzymatic ligation would be used and preferred ligases are T4 DNA ligase, T7 DNA ligase, E. coli DNA ligase, Taq ligase, Pfu ligase, and Tth ligase. Details of such ligases are found, for example, in: Lehman, Science 186, 790-797, 1974; and Engler et al, 'DNA Ligases', pg 3-30 in Boyer, editor, 'The Enzymes, Vol 15B', Academic Press, New York, 1982. Protocols for the use of such ligases can be found in: Sambrook et al, cited above; Barany, PCR Methods and Applications, 1: 5-16, 1991; and Marsh et al, Strategies 5, 73-76, 1992.

If the adaptor is not complementary to the ambiguous sticky-end of the target nucleic acid then a second probe can be tried and the above process repeated until all 256 possible probes have been tested.

Clearly one of these will have to be complementary to the ambiguous end. Once this has been found, the terminus of the target nucleic acid will also carry a binding site for the sampling endonuclease that will allow cleavage of the target nucleic acid, exposing further bases for analysis, and the above process can be repeated for the next 4 bp of the target.
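As a rough illustration of the adaptor cycle for a single immobilised nucleic acid, the following Python sketch tries each of the 256 possible adaptor sticky-ends in turn until one is complementary to the target. It is a simplified model under stated assumptions: hybridisation and ligation are reduced to exact reverse-complement matching, sequences are written 5' to 3', and the function name `adaptor_cycle` is hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def adaptor_cycle(target_overhang):
    """Probe the target with one adaptor at a time.

    Returns the sticky-end of the adaptor that 'hybridises' (modelled as
    exact complementarity) and the number of adaptors tried so far.
    """
    wanted = reverse_complement(target_overhang)
    candidates = product("ACGT", repeat=len(target_overhang))
    for tried, bases in enumerate(candidates, start=1):
        adaptor_end = "".join(bases)
        if adaptor_end == wanted:  # a fluorescence signal would be seen here
            return adaptor_end, tried
    raise ValueError("overhang must contain only A, C, G and T")


adaptor_end, tried = adaptor_cycle("GGAC")
# The unknown overhang is read back as the reverse complement of the
# adaptor that gave a signal.
print(adaptor_end, reverse_complement(adaptor_end), tried)
```

In the worst case all 256 adaptors are tested before a signal is seen, which is why later passages discuss ways of shortening the cycle.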
This iterative process can be repeated until the entire target nucleic acid has been sequenced.

In a further aspect, the present invention provides a method for identifying cDNA in a sample. The method comprises characterising cDNA as described above so as to obtain the sequences and relative positions of the reference site and first and second sticky-ends and comparing those sequences and relative positions with the sequences and relative positions of the reference site and first and second sticky-ends of known cDNAs, such as those available from DNA databases, in order to identify the or each cDNA in the sample. This method can be used to identify a single cDNA or a population of cDNAs.

In a further aspect, the present invention provides a method for assaying for one or more specific cDNAs in a sample. This assay method comprises performing a method of characterising cDNA as described above, wherein the reference site is predetermined, each first sticky-end sequence in sorting step (b) is a predetermined first sticky-end sequence and each second sticky-end sequence in step (d) is determined by assay of a predetermined second sticky-end sequence. In this assay method, the relative positions of the reference site and predetermined first and second sticky-ends characterise the or each specific cDNA. The assay method can be used to detect the presence of a single specific cDNA or a population of specific cDNAs.
The reference site and first and second sticky-end sequences are preferably predetermined by selecting corresponding sequences from one or more known target cDNAs, such as those available from a DNA database.

The invention will now be described in further detail by way of example only, with reference to the following Example and the accompanying drawings, in which:

FIGURE 1 shows the restriction behaviour of FokI;
FIGURE 2 shows the cutting behaviour of adaptor oligonucleotides;
FIGURE 3 shows the structure of a preferred adaptor oligonucleotide;
FIGURE 4 shows the structure of a self-removing adaptor oligonucleotide;
FIGURE 5 shows a set of multiple dyes on oligonucleotide adaptors;
FIGURES 6a-c show a schematic representation of a process according to one embodiment of the invention;
FIGURES 7a-c show a schematic representation of a process according to another embodiment of the invention; and
FIGURE 8 shows an algorithm to search a sequence database to isolate human cDNAs corresponding to signatures.

The process of the invention can be applied to a heterogeneous population of immobilised nucleic acids allowing them to be analysed in parallel. To be successful when applied to a population of nucleic acids, this method relies on the fact that statistically 1 out of 256 molecules within the total population will carry each of the possible 4 bp sticky-ends after cleavage with FokI. The average human cell is estimated to express about 15,000 distinct types of mRNA. If a cDNA population is sorted into 256 sub-populations by the sorting procedure described above, each will contain on average 60 different cDNAs given an mRNA population of about 15,000 transcripts.
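The figures quoted above follow from simple counting, which the short Python sketch below reproduces. It assumes, as the text does, that sticky-ends are uniformly distributed over the possible sequences; the function name `expected_per_well` is hypothetical.

```python
def expected_per_well(n_transcripts, sticky_len=4):
    """Average number of distinct cDNAs per well when a population is
    sorted by a sticky-end of the given length (uniform distribution assumed)."""
    n_wells = 4 ** sticky_len
    return n_transcripts / n_wells


n_wells = 4 ** 4            # possible 4 bp sticky-ends
avg = expected_per_well(15_000)
signature_space = 4 ** 8    # possible 8 bp signatures (two 4 bp sticky-ends)
print(n_wells, round(avg, 1), signature_space)
```

The average comes out at a little under 59 cDNAs per well, which the text rounds to 60.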
If these are then cleaved with FokI, one would expect that almost all will have different ambiguous sticky-ends (there is about a 1 in 1000 chance of there being 2 distinct cDNAs having the same initial 4 bp sticky end) so for most purposes one can assume that a hybridisation signal corresponds to a single cDNA type. Thus sequential addition of fluorescently labeled adaptors will allow the terminal 4 bp of a mixed population of cDNAs to be determined, resulting in 8 bp of signature in total for each cDNA in the population.

Fluorescence detectors can usually detect fluorescence of just a single molecule as long as the signal reaches the photomultiplier, so choice in the design of immobilisation matrices is crucial to ensure the fidelity of the process. This means, however, that the hybridisation signal is quantitative when using fluorescently labeled adaptors, and will reveal how many adaptor molecules have hybridised to the immobilised fragments. This is clearly directly proportional to the number of copies of each cDNA that is present. Thus each hybridisation signal will also reveal the relative proportion of each cDNA within the population. This can be related back to the in vivo levels of the mRNA by determining directly the quantity of a specific mRNA in vivo, preferably one with a high copy number like a housekeeping gene. The ratio of this quantity to the relative quantity of that mRNA as determined by the adaptor cycle will be the conversion factor to calculate the original in vivo quantities of each mRNA.

Detection of fluorescent signals can be performed using optical equipment that is readily available. Fluorescent labels usually have optimum frequencies for excitation and then fluoresce at specific wavelengths in returning from an excited state to a ground state. Excitation can be performed with lasers at specific frequencies and fluorescence detected using collection lenses, beam splitters and signal distribution optics.
These direct fluorescent signals to photomultiplier systems which convert optical signals to electronic signals which can be interpreted using appropriate electronics systems. See, for example, pp 26-28 of PCT/US95/12678. A discussion of solid phase supports can also be found on pp 12-14 of that document.

Having acquired 4 bp of sequence information in the process of sorting cDNAs into subsets, one need only perform the adaptor cycle once to acquire an 8 bp signature for each cDNA in a well. Using a liquid handling robot, this can be performed simultaneously for all 256 wells generated by the sorting process.

The positioning of the recognition site for FokI in the adaptor will determine whether the next 4 bp exposed are the next 4 bp in the sequence. Alternatively, they may overlap partially with the last four base pairs, thus giving partially redundant information, or they may be further downstream, missing out a few bases and thus only sampling the sequence of the immobilised target nucleic acid. This is illustrated in Figure 2. The cutting behaviour of adaptors with respect to which nucleotides are left single-stranded in the target nucleic acid is determined by the spacing between the FokI recognition site and the target DNA. Sequential bases can be exposed with adaptor 1, while bases are sampled at intervals by adaptor 2. With adaptor 3, redundant information is acquired. Adaptor nucleic acid is shown in bold, whilst FokI binding sites are underlined.

Whatever spacing is used, the spatial information relating the 4 bp oligonucleotides is retained. For the purposes of this invention a sampling approach is sufficient, thus allowing the smallest and most economical adaptor to be constructed. Figure 3 shows a preferred minimal adaptor for use in acquiring signatures in the present invention. The recognition sequence of FokI is shown in bold.

A preferred embodiment of the process is shown in Figures 6a to c. In step 1, mRNAs are immobilized
by hybridisation to biotinylated poly-T. This allows capture of the population, after reverse transcription of the mRNA, onto avidinated glass beads. In step 2, the poly-A carrying cDNAs are treated with the restriction endonuclease and loose fragments are washed away. In step 3, an adaptor oligonucleotide is added which bears a sticky-end complementary to the restriction endonuclease sticky-end. The adaptor carries a recognition site for the first sampling endonuclease and, optionally, a label. In step 4, the immobilized cDNA fragments are treated with the first sampling endonuclease so as to generate for the first time an immobilized fragment with a sticky-end and a fragment free in solution (steps 2 and 3 are only optional if the immobilized sticky-end fragment is to be analysed). In step 5 of this embodiment, the loose sub-fragments in solution are isolated from the immobilized sub-fragments and sub-divided into 256 wells. Each well contains an insoluble matrix, preferably beads, derivatised with oligonucleotides carrying sticky-ends complementary to one of the 256 possible sticky-ends. Thus, the beads in each well in step 6 will immobilize one of the 256 possible sticky-ends from the sample, which are then ligated to the beads. Fragments that are not immobilized can then be washed away, thus generating a sorted population of 256 sub-populations of cDNA fragments.

In step 8, the second sampling endonuclease is added to each well containing the sub-population of immobilized fragments generated from step 7. The second sampling enzyme in this example is BspM1, whose recognition site is provided in the same sampling adaptor oligonucleotide attached to the bead.

The ambiguous sticky-end YYYY generated in step 8 is present on both the further sub-fragment in solution and the further sub-fragment immobilized to the bead.
The further sub-fragments are therefore readily separable by washing the immobilized matrix to remove cleaved adaptors and reagent as shown in step 9.

At this stage in the process, one option for analysis is to enter the "adaptor cycle" with the immobilized fragments. This is discussed in further detail below. If the fragments to be analysed by the adaptor cycle are free in solution, then they must be immobilized first. As a second option, either fragment can be analysed further by a number of other methods. If the fragment is labelled with a fluorescent dye, one can determine the terminal sequence using a hybridisation chip. If the label is an immobilization effector, then cleavage fragments can be isolated, immobilized and analysed by a single base method.

Referring to step 10 in Figure 6c, the further sub-fragment attached to the bead enters the adaptor cycle, as discussed in further detail below.

In a second preferred embodiment of the invention as shown in Figures 7a to c, steps 1 to 4 are as described above. At step 5 it is the immobilized fragments that are sorted into sub-sets for further analysis. The cDNAs on beads are divided into 256 samples and the cDNAs from the beads are released and the beads recovered. At step 6 in Figure 7b, to each well is added a magnetic bead bearing an oligonucleotide complementary to one of the possible 256 4 bp ambiguous sticky-ends generated by the first sampling endonuclease. After hybridisation, the beads are recovered and washed and each bead type, binding a sub-population of the fragments bearing a unique first sticky-end, is released into one of 256 clean wells. The wells contain a matrix to immobilize cDNAs permanently, such as avidinated glass beads.

In step 8, the hybridisation conditions are altered to release the beads, which are then recovered.
As a result of step 8, each well now contains beads with known first sticky-ends to which a known adaptor can be added carrying a recognition site for the same sampling endonuclease (in this case, FokI). Step 9 shows the step of adding the adaptor oligonucleotide, which is hybridised to the immobilized fragment. In step 10, the sampling endonuclease is added whereby a loose sub-fragment and an immobilized sub-fragment, each bearing the second sticky-end, are generated. Either of these fragments can be further analysed, as discussed in relation to the first embodiment.

Use of the adaptor cycle is further described in Figure 6c for the first embodiment of the invention and in Figure 7c for the second embodiment. Referring to Figure 6c, the beads carrying the second sticky-end are analysed using the adaptor cycle at step 10. An adaptor oligonucleotide bearing a fluorescent label is added to the beads. The adaptor contains a unique sticky-end which will be complementary to one of the 256 possible four base second sticky-ends that might be present on the immobilized sub-fragment. The sequence of the sticky-end of each adaptor oligonucleotide is predetermined. Unhybridised adaptors are washed away and the fluorescence is measured. The cycle is repeated until all of the adaptors have been tested.

If a signature returns more than one sequence from a database, one can attempt to resolve these sequences by using the known signature information. If resolving sequences is required, the adaptor cycle could be altered using adaptors of the form shown in Figure 4. This figure shows a self-removing adaptor in which the addition of a sampling endonuclease results in cleavage of the adaptor, removing only the nucleotides it added to the target nucleic acid and thereby re-exposing the bases whose sequence is being determined.
The recognition sequence shown in the adaptor in the figure is that of BspM1.

After determining the second quadrat of a signature using adaptors of the form above it would be possible to remove them and then, if a particular signature had returned more than one sequence, a second adaptor specific to the terminal 4 bp could be added to acquire a further sample. Using an appropriate sampling enzyme this could be 2 or 3 or 4 further bp of sequence, depending on requirement, but clearly fewer bases of additional sequence require fewer adaptors to determine the sequence of the resulting sticky-ends.

Once sequence information has been derived for a cDNA, perhaps by previous profiling, the present invention can be used to isolate a specific cDNA fragment using the same approach but focusing on one specific cDNA. Thus if the first 4 bp of signature are known then one can select for that subset of all cDNAs using the corresponding magnetic bead that would have been used in the sorting process. The sequence of the next 4 bp derived from the adaptor cycle could then be used to construct an adaptor carrying that appropriate sticky-end and a specific PCR primer. The desired cDNA could then be amplified using a general poly-T primer and the specific primer on the adaptor. The amplified fragment would provide a unique probe that could be used to identify the complete cDNA or mRNA on a Southern or Northern blot.

In order to speed up the adaptor cycle, adaptors can be added in groups so long as individual subsets of adaptors are each labeled with a different fluorescent marker to permit hybridisation of each adaptor subset to be distinguished. This sort of modification will still allow quantitative information to be derived but 4 different photomultipliers would be required to detect each label.
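The grouping of adaptors by dye can be pictured with the following Python sketch. It is an illustration only: it assumes four distinguishable dyes (one per photomultiplier) with one adaptor per dye in each round, and the dye names and the function `dye_rounds` are hypothetical.

```python
from itertools import product


def dye_rounds(dyes, length=4):
    """Split all 4**length adaptor sticky-ends into rounds in which each
    adaptor carries a different dye, so a whole round is tested at once."""
    adaptors = ["".join(bases) for bases in product("ACGT", repeat=length)]
    rounds = []
    for start in range(0, len(adaptors), len(dyes)):
        group = adaptors[start:start + len(dyes)]
        rounds.append(dict(zip(dyes, group)))  # dye -> adaptor sticky-end
    return rounds


rounds = dye_rounds(["dye_1", "dye_2", "dye_3", "dye_4"])
print(len(rounds), rounds[0])
```

Under these assumptions the 256 adaptors are covered in 64 hybridisation rounds instead of 256.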
Figure 5 shows the use of multiple dyes on adaptors which would allow groups of adaptors to be tested simultaneously.

One potential problem with the 'Adaptor Cycle' is to ensure that hybridisation of probes is accurate. There are major differences between the stability of short oligonucleotide duplexes containing all Watson-Crick base pairs. For example, duplexes comprising only adenine and thymine are unstable relative to duplexes of guanine and cytosine only. These differences in stability can present problems when trying to hybridise mixtures of short oligonucleotides (e.g. 4mers) to complementary target DNA. Low temperatures are needed to hybridise A-T rich sequences but at these temperatures G-C rich sequences will hybridise to sequences that are not fully complementary. This means that some mismatches may happen and specificity can be lost for the G-C rich sequences. At higher temperatures G-C rich sequences will hybridise specifically but A-T rich sequences will not hybridise.

In order to normalise these effects modifications can be made to the Watson-Crick bases. The following are examples but they are not limiting:

• The adenine analogue 2,6-diaminopurine forms three hydrogen bonds to thymine rather than two and therefore forms more stable base pairs.
• The thymine analogue 5-propynyl dU forms more stable base pairs with adenine.
• The guanine analogue hypoxanthine forms two hydrogen bonds with cytosine rather than three and therefore forms less stable base pairs.

These and other possible modifications should make it possible to compress the temperature range at which random mixtures of short nucleotides can hybridise specifically to their complementary sequences.

It may also be possible to design smaller sets of adaptors with base analogs that bind to multiple bases such as deoxyinosine, 2-aminopurine or the like (Kong Thoo Lin et al, Nucleic Acids Research 20, 5149-5152).
Such a set might have adaptors of the form below:

GGATG          GGATG
CCTACAANG      CCTACANTG

N would represent all 4 bases at that position. Thus each adaptor above represents a set of four adaptors. The two sets shown above would have only one common member. Each set would have one common member with four other sets. There are only 64 sets with N at the 3rd position in the sticky-end; similarly there are only 64 sets with N at the 2nd position. Hence to identify every base uniquely, 128 sets of adaptors could be used rather than the complete 256. To resolve the overlapping sets one might need to have some initial information about the number of cDNAs in each of the 256 samples. Sorted sets of cDNAs of the kind to be used in this process would have on average 60 cDNAs which could be resolved on a sequencing gel. If radiolabeled or fluorescently labeled, the quantities of each cDNA could be determined. This might be valuable in order to save time as each adaptor set added in the adaptor cycle may take up to an hour to hybridise fully. Thus any means of increasing the speed of the process might be useful and worth the additional labour of producing the gels. Clearly also a larger tissue sample might have to be used. Construction of redundant sets above would be made cheaper if bases with 'wobble' could be used to reduce degeneracy.

Various single base methods of analysing nucleic acids have been proposed and may be usable with the present invention. Most of these avoid gel techniques of DNA sequencing and potentially could be appropriate for analysing, in parallel, the subpopulations generated by the sorting process described above. Single base methods are disclosed, for example, in U.S. patent 5,302,509; WO 91/06678; J.D. Harding and R.A.
Keller, Trends in Biotechnology 10, 55-58, 1992; WO 93/21340; Canard et al, Gene 148, 1-6, 1994; Metzker et al, Nucleic Acids Research 22, 4259-4267, 1994; PCT/US95/03678; and PCT/GB95/00109.

Use of hybridisation chips, grids or arrays would also be practical with this invention. An array of oligonucleotides would need to contain only 256 oligonucleotides corresponding to the 256 possible 4 bp sticky ends that would be generated by the second treatment of the cDNA fragments with the 'sampling enzyme'. If the fragments to be analysed are labeled with a fluorescent dye then the sticky-ends in each subset of cDNAs can be determined from the positions on the grid from which fluorescence is observed. Analysis using hybridisation grids will also provide quantitative information in the same way as the 'Adaptor Cycle'. Such methods are described in Lehrach et Poutska, Trends Genet. 2, 174-179, 1986; and Pevzner et al, Journal of Biomolecular Structure and Dynamics 9, 399-410, 1991.

As further information is acquired it will be possible to develop the process further, for example to make use of database information.

Clearly with use of this process a significant database of signatures and their corresponding genes will be acquired. It is estimated that there may be as many as 10,000 housekeeping genes. For most purposes it is the tissue specific cDNAs that researchers will be interested in. The presence of the housekeeping genes will undoubtedly be expected and it will be extremely wasteful to have to identify these every time the process is used, except perhaps for calibrating expression levels. It will be possible, using the adaptor cycle, to ignore certain subsets of cDNAs or miss out certain adaptors if the genes they identify are known housekeeping genes. This should greatly speed up the process of profiling a cell's cDNAs. Moreover it is highly likely that most adaptors will not hybridise to any sequences.
If the tissue specific genes are already known, and information about abundance is all that is sought, then only the adaptors corresponding to the expected signatures need be used.

These sorts of process modifications may require liquid-handling robotics that are flexible in their programming.

As a further modification, the choice of restriction endonucleases can be optimised. Since spatial correlation of bases and nucleotide frequencies are not random in the genomes of living organisms, it might be found empirically that certain combinations of sampling enzymes resolve more sequences using the 8 bp signatures than other combinations, and clearly these would be of great value as they would save time spent on resolving signatures that return multiple sequences.

Similarly, once a database of cell-type specific genes is established, resolution steps will probably not be required as it will be known which genes, hence which signatures, are to be expected in a given cell type.

Analysing cDNAs to determine sequence variation of alleles of a particular gene is a further application that would be of great value to develop, in the context of analysing how these changes might alter patterns of gene expression in a cell. Variations in alleles may alter signatures and again these sorts of effects will only become apparent with use of this invention and will in the long term form another extremely useful database for improving the use of this invention.

Example

Experimental Design

Three different PCR products were used to represent 3 different genes at varying expression levels. The PCR products used for this were exons 14, 16 and 19 of the anion exchanger (AE1), as these PCRs have already been optimised in our laboratory. These will be referred to as AE14, AE16 and AE19. The products were captured to Dynalbeads™ (by incorporating a biotin in one of the PCR primers) and effectively represent captured cDNA.
AE16 was at half the concentration of AE14 and AE19 was at one fifth the concentration of AE14.

AE14 sequence
ccaaagctgggagagaacagaatgccttggttttctgctgcagatcttccaggaccacccactacagaagacttataactacaacgtgttgatggtgcccaaacctcagggccccctgcccaacacagccctcctctcccttgtgctcatggccggtaccttcttctttgccatgatgctgcgcaagttcaagaacagctcctatttccctggcaagtcagcataccctcctcgcctgtccttgccaacactgc

AE16 sequence
ctgggagaatgccagggaaaggtctctgcctcccaccctcccaggcccagcccccaccctgtctctcacgtggtgatctgagactccaggaatatgaggatgaagaccagcagagcaggcagggcggaggcaaaatcatccagatgggaaactcggaacgcaagcccagtggo ggatgacccagccccgggctgaggagttgacaccttgaagccatcaggcaccgagagtttctgtgggagggggtagcaggtaagaatgccaagggc

AE19 sequence
gtgataggcactgaccccagcctccgcctgcaggtgaagacctggcgcatgcacttattcacgggcatccagatcatctgcctggcagtgctgtgggtggtgaagtccacgccggcctccctggccctgcccttcgtcctcatcctcactgtgccgctgcggcgcgtcctgctgccgctcatcttcaggaacgtggagcttcagtgtgtgagtggctgcctgggcctggggcacaagagctgggagcatgcg

Following capture, they were first digested with the frequent cutter Sau 3A1.
This enzyme recognises the sequence GATC.

This provided the following 4 bp overhangs on each of the products.

AE14
    TTCCAGGACCACC...
CTAGAAGGTCCTGGTGG...

AE16
    TGAGACTCCAGGAATAT...
CTAGACTCTGAGGTCCTTATA...

AE19
    ATCTGCCTGGCAG...
CTAGTAGACGGACCGTC...

The following adaptor, complementary to the 4 bp overhang revealed by Sau 3A1 and containing a Fok I site, was ligated to the captured fragments.

Adaptor SauFAM
FAM - CTAGAGGACGATCGA.GGATG.
      GATCTCCTGCTAGCT.CCTAC.GATC
                      (Fok I site)

This will produce the following sequences:

AE14
FAM - CTAGAGGACGATCGA.GGATG.GATC.TTCCAGGACCACC...
      GATCTCCTGCTAGCT.CCTAC.CTAG.AAGGTCCTGGTGG...

AE16
FAM - CTAGAGGACGATCGA.GGATG.GATC.TGAGACTCCAGGAATAT...
      GATCTCCTGCTAGCT.CCTAC.CTAG.ACTCTGAGGTCCTTATA...

AE19
FAM - CTAGAGGACGATCGA.GGATG.GATC.ATCTGCCTGGCAG...
      GATCTCCTGCTAGCT.CCTAC.CTAG.TAGACGGACCGTC...

These sequences were then digested with Fok I, which cuts at 9 and 13 bases from GGATG, and the following fragments were released into solution.

AE14
FAM - CTAGAGGACGATCGA.GGATG.GATC.TTCCA
      GATCTCCTGCTAGCT.CCTAC.CTAG.AAGGTCCTG

AE16
FAM - CTAGAGGACGATCGA.GGATG.GATC.TGAGA
      GATCTCCTGCTAGCT.CCTAC.CTAG.ACTCTGAGG

AE19
FAM - CTAGAGGACGATCGA.GGATG.GATC.ATCTG
      GATCTCCTGCTAGCT.CCTAC.CTAG.TAGACGGAC

The cleaved fragments were then captured, through ligation, to 3 different wells of a microtitre plate each containing a specific adaptor (which contains a site for Bbv I, 'GCAGC'), simulating the first stage division into 256 subgroups and providing the first 4 bases.
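The Fok I digestion above can be checked with a small Python model of a type IIs cut. This is a sketch under stated assumptions, not the experimental protocol: the upper strand of each ligated product is taken as listed and read left to right, both cut offsets are counted from the last base of the recognition site, and the function name `type_iis_cut` is hypothetical.

```python
def type_iis_cut(top_strand, site, top_offset, bottom_offset):
    """Cut a duplex, represented by its top strand, with a type IIs enzyme.

    Returns (released_top, overhang, remaining_top): the top strand up to
    the top-strand cut, the bases left single-stranded between the two
    cuts, and the top strand beyond the bottom-strand cut.
    """
    start = top_strand.find(site)
    if start < 0:
        raise ValueError("recognition site not found")
    site_end = start + len(site)
    top_cut = site_end + top_offset
    bottom_cut = site_end + bottom_offset
    if bottom_cut > len(top_strand):
        raise ValueError("sequence too short for this cut")
    return (top_strand[:top_cut],
            top_strand[top_cut:bottom_cut],
            top_strand[bottom_cut:])


SAU_FAM_TOP = "CTAGAGGACGATCGAGGATG"  # adaptor top strand, ends in the Fok I site
products = {
    "AE14": SAU_FAM_TOP + "GATC" + "TTCCAGGACCACC",
    "AE16": SAU_FAM_TOP + "GATC" + "TGAGACTCCAGGAATAT",
    "AE19": SAU_FAM_TOP + "GATC" + "ATCTGCCTGGCAG",
}

overhangs = {}
for name, top in products.items():
    released, overhang, remaining = type_iis_cut(top, "GGATG", 9, 13)
    overhangs[name] = overhang
    print(name, released[-9:], overhang)
```

In this model the released upper strands end in GATC.TTCCA, GATC.TGAGA and GATC.ATCTG, as in the fragments listed above, and the four bases between the two cuts are the ones that the 'Bbv' adaptors must match.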
Bbv I cuts at 8 and 12 bases from GCAGC.

See adaptor sheet for full sequences.

For AE14 (adaptor Bbv14)
Biotin-N-GCAGC.AGA
       N-CGTCG.TCT.CAGG
GCAGC: Bbv I site

For AE16 (adaptor Bbv16)
Biotin-N-GCAGC.AGA
       N-CGTCG.TCT.CCTC

For AE19 (adaptor Bbv19)
Biotin-N-GCAGC.AGA
       N-CGTCG.TCT.GTCC

Where N is a number of bases.

This produced the following sequences:

For AE14
Biotin-N-GCAGC.AGA.GTCCTGGAAGATC.CATCC.AGCTAGCAGGAGATC
       N-CGTCG.TCT.CAGGACCTTCTAG.GTAGG.TCGATCGTCCTCTAG-FAM

For AE16
Biotin-N-GCAGC.AGA.GGAGTCTCAGATC.CATCC.AGCTAGCAGGAGATC
       N-CGTCG.TCT.CCTCAGAGTCTAG.GTAGG.TCGATCGTCCTCTAG-FAM

For AE19
Biotin-N-GCAGC.AGA.CAGGCAGATGATC.CATCC.AGCTAGCAGGAGATC
       N-CGTCG.TCT.GTCCGTCTACTAG.GTAGG.TCGATCGTCCTCTAG-FAM

At this point the concentration was measured through fluorescence of the FAM label and the first 4 bases (XXXX) determined.

Following this the fragments were digested with Bbv I and the next 4bp revealed.

For AE14
Biotin-N-GCAGC.AGA.GTCCT
       N-CGTCG.TCT.CAGGACCTT

For AE16
Biotin-N-GCAGC.AGA.GGAGT
       N-CGTCG.TCT.CCTCAGAGT

For AE19
Biotin-N-GCAGC.AGA.CAGGC
       N-CGTCG.TCT.GTCCGTCTA

Following digestion, 3 different adaptors, complementary to the 3 different 4bp overhangs, were then ligated to each well in turn to simulate the 'adaptor cycle' and the fluorescence measured at each stage.

These adaptors were

AE14 (adaptor C14)
GGAA.GATCCTGGACAGTTG
     CTAGGACCTGTCAAC-FAM

AE16 (adaptor C16)
CTCA.GATCCTGGACAGTTG
     CTAGGACCTGTCAAC-FAM

AE19 (adaptor C19)
AGAT.GATCCTGGACAGTTG
     CTAGGACCTGTCAAC-FAM

Successful ligation, measured by fluorescence, therefore provided concentration information and the next 4 bases (YYYY) of the 'tag'.

Tag - GATC.YYYY.N.XXXX

Where GATC corresponds to the Sau 3A1 site, XXXX the first 4 bases uncovered by the Fok I digestion, which is separated by a single unknown base, N, from YYYY, which corresponds to the next 4 bases revealed by Bbv I.

Materials and Methods

Adaptor Sequences and Preparation

SauFam
5'-FAM-CTAGAGGACGATCGAGGATG-3'
3'-GATCTCCTGCTAGCTCCTACCTAG-PO4-5'

'Bbv' Adaptors

Bbv14
5'BIOTIN-6C-CCTAGACTAGAGGACCGATCGAATCAGCAGCAGA-3'
3'-GATCTGATCTCCTGGCTAGCTTAGTCGTCGTCTCAGG-PO4-5'

Bbv16
5'BIOTIN-6C-CCTAGACTAGAGGACCGATCGAATCAGCAGCAGA-3'
3'-GATCTGATCTCCTGGCTAGCTTAGTCGTCGTCTCCTC-PO4-5'

Bbv19
5'BIOTIN-6C-CCTAGACTAGAGGACCGATCGAATCAGCAGCAGA-3'
3'-GATCTGATCTCCTGGCTAGCTTAGTCGTCGTCTGTCC-PO4-5'

Cycling Adaptors

C14
5'FAM-CAACTGTCCAGGATC-3'
3'-GTTGACAGGTCCTAGAAGG-PO4-5'

C16
5'FAM-CAACTGTCCAGGATC-3'
3'-GTTGACAGGTCCTAGACTC-PO4-5'

C19
5'FAM-CAACTGTCCAGGATC-3'
3'-GTTGACAGGTCCTAGTAGA-PO4-5'

BioFAMFok
5'BIOTIN-GGTCACTTAGATCGATCCATGAGGATGCTTCATTCTGATTCAGTCC-3'
3'-CCAGTGAATCTAGCTAGGTACTCCTACGAAGTAAGACTAAGTCAGG-FAM

BioG
5'BIOTIN-GCATCTGGAGTCTACAGTCGTCTATTGACG-3'
3'-CGTAGACCTCAGATGTCAGCAGATAACTGCCGGC-PO4-5'

GCCG
5'FAM-GCATCAGGATGTACAG-3'
3'-CGTAGTCCTACATGTCGCCA-PO4-5'

FAM - fluorescein
PO4 - phosphate

All primers were purchased from Oswell DNA Services.

All adaptors were made by heating 200ul of TE containing each primer at 20pmol/ul concentration at 90°C in a Techne Dryblock and allowing the block to cool to room temperature over 2 hours. The adaptors were then incubated on ice for 1 hour and then frozen at -20°C until used.

Binding Bbv14, 16 and 19 Adaptors to Microtitre Plate

In order to capture the Fok I cleaved fragments to the 'Bbv' adaptors via ligation, the 'Bbv' adaptors were bound to black, streptavidin coated 96 well microtitre plates (Boehringer Mannheim). This was achieved by incubating 10pmol of the appropriate adaptor in 35ul of 1xTE+0.1M NaCl in each well overnight at 4°C. Following the overnight incubation each well was washed 3 times with 50ul of 1xTE+0.1M NaCl.
The 1xTE+0.1M NaCl was removed and 50ul of 1x ligase buffer was added to each well and the plate was stored at 4°C until used.

Plate capacity

To determine the binding capacity of each well, 10pmol of BioFAMFok adaptor was bound to 8 wells by incubating 10pmol of the adaptor in 25ul of 1xTE+0.1M NaCl in each well overnight at 4°C. Following the overnight incubation each well was washed 3 times with 50ul of 1xTE+0.1M NaCl. A dilution of BioFAMFok (5, 2.5, 1.25, 0.675, 0.3375pmol) diluted in 1xTE+0.1M NaCl was added to a series of wells and the fluorescence of the plate read in a Biolumin Microtiter plate Reader (Molecular Dynamics).

The following readings (expressed as Relative Fluorescent Units) were obtained.

Dilution wells
5 pmol        74575 RFU
2.5 pmol      35429 RFU
1.25 pmol     16232 RFU
0.625 pmol     9388 RFU
0.3375 pmol    4807 RFU

Wells incubated with 10pmol of adaptor and washed
20872 RFU
21516 RFU
22519 RFU
21679 RFU
22658 RFU
21517 RFU
21742 RFU
22417 RFU
mean=21865

From these figures one can calculate that 21856 RFUs is equal to 1.5 pmol of BioFAMFok. These data agree with the capacity of the wells to bind biotinylated double stranded DNA (5pmol hybridised in 200ul) provided by the Boehringer Mannheim technical help line.

Effect of Tween 20 on Ligation

The addition of 0.1% Tween 20 to the reaction buffer used with Fok I is claimed to reduce the exonuclease activity associated with this enzyme (Fok I data sheet - New England Biolabs). The following experiment was performed in order to determine if the addition of Tween would have any effect on the subsequent ligation of the cleaved fragments.

Nine reactions were set up, with each set of three reactions containing either 0, 0.05 or 0.1% tween in 25ul of 1x ligase buffer, 10pmol BioG adaptor, 10pmol GCCG adaptor and 200ul ligase (New England Biolabs). One further set of three reactions was set up as above with 0.1% tween and no ligase.
The reactions were then incubated at 16°C for 1 hour and then each reaction transferred to a well of a black streptavidin coated microtitre plate (Boehringer Mannheim). The plate was incubated at room temperature for one hour and each well washed 3 times with 100ul of TES and the fluorescence measured in a Biolumin Microtiter plate Reader (Molecular Dynamics).

The following readings (expressed as Relative Fluorescent Units) were obtained.

0% tween 20    0.05% tween 20    0.1% tween 20    0.1% tween 20 (no ligase)
8592           8742              10213            3660
8083           8712              10605            3967
8720           8519              11598            3468
8465           8657              10805            3698    - means

The above data demonstrate that the inclusion of 0.1% tween 20 increases ligation efficiency and therefore should not be detrimental to the ligation of the Fok I cleaved fragments to the 'Bbv' adaptors.

PCR primers and Conditions and Purification

The 3 PCR products used to represent cDNA transcripts at different concentrations were exons 14, 16 and 19 from the human erythrocyte anion exchanger gene located on chromosome 17q21-22.

Primer sequences used to amplify exons 14, 16 and 19

Exon 14
Forward primer
5'-GTATTTTCCAGCCCAAGCCAAAGCTGG-3'
Reverse primer
5'BIOTIN-GCAGTGTTGGCAAGGACAGGC-3'

Exon 16
Forward primer
5'BIOTIN-GCCCTTGGCATTCTTACCTGC-3'
Reverse primer
5'-CTGGGAGAATGCCAGGGAAAGG-3'

Exon 19
Forward primer
5'-GTGATAGGCACTGACCCCAG-3'
Reverse primer
5'BIOTIN-CGCATGCTCCCAGCTCTTGTGC-3'

The inclusion of biotin into one of the primers in each set will allow their capture to streptavidin coated beads (Dynal UK).

All PCR reactions were performed in 50ul containing 1xAmplitaq buffer (Perkin Elmer), 30pmol of forward and reverse primer, 200uM dNTPs, 1.25 units of Amplitaq (Perkin Elmer) and 100ng of human genomic DNA.
The reactions were overlaid with 50ul of mineral oil and cycled on a Techne 'Genie' PCR machine with the following conditions.

Exon 14
1 cycle 95°C for 2 min
35 cycles 57.5°C for 45 sec, 72°C for 1 min, 95°C for 35 sec
1 cycle 72°C for 5 min

Exon 16
1 cycle 95°C for 2 min
35 cycles 52°C for 45 sec, 72°C for 1 min, 95°C for 35 sec
1 cycle 72°C for 5 min

Exon 19
1 cycle 95°C for 2 min
35 cycles 57.5°C for 45 sec, 72°C for 1 min, 95°C for 35 sec
1 cycle 72°C for 5 min

Purification

Excess primers and salts need to be removed before the PCR products are bound to DynaBeads; this is performed as described below.

10 reactions of each exon were pooled following PCR, separately, prior to purification. The PCR products were then ethanol precipitated by adding 2.5 volumes of 100% ethanol and one tenth of a volume of 3M sodium acetate. The solution was then incubated at -20°C for 30 minutes and then spun at 13000rpm in a Heraeus A13 benchtop centrifuge for 15 minutes to precipitate the DNA. The supernatant was then poured off and the pellet allowed to air dry. The dry pellet was then resuspended in 150ul of water. Following this, 2 Chromospin 100 columns (Clonetech) were prepared for each sample by spinning the columns in a Heraeus 17RS centrifuge for 3 minutes at 3500rpm according to the manufacturer's instructions. Following centrifugation 75ul of the DNA solution was added to each prepared column and spun as before, collecting the purified DNA into a 1.5ml eppendorf tube. The 2 samples for each exon were then pooled and the DNA concentration measured by reading the absorption at 260nm and 280nm in a Pharmacia Genequant spectrophotometer.

Solutions and Buffers

1xTE pH7.5
10mM Tris HCl
1mM EDTA

TES pH7.5
10mM Tris-HCl
1mM EDTA
2M NaCl

1xFok I buffer pH7.9
50mM potassium acetate
20mM Tris acetate
10mM magnesium acetate
1mM DTT

1xBbv I buffer pH7.9
50mM NaCl
10mM Tris-HCl
10mM MgCl2
1mM DTT

1xSau 3A buffer pH7.9
33mM Tris acetate
66mM potassium acetate
10mM magnesium acetate
0.5mM DTT

1xLigase buffer pH7.8
50mM Tris-HCl
10mM MgCl2
10mM DTT
1mM ATP
50ug/ml BSA

Results

Concentrations of Column Purified DNA

exon 14 - 130ng/ul
exon 16 - 120ng/ul
exon 19 - 115ng/ul

1ug exon14 (255bp)=5.9pmol, 1ug exon16 (272bp)=5.58pmol, 1ug exon19 (252bp)=6.03pmol

1ug exon14=7.7ul, 1ug exon16=8.3ul, 1ug exon19=8.7ul, therefore exon 14=0.76pmol/ul, exon 16=0.67pmol/ul, exon 19=0.69pmol/ul

Sau 3A1 Digest

30, 15 and 6pmol of column purified exons 14, 16 and 19, respectively, were digested with 20 units of Sau 3A1 in 100ul of 1xSau 3A1 buffer at 37°C for 4 hours.

exon14                39.5ul
exon16                22.4ul
exon19                 8.7ul
Sau 3A1                  5ul
10xSau 3A1 buffer       10ul
H2O                   14.4ul

Following digestion the reaction mix was heated at 65°C in a Techne Dryblock for 20 minutes to inactivate the enzyme.

Preparation of DynaBead M280

According to the manufacturer's instructions 3mg of DynaBeads M280 will bind 60-120 pmol of biotinylated double stranded DNA.

300ul of DynaBeads M280 at 1mg/ml were washed with 100ul TES by holding the beads to the side of an eppendorf tube with a Magnetic Particle Concentrator (Dynal UK) so that the supernatant could be removed. This was repeated three times (all subsequent bead manipulations were carried out in this manner according to the manufacturer's instructions).
The beads were resuspended in 100ul of TES and the Sau 3A1 digested DNA added and incubated at room temperature for 1 hour to allow the biotinylated DNA to bind to the beads.

The beads/DNA were then washed three times with 1x ligase buffer using the Magnetic Particle Concentrator (Dynal UK) as before.

Ligation of SauFAM Adaptor (Containing Fok I site)

The supernatant was removed and the beads/DNA were resuspended in 75ul of 1x ligase buffer containing 300pmol of SauFAM adaptor and 4000 units of ligase (New England Biolabs).

Beads/DNA, 7.5ul 10x ligase buffer, 15ul SauFAM (at 20pmol/ul), 10ul ligase (at 400 units/ul), 42.5ul H2O

The reaction was then incubated at 16°C for 2 hours.

Fok I Digestion

Following ligation the beads/DNA were washed 2 times with 75ul of 1x Fok I buffer and then resuspended in 100ul of 1x Fok I buffer and heated at 65°C in a Techne Dryblock for 20 minutes to inactivate any remaining ligase. The buffer was removed and the beads/DNA resuspended in 95ul of 1x Fok I buffer containing 20 units of Fok I (New England Biolabs).

Beads/DNA, 9.5ul 10x Fok I buffer, 5ul Fok I (at 4 units/ul)

The beads/DNA were then incubated at 37°C for 2 hours.

Following incubation the supernatant, containing the fragments cleaved by Fok I, was transferred to a fresh eppendorf tube and heated at 65°C for 20 minutes in a Techne Dryblock to inactivate the Fok I.

Ligation of Fok I Cleaved Fragments to Bbv Adaptors on Microtiter Plate

The Fok I fragments were then divided into three tubes each containing 30ul of Fok I cleaved fragments, 5ul of 10x ligase buffer, 3ul ligase (at 400 units/ul - New England Biolabs) and 12ul of H2O.

The ligase buffer on a plate containing adaptors Bbv14, 16 and 19 in separate wells (prepared as previously described) was removed and the above reaction mixtures, containing the Fok I cleaved fragments and ligase, added to each.

The wells were then incubated at 16°C for one hour and then washed three times with 50ul of TES.
The TES was removed from the wells, another 50ul of TES added and the fluorescence measured in a Biolumin Microplate reader (Molecular Dynamics). A well to which no fragments were added and which contained just Bbv adaptors was used as a blank.

Data expressed as RFUs

Bbv14 well    1774 RFU
Bbv16 well    1441 RFU
Bbv19 well    1192 RFU
Blank         1010 RFU

The reading from the blank well, which is a background reading, was subtracted from the reading of the other wells and gave the following.

Bbv14 well    764 RFU
Bbv16 well    431 RFU
Bbv19 well    182 RFU

As half as much of exon 16 compared to exon 14 (15pmol exon 16, 30pmol exon 14) was included in the procedure, the reading obtained from the Bbv16 well should be half (i.e. 50%) of that obtained from the Bbv14 well, and as one fifth the amount of exon 19 compared to exon 14 (6pmol exon 19, 30pmol exon 14) was included, the reading obtained from the Bbv19 well should be one fifth (i.e. 20%) of that obtained from the Bbv14 well.

Ideal Readings Expressed As Percentages

Bbv14 well    100
Bbv16 well    50
Bbv19 well    20

Actual Readings Expressed As Percentages (using Bbv14 well as 100%)

Bbv14 well    100
Bbv16 well    56.4
Bbv19 well    23.8

Bbv16 well    6.4% error
Bbv19 well    3.8% error

Therefore, this process is capable of separating a mixed population of DNA and identifying 4bp, while at the same time maintaining the relative proportions of the original mixture with minimal errors.
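The background subtraction and normalisation reported for the three wells amount to the following arithmetic. This is a sketch only; the variable and function names are illustrative and do not come from the original text.

```python
# Raw plate readings in RFU and the blank (Bbv adaptor only), as reported in the text.
readings = {"Bbv14": 1774, "Bbv16": 1441, "Bbv19": 1192}
blank = 1010

# Amount of each exon put into the Sau 3A1 digest, in pmol.
input_pmol = {"Bbv14": 30, "Bbv16": 15, "Bbv19": 6}


def percent_of_reference(values, reference):
    """Express every value as a percentage of the reference entry."""
    return {key: round(100 * value / values[reference], 1)
            for key, value in values.items()}


# Subtract the background, then normalise both signal and input to the Bbv14 well.
net = {well: rfu - blank for well, rfu in readings.items()}
actual = percent_of_reference(net, "Bbv14")
ideal = percent_of_reference(input_pmol, "Bbv14")
error = {well: round(actual[well] - ideal[well], 1) for well in net}

print(net)     # {'Bbv14': 764, 'Bbv16': 431, 'Bbv19': 182}
print(actual)  # {'Bbv14': 100.0, 'Bbv16': 56.4, 'Bbv19': 23.8}
print(error)   # {'Bbv14': 0.0, 'Bbv16': 6.4, 'Bbv19': 3.8}
```

Note that the quoted errors are differences in percentage points relative to the Bbv14 well, not relative errors of each reading.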
The captured population can in turn then be reprobed to obtain another 4bp and the associated quantitative data.

SEQUENCE LISTING

(1) GENERAL INFORMATION:
(i) APPLICANT:
(A) NAME: Brax Genomics Limited
(B) STREET: 13 Station Road
(C) CITY: Cambridge
(E) COUNTRY: United Kingdom
(F) POSTAL CODE (ZIP): CB1 2JB
(ii) TITLE OF INVENTION: CHARACTERISING DNA
(iii) NUMBER OF SEQUENCES: 41
(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)
(v) CURRENT APPLICATION DATA:
APPLICATION NUMBER: 2,264,581
(vi) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: GB 9618544.2
(B) FILING DATE: 05-SEP-1996

(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 254 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CCAAAGCTGG GAGAGAACAG AATGCCTTGG TTTTCTGCTG CAGATCTTCC AGGACCACCC 60
ACTACAGAAG ACTTATAACT ACAACGTGTT GATGGTGCCC AAACCTCAGG GCCCCCTGCC 120
CAACACAGCC CTCCTCTCCC TTGTGCTCAT GGCCGGTACC TTCTTCTTTG CCATGATGCT 180
GCGCAAGTTC AAGAACAGCT CCTATTTCCC TGGCAAGTCA GCATACCCTC CTCGCCTGTC 240
CTTGCCAACA CTGC 254

(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 270 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
CTGGGAGAAT GCCAGGGAAA GGTCTCTGCC TCCCACCCTC CCAGGCCCAG CCCCCACCCT 60
GTCTCTCACG TGGTGATCTG AGACTCCAGG AATATGAGGA TGAAGACCAG CAGAGCAGGC 120
AGGGCGGAGG CAAAATCATC CAGATGGGAA ACTCGGAACG CAAGCCCAGT GGGTGGATGA 180
CCCAGCCCCG GGCTGAGGAG TTGACACCTT GAAGCCATCA GGCACCGAGA GTTTCTGTGG 240
GAGGGGGTAG CAGGTAAGAA TGCCAAGGGC 270

(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 253 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GTGATAGGCA CTGACCCCAG CCTCCGCCTG CAGGTGAAGA CCTGGCGCAT GCACTTATTC 60
ACGGGCATCC AGATCATCTG CCTGGCAGTG CTGTGGGTGG TGAAGTCCAC GCCGGCCTCC 120
CTGGCCCTGC CCTTCGTCCT CATCCTCACT GTGCCGCTGC GGCGCGTCCT GCTGCCGCTC 180
ATCTTCAGGA ACGTGGAGCT TCAGTGTGTT GAGTGGCTGC CTGGGCCTGG GGCACAAGAG 240
CTGGGAGCAT GCG 253

(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
TTCCAGGACC ACCCTAGAAG GTCCTGGTGG 30

(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 38 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
TGAGACTCCA GGAATATCTA GACTCTGAGG TCCTTATA 38

(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
ATCTGCCTGG CAGCTAGTAG ACGGACCGTC 30

(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 44 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
CTAGAGGACG ATCGAGGATG GATCTCCTGC TAGCTCCTAC GATC 44

(2) INFORMATION FOR SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 74 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CTAGAGGACG ATCGAGGATG GATCTTCCAG GACCACCGAT CTCCTGCTAG CTCCTACCTA 60
GAAGGTCCTG GTGG 74

(2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 82 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
CTAGAGGACG ATCGAGGATG GATCTGAGAC TCCAGGAATA TGATCTCCTG CTAGCTCCTA 60
CCTAGACTCT GAGGTCCTTA TA 82

(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 74 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CTAGAGGACG ATCGAGGATG GATCATCTGC CTGGCAGGAT CTCCTGCTAG CTCCTACCTA 60
GTAGACGGAC CGTC 74

(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 62 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CTAGAGGACG ATCGAGGATG GATCTTCCAG ATCTCCTGCT AGCTCCTACC TAGAAGGTCC 60
TG 62

(2) INFORMATION FOR SEQ ID NO: 12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 62 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
CTAGAGGACG ATCGAGGATG GATCTGAGAG ATCTCCTGCT AGCTCCTACC TAGACTCTGA 60
GG 62

(2) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 62 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
CTAGAGGACG ATCGAGGATG GATCATCTGG ATCTCCTGCT AGCTCCTACC TAGTAGACGG 60
AC 62

(2) INFORMATION FOR SEQ ID NO: 14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
GCAGCAGACG TCGTCTCAGG 20

(2) INFORMATION FOR SEQ ID NO: 15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
GCAGCAGACG TCGTCTCCTC 20

(2) INFORMATION FOR SEQ ID NO: 16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
GCAGCAGACG TCGTCTGTCC 20

(2) INFORMATION FOR SEQ ID NO: 17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 82 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
GCAGCAGAGT CCTGGAAGAT CCATCCAGCT AGCAGGAGAT CCGTCGTCTC AGGACCTTCT 60
AGGTAGGTCG ATCGTCCTCT AG 82

(2) INFORMATION FOR SEQ ID NO: 18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 82 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
GCAGCAGAGG AGTCTCAGAT CCATCCAGCT AGCAGGAGAT CCGTCGTCTC CTCAGAGTCT 60
AGGTAGGTCG ATCGTCCTCT AG 82

(2) INFORMATION FOR SEQ ID NO: 19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 82 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
GCAGCAGACA GGCAGATGAT CCATCCAGCT AGCAGGAGAT CCGTCGTCTG TCCGTCTACT 60
AGGTAGGTCG ATCGTCCTCT AG 82

(2) INFORMATION FOR SEQ ID NO: 20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
GCAGCAGAGT CCTCGTCGTC TCAGGACCTT 30

(2) INFORMATION FOR SEQ ID NO: 21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GCAGCAGAGG AGTCGTCGTC TCCTCAGAGT 30

(2) INFORMATION FOR SEQ ID NO: 22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
GCAGCAGACA GGCCGTCGTC TGTCCGTCTA 30

(2) INFORMATION FOR SEQ ID NO: 23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
GGAAGATCCT GGACAGTTGC TAGGACCTGT CAAC 34

(2) INFORMATION FOR SEQ ID NO: 24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CTCAGATCCT GGACAGTTGC TAGGACCTGT CAAC 34

(2) INFORMATION FOR SEQ ID NO: 25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
AGATGATCCT GGACAGTTGC TAGGACCTGT CAAC 34

(2) INFORMATION FOR SEQ ID NO: 26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 44 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
CTAGAGGACG ATCGAGGATG GATCTCCTGG TAGCTCCTAC CTAG 44

(2) INFORMATION FOR SEQ ID NO: 27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 72 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
CCCTAGACTA GAGGACCGAT CGAATCAGCA GCAGAGATCT GATCTCCTGG CTAGCTTAGT 60
CGTCGTCTCA GG 72

(2) INFORMATION FOR SEQ ID NO: 28:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 72 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
CCCTAGACTA GAGGACCGAT CGAATCAGCA GCAGAGATCT GATCTCCTGG CTAGCTTAGT 60
CGTCGTCTCC TC 72

(2) INFORMATION FOR SEQ ID NO: 29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 72 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
CCCTAGACTA GAGGACCGAT CGAATCAGCA GCAGAGATCT GATCTCCTGG CTAGCTTAGT 60
CGTCGTCTGT CC 72

(2) INFORMATION FOR SEQ ID NO: 30:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
CAACTGTCCA GGATCGTTGA CAGGTCCTAG AAGG 34

(2) INFORMATION FOR SEQ ID NO: 31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
CAACTGTCCA GGATCGTTGA CAGGTCCTAG ACTC 34

(2) INFORMATION FOR SEQ ID NO: 32:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
CAACTGTCCA GGATCGTTGA CAGGTCCTAG TAGA 34

(2) INFORMATION FOR SEQ ID NO: 33:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 92 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GGTCACTTAG ATCGATCCAT GAGGATGCTT CATTCTGATT CAGTCCCCAG TGAATCTAGC 60
TAGGTACTCC TACGAAGTAA GACTAAGTCA GG 92

(2) INFORMATION FOR SEQ ID NO: 34:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 64 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
GCATCTGGAG TCTACAGTCG TCTATTGACG CGTAGACCTC AGATGTCAGC AGATAACTGC 60
CGGC 64

(2) INFORMATION FOR SEQ ID NO: 35:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
GCATCAGGAT GTACAGCGTA GTCCTACATG TCGCCA 36

(2) INFORMATION FOR SEQ ID NO: 36:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
GTATTTTCCA GCCCAAGCCA AAGCTGG 27

(2) INFORMATION FOR SEQ ID NO: 37:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GCAGTGTTGG CAAGGACAGG C 21

(2) INFORMATION FOR SEQ ID NO: 38:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
GCCCTTGGCA TTCTTACCTG C 21

(2) INFORMATION FOR SEQ ID NO: 39:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
CTGGGAGAAT GCCAGGGAAA GG 22

(2) INFORMATION FOR SEQ ID NO: 40:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
GTGATAGGCA CTGACCCCAG 20

(2) INFORMATION FOR SEQ ID NO: 41:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: unknown
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
CGCATGCTCC CAGCTCTTGT GC 22
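As a consistency check on the listing, the biotinylated primers (SEQ ID NO: 37, 38 and 41) can be compared with the 3' ends of the three exon sequences (SEQ ID NO: 1 to 3). The sketch below is illustrative only: it uses short terminal excerpts of the listed sequences rather than the full entries, and the names `revcomp`, `AMPLICON_TAILS` and `BIOTIN_PRIMERS` are assumptions made for this example.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


# 3' terminal bases of SEQ ID NO: 1, 2 and 3 (excerpts of the listed sequences).
AMPLICON_TAILS = {
    "exon 14": "CTCGCCTGTCCTTGCCAACACTGC",
    "exon 16": "GAGGGGGTAGCAGGTAAGAATGCCAAGGGC",
    "exon 19": "GGCACAAGAGCTGGGAGCATGCG",
}

# Biotinylated primer for each exon: SEQ ID NO: 37, 38 and 41.
BIOTIN_PRIMERS = {
    "exon 14": "GCAGTGTTGGCAAGGACAGGC",
    "exon 16": "GCCCTTGGCATTCTTACCTGC",
    "exon 19": "CGCATGCTCCCAGCTCTTGTGC",
}

# Each listed exon sequence should end with the reverse complement of its biotinylated primer.
matches = {
    exon: AMPLICON_TAILS[exon].endswith(revcomp(primer))
    for exon, primer in BIOTIN_PRIMERS.items()
}
print(matches)  # {'exon 14': True, 'exon 16': True, 'exon 19': True}
```

In each case the biotin therefore sits at the 3' end of the sequence as listed, so the fragment retained on the streptavidin beads after Sau 3A1 digestion is the one running from the GATC site to that end, in line with the AE14, AE16 and AE19 overhang diagrams.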

Claims (21)

1. A method for characterising cDNA, which comprises:
(a) cutting a sample comprising a population of one or more cDNAs or isolated fragments thereof, each having a strand complementary to the 3' poly-A terminus of an mRNA and bearing a tail, with a first sampling endonuclease at a first sampling site of known displacement from a reference site proximal to the tail to generate from each cDNA or isolated fragment thereof a first and second sub-fragment, each comprising a sticky end sequence of predetermined length and unknown sequence, the first sub-fragment bearing the tail;
(b) sorting either the first or second sub-fragments into sub-populations according to their sticky end sequence and recording the sticky end sequence of each sub-population as the first sticky end;
(c) cutting the sub-fragments in each sub-population with a second sampling endonuclease, which is the same as or different from the first sampling endonuclease, at a second sampling site of known displacement from the first sampling site to generate from each sub-fragment a further sub-fragment comprising a second sticky end sequence of predetermined length and unknown sequence;
and (d) determining each second sticky end sequence;
wherein the aggregate length of the first and second sticky end sequences of each sub-fragment is from 6 to 10; and wherein the sequences and relative positions of the reference site and first and second sticky ends characterise the or each cDNA.
2. A method according to claim 1, wherein the sample cut with the first sampling endonuclease comprises isolated fragments of the cDNAs produced by cutting a sample comprising a population of one or more cDNAs with a restriction endonuclease and isolating fragments whose restriction site is at the reference site.
3. A method according to claim 2, wherein the first sampling endonuclease binds to a first recognition site and cuts at the first sampling site at a predetermined displacement from the restriction site of the restriction endonuclease.
4. A method according to claim 3, wherein the first recognition site is provided in a first adaptor oligonucleotide which is hybridised to the restriction site of the isolated fragments.
5. A method according to any one of claims 2 to 4, wherein the restriction endonuclease recognises a 4 base pair binding site.
6. A method according to any one of claims 2 to 5, wherein the second sub-fragments are sorted in step (b).
7. A method according to claim 1, wherein the first sampling endonuclease binds to the reference site and cuts at the first sampling site at a predetermined displacement from the reference site.
8. A method according to any one of claims 1 to 7, wherein the first sampling endonuclease comprises a Type IIs endonuclease.
9. A method according to any one of claims 1 to 8, wherein the second sampling endonuclease binds to a second recognition site and cuts at the second sampling site at a predetermined displacement from the first sampling site.
10. A method according to claim 9, wherein the second sampling endonuclease comprises a Type IIs endonuclease.
11. A method according to claim 9 or claim 10, wherein the second recognition site is provided in a second adaptor oligonucleotide which is hybridised to the first sticky end.
12. A method according to any one of claims 1 to 11, wherein the tails of the cDNAs or fragments thereof are bound to a solid phase matrix.
13. A method according to any one of claims 1 to 12, wherein the aggregate length of the first and second sticky end sequences of each sub-fragment is 8.
14. A method according to claim 13, wherein the length of each sticky end is 4.
15. A method according to any one of claims 1 to 14, wherein the step (b) of sorting the sub-fragments comprises dividing the sub-fragments into an array of samples, each sample in a separate container; contacting the array of samples with an array of solid phase affinity matrices, each solid phase affinity matrix bearing a unique base sequence of same predetermined length as the first sticky end, so that each sample is contacted with one of the possible base sequences and the array of samples is contacted with all possible base sequences of that predetermined length for hybridisation to occur only between each unique base sequence and first sticky end complementary with one another; and washing unhybridised material from the containers.
16. A method according to any one of claims 1 to 15, wherein the step (d) of determining each second sticky end sequence comprises isolating the further sub-fragments from step (c) and contacting the further sub-fragments with an array of adaptor oligonucleotides in a cycle, each adaptor oligonucleotide bearing a label and a unique base sequence of same predetermined length as the second sticky end, the array containing all possible base sequences of that predetermined length; wherein the cycle comprises sequentially contacting each adaptor oligonucleotide of the array with each sub-population of isolated sub-fragments under hybridisation conditions, removing any unhybridised adaptor oligonucleotide and determining the presence of any hybridised adaptor oligonucleotide by detection of the label, then repeating the cycle, until all of the adaptors in the array have been tested.
17. A method according to any one of claims 1 to 14, wherein the step (b) of sorting the sub-fragments comprises (i) binding the sub-fragments to a hybridisation array comprising an array of oligonucleotide sets, each set bearing a unique base sequence of the same predetermined length as the first sticky end and identifiable by location in the array, all possible base sequences of that predetermined length being present in the array, so that each sub-population bearing its unique first sticky end is hybridised at an identifiable location in the array; and (ii) determining the location to identify the first sticky end sequence.
18. A method according to claim 17, wherein the sub-fragments cut in step (c) are those bound to the hybridisation array so that the further sub-fragments generated thereby remain bound to the hybridisation array; and wherein the step (d) of determining each second sticky end sequence comprises contacting the further sub-fragments under hybridisation conditions with an array of adaptor oligonucleotides, each adaptor oligonucleotide bearing a label and a unique base sequence of the same predetermined length as the second sticky end, the array containing all possible base sequences of that predetermined length, removing any unhybridised adaptor oligonucleotide, and determining the location of any hybridised adaptor oligonucleotide by detection of the label.
19. A method for identifying cDNA in a sample, which comprises characterising cDNA in accordance with a method according to any one of claims 1 to 18, comparing the sequences and relative positions of the reference site and first and second sticky ends obtained thereby with the sequences and relative positions of the reference site and first and second sticky ends of known cDNAs in order to identify the or each cDNA in the sample.
20. A method for assaying for one or more specific cDNAs in a sample, which comprises performing a method according to any one of claims 1 to 14, wherein the reference site is predetermined, each first sticky end sequence in sorting step (b) is a predetermined first sticky end sequence, each second sticky end sequence in step (d) is determined by assaying for a predetermined second sticky end sequence, and the relative positions of the reference site and predetermined first and second sticky ends characterise the or each specific cDNA.
21. A method according to claim 20, wherein the reference site and first and second sticky end sequences are predetermined by selecting corresponding sequences from one or more known target cDNAs.
CA002264581A 1996-09-05 1997-09-05 Characterising dna Expired - Fee Related CA2264581C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
GBGB9618544.2A GB9618544D0 (en) 1996-09-05 1996-09-05 Characterising DNA
GB9618544.2 1996-09-05
PCT/GB1997/002403 WO1998010095A1 (en) 1996-09-05 1997-09-05 Characterising dna

Publications (2)

Publication Number Publication Date
CA2264581A1 CA2264581A1 (en) 1998-03-12
CA2264581C CA2264581C (en) 2003-11-11

Family

ID=10799468

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002264581A Expired - Fee Related CA2264581C (en) 1996-09-05 1997-09-05 Characterising dna

Country Status (12)

Country Link
US (1) US6225077B1 (en)
EP (1) EP0927267B1 (en)
JP (1) JP3863189B2 (en)
CN (1) CN1118581C (en)
AT (1) ATE215994T1 (en)
AU (1) AU721861B2 (en)
CA (1) CA2264581C (en)
DE (1) DE69711895T2 (en)
GB (1) GB9618544D0 (en)
IL (1) IL128809A (en)
NZ (1) NZ334426A (en)
WO (1) WO1998010095A1 (en)

Families Citing this family (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100434531C (en) * 1997-01-15 2008-11-19 X齐里昂有限两合公司 Mass label linked hybridisation probes
US6699668B1 (en) 1997-01-15 2004-03-02 Xzillion Gmbh & Co. Mass label linked hybridisation probes
GB9707980D0 (en) * 1997-04-21 1997-06-11 Brax Genomics Ltd Characterising DNA
US6054276A (en) 1998-02-23 2000-04-25 Macevicz; Stephen C. DNA restriction site mapping
US6136537A (en) * 1998-02-23 2000-10-24 Macevicz; Stephen C. Gene expression analysis
US7399844B2 (en) * 1998-07-09 2008-07-15 Agilent Technologies, Inc. Method and reagents for analyzing the nucleotide sequence of nucleic acids
WO2000029621A2 (en) * 1998-11-16 2000-05-25 Genelabs Technologies, Inc. Method for measuring target polynucleotides and novel asthma biomolecules
US8367322B2 (en) 1999-01-06 2013-02-05 Cornell Research Foundation, Inc. Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing
JP2002534098A (en) 1999-01-06 2002-10-15 コーネル リサーチ ファンデーション インク. Accelerated Identification of Single Nucleotide Polymorphisms and Alignment of Clones in Genome Sequencing
AU5557400A (en) * 1999-03-18 2001-01-31 Complete Genomics As Methods of cloning and producing fragment chains with readable information content
CN1176212C (en) * 1999-05-10 2004-11-17 东洋钢板株式会社 Methods for constructing DNA library and support carrying DNA library immobilized thereon
US8137906B2 (en) 1999-06-07 2012-03-20 Sloning Biotechnology Gmbh Method for the synthesis of DNA fragments
DE19925862A1 (en) * 1999-06-07 2000-12-14 Diavir Gmbh Process for the synthesis of DNA fragments
WO2001021840A2 (en) * 1999-09-23 2001-03-29 Gene Logic, Inc. Indexing populations
JP3668075B2 (en) * 1999-10-12 2005-07-06 光夫 板倉 Suspension system for determining genetic material sequence, method for determining genetic material sequence using the suspension system, and SNPs high-speed scoring method using the suspension system
AU1812801A (en) * 1999-12-06 2001-06-12 Pioneer Hi-Bred International, Inc. Short shared nucleotide sequences
JP2001204463A (en) * 2000-01-27 2001-07-31 Toyo Kohan Co Ltd Support for immobilizing nucleotide
AU2001232140A1 (en) * 2000-02-17 2001-08-27 Complete Genomics As A method of mapping restriction endonuclease cleavage sites
US6468749B1 (en) * 2000-03-30 2002-10-22 Quark Biotech, Inc. Sequence-dependent gene sorting techniques
DE10060827A1 (en) * 2000-12-07 2002-06-13 Basf Lynx Bioscience Ag Methods of coding hybridization probes
EP1423529B1 (en) * 2001-01-24 2013-11-20 Genomic Expression APS Assay for analyzing gene expression
US20030143612A1 (en) * 2001-07-18 2003-07-31 Pointilliste, Inc. Collections of binding proteins and tags and uses thereof for nested sorting and high throughput screening
US7189509B2 (en) * 2001-08-16 2007-03-13 Zhifeng Shao Analysis of gene expression profiles using sequential hybridization
ATE414767T1 (en) 2001-11-22 2008-12-15 Sloning Biotechnology Gmbh NUCLEIC ACID LINKERS AND THEIR USE IN GENE SYNTHESIS
US7291460B2 (en) 2002-05-31 2007-11-06 Verenium Corporation Multiplexed systems for nucleic acid sequencing
US7563600B2 (en) 2002-09-12 2009-07-21 Combimatrix Corporation Microarray synthesis and assembly of gene-length polynucleotides
FR2852605B1 (en) * 2003-03-18 2012-11-30 Commissariat Energie Atomique PROCESS FOR PREPARING DNA FRAGMENTS AND ITS APPLICATIONS
US7642064B2 (en) 2003-06-24 2010-01-05 Ventana Medical Systems, Inc. Enzyme-catalyzed metal deposition for the enhanced detection of analytes of interest
US20040265883A1 (en) * 2003-06-27 2004-12-30 Biocept, Inc. mRNA expression analysis
DK1557464T3 (en) 2004-01-23 2011-01-24 Sloning Biotechnology Gmbh Enzymatic preparation of nucleic acid molecules
FR2890859B1 (en) * 2005-09-21 2012-12-21 Oreal DOUBLE-STRANDED RNA OLIGONUCLEOTIDE INHIBITING TYROSINASE EXPRESSION
US9115352B2 (en) 2008-03-31 2015-08-25 Sloning Biotechnology Gmbh Method for the preparation of a nucleic acid library
CN101942509B (en) * 2010-06-10 2013-02-27 广州医学院第一附属医院 Method for detecting common/identical DNA sequence in two unknown DNA fragments
WO2013012674A1 (en) 2011-07-15 2013-01-24 The General Hospital Corporation Methods of transcription activator like effector assembly
US9890364B2 (en) 2012-05-29 2018-02-13 The General Hospital Corporation TAL-Tet1 fusion proteins and methods of use thereof
EP3789405A1 (en) 2012-10-12 2021-03-10 The General Hospital Corporation Transcription activator-like effector (tale) - lysine-specific demethylase 1 (lsd1) fusion proteins
AU2014214719B2 (en) 2013-02-07 2020-02-13 The General Hospital Corporation Tale transcriptional activators
ITUB20155686A1 (en) * 2015-11-18 2017-05-18 Fondazione St Italiano Tecnologia Nucleic acid detection procedure and related kit.
CN107557355A (en) * 2016-07-01 2018-01-09 Pgi股份有限公司 The method of construction circular template and detection DNA molecular

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2036946C (en) * 1990-04-06 2001-10-16 Kenneth V. Deugau Indexing linkers
GB9214873D0 (en) * 1992-07-13 1992-08-26 Medical Res Council Process for categorising nucleotide sequence populations
US5695934A (en) 1994-10-13 1997-12-09 Lynx Therapeutics, Inc. Massively parallel sequencing of sorted polynucleotides
EP0735144B1 (en) * 1995-03-28 2002-06-05 Japan Science and Technology Corporation Method for molecular indexing of genes using restriction enzymes
US5866330A (en) * 1995-09-12 1999-02-02 The Johns Hopkins University School Of Medicine Method for serial analysis of gene expression
US5871697A (en) * 1995-10-24 1999-02-16 Curagen Corporation Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing

Also Published As

Publication number Publication date
EP0927267A1 (en) 1999-07-07
IL128809A (en) 2002-09-12
ATE215994T1 (en) 2002-04-15
AU721861B2 (en) 2000-07-13
CN1118581C (en) 2003-08-20
AU4027497A (en) 1998-03-26
WO1998010095A1 (en) 1998-03-12
CN1234076A (en) 1999-11-03
JP3863189B2 (en) 2006-12-27
GB9618544D0 (en) 1996-10-16
DE69711895T2 (en) 2002-11-14
US6225077B1 (en) 2001-05-01
CA2264581A1 (en) 1998-03-12
JP2000517192A (en) 2000-12-26
DE69711895D1 (en) 2002-05-16
EP0927267B1 (en) 2002-04-10
IL128809A0 (en) 2000-01-31
NZ334426A (en) 2000-09-29

Similar Documents

Publication Publication Date Title
CA2264581C (en) Characterising dna
US6258539B1 (en) Restriction enzyme mediated adapter
US6403319B1 (en) Analysis of sequence tags with hairpin primers
US6297017B1 (en) Categorising nucleic acids
EP0675966B1 (en) Novel oligonucleotide arrays and their use for sorting, isolating, sequencing, and manipulating nucleic acids
JP4110216B2 (en) Improved sequence analysis based on adapters
EP1711631B1 (en) Nucleic acid characterisation
US8202691B2 (en) Uniform fragmentation of DNA using binding proteins
WO1998015652A1 (en) Nucleic acid sequencing by adaptator ligation
AU5925099A (en) Molecular cloning using rolling circle amplification
MX2013003349A (en) Direct capture, amplification and sequencing of target dna using immobilized primers.
AU733924B2 (en) Characterising DNA
AU728805B2 (en) Nucleic acid sequencing
JP3789317B2 (en) Isometric primer extension method and kit for detecting and quantifying specific nucleic acids
US20070148636A1 (en) Method, compositions and kits for preparation of nucleic acids
CA3149025A1 (en) Highly sensitive methods for accurate parallel quantification of nucleic acids
US20060240431A1 (en) Oligonucletide guided analysis of gene expression
GB2492042A (en) Selector oligonucleotide-based methods and probes for nucleic acid detection or enrichment
US20030044827A1 (en) Method for immobilizing DNA
WO2006003638A2 (en) Novel method for labeling nucleic acid in a sequence-specific manner, and method for detecting nucleic acid using the same

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed