WO1998011258A1 - Method and apparatus for analysis of chromatographic migration patterns

Info

Publication number
WO1998011258A1
WO1998011258A1 PCT/US1997/016933 US9716933W WO9811258A1 WO 1998011258 A1 WO1998011258 A1 WO 1998011258A1 US 9716933 W US9716933 W US 9716933W WO 9811258 A1 WO9811258 A1 WO 9811258A1
Authority
WO
WIPO (PCT)
Prior art keywords
idx
int
data
const
double
Application number
PCT/US1997/016933
Other languages
French (fr)
Inventor
Andy F. Marks
Original Assignee
University Of Utah Research Foundation
Application filed by University Of Utah Research Foundation
Priority to AU45882/97A (AU4588297A)
Priority to JP10514017A (JP2001502165A)
Priority to EP97944372A (EP0944739A4)
Publication of WO1998011258A1

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/26: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating electrochemical variables; by using electrolysis or electrophoresis
    • G01N27/416: Systems
    • G01N27/447: Systems using electrophoresis
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • This invention relates to the field of signal detection and analysis of chromatographic migration patterns as commonly applied to mixtures of molecules. More specifically, this invention relates to a method and apparatus for signal detection and analysis of chromatographic migration patterns as applied to the determination of DNA sequences.
  • Such informative variables may include the relative intensities between adjacent signals, the relative signal spacing and pattern recognition factors.
  • Tibbetts' method is limited, however, by the quality of the chromatographic data.
  • Tibbetts' method relies to a certain extent on the reproducibility of chromatographic data to train the base identification ("calling") system. The apparatus generating the chromatographic data, therefore, needs to be consistent from run to run to avoid retraining the algorithm.
  • analyses based on signal spacing may produce errors in signal identification.
  • because signal intensity often varies in an unpredictable manner, signal identification based on intensity may also result in significant identification errors.
  • a U.S. patent of Thomas Stockham and Jeff Ives (No. 5,273,632) discloses an alternate method for base identification using blind deconvolution ("BD").
  • the method of Stockham and Ives uses blind deconvolution to deblur information-containing signals in chromatographic data. This method, however, is significantly limited in the following manner. First, it relies on data derived from scanned autoradiogram image data. Second, the method requires user input of the BD filter bandwidth and programmer alterations to various thresholds. Third, the Stockham and Ives method does not adequately deal with lane-to-lane mobility differences. Fourth, the insertion/deletion and correction logic was too simple. Fifth, the putative peak detection was based on thresholds and could therefore miss band detections when band amplitudes dropped below the threshold. Sixth, the method of Stockham and Ives lacked the ability to align and merge adjacent sample segments. Finally, that method lacked band quality measures useful in automatic data routing and/or sequence assembly.
  • the present invention includes a method and apparatus for the detection and analysis of information-containing signals in chromatographic data.
  • the invention also includes a method and apparatus for detecting and sharpening signal peaks in chromatographic data. It is an advantage of the present invention that chromatographic data from a wide variety of separation processes can be analyzed. Such separation processes include, but are not limited to, gel and capillary electrophoresis.
  • the present invention includes the steps of preprocessing signal data, reading successive sample segments, selecting blocks of high quality sequence and then producing traces of aligned high quality sequences. It is an advantage of the present invention that the chromatographic data may include single fluor samples fractionated in multiple lanes and multiple fluor samples fractionated in single lanes.
  • This analytic technique uses iterative blind deconvolution to determine band frequency in sample segments. It is an advantage of the invention that the filter band width is automatically varied during iteration to optimally detect the signals in the preprocessed chromatographic data. It is a further function of the invention to detect and correct signal data derived from chromatographic data which have segments which are short in one or more signal types (for example, "band-lite" signals).
  • the invention may optionally provide a quality measure for each signal. It is a feature of the invention that the quality measure can be utilized during subsequent alignment steps. It is an advantage of the invention that the quality measure can provide left and right cutoff points to limit subsequent analysis to data above a given quality measure.
  • Figure 1 depicts a flow chart for the invented base calling method.
  • Figure 2 depicts a flow chart of the preprocessing step of Figure 1.
  • Figure 3 depicts a flow chart of the base reading step of Figure 1.
  • Figure 4 depicts a flow chart of the extra-normalization step of Figure 3.
  • Figure 5 depicts a flow chart of the peak detection and refinement step of Figure 3.
  • Figure 6 depicts a flow chart of the OmitOkN fuzzy logic block of Figure 5.
  • Figure 7 depicts a flow chart of the OKSpMembership fuzzy logic block of Figure 6.
  • Figure 8 depicts a flow chart of the OmitOkN Bad Spacing Membership fuzzy logic block of Figure 6.
  • Figure 9 depicts a flow chart of the OmitOkN Cross Banding fuzzy logic block of Figure 6.
  • Figure 10 depicts a flow chart of the OmitOkN Height fuzzy logic block of Figure 6.
  • Figure 11 depicts a flow chart of the GapCheck fuzzy logic block of Figure 5.
  • Figure 12 depicts a flow chart of the GapCheck Gap Membership fuzzy logic block of
  • Figure 13 depicts a flow chart of the GapCheck Width Membership fuzzy logic block of Figure 11.
  • Figure 14 depicts a flow chart of the Monte Carlo Alignment function of Figure 4.
  • Figure 15 depicts a flow chart of the BaseQual fuzzy logic block of Figure 1.
  • Figure 16 depicts a flow chart of the BaseQual Height Membership fuzzy logic block of Figure 15.
  • Figure 17 depicts a flow chart of the BaseQual Cross Banding Membership fuzzy logic block of Figure 15.
  • Figure 18 depicts a flow chart of the BaseQual Width Membership fuzzy logic block of Figure 15.
  • Figure 19 depicts a flow chart of the BaseQual Shape Membership fuzzy logic block of Figure 15.
  • Figure 20 depicts a flow chart of the BaseQual Baseline Buzz Membership fuzzy logic block of Figure 15.
  • Figure 21 depicts a flow chart of the BaseQual OK Spacing Membership fuzzy logic block of Figure 15.
  • Figure 22 depicts a flow chart of the Baseline Subtraction algorithm of Figure 2.
  • Figure 23 depicts a flow chart of the Pre-Processing Begin/End Detection of Figure 1.
  • the present invention provides a method and apparatus for detecting and analyzing information-containing signals in chromatographic data.
  • the invention analyzes chromatographic data from DNA sequence analysis machines employing various and sundry imaging techniques, including autoradiograms, four lane-single fluor, and single lane-four fluor data.
  • the invention further includes general and dedicated apparatuses for performing the invented method.
  • the invention also includes a kit comprising one or more of the following components in combination with the invented method: a DNA sequence apparatus, signal detection apparatus, information storage devices for preserving chromatographic data before, during and after analysis, and output devices for displaying the analyzed sequence information.
  • the invented method takes as input the output from a DNA sequencing apparatus and returns the called sequence, aligned traces, and band metrics for each called base. After each sample segment is read, its called sequence, aligned traces, and band metrics are joined to previous read segments. After an entire ladder has been read, a final step analyzes each called base's metrics and assigns a quality value. The quality values are used to identify the largest block of high quality sequence and establish left and right cutoff values. If a "preamble" sequence is available, the base calling software will attempt to locate the preamble in the called sequence and set the left cutoff value beyond it. Such preamble sequences may include primer sequences or known sequences which are to be excluded from the collected data. This latter step improves the chance that the sequence called by this software would merge with the least amount of human intervention.
  • the invented base calling software first performs a preprocessing step 102 on the input data set 101.
  • Preprocessing can include spectral separation, background subtraction and interpolation of input data set 101.
  • the preprocessed data set 103 then enters Steps 104-106, which reads successive sample segments of the preprocessed data.
  • the sample segments 104 may be any suitable size which provides efficient signal analysis.
  • the first segment is 2048 scanline samples.
  • Subsequent segments are also 2048 samples, with 148 samples overlapping the previous segment.
  • the following description is based on the most preferred sample segment size, although the scope of the invention is not intended to be limited to that segment size.
  • Each sample segment 104 is first analyzed to estimate the coarse band spacing. Subsequently, the segment 104 is analyzed a second time 106 to refine the predicted band spacing.
  • the band spacing drives the selection of the reconstruction filter employed during blind deconvolution. Band spacing and filter band width are inversely related. Once a sample segment of 2048 scanlines is read twice (a refined sample segment) and its band spacing measured and normalized for that 2048 scanline segment, the next sample segment of 2048 samples is read. The next sample segment overlaps the previous segment by 148 scanlines (or about 15 nucleotide bases) to establish the frame and relative positioning of adjacent segments. Subsequent segments 104 are similarly processed until the final sample segment 104 is reached.
  • pseudo-random noise is generated to fill the sample segment to the required 2048 samples. Pseudo-random noise is preferred because sources of non-random noise will cause improper processing during the blind deconvolution and alignment steps.
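The segment bookkeeping described above (2048-sample segments, a 148-sample overlap, and noise padding of a short final segment) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `read_segments`, the uniform noise distribution, and the `noise_amplitude` and `seed` parameters are assumptions introduced here, since the text does not specify how the pseudo-random noise is drawn.

```python
import random

SEGMENT_LEN = 2048  # scanline samples per segment (from the text)
OVERLAP = 148       # samples shared with the previous segment (from the text)


def read_segments(trace, noise_amplitude=1.0, seed=0):
    """Yield (start_index, segment) pairs covering one trace.

    Each segment holds SEGMENT_LEN samples and overlaps its predecessor by
    OVERLAP samples. A short final segment is padded with pseudo-random noise
    (uniform noise is an assumption made for this sketch).
    """
    rng = random.Random(seed)
    step = SEGMENT_LEN - OVERLAP
    start = 0
    while True:
        segment = list(trace[start:start + SEGMENT_LEN])
        missing = SEGMENT_LEN - len(segment)
        segment.extend(rng.uniform(0.0, noise_amplitude) for _ in range(missing))
        yield start, segment
        if start + SEGMENT_LEN >= len(trace):
            break
        start += step
```

For a 5000-sample trace this yields segments starting at samples 0, 1900 and 3800; the last one carries 1200 real samples followed by 848 noise samples.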
  • fuzzy logic allows multivalued logic to enhance peak detection. By using fuzzy logic, a gap is "somewhat big," a band is "not so tall." Fuzzy logic also provides logic operators (for example AND, OR, NOT).
  • Each fuzzy logic block in the base calling method provides a particular analysis of its data. The logic blocks operate on normalized input data and essentially classify each band based on absolute and relative criteria which are based on the band's neighboring bands. For example, fuzzy logic block 108 analyzes each base and its upstream context and assigns a quality value to each called band (base identity).
  • fuzzy logic block 108 also identifies the largest block of high quality data in the processed and aligned data 107.
  • the left and right boundaries of the high quality data block are recorded as the left and right cutoff points.
  • the output data set 109 includes the finished traces, the called bases with their assigned quality values and the suggested left and right cutoff points.
  • Output data set 109 can optionally be visually enhanced to normalize all bands to about the same band amplitude and to remove the saw tooth appearance of the non-visually enhanced traces.
  • input data set 201 is a 2048x4 trace matrix.
  • the first step is to establish Begin and End points by analyzing input data set 201 for the first scanlines containing above-background signal and for the last scanlines containing such signal (see trace 202a as an example). Large signal spikes, due to artifacts such as primer peaks, are excluded.
  • the Preprocessing Begin/End subroutine identifies the Begin and End points based on signal amplitude.
  • the Begin and End points define the usable signal for subsequent operations.
  • Usable signal typically begins just left of the largest left-most signal amplitude (the so called primer peak), and continues until either the end of the sample segment data or another region of large signal amplitude is encountered. (The latter peak is typically called a biostreptation peak.) More specifically, steps 2302 through 2305 identify the putative start and end points by breaking the sample segment into zones and determining the maximum signal amplitude in each zone.
  • Step 2306 determines whether a second primer peak is present. If the second peak is present, Step 2306 sets the Begin point at the second primer peak.
  • Steps 2307 and 2308 make final adjustments to the Begin and End points, setting the Begin point to the first sample with amplitude below the mean of the first half of the signal, and setting the End point back 350 samples from the end.
  • the baseline of the Preprocessed Begin/End data 202 is then determined at Step 203.
  • a single baseline is established for each fluor of the Preprocessed Begin/End data 202.
  • the baseline is subtracted from Preprocessed Begin/End data 202 to generate a baseline subtracted data set 203.
  • the localized data set 207 becomes baseline subtracted data set 208.
  • a single baseline is established based on data from all lanes.
  • the baseline can be established by estimating the baseline of the Preprocessed Begin/End 2201.
  • each trace lane is processed twice using a rising exponential threshold.
  • One pass is made from left to right (establishing one baseline) (Step 2202), and the next pass is from right to left (establishing another baseline) (Step 2203).
  • a fairly natural subtrahend is produced. (See sample traces 2205 and 2206.)
  • a threshold is initially set to the lowest point found within the first 10 samples. As each successive sample is considered, the threshold is incremented by an exponential which slowly ramps upward. When a subthreshold sample is encountered, the baseline between the previous subthreshold point and the current point is taken to be a line segment between the points. The threshold is reset to the new subthreshold sample value and the process continues. If, after 100 samples no subthreshold point has been found, a 100 point segment of the baseline is computed (again, piecewise linear), and the rate of rise of the exponential is increased. The exponential is calculated to rise by 1/3 the amplitude of the most recent subthreshold point over a span of 75 samples.
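A simplified, single-pass version of the rising-threshold baseline can be sketched as below. It keeps the parts the text states (start at the lowest of the first 10 samples, ramp the threshold exponentially so that it has risen by one third of the last subthreshold amplitude after 75 samples, join subthreshold points with line segments) and deliberately omits the 100-sample fallback, the rate increase, and the second right-to-left pass. The function name, the `2 ** (k / span)` form of the exponential, and the `min_rise` floor are assumptions made for this sketch.

```python
def rising_threshold_baseline(samples, span=75, rise_fraction=1.0 / 3.0, min_rise=1e-6):
    """Estimate a piecewise-linear baseline in one left-to-right pass."""
    n = len(samples)
    if n == 0:
        return []
    # Initial threshold: the lowest point found within the first 10 samples.
    first = min(range(min(10, n)), key=lambda i: samples[i])
    anchors = [(first, samples[first])]
    since_anchor = 0
    for i in range(first + 1, n):
        since_anchor += 1
        base = anchors[-1][1]
        # Assumed ramp: rises by rise_fraction * |base| after `span` samples.
        total_rise = max(abs(base) * rise_fraction, min_rise)
        threshold = base + total_rise * (2.0 ** (since_anchor / span) - 1.0)
        if samples[i] < threshold:
            # Subthreshold sample: new baseline point, threshold is reset.
            anchors.append((i, samples[i]))
            since_anchor = 0
    # Line segments between successive baseline points; flat outside them.
    baseline = [anchors[0][1]] * n
    for (i0, v0), (i1, v1) in zip(anchors, anchors[1:]):
        for i in range(i0, i1 + 1):
            baseline[i] = v0 + (v1 - v0) * (i - i0) / (i1 - i0)
    for i in range(anchors[-1][0], n):
        baseline[i] = anchors[-1][1]
    return baseline
```

On a slowly drifting trace with a tall peak, the baseline under the peak is the straight line joining the last sample before the peak and the first sample after it, so subtracting it leaves the peak standing on zero.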
  • baseline subtracted data set 203 is preferably spectrally or leakage separated. This step markedly improves the quality of capillary electrophoresis data. For slab gel data with a signal to noise ratio of 2.0 or less, separation step 204 significantly improves data quality, such that unreadable data can become readable.
  • the separation step 204 is preferably performed during preprocessing without user input. For four fluor-single lane data, the baseline-subtracted data set 203 is spectrally separated. For single fluor-four lane data, data set 203 is leakage separated. For either separation, the separation algorithm 204 builds a characteristic matrix (CHM) which is used to perform the separation.
  • the characteristic matrix captures the spectral cross-talk ratios in four fluor data.
  • the characteristic matrix is generated from the ratio of leakage from the signal in the center of the lane in question to the signal in adjacent lanes.
  • the ratios are measured at "peak center."
  • for slab gel data, all data points are used to generate the characteristic matrix.
  • the result of separation 204 is a separated data set 204.
  • Processing steps 104-106 are optimally performed on preprocessed data 206 containing at least 8 scanlines (samples) per band.
  • a baseline subtracted data set 203 or a separated data set 204 may optionally be enhanced to double or triple the number of samples using cubic spline interpolation 205.
  • Reading Referring to Figure 3, the exemplified reading step analyzes sample segments 301 of
  • Each sample segment 301 first undergoes blind deconvolution 302 to cancel the effects of an unknown laurentian blurring function and to normalize the amplitudes of the traces.
  • Blind deconvolution is described in the U.S. Patent to T.G. Stockham and J.T. Ives (No. 5,273,632), which is incorporated by reference herein.
  • the presently invented method includes the following improvements over the method of Stockham and Ives.
  • the first 2048 samples are blind deconvolved with an initial narrow-band guess for the filter band width ("FBW") value. The narrow-band guess is made so that the initial reading does not overestimate the band density along the sample segment.
  • the invented method includes a means for selecting the filter band width (FBW) during blind deconvolution 302.
  • the blind deconvolution step 302 deblurs the signal and normalizes its amplitude.
  • an extra-normalization function 303 adjusts band spacing due to mobility differences in the samples.
  • Extra-normalization 303 also corrects for the tendency of blind deconvolution to create spurious bands, especially in regions of mono-, di- or tri-nucleotide repeats where one or more lanes are band-lite for extended regions.
  • extra-normalization 303 corrects two types of artifacts created by blind deconvolution. Path 406-410 cancels artifacts created in band-lite lanes.
  • the blindly deconvolved data set 406 is scanned for band-lite lanes by comparing the relative lane signal strengths and the relative lane band frequencies.
  • the proxy used for the lane signal strength analysis is the 97th percentile signal amplitude found in each lane.
  • the proxy used for the band frequency is the proportion of the signal over which the lane in question has the largest signal amplitude. If a lane has less than 15% of the total bands found in all four lanes in a sample segment, and if the band amplitudes are low relative to the other lanes, the amplitudes of the bands in that lane are attenuated. If the band amplitude is lowest, those amplitudes are attenuated by one-half. If the band amplitude is above the lowest, the amplitudes are attenuated to three-quarters of the original band amplitude. In contrast, in ideal sequence data, where A, G, C, and T are equal in frequency, each trace should dominate 25% of the time.
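The band-lite test above can be sketched in a few lines. The 97th percentile, the 15% share, and the one-half and three-quarters attenuation factors come from the text. The text does not say what counts as amplitudes that are "low relative to the other lanes", so this sketch assumes a lane is low when its 97th percentile falls below the upper median of the four lanes. All function names are hypothetical.

```python
def percentile(values, pct):
    """Nearest-rank style percentile using integer arithmetic."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, (pct * len(ordered)) // 100)]


def dominance_fractions(lanes):
    """Fraction of sample positions at which each lane has the largest amplitude."""
    counts = [0] * len(lanes)
    for column in zip(*lanes):
        counts[column.index(max(column))] += 1
    return [count / len(lanes[0]) for count in counts]


def band_lite_gains(lanes, min_share=0.15):
    """Return one attenuation factor per lane (1.0 means leave the lane alone)."""
    strength = [percentile(lane, 97) for lane in lanes]
    share = dominance_fractions(lanes)
    weakest = min(strength)
    low_cutoff = sorted(strength)[len(strength) // 2]  # assumption: upper median
    gains = []
    for lane_strength, lane_share in zip(strength, share):
        if lane_share < min_share and lane_strength < low_cutoff:
            # The weakest lane is halved; other band-lite lanes keep three quarters.
            gains.append(0.5 if lane_strength == weakest else 0.75)
        else:
            gains.append(1.0)
    return gains
```

Each lane would then be multiplied by its gain before alignment; a lane that dominates 25% of the positions, as in ideal sequence data, is never attenuated.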
  • Extra-normalization path 401-403 corrects for mobility differences between lanes and performs the actual band-lite attenuation 404.
  • the blind deconvolved data set 401 is analyzed to identify any regions with inordinately large or coincident bands. These regions are set to zero (base-line). If these regions were not set to zero the Monte Carlo alignment algorithm 403 would produce an aberrant alignment which focused on separating them.
  • Alignment is accomplished using a Monte Carlo search of a 3D space, where the x-axis defines the relation between the A and G lanes, the y-axis defines the relation between the AG relation and the C lane, and the z-axis defines the relation between the AGC relation and the T lane.
  • An initial set of possible alignments is chosen, each triple is applied to the traces to be aligned, and the integral of the resulting envelope is calculated.
  • a subset of the triples, those yielding the largest integrals, are then refined. The triple which yields the lowest integral is removed from the set under consideration. It is replaced by a triple which results from a random alteration of the triple which yields the largest integral.
  • the highest yielding triple is chosen as the alignment vector for the segment under consideration. More specifically, the search is conducted in a three dimensional space, where the x-axis specifies the offset between trace 1 and trace 2, the y-axis specifies the offset between the trace 1,2 registry and trace 3, and the z-axis specifies the offset between the trace 1,2,3 registry and trace 4 (See illustration 1401).
  • the algorithm employed was originally described by W.L. Price in The Computer Journal, Vol. 20, No. 4, which is incorporated by reference herein.
  • a set of putative alignment solutions 1401 is generated.
  • the addresses of the lattice points of 6 concentric cubes centered about a point in the space are used as the initial alignment solutions.
  • Subsequent calls can either continue to center the lattice on the origin, or they can bias the search by centering the lattice on the previous alignment solution (x_{n-1}, y_{n-1}, z_{n-1}).
  • Each alignment guess is converted into a shift vector of four values, wherein one value is 0.
  • Each trace in the matrix is shifted by the amount specified in the shift vector, the envelope of the shifted traces is obtained (the maximum value of the four trace values found at each position along the traces), and the envelope is summed.
  • the sum represents the integral of the envelope produced by the alignment guess.
  • a low integral value represents a poor alignment (see, e.g. illustration 1402, where the bands are aligned behind others, not arranged "shoulder to shoulder"), whereas a high integral value corresponds to a good alignment (see, e.g. illustration 1407, where all bands are fully exposed, arranged "shoulder to shoulder").
  • the worst alignment solution is replaced by a small, random perturbation of the best alignment solution 1405.
  • the new alignment solution is evaluated, and the process repeats, replacing the new worst alignment with a perturbation of the new best alignment.
  • the set of points in the 3D space converge about the best alignment solution 1406.
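The search loop described above can be illustrated with the following sketch. It is a reduced interpretation rather than the patented routine: the initial candidates are taken to be the corners of 6 concentric cubes plus their center, each coordinate of a triple is treated as an integer offset of traces 2 to 4 relative to trace 1 (so the shift vector is `(0, x, y, z)`), and the perturbation is a random step of at most `jitter` samples per axis. Those choices, and all names, are assumptions.

```python
import random
from itertools import product


def shift(trace, offset):
    """Shift a trace by an integer number of samples, filling with zeros."""
    out = [0.0] * len(trace)
    for i, value in enumerate(trace):
        j = i + offset
        if 0 <= j < len(trace):
            out[j] = value
    return out


def envelope_integral(traces, triple):
    """Sum of the envelope (per-position maximum) of the shifted traces."""
    offsets = (0,) + tuple(triple)
    shifted = [shift(t, o) for t, o in zip(traces, offsets)]
    return sum(max(column) for column in zip(*shifted))


def monte_carlo_align(traces, center=(0, 0, 0), n_cubes=6, iterations=200, jitter=1, seed=0):
    """Return the offset triple with the largest envelope integral found."""
    rng = random.Random(seed)
    candidates = {tuple(center)}
    for r in range(1, n_cubes + 1):
        for corner in product((-r, r), repeat=3):
            candidates.add(tuple(c + d for c, d in zip(center, corner)))
    scored = [(envelope_integral(traces, t), t) for t in sorted(candidates)]
    for _ in range(iterations):
        scored.sort()
        best = scored[-1][1]
        # Replace the worst solution with a small random perturbation of the best.
        trial = tuple(b + rng.randint(-jitter, jitter) for b in best)
        scored[0] = (envelope_integral(traces, trial), trial)
    return max(scored)[1]
```

Because the best-scoring triple is never the one replaced, the returned alignment is always at least as good as the center of the initial lattice. Passing the previous segment's solution as `center` gives the biased search mentioned above.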
  • Step 304, peak detection and refinement, occurs.
  • the aligned traces then undergo putative peak detection.
  • putative peak detection 502 is performed on the blind deconvolved, extra-normalized data set 501 (unstopped, attenuated and with the relative mobilities corrected).
  • a trace envelope is first determined.
  • the Stockham and Ives Patent described detecting peaks in each trace separately with thresholds derived from the underlying data. In the invented method, the trace envelope is peak-detected and no thresholds are employed.
  • a peak is liberally defined to be a sample which is taller than either of its two neighbors. Subsequent processing culls this liberally defined putative peak list. This form of peak detection is both faster (one trace instead of four) and less prone to error (no subthreshold peaks).
  • the Stockham and Ives Patent required individual trace peak detection because its alignment algorithm attempted to determine lane alignment using peak location information.
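Threshold-free peak detection on the trace envelope is short enough to show in full. The sketch below reads "taller than either of its two neighbors" as a strict local maximum, which is an interpretation, and additionally reports which lane supplies the envelope at each peak; the function names are hypothetical.

```python
def envelope(traces):
    """Per-position maximum across all lanes."""
    return [max(column) for column in zip(*traces)]


def putative_peaks(traces):
    """Return (position, lane_index) for every local maximum of the envelope."""
    env = envelope(traces)
    peaks = []
    for i in range(1, len(env) - 1):
        if env[i] > env[i - 1] and env[i] > env[i + 1]:
            lane = max(range(len(traces)), key=lambda k: traces[k][i])
            peaks.append((i, lane))
    return peaks
```

No amplitude threshold appears anywhere, so a very small band is still reported; culling such detections is left to the later fuzzy logic stages.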
  • each putative peak's instantaneous spacing, cross banding, height, and spacing to adjacent bands is measured (Step 503). These observed band spacing measurements are fit with a quadratic curve. This quadratic fit is used as the expectation of the band spacing along the entire read segment. This approach to defining the expected band spacing is sufficiently general to handle segments where, as in the Stockham and Ives Patent, the average spacing is an adequate expectation, as well as segments where the spacing changes radically. In the invented method, more information was found necessary to sufficiently identify insertions and regions of deletions, and as a result, the invented method can resolve a series of insertions and deletions.
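The quadratic expectation of band spacing can be illustrated with an ordinary least-squares fit. The sketch below solves the 3x3 normal equations directly so that it needs no external library; fitting spacing against the position of the right-hand peak of each pair is an assumption, as are the function names. At least three distinct positions are required, otherwise the system is singular.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def expected_spacing(peak_positions):
    """Return a function giving the expected band spacing at a scanline."""
    xs = peak_positions[1:]
    ys = [right - left for left, right in zip(peak_positions, peak_positions[1:])]
    a, b, c = fit_quadratic(xs, ys)
    return lambda x: a + b * x + c * x * x
```

A segment with constant spacing produces a flat curve, while a segment whose spacing drifts along the read produces a curve that follows the drift, which is the generality the text attributes to the quadratic fit.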
  • the first of three fuzzy logic blocks 504, OmitOkN Fuzzy Logic, is then used to identify bands which are most likely insertion artifacts of the band detection process.
  • This block classifies the detections as OK, AMBIGUOUS or OMIT.
  • the putative bands given the OMIT classification are removed from the putative peak set.
  • each band has several of its attributes 601 examined by this first logic block. If a band is where it ought to be with respect to either of its neighbors, then variable okSp is set "TRUE" (Step 602).
  • the intent of the membership function for the OmitOkN Ok Spacing fuzzy logic block is to "accept" a spacing measurement which is an integer multiple of the expected spacing.
  • the observed spacing is normalized to a value on the interval [0..1] using its relationship to expected spacing (Step 702).
  • the normalized spacing of 0.3 is found to be OK with a truth value of 0.7 (Step 703 and Example 704).
  • the intent of the membership function is to "deprecate" a spacing measurement which is not an integer multiple of the expected spacing. Consequently, the observed spacing is normalized to interval [0..1] using its relationship to expected spacing (Step 802).
  • a normalized spacing of 0.3 is found to be BAD with a truth value of 0.5; this spacing is not as good as it could be.
  • variable abSp is set "TRUE" (Step 603). If the amount of "cross banding" (i.e. the amount of competition by two bands for a particular region of the read segment) is high, then variable badXb is set "TRUE" (Step 604). Similarly, if there is negligible cross banding then variable neglXb is set "TRUE".
  • cross banding designates the amount of competition for the scanlines underlying a detected band. Bands of a dubious nature have wide ranging cross band ratios due to their apex's proximity to the baseline. However, compressions and stops, with significant amplitudes, can have their cross banding measured.
  • the cross banding membership function is best used in identifying OK or AMBIGUOUS bands.
  • the first complex has two bands vying for the same location, with the second largest band having one-half the amplitude of the largest.
  • this ratio approaches infinity.
  • the badXb membership is 0.25
  • the negligibleXb membership is 1.0; in other words, while a ratio of 1.5 is found negligible, the band legitimacy will be questioned.
  • the band height is also categorized as either tiny or ok (Step 605).
  • the membership sets are best customized for the general signal quality one observes from the machine providing the data.
  • a function of the median value of amplitudes measured where bands intersect determines the height membership function break points.
  • the tinyHt function breaks at 0.4*med_intersect_pt and is zero by 1.1*med_intersect_pt.
  • the okHt function comes off zero at 0.5*med_intersect_pt and flattens off at 1.0 at
  • the blind deconvolution process normalizes band amplitudes to interval [0..1], with most bands having a height in excess of 0.1. This example given is typical in that it begins deprecating a band based on its height when the height falls below 0.07.
  • the measured band height is 0.1 and has membership in okHt of 1.0 and in tinyHt of 0.0.
  • the band has, per this example of the sets, sufficient height.
  • the strength of the rule firings is then used to scale the output sets (Step 609).
  • the output set OK is scaled with amplitude 1.0
  • output set N ambiguous
  • set OMIT is scaled with 0.0.
  • Defuzzification, or obtaining a crisp (conclusion) value from the output rule sets, is achieved by calculating the centroid of the resultant "masses".
  • the conclusion reached is that the band is OK (Step 611).
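Centroid defuzzification of scaled output sets can be sketched as a weighted average. The positions assigned to the three output sets (OMIT at 0.0, AMBIGUOUS at 0.5, OK at 1.0) and the treatment of each set as a single point mass are assumptions made here for illustration; the text gives only the OK and OMIT scale factors, so the AMBIGUOUS strength of 0.25 used in the check below is hypothetical.

```python
# Assumed positions of the output sets on the conclusion axis.
OUTPUT_POSITIONS = {"OMIT": 0.0, "AMBIGUOUS": 0.5, "OK": 1.0}


def defuzzify(strengths, positions=OUTPUT_POSITIONS):
    """Centroid of point masses: each output set weighted by its rule strength."""
    total = sum(strengths.values())
    if total == 0.0:
        return None  # no rule fired
    return sum(strengths[name] * positions[name] for name in strengths) / total


def classify(strengths, positions=OUTPUT_POSITIONS):
    """Name of the output set whose position is closest to the centroid."""
    crisp = defuzzify(strengths, positions)
    if crisp is None:
        return "AMBIGUOUS"
    return min(positions, key=lambda name: abs(positions[name] - crisp))
```

With OK scaled to 1.0, OMIT scaled to 0.0 and a small AMBIGUOUS contribution, the centroid sits close to the OK position, which matches the conclusion stated in the text.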
  • each peak's instantaneous spacing, instantaneous band width, spacing to its left neighbor (left spacing), band width and called bases are remeasured (Step 505).
  • These observed band spacings are fit with a quadratic curve which then serves as the expected spacing along the read segment.
  • the observed band width measurements are also fit with a quadratic curve which serves as the expected band width along the read segment.
  • the second fuzzy logic block 506, GapCheck Fuzzy Logic, then identifies bands, or gaps between bands, where one or more bands may need to be inserted to achieve the band spacing predicted by the quadratic fit.
  • This block classifies the detections as NORMAL, SPLIT or SUFFERING FROM UPSTREAM TURBULENCE.
  • the gaps are split and a suitable number of bands are inserted (Step 507).
  • the bands given the SPLIT classification are split a suitable number of times, with the division points being the centroid of the interval to be split. The centroid is used to place the insertion on the shoulder of a poorly defined band, and not in the bottom of the trough between the SPLIT band and its left neighbor.
  • Each insertion has a defined Begin, Middle and End scanline value.
  • each band pair considered by fuzzy logic block GapCheck has several attributes which are measured (Step 1101).
  • the expected spacing curve, expected width curve, band width, left gap (gap to the leftmost neighbor) and sequence are determined.
  • the upstream sequence is assigned a measure of GC-richness (Step 1102).
  • These measurements coupled with the GC richness of the sequence of the 5 bands to the left, are informative in identifying bands which need additional bands added to their left.
  • the gap is normalized with respect to the expected spacing onto interval [-1..inf] (Step 1103).
  • the GapCheck logic block determines if a band is located where it should be (independent of its absolute spacing but focusing instead on how far off the spacing curve it is), in the GapCheck logic block, the concern is on the absolute distance of the band from its left neighbor. If the gap is an integer multiple of the spacing curve (say three spaces from its left neighbor) two bands are inserted to its left to establish the proper spacing. In addition to the gap between bands in the pair, this logic also considers the widths of the bands. Band width is normalized onto interval [-1..inf] (Step 1104). Usually, when band resolution decreases and a region in the observed trace contains fewer peaks than are required, one or both bands in the pair is wider than it should be.
  • In Step 1108, the normalized widths of each band in a band pair are classified as big.
  • Figure 13 provides details of the GapCheck band width membership function. This membership function characterizes an observed band width measurement (owd) if it exceeds expectations (ewd). The width is measured between B_n's Begin and End points (Step 1301). In the example given in Step 1303, a normalized gap of 0.2 is found to have membership in BigWidth of 0.2; the band is not that wide, but it is wider than expected (Step 1304).
  • band_n is not marked as needing its left gap split if any of the following are TRUE: a) there is a large gap (bigGap_n) but the upstream sequence is GC-rich, or b) the gap to the first band in the pair is small (smlGap_{n-1}) and the two bands are not wide (!bigWid_n and !bigWid_{n-1}) (i.e., ignore the gap between the bands), or c) the gap between the two bands is not large (!bigGap_n).
  • RULE SPLIT marks band_n as needing its left gap split if either of the following are TRUE: a) the gap between the two bands is large (bigGap_n) and the first and/or second band is wide (bigWid_{n-1} or bigWid_n), or b) the gap between the two bands is large (bigGap_n) and the gap left of the first band is not small (!smlGap_{n-1}) and the upstream sequence is not GC-rich (!gcrich).
  • RULE SPLIT (a) detects the combination of a wide and normal band (in either order) while RULE SPLIT (b) selects a run of wide bands separated by large gaps. The strength of the rule firings is then used to scale the output sets (Step 1111). In example 1112, the output set NORMAL is scaled with amplitude 1.0 and output set SPLIT is scaled with 0.25. The conclusion is formed by calculating the centroid of the resultant "masses”. In example 1112, the conclusion reached is that the band is NORMAL.
  • Step 507 remeasures the cross banding, instantaneous spacing, band height, band amplitude, and the spacing (left and right gaps) to adjacent bands.
  • the observed band spacing measurements are fit with a quadratic curve. This quadratic fit is used as the expectation of band spacing along the entire read segment.
  • the OmitOkN Fuzzy logic block (Step 508) is then used to identify bands which are most likely insertion artifacts of the band detection process. Any and all such bands are removed from the putative peak set. Newly proposed insertions may be deleted in this step.
  • the fuzzy logic band refinement stage adds the important advantage of reducing insertions and deletions and preventing arbitrary band calling when the reader encounters two or three base regions of signal dropout. See Figure 6 and the accompanying text for details of insertion detection.
  • the set of putative peaks which survive this processing are recorded as the bands for the read segment under consideration (Step 509).
  • reading function 104-106 consecutively processes sample segments until all of the input data set 101 is analyzed. Because each sample segment 104 overlaps the previous sample segment by a predetermined amount, the relative positioning of each read and aligned sample segment 106 is known. Step 107 assembles all of the read and aligned sample segments 106 to form a processed and reassembled sample segment 107.
  • a final process analyzes the set of measured band features with a third fuzzy logic block, BaseQual fuzzy logic block 109.
  • This block assigns a quality measure to each called band.
  • This block evaluates each band based on the band height, width, shape, left and right gap, cross-banding and baseline "buzz.”
  • This quality value on the interval (0.0 to 1.0) can be used during subsequent sequence alignment/merging steps.
  • the present invention uses the quality value to select the longest block of high quality sequence to be considered for alignment and merging with other sequences into a large DNA sequence.
  • the algorithm that selects the left and right cutoff points generates a surface, with the x-axis labeled MOVING AVERAGE FILTER WIDTH, the y-axis labeled THRESHOLD, and the z-axis labeled READ LENGTH.
  • the quality values are filtered with six moving average filters, and the filtered data is compared against nine thresholds. The longest contiguous block of above threshold filtered quality values provides the read length value for the surface for a particular filter width, threshold pair.
  • this surface is scaled so that narrow filter and high threshold read lengths are favored over wide filter and low threshold read lengths.
  • the surface maximum z-value is then chosen as the read length, and the associated first and last above threshold filtered quality value indexes serve as the left and right cutoff points, respectively. If a "preamble" sequence was submitted to this EDIT stage, and if the sequence is found beyond the established left cutoff point, the cutoff point is moved further left to exclude the "preamble" sequence.
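The cutoff selection described above can be sketched as a search over a (filter width, threshold) grid. This is a sketch, assuming a centered moving average and a hypothetical weighting of length * threshold / width to favor narrow filters and high thresholds; the source does not give the six widths, the nine thresholds, or the actual scaling, so they are left as parameters.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct ReadWindow {
    std::size_t first = 0;   // index of first above-threshold value
    std::size_t last = 0;    // index of last above-threshold value
    std::size_t length = 0;  // read length (0 means nothing found)
    double score = 0.0;      // scaled surface value
};

// Centered moving average; the window is truncated at both ends.
std::vector<double> movingAverage(const std::vector<double>& q, std::size_t width) {
    std::vector<double> out(q.size(), 0.0);
    std::size_t half = (width == 0 ? 1 : width) / 2;
    for (std::size_t i = 0; i < q.size(); ++i) {
        std::size_t lo = i >= half ? i - half : 0;
        std::size_t hi = std::min<std::size_t>(q.size() - 1, i + half);
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j) sum += q[j];
        out[i] = sum / static_cast<double>(hi - lo + 1);
    }
    return out;
}

// Longest contiguous run of values strictly above the threshold.
ReadWindow longestRun(const std::vector<double>& v, double threshold) {
    ReadWindow best;
    std::size_t i = 0;
    while (i < v.size()) {
        if (v[i] <= threshold) { ++i; continue; }
        std::size_t start = i;
        while (i < v.size() && v[i] > threshold) ++i;
        if (i - start > best.length) {
            best.first = start;
            best.last = i - 1;
            best.length = i - start;
        }
    }
    return best;
}

// Evaluate every (width, threshold) pair and keep the surface maximum.
ReadWindow selectReadWindow(const std::vector<double>& quality,
                            const std::vector<std::size_t>& widths,
                            const std::vector<double>& thresholds) {
    ReadWindow best;
    for (std::size_t w : widths) {
        std::size_t width = w == 0 ? 1 : w;
        std::vector<double> filtered = movingAverage(quality, width);
        for (double t : thresholds) {
            ReadWindow r = longestRun(filtered, t);
            // Hypothetical weighting, not the patent's scaling.
            r.score = static_cast<double>(r.length) * t / static_cast<double>(width);
            if (r.score > best.score) best = r;
        }
    }
    return best;
}
```

The returned first and last indexes play the role of the left and right cutoff points; the preamble adjustment is not modeled.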
  • the BaseQual fuzzy logic algorithm assesses the quality of the called bases. Experience has shown that some sequence assembly algorithms fail to assemble sequences containing regions of incorrect sequence and that others can only succeed when each base is accompanied by an indication of its quality (or inversely, its probability of error). In the former case, if the incorrect sequence regions are masked from consideration by the assembly program, the bulk of the good sequence will successfully assemble. In the latter case, if the low quality regions are identified, the overall base caller product will assemble. In either case, incorrect sequence, encountered in isolation, can be and usually is identified by an experienced technician using visual inspection. That process is time consuming and monotonous and subtle errors may go undetected. In general though, incorrect base calling is done where the underlying data traces are marginal.
  • the BaseQual routine automates quality assessment by measuring and analyzing multiple features (Step 1501) of each called base. Fuzzy logic is used to identify certain band presentation patterns and assign levels of quality to them. These band features include the band height, cross-banding, band width, band shape, the band's small gap and the band's large gap.
  • Band height variations are informative in many of the classifications.
  • Six fuzzy variables are used to classify a band's height (tiny, small, moderate, normal, tall and collectively, OK) (Step 1502).
  • the membership function characterizes an observed band height measurement.
  • a band with a "tiny" or "small" height is usually suspect, with the tiny bands being more suspect than the merely small.
  • Moderate height bands, and tall bands also require scrutiny.
  • Tall bands are suspect because usually they are found amid stops, compressions, and, on slab gels, artifacts.
  • a band height of 0.18 is found to have membership in NormalHeight of 1.0, which is to say that the band's height is within tolerances (See Example 1603).
  • cross banding, the measure of competition by two traces for the same region of the trace, is also informative.
  • the BaseQual cross banding membership functions characterize an observed band's cross banding measurement.
  • the cross banding measurement is the ratio of the dominant trace to the next dominant trace. Ratios above 1.5 are deemed to have negligible cross banding, whereas those with lower ratios (with 1.0 being the lowest ratio possible) are suspect.
  • a cross banding ratio of 1.35 is found to have membership in negligibleXb of 0.33 (and 0.67 in the negation, !negligibleXb).
  • band width (normalized based on a quadratic fit of observed band widths), is another informative variable.
  • in the BaseQual band width membership function, a band's observed width is normalized with respect to the expected band width. The intent of the membership function is to determine how normal the normalized band width is. Referring to the example in Step 1802, a normalized width of 0.2 has membership in the Normal set of 1.0 (See Example 1803).
  • the band shape, the linear correlation coefficient between the coefficients of a quadratic fit of the band and the coefficients of a quadratic fit of an ideal band, identifies abnormally shaped bands.
  • the BaseQual band shape membership function is informative in determining the quality of the base call.
  • sample rate conversion normalizes the observed band width
  • the band amplitude was normalized to 1.0
  • the result was then compared against an ideal, gaussian bell-shaped band.
  • the approach is computationally expensive and much information regarding the observed shape is discarded through the morphing process.
  • each band's height values are fit with a quadratic curve.
  • an ideal band shape is fit with a quadratic curve. (The ideal band shape is defined to have normal height and the expected width.) This approach reduces each sample set to an equal number of points.
  • the shape metric is taken as the linear correlation coefficient of these two sample sets.
  • a band's shape is "abnormal" if this shape metric falls below 0.5. Referring to the example in Step 1902, a shape metric of 0.6 is found to have membership in GoodShape of 1.0 (See example 1903).
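The shape metric is a linear (Pearson) correlation coefficient between two equal-length sample sets. The sketch below shows only that correlation step; how the two sets are derived from the quadratic fits is not modeled, and the function name is hypothetical. Returning 0.0 for degenerate input is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation of two equal-length sample sets. Returns 0.0 when
// the input is too short, mismatched, or has zero variance.
double linearCorrelation(const std::vector<double>& a,
                         const std::vector<double>& b) {
    if (a.size() != b.size() || a.size() < 2) return 0.0;
    const double n = static_cast<double>(a.size());
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        meanA += a[i];
        meanB += b[i];
    }
    meanA /= n;
    meanB /= n;
    double sab = 0.0, saa = 0.0, sbb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double da = a[i] - meanA;
        double db = b[i] - meanB;
        sab += da * db;
        saa += da * da;
        sbb += db * db;
    }
    if (saa <= 0.0 || sbb <= 0.0) return 0.0;
    return sab / std::sqrt(saa * sbb);
}
```

A result below 0.5 would correspond to the "abnormal" shape classification described above.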
  • In Step 1506, "baseline buzz," defined as the ratio of two other ratios, helps identify regions of the trace (usually the ends) where there is competition by several traces for the called band's domain. Toward the margins of a trace the baseline can often become quite busy, and when it does the quality of the underlying data, and the reads made thereon, become suspect. Baseline buzz can result from either incorrect signal processing, or from underlying data being so erratic as to defy correct signal processing. In either case the called sequence should come under suspicion. Referring to Figure 20, a buzz measurement above 0.2 begins to signal a problematic sequence. Referring to the example in Step 2003 and Example 2004, a buzz measurement of 0.28 has membership in okBuz of 0.63 (and 0.37 in the negation, !okBuz).
  • the band's quality has come into question.
  • the gaps to a band's left and right neighbor are further informative variables in assessing band quality. These measurements help identify bands that, despite all previous efforts to the contrary, remain positioned too close or too far from a preferred position.
  • a band's observed spacing is normalized with respect to the expected spacing. The intent of the membership function is to determine how normal the normalized band spacing is. In the example given in blocks 2102 and 2103, a normalized spacing of 0.2 receives an unqualified OK, with membership in OK Spacing of 1.0.
  • In Step 1508, logical combinations of several variables (e.g., buzz, width, shape, and spacing) help keep the rules for assigning a quality value to a band tractable.
  • Variable 1Bad indicates that one of the measures was out of tolerance.
  • variables 2Bad, 3Bad, and 4Bad indicate that two, three, or all four measurements are out of tolerance.
  • variable 4Ok notes that all four measurements are in tolerance.
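The combined variables can be sketched as fuzzy counts over the four "in tolerance" memberships (buzz, width, shape, spacing). This is a sketch under assumptions: AND, OR and NOT are taken as min, max and complement, and "k bad" is read as "exactly k measures out of tolerance", built as the OR over every way of choosing k bad measures. Only the all-four-bad form is visible in the listing fragment; the other combinations and the function name are hypothetical.

```cpp
#include <algorithm>
#include <array>

// ok[i] is the membership (0..1) of measure i being in tolerance.
// Returns the fuzzy degree to which exactly k of the four are bad.
double exactlyKBad(const std::array<double, 4>& ok, int k) {
    double result = 0.0;
    for (unsigned mask = 0; mask < 16u; ++mask) {
        // Bits set in mask select the measures treated as bad.
        int bad = 0;
        for (int i = 0; i < 4; ++i) bad += (mask >> i) & 1u;
        if (bad != k) continue;
        double term = 1.0;  // fuzzy AND over the four measures
        for (int i = 0; i < 4; ++i) {
            bool isBad = ((mask >> i) & 1u) != 0u;
            term = std::min(term, isBad ? 1.0 - ok[i] : ok[i]);
        }
        result = std::max(result, term);  // fuzzy OR over combinations
    }
    return result;
}
```

With k = 0 this reduces to the AND of all four memberships (4Ok), and with k = 4 to the AND of all four negations (4Bad).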
  • a quality assessment is determined through application of a series of nine rules.
  • the lowest quality assessment is made of bands that are tiny in height and are incorrectly positioned (Step 1509).
  • the rule will assign a nonzero scale value to the output set with centroid near 0 (Step 1519).
  • the second quality assessment, RULE QUAL20, is made of bands that are short, show signs of cross banding, and are incorrectly positioned (Step 1510).
  • the rule will assign a nonzero scale value to the output set with centroid near 13 (Step 1519).
  • the third quality assessment, RULE QUAL30, is made of bands which are tiny in height yet correctly positioned (Step 1511). A band which matches this rule to some degree will be assigned a nonzero scale value to the output set with centroid near 25 (Step 1519).
  • the fourth quality assessment, RULE QUAL40, is made of bands with small or moderate height and which show signs of cross banding or have 3Bad or 4Bad attributes (Step 1512). A band which matches this rule to some degree will be assigned a nonzero scale value to the output set with centroid near 38 (Step 1519).
  • the fifth quality assessment is made of bands with small or moderate height and which show either some degree of cross banding or have 2Bad or 3Bad attributes (slightly better than quality class 4 in that these might have one less bad attribute) (Step 1513). A band which matches this rule to some degree will assign a nonzero scale value to the output set with centroid near 50 (Step 1519).
  • RULE QUAL60, the sixth quality assessment, is applied to bands with OK height but which show signs of baseline buzz, non-negligible cross banding, or have 2Bad attributes (Step 1514). A band which matches this rule to some degree will assign a nonzero scale value to the output set with centroid near 63 (Step 1519).
  • the seventh quality assessment is made of bands in one of three general classes.
  • One class of bands has OK height, little baseline buzz, negligible cross banding, but has 2Bad attributes.
  • Another class of bands shows negligible cross banding, has OK height, no baseline buzz, is correctly positioned, but has both abnormal width and bad shape.
  • This class, named runfil, is characteristic of a band inserted within a poorly resolved run of bands.
  • the final class of bands has 4Ok attributes but has small height and possibly some degree of cross banding present. A band which matches this rule to some degree will assign a nonzero scale value to the output set with centroid near 75 (Step 1519).
  • the eighth quality assessment, RULE QUAL80, is made of bands with OK height, little baseline buzz, negligible cross banding, but a 1Bad attribute (Step 1516). A band which matches this rule to some degree (many do) will assign a nonzero scale value to the output set with centroid near 88 (Example 1519).
  • the top quality assessment, RULE QUAL90, is made of bands with absolutely nothing visually wrong with them (Step 1517). A band which matches this rule to some degree (again, given good quality input, many do) will assign a nonzero scale value to the output set with centroid near 100 (Step 1519). Finally, as with all the other fuzzy logic blocks, the output sets are scaled with the strength of their respective rule firings, and the centroid is calculated to determine the final quality assessment.
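The final defuzzification step can be sketched as a weighted centroid over the nine output sets. This is a simplification, assuming each output set is a singleton of equal mass located at the centroid values quoted above (0, 13, 25, 38, 50, 63, 75, 88, 100); the actual output sets have shapes not reproduced here, and the function name is hypothetical.

```cpp
#include <array>
#include <cstddef>

// strength[i] is the firing strength (0..1) of the i-th quality rule,
// ordered from the lowest to the highest quality assessment.
// Returns a quality value on a 0..100 scale (0 when no rule fires).
double baseQuality(const std::array<double, 9>& strength) {
    static const std::array<double, 9> centroid{
        0.0, 13.0, 25.0, 38.0, 50.0, 63.0, 75.0, 88.0, 100.0};
    double weighted = 0.0;
    double total = 0.0;
    for (std::size_t i = 0; i < strength.size(); ++i) {
        weighted += strength[i] * centroid[i];
        total += strength[i];
    }
    return total > 0.0 ? weighted / total : 0.0;
}
```

Dividing the result by 100 maps it onto the 0.0 to 1.0 interval used for the quality value elsewhere in the text.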
  • the final quality assessments from the BaseQual Fuzzy Logic analysis control the length of the final sequence 109. Where high quality sequence data is desired, the quality assessments may limit the read length of the final sequence. Where longer read lengths are desired, and lower quality sequence is acceptable, the quality assessments can aid in correlating the resulting sequence data from other sequence analysis. For example, when overlapping sequences are obtained, the quality assessments can determine which base calls are more reliable. Similarly, when both strands of a DNA sequence are available, the quality assessments aid in identifying higher probability base calls.
  • the invented Base Calling Software can be implemented on standard desktop computers, such as Pentium- and 486-containing PC's. Computers with less powerful processors are also suitable, although the overall processing time for each input data set will be slower. Such computers will preferably include at least a central processing unit, dynamic memory and a device for outputting processed information.
  • the invented base calling software can be stored on any suitable storage media, including computer diskettes, removable media, hard-drives, CD's, magnetic tapes and similar electronic storage means.
  • CONCLUSION = ConstructCFuzzySet( CONC_SZ, conc_x, conc_y, NOHEDGE );
  • C1 = ConstructCFuzzySet( CONCL_SZ, conc1_x, concn_y, NOHEDGE );
  • C2 = ConstructCFuzzySet( CONCL_SZ, conc2_x, concn_y, NOHEDGE );
  • C3 = ConstructCFuzzySet( CONCL_SZ, conc3_x, concn_y, NOHEDGE );
  • C4 = ConstructCFuzzySet( CONCL_SZ, conc4_x, concn_y, NOHEDGE );
  • C5 = ConstructCFuzzySet( CONCL_SZ, conc5_x, concn_y, NOHEDGE );
  • C6 = ConstructCFuzzySet( CONCL_SZ, conc6_x, concn_y, NOHEDGE );
  • C7 = ConstructCFuzzySet( CONCL_SZ, conc7_x, concn_y, NOHEDGE );
  • C8 = ConstructCFuzzySet( CONCL_SZ, conc8_x, concn_y, NOHEDGE );
  • _4bad = AND(NOT(okbuz),AND(NOT(normw),AND(NOT(goods),NOT(okgp)))); C4->scale(C4,AND(OR(smalh,modrh),OR(presx, OR(_3bad,_4bad))));
  • ⁇ rv float(numer/3.0*denom);
  • R_NORM = ConstructCFuzzySet( NORMSZ, norm_x, norm_y, NOHEDGE );
  • R_SPLIT = ConstructCFuzzySet( SPLITSZ, split_x, split_y, NOHEDGE );
  • argstr is a [:] delimited set of 7 f ⁇ elds ⁇ n"); fprintf(stderr," fld 1 : sp(n- 1 ) ⁇ n"); fprintf(stderr," fld2: sp(n) ⁇ n”); fprintf(stderr,” fld3: gp(n-l ) ⁇ n”); fprintf(stderr,” fld4: gp(n) ⁇ n”); fprintf(stderr,” fld5: Exp[sp] ⁇
  • # include ⁇ ab/wsc.h> #else
  • # include ⁇ malloc.h> #endif
  • ⁇ xint (x2*yl - xl *y2 - x2*y3 + xl*y4) / den;
  • *yint (y 1 *y4 - y2*y3) / den; ⁇ static void cj(struct CFuzzySet* pcfz,struct CFuzzySet const* s, LOGICAL_OPERATOR cjdj) ⁇ int nn. vertexO, vertex 1 , use func O; double *xn, + yn, xl.
  • FreeMemory( pcfz->x ); #else free( pcfz->x ); #endif pcfz->x NULL;
  • next xl rightmost x
  • next_y 1 h2(s->yf vertex 1 - 1 ])
  • next_xO rightmost x
  • 0.0 s.yy2) return 0.0; else return s.xy2/::sqrt(s.xx2*s.yy2); ⁇
  • ConstructCFuzzySet (int npts,double const* xpts,double const* ypts,HEDGE ht); void DestructCFuzzySet(CFuzzySet* pcfz);
  • BandStatArray const& operator ( BandStatArray const& rhs ); -BandStatArrayO; void init( int len ); void ntnr( int idx, int v ) ⁇ bandj idx ].ntnr(v); ⁇ void posn( int idx, int v ) ⁇ band J idx ].posn(v); ⁇ void bbgn( int idx, int v ) ⁇ bandj idx ].bbgn(v); ⁇ void bend( int idx, int v ) ⁇ bandj idx ].bend(v); ⁇ void insr( int idx, int v ) ⁇ bandj idx ].insr(v); ⁇ void hght( int idx, float v )
  • bbgn() ⁇ int bend( int idx ) const ⁇ return band_[idx].bend(); ⁇ int insr( int idx ) const ⁇ return band [idx]. insr(); ⁇ float hght( int idx ) const ⁇ return band Jidx].hght(); ⁇ float lowv( int idx ) const ⁇ return band Jidx]. lowv(); ⁇ float xbnd( int idx ) const ⁇ return band Jidx].
  • xbnd() ⁇ float shap( int idx ) const ⁇ return band [idx].shap(); ⁇ float buzz( int idx ) const ⁇ return bandjidx].buzz(); ⁇ float widt( int idx ) const ⁇ return band Jidx], widt(); ⁇ float lgap( int idx ) const ⁇ return band_[idx].lgap(); ⁇ float sgap( int idx ) const ⁇ return band Jidx].
  • DLLexport void spline (float const x[], float const y[], int n, float ypl, float ypn. float y2[]); DLLexport void splint(float const xa[],float const ya[],float const ya2[],int n, float x, float *y); DLLexport void dfourl (double data[], unsigned long nn, int isign); #endif
  • ⁇ nsj ⁇ status_ STS_NO_MEM; release();
  • PKDET const& operator (PKDET const& rhs);
  • !y2) ⁇ status_ STS_NO_MEM; return 0;
  • *prr - float(ss_due_reg/ss_about_mu); ::free_matrix(m,l,2,l,2);
  • *prr float(ss_due_reg/ss_about_mu); : :free_matrix(m, 1 ,2, 1 ,2); : :free_matrix(v, 1 ,2, 1 , 1 ); return 1; ⁇ int iquadratic( int const* px, int const* py, int N, float* quad)
  • patrnj dx] ordered [pdx]; bugout: delete [] ordered; return rv; ⁇ double
  • RatioPattern :chm(int rn. int cn)
  • RatioBin :release_()
  • RatioBin::operator ( RatioBin const& rhs )
  • RatioBin :- k -RatioBin() ⁇ reiease_();
  • CHM_[idx][jdx] float( ⁇ _.chm(idx,jdx));
  • RatioBin( int nsmpl 4 ); RatioBin( RatioBin const& rhs );
  • RatioBin const& operator ( RatioBin const-fe rhs);
  • int classify( double const* smpls, int scanline ); int analyze(); double sst(short row, short col) const ⁇ return SST_[row][col]; ⁇ void debug(unsigned lvl 0) const; private: void release_(); int nfluor_; float** CHM_; float** SST_;
  • QualCtrl :QualCtrl(QualCtrl const& rhs) : tbgnJO), tendJO), nseg JO), skipseqJNULL)
  • RdrOut :join( SegRead const& sr ) ⁇ int IhsEndO, rhsBgnO.oscnl.nscnl. lShortFall.rShortFall;
  • nv sr.wvfm()->sc la( nscnl, lane ); wvfm().sc a_set( oscnl.lane, osf*ov + nsfnv );

Abstract

The present invention includes a method and apparatus for the detection and analysis of information-containing signals in chromatographic data using iterative blind deconvolution and fuzzy logic algorithms. The invented method analyzes chromatographic data from a wide variety of sources of DNA sequencing information, including gel and capillary electrophoresis. Autoradiograms, single-fluor, four-lane and four fluor, single lane fluorescent chromatographic data are suitable sources of unprocessed input data. The output from the invented base calling method includes called (identified) sequence data and quality values for the called bases.

Description

METHOD AND APPARATUS FOR ANALYSIS OF CHROMATOGRAPHIC MIGRATION PATTERNS
I. Background of the Invention
A. Field of the Invention.
This invention relates to the field of signal detection and analysis of chromatographic migration patterns as commonly applied to mixtures of molecules. More specifically, this invention relates to a method and apparatus for signal detection and analysis of chromatographic migration patterns as applied to the determination of DNA sequences.
B. Description of Related Art.
The ability to efficiently and accurately detect and analyze information-containing signals in chromatographic data is important for handling large amounts of data. Such an ability is particularly important for projects such as the Human Genome Project, where large amounts of information will be generated which must be analyzed and integrated to produce a representative sequence of an entire human genome. To expedite the analysis of DNA sequence information, numerous methods have been developed. For example, a U.S. patent to Clark Tibbetts (No. 5,365,455) discloses a method for the automated processing of DNA sequence data. This patent is incorporated by reference herein in its entirety. The Tibbetts' method derives information from informative variables obtained from the input data set.
Such informative variables may include the relative intensities between adjacent signals, the relative signal spacing and pattern recognition factors.
The Tibbetts' method is limited, however, by the quality of the chromatographic data. Tibbetts' method relies to a certain extent on the reproducibility of chromatographic data to train the base identification ("calling") system. The apparatus generating the chromatographic data, therefore, needs to be consistent from run to run to avoid retraining the algorithm. Because chromatographic data frequently contain background noise and migration aberrations which obscure information-containing signals, analyses based on signal spacing may produce errors in signal identification. Similarly, because signal intensity often varies in an unpredictable manner, signal identification based on intensity may also result in significant identification errors. A U.S. patent of Thomas Stockham and Jeff Ives (No. 5,273,632) discloses an alternate method for base identification using blind deconvolution ("BD"). This patent is incorporated by reference herein in its entirety. The method of Stockham and Ives uses blind deconvolution to deblur information-containing signals in chromatographic data. This method, however, is significantly limited in the following manner. First, it relies on data derived from scanned autoradiogram image data. Second, the method requires user input of the BD filter bandwidth and programmer alterations to various thresholds. Third, the Stockham and Ives method does not adequately deal with lane to lane mobility differences. Fourth, the insertion/deletion and correction logic was too simple. Fifth, the putative peak detection was based on thresholds, and therefore could miss band detections when band amplitudes dropped below the threshold. Sixth, the method of Stockham and Ives lacked the ability to align and merge adjacent sample segments. Finally, that method lacked band quality measures useful in automatic data routing and/or sequence assembly.
II. Summary and Objects of the Invention
The present invention includes a method and apparatus for the detection and analysis of information-containing signals in chromatographic data. The invention also includes a method and apparatus for detecting and sharpening signal peaks in chromatographic data. It is an advantage of the present invention that chromatographic data from a wide variety of separation processes can be analyzed. Such separation processes include, but are not limited to, gel and capillary electrophoresis.
The present invention includes the steps of preprocessing signal data, reading successive sample segments, selecting blocks of high quality sequence and then producing traces of aligned high quality sequences. It is an advantage of the present invention that the chromatographic data may include single fluor samples fractionated in multiple lanes and multiple fluor samples fractionated in single lanes.
It is an object of the present invention to provide a method for preprocessing chromatographic data by baseline subtracting background noise. It is an advantage of the present invention that the method of baseline subtraction may be varied according to the type of chromatographic data being analyzed. It is a further advantage of the invention that sparse chromatographic data may be interpolated during preprocessing.
It is an object of the present invention to read the preprocessed signals in successive sample segments. It is an advantage that the sample segment size may be sufficiently large to provide for rapid and efficient signal analysis.
It is an object of the invention to provide a method and apparatus for detecting information-containing signals which are not uniformly distributed in the chromatographic data. This analytic technique uses iterative blind deconvolution to determine band frequency in sample segments. It is an advantage of the invention that the filter-band width is automatically varied during iteration to optimally detect the signals in the preprocessed chromatographic data. It is a further function of the invention to detect and correct signal data derived from chromatographic data which have segments which are short in one or more signal types (for example, "band-lite" signals).
It is an object of the present invention to provide a method and apparatus to detect and correct for mobility differences. It is a feature of the invention that mobility differences are corrected using a Monte Carlo alignment rather than using band position or spacing information. It is an advantage of the present invention that the Monte Carlo alignment is an iterative process to optimize signal alignment.
It is an object of the invention to enhance band detection using fuzzy logic. It is a feature of the invention that band detection is performed using fuzzy logic blocks, each block providing a particular method of data analysis. It is an object of the invention that each fuzzy logic block may be optimized for a particular analytic function.
It is an object of the present invention that the invention may optionally provide a quality measure for each signal. It is a feature of the invention that the quality measure can be utilized during subsequent alignment steps. It is an advantage of the invention that the quality measure can provide left and right cutoff points to limit subsequent analysis to data above a given quality measure.
These and other objects, features and advantages of the invention will be clear to a person of ordinary skill in the art upon reading this specification in light of the appended drawings.
III. Brief Description of the Drawings
Figure 1 depicts a flow chart for the invented base calling method.
Figure 2 depicts a flow chart of the preprocessing step of Figure 1.
Figure 3 depicts a flow chart of the base reading step of Figure 1.
Figure 4 depicts a flow chart of the extra-normalization step of Figure 3.
Figure 5 depicts a flow chart of the peak detection and refinement step of Figure 3.
Figure 6 depicts a flow chart of the OmitOkN fuzzy logic block of Figure 5.
Figure 7 depicts a flow chart of the OKSpMembership fuzzy logic block of Figure 6.
Figure 8 depicts a flow chart of the OmitOkN Bad Spacing Membership fuzzy logic block of Figure 6.
Figure 9 depicts a flow chart of the OmitOkN Cross Banding fuzzy logic block of Figure 6.
Figure 10 depicts a flow chart of the OmitOkN Height fuzzy logic block of Figure 6.
Figure 11 depicts a flow chart of the GapCheck fuzzy logic block of Figure 5.
Figure 12 depicts a flow chart of the GapCheck Gap Membership fuzzy logic block of Figure 11.
Figure 13 depicts a flow chart of the GapCheck Width Membership fuzzy logic block of Figure 11.
Figure 14 depicts a flow chart of the Monte Carlo Alignment function of Figure 4.
Figure 15 depicts a flow chart of the BaseQual fuzzy logic block of Figure 1.
Figure 16 depicts a flow chart of the BaseQual Height Membership fuzzy logic block of Figure 15.
Figure 17 depicts a flow chart of the BaseQual Cross Banding Membership fuzzy logic block of Figure 15.
Figure 18 depicts a flow chart of the BaseQual Width Membership fuzzy logic block of Figure 15.
Figure 19 depicts a flow chart of the BaseQual Shape Membership fuzzy logic block of Figure 15.
Figure 20 depicts a flow chart of the BaseQual Baseline Buzz Membership fuzzy logic block of Figure 15.
Figure 21 depicts a flow chart of the BaseQual OK Spacing Membership fuzzy logic block of Figure 15.
Figure 22 depicts a flow chart of the Baseline Subtraction algorithm of Figure 2.
Figure 23 depicts a flow chart of the Pre-Processing Begin/End Detection of Figure 1.
IV. Detailed Description of the Preferred Embodiment
The present invention provides a method and apparatus for detecting and analyzing information-containing signals in chromatographic data. In the preferred embodiment, the invention analyzes chromatographic data from DNA sequence analysis machines employing various and sundry imaging techniques, including autoradiograms, four lane-single fluor, and single lane-four fluor data. The invention further includes general and dedicated apparatuses for performing the invented method. Finally, the invention also includes a kit comprising one or more of the following components in combination with the invented method: a DNA sequence apparatus, signal detection apparatus, information storage devices for preserving chromatographic data before, during and after analysis, and output devices for displaying the analyzed sequence information.
For DNA sequence analysis, the invented method takes as input the output from a DNA sequencing apparatus and returns the called sequence, aligned traces, and band metrics for each called base. After each sample segment is read, its called sequence, aligned traces, and band metrics are joined to previous read segments. After an entire ladder has been read, a final step analyzes each called base's metrics and assigns a quality value. The quality values are used to identify the largest block of high quality sequence and establish left and right cutoff values. If a "preamble" sequence is available, the base calling software will attempt to locate the preamble in the called sequence and set the left cutoff value beyond it. Such preamble sequences may include primer sequences or known sequences which are to be excluded from the collected data. This latter step improves the chance that the sequence called by this software would merge with the least amount of human intervention.
The following sections provide a detailed description of each function of the invented method. The illustrative embodiments of the invention exemplify the application of the useful characteristics discussed below, and further reference to these and other useful and novel features is made in the following discussion of each illustrative embodiment. These exemplary embodiments are not intended to limit the scope of the method or of the apparatus needed for performing the invented method.
Referring to Figure 1, the invented base calling software first performs a preprocessing step 102 on the input data set 101. Preprocessing can include spectral separation, background subtraction and interpolation of input data set 101. The preprocessed data set 103 then enters Steps 104-106, which read successive sample segments of the preprocessed data. The sample segments 104 may be any suitable size which provides efficient signal analysis. In the most preferred embodiment of the invention, the first segment is 2048 scanline samples. Subsequent segments are also 2048 samples, with 148 samples overlapping the previous segment. The following description is based on the most preferred sample segment size, although the scope of the invention is not intended to be limited to that segment size.
Each sample segment 104 is first analyzed to estimate the coarse band spacing. Subsequently, the segment 104 is analyzed at second time 106 to refine the predicted band spacing. The band spacing drives the selection of the reconstruction filter employed during blind deconvolution. Band spacing and filter band width are inversely related. Once a sample segment of 2048 scanlines is read twice (a refined sample segment) and its band spacing measured and normalized for that 2048 scanline segment, the next sample segment of 2048 samples is read. The next sample segment overlaps the previous segment by 148 scanlines (or about 15 nucleotide bases) to establish the frame and relative positioning of adjacent segments. Subsequent segments 104 are similarly processed until the final sample segment 104 is reached. If fewer than 2048 scanlines are available in the original data set, then pseudo-random noise is generated to fill the sample segment to the required 2048 samples. Pseudo-random noise is preferred because sources of non-random noise will cause improper processing during the blind deconvolution and alignment steps.
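The segmenting scheme described above can be illustrated with a minimal sketch. The function name, the noise amplitude (1% of the largest sample) and the fixed seed are assumptions made for illustration only; the description specifies only the 2048-sample segment size, the 148-sample overlap and the use of pseudo-random noise as fill.

```python
import random

SEGMENT_LEN = 2048   # preferred segment size given in the description
OVERLAP = 148        # scanlines shared with the previous segment

def split_into_segments(scanlines, seg_len=SEGMENT_LEN, overlap=OVERLAP, seed=0):
    """Return (start_index, segment) pairs; short segments are noise-padded."""
    rng = random.Random(seed)
    # Hypothetical noise level: 1% of the largest sample (not from the source).
    noise_amp = 0.01 * (max(scanlines) if scanlines else 1.0)
    step = seg_len - overlap
    segments = []
    start = 0
    while True:
        seg = list(scanlines[start:start + seg_len])
        while len(seg) < seg_len:            # fill with pseudo-random noise
            seg.append(rng.uniform(0.0, noise_amp))
        segments.append((start, seg))
        if start + seg_len >= len(scanlines):
            break
        start += step
    return segments
```

With 5000 input scanlines this sketch yields segments starting at scanlines 0, 1900 and 3800, the last one padded with noise.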
Once all sample segments have been processed (read twice, normalized and the segments aligned), the processed and aligned data is analyzed in three fuzzy logic blocks. Fuzzy logic allows multivalued logic to enhance peak detection. By using fuzzy logic, a gap is "somewhat big," a band is "not so tall." Fuzzy logic also provides logic operators (for example AND, OR, NOT). Each fuzzy logic block in the base calling method provides a particular analysis of its data. The logic blocks operate on normalized input data and essentially classify each band based on absolute and relative criteria which are based on the band's neighboring bands. For example, fuzzy logic block 108 analyzes each base and its upstream context and assigns a quality value to each called band (base identity). Following the assignment of quality values, fuzzy logic block 108 also identifies the largest block of high quality data in the processed and aligned data 107. The boundaries of this high quality data block are recorded as the left and right cutoff points. The output data set 109 includes the finished traces, the called bases with their assigned quality values and the suggested left and right cutoff points. Output data set 109 can optionally be visually enhanced to normalize all bands to about the same band amplitude and to remove the saw tooth appearance of the non-visually enhanced traces.
A. Preprocessing
Referring to Figure 2, input data set 201 is a 2048x4 trace matrix. The first step is to establish Begin and End points by analyzing input data set 201 for the first scanlines containing above-background signal and for the last scanlines containing such signal. (See trace 202a as an example.) Large signal spikes, due to artifacts such as primer peaks, are excluded.
Referring to Figure 23, the Preprocessing Begin/End subroutine identifies the Begin and End points based on signal amplitude. The Begin and End points define the usable signal for subsequent operations. Usable signal typically begins just left of the largest left-most signal amplitude (the so-called primer peak), and continues until either the end of the sample segment data or another region of large signal amplitude is encountered. (The latter peak is typically called a biostreptation peak.) More specifically, steps 2302 through 2305 identify the putative start and end points by breaking the sample segment into zones and determining the maximum signal amplitude in each zone. Step 2306 determines whether a second primer peak is present. If the second peak is present, Step 2306 sets the Begin point at the second primer peak. Steps 2307 and 2308 make final adjustments to the Begin and End points, setting the Begin point to the first sample with amplitude below the mean of the first half of the signal, and setting the End point back 350 samples from the end. Referring to Figure 2, the baseline of the Preprocessed Begin/End data 202 is then determined at Step 203. A single baseline is established for each fluor of the Preprocessed Begin/End data 202. The baseline is subtracted from Preprocessed Begin/End data 202 to generate a baseline subtracted data set 203. For example, after baseline subtraction, the localized data set 207 becomes baseline subtracted data set 208. In a more preferred embodiment of the invention, a single baseline is established based on data from all lanes. This best determines the baseline beneath a run of poorly resolved bases in one lane. Currently available DNA sequence data precludes this embodiment because no two fluors reliably share a common baseline. Referring to Figure 22, the baseline can be established by estimating the baseline of the Preprocessed Begin/End 2201.
In the working embodiment, each trace lane is processed twice using a rising exponential threshold. One pass is made from left to right (establishing one baseline) (Step 2202), and the next pass is from right to left (establishing another baseline) (Step 2203). By taking the geometric mean of the two baseline approximations a fairly natural subtrahend is produced. (See sample traces 2205 and 2206.)
To establish a baseline approximation using a rising exponential threshold, a threshold is initially set to the lowest point found within the first 10 samples. As each successive sample is considered, the threshold is incremented by an exponential which slowly ramps upward. When a subthreshold sample is encountered, the baseline between the previous subthreshold point and the current point is taken to be a line segment between the points. The threshold is reset to the new subthreshold sample value and the process continues. If, after 100 samples no subthreshold point has been found, a 100 point segment of the baseline is computed (again, piecewise linear), and the rate of rise of the exponential is increased. The exponential is calculated to rise by 1/3 the amplitude of the most recent subthreshold point over a span of 75 samples.
Following baseline subtraction, baseline subtracted data set 203 is preferably spectrally or leakage separated. This step markedly improves the quality of capillary electrophoresis data. For slab gel data with a signal to noise ratio of 2.0 or less, separation step 204 significantly improves data quality, such that unreadable data can become readable. The separation step 204 is preferably performed during preprocessing without user input. For four fluor-single lane data, the baseline-subtracted data set 203 is spectrally separated. For single fluor-four lane data, data set 203 is leakage separated. For either separation, the separation algorithm 204 builds a characteristic matrix (CHM) which is used to perform the separation. For spectral separation, the characteristic matrix captures the spectral cross-talk ratios in four fluor data. For leakage separation, the characteristic matrix is generated from the ratio of leakage from the signal in the center of the lane in question to the signal in adjacent lanes. For capillary electrophoresis data, the ratios are measured at "peak center." For slab gel data, all data points are used to generate the characteristic matrix.
A separation matrix is calculated according to the formula SST = inv(CHM*CHM^T)^-1, where the columns of CHM hold the ratios for each respective lane. The ratios are normalized so that the largest element in each column has a value of one. The result of separation 204 is a separated data set 204.
Processing steps 104-106 are optimally performed on preprocessed data 206 containing at least 8 scanlines (samples) per band. To increase the number of scanlines per band, a baseline subtracted data set 203 or a separated data set 204 may optionally be enhanced to double or triple the number of samples using cubic spline interpolation 205.
B. Reading
Referring to Figure 3, the exemplified reading step analyzes sample segments 301 of 2048 scanlines. Each sample segment 301 first undergoes blind deconvolution 302 to cancel the effects of an unknown Lorentzian blurring function and to normalize the amplitudes of the traces. Blind deconvolution is described in the U.S. Patent to T.G. Stockham and J.T. Ives (No. 5,273,632), which is incorporated by reference herein. The presently invented method includes the following improvements over the method of Stockham and Ives. The first 2048 samples are blind deconvolved with an initial narrow-band guess for the filter band width ("FBW") value. The narrow-band guess is made so that the initial reading does not overestimate the band density along the sample segment. Given the resulting conservative estimate of the band density, a subsequent, more apt FBW is chosen and the segment is reread using it. The FBW chosen for the second read of each segment also serves as the FBW used for the first read of the following segment. This iterative approach to determining the best FBW has proven invaluable in practice; the band densities may vary from about 6 samples/band to about 50 samples/band, and do so not only within a given ladder but also from sequencing run to sequencing run. In the preferred embodiment of the invention, the method is adaptable to a wide range of acceptable inputs. The invented method includes a means for selecting the filter band width (FBW) during blind deconvolution 302. In the working embodiment median band spacing is mapped to a filter band width value using the following equations:
K = 2 * sqrt(ln(0.23)/-0.5)
FBW = 0.5 + (K / med_band_spacing) * (2048/(2*pi))
The blind deconvolution step 302 deblurs the signal and normalizes its amplitude.
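The mapping from median band spacing to filter band width can be sketched as follows. The operator between K and med_band_spacing is not legible in the source; division is assumed here because the description states that band spacing and filter band width are inversely related. The function name is hypothetical.

```python
import math

# Constant from the description: K = 2 * sqrt(ln(0.23) / -0.5)
K = 2.0 * math.sqrt(math.log(0.23) / -0.5)

def filter_band_width(med_band_spacing, n_samples=2048):
    """Map a median band spacing (samples/band) to an FBW value.

    Assumes K is divided by the spacing, so wider spacing gives a
    narrower filter band width.
    """
    return 0.5 + (K / med_band_spacing) * (n_samples / (2.0 * math.pi))
```

Under this reading, a dense ladder of about 6 samples/band maps to a much wider filter band than a sparse ladder of about 50 samples/band.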
Following blind deconvolution, an extra-normalization function 303 adjusts band spacing due to mobility differences in the samples. Extra-normalization 303 also corrects for the tendency of blind deconvolution to create spurious bands, especially in regions of mono-, di- or tri- nucleotide repeats where one or more lanes are band-lite for extended regions. Referring to Figure 4, extra-normalization 303 corrects two types of artifacts created by blind deconvolution. Path 406-410 cancels artifacts created in band-lite lanes. Briefly, the blindly deconvolved data set 406 is scanned for band-lite lanes by comparing the relative lane signal strengths and the relative lane band frequencies. The proxy used for the lane signal strength analysis is the 97th percentile signal amplitude found in each lane. The proxy used for the band frequency is the proportion of the signal over which the lane in question has the largest signal amplitude. If a lane has less than 15% of the total bands found in all four lanes in a sample segment, and if the band amplitudes are low relative to the other lanes, the amplitudes of the bands in that lane are attenuated. If the band amplitude is lowest, those amplitudes are attenuated by one-half. If the band amplitude is above the lowest, the amplitudes are attenuated to three-quarters of the original band amplitude. In contrast, in ideal sequence data, where A, G, C, and T are equal in frequency, each trace should dominate 25% of the time.
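A minimal sketch of the band-lite test follows. The 97th percentile proxy, the dominance proxy, the 15% limit and the one-half and three-quarters factors come from the description. The reading of "low relative to the other lanes" as "below the median lane strength" is an assumption, as are all function names.

```python
import statistics

def percentile(values, pct):
    """Nearest-rank percentile (pct is an integer from 1 to 100)."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100   # integer ceiling
    return ordered[max(rank, 1) - 1]

def band_lite_factors(lanes, min_share=0.15):
    """Return one attenuation factor per lane (1.0 means unchanged)."""
    n = len(lanes[0])
    strength = [percentile(lane, 97) for lane in lanes]   # signal strength proxy
    wins = [0] * len(lanes)
    for column in zip(*lanes):                # which lane dominates each sample
        wins[column.index(max(column))] += 1  # ties go to the first lane
    share = [w / n for w in wins]             # band frequency proxy
    weakest = min(strength)
    typical = statistics.median(strength)     # assumed meaning of "low"
    factors = []
    for s, sh in zip(strength, share):
        if sh < min_share and s < typical:
            factors.append(0.5 if s == weakest else 0.75)
        else:
            factors.append(1.0)
    return factors

def attenuate(lanes, factors):
    """Scale every sample of each lane by that lane's factor."""
    return [[v * f for v in lane] for lane, f in zip(lanes, factors)]
```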
Extra-normalization path 401-403 corrects for mobility differences between lanes and performs the actual band-lite attenuation 404. Briefly, the blind deconvolved data set 401 is analyzed to identify any regions with inordinately large or coincident bands. These regions are set to zero (base-line). If these regions were not set to zero the Monte Carlo alignment algorithm 403 would produce an aberrant alignment which focused on separating them.
Referring to Figure 14, mobility shifts are most noticeable in the autoradiogram and single fluor-four lane data particularly near the edges of the gel. The prior method of Stockham and Ives described an algorithm which attempted to align the lanes by driving the band spacing to as nearly a uniform value as possible. This approach was limited because, without proper alignment, many true bands would go undetected because they were shadowed by other bands. The algorithm attempted to normalize spacing between detected bands, yet the algorithm knew of only a simple majority of the bands present in the data. The present invention uses an algorithm which does not use band position or spacing information. Instead, the present invention seeks to maximize the integral of the "envelope" of all four lanes of data when they share a common baseline. Alignment is accomplished using a Monte Carlo search of a 3D space, where the x-axis defines the relation between the A and G lanes, the y-axis defines the relation between the AG relation and the C lane, and the z-axis defines the relation between the AGC relation and the T lane. An initial set of possible alignments are chosen, each triple is applied to the traces to be aligned, and the integral of the resulting envelope is calculated. A subset of the triples, those yielding the largest integrals, are then refined. The triple which yields the lowest integral is removed from the set under consideration. It is replaced by a triple which results from a random alteration of the triple which yields the largest integral. When either a maximum number of iterations has occurred or the variation within the set of high integral triples has reached a suitably low value, the highest yielding triple is chosen as the alignment vector for the segment under consideration. 
More specifically, the search is conducted in a three dimensional space, where the x-axis specifies the offset between trace1 and trace2, the y-axis specifies the offset between the trace1,2 registry and trace3, and the z-axis specifies the offset between the trace1,2,3 registry and trace4 (See illustration 1401). The algorithm employed was originally described by W.L. Price in The Computer Journal, Vol. 20, No. 4, which is incorporated by reference herein.
Initially, a set of putative alignment solutions 1401 is generated. The addresses of the lattice points of 6 concentric cubes centered about a point in the space are used as the initial alignment solutions. The first time the procedure is used the central point of the concentric cube lattice is the origin (x0=0, y0=0, z0=0). Subsequent calls can either continue to center the lattice on the origin, or they can bias the search by centering the lattice on the previous alignment solution (xn-1, yn-1, zn-1).
Each alignment guess is converted into a shift vector of four values, wherein one value is 0. Each trace in the matrix is shifted by the amount specified in the shift vector, the envelope of the shifted traces is obtained (the maximum value of the four trace values found at each position along the traces), and is summed. The sum represents the integral of the envelope produced by the alignment guess. A low integral value represents a poor alignment (see, e.g. illustration 1402, where the bands are aligned behind others, not arranged "shoulder to shoulder"), whereas a high integral value corresponds to a good alignment (see, e.g. illustration 1407, where all bands are fully exposed, arranged "shoulder to shoulder" ).
Once all alignment guesses have been evaluated, the worst alignment solution is replaced by a small, random perturbation of the best alignment solution 1405. The new alignment solution is evaluated, and the process repeats, replacing the new worst alignment with a perturbation of the new best alignment. Eventually, the set of points in the 3D space converge about the best alignment solution 1406.
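The search loop described above can be sketched as follows. Several details are assumptions made for illustration: trace 1 is held fixed and the triple (x, y, z) is applied directly as the shifts of traces 2 to 4, the initial lattice is taken to be the center plus the eight corners of each of the six cubes, and convergence is declared when all surviving solutions score equally. This is a simplified stand-in for, not a reproduction of, the procedure of Price.

```python
import itertools
import random

def envelope_integral(traces, shifts):
    """Sum, over all positions, of the tallest shifted trace value."""
    n = len(traces[0])
    total = 0.0
    for i in range(n):
        tallest = 0.0
        for trace, shift in zip(traces, shifts):
            j = i - shift                 # sample that lands on position i
            if 0 <= j < n:
                tallest = max(tallest, trace[j])
        total += tallest
    return total

def to_shift_vector(triple):
    # Assumption: trace 1 has shift 0; x, y, z shift traces 2, 3 and 4.
    x, y, z = triple
    return (0, x, y, z)

def initial_lattice(center, n_cubes=6):
    # Assumption: the center plus the 8 corners of each concentric cube.
    cx, cy, cz = center
    points = [tuple(center)]
    for h in range(1, n_cubes + 1):
        for dx, dy, dz in itertools.product((-h, h), repeat=3):
            points.append((cx + dx, cy + dy, cz + dz))
    return points

def align(traces, center=(0, 0, 0), max_iter=300, step=2, seed=0):
    """Return the best (x, y, z) triple found by the random search."""
    rng = random.Random(seed)
    score = lambda t: envelope_integral(traces, to_shift_vector(t))
    pool = {t: score(t) for t in initial_lattice(center)}
    for _ in range(max_iter):
        best = max(pool, key=pool.get)
        worst = min(pool, key=pool.get)
        if pool[best] == pool[worst]:     # no variation left in the set
            break
        del pool[worst]                   # drop the worst solution
        candidate = tuple(c + rng.randint(-step, step) for c in best)
        if candidate not in pool:         # perturbed copy of the best
            pool[candidate] = score(candidate)
    return max(pool, key=pool.get)
```

Two coincident bands score 1.0 when left hidden behind each other and 2.0 when one is shifted by a sample, which is the "shoulder to shoulder" preference the envelope integral encodes.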
Referring to Figure 3, following extra-normalization 303, Step 304, peak detection and refinement, occurs. The aligned traces then undergo putative peak detection. Referring to Figure 5, putative peak detection 502 is performed on the blind deconvolved, extra-normalized data set 501 (unstopped, attenuated and with the relative mobilities corrected). A trace envelope is first determined. The Stockham and Ives Patent described detecting peaks in each trace separately with thresholds derived from the underlying data. In the invented method, the trace envelope is peak-detected and no thresholds are employed. A peak is liberally defined to be a sample which is taller than either of its two neighbors. Subsequent processing culls this liberally defined putative peak list. This form of peak detection is both faster (one trace instead of four) and less prone to error (no subthreshold peaks). In contrast, the Stockham and Ives Patent required individual trace peak detection because its alignment algorithm attempted to determine lane alignment using peak location information.
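Threshold-free peak detection on the trace envelope can be sketched in a few lines. The phrase "taller than either of its two neighbors" is read here as taller than both neighbors, which is an assumption; the function names are hypothetical.

```python
def trace_envelope(traces):
    """Maximum of the four trace values at each scanline."""
    return [max(column) for column in zip(*traces)]

def putative_peaks(envelope):
    """Indexes of samples taller than both immediate neighbors."""
    return [i for i in range(1, len(envelope) - 1)
            if envelope[i] > envelope[i - 1] and envelope[i] > envelope[i + 1]]

def dominant_lane(traces, index):
    """Lane (0-3) holding the largest value at a putative peak."""
    column = [trace[index] for trace in traces]
    return column.index(max(column))
```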
To identify errors in putative band detection 502, including insertion errors, each putative peak's instantaneous spacing, cross banding, height, and spacing to adjacent bands is measured (Step 503). These observed band spacing measurements are fit with a quadratic curve. This quadratic fit is used as the expectation of the band spacing along the entire read segment. This approach to defining the expected band spacing is sufficiently general to handle segments where, as in the Stockham and Ives Patent, the average spacing is an adequate expectation, as well as segments where the spacing changes radically. In the invented method, more information was found necessary to sufficiently identify insertions and regions of deletions, and as a result, the invented method can resolve a series of insertions and deletions.
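The quadratic expectation of band spacing can be obtained with an ordinary least-squares fit. The sketch below solves the 3x3 normal equations directly; the choice of band position as the abscissa and the function names are assumptions, and at least three distinct positions are required.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients (a, b, c) of y = a + b*x + c*x**2."""
    s = [sum(x ** k for x in xs) for k in range(5)]           # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):                  # Gauss-Jordan, partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def expected_spacing(coeffs, position):
    """Evaluate the fitted spacing curve at a scanline position."""
    a, b, c = coeffs
    return a + b * position + c * position * position
```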
The first of three fuzzy logic blocks 504, OmitOkN Fuzzy Logic, is then used to identify bands which are most likely insertion artifacts of the band detection process. This block classifies the detections as OK, AMBIGUOUS or OMIT. The putative bands given the OMIT classification are removed from the putative peak set. Referring to Figure 6, each band has several of its attributes 601 examined by this first logic block. If a band is where it ought to be with respect to either of its neighbors, then variable okSp is set "TRUE" (Step 602). Referring to Figure 7, the intent of the membership function for the OmitOkN Ok Spacing fuzzy logic block is to "accept" a spacing measurement which is an integer multiple of the expected spacing. Consequently, the observed spacing is normalized to a value on the interval [0..1] using its relationship to expected spacing (Step 702). In the example given in block 702, the normalized spacing of 0.3 is found to be OK with a truth value of 0.7 (Step 703 and Example 704). Given the vagaries of band migration, compressions, band shape (hence band peak position), and other factors, a peak spaced 17 from its neighbor when the expected spacing is 13 is neither ideal nor terrible.
Referring to Figure 8, for the OmitOkN Bad Spacing fuzzy logic block, the intent of the membership function is to "deprecate" a spacing measurement which is not an integer multiple of the expected spacing. Consequently, the observed spacing is normalized to interval [0..1] using its relationship to expected spacing (Step 802). In the example given in Steps 802-803 and Example 804, a normalized spacing of 0.3 is found to be BAD with a truth value of 0.5; this spacing is not as good as it could be.
If a band is not where it ought to be with respect to either of its neighbors then variable badSp is set "TRUE" (Step 603). If the amount of "cross banding" (i.e. the amount of competition by two bands for a particular region of the read segment) is high, then variable badXb is set "TRUE" (Step 604). Similarly, if there is negligible cross banding then variable neglXb is set "TRUE". Referring to Figure 9, cross banding designates the amount of competition for the scanlines underlying a detected band. Bands of a dubious nature have wide ranging cross band ratios due to their apex's proximity to the baseline. However, compressions and stops, with significant amplitudes, can have their cross banding measured. The cross banding membership function is best used in identifying OK or AMBIGUOUS bands. In the diagram provided (Example 901), the first complex has two bands vying for the same location, with the second largest band having one-half the amplitude of the largest. The cross banding ratio (Step 902) is the amplitude of the largest band divided by the amplitude of the next largest band, or in this case Xb=2.0. In the second complex, where one band is clearly the band of choice, this ratio approaches infinity. In the example given in Step 903, with a cross banding ratio of 1.5, the badXb membership is 0.25, while the negligibleXb membership is 1.0; in other words, while a ratio of 1.5 is found negligible, the band legitimacy will be questioned.
The band height is also categorized as either tiny or ok (Step 605). Referring to Figure 10, for the height membership functions the membership sets are best customized for the general signal quality one observes from the machine providing the data. In the working embodiment a function of the median value of amplitudes measured where bands intersect determines the height membership function break points. In particular, the tinyHt function breaks at 0.4*med_intersect_pt and is zero by 1.1*med_intersect_pt. Similarly, the okHt function comes off zero at 0.5*med_intersect_pt and flattens off at 1.0 at 1.5*med_intersect_pt. The blind deconvolution process normalizes band amplitudes to interval [0..1], with most bands having a height in excess of 0.1. This example given is typical in that it begins deprecating a band based on its height when the height falls below 0.07. In the example given in Step 1002, the measured band height is 0.1 and has membership in okHt of 1.0 and in tinyHt of 0.0. The band has, per this example of the sets, sufficient height.
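The two height membership functions can be sketched as piecewise-linear ramps using the break points stated above. Linear interpolation between the break points is an assumption, as are the function names. The example value med_intersect_pt = 0.07/1.5 is inferred from the statement that deprecation begins below a height of 0.07.

```python
def ramp_up(x, lo, hi):
    """0.0 at or below lo, 1.0 at or above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def tiny_ht(height, med_intersect_pt):
    """Full membership up to 0.4*med, falling to zero by 1.1*med."""
    return 1.0 - ramp_up(height, 0.4 * med_intersect_pt, 1.1 * med_intersect_pt)

def ok_ht(height, med_intersect_pt):
    """Comes off zero at 0.5*med and reaches 1.0 at 1.5*med."""
    return ramp_up(height, 0.5 * med_intersect_pt, 1.5 * med_intersect_pt)
```

With med_intersect_pt = 0.07/1.5, a band of height 0.1 has okHt membership 1.0 and tinyHt membership 0.0, matching the example of Step 1002.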
These six variables then serve as input to fuzzy combinational logic. A significant advantage of fuzzy logic is that it works with and can resolve contradictions among rules involving these variables. Bands classified as OK are those with negligible cross banding and either OK height or OK spacing (Step 606). Bands classified as ambiguous exhibit bad cross banding and either OK height or OK spacing and little height (Step 607). Ambiguous bands are typically those where the band is correctly positioned with sufficient amplitude but significant cross banding (Step 607). Bands classified as clear insertions, and therefore to be omitted, are characterized by negligible height (Step 608). Cross banding is not considered when deciding whether a band should be omitted because usually insertions are made very close to the baseline where cross banding measurements are unreliable.
The strength of the rule firings is then used to scale the output sets (Step 609). In the example given (illustration 610), the output set OK is scaled with amplitude 1.0, output set N (ambiguous) is scaled with 0.25, and set OMIT is scaled with 0.0. Defuzzification, or obtaining a crisp (conclusion) value from the output rule sets, is achieved by calculating the centroid of the resultant "masses". In the example, the conclusion reached is that the band is OK (Step 611).
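The rule firing and centroid defuzzification can be sketched as follows. The min/max operators for AND/OR are the conventional fuzzy choices and are assumed here; the AMBIGUOUS rule is simplified to "bad cross banding AND (OK height OR OK spacing)"; and the output sets are reduced to single points at hypothetical positions 0.0 (OMIT), 0.5 (N) and 1.0 (OK).

```python
def fuzzy_and(*values):
    return min(values)

def fuzzy_or(*values):
    return max(values)

# Hypothetical positions of the output sets on the conclusion axis.
OUTPUT_CENTERS = {"OMIT": 0.0, "N": 0.5, "OK": 1.0}

def omit_ok_n(ok_sp, negl_xb, bad_xb, ok_ht, tiny_ht):
    """Return (label, centroid, rule strengths) for one putative band."""
    strengths = {
        "OK": fuzzy_and(negl_xb, fuzzy_or(ok_ht, ok_sp)),
        "N": fuzzy_and(bad_xb, fuzzy_or(ok_ht, ok_sp)),
        "OMIT": tiny_ht,
    }
    total = sum(strengths.values())
    if total == 0.0:                       # no rule fired
        return "N", None, strengths
    centroid = sum(strengths[k] * OUTPUT_CENTERS[k] for k in strengths) / total
    label = min(OUTPUT_CENTERS, key=lambda k: abs(OUTPUT_CENTERS[k] - centroid))
    return label, centroid, strengths
```

Feeding in the membership values from the running examples (okSp 0.7, neglXb 1.0, badXb 0.25, okHt 1.0, tinyHt 0.0) gives rule strengths of 1.0, 0.25 and 0.0 for OK, N and OMIT, and a centroid of 0.9 under the assumed set positions, so the band is classified OK.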
Referring to Figure 5, following fuzzy logic block 504, each peak's instantaneous spacing, instantaneous band width, spacing to its left neighbor (left spacing), band width and called bases is remeasured (Step 505). These observed band spacings are fit with a quadratic curve which then serves as the expected spacing along the read segment. Similarly, the observed band width measurements are also fit with a quadratic curve which serves as the expected band width along the read segment.
The second fuzzy logic block 506, GapCheck Fuzzy Logic, then identifies bands, or gaps between bands, where one or more bands may need to be inserted to achieve the band spacing predicted by the quadratic fit. This block classifies the detections as NORMAL, SPLIT or SUFFERING FROM UPSTREAM TURBULENCE. The gaps are split and a suitable number of bands are inserted (Step 507). The bands given the SPLIT classification are split a suitable number of times, with the division points being the centroid of the interval to be split. The centroid is used to place the insertion on the shoulder of a poorly defined band, and not in the bottom of the trough between the SPLIT band and its left neighbor.
Depending upon the size interval, and the expected band spacing, one or more insertions may be made. Each insertion has a defined Begin, Middle and End scanline value.
Referring to Figure 11 for more detail, each band pair considered by fuzzy logic block GapCheck has several attributes which are measured (Step 1101). In particular, the expected spacing curve, expected width curve, band width, left gap (gap to the leftmost neighbor) and sequence is determined. The upstream sequence is assigned a measure of GC-richness (Step 1102). These measurements, coupled with the GC richness of the sequence of the 5 bands to the left, are informative in identifying bands which need additional bands added to their left. The gap is normalized with respect to the expected spacing onto interval [-1..inf] (Step 1103). Unlike the OmitOkN logic, where the logic determines if a band is located where it should be (independent of its absolute spacing but focusing instead on how far off the spacing curve it is), in the GapCheck logic block, the concern is on the absolute distance of the band from its left neighbor. If the gap is an integer multiple of the spacing curve (say three spaces from its left neighbor) two bands are inserted to its left to establish the proper spacing. In addition to the gap between bands in the pair, this logic also considers the widths of the bands. Band width is normalized onto interval [-1..inf] (Step 1104). Usually, when band resolution decreases and a region in the observed trace contains fewer peaks than are required, one or both bands in the pair is wider than it should be. The gap between the bands can be marginal and the band width can be the determining factor. Finally, large gaps and band widths should be viewed less aggressively in the presence of upstream GC-richness. The normalized left-gaps of each band in a band pair are classified as big (Step 1105), medium (Step 1106) or small (Step 1107). Figure 12 provides details of the GapCheck band gap membership function. Briefly, the membership function characterizes an observed gap measurement (ogp) if it differs from expectation (egp).
The gap is measured between Bn and Bn-1 (Step 1201). The observed gap is normalized to interval [-1..inf] with the equation: ngp = ogp/egp - 1.0. Referring to the example in Step 1203, a normalized gap of 0.1 is found to have 0.0 membership in all sets; that is, the gap meets expectations and is neither small, medium nor big (Step 1203).
Referring to Figure 11, in Step 1108 the normalized widths of each band in a band pair are classified as big. Figure 13 provides details of the GapCheck band width membership function. This membership function characterizes an observed band width measurement (owd) if it exceeds expectations (ewd). The width is measured between Bn's Begin and End points (Step 1301). In the example given in Step 1303, a normalized width of 0.2 is found to have membership in BigWidth of 0.2; the band is not that wide, but it is wider than expected (Step 1304). Referring to Figure 11, in RULE NORM (Step 1109), band n is not marked as needing its left gap split if any of the following are TRUE: a) there is a large gap (bigGapn) but the upstream sequence is GC-rich, or b) the gap to the first band in the pair is small (smlGapn-1) and the two bands are not wide (!bigWidn and !bigWidn-1) (i.e., ignore the gap between the bands), or c) the gap between the two bands is not large (!bigGapn). In Step 1110, RULE SPLIT marks band n as needing its left gap split if either of the following are TRUE: a) the gap between the two bands is large (bigGapn) and the first and/or second band is wide (bigWidn-1 or bigWidn), or b) the gap between the two bands is large (bigGapn) and the gap left of the first band is not small (!smlGapn-1) and the upstream sequence is not GC-rich (!gcrich).
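The gap normalization, together with the insertion count implied by "three spaces from its left neighbor, two bands are inserted", can be sketched as below. Rounding the gap to the nearest whole number of expected spacings is an assumption, as are the function names.

```python
def normalized_gap(ogp, egp):
    """ngp = ogp/egp - 1.0, on the interval [-1..inf]."""
    return ogp / egp - 1.0

def bands_to_insert(ogp, egp):
    """Bands needed to restore the expected spacing across a gap.

    Assumes the gap is rounded to the nearest whole number of spaces.
    """
    return max(0, round(ogp / egp) - 1)
```

A gap of 39 scanlines with an expected spacing of 13 spans three spaces, so two bands would be inserted; a gap of 11 against an expectation of 10 has a normalized gap of about 0.1 and needs none.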
RULE SPLIT (a) detects the combination of a wide and normal band (in either order) while RULE SPLIT (b) selects a run of wide bands separated by large gaps. The strength of the rule firings is then used to scale the output sets (Step 1111). In example 1112, the output set NORMAL is scaled with amplitude 1.0 and output set SPLIT is scaled with 0.25. The conclusion is formed by calculating the centroid of the resultant "masses". In example 1112, the conclusion reached is that the band is NORMAL.
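RULE NORM and RULE SPLIT translate directly into fuzzy expressions. In the sketch below AND, OR and NOT are taken as min, max and 1 - x (an assumption), and the NORMAL and SPLIT output sets are reduced to single points at hypothetical positions 0.0 and 1.0 for the centroid calculation.

```python
def f_and(*v):
    return min(v)

def f_or(*v):
    return max(v)

def f_not(v):
    return 1.0 - v

def gap_check_rules(big_gap_n, sml_gap_prev, big_wid_n, big_wid_prev, gc_rich):
    """Return the firing strengths (norm, split) for band n."""
    norm = f_or(
        f_and(big_gap_n, gc_rich),                                   # NORM a
        f_and(sml_gap_prev, f_not(big_wid_n), f_not(big_wid_prev)),  # NORM b
        f_not(big_gap_n),                                            # NORM c
    )
    split = f_or(
        f_and(big_gap_n, f_or(big_wid_prev, big_wid_n)),             # SPLIT a
        f_and(big_gap_n, f_not(sml_gap_prev), f_not(gc_rich)),       # SPLIT b
    )
    return norm, split

def gap_check_conclusion(norm, split):
    """Centroid of NORMAL (at 0.0) and SPLIT (at 1.0); both hypothetical."""
    total = norm + split
    if total == 0.0:
        return "NORMAL"
    return "SPLIT" if split / total > 0.5 else "NORMAL"
```

With the strengths of example 1112 (NORMAL 1.0, SPLIT 0.25) the centroid under these assumed positions is 0.2, and the conclusion is NORMAL.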
To identify putative peak insertion errors, Step 507 remeasures the cross banding, instantaneous spacing, band height, band amplitude, and the spacing (left and right gaps) to adjacent bands. The observed band spacing measurements are fit with a quadratic curve. This quadratic fit is used as the expectation of band spacing along the entire read segment. The OmitOkN Fuzzy logic block (Step 508) is then used to identify bands which are most likely insertion artifacts of the band detection process. Any and all such bands are removed from the putative peak set. Newly proposed insertions may be deleted in this step. The fuzzy logic band refinement stage adds the important advantage of reducing insertions and deletions and preventing arbitrary band calling when the reader encounters two or three base regions of signal dropout. See Figure 6 and the accompanying text for details of insertion detection. The set of putative peaks which survive this processing are recorded as the bands for the read segment under consideration (Step 509).
C. Processing and Alignment
Referring to Figure 1, reading function 104-106 consecutively processes sample segments until all of the input data set 101 is analyzed. Because each sample segment 104 overlaps the previous sample segment by a predetermined amount, the relative positioning of each read and aligned sample segment 106 is known. Step 107 assembles all of the read and aligned sample segments 106 to form a processed and reassembled sample segment 107.
D. Post-Processing Editing
In the working embodiment of the invention, a final process analyzes the set of measured band features with a third fuzzy logic block, BaseQual fuzzy logic block 109. This block assigns a quality measure to each called band. This block evaluates each band based on the band height, width, shape, left and right gap, cross-banding and baseline "buzz." This quality value, on the interval (0.0 to 1.0) can be used during subsequent sequence alignment/merging steps. The present invention uses the quality value to select the longest block of high quality sequence to be considered for alignment and merging with other sequences into a large DNA sequence. The algorithm that selects the left and right cutoff points generates a surface, with the x-axis labeled MOVING AVERAGE FILTER WIDTH, the y-axis labeled THRESHOLD, and the z-axis labeled READ LENGTH. The quality values are filtered with six moving average filters, and the filtered data is compared against nine thresholds. The longest contiguous block of above threshold filtered quality values provides the read length value for the surface for a particular filter width, threshold pair.
Finally, this surface is scaled so that narrow filter and high threshold read lengths are favored over wide filter and low threshold read lengths. The surface maximum z-value is then chosen as the read length, and the associated first and last above threshold filtered quality value indexes serve as the left and right cutoff points, respectively. If a "preamble" sequence was submitted to this EDIT stage, and if the sequence is found beyond the established left cutoff point, the left cutoff point is moved past the "preamble" sequence so that the preamble is excluded.
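The cutoff selection can be sketched as a search over a small grid of filter widths and thresholds. The six widths, the nine thresholds, the centered moving average and the scaling formula below are all hypothetical placeholders; the description states only how many filters and thresholds are used and that narrow filters and high thresholds are favored.

```python
WIDTHS = (3, 5, 9, 15, 21, 31)                       # hypothetical
THRESHOLDS = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)  # hypothetical

def moving_average(values, width):
    """Centered moving average; the window is clipped at the ends."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out

def longest_run(flags):
    """(length, first, last) of the longest run of True values."""
    best = (0, 0, -1)
    start = None
    for i, flag in enumerate(list(flags) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start > best[0]:
                best = (i - start, start, i - 1)
            start = None
    return best

def select_cutoffs(quality, widths=WIDTHS, thresholds=THRESHOLDS):
    """Return (left, right) cutoff indexes from per-base quality values."""
    best_score, best_cut = -1.0, (0, -1)
    for w in widths:
        smooth = moving_average(quality, w)
        for t in thresholds:
            length, first, last = longest_run([q >= t for q in smooth])
            # Hypothetical scaling: favor high thresholds and narrow filters.
            score = length * (1.0 + t) / (1.0 + 0.01 * w)
            if score > best_score:
                best_score, best_cut = score, (first, last)
    return best_cut
```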
Referring to Figure 15, the BaseQual fuzzy logic algorithm assesses the quality of the called bases. Experience has shown that some sequence assembly algorithms fail to assemble sequences containing regions of incorrect sequence and that others can only succeed when each base is accompanied by an indication of its quality (or inversely, its probability of error). In the former case, if the incorrect sequence regions are masked from consideration by the assembly program, the bulk of the good sequence will successfully assemble. In the latter case, if the low quality regions are identified, the overall base caller product will assemble. In either case, incorrect sequence, encountered in isolation, can be and usually is identified by an experienced technician using visual inspection. That process is time consuming and monotonous, and subtle errors may go undetected. In general, though, incorrect base calls occur where the underlying data traces are marginal.
The BaseQual routine automates quality assessment by measuring and analyzing multiple features (Step 1501) of each called base. Fuzzy logic is used to identify certain band presentation patterns and assign levels of quality to them. These band features include the band height, cross-banding, band width, band shape, the band's small gap and the band's large gap.
Band height variations are informative in many of the classifications. Six fuzzy variables are used to classify a band's height (tiny, small, moderate, normal, tall and collectively, OK) (Step 1502). Referring to Figure 16 for details of the BaseQual height membership functions, the membership function characterizes an observed band height measurement. A band with a "tiny" or "small" height is usually suspect, with the tiny bands being more suspect than the merely small. Moderate height bands, and tall bands, also require scrutiny. Tall bands are suspect because usually they are found amid stops, compressions, and, on slab gels, artifacts. In the example given in Step 1602, a band height of 0.18 is found to have membership in NormalHeight of 1.0, which is to say that the band's height is within tolerances (See Example 1603).
Referring to Step 1503, cross banding, the measure of competition by two traces for the same region of the trace, is also informative. Referring to Figure 17, the BaseQual cross banding membership functions characterize an observed band's cross banding measurement. The cross banding measurement is the ratio of the dominant trace to the next dominant trace. Ratios above 1.5 are deemed to have negligible cross banding, whereas those with lower ratios (with 1.0 being the lowest ratio possible) are suspect. Referring to the example in Step 1702, a cross banding ratio of 1.35 is found to have membership in negligibleXb of 0.33 (and 0.67 in the negation, !negligibleXb).
Referring to Step 1504, band width (normalized based on a quadratic fit of observed band widths) is another informative variable. In Figure 18, the BaseQual band width membership function, a band's observed width is normalized with respect to the expected band width. The intent of the membership function is to determine how normal the normalized band width is. Referring to the example in Step 1802, a normalized width of 0.2 has membership in the Normal set of 1.0 (See Example 1803).
Referring to Step 1505, the band shape, the linear correlation coefficient between the coefficients of a quadratic fit of the band and the coefficients of a quadratic fit of an ideal band, identifies abnormally shaped bands. The BaseQual band shape membership function is informative in determining the quality of the base call. The range of band heights and widths observed in a run varies considerably. In one embodiment, sample rate conversion normalized the observed band width, the band amplitude was normalized to 1.0, and the result was then compared against an ideal, Gaussian bell-shaped band. That approach is computationally expensive, and much information regarding the observed shape is discarded through the morphing process.
In a more preferred embodiment, each band's height values are fit with a quadratic curve. Similarly, an ideal band shape is fit with a quadratic curve. (The ideal band shape is defined to have normal height and the expected width.) This approach reduces each sample set to an equal number of points. The shape metric is taken as the linear correlation coefficient of these two sample sets. Experience has shown that a band's shape is "abnormal" if this shape metric falls below 0.5. Referring to the example in Step 1902, a shape metric of 0.6 is found to have membership in GoodShape of 1.0 (See Example 1903).
Referring to Step 1506, "baseline buzz," defined as the ratio of two other ratios, helps identify regions of the trace (usually the ends) where there is competition by several traces for the called band's domain. Toward the margins of a trace the baseline can often become quite busy, and when it does the quality of the underlying data, and the reads made thereon, become suspect. Baseline buzz can result from either incorrect signal processing, or from underlying data being so erratic as to defy correct signal processing. In either case the called sequence should come under suspicion. Referring to Figure 20, a buzz measurement above 0.2 begins to signal a problematic sequence. Referring to the example in Step 2003 and Example 2004, a buzz measurement of 0.28 has membership in okBuz of 0.63 (and 0.37 in the negation, !okBuz). In this case, the band's quality has come into question.
Referring to Step 1507, the gaps to a band's left and right neighbor are further informative variables in assessing band quality. These measurements help identify bands that, despite all previous efforts to the contrary, remain positioned too close or too far from a preferred position. Referring to Figure 21 for details of BaseQual band spacing membership functions, a band's observed spacing is normalized with respect to the expected spacing.
The intent of the membership function is to determine how normal the normalized band spacing is. In the example given in blocks 2102 and 2103, a normalized spacing of 0.2 receives an unqualified OK, with membership in OK Spacing of 1.0.
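The shape metric discussed above is a linear correlation coefficient between two equal-length sets of quadratic-fit coefficients. A minimal sketch of that coefficient (the Pearson correlation) follows; the quadratic fitting step itself is omitted, and the function name is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson linear correlation coefficient of two equal-length samples.
// Returns 0.0 when the inputs are empty, mismatched, or have no variance.
double linearCorrelation(const std::vector<double>& a,
                         const std::vector<double>& b) {
    const std::size_t n = a.size();
    if (n == 0 || n != b.size()) return 0.0;
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < n; ++i) { meanA += a[i]; meanB += b[i]; }
    meanA /= static_cast<double>(n);
    meanB /= static_cast<double>(n);
    double sab = 0.0, saa = 0.0, sbb = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double da = a[i] - meanA;
        const double db = b[i] - meanB;
        sab += da * db;
        saa += da * da;
        sbb += db * db;
    }
    const double denom = std::sqrt(saa * sbb);
    return (denom > 0.0) ? sab / denom : 0.0;
}
```

Under the rule of thumb stated in the text, a value below 0.5 from such a function would mark the band's shape as abnormal.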
Referring to Step 1508, logical combinations of several variables (e.g. buzz, width, shape, and spacing) help keep the rules for assigning a quality value to a band tractable. Variable 1Bad indicates that one of the measures was out of tolerance. Similarly, variables 2Bad, 3Bad, and 4Bad indicate that two, three, or all four measurements are out of tolerance. Finally, variable 4Ok notes that all four measurements are in tolerance.
Subsequently, a quality assessment is determined through application of a series of nine rules. In RULE QUAL10, the lowest quality assessment is made of bands that are tiny in height and are incorrectly positioned (Step 1509). For a band which matches this rule to some degree, the rule will assign a nonzero scale value to the output set with centroid near 0 (Step 1519). The second quality assessment, RULE QUAL20, is made of bands that are short, show signs of cross banding, and are incorrectly positioned (Step 1510). For a band which matches this rule to some degree, the rule will assign a nonzero scale value to the output set with centroid near 13 (Step 1519). The third quality assessment, RULE QUAL30, is made of bands which are tiny in height yet correctly positioned (Step 1511). A band which matches this rule to some degree will be assigned a nonzero scale value to the output set with centroid near 25 (Step 1519). The fourth quality assessment, RULE QUAL40, is made of bands with small or moderate height and which show signs of cross banding or have 3Bad or 4Bad attributes (Step 1512). A band which matches this rule to some degree will be assigned a nonzero scale value to the output set with centroid near 38 (Step 1519).
The fifth quality assessment, RULE QUAL50, is made of bands with small or moderate height and which show either some degree of cross banding or have 2Bad or 3Bad attributes (slightly better than quality class 4 in that these might have one less bad attribute) (Step 1513). A band which matches this rule to some degree will assign a nonzero scale value to the output set with centroid near 50 (Step 1519). RULE QUAL60, the sixth quality assessment, is applied to bands with OK height but which show signs of baseline buzz, non-negligible cross banding, or have 2Bad attributes (Step 1514). A band which matches this rule to some degree will assign a nonzero scale value to the output set with centroid near 63 (Step 1519). Bands which have higher degrees of quality satisfy the seventh to ninth quality assessments. The seventh quality assessment, RULE QUAL70, is made of bands in one of three general classes (Step 1515). One class of bands has OK height, little baseline buzz, negligible cross banding, but has 2Bad attributes. Another class of bands shows negligible cross banding, has OK height, no baseline buzz, is correctly positioned, but has both abnormal width and bad shape. (This class, named runfil, is characteristic of a band inserted within a poorly resolved run of bands.) The final class of bands has 4Ok attributes but has small height and possibly some degree of cross banding present. A band which matches this rule to some degree will assign a nonzero scale value to the output set with centroid near 75 (Step 1519). The eighth quality assessment, RULE QUAL80, is made of bands with OK height, little baseline buzz, negligible cross banding, but a 1Bad attribute (Step 1516). A band which matches this rule to some degree (many do) will assign a nonzero scale value to the output set with centroid near 88 (Step 1519). The top quality assessment, RULE QUAL90, is made of bands with absolutely nothing visually wrong with them (Step 1517).
A band which matches this rule to some degree (again, given good quality input, many do) will assign a nonzero scale value to the output set with centroid near 100 (Step 1519). Finally, as with all the other fuzzy logic blocks, the output sets are scaled with the strength of their respective rule firings, and the centroid is calculated to determine the final quality assessment.
E. Final Sequence Assembly
The final quality assessments from the BaseQual Fuzzy Logic analysis control the length of the final sequence 109. Where high quality sequence data is desired, the quality assessments may limit the read length of the final sequence. Where longer read lengths are desired, and lower quality sequence is acceptable, the quality assessments can aid in correlating the resulting sequence data with that from other sequence analyses. For example, when overlapping sequences are obtained, the quality assessments can determine which base calls are more reliable. Similarly, when both strands of a DNA sequence are available, the quality assessments aid in identifying higher probability base calls.
F. Computer Implementation of the Base Calling Software
The invented Base Calling Software can be implemented on standard desktop computers, such as Pentium- and 486-containing PC's. Computers with less powerful processors are also suitable, although the overall processing time for each input data set will be slower. Such computers will preferably include at least a central processing unit, dynamic memory and a device for outputting processed information. The invented base calling software can be stored on any suitable storage media, including computer diskettes, removable media, hard-drives, CD's, magnetic tapes and similar electronic storage means.
/*
 * FILE: AboutBQ.hxx
 * AUTHOR: Andy Marks
 */
#if !defined(_ABOUTBQ_HXX_)
# define _ABOUTBQ_HXX_
#if defined(WIN32)
class _declspec( dllexport ) AboutBQ
#else
class AboutBQ
#endif
{
public:
    AboutBQ();
    ~AboutBQ();
    char const* productName() const;
    float productVersion() const;
    int numProcedures() const;
    char const* procedureName(int pdx) const;
};
#endif
/***********************************************************************
 * FILE: AboutBQ.cxx
 * AUTHOR: Andy Marks
 */
#include <basecall/AboutBQ.hxx>

static char const* plut_[] =
{
    "Base Calling With Quality Metrics",
    "Set Spectral Separation Matrix",
    "Set Read Start And End Scanlines",
    "Illegal Procedure Index"
};
#define NPS (sizeof(plut_)/sizeof(plut_[0]))

AboutBQ::AboutBQ()
{
}

AboutBQ::~AboutBQ()
{
}

float
AboutBQ::productVersion() const
{
    return 0.995f;
}

char const*
AboutBQ::productName() const
{
    static char const* prod = "BaseQual: Cimarron Software Inc.";
    return prod;
}

int
AboutBQ::numProcedures() const
{
    return NPS-1;
}

char const*
AboutBQ::procedureName(int idx) const
{
    if(idx<0 || idx>=NPS) idx = NPS-1;
    return plut_[idx];
}
/*
 * FILE: bandqual.cxx
 */
#include <basecall/mb.hxx>
#include <basecall/fuzzyset.h>

void RdrOut::bandqual()
{
    static double const tinyh_x[] = {0.00, 0.02, 0.03},
                        tinyh_y[] = {1.00, 1.00, 0.00};
    static double const smalh_x[] = {0.015, 0.025, 0.04, 0.05},
                        smalh_y[] = {0.000, 1.000, 1.00, 0.00};
    static double const modrh_x[] = {0.035, 0.045, 0.14, 0.16},
                        modrh_y[] = {0.000, 1.000, 1.00, 0.00};
    static double const normh_x[] = {0.135, 0.145, 0.48, 0.55},
                        normh_y[] = {0.000, 1.000, 1.00, 0.00};
    static double const tallh_x[] = {0.475, 0.485, 1.00},
                        tallh_y[] = {0.000, 1.000, 1.00};
# define TINYH_SZ (sizeof(tinyh_x)/sizeof(tinyh_x[0]))
# define SMALH_SZ (sizeof(smalh_x)/sizeof(smalh_x[0]))
# define MODRH_SZ (sizeof(modrh_x)/sizeof(modrh_x[0]))
# define NORMH_SZ (sizeof(normh_x)/sizeof(normh_x[0]))
# define TALLH_SZ (sizeof(tallh_x)/sizeof(tallh_x[0]))
    static double const presx_x[] = {1.0, 1.35},
                        presx_y[] = {1.0, 0.0};
    static double const neglx_x[] = {1.35, 1.5},
                        neglx_y[] = {0.00, 1.0};
# define PRESX_SZ (sizeof(presx_x)/sizeof(presx_x[0]))
# define NEGLX_SZ (sizeof(neglx_x)/sizeof(neglx_x[0]))
    static double const normw_x[] = {-1.0, -0.4, 0.4, 1.0},
                        normw_y[] = { 0.0,  1.0, 1.0, 0.0};
# define NORMW_SZ (sizeof(normw_x)/sizeof(normw_x[0]))
    static double const goods_x[] = {0.4, 0.5, 1.0},
                        goods_y[] = {0.0, 1.0, 1.0};
# define GOODS_SZ (sizeof(goods_x)/sizeof(goods_x[0]))
    static double const goodg_x[] = {0.3, 0.5, 0.7},
                        goodg_y[] = {1.0, 0.0, 1.0};
# define GOODG_SZ (sizeof(goodg_x)/sizeof(goodg_x[0]))
    static double const okbuz_x[] = {0.0, 0.2, 0.4},
                        okbuz_y[] = {1.0, 1.0, 0.0};
# define GOODB_SZ (sizeof(okbuz_x)/sizeof(okbuz_x[0]))
    static double const conc1_x[] = {-0.100, 0.010, 0.125 };
    static double const conc2_x[] = { 0.010, 0.125, 0.250 };
    static double const conc3_x[] = { 0.125, 0.250, 0.375 };
    static double const conc4_x[] = { 0.250, 0.375, 0.500 };
    static double const conc5_x[] = { 0.375, 0.500, 0.625 };
    static double const conc6_x[] = { 0.500, 0.625, 0.750 };
    static double const conc7_x[] = { 0.625, 0.750, 0.875 };
    static double const conc8_x[] = { 0.750, 0.875, 0.990 };
    static double const conc9_x[] = { 0.875, 0.990, 1.105 };
    static double const concn_y[] = {0.0, 1.0, 0.0};
# define CONCL_SZ (sizeof(conc1_x)/sizeof(conc1_x[0]))
    static double const conc_x[] = {0.0, 1.0},
                        conc_y[] = {0.0, 0.0};
# define CONC_SZ (sizeof(conc_x)/sizeof(conc_x[0]))
    CFuzzySet *tinyH, *smalH, *modrH, *normH, *tallH;
    CFuzzySet *presX, *neglX, *normW, *goodS, *goodG, *goodB;
    int NB = bandStats_.len(), idx;

    tinyH = ConstructCFuzzySet( TINYH_SZ, tinyh_x, tinyh_y, NOHEDGE );
    smalH = ConstructCFuzzySet( SMALH_SZ, smalh_x, smalh_y, NOHEDGE );
    modrH = ConstructCFuzzySet( MODRH_SZ, modrh_x, modrh_y, NOHEDGE );
    normH = ConstructCFuzzySet( NORMH_SZ, normh_x, normh_y, NOHEDGE );
    tallH = ConstructCFuzzySet( TALLH_SZ, tallh_x, tallh_y, NOHEDGE );
    presX = ConstructCFuzzySet( PRESX_SZ, presx_x, presx_y, NOHEDGE );
    neglX = ConstructCFuzzySet( NEGLX_SZ, neglx_x, neglx_y, NOHEDGE );
    normW = ConstructCFuzzySet( NORMW_SZ, normw_x, normw_y, NOHEDGE );
    goodS = ConstructCFuzzySet( GOODS_SZ, goods_x, goods_y, NOHEDGE );
    goodG = ConstructCFuzzySet( GOODG_SZ, goodg_x, goodg_y, NOHEDGE );
    goodB = ConstructCFuzzySet( GOODB_SZ, okbuz_x, okbuz_y, NOHEDGE );

    for(idx=0;idx<NB;idx++) {
        double tinyh, smalh, modrh, normh, tallh, okbuz, runchl, runfil, fourok;
        double presx, neglx, normw, goods, goodg1, goodg2, okgp, conj, okhght;
        CFuzzySet *CONCLUSION, *C1, *C2, *C3, *C4, *C5, *C6, *C7, *C8, *C9;

        CONCLUSION = ConstructCFuzzySet( CONC_SZ, conc_x, conc_y, NOHEDGE );
        C1 = ConstructCFuzzySet( CONCL_SZ, conc1_x, concn_y, NOHEDGE );
        C2 = ConstructCFuzzySet( CONCL_SZ, conc2_x, concn_y, NOHEDGE );
        C3 = ConstructCFuzzySet( CONCL_SZ, conc3_x, concn_y, NOHEDGE );
        C4 = ConstructCFuzzySet( CONCL_SZ, conc4_x, concn_y, NOHEDGE );
        C5 = ConstructCFuzzySet( CONCL_SZ, conc5_x, concn_y, NOHEDGE );
        C6 = ConstructCFuzzySet( CONCL_SZ, conc6_x, concn_y, NOHEDGE );
        C7 = ConstructCFuzzySet( CONCL_SZ, conc7_x, concn_y, NOHEDGE );
        C8 = ConstructCFuzzySet( CONCL_SZ, conc8_x, concn_y, NOHEDGE );
        C9 = ConstructCFuzzySet( CONCL_SZ, conc9_x, concn_y, NOHEDGE );

        tinyh = tinyH->membership( tinyH, bandStats_.hght(idx) );
        smalh = smalH->membership( smalH, bandStats_.hght(idx) );
        modrh = modrH->membership( modrH, bandStats_.hght(idx) );
        normh = normH->membership( normH, bandStats_.hght(idx) );
        tallh = tallH->membership( tallH, bandStats_.hght(idx) );
        presx = presX->membership( presX, bandStats_.xbnd(idx) );
        neglx = neglX->membership( neglX, bandStats_.xbnd(idx) );
        normw = normW->membership( normW, bandStats_.widt(idx) );
        goods = goodS->membership( goodS, bandStats_.shap(idx) );
        okbuz = goodB->membership( goodB, bandStats_.buzz(idx) );
        goodg1 = goodG->membership( goodG, bandStats_.sgap(idx) );
        goodg2 = goodG->membership( goodG, bandStats_.lgap(idx) );
        okgp = AND(goodg1,goodg2);
        C1->scale( C1, AND( tinyh, NOT(okgp)) );
        C2->scale( C2, AND( smalh, AND(presx,NOT(okgp))) );
        C3->scale( C3, AND( tinyh, okgp) );

        okhght = OR(modrh,OR(normh,tallh));
        runchl = AND(NOT(goods),okhght);
        runfil = AND(runchl,AND(neglx,AND(normw,AND(okbuz,okgp))));
        fourok = AND(okbuz,AND(normw,AND(goods,okgp)));

        double cj1,cj2,cj3,cj4,cj5,cj6, _1bad, _2bad, _3bad, _4bad;

        /* exactly one of the four measures out of tolerance */
        cj1 = AND(okbuz,AND(normw,AND(goods,NOT(okgp))));
        cj2 = AND(okbuz,AND(normw,AND(NOT(goods),okgp)));
        cj3 = AND(okbuz,AND(NOT(normw),AND(goods,okgp)));
        cj4 = AND(NOT(okbuz),AND(normw,AND(goods,okgp)));
        _1bad = OR(cj1,OR(cj2,OR(cj3,cj4)));

        /* exactly two out of tolerance */
        cj1 = AND(okbuz,AND(normw,AND(NOT(goods),NOT(okgp))));
        cj2 = AND(okbuz,AND(goods,AND(NOT(normw),NOT(okgp))));
        cj3 = AND(okbuz,AND(okgp,AND(NOT(normw),NOT(goods))));
        cj4 = AND(normw,AND(goods,AND(NOT(okbuz),NOT(okgp))));
        cj5 = AND(normw,AND(okgp,AND(NOT(okbuz),NOT(goods))));
        cj6 = AND(goods,AND(okgp,AND(NOT(okbuz),NOT(normw))));
        _2bad = OR(cj1,OR(cj2,OR(cj3,OR(cj4,OR(cj5,cj6)))));

        /* exactly three out of tolerance */
        cj1 = AND(okbuz,AND(NOT(normw),AND(NOT(goods),NOT(okgp))));
        cj2 = AND(normw,AND(NOT(okbuz),AND(NOT(goods),NOT(okgp))));
        cj3 = AND(goods,AND(NOT(okbuz),AND(NOT(normw),NOT(okgp))));
        cj4 = AND(okgp,AND(NOT(okbuz),AND(NOT(normw),NOT(goods))));
        _3bad = OR(cj1,OR(cj2,OR(cj3,cj4)));

        /* all four out of tolerance */
        _4bad = AND(NOT(okbuz),AND(NOT(normw),AND(NOT(goods),NOT(okgp))));

        C4->scale(C4,AND(OR(smalh,modrh),OR(presx,OR(_3bad,_4bad))));
        C5->scale(C5,AND(OR(smalh,modrh),OR(NOT(neglx),OR(_2bad,_3bad))));
        C6->scale(C6,AND(okhght,OR( OR(NOT(okbuz),NOT(neglx)), _2bad )));
        conj = AND(okhght,AND(okbuz,AND(neglx,_2bad)));
        C7->scale(C7,OR(conj,OR(runfil,AND(smalh,AND(NOT(presx),fourok)))));
        C8->scale(C8,AND(okbuz,AND(okhght,AND(neglx,_1bad))));
        C9->scale(C9,AND(okhght,AND(neglx,fourok)));
        CONCLUSION->cj( CONCLUSION, C1, DISJ );
        CONCLUSION->cj( CONCLUSION, C2, DISJ );
        CONCLUSION->cj( CONCLUSION, C3, DISJ );
        CONCLUSION->cj( CONCLUSION, C4, DISJ );
        CONCLUSION->cj( CONCLUSION, C5, DISJ );
        CONCLUSION->cj( CONCLUSION, C6, DISJ );
        CONCLUSION->cj( CONCLUSION, C7, DISJ );
        CONCLUSION->cj( CONCLUSION, C8, DISJ );
        CONCLUSION->cj( CONCLUSION, C9, DISJ );

        double concl;
        (void)CONCLUSION->fcentroid( CONCLUSION, &concl );
        bandStats_.qual(idx,float(concl));

        DestructCFuzzySet(CONCLUSION);
        DestructCFuzzySet(C1);
        DestructCFuzzySet(C2);
        DestructCFuzzySet(C3);
        DestructCFuzzySet(C4);
        DestructCFuzzySet(C5);
        DestructCFuzzySet(C6);
        DestructCFuzzySet(C7);
        DestructCFuzzySet(C8);
        DestructCFuzzySet(C9);
    }

    DestructCFuzzySet(tinyH);
    DestructCFuzzySet(smalH);
    DestructCFuzzySet(modrH);
    DestructCFuzzySet(normH);
    DestructCFuzzySet(tallH);
    DestructCFuzzySet(presX);
    DestructCFuzzySet(neglX);
    DestructCFuzzySet(normW);
    DestructCFuzzySet(goodS);
    DestructCFuzzySet(goodG);
    DestructCFuzzySet(goodB);
}
/***********************************************************************
 * FILE: blindeconv.cxx
 * AUTHOR: Andy Marks
 */
#include <basecall/mb.hxx>
#include <nrc/Complex.hxx>

#if defined(DEBUG)
static void dzp( double const* p, int n, char const* fname )
{
    FILE* fp = ::fopen(fname,"w");
    if(NULL != fp) {
        Complex const* z = (Complex const*)&p[-1];
        for(int idx=1; idx<=n; idx++)
            ::fprintf(fp,"%11.6f %11.6f\n",z[idx].real(),z[idx].imag());
        ::fclose(fp);
    }
}

static void drp( double const* p, int n, char const* fname )
{
    FILE* fp = ::fopen(fname,"w");
    if(NULL != fp) {
        for(int idx=1; idx<=n; idx++)
            ::fprintf(fp,"%11.6f\n",p[idx]);
        ::fclose(fp);
    }
}
#endif

static void fftshift( double* v, int n )
{
    int idx, mid = n/2;
    for(idx=1; idx<=mid; idx++) {
        double t = v[idx];
        v[idx] = v[mid+idx];
        v[mid+idx] = t;
    }
}

static void dCadd( double* c1, double const* c2, int n )
{
    Complex* z1 = (Complex*)&c1[-1];
    Complex const* z2 = (Complex const*)&c2[-1];
    for(int idx=1; idx<=n; idx++) z1[idx] += z2[idx];
}

static void dCln( double* pc, int n )
{
    Complex* z = (Complex*)&pc[-1];
    for(int idx=1; idx<=n; idx++) z[idx].ln();
}

static void dCexp( double* pc, int n )
{
    Complex* z = (Complex*)&pc[-1];
    for(int idx=1; idx<=n; idx++) z[idx].exp();
}

static void dRCmul( double* pc, double const* pr, int n )
{
    Complex* z = (Complex*)&pc[-1];
    for(int idx=1; idx<=n; idx++) z[idx].cmul(pr[idx]);
}

static void dRkCmul( double* pc, double Rk, int n )
{
    Complex* z = (Complex*)&pc[-1];
    for(int idx=1; idx<=n; idx++) z[idx].cmul( Rk );
}

enum RI { DREAL, DIMAG };

static void dGetComponent( double const* pin, double* pout, RI ri, int n )
{
    int idx, kdx;
    if(DREAL == ri) { idx = 1; kdx = 1; }
    else            { idx = 2; kdx = -1; }
    for( ; idx<=2*n; idx+=2) {
        pout[idx+kdx] = 0.0;
        pout[idx] = pin[idx];
    }
}

static void dCNozeros( double* ivec, int n )
{
    Complex* z = (Complex*)&ivec[-1];
    for(int idx=1; idx<=n; idx++)
        if(z[idx] == 0.0) z[idx].real( DBL_EPSILON );
}

void
SegRead::blindeconv( Wvfm& wv, int FBW )
{
#if defined(DEBUG)
    static char call = 0;
#endif
    int lane, jdx, sdx;
#if defined(DEBUG)
#endif
    if(FBW != pmb_->lastFbw_) {
        double fsigma, alpha, c;
        pmb_->lastFbw_ = FBW;
        fsigma = double((FBW-1)*2)*NR_PI/double(NPTS);
        alpha = 0.5*fsigma*fsigma;
        c = ::sqrt(NR_PI/alpha);
        for(sdx=1; sdx<=NPTS; sdx++)
            pmb_->filter_[sdx] = c * ::exp( -pmb_->ww_[sdx] / (4*alpha) );
        ::fftshift( pmb_->filter_, NPTS );
    }
    for(lane=1; lane<=4; lane++) {
        for(jdx=sdx=1; sdx<=NPTS; jdx+=2, sdx+=1) {
            pmb_->ivec_[jdx] = wv.sc_la( sdx, lane );
            pmb_->ivec_[jdx+1] = 0.0;
        }
        ::fftshift( pmb_->ivec_, 2*NPTS );
        ::dfour1( pmb_->ivec_, NPTS, 1 );
        ::dCNozeros( pmb_->ivec_, NPTS );
        ::dCln( pmb_->ivec_, NPTS );
        ::dGetComponent( pmb_->ivec_, pmb_->imag_, DIMAG, NPTS );
        ::dGetComponent( pmb_->ivec_, pmb_->ivec_, DREAL, NPTS );
        ::dfour1( pmb_->ivec_, NPTS, -1 );
        ::dRkCmul( pmb_->ivec_, 1.0/double(NPTS), NPTS );
#if defined(DEBUG)
        if(1==call) drp(pmb_->lifter_,NPTS,"lifter");
        if(1==call) dzp(pmb_->ivec_,NPTS,"cepstrum");
#endif
        ::dRCmul( pmb_->ivec_, pmb_->lifter_, NPTS );
        ::dfour1( pmb_->ivec_, NPTS, 1 );
        ::dGetComponent( pmb_->ivec_, pmb_->ivec_, DREAL, NPTS );
        ::dCadd( pmb_->ivec_, pmb_->imag_, NPTS );
        ::dCexp( pmb_->ivec_, NPTS );
        ::dRCmul( pmb_->ivec_, pmb_->filter_, NPTS );
        ::dfour1( pmb_->ivec_, NPTS, -1 );
        ::dRkCmul( pmb_->ivec_, 1.0/double(NPTS), NPTS );
        ::fftshift( pmb_->ivec_, 2*NPTS );
        if(1==pmb_->remune())
            for(sdx=jdx=1; sdx<=NPTS; sdx+=1, jdx+=2)
                wv.sc_la_set(sdx,lane, pmb_->ivec_[jdx]);
    }
}
/***********************************************************************
 * FILE: Centroid.cxx
 * AUTHOR: Andy Marks
 */
#include <nrc/Centroid.hxx>

/* Arrays are 1-indexed: valid elements are px[1..n] and py[1..n]. */
int icentroid( int const* px, int const* py, int n )
{
    int rv;
    if(1==n) rv = px[1];
    else if(2==n) rv = (px[1]+px[2]+1)/2;
    else {
        long numer=0L, denom=0L;
        for(int i=2; i<=n; i++) {
            long t = py[i-1]*(2*px[i-1] + px[i]) + py[i]*(px[i-1]+2*px[i]);
            numer = numer + t*(px[i]-px[i-1]);
            denom = denom + (px[i] - px[i-1])*(py[i-1]+py[i]);
        }
        rv = int(0.5f + float(numer)/float(3L*denom));
    }
    return rv;
}

float fcentroid( float const* px, float const* py, int n )
{
    float rv;
    if(1==n) rv = px[1];
    else if(2==n) rv = (px[1]+px[2])/2.0f;
    else {
        double numer=0.0, denom=0.0;
        for(int i=2; i<=n; i++) {
            double t = py[i-1]*(2.0*px[i-1] + px[i]) + py[i]*(px[i-1]+2.0*px[i]);
            numer = numer + t*(px[i]-px[i-1]);
            denom = denom + (px[i] - px[i-1])*(py[i-1]+py[i]);
        }
        /* centroid = numer/(3*denom), matching icentroid above */
        rv = float(numer/(3.0*denom));
    }
    return rv;
}
/*
 * FILE: Centroid.hxx
 * AUTHOR: Andy Marks
 */
#if !defined(_CENTROID_HXX_)
# define _CENTROID_HXX_
#include <nrc/nrutil.hxx>
DLLexport int icentroid(int const* px, int const* py, int n);
DLLexport float fcentroid(float const* px, float const* py, int n);
#endif
/***********************************************************************
 * FILE: Complex.cxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#include <nrc/Complex.hxx>

Complex const&
Complex::operator=(Complex const& b)
{
    if(this != &b) { r_ = b.r_; i_ = b.i_; }
    return *this;
}

Complex
Complex::operator+(Complex const& b) const
{
    Complex c;
    c.r_ = r_ + b.r_;
    c.i_ = i_ + b.i_;
    return c;
}

Complex const&
Complex::operator+=(Complex const& b)
{
    *this = *this + b;
    return *this;
}

Complex
Complex::operator*(Complex const& b) const
{
    Complex c;
    c.r_ = r_*b.r_ - i_*b.i_;
    c.i_ = i_*b.r_ + r_*b.i_;
    return c;
}

Complex const&
Complex::operator*=(Complex const& b)
{
    *this = *this * b;
    return *this;
}

Complex
Complex::operator-(Complex const& b) const
{
    Complex c;
    c.r_ = r_ - b.r_;
    c.i_ = i_ - b.i_;
    return c;
}

Complex const&
Complex::operator-=(Complex const& b)
{
    *this = *this - b;
    return *this;
}

/* Division scaled by the larger component of b; returns 0 if b is 0. */
Complex
Complex::operator/(Complex const& b) const
{
    Complex c;
    double R, D;
    if(::fabs(b.r_) >= ::fabs(b.i_)) {
        if(0.0 == b.r_) return c;
        R = b.i_ / b.r_;
        D = b.r_ + R*b.i_;
        if(0.0 == D) return c;
        c.r_ = (r_ + R*i_)/D;
        c.i_ = (i_ - R*r_)/D;
    } else {
        if(0.0 == b.i_) return c;
        R = b.r_ / b.i_;
        D = b.i_ + R*b.r_;
        if(0.0 == D) return c;
        c.r_ = (R*r_ + i_)/D;
        c.i_ = (R*i_ - r_)/D;
    }
    return c;
}

Complex const&
Complex::operator/=(Complex const& b)
{
    *this = *this / b;
    return *this;
}

void
Complex::exp()
{
    double ez = ::exp( r_ );
    r_ = ez * ::cos( i_ );
    i_ = ez * ::sin( i_ );
}

void
Complex::ln()
{
    double ang = angle();
    r_ = 0.5 * ::log( r_*r_ + i_*i_ );
    i_ = ang;
}

void
Complex::conj()
{
    i_ = -i_;
}

void
Complex::cmul( double k )
{
    r_ *= k;
    i_ *= k;
}

void dCMul( double const* a, double const* b, double* c, int n)
{
    Complex const *za = (Complex const*)&a[-1],
                  *zb = (Complex const*)&b[-1];
    Complex *zc = (Complex *)&c[-1];
    for(int idx=1; idx<=n; idx++)
        zc[idx] = za[idx] * zb[idx];
}
/***********************************************************************
 * FILE: Complex.hxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#ifndef _COMPLEX_HXX_
#define _COMPLEX_HXX_
#include <math.h>

class Complex
{
public:
    Complex( double real = 0.0, double imag = 0.0 ) : r_(real), i_(imag) {};
    Complex( double const nr[2] ) : r_(nr[0]), i_(nr[1]) {};
    Complex( Complex const& rhs ) : r_(0.0), i_(0.0) { *this = rhs; };
    ~Complex() {};

    Complex const& operator=(Complex const& b);
    Complex operator+(Complex const& b) const;
    Complex operator*(Complex const& b) const;
    Complex operator-(Complex const& b) const;
    Complex operator/(Complex const& b) const;
    int operator==(Complex const& b) const { return (r_==b.r_ && i_==b.i_); }
    Complex const& operator+=(Complex const& b);
    Complex const& operator*=(Complex const& b);
    Complex const& operator-=(Complex const& b);
    Complex const& operator/=(Complex const& b);

    void real(double r) { r_ = r; }
    void imag(double i) { i_ = i; }
    void exp();
    void ln();
    void conj();
    void cmul(double k);

    double real() const { return r_; }
    double imag() const { return i_; }
    double abs() const { return ::sqrt(r_*r_ + i_*i_); }
    double angle() const { return ::atan2(i_,r_); }

private:
    double r_, i_;
};

void dCMul( double const* a, double const* b, double* c, int n);
#endif
/*
 * FILE: dfour1.cxx
 * TYPIST: Andy Marks
 */
#include <nrc/nr.hxx>
#define SWAP(a,b) { tempr=(a); (a)=(b); (b)=tempr; }
#if !defined(SA)
void dfour1( double data[], unsigned long nn, int isign )
{
    unsigned long n, mmax, m, j, istep, i;
    double wtemp, wr, wpr, wpi, wi, theta;
    double tempr, tempi;

    /* bit-reversal reordering */
    n = nn<<1;
    j = 1;
    for(i=1;i<n;i+=2) {
        if(j>i) {
            SWAP(data[j],data[i]);
            SWAP(data[j+1],data[i+1]);
        }
        m = n>>1;
        while(m >= 2 && j>m) {
            j -= m;
            m >>= 1;
        }
        j += m;
    }
    /* Danielson-Lanczos butterflies */
    mmax = 2;
    while(n>mmax) {
        istep = mmax<<1;
        theta = isign*(2.0*NR_PI/mmax);
        wtemp = sin(0.5*theta);
        wpr = -2.0*wtemp*wtemp;
        wpi = sin(theta);
        wr = 1.0;
        wi = 0.0;
        for(m=1;m<mmax;m+=2) {
            for(i=m;i<=n;i+=istep) {
                j = i+mmax;
                tempr = wr*data[j] - wi*data[j+1];
                tempi = wr*data[j+1] + wi*data[j];
                data[j] = data[i]-tempr;
                data[j+1] = data[i+1]-tempi;
                data[i] += tempr;
                data[i+1] += tempi;
            }
            wr = (wtemp=wr)*wpr - wi*wpi + wr;
            wi = wi*wpr + wtemp*wpi + wi;
        }
        mmax = istep;
    }
}
#endif
#if defined(SA)
#include <stdio.h>
void main()
{
    double* unitstep;
    size_t idx, jdx;
    if(NULL == (unitstep = dvector(1,16)))
        nrerror("main: failed to allocate unitstep vector\n");
    unitstep[ 1] = 1.0; unitstep[ 2] = 0.0;
    unitstep[ 3] = 1.0; unitstep[ 4] = 0.0;
    unitstep[ 5] = 0.0; unitstep[ 6] = 0.0;
    unitstep[ 7] = 0.0; unitstep[ 8] = 0.0;
    unitstep[ 9] = 0.5; unitstep[10] = 0.0;
    unitstep[11] = 0.5; unitstep[12] = 0.0;
    unitstep[13] = 0.0; unitstep[14] = 0.0;
    unitstep[15] = 0.0; unitstep[16] = 0.0;
    ::printf("INPUT:\n");
    for(idx = 1; idx<=16; idx++)
        printf("\tv[%2ld]=%11.6f\n",idx,unitstep[idx]);
    dfour1( unitstep, 8, 1 );
    ::printf("FFT(INPUT):\n");
    for(idx = 1; idx<=16; idx++)
        printf("\tv[%2ld]=%11.6f\n",idx,unitstep[idx]);
    dfour1( unitstep, 8, -1 );
    ::printf("IFFT(INPUT):\n");
    for(idx = 1; idx<=16; idx++)
        printf("\tv[%2ld]=%11.6f\n",idx,unitstep[idx]);
    free_dvector(unitstep,1,16);
}
#endif
/*
 * FILE: fgapcheck.c
 * COPYRIGHT (c) 1996, University of Utah
 */
#include <basecall/fuzzyset.h>
#include <basecall/fgapcheck.h>
#if defined(SA)
# define VERBOSE
#else
# undef VERBOSE
#endif

static double weight(size_t n, size_t gs, size_t cs)
{
    return 1.0-sqrt((double)((n-gs)*(n-gs) + (n-cs)*(n-cs))/(2.0*(double)(n*n)));
}

static double gcness(size_t n, size_t gs, size_t cs, size_t mx, size_t my)
{
    size_t lh, sh;
    double normalize, rv;
    lh = (n+1)/2;
    sh = n-lh;
    if(0==mx) mx = lh;
    if(0==my) my = sh;
    normalize = weight(n,mx,my);
    rv = weight(n,gs,cs)/normalize;
    if(rv>1.0) rv = 1.0;
    return rv*rv;
}

int gapcheck( int const* pis, int const* piw,
              int const* ps, int const* pw,
              char const* pseq, size_t nbands, float** output )
{
# define NPREV 5
    static double const bigwidX[] = {0.0, 1.0};
    static double const bigwidY[] = {0.0, 1.0};
# define BGWDSZ (sizeof(bigwidY)/sizeof(bigwidY[0]))
    static double const biggapX[] = {0.42, 0.6, 1.0};
    static double const biggapY[] = {0.0, 1.0, 1.0};
# define BGGPSZ (sizeof(biggapY)/sizeof(biggapY[0]))
    static double const mdgpX[] = { 0.25, 0.3, 0.45, 0.57 };
    static double const mdgpY[] = { 0.0, 1.0, 1.0, 0.0 };
# define MDGPSZ (sizeof(mdgpY)/sizeof(mdgpY[0]))
    static double const smgpX[] = {-1.0, -0.5, 0.0};
    static double const smgpY[] = { 1.0, 1.0, 0.0};
# define SMGPSZ (sizeof(smgpY)/sizeof(smgpY[0]))
# define A 0.5079
# define B 1.5002
# define C 1.5069
# define D 2.5063
    static double const norm_x[] = { A, B, C };
    static double const norm_y[] = { 1.0, 1.0, 0.0 };
# define NORMSZ (sizeof(norm_x)/sizeof(norm_x[0]))
    static double const split_x[] = { B, C, D };
    static double const split_y[] = { 0.0, 1.0, 1.0 };
# define SPLITSZ (sizeof(split_x)/sizeof(split_x[0]))
    static double const concl_x[] = { A, D };
    static double const concl_y[] = { 0.0, 0.0 };
# define CONCLSZ (sizeof(concl_x)/sizeof(concl_x[0]))
    size_t *gs, *cs;
    double *gc;
    CFuzzySet *ciBgGP, *ciMdGP, *ciSmGP, *ciBgWD;
    size_t idx;
    int rv = 1;

    /* bands are 1-indexed, so allocate nbands+1 elements */
    gc = (double*)malloc(sizeof(*gc)*(nbands+1));
    gs = (size_t*)malloc(sizeof(*gs)*(nbands+1));
    cs = (size_t*)malloc(sizeof(*cs)*(nbands+1));
    if(NULL==gs || NULL==cs || NULL==gc) {
        fprintf(stderr,"out of memory in fgapcheck.c, line %d\n", __LINE__ );
        if(NULL != gs) { free(gs); gs = NULL; }
        if(NULL != cs) { free(cs); cs = NULL; }
        if(NULL != gc) { free(gc); gc = NULL; }
        return 0;
    }
    ciBgWD = ConstructCFuzzySet( BGWDSZ, bigwidX, bigwidY, CONTINT );
    ciBgGP = ConstructCFuzzySet( BGGPSZ, biggapX, biggapY, VERY );
    ciMdGP = ConstructCFuzzySet( MDGPSZ, mdgpX, mdgpY, NOHEDGE );
    ciSmGP = ConstructCFuzzySet( SMGPSZ, smgpX, smgpY, CONTINT );
    for(idx=1; idx<=nbands; idx++) {
        size_t bgnj, jdx;
        double expgaps, prvgaps = 0.0;
        bgnj = (idx>NPREV)?(idx-NPREV):1;
        gs[idx] = cs[idx] = 0;
        for(jdx = bgnj; jdx < idx; jdx++) {
            prvgaps += (double)ps[jdx];
            switch( pseq[jdx] ) {
            case 'G': gs[idx] += 1; break;
            case 'C': cs[idx] += 1; break;
            default: break;
            }
        }
        expgaps = (double)(piw[idx]*(idx-bgnj));
        if((0.0==expgaps) || (prvgaps<expgaps)) prvgaps = expgaps = 1.0;
        gc[idx] = (expgaps/prvgaps)*gcness(NPREV,gs[idx],cs[idx],NPREV/2,NPREV/2);
} for(idx=l; idx<=nbands; idx++) { CFuzzySet *CONCLUSION, *R_SPLIT, *R_NORM; # define N 1 double rawgap[N+l ], rawwidfN+1], biggapfN+1], smlgapfN+1 ], bigwidfN+1], gcrich; double conj 1, conj2, cmpti, concl; double medgap[N+l]. fatso;
R_NORM = ConstructCFuzzySet( NORMSZ, norm_x, norm_y, NOHEDGE ); R_SPLIT = ConstructCFuzzySet( SPLITSZ, split_x, split_y, NOHEDGE );
CONCLUSION = ConstructCFuzzySet( CONCLSZ, concl_x, concl_y, NOHEDGE );
if(0==pis[idx] || 0==piw[idx]) {
  DestructCFuzzySet(R_SPLIT);
  DestructCFuzzySet(R_NORM);
  DestructCFuzzySet(CONCLUSION);
  rv = 0;
  break;
}
rawgap[N] = ((double)ps[idx]/(double)pis[idx]) - 1.0;
rawgap[N-1] = (1==idx)? 0.0: (((double)ps[idx-1]/(double)pis[idx]) - 1.0);
rawwid[N] = ((double)pw[idx]/(double)piw[idx]) - 1.0;
rawwid[N-1] = (1==idx)? 0.0: (((double)pw[idx-1]/(double)piw[idx]) - 1.0);
biggap[N] = ciBgGP->membership( ciBgGP, rawgap[N] );
medgap[N] = ciMdGP->membership( ciMdGP, rawgap[N] );
smlgap[N] = ciSmGP->membership( ciSmGP, rawgap[N] );
biggap[N-1] = ciBgGP->membership( ciBgGP, rawgap[N-1] );
medgap[N-1] = ciMdGP->membership( ciMdGP, rawgap[N-1] );
smlgap[N-1] = ciSmGP->membership( ciSmGP, rawgap[N-1] );
bigwid[N] = ciBgWD->membership( ciBgWD, rawwid[N] );
bigwid[N-1] = ciBgWD->membership( ciBgWD, rawwid[N-1] );
gcrich = gc[idx];
fatso = AND(bigwid[N-1],AND(medgap[N-1],medgap[N]));
#if defined(VERBOSE)
printf(" pseq[%3d]=%c pseq[%3d]=%c\n",idx-1,pseq[idx-1],idx,pseq[idx]);
printf(" ps[%3d]=%2d ps[%3d]=%d\n",idx-1,ps[idx-1],idx,ps[idx]);
printf(" pis[%3d]=%2d pis[%3d]=%d\n",idx-1,pis[idx-1],idx,pis[idx]);
printf(" pw[%3d]=%2d pw[%3d]=%d\n",idx-1,pw[idx-1],idx,pw[idx]);
printf(" piw[%3d]=%2d piw[%3d]=%d\n",idx-1,piw[idx-1],idx,piw[idx]);
printf("rawgap[%3d]=%5.2f rawgap[%3d]=%5.2f\n",idx-1,rawgap[N-1],idx,rawgap[N]);
printf("rawwid[%3d]=%5.2f rawwid[%3d]=%5.2f\n",idx-1,rawwid[N-1],idx,rawwid[N]);
printf("biggap[%3d]=%5.2f biggap[%3d]=%5.2f\n",idx-1,biggap[N-1],idx,biggap[N]);
printf("medgap[%3d]=%5.2f medgap[%3d]=%5.2f\n",idx-1,medgap[N-1],idx,medgap[N]);
printf("smlgap[%3d]=%5.2f smlgap[%3d]=%5.2f\n",idx-1,smlgap[N-1],idx,smlgap[N]);
printf("bigwid[%3d]=%5.2f bigwid[%3d]=%5.2f\n",idx-1,bigwid[N-1],idx,bigwid[N]);
printf("gcrich = %4.2f\n",gcrich);
printf("fatso = %4.2f\n",fatso);
#endif
conj1 = AND(biggap[N],gcrich);
conj2 = AND(biggap[N],smlgap[N-1]);
conj2 = AND(conj2,NOT(bigwid[N]) );
conj2 = AND(conj2,NOT(bigwid[N-1]) );
R_NORM->scale( R_NORM, OR(NOT(biggap[N]),OR(conj1,conj2)) );
conj1 = AND(biggap[N],OR(bigwid[N-1],bigwid[N]));
conj2 = AND(biggap[N],AND(NOT(smlgap[N-1]),NOT(gcrich)));
R_SPLIT->scale( R_SPLIT, OR(conj1,conj2));
CONCLUSION->cj( CONCLUSION, R_NORM, DISJ );
CONCLUSION->cj( CONCLUSION, R_SPLIT, DISJ );
(void)CONCLUSION->fcentroid( CONCLUSION, &concl );
cmpti = CONCLUSION->compatIndex( CONCLUSION );
output[idx][1] = (float)concl;
output[idx][2] = (float)cmpti;
#if defined(VERBOSE)
R_NORM->print( R_NORM, "R_NORM" );
R_SPLIT->print( R_SPLIT, "R_SPLIT" );
CONCLUSION->print( CONCLUSION, "CONCLUSION" );
printf("idx=%d concl=%f cmpti=%f\n\n",idx,concl,cmpti);
#endif
DestructCFuzzySet(R_SPLIT);
DestructCFuzzySet(R_NORM);
DestructCFuzzySet(CONCLUSION);
}
DestructCFuzzySet(ciBgWD);
DestructCFuzzySet(ciBgGP);
DestructCFuzzySet(ciSmGP);
DestructCFuzzySet(ciMdGP);
free(gc); free(gs); free(cs);
return rv;
}
#if defined(SA)
#include <stdio.h>
#include <nrc/nrutil.hxx>
int main(int argc, char* argv[])
{
  static int sp[6], wd[6], isp[6], iwd[6], idx;
  char *p, *phd;
  if(2!=argc) {
    fprintf(stderr,"usage: %s argstr\n",argv[0]);
    fprintf(stderr," where: argstr is a [:] delimited set of 7 fields\n");
    fprintf(stderr," fld1: sp(n-1)\n");
    fprintf(stderr," fld2: sp(n)\n");
    fprintf(stderr," fld3: gp(n-1)\n");
    fprintf(stderr," fld4: gp(n)\n");
    fprintf(stderr," fld5: Exp[sp]\n");
    fprintf(stderr," fld6: Exp[gp]\n");
    fprintf(stderr," fld7: seq[n-5..n]\n");
    fprintf(stderr," Ex: %s 9:8:8:9:10:10:GCGCG\n",argv[0]);
    return 1;
  }
  for(idx=0;idx<6;idx++) sp[idx] = wd[idx] = isp[idx] = iwd[idx] = 10;
  if(NULL != (p = strchr(phd=argv[1],':'))) {
    *p++ = '\0'; sp[4] = atoi(phd); phd = p;
    if(NULL != (p = strchr(phd,':'))) {
      *p++ = '\0'; sp[5] = atoi(phd); phd = p;
      if(NULL != (p = strchr(phd,':'))) {
        *p++ = '\0'; wd[4] = atoi(phd); phd = p;
        if(NULL != (p = strchr(phd,':'))) {
          *p++ = '\0'; wd[5] = atoi(phd); phd = p;
          if(NULL != (p = strchr(phd,':'))) {
            *p++ = '\0'; isp[5] = atoi(phd); phd = p;
            if(NULL != (p = strchr(phd,':'))) {
              float** output = matrix(1L,5L,1L,2L);
              *p++ = '\0'; iwd[5] = atoi(phd); phd = p;
              printf("%d %d %d %d %d %d %s\n", sp[4],sp[5],wd[4],wd[5],isp[5],iwd[5],phd);
              gapcheck( isp, iwd, sp, wd, phd, 5, output );
              printf("%f %f\n",output[5][1],output[5][2]);
              return 0;
            }
          }
        }
      }
    }
  }
  fprintf(stderr,"%s: missing input field(s): [%s] \n",argv[0],phd);
  return 1;
}
#endif
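The two gap-check rules evaluated in the loop above reduce to min/max arithmetic on membership grades. The following is a minimal sketch of that arithmetic only, assuming the grades have already been computed by the fuzzy sets; the function `gap_rule_strengths`, the helpers `f_and`, `f_or`, `f_not`, and all parameter names are hypothetical and do not appear in the listing.

```c
#include <assert.h>

/* Zadeh operators; they mirror the AND/OR/NOT macros used in the listing. */
static double f_and(double a, double b) { return (a < b) ? a : b; }
static double f_or(double a, double b)  { return (a > b) ? a : b; }
static double f_not(double a)           { return 1.0 - a; }

/* Hypothetical helper: computes the scale factors applied to the
 * "normal" and "split" rule sets from the membership grades of the
 * current band (biggap, bigwid), the previous band (smlgap_prev,
 * bigwid_prev) and the local GC-richness grade. */
void gap_rule_strengths(double biggap, double smlgap_prev,
                        double bigwid, double bigwid_prev, double gcrich,
                        double *norm, double *split)
{
    double conj1, conj2;

    assert(norm != 0 && split != 0);

    /* NORMAL: gap is not big, or it is big but explained by a GC-rich
     * context, or by a small previous gap with no wide bands nearby. */
    conj1 = f_and(biggap, gcrich);
    conj2 = f_and(f_and(biggap, smlgap_prev),
                  f_and(f_not(bigwid), f_not(bigwid_prev)));
    *norm = f_or(f_not(biggap), f_or(conj1, conj2));

    /* SPLIT: gap is big and a neighbouring band is wide, or the gap is
     * big with neither a small previous gap nor a GC-rich context. */
    conj1 = f_and(biggap, f_or(bigwid_prev, bigwid));
    conj2 = f_and(biggap, f_and(f_not(smlgap_prev), f_not(gcrich)));
    *split = f_or(conj1, conj2);
}
```

In the listing these two strengths scale the R_NORM and R_SPLIT sets before they are merged and defuzzified; the sketch stops at the scale factors.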
/**********************************************************************
 * FILE: fomitokn.c
 * COMP: c89 -DSA omitnok.c fuzzyset.o -o omitokn -lm
 * COPYRIGHT (c) 1996, University of Utah */
#include <float.h>
#include <basecall/fuzzyset.h>
int omitokn( int const* pinS, float const* pht, float const* plo, int const* pLsp, int const* pRsp, float const* pxb, int NPK, float** output )
{
  static double const okSpX[] = {0.2, 0.5, 0.8};
  static double const okSpY[] = {1.0, 0.0, 1.0};
#if defined(AS_WAS)
  static double const abSpX[] = {0.3, 0.4, 0.6, 0.7};
#else
  static double const abSpX[] = {0.2, 0.4, 0.6, 0.8};
#endif
  static double const abSpY[] = {0.0, 1.0, 1.0, 0.0};
# define OKSPSZ (sizeof(okSpY)/sizeof(okSpY[0]))
# define ABSPSZ (sizeof(abSpY)/sizeof(abSpY[0]))
  static double const tiXbX[] = {1.0, 1.4, 1.8};
  static double const tiXbY[] = {1.0, 0.5, 0.0};
  static double const okXbX[] = {1.2, 1.4};
  static double const okXbY[] = {0.0, 1.0};
# define TIXbSZ (sizeof(tiXbX)/sizeof(tiXbX[0]))
# define OKXbSZ (sizeof(okXbX)/sizeof(okXbX[0]))
  static double tiHtX[] = {0.02, 0.07};
  static double const tiHtY[] = {1.00, 0.00};
  static double okHtX[] = {0.01, 0.06};
  static double const okHtY[] = {0.00, 1.00};
# define TIHTSZ (sizeof(tiHtX)/sizeof(tiHtX[0]))
# define OKHTSZ (sizeof(okHtX)/sizeof(okHtX[0]))
  static double const ok_x[] = {0.4596, 1.3797, 1.6864};
  static double const ok_y[] = {1.0000, 1.0000, 0.0000};
# define OK_SZ (sizeof(ok_x)/sizeof(ok_x[0]))
  static double const n_x[] = {1.3797, 1.6864, 2.2998, 2.6065};
  static double const n_y[] = {0.0000, 1.0000, 1.0000, 0.0000};
# define N_SZ (sizeof(n_x)/sizeof(n_x[0]))
  static double const omit_x[] = {2.2998, 2.6065, 3.5540};
  static double const omit_y[] = {0.0000, 1.0000, 1.0000};
# define OMIT_SZ (sizeof(omit_x)/sizeof(omit_x[0]))
  static double const concl_x[] = {0.4596, 3.5540};
  static double const concl_y[] = {0.0000, 0.0000};
# define CONCL_SZ (sizeof(concl_x)/sizeof(concl_x[0]))
  CFuzzySet *okSP, *abSP, *tiXb, *okXb, *tiHT, *okHT;
  double mean_plo = 0.0;
  int idx, rv=1;
  for(idx=1;idx<=NPK;idx++) mean_plo += plo[idx];
  mean_plo /= (double)NPK;
  tiHtX[0] = 0.4*mean_plo; tiHtX[1] = 1.1*mean_plo;
  okHtX[0] = 0.5*mean_plo; okHtX[1] = 1.5*mean_plo;
  okSP = ConstructCFuzzySet( OKSPSZ, okSpX, okSpY, NOHEDGE );
#if defined(AS_WAS)
  abSP = ConstructCFuzzySet( ABSPSZ, abSpX, abSpY, NOHEDGE );
#else
  abSP = ConstructCFuzzySet( ABSPSZ, abSpX, abSpY, VERY );
#endif
  tiXb = ConstructCFuzzySet( TIXbSZ, tiXbX, tiXbY, NOHEDGE );
  okXb = ConstructCFuzzySet( OKXbSZ, okXbX, okXbY, NOHEDGE );
  tiHT = ConstructCFuzzySet( TIHTSZ, tiHtX, tiHtY, NOHEDGE );
  okHT = ConstructCFuzzySet( OKHTSZ, okHtX, okHtY, SOMEWHAT );
  for(idx=1; idx<=NPK; idx++) {
    CFuzzySet *CONCLUSION, *RULE_OK, *RULE_N, *RULE_OMIT;
    double okht, tiht, okLsp, okRsp, oksp, abLsp, abRsp, absp, tixb, okxb;
    double concl, cmpti, modLsp, modRsp, insp = pinS[idx];
    CONCLUSION = ConstructCFuzzySet( CONCL_SZ, concl_x, concl_y, NOHEDGE );
    RULE_OK = ConstructCFuzzySet( OK_SZ, ok_x, ok_y, NOHEDGE );
    RULE_N = ConstructCFuzzySet( N_SZ, n_x, n_y, NOHEDGE );
    RULE_OMIT = ConstructCFuzzySet( OMIT_SZ, omit_x, omit_y, NOHEDGE );
    if(pLsp[idx] < (insp/2.0)) modLsp = insp/2.0;
    else modLsp = fmod( pLsp[idx], insp );
    if(pRsp[idx] < (insp/2.0)) modRsp = insp/2.0;
    else modRsp = fmod( pRsp[idx], insp );
    if(0.0 == insp) {
      DestructCFuzzySet(CONCLUSION);
      DestructCFuzzySet(RULE_OK);
      DestructCFuzzySet(RULE_N);
      DestructCFuzzySet(RULE_OMIT);
      rv = 0;
      break;
    }
    modLsp /= insp; modRsp /= insp;
    okLsp = okSP->membership( okSP, modLsp );
    okRsp = okSP->membership( okSP, modRsp );
    abLsp = abSP->membership( abSP, modLsp );
    abRsp = abSP->membership( abSP, modRsp );
    oksp = OR(okLsp,okRsp);
    absp = OR(abLsp,abRsp);
    tixb = tiXb->membership( tiXb, pxb[idx] );
    okxb = okXb->membership( okXb, pxb[idx] );
    tiht = tiHT->membership( tiHT, pht[idx] );
    okht = okHT->membership( okHT, pht[idx] );
#if defined(AS_WAS)
    RULE_OK->scale( RULE_OK, AND(okxb,okht));
#else
    RULE_OK->scale( RULE_OK, AND(okxb,OR(okht,oksp)) );
#endif
    RULE_N->scale( RULE_N, AND(tixb,OR(okht,AND(oksp,tiht))));
    RULE_OMIT->scale( RULE_OMIT, AND(tiht,absp) );
    CONCLUSION->cj( CONCLUSION, RULE_OK, DISJ );
    CONCLUSION->cj( CONCLUSION, RULE_N, DISJ );
    CONCLUSION->cj( CONCLUSION, RULE_OMIT, DISJ );
    (void)CONCLUSION->fcentroid( CONCLUSION, &concl );
    cmpti = CONCLUSION->compatIndex( CONCLUSION );
    output[idx][1] = (float)concl;
    output[idx][2] = (float)cmpti;
    DestructCFuzzySet(CONCLUSION);
    DestructCFuzzySet(RULE_OK);
    DestructCFuzzySet(RULE_N);
    DestructCFuzzySet(RULE_OMIT);
  }
  DestructCFuzzySet(okSP); DestructCFuzzySet(abSP);
  DestructCFuzzySet(okXb); DestructCFuzzySet(tiXb);
  DestructCFuzzySet(okHT); DestructCFuzzySet(tiHT);
  return rv;
}
#if defined(SA)
void main(int argc, char* argv[])
{
#if 1
# define HTINC 0.01
# define SPINC 1.0
# define XbINC 0.2
#else
# define HTINC 0.1
# define SPINC 1.0
# define XbINC 0.5
#endif
# define OK_IDX 0
# define N_IDX 1
# define OMIT_IDX 2
  double normSP = 12.0;
  double ht, sp, xb, n[4];
  n[0] = n[1] = n[2] = n[3] = 0.0;
  printf("Sweeping [ht] from 0.0 to 0.2 in steps of %4.2f\n",HTINC);
  printf("Sweeping [sp] from 0.0 to %4.1f in steps of %4.2f\n",normSP,SPINC);
  printf("Sweeping [xb] from 1.0 to 2.0 in steps of %4.2f\n",XbINC);
  for(ht = 0.0; ht <= 0.2; ht += HTINC) {
    for(sp = 0.0; sp <= normSP; sp += SPINC)
      for(xb = 1.0; xb <= 2.0; xb += XbINC) {
        double out[3], ci, dfz;
        char const* CALL;
        ftn_omitokn( &normSP, 1, &ht, &sp, &sp, &xb, out, &ci, &dfz);
        printf("\n{ht,sp,xb} : {%5.2f,%5.2f,%5.2f} -> ", ht,sp,xb);
        printf(" {OK,N,OMIT} : {%3.2f,%3.2f,%3.2f}", out[OK_IDX],out[N_IDX],out[OMIT_IDX]);
        if(dfz <= 0.35) {
          CALL = " OK"; n[0] += 1.0;
        } else if(dfz >= 0.65) {
          CALL = "OMIT"; n[1] += 1.0;
        } else {
          CALL = " N"; n[2] += 1.0;
        }
        n[3] += 1.0;
        printf(" %s (dfz=%4.3f,%3.2f)",CALL,dfz,ci);
      }
    printf("\n");
  }
  printf("%3.2f%% OK, %3.2f%% N, %3.2f%% OMIT in %5.0f tests\n", n[0]/n[3], n[2]/n[3], n[1]/n[3], n[3]);
}
#endif
#if defined(USE_WSC)
# include <ab/wsc.h>
#else
# include <malloc.h>
#endif
#include <basecall/fuzzyset.h>
static double membership( struct CFuzzySet const* pcfz, double pt )
{
  double yy;
  if(0 == pcfz->n) yy = 0.0;
  else if(pt <= pcfz->x[0]) yy = pcfz->y[0];
  else if(pt >= pcfz->x[pcfz->n-1]) yy = pcfz->y[pcfz->n-1];
  else {
    double yh, yhm1;
    int lo, mid, hi;
    lo = 0; hi = pcfz->n-1;
    for(;;) {
      mid = (lo+hi)/2;
      if(mid == lo) break;
      else if(pcfz->x[mid] < pt) lo = mid;
      else hi = mid;
    }
    yh = pcfz->y[hi]; yhm1 = pcfz->y[hi-1];
    yy = yhm1 + (pt - pcfz->x[hi-1])/(pcfz->x[hi] - pcfz->x[hi-1])*(yh - yhm1);
  }
  return pcfz->d_phedge( yy );
}
static double compatIndex( struct CFuzzySet const* pcfz )
{
  double ci = 0.0;
  int idx;
  for(idx = 0; idx < pcfz->n; idx++)
    if(pcfz->y[idx] > ci) ci = pcfz->y[idx];
  return pcfz->d_phedge( ci );
}
static int invalid(struct CFuzzySet const* pcfz )
{
  return !pcfz->n;
}
static void print( struct CFuzzySet const* pcfz, char const* pname)
{
  int idx;
  printf("\nCFuzzySet:[%s]\n x y",pname);
  for(idx = 0; idx < pcfz->n; idx++)
    printf("\n %8.3lf %8.3lf",pcfz->x[idx],pcfz->d_phedge(pcfz->y[idx]));
  printf("\n");
}
static void negate( struct CFuzzySet* pcfz )
{
  int idx;
  for(idx = 0; idx < pcfz->n; idx++) pcfz->y[idx] = 1.0 - pcfz->y[idx];
}
static void scale( struct CFuzzySet* pcfz, double fac )
{
  int idx;
  for(idx = 0; idx < pcfz->n; idx++) pcfz->y[idx] *= fac;
}
static void cj_out( double x, double y, double* xn, double* yn, int* nn )
{
  double d;
  if((*nn > 0) && (fabs(x-xn[*nn-1]) < 1.e-20) && (fabs(y-yn[*nn-1]) < 1.e-20)) return;
  if(*nn > 1) {
    int nm1 = *nn-1, nm2 = *nn-2;
    d = xn[nm1]*(y-yn[nm2]) + yn[nm1]*(xn[nm2]-x) + x*yn[nm2] - y*xn[nm2];
    if(fabs(d) < 1.e-10) --*nn;
  }
  xn[*nn] = x; yn[*nn] = y;
  ++*nn;
}
static void intsec( double x1, double y1, double x2, double y2, double y3, double y4, double *xint, double *yint)
{
  double den;
  den = y1 - y2 - y3 + y4;
  *xint = (x2*y1 - x1*y2 - x2*y3 + x1*y4) / den;
  *yint = (y1*y4 - y2*y3) / den;
}
static void cj(struct CFuzzySet* pcfz, struct CFuzzySet const* s, LOGICAL_OPERATOR cjdj)
{
  int nn, vertex0, vertex1, use_func_0;
  double *xn, *yn, xl, xr, y0l, y0r, y1l, y1r, xint, yint;
  double next_x0, next_y0, next_x1, next_y1, rightmost_x, frac;
  double (*h1)(double y), (*h2)(double y);
  int idx;
  h1 = pcfz->d_phedge; h2 = s->d_phedge;
  if(0 == pcfz->n) return;
  if(0 == s->n) {
    if(NULL != pcfz->x) {
#if defined(USE_WSC)
      FreeMemory( pcfz->x );
#else
      free( pcfz->x );
#endif
      pcfz->x = NULL;
    }
    pcfz->n = 0;
    return;
  }
#if defined(USE_WSC)
  xn = NULL;
  (void)fNewMemory((void**)&xn,sizeof(double)*4*(pcfz->n+s->n));
#else
  xn = (double*)malloc(sizeof(double)*4*(pcfz->n+s->n));
#endif
  if(NULL == xn) {
    pcfz->n = 0;
    if(NULL != pcfz->x) {
#if defined(USE_WSC)
      FreeMemory( pcfz->x );
#else
      free( pcfz->x );
#endif
      pcfz->x = NULL;
    }
    return;
  }
  yn = &xn[ 2*(pcfz->n+s->n) ];
  cjdj = (DISJ != cjdj)? CONJ: DISJ;
  if(pcfz->x[0] < s->x[0]) {
    xl = pcfz->x[0]; vertex0 = 1; vertex1 = 0;
  } else if(pcfz->x[0] > s->x[0]) {
    xl = s->x[0]; vertex0 = 0; vertex1 = 1;
  } else {
    xl = pcfz->x[0]; vertex0 = vertex1 = 1;
  }
  y0l = h1(pcfz->y[0]); y1l = h2(s->y[0]);
  nn = 0;
  cj_out( xl, (cjdj^(y0l>y1l))? y0l: y1l, xn, yn, &nn );
  if(pcfz->x[pcfz->n-1] >= s->x[s->n-1]) rightmost_x = pcfz->x[pcfz->n-1];
  else rightmost_x = s->x[s->n-1];
  if(vertex0 < pcfz->n) {
    next_x0 = pcfz->x[vertex0]; next_y0 = h1(pcfz->y[vertex0]);
  } else {
    next_x0 = rightmost_x; next_y0 = h1(pcfz->y[vertex0-1]);
  }
  if(vertex1 < s->n) {
    next_x1 = s->x[vertex1]; next_y1 = h2(s->y[vertex1]);
  } else {
    next_x1 = rightmost_x; next_y1 = h2(s->y[vertex1-1]);
  }
  while((vertex0 < pcfz->n) || (vertex1 < s->n)) {
    if(next_x0 < next_x1) use_func_0 = 1;
    else if(next_x1 < next_x0) use_func_0 = 0;
    else use_func_0 = (vertex0 < pcfz->n);
    if(1 == use_func_0) {
      xr = next_x0; y0r = next_y0;
      if(next_x1 == xl) frac = 0.0;
      else frac = (xr-xl)/(next_x1-xl);
      y1r = y1l + frac*(next_y1-y1l);
      if(++vertex0 < pcfz->n) {
        next_x0 = pcfz->x[vertex0]; next_y0 = h1(pcfz->y[vertex0]);
      } else next_x0 = rightmost_x;
    } else {
      xr = next_x1; y1r = next_y1;
      if(next_x0 == xl) frac = 0.0;
      else frac = (xr-xl)/(next_x0 - xl);
      y0r = y0l + frac*(next_y0-y0l);
      if(++vertex1 < s->n) {
        next_x1 = s->x[vertex1]; next_y1 = h2(s->y[vertex1]);
      } else next_x1 = rightmost_x;
    }
    if((xr>xl) && ((y0l-y1l)*(y0r-y1r) < 0.0)) {
      intsec( xl, y0l, xr, y0r, y1l, y1r, &xint, &yint );
      cj_out( xint, yint, xn, yn, &nn);
    }
    cj_out( xr, (cjdj ^ (y0r>y1r))? y0r: y1r, xn, yn, &nn );
    xl = xr; y0l = y0r; y1l = y1r;
  }
  if(NULL != pcfz->x) {
#if defined(USE_WSC)
    FreeMemory( pcfz->x );
#else
    free( pcfz->x );
#endif
    pcfz->x = NULL;
  }
#if defined(USE_WSC)
  pcfz->x = NULL;
  (void)fNewMemory((void**)&pcfz->x,sizeof(double)*2*nn);
#else
  pcfz->x = (double*)malloc(sizeof(double)*2*nn);
#endif
  pcfz->y = pcfz->x + nn;
  pcfz->n = nn;
  memcpy( pcfz->x, xn, pcfz->n*sizeof(double) );
  for(idx = 0; idx < nn; idx++) pcfz->y[idx] = pcfz->d_punhedge( yn[idx] );
  if(NULL != xn) {
#if defined(USE_WSC)
    FreeMemory( xn );
#else
    free( xn );
#endif
    xn = NULL;
  }
}
static int fcentroid( struct CFuzzySet const* pcfz, double* pcentroid)
{
  int rv = 0;
  *pcentroid = 0.0;
  if(pcfz->n >= 2) {
    double numer = 0.0, denom = 0.0;
    int idx;
    for(idx = 1; idx < pcfz->n; idx++) {
      double t, hym1, hy;
      hy = pcfz->d_phedge( pcfz->y[idx] );
      hym1 = pcfz->d_phedge( pcfz->y[idx-1] );
      t = hym1*(2.0*pcfz->x[idx-1] + pcfz->x[idx]) + hy*(pcfz->x[idx-1] + 2.0*pcfz->x[idx]);
      numer += t*(pcfz->x[idx] - pcfz->x[idx-1]);
      denom += (pcfz->x[idx]-pcfz->x[idx-1]) * (hym1 + hy);
    }
    if(fabs(denom) > 1.e-20) {
      *pcentroid = numer / (3.0*denom);
      rv = 1;
    }
  }
  return rv;
} static double return_y(double yval) { return yval; } static double square_y(double yval) { return yval* yval; } static double sqrt_y(double yval) { return sqrt(yval); } static double contint(double yval) { return (yval>=0.5)? sqrt(yval): (yval*yval);
} static double contdeint(double yval) { return (yval>=sqrt(0.5))? (yval*yval): sqrt(yval); } CFuzzySet*
ConstructCFuzzySet(int npts, double const* xpts, double const* ypts, HEDGE ht)
{
  CFuzzySet* pcfz = NULL;
#if defined(USE_WSC)
  (void)fNewMemory((void**)&pcfz,sizeof(*pcfz));
#else
  pcfz = (CFuzzySet*)malloc(sizeof(*pcfz));
#endif
  if(NULL != pcfz) {
    if((pcfz->n = npts) < 1) {
      pcfz->n = 0;
    } else {
      pcfz->fcentroid = fcentroid;
      pcfz->cj = cj;
      pcfz->invalid = invalid;
      pcfz->membership = membership;
      pcfz->compatIndex = compatIndex;
      pcfz->negate = negate;
      pcfz->print = print;
      pcfz->scale = scale;
      switch(pcfz->d_ht=ht) {
        default: pcfz->d_ht = NOHEDGE;
        case NOHEDGE: pcfz->d_punhedge = return_y; pcfz->d_phedge = return_y; break;
        case VERY: pcfz->d_phedge = square_y; pcfz->d_punhedge = sqrt_y; break;
        case SOMEWHAT: pcfz->d_phedge = sqrt_y; pcfz->d_punhedge = square_y; break;
        case CONTINT: pcfz->d_phedge = contint; pcfz->d_punhedge = contdeint; break;
      }
#if defined(USE_WSC)
      pcfz->x = NULL;
      fNewMemory((void**)&pcfz->x,sizeof(double)*2*npts);
#else
      pcfz->x = (double*)malloc(sizeof(double)*2*npts);
#endif
      if(NULL == pcfz->x) {
        pcfz->n = 0;
      } else {
        pcfz->y = &pcfz->x[ pcfz->n ];
        memcpy(pcfz->x,xpts,npts*sizeof(double));
        memcpy(pcfz->y,ypts,npts*sizeof(double));
      }
    }
  }
  return pcfz;
}
void
DestructCFuzzySet(struct CFuzzySet* pcfz)
{
#if defined(USE_WSC)
  if(NULL != pcfz) {
    if(NULL != pcfz->x) {
      FreeMemory( pcfz->x );
      pcfz->x = NULL;
    }
    FreeMemory( pcfz );
  }
#else
  if(NULL != pcfz) {
    if(NULL != pcfz->x) {
      free( pcfz->x );
      pcfz->x = NULL;
    }
    free( pcfz );
  }
#endif
} #ifdefined(SA) static void ftn_omitokn( double med sp ) { static double t_SpX[] = {0.0, 0.25, 0.5, 0.75, 1.0}; static double nSpYf] = {1.0, 0.50, 0.0, 0.50, 1.0};
# define SPSZ (sizeof(nSpY)/sizeof(nSpYfO])) static double loSnXf] = { 1.0, 1.4, 1.8); static double loSnYf] = { 1.0, 0.5, 0.0};
# define LoSNSZ (sizeof(loSnX)/sizeof(loSnX[0])) static double loHtXf] = {0.03,0.05,0.07}; static double loHtYf] = { 1.00,0.50,0.00}; # define LoHTSZ (sizeof(loHtX)/sizeof(loHtX[0])) static double okxyf] = {0.0, 1.0} ; CFuzzySet *vNSP, *swNSP, *vLoSN, *swLoSN, *vLoHT, *swLoHT; double ht, Lsp,Rsp, sn, nSpXf SPSZ ]; int idx; for(idx = 0; idx < SPSZ; idx ++) nSpXfidx] = t_SpX[idx]*med_sp; vNSP = ConstructCFuzzySet( SPSZ, nSpX, nSpY, VERY ); swNSP = ConstructCFuzzySet( SPSZ, nSpX, nSpY, SOMEWHAT ); vLoSN = ConstructCFuzzySet( LoSNSZ, loSnX, loSnY, VERY ), swLoSN = ConstructCFuzzySet( LoSNSZ, loSnX, loSnY, SOMEWHAT ), vLoHT = ConstructCFuzzySet( LoHTSZ, loHtX, loHtY, VERY ); swLoHT = ConstructCFuzzySet( LoHTSZ, loHtX, loHtY, SOMEWHAT );
# define HTINC 0.01
# define SPINC 1.0
# define SNINC 0.25 for(ht = 0.02; ht <= loHtX[LoHTSZ-l ]; ht += HTINC) { for(Lsp = nSpXfO]; Lsp <= med_sp; Lsp += SPINC) { double modLsp = fmod(Lsp,med_sp); for(Rsp = nSpXfO]; Rsp <= med_sp; Rsp += SPINC) { double modRsp = fmod(Rsp,med_sp); for(sn = loSnXfO]; sn <= loSnX[LoSNSZ-l]; sn += SNINC) {
CFuzzySet *RULE1 , *RULElb, *RULE2, *RULE3, *RULE3b; double vloHT,swloHT,nswloHT, vnLSP,vnRSP,vnSP,nvnSP, swnLSP,swnRSP,swnSP,nswnSP, swloSN,nswloSN,vloSN, outf 3 ]; double maxci = 0.0, ci[3]; int jdx, maxjdx = 0;
RULE1 = ConstructCFuzzySet( 2, okxy, okxy, NOHEDGE ); RULE1 b = ConstructCFuzzySet(2, okxy, okxy.NOHEDGE ),
RULE2 = ConstructCFuzzySet( 2, okxy, okxy,NOHEDGE ); RULE3 = ConstructCFuzzySet( 2, okxy, okxy .NOHEDGE ), RULE3b = ConstructCFuzzySet(2, okxy, okxy.NOHEDGE ), vnLSP = vNSP->membership( vNSP, modLsp ); vnRSP = vNSP->membership( vNSP, modRsp ); vnSP = AND(vnLSP,vnRSP); swnLSP = swNSP->membership( swNSP, modLsp ); swnRSP = swNSP->membership( swNSP, modRsp ); swnSP = AND(swnLSP,swnRSP); vloSN = vLoSN->membership( vLoSN, sn ); swloSN = swLoSN->membership( swLoSN, sn ); vloHT = vLoHT->membership( vLoHT, ht ); swloHT = swLoHT->membership( swLoHT, ht ); nvnSP = (1.0 - vnSP); nswnSP = (1.0 - swnSP); nswloSN = (1.0 - swloSN); nswloHT = (1.0 - swloHT); RULEl->scale( RULE1, AND( nswloSN, vnSP ) ); RULElb->scale( RULElb, AND( nswloSN, nswloHT ) ); RULE 1 ->cj(RULE 1.RULE 1 b.DIS J); RULE2->scale( RULE2, AND(vloHT,nswnSP ) );
RULE3->scale( RULE3, AND(vloSN,vnSP) ); RULE3b->scale( RULE3b, AND(vloSN,nswloHT) ); RULE3->cj(RULE3,RULE3b,DISJ); if( 1 =RULE 1 ->fcentroid(RULE 1 ,&out[0])) outfO] = RULE 1 ->membership(RULE 1 ,out[0]); if( 1 ==RULE3->fcentroid(RULE3,&out[ 1 ])) outfl] = RULE3->membership(RULE3,out[l]); if( 1 =RULE2->fcentroid(RULE2,&outf2])) out[2] = RULE2->membership(RULE2.out[2]); printf("\n{ht,Lsp,Rsp,sn}:{%3.21f,%2.01f,%2.01f,%3.11f) -> ", ht,Lsp,Rsp,sn); printf("{OK,N.OMIT} :{%3.21f,%3.21f,%3.21f}", out[0],out[l],out[2]); ci[0] = RULEl->compatIndex(RULEl); cifl] = RULE2->compatIndex(RULE2); ci[2] = RULE3->compatIndex(RULE3); for(jdx = 0; jdx < 3 ; jdx++) if(maxci < cifjdx]) maxci = cifmaxjdx=jdx]; printf(" (CI=%3.21f)",maxci);
if(maxci < 0.5) { printf("\n\tLow Compatability Index: CLASS #%d w MAX=%3.21f ',
++maxjdx.maxci); printf("\n\tvnLSP=%3.21f vnRSP=%3.21fvnSP=%3.21f nvnSP=%3.21f. vnLSP,vnRSP,vnSP,nvnSP); printf("\n\tswnLSP=%3.21f swnRSP=%3.21f swnSP=%3.21fnswnSP=%3.21f, swnLSP,swnRSP,swnSP,nswnSP); printf("\n \tvloHT=%3.21f swloHT=%3.21f nswloHT=%3.21f ', vloHT,swloHT,nswloHT); printf("\n\tvloSN=%3.21f swloSN=%3.21f nswloSN=%3.21f, vloSN,swloSN,nswloSN); printf("\n\tRULEl : IswloSN & vnSP"); printf("\n\tRULElb: IswloSN & IswloHT"); printf("\n\tRULE2 : vloHT & IswnSP"); printf("\n\tRULE3 : vloSN & vnSP"); printf("\n\tRULE3b: vloSN & IswloHT"); RULE l->print(RULEl, "RULE 1");
RULE 1 b->print(RULE 1 b,"RULEl b"); RULE2->print(RULE2,"RULE2"); RULE3->print(RULE3,"RULE3"); RULE3b->print(RULE3b,"RULE3b"); vNSP->print(vNSP,"vNSP"); swNSP->print(swNSP."swNSP"); vLoSN->print(vLoSN,"vLoSN"); swLoSN->print(swLoSN,"swLoSN"); vLoHT->print(vLoHT,"vLoHT"); swLoHT->print(swLoHT,"swLoHT"); }
#endif
DestructCFuzzy Set(RULE 1 ); DestructCFuzzySet(RULE 1 b); DestructCFuzzySet(RULE2); DestructCFuzzy Set(RULE3);
DestructCFuzzySet(RULE3b);
} } } printf("\n");
} DestructCFuzzySet(vNSP); DestructCFuzzySet(swNSP); DestructCFuzzySet(vLoSN); DestructCFuzzySet(swLoSN); DestructCFuzzySet(vLoHT); DestructCFuzzySet(swLoHT);
} void main(int argc, char* argvf]) { double normSP; if(2 != argc) { fprintf(stderr,"usage: %s normSpacing\n",argv[0]); exit(l); } sscanf(argv[l],"%lf',&normSP); ftn omitokn(normSP); exit(O);
} #endif
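The two numerical kernels of the fuzzy-set listing above are the piecewise-linear membership lookup and the centroid defuzzification. The following is a minimal sketch of both on plain arrays, assuming strictly increasing x values and no hedge; the names `pl_membership` and `pl_centroid` are hypothetical and are not part of the listing.

```c
#include <assert.h>
#include <math.h>

/* Membership grade of pt in a piecewise-linear set given by n vertices
 * (x[i], y[i]); x is assumed strictly increasing. Outside the vertex
 * range the end values are held constant. */
double pl_membership(int n, const double *x, const double *y, double pt)
{
    int lo, hi;

    assert(n >= 1);
    if (pt <= x[0])     return y[0];
    if (pt >= x[n - 1]) return y[n - 1];

    /* Binary search keeping the invariant x[lo] < pt <= x[hi]. */
    lo = 0;
    hi = n - 1;
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (x[mid] < pt) lo = mid; else hi = mid;
    }
    /* Linear interpolation on the bracketing segment. */
    return y[lo] + (pt - x[lo]) / (x[hi] - x[lo]) * (y[hi] - y[lo]);
}

/* Centroid (x-coordinate of the centre of area) of the region under the
 * polyline. Returns 1 and stores the centroid in *c on success, or
 * returns 0 when the area is effectively zero or n < 2. */
int pl_centroid(int n, const double *x, const double *y, double *c)
{
    double numer = 0.0, denom = 0.0;
    int i;

    *c = 0.0;
    if (n < 2) return 0;
    for (i = 1; i < n; i++) {
        double dx = x[i] - x[i - 1];
        /* Closed-form moment and area of one trapezoid (times 6 and 2). */
        numer += dx * (y[i - 1] * (2.0 * x[i - 1] + x[i])
                     + y[i]     * (x[i - 1] + 2.0 * x[i]));
        denom += dx * (y[i - 1] + y[i]);
    }
    if (fabs(denom) <= 1e-20) return 0;
    *c = numer / (3.0 * denom);
    return 1;
}
```

For the medium-gap trapezoid of the gap-check listing, with x = {0.25, 0.3, 0.45, 0.57} and y = {0, 1, 1, 0}, a query halfway up the rising edge should give a grade of about 0.5, and a symmetric triangle should defuzzify to its apex.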
/**********************************************************************
 * FILE: CorrCoef.cxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah */
#include <nrc/nr.hxx>
double corrcoef( float const* xp, float const* yp, int N )
{
  struct S { double xBar, yBar, xy2, xx2, yy2; } s;
  int idx;
  if(N<2) return 0.0;
  s.xBar = s.yBar = 0.0;
  for(idx=0; idx < N; idx++) {
    s.xBar += double(xp[idx]);
    s.yBar += double(yp[idx]);
  }
  s.xBar /= double(N);
  s.yBar /= double(N);
  s.xy2 = s.xx2 = s.yy2 = 0.0;
  for(idx=0; idx < N; idx++) {
    double dx, dy;
    dx = (double(xp[idx]) - s.xBar);
    dy = (double(yp[idx]) - s.yBar);
    s.xy2 += dx*dy;
    s.xx2 += dx*dx;
    s.yy2 += dy*dy;
  }
  s.xy2 /= (double(N)-1.0);
  s.xx2 /= (double(N)-1.0);
  s.yy2 /= (double(N)-1.0);
  if(0.0==s.xx2 || 0.0==s.yy2) return 0.0;
  else return s.xy2/::sqrt(s.xx2*s.yy2);
}
/**********************************************************************
 * FILE: fuzzyset.h
 * Copyright © 1996, University of Utah */
#if !defined(_FUZZYSET_H_)
#define _FUZZYSET_H_
#include <math.h>
#include <stdio.h>
#include <string.h>
#include <malloc.h>
#if defined(__cplusplus)
extern "C" {
#endif
typedef enum { DISJ, CONJ } LOGICAL_OPERATOR;
typedef enum { NOHEDGE, SOMEWHAT, VERY, CONTINT } HEDGE;
#define AND(v1,v2) ((v1<v2)?(v1):(v2))
#define OR(v1,v2) ((v1>v2)?(v1):(v2))
#define NOT(v) (1-(v))
typedef struct CFuzzySet {
  int (*fcentroid)( struct CFuzzySet const* pcfz, double* pcentroid );
  void (*cj)( struct CFuzzySet* pcfz, struct CFuzzySet const* ps, LOGICAL_OPERATOR cjdj );
  int (*invalid)( struct CFuzzySet const* pcfz );
  double (*membership)( struct CFuzzySet const* pcfz, double pt );
  double (*compatIndex)( struct CFuzzySet const* pcfz );
  void (*negate)( struct CFuzzySet* pcfz );
  void (*print)( struct CFuzzySet const* pcfz, char const* pname );
  void (*scale)( struct CFuzzySet* pcfz, double fac );
  int n;
  double* x;
  double* y;
  HEDGE d_ht;
  double (*d_phedge)(double yval);
  double (*d_punhedge)(double yval);
} CFuzzySet;
CFuzzySet* ConstructCFuzzySet(int npts, double const* xpts, double const* ypts, HEDGE ht);
void DestructCFuzzySet(CFuzzySet* pcfz);
#if defined(__cplusplus)
}
#endif
#endif
/**********************************************************************
 * FILE: gaussj.cxx
 * TYPIST: Andy Marks
 * Copyright © 1996 University of Utah */
#include <math.h>
#include <nrc/nrutil.hxx>
#define SWAP(a,b) { temp=(a); (a)=(b); (b)=temp; }
#if !defined(SA)
void gaussj(float** a, int n, float** b, int m)
{
  int *indxc, *indxr, *ipiv;
  int i, icol, irow, j, k, el, ll;
  float pivinv, temp;
  indxc = ivector(1,n);
  indxr = ivector(1,n);
  ipiv = ivector(1,n);
  for(j=1;j<=n;j++) ipiv[j] = 0;
  for(i=1;i<=n;i++) {
    float big = 0.0f;
    for(j=1;j<=n;j++)
      if(1 != ipiv[j])
        for(k=1;k<=n;k++) {
          if(0==ipiv[k]) {
            if(fabs(a[j][k])>=big) {
              big = float(fabs(a[j][k]));
              irow = j;
              icol = k;
            }
          } else if(ipiv[k]>1) nrerror( "gaussj: Singular Matrix-1" );
        }
    ++(ipiv[icol]);
    if(irow != icol) {
      for(el=1;el<=n;el++) SWAP(a[irow][el],a[icol][el]);
      for(el=1;el<=m;el++) SWAP(b[irow][el],b[icol][el]);
    }
    indxr[i]=irow;
    indxc[i]=icol;
    if(0.0f == a[icol][icol]) nrerror("gaussj: Singular Matrix-2");
    pivinv = 1.0f/a[icol][icol];
    a[icol][icol]=1.0f;
    for(el=1;el<=n;el++) a[icol][el] *= pivinv;
    for(el=1;el<=m;el++) b[icol][el] *= pivinv;
    for(ll=1;ll<=n;ll++)
      if(ll != icol) {
        float dum = a[ll][icol];
        a[ll][icol]=0.0f;
        for(el=1;el<=n;el++) a[ll][el] -= a[icol][el]*dum;
        for(el=1;el<=m;el++) b[ll][el] -= b[icol][el]*dum;
      }
  }
  for(el=n; el>=1; el--) {
    if(indxr[el] != indxc[el])
      for(k=1;k<=n;k++) SWAP( a[k][indxr[el]], a[k][indxc[el]] );
  }
  free_ivector(ipiv,1,n);
  free_ivector(indxr,1,n);
  free_ivector(indxc,1,n);
}
#endif
#ifdefined(SA)
#include <stdio.h>
#include <nrc/nr.hxx> int main(int argc, char* argvf]) { float **a. **b, **aorig, **borig, **aprod, **bprod; int r, c, s; a = matrix(l,4,l,4); aorig = matrix( 1 ,4, 1 ,4); aprod = matrix( 1 ,4, 1 ,4); b = matrix(l,4,l,2); borig = matrix(l ,4,1,2); bprod = matrix( 1 ,4, 1 ,2); a[l][l] = 7.0f; a[l][2] = 8.0f; a[l][3] = 9.0f; a[l][4] = lO.Of; a[2]fl] = 6.0f; a[2]f2] = l.Of; af2][3] = 2.0f; a[2][4] = 1 l.Of; a[3]fl]= 5.0f; a[3][2] = 4.0f; a[3][3] = 3.0f; a[3][4] = 12.0f; a[4][l] = 16.0f; a[4][2] = 15.0f; a[4][3] = 14.0f; a[4][4] = 13.0f; for(r=l;r<=4;r-H-) for(c=l;c<=4;c-H-) aorig[r][c] = a[r][c]; b[l][l]= 50.0f;b[l][2]= 40.0f; b[2][l] = 122.0f;b[2][2]= 96.0f; b[3][l] = 194.0f; b[3][2] = 152.0f; b[4][l ] = 266.0f; b[4][2] = 208.0f; for(r=l;r<=4;r++) for(c=l;c<=2;c-H-) borig[r][c] = b[r][c]; gaussj( a, 4, b, 2); printf("inv(A):\n"); for(r=l;r<=4;r++) { for(c=l:c<=4;c-H-) printf("%9.4f".a[r][c]); printf("\n"); } printf("S:\n"); for(r=l:r<=4;r++) { printf("\t"); for(c=l ;c<=2;c-H-) printf("%9.4f ",b[r][c]); printf("\n");
} for(r=l ;r<=4;r++) for(c=l;c<=4;c-H-) { float sum = O.Of; for(s=l;s<=4;s++) sum += a[r][s]*aorig[s][c]; aprod[r][c] = sum;
} printf("inv(A)*A=I?\n"); for(r=l;r<=4;r++) { printf("\t"); for(c=l;c<=4;c-H-) printf("%9.4f ",aprod[r][c]); printf("\n"); } for(r=l :r<=4;r++) for(c=l;c<=2;c-H-) { float sum = O.Of; for(s=l;s<=4;s++) sum += aorig[r][s]*b[s][c]; bprod[r][c] = sum;
}
Figure imgf000083_0001
for(r=l;r<=4;r++) { printf("\t"); for(c=l :c<=2;c-H-) printf("%9.4f",bprod[r][c]); printf("\n");
} for(r=l;r<=4;r++)
Figure imgf000084_0001
float sum = O.Of; for(s=l;s<=4;s-H-) sum += a[r][s]*borig[s][c]; bprod[r][c] ~ sum; } printf("inv(A)+B=S?\n"); for(r=l;r<=4;r++) { prιntf("\t"); for(c=l;c<=2;c-H-) printf("%9.4f ",bprod[r][c]); printf("\n");
} free_matrix(a, 1 ,4, 1 ,4); free_matrix(aorig, 1 ,4, 1 ,4); free_matrix(aprod.1.4,1,4); free_matrix(b, 1 ,4, 1 ,2); free_matrix(borig, 1 ,4, 1 ,2); free_matrix(bprod.1 ,4, 1 ,2); return 0; }
#endif
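For readers who want the elimination idea without the 1-based matrix helpers, the following is a simplified sketch of Gauss-Jordan elimination. It is not the routine listed above: it uses partial (row) pivoting rather than full pivoting, a single right-hand side, 0-based row-major storage, and it does not form the inverse. The name `gauss_jordan_solve` is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Solves A x = b in place for an n-by-n row-major matrix a.
 * On success returns 0 and leaves the solution in b (a becomes the
 * identity). Returns -1 if a zero pivot shows the matrix is singular. */
int gauss_jordan_solve(int n, double *a, double *b)
{
    int col, row, k;

    assert(n >= 1 && a != 0 && b != 0);
    for (col = 0; col < n; col++) {
        /* Choose the row at or below the diagonal with the largest entry. */
        int piv = col;
        double best = fabs(a[col * n + col]);
        for (row = col + 1; row < n; row++) {
            double v = fabs(a[row * n + col]);
            if (v > best) { best = v; piv = row; }
        }
        if (best == 0.0) return -1;

        /* Swap the pivot row into place. */
        if (piv != col) {
            double t;
            for (k = 0; k < n; k++) {
                t = a[col * n + k];
                a[col * n + k] = a[piv * n + k];
                a[piv * n + k] = t;
            }
            t = b[col]; b[col] = b[piv]; b[piv] = t;
        }

        /* Normalise the pivot row so the diagonal entry is 1. */
        {
            double inv = 1.0 / a[col * n + col];
            for (k = 0; k < n; k++) a[col * n + k] *= inv;
            b[col] *= inv;
        }

        /* Eliminate this column from every other row. */
        for (row = 0; row < n; row++) {
            if (row != col) {
                double f = a[row * n + col];
                for (k = 0; k < n; k++) a[row * n + k] -= f * a[col * n + k];
                b[row] -= f * b[col];
            }
        }
    }
    return 0;
}
```

Eliminating above as well as below the pivot is what distinguishes Gauss-Jordan from plain Gaussian elimination: no back-substitution pass is needed once the loop finishes.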
/**********************************************************************
 * FILE: mb.cxx -- stands for "MyBusiness"
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah */
#include <basecall/mb.hxx>
#undef AS_WAS
static int const BGNLFTR=13, ENDLFTR=23;
MB::MB( int fluor ) : minFreq_(1.0), minFreqLane_(0), lastFbw_(0)
{
  bdStatics( fluor );
  if(fluor) {
    static double const CROSSOVER = 0.23;
    static int const MAXFBW = 100;
    double K = 2.0*::sqrt(::log(CROSSOVER) / -0.5);
    for(int pdx=SMLGAP;pdx<=BIGGAP;pdx++) {
      fbwlut_[pdx] = int( 0.5 + (K/double(pdx))*double(NPTS)/(2.0*NR_PI) );
      if(fbwlut_[pdx] > MAXFBW) fbwlut_[pdx] = MAXFBW;
    }
  } else {
    static double const CROSSOVER = 0.20;
    static int const MAXFBW = 100;
    double K = 2.0*::sqrt(::log(CROSSOVER) / -0.5);
    for(int pdx=SMLGAP;pdx<=BIGGAP;pdx++) {
      fbwlut_[pdx] = int( 0.5 + (K/double(pdx))*double(NPTS)/(2.0*NR_PI) );
      if(fbwlut_[pdx] > MAXFBW) fbwlut_[pdx] = MAXFBW;
    }
  }
}
MB::~MB()
{ }
MB::MB( MB const& rhs ) : minFreq_(1.0), minFreqLane_(0), lastFbw_(0)
{
  *this = rhs;
}
MB const& MB::operator=(MB const& rhs)
{
  int idx, jdx;
  lastFbw_ = rhs.lastFbw_;
  for(idx=jdx=1;idx<=NPTS;idx+=1,jdx+=2) {
    ivec_[jdx] = rhs.ivec_[jdx];
    ivec_[jdx+1] = rhs.ivec_[jdx+1];
    imag_[jdx] = rhs.imag_[jdx];
    imag_[jdx+1] = rhs.imag_[jdx+1];
    lifter_[idx] = rhs.lifter_[idx];
    filter_[idx] = rhs.filter_[idx];
    ww_[idx] = rhs.ww_[idx];
  }
  minFreq_ = rhs.minFreq_;
  minFreqLane_ = rhs.minFreqLane_;
  for(idx=0;idx<5;idx++) bandAmpl_[idx] = rhs.bandAmpl_[idx];
  for(idx=SMLGAP;idx<=BIGGAP;idx++) fbwlut_[idx] = rhs.fbwlut_[idx];
  return *this;
}
void MB::bdStatics( int fluor )
{
  double bgnpt, endpt, m, b;
  int sdx;
  lastFbw_ = 0;
  lifter_[1] = lifter_[NPTS] = 0.0;
  lifter_[NPTS/2+1] = 1.0;
  if( fluor ) {
    bgnpt = 7; endpt = 24;
  } else {
    bgnpt = BGNLFTR; endpt = ENDLFTR;
  }
  m = NR_PI/(endpt-bgnpt);
  b = NR_PI/2.0 - m*endpt;
  for(sdx=2;sdx<=NPTS/2;sdx++) {
    if(sdx < bgnpt) lifter_[sdx] = 0.0;
    else if(sdx >= bgnpt && sdx <= endpt) lifter_[sdx] = 0.5 * (1.0 + ::sin( m*double(sdx) + b ));
    else lifter_[sdx] = 1.0;
    lifter_[NPTS-sdx+2] = lifter_[sdx];
  }
  for(int j=1; j<=NPTS; j++) {
    ww_[j] = 2.0*NR_PI/double(NPTS) * (double(j) - double(NPTS/2));
    ww_[j] *= ww_[j];
  }
}
int MB::remune() const
{
#if 1
  return 1;
#else
  static int const OK_YEAR = 96;
  time_t now = time((time_t*)NULL);
  struct tm* ptm = localtime( &now );
  if(OK_YEAR != ptm->tm_year) return 0;
  else if(ptm->tm_mon>=3 && ptm->tm_mon<=11) return 1;
  else return 0;
#endif
}
/**********************************************************************
 * FILE: mb.hxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah */
#ifndef _MB_HXX_
#define _MB_HXX_
#include <basecall/Pkdet.hxx>
#include <basecall/seqrdr.hxx>
#include <basecall/SegRead.hxx>
#if defined(sun)||defined(_WIN32)
# define nint(v) int(((v)>0)?((v)+0.5):((v)-0.5))
#endif
#if defined(_WIN32)
double drand48();
# define isnan _isnan
# define finite _finite
#endif
static const int SMLGAP = 1,
                 BIGGAP = 33;
class MB
{
public:
  enum Status { STS_UNINITD, STS_INITD, STS_NO_MEM };
  MB( int fluor );
  MB( MB const& rhs );
  MB const& operator=(MB const& rhs);
  ~MB();
  int remune() const;
  double minFreq_, bandAmpl_[5];
  int minFreqLane_;
  double ivec_[1+2*NPTS], imag_[1+2*NPTS], filter_[1+NPTS], lifter_[1+NPTS], ww_[1+NPTS];
  int lastFbw_;
  int fbwlut_[1+BIGGAP];
private:
  void bdStatics( int fluor );
  Status status_;
};
#endif
* FILE: Metrics.hxx
AUTHOR: Andy Marks * COPYRIGHT (c) 1996, University of Utah */
#ifndef _METRICS_HXX_ #define _METRICS_HXX_ #if defined(WIN32) class _declspec( dllexport) BandStat #else class BandStat #endif
{ public: BandStat();
~BandStat() { }; void ntnr( int v ) { ntnr_ = v; } void posn( int v ) { posn_ = v; } void hght( float v ) { hght_ = v; } void lowv( float v ) { lowv_ = v; } void xbnd( float v ) { xbnd_ = v; } void shap( float v ) { shap_ = v; } void widt( float v ) { widt_ = v; } void lgap( float v ) { lgap_ = v; } void sgap( float v ) { sgap_ = v; } void buzz( float v ) { buzz_ = v; } void bbgn( int v) { bbgn_ = v; } void bend( int v) { bend_ = v; } void insr( int v ) { insr_ = v; } void call( char v ) { call_ = v; } void qual( float v ) { qual_ = v; } int ntnr() const { return ntnr_; } int posn() const { return posn_; } float hght() const { return hght_; } float iowv() const { return lowv_; } float xbnd() const { return xbnd_; } float shap() const { return shap_; } float widt() const { return widt_; } float lgap() const { return lgap_; } float sgap() const { return sgap_; } float buzz() const { return buzz_; } int bbgn() const { return bbgn_; } int bend() const { return bend_; } int insr() const { return insr_; } int awid() const { return bend_-bbgn_+ 1 ; } float qual() const { return qual_; } float StadenQualO const { return 98.0f*qual_+1.0f; } char call() const { return call_; } void debug() const; private: int ntnr int posn_; int bbgn_; int bend_; int insr_; float hght_; float lowv_; float xbnd_; float shap_; float buzz_; float widt_; float lgap_; float sgap_; char call_; float qual_; } ;
#if defined(WIN32)
class _declspec( dllexport ) BandStatArray
#else
class BandStatArray
#endif
{
public:
    BandStatArray();
    BandStatArray( BandStatArray const& rhs );
    BandStatArray const& operator=( BandStatArray const& rhs );
    ~BandStatArray();
    void init( int len );
    void ntnr( int idx, int v )   { band_[ idx ].ntnr(v); }
    void posn( int idx, int v )   { band_[ idx ].posn(v); }
    void bbgn( int idx, int v )   { band_[ idx ].bbgn(v); }
    void bend( int idx, int v )   { band_[ idx ].bend(v); }
    void insr( int idx, int v )   { band_[ idx ].insr(v); }
    void hght( int idx, float v ) { band_[ idx ].hght(v); }
    void lowv( int idx, float v ) { band_[ idx ].lowv(v); }
    void xbnd( int idx, float v ) { band_[ idx ].xbnd(v); }
    void shap( int idx, float v ) { band_[ idx ].shap(v); }
    void buzz( int idx, float v ) { band_[ idx ].buzz(v); }
    void widt( int idx, float v ) { band_[ idx ].widt(v); }
    void lgap( int idx, float v ) { band_[ idx ].lgap(v); }
    void sgap( int idx, float v ) { band_[ idx ].sgap(v); }
    void qual( int idx, float v ) { band_[ idx ].qual(v); }
    void call( int idx, char v )  { band_[ idx ].call(v); }
    void append( BandStatArray const& rhs, int oselend, int nselbgn );
    int   ntnr( int idx ) const { return band_[idx].ntnr(); }
    int   posn( int idx ) const { return band_[idx].posn(); }
    int   bbgn( int idx ) const { return band_[idx].bbgn(); }
    int   bend( int idx ) const { return band_[idx].bend(); }
    int   insr( int idx ) const { return band_[idx].insr(); }
    float hght( int idx ) const { return band_[idx].hght(); }
    float lowv( int idx ) const { return band_[idx].lowv(); }
    float xbnd( int idx ) const { return band_[idx].xbnd(); }
    float shap( int idx ) const { return band_[idx].shap(); }
    float buzz( int idx ) const { return band_[idx].buzz(); }
    float widt( int idx ) const { return band_[idx].widt(); }
    float lgap( int idx ) const { return band_[idx].lgap(); }
    float sgap( int idx ) const { return band_[idx].sgap(); }
    float qual( int idx ) const { return band_[idx].qual(); }
    float StadenQual( int idx ) const { return band_[idx].StadenQual(); }
    char  call( int idx ) const { return band_[idx].call(); }
    int   len() const { return len_; }
    BandStat& band( int idx ) { return band_[idx]; }
    BandStat const& band( int idx ) const { return band_[idx]; }
    void debug() const;
private:
    int len_;
    BandStat* band_;
};
#endif
/**********************************************************************
 * FILE: nfeeder.cxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#include <basecall/mb.hxx>
#include <nrc/Centroid.hxx>

static const int
    MAXPASSES = 6,
    INPUTSTEP = 1900,
    ENDPT     = (INPUTSTEP+NPTS)/2,
    FROMEND   = (NPTS-INPUTSTEP)/10;
static const int PERCENTILE = 40;

static int i_cmp( void const* e1, void const* e2 )
{
    return (*(int*)e1) - (*(int*)e2);
}

int SegRead::fBandSpace( int& spacing ) const
{
    BandStatArray const& m = bandStats();
    int len, idx0, idx1;
    int* diff;
    if(m.len()<2) return 0;
    len = int( m.len()-1 );
    if(NULL == (diff = ::ivector( 1, len ))) return 0;
    /* differences between successive band positions */
    for(idx0=0,idx1=1; idx0<len; idx0++,idx1++)
        diff[idx1] = m.posn(idx1) - m.posn(idx0);
    ::qsort( &diff[1], len, sizeof(*diff), i_cmp );
    /* NB: the index below is 0 (the unset pad slot) when len < 3 */
    spacing = diff[(PERCENTILE*len)/100];
    ::free_ivector( diff, 1, len );
    return 1;
}

int Wvfm::nfeeder( RdrOut& output, int fVb )
{
    int PASSES, rawpts, fbw=82, spacing;
    int rv = 0;
    srand(0);
    output.qualctrl().startTimer();
    if(1 != preproc()) {
        if(fVb) ::fprintf(stderr,"preproc() failed\n");
        return 0;
    }
    output.iS1( bgni() );
    rawpts = endi()-bgni()+1;
    if(rawpts < NPTS)
        PASSES = 1;
    else {
        PASSES = int(ceil(float(rawpts-NPTS)/float(INPUTSTEP)) + 1);
        if(PASSES > MAXPASSES)
            PASSES = MAXPASSES;
    }
    SegRead segrd( bgni(), FLUOR==ds() );
    for(int passNr=1; passNr <= PASSES; passNr++) {
        if(0 == (rv = nreader( fbw, 0, segrd ))) {
            if(fVb) ::fprintf(stderr,"nreader(0) failed,passNr=%d\n",passNr);
            break;
        } else if(0 == (rv = segrd.fBandSpace( spacing ))) {
            /* format string takes exactly one argument */
            if(fVb) ::fprintf(stderr,"fBandSpace() failed,passNr=%d\n",passNr);
            break;
        } else {
            if(1 != passNr) {
                int ospace = output.qualctrl().bspac(passNr-2);
                if(spacing < ospace) spacing = ospace;
            }
            segrd.fbwlut( spacing, fbw );
            if(0 == (rv = nreader( fbw, 1, segrd ))) {
                if(fVb) ::fprintf(stderr,"nreader(1) failed,passNr=%d\n",passNr);
                break;
            } else if(0 == (rv = output.add( passNr, fbw, spacing, segrd ))) {
                if(fVb) ::fprintf(stderr,"output.add() failed,passNr=%d\n",passNr);
                rv = 1;
                break;
            }
        }
        int newStart = segrd.iS1()+INPUTSTEP;
        if((newStart+NPTS) >= endi()) newStart = endi()-NPTS;
        segrd.iS1( newStart );
    }
    output.qualctrl().stopTimer();
    return rv;
}
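The pass scheduling in `Wvfm::nfeeder` can be sketched in isolation as follows. This is a minimal illustration, not the listing's own code: the helper name `windowStarts` is hypothetical, and the window length passed as `npts` is an assumption, since the value of `NPTS` is defined elsewhere. It only shows how the pass count is derived from the number of raw points and how the last window is pulled back so that it ends at the final sample.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: start index of each analysis window, following the
// arithmetic of nfeeder (ceil of the overshoot divided by the step, plus one,
// capped at maxPasses; the start is clamped so the window ends at `end`).
std::vector<int> windowStarts(int bgn, int end, int npts, int step, int maxPasses)
{
    assert(npts > 0 && step > 0 && maxPasses > 0);
    int rawpts = end - bgn + 1;
    int passes = 1;
    if (rawpts >= npts) {
        passes = int(std::ceil(float(rawpts - npts) / float(step)) + 1);
        if (passes > maxPasses) passes = maxPasses;
    }
    std::vector<int> starts;
    int start = bgn;
    for (int p = 1; p <= passes; ++p) {
        starts.push_back(start);
        start += step;
        if (start + npts >= end) start = end - npts;  // clamp the next window
    }
    return starts;
}
```

Under these assumptions, 6000 raw points with a 2048-point window and the listing's step of 1900 give four passes, the last of which is shifted back to overlap the third.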
/**********************************************************************
 * FILE: nr.hxx
 * TYPIST: Andy Marks
 * Copyright © 1996 University of Utah
 */
#ifndef _NR_HXX_
#define _NR_HXX_
#include <math.h>
#include <nrc/nrutil.hxx>

#if defined(__cplusplus)
static double const NR_PI = acos(-1.0);
#else
# define NR_PI acos(-1.0)
#endif

enum PFWGHT { INSTRUMENTAL=-1, NO_WEIGHTING, STATISTICAL };

DLLexport int polfit(float const* x, float const* y, float const* sigmaY,
                     int NPTS, int NTERMS, PFWGHT wm, float coef[], float& chisq);
DLLexport int ilinreg(int const* x, int const* y, int n, float coef[3], float* rr );
DLLexport int linreg(float const* x, float const* y, int n, float coef[3], float* rr );
DLLexport double corrcoef( float const* v1, float const* v2, int N);
DLLexport int iquadratic(int const* x, int const* y, int n, float coef[4]);
DLLexport int dquadratic(double const* x, double const* y, int n, float coef[4]);
DLLexport void gaussj(float** a, int n, float** b, int m);
DLLexport void spline(float const x[], float const y[], int n, float yp1, float ypn, float y2[]);
DLLexport void splint(float const xa[], float const ya[], float const ya2[], int n, float x, float *y);
DLLexport void dfour1(double data[], unsigned long nn, int isign);
#endif
/**********************************************************************
 * FILE: nreader.cxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#include <basecall/mb.hxx>

#if defined(_WIN32)
double drand48()
{
    return double(rand()) / double(RAND_MAX);
}
#endif

void randPadd( double** rm, int rows, int cols )
{
    for(int r=1; r<=rows; r++)
        for(int c=1; c<=cols; c++)
            rm[r][c] = drand48();
}

double dmean( double* v, int b, int e )
{
    double sum = 0.0;
    for(int idx=b; idx<=e; idx++) sum += v[idx];
    return sum/double(e-b+1);
}

int d_cmp( void const* e1, void const* e2 )
{
    double v1 = (*(double const*)e1), v2 = (*(double const*)e2);
    if(v1>v2) return 1;
    else if(v1<v2) return -1;
    else return 0;
}

int
Wvfm::nreader( int FBW, int pass2, SegRead& sr )
{
    Wvfm bdproc( NPTS, 4, lnordr(), ds(), method() );
    int have, endpt, iSN, scnl, lane, pt1 = sr.iS1();
    iSN = endi();
    endpt = pt1+NPTS-1;
    if(endpt <= iSN) {
        have = NPTS;
        for(lane=1; lane<=4; lane++)
            for(scnl=pt1; scnl<=endpt; scnl++)
                bdproc.sc_la_set(scnl-pt1+1,lane,sc_la(scnl,lane));
    } else {
        /* too few points remain: pad the window with scaled random values */
        double **padding, *v;
        int need = endpt - iSN, q5, q95;
# undef PADMATRIX
# define PADVECT
# if defined(PADVECT)
        int k=1, *pl = &k, NPLN=1;
# else
        int *pl = &lane, NPLN=4;
# endif
        if(NULL == (padding = ::dmatrix(1,10+need,1,NPLN))) {
            status_ = STS_NO_MEM;
            return 0;
        }
        ::randPadd( padding, 10+need, NPLN );
        have = iSN-pt1+1;
        q5 = int(nint(float(have)/20.0f));
        q95 = have-q5;
        v = ::dvector( 1, have );
        for(lane=1; lane<=4; lane++) {
            double ht;
            for(scnl=1; scnl<=have; scnl++) {
                v[scnl] = sc_la( pt1+scnl-1, lane );
                bdproc.sc_la_set(scnl,lane,v[scnl]);
            }
            ::qsort( &v[1], have, sizeof(double), d_cmp );
            ht = v[q95]-v[q5];
            for(scnl=1; scnl<=10; scnl++) {
                double v = bdproc.sc_la(have-(10-scnl),lane),
                       p = ht*padding[scnl][*pl],
                       vm = 0.5*(1.0+cos(double(scnl)*NR_PI/10.0)),
                       vp = 0.5*(1.0+cos(NR_PI+double(scnl)*NR_PI/10.0));
                bdproc.sc_la_set(scnl,lane,vm*v + vp*p);
            }
            for(scnl=1; scnl<=need; scnl++)
                bdproc.sc_la_set(have+scnl,lane,ht*padding[10+scnl][*pl]);
        }
        ::free_dmatrix( padding, 1,10+need, 1,NPLN );
        ::free_dvector( v, 1,have );
    }
    Wvfm source( bdproc );
    sr.blindeconv( bdproc, FBW );
    if(1 != sr.xtranorm( bdproc, source, have, pass2 )) return 0;
    Wvfm aligned( NPTS+sr.nsv().maxshft(), 4, lnordr(), ds(), method() );
    aligned.ssm( ssm() );
    for(lane=1; lane<=4; lane++) {
        short sv = sr.nsv().s(lane);
        for(scnl=1; scnl<=NPTS; scnl++)
            aligned.sc_la_set( sv+scnl, lane, bdproc.sc_la(scnl,lane) );
    }
    if(pass2) aligned.endi( have+sr.nsv().maxshft() );
    sr.wvfm( aligned );
    ShftVect noshft;
    sr.wvfm()->envelope( noshft );
    return sr.nrefine( lnordr(), FBW, have, pass2 );
}
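The padding branch of `Wvfm::nreader` blends the last ten real samples into random padding using a pair of raised-cosine weights, `vm` and `vp`, which sum to one. The following is a minimal sketch of that cross-fade on a plain vector. The helper name `taperIntoPadding` is hypothetical, and writing each blended value back to the tail sample it was read from is an assumption about the intent rather than a transcription of the listing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: cross-fade the last n samples of `signal` into `pad`.
// For k = 1..n the signal weight vm falls from near 1 to 0 while the padding
// weight vp = 1 - vm rises, as in the vm/vp expressions of the listing.
std::vector<double> taperIntoPadding(std::vector<double> signal,
                                     const std::vector<double>& pad,
                                     int n)
{
    assert(n > 0);
    assert(signal.size() >= std::size_t(n) && pad.size() >= std::size_t(n));
    const double PI = std::acos(-1.0);
    const std::size_t base = signal.size() - std::size_t(n);
    for (int k = 1; k <= n; ++k) {
        double vm = 0.5 * (1.0 + std::cos(double(k) * PI / n));
        double vp = 0.5 * (1.0 + std::cos(PI + double(k) * PI / n));
        std::size_t i = base + std::size_t(k - 1);
        signal[i] = vm * signal[i] + vp * pad[std::size_t(k - 1)];
    }
    return signal;
}
```

With a constant signal of 1.0 and zero padding, the returned tail traces the `vm` curve itself, passing through 0.5 at the midpoint and reaching 0 at the final sample, so the padded region starts without a step.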
/**********************************************************************
 * FILE: nrefine.cxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#include <basecall/mb.hxx>
#include <basecall/fgapcheck.h>

int insMetric( int const* px, int const* py, int* ytmp, int N )
{
    float coef[4], std;
    int idx, jdx, *xtmp = NULL;
    if(N<4) return 0;
    if(1 != ::iquadratic( px,py,N, coef )) return 0;
    std = float( ::sqrt( coef[3] ) );
    xtmp = ::ivector(1,N);
    if(NULL == xtmp) return 0;
    /* keep only the points that lie within one std of the first fit */
    for(jdx=idx=1; idx<=N; idx++) {
        float x = float(px[idx]);
        ytmp[jdx] = int(coef[0]+x*coef[1]+x*x*coef[2]);
        xtmp[jdx] = px[idx];
        if(abs(ytmp[jdx]-py[idx]) < std) jdx++;
    }
    if(--jdx >= 4) {
        if(1 != ::iquadratic( xtmp,ytmp,jdx, coef )) {
            ::free_ivector(xtmp,1,N);
            return 0;
        }
        for(idx=1; idx<=N; idx++) {
            float x = float(px[idx]);
            ytmp[idx] = int(coef[0]+x*coef[1]+x*x*coef[2]);
        }
    }
    ::free_ivector(xtmp,1,N);
    if(0.0f != coef[2]) {
        /* vertex of the fitted parabola */
        int Ipt = (int)nint(-coef[1]/(2.0*coef[2]));
        if(Ipt>=1 && Ipt<=N) {
            int K = ytmp[Ipt];
            if(coef[2] > 0.0)
                for(idx=1; idx<Ipt; idx++) ytmp[idx] = K;
            else
                for(idx=(Ipt+1); idx<=N; idx++) ytmp[idx] = K;
        }
    }
    return 1;
}

int
SegRead::centroid( int bgn, int end ) const
{
    Wvfm const& w = wvfm();
    double numer=0.0, denom=0.0;
    int idx=bgn+1, jdx=idx-1, N = end-bgn+1;
    if(N < 2)
        return (bgn+N/2);
    else {
        double minv = 100000.0;
        for(int kdx=bgn; kdx<=end; kdx++)
            if(w.envv(kdx)<minv) minv = w.envv(kdx);
        for( ; idx<=end; idx++,jdx++) {
            double vj = w.envv(jdx)-minv;
            double vi = w.envv(idx)-minv;
            numer += vj*(2.0*jdx + idx) + vi*(jdx + 2.0*idx);
            denom += (vj + vi);
        }
    }
    return int(nint(numer/(3.0*denom)));
} int SegRead::nrefine( char const* LNORDR, int FBW, int npts, int pass2 )
{ PKDET rawPks, refPks; float *xbnd, *ht, *lo, **om; int *bandcode, *insSP, *insWD, NPK, idx, jdx, sts; if(l != peakdet( npts, rawPks )) return 0; NPK = rawPks.npk(); xbnd = xbndara J rawPks ); if(NULL == xbnd) { status_ = STS_NO_MEM; return 0;
} if(NULL -= (bandcode = ::ivector(l, nWvf_->scanl() ))) { status_ = STS_NO_MEM;
::free_vector( xbnd, 1. NPK ); return 0;
} maxlanecode J npts, bandcode ); if(NPK >= STATIC J3UF_SZ) { ::free_vector( xbnd, 1 ,NPK ); status_ = STS_BUF2SMALL; return 0;
} insSP = ::ivector( 1, NPK ); if(NULL == insSP) { status_ = STS_NO_MEM; ::free_vector( xbnd, 1, NPK ); return 0;
} if(l != ::insMetric( rawPks.bmid(), rawPks.lgap(), insSP, NPK )) { status_ = STS_TOO_FEW;
::free_vector( xbnd, 1, NPK );
::free_ivector(insSP,l ,NPK); return 0; } ht = ::vector( 1 , NPK ); if(NULL == ht) { status_ = STS_NO_MEM;
::free_vector( xbnd, 1, NPK ); ::freeJvector( insSP, 1, NPK ); return 0;
} for(idx = 1 ; idx <= NPK; idx++) ht[ idx ] = float(nWvf_->enw( rawPks.bmid(idx) )); lo = ::vector( l. NPK ); if(NULL = lo) { status_ = STS_NO_MEM; ::free_vector( xbnd.1, NPK ); ::free vector( insSP, 1, NPK ); ::free_vector( ht, 1, NPK ); return 0;
} for(idx = 1 ; idx <= NPK; idx++) { float onsv, ofsv; onsv = float(nWvf_->enw( rawPks.bbgn(idx) )); ofsv = float(nWvf_->enw( rawPks.bend(idx) )); lo[ idx ] = onsv<ofsv?onsv:ofsv;
} om = ::matrix(l,NPK,l,2); sts = : :omitokn(insSP,ht,lo,rawPks.lgap(),rawPks.rgap(),xbnd,NPK.om); ::free_vector( xbnd, l ,NPK ); xbnd = NULL;
: :free_vector( ht, 1.NPK ); ht = NULL; ::free_vector( lo, 1,NPK ); lo = NULL; -free ivector( insSP, l.NPK ); insSP = NULL; if(l != sts) { ::free_matrix( om, 1 , NPK, 1,2 ); status_ = STS_TOO_FEW; return 0;
} enum OKNOMIT { BAND JDK = 1 , BAND N = 2, BAND_OMIT = 3 }; for(jdx=idx= 1 ;idx<=NPK;idx++) switch(OKNOMIT(nint(om[idx] [ 1 ]))) { case BAND_N: bandcode[ rawPks. bmid( idx ) ] = 5; case BAND JDK: nband J jdx ] = rawPks.band( idx ); seq_[ jdx ] = LNORDR[ nWvf_->envi( rawPks.bmid( idx ) )-l ]; jdx++; break;
} ::free_matrix( om, 1,NPK,1,2 ); om = NULL; NPK =jdx-l ; refPks.set( nband_, NPK ); insSP = ::ivector( 1, NPK ); if(NULL = insSP) { status_ = STS_NO_MEM; return 0;
} if(l != ::insMetric( refPks.bmid(), refPks.lgap(), insSP, NPK )) { status_ = STS_TOO_FEW; ::free ivector( insSP, 1 , NPK ); return 0;
} ins WD = ::ivector( 1, NPK ); if(NULL = ins WD) { status_ = STS_NO_MEM; : :free _ivector( insSP, 1 ,NPK ); return 0;
} if(l != ::insMetric( refPks.bmid(), refPks.bwid(), ins WD, NPK )) { status_ = STS_TOO_FEW; ::free vector( insSP,l,NPK );
::free_ivector( ins WD, 1, NPK ); return 0;
} om = ::matrix(l.NPK,l,2); sts = ::gapcheck(insSP, insWD, refPks.lgap(),refPks.bwid(),seq_, NPK,om);
::free_ivector( insWD, 1 ,NPK ); ins WD = NULL; if(l != sts) { ::freeJvector( insSP,l,NPK); ::free_matrix(om, 1 ,NPK, 1 ,2); return 0; } nband J 1 ] = refPks.band( 1 ); for(jdx=idx=2; idx<=NPK; idx++) { if(GAP_SPLIT = GPCHK(nint(om[idx][l]))) { int m4, numSmallGap, newSP, mid; double ratio; m4 = idx- 1 ; if(0 = insSP[m4]) { status^ = STS_TOO_FEW; ::free ivector( insSP,l,NPK ); ::free_matrix( om,l,NPK,l,2 ); return 0;
} ratio = double(refPks.rgap(m4))/double(insSP[m4]); numSmallGap = int(0.5 + 0.2 + ratio); if(0 == numSmallGap) { status_ = STS_TOO_FEW;
::freeJvector( insSP,l,NPK );
::free_matrix( om,l,NPK,l,2 ); return 0; } newSP = refPks.rgap( m4 ) / numSmallGap; mid = refPks.bmid( m4 ); for(int ndx=l ; ndx<numSmallGap; ndx++) { int bgn,end,c; mid += newSP; bgn = mid-newSP/2; end = bgn+newSP-1 ;
Band b( bgn, c=centroid (bgn.end). end, 1 ); nband J jdx++ ] = b; if(jdx >= STATIC_BUF_SZ) { status_ = STS_BUF2SMALL;
::freeJvector( insSP,l,NPK ); ::free_matrix( om, l .NPK, 1,2 ); return 0;
} }
} nband [ jdx++ ] = refPks.band( idx ); if(jdx >= STATICJ3UF SZ) { status_ = STS_BUF2SMALL; : :free ivector( insSP, 1 ,NPK );
::free_matrix( om,l ,NPK,l,2 ); return 0;
} } ::free_ivector( insSP, 1 ,NPK ); insSP = NULL;
::free_matrix( om, 1, NPK, 1,2 ); om = NULL; NPK = jdx- 1 ; refPks.set( nband_, NPK ); insSP = ::ivector( l, NPK ); if(NULL = insSP) { status_ = STS_NO_MEM; return 0;
} if(l != ::insMetric( re Pks.bmid(), ref?ks.lgap(), insSP. NPK )) { status_ = STS_TOO_FEW;
::freeJvector( insSP. 1, NPK ); return 0;
} xbnd = xbndara refPks ); if(NULL = xbnd) { status_ = STS_NO_MEM;
::freeJvector( insSP, 1, NPK); return 0;
} ht = ::vector( 1, NPK ); if(NULL == ht) { status_ = STS_NO_MEM; ::free vector( insSP,l,NPK ); ::free_vector( xbnd,l,NPK ); return 0; } for(idx=l ;idx<=NPK;idx++) ht[ idx ] = float(nWvf_->envv( refPks.bmid(idx) )); lo = ::vector( 1, NPK ); if(NULL == lo) { status_ = STS_NO_MEM;
::freeJvector( insSP,l,NPK ); : :free_vector( xbnd, 1 ,NPK ) ; ::free_vector( ht,l,NPK ); return 0; } for(idx = 1; idx <= NPK; idx++) { float onsv, ofsv; onsv = float(nWvf_->enw( refPks.bbgn(idx) )); ofsv = float(nWvf_->enw( refPks.bend(idx) )); lo[ idx ] = onsv<ofsv?onsv:ofsv;
} om = ::matrix( l.NPK, 1,2); sts = ::omitokn(insSP,ht,lo,refPks.lgap(),refPks.rgap(),xbnd,NPK,om ); ::free_vector( xbnd, l.NPK ); xbnd = NULL; ::free_vector( ht, l,NPK ); ht = NULL; ::free_vector( lo, l ,NPK ); lo = NULL;
::freeJvector( insSP, 1 ,NPK ); insSP = NULL; if(l != sts) { : :free_matrix(om, 1 ,NPK, 1 ,2); return 0; } for(idx=)dx=l ; idx<=NPK; idx++) { int bmid = refPks.bmid(idx); switch(OKNOMIT( nint(om[idx][l]) )) { case BANDJ-.: bandcode[ bmid ] = 5; case BAND JDK: nband [jdx ] = refPks.band(idx); seqj jdx ] = LNORDR[ bandcode[ bmid ]-l ]; jdx++; break; default: break; } } ::free_matrix( om, l.NPK, 1,2 ); om = NULL;
"free ivector( bandcode, 1 ,nWvf_->scanl() ); bandcode = NULL; NPK = jdx- 1 ; refPks.set( nband_, NPK ); return setBandStats( refPks, FBW, seq_, pass2 ); } /**********************************************************************
 * FILE: nrutil.cxx
 * TYPIST: Andy Marks
 */
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <nrc/nrutil.hxx>

#define NR_END 1
#define FREE_ARG char*

void nrerror( char const error_text[] )
{
    fprintf(stderr,"Numerical Recipes run-time error...\n");
    fprintf(stderr,"%s\n", error_text );
    fprintf(stderr,"...now exiting to system...\n");
    exit(1);
}

float* vector( long nl, long nh )
{
    float *v;
    v = (float*)malloc((size_t)((nh-nl+1+NR_END)*sizeof(float)));
    if(NULL == v) nrerror("allocation failure in vector()");
    return v-nl+NR_END;
}

int* ivector( long nl, long nh )
{
    int *v;
    v = (int*)malloc((size_t)((nh-nl+1+NR_END)*sizeof(int)));
    if(NULL == v) {
        ::fprintf(stderr,"ivector: nl=%ld nh=%ld v=%p\n",nl,nh,v);
        ::nrerror("ivector: allocation failure");
    }
    return v-nl+NR_END;
}

unsigned char* cvector( long nl, long nh )
{
    unsigned char *v;
    v = (unsigned char*)malloc((size_t)((nh-nl+1+NR_END)*sizeof(unsigned char)));
    if(NULL == v) nrerror("allocation failure in cvector()");
    return v-nl+NR_END;
}

unsigned long* lvector( long nl, long nh )
{
    unsigned long *v;
    v = (unsigned long*)malloc((size_t)((nh-nl+1+NR_END)*sizeof(unsigned long)));
    if(NULL == v) nrerror("allocation failure in lvector()");
    return v-nl+NR_END;
}

double* dvector( long nl, long nh )
{
    double *v;
    v = (double*)malloc((size_t)((nh-nl+1+NR_END)*sizeof(double)));
    if(NULL == v) nrerror("allocation failure in dvector()");
    return v-nl+NR_END;
}

float** matrix(long nrl, long nrh, long ncl, long nch)
{
    long i, nrow = nrh-nrl+1, ncol = nch-ncl+1;
    float **m;
    /* row pointers */
    m = (float**)malloc((size_t)((nrow+NR_END)*sizeof(float*)));
    if(NULL == m) nrerror("allocation failure 1 in matrix()");
    m += NR_END;
    m -= nrl;
    /* one contiguous block for all rows */
    m[nrl] = (float*)malloc((size_t)((nrow*ncol+NR_END)*sizeof(float)));
    if(NULL == m[nrl]) nrerror("allocation failure 2 in matrix()");
    m[nrl] += NR_END;
    m[nrl] -= ncl;
    for(i=nrl+1; i<=nrh; i++) m[i] = m[i-1]+ncol;
    return m;
}

double** dmatrix(long nrl, long nrh, long ncl, long nch)
{
    long i, nrow = nrh-nrl+1, ncol = nch-ncl+1;
    double **m;
    m = (double**)malloc((size_t)((nrow+NR_END)*sizeof(double*)));
    if(NULL == m) {
        ::fprintf(stderr,"nrl=%ld, nrh=%ld, ncl=%ld, nch=%ld, nrow=%ld\n",
                  nrl,nrh,ncl,nch,nrow);
        nrerror("allocation failure 1 in dmatrix()");
    }
    m += NR_END;
    m -= nrl;
    m[nrl] = (double*)malloc((size_t)((nrow*ncol+NR_END)*sizeof(double)));
    if(NULL == m[nrl]) nrerror("allocation failure 2 in dmatrix()");
    m[nrl] += NR_END;
    m[nrl] -= ncl;
    for(i=nrl+1; i<=nrh; i++) m[i] = m[i-1]+ncol;
    return m;
}

int** imatrix(long nrl, long nrh, long ncl, long nch)
{
    long i, nrow = nrh-nrl+1, ncol = nch-ncl+1;
    int **m;
    m = (int**)malloc((size_t)((nrow+NR_END)*sizeof(int*)));
    if(NULL == m) nrerror("allocation failure 1 in imatrix()");
    m += NR_END;
    m -= nrl;
    m[nrl] = (int*)malloc((size_t)((nrow*ncol+NR_END)*sizeof(int)));
    if(NULL == m[nrl]) nrerror("allocation failure 2 in imatrix()");
    m[nrl] += NR_END;
    m[nrl] -= ncl;
    for(i=nrl+1; i<=nrh; i++) m[i] = m[i-1]+ncol;
    return m;
}

float** submatrix(float **a, long oldrl, long oldrh, long oldcl, long oldch,
                  long newrl, long newcl)
{
    /* ncol is the column offset between old and new indexing,
       as in the Numerical Recipes original */
    long i, j, nrow = oldrh-oldrl+1, ncol = oldcl-newcl;
    float **m;
    m = (float**)malloc((size_t)((nrow+NR_END)*sizeof(float*)));
    if(NULL == m) nrerror("allocation failure in submatrix()");
    m += NR_END;
    m -= newrl;
    for(i=oldrl, j=newrl; i<=oldrh; i++,j++) m[j] = a[i]+ncol;
    return m;
}

float** convert_matrix(float* a, long nrl, long nrh, long ncl, long nch)
{
    long i, j, nrow = nrh-nrl+1, ncol = nch-ncl+1;
    float **m;
    m = (float**)malloc((size_t)((nrow+NR_END)*sizeof(float*)));
    if(NULL == m) nrerror("allocation failure in convert_matrix()");
    m += NR_END;
    m -= nrl;
    m[nrl] = a-ncl;
    /* i<nrow: only rows nrl+1..nrh remain to be set */
    for(i=1, j=nrl+1; i<nrow; i++,j++) m[j] = m[j-1]+ncol;
    return m;
}

void free_vector(float* v, long nl, long nh)
{ free((FREE_ARG)(v+nl-NR_END)); }

void free_ivector(int* v, long nl, long nh)
{ free((FREE_ARG)(v+nl-NR_END)); }

void free_cvector(unsigned char* v, long nl, long nh)
{ free((FREE_ARG)(v+nl-NR_END)); }

void free_lvector(unsigned long* v, long nl, long nh)
{ free((FREE_ARG)(v+nl-NR_END)); }

void free_dvector(double* v, long nl, long nh)
{ free((FREE_ARG)(v+nl-NR_END)); }

void free_matrix(float **m, long nrl, long nrh, long ncl, long nch)
{
    free((FREE_ARG)(m[nrl]+ncl-NR_END));
    free((FREE_ARG)(m+nrl-NR_END));
}

void free_dmatrix(double **m, long nrl, long nrh, long ncl, long nch)
{
    free((FREE_ARG)(m[nrl]+ncl-NR_END));
    free((FREE_ARG)(m+nrl-NR_END));
}

void free_imatrix(int **m, long nrl, long nrh, long ncl, long nch)
{
    free((FREE_ARG)(m[nrl]+ncl-NR_END));
    free((FREE_ARG)(m+nrl-NR_END));
}

void free_submatrix(int **b, long nrl, long nrh, long ncl, long nch)
{ free((FREE_ARG)(b+nrl-NR_END)); }

void free_convert_matrix(int **b, long nrl)
{ free((FREE_ARG)(b+nrl-NR_END)); }
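The allocators in nrutil.cxx return pointers shifted so that callers can index from `nl` to `nh` (almost always from 1) instead of from 0, with `NR_END` spare elements in front. The following is a minimal sketch of the same offset scheme for an `int` vector. The names `ivector_sketch` and `free_ivector_sketch` are hypothetical; unlike the listing, the sketch returns `NULL` on failure rather than exiting, which is an assumption made so the caller's null checks are meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

static const long PAD = 1;  // plays the role of NR_END in the listing

// Allocate an int vector that is addressable as v[nl..nh].
int* ivector_sketch(long nl, long nh)
{
    assert(nh >= nl);
    std::size_t count = std::size_t(nh - nl + 1 + PAD);
    int* raw = static_cast<int*>(std::malloc(count * sizeof(int)));
    if (raw == NULL) return NULL;
    return raw + (PAD - nl);  // for nl == 1 this is raw itself
}

// Undo the offset before handing the block back to free().
void free_ivector_sketch(int* v, long nl, long /*nh*/)
{
    std::free(v + (nl - PAD));
}
```

The free routine must be given the same `nl` that was used for allocation, which is why every `free_*` call in the listings repeats the original bounds.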
/**********************************************************************
 * FILE: Pkdet.cxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#include <basecall/mb.hxx>

PKDET::PKDET()
    : status_(STS_INITD), ppk_(NULL), ptr_(NULL), gap_(NULL),
      wid_(NULL), ins_(NULL), Np_(0), Nt_(0), medGap_(0), medWid_(0)
{ }

PKDET::PKDET( PKDET const& rhs )
    : status_(STS_INITD), ppk_(NULL), ptr_(NULL), gap_(NULL),
      wid_(NULL), ins_(NULL), Np_(0), Nt_(0), medGap_(0), medWid_(0)
{
    *this = rhs;
}

PKDET const&
PKDET::operator=(PKDET const& rhs)
{
    if(&rhs != this) {
        release();
        Np_ = rhs.Np_;
        Nt_ = rhs.Nt_;
        medGap_ = rhs.medGap_;
        medWid_ = rhs.medWid_;
        ppk_ = ::ivector( 1, Np_ );
        wid_ = ::ivector( 1, Np_ );
        ins_ = ::ivector( 1, Np_ );
        ptr_ = ::ivector( 1, Nt_ );
        gap_ = ::ivector( 1, Nt_ );
        if(!ppk_ || !ptr_ || !gap_ || !wid_ || !ins_) {
            status_ = STS_NO_MEM;
            release();
        } else {
            int idx;  /* declared outside the loop: it is used after it */
            for(idx = 1; idx<=Np_; idx++) {
                ppk_[ idx ] = rhs.ppk_[ idx ];
                wid_[ idx ] = rhs.wid_[ idx ];
                ptr_[ idx ] = rhs.ptr_[ idx ];
                gap_[ idx ] = rhs.gap_[ idx ];
                ins_[ idx ] = rhs.ins_[ idx ];
            }
            /* ptr_ and gap_ hold one more element (Nt_ = Np_+1) */
            ptr_[ idx ] = rhs.ptr_[ idx ];
            gap_[ idx ] = rhs.gap_[ idx ];
        }
    }
    return *this;
}

void PKDET::release()
{
    if(NULL != ppk_) { ::free_ivector( ppk_, 1, Np_ ); ppk_ = NULL; }
    if(NULL != wid_) { ::free_ivector( wid_, 1, Np_ ); wid_ = NULL; }
    if(NULL != ins_) { ::free_ivector( ins_, 1, Np_ ); ins_ = NULL; }
    if(NULL != ptr_) { ::free_ivector( ptr_, 1, Nt_ ); ptr_ = NULL; }
    if(NULL != gap_) { ::free_ivector( gap_, 1, Nt_ ); gap_ = NULL; }
    Np_ = Nt_ = 0;
}

PKDET::~PKDET()
{
    release();
}

static int i_cmp(void const* e1, void const* e2)
{
    return (*(int const*)e1)-(*(int const*)e2);
} int PKDET::set( int const* Pp, int Np, int const* Pt, int Nt )
{ int P1.T1, pxl.pxn, txl,txn, *gtmp, *wtmp; release(); pxl = Pp[ Pl=l ]; pxn = Pp[ Np ]; txl=Pt[Tl=l];txn = Pt[Nt]; if(pxl <txl)Pl++; if(pxn > txn) Np~; Np_ = Np-Pl+l; Nt_ = Nt; if((Nt_<2)||(Nt_!=(Np_+l))){
::fprintf(stderr,"PKDET::set, Np=%d, Nt=%d: Nt should = Np+l\n",Np_,NtJ; ::fprintf(stderr,"\tpxl=%d txl=%d pxn=%d txn=%d\n",pxl,txl,ρxn.txn); Nt_ = Np_ = 0; return 0; } ppk_ = ::ivector( 1, Np_ ); wid_ = ::ivector( 1, Np_ ); ins_ = ::ivector( 1, Np_ ); wtmp = ::ivector( 1, Np_ ); gtmp = ::ivector( 1, Np_-1 ); ptr_ = : :ivector( 1 , Nt_ ); gap_ = ::ivector( 1 , Nt_ ); if(!ppk_ || !ptr_ || !gap_ || !wid_ || !ins_ || Igtmp || Iwtmp) { status_ = STS_NO_MEM;
Nt_ = Np_ = 0; return 0;
} for(int idx=l; idx<=Np_; idx++) { ppkj idx ] = Pp[Pl++]; ptrj idx ] = Pt[Tl++]; widj idx ] = Pt[ Tl ]-Pt[Tl-l ]; insj idx ] = 0; wtmp[ idx ] = widj idx ]; if(idx<NpJ{ gapJl+idx] = Pp[Pl]-Pp[Pl-l]; gtmp[ idx ] = gapj 1+idx ];
} } ptrJidx] = Pt[Tl]:
::qsort( &gtmp[l], Np_-1, sizeof(+gtmp), i_cmp ); if((Np_-l)&l) { int mid = l+(Np_-l)/2; medGap_ = gtmp [mid];
} else { intmid = (Np_-l)/2; medGap_ = (gtmp[mid] + gtmp[mid+l])/2; } gaP_[l] = gaP-fN = medGap_; "free ivector( gtmp, l,Np_-l ); ::qsort( &wtmp[l], Np_, sizeof(*wtmp), i_cmp ); medWid_ = (Np_& 1 )? wtmp[ 1 +Np_/2] : (wtmp[Np 2]+wtmp[ 1 +Np 2])/2;
-free ivector( wtmp, l,Np_ ); return 1 ;
} int PKDET: :set( Band* Pb, int Nb )
{ int idx, *gtmp, *wtmp; release(); Np_ = Nb; Nt_ = Nb+l ; if(Nt_ < 2) { ::fprintf(stderr,"PKDET::set, Np=%d, Nt=%d: Nt!=Np+l\n",Np_.NtJ; Nt_ = Np_ = 0; return 0; } ppk_ = ::ivector( 1, Np_ ); wid_ -= ::ivector( 1, Np_ ); ins_ = ::ivector( 1 , Np_ ); ptr_ = ::ivector( 1, Nt_ ); gap_ = ::ivector( 1, Nt_ ); wtmp = ::ivector( 1 , Np_ ); gtmp = ::ivector( 1, Np_-1 ); if(!ppk_ || !ptr_ || !gap_ || !wid_ || !ins_ || Igtmp || Iwtmp) { status_ = STS_NO_MEM; Nt_ = Np_ = 0; return 0; } for(idx=l ; idx<Nb; idx++) {
Band bn = Pbfidx], bnpl = Pb[idx+1]; if(bn.end() != bnpl .bgn()) if(bn.wid() > bnpl .wid())
Pb[idx].end( bnpl.bgn() ); else Pb[idx+l].bgn( bn.end() ); ppk_[idx] = Pb[idx].mid(); ptr_[idx] = Pb[idx].bgn(); wid Jidx] = Pb[idx].wid(); insjidx] = Pbfidx]. ins(); wtmp[idx] = widjidx]; gapjl+idx] = Pb[idx+l].mid() - Pb[idx].mid(); gtmpfidx] = gapjl+idx];
} ppkj idx ] = Pb[ idx ].mid(); ptr_[ idx ] = Pb[ idx ].bgn(); ptr_[ idx+1 ] = Pb[ idx ].end(); widj idx ] = ptrjidx+l] - ptrjidx]; insj idx ] = Pb[ idx ].ins(); wtmp[ idx ] = wid_[ idx ];
::qsort( &gtmp[l], Np_-1, sizeof(*gtmp), i_cmp ); if((Np_-l)&l) { int mid = l+(Np_-l)/2; medGap_ = gtmp[mid];
} else { int mid = (Np_-l)/2; medGap_ = (gtmp[mid]+gtmp[mid+l])/2;
} gaP_[l] = gaP-fN-J = medGap_; ::freeJvector( gtmp, l,Np_-l ); ::qsort( &wtmp[l], Np_, sizeof(*wtmp), i cmp ); medWid_ = (Np_&l)? wtmp[l+NpJ2]: (wtmp[NpJ21+wtmp[l+NpJ2])/2; ::free ivector( wtmp, l,Np_ ); return 1 ;
} void
PKDET: :debug( int lvl ) const {
::printf("PKDET::debug\n");
::printf("Np=%3d Nt=%3d medGap=%2d medWid=%2d\n", Np_,Nt_,medGap_,medWidJ; if(NULL = ptr ::printf("PKDET::debug failed: ptr_=%p\n",ptrj; else if(NULL == ppkj
::printf("PKDET::debug failed: ppk_=%p\n",ppk ; else if(NULL == gap ::printf("PKDET::debug failed: gap_=%p\n",gap ; else if(NULL == wid
::printf("PKDET::debug failed: wid_=%p\n",widj; else if(NULL == ins
::printf("PKDET::debug failed: ins_=%p\n",insj; else for(int idx=l; idx<=Np_; idx++) { ::printf("%3d: %4d|%4d|%4d, lgap=%2d rgap=%2d width=%2d ins=%d\n", idx,ptr Jidx], ppk Jidx], ptr [idx+l],gap Jidx], gap Jidx+1], wid [idx],insjidx]); "fflush( stdout );
    }
}
/**********************************************************************
 * FILE: Pkdet.hxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#if !defined(_PKDET_HXX_)
# define _PKDET_HXX_
#include <stdio.h>

class Band
{
public:
    Band() : b_(0), m_(0), e_(0), ins_(0) {;}
    Band(int b, int m, int e, int i) : b_(b), m_(m), e_(e), ins_(i) {;}
    ~Band() {;}
    int bgn() const { return b_; }
    int mid() const { return m_; }
    int end() const { return e_; }
    int wid() const { return e_-b_+1; }
    int ins() const { return ins_; }
    void bgn( int b ) { b_ = b; }
    void mid( int m ) { m_ = m; }
    void end( int e ) { e_ = e; }
    void ins( int i ) { ins_ = i; }
    void debug( int lvl=0 ) const {
        ::printf("%d,%d,%d (inserted=%d)\n",b_,m_,e_,ins_);
        ::fflush(stdout);
    }
private:
    short b_, m_, e_, ins_;
};

class PKDET
{
public:
    enum Status { STS_UNINITD, STS_INITD, STS_NO_MEM };
    PKDET();
    PKDET(PKDET const& rhs);
    PKDET const& operator=(PKDET const& rhs);
    ~PKDET();
    int set( int const* ppk, int Np, int const* ptr, int Nt );
    int set( Band* pb, int Nb );
    Status status() const { return status_; }
    int npk() const { return Np_; }
    int ntr() const { return Nt_; }
    int const* bbgn() const { return ptr_; }
    int const* bmid() const { return ppk_; }
    int const* bend() const { return &ptr_[1]; }
    int ins(int idx) const { return ins_[idx]; }
    int const* bwid() const { return wid_; }
    int bbgn(int idx) const { return ptr_[idx]; }
    int bmid(int idx) const { return ppk_[idx]; }
    int bend(int idx) const { return ptr_[1+idx]; }
    int bwid(int idx) const { return wid_[idx]; }
    int lgap(int idx) const { return gap_[idx]; }
    int rgap(int idx) const { return gap_[1+idx]; }
    Band band(int idx) const {
        Band b(bbgn(idx),bmid(idx),bend(idx),ins(idx));
        return b;
    }
    int const* lgap() const { return gap_; }
    int const* rgap() const { return &gap_[1]; }
    int medGap() const { return medGap_; }
    int medWid() const { return medWid_; }
    void debug( int lvl = 0 ) const;
private:
    Status status_;
    int *ppk_,
        *ptr_,
        *gap_,
        *wid_,
        *ins_;
    int Np_,
        Nt_,
        medGap_,
        medWid_;
    void release();
};
#endif
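The `medGap()` and `medWid()` accessors above return medians that `PKDET::set` computes by sorting a temporary copy and taking the middle element, or the truncated mean of the two middle elements when the count is even. The following is a minimal sketch of that rule applied to the gaps between band midpoints. The names `medianInt` and `medianGap` are hypothetical, and the sketch uses 0-based standard containers instead of the listing's 1-based vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: integer median with the odd/even rule described above.
int medianInt(std::vector<int> v)
{
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    std::size_t n = v.size();
    if (n & 1) return v[n / 2];            // odd count: middle element
    return (v[n / 2 - 1] + v[n / 2]) / 2;  // even count: truncated mean
}

// Hypothetical helper: median distance between successive band midpoints.
int medianGap(const std::vector<int>& mids)
{
    assert(mids.size() >= 2);
    std::vector<int> gaps;
    for (std::size_t i = 1; i < mids.size(); ++i)
        gaps.push_back(mids[i] - mids[i - 1]);
    return medianInt(gaps);
}
```

A median is used rather than a mean so that one missing band, which doubles a single gap, does not shift the estimate of typical spacing.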
/**********************************************************************
 * FILE: polfit.cxx
 * AUTHOR: Philip R. Bevington, p 140-141
 */
#include <nrc/nr.hxx>

#if !defined(SA)
int polfit( float const* px, float const* py, float const* sigmaY,
            int NPTS, int NTERMS, PFWGHT mode, float coef[], float& chisq )
{
    int NMAX, inr, jnr;
    float degfre, **soln, *sumx, *sumy, **array;
    NMAX = 2*NTERMS-1;
    sumx = ::vector(1,NMAX);
    sumy = ::vector(1,NTERMS);
    soln = ::matrix(1,NTERMS,1,1);
    array = ::matrix(1,NTERMS,1,NTERMS);
    for(inr=1;inr<=NMAX;inr++) sumx[inr] = 0.0f;
    for(inr=1;inr<=NTERMS;inr++) sumy[inr] = 0.0f;
    chisq = 0.0f;
    /* accumulate weighted power sums */
    for(inr=1;inr<=NPTS;inr++) {
        float x, y, wt, xterm, yterm;
        x = px[inr];
        y = py[inr];
        switch(mode) {
        case INSTRUMENTAL:
            if(y<0) wt = -1.0f/y;
            else if(0==y) wt = 1.0f;
            else wt = 1.0f/y;
            break;
        case NO_WEIGHTING:
            wt = 1.0f;
            break;
        case STATISTICAL:
            wt = 1.0f/(sigmaY[inr]*sigmaY[inr]);
            break;
        }
        xterm = wt;
        for(jnr=1;jnr<=NMAX;jnr++) {
            sumx[jnr] += xterm;
            xterm *= x;
        }
        yterm = wt*y;
        for(jnr=1;jnr<=NTERMS;jnr++) {
            sumy[jnr] += yterm;
            yterm *= x;
        }
        chisq += wt*y*y;
    }
    /* build and solve the normal equations */
    for(inr=1;inr<=NTERMS;inr++) {
        soln[inr][1] = sumy[inr];
        for(jnr=1;jnr<=NTERMS;jnr++)
            array[inr][jnr] = sumx[inr+jnr-1];
    }
    ::gaussj( array,NTERMS, soln,1);
    for(inr=1;inr<=NTERMS;inr++) {
        coef[inr] = soln[inr][1];
        chisq -= 2.0f*soln[inr][1]*sumy[inr];
        for(jnr=1;jnr<=NTERMS;jnr++)
            chisq += soln[inr][1]*soln[jnr][1]*sumx[inr+jnr-1];
    }
    ::free_vector(sumx,1,NMAX);
    ::free_vector(sumy,1,NTERMS);
    ::free_matrix(soln,1,NTERMS,1,1);
    ::free_matrix(array,1,NTERMS,1,NTERMS);
    if(0.0f == (degfre = float(NPTS - NTERMS))) return 1;
    chisq /= degfre;
    return 0;
}
#endif
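`polfit` accumulates power sums of `x` and of `x` times `y`, arranges them as normal equations, and solves for the polynomial coefficients. The following is a minimal self-contained sketch of the unweighted case. The name `polyFitSketch` is hypothetical, the sketch uses 0-based `std::vector` storage, and it substitutes a small in-place Gauss-Jordan elimination with partial pivoting for the listing's call to `gaussj`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: unweighted least-squares polynomial fit.
// Returns coef[0..nterms-1] such that y is approximated by sum coef[k]*x^k.
std::vector<double> polyFitSketch(const std::vector<double>& x,
                                  const std::vector<double>& y,
                                  int nterms)
{
    assert(nterms >= 1 && x.size() == y.size());
    assert(x.size() >= std::size_t(nterms));
    const int nmax = 2 * nterms - 1;
    std::vector<double> sumx(nmax, 0.0), sumy(nterms, 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        double xterm = 1.0;
        for (int j = 0; j < nmax; ++j) { sumx[j] += xterm; xterm *= x[i]; }
        double yterm = y[i];
        for (int j = 0; j < nterms; ++j) { sumy[j] += yterm; yterm *= x[i]; }
    }
    // Augmented normal equations: a[r][c] = sumx[r+c], last column = sumy[r].
    std::vector<std::vector<double> > a(nterms, std::vector<double>(nterms + 1));
    for (int r = 0; r < nterms; ++r) {
        for (int c = 0; c < nterms; ++c) a[r][c] = sumx[r + c];
        a[r][nterms] = sumy[r];
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < nterms; ++col) {
        int piv = col;
        for (int r = col + 1; r < nterms; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[piv][col])) piv = r;
        std::swap(a[col], a[piv]);
        assert(a[col][col] != 0.0);  // a zero pivot means a singular system
        for (int r = 0; r < nterms; ++r) {
            if (r == col) continue;
            double f = a[r][col] / a[col][col];
            for (int c = col; c <= nterms; ++c) a[r][c] -= f * a[col][c];
        }
    }
    std::vector<double> coef(nterms);
    for (int r = 0; r < nterms; ++r) coef[r] = a[r][nterms] / a[r][r];
    return coef;
}
```

Fitting three terms to samples of an exact quadratic should return its coefficients to within rounding error, which makes a convenient sanity check before trusting the fit on measured band positions.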
#if defined(SA)
#include <stdio.h>
int main(int argc, char* argv[])
{
    static float x[] =
    {   0.0f,
         282.0f, 338.0f, 393.0f, 460.0f, 486.0f,
         512.0f, 564.0f, 615.0f, 668.0f, 726.0f,
         780.0f, 832.0f, 886.0f, 940.0f, 993.0f,
        1047.0f,1099.0f,1152.0f,1204.0f,1262.0f,
        1317.0f,1370.0f,1424.0f,1476.0f,1532.0f,
        1587.0f,1640.0f,1694.0f,1746.0f,1801.0f
    };
    static float y[] =
    {   0.0f,
        1436.0f,1420.0f,1408.0f,1404.0f,1400.0f,
        1388.0f,1384.0f,1364.0f,1364.0f,1352.0f,
        1348.0f,1340.0f,1340.0f,1336.0f,1328.0f,
        1328.0f,1328.0f,1328.0f,1324.0f,1328.0f,
        1328.0f,1324.0f,1324.0f,1324.0f,1324.0f,
        1320.0f,1332.0f,1332.0f,1332.0f,1328.0f
    };
# define NPTS (sizeof(x)/sizeof(x[0]) - 1)
    float ochisq;
    /* raise the polynomial order until chi-square stops improving */
    for(int tnr=2;tnr<=6;tnr++) {
        float chisq, coef[8];
        (void)polfit(x,y,NULL,NPTS,tnr,NO_WEIGHTING,coef,chisq);
        ::printf("\ntnr=%d chisq=%f\n",tnr,chisq);
        for(int t=1;t<=tnr;t++)
            ::printf("\tcoef[%d]=%f\n",t,coef[t]);
        if(2!=tnr) {
            float improv = 100.0f*ochisq/chisq;
            ::printf("IMPROVED BY %7.2f%%\n",improv);
            if(improv < 115.0f) break;
        }
        ochisq = chisq;
    }
}
#endif
^******************************************* ***************************
* FILE: preproc.cxx AUTHOR: Andy Marks
* COPYRIGHT (c) 1996, University of Utah */
#include <basecall/mb.hxx> struct MINV { double v; int i; }; void
Wvfm::truvelAdjust() { double const EXPONENT = 1.0/2.2; double CONSTANT = ::pow( 1.0/255.0, EXPONENT ); double minv[5]; int Idx, sdx; for(ldx = 1 ; Idx <= lanes(); ldx++) { minv[ldx] = 1000.0; for(sdx = bgni(); sdx <= endi(); sdx++) { double v = sc_la(sdx,ldx); if(v <= 63.0) v = v*0.2511; else if(v <= 127.0) v = (v*0.7028 - 255.0*0.1129); else if(v <= 191.0) v = (v* 1.245 - 255.0*0.3887); else v = (v*1.7916 - 255.0+0.7916); v = pow( 1.0/v, EXPONENT ) - CONSTANT; sc_la_set( sdx, Idx, v ); if(v < minv[ldx]) minv[ldx] = v; }
} for(ldx = 1 ; Idx <= lanes(); ldx++) { double mn = minvfldx]; for(sdx = bgni(); sdx <= endi(); sdx++) sc_la_sub( sdx.ldx, mn );
} } static void minimum( double **pm, int ROW, int COL, int N, MINV& rm ) { rm.v = pm[ rm.i=ROW ][ COL ]; for(int sdx = ROW+1 ; sdx<ROW+N; sdx++) if( pm[ sdx ][ COL ] < rm.v ) rm.v = pm[ rm.i=sdx ][ COL ]; } void Wvfm::baseline( int N )
{
  int LEN = endi()-bgni()+1, sdx, ldx;
  MINV minv[5];
  double **pm;
  pm = ::dmatrix( 1, LEN + (2*N), 1, lanes() );
  for(ldx=1; ldx<=lanes(); ldx++) {
    double S1 = sc_la( bgni(), ldx ), SN = sc_la( endi(), ldx );
    for(sdx=1; sdx<=N; sdx++) {
      pm[sdx][ ldx ] = S1;
      pm[N+LEN+sdx][ ldx ] = SN;
    }
    for(sdx=1; sdx<=LEN; sdx++)
      pm[N+sdx ][ ldx ] = sc_la( bgni()+sdx-1, ldx );
  }
  for(ldx=1; ldx<=lanes(); ldx++)
    minimum( pm, 1, ldx, 2*N, minv[ldx] );
  for(ldx = 1; ldx <= lanes(); ldx++) {
    int irnr, lrnr;
    irnr = bgni();
    for(lrnr = N+1; lrnr <= N+LEN; irnr++) {
      pm_[ irnr ][ ldx ] = pm[lrnr++][ldx] - minv[ldx].v;
      if( minv[ ldx ].i < (lrnr-N) )
        minimum( pm, lrnr-N, ldx, 2*N, minv[ldx] );
      else if( minv[ ldx ].v > pm[ lrnr+N-1 ][ ldx ])
        minv[ ldx ].v = pm[ minv[ ldx ].i = lrnr+N-1 ][ ldx ];
    }
  }
  ::free_dmatrix(pm,1,LEN+2*N,1,lanes());
}
void
Wvfm::noZeros()
{
  for(int ldx = 1; ldx <= lanes(); ldx++)
    for(int sdx = bgni(); sdx <= endi(); sdx++)
      if(0.0 == pm_[sdx][ldx]) pm_[sdx][ldx] = DBL_EPSILON;
}
int Wvfm::mdynpre()
{
  bgnEnd();
  if(1 != specSep()) return 0;
  if(_1X != ib_) {
    double **pn;
    float *x, *y, *y2;
    int r,R,s,c,k,NF,N,NP;
    NP = (_3X==ib_)?3:2;
    NF = (scanl()+1)/NP;
    N = (NF*NP*NP+1);
    if(NULL == (pn = ::dmatrix( 1,N, 1,lanes()))) {
      status_ = STS_NO_MEM;
      return 0;
    }
    x = ::vector(1,4);
    y = ::vector(1,4);
    y2 = ::vector(1,4);
    if(!x || !y || !y2) {
      status_ = STS_NO_MEM;
      return 0;
    }
    for(c=1; c<=lanes(); c++) {
      R = (bgni()-1)*NP+1;
      for(r=bgni(); r<=(endi()-3); r+=NP) {
        for(s=0; s<=3; s++) {
          x[s+1] = float(r+s);
          y[s+1] = float(sc_la(r+s,c));
        }
        ::spline( x, y, 4, y[2]-y[1], y[4]-y[3], y2);
        for(s=0; s<NP; s++)
          for(k=0; k<NP; k++) {
            float yout, xin;
            xin = float(r+s)+float(k)/float(NP);
            ::splint( x, y, y2, 4, xin, &yout);
            pn[R++][c] = double(yout);
          }
      }
      pn[R][c] = y[4];
    }
    ::free_vector(x,1,4);
    ::free_vector(y,1,4);
    ::free_vector(y2,1,4);
    pm( pn, R, bgni()*NP-NP+1, endi()*NP-NP+1 );
  }
  return 1;
}
int
Wvfm::preproc()
{
  if(MDYN == ds_)
    if(1 != mdynpre()) return 0;
  if(TRUVEL == ds_) truvelAdjust();
  else if(MDYN != ds_) baseline( 50 );
  noZeros();
  bgni_++;
  endi_--;
  return 1;
}
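As a reading aid, the baseline removal in Wvfm::baseline() can be sketched in standalone form. The sketch below is not the listing's code: it assumes 0-based std::vector storage, uses a brute-force window minimum rather than the incrementally updated minimum of the listing, and the name subtractBaseline is hypothetical. Clamping the window at the ends has the same effect as the listing's padding with copies of the first and last samples, because those samples are already inside the clamped window. The listing's window is not exactly symmetric, so the half-width here is only an approximation of its 2*N span.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: subtract from each sample the minimum found in a
// window of up to halfWidth samples on either side of it. The window is
// clamped to the ends of the trace.
std::vector<double> subtractBaseline(const std::vector<double>& trace,
                                     std::size_t halfWidth)
{
    assert(!trace.empty());
    const std::size_t n = trace.size();
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t lo = (i > halfWidth) ? i - halfWidth : 0;
        const std::size_t hi = std::min(n - 1, i + halfWidth);
        const double localMin =
            *std::min_element(trace.begin() + lo, trace.begin() + hi + 1);
        out[i] = trace[i] - localMin;  // never negative by construction
    }
    return out;
}
```

Because the local minimum is always at or below the sample itself, the result is non-negative, which is consistent with the listing following baseline() with a step that only replaces exact zeros.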
/**********************************************************************
 * FILE: Quadratic.cxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#include <stdio.h>
#include <float.h>
#if defined(sun)
# include <ieeefp.h>
#endif
#if defined(_WIN32)
# define finite _finite
#endif
#include <nrc/nr.hxx>
int ilinreg( int const* px, int const* py, int N, float* coef, float* prr)
{
  float **m, **v, muY=0.0f;
  double sx, sx2, ss_about_mu=0.0, ss_due_reg=0.0, ss_about_reg=0.0;
  int idx;
  coef[0] = coef[1] = coef[2] = 0.0f;
  if(N<2) return 0;
  if(NULL == (m = ::matrix(1,2,1,2))) return 0;
  if(NULL == (v = ::matrix(1,2,1,1))) {
    if(m) ::free_matrix(m,1,2,1,2);
    return 0;
  }
  v[1][1] = v[2][1] = 0.0f;
  sx = sx2 = 0.0;
  for(idx=1; idx<=N; idx++) {
    double x = double(px[idx]), y = double(py[idx]);
    sx += x;
    sx2 += x*x;
    v[1][1] += float(y);
    v[2][1] += float(x*y);
  }
  muY = v[1][1]/float(N);
  m[1][1] = float(N);
  m[1][2] = m[2][1] = float(sx);
  m[2][2] = float(sx2);
  gaussj( m,2, v,1 );
  coef[0] = v[1][1];
  coef[1] = v[2][1];
  for(idx=1;idx<=N;idx++) {
    double yh = coef[1]*double(px[idx]) + coef[0];
    double y = double(py[idx]);
    ss_about_mu += (y-muY)*(y-muY);
    ss_due_reg += (yh-muY)*(yh-muY);
    ss_about_reg += (y-yh)*(y-yh);
  }
  if(N>2) coef[2] = float(ss_about_reg/double(N-2));
  if(NULL != prr)
    *prr = float(ss_due_reg/ss_about_mu);
  ::free_matrix(m,1,2,1,2);
  ::free_matrix(v,1,2,1,1);
  return 1;
}
int linreg( float const* px, float const* py, int N, float* coef, float* prr)
{
  float **m, **v, muY=0.0f;
  double sx, sx2, ss_about_mu=0.0, ss_due_reg=0.0, ss_about_reg=0.0;
  int idx;
  coef[0] = coef[1] = coef[2] = 0.0f;
  if(N<2) return 0;
  if(NULL == (m = ::matrix(1,2,1,2))) return 0;
  if(NULL == (v = ::matrix(1,2,1,1))) {
    if(m) ::free_matrix(m,1,2,1,2);
    return 0;
  }
  v[1][1] = v[2][1] = 0.0f;
  sx = sx2 = 0.0;
  for(idx=1; idx<=N; idx++) {
    float x=px[idx], y=py[idx];
    sx += x;
    sx2 += x*x;
    v[1][1] += float(y);
    v[2][1] += float(x*y);
  }
  muY = v[1][1]/float(N);
  m[1][1] = float(N);
  m[1][2] = m[2][1] = float(sx);
  m[2][2] = float(sx2);
  gaussj( m,2, v,1 );
  coef[0] = v[1][1];
  coef[1] = v[2][1];
  for(idx=1;idx<=N;idx++) {
    double yh = coef[1]*double(px[idx]) + coef[0];
    double y = double(py[idx]);
    ss_about_mu += (y-muY)*(y-muY);
    ss_due_reg += (yh-muY)*(yh-muY);
    ss_about_reg += (y-yh)*(y-yh);
  }
  if(N>2) coef[2] = float(ss_about_reg/double(N-2));
  if(NULL != prr)
    *prr = float(ss_due_reg/ss_about_mu);
  ::free_matrix(m,1,2,1,2);
  ::free_matrix(v,1,2,1,1);
  return 1;
}
int iquadratic( int const* px, int const* py, int N, float* quad)
{
  float **m, **v;
  double sx, sx2, sx3, sx4, uy;
  double sum;
  int idx;
  quad[0] = quad[1] = quad[2] = quad[3] = 0.0f;
  if(N<=3) return 0;
  m = ::matrix(1,3,1,3);
  if(!m) return 0;
  v = ::matrix(1,3,1,1);
  if(!v) {
    ::free_matrix(m,1,3,1,3);
    return 0;
  }
  v[1][1] = v[2][1] = v[3][1] = 0.0f;
  sx = sx2 = sx3 = sx4 = 0.0;
  for(idx=1; idx<=N; idx++) {
    double x = double(px[idx]), y = double(py[idx]);
    sx += x;
    sx2 += x*x;
    sx3 += x*x*x;
    sx4 += x*x*x*x;
    v[1][1] += float(y);
    v[2][1] += float(x*y);
    v[3][1] += float(x*x*y);
  }
  uy = v[1][1]/float(N);
  sum = 0.0;
  for(idx=1; idx<=N; idx++) {
    double y = double(py[idx]);
    sum += (y-uy)*(y-uy);
  }
  sum /= double(N-1);
  if(0.0 == sum) quad[0] = float(uy);
  else {
    v[1][1] /= float(sum);
    v[2][1] /= float(sum);
    v[3][1] /= float(sum);
    m[1][1] = float(float(N)/sum);
    m[1][2] = m[2][1] = float(sx/sum);
    m[1][3] = m[2][2] = m[3][1] = float(sx2/sum);
    m[2][3] = m[3][2] = float(sx3/sum);
    m[3][3] = float(sx4/sum);
    gaussj( m,3, v,1);
    sum = 0.0;
    for(idx=1; idx<=N; idx++) {
      double x = double(px[idx]), y = double(py[idx]), yprim;
      yprim = v[1][1] + v[2][1]*x + v[3][1]*x*x;
      sum += (yprim-y)*(yprim-y);
    }
    quad[0] = v[1][1];
    quad[1] = v[2][1];
    quad[2] = v[3][1];
    quad[3] = float(sum/double(N-2-1));
  }
  ::free_matrix(m,1,3,1,3);
  ::free_matrix(v,1,3,1,1);
  return 1;
}
int dquadratic( double const* px, double const* py, int N, float* coef)
{
  float **m, **v;
  double sx, sx2, sx3, sx4, uy, sum;
  int idx;
  coef[0] = coef[1] = coef[2] = coef[3] = 0.0f;
  if(N<=3) return 0;
  m = ::matrix(1,3,1,3);
  if(!m) return 0;
  v = ::matrix(1,3,1,1);
  if(!v) {
    ::free_matrix(m,1,3,1,3);
    return 0;
  }
  v[1][1] = v[2][1] = v[3][1] = 0.0f;
  sx = sx2 = sx3 = sx4 = 0.0;
  for(idx=1; idx<=N; idx++) {
    double x=px[idx], y=py[idx];
    sx += x;
    sx2 += x*x;
    sx3 += x*x*x;
    sx4 += x*x*x*x;
    v[1][1] += float(y);
    v[2][1] += float(x*y);
    v[3][1] += float(x*x*y);
  }
  uy = v[1][1]/float(N);
  sum = 0.0;
  for(idx=1; idx<=N; idx++) {
    double y = double(py[idx]);
    sum += (y-uy)*(y-uy);
  }
  sum /= double(N-1);
  if(0.0 == sum) {
    coef[0] = float(uy);
    coef[1] = coef[2] = coef[3] = 0.0f;
  }
  else {
    v[1][1] /= float(sum);
    v[2][1] /= float(sum);
    v[3][1] /= float(sum);
    m[1][1] = float(float(N)/sum);
    m[1][2] = m[2][1] = float(sx/sum);
    m[1][3] = m[2][2] = m[3][1] = float(sx2/sum);
    m[2][3] = m[3][2] = float(sx3/sum);
    m[3][3] = float(sx4/sum);
    gaussj(m,3,v,1);
    sum = 0.0;
    for(idx=1; idx<=N; idx++) {
      double x = px[idx], y = py[idx], yprim;
      yprim = v[1][1] + v[2][1]*x + v[3][1]*x*x;
      sum += (yprim-y)*(yprim-y);
    }
    coef[0] = float(v[1][1]);
    coef[1] = float(v[2][1]);
    coef[2] = float(v[3][1]);
    coef[3] = float(sum/double(N-2-1));
  }
  ::free_matrix(m,1,3,1,3);
  ::free_matrix(v,1,3,1,1);
  if(!finite(coef[3])) coef[3] = float(1E6);
  return 1;
}
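The straight-line fits in ilinreg() and linreg() solve a 2x2 normal-equation system and then report the residual variance and the ratio of regression to total sum of squares. The following standalone sketch shows the same arithmetic in closed form. It is an illustration under stated assumptions, not the listing's code: it uses 0-based std::vector input and double precision throughout, it solves the system directly instead of calling gaussj, and the names LineFit and fitLine are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LineFit {
    double intercept;         // corresponds to coef[0] in the listing
    double slope;             // corresponds to coef[1]
    double residualVariance;  // SS about regression / (N-2); 0 when N == 2
    double rSquared;          // SS due to regression / SS about the mean
};

// Hypothetical helper: least-squares line through the points (x[i], y[i]).
// Returns false for fewer than two points or when all x values coincide.
bool fitLine(const std::vector<double>& x, const std::vector<double>& y,
             LineFit& fit)
{
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    if (n < 2) return false;

    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    // Normal equations: [n sx; sx sxx] * [intercept; slope] = [sy; sxy]
    const double det = double(n) * sxx - sx * sx;
    if (det == 0.0) return false;
    fit.intercept = (sy * sxx - sx * sxy) / det;
    fit.slope = (double(n) * sxy - sx * sy) / det;

    const double meanY = sy / double(n);
    double ssMean = 0.0, ssReg = 0.0, ssRes = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double yh = fit.slope * x[i] + fit.intercept;
        ssMean += (y[i] - meanY) * (y[i] - meanY);
        ssReg += (yh - meanY) * (yh - meanY);
        ssRes += (y[i] - yh) * (y[i] - yh);
    }
    fit.residualVariance = (n > 2) ? ssRes / double(n - 2) : 0.0;
    fit.rSquared = (ssMean > 0.0) ? ssReg / ssMean : 0.0;
    return true;
}
```

Unlike the listing, the sketch guards the division by the sum of squares about the mean, so a constant y series yields an rSquared of 0 rather than a division by zero.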
/**********************************************************************
 * FILE: Quadratic.hxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#if !defined(_QUADRATIC_HXX_)
# define _QUADRATIC_HXX_
void iquadratic(int const* x, int const* y, int n, float coef[4]);
void dquadratic(double const* x, double const* y, int n, float coef[4]);
#endif
/**********************************************************************
 * FILE: RatioBin.hxx
 * AUTHOR: Andy Marks
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <basecall/RatioBin.hxx>
#include <nrc/nr.hxx>
RatioPattern::RatioPattern(int nfluor)
  : nfluor_(nfluor), npatrn_(0), chmrdy_(0)
{
  int pdx, rdx;
  for(pdx=0;pdx<MAXPTRN;pdx++) {
    PATRN& rp = patrn_[pdx];
    rp.frstAtBand = rp.lastAtBand = 0;
    rp.nentry = rp.t3 = rp.t4 = 0;
    rp.r34 = 0.0;
    for(rdx=0;rdx<nfluor_;rdx++)
      rp.rnsum[rdx] = rp.ratio[rdx] = 0.0f;
  }
#if 1
  PATRN* pp;
  pp = &patrn_[0];
  pp->nentry=1; pp->t3=3; pp->t4=1; pp->r34=0.8f;
  pp->rnsum[0] = 1.0f; pp->rnsum[1] = 0.5f; pp->rnsum[2] = 0.5f; pp->rnsum[3] = 0.8f;
  pp->ratio[0] = 1.0f; pp->ratio[1] = 0.5f; pp->ratio[2] = 0.5f; pp->ratio[3] = 0.8f;
  pp = &patrn_[1];
  pp->nentry=1; pp->t3=3; pp->t4=1; pp->r34=0.6f;
  pp->rnsum[0] = 0.4f; pp->rnsum[1] = 1.0f; pp->rnsum[2] = 0.3f; pp->rnsum[3] = 0.3f;
  pp->ratio[0] = 0.4f; pp->ratio[1] = 1.0f; pp->ratio[2] = 0.3f; pp->ratio[3] = 0.3f;
  pp = &patrn_[2];
  pp->nentry=1; pp->t3=1; pp->t4=2; pp->r34=0.3f;
  pp->rnsum[0] = 0.0f; pp->rnsum[1] = 0.3f; pp->rnsum[2] = 1.0f; pp->rnsum[3] = 0.1f;
  pp->ratio[0] = 0.0f; pp->ratio[1] = 0.3f; pp->ratio[2] = 1.0f; pp->ratio[3] = 0.1f;
  pp = &patrn_[3];
  pp->nentry=1; pp->t3=3; pp->t4=2; pp->r34=0.8f;
  pp->rnsum[0] = 0.3f; pp->rnsum[1] = 0.4f; pp->rnsum[2] = 1.0f; pp->rnsum[3] = 0.8f;
  pp->ratio[0] = 0.3f; pp->ratio[1] = 0.4f; pp->ratio[2] = 1.0f; pp->ratio[3] = 0.8f;
  npatrn_ = 4;
#endif
  for(pdx=0;pdx<nfluor_;pdx++)
    for(rdx=0;rdx<nfluor_;rdx++)
      chm_[pdx][rdx] = (pdx==rdx)?1.0:0.0;
}
void
RatioPattern::debug() const
{
  ::printf("P# #ent t3 t4 frst last rat34 ratfl1 ratfl2 ratfl3 ratfl4\n");
  for(int pdx=0;pdx<MAXPTRN;pdx++) {
    PATRN const& rp = patrn_[pdx];
    ::printf("%2d %4d %2d %2d %4d %4d %5.3f %6.3f %6.3f %6.3f %6.3f\n",
             pdx,rp.nentry,rp.t3,rp.t4,rp.frstAtBand,rp.lastAtBand,rp.r34,
             rp.ratio[0],rp.ratio[1],rp.ratio[2],rp.ratio[3]);
    ::printf("\n");
  }
}
int RatioPattern::add( double const* obs, int bandnr )
{
  double mind=4.0f;
  int minx=-1, pdx, rdx, rv=1;
  if(bandnr<=2 || npatrn_>=10) return 1;
  for(pdx=0;pdx<npatrn_;pdx++) {
    double eucd=0.0;
    for(rdx=0;rdx<nfluor_;rdx++) {
      double v = (patrn_[pdx].ratio[rdx]-obs[rdx]);
      eucd += v*v;
    }
    eucd = ::sqrt(eucd);
    if(eucd<mind) {
      mind=eucd;
      minx=pdx;
    }
  }
  if((-1==minx || mind>MAXEUCD) && npatrn_<MAXPTRN) {
    PATRN& rp = patrn_[minx = npatrn_++];
    rp.frstAtBand = bandnr;
    rp.nentry++;
    for(rdx=0;rdx<nfluor_;rdx++) {
      rp.ratio[rdx] = obs[rdx];
      rp.rnsum[rdx] = obs[rdx];
    }
    mind = 0.0;
  }
  else if(mind<=MAXEUCD) {
    PATRN& rp = patrn_[minx];
    rp.lastAtBand = bandnr;
    rp.nentry++;
    for(rdx=0;rdx<nfluor_;rdx++) {
      rp.ratio[rdx] = ((FLTRK-1.0)*rp.ratio[rdx] + obs[rdx])/FLTRK;
      rp.rnsum[rdx] += obs[rdx];
    }
  }
  else rv = 0;
#if 0
  if(1==rv) {
    PATRN& rp = patrn_[minx];
    ::printf("OBS[nn](%5.2f%5.2f%5.2f%5.2f)|%5.3f|",
             obs[0],obs[1],obs[2],obs[3],mind);
    ::printf("PAT[%2d](%5.2f %5.2f %5.2f %5.2f)\n",
             minx,rp.ratio[0],rp.ratio[1],rp.ratio[2],rp.ratio[3]);
  }
#endif
  return rv;
}
void
RatioPattern::sortOnNentry_()
{
  for(int pdx=npatrn_-1;pdx>0;pdx--)
    for(int tdx=0;tdx<pdx;tdx++)
      if(patrn_[tdx].nentry < patrn_[tdx+1].nentry) {
        PATRN tmp = patrn_[tdx];
        patrn_[tdx] = patrn_[tdx+1];
        patrn_[tdx+1] = tmp;
      }
}
void
RatioPattern::sortOnR34_()
{
  int pdx, rdx, tdx;
  for(pdx=0;pdx<npatrn_;pdx++) {
    PATRN& rp = patrn_[pdx];
    double dtmp[4];
    int itmp[4];
    for(rdx=0;rdx<nfluor_;rdx++) {
      rp.ratio[rdx] = rp.rnsum[rdx]/double(rp.nentry);
      dtmp[rdx] = rp.ratio[rdx];
      itmp[rdx] = rdx;
    }
    for(tdx=nfluor_-1;tdx>0;tdx--)
      for(rdx=0;rdx<tdx;rdx++)
        if(dtmp[rdx]>dtmp[rdx+1]) {
          double dt = dtmp[rdx];
          dtmp[rdx] = dtmp[rdx+1];
          dtmp[rdx+1] = dt;
          int it = itmp[rdx];
          itmp[rdx] = itmp[rdx+1];
          itmp[rdx+1] = it;
        }
    rp.r34 = (0.0==dtmp[3])? 1.0: dtmp[2]/dtmp[3];
    rp.t4 = itmp[3];
    rp.t3 = itmp[2];
  }
  for(pdx=nfluor_-1;pdx>0;pdx--)
    for(tdx=0;tdx<pdx;tdx++)
      if(patrn_[tdx].r34 > patrn_[tdx+1].r34) {
        PATRN t = patrn_[tdx];
        patrn_[tdx] = patrn_[tdx+1];
        patrn_[tdx+1] = t;
      }
}
int
RatioPattern::orderIt_()
{
  PATRN* ordered = new PATRN[ nfluor_ ];
  int pdx, rv = 1;
  for(pdx=0;pdx<nfluor_;pdx++) ordered[pdx].nentry = 0;
  if(NULL == ordered) return 0;
  for(pdx=0;pdx<nfluor_;pdx++) {
    PATRN const& rp = patrn_[pdx];
    int mjr=rp.t4, mnr=rp.t3;
    if(0==ordered[mjr].nentry) ordered[mjr] = rp;
    else if(0==ordered[mnr].nentry) ordered[mnr] = rp;
    else {
      int alt = ordered[mjr].t3;
      if(0==ordered[alt].nentry) {
        ordered[alt] = ordered[mjr];
        ordered[mjr] = rp;
      }
      else {
        alt = ordered[mnr].t3;
        if(0==ordered[alt].nentry) {
          ordered[alt] = ordered[mnr];
          ordered[mnr] = rp;
        }
        else {
          rv = 0;
          goto bugout;
        }
      }
    }
  }
  for(pdx=0;pdx<nfluor_;pdx++) patrn_[pdx] = ordered[pdx];
bugout:
  delete [] ordered;
  return rv;
}
double
RatioPattern::chm(int rn, int cn)
{
  if(0==chmrdy_) {
    sortOnNentry_();
    sortOnR34_();
    if(1 == (chmrdy_ = orderIt_()))
      for(int pdx=0;pdx<nfluor_;pdx++)
        for(int tdx=0;tdx<nfluor_;tdx++)
          chm_[tdx][pdx] = patrn_[pdx].ratio[tdx];
  }
  return chm_[rn-1][cn-1];
}
RatioBin::RatioBin( int nf )
  : nfluor_(nf), CHM_(NULL), SST_(NULL), rp_(nf)
{
  if(nfluor_>1) {
    CHM_ = ::matrix(1,nfluor_,1,nfluor_);
    SST_ = ::matrix(1,nfluor_,1,nfluor_);
  }
}
RatioBin::RatioBin( RatioBin const& rhs )
  : nfluor_(0), CHM_(NULL), SST_(NULL)
{
  *this = rhs;
}
void
RatioBin::release_()
{
  if(NULL != CHM_) {
    ::free_matrix(CHM_,1,nfluor_,1,nfluor_);
    CHM_ = NULL;
  }
  if(NULL != SST_) {
    ::free_matrix(SST_,1,nfluor_,1,nfluor_);
    SST_ = NULL;
  }
}
RatioBin const&
RatioBin::operator=( RatioBin const& rhs )
{
  if(&rhs != this) {
    release_();
    nfluor_ = rhs.nfluor_;
    rp_ = rhs.rp_;
    if(nfluor_>1) {
      if(NULL == (CHM_ = ::matrix(1,nfluor_,1,nfluor_))) goto bugout;
      if(NULL == (SST_ = ::matrix(1,nfluor_,1,nfluor_))) {
        ::free_matrix(CHM_,1,nfluor_,1,nfluor_);
        goto bugout;
      }
      for(int rdx=1;rdx<=nfluor_;rdx++)
        for(int cdx=1;cdx<=nfluor_;cdx++) {
          CHM_[rdx][cdx] = rhs.CHM_[rdx][cdx];
          SST_[rdx][cdx] = rhs.SST_[rdx][cdx];
        }
    }
  }
bugout:
  return *this;
}
RatioBin::~RatioBin()
{
  release_();
}
int
RatioBin::classify( double const* smpl, int x )
{
  double mxv, normalized[4];
  int idx, mxi;
  mxv = smpl[mxi=0];
  for(idx=1;idx<nfluor_;idx++)
    if(smpl[idx]>mxv) mxv = smpl[mxi=idx];
  for(idx=0;idx<nfluor_;idx++)
    normalized[idx] = smpl[idx]/mxv;
  (void)rp_.add( normalized, x );
  return 1;
}
void RatioBin::debug(unsigned lvl) const
{
  int idx, cdx;
  ::printf("RatioBin @ %p\n",(void*)this);
  rp_.debug();
  ::printf("CHM @ %p:\n",(void*)CHM_);
  if(NULL != CHM_)
    for(idx=1;idx<=nfluor_;idx++) {
      ::printf("\t");
      for(cdx=1;cdx<=nfluor_;cdx++)
        ::printf("%8.5f ",CHM_[idx][cdx]);
      ::printf("\n");
    }
  ::printf("SST @ %p:\n",(void*)SST_);
  if(NULL != SST_)
    for(idx=1;idx<=nfluor_;idx++) {
      ::printf("\t");
      for(cdx=1;cdx<=nfluor_;cdx++)
        ::printf("%8.5f ",SST_[idx][cdx]);
      ::printf("\n");
    }
}
int RatioBin::analyze()
{
  float** CNT = ::matrix(1,nfluor_,1,1);
  int idx, jdx;
  for(idx=1;idx<=nfluor_;idx++) {
    CNT[idx][1] = 1.0f;
    for(jdx=1;jdx<=nfluor_;jdx++)
      CHM_[idx][jdx] = float(rp_.chm(idx,jdx));
  }
  for(jdx=1;jdx<=nfluor_;jdx++)
    for(idx=1;idx<=nfluor_;idx++)
      SST_[idx][jdx] = CHM_[idx][jdx];
  ::gaussj( SST_, nfluor_, CNT, 1 );
#if 1
  for(idx=1;idx<=nfluor_;idx++) {
    int mxi=1;
    float mxv=SST_[mxi][idx];
    for(jdx=2;jdx<=nfluor_;jdx++)
      if(SST_[jdx][idx]>mxv) {
        mxv = SST_[jdx][idx];
        mxi = jdx;
      }
    if(idx!=mxi) {
      ::fprintf(stderr,"fluor(%d) has max SST value of %f for fluor(%d)\n",idx,mxv,mxi);
      SST_[1][1]=1.0f; SST_[1][2]=0.4f; SST_[1][3]=0.0f; SST_[1][4]=0.3f;
      SST_[2][1]=0.5f; SST_[2][2]=1.0f; SST_[2][3]=0.3f; SST_[2][4]=0.4f;
      SST_[3][1]=0.5f; SST_[3][2]=0.3f; SST_[3][3]=1.0f; SST_[3][4]=1.0f;
      SST_[4][1]=0.8f; SST_[4][2]=0.3f; SST_[4][3]=0.1f; SST_[4][4]=0.8f;
      ::gaussj( SST_, nfluor_, CNT, 1 );
      break;
    }
  }
#endif
  ::free_matrix( CNT, 1,nfluor_,1,1);
  return 1;
}
/**********************************************************************
 * FILE: RatioBin.hxx
 * AUTHOR: Andy Marks
 * Copyright © 1996 University of Utah
 */
#if !defined(_RATIOBIN_HXX_)
#define _RATIOBIN_HXX_
static int const MAXPTRN = 20;
static double const FLTRK = 12.0;
static double const MAXEUCD = 0.50;
class RatioPattern
{
public:
  RatioPattern( int nfluor=4 );
  ~RatioPattern() { };
  int add(double const* obs, int bandNr);
  double chm(int rdx, int cdx);
  void debug() const;
private:
  int nfluor_;
  int npatrn_;
  int chmrdy_;
  struct PATRN {
    int nentry;
    double ratio[4];
    double rnsum[4];
    double r34;
    short t3, t4, frstAtBand, lastAtBand;
  } patrn_[MAXPTRN];
  double chm_[4][4];
  void sortOnNentry_();
  void sortOnR34_();
  int orderIt_();
};
class RatioBin
{
public:
  RatioBin( int nsmpl=4 );
  RatioBin( RatioBin const& rhs );
  RatioBin const& operator=( RatioBin const& rhs );
  ~RatioBin();
  int classify( double const* smpls, int scanline );
  int analyze();
  double sst(short row, short col) const { return SST_[row][col]; }
  void debug(unsigned lvl=0) const;
private:
  void release_();
  int nfluor_;
  float** CHM_;
  float** SST_;
  RatioPattern rp_;
};
#endif
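The clustering idea behind RatioPattern::add() is: find the stored ratio pattern nearest to a normalized observation, start a new pattern if nothing is within MAXEUCD, and otherwise blend the observation into the nearest pattern with weight 1/FLTRK. The sketch below restates that idea in standalone form. It is a simplified illustration, not the listing's code: it assumes four fluorophores, omits the listing's pattern-count limits, band-number bookkeeping and running sums, and the names Ratios, Pattern and addObservation are hypothetical. The default thresholds are the MAXEUCD and FLTRK values declared in RatioBin.hxx.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Ratios = std::array<double, 4>;

struct Pattern {
    Ratios ratio;  // filtered ratio vector for this pattern
    int count;     // number of observations assigned to it
};

// Hypothetical helper: assign obs to the nearest stored pattern (Euclidean
// distance). If no pattern lies within maxDist, a new pattern is started;
// otherwise obs is blended into the nearest one with weight 1/filterK.
// Returns the index of the pattern that received the observation.
std::size_t addObservation(std::vector<Pattern>& patterns, const Ratios& obs,
                           double maxDist = 0.50, double filterK = 12.0)
{
    assert(filterK >= 1.0);
    std::size_t best = patterns.size();  // "none found" marker
    double bestDist = 0.0;
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        double d2 = 0.0;
        for (std::size_t k = 0; k < obs.size(); ++k) {
            const double diff = patterns[i].ratio[k] - obs[k];
            d2 += diff * diff;
        }
        const double d = std::sqrt(d2);
        if (best == patterns.size() || d < bestDist) {
            best = i;
            bestDist = d;
        }
    }
    if (best == patterns.size() || bestDist > maxDist) {
        patterns.push_back(Pattern{obs, 1});
        return patterns.size() - 1;
    }
    Pattern& p = patterns[best];
    for (std::size_t k = 0; k < obs.size(); ++k)
        p.ratio[k] = ((filterK - 1.0) * p.ratio[k] + obs[k]) / filterK;
    ++p.count;
    return best;
}
```

With filterK = 12 each new observation moves the stored ratio by one twelfth of the difference, so a pattern drifts slowly toward recent data while single outliers have little effect.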
/**********************************************************************
 * FILE: RdrOut.cxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#include <basecall/mb.hxx>
#include <nrc/Complex.hxx>
#include <basecall/sw.hxx>
SSNODE::SSNODE()
  : fltwid_(0), start_(0), finish_(0), SWold_(0), thresh_(0.0f)
{
}
void
SSNODE::debug() const
{
  ::printf("SSNODE @ %p\n",(void*)this);
  ::printf(" fltwid = %4d\n",fltwid());
  ::printf(" SWold = %4d\n",SWold());
  ::printf(" start = %4d\n",start());
  ::printf(" finish = %4d\n",finish());
  ::printf(" rdlen = %4d\n",rdlen());
  ::printf(" thresh = %4.2f\n",thresh());
}
QualCtrl::QualCtrl()
  : tbgn_(0), tend_(0), nseg_(0), skipseq_(NULL)
{
  for(int idx=0;idx<MAXSEG;idx++)
    bspac_[idx] = fbwv_[idx] = 0;
}
char const* QualCtrl::shft( int idx0, char* buf, int buflen ) const
{
  if((buflen>=15) && (idx0<nseg())) {
    ShftVect s = shft(idx0);
    ::sprintf( buf, "[%2hd,%2hd,%2hd,%2hd]", s.s(1),s.s(2),s.s(3),s.s(4));
  }
  else
    ::strcpy( buf, "?" );
  return (char const*)buf;
}
QualCtrl::QualCtrl(QualCtrl const& rhs)
  : tbgn_(0), tend_(0), nseg_(0), skipseq_(NULL)
{
  *this = rhs;
}
QualCtrl const& QualCtrl::operator=(QualCtrl const& rhs)
{
  if(this != &rhs) {
    for(int idx=0;idx<MAXSEG;idx++) {
      shft_[idx] = rhs.shft_[idx];
      fbwv_[idx] = rhs.fbwv_[idx];
      bspac_[idx] = rhs.bspac_[idx];
    }
    tbgn_ = rhs.tbgn_;
    tend_ = rhs.tend_;
    nseg_ = rhs.nseg_;
    cutdata_ = rhs.cutdata_;
    if(skipseq_) delete [] skipseq_;
    skipseq_ = new char[ 1+::strlen(rhs.skipseq_) ];
    if(skipseq_) ::strcpy( skipseq_, rhs.skipseq_ );
  }
  return *this;
}
QualCtrl::~QualCtrl()
{
  if(skipseq_) {
    delete [] skipseq_;
    skipseq_ = NULL;
  }
}
void QualCtrl::debug() const
{
  ::printf("QualCtrl @ %p\n",(void const*)this);
  ::printf(" elapsed=%d(Sec)\n", tend_-tbgn_);
  cutdata_.debug();
  for(int idx=0;idx<nseg_;idx++) {
    ::printf(" idx=%d: fbw=%2d, bspac=%2d\t",idx,fbwv_[idx],bspac_[idx]);
    shft_[idx].debug();
  }
}
RdrOut::RdrOut( Wvfm const& in )
  : overlayIdx_(1), oiS1_( in.bgni()+1 ), oPosLen_(0), oPos_(NULL), previ_(1)
{
  ::strcpy( lnordr_, in.lnordr() );
  for(int idx=1;idx<=OVRLAP;idx++)
    FM_[idx] = (-1.0/double(OVRLAP-1))*idx+(double(OVRLAP)/double(OVRLAP-1));
}
RdrOut::~RdrOut()
{
  if(NULL != oPos_) {
    ::free_ivector( oPos_, 1,oPosLen_);
    oPos_ = NULL;
    oPosLen_ = 0;
  }
}
int RdrOut::add( int passnr, int fbw, int medSP, SegRead const& sr )
{
  int rv;
  qualctrl_.shft( passnr-1, sr.nsv() );
  qualctrl_.fbw( passnr-1, fbw );
  qualctrl_.bspac( passnr-1, medSP );
  qualctrl_.nseg( passnr );
  if(1 == passnr) {
    bandStats_ = sr.bandStats();
    traceOut_ = *sr.wvfm();
    rv = 1;
  }
  else if(1 == (rv = was_at( sr, passnr )))
    join( sr );
  if(rv) {
    if(NULL != oPos_)
      ::free_ivector( oPos_, 1,oPosLen_ );
    oPosLen_ = sr.bandStats().len();
    if((0==oPosLen_) || (NULL == (oPos_=::ivector(1,oPosLen_))))
      rv = 0;
    else
      for(int idx=1;idx<=oPosLen_;idx++)
        oPos_[idx] = sr.bandStats().posn(idx-1);
  }
  return rv;
}
void
RdrOut::join( SegRead const& sr )
{
  int lhsEnd0, rhsBgn0, oscnl, nscnl, lShortFall, rShortFall;
  lhsEnd0 = bandStats_.posn(overlayIdx_-1) + 1 - OVRLAP/2;
  rhsBgn0 = sr.bandStats().posn(previ_-1) + 1 - OVRLAP/2;
  lShortFall = OVRLAP-(wvfm().rows()-lhsEnd0+1);
  rShortFall = 1-rhsBgn0;
  if(lShortFall > 0) {
    lhsEnd0 -= lShortFall;
    rhsBgn0 -= lShortFall;
  }
  else if(rShortFall > 0) {
    lhsEnd0 += rShortFall;
    rhsBgn0 += rShortFall;
  }
  for(int idx=1; idx<=OVRLAP; idx++) {
    double osf = FM_[idx], nsf = FM_[OVRLAP-idx+1];
    oscnl = lhsEnd0+idx-1;
    nscnl = rhsBgn0+idx-1;
    for(int lane=1;lane<=4;lane++) {
      double ov, nv;
      ov = wvfm().sc_la( oscnl, lane );
      nv = sr.wvfm()->sc_la( nscnl, lane );
      wvfm().sc_la_set( oscnl, lane, osf*ov + nsf*nv );
    }
  }
  wvfm().append( *sr.wvfm(), oscnl, nscnl+1 );
  bandstat().append( sr.bandStats(),overlayIdx_-1,previ_ );
}
void RdrOut::closesTo(int wasAt, int& dist, int& j) const
{
  int lodx=1, hidx=oPosLen_, tsti, tstp, htsti, ltsti;
  ltsti = lodx;
  htsti = hidx;
  while(lodx <= hidx) {
    tstp = oPos_[ tsti = (hidx+lodx)/2 ];
    if(wasAt == tstp) {
      dist = 0;
      j = tsti;
      return;
    }
    else if(wasAt < tstp) {
      htsti = tsti--;
      hidx = tsti;
    }
    else {
      ltsti = tsti++;
      lodx = tsti;
    }
  }
  int dlo = abs( wasAt - oPos_[ltsti] ), dhi = abs( oPos_[htsti] - wasAt );
  if(dlo < dhi) { dist = dlo; j = ltsti; }
  else { dist = dhi; j = htsti; }
}
int
RdrOut::was_at( SegRead const& sr, int n )
{
  static int const CHKORDR[] = {0,5,6,4,7,3,8,2,9,1,10};
# define NCHK ((sizeof(CHKORDR)/sizeof(CHKORDR[0]))-1)
  int mindist=4, xlation[3], cdx, K;
  int Nband = sr.bandStats().len();
  int rv = 0;
  K = sr.iS1() - (oiS1_ + (n-2)*1900);
  for(cdx=1; cdx<=NCHK; cdx++) {
    int Cdx = CHKORDR[ cdx ];
    for(int lane=0;lane<4;lane++) {
      if((Cdx<=Nband) && (lnordr_[lane] == sr.bandStats().call(Cdx-1))) break;
    }
    if(5 != ++lane) {
      int wasAt, dist, j;
      wasAt = (sr.bandStats().posn(Cdx-1) - qualctrl().shft(n-1).s(lane)) +
              (K + qualctrl().shft(n-2).s(lane));
      closesTo( wasAt, dist, j );
      if(dist < mindist) {
        rv = 1;
        xlation[1] = Cdx;
        xlation[2] = j;
        if(0 == (mindist=dist)) break;
      }
    }
  }
  if(1 == rv) {
    overlayIdx_ += (xlation[2]-previ_);
    previ_ = xlation[1];
  }
  return rv;
}
void RdrOut::Edit( char const* preamble )
{
  int lcutscnl, rcutscnl, NB;
  NB = bandStats_.len();
  lcutscnl = bandStats_.posn(0);
  rcutscnl = bandStats_.posn(NB-1);
  bandqual();
  pickcuts( preamble );
  qualctrl_.stopTimer();
}
}
static int const FW[] = { 7, 9, 11, 13, 15 };
static float const TH[] =
  { 0.74f, 0.75f, 0.76f, 0.77f, 0.78f, 0.79f, 0.80f };
static int const NW = (sizeof(FW)/sizeof(FW[0])), NT = (sizeof(TH)/sizeof(TH[0]));
void RdrOut::cutoff(int fdx, SSNODE& ss, int FFTSZ) const
{
  int NB = bandStats_.len();
  int runlen=0, mxrl=0;
  ss.fltwid( FW[ fdx ] );
  if(ss.fltwid() <= FFTSZ) {
    int LHS = (1+FW[fdx])/2, RHS = FW[fdx]-LHS;
    int anyGood=0, anyL2H=0, anyH2L=0;
    int idx, jdx, good, goodM1=0;
    for(idx=1;idx<=2*LHS;idx+=2) {
      HF_[ idx ] = 1.0/double(ss.fltwid());
      HF_[idx+1] = 0.0;
    }
    for(;idx<=2*(FFTSZ-RHS);idx++) HF_[ idx ] = 0.0;
    for(;idx<=2*FFTSZ;idx+=2) {
      HF_[ idx ] = 1.0/double(ss.fltwid());
      HF_[idx+1] = 0.0;
    }
    ::dfour1( HF_, FFTSZ, 1);
    ::dCMul( XF_, HF_, AF_, FFTSZ );
    ::dCMul( TF_, HF_, BF_, FFTSZ );
    ::dfour1( AF_, FFTSZ, -1);
    ::dfour1( BF_, FFTSZ, -1);
    for(jdx=idx=1;jdx<=2*NB;idx++,jdx+=2) {
      if(good = (AF_[jdx] >= BF_[jdx])) anyGood++;
      if(1 != idx) {
        int dg=good-goodM1;
        if(dg>0) nL2H_[++anyL2H] = idx-1;
        else if(dg<0) nH2L_[++anyH2L] = idx-1;
      }
      goodM1 = good;
    }
    if(anyGood) {
      if(anyGood == NB) {
        ss.start( 1 );
        ss.finish( NB );
      }
      else if(0==anyL2H) {
        ss.start( 1 );
        ss.finish( nH2L_[1] );
      }
      else if(0==anyH2L) {
        ss.start( 1+nL2H_[1] );
        ss.finish( NB );
      }
      else {
        int *mL2H, *mH2L, mLn=anyL2H, mHn=anyH2L, lastH2L, lastL2H;
        mL2H = ::ivector(1,anyL2H+1);
        mH2L = ::ivector(1,anyH2L+1);
        lastH2L = nH2L_[ mHn ];
        lastL2H = nL2H_[ mLn ];
        if(nH2L_[1] < nL2H_[1]) {
          mL2H[1] = 1;
          for(idx=1;idx<=mLn;idx++) mL2H[idx+1] = nL2H_[idx];
          mLn++;
        }
        else
          for(idx=1;idx<=anyL2H;idx++) mL2H[idx] = nL2H_[idx];
        for(idx=1;idx<=mHn;idx++) mH2L[idx] = nH2L_[idx];
        if(lastH2L < lastL2H) mH2L[++mHn] = NB;
        if(mHn == mLn) {
          for(idx=1;idx<=mHn;idx++) {
            runlen = mH2L[idx]-mL2H[idx];
            if(runlen > mxrl) {
              mxrl = runlen;
              ss.start( mL2H[idx] );
              ss.finish( mH2L[idx] );
            }
          }
          ::free_ivector(mL2H, 1,anyL2H+1 );
          ::free_ivector(mH2L, 1,anyH2L+1 );
        }
      }
    }
  }
}
void RdrOut::pickcuts( char const* preamble )
{
  static float const WGHT[6][9] =
  { {0.8944f,0.9076f,0.9208f,0.9340f,0.9472f,0.9604f,0.9736f,0.9868f,1.0000f},
    {0.8755f,0.8885f,0.9014f,0.9143f,0.9272f,0.9401f,0.9530f,0.9660f,0.9789f},
    {0.8567f,0.8693f,0.8819f,0.8946f,0.9072f,0.9199f,0.9325f,0.9451f,0.9578f},
    {0.8378f,0.8501f,0.8625f,0.8749f,0.8872f,0.8996f,0.9119f,0.9243f,0.9367f},
    {0.8189f,0.8310f,0.8430f,0.8551f,0.8672f,0.8793f,0.8914f,0.9035f,0.9155f},
    {0.8000f,0.8118f,0.8236f,0.8354f,0.8472f,0.8590f,0.8708f,0.8826f,0.8944f}
  };
  int jdx, idx, FFTSZ, NB = bandStats_.len(), mxrdlen=0;
  FFTSZ = int( ::pow( 2.0, ceil(::log(double(NB))/::log(2.0)) ) );
  XF_ = ::dvector( 1, FFTSZ*2 );
  TF_ = ::dvector( 1, FFTSZ*2 );
  HF_ = ::dvector( 1, FFTSZ*2 );
  AF_ = ::dvector( 1, FFTSZ*2 );
  BF_ = ::dvector( 1, FFTSZ*2 );
  nH2L_ = ::ivector( 1,NB );
  nL2H_ = ::ivector( 1,NB );
  for(jdx=idx=1;idx<2*NB;jdx++,idx+=2) {
    XF_[ idx ] = double( bandStats_.qual(jdx-1) );
    XF_[ idx+1 ] = 0.0;
  }
  for(; idx<=2*FFTSZ; idx++)
    XF_[ idx ] = 0.0;
  ::dfour1( XF_, FFTSZ, 1);
  for(int tdx=0;tdx<NT;tdx++) {
    for(idx=1;idx<2*NB;idx+=2) {
      TF_[ idx ] = double( TH[tdx] );
      TF_[ idx+1 ] = 0.0;
    }
    for(; idx<=2*FFTSZ; idx++) TF_[ idx ] = 0.0;
    ::dfour1( TF_, FFTSZ, 1);
    for(int fdx=0;fdx<NW;fdx++) {
      SSNODE ss;
      ss.thresh( TH[tdx] );
      cutoff( fdx, ss, FFTSZ );
      int wt_rdlen = int(WGHT[fdx][tdx] * ss.rdlen());
      if(mxrdlen < wt_rdlen) {
        mxrdlen = wt_rdlen;
        qualctrl_.cutdata( ss );
      }
    }
  }
  ::free_dvector( XF_, 1,2*FFTSZ );
  ::free_dvector( HF_, 1,2*FFTSZ );
  ::free_dvector( TF_, 1,2*FFTSZ );
  ::free_dvector( AF_, 1,2*FFTSZ );
  ::free_dvector( BF_, 1,2*FFTSZ );
  ::free_ivector( nH2L_, 1,NB );
  ::free_ivector( nL2H_, 1,NB );
  if(NULL != preamble) {
    int lpre2x = ::strlen(preamble) * 2;
    int lobs = bandstat().len();
    int len = (lpre2x<lobs)? lpre2x: lobs;
    if(qualctrl().cutdata().start() < len) {
      char* obscopy = new char[ len+1 ];
      if(NULL != obscopy) {
        static int const MAGIC_NR = 4;
        int idx, s;
        for(idx=0;idx<len;idx++)
          obscopy[idx] = bandstat().call(idx);
        obscopy[idx] = '\0';
        SW swalign( preamble, obscopy );
        s = qualctrl().cutdata().start();
        if(swalign.hcoord() > s) {
          int hslen = ::strlen(swalign.hout());
          if(hslen >= MAGIC_NR) {
            int hitval = (swalign.score()*swalign.score())/hslen;
            if(hitval > MAGIC_NR) {
              qualctrl().cutdata().SWold( s );
              qualctrl().cutdata().start( swalign.hcoord()+1 );
            }
          }
        }
        delete [] obscopy;
      }
    }
  }
}
void
RdrOut::Beautify(int fbool, int bbool)
{
  if(1==fbool) flatten_();
  if(1==bbool) minNegSwing_();
}
void
RdrOut::flatten_()
{
  int idx, jdx, kdx, N, FFTSZ;
  double *sn, *hn;
  N = traceOut_.endi()-traceOut_.bgni()+1;
  FFTSZ = int(::pow(2.0,::ceil(::log(double(N+129-1))/::log(2.0))));
  sn = ::dvector(1,2*FFTSZ);
  hn = ::dvector(1,2*FFTSZ);
  for(idx=1; idx<=2*FFTSZ; idx++)
    hn[idx] = sn[idx] = 0.0;
  for(jdx=1,idx=traceOut_.bgni(); jdx<=2*N; idx++, jdx+=2)
    sn[jdx] = traceOut_.envv(idx);
  for(kdx=2*FFTSZ-1,jdx=idx=1; idx<=128; idx++,jdx+=2,kdx-=2)
    hn[jdx] = hn[kdx] = 1.0/129.0;
  hn[jdx] = 1.0/129.0;
  ::dfour1( sn, FFTSZ, 1 );
  ::dfour1( hn, FFTSZ, 1 );
  ::dCMul( sn, hn, sn, FFTSZ );
  ::dfour1( sn, FFTSZ, -1 );
  for(kdx=1,idx=traceOut_.bgni(); idx<=traceOut_.endi(); idx++,kdx+=2) {
    double sf = double(FFTSZ)/(DBL_EPSILON+sn[kdx]);
    for(jdx=1;jdx<=4;jdx++)
      traceOut_.sc_la_mul(idx,jdx,sf);
  }
  ::free_dvector(sn,1,2*FFTSZ);
  ::free_dvector(hn,1,2*FFTSZ);
}
void
RdrOut::minNegSwing_()
{
  for(int lnr=1;lnr<=traceOut_.lanes();lnr++) {
    double minv = 0.0;
    int bgn=traceOut_.bgni(), end=traceOut_.endi();
    for(int snr=bgn; snr<=end; snr++) {
      double v = traceOut_.sc_la(snr,lnr);
      if(v < minv) minv = v;
    }
    if(0.0 > minv) {
      double sf = -0.02/minv;
      for(snr=bgn;snr<=end;snr++)
        if(traceOut_.sc_la(snr,lnr) < 0.0)
          traceOut_.sc_la_mul(snr,lnr,sf);
    }
  }
}
int
RdrOut::avgqual() const
{
  float aq = 0.0f;
  int bgn = qualctrl().cutdata().start(), end = qualctrl().cutdata().finish();
  BandStatArray const& bs = bandstat();
  for(int idx=bgn;idx<end;idx++)
    aq += int(100.0f * bs.qual(idx));
  return int(0.5f + aq/float(end-bgn+1));
}
int RdrOut::percentN() const
{
  BandStatArray const& bs = bandstat();
  int bgn = qualctrl().cutdata().start(), end = qualctrl().cutdata().finish(), ambig = 0;
  for(int idx=bgn;idx<end;idx++)
    if(lnordr_[4] == bs.call(idx)) ambig++;
  return int(0.5f + 100.0f*float(ambig)/float(end-bgn+1));
}
char const*
RdrOut::sequence( char* buf, int buflen ) const
{
  BandStatArray const& bs = bandstat();
  int len = (bs.len()<buflen)? bs.len(): buflen;
  for(int idx=0;idx<len;idx++)
    buf[idx] = bs.call(idx);
  buf[idx] = '\0';
  return (char const *)buf;
}
void
RdrOut::debug() const
{
  ::printf("RdrOut at %p\n",(void const*)this);
  ::printf(" oiS1 = %d\n",iS1());
  ::printf(" average quality = %2d, %%ambig = %2d\n",avgqual(),percentN());
  qualctrl_.debug();
  bandStats_.debug();
  traceOut_.debug();
}
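RdrOut::join() blends the overlapping region of two segment reads using the FM_ weights set up in the constructor: FM_[idx] falls linearly from 1 to 0 across the OVRLAP samples, and the new segment is weighted by FM_[OVRLAP-idx+1], so the two weights always sum to 1. The standalone sketch below shows that linear crossfade on plain vectors. It is an illustration only: it assumes 0-based storage and a single lane, and the name crossfade is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: blend two equally long overlapping segments with
// linearly ramped weights. The old segment's weight falls from 1 to 0
// across the overlap while the new segment's weight rises from 0 to 1.
std::vector<double> crossfade(const std::vector<double>& oldSeg,
                              const std::vector<double>& newSeg)
{
    assert(oldSeg.size() == newSeg.size());
    assert(oldSeg.size() >= 2);
    const std::size_t n = oldSeg.size();
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Same ramp as FM_[i+1] = (n - (i+1)) / (n - 1) in 1-based terms.
        const double wOld = double(n - 1 - i) / double(n - 1);
        out[i] = wOld * oldSeg[i] + (1.0 - wOld) * newSeg[i];
    }
    return out;
}
```

A linear ramp keeps the blended trace continuous at both ends of the overlap, since the first sample is taken entirely from the old segment and the last entirely from the new one.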
/**********************************************************************
 * FILE: RdrOut.hxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#ifndef _RDROUT_HXX_
#define _RDROUT_HXX_
#include <time.h>
#include <basecall/Metrics.hxx>
static int const MAXSEG = 6;
static int const OVRLAP = 20;
#if defined(WIN32)
class _declspec( dllexport ) SSNODE
#else
class SSNODE
#endif
{
public:
  SSNODE();
  ~SSNODE() {;}
  int fltwid() const { return fltwid_; }
  int start() const { return start_; }
  int finish() const { return finish_; }
  int rdlen() const { return finish_-start_+1; }
  int SWold() const { return SWold_; }
  float thresh() const { return thresh_; }
  void fltwid(int w) { fltwid_ = w; }
  void start(int s) { start_ = s; }
  void finish(int f) { finish_ = f; }
  void SWold(int s) { SWold_ = s; }
  void thresh(float t) { thresh_ = t; }
  void debug() const;
private:
  int fltwid_, start_, finish_, SWold_;
  float thresh_;
};
#if defined(WIN32)
class _declspec( dllexport ) QualCtrl
#else
class QualCtrl
#endif
{
public:
  QualCtrl();
  QualCtrl( QualCtrl const& rhs );
  QualCtrl const& operator=( QualCtrl const& rhs );
  ~QualCtrl();
  void shft( int idx0, ShftVect const& sv )
    { if(idx0<MAXSEG) shft_[idx0] = sv; }
  void fbw( int idx0, int val )
    { if(idx0<MAXSEG) fbwv_[idx0] = val; }
  void bspac( int idx0, int val )
    { if(idx0<MAXSEG) bspac_[idx0] = val; }
  void nseg( int v ) { nseg_ = v; }
  void cutdata( SSNODE const& v ) { cutdata_ = v; }
  void ignoredSeq( char const* seq );
  void startTimer() { tbgn_ = time((time_t*)NULL); }
  void stopTimer() { tend_ = time((time_t*)NULL); }
  int nseg() const { return nseg_; }
  SSNODE& cutdata() { return cutdata_; }
  SSNODE const& cutdata() const { return cutdata_; }
  char const* ignoredSeq() const { return NULL; }
  char const* shft( int idx0, char* buf, int buflen ) const;
  ShftVect shft( int idx0 ) const
    { return (idx0<nseg())? shft_[idx0]: shft_[0]; }
  int fbw( int idx0 ) const
    { return (idx0<MAXSEG)? fbwv_[idx0]: -1; }
  int bspac( int idx0 ) const
    { return (idx0<MAXSEG)? bspac_[idx0]: -1; }
  time_t runtime() const { return tend_-tbgn_+1; }
  void debug() const;
private:
  ShftVect shft_[MAXSEG];
  int fbwv_[MAXSEG];
  int bspac_[MAXSEG];
  time_t tbgn_, tend_;
  int nseg_;
  SSNODE cutdata_;
  char* skipseq_;
};
#if defined(WIN32)
class _declspec( dllexport ) RdrOut
#else
class RdrOut
#endif
{
public:
  RdrOut( Wvfm const& in );
  ~RdrOut();
  void iS1( int iS1 ) { oiS1_ = iS1; }
  int add( int passnr, int fbw, int medSP, SegRead const& sr );
  int iS1() const { return oiS1_; }
  int avgqual() const;
  int percentN() const;
  BandStatArray& bandstat() { return bandStats_; }
  BandStatArray const& bandstat() const { return bandStats_; }
  char const* sequence(char* buffer,int buflen) const;
  Wvfm& wvfm() { return traceOut_; }
  Wvfm const& wvfm() const { return traceOut_; }
  QualCtrl& qualctrl() { return qualctrl_; }
  QualCtrl const& qualctrl() const { return qualctrl_; }
  void Edit( char const* preamble = NULL );
  void Beautify( int fbool, int bbool);
  void debug() const;
private:
  int overlayIdx_, oiS1_, oPosLen_, *oPos_, previ_;
  char lnordr_[6];
  BandStatArray bandStats_;
  Wvfm traceOut_;
  QualCtrl qualctrl_;
  double FM_[1+OVRLAP];
  double *XF_, *TF_, *HF_, *AF_, *BF_;
  int *nH2L_, *nL2H_;
  int was_at(SegRead const& sr, int passnr);
  void closesTo(int wasAt, int& dist, int& j) const;
  void join( SegRead const& sr );
  void bandqual();
  void pickcuts( char const* preamble );
  void cutoff( int fdx, SSNODE& ss, int FFTSZ ) const;
  void flatten_();
  void minNegSwing_();
};
#endif
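RdrOut::cutoff(), declared above, smooths the per-band quality values with a boxcar filter of width FW[fdx], compares the result with a threshold, and keeps the longest run of bands that stay at or above it. The sketch below restates that selection step in standalone form. It is a simplification and not the listing's method: the listing performs the smoothing by FFT-based circular convolution and filters the threshold as well, whereas this sketch uses a direct moving average whose window is truncated at the ends. The name longestGoodRun and the 0-based indexing are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: smooth quality with a centered boxcar of odd width
// (window truncated at the ends of the data), then find the longest run of
// consecutive samples whose smoothed value is >= threshold. On success the
// inclusive [first, last] indices are stored in run and true is returned.
bool longestGoodRun(const std::vector<double>& quality, std::size_t width,
                    double threshold,
                    std::pair<std::size_t, std::size_t>& run)
{
    assert(width % 2 == 1);
    const std::size_t n = quality.size();
    const std::size_t half = width / 2;
    bool found = false;
    std::size_t bestLen = 0, start = 0, len = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t lo = (i > half) ? i - half : 0;
        const std::size_t hi = (i + half < n) ? i + half : n - 1;
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j) sum += quality[j];
        const bool good = sum / double(hi - lo + 1) >= threshold;
        if (good) {
            if (len == 0) start = i;  // a new run begins here
            ++len;
            if (len > bestLen) {
                bestLen = len;
                run = {start, i};
                found = true;
            }
        } else {
            len = 0;
        }
    }
    return found;
}
```

In the listing, pickcuts() repeats this search over every filter width and threshold and keeps the candidate with the greatest weighted read length; the sketch covers a single width and threshold.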
/**********************************************************************
 * FILE: SegRead.cxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#include <basecall/mb.hxx>
SegRead::SegRead( int iS1, int fluor )
  : status_(STS_INITD), iS1_( iS1 ), nWvf_(NULL), pmb_(NULL)
{
  pmb_ = new MB(fluor);
}
SegRead::~SegRead()
{
  if(nWvf_) { delete nWvf_; nWvf_ = NULL; }
  if(pmb_) { delete pmb_; pmb_ = NULL; }
}
SegRead::SegRead( SegRead const& rhs )
  : status_(STS_UNINITD), iS1_(0), nWvf_(NULL), pmb_(NULL)
{
  *this = rhs;
}
SegRead const&
SegRead::operator=( SegRead const& rhs )
{
  if(this != &rhs) {
    if(nWvf_) { delete nWvf_; nWvf_ = NULL; }
    if(pmb_) { delete pmb_; pmb_ = NULL; }
    nWvf_ = new Wvfm( *rhs.nWvf_ );
    pmb_ = new MB( *rhs.pmb_ );
    iS1_ = rhs.iS1_;
    nsv_ = rhs.nsv_;
    bsa_ = rhs.bsa_;
    status_ = rhs.status_;
  }
  return *this;
}
void
SegRead::wvfm( Wvfm const& rhs )
{
  if(NULL != nWvf_) { delete nWvf_; nWvf_ = NULL; }
  nWvf_ = new Wvfm( rhs );
}
float*
SegRead::xbndara_( PKDET const& pks ) const
{ int idx. N = pks.npk(); float* xbnd; if(NULL != (xbnd = ::vector(l,N))) for(idx=l ; idx<=N; idx++) xbnd[ idx ] = float(nWvf_->xbnd( pks.bmid( idx ) )); return xbnd; } float*
SegRead "buzzaraj PKDET const& pks ) const
{ int idx. N = pks.npk(); float* buzz; if(NULL != (buzz = ::vector(l,N))) for(idx=l ; idx<=N; idx++) buzzf idx ] = float(nWvf_->buzz( pks.bmid( idx ) )); return buzz;
} int
SegRead::peakdet( int npts, PKDET& putative ) const
{
    double vm1, v;
    int *ppk=NULL, Np=0, *ptr=NULL, Nt=0, idx;
    Wvfm const& wvfm = *nWvf_;
    short MAXSV = nsv_.maxshft();
    vm1 = wvfm.envv( MAXSV+1 );
    v = wvfm.envv( MAXSV+2 );
    ppk = ::ivector( 1, npts );
    ptr = ::ivector( 1, npts );
    if(NULL == ppk || NULL == ptr) return 0;
    enum STATE { ST_UK, ST_UP, ST_DN } st = ST_UK;
    for(idx = MAXSV+2; idx <= npts; idx++ ) {
        switch(st) {
        case ST_UK:
            if(v > vm1) st=ST_UP;
            else if(v < vm1) st=ST_DN;
            break;
        case ST_UP:
            if(v < vm1) { st = ST_DN; ppk[ ++Np ] = idx-1; }
            break;
        case ST_DN:
            if(v > vm1) { st = ST_UP; ptr[ ++Nt ] = idx-1; }
            break;
        }
        vm1 = v;
        v = wvfm.envv( idx+1 );
    }
    putative.set( ppk, Np, ptr, Nt );
    ::free_ivector( ppk, 1,npts );
    ::free_ivector( ptr, 1,npts );
    return 1;
}

void
SegRead::maxlanecode_( int have, int* bcodes ) const
{
    short MAXSV = nsv().maxshft();
    for(int sdx=MAXSV+1; sdx<=int(have); sdx++) {
        double THR = 0.80*nWvf_->envv(sdx);
        int ldx, code = 0;
        for(ldx=1; 5!=code && ldx<=int(nWvf_->lanes()); ldx++)
            if(nWvf_->sc_la(sdx,ldx) >= THR)
                code = (0==code)? ldx: 5;
        bcodes[ sdx ] = code;
    }
}

static void dfliplr(double* p, int n)
{
    int idx, ndx;
    double t;
    for(idx=1,ndx=n; idx<=(n/2); idx++,ndx--) {
        t = p[idx]; p[idx] = p[ndx]; p[ndx] = t;
    }
}

int
SegRead::setBandStats(PKDET const& final, int FBW, char const* seq, int pass2)
{
    int NPK = final.npk(), *insSP=NULL, *insWD=NULL;
    int rv = 1;
    double f_sigma = (double(FBW-1)*2.0*NR_PI)/2048.0;
    double t_sigma = 1.0/f_sigma;
    int M = (final.medWid() + 1)/2;
    float PATTCOEF[4], *xbnd, *buzz;
    if(1 == pass2) {
        double *PX = ::dvector(1,M), *PY = ::dvector(1,M);
        int sts;
        if(!PX || !PY) return 0;
        for(int pdx=1; pdx<=M; pdx++) {
            double v;
            PX[pdx] = double(pdx);
            v = (PX[pdx]-((double(M)+1.0)/2.0))/(t_sigma/2.0);
            PY[pdx] = ::exp(-0.5*v*v);
        }
        sts = ::dquadratic( PX, PY, M, PATTCOEF );
        ::free_dvector(PX,1,M);
        ::free_dvector(PY,1,M);
        if(1 != sts) return 0;
    }
    if(NULL == (insSP = ::ivector( 1, NPK ))) {
        status_ = STS_NO_MEM;
        return 0;
    }
    if(1 != ::insMetric( final.bmid(), final.lgap(), insSP, NPK )) {
        status_ = STS_TOO_FEW;
        ::free_ivector(insSP,1,NPK);
        return 0;
    }
    if(NULL == (insWD = ::ivector( 1, NPK ))) {
        status_ = STS_NO_MEM;
        ::free_ivector(insSP,1,NPK);
        return 0;
    }
    if(1 != ::insMetric( final.bmid(), final.bwid(), insWD, NPK )) {
        status_ = STS_TOO_FEW;
        ::free_ivector(insSP,1,NPK);
        ::free_ivector(insWD,1,NPK);
        return 0;
    }
    if(NULL == (xbnd = xbndara_( final ))) {
        status_ = STS_NO_MEM;
        ::free_ivector(insSP,1,NPK);
        ::free_ivector(insWD,1,NPK);
        return 0;
    }
    if(NULL == (buzz = buzzara_( final ))) {
        status_ = STS_NO_MEM;
        ::free_ivector(insSP,1,NPK);
        ::free_ivector(insWD,1,NPK);
        ::free_vector(xbnd,1,NPK);
        return 0;
    }
    BandStatArray& bsa = bandStats();
    bsa.init( NPK );
    for(int idx0 = 0; idx0 < NPK; idx0++) {
        int idx1 = idx0+1, scanl = final.bmid( idx1 );
        float onsv, ofsv;
        bsa.ntnr( idx0, idx1 );
        bsa.posn( idx0, scanl );
        bsa.hght( idx0, float(nWvf_->envv( scanl )) );
        onsv = float(nWvf_->envv(final.bbgn(idx1)));
        ofsv = float(nWvf_->envv(final.bend(idx1)));
        bsa.lowv( idx0, onsv<ofsv?onsv:ofsv );
        bsa.xbnd( idx0, xbnd[ idx1 ] );
        bsa.buzz( idx0, buzz[ idx1 ] );
        bsa.insr( idx0, final.ins(idx1) );
        bsa.call( idx0, seq[ idx1 ] );
        if(pass2) {
            double cc = -1.0;
            int N = final.bwid(idx1);
            if(N>4) {
                double *BX, *BY, miny = 1000.0, cclr, ccrl;
                float bandcoef[4];
                int wdx, scanl = final.bbgn(idx1);
                BX = ::dvector(1,N);
                BY = ::dvector(1,N);
                for(wdx=1; wdx<=N; wdx++) {
                    BX[wdx] = double(wdx);
                    BY[wdx] = nWvf_->envv( scanl++ );
                    if(BY[wdx] < miny) miny = BY[wdx];
                }
                for(wdx=1; wdx<=N; wdx++)
                    BY[wdx] -= miny;
                if(1 != ::dquadratic( BX, BY, N, bandcoef )) {
                    rv = 0;
                    status_ = STS_TOO_FEW;
                    goto bugout;
                }
                cclr = ::corrcoef( PATTCOEF, bandcoef, 4 );
                ::dfliplr( BY, N );
                if(1 != ::dquadratic( BX, BY, N, bandcoef )) {
                    rv = 0;
                    status_ = STS_TOO_FEW;
                    goto bugout;
                }
                ccrl = ::corrcoef( PATTCOEF, bandcoef, 4 );
                cc = (cclr>ccrl)? cclr: ccrl;
                if(isnan(cc)) cc = -1.0;
                ::free_dvector(BX,1,N);
                ::free_dvector(BY,1,N);
            }
            bsa.shap( idx0, float(cc) );
            double width, iwidth;
            width = double( final.bwid(idx1) );
            iwidth = double( insWD[ idx1 ] );
            bsa.widt( idx0, float(width/iwidth - 1.0) );
            bsa.bbgn( idx0, final.bbgn(idx1) );
            bsa.bend( idx0, final.bend(idx1) );
            float igap, slope, intercept, lfgap, rtgap, biggap, smlgap;
            igap = float(insSP[idx1]);
            if(0.0f == igap) {
                bsa.lgap( idx0, 0.5f );
                bsa.sgap( idx0, 0.5f );
                rv = 0;
                goto bugout;
            } else {
                igap = float(insSP[idx1]);
                slope = igap/4.0f;
                intercept = igap/2.0f;
                lfgap = float( final.lgap(idx1) );
                rtgap = float( final.rgap(idx1) );
                if(lfgap>rtgap) { biggap = lfgap; smlgap = rtgap; }
                else { biggap = rtgap; smlgap = lfgap; }
                if(biggap < slope) biggap = intercept - biggap*slope;
                if(smlgap < slope) smlgap = intercept - smlgap*slope;
#if !defined(_WIN32)
                bsa.lgap( idx0, fmod(biggap,igap)/igap );
                bsa.sgap( idx0, fmod(smlgap,igap)/igap );
#else
                bsa.lgap( idx0, float(fmod(biggap,igap)/igap) );
                bsa.sgap( idx0, float(fmod(smlgap,igap)/igap) );
#endif
            }
        }
    }
bugout:
    ::free_vector( buzz, 1,NPK ); buzz = NULL;
    ::free_vector( xbnd, 1,NPK ); xbnd = NULL;
    ::free_ivector( insWD, 1,NPK ); insWD = NULL;
    ::free_ivector( insSP, 1,NPK ); insSP = NULL;
    return rv;
}

void
SegRead::fbwlut( int spacing, int& fbw ) const
{
    if(spacing < SMLGAP) spacing = SMLGAP;
    else if(spacing > BIGGAP) spacing = BIGGAP;
    fbw = pmb_->fbwlut[ spacing ];
}

/**********************************************************************
 * FILE: SegRead.hxx
 * AUTHOR: Andy Marks
 * NOTE: To use this easily include seqrdr.hxx, which
 *       includes all the headers this needs.
 * COPYRIGHT (c) 1996, University of Utah
 */
#ifndef _SEGREAD_HXX_
#define _SEGREAD_HXX_
static int const STATIC_BUF_SZ = NPTS/2;
class MB;
class PKDET;
class SegRead
{
public:
    enum Status { STS_UNINITD, STS_INITD, STS_NO_MEM, STS_BUF2SMALL, STS_TOO_FEW };
    SegRead( int iSl, int fluor );
    ~SegRead();
    SegRead( SegRead const& rhs );
    SegRead const& operator=( SegRead const& rhs );
    void iSl( int v ) { iSl_ = v; }
    void wvfm( Wvfm const& rhs );
    int iSl() const { return iSl_; }
    Wvfm* wvfm() { return nWvf_; }
    Wvfm const* wvfm() const { return nWvf_; }
    ShftVect& nsv() { return nsv_; }
    ShftVect const& nsv() const { return nsv_; }
    BandStatArray& bandStats() { return bsa_; }
    BandStatArray const& bandStats() const { return bsa_; }
    void fbwlut( int spacing, int& fbw ) const;
    int fBandSpace( int& medSP ) const;
    int nrefine( char const* lnordr, int FBW, int have, int p2 );
    void blindeconv( Wvfm& sr, int Fbw );
    int xtranorm( Wvfm& bd, Wvfm const& source, int nscanl, int pass2 );
    int setBandStats(PKDET const& final, int FBW, char const* seq, int pass2);
    int peakdet( int npts, PKDET& putative ) const;
    Status status() const { return status_; }
    void debug() const;
private:
    Status status_;
    int iSl_;
    Wvfm *nWvf_;
    ShftVect nsv_;
    BandStatArray bsa_;
    Band nband_[ STATIC_BUF_SZ ];
    char seq_[ STATIC_BUF_SZ ];
    void maxlanecode_( int n, int* bcode ) const;
    float* xbndara_( PKDET const& putative ) const;
    float* buzzara_( PKDET const& putative ) const;
    int centroid_(int bgn, int end) const;
    MB *pmb_;
};
#endif
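The `SegRead::peakdet` routine in SegRead.cxx walks the envelope trace once and records a peak each time a rising run turns into a falling run, and a trough on the opposite transition. The following is a minimal, zero-based sketch of that three-state scan. The names `Extrema` and `findExtrema` are hypothetical and do not appear in the listing, and the sketch assumes the trace is already available as a plain array of samples rather than through `Wvfm::envv`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result container (not part of the patent listing).
struct Extrema {
    std::vector<std::size_t> peaks;    // indices of local maxima
    std::vector<std::size_t> troughs;  // indices of local minima
};

// One pass over the samples with a three-state machine
// (unknown / rising / falling), mirroring ST_UK / ST_UP / ST_DN.
Extrema findExtrema(const std::vector<double>& v)
{
    enum State { UNKNOWN, RISING, FALLING } st = UNKNOWN;
    Extrema out;
    for (std::size_t i = 1; i < v.size(); ++i) {
        const double prev = v[i - 1];
        const double cur = v[i];
        switch (st) {
        case UNKNOWN:
            if (cur > prev) st = RISING;
            else if (cur < prev) st = FALLING;
            break;
        case RISING:
            // Rising run just ended: the previous sample was a peak.
            if (cur < prev) { st = FALLING; out.peaks.push_back(i - 1); }
            break;
        case FALLING:
            // Falling run just ended: the previous sample was a trough.
            if (cur > prev) { st = RISING; out.troughs.push_back(i - 1); }
            break;
        }
    }
    return out;
}
```

As in the listing, equal neighbouring samples leave the state unchanged, so a flat-topped peak is reported at the last sample of the plateau.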
/**********************************************************************
 * FILE: seqrdr.hxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#ifndef _SEQRDR_HXX_
#define _SEQRDR_HXX_
static int const NPTS = 2048;
#include <stdio.h>
#include <math.h>
#include <limits.h>
#include <float.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <nrc/nr.hxx>
#include <basecall/Wvfm.hxx>
#include <basecall/Metrics.hxx>
#include <basecall/RdrOut.hxx>
#include <basecall/AboutBQ.hxx>
#if defined(sun)
# include <ieeefp.h>
#endif
int d_cmp( void const*, void const* );
double corrcoef( float const* v1, float const* v2, int n );
int insMetric( int const* px, int const* py, int* ytmp, int N );
#if defined(__cplusplus)
extern "C" {
#endif
int omitokn( int const* i, float const* h, float const* l, int const* lg, int const* rg, float const* x, int n, float** o);
int gapcheck( int const* ig, int const* iw, int const* g, int const* w, char const* seq, int n, float** o);
#if defined(__cplusplus)
}
#endif
#endif

/**********************************************************************
 * FILE: ShftVect.cxx
 * AUTHOR: Andy Marks
 * Copyright (c) 1995,1996 University of Utah
 */
#include <basecall/mb.hxx>
static const int NCUBES = 6,
                 M = (27*NCUBES) - (NCUBES-1);

class Criterion
{
public:
    Criterion() : val_(0.0), idx_(0) {;}
    ~Criterion() {;}
    Criterion const& operator=( Criterion const& rhs );
    void debug( int lvl = 0 ) const;
    double val_;
    int idx_;
};

class ShftVects
{
public:
    ShftVects( int len );
    ShftVects( ShftVect const& sv );
    ShftVects( ShftVects const& rhs );
    ShftVects const& operator=( ShftVects const& rhs );
    ~ShftVects();
    ShftVect align( Wvfm const& t );
    void debug( int lvl = 0 ) const;
private:
    void evaluate( Wvfm const& t );
    void best( int nelem );
    int terminate();
    int len_;
    ShftVect* svm_;
    Criterion* crit_;
    int mni_, mxi_;
    int loops_;
};

ShftVect mcalign( Wvfm const& trace, ShftVect const& sv )
{
    ShftVects svm( sv );
    return svm.align( trace );
}

class TriVects
{
public:
    TriVects();
    TriVect const& operator[](int idx) const { return tvm_[idx]; }
    ~TriVects();
    void debug() const;
private:
    int len_;
    TriVect* tvm_;
};
Criterion const&
Criterion::operator=( Criterion const& rhs )
{
    if(this != &rhs) {
        idx_ = rhs.idx_;
        val_ = rhs.val_;
    }
    return *this;
}

void Criterion::debug( int lvl ) const
{
    ::printf("Criterion: val=%8.3f idx=%2d\n",val_,idx_);
}

static const TriVects tvm;

TriVect::TriVect()
{
    t[0] = t[1] = t[2] = 0;
}

TriVect::TriVect( TriVect const& rhs )
{
    t[0] = rhs.t[0];
    t[1] = rhs.t[1];
    t[2] = rhs.t[2];
}

TriVect
TriVect::operator+( TriVect const& rhs ) const
{
    TriVect s( rhs );
    s.t[0] += t[0];
    s.t[1] += t[1];
    s.t[2] += t[2];
    return s;
}

void TriVect::debug( int lvl ) const
{
    ::printf("TriVect: [%hd,%hd,%hd]\n",t[0],t[1],t[2]);
}

TriVects::TriVects() : len_(M), tvm_(NULL)
{
    static int const radii[] = { 4, 8, 12, 16, 20, 24 };
    int idx = 1;
    tvm_ = new TriVect[ len_ ];
    if(NULL == tvm_) {
        ::fprintf(stderr,"TriVects::TriVects() out of memory.\n");
        exit(1);
    }
    tvm_[0].t[0] = tvm_[0].t[1] = tvm_[0].t[2] = 0;
    for(int s=0; s<NCUBES; s++) {
        int ko[3];
        ko[0] = -radii[s];
        ko[1] = 0;
        ko[2] = radii[s];
        for(int i0=0; i0<3; i0++)
            for(int i1=0; i1<3; i1++)
                for(int i2=0; i2<3; i2++)
                    if(1!=i0 || 1!=i1 || 1!=i2) {
                        tvm_[idx].t[0] = ko[i0];
                        tvm_[idx].t[1] = ko[i1];
                        tvm_[idx].t[2] = ko[i2];
                        idx++;
                    }
    }
}

TriVects::~TriVects()
{
    if(tvm_) { delete [] tvm_; tvm_ = NULL; len_ = 0; }
}

void TriVects::debug() const
{
    ::printf("TriVects @ %p\n",(void*)this);
    ::printf("\tlen_ = %ld\n",len_);
    for(int idx = 0; idx < len_; idx++)
        tvm_[idx].debug();
}
ShftVect::ShftVect()
{
    s_[0] = s_[1] = s_[2] = s_[3] = 0;
}

ShftVect::ShftVect(short v1, short v2, short v3, short v4)
{
    s_[0] = v1; s_[1] = v2; s_[2] = v3; s_[3] = v4;
}

ShftVect::ShftVect( TriVect const& tv )
{
    int min = 1000;
    int idx;
    s_[0] = tv.t[0]+tv.t[1]+tv.t[2];
    s_[1] = tv.t[1]+tv.t[2];
    s_[2] = tv.t[2];
    s_[3] = 0;
    for(idx = 0; idx < 4; idx++)
        if(s_[idx]<min) min = s_[idx];
    for(idx = 0; idx < 4; idx++)
        s_[idx] -= min;
}

TriVect
ShftVect::trivec() const
{
    TriVect t;
    short mv = -s_[3];
    t.t[2] = s_[2]+mv;
    t.t[1] = s_[1]-t.t[2]+mv;
    t.t[0] = s_[0]-t.t[1]-t.t[2]+mv;
    return t;
}

short
ShftVect::maxshft() const
{
    short mx = s_[0];
    for(int idx=1; idx<=3; idx++)
        if(s_[idx] > mx) mx = s_[idx];
    return mx;
}

ShftVect
ShftVect::ralt() const
{
    short a, g, c, t, mn;
    a = s_[0] + short(5.0*drand48()) - 2;
    g = s_[1] + short(5.0*drand48()) - 2;
    c = s_[2] + short(5.0*drand48()) - 2;
    t = s_[3] + short(5.0*drand48()) - 2;
    mn = a;
    if(g<mn) mn = g;
    if(c<mn) mn = c;
    if(t<mn) mn = t;
    ShftVect sv(a-mn,g-mn,c-mn,t-mn);
    return sv;
}

void
ShftVect::debug( int lvl ) const
{
    ::printf("ShftVect @ %p:",(void*)this);
    ::printf("\ts_: [%hd,%hd,%hd,%hd]\n",s_[0],s_[1],s_[2],s_[3]);
}
ShftVects::ShftVects( int len )
    : len_(len), svm_(NULL), crit_(NULL), mni_(0), mxi_(len_-1), loops_(0)
{
    svm_ = new ShftVect[ len_ ];
    crit_ = new Criterion[ len_ ];
    if(!svm_ || !crit_) {
        ::fprintf(stderr,"ShftVects::ShftVects(%ld) out of memory.\n",len_);
        exit(1);
    }
    for(int idx = 0; idx < len_; idx++) {
        /* [source line(s) not reproduced in the published text: Figure imgf000193_0001] */
        crit_[idx].val_ = 0.0;
    }
}

ShftVects::ShftVects( ShftVects const& rhs )
    : len_(0), svm_(NULL), crit_(NULL), mni_(0), mxi_(0), loops_(0)
{
    *this = rhs;
}

ShftVects::ShftVects( ShftVect const& sv )
    : len_(M), svm_(NULL), crit_(NULL), mni_(0), mxi_(M-1), loops_(0)
{
    TriVect t = sv.trivec();
    svm_ = new ShftVect[ len_ ];
    crit_ = new Criterion[ len_ ];
    if(!svm_ || !crit_) {
        ::fprintf(stderr,"ShftVects::ShftVects(ShftVect&) out of memory.\n");
        exit(1);
    }
    for(int idx = 0; idx < len_; idx++) {
        ShftVect sv( tvm[ idx ] + t );
        svm_[idx] = sv;
        crit_[idx].idx_ = idx;
        crit_[idx].val_ = 0.0;
    }
}

ShftVects::~ShftVects()
{
    if(svm_) { delete [] svm_; svm_ = NULL; }
    if(crit_) { delete [] crit_; crit_ = NULL; }
}

ShftVects const&
ShftVects::operator=( ShftVects const& rhs )
{
    if(this != &rhs) {
        if(svm_) { delete [] svm_; svm_ = NULL; }
        if(crit_) { delete [] crit_; crit_ = NULL; }
        len_ = rhs.len_;
        svm_ = new ShftVect[ len_ ];
        crit_ = new Criterion[ len_ ];
        if(!svm_ || !crit_) {
            ::fprintf(stderr,"ShftVects::operator= out of memory.\n");
            exit(1);
        }
        for(int idx=0; idx<len_; idx++) {
            svm_[idx] = rhs.svm_[idx];
            crit_[idx] = rhs.crit_[idx];
        }
        mni_ = rhs.mni_;
        mxi_ = rhs.mxi_;
        loops_ = rhs.loops_;
    }
    return *this;
}

void ShftVects::evaluate( Wvfm const& trace )
{
    for(int idx=mni_; idx<=mxi_; idx++) {
        static int const BGNPT = 299, ENDPT = BGNPT+1399;
        crit_[ idx ].val_ = 0.0;
        /* [source line(s) not reproduced in the published text: Figure imgf000195_0001] */
        trace.envelope( svm_[idx] );
        if(Wvfm::MAX_INTEGRAL == trace.method())
            for(int sdx=BGNPT; sdx<=ENDPT; sdx++)
                crit_[ idx ].val_ += trace.envv( sdx );
        else
            for(int sdx=BGNPT; sdx<=ENDPT; sdx++)
                crit_[ idx ].val_ += trace.xbnd( sdx );
    }
}

void
ShftVects::debug(int lvl) const
{
    ::printf("ShftVects @ %p\n",(void*)this);
    ::printf("\tlen_ = %ld\n",len_);
    for(int idx = 0; idx < len_; idx++) {
        ::printf("\t%ld ",idx);
        svm_[idx].debug(lvl);
        crit_[idx].debug(lvl);
    }
    ::printf("\tmxi = %ld mni = %ld loops_=%ld\n",mxi_,mni_,loops_);
}

static int c_cmp( void const* e1, void const* e2 )
{
    double v1 = ((Criterion const*)e1)->val_;
    double v2 = ((Criterion const*)e2)->val_;
    if(v1 > v2) return 1;
    else if(v1 < v2) return -1;
    else return 0;
}

void
ShftVects::best( int topN )
{
    ShftVects tmpSvs( topN );
    tmpSvs.loops_ = loops_;
    ::qsort( (void*)crit_, len_, sizeof(*crit_), c_cmp );
    for(int tdx=0; tdx<topN; tdx++) {
        int sdx = len_ - topN + tdx;
        int fdx = crit_[ sdx ].idx_;
        tmpSvs.svm_[tdx] = svm_[ fdx ];
        tmpSvs.crit_[tdx] = crit_[ sdx ];
    }
    *this = tmpSvs;
}

int ShftVects::terminate()
{
    static int const MAXLOOPS = 100;
    double ratio;
    mni_ = mxi_ = 0;
    for(int idx=1; idx<len_; idx++) {
        double v = crit_[idx].val_;
        if(v >= crit_[mxi_].val_) { mxi_ = idx; }
        if(v <= crit_[mni_].val_) { mni_ = idx; }
    }
    svm_[mni_] = svm_[ mxi_ ].ralt();
    ratio = crit_[mni_].val_/crit_[mxi_].val_;
    return int(((ratio > 0.97) && (loops_ >= 50)) || (loops_ > MAXLOOPS));
}

ShftVect
ShftVects::align( Wvfm const& trace )
{
    while(1) {
        evaluate( trace );
        if(1 == ++loops_) best( 25 );
        if(terminate()) break;
        else mxi_ = mni_;
    }
    return svm_[mxi_];
}
#if defined(SA)
int main(int argc, char const* argv[])
{
    ShftVect sv, rsv;
    Wvfm wvfm( Wvfm::MIN_XBANDING );
    wvfm.debug();
    rsv = mcalign( wvfm, sv );
    ::printf("The best alignment is with ");
    rsv.debug();
    return 0;
}
#endif
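The `ShftVect( TriVect const& )` constructor and `ShftVect::trivec()` in ShftVect.cxx convert between three inter-lane offsets and four per-lane shifts whose minimum is normalised to zero. The sketch below restates that pair of conversions with standard containers so the round trip can be checked in isolation. The aliases `Tri` and `Shift` and the functions `toShift` and `toTri` are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

typedef std::array<short, 3> Tri;    // three offsets between adjacent lanes
typedef std::array<short, 4> Shift;  // four per-lane shifts, smallest is 0

// Cumulative sums of the offsets (last lane as reference),
// then subtract the minimum so that every shift is non-negative.
Shift toShift(const Tri& t)
{
    Shift s = { static_cast<short>(t[0] + t[1] + t[2]),
                static_cast<short>(t[1] + t[2]),
                t[2],
                0 };
    const short mn = *std::min_element(s.begin(), s.end());
    for (short& x : s) x = static_cast<short>(x - mn);
    return s;
}

// Inverse mapping: re-reference to the last lane, then take differences.
Tri toTri(const Shift& s)
{
    Tri t;
    const short mv = static_cast<short>(-s[3]);
    t[2] = static_cast<short>(s[2] + mv);
    t[1] = static_cast<short>(s[1] - t[2] + mv);
    t[0] = static_cast<short>(s[0] - t[1] - t[2] + mv);
    return t;
}
```

Because the normalisation only subtracts a common constant, `toTri(toShift(t))` returns `t` for any offsets that stay within the range of `short`.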
/**********************************************************************
 * FILE: ShftVect.hxx
 * AUTHOR: Andy Marks
 * Copyright (c) 1995,1996 University of Utah
 */
#ifndef _SHFTVECT_HXX_
#define _SHFTVECT_HXX_
struct TriVect
{
    TriVect();
    TriVect operator+( TriVect const& rhs ) const;
    TriVect( TriVect const& rhs );
    ~TriVect() {};
    void debug( int lvl = 0 ) const;
    short t[3];
};

#if defined(_WIN32)
class _declspec( dllexport ) ShftVect
#else
class ShftVect
#endif
{
public:
    ShftVect();
    ShftVect( short v1, short v2, short v3, short v4 );
    ShftVect( TriVect const& tv );
    TriVect trivec() const;
    ShftVect ralt() const;
    void debug( int lvl = 0 ) const;
    short s( int idx ) const { return s_[idx-1]; }
    short maxshft() const;
private:
    short s_[4];
};
class Wvfm;
ShftVect mcalign( Wvfm const& trace, ShftVect const& sv );
#endif
/**********************************************************************
 * FILE: spline.cxx
 * TYPIST: Andy Marks
 *         Human Genetics Dept
 *         Univ of Utah
 * DATE: Fri Mar 15 12:34:18 MST 1996
 */
#include <stdio.h>
#include <nrc/nr.hxx>
#if !defined(SA)
void spline( float const x[], float const y[], int n, float yp1, float ypn, float y2[] )
{
    int i, k;
    float qn, un, *u;
    u = vector(1,n-1);
    if(yp1 > 0.99e30)
        y2[1] = u[1] = 0.0f;
    else {
        y2[1] = -0.5f;
        u[1] = (3.0f/(x[2]-x[1])) * ((y[2]-y[1])/(x[2]-x[1]) - yp1);
    }
    for(i=2; i<=n-1; i++) {
        float p, sig;
        sig = (x[i]-x[i-1]) / (x[i+1]-x[i-1]);
        p = sig*y2[i-1] + 2.0f;
        y2[i] = (sig-1.0f)/p;
        u[i] = (y[i+1]-y[i]) / (x[i+1]-x[i]) - (y[i]-y[i-1]) / (x[i]-x[i-1]);
        u[i] = (6.0f*u[i]/(x[i+1]-x[i-1]) - sig*u[i-1])/p;
    }
    if(ypn > 0.99e30)
        qn = un = 0.0f;
    else {
        qn = 0.5f;
        un = (3.0f/(x[n]-x[n-1]))*(ypn-(y[n]-y[n-1])/(x[n]-x[n-1]));
    }
    y2[n] = (un-qn*u[n-1])/(qn*y2[n-1]+1.0f);
    for(k=n-1; k>=1; k--)
        y2[k] = y2[k]*y2[k+1] + u[k];
    free_vector(u,1,n-1);
}

void splint(float const xa[], float const ya[], float const y2a[], int n, float x, float *y)
{
    static int klo, khi;
    int k;
    float h, b, a;
    if(0==klo || 0==khi || x<xa[klo] || x>xa[khi]) {
        klo = 1;
        khi = n;
        while((khi-klo) > 1) {
            k = (khi+klo)>>1;
            if(xa[k]>x) khi = k;
            else klo = k;
        }
    }
    h = xa[khi] - xa[klo];
    if(0.0f == h) nrerror("Bad xa input to routine splint: the xa's must be distinct");
    a = (xa[khi]-x)/h;
    b = (x-xa[klo])/h;
    *y = a*ya[klo] + b*ya[khi] + ((a*a*a-a)*y2a[klo] + (b*b*b-b)*y2a[khi]) * (h*h)/6.0f;
}
#endif
#if defined(SA)
void main()
{
    float* x = vector(1,4);
    float* y = vector(1,4);
    float* y2 = vector(1,4);
    x[1] = 1.0f; x[2] = 2.0f; x[3] = 3.0f; x[4] = 4.0f;
    y[1] = 1.0f; y[2] = 2.0f; y[3] = 1.0f; y[4] = 2.0f;
    ::spline( x, y, 4, 1.0f, 1.0f, y2 );
    for(float xt = 1.0f; xt < 4.0f; xt += 1.0f/3.0f) {
        float yout;
        ::splint( x, y, y2, 4, xt, &yout );
        ::printf("xt=%f yout=%f\n",xt,yout);
    }
}
#endif
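The `spline`/`splint` pair above first solves a tridiagonal system for the second derivatives at the knots and then evaluates the cubic on the bracketing interval. The following zero-based sketch shows the same two steps for the natural-spline case only (second derivative zero at both ends, the branch taken above when `yp1` and `ypn` exceed 0.99e30). The function names `naturalSpline` and `splineEval` are hypothetical, and the sketch assumes at least two knots with strictly increasing `x`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Second derivatives of the natural cubic spline through (x[i], y[i]).
std::vector<double> naturalSpline(const std::vector<double>& x,
                                  const std::vector<double>& y)
{
    const std::size_t n = x.size();
    std::vector<double> y2(n, 0.0), u(n, 0.0);
    // Forward sweep of the tridiagonal decomposition.
    for (std::size_t i = 1; i + 1 < n; ++i) {
        const double sig = (x[i] - x[i - 1]) / (x[i + 1] - x[i - 1]);
        const double p = sig * y2[i - 1] + 2.0;
        y2[i] = (sig - 1.0) / p;
        u[i] = (y[i + 1] - y[i]) / (x[i + 1] - x[i])
             - (y[i] - y[i - 1]) / (x[i] - x[i - 1]);
        u[i] = (6.0 * u[i] / (x[i + 1] - x[i - 1]) - sig * u[i - 1]) / p;
    }
    y2[n - 1] = 0.0;  // natural boundary at the right end
    // Back-substitution, from index n-2 down to 0.
    for (std::size_t k = n - 1; k-- > 0; )
        y2[k] = y2[k] * y2[k + 1] + u[k];
    return y2;
}

// Evaluate the spline at xq by bisection for the bracketing interval.
double splineEval(const std::vector<double>& x, const std::vector<double>& y,
                  const std::vector<double>& y2, double xq)
{
    std::size_t lo = 0, hi = x.size() - 1;
    while (hi - lo > 1) {
        const std::size_t mid = (lo + hi) / 2;
        if (x[mid] > xq) hi = mid; else lo = mid;
    }
    const double h = x[hi] - x[lo];
    const double a = (x[hi] - xq) / h;
    const double b = (xq - x[lo]) / h;
    return a * y[lo] + b * y[hi]
         + ((a * a * a - a) * y2[lo] + (b * b * b - b) * y2[hi]) * (h * h) / 6.0;
}
```

Unlike `splint`, this sketch keeps no static bracket between calls, which makes it simpler to reason about at the cost of a fresh bisection per query.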
/**********************************************************************
 * FILE: sw.cxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#include <stdio.h>
#include <limits.h>
#include <string.h>
#include <basecall/sw.hxx>
#include <nrc/nrutil.hxx>

SW::SW( char const* vseq, char const* hseq )
    : vsz_(0), hsz_(0), outsz_(0), vseq_(NULL), hseq_(NULL), score_(INT_MIN),
      vseqout_(NULL), hseqout_(NULL), scores_(NULL), path_(NULL),
      vcoord_(0), hcoord_(0), vpos0_(0), hpos0_(0)
{
    if(!vseq || !hseq) return;
    vsz_ = ::strlen( vseq );
    hsz_ = ::strlen( hseq );
    vseq_ = new char[ 1 + vsz_ + 1 ];
    hseq_ = new char[ 1 + hsz_ + 1 ];
    if(!vseq_ || !hseq_) {
        if(vseq_) delete [] vseq_;
        if(hseq_) delete [] hseq_;   /* printed text deletes vseq_ a second time here */
        vseq_ = hseq_ = NULL;
        return;
    }
    ::strcpy( &vseq_[1], vseq );
    ::strcpy( &hseq_[1], hseq );
    int MVSZ=vsz_+1, MHSZ=hsz_+1;
    scores_ = ::imatrix( 1,MVSZ, 1,MHSZ );
    path_ = ::imatrix( 1,vsz_, 1,hsz_ );
    if(!scores_ || !path_) {
        delete [] vseq_; vseq_ = NULL;
        delete [] hseq_; hseq_ = NULL;
        if(scores_) ::free_imatrix(scores_,1,MVSZ,1,MHSZ);
        scores_ = NULL;
        if(path_) ::free_imatrix(path_,1,vsz_,1,hsz_);
        path_ = NULL;
        return;
    }
    sw_();
    for(int hdx=2; hdx<=MHSZ; hdx++)
        if(scores_[MVSZ][hdx] > score_) {
            vcoord_ = vsz_;
            hcoord_ = hdx-1;
            score_ = scores_[MVSZ][hdx];
        }
    for(int vdx=2; vdx<=MVSZ; vdx++)
        if(scores_[vdx][MHSZ] > score_) {
            vcoord_ = vdx-1;
            hcoord_ = hsz_;
            score_ = scores_[vdx][MHSZ];
        }
    swwalk_();
}
SW::SW( SW const& rhs )
    : vsz_(0), hsz_(0), outsz_(0), vseq_(NULL), hseq_(NULL), score_(INT_MIN),
      vseqout_(NULL), hseqout_(NULL), scores_(NULL), path_(NULL),
      vcoord_(0), hcoord_(0), vpos0_(0), hpos0_(0)
{
    *this = rhs;
}

SW const&
SW::operator=( SW const& rhs )
{
    if(this != &rhs) {
        if(vseq_) { delete [] vseq_; vseq_ = NULL; }
        if(hseq_) { delete [] hseq_; hseq_ = NULL; }
        if(vseqout_) { delete [] vseqout_; vseqout_ = NULL; }
        if(hseqout_) { delete [] hseqout_; hseqout_ = NULL; }
        if(scores_) { ::free_imatrix(scores_,1,vsz_+1,1,hsz_+1); scores_ = NULL; }
        if(path_) { ::free_imatrix(path_,1,vsz_,1,hsz_); path_ = NULL; }
        vsz_ = rhs.vsz_;
        hsz_ = rhs.hsz_;
        outsz_ = rhs.outsz_;
        score_ = rhs.score_;
        vpos0_ = rhs.vpos0_;
        hpos0_ = rhs.hpos0_;
        vseq_ = new char[ vsz_ + 2 ];
        if(vseq_) ::memcpy( vseq_, rhs.vseq_, vsz_+2 );
        hseq_ = new char[ hsz_ + 2 ];
        if(hseq_) ::memcpy( hseq_, rhs.hseq_, hsz_+2 );
        vseqout_ = new char[ outsz_ ];
        if(vseqout_) ::memcpy( vseqout_, rhs.vseqout_, outsz_ );
        hseqout_ = new char[ outsz_ ];
        if(hseqout_) ::memcpy( hseqout_, rhs.hseqout_, outsz_ );
        scores_ = ::imatrix( 1,vsz_+1, 1,hsz_+1 );
        path_ = ::imatrix( 1,vsz_, 1,hsz_ );
        if(scores_ && path_) {
            for(int vdx=1; vdx<=vsz_+1; vdx++)
                for(int hdx=1; hdx<=hsz_+1; hdx++) {
                    scores_[vdx][hdx] = rhs.scores_[vdx][hdx];
                    if(vdx<=vsz_ && hdx<=hsz_)
                        path_[vdx][hdx] = rhs.path_[vdx][hdx];
                }
        }
    }
    return *this;
}

SW::~SW()
{
    if(vseq_) { delete [] vseq_; vseq_ = NULL; }
    if(hseq_) { delete [] hseq_; hseq_ = NULL; }
    if(vseqout_) { delete [] vseqout_; vseqout_ = NULL; }
    if(hseqout_) { delete [] hseqout_; hseqout_ = NULL; }
    if(scores_) { ::free_imatrix( scores_, 1,vsz_+1, 1,hsz_+1 ); scores_ = NULL; }
    if(path_) { ::free_imatrix( path_, 1,vsz_, 1,hsz_ ); path_ = NULL; }
}

void
SW::sw_() const
{
    static int const MATCHVAL = 1, MISMATCHVAL = -1, VGAP_VAL = -3, HGAP_VAL = -3;
    int vdx, hdx;
    for(vdx=1; vdx<=vsz_+1; vdx++)
        for(hdx=1; hdx<=hsz_+1; hdx++) {
            scores_[vdx][hdx] = 0;
            if(vdx<=vsz_ && hdx<=hsz_)
                path_[vdx][hdx] = 0;
        }
    for(int i=1; i<=vsz_; i++)
        for(int j=1; j<=hsz_; j++) {
            int val[4], l, m = (vseq_[i]==hseq_[j])? MATCHVAL: MISMATCHVAL;
            val[1] = scores_[i][j+1] + VGAP_VAL;
            val[2] = scores_[i][j] + m;
            val[3] = scores_[i+1][j] + HGAP_VAL;
            int mx = val[l=1];
            for(int k=2; k<=3; k++)
                if(val[k]>mx) mx = val[l=k];
            scores_[i+1][j+1] = mx;
            path_[i][j] = l-2;
        }
}

void SW::swwalk_()
{
    static char const GAPCHAR = '-';
    int idx, N, vpos, hpos, pos;
    char *v, *h;
    N = vcoord_+hcoord_;
    v = new char[ 1+N+1 ];
    if(!v) return;
    h = new char[ 1+N+1 ];
    if(!h) { delete [] v; return; }
    pos = N;
    vpos = vcoord_;
    hpos = hcoord_;
    for(idx=0; idx<=(N+1); idx++)
        v[idx] = h[idx] = 0;
    while(vpos>0 && hpos>0) {
        switch( path_[vpos][hpos] ) {
        case -1:
            v[pos] = vseq_[ vpos-- ];
            h[pos] = GAPCHAR;
            break;
        case 0:
            v[pos] = vseq_[ vpos-- ];
            h[pos] = hseq_[ hpos-- ];
            break;
        case 1:
            v[pos] = GAPCHAR;
            h[pos] = hseq_[ hpos-- ];
            break;
        default:
            ::fprintf(stderr,"SW::swwalk_() encountered %d, not in [-1,..,1]\n", path_[vpos][hpos]);
            delete [] v;
            delete [] h;
            return;
        }
        pos--;
    }
    pos++;
    vpos0_ = vpos;
    hpos0_ = hpos;
    if(vseqout_) { delete [] vseqout_; vseqout_ = NULL; }
    if(hseqout_) { delete [] hseqout_; hseqout_ = NULL; }
    outsz_ = N-pos+1+1;
    vseqout_ = new char[ outsz_ ];
    hseqout_ = new char[ outsz_ ];
    if(vseqout_) ::strcpy( vseqout_, &v[pos] );
    if(hseqout_) ::strcpy( hseqout_, &h[pos] );
    delete [] v;
    delete [] h;
}

void SW::debug() const
{
    ::printf("SW @ %p\n",(void*)this);
    ::printf(" vsz_ = %2d hsz_=%2d outsz_=%2d\n",vsz_,hsz_,outsz_);
    ::printf(" vseq_ = [%s]\n",vseq_?&vseq_[1]:"null");
    ::printf(" hseq_ = [%s]\n",hseq_?&hseq_[1]:"null");
    ::printf(" vpos0_ = %2d hpos0_ = %2d\n",vpos0_,hpos0_);
    ::printf(" score_=%2d\n",score_);
    ::printf(" vcoord_ = %4d hcoord_ = %4d\n",vcoord_,hcoord_);
    ::printf(" vseqout_ = [%s]\n",vseqout_?vseqout_:"null");
    ::printf(" hseqout_ = [%s]\n",hseqout_?hseqout_:"null");
    ::printf("scores_:\n");
    if(scores_)
        for(int vdx=1; vdx<=vsz_+1; vdx++) {
            ::printf(" ");
            for(int hdx=1; hdx<=hsz_+1; hdx++)
                ::printf("%2d ",scores_[vdx][hdx]);
            ::printf("\n");
        }
    ::printf("path_:\n");
    if(path_)
        for(int vdx=1; vdx<=vsz_; vdx++) {
            ::printf(" ");
            for(int hdx=1; hdx<=hsz_; hdx++)
                ::printf("%2d ",path_[vdx][hdx]);
            ::printf("\n");
        }
}
#if defined(SA)
int main()
{
    char const* transposon = "tccattggccctcaaacccc";
    char const* observed = "tccattggccctccaaacccc";
    SW sw( transposon, observed );
    sw.debug();
    return 0;
}
#endif
/**********************************************************************
 * FILE: sw.hxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#if !defined(_SW_HXX_)
#define _SW_HXX_
#if defined(_WIN32)
class _declspec( dllexport ) SW
#else
class SW
#endif
{
public:
    SW( char const* known, char const* unknown );
    SW( SW const& rhs );
    SW const& operator=(SW const& rhs);
    ~SW();
    int score() const { return score_; }
    int vcoord() const { return vcoord_; }
    int hcoord() const { return hcoord_; }
    int vpos0() const { return vpos0_; }
    int hpos0() const { return hpos0_; }
    char const* vout() const { return vseqout_; }
    char const* hout() const { return hseqout_; }
    void debug() const;
private:
    int vsz_, hsz_, outsz_, vpos0_, hpos0_;
    char *vseq_, *hseq_;
    int score_;
    char *vseqout_, *hseqout_;
    int **scores_, **path_;
    int vcoord_, hcoord_;
    void sw_() const;
    void swwalk_();
};
#endif
/**********************************************************************
 * FILE: Wvfm.cxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#include <basecall/mb.hxx>
#include <nrc/nr.hxx>
#include <nrc/Complex.hxx>
#include <time.h>
#include <basecall/RatioBin.hxx>

Wvfm::Wvfm()
    : rows_(0), cols_(0), bgni_(0), endi_(0), ds_(UNKNOWN),
      pm_(NULL), pv_(NULL), pi_(NULL), px_(NULL), pb_(NULL),
      method_(MAX_INTEGRAL), status_(STS_UNINITD), ib_(_2X),
      fCalcSST_(1), obgni_(0), oendi_(0)
{
    lnordr_[0] = '\0';
    for(int idx=0; idx<4; idx++)
        for(int jdx=0; jdx<4; jdx++)
            ssm_[idx][jdx] = (idx==jdx)? 1.0: 0.0;
    status_ = STS_NO_SIZE;
}

Wvfm::Wvfm( char const* dataf, char const* parmf, char const* lnordr, DATASRC dsrc, Method m )
    : rows_(0), cols_(0), bgni_(0), endi_(0), ds_(dsrc),
      pm_(NULL), pv_(NULL), pi_(NULL), px_(NULL), pb_(NULL),
      method_(m), status_(STS_UNINITD), ib_(_2X),
      fCalcSST_(1), obgni_(0), oendi_(0)
{
    int BBUFSZ;
    int** bbuf, rdx, cdx, scanl, idx, jdx;
    char scratch[ 128 ];
    FILE* fp = NULL;
    for(idx=0; idx<4; idx++)
        for(jdx=0; jdx<4; jdx++)
            ssm_[idx][jdx] = (idx==jdx)? 1.0: 0.0;
    if(!parmf) {
        if(lnordr) {
            ::strcpy(lnordr_,lnordr);
            lnordr_[4] = '-';
            lnordr_[5] = '\0';
        } else {
            status_ = STS_NO_LNORD;
            return;
        }
    } else if(NULL==(fp=::fopen(parmf,"r"))) {
        status_ = STS_NO_PARMF;
        return;
    } else {
        if(NULL != lnordr)
            ::strcpy(lnordr_,lnordr);
        else {
            ::fscanf( fp, "%s\n", scratch );
            ::fscanf( fp, "%s\n", lnordr_ );
            ::fscanf( fp, "%s\n", scratch );
        }
        lnordr_[4] = '-';
        lnordr_[5] = '\0';
        for(idx = 0; idx < 4; idx++)
            if(4 != ::fscanf(fp,"%lf %lf %lf %lf", &ssm_[idx][0], &ssm_[idx][1], &ssm_[idx][2], &ssm_[idx][3] )) {
                ::fprintf(stderr,"SSM file[%s] has [%s] instead of 4 doubles.\n", parmf,scratch);
                ::fprintf(stderr,"Wrong file, or incorrect file format.\n");
                exit(1);
            }
        ::fclose(fp);
        fCalcSST_ = 0;
    }
    if(!dataf || (NULL == (fp = ::fopen(dataf,"r")))) {
        status_ = STS_NO_DATAF;
        return;
    }
#undef OLD_FMT
#if defined(OLD_FMT)
    BBUFSZ = 12000;
    while(1) {
        if(NULL == ::fgets( scratch, sizeof(scratch), fp)) break;
        else if(0 == ::strncmp(scratch,"INTENSITY_DATA",14)) break;
    }
#else
    char str[2][64];
    int nfluor;
    float fs;
    int dummy;
    int NF;
    NF = ::fscanf(fp,"%s %s %d %d %f %d\n",str[0],str[1],&nfluor,&BBUFSZ,&fs,&dummy);
    if(4 != nfluor) {
        ::fprintf(stderr,"%s:%d, $3==%d, not 4\n",__FILE__,__LINE__,nfluor);
        status_ = STS_FILE_UN;
        return;
    }
    if(1.75f != fs && 2.00f != fs) {
        ::fprintf(stderr,"%s:%d, $4=%4.2f, neither 1.75 nor 2.00\n",__FILE__,__LINE__,fs);
        status_ = STS_FILE_UN;
        return;
    }
#endif
    if(NULL == (bbuf = ::imatrix( 1,BBUFSZ, 1,4 ))) {
        status_ = STS_NO_MEM;
        return;
    }
    for(scanl=1; scanl<=BBUFSZ; scanl++)
        if(NULL == ::fgets( scratch, sizeof(scratch), fp )) break;
        else if(0 == ::strncmp(scratch,"INTENSITY_DATA_END:",19)) break;
        else {
            float b[4];
            int* p = bbuf[scanl];
            if(4 != ::sscanf( scratch, "%f %f %f %f",&b[0],&b[1],&b[2],&b[3])) {
                ::free_imatrix( bbuf, 1,BBUFSZ,1,4 );
                status_ = STS_SCNL_RD_FAIL;
                return;
            } else {
                p[1] = int( b[0]+0.5f );
                p[2] = int( b[1]+0.5f );
                p[3] = int( b[2]+0.5f );
                p[4] = int( b[3]+0.5f );
            }
        }
    ::fclose(fp);
    scanl--;
    if(NULL==(pm_=::dmatrix( 1,rows_=scanl, 1,cols_=4))) {
        ::free_imatrix( bbuf, 1,BBUFSZ,1,4 );
        status_ = STS_NO_MEM;
        return;
    } else
        for(rdx=1; rdx<=rows_; rdx++)
            for(cdx=1; cdx<=cols_; cdx++)
                pm_[rdx][cdx] = double( bbuf[rdx][cdx] );
    ::free_imatrix( bbuf, 1,BBUFSZ,1,4 );
    status_ = STS_INITD;
}
Wvfm::Wvfm( int r, int c, char const* lnordr, DATASRC ds, Method m )
    : rows_(r), cols_(c), bgni_(1), endi_(0), ds_(ds),
      pm_(NULL), pv_(NULL), pi_(NULL), px_(NULL), pb_(NULL),
      method_(m), status_(STS_UNINITD), ib_(_2X),
      fCalcSST_(1), obgni_(1), oendi_(0)
{
    int idx, jdx, cdx, rdx;
    if(0 == endi_) endi_ = scanl();
    lnordr_[0] = '\0';
    if(lnordr)
        ::strcpy( lnordr_, lnordr );
    pm_ = ::dmatrix( 1,rows_, 1,cols_ );
    for(cdx=1; cdx<=cols_; cdx++)
        for(rdx=1; rdx<=rows_; rdx++)
            pm_[rdx][cdx] = 0.0;
    for(idx=0; idx<4; idx++)
        for(jdx=0; jdx<4; jdx++)
            ssm_[idx][jdx] = (idx==jdx)? 1.0: 0.0;
    status_ = STS_INITD;
}

Wvfm::Wvfm( Wvfm const& rhs )
    : rows_(0), cols_(0), bgni_(0), endi_(0),
      pm_(NULL), pv_(NULL), pi_(NULL), px_(NULL), pb_(NULL),
      method_(MAX_INTEGRAL), ib_(_2X), fCalcSST_(1), obgni_(0), oendi_(0)
{
    *this = rhs;
}

Wvfm const&
Wvfm::operator=( Wvfm const& rhs)
{
    if(this != &rhs) {
        int NS, idx, jdx;
        release();
        status_ = rhs.status_;
        rows_ = rhs.rows_;
        cols_ = rhs.cols_;
        obgni_ = rhs.obgni_;
        oendi_ = rhs.oendi_;
        bgni_ = rhs.bgni_;
        endi_ = rhs.endi_;
        ds_ = rhs.ds_;
        method_ = rhs.method_;
        ib_ = rhs.ib_;
        fCalcSST_ = rhs.fCalcSST_;
        if(rhs.pm_ && 0!=rows_ && 0!=cols_)
            if(NULL == (pm_ = ::dmatrix(1,rows_,1,cols_))) {
                status_ = STS_NO_MEM;
                goto bugout;
            } else
                for(idx=1; idx<=rows_; idx++)
                    for(int jdx=1; jdx<=cols_; jdx++)
                        pm_[idx][jdx] = rhs.pm_[idx][jdx];
        ::strcpy(lnordr_,rhs.lnordr_);
        for(idx=0; idx<4; idx++)
            for(jdx=0; jdx<4; jdx++)
                ssm_[idx][jdx] = rhs.ssm_[idx][jdx];
        NS = scanl();
        if(rhs.pv_ && rhs.pi_ && rhs.px_ && rhs.pb_) {
            pv_ = ::dvector( 1,NS );
            px_ = ::dvector( 1,NS );
            pi_ = ::ivector( 1,NS );
            pb_ = ::vector( 1,NS );
            if(!pv_||!pi_||!px_||!pb_) {
                status_ = STS_NO_MEM;
                goto bugout;
            }
            for(idx=1; idx<=NS; idx++) {
                pv_[idx] = rhs.pv_[idx];
                pi_[idx] = rhs.pi_[idx];
                px_[idx] = rhs.px_[idx];
                pb_[idx] = rhs.pb_[idx];
            }
        }
    }
bugout:
    return *this;
}

void
Wvfm::release()
{
    int NS = scanl();
    if(NULL != pm_) { ::free_dmatrix( pm_,1,rows_,1,cols_ ); pm_ = NULL; }
    if(NULL != pv_) { ::free_dvector( pv_, 1, NS ); pv_ = NULL; }
    if(NULL != pi_) { ::free_ivector( pi_, 1, NS ); pi_ = NULL; }
    if(NULL != px_) { ::free_dvector( px_, 1, NS ); px_ = NULL; }
    if(NULL != pb_) { ::free_vector( pb_, 1, NS ); pb_ = NULL; }
}

Wvfm::~Wvfm()
{
    release();
}

struct VI { double v; int i, z; };

static int vi_cmp(void const* p1, void const* p2)
{
    double v1 = ((VI const*)p1)->v;
    double v2 = ((VI const*)p2)->v;
    if(v1>v2) return 1;
    else if(v1<v2) return -1;
    else return 0;
}

void
Wvfm::bgnEnd()
{
    if(bgni_>=2 && endi_>bgni_ && endi_<scanl()) {
        obgni_ = bgni_;
        oendi_ = endi_;
        return;
    }
    static int const NZ = 8;
    VI vi[1+NZ];
    int ZSZ = scanl()/NZ;
    for(int nz=1; nz<=NZ; nz++) {
        int zb = 1+(nz-1)*ZSZ, ze = nz*ZSZ, mi;
        double mv = sc_la(mi=zb,1);
        for(int s=zb+1; s<=ze; s++) {
            for(int f=2; f<=lanes(); f++) {
                double v = sc_la(s,f);
                if(v>mv) { mv=v; mi=s; }
            }
        }
        vi[nz].v = mv;
        vi[nz].i = mi;
        vi[nz].z = nz;
    }
    ::qsort(&vi[1],NZ,sizeof(VI),vi_cmp);
    int mx1 = vi[NZ].z;
    int mx2 = vi[NZ-1].z;
    double primerPeak;
    if(1==abs(mx1-mx2)) {
        bgni_ = vi[NZ].i;
        primerPeak = vi[NZ].v;
        endi_ = scanl()-100;
    } else if(mx1<mx2) {
        bgni_ = vi[NZ].i;
        primerPeak = vi[NZ].v;
        endi_ = vi[NZ-1].i-100;
    } else {
        bgni_ = vi[NZ-1].i;
        primerPeak = vi[NZ-1].v;
        endi_ = vi[NZ].i-100;
    }
    int B=bgni_;
    for(int ns=bgni_+30; ns<=bgni_+400; ns++)
        for(int f=1; f<=lanes(); f++)
            if(sc_la(ns,f) > 0.3*primerPeak) {
                if(bgni_==B) B = ns;
                continue;
            }
    double mu=0.0, nv=0.0;
    int m, f, LHS = (bgni_+endi_)/2;
    for(m=bgni_; m<=LHS; m++) {
        double lmx = sc_la(m,1);
        for(f=2; f<=lanes(); f++)
            if(sc_la(m,f) > lmx) lmx = sc_la(m,f);
        mu += lmx;
        nv += 1.0;
    }
    mu /= nv;
    nv = 0.0;
    for(m=bgni_; m<=LHS; m++) {
        double lmx = sc_la(m,1);
        for(f=2; f<=lanes(); f++)
            if(sc_la(m,f) > lmx) lmx = sc_la(m,f);
        if(0.0==nv && lmx<=mu) nv++;
        else if(0.0!=nv && lmx>mu) {
            bgni_ = m;
            break;
        }
    }
#if 1
    endi_ -= 350;
#else
    double denom, numer, nb;
    numer = double(endi_-bgni_+1);
    if(_1X==ib_) denom = 10.0;
    else if(_2X==ib_) denom = 7.0;
    else denom = 5.0;
    nb = numer/denom;
    if(nb>=650.0) endi_ -= int((nb-650.0)*denom);
#endif
    obgni_ = bgni_;
    oendi_ = endi_;
}

static void fsmPkdet(double const* vec, short b, short e, short& P1, short& PN, short* pk, short* tr)
{
    enum STATE { ST_UK, ST_UP, ST_DN } st = ST_UK;
    short scnl, TN;
    double vm1;
    TN = PN = 0;
    vm1 = vec[ scnl=b ];
    for(++scnl; scnl<=e; scnl++) {
        double v = vec[scnl];
        switch(st) {
        case ST_UK:
            if(v > vm1) st=ST_UP;
            else if(v < vm1) st=ST_DN;
            break;
        case ST_UP:
            if(v < vm1) { st = ST_DN; pk[ ++PN ] = scnl-1; }
            break;
        case ST_DN:
            if(v > vm1) { st = ST_UP; tr[ ++TN ] = scnl-1; }
            break;
        }
        vm1 = v;
    }
    P1=1;
    if(pk[P1] < tr[1]) P1++;
    if(pk[PN] > tr[TN]) PN--;
}

static void bslAdjust( double* bsl, int bgn, int end )
{
    static int const LW = 125;
    int N, FFTSZ;
    double* minv, *hn, *lpfbsl, *rawmin=&bsl[bgn-1];
#   define REAL(n) (2*(n)-1)
#   define IMAG(n) (2*(n))
    int c;
    N = end-bgn+1;
    FFTSZ = 1 << int(::ceil(::log(3*N+(2*LW-1)-1)/::log(2.0)) );
    minv = ::dvector(1,2*FFTSZ);
    hn = ::dvector(1,2*FFTSZ);
    lpfbsl = ::dvector(1,N);
    for(c=1; c<=2*FFTSZ; c++) minv[c] = hn[c] = 0.0;
    for(c=1; c<=N; c++) {
        double v = rawmin[c];
        minv[REAL(N+1-c)] = v;
        minv[REAL(N+c)] = v;
        minv[REAL(3*N+1-c)] = v;
    }
    ::dfour1( minv, FFTSZ, 1 );
    for(c=1; c<=LW; c++) {
        double v = 0.5*(1.0 + ::cos(double(c-1)*NR_PI/double(LW)))/double(LW);
        hn[REAL(c)] = v;
        if(1 != c) hn[REAL(FFTSZ-c+2)] = v;
    }
    ::dfour1( hn, FFTSZ, 1 );
    ::dCMul( minv, hn, hn, FFTSZ );
    ::dfour1( hn, FFTSZ, -1 );
    for(c=1; c<=N; c++) lpfbsl[c] = hn[REAL(N+c)]/double(FFTSZ);
    double BX=0.0, BY=0.0, BN=0.0, EX=0.0, EY=0.0, EN=0.0;
    double sl, in;
    for(c=1; c<=N/10; c++) {
        double bdel=lpfbsl[c]-rawmin[c], edel=lpfbsl[N+1-c]-rawmin[N+1-c];
        if(bdel>0.0) { BN++; BX+=double(c); BY+=bdel; }
        if(edel>0.0) { EN++; EX+=double(N+1-c); EY+=edel; }
    }
    BX /= BN; BY /= BN;
    EX /= EN; EY /= EN;
    sl = (EY-BY)/(EX-BX);
    in = EY - sl*EX;
    for(c=1; c<=N; c++) rawmin[c] = lpfbsl[c] - (double(c)*sl+in);
    ::free_dvector(minv, 1, 2*FFTSZ);
    ::free_dvector(hn, 1, 2*FFTSZ);
    ::free_dvector(lpfbsl, 1, N);
}

void
Wvfm::fbbls_(int fnr, double* bsl, DIRECTION_ DIR) const
{
#undef DOC_PATENT
#if defined(DOC_PATENT)
    static int s1=1;
#endif
    double TAU=75.0;
    double prvmin, newmin, K, cnt=1.0, sl, in, thr, dy, dx;
    int lstknt, x, IC, FC, LC;
    if(FWD==DIR) {
        lstknt=bgni_; IC=bgni_+1; FC=endi_+1;
    } else {
        lstknt=endi_; IC=endi_-1; FC=bgni_-1;
    }
    prvmin = sc_la(lstknt,fnr);
    LC = IC+10*DIR;
    for(x=IC; x!=LC; x+=DIR)
        if(0.0 != (prvmin=sc_la(x,fnr))) break;
    K = (prvmin/3.0)/(exp(1.0)-1.0);
    for(int snr=IC; snr!=FC; snr+=DIR) {
        thr = prvmin + K*(exp(++cnt/TAU)-1.0);
        if(0.0==(newmin=sc_la(snr,fnr))) newmin = thr+1.0;
        if(newmin<=thr || cnt>=100.0) {
            if(cnt>=100.0 && thr<newmin) newmin = thr;
            dy = newmin-prvmin;
            dx = double(snr-lstknt);
            sl = dy/dx;
            in = newmin - sl*snr;
            prvmin = newmin;
            for(x=lstknt; x!=snr; x+=DIR) {
                double v = double(x)*sl+in;
                bsl[x] = (FWD==DIR)? v: sqrt(v*bsl[x]);
#if defined(DOC_PATENT)
                if(3==fnr && 1==s1) ::printf("%d %7.2f %7.2f %7.2f\n",DIR,sc_la(x,fnr),v,bsl[x]);
#endif
            }
            lstknt = snr;
            cnt = 1.0;
            K = (prvmin/3.0)/(exp(1.0)-1.0);
        }
    }
    snr -= DIR;
    if(snr==lstknt) {
        double v = double(snr)*sl+in;
        bsl[snr] = (FWD==DIR)? v: sqrt(v*bsl[snr]);
#if defined(DOC_PATENT)
        if(3==fnr && 1==s1) ::printf("%d %7.2f %7.2f %7.2f\n",DIR,sc_la(snr,fnr),v,bsl[snr]);
#endif
    } else {
        if(thr<newmin) newmin = thr;
        dy = newmin-prvmin;
        dx = double(snr-lstknt);
        sl = dy/dx;
        in = newmin - sl*snr;
        LC = snr+DIR;
        for(x=lstknt; x!=LC; x+=DIR) {
            double v = double(x)*sl+in;
            bsl[x] = (FWD==DIR)? v: sqrt(v*bsl[x]);
#if defined(DOC_PATENT)
            if(3==fnr && 1==s1) ::printf("%d %7.2f %7.2f %7.2f\n",DIR,sc_la(x,fnr),v,bsl[x]);
#endif
        }
    }
#if defined(DOC_PATENT)
    if(3==fnr && FWD!=DIR) { s1 = 0; exit(1); }
#endif
}

void
Wvfm::bestBaseline_()
{
    double *bsl = ::dvector(1,endi_);
    for(int fnr=1; fnr<=lanes(); fnr++) {
        fbbls_( fnr, bsl, FWD );
        fbbls_( fnr, bsl, BCK );
        bslAdjust( bsl, bgni_, endi_ );
        for(int snr=bgni_; snr<=endi_; snr++) {
            double del = sc_la(snr,fnr)-bsl[snr];
            if(del<0.0) del = 0.0;
            sc_la_set(snr,fnr, del);
        }
    }
#if 0
    debug("AFTER BESTBASLINE");
#endif
    ::free_dvector(bsl,1,endi_);
}

int
Wvfm::rawPks_( RatioBin& rb ) const
{
    short* ppk = new short[(endi_-bgni_+1)/2];
    short* ptr = new short[(endi_-bgni_+1)/2];
    double* penv = new double[1+endi_];
    short lane, scnl, P1, PN;
    if(!ppk || !ptr || !penv) return 0;
    for(scnl=bgni_; scnl<=endi_; scnl++) {
        penv[scnl] = sc_la(scnl,1);
        for(lane=2; lane<=lanes(); lane++)
            if(sc_la(scnl,lane)>penv[scnl]) penv[scnl] = sc_la(scnl,lane);
    }
    fsmPkdet( penv, bgni_, endi_, P1, PN, ppk, ptr);
    if(0==PN) return 0;
    for(short idx=P1; idx<=PN; idx++) {
        double colsmpl[4];
        short x = ppk[idx];
        for(lane=1; lane<=lanes(); lane++) colsmpl[lane-1] = sc_la(x,lane);
        rb.classify( colsmpl, idx );
    }
    delete [] ppk;
    delete [] ptr;
    delete [] penv;
    return 1;
}

int Wvfm::specSep()
{
    int scnl, lane, mrow;
    if(0==bgni_ || 0==endi_ || bgni_>endi_ || 0==rows_ || 0==cols_) {
        status_ = STS_CORRUPT;
        return 0;
    }
#if 0
    debug("Before bestBaseline_");
#endif
    bestBaseline_();
#if 0
    debug("After bestBaseline_");
#endif
    if(fCalcSST_) {
        RatioBin b( lanes() );
        Wvfm tmp = *this;
        if(1 != tmp.rawPks_( b )) { status_ = STS_CORRUPT; return 0; }
        if(1 != b.analyze()) { status_ = STS_CORRUPT; return 0; }
        for(short row=1; row<=lanes(); row++)
            for(short col=1; col<=lanes(); col++)
                ssm_[row-1][col-1] = b.sst(row,col);
    }
    double minv[5];
    minv[1] = minv[2] = minv[3] = minv[4] = 0.0;
    for(scnl=bgni_; scnl<=endi_; scnl++) {
        double temp[4];
        for(mrow=0; mrow<lanes(); mrow++) {
            double sum=0.0;
            for(lane=1; lane<=lanes(); lane++)
                sum += sc_la(scnl,lane) * ssm_[mrow][lane-1];
            temp[mrow] = sum;
            if(sum<minv[mrow+1]) minv[mrow+1] = sum;
        }
        for(lane=1; lane<=lanes(); lane++) sc_la_set(scnl,lane,temp[lane-1]);
    }
    for(scnl=bgni_; scnl<=endi_; scnl++)
        for(lane=1; lane<=lanes(); lane++) {
            double v = sc_la(scnl,lane);
#if 1
            if(v<0.0) sc_la_set(scnl,lane,0.0);
#else
            if(v<0.0) sc_la_set(scnl,lane, v / -minv[lane] );
#endif
        }
#if 0
    debug(" After SST applied");
#endif
    return 1;
}

void
Wvfm::sort()
{
    int transposed = 0;
    if(rows_ > cols_) { transpose(); transposed = 1; }
    for(int ldx=1; ldx <= rows_; ldx++)
        ::qsort( &pm_[ldx][1], cols_, sizeof(double), d_cmp );
    if(1 == transposed) transpose();
}

static void bubbleswap( double* v, int* i, int i1, int i2)
{
    double td; int ti;
    td = v[i1]; ti = i[i1];
    v[i1] = v[i2]; i[i1] = i[i2];
    v[i2] = td; i[i2] = ti;
}

void Wvfm::envelope( ShftVect const& sv ) const
{
    static float const MAXXBND = 2.0f;
    static float const MINXBND = 1.0f;
    Wvfm* lt = (Wvfm*)this;
    int scnl, SVMAX = 1+sv.maxshft();
    int mi, NS = scanl(), NL = lanes();
    if(!pv_) lt->pv_ = ::dvector( 1, NS );
    if(!pi_) lt->pi_ = ::ivector( 1, NS );
    if(!px_) lt->px_ = ::dvector( 1, NS );
    if(!pb_) lt->pb_ = ::vector( 1, NS );
    if(!pv_ || !pi_ || !px_ || !pb_) { lt->status_ = STS_NO_MEM; return; }
    for(scnl=1; scnl<SVMAX; scnl++) {
        lt->pv_[scnl] = lt->px_[scnl] = 0.0;
        lt->pb_[scnl] = 0.0f;
        lt->pi_[scnl] = 0;
    }
    for(scnl=SVMAX; scnl<=NS; scnl++) {
        double mx, smx, v[5];
        int i[5];
        v[1] = sc_la( scnl-sv.s(1), 1 ); i[1]=1;
        v[2] = sc_la( scnl-sv.s(2), 2 ); i[2]=2;
        v[3] = sc_la( scnl-sv.s(3), 3 ); i[3]=3;
        v[4] = sc_la( scnl-sv.s(4), 4 ); i[4]=4;
        if(v[1]>v[2]) bubbleswap(v,i,1,2);
        if(v[2]>v[3]) bubbleswap(v,i,2,3);
        if(v[3]>v[4]) bubbleswap(v,i,3,4);
        if(v[1]>v[2]) bubbleswap(v,i,1,2);
        if(v[2]>v[3]) bubbleswap(v,i,2,3);
        if(v[1]>v[2]) bubbleswap(v,i,1,2);
        mx = v[4]; smx = v[3]; mi=i[4];
        lt->pv_[ scnl ] = mx;
        lt->pi_[ scnl ] = mi;
        if(v[1] < 0.0) v[1] = 0.0;
        if(v[3] < 0.0) v[3] = 0.0;
        lt->pb_[ scnl ] = (v[4]!=v[3])? float((v[3]-v[1])/(v[4]-v[3])): 0.5f;
        if(lt->pb_[ scnl ] > 9.99f) lt->pb_[ scnl ] = 9.99f;
        if(smx < DBL_EPSILON) smx = DBL_EPSILON;
        if(mx < 0.1) mx = MINXBND + mx/sqrt(smx);
        else mx /= smx;
        if(mx > MAXXBND) mx = MAXXBND;
        else if(mx < MINXBND) mx = MINXBND;
        lt->px_[ scnl ] = mx;
    }
}

void
Wvfm::transpose()
{
    Wvfm tmp( cols_, rows_, lnordr_ );
    double **td;
    for(int rdx=1; rdx<=rows_; rdx++)
        for(int cdx=1; cdx<=cols_; cdx++)
            tmp[cdx][rdx] = pm_[rdx][cdx];
    rows_ = tmp.rows_; cols_ = tmp.cols_;
    td = tmp.pm_; tmp.pm_ = pm_; pm_ = td;
    tmp.rows_ = cols_; tmp.cols_ = rows_;
}

void
Wvfm::append( Wvfm const& rhs, int oscnl, int nscnl )
{
    int NS = (rhs.endi()<rhs.scanl())? rhs.endi(): rhs.scanl();
    Wvfm tmp( oscnl+(NS-nscnl+1), 4, lnordr_ );
    ShftVect noshift;
    int or=rows_, oc=cols_;
    double **td = tmp.pm_;
    for(int lane=1; lane<=4; lane++) {
        for(int os=1; os<=oscnl; os++)
            tmp.sc_la_set( os, lane, sc_la(os,lane) );
        for(int ns=nscnl; ns<=NS; os++,ns++)
            tmp.sc_la_set( os, lane, rhs.sc_la(ns,lane) );
    }
    rows_ = tmp.rows_; cols_ = tmp.cols_;
    tmp.pm_ = pm_; pm_ = td;
    tmp.rows_ = or; tmp.cols_ = oc;
    bgni_ = tmp.bgni_; endi_ = tmp.endi_;
    NS = scanl();
    if(NULL != pv_) { ::free_dvector( pv_, 1, NS ); pv_ = NULL; }
    if(NULL != pi_) { ::free_ivector( pi_, 1, NS ); pi_ = NULL; }
    if(NULL != px_) { ::free_dvector( px_, 1, NS ); px_ = NULL; }
    if(NULL != pb_) { ::free_vector( pb_, 1, NS ); pb_ = NULL; }
    envelope( noshift );
}

void
Wvfm::lnordr( char const* lnordr )
{
    for(int idx=0; idx<6; idx++) lnordr_[idx] = lnordr[idx];
}

void Wvfm::ssm( double const* pssm )
{
    fCalcSST_ = 0;
    for(int idx=0; idx<4; idx++)
        for(int jdx=0; jdx<4; jdx++)
            ssm_[idx][jdx] = pssm[idx*4+jdx];
}

void
Wvfm::pm( double** pn, int r, int b, int e )
{
    release();
    pm_ = pn;
    rows_ = r;
    bgni_ = b;
    endi_ = (e>rows_)? rows_: e;
}

void
Wvfm::debug( char const* msg ) const
{
    static char const* METLUT[] = { "MAX_INTEGRAL", "MIN_XBANDING" };
    static char const* DSLUT[] = { "UNKNOWN", "ABI ", "MDYN", "FLUOR", "TRUVEL" };
    ::printf("Wvfm @ %p\n",(void*)this);
    if(NULL != msg) ::printf(" %s\n",msg);
    ::printf("\trows=%u cols=%u bgni=%u endi=%u method=%s\n",
        rows_,cols_,bgni_,endi_, METLUT[method()] );
    ::printf("datasrc: [%s]\n", DSLUT[ds_] );
    ::printf("lnordr: [%s]\n",('\0'==lnordr_[0])? "": lnordr_);
    if(ABI==ds_ || MDYN==ds_) {
        ::printf("specsep:\n");
        for(int idx=0; idx<4; idx++) {
            ::printf(" ");
            for(int jdx=0; jdx<4; jdx++)
                ::printf("%9.6f ",ssm_[idx][jdx]);
            ::printf("\n");
        }
    }
    if(NULL == pm_)
        ::printf(" pm = (null)\n");
    else {
        int idx;
        ::printf(" pm @ %p\n",(void*)pm_);
        if(rows_>cols_)
            for(idx = bgni_; idx <= endi_; idx++) {
                ::printf("%4d %11.6lf %11.6lf %11.6lf %11.6lf\n",
                    idx,pm_[idx][1],pm_[idx][2],pm_[idx][3],pm_[idx][4]);
            }
        else
            for(idx = bgni_; idx <= endi_; idx++) {
                ::printf("%4d %11.6lf%11.6lf%11.6lf%11.6lf\n",
                    idx,pm_[1][idx],pm_[2][idx],pm_[3][idx],pm_[4][idx]);
            }
    }
}
/***********************************************************
 * FILE: Wvfm.hxx
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#ifndef _WVFM_HXX_
#define _WVFM_HXX_
#include <basecall/ShftVect.hxx>
class RdrOut;
class SegRead;
class RatioBin;
#if defined(WIN32)
class _declspec(dllexport) Wvfm
#else
class Wvfm
#endif
{
public:
    enum INTERPOLATE_BY { _3X, _2X, _1X };
    enum DATASRC { UNKNOWN, ABI, MDYN, FLUOR, TRUVEL };
    enum Method { MAX_INTEGRAL, MIN_XBANDING };
    enum Status
    {
        STS_UNINITD,
        STS_INITD,
        STS_NO_MEM,
        STS_NO_SIZE,
        STS_NO_PARMF,
        STS_NO_DATAF,
        STS_SCNL_RD_FAIL,
        STS_CORRUPT,
        STS_NO_LNORD,
        STS_FILE_UN
    };
    Wvfm();
    Wvfm( char const* dfn, char const* pfn, char const* lnordr=NULL,
          DATASRC ds=MDYN, Method m=MAX_INTEGRAL );
    Wvfm( int rows, int cols, char const* lnordr,
          DATASRC ds=MDYN, Method m=MAX_INTEGRAL );
    Wvfm( Wvfm const& rhs );
    Wvfm const& operator=( Wvfm const& rhs );
    ~Wvfm();
    void lnordr( char const* lnordr );
    void ssm( double const* pssm );
    void ds( DATASRC ds ) { ds_ = ds; }
    void smplRate( INTERPOLATE_BY ib ) { ib_ = ib; }
    void rows(int r) { rows_ = r; }
    void cols(int c) { cols_ = c; }
    void bgni(int b) { bgni_ = b; }
    void endi(int e) { endi_ = e; }
    void sc_la_set(int s,int l,double v) const
        { if(rows_>cols_) pm_[s][l] = v; else pm_[l][s] = v; }
    void sc_la_mul(int s,int l,double v) const
        { if(rows_>cols_) pm_[s][l] *= v; else pm_[l][s] *= v; }
    void append( Wvfm const& rhs, int oscnl, int nscnl );
    int scanl() const { return rows_>cols_? rows_: cols_; }
    int lanes() const { return rows_<cols_? rows_: cols_; }
    DATASRC ds() const { return ds_; }
    double sc_la(int s,int l) const { return rows_>cols_? pm_[s][l]: pm_[l][s]; }
    double const* ssm() const { return ssm_[0]; }
    void envelope( ShftVect const& sv ) const;
    int rows() const { return rows_; }
    int cols() const { return cols_; }
    int obgni() const { return obgni_; }
    int oendi() const { return oendi_; }
    int bgni() const { return bgni_; }
    int endi() const { return endi_; }
    double envv( size_t idx ) const { return (pv_)? pv_[idx]: 0.0; }
    int envi( size_t idx ) const { return (pi_)? pi_[idx]: 0; }
    double xbnd( size_t idx ) const { return (px_)? px_[idx]: 0.0; }
    float buzz( size_t idx ) const { return (pb_)? pb_[idx]: 0.0f; }
    Method method() const { return method_; }
    char const* lnordr() const { return lnordr_; }
    void debug(char const* msg = NULL) const;
    int nfeeder( RdrOut& ro, int fVb=0 );
    int unstop();
    void sort();
    Status status() const { return status_; }
private:
    int rows_;
    int cols_;
    int bgni_;
    int endi_;
    DATASRC ds_;
    double **pm_;
    double* pv_;
    int* pi_;
    double* px_;
    float* pb_;
    Method method_;
    char lnordr_[6];
    double ssm_[4][4];
    Status status_;
    INTERPOLATE_BY ib_;
    char fCalcSST_;
    int obgni_, oendi_;
    int preproc();
    int nreader( int Fbw, int finalRead, SegRead& segrd );
    void release();
    void bgnEnd();
    int specSep();
    void sc_la_sub(int s,int l,double v) const
        { if(rows_>cols_) pm_[s][l] -= v; else pm_[l][s] -= v; }
    void pm( double **p, int r, int b, int e );
    void transpose();
    void truvelAdjust();
    void leastSquareBaseline_();
    void baseline( int N );
    void noZeros();
    int mdynpre();
    double* dmscanlsum() const;
    double* dmscanlprod() const;
    double* operator[](int i) { return pm_[i]; }
    int t2tBaselineSubt();
    int rawPks_( RatioBin& rb ) const;
    void bestBaseline_();
    enum DIRECTION_ { BCK=-1, FWD=1 };
    void fbbls_(int fnr, double* b, DIRECTION_ D) const;
};
#endif
/*******************************************************************
 * FILE: xtranorm.cxx - Extra Normalization to compensate for
 * a band-lite (as opposed to other band-rich) lanes.
 * AUTHOR: Andy Marks
 * COPYRIGHT (c) 1996, University of Utah
 */
#include <basecall/mb.hxx>

struct SD { double v; int i; };

double* Wvfm::dmscanlprod() const
{
    double *p;
    if(NULL != (p = ::dvector(1,scanl())))
        for(int sdx=1; sdx<=scanl(); sdx++) {
            p[sdx] = sc_la(sdx,1);
            for(int ldx=2; ldx<=lanes(); ldx++) p[sdx] *= sc_la(sdx,ldx);
        }
    return p;
}

double*
Wvfm::dmscanlsum() const
{
    double *p;
    if(NULL != (p = ::dvector(1,scanl())))
        for(int sdx=1; sdx<=scanl(); sdx++) {
            p[sdx] = sc_la( sdx, 1 );
            for(int ldx=2; ldx<=lanes(); ldx++) p[sdx] += sc_la( sdx, ldx );
        }
    return p;
}

static void stats( double const* v, int n, double& mn, double& std )
{
    int idx;
    mn = std = 0.0;
    for(idx=1; idx<=n; idx++) mn += v[idx];
    mn /= double(n);
    for(idx=1; idx<=n; idx++) {
        double del = (v[idx]-mn);
        std += del*del;
    }
    std = sqrt( std/double(n) );
}

int
Wvfm::unstop()
{
    int testagain = 1;
    while(1 == testagain) {
        double *p, *s, pmn, pstd, smn, sstd;
        double pmz, smz, geoMean;
        int pmi, smi;
        int ldx, sdx;
        testagain = 0;
        p = dmscanlprod();
        if(NULL == p) return 0;
        s = dmscanlsum();
        if(NULL == s) return 0;
        stats( p, scanl(), pmn, pstd );
        stats( s, scanl(), smn, sstd );
        pmz=p[1]; pmi=1; smz=s[1]; smi=1;
        if(0.0 == pstd || 0.0 == sstd) {
            ::free_dvector( p, 1, scanl() );
            ::free_dvector( s, 1, scanl() );
            return 0;
        }
        for(sdx=1; sdx<=scanl(); sdx++) {
            p[sdx] = (p[sdx]-pmn)/pstd;
            s[sdx] = (s[sdx]-smn)/sstd;
            if(p[sdx] > pmz) { pmz = p[sdx]; pmi = sdx; }
            if(s[sdx] > smz) { smz = s[sdx]; smi = sdx; }
        }
        ::free_dvector( p, 1, scanl() );
        ::free_dvector( s, 1, scanl() );
        if(10 > ::abs(pmi-smi)) {
            geoMean = ::sqrt( pmz * smz );
            if(geoMean > 9.5) {
                int bgn, end;
                testagain = 1;
                bgn = (smi>25)? smi-25: 1;
                end = ((smi+25) < scanl())? smi+25: scanl();
                for(ldx=1; ldx<=4; ldx++)
                    for(sdx=bgn; sdx<=end; sdx++)
                        sc_la_set( sdx, ldx, 0.0 );
            }
        }
    }
    return 1;
}

static int sumi2j(int i, int j)
{
    int t;
    if(i>j) { t = i; i = j; j = t; }
    return j*(j+1)/2 - i*(i-1)/2;
}

static int sd_cmp( void const* e1, void const* e2 )
{
    double d1 = ((SD const*)e1)->v;
    double d2 = ((SD const*)e2)->v;
    if(d1 > d2) return 1;
    else if(d1 < d2) return -1;
    else return 0;
}

static int rankordr( double* pv, int n )
{
    SD* pv2;
    int idx, jdx, hi;
    pv2 = new SD[n];
    if(NULL == pv2) return 0;
    for(idx=1; idx<=n; idx++) {
        pv2[idx-1].v = pv[idx];
        pv2[idx-1].i = idx;
    }
    ::qsort( pv2, n, sizeof(SD), sd_cmp );
    for(idx=0; idx<n; ) {
        for(hi=idx+1; hi<n; hi++)
            if(pv2[idx].v != pv2[hi].v) break;
        double v = double(sumi2j(idx+1,hi))/double(hi-idx);
        for(jdx=idx; jdx<hi; jdx++) pv2[jdx].v = v;
        idx = hi;
    }
    for(idx=0; idx<n; idx++) pv[ pv2[idx].i ] = pv2[idx].v;
    delete [] pv2;
    return 1;
}

int SegRead::xtranorm( Wvfm& bdproc, Wvfm const& raw, int nscanl, int pass2 )
{
    ShftVect& sv = nsv();
    int lane, sdx;
    double gain = 0.0;
    if(!pass2) {
        double bandFreq[5];
        Wvfm rawcopy( raw );
        Wvfm rdycopy( bdproc );
        if(1 != rdycopy.unstop()) return 0;
        sv = mcalign( rdycopy, sv );
        if(0 == pmb_->remune()) { ShftVect noshft; sv = noshft; }
        pmb_->minFreq_ = 1.0;
        rawcopy.sort();
        bandFreq[0] = pmb_->bandAmpl_[0] = 0.0;
        for(lane=1; lane<=4; lane++)
            pmb_->bandAmpl_[ lane ] = rawcopy.sc_la( 2000, lane );
        rawcopy = raw;
        rawcopy.envelope( sv );
        for(lane=1; lane<=4; lane++) bandFreq[ lane ] = 0.0;
        for(sdx=1+sv.maxshft(); sdx<=nscanl; sdx++)
            bandFreq[ rawcopy.envi( sdx ) ] += 1.0;
        for(lane=1; lane<=4; lane++) {
            bandFreq[ lane ] /= double(nscanl-sv.maxshft()+1);
            if(bandFreq[lane] < pmb_->minFreq_)
                pmb_->minFreq_ = bandFreq[ pmb_->minFreqLane_ = lane ];
        }
        if(1 != ::rankordr( pmb_->bandAmpl_, 4 )) return 0;
    }
    if(pmb_->minFreq_ < 0.15) {
        gain = (pmb_->bandAmpl_[pmb_->minFreqLane_] < 2.0)? 0.5: 0.75;
        int NS = bdproc.endi();
        for(sdx=bdproc.bgni(); sdx<NS; sdx++)
            bdproc.sc_la_mul( sdx, pmb_->minFreqLane_, gain );
    }
    return 1;
}
While the present invention has been described and illustrated in conjunction with a number of specific embodiments, those skilled in the art will appreciate that variations and modifications may be made without departing from the principles of the invention as herein illustrated and described.
The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are to be considered in all respects as illustrative, and not restrictive.
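By way of illustration only, the peak and trough scan performed by `fsmPkdet` in the listing above can be sketched as a small self-contained routine. This is a simplified sketch, not the listed implementation: it assumes 0-based `std::vector` storage instead of the listing's 1-based arrays, it omits the final trimming of the first and last peak, and the names `PeakTrough` and `findPeaksAndTroughs` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: sample indices of local maxima and minima.
struct PeakTrough {
    std::vector<std::size_t> peaks;
    std::vector<std::size_t> troughs;
};

// Single pass over the signal with three states (unknown, rising, falling).
// A rising-to-falling transition records a peak at the previous sample;
// a falling-to-rising transition records a trough at the previous sample.
PeakTrough findPeaksAndTroughs(const std::vector<double>& sig)
{
    enum State { Unknown, Up, Down } st = Unknown;
    PeakTrough out;
    if (sig.size() < 2) return out;

    double prev = sig[0];
    for (std::size_t k = 1; k < sig.size(); ++k) {
        const double v = sig[k];
        switch (st) {
        case Unknown:
            if (v > prev) st = Up;
            else if (v < prev) st = Down;
            break;
        case Up:
            if (v < prev) { st = Down; out.peaks.push_back(k - 1); }
            break;
        case Down:
            if (v > prev) { st = Up; out.troughs.push_back(k - 1); }
            break;
        }
        prev = v;
    }
    // Peaks and troughs strictly alternate, so the counts differ by at most one.
    assert(out.peaks.size() <= out.troughs.size() + 1);
    assert(out.troughs.size() <= out.peaks.size() + 1);
    return out;
}
```

In the listing, the input to this scan is the per-scan-line envelope (the maximum over the four lanes), so each detected peak index can then be used to sample all lanes at that scan line.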

V. Claims
1. A method for determining base identity in unprocessed nucleic acid sequencing data, comprising the steps of: receiving unprocessed input data comprising unprocessed nucleic acid sequencing data; preprocessing said input data to generate preprocessed data; blind deconvolving said preprocessed data to generate blind deconvolved data; extranormalizing said blind deconvolved data to generate extranormalized data; detecting peaks in said extranormalized data using peak detection means to generate processed data; editing the quality of said processed data using fuzzy logic editing means to generate called nucleotide sequence and at least one quality value for said called sequence.
2. The method according to claim 1, wherein said preprocessing step further comprises: identifying Begin and End points in said unprocessed data; establishing a baseline in said unprocessed data; subtracting said baseline from said unprocessed data to generate baseline-subtracted data; separating said baseline-subtracted data to generate preprocessed data, said separating step comprising spectral or leakage separation.
3. The method according to claim 1, wherein said extranormalizing step further comprises: correcting the relative mobility of signals in said blind deconvolved data using Monte Carlo alignment means.
4. The method according to claim 3, wherein said extranormalizing step further comprises: attenuating signals which were accentuated by said blind deconvolving.
5. The method according to claim 1, wherein said peak detection means comprises fuzzy logic insertion detection means to identify and remove putative insertions in said extranormalized data; and fuzzy logic gap checking means to identify putative gaps in said extranormalized data and inserting data in said gaps.
6. The method according to claim 5, further comprising: analysing said extranormalized data with said fuzzy logic insertion detection means before and after analyzing said extranormalized data with said fuzzy logic gap checking means.
7. The method according to claim 1, wherein said editing means generates at least one quality value by analyzing characteristics of said processed data selected from the group consisting of band height, band width, band shape, band's left gap, band's right gap, cross-banding and baseline buzz.
8. The method according to claim 7, wherein said editing means generates at least one quality value from said characteristics of said processed data by applying a plurality of fuzzy logic rules.
9. The method of claim 1, wherein said blind deconvolving is iterative and includes at least a first narrow-band guess for the filter band width value and a refined second band for the filter band width value.
10. A method for identifying DNA sequence in unprocessed nucleic acid sequencing data, comprising the steps of: receiving unprocessed input data comprising unprocessed nucleic acid sequencing data; preprocessing said input data to generate preprocessed data; blind deconvolving said preprocessed data to generate blind deconvolved data; extranormalizing said blind deconvolved data to generate extranormalized data; detecting peaks in said extranormalized data to generate peak detected-data; identifying and removing insertions in said peak detected-data using a fuzzy logic insertion detection algorithm; identifying and filling gaps in said peak detected-data using a fuzzy logic gap checking algorithm; and producing processed sequence data.
11. The method of claim 10, further comprising: editing the quality of said processed sequence data using fuzzy logic editing means to generate called nucleotide sequence and at least one quality value for said called sequence.
12. The method according to claim 10, wherein said preprocessing step further comprises: identifying Begin and End points in said unprocessed data; establishing a baseline in said unprocessed data; subtracting said baseline from said unprocessed data to generate baseline-subtracted data; separating said baseline-subtracted data to generate preprocessed data, said separating step comprising spectral or leakage separation.
13. The method according to claim 10, wherein said extranormalizing step further comprises: correcting the relative mobility of signals in said blind deconvolved data using a Monte Carlo alignment.
14. The method according to claim 13, wherein said extranormalizing step further comprises: attenuating signals accentuated by blind deconvolving.
15. The method according to claim 10, further comprising: analyzing said extranormalized data with said fuzzy logic insertion detection algorithm before and after analyzing said extranormalized data with said fuzzy logic gap checking algorithm.
16. The method according to claim 11, wherein said editing means generates at least one quality value by analyzing characteristics of said processed data selected from the group consisting of band height, band width, band shape, band's left gap, band's right gap, cross-banding and baseline buzz.
17. The method according to claim 16, wherein said editing means generates at least one quality value from said characteristics of said processed data by applying a plurality of fuzzy logic rules.
18. The method of claim 10, wherein said blind deconvolving is iterative and includes at least a first narrow-band guess for the filter band width value and a refined, second guess for the filter band width value.
19. A method of determining a nucleotide sequence of a DNA molecule comprising: providing a set of lane signals encoding the migration pattern of a DNA molecule subjected to DNA sequence analysis to generate an input data; preprocessing said input data to generate preprocessed data, said preprocessing comprising at least one of the following steps: identifying Begin and End points, subtracting baseline noise, spectrally separating said input data using a separation matrix to correct for spectral cross-talk, and leakage separating said input data using a separation matrix to correct for lane leakage; blind deconvolving said preprocessed data to generate blind deconvolved data, said blind deconvolving deblurring signals in said preprocessed data and normalizing signal amplitudes, said blind deconvolving using an iterative filter band width algorithm; extranormalizing said blind deconvolved data to generate extranormalized data, said extranormalizing including at least one of the following steps: correcting relative signal mobility differences using a Monte Carlo alignment, and attenuating signals accentuated by blind deconvolution; detecting peaks in said extranormalized data to generate peak detected-data; identifying and removing insertions in said peak-detected data using fuzzy logic insertion detection algorithm; identifying and filling gaps in said peak-detected data using fuzzy logic gap checking algorithm; producing processed sequence data.
20. A digital computer system programmed to perform a method for identifying DNA sequence in unprocessed nucleic acid sequencing data, said digital computer system including: a central processing unit, dynamic memory, and means for outputting data, the method comprising: receiving unprocessed input data comprising unprocessed nucleic acid sequencing data; preprocessing said input data to generate preprocessed data; blind deconvolving said preprocessed data to generate blind deconvolved data; extranormalizing said blind deconvolved data to generate extranormalized data; detecting peaks in said extranormalized data to generate peak detected-data; identifying and removing insertions in said peak detected-data using a fuzzy logic insertion detection algorithm; identifying and filling gaps in said peak detected-data using a fuzzy logic gap checking algorithm; producing processed sequence data; and editing the quality of said processed data using fuzzy logic editing means to generate called nucleotide sequence and at least one quality value for said called sequence.
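The separation step recited in the claims (applying a separation matrix to correct spectral cross-talk or lane leakage) may be illustrated by the following minimal sketch. It assumes that each scan line is a vector with one intensity per lane and that results falling below zero are clamped to zero, which mirrors what `Wvfm::specSep` does in the source listing. The names `Row` and `applySeparation`, and the use of `std::vector`, are hypothetical and are not part of the listed source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;

// Multiply every scan line (one value per lane) by a square separation
// matrix and clamp negative results to zero.
void applySeparation(std::vector<Row>& scans, const std::vector<Row>& sep)
{
    const std::size_t lanes = sep.size();
    for (const Row& r : sep) assert(r.size() == lanes);  // matrix must be square

    for (Row& scan : scans) {
        assert(scan.size() == lanes);
        Row out(lanes, 0.0);
        for (std::size_t r = 0; r < lanes; ++r) {
            double sum = 0.0;
            for (std::size_t c = 0; c < lanes; ++c)
                sum += sep[r][c] * scan[c];
            out[r] = (sum < 0.0) ? 0.0 : sum;  // negative intensities are not meaningful
        }
        scan = out;
    }
}
```

With an identity matrix the data pass through unchanged; an off-diagonal entry such as -0.5 in row 1, column 2 subtracts half of lane 2's signal from lane 1, which is how cross-talk from one lane into another would be removed under these assumptions.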
PCT/US1997/016933 1996-09-16 1997-09-16 Method and apparatus for analysis of chromatographic migration patterns WO1998011258A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU45882/97A AU4588297A (en) 1996-09-16 1997-09-16 Method and apparatus for analysis of chromatographic migration patterns
JP10514017A JP2001502165A (en) 1996-09-16 1997-09-16 Chromatographic electrophoresis pattern analysis method and apparatus
EP97944372A EP0944739A4 (en) 1996-09-16 1997-09-16 Method and apparatus for analysis of chromatographic migration patterns

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US2524196P 1996-09-16 1996-09-16
US60/025,241 1996-09-16

Publications (1)

Publication Number Publication Date
WO1998011258A1 true WO1998011258A1 (en) 1998-03-19

Family

ID=21824887

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/016933 WO1998011258A1 (en) 1996-09-16 1997-09-16 Method and apparatus for analysis of chromatographic migration patterns

Country Status (5)

Country Link
US (1) US6208941B1 (en)
EP (1) EP0944739A4 (en)
JP (1) JP2001502165A (en)
AU (1) AU4588297A (en)
WO (1) WO1998011258A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000000637A2 (en) * 1998-06-26 2000-01-06 Visible Genetics Inc. Method for sequencing nucleic acids with reduced errors
US6303303B1 (en) 1995-06-30 2001-10-16 Visible Genetics Inc Method and system for DNA sequence determination and mutation detection
WO2003029487A2 (en) * 2001-10-04 2003-04-10 Scientific Generics Limited Dna sequencer
US6554987B1 (en) 1996-06-27 2003-04-29 Visible Genetics Inc. Method and apparatus for alignment of signals for use in DNA base-calling
US7593819B2 (en) 2001-07-11 2009-09-22 Applied Biosystems, Llc Internal calibration standards for electrophoretic analyses

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6760668B1 (en) * 2000-03-24 2004-07-06 Bayer Healthcare Llc Method for alignment of DNA sequences with enhanced accuracy and read length
AU2001283299A1 (en) * 2000-08-14 2002-02-25 Incyte Genomics, Inc. Basecalling system and protocol
US6598013B1 (en) * 2001-07-31 2003-07-22 University Of Maine Method for reducing cross-talk within DNA data
US7222059B2 (en) * 2001-11-15 2007-05-22 Siemens Medical Solutions Diagnostics Electrophoretic trace simulator
DE10315581B4 (en) * 2003-04-05 2007-06-28 Agilent Technologies, Inc. (n.d.Ges.d.Staates Delaware), Palo Alto Method for quality determination of RNA samples
JP3978193B2 (en) * 2004-03-15 2007-09-19 ジーイー・メディカル・システムズ・グローバル・テクノロジー・カンパニー・エルエルシー Crosstalk correction method and X-ray CT apparatus
US8945361B2 (en) * 2005-09-20 2015-02-03 ProteinSimple Electrophoresis standards, methods and kits
WO2008006201A1 (en) 2006-07-10 2008-01-17 Convergent Bioscience Ltd. Method and apparatus for precise selection and extraction of a focused component in isoelectric focusing performed in micro-channels
US10107782B2 (en) * 2008-01-25 2018-10-23 ProteinSimple Method to perform limited two dimensional separation of proteins and other biologicals
US9330148B2 (en) * 2011-06-30 2016-05-03 International Business Machines Corporation Adapting data quality rules based upon user application requirements

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5273632A (en) * 1992-11-19 1993-12-28 University Of Utah Research Foundation Methods and apparatus for analysis of chromatographic migration patterns
US5365455A (en) * 1991-09-20 1994-11-15 Vanderbilt University Method and apparatus for automatic nucleic acid sequence determination

Family Cites Families (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4888695A (en) 1983-01-08 1989-12-19 Fuji Photo Film Co., Ltd. Signal processing method in autoradiography
US4868749A (en) 1983-01-08 1989-09-19 Fuji Photo Film Co., Ltd. Signal processing method in autoradiography
US4837687A (en) 1985-03-27 1989-06-06 Fuji Photo Film Co. Ltd. Method for analyzing an autoradiograph
US4941092A (en) 1985-05-23 1990-07-10 Fuji Photo Film Co., Ltd. Signal processing method for determining base sequence of nucleic acid
JPS6285861A (en) 1985-10-11 1987-04-20 Fuji Photo Film Co Ltd Signal processing method for determining base sequence of nucleic acid
JPS6285862A (en) 1985-10-11 1987-04-20 Fuji Photo Film Co Ltd Signal processing method for determining base sequence of nucleic acid
EP0240729A3 (en) 1986-03-05 1988-08-24 Fuji Photo Film Co., Ltd. Method of analyzing autoradiograph for determining base sequence of nucleic acid
US4885696A (en) 1986-03-26 1989-12-05 Fuji Photo Film Co., Ltd. Signal processing method for determining base sequence of nucleic acid
JPS62228165A (en) 1986-03-29 1987-10-07 Fuji Photo Film Co Ltd Signal processing method for determining base sequence of nucleic acid
JPS63167290A (en) 1986-12-27 1988-07-11 Fuji Photo Film Co Ltd Signal processing method for autoradiographic analysis
JPH0664057B2 (en) 1987-01-06 1994-08-22 富士写真フイルム株式会社 Signal processing method for autoradiographic analysis
WO1991016675A1 (en) 1990-04-06 1991-10-31 Applied Biosystems, Inc. Automated molecular biology laboratory
US5119316A (en) 1990-06-29 1992-06-02 E. I. Du Pont De Nemours And Company Method for determining dna sequences
US5218529A (en) 1990-07-30 1993-06-08 University Of Georgia Research Foundation, Inc. Neural network system and methods for analysis of organic materials and structures using spectral data
US5098536A (en) 1991-02-01 1992-03-24 Beckman Instruments, Inc. Method of improving signal-to-noise in electropherogram
US5888819A (en) * 1991-03-05 1999-03-30 Molecular Tool, Inc. Method for determining nucleotide identity through primer extension
US5419825A (en) 1991-07-29 1995-05-30 Shimadzu Corporation Base sequencing apparatus
US5502773A (en) * 1991-09-20 1996-03-26 Vanderbilt University Method and apparatus for automated processing of DNA sequence data
US5379420A (en) 1991-12-26 1995-01-03 Trw Inc. High-speed data searching apparatus and method capable of operation in retrospective and dissemination modes
US5400249A (en) 1992-03-27 1995-03-21 University Of Iowa Research Foundation Apparatus for assessing relatedness from autoradiograms
US5329461A (en) 1992-07-23 1994-07-12 Acrogen, Inc. Digital analyte detection system
WO1995005458A1 (en) 1993-08-12 1995-02-23 Perlin Mark W A system and method for producing maps and cloning genes therefrom
US5580728A (en) 1994-06-17 1996-12-03 Perlin; Mark W. Method and system for genotyping
DE4428658A1 (en) * 1994-08-12 1996-02-15 Siemens Ag Method for recognizing signals using fuzzy classification
US5741462A (en) * 1995-04-25 1998-04-21 Irori Remotely programmable matrices with memories
US5867402A (en) * 1995-06-23 1999-02-02 The United States Of America As Represented By The Department Of Health And Human Services Computational analysis of nucleic acid information defines binding sites
US5604100A (en) 1995-07-19 1997-02-18 Perlin; Mark W. Method and system for sequencing genomes

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5365455A (en) * 1991-09-20 1994-11-15 Vanderbilt University Method and apparatus for automatic nucleic acid sequence determination
US5273632A (en) * 1992-11-19 1993-12-28 University Of Utah Research Foundation Methods and apparatus for analysis of chromatographic migration patterns

Non-Patent Citations (1)

Title
See also references of EP0944739A4 *

Cited By (9)

Publication number Priority date Publication date Assignee Title
US6303303B1 (en) 1995-06-30 2001-10-16 Visible Genetics Inc Method and system for DNA sequence determination and mutation detection
US6554987B1 (en) 1996-06-27 2003-04-29 Visible Genetics Inc. Method and apparatus for alignment of signals for use in DNA base-calling
WO2000000637A2 (en) * 1998-06-26 2000-01-06 Visible Genetics Inc. Method for sequencing nucleic acids with reduced errors
WO2000000637A3 (en) * 1998-06-26 2000-02-17 Visible Genetics Inc Method for sequencing nucleic acids with reduced errors
US6404907B1 (en) 1998-06-26 2002-06-11 Visible Genetics Inc. Method for sequencing nucleic acids with reduced errors
US7593819B2 (en) 2001-07-11 2009-09-22 Applied Biosystems, Llc Internal calibration standards for electrophoretic analyses
US8268558B2 (en) 2001-07-11 2012-09-18 Applied Biosystems, Llc Internal calibration standards for electrophoretic analyses
WO2003029487A2 (en) * 2001-10-04 2003-04-10 Scientific Generics Limited DNA sequencer
WO2003029487A3 (en) * 2001-10-04 2004-05-27 Scient Generics Ltd DNA sequencer

Also Published As

Publication number Publication date
JP2001502165A (en) 2001-02-20
EP0944739A1 (en) 1999-09-29
US6208941B1 (en) 2001-03-27
EP0944739A4 (en) 2000-01-05
AU4588297A (en) 1998-04-02

Similar Documents

Publication Publication Date Title
WO1998011258A1 (en) Method and apparatus for analysis of chromatographic migration patterns
US5273632A (en) Methods and apparatus for analysis of chromatographic migration patterns
US6554987B1 (en) Method and apparatus for alignment of signals for use in DNA base-calling
US5853979A (en) Method and system for DNA sequence determination and mutation detection with reference to a standard
US7384525B2 (en) Electrophoretic analysis system having in-situ calibration
US20070172828A1 (en) Genetic algorithms for optimization of genomics-based medical diagnostic tests
EP1456800A2 (en) A system and method for consensus-calling with per-base quality values for sample assemblies
EP3090381A1 (en) Systems and methods for spectral unmixing of microscopic images using pixel grouping
CN111755067A (en) Screening method of tumor neoantigen
CN109920480B (en) Method and device for correcting high-throughput sequencing data
CN114530199A (en) Method and device for detecting low-frequency mutation based on double sequencing data and storage medium
US20160357902A1 (en) Apparatus, Method, and System for Creating Phylogenetic Tree
US20160098517A1 (en) Apparatus and method for detecting internal tandem duplication
CN110729025B (en) Paraffin section sample somatic mutation detection method and device based on second-generation sequencing
Walther et al. Basecalling with LifeTrace
US20020147548A1 (en) Basecalling system and protocol
CN105420374B (en) A kind of induction myeloid-lymphoid stem cell applies mutation detection methods early period
CN114005489B (en) Analysis method and device for detecting point mutation based on third-generation sequencing data
KR20210083208A (en) Methods and compositions for detection of somatic variations
Nelson Improving DNA sequencing accuracy and throughput
CN114913918A (en) High-throughput sequencing data analysis method and device for autism
CN112599251B (en) Construction method of disease screening model, disease screening model and screening device
Fjalldal et al. Automated genotyping: combining neural networks and decision trees to perform robust allele calling
CN116543835B (en) Method and device for detecting microsatellite state of plasma sample
WO2003029487A2 (en) DNA sequencer

Legal Events

Date Code Title Description
AK Designated states; Kind code of ref document: A1; Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM
AL Designated countries for regional patents; Kind code of ref document: A1; Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
ENP Entry into the national phase; Ref country code: JP; Ref document number: 1998 514017; Kind code of ref document: A; Format of ref document f/p: F
WWE WIPO information: entry into national phase; Ref document number: 1997944372; Country of ref document: EP
REG Reference to national code; Ref country code: DE; Ref legal event code: 8642
WWP WIPO information: published in national office; Ref document number: 1997944372; Country of ref document: EP
NENP Non-entry into the national phase; Ref country code: CA
WWW WIPO information: withdrawn in national office; Ref document number: 1997944372; Country of ref document: EP