WO2003069534A2 - Analysis and management of molecular data and sequences - Google Patents

Analysis and management of molecular data and sequences

Info

Publication number
WO2003069534A2
Authority
WO
WIPO (PCT)
Prior art keywords
computer system
data set
user
central computer
database
Application number
PCT/EP2003/001586
Other languages
French (fr)
Other versions
WO2003069534A3 (en)
Inventor
Stefan Emler
Original Assignee
Smartgene Gmbh
Application filed by Smartgene Gmbh
Priority to AU2003210293A (patent AU2003210293A1)
Priority to DE20316651U (patent DE20316651U1)
Priority to EP03739496A (patent EP1479027A2)
Publication of WO2003069534A2
Publication of WO2003069534A3

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • the present invention relates to the field of analysis and management of molecular data and DNA/RNA/protein sequences.
  • This treatment is in general started during early stages of the infection, in order to prevent or reverse immunodeficiency caused by HIV.
  • a "cocktail" of several drugs with different active components is prescribed. The drugs are administered every day; the overall duration of the treatment depends on its efficiency and has not yet been defined. Parameters used to monitor the treatment are the general status of the patient, the viral load in blood, the CD4 cell count and, because of side effects, liver and other parameters.
  • While anti-HIV treatment can be very successful, the virus itself has a great ability to mutate and can convert into a drug-resistant genotype simply by changing the genes targeted by the respective drug.
  • In most cases, the Reverse Transcriptase or the Protease is targeted by drugs; however, new drugs for other targets are now appearing on the market. Mutations of the target gene sequences can occur under treatment pressure and would then render the virus isolate resistant to the drug used. As mutation of HIV into a resistant variant may be followed by an increase in viral replication, and therefore by a decrease in the patient's immune defense, and may lead to a fatal outcome, doctors carefully monitor treatment with the parameters mentioned above, together with the viral resistance profile obtained by genotyping.
  • HIV genotyping is becoming an increasingly important tool in HIV patient care, and physicians all over the world therefore order it as a routine test, performed by experienced hospital and privately owned labs. Handling genotyping data, however, is quite cumbersome for the laboratory and for the connected physicians, who normally lack an adequate IT infrastructure for sequence management and for handling complex data and analyses.
  • the object of the present invention is to provide solutions enhancing analysis of molecular data and sequences.
  • the present invention provides an environment according to claim 1 and a method according to claim 12.
  • Fig. 1 illustrates a functional arrangement of the system according to the present invention
  • Figs. 2 to 19 illustrate exemplary graphical user interfaces of the environment according to the present invention.
  • In the following description of preferred embodiments, reference is made to HIV genotyping and resistance analysis. It has to be noted that these embodiments, in particular software features, hardware features, related configurations and implementations, the graphical user interfaces shown, and the sequences, mutations, targets and molecular structures presented, are for illustrative purposes only and are not intended to limit the present invention in any way. Further, as an illustrative example, reference is made to the Integrated Database Network System IDNS™ for HIV genotyping without intending any limitation of the present invention.
  • IDNS™, the Integrated Database Network System, provides the following features:
  • the IDNS is a service for data management and data analysis provided through the Web. Its backbone consists of three basic modules: a server-based SQL database, a user-management module and an application-defining module. Through these three modules, the IDNS can provide an application (disease)-specific platform to any user worldwide; the platform can be specifically adapted to the customer's requirements, and passwords protect the access and restrict it to the customer's own data (user management). Via flags set on data sets, the IDNS can enable customers to network together, share selected data and communicate online.
  • the IDNS is an ideal tool for multi-center studies, data gathering, long-distance collaboration between research centers, while providing each participant with the specific tools and data formats required.
  • the IDNS also provides reference databases derived from proprietary or public datasets.
  • the Integrated Database Network System is a database service with a web-based user interface. It is accessed through the Internet via personalized passwords. The transmission of the login and of the password is encrypted, and passwords can be changed regularly. Login, access time, duration, actions and, if required, the IP address of the computer from which the access came are all recorded for a desired or given period of time, e.g. up to 1 year.
  • a user-specific login and a personal password permit direct access to all data platforms of the IDNS™ system for which the user has access authorization.
  • the personal password delivers maximum security by utilizing encryption technology. This password is never to be shared. A standard license agreement can include, e.g., up to three users with personal passwords and can be expanded to accommodate more users.
  • the IDNS™ system registers every access made by a user.
  • the activities, modifications or analyses performed by the user during the session are recorded and stored for 1 year.
  • the access computer can be traced, and IP addresses of servers or access computers can be recorded as well.
  • access to the IDNS™ can be limited to certain computers if the user requires higher standards of security.
  • the secured access of IDNS™ makes it possible to generate afterwards a complete documentation of the steps taken to obtain a result; this is indispensable for quality management or when submitting study results obtained with IDNS™ to supervising authorities, such as the FDA, where a record of progress is required.
  • Besides security, personal passwords in the IDNS™ system are also intended to grant access authorization that is tailored specifically to the user's role and level of expertise.
  • a lab technician, for example, would be allowed to add and analyze sequences, but would not be permitted to modify or delete sequences, in order to avoid errors.
  • An epidemiologist would be entitled to analyze existing data statistically, but could not modify them.
  • the head of the project and other authorized personnel are able to add, modify and delete entries without restriction. Levels of access to certain fields can be adapted to fit specific needs of the research project and of the staff involved.
  • IDNS™ makes it possible to grant outside collaborators access to shared study data, while access to other, non-shared lab data remains restricted to authorized internal lab staff.
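The role-dependent access rights described above (lab technician, epidemiologist, head of project) can be sketched as a small permission table. This is a minimal illustration only: the names `Permission`, `ROLE_PERMISSIONS` and `is_allowed` are hypothetical and do not come from the IDNS software, and the mapping simply restates the three example roles given in the text.

```python
from enum import Flag, auto


class Permission(Flag):
    """Actions a user may perform on sample data (hypothetical names)."""
    ADD = auto()
    ANALYZE = auto()
    MODIFY = auto()
    DELETE = auto()


# Mapping taken from the three example roles in the text; role keys are assumptions.
ROLE_PERMISSIONS = {
    "lab_technician": Permission.ADD | Permission.ANALYZE,
    "epidemiologist": Permission.ANALYZE,
    "project_head": (Permission.ADD | Permission.ANALYZE
                     | Permission.MODIFY | Permission.DELETE),
}


def is_allowed(role: str, action: Permission) -> bool:
    """Return True if the role includes the requested action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, Permission(0))
```

With this table, `is_allowed("lab_technician", Permission.DELETE)` is false while `is_allowed("project_head", Permission.DELETE)` is true, which mirrors the restriction that technicians may add and analyze but not modify or delete.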
  • Access to the IDNS™ over the Internet is guaranteed 24 hours a day, 7 days a week, and users can access it easily and conveniently from any computer that is connected to the Internet. It is moreover possible to work at several computers in an institute simultaneously, thus avoiding waiting lists for analyses to be carried out by different persons. This creates a more efficient means of acquiring data in a timely manner. Accesses are not limited in number or duration; this enables users to manage their data conveniently and without restriction, and renders the IDNS™ system particularly convenient and cost-efficient.
  • After a given number of consecutive failed login attempts (e.g. three), IDNS™ automatically blocks all subsequent logins. The user can then have the login reactivated by sending an email to the system operator. This function is designed to prevent an unauthorized user from gaining access to the data platform through repeated attempts.
  • 3 IDNS 3.0 USER MANAGER AND APPLICATION MANAGER
  • the IDNS 3.0 User Manager and Application Manager provide the following features:
  • Each user's access to the IDNS database is controlled by the Application Manager and the User Manager.
  • the Application Manager defines which applications a user can access (HIV, Bacteria 16s, Orthopox, etc.) as well as the tools, reference databases and tool parameters relevant to that particular user.
  • the User Manager defines the access rights, datasets (i.e. sample datasets) available, affiliations, and data sharing rights for each user.
  • Fig. 1 The functional arrangement of the User Manager and Application Manager as regards the database and system users and end user computer system accessing the system is illustrated in Fig. 1.
  • the IDNS 3.0 User Manager is a highly secure, web-based system (see Fig. 1) for managing IDNS 3.0 user rights. With this tool, new IDNS 3.0 databases can be implemented quickly and effectively, existing ones can be modified, and IDNS 3.0 use can be monitored. For initial access to the system, a user interacts with the User Manager entrance illustrated in Fig. 2.
  • the target menu allows the labelling of the IDNS 3.0 reference databases to be controlled, as illustrated by the User Manager Target Screen shown in Fig. 4.
  • 3.4 Data sets menu
  • the various reference and private data sets are managed via the Data Sets Menu shown in Fig. 5.
  • the data sets can be either private or shared, depending on inter-group collaborations.
  • the Data Set Groups Screen allows the various data sets to be grouped together allowing users to access a group of data sets as illustrated in Fig. 6.
  • IDNS 3.0 platforms, reference websites, tools and tool parameters are defined via the Applications Screen (see Fig. 7). Initial settings for the definition of the reference and sample data set access are also included.
  • the companies section contains details of the IDNS 3.0 client companies. To display them to a user, the system uses the User Manager Companies Screen shown in Fig. 8.
  • Definitions of the IDNS 3.0 individual users are stored in the users section and can be displayed via the User Manager User Screen illustrated in Fig. 9. Details include the user's name, login information, contact information, and the company a user belongs to.
  • the User Manager Activity Log Screen shown in Fig. 10 allows User Manager operators to monitor IDNS 3.0 use. Each time the IDNS is accessed, the relevant details, such as the date, the user, the application and the IP address of the computer used, are added to the list on the right-hand side of the screen. The activity log can also be searched, permitting a more precise display of information; for example, searching by company displays only the details relevant to that particular company. This is also illustrated in Fig. 10.
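The activity log described above can be sketched as a list of access records that is searchable by any recorded field. The sketch below is illustrative only: `ActivityEntry`, `ActivityLog` and their field names are hypothetical, and the one-year retention in `purge` is taken from the retention period mentioned earlier in the text rather than from any known implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ActivityEntry:
    """One recorded access (field names are assumptions for illustration)."""
    timestamp: datetime
    user: str
    company: str
    application: str
    ip_address: str


class ActivityLog:
    def __init__(self) -> None:
        self._entries: list[ActivityEntry] = []

    def record(self, entry: ActivityEntry) -> None:
        """Append one access record to the log."""
        self._entries.append(entry)

    def search(self, **criteria: str) -> list[ActivityEntry]:
        """Return entries whose fields equal all given criteria, e.g. company='Lab A'."""
        return [e for e in self._entries
                if all(getattr(e, key) == value for key, value in criteria.items())]

    def purge(self, now: datetime, retention: timedelta = timedelta(days=365)) -> None:
        """Drop entries older than the retention period (1 year assumed)."""
        self._entries = [e for e in self._entries if now - e.timestamp <= retention]
```

Searching with `log.search(company="Lab A")` corresponds to the company filter mentioned in the text; combining criteria, such as company and application, narrows the display further.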
  • the entry page or so-called "home-page" of a user platform is the first Web-page which pops up after the login (see Fig. 11). Its design is kept simple in order to render the access to data and data-management functions easy. Only access to databases and functions that have been requested by the user are shown. The page is therefore not overloaded with unnecessary tools and items. Clear separation of reference data and user-owned sample data renders data-management and access to databases easy and reliable.
  • the design and logical structure of IDNS™ Web pages remain similar for different users and applications; this allows users to switch between different IDNS™ applications and platforms without losing time for adaptation.
  • the platform's top section carries the user's logo and shows his name for the running session (see Fig. 12).
  • the bottom section shows common tools for administrative and communication purposes.
  • the central main menu which allows access to the reference and client databases will be discussed in Chapters 5 and 6.
  • IDNS™ users and laboratory managers can check overall sequence and/or sample entry numbers; these functions give an overview of the respective data platform and of the laboratory's activities.
  • Hyperlinks: clicking the "link" button opens a webpage with customized hyperlinks which are directly accessible to the user. This function allows the user to organize and bookmark important websites and makes these sites quickly and easily accessible. The system operator takes care to update and evaluate those hyperlinks regularly and can integrate new sites on suggestion. Hyperlinks can be adapted and expanded according to platform profile and user needs.
  • This function allows the email addresses of all individual users involved in a project, network or study to be entered and stored, making their contact information available to all project participants. By simply clicking on an email address, the user can contact colleagues outside of his institute and share problems or experiences.
  • With the logout function, the user can log off from the IDNS™ platform when finished with his work. If the user does not log off, the system logs out automatically after a longer period without activity (the time-out can be defined specifically). This is another layer of database security, intended to prevent access by unauthorized personnel.
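The two session safeguards described in the text, blocking logins after a number of consecutive failures and logging out automatically after inactivity, can be sketched together. This is a minimal sketch under stated assumptions: the class `AccountGuard`, its method names, and the 30-minute default time-out are hypothetical; only the example threshold of three failed attempts and the operator-driven reactivation come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountGuard:
    """Tracks failed logins and idle time for one account (hypothetical design)."""
    max_failures: int = 3            # example threshold given in the text
    idle_timeout_s: float = 1800.0   # assumed value; the text says it is configurable
    failures: int = 0
    locked: bool = False
    last_activity: Optional[float] = None

    def login(self, password_ok: bool, now: float) -> bool:
        """Attempt a login; lock the account after too many consecutive failures."""
        if self.locked:
            return False
        if not password_ok:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.locked = True
            return False
        self.failures = 0
        self.last_activity = now
        return True

    def touch(self, now: float) -> bool:
        """Register activity; return False if the session has timed out."""
        if self.last_activity is None or now - self.last_activity > self.idle_timeout_s:
            self.last_activity = None  # automatic log-out
            return False
        self.last_activity = now
        return True

    def reactivate(self) -> None:
        """Unlock the account, as the system operator would after an email request."""
        self.locked = False
        self.failures = 0
```

Passing the current time in as `now` rather than reading a clock inside the class keeps the sketch deterministic; a real deployment would presumably take the time from the server.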
  • the respective logo of the user or his/her institution is displayed together with a hyperlink that directly connects to the respective home page of the institution, if available.
  • the logo of the system operator leads to its homepage.
  • the homepage gives general information on ongoing or completed development with regard to IDNS™ and provides links to customer service.
  • the homepage carries information on the services provided to the customers and presents new software tools which can be integrated on demand into existing IDNS™ platforms.
  • 5 THE BLUE DATABASE - REFERENCE SEQUENCE DATABASE
  • the reference sequence databank in the IDNS™, the "Blue Database", contains reference sequences to which patient or sample sequences can be compared.
  • the NL43 consensus-sequence for HIV is used as a reference sequence as it is validated and updated regularly by different expert panels.
  • the NL43 sequences are updated with regard to the newest literature, and the positions associated with therapy resistance are highlighted in blue, in accordance with advice from expert panels.
  • the reference sequences determine the reading-frame and therefore the exact positions of possible mutations of analyzed sample sequences.
  • Other reference sequences representing regional dominant variants can also be added to the reference databank, after expert validation and may then be used for comparison.
  • New reference databases for new targets can be added at a later stage, thus enabling the user to keep his data-management up-to-date with his scientific proceedings.
  • IDNS™-HIV platforms can vary with regard to specific user requirements and customization; user-specific application platforms show only the analysis tools which have been requested by the user for his specific requirements. Below is a description of typical functions.
  • Search mutations detects and identifies mutations of a specific sample sequence in comparison to the designated reference sequence (see Fig. 13). Any sequence which the user pastes into the "search mutations" field is compared to the reference sequence already stored in the reference database.
  • This function enables experienced users to enter and store new, additional reference sequences, such as regionally dominant virus sub-types.
  • the RT gene sequence and the Protease gene sequence can be entered as separate sequences or as one stretch, plus the therewith-connected information such as origin and particularity. Entering and assignment of reference sequences is restricted to experts with specific access rights. By default, the international reference sequence "NL43" is accessible in the reference database.
  • authorized experts/users can delete reference sequences.
  • HIV drug target sequences can be entered either in a separate manner (RT and Protease) or as a continuous stretch; here, the separate entry is commented as an example.
  • Other sequence targets can be added, such as gp41, p17, etc.
  • the IDNS™-HIV platform can handle more than 8 different sequence targets; this keeps it flexible for the ongoing evolution in drug resistance monitoring, patient care and other aspects of clinical HIV research.
  • 1.1.1 "Reference nb"/"Sequence date"
  • a publicly available reference sequence from Genbank, EMBL, or from other public databases comes with its accession number under which the sequence has been published. When a lab-internal reference sequence is entered, this will be the tag given by the laboratory, plus the date of entry registration.
  • Determination and “Last Update” indicate the origin of the reference sequence (e.g. sequence derived from isolate XY West- Africa) and the date of the last update.
  • Source is where the reference sequence entry originates from: e.g. laboratory XY or expert panel/publication/journal.
  • The RT sequence of HIV encodes the Reverse Transcriptase enzyme of HIV.
  • RT transcribes viral RNA into DNA after the entry of the virus into the cell. The transcription renders the viral genome compatible with the host DNA and permits integration into the host genome.
  • This enzyme is retrovirus-specific and is therefore a preferred target of many antiviral drugs (RT inhibitors, nucleoside analogs).
  • the sequence can be pasted along with its associate information into the respective fields and will then be recorded when quitting the site.
  • The Protease (PR) sequence of HIV encodes the HIV protease enzyme. This enzyme cleaves the HIV proteins after their production within the host cell and thus allows the assembly of infectious virus particles.
  • Reference databases for all sequence targets can be designed and can integrate published or customer-owned sequences.
  • Regular, automated updates of reference databases with published data from public databases guarantee an up-to-date quality standard of the sequence analysis procedure and considerably reduce the workload of the laboratory staff.
  • Customer-owned reference sequences would be an integral part of the respective reference database but will not be shared with other customers, unless the submitting laboratory decides otherwise.
  • the "Red database" is the customer-created sample database.
  • each laboratory has its own databases, which are freely accessible to its lab personnel and inaccessible to other IDNS™ users. Upon request, this database or subsets of it ("study database") can be connected to other laboratories' databases and can be integrated into a "collector" database for multi-center collaborations.
  • the sample database stores the sample data and patient sequences that are produced in the laboratory and submitted to the IDNS™.
  • Patient names, patient addresses or other data which could be used to identify patients are not stored in the IDNS™; instead, a common key number creates a link to the internal data system of the hospital or laboratory.
  • the user is provided with individually adapted and application-optimized functions, designed to analyze the data, the sequences, to organize the database, to export data and to inform other users/collaborators on results etc.
  • the sample database fulfills the function of a data archive; parts of this database can be shared with other institutes or laboratories.
  • mutations of the patient sample can be obtained quickly by comparing the patient sequence to NL43.
  • “Quick search mutations” see Fig. 14
  • the user can search for any sequence of an HIV-isolate with the sample- or patient-number and automatically compare them to the respectively active reference sequence (see reference “Blue Database”).
  • the function "Quick search mutations" examines the RT and Protease sequences of a sample separately. The selected sequence is aligned with the reference sequence, and any mutations, deletions and insertions are identified. These variations are recorded automatically (optional) or by the user and are translated, according to the reading frame, into amino acids with the respective amino acid positions. The result is displayed on a Clipboard. IUPAC ambiguity codes at nucleotide positions are also recognized, and the corresponding alternative amino acids are indicated, separated by "/".
  • the function "Quick search mutations" reduces the once laborious preparation of a sequence to less than 1 minute, with utmost reliability (mutations that are already stored are recognized, no mutation is overlooked, and positions known to be associated with resistance are underlined in blue) and with total flexibility (the user selects the relevant mutations). Once all mutations are recognized and itemized in the Clipboard, they can be stored directly in the respective sample file; at the same time, earlier mutations are deleted, which avoids duplicates and transcription mistakes. Physicians, lab personnel or nurses who do not have knowledge of virus genetics or molecular biology can also use the function "Quick search mutations", which considerably expands the range of users. Common functions found in this section are described below.
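The codon-level comparison behind "Quick search mutations" can be sketched as follows. This is a simplified illustration, not the IDNS implementation: it assumes the sample has already been aligned to the reference in frame and without insertions or deletions, uses the standard genetic code, and reports only amino acid changes (silent nucleotide changes are ignored). The function names `translate_codon` and `find_mutations` are hypothetical. IUPAC ambiguity codes are expanded to every nucleotide they stand for, and the alternative amino acids are joined with "/".

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> list[str]:
    """Return the sorted amino acids that a (possibly ambiguous) codon can encode."""
    options = [IUPAC[base] for base in codon.upper()]
    return sorted({CODON_TABLE["".join(bases)] for bases in product(*options)})


def find_mutations(reference: str, sample: str) -> list[str]:
    """List amino acid changes such as 'M184V' between two aligned, in-frame sequences."""
    if len(reference) != len(sample) or len(reference) % 3 != 0:
        raise ValueError("sequences must be aligned, equal in length and in frame")
    mutations = []
    for start in range(0, len(reference), 3):
        ref_aa = translate_codon(reference[start:start + 3])
        sample_aa = translate_codon(sample[start:start + 3])
        if sample_aa != ref_aa:
            position = start // 3 + 1  # 1-based amino acid position
            mutations.append(f"{'/'.join(ref_aa)}{position}{'/'.join(sample_aa)}")
    return mutations
```

For example, comparing the reference `ATGAAAGTG` with the sample `ATGARAATG` yields `["K2K/R", "V3M"]`: the ambiguity code `R` in the second codon stands for A or G, so that codon encodes lysine or arginine. Handling deletions and insertions, which the text also mentions, would require a real alignment step before this comparison.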
  • search sample here represents the patient tag of the hospital or within a network. It can also be used as "Patient ID number".
  • Another patient ID can also be named laboratory label or study label. All labels can be named and defined (number of positions) according to the customer's requirement. With the "Patient Label", the lab personnel can type in the patient number or unique identifier without revealing patient name and other private data.
  • Patient samples can be sorted by cohort number, sequence date or sample date, and also restricted to a certain time period (see above, sample date from - to). This makes it possible to retrieve samples specifically and to relate samples to one another, e.g. all samples from one patient sorted by date.
  • the search sequence is copied by simple copy/paste into the assigned field, furnished with a reference number and then compared within seconds against the entire sample databank.
  • the sample sequences that are similar to the search sequence are itemized and ordered by degree of similarity and are shown in pairwise alignment.
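Ranking stored sample sequences by similarity to a pasted search sequence can be sketched as below. The text does not say which alignment algorithm is used, so this sketch substitutes the generic similarity ratio from Python's standard `difflib.SequenceMatcher` for a true pairwise sequence alignment; the function name `rank_similar` and the dictionary-based sample store are assumptions for illustration.

```python
from difflib import SequenceMatcher


def rank_similar(query: str, samples: dict[str, str], min_ratio: float = 0.0):
    """Return (sample_id, similarity) pairs sorted from most to least similar.

    The similarity is difflib's ratio in [0, 1], used here as a stand-in for
    an alignment score; it is not a biological alignment.
    """
    scored = []
    for sample_id, sequence in samples.items():
        # autojunk=False avoids difflib's heuristic of ignoring frequent
        # characters, which would distort results for long nucleotide strings.
        ratio = SequenceMatcher(None, query, sequence, autojunk=False).ratio()
        if ratio >= min_ratio:
            scored.append((sample_id, ratio))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A production system would more plausibly use a dedicated alignment method with a nucleotide scoring scheme; the sketch only shows the overall shape of the search: score every stored sequence against the query, then list the hits in order of similarity.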
  • Additional sample sequence serves to enter and store new samples and their sequences in the IDNS™ system (see Fig. 18).
  • New Samples are furnished with a patient number, a laboratory number, and a sample date and sequence date. These parameters can individually be customized and adapted to the laboratory condition. New sequences are copied directly out of the respective sequencing program as a text file and are inserted into the respective window.
  • a further important option is the comparisons of samples to particular studies, which can be freely defined by the user and are implemented by SmartGene on request. Studies create data sub-groups out of assigned inputs of the sample databank, which can be shared again with other laboratories or can be separately managed. This function serves for the compilation of research programs or multi-center studies, without the need for the user to manage data twice. Samples or sequences that are assigned to certain studies will be adapted automatically for all users if the first user who inputted the sequence changes it afterwards.
  • This function permits the user to sort samples and sequences according to certain criteria or to summarize them in groups (see Fig. 19). At the same time, samples can be organized into lists along different criteria, printed out and single samples if necessary can be edited and modified.
  • sample sequences also permits the specific administration of studies as a subgroup of the sample databank. Editing functions, e.g. the modification of samples and sequences, can be restricted to certain user groups; external users can thus also use the databank for clinical research purposes without jeopardizing the data.
  • the "Delete sample sequence" function can be used to delete sample entries together with the associated sequences. This function can be blocked for unauthorized users to prevent unintentional manipulation and deletion of entries.
  • Align sequences is used for the specific comparison of individual sequences, which are itemized individually or in a list via the "Align Sequences" function and are then selected by mouse click, making them ready for alignment.
  • the alignment sequence list can include several lists/pages, and individual sequences can be deleted from it again.
  • All sample data handled on the secured IDNS™ server is automatically backed up on a second, separate hard drive. After 12 hours, a safety back-up copy is generated on a secondary server. Monthly, and optionally on customer request, a CD-ROM with the customer-specific data is produced and sent to the customer. Several other options for increased data protection are available within the IDNS™ system.
  • the IDNS™ system automatically performs back-ups on a second server hard disk on a different computer. Every 24 h, a copy of the complete database system is made to a server in another building, thus avoiding data loss and damage in case of physical destruction of a server. On request, IDNS™ users can get copies of their sample databases on CD-ROM for a minimal fee.
  • the IDNS™ access protection via personalized passwords does not give any one user general access to all data on the IDNS™ server, but restricts his access privileges to the data he is entitled to see. No person outside the system operator's technical staff has broader access privileges; physical access to servers is restricted by housing the servers in locked compartments and by specific passwords for the server maintenance staff.
  • Firewalls to the Swiss academic network at the EPFL (Eidgenössische Technische Hochschule Lausanne/CH) and several firewalls within the system operator's network prevent unauthorized access to the internal network and its server management capabilities.
  • Sample data stored on IDNS™ servers belongs to the client who deposited it; issues of data ownership with regard to patients and third parties are not relevant to the system operator.
  • the system operator can access user data for database maintenance procedures and for internal technical development. SmartGene will not disclose data to third-parties.
  • users can only access and modify their own data; only in the case of a data-collection platform in a multi-center set-up will specifically entitled users from the study monitoring team acquire access to the complete study subsets.
  • IDNS™ can offer the following solutions:

Abstract

The present invention provides an environment for analysis of molecular sequences, comprising a central computer system, at least one end user computer system, a network for data communications between the central computer system and the at least one end user computer system, a database associated to the central computer system and comprising at least one reference molecular sequence or data set, wherein the central computer system is adapted to store at least one sample sequence or data set, communicated from the at least one end user computer system in the database for analysis with respect to the at least one reference molecular sequence or data set.

Description

ANALYSIS AND MANAGEMENT OF MOLECULAR DATA AND SEQUENCES
Field of the Invention
The present invention relates to the field of analysis and management of molecular data and DNA/RNA/protein sequences.
Background of the Invention
Molecular diagnostics based on genetic and genomic analysis of patient genes or genes from microorganisms is a rapidly growing field in today's medicine. As technology makes it possible to run sequence analysis and detection of genetic profiles in an automated and speedy manner, those tests become available for routine medical diagnosis and disease management. However, the technology not only involves tests and reagents, but also produces an increasingly complex body of information which cannot be handled with the usual tools and staff expertise available to laboratories and physicians. Data management and data analysis therefore become the bottleneck of molecular diagnostics in labs and physician offices.
As an example, we may mention infection with HIV, which causes AIDS: once a fatal viral infection, HIV has become a chronic disease since the advent of successful treatment with antiviral drugs. This treatment is in general started during early stages of the infection, in order to prevent or reverse immunodeficiency caused by HIV. For treatment purposes, in general, a "cocktail" of several drugs with different active components is prescribed. The drugs are administered every day; the overall duration of the treatment depends on its efficiency and has not yet been defined. Parameters used to monitor the treatment are the general status of the patient, the viral load in blood, the CD4 cell count and, because of side effects, liver and other parameters.
While anti-HIV treatment can be very successful, the virus by itself has a great ability to mutate and can convert into a drug-resistant genotype by simply changing its target genes for the respective drug.
In most cases, the Reverse Transcriptase or the Protease is targeted by drugs; however, new drugs for other targets are now appearing on the market. Mutations of the target gene sequences can occur under treatment pressure and would then render the virus isolate resistant to the drug used. As mutation of HIV into a resistant variant may be followed by an increase in viral replication, and therefore by a decrease in the patient's immune defense, and may lead to a fatal outcome, doctors carefully monitor treatment with the parameters mentioned above, together with the viral resistance profile obtained by genotyping.
Detection of a potential resistance-encoding mutation helps them to adjust the treatment towards a more efficient individual drug combination. HIV genotyping is becoming an increasingly important tool in HIV patient care, and physicians all over the world therefore order it as a routine test, performed by experienced hospital and privately owned labs. Handling genotyping data, however, is quite cumbersome for the laboratory and for the connected physicians, who normally lack an adequate IT infrastructure for sequence management and for handling complex data and analyses.
Object of the Invention
In general, the object of the present invention is to provide solutions enhancing analysis of molecular data and sequences.
Short Description of the Invention
To solve the above problem, the present invention provides an environment according to claim 1 and a method according to claim 12.
Further solutions and embodiments of the present invention are defined in the further claims.
Short Description of the Figures
In the following description of preferred embodiments of the present invention, reference is made to the accompanying drawings, in which:
Fig. 1 illustrates a functional arrangement of the system according to the present invention, and
Figs. 2 to 19 illustrate exemplary graphical user interfaces of the environment according to the present invention.
Description of preferred Embodiments
In the following description of preferred embodiments, reference is made to HIV genotyping and resistance analysis. It has to be noted that these embodiments, in particular the software features, hardware features, related configurations and implementations, the graphical user interfaces shown, and the sequences, mutations, targets and molecular structures presented, serve illustrative purposes only and are not intended to limit the present invention in any way. Further, as an illustrative example, reference is made to the Integrated Database Network System IDNS™ for HIV genotyping, without intending any limitation of the present invention.
1 GENERAL FEATURES OF IDNS™ FOR HIV
IDNS™, the Integrated Database Network System, provides the following features:
_ Can be implemented by a simple bookmark in a Web browser on any Web-connected computer; no dedicated workstation is required
_ Is a customized, application-adapted data-management service.
_ Manages sequence data of HIV genetic drug targets.
_ Manages other data linked to the sequence targets (e.g. viral load, CD4 cell counts, treatment prescription, virtual phenotype...)
_ Keeps sequences and related data available for comparison to earlier cases, for treatment follow-up (development of resistance under treatment...) and for documentation
_ Allows easy and reliable analysis of sequences for the detection of treatment-relevant mutations
_ Can be extended to other genetic targets, even at a later stage
_ Can handle access from different sites simultaneously
_ Allows networking and data collection/data sharing with other centers
_ Can be interfaced with customer lab programs or with expert programs for treatment advice.
_ Is easy to use and does not require knowledge of bioinformatics, informatics or programming languages. It can be handled by lab technicians for data entry and routine data analysis.
_ Handles back-up, access, access protection and data safety from its central server following individual customer requirements, without involvement of customer staff.
_ Implements updates and upgrades remotely and on demand, without interference on-site
_ Saves working time!
The IDNS is a service for data management and data analysis provided through the Web. Its backbone consists of three basic modules: a server-based SQL database, a user-management module and an application-defining module. Through these three modules, the IDNS can provide an application (disease)-specific platform to any user worldwide; the platform can be specifically adapted to the customer's requirements, and passwords protect the access and restrict it to the customer's own data (user management). Via flags set on data sets, the IDNS enables customers to network together, to share selected data and to communicate online.
The IDNS is an ideal tool for multi-center studies, data gathering and long-distance collaboration between research centers, while providing each participant with the specific tools and data formats required. The IDNS also provides reference databases derived from proprietary or public datasets.
1 LOGIN WEBPAGE: USERNAME/PASSWORD: GATEWAY TO APPLICATION SPECIFIC IDNS™-PLATFORMS
The Integrated Database Network System (IDNS™) is a database service with a web-based user interface. It is accessed through the Internet via personalized passwords. The transmission of the login and of the password is encrypted, and passwords can be changed regularly. Login, access time, duration, actions and, if required, the IP address of the computer from which the access was made are all recorded for a desired or given period of time, e.g. up to 1 year.
1.1 Personalized login and security
A user-specific login and a personal password permit direct access to all data platforms of the IDNS™ system for which the user holds access authorization. The personal password is protected by encryption technology and is never to be shared. A standard license agreement can include, e.g., up to three users with personal passwords and can be expanded to accommodate more users.
The IDNS™ system registers every access made by a user. The activities, modifications and analyses carried out by the user during the session are recorded and stored for 1 year. In addition, the access computer can be traced, and IP addresses of servers or access computers can be recorded as well. Access to the IDNS™ can thus be limited to certain computers if the user requires higher standards of security.
The secured access of the IDNS™ makes it possible to generate, after the fact, a complete documentation of the steps taken to obtain a result. Such documentation is indispensable for quality management and when submitting study results obtained with IDNS™ to supervising authorities, such as the FDA, where a record of progress is required.
Problems and unauthorized activities can be detected and traced, and solutions can be designed in the event that contamination or other factors cause unexpected results.
1.2 Access at different levels of expertise
Besides security, personal passwords in the IDNS™ system are also intended to grant access authorization that is tailored to the user's role and level of expertise. A lab technician, for example, would be allowed to add and analyze sequences, but would not be able to modify or delete sequences, in order to avoid errors. An epidemiologist would be entitled to analyze existing data statistically, but could not modify them. The head of the project and other authorized personnel are able to add, modify and delete entries without restriction. Levels of access to certain fields can be adapted to fit the specific needs of the research project and of the staff involved. Within multi-center studies, IDNS™ makes it possible to grant outside collaborators access to shared study data, while access to other, non-shared lab data is restricted to entitled inside lab staff.
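The role-dependent access rights described above can be illustrated by a minimal sketch. The role names, the `Permission` flags and the `is_allowed` helper are assumptions made for illustration only; the source does not disclose how IDNS™ stores or checks access rights.

```python
from enum import Flag, auto


class Permission(Flag):
    """Hypothetical permission flags, one per action named in the text."""
    NONE = 0
    ADD = auto()
    ANALYZE = auto()
    MODIFY = auto()
    DELETE = auto()


# Assumed mapping of roles to rights, following the three examples in the text.
ROLE_PERMISSIONS = {
    "lab_technician": Permission.ADD | Permission.ANALYZE,
    "epidemiologist": Permission.ANALYZE,
    "project_head": (Permission.ADD | Permission.ANALYZE
                     | Permission.MODIFY | Permission.DELETE),
}


def is_allowed(role: str, action: Permission) -> bool:
    """Return True if the role has been granted the requested action."""
    granted = ROLE_PERMISSIONS.get(role, Permission.NONE)
    return action in granted


if __name__ == "__main__":
    print(is_allowed("lab_technician", Permission.ADD))
    print(is_allowed("epidemiologist", Permission.MODIFY))
```

Unknown roles receive no rights at all in this sketch, which mirrors the restrictive default implied by the text.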
1.3 Accessibility of IDNS™ data platforms
Access to the IDNS™ over the Internet is guaranteed 24 hours a day, 7 days a week, and users can access it easily and conveniently from any computer that is connected to the Internet. It is moreover possible to work at several computers in an institute simultaneously, thus avoiding waiting lists for analyses to be carried out by different persons. This creates a more efficient means of acquiring data in a timely manner. Accesses are not limited in number or duration; this enables users to manage their data conveniently and without restriction, and renders the IDNS™ system particularly convenient and cost-efficient.
1.4 Invalid logins
After a given number of consecutive failed login attempts (e.g. three), IDNS™ will automatically block all subsequent logins. The user then has the possibility to reactivate the login by sending an email to the system operator. This function is designed to prevent an unauthorized user from gaining access to the data platform through repeated attempts.
3 IDNS 3.0 USER MANAGER AND APPLICATION MANAGER
The IDNS 3.0 User Manager and Application Manager provide the following features:
- Each user's access to the IDNS database is controlled by the Application Manager and the User Manager.
- The Application Manager defines which applications a user can access (HIV, Bacteria 16s, Orthopox, etc.) as well as the tools, reference databases and tool parameters relevant to that particular user.
- The User Manager defines the access rights, datasets (i.e. sample datasets) available, affiliations, and data sharing rights for each user.
The functional arrangement of the User Manager and the Application Manager in relation to the database, the system users and the end-user computer systems accessing the system is illustrated in Fig. 1.
3.1 User Manager entrance
The IDNS 3.0 User Manager is a highly secure, web-based system (see Fig. 1) for managing IDNS 3.0 user rights. With this tool, new IDNS 3.0 databases can be implemented quickly and effectively, existing ones can be modified, and IDNS 3.0 use can be monitored. For initial access to the system, a user interacts with the User Manager entrance illustrated in Fig. 2.
3.2 User manager main menu
Once inside the User Manager the various functions are accessed via the User Manager Main Menu shown in Fig. 3.
3.3 Target menu
The target menu allows the labelling of the IDNS 3.0 reference databases to be controlled, as illustrated by the User Manager Target Screen shown in Fig. 4.
3.4 Data sets menu
The various reference and private data sets are managed via the Data Sets Menu shown in Fig. 5. The data sets can be either private or shared, depending on inter-group collaborations.
3.5 Data set groups
The Data Set Groups Screen allows the various data sets to be grouped together allowing users to access a group of data sets as illustrated in Fig. 6.
3.6 Applications
The particular IDNS 3.0 platforms, reference websites, tools and tool parameters are defined via the Applications Screen (see Fig. 7). Initial settings for the definition of the reference and sample data set access are also included.
3.7 Companies
The companies section contains details of the IDNS 3.0 client companies. For display to a user, the system uses the User Manager Companies Screen shown in Fig. 8.
3.8 Users
Definitions of the IDNS 3.0 individual users are stored in the users section and can be displayed via the User Manager User Screen illustrated in Fig. 9. Details include the user's name, login information, contact information, and the company a user belongs to.
3.9 Activity log
The User Manager Activity Log Screen shown in Fig. 10 allows User Manager operatives to monitor IDNS 3.0 use. Each time the IDNS is accessed, the relevant details, such as the date, the user, the application and the IP address of the computer used, are added to the list on the right-hand side of the screen. The activity log can also be searched, permitting a more precise display of information; for example, searching by company displays only the details relevant to that particular company. This is also illustrated in Fig. 10.
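The activity log described above can be pictured as an append-only list of access records that is filtered by field values. The following is a minimal sketch under that assumption; the `AccessRecord` fields follow the details named in the text, while the class names and the `search` signature are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessRecord:
    """One log entry; field names are illustrative, taken from the text."""
    timestamp: datetime
    user: str
    company: str
    application: str
    ip_address: str


class ActivityLog:
    def __init__(self) -> None:
        self._records: list[AccessRecord] = []

    def record(self, entry: AccessRecord) -> None:
        """Append an entry each time the system is accessed."""
        self._records.append(entry)

    def search(self, **criteria: str) -> list[AccessRecord]:
        """Return entries whose fields equal all given criteria."""
        return [
            r for r in self._records
            if all(getattr(r, name) == value for name, value in criteria.items())
        ]


if __name__ == "__main__":
    log = ActivityLog()
    log.record(AccessRecord(datetime(2003, 2, 17, 9, 0), "alice", "Lab A", "HIV", "192.0.2.10"))
    log.record(AccessRecord(datetime(2003, 2, 17, 9, 5), "bob", "Lab B", "HIV", "192.0.2.20"))
    for hit in log.search(company="Lab A"):
        print(hit.user, hit.application, hit.ip_address)
```

Searching by company, as in the example of the text, then corresponds to `log.search(company="Lab A")`.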
1 THE HOME-PAGE AND MAIN MENU OF AN IDNS™-DATA-PLATFORM FOR HIV
The entry page or so-called "home-page" of a user platform is the first Web page which appears after the login (see Fig. 11). Its design is kept simple in order to make access to data and data-management functions easy. Only the databases and functions that have been requested by the user are shown. The page is therefore not overloaded with unnecessary tools and items. Clear separation of reference data and user-owned sample data renders data management and access to databases easy and reliable. The design and logical structure of IDNS™ Web pages remain similar for different users and applications; this allows users to switch between different IDNS™ applications and platforms without losing time for adaptation.
1.1 General layout
The platform's top section carries the user's logo and shows his name for the running session (see Fig. 12). The bottom section shows common tools for administrative and communication purposes. The central main menu which allows access to the reference and client databases will be discussed in Chapters 5 and 6.
1.2 Database information
IDNS™ users and laboratory managers can check the overall numbers of sequence and/or sample entries; these functions give an overview of the respective data platform and of the laboratory's activities.
1.3 Links
Clicking the "link" button opens a webpage with customized hyperlinks which are directly accessible to the user. This function allows the user to organize and bookmark important websites and makes these sites quickly and easily accessible. The system operator takes care to update and evaluate those hyperlinks regularly and can integrate new sites on suggestion. Hyperlinks can be adapted and expanded according to platform profile and user needs.
1.4 E-mails
This function makes it possible to enter and store the email addresses of all individual users involved in a project, network or study and to make their contact information available to all project participants. By simply clicking on an email address, the user can contact colleagues outside of his institute and can share problems or experiences.
1.5 Logout
With the logout function, the user can log off from the IDNS™ platform when finished with his work. If the user does not log off, the system will log out automatically after a longer period without activity (the time-out can be defined specifically). This is another layer of database security and is intended to prevent access by unauthorized personnel.
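The automatic log-out after a period of inactivity can be sketched as follows. The `Session` class, its method names and the example time-out of 30 minutes are assumptions for illustration; the text only states that the time-out can be defined specifically.

```python
from datetime import datetime, timedelta


class Session:
    """Hypothetical session object with a configurable idle time-out."""

    def __init__(self, user: str, timeout: timedelta, now: datetime) -> None:
        self.user = user
        self.timeout = timeout
        self.last_activity = now
        self.active = True

    def touch(self, now: datetime) -> bool:
        """Register activity; return False if the session is no longer valid."""
        if not self.active or now - self.last_activity > self.timeout:
            self.active = False  # automatic log-out after the idle period
            return False
        self.last_activity = now
        return True

    def logout(self) -> None:
        """Explicit log-off by the user."""
        self.active = False


if __name__ == "__main__":
    start = datetime(2003, 2, 17, 9, 0)
    session = Session("alice", timedelta(minutes=30), start)
    print(session.touch(start + timedelta(minutes=10)))  # still within the time-out
    print(session.touch(start + timedelta(minutes=50)))  # 40 idle minutes have passed
```

Passing the current time in explicitly keeps the sketch deterministic; a real service would read the clock itself.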
1.6 Logo
At the right upper screen edge, the respective logo of the user or his/her institution is displayed together with a hyperlink that directly connects to the respective home page of the institution, if available.
1.7 SmartGene IDNS Homepage hyperlink
The logo of the system operator leads to its homepage. The homepage gives general information on ongoing or completed developments with regard to IDNS™ and provides links to customer service.
The homepage carries information on the services provided to the customers and presents new software tools which can be integrated on demand into existing IDNS™ platforms.
5 THE BLUE DATABASE - REFERENCE SEQUENCE DATABASE
The reference sequence databank, called the "Blue Database" in the IDNS™, contains reference sequences to which patient or sample sequences can be compared. In the case of most IDNS™-HIV platforms, the NL43 consensus sequence for HIV is used as a reference sequence, as it is validated and updated regularly by different expert panels. The NL43 sequences are updated with regard to the newest literature, and the positions susceptible to therapy resistance are highlighted with blue coloring, in accordance with advice from expert panels.
The reference sequences determine the reading-frame and therefore the exact positions of possible mutations of analyzed sample sequences. Other reference sequences representing regional dominant variants can also be added to the reference databank, after expert validation and may then be used for comparison.
New reference databases for new targets can be added at a later stage, thus enabling the user to keep his data-management up-to-date with his scientific proceedings.
1.1 Analysis tools for the Reference Sequence-database
Analysis tools within the IDNS™-HIV platforms can vary with regard to specific user requirements and customization; user-specific application platforms show only analysis tools which have been requested by the user for his specific requirements. Below is a description of typical functions.
1.2 "Search mutations"
"Search mutations" will detect and identify mutations of a specific sample sequence, in comparison to the designated reference sequence (see Fig. 13). Any sequence which the user pastes into the "search mutations" field is compared to the reference sequence, already stored in the reference database.
The comparison is presented as a pair-wise alignment in which vertical bars indicate identical positions. Mutations of the sequence are easily recognized, identified (through a mouse click) and automatically translated into the corresponding amino acid together with the amino-acid position. Mutated amino acids and positions can then be stored in the sample sequence file by another mouse click on "store mutations". This function also takes deletions and insertions into account, so that they do not interfere with the correct positioning of mutations.
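The comparison step can be illustrated by a simplified sketch. It assumes that the sample has already been aligned to the reference, that both sequences are in frame and of equal length, and that they contain only the bases A, C, G and T; the handling of insertions and deletions mentioned above is not reproduced. The function names are hypothetical, and the short sequences are made-up examples rather than NL43 data.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def match_line(reference: str, sample: str) -> str:
    """Return a line with '|' under identical positions and ' ' elsewhere."""
    return "".join("|" if r == s else " " for r, s in zip(reference, sample))


def find_mutations(reference: str, sample: str) -> list[str]:
    """List amino-acid changes as <reference aa><codon position><sample aa>."""
    reference, sample = reference.upper(), sample.upper()
    if len(reference) != len(sample) or len(reference) % 3:
        raise ValueError("sequences must be pre-aligned, in frame and of equal length")
    mutations = []
    for pos in range(0, len(reference), 3):
        ref_aa = CODON_TABLE[reference[pos:pos + 3]]
        sample_aa = CODON_TABLE[sample[pos:pos + 3]]
        if ref_aa != sample_aa:  # silent nucleotide changes are not reported
            mutations.append(f"{ref_aa}{pos // 3 + 1}{sample_aa}")
    return mutations


if __name__ == "__main__":
    ref, smp = "ATGACCTGG", "ATGTACTGG"
    print(ref)
    print(match_line(ref, smp))
    print(smp)
    print(find_mutations(ref, smp))
```

In this sketch the second codon changes from ACC (threonine) to TAC (tyrosine), so the mutation is reported as `T2Y`, with the position counted in codons from the start of the reading frame.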
"Search mutations" is completed by the "Quick Search Mutations" tool from the menu of the sample-database, which allows mutation analysis of already registered sample sequences.
1.3 "Add reference sequence"
This function enables experienced users to enter and store new, additional reference sequences, such as regionally dominant virus sub-types. The RT gene sequence and the Protease gene sequence can be entered as separate sequences or as one stretch, together with associated information such as origin and particularities. Entering and assignment of reference sequences is restricted to experts with specific access rights. By default, the international reference sequence "NL43" is accessible in the reference database.
1.4 "Delete reference sequence"
Here, authorized experts/users can delete reference sequences.
1.5 "Edit reference sequence"
By editing a reference sequence file, the experienced user can modify the sequence, introduce new mutation sites, and add or change comments; this function is typically restricted through the login to qualified staff in order to avoid errors in data interpretation (see Fig. 14). HIV drug target sequences can be entered either separately (RT and Protease) or as a continuous stretch; here, the separate entry is described as an example. Other sequence targets, such as gp41, p17..., can be added.
The IDNS™-HIV platform can handle more than 8 different sequence targets; this renders it flexible with respect to the ongoing evolution in drug resistance monitoring, patient care and other aspects of clinical HIV research.
1.1.1 "Reference nb" / "Sequence date"
A publicly available reference sequence from Genbank, EMBL, or from other public databases comes with its accession number under which the sequence has been published. When a lab-internal reference sequence is entered, this will be the tag given by the laboratory, plus the date of entry registration.
1.1.2 "Definition" / "Last update"
"Definition" and "Last Update" indicate the origin of the reference sequence (e.g. sequence derived from isolate XY West- Africa) and the date of the last update.
1.1.3 "Source"
"Source" is where the reference sequence entry originates from: e.g. laboratory XY or expert panel/publication/journal.
1.1.4 "RT sequence"
Reverse transcriptase (RT) sequence of HIV: the gene sequence encoding the Reverse Transcriptase enzyme of HIV. RT transcribes viral RNA into DNA after the entry of the virus into the cell. This reverse transcription renders the viral genome compatible with the host DNA and permits integration into the host genome. The enzyme is retrovirus-specific and is therefore a preferred target of many antiviral drugs (RT inhibitors, nucleoside analogs). The sequence can be pasted along with its associated information into the respective fields and will then be recorded when quitting the site.
1.1.5 "PR sequence"
Protease (PR) sequence of HIV: the gene sequence encoding the HIV protease enzyme. This enzyme cleaves the HIV proteins after their synthesis within the host cell and thus allows the assembly of infectious virus particles. Anti-HIV drugs, the so-called "protease inhibitors", target this enzyme.
1.1.6 "RT and PR mutations"
Known differences (mutations) within the RT and Protease of this particular reference sequence with regard to the NL43 consensus reference sequence; e.g. mutations of a regionally dominant variant with regard to NL43.
1.1.7 "Remarks"
This field is intended for entering remarks and comments on a particular reference sequence. "Free text" can be added explaining the sequences and the originating isolate, or any other relevant information that is important to the sample.
1.1.8 Updates of reference databases, customer reference data
Reference databases for all sequence targets can be designed and can integrate published or customer-owned sequences. Regular, automated updates of reference databases with regard to published data from public databases guarantee an up-to-date quality standard of the sequence analysis procedure and considerably reduce the workload of the laboratory staff. Customer-owned reference sequences are an integral part of the respective reference database but will not be shared with other customers unless the submitting laboratory decides otherwise.
6 THE RED DATABASE - SAMPLE SEQUENCE DATABASE:
The "Red database" is the customer-created sample database. Within the IDNS™ system, each laboratory has its own databases that are freely accessed by its lab personnel and non-accessible to other IDNS™ users. Upon request, this database or subsets of it ("study database") can be connected to other laboratories' databases and can be integrated into a "collector" database for multi-center collaborations.
The sample database stores the sample data and patient sequences that are produced in the laboratory and submitted to the IDNS™. Patient names, patient addresses or other data which may be used to identify patients are not stored in the IDNS™; for this purpose, a link to the hospital's or laboratory's internal data system is created by a common key number. To manage the sample database, the user is provided with individually adapted and application-optimized functions, designed to analyze the data and the sequences, to organize the database, to export data and to inform other users or collaborators of results, etc. The sample database fulfills the function of a data archive, and parts of this database can be shared with other institutes or laboratories. In multi-center studies, data can be collected and disseminated through a centralized location, known as the "switchboard". Those with access to the switchboard (central network node) cannot modify the database, even if the data are in discord. If the inputting user later decides to modify the data, they will be updated automatically at the switchboard and at the other users' sites, and the time of the change will be recorded.
The user decides by a simple mouse click which data are made accessible to other institutes or to the switchboard. This procedure avoids the effort and risk of maintaining two data-management systems, one for the user's own purposes and one for study purposes. These functions are explained in more detail below.
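The sharing behaviour described above can be pictured with a minimal sketch in which each record carries a set of study flags, only the owner may modify a record, and the switchboard sees a read-only view of the flagged records. All class, method and field names are assumptions for illustration; the recording of the change time mentioned in the text is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class SampleRecord:
    owner: str                      # laboratory that entered the record
    lab_label: str
    sequence: str
    shared_with: set[str] = field(default_factory=set)  # study flags


class SampleDatabase:
    def __init__(self) -> None:
        self._records: list[SampleRecord] = []

    def add(self, record: SampleRecord) -> None:
        self._records.append(record)

    def modify(self, user: str, lab_label: str, new_sequence: str) -> None:
        """Only the laboratory that entered a record may change it."""
        for record in self._records:
            if record.lab_label == lab_label:
                if record.owner != user:
                    raise PermissionError("only the owner may modify this record")
                record.sequence = new_sequence
                return
        raise KeyError(lab_label)

    def switchboard_view(self, study: str) -> list[tuple[str, str, str]]:
        """Read-only snapshot of all records flagged for the given study."""
        return [
            (r.owner, r.lab_label, r.sequence)
            for r in self._records
            if study in r.shared_with
        ]


if __name__ == "__main__":
    db = SampleDatabase()
    db.add(SampleRecord("Lab A", "tube-1", "ATGACC", {"study-1"}))
    db.add(SampleRecord("Lab A", "tube-2", "ATGTAC"))  # private, not shared
    print(db.switchboard_view("study-1"))
    db.modify("Lab A", "tube-1", "ATGACT")
    print(db.switchboard_view("study-1"))
```

Because the view is computed from the single stored record, a later change by the owner is visible at the switchboard without a second copy of the data having to be maintained.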
1.1 Analysis and management tools for the Sample Sequence database
As with the reference database the functions available can be customized to suit user needs.
1.2 "Quick search mutations"
With this automated function, mutations of the patient sample can be obtained quickly by comparing the patient sequence to NL43. Under "Quick search mutations" (see Fig. 14), the user can search for any sequence of an HIV isolate by sample or patient number and automatically compare it to the currently active reference sequence (see reference "Blue Database").
The function "Quick search mutations" examines the RT and Protease sequences of a sample separately. The selected sequence is aligned with the reference sequence, and any mutations, as well as any deletions and insertions, are identified. These variations are recorded automatically (optional) or by the user and are translated, according to the reading frame, into amino acids with the respective amino acid position. The result is displayed on a Clipboard. Positions carrying IUPAC ambiguity codes are also recognized, and the corresponding alternative amino acids are indicated separated by "/".
The function "Quick search mutations" reduces the once laborious processing of a sequence to less than 1 minute, with a high degree of security (mutations that are already stored are recognized, the cursor skips no mutation, and positions known to be associated with resistance are underlined in blue) and with full flexibility (the user selects the relevant mutations). Once all mutations have been recognized and itemized in the Clipboard, they can be stored directly in the respective sample file; at the same time, earlier mutations are deleted, so that duplication and transcription mistakes are avoided. Physicians, lab personnel or nurses, who need not possess knowledge of virus genetics or molecular biology, can also use the function "Quick search mutations". The user spectrum is therefore considerably expanded. Common functions found in this section are described below.
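The treatment of IUPAC ambiguity codes can be sketched as follows: each ambiguous codon is expanded into all concrete codons it may represent, and the distinct amino acids are joined with "/". The IUPAC nucleotide codes and the genetic code are standard; the function name and the example codons are assumptions made for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_ambiguous(codon: str) -> str:
    """Translate a codon that may contain IUPAC codes, e.g. 'TWC' -> 'Y/F'."""
    amino_acids: list[str] = []
    for bases in product(*(IUPAC[base] for base in codon.upper())):
        amino_acid = CODON_TABLE["".join(bases)]
        if amino_acid not in amino_acids:  # keep each alternative only once
            amino_acids.append(amino_acid)
    return "/".join(amino_acids)


if __name__ == "__main__":
    for codon in ("TWC", "ACY", "RAA"):
        print(codon, translate_ambiguous(codon))
```

An ambiguity that does not change the encoded amino acid, such as `ACY` (ACC or ACT, both threonine), yields a single letter, so only mixtures that matter at the protein level are shown with "/".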
"Search sample"
The "search sample" field here represents the patient tag of the hospital or of a network. It can also be used as the "Patient ID number".
1.2.1 "Patient label"
Another patient ID, which can also be named laboratory label or study label. All labels can be named and defined (number of positions) according to the customer's requirements. With the "Patient Label", the lab personnel can type in the patient number or unique identifier without revealing the patient's name or other private data.
1.2.2 "Lab label" / "Sample tag"
This is to be used as the lab designation of the sample (e.g. tube #12345).
1.2.3 "From sample date" / "To sample date"?
Defines a period of time for the editing of samples: e.g. all samples from April 1st to June 30th for surveillance studies etc.
1.2.4 "From sequence date" / "To sequence date"?
This defines a period of time for edition of sequencing results: e.g. all samples from April 1st to June 30th for lab management (e.g. accuracy testing of the sequencer, work load determination...)
1.2.5 "Studies"
Lab personnel can select in which research program the entry should participate. Sequences can be stored in multiple study databases without duplicating entry procedures. Study subsets can be managed separately and disclosed to other labs in a collaborative network.
1.2.6 "Empty RT", "Empty PR"
Selects sample entries where RT or PR sequences have not yet been entered - this tool enables lab personnel to check easily on work that currently is uncompleted.
1.2.7 "Sort by"
Patient samples can be sorted by cohort number, sequence date or sample date, and also restricted to a certain time period (see sample date from/to above). This makes it possible to retrieve samples specifically and to set samples in relation to one another, e.g. all samples from one patient sorted by date.
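The search criteria of this section (date range, "Empty RT" and "Sort by") can be combined as in the following sketch. The `Sample` fields and the `select_samples` function are hypothetical and only mirror the field names used in the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Sample:
    patient_label: str
    sample_date: date
    rt_sequence: str = ""
    pr_sequence: str = ""


def select_samples(
    samples: list[Sample],
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
    empty_rt: bool = False,
    sort_by: str = "sample_date",
) -> list[Sample]:
    """Filter by sample date range and missing RT sequence, then sort."""
    selected = []
    for sample in samples:
        if date_from is not None and sample.sample_date < date_from:
            continue
        if date_to is not None and sample.sample_date > date_to:
            continue
        if empty_rt and sample.rt_sequence:
            continue  # keep only entries whose RT sequence is still missing
        selected.append(sample)
    return sorted(selected, key=lambda s: getattr(s, sort_by))


if __name__ == "__main__":
    data = [
        Sample("P2", date(2002, 5, 20), rt_sequence="ATG"),
        Sample("P1", date(2002, 4, 3)),
        Sample("P1", date(2002, 8, 1), rt_sequence="ATG"),
    ]
    for s in select_samples(data, date(2002, 4, 1), date(2002, 6, 30)):
        print(s.patient_label, s.sample_date)
```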
1.2.8 "RT Mutation" / "PR Mutation"
Here, specific mutations can be searched (one at a time), e.g. 215 Y in RT sequences. The function "Search for similar samples" serves, in addition, to query certain sequence patterns. It can also be used to search for new mutations in the sample databank and to compare the retrieved sequences in full.
1.3 "Search for similar samples"
Under "Search for similar samples" (see Figs. 15, 16, 17), a sequence pattern is sought in all sample sequences of the laboratory sample databank: the search sequence is copied by simple copy/paste into the assigned field, furnished with a reference number and then, within seconds, compared against the entire sample databank. The sample sequences that are similar to the search sequence are itemized, ordered by degree of similarity and represented in pair-wise alignment.
From a "hit list" that is itemized in addition to the alignments, the user can select, by clicking on them, the sample sequences that are of special interest in this context. These sequences can then be directly compared with one another in a multi-alignment without further formatting (see also "align sequences"). The combination of "Search for similar samples" and the direct multi-alignment function permits a fast and reliable comparison of several sample sequences, with considerable convenience and time savings for the user. Interesting mutation patterns become recognizable, for example for epidemiologic purposes or for the recognition of newly mutating sites, for example under the influence of therapy.
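The ranking of sample sequences by degree of similarity can be illustrated with a short sketch. The source does not name the comparison algorithm; here the general-purpose `difflib.SequenceMatcher` from the Python standard library is used purely as a stand-in, and the function name, the sample identifiers and the threshold parameter are assumptions.

```python
from difflib import SequenceMatcher


def rank_similar(
    query: str,
    samples: dict[str, str],
    min_similarity: float = 0.0,
) -> list[tuple[str, float]]:
    """Return (sample id, similarity) pairs, most similar first."""
    hits = []
    for sample_id, sequence in samples.items():
        # ratio() is 1.0 for identical strings and falls towards 0.0 otherwise.
        score = SequenceMatcher(None, query, sequence, autojunk=False).ratio()
        if score >= min_similarity:
            hits.append((sample_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    databank = {"S1": "ATGACCTGG", "S2": "ATGTACTGG", "S3": "CCCCCCCCC"}
    for sample_id, score in rank_similar("ATGACCTGG", databank):
        print(sample_id, round(score, 2))
```

A production system would more plausibly use a dedicated sequence-alignment method; the sketch only shows the shape of the resulting hit list.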
1.4 "Add sample sequence"
"Add sample sequence" serves to input and to storage of new samples and its sequences into the IDNS™- system (see Fig. 18). New Samples are furnished with a patient number, a laboratory number, and a sample date and sequence date. These parameters can individually be customized and adapted to the laboratory condition. New sequences are copied directly out of the respective sequencing program as a text file and are inserted into the respective window.
Again, a comment can be added as free text. A further important option is the comparisons of samples to particular studies, which can be freely defined by the user and are implemented by SmartGene on request. Studies create data sub-groups out of assigned inputs of the sample databank, which can be shared again with other laboratories or can be separately managed. This function serves for the compilation of research programs or multi-center studies, without the need for the user to manage data twice. Samples or sequences that are assigned to certain studies will be adapted automatically for all users if the first user who inputted the sequence changes it afterwards.
1.5 "Edit sample sequence"
This function permits the user to sort samples and sequences according to certain criteria or to summarize them in groups (see Fig. 19). At the same time, samples can be organized into lists according to different criteria and printed out, and single samples can be edited and modified if necessary.
Editing sample sequences also permits the specific administration of studies as a subgroup of the sample databank. Editing functions, e.g. the modification of samples and sequences, can be restricted to certain user groups; external users can thus also use the databank for clinical research purposes without jeopardizing the data.
1.6 "Delete sample sequence"
The "Delete sample sequence" function can be used to delete sample entries and the associated sequences. This function can be blocked for unauthorized users to prevent unintentional manipulation and deletion of entries.
1.7 "Align sequences"
The function "Align sequences" is used for the specific comparison of individual sequences, which are itemized singly through the "Align Sequences" function or in a list, then selected by a mouse click and are thereby ready for alignment. The alignment sequence list can span several lists/pages, and single sequences can be removed from it again.
With this function, for example all sequences of a patient or a patient group can be compared together. Therefore, the development of resistances under therapy can be observed. This function is especially important for physicians and drug development because the mutation frequency under therapy and the probability of the development of drug resistance can be detected.
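One way to picture the observation of resistance development is to record, for each mutation, the date of the first sample of a patient in which it was stored. The sketch below assumes that the mutation lists of the samples are already available; the function name and the example data are hypothetical and do not come from the source.

```python
from datetime import date


def first_appearance(samples: list[tuple[date, set[str]]]) -> dict[str, date]:
    """Map each mutation to the date of the earliest sample containing it."""
    first_seen: dict[str, date] = {}
    for sample_date, mutations in sorted(samples, key=lambda s: s[0]):
        for mutation in sorted(mutations):
            first_seen.setdefault(mutation, sample_date)
    return first_seen


if __name__ == "__main__":
    # Made-up follow-up data of one patient: (sample date, stored RT mutations).
    history = [
        (date(2002, 6, 1), {"M184V", "T215Y"}),
        (date(2001, 1, 15), set()),
        (date(2001, 9, 3), {"M184V"}),
    ]
    for mutation, seen in first_appearance(history).items():
        print(mutation, seen.isoformat())
```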
7 DATA STORAGE PROTECTION / BACKUP
All sample data that is handled on the secured IDNS™-server is automatically backed up on a second, separate hard-drive. After 12 hours, a safety back-up copy is generated on a secondary server. Monthly and on customer request (optional) a CD-ROM with the customer specific data is produced and sent to the customer. Several other options for increased data protection are available within the IDNS™ system.
1.1 Data transmission
On the IDNS™ system, no personal data is stored that could permit a trace back to a particular patient (name, address etc.). Data transmission is secured by HTTPS with 128-bit encryption, which prevents passwords and other data from being intercepted in transit.
1.2 Back-ups and data storage
Without an intervention from the user, the IDNS™ system automatically performs back-ups on a second server hard-disc on a different computer. Every 24h, a copy of the complete database system is made to a server within another building, thus avoiding data loss and damage in case of physical destruction of a server. On request, IDNS™ users can get copies of their sample databases on CD-ROM for a minimal fee.
1.3 Access to data stored on an IDNS™ server
The IDNS™ access protection via personalized passwords does not give a user general access to all data on the IDNS™ server, but restricts his access privileges to the data he is entitled to see. No person outside the technical staff of the system operator has wider access privileges; physical access to servers is restricted by a safe server location in locked compartments and by specific passwords for the server maintenance staff.
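The restriction of each user to the data he is entitled to see can be sketched as follows. This is only an illustrative model under stated assumptions, not the IDNS™ access logic: the `Record` and `AccessControl` names are hypothetical, a record without an owner is treated as public, and only registered users may read anything.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class Record:
    record_id: str
    owner: Optional[str]  # None marks a public entry (assumption of this sketch)


@dataclass
class AccessControl:
    registered_users: Set[str] = field(default_factory=set)

    def can_read(self, user: str, record: Record) -> bool:
        """Unregistered users see nothing; registered users see public and own records."""
        if user not in self.registered_users:
            return False
        return record.owner is None or record.owner == user


# Hypothetical users and records.
acl = AccessControl(registered_users={"lab_a", "lab_b"})
public_reference = Record("ref-1", owner=None)
private_sample = Record("sample-9", owner="lab_a")
```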
Secured data and server access is achieved through the password system and its limitation of access attempts; thus, hackers cannot enter the IDNS™ simply by repeated login attempts. Firewalls to the Swiss academic network at the EPFL (Eidgenössische Technische Hochschule Lausanne/CH) and several firewalls within the system operator's network prevent unauthorized access to the internal network and its server management capabilities.
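The limitation of access attempts can be illustrated by a minimal sketch. The class name, the limit of three attempts and the reset on a successful login are assumptions made for illustration; the description does not specify them.

```python
from typing import Dict


class LoginLimiter:
    """Blocks an account after too many consecutive failed login attempts."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts  # hypothetical limit
        self._failures: Dict[str, int] = {}

    def is_locked(self, user: str) -> bool:
        return self._failures.get(user, 0) >= self.max_attempts

    def attempt(self, user: str, password_ok: bool) -> bool:
        """Return True only if the login is accepted."""
        if self.is_locked(user):
            return False  # even a correct password is refused once locked
        if password_ok:
            self._failures.pop(user, None)  # a successful login resets the counter
            return True
        self._failures[user] = self._failures.get(user, 0) + 1
        return False
```

With such a counter, guessing passwords by repeated tries is cut off after a small number of failures, regardless of whether a later guess happens to be correct.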
1.4 Data ownership
Sample data stored on IDNS™ servers belong to the client who has deposited them; issues of data ownership with regard to patients and third parties are not relevant to the system operator. The system operator can access user data for database maintenance procedures and for internal technical development. SmartGene will not disclose data to third parties. In multi-center networks, users can only modify and access their own data; only in the case of a data-collection platform in a multi-center set-up will specifically entitled users from the study monitoring team acquire access to the complete study subsets.
1.5 Extended data-protection
As additional features for increased data-protection, IDNS™ can offer the following solutions:
• Regularly changing passwords
• Access restricted to certified IP addresses
• Use of SmartCard® technology instead of passwords
• Configuration and installation of a user-specific IDNS™ server, either at locations within the system or within the user's intranet.
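As an illustration of the option "Access restricted to certified IP addresses", the following sketch checks a client address against an allowlist using Python's standard `ipaddress` module. The listed networks are documentation-only example ranges and, like the function name, are hypothetical.

```python
import ipaddress

# Hypothetical list of certified addresses and networks.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Accept a client only if its address lies in one of the certified networks."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed addresses are rejected outright
    return any(address in network for network in ALLOWED_NETWORKS)
```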

Claims

PATENT CLAIMS
1. An environment for analysis of molecular sequences, comprising:
- a central computer system,
- at least one end user computer system,
- a network for data communications between the central computer system and the at least one end user computer system,
- a database associated to the central computer system and comprising at least one reference molecular sequence or data set, wherein
- the central computer system is adapted to store at least one sample sequence or data set communicated from the at least one end user computer system in the database for analysis with respect to the at least one reference molecular sequence or data set.
2. The environment of claim 1, wherein
- the central computer system is adapted to analyze a molecular sequence or a data set communicated from the end user computer system to the central computer system with respect to the at least one reference molecular sequence or data set.
3. The environment of one of the preceding claims, wherein
- the central computer system is adapted to control accesses of the at least one end user computer system to the database.
4. The environment of the preceding claim, wherein
- the database comprises at least one public reference molecular sequence or data set that can be accessed by any user via the at least one end user computer system, if the accessing user is registered with the central computer system.
5. The environment of the preceding claim, wherein
- the database comprises at least one personalized reference molecular sequence or data set that can be only accessed by a user via the at least one end user computer system, if the accessing user is registered with the central computer system as user associated to the at least one personalized reference molecular sequence or data set.
6. The environment of one of the preceding claims, wherein
- the central computer system is adapted to store a sample molecular sequence or data set communicated from the at least one end user computer system in the database as a personalized molecular sequence or data set for later retrieval by a user being registered as authorized as regards the stored personalized molecular sequence or data set via an end user computer system.
7. The environment of one of the preceding claims, wherein
- the central computer system is adapted to store sample molecular sequences or data sets communicated from the at least one end user computer system in the database as public molecular sequences or data sets for later retrieval by any user being registered as authorized as regards accesses to the database.
8. The environment of one of the preceding claims, wherein
- the central computer system is adapted to provide a front end interface to be displayed on the at least one end user computer system.
9. The environment of claim 8, wherein
- the central computer system is adapted to control the front end interface.
10. The environment according to the preceding claim, wherein
- the central computer system is adapted to provide the front end interface in line with requirements specified by a user only to an end user computer system utilized by that user.
11. The environment of one of the preceding claims, wherein
- the central computer system is adapted to control at least one of storing, accessing, analyzing and retrieving of at least one of the at least one reference molecular sequence or data set and sample molecular sequences or data sets communicated from the at least one end user computer system in line with user specified requirements.
12. A method for analysis of molecular sequences or data sets, comprising the steps of operating the system according to one of the claims 1 to 11.
13. The method of claim 12, comprising the steps of:
- providing a central computer system,
- providing at least one end user computer system,
- providing a network for data communications between the central computer system and the at least one end user computer system,
- providing a database associated to the central computer system and comprising at least one reference molecular sequence or data set, and
- storing by means of the central computer system at least one sample sequence or data set communicated from the at least one end user computer system in the database for analysis with respect to the at least one reference molecular sequence or data set.
14. The method of claim 12 or 13, comprising the step of
- analyzing by means of the central computer system a molecular sequence or data set communicated from the end user computer system to the central computer system with respect to the at least one reference molecular sequence or data set.
15. The method of one of the claims 12 to 14, comprising the step of
- controlling by means of the central computer system accesses of the at least one end user computer system to the database.
16. The method of one of the claims 12 to 15, comprising the steps of
- storing at least one public reference molecular sequence or data set in the database, and
- controlling accesses to the database such that the public reference molecular sequence or data set can be accessed by any user via the at least one end user computer system, if the accessing user is registered with the central computer system.
17. The method of one of the claims 12 to 16, comprising the steps of
- storing at least one personalized reference molecular sequence or data set in the database, and
- controlling accesses to the database such that the personalized reference molecular sequence can be only accessed by a user via the at least one end user computer system, if the accessing user is registered with the central computer system as user associated to the at least one personalized reference molecular sequence.
18. The method of one of the claims 12 to 17, comprising the step of
- storing by means of the central computer system a sample molecular sequence communicated from the at least one end user computer system in the database as a personalized molecular sequence or data set for later retrieval by a user being registered as authorized as regards the stored personalized molecular sequence or data set via an end user computer system.
19. The method of one of the claims 12 to 18, comprising the step of
- storing by means of the central computer system a sample molecular sequence or a sample data set communicated from the at least one end user computer system in the database as a reference molecular sequence data or data set for later retrieval by any user being registered as authorized as regards accesses to the database.
20. The method of one of the claims 12 to 19, comprising the step of
- providing by means of the central computer system a front end interface to be displayed on the at least one end user computer system.
21. The method of claim 20, comprising the step of
- controlling by means of the central computer system the front end interface.
22. The method of claim 20 or 21, comprising the step of
- providing by means of the central computer system a front end interface in line with requirements specified by a user only to an end user computer system utilized by that user.
23. The method of one of the claims 12 to 22, comprising at least one of the steps of
- storing, accessing, analyzing and retrieving of at least one of the at least one reference molecular sequence or data set and sample molecular sequences or data sets communicated from the at least one end user computer system in line with user specified requirements under control of the central computer system.
24. A computer program product, comprising program code portions for at least one of controlling the system according to one of the claims 1 to 11 and carrying out the method according to one of the claims 12 to 23.
25. Computer program code according to claim 24, being stored in a computer readable device or on a computer readable storage means.
PCT/EP2003/001586 2002-02-15 2003-02-17 Analysis and management of molecular data and sequences WO2003069534A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU2003210293A AU2003210293A1 (en) 2002-02-15 2003-02-17 Analysis and management of molecular data and sequences
DE20316651U DE20316651U1 (en) 2002-02-15 2003-02-17 Analysis and management of molecular data and sequences
EP03739496A EP1479027A2 (en) 2002-02-15 2003-02-17 Analysis and management of molecular data and sequences

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE10206406.7 2002-02-15
DE10206406A DE10206406A1 (en) 2002-02-15 2002-02-15 Device and method for carrying out and evaluating genetic analyzes

Publications (2)

Publication Number Publication Date
WO2003069534A2 WO2003069534A2 (en) 2003-08-21
WO2003069534A3 WO2003069534A3 (en) 2004-09-10

Family

ID=27674687

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2003/001586 WO2003069534A2 (en) 2002-02-15 2003-02-17 Analysis and management of molecular data and sequences

Country Status (4)

Country Link
EP (1) EP1479027A2 (en)
AU (1) AU2003210293A1 (en)
DE (2) DE10206406A1 (en)
WO (1) WO2003069534A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102004020860A1 (en) * 2004-04-28 2005-11-24 Siemens Ag Method and system for transmitting data originating from a medical examination device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5966712A (en) * 1996-12-12 1999-10-12 Incyte Pharmaceuticals, Inc. Database and system for storing, comparing and displaying genomic information
WO2001055911A1 (en) * 2000-01-27 2001-08-02 Informax, Inc. Integrated access to biomedical resources

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KANTOR R ET AL: "Human Immunodeficiency Virus Reverse Transcriptase and Protease Sequence Database: an expanded data model integrating natural language text and sequence analysis programs" NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 29, no. 1, 1 January 2001 (2001-01-01), pages 296-299, XP002202301 ISSN: 0305-1048 *
SHAFER R W ET AL: "Human Immunodeficiency Virus Reverse Transcriptase and Protease Sequence Database" NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 28, no. 1, 1 January 2000 (2000-01-01), pages 346-348, XP002202302 ISSN: 0305-1048 *
SMITH R F ET AL: "BCM SEARCH LAUNCHER-AN INTEGRATED INTERFACE TO MOLECULAR BIOLOGY DATA BASE SEARCH AND ANALYSIS SERVICES AVAILABLE ON THE WORLD WIDE WEB" GENOME RESEARCH, COLD SPRING HARBOR LABORATORY PRESS, US, vol. 6, no. 5, 1996, pages 454-462, XP001109609 ISSN: 1088-9051 *
UNWIN R ET AL: "Biology Workbench: A Computing and Analysis Environment for the Biological Sciences" BIOINFORMATICS, DATABASES AND SYSTEMS. S. LETOVSKY, EDITOR, 1999, pages 233-244, XP002284127 NORWELL, MA, USA *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007053962A1 (en) * 2005-11-09 2007-05-18 Smartgene Gmbh Computer-implemented method and computer system for identifying organisms
US20120215724A1 (en) * 2011-02-18 2012-08-23 Bank Of America Corporation Institutional provided data share platform
US8548930B2 (en) 2011-02-18 2013-10-01 Bank Of America Corporation Institutional provided data share platform
US9026991B2 (en) 2011-02-18 2015-05-05 Bank Of America Corporation Customizable financial institution application interface

Also Published As

Publication number Publication date
EP1479027A2 (en) 2004-11-24
AU2003210293A1 (en) 2003-09-04
DE10206406A1 (en) 2003-11-13
DE20316651U1 (en) 2004-02-12
WO2003069534A3 (en) 2004-09-10

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2003739496

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2003739496

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP