WO2003069534A2

WO2003069534A2 - Analysis and management of molecular data and sequences

Info

Publication number: WO2003069534A2
Application number: PCT/EP2003/001586
Authority: WO
Inventors: Stefan Emler
Original assignee: Smartgene Gmbh
Priority date: 2002-02-15
Filing date: 2003-02-17
Publication date: 2003-08-21
Also published as: EP1479027A2; AU2003210293A1; DE10206406A1; DE20316651U1; WO2003069534A3

Abstract

The present invention provides an environment for analysis of molecular sequences, comprising a central computer system, at least one end user computer system, a network for data communications between the central computer system and the at least one end user computer system, a database associated to the central computer system and comprising at least one reference molecular sequence or data set, wherein the central computer system is adapted to store at least one sample sequence or data set, communicated from the at least one end user computer system in the database for analysis with respect to the at least one reference molecular sequence or data set.

Description

ANALYSIS AND MANAGEMENT OF MOLECULAR DATA AND SEQUENCES

Field of the Invention

The present invention relates to the field of analysis and management of molecular data and DNA/RNA/protein sequences.

Background of the Invention

Molecular diagnostics based on genetic and genomic analysis of patient genes or genes from microorganisms is a rapidly growing field in today's medicine. As technology makes it possible to run sequence analysis and detection of genetic profiles in an automated and speedy manner, these tests become available for routine medical diagnosis and disease management. However, the technology not only handles tests and reagents, but also produces an increasingly complex body of information which cannot be handled by the usual tools and staff expertise available to laboratories and physicians. Data management and data analysis therefore become the bottleneck of molecular diagnostics in labs and physician offices.

As an example, we may mention infection with HIV, which causes AIDS: formerly a fatal viral infection, HIV has become a chronic disease since successful treatment with antiviral drugs became available. This treatment is in general started during early stages of the infection, in order to prevent or reverse immunodeficiency caused by HIV. For treatment purposes, in general, a "cocktail" of several drugs with different active components is prescribed. The drugs are administered every day; the overall duration of the treatment depends on its efficiency and has not yet been defined. Parameters to monitor the treatment are: the general status of the patient, the viral load in blood, the CD4 cell count and, because of side effects, liver and other parameters.

While anti-HIV treatment can be very successful, the virus by itself has a great ability to mutate and can convert into a drug-resistant genotype by simply changing its target genes for the respective drug.

In most cases, the Reverse Transcriptase or the Protease is targeted by drugs; however, new drugs for other targets are now appearing on the market. Mutations of the target gene sequences can occur under treatment pressure and would then render the virus isolate resistant to the drug used. As mutation of HIV into a resistant variant may be followed by an increase of viral replication, and therefore by a decrease of the patient's immune defense, and may possibly lead to a fatal outcome, doctors carefully monitor treatment with the parameters mentioned above, together with the viral resistance profile obtained by genotyping.

Detection of a potential resistance-encoding mutation helps them to adjust the treatment towards a more efficient individual drug combination. HIV genotyping is becoming an increasingly important tool in HIV patient care, and physicians all over the world therefore order it as a routine test, performed by experienced hospital and privately owned labs. Handling of genotyping data, however, is quite cumbersome for the laboratory as well as for the connected physicians, who normally lack an adequate IT infrastructure for sequence management and for handling complex data and analyses.

Object of the Invention

In general, the object of the present invention is to provide solutions enhancing analysis of molecular data and sequences.

Short Description of the Invention

To solve the above problem, the present invention provides an environment according to claim 1 and a method according to claim 12.

Further solutions and embodiments of the present invention are defined in the further claims.

Short Description of the Figures

In the following description of preferred embodiments of the present invention, reference is made to the accompanying drawings, wherein:

Fig. 1 illustrates a functional arrangement of the system according to the present invention, and

Figs. 2 to 19 illustrate exemplary graphical user interfaces of the environment according to the present invention.

Description of preferred Embodiments

In the following description of preferred embodiments, reference is made to HIV genotyping and resistance analysis. It has to be noted that these embodiments, in particular software features, hardware features, related configurations and implementations, the graphical user interfaces shown, and the sequences, mutations, targets and molecular structures presented, serve illustrative purposes only and are not intended to limit the present invention in any way. Further, as an illustrative example, reference is made to the Integrated Database Network System IDNS™ for HIV genotyping without intending any limitation of the present invention.

1 GENERAL FEATURES OF IDNS™ FOR HIV

IDNS™, the Integrated Database Network System, provides the following features:

_ Can be implemented by a simple bookmark in a Web browser on any Web-connected computer; no dedicated workstation is required

_ Is a customized, application-adapted data-management service.

_ Manages sequence data of HIV genetic drug targets.

_ Manages other data linked to the sequence targets (e.g. viral load, CD4 cell counts, treatment prescription, virtual phenotype...)

_ Keeps sequences and related data available for comparison to earlier cases, for treatment follow-up (development of resistance under treatment...) and for documentation

_ Allows easy and reliable analysis of sequences for the detection of treatment-relevant mutations

_ Can be extended to other genetic targets, even at a later stage

_ Can handle access from different sites simultaneously

_ Allows networking and data collection/data sharing with other centers

_ Can be interfaced with customer lab programs or with expert programs for treatment advice.

_ Is easy to use and does not require knowledge of bioinformatics, informatics or programming languages. It can be handled by lab technicians for data entry and routine data analysis.

_ Handles back-up, access, access protection and data safety from its central server, following individual customer requirements, without involvement of customer staff.

_ Implements updates and upgrades remotely and on demand, without interference on-site

_ Saves working time!

The IDNS is a service for data management and data analysis provided through the Web. Its backbone consists of three basic modules: a server-based SQL database, a user-management module and an application-defining module. Through these three modules, the IDNS can provide an application (disease)-specific platform to any user worldwide; the platform can be specifically adapted to the customer's requirements, and passwords protect the access and restrict it to the customer's own data (user management). Via flags set on data sets, the IDNS enables customers to network together, to share selected data and to communicate online.

The IDNS is an ideal tool for multi-center studies, data gathering and long-distance collaboration between research centers, while providing each participant with the specific tools and data formats required. The IDNS also provides reference databases derived from proprietary or public datasets.

1 LOGIN WEBPAGE: USERNAME/PASSWORD: GATEWAY TO APPLICATION SPECIFIC IDNS™-PLATFORMS

The Integrated Database Network System (IDNS™) is a database service with a web-based user interface. It is accessed through the Internet via personalized passwords. The transmission of the login and of the password is encrypted and passwords can be changed regularly. Login, access time, duration, actions and - if required - the IP address of the computer where the access came from, are all recorded for a desired or given period of time, e.g. up to 1 year.

1.1 Personalized login and security

A user-specific login and a personal password permit direct access to all data platforms of the IDNS™ system for which the user holds access authorization. The personal password delivers maximum security by utilizing encryption technology. This password is never to be shared. A standard license agreement can include, e.g., up to three users with personal passwords and can be expanded to accommodate more users.

The IDNS™ system registers every access made by a user. The activities, modifications or analyses carried out by the user during the session are recorded and stored for 1 year. In addition, the access computer can be retraced, and the IP addresses of servers or access computers can be recorded as well. Access to the IDNS™ can thus be limited to certain computers if the user requires higher standards of security.

The secured access of the IDNS™ makes it possible to generate, after the fact, complete documentation of the steps carried out to obtain a result. Such documentation is indispensable for quality management or when submitting study results obtained with IDNS™ to supervising authorities, such as the FDA, where a record of progress is required.

Problems and unauthorized activities can be detected and retraced, and solutions can be designed in the event that contamination or other factors cause unexpected results.

1.2 Access at different levels of expertise

Besides security, personal passwords in the IDNS™ system are also intended to grant access authorization that is tailored specifically to the user's role and level of expertise. A lab technician, for example, would be allowed to add and analyze sequences, but would not be able to modify or delete sequences, in order to avoid errors. An epidemiologist would be entitled to analyze existing data statistically, but could not modify them. The head of the project and other authorized personnel are able to add, modify and delete entries without restriction. Levels of access to certain fields can be adapted to fit the specific needs of the research project and of the staff involved. Within multi-center studies, IDNS™ permits access to shared study data to be granted to outside collaborators, while access to other, non-shared lab data remains restricted to entitled inside lab staff.
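The role-dependent access described above could be sketched as a simple permission table. This is only an illustrative sketch: the role names, the `Permission` flags and the `is_allowed` helper are assumptions made for the example and are not taken from the IDNS™ implementation.

```python
from enum import Flag, auto


class Permission(Flag):
    """Actions a user may perform on sample entries (hypothetical set)."""
    NONE = 0
    ADD = auto()
    ANALYZE = auto()
    MODIFY = auto()
    DELETE = auto()


# Hypothetical role names mirroring the examples given in the text.
ROLE_PERMISSIONS = {
    "lab_technician": Permission.ADD | Permission.ANALYZE,
    "epidemiologist": Permission.ANALYZE,
    "project_head": (Permission.ADD | Permission.ANALYZE
                     | Permission.MODIFY | Permission.DELETE),
}


def is_allowed(role: str, action: Permission) -> bool:
    """Return True if the role grants the action; unknown roles get nothing."""
    granted = ROLE_PERMISSIONS.get(role, Permission.NONE)
    return (granted & action) == action
```

With this table, `is_allowed("lab_technician", Permission.DELETE)` is false, while the project head is granted every action.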

1.3 Accessibility of IDNS™ data platforms

Access to the IDNS™ over the Internet is guaranteed 24 hours a day, 7 days a week, and users can access it easily and conveniently from any computer that is connected to the Internet. It is moreover possible to work at several computers in an institute simultaneously, thus avoiding waiting lists for analyses to be carried out by different persons. This creates a more efficient means of acquiring data in a timely manner. Accesses are not limited in number or duration; this enables users to manage their data conveniently and without restriction, and renders the IDNS™ system particularly convenient and cost-efficient.

1.4 Invalid logins

After a given number of consecutive failed login attempts (e.g. three), IDNS™ will automatically block all subsequent logins. The user can then have the login reactivated by sending an email to the system operator. This function is designed to prevent an unauthorized user from gaining access to the data platform through repeated attempts.

3 IDNS 3.0 USER MANAGER AND APPLICATION MANAGER

The IDNS 3.0 User Manager and Application Manager provide the following features:

- Each user's access to the IDNS database is controlled by the Application Manager and the User Manager.

- The Application Manager defines which applications a user can access (HIV, Bacteria 16s, Orthopox, etc.) as well as the tools, reference databases and tool parameters relevant to that particular user.

- The User Manager defines the access rights, datasets (i.e. sample datasets) available, affiliations, and data sharing rights for each user.
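As a rough illustration of how the two managers divide responsibilities, the following sketch models an application record (tools and reference databases) and a user record (applications and sample data sets). All class names, field names and the `tools_for` helper are hypothetical and chosen only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Application:
    """What the Application Manager might define for one platform."""
    name: str
    tools: list[str] = field(default_factory=list)
    reference_databases: list[str] = field(default_factory=list)


@dataclass
class User:
    """What the User Manager might define for one user."""
    login: str
    company: str
    applications: list[str] = field(default_factory=list)
    sample_datasets: list[str] = field(default_factory=list)


def tools_for(user: User, catalog: dict[str, Application]) -> list[str]:
    """Collect the tools of every application the user may access.

    Applications missing from the catalog are skipped; duplicates are
    removed while keeping first-seen order.
    """
    tools: list[str] = []
    for app_name in user.applications:
        app = catalog.get(app_name)
        if app is None:
            continue
        for tool in app.tools:
            if tool not in tools:
                tools.append(tool)
    return tools
```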

The functional arrangement of the User Manager and Application Manager as regards the database and system users and end user computer system accessing the system is illustrated in Fig. 1.

3.1 User Manager entrance

The IDNS 3.0 User Manager is a highly secure, web-based system (see Fig. 1) for managing IDNS 3.0 user rights. Through this tool, new IDNS 3.0 databases can be implemented quickly and effectively, existing ones can be modified, and IDNS 3.0 use can be monitored. For initial access to the system, a user interacts with the User Manager entrance illustrated in Fig. 2.

3.2 User manager main menu

Once inside the User Manager the various functions are accessed via the User Manager Main Menu shown in Fig. 3.

3.3 Target menu

The target menu allows the labelling of the IDNS 3.0 reference databases to be controlled, as illustrated by the User Manager Target Screen shown in Fig. 4.

3.4 Data sets menu

The various reference and private data sets are managed via the Data Sets Menu shown in Fig. 5. The data sets can be either private or shared, depending on inter-group collaborations.

3.5 Data set groups

The Data Set Groups Screen allows the various data sets to be grouped together allowing users to access a group of data sets as illustrated in Fig. 6.

3.6 Applications

The particular IDNS 3.0 platforms, reference websites, tools and tool parameters are defined via the Applications Screen (see Fig. 7). Initial settings for the definition of the reference and sample data set access are also included.

3.7 Companies

The companies section contains details of the IDNS 3.0 client companies. For display to a user, the system uses the User Manager Companies Screen shown in Fig. 8.

3.8 Users

Definitions of the IDNS 3.0 individual users are stored in the users section and can be displayed via the User Manager User Screen illustrated in Fig. 9. Details include the user's name, login information, contact information, and the company a user belongs to.

3.9 Activity log

The User Manager Activity Log Screen shown in Fig. 10 allows User Manager operatives to monitor IDNS 3.0 use. Each time the IDNS is accessed, the relevant details, such as the date, the user, the application, the IP address of the computer used, etc., are added to the list on the right-hand side of the screen. The activity log can also be searched, permitting a more precise display of information; for example, searching by company displays the details relevant to that particular company. This is also illustrated in Fig. 10.
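A search of the activity log by company or by any other recorded field might look like the following sketch. The `LogEntry` fields follow the details named in the text; the class itself, the `search_log` function and the newest-first ordering are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    """One recorded access (field names are illustrative)."""
    timestamp: datetime
    user: str
    company: str
    application: str
    ip_address: str


def search_log(entries, **criteria):
    """Return entries whose fields equal every given criterion, newest first."""
    matches = [
        entry for entry in entries
        if all(getattr(entry, name) == value for name, value in criteria.items())
    ]
    return sorted(matches, key=lambda entry: entry.timestamp, reverse=True)
```

For example, `search_log(entries, company="Lab XY")` lists only the accesses recorded for that company.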

1 THE HOME-PAGE AND MAIN MENU OF AN IDNS™-DATA-PLATFORM FOR HIV

The entry page or so-called "home-page" of a user platform is the first Web page which appears after the login (see Fig. 11). Its design is kept simple in order to make access to data and data-management functions easy. Only the databases and functions that have been requested by the user are shown. The page is therefore not overloaded with unnecessary tools and items. Clear separation of reference data and user-owned sample data makes data management and access to databases easy and reliable. The design and logical structure of IDNS™ Web pages remain similar for different users and applications; this allows users to switch between different IDNS™ applications and platforms without losing time for adaptation.

1.1 General layout

The platform's top section carries the user's logo and shows his name for the running session (see Fig. 12). The bottom section shows common tools for administrative and communication purposes. The central main menu which allows access to the reference and client databases will be discussed in Chapters 5 and 6.

1.2 Database information

IDNS™ users and laboratory managers can check the overall numbers of sequence and/or sample entries; these functions give an overview of the respective data platform and of the laboratory's activities.

1.3 Links

Clicking the "link" button opens a webpage with customized hyperlinks which are directly accessible for the user. This function allows the user to organize and bookmark important websites; the hyperlinks make these sites accessible quickly and easily. The system operator takes care to update and evaluate those hyperlinks regularly and can integrate new sites on suggestion. Hyperlinks can be adapted and expanded according to platform profile and user needs.

1.4 E-mails

This function makes it possible to enter and store the email addresses of all individual users involved in a project, network or study and to make their contact information available to all project participants. By simply clicking on an email address, the user can contact colleagues outside of his or her institute and can share problems or experiences.

1.5 Logout

With the logout function, the user can log off from the IDNS™ platform when finished with his work. If the user does not log off and a longer period passes without any activity, the system will log the user out automatically (the time-out can be defined specifically). This is another layer of database security and is intended to prevent access by unauthorized personnel.
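The automatic log-out after a period of inactivity could be sketched as follows. The `Session` class, its `touch` method and the 30-minute default are assumptions for illustration; the text only states that the time-out is configurable.

```python
from datetime import datetime, timedelta


class Session:
    """Tracks the last activity of a logged-in user (hypothetical class)."""

    def __init__(self, user: str, started: datetime,
                 timeout: timedelta = timedelta(minutes=30)):
        self.user = user
        self.timeout = timeout
        self.last_activity = started
        self.active = True

    def touch(self, now: datetime) -> bool:
        """Register activity at `now`.

        Returns False, and ends the session, if the idle period since the
        last activity exceeded the time-out; an ended session stays ended.
        """
        if self.active and now - self.last_activity > self.timeout:
            self.active = False
        if self.active:
            self.last_activity = now
        return self.active
```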

1.6 Logo

At the right upper screen edge, the respective logo of the user or his/her institution is displayed together with a hyperlink that directly connects to the respective home page of the institution, if available.

1.7 SmartGene IDNS Homepage hyperlink

The logo of the system operator leads to its homepage. The homepage gives general information on ongoing or completed developments with regard to IDNS™ and provides links to customer service.

The homepage carries information on the services provided to the customers and presents new software tools which can be integrated on demand into existing IDNS™ platforms.

5 THE BLUE DATABASE - REFERENCE SEQUENCE DATABASE

The reference sequence databank, in the IDNS™ the "Blue Database", contains reference sequences to which patient or sample sequences can be compared. In the case of most IDNS™-HIV platforms, the NL43 consensus sequence for HIV is used as a reference sequence, as it is validated and updated regularly by different expert panels. The NL43 sequence is updated with regard to the newest literature, and the positions associated with therapy resistance are highlighted in blue, in accordance with advice from expert panels.

The reference sequences determine the reading-frame and therefore the exact positions of possible mutations of analyzed sample sequences. Other reference sequences representing regional dominant variants can also be added to the reference databank, after expert validation and may then be used for comparison.

New reference databases for new targets can be added at a later stage, thus enabling the user to keep his data-management up-to-date with his scientific proceedings.

1.1 Analysis tools for the Reference Sequence-database

Analysis tools within the IDNS™-HIV platforms can vary with regard to specific user requirements and customization; user-specific application platforms show only analysis tools which have been requested by the user for his specific requirements. Below is a description of typical functions.

1.2 "Search mutations"

"Search mutations" will detect and identify mutations of a specific sample sequence, in comparison to the designated reference sequence (see Fig. 13). Any sequence which the user pastes into the "search mutations" field is compared to the reference sequence, already stored in the reference database.

The comparison is presented as a pair-wise alignment in which vertical bars mark identical positions. Mutations of the sequence are easily recognized, identified (through a mouse click) and automatically translated into the corresponding amino acid together with the amino-acid position. Mutated amino acids and positions can then be stored in the sample sequence file by another mouse click on "store mutations". This function also takes deletions and insertions into account, so that they do not interfere with the correct positioning of mutations.
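A much simplified sketch of such a comparison is given below. It assumes that the sample and the reference are already aligned, in frame, of equal length and written in upper-case unambiguous bases, so it does not cover the handling of insertions and deletions mentioned above. The function names and the mutation notation (reference amino acid, position, sample amino acid) are assumptions for the example; the codon table is the standard genetic code.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def match_line(reference: str, sample: str) -> str:
    """Return '|' for identical positions and ' ' for differing ones."""
    return "".join("|" if r == s else " " for r, s in zip(reference, sample))


def find_mutations(reference: str, sample: str) -> list[str]:
    """List amino-acid changes between two pre-aligned, in-frame sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to equal length")
    mutations = []
    usable = len(reference) - len(reference) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        ref_aa = CODON_TABLE[reference[start:start + 3]]
        sample_aa = CODON_TABLE[sample[start:start + 3]]
        if ref_aa != sample_aa:
            mutations.append(f"{ref_aa}{start // 3 + 1}{sample_aa}")
    return mutations
```

Silent nucleotide changes show up as a gap in the bar line but produce no entry in the mutation list, since the encoded amino acid is unchanged.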

"Search mutations" is completed by the "Quick Search Mutations" tool from the menu of the sample-database, which allows mutation analysis of already registered sample sequences.

1.3 "Add reference sequence"

This function enables experienced users to enter and store new, additional reference sequences, such as regionally dominant virus sub-types. The RT gene sequence and the Protease gene sequence can be entered as separate sequences or as one stretch, together with the associated information such as origin and particularities. Entering and assignment of reference sequences is restricted to experts with specific access rights. By default, the international reference sequence "NL43" is accessible in the reference database.

1.4 "Delete reference sequence"

Here, authorized experts/users can delete reference sequences.

1.5 "Edit reference sequence"

By editing a reference sequence file, the experienced user can modify the sequence, introduce new mutation sites, and add or change comments; this function is typically restricted through the login to qualified staff in order to avoid errors in data interpretation (see Fig. 14). HIV drug target sequences can be entered either separately (RT and Protease) or as a continuous stretch; here, the separate entry is described as an example. Other sequence targets can be added, such as gp41, p17...

The IDNS™-HIV platform can handle more than 8 different sequence targets; this renders it flexible for the ongoing evolution in drug resistance monitoring, patient care and for other aspects of clinical HIV research.

1.1.1 "Reference nb" / "Sequence date"

A publicly available reference sequence from Genbank, EMBL, or from other public databases comes with its accession number under which the sequence has been published. When a lab-internal reference sequence is entered, this will be the tag given by the laboratory, plus the date of entry registration.

1.1.2 "Definition" / "Last update"

"Definition" and "Last Update" indicate the origin of the reference sequence (e.g. sequence derived from isolate XY West Africa) and the date of the last update.

1.1.3 "Source"

"Source" is where the reference sequence entry originates from: e.g. laboratory XY or expert panel/publication/journal.

1.1.4 "RT sequence"

Reverse transcriptase (RT) sequence of HIV: the gene sequence encoding the Reverse Transcriptase enzyme of HIV. RT transcribes viral RNA into DNA after the entry of the virus into the cell. The transcription renders the viral genome compatible with the host DNA and permits integration into the host genome. This enzyme is retrovirus-specific and is therefore a preferred target of many antiviral drugs (RT inhibitors, nucleoside analogs). The sequence can be pasted along with its associated information into the respective fields and will then be recorded when leaving the page.

1.1.5 "PR sequence"

Protease (PR) sequence of HIV: the gene sequence encoding the HIV protease enzyme. This enzyme cleaves the HIV proteins after their production within the host cell and thus allows assembly of infectious virus particles. Anti-HIV drugs, the so-called "protease inhibitors", target this enzyme.

1.1.6 "RT and PR mutations"

Known differences (mutations) within the RT and Protease of this particular reference sequence with regard to the NL43 consensus reference sequence; e.g. mutations of a regional dominant variant with regard to NL43.

1.1.7 "Remarks"

This field is intended for entering remarks and comments on a particular reference sequence. "Free text" can be added explaining the sequences and the originating isolate, or any other relevant information that is important to the sample.

1.1.8 Updates of reference databases, customer reference data

Reference databases for all sequence targets can be designed and can integrate published or customer-owned sequences. Regular, automated updates of reference databases with regard to published data from public databases guarantee an up-to-date quality standard of the sequence analysis procedure and considerably reduce the workload of the laboratory staff. Customer-owned reference sequences would be an integral part of the respective reference database but will not be shared with other customers, unless the submitting laboratory decides otherwise.

6 THE RED DATABASE - SAMPLE SEQUENCE DATABASE:

The "Red database" is the customer-created sample database. Within the IDNS™ system, each laboratory has its own databases that are freely accessed by its lab personnel and non-accessible to other IDNS™ users. Upon request, this database or subsets of it ("study database") can be connected to other laboratories' databases and can be integrated into a "collector" database for multi-center collaborations.

The sample database stores the sample data and patient sequences that are produced in the laboratory and submitted to the IDNS™. Patient names, patient addresses or other data which may be used to identify patients are not stored in the IDNS™; for this purpose, a link to the internal data system of the hospital or laboratory is created by a common key number. To manage the sample database, the user is provided with individually adapted and application-optimized functions, designed to analyze the data and the sequences, to organize the database, to export data and to inform other users or collaborators of results. The sample database fulfills the function of a data archive, and parts of this database can be shared with other institutes or laboratories. In multi-center studies, data can be collected and disseminated through a centralized location, known as the "switchboard". Those with access to the switchboard (central network node) cannot modify the database, even if the data are in discord. If the inputting user later decides to modify the data, they are automatically updated at the switchboard and at the other users' sites, and the time of the change is recorded.

The decision as to which data are made accessible to other institutes or to the switchboard is made by the user with a simple mouse click. This procedure avoids the effort and the danger of maintaining two data-management systems: one for the laboratory's own purposes and one for study purposes. These functions are explained in more detail below.

1.1 Analysis and management tools for the Sample Sequence database

As with the reference database the functions available can be customized to suit user needs.

1.2 "Quick search mutations"

With this automated function, mutations of the patient sample can be obtained quickly by comparing the patient sequence to NL43. Under "Quick search mutations" (see Fig. 14), the user can search for any sequence of an HIV isolate by its sample or patient number and automatically compare it to the currently active reference sequence (see reference "Blue Database").

The function "Quick search mutations" examines the RT and Protease sequences of a sample separately. The selected sequence is aligned with the reference sequence, and any mutations are identified, as well as any deletions and insertions. These variations are recorded automatically (optional) or by the user and are translated, according to the reading frame, into amino acids with the respective amino-acid position. The result is displayed on a Clipboard. IUPAC nucleotide ambiguity codes are also recognized, and the corresponding alternative amino acids are indicated, separated by "/".

The function "Quick search mutations" reduces the once laborious preparation of a sequence to less than 1 minute, with utmost reliability (mutations that are already stored are recognized, the cursor skips no mutations, and positions that are known for resistance are underlined in blue) and with total flexibility (the user selects the relevant mutations). Once all mutations are recognized and itemized in the Clipboard, they can be stored directly in the respective sample file; at the same time, earlier mutations are deleted, so that duplication and transcription mistakes are avoided. Physicians, lab personnel or nurses, who do not need to possess knowledge of virus genetics or molecular biology, can also use the function "Quick search mutations". The user spectrum is therefore considerably expanded. Common functions found in this section are described below.
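The translation of a codon that contains IUPAC ambiguity codes into "/"-separated alternatives could be sketched as follows. The function name `translate_ambiguous` is hypothetical; the ambiguity codes and the codon table are the standard ones. How the actual tool orders or limits the alternatives is not stated in the text, so the ordering here is simply the order in which the expanded codons are generated.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(codon: str) -> str:
    """Translate one codon, joining alternative amino acids with '/'."""
    options = [IUPAC[base] for base in codon.upper()]
    amino_acids = []
    for combination in product(*options):
        amino_acid = CODON_TABLE["".join(combination)]
        if amino_acid not in amino_acids:
            amino_acids.append(amino_acid)
    return "/".join(amino_acids)
```

For instance, the codon `ARA` expands to `AAA` and `AGA` and is therefore reported as `K/R`, whereas `ACN` always encodes threonine and is reported as `T`.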

"Search sample"

The "search sample" field here represents the patient tag used by the hospital or within a network. It can also be used as "Patient ID number".

1.2.1 "Patient label"

Another patient ID, which can also be named laboratory label or study label. All labels can be named and defined (number of positions) according to the customer's requirements. With the "Patient Label", the lab personnel can type in the patient number or unique identifier without revealing the patient's name or other private data.

1.2.2 "Lab label" / "Sample tag"

This is to be used as the lab designation of the sample (e.g. tube #12345).

1.2.3 "From sample date" / "To sample date"

Defines a period of time for the editing of samples: e.g. all samples from April 1st to June 30th for surveillance studies etc.

1.2.4 "From sequence date" / "To sequence date"

This defines a period of time for the editing of sequencing results: e.g. all samples from April 1st to June 30th for lab management (e.g. accuracy testing of the sequencer, workload determination...)

1.2.5 "Studies"

Lab personnel can select in which research program the entry should participate. Sequences can be stored in multiple study databases without duplicating entry procedures. Study subsets can be managed separately and disclosed to other labs in a collaborative network.

1.2.6 "Empty RT", "Empty PR"

Selects sample entries where RT or PR sequences have not yet been entered - this tool enables lab personnel to check easily on work that currently is uncompleted.

1.2.7 "Sort by"

Patient samples can be sorted by cohort number, sequence date or sample date, and also restricted to a certain time period (see sample date from - to above). This makes it possible to retrieve samples specifically and to relate samples to each other, e.g. all samples from one patient sorted by date.

1.2.8 "RT Mutation" / "PR Mutation"

Here, specific mutations can be searched (one at a time), e.g. 215 Y in RT sequences. The function "Search for similar samples" serves, in addition, to query certain sequence patterns. It can also search for new mutations in the sample databank and compare the discovered sequences in full.
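The search criteria described in this section (a single RT or PR mutation, a sample-date window and a sort key) can be combined as in the following sketch. The `Sample` record, its field names and the `search_samples` function are assumptions for illustration, and mutations are represented as plain strings such as "215Y".

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Sample:
    """Minimal sample record (hypothetical fields, no patient names)."""
    patient_label: str
    lab_label: str
    sample_date: date
    rt_mutations: set = field(default_factory=set)
    pr_mutations: set = field(default_factory=set)


def search_samples(samples, rt_mutation=None, pr_mutation=None,
                   from_date=None, to_date=None, sort_by="sample_date"):
    """Filter samples by one mutation per target and a date window, then sort."""
    hits = []
    for sample in samples:
        if rt_mutation is not None and rt_mutation not in sample.rt_mutations:
            continue
        if pr_mutation is not None and pr_mutation not in sample.pr_mutations:
            continue
        if from_date is not None and sample.sample_date < from_date:
            continue
        if to_date is not None and sample.sample_date > to_date:
            continue
        hits.append(sample)
    return sorted(hits, key=lambda sample: getattr(sample, sort_by))
```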

1.3 "Search for similar samples"

Under "Search for similar samples" (see Figs. 15, 16, 17), a sequence pattern is sought in all sample sequences of the laboratory sample databank: the search sequence is copied by simple copy/paste into the assigned field, furnished with a reference number and then compared within seconds against the entire sample databank. The sample sequences that are similar to the search sequence are itemized and ordered according to their degree of similarity and are presented in pair-wise alignment.

From a "hit list" that is itemized in addition to the alignments, the user can select, by clicking on them, the sample sequences that are of special interest in this context. These sequences can then be directly compared together in a multi-alignment without further formatting (see also "align sequences"). The combination of "Search for similar samples" and the direct multi-alignment function permits a fast and reliable comparison of several sample sequences, with great convenience and a considerable saving of time for the user. Interesting mutation patterns become recognizable and obvious, for example for epidemiologic aspects or for the recognition of newer mutating sites, for example under the influence of therapy.
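Ranking stored sequences by their similarity to a search sequence can be sketched as below. The similarity measure used here, the ratio from Python's `difflib.SequenceMatcher`, is only a stand-in chosen to keep the example self-contained; the text does not say which alignment or scoring method the system uses. The function name `rank_similar` and the `min_similarity` threshold are likewise assumptions.

```python
from difflib import SequenceMatcher


def rank_similar(query: str, samples: dict, min_similarity: float = 0.0):
    """Return (sample_id, score) pairs ordered from most to least similar.

    `samples` maps a sample identifier to its nucleotide sequence.
    """
    scored = []
    for sample_id, sequence in samples.items():
        # autojunk=False avoids the popularity heuristic, which would
        # otherwise discard frequent characters in long sequences.
        score = SequenceMatcher(None, query, sequence, autojunk=False).ratio()
        if score >= min_similarity:
            scored.append((sample_id, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The resulting list plays the role of the hit list: the entries at the top are the candidates a user would pick for a subsequent multi-alignment.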

1.4 "Add sample sequence"

"Add sample sequence" serves to input and store new samples and their sequences in the IDNS™ system (see Fig. 18). New samples are furnished with a patient number, a laboratory number, a sample date, and a sequence date. These parameters can be individually customized and adapted to the laboratory's conditions. New sequences are copied directly out of the respective sequencing program as a text file and are inserted into the respective window.

Again, a comment can be added as free text. A further important option is the assignment of samples to particular studies, which can be freely defined by the user and are implemented by SmartGene on request. Studies create data subgroups out of the assigned entries of the sample databank, which can in turn be shared with other laboratories or managed separately. This function serves for the compilation of research programs or multi-center studies, without the need for the user to manage data twice. Samples or sequences that are assigned to certain studies are updated automatically for all users if the user who originally entered the sequence changes it afterwards.

1.5 "Edit sample sequence"

This function permits the user to sort samples and sequences according to certain criteria or to summarize them in groups (see Fig. 19). At the same time, samples can be organized into lists according to different criteria and printed out, and single samples can be edited and modified if necessary.

Editing sample sequences also permits the specific administration of studies as a subgroup of the sample databank. Editing functions, e.g. the modification of samples and sequences, can be restricted to certain user groups; external users can thereby also use the databank for clinical research purposes without jeopardizing the data.

1.6 "Delete sample sequence"

The "Delete sample sequence" function can be used to delete sample entries and also the associated sequences. This function can be blocked for unauthorized users to prevent unintentional manipulation and deletion of entries.
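Restricting the edit and delete functions to certain user groups can be sketched as a simple permission table. The group names (`lab_staff`, `external_researcher`), the action names and the functions `is_allowed` and `delete_sample` are hypothetical; the source states only that these functions can be restricted or blocked, not how.

```python
from typing import Dict, Set

# Hypothetical user groups and the actions each group may perform.
PERMISSIONS: Dict[str, Set[str]] = {
    "lab_staff": {"view", "add", "edit", "delete"},
    "external_researcher": {"view"},
}


def is_allowed(user_group: str, action: str) -> bool:
    """Return True if members of `user_group` may perform `action`."""
    return action in PERMISSIONS.get(user_group, set())


def delete_sample(databank: Dict[str, str], sample_id: str, user_group: str) -> None:
    """Remove a sample entry and its sequence, refusing unauthorized callers."""
    if not is_allowed(user_group, "delete"):
        raise PermissionError(f"group '{user_group}' may not delete samples")
    databank.pop(sample_id, None)
```

Unknown groups receive no permissions at all, so access has to be granted explicitly rather than revoked.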

1.7 "Align sequences"

The function "Align sequences" is used for the specific comparison of single sequences. Sequences are listed either individually through the "Align sequences" function or in a list, and are then selected with a mouse click, which makes them ready for alignment. The alignment sequence list can span several lists/pages, and single sequences can be removed from it again.

With this function, for example, all sequences of a patient or a patient group can be compared with one another, so that the development of resistance under therapy can be observed. This function is especially important for physicians and drug development because the mutation frequency under therapy and the probability of the development of drug resistance can be detected.
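Observing the development of mutations in one patient over time can be sketched as below. The example assumes that the sequences are amino acid strings already aligned to a reference of the same length and that dates are given as ISO strings; the function names `mutations` and `mutation_history` are hypothetical, and the source does not describe the alignment procedure itself.

```python
from typing import List, Tuple


def mutations(reference: str, sample: str) -> List[str]:
    """List substitutions as '<ref><position><sample>' for two pre-aligned sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{obs}"
        for pos, (ref, obs) in enumerate(zip(reference, sample), start=1)
        if ref != obs
    ]


def mutation_history(reference: str,
                     dated_samples: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Return (date, mutations) pairs in chronological order (ISO dates sort as text)."""
    return [(day, mutations(reference, seq)) for day, seq in sorted(dated_samples)]
```

Reading the resulting list from first to last entry shows which substitutions appear or persist between sampling dates.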

7 DATA STORAGE PROTECTION / BACKUP

All sample data that is handled on the secured IDNS™-server is automatically backed up on a second, separate hard-drive. After 12 hours, a safety back-up copy is generated on a secondary server. Monthly and on customer request (optional) a CD-ROM with the customer specific data is produced and sent to the customer. Several other options for increased data protection are available within the IDNS™ system.

1.1 Data transmission

On the IDNS™ system, no personal data is stored that could permit a trace back to a particular patient (name, address etc.). Data transmission is secured by HTTPS with 128-bit encryption and does not permit passwords etc. to be filtered out or accessed.

1.2 Back-ups and data storage

Without an intervention from the user, the IDNS™ system automatically performs back-ups on a second server hard-disc on a different computer. Every 24h, a copy of the complete database system is made to a server within another building, thus avoiding data loss and damage in case of physical destruction of a server. On request, IDNS™ users can get copies of their sample databases on CD-ROM for a minimal fee.

1.3 Access to data stored on an IDNS™ server

The IDNS™ access protection via personalized passwords does not give any one user general access to all data on the IDNS™ server, but restricts his access privileges to the data he is entitled to see. No person outside the technical staff of the system operator has larger access privileges; physical access to servers is restricted by safe server location in locked compartments and by specific passwords for the server maintenance staff.

Secured data and server access is achieved through the password system and its limit on access attempts; thus, hackers cannot enter the IDNS™ system simply by repeated login attempts. Firewalls to the Swiss academic network at the EPFL (Eidgenössische Technische Hochschule Lausanne/CH) and several firewalls within the system operator's network prevent unauthorized access to the internal network and its server management capabilities.
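The limit on access attempts can be sketched as a counter of consecutive failed logins per user. The class name `LoginGuard`, the limit of three attempts and the plain-text password table are assumptions made to keep the example short; the source gives no number of permitted attempts, and a real system would store salted password hashes.

```python
from typing import Dict

MAX_ATTEMPTS = 3  # assumed limit; the source does not state a number


class LoginGuard:
    """Counts consecutive failed logins per user and locks the account at the limit."""

    def __init__(self, passwords: Dict[str, str], max_attempts: int = MAX_ATTEMPTS):
        # Plain-text passwords are used only for brevity in this sketch.
        self._passwords = passwords
        self._max_attempts = max_attempts
        self._failures: Dict[str, int] = {}

    def login(self, user: str, password: str) -> bool:
        if self._failures.get(user, 0) >= self._max_attempts:
            return False  # locked: even the correct password is rejected
        if self._passwords.get(user) == password:
            self._failures[user] = 0  # a successful login resets the counter
            return True
        self._failures[user] = self._failures.get(user, 0) + 1
        return False
```

Once the limit is reached, further guesses are rejected without being checked, so repeated tries cannot succeed until the account is unlocked by some separate procedure not shown here.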

1.4 Data ownership

Sample data stored on IDNS™ servers belongs to the client who has deposited it; issues of data ownership with regard to patients and third parties are not relevant to the system operator. The system operator can access user data for database maintenance procedures and for internal technical development. SmartGene will not disclose data to third parties. In multi-center networks, users can only modify and access their own data; only in the case of a data-collection platform in a multi-center set-up will specifically entitled users from the study monitoring team acquire access to the complete study subsets.

1.5 Extended data-protection

As additional features for increased data-protection, IDNS™ can offer the following solutions:

• Regularly changing passwords

• Access restricted to certified IP addresses

• Use of SmartCard® technology instead of passwords

• Configuration and installation of a user-specific IDNS™ server, either at locations within the system or within the user's intranet.
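Of the options listed above, restricting access to certified IP addresses can be sketched with Python's standard `ipaddress` module. The function name `ip_is_certified` and the example address ranges (taken from the blocks reserved for documentation) are assumptions; the source does not describe how certified addresses are registered or checked.

```python
import ipaddress
from typing import Iterable


def ip_is_certified(client_ip: str, certified_networks: Iterable[str]) -> bool:
    """Return True if `client_ip` falls inside any of the certified networks."""
    address = ipaddress.ip_address(client_ip)
    # Each entry may be a single host ("198.51.100.7/32") or a whole range.
    return any(address in ipaddress.ip_network(net) for net in certified_networks)
```

A server would run such a check before the password prompt is shown, for example `ip_is_certified(request_ip, ["192.0.2.0/24"])`, where `request_ip` is a hypothetical variable holding the caller's address.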

Claims

PATENT CLAIMS

1. An environment for analysis of molecular sequences, comprising:

- a central computer system,

- at least one end user computer system,

- a network for data communications between the central computer system and the at least one end user computer system,

- a database associated to the central computer system and comprising at least one reference molecular sequence or data set, wherein

- the central computer system is adapted to store at least one sample sequence or data set communicated from the at least one end user computer system in the database for analysis with respect to the at least one reference molecular sequence or data set.

2. The environment of claim 1, wherein

- the central computer system is adapted to analyze a molecular sequence or a data set communicated from the end user computer system to the central computer system with respect to the at least one reference molecular sequence or data set.

3. The environment of one of the preceding claims, wherein

- the central computer system is adapted to control accesses of the at least one end user computer system to the database.

4. The environment of the preceding claim, wherein

- the database comprises at least one public reference molecular sequence or data set that can be accessed by any user via the at least one end user computer system, if the accessing user is registered with the central computer system.

5. The environment of the preceding claim, wherein

- the database comprises at least one personalized reference molecular sequence or data set that can be only accessed by a user via the at least one end user computer system, if the accessing user is registered with the central computer system as user associated to the at least one personalized reference molecular sequence or data set.

6. The environment of one of the preceding claims, wherein

- the central computer system is adapted to store a sample molecular sequence or data set communicated from the at least one end user computer system in the database as a personalized molecular sequence or data set for later retrieval by a user being registered as authorized as regards the stored personalized molecular sequence or data set via an end user computer system.

7. The environment of one of the preceding claims, wherein

- the central computer system is adapted to store a sample molecular sequence or data set communicated from the at least one end user computer system in the database as a public molecular sequence or data set for later retrieval by any user being registered as authorized as regards accesses to the database.

8. The environment of one of the preceding claims, wherein

- the central computer system is adapted to provide a front end interface to be displayed on the at least one end user computer system.

9. The environment of claim 8, wherein

- the central computer system is adapted to control the front end interface.

10. The environment according to the preceding claim, wherein

- the central computer system is adapted to provide the front end interface in line with requirements specified by a user only to an end user computer system utilized by that user.

11. The environment of one of the preceding claims, wherein

- the central computer system is adapted to control at least one of storing, accessing, analyzing and retrieving of at least one of the at least one reference molecular sequence or data set and sample molecular sequences or data sets communicated from the at least one end user computer system in line with user specified requirements.

12. A method for analysis of molecular sequences or data sets, comprising the steps of operating the system according to one of the claims 1 to 11.

13. The method of claim 12, comprising the steps of:

- providing a central computer system,

- providing at least one end user computer system,

- providing a network for data communications between the central computer system and the at least one end user computer system,

- providing a database associated to the central computer system and comprising at least one reference molecular sequence or data set, and

- storing by means of the central computer system at least one sample sequence or data set communicated from the at least one end user computer system in the database for analysis with respect to the at least one reference molecular sequence or data set.

14. The method of claim 12 or 13, comprising the step of

- analyzing by means of the central computer system a molecular sequence or data set communicated from the end user computer system to the central computer system with respect to the at least one reference molecular sequence or data set.

15. The method of one of the claims 12 to 14, comprising the step of

- controlling by means of the central computer system accesses of the at least one end user computer system to the database.

16. The method of one of the claims 12 to 15, comprising the steps of

- storing at least one public reference molecular sequence or data set in the database, and

- controlling accesses to the database such that the public reference molecular sequence or data set can be accessed by any user via the at least one end user computer system, if the accessing user is registered with the central computer system.

17. The method of one of the claims 12 to 16, comprising the steps of

- storing at least one personalized reference molecular sequence or data set in the database, and

- controlling accesses to the database such that the personalized reference molecular sequence can be only accessed by a user via the at least one end user computer system, if the accessing user is registered with the central computer system as user associated to the at least one personalized reference molecular sequence.

18. The method of one of the claims 12 to 17, comprising the step of

- storing by means of the central computer system a sample molecular sequence communicated from the at least one end user computer system in the database as a personalized molecular sequence or data set for later retrieval by a user being registered as authorized as regards the stored personalized molecular sequence or data set via an end user computer system.

19. The method of one of the claims 12 to 18, comprising the step of

- storing by means of the central computer system a sample molecular sequence or a sample data set communicated from the at least one end user computer system in the database as a reference molecular sequence or data set for later retrieval by any user being registered as authorized as regards accesses to the database.

20. The method of one of the claims 12 to 19, comprising the step of

- providing by means of the central computer system a front end interface to be displayed on the at least one end user computer system.

21. The method of claim 20, comprising the step of

- controlling by means of the central computer system the front end interface.

22. The method of claim 20 or 21, comprising the step of

- providing by means of the central computer system a front end interface in line with requirements specified by a user only to an end user computer system utilized by that user.

23. The method of one of the claims 12 to 22, comprising at least one of the steps of

- storing, accessing, analyzing and retrieving of at least one of the at least one reference molecular sequence or data set and sample molecular sequences or data sets communicated from the at least one end user computer system in line with user specified requirements under control of the central computer system.

24. A computer program product, comprising program code portions for at least one of controlling the system according to one of the claims 1 to 11 and carrying out the method according to one of the claims 12 to 23.

25. Computer program code according to claim 24, being stored in a computer readable device or on a computer readable storage means.