WO2013091709A1 - Method and apparatus for real-time dynamic transformation of the code of a web document

Info

Publication number: WO2013091709A1
Application number: PCT/EP2011/073824
Authority: WO
Inventors: Antonio Felguera Segador; David HERNANDO DAVALILLO; Marta PALANQUES VILALLONGA
Original assignee: Fundació Privada Barcelona Digital Centre Tecnologic
Priority date: 2011-12-22
Filing date: 2011-12-22
Publication date: 2013-06-27

Abstract

An Obfuscation Server, a corresponding method, and a computer readable medium are disclosed for real-time dynamic transformation of the code generated by a web server before it is delivered to the user device, the code being used to provide a web page, or web application, to at least one user of a computing device, thereby enhancing the security of the web application's data transmitted to and received from a user of at least one client device.

Description

METHOD AND APPARATUS FOR REAL-TIME DYNAMIC TRANSFORMATION OF THE CODE OF A WEB DOCUMENT

TECHNICAL FIELD

[001] The present invention relates generally to secure internet browsing, and in particular, to enhancing browsing security by transforming the morphology of a web application code.

BACKGROUND OF THE INVENTION

[002] In recent years, the Internet has grown to the extent that web applications now offer capabilities comparable to desktop applications and are used in many aspects of daily life, from gaming to e-banking. FIG. 1 depicts a standard architecture 100 for accessing a web application which may be used, for example, for navigating the Internet. To access the World Wide Web (WWW), a piece of software called a browser 111, installed on a user's computing device 110, such as a personal computer (PC), is executed. On execution, the browser 111 establishes a communication link 140 from the user's device 110 to a web server 120 hosted by any web application provider. The link 140 is established over a network 130, which is typically the Internet, but which can also be any sort of public or private network. The most commonly used browsers are Internet Explorer from Microsoft, Firefox from Mozilla, Chrome from Google, and Safari from Apple.

[003] Interaction between the browser 111 and the server 120 typically follows a series of steps that starts when the browser requests some contents, i.e. a specific web page. The server then generates and/or retrieves the requested contents, depending on whether they are dynamic or static, and sends them as a response to the browser. Finally, the browser processes and renders the received contents. To retrieve a web page, which is composed of a set of web documents, the user enters an address, or Uniform Resource Locator (URL), into the browser, triggering a request to be sent from the PC's browser to the web server via the Hypertext Transfer Protocol (HTTP). The web server 120 receives and processes the request and either retrieves the document from local storage or generates it dynamically. The web server 120 then transmits the document to the user's PC 110 via link 140. When the browser on the PC receives the document, it processes its contents and renders it on the display of the user device 110.

[004] Since the contents of the web page are, in general terms, plaintext source code and require processing by the browser before they are displayed, they can be analyzed and manipulated relatively easily. This is one of the most common ways in which malicious code, or malware, achieves its objectives.

[005] One protection typically applied to the channel established between the browser and the web application server is the use of ciphering protocols, so that information cannot be sniffed or modified in transit. However, once in the user device, the contents need to be deciphered for processing by the browser. Moreover, although cryptographic tools may enable confidentiality and verification of integrity, a security gap exists between the endpoints of the actual channel (the user and the server) and the endpoints of the secure channel offered by these technologies (the browser and the server). This means that ciphering protocols cannot protect data once it reaches the browser and is deciphered. As a consequence, contents are exposed to local manipulation in this gap, and it has become common for malware to exploit it. This is typically done with two main goals: to obtain confidential information and/or to modify the user's or the server's actions. Such malware is illegitimately installed on the user device and manipulates the browser, which in turn enables it to log or modify contents, whether they are about to be transmitted or have just been received. Therefore, a need exists to provide further security after deciphering occurs, both before and while the user interacts with the application.

[006] Various solutions have been proposed to try to correct or minimize the possibility of attacks happening and enhance a device's security. One such solution requires the user to periodically update its browsing software, the operating system, or any other software in the user device. Security is maximized by re-installing the full operating system together with every application. However, this has the drawback of being very time consuming and burdensome for the end user.

[007] Another standard solution is to use complementary security tools, such as antivirus, antispyware or online scanning of the device. However, even these additional security tools are not completely effective, as new malicious software is programmed to exploit weaknesses even in their latest versions.

[008] Additionally, it is worth noting that the average browser user is not sufficiently aware of the existing security risks whilst browsing. Often, potentially dangerous sites are browsed and programs are run or downloaded from the Internet, or received by e-mail, without the user even suspecting that their device could become infected. In other words, the average user is not concerned about the device's security, but still needs a secure environment.

[009] Furthermore, even if such users were security aware, they typically do not have enough computing knowledge to keep their devices protected. Even when a computing device already has all the security software installed, it is a known problem that many users do not apply the available updates for their operating systems, web browsers, or antivirus software. Even users with such knowledge do not want to be burdened with standard security maintenance of their IT devices, and prefer such updating and security management to be performed for them, for example by the administrator.

[0010] The combination of these facts together with the browser's inherent vulnerabilities poses a great risk for devices and, more specifically, for a user's access to web applications. If a machine becomes infected, it becomes a hostile environment in which the continued execution of applications, including the browser itself, could result in irreparable harm. In such Internet navigation scenarios the browser is in fact one of the weakest security points in the whole navigation process. Although the user is responsible for the browser's security and, more generally, for the device's security, this situation has consequences for both the user and the web application administrator. The administrator therefore has an interest in ensuring that browsing sessions are properly secured. This would result in a safer IT environment as well as savings in overall resources, as time is not wasted dealing with malware.

[0011] The fact that source code is processed in the user device also opens up the possibility of malicious interaction with the web server, in addition to attacks centred on manipulation of the user environment. Among other things, a malicious user is able to interact automatically with a web application via a bot or malware. These bots can be used for a wide range of activities, including automatic crawling of web pages in search of sensitive information, automatic execution of brute force attacks, and systematic requests for resources in order to waste the server's resources and reduce its overall performance.

[0012] All the attacks presented above are normally based on reverse engineering a web site's structure, in order to understand its internal details and define an attack which can exploit weak points. For instance, a typical attack could target a financial application in order to divert legitimate money transfer operations to a different account. To that end, the attacker first automatically changes the account number in a form. Crafting such an attack requires analysis of the source code of the web page showing the form, in order to identify the variable to be changed, and of the corresponding confirmation screen, where the opposite substitution has to be made to prevent the user from noticing the attack. This task is usually made easier by human factors related to code development habits.

[0013] For the reasons explained above, a need exists for enhancing the security of web applications in order to minimise the risk of infection. However, such mechanisms should operate without affecting a user's navigation experience and without requiring specific actions from an average user with low security awareness and knowledge.

SUMMARY

[0014] It is therefore an object of the present invention to provide solutions to the above mentioned problems.

[0015] In particular, it is the object of the present invention to provide an apparatus for enhancing the security of a web application's data transmitted to and received from a user of at least one client device.

[0016] It is another object of the present invention to provide a method for enhancing the security of a web application's data transmitted to and received from a user of at least one client device.

[0017] It is another object of the present invention to provide a computer readable medium configured to store instructions which, when executed on the apparatus, perform a method for enhancing the security of a web application's data transmitted to and received from a user of at least one client device.

[0018] The invention provides methods and devices that implement various aspects, embodiments, and features of the invention, and are implemented by various means. For example, these techniques may be implemented in hardware, software, firmware, or a combination thereof.

[0019] For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.

[0020] For a software implementation, the various means may comprise modules (e.g. procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory unit and executed by a processor. The memory unit may be implemented within the processor or external to the processor.

[0021] Various aspects, configurations and embodiments of the invention are described. In particular the invention provides methods, apparatus, systems, processors, program codes, and other apparatuses and elements that implement various aspects, configurations and features of the invention, as described below.

BRIEF DESCRIPTION OF THE DRAWING(S)

[0022] The features and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify corresponding elements in the different drawings. Corresponding elements may also be referenced using different characters.

[0023] FIG. 1 is a general overview of an Internet navigation system of the prior art.

[0024] FIG. 2 depicts the Obfuscation Server and web application server interaction according to the main embodiment of the invention.

[0025] FIG. 3 depicts a detailed view of the Obfuscation Server.

[0026] FIG. 4 depicts a process for code transformation determination.

[0027] FIG. 5 depicts a process implemented by the Parameter Translation Modules according to one aspect of the invention.

[0028] FIG. 6 depicts the information flow and exchange according to the main embodiment of the invention.

[0029] FIG. 7 depicts the steps performed by the Session Object upon receiving a web page request.

[0030] FIG. 8 depicts a method of delivering an obfuscated document.

[0031] FIG. 9 depicts a complementary method of delivering an obfuscated document applied to user requests.

[0032] FIG. 10 depicts the steps performed during document caching according to a first aspect of the invention.

[0033] FIG. 11 depicts the steps performed during document caching according to a second aspect of the invention.

[0034] FIG. 12 depicts the caching process when a document with static contents is requested for the first time.

[0035] FIG. 13 depicts the caching process when a document with static contents is requested again.

DETAILED DESCRIPTION OF THE INVENTION

[0036] In the following the words "web", "WWW", or "Internet" may be used interchangeably, as they refer to the same entity, which represents the network of interconnected computing devices commonly known in the art.

[0037] The term "web document" refers to the data files hosted on diverse computing devices on the Internet and which are served to end users by transmission to their computing devices so that they can be displayed for viewing on the user device's display.

[0038] The term "web page" refers to a set of web documents which, when processed together, result in one screen display. These documents may follow different formatting; moreover, they can use different coding languages.

[0039] The term "web application" will be used to refer to a set of related web pages that offer in conjunction a certain functionality to the user.

[0040] The term "browser" refers to the software, computer program, or application, which permits the content files received to be displayed on the user device's display. The browser typically performs a number of data processing actions for converting the received data file to a format ready for display. In addition, the browser responds to user interaction with the web page executing scripts or requesting a new web page.

[0041] The term "malware" will be used to refer to any code, such as software code or a computer program, which is hosted on a legitimate user's device and which executes actions to the detriment of the host, thereby exhibiting malicious behaviour. From the following description, it will be understood by the person skilled in the art that although any one preferred aspect of the invention already provides solutions to at least some of the problems of the devices and methods of the prior art, the combination of multiple aspects herein disclosed results in additional synergistic advantageous effects over the prior art, as will be described in the following.

[0042] Obfuscation techniques consist of applying a series of transformations to source code such that it becomes unintelligible and hence harder to reverse engineer, while the visual appearance and functionality of the code are maintained. These latter characteristics are necessary to make this solution transparent to the user. Moreover, one distinguishing feature of obfuscation, when compared to encryption, is the fact that contents do not need to be de-obfuscated for execution, interpretation and/or use. On the contrary, the transformations applied are such that obfuscated code can be executed and works properly without any need to transform it back to the original code. These characteristics have been identified and exploited in the present invention.

[0043] The present invention provides an Obfuscation Server 220 and a corresponding method for transforming, in real time, the code generated by a web server before it is delivered to the user device, the code being used to provide a web page, or web application, to at least one user of a computing device. Since the invention is configured to act as an intermediate layer between the client device and the web server, similar to a proxy, it advantageously provides transparency to both of them.

[0044] In this intermediate layer transformations are applied to the original code such that resulting code, transmitted over the network and interpreted by the browser, is different from the original one but maintains both functionality and visual appearance. As a result, the user does not notice any differences between an original web page and the transformed version. Moreover, as described later, these changes also have an effect on user requests generated in the obfuscated web page. These changes are intended to complicate a malware's task of automatically analyzing and/or modifying the contents exchanged between a user and a web application server 120.

[0045] In one embodiment the Obfuscation Server may be integrated within web application servers which receive web page requests, generate content therefrom, and provide the contents in return to the client devices. In another embodiment the Obfuscation Server may be implemented external to web application servers, interacting with them as necessary to provide the advantageous functionalities of the invention. In yet another embodiment, some functionalities of the Obfuscation Server may be partially integrated within web application servers, and the remaining functionalities may be implemented external to these web application servers.

[0046] FIG. 2 depicts the main components of a web server 120 necessary to manage the data flow from a document's request to its delivery. In general terms, a request for a web application is normally transmitted by a computing device as an HTTP request comprising a uniform resource locator (URL). Once this request is received by the web server 120, it is pre-treated by an input processor 211. Once the input is ready, the content generator 212 creates the web page document to be sent to the user device 110 in response, as a function of the requested URL and the parameters comprised within it. This document is then processed by the output processor 213, where it is treated again before it is sent to the user. The input and output processors perform a set of operations intended to ease operation of the content generator, for instance, packet header manipulation. Moreover, these processors can be modified to add new and customized operations, also called filters.

[0047] FIG. 2 also depicts the integration of the Obfuscation Server 220 with the web server 120 according to an embodiment of the invention. As can be seen, the Obfuscation Server 220 is placed in parallel to the web server, as it interacts with both the input and the output end of the server 120. The Obfuscation Server 220 performs transformations by means of a Translation Module 222 and an Obfuscation Module 223. In general terms, the Obfuscation Module interacts with the web server's output processor in order to transform contents before they are output by the web server. Correspondingly, the Translation Module intervenes at the input processor to process the received requests in order to reverse transform obfuscated code or requests.

[0048] In one embodiment, in order to implement communication between the web server and the Obfuscation Server, new filters are added to both processors, implementing interception of requests and responses. These filters might communicate with the Obfuscation Server in the same machine or remotely, with an independent server hosting the Obfuscation Server.

[0049] Once intercepted, contents are forwarded to the Obfuscation Server, where they are modified as needed, and they are then returned to the processor. An Obfuscation Core 221 is provided to coordinate the Translation and Obfuscation Modules, as will be explained in more detail in the following.
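
By way of non-limiting illustration, the interception described in paragraphs [0046] to [0049] may be sketched as below. This is a minimal Python sketch under stated assumptions, not the disclosed implementation: the names Processor, ToyObfuscationServer, content_generator and handle, the alias "f_9c1" and the string-replacement logic are all hypothetical placeholders chosen only to show where the two filters sit.

```python
from typing import Callable, List

Filter = Callable[[str], str]


class Processor:
    """Hypothetical stand-in for the input processor 211 or output processor 213."""

    def __init__(self) -> None:
        self.filters: List[Filter] = []

    def add_filter(self, new_filter: Filter) -> None:
        self.filters.append(new_filter)

    def process(self, payload: str) -> str:
        # Apply every registered filter in order.
        for current in self.filters:
            payload = current(payload)
        return payload


class ToyObfuscationServer:
    """Placeholder for the Obfuscation Server 220: one fixed alias, no real parsing."""

    def __init__(self) -> None:
        self.alias = {"amount": "f_9c1"}  # assumed original -> obfuscated name
        self.original = {v: k for k, v in self.alias.items()}

    def obfuscate(self, document: str) -> str:
        # Role of the Obfuscation Module: rewrite outgoing field names.
        for name, alias in self.alias.items():
            document = document.replace(f'name="{name}"', f'name="{alias}"')
        return document

    def translate(self, request: str) -> str:
        # Role of the Translation Module: restore original names in requests.
        for alias, name in self.original.items():
            request = request.replace(f"{alias}=", f"{name}=")
        return request


def content_generator(request: str) -> str:
    # Stand-in for the content generator 212; ignores the request details.
    return '<form><input name="amount"></form>'


server = ToyObfuscationServer()
input_processor, output_processor = Processor(), Processor()
input_processor.add_filter(server.translate)    # interception of requests
output_processor.add_filter(server.obfuscate)   # interception of responses


def handle(request: str) -> str:
    translated = input_processor.process(request)
    return output_processor.process(content_generator(translated))


page = handle("GET /transfer")
restored = input_processor.process("f_9c1=100")
```

In this sketch the content generator never sees an obfuscated name, and the browser never sees an original one. The filters could equally forward to a remote process, as paragraph [0048] contemplates.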

[0050] An inherent characteristic of how web applications are operated is that, due to user interaction, the web page contents are dynamically generated for one specific session and user at a time. Therefore, in order to cater for this need, and for any obfuscation to be of practical utility, transformations need to be applied to the code in real time, upon demand of the contents. Prior art obfuscation techniques are all implemented during code development, that is, prior to compilation of an executable file. However, in a web application obfuscation scenario, real-time operation poses many challenges to the system. For instance, the present invention needs to be capable of serving a similar number of users accessing the web application as without obfuscation. Furthermore, while providing a solution to this aspect, it also cannot have detrimental effects on the web server's performance or on the user experience.

Obfuscation Module

[0051] The Obfuscation Module 223 receives the contents generated by the web server, obfuscates them and sends the result back to the output processor 213 for it to be further treated and finally delivered to the user device. Since transformations and their application partially depend on the type of code that is being analyzed, a specific Obfuscation Module is required for each of these types of code. The Obfuscation Server 220 therefore will have at least one Obfuscation Module 223, but it is preferred to implement many such Modules to enable universal transformation when combining many code types. Different code types are, for example, Hypertext Markup Language (HTML), JavaScript and Cascading Style Sheets (CSS) code. Therefore, in one aspect of the present embodiment, these components of the server's generated content (that is, a web page) will be processed separately. However, since cross-references may exist between different codes, consistency needs to be assured across transformations to maintain correctness and functionality of the resulting web page. Therefore, in another aspect of this embodiment, the Obfuscation Core 221 is in charge of maintaining coherence across changes to every kind of code, although they are obfuscated by different modules.

[0052] Although every type of code needs to be treated in a different manner, it is desirable to have one single algorithm for processing all codes, independently of the coding language under analysis. This allows simple development and integration of new Obfuscation Modules for other coding languages if necessary. It also allows easy addition of new transformation algorithms and enables development and maintenance of the transformations independently of the language.

[0053] FIG. 3 depicts the Obfuscation Modules 223 of FIG. 2 according to one embodiment of the invention. In order to provide universal and modular obfuscation, each Obfuscation Module is divided into three main components, namely a Code Parser, a Transformer and a Serializer. When a piece of source code is received for obfuscation, the Code Parser first analyzes its content and transforms it into an Abstract Syntax Tree (AST) object, which contains all the web document information within the code itself. This allows easy navigation, extraction and manipulation of the elements of the code. This object is input to the Transformer 312, which comprises the obfuscation algorithms and iteratively applies transformations to the object.

[0054] In one embodiment the obfuscation process is advantageously applied by the transformers in many steps, each of them sequentially and independently applying one specific transformation. This stepped transformation simplifies the obfuscation process and thus enhances performance. It also provides additional randomness, as explained later. The output of the Transformer 312 is a modified AST object that needs to be converted back to source code in the original web document language. This task is implemented by the Serializer, which generates plain text source code to be sent back to the output processor 213 for delivery to the user.
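
The Code Parser, Transformer and Serializer chain of paragraphs [0053] and [0054] may be illustrated with the following minimal sketch. It is an assumption-laden stand-in: it uses Python's standard `ast` module on Python source (Python 3.9 or later for `ast.unparse`), whereas the disclosure targets HTML, JavaScript and CSS. The class RenameVariables, the function obfuscate and the replacement names are hypothetical.

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """One transformation step: rename identifiers according to a mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Names absent from the mapping are left unchanged.
        node.id = self.mapping.get(node.id, node.id)
        return node


def obfuscate(source, steps):
    tree = ast.parse(source)           # Code Parser: text -> AST object
    for step in steps:                 # Transformer: one step at a time
        tree = step.visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)           # Serializer: AST object -> text


original = "total = price * quantity\nresult = total + 1"
steps = [RenameVariables({"total": "a1", "price": "b2", "quantity": "c3"})]
obfuscated = obfuscate(original, steps)
```

Because each step receives and returns a tree, further transformations can be appended to `steps` without touching the parser or the serializer, which is the modularity the paragraphs above describe.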

[0055] As FIG. 3 shows, one of the Obfuscation Modules, the main Obfuscation Module 310 has a different function from the rest, as it acts as a code distributor, channelling different codes to the remaining Obfuscation Modules 320. The main Obfuscation Module 310 is determined depending on the document type under analysis. In the figure, for instance, an HTML document is received and hence the main Obfuscation Module chosen is an HTML Obfuscation Module. At this point the main Obfuscation Module 310 receives a document and starts parsing it. During this process it identifies and extracts code blocks written in different coding languages, through identification of tags, and passes these to the corresponding secondary Code Obfuscators 320 instead of including them in the Abstract Syntax Tree object that it is generating. As an example, FIG. 3 depicts two other obfuscators, one for JavaScript code obfuscation and the other for CSS code obfuscation. Each obfuscator builds an independent AST, which is then separately transformed, and serialized. Finally, the outcome of the JavaScript and CSS Serializers 322 is collected by the main Obfuscation Module 310 which integrates these in the main obfuscated document. The person skilled in the art will understand that any number of Obfuscation Modules, as well as Code Parsers, Transformers, and Serializers may be implemented, dependent on the number of different code types.
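
The distributor role of the main Obfuscation Module 310 may be sketched as follows, purely for illustration. The sketch assumes that embedded code is delimited by `script` and `style` tags and locates them with a regular expression rather than a real HTML parser. The functions obfuscate_js, obfuscate_css and main_html_module and their string replacements are hypothetical placeholders for the secondary modules 320.

```python
import re


def obfuscate_js(code):
    # Placeholder secondary module; a real one would parse, transform, serialize.
    return code.replace("accountNumber", "_v1")


def obfuscate_css(code):
    # Placeholder secondary module for CSS.
    return code.replace(".balance", "._c1")


SECONDARY = {"script": obfuscate_js, "style": obfuscate_css}

# Matches a whole <script>...</script> or <style>...</style> block.
BLOCK = re.compile(r"<(script|style)\b([^>]*)>(.*?)</\1>", re.DOTALL | re.IGNORECASE)


def main_html_module(document):
    def dispatch(match):
        tag, attrs, body = match.group(1), match.group(2), match.group(3)
        transformed = SECONDARY[tag.lower()](body)   # hand off to secondary module
        return f"<{tag}{attrs}>{transformed}</{tag}>"  # reintegrate the result

    return BLOCK.sub(dispatch, document)


page = (
    "<html><style>.balance{color:red}</style>"
    "<body><p>accountNumber</p>"
    "<script>var accountNumber = 1;</script></body></html>"
)
result = main_html_module(page)
```

Only the code inside the tagged blocks is handed to a secondary module; the visible text of the page is untouched. Keeping names coherent across HTML, JavaScript and CSS is left out here, since the disclosure assigns that task to the Obfuscation Core 221.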

[0056] In another aspect of the Obfuscation Server, in order to enhance obfuscation security, many transformations are sequentially applied to the same document and they can be randomly chosen from a set of transformations available. This combination enhances randomness of the obfuscated output. Furthermore, different implementations of the same transformations can be available, such that the transformation concept is the same, but different implementations of it can lead to different results. Hence, each Obfuscation Module has at its disposal a set of transformation types and for every type a set of transformation algorithms may be applied.

[0057] The transformations to be applied by the plurality of Obfuscation Modules are selected by a Session Object comprised within the Obfuscation Core, which is independent of the plurality of Obfuscation Modules. FIG. 4 depicts a process 400 implemented by the Session Object according to one embodiment of the invention for determining which transformations to apply to a particular element of the AST. In step 410 a transformation type is selected and, according to this selected type, a determination 420 is made as to which algorithm to choose from among the available transformation algorithms. The selected transformation is then applied 430 by the respective Transformer to the code. At this point a determination 440 is made as to whether the obfuscation process has been completed. This decision may be taken depending on many criteria, as will be further explained. If not, the process returns to step 410, where a new transformation type and algorithm are selected and applied. If the result of determination step 440 is positive, the complete obfuscated code is output.
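
The loop of process 400 may be sketched as below. This is an illustrative sketch only: the registry contents, the function name process_400, the use of plain string replacement instead of AST transformations, and the fixed iteration budget standing in for determination 440 are all assumptions.

```python
import random


def rename_short(code):
    return code.replace("amount", "a")


def rename_long(code):
    return code.replace("amount", "zq_7f")


def const_hex(code):
    return code.replace("23", "0x17")


def const_sum(code):
    return code.replace("23", "(10+5+11-3)")


# Hypothetical registry: transformation type -> available algorithms.
REGISTRY = {
    "variable_name": [rename_short, rename_long],
    "constant_value": [const_hex, const_sum],
}


def process_400(code, rng, max_iterations=4):
    applied = []
    for _ in range(max_iterations):                  # determination 440 (assumed budget)
        chosen_type = rng.choice(sorted(REGISTRY))   # step 410: select a type
        algorithm = rng.choice(REGISTRY[chosen_type])  # determination 420
        code = algorithm(code)                       # step 430: apply
        applied.append((chosen_type, algorithm.__name__))
    return code, applied


source = "amount = 23\ndoubled = amount * 2"
output, applied = process_400(source, random.Random(0))
```

Whichever algorithms are drawn, the value computed for `doubled` is the same as in the original, which mirrors the requirement that functionality be preserved.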

[0058] Theoretically speaking, even such pseudo-random processes could be subject to malicious attacks. If the algorithm used to generate the pseudo-random process is reproduced, for example by ascertaining the correct generation seed, it is possible to mimic the random output produced by the process. An attacker could thus eventually predict future transformation replacements and attempt to automate parsing of the obfuscated code by looking for the predicted transformation results.

[0059] Therefore in order to provide a further security enhancement, in one aspect of this embodiment, the transformation determination process 400 can be configured to select types and algorithms randomly. For instance, the Session Object can be configured to go through the list of types in a specific order, and apply one algorithm for each one sequentially. In another aspect a specific set of types and algorithms are randomly selected.

[0060] This process adds uncertainty in the code mimicking process as layers of randomness are added to a regular transformation. Different pseudo-random algorithms could be applied within the same transformation type. This further increases difficulty when reverse engineering the web application and complicates the malware's task of implementing automatic mechanisms for extracting or manipulating information.

[0061] In another aspect of this embodiment the web page provider's security and performance requirements can be taken into account. Since code complexity is increased in each iteration of the transformation determination process 400, the application of previous transformations complicates the task of the Obfuscation Module 223 in performing future iterations. Hence executing the transformations following a specific pattern appreciably lightens the task of obfuscation. Therefore, in order to optimise the use of resources, a deterministic application of transformations is possible so as to reduce the time needed to obfuscate a page. This eventually results in an overall reduction of the Obfuscation Server's load and resource consumption.

[0062] In another aspect of this embodiment, a trade-off exists between, on one hand, the enhanced security provided by a more random and less deterministic approach to transformation determination and, on the other hand, the optimisation of resource usage provided by a less random and more deterministic approach. Therefore, on a case by case basis, the Session Object can determine whether performance or security is more relevant, and select the applicable level of randomness accordingly. In another aspect, the Session Object can be configured to apply transformations until a certain number of iterations have been completed. This implementation has the advantage of providing a constant load for the Obfuscation Server. In yet another aspect, the Session Object can be configured to apply transformations until a certain complexity overhead is reached, as determined by a pre-evaluated threshold. The advantage of this aspect is that the Obfuscation Server's load can be dynamically set depending on other criteria, for example, a desired complexity level. In a different aspect the Session Object may determine a desired balance between the load produced by transformation processing, on one hand, and the load produced by the number of simultaneously operating Obfuscation Modules, on the other, thereby balancing performance and security.
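
The two completion criteria of paragraph [0062] (a fixed number of iterations, or a complexity overhead threshold) may be sketched as interchangeable policies. The sketch below is illustrative and rests on assumptions: complexity overhead is approximated by growth in code length, which the disclosure does not specify, and the names stop_after_iterations, stop_at_overhead, run and pad are hypothetical.

```python
def stop_after_iterations(limit):
    # Constant-load policy: always the same number of transformation steps.
    def done(original, current, iterations):
        return iterations >= limit
    return done


def stop_at_overhead(max_ratio, hard_limit=100):
    # Threshold policy. Overhead is approximated by length growth (assumption);
    # hard_limit guards against transformations that never grow the code.
    def done(original, current, iterations):
        overhead = len(current) / max(len(original), 1)
        return overhead >= max_ratio or iterations >= hard_limit
    return done


def run(code, transform, done):
    current, iterations = code, 0
    while not done(code, current, iterations):
        current = transform(current)
        iterations += 1
    return current, iterations


def pad(code):
    # Placeholder transformation: appends a no-op expression statement.
    return code + "\n0"


fixed_out, fixed_n = run("x = 1", pad, stop_after_iterations(3))
grown_out, grown_n = run("x = 1", pad, stop_at_overhead(2.0))
```

Passing the policy in as a function keeps the loop itself unchanged when the Session Object switches between a constant-load and a threshold-driven configuration.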

[0063] In one aspect of the invention the number of types of transformation algorithms is limited to five, and all transformations are grouped in one of these five types. A first type is the Variable Name Transformation, wherein variable identifiers are replaced with random names so that they, and especially their purpose, are difficult to deduce.

[0064] However, an attacker could identify the above mentioned variable through characteristics other than its name. For instance, constant values can be indicators of a variable's purpose. Therefore a second type of transformation is proposed, consisting of Constant Values Modification. In this second type the changes are applied to forms and JavaScript code, and take advantage of the possibility of coding the same value using different methods; for instance, a constant value 23 can also be coded as a formula (10+5+11-3) or as hexadecimal (0x17), among others. This results in the value being less recognizable. Constant strings, for example a URL, may be transformed similarly.
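
The first two transformation types may be illustrated by the following sketch, which generates a random identifier and re-encodes an integer constant as hexadecimal or as an arithmetic formula, in the manner of the 23, (10+5+11-3), 0x17 example above. The function names and the choice of three random terms are assumptions made for illustration. The emitted expressions use only syntax common to Python and JavaScript.

```python
import random
import secrets


def random_identifier(used):
    # Variable Name Transformation: draw a fresh name that reveals no purpose.
    while True:
        name = "_" + secrets.token_hex(4)
        if name not in used:
            used.add(name)
            return name


def as_hex(value):
    return hex(value)


def as_sum(value, rng, terms=3):
    # Pick random terms, then add a signed correction so the total is exact.
    parts = [rng.randint(1, 50) for _ in range(terms)]
    correction = value - sum(parts)
    expression = "+".join(str(p) for p in parts) + f"{correction:+d}"
    return f"({expression})"


def encode_constant(value, rng):
    # Constant Values Modification: choose one of the available encodings.
    return rng.choice([as_hex(value), as_sum(value, rng)])


used_names = set()
first = random_identifier(used_names)
second = random_identifier(used_names)
encoded = encode_constant(23, random.Random(1))
```

Each call may yield a different spelling of the same value, so a search for the literal `23` in the delivered code is not guaranteed to succeed.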

[0065] Other characteristics exist that might reveal the purpose of a specific part of the code, such as formatting and layout of the source code. To complicate the reverse engineering of the source code format, in another aspect of the invention Instruction Homomorphism is applied in a third type of transformation. Modifications under this category replace instructions with other equivalent ways to code the same action, and hence they depend strongly on the language that is being processed. The aim of these substitutions is to prevent searches for regular expressions that could be used as reference for location of variables (or for inserting malicious code) inside the code.

[0066] In a fourth type, Form Structure Modification is proposed, which enables alteration of a form's structure such that the layout will vary in every resulting piece of code. These changes comprise reordering of fields, addition of dummy fields and division of the form into several pieces. However, it should be considered that only hidden fields can be reordered or added as dummy fields, to avoid effects on the page's visual appearance.
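The restriction to hidden fields can be illustrated with the following sketch, in which a form is modelled as a list of dictionaries. This data model, the function name and the dummy naming scheme are assumptions for illustration only; splitting the form into several pieces is not covered.

```python
import random


def modify_form(fields, rng, dummy_count=2):
    """Reorder hidden fields and add dummy hidden fields.

    Visible fields keep their relative order, so the rendered page is
    unchanged. Returns the new field list and the set of dummy names,
    which must be remembered so the dummies can be deleted on translation.
    """
    existing = {f["name"] for f in fields}
    dummy_names = set()
    while len(dummy_names) < dummy_count:
        candidate = f"_d{rng.getrandbits(32):08x}"
        if candidate not in existing:
            dummy_names.add(candidate)

    visible = [f for f in fields if f["type"] != "hidden"]
    hidden = [f for f in fields if f["type"] == "hidden"]
    hidden += [{"name": n, "type": "hidden", "value": ""} for n in sorted(dummy_names)]
    rng.shuffle(hidden)

    result = list(visible)
    for field in hidden:
        # Hidden fields are not rendered, so any position is acceptable.
        result.insert(rng.randint(0, len(result)), field)
    return result, dummy_names
```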

[0067] Finally, the fifth type of transformation proposed is Change of Function Prototypes, which also hides variables by concealing their use in function calls and by altering the code's layout. In this case, the number and order of the parameters used in function calls can be changed, making analysis more difficult.
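A sketch of the parameter reordering part of this fifth type is given below. It assumes that a function definition and each of its calls are available as lists of parameter names and argument expressions respectively; the helper names are hypothetical, and changing the number of parameters is not covered.

```python
import random


def choose_parameter_order(param_count, rng):
    """Return a random permutation; order[i] is the original index placed at position i."""
    order = list(range(param_count))
    rng.shuffle(order)
    return order


def reorder(items, order):
    """Apply the same permutation to a definition's parameters or a call's arguments."""
    if len(items) != len(order):
        raise ValueError("permutation length does not match the item count")
    return [items[i] for i in order]
```

Because the same permutation is applied to the definition and to every call, each argument still binds to the parameter it was originally intended for.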

[0068] As explained, these transformation types have the overall effect of confusing one specific type of information or complicating one concrete step in the reverse engineering process. Grouping of algorithms in transformation types allows configuration of rules that may, for instance, assure that at least one algorithm is applied under every type, such that every kind of information is eventually obfuscated. An additional advantage of this grouping is that it makes the transformation determination process more efficient as setting a specific order to the obfuscation process is simplified.

Translation Module

[0069] As already described, transformations applied to a particular web page's code also have an impact on the format and content of subsequent incoming web page requests from the user. Hence web page requests are likewise protected in the same, or a corresponding, manner as the original web page. This feature prevents malware from automatically collecting and analysing sensitive data transmitted from the user to the web application server. It is common practice for malware code to register every request originated by a browser and perform a search for some specific variable name, such as 'password'.

[0070] Similarly, these malware attacks attempt to modify web page requests systematically. A typical scenario for such attacks is an e-banking application wherein a user requests a bank transfer. The user would fill in a form with the details of the operation, such as the amount of money to transfer and the source and destination account numbers. An attacker could systematically search for a variable representing the 'destination account number', and replace it with another account number of his choice, thereby re-routing the money transfer to himself. This fraudulent operation would need to be complemented with changes to the verification web page, and hence the attacker would need to modify the incoming response to the mentioned form. The obfuscation techniques described above make it more difficult to hide the attack; however, they cannot completely stop its execution.

[0071] In one aspect of the present invention, user requests are obfuscated as a result of the obfuscation of web pages. For instance, when the name of a variable is transformed in the original web page code, requests generated in response use the already obfuscated variable name instead of the original one. In fact, the browser does not even know the original variable name since it is never transmitted to it. As a consequence, obfuscation of the web page also yields obfuscated requests from the user in return.

[0072] Reverting to the embodiment of FIG. 2, in another aspect of the invention, these obfuscated requests are de-obfuscated by the Obfuscation Server before they enter the content generation phase. This ensures transparency with the web application server, which does not need to deal with details of code reverse transformation processing. The de-obfuscation is performed by the Translation Module 222 when it is requested by the input processor 211, so that requests are reverse transformed before they reach the content generator 212.

[0073] Since reverse transformations, or translations, are intended to undo changes previously made, no decisions are taken by the plurality of Translation Modules. Moreover, since transformations are pre-defined by the Obfuscation Modules, rather than created by the translators, the latter only need to retrieve information of previously made changes and apply the opposite modification. As a consequence, these modules do not need any knowledge on the code's structure. Furthermore, given these circumstances, different modules for every language are not necessary. This provides the advantage of simplicity in design and reduces the resource requirements for implementing the at least one Translation Module.

[0074] In another aspect of this embodiment, Translation Modules are divided based on the tasks they perform. A user request comprises, in general terms, two types of obfuscated information: the requested URL and a series of filled in fields, that is, a set of parameters with values that have been assigned based on the user's input to the form. Thus, not all of the obfuscation transformations are present in the returned request. In fact, only a part of the obfuscation transformations previously presented have an impact on future requests. More specifically, changes to be undone by the Translation Modules include modification of URLs, changes to both variable names and their content formatting, substitution of constant strings and changes to the form's structure, such as insertion of dummy fields or splitting into many forms. The Obfuscation Core 221 provides the Translation Modules with the necessary data for reverse transformation processing, such as the correspondence between original names and the transformed replacements. Hence URLs, variable names and constant values are translated by substituting the obfuscated values with the original ones as dictated by the Obfuscation Core. On the other hand, translations of variable formatting require transforming the received values into the expected format before replacing the value by its original. Translation of dummy fields is straightforward as it comprises a dummy field deletion operation.
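The parameter-related reverse transformations listed above (name substitution, value format restoration and dummy field deletion) may be sketched as follows. The function signature and the way the recorded obfuscation data is passed in (a name map, a set of dummy names and optional per-field decoders) are assumptions made for this example only.

```python
def translate_parameters(received, name_map, dummy_names, decoders=None):
    """Undo recorded transformations on a table of request parameters.

    received:    dict of obfuscated name -> received value
    name_map:    dict of obfuscated name -> original name (recorded on obfuscation)
    dummy_names: obfuscated names of dummy fields, which are simply deleted
    decoders:    optional dict of obfuscated name -> callable restoring the
                 expected value format
    """
    decoders = decoders or {}
    original = {}
    for obfuscated_name, value in received.items():
        if obfuscated_name in dummy_names:
            continue  # dummy field deletion
        decode = decoders.get(obfuscated_name)
        if decode is not None:
            value = decode(value)  # restore the expected format first
        # A KeyError here means the name was never recorded for this session.
        original[name_map[obfuscated_name]] = value
    return original
```

Note that the translator takes no decisions of its own: it only looks up what was recorded and applies the opposite modification.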

[0075] However, reconstruction of a form after it has been split into many parts poses some challenges and restrictions on the implementation of the Translation Modules. Firstly, parameters received are spread across the different split requests and need to be gathered before they are delivered to the content generator 212. Secondly, in order to maintain functionality and avoid errors, all requests need to receive a response, even if they are actually intended to generate only one petition to the content generator. Moreover, only one of these responses needs to be displayed to the user.

[0076] Hence, since the behaviour of the Translation Modules varies depending on the information under translation, in another aspect of reverse transformation processing two types of Translation Modules are provided, namely a URL Translation Module and a Parameter Translation Module. This ensures correct translation both when the input is a URL and when the input consists of parameters which need to be translated. Therefore, from every request, the URL is extracted on one hand, and a table containing all its parameters and their values on the other. This data is then separately provided to the respective Translation Modules, which perform the reverse transformation. Once both types of information have been translated, the input processor delivers the resulting request to the content generator 212.

[0077] In order to cater for these different eventualities, in another aspect of the present invention, the obfuscated requests resulting from a split form are classified into two types comprising a single primary request and one or many secondary requests. The classification is performed by the Obfuscation Core 221 in correspondence with the Obfuscation Module 223, that is, when the splitting is actually defined and implemented. The result of the classification is stored together with obfuscation information for later use in a memory in the Session Object. In the reverse transformation process, upon translation, the primary request is converted into an original petition by gathering the parameters in all split requests and assigning them to the intended URL. On the contrary, secondary requests are translated into requests to one specific URL which confirms the reception of the request but does not trigger the display of any new contents on the browser. Therefore the displayed URL and the confirmation URL ensure transparency in information exchange between user and server, as well as minimise errors during this information transaction.

[0078] The role of the URL Translation Modules is to replace the obfuscated address by the corresponding original as recorded upon obfuscation. In case the form has been split, the primary request is assigned the intended URL, while the secondary requests return a URL containing a blank document. As a result of the secondary requests, the content generator 212 delivers this blank document, hence causing the original request to receive an empty confirmation which causes no changes on the browser display. On the other hand, the primary request triggers delivery of the document the user requested.

[0079] Complementarily, the Parameter Translation Modules are configured to translate parameters into their original names, values and formats, as well as to collect them, in case the request has been split, in order to obtain a single table. FIG. 5 illustrates the process 500 implemented by the Parameter Translation Module according to one aspect of the invention. In step 510 the Parameter Translation Module is configured to verify whether the request is secondary and, in such case, extracts and stores its parameters and returns an empty confirmation to the input processor, as in steps 520 and 530. In case the request under analysis is primary, in step 540 a determination is made whether all contents are available for reverse transformation processing. These parameters are made available by other Translation Modules. If not, in step 550, the Translation Module waits for the parameters of the corresponding secondary requests to be available. Once all secondary requests have been translated, that is, once the parameters contained in all secondary requests have been stored, in step 560 the Parameter Translation Module retrieves them all and in step 570 undoes the transformations to these parameters. Finally, in step 580, the translated parameters are provided to the input processor, where, together with the translated URL as provided by the URL Translation Module, the complete request is formed and forwarded to the content generator 212.
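The process 500 can be sketched, under simplifying assumptions, as a small shared store on which the module handling the primary request waits. The class and method names, the use of a condition variable, the timeout and the undo callback are all assumptions of this sketch; the step numbers in the comments refer to FIG. 5 as described above.

```python
import threading


class SplitRequestStore:
    """Collects the parameters of one split form (hypothetical helper)."""

    def __init__(self, expected_secondary):
        self._expected = expected_secondary
        self._received = 0
        self._params = {}
        self._cond = threading.Condition()

    def handle_secondary(self, params):
        # Steps 520 and 530: store the parameters, return an empty confirmation.
        with self._cond:
            self._params.update(params)
            self._received += 1
            self._cond.notify_all()
        return {}

    def handle_primary(self, params, undo, timeout=None):
        with self._cond:
            # Steps 540 and 550: wait until every secondary request has arrived.
            complete = self._cond.wait_for(
                lambda: self._received >= self._expected, timeout
            )
            if not complete:
                raise TimeoutError("secondary requests did not arrive in time")
            # Step 560: retrieve all stored parameters.
            merged = {**self._params, **params}
        # Steps 570 and 580: undo the transformations and hand back the table.
        return undo(merged)
```

Because handle_primary blocks its caller, secondary requests that arrive later must be served by another thread or module instance, which is the motivation for providing at least two Parameter Translation Module instances.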

[0080] As mentioned, in steps 540 and 550, once a Parameter Translation Module has received a primary request for processing, it waits for the remaining parameters to arrive. Hence, if any secondary request reaches the server after the primary request, that Parameter Translation Module is not available to process it, as it is in a blocked state waiting for the remaining contents to be extracted. Therefore, in another aspect of the invention at least two instances of the Parameter Translation Module are provided, so that secondary requests can be processed by at least one remaining Parameter Translation Module while the first one is blocked processing the primary request.

[0081] FIG. 6 depicts a method 600 according to one aspect of the invention showing the information flow and exchange throughout the previous aspects and embodiments. In this example a form is split into three requests. The process starts when a first Request 1 is received 650 and the input processor extracts 651 the URL and parameters and delivers (652; 653) them to the URL Translation Module 610 and to any of the available Parameter Translation Modules, in this case the first one 620. Those modules check the obfuscation information and find that Request 1 is secondary, and hence the URL Translation Module 610 returns 654 a URL to a blank document, while the first Parameter Translation Module 620 extracts the parameters and returns 655 a blank table as confirmation. Once the input processor receives both responses, it generates and provides 656 the translated Request 1 to the content generator 212.

[0082] The process then continues when Request 2 is received 660; its parameters and URL are again extracted and delivered (662; 663) to the URL Translation Module 610 and to the second Parameter Translation Module 630. Since Request 2 represents a primary request, the URL Translation Module 610 returns 664 the original intended destination of the petition. The second Parameter Translation Module 630, however, discovers that parameters are still missing to complete the request, and therefore waits.

[0083] When Request 3 is received 670 and the obfuscated information is extracted 671, the input processor is forced to deliver this request for parameter translation to the first Parameter Translation Module 620, since the second Parameter Translation Module 630 is busy processing the second Request 2. Request 3 is processed (steps 672 to 676) similar to Request 1.

[0084] Once Request 3 has been processed, the second Parameter Translation Module 630 is able to collect all parameters of the split request and build a translated parameter table containing them. Upon receiving 665 this table, the input processor re-assembles it together with the URL received 664 from the URL Translation Module 610 and provides 666 the translated primary request to the content generator 212.

Session Manager

[0085] As already mentioned, obfuscation, and more generally code transformation, does not fully prevent reverse engineering, but only complicates this task. This means that, given enough time, the risk always exists that code can be understood although it is obfuscated and hence previously mentioned attacks could be performed in an automated manner. If the transformations applied by the Obfuscation Server were static, an attacker would find certain difficulties in understanding the code. However, once the code is understood, the attacker would be able to build a piece of malware that would automatically parse and modify the code at his will. This implies a higher effort at the beginning of an attack, however the effort necessary decreases considerably after this initial phase as, once the code is analyzed, an attack can be infinitely reproduced with no extra cost at all.

[0086] In order to mitigate the possibilities of a successful malware attack, the present invention proposes to apply the transformation operations dynamically to the code. The effect of this is that the result of the obfuscation becomes time variant, therefore introducing further uncertainty in the code breaking process. Moreover, since the code is generated on demand, as dictated by the user's browsing habits, these transformations are also applied in real time. This has the effect of reducing to a minimum the exposure window which can be used by a malicious user to break the obfuscated code. Therefore, in another aspect of the invention code is obfuscated in real-time and on demand, so that transformations are not only applied, but also chosen in real time. As a result, the system can deliver a different result (obfuscated code) to different requests for the same document. The combined effect is that of providing a dynamic and real-time obfuscation mechanism which maximises security while browsing the internet.

[0087] As already mentioned, web pages are usually formed by a combination of several documents and languages, which usually include HTML, JavaScript and CSS. During output processing, these are separately transformed by different Obfuscation Modules, although some changes need to be applied coherently inside and between every piece of code. For instance, if the definition of a function (its name and the parameters accepted) is changed, all calls to this function should be changed accordingly; and these calls may appear in many documents forming the web page, independent of the language in use. The same applies to variables, which are defined once but can be used by many documents. Moreover, changes like Variable Name Transformation, Constant Values Modification and Form Structure Modification can also have effects on incoming requests from the user. Hence, it has been realised that it is not possible to apply different and independent sets of transformations to each document.

[0088] Furthermore, in certain situations, it is desirable to preserve some transformations over time. This is the case of variables that are used by several HTML frames at different moments during one session or static menus that are displayed throughout the session. Applying a different set of transformations to every web page would require constantly reloading these contents, which in turn would negatively impact on the traffic generated. Also, this is suboptimal for the Obfuscation Server, which would need to obfuscate the same contents with a high frequency.

[0089] To deal with these specific issues, another aspect of the invention provides an Obfuscation Server which is configured to apply transformations to a set of documents, which belong to an obfuscation session, and to keep track of the changes made. To that end, a Session Object is created for every obfuscation session in place. An obfuscation session refers to a collection of transformations that are applied consistently to a set of contents generated by the web application server. These contents can be grouped under many criteria; for example, in the most fine-grained configuration, every obfuscation session can match up with a user session in the web application. This ensures that coherence is maintained inside one session, but also that a set of transformations is discarded once a session expires, hence causing delivered code to change over time. The Session Object thus stores all transformations relating to a specific obfuscation session, so that they can be retrieved during both the obfuscation and the translation process and be applied to contents belonging to the same obfuscation session. Moreover, the Session Object is in charge of selecting the transformations to be applied during the session, as previously described with reference to FIG. 4.

[0090] The Obfuscation Modules iteratively query the Session Object to obtain the specific transformation to apply in every step of the obfuscation process. However, simultaneous accesses to the object could cause inconsistencies in the information stored by the Session Object and, as a consequence, in transformations performed by the different Obfuscation Modules. As a result, there is a need for the Session Object to be blocked to incoming requests when it is determining a new transformation.

[0091] FIG. 7 depicts a method 700 according to one aspect of the invention performed by the Session Object upon receiving a web page request. In step 710 the Session Object first tries to retrieve the transformation from storage. If the transformation has already been selected at any prior moment during the session, it is forwarded to the Obfuscation Module. Otherwise, the transformation needs to be selected and hence the Session Object is locked 720, so that any incoming requests have to wait and the flow in FIG. 4 is executed. Once this process is finished the result is stored 740, the object is unlocked 750 and the generated transformation is sent 760 to the requesting Obfuscation Module.
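A minimal sketch of the method 700 is given below. The class name, the key used to identify a transformation and the selector callback standing in for the flow of FIG. 4 are assumptions of this example. The second lookup performed while holding the lock is also an assumption, added here so that two requests that both miss in step 710 do not select two different transformations.

```python
import threading


class SessionObject:
    """Stores the transformations selected for one obfuscation session."""

    def __init__(self, selector):
        self._selector = selector   # stands in for the FIG. 4 selection flow
        self._store = {}
        self._lock = threading.Lock()

    def get_transformation(self, key):
        # Step 710: try to retrieve a previously selected transformation.
        if key in self._store:
            return self._store[key]
        # Step 720: lock the object so that other requests have to wait.
        with self._lock:
            if key not in self._store:          # re-check (assumption)
                # Store the newly selected transformation (step 740).
                self._store[key] = self._selector(key)
            result = self._store[key]
        # Step 750: the lock is released on leaving the with-block.
        # Step 760: send the transformation to the requesting module.
        return result
```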

[0092] Since the information contained in the Session Object needs to be accessed by both Obfuscation and Translation Modules, it is controlled by a third party: the Obfuscation Core 221. More concretely, the Session Manager, inside the core, is configured for managing and maintaining obfuscation sessions. However, the number of sessions to be simultaneously handled can be large, which would pose a challenge to scalability. Moreover, session handling could provoke bottlenecks in the Obfuscation Server, which should keep track of every session's status in order to be able to identify the session to which requests and contents belong, both when obfuscating and when translating. Hence, in one aspect of the invention, instead of storing stateful information about every session, contents (both requests and responses) are tagged with a cookie, which is associated to one corresponding obfuscation session. Thus, the Session Manager stores both a reference to the Session Object and a relation of cookies and Session Identifiers.

[0093] FIG. 8 depicts a method 800 of delivering an obfuscated document according to this aspect of the invention. Upon receiving 810 a response to a web page request from the content generator 212, the output processor 213 extracts 820 the cookie and queries 830 the Session Manager hosted in the Obfuscation Core 221 to retrieve the corresponding Session Identifier, which is then provided 840 to the output processor. The Obfuscation Modules are then queried (850, 860) with the content and the Session Identifier, which enables them to retrieve 870 the Session Object from the Session Manager. As a result, the output processor obtains 880 in return an obfuscated version of the content, and forwards 890 it to the User Browser 111.

[0094] FIG. 9 depicts the complementary method 900 of delivering a translated user request to the content generator according to this aspect of the invention. The input processor receives 910 a web page request, extracts 920 the cookie and obtains (930, 940) the Session Identifier from the Obfuscation Core. Then, the Translation Modules are queried 950 to de-obfuscate the contents (URL and parameters, in this case), given the Session Identifier. The Translation Modules then query 960 the Obfuscation Core and receive the Session Object in return 970, which contains the information necessary to undo the previously made transformations. Once the input processor 211 receives 980 the translated URL and parameters, it creates a complete de-obfuscated request that is then forwarded 990 to the content generator 212.

[0095] In another aspect of this invention, further performance advantages may be provided by creating many instances of the Translation and Obfuscation Modules, so that they can work in parallel and serve a larger number of petitions. In this aspect of the invention, the Session Manager is in charge of coordinating usage of the different instances. More concretely, once the Session Manager is queried with a cookie, it returns both the Session Identifier and a reference to the Translator/Obfuscator instance to be used. The input/output processor then issues a request for translation/obfuscation to the instance it has been assigned.

[0096] As already mentioned, due to the intensive use of such Obfuscation Servers envisaged due to its application in an Internet web browsing environment for a large number of users, one of the main challenges when applying obfuscation to a web page is scalability. Since the invention acts as a proxy between the server 120 and the browser 111 and it needs to apply modifications to the contents issued by the server in real time, it is critical for the Obfuscation Server not to become a bottleneck in the architecture. When applying obfuscation to a document, a trade-off exists on one hand between the sturdiness, or resistance to attacks, of the obfuscation result and, on the other hand, the time and resources needed to perform the operations. This implies that resources to obfuscate every access to a web application should be allocated depending on this application's criticality and the number of accesses it supports. Otherwise, obfuscation might be unfeasible for applications accessed by a large number of users.

[0097] In this regard, in another aspect of the invention, a group of user sessions is defined and a set of transformations is applied to each group. This is in contrast to generating one unique set of transformations for every user session. In this case, one obfuscation session, composed of a Session Identifier and a Session Object, is assigned to a group of cookies, so that a one-to-many relationship is established by the Session Manager. The system accepts establishment of this relationship following different rules.

[0098] In a first option of this aspect, a definite number of users N is assigned to each set of transformations. In this case, when the first user session is established, the set of transformations is created. As the user session progresses, this Session Object is progressively filled in with the transformations applied. Future user sessions are also assigned the same obfuscation session and they retrieve the previously generated obfuscation transformations. In case one document was not obfuscated before, the Obfuscation Modules generate the corresponding transformations and store them in the object. Once the limit has been reached, that is, once the object has been assigned to N user sessions, a new object is created for the next N user sessions. This scheme appreciably decreases the processing time and resources needed, since fewer transformations need to be generated, especially if N is large enough. However, it has the drawback that during off-peak access times one set of transformations can be used for a long time. If the exposure time of one specific set of transformations is long enough, an attacker may be able to analyze the code and create a piece of malware for that period that is able to modify the user's actions.

[0099] To overcome the mentioned limitation, in a second option of this aspect, a fixed validity time T is determined for every obfuscation session, and all contents delivered and requests received during the period T are obfuscated using the same set of transformations. After period T is finished, a new set is created and all user sessions taking place are assigned the new obfuscation session, so that one can explicitly control the exposure time of every obfuscation session. Note how, following this scheme, one user session might be assigned several obfuscation sessions during its life. This change is non-trivial since, as has already been pointed out, some elements can be used during the whole browsing session. To handle this eventuality, when the obfuscation session change takes place, the user browser is forced to reload all contents in the web page, in order to ensure that no elements of the previous session are in use. However, these reloads should not be performed simultaneously by every user session, to avoid overloading the system with a massive request for new transformations. Moreover, this situation also creates the need for the Obfuscation Server to create the new Session Object before the swap takes place.

[00100] Although this solution is technically desirable, it has another drawback related to the attacker's resource availability and economy. An attacker collecting traffic in search of confidential information is not completely prevented simply by obfuscation techniques, as he can continue analyzing this traffic. The drawback for the attacker is that obfuscation considerably increases the resource investment needed to find the desired information, making information collection not worthwhile in economic terms. However, when one set of transformations is applied to a higher number of users, the relative profit obtained when performing a single analysis is greater; that is, once the translations needed to understand the collected information have been obtained, they can be applied to a large set of sessions. Furthermore, when assigning obfuscation sessions based on time criteria, the number of user sessions to which the same transformations are applied notably increases during peak times, and this can increase the interest of attackers in those sessions.

[00101] A third option according to this aspect is provided which combines the first and second options, in other words two limits on the number of users N and sessions validity time T are established. Whenever one of these limits is reached, the session object is renewed, enabling control of both the exposure time and the number of users assigned.
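The three options may be sketched with a single Session Manager that maps cookies to Session Identifiers and renews the obfuscation session when a user limit N, a validity time T, or either of them is reached. The class name, the injectable clock and the use of integer identifiers are assumptions for illustration; the forced browser reload and the creation of the new Session Object ahead of the swap are not modelled.

```python
import itertools
import time


class SessionManager:
    """Maps cookies to obfuscation sessions under limits N (users) and T (age)."""

    def __init__(self, max_users=None, max_age=None, clock=time.monotonic):
        self._max_users = max_users      # N, or None for no user limit
        self._max_age = max_age          # T in seconds, or None for no time limit
        self._clock = clock
        self._ids = itertools.count(1)
        self._sessions = {}              # session id -> {"created", "users"}
        self._by_cookie = {}             # cookie -> session id (one-to-many)
        self._current = None

    def _expired(self, sid, now):
        created = self._sessions[sid]["created"]
        return self._max_age is not None and now - created >= self._max_age

    def _full(self, sid):
        users = self._sessions[sid]["users"]
        return self._max_users is not None and users >= self._max_users

    def session_for(self, cookie):
        now = self._clock()
        sid = self._by_cookie.get(cookie)
        if sid is not None and not self._expired(sid, now):
            return sid
        # New cookie, or its obfuscation session has outlived T: (re)assign it.
        cur = self._current
        if cur is None or self._expired(cur, now) or self._full(cur):
            cur = next(self._ids)
            self._sessions[cur] = {"created": now, "users": 0}
            self._current = cur
        self._sessions[cur]["users"] += 1
        self._by_cookie[cookie] = cur
        return cur
```

Setting only max_users corresponds to the first option, only max_age to the second, and both to the third.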

Document Caching

[00102] The interactive nature of browsing means that contents are generated and downloaded whenever they are needed. This condition leads to a requirement for obfuscation to be performed in real time but also, as a consequence, response time of the Obfuscation Server is crucial to the suitability of the present invention. Web pages also usually comprise static contents, which, furthermore, are usually repeatedly used throughout every session. This is the case, for instance, of headers or menus that are permanently displayed in the web pages. Hence, in another embodiment of the invention, which may be combined with all other embodiments and aspects of the invention, the Obfuscation Core 221 comprises a Cache Manager 1111 which is responsible for caching obfuscated static documents. Thus, every time the user requests a static document, its obfuscated version is retrieved from the Cache Manager instead of obfuscating it again, which results in considerable savings in processing resources. In this embodiment two different methods for static document caching are proposed.

[00103] According to a first aspect, the method 1000 as depicted by FIG. 10 is followed during the session whenever a request is received. In this first aspect all static documents are obfuscated at the beginning of the obfuscation session. Upon initialization of a session, the Session Manager notifies the Cache Manager 1111. The latter retrieves all static documents from the web server and requests the Obfuscation Modules to transform them. The resulting documents are then temporarily stored by the Cache Manager. When the input processor requests 1010 the URL Translation Module 610 to de-obfuscate the URL, the translator checks 1020 if the requested URL corresponds to a static document by asking the Cache Manager. If so, it returns 1030 a URL pointing to the temporary storage. As a result, when the content generator 212 processes the request, it retrieves 1050 the obfuscated document from the temporary storage location, instead of the original one. Finally, during output processing, the document is not sent to the Obfuscation Modules but skips (1070, 1080) the obfuscation phase and is directly forwarded to the next step in the output processor. In an alternative flow, the URL Translation Module replaces the URL by its original de-obfuscated version and the input processor 211 queries the Cache Manager thereafter to obtain a reference to the cached document. The method presented above requires the Cache Manager to know in advance which documents are static. This is achieved through the use of a list of static documents, which needs to be configured by the administrator and is stored in the Cache Manager.

[00104] In this method, since the Obfuscation Modules are not involved in the delivery process, the response time is decreased. However, there is a peak in processing workload at session setup, since at this moment all static documents have to be obfuscated. Moreover, this process obfuscates all documents, regardless of the fact that some of them might never be requested during the corresponding session. Finally, this mechanism also enables a pool of obfuscation sessions to be created in advance. In this case, the Cache Manager could take advantage of low activity periods of the Obfuscation Modules in order to transform static documents and store them until it is notified of the start of a new session.

[00105] According to a second aspect, the method 1100 as depicted by FIG. 11 is followed during the session whenever a first request is received for a static document. According to this aspect, a second mechanism is proposed which obfuscates static documents on demand, that is, whenever they are first requested by the user. This prevents the Obfuscation Modules from pointlessly obfuscating static documents that are never requested, and also reduces the workload peak upon session creation. In this case, whenever a static document is found, the Cache Manager verifies whether it has already been obfuscated and, in case it has, it delivers the cached content. Otherwise, it requests the Obfuscation Modules to obfuscate it and stores the result for future requests.

[00106] When the URL Translation Module receives 1010 a URL from the input processor 211, it queries 1020 the Cache Manager 1111 to determine whether it corresponds to a static document. Following the on-demand caching method, if the document is static, the Cache Manager verifies 1110 whether it has already been cached. If the document has not yet been requested during this session, it has not been obfuscated, and hence the manager requests 1120 an Obfuscation Module to perform 1130 the corresponding transformation. By contrast, in the first aspect of FIG. 10, all documents were obfuscated in advance. Again, another option is for the input processor, instead of the URL Translation Module, to perform the query and obtain a reference to the cached document, which would accelerate the flow. Once the Cache Manager receives 1140 the obfuscated result, it determines a temporary storage location and provides 1030 the URL Translation Module with a URL pointing to it. This URL is forwarded 1040 to the input processor and, together with a parameter table, composes the translated request. Once the content generator 212 receives 1050 the translated request, it directly retrieves 1060 the obfuscated content from the location where the Cache Manager stored it and provides it (1070, 1080) via the output processor 213 to the user browser 111, without further processing by any Obfuscation Module. The steps in FIG. 11 are only executed the first time a static document is requested within a specific obfuscation session. For subsequent requests of the same document, the retrieval process works as previously described for FIG. 10. This mechanism enables progressive obfuscation of static documents and only obfuscates requested documents. The drawback is that it does not allow off-line obfuscation, as the first aspect of FIG. 10 did.
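
As a non-limiting sketch, the on-demand flow of FIG. 11 might be approximated as follows. The class `OnDemandCacheManager`, the method `resolve`, and the `fetch` and `obfuscate` callables are hypothetical names introduced for illustration; the disclosure does not prescribe them.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


class OnDemandCacheManager:
    """Hypothetical Cache Manager that obfuscates a static document on first request."""

    def __init__(self,
                 static_urls: Iterable[str],
                 fetch: Callable[[str], str],
                 obfuscate: Callable[[str, str], str]) -> None:
        self.static_urls = set(static_urls)   # administrator-configured list
        self.fetch = fetch                    # retrieves a document from the web server
        self.obfuscate = obfuscate            # stand-in for an Obfuscation Module
        self.store: Dict[Tuple[str, str], str] = {}

    def resolve(self, session_id: str, url: str) -> Optional[str]:
        """Return the cached obfuscated document, or None if the URL is not static."""
        if url not in self.static_urls:       # step 1020: is it a static document?
            return None
        key = (session_id, url)
        if key not in self.store:             # step 1110: already cached?
            # Steps 1120 to 1140: transform once and keep the result.
            self.store[key] = self.obfuscate(session_id, self.fetch(url))
        return self.store[key]
```

Only the first call to `resolve` for a given session and URL reaches `obfuscate`; later calls are served from the store, which corresponds to the retrieval path of FIG. 10.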

[00107] The caching and retrieval mechanisms presented require configuration of a list of static documents so that the Obfuscation Server can identify them. Configuring this list can be burdensome for the administrator and, in large applications, error-prone. Moreover, this mechanism only allows caching of fully static documents, while some dynamically generated documents in a web site might also contain static parts. Hence, in another aspect of the invention, the Obfuscation Server accepts tagged code to identify static contents and documents, instead of a configured list of static documents. In this case, tags are identified by the Obfuscation Modules when they parse the document to build the Abstract Syntax Tree; as a result, this option only works for on-demand caching, and cannot be used to cache contents at session start.

[00108] FIG. 12 illustrates the caching process 1200 followed when a document with static contents is requested for the first time. The input processor 211 obtains a translated request and forwards 1210 it to the content generator 212, where a non-obfuscated document is generated for delivery 1215. This document is sent to the output processor 213, which requests 1220 a corresponding Obfuscation Module to transform it. Once the Obfuscation Module starts parsing 1225 the document, it detects static contents, which can be the whole document or part of it. In order to illustrate the complexity of the process, let us consider that only a fraction of the document is static. Hence, the Obfuscation Module extracts the identifier attached to the tagged contents and forwards 1230 it to the Cache Manager. When the latter determines 1235 that the content has not been cached, it notifies 1240 the Obfuscation Module, which, in turn, obfuscates 1245 both the static and dynamic contents. The obfuscated version of the static elements is finally provided 1250 to the Cache Manager for storage and at the same time integrated 1255 with the obfuscated dynamic content. The integrated obfuscated contents are finally forwarded 1260 to the output processor 213 for delivery 1270 to the browser 111.

[00109] FIG. 13 illustrates the caching process 1300 followed when a document with static contents is requested again. The process is similar to the one described for FIG. 12, with the difference that the Cache Manager retrieves the cached static obfuscated code version and delivers 1310 it to the Obfuscation Module, which only obfuscates the dynamic contents and then integrates the cached static code. This system enables caching of static contents inside dynamic documents, which improves overall performance of the Obfuscation Server. However, when compared to the configuration of a list of static documents, it presents the drawback that Obfuscation Modules need to be involved in the retrieval process in order to identify the tags. In another aspect of the invention this problem can be solved by combining aspects of FIGs. 11 to 13, that is, using a configured list to identify fully static documents and tags to identify static parts in dynamic documents. This avoids the aforementioned limitations of the Obfuscation Module when retrieving static portions of dynamic documents.
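
Purely as an illustrative sketch, the processes of FIGs. 12 and 13 might be approximated as follows. The tag syntax `<!--static id="..."-->...<!--/static-->` is invented for this example, since the disclosure does not define one, and a regular expression stands in for the parser that builds the Abstract Syntax Tree. The function name and the `obfuscate` callable are likewise assumptions.

```python
import re
from typing import Callable, Dict, List

# Hypothetical tag syntax; the disclosure does not specify how static parts are marked.
STATIC_TAG = re.compile(r'<!--static id="([^"]+)"-->(.*?)<!--/static-->', re.DOTALL)


def obfuscate_with_fragment_cache(document: str,
                                  cache: Dict[str, str],
                                  obfuscate: Callable[[str], str]) -> str:
    """Obfuscate dynamic parts on every call and tagged static parts only once."""
    parts: List[str] = []
    position = 0
    for match in STATIC_TAG.finditer(document):
        dynamic = document[position:match.start()]
        if dynamic:
            parts.append(obfuscate(dynamic))            # dynamic content: always transformed
        fragment_id, static_body = match.group(1), match.group(2)
        if fragment_id not in cache:                    # FIG. 12: first request
            cache[fragment_id] = obfuscate(static_body)
        parts.append(cache[fragment_id])                # FIG. 13: reuse the cached version
        position = match.end()
    tail = document[position:]
    if tail:
        parts.append(obfuscate(tail))
    return "".join(parts)                               # integrated obfuscated document
```

On a second call with the same `cache`, only the dynamic segments are passed to `obfuscate`, while the tagged fragment is taken from the cache and integrated into the result.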

Tagging

[00110] As already mentioned, one of the advantageous features of the present invention is that it automatically obfuscates the contents transmitted and de-obfuscates the contents received by any web application server. However, in order to generally achieve compatibility with any web application, some restrictions apply to the possible transformations. The JavaScript eval function, which evaluates a string as if it were an expression, is an example of such a limitation. Eval can be used, among other things, to dynamically compose variable names using expressions. For instance, consider the existence of three variables V1, V2 and V3 and another one, an option O, which can be assigned a value of 1, 2 or 3. A parameter selection S can be defined as S = eval ("value V" + O) and selection will be assigned either V1's, V2's or V3's value, depending on O's value. Such an expression forces the variable names V1, V2 and V3, and the content of the option O, not to be obfuscated.
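
The limitation can be illustrated with a simplified analogue written in Python rather than JavaScript; Python's built-in `eval` resolves names at run time in a comparable way. The function names and sample values below are assumptions for illustration, and the composed string is reduced to the prefix "V" followed by the option.

```python
def select(option: int) -> str:
    # Original code: the variable name is composed at run time.
    V1, V2, V3 = "first", "second", "third"
    return eval("V" + str(option))


def select_after_naive_renaming(option: int) -> str:
    # The variables were renamed, but the string passed to eval was not,
    # so the composed name no longer refers to anything.
    a, b, c = "first", "second", "third"
    return eval("V" + str(option))


def naive_renaming_breaks(option: int) -> bool:
    try:
        select_after_naive_renaming(option)
    except NameError:
        return True
    return False
```

Calling `select(2)` yields the value held by `V2`, whereas the renamed variant fails with a `NameError`, which is why such names cannot be transformed without further information.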

[00111] One solution to this problem is to avoid obfuscating all variables referenced by eval expressions. But, back to the example, although identifying variable option O is relatively straightforward, this is not the case with values V1, V2 and V3. On the contrary, the names referenced by the example expression are available at execution time, but not necessarily at obfuscation time. Identifying those variable names would require searching for every name with the prefix "V" and taking into account all possible values of option O, which, even if theoretically feasible, is highly burdensome. Moreover, in general terms, establishing loose exceptions to obfuscation, like avoiding obfuscation of any variable referenced by an eval function, is not desirable, since it sets important limits to the sturdiness that the Obfuscation Server can provide.

[00112] Therefore, in another aspect of the invention, a tagging mechanism is proposed that can be used to explicitly indicate whether an expression can be obfuscated or not. Back to the previous example, for instance, the administrator can place tags to point out that variable names V1, V2 and V3 must not be obfuscated. This enables exceptions to be handled without setting overly broad exceptions to the obfuscation algorithm. Moreover, to increase flexibility of the invention, besides marking some parts of the text not to be obfuscated, such tags can also be used to specify a certain complexity level, either higher or lower. For instance, the login web page, which is especially sensitive, can be tagged to force a high complexity level upon obfuscation.
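
A minimal sketch of how such tags might be honoured during identifier renaming is given below. The comment-style tag `/* @no-obfuscate ... */`, the function names, and the renaming table are all hypothetical; the disclosure does not define a tag syntax, and a real Obfuscation Module would operate on the Abstract Syntax Tree rather than on regular expressions.

```python
import re
from typing import Dict, Set

# Hypothetical tag listing names that must keep their original spelling.
NO_OBFUSCATE_TAG = re.compile(r"/\*\s*@no-obfuscate\s+([^*]*?)\s*\*/")
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def protected_names(code: str) -> Set[str]:
    """Collect every name listed in a no-obfuscate tag."""
    names: Set[str] = set()
    for match in NO_OBFUSCATE_TAG.finditer(code):
        names.update(match.group(1).split())
    return names


def rename_identifiers(code: str, mapping: Dict[str, str]) -> str:
    """Apply the renaming table, skipping names protected by a tag."""
    keep = protected_names(code)

    def swap(match: "re.Match[str]") -> str:
        name = match.group(0)
        if name in keep:
            return name                    # tagged: left untouched
        return mapping.get(name, name)     # untagged: renamed if in the table

    return IDENTIFIER.sub(swap, code)
```

With this sketch, a name that appears both in the renaming table and in a tag keeps its original spelling, while untagged names are still transformed, so the exception stays as narrow as the administrator chooses.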

[00113] Therefore, in yet another aspect of the invention, tags can be used to mark relationships between code fractions that can be difficult to detect by an Obfuscation Module, in order to both increase sturdiness of the result and improve performance of the Obfuscation Server.

[00114] It is to be understood by the skilled person in the art that the various embodiments, realisations, and aspects of the invention have been so drafted with the aim of disclosing the invention in a concise manner. This does not mean that the intention is of limiting the scope of the disclosure to the precise combination of embodiments, realisations, and aspects as drafted. On the other hand, the intention is that the different features of the inventive concepts described may be readily understood to be combinable as would be derived from a direct and objective reading of the disclosure by one of ordinary skill in the art.

[00115] Furthermore, it is to be understood that the embodiments described herein may be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When the systems and/or methods are implemented in software, firmware, middleware or microcode, the program code, code segments, or computer program may be stored in a machine-readable medium, such as a storage component. A computer program or a code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, and the like, may be passed, forwarded, or transmitted using any suitable means including memory sharing, message passing, token passing, network transmission, or others.

[00116] For a software implementation, the techniques described herein may be implemented with modules (for example, procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor through various means as is known in the art. Further, at least one processor may include one or more modules operable to perform the functions described herein.

[00117] Moreover, various aspects or features described herein may be implemented, on one hand, as a method or process or function, and on the other as an apparatus, a device, a system, or an article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. For example, computer-readable media can include but are not limited to magnetic storage devices (for example, hard disk, floppy disk, magnetic strips, and similar), optical disks (for example, compact disk (CD), digital versatile disk (DVD), and similar), smart cards, and flash memory devices (for example, EPROM, card, stick, key drive, and similar). Additionally, various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" can include, without being limited to, various media capable of storing, containing, and/or carrying instruction(s) and/or data. Additionally, a computer program product may include a computer readable medium having one or more instructions or codes operable to cause a computer to perform the functions described herein.

[00118] What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination, or permutation, of components and/or methodologies for purposes of describing the aforementioned embodiments. However one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible within the general inventive concept clearly derivable from an objective reading of the present disclosure. Accordingly, the described embodiments are intended to embrace all such alterations, modifications and variations that fall within scope of the appended claims.

[00119] The various logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.

[00120] Those skilled in the art should appreciate that the foregoing discussion of one or more embodiments does not limit the present invention, nor do the accompanying figures. Rather, the present invention is limited only by the following claims.

Claims

1. An apparatus for enhancing the security of a web application's data transmitted to and received from a user of at least one client device, wherein the apparatus comprises: means for receiving a web document; and

means for transforming the web document comprising

at least one obfuscation means for processing the web document to generate an obfuscated document from an original document; and/or translation means for processing the web document to generate an original document from an obfuscated document.

2. The apparatus of claim 1, wherein the web document comprises content data to be delivered by the web application server to the at least one client device or requests for content data received from the at least one client device at the web application server.

3. The apparatus of claim 2, further comprising controlling means for controlling the receiving means and the transforming means.

4. The apparatus of claim 3, wherein the at least one obfuscation means receives the web document from, and provides to, an output processor of the web server for forwarding to the client device.

5. The apparatus of claim 3, wherein the translation means receives the web document from, and provides to, an input processor of the web server once received from the client device.

6. The apparatus of claim 3, wherein the controlling means applies at least one transformation to the web document by selecting an obfuscation algorithm, wherein the at least one obfuscation means generates the obfuscated document by applying the algorithm to the original document, and/or the translation means generates the original document by applying the reverse algorithm to the obfuscated document.

7. The apparatus of claim 6, wherein the controlling means applies a series of transformations iteratively to the web document, wherein the output of one iteration is used as the input to the next.

8. The apparatus of claim 7, wherein the series of transformations are chosen by the controlling means randomly or in a deterministic manner.

9. The apparatus of claim 6, wherein each of the at least one obfuscation means comprises a code parser for generating an abstract syntax tree of the document, a code transformer for applying the corresponding obfuscation algorithm as determined by the controlling means to the output of the code parser, and a code serializer for converting the output of the code transformer into its original web document language.

10. The apparatus of claim 6, wherein each document may comprise at least one code type, and the transforming means comprises one obfuscation means for every code type, wherein the controlling means chooses a main obfuscation means and secondary obfuscation means, wherein the main obfuscation means distributes functions and codes to remaining secondary obfuscation means depending on the document type.

11. The apparatus of claim 6, wherein the translation means comprises at least one URL module and at least one parameter module for completely translating one document request.

12. The apparatus of claim 11, wherein the translation means comprises at least two parameter modules, one for translating a primary document request and the others for extracting and storing the parameters of secondary requests of same primary document, wherein the complete original document is assembled based on the translations of the URL module and the at least two parameter modules.

13. The apparatus of claim 7, wherein the controlling means determines the number of iterations to perform as a tradeoff between performance and security.

14. The apparatus of claim 3, wherein the controlling means stores the history and details of the transformations performed in an internal memory as transformation sessions.

15. The apparatus of claim 14, wherein the translation means applies the reverse obfuscation algorithm to the web document after retrieving information as provided by the controlling means.

16. The apparatus of claim 14, wherein the controlling means tags documents generated from the transforming means with at least one cookie, and stores relations of cookies with transformation session information in an internal memory.

17. The apparatus of claim 14, wherein the controlling means stores one set of transformations for every user session, or one set of transformations for a group of sessions.

18. The apparatus of claim 17, wherein the controlling means determines a maximum number of sessions per group, and/or a predetermined validity time for every session, and/or both in combination.

19. The apparatus of claim 3, further comprising caching means for storing obfuscated versions of web documents or parts of web documents for later use.

20. A method for enhancing the security of a web application's data transmitted to or received from a user of at least one client device, the method comprising: receiving a web document; and

transforming the web document comprising

applying at least one obfuscation transformation to the code of the web document to generate an obfuscated document from an original document; and/or

applying a reverse obfuscation transformation to the code of the web document to generate an original document from an obfuscated document.

21. A computer readable medium comprising instructions which, when executed on a computer, perform the steps of method claim 20.

22. A system for enhancing the security of a web application's data transmitted to, or received from, a user of at least one client device, the system comprising an input processor for receiving web document requests from at least one client device, a content generator for generating content in response to the web document requests received, and an output processor for providing web documents comprising the generated content to the at least one client device, and at least one apparatus according to any one of claims 1 to 19.