WO2002088968A1 - Apparatus and method for network analysis - Google Patents

Apparatus and method for network analysis

Info

Publication number
WO2002088968A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
statement
session
sessions
entity
Prior art date
Application number
PCT/US2002/013391
Other languages
French (fr)
Inventor
Todd A. Moore
Mark E. Longworth
Brian P. Girardi
Damon S. Love
Original Assignee
Ctx Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctx Corporation
Publication of WO2002088968A1

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/18 Protocol analysers
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/08 Protocols for interworking; Protocol conversion
    • H04L69/18 Multiprotocol handlers, e.g. single devices capable of handling multiple protocols
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • H04L69/06 Notations for structuring of protocol data, e.g. abstract syntax notation one [ASN.1]
    • H04L69/10 Streamlined, light-weight or high-speed protocols, e.g. express transfer protocol [XTP] or byte stream

Definitions

  • the present invention generally relates to the field of network analysis. More particularly, the present invention relates to methods and apparatus for parsing information in network protocols into a common language for analysis.
  • Computers on a network send information to each other as part of a communication session.
  • the data for this communication session is broken up by the network and transferred from a source address to a destination address.
  • This is analogous to the mail postal system, which uses zip codes, addresses, and known routes of travel to ship packages. If one were to ship the entire contents of a home to another location, it would not be cost effective or an efficient use of resources to package everything into one container for shipping. Instead, smaller containers would be used for the transportation and assembled after delivery.
  • Computer networks work in a similar fashion by taking data and packaging it into smaller pieces for transmitting across a network.
  • Each of these packets is governed by a set of rules that defines its structure and the service it provides.
  • the World Wide Web has a standard protocol defined for it, the Hypertext Transfer Protocol (HTTP). This standard protocol dictates how packets are constructed, how data is presented to web servers, and how these web servers return data to the client web browsers.
  • Any application that transmits data over a computer network uses one or more protocols.
  • There are many layers of protocols in use between computers on a network. Not only do web browsers have protocols they use to communicate, but the network has underlying protocols as well.
  • This technique is called data encapsulation. For example, when you make a request to a web site, your data request is encapsulated by the HTTP protocol used by your browser. The data is then encapsulated by the computer's network stack before it is put onto the network. The network may encapsulate the packet into another packet using another protocol for transmission to another network. Each layer of the protocol helps provide routing information to get the packets to their target destination.
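The layering described above can be pictured as nested wrapping. The following is a minimal sketch only: the bracketed layer markers and the function names `encapsulate` and `decapsulate` are hypothetical placeholders, and real protocol headers are binary structures rather than text tags.

```python
def encapsulate(payload: bytes, headers: list[bytes]) -> bytes:
    """Wrap the payload in each header in turn; the last header ends up outermost."""
    packet = payload
    for header in headers:
        packet = header + packet
    return packet


def decapsulate(packet: bytes, headers: list[bytes]) -> bytes:
    """Strip headers from the outside in, reversing encapsulate()."""
    for header in reversed(headers):
        if not packet.startswith(header):
            raise ValueError("unexpected header")
        packet = packet[len(header):]
    return packet


# Placeholder layer markers, innermost first (illustrative only).
LAYERS = [b"[HTTP]", b"[TCP]", b"[IP]", b"[ETH]"]
frame = encapsulate(b"GET /index.html", LAYERS)
```

Each receiving layer removes only its own wrapper, which is why the outermost marker here corresponds to the lowest network layer.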
  • a conventional protocol analyzer can provide insight into the type of protocols being used on a network.
  • the analysis tools within this analyzer enable the analyzer to decode protocols and examine individual packets. By examining individual packets, conventional protocol analyzers can determine where the packet came from, where it is going, and the data that it is carrying. It would be impossible to look at every packet on a network by hand to see if security concerns exist; therefore, more specialized analysis products were created.
  • Intrusion Detection System (IDS)
  • Another example of a more specialized but conventional analysis tool is an application monitor.
  • Application monitors focus on specific application layer protocols to decide if illegal or suspicious activity is being performed.
  • This conventional application monitor may focus, for example, on the Hyper Text Transfer Protocol (HTTP) to monitor employee accesses to websites.
  • the company can monitor the packets transmitted and received between the employee's computer and the web server. These packets can be analyzed by parsing the HTTP protocol to determine the website's hostname, the name of the file requested, and the associated content that was retrieved.
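As an illustration of the kind of HTTP parsing described above, the sketch below pulls the hostname and requested file out of a reassembled request. It is not the patent's implementation; the function name `parse_http_request` and the sample request are assumptions made for this example, and only the request head is examined.

```python
def parse_http_request(raw: bytes) -> dict:
    """Extract the method, requested path, and Host header from a request."""
    # Keep only the request head (everything before the blank line).
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": path, "host": headers.get("host")}


request = b"GET /reports/q1.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
info = parse_http_request(request)
```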
  • this HTTP analyzer could be used to decide if an employee is visiting inappropriate web sites and alert the company of this activity.
  • This type of analysis tool monitors the actions of web browsers, but falls short for other types of communications.
  • Another conventional application monitor can monitor the Simple Mail Transfer Protocol (SMTP).
  • This system could be used to record and track e-mails sent outside of the company to ensure employees were not sending trade secrets or intellectual property owned by the company. It could also ensure e-mails entering into the corporation did not contain malicious attachments or viruses. Employees could, however, use other means of communication such as instant messaging, chat rooms, and website-based e-mail systems. Because this application monitor only monitors SMTP communications, companies must also use many other security and analytical tools to monitor network activity.
  • the log consolidator system (LCS) processes log-based output from network applications or devices. These data inputs can include firewall logs, router logs, application logs such as web server or mail server logs, computer system logs, and/or IDS alerts.
  • a specific LCS analysis tool is required for each different log format, which means multiple analysis systems are needed for each different type of log file format.
  • the present invention is directed to conversion of network traffic containing multiple protocols into a common language suited for analysis.
  • a unique analysis logic or a protocol-specific analyzer will not be needed for every protocol, thereby significantly reducing the complexity associated with conventional systems.
  • the common language of the present invention permits any network transaction, regardless of the particular application or protocol, to be described.
  • Metadata which describes the communication.
  • metadata means information taken from a communication or associated with a communication that describes the communication.
  • metadata can include the communication's start time; stop time; size; protocols used; computers, entities, and resources involved; routing information; aliases of the computers, entities, and resources; properties of communication; and other information useful to a person or computer analyzing the communication.
  • Common language descriptions of the metadata describing a communication often require less than one percent of the storage space of the communication itself.
  • the common language is in the form of an event-based language that permits description of a communication in terms of its sessions, events, and properties.
  • protocol-specific data is parsed into an event-based language based on the nature of the transaction included within the data.
  • the present invention can be used in a variety of contexts, including transactions in a computer network, transactions in an application or device log file, transactions found on computer media, transactions in badge detectors, transactions generated by motion detectors, transactions generated in connection with phone calls, transactions generated in connection with credit card transactions, and other systems in which transactions occur according to one or more protocols.
  • systems with communications using multiple protocols, formats, and/or application types can benefit from the invention.
  • Figure 1 is a schematic diagram of a system for analyzing computer network traffic in accordance with an embodiment of the present invention.
  • Figure 2 is a schematic diagram of parsers in accordance with an embodiment of the present invention.
  • Figure 3 is a flow diagram of a method for analyzing data packets in accordance with an embodiment of the present invention.
  • Figure 4 is a flow diagram of a method for analyzing session data in accordance with an embodiment of the present invention.
  • Figure 5 is a schematic diagram of an event-based language in accordance with an embodiment of the present invention.
  • Figure 6 is a flow diagram of a method for generating an event-based language from data packets in accordance with an embodiment of the present invention.
  • Figure 7 illustrates an exemplary generation of an event-based language corresponding to an email session in accordance with the present invention.
  • Figure 8 illustrates an exemplary generation of an event-based language corresponding to a file transfer session in accordance with the present invention.
  • Figure 9a illustrates an exemplary generation and form of an event-based language in accordance with the present invention.
  • Figure 9b illustrates an exemplary generation and form of an event-based language in accordance with the present invention.
  • Figures 9c and 9d illustrate exemplary generations of an event-based language in accordance with the present invention.
  • Figure 12a illustrates an exemplary generation of an event-based language in accordance with the present invention.
  • Figure 12b illustrates an exemplary form of an event-based language in accordance with the present invention.
  • Figure 1 is a schematic diagram of a system for analyzing network traffic in accordance with an embodiment of the present invention.
  • the embodiment of the present invention shown in Figure 1 is a system configured to translate network communications or input files containing network communications into a common language for analysis.
  • this embodiment includes a system configured to input packets associated with communications across a network, assemble those packets into sessions, direct the sessions to appropriate parsers, parse the sessions into sessions in a common language, and communicate these common-language sessions to an analyzer.
  • a protocol-specific parser in accordance with the present invention can convert protocol-specific data at any network level into a common language.
  • the common language can be used to describe network layer communications including, for example: Ethernet, Token Ring, TCP/IP, IPX/SPX, AppleTalk™, IPv6, and other network layer protocols.
  • the common language also can be used to describe application layer communications including, for example: SMTP, HTTP, TELNET, FTP, POP3, RIP, RPC, Lotus Notes™, TDS, TNS, IRC, DNS, SMB, NFS, DHCP, NNTP, instant messengers (AOL IM™, MSN, YAHOO™) and other application layer protocols.
  • a network 102 represents any network whereby communication between two or more entities may be made or monitored.
  • Network 102 may be a simple network, for example, a cable connecting two computers, such as a computer 122 and a computer 124.
  • Network 102 may be a complex network as well, such as a network configured to pass, allow passage of, or permit monitoring of communications between computers, servers, wireless computers, satellites, or other communication devices.
  • network 102 may represent intranets, extranets, and global networks including the Internet.
  • Figure 1 sets forth a limited number of communication devices communicating through or monitored by network 102: computer 122; computer 124; a server 126; and a wireless computer 128.
  • communications between entities across or monitored by network 102 are made in pieces, rather than as a complete transfer. In such cases, a complete communication between two entities is broken into multiple pieces, or "packets," of data. Such packets conform to one or more protocols.
  • the term "protocol" or "protocols," depending on the context, refers to network protocols such as TCP/IP, IPX/SPX, or AppleTalk™, as well as application protocols, such as FTP, SMTP, HTTP, and so forth.
  • the term "protocol" or "protocols," unless the context establishes a particular protocol, is intended to include any protocol in which data may be represented or transferred in any communication system.
  • a packet handler 104 is configured to monitor the many packets of data in network 102.
  • packet handler 104 can be a sniffer, such as EtherPeek™ available from WildPackets, Inc. In doing so, packet handler 104 is also configured to copy the packets in network 102. Packet handler 104 is also configured to send the packets to an assembler 106. Alternatively, assembler 106 may be configured to access the copied packets from packet handler 104. Packet handler 104 may also be configured to send the packets in real-time to an assembler 106 without recording the packets. In any event, assembler 106 is configured to receive the packets of data representing communications in network 102.
  • Packet handlers and assemblers may, in a preferred embodiment of the invention, be configured as set forth in co-pending U.S. Patent Application No. 09/552,878, filed April 20, 2000, claiming the benefit of U.S. Provisional Application No. 60/131,904, filed April 30, 1999, which is incorporated herein by reference in its entirety.
  • Assembler 106 is also configured to assemble the packets into the communication that the packets represent. Such communications are preferably assembled into sessions. Each session represents a communication between two or more entities.
  • assembler 106 is configured to assemble the packets into a set of sessions 110.
  • the set of sessions 110 can include sessions 110a, 110b, 110c, and 110d. Sessions 110a, 110b, 110c, and 110d can conform to the same protocol, or conform to different protocols. For example, one of the sessions, session 110b, conforms to the well-known HTTP application protocol.
  • Sessions can also be generated by other session sources 108.
  • Session 110e conforms to a protocol, which may be, but need not be, the same as the protocol associated with one of the sessions of set of sessions 110.
  • Sessions generated by assembler 106 or other session source, such as other session source 108, are transmitted (or input) to a parser director 112.
  • Parser director 112 is configured to accept sessions generated by assembler 106 or other session source 108. Parser director 112 directs each session to one of a set of protocol-specific parsers 116 corresponding to the protocol of the session.
  • Each protocol-specific parser in the set of protocol-specific parsers 116 is configured to receive sessions corresponding to that particular protocol.
  • protocol-specific parser 116a is configured to receive sessions conforming to the File Transfer Protocol (FTP).
  • Protocol-specific parser 116b is configured to receive sessions conforming to the Telnet protocol.
  • Protocol-specific parser 116c is configured to receive sessions conforming to the HTTP protocol.
  • Protocol-specific parser 116d is configured to receive sessions conforming to the MS Instant Messaging protocol.
  • Protocol-specific parser 116e is configured to receive sessions conforming to the Network News Transfer Protocol (NNTP).
  • Protocol-specific parser 116f is configured to receive sessions conforming to the Simple Mail Transfer Protocol (SMTP).
  • directed session 114c (related to session 110b) is directed to protocol-specific parser 116c because protocol-specific parser 116c is configured as an HTTP parser.
  • each protocol-specific parser is configured to produce a common language representation of each session that is input to it.
  • An analyzer 120 communicates with the output of any of the set of protocol-specific parsers 116. That is, analyzer 120 is configured to communicate with protocol-specific parsers 116 using the common language generated by each of the set of protocol-specific parsers 116. Thus, analyzer 120 can communicate with any of the protocol-specific parsers 116 regardless of the protocol of the sessions they are configured to handle. Consequently, using the common language output of protocol-specific parsers 116 eliminates the need to have a plurality of analyzers corresponding to each of the protocols as required in conventional network analysis systems.
  • FIG. 2 is a schematic diagram illustrating the parser aspect of the present invention in greater detail.
  • Directed sessions 114 are the sessions output by parser director 112 according to the protocol(s) of the sessions. Directed sessions 114 are directed to a set of protocol-specific parsers 116. As shown in Figure 2, directed sessions 114 generally conform to disparate protocols.
  • Each directed session output by parser director 112 is input to a protocol-specific parser configured to process the protocol associated with that session.
  • FTP session 114a is input to an FTP-specific parser 116a.
  • Telnet session 114b is input to Telnet-specific parser 116b.
  • HTTP session 114c is input to HTTP-specific parser 116c.
  • MS Instant Messaging session 114d is input to MS Instant Messaging-specific parser 116d.
  • NNTP session 114e is input to NNTP-specific parser 116e.
  • SMTP session 114f is input to SMTP-specific parser 116f.
  • Protocol-specific parsers 116 process their input in order to output data conformed to a protocol-independent common language.
  • the term "common language" means a language that can be used to represent network traffic conformed from multiple, disparate protocols.
  • the content expressed in the form of the common language may be referred to herein as "metadata."
  • the common language is an event-based language (described in greater detail below).
  • FTP-specific parser 116a outputs sessions in a common language 118a.
  • Telnet-specific parser 116b outputs sessions in a common language 118b.
  • HTTP-specific parser 116c outputs sessions in a common language 118c.
  • MS Instant Messaging-specific parser 116d outputs sessions in a common language 118d.
  • NNTP-specific parser 116e outputs sessions in a common language 118e.
  • SMTP-specific parser 116f outputs sessions in a common language 118f.
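The arrangement of Figure 2, in which differently specialized parsers all emit the same kind of output, can be sketched as a registry of parser functions. This is a hedged illustration rather than the patent's design: the `PARSERS` table, the `register` decorator, the `direct` function, and the dictionary shape of each output record are all assumptions made for this example, and each parser recognizes only a single command.

```python
from typing import Callable

# Hypothetical common-language record: a plain dict with "action" and "entity".
Event = dict

PARSERS: dict[str, Callable[[bytes], list[Event]]] = {}


def register(protocol: str):
    """Decorator that files a parser under the protocol it understands."""
    def wrap(fn):
        PARSERS[protocol] = fn
        return fn
    return wrap


@register("SMTP")
def parse_smtp(session: bytes) -> list[Event]:
    events = []
    for line in session.decode("latin-1").splitlines():
        if line.upper().startswith("MAIL FROM:"):
            events.append({"action": "send message",
                           "entity": line[10:].strip(" <>")})
    return events


@register("FTP")
def parse_ftp(session: bytes) -> list[Event]:
    events = []
    for line in session.decode("latin-1").splitlines():
        if line.upper().startswith("RETR "):
            events.append({"action": "get resource",
                           "entity": line[5:].strip()})
    return events


def direct(protocol: str, session: bytes) -> list[Event]:
    """Send a session to the parser registered for its protocol."""
    return PARSERS[protocol](session)


smtp_events = direct("SMTP", b"HELO a\r\nMAIL FROM:<bob@example.com>\r\n")
ftp_events = direct("FTP", b"USER anonymous\r\nRETR report.txt\r\n")
```

Because both parsers return records of the same shape, code that consumes the output does not need to know which protocol produced it.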
  • FIG. 3 is a flow diagram of an embodiment of a method for analyzing network traffic in accordance with the present invention.
  • this method is practiced by a system that collects, assembles, and parses data conformed to multiple protocols into data conformed to a common language.
  • many different elements, configurations, or combination of elements can be used to implement the methods described below. For clarity, however, the below description of preferred methods of the invention uses many of the elements described in Figures 1 and 2.
  • packet handler 104 collects packets from network 102. Preferably, as part of collecting packets in step 302, packet handler 104 monitors communications comprising packets across network 102. In one embodiment of the present invention, packet handler 104 collects packets by copying them from the monitored communications across network 102. The collected packets can be stored in a file (not shown).
  • In step 304, packet handler 104 makes the collected packets available to assembler 106.
  • Packet handler 104 can make the packets available to assembler 106 by storing the packets in a file that assembler 106 can access. In another exemplary embodiment, packet handler 104 makes the packets available to assembler 106 in real-time without recording the packets. In each of these embodiments, as part of step 304, assembler 106 receives the collected packets.
  • In step 306, assembler 106 assembles the packets into sessions. These sessions preferably consist of packets of the same network protocol and preferably the same source/target addresses found in each network layer.
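The grouping rule in step 306 can be sketched as bucketing packets by protocol and address pair. This is a simplified illustration, not the assembler of the referenced co-pending application: the `Packet` type and `assemble` function are hypothetical, ordering and retransmission are ignored, and treating both directions of a conversation as one session is an assumption of this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    protocol: str
    src: str
    dst: str
    payload: bytes


def assemble(packets):
    """Group packets into sessions keyed by protocol and the address pair."""
    sessions = defaultdict(list)
    for p in packets:
        # Sort the endpoints so both directions of a conversation share a key.
        key = (p.protocol, *sorted((p.src, p.dst)))
        sessions[key].append(p.payload)
    return dict(sessions)


packets = [
    Packet("TCP", "10.0.0.1", "10.0.0.2", b"a"),
    Packet("TCP", "10.0.0.2", "10.0.0.1", b"b"),
    Packet("UDP", "10.0.0.1", "10.0.0.3", b"c"),
]
sessions = assemble(packets)
```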
  • In step 308, assembler 106 communicates the sessions, which conform to one or more protocols, to parser director 112. Alternatively, parser director 112 may actively capture sessions 110 from assembler 106.
  • parser director 112 directs assembled sessions to protocol-specific parsers 116.
  • parser director 112 performs protocol matching and lexical analysis of the session content to decide to which protocol-specific parsers 116 to direct each assembled session.
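One simple way to picture protocol matching on session content is a table of patterns tested against the first line of the session. The sketch below is an assumption-laden illustration, not the parser director's actual logic: the `SIGNATURES` table and `identify_protocol` function are hypothetical, the session buffer is assumed to begin with the client's first command, and the patterns are far too loose for production use (for instance, `USER` is also a command in other protocols).

```python
import re

# Hypothetical signature table, checked in order against the first line.
SIGNATURES = [
    ("HTTP", re.compile(rb"^(GET|POST|HEAD|PUT|DELETE) \S+ HTTP/\d\.\d")),
    ("SMTP", re.compile(rb"^(HELO|EHLO) ", re.IGNORECASE)),
    ("FTP", re.compile(rb"^USER \S+", re.IGNORECASE)),
]


def identify_protocol(session: bytes):
    """Return the name of the first matching protocol, or None if unknown."""
    first_line = session.split(b"\r\n", 1)[0]
    for name, pattern in SIGNATURES:
        if pattern.match(first_line):
            return name
    return None


http_guess = identify_protocol(b"GET / HTTP/1.1\r\nHost: x\r\n\r\n")
smtp_guess = identify_protocol(b"EHLO mail.example.com\r\n")
unknown = identify_protocol(b"\x00\x01\x02")
```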
  • protocol-specific parsers 116 receive directed sessions 114 from parser director 112.
  • protocol-specific parsers 116 output the parsed sessions in the common language.
  • each of protocol-specific parsers 116 operates on sessions that conform to the protocol that the parser is configured to parse. If there is more than one protocol present in the session data presented to parser director 112, preferably there will be a protocol-specific parser for each protocol present in the session data.
  • the protocol-specific parsers output a common language representation of the session data input to them.
  • the protocol-specific parsers parse metadata representative of the session data.
  • the metadata conforms to the common language.
  • protocol-specific parsers 116 submit the common language data to an analyzer.
  • Protocol-specific parsers 116 can also record common language data to a record (or log).
  • protocol-specific parsers 116 or analyzer 120 may access the common language data from the record. If protocol-specific parsers 116 access the common language data from the record, protocol-specific parsers 116 then communicate the common language data to analyzer 120.
  • In step 318, analyzer 120 analyzes data conformed to the common language.
  • analyzer 120 is a workstation-based system having a graphical user interface (GUI) for formulating queries and performing other analyses on the database.
  • analysis tools such as those included in analyzer 120, do not have to be changed when protocols are added or changed because protocol-specific parsers 116 can be modified or added to the system. Sessions parsed into metadata in the common language are described in an exemplary embodiment as common language data in Figures 1 and 2 and as common-language sessions or sessions in common language herein.
  • Figure 4 is a flow diagram of another embodiment of a method for analyzing network communications in accordance with the present invention.
  • the method comprises steps for parsing information from sessions conforming to one or more protocols into metadata conforming to a common language.
  • Many different elements, configurations, or combinations of elements can be used to implement the methods described below. For clarity, however, the below description of preferred methods of the invention uses many of the elements set forth in Figures 1 and 2.
  • protocol-specific parsers 116 receive directed sessions 114. Each parser of protocol-specific parsers 116 receives only directed sessions 114 that conform, at least in part, to the protocol that the receiving protocol-specific parser is configured to parse. For example, parser 116b is configured to parse sessions conformed to the Telnet protocol. Thus, parser 116b receives any session that, in part, conforms to the Telnet protocol (see Figure 2).
  • protocol-specific parsers 116 extract information from directed sessions 114.
  • the extracted information can be stored in step 405.
  • protocol-specific parsers 116 translate the extracted information into a common language.
  • Telnet-specific parser 116b extracts session data conforming to the Telnet protocol and translates that data into the common language.
  • protocol-specific parsers 116 carefully extract only information generally useful in analyzing the communication(s) that each session represents.
  • this embodiment of the present invention creates a common language 118 representation of the session data that is significantly smaller than directed sessions 114 or sessions 110. Consequently, these representations are cheaper and more efficient to store.
  • the common language data is more quickly and easily analyzed due to its significantly smaller size.
  • In step 408, protocol-specific parsers 116 communicate sessions in common language 118.
  • protocol-specific parsers 116 may communicate each session of the sessions in common language 118 one-at-a-time or in groups to analyzer 120.
  • analyzer 120 analyzes sessions in common language 118. In this exemplary embodiment, only one analyzer 120 is used to analyze all of the sessions in common language 118.
  • one or more database records for storing the common language data are created in step 414. The database can be later accessed by an analyzer such as analyzer 120 to analyze the data.
  • FIG. 5 is a schematic diagram of another embodiment of a system for analyzing network traffic in accordance with the present invention.
  • this embodiment shows an exemplary embodiment of a common language, called an event-based language, to which network communications or input files containing communications are translated in preparation for analysis.
  • event-based language 502 follows a taxonomy of session 504, events 506, and properties 508.
  • event-based language 502 further comprises aliases 510 and routes 512.
  • each session corresponds to one or more network events.
  • sessions may be used to group events per computer per application.
  • a computer in communication with a server using a Netscape browser can be one session; the server response to the computer can be another session.
  • Sessions can be used to group events in other fashions, for example, in order to accommodate so-called "port-jumping" protocols.
  • sessions can encompass other sessions in a directory-type system structure.
  • Events 506 can be described in terms of entities 514 involved in each event of events 506.
  • each event of events 506 corresponds to a communication between at least two entities 514.
  • Each event of events 506 can also be described in terms of various properties 508 associated with it.
  • each event of events 506 can also be described in terms of aliases 510 of entities 514 for each event, and routes 512 associated with each event.
  • aliases 510 of entities 514 can be recorded as a property to each entity (not shown in Figure 5) and routes 512 can be recorded as indirect events to session 504.
  • each session (e.g., network transaction or other communication) can be converted to a standard set of outputs.
  • a protocol-specific parser such as one of protocol-specific parsers 116: events 506 and properties 508.
  • the metadata describing sessions involving a variety of protocols can be stored in as few as two basic tables. This is a significant benefit of the present invention in comparison to prior approaches.
  • the metadata conforming to the event-based language can be stored in a log or record having as few as two columns.
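The two-table storage idea can be sketched with an in-memory SQLite database, one table for events and one for properties. The column names and sample rows below are assumptions made for illustration; the text states only that two basic tables can suffice, not how they are laid out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        event_id   INTEGER PRIMARY KEY,
        session_id INTEGER NOT NULL,
        source     TEXT NOT NULL,
        action     TEXT NOT NULL,
        target     TEXT NOT NULL
    );
    CREATE TABLE properties (
        event_id INTEGER NOT NULL REFERENCES events(event_id),
        name     TEXT NOT NULL,
        value    TEXT NOT NULL
    );
""")

# An e-mail event with a subject property, and a file transfer event without one.
conn.execute("INSERT INTO events VALUES "
             "(1, 1, 'alice@example.com', 'send message', 'bob@example.com')")
conn.execute("INSERT INTO properties VALUES (1, 'subject', 'Quarterly report')")
conn.execute("INSERT INTO events VALUES "
             "(2, 2, '10.0.0.5', 'get resource', '/pub/readme.txt')")

# One query covers events from both protocols.
rows = conn.execute("""
    SELECT e.action, p.value
    FROM events e
    LEFT JOIN properties p ON p.event_id = e.event_id
    ORDER BY e.event_id
""").fetchall()
```

The same query serves an e-mail event and a file transfer event, which is the practical payoff of storing protocol-independent records.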
  • Figure 5 illustrates an exemplary structure of the event-based language as applied to transactions in a computer network.
  • each transaction will be grouped in a single session 504 and can be described in terms of one or more of: events 506, properties 508, aliases 510, and routes 512.
  • an entity of entities 514 can be one of three types: a computer 522, a user 520, or a resource 524.
  • an entity that is computer 522 could be a host, a server, a desktop, a laptop, and so forth.
  • Computer 522 could be identified by a network address, a computer name, a host name, a port number, and so forth.
  • Computer 522 can be a computer that is within network 102 (Figure 1) or another network that is being accessed or one that is outside of either network 102 or the other network.
  • User 520 can be an individual, such as an authorized user on a computer network.
  • User 520 may be an e-mail address, a local area network (LAN) user, the "Full Name" (real name) of the user, a handle or name used to identify user 520, and so forth.
  • Resource 524 may be a resource that is accessed or used during an event.
  • resource 524 may be a file, data from within a database, or a message from a shared bulletin board.
  • Resource 524 can also be a container of other resources, such as a file system directory structure, a database, tables in a database, or a shared bulletin board. Examples of entity types, such as resource 524, computer 522, and user 520, and corresponding numerical representations are:
  • IP-PORT IP-PORT
  • IP-USER IP-USER
  • the common language is represented by an event-based language.
  • the event-based language permits events on a computer network to be described using so-called event statements.
  • an event can refer to transactions between or involving differing types of entities, such as the following interactions between entities: computer-computer, user-computer, user-user, user-resource, and so forth.
  • An event statement 526 describes an action taken by one entity with respect to at least one other entity using a service.
  • each event statement 526 preferably comprises two parameters: (1) one or more entities 514; and (2) an action 516.
  • a session statement 534 describes a session. As such, each session statement 534 includes some facts about session 504. In an exemplary embodiment, session statement 534 includes the times that session 504 began and ended, the size of session 504 (e.g., 1.5 MB), and a service type 518 of the session. Generally, a service type (sometimes referred to herein as a "service" or "application") refers to or is related to a protocol or application used during network communications.
  • a property statement 528 preferably includes facts about either session 504 or event 506. In an exemplary embodiment where event 506 includes an email communication, property statement 528 can include the subject line of the email communication.
  • a route statement 532 preferably includes facts about the route that an event traveled.
  • An alias statement 530 preferably includes information regarding the identity of user 520, computer 522, or resource 524.
  • Examples of actions that might be logged into a record using the event-based language for network level communications include: an ETHERNET transaction, an IP transaction, or a TCP transaction.
  • Examples of actions that might be logged into a record at the application level include: a "user login" (a user attempting or obtaining access to a system), a "user logoff," a "get resource" (e.g., getting or acquiring a resource, such as downloading a file or selecting a database row), a "put resource" (e.g., performing an operation using a resource, such as saving a file, uploading a file, or inserting a database row), a "delete resource" (e.g., removing a resource, such as deleting a file or database row), a "send message" (e.g., sending an e-mail or sending an Instant Message), a "receive message" (e.g., receiving an e-mail or receiving an Instant Message), a "read message" (e
  • actions can be used in order to tailor the common language to a particular computer network or to accommodate new applications.
  • the library of actions is sufficient to describe actions, such as action 516, taken in connection with a communication between two entities, such as entities 514.
  • Examples of services that might be logged into a record using the common language include: File Transfer Protocol (FTP), TELNET, Simple Mail Transfer Protocol (SMTP), Domain Name Service (DNS), Hypertext Transfer Protocol (HTTP), POP3, Network News Transfer Protocol (NNTP), Server Message Block (SMB), MSSQL™/Sybase™ Database protocol (e.g., TDS), Oracle™ Database Protocol (e.g., TNS), Lotus Notes™, Dynamic Host Configuration Protocol (DHCP), Remote Procedure Call (RPC), Routing Information Protocol (RIP), Network File System (NFS), and Instant Messenger Protocols (AOL™, MSN, Yahoo™, etc.).
  • event statement 526 can be expressed in the form: <ENTITY1> was seen <ACTION> to <ENTITY2>.
  • event statement 526 can also include service type 518, as shown in Figure 9a.
  • the expression of event statement 526 is of the form: <ENTITY1> was seen <ACTION> to <ENTITY2> with <SERVICE TYPE> for an event of events 506 involving two entities of entities 514, one at the "source" end and one at the "target" end.
  • event statement 526 can be expressed as: <ENTITY1A, ENTITY1B> was seen <ACTION> to <ENTITY2A, ENTITY2B> with <SERVICE TYPE>, also as shown in Figure 9a.
  • event 506 for a first user (TODD) of entities 514 sending an e-mail to a second user (DAMON) of entities 514 can be expressed by event statement 526 conformed to the following form: <USER TODD> was seen <SENDING MESSAGE> to <USER DAMON> with <SMTP>, as shown in Figure 9a.
  • event 506 for a user (TODD) of entities 514 using a first computer to receive via File Transfer Protocol (FTP) a file containing a password stored on a second computer can be expressed by event statement 526 conformed to the following form: <COMPUTER 192.168.1.2, USER TODD> was seen <GETTING RESOURCE> from <COMPUTER 192.168.1.1, RESOURCE: /etc/passwd> using <FTP>, as shown in Figure 9a.
  • Protocol-specific parsers 116 do not have to output events in the format of event statement 526.
  • protocol-specific parsers 116 extract and output three parameters that can form event statement 526: entities, action, and service type. These basic parameters can be stored and, if desired, displayed in event statement format for a readily comprehended metadata description of the event, or in some other format.
  • Each event 506 may also have properties associated with the event.
  • event 506 corresponding to an e-mail (e.g., referring to the action types listed above, the action type "SEND_MSG" and the service "E-mail (SMTP)") may have associated properties.
  • the properties for such an e-mail may include the subject line of the e-mail ("IMPORTANT INFORMATION, PLEASE READ"), the sender password ("test12"), and the application used for the action ("Outlook Express").
  • Figure 9b illustrates an exemplary property name-value pair for storing properties associated with an event.
  • Figure 9b shows three name fields: "subject," "password," and "application."
  • Figure 9b shows three values for those name fields: "IMPORTANT INFORMATION, PLEASE READ", "test12", and "Outlook Express".
  • Other property types or fields could be included, such as the size of the event, the time of the event, file attachments, full names of the sender and all recipients, and so forth.
  • Each event, such as event 506, may also have associated routes, such as route 512.
  • Route 512 refers to network communication information that may be carried within captured data, but that was not directly observed in collecting the data.
  • a collected e-mail may include a list or log of the servers through which the e-mail message passed. This internal routing information, while not directly observed, can be extracted and stored.
  • Figure 9c illustrates an exemplary format for capturing the routing information.
  • the exemplary format is a ⁇ COMPUTER ENTITY> to ⁇ COMPUTER ENTITY> format.
  • Event 506 may have multiple routes 512 corresponding to multiple route statements, each like the one shown in Figure 9c.
  • Each event, such as event 506, may also have associated aliases, such as alias 510.
  • Aliases 510 are names or values for an entity (e.g., a computer or a user) that describe the same entity.
  • event 506 may involve a computer entity, such as computer 522, defined by the IP address "192.168.1.12."
  • Event 506 may also involve a user entity, such as user 520, defined by the e-mail address "todd@forensicsexplorers.com."
  • Computer 522 may be correlated to the alias "forensicsexplorer.com" and user 520 may be correlated to the alias "Todd Moore."
  • Figure 9d illustrates an exemplary storage format for storing alias information for events. Therefore, the present invention provides that when event 506 is extracted the observed entities 514 can be correlated to known aliases 510. This information can be stored and associated with event 506 for later review and/or processing.
  • the invention parses information from each session or other communication data.
  • the invention parses information following the method set forth in Figure 6.
  • Figure 6 provides a flow diagram for an exemplary method for converting sessions into the event-based language.
  • the event-based language is one example of a common language according to the present invention.
  • the step of identifying event routes may comprise treating an identified route as an "indirect event."
  • the step of identifying aliases may comprise treating an identified alias as a property of an entity. This might permit storing routes in an event table and aliases in the properties table. By treating routes and aliases under the rubric of events and properties, respectively, the number of tables required for a log or file of the sessions can be reduced.
  • assembler 106 receives packets in step 602.
  • the packets are assembled into sessions in step 604.
  • Protocol-specific parsers 116 (in this case, one parser for each protocol in the session) extract session properties in step 606.
  • Protocol-specific parsers 116 then identify events in step 608, identify routes in step 610, identify entities in step 612, identify entity aliases in step 614, identify actions in step 616, and extract event properties in step 618, from within the session.
  • Protocol-specific parsers 116 continue to parse the session until all events within the session have been parsed in step 620.
  • Protocol-specific parsers 116 parse other sessions, according to step 620 and so forth.
  • the method illustrated in Figure 6 presumes that the service type will be the same for all events in a session. Accordingly, the service is extracted as a property of the session. Alternatively, the service type can be identified for each event. In that case, the method performs the step of identifying a service type in the session in step 617.
  • Figure 7 illustrates an example of applying the present invention to parse an SMTP (Simple Mail Transfer Protocol) session.
  • area "A" displays protocol-specific data from the session, which consists of multiple data packets for an e-mail that was sent from one user to another.
  • the session includes network-level data (e.g., Ethernet and TCP/IP) and application data (e.g., SMTP and Microsoft Outlook).
  • Area "B" displays the metadata that describes the session according to the event-based language.
  • the overall SMTP session is described by four properties: time, size, service, and subject (not shown).
  • the session includes three separate events: (1) a first event between the source computer (entity) and the target computer (entity) for an IP transaction (action); (2) a second event between the port (entity) of the source computer and the port (entity) of the target computer for a TCP transaction (action); and (3) a third event between the source user (entity) and the target user (entity) for sending a message (action).
  • the service type (SMTP) is not separately recited for each of the events because it is the same for all events in the session.
  • Properties of the third event are also identified.
  • the properties include the identity of the application (MS Outlook) and the attached file (winmail.dat).
  • Figure 8 illustrates an example of applying the present invention to parse an FTP (File Transfer Protocol) session.
  • a user has logged into a site, stored a file, retrieved some data, and then deleted the file.
  • in area "A" of Figure 8, network-level data and application data from the packets and within the session are shown.
  • the session is translated into metadata conformed to the event-based language shown in area "B."
  • Figures 7 and 8 provide an exemplary illustration of the benefits of the invention.
  • the protocol-specific data in area A for both figures is complex and unwieldy. More importantly, the extracted data for the SMTP session (shown in Figure 7) is very different from the extracted data for the FTP session (shown in Figure 8). Additionally, the extracted data (area A) is not readily or easily understood in terms of the events that took place. Without the present invention, logs of SMTP sessions and FTP sessions would require separate analysis tools to be analyzed.
  • Figures 10, 11a, and 11b provide a record of an exemplary embodiment of data from protocol-specific sessions.
  • Figure 10 illustrates data from a session conforming to the HTTP protocol.
  • Figure 11a illustrates data from a session conforming to the SMTP protocol.
  • Figure 11b illustrates data from a session conforming to the FTP protocol.
  • Figure 12a illustrates a log output file of the three sessions illustrated in part in Figures 10, 11a, and 11b after they have been parsed into metadata conformed to the event-based language of the present invention.
  • the metadata for the first session is represented in the first seven lines of the exemplary log output file.
  • the metadata for the second session is represented in lines eight to eighteen of the exemplary log output file.
  • the metadata for the third session is represented in lines nineteen to twenty-three of the exemplary log output file. This output follows the form shown in Figure 12b.
  • R relate to types of metadata about the route or routes taken by the session of data or the data packets that comprise the session.
  • the output of this exemplary embodiment of the invention shows parsing of sessions in disparate protocols into a compact output conforming to a common language.
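The event statement form described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Entity` and `EventStatement` and the `render` method are hypothetical and do not come from the disclosure, and the sketch always uses the "was seen ... to ... with ..." wording, whereas the examples above also use "from" and "using".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity: a type (COMPUTER, USER, or RESOURCE) plus an identifier."""
    kind: str
    value: str

    def __str__(self) -> str:
        return f"{self.kind} {self.value}"


@dataclass(frozen=True)
class EventStatement:
    """Hypothetical container for the parameters of an event statement."""
    source: Tuple[Entity, ...]      # one or more entities at the "source" end
    action: str
    target: Tuple[Entity, ...]      # one or more entities at the "target" end
    service: Optional[str] = None   # service type, if recorded per event

    def render(self) -> str:
        # Build "<SOURCE> was seen <ACTION> to <TARGET> [with <SERVICE TYPE>]".
        src = ", ".join(str(e) for e in self.source)
        dst = ", ".join(str(e) for e in self.target)
        text = f"<{src}> was seen <{self.action}> to <{dst}>"
        if self.service is not None:
            text += f" with <{self.service}>"
        return text


email = EventStatement(
    source=(Entity("USER", "TODD"),),
    action="SENDING MESSAGE",
    target=(Entity("USER", "DAMON"),),
    service="SMTP",
)
print(email.render())
```

Keeping the entities, action, and service type as separate fields mirrors the point made above that parsers need only output those parameters; the sentence-like rendering is one optional display format.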

Abstract

A system for and method of extracting information from multiple sessions of disparate protocols into a common language is disclosed. A method of creating a record conforming to an event-based language is also disclosed. A system configured to create a record conforming to an event-based language is also disclosed.

Description

APPARATUS AND METHOD FOR NETWORK ANALYSIS
Background
[0001] This application claims the benefit of U.S. Provisional Application serial No. 60/286,966, filed April 30, 2001.
Government Rights
[0002] The invention was made with Government support under a classified contract awarded by the U.S. Government. The Government may have certain rights in the invention.
Field of the Invention
[0003] The present invention generally relates to the field of network analysis. More particularly, the present invention relates to methods and apparatus for parsing information in network protocols into a common language for analysis.
Background of the Invention
[0004] Not long ago, people communicated important information between one another through the physical delivery of paper. Delivering documents in this way to convey important information once dominated business but has since been largely displaced by electronic delivery and communication. Whether it is by email or otherwise, today people send many sensitive and important documents and information electronically.
[0005] The movement to electronic distribution of information has increased businesses' awareness of security issues. Electronic files are easy to copy and transmit out of an unwitting organization. Potential saboteurs like hackers, for example, can access, steal, alter, and/or destroy important information.
[0006] This increased awareness in security issues concerning electronic communications led companies to begin to monitor data transfers between entities, such as people, computers, and resources. The enormous volume of data generated by communications between entities (e.g., people viewing websites, people sending emails to one another, people transferring files to one another, and many other communications) made it difficult for a company to monitor all of the communication information. To help alleviate this problem, companies developed systems that analyze communications to determine which communications are likely illegal or otherwise prohibited by the companies' business rules.
[0007] Computers on a network send information to each other as part of a communication session. The data for this communication session is broken up by the network and transferred from a source address to a destination address. This is analogous to the postal system, which uses zip codes, addresses, and known routes of travel to ship packages. If one were to ship the entire contents of a home to another location, it would not be cost effective or an efficient use of resources to package everything into one container for shipping. Instead, smaller containers would be used for the transportation and the contents reassembled after delivery. Computer networks work in a similar fashion by taking data and packaging it into smaller pieces for transmitting across a network. Each of these packets is governed by a set of rules that defines its structure and the service it provides. For example, the World Wide Web has a standard protocol defined for it, the Hypertext Transfer Protocol (HTTP). This standard protocol dictates how packets are constructed, how data is presented to web servers, and how these web servers return data to the client web browsers.
[0008] Any application that transmits data over a computer network uses one or more protocols. There are many layers of protocols in use between computers on a network. Not only do web browsers have protocols they use to communicate, but the network has underlying protocols as well. This technique is called data encapsulation. For example, when you make a request to a web site, your data request is encapsulated by the HTTP protocol used by your browser. The data is then encapsulated by the computer's network stack before it is put onto the network. The network may encapsulate the packet into another packet using another protocol for transmission to another network. Each layer of the protocol helps provide routing information to get the packets to their target destination.
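The layering described in this paragraph can be pictured with a small Python sketch. It is a toy illustration only: the bracketed header strings are placeholders invented for the example, not real TCP, IP, or Ethernet headers, and the helper name `encapsulate` is hypothetical.

```python
def encapsulate(payload: bytes, header: bytes) -> bytes:
    """Wrap a payload by prepending one layer's (placeholder) header."""
    return header + payload


# Application data: an HTTP request produced by the browser.
http_request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"

# Each lower layer wraps the data handed down from the layer above it.
segment = encapsulate(http_request, b"[TCP]")   # transport layer
packet = encapsulate(segment, b"[IP]")          # network layer
frame = encapsulate(packet, b"[ETH]")           # link layer

print(frame)
```

An analyzer working on captured traffic has to undo these layers in the reverse order before it can reach the application data.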
[0009] In order for a company to analyze or monitor its users' traffic effectively, companies typically use tool(s) to: "sniff" or capture the packets traversing the network of interest; understand the protocol being used in the communication; analyze the data packets used in the communication; and draw conclusions based on information gained from this analysis. Conventional tools for analyzing network traffic include protocol analyzers, intrusion detection systems, application monitors, log consolidators, and combinations of these tools.
[0010] A conventional protocol analyzer can provide insight into the type of protocols being used on a network. The analysis tools within this analyzer enable the analyzer to decode protocols and examine individual packets. By examining individual packets, conventional protocol analyzers can determine where the packet came from, where it is going, and the data that it is carrying. It would be impossible to look at every packet on a network by hand to see if security concerns exist; therefore, more specialized analysis products were created.
[0011] One example of a more specialized but conventional analysis tool is an Intrusion Detection System (IDS), which validates network packets based on a series of known signatures. If the IDS determines that certain packets are invalid or suspicious, the IDS will alert the company. Company employees, in some cases using additional analysis tools, must then analyze most of these alerts. This analysis can require extensive manpower and resources.
[0012] Another example of a more specialized but conventional analysis tool is an application monitor. Application monitors focus on specific application layer protocols to decide if illegal or suspicious activity is being performed. This conventional application monitor may focus, for example, on the Hypertext Transfer Protocol (HTTP) to monitor employee accesses to websites. When this monitor is used, such as when an employee visits a website, the company can monitor the packets transmitted and received between the employee's computer and the web server. These packets can be analyzed by parsing the HTTP protocol to determine the website's hostname, the name of the file requested, and the associated content that was retrieved. Thus, this HTTP analyzer could be used to decide if an employee is visiting inappropriate web sites and alert the company of this activity. This type of analysis tool monitors the actions of web browsers, but falls short for other types of communications.
[0013] Another conventional application monitor can monitor the Simple Mail Transfer Protocol (SMTP). This system could be used to record and track e-mails sent outside of the company to ensure employees were not sending trade secrets or intellectual property owned by the company. It could also ensure e-mails entering into the corporation did not contain malicious attachments or viruses. Employees could, however, use other means of communication such as instant messaging, chat rooms, and website-based e-mail systems. Because this application monitor only monitors SMTP communications, companies must also use many other security and analytical tools to monitor network activity.
[0014] Another example of a more specialized but conventional analysis tool is a log consolidator system (LCS). The LCS processes log-based output from network applications or devices. These data inputs can include firewall logs, router logs, application logs such as web server or mail server logs, computer system logs, and/or IDS alerts. Typically, a specific LCS analysis tool is required for each different log format, which means multiple analysis systems are needed for each different type of log file format.
[0015] While these and other conventional network analysis systems analyze communications of a particular protocol or format, they fail to analyze a broad range of protocols and formats. Thus, a company wishing to ensure security of its network currently must purchase and maintain multiple network analysis systems. Further, with each new protocol or protocol change, companies must create, rewrite, upgrade, or repurchase at least one of their systems. The conventional method of using a patchwork of multiple analyzers is expensive and complex to maintain.
[0016] In addition, because of the many ways to communicate over a network and the many different analysis tools needed to perform network forensics, the conventional method makes it difficult to answer even simple questions such as "What is happening on my network?," "Who is talking to whom?," and "What resources are being accessed?" It is difficult because there is no limit as to which applications one can use. Each application introduced onto a network brings new protocols and new analytical tools to audit those applications. For example, there are many ways to send a file to another person using a network: e-mailing the document as an attachment using the SMTP protocol; transmitting the file using an Instant Messenger like MSN, AOL IM™, or Yahoo™ IM; uploading the file to a shared file server using the FTP protocol; web sharing the document using the HTTP protocol; or uploading the file directly using an intranet protocol like SMB or CIFS. All of these protocols are implemented differently and special analysis tools are required to interpret them, which results in a complex and expensive system.
[0017] The conventional analysis systems also fail because they require training personnel to use the numerous analysis tools needed to investigate network communications having many different protocols. This training is expensive. In addition, network analysis continues to become increasingly difficult due to the large number of new applications and protocols being introduced every year.
[0018] Other systems found outside of computer networks have similar issues regarding analysis. These issues can be found in "badge swipe" systems, used to monitor the movement of persons in and out of a building, in traffic monitoring systems that monitor cars passing through radio frequency identification (RFID) toll points, property monitoring systems that monitor video cameras and various motion sensors or other sensors, and in other contexts involving the collection and analysis of data of varying protocols or languages. Specific analytical tools must be developed for each collection system making it difficult to cross-correlate events and perform analysis.
Summary of the Invention
[0019] To address the foregoing problems and others associated with monitoring large volumes of data in numerous protocols, the present invention is directed to conversion of network traffic containing multiple protocols into a common language suited for analysis. In addition, because data in multiple, disparate protocols may be described in a common language, a unique analysis logic or a protocol-specific analyzer will not be needed for every protocol, thereby significantly reducing the complexity associated with conventional systems.
[0020] In one aspect of the invention, the common language of the present invention permits any network transaction, regardless of the particular application or protocol, to be described.
[0021] In another aspect of the invention, common language descriptions are stored as "metadata," which describes the communication. As used herein, the term "metadata" means information taken from a communication or associated with a communication that describes the communication. For example, metadata can include the communication's start time; stop time; size; protocols used; computers, entities, and resources involved; routing information; aliases of the computers, entities, and resources; properties of communication; and other information useful to a person or computer analyzing the communication. Common language descriptions of the metadata describing a communication often require less than one percent of the storage space required by the communication itself.
[0022] In another aspect of the invention, the common language is in the form of an event-based language that permits description of a communication in terms of its sessions, events, and properties.
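As a rough illustration of describing a communication in terms of a session, its events, and their properties, the following Python sketch stores such metadata in a small record and serializes it. The class names, field names, and all values (times, size, subject) are assumptions made for the example and are not taken from the disclosure.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """Hypothetical event: source entity, action, target entity, and properties."""
    source: str
    action: str
    target: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class SessionRecord:
    """Hypothetical session-level metadata plus the events seen in the session."""
    start_time: str
    stop_time: str
    size_bytes: int
    service: str
    events: List[Event] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)


record = SessionRecord(
    start_time="2001-04-30T09:00:00",   # illustrative values only
    stop_time="2001-04-30T09:00:02",
    size_bytes=1_500_000,
    service="SMTP",
    events=[
        Event("USER TODD", "SENDING MESSAGE", "USER DAMON",
              {"application": "Outlook Express"}),
    ],
    properties={"subject": "IMPORTANT INFORMATION, PLEASE READ"},
)

# asdict() converts nested dataclasses, so the whole record serializes in one call.
serialized = json.dumps(asdict(record), indent=2)
print(serialized)
```

In this made-up example the serialized record is a few hundred characters long while the session it describes is 1.5 MB, which gives a feel for why a metadata description can be far smaller than the communication itself.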
[0023] In another aspect of the invention, protocol-specific data is parsed into an event-based language based on the nature of the transaction included within the data.
[0024] The present invention can be used in a variety of contexts, including transactions in a computer network, transactions in an application or device log file, transactions found on computer media, transactions in badge detectors, transactions generated by motion detectors, transactions generated in connection with phone calls, transactions generated in connection with credit card transactions, and other systems in which transactions occur according to one or more protocols. Generally, systems with communications using multiple protocols, formats, and/or application types can benefit from the invention.
[0025] Additional features and advantages of the present invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and advantages of the invention will be realized and attained by the structure and steps particularly pointed out in the written description, the claims and the drawings.
Brief Description of the Drawings
[0026] Figure 1 is a schematic diagram of a system for analyzing computer network traffic in accordance with an embodiment of the present invention.
[0027] Figure 2 is a schematic diagram of parsers in accordance with an embodiment of the present invention.
[0028] Figure 3 is a flow diagram of a method for analyzing data packets in accordance with an embodiment of the present invention.
[0029] Figure 4 is a flow diagram of a method for analyzing session data in accordance with an embodiment of the present invention.
[0030] Figure 5 is a schematic diagram of an event-based language in accordance with an embodiment of the present invention.
[0031] Figure 6 is a flow diagram of a method for generating an event-based language from data packets in accordance with an embodiment of the present invention.
[0032] Figure 7 illustrates an exemplary generation of an event-based language corresponding to an email session in accordance with the present invention.
[0033] Figure 8 illustrates an exemplary generation of an event-based language corresponding to a file transfer session in accordance with the present invention.
[0034] Figure 9a illustrates an exemplary generation and form of an event-based language in accordance with the present invention.
[0035] Figure 9b illustrates an exemplary generation and form of an event-based language in accordance with the present invention.
[0036] Figures 9c and 9d illustrate exemplary generations of an event-based language in accordance with the present invention.
[0040] Figure 12a illustrates an exemplary generation of an event-based language in accordance with the present invention.
[0041] Figure 12b illustrates an exemplary form of an event-based language in accordance with the present invention.
Detailed Description of the Invention
[0042] Figure 1 is a schematic diagram of a system for analyzing network traffic in accordance with an embodiment of the present invention. Generally, the embodiment of the present invention shown in Figure 1 is a system configured to translate network communications or input files containing network communications into a common language for analysis. Specifically, this embodiment includes a system configured to input packets associated with communications across a network, assemble those packets into sessions, direct the sessions to appropriate parsers, parse the sessions into a common language, and communicate these common-language sessions to an analyzer.
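The step of directing sessions to appropriate parsers and collecting common-language output can be sketched as a simple dispatch table. This is a minimal Python illustration under stated assumptions: sessions are taken as already assembled, the dictionary shape of a session and the names `parse_smtp`, `parse_ftp`, `PARSERS`, and `translate` are hypothetical, and real protocol parsing is replaced by reading pre-extracted fields.

```python
from typing import Callable, Dict, List

Session = Dict[str, str]                 # hypothetical shape: protocol plus extracted fields
Parser = Callable[[Session], List[str]]  # a parser returns common-language statements


def parse_smtp(session: Session) -> List[str]:
    """Stand-in for an SMTP-specific parser."""
    return [f"<USER {session['sender']}> was seen <SENDING MESSAGE> "
            f"to <USER {session['recipient']}> with <SMTP>"]


def parse_ftp(session: Session) -> List[str]:
    """Stand-in for an FTP-specific parser."""
    return [f"<USER {session['user']}> was seen <GETTING RESOURCE> "
            f"to <RESOURCE {session['path']}> with <FTP>"]


# One parser per protocol; supporting a new protocol means registering one more entry.
PARSERS: Dict[str, Parser] = {"SMTP": parse_smtp, "FTP": parse_ftp}


def translate(sessions: List[Session]) -> List[str]:
    """Route each session to its protocol-specific parser and collect the output."""
    statements: List[str] = []
    for session in sessions:
        parser = PARSERS.get(session["protocol"])
        if parser is None:
            continue  # no parser registered for this protocol in this sketch
        statements.extend(parser(session))
    return statements


sessions = [
    {"protocol": "SMTP", "sender": "TODD", "recipient": "DAMON"},
    {"protocol": "FTP", "user": "TODD", "path": "/etc/passwd"},
    {"protocol": "GOPHER"},
]
for line in translate(sessions):
    print(line)
```

Because every parser emits the same kind of statement, whatever consumes the output of `translate` needs no protocol-specific logic.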
[0043] For example, a protocol-specific parser in accordance with the present invention can convert protocol-specific data at any network level into a common language. The common language can be used to describe network layer communications including, for example: Ethernet, Token Ring, TCP/IP, IPX/SPX, AppleTalk™, IPv6, and other network layer protocols. The common language also can be used to describe application layer communications including, for example: SMTP, HTTP, TELNET, FTP, POP3, RIP, RPC, Lotus Notes™, TDS, TNS, IRC, DNS, SMB, NFS, DHCP, NNTP, instant messengers (AOL IM™, MSN, YAHOO™) and other application layer protocols. The common language can also be used to describe the content of communications including, for example: E-Mail messages, PGP, S/MIME, V-Card, HTML, images, and other content types.
[0044] In Figure 1, a network 102 represents any network whereby communication between two or more entities may be made or monitored. Network 102 may be a simple network, for example, a cable connecting two computers, such as a computer 122 and a computer 124. Network 102 may be a complex network as well, such as a network configured to pass, allow passage of, or monitor communications between computers, servers, wireless computers, satellites, or other communication devices. For example, network 102 may represent intranets, extranets, and global networks including the Internet. For clarity in explaining but not to limit the function of network 102, Figure 1 sets forth a limited number of communication devices communicating through or monitored by network 102: computer 122; computer 124; a server 126; and a wireless computer 128.
[0045] Typically, communications between entities across or monitored by network 102 are made in pieces, rather than as a complete transfer. In such cases, a complete communication between two entities is broken into multiple pieces, or "packets," of data. Such packets conform to one or more protocols. As used herein, the terms "protocol or protocols," depending on the context, refer to network protocols such as TCP/IP, IPX/SPX, or AppleTalk™, as well as application protocols, such as FTP, SMTP, HTTP, and so forth. In other words, the terms "protocol or protocols," unless the context establishes a particular protocol, are intended to include any protocol in which data may be represented or transferred in any communication system.
[0046] A packet handler 104 is configured to monitor the many packets of data in network 102. For example, packet handler 104 can be a sniffer, such as EtherPeek™ available from WildPackets, Inc. In doing so, packet handler 104 is also configured to copy the packets in network 102. Packet handler 104 is also configured to send the packets to an assembler 106. Alternatively, assembler 106 may be configured to access the copied packets from packet handler 104. Packet handler 104 may also be configured to send the packets in real-time to assembler 106 without recording the packets. In any event, assembler 106 is configured to receive the packets of data representing communications in network 102. Packet handlers and assemblers may, in a preferred embodiment of the invention, be configured as set forth in co-pending U.S. Patent Application No. 09/552,878, filed April 20, 2000, claiming the benefit of U.S. Provisional Application No. 60/131,904, filed April 30, 1999, which is incorporated herein by reference in its entirety.
[0047] Assembler 106 is also configured to assemble the packets into the communication that the packets represent. Such communications are preferably assembled into sessions. Each session represents a communication between two or more entities. In an exemplary embodiment of the present invention, assembler 106 is configured to assemble the packets into a set of sessions 110. For example, the set of sessions 110 can include sessions 110a, 110b, 110c, and 110d. Sessions 110a, 110b, 110c, and 110d can conform to the same protocol, or conform to different protocols. For example, one of the sessions, session 110b, conforms to the well-known HTTP application protocol.
[0048] Sessions can also be generated by other session sources 108. Other session sources 108 can generate sessions that conform to a specific application type or protocol. These sources typically do not require the assembler 106 to reconstruct the network packets into a session. As shown in Figure 1, for example, other session sources 108 generate a session 110e. Session 110e conforms to a protocol, which may be, but need not be, the same as the protocol associated with one of the sessions of set of sessions 110.
[0049] Sessions generated by assembler 106 or other session source, such as other session source 108, are transmitted (or input) to a parser director 112. Parser director 112 is configured to accept sessions generated by assembler 106 or other session source 108. Parser director 112 directs each session to one of a set of protocol-specific parsers 116 corresponding to the protocol of the session. Each protocol-specific parser in the set of protocol-specific parsers 116 is configured to receive sessions corresponding to that particular protocol. For example, protocol-specific parser 116a is configured to receive sessions conforming to the File Transfer Protocol (FTP). Protocol-specific parser 116b is configured to receive sessions conforming to the Telnet protocol. Protocol-specific parser 116c is configured to receive sessions conforming to the HTTP protocol. Protocol-specific parser 116d is configured to receive sessions conforming to the MS Instant Messaging protocol. Protocol-specific parser 116e is configured to receive sessions conforming to the Network News Transfer Protocol (NNTP). Protocol-specific parser 116f is configured to receive sessions conforming to the Simple Mail Transfer Protocol (SMTP). For example, directed session 114c (related to session 110b) is directed to protocol-specific parser 116c because protocol-specific parser 116c is configured as an HTTP parser. As described in detail below, each protocol-specific parser is configured to produce a common language representation of each session that is input to it.
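The routing performed by parser director 112 can be pictured as a lookup from a session's protocol to the parser registered for it. The following is a minimal sketch only, assuming the protocol of each session is already known; the names `Session`, `ParserDirector`, `register`, `direct`, and `parse_http` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Session:
    protocol: str   # assumed to be known once the session has been assembled
    payload: bytes


# Hypothetical parser signature: a session in, common-language records out.
ParserFn = Callable[[Session], List[dict]]


class ParserDirector:
    """Illustrative stand-in for parser director 112."""

    def __init__(self) -> None:
        self._parsers: Dict[str, ParserFn] = {}

    def register(self, protocol: str, parser: ParserFn) -> None:
        # Protocol names are compared case-insensitively.
        self._parsers[protocol.upper()] = parser

    def direct(self, session: Session) -> List[dict]:
        parser = self._parsers.get(session.protocol.upper())
        if parser is None:
            raise LookupError(f"no parser registered for {session.protocol}")
        return parser(session)


def parse_http(session: Session) -> List[dict]:
    # Placeholder parser: real parsing of the HTTP payload is omitted.
    return [{"service": "HTTP", "size": len(session.payload)}]
```

Under this sketch, supporting a new protocol amounts to registering one more parser; the code that consumes the returned records does not change.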
[0050] An analyzer 120 communicates with the output of any of the set of protocol-specific parsers 116. That is, analyzer 120 is configured to communicate with protocol-specific parsers 116 using the common language generated by each of the set of protocol-specific parsers 116. Thus, analyzer 120 can communicate with any of the protocol-specific parsers 116 regardless of the protocol of the sessions they are configured to handle. Consequently, using the common language output of protocol-specific parsers 116 eliminates the need to have a plurality of parsers corresponding to each of the protocols as required in conventional network analysis systems.
[0051] Figure 2 is a schematic diagram illustrating the parser aspect of the present invention in greater detail. Directed sessions 114 are the sessions output by parser director 112 according to the protocol(s) of the sessions. Directed sessions 114 are directed to a set of protocol-specific parsers 116.
[0052] As shown in Figure 2, directed sessions 114 generally conform to disparate protocols. For example, in the embodiment illustrated in Figure 2, six sessions having different protocols are shown. The six protocols are FTP, Telnet, HTTP, MS Instant Messaging, NNTP, and SMTP. It would be apparent to those skilled in the art that the illustrated protocols are by way of example only. Any set of protocols could be represented. Each directed session output by parser director 112 is input to a protocol-specific parser configured to process the protocol associated with that session. For example, as illustrated in Figure 2, FTP session 114a is input to an FTP-specific parser 116a. Telnet session 114b is input to Telnet-specific parser 116b. HTTP session 114c is input to HTTP-specific parser 116c. MS Instant Messaging session 114d is input to MS Instant Messaging-specific parser 116d. NNTP session 114e is input to NNTP-specific parser 116e. SMTP session 114f is input to SMTP-specific parser 116f.
[0053] Protocol-specific parsers 116 process their input in order to output data conformed to a protocol-independent common language. As used herein, the term "common language" means a language that can be used to represent network traffic conformed from multiple, disparate protocols. The content expressed in the form of the common language may be referred to herein as "metadata." In an exemplary embodiment, the common language is an event-based language (described in greater detail below). For example, FTP-specific parser 116a outputs sessions in a common language 118a. Telnet-specific parser 116b outputs sessions in a common language 118b. HTTP-specific parser 116c outputs sessions in a common language 118c. MS Instant Messaging-specific parser 116d outputs sessions in a common language 118d. NNTP-specific parser 116e outputs sessions in a common language 118e. SMTP-specific parser 116f outputs sessions in a common language 118f.
[0054] Figure 3 is a flow diagram of an embodiment of a method for analyzing network traffic in accordance with the present invention. Generally, this method is practiced by a system that collects, assembles, and parses data conformed to multiple protocols into data conformed to a common language. As would be known to those skilled in the art, many different elements, configurations, or combination of elements can be used to implement the methods described below. For clarity, however, the below description of preferred methods of the invention uses many of the elements described in Figures 1 and 2.
[0055] In step 302, packet handler 104 collects packets from network 102. Preferably, as part of collecting packets in step 302, packet handler 104 monitors communications comprising packets across network 102. In one embodiment of the present invention, packet handler 104 collects packets by copying them from the monitored communications across network 102. The collected packets can be stored in a file (not shown).
[0056] In step 304, packet handler 104 makes the collected packets available to assembler 106. Packet handler 104 can make the packets available to assembler 106 by storing the packets in a file that assembler 106 can access. In another exemplary embodiment, packet handler 104 makes the packets available to assembler 106 in real-time without recording the packets. In each of these embodiments, as part of step 304, assembler 106 receives the collected packets.
[0057] In step 306, assembler 106 assembles the packets into sessions. These sessions preferably consist of packets of the same network protocol and preferably the same source/target addresses found in each network layer. In step 308, assembler 106 communicates the sessions, which conform to one or more protocols, to parser director 112. Alternatively, parser director 112 may actively capture sessions 110 from assembler 106.
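Step 306 can be illustrated by grouping packets on a key built from the protocol and the address pair. This is a minimal sketch under stated assumptions: the `Packet` fields and the `assemble` function are hypothetical, and treating the address pair as direction-insensitive (so that requests and replies land in the same session) is a choice made for illustration, not something the specification prescribes.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, NamedTuple, Tuple


class Packet(NamedTuple):
    protocol: str   # e.g. "TCP"
    src: str        # source address
    dst: str        # target address
    data: bytes


SessionKey = Tuple[str, FrozenSet[str]]


def assemble(packets: List[Packet]) -> Dict[SessionKey, List[Packet]]:
    """Group packets that share a protocol and an address pair."""
    sessions: Dict[SessionKey, List[Packet]] = defaultdict(list)
    for packet in packets:
        # Assumption: both directions of a conversation belong together,
        # so the address pair is stored as an unordered set.
        key = (packet.protocol, frozenset((packet.src, packet.dst)))
        sessions[key].append(packet)
    return dict(sessions)
```

Packets are appended in arrival order; reordering and reassembly of the payload stream are left out of the sketch.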
[0058] In step 310, parser director 112 directs assembled sessions to protocol-specific parsers 116. In an exemplary embodiment, parser director 112 performs protocol matching and lexical analysis of the session content to decide to which of protocol-specific parsers 116 each assembled session is directed.
[0059] In step 312, protocol-specific parsers 116 receive directed sessions 114 from parser director 112. In step 314, protocol-specific parsers 116 output the parsed sessions in the common language. As described above, each of protocol-specific parsers 116 operates on sessions that conform to the protocol that the parser is configured to parse. If there is more than one protocol present in the session data presented to parser director 112, preferably there will be a protocol-specific parser for each protocol present in the session data. The protocol-specific parsers output a common language representation of the session data input to them. Preferably, the protocol-specific parsers parse metadata representative of the session data. Also preferably, the metadata conforms to the common language.
[0060] In step 316, protocol-specific parsers 116 submit the common language data to an analyzer. Protocol-specific parsers 116 can also record common language data to a record (or log). Also as part of step 316, protocol-specific parsers 116 or analyzer 120 may access the common language data from the record. If protocol-specific parsers 116 access the common language data from the record, protocol-specific parsers 116 then communicate the common language data to analyzer 120.
[0061] In step 318, analyzer 120 analyzes data conformed to the common language. Preferably, only one analyzer 120 is used to analyze all of the common language data. In an exemplary embodiment, only one analyzer using one analysis logic is needed to analyze the communications represented by the sessions because the communications are conformed to the common language rather than disparate protocols. In an exemplary embodiment, analyzer 120 is a workstation-based system having a graphical user interface (GUI) for formulating queries and performing other analyses on the database. In another exemplary embodiment, analysis tools, such as those included in analyzer 120, do not have to be changed when protocols are added or changed because protocol-specific parsers 116 can be modified or added to the system. Sessions parsed into metadata in the common language are described in an exemplary embodiment as common language data in Figures 1 and 2 and as common-language sessions or sessions in common language herein.
[0062] Figure 4 is a flow diagram of another embodiment of a method for analyzing network communications in accordance with the present invention. Generally, the method comprises steps for parsing information from sessions conforming to one or more protocols into metadata conforming to a common language. Many different elements, configurations, or combinations of elements can be used to implement the methods described below. For clarity, however, the below description of preferred methods of the invention uses many of the elements set forth in Figures 1 and 2.
[0063] In step 402, protocol-specific parsers 116 receive directed sessions 114. Each parser of protocol-specific parsers 116 receives only directed sessions 114 that conform, at least in part, with the protocol that the receiving protocol-specific parser is configured to parse. For example, parser 116b is configured to parse sessions conformed to the Telnet protocol. Thus, parser 116b receives any session that, in part, conforms with the Telnet protocol (see Figure 2).
[0064] In step 404, protocol-specific parsers 116 extract information from directed sessions 114. If desired, the extracted information can be stored in step 405. In step 406, protocol-specific parsers 116 translate the extracted information into a common language. For example, Telnet-specific parser 116b extracts session data conforming to the Telnet protocol and translates that data into the common language.
[0065] Preferably, in step 404, protocol-specific parsers 116 carefully extract only information generally useful in analyzing the communication(s) that each session represents. By extracting only a portion of the information, this embodiment of the present invention creates a common language 118 representation of the session data that is significantly smaller than directed sessions 114 or sessions 110. Consequently, these representations are cheaper and more efficient to store. Moreover, the common language data is more quickly and easily analyzed due to its significantly smaller size.
[0066] In step 408, protocol-specific parsers 116 communicate sessions in common language 118. If the common language data is not to be stored in a database, as determined in step 410, protocol-specific parsers 116 may communicate each session of the sessions in common language 118 one-at-a-time or in groups to analyzer 120. In step 412, analyzer 120 analyzes sessions in common language 118. In this exemplary embodiment, only one analyzer 120 is used to analyze all of the sessions in common language 118. Alternatively, if the common language data is to be stored in a database, one or more database records for storing the common language data are created in step 414. The database can be later accessed by an analyzer such as analyzer 120 to analyze the data.
[0067] Figure 5 is a schematic diagram of another embodiment of a system for analyzing network traffic in accordance with the present invention. Generally, this embodiment shows an exemplary embodiment of a common language, called an event-based language, to which network communications or input files containing communications are translated in preparation for analysis.
[0068] Preferably, event-based language 502 follows a taxonomy of session 504, events 506, and properties 508. In an exemplary embodiment, event-based language 502 further comprises aliases 510 and routes 512. According to the sessions-events-properties taxonomy, each session corresponds to one or more network events. In one embodiment, sessions may be used to group events per computer per application. For example, a computer in communication with a server using a Netscape browser can be one session; the server response to the computer can be another session. Sessions can be used to group events in other fashions, for example, in order to accommodate so-called "port-jumping" protocols. In another embodiment, sessions can encompass other sessions in a directory-type system structure.
[0069] Events 506 can be described in terms of entities 514 involved in each event of events 506. Generally, each event of events 506 corresponds to a communication between at least two entities 514. Each event of events 506 can also be described in terms of various properties 508 associated with it. In an exemplary embodiment, each event of events 506 can also be described in terms of aliases 510 of entities 514 for each event, and routes 512 associated with each event. In an exemplary embodiment, aliases 510 of entities 514 can be recorded as a property to each entity (not shown in Figure 5) and routes 512 can be recorded as indirect events to session 504.
[0070] In an exemplary embodiment, each session (e.g., network transaction or other communication) can be converted to a standard set of outputs. For example, there may be two basic outputs provided by a protocol-specific parser, such as one of protocol-specific parsers 116: events 506 and properties 508. Thus, the metadata describing sessions involving a variety of protocols can be stored in as few as two basic tables. This is a significant benefit of the present invention in comparison to prior approaches. For this exemplary embodiment, the metadata conforming to the event-based language can be stored in a log or record having as few as two columns.
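The two-table idea can be sketched with an in-memory SQLite database. The table layout, column names, and the `events_for_service` helper below are assumptions made for illustration; the specification states only that events and properties can serve as the two basic outputs. The action codes 30 and 20 are taken from the action list given later in the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (
        event_id   INTEGER PRIMARY KEY,
        session_id INTEGER NOT NULL,
        source     TEXT    NOT NULL,
        action     INTEGER NOT NULL,
        target     TEXT    NOT NULL
    );
    CREATE TABLE properties (
        owner_kind TEXT    NOT NULL,  -- 'session' or 'event'
        owner_id   INTEGER NOT NULL,
        name       TEXT    NOT NULL,
        value      TEXT    NOT NULL
    );
    """
)

# An SMTP event and an FTP event share the same events table.
conn.execute("INSERT INTO events VALUES (1, 1, 'USER TODD', 30, 'USER DAMON')")
conn.execute("INSERT INTO events VALUES (2, 2, 'USER TODD', 20, 'RESOURCE /etc/passwd')")
conn.execute("INSERT INTO properties VALUES ('session', 1, 'service', 'SMTP')")
conn.execute("INSERT INTO properties VALUES ('session', 2, 'service', 'FTP')")
conn.execute(
    "INSERT INTO properties VALUES "
    "('event', 1, 'subject', 'IMPORTANT INFORMATION, PLEASE READ')"
)


def events_for_service(service: str):
    """Return (source, action, target) rows for sessions of one service."""
    return conn.execute(
        "SELECT e.source, e.action, e.target FROM events e "
        "JOIN properties p "
        "  ON p.owner_kind = 'session' AND p.owner_id = e.session_id "
        "WHERE p.name = 'service' AND p.value = ? "
        "ORDER BY e.event_id",
        (service,),
    ).fetchall()
```

The same query serves every protocol, which is the point of the passage: adding a protocol adds rows, not tables.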
[0071] Figure 5 illustrates an exemplary structure of the event-based language as applied to transactions in a computer network. Preferably, each transaction will be grouped in a single session 504 and can be described in terms of one or more of: events 506, properties 508, aliases 510, and routes 512. In the embodiment set forth in Figure 5, an entity of entities 514 can be one of three types: a computer 522, a user 520, or a resource 524. For example, an entity that is computer 522 could be a host, a server, a desktop, a laptop, and so forth. Computer 522 could be identified by a network address, a computer name, a host name, a port number, and so forth. Computer 522 can be a computer that is within network 102 (Figure 1) or another network that is being accessed or one that is outside of either network 102 or the other network.
[0072] User 520 can be an individual, such as an authorized user on a computer network. User 520 may be an e-mail address, a local area network (LAN) user, the "Full Name" (real name) of the user, a handle or name used to identify user 520, and so forth.
[0073] Resource 524 may be a resource that is accessed or used during an event. For example, resource 524 may be a file, data from within a database, or a message from a shared bulletin board. Resource 524 can also be a container of other resources, such as a file system directory structure, a database, tables in a database, or a shared bulletin board. Examples of entity types, such as resource 524, computer 522, and user 520, and corresponding numerical representations are:
100, "IP";
101, "IP-PORT";
102, "IP-USER";
103, "IP-RESOURCE";
200, "HOST";
201, "HOST-PORT";
202, "HOST-USER";
203, "HOST-RESOURCE"; and
300, "GROUP."
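The numerical entity types listed above map naturally onto an enumeration. The sketch below assumes Python-style member names (underscores in place of hyphens); the `Entity` class, its `label` method, and the example value format are hypothetical additions for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class EntityType(IntEnum):
    # Codes and names follow the list in the description.
    IP = 100
    IP_PORT = 101
    IP_USER = 102
    IP_RESOURCE = 103
    HOST = 200
    HOST_PORT = 201
    HOST_USER = 202
    HOST_RESOURCE = 203
    GROUP = 300


@dataclass(frozen=True)
class Entity:
    kind: EntityType
    value: str  # hypothetical free-form identifier, e.g. an address

    def label(self) -> str:
        # Restore the hyphenated spelling used in the description.
        return f"{self.kind.name.replace('_', '-')} {self.value}"
```

A stored numeric code can be turned back into a readable type with `EntityType(code)`.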
[0074] In the exemplary embodiment set forth in Figure 5, the common language is represented by an event-based language. The event-based language permits events on a computer network to be described using so-called event statements. For example, an event can refer to transactions between or involving differing types of entities, such as the following interactions between entities: computer-computer, user-computer, user-user, user-resource, and so forth.
[0075] An event statement 526 describes an action taken by one entity with respect to at least one other entity using a service. Thus, each event statement 526 preferably comprises two parameters: (1) one or more entities 514; and (2) an action 516.
[0076] A session statement 534 describes a session. As such, each session statement 534 includes some facts about session 504. In an exemplary embodiment, session statement 534 includes the times that session 504 began/ended, the size of session 504 (e.g., 1.5 MB), and a service type 518 of the session. Generally, service types (sometimes referred to herein as "services" or "applications") refer to or are related to a protocol or application used during network communications. A property statement 528 preferably includes facts about either session 504 or event 506. In an exemplary embodiment where event 506 includes an email communication, property statement 528 can include the subject line of the email communication. A route statement 532 preferably includes facts about the route that an event traveled. An alias statement 530 preferably includes information regarding the identity of user 520, computer 522, or resource 524.
[0077] Examples of actions that might be logged into a record using the event-based language for network level communications include: an ETHERNET transaction, an IP transaction, or a TCP transaction. Examples of actions that might be logged into a record at the application level include: a "user login" (a user attempting or obtaining access to a system), a "user logoff," a "get resource" (e.g., getting or acquiring a resource, such as downloading a file or selecting a database row), a "put resource" (e.g., performing an operation using a resource, such as saving a file, uploading a file, or inserting a database row), a "delete resource" (e.g., removing a resource, such as deleting a file or database row), a "send message" (e.g., sending an e-mail or sending an Instant Message), a "receive message" (e.g., receiving an e-mail or receiving an Instant Message), a "read message" (e.g., opening an e-mail or opening an Instant Message to read it), a "database query request" (e.g., a client issuing a request from a database), and a "database query response" (e.g., a server providing a response to the client's request). Examples of actions that can be logged into a record in an exemplary system and corresponding numerical representations are:
1, "IP Transaction";
10, "User Login";
11, "User Logoff";
20, "Get Resource";
21, "Put Resource";
22, "Delete Resource";
30, "Send MSG";
31, "Receive MSG";
32, "Read MSG";
33, "Delete MSG";
40, "Database Query";
110, "User Login Response";
111, "User Logoff Response";
120, "Get Resource Response";
121, "Put Resource Response";
122, "Delete Resource Response";
130, "Send MSG Response";
131, "Receive MSG Response";
132, "Read MSG Response"; and
140, "Database Query Response."
[0078] Other values for actions can be used in order to tailor the common language to a particular computer network or to accommodate new applications. Generally, the library of actions is sufficient to describe actions, such as action 516, taken in connection with a communication between two entities, such as entities 514.
[0079] Examples of services that might be logged into a record using the common language include: File Transfer Protocol (FTP), TELNET, Simple Mail Transfer Protocol (SMTP), Domain Name Service (DNS), Hypertext Transfer Protocol (HTTP), POP3, Network News Transfer Protocol (NNTP), Server Message Block (SMB), MSSQL™/Sybase™ Database protocol (e.g., TDS), Oracle™ Database Protocol (e.g., TNS), Lotus Notes™, Dynamic Host Configuration Protocol (DHCP), Remote Procedure Call (RPC), Routing Information Protocol (RIP), Network File System (NFS), and Instant Messenger Protocols (AOL™, MSN, Yahoo™, etc.). Examples of services that can be logged into a record in an exemplary system and corresponding numerical representations are:
21, "Ftp";
23, "Telnet";
25, "E-Mail (SMTP)";
53, "Domain Name Service";
67, "DHCP";
5190, "AOL™ Instant Msg";
5050, "Yahoo™ Instant Msg";
80, "WWW";
109, "E-Mail (POP-2)";
110, "E-Mail (POP-3)";
119, "News";
135, "Microsoft RPC";
137, "Netbios™";
139, "MS File Access";
161, "SNMP";
520, "RIP";
1122, "MS Instant Msg";
1352, "Lotus Notes™";
1362, "Sybase™ Database";
1433, "MSSQL™ Database";
1521, "Oracle™ Database";
1533, "Lotus Sametime™";
2049, "Unix™ File Access"; and
6667, "IRC."
[0080] Other values for services can be used in order to tailor the event-based language to accommodate new applications and protocols.
[0081] Using the two parameters (entities 514 and action 516), event statement 526 can be expressed in the form: <ENTITY1> was seen <ACTION> to <ENTITY2>. In an exemplary embodiment, event statement 526 can also include service type 518, as shown in Figure 9a. As shown in Figure 9a, the expression of event statement 526 is of the form: <ENTITY1> was seen <ACTION> to <ENTITY2> with <SERVICE TYPE> for an event of events 506 involving two entities of entities 514, one at the "source" end and one at the "target" end. For an event involving multiple entities of entities 514 at each end, event statement 526 can be expressed as: <ENTITY1A, ENTITY1B> was seen <ACTION> to <ENTITY2A, ENTITY2B> with <SERVICE TYPE>, also as shown in Figure 9a.
[0082] For example, event 506 for a first user (TODD) of entities 514 sending an e-mail to a second user (DAMON) of entities 514 can be expressed by event statement 526 conformed to the following form: <USER TODD> was seen <SENDING MESSAGE> to <USER DAMON> with <SMTP>, as shown in Figure 9a.
[0083] Also for example, event 506 for a user (TODD) of entities 514 using a first computer to receive via File Transfer Protocol (FTP) a file containing a password stored on a second computer can be expressed by event statement 526 conformed to the following form: <COMPUTER 192.168.1.2, USER TODD> was seen <GETTING RESOURCE> from <COMPUTER 192.168.1.1, RESOURCE: /etc/passwd> using <FTP>, as shown in Figure 9a.
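The event statement form can be sketched as a small record with a rendering method. This is an illustration only: the `EventStatement` class and its field names are hypothetical, and the sketch always renders the "to ... with" wording of the general form, whereas the FTP example above uses "from ... using" for the same slots.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EventStatement:
    sources: Sequence[str]   # one or more entities at the "source" end
    action: str
    targets: Sequence[str]   # one or more entities at the "target" end
    service: str

    def render(self) -> str:
        # Multiple entities at one end are joined with commas.
        src = ", ".join(self.sources)
        dst = ", ".join(self.targets)
        return (
            f"<{src}> was seen <{self.action}> "
            f"to <{dst}> with <{self.service}>"
        )
```

For instance, `EventStatement(["USER TODD"], "SENDING MESSAGE", ["USER DAMON"], "SMTP").render()` reproduces the e-mail statement given in the description.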
[0084] Protocol-specific parsers 116 (Figures 1 and 2) do not have to output events in the format of event statement 526. Preferably, however, protocol-specific parsers 116 extract and output three parameters that can form event statement 526: entities, action, and service type. These basic parameters can be stored and, if desired, displayed in event statement format for a readily comprehended metadata description of the event, or in some other format.
[0085] Each event 506 may also have properties associated with the event. For example, event 506 corresponding to an e-mail (e.g., referring to the action types listed above, the action type "SEND_MSG" and the service "E-mail (SMTP)") may have associated properties. For example, the properties for such an e-mail may include the subject line of the e-mail ("IMPORTANT INFORMATION, PLEASE READ"), the sender password ("test12"), and the application used for the action ("Outlook Express"). Figure 9b illustrates an exemplary property name-value pair for storing properties associated with an event. Figure 9b shows three name fields: "subject," "password," and "application." Figure 9b shows three values for those name fields: "IMPORTANT INFORMATION, PLEASE READ", "test12", and "Outlook Express". Other property types or fields could be included, such as the size of the event, the time of the event, file attachments, full names of the sender and all recipients, and so forth.
[0086] Each event, such as event 506, may also have associated routes, such as route 512. Route 512 refers to network communication information that may be carried within captured data, but that was not directly observed in collecting the data. For example, a collected e-mail may include a list or log of the servers through which the e-mail message passed. This internal routing information, while not directly observed, can be extracted and stored. Figure 9c illustrates an exemplary format for capturing the routing information. The exemplary format is a <COMPUTER ENTITY> to <COMPUTER ENTITY> format. Event 506 may have multiple routes 512 corresponding to multiple route statements, each like the one shown in Figure 9c.
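One way to picture route extraction for e-mail is to read the relay trail out of the message headers. The sketch below assumes the trail is carried in standard `Received:` headers of the "from X by Y" shape; the specification does not name the header, and the `extract_routes` function and its regular expression are hypothetical. Parsing uses Python's standard `email` package.

```python
import re
from email import message_from_string
from typing import List, Tuple

# Matches the "from <host> ... by <host>" portion of a Received header.
_HOP = re.compile(r"from\s+([^\s;]+).*?\bby\s+([^\s;]+)", re.IGNORECASE | re.DOTALL)


def extract_routes(raw_message: str) -> List[Tuple[str, str]]:
    """Return (from_host, by_host) pairs, oldest hop first."""
    msg = message_from_string(raw_message)
    routes: List[Tuple[str, str]] = []
    # Each relay prepends its header, so reverse for chronological order.
    for header in reversed(msg.get_all("Received", [])):
        match = _HOP.search(header)
        if match:
            routes.append((match.group(1), match.group(2)))
    return routes
```

Each returned pair corresponds to one statement in the <COMPUTER ENTITY> to <COMPUTER ENTITY> format; headers that do not match the assumed shape are skipped rather than guessed at.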
[0087] Each event, such as event 506, may also have associated aliases, such as alias 510. Aliases 510 are names or values for an entity (e.g., a computer or a user) that describe the same entity. For example, event 506 may involve a computer entity, such as computer 522, defined by the IP address "192.168.1.12." Event 506 may also involve a user entity, such as user 520, defined by the e-mail address "todd@forensicsexplorers.com." Computer 522 may be correlated to the alias "forensicsexplorer.com" and user 520 may be correlated to the alias "Todd Moore." Figure 9d illustrates an exemplary storage format for storing alias information for events. Therefore, the present invention provides that when event 506 is extracted the observed entities 514 can be correlated to known aliases 510. This information can be stored and associated with event 506 for later review and/or processing.
[0088] To create event statements or otherwise generate metadata, the invention parses information from each session or other communication data. In an exemplary embodiment, using for purpose of clarity the elements of Figures 1 and 2, the invention parses information following the method set forth in Figure 6.
[0089] Figure 6 provides a flow diagram for an exemplary method for converting sessions into the event-based language. As described above, the event-based language is one example of a common language according to the present invention. In an exemplary embodiment intending to reduce the number of tables in a metadata log, the step of identifying event routes may comprise treating an identified route as an "indirect event." In this embodiment, the step of identifying aliases may comprise treating an identified alias as a property of an entity. This might permit storing routes in an event table and aliases in the properties table. By treating routes and aliases under the rubric of events and properties, respectively, the number of tables required for a log or file of the sessions can be reduced.
[0090] In the exemplary embodiment set forth in Figure 6, assembler 106 (Figure 1) receives packets in step 602. The packets are assembled into sessions in step 604. Protocol-specific parsers 116 (in this case one parser for each protocol in the session) extract session properties in step 606. Protocol-specific parsers 116 then identify events in step 608, identify routes in step 610, identify entities in step 612, identify entity aliases in step 614, identify actions in step 616, and extract event properties in step 618, from within the session. Protocol-specific parsers 116 continue to parse the session until all events within the session have been parsed in step 620. Protocol-specific parsers 116 parse other sessions, according to step 620 and so forth.
[0091] The method illustrated in Figure 6 presumes that the service type will be the same for all events in a session. Accordingly, the service is extracted as a property of the session. Alternatively, the service type can be identified for each event. In that case, the method performs the step of identifying a service type in the session in step 617.
[0092] Figure 7 illustrates an example of applying the present invention to parse an SMTP (Simple Mail Transfer Protocol) session into the event-based language. In Figure 7, the area "A" displays data from the session in protocol, which consists of multiple data packets for an e-mail that was sent from one user to another. The session includes network-level data (e.g., Ethernet and TCP/IP) and application data (e.g., SMTP and Microsoft Outlook).
[0093] Area "B" displays the metadata that describes the session according to the event-based language. The overall SMTP session is described by four properties: time, size, service, and subject (not shown). The session includes three separate events: (1) a first event between the source computer (entity) and the target computer (entity) for an IP transaction (action); (2) a second event between the port (entity) of the source computer and the port (entity) of the target computer for a TCP transaction (action); and (3) a third event between the source user (entity) and the target user (entity) for sending a message (action). The service type (SMTP) is not separately recited for each of the events because it is the same for all events in the session.
[0094] Properties of the third event are also identified. The properties include the identity of the application (MS Outlook) and the attached file (winmail.dat).
[0095] Figure 8 illustrates an example of applying the present invention to parse an FTP (File Transfer Protocol) session into the event-based language. In the session of Figure 8, a user has logged into a site, stored a file, retrieved some data, and then deleted the file. In area "A" of Figure 8, network-level data and application data from the packets and within the session are shown. By application of the invention, the session is translated into metadata conformed to the event-based language shown in area "B."
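The translation of an FTP session like the one described (login, store, retrieve, delete) can be sketched as a lookup from FTP control commands to the numerical actions listed earlier. The particular command-to-action mapping, the `FTP_COMMAND_TO_ACTION` table, and the `ftp_actions` function are assumptions made for illustration; Figure 8 itself is not reproduced here, and the specification does not spell out this mapping.

```python
from typing import List, Tuple

# Assumed mapping from FTP control commands to the action codes
# given in the description (10, 11, 20, 21, 22).
FTP_COMMAND_TO_ACTION = {
    "USER": (10, "User Login"),
    "QUIT": (11, "User Logoff"),
    "RETR": (20, "Get Resource"),
    "STOR": (21, "Put Resource"),
    "DELE": (22, "Delete Resource"),
}


def ftp_actions(control_lines: List[str]) -> List[Tuple[int, str, str]]:
    """Translate client command lines into (code, action, argument)."""
    actions: List[Tuple[int, str, str]] = []
    for line in control_lines:
        verb, _, argument = line.strip().partition(" ")
        mapped = FTP_COMMAND_TO_ACTION.get(verb.upper())
        if mapped is not None:
            # Commands without a mapped action (e.g. PASS) are skipped here.
            actions.append((mapped[0], mapped[1], argument))
    return actions
```

Only the commands relevant to analysis are retained, in keeping with the earlier point that extracting a portion of the session yields a much smaller representation.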
[0096] Figures 7 and 8 provide an exemplary illustration of the benefits of the invention. The protocol-specific data in area A for both figures is complex and unwieldy. More importantly, the extracted data for the SMTP session (shown in Figure 7) is very different from the extracted data for the FTP session (shown in Figure 8). Additionally, the extracted data (area A) is not readily or easily understood in terms of the events that took place. Without the present invention, logs of SMTP sessions and FTP sessions would require separate analysis tools to be analyzed.
[0097] When a session is converted to metadata conforming to the event-based language (as shown in areas B of Figures 7 and 8), the network-level events are readily understood. The metadata for different protocols (here, SMTP and FTP) can be stored in the same finite set of tables in a log or record. Importantly, the same analysis tool or tools can be used to analyze both types of sessions.
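The point about storing metadata for different protocols in the same set of tables can be illustrated with a single table and a single query. This is a minimal sketch using an in-memory SQLite database; the table layout, column names, and sample rows are assumptions and are not taken from the specification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table holds events from every protocol (hypothetical schema).
conn.execute(
    "CREATE TABLE events (service TEXT, source TEXT, action TEXT, target TEXT)"
)
rows = [
    ("SMTP", "user_a", "Send Message", "user_b"),
    ("FTP", "user_a", "Put Resource", "ftp.example.com"),
    ("FTP", "user_a", "Delete Resource", "ftp.example.com"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)

# The same query covers both protocols because they share one schema.
counts = dict(
    conn.execute(
        "SELECT service, COUNT(*) FROM events WHERE source = ? GROUP BY service",
        ("user_a",),
    )
)
print(counts)
```

Because both session types land in one schema, an analysis such as "everything user_a did" needs no protocol-specific logic.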
[0098] Figures 10, 11a, and 11b provide a record of an exemplary embodiment of data from protocol-specific sessions. Figure 10 illustrates data from a session conforming to the HTTP protocol. Figure 11a illustrates data from a session conforming to the SMTP protocol. Figure 11b illustrates data from a session conforming to the FTP protocol.
[0099] Figure 12a illustrates a log output file of the three sessions illustrated in part in Figures 10, 11a, and 11b after they have been parsed into metadata conformed to the event-based language of the present invention. The metadata for the first session is represented in the first seven lines of the exemplary log output file. The metadata for the second session is represented in lines eight to eighteen of the exemplary log output file. The metadata for the third session is represented in lines nineteen to twenty-three of the exemplary log output file. This output follows the form shown in Figure 12b.
[00100] In Figure 12b, the terms shown after the "S:" relate to types of metadata about a session of data of which an event is a part. The terms shown after the first two "P:" relate to metadata about properties of the session of data. The terms shown after the "E:" relate to types of metadata about the event. The terms shown after the "P:" below the "E:" relate to types of metadata about properties of the event. For example, "<source name:subname>" and "<target name:subname>" are entities involved in the event. The terms shown after the "A:" relate to types of metadata about an alias or aliases of these entities. The terms after the "R:" relate to types of metadata about the route or routes taken by the session of data or the data packets that comprise the session. As can be readily seen, the output of this exemplary embodiment of the invention shows parsing of sessions in disparate protocols into a compact output conforming to a common language.
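The line prefixes described for Figure 12b can be sketched as a simple formatter. Only the prefix letters and their order come from the text above; the function name, the separator after each prefix, and all field contents are assumptions, because the exact layout is shown only in the figure.

```python
def format_record(session, session_props, event, event_props, aliases, routes):
    """Emit prefixed lines in the order described for Figure 12b."""
    lines = ["S: " + session]                     # session statement
    lines += ["P: " + p for p in session_props]   # session properties
    lines.append("E: " + event)                   # event statement
    lines += ["P: " + p for p in event_props]     # event properties
    lines += ["A: " + a for a in aliases]         # aliases of the entities
    lines += ["R: " + r for r in routes]          # routes taken
    return lines


# Placeholder field contents; real field layouts appear only in the figure.
lines = format_record(
    session="<session metadata>",
    session_props=["<session property 1>", "<session property 2>"],
    event="<source name:subname> <action> <target name:subname>",
    event_props=["<event property>"],
    aliases=["<alias>"],
    routes=["<route>"],
)
print("\n".join(lines))
```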
[00101] The foregoing disclosure of the preferred embodiments of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many variations and modifications of the embodiments described herein will be obvious to one of ordinary skill in the art in light of the above disclosure.
[00102] Further, in describing representative embodiments of the present invention, the specification may have presented the method and/or process of the present invention as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. As one of ordinary skill in the art would appreciate, other sequences of steps may be possible. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations of the invention.

Claims

WHAT IS CLAIMED IS:
1. A method of parsing sessions in disparate protocols into a common language comprising the steps of: receiving sessions in disparate protocols; parsing the sessions in disparate protocols into sessions of a common language; and communicating the common-language sessions to an analyzer.
2. The method of claim 1 further comprising the steps of: collecting packets of network traffic; and assembling the packets into the sessions in disparate protocols.
3. The method of claim 2 further comprising the steps of: communicating the packets to an assembler.
4. The method of claim 1 further comprising the steps of: communicating the sessions in disparate protocols to a protocol director; and directing each of the sessions in disparate protocols to an appropriate parser.
5. The method of claim 1 further comprising the step of: analyzing the common-language sessions.
6. A system for parsing sessions in disparate protocols into a common language comprising: a parser director; parsers; and an analyzer, wherein the parser director is configured to direct a session of a particular protocol to a parser configured to parse sessions of the particular protocol, wherein each of the parsers is configured to parse sessions of a particular protocol into sessions of a common language, and wherein the analyzer is configured to analyze the common-language sessions.
7. The system of claim 6 further comprising: a packet generator configured to copy packets communicated as part of communications within a network; and an assembler configured to group the packets related to a single communication between two or more entities into one session.
8. A method of extracting information from a session to create a record conforming to an event-based language comprising the steps of: receiving a session; extracting information from the session; translating the information into an event statement describing an event between a first entity and a second entity; and creating a record containing the event statement.
9. The method of claim 8, wherein the first entity and the second entity comprise one of the following entities: IP, IP-port, IP-user, IP-resource, host, host-port, host-user, or host-resource.
10. The method of claim 8, wherein the event statement describes the first entity, the second entity, an application used for the event, and an action describing the event.
11. The method of claim 10, wherein the record conforms to the following structure: "<the first entity> was seen <the action> to <the second entity> with <the application>."
12. The method of claim 10, wherein the application is one of the following application types: FTP, Telnet, SMTP, Domain Name Service, DHCP, AOL™ Instant Messenger, Yahoo™ Instant Messenger, HTTP, POP-2, POP-3, NNTP, Microsoft RPC, Netbios, MS File Access, SNMP, RIP, MS Instant Messenger, Lotus Notes™, Sybase™ Database, MSSQL™ Database, Oracle™ Database, Lotus Sametime™, Unix™ File Access, or IRC.
13. The method of claim 10, wherein the event statement further contains one of the following content types: Mail, HTML, DCARD, SMIME, or PGP.
14. The method of claim 10, wherein the action includes at least one of the following action types: IP Transaction, User Login, User Logoff, Get Resource, Put Resource, Delete Resource, Send Message, Receive Message, Read Message, Delete Message, Database Query, User Login Response, User Logoff Response, Get Resource Response, Delete Resource Response, Send Message Response, Read Message Response, or Database Query Response.
15. The method of claim 8 further comprising the step of translating the information into a session statement describing a communication of which the event is a part, wherein the record also contains the session statement.
16. The method of claim 8 further comprising the step of translating the information into a property statement describing properties of the event, wherein the record also contains the property statement.
17. The method of claim 16, wherein the properties of the event include at least one of the following property types: an application used, a subject of the event, or a database queried.
18. The method of claim 8 further comprising the step of translating the information into a route statement describing a route through a network traveled by the event, wherein the record also contains the route statement.
19. The method of claim 8 further comprising the step of translating the information into an alias statement describing additional information related to an identity of the first entity or the second entity, wherein the record also contains the alias statement.
20. The method of claim 19, wherein the alias statement contains at least one of the following alias types: IP-Alias or User-Alias.
21. The method of claim 8 further comprising the step of translating the information into a session statement describing a communication of which the event is a part, a property statement describing properties of the event, a route statement describing a route through a network traveled by the session or part of the session, and an alias statement describing additional information related to an identity of the first entity or the second entity, wherein the record also contains the session statement, the property statement, the route statement, and the alias statement.
22. The method of claim 8 further comprising the step of translating the information into a property statement describing properties of the event, wherein the record also contains the property statement and wherein the record is a condensed and simple representation of the session from which the information was extracted.
23. An event-based language for use in network security comprising: a session statement configured to describe a session of which an event is a part; an event statement configured to describe the event through an action between a first entity and a second entity using an application; and a properties statement configured to describe properties of the event.
24. The event-based language of claim 23 further comprising: a route statement configured to describe a route through a network traveled by the session or the event; and an alias statement configured to describe an alias related to an identity of the first entity or an identity of the second entity.
25. The event-based language of claim 23 wherein the event statement conforms to the following structure: "<the first entity> was seen <the action> to <the second entity> with <the application>."
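
The arrangement recited in claims 6, 11, and 25 (a parser director that routes each session to a protocol-specific parser, with each parser emitting an event statement in a common structure) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the session dictionaries, their keys, the parser functions, and the trivial analyzer are all hypothetical.

```python
def event_statement(first, action, second, application):
    """Fill in the event statement structure recited in claims 11 and 25."""
    return f"{first} was seen {action} to {second} with {application}."


def parse_smtp(session):
    # Hypothetical SMTP parser: assumes 'sender' and 'recipient' keys.
    return event_statement(session["sender"], "Send Message",
                           session["recipient"], "SMTP")


def parse_ftp(session):
    # Hypothetical FTP parser: assumes 'client' and 'server' keys.
    return event_statement(session["client"], "Put Resource",
                           session["server"], "FTP")


# The "parser director": routes a session to the parser for its protocol.
PARSERS = {"SMTP": parse_smtp, "FTP": parse_ftp}


def direct(session):
    parser = PARSERS.get(session["protocol"])
    if parser is None:
        raise ValueError("no parser for protocol " + session["protocol"])
    return parser(session)


def analyze(statements, entity):
    """Toy analyzer: return the statements whose first entity matches."""
    return [s for s in statements if s.startswith(entity + " was seen ")]


sessions = [
    {"protocol": "SMTP", "sender": "alice@example.com",
     "recipient": "bob@example.com"},
    {"protocol": "FTP", "client": "10.0.0.5", "server": "ftp.example.com"},
]
statements = [direct(s) for s in sessions]
print(statements)
```

A dictionary lookup is used for the director purely for brevity; the claims do not specify how the director selects a parser.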
PCT/US2002/013391 2001-04-30 2002-04-29 Apparatus and method for network analysis WO2002088968A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US28696601P 2001-04-30 2001-04-30
US60/286,966 2001-04-30

Publications (1)

Publication Number Publication Date
WO2002088968A1 (en) 2002-11-07

Family

ID=23100901

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/013391 WO2002088968A1 (en) 2001-04-30 2002-04-29 Apparatus and method for network analysis

Country Status (2)

Country Link
US (2) US7634557B2 (en)
WO (1) WO2002088968A1 (en)

Also Published As

Publication number Publication date
US7634557B2 (en) 2009-12-15
US20100046391A1 (en) 2010-02-25
US20020163934A1 (en) 2002-11-07

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP