WO2012016327A1 - A method and system for generating metrics representative of ip data traffic from ip data records

Info

Publication number: WO2012016327A1
Application number: PCT/CA2011/000877
Authority: WO
Inventors: Jean-Philippe Goyet; Alexandre Hayon; Eric MÉLIN
Original assignee: Neuralitic Systems
Priority date: 2010-08-06
Filing date: 2011-07-21
Publication date: 2012-02-09

Abstract

The present disclosure relates to a method and a system for generating metrics representative of IP data traffic from IP data records. The method and system receive, at an analytic system, IP data records representative of IP data traffic on an IP network. The method and system process, at the analytic system, the IP data records from the perspective of at least one traffic dimension. The processing comprises, for each at least one traffic dimension: extracting information representative of the traffic dimension from the IP data records, and using the extracted information to compute at least one measure representative of the traffic dimension. The processing further comprises computing a metric, consisting of an aggregated value of the at least one measure representative of the traffic dimension against at least one traffic characteristic.

Description

TITLE

A METHOD AND SYSTEM FOR GENERATING METRICS REPRESENTATIVE OF IP DATA TRAFFIC FROM IP DATA RECORDS

SUMMARY

[0001] A method and system are disclosed to generate metrics representative of IP data traffic from IP data records. The method and system receive, at an analytic system, IP data records representative of the IP data traffic. The method and system process, at the analytic system, the IP data records from the perspective of at least one traffic dimension. The processing comprises, for each at least one traffic dimension: extracting information representative of the traffic dimension from the IP data records, and using the extracted information to compute at least one measure representative of the traffic dimension. The processing further comprises computing a metric, consisting of an aggregated value of the at least one measure representative of the traffic dimension against at least one traffic characteristic.

[0002] Additionally, a method and system are disclosed, wherein the traffic dimensions comprise at least one of: protocols, data services, web domains, applications, and IP television (IPTV) channels.

[0003] Also, a method and system are disclosed, wherein the measures comprise at least one of: a volume of data, a number of users, a number of sessions, a session duration, and a number of transactions.

[0004] And a method and system are disclosed, wherein the traffic characteristics comprise at least one of: time, devices, users, and networks.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] In the appended drawings:

[0006] Figure 1 illustrates an analytic framework for generating metrics representative of IP data traffic from IP data records, according to a non-restrictive illustrative embodiment;

[0007] Figure 2 illustrates a system for generating metrics representative of IP data traffic from IP data records, according to a non-restrictive illustrative embodiment;

[0008] Figure 3 illustrates a method for generating metrics representative of IP data traffic from IP data records, according to a non-restrictive illustrative embodiment.

DETAILED DESCRIPTION

[0009] Nowadays, it is becoming increasingly complex for a network Operator of an IP data network to monitor and analyze the usage of the IP data services available to its subscribers via its IP based network infrastructure. This is due to a combination of factors, including the variety of devices available to connect to the IP data network, the variety of applications and IP data services available via the IP based network infrastructure, the differences in terms of capacity and performance of the IP based network infrastructures, and the variety of usage patterns of the subscribers consuming IP data services. This issue is applicable to any type of network Operator with an IP based network infrastructure, including for example: a mobile network Operator, a fixed broadband network Operator, a corporation operating its own network infrastructure.

[0010] At the same time, it is becoming increasingly critical for a network Operator to have the capability to monitor and analyze the usage of the IP data services offered via its IP based network infrastructure. First, having information related to an IP data traffic generated on its IP based network infrastructure enables the network Operator to adjust its offerings, in terms of devices, data plans, IP data services, and network capacity. Having the proper offering is key in the competition between network Operators for subscribers' retention and Average Revenue Per User (ARPU) protection / development. Also, the cost of upgrading the IP based network infrastructure to sustain the growth in IP data services consumption shall be kept under control. Having comprehensive information on the IP data traffic enables the network Operator to upgrade its IP based network infrastructure in the most efficient way.

[0011] The scope of the information related to the IP data traffic on an IP based network infrastructure, available to a network Operator, is generally limited to a specific perspective. In one use case, the scope focuses on information related to the operating conditions of the IP data network (bandwidth usage, quality of service enforcement, network failures, user experience, etc). In another use case, the scope focuses on information related to the usage of a specific service (or class of service) delivered via the IP data network. In any case, the variety of IP data traffic types, devices used, user types, and behaviors, (to name a few) are not taken into account via a comprehensive and coherent analytic framework.

[0012] Thus, there is a need to overcome the above-discussed limitations concerning the lack of availability of a comprehensive and coherent analytic framework, to extract and analyze information related to IP data traffic. An object of the present method and system is therefore to generate metrics representative of IP data traffic from IP data records.

[0013] In a general embodiment, the present method is adapted for generating metrics representative of IP data traffic from IP data records. For doing so, the method receives, at an analytic system, IP data records representative of IP data traffic on an IP network. The method processes, at the analytic system, the IP data records from the perspective of at least one traffic dimension. The processing comprises for each at least one traffic dimension: extracting information representative of the traffic dimension from the IP data records, and using the extracted information to compute at least one measure representative of the traffic dimension. The processing further comprises computing a metric, consisting of an aggregated value of the at least one measure representative of the traffic dimension against at least one traffic characteristic.

[0014] In another general embodiment, the present system is adapted for generating metrics representative of IP data traffic from IP data records. For doing so, the system comprises an analytic system for receiving IP data records representative of IP data traffic on an IP network, and for processing the IP data records from the perspective of at least one traffic dimension. The processing comprises for each at least one traffic dimension: extracting information representative of the traffic dimension from the IP data records, and using the extracted information to compute at least one measure representative of the traffic dimension. The processing further comprises: computing a metric consisting of an aggregated value of the at least one measure representative of the traffic dimension against at least one traffic characteristic.

[0015] In one specific aspect of the present method and system, the at least one traffic dimension comprises at least one of: protocols, data services, web domains, applications, and IP television (IPTV) channels.

[0016] In another specific aspect of the present method and system, the at least one measure comprises at least one of: a volume of data, a number of users, a number of sessions, a session duration, and a number of transactions. The at least one measure may further comprise: an average volume of data per user, an average volume of data per session, an average volume of data per transaction, an average number of sessions per user, an average number of transactions per user, and an average session duration per user.

[0017] In still another specific aspect of the present method and system, the at least one traffic characteristic comprises at least one of: time, devices, users (optionally further divided into subscribers and roamers), and networks.

[0018] Additionally, a non-restrictive illustrative embodiment of the present method and system consists in: selecting the at least one traffic dimension from a set of pre-defined traffic dimensions, selecting the at least one measure from a set of pre-defined measures, and selecting the at least one traffic characteristic from a set of pre-defined traffic characteristics.

[0019] Now referring to Figure 1, an analytic framework for generating metrics representative of IP data traffic from IP data records will be described.

[0020] The analytic framework 10 described in Figure 1 is a representation of an IP data traffic occurring on an IP based network infrastructure. The objective of this analytic framework 10 is to enable the processing of raw data, received in the form of IP data records, according to a pre-defined data model. This data model takes into account the complexity resulting from several factors, including: the variety of applications and IP data services used on the IP based network infrastructure, the multitude of devices available (with a range of specific characteristics for each model of device), and the different usage patterns of the users when consuming IP data services. Using the analytic framework 10, the information extracted from the IP data records is processed and aggregated, allowing the computation of various metrics representative of the IP data traffic on the IP based network infrastructure.

[0021] The analytic framework 10 relies on the definition of traffic dimensions 20. A traffic dimension 20 is a perspective from which the IP data traffic occurring on an IP based network infrastructure is analyzed. Each traffic dimension 20 is related to a specific aspect of the IP data traffic. The analysis of each aspect presents an added value for a network Operator, in terms of (among others): network performance tracking and optimization; subscriber behavior monitoring and profiling; IP data services monitoring, marketing, and monetizing; etc. Thus, the analysis of the IP data traffic is performed from the perspective of at least one of the traffic dimensions 20. The traffic dimensions 20 defined in a specific analytic framework 10 evolve over time, taking into consideration the evolution of the IP data services available on the IP based network infrastructures, and the specific needs of the network Operators in terms of monitoring and analysis.

[0022] The following examples of traffic dimensions 20 are represented in Figure 1: protocols 21, data services 22, web domains 23, applications 24, and IPTV channels 25. It will be apparent to one of ordinary skill in the art that other examples of traffic dimensions 20 may be considered, without changing the scope of the present method and system.

[0023] The protocols traffic dimension 21 allows the analysis of the IP data traffic from the perspective of the various networking protocols used to support the applications and IP data services consumed via the IP based network infrastructure. Examples of such protocols include the Hypertext Transfer Protocol (HTTP), BitTorrent, Skype, the Real-time Transport Protocol (RTP), the Real Time Streaming Protocol (RTSP), and the Session Initiation Protocol (SIP).

[0024] The data services traffic dimension 22 allows the analysis of the IP data traffic from the perspective of various categories of IP data services. Examples of such IP data services include: web browsing, emailing, instant messaging, audio and video streaming, Voice over IP (VoIP), peer-to-peer, business services, IP based television (IPTV). Each IP data service is usually represented by a collection of applications providing this specific type of data service 22. Each application is implemented via a single, or a set of, networking protocol(s). Identifying a specific IP data service implies identifying the networking protocols implementing the related applications. Additional characteristics (for example specific patterns in specific IP packets) are taken into consideration, to identify a specific application related to a specific IP data service, since the same protocol may be used to provide different types of IP data services (for example, HTTP is used for both the web browsing and audio / video streaming data services - RTP is used for both the audio / video streaming and VoIP data services).

[0025] The web domains traffic dimension 23 allows the analysis of the IP data traffic from the perspective of the web domains accessed via the web browsing service. Each web domain is identified by a unique Uniform Resource Identifier (URI) transmitted via the HTTP protocol (the networking protocol supporting the web browsing service).

[0026] The applications traffic dimension 24 allows the analysis of the IP data traffic from the perspective of the third party applications used on the devices. This traffic dimension is particularly relevant for a mobile IP network, where such third party applications are usually downloaded from an application portal / store associated to a specific mobile operating system provider, to a specific mobile device manufacturer, or to a specific service provider (for instance the mobile network Operator). The download and installation of a third party application on a mobile device is performed at the initiative of the owner of the mobile device. The mobile IP traffic generated by each specific third party application is taken into consideration. HTTP based third party applications constitute a significant proportion of these third party applications. The detection of the most popular HTTP based third party applications may have a critical importance for a mobile network Operator. The method of identification of each specific HTTP based third party application is out of the scope of the present method and system, but is considered as technically feasible.

[0027] The IPTV channels traffic dimension 25 allows the analysis of the IP data traffic from the perspective of the IPTV channels viewed by the owners of devices using an IP based TV service, provisioned over an IP based network infrastructure. The IPTV channels are delivered via dedicated networking protocols (for example RTP, RTSP, SIP, HTTP), with specific patterns in the related IP packets, allowing the identification of the IPTV traffic in general, and the identification of each delivered IPTV channel in particular. The notion of IPTV is taken in a broad sense, to encompass any IP based TV service delivered on an IP based network infrastructure, including a mobile IP network or any kind of IP based fixed broadband network infrastructure.

[0028] The analytic framework 10 also relies on the notion of measures 40. Information extracted from the IP data records is used to compute each measure 40. The analytic framework 10 represents various measures 40, which are relevant for the analysis of the IP data traffic. For each traffic dimension 20, several measures 40 are relevant to represent this specific traffic dimension. For instance, as illustrated in Figure 1, the protocols traffic dimension 21 is represented by the following measures 40: 41 (volume), 42 (number of users), and 46 (average volume per user). And the IPTV channels traffic dimension 25 is represented by the following measures 40: 41 (volume), 42 (number of users), 43 (number of sessions), 44 (session duration), and 47 (average volume per session). Thus, to analyze the IP data traffic from the perspective of the protocols traffic dimension 21, at least one among the measures 41, 42, and 46 is computed. Similarly, to analyze the IP data traffic from the perspective of the IPTV channels traffic dimension 25, at least one among the measures 41, 42, 43, 44, and 47 is computed. Furthermore, the definition of a measure 40 is different for each specific traffic dimension 20, so that the computation of each measure is tailored to the traffic dimension for which it is considered. For instance, the definition (and the computation) of the volume 41 is different in the context of the following traffic dimensions: protocols 21, and IPTV channels 25. The measures 40 defined in a specific analytic framework 10 may evolve over time, to enhance the capabilities of analysis of a particular traffic dimension, or to support a new traffic dimension.
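The association between traffic dimensions and measures given for Figure 1 can be written down as a small lookup table. The following is only a sketch: the dictionary layout, the measure names, and the function name `check_selection` are hypothetical, and only the two dimensions spelled out in the paragraph above are listed.

```python
# Measures listed for two traffic dimensions in the Figure 1 example above.
# The dictionary layout and all identifiers are hypothetical.
FIGURE_1_MEASURES = {
    "protocols": {"volume", "number_of_users", "average_volume_per_user"},
    "iptv_channels": {
        "volume",
        "number_of_users",
        "number_of_sessions",
        "session_duration",
        "average_volume_per_session",
    },
}


def check_selection(dimension, measures):
    """Return the requested measures not listed for the dimension in Figure 1."""
    return set(measures) - FIGURE_1_MEASURES[dimension]


# "session_duration" is not among the measures listed for protocols in Figure 1.
print(check_selection("protocols", ["volume", "session_duration"]))
```

A table of this kind could also be extended over time, as the framework allows new measures and new traffic dimensions to be added.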

[0029] The following examples of measures 40 are represented in Figure 1: volume 41, number of users 42, number of sessions 43, session duration 44, number of transactions 45, average volume per user 46, average volume per session 47, average volume per transaction 48, average number of sessions per user 49, average number of transactions per user 50, and average session duration per user 51. It will be apparent to one of ordinary skill in the art that other examples of measures 40 may be considered, without changing the scope of the present method and system.

[0030] The volume 41 measures a volume of IP traffic. Specific IP packets are taken into consideration, to compute the volume 41 from the perspective of a particular traffic dimension. Additionally, the layers of the Open System Interconnection (OSI) model taken into consideration for the computation may vary, based on the considered traffic dimension. In one case, layers 3 (network) to 7 (application) may be considered. In another case, layers 4 (transport) to 7 (application) may be considered. And in still another case, layers 5 to 7 (session / presentation / application) may be considered.

[0031] From the perspective of the protocols traffic dimension 21, the volume 41 is the total volume of IP traffic transported by an IP flow associated to a specific networking protocol (for example an HTTP, BitTorrent, or RTP IP flow). The usual definition of an IP flow is considered in the present method and system: an IP flow is defined by a source IP address and source port, a destination IP address and destination port, and a transport protocol (in most cases, Transmission Control Protocol (TCP) or User Datagram Protocol (UDP)).
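As an illustration of the flow definition above, the following sketch models an IP flow as a five-field key and sums byte counts per networking protocol. The record layout (flow key, protocol name, byte count), the identifiers, and the sample addresses are assumptions made for this example; the source does not specify the format of the IP data records.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The five values that define an IP flow."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    transport: str  # "TCP" or "UDP" in most cases


def volume_per_protocol(records):
    """Sum byte counts per networking protocol.

    `records` is an iterable of (FlowKey, protocol_name, byte_count) tuples;
    this layout is hypothetical.
    """
    totals = defaultdict(int)
    for _flow, protocol, byte_count in records:
        totals[protocol] += byte_count
    return dict(totals)


records = [
    (FlowKey("10.0.0.1", 40000, "192.0.2.10", 80, "TCP"), "HTTP", 1200),
    (FlowKey("10.0.0.1", 40001, "192.0.2.10", 80, "TCP"), "HTTP", 800),
    (FlowKey("10.0.0.2", 50000, "198.51.100.7", 5004, "UDP"), "RTP", 3000),
]
print(volume_per_protocol(records))  # {'HTTP': 2000, 'RTP': 3000}
```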

[0032] From the perspective of the data services traffic dimension 22, the volume 41 is the total volume of IP traffic transported by all the IP flows (potentially involving several networking protocols) associated to an instance of a specific IP data service (for example an instance of a VoIP session). An IP data service is usually composed of several applications (for example several VoIP applications), each application implemented via one or several networking protocols. Thus, an instance of the IP data service is a specific applicative session (represented by one or several related IP flows) generated by a device connected to an IP data network.

[0033] From the perspective of the web domains traffic dimension 23, the volume 41 is the total volume of HTTP traffic transported by all the IP flows supporting an HTTP session (access to a specific web domain (identified by its URI) by a device).

[0034] From the perspective of the applications traffic dimension 24, the volume 41 is the total volume of IP traffic transported by all the IP flows associated to a specific third party applicative session (execution of the specific third party application on a device).

[0035] From the perspective of the IPTV channels traffic dimension 25, the volume 41 is the total volume of IP traffic transported by the IP flow(s) (for example, the RTP flows) associated to a specific IPTV session (delivery of a specific IPTV channel to a device).

[0036] The computation of a volume 41 (from the perspective of a specific traffic dimension 20) is based on the information (related to this volume) extracted from the IP data records. The information extracted from the IP data records is memorized (e.g. in a data warehouse), and the effective computation of the volume is performed over a reference time period, as will be detailed later. If available in the IP data records, generic information (e.g. timestamps of occurrence, identification of the related device, etc) is also memorized in the data warehouse. This generic information is used to perform the calculation / aggregation of the volume, as will be detailed later.

[0037] The number of users 42 measures the number of unique users accessing / making use of a specific resource. Specific IP flows are taken into consideration, to compute the number of users 42 accessing / making use of this specific resource, from the perspective of a particular traffic dimension. The notion of number of unique users is defined as follows: the same user accessing / making use of the specific resource several times under predefined circumstances (for instance, a reference time period) is counted only once.

[0038] From the perspective of the protocols traffic dimension 21, the number of users 42 is the number of unique users making use of a specific networking protocol (for example HTTP, or BitTorrent, or RTP).

[0039] From the perspective of the data services traffic dimension 22, the number of users 42 is the number of unique users making use of a specific IP data service (for example emailing, or VoIP).

[0040] From the perspective of the web domains traffic dimension 23, the number of users 42 is the number of unique users accessing a specific web domain (identified by its URI).

[0041] From the perspective of the applications traffic dimension 24, the number of users 42 is the number of unique users making use of a specific third party application (the third party application is identified by the associated IP flows).

[0042] From the perspective of the IPTV channels traffic dimension 25, the number of users 42 is the number of unique users accessing a specific IPTV channel. The access to the IPTV channel is identified by the associated IP flows (for example, the RTP flows) delivering this specific IPTV channel.

[0043] The computation of a number of users 42 (from the perspective of a specific traffic dimension 20) is based on the information (related to this number of users) extracted from the IP data records. In this case, having a unique identifier of the device accessing a specific resource is mandatory (in the IP data records), to support the notion of unique user (a unique user is defined by the unique identification of its device). The information extracted from the IP data records is memorized (e.g. in a data warehouse), to keep a record of each access to a specific resource. The effective computation of the number of users is performed over a reference time period, as will be detailed later. If available in the IP data records, generic information (e.g. timestamps of occurrence, identification of the related device, etc) is also memorized (in the data warehouse). This generic information is used to perform the calculation / aggregation of the number of users, as will be detailed later.
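The unique-user rule (one device identifier counted once per resource and reference time period) can be sketched with a set per resource and hour. The tuple layout `(device_id, resource, timestamp)`, the one-hour period, and all identifiers are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import datetime


def unique_users_per_hour(accesses):
    """Count distinct device identifiers per (resource, hour).

    `accesses` is an iterable of (device_id, resource, timestamp) tuples;
    this layout is hypothetical.
    """
    seen = defaultdict(set)
    for device_id, resource, ts in accesses:
        # Truncate the timestamp to the start of its one-hour reference period.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        seen[(resource, hour)].add(device_id)
    return {key: len(devices) for key, devices in seen.items()}


accesses = [
    ("dev-a", "example.com", datetime(2011, 7, 21, 10, 5)),
    ("dev-a", "example.com", datetime(2011, 7, 21, 10, 40)),  # same user, same hour
    ("dev-b", "example.com", datetime(2011, 7, 21, 10, 59)),
    ("dev-a", "example.com", datetime(2011, 7, 21, 11, 1)),
]
counts = unique_users_per_hour(accesses)
print(counts[("example.com", datetime(2011, 7, 21, 10))])  # 2
```

Keeping the device identifiers in the stored records, rather than only the counts, is what makes this de-duplication possible.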

[0044] The number of sessions 43 measures the number of occurrences of the access to / the use of a specific resource. Specific IP flows are taken into consideration, to compute the number of occurrences of the access to / the use of the specific resource, from the perspective of a particular traffic dimension. Each time a device accesses / makes use of the specific resource, it is counted as an additional occurrence.

[0045] From the perspective of the protocols traffic dimension 21, the number of sessions 43 is the number of occurrences of an IP flow associated to a specific networking protocol (for example HTTP, or BitTorrent, or RTP).

[0046] From the perspective of the data services traffic dimension 22, the number of sessions 43 is the number of occurrences of the use of a specific IP data service (for example emailing, or VoIP).

[0047] From the perspective of the web domains traffic dimension 23, the number of sessions 43 is the number of occurrences of the access to a specific web domain (identified by its URI).

[0048] From the perspective of the applications traffic dimension 24, the number of sessions 43 is the number of occurrences of the use of a specific third party application (the third party application is identified by the associated IP flows).

[0049] From the perspective of the IPTV channels traffic dimension 25, the number of sessions 43 is the number of occurrences of the access to a specific IPTV channel. The access to the IPTV channel is identified by the associated IP flows (for example, the RTP flows) delivering this specific IPTV channel.

[0050] The computation of a number of sessions 43 (from the perspective of a specific traffic dimension 20) is based on the information (related to this number of sessions) extracted from the IP data records. The information extracted from the IP data records is memorized (e.g. in a data warehouse), and the effective computation of the number of sessions is performed over a reference time period, as will be detailed later. If available in the IP data records, generic information (e.g. timestamps of occurrence, identification of the related device, etc) is also memorized (in the data warehouse). This generic information is used to perform the calculation / aggregation of the number of sessions, as will be detailed later.

[0051] The session duration 44 measures the duration of the access to / the use of a specific resource. Specific IP flows are taken into consideration, to compute the duration of the access to / the use of the specific resource, from the perspective of a particular traffic dimension.

[0052] From the perspective of the protocols traffic dimension 21, the session duration 44 is the duration of an IP flow associated to a specific networking protocol (for example HTTP, or BitTorrent, or RTP).

[0053] From the perspective of the data services traffic dimension 22, the session duration 44 is the duration of an applicative session representative of a specific IP data service (duration of the execution of an applicative session by a device).

[0054] From the perspective of the web domains traffic dimension 23, the session duration 44 is the duration of an HTTP session related to a specific web domain (duration of the access to a specific web domain by a device).

[0055] From the perspective of the applications traffic dimension 24, the session duration 44 is the duration of an applicative session of a specific third party application (duration of the execution of an applicative session by a device).

[0056] From the perspective of the IPTV channels traffic dimension 25, the session duration 44 is the duration of an IPTV session related to a specific IPTV channel (duration of the access to a specific IPTV channel by a device).

[0057] The computation of a session duration 44 (from the perspective of a specific traffic dimension 20) is based on the information (related to this session duration) extracted from the IP data records. The extracted information is memorized (e.g. in a data warehouse), and the effective computation of the session duration is performed over a reference time period, as will be detailed later. If available in the IP data records, generic information (e.g. timestamps of occurrence, identification of the related device, etc) is also memorized (in the data warehouse). This generic information is used to perform the calculation / aggregation of the session duration, as will be detailed later.

[0058] The number of transactions 45 measures the number of occurrences of a specific transaction for the access to / the use of a specific resource. This measure may not be applicable to every traffic dimension. When applicable, specific IP flows are taken into consideration, to compute the number of occurrences of a specific transaction for the access to / the use of the specific resource, from the perspective of a specific traffic dimension. Each time a device performs the specific transaction, it is counted as an additional occurrence.

[0059] For instance, from the perspective of the web domains traffic dimension 23, the number of transactions 45 is, for example, the number of HTTP GET or HTTP POST requests related to a specific web domain (identified by its URI).
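For the web domains example above, counting transactions amounts to counting the HTTP GET and POST requests seen for each domain. The sketch below assumes each extracted record has already been reduced to a `(domain, method)` pair; that layout and the function name are hypothetical.

```python
from collections import Counter


def transactions_per_domain(requests, methods=("GET", "POST")):
    """Count requests per web domain, keeping only the given HTTP methods.

    `requests` is an iterable of (domain, http_method) pairs (hypothetical layout).
    """
    counts = Counter()
    for domain, method in requests:
        if method.upper() in methods:
            counts[domain] += 1
    return dict(counts)


requests = [
    ("example.com", "GET"),
    ("example.com", "POST"),
    ("example.com", "HEAD"),  # ignored: neither GET nor POST
    ("example.org", "get"),   # method names are compared case-insensitively
]
print(transactions_per_domain(requests))  # {'example.com': 2, 'example.org': 1}
```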

[0060] The computation of a number of transactions 45 (from the perspective of a specific traffic dimension 20) is based on the information (related to this number of transactions) extracted from the IP data records. The information extracted from the IP data records is memorized (e.g. in a data warehouse), to keep a record of each occurrence of a specific transaction. The effective computation of the number of transactions is performed over a reference time period, as will be detailed later. If available in the IP data records, generic information (e.g. timestamps of occurrence, identification of the related device, etc) is also memorized (in the data warehouse). This generic information is used to perform the calculation / aggregation of the number of transactions, as will be detailed later.

[0061] Additional measures may be defined, for instance by combining two among the previously defined measures. For example, the following additional measures may be considered: the average volume per user 46, the average volume per session 47, the average volume per transaction 48, the average number of sessions per user 49, the average number of transactions per user 50, and the average session duration per user 51. Each of these additional measures may not be applicable to every traffic dimension. Following are a few examples of how these additional measures are applied to a specific traffic dimension.

[0062] From the perspective of the protocols traffic dimension 21, the average volume per user 46 is the volume 41 (total volume of IP traffic transported by all the IP flows associated to a specific networking protocol), divided by the number of users 42 (number of unique users making use of this specific networking protocol).

[0063] From the perspective of the IPTV channels traffic dimension 25, the average volume per session 47 is the volume 41 (total volume of IP traffic transported by all the IP flows associated to the delivery of a specific IPTV channel), divided by the number of sessions 43 (number of occurrences of an access to this specific IPTV channel).
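These derived measures are simple ratios of two base measures. A minimal sketch follows, with hypothetical names and made-up sample values; returning `None` for an empty denominator is a choice made for this example, not something the source specifies.

```python
def average(total, count):
    """Divide a total by a count; return None when the count is zero."""
    if count == 0:
        return None
    return total / count


# Made-up sample values for one protocol or one IPTV channel.
volume = 6_000_000       # bytes (measure 41)
number_of_users = 3      # measure 42
number_of_sessions = 4   # measure 43

average_volume_per_user = average(volume, number_of_users)        # measure 46
average_volume_per_session = average(volume, number_of_sessions)  # measure 47
print(average_volume_per_user, average_volume_per_session)  # 2000000.0 1500000.0
```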

[0064] The measures 40 are computed over a reference time period and stored in a data warehouse. The reference time period is dependent on the granularity in terms of time interval of the data received in the IP data records. Each network event reported in the IP data records usually has a timestamp with a granularity of hours at the coarsest, preferably minutes, and more preferably seconds. The computation of the measures 40 is also dependent on the frequency of reception of the IP data records.

[0065] A reasonable reference time period for the computation of the measures 40 is the hour. However, some measures may require a shorter reference time period, for instance minutes or even seconds. This is the case for the session duration 44, in the context of the IPTV channels 25. Longer reference time periods, like days, are not sufficiently accurate, since Operators are more and more interested in comparing various aspects of the IP data traffic at different periods of the day (the hour, and optionally the minute, is a good granularity in this case).

[0066] For illustration purposes, we consider a reference time period in hours for all the measures 40. Additionally, we consider that the IP data records are received every 15 minutes, and that the granularity of the network events in the IP data records is the second. One way to proceed is to extract information (representative of at least one traffic dimension 20) from the IP data records, upon reception of these IP data records every 15 minutes. The extracted information is stored in a data warehouse, according to a data model designed to enable the computation of at least one measure 40 (associated to the at least one traffic dimension 20). Then, every day, the information corresponding to the previous day is extracted from the data warehouse, and processed to compute the at least one measure 40 over the one hour reference time periods. This process will be further detailed later, in the description of the present method and system.
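As a rough sketch of the computation over one hour reference time periods, the following groups extracted network events (with timestamps at a granularity of one second) into hourly buckets and computes the volume 41 and the number of users 42 per protocol. The tuple layout of the events and the function names are assumptions made for illustration; the data model of the data warehouse is not specified at this level of detail.

```python
from collections import defaultdict
from datetime import datetime


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its one-hour reference time period."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_measures(events):
    """Compute volume and unique users per (hour, protocol).

    events: iterable of (timestamp, protocol, user_id, volume_bytes) tuples
            (hypothetical layout of the extracted information).
    Returns {(hour_start, protocol): {"volume": int, "users": int}}.
    """
    volume = defaultdict(int)
    users = defaultdict(set)
    for ts, protocol, user_id, volume_bytes in events:
        key = (hour_bucket(ts), protocol)
        volume[key] += volume_bytes   # volume 41
        users[key].add(user_id)       # distinct users feed number of users 42
    return {k: {"volume": volume[k], "users": len(users[k])} for k in volume}
```

Because the events are keyed by their own timestamps, the order in which the 15 minute batches arrive does not affect the result.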

[0067] The analytic framework 10 also relies on the notion of traffic characteristics 60. The measures 40, computed as described previously, are further aggregated against at least one traffic characteristic 60. The result of the aggregation is a metric, which is used by the network Operator to monitor a specific aspect of the IP data traffic on its IP based network infrastructure. The traffic characteristics 60 defined in a specific analytic framework 10 may evolve over time, to enhance the capabilities of analysis of the IP data traffic.

[0068] The following examples of traffic characteristics 60 are represented in Figure 1: time 61, devices 62, users 63, and networks 64. It will be apparent to one of ordinary skill in the art that other examples of traffic characteristics 60 may be considered, without changing the scope of the present method and system.

[0069] The time 61 is a traffic characteristic 60, which is generally always available to perform the aggregation of the measures 40. This is due to the fact that the notion of time is already present in the computation of the measures 40, since these measures are computed over a reference time period. Thus, the measures 40 may be further aggregated against a multiple of the reference time period. For instance, if the reference time period is the hour, each measure 40 may be further aggregated, to compute an aggregated value of the measure against days, weeks, or months. The computation of the measures over the reference time period (for instance the hour) may be considered in itself as an aggregation of the measures against the reference time period.
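The aggregation against a multiple of the reference time period can be illustrated for the volume 41. The sketch below assumes hourly volumes keyed by the start of the hour and by an arbitrary dimension value; the function name and the key layout are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Tuple


def daily_volume(
    hourly_volume: Dict[Tuple[datetime, str], int]
) -> Dict[Tuple[date, str], int]:
    """Aggregate hourly volumes (reference time period: the hour) against days."""
    daily = defaultdict(int)
    for (hour_start, key), volume_bytes in hourly_volume.items():
        daily[(hour_start.date(), key)] += volume_bytes
    return dict(daily)
```

A plain sum suits additive measures such as the volume 41. A daily count of unique users 42 generally cannot be obtained by summing hourly counts, since the same user may be active during several hours of the same day.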

[0070] The devices 62 are a traffic characteristic 60, which may be used to perform the aggregation of the measures 40. For instance, if every network event in the IP data records is associated to a unique identifier of the device which initiated this network event, then the measures 40 can be computed for each individual unique identifier. The information extracted from the IP data records is sorted per unique identifier of the related device, and the measures 40 are computed for each unique identifier over the reference time period. For example, in the case of a mobile IP network of the 3GPP family (Third Generation Partnership Project), a unique identifier of the mobile devices is the IMEI (International Mobile Equipment Identity). Having the IMEI, it is possible to infer the manufacturer of the mobile device, and the specific model of device within the manufacturer's portfolio. Then, additional characteristics can be further determined (using an additional data source 140 as represented in Figure 2): type of mobile device (feature phone, smart phone, tablet, computer with a dongle used as a modem), operating system used by the mobile device, etc. Thus, the measures 40 may be further aggregated against one of the characteristics of the devices 62: aggregated values of the measures may be computed for each device manufacturer, for each model of device, for each type of device, for each operating system, etc.
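A minimal sketch of the aggregation against the devices 62 is given below. It relies on the fact that the first eight digits of an IMEI form the Type Allocation Code (TAC), which identifies the manufacturer and model. The catalogue contents, the dictionary layout, and the function names are made up for illustration and merely stand in for an additional data source 140.

```python
from collections import defaultdict

# Hypothetical device catalogue standing in for the additional data source 140.
# Keys are TACs (first 8 digits of an IMEI); all entries are invented.
DEVICE_CATALOGUE = {
    "00000001": {"manufacturer": "VendorA", "model": "A-100", "type": "smart phone"},
    "00000002": {"manufacturer": "VendorB", "model": "B-200", "type": "tablet"},
}

UNKNOWN_DEVICE = {"manufacturer": "unknown", "model": "unknown", "type": "unknown"}


def device_info(imei: str) -> dict:
    """Look up device characteristics from the TAC prefix of the IMEI."""
    return DEVICE_CATALOGUE.get(imei[:8], UNKNOWN_DEVICE)


def volume_by(characteristic: str, volume_per_imei: dict) -> dict:
    """Aggregate per-IMEI volumes against one characteristic of the devices 62."""
    totals = defaultdict(int)
    for imei, volume_bytes in volume_per_imei.items():
        totals[device_info(imei)[characteristic]] += volume_bytes
    return dict(totals)
```

The same per-IMEI volumes can thus be rolled up by manufacturer, by model, or by type of device, simply by changing the characteristic passed to the function.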

[0071] The users 63 are a traffic characteristic 60, which may be used to perform the aggregation of the measures 40. For instance, if every network event in the IP data records is associated to a unique identifier of the user who owns the device which initiated this network event, then the measures 40 can be computed for each individual unique identifier. The information extracted from the IP data records is sorted per unique identifier of the related users, and the measures 40 are computed for each unique identifier over the reference time period. For example, in the case of a mobile IP network of the 3GPP (Third Generation Partnership Project) family, a unique identifier of the user who owns a mobile device is the IMSI (International Mobile Subscriber Identity) or the MSISDN (Mobile Subscriber Integrated Services Digital Network Number). Having the unique identifier, it is possible to obtain demographic characteristics (sex, age, revenue, localization, etc) of the user associated to this unique identifier from an additional data source (140 as represented in Figure 2). In the case of a mobile IP network, having the unique identifier of the user, it is also possible to detect if this user is a subscriber of the mobile network Operator or a roamer. The detection may be direct (for example, the IMSI contains information that allows the segregation between subscribers and roamers), or indirect via a cross reference of the unique identifier with an additional source of information (for example, the MSISDN may be cross referenced with the database containing the mobile network Operator's subscribers). Thus, the measures 40 may be further aggregated against one of the characteristics of the users 63: aggregated values of the measures may be computed for various demographic characteristics of the users, aggregated values of the measures may also be computed for the subscribers or the roamers in the case of a mobile network Operator.
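The direct detection of subscribers and roamers from the IMSI can be sketched as follows. An IMSI starts with a Mobile Country Code (three digits) followed by a Mobile Network Code (two or three digits), so a prefix comparison against the codes of the Operator's own network is sufficient in the simple case. The prefix value and the function names below are placeholders chosen for illustration.

```python
from collections import defaultdict

# Assumed MCC+MNC prefix(es) of the Operator's own network; placeholder value.
HOME_PLMN_PREFIXES = ("00101",)


def user_category(imsi: str) -> str:
    """Direct detection: an IMSI starting with a home MCC+MNC is a subscriber."""
    return "subscriber" if imsi.startswith(HOME_PLMN_PREFIXES) else "roamer"


def volume_by_user_category(volume_per_imsi: dict) -> dict:
    """Aggregate per-IMSI volumes against the subscriber / roamer split of users 63."""
    totals = defaultdict(int)
    for imsi, volume_bytes in volume_per_imsi.items():
        totals[user_category(imsi)] += volume_bytes
    return dict(totals)
```

The indirect detection described above (cross referencing an MSISDN with the Operator's subscriber database) would replace `user_category` with a lookup in that database.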

[0072] In certain cases, having unique identifiers of the devices associated to the network events in the IP data records allows the segmentation of the measures 40 against the devices 62, but also against the users 63. For this purpose, an additional data source is used to map the unique identifier of a device to a unique identifier of the user who owns this device. Alternatively, having unique identifiers of the users who own the devices associated to the network events in the IP data records allows the segmentation of the measures 40 against the users 63, but also against the devices 62. For this purpose, an additional data source is used to map the unique identifier of a user who owns a device to a unique identifier of the device itself. In the most favorable case, unique identifiers of both the devices and the users are associated to the network events in the IP data records, allowing the direct segmentation against both the devices 62 and the users 63.

[0073] The networks 64 are a traffic characteristic 60, which may be used to perform the aggregation of the measures 40. This characteristic applies specifically to mobile IP networks. For instance, in the case of a mobile IP network of the 3GPP family, various types of radio access networks within the 3GPP family may be deployed in the global mobile network infrastructure, including General Packet Radio Service (GPRS), Universal Mobile Telecommunications System (UMTS), Long Term Evolution (LTE), etc. If every network event in the IP data records is associated to the radio access network type over which this network event occurred, this information may be taken into consideration during the computation of the measures 40. Thus, the measures 40 may be further aggregated against one of the radio access network types defined in the traffic characteristic networks 64.

[0074] In the previous paragraphs, the aggregation of the measures 40 against one single traffic characteristic 60 has been explained. It is also possible to perform the aggregation of the measures 40 against two or more of the traffic characteristics 60. For instance, in the case of the IPTV channels 25 delivered over a mobile IP network, the volume of data 41 is computed per IPTV channel over a reference time period (one hour) for each unique identifier of a mobile device. It may be further aggregated against a combination of: a specific model of mobile device 62, a specific duration 61 (e.g. on a daily basis), and the subscribers 63 (e.g. excluding the roamers).
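The combined aggregation of the IPTV example can be sketched as below, assuming the hourly volumes have already been enriched with the device model and the subscriber status. The row layout and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def daily_channel_volume(rows, device_model, subscribers_only=True):
    """Aggregate hourly volumes against devices 62, time 61 and users 63 together.

    rows: iterable of dicts with the hypothetical keys
          hour (datetime), channel, device_model, is_subscriber, volume_bytes.
    Returns {(day, channel): total volume in bytes}.
    """
    totals = defaultdict(int)
    for row in rows:
        if row["device_model"] != device_model:
            continue  # keep one specific model of mobile device
        if subscribers_only and not row["is_subscriber"]:
            continue  # exclude the roamers
        totals[(row["hour"].date(), row["channel"])] += row["volume_bytes"]
    return dict(totals)
```

Each traffic characteristic acts either as a filter (device model, subscriber status) or as a grouping key (the day), which is one simple way to combine several characteristics in a single pass.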

[0075] Now referring concurrently to Figures 1, 2, and 3, a method and system for generating metrics representative of IP data traffic from IP data records will be described.

[0076] An IP data network 100 is represented in Figure 2. It allows various types of devices (110, 111, and 112) to access IP based applications and services 120, via the IP data network 100. For this purpose, IP data traffic is generated between the devices (110, 111, and 112), and the infrastructure supporting the IP based applications and services 120. The role of a collecting entity 130 is to collect data related to this IP data traffic.

[0077] The present method and system is applicable to any type of mobile IP network (100) operated by a mobile network Operator, including without limitation: GPRS networks, UMTS networks, LTE networks, Code Division Multiple Access (CDMA) networks, or Worldwide Interoperability for Microwave Access (WiMAX) networks.

[0078] The present method and system is also applicable to any type of IP based fixed broadband network (100) operated by an Internet Service Provider (ISP), including without limitation: Digital Subscriber Line (DSL) networks, cable networks, or optical fiber networks.

[0079] The present method and system is also applicable to an IP data network (100) operated by a corporation, for instance a private company or a governmental / public organization.

[0080] The types of devices using the IP data network 100 include, without limitation: mobile phones 110, tablets (not represented), laptops and computers 111, television sets 112; and more generally any type of IP enabled mobile device, home networking equipment / IP enabled multimedia equipment.

[0081] The collecting entities 130 collect data representative of the IP data traffic on the IP data network 100, and generate IP data records based on the collected data.

[0082] The collecting entities 130 may operate in two ways. In a first embodiment, the collecting entities 130 collect data by capturing in real time IP packets from the IP data traffic occurring on a specific segment of the IP data network 100. The captured IP packets contain data related to IP data sessions occurring on the IP data network. An IP data session is defined as an IP based data session initiated by a device (110, 111, or 112) on the IP data network 100, during which the device (110, 111, or 112) consumes various types of applications and services 120 (for example messaging, web browsing, social networking, multimedia streaming, etc). The IP packets related to a specific IP data session are analyzed according to the protocol layers of the Open Systems Interconnection (OSI) model, to extract parameters representative of the IP data traffic on the IP data network 100. Such an embodiment is well known in the art as Deep Packet Inspection (DPI); and the type of parameters which can be extracted from IP packets by DPI based collecting entities 130 is also well known in the art. For instance, in the case of a UMTS network, the collecting entities 130 may be positioned between a Serving GPRS Support Node (SGSN) and a Gateway GPRS Support Node (GGSN), in order to collect the IP data traffic between these two nodes, well known in the art as the GPRS Tunneling Protocol (GTP) control and user planes.

[0083] In an alternative embodiment, the collecting entities 130 do not operate on the real time traffic, but collect data that have been gathered by one or several pieces of networking equipment of the IP based network infrastructure. The gathered data usually consist of logs of the IP data sessions. The same type of parameters is extracted from these gathered data, as in the case of the previously described first embodiment. However, the granularity of the parameters may be lower in this second embodiment, since the collecting entities 130 can only extract the subset of parameters present in the gathered data, and some useful parameters may be missing. Nevertheless, the extracted parameters are generally sufficient to be representative of the IP data traffic on the IP data network 100, since they usually include for each IP data session: timestamps of occurrence of network events, an identifier of the device (110, 111, or 112) performing the IP data session, and several parameters representative of the specific protocols, applications and services used during the IP data session. In the case of a mobile IP network, such collected data are usually referred to as Call Detail Records (CDRs). For instance, in the case of a UMTS network, the collecting entities 130 retrieve the CDRs from network equipment such as the GGSN and the SGSN, and extract the relevant parameters from these CDRs. In the case of an IP based fixed broadband network, such collected data are usually referred to as IP Detail Records (IPDRs), and are retrieved by the collecting entities 130 from various pieces of networking equipment, based on the specific fixed broadband technology considered.

[0084] The collecting entities 130 transmit IP data records to an analytic system 160. The IP data records contain all the parameters collected by the collecting entities 130 over a pre-defined time period. In one embodiment of the present method and system, the analytic system 160 is composed of a pre-processing unit 162, a post-processing unit 164, a data warehouse 166, an analytic engine 168, and a reports unit 170.

[0085] From an implementation perspective, a single centralized analytic system 160 receives the IP data records from all the collecting entities 130 operating on the IP data network 100. Alternatively, several centralized analytic systems 160 may be deployed, each of these receiving the IP data records from a set of collecting entities 130 grouped by geographical proximity (or any other relevant criteria).

[0086] The IP data records received from the collecting entities 130 are processed by the pre-processing unit 162, according to an analytic framework 10 as represented in Figure 1. This processing consists in extracting information from the IP data records, transforming the extracted information according to a data model defined in relation to the analytic framework 10, and storing the resulting information in the data warehouse 166. If necessary, additional data provided by an additional data source 140 may be used, to enrich the information extracted from the IP data records, as has been explained previously in the description of Figure 1.

[0087] The IP data records contain generic information relevant in the context of any of the traffic dimensions 20. For instance, the identifiers of the devices (110, 111, and 112), the identifiers of the users who own the devices (110, 111, and 112), the type of network technology used for a given IP data session, represent information extracted from the IP data records, which are relevant to any traffic dimension 20.

[0088] The IP data records also contain information only relevant in the context of a specific traffic dimension 20. For instance, detailed information related to the usage of a specific network protocol (identification of the protocol, timestamps of beginning and end of the usage of the protocol within an IP data session, volume of data transferred with this protocol, and various parameters specific to this protocol like the URI in the case of HTTP), represent information extracted from the IP data records, which may be relevant to only one of the specific traffic dimensions 20.

[0089] It may also occur that the same information extracted from the IP data records is relevant to more than one traffic dimension 20. For instance, basic information related to the usage of a specific network protocol (identification of the protocol, timestamps of usage, volume of data) is relevant to the traffic dimension protocols 21; and is also relevant to the traffic dimensions data services 22 (various network protocols), web domains 23 (HTTP protocol), applications 24 (HTTP protocol), and IPTV channels 25 (HTTP, SIP, RTP, RTSP protocols).

[0090] Thus, when processing the IP data records from the perspective of a specific traffic dimension 20, the pre-processing unit 162 extracts information representative of this specific traffic dimension (including information exclusively relevant to this traffic dimension, and information also relevant to other traffic dimensions). This information is optionally adapted (to comply with a specific data model representative of the analytic framework 10), and then stored in the data warehouse 166.

[0091] The post-processing unit 164 processes the information (previously extracted from the IP data records) stored in the data warehouse 166 by the pre-processing unit 162, in order to compute measures 40 defined according to the analytic framework 10 over a reference time period. Each resulting measure 40 is stored in the data warehouse 166. According to the analytic framework 10, the computation of a measure 40 is performed in relation to a specific traffic dimension 20, as has been explained previously in the description of Figure 1.

[0092] The post-processing unit 164 also processes the measures 40 stored in the data warehouse 166, in order to perform the aggregation of these measures. According to the analytic framework 10, the computation of an aggregated value of a measure 40, representative of a specific traffic dimension 20, is performed against at least one traffic characteristic 60, as has been explained previously in the description of Figure 1. The resulting aggregated values of the measures are stored in the data warehouse 166, and constitute metrics representative of the IP data traffic on the IP data network 100.

[0093] To illustrate the sequence of operations of the pre-processing unit 162 and the post-processing unit 164, we consider (in one illustrative example) that IP data records are received from the collecting entities 130 every 15 minutes, that the granularity of the timestamps associated to the network events in the IP data records is one second, and that the reference time period to compute the measures is one minute. Thus, in one illustrative operational mode, every 15 minutes (upon reception of the IP data records), for each traffic dimension 20 defined in the analytic framework 10, the pre-processing unit 162 performs the extraction of the information representative of this specific traffic dimension 20 from the IP data records, and the storage of the extracted information in the data warehouse 166. Every hour, for each traffic dimension 20 defined in the analytic framework 10, the post-processing unit 164 performs the computation of the measures 40 representative of this specific traffic dimension 20 over reference time periods (based on the previously extracted information stored in the data warehouse by the pre-processing unit 162), and stores these resulting measures 40 in the data warehouse 166. The reference time period for the computation of a measure 40 may be a minute or an hour, based on the granularity required for this measure 40. Every 24 hours, the post-processing unit 164 computes aggregated values of the measures 40 (representative of the traffic dimensions 20) stored in the data warehouse 166, to generate the metrics representative of the IP data traffic on the IP data network 100. The aggregation of the measures 40 is performed against one (or several) traffic characteristic(s) 60 defined in the analytic framework 10.
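The cadence of this illustrative operational mode (extraction every 15 minutes, measures every hour, aggregation every 24 hours) can be expressed as a small scheduling helper. This sketch assumes that the stages are triggered on clock-aligned ticks (minute 0, 15, 30, 45, with the daily run at midnight); this alignment, the stage names, and the function name are assumptions of the example only.

```python
from datetime import datetime


def due_stages(now: datetime) -> list:
    """Return the processing stages due at a clock-aligned tick.

    - "extract": pre-processing unit 162, every 15 minutes
    - "compute_measures": post-processing unit 164, every hour
    - "aggregate_metrics": post-processing unit 164, every 24 hours
    """
    stages = []
    if now.minute % 15 == 0:
        stages.append("extract")
    if now.minute == 0:
        stages.append("compute_measures")
    if now.minute == 0 and now.hour == 0:
        stages.append("aggregate_metrics")
    return stages
```

Listing the stages in this order reflects the dependency between them: measures are computed from extracted information, and metrics are aggregated from measures.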

[0094] The pre-processing unit 162 and the post-processing unit 164 have been represented as two separate functional entities in Figure 2. However, from an implementation perspective, these two functional entities may be integrated into a single processing unit.

[0095] The analytic engine 168 queries the data warehouse 166, to extract metrics (the aggregated values of the measures 40) representative of the IP data traffic on the IP data network, according to the analytic framework 10. The extracted metrics are further processed if necessary (some additional computations may be necessary to generate the finalized values of the metrics), and then combined, to generate reports. These reports are transferred to the reports unit 170, to be presented to the staff of the network Operator (e.g. operational, marketing, and product development teams) via a Graphical User Interface (GUI). The reports presented via the GUI consist in visual representations of the metrics, relying on various types of charts and diagrams, to illustrate the metrics in an intuitive, easy to understand way.

[0096] The generation of a report according to the analytic framework 10 proceeds as follows. First, a traffic dimension 20 is selected. Then, one or several measures 40 representative of this traffic dimension 20 are selected. Then, for each measure 40, one or several traffic characteristics 60 are selected. For each traffic characteristic 60, a range of values representative of this traffic characteristic is selected. The data warehouse 166 is queried by the analytic engine 168, to extract the aggregated values of each selected measure 40 against the range of values of the selected traffic characteristic(s) 60. The extracted aggregated values of the measures 40 constitute the metrics, which are further processed if necessary, and then combined, to generate the report.

[0097] For example, a report is generated to analyze the IPTV channels 25. The volume of traffic 41 for three different IPTV channels is compared. Additionally, the volume of traffic 41 is aggregated on a daily basis, and compared over a one month period (aggregation against time 61). Furthermore, the comparison is performed for two different models of mobile devices (aggregation against devices 62).
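A query of this kind can be sketched with an in-memory SQLite database standing in for the data warehouse 166. The table name, its columns, and the function name are assumptions made for this example; the actual data model of the data warehouse is not specified here.

```python
import sqlite3

# Hypothetical table of daily aggregated volumes for the IPTV channels dimension.
SCHEMA = """
CREATE TABLE iptv_daily_volume (
    day TEXT, channel TEXT, device_model TEXT, volume_bytes INTEGER
)"""

# Traffic characteristics that may be used as grouping columns.
ALLOWED_CHARACTERISTICS = {"day", "device_model"}


def query_metrics(conn, characteristics, channels, day_from, day_to):
    """Return rows of (channel, *characteristics, total volume) for the selection."""
    unknown = set(characteristics) - ALLOWED_CHARACTERISTICS
    if unknown:
        raise ValueError(f"unsupported traffic characteristics: {sorted(unknown)}")
    group_cols = ", ".join(["channel", *characteristics])
    placeholders = ", ".join("?" for _ in channels)
    sql = (
        f"SELECT {group_cols}, SUM(volume_bytes) FROM iptv_daily_volume "
        f"WHERE channel IN ({placeholders}) AND day BETWEEN ? AND ? "
        f"GROUP BY {group_cols} ORDER BY {group_cols}"
    )
    return conn.execute(sql, [*channels, day_from, day_to]).fetchall()
```

Column names are checked against a fixed set before being placed in the SQL text, while the selected ranges of values (channels and dates) are passed as bound parameters.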

[0098] Usually, the analytic engine 168 has Business Intelligence (BI) and / or data mining capabilities, to further process the metrics extracted from the data warehouse 166. Trends and behaviors in the usage of the IP data network 100 are identified via the BI capabilities. Additionally, clusters of users with specific consumption patterns (of the applications and services 120) are identified via the data mining capabilities.

[0099] The pre-processing unit 162, the post-processing unit 164, the analytic engine 168, and the reports unit 170, are respectively composed of dedicated software programs executed on a dedicated computer. Alternatively, dedicated software programs corresponding to several units may be executed on the same computer (for instance for the pre-processing unit 162 and post-processing unit 164). The implementation of the data warehouse 166 is considered as well known in the art.

[00100] In one particular embodiment of the present method and system, the analytic framework 10 consists of a set of pre-defined traffic dimensions 20, a set of pre-defined measures 40, and a set of pre-defined traffic characteristics 60. The processing of the received IP data records by the analytic system 160 according to the analytic framework 10, as detailed in the previous paragraphs, includes various steps where the traffic dimensions 20, the measures 40, and the traffic characteristics 60, are taken into consideration. For each of these steps, whenever a traffic dimension is selected, it is selected from the set of pre-defined traffic dimensions 20. Similarly, whenever a measure is selected, it is selected from the set of pre-defined measures 40. And, whenever a traffic characteristic is selected, it is selected from the set of pre-defined traffic characteristics 60.

[00101] Although the present method and system have been described in the foregoing specification by means of several non-restrictive illustrative embodiments, these illustrative embodiments can be modified at will without departing from the scope of the following claims.

Claims

What is claimed is:

1. A method for generating metrics representative of IP data traffic from IP data records, the method comprising:

receiving at an analytic system IP data records representative of IP data traffic on an IP network;

processing at the analytic system said IP data records from the perspective of at least one traffic dimension;

the processing comprising for each at least one traffic dimension:

- extracting information representative of said traffic dimension from the IP data records,

- using said extracted information to compute at least one measure representative of said traffic dimension;

the processing further comprising computing a metric consisting of an aggregated value of the at least one measure representative of the traffic dimension against at least one traffic characteristic.

2. The method of claim 1, wherein the at least one traffic dimension comprises at least one of: protocols, data services, web domains, applications, and IP television (IPTV) channels.

3. The method of any of claims 1 and 2, wherein the at least one measure comprises at least one of: a volume of data, a number of users, a number of sessions, a session duration, and a number of transactions.

4. The method of claim 3, wherein the at least one measure further comprises at least one of: an average volume of data per user, an average volume of data per session, an average volume of data per transaction, an average number of sessions per user, an average number of transactions per user, and an average session duration per user.

5. The method of any of claims 1 to 4, wherein the at least one traffic characteristic comprises at least one of: time, devices, users, and networks.

6. The method of claim 5, wherein the users are further divided into subscribers and roamers.

7. The method of any of claims 1 to 6, wherein the at least one traffic dimension is selected among a set of pre-defined traffic dimensions, the at least one measure is selected among a set of pre-defined measures, and the at least one traffic characteristic is selected among a set of pre-defined traffic characteristics.

8. A system for generating metrics representative of IP data traffic from IP data records, the system comprising:

an analytic system:

for receiving IP data records representative of IP data traffic on an IP network, and

for processing said IP data records from the perspective of at least one traffic dimension;

the processing comprising for each at least one traffic dimension:

- using said extracted information to compute at least one measure representative of said traffic dimension;

9. The system of claim 8, wherein the analytic system comprises:

at least one processing unit for processing the IP data records from the perspective of at least one traffic dimension; and a data warehouse for storing at least one of: the extracted information, the at least one measure, and the aggregated value of the at least one measure.

10. The system of any of claims 8 and 9, wherein the at least one traffic dimension comprises at least one of: protocols, data services, web domains, applications, and IP television (IPTV) channels.

11. The system of any of claims 8-10, wherein the at least one measure comprises at least one of: a volume of data, a number of users, a number of sessions, a session duration, and a number of transactions.

12. The system of claim 11, wherein the at least one measure further comprises at least one of: an average volume of data per user, an average volume of data per session, an average volume of data per transaction, an average number of sessions per user, an average number of transactions per user, and an average session duration per user.

13. The system of any of claims 8-12, wherein the at least one traffic characteristic comprises at least one of: time, devices, users, and networks.

14. The system of claim 13, wherein the users are further divided into subscribers and roamers.

15. The system of any of claims 8-14, wherein the at least one traffic dimension is selected among a set of pre-defined traffic dimensions, the at least one measure is selected among a set of pre-defined measures, and the at least one traffic characteristic is selected among a set of pre-defined traffic characteristics.

16. The system of any of claims 8-15, wherein at least one collecting entity: collects data representative of IP data traffic on an IP data network, generates IP data records based on said collected data, and

transmits said IP data records to the analytic system.