Insights - Description of Methodology

This article discusses, at a high level, the methodology and logic employed to drive the Insights product.

Product Overview

The Insights solution is anchored in real device and app engagement data called Mobility Data. Insights unlocks this data to help marketers see the bigger picture around consumer behavior through our proprietary analytic reporting suite. Intuitive dashboards and data visualizations inform flexible reports, putting privacy-compliant app insights at your fingertips. The following sections detail the foundational concepts of Mobility Data and how it is transformed into the finished charts and graphs seen on each Insights report page.

Mobility Data

Mobility Data is the term used by T-Mobile Advertising Solutions to refer to the raw, proprietary, passively collected, device-level app engagement data that drives the charts and graphs available in the Insights suite of analytic reports.

Mobility data specifically refers to two key app engagement data sources that T-Mobile Advertising Solutions has access to from devices within the T-Mobile network:

  • App Focus Data
  • App Broadband Data

Each of these two data sources plays a key role in the creation of the finished data points found in the App Insights charts and graphs.

App Focus Data

App Focus Data comprises the foundational app engagement records at the core of the Insights product's metrics. This data is recorded at the device level using either a proprietary T-Mobile SDK or a proprietary T-Mobile Headless application. The data set is collected from post-paid Android OS devices on the T-Mobile network whose end users have not opted out of analytics data collection and usage. This results in a "Focus Data Pool" of ~8 million devices' worth of real app engagement data on any given day.

Focus Data is captured at a level of detail called a "Focus Event". The concept of the Focus Event is central to understanding how App Insights engagement-based metrics are created. For participating devices (the Focus Data Pool), a data record is generated every time an app is "in focus" on the device.

An app is defined as being "in focus" when the device user brings the app to the forefront of the device in full screen mode so that they can engage with the app. Once the app is in focus on the device, the device triggers the creation of a Focus Event record, and starts the clock on how many seconds the device engages with the app while it is in focus.

Because these Focus Event records are captured continuously and passively, the end user does not need to perform any action to trigger the creation of engagement records other than using their device, and its associated apps, as they normally would. This allows a more organic pattern of app engagement to be assessed.

The Focus Event records, and associated seconds of app engagement, are utilized in the creation of such metrics as:

  • App Ownership footprints.
  • Active & Inactive device counts.
  • App Dwell Time aggregates and averages.
  • App Focus Frequency.

New Focus Event records are ingested for use in App Insights reports every single day. In addition to Focus Event records, the proprietary T-Mobile SDK and Headless App also capture App Install and App Uninstall events. Together with Focus Data, these three event types comprise the core of the Insights suite of reporting metrics.

Limitations of App Focus Event Data Collection & Usage

While the Focus Event data records are extremely powerful, there are limitations to how the data is captured, and what insights can be created from them.

Fidelity of Data Capture
All App Focus Event records are captured at the hourly time grain. This means that regardless of how many distinct app focus events are generated by a device in an hour-long period, only one (1) Focus Event record will be recorded for the app on the device in that hour.

For example, suppose that a given device focuses on App "A" as follows:

  • Hour 1 - App A Focus Event - 324 seconds of engagement.
  • Hour 1 - App A Focus Event - 213 seconds of engagement.
  • Hour 1 - App A Focus Event - 23 seconds of engagement.
  • Hour 3 - App A Focus Event - 322 seconds of engagement.
  • Hour 3 - App A Focus Event - 123 seconds of engagement.
  • Hour 4 - App A Focus Event - 453 seconds of engagement.

Given the above engagement history, the associated Focus Events will be recorded as follows:

  • Hour 1 - App A Focus Event - 560 seconds of engagement.
  • Hour 3 - App A Focus Event - 445 seconds of engagement.
  • Hour 4 - App A Focus Event - 453 seconds of engagement.

In other words, the seconds of engagement are aggregated into a single Focus Event record for the given app on the given device during the given hour.
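
This collapse-to-one-record-per-hour behavior can be pictured as a simple group-and-sum over (device, app, hour). The following is a minimal sketch, assuming hypothetical record fields (device_id, app, hour, seconds), since the actual Focus Event schema is proprietary:

```python
from collections import defaultdict

# Hypothetical raw focus events: (device_id, app, hour, seconds_engaged).
# Field names are illustrative; the real schema is proprietary.
raw_events = [
    ("device-1", "App A", 1, 324),
    ("device-1", "App A", 1, 213),
    ("device-1", "App A", 1, 23),
    ("device-1", "App A", 3, 322),
    ("device-1", "App A", 3, 123),
    ("device-1", "App A", 4, 453),
]

# Collapse to one record per (device, app, hour), summing engagement seconds.
hourly = defaultdict(int)
for device_id, app, hour, seconds in raw_events:
    hourly[(device_id, app, hour)] += seconds

for (device_id, app, hour), seconds in sorted(hourly.items()):
    print(f"Hour {hour} - {app} Focus Event - {seconds} seconds of engagement")
# -> Hour 1: 560, Hour 3: 445, Hour 4: 453
```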

Extent of Data Capture
Devices can capture up to 10 distinct apps' worth of focus events in a given hour. This means that only the top 10 most engaged apps, as ranked by total seconds of engagement, will be captured into Focus Event records for a given device during a given hour.

For example, suppose that a given device has the following focus event aggregates for Hour 1:

  • Hour 1 - App A Focus Event - 543 seconds of engagement.
  • Hour 1 - App B Focus Event - 24 seconds of engagement.
  • Hour 1 - App C Focus Event - 678 seconds of engagement.
  • Hour 1 - App D Focus Event - 49 seconds of engagement.
  • Hour 1 - App E Focus Event - 146 seconds of engagement.
  • Hour 1 - App F Focus Event - 234 seconds of engagement.
  • Hour 1 - App G Focus Event - 343 seconds of engagement.
  • Hour 1 - App H Focus Event - 234 seconds of engagement.
  • Hour 1 - App I Focus Event - 62 seconds of engagement.
  • Hour 1 - App J Focus Event - 238 seconds of engagement.
  • Hour 1 - App K Focus Event - 56 seconds of engagement.
  • Hour 1 - App L Focus Event - 23 seconds of engagement.

Only the top 10 apps by total seconds of engagement will have their focus events recorded for this device for Hour 1. The following two records will therefore be dropped and not reported (the sketch after this list illustrates the selection):
  • Hour 1 - App B Focus Event - 24 seconds of engagement.
  • Hour 1 - App L Focus Event - 23 seconds of engagement.
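
The top-10 cap amounts to ranking the hour's per-app aggregates by engagement seconds and discarding everything past the tenth. A minimal sketch of that selection, using the example figures above and a hypothetical MAX_APPS_PER_HOUR constant:

```python
# Hypothetical per-hour engagement aggregates for one device:
# {app: total seconds engaged in Hour 1}. Values mirror the example above.
hour_1 = {
    "App A": 543, "App B": 24,  "App C": 678, "App D": 49,
    "App E": 146, "App F": 234, "App G": 343, "App H": 234,
    "App I": 62,  "App J": 238, "App K": 56,  "App L": 23,
}

MAX_APPS_PER_HOUR = 10  # device-side cap described above

# Keep only the top 10 apps by engagement seconds; the rest are dropped.
kept = sorted(hour_1.items(), key=lambda kv: kv[1], reverse=True)[:MAX_APPS_PER_HOUR]
dropped = sorted(set(hour_1) - {app for app, _ in kept})

print("dropped:", dropped)  # -> ['App B', 'App L']
```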

Depth of Data Capture
The Focus Event data structure is intentionally superficial, meaning that there is no way for T-Mobile to know what is transpiring within an app once that app is in focus. The interaction of the end user with a given app is private between that end user and the publisher of the app. T-Mobile has no ability to see this depth of engagement.

App Broadband Data

The T-Mobile network enables devices on the network to communicate with the servers that support the apps installed on them. When devices subscribed to the T-Mobile network are not using a WiFi connection to reach app servers, the T-Mobile network bridges the connection. This is what is commonly known as a Data Plan, or Data Usage, on a carrier-enabled mobile device. When a given device uses the T-Mobile network to transmit app data, certain signals are present that, when translated, indicate that an app server is being called by the device.

App Broadband Data is recorded as highly aggregated daily packages, so it does not have the granularity that Focus Event data offers for constructing metrics like Dwell Time and Focus Frequency. However, what Broadband Data lacks in fidelity, it makes up for in breadth of device coverage.

T-Mobile Advertising Solutions has the ability to understand basic app engagement from between 40 million and 50 million Android and iOS devices on any given day. This breadth of intelligence forms the basis of the App Insights Scaling methodology, which is covered in more depth below.

Privacy Compliance

T-Mobile considers end user privacy to be of the utmost importance. As such, it is imperative to T-Mobile Advertising Solutions that end user privacy wishes are honored. For the App Insights suite of analytic reports, this means assessing the declared opt-in and opt-out decisions made by each device, and removing those devices which have opted out of analytic data collection and usage.

Data Summarization & Aggregation

Mobility Data is an incredibly powerful record of app engagement that can be translated into informative and actionable metrics. However, it does not come from a mobile device in an immediately usable state. Mobility Data comprises hundreds of millions of individual data records every single day, all of which must be ingested, stored, verified, culled for privacy compliance, transformed, and ultimately aggregated into usable metrics for Insights reports.

Data Summarization

It may seem unintuitive that reports in Insights can return app engagement data in mere seconds, especially when those reports are based on hundreds of millions of individual data records. This is possible because every data point that could be requested by an Insights end user is pre-aggregated. This pre-aggregation is called "Data Summarization", and it is the key to creating incredibly fast, efficient, and scalable reporting.

To better understand data summarization, consider an example report run from Insights and trace back how the data is prepared to appear in that report. In the Insights Installation Report, a number of filters dictate exactly which data points are required, and a number of options determine which metrics can be displayed. An Installation Report can be broken down as follows:

Filters

  • Apps (~750)
  • Segment (~150)
  • Date Grain (3)
  • Date Range (Arbitrary)

Metrics

  • Installed All
  • Installed & Active
  • Installed & Inactive

Given the above filter options, it quickly becomes apparent that producing the 3 available metrics in the Installation Report requires numerous conceivable combinations of data points, depending on the choices made by the Insights end user. For example:

  • If there are currently 750 reporting apps...
  • If engagement data is historically available from July 1, 2021 forward, then as of March 31, 2022 there are 9 possible monthly intervals, 39 possible weekly intervals, and 274 possible daily intervals...
  • If there are 150 possible Demographic and AppGraph Persona segments available...

...then there must exist the following number of summarized data points for the Installation Report (reproduced in the sketch after this list):

  • Daily Grain = 750 Apps x 150 Segments x 274 Days x 3 Reporting Metrics = 92,475,000 data points
  • Weekly Grain = 750 Apps x 150 Segments x 39 Weeks x 3 Reporting Metrics = 13,162,500 data points
  • Monthly Grain = 750 Apps x 150 Segments x 9 Months x 3 Reporting Metrics = 3,037,500 data points
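
These counts are simple products of the filter cardinalities. A small sketch that reproduces them, using the illustrative figures from this example (750 apps, 150 segments, 3 metrics):

```python
# Reproduces the pre-aggregation counts above. The filter cardinalities
# are the illustrative figures from this article, not live values.
APPS, SEGMENTS, METRICS = 750, 150, 3

for grain, intervals in (("Daily", 274), ("Weekly", 39), ("Monthly", 9)):
    points = APPS * SEGMENTS * intervals * METRICS
    print(f"{grain} grain: {points:,} summarized data points")
# Daily grain: 92,475,000 summarized data points
# Weekly grain: 13,162,500 summarized data points
# Monthly grain: 3,037,500 summarized data points
```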

Data Recency & Finality

Insights' reports update on a daily basis as new Mobility Data flows in.

Scaling Methodology

Insights' reports are built to provide compelling mobile app data that is representative of all mobile devices across the entire United States for all carriers. This representation is achieved through a rigorous, multi-step scaling methodology that projects app engagement data from T-Mobile's reporting devices to all mobile devices in the US.

Insights' Scaling Methodology takes ~7-8mm "Insights Reporting" devices (App Focus Data for Android devices), and then utilizes ~40-50mm "Scalar Panel" devices (App Broadband Data for Android & iOS devices) to derive device-level weights that are applied to all Insights data points so that they are representative of ~273mm US mobile devices.

Through this device-level scaling methodology, each T-Mobile reporting device is de-biased and weighted based on a number of factors. Here is a high-level look at how this scaling methodology is created and deployed to produce the final Insights metrics.

Create Mobile Device Distributions
The first step in creating device level scaling factors (weights) is to create high-level distributions of mobile devices within the United States. These distributions are created based on the following device and compositional traits:

  • Device Region (Multi-State Region)
  • Device Operating System (Android Vs. iOS)
  • Device Owner Age & Device Owner Gender
  • Device Service Type (Pre-Paid Vs. Post-Paid)

These distributions demonstrate what portion of mobile devices across all carriers within the United States have certain attributes, so that the correct type and number of T-Mobile devices can be effectively placed within each distribution bucket. The goal is to determine the US mobile device owning footprint that each T-Mobile reporting device must ultimately represent.
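
Conceptually, a distribution bucket is just the share of all US devices that carries a given combination of the four traits above. A minimal sketch, using invented device records, of how such bucket shares could be tallied:

```python
from collections import Counter

# Invented device records keyed by the four traits listed above; real
# distributions would come from US-wide, all-carrier device estimates.
devices = [
    ("Northeast", "Android", "25-34 / F", "Post-Paid"),
    ("Northeast", "iOS",     "25-34 / M", "Pre-Paid"),
    ("West",      "Android", "35-44 / M", "Post-Paid"),
    ("West",      "iOS",     "25-34 / F", "Post-Paid"),
    ("West",      "iOS",     "25-34 / F", "Post-Paid"),
]

# Share of all devices falling into each (region, OS, age/gender, service) bucket.
counts = Counter(devices)
total = sum(counts.values())

for bucket, n in counts.items():
    print(bucket, f"{n / total:.0%}")
```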

Create a Representative Panel of Mobile Devices
After the distributions are created, devices from the App Broadband Data set (~40-50mm iOS and Android mobile devices) are placed within the distribution buckets as applicable. Not all T-Mobile App Broadband Data devices will be assigned to a distribution bucket; only those devices from this data set that meet the needs of the distribution percentages are assigned. For example:

  • When balancing device owner gender, if the US distribution is 50/50 between Male and Female device owners while the App Broadband Data set contains 60% Male and 40% Female device owners, then one-third of the male-owned devices (20 percentage points of the total panel) will be removed to create the desired gender distribution that adequately represents the device distribution in the US, as the sketch below works out.
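
The removal fraction in the example above falls out of simple algebra: the over-represented group is trimmed until its share of the remaining panel equals the target share. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def removal_fraction(current_share: float, target_share: float) -> float:
    """Fraction of the over-represented group to remove so that its share
    of the remaining panel equals target_share.

    Derived by solving c*(1-f) / (c*(1-f) + (1-c)) = t for f.
    """
    other = 1.0 - current_share
    return 1.0 - (target_share * other) / (current_share * (1.0 - target_share))

# 60% Male / 40% Female panel, 50/50 target: remove one-third of the
# male-owned devices (20 percentage points of the total panel).
print(round(removal_fraction(0.60, 0.50), 4))  # -> 0.3333
```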

This process of device selection results in ~30mm of the available ~40-50mm T-Mobile App Broadband data devices being assigned into distribution buckets. The T-Mobile App Broadband data devices that are assigned to distribution buckets represent the final "Scalar Panel" of mobile devices that will be used to create the device level scaling factors (weights).

Assess App Engagement Behavior for Devices in the Panel
Both the "Scalar Panel" of devices found in the distribution buckets, as well as the App Diagnostic Data reporting devices that underpin the Insights' reports, are assessed for their overall app engagement. App engagement is a key piece of the mapping step detailed below.

First, the devices in the "Scalar Panel" are assessed for the top 100 hostnames (App & Website data) engaged with over the course of a predefined sample week.

Second, the devices that provide reporting to Insights are assessed for the top 100 hostnames (App & Website data) engaged with over the course of the most recent complete week of data available.
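
Both profiling passes amount to the same top-N computation: total each device's engagement seconds per hostname over the week, then keep its N most engaged hostnames. A minimal sketch with invented records and hostnames:

```python
from collections import Counter, defaultdict

# Invented (device_id, hostname, seconds) records for one sample week.
week_records = [
    ("device-1", "api.example-news.com", 300),
    ("device-1", "cdn.example-video.com", 900),
    ("device-1", "api.example-news.com", 150),
    ("device-2", "api.example-social.com", 600),
]

TOP_N = 100  # profile depth described above

# Build each device's engagement profile: its top-N hostnames by total seconds.
per_device = defaultdict(Counter)
for device_id, hostname, seconds in week_records:
    per_device[device_id][hostname] += seconds

profiles = {
    device_id: [host for host, _ in counts.most_common(TOP_N)]
    for device_id, counts in per_device.items()
}
print(profiles["device-1"])  # -> ['cdn.example-video.com', 'api.example-news.com']
```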

Map "Scalar Panel" Devices to Insights Reporting Devices
Once an overall broadband network engagement profile is created, this profile, along with the device and device owner attributes that were utilized in the creation of the original distribution buckets, is employed to pair together the devices from the "Scalar Panel" with the "Insights Reporting" devices.

This process is called "Raking" (also known as Iterative Proportional Fitting), and is a standard form of statistical analysis used for associating respondents and responses. In essence, "Raking" is a method for adjusting sample weights so that they more accurately reflect the true population weights. With regard to the use of raking in the Insights scaling methodology, the process can be understood as:

  • Finding the fitted matrix of Insights Reporting devices that is closest to the App Broadband devices that exist in the final "Scalar Panel", but with row and column totals that represent the full distributions of US devices.

From this Raking process the optimal device weights can be produced:

  • ~7-8mm Insights Reporting devices are mapped to...
  • ~30mm adjusted Scalar Panel devices which are distributed to represent...
  • ~273mm US Mobile Devices

In this way, each of the Insights Reporting devices is assigned a device level weight which determines how many US mobile devices it should represent app engagement for.
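
For readers who want to see the mechanics, below is a minimal raking (Iterative Proportional Fitting) sketch on a toy two-trait table. The seed counts and target marginals are invented for illustration; the production pipeline rakes across many more dimensions at the device level:

```python
def rake(matrix, row_targets, col_targets, iterations=50):
    """Alternately scale rows and columns so the table's marginal sums
    approach the target marginals (classic Iterative Proportional Fitting)."""
    m = [row[:] for row in matrix]
    for _ in range(iterations):
        for i, target in enumerate(row_targets):   # match row totals
            s = sum(m[i])
            if s:
                m[i] = [v * target / s for v in m[i]]
        for j, target in enumerate(col_targets):   # match column totals
            s = sum(row[j] for row in m)
            if s:
                for row in m:
                    row[j] *= target / s
    return m

# Toy seed table: rows = two regions, columns = Android / iOS,
# in millions of panel devices (invented figures).
panel = [[6.0, 2.0],
         [4.0, 8.0]]

# Invented US-wide target marginals (also in millions); both sum to 273.
fitted = rake(panel, row_targets=[120.0, 153.0], col_targets=[130.0, 143.0])

for row in fitted:
    print([round(v, 1) for v in row])
# The ratio fitted_cell / panel_cell is the weight applied to devices
# falling in that bucket.
```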

Thresholding Methodology

A feature of most Insights' reports is the ability to help the end user identify, at a broad level, the reporting scale behind every data point. For Insights report pages that contain Info Tables of the graphed data points, every data point in the table is assessed for its statistical uncertainty based on the number of reporting devices used to create it.

For each data point reported, Insights uses standard statistical techniques to calculate the margin of error associated with a 95% confidence interval. A tiered approach is then employed to display these uncertainties based on ranges of percent error for each estimate. Every estimate in an Insights report's Info Table is labeled with the highest confidence tier whose criteria it satisfies (a minimal sketch of this tiering logic follows the list):

Fully Confident

  • No Annotation is provided ( ).
  • Estimate's percentage error is no more than 20%.

Mostly Confident

  • Annotated by a single asterisk ( * ).
  • Estimate's percentage error is no more than 35%.

Somewhat Confident

  • Annotated by two asterisks ( ** ).
  • Estimate's percentage error is no more than 50%.

Minimally Confident

  • Annotated by three asterisks ( *** ).
  • Cannot provide statistical guarantee on estimate's percentage error.
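
The tier assignment itself reduces to comparing each estimate's percent error against the thresholds above. A minimal sketch, assuming the 95% margin of error has already been computed for each estimate (it also reproduces the two worked examples in the next section):

```python
# A sketch of the tier-assignment logic described above, assuming each
# estimate arrives with its 95% margin of error already computed.

TIERS = [           # (max percent error, annotation)
    (0.20, ""),     # Fully Confident
    (0.35, "*"),    # Mostly Confident
    (0.50, "**"),   # Somewhat Confident
]

def annotate(estimate: float, margin_of_error: float) -> str:
    """Return the annotation for the highest tier the estimate satisfies."""
    pct_error = margin_of_error / estimate
    for max_error, mark in TIERS:
        if pct_error <= max_error:
            return mark
    return "***"  # Minimally Confident: no guarantee on percent error

# The two worked examples below: ESPN (300,000 +/- 60,000) and
# Chipotle (3,000 +/- 1,050).
print(repr(annotate(300_000, 60_000)))  # -> '' (Fully Confident)
print(repr(annotate(3_000, 1_050)))     # -> '*' (Mostly Confident)
```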

Thresholds in Action

The following two example scenarios illustrate how Insights calculates and associates statistical uncertainty with each data point. These examples help to show the useful nature of each estimate's uncertainty designation.

Example 1 - Engagement Report

While in the Engagement Report, data is run for the following filter combination:

  • Metric: Number of Unique Engaged Devices
  • Mobile App: ESPN
  • Segment: Female
  • Granularity: Week
  • Date Range: Week of 2021-02-22

This report run produced an estimate of 300,000 female-owned devices that engaged with the ESPN app during this week. Insights finds this data point to be "Fully Confident" and no additional annotation is attached to it in the Engagement Report Info Table. Recall that for an estimate to receive an assessment of "Fully Confident", the data point in question must be found to have a 20% or less margin of error. This means that Insights believes the true number of female-owned devices engaging with the ESPN app to be within 20% of 300,000 for this week (i.e. +/- 60,000 devices).

Example 2 - Installation Report

While in the Installation Report, data is run for the following filter combination:

  • Metric: Number of Installations
  • Mobile App: Chipotle
  • Segment: Age 25-34
  • Granularity: Days
  • Date Range: Day of 2022-01-18

This report run produced an estimate of 3,000 devices that installed the Chipotle app on this day. Insights finds this data point to be "Mostly Confident", and a single asterisk annotation is attached to it in the Installation Report Info Table. Recall that for an estimate to receive an assessment of "Mostly Confident", the data point in question must be found to have a 35% or less margin of error. This means that Insights believes the true number of Chipotle app installers aged 25-34 to be within 35% of 3,000 for this day (i.e. +/- 1,050 devices).