Measuring web performance: lab vs. production
Carlos Morales • 2022-06-29
Measuring web performance is tricky: a site’s performance can vary dramatically based on the user’s device, the network conditions, and how the user interacts with the page. This complexity starts with deciding what data to collect and how to collect and send it. This document first explains how to collect the data and then offers a high-level solution.
Lab vs field measurements
Performance metrics are generally measured in one of two ways:
- In the lab: using developer tools to collect synthetic data from a controlled environment
- In the field: collecting data about how real users interact with the webpage
Let’s further analyze these two options and how they could be implemented.
In the lab
The baseline is to collect synthetic data with a tool that simulates a page load and quantifies the time it takes to load. This process runs in a development or CI/CD environment.
Advantages
- Conditions are more or less constant, which makes it useful for identifying regressions in web performance.
- This can be easily incorporated into developer workflows and continuous integration processes.
Disadvantages
- The data will always differ from what real users experience.
- It does not consider the user’s variables: device, user-specific data, network conditions, and how real users interact with the page.
Options for implementation
- WebPageTest: For many years, the WebPageTest tool was the most common audit tool to measure web performance. It mainly focused on the SpeedIndex metric, a page load performance metric that shows how quickly the page is visually complete.
- Another approach is to use the JavaScript Performance API to measure the time until the onload event completes (see the sketch after this list). The main problem with this measurement is that Single Page Applications normally only start loading user content at that point and modify the DOM asynchronously afterwards, so it reflects neither the user nor modern web applications.
- In recent years, Google popularized the Lighthouse tool to audit not only performance but also accessibility and SEO. This tool indicates areas that can be improved to optimize the site. It can run inside Chrome DevTools or through its open-source CLI.
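As a quick illustration of the onload-based approach mentioned above, here is a minimal sketch using the Navigation Timing API. It only captures the initial document load, which is exactly the limitation described for Single Page Applications.

```typescript
// Minimal sketch: measuring the classic load time with the Navigation Timing API.
// This only reflects the initial document load, not content an SPA renders later.
window.addEventListener('load', () => {
  // Defer one tick so loadEventEnd has already been recorded.
  setTimeout(() => {
    const [nav] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[];
    if (nav) {
      // Values are milliseconds relative to the start of the navigation.
      console.log('Load event end (ms):', nav.loadEventEnd);
      console.log('DOMContentLoaded (ms):', nav.domContentLoadedEventEnd);
    }
  }, 0);
});
```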
In the field
The goal is to measure how the web application performs for real users. Some JavaScript must be injected into the web client; this code has to collect the data and send it to a backend without affecting the overall performance of the page.
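As a rough sketch of what such an injected snippet could look like, the following forwards a measurement without blocking the page. The /perf-metrics endpoint name and payload shape are assumptions for illustration, not an actual API.

```typescript
// Minimal sketch of a field-measurement beacon. The endpoint name is hypothetical.
const METRICS_ENDPOINT = '/perf-metrics';

function reportMetric(name: string, value: number): void {
  const body = JSON.stringify({ name, value, page: location.pathname, ts: Date.now() });
  // sendBeacon queues the request without delaying unload or competing with the page.
  if (!navigator.sendBeacon(METRICS_ENDPOINT, body)) {
    // Fall back to fetch with keepalive if the beacon could not be queued.
    fetch(METRICS_ENDPOINT, { method: 'POST', body, keepalive: true }).catch(() => {});
  }
}
```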
Advantages
- It measures the real user experience of the application.
- It collects performance data from the whole environment (including user variables).
- This data could be used for monitoring and alerting in real-time.
- The same architecture could be used for collecting errors and exceptions in the backend.
Disadvantages
- Complexity: measuring and collecting data in the frontend and storing it in the backend is not trivial; we need to collect, send, and store it securely and reliably.
- Measurements may be misleading: averages do not represent any single user’s session; instead, we should look at percentiles across the distribution to focus on subsets of users (see the small example after this list).
- The measurement itself must not affect the performance of the page: if the extra JavaScript code is too big, it will hurt performance and degrade the user experience.
- Measuring Single Page Application route transitions is not straightforward, if it is possible at all.
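To illustrate the point about percentiles, here is a tiny sketch with made-up LCP samples: the average looks fine, while the 75th percentile (the percentile at which Core Web Vitals are assessed) exposes the slow tail.

```typescript
// Small sketch: why percentiles beat averages for field data.
// A few fast sessions can hide a slow experience for a large share of users.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, index)];
}

const lcpSamples = [700, 800, 900, 1000, 1100, 4500, 5000, 5500]; // ms, illustrative only
const average = lcpSamples.reduce((sum, v) => sum + v, 0) / lcpSamples.length;
console.log('average:', average.toFixed(0));     // ~2438 ms, under the 2.5 s LCP "good" threshold
console.log('p75:', percentile(lcpSamples, 75)); // 4500 ms, a quarter of sessions are poor
```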
Options for implementation
- Buy: some of the tools that collect Real User Monitoring data are:
  - AppDynamics Browser Real User Monitoring (BRUM) and Mobile Real User Monitoring (MRUM)
  - Dynatrace Real User Monitoring (RUM)
  - New Relic Browser
- Build: develop a small piece of code that is injected into our web applications; it would gather anonymized data and send it to a backend. This backend would use the existing infrastructure to collect (Prometheus) and display (Grafana) these performance metrics.
The problem of measuring only in the lab
Because it is complex to measure web performance in the field, people normally focus only on synthetic measurements. Sadly, this does not show the real picture.
According to the research document Lighthouse scores as predictors of page-level CrUX data, although there is a positive correlation between lab performance measurements (Lighthouse) and real user experience, more than 40% of all pages that scored above 90 on Lighthouse did not meet one or more of the recommended Core Web Vitals thresholds. The research concludes: "for some pages, there are issues troubling performance in the field that Lighthouse isn’t (yet?) able to capture … remember to check both lab and field data if you’re looking to accurately assess your page’s performance".
In conclusion, collecting data in the field should be mandatory for assessing and monitoring user-perceived performance.
Build a solution for collecting web performance from real users
In my current company we have experience using AppDynamics and Dynatrace. They both have modules that would allow us to monitor web performance. Sadly, these solutions have faced some resistance in recent years, either because of license costs or complexity.
My proposal was to reuse the available Kubernetes infrastructure (Prometheus and Grafana) and fill the gaps by building our own solution: a client agent built in TypeScript that collects the performance metrics, and a backend service that exposes this data to Prometheus.
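As a sketch of what the backend half could look like, the snippet below assumes Node with express and prom-client (my choices for illustration, not a settled design): it accepts beacons from the client agent and republishes them as a Prometheus histogram that the existing Prometheus can scrape.

```typescript
import express from 'express';
import { Histogram, register } from 'prom-client';

// Histogram of reported metric values, labelled by metric name and page.
// Buckets here are tuned for millisecond metrics; a unitless metric such as CLS
// would need its own histogram with different buckets.
const webVitals = new Histogram({
  name: 'web_vitals_value',
  help: 'Web performance values reported by real user sessions',
  labelNames: ['metric', 'page'],
  buckets: [100, 250, 500, 1000, 2500, 4000, 8000],
});

const app = express();
// Beacons often arrive as text/plain, so parse any content type as JSON.
app.use(express.json({ type: '*/*' }));

// Beacon endpoint the injected client agent posts to (the name is an assumption).
app.post('/perf-metrics', (req, res) => {
  const { name, value, page } = req.body;
  webVitals.labels(name, page).observe(value);
  res.sendStatus(204);
});

// Prometheus scrapes this endpoint.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});

app.listen(3000);
```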
In this solution, one of the trickiest problems is to define what metrics to collect. My proposal was to use Web Vitals.
Web Vitals is an initiative by Google to "provide unified guidance for quality signals that are essential to delivering a great user experience on the web". It offers:
- What to measure: a set of “User-centric” performance metrics that help to understand how users perceive performance.
- An implementation: an open-source library that simplifies collecting and sending the metrics. Although the Google Chrome team developed this library, it works across browsers (a minimal usage sketch follows this list).
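Here is a minimal sketch of the client side using the web-vitals library. It uses the v2-style getCLS/getFID/getLCP exports (newer versions of the library rename these to onCLS, onINP, onLCP) and posts to the same hypothetical /perf-metrics endpoint used in the backend sketch above.

```typescript
// Minimal sketch: collecting Core Web Vitals with the web-vitals library and
// forwarding them to our backend. API names depend on the library version.
import { getCLS, getFID, getLCP, Metric } from 'web-vitals';

function sendToBackend(metric: Metric): void {
  const body = JSON.stringify({
    name: metric.name,   // 'CLS' | 'FID' | 'LCP'
    value: metric.value,
    id: metric.id,       // unique per page load, useful for deduplication
    page: location.pathname,
  });
  // The /perf-metrics endpoint is the hypothetical name used in the backend sketch.
  if (!navigator.sendBeacon('/perf-metrics', body)) {
    fetch('/perf-metrics', { method: 'POST', body, keepalive: true }).catch(() => {});
  }
}

getCLS(sendToBackend);
getFID(sendToBackend);
getLCP(sendToBackend);
```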
Advantages
- Good quality:
- Leverage Google’s experience in web performance to identify good web performance metrics.
- Some Real User Monitoring (RUM) analytics providers (like AppDynamics and Dynatrace) already include the Core Web Vitals metrics, so we could achieve very similar metric results.
- These metrics would cover real users: users with huge data sets (some external asset managers) and very slow networks or devices (end users from web banking).
- Full control:
- Among all metrics provided by Web Vitals, you could define what metrics are relevant for each case.
- Extend these metrics with error alerting. This would allow us to collect errors and exceptions from the browser.
- Low cost:
- Reuse existing infrastructure for monitoring.
- Reuse existing workflows and delivery processes for JavaScript libraries and Docker images.
- No license costs.
- No vendor lock-in.
Disadvantages
- This requires implementing a solution that, strictly speaking, is not business-focused software.
- Web Vitals metrics will evolve, and we may need to extend them in the future.
- Web Vitals metrics do not cover Single Page Application (SPA) route transitions across multiple views; more info at this link: How SPA architectures affect Core Web Vitals.
Links
- WebPageTest: https://docs.webpagetest.org/
- Lighthouse: https://developer.chrome.com/docs/lighthouse/overview/
- Lighthouse as web performance predictor: https://discuss.httparchive.org/t/lighthouse-scores-as-predictors-of-page-level-crux-data/2232
- Challenges to web performance: https://philipwalton.com/articles/my-challenge-to-the-web-performance-community/
- Web vitals home page: https://web.dev/learn-web-vitals/
- Web vitals open source library: https://github.com/GoogleChrome/web-vitals
- Web vitals user-centric performance metrics: https://web.dev/user-centric-performance-metrics/
- How SPA architectures affect Core Web Vitals: https://web.dev/vitals-spa-faq/