Post

Chrome Performance Logs with Selenium and Python

Analyzing browser performance logs

This is an introduction to using Chrome performance logs with Selenium and Python. Browser performance logs contain timing data and other information that are captured during web page loads. It is similar to what you see in Chrome DevTools Network tab.

This requires the selenium package (client bindings) for Python used to automate your browser.

To access performance logs with Selenium, you first need to create a driver instance with the performance logging preference enabled:

1
2
3
4
5
from selenium import webdriver

options = webdriver.ChromeOptions()
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
driver = webdriver.Chrome(options=options)

You can then navigate to a URL and get the performance log containing network request information for the page that was loaded:

1
2
driver.get("https://example.com")
log = driver.get_log("performance")

This returns a list of dictionaries containing log entries. The pertinent log message is in a JSON blob that can be accessed with the message key in each dictionary. You can parse the JSON blob and load it into a dictionary. From there, the actual log message is in a nested dictionary accessed by another message key. You can then access the message’s method (string) and params (dictionary).

Here is an example of accessing each log message’s method and params:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
import json
from selenium import webdriver

URL = "https://example.com"

options = webdriver.ChromeOptions()
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
driver = webdriver.Chrome(options=options)
driver.get(URL)

for entry in driver.get_log("performance"):
    message = json.loads(entry["message"])["message"]
    method = message["method"]
    params = message["params"]

The methods captured are:

  • Network.dataReceived
  • Network.loadingFinished
  • Network.policyUpdated
  • Network.requestServedFromCache
  • Network.requestWillBeSent
  • Network.requestWillBeSentExtraInfo
  • Network.responseReceived
  • Network.responseReceivedExtraInfo
  • Page.domContentEventFired
  • Page.frameNavigated
  • Page.frameResized
  • Page.frameStartedLoading
  • Page.frameStartedNavigating
  • Page.frameStoppedLoading
  • Page.loadEventFired

The params associated with each of these contain different information and timestamps for network requests/responses and page load events. You can use this data to analyze and profile network and page load performance for your website.

Here is an example of using Chrome performance logs to report the domains accessed and URLs requested when loading the page at https://selenium.dev:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
#!/usr/bin/env python3
#
# print a report of domains accessed and URLs requested during a
# web page load in Chrome browser.
#
# author: Corey Goldberg - https://github.com/cgoldberg

import json
from urllib.parse import urlparse

from selenium import webdriver


def load_page(url):
    """Loads a web page and retrieves the performance log."""
    options = webdriver.ChromeOptions()
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    with webdriver.Chrome(options=options) as driver:
        driver.get(page_url)
        log = driver.get_log("performance")
    return log


def get_urls_and_domains(perf_log):
    """Gets lists of domains and URLs from requests in performance log."""
    domains = set()
    urls = []
    for entry in perf_log:
        message = json.loads(entry["message"])["message"]
        if message["method"] == "Network.requestWillBeSent":
            url = message["params"]["request"]["url"]
            domain = urlparse(url).netloc
            if domain:
                urls.append(url)
                domains.add(domain)
    return sorted(domains), sorted(urls)


if __name__ == "__main__":
    page_url = "https://selenium.dev"

    perf_log = load_page(page_url)
    domains, urls = get_urls_and_domains(perf_log)

    print(f"network requests when loading '{page_url}':\n")
    print(f"{len(domains)} domains:")
    for domain in domains:
        print(f" - {domain}")
    print()
    print(f"{len(urls)} urls:")
    for url in urls:
        print(f" - {url}")

When you run this script, the output looks like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
network requests when loading 'https://selenium.dev':

8 domains:
 - cdn.jsdelivr.net
 - code.jquery.com
 - fonts.googleapis.com
 - fonts.gstatic.com
 - plausible.io
 - selenium.dev
 - www.netlify.com
 - www.selenium.dev

23 urls:
 - https://cdn.jsdelivr.net/npm/@docsearch/css@3.6.0
 - https://cdn.jsdelivr.net/npm/@docsearch/js@3.6.0
 - https://code.jquery.com/jquery-3.7.1.min.js
 - https://fonts.googleapis.com/css?family=Encode+Sans:300,300i,400,400i,700,700i&display=swap
 - https://fonts.gstatic.com/s/encodesans/v22/LDIhapOFNxEwR-Bd1O9uYNmnUQomAgE25imKSbHLSMA6.woff2
 - https://plausible.io/js/plausible.js
 - https://selenium.dev/
 - https://www.netlify.com/v3/img/components/netlify-light.svg
 - https://www.selenium.dev/
 - https://www.selenium.dev/css/prism.css
 - https://www.selenium.dev/favicons/favicon.ico
 - https://www.selenium.dev/images/sponsors/applitools.png
 - https://www.selenium.dev/images/sponsors/bright-data.png
 - https://www.selenium.dev/images/sponsors/browserstack.png
 - https://www.selenium.dev/images/sponsors/lambda-test.png
 - https://www.selenium.dev/images/sponsors/saucelabs.png
 - https://www.selenium.dev/js/docsearch-fix.js
 - https://www.selenium.dev/js/main.min.0a56205ed1c75268f017591e5418fdb944578a279fe7c7abe3981ebed191c341.js
 - https://www.selenium.dev/js/prism.js
 - https://www.selenium.dev/js/tabpane-persist.js
 - https://www.selenium.dev/scss/main.min.6d9ec00f116673042486c0086bf8ce0719532a8bb9d972fe34e86be6010ccb3c.css
 - https://www.selenium.dev/webfonts/fa-brands-400.woff2
 - https://www.selenium.dev/webfonts/fa-solid-900.woff2
This post is licensed under CC BY 4.0 by the author.