A popular technique used by website operators to observe the keystrokes, mouse movements and scrolling behavior of visitors on Web pages is fraught with risk, according to researchers at Princeton’s Center for Information Technology Policy.
The technique, offered by a number of service providers, uses scripts to capture a visitor's activity on a Web page, store it on the provider's servers, and play it back on demand for the website's operators.
The idea behind the practice is to give operators insights into how users are interacting with their websites and to identify broken and confusing pages.
“You use session replay scripts to find out where all the dead zones are on your website,” said Tod Beardsley, director of research at Rapid7.
“If you have a space for a ‘click here for 10 percent off’ and no one clicks there, there may be a problem with that page,” he told TechNewsWorld.
The scripts also can be used for support and to troubleshoot user problems, Beardsley added.
Peeping Scripts
However, the extent of data collected by the scripts far exceeds user expectations, according to researchers Steven Englehardt, Gunes Acar and Arvind Narayanan.
Text typed into forms is collected before a user submits the form, and precise mouse movements are saved — all without any visual indication to the user, they noted in an online post.
What’s more, the data can’t be reasonably expected to be kept anonymous.
“In fact, some companies allow publishers to explicitly link recordings to a user’s real identity,” wrote the team. “Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder.”
That means that whether a visitor completes a form and submits it to the website or not, any information keyed in at the website can be seen by the operator.
“Even if you deleted the data you entered into a form, it would be exposed and visible to the website owner,” said Abine CTO Andrew Sudbury.
“You’re being recorded when you think you aren’t, so you might reveal things you wouldn’t reveal if you knew you were being recorded,” he told TechNewsWorld.
Flubbing Scrubbing
The researchers studied seven session replay service providers whose scripts were found on 482 of the top 50,000 sites listed on Alexa. The services were Yandex, FullStory, Hotjar, UserReplay, Smartlook, Clicktale and SessionCam.
The services offer a number of ways for website publishers to exclude sensitive information from the replay sessions, the researchers found, but those options were labor-intensive, which discouraged their use.
For leaks to be avoided, publishers would need to diligently check and scrub all pages that display or accept user information, they explained.
For dynamically generated sites, the process would involve inspecting the underlying Web application’s server-side code, wrote Englehardt, Acar and Narayanan.
Further, the process would need to be repeated every time a site was updated or the Web application powering it changed.
“The scripts just gather everything, so someone would have to go in and spend time and energy telling the service provider what not to gather on any particular Web page,” Sudbury said. “Generally, the publishers don’t do that.”
Leaking Passwords
To identify some of the risks replay scripts posed to site visitors, the researchers set up test pages and used scripts from six of the seven companies in the study. One of the companies, Clicktale, was excluded for practical considerations.
Password leakage is one risk the replay services can pose. All the services take pains to redact passwords from their replays, the researchers explained, but those policies can break down on pages with mobile-friendly login boxes that use text inputs to store unmasked passwords.
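The failure mode described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a recorder decides what to mask by looking only at a field's HTML type attribute; the researchers do not publish the exact rules any vendor uses, and the InputField class, the redact_by_type function and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InputField:
    """A simplified stand-in for a form input on a page (hypothetical)."""
    name: str
    input_type: str  # the HTML "type" attribute, e.g. "password" or "text"
    value: str


def redact_by_type(fields):
    """Mask a value only when the field's type is 'password'.

    Assumption: redaction keys off the type attribute alone.
    """
    recorded = {}
    for field in fields:
        if field.input_type == "password":
            recorded[field.name] = "*" * len(field.value)
        else:
            recorded[field.name] = field.value
    return recorded


# A conventional login form uses type="password" for the password field.
standard_login = [
    InputField("user", "text", "alice"),
    InputField("pass", "password", "hunter2"),
]

# A "show password" style login box keeps the password in a plain text input.
mobile_login = [
    InputField("user", "text", "alice"),
    InputField("pass", "text", "hunter2"),
]

print(redact_by_type(standard_login))  # password is masked
print(redact_by_type(mobile_login))    # password is recorded in the clear
```

Under that assumption, the same password is masked on the first form and recorded verbatim on the second, even though nothing about the recorder changed.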
The services redacted sensitive information in a partial and imperfect way, the researchers also found. In addition to automated blocking of information in the replay sessions, the services let publishers manually specify fields for exclusion.
“To effectively deploy these mitigations, a publisher will need to actively audit every input element to determine if it contains personal data,” the team wrote. “This is complicated, error prone and costly, especially as a site or the underlying web application code changes over time.”
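A rough sense of what such an audit involves can be had from the sketch below, which scans a page's HTML for form fields that have not been marked for exclusion. The attribute name data-replay-exclude is a hypothetical placeholder, since each service has its own way of flagging fields, and the sample page is invented for the example.

```python
from html.parser import HTMLParser

# Hypothetical marker; real services each define their own exclusion mechanism.
EXCLUDE_ATTR = "data-replay-exclude"


class InputAuditor(HTMLParser):
    """Collects the names of form fields that lack the exclusion marker."""

    def __init__(self):
        super().__init__()
        self.unmarked = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("input", "textarea", "select"):
            return
        attr_map = dict(attrs)
        if EXCLUDE_ATTR not in attr_map:
            self.unmarked.append(attr_map.get("name", "(unnamed)"))


def find_unmarked_inputs(html):
    """Return the names of fields a reviewer would still need to check."""
    auditor = InputAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.unmarked


sample_page = """
<form>
  <input name="email" type="email">
  <input name="card_number" type="text" data-replay-exclude>
  <textarea name="notes"></textarea>
</form>
"""

print(find_unmarked_inputs(sample_page))
```

A check like this only covers static markup. As the researchers point out, dynamically generated pages would require inspecting server-side code, and the audit would have to be rerun whenever the site changes.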
Vulnerable Transmissions
User input isn’t the only way privacy can be violated. Information on rendered pages also is captured by the replay services.
“Unlike user input recording, none of the companies appear to provide automated redaction of displayed content by default; all displayed content in our tests ended up leaking,” the researchers wrote.
Because it forces publishers to address that issue manually, the process is fundamentally insecure, they maintained.
There are also potential risks in the transmission of data between the service provider and the publisher.
Once a session recording is complete, publishers can review it using a dashboard provided by the recording service, the researchers explained.
Some services deliver playbacks in an HTTP page, even if the original page was protected by HTTPS, they continued. That makes the playback page vulnerable to a man-in-the-middle attack that could siphon all the data on the page into a hacker’s hands.
What’s more, some services don’t use HTTPS to communicate with their clients, which exposes the transmissions to passive network surveillance.
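A publisher reviewing a service could start with a simple check of which endpoints are served without HTTPS. The sketch below shows one way to do that; the URLs are invented placeholders on example.com, not addresses used by any of the services in the study.

```python
from urllib.parse import urlparse


def insecure_endpoints(urls):
    """Return the URLs whose scheme is anything other than HTTPS."""
    return [url for url in urls if urlparse(url).scheme != "https"]


# Hypothetical endpoints a publisher might list for a replay service.
endpoints = [
    "https://dashboard.example.com/login",
    "http://dashboard.example.com/playback/123",
    "https://collector.example.com/events",
]

print(insecure_endpoints(endpoints))
```

Any URL the check returns would be carrying recorded session data over a connection that an attacker on the network path could read or tamper with.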
Strict Requirements
At least one session replay provider said it took a number of precautions to protect its clients’ information.
“All of Clicktale’s policies and practices meet ISO 27001, aligning with the strict requirements of our global customers,” said Leor Hurwitz, general counsel at Clicktale.
ISO 27001 is a security standard for information security management systems that mandates requirements for implementing, monitoring, maintaining and continually improving those systems.
“By default, Clicktale is set up to not capture keystrokes or any common sensitive data fields contained within a Web page,” Hurwitz told TechNewsWorld.
In addition to establishing default blocks, the company works closely with its customers to ensure that when it implements a session replay system, any sensitive information contained within a Web page is not included in the capture process, he explained.
Those measures allow its clients to improve customer experiences without the need to capture sensitive information that is not directly related to the shopping experience, Hurwitz added.
Blocking the Scripts
Consumers concerned about replay scripts can obtain software to block them.
“The JavaScript that performs this action is loaded by your browser when you visit a website. That can be blocked by a tracker blocker,” Abine’s Sudbury said.
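In broad strokes, a tracker blocker compares the address of each script a page tries to load against a list of known tracking domains and refuses to load the matches. The sketch below illustrates that idea only; the domains are made-up placeholders, and real blockers use far larger lists and richer matching rules.

```python
from urllib.parse import urlparse

# Hypothetical blocklist entries; real lists name actual tracking domains.
BLOCKED_DOMAINS = {"replay-vendor.example", "tracker.example"}


def is_blocked(script_url, blocked=BLOCKED_DOMAINS):
    """True if the script's host is a blocked domain or one of its subdomains."""
    host = (urlparse(script_url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocked)


def filter_scripts(script_urls):
    """Return only the scripts the browser would be allowed to load."""
    return [url for url in script_urls if not is_blocked(url)]


page_scripts = [
    "https://www.shop.example/js/cart.js",
    "https://cdn.replay-vendor.example/recorder.js",
    "https://notreplay-vendor.example/widget.js",
]

print(filter_scripts(page_scripts))
```

Matching on the full domain or a dot-prefixed suffix keeps the blocker from catching unrelated sites whose names merely end with the same characters.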
“The Web provides all sorts of amazing technical capabilities that are designed to let users have rich experiences at websites,” he observed, “but what’s frustrating is that the advertising, profiling and tracking industries have discovered very quickly clever ways to track people against their will.”
Replay scripts have become an emerging topic among privacy advocates, noted David Picket, a security analyst at AppRiver.
“The current discussion will raise user awareness,” he told TechNewsWorld. “That typically results in greater demand for oversight, and technologies to combat this problem will most likely be built into existing solutions or emerge to prevent it.”