Memory safety in Chrome is an ever-ongoing effort to protect our users. We are constantly experimenting with different technologies to stay ahead of malicious actors. In this spirit, this post is about our journey of using heap scanning technologies to improve memory safety of C++.



Let’s start at the beginning though. Throughout the lifetime of an application its state is generally represented in memory. Temporal memory safety refers to the problem of guaranteeing that memory is always accessed with the most up to date information of its structure, its type. C++ unfortunately does not provide such guarantees. While there is appetite for different languages than C++ with stronger memory safety guarantees, large codebases such as Chromium will use C++ for the foreseeable future.



auto* foo = new Foo();

delete foo;

// The memory location pointed to by foo is not representing

// a Foo object anymore, as the object has been deleted (freed).

foo->Process();



In the example above, foo is used after its memory has been returned to the underlying system. The out-of-date pointer is called a dangling pointer and any access through it results in a use-after-free (UAF) access. In the best case such errors result in well-defined crashes, in the worst case they cause subtle breakage that can be exploited by malicious actors. 



UAFs are often hard to spot in larger codebases where ownership of objects is transferred between various components. The general problem is so widespread that to this date both industry and academia regularly come up with mitigation strategies. The examples are endless: C++ smart pointers of all kinds are used to better define and manage ownership on application level; static analysis in compilers is used to avoid compiling problematic code in the first place; where static analysis fails, dynamic tools such as C++ sanitizers can intercept accesses and catch problems on specific executions.



Chrome’s use of C++ is sadly no different here and the majority of high-severity security bugs are UAF issues. In order to catch issues before they reach production, all of the aforementioned techniques are used. In addition to regular tests, fuzzers ensure that there’s always new input to work with for dynamic tools. Chrome even goes further and employs a C++ garbage collector called Oilpan which deviates from regular C++ semantics but provides temporal memory safety where used. Where such deviation is unreasonable, a new kind of smart pointer called MiraclePtr was introduced recently to deterministically crash on accesses to dangling pointers when used. Oilpan, MiraclePtr, and smart-pointer-based solutions require significant adoptions of the application code.



Over the last decade, another approach has seen some success: memory quarantine. The basic idea is to put explicitly freed memory into quarantine and only make it available when a certain safety condition is reached. Microsoft has shipped versions of this mitigation in its browsers:  MemoryProtector in Internet Explorer in 2014 and its successor MemGC in (pre-Chromium) Edge in 2015. In the Linux kernel a probabilistic approach was used where memory was eventually just recycled. And this approach has seen attention in academia in recent years with the MarkUs paper. The rest of this article summarizes our journey of experimenting with quarantines and heap scanning in Chrome.



(At this point, one may ask where pointer authentication fits into this picture – keep on reading!)

Quarantining and Heap Scanning, the Basics

The main idea behind assuring temporal safety with quarantining and heap scanning is to avoid reusing memory until it has been proven that there are no more (dangling) pointers referring to it. To avoid changing C++ user code or its semantics, the memory allocator providing new and delete is intercepted.

Upon invoking delete, the memory is actually put in a quarantine, where it is unavailable for being reused for subsequent new calls by the application. At some point a heap scan is triggered which scans the whole heap, much like a garbage collector, to find references to quarantined memory blocks. Blocks that have no incoming references from the regular application memory are transferred back to the allocator where they can be reused for subsequent allocations.



There are various hardening options which come with a performance cost:

  • Overwrite the quarantined memory with special values (e.g. zero);

  • Stop all application threads when the scan is running or scan the heap concurrently;

  • Intercept memory writes (e.g. by page protection) to catch pointer updates;

  • Scan memory word by word for possible pointers (conservative handling) or provide descriptors for objects (precise handling);

  • Segregation of application memory in safe and unsafe partitions to opt-out certain objects which are either performance sensitive or can be statically proven as being safe to skip;

  • Scan the execution stack in addition to just scanning heap memory;



We call the collection of different versions of these algorithms StarScan [stɑː skæn], or *Scan for short.

Reality Check

We apply *Scan to the unmanaged parts of the renderer process and use Speedometer2 to evaluate the performance impact. 



We have experimented with different versions of *Scan. To minimize performance overhead as much as possible though, we evaluate a configuration that uses a separate thread to scan the heap and avoids clearing of quarantined memory eagerly on delete but rather clears quarantined memory when running *Scan. We opt in all memory allocated with new and don’t discriminate between allocation sites and types for simplicity in the first implementation.


Note that the proposed version of *Scan is not complete. Concretely, a malicious actor may exploit a race condition with the scanning thread by moving a dangling pointer from an unscanned to an already scanned memory region. Fixing this race condition requires keeping track of writes into blocks of already scanned memory, by e.g. using memory protection mechanisms to intercept those accesses, or stopping all application threads in safepoints from mutating the object graph altogether. Either way, solving this issue comes at a performance cost and exhibits an interesting performance and security trade-off. Note that this kind of attack is not generic and does not work for all UAF. Problems such as depicted in the introduction would not be prone to such attacks as the dangling pointer is not copied around.



Since the security benefits really depend on the granularity of such safepoints and we want to experiment with the fastest possible version, we disabled safepoints altogether.



Running our basic version on Speedometer2 regresses the total score by 8%. Bummer…



Where does all this overhead come from? Unsurprisingly, heap scanning is memory bound and quite expensive as the entire user memory must be walked and examined for references by the scanning thread.



To reduce the regression we implemented various optimizations that improve the raw scanning speed. Naturally, the fastest way to scan memory is to not scan it at all and so we partitioned the heap into two classes: memory that can contain pointers and memory that we can statically prove to not contain pointers, e.g. strings. We avoid scanning memory that cannot contain any pointers. Note that such memory is still part of the quarantine, it is just not scanned.



We extended this mechanism to also cover allocations that serve as backing memory for other allocators, e.g., zone memory that is managed by V8 for the optimizing JavaScript compiler. Such zones are always discarded at once (c.f. region-based memory management) and temporal safety is established through other means in V8.



On top, we applied several micro optimizations to speed up and eliminate computations: we use helper tables for pointer filtering; rely on SIMD for the memory-bound scanning loop; and minimize the number of fetches and lock-prefixed instructions.



We also improve upon the initial scheduling algorithm that just starts a heap scan when reaching a certain limit by adjusting how much time we spent in scanning compared to actually executing the application code (c.f. mutator utilization in garbage collection literature).



In the end, the algorithm is still memory bound and scanning remains a noticeably expensive procedure. The optimizations helped to reduce the Speedometer2 regression from 8% down to 2%.



While we improved raw scanning time, the fact that memory sits in a quarantine increases the overall working set of a process. To further quantify this overhead, we use a selected set of Chrome’s real-world browsing benchmarks to measure memory consumption. *Scan in the renderer process regresses memory consumption by about 12%. It’s this increase of the working set that leads to more memory being paged in which is noticeable on application fast paths.


Hardware Memory Tagging to the Rescue

MTE (Memory Tagging Extension) is a new extension on the ARM v8.5A architecture that helps with detecting errors in software memory use. These errors can be spatial errors (e.g. out-of-bounds accesses) or temporal errors (use-after-free). The extension works as follows. Every 16 bytes of memory are assigned a 4-bit tag. Pointers are also assigned a 4-bit tag. The allocator is responsible for returning a pointer with the same tag as the allocated memory. The load and store instructions verify that the pointer and memory tags match. In case the tags of the memory location and the pointer do not match a hardware exception is raised.



MTE doesn't offer a deterministic protection against use-after-free. Since the number of tag bits is finite there is a chance that the tag of the memory and the pointer match due to overflow. With 4 bits, only 16 reallocations are enough to have the tags match. A malicious actor may exploit the tag bit overflow to get a use-after-free by just waiting until the tag of a dangling pointer matches (again) the memory it is pointing to.



*Scan can be used to fix this problematic corner case. On each delete call the tag for the underlying memory block gets incremented by the MTE mechanism. Most of the time the block will be available for reallocation as the tag can be incremented within the 4-bit range. Stale pointers would refer to the old tag and thus reliably crash on dereference. Upon overflowing the tag, the object is then put into quarantine and processed by *Scan. Once the scan verifies that there are no more dangling pointers to this block of memory, it is returned back to the allocator. This reduces the number of scans and their accompanying cost by ~16x.



The following picture depicts this mechanism. The pointer to foo initially has a tag of 0x0E which allows it to be incremented once again for allocating bar. Upon invoking delete for bar the tag overflows and the memory is actually put into quarantine of *Scan.

We got our hands on some actual hardware supporting MTE and redid the experiments in the renderer process. The results are promising as the regression on Speedometer was within noise and we only regressed memory footprint by around 1% on Chrome’s real-world browsing stories.



Is this some actual free lunch? Turns out that MTE comes with some cost which has already been paid for. Specifically, PartitionAlloc, which is Chrome’s underlying allocator, already performs the tag management operations for all MTE-enabled devices by default. Also, for security reasons, memory should really be zeroed eagerly. To quantify these costs, we ran experiments on an early hardware prototype that supports MTE in several configurations:

  1. MTE disabled and without zeroing memory;

  2. MTE disabled but with zeroing memory;

  3. MTE enabled without *Scan;

  4. MTE enabled with *Scan;



(We are also aware that there’s synchronous and asynchronous MTE which also affects determinism and performance. For the sake of this experiment we kept using the asynchronous mode.) 

The results show that MTE and memory zeroing come with some cost which is around 2% on Speedometer2. Note that neither PartitionAlloc, nor hardware has been optimized for these scenarios yet. The experiment also shows that adding *Scan on top of MTE comes without measurable cost. 


Conclusions

C++ allows for writing high-performance applications but this comes at a price, security. Hardware memory tagging may fix some security pitfalls of C++, while still allowing high performance. We are looking forward to see a more broad adoption of hardware memory tagging in the future and suggest using *Scan on top of hardware memory tagging to fix temporary memory safety for C++. Both the used MTE hardware and the implementation of *Scan are prototypes and we expect that there is still room for performance optimizations.




At the KubeCon EU 2022 conference in Valencia, security researchers from Palo Alto Networks presented research findings on “trampoline pods”—pods with an elevated set of privileges required to do their job, but that could conceivably be used as a jumping off point to gain escalated privileges.

The research mentions GKE, including how developers should look at the privileged pod problem today, what the GKE team is doing to minimize the use of privileged pods, and actions GKE users can take to protect their clusters.

Privileged pods within the context of GKE security

While privileged pods can pose a security issue, it’s important to look at them within the overall context of GKE security. To use a privileged pod as a “trampoline” in GKE, there is a major prerequisite – the attacker has to first execute a successful application compromise and container breakout attack.

Because the use of privileged pods in an attack requires a first step such as a container breakout to be effective, let’s look at two areas:
  1. features of GKE you can use to reduce the likelihood of a container breakout
  2. steps the GKE team is taking to minimize the use of privileged pods and the privileges needed in them.
Reducing container breakouts

There are a number of features in GKE along with some best practices that you can use to reduce the likelihood of a container breakout:

More information can be found in the GKE Hardening Guide.

How GKE is reducing the use of privileged pods.

While it’s not uncommon for customers to install privileged pods into their clusters, GKE works to minimize the privilege levels held by our system components, especially those that are enabled by default. However, there are limits as to how many privileges can be removed from certain features. For example, Anthos Config Management requires permissions to modify most Kubernetes objects to be able to create and manage those objects.

Some other privileges are baked into the system, such as those held by Kubelet. Previously, we worked with the Kubernetes community to build the Node Restriction and Node Authorizer features to limit Kubelet's access to highly sensitive objects, such as secrets, adding protection against an attacker with access to the Kubelet credentials.

More recently, we have taken steps to reduce the number of privileged pods across GKE and have added additional documentation on privileges used in system pods as well as information on how to improve pod isolation. Below are the steps we’ve taken:
  1. We have added an admission controller to GKE Autopilot and GKE Standard (on by default) and GKE/Anthos (opt-in) that stops attempts to run as a more privileged service account, which blocks a method of escalating privileges using privileged pods.
  2. We created a permission scanning tool that identifies pods that have privileges that could be used for escalation, and we used that tool to perform an audit across GKE and Anthos.
  3. The permission scanning tool is now integrated into our standard code review and testing processes to reduce the risk of introducing privileged pods into the system. As mentioned earlier, some features require privileges to perform their function.
  4. We are using the audit results to reduce permissions available to pods. For example, we removed “update nodes and pods” permissions from anetd in GKE.
  5. Where privileged pods are required for the operation of a feature, we’ve added additional documentation to illustrate that fact.
  6. We added documentation that outlines how to isolate GKE-managed workloads in dedicated node pools when you’re unable to use GKE Sandbox to reduce the risk of privilege escalation attacks.
In addition to the measures above, we recommend users take advantage of tools that can scan RBAC settings to detect overprivileged pods used in their applications. As part of their presentation, the Palo Alto researchers announced an open source tool, called rbac-police, that can be used for the task. So, while it only takes a single overprivileged workload to trampoline to the cluster, there are a number of actions you can take to minimize the likelihood of the prerequisite container breakout and the number of privileges used by a pod.



Every year at I/O we share the latest on privacy and security features on Android. But we know some users like to go a level deeper in understanding how we’re making the latest release safer, and more private, while continuing to offer a seamless experience. So let’s dig into the tools we’re building to better secure your data, enhance your privacy and increase trust in the apps and experiences on your devices.

Low latency, frictionless security

Regardless of whether a smartphone is used for consumer or enterprise purposes, attestation is a key underpinning to ensure the integrity of the device and apps running on the device. Fundamentally, key attestation lets a developer bind a secret or designate data to a device. This is a strong assertion: "same user, same device" as long as the key is available, a cryptographic assertion of integrity can be made.

With Android 13 we have migrated to a new model for the provisioning of attestation keys to Android devices which is known as Remote Key Provisioning (RKP). This new approach will strengthen device security by eliminating factory provisioning errors and providing key vulnerability recovery by moving to an architecture where Google takes more responsibility in the certificate management lifecycle for these attestation keys. You can learn more about RKP here.

We’re also making even more modules updatable directly through Google Play System Updates so we can automatically upgrade more system components and fix bugs, seamlessly, without you having to worry about it. We now have more than 30 components in Android that can be automatically updated through Google Play, including new modules in Android 13 for Bluetooth and ultra-wideband (UWB).

Last year we talked about how the majority of vulnerabilities in major operating systems are caused by undefined behavior in programming languages like C/C++. Rust is an alternative language that provides the efficiency and flexibility required in advanced systems programming (OS, networking) but Rust comes with the added boost of memory safety. We are happy to report that Rust is being adopted in security critical parts of Android, such as our key management components and networking stacks.

Hardening the platform doesn’t just stop with continual improvements with memory safety and expansion of anti-exploitation techniques. It also includes hardening our API surfaces to provide a more secure experience to our end users.

In Android 13 we implemented numerous enhancements to help mitigate potential vulnerabilities that app developers may inadvertently introduce. This includes making runtime receivers safer by allowing developers to specify whether a particular broadcast receiver in their app should be exported and visible to other apps on the device. On top of this, intent filters block non-matching intents which further hardens the app and its components.

For enterprise customers who need to meet certain security certification requirements, we’ve updated our security logging reporting to add more coverage and consolidate security logs in one location. This is helpful for companies that need to meet standards like Common Criteria and is useful for partners such as management solutions providers who can review all security-related logs in one place.

Privacy on your terms

Android 13 brings developers more ways to build privacy-centric apps. Apps can now implement a new Photo picker that allows the user to select the exact photos or videos they want to share without having to give another app access to their media library.

With Android 13, we’re also reducing the number of apps that require your location to function using the nearby devices permission introduced last year. For example, you won’t have to turn on location to enable Wi-fi for certain apps and situations. We’ve also changed how storage works, requiring developers to ask for separate permissions to access audio, image and video files.

Previously, we’ve limited apps from accessing your clipboard in the background and alerted you when an app accessed it. With Android 13, we’re automatically deleting your clipboard history after a short period so apps are blocked from seeing old copied information.

In Android 11, we began automatically resetting permissions for apps you haven’t used for an extended period of time, and have since expanded the feature to devices running Android 6 and above. Since then, we’ve automatically reset over 5 billion permissions.

In Android 13, app makers can go above and beyond in removing permissions even more proactively on behalf of their users. Developers will be able to provide even more privacy by reducing the time their apps have access to unneeded permissions.

Finally, we know notifications are critical for many apps but are not always of equal importance to users. In Android 13, you’ll have more control over which apps you would like to get alerts from, as new apps on your device are required to ask you for permission by default before they can send you notifications.

Apps you can trust

Most app developers build their apps using a variety of software development kits (SDKs) that bundle in pre-packaged functionality. While SDKs provide amazing functionality, app developers typically have little visibility or control over the SDK code or insight into their performance.

We’re working with developers to make their apps more secure with a new Google Play SDK Index that helps them see SDK safety and reliability signals before they build the code into their apps. This ensures we're helping everyone build a more secure and private app ecosystem.

Last month, we also started rolling out a new Data safety section in Google Play to help you understand how apps plan to collect, share, and protect your data, before you install it. To instill even more trust in Play apps, we're enabling developers to have their apps independently validated against OWASP’s MASVS, a globally recognized standard for mobile app security.

We’re working with a small group of developers and authorized lab partners to evolve the program. Developers who have completed this independent validation can showcase this on their Data safety section.

Additional mobile security and safety

Just like our anti-malware protection Google Play, which now scans 125 billion apps a day, we believe spam and phishing detection should be built in. We’re proud to announce that in a recent analyst report, Messages was the highest rated built-in messaging app for anti-phishing and scams protection.

Messages is now also helping to protect you against 1.5 billion spam messages per month, so you can avoid both annoying texts and attempts to access your data. These phishing attempts are increasingly how bad actors are trying to get your information, by getting you to click on a link or download an app, so we are always looking for ways to offer another line of defense.

Last year, we introduced end-to-end encryption in Messages to provide more security for your mobile conversations. Later this year, we’ll launch end-to-end encryption group conversations in beta to ensure your personal messages get even more protection.

As with a lot of features we build, we try to do it in an open and transparent way. In Android 11 we announced a new platform feature that was backed by an ISO standard to enable the use of digital IDs on a smartphone in a privacy-preserving way. When you hand over your plastic license (or other credential) to someone for verification it’s all or nothing which means they have access to your full name, date of birth, address, and other personally identifiable information (PII). The mobile version of this allows for much more fine-grained control where the end user and/or app can select exactly what to share with the verifier. In addition, the verifier must declare whether they intend to retain the data returned. In addition, you can present certain details of your credentials, such as age, without revealing your identity.

Over the last two Android releases we have been improving this API and making it easier for third-party organizations to leverage it for various digital identity use cases, such as driver’s licenses, student IDs, or corporate badges. We’re now announcing that Google Wallet uses Android Identity Credential to support digital IDs and driver’s licenses. We’re working with states in the US and governments around the world to bring digital IDs to Wallet later this year. You can learn more about all of the new enhancements in Google Wallet here.

Protected by Android

We don’t think your security and privacy should be hard to understand and control. Later this year, we’ll begin rolling out a new destination in settings on Android 13 devices that puts all your device security and data privacy front and center.

The new Security & Privacy settings page will give you a simple, color-coded way to understand your safety status and will offer clear and actionable guidance to improve it. The page will be anchored by new action cards that notify you of critical steps you should take to address any safety risks. In addition to notifications to warn you about issues, we’ll also provide timely recommendations on how to enhance your privacy.

We know that to feel safe and in control of your data, you need to have a secure foundation you can count on. Because if your device isn’t secure, it’s not private either. We’re working hard to make sure you’re always protected by Android. Learn more about these protections on our website.


Left: legitimate Google SMS verification. Right: spoofed message asking victim to share verification code.


In other cases, attackers have leveraged more sophisticated dynamic phishing pages to conduct relay attacks. In these attacks, a user thinks they're logging into the intended site, just as in a standard phishing attack. But instead of deploying a simple static phishing page that saves the victim's email and password when the victim tries to login, the phisher has deployed a web service that logs into the actual website at the same time the user is falling for the phishing page.

The simplest approach is an almost off-the-shelf "reverse proxy" which acts as a "person in the middle", forwarding the victim's inputs to the legitimate page and sending the response from the legitimate page back to the victim's browser.



These attacks are especially challenging to prevent because additional authentication challenges shown to the attacker—like a prompt for an SMS code—are also relayed to the victim, and the victim's response is in turn relayed back to the real website. In this way, the attacker can count on their victim to solve any authentication challenge presented.

Traditional multi-factor authentication with PIN codes can only do so much against these attacks, and authentication with smartphone approvals via a prompt — while more secure against SIM-swap attacks — is still vulnerable to this sort of real-time interception.

The Solution Space

Over the past year, we've started to automatically enable device-based two-factor authentication for our users. This authentication not only helps protect against traditional password compromise but, with technology improvements, we can also use it to help defend against these more sophisticated forms of phishing.

Taking a broad view, most efforts to protect and defend against phishing fall into the following categories:
  • Browser UI improvements to help users identify authentic websites.
  • Password managers that can validate the identity of the web page before logging in.
  • Phishing detection, both in email—the most common delivery channel—and in the browser itself, to warn users about suspicious web pages.
  • Preventing the person-in-the-middle attacks mentioned above by preventing automated login attempts.
  • Phishing-resistant authentication using FIDO with security keys or a Bluetooth connection to your phone.
  • Hardening the Google Prompt challenge to help users identify suspicious sign-in attempts, or to ask them to take additional steps that can defeat phishing (like navigating to a new web address, or to join the same wireless network as the computer they're logging into).

Expanding phishing-resistant authentication to more users


Over the last decade we’ve been working hard with a number of industry partners on expanding phishing-resistant authentication mechanisms, as part of FIDO Alliance. Through these efforts we introduced physical FIDO security keys, such as the Titan Security Key, which prevent phishing by verifying the identity of the website you're logging into. (This verification protects against the "person-in-the-middle" phishing described above.) Recently, we announced a major milestone with the FIDO Alliance, Apple and Microsoft by expanding our support for the FIDO Sign-in standards, helping to launch us into a truly passwordless, phishing-resistant future.

Even though security keys work great, we don't expect everyone to add one to their keyring.



Instead, to make this level of security more accessible, we're building it into mobile phones. Unlike physical FIDO security keys that need to be connected to your device via USB, we use Bluetooth to ensure your phone is close to the device you're logging into. Like physical security keys, this helps prevent a distant attacker from tricking you into approving a sign-in on their browser, giving us an added layer of security against the kind of "person in the middle" attacks that can still work against SMS or Google Prompt.

(But don't worry: this doesn't allow computers within Bluetooth range to login as you—it only grants that approval to the computer you're logging into. And we only use this to verify that your phone is near the device you're logging into, so you only need to have Bluetooth on during login.)

Over the next couple of months we’ll be rolling out this technology in more places, which you might notice as a request for you to enable Bluetooth while logging in, so we can perform this additional security check. If you've signed into your Google account on your Android phone, we can enroll your phone automatically—just like with Google Prompt—allowing us to give this added layer of security to many of our users without the need for any additional setup.

But unfortunately this secure login doesn't work everywhere—for example, when logging into a computer that doesn't support Bluetooth, or a browser that doesn't support security keys. That's why, if we are to offer phishing-resistant security to everyone, we have to offer backups when security keys aren't available—and those backups must also be secure enough to prevent attackers from taking advantage of them.


Hardening existing challenges against phishin
g

Over the past few months, we've started experimenting with making our traditional Google Prompt challenges more phishing resistant.

We already use different challenge experiences depending on the situation—for example, sometimes we ask the user to match a PIN code with what they're seeing on the screen in addition to clicking "allow" or "deny". This can help prevent static phishing pages from tricking you into approving a challenge.

We've also begun experimenting with more involved challenges for higher-risk situations, including more prominent warnings when we see you logging in from a computer that we think might belong to a phisher, or asking you to join your phone to the same Wi-Fi network as the computer you're logging into so we can be sure the two are near each other. Similar to our use of Bluetooth for Security Keys, this prevents an attacker from tricking you into logging into a "person-in-the-middle" phishing page.


Bringing it all together

Of course, while all of these options dramatically increase account security, we also know that they can be a challenge for some of our users, which is why we're rolling them out gradually, as part of a risk-based approach that also focuses on usability. If we think an account is at a higher risk, or if we see abnormal behavior, we're more likely to use these additional security measures.

Over time, as FIDO2 authentication becomes more widely available, we expect to be able to make it the default for many of our users, and to rely on stronger versions of our existing challenges like those described above to provide secure fallbacks.

All these new tools in our toolbox—detecting browser automation to prevent "person in the middle" attacks, warning users in Chrome and Gmail, making the Google Prompt more secure, and automatically enabling Android phones as easy-to-use Security Keys—work together to allow us to better protect our users against phishing.

Phishing attacks have long been seen as a persistent threat, but these recent developments give us the ability to really move the needle and help more of our users stay safer online.