A collision occurs when two distinct pieces of data—a document, a binary, or a website’s certificate—hash to the same digest as shown above. In practice, collisions should never occur for secure hash functions. However if the hash algorithm has some flaws, as SHA-1 does, a well-funded attacker can craft a collision. The attacker could then use this collision to deceive systems that rely on hashes into accepting a malicious file in place of its benign counterpart. For example, two insurance contracts with drastically different terms.

Finding the SHA-1 collision

In 2013, Marc Stevens published a paper that outlined a theoretical approach to create a SHA-1 collision. We started by creating a PDF prefix specifically crafted to allow us to generate two documents with arbitrary distinct visual contents, but that would hash to the same SHA-1 digest. In building this theoretical attack in practice we had to overcome some new challenges. We then leveraged Google’s technical expertise and cloud infrastructure to compute the collision which is one of the largest computations ever completed.

Here are some numbers that give a sense of how large scale this computation was:

  • Nine quintillion (9,223,372,036,854,775,808) SHA1 computations in total
  • 6,500 years of CPU computation to complete the attack first phase
  • 110 years of GPU computation to complete the second phase

While those numbers seem very large, the SHA-1 shattered attack is still more than 100,000 times faster than a brute force attack which remains impractical.

Mitigating the risk of SHA-1 collision attacks

Moving forward, it’s more urgent than ever for security practitioners to migrate to safer cryptographic hashes such as SHA-256 and SHA-3. Following Google’s vulnerability disclosure policy, we will wait 90 days before releasing code that allows anyone to create a pair of PDFs that hash to the same SHA-1 sum given two distinct images with some pre-conditions. In order to prevent this attack from active use, we’ve added protections for Gmail and GSuite users that detects our PDF collision technique. Furthermore, we are providing a free detection system to the public.

You can find more details about the SHA-1 attack and detailed research outlining our techniques here.

About the team

This result is the product of a long-term collaboration between the CWI institute and Google’s Research security, privacy and anti-abuse group.

Marc Stevens and Elie Bursztein started collaborating on making Marc’s cryptanalytic attacks against SHA-1 practical using Google infrastructure. Ange Albertini developed the PDF attack, Pierre Karpman worked on the cryptanalysis and the GPU implementation, Yarik Markov took care of the distributed GPU code, Alex Petit Bianco implemented the collision detector to protect Google users and Clement Baisse oversaw the reliability of the computations.


ann@example.com/dir/file.
Any user with appropriate permission can access the contents of this file by using Upspin services to evaluate the full path name, typically via a FUSE filesystem so that unmodified applications just work. Upspin names usually identify regular static files and directories, but may point to dynamic content generated by devices such as sensors or services.

If the user wishes to share a directory (the unit at which sharing privileges are granted), she adds a file called Access to that directory. In that file she describes the rights she wishes to grant and the users she wishes to grant them to. For instance,
read: joe@here.com, mae@there.com
allows Joe and Mae to read any of the files in the directory holding the Access file, and also in its subdirectories. As well as limiting who can fetch bytes from the server, this access is enforced end-to-end cryptographically, so cleartext only resides on Upspin clients, and use of cloud storage does not extend the trust boundary.

Upspin looks a bit like a global file system, but its real contribution is a set of interfaces, protocols, and components from which an information management system can be built, with properties such as security and access control suited to a modern, networked world. Upspin is not an "app" or a web service, but rather a suite of software components, intended to run in the network and on devices connected to it, that together provide a secure, modern information storage and sharing network. Upspin is a layer of infrastructure that other software and services can build on to facilitate secure access and sharing. This is an open source contribution, not a Google product. We have not yet integrated with the Key Transparency server, though we expect to eventually, and for now use a similar technique of securely publishing all key updates. File storage is inherently an archival medium without forward secrecy; loss of the user's encryption keys implies loss of content, though we do provide for key rotation.

It’s early days, but we’re encouraged by the progress and look forward to feedback and contributions. To learn more, see the GitHub repository at upspin.


Different threats to different types of organizations

Attackers appear to choose targets based on multiple dimensions, such as the size and the type of the organization, its country of operation, and the organization’s sector of activity. Let’s look at an example of corporate users across businesses, nonprofits, government-related industries, and education services. If we consider business inboxes as a baseline, we find attackers are far more likely to target nonprofits with malware, while attackers are more likely to target businesses with phishing and spam.

histogram.png

These nuances go all the way down to the granularity of country and industry type. This shows how security and abuse professionals must tailor defenses based on their personalized threat model, where no single corporate user faces the same attacks.


Constant improvements to corporate Gmail protections

Research like this enables us to better protect our users. We are constantly innovating to better protect our users, and we've already implemented these findings into our G Suite protections. Additionally, we have implemented and rolled out several features that help our users stay safe against these ever-evolving threats.
  • The forefront of our defenses is a state-of-the-art email classifier that detects abusive messages with 99.9% accuracy.
  • To protect yourself from unsafe websites, make sure to heed interstitial warnings that alert you of potential phishing and malware attacks.
  • Use many layers of defense: we recommend using a security key enforcement (2-step verification) to thwart attackers from accessing your account in the event of a stolen password.
  • To ensure your email contents’ stays safe and secure in transit, use our hosted S/MIME feature.
  • Use our TLS encryption indicator, to ensure only the intended recipient can read your email.
We will never stop working to keep our users and their inboxes secure. To learn more about how we protect Gmail, check out this YouTube video that summarizes the lessons we learned while protecting Gmail users through the years.