Google’s annual Webspam Report covering 2022 highlighted all the ways their SpamBrain anti-spam system became more adept at catching multiple forms of spam. While the report focuses mainly on how much more spam Google caught than the year before, the details about how SpamBrain works seem just as important.
Google’s SpamBrain Anti-Spam System Becomes More Effective in Catching Multiple Forms of Spam in 2022
Spam is a significant problem on the internet. It can take many forms, from unwanted emails and texts to malicious links and fake news. Spam is not only a nuisance, but it can also be a significant threat to cybersecurity. It can lead to phishing attacks, malware infections, and identity theft. Therefore, it is essential to have an effective anti-spam system in place to protect users’ online safety and privacy.
Google’s SpamBrain anti-spam system uses machine learning algorithms to detect and block spam from websites and emails. It works by analyzing the content of a page or email and detecting patterns that indicate spam. For example, SpamBrain can identify suspicious links, keywords, and phrases that are often used in spam messages.
In 2022, Google’s SpamBrain anti-spam system became even more effective in catching multiple forms of spam. The company reported catching 25 billion spam pages and 100 billion spam messages every day, up from 20 billion spam pages and 70 billion spam messages daily the previous year.
The report also highlighted other ways in which Google’s anti-spam system improved in 2022. For example, they improved their ability to detect and block phishing attacks by analyzing the content of emails and identifying suspicious links. They also improved their ability to detect and block fake news by analyzing the content of web pages and identifying misinformation.
Google’s efforts to combat spam are essential in maintaining the integrity of the internet. Spam not only harms individual users but also undermines the credibility of online information. Therefore, it is crucial to have effective anti-spam systems in place to protect online safety and privacy.
Overall, Google’s annual Webspam Report covering 2022 highlighted the company’s ongoing efforts to combat spam on the internet. Their SpamBrain anti-spam system became more adept at catching multiple forms of spam, including unwanted emails, malicious links, and fake news.
Google SpamBrain Platform
SpamBrain is the name Google gave to its machine learning system. Google describes it as a platform from which to launch algorithms that detect multiple forms of unwanted content.
Machine learning is a form of artificial intelligence that uses data to learn to become increasingly proficient at the task it is designed to complete.
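To make "uses data to learn" concrete, here is a minimal sketch of a word-frequency (naive Bayes) spam classifier. It is purely illustrative and is not how SpamBrain works, since Google has not published those details. The training examples, labels, and function names are all invented for this example.

```python
import math
from collections import Counter

# Hypothetical labeled examples; a real system would learn from far more data.
TRAINING = [
    ("cheap pills buy now", "spam"),
    ("win free money click here", "spam"),
    ("buy cheap links now", "spam"),
    ("how to bake sourdough bread", "ham"),
    ("local weather forecast for today", "ham"),
    ("guide to planting tomatoes", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def spam_probability(text, model):
    """Estimate P(spam | text) with naive Bayes and add-one smoothing.

    Assumes the training data contained at least one example of each label.
    """
    word_counts, label_counts = model
    vocab_size = len(set(word_counts["spam"]) | set(word_counts["ham"]))
    total_examples = sum(label_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        words_in_label = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_examples)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (words_in_label + vocab_size))
        log_scores[label] = score
    # Convert log scores to a probability, subtracting the max for stability.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])


model = train(TRAINING)
print(spam_probability("buy cheap pills", model))
print(spam_probability("planting tomatoes today", model))
```

The classifier contains no hand-written rules about which words are spammy. Everything it knows comes from the labeled examples, so feeding it more and better examples is what makes it more proficient.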
Not much is known about SpamBrain other than that it is a machine learning platform and that it is “central” to Google’s initiatives to keep spam from ranking.
Google’s Webspam report notes this about SpamBrain:
“We also improved SpamBrain as a robust and versatile platform, launching multiple solutions to improve our coverage of different abuse types.”
Improvements to SpamBrain
The Webspam report noted that improvements to the system resulted in catching five times more spam sites than the year before.
Additional training resulted in a tenfold increase in SpamBrain’s ability to identify hacked websites.
Link Spam Detection
The report noted that special link spam training resulted in catching fifty times more sites creating link spam compared with the previous link spam update, citing SpamBrain’s ability to learn as key to its success.
“Thanks to SpamBrain’s learning capability, we detected 50 times more link spam sites compared to the previous link spam update.”
Indexing Gatekeeper
An interesting fact about SpamBrain is how it identifies spam at the time of crawling.
If a crawled page is detected to be spam, it is immediately blocked, which keeps it out of Google’s search index and avoids wasting resources on unwanted webpages.
Google announced the ability to block spam at crawl time in 2021, noting that indexing is blocked not only when spam is crawled but also when it tries to sneak in through Search Console and sitemaps.
They wrote in 2021:
“…we have systems that can detect spam when we crawl pages or other content. Crawling is when our automatic systems visit content and consider it for inclusion in the index we use to provide search results. Some content detected as spam isn’t added to the index.
These systems also work for content we discover through sitemaps and Search Console.
For example, Search Console has a Request Indexing feature so creators can let us know about new pages that should be added quickly. We observed spammers hacking into vulnerable sites, pretending to be the owners of these sites, verifying themselves in the Search Console and using the tool to ask Google to crawl and index the many spammy pages they created.
Using AI, we were able to pinpoint suspicious verifications and prevented spam URLs from getting into our index this way.”
So it’s fair to say that one of the many functions of SpamBrain is to act like a gatekeeper, blocking spam before it has a chance to make it into Google’s index.
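The gatekeeper idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not Google’s implementation: the `Index` class, the `looks_like_spam` placeholder, the marker phrases, and the `discovered_via` labels are all hypothetical. The point it shows is that the same spam check runs no matter how a URL was discovered, and a page flagged as spam never reaches the index.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """Toy search index: accepted pages plus a log of rejected URLs."""
    pages: dict = field(default_factory=dict)
    rejected: list = field(default_factory=list)


def looks_like_spam(content: str) -> bool:
    """Placeholder check standing in for a learned spam model."""
    markers = ("buy cheap", "free money", "casino bonus")  # invented phrases
    text = content.lower()
    return any(marker in text for marker in markers)


def consider_for_index(url, content, index, discovered_via="crawl",
                       is_spam=looks_like_spam):
    """Add a page to the index only if it passes the spam check.

    discovered_via is informational ("crawl", "sitemap", "request_indexing");
    it never bypasses the check.
    """
    if is_spam(content):
        index.rejected.append((url, discovered_via))
        return False
    index.pages[url] = content
    return True


index = Index()
consider_for_index("https://example.com/recipes",
                   "A guide to sourdough bread", index)
consider_for_index("https://example.com/pills",
                   "Buy cheap pills today", index,
                   discovered_via="request_indexing")
print(sorted(index.pages))
print(index.rejected)
```

Passing the classifier in as the `is_spam` argument keeps the gate separate from the model behind it, so the check could be swapped for a better one without changing the indexing logic.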
Scam Protection Is Now Multilingual
New for SpamBrain this year: the scam identification system is now multilingual, which reduced clicks on scam sites by 50% compared with the year before.
What About Spammy Content?
This year’s report focused on catching link spam, identifying hacked sites and improvements in detecting spam at crawl time.
What it didn’t mention was anything to do with identifying spammy content.
Is this because the content side is handled by the Helpful Content Algorithm and not SpamBrain?
Read Google’s Webspam Report:
How we fought spam on Google Search in 2022