Just what percentage of users on Twitter are ‘fake’ or ‘bots’? That’s the multi-billion dollar question right now, with Elon Musk hinting at a lower price for the deal, given he’s not convinced about the company’s user base. In fact, he has said that “fake users make up 20 per cent of all Twitter accounts.” Meanwhile, Twitter insists that only 5 per cent of its monetizable Daily Active Users (mDAUs) are fake or bots.
We take a look at everything that has been said about Twitter’s bot problem so far, what recent independent research claims, and why this is a hard problem to solve.
The Tesla CEO has made a number of statements about bots, the latest being the claim that 20 per cent of users are fake. He is now insisting on more evidence from Twitter to back its claim that only 5 per cent of users are spam or fake.
He wrote in a new tweet, “20% fake/spam accounts, while 4 times what Twitter claims, could be *much* higher. My offer was based on Twitter’s SEC filings being accurate. Yesterday, Twitter’s CEO publicly refused to show proof of <5%. This deal cannot move forward until he does.”
Musk had tweeted on May 13 that the deal was on hold until he could confirm the number of users, though two hours later he tweeted that he was committed to the acquisition. More recently, he has posted that his team would do “a random sample of 100 followers of @twitter” to figure out the number of fake followers. He also invited others to carry out the same process.
A separate analysis by SparkToro, an audience research tool company, and Followerwonk, a Twitter analytics tool, has claimed that “19.42% of public accounts that sent a tweet in the past 90 days are fake or spam.” The analysis was carried out between May 13 and May 15. In a blog post, the researchers said they looked at 44,058 public Twitter accounts active in the last 90 days, selected randomly by machine from a set of 130+ million public, active profiles.
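To see why sample size matters when comparing a 100-follower spot check with a 44,058-account study, here is a minimal Python sketch that computes an approximate 95 per cent Wilson score interval for each. The count of 20 fake accounts out of 100 is a hypothetical outcome chosen for illustration, not a reported result, and the interval covers only random sampling error, not mistakes in deciding which accounts are fake.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for a proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical outcome: 20 of 100 sampled followers judged fake.
small = wilson_interval(20, 100)

# Sample of the size SparkToro reported: 19.42% of 44,058 accounts,
# rounded to a whole number of accounts.
large = wilson_interval(round(0.1942 * 44_058), 44_058)

print(f"n=100:    {small[0]:.1%} to {small[1]:.1%}")
print(f"n=44,058: {large[0]:.1%} to {large[1]:.1%}")
```

Under these assumptions, the 100-account sample is consistent with anything from roughly 13 to 29 per cent, while the larger sample narrows the range to less than half a percentage point either side of the estimate.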
The definition of spam here includes accounts “that do not regularly have a human being personally composing the content of their tweets, consuming the activity on their timeline, or engaging in the Twitter ecosystem,” according to their blog.
The researchers also looked at Elon Musk and his 93 million-plus followers and concluded that 70.23 per cent of them are “unlikely to be authentic, active users who see his tweets.” They attributed the rise in fake followers to “Musk’s active use of Twitter, the media coverage of his tweets, and Twitter’s own recommendation systems.”
The signals that SparkToro uses in its Fake Followers analysis include the profile image, the account age in days, the number of followers, days since the last tweet and days inactive, among others. Oversharing is another signal, as spam accounts often post a large volume of tweets every day.
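A minimal Python sketch of how signals of this kind could be combined is given below. The field names, thresholds and the three-signal cut-off are all invented for illustration; this is not SparkToro’s model, which the article does not describe in that detail.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # All fields are hypothetical stand-ins for the public signals named above.
    has_default_profile_image: bool
    account_age_days: int
    followers: int
    days_since_last_tweet: int
    tweets_per_day: float


def spam_signal_count(account: Account) -> int:
    """Count how many illustrative warning signals an account trips."""
    signals = [
        account.has_default_profile_image,   # no custom profile picture
        account.account_age_days < 30,       # very new account (assumed threshold)
        account.followers < 5,               # almost no followers (assumed threshold)
        account.days_since_last_tweet > 90,  # long inactive (assumed threshold)
        account.tweets_per_day > 100,        # oversharing (assumed threshold)
    ]
    return sum(signals)


def looks_suspicious(account: Account, min_signals: int = 3) -> bool:
    """Flag an account only when several signals agree, never on one alone."""
    return spam_signal_count(account) >= min_signals


likely_bot = Account(True, 3, 0, 0, 400.0)
ordinary = Account(False, 2000, 350, 2, 4.0)

print(spam_signal_count(likely_bot), looks_suspicious(likely_bot))
print(spam_signal_count(ordinary), looks_suspicious(ordinary))
```

Requiring several signals to agree is one simple way to keep false positives down, at the cost of missing bots that trip only one or two.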
But keep in mind that Twitter has said these signals alone, such as what looks like a fake profile picture, are not enough to define an account as fake or spam, and that the company itself draws on far more private and public data when making such decisions.
The researchers claim their “methodology likely undercounts spam and fake accounts, but almost never includes false positives.” They also admitted that unlike Twitter they don’t have access to some other private data, which the company is likely using to rate accounts as fake or spam and therefore their methodology could be flawed. However, they are confident that the number of spam or fake followers is being undercounted.
Elon Musk’s comments have been disputed by Twitter CEO Parag Agrawal, who put out a long thread earlier today explaining the signals and factors the company considers when marking an account as ‘spam’ or ‘fake’. Agrawal stressed that fighting spam is “incredibly *dynamic*,” as spammers and their tactics keep evolving and a set of rules developed today might not work later on.
Agrawal said Twitter suspends “over half a million spam accounts every day,” even before users can see them on the platform. He added that Twitter’s estimate of spam accounts, which it puts at under 5 per cent, is based on “multiple human reviews (in replicate) of thousands of accounts, that are sampled at random, consistently over time.”
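The idea of reviewing randomly sampled accounts “in replicate” can be sketched in Python as below. Twitter has not published how it reconciles disagreeing reviewers, so the majority vote here is an assumption, and the account IDs and reviewer labels are invented purely for illustration.

```python
import random
from collections import Counter


def majority_label(labels: list[str]) -> str:
    """Return the label most reviewers chose (an odd reviewer count avoids ties)."""
    return Counter(labels).most_common(1)[0][0]


def estimate_spam_share(reviews: dict[str, list[str]]) -> float:
    """Share of sampled accounts whose majority verdict is 'spam'."""
    verdicts = [majority_label(labels) for labels in reviews.values()]
    return verdicts.count("spam") / len(verdicts)


# Hypothetical population of account IDs; a real sample would run to thousands.
population = [f"account_{i}" for i in range(1000)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 5)  # random sample without replacement

# Invented labels from three independent reviewers per sampled account.
labels_by_reviewer = [
    ["human", "human", "spam"],
    ["spam", "spam", "human"],
    ["human", "human", "human"],
    ["human", "spam", "human"],
    ["human", "human", "human"],
]
reviews = dict(zip(sample, labels_by_reviewer))

share = estimate_spam_share(reviews)
print(f"Estimated spam share in this toy sample: {share:.0%}")
```

Having more than one reviewer label each account means a single mistaken judgement does not change the verdict, which is presumably part of the appeal of replicate review.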
Twitter and its human reviewers look at a number of signals using both private and public data to categorise an account as spam or fake, according to the thread. This private data includes IP address, phone number, geolocation, client/browser signatures and what the account does when it’s active, among other things.
According to Parag Agrawal, this estimation of spam or fake accounts cannot “be performed externally, given the critical need to use both public and private information (which we can’t share).”
Twitter itself does not mark all bots as bad. In fact, it has a list of criteria defining bad bot behaviour, ranging from “malicious use of automation to undermine and disrupt the public conversation, like trying to get something to trend” to “artificial amplification of conversations”, as well as spammy tweets, bulk tweeting, etc.
According to Ankush Sabharwal, Founder and CEO of CoRover, which has built IRCTC’s Disha chatbot, finding spam or bot accounts is easier said than done, though some behaviour is obvious. “To give an example, if I’m a human, my mouse will not directly go to the action button or image. My actions will have some navigation flow. But if it is a bot, the bot knows exactly where to go. And the cursor would not navigate the way humans would navigate,” he explained.
But he too admits that not all bots can be detected. “They might have built a kind of logic to retweet only certain events,” he explained, which can make the challenge much harder.
There are several open-source tools to detect bots on Twitter as well. There’s Botometer, a project of the Observatory on Social Media at Indiana University in the US, which works by checking “the activity of a Twitter account and giving it a score,” according to its description page. “Higher scores mean more bot-like activity,” it adds. The score ranges from zero to five, with zero indicating the most human-like activity.
Again, such methods are not foolproof, as most experts agree that some bots are incredibly good at appearing human. Finally, Twitter’s bot problem is not new. In fact, as we have reported in the past, independent research has shown how Twitter bots were used to manipulate hashtags, etc., during the 2019 elections in India. Bots were also used extensively during the 2016 and 2020 US elections to manipulate trends, etc., though Twitter claims to have cracked down on them significantly since then. But clearly, Elon Musk is not convinced by its claims, and the controversy is unlikely to go away anytime soon.