How Twitter Algos Determine Who Is Market-Moving And Who Isn't

Tyler Durden's picture

Now that even Bridgewater has joined the Twitter craze and is using user-generated content for real-time economic modelling, and who knows what else, the scramble to determine who has the most market-moving, and actionable, Twitter stream is on. With HFT algos camped out at all the usual newswire sources (Bloomberg, Reuters, Dow Jones, etc.), the demand for a "content edge" in market-moving information has never been higher. However, that opens up a far trickier question: whose information on the fastest growing social network, one which many say may surpass Bloomberg in terms of news propagation and functionality, is credible, and by implication, whose is not? Indeed, that is the $64K question. Luckily, there is an algo for that.

In a note by Castillo et al. from Yahoo Research in Spain and Chile, the authors focus on automatic methods for assessing the credibility of a given set of tweets. Specifically, they analyze microblog postings related to “trending” topics, and classify them as credible or not credible, based on features extracted from them. Their results show that there are measurable differences in the way messages propagate that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.
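For anyone who wants those two metrics pinned down: precision is the share of items flagged as credible that really are credible, and recall is the share of truly credible items that get flagged. Below is a minimal sketch of the arithmetic. The function name and the eight labels are made up for illustration and are not data from the study.

```python
def precision_recall(y_true, y_pred, positive="credible"):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical human labels vs. classifier guesses for eight topics.
truth = ["credible", "credible", "credible", "credible", "not", "not", "not", "credible"]
guess = ["credible", "credible", "credible", "not", "credible", "not", "not", "credible"]

precision, recall = precision_recall(truth, guess)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy set, four of the five flagged topics are truly credible and four of the five truly credible topics are flagged, so both numbers land at 0.80, the top of the range the authors report.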

Needless to say, the topic of social media credibility is a critical one, in part due to the voluntary anonymity of the majority of sources, the frequent errors of named sources, and the painfully subjective attributes involved in determining good and bad information. It is also one where discerning the credible sources has become a very lucrative business. Further from the authors:

In a recent user study, it was found that providing information to users about the estimated credibility of online content was very useful and valuable to them. In absence of this external information, perceptions of credibility online are strongly influenced by style-related attributes, including visual design, which are not directly related to the content itself. Users also may change their perception of credibility of a blog posting depending on the (supposed) gender of the author. In this light the results of the experiment described are not surprising. In the experiment, the headline of a news item was presented to users in different ways, i.e. as posted in a traditional media website, as a blog, and as a post on Twitter. Users found the same news headline significantly less credible when presented on Twitter.


This distrust may not be completely ungrounded. Major search engines are starting to prominently display search results from the “real-time web” (blog and microblog postings), particularly for trending topics. This has attracted spammers that use Twitter to attract visitors to (typically) web pages offering products or services. It has also increased the potential impact of orchestrated attacks that spread lies and misinformation. Twitter is currently being used as a tool for political propaganda. Misinformation can also be spread unwillingly. For instance, in November 2010 the Twitter account of the presidential adviser for disaster management of Indonesia was hacked. The hacker then used the account to post a false tsunami warning. In January 2011 rumors of a shooting at Oxford Circus in London spread rapidly through Twitter. A large collection of screenshots of those tweets can be found online.


Recently, the Truthy service from researchers at Indiana University, has started to collect, analyze and visualize the spread of tweets belonging to “trending topics”. Features collected from the tweets are used to compute a truthiness score for a set of tweets. Those sets with low truthiness score are more likely to be part of a campaign to deceive users. Instead, in our work we do not focus specifically on detecting willful deception, but look for factors that can be used to automatically approximate users’ perceptions of credibility.

The study's conclusion: "we have shown that for messages about time-sensitive topics, we can separate automatically newsworthy topics from other types of conversations. Among several other features, newsworthy topics tend to include URLs and to have deep propagation trees. We also show that we can assess automatically the level of social media credibility of newsworthy topics. Among several other features, credible news are propagated through authors that have previously written a large number of messages, originate at a single or a few users in the network, and have many re-posts."
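The "deep propagation trees" in that conclusion can be measured directly. The sketch below is not the authors' code. It assumes re-posts are available as (parent, child) pairs, and it takes "depth" to mean the number of re-post hops from the original poster and "level size" to mean the number of accounts at a given hop. The function name and the sample edges are hypothetical.

```python
from collections import defaultdict


def tree_shape(edges, root):
    """Return (depth, max_level_size) of a re-post tree.

    edges: iterable of (parent, child) pairs, meaning `child` re-posted `parent`.
    root:  the account that posted the original message.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    depth = 0
    max_level_size = 1          # the root level holds exactly one account
    level = [root]
    seen = {root}               # guards against duplicate or cyclic edges
    while level:
        next_level = []
        for node in level:
            for child in children[node]:
                if child not in seen:
                    seen.add(child)
                    next_level.append(child)
        if next_level:
            depth += 1
            max_level_size = max(max_level_size, len(next_level))
        level = next_level
    return depth, max_level_size


# Hypothetical cascade: a is re-posted by b and c, b by d, e and f, d by g.
sample_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"), ("b", "f"), ("d", "g")]
print(tree_shape(sample_edges, "a"))
```

For the sample cascade the levels are [a], [b, c], [d, e, f], [g], which gives a depth of 3 and a widest level of 3.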

All of the above is largely known. What isn't, however, is the mostly generic matrix used by various electronic and algorithmic sources to determine who is real and who isn't, and thus who is market moving and who, well, isn't. Once again, courtesy of Castillo, one can determine how the filtering algo operates (and thus reverse engineer it). So without further ado, here is the set of features used by Twitter truth-seekers everywhere.

Those are the variables. And as for the decision tree that leads an algo to conclude if a source's data can be trusted and thus acted upon, here it is in its entirety. First, verbally:

As the decision tree shows, the top features for this task were the following:

  • Topic-based features: the fraction of tweets having a URL is the root of the tree. Sentiment-based features like fraction of negative sentiment or fraction of tweets with an exclamation mark correspond to the following relevant features, very close to the root. In particular we can observe two very simple classification rules: tweets which do not include URLs tend to be related to non-credible news. On the other hand, tweets which include negative sentiment terms are related to credible news. Something similar occurs when people use positive sentiment terms: a low fraction of tweets with positive sentiment terms tends to be related to non-credible news.
  • User-based features: this collection of features is very relevant for this task. Notice that low-credibility news is mostly propagated by users who have not written many messages in the past. The number of friends is also a feature that is very close to the root.
  • Propagation-based features: the maximum level size of the RT tree is also a relevant feature for this task. Tweets with many re-tweets are related to credible news.

These results show that textual information is very relevant for this task. Opinions or subjective expressions describe people’s sentiments or perceptions about a given topic or event. Opinions are also important for this task because they make it possible to detect the community's perception of the credibility of an event. On the other hand, user-based features are indicators of the reputation of the users. Messages propagated through credible users (active users with a significant number of connections) are seen as highly credible. Thus, those users tend to propagate credible news, suggesting that the Twitter community works like a social filter.
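To make the topic-level features above concrete, here is a rough sketch of how the fractions might be computed from raw tweet text, followed by a hand-written rule cascade standing in for the learned tree. Everything in it is an assumption for illustration: the five-word negative lexicon, the function names, the 0.25 and 100 cut-offs and the sample tweets are invented, and the study's actual tree and thresholds are not reproduced here.

```python
import re

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z']+")
# Placeholder lexicon; a real system would use a proper sentiment word list.
NEGATIVE_WORDS = {"crash", "dead", "attack", "fear", "loss"}


def topic_features(tweets):
    """Aggregate per-topic fractions from a list of tweet texts."""
    n = len(tweets)
    if n == 0:
        raise ValueError("need at least one tweet")
    with_url = sum(1 for t in tweets if URL_RE.search(t))
    with_exclamation = sum(1 for t in tweets if "!" in t)
    with_negative = sum(
        1 for t in tweets if NEGATIVE_WORDS & set(WORD_RE.findall(t.lower()))
    )
    return {
        "frac_url": with_url / n,
        "frac_exclamation": with_exclamation / n,
        "frac_negative": with_negative / n,
    }


def toy_credibility_rule(features, avg_author_tweet_count):
    """Hand-written stand-in for a learned tree; both thresholds are invented."""
    if features["frac_url"] < 0.25:       # few links: lean "not credible"
        return "not credible"
    if avg_author_tweet_count < 100:      # authors with little posting history
        return "not credible"
    return "credible"


# Hypothetical tweets about one trending topic.
sample_tweets = [
    "Fed statement out http://example.com/fed",
    "Markets crash on hoax report http://example.com/hoax",
    "OMG!!! sell everything",
    "Full text here https://example.com/text",
]
features = topic_features(sample_tweets)
print(features, toy_credibility_rule(features, avg_author_tweet_count=500))
```

The URL check comes first only because the quoted passage names the URL fraction as the root of the tree. The order and values of every later split here are guesses.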

And visually:

Get to the very bottom of the tree without spooking too many algos, and you too can have a Carl Icahn-like impact on the stock of your choosing.

Source: "Information Credibility on Twitter"

_ConanTheLibertarian_'s picture

Only time before someone successfully fools the algos with a fake news bomb to crash and burn the market.

That should be fun.

0b1knob's picture

Already happened.

Twitter Hoax Sparks Swift Stock Swoon

Attributed to the Free Syrian Army.

ParkAveFlasher's picture
  • =ISHUNGRY("canhazcheezburgr?","WTF")

Leave it to the Italians to go full retard with a flourish.  Target $8000 per share, presstissimo.

malikai's picture

Surprising. I'd have thought they'd go with MARS over Decision Trees.

But to be fair, there's been a lot of success lately using DTs on junk data.

boogerbently's picture

Monday Icahn buys HP.

Tuesday Icahn tweets he bought HP.

Thursday Icahn sells HP.

Friday Icahn says he sold HP.

Monday Icahn buys HP........

Atomizer's picture

‘White House Attacked, Obama Injured’ AP Tweet Hoax Crashes US Stock Market

It’s much like the retail industry looking to create hoax stories on hackers exploiting personal CC information. All fucking bullshit! These retail outlets need investment monies to change the consumer's shopping experience. The smartphone has already been deployed.

Shopping in the Future as a Chipped Human

disabledvet's picture

how can a "hoax tweet" not be a hack? it's not like twitter authorized it. I remember turning on CNN and you could see the entire thing was false...but it did move the market. the flow chart should read "warren buffett buys" actually. doesn't hurt being right, either. in other words...setting forth a track record by being honest about what you're doing...but being as honest as you can about the data...and then seeing if the bots respond. I remember before TD was even on SA doing a one day battle with a short seller saying "short Harley Davidson." I didn't own any HD stock...still don't unfortunately...but warned any and all listeners that this was the wrong company to be shorting in the collapse. Stock price went up north of 10 percent the next. So I started getting carried away with one long call after another...eventually settling on an "entirely long thesis" for the market as a whole...stayed consistent...then changed my view for the first time in five years just under a year ago...and have remained a treasury bull ever since. So far it hasn't been like the equity ride but i can't wait to be out of equities in toto even though there are some great companies doing great things right here right now. it's just that when the market goes to the moon and the economy doesn't budge it's more than the political class that is in trouble. can "twitter" pick up on this? i don't see why not. we have no privacy any more...actually, we never had any to begin you don't even need any money in the trade actually to be right...just be right. Since most outliers are in fact where you want to be...trying to discover trends and hammer home your reasoning before the herd rushes always want to make a point of why you think there is a herd to begin with (there always is...the weakness of the statistical distribution method) and then why the herd will come towards you. it worked like a charm in the bull market in equities...which made the move into treasuries "very not easy." 
i'm glad i didn't recommend anyone follow me in it because Taper did a number on the debt markets this year. Having said that clearly "the Fed is out of bullets now." With the Federal Government now dramatically winding down the war effort now and no one in Washington taking the "non-recovery recovery" seriously I think you could get a lot more than another recession here. and no "bad news is not good news" for stocks like it was in the 90's. I can think of a lot of things worth selling...right here, right now (Goldman Sachs would be the top of my list)...and no, i'm not even bearish actually.

Atomizer's picture


Twitter Hoax Sparks Swift Stock Swoon, Attributed to the Free Syrian Army.

  • Have you gone to the site to investigate?
  • You have the wrong URL:
  • What you have listed above traces back to France location with a administer: State/Province: Ar Riyad, SA address.


The wheels are falling off the wagon, hang tight!

The Count's picture

Twitter, Facebook, LinkedIn?

You really want to enslave yourself, then continue using that crap. Got out of all social media several years ago. 

Google is not much better. Now they force you into Google+ if you want to post something on Youtube. Fuck them all!

onelight's picture

Twitter has the potential to be a great source of real-time ideas and research links, but sadly it's also other things too..

HedgeAccordingly's picture

VERY interesting.. this has been going on since 2010. was only more profitable then.. now mass utilization of these now known strategies will diminish the returns of the firms who use these metrics.

ItsDanger's picture

Could bring an epidemic of fat fingers.

0b1knob's picture

Phat Phinger Phacebook Phools.

Zero Debt's picture

The most truly effective fat fingers in the market nowadays are middle fingers, pointing upwards, facing your clients, after stealing their segregated account funds.

debtor of last resort's picture

So we have a non profit casino stock as a tool for algos to manage braindead #investors

ParkAveFlasher's picture

+1.  As a bonus, it will condition the sheep towards dumbing down their Turing testing instincts.

MedicalQuack's picture

Funny yeah here we go again with trying to get social media to find the algo fairies..

How about some quantitated see the formula up there "so it must be good"  <grin>

Just read where Forbes has 6 bids, and I wonder do the human journalists go with it or just the journobot?  Have not heard about that bot...welcome to the beginnings of news content farms..(link above tells about journobot) anyway I think in the long run they will end up keeping sites like mine and zero hedge going as who in the heck will want to comment to a bot writing an article...and let's take this Twitter formula and roll it into the journobot while we are at it:)  Professor Seife at NYU has a new book in the making and he said the other day that he has a chapter that will be dedicated to the journobot.  He's the same guy that wrote "Proofiness: The Dark Arts of Mathematical Deception" so he's right on track for the next book and this last one from two years ago was good.  He's ahead of his time and debunks the nonsense out there.  I use one of his video clips at the link above and what he says is right on target that the press can't help themselves with stats and numbers and boy if it has a formula in the article, just about everyone bites..except here of course at Zero gotta love it.


icanhasbailout's picture

I would be ridiculously rich if I were evil.

Atomizer's picture

Stay honest and make your money the old-fashioned way. We have these historical cycles to purge evil. Don’t get caught sampling their candy. Winks.

Godisanhftbot's picture

you have to be smart and evil.  sorry, but evil is the easy part.

Spungo's picture

It would be a lot easier to buy all US media and use that power to tell the proles what to buy.

BuddyEffed's picture

Not only can you expect HFT algos to graze the twitter big data, but you can probably start expecting them to generate twitter messages too, as each HFT algorithm tries to spoof, front run, and split pennies every way they can.  HFT -- Hack From Twitter

Am hoping those HFT algos don't eventually tie in to traffic control and traffic flow, so they can choose who gets in to work early or who gets stuck in traffic.

Zero Debt's picture

We shall see whether the efficiency of these algorithms will be limited to the ability to place the first comment on every ZH thread about someone making boatloads of money out of their mom's basement...

Atomizer's picture

Algorithms are computer programs that look for clues to give you back exactly what you want.


What we are describing to you, the markets are rigged under a primary search engine and two social media sites. Good or bad news, the fraud has been tested over 1,000+ times, it always repeats itself in a self serving manner. The triangle of scammers know this networking system quite well. One day, they’ll wake up laughing about how much money they plan to steal for the day, then it’s over. Connection goes tits-up.

PT's picture

Do any of these algos actually work out that I didn't get the information I was looking for?  Or do they just keep suggesting web pages similar to what I have already looked at, thus guaranteeing that I will never find what I was looking for?  That would explain a lot.  I am constantly surprised at how specific I can be, and yet still take forever to find what I was looking for.

For example, when I was trying to find this picture (I saw it a long time ago and wanted to find it again):

You'd think I'd just have to type in something like, "Morans they need to get a brain".
There were umpteen dozen totally unrelated webpages suggested before I found the right one and this was not a one-off problem.  It happens to me all the time.

Or, perhaps they're using an old salesman's algo:  When it comes to buying a car, or a house, often times the salesman will show you a dozen cars / houses that you don't want to buy before showing one that meets your specs.  The reason being, if they show you what you want straight away, then you might decide to get fussy and still want to look at a dozen houses / cars anyway.  If they can piss you off for a couple of hours, then when you see something that meets your specs, you'll be so grateful that you'll buy it straight away.

The "morans" example was trivial and yet I'm astonished at how hard it was for me to find it again.  I maintain that search engines do a remarkably good job of hiding information.

ThisIsBob's picture

It's the first image on Google image search.