Trading the news: A sentiment analysis strategy using lemon.markets
Following the news closely enough to trade on it takes a lot of time. What if you could automate the process? In this project we’ll build a sentiment analysis strategy that autonomously trades based on news headlines. We show you how to scrape headlines from a financial website, determine the sentiment of those headlines and make trade decisions based on your findings. With the Python requests library, Beautiful Soup, VADER and lemon.markets in your toolbox, you’ll have everything you need to bring this project to fruition. We’re super excited to show you just how lemon.markets, the brokerage API powering automated trading, is the perfect tool for a project like this.
If you want to get started developing straight away, you can check out our GitHub repository for this project here. Otherwise, keep reading to learn more about the strategy.
Why would I trade the news? 📰
You’ve probably heard the maxim ‘buy the rumour, sell the news’, referring to the phenomenon that traders speculate on upcoming news (and thus on price movements) and exit their positions once the news has been published. If that really is the case, wouldn’t trading the news be too late? It turns out that day-to-day price movements to some extent reflect emotional reactions to the news, see this article. Combine this with an automated script, and you’re capitalising on the reactions of others to the news, without having to closely follow it yourself.
Are you convinced? Let’s see how we can make the news work for us. We’ll begin by collecting news headlines.
Collecting your data 📊
As this process will be different for each data source, we won’t go into too much detail here. If you need some more help with web scraping, check out this article.
For this project, our goal is to place trades automatically based on the news. The first step is to decide how we want to gather our data, and in particular from which source. We went for MarketWatch because the data is presented in an easily digestible format — for each headline, we are given its date and the ticker(s) it relates to, see the example below. Using the headlines, we can play a game of sentiment red light, green light and hopefully cash in a few won (🦑 🎲). But more on that in a second.
Screenshot of MarketWatch.com technology headlines collected on 19 October 2021
To collect these headlines, we use a simple GET request against the desired URL. Using the requests package, this looks as follows:
import requests

page = requests.get("https://marketwatch.com/investing/technology")
And to parse this data, we use BeautifulSoup, which is a Python package that can extract data from HTML documents.
import pandas as pd
from bs4 import BeautifulSoup

soup = BeautifulSoup(page.content, "html.parser")
article_contents = soup.find_all("div", class_="article__content")

headlines = []
for article in article_contents:
    link = article.find("a", class_="link")
    ticker = article.find("span", class_="ticker__symbol")
    if link is None:
        continue
    # strip the whitespace around the headline and keep the ticker text, if there is one
    headlines.append([link.text.strip(), ticker.text if ticker else None])

columns = ["headline", "US_ticker"]
headlines_df = pd.DataFrame(headlines, columns=columns)
Keep in mind, this code won’t work for just any website. You’ll notice that we access the article contents through a <div> tag with the ‘article__content’ class. You’ll need to adjust this on a website-by-website basis, and that requires some inspection of the page you are on. In Chrome, you can do this by right-clicking anywhere on the page and selecting ‘Inspect’ (if you use another browser, use these steps instead). You’ll be met with a jumble of HTML. The easiest way to figure out where the headlines are ‘hiding’ is to Ctrl-F (or Command-F on macOS) a particular headline. You can also click the ‘Select an element in the page to inspect it’ button on the top-left of the Developer console to pinpoint where to find your desired data.
Clicking on a particular element will reveal where it sits in the HTML code. For example, when we click on a ticker we’re informed that it can be found in the <span> tag with the ‘ticker__symbol’ class.
Once you’ve found the tags corresponding to the right element(s), you can paste their names into the code snippet above to retrieve their contents. We suggest frequently printing your output to determine whether you are collecting the desired information and whether it needs to be pre-processed. For example, when extracting the headline we call .strip() to remove the surrounding whitespace and clean up our data. When you’re happy with your output, you can collect all relevant information in a Pandas DataFrame. If you’re interested in other Python resources that might be useful for automated trading, check out our article here.
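A quick way to check that your selectors point at the right elements is to print the first match before looping over the whole page (this assumes the article_contents list from the snippet above):

first_article = article_contents[0]
print(first_article.find("a", class_="link").text)           # raw headline, including whitespace
print(first_article.find("a", class_="link").text.strip())   # cleaned headline
print(first_article.find("span", class_="ticker__symbol"))   # the full <span> tag, or None if absent

If any of these print None (or the wrong element), the class names need adjusting before you build the DataFrame.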
Pre-processing your data 👩🏭
At this stage, it’s likely your data needs some (additional) pre-processing before it’s ready for sentiment analysis and trading. Perhaps you’re also collecting the headline’s timestamp and need to convert it to a different time format, or you might want to remove any headlines that don’t mention a ticker.
Luckily, in our case, we don’t have to do a lot of pre-processing. In our GitHub repository, you’ll notice that we removed any headlines without tickers and headlines with tickers that we know are not tradable on lemon.markets (to make the dataset smaller). To do this, we created a list of non-tradable tickers and constructed a new DataFrame of the collected headlines, filtered by the negation of the above list. Additionally, to trade on lemon.markets, we need to obtain the instrument’s ISIN. Because we trade on a German exchange, querying for a US ticker will not (always) result in the correct instrument. Therefore, to ensure that there are no compatibility issues, we suggest mapping a ticker to its ISIN before trading. We’ve published an article that’ll help you do just that.
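As a rough illustration, the filtering and mapping could look something like this (note that non_tradable is a placeholder list and get_isin is a hypothetical helper standing in for the ticker-to-ISIN lookup described in the article linked above):

# placeholder: tickers we know are not tradable on lemon.markets
non_tradable = ["ABC", "XYZ"]

# drop headlines without a ticker and headlines with non-tradable tickers
headlines_df = headlines_df.dropna(subset=["US_ticker"])
headlines_df = headlines_df[~headlines_df["US_ticker"].isin(non_tradable)]

# map each remaining ticker to its ISIN (get_isin is a stand-in for your own lookup,
# e.g. via the lemon.markets instruments endpoint)
headlines_df["isin"] = headlines_df["US_ticker"].apply(get_isin)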
Performing the sentiment analysis 😃😢
Once we’ve collected our headlines and tickers (or ISINs), we need to decide whether each headline reports positive or negative news. This is where our sentiment analysis tool, VADER, comes in. It’s a lexicon-based model that scores text on polarity (positive/negative) and intensity of emotion. The compound score indicates whether a text is positive (>0), neutral (0), or negative (<0). In the above headlines, it can determine that ‘“Squid Game” is worth nearly $900 million to Netflix’ has a somewhat positive sentiment, as the word ‘worth’ is likely part of the positive sentiment lexicon. 🚦 If you’d like to read more about how VADER works, check out this article. There are also alternatives out there, like TextBlob or Flair. You might want to try out all three to determine which one works best on your dataset.
For our use-case (determining sentiment scores of online newspaper headlines), the implementation is really simple:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# note: the VADER lexicon needs to be downloaded once, e.g. via nltk.download("vader_lexicon")
vader = SentimentIntensityAnalyzer()

scores = []
for headline in headlines_df.loc[:, "headline"]:
    score = vader.polarity_scores(headline).get("compound")
    scores.append(score)

headlines_df.loc[:, "score"] = scores
If we have more than one headline (and therefore more than one score) for a particular instrument, we have to aggregate them into a single score:
# average the compound scores per instrument, so we end up with one row per ISIN
headlines_df = headlines_df.groupby("isin", as_index=False)["score"].mean()
We’ve chosen to combine scores by taking a simple average, but there are several other measures you might opt for. For example, a time-weighted average penalises older headlines, as they are probably less representative of current (or future) market movements.
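As a rough sketch of that alternative (it would replace the simple average above, assumes you also collected a timestamp column, and the 24-hour half-life is an arbitrary choice), the weighting could look like this:

import numpy as np

# weight each headline by its age, halving the weight every 24 hours
age_hours = (pd.Timestamp.now() - headlines_df["timestamp"]).dt.total_seconds() / 3600
headlines_df["weight"] = 0.5 ** (age_hours / 24)

# weighted average of the compound scores per instrument
headlines_df = (
    headlines_df.groupby("isin")
    .apply(lambda g: np.average(g["score"], weights=g["weight"]))
    .reset_index(name="score")
)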
Placing your trades 📈
Once you’ve obtained the compound scores for the tickers, it’s time to place trades. However, you first need to decide on a trading strategy: what kind of score justifies a buy order? What about a sell order? And how much are you trading? There are several components to keep in mind here, such as your total balance, your current portfolio and the ‘trust’ you have in your strategy. However, these parameters will vary depending on your goals for this strategy.
Our base project works with a very simple trade rule: buy any instrument with a score above 0.5 and sell any instrument with a score below -0.5 (see if you can come up with something a bit more complex 😉):
buy = []
sell = []
for index, row in headlines_df.iterrows():
    if row['score'] > 0.5 and row['isin'] != 'No ISIN found':
        buy.append(row['isin'])
    if row['score'] < -0.5 and row['isin'] != 'No ISIN found':
        sell.append(row['isin'])
If the instrument is tradable (i.e. an ISIN was found on lemon.markets) and the sentiment score crosses our buy/sell threshold, we add the ISIN to the list of instruments we wish to trade.
We can then feed this list of ISINs to the lemon.markets API (if you’re not signed up yet, do that here) to place and activate our trades:
API_KEY = "YOUR-API-KEY"  # replace with your own lemon.markets API key

orders = []
# place buy and sell orders
for isins, side in [(buy, "buy"), (sell, "sell")]:
    for isin in isins:
        order = requests.post(
            "https://paper-trading.lemon.markets/v1/orders/",
            data={"isin": isin,
                  "expires_at": "p0d",
                  "side": side,
                  "quantity": 1,
                  "venue": "XMUN"},
            headers={"Authorization": f"Bearer {API_KEY}"}).json()
        orders.append(order)

# activate orders
for order in orders:
    order_id = order["results"].get("id")

    requests.post(
        f"https://paper-trading.lemon.markets/v1/orders/{order_id}/activate/",
        headers={"Authorization": f"Bearer {API_KEY}"})
    print(f"Activated {order['results'].get('isin')}")
You’ll need to fill in your own API key in this code snippet to make it run. Please note that you also need to make sure you’re not selling any financial instruments you don’t own; we’ve omitted that check in this code snippet, but you can find the implementation in our GitHub repository!
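One way such a check could look (this is only a sketch; the positions endpoint and the shape of its response are assumptions, so double-check them against the API documentation) is to fetch your current positions and keep only the sell candidates you actually hold:

# assumption: GET /v1/positions/ returns a 'results' list of positions, each with an 'isin' field
positions = requests.get(
    "https://paper-trading.lemon.markets/v1/positions/",
    headers={"Authorization": f"Bearer {API_KEY}"}).json()
owned_isins = {position["isin"] for position in positions.get("results", [])}

# only sell instruments that are actually in the portfolio
sell = [isin for isin in sell if isin in owned_isins]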
For demonstration purposes, our trades are all of size 1, but depending on your capital, you might want to increase this parameter (or even make it dynamic depending on the sentiment score). Besides this, there are lots of other ways you can make this project even more extensive! We are excited to see your ideas 😏
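As a toy example of dynamic sizing (the scaling rule below is entirely arbitrary), you could derive the quantity from the compound score before placing the order:

def quantity_for(score, max_quantity=5):
    # scale linearly with the absolute compound score,
    # but always trade at least 1 and at most max_quantity shares
    return max(1, min(max_quantity, round(abs(score) * max_quantity)))

print(quantity_for(0.6))   # 3
print(quantity_for(0.95))  # 5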
Further extensions 🤓
This project is only a start to your very own sentiment trading strategy. There are several extensions that can be made, for example, you can make your trade decisions more robust by collecting news from several sources. Or you can conduct more extensive sentiment analysis by, for example, applying VADER on the whole article rather than just the headline (we all know clickbait is a real thing 🎣). Perhaps you want to use a different sentiment analysis tool, like TextBlob. Or maybe you even want to create your own sentiment score library based on investment-specific jargon.
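If you’d like to try TextBlob, for instance, the swap is small: its polarity score also ranges from -1 to 1, so it could replace VADER’s compound score roughly like this:

from textblob import TextBlob

scores = []
for headline in headlines_df.loc[:, "headline"]:
    # TextBlob's polarity ranges from -1 (negative) to 1 (positive)
    scores.append(TextBlob(headline).sentiment.polarity)

headlines_df.loc[:, "score"] = scores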
We suggest you begin by collecting data from a news source you trust and tweaking the trading decision rule. Let your imagination go wild!
You’re now set to use BeautifulSoup, VADER and lemon.markets in your sentiment analysis project. See our GitHub repository for the entire script. And, if you come up with an interesting extension, feel free to make a PR! We look forward to seeing your ideas.
Joanne from lemon.markets🍋