Using Time Series Forecasting to predict stock prices 🔮


September 7, 2022
Trading


The wish to predict the stock market is probably as old as the stock market itself. In today’s age, though, this wish has come closer to reality using data science. In this article we will look at a common method utilized in data science called time series forecasting. Specifically we will use Facebook’s Prophet Model to look at a time series, a sequence of data points, in an effort to predict the future progress of the Tesla Stock! 📈🥳

Time Series and the Stock Market 🤑

Even though the Efficient Market Hypothesis states that it is impossible to accurately predict stock prices, there is work in the literature that has shown that if the right variables are selected and appropriate models are developed, stock price movements can be predicted to a certain extent.

We at lemon.markets 🍋 (the Berlin-based fintech start-up dedicated to building a trading API for developers) also believe in the potential of utilising historic data to support your investment decisions. Combined with thorough fundamental research about the instrument and a rigorous backtesting strategy, historic data can be used to gain further insights through technical analysis and can therefore provide additional helpful criteria for your investment decisions. With the help of our Market Data API, you can query market data from over 8000 financial instruments like stocks or ETFs. And for those who want to step up their game and automate their decision making, we also provide a Trading API.

When it comes to investment decisions, using a time series analysis to track the price of a security over time is a popular approach. The price can be tracked over the short term, such as the price of a security over an hour during a business day, or the long term, such as the daily close price of a security over the course of five years.

Time series forecasting uses historical market data and the patterns in it to make a prediction for future activity. In many cases, these patterns are trends, cyclical fluctuations, or seasonality. For instance, stock markets tend to perform well at the beginning of the year, as this is when many investors have fresh capital to place into the market. Share prices also often rally ahead of long weekends and three-day holidays, such as Thanksgiving and Independence Day in the US. This has been attributed to simple optimism and high spirits among traders.

Bringing a Time Series Analysis to life 💫

If you are interested in creating your own time series analysis, Facebook’s Prophet can be a great tool, as it offers a beginner-friendly entry into the world of data science. Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. The open-source algorithm for generating time-series models uses a few old ideas with some new twists, and it is particularly good at modelling time series that have multiple seasonalities. At its core is the sum of three functions of time plus an error term: growth g(t), seasonality s(t), holidays h(t), and the error ε_t:

y(t) = g(t) + s(t) + h(t) + ε_t

Here g(t) describes a piecewise-linear trend (or “growth term”), s(t) describes the various seasonal patterns, h(t) captures the holiday effects, and ε_t is a white-noise error term.
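To make the additive structure concrete, here is a minimal sketch that builds a synthetic series from the four terms. It is not Prophet's implementation: the function names, the single changepoint, the sine-wave seasonality and all numeric values are assumptions chosen purely for illustration.

```python
import math
import random


def trend(t, changepoint=60, slope_before=0.5, slope_after=-0.2, offset=100.0):
    """Piecewise-linear growth g(t) with a single (hypothetical) changepoint."""
    if t <= changepoint:
        return offset + slope_before * t
    return offset + slope_before * changepoint + slope_after * (t - changepoint)


def seasonality(t, period=7, amplitude=3.0):
    """Weekly pattern s(t), modelled here as one sine wave."""
    return amplitude * math.sin(2 * math.pi * t / period)


def holiday(t, holiday_days=(30, 90), effect=5.0):
    """Holiday effect h(t): a fixed bump on the listed days, zero otherwise."""
    return effect if t in holiday_days else 0.0


rng = random.Random(0)                       # fixed seed for reproducibility
days = range(120)
noise = [rng.gauss(0, 1) for _ in days]      # white-noise error term

# The observed series is simply the sum of the four terms
y = [trend(t) + seasonality(t) + holiday(t) + noise[t] for t in days]
```

Fitting a model like Prophet goes in the opposite direction: given only y, it estimates the trend, seasonality and holiday components.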

If you are interested in learning more about how Prophet works under the hood, check out this article from Meta and their extensive documentation and GitHub repository.

Let’s have a look at how we can utilise the Prophet model to forecast the future development of the Tesla Stock. Before getting started, we should note that this project develops a proof of concept. If you plan on running this strategy in a live environment this would require additional testing and research. The objective for this article is to shed some light on how time series forecasting with stocks can work.

Predicting TESLA Stock prices 🏎

In this example project we want to retrieve historic stock data of the Tesla stock, feed it into a model and tune this model to generate valuable future predictions of the stock’s development. In the following we will go through all of these steps in detail.

As a heads up, we will use some common Machine Learning terminology in this guide. If you are new to this topic, we have published an article which will help you familiarize yourself with machine learning (ML) concepts and guide you through the confusing jungle of ML vocabulary. 🫣

If you are interested in building the project we present in this article on your own: you can find the code for this project at this GitHub repository. To gain a better understanding of this article it makes sense to follow along the repo as you read. Alright, let’s get started.

Tools 🔨

Before we can start developing our prediction model, we need some tools and packages to properly process financial data and numbers. For this project I am using Python 3.9.12, so all packages and modules are Python-based. The first one, pandas, is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool. In this project we install it with conda, the package and environment management system that ships with Anaconda (pandas can also be installed with pip). For getting our data we will use the lemon.markets Market Data API. The team at lemon.markets developed a Python SDK, which drastically simplifies the usage of the API. We will therefore also use it in our project. Lastly, we need to install the Prophet module itself. To install all packages, type the following:

conda install pandas
conda install -c conda-forge lemon
conda install -c conda-forge prophet

Then we import the datetime and csv modules, which are part of the Python standard library and need no separate installation. Prophet includes functionality for time series cross validation, so we import the required modules for this as well.
This is the full list of all imported packages.

from prophet import Prophet
from prophet.diagnostics import performance_metrics, cross_validation
from datetime import timedelta
from lemon import api
import pandas as pd
import datetime
import csv

Get Data 🗄

First of all, we have to get the data we want to use in order to make the forecast. For this we will use the lemon.markets Market Data API, which offers quotes, trades and OHLC data. Using the Python SDK, we make a GET request to the OHLC endpoint. This type of market data is advantageous for our project, as it offers the price of an instrument at a specific venue at the end of the respective time period. In our case we retrieve the daily closing price of the Tesla Stock.

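# `client` is the lemon.markets SDK client and `new_date` the start date of the
# requested window; both are assumed to be defined earlier in the script
response = client.market_data.ohlc.get(
    isin=['US88160R1014'],
    period='d1',
    from_=new_date
)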

For daily data (“d1”), the lemon.markets API returns at most 60 days of data per request. To cover longer time periods we therefore have to make multiple requests, loop through the responses and append the data to a list. For this project I use all closing prices from 2021-08-06 until 2022-07-11.
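The splitting of a long period into request-sized chunks can be sketched as follows. This is only an illustration: `date_windows` is a hypothetical helper, and treating the 60-day limit as 60 calendar days including both end dates is an assumption. The sketch deliberately makes no API calls.

```python
from datetime import date, timedelta


def date_windows(start, end, max_days=60):
    """Yield (window_start, window_end) pairs that cover [start, end],
    each spanning at most `max_days` calendar days."""
    window_start = start
    while window_start <= end:
        window_end = min(window_start + timedelta(days=max_days - 1), end)
        yield window_start, window_end
        window_start = window_end + timedelta(days=1)


windows = list(date_windows(date(2021, 8, 6), date(2022, 7, 11)))

for window_start, window_end in windows:
    # Here one OHLC request per window would be made and its results
    # appended to a common list.
    print(window_start, "to", window_end)
```

Each window's start date would then take the place of `new_date` in the request shown above.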

Preprocessing Data ⚙️

Before you can begin feeding data into your ML model, you’ll need to preprocess it first! This involves filtering out the relevant data for us and storing it in an appropriate data format. 

The input to Prophet is always a data frame with two columns: ds, which holds the date stamps, and y, which holds the numerical data points used for the forecast. That means we have to create a dataset containing a ds and a y column with our data. We choose to store our dataset in a csv file, which is a common choice: csv files make it easy to share the dataset, also with non-programmers, as they can be opened in Excel and don’t require any additional installations.

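# `ds` holds the dates, `data` the closing prices, `length_of_dates` their common length
with open("data.csv", "w", newline="") as f:
    # create the csv writer
    writer = csv.writer(f)
    writer.writerow(["ds", "y"])
    for i in range(length_of_dates):
        # writerow expects a single list per row
        writer.writerow([ds[i], data[i]])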

Now that all of our data is in a csv file, we can pass it over to the Prophet model. Going directly to instantiating our model in order to make a forecast would be a step too early though. We first have to choose the right parameters.
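Before the file reaches Prophet, it has to be loaded back into a data frame with ds parsed as dates. A minimal sketch with pandas is shown here. To keep it self-contained, an in-memory string stands in for data.csv, and its two rows are made-up placeholder values, not real Tesla prices.

```python
import io

import pandas as pd

# Stand-in for data.csv; the two rows are made-up placeholder values
csv_text = "ds,y\n2022-07-08,100.0\n2022-07-11,101.5\n"

# parse_dates turns the ds column into proper timestamps
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["ds"])
df = df.sort_values("ds").reset_index(drop=True)

print(df.dtypes)
```

With the real file, `pd.read_csv("data.csv", parse_dates=["ds"])` would take the place of the in-memory string.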

Hyperparameters 🍎🍐

The Prophet model is essentially an object that takes multiple parameters. These parameters try to replicate the real-world “personality” of a financial instrument, for example whether it fluctuates strongly or is driven by seasonal effects. They have to be tuned for every model individually and play a huge role in the accuracy of your model’s forecast. We recommend that you take your time to dive deep into the past behaviour of a stock, always rigorously test your model and implement additional decision criteria if you want to use your model for investment decisions.

From the many parameters one can influence, only the ones described in the following have a major impact. Nonetheless, I would still advise you to take a look at every one of them to gain insights if they might have an impact on your model as well. It is important to keep in mind that every financial instrument and the dataset behind it is different and thus will be impacted by different parameters to a different extent.

  • changepoint_prior_scale is a very impactful parameter. It determines how flexibly the trend can change, that is, how much impact abrupt changes in your data will have on the model. If you have a highly fluctuating chart, it would make sense to account for that and set this parameter to a higher value.
  • If changepoints, especially towards the end of the timeframe, are frequent and have a huge influence on your model, it makes sense to tweak the changepoint_range parameter. By default, changepoints are only inferred for the first 80% of the time series in order to have plenty of runway for projecting the trend forward and to avoid overfitting fluctuations at the end of the time series. This default works in many situations but not all, and can be changed using the changepoint_range argument. In Python this looks, for example, like this: m = Prophet(changepoint_range=0.97)
  • A highly repeating pattern at intervals could be an indicator for high seasonality influence. In this case the seasonality_prior_scale hyperparameter plays an important role as it allows the seasonality to fit large fluctuations or to shrink the magnitude of the seasonality.
  • If your stock is highly influenced by holidays, the effects of holidays_prior_scale have a huge influence. Retail companies for example are often heavily influenced by Black Friday or Christmas Day and you can account for that by adding those holidays to your model. 
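As an illustration of the last point, Prophet accepts custom holidays as a data frame with a holiday and a ds column, plus optional lower_window and upper_window columns that extend the effect to neighbouring days. The sketch below builds such a frame with pandas only. The choice of events and window sizes is an assumption for this example and would need to be adapted to the instrument.

```python
import pandas as pd

black_friday = pd.DataFrame({
    "holiday": "black_friday",
    "ds": pd.to_datetime(["2021-11-26", "2022-11-25"]),
    "lower_window": 0,
    "upper_window": 1,    # also cover the day after
})
christmas = pd.DataFrame({
    "holiday": "christmas",
    "ds": pd.to_datetime(["2021-12-25", "2022-12-25"]),
    "lower_window": -1,   # also cover the day before
    "upper_window": 0,
})
holidays = pd.concat([black_friday, christmas], ignore_index=True)

# With Prophet installed, the frame would be handed to the model like this:
# m = Prophet(holidays=holidays, holidays_prior_scale=0.1)
```

The holidays_prior_scale value in the comment is only a placeholder. Finding a sensible value is what the tuning step further down is for.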

First, figure out which parameters have the highest impact on your model. You can do that by researching the financial instrument’s behaviour and history and by learning about the company and its business/operating model. For example, a clothing retail company might show strong seasonal trends. In this case the seasonality or holiday parameter would probably play a huge role. Companies that rely heavily on individual product launches, e.g. a game publisher, might show pronounced changepoints in their stock’s development once a product is released. It also makes sense to just play around with some random parameter values to get a feel for how they affect the outcome of your forecast. Once you have a good feeling for the parameters’ influence on your instrument, you can focus on finding the right values for each parameter. You can achieve that by using Prophet’s built-in cross-validation features.

Supervised Learning / Tune and Train 🤺

After you have figured out which parameters are most impactful for your model (see previous step), cross-validation can be used for tuning the hyperparameters of your model. Cross-validation is the process of evaluating and comparing the accuracy of a model. We will use this process to simulate different models with different parameter values in an effort to find the best possible parameter values.

We do that by creating a grid containing the parameters and a few possible values. We then perform a calculation with every combination of values in the grid and determine the best combination using a KPI (more on that later). Depending on your computing power you can include more or fewer values in your grid. We will first choose a wide value range that includes the start and end of the possible value spectrum. By repeating this process we can narrow down the parameter values with each iteration.

param_grid = {
    'changepoint_prior_scale': [0.001, 0.01, 0.1, 0.5],
    'seasonality_prior_scale': [0.01, 0.1, 1.0, 10.0],
    'holidays_prior_scale': [0.01, 0.1, 1.0, 10.0],
}

Let’s get to it. We first loop through the grid and test each combination. For the grid above, this means 64 different combinations (4x4x4). For each combination we calculate the root-mean-square error (RMSE). The RMSE is a KPI that measures how far the observed data points lie from the model’s predictions: it is the square root of the average squared difference between the predicted values and the actual observed values in the dataset. Roughly speaking, it tells us the typical distance between forecast and reality.
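The list of combinations that the loop iterates over can be generated with itertools.product. A minimal sketch follows. The name all_params matches the tuning loop further down, while the grid values are the ones shown earlier, with four values per parameter.

```python
import itertools

param_grid = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
    "holidays_prior_scale": [0.01, 0.1, 1.0, 10.0],
}

# One dict per combination, e.g. {"changepoint_prior_scale": 0.001, ...}
all_params = [
    dict(zip(param_grid.keys(), values))
    for values in itertools.product(*param_grid.values())
]

print(len(all_params))  # 4 * 4 * 4 combinations
```

Each dictionary in all_params can be unpacked directly into the model constructor with `Prophet(**params)`.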

Luckily we don’t have to perform the RMSE calculation ourselves as Prophet includes a method for doing so. However, if you want to learn more about this metric, this article could be interesting to you.
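For reference, the metric itself fits in a few lines. The following sketch computes the RMSE by hand. The function name and the sample numbers are illustrative only, and it is not meant to replace Prophet's own diagnostics.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative numbers: errors of 3 and 4 give sqrt((9 + 16) / 2)
print(rmse([0.0, 0.0], [3.0, 4.0]))
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones.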

rmses = []  # one RMSE per parameter combination

for params in all_params:
    m = Prophet(**params, interval_width=0.9, daily_seasonality=True).fit(data)
    df_cv = cross_validation(m, initial='60 days', period='10 days', horizon='20 days')
    df_p = performance_metrics(df_cv, rolling_window=1)
    rmses.append(df_p['rmse'].values[0])

# Find the best parameters
tuning_results = pd.DataFrame(all_params)
tuning_results['rmse'] = rmses
print(tuning_results)
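In the cross_validation call, initial is the length of the first training period, period is the spacing between cutoff dates, and horizon is how far each simulated forecast reaches. The sketch below shows how such arguments translate into a series of cutoffs for the date range used in this article. It is a simplified illustration of the rolling-origin idea, not Prophet's exact implementation, and `rolling_cutoffs` is a hypothetical helper.

```python
from datetime import date, timedelta


def rolling_cutoffs(start, end, initial, period, horizon):
    """Return cutoff dates for rolling-origin evaluation, oldest first.

    Every cutoff has at least `initial` of history before it and a full
    `horizon` of observed data after it; cutoffs are `period` apart.
    """
    cutoffs = []
    cutoff = end - horizon
    while cutoff - start >= initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


cutoffs = rolling_cutoffs(
    start=date(2021, 8, 6),
    end=date(2022, 7, 11),
    initial=timedelta(days=60),
    period=timedelta(days=10),
    horizon=timedelta(days=20),
)
print(cutoffs[0], "...", cutoffs[-1])
```

For every cutoff, the model is fitted on the data up to that date and evaluated on the following horizon, which is why tuning a large grid quickly becomes expensive.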

When the calculations are finished, we access our data frame, search for the row with the smallest RMSE and return it. The smallest RMSE means that the difference between the actual data set and Prophet’s forecast is the smallest. The smaller the difference, the more accurate our model is.

{'changepoint_prior_scale': 0.1, 'seasonality_prior_scale': 1.0, 'holidays_prior_scale': 0.01}
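The lookup that produces such a dictionary can be sketched with pandas as follows. The small table and its RMSE values are placeholders invented for this example, not results from the actual tuning run.

```python
import pandas as pd

# Placeholder tuning table; the rmse values are invented for illustration
tuning_results = pd.DataFrame({
    "changepoint_prior_scale": [0.01, 0.1, 0.5],
    "seasonality_prior_scale": [1.0, 1.0, 1.0],
    "holidays_prior_scale": [0.01, 0.01, 0.01],
    "rmse": [3.0, 1.0, 2.0],
})

# Row with the smallest RMSE, turned back into keyword arguments for Prophet
best_row = tuning_results.loc[tuning_results["rmse"].idxmin()]
best_params = best_row.drop("rmse").to_dict()

print(best_params)
```

Dropping the rmse column before converting to a dictionary keeps best_params usable as `Prophet(**best_params)`.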

You can repeat this process multiple times and adjust your grid values based on the previous outcome to fine-tune your results. When you reach the final parameter value, meaning your outcome doesn’t change anymore, it’s time to forecast.

Forecast 🔭

After we have determined the right parameters and their values, we can start to forecast. This is actually the easiest step of all: simply instantiate your model with the parameter values you determined in the previous steps. For stock predictions it is important to remove the weekends from the future data frame, as the market is closed on those days.

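# `future` is the data frame of dates to predict, e.g. from m.make_future_dataframe(...)
future['day'] = future['ds'].dt.weekday
future = future[future['day'] <= 4]  # keep Monday (0) to Friday (4)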

Next you have to decide how far into the future you want to forecast. As a rule of thumb, the further you forecast, the less reliable your predictions will be, because the further you move away from your last actual data point, the less data the forecast has to support it. But a short forecast horizon does not guarantee reliable estimates either. We will elaborate on why that is in the retrospective later on.

Then you can go ahead and plot your model.
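Put together, the forecasting step could look roughly like the sketch below. `drop_weekends` and `forecast_weekdays` are hypothetical helper names, and Prophet is imported inside the function so that the weekend filter can be tried out without it. Note that periods counts calendar days, so fewer rows remain once the weekends are removed.

```python
import pandas as pd


def drop_weekends(future):
    """Keep only Monday (0) to Friday (4) rows of a frame with a `ds` column."""
    return future[future["ds"].dt.weekday <= 4].reset_index(drop=True)


def forecast_weekdays(df, params, periods=20):
    """Fit Prophet on `df` (columns ds, y) and predict `periods` calendar days ahead."""
    from prophet import Prophet  # imported lazily; requires the prophet package

    m = Prophet(**params).fit(df)
    future = drop_weekends(m.make_future_dataframe(periods=periods))
    forecast = m.predict(future)
    return m, forecast


# Demonstration of the weekend filter on a plain two-week date range
demo = drop_weekends(pd.DataFrame({"ds": pd.date_range("2022-07-11", periods=14)}))
print(len(demo))
```

With Prophet installed, `m, forecast = forecast_weekdays(df, best_params)` followed by `m.plot(forecast)` would draw the chart, and the predicted values sit in the yhat column of forecast.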

The black dots represent the actual given datapoints and the blue line represents the forecast made by Prophet. For this example I made a 20 day prediction into the future. The part where there are no black dots and only the blue line continues is the future prediction.

The prediction for the Tesla Stock on the 29.07.2022 (day of writing this article) is a closing price of 837.72 €. 

Let’s put it in perspective 💭

Even though this might sound exciting at first glance, I would advise you not to base your investment decisions blindly on the Prophet forecast. By looking at the graph we can detect multiple instances where the model diverges from the data points to a noticeable extent. This happens especially at drastic breakpoints, as Prophet somewhat ignores them in order to stay aligned with the general behavior of the graph.

One way to significantly improve the quality of your outcome is to use a bigger data set. However, even with a bigger data set, Prophet comes with limitations that are important to keep in mind.

  1. First of all, Prophet uses mathematical calculations in an effort to predict the value of future data points. The real-world stock market, on the other hand, relies on many factors which Prophet doesn’t incorporate in its decision making at all. News gets translated into data pretty slowly, as Prophet, compared to other models, does not give special weight to the most recent data points. That means if shocking news, like a scandal in the company, hits the market, Prophet’s immediate reaction to it will be marginal. Even though the price on the stock market would react to news instantly, in the Prophet model it’s just one of many data points used to calculate the future outcome. One could try to somewhat counteract this behavior by cranking up changepoint_range, but this can result in overfitting fluctuations at the end of the time series. Finding the right middle ground can be tricky here.
  2. Secondly, as the market changes constantly, the need for specific parameters does as well, meaning that the model you crafted with an underlying data set might be outdated as soon as the market situation changes. In theory it would be possible to always feed your code the newest data and train new models constantly. While big computer farms might have the computing power needed for that, for the everyday computer it becomes unfeasible. That’s why many investors use time series forecasts to get a general idea of a stock’s potential development rather than basing short-term trades on forecast data.
  3. Third, there are other time series models Prophet has to compete against. Many of them differ from the Prophet model in complexity and ability. For example, traditional time series models like ARIMAX have many stringent data requirements such as stationarity and equally spaced values. Another example, Recurrent Neural Networks with Long Short-Term Memory (RNN-LSTM), can be highly complex and difficult to work with if you don’t have a solid understanding of neural network architecture. One common comparison, though, is with the ARIMA (autoregressive integrated moving average) model. This model basically combines two approaches. It takes the autoregressive model, in which the forecasts correspond to a linear combination of past values of the variable, and combines it with the moving average model, in which the forecasts correspond to a linear combination of past forecast errors. A study led by Lorenzo Menculini found that, in their comparison, Prophet performed considerably worse than ARIMA.
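For readers who prefer formulas, the two building blocks of ARIMA mentioned in the last point can be written in their standard textbook form. The symbols below (orders p and q, coefficients φ and θ, constants c and μ) are the usual notation and do not come from the original article.

```latex
% Autoregressive model AR(p): linear combination of past values
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t

% Moving average model MA(q): linear combination of past forecast errors
y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}

% ARMA(p, q): both parts combined
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
```

The "integrated" part of ARIMA means that this combined model is applied to the series after it has been differenced d times, which is how the stationarity requirement is usually addressed.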

The challenge with Facebook’s Prophet is that it does not look for causal relationships between the past and the future. It simply finds the best curve to fit the data, using a linear or logistic curve for the trend component. That’s why Prophet is generally recommended only for time series where the only informative signals are trends and seasonality, and the residuals are just noise.

Still, Prophet is a great learning opportunity for getting started on working with ML models. It does not require much prior knowledge of forecasting time series data because it can automatically find seasonal trends with a set of data and offers easy to understand parameters. The extensive and well written documentation as well as many example projects out there make it very attractive for beginners. This means that even a non-statistician can start using it and obtain good results on par with the experts.

lemon.markets 💛 data science 

The goal of this article was to give you a glimpse and easy start into using time series for stock predictions. If you want to step up your game and learn more about trading via API you should check out our Medium blog for other interesting articles or maybe take a look at this video about using the Mean Reversion Strategy to automate your trading strategy. 

The lemon.markets API can help you in your data science journey with the help of our Market Data API 📊. Retrieve historical prices or stream the price development of your favourites in real-time. If you’re searching for Quotes, Trades or OHLC: we got you covered. Three flexible pricing tiers (including a free one), our extensive documentation and the seamless integration guarantee a quick and simple implementation into your data science project.

I hope you enjoyed this article. If you want to learn more about investing, make sure to follow our blog. And don’t forget to sign up to lemon.markets to start building your own data science project. If you have any questions, make sure to contact us via support@lemon.markets or join our Slack community 🚀

We are looking forward to your projects with lemon.markets :)

🍋 David
