# How to Build a Time Series Forecasting Model

# Overview

This guide documents an approach to building forecasting models from time series data, using a dataset of daily temperatures in Delhi.

We will produce a simple model as quickly as possible, then iterate on it with layers of complexity to improve the performance, with the help of:

kortical.features.time_series - Kortical's time series library.
kortical.api.experiment_framework - Kortical's experiment framework library.

PREREQUISITES

This tutorial assumes that you are using:

Kortical SDK. Refer to the Kortical SDK documentation setup page, which details a range of installation methods depending on your operating system and preferences.

To run the scripts, you will need:

Your system URL, which has the format https://platform.kortical.com/<company>/<system>.
Your credentials string, which can be generated with the CLI command kortical config credentials to_string.
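
If you would rather not paste these two values into every script, one option is to read them from environment variables. The sketch below is only an illustration: the variable names KORTICAL_SYSTEM_URL and KORTICAL_CREDENTIALS and the helper load_kortical_settings are hypothetical and are not part of the Kortical SDK.

```python
import os


def load_kortical_settings(environ=os.environ):
    """Read the system URL and credentials string from environment variables.

    The variable names are hypothetical; use whatever suits your setup.
    """
    system_url = environ.get('KORTICAL_SYSTEM_URL')
    credentials = environ.get('KORTICAL_CREDENTIALS')

    # Fail early with a clear message if either value is missing or empty.
    required = [('KORTICAL_SYSTEM_URL', system_url),
                ('KORTICAL_CREDENTIALS', credentials)]
    missing = [name for name, value in required if not value]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))

    return system_url, credentials
```

The returned pair can then be passed as the system_url and credentials arguments of api.init() in the scripts in this guide.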

# Defining the problem

Let's examine the data:

from kortical import datasets

df = datasets.load('delhi_climate')

>>> df.head()

         date   meantemp   humidity  wind_speed  meanpressure
0  2013-01-01  10.000000  84.500000    0.000000   1015.666667
1  2013-01-02   7.400000  92.000000    2.980000   1017.800000
2  2013-01-03   7.166667  87.000000    4.633333   1018.666667
3  2013-01-04   8.666667  71.333333    1.233333   1017.166667
4  2013-01-05   6.000000  86.833333    3.700000   1016.500000

To transform this into a forecasting problem, we need a target column. For a given row, the target column must contain a value measured on a future date, e.g. meantemp_next_week:

>>> df['meantemp_next_week'] = df['meantemp'].shift(-7)
>>> df.head()

        date   meantemp  ...  meanpressure  meantemp_next_week
0  2013-01-01  10.000000  ...   1015.666667            8.857143
1  2013-01-02   7.400000  ...   1017.800000           14.000000
2  2013-01-03   7.166667  ...   1018.666667           11.000000
3  2013-01-04   8.666667  ...   1017.166667           15.714286
4  2013-01-05   6.000000  ...   1016.500000           14.000000

Our forecasting problem is to predict the mean temperature in Delhi in 1 week's time.
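
Two side effects of building the target with shift are worth seeing on a small example. The sketch below uses a made-up ten-day frame (the values are not from the Delhi dataset) and plain pandas. Note that shift moves values by rows, not by dates, so it assumes one row per day with no gaps.

```python
import pandas as pd

# Toy series: ten consecutive days with a made-up temperature for each day.
toy = pd.DataFrame({
    'date': pd.date_range('2013-01-01', periods=10, freq='D'),
    'meantemp': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0],
})

# shift(-7) pulls the value from 7 rows later up onto the current row.
toy['meantemp_next_week'] = toy['meantemp'].shift(-7)

# The last 7 rows have no observation 7 rows ahead, so their target is NaN.
rows_without_target = int(toy['meantemp_next_week'].isna().sum())

# Only rows with a known target can be used for training or scoring.
usable = toy.dropna(subset=['meantemp_next_week'])
```

Here only the first three rows keep a target, which is why the scripts in this guide drop rows with missing values after creating the target column.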

# Baseline solution

Let's fling the raw dataset into Kortical to get an initial model with a baseline score.

Here is the script:

SCRIPT 1: Execute a train run on the raw dataset, calculate the test score.

NOTE: Configuring your Training Run

The most important parameter in this script is arguably max_models_with_no_score_change, passed to train_model(). You might need to reduce it from 100 for train runs on larger datasets/models; for this example, it should run reasonably quickly. As you will see, the remaining examples in this guide execute multiple train runs to optimise over features, so the total runtime will be multiplied. It's important to find a trade-off that gives both reasonable convergence and an acceptable runtime.

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

from kortical import api, datasets
from kortical.api.data import Data
from kortical.api.model import Model


def preprocess(df):
    # Create target column
    df['meantemp_next_week'] = df['meantemp'].shift(-7)
    df.dropna(inplace=True)
    
    df_train, df_test = train_test_split(df, test_size=0.1, shuffle=False)
    return df_train, df_test


def train_run(df_train, df_test, target, experiment_name):
    # Authentication
    api.init(system_url='https://platform.kortical.com/<company>/<system>',
             credentials='<credentials-string>')

    # Prepare data
    train_data = Data.upload_df(df_train, f'{experiment_name}', targets=target)

    climate_model = Model.create_or_select('delhi_temperature_prediction', delete_unpublished_versions=True, stop_train=True)
    best_version = climate_model.train_model(train_data, max_models_with_no_score_change=100)

    # Deploy and predict
    development_environment = climate_model.get_environment()
    model_instance = development_environment.create_component_instance(best_version.id, wait_for_ready=True)
    predictions = model_instance.predict(df_test)

    mae_on_test = mean_absolute_error(predictions[target], predictions['yhat'])
    print(f'MAE on test set was {mae_on_test}')

    return best_version, mae_on_test


if __name__ == '__main__':
    df = datasets.load('delhi_climate')
    df_train, df_test = preprocess(df)
    train_run(df_train, df_test, target='meantemp_next_week', experiment_name='raw_data')
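
The script splits the data with shuffle=False, so the rows stay in date order and the test set is the most recent 10% of the data. This matters for forecasting: a shuffled split would let the model train on days that come after the ones it is tested on. The sketch below shows the idea in plain Python; chronological_split is a hypothetical helper written for illustration and is not guaranteed to match the rounding of train_test_split in every case.

```python
import math


def chronological_split(rows, test_size=0.1):
    """Split time-ordered rows so that the test set is the most recent slice."""
    n_test = math.ceil(len(rows) * test_size)
    split_at = len(rows) - n_test
    return rows[:split_at], rows[split_at:]


days = list(range(1, 21))  # 20 days, oldest first
train_days, test_days = chronological_split(days)
```

Every day in test_days is later than every day in train_days, which mimics how the model would be used in practice.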

How can this be improved?

In most cases we wouldn't use the raw dataset directly for machine learning because each row only contains information at a single point in time: imagine predicting UK rainfall tomorrow using only the rainfall today... yikes! We would want to have an idea of the weather last month, or even last year.

TAKEAWAY

The value of an observation (such as temperature or humidity) at any given point in time is only part of the story - we also need to see it in the context of past observations to understand norms and trends.

In order to improve performance, we need more context; this can be done with feature engineering.

# Creating lags with the Time Series Library

To address the problems of the previous section, we need to transform the original dataset into a form which provides more predictive power for machine learning:

Leads. These are target columns containing future values we want to predict. This has already been done in the previous section (the target is meantemp_next_week).
Lags. These are observations that have been calculated from past data. Examples of lags include rolling averages (e.g. average mean temperature over the last month) and offsets (e.g. temperature this time last year). If a lag feature is calculated from more than one row, a time window and aggregation function must be defined.
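
To make the two kinds of lag concrete, here is a minimal sketch in plain pandas on made-up values. It is not the Kortical time series library, and the column names are invented for this example.

```python
import pandas as pd

toy = pd.DataFrame(
    {'meantemp': [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]},
    index=pd.date_range('2013-01-01', periods=6, freq='D'),
)

# Offset lag: the value observed 2 days before the current row.
toy['meantemp_2_days_ago'] = toy['meantemp'].shift(2)

# Rolling lag: mean over the 3 days *before* the current row.
# Shifting by 1 first keeps today's value out of the window.
toy['meantemp_mean_prev_3_days'] = (
    toy['meantemp'].shift(1).rolling(window=3).mean()
)
```

The first rows of each lag column are NaN because there is not enough history to fill the window. The same happens at the start of any real dataset, and long windows such as "this time last year" leave a correspondingly long stretch without values.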

Our top priority is to create an assortment of lag features, which will help our train runs to create high-performing models. The Kortical time series package is designed to accelerate this:

from datetime import timedelta

from kortical.features import time_series as ts
from kortical.features.time_series import Functions as fns

# lags_weekly_over_4_weeks_last_year and last_year_today are custom windows, defined in SCRIPT 2 below
rows = ts.create_rows(
        dataframe=df.copy(),
        datetime_column='date',
        columns=['meantemp', 'humidity', 'wind_speed', 'meanpressure'],
        time_windows=ts.lags_daily_over_10_days + ts.lags_weekly_over_3_weeks + lags_weekly_over_4_weeks_last_year + ts.lags_yearly_over_3_years + last_year_today,
        functions=(fns.mean, fns.min, fns.max, fns.std),
        sample_frequency=timedelta(days=1),
        datetime_format='%Y-%m-%d')

In the function above, note these particular arguments:

columns - These are the columns from which we calculate lag features.
time_windows - These are the time periods we want to aggregate over. Some common windows are already offered, and more custom windows can be defined with the TimeWindow class.
functions - These are the aggregation functions we want to use over the time windows. Again, some common functions are offered.

This function will consider all combinations of the specified columns, time windows and functions. As a result, the original 5-column dataframe has been processed into a new dataframe with 205 columns! This provides a lot more context for machine learning.
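
The growth in column count follows from simple multiplication. The sketch below enumerates one feature per (column, window, function) combination. The window labels and the naming scheme are hypothetical stand-ins, so it does not reproduce the library's 205 columns or its real column names. In the real output, windows covering a single row need no aggregation function, so the count is not a straight product either.

```python
from itertools import product

columns = ['meantemp', 'humidity', 'wind_speed', 'meanpressure']
# Hypothetical window labels, standing in for the real time window objects.
windows = ['last_7_days', 'last_30_days', 'same_week_last_year']
functions = ['mean', 'min', 'max', 'std']

# One feature per (column, window, function) combination.
feature_names = [
    f'{col}_{fn}_{win}'
    for col, win, fn in product(columns, windows, functions)
]

feature_count = len(feature_names)  # 4 columns * 3 windows * 4 functions
```

Adding one more window or function multiplies the total, which is why the column count climbs so quickly.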

Here is the updated script:

SCRIPT 2: Create lag features from the original dataframe, execute a train run, calculate the test score.

from datetime import timedelta
from dateutil.relativedelta import relativedelta
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

from kortical import api, datasets
from kortical.api.data import Data
from kortical.api.model import Model
from kortical.features import time_series as ts
from kortical.features.time_series import Functions as fns


def preprocess(df):

    # Define time windows
    last_year_today = [ts.TimeWindow(
        start=relativedelta(years=1),
        duration=timedelta(days=1),
        use_value_as_is=True
    )]
    lags_weekly_over_4_weeks_last_year = ts.generate_weekly_windows(
        num_weeks=4,
        start_offset=relativedelta(weeks=51)
    )
    lags_monthly_over_4_months_last_year = ts.generate_monthly_windows(
        num_months=4,
        start_offset=relativedelta(months=11)
    )

    # Create lag features
    rows = ts.create_rows(
        dataframe=df.copy(),
        datetime_column='date',
        columns=['meantemp', 'humidity', 'wind_speed', 'meanpressure'],
        time_windows=ts.lags_daily_over_10_days + ts.lags_weekly_over_3_weeks + lags_weekly_over_4_weeks_last_year + ts.lags_yearly_over_3_years + last_year_today,
        functions=(fns.mean, fns.min, fns.max, fns.std),
        sample_frequency=timedelta(days=1),
        datetime_format='%Y-%m-%d')

    # finally, index by datetime
    rows.index = pd.to_datetime(rows.index).date

    # Create target column
    rows['meantemp_next_week'] = rows['meantemp_now'].shift(7)
    rows.dropna(subset=['meantemp_next_week'], inplace=True)

    diff = rows['meantemp_next_week'] - rows['meantemp_now']
    print(f'Average difference: {diff.abs().mean()}')
    rows.index = pd.to_datetime(rows.index)
    rows = rows.sort_index()

    # Split into train + test
    encoded_train, encoded_test = train_test_split(rows, test_size=0.1, shuffle=False)

    return encoded_train, encoded_test


def train_run(df_train, df_test, target, experiment_name):
    # Authentication
    api.init(system_url='https://platform.kortical.com/<company>/<system>',
             credentials='<credentials-string>')

    # Prepare data
    train_data = Data.upload_df(df_train, f'{experiment_name}', targets=target)

    climate_model = Model.create_or_select('delhi_temperature_prediction', delete_unpublished_versions=True, stop_train=True)
    best_version = climate_model.train_model(train_data, max_models_with_no_score_change=100)

    # Deploy and predict
    development_environment = climate_model.get_environment()
    model_instance = development_environment.create_component_instance(best_version.id, wait_for_ready=True)
    predictions = model_instance.predict(df_test)

    mae_on_test = mean_absolute_error(predictions[target], predictions['yhat'])
    print(f'MAE on test set was {mae_on_test}')

    return best_version, mae_on_test


if __name__ == '__main__':
    df = datasets.load('delhi_climate')
    df_train, df_test = preprocess(df)
    best_version, mae_on_test = train_run(df_train, df_test, target='meantemp_next_week', experiment_name='data_with_lags')

The new mean absolute error should be a significant improvement over the original train run.
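
The script also prints an "Average difference": the mean absolute gap between the current temperature and the target. This can be read as the error of a naive forecast that assumes the temperature will not change, which makes it a handy yardstick for the model's MAE. A minimal sketch on made-up numbers (the helper mae and the temperatures are illustrative only):

```python
def mae(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up daily temperatures, oldest first.
temps = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0, 18.0, 17.0, 19.0]
horizon = 7

# Naive forecast: the temperature in `horizon` days equals today's temperature.
naive_forecast = temps[:-horizon]  # today's values
actual_future = temps[horizon:]    # values observed `horizon` days later

naive_mae = mae(actual_future, naive_forecast)
```

A trained model is only adding value if its MAE on the test set is clearly below the naive figure computed on comparable rows.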

TIP

Refer to SDK Documentation -> Feature Engineering -> Time Series Transformation for full documentation.

What time windows should I use?

An intuition can be developed from the dataset and the type of problem we are solving.

In this example we are predicting temperature, which tends to trend in cycles over a yearly period. As a result, we are more inclined to use lags that focus on this time last year, or stretch over a longer time window.

Another example might have a different answer: consider a time series that represents Blackberry phone sales each month. Although in the past we may have seen a slow upward trend with seasonal spikes, perhaps recent events (e.g. launch of the iPhone) caused sales to plummet; a long-term pattern is not guaranteed to hold, so the model might need to weight short-term lags more heavily to react to sudden changes in the market.

The more features the better, right?

Not necessarily.

The more columns we have in our dataset, the more rows we need for training in order to get suitable performance; however, we might not always have enough data (e.g. weekly data spanning 5 years is only 260 rows). Since time series datasets are often small, we need to deliberately pick a subset of possible features based on intuition.

There are endless combinations of features we can select and use for creation of even more features. In practice, we wouldn't want to try all possibilities if the test set was small, since we are likely to overfit; furthermore, testing every approach would take forever. Instead, we can consider a few sensible feature engineering approaches (e.g try creating features with longer time windows). Insight-driven hypotheses from the problem and the data will determine which strategies are favourable.

# Iterating with the Experiment Framework

As explained in the previous section, there are often multiple data science approaches when preparing a time series dataset for training. How do we know which approach is best? The only way to reliably compare them is to test them all empirically, resulting in the following workflow:

Preprocess the dataset to include the lag features we want.
Start a train run.
Wait for a high-performing candidate model to be created (this might take a while depending on the data...).
Store the results and repeat for the next approach.

Iterating through just 5 strategies would result in an extremely monotonous job of a stop/start nature. Instead, we can automate this using Kortical's experiment framework.

The general template for using the experiment framework is shown below:

from kortical.api.experiment_framework import ExperimentFramework

experiments = {
    "use a and b": ["feature_a", "feature_b"],
    "use a and c": ["feature_a", "feature_c"],
    "use b and c": ["feature_b", "feature_c"]
}

with ExperimentFramework('.experiment_name') as ef:
    for experiment_name, experiment_data in experiments.items():

        # Do something that results in a train run and a best model/score
        best_model_version, score = do_stuff(experiment_data, **args)

        ef.record_experiment(experiment_name, model_version=best_model_version, results=score)

You can iterate over anything that might affect a train run; this is normally a subset of columns. However, in this case we want to keep all the original columns and experiment with how lag features are generated. Therefore, we choose to test across a set of aggregation functions.
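
A set of function combinations can be written out by hand, or generated. The sketch below always keeps the mean and adds every non-empty subset of the other three functions. Plain strings stand in for the fns.mean, fns.min, fns.max and fns.std objects, so this is an illustration rather than a drop-in replacement.

```python
from itertools import combinations

always_included = ('mean',)
optional = ('min', 'max', 'std')

# Every non-empty subset of the optional functions, smallest subsets first.
experiments = [
    always_included + subset
    for size in range(1, len(optional) + 1)
    for subset in combinations(optional, size)
]
```

With three optional functions this gives seven experiments. Each extra optional function roughly doubles the number of train runs, so it pays to keep the candidate list short.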

The final script is shown below:

SCRIPT 3: Experiment with aggregation functions for feature generation, execute a train run and store the best scores.

from datetime import timedelta
from dateutil.relativedelta import relativedelta
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

from kortical import api, datasets
from kortical.api.data import Data
from kortical.api.model import Model
from kortical.api.experiment_framework import ExperimentFramework
from kortical.features import time_series as ts
from kortical.features.time_series import Functions as fns


def preprocess(df, lag_functions):

    # Define time windows
    last_year_today = [ts.TimeWindow(
        start=relativedelta(years=1),
        duration=timedelta(days=1),
        use_value_as_is=True
    )]
    lags_weekly_over_4_weeks_last_year = ts.generate_weekly_windows(
        num_weeks=4,
        start_offset=relativedelta(weeks=51)
    )
    lags_monthly_over_4_months_last_year = ts.generate_monthly_windows(
        num_months=4,
        start_offset=relativedelta(months=11)
    )

    # Create lag features
    rows = ts.create_rows(
        dataframe=df.copy(),
        datetime_column='date',
        columns=['meantemp', 'humidity', 'wind_speed', 'meanpressure'],
        time_windows=ts.lags_daily_over_10_days + ts.lags_weekly_over_3_weeks + lags_weekly_over_4_weeks_last_year + ts.lags_yearly_over_3_years + last_year_today,
        functions=lag_functions,
        sample_frequency=timedelta(days=1),
        datetime_format='%Y-%m-%d')

    # finally, index by datetime
    rows.index = pd.to_datetime(rows.index).date

    # Create target column
    rows['meantemp_next_week'] = rows['meantemp_now'].shift(7)
    rows.dropna(subset=['meantemp_next_week'], inplace=True)

    diff = rows['meantemp_next_week'] - rows['meantemp_now']
    print(f'Average difference: {diff.abs().mean()}')
    rows.index = pd.to_datetime(rows.index)
    rows = rows.sort_index()

    # Split into train + test
    encoded_train, encoded_test = train_test_split(rows, test_size=0.1, shuffle=False)

    return encoded_train, encoded_test


def train_run(df_train, df_test, target, experiment_name):
    # Authentication
    api.init(system_url='https://platform.kortical.com/<company>/<system>',
             credentials='<credentials-string>')

    # Prepare data
    train_data = Data.upload_df(df_train, f'{experiment_name}', targets=target)

    climate_model = Model.create_or_select('delhi_temperature_prediction', delete_unpublished_versions=True, stop_train=True)
    best_version = climate_model.train_model(train_data, max_models_with_no_score_change=100)

    # Deploy and predict
    development_environment = climate_model.get_environment()
    model_instance = development_environment.create_component_instance(best_version.id, wait_for_ready=True)
    predictions = model_instance.predict(df_test)

    mae_on_test = mean_absolute_error(predictions[target], predictions['yhat'])
    print(f'MAE on test set was {mae_on_test}')

    return best_version, mae_on_test


if __name__ == '__main__':

    experiments = [
        (fns.mean, fns.min),
        (fns.mean, fns.max),
        (fns.mean, fns.std),
        (fns.mean, fns.min, fns.max),
        (fns.mean, fns.min, fns.std),
        (fns.mean, fns.max, fns.std),
        (fns.mean, fns.min, fns.max, fns.std),
    ]

    with ExperimentFramework('.delhi_climate_experiments', is_minimising=True) as ef:
        for functions in experiments:
            experiment_name = "Functions: {}".format(', '.join([f.__name__ for f in functions]))
            print(experiment_name)

            df = datasets.load('delhi_climate')
            df_train, df_test = preprocess(df, functions)
            best_version, mae_on_test = train_run(df_train, df_test, target='meantemp_next_week', experiment_name=experiment_name)

            ef.record_experiment(experiment_name, model_version=best_version, results=mae_on_test)

The final result of this experimental run is shown below:

>>>   from kortical.api.experiment_framework import ExperimentFramework
>>>   with ExperimentFramework('.delhi_climate_experiments', is_minimising=True) as ef:
>>>       ef.print_experiment_results()

Experiment list, ranked by score:

1. Functions: mean, amin, amax. Result: 1.483183839156075, Best Model Score: 1.1935639782912986 mean_absolute_error at 2023/07/10, 15:37:00 
version_id: 1973, model_type: extra_trees, score: 1.1935639782912986, score_type: mean_absolute_error, 
2. Functions: mean, amax, std. Result: 1.5533506131364487, Best Model Score: 1.2146430870893596 mean_absolute_error at 2023/07/10, 17:38:53 
version_id: 2290, model_type: random_forest, score: 1.2146430870893596, score_type: mean_absolute_error, 
3. Functions: mean, amin, std. Result: 1.511986292313084, Best Model Score: 1.2391746843498472 mean_absolute_error at 2023/07/10, 15:51:21 
version_id: 2094, model_type: extra_trees, score: 1.2391746843498472, score_type: mean_absolute_error, 
4. Functions: mean, amin. Result: 1.3562317601560747, Best Model Score: 1.2557637963779862 mean_absolute_error at 2023/07/10, 14:51:50 
version_id: 1672, model_type: random_forest, score: 1.2557637963779862, score_type: mean_absolute_error, 
5. Functions: mean, amax. Result: 1.5676393233308412, Best Model Score: 1.2691204582073359 mean_absolute_error at 2023/07/10, 15:05:20 
version_id: 1771, model_type: random_forest, score: 1.2691204582073359, score_type: mean_absolute_error, 
6. Functions: mean, amin, amax, std. Result: 1.8216208416457942, Best Model Score: 1.2822942143577594 mean_absolute_error at 2023/07/10, 18:53:13 
version_id: 2411, model_type: extra_trees, score: 1.2822942143577594, score_type: mean_absolute_error, 
7. Functions: mean, std. Result: 1.6977714971457945, Best Model Score: 1.2900593830368485 mean_absolute_error at 2023/07/10, 15:19:11 
version_id: 1874, model_type: extra_trees, score: 1.2900593830368485, score_type: mean_absolute_error,

Looking at the top row, the best model version score should show a further improvement; this was trained with lag features generated from mean/min/max functions.
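
In the listing, the experiments appear to be ordered by Best Model Score rather than by the recorded Result, with the lowest error first because is_minimising=True. The sketch below reproduces that ordering from the scores in the listing. rank_experiments is a hypothetical helper written for illustration, not the framework's implementation.

```python
# Best model scores copied from the results listing (mean absolute error).
scores = {
    'mean, amin': 1.2557637963779862,
    'mean, amax': 1.2691204582073359,
    'mean, std': 1.2900593830368485,
    'mean, amin, amax': 1.1935639782912986,
    'mean, amin, std': 1.2391746843498472,
    'mean, amax, std': 1.2146430870893596,
    'mean, amin, amax, std': 1.2822942143577594,
}


def rank_experiments(scores, is_minimising=True):
    """Order experiment names from best to worst score."""
    return sorted(scores, key=scores.get, reverse=not is_minimising)


ranking = rank_experiments(scores)
best = ranking[0]
```

For a metric where higher is better, passing is_minimising=False to this helper would reverse the order.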

TIP

Refer to SDK Documentation -> Platform API -> Experiment Framework for full documentation.

Of course, this script has just scratched the surface; experiments can also be done to optimise over columns and time windows, instead of just the aggregation functions. With Kortical, we can set up dozens of experiments and train runs, storing the results as we go with no need to babysit the process.
