Sequence Predictions with Keras & TensorFlow

Introduction

Many forecasting problems require more than the historical values of a time series to make accurate predictions. For example, when forecasting sales we can make better predictions by considering leads, opportunities and marketing campaigns. When forecasting solar power generation, we need to consider variables such as cloud cover and hours of daylight. Deep Learning offers us powerful capabilities for modelling such sequence prediction problems.

Deep learning models are capable of learning features from sequential data, can take in multiple input variables, and be set up to look at a certain number of periods, for instance 30 days, and make multi-step predictions, for example for the next 7 days. With Convolutional Neural Networks (CNNs) we have the capability to learn patterns from sequential data. And with Long Short Term Memory Networks (LSTMs) we can build models that retain information on long term dependencies.

Keras is a Deep Learning library for Python which provides a high-level neural networks API, capable of running on top of TensorFlow, CNTK, or Theano.

This SSE provides functions for sequence predictions that use Keras and scikit-learn. We can set up our neural network's 'architecture', i.e. the inputs, hidden layers and outputs, and then train and test the model entirely through the Qlik load script. We can generate predictions by passing a configured number of lag periods to the model, either through the load script or through chart expressions. This implementation supports both regression and classification problems.

If you want to dive deeper into Deep Learning I recommend the following posts:

The Unreasonable Effectiveness of Recurrent Neural Networks
Understanding LSTM Networks
How to Develop Deep Learning Models for Univariate Time Series Forecasting
Introduction to 1D Convolutional Neural Networks in Keras for Time Sequences
Convolutions in Autoregressive Neural Networks

Deep Learning

The steps to set up a Deep Learning model are very similar to those described under Machine Learning with scikit-learn . In this SSE each step is carried out with the help of functions that provide a bridge between Qlik and the scikit-learn and Keras APIs. These functions are explained in the sections below.

At a high-level the steps are:

Prepare the features in Qlik
Prepare feature definitions in Qlik
Prepare the architecture for the neural network in Qlik
Setup the model with relevant parameters
- PyTools.sklearn_Setup(model_name, estimator_args, scaler_args, execution_args)
- PyTools.sklearn_Setup_Adv(model_name, estimator_args, scaler_args, metric_args, dim_reduction_args, execution_args)
Set feature definitions for the model
- PyTools.sklearn_Set_Features(model_name, feature_name, variable_type, data_type, feature_strategy, strategy_args)
Set the Keras architecture for the model
- PyTools.Keras_Set_Layers(model_name, sort_order, layer_type, args, kwargs)
Fit the model using the training data
- PyTools.sklearn_Fit(model_name, n_features)
Calculate metrics on test data. In contrast to standard scikit-learn models, it is recommended to do this separate from training.
- PyTools.sklearn_Calculate_Metrics_Sequence(model_name, key, n_features)
- PyTools.sklearn_Get_Confusion_Matrix(model_name) (Only applicable for classifiers)
Get loss (error) per training epoch for the model. Plotting this data on a line chart gives an idea of how well the model is learning from the given data.
- PyTools.Keras_Get_History(model_name)
Get predictions from an existing model.
- PyTools.sklearn_Predict_Sequence(model_name, key, n_features) (For use in chart expressions)
- PyTools.sklearn_Bulk_Predict_Sequence(model_name, key, n_features) (For use in the load script)
- PyTools.sklearn_Predict_Proba_Sequence(model_name, key, n_features) (For use in chart expressions. Only applicable to classifiers)
- PyTools.sklearn_Bulk_Predict_Proba_Sequence(model_name, key, n_features) (For use in the load script. Only applicable to classifiers)

Steps 1-9 are done through Qlik's data load script, while the predictions can be made through either the load script or in real-time using chart expressions.

Note that the sklearn_Explain_Importances function is currently not supported for Keras models.

Sample Qlik Sense app

For the explanations below you can refer to Sample-App-Forecasting-with-Keras which trains four different models to predict bicycle rentals in Washington. The data source for this app can be found here.

You will need to reload the app to train the models before you can use the last two sheets to make predictions.

Preparing data

The first step is to gather the features (i.e. dimensions and measures in Qlik) that will help us predict the target. The target can be discrete labels for a classification problem, or a continuous variable for a regression problem.

This is something where Qlik natively works well; bringing together data from multiple sources, deriving new dimensions and measures and structuring the data into a single table. This table should provide us one row for each sample, with columns for the target and all the features being provided to predict the target.

For sequence predictions you need to include a unique numerical identifier field, for example the date for daily data. This will be used to preserve the sequence of the data.

The first input to our model is this training dataset. You should hold back a portion of the data for testing.

Preparing feature definitions

Next, for this implementation, we need to provide feature definitions for our dataset. This metadata can be brought into Qlik from any source such as a file or an internal table.

For each feature, i.e. each column in the dataset, we need to define the following attributes:

Metadata field	Description	Valid values	Remarks
Name	A unique name for the feature	Any string	The feature name must be unique.
Variable Type	Identify whether the variable is a feature or target	`feature`, `target`, `excluded`, `identifier`	You need to specify an identifer which should be numerical and unique for each row. If using a date field you will need to convert it to a number using the `num` function in Qlik. Features marked as `excluded` will be ignored.
Data Type	Used to covert the data to the correct type	`bool`, `int`, `float`, `str`	Specifying data types is necessary due to how data will be exchanged between Qlik and this SSE.
Feature Strategy	The feature preparation strategy	`one hot encoding`, `hashing`, `count_vectorizing`, `tf_idf`, `text_similarity`, `scaling`, `none`	Strings need to be converted to numerical values for machine learning. The strategies implemented in this SSE to do this are one hot encoding, hashing, count vectorizing, TF-IDF vectorizing and `text_similarity`. The latter is an approach developed in this SSE to convert strings to their unicode representation and then apply one hot encoding, creating one feature for each unique character in the column. This can be useful when similarity between strings is significant. Numerical values generally need to be scaled to avoid bias towards larger numbers. In general, for discrete values use OHE where the number of unique values is small, otherwise use hashing. For continuous values, use scaling.
Strategy Arguments	For `hashing`: the number of derived features. For `count_vectorizing` and `tf_idf`: the keyword arguments that will be passed to the scikit-learn class	For `hashing` this should be an integer e.g. `4` For `count_vectorizing` and `tf_idf` this should follow the syntax described under Specifying keyword arguments for scikit-learn classes e.g. `'analyzer=char\|str, ngram_range=2;2\|tuple\|int'`	For hashing the integer should be a power of 2 for the algorithm to work correctly.

The table should look like this:

Setting up the model

The model is set up through the Qlik load script. You should familiarize yourself with the LOAD...EXTENSION syntax.

To set up the model we need to provide a model name and parameters for the estimator, scaler and the SSE itself. The input to the function sklearn_Setup is a table with one row and four columns: Model_Name, EstimatorArgs, ScalerArgs, ExecutionArgs.

While the parameters that can be passed here are explained in the Input Specifications, there are certain essential ones required for setting up a model for sequence predictions.

Estimaor

For regression use KerasRegressor and for classification use KerasClassifier.

Epochs and Batches

You need to specify the epochs and batch_size in the EstimatorArgs for each model. If you are unfamiliar with these concepts they are explained well here.

In short an epoch is a cycle in which the model is trained over the entire training set. Neural networks generally require hundreds of epochs for learning from a given dataset. In each epoch the model's weights will be updated after a batch of samples has been processed. The batch_size determines the size of each batch in an epoch.

Testing strategy

Next, in the ExecutionArgs you need to set the test_size to zero as it is recommended to do the testing separate to training so that we can use walk forward validation. If testing using cross validation you will need to pass the time_series_split argument with the number of splits, e.g. time_series_split=3.

Lag observations

You also need to specify the lag periods which will be used to reshape the data, so that each sample shown to the model is composed of a number of lag observations. For example we can build a model that uses the previous 10 periods as input to make a prediction for the current period.

Note that you don't need to reshape the data in Qlik; the lag observations will be added by the SSE.

You can also specify whether to use lag values of the target as an additional input feature which will be added by the SSE, and whether to scale the target.

test_size=0, lags=29, lag_target=true, scale_target=false

In addition the current_sample_as_input parameter controls whether the model expects inputs (other than the target) for the current sample or will only use lag observations to make its predictions. This parameter is set to true by default.

Multi-step predictions

Finally, for multi-step predictions, for e.g. using the previous 30 days of inputs to produce a 7 day forecast, you can specify the prediction periods in the ExecutionArgs as well.

test_size=0, prediction_periods=7, current_sample_as_input=false, lags=30, lag_target=true, scale_target=false

Model setup

In the sample app we set up a table MODEL_INIT to provide the input to our function.

The response is the model name, the result and a timestamp. If successful, this call will save the model to the path ../qlik-py-tools/qlik-py-env/models.

[Result-Setup]:
LOAD
   model_name,
   result,
   timestamp
EXTENSION PyTools.sklearn_Setup(MODEL_INIT{Model_Name, EstimatorArgs, ScalerArgs, ExecutionArgs});

Feature definitions

We then send the feature definitions, i.e. the metadata on our features, to the model. In the sample script below the existing table FEATURES provides us the inputs: Model_Name, Name, Variable_Type, Data_Type, Feature_Strategy, Strategy_Arguments.

[Result-Setup]:
LOAD
   model_name,
   result,
   timestamp
EXTENSION PyTools.sklearn_Set_Features(FEATURES{Model_Name, Name, Variable_Type, Data_Type, Feature_Strategy, Strategy_Arguments});

Setting up the neural network

The architecture of a neural network is defined by a number of layers that connect the input to the output. This SSE supports sequential Keras models which are a linear stack of layers.

Using Python we can define a Keras model in a few lines of code:

from keras.models import Sequential

model = Sequential()
model.add(Dense(units=64, activation='relu', input_shape=(100,)))
model.add(Dense(units=1, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

This model has two Dense layers, the first with 64 nodes and the second with a single node which is the output. The input shape is 100 which means that we expect 100 features as input for each sample. The final line compiles the model for training.

To define the same model through this SSE we set up the architecture in the Qlik load script. We need to provide five fields: Model Name, Sort Order, Layer Type, Args and Kwargs. In addition, we need to specify the data type for each value in Args and Kwargs so we can convert these to the correct data type in Python. More details on this notation is available under the Input Specifications.

[Keras Architecture]:
LOAD * INLINE [
Model Name:Sort Order:Layer Type:Args:Kwargs
Dense-Net:1:Dense:64|int:input_shape=None|tuple|int, activation=relu|str
Dense-Net:2:Dense:1|int:activation=softmax|str
Dense-Net:3:Compilation::loss=binary_crossentropy|str, optimizer=sgd|str, metrics=accuracy|list|str
](delimiter is ':');

The final row in the architecture should be the compilation parameters for the model. The optimizer parameters can be customized by passing it as the second last row of the table.

[Keras Architecture]:
LOAD * INLINE [
Model Name:Sort Order:Layer Type:Args:Kwargs
Dense-Net:1:Dense:64|int:input_shape=None|tuple|int, activation=relu|str
Dense-Net:2:Dense:1|int:activation=softmax|str
Dense-Net:3:Adam::lr=0.002|float
Dense-Net:4:Compilation::loss=binary_crossentropy|str, metrics=accuracy|list|str
](delimiter is ':');

For valid optimizers and relevant parameters refer to the Keras documentation here.

Input shape

Note that we passed the input shape as None in the example above. The SSE will calculate the final number of features after pre-processing the data, for e.g. by applying one hot encoding, and set this value for us.

However, the number of dimensions in the input shape will depend on the Keras layer requirements. For example a Conv1D layer expects a 3D input of (batch, steps, channels). We don't need to specify the batch in Keras, and the channels refer to the number of features which are calculated by the SSE. So in this case you only need to provide the value of the steps and a placeholder value for channels.

The values for dimensions other than the number of features depend on the lags and current_sample_as_input parameter set in the ExecutionArgs.

For e.g. with lags=29, current_sample_as_input=true, we need the previous 29 samples and the current sample as input. So in this case we would have input_shape=30;None|tuple|int. With lags=29, current_sample_as_input=false we would have input_shape=29;None|tuple|int.

The lag_target argument simply adds another feature to the inputs and does not affect the dimensions of the input_shape that need to be specified.

Output shape

You can build multi-step models by having more than one node in the final layer. You must pass the prediction_periods parameter to match this number of output nodes in the ExecutionArgs.

Table structure

A description of the fields required to define the Keras architecture is provided below. For sample architectures refer to Sample Keras architectures.

Field	Description	Example	Remarks
Model_Name	Unique name for the model. This allows for multiple architectures to be defined in one table.	`Dense-Net`	The model will be stored to disk at `../qlik-py-tools/qlik-py-env/models`.
Sort_Order	Number to order the layers.	`1`	The compilation arguments should always be provided as a final 'layer'.
Layer_Type	Any valid Keras layer.	`Dense`, `LSTM`, `Conv1D`, `TimeDistributed Conv1D`	Refer to the Keras documentation for more information. For compilation parameters pass the `Layer_Type` as `compilation`. Layer wrappers can be used by adding a space between the wrapper and the layer type e.g. `TimeDistributed Dense`
Args	Positional arguments for the layer.	`64\|int`	A layer may take in arguments which do not require a keyword, for e.g the number of nodes for a Dense layer. These can be provided through the `Args` column. The data type for the argument should be appended to the value using the `
Kwargs	Keyword arguments for the layer.	`input_shape=30;None\|tuple\|int, activation=relu\|str`	A layer will take in several keyword arguments. You can provide these as `keyword1=value1\|type1, keyword2=value2\|type2...`. For lists, tuples and numpy arrays you can use the syntax `keyword=item1;item2;item3\|object_type\|value_type` and for dictionaries you can use `keyword=x:1;y:2\|dict\|key_type\|value_type`

Training the model

To train the model we need to prepare a temporary table that concatenates all features into a single string. This is due to Qlik currently requiring a fixed number of parameters for SSE function calls.

This table can be set up easily in the load script by concatenating fields using & and the delimiter |. Note that the delimiter must be |.

Remember to include a numerical identifier field which can be used by the SSE for sorting the data. Also, you must provide the fields in exactly the same order as specified in the feature definitions.

// Set up a temporary table for the training and test dataset
[TEMP_TRAIN_TEST]:
LOAD
   num([Date]) & '|' &
   [Count] & '|' &
   [Season] & '|' &
   [Month Number] & '|' &
   [Weekday] & '|' &
   [Weather] & '|' &
   [Temperature] & '|' &
   [Feels Like Temperature] & '|' &
   [Humidity] & '|' &
   [Wind Speed] as N_Features
RESIDENT [day]
WHERE Split = 'Train';

Now we are ready to train the model. The inputs to sklearn_Fit is the table setup earlier with the addition of a field for the model name.

[TEMP_SAMPLES]:
LOAD
   '$(vModel)' as Model_Name,
   N_Features
RESIDENT [TEMP_TRAIN_TEST];

[Result-Fit]:
LOAD
   model_name,
   result as fit_result, 
   time_stamp as fit_timestamp, 
   score_result, 
   score
EXTENSION PyTools.sklearn_Fit(TEMP_SAMPLES{Model_Name, N_Features});

You will see the progress of the training in the SSE terminal with the loss for each epoch indicating the error over the entire training set. As the training proceeds you should see the loss reduce and then stabilize if your model has reached a limit in learning from the training set.

Testing the model

For sequence prediction models we cannot use standard k-fold validation as we need to preserve order in the data. For this SSE, the recommended approach is to hold back a portion of the data, pass test_size=0 in the ExecutionArgs, and use the sklearn_Calculate_Metrics_Sequence function to get evaluation metrics.

The sklearn_Calculate_Metrics function takes in a new test dataset, with exactly the same features as the training data, and calculates the evaluation metrics. The Key field should be a unique numerical identifer that can be used to sort the data.

[Result-Metrics]:
LOAD *
EXTENSION PyTools.sklearn_Calculate_Metrics_Sequence(TEMP_SAMPLES_WITH_KEYS{Model_Name, Key, N_Features});

This function implements walk forward validation, which means that after each prediction we slide forward over the data by a given number of periods, and actual data for the previous periods is revealed to the model for use in the next prediction. This blog post provides further explanation.

If using lag values of the target as inputs, we use earlier predictons as input for further predictions. Since in practice, we would be able to provide actual values to the model, the evaluation should be performed in batches over limited periods. For example, the sample app calculates metrics for one month of data at a time and then averages the results across 6 months.

The approach is slightly different for multi-step predictions as the model is trained to make predictions for a fixed number of periods. In this case, we jump forward after each prediction period, and always use actual values for each prediction. In the sample app the Bikes-CausalConv-7d-v1 model applies this method.

// Other than the multi-step model we get evaluation metrics and predictions one month at a time.
// Since past values are inputs to these models, predicting over long periods will add up the errors from previous predictions.
// In practice, we can continue to provide actual values to the model.
// So we are going to get predictions one month at a time, supplying actual values for previous periods.
For Each vMonth in 7, 8, 9, 10, 11, 12

   // Prepare samples table with the current model name and unique keys
   [TEMP_SAMPLES_WITH_KEYS]:
   LOAD
      '$(vModel)' as Model_Name,
      Key,
      N_Features // Including Target for historical data. Target may be Null or zero for the periods to be predicted
   RESIDENT [TEMP_TEST]
   WHERE [Key] >= (MonthStart(MakeDate(2012, $(vMonth), 1))-31) and [Key] <= MonthEnd(MakeDate(2012, $(vMonth), 1)); 

   // Use the LOAD...EXTENSION syntax to call the Get_Metrics function
   [Result-Metrics]:
   LOAD *,
         MonthName(MakeDate(2012, $(vMonth), 1)) as MonthName
   EXTENSION PyTools.sklearn_Calculate_Metrics_Sequence(TEMP_SAMPLES_WITH_KEYS{Model_Name, Key, N_Features});

   Drop table TEMP_SAMPLES_WITH_KEYS;

Next vMonth

An alternative method for testing, is to pass the time_series_split parameter in the ExecutionArgs. This will get the SSE to split the data into the given number of series while preserving order, and then use the scikit-learn cross_validate function to obtain evaluation metrics during the call to sklearn_Fit. However, these metrics will be based on the standard prediction function which cannot use previous predictions as inputs. In addition, this would train multiple models, one for each split and then a final model with all the data, which can be very time consuming for neural networks.

The sklearn_Get_Metrics function can be used if the model was evaluated using cross validation. This function only requires the model name as the input.

[Result-Metrics]:
LOAD *
EXTENSION PyTools.sklearn_Get_Metrics(TEMP_SAMPLES{Model_Name});

Evaluation Metrics

The evaluation metrics available will depend on the testing method and whether the model is performing classification or regression. Refer to the table below for the output metrics.

Model Type	Testing Method	Metrics	Remarks
Classification	Hold-out	`accuracy`, `precision`, `recall`, `fscore`, `support`
Classification	Cross validation	`accuracy`, `precision`, `precision_std`, `recall`, `recall_std`, `fscore`, `fscore_std`	With cross validation, the metrics represent an average taken over subsets of data. The standard deviation helps indicate how actual performance of the model may vary on unseen data.
Regression	Hold-out	`r2_score`, `mean_squared_error`, `mean_absolute_error`, `median_absolute_error`, `explained_variance_score`
Regression	Cross validation	`r2_score`, `r2_score_std`, `mean_squared_error`, `mean_squared_error_std`, `mean_absolute_error`, `mean_absolute_error_std`, `median_absolute_error`, `median_absolute_error_std`, `explained_variance_score`, `explained_variance_score_std`	With cross validation, the metrics represent an average taken over subsets of data. The standard deviation helps indicate how actual performance of the model may vary on unseen data.

For classifiers you can also obtain a confusion matrix with the function sklearn_Get_Confusion_Matrix. For hold-out testing the matrix is calculated on the test dataset, while for cross validation the matrices are calculated for the test data in each fold and added up to give results across the entire training data.

[Result-ConfusionMatrix]:
LOAD *
EXTENSION PyTools.sklearn_Get_Confusion_Matrix(TEMP_SAMPLES{Model_Name});

Training History

You can obtain the training history for Keras models using the Keras_Get_History function. This function takes in the model name and returns the loss per epoch for the model recorded during model training. For more information on interpreting learning curves refer to this post.

// Use the LOAD...EXTENSION syntax to call the Keras_Get_History function
[Result-Keras]:
LOAD *
EXTENSION PyTools.Keras_Get_History(TEMP_MODEL{Model_Name});

Making predictions using the model

To make predictions you need to use a trained model. Trained models are found in the SSE under ../qlik-py-tools/qlik-py-env/models.

Convenience Functions

A list of models can be obtained using the sklearn_List_Models function. The input to this function is a string to search for within the directory. You can also provide an empty string to get a list of all files.

This function is meant to be used in chart expressions, for example it can be used a measure in a text object with the following expression.

'$(=PyTools.sklearn_List_Models(''))'

To make predictions you must provide exactly the same features in exactly the same order as the training data. And as explained in the section above, all features need to be concatenated into a single string. To help with this the function sklearn_Get_Features_Expression provides the expression that can be added to a variable in the load script.

TEMP_MODEL:
LOAD * INLINE [
    'Model_Name'
    'HR-Attrition-LR'
];

[FEATURES_EXPRESSION]:
LOAD
	result as features_expression
EXTENSION PyTools.sklearn_Get_Features_Expression(TEMP_MODEL{Model_Name});

vFeaturesExpression = peek('features_expression', 0, 'FEATURES_EXPRESSION');

Drop tables TEMP_MODEL;

Sequence Predictions

Predictions for sequential data can be made in a chart expression using the sklearn_Predict_Sequence function. For classifiers you can use the sklearn_Predict_Proba_Sequence function to get the predicted class probabilities.

The input to the function are the model name, a unique numerical identifier and the features. The features should include the identifier as well as the target if using lag_target=True. The target can be NULL or zero for future periods.

PyTools.sklearn_Predict_Sequence(
	'Bikes-CausalConv-v1', 
	num(Date), 
    num([Date]) & '|' & Count & '|' & Season & '|' & [Month Number] & '|' & Weekday & '|' & Weather & '|' & Temperature & '|' & [Feels Like Temperature] & '|' &  Humidity & '|' & [Wind Speed]
    )
// Count can be Null or zero for future periods. The predicted values from previous periods will then be used as input for further predictions.

Note: As of the June 2018 release, the expression above can be used as a dimension in the table object, allowing you to make selections on the prediction results. However, the prediction expression should not be stored as a master item dimension as this can lead to unstable behavior.

Predictions can also be made in the load script using the sklearn_Bulk_Predict method. For classifiers you can also use the sklearn_Bulk_Predict_Proba function to get the predicted class probabilities.

The input to the bulk prediction functions are the model name, a unique numerical identifier and the features. The features should include the identifier as well as the target if using lag_target=True. The target can be NULL or zero for future periods. The key field is returned in the response and can be used to link the predictions to the data model.

// Prepare samples table with the current model name and unique keys
[TEMP_SAMPLES_WITH_KEYS]:
LOAD
   '$(vModel)' as Model_Name,
   Key,
   N_Features // Including Target for historical data. Target may be Null or zero for the periods to be predicted
RESIDENT [TEMP_TEST]
WHERE [Key] >= (MonthStart(MakeDate(2012, $(vMonth), 1))-31) and [Key] <= MonthEnd(MakeDate(2012, $(vMonth), 1));

// Use the LOAD...EXTENSION syntax to call the Bulk_Predict function
[Predictions]:
LOAD
   model_name,
   Date(key) as [Date],
   prediction
EXTENSION PyTools.sklearn_Bulk_Predict_Sequence(TEMP_SAMPLES_WITH_KEYS{Model_Name, Key, N_Features});

If using lag values of the target as inputs, we use earlier predictons as input for further predictions which can cause errors to add up over time. Since in practice, we would be able to provide actual values to the model, the predictions should be performed in batches over limited periods. For example, the sample app gets predictions in the load script for one month of data at a time. In the Walk Forward Predictions sheet the SSE produces 7 days of predictions.

The approach is slightly different for multi-step predictions as the model is trained to make predictions for a fixed number of periods. In this case, we jump forward after each prediction period, and always use actual historical values for each prediction. For example, we can make predictions for a week, then jump ahead 7 days and use actual values for the previous week to make predictions for the next week. In the sample app the Bikes-CausalConv-7d-v1 model applies this method in the load script and on the Multi-Step Predictions sheet.

Sample Keras architectures

A key factor in getting good results from a neural network is the model architecture. While this often requires experimentation, you can get a good start by referring to published models. Keras provides several examples in its documentation.

The integration with this SSE has been build with sequence prediction and timeseries forecasting problems in mind. For such problems Convolutional Neural Networks (CNNs) and hybrid CNN and Long Short Term Memory networks (LSTMs) look most promising based on my research. Convolution, especially, seems to be key as stacking such layers helps identify higher level features in the time series across the lag observations.

Convolutional LSTM

Here is one Convolutional LSTM architecture used in the sample app. Note the lag settings and the input shape in the script below.

Each sample is going to consist of 29 lag periods. An additional row of data is needed as we are using lag observations of the target, i.e. for the 29th row we need to look at the 30th row's target. Finally, since the current_sample_as_input parameter is set to true by default, we will use another row of data which will be the period for which we make the prediction. The input from Qlik will be 31 rows of data.

These 31 rows will be reshaped by the SSE to the input shape set in the first layer. In this case the data will be reshaped to three timesteps of ten steps each with n features.

We use a TimeDistributed layer wrapper to add a time dimension to the 1D Convolutional layers which feeds into the LSTM layers. The final Dense layer has a single output node which will be the prediction for the last period in our input.

[Estimators]:
LOAD * INLINE [
Model Name;Estimator;Estimators.Hyperparameters;Execution Args
Bikes-ConvLSTM-v1;KerasRegressor;epochs=150|int, batch_size=14|int;test_size=0,lags=29,lag_target=true,scale_target=false
](delimiter is ';');

[Keras Architecture]:
LOAD * INLINE [
Model Name:Sort Order:Layer Type:Args:Kwargs
Bikes-ConvLSTM-v1:1:TimeDistributed Conv1D:100|int, 2|int:input_shape=3;10;None|tuple|int, activation=relu|str
Bikes-ConvLSTM-v1:2:TimeDistributed Conv1D:100|int, 2|int:activation=relu|str
Bikes-ConvLSTM-v1:3:TimeDistributed MaxPooling1D::pool_size=2|int
Bikes-ConvLSTM-v1:4:TimeDistributed Flatten::
Bikes-ConvLSTM-v1:5:LSTM:100|int:return_sequences=true|bool, activation=relu|str
Bikes-ConvLSTM-v1:6:LSTM:50|int:activation=relu|str
Bikes-ConvLSTM-v1:7:Dense:1|int:
Bikes-ConvLSTM-v1:8:Compilation::loss=mean_squared_error|str, optimizer=adam|str
](delimiter is ':');

The sample app includes another variant without the TimeDistributed layer wrapper. This model has three stacks of convolutional layers which can theoretically allow the model to learn higher level features from the inputs.

[Estimators]:
LOAD * INLINE [
Model Name;Estimator;Estimators.Hyperparameters;Execution Args
](delimiter is ';');

[Keras Architecture]:
LOAD * INLINE [
Model Name:Sort Order:Layer Type:Args:Kwargs
Bikes-ConvLSTM-v2:1:Conv1D:100|int, 2|int:input_shape=30;None|tuple|int, activation=relu|str
Bikes-ConvLSTM-v2:2:Conv1D:100|int, 2|int:activation=relu|str
Bikes-ConvLSTM-v2:3:MaxPooling1D::pool_size=2|int
Bikes-ConvLSTM-v2:4:Conv1D:100|int, 2|int:activation=relu|str
Bikes-ConvLSTM-v2:5:Conv1D:100|int, 2|int:activation=relu|str
Bikes-ConvLSTM-v2:6:MaxPooling1D::pool_size=2|int
Bikes-ConvLSTM-v2:7:Conv1D:100|int, 2|int:activation=relu|str
Bikes-ConvLSTM-v2:8:Conv1D:100|int, 2|int:activation=relu|str
Bikes-ConvLSTM-v2:9:MaxPooling1D::pool_size=2|int
Bikes-ConvLSTM-v2:10:LSTM:100|int:return_sequences=true|bool, activation=relu|str
Bikes-ConvLSTM-v2:11:LSTM:50|int:activation=relu|str
Bikes-ConvLSTM-v2:12:Dense:1|int:
Bikes-ConvLSTM-v2:13:Compilation::loss=mean_squared_error|str, optimizer=adam|str
](delimiter is ':');

Causal CNN

We can introduce causality and autoregression into a neural network using causal convolution layers. This is obviously an important consideration in timeseries and sequence prediction models and the Causal CNN variants are our preferred models in the sample app.

The architecture below is inspired from the posts here and here. Note the padding and dilation_rate arguments which set up this model to only apply 1D convolutions in the forward direction.

[Estimators]:
LOAD * INLINE [
Model Name;Estimator;Estimators.Hyperparameters;Execution Args
Bikes-CausalConv-v1;KerasRegressor;epochs=150|int, batch_size=14|int;test_size=0,lags=29,lag_target=true,scale_target=false
](delimiter is ';');

[Keras Architecture]:
LOAD * INLINE [
Model Name:Sort Order:Layer Type:Args:Kwargs
Bikes-CausalConv-v1:1:Conv1D:64|int, 2|int:input_shape=30;None|tuple|int, padding=causal|str, dilation_rate=1|int, activation=relu|str
Bikes-CausalConv-v1:2:Conv1D:64|int, 2|int:padding=causal|str, dilation_rate=2|int, activation=relu|str
Bikes-CausalConv-v1:3:Conv1D:64|int, 2|int:padding=causal|str, dilation_rate=4|int, activation=relu|str
Bikes-CausalConv-v1:4:Conv1D:64|int, 2|int:padding=causal|str, dilation_rate=8|int, activation=relu|str
Bikes-CausalConv-v1:5:Conv1D:64|int, 2|int:padding=causal|str, dilation_rate=16|int, activation=relu|str
Bikes-CausalConv-v1:6:Flatten::
Bikes-CausalConv-v1:7:Dense:512|int:activation=relu|str
Bikes-CausalConv-v1:8:Dense:256|int:activation=relu|str
Bikes-CausalConv-v1:9:Dropout:0.2|float:
Bikes-CausalConv-v1:10:Dense:1|int:
Bikes-CausalConv-v1:11:Compilation::loss=mean_squared_error|str, optimizer=adam|str
](delimiter is ':');

Multi-Step Causal CNN

The final model in the sample app is set up to take 30 days of input and return 7 days of predictions. Note the prediction_periods=7 argument in the ExecutionArgs and the 7 nodes in the final layer.

Also note the current_sample_as_input=false argument which sets up this model to only use lag observations as inputs. This lets us define models where we have no future information on any of the input variables.

[Estimators]:
LOAD * INLINE [
Model Name;Estimator;Estimators.Hyperparameters;Execution Args
Bikes-CausalConv-7d-v1;KerasRegressor;epochs=150|int, batch_size=14|int;test_size=0,prediction_periods=7,current_sample_as_input=false,lags=30,lag_target=true,scale_target=false
](delimiter is ';');

[Keras Architecture]:
LOAD * INLINE [
Model Name:Sort Order:Layer Type:Args:Kwargs
Bikes-CausalConv-7d-v1:1:Conv1D:64|int, 2|int:input_shape=30;None|tuple|int, padding=causal|str, dilation_rate=1|int, activation=relu|str
Bikes-CausalConv-7d-v1:2:Conv1D:64|int, 2|int:padding=causal|str, dilation_rate=2|int, activation=relu|str
Bikes-CausalConv-7d-v1:3:Conv1D:64|int, 2|int:padding=causal|str, dilation_rate=4|int, activation=relu|str
Bikes-CausalConv-7d-v1:4:Conv1D:64|int, 2|int:padding=causal|str, dilation_rate=8|int, activation=relu|str
Bikes-CausalConv-7d-v1:5:Conv1D:64|int, 2|int:padding=causal|str, dilation_rate=16|int, activation=relu|str
Bikes-CausalConv-7d-v1:6:Flatten::
Bikes-CausalConv-7d-v1:7:Dense:512|int:activation=relu|str
Bikes-CausalConv-7d-v1:8:Dense:256|int:activation=relu|str
Bikes-CausalConv-7d-v1:9:Dropout:0.2|float:
Bikes-CausalConv-7d-v1:10:Dense:7|int:
Bikes-CausalConv-7d-v1:11:Compilation::loss=mean_squared_error|str, optimizer=adam|str
](delimiter is ':');

Input Specifications

This sections provides the specific syntax required for arguments to the sklearn and keras functions.

In general, arguments are passed as strings to this SSE using the syntax argument=value. Multiple arguments within a string have to be separated by a comma, with single quotes around the entire string. For example:

// Set up a variable for execution parameters
LET vExecutionArgs = 'overwrite=true,test_size=0,lags=30,lag_target=true,debug=true';

Specifying keyword arguments for scikit-learn and Keras

Many of the functions defined in this SSE accept additional keyword arguments for the relevant scikit-learn or Keras class. You should refer to the scikit-learn and Keras APIs for documentation on the parameters available for each class or layer.

For the simple data types bool, int, float, and str, the format for such keyword arguments must take the form arg=value|type. Such arguments can be combined into a single string with the named parameters described in the sections below. For example:

'scaler=StandardScaler, with_mean=true|bool, with_std=true|bool'

In addition this SSE accepts lists, arrays and tuples with the syntax arg=item1;item2;item3|object_type|value_type. For example:

'arg=1;2;3|list|int'
'arg=x;y;z|array|str'
'arg=0;1|tuple|int'

Finally dictionaries are accepted with the syntax arg=x:1;y:2|dict|key_type|value_type. For example:

'arg=x:1;y:2|dict|str|int'

Execution Arguments

These arguments define the flow for the model. These settings are made once during the model setup and apply to subsequent calls to the model where relevant.

Keyword	Description	Sample Values	Remarks
overwrite	Specify whether any existing model with the same name should be overwritten	`true`, `false`	Defaults to `true`.
test_size	Set the ratio that will be used to split the samples into training and testing data sets	`0.3`, `0`	Defaults to `0.33`. It is recommended to set this to `0` when using sequence predictions and use a separate call with test data for evaluation. Refer to the Testing the model section for further explanation.
time_series_split	Enable cross validation	`3`	Defaults to `0` in which case the hold-out testing strategy is used as per `test_size`. The value represents the time series splitting strategy as defined in the scikit-learn TimeSeriesSplit class. Refer to the Testing the model section for further explanation.
lags	Add the given number lag observations to each sample.	`30`	This is a key parameter for Keras models and can be used to frame timeseries and sequence prediction problems into 3D or 4D data required for deep learning. For e.g. if lags=10, features from the previous ten samples will be concatenated as input features for the current sample. An identifier field must be included in the feature definitions to correctly sort the data for this capability. Defaults to zero lags.
lag_target	Include targets in the lag observations	`true`	Defaults to `false`. If True an additional feature will be created for each sample using the previous value of the target. This is a key input for autoregressive models. Note that setting this to true will mean that an additional row of input data is required. For example, with 29 lags the 29th row will need to refer to the 30th row's target.
scale_target	Scale the target before fitting	`true`	Defaults to `false`. The scaling will be inversed so that predictions are returned in the original scale.
make_stationary	Make the target series more stationary	`log`	Valid values are `log` in which case we apply a logarithm to the target values, or `difference` in which case we transform the targets into variance from the previous value. The transformation will be reversed before returning predictions.
current_sample_as_input	Specify if the current sample should be used as input to the model	`false`	Defaults to `true`. Setting this to `false` allows for models that only use lag observations to make future predictions.
prediction_periods	Specify the number of predictions expected from the model	`7`	Defaults to `1`. This can be used to get a model to predict the next m periods given inputs for the previous n periods. This is only valid for Keras models which have a final output layer with more than one node.
random_state	Seed used by the random number generator when generating the training testing split	`42`	Defaults to `42`. Must be an integer.
compress	Compression level between 1-9 used by joblib when saving the model	`1`	Defaults to `3`.
retain_data	Flag to determine if the training and test data should be saved in the model	`true`, `false`	Defaults to `false` as this adds to the size of the model on disk.
debug	Flag to output additional information to the terminal and logs	`true`, `false`	Defaults to `false`. Information will be printed to the terminal as well to a log file: `qlik-py-tools\qlik-py-env\core\logs\SKLearn Log <n>.txt`.

Scaler Arguments

These arguments specify how numerical data will be standardized before being sent to the machine learning algorithm. Standardization of the data is a common requirement for machine learning algorithmns and helps avoid bias. If you want to use default values you can simply pass an empty string for ScalerArgs when setting up the model.

In this implementation we use the sklearn.preprocessing package for scaling data.

Keyword	Description	Sample Values	Remarks
scaler	scikit-learn class that will be used for scaling numeric data	`StandardScaler`, `MinMaxScaler`, `MaxAbsScaler`, `RobustScaler`, `QuantileTransformer`	Defaults to `StandardScaler`. Mandatory argument if the scaler arguments are not an empty string.
missing	Strategy to use for missing/null values	`mean`, `median`, `mode`, `zeros`, `none`	Defaults to `zeros`. This setting only applies to numerical features. If you want Null values to be taken into consideration for text features, replace them with an appropriate string in Qlik.
scale_hashed	Whether to scale hashed features	`true`, `false`	Defaults to `true`. At times machine learning requires trial and error. You may want to control this setting and see the impact on the results.
scale_vectors	Whether to scale count vectorized and TF-IDF vectorized features	`true`, `false`	Defaults to `true`. Same as above. You may want to control this setting and see the impact on the results.

In addition to the standard parameters above, you can provide any valid key word arguments accepted by the scikit-learn preprocesing class specified under the scaler argument above. Refer to the specifictions under Specifying keyword arguments for scikit-learn classes

For more information on available parameters refer to the scikit-learn API.

Estimator Arguments

For Keras you can use KerasRegressor for regression problems and KerasClassifier for classification problems. Valid arguments are this for the Keras fit method documented here.

Additional arguments for the estimator should be included in the estimator_args string when calling the PyTools.sklearn_Setup or PyTools.sklearn_Setup_Adv functions. The syntax for this is described under Specifying keyword arguments for scikit-learn and Keras.

Metrics Arguments

Model evaluation metrics are calculated by the PyTools.sklearn_Fit and PyTools.sklearn_Calculate_Metrics_Sequence functions.

For classification, metrics are calculated using the scikit-learn class metrics.precision_recall_fscore_support. The accuracy is calculated using the metrics.accuracy_score method. When using cross validation the metrics are calculated using the methods precision_score, recall_score, f1_score.

For regression, metrics are calculated using metrics.r2_score, metrics.mean_squared_error, metrics.mean_absolute_error, metrics.median_absolute_error, metrics.explained_variance_score.

Arguments for the metrics other than those coming from the estimator's score method can be provided by using the PyTools.sklearn_Setup_Adv function instead of the standard PyTools.sklearn_Setup function.

[Result-Setup]:
LOAD
   model_name,
   result,
   timestamp
EXTENSION PyTools.sklearn_Setup_Adv(MODEL_INIT{Model_Name, EstimatorArgs, ScalerArgs, MetricArgs, DimReductionArgs, ExecutionArgs});

Metric arguments should be specified as a string using the syntax described under Specifying keyword arguments for scikit-learn and Keras.

Attribution

The data used in the sample app was obtained from https://www.kaggle.com:

Bike Sharing in Washington D.C. Dataset

Files

Keras.md

Latest commit

History