Data\Implementing BaseData

Gerardo Salazar edited this page Jul 3, 2020 · 1 revision

Introduction

All data that can be added and used in LEAN derives from the BaseData class, with the exception of static files, which can be downloaded using the Download method in your algorithm.

BaseData was created with the intention of generically supporting any recurring time-series data, including custom user data that is not provided by QuantConnect.

Compared to static file downloads using Download, implementing BaseData should be considered when you are dealing with any of the following.

  • Recurring data that spans multiple days
  • Custom data that must be simulated in a backtest similarly to price data
  • Custom data that you plan to associate with equity or option tickers

To get started, there are a few guidelines your data should follow when you consider implementing a new data source.

  1. You understand the data you plan on implementing
  2. The data has no look-ahead bias
  3. Each data point fits on a single line
    • If you cannot represent your data in a single line because of its shape (such as deeply nested objects), you can try representing the data as JSON. Note that the data must still be contained in one line, with no line breaks, so that it can be read completely into the Reader method for parsing.
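As a quick illustration of the single-line requirement, nested data can be flattened with standard JSON serialization. This is a plain-Python sketch with hypothetical field names, not LEAN code:

```python
import json

# A nested record that cannot be expressed as one flat CSV line
record = {"time": "20190101 00:00:00", "sentiment": {"bull": 0.5, "bear": 0.5}}

# json.dumps produces a single line with no line breaks by default,
# which is the form the Reader method expects to receive
line = json.dumps(record)
assert "\n" not in line

# Reader can then recover the full nested structure from that one line
parsed = json.loads(line)
assert parsed["sentiment"]["bull"] == 0.5
```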

Quick Start

If you are using a static file, such as a machine learning model hosted on Dropbox, you can use the Download method in your algorithm to download the data. An example is provided below.

public class MyCustomAlgorithm : QCAlgorithm {
    public override void Initialize() {
        var myCustomModel = Download("https://<YOUR_SITE_GOES_HERE>/<FILE>");
        // Deserialize `myCustomModel` and load into a framework
    }
}
class MyCustomAlgorithm(QCAlgorithm):
    def Initialize(self):
        myCustomModel = self.Download("https://<YOUR_SITE_GOES_HERE>/<FILE>")
        # Unpickle `myCustomModel` and load into a framework

If you are using a remote data source for your algorithm such as data hosted on an API, you can use the following template to get started.

using NodaTime;
using QuantConnect;
using QuantConnect.Data;
using QuantConnect.Util;
using System;
using System.Collections.Generic;
using System.IO;

namespace QuantConnect.Algorithm.CSharp {
    // Algorithm implementation goes here
    // ...
    // ...
    // ...

    public class MyCustomDataSource : BaseData {
        // define your values here. Examples: BullSentiment, BearSentiment

        // Instructs LEAN to look for data at the given URL or location on disk.
        public override SubscriptionDataSource GetSource(SubscriptionDataConfig config, DateTime date, bool isLiveMode) {
            return new SubscriptionDataSource(
                "<LOCATION>",                         // Location of the data. Can be a path or URL.
                SubscriptionTransportMedium.<SOURCE>, // Specifies where to read the data from the source
                FileFormat.<FORMAT>                   // Specifies how to read the file
            );
        }

        // This will have to be implemented by you since almost all data sources differ in the way we parse them.
        public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, bool isLiveMode) {
            // Your implementation goes here
        }

        // We include the Clone method override to ensure that our data implementation
        // is robust. Including the Clone method makes the custom data implementation 
        // more durable against failure.
        public override BaseData Clone() {
            return new MyCustomDataSource {
                // Don't forget to copy these two properties over
                EndTime = EndTime,
                Symbol = Symbol,

                // Copy your fields here
            };
        }

        // Specifies the time zone for this data source. This is useful for custom data types
        public override DateTimeZone DataTimeZone()
        {
            // Select the time zone of your data here.
            // Defaults to New York timezone if no implementation is provided
            // Example: 
            return TimeZones.Utc;
        }

        // Indicates whether this data type is linked to an underlying equity Symbol.
        public override bool RequiresMapping() {
            // If your data has a relationship with equities via their tickers,
            // you should set this to true. 
            
            // Example: SEC filings are based on stock tickers, so it should be `true`
            // Example: sentiment data tickers are related to stock tickers, so it should be `true`.
            // Example: Federal Reserve data tickers are not related to stock tickers, so it should be `false`.
            // Example: Weather data has no relationship to equities, so it should be `false`

            // Setting to `false` for example purposes
            return false;
        }

        // Sets the default resolution of this data source.
        public override Resolution DefaultResolution() {
            // Setting to `Resolution.Minute` for example purposes
            return Resolution.Minute;
        }

        // Sets the supported resolutions for this data source.
        public override List<Resolution> SupportedResolutions() {
            // Setting to all resolutions for example purposes
            return AllResolutions;
        }
    }
}
from datetime import datetime
from QuantConnect import *
from QuantConnect.Data import *
from QuantConnect.Python import PythonData

# Algorithm implementation goes here
# ...
# ...
# ...

class MyCustomDataSource(PythonData):
    def GetSource(self, config, date, isLiveMode):
        '''
        Instructs LEAN to look for data at the given URL or location on disk.
        '''

        return SubscriptionDataSource(
            "<LOCATION>",                         # Location of the data. Can be a path or URL.
            SubscriptionTransportMedium.<SOURCE>, # Specifies where to read the data from the source
            FileFormat.<FORMAT>                   # Specifies how to read the file
        )

    def Reader(self, config, line, date, isLiveMode):
        '''
        This will have to be implemented by you since almost all data sources differ in the way we parse them.
        '''
        # Here we parse the custom fields that we've implemented. Define your values here. Example:
        # instance["BullSentiment"] = ...
        # instance["BearSentiment"] = ...
        pass

    # Python doesn't require the implementation of the `Clone` method.
    # It is important that you do not override the `Clone` method in Python.

    def DataTimeZone(self):
        '''
        Select the time zone of your data here.
        Defaults to New York timezone if no implementation is provided.
        '''
        # Setting to "UTC" for example purposes
        return TimeZones.Utc

    def RequiresMapping(self):
        '''
        Indicates whether this data type is linked to an underlying equity Symbol.
        '''
        # If your data has a relationship with equities via their tickers,
        # you should set this to True. 
        
        # Example: SEC filings are based on stock tickers, so it should be `True`
        # Example: sentiment data tickers are related to stock tickers, so it should be `True`.
        # Example: Federal Reserve data tickers are not related to stock tickers, so it should be `False`.
        # Example: Weather data has no relationship to equities, so it should be `False`

        # Setting to `False` for example purposes
        return False

    def DefaultResolution(self):
        '''
        Sets the default resolution of this data source.
        '''
        # Setting to `Resolution.Minute` for example purposes
        return Resolution.Minute

    def SupportedResolutions(self):
        '''
        Sets the supported resolutions for this data source.
        '''
        # Setting to all resolutions for example purposes
        return self.AllResolutions

Implementing a Custom Data Type

What is BaseData?

BaseData is the base class that defines the structure used to represent data in LEAN. All data, including equities, forex, crypto, futures, options, and CFDs, uses an implementation of BaseData to transmit data to the user building or running an algorithm. Because you cannot use BaseData directly, we must provide an implementation of BaseData in order to load our data into LEAN. You can view the fields and properties of BaseData in the table below:


TABLE GOES HERE


BaseData Fields and Properties

BaseData includes overridable properties as well as some default properties provided for convenience. The most used and important properties of BaseData are:

  • Value: decimal - Data value at the given EndTime. Useful when you have a single series of data rather than a DataFrame-like structure.
  • Time: DateTime - Beginning time of the data point.
  • EndTime: DateTime - Time at which the data was emitted. By default, this returns the value of Time. This property is overridable in case we need to express a starting time with Time and the emit time with EndTime separately.
  • Symbol: Symbol - Associates the data with a Symbol object. An example would be AAPL sentiment data: the Symbol object would be representative of an equity such as AAPL, FB, etc., depending on which assets you've subscribed to.

The rest of the BaseData properties are left for internal/specific use cases only.
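The relationship between these properties can be sketched in plain Python. This is an illustrative stand-in, not LEAN's actual class:

```python
from datetime import datetime

class BaseDataSketch:
    """Illustrative stand-in for LEAN's BaseData; not the real class."""

    def __init__(self, time, value=0.0, symbol=""):
        self.Time = time      # beginning time of the data point
        self.Value = value    # data value at EndTime
        self.Symbol = symbol  # stand-in for LEAN's Symbol object

    @property
    def EndTime(self):
        # By default, EndTime simply returns Time
        return self.Time

point = BaseDataSketch(datetime(2019, 1, 1), value=0.5, symbol="AAPL")
assert point.EndTime == point.Time
```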

BaseData Overridable Methods

BaseData contains a handful of virtual methods, which means that these methods are overridable. The most important virtual methods provided in BaseData are:

  • GetSource - Tells LEAN where to locate your data
  • Reader - Creates a new instance of your BaseData implementation using data from disk. This is responsible for parsing data and converting it into a usable representation for your algorithm in LEAN.
  • Clone - Creates a clone of the data (deep copy)

Optional methods

  • RequiresMapping - Tells LEAN if rename events apply to this data source. Defaults to true if the Symbol SecurityType is equity or option
  • DataTimeZone - Tells LEAN what timezone this data source is in. Defaults to New York time.
  • IsSparseData - Tells LEAN whether to log for missing files if the data source is sparse (i.e. data missing between data points). Defaults to true if the data source is custom data.
  • ToString - Converts the instance to a string. Defaults to Symbol: Value (e.g. AAPL: 0.5)

GetSource

This method instructs LEAN on where to locate your data and which medium the data is in via the class SubscriptionDataSource. GetSource has three parameters passed to it: config, date, and isLiveMode.

  • The config parameter contains all of the data associated with an added equity, forex, crypto, future, option, cfd, or custom data such as the SecurityType, Resolution, Market, and Symbol. It is included to help you locate the data you want depending on the configuration you provided when the data was initially added to the algorithm. Its type is SubscriptionDataConfig.

  • The date parameter is the date for which the engine is requesting data. It is included to help you determine what date to load data for. Its type is DateTime.

  • The isLiveMode parameter is included to help you determine whether the algorithm is running live. This lets you decide which source to load data from, in case the live data source differs from the backtesting one. Its type is bool.

You can view SubscriptionDataConfig fields/properties in the table below.


TABLE GOES HERE


To return a SubscriptionDataSource to LEAN, we must first specify the FileFormat and SubscriptionTransportMedium. An explanation of the two types is provided below.

  • SubscriptionTransportMedium - describes where the data is stored (e.g. local disk, remote). Tells LEAN where and how to find the data
  • FileFormat - describes the format the file is in (e.g. CSV). Tells LEAN how to pump the data to Reader

Reader

This method is the main place where all the parsing of the data will take place. In this method, we will convert the raw data from the source into a class usable by LEAN.

Reader has four parameters passed to it: config, line, date, and isLiveMode.

If FileFormat.Csv was selected, data will be split into lines and be individually passed into Reader. This means that your Reader method will be called n times depending on how many lines your file contains. It is important that your Reader method is robust and your data is clean to prevent any errors from occurring.
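The line-by-line dispatch described above can be sketched in plain Python. This uses a hypothetical reader function, not LEAN's actual engine code:

```python
# Hypothetical two-line CSV payload; in LEAN this would come from GetSource
raw = "20190101 00:00:00,0.5,0.5\n20190101 00:01:00,0.6,0.4"

def reader(line):
    """Stand-in for Reader: parse one CSV line into a dict."""
    time, bull, bear = line.split(",")
    return {"time": time, "bull": float(bull), "bear": float(bear)}

# The engine splits the file and calls Reader once per line (n calls for n lines)
points = [reader(line) for line in raw.splitlines()]
assert len(points) == 2
assert points[1]["bull"] == 0.6
```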

The various parameters passed to Reader are:

  • The config parameter contains all of the data associated with an added equity, forex, crypto, future, option, cfd, or custom data such as the SecurityType, Resolution, Market, and Symbol. It is included to help you locate the data you want depending on the configuration you provided when the data was initially added to the algorithm and has the type SubscriptionDataConfig.

  • The line parameter will contain a line of data from the data source specified in GetSource. This data is provided to you so that you can parse it. Its type is string.

  • The date parameter is the date for which the engine is requesting data. It is included to inform you of the time being requested. Its type is DateTime.

  • The isLiveMode parameter is included to help you determine whether the algorithm is running live. This lets you decide whether to parse the data differently when trading live, since the data might come from another source. Its type is bool.

You can view SubscriptionDataConfig fields/properties in the table below.


TABLE GOES HERE


An example of a Reader implementation that parses CSV data is provided below:

public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, bool isLiveMode) {
    // Assuming our CSV is as follows:
    // TIME (yyyyMMdd HH:mm:ss), BullSentiment, BearSentiment 
    var csv = line.Split(',');

    // Since MyCustomDataSource derives from BaseData, it is valid as a return type
    return new MyCustomDataSource {
        // This is the emit time, i.e. the time that the algorithm will output the event.
        // Ensure you have this value set
        EndTime = Parse.DateTimeExact(csv[0], "yyyyMMdd HH:mm:ss"),
        // This is the Symbol associated with the data. Usually should be set to `config.Symbol`
        // Ensure you have this value set. 
        Symbol = config.Symbol,

        // Here we parse the custom fields that we've implemented.
        BullSentiment = Parse.Decimal(csv[1]),
        BearSentiment = Parse.Decimal(csv[2])
    };
}
def Reader(self, config, line, date, isLiveMode):
    '''
    This will have to be implemented by you since almost all data sources differ in the way we parse them.
    Below we've provided an example showing how to correctly and idiomatically parse the data
    '''

    # Assuming our CSV is as follows:
    # TIME (yyyyMMdd HH:mm:ss), BullSentiment, BearSentiment 
    csv = line.split(",")

    # We must create an instance first to add our own custom data
    instance = MyCustomDataSource()

    # This is the emit time, i.e. the time that the algorithm will output the event.
    # Ensure you have this value set.
    instance.EndTime = datetime.strptime(csv[0], "%Y%m%d %H:%M:%S")

    # This is the Symbol associated with the data. Usually should be set to `config.Symbol`
    # Ensure you have this value set.
    instance.Symbol = config.Symbol

    # Here we parse the custom fields that we've implemented. Define your values here
    instance["BullSentiment"] = float(csv[1])
    instance["BearSentiment"] = float(csv[2])

    return instance

An example of a Reader implementation that parses JSON data is provided below. Please note that the JSON data must not contain any new lines/line breaks (i.e. data must be in a single line).

using Newtonsoft.Json;
using QuantConnect.Data;
using QuantConnect.Util;
using System;

namespace QuantConnect.Algorithm.CSharp {
    public class MyCustomDataSource : BaseData {

        [JsonProperty("bull_sentiment")]
        public decimal BullSentiment { get; set; }

        [JsonProperty("bear_sentiment")]
        public decimal BearSentiment { get; set; }

        // Because we have a format that Json.NET can't parse, we need to define
        // the format of the date. You can use the `DateTimeJsonConverter` class
        // to define the format of the date easily.
        [JsonProperty("time"), JsonConverter(typeof(DateTimeJsonConverter), "yyyyMMdd HH:mm:ss")]
        public override DateTime EndTime { get; set; }


        public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, bool isLiveMode) {
            // Assuming our JSON data is as follows:
            // {"time": "20190101 00:00:00", "bull_sentiment": 0.5, "bear_sentiment": 0.5}
            // If we're going to be parsing JSON, use Newtonsoft's JSON parser and decorate your types with `JsonProperty`

            // Use the current class name (MyCustomDataSource) as the type parameter.
            // Replace the type parameter with the name of your class.
            var instance = JsonConvert.DeserializeObject<MyCustomDataSource>(line);

            // This is the Symbol associated with the data. Usually should be set to `config.Symbol`
            // Ensure you have this value set. 
            instance.Symbol = config.Symbol;

            return instance;
        }
    }
}
import json
from datetime import datetime
from QuantConnect.Python import PythonData

class MyCustomDataSource(PythonData):
    def Reader(self, config, line, date, isLiveMode):
        # Assuming our JSON data is as follows:
        # {"time": "20190101 00:00:00", "bull_sentiment": 0.5, "bear_sentiment": 0.5}
        data = json.loads(line)

        instance = MyCustomDataSource()

        # Parse the time and set EndTime equal to it.
        # Ensure you have this value set.
        instance.EndTime = datetime.strptime(data["time"], "%Y%m%d %H:%M:%S")

        # This is the Symbol associated with the data. Usually should be set to `config.Symbol`
        # Ensure you have this value set. 
        instance.Symbol = config.Symbol

        instance["BullSentiment"] = data["bull_sentiment"]
        instance["BearSentiment"] = data["bear_sentiment"]

        return instance

Clone

To ensure we have a robust BaseData type, we must implement the Clone method (only in C#). The Clone method is called by the LEAN engine to create a copy of the data so that the original data is not altered.

In addition, we guarantee a higher degree of robustness to your custom data source if you implement this method. More information can be found under the Debugging::Why is my data all null? section.

To implement the Clone method, simply copy all the values found in your data into a new instance. The only property that is modified is the Time property, but because it is a DateTime value type, assignment guarantees a full copy.

In Python, do not override Clone.
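The copy-every-field idea behind Clone can be sketched in plain Python, using a hypothetical Sentiment class rather than LEAN code:

```python
from datetime import datetime

class Sentiment:
    """Hypothetical custom data type; not LEAN code."""

    def __init__(self, time, symbol, bull):
        self.Time = time
        self.Symbol = symbol
        self.Bull = bull

    def clone(self):
        # Copy every field into a fresh instance so the engine can
        # mutate the copy without altering the original
        return Sentiment(self.Time, self.Symbol, self.Bull)

a = Sentiment(datetime(2019, 1, 1), "AAPL", 0.5)
b = a.clone()
b.Bull = 0.9
assert a.Bull == 0.5     # the original is untouched
assert a.Time == b.Time  # datetime is immutable, so sharing it is safe
```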

An example of the Clone method implementation is provided below.

// We include the Clone method override to ensure that our data implementation
// is robust. Including the Clone method makes it more durable against failure.
public override BaseData Clone() {
    return new MyCustomDataSource {
        // Don't forget to copy these two properties over
        Time = Time,
        Symbol = Symbol,

        BullSentiment = BullSentiment,
        BearSentiment = BearSentiment
    };
}

DataTimeZone

This method informs LEAN what time zone this data source is in. To set a timezone, use the QuantConnect.TimeZones static class to select the appropriate time zone for your data source. This ensures that the time is assigned the proper timezone so that it can be emitted at the right time. An example implementing this method is provided below.

// Specifies the time zone for this data source. This is useful for custom data types
public override DateTimeZone DataTimeZone()
{
    // Select the time zone of your data here
    return TimeZones.Utc;
}
from QuantConnect import TimeZones

def DataTimeZone(self):
    return TimeZones.Utc

This method will default to TimeZones.NewYork if no implementation is provided.

RequiresMapping

This method informs LEAN whether the data source has a relationship with equity Symbols. A few items to help you determine whether you should enable RequiresMapping are provided below.

  • The custom data source you are implementing is for equities or options.
  • The custom data source you are implementing uses the same Symbols/tickers as equities (e.g. AAPL for equities and AAPL for custom data)

If you checked both of these boxes, you should set RequiresMapping to true. Otherwise, set it to false, as the data has no relationship to equities or options.

An example implementation of this method is provided below:

// Indicates whether this data type is linked to an underlying Symbol.
public override bool RequiresMapping() {
    return true;
}
# Indicates whether this data type is linked to an underlying Symbol
def RequiresMapping(self):
    return True

Note that this method will default to true if the underlying Symbol is an Equity or Option and no implementation is provided.

Design Patterns and Considerations

Handling Parsing Failures

At some point, you might receive malformed data from the data vendor due to a glitch or a change in the data spec. In the current implementation, if parsing fails, the exception goes unhandled and causes the algorithm to terminate prematurely.

To deal with this issue, we can return null (C#) or None (Python) to indicate that the value failed to parse. This gives us a way to handle the error gracefully, which can be critical for live trading uptime.

An example of this pattern is present in the CBOE custom data implementation:

public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, bool isLiveMode) {
    // Return null if we don't have a valid date for the first entry
    if (!char.IsNumber(line.FirstOrDefault())) {
        return null;
    }
    // ...
}
def Reader(self, config, line, date, isLiveMode):
    # Return None if we don't have a valid date for the first entry
    if not line or not line[0].isdigit():
        return None
    # ...

Exception Handling in Reader

A recurring pattern in Reader implementations is the inclusion of exception handling. This way, we are guaranteed to only receive valid data from the Reader method. However, it can suppress or obscure errors, which would result in less data being emitted.

If you plan on implementing exception handling, we recommend logging an error with the exception so that you can review it at a later time.
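A sketch of this log-and-skip pattern in plain Python, using the standard logging module and a hypothetical reader function (in LEAN's C# you would use Log.Error instead):

```python
import logging

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger("reader")

def reader(line):
    """Hypothetical Reader: parse 'time,bull,bear'; log and skip bad lines."""
    try:
        time, bull, bear = line.split(",")
        return {"time": time, "bull": float(bull), "bear": float(bear)}
    except ValueError as e:
        # Log the failure so it can be reviewed later, then skip the point
        log.error("failed to parse %r: %s", line, e)
        return None

assert reader("20190101 00:00:00,0.5,0.5") is not None
assert reader("malformed line") is None
```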

Below are a few reasons you may or may not want to handle exceptions.

Reasons to handle exceptions:

  • Provide fault-tolerant behavior in your algorithm
  • You want to custom tailor the behavior of Reader when an error is encountered
  • You want to provide redundant behavior inside Reader

Reasons to not handle exceptions:

  • Enforce consistency and correctness in your data
  • Stop execution immediately if invalid data is encountered
  • Performance (only applies if the catch/except block is fired periodically)

Using an Existing BaseData Implementation

Sometimes, the structure your data comes in has already been implemented in LEAN. A great example of this is data sources that use "open, high, low, close" (OHLC) bars. For data sources that use OHLC, we can inherit from the existing Bar class instead of BaseData. That way, we don't have to reimplement the properties of the class, and we gain access to some of the helper methods it provides.

Here is a guideline to help you decide which type to inherit from when constructing your custom data.

  • Bar - Your data source has OHLC fields (no volume field)
  • TradeBar - Your data source has OHLCV fields
  • QuoteBar - Your data source has OHLCV fields for both bid and ask sides
  • BaseData - None of the above apply

The following sections only apply to C#.

These features are unsupported in Python due to differences between the two languages. There are no plans to support equivalent features at this time.

C#: Prefer Properties Over Fields

When adding new value definitions to your class such as BullIntensity, prefer using properties over fields. This is done to ensure consistency with the rest of the codebase.

Prefer:

public decimal BullIntensity { get; set; }

Disprefer:

public decimal BullIntensity;

C#: EndTime vs. Time

When EndTime is overridden, it is normally done so that we can separately specify the starting time the data applies to and the ending time at which it ends (i.e. when it should be emitted).

A Period property tends to be included in the custom data source as well to describe how much time passes between Time and EndTime.

A common overridden implementation of EndTime is shown below.

// Set a period of only a single minute. This means that a bar will encompass one minute
public TimeSpan Period { get; set; } = TimeSpan.FromMinutes(1);

// The end time of this data. Some data covers spans (trade bars) and as such we want
// to know the entire time span covered
public override DateTime EndTime
{
    get { return Time + Period; }
    set { Time = value - Period; }
}

Beware: if you override this property as in the example above, you should not copy EndTime in the Clone method. Copy only Time, and EndTime will still be preserved.
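Why copying Time alone is enough can be checked in plain Python, with a hypothetical Bar class mirroring the C# property above:

```python
from datetime import datetime, timedelta

class Bar:
    """Hypothetical bar whose EndTime is derived from Time + Period."""
    Period = timedelta(minutes=1)

    def __init__(self, time):
        self.Time = time

    @property
    def EndTime(self):
        # EndTime is computed from Time, so it never needs copying
        return self.Time + self.Period

original = Bar(datetime(2019, 1, 1, 9, 30))

# Cloning by copying only Time reproduces EndTime exactly
copy = Bar(original.Time)
assert copy.Time == original.Time
assert copy.EndTime == original.EndTime
```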

Completed Example

Here is a complete example combining all the sections explained so far. Note that the Python version lacks some features due to Python's inability to implement C#-specific code across interop boundaries.

using NodaTime;
using QuantConnect;
using QuantConnect.Data;
using QuantConnect.Logging;
using QuantConnect.Util;
using System;

namespace QuantConnect.Data.Custom {
    public class MyCustomDataSource : BaseData {
        // Define the period of the bar to last one minute.
        // This is the amount of time between the starting time and the ending time
        public TimeSpan Period { get; set; } = TimeSpan.FromMinutes(1);

        // Sets the EndTime of the bar. This is the time the data will be emitted
        public override DateTime EndTime { 
            get { return Time + Period; }
            set { Time = value - Period; }
        }

        // define your values here. Examples:
        public decimal BullSentiment { get; set; }
        public decimal BearSentiment { get; set; }

        // Instructs LEAN to look for data at the given URL or location on disk.
        public override SubscriptionDataSource GetSource(SubscriptionDataConfig config, DateTime date, bool isLiveMode) {
            return new SubscriptionDataSource(
                "https://<YOUR_SITE_GOES_HERE>.com/sentiment_data.csv", // Location of the data.
                SubscriptionTransportMedium.RemoteFile,                 // Specifies to read a whole file from a remote source (URL)
                FileFormat.Csv                                          // Specifies to read the file line by line, like a CSV file
            );
        }

        // Below we've provided an example showing how to correctly and idiomatically parse the data
        public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, bool isLiveMode) {
            try {
                // Assuming our CSV is as follows:
                // TIME (yyyyMMdd HH:mm:ss), BullSentiment, BearSentiment 
                var csv = line.Split(',');

                // Since MyCustomDataSource derives from BaseData, it is valid as a return type
                return new MyCustomDataSource {
                    // This is the emit time, i.e. the time that the algorithm will output the event.
                    // Ensure you have this value set
                    EndTime = Parse.DateTimeExact(csv[0], "yyyyMMdd HH:mm:ss"),
                    // This is the Symbol associated with the data. Usually should be set to `config.Symbol`
                    // Ensure you have this value set. 
                    Symbol = config.Symbol,

                    // Here we parse the custom fields that we've implemented.
                    BullSentiment = Parse.Decimal(csv[1]),
                    BearSentiment = Parse.Decimal(csv[2])
                };
            }
            catch (Exception e) {
                // Log the error for future debugging
                Log.Error(e);
                // Return null if we couldn't parse the data. 
                return null;
            }
        }

        // We include the Clone method override to ensure that our data implementation
        // is robust. Including the Clone method makes it more durable against failure.
        public override BaseData Clone() {
            return new MyCustomDataSource {
                // Don't forget to copy these two properties over.
                // Copy `Time` instead of `EndTime` to prevent our time from shifting
                // over one whole `Period` 
                Time = Time,
                Symbol = Symbol,

                BullSentiment = BullSentiment,
                BearSentiment = BearSentiment
            };
        }

        // Specifies the time zone for this data source. This is useful for custom data types
        public override DateTimeZone DataTimeZone()
        {
            // Select the time zone of your data here
            return TimeZones.Utc;
        }

        // Indicates whether this data type is linked to an underlying equity Symbol.
        public override bool RequiresMapping() {
            return true;
        }

        public override string ToString() {
            return $"{EndTime} - {Symbol}: Bull sentiment: {BullSentiment}, Bear sentiment: {BearSentiment}";
        }
    }
}
from datetime import datetime
from QuantConnect import *
from QuantConnect.Data import *
from QuantConnect.Logging import Log
from QuantConnect.Python import PythonData


class MyCustomDataSource(PythonData):
    def GetSource(self, config, date, isLiveMode):
        '''
        Instructs LEAN to look for data at the given URL or location on disk.
        '''

        return SubscriptionDataSource(
            "https://<YOUR_SITE_GOES_HERE>.com/sentiment_data.csv", # Location of the data.
            SubscriptionTransportMedium.RemoteFile,                 # Specifies to read a whole file from a remote source (URL)
            FileFormat.Csv                                          # Specifies to read the file line by line, like a CSV file
        )

    def Reader(self, config, line, date, isLiveMode):
        '''
        This will have to be implemented by you since almost all data sources differ in the way we parse them.
        '''
        # Here we parse the custom fields that we've implemented. 
        try:
            # Assuming our CSV is as follows:
            # TIME (yyyyMMdd HH:mm:ss), BullSentiment, BearSentiment 
            csv = line.split(',')

            # Since MyCustomDataSource derives from PythonData, it is valid as a return type
            instance = MyCustomDataSource()

            # This is the emit time, i.e. the time that the algorithm will output the event.
            # Ensure you have this value set
            instance.EndTime = datetime.strptime(csv[0], "%Y%m%d %H:%M:%S")
            # This is the Symbol associated with the data. Usually should be set to `config.Symbol`
            # Ensure you have this value set. 
            instance.Symbol = config.Symbol

            # Define your values here.
            instance["BullSentiment"] = float(csv[1])
            instance["BearSentiment"] = float(csv[2])

            return instance

        except Exception as e:
            # Log the error for future debugging
            Log.Error(str(e))
            # Return None if we couldn't parse the data.
            return None

    # Python doesn't require the implementation of the `Clone` method.
    # It is important that you do not override the `Clone` method in Python.

    def DataTimeZone(self):
        '''
        Select the time zone of your data here.
        LEAN defaults to the New York time zone if no implementation is provided;
        this example uses UTC.
        '''
        return TimeZones.Utc

    def RequiresMapping(self):
        '''
        Indicates whether this data type is linked to an underlying equity Symbol.
        '''
        # If your data has a relationship with equities via their tickers,
        # you should set this to True. 
        
        # Example: SEC filings are based on stock tickers, so it should be `True`
        # Example: sentiment data tickers are related to stock tickers, so it should be `True`.
        # Example: Federal Reserve data tickers are not related to stock tickers, so it should be `False`.
        # Example: Weather data has no relationship to equities, so it should be `False`

        # Sentiment data is for equities. Set to True because we share the same set of tickers.
        return True 

    def DefaultResolution(self):
        '''
        Sets the default resolution of this data source.
        '''
        return Resolution.Minute

    def SupportedResolutions(self):
        '''
        Sets the supported resolutions for this data source.
        '''
        return self.AllResolutions

If you require additional reference material or examples, please visit the LEAN repository on GitHub, which contains production implementations of these concepts.
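The parsing logic inside Reader can also be exercised outside of LEAN, which makes it easy to verify before a backtest. Below is a minimal standalone sketch that parses one row the same way the Reader above does (the sample row is made up):

```python
from datetime import datetime

def parse_sentiment_line(line):
    """Parse one 'TIME, BullSentiment, BearSentiment' row, mirroring Reader."""
    csv = line.split(",")
    return {
        "EndTime": datetime.strptime(csv[0], "%Y%m%d %H:%M:%S"),
        "BullSentiment": float(csv[1]),
        "BearSentiment": float(csv[2]),
    }

# Hypothetical sample row
row = parse_sentiment_line("20180101 09:30:00,0.75,0.25")
print(row["EndTime"], row["BullSentiment"])
```

Running your parser against a few real rows of your data this way catches most formatting mistakes before they become silent failures in Reader.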

Accessing Data in an Algorithm

To access the custom data in your algorithm in OnData, we recommend using the Slice.Get method. Because its result is keyed by Symbol, call .Values on the outcome of Slice.Get to get the data objects themselves, which you can then iterate over.

An example of this is provided below.

public override void OnData(Slice data) {
    foreach (var sentiment in data.Get<MyCustomDataSource>().Values) {
        Log($"{sentiment.Symbol}: Got bullish sentiment of {sentiment.BullSentiment}");
    }
}
def OnData(self, data):
    for sentiment in data.Get(MyCustomDataSource).Values:
        self.Log(f"{sentiment.Symbol}: Got bullish sentiment of {sentiment.BullSentiment}")

Debugging

Why is data not reaching OnData?

This can be caused by not implementing the Clone method properly, by GetSource pointing to a resource that does not exist, or by an error inside your Reader method.

To fix this, make sure you've done or verified the following:

  • You have implemented the Clone method
  • GetSource points to a valid location
  • Parsing in Reader is successful
  • Your endlines are \n or \r\n
  • Your file is encoded as UTF-8
  • If using compression, ensure your archive is not corrupted
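The encoding and line-ending checks above can be verified on a local copy of your data before wiring it into LEAN. A minimal sketch (the file path in the usage comment is a placeholder):

```python
def check_data_file(path):
    """Return True if the file is valid UTF-8 and uses only \\n or \\r\\n line endings."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # After removing \r\n pairs, any remaining \r is a bare carriage return
    return b"\r" not in raw.replace(b"\r\n", b"")

# Hypothetical usage:
# print(check_data_file("sentiment_data.csv"))
```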

Why is all of my data null?

This can be caused by not implementing the Clone method properly, or a silent failure in Reader.

To fix this, make sure you've done the following:

  • You have implemented the Clone method
  • Parsing in Reader is successful

Why are nullable values being set to their default value or null?

This can be caused by accessing your custom data via the Slice indexer (e.g. data[_symbol]).

To fix this, we recommend accessing the data using the Slice.Get method as it will respect and preserve nullable types.

Prefer:

public override void OnData(Slice data) {
    var customData = data.Get<MyCustomDataSource>(_symbol).Value;
}
def OnData(self, data):
    customData = data.Get(MyCustomDataSource, self.symbol).Value

Disprefer:

public override void OnData(Slice data) {
    var customData = data[_symbol];
}
def OnData(self, data):
    customData = data[self.symbol]

Why is my data's time different from the algorithm time?

This can be caused by the algorithm operating in a different time zone than the data.

To fix this, set the algorithm's time zone to match the data's time zone, e.g. with SetTimeZone in Initialize.

Why are seconds in EndTime being rounded down to the minute?

This is caused by not implementing the DefaultResolution and SupportedResolutions methods.

To fix this, you can override those methods and provide a suitable resolution for your data.

My algorithm is too slow

If you are backtesting locally and retrieving data from a remote source, we recommend gathering the data you want to backtest on locally before backtesting.
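One way to gather remote data locally ahead of a backtest is a small caching helper that downloads the file only once; GetSource can then point at the local copy with SubscriptionTransportMedium.LocalFile. A sketch (the URL and paths in the usage comment are placeholders):

```python
import os
import urllib.request

def cache_remote(url, local_path):
    """Download `url` to `local_path` once; later calls reuse the local copy."""
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, local_path)
    return local_path

# Hypothetical usage:
# cache_remote("https://<YOUR_SITE_GOES_HERE>.com/sentiment_data.csv",
#              "Data/my_custom_data/sentiment_data.csv")
```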

If you are backtesting on our cloud platform using an officially supported alternative data source and it is too slow for your purposes, please contact support via e-mail with an example algorithm attached.


If your issues persist after following these steps, please e-mail support with your custom data class, an example algorithm, and, if possible, sample data to replicate the issue.


Advanced Concepts and Applications

Accessing files inside a ZIP archive

Sometimes you may want to compress your data sources to save as much disk space as possible. In that case, you can still access data within a ZIP archive by using the hash ("#") syntax in GetSource. In the <SOURCE> position of the Quick Start GetSource method, reference the ZIP file along with the file inside it that you want to read.

The syntax is as follows:

<SOURCE>#<FILE>

An example of this concept is provided below.

public override SubscriptionDataSource GetSource(SubscriptionDataConfig config, DateTime date, bool isLiveMode) {
    return new SubscriptionDataSource(
        "Data/my_custom_data/20180101.zip#file.json",
        // ...
        // ...
    );
}
def GetSource(self, config, date, isLiveMode):
    return SubscriptionDataSource(
        "Data/my_custom_data/20180101.zip#file.json",
        # ...
        # ...
    )
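On the packaging side, an archive like the one referenced above can be produced with Python's standard zipfile module. A sketch, reusing the file names from the example (the payload in the usage comment is made up):

```python
import zipfile

def pack_data(zip_path, member_name, payload):
    """Write `payload` into `zip_path` as the archive member `member_name`."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, payload)

# Hypothetical usage, matching "Data/my_custom_data/20180101.zip#file.json":
# pack_data("Data/my_custom_data/20180101.zip", "file.json", '{"sentiment": 0.75}')
```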

Currently, we only support ZIP compression. If you require support for alternative forms of compression, please e-mail support with the compression format you would like supported.

FileFormat.Index

This file format is useful whenever you have a collection of tickers contained in a single piece of data but don't want to duplicate the data itself for each ticker. Similar to a pointer, FileFormat.Index indicates that the file located under the requested ticker redirects to the final data, which contains a collection of tickers including the one being requested.

An example diagram of the concept is provided below.


                              +----------------------+
GetSource(...) returns -----> | ./aapl/20180101.json | -----> (which is then iterated on and "GetSourceForAnIndex(...)" is called)
                              | -------------------- |
                              | 1234.json            | -----> GetSourceForAnIndex(...) returns: ./contents/20180101.zip#1234.json
                              | 2345.json            | -----> GetSourceForAnIndex(...) returns: ./contents/20180101.zip#2345.json
                              +----------------------+

To implement FileFormat.Index, you need to do the following.

  • Derive from IndexedBaseData instead of BaseData
  • Implement GetSource that returns FileFormat.Index to point towards the index file (./aapl/20180101.json)
  • Create a new method called GetSourceForAnIndex that points to the final file containing the data and collection of tickers

An example implementation of the diagram above is provided below:

public class MyCustomDataSource : IndexedBaseData {
    // ...

    // This effectively tells LEAN where to find the index file
    public override SubscriptionDataSource GetSource(SubscriptionDataConfig config, DateTime date, bool isLiveMode) {
        return new SubscriptionDataSource(
            $"./{config.Symbol.Value.ToLower()}/{date:yyyyMMdd}.json", // Assuming `Symbol` is "AAPL" and `date` is 2018-01-01
            SubscriptionTransportMedium.LocalFile,
            FileFormat.Index
        );
    }

    // This tells LEAN where to find the real data for a given index.
    // We will be redirected to another file from here
    public override SubscriptionDataSource GetSourceForAnIndex(SubscriptionDataConfig config, DateTime date, string index, bool isLiveMode) {
        return new SubscriptionDataSource(
            $"./contents/{date:yyyyMMdd}.zip#{index}", // Assuming `index` is `1234.json` or `2345.json`
            SubscriptionTransportMedium.LocalFile,
            FileFormat.Csv
        );
    }
}

This feature is not supported in Python at this time.

If you'd like to request this feature for Python, please e-mail support with a link to this section explaining your use case.
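Although IndexedBaseData is not available from Python, the redirection itself is easy to mimic in plain Python, which can help when preparing or validating index files. A sketch using the file names from the diagram above (all paths in the usage comment are hypothetical):

```python
import zipfile

def read_indexed_data(index_lines, contents_zip_path):
    """For each entry in the index, read the matching member from the contents archive."""
    results = {}
    with zipfile.ZipFile(contents_zip_path) as archive:
        for index in index_lines:              # e.g. "1234.json", "2345.json"
            results[index] = archive.read(index).decode("utf-8")
    return results

# Hypothetical usage:
# index_lines = open("aapl/20180101.json").read().splitlines()
# data = read_indexed_data(index_lines, "contents/20180101.zip")
```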

FileFormat.Collection

If you need to return a collection of your custom data type, FileFormat.Collection can be used to do so if your return type satisfies IEnumerable<BaseData>.

To implement this, do the following.

  • Return from GetSource with FileFormat set to FileFormat.Collection
  • In Reader, return a BaseDataCollection object containing the data as the final argument.

You can see an example implementation in SECReport10Q on GitHub.

LEAN data caching

Custom data sources are not cached at this time.

Live Mode

Live mode flag

When you are implementing your data source for live trading, it is important to know whether your live data will arrive in a different shape or require special parsing. If your data retrieval or parsing differs from the backtesting implementation, you will need to implement a separate branch inside the existing GetSource and/or Reader methods.

To do so, you can use the flag isLiveMode to determine whether the algorithm is trading live.

An example is provided below.

public class MyCustomDataSource : BaseData {
    // Place our API key here for use in live trading
    private string _apiKey = "<OMITTED>";

    // Instructs LEAN to look for data at the given URL or location on disk.
    public override SubscriptionDataSource GetSource(SubscriptionDataConfig config, DateTime date, bool isLiveMode) {
        if (isLiveMode) {
            return new SubscriptionDataSource(
                $"https://<SOME_API_SITE>.com/?key={_apiKey}&date={date:yyyyMMddTHH:mm}",
                SubscriptionTransportMedium.RemoteFile,
                FileFormat.Csv
            );
        }

        return new SubscriptionDataSource(
            "https://<YOUR_SITE_GOES_HERE>.com/sentiment_data.csv", // Location of the data.
            SubscriptionTransportMedium.RemoteFile,                 // Specifies to read a whole file from a remote source (URL)
            FileFormat.Csv                                          // Specifies to read the file line by line, like a CSV file
        );
    }

    // This will have to be implemented by you since almost all data sources differ in the way we parse them.
    // Below we've provided an example showing how to correctly and idiomatically parse the data
    public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, bool isLiveMode) {
        if (isLiveMode) {
            // Assuming our CSV is as follows from our live endpoint:
            // TIME (yyyyMMddHH:mm:ss), BearSentiment, BullSentiment
            //
            // Notice how the data source format can be different. Having
            // the `isLiveMode` flag is advantageous in allowing us to implement
            // a custom parser for a live data source that differs from the backtesting feed
            var csv = line.Split(',');

            return new MyCustomDataSource { 
                EndTime = Parse.DateTimeExact(csv[0], "yyyyMMddHH:mm:ss"),
                Symbol = config.Symbol,
                BullSentiment = Parse.Decimal(csv[2]),
                BearSentiment = Parse.Decimal(csv[1])
            };
        }
        // Assuming our CSV is as follows:
        // TIME (yyyyMMdd HH:mm:ss), BullSentiment, BearSentiment 
        var csv = line.Split(',');

        // Since MyCustomDataSource derives from BaseData, it is valid as a return type
        return new MyCustomDataSource {
            // This is the emit time, i.e. the time that the algorithm will output the event.
            // Ensure you have this value set
            EndTime = Parse.DateTimeExact(csv[0], "yyyyMMdd HH:mm:ss"),
            // This is the Symbol associated with the data. Usually should be set to `config.Symbol`
            // Ensure you have this value set. 
            Symbol = config.Symbol,

            // Here we parse the custom fields that we've implemented.
            BullSentiment = Parse.Decimal(csv[1]),
            BearSentiment = Parse.Decimal(csv[2])
        };
    }
}
from datetime import datetime
from QuantConnect import *
from QuantConnect.Data import *


class MyCustomDataSource(PythonData):
    def __init__(self):
        # Place our API key here for use in live trading
        self.apiKey = "<OMITTED>"

    def GetSource(self, config, date, isLiveMode):
        '''
        Instructs LEAN to look for data at the given URL or location on disk.
        '''

        if isLiveMode:
            return SubscriptionDataSource(
                f"https://<SOME_API_SITE>.com/?key={self.apiKey}&date={date.strftime('%Y%m%dT%H:%M')}",
                SubscriptionTransportMedium.RemoteFile,
                FileFormat.Csv
            )

        return SubscriptionDataSource(
            "https://<YOUR_SITE_GOES_HERE>.com/sentiment_data.csv", # Location of the data.
            SubscriptionTransportMedium.RemoteFile,                 # Specifies to read a whole file from a remote source (URL)
            FileFormat.Csv                                          # Specifies to read the file line by line, like a CSV file
        )

    def Reader(self, config, line, date, isLiveMode):
        '''
        This will have to be implemented by you since almost all data sources differ in the way we parse them.
        Below we've provided an example showing how to correctly and idiomatically parse the data
        '''
        if isLiveMode:
            # Assuming our CSV is as follows from our live endpoint:
            # TIME (yyyyMMddHH:mm:ss), BearSentiment, BullSentiment
            #
            # Notice how the data source format can be different. Having
            # the `isLiveMode` flag is advantageous in allowing us to implement
            # a custom parser for a live data source that differs from the backtesting feed
            csv = line.split(",")

            instance = MyCustomDataSource()

            instance.EndTime = datetime.strptime(csv[0], "%Y%m%d%H:%M:%S")
            instance.Symbol = config.Symbol
            instance["BullSentiment"] = float(csv[2])
            instance["BearSentiment"] = float(csv[1])

            return instance

        # Assuming our CSV is as follows:
        # TIME (yyyyMMdd HH:mm:ss), BullSentiment, BearSentiment 
        csv = line.split(",")

        # Since MyCustomDataSource derives from BaseData, it is valid as a return type
        instance = MyCustomDataSource()
        # This is the emit time, i.e. the time that the algorithm will output the event.
        # Ensure you have this value set
        instance.EndTime = datetime.strptime(csv[0], "%Y%m%d %H:%M:%S")
        # This is the Symbol associated with the data. Usually should be set to `config.Symbol`
        # Ensure you have this value set. 
        instance.Symbol = config.Symbol

        # Here we parse the custom fields that we've implemented.
        instance["BullSentiment"] = float(csv[1])
        instance["BearSentiment"] = float(csv[2])

        return instance