This project aims to optimize a trading algorithm leveraging machine learning techniques. The primary goal is to automate the process and enhance performance without the need for constant manual intervention. The initial model employed is based on logistic regression.
Instead of using pre-built libraries like sklearn.linear_model.LogisticRegression, this project involves building the logistic regression model from scratch. This approach aids in understanding the fundamental mechanisms and allows for future expansions. The model is built around two key components:
- Cost Function: The core of the logistic regression model.
- Gradient Descent: An optimization algorithm used to minimize the Cost Function.
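Both components operate on the model's prediction, so it helps to see that first. Below is a minimal sketch of how a from-scratch logistic regression prediction might look. The function names `sigmoid` and `predict_proba` are illustrative assumptions, not names taken from this project.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a real number into the range 0 to 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x, w, b):
    """Logistic regression output: sigmoid(w . x + b).

    x and w are equal-length lists of floats, b is a float.
    (Hypothetical helper, named here for illustration only.)
    """
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)


# With all-zero parameters the model cannot favor either class.
p = predict_proba([30.0], [0.0], 0.0)
```

The output is read as the probability of the positive class, which in this project would be a win.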
Cost function: a measure of the average loss over the whole training set.
Loss function: a measure of how far the prediction for a single example is from its target value.
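As a rough illustration of how the two measures relate, the sketch below assumes the standard log loss for logistic regression and a single-feature model with scalar `w`. The function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a real number into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def loss(p: float, y: int) -> float:
    """Log loss for ONE example: -y*log(p) - (1 - y)*log(1 - p).

    Assumption: the standard logistic (cross-entropy) loss is used.
    """
    eps = 1e-15  # clip so log() is never called on exactly 0
    p = min(max(p, eps), 1.0 - eps)
    return -y * math.log(p) - (1 - y) * math.log(1.0 - p)


def cost(xs, ys, w: float, b: float) -> float:
    """Average of the per-example losses over the training set."""
    m = len(xs)
    return sum(loss(sigmoid(w * x + b), y) for x, y in zip(xs, ys)) / m
```

With `w = 0` and `b = 0` every prediction is 0.5, so the cost equals log(2) regardless of the data, which is a handy sanity check.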
Gradient descent minimizes the cost function by repeatedly updating w and b until it finds their best values, where α is the learning rate.
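A minimal sketch of the update loop, assuming the log-loss cost and a single feature. The defaults for `alpha` and `iterations`, and the toy data, are hypothetical values chosen only to make the example run.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a real number into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def gradient_descent(xs, ys, alpha: float = 0.1, iterations: int = 1000):
    """Fit w and b for a single-feature logistic regression."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction error for every example at the current w, b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the log-loss cost with respect to w and b.
        dw = sum(e * x for e, x in zip(errors, xs)) / m
        db = sum(errors) / m
        # Simultaneous update: both gradients were computed before either changes.
        w -= alpha * dw
        b -= alpha * db
    return w, b


# Hypothetical toy data: negative inputs are losses (0), positive are wins (1).
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = gradient_descent(xs, ys)
```

A larger `alpha` takes bigger steps and can overshoot; a smaller one converges more slowly.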
To reduce overfitting, regularization is applied. It shrinks the model's weights, which limits the influence of individual features.
The regularized cost function is the cost function with one additional term that penalizes large weights.
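One common form of that extra term is an L2 penalty, (λ / 2m) · w². The sketch below assumes this form and a single-feature model; the parameter name `lam` and the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a real number into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def loss(p: float, y: int) -> float:
    """Log loss for one example (assumed standard form)."""
    eps = 1e-15
    p = min(max(p, eps), 1.0 - eps)
    return -y * math.log(p) - (1 - y) * math.log(1.0 - p)


def regularized_cost(xs, ys, w: float, b: float, lam: float) -> float:
    """Average log loss plus an L2 penalty on w.

    Assumption: the penalty is (lam / (2 * m)) * w**2. The bias b is
    deliberately left out of the penalty.
    """
    m = len(xs)
    base = sum(loss(sigmoid(w * x + b), y) for x, y in zip(xs, ys)) / m
    penalty = (lam / (2 * m)) * w ** 2
    return base + penalty
```

Setting `lam = 0` recovers the unregularized cost, so the same function can serve both cases.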
Regularized gradient descent is the gradient descent algorithm with one additional term in the update.
- The formula for w changes, while the formula for b stays the same, because b is not regularized.
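A sketch of one regularized update step, assuming an L2 penalty so that the gradient for w gains the term (λ / m) · w. The function and parameter names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a real number into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def regularized_step(xs, ys, w: float, b: float, alpha: float, lam: float):
    """Perform one gradient descent update with an assumed L2 penalty."""
    m = len(xs)
    errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
    # The gradient for w gains the extra term (lam / m) * w.
    dw = sum(e * x for e, x in zip(errors, xs)) / m + (lam / m) * w
    # The gradient for b is unchanged because b is not penalized.
    db = sum(errors) / m
    return w - alpha * dw, b - alpha * db
```

Calling the function with `lam = 0.0` gives the plain gradient descent step, which makes the difference between the two updates easy to check.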
The model takes an angle as its single input feature and predicts the probability of a win.
To determine the slope at a point of interest:
Once the slope is obtained, the angle of intersection is calculated using:
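A minimal sketch of these two steps, under two assumptions that may not match the project: the slope is taken as a finite difference between two points, and the angle of intersection is the acute angle between two lines with slopes `m1` and `m2`, using the standard identity tan θ = |(m1 - m2) / (1 + m1·m2)|. The function names are hypothetical.

```python
import math


def slope(x1: float, y1: float, x2: float, y2: float) -> float:
    """Slope of the line through (x1, y1) and (x2, y2)."""
    if x2 == x1:
        raise ValueError("slope is undefined for a vertical line")
    return (y2 - y1) / (x2 - x1)


def angle_of_intersection(m1: float, m2: float) -> float:
    """Acute angle, in degrees, between two lines with slopes m1 and m2."""
    denom = 1.0 + m1 * m2
    if denom == 0.0:
        # The lines are perpendicular.
        return 90.0
    return math.degrees(math.atan(abs((m1 - m2) / denom)))


# Example: a line rising one unit per bar against a flat line.
m = slope(0.0, 0.0, 2.0, 2.0)
theta = angle_of_intersection(m, 0.0)
```

Note that the resulting angle depends on how the two axes are scaled, so price and time units should be fixed before comparing angles across charts.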
Analysis of the angle data in the dataset shows a negligible correlation between the angle values and the win/loss rate. This matches initial expectations, since the model relies on a single feature, which typically offers limited predictive power.
The lack of significant correlation in this context is not surprising. Single-feature models often struggle to capture the complexities inherent in datasets where outcomes (like win/loss rates) are influenced by multiple factors. Therefore, the current analysis reinforces the hypothesis that a more robust model, integrating additional relevant features, is necessary for more accurate predictions.
Sigmoid function plotted against the angle data, showing no correlation between angles and wins/losses (0 = loss, 1 = win).
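One way to put a number on "negligible correlation" is the Pearson coefficient between the angle and the 0/1 outcome. The sketch below is an illustration only: the function name is hypothetical and the values are made-up toy data, not results from this project.

```python
import math


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): wins and losses spread evenly across the angles.
angles = [1.0, 2.0, 3.0, 4.0]
outcomes = [0, 1, 1, 0]
r = pearson(angles, outcomes)
```

A coefficient near 0 means the angle alone carries little linear information about the outcome; values near 1 or -1 would indicate a strong relationship.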
The next phase of my research will focus on identifying and incorporating additional features that are potentially correlated with the win/loss outcomes. This includes but is not limited to:
- Rate of Increase: This feature measures the rate at which a certain variable (e.g., price, volume, etc.) increases over a specified time frame. In the context of win/loss outcomes, this rate could provide insights into momentum trends, potentially indicating periods of heightened activity or interest. For instance, a rapid rate of increase might correlate with higher win rates due to market enthusiasm, while a slower rate could suggest caution or uncertainty among participants.
- Time Passed Since Last Trade (or Number of Bars Since Last Trade): This feature captures the duration since the last trading activity, which can be a critical indicator of market sentiment and trader behavior. In terms of its relevance, this duration might reflect the market’s response time or the latency in reaction to external events. Shorter intervals could imply a more active and responsive market, potentially leading to higher win rates, while longer intervals might indicate a less dynamic market environment, possibly correlating with lower win rates.
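The two candidate features above could be computed along these lines. This is a sketch under assumptions: the rate of increase is taken as the average change per bar over a fixed window, and trade activity is given as one boolean flag per bar. The function names and input shapes are hypothetical.

```python
def rate_of_increase(values, window: int) -> float:
    """Average change per bar over the last `window` bars.

    Assumption: `values` holds one observation per bar (e.g. closing price).
    """
    if window <= 0 or len(values) <= window:
        raise ValueError("need a positive window and more than `window` values")
    return (values[-1] - values[-1 - window]) / window


def bars_since_last_trade(trade_flags):
    """Number of bars since the most recent trade.

    `trade_flags` has one boolean per bar, True where a trade occurred.
    Returns 0 if the latest bar has a trade, None if no trade is found.
    """
    for offset, traded in enumerate(reversed(trade_flags)):
        if traded:
            return offset
    return None


# Hypothetical inputs for illustration.
roi = rate_of_increase([100.0, 102.0, 104.0, 110.0], window=3)
gap = bars_since_last_trade([False, True, False, False])
```

Both functions return a single number per bar, so their outputs can be added as extra columns next to the angle feature.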
Further, I plan to refine the predictive model by:
- Integrating these new features and analyzing their collective impact.
- Exploring more complex modeling techniques that can capture the multi-dimensional nature of the data.
This initial phase of analysis lays the groundwork for a more comprehensive study. By expanding the feature set and employing more advanced modeling strategies, I aim to significantly improve the accuracy of the predictions and gain deeper insight into the factors influencing win/loss rates.
Predicting wins using Random Forest Classifier
Predicting best exits with DQN