
Question #32

Open
lefig opened this issue May 23, 2020 · 9 comments

Comments

@lefig

lefig commented May 23, 2020

Hi Robert,

This is another fine project together with your excellent portfolio optimisation work.
As someone inexperienced in this area, I hope you don't mind me asking a question about the amount of fundamental data required to produce a viable model.

My local exchange is London (LSE/FTSE), and getting historic fundamentals is hard. I am able to extract these day by day, but it will take some time to build up a significant amount.

So I was wondering: how many days of data would I need for a viable classification model? Say 6 months, 3 months, etc. I have many fields, but at present the data only goes back a week.

Thank you in advance

Fig

@robertmartin8
Owner

Hi @lefig,

Thanks for reaching out! Your question presupposes that with enough data, the classifier will be viable. I realise that the readme is quite encouraging in that regard, but I was 17 when I first built the project and was a lot more naive than I am now.

Having recently revisited the project with a few more years and some new statistical tools under my belt, I no longer believe that just throwing fundamentals into an ML model can produce alpha. I do, however, believe that it could work as some kind of automated screener, prior to discretionary human analysis.

As for the amount of data, I'm afraid I really don't know the answer. The least I've ever used is 5 years.

I'm sorry for not being more helpful. Happy to discuss further!

Robert

@lefig
Author

lefig commented May 26, 2020

Hi @robertmartin8 ,

I really do appreciate you coming back to me with your thoughts. My thinking is similar to yours with respect to fundamentals providing the means for a screener in terms of, say, quality, the magic formula, corporate governance, etc.

But I have no confidence in time-series-based technical analysis, as my models have just not been sufficiently reliable. I am still a learner, however, and would welcome your thoughts on whether time series analysis is indeed a fruitless exercise.

As for the minimum amount of training data, I need to have a little think about this, but I hope you don't mind if I come back to you with a follow-up query. It's a tricky one for sure :)

Fig

@Wigley007

Hi @robertmartin8 and @lefig,
I've been interested in what you've been doing here for some time now. We've built a fundamentals-oriented back-testing engine and model portfolio tracker, and I'm thinking an ML layer could work well sitting on top of our engine. I see it constantly running strategies, essentially picking the best one for the day based on a set of risk-management restrictions we'd apply. Although I don't really know enough about it, so I'm really just thinking out loud. Penny for your thoughts :-) We're up and running in demo mode: https://forwardcaster.com/. Cheers, Tom

@robertmartin8
Owner

@lefig

I don't want to insinuate that fundamentals are useless. In fact, I've been playing around on Quantopian lately and have found a couple of fundamental factors that I think have signal. There has recently been an excellent discussion thread regarding a modified version of Greenblatt's magic formula, which you might find interesting.

The point is that there is a strong economic hypothesis underlying some of these factors which gives you confidence that the result is not overfit. If you just throw everything into a classifier, you lose this. I suppose my earlier statement about using it as a screen is somewhat contradictory – if something has value as a screen it can also have value as a strategy – but I just meant that there is a lot less alpha than I initially believed when I started the project.

If you are keen on continuing with the machine learning approach, here are two pieces of advice which have improved the quality of my ML stock picker (though not to the point that I am willing to allocate live, as I once was):

  • Put a lot more effort into feature engineering. Have a look at Gurufocus for the kind of data they provide. Try normalising by sector or against history.
  • Think about grouping by sectors. This was probably the single biggest improvement, and one of the few avenues that I continue to think is worth exploring. You may find that excluding certain sectors (e.g. Financials) helps.
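To make these two suggestions concrete, here is a minimal pandas sketch of normalising fundamentals within sectors; the column names, sectors, and toy data are all hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ticker": [f"T{i}" for i in range(12)],
    "sector": ["Tech", "Energy", "Financials"] * 4,
    "pe_ratio": rng.normal(20, 8, 12),
    "roe": rng.normal(0.12, 0.05, 12),
})

features = ["pe_ratio", "roe"]

# Z-score each fundamental against its sector peers, so a "cheap"
# tech stock is judged relative to other tech stocks.
df[[f + "_z" for f in features]] = df.groupby("sector")[features].transform(
    lambda col: (col - col.mean()) / col.std(ddof=0)
)

# Excluding a sector whose fundamentals behave differently
# (e.g. Financials) is then a one-liner:
df_no_fin = df[df["sector"] != "Financials"]
```

The same groupby pattern extends naturally to normalising against a stock's own history (group by ticker instead of sector).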

Regarding time series, I think that "technical analysis" in the form of chart patterns is probably useless (though I do try to keep an open mind); proper time series methods, however, are certainly not futile. The trouble is that the barrier to entry is much higher, because in that case you really are competing against maths PhDs. I've had some success with statistical arbitrage, though.

I'm enjoying the discussion, let me know what you think!

Robert

@robertmartin8
Owner

@Wigley007

Sounds cool! I do think that ML can be valuable in alpha combination and regime detection; the only trouble is that it's nontrivial to frame it as an ML problem.

For example, in your case, you are saying that the output variable should be the label of whichever strategy you want it to run (or equivalently, which regime you are in). That's fine, but then you have to think about what goes in as an input. Should it be the historical time series of each of the strategies? Should it be fundamentals? Should it be macro factors?

Alternatively, you can view it as an alpha combination problem, which is a little more common in quant funds.
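As a toy illustration of the alpha-combination framing (the signal names, values, and weights below are invented for the example; in practice the weights might themselves be learned):

```python
import pandas as pd

# Several standardised alpha signals per asset (hypothetical values).
signals = pd.DataFrame({
    "value":    [0.8, -0.2, 1.1, -1.7],
    "momentum": [-0.5, 1.2, 0.3, -1.0],
    "quality":  [0.1, 0.4, -0.6, 0.1],
}, index=["AAA", "BBB", "CCC", "DDD"])

# Fixed blend weights; an ML model could instead learn these from
# historical signal performance.
weights = {"value": 0.5, "momentum": 0.3, "quality": 0.2}

# Combine the signals into a single score and rank the universe.
combined = sum(w * signals[name] for name, w in weights.items())
ranked = combined.sort_values(ascending=False)
```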

Robert

@lefig
Author

lefig commented May 28, 2020

Hi @robertmartin8,

Really good to hear from you, and your insights are fascinating. As a hobbyist geek, this is all just pure pleasure for me. Indeed, I am obliged to you for mentioning the Quantopian discussion on Greenblatt's magic formula.

That is rather engaging and yes, this is the kind of thinking that I believe can work as a means for preliminary screening. In my mind, fundamental analysis is most relevant to long-term trading time frames, while technical analysis is for short-term directional movement.

But I also think that portfolio work could provide an intriguing alternative approach, by assigning weights to assets in a target portfolio. There are some examples in this area involving OpenAI Gym. This could be worth investigating.

I concur with your view regarding 'pure' time-series-based analysis of OHLC and tick data. From what I can see, the publicly available LSTM-type models are rather contrived around specific datasets and parameters; they may work in a laboratory environment but may not be readily adapted.

My gut instinct is that your classification project, using fundamentals to encode a stock as possibly good or bad, is on the right track.

But yes, this is a fascinating area of research and we are most fortunate :)

Fig

@lefig
Author

lefig commented May 28, 2020

Hi @Wigley007

Good to hear from you and thank you for dropping a line.

You have an interesting business model there. And I can see your portfolio tool filling a gap in the UK LSE market.

Following on from what Robert mentioned above, you may wish to review the guidelines that QuantConnect has for what constitutes valid alpha factors and the means by which they can be reliably measured.

It is surprisingly hard to get right, and I am aware that many contributors to their platform have alphas rejected from time to time.

Best wishes,
Fig

@Wigley007

Wigley007 commented May 28, 2020

Hi @lefig and @robertmartin8,

Thanks for your feedback and suggestions! I'll do some more digging :-)

I may be thinking more about an optimization problem than an ML problem; I'm not sure.

Let's say our engine ranks the market using 100 different metrics over 5-10 different time periods, and for each ranked metric we split the market into 10 deciles (i.e. a portfolio of stocks is created for each decile, and our engine tracks the returns of each portfolio over a given time period).

What I want our engine to tell us is: of the 100 or so financial metrics we have, which are the best to use?

Rather than a human running many decile strategies (i.e. ranking and re-ranking the market, looking for 'robustness'), I want a machine to do it.

It should tell us the 5-10 most 'robust' metrics to use today.

From there, it would be possible to create an 'adaptive strategy' (i.e. one which would automatically drop one metric and pick up another determined to be more 'robust').

Well, that's my line of thinking anyway :-)
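For what it's worth, the decile procedure described above could be sketched roughly as follows; the metric names and data are synthetic, with one metric deliberately constructed to predict forward returns:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200
universe = pd.DataFrame({
    "metric_a": rng.normal(size=n),
    "metric_b": rng.normal(size=n),
})
# Synthetic forward returns, correlated with metric_a only.
universe["fwd_return"] = 0.05 * universe["metric_a"] + rng.normal(0, 0.02, n)

def decile_spread(df, metric, n_buckets=10):
    """Top-decile minus bottom-decile mean forward return for one metric."""
    deciles = pd.qcut(df[metric].rank(method="first"), n_buckets, labels=False)
    means = df.groupby(deciles)["fwd_return"].mean()
    return means.iloc[-1] - means.iloc[0]

# Score every metric by its decile spread and keep the strongest ones.
spreads = {m: decile_spread(universe, m) for m in ["metric_a", "metric_b"]}
best = max(spreads, key=spreads.get)
```

A real "robustness" check would of course look at spreads across multiple periods and penalise instability, rather than taking a single snapshot.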

Cheers,
Tom

@lefig
Author

lefig commented May 30, 2020

Hi @Wigley007 ,

Thank you for sharing your concept and thoughts. Robert is far more knowledgeable than myself in these matters, but as you do some digging I would start by googling how QuantConnect uses Alpha Streams in this context.

Perhaps this is more a question about portfolio construction rather than optimization although the distinction can be rather nuanced.

Ranking and scoring metrics by way of principal component analysis would probably be the best starting point.
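As a purely illustrative sketch of the PCA idea on synthetic data, one could standardise the metric columns and check how much each metric loads on the first principal component, as a crude redundancy check:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
base = rng.normal(size=(100, 1))
# Metrics 0 and 1 are nearly the same underlying factor;
# metric 2 is independent noise.
X = np.hstack([
    base,
    base + 0.1 * rng.normal(size=(100, 1)),
    rng.normal(size=(100, 1)),
])

Xs = StandardScaler().fit_transform(X)
pca = PCA().fit(Xs)

loadings = np.abs(pca.components_[0])      # each metric's weight on PC1
explained = pca.explained_variance_ratio_  # variance captured per component
```

Here PC1 is dominated by the two correlated metrics, which hints that one of them is redundant; that is the kind of structure PCA surfaces before any ranking is attempted.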

It's an interesting one.

Fig
