
AutoTVM optimization? #2244

Closed
federicoparra opened this issue Apr 28, 2024 · 3 comments
Labels: question (Question about the usage)

Comments

federicoparra commented Apr 28, 2024

I recently went through this tutorial: https://tvm.apache.org/docs/tutorial/autotvm_relay_x86.html

Model execution performance on my Orange Pi's Mali GPU improved quite a lot during the optimization process; crucially, this is not a fixed set of optimizations but an iterative search that improves inference performance on your specific hardware.
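
For reference, here is a minimal sketch of that tuning loop, roughly following the linked tutorial (it assumes a Relay module `mod`, its `params`, and a `target` are already defined; the trial counts are illustrative):

```python
# A minimal sketch of the iterative AutoTVM search from the linked tutorial.
# Assumes a Relay module `mod`, its `params`, and a `target` are already defined.
import tvm
from tvm import relay, autotvm
from tvm.autotvm.tuner import XGBTuner

# Extract tunable tasks (conv2d, dense, ...) from the Relay program.
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1, timeout=10),
)

log_file = "tuning.json"
for task in tasks:
    tuner = XGBTuner(task, loss_type="rank")
    # Each trial builds and benchmarks one candidate schedule on the device.
    tuner.tune(
        n_trial=min(200, len(task.config_space)),
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file(log_file)],
    )

# Re-build the model with the best configurations found during the search.
with autotvm.apply_history_best(log_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```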

In contrast, MLC compilation using Relax, even at the maximum optimization settings, appears to apply a fixed set of optimizations, with no equivalent iterative search.

I wonder whether an iterative search like AutoTVM's could yield remarkable improvements in inference speed for LLMs on MLC for certain hardware.

Thoughts?

tqchen (Contributor) commented May 4, 2024

We already apply auto-tuning implicitly in many cases. The current mechanism, dlight, is already somewhat auto-tuned and then coded into the rules, but it can indeed be tweaked further; see examples like apache/tvm#16932

Our general philosophy is to decouple auto-tuning from the build: it is still possible to auto-tune and then apply the configuration found, and right now some of these configs are coded directly into the templates. Starting from the dlight space is likely better for LLM-specific use cases.
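
For context, a rough sketch of how dlight's prebuilt schedule rules can be applied to a module before the build step (the exact rule set below is illustrative, and `mod`/`target` are assumed names rather than the exact MLC pipeline):

```python
# A rough sketch: applying dlight's prebuilt GPU schedule rules to an IRModule
# before building. The rule set here is illustrative; `mod` is assumed to be a
# Relax/TIR IRModule and `target` a GPU target (e.g. Mali/OpenCL, CUDA).
from tvm import dlight as dl

with target:
    mod = dl.ApplyDefaultSchedule(  # schedules every TIR PrimFunc in `mod`
        dl.gpu.Matmul(),
        dl.gpu.GEMV(),
        dl.gpu.Reduction(),
        dl.gpu.GeneralReduction(),
        dl.gpu.Fallback(),
    )(mod)
```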

federicoparra (Author)

I'm sorry, I don't know what dlight is. I tried AutoTVM with a vision model and the iterative optimization search took very long, so I'm assuming a full iterative-search optimization for an LLM could take hours to reach the most performant code? In contrast, MLC compilation on my Orange Pi takes seconds even with optimizations maxed out.

Is there a way to optimize an MLC model the way we do in Relay? Please point me in the right direction, thanks!

tqchen (Contributor) commented May 4, 2024

Relax is actually a better iteration of Relay that addresses some of the long-compilation-time issues. The prebuilt rules already cover a better space, so the result is better optimized. You can think of Relax as having a better starting point than AutoTVM. The dlight code is here

tqchen closed this as completed May 11, 2024