
AutoTVM optimization? #2244

Closed
federicoparra opened this issue Apr 28, 2024 · 3 comments
Labels: question (Question about the usage)

Comments

federicoparra commented Apr 28, 2024

I recently went through this tutorial: https://tvm.apache.org/docs/tutorial/autotvm_relay_x86.html

Model execution performance on my Orange Pi's Mali GPU improved quite a lot during the optimization process; crucially, this is not a fixed set of optimizations but an iterative search that improves inference performance on your specific hardware.
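
For reference, here is a minimal sketch of that tuning loop, roughly following the linked tutorial (it assumes a Relay module `mod`, its `params`, and a `target` are already defined; the trial counts are illustrative):

```python
# A minimal sketch of the iterative AutoTVM search from the linked tutorial.
# Assumes a Relay module `mod`, its `params`, and a `target` are already defined.
import tvm
from tvm import relay, autotvm
from tvm.autotvm.tuner import XGBTuner

# Extract tunable tasks (conv2d, dense, ...) from the Relay program.
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1, timeout=10),
)

log_file = "tuning.json"
for task in tasks:
    tuner = XGBTuner(task, loss_type="rank")
    # Each trial builds and benchmarks one candidate schedule on the device.
    tuner.tune(
        n_trial=min(200, len(task.config_space)),
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file(log_file)],
    )

# Re-build the model with the best configurations found during the search.
with autotvm.apply_history_best(log_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```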

In contrast, MLC compilation using Relax, even at the maximum optimization settings, appears to apply a fixed set of optimizations, with no equivalent iterative search.

I wonder whether an iterative search like AutoTVM's could yield remarkable improvements in inference speed for LLMs on MLC for certain hardware.

Thoughts?

tqchen (Contributor) commented May 4, 2024

We already apply auto-tuning implicitly in many cases. The current mechanism, dlight, is already somewhat auto-tuned and then coded into the rules, but it can indeed be tweaked further; see examples like apache/tvm#16932

Our general philosophy is to decouple auto-tuning from the build: it is still possible to auto-tune and then apply the configuration found, and right now some of these configs are coded directly into the templates. Starting from the dlight space is likely better for LLM-specific use cases.
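
For context, a rough sketch of how dlight's prebuilt schedule rules can be applied to a module before the build step (the exact rule set below is illustrative, and `mod`/`target` are assumed names rather than the exact MLC pipeline):

```python
# A rough sketch: applying dlight's prebuilt GPU schedule rules to an IRModule
# before building. The rule set here is illustrative; `mod` is assumed to be a
# Relax/TIR IRModule and `target` a GPU target (e.g. Mali/OpenCL, CUDA).
from tvm import dlight as dl

with target:
    mod = dl.ApplyDefaultSchedule(  # schedules every TIR PrimFunc in `mod`
        dl.gpu.Matmul(),
        dl.gpu.GEMV(),
        dl.gpu.Reduction(),
        dl.gpu.GeneralReduction(),
        dl.gpu.Fallback(),
    )(mod)
```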

federicoparra (Author)

I'm sorry, I don't know what dlight is. I tried AutoTVM with a vision model and the iterative optimization search took very long, so I'm assuming a full iterative-search optimization for an LLM could take hours to reach the most performant code? In contrast, MLC compilation on my Orange Pi takes seconds even with optimizations maxed out.

Is there a way to optimize an MLC model the way we do in Relay? Please point me in the right direction, thanks!

tqchen (Contributor) commented May 4, 2024

Relax is actually a better iteration of Relay that addresses some of the long-compilation-time issues. The prebuilt rules already cover a better space, so the result is better optimized. You can think of Relax as having a better starting point than AutoTVM. The dlight code is here

tqchen closed this as completed May 11, 2024