
Initial Implementation of MediaTek Backend for Executorch #3571

Open · wants to merge 13 commits into base: main

Conversation

neuropilot-captain

This pull request introduces the initial implementation of the MediaTek backend for Executorch. Below are the key points and instructions related to this contribution:

  • The "Ahead of Time" compilation feature is not included in this pull request due to technical limitations. We plan to address this and include it in a subsequent update.
  • The build process and execution phase rely on libraries provided by the MediaTek NeuroPilot SDK. For the purpose of verification, these libraries will be made available offline.

Build Instructions:

To build the MediaTek backend for Executorch, please adhere to the guidelines provided in the backends/mediatek/scripts/README.md file within this repository.

On-device Execution:

For instructions on how to run an example on MediaTek devices, refer to the documentation in examples/mediatek/README.md.

Llama Example:

A sample Llama runner is built together with the MediaTek backend.

- Add backends/mediatek & examples/mediatek
- Add runtime: Neuron ExecuTorch Backend
- Add example: mtk_executor_runner

pytorch-bot commented May 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3571

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label May 10, 2024
Download the Android NDK and make sure that $ANDROID_NDK is set to the NDK path.

3. Download **MediaTek ExecuTorch libraries**
Download related libraries.
Contributor

What are these and where do you get them from? I would like to repro.

@@ -0,0 +1,29 @@
# ExecuTorch Neuron Backend examples

## Directory structure
Contributor

Can you add a prerequisite here as to which phones support the MTK chipset? Also, do add "validated on" to indicate which phone or chipset this is validated on.

Author

Thanks for your suggestion! We have added the list of supported chips that the examples are validated on. Also, we have refactored this README for better readability.

Please follow the build steps in backends/mediatek/scripts.
The mtk_executor_runner will be generated automatically.

## Execute
Contributor

How do you generate the pte file?

@kimishpatel
Contributor

I don't see any instructions/code on the ahead-of-time component.


pytorch-bot commented May 10, 2024

Please seek CI approval before scheduling CIFlow labels

@mergennachin
Contributor

Thanks for your contribution and PR.

Will take a look at it deeply soon. I just kicked off OSS CI jobs. One thing: we have a lint rule. Here's how you can invoke it:

https://github.com/pytorch/executorch/blob/main/CONTRIBUTING.md#coding-style

@mergennachin
Contributor

> The build process and execution phase rely on libraries provided by the MediaTek NeuroPilot SDK. For the purpose of verification, these libraries will be made available offline.

Is there a public link that we can download from MediaTek website? If not, is there a plan to publish it publicly?

Contributor

@swolchok left a comment

just dropping in with a bunch of small C++ suggestions; feel free to apply them or not

backends/mediatek/runtime/include/NeuronBackend.h (outdated; resolved)
backends/mediatek/runtime/include/NeuronBackend.h (outdated; resolved)
backends/mediatek/runtime/NeuronExecutor.cpp (outdated; resolved)
backends/mediatek/runtime/include/NeuronLog.h (outdated; resolved)
ET_CHECK_MSG(
GetModelInstance() == nullptr,
"Model is already initialized before calling LoadModels.");
void* instance = CreateModelInstance(mModelPathMap[id]);
Contributor

s/mModelPathMap[id]/modelPath/ , no need to look it up again
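A sketch of the suggested change (it assumes the enclosing function already bound a local `modelPath` to the looked-up entry, per the reviewer's substitution; that name is not confirmed from the PR):

```cpp
// Before: second lookup into the map for the same id.
void* instance = CreateModelInstance(mModelPathMap[id]);

// After: reuse the path already retrieved for this iteration.
void* instance = CreateModelInstance(modelPath);
```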

}

template <typename IdType>
std::string MultiModelLoader<IdType>::GetModelPath() const {
Contributor

shouldn't this return a const reference to avoid copying?
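A hedged sketch of the const-reference version; `mModelPathMap` and `mCurrentModelId` are assumed member names used for illustration only:

```cpp
template <typename IdType>
const std::string& MultiModelLoader<IdType>::GetModelPath() const {
  // Returning a const reference hands back the stored string without
  // copying it; the caller must not use it past the entry's lifetime.
  return mModelPathMap.at(mCurrentModelId);
}
```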

size_t getRotEmbedLength() const;

private:
char* mMasterLut; // byte flatten array
Contributor

why not use std::unique_ptr<char[]>?

Comment on lines +48 to +50
if (mLutBuffer != nullptr) {
delete mLutBuffer;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. deleting nullptr is fine
  2. this should be delete[]
  3. but why not use std::unique_ptr<uint8_t[]> and avoid the whole thing?
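A minimal sketch of the `std::unique_ptr` alternative; the class name `LutHolder` is illustrative, not from the PR:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

class LutHolder {
 public:
  explicit LutHolder(size_t size)
      : mLutBuffer(std::make_unique<uint8_t[]>(size)) {}
  // No user-written destructor needed: unique_ptr<uint8_t[]> invokes
  // delete[] automatically, and destroying an empty pointer is a no-op.
 private:
  std::unique_ptr<uint8_t[]> mLutBuffer;
};
```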

Comment on lines +127 to +128
size_t num_memory_planned_buffers = method_meta->num_memory_planned_buffers();
for (size_t id = 0; id < num_memory_planned_buffers; ++id) {
Contributor

nit: it's probably not performance-critical, but reserving these vectors before filling them wouldn't hurt
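A sketch of the suggested reservation; `planned_buffers` and `planned_spans` are assumed to be the vectors being filled, following the portable executor_runner pattern this runner is based on:

```cpp
size_t num_memory_planned_buffers = method_meta->num_memory_planned_buffers();
// Reserve up front so the loop's push_back calls never reallocate.
planned_buffers.reserve(num_memory_planned_buffers);
planned_spans.reserve(num_memory_planned_buffers);
for (size_t id = 0; id < num_memory_planned_buffers; ++id) {
  // ... allocate buffer `id` and push_back into both vectors as before ...
}
```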

Author

As answered in the conversation below.
The current implementation of mtk_executor_runner is based on examples/portable/executor_runner, and we'll look into executorch/extension/module to determine if we need to modify the code.

Contributor

@cccclai left a comment

Thank you for pushing the first PR! Looks great :)

Overall I think NeuronBackend and mtk_executor_runner are reasonable. It will be great if we can share the expected flow for users to generate the .pte file and run it.

Regarding the llama runner logic, I can see why we need the multi-model loader and the caching logic, but it's not quite clear to me what logic from the .so is needed here. Maybe we can walk through it.

return Error::InvalidState;
};

auto& allocator = GET_NEURON_ALLOCATOR;
Contributor

We can set up a temp allocator pointing to the neuron allocator, which will be passed to NeuronExecuTorchDelegate::execute.
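A heavily hedged sketch of that idea: an ExecuTorch `MemoryAllocator` subclass that forwards temp allocations to the Neuron allocator, so scratch memory handed to execute is Neuron-visible. The `Allocate(size)` call on the object returned by `GET_NEURON_ALLOCATOR` is an assumed interface, not a confirmed API:

```cpp
#include <executorch/runtime/core/memory_allocator.h>

class NeuronTempAllocator : public torch::executor::MemoryAllocator {
 public:
  NeuronTempAllocator() : MemoryAllocator(/*size=*/0, /*base_address=*/nullptr) {}

  void* allocate(size_t size, size_t alignment = kDefaultAlignment) override {
    // Forward to the Neuron buffer allocator (assumed interface) so the
    // delegate's scratch buffers live in Neuron-accessible memory.
    auto& allocator = GET_NEURON_ALLOCATOR;
    return allocator.Allocate(size);
  }
};
```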

size_t mSize = 0;
};

class BufferAllocator {
Contributor

It looks to me like this allocator is for allocating memory in Neuron, not on the CPU; is that correct?

Author

Yes, you are right! The BufferAllocator is mainly used for allocating memory within the Neuron. However, the allocated buffers are also accessible to the CPU, allowing for shared use between the Neuron and the CPU.

ArrayRef<CompileSpec> compile_specs) const {
NeuronDelegateSetting setting;
for (auto& compile_spec : compile_specs) {
if (std::strcmp(compile_spec.key, kHighAddrKey) == 0) {
Contributor

Glad that you're leveraging the compile spec! I'm trying to follow what these keys mean; maybe we can discuss it.
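For readers following the thread, a hedged sketch of how the two keys might be consumed; the setting field names and the semantics in the comments are assumptions based on the key names, not confirmed by the PR:

```cpp
NeuronDelegateSetting setting;
for (auto& compile_spec : compile_specs) {
  if (std::strcmp(compile_spec.key, kHighAddrKey) == 0) {
    // Presumably toggles allocation in a high address range.
    setting.mHighAddr = *static_cast<char*>(compile_spec.value.buffer);
  } else if (std::strcmp(compile_spec.key, kImportForeverKey) == 0) {
    // Presumably keeps imported I/O buffers registered for the
    // model's whole lifetime.
    setting.mImportForever = *static_cast<char*>(compile_spec.value.buffer);
  }
}
```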


// ---------------------------Execution------------------------------------
// Run the compiled model against a set of inputs
err = NeuronExecution_create(compilation, &execution);
Contributor

Any specific reason we need to execute the model during compilation stage? Is it for verification purpose only?

Author

No, this function only creates an execution instance; it does not execute the model.

Contributor

The comment said "Run the compiled model against a set of inputs", and I thought it was executing the model. If not, can we update the comment to make it clear?
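Something along these lines would resolve the ambiguity (suggested wording only):

```cpp
// ----------------------Execution instance setup--------------------------
// Create an execution instance for the compiled model; nothing runs here.
// Actual inference happens later, in the delegate's execute path.
err = NeuronExecution_create(compilation, &execution);
```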

LogInfo("NeuronBackend", "version %u, input %u, output %u, length %u, payload size: %zu",
Payload.Header.Version, Payload.Header.InputCount, Payload.Header.OutputCount, Payload.Header.DataLen, processed->size());

auto delegate = std::unique_ptr<NeuronExecuTorchDelegate>(
Contributor

Any specific reason we have a NeuronExecuTorchDelegate wrapper but not applying the logic inside NeuronBackend directly?

return mExecutor.Compute() == NEURON_NO_ERROR ? Error::Ok : Error::InvalidState;
};

int NeuronExecuTorchDelegate::HintNeuronBackend(EValue** args) const {
Contributor

I'm trying to understand what HintNeuronBackend means. Is there some 'import forever' issue?

Author

'import forever' is an optimization hint designed for specific models, including Llama. We prefer not to apply it automatically to every model because it could lead to unknown effects.
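A hedged sketch of gating the hint per model, assuming a boolean `mSettings.mImportForever` populated from the compile spec (both names are assumptions; the signature follows the snippet above):

```cpp
int NeuronExecuTorchDelegate::HintNeuronBackend(EValue** args) const {
  // Only apply the 'import forever' optimization when the exporter
  // explicitly opted in via kImportForeverKey.
  if (!mSettings.mImportForever) {
    return NEURON_NO_ERROR;
  }
  // ... mark the I/O buffers in `args` as imported for the model's
  // lifetime (backend-specific calls elided) ...
  return NEURON_NO_ERROR;
}
```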

Comment on lines +22 to +28
size_t prompt_token_batch_size = 1;
size_t cache_size = 1024;
size_t hidden_size = 4096;
size_t num_head = 32;
size_t num_layer = 32;
size_t max_token_length = 2048;
double rot_emb_base = 10000.0f;
Contributor

These values ideally shouldn't be hard-coded but retrieved from the model. If they're hard-coded, something might go wrong when users export a Llama-based model with a different architecture.
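One hedged way to avoid the hard-coding: load the architecture parameters from a small sidecar file exported next to the .pte. The struct name `LlamaModelOptions` and the "key value" file format are illustrative assumptions:

```cpp
#include <cstddef>
#include <fstream>
#include <string>

struct LlamaModelOptions {
  size_t prompt_token_batch_size = 1;
  size_t cache_size = 1024;
  size_t hidden_size = 4096;
  size_t num_head = 32;
  size_t num_layer = 32;
  size_t max_token_length = 2048;
  double rot_emb_base = 10000.0;
};

// Parse "key value" pairs, falling back to the defaults above.
LlamaModelOptions LoadModelOptions(const std::string& path) {
  LlamaModelOptions opt;
  std::ifstream in(path);
  std::string key;
  double value;
  while (in >> key >> value) {
    if (key == "hidden_size") opt.hidden_size = static_cast<size_t>(value);
    else if (key == "num_head") opt.num_head = static_cast<size_t>(value);
    else if (key == "num_layer") opt.num_layer = static_cast<size_t>(value);
    else if (key == "max_token_length") opt.max_token_length = static_cast<size_t>(value);
    else if (key == "rot_emb_base") opt.rot_emb_base = value;
  }
  return opt;
}
```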

double rot_emb_base = 10000.0f;

// Types
LLMType model_input_type = LLMType::INT16;
Contributor

Hmm how do you quantize the model?

return indexes;
}

LlamaModelChunk::LlamaModelChunk(
Contributor

Does the user export 4 .pte files or 1 .pte file?

neuropilot-captain and others added 4 commits May 17, 2024 15:08
- Fix typos in NeuronExecutor.cpp.
- Remove duplicate APUWareUtilsLib.h in CMakeLists.txt.
- Update NEURON_BUFFER_ALLOCATOR to NEURON_BUFFER_ALLOCATOR_LIB.
- Clean up mtk_build.sh and set BUCK_PATH.
- Modify ReadSystemProperty signature to take string reference.
- Add MediaTek libraries introduction
- Improve readability
- Add the list of supported devices
- Improve readability
neuropilot-captain and others added 6 commits May 23, 2024 09:38
- minor wording change
- Fix Buck2 download link
- Add command example
Changed kHighAddrKey and kImportForeverKey from constexpr to extern in NeuronBackend.h.
Defined kHighAddrKey and kImportForeverKey in NeuronBackend.cpp.
Moved backend registration logic to an anonymous namespace in NeuronBackend.cpp.