
[Regression] ObservationWriter.AddList and DiscreteActionOutputApplier.Apply have become inefficient with Sentis 1.2.0+ #6112

Open
HyperlightDennis opened this issue May 9, 2024 · 0 comments
Labels
bug Issue describes a potential bug in ml-agents.


@HyperlightDennis

ObservationWriter.AddList loops over data and writes it to a tensor in Line 189.

for (var index = 0; index < data.Count; index++)
{
    var val = data[index];
    ((TensorFloat)m_Proxy.data)[m_Batch, index + m_Offset + writeOffset] = val;
}

In Barracuda 3.0, these writes went to a cache (https://github.com/Unity-Technologies/barracuda-release/blob/release/3.0.1/Barracuda/Runtime/Core/Tensor.cs#L2300) before being uploaded to the tensor, which resulted in 1 job per sensor. However, in Sentis 1.2.0+, there is no cache. In BurstTensorData.cs, the following code is executed for every element written:

    public void Set<T>(int index, T value) where T : unmanaged
    {
        CompleteAllPendingOperations();
        m_Array.Set<T>(m_Offset + index, value);
    }

This results in 1 job executed per float observation.
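For illustration, a write-side cache on the ML-Agents side could look roughly like this. This is a minimal sketch, not existing ML-Agents code: the member names (m_Cache, m_BatchStride, Flush) are hypothetical, and it assumes Sentis allows constructing a TensorFloat from a managed float[] in one shot.

```csharp
// Hypothetical write-side cache (all member names invented for illustration).
// Observations accumulate in a managed float[] and are uploaded to the tensor
// once per inference step, restoring the one-job-per-sensor behavior that
// Barracuda's internal cache provided.
float[] m_Cache;    // one slot per float in the observation tensor
int m_BatchStride;  // floats per batch entry

public void AddList(IList<float> data, int writeOffset = 0)
{
    for (var index = 0; index < data.Count; index++)
    {
        // Plain managed write: no CompleteAllPendingOperations(), no per-float job sync.
        m_Cache[m_Batch * m_BatchStride + m_Offset + writeOffset + index] = data[index];
    }
}

public TensorFloat Flush(TensorShape shape)
{
    // One upload for the whole tensor instead of one job per float,
    // assuming a TensorFloat(shape, float[]) constructor is available.
    return new TensorFloat(shape, m_Cache);
}
```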

The same applies to DiscreteActionOutputApplier.Apply, which fetches from a tensor in Line 94:

for (var j = 0; j < actionSize; j++)
{
    discreteBuffer[j] = ((TensorInt)tensorProxy.data)[agentIndex, j];
}

In Sentis 1.2.0+, there is no cache, so this results in 1 job per action instead of 1 job total. In BurstTensorData.cs, the following code is executed for every element read:

    public T Get<T>(int index) where T : unmanaged
    {
        CompleteAllPendingOperations();
        return m_Array.Get<T>(m_Offset + index);
    }
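On the read side, the per-element fetches could likewise be collapsed into a single download per tensor. A rough sketch, assuming Sentis exposes a one-shot download such as ToReadOnlyArray() and that the tensor uses a row-major [batch, actionSize] layout:

```csharp
// One download (one pending-operations sync) for the whole tensor,
// instead of one job sync per action element.
var actions = ((TensorInt)tensorProxy.data).ToReadOnlyArray();
for (var j = 0; j < actionSize; j++)
{
    // Index into the managed copy rather than the tensor itself.
    discreteBuffer[j] = actions[agentIndex * actionSize + j];
}
```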

A cache should probably be implemented on the ML-Agents inference side to reduce the number of job requests.

To Reproduce

  1. Run any model and profile it in the Unity Editor using Burst as a backend.
  2. Inspect the GenerateTensors and ApplyTensors markers. Compare the results using MLAgents 2.0.1 (with Barracuda 3) vs MLAgents 3.0.0 (with Sentis).
  3. Observe the increase in the number of job requests.

Screenshots
With MLAgents 2.0.1 + Barracuda 3, GenerateTensors created 3 jobs, one for each of my observation tensors. This took 0.377 ms.
MLAgents 2.0.1 + Barracuda 3 (GenerateTensors)

With MLAgents 2.0.1 + Barracuda 3, ApplyTensors took 0.02 ms.
MLAgents 2.0.1 + Barracuda 3 (ApplyTensors)

With MLAgents 3.0.0 + Sentis 1.2.0, GenerateTensors created 5,000+ jobs and took 8.97 ms (24x longer).
MLAgents 3.0.0 + Sentis 1.2.0 (GenerateTensors)

With MLAgents 3.0.0 + Sentis 1.2.0, ApplyTensors created 250+ jobs and took 0.22 ms (11x longer).
MLAgents 3.0.0 + Sentis 1.2.0 (ApplyTensors)

Environment (please complete the following information):

  • Unity Version: Unity 2022.3.27f1
  • OS + version: Windows 11
  • ML-Agents version: 3.0.0
HyperlightDennis added the bug label May 9, 2024