Most efficient way to encode arguments #355

OskarGroth opened this issue Mar 3, 2023 · 3 comments
OskarGroth commented Mar 3, 2023

I have an app that implements realtime Metal rendering pipelines at 60 FPS. My arguments (filter, render, and shader pipelines) won't change 99.9% of the time, so it seems that the best method of storage and encoding is a Metal buffer. We also use triple buffering for our rendering, creating MTLBuffers at 3x the size of the parameters (with proper alignment).

I'm trying to replace parts of it with MetalPetal, but I'm confused as to what is the best way to encode arguments. I've been trying to use MTIDataBuffer in the following way:

class UniformBuffer<T> {
    var buffers: [MTIDataBuffer]
    var maxBuffersInFlight: Int
    var index: Int = 0
    var current: MTIDataBuffer {
        buffers[index]
    }

    init(data: T, context: RenderContext) {
        maxBuffersInFlight = context.maxFramesInFlight
        buffers = (0..<maxBuffersInFlight).compactMap { _ in
            MTIDataBuffer(values: [data], options: .storageModeShared)
        }
    }

    func update(_ data: T) {
        index = (index + 1) % maxBuffersInFlight
        buffers[index].unsafeAccess { (buffer: UnsafeMutableBufferPointer<T>) -> Void in
            buffer[0] = data
        }
    }
}

While this works, it creates 3x buffers instead of one contiguous buffer (worse performance?). It also incurs a bit of a performance hit because the argument encoder looks up buffers in a cache every time. I'm also concerned that this may interfere with the triple buffering and reuse the same single MTLBuffer under the hood.

If I am OK with managing my MTLBuffers manually, how can I use them in conjunction with MetalPetal? Is there a better way to optimise resource usage?


YuAo commented Mar 4, 2023

I think you are doing pretty well given the current limitations of the framework.

You can do a little "hack" if you want, using MTIGeometry, to fully customize the command-encoding part. The command encoder is accessible from within the MTIGeometry protocol: store the buffer in your MTIGeometry object and encode it yourself. (Pass [:] as parameters to skip the default argument encoding.)

Examples of implementing MTIGeometry:

private class PointVertices: NSObject, MTIGeometry {
    func copy(with zone: NSZone? = nil) -> Any {
        return self
    }

    func encodeDrawCall(with commandEncoder: MTLRenderCommandEncoder, context: MTIGeometryRenderingContext) {
        commandEncoder.drawPrimitives(type: .point, vertexStart: 0, vertexCount: 1, instanceCount: BouncingBallsView.numberOfParticles)
    }
}

- (void)encodeDrawCallWithCommandEncoder:(id<MTLRenderCommandEncoder>)commandEncoder context:(id<MTIGeometryRenderingContext>)context {
    // Assuming the buffer is bound to index 0.
    [_vertexBuffer encodeToVertexBufferAtIndex:0 withCommandEncoder:commandEncoder];
    if (_indexBuffer) {
        [commandEncoder drawIndexedPrimitives:_primitiveType indexCount:_indexCount indexType:MTLIndexTypeUInt32 indexBuffer:[_indexBuffer bufferForDevice:commandEncoder.device] indexBufferOffset:0];
    } else {
        [commandEncoder drawPrimitives:_primitiveType vertexStart:0 vertexCount:_vertexCount];
    }
}

Examples of using MTIGeometry:

let renderCommand = MTIRenderCommand(kernel: BouncingBallsView.renderKernel, geometry: PointVertices(), images: [computeOutput], parameters: ["data": frameDataBuffer.buffer])
let output = MTIRenderCommand.images(byPerforming: [renderCommand], outputDescriptors: [MTIRenderPassOutputDescriptor(dimensions: MTITextureDimensions(width: 1024, height: 1024, depth: 1), pixelFormat: .unspecified, clearColor: MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1), loadAction: .clear, storeAction: .store)])


it creates 3x buffers instead of one contiguous buffer (worse performance?)

The 3x buffers are only created once. It's a one-time cost; using a contiguous buffer won't help much.

It also incurs a bit of a performance hit because the argument encoder looks up buffers in a cache every time.

Have you measured the performance of this part? It should be super fast. The cache uses the object's address directly for key hashing and comparison; also, the hash table is very small (you usually have only one MTLDevice).

I'm also concerned that this may interfere with the triple buffering and reuse the same single MTLBuffer under the hood.

As I mentioned earlier, the cache is keyed by the object's pointer/address, so it does not interfere with the triple buffering in any way.
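To illustrate why identity-keyed caching cannot break triple buffering, here is a minimal, MetalPetal-independent sketch (ObjectIdentifier stands in for the pointer-based key, and Box is a placeholder for a buffer object; neither is MetalPetal API). Two distinct objects with identical contents still get distinct cache entries, so rotating among three buffer objects can never collide:

```swift
final class Box {            // stands in for an MTIDataBuffer-like object
    let payload: Int
    init(_ payload: Int) { self.payload = payload }
}

// A cache keyed by object identity (address), not by contents.
var cache: [ObjectIdentifier: String] = [:]
let a = Box(0), b = Box(0)   // same contents, different identities

cache[ObjectIdentifier(a)] = "entry for a"
cache[ObjectIdentifier(b)] = "entry for b"

print(cache.count)                                  // 2 — one slot per object
print(ObjectIdentifier(a) == ObjectIdentifier(b))   // false
```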


For further improvements to filter argument encoding, I'm thinking about adding APIs for registering custom argument encoders. I'll let you know when I have some designs to discuss.


OskarGroth commented Mar 6, 2023

You can do a little "hack" if you want, using MTIGeometry, to fully customize the command encoding part

Interesting! I'll definitely try that, although it does feel like quite a workaround.

The 3x buffers are only created once. It's a one-time cost; using a contiguous buffer won't help much.

True, thanks for confirming.

Have you measured the performance of this part? This should be super fast.

Fair point; I haven't. If the cache uses object addresses, then I agree with you: this shouldn't be a problem for speed or accuracy. Thanks for explaining!

For further improvements on filter argument encoding, I'm thinking about adding APIs to register custom argument encoders. I'll let you know when I have some designs to discuss

Definitely do! Maybe MTIRenderPipelineKernel could be extended to support a parameter-encoding closure instead of a parameter dictionary? For example, I would love the convenience of calls like kernel.makeImage for quickly generating a full-viewport image, but with custom fragment-buffer encoding. So, something like:

MTIRenderPipelineKernel(encoding: @escaping ((MTLRenderCommandEncoder) -> Void), dimensions: MTITextureDimensions, pixelFormat: MTLPixelFormat = .unspecified)

and a matching makeImage.
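At the call site, that proposal might look something like the following. To be clear, this is entirely hypothetical API that does not exist in MetalPetal today; only setFragmentBuffer(_:offset:index:) is real (MTLRenderCommandEncoder), and myUniformBuffer is a placeholder for a manually managed MTLBuffer:

```swift
// Hypothetical: a kernel that delegates fragment-argument encoding
// to a caller-supplied closure instead of a parameter dictionary.
let kernel = MTIRenderPipelineKernel(
    encoding: { encoder in
        // Bind a manually managed MTLBuffer directly, bypassing
        // the default dictionary-based argument encoder.
        encoder.setFragmentBuffer(myUniformBuffer, offset: 0, index: 0)
    },
    dimensions: MTITextureDimensions(width: 1024, height: 1024, depth: 1),
    pixelFormat: .unspecified
)
```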

I also wanted to ask about texture usage. If I regenerate the same image (but with different parameters) every frame, how can I do so as performantly as possible? Doing it manually, I would reuse a single texture (assuming the dimensions are the same) with loadAction: .clear, but I noticed my MetalPetal pipeline uses loadAction: .dontCare, so I am assuming it creates a new texture for each frame. If so, how can I optimise resource usage there?


YuAo commented Mar 28, 2023

I also wanted to ask about texture usage. If I regenerate the same image (but with different parameters) every frame, how can I do so as performantly as possible? Doing it manually, I would reuse a single texture (assuming the dimensions are the same) with loadAction: .clear, but I noticed my MetalPetal pipeline uses loadAction: .dontCare, so I am assuming it creates a new texture for each frame. If so, how can I optimise resource usage there?

@OskarGroth

MetalPetal uses a texture pool internally; it allocates and reuses textures from the pool. (You can call context.reclaimResources() to drain the pool.)

So it does not create a new texture for each frame.

MetalPetal uses .dontCare by default purely for performance: we assume that you draw every pixel of the texture, so the device does not need to clear it before each draw (since every pixel will be overwritten). You can change the default loadAction: .dontCare by providing a custom MTIRenderPassOutputDescriptor.
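For example, overriding the default so the pool-backed texture is cleared before each draw might look like this. This is a sketch based on the MTIRenderPassOutputDescriptor initializer used earlier in this thread; the dimensions are illustrative and renderCommand is a placeholder for an existing MTIRenderCommand:

```swift
// Override the default .dontCare so the texture is cleared before drawing.
let outputDescriptor = MTIRenderPassOutputDescriptor(
    dimensions: MTITextureDimensions(width: 1024, height: 1024, depth: 1),
    pixelFormat: .unspecified,
    clearColor: MTLClearColor(red: 0, green: 0, blue: 0, alpha: 0),
    loadAction: .clear,
    storeAction: .store
)
let output = MTIRenderCommand.images(byPerforming: [renderCommand],
                                     outputDescriptors: [outputDescriptor])
```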
