Most efficient way to encode arguments #355

OskarGroth opened this issue Mar 3, 2023 · 3 comments
OskarGroth commented Mar 3, 2023

I have an app that implements realtime Metal rendering pipelines at 60 FPS. My arguments (filter, render, and shader pipelines) won't change 99.9% of the time, so it seems that the best method of storage and encoding is a Metal buffer. We also use triple buffering for our rendering, creating MTLBuffers at 3x the size of the parameters (with proper alignment).

I'm trying to replace parts of it with MetalPetal, but I'm confused as to what is the best way to encode arguments. I've been trying to use MTIDataBuffer in the following way:

class UniformBuffer<T> {
    var buffers: [MTIDataBuffer]
    var maxBuffersInFlight: Int
    var index: Int = 0
    var current: MTIDataBuffer {
        buffers[index]
    }

    init(data: T, context: RenderContext) {
        maxBuffersInFlight = context.maxFramesInFlight
        buffers = (0..<maxBuffersInFlight).compactMap { _ in
            MTIDataBuffer(values: [data], options: .storageModeShared)
        }
    }

    func update(_ data: T) {
        index = (index + 1) % maxBuffersInFlight
        buffers[index].unsafeAccess { (buffer: UnsafeMutableBufferPointer<T>) -> Void in
            buffer[0] = data
        }
    }
}

While this works, it creates 3x buffers instead of one contiguous buffer (worse performance?). It also incurs a bit of a performance hit because the argument encoder looks up buffers in a cache every time. I'm also concerned that this may interfere with the triple buffering and reuse the same single MTLBuffer under the hood.

If I am OK with managing my MTLBuffers manually, how can I use them in conjunction with MetalPetal? Is there a better way to optimise resource usage?


YuAo commented Mar 4, 2023

I think you are doing pretty well given the current limitations of the framework.

You can do a little "hack" if you want, using MTIGeometry, to fully customize the command-encoding part. The command encoder is accessible from within the MTIGeometry protocol: store the buffer in your MTIGeometry object and encode it yourself. (Pass [:] as parameters to skip the default argument encoding.)

Examples of implementing MTIGeometry:

private class PointVertices: NSObject, MTIGeometry {
    func copy(with zone: NSZone? = nil) -> Any {
        return self
    }

    func encodeDrawCall(with commandEncoder: MTLRenderCommandEncoder, context: MTIGeometryRenderingContext) {
        commandEncoder.drawPrimitives(type: .point, vertexStart: 0, vertexCount: 1, instanceCount: BouncingBallsView.numberOfParticles)
    }
}

- (void)encodeDrawCallWithCommandEncoder:(id<MTLRenderCommandEncoder>)commandEncoder context:(id<MTIGeometryRenderingContext>)context {
    // Assuming the buffer is bound to index 0.
    [_vertexBuffer encodeToVertexBufferAtIndex:0 withCommandEncoder:commandEncoder];
    if (_indexBuffer) {
        [commandEncoder drawIndexedPrimitives:_primitiveType indexCount:_indexCount indexType:MTLIndexTypeUInt32 indexBuffer:[_indexBuffer bufferForDevice:commandEncoder.device] indexBufferOffset:0];
    } else {
        [commandEncoder drawPrimitives:_primitiveType vertexStart:0 vertexCount:_vertexCount];
    }
}

Examples of using MTIGeometry:

let renderCommand = MTIRenderCommand(kernel: BouncingBallsView.renderKernel, geometry: PointVertices(), images: [computeOutput], parameters: ["data": frameDataBuffer.buffer])
let output = MTIRenderCommand.images(byPerforming: [renderCommand], outputDescriptors: [MTIRenderPassOutputDescriptor(dimensions: MTITextureDimensions(width: 1024, height: 1024, depth: 1), pixelFormat: .unspecified, clearColor: MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1), loadAction: .clear, storeAction: .store)])


it creates 3x buffers instead of one contiguous buffer (worse performance?)

The 3x buffers are only created once. It's a one-time cost; using a contiguous buffer won't help much.

It also incurs a bit of a performance hit because the argument encoder looks up buffers in a cache every time.

Have you measured the performance of this part? It should be super fast. The cache uses the object's address directly for key hashing and comparison; also, the hash table is very small (you usually have only one MTLDevice).

I'm also concerned that this may interfere with the triple buffering and reuse the same single MTLBuffer under the hood.

As I mentioned earlier, the cache is keyed by the object's pointer/address, so it does not interfere with the triple buffering in any way.
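To illustrate why identity-keyed caching cannot break triple buffering, here is a minimal, MetalPetal-independent sketch (ObjectIdentifier stands in for the pointer-based key, and Box is a placeholder for a buffer object; neither is MetalPetal API). Two distinct objects with identical contents still get distinct cache entries, so rotating among three buffer objects can never collide:

```swift
final class Box {            // stands in for an MTIDataBuffer-like object
    let payload: Int
    init(_ payload: Int) { self.payload = payload }
}

// A cache keyed by object identity (address), not by contents.
var cache: [ObjectIdentifier: String] = [:]
let a = Box(0), b = Box(0)   // same contents, different identities

cache[ObjectIdentifier(a)] = "entry for a"
cache[ObjectIdentifier(b)] = "entry for b"

print(cache.count)                                  // 2 — one slot per object
print(ObjectIdentifier(a) == ObjectIdentifier(b))   // false
```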


For further improvements to filter argument encoding, I'm thinking about adding APIs for registering custom argument encoders. I'll let you know when I have some designs to discuss.


OskarGroth commented Mar 6, 2023

You can do a little "hack" if you want, using MTIGeometry, to fully customize the command encoding part

Interesting! I'll definitely try that, although it does feel like quite a workaround.

The 3x buffers are only created once. It's a one-time cost; using a contiguous buffer won't help much.

True, thanks for confirming.

Have you measured the performance of this part? This should be super fast.

Fair point; I haven't. If the cache uses object addresses, then I agree with you: this shouldn't be a problem for speed or accuracy. Thanks for explaining!

For further improvements on filter argument encoding, I'm thinking about adding APIs to register custom argument encoders. I'll let you know when I have some designs to discuss

Definitely do! Maybe MTIRenderPipelineKernel could be extended to support a parameter-encoding closure instead of a parameter dictionary? For example, I would love the convenience of calls like kernel.makeImage for quickly generating a full-viewport image, but with custom fragment-buffer encoding. So, something like:

MTIRenderPipelineKernel(encoding: @escaping ((MTLRenderCommandEncoder) -> Void), dimensions: MTITextureDimensions, pixelFormat: MTLPixelFormat = .unspecified)

and a matching makeImage.
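At the call site, that proposal might look something like the following. To be clear, this is entirely hypothetical API that does not exist in MetalPetal today; only setFragmentBuffer(_:offset:index:) is real (MTLRenderCommandEncoder), and myUniformBuffer is a placeholder for a manually managed MTLBuffer:

```swift
// Hypothetical: a kernel that delegates fragment-argument encoding
// to a caller-supplied closure instead of a parameter dictionary.
let kernel = MTIRenderPipelineKernel(
    encoding: { encoder in
        // Bind a manually managed MTLBuffer directly, bypassing
        // the default dictionary-based argument encoder.
        encoder.setFragmentBuffer(myUniformBuffer, offset: 0, index: 0)
    },
    dimensions: MTITextureDimensions(width: 1024, height: 1024, depth: 1),
    pixelFormat: .unspecified
)
```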

I also wanted to ask about texture usage. If I regenerate the same image (but with different parameters) every frame, how can I do so as performantly as possible? Doing it manually, I would reuse a single texture (assuming the dimensions are the same) with loadAction: .clear, but I noticed my MetalPetal pipeline uses loadAction: .dontCare, so I am assuming it creates a new texture for each frame. If so, how can I optimise resource usage there?


YuAo commented Mar 28, 2023

I also wanted to ask about texture usage. If I regenerate the same image (but with different parameters) every frame, how can I do so as performantly as possible? Doing it manually, I would reuse a single texture (assuming the dimensions are the same) with loadAction: .clear, but I noticed my MetalPetal pipeline uses loadAction: .dontCare, so I am assuming it creates a new texture for each frame. If so, how can I optimise resource usage there?

@OskarGroth

MetalPetal uses a texture pool internally; it allocates and reuses textures from the pool. (You can call context.reclaimResources() to drain the pool.)

So it does not create a new texture for each frame.

MetalPetal uses .dontCare by default purely for performance: we assume that you draw every pixel of the texture, so the device does not need to clear it before each draw (since every pixel will be overwritten). You can change the default loadAction: .dontCare by providing a custom MTIRenderPassOutputDescriptor.
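For example, overriding the default so the pool-backed texture is cleared before each draw might look like this. This is a sketch based on the MTIRenderPassOutputDescriptor initializer used earlier in this thread; the dimensions are illustrative and renderCommand is a placeholder for an existing MTIRenderCommand:

```swift
// Override the default .dontCare so the texture is cleared before drawing.
let outputDescriptor = MTIRenderPassOutputDescriptor(
    dimensions: MTITextureDimensions(width: 1024, height: 1024, depth: 1),
    pixelFormat: .unspecified,
    clearColor: MTLClearColor(red: 0, green: 0, blue: 0, alpha: 0),
    loadAction: .clear,
    storeAction: .store
)
let output = MTIRenderCommand.images(byPerforming: [renderCommand],
                                     outputDescriptors: [outputDescriptor])
```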
