MPS backend thinks that subnormals are equal to zero #125051
Comments
Confirmed this issue still occurs on the latest torch; we're investigating on the MPS side.
MPS does not natively support double, so the float value next after zero will be subnormal. MPS treats subnormal values as zero, so the mismatch between MPS and CPU results in this case is intended behavior. Closing this issue for now.
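To make the statement above concrete, here is a minimal Python sketch (standard library only) of what "the float value next after zero is subnormal" means for float32, and what a flush-to-zero policy does to it. The helper names (`f32_from_bits`, `is_subnormal_f32`, `flush_to_zero_f32`) are hypothetical and introduced only for illustration; the flush function emulates the described behavior on the CPU and is not how MPS or Metal implements it.

```python
import math
import struct


def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bits(value: float) -> int:
    """Return the binary32 bit pattern of a float32-representable value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def is_subnormal_f32(value: float) -> bool:
    """A binary32 value is subnormal when its exponent field is 0 and its mantissa is not."""
    bits = f32_to_bits(value)
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return exponent == 0 and mantissa != 0


def flush_to_zero_f32(value: float) -> float:
    """Hypothetical CPU emulation of flush-to-zero: subnormals become a signed zero."""
    if is_subnormal_f32(value):
        return math.copysign(0.0, value)
    return value


# Bit pattern 0x00000001 is the float32 value right after +0.0.
SMALLEST_SUBNORMAL = f32_from_bits(0x00000001)
# Bit pattern 0x00800000 is the smallest positive normal float32.
SMALLEST_NORMAL = f32_from_bits(0x00800000)

print("next after zero:", SMALLEST_SUBNORMAL, "subnormal:", is_subnormal_f32(SMALLEST_SUBNORMAL))
print("compares > 0 under IEEE 754:", SMALLEST_SUBNORMAL > 0.0)
print("compares > 0 after flushing:", flush_to_zero_f32(SMALLEST_SUBNORMAL) > 0.0)
```

Under this emulation, the comparison against zero flips once the value is flushed, which is the kind of CPU/MPS mismatch the issue title describes.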
@skotapati Thanks for clarifying! Is this a hardware limitation? Could MPS support subnormal numbers in fp32 in the future if it became a priority? From what I can tell, subnormal numbers are supported in fp16. Is this correct? Will operations on subnormal numbers behave differently from operations on normal numbers in fp16 (e.g. be slower)? Assuming there are adverse effects, is there a way to flag FTZ/DAZ with passthrough in MPS?
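For reference, the fp16 subnormal range that the question refers to can be inspected on the CPU with a short Python sketch. It uses the standard library's half-precision `struct` format (`"e"`); the helper name `f16_from_bits` is hypothetical. It only shows which values are subnormal in IEEE 754 binary16 and says nothing about how MPS hardware handles them.

```python
import struct


def f16_from_bits(bits: int) -> float:
    """Reinterpret a 16-bit pattern as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


# Exponent field 0 with a non-zero mantissa marks a subnormal binary16 value.
F16_SMALLEST_SUBNORMAL = f16_from_bits(0x0001)
F16_LARGEST_SUBNORMAL = f16_from_bits(0x03FF)
# Exponent field 1 with a zero mantissa is the smallest positive normal value.
F16_SMALLEST_NORMAL = f16_from_bits(0x0400)

print("smallest fp16 subnormal:", F16_SMALLEST_SUBNORMAL)
print("largest fp16 subnormal: ", F16_LARGEST_SUBNORMAL)
print("smallest fp16 normal:   ", F16_SMALLEST_NORMAL)
```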
Weird, indeed
import Metal
let shader_source = """
#include <metal_stdlib>
using namespace metal;
kernel void nextafter_pred(device float *data [[buffer(0)]],
device bool *pred [[buffer(1)]],
uint thread_index [[thread_position_in_grid]]) {
data[thread_index] = nextafter(float(thread_index) - 8.0, 1e4);
pred[thread_index] = data[thread_index] > 0.0;
}
"""
guard let device = MTLCopyAllDevices().first else { fatalError("No Metal device found") }
print("Using device \(device.name)")
let options = MTLCompileOptions()
options.languageVersion = .version3_1
options.fastMathEnabled = false
let library = try! device.makeLibrary(source:shader_source, options:options)
guard let mfunc = library.makeFunction(name: "nextafter_pred") else { fatalError("Can't find function") }
let nelem = 256;
guard let dbuf = device.makeBuffer(length:nelem * MemoryLayout<Float>.size, options: [.storageModeShared]) else { fatalError("Can't alloc") }
guard let pbuf = device.makeBuffer(length:nelem * MemoryLayout<Bool>.size, options: [.storageModeShared]) else { fatalError("Can't alloc") }
guard let queue = device.makeCommandQueue() else { fatalError("Can't make queue") }
guard let cmdBuffer = queue.makeCommandBuffer() else { fatalError("Can't make command buffer") }
guard let computeEncoder = cmdBuffer.makeComputeCommandEncoder() else { fatalError("Can't make compute encoder") }
computeEncoder.setComputePipelineState(try! device.makeComputePipelineState(function: mfunc))
computeEncoder.setBuffer(dbuf, offset:0, index: 0)
computeEncoder.setBuffer(pbuf, offset:0, index: 1)
computeEncoder.dispatchThreads(MTLSizeMake(nelem, 1, 1), threadsPerThreadgroup:MTLSizeMake(nelem, 1, 1))
computeEncoder.endEncoding()
cmdBuffer.commit()
cmdBuffer.waitUntilCompleted()
let float_data = dbuf.contents().assumingMemoryBound(to: Float.self)
let bool_data = pbuf.contents().assumingMemoryBound(to: Bool.self)
for i in 0..<16 {
print("\(i): \(float_data[i]) >0 is \(bool_data[i])")
}
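As a CPU-side reference for the kernel above, the following Python sketch (standard library only) computes what strict IEEE 754 float32 arithmetic would give for `nextafter(float(i) - 8.0, 1e4)` and the `> 0.0` predicate. It assumes the target `1e4` is larger than every input, so `nextafter` always steps upward; `next_up_f32` and `reference_rows` are hypothetical helper names. It does not reproduce what the GPU prints, only the values one would compare against.

```python
import struct


def next_up_f32(value: float) -> float:
    """Return the next float32 above a finite, float32-representable value."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    if bits & 0x7FFFFFFF == 0:
        # +0.0 or -0.0: the next value up is the smallest positive subnormal.
        bits = 0x00000001
    elif bits & 0x80000000:
        # Negative value: stepping up means shrinking the magnitude.
        bits -= 1
    else:
        # Positive value: stepping up means growing the magnitude.
        bits += 1
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def reference_rows(count: int):
    """Mirror the kernel: data[i] = nextafter(i - 8, 1e4); pred[i] = data[i] > 0."""
    rows = []
    for i in range(count):
        data = next_up_f32(float(i) - 8.0)
        rows.append((data, data > 0.0))
    return rows


for i, (data, pred) in enumerate(reference_rows(16)):
    print(f"{i}: {data!r} >0 is {pred}")
```

The row of interest is index 8, where the input is exactly zero and the IEEE 754 result is the smallest positive subnormal, so the predicate is true on a conforming CPU.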
What does hardcoding
🐛 Describe the bug
Prints
Versions
2.3.0, nightlies
cc @kulinseth @albanD @DenisVieriu97 @jhavukainen