
Output values are not changing for different inputs #64

Open · vmelentev opened this issue May 8, 2024 · 8 comments

vmelentev commented May 8, 2024

Hi, I am using a MoveNet model from tfhub.dev with FrameProcessor and VisionCamera to apply human pose estimation to a person. The outputs in the console are always the same, so it doesn't appear to be tracking my movements. This is the case with every model I try.

Here is the link to the model

Here is the code I am using to resize the frame:

const CACHE_ID = '_resizeCache'; // note: this constant was missing from the posted snippet; the name here is illustrative

function getArrayFromCache(size) {
  'worklet'
  // Reuse one Uint8Array across frames instead of allocating per frame
  if (global[CACHE_ID] == null || global[CACHE_ID].length !== size) {
    global[CACHE_ID] = new Uint8Array(size);
  }
  return global[CACHE_ID];
}

function resize(frame, width, height) {
  'worklet'
  const inputWidth = frame.width;
  const inputHeight = frame.height;
  // toArrayBuffer() returns an ArrayBuffer; it must be wrapped in a typed
  // array before indexing, otherwise every arrayData[i] read is undefined
  // (which a Uint8Array silently stores as 0).
  const arrayData = new Uint8Array(frame.toArrayBuffer());

  const outputSize = width * height * 3; // 3 channels for RGB
  const outputFrame = getArrayFromCache(outputSize);

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Nearest-neighbor: find the closest pixel in the source image
      const srcX = Math.floor((x / width) * inputWidth);
      const srcY = Math.floor((y / height) * inputHeight);

      // Compute the source and destination indices
      const srcIndex = (srcY * inputWidth + srcX) * 4; // 4 channels for BGRA
      const destIndex = (y * width + x) * 3;           // 3 channels for RGB

      // Convert from BGRA to RGB
      outputFrame[destIndex] = arrayData[srcIndex + 2];     // R
      outputFrame[destIndex + 1] = arrayData[srcIndex + 1]; // G
      outputFrame[destIndex + 2] = arrayData[srcIndex];     // B
    }
  }

  return outputFrame;
}
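
The nearest-neighbor/BGRA-to-RGB index math above can be checked in isolation. Below is a minimal standalone sketch of the same logic (no worklet cache; `resizeBGRAtoRGB` is a name introduced here purely for illustration), run against a tiny synthetic 2x2 BGRA frame:

```typescript
// Standalone version of the nearest-neighbor BGRA -> RGB resize above.
function resizeBGRAtoRGB(
  data: Uint8Array,
  inputWidth: number,
  inputHeight: number,
  width: number,
  height: number
): Uint8Array {
  const out = new Uint8Array(width * height * 3);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Nearest source pixel for this destination pixel
      const srcX = Math.floor((x / width) * inputWidth);
      const srcY = Math.floor((y / height) * inputHeight);
      const srcIndex = (srcY * inputWidth + srcX) * 4; // 4 channels: BGRA
      const destIndex = (y * width + x) * 3;           // 3 channels: RGB
      out[destIndex] = data[srcIndex + 2];     // R
      out[destIndex + 1] = data[srcIndex + 1]; // G
      out[destIndex + 2] = data[srcIndex];     // B
    }
  }
  return out;
}

// 2x2 BGRA frame; a 1x1 resize should pick the top-left pixel.
const frame = new Uint8Array([
  10, 20, 30, 255,    // (0,0): B=10 G=20 R=30
  40, 50, 60, 255,    // (1,0)
  70, 80, 90, 255,    // (0,1)
  100, 110, 120, 255, // (1,1)
]);
const rgb = resizeBGRAtoRGB(frame, 2, 2, 1, 1);
console.log(Array.from(rgb)); // [30, 20, 10]
```

If a check like this passes but the on-device output still never changes, the suspect is what `frame.toArrayBuffer()` actually returns, not the loop itself.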

Here is my frame processor function:

  const frameProcessor = useFrameProcessor((frame) => {
    'worklet'
    if (model == null) return

    const newFrame = resize(frame, 192, 192)

    const outputs = model.runSync([newFrame])
    const output = outputs[0] // `outputs` is a const; reassigning it would throw
    console.log(output[1])
  }, [model])

Here is the output in the console:


 LOG  0.46377456188201904
 LOG  0.46377456188201904
 LOG  0.46377456188201904
 LOG  0.46377456188201904
 LOG  0.46377456188201904
 LOG  0.46377456188201904

For each frame the camera sees the result is always the same.

Does anyone know how to resolve this issue?

Thank you

mrousavy (Owner) commented May 8, 2024

Please format your code properly.

willadamskeane commented

I had a similar issue - in my case, the input size didn't match what the model was expecting. I'd also check that the model accepts uint8 input.
You can verify on https://netron.app

vmelentev (Author) commented

> I had a similar issue - in my case, the input size didn't match what the model was expecting. I'd also check that the model accepts uint8 input. You can verify on https://netron.app

Hi, the frame input size and type (uint8) are correct. If they weren't, I wouldn't get the console outputs above; I would get errors such as 'Invalid input size/type'.

My issue is that the output does not change regardless of the input. If I understand correctly, this model is meant to detect different features of the human body (nose, eyes, elbows, knees, etc.) and output values based on where they appear on the screen, which doesn't appear to be happening, as the output values are always the same.
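
For reference, the MoveNet single-pose models output a [1, 1, 17, 3] tensor: 17 keypoints, each as normalized (y, x, score) values in [0, 1]. A sketch of decoding that output (keypoint order per the MoveNet model card; treating the tensor as a flat typed array is an assumption about how the library returns it):

```typescript
// Keypoint order as documented for MoveNet single-pose models.
const KEYPOINT_NAMES = [
  'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
  'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
  'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
  'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
];

interface Keypoint { name: string; y: number; x: number; score: number }

// Decode the flat 1*1*17*3 output into named keypoints.
function decodeMoveNet(output: ArrayLike<number>): Keypoint[] {
  return KEYPOINT_NAMES.map((name, i) => ({
    name,
    y: output[i * 3],
    x: output[i * 3 + 1],
    score: output[i * 3 + 2],
  }));
}

// Usage sketch: drop low-confidence keypoints before drawing.
// const visible = decodeMoveNet(output).filter((kp) => kp.score > 0.3);
```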

mrousavy (Owner) commented

Does your newFrame contain new data each time?
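
One cheap way to answer that question (an illustrative sketch, not from the thread; `quickChecksum` is a hypothetical helper): log a checksum of the resized buffer on every frame. If the logged value never changes while the camera moves, the resize path is the problem rather than the model.

```typescript
// Cheap checksum over a sampled subset of the buffer; good enough to
// detect "this buffer never changes", not for anything cryptographic.
function quickChecksum(data: Uint8Array, stride: number = 97): number {
  let sum = 0;
  for (let i = 0; i < data.length; i += stride) {
    sum = (sum * 31 + data[i]) >>> 0; // keep it a 32-bit unsigned int
  }
  return sum;
}

// Inside the frame processor:
//   const newFrame = resize(frame, 192, 192)
//   console.log('frame checksum:', quickChecksum(newFrame))
```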

Silvan-M commented

Hi! I seem to have the same problem. The resized image does change, but the output of the TFLite model does not.
I get the same result when running the /example in this repo, with the following output:

 LOG  Result: 25
 LOG  Running inference on 640 x 480 yuv Frame
 LOG  Result: 25
 LOG  Running inference on 640 x 480 yuv Frame
 LOG  Result: 25
 LOG  Running inference on 640 x 480 yuv Frame
 LOG  Result: 25
 LOG  Running inference on 640 x 480 yuv Frame
 LOG  Result: 25
 LOG  Running inference on 640 x 480 yuv Frame
 LOG  Result: 25
 LOG  Running inference on 640 x 480 yuv Frame
 LOG  Result: 25
 ...

mrousavy (Owner) commented

Well, if the resized image changes but the output values don't, it might be an issue with your TFLite model? I am not sure this is an issue in this library...

Silvan-M commented

Ok, I can confirm it was an issue with the input size, as @willadamskeane suggested. For some reason, no error is raised on a wrong input size (e.g. 151x150 instead of 150x150 px when using the vision-camera-resize-plugin).

If this is considered expected behaviour, from my end the issue can be closed.
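
Since the model apparently accepts a wrongly sized buffer without complaint, a defensive length check before `runSync` can surface this class of bug early. A minimal sketch (`assertInputSize` is a hypothetical helper; 192x192x3 matches the resize target used earlier in this thread):

```typescript
// Guard against silently passing a wrongly sized buffer to the model.
function assertInputSize(
  input: { length: number },
  width: number,
  height: number,
  channels: number = 3
): void {
  const expected = width * height * channels;
  if (input.length !== expected) {
    throw new Error(
      `Model input has ${input.length} elements, expected ${expected} ` +
      `(${width}x${height}x${channels})`
    );
  }
}

// e.g. a 151x150 resize would be caught before inference:
//   assertInputSize(newFrame, 192, 192)
//   const outputs = model.runSync([newFrame])
```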


s54BU commented May 28, 2024

Hi all, after some experimentation it appears that my code for resizing the frame does not work properly and does not put the frame into the correct format, yet for some reason it wasn't throwing an error. I have resolved this issue by switching to the vision-camera-resize-plugin, which @Silvan-M suggested, and it now works. Thank you for your help.
