Suggestions for Tracking Players in Hockey #271

Open
AbinashSankaran opened this issue Jul 16, 2023 · 6 comments
Labels
help wanted Extra attention is needed

Comments

@AbinashSankaran

AbinashSankaran commented Jul 16, 2023

The use-case video is attached to this issue.

The situation we are working on
We are working on a player-tracking solution for ice hockey, using deep learning to record the statistics and analytics of a match. The metrics include which players are inside the rink (the playing area) on a given frame, each player's position, and others such as who scored a goal and how many goals the goalie prevented. To compute these, we need to know where each player was. Because the game is so fast-moving, it is impossible to read every player's jersey number in each frame, so tracking each player is what makes the analytics possible.

Need help with
A video input will be given, with a duration between 10 and 40 seconds. We have a custom-trained YOLO model that detects three classes: player, goalie, and referee. We need a tracking solution that handles a moving camera and re-identifies the same players if they are out of frame for, say, 1 second. We would like suggestions on which Norfair setup (or combination) to use, and some input on hyperparameters for ice hockey.

Solutions Tried
We have already tried the following from Norfair:
- Tracking
- Tracking with ReID
- Tracking with Camera Motion
- Tracking with OpenPose

Issues Faced
1. The same player was given multiple track IDs within a few seconds.
2. With some parameter settings, one player's track ID was transferred to another player when they collided, which breaks the entire solution we have built.

We tuned some of the parameters but did not get a significant improvement. With ReID and camera motion, a player was not given the same ID after re-entering the frame after, say, 15 frames (we did try tuning the hit_counter_max and initialization_delay parameters).
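
To make the hit_counter_max part of this concrete, here is a minimal sketch of how a hit counter limits the gap a track can survive. It is a simplified model, not Norfair's actual code: the counter goes up by one on each matched frame (capped at hit_counter_max), goes down by one on each missed frame, and the track is dropped once it falls below zero. The function name `survives_gap` and the frame counts are made up for illustration, and Norfair's real bookkeeping may differ by a frame or two.

```python
def survives_gap(hit_counter_max: int, matched_frames: int, gap_frames: int) -> bool:
    """Simplified hit-counter model (an illustration, not Norfair's implementation).

    Returns True if a track that was matched for `matched_frames` frames in a row
    would still be alive after `gap_frames` frames without any detection.
    """
    counter = 0
    # Every matched frame raises the counter, but never above the cap.
    for _ in range(matched_frames):
        counter = min(counter + 1, hit_counter_max)
    # Every missed frame lowers it; below zero the track is dropped.
    for _ in range(gap_frames):
        counter -= 1
        if counter < 0:
            return False
    return True


if __name__ == "__main__":
    # A player seen for 100 frames who then leaves the frame for 30 frames.
    for cap in (15, 30, 60):
        print(cap, survives_gap(cap, matched_frames=100, gap_frames=30))
```

Under this model, a cap smaller than the gap always loses the track, and a large cap only helps if the player was visible long enough beforehand to fill the counter.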

We would like some suggestions on model selection and parameter tuning.

Video.Clip.mp4
@AbinashSankaran AbinashSankaran added the help wanted Extra attention is needed label Jul 16, 2023
@facundo-lezama
Collaborator

Hi @AbinashSankaran! Great to hear you are considering using Norfair for this exciting project!

As you know, multiple object tracking in sports is challenging and requires significant effort to make it work. But the scenario you describe (10 to 40 seconds videos) is one where you could get good results.

To get a sense of the current state of your tracking output, can you share a video where we can see the problems? With that, I can suggest some specific ideas to help you solve your problems. Or at least try.

@AbinashSankaran
Author

Hi @facundo-lezama, here are some of the videos I have generated with Norfair.
I am not able to upload the 10-second clips here, as GitHub only allows up to 10 MB, so I am attaching video links instead.

Video 1: Just the Tracker (distance_threshold=0.7, iou) => As you can see, when the players cross, IDs 11 and 1 are wrongly assigned. here

Video 2: Tracker with Camera Motion (a new track ID is assigned multiple times within a second). here

Video 3: Tracker with ReID. Again the same issue: IDs are wrongly assigned as the players cross. here

We would be grateful if we could at least eliminate the problem of an ID being assigned to the wrong player, and keep the same ID for as long as possible.
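
To illustrate why IDs can swap when players cross under an IoU-based distance, here is a small worked example. It assumes the distance is 1 - IoU between a track's last box and a new detection, and that any pair below the threshold (0.7 in Video 1) is an allowed match. The box coordinates are invented for illustration.

```python
def iou_distance(box_a, box_b):
    """Return 1 - IoU for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return 1.0 - inter / union if union > 0 else 1.0


if __name__ == "__main__":
    track_a = (100, 100, 140, 200)      # last known box of player A's track
    detection_a = (110, 100, 150, 200)  # player A in the next frame
    detection_b = (105, 100, 145, 200)  # player B crossing in front of A
    print(iou_distance(track_a, detection_a))  # distance to A's own detection
    print(iou_distance(track_a, detection_b))  # distance to the crossing player
```

In this example both detections are below a 0.7 threshold, and the crossing player is actually closer to A's track than A's own detection, so a purely geometric matcher has no reason to keep the IDs apart.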

PS: So far the naive tracker works great, but the ID issue is very concerning. Also, a standard YOLO model will work fine, although we used a custom model to detect only players.

@DiegoFernandezC
Member

Hi @AbinashSankaran, I have conducted several experiments and would like to share the results with you, which you may find useful for your own trials.

For these experiments, I fine-tuned a YOLOv8 model using a limited set of examples to detect the classes that you specified.

Below are the tests I performed:

  1. Norfair stock (excluding ReID or camera motion): I employed iou as the distance_function and set 0.8 as the distance_threshold across all experiments. The results can be viewed here. Upon comparison with the video you uploaded, I noticed some enhancements. It may be beneficial to reassess the detector.

  2. Norfair with camera motion: There was no noticeable improvement in comparison to the stock experiment. The camera movements were quite subtle, suggesting that this feature may not be necessary in this context. The results can be found here. The only modification made was the min_distance, which was set to 7. The code was derived from the camera motion demo. Please remember that it's crucial to mask the detections as demonstrated in our demo. In your videos, it's equally important to mask the channel logo and the scoreboard, as these elements remain static throughout the video. We aim to exclude points from these areas to improve the accuracy of the camera motion estimations.

  3. Norfair with ReID: Although we do not currently have a user guide, I can detail the methods we employed and provide a video to demonstrate a potential outcome of using the ReID feature with embeddings. We utilized this repository, where a model for ReID can be fine-tuned. The repository includes a model zoo section with pre-trained models across various domains. In this instance, we used the ResNet50 trained with the Market1501 dataset. We fine-tuned this model for several players, the referee, and the goalie. The results indicated improvements and maintained consistent IDs for the referee and goalie. However, there is significant room for enhancement. The solution is not robust enough to retain the same ID for players, and we believe that if you do not require an online tracker (such as Norfair, which necessitates an ID for each person in every frame), better results can be achieved by analyzing the entire video prior to returning the results. The primary reason is that the player's number is not always visible and players often resemble each other. Additionally, using the position of each player, along with other rules you know about this domain, would likely improve the results further.
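
As a rough illustration of the embedding-based ReID in alternative 3, here is a minimal sketch of a distance between a newly seen object and a lost one. It is not the code used in the experiment. It assumes each tracked object exposes `last_detection` and `past_detections`, and each detection carries an `embedding`, which is how I understand Norfair's tracked objects and its ReID demo to work; the choice of cosine distance and the stand-in objects built with `SimpleNamespace` are assumptions made so the example runs on its own.

```python
import math
from types import SimpleNamespace


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def embedding_distance(new_object, lost_object):
    """Smallest cosine distance between the lost object's last embedding
    and any embedding stored in the new object's recent detections."""
    reference = lost_object.last_detection.embedding
    if reference is None:
        return 1.0  # nothing to compare against, treat as far apart
    best = 1.0
    for detection in new_object.past_detections:
        if detection.embedding is None:
            continue
        best = min(best, cosine_distance(reference, detection.embedding))
    return best


if __name__ == "__main__":
    # Stand-ins for tracked objects, with made-up 2-D embeddings.
    lost = SimpleNamespace(last_detection=SimpleNamespace(embedding=[1.0, 0.0]))
    new = SimpleNamespace(past_detections=[
        SimpleNamespace(embedding=None),
        SimpleNamespace(embedding=[0.0, 1.0]),
        SimpleNamespace(embedding=[1.0, 0.0]),
    ])
    print(embedding_distance(new, lost))
```

A function of this shape is what would be passed to the tracker as `reid_distance_function`, together with `reid_distance_threshold` and `reid_hit_counter_max`, assuming the installed Norfair version supports those arguments. With players who look alike, the embeddings will often be close for different people, which matches the limitation described above.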

Here is the detector that I've fine-tuned, should you wish to replicate the results obtained from the first two alternatives.
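
For anyone replicating the first alternative, here is a minimal sketch of how detector output could be fed to a stock Norfair tracker with the settings mentioned above (iou distance, distance_threshold of 0.8). It is not the exact script used for the videos. The row layout `(x1, y1, x2, y2, conf, class_id)`, the class order, the `min_conf` cutoff, and the helper names are assumptions. Norfair and NumPy are imported inside the functions that need them, so the conversion helper can be tried without either installed.

```python
def rows_to_boxes(rows, class_names=("player", "goalie", "referee"), min_conf=0.25):
    """Convert detector rows (x1, y1, x2, y2, conf, class_id) into
    (x1, y1, x2, y2, conf, label) tuples, dropping low-confidence rows.
    The class order and the min_conf default are assumptions."""
    boxes = []
    for x1, y1, x2, y2, conf, class_id in rows:
        if conf < min_conf:
            continue
        boxes.append((float(x1), float(y1), float(x2), float(y2),
                      float(conf), class_names[int(class_id)]))
    return boxes


def build_tracker():
    """Stock tracker: no ReID, no camera motion (requires norfair)."""
    from norfair import Tracker
    return Tracker(distance_function="iou", distance_threshold=0.8)


def track_frame(tracker, boxes):
    """Wrap one frame's boxes as Norfair detections and update the tracker."""
    import numpy as np
    from norfair import Detection
    detections = [
        Detection(
            points=np.array([[x1, y1], [x2, y2]]),  # top-left and bottom-right corners
            scores=np.array([conf, conf]),          # one score per corner point
            label=label,
        )
        for x1, y1, x2, y2, conf, label in boxes
    ]
    return tracker.update(detections=detections)


if __name__ == "__main__":
    rows = [(0, 0, 10, 20, 0.9, 0), (1, 1, 2, 2, 0.1, 2), (5, 5, 8, 9, 0.5, 1)]
    print(rows_to_boxes(rows))
```

In a full loop, `build_tracker()` would be called once and `track_frame(tracker, rows_to_boxes(rows))` once per video frame, using the rows produced by the detector for that frame.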

We can delve deeper into each alternative based on your feedback on the performance of the videos I sent you, and explore ways to enhance your results. Please let me know your thoughts.

@AbinashSankaran
Author

AbinashSankaran commented Jul 26, 2023 via email

@DiegoFernandezC
Member

Yes, I fine-tuned a YOLOv8 model using a small dataset that I constructed from your video, which can be found here. The initial model was the pre-trained version provided by Ultralytics; I believe it was the nano version.

@msaqib17

msaqib17 commented Oct 4, 2023

Do you have example code where you have used ReID with Norfair? Thanks.
