GitHub - rishabh01solanki/PixeLearner: Combining the power of MobileNetV2 with the privacy of on-device learning. Benefit from real-time updates and efficient image processing, all while ensuring your data remains securely on your device. Experience precision, speed, and trust with PixeLearner.

PixeLearner: A Personalized ML-Powered App

PixeLearner is an iOS application that combines an on-device image model (MobileNetV2) with speech recognition and BERT-based text processing to recognize and label individuals in real time. It recognizes people in the camera feed and attaches the names you speak to them as labels.

Overview

Objective: Label the people you know in a natural way, much like making introductions in person.

Use Cases:

Recognizing friends or family from a live camera feed.
Making quick introductions by associating a name with a face.
Training the model in real-time with new introductions.

Advantages:

On-device processing ensures user data privacy.
Offers potential extensions to a variety of applications.
Provides a great learning experience for developers diving into ML and NLP integration.

How Does it Work?

Camera Feed: Using AV Foundation, the app captures a live video feed and processes each frame. Care is taken to use resources efficiently and to avoid memory leaks.
Model Inference: Each captured frame is processed by a custom CNN (MobileNetV2), which produces feature embeddings for the detected face.
Speech Recognition: The user can speak labels (such as names) to a speech-to-text module, which converts the spoken words to text.
BERT NLP: The text is then processed by BERT, an NLP model, so that it is tokenized and properly formatted. If a token is not in BERT's vocabulary, the app uses Apple's NLTagger, from the Natural Language framework, to further split and label the string.
Connecting the Dots: Once we have feature embeddings from the CNN and labels from BERT, we can associate the two. Thus, when a recognized face appears in the camera feed, the app can label them in real-time.
Model Update: Over time, as more labels are introduced and recognized, the model can be updated to improve its accuracy.
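
The association step described above can be pictured as a nearest-neighbor lookup over stored embeddings. The following is a minimal sketch, not the project's actual code: the `LabelStore` type, the use of cosine similarity, and the 0.8 threshold are all assumptions made for illustration.

```swift
import Foundation

/// Cosine similarity between two equal-length vectors.
/// Returns 0 for mismatched lengths or zero-length vectors.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    guard a.count == b.count, !a.isEmpty else { return 0 }
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator > 0 ? dot / denominator : 0
}

/// Hypothetical in-memory store that pairs a spoken name with a face embedding.
struct LabelStore {
    var entries: [(name: String, embedding: [Float])] = []

    /// Records a new introduction: a name and the embedding seen at that moment.
    mutating func introduce(name: String, embedding: [Float]) {
        entries.append((name: name, embedding: embedding))
    }

    /// Returns the best-matching name, or nil if no entry reaches the threshold.
    /// The threshold value is an assumption and would need tuning on real data.
    func bestMatch(for embedding: [Float], threshold: Float = 0.8) -> String? {
        var best: (name: String, score: Float)? = nil
        for entry in entries {
            let score = cosineSimilarity(entry.embedding, embedding)
            if score >= threshold && score > (best?.score ?? -1) {
                best = (name: entry.name, score: score)
            }
        }
        return best?.name
    }
}

var store = LabelStore()
store.introduce(name: "Alice", embedding: [1, 0, 0])
store.introduce(name: "Bob", embedding: [0, 1, 0])

let match = store.bestMatch(for: [0.9, 0.1, 0])        // close to Alice's embedding
let stranger = store.bestMatch(for: [0.5, 0.5, 0.7])   // below the threshold for everyone
```

Returning nil for low-similarity faces keeps the app from attaching a known name to someone who has not been introduced yet.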

Technical Details:

Camera Feed:
- Uses AV Foundation to capture live video.
- Each frame is prepared before inference (for example, squared and resized).
- The face's coordinates in the frame are sent to a face preview layer for better visualization.
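
As a rough illustration of the squaring and resizing step, the sketch below computes the largest centered square inside a frame and the scale factor needed to reach a model input size. The function names and the 224-pixel input side are assumptions, and the app may prepare frames differently.

```swift
/// Largest centered square inside a width x height frame, in pixel coordinates.
func centeredSquare(width: Int, height: Int) -> (x: Int, y: Int, side: Int) {
    let side = min(width, height)
    return (x: (width - side) / 2, y: (height - side) / 2, side: side)
}

/// Scale factor that maps the square crop onto the model's input size.
func scaleFactor(inputSide: Int, cropSide: Int) -> Double {
    return Double(inputSide) / Double(cropSide)
}

// A 1920x1080 frame yields a 1080x1080 crop starting 420 pixels from the left edge.
let crop = centeredSquare(width: 1920, height: 1080)

// 224 is assumed here as the model input side; check the actual model's input description.
let factor = scaleFactor(inputSide: 224, cropSide: crop.side)
```
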
BERT-NLP:
- Tokenizes strings into words and word pieces.
- Uses Apple's NLTagger for splitting strings if necessary.
- Splits strings further until a match is found in the vocabulary; if none is found, labels the token as unknown.
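
The word-piece splitting described above can be sketched as a greedy longest-match-first search, the scheme commonly used by BERT tokenizers. This is an illustrative sketch only: the `wordPieces` function and the toy vocabulary are hypothetical, and the NLTagger-based splitting the app performs beforehand is not shown.

```swift
/// Greedy longest-match-first WordPiece split of a single word.
/// Continuation pieces carry the "##" prefix; a word that cannot be
/// fully covered by the vocabulary maps to the single token "[UNK]".
func wordPieces(for word: String, vocabulary: Set<String>) -> [String] {
    let chars = Array(word)
    var pieces: [String] = []
    var start = 0
    while start < chars.count {
        var end = chars.count
        var found: String? = nil
        // Shrink the candidate from the right until it appears in the vocabulary.
        while end > start {
            var candidate = String(chars[start..<end])
            if start > 0 { candidate = "##" + candidate }
            if vocabulary.contains(candidate) {
                found = candidate
                break
            }
            end -= 1
        }
        // No prefix matched at this position, so the whole word is unknown.
        guard let piece = found else { return ["[UNK]"] }
        pieces.append(piece)
        start = end
    }
    return pieces
}

// Toy vocabulary for illustration only; a real BERT vocabulary is far larger.
let vocabulary: Set<String> = ["rish", "##abh", "anna", "##a"]

let known = wordPieces(for: "rishabh", vocabulary: vocabulary)   // ["rish", "##abh"]
let unknown = wordPieces(for: "zoe", vocabulary: vocabulary)     // ["[UNK]"]
```
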
Speech to Text:
- An observable class that starts and stops recording based on user interaction.
- Outputs the transcribed audio as text.
Inference:
- The model takes a pixel buffer as input and returns a label and a confidence score.
- Errors are caught and handled gracefully.
Update:
- A model update class takes in training data and a completion handler.
- After updating, the updated model is saved and the context is refreshed to refer to the updated model.

Future Scope:

With the foundation in place, future versions can potentially integrate more complex ML models, offer cloud-based model updates, and perhaps even extend into AR/VR spaces.

Prerequisites

Xcode (latest version)
iOS device for deployment
coremltools (ensure it is installed)

Installation and Running

Clone this repository:

git clone https://github.com/rishabh01solanki/PixeLearner.git

Open the project (pix2.0.xcodeproj) in Xcode.
Connect your iOS device.
Build and run the project to deploy the application on your device.

Upcoming

Stay tuned for the newer version, which will soon be available on the App Store!

Contributing

Feel free to open issues, suggest improvements, and make pull requests. Your contributions are welcome!

License

This project is open source under the MIT license.

Acknowledgments

MobileNetV2 creators for the base model architecture. BERT creators for text processing. CoreML tools for enabling on-device machine learning.
