Speech-enabled Image Narration for the Visually Impaired

This project is a super cool, scaled-down version of Ray-Ban smart glasses—minus the actual glasses (because who needs those, right?). While it might be hard to believe I pulled this off, I had a little help from my talented friend. But hey, I was the one directing the show. So yes, this project has my vision (pun intended) written all over it!

To see how its done click here!

This project is designed to empower visually impaired individuals by providing real-time audio narration of their surroundings using augmented reality and AI technologies. The system seamlessly combines image processing and speech synthesis for an accessible and intuitive experience.

Key Features:

  1. Real-time Audio Narration:

    • Utilized Google Text-to-Speech to convert visual inputs from augmented reality glasses into clear and concise audio outputs with 95% accuracy.

    • Enabled visually impaired users to receive immediate and accurate descriptions of their environment, improving mobility and independence.

  2. Enhanced Image Captioning:

    • Employed InceptionV3 to extract features from images for high-quality processing.

    • Integrated these features with an LSTM model enhanced by GloVe embeddings, achieving a 92% accuracy in generating captions.

    • Improved caption generation accuracy by 30%, ensuring precise and contextually relevant descriptions for real-world scenarios.

  3. Impact and Accessibility:

    • Delivered a robust solution for visually impaired users, bridging the gap between visual and auditory information.

    • Enhanced accessibility by combining cutting-edge AI with practical usability in real-world environments.

This project demonstrates the potential of AI to address real-world challenges, providing visually impaired individuals with an innovative tool for enhanced interaction and independence.