VisionNav is built with a focus on aiding the visually impaired who are looking for an easily accessible way to navigate their daily lives. The motivation behind creating the app is personal: most of our team knows someone in their family who faces difficulties with vision. Researching the topic further, we found that 7.3 million adults in the U.S. are classified as visually impaired, with 1 million fully blind, of whom roughly 70% are unemployed. This was the primary motivation behind building VisionNav, which aims to give its users an affordable way to move through their daily lives more easily.
VisionNav uses the built-in camera and LIDAR on an iPhone to stream a real-time view of the user's surroundings, and delivers audio through devices like AirPods to guide the user through that environment. Within this setting, VisionNav aims to accomplish three categories of tasks. The first is navigating around obstacles using 3D audio cues played in either the left or right AirPod, directing the user to move in that direction; which cue plays is decided from the proximity and depth of each obstacle, as measured from the real-time streaming data collected by the app. The second is the user's ability to ask the VisionNav AI assistant to locate a specific item in a scene. The assistant interprets the spoken command and passes the requested object to the YOLO model, which searches the frame for it. Once the object is confirmed to be in the frame, the user places their hand in the frame as well, and the app guides the hand toward the target object using the same cues as the navigation behavior. The final behavior covers miscellaneous tasks, such as finding a path to a nearby location, locating an available seat in a room, or even reading a book. These tasks combine the AI assistant and the navigation ability: the app identifies candidate destinations in the frame and navigates toward them while avoiding obstacles, and the assistant switches to navigation mode automatically based on the user's command.
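As a rough illustration of the first behavior, the sketch below shows how a left or right cue might be chosen from per-column depth readings. The function name, column split, and distance threshold are our own illustrative assumptions, not VisionNav's exact logic.

```python
# Minimal sketch of the obstacle-cue decision (names and thresholds are
# illustrative, not VisionNav's actual code).

OBSTACLE_THRESHOLD_M = 1.2  # assumed distance at which an obstacle triggers a cue


def choose_cue(column_depths_m):
    """Pick which AirPod cue to play from per-column minimum depths (meters).

    column_depths_m: list of 5 floats, left-to-right closest distances.
    Returns "left", "right", or None (no cue needed).
    """
    left = min(column_depths_m[:2])    # two leftmost columns
    center = column_depths_m[2]        # middle column
    right = min(column_depths_m[3:])   # two rightmost columns

    # Only react when something directly ahead is closer than the threshold.
    if center >= OBSTACLE_THRESHOLD_M:
        return None

    # Steer toward whichever side has more free space.
    return "left" if left > right else "right"


if __name__ == "__main__":
    # Obstacle dead ahead, more room on the right -> cue the right AirPod.
    print(choose_cue([0.8, 1.0, 0.9, 2.5, 3.0]))  # "right"
```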
VisionNav has a few distinct phases. First, to access the iPhone camera and LIDAR, the app was built in Xcode using Swift, which provided good compatibility with the required services. Next, the real-time frames captured by the iPhone were streamed to a local web server, where a Python script read them. Additional Python functions divided each frame into five equal-width columns and tracked the depth and distance of objects within each column. Each frame was then passed to the YOLO model to identify objects. This data was fed into further Python functions that decided which audio cue to play. Finally, we used the Gemini API to handle the user's voice commands, with Gemini acting as the AI assistant that interprets requests such as locating a specific target.
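The sketch below outlines what this server-side pipeline could look like, assuming the phone posts a JPEG frame and a raw depth map to a local Flask endpoint and that the `ultralytics` YOLO package is used. The endpoint path, payload format, and depth-map dimensions are assumptions for illustration, not the app's actual interface.

```python
# Sketch of the server-side pipeline: receive a frame + depth map, split the depth
# map into five columns, and run YOLO. Endpoint and payload details are assumed.
import numpy as np
import cv2
from flask import Flask, request, jsonify
from ultralytics import YOLO

app = Flask(__name__)
model = YOLO("yolov8n.pt")  # small pretrained model; VisionNav's weights may differ


def column_min_depths(depth_map, n_cols=5):
    """Split the depth map into n equal-width columns and return each column's
    minimum (i.e. closest) distance in meters."""
    cols = np.array_split(depth_map, n_cols, axis=1)
    return [float(np.nanmin(c)) for c in cols]


@app.route("/frame", methods=["POST"])
def handle_frame():
    # Decode the JPEG camera frame sent by the phone.
    frame = cv2.imdecode(
        np.frombuffer(request.files["image"].read(), np.uint8), cv2.IMREAD_COLOR
    )
    # Depth map sent as raw float32 bytes with known dimensions (assumed format).
    depth = np.frombuffer(request.files["depth"].read(), np.float32).reshape(192, 256)

    detections = model(frame, verbose=False)[0]
    labels = [model.names[int(c)] for c in detections.boxes.cls]

    return jsonify({
        "column_depths_m": column_min_depths(depth),
        "objects": labels,
    })


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```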
The main challenge during the implementation of VisionNav was accurately handling multiple objects in close proximity: when the app detected two obstacles at once, it struggled to produce audio cues clear enough to distinguish between them. There was also considerable difficulty with the automated switching between "find an object" mode and "navigate around the obstacle" mode, where the app struggled to treat the identified object as the end goal while still avoiding obstacles on the way toward it. Finally, the voice recognition accuracy of the Gemini speech-to-text integration was a struggle, as transcriptions were not consistently accurate.
The accomplishment we are most proud of is the hand-navigation feature, which guides the user's hand to a target object in the frame. 3D audio cues signal left and right through the corresponding AirPod, while a separate rapid ticking sound marks forward motion and speeds up as the hand gets closer to the target. Gemini interprets the user's request to find the target object, and the YOLO model locates and identifies it in the frame.
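A minimal sketch of that guidance loop is shown below, assuming hand and target bounding boxes in pixel coordinates. The alignment threshold and the mapping from distance to tick interval are illustrative guesses rather than the app's tuned values.

```python
# Sketch of the hand-to-target guidance loop. Bounding boxes are (x1, y1, x2, y2)
# in pixels; thresholds and the tick-interval mapping are illustrative only.
import math


def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def guidance(hand_box, target_box, frame_width):
    """Return (direction_cue, tick_interval_s) for the current frame."""
    hx, hy = center(hand_box)
    tx, ty = center(target_box)

    # Left/right cue depends on where the target sits relative to the hand.
    dx = tx - hx
    if abs(dx) < 0.05 * frame_width:      # roughly aligned: no side cue
        direction = None
    else:
        direction = "right" if dx > 0 else "left"

    # Ticking speeds up as the hand closes in on the target.
    distance = math.hypot(tx - hx, ty - hy)
    normalized = min(distance / frame_width, 1.0)
    tick_interval = 0.05 + 0.45 * normalized   # ~0.5 s far away -> 0.05 s when close

    return direction, tick_interval


if __name__ == "__main__":
    # Hand on the left of the frame, target object near the right edge.
    print(guidance(hand_box=(100, 400, 220, 520),
                   target_box=(900, 350, 1020, 470),
                   frame_width=1280))
```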
VisionNav was a project that provided valuable learning experiences. Starting with the sensing side, our team learned how to activate and extract the data collected by the camera and LIDAR built into the iPhone from Xcode using Swift. We also experimented with depth and distance data, and used Python audio libraries to play the appropriate audio cue for each condition. We worked with the YOLO model, a real-time object detection algorithm that is vital for identifying the objects present in a given frame. We also learned how to use Gemini's API to integrate speech-to-text into our app. With all these pieces in place, we had to learn how to combine them efficiently into one application, which became VisionNav.
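For the Gemini piece, a hedged sketch of how a recorded voice command could be turned into a target label using the `google-generativeai` Python package is shown below. The model name, prompt, and audio handling are assumptions for illustration and may differ from what the app actually does.

```python
# Hedged sketch of the Gemini voice-command step. Model name, prompt, and audio
# format are assumptions; the real app may wire this up differently.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")


def command_to_target(audio_path):
    """Transcribe a recorded voice command and extract the object to find."""
    audio = genai.upload_file(audio_path)  # e.g. a short clip recorded on the phone
    prompt = (
        "The user is asking the app to locate an object. "
        "Reply with only the object's name, e.g. 'water bottle'."
    )
    response = model.generate_content([audio, prompt])
    return response.text.strip().lower()


# Example (hypothetical file): target = command_to_target("find_my_keys.m4a")
# The returned label would then be matched against the classes YOLO detects in the frame.
```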
With the current implementation of VisionNav, we have created straightforward functionality that aims to make mobility easier for visually impaired users. We plan to improve the accuracy of the audio cues by reducing latency and refining the underlying processing. We also want to bring the app to wearable hardware, such as smart glasses, with a strict focus on the app's core functionality, which could make capturing the user's view more natural while keeping costs low.