This is a multimodal AI wearable coded by GPT-4



summary
Summary

Project Ring combines language and image models in an AI wearable that looks at the world through a camera and comments on it with an AI-generated voice.

The simplest way to describe Project Ring is as a wearable Google Lens with voice controls. According to developer Mina Fahmi, the project aims to “demonstrate low-friction interactions which blend physical & digital information between humans & AI.”

To that end, Fahmi built a wrist-worn minicomputer with a camera and joystick that can visually analyze the environment in real-time using a Replicate image-to-text modeldescribe it in text, and comment on it via a ChatGPT.

The text is converted to speech using Eleven Labs’ text-to-speech service, which is then transmitted to bone-conduction headphones via an Android smartphone. The headphones have a built-in microphone that allows the user to speak back to the wearable, for example, to ask questions about the environment. The user’s voice is converted to text using OpenAI’s Whisper so that ChatGPT can chime in with some more or less intelligent remarks. All data is processed in the Google Cloud.

ad

Image: Midjourney prompted by THE DECODER

“Project Ring feels like having a curious friend on your shoulder – one who sees the world as you do and unobtrusively whispers thoughts in your ear,” Fahmi writes.

GPT-4 writes code for the wearable, but “it wasn’t easy”

Fahmi says he did all the code generation for Project Ring with GPT-4. In total, the language model generated about 750 lines of code. That includes a Python script for the Raspberry Pi, a cloud application, a website, and an Android application.

Fahmi has a background in coding, but he says that he hasn’t written any code in years. He believes his project shows that it is possible, though not easy, to use GPT-4 to program complete software prototypes.

His coding background helped him get GPT-4 to make corrections in the right places or to assemble the code correctly by copying and pasting. According to Fahmi, GPT-4 occasionally lost context and needed to be realigned. The code was also unstable and neither efficient nor production-ready, he said.

Despite these shortcomings, AI “may be capable of automating a large majority of coding tasks in a relatively short time period,” Fahmi speculates.

Recommendation

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top