If you’re not worried about corporate surveillance bots scraping your purchase history and manipulating you through advertising, you can buy any number of off-the-shelf smart speakers for your home. Alternatively, you can roll your own like [arpy8] did, and keep your life a little more private.
The build is based around an ESP32 microcontroller. It connects to the internet via its built-in Wi-Fi connection, and listens out for your voice with an INMP441 omnidirectional microphone module. The audio data is trucked off to a backend server running a Whisper speech-to-text model. The text is then passed to Google’s Gemini 2.5 Flash large language model. The response generated is passed to the Piper Neural Voice text-to-speech engine, sent back to the ESP32, and spat out via the device’s DAC output and a speaker attached to an LM386 amplifier. Basically, anything you can ask Gemini, you can do with this device.
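For those who think better in code, the backend flow described above (speech-to-text, then the language model, then text-to-speech) can be sketched roughly as follows. This is only an illustration and not [arpy8]’s actual code: the `VoicePipeline` class, its field names, and the three `fake_*` stand-ins are all hypothetical, and in a real build each stage would presumably wrap Whisper, the Gemini API, and Piper respectively.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical three-stage backend: audio in, synthesized audio out."""
    speech_to_text: Callable[[bytes], str]   # assumed: a wrapper around Whisper
    ask_llm: Callable[[str], str]            # assumed: a wrapper around the Gemini API
    text_to_speech: Callable[[str], bytes]   # assumed: a wrapper around Piper

    def handle_utterance(self, audio: bytes) -> bytes:
        """Run one recording from the ESP32 through all three stages."""
        text = self.speech_to_text(audio)
        if not text.strip():
            # Nothing intelligible was heard, so send back no audio.
            return b""
        reply = self.ask_llm(text)
        return self.text_to_speech(reply)


# Stand-in stages so the sketch runs without any models or network access.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
print(pipeline.handle_utterance(b"hello"))
```

Keeping each stage behind a plain callable means any one of the three could be swapped out, say for a locally hosted language model, without touching the rest of the chain.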
By virtue of using a commercial large language model, it’s by no means entirely private. Still, it’s at least a little further removed than using a smart speaker that’s directly logged in to your Amazon/Google/Hulu/Beanstikk account. Files are on GitHub for those eager to dive into the code. We’ve seen some other fun builds along these lines before, too. Video after the break.