Recently, I was fooling around with ChatGPT, and I started thinking: "How cool would it be if ChatGPT powered Siri or Google Assistant?"
I know this will never happen, but it got me wondering: "How hard would it be to create my own voice assistant?"
Well, it turned out not to be that hard. After a couple of evenings, I had my own voice assistant called FRIDAI (Fricking Revolutionary Intelligent Digital Assistant Interface).
Here is a quick demo:
It's not perfect, and it is nowhere near the level of Siri or Google Assistant. But considering it took me only a few evenings, the overall result is not too bad.
There is no official ChatGPT API, so I used OpenAI's completions API to power FRIDAI. The results are not as good as what I would get from ChatGPT, but they are still impressive. It's a good starting point. Once OpenAI releases the ChatGPT API, the responses should get even better.
Also, I would love to try the APIs that power the new Bing. Access to ChatGPT backed by real-time data would make FRIDAI much more powerful.
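To make the completions API part more concrete, here is a minimal sketch of how such a request could be built and how the answer could be read back. This is not FRIDAI's actual code. The model name "text-davinci-003", the token limit, and the helper names makeCompletionRequest and firstCompletionText are my assumptions for illustration; only the endpoint, the bearer-token header, and the choices[].text response field come from the OpenAI completions API itself.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Request body for the completions endpoint (only the fields this sketch uses).
struct CompletionRequest: Encodable {
    let model: String
    let prompt: String
    let maxTokens: Int

    enum CodingKeys: String, CodingKey {
        case model, prompt
        case maxTokens = "max_tokens"
    }
}

/// The part of the response this sketch reads.
struct CompletionResponse: Decodable {
    struct Choice: Decodable { let text: String }
    let choices: [Choice]
}

/// Builds a POST request for the completions endpoint.
/// The model name and token limit are assumptions, not values from the post.
func makeCompletionRequest(prompt: String, apiKey: String) throws -> URLRequest {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        CompletionRequest(model: "text-davinci-003", prompt: prompt, maxTokens: 256))
    return request
}

/// Extracts the text of the first completion from a raw JSON response.
func firstCompletionText(from data: Data) throws -> String? {
    try JSONDecoder().decode(CompletionResponse.self, from: data).choices.first?.text
}
```

The request would then be sent with URLSession, and the returned data passed to firstCompletionText(from:) to get the text that is shown on screen and read aloud.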
As for the other technical details, I used SFSpeechRecognizer to detect phrases like "hey FRIDAI" or "ok, FRIDAI" and start listening to the user's voice input. I also use it to convert the user's speech to text, which I then display on the screen and send to the OpenAI API.
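The wake phrase check can be done on the transcribed text itself. Here is a rough sketch of that idea, not the code FRIDAI actually runs. The function names and the phrase list are hypothetical, and the recognizer may well transcribe the name differently (for example as "Friday"), so the list would need tuning against real transcripts.

```swift
import Foundation

/// Hypothetical wake phrases, stored lowercase and without punctuation.
let wakePhrases = ["hey fridai", "ok fridai"]

/// Lowercases the text, turns everything that is not a letter or digit
/// into a space, and collapses runs of spaces into one.
func normalize(_ text: String) -> String {
    let kept = text.lowercased().unicodeScalars.map { scalar -> Character in
        CharacterSet.alphanumerics.contains(scalar) ? Character(scalar) : " "
    }
    return String(kept).split(separator: " ").joined(separator: " ")
}

/// Returns the (normalized) text spoken after a wake phrase,
/// an empty string if only the wake phrase was heard,
/// or nil if the transcript contains no wake phrase.
func command(afterWakePhraseIn transcript: String) -> String? {
    // Pad with spaces so phrases only match on whole-word boundaries.
    let padded = " " + normalize(transcript) + " "
    for phrase in wakePhrases {
        if let range = padded.range(of: " " + phrase + " ") {
            return String(padded[range.upperBound...])
                .trimmingCharacters(in: .whitespaces)
        }
    }
    return nil
}
```

In an app, command(afterWakePhraseIn:) would be called with the latest partial transcription coming from the recognizer, and the assistant would switch to listening mode as soon as it returns a non-nil value.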
Once I get the answer back, I use AVSpeechSynthesizer to read the response aloud. Unfortunately, the speech synthesizer uses the old, robotic Siri voice, and I couldn't force it to use the current Siri voice, which is more fluent and natural.
Also, I couldn't use the speech synthesizer with the streamed output from the API. OpenAI streams the response in tiny fragments, so the speech synthesizer ends up reading the response piece by piece, almost letter by letter. To solve this, I need to accumulate the response into larger chunks first and only then start reading it to the user.
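One way the accumulation could work is to buffer the streamed fragments and hand text to the synthesizer only once a full sentence has arrived. The sketch below is an idea for that future fix, not something FRIDAI does today. The SentenceBuffer type is hypothetical, and splitting on ".", "!" and "?" is deliberately naive (it would also split "3.14" in two).

```swift
import Foundation

/// Collects streamed text fragments and emits complete sentences.
struct SentenceBuffer {
    private var pending = ""

    /// Appends a fragment and returns any sentences it completed.
    mutating func append(_ fragment: String) -> [String] {
        pending += fragment
        var sentences: [String] = []
        // Cut the buffer at each sentence terminator found so far.
        while let end = pending.firstIndex(where: { ".!?".contains($0) }) {
            let sentence = pending[...end]
                .trimmingCharacters(in: .whitespacesAndNewlines)
            if !sentence.isEmpty { sentences.append(sentence) }
            pending = String(pending[pending.index(after: end)...])
        }
        return sentences
    }

    /// Returns whatever is left over when the stream ends, if anything.
    mutating func flush() -> String? {
        let rest = pending.trimmingCharacters(in: .whitespacesAndNewlines)
        pending = ""
        return rest.isEmpty ? nil : rest
    }
}

// Usage: feed fragments as they arrive and collect the finished sentences.
var buffer = SentenceBuffer()
var spoken: [String] = []
for fragment in ["Hel", "lo th", "ere. How", " are you", "?"] {
    spoken += buffer.append(fragment)
}
if let rest = buffer.flush() { spoken.append(rest) }
// spoken is now ["Hello there.", "How are you?"]
```

Each finished sentence could then be wrapped in an AVSpeechUtterance and passed to the synthesizer's speak method, which queues utterances, so the assistant would start talking after the first sentence instead of waiting for the whole response.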
But that's something for the future. For now, I'm happy with the results because a couple of months ago, doing something like this seemed impossible.