We kept hitting the same wall building voice AI systems. Pipecat and LiveKit are genuinely great projects, but getting them to production took us weeks of plumbing: wiring components together, handling barge-ins, setting up telephony, knowledge bases, tool calls, and so on. And every time we needed to tweak agent behavior, we were back in the code and redeploying. We just wanted to change a prompt and test it in 30 seconds. That's why Vapi, Retell, and similar platforms exist.
So we built the whole thing ourselves and open sourced it: a visual drag-and-drop builder for voice agents (think Vapi, or n8n for voice). It's built on a Pipecat fork and licensed BSD-2, no strings attached. Tool calls, knowledge base, variable extraction, voicemail detection, call transfer to humans, multilingual support, post-call QA, background noise suppression, and a website widget are all included. You're not paying per-minute fees to a middleman wrapping the same APIs you'd call directly.
You can set it up with a single Docker command. It comes pre-wired with Deepgram, Cartesia, OpenAI, Speechmatics, and Sarvam for STT (and the same providers for TTS), plus OpenAI, Gemini, Groq, OpenRouter, and Azure on the LLM side. Telephony works out of the box with Twilio, Vonage, Cloudonix, and Asterisk for both inbound and outbound calls.
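For reference, a typical self-hosted bring-up might look something like this. The compose file layout and any required env vars are assumptions on my part; check the repo's README for the exact commands:

```shell
# Clone the repo and start the stack with Docker Compose.
# File names and required environment variables may differ;
# see github.com/dograh-hq/dograh for the actual instructions.
git clone https://github.com/dograh-hq/dograh.git
cd dograh
docker compose up -d   # start the services in the background
```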
There's a hosted version at app.dograh.com if self-hosting isn't your thing.
Repo: github.com/dograh-hq/dograh
Video walkthrough: https://youtu.be/sxiSp4JXqws
We built this out of frustration, not a thesis. The tool is free to use and fully open source (and will always remain so). Happy to answer questions about the data or how we built it.
That is an extremely misleading title because it made it sound like Vapi was open sourced, not that you just made a clone.
Fair point on the title - should have been clearer. Dograh is an open-source alternative to Vapi, though, not a clone. Vapi/Retell are closed platforms; this is open-source infra you self-host and modify. It's like saying n8n is a clone of Zapier because they solve the same problem.
Same category, but fundamentally different model.
Nice work, will check it out. What’s the average end-to-end latency per turn with STT + LLM + TTS in your default stack?
Hello.
Latency depends on the models you pick for each stage. If you colocate the models by self-hosting them on GPUs, it can be as low as 500-600 ms between user and bot turns. With hosted models like Gemini-2.5-flash, it's around 800-1000 ms, and it can be higher with larger reasoning models like GPT-4.1.
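To make those numbers concrete, here is a rough per-turn latency budget. The component figures below are illustrative assumptions, not measurements from Dograh; the point is that turn latency is roughly the sum of STT finalization, LLM time-to-first-token, TTS time-to-first-byte, and network overhead:

```python
# Rough per-turn latency budget for a cascaded voice pipeline.
# All component numbers are illustrative assumptions.

def turn_latency_ms(stt_final_ms: int, llm_ttft_ms: int,
                    tts_ttfb_ms: int, network_ms: int) -> int:
    """Time from end of user speech to first bot audio."""
    return stt_final_ms + llm_ttft_ms + tts_ttfb_ms + network_ms

# Hosted APIs (e.g. a Gemini-2.5-flash-class model):
hosted = turn_latency_ms(stt_final_ms=200, llm_ttft_ms=400,
                         tts_ttfb_ms=150, network_ms=100)

# Colocated, self-hosted models on GPUs (less network and queueing):
colocated = turn_latency_ms(stt_final_ms=150, llm_ttft_ms=250,
                            tts_ttfb_ms=100, network_ms=20)

print(hosted, colocated)  # 850 520
```

Shaving each stage a little (streaming STT, a fast first token, low TTS TTFB) is what moves you from the ~1 s hosted range into the 500-600 ms colocated range.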
Hello HN. I am Abhishek, one of the creators and maintainers of Dograh - github.com/dograh-hq/dograh
Please feel free to ask any questions you may have, or give us feedback on how we can make it better for you.
Thanks!
This looks pretty promising. Are you guys focusing on a specific use case, or on voice AI use cases in general?
Thanks for the kind words @ajabish.
We are more of a horizontal platform and can support a wide variety of use cases. We are serving large BPO call centres on our managed hosted service for both outbound and inbound calling.
There are also individual builders working on inbound use cases for personal use, or building their businesses on top of Dograh.