Voice Recognition API Increases DAU by 19%

The Challenge

The client, an EdTech startup, teaches young children the alphabet through a mobile application. With its existing teaching methods, the client had no reliable way to measure student progress or how well students retained what each lesson taught. The client wanted to use technology to “gamify” the science of learning and to track every student's progress.

The idea: build dopamine-driven rewards into the user experience so that learning feels enjoyable and engaging, increasing both the time students spend learning in the app and user retention.


Vacon proposed a speech-to-text API, embedded in the application, that listens to the student's voice and scores both the accuracy of the recognized answer and the speed of recall. Students receive positive reinforcement through collectible badges when they reach milestones, and meeting a lesson's minimum learning requirements unlocks the next lesson.
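The case study does not describe how scores, badges, and unlocks are computed. The following is a minimal sketch of that logic, assuming the recognition API returns one label per spoken answer and the app records how long the student took to answer. The thresholds, badge names, and the `Attempt`, `score_lesson`, and `badges_earned` names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds and badges: the real values are not stated in the case study.
MIN_ACCURACY = 0.8        # fraction of answers that must be recognized correctly
MAX_AVG_RECALL_S = 3.0    # maximum average seconds between prompt and answer
BADGE_MILESTONES = {5: "First Five", 10: "Perfect Ten", 25: "Quarter Century"}


@dataclass
class Attempt:
    expected: str          # the letter or digit the student was asked to say
    recognized: str        # the label returned by the speech-recognition API
    recall_seconds: float  # time from prompt to the student's answer


def score_lesson(attempts: list[Attempt]) -> dict:
    """Score one lesson on recognition accuracy and speed of recall."""
    if not attempts:
        return {"accuracy": 0.0, "avg_recall_s": float("inf"), "passed": False}
    correct = sum(a.expected == a.recognized for a in attempts)
    accuracy = correct / len(attempts)
    avg_recall = sum(a.recall_seconds for a in attempts) / len(attempts)
    # Passing both minimum requirements is what would unlock the next lesson.
    passed = accuracy >= MIN_ACCURACY and avg_recall <= MAX_AVG_RECALL_S
    return {"accuracy": accuracy, "avg_recall_s": avg_recall, "passed": passed}


def badges_earned(total_correct: int) -> list[str]:
    """Return every badge whose milestone the student has reached."""
    return [name for n, name in sorted(BADGE_MILESTONES.items()) if total_correct >= n]
```

Scoring accuracy and recall speed separately lets the app reward a student who answers correctly but slowly differently from one who answers quickly but incorrectly.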

How it works:

  • Voice activity detection (VAD) was run on an audio dataset to extract spoken words from long audio recordings.

  • After extracting short audio clips, we built a deep-learning model in PyTorch that classifies each clip into one of 10 classes (0 to 9).

  • Once the model reached over 95% test accuracy, we wrapped it in a Flask API and embedded it into the mobile application.
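The case study does not say which VAD method was used. As a minimal sketch of the first step, assuming mono audio already decoded to floating-point samples, a simple energy-threshold detector marks fixed-length frames as speech when their RMS energy exceeds a threshold and returns the sample ranges of each spoken segment. The function names and parameter defaults are hypothetical; production pipelines often use a more robust detector such as the WebRTC VAD.

```python
import math


def frame_energies(samples, frame_len):
    """Root-mean-square energy of each non-overlapping frame (a trailing partial frame is dropped)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def detect_speech(samples, sample_rate=16000, frame_ms=20, threshold=0.02, min_gap_frames=5):
    """Return (start, end) sample indices of speech segments, end exclusive.

    Hypothetical energy-threshold VAD: a frame is speech when its RMS energy
    exceeds `threshold`. Silent gaps shorter than `min_gap_frames` frames are
    bridged so that a single spoken word is not split into several segments.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    voiced = [e > threshold for e in frame_energies(samples, frame_len)]
    segments = []
    start = None   # index of the first voiced frame in the current segment
    silence = 0    # consecutive silent frames since the last voiced frame
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_gap_frames:
                end = i - silence + 1  # first silent frame after the speech
                segments.append((start * frame_len, end * frame_len))
                start, silence = None, 0
    if start is not None:  # speech ran until the end of the recording
        end = len(voiced) - silence
        segments.append((start * frame_len, end * frame_len))
    return segments
```

Each returned range can be sliced out of the recording to produce the short clips that the classifier is trained on.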

Results


  • The API transcribes a spoken answer and matches it against the expected answer within 50 milliseconds, with 95% accuracy

  • Daily Active Users (DAU) increased by 19%

  • 12% increase in retention rate

  • 32% increase in average session length
