Grammarly for Speech
My Bluestamp project is a program that analyzes a person’s speech and gives feedback and statistics based on it. It gives a person a way to practice public speaking while receiving feedback and advice for improvement. The program returns a speaking score (based on factors such as pronunciation and verbal flubs) and tells the user what to improve upon in the future.
|Engineer|School|Area of Interest|Grade|
|Yang G|Leigh High School|Computer Science|Incoming Junior|
Overall, during the course of the three weeks I had at Bluestamp, I not only had a lot of fun, but I also learned a lot of new things.
I had planned to become a software engineer or developer when I grow up, and after experiencing the work firsthand, I realized that I really enjoy it, so my plan has become my goal.
Over this time, I used many tools that I will probably need in the future, which gave me a first look at the computer science world.
Bluestamp has motivated me to continue working on my project, as I have realized that my project is now an achievable goal and I am even further encouraged to spread my ideas about public speaking and help as many people as possible. I will continue to develop this project even after I leave Bluestamp and try to improve upon the program further.
My third milestone was improving upon the existing features. Specifically, I changed the speech score algorithm to be more accurate and more representative of a “better speech”, improved the UI for my webapp, and added a new feature that calculates the speaker’s speaking speed and gives feedback based on it.
First, I’ll explain how the speech score works and how I plan to write the algorithm for the speech feedback. Currently, the speech score depends on the speaking speed, the Levenshtein distance percentage between the script and the text recognized by the speech API, and the confidence percentage given by the API. In the future, I plan on adding more factors as well, such as verbal flub count, pitch change, and decibel levels.
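To make the idea concrete, here is a minimal sketch of how three such factors could be blended into one score. It is not the project's actual algorithm: the weights, the "comfortable" pace range, and every name in it (`SpeechScoreSketch`, `paceScore`, `combinedScore`) are assumptions chosen only for illustration.

```java
// Illustrative sketch only: weights, pace range, and names are hypothetical.
final class SpeechScoreSketch {
    // Hypothetical weights for each factor; they sum to 1.
    static final double ACCURACY_WEIGHT = 0.5;
    static final double CONFIDENCE_WEIGHT = 0.3;
    static final double PACE_WEIGHT = 0.2;

    // Hypothetical pace range (words per minute) that earns full marks.
    static final double MIN_WPM = 120.0;
    static final double MAX_WPM = 160.0;

    /** Returns 100 inside the assumed pace range and loses one point per wpm outside it. */
    static double paceScore(double wordsPerMinute) {
        if (wordsPerMinute >= MIN_WPM && wordsPerMinute <= MAX_WPM) {
            return 100.0;
        }
        double gap = wordsPerMinute < MIN_WPM
                ? MIN_WPM - wordsPerMinute
                : wordsPerMinute - MAX_WPM;
        return Math.max(0.0, 100.0 - gap);
    }

    /**
     * Blends script accuracy (0-100), API confidence (0-100), and pace
     * into a single 0-100 score using the hypothetical weights above.
     */
    static double combinedScore(double accuracyPercent, double confidencePercent,
                                double wordsPerMinute) {
        return ACCURACY_WEIGHT * accuracyPercent
                + CONFIDENCE_WEIGHT * confidencePercent
                + PACE_WEIGHT * paceScore(wordsPerMinute);
    }

    public static void main(String[] args) {
        // Example: 80% script accuracy, 90% confidence, speaking at 100 wpm.
        System.out.println(combinedScore(80.0, 90.0, 100.0));
    }
}
```

A weighted sum like this makes it easy to add later factors (flub count, pitch change, volume) by giving each its own 0 to 100 sub-score and rebalancing the weights.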
Second, I also improved the UI for my webapp. I’m still extremely new to React, so this was mostly a learning experience for me. I was mainly looking through Bootstrap and trying to figure out how to use some of its templates on my own page.
Lastly, I added a new feature that calculates the speaker’s speaking speed. Conveniently, the Google speech-to-text API includes functions that track each word’s start and end time. So, using those functions, I simply found the start time of the first word and the end time of the last word, then divided the number of words by the elapsed time to get the speaker’s average words per minute.
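The calculation can be sketched as follows. This is a minimal illustration rather than the project's code: `TimedWord` is a hypothetical stand-in for whatever per-word timing objects the speech API returns, and the times are assumed to be in seconds.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: TimedWord is a hypothetical stand-in for the API's word timing data.
final class SpeakingRateSketch {
    /** One recognized word with its start and end offsets in seconds (hypothetical type). */
    static final class TimedWord {
        final String word;
        final double startSeconds;
        final double endSeconds;

        TimedWord(String word, double startSeconds, double endSeconds) {
            this.word = word;
            this.startSeconds = startSeconds;
            this.endSeconds = endSeconds;
        }
    }

    /**
     * Average words per minute, measured from the start of the first word
     * to the end of the last word. Returns 0 when there is nothing to measure.
     */
    static double wordsPerMinute(List<TimedWord> words) {
        if (words.isEmpty()) {
            return 0.0;
        }
        double start = words.get(0).startSeconds;
        double end = words.get(words.size() - 1).endSeconds;
        double elapsedSeconds = end - start;
        if (elapsedSeconds <= 0.0) {
            return 0.0;
        }
        return words.size() * 60.0 / elapsedSeconds;
    }

    public static void main(String[] args) {
        List<TimedWord> words = Arrays.asList(
                new TimedWord("public", 0.0, 0.5),
                new TimedWord("speaking", 0.5, 1.0),
                new TimedWord("practice", 1.0, 1.5));
        System.out.println(wordsPerMinute(words) + " words per minute");
    }
}
```

Note that this measure includes pauses between words, so long silences lower the result just as slow speech does.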
That was basically it for my third milestone: a bunch of minor changes that overall make the program better. Sadly, this is going to be the last milestone that I create at Bluestamp, but I’ll continue to work on this project in my free time so that in the future I’ll be able to help some people with my program.
My second milestone was setting up a working UI using React and Node.js and properly linking it to my backend code. Here is the general gist of the process: set up a Node.js webapp, use React to make a UI for the server, link the webapp to the backend code that I made previously, upload the files into Google Cloud Storage, and finally pull the files from Google Cloud Storage to use in my program.
With this being said, that thought process motivated me to create a website for the program. This was my first encounter with making a website, and it was rather traumatizing for a newbie developer like me. After finishing this milestone, I have internally demoted myself from rookie Java developer to incompetent student coder, as I realized that there were a ton of basic things that APCS did not teach me.
My React code:
I worked with a lot of basic tools such as Spring Boot and Node.js to launch a website, and I worked with languages other than Java. I encountered many different bugs that required me to spend time on Google, along with eureka moments where I felt enlightened by a tidbit of information. This milestone has let me truly realize the meaning of “open up a whole new can of worms” and helped me delve deeper into the world of a Java developer. My encounter with the basic daily tools of a Java developer has given me a whole new level of appreciation for Java developers.
This milestone has given me a great opportunity to work on a project that is somewhat similar to what I would be doing in a real Java development job. The sheer amount of pain and agony I have encountered in the past week will save me trouble in the future, since I will have to learn all of this stuff anyway. Working with Node.js, Google Cloud Storage, Google speech APIs, React, Spring Boot, and a bunch of other mumbo jumbo has opened up new pathways in my brain that have never been touched before. My tiny brain has almost imploded from the sheer amount of brainpower required to solve my confusion.
If it weren’t for the amazing help provided to me by my instructor and my dad, my brain probably would have vaporized into the atmosphere. During this process, I encountered many problems and bugs, but for the sake of time, I’ll just list out a few of the problems I encountered.
The most brain-numbing problem was the implementation of Google Cloud Storage in my project. Rather than just transferring strings, I had to transfer FILES into my project, which (in my opinion) is much worse. Since I was using a Google API for this project, I was basically forced into using Google Cloud Storage to transfer files. This, again, took a lot of effort, as it required me to look into how Google buckets work and everything related to them.
My backend code:
But this shows me that I definitely have a lot to learn in the future, and it gives me motivation to improve myself. Overall, even though my UI is still a work in progress and my features are generally still messy, I am proud of the work that I have accomplished so far and I am eager to improve upon my project in the upcoming weeks.
Here is the very simple UI for the project that I made using React:
My first milestone was setting up the core features of my program (speech recognition API and script comparison). For my project I decided to use Google’s speech-to-text API, since it was the most accurate option and it is available in Java (my preferred language). I set up the Google API, got it working in IntelliJ, and allowed it to access local files that I input.
Now that my program could understand what the user was saying, I implemented some features that analyze the accuracy of the speech-to-text API. I had my program display the confidence percentage of the API, showing how confident the API was in its recognition, and I started on the baseline of my speaking score. I had the user input the script for their speech, and I used Levenshtein distance to calculate the difference between the script and the string recognized by the speech API.
I looked online, found a premade class for Levenshtein distance, and used its methods in my own program. I then subtracted the Levenshtein score from 100 to get the speaking score (higher speaking score = better speech). In the future I plan to add more factors that contribute to the speaking score, but as of now, this speaking score tells the user how clearly they were speaking (background noise, pronunciation, etc.).
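For readers unfamiliar with Levenshtein distance, here is a minimal sketch of the comparison step. It is not the premade class mentioned above. The class and method names are hypothetical, and turning the distance into a percentage by dividing by the longer string's length is an assumption, since the exact normalization is not stated here.

```java
// Illustrative sketch: names and the normalization rule are assumptions.
final class ScriptComparisonSketch {
    /**
     * Levenshtein distance: the minimum number of single-character
     * insertions, deletions, and substitutions needed to turn a into b.
     * Uses two rolling rows of the standard dynamic-programming table.
     */
    static int levenshtein(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // turning "" into the first j characters of b
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // turning the first i characters of a into ""
            for (int j = 1; j <= b.length(); j++) {
                int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + substitution);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    /**
     * Speaking score = 100 minus the distance expressed as a percentage
     * of the longer string's length (assumed normalization).
     */
    static double speakingScore(String script, String recognized) {
        int longest = Math.max(script.length(), recognized.length());
        if (longest == 0) {
            return 100.0; // two empty strings match perfectly
        }
        double distancePercent = 100.0 * levenshtein(script, recognized) / longest;
        return 100.0 - distancePercent;
    }

    public static void main(String[] args) {
        String script = "four score and seven years ago";
        String recognized = "for score and seven years ago";
        System.out.println(speakingScore(script, recognized));
    }
}
```

In practice it may help to lowercase both strings and strip punctuation before comparing, so that formatting differences in the transcript do not count against the speaker.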
My example program’s output:
My program also has some basic feedback based on the speaking score and the confidence percentage, for example: “The software did recognized your speech well, showing that you had good clarity and pronunciation, but try to have more flow in your speech with less verbal flubs or pauses.” I tested 3 different sample speeches in my code (as seen in the YouTube video), all with varying levels of success. Overall, I think I built the core fundamentals of my program relatively well. Next, I will work on making a web application for my project and adding more features/statistics that can provide feedback for the user.
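One simple way to produce feedback like this is to compare each number against a threshold and pick a message for each combination. The sketch below is only an illustration of that idea: the threshold of 80, the message wording, and all names are hypothetical and not taken from the actual program.

```java
// Illustrative sketch: threshold, messages, and names are hypothetical.
final class SpeechFeedbackSketch {
    // Assumed cutoff (0-100) above which a value counts as "good".
    static final double GOOD_THRESHOLD = 80.0;

    static final String ALL_GOOD =
            "Clear pronunciation and a close match to your script.";
    static final String CLEAR_BUT_OFF_SCRIPT =
            "You were easy to recognize, but work on flow: fewer flubs and pauses.";
    static final String ON_SCRIPT_BUT_UNCLEAR =
            "You followed your script, but try to speak more clearly.";
    static final String NEEDS_WORK =
            "Try slowing down, speaking clearly, and rehearsing your script.";

    /** Picks a message from the speaking score and API confidence (both 0-100). */
    static String feedback(double speakingScore, double confidencePercent) {
        boolean clear = confidencePercent >= GOOD_THRESHOLD;
        boolean onScript = speakingScore >= GOOD_THRESHOLD;
        if (clear && onScript) {
            return ALL_GOOD;
        }
        if (clear) {
            return CLEAR_BUT_OFF_SCRIPT;
        }
        if (onScript) {
            return ON_SCRIPT_BUT_UNCLEAR;
        }
        return NEEDS_WORK;
    }

    public static void main(String[] args) {
        // High confidence but a lower speaking score.
        System.out.println(feedback(65.0, 92.0));
    }
}
```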
An example of the feedback given by the program: