Joint research between the National Institute of Advanced Industrial Science and Technology (AIST) and PRONTEST INC. has produced technology that can accurately determine the state of a person's vocal apparatus (mouth, tongue, etc.). This technology was then used to develop an automated teaching system that corrects the user's pronunciation based on phonetics and English-teaching techniques.
Wanting to improve one's English conversation skills is a natural goal, but attending English classes regularly can be difficult, and learning on one's own is very challenging. A number of English learning software systems on the market use voice recognition technology to help overcome problems like these. However, the systems developed so far have only been able to point out certain problem areas or mistakes; they have not been able to give the user individualized instruction.
AIST conducts research on highly accurate voice analysis methods, and it was by combining this work with PRONTEST INC.'s experience in pronunciation correction and its knowledge of English phonetics that the system was developed. To correct English pronunciation, the system uses voice analysis to determine the positions of the mouth and tongue at the moment of pronunciation.
Because the system combines knowledge of phonetics and English language instruction with information on the state of the vocal apparatus, we can now anticipate learning materials that are as effective as private lessons.
The many English conversation schools around the country offer English instruction, but they cannot always offer accurate guidance on pronunciation. Even teachers who are native speakers of English may not understand phonetics or the theory of English language teaching. In addition, although the importance of English conversation education has been recognized at elementary and junior high schools, children still cannot be expected to receive accurate pronunciation guidance in their school lessons. Some people think that as long as children can get the message across, it does not matter if their pronunciation sounds more Japanese than English. However, proponents of the motor theory of speech perception have argued that acquiring pronunciation skills can also improve listening comprehension. Good pronunciation can also give children a greater sense of pride in their speaking ability, which in turn makes them more motivated to learn. Software that could detect pronunciation errors and advise users on how to correct them could therefore be a highly effective tool, both for learners studying on their own and as supplementary material in the classroom.

However, most English learning software that uses audio information technology only shows the user a score or reports that a mistake has been made. Because feedback is given only at the phoneme level ("your L sounds like an R" or "make your E like a Japanese 'i'") and users receive no specific guidance on how to produce the correct sound, it has been difficult to show that these systems have a strong effect on learning.
In November 2003, PRONTEST INC. (then called Bears Communications) began looking for ways to turn its understanding of, and experience with, pronunciation training in the English conversation classroom into software that would systematize this training. The company discussed the possibility of developing such technology with Tsukuba Center Inc. Because AIST has a long history of research on phonetic analysis methods and the characteristics of articulation, an AIST industry-academia-government liaison coordinator proposed linking the two organizations. Collaborative research and development on the project began in July 2004, funded by a grant from PRONTEST INC.
The researchers collected a large number of English pronunciation samples and investigated in detail the characteristics of speech analysis from the perspective of pronunciation correction. From this work, they developed technology that automatically determines the state of the vocal apparatus (shape of the mouth, position of the tongue) from the user's English utterance. Traditional methods that use voice recognition technology work at the phoneme level, so they could only report which sounds the user's utterance is closest to. The new system adds information on articulatory characteristics, such as whether the tongue is touching the roof of the mouth or whether the lips are rounded enough. Building a recognition model on these characteristics made it possible to determine the specific state of the vocal apparatus during speech.

Using this approach, the researchers developed a prototype English pronunciation correction system that gives specific advice based on the student's pronunciation (e.g. "try rounding your lips more", "don't open your rounded lips too quickly", "your jaw seems to be moving too much", "it seems like air is escaping from your nose"). The system combines knowledge of phonetics and language instruction with pronunciation-teaching techniques, and it is also able to determine the state of the vocal apparatus. Students will now be able to use this software on their own computers to receive individual instruction equivalent to that of a skilled instructor. With systems like these in place, we can anticipate even more effective language-study materials in the future.
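The paragraph above describes a shift from phoneme-level output ("which sound is this closest to?") to estimates of articulatory features that can drive concrete advice. The Python sketch below illustrates only the second half of that idea, turning feature estimates into advice messages, and it rests on loudly stated assumptions: the feature names, the 0-to-1 scores, the per-phoneme target values, the tolerance, and the `advise` function are all hypothetical and are not the AIST/PRONTEST model, whose internals are not described here. It assumes some acoustic classifier (not shown) has already scored each articulatory feature for one phoneme segment.

```python
# HYPOTHETICAL sketch: maps articulatory-feature scores to advice.
# Feature names, targets, and thresholds are illustrative assumptions,
# not the AIST/PRONTEST recognition model.

# Each feature is assumed to be scored 0.0-1.0 by an upstream classifier.
FEATURES = ("lip_rounding", "tongue_contact", "jaw_opening", "nasal_airflow")

# Hypothetical target profiles for correctly articulated phonemes.
TARGETS = {
    "w": {"lip_rounding": 0.9, "tongue_contact": 0.0,
          "jaw_opening": 0.2, "nasal_airflow": 0.0},
    "l": {"lip_rounding": 0.1, "tongue_contact": 0.9,
          "jaw_opening": 0.3, "nasal_airflow": 0.0},
}

# Advice for a feature that falls too far below or above its target.
ADVICE = {
    ("lip_rounding", "low"): "Try rounding your lips more.",
    ("lip_rounding", "high"): "Relax your lips; they seem too rounded.",
    ("tongue_contact", "low"): "Touch the tip of your tongue to the roof of your mouth.",
    ("tongue_contact", "high"): "Keep your tongue away from the roof of your mouth.",
    ("jaw_opening", "low"): "Open your jaw a little more.",
    ("jaw_opening", "high"): "Your jaw seems to be moving too much.",
    ("nasal_airflow", "low"): "Let some air pass through your nose.",
    ("nasal_airflow", "high"): "It seems like air is escaping from your nose.",
}

TOLERANCE = 0.3  # hypothetical: allowed deviation before advice is given


def advise(phoneme: str, observed: dict) -> list:
    """Compare observed feature scores with the phoneme's target profile
    and return one advice message per feature that is off target."""
    target = TARGETS[phoneme]
    messages = []
    for feature in FEATURES:
        diff = observed[feature] - target[feature]
        if diff < -TOLERANCE:
            messages.append(ADVICE[(feature, "low")])
        elif diff > TOLERANCE:
            messages.append(ADVICE[(feature, "high")])
    return messages
```

For example, `advise("w", {"lip_rounding": 0.4, "tongue_contact": 0.0, "jaw_opening": 0.7, "nasal_airflow": 0.1})` returns two messages, "Try rounding your lips more." and "Your jaw seems to be moving too much.", echoing the kind of advice quoted above. Keeping the advice in a rule table separates the wording from the recognition model; a real system would presumably also need timing information (e.g. lips opening too quickly), which this static sketch ignores.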