Researchers from the Massachusetts Institute of Technology (MIT) are working on an AI project that can approximate what people look like just by analyzing voice recordings from the internet. The researchers published their first paper on the project: “Speech2Face: Learning the Face Behind a Voice.”
The researchers trained the AI’s neural networks on millions of YouTube clips featuring the voices of more than 100,000 individuals. The AI learned to pick up on fine distinctions in voice tone and inflection and could predict the age, gender and ethnicity of the speakers with surprising accuracy. Most shocking: The AI attempted to recreate the faces of the people based on the sound of their voices alone. The system learned to infer key facial features and then generate lookalike photos of the speakers — all based on random audio clips.
The researchers did not hand-label the data or otherwise intervene; the system learned the correlations between voices and faces on its own. After analyzing millions of audio clips, the computer was able to reconstruct faces that closely resembled the original speakers in the video clips. The AI was not 100 percent accurate, however, and misjudged some speakers’ ethnicities, especially when a person spoke more than one language.
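In broad strokes, the paper describes converting speech into a spectrogram, encoding it into a face-feature vector, and training that encoder against features extracted from the speaker’s own video frames — which is why no human labeling is needed. The sketch below is a minimal illustration of that kind of self-supervised setup, written in PyTorch; the layer sizes, names and shapes here are simplified assumptions, not the authors’ actual implementation:

```python
# A minimal Speech2Face-style sketch, NOT the authors' code.
# Assumption: a pretrained face-recognition network supplies the target
# embedding for a frame of the speaker's own video; the voice encoder
# below is a toy stand-in for the CNN described in the paper.
import torch
import torch.nn as nn

class VoiceEncoder(nn.Module):
    """Maps a speech spectrogram to a face-embedding vector."""
    def __init__(self, embed_dim: int = 4096):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool over time and frequency
        )
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, 1, freq_bins, time_frames)
        h = self.conv(spectrogram).flatten(1)
        return self.fc(h)

# Self-supervised training: no human labels, because the "ground truth"
# embedding comes from a face image in the same video as the audio.
encoder = VoiceEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

def training_step(spectrogram, target_face_embedding):
    """One update: push the voice embedding toward the face embedding."""
    pred = encoder(spectrogram)
    loss = loss_fn(pred, target_face_embedding)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch with hypothetical shapes (8 clips, 257 freq bins, 300 frames).
spec = torch.randn(8, 1, 257, 300)
face_emb = torch.randn(8, 4096)  # would come from a face-recognition net
print(training_step(spec, face_emb))
```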
The research does raise privacy concerns, which the MIT researchers say they intend to address now that people are speaking out. Nick Sullivan, a researcher at Cloudflare, was notified by a friend that his face had been recreated in the AI research project. “I was not informed that my image or likeness was being used in this research. I’m not sure how concerned I should be about that,” Sullivan said. The researchers did not get consent from anyone whose audio, photos or likeness were used in the study. They said no one should be too worried about privacy violations because the computer cannot generate the “true identity of a person” and can only create “average-looking faces.”
The researchers explained the project in their paper, posted to the preprint server arXiv: “Our goal in this work is to study to what extent we can infer how a person looks from the way they talk. Obviously, there is no one-to-one matching between faces and voices. Thus, our goal is not to predict a recognizable image of the exact face, but rather to capture dominant facial traits of the person that are correlated with the input speech.” The researchers seek to use the AI for “useful applications” such as “attaching a representative face to phone/video calls based on the speaker’s voice.”
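To picture that “representative face” application, the sketch below shows the decoding half of such a system: a toy decoder (not the pretrained face decoder used in the paper) that turns a voice-derived embedding into a small avatar image that could be displayed during a call. All names, layer sizes and the output resolution are illustrative assumptions:

```python
# Hypothetical "avatar from a voice" sketch; layer sizes are toy choices.
import torch
import torch.nn as nn

class FaceDecoder(nn.Module):
    """Expands a face embedding into a small canonical face image."""
    def __init__(self, embed_dim: int = 4096):
        super().__init__()
        self.fc = nn.Linear(embed_dim, 256 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        h = self.fc(embedding).view(-1, 256, 4, 4)
        return self.deconv(h)  # (batch, 3, 32, 32) image

# During a call, the embedding predicted from incoming audio would be
# decoded into an "average-looking" avatar rather than a real photo.
decoder = FaceDecoder()
voice_embedding = torch.randn(1, 4096)  # output of a trained voice encoder
avatar = decoder(voice_embedding)
print(avatar.shape)  # torch.Size([1, 3, 32, 32])
```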
The researchers also reported that the AI uncovered correlations in facial patterns that, in future applications, could make it easier to map nodal points on a face. “Our reconstructions reveal non-negligible correlations between craniofacial features (e.g., nose structure) and voice,” they wrote.
It will be interesting to see how authorities use this machine learning technology in the future. It could potentially be paired with facial recognition software to single people out of a crowd, or to help further refine nodal points and recreate an image of an individual’s face. Perhaps the technology could help authorities hunt someone down based on nothing more than a random audio clip.
For more on AI and the encroaching police state, check out PrivacyWatch.news.