Tuesday, October 5, 2010

How Roger Ebert found his new voice (Q&A)

Roger Ebert's quest to recapture his lost voice uncovered a company with a unique technology.

When the famous film critic needed to find a way to communicate after losing his voice to cancer surgery, he turned to text-to-speech (TTS) software that speaks whatever he types. But the TTS software he initially tried sounded too robotic and computerized. He wanted a voice that sounded like him. That's when he discovered CereProc, a Scottish company that builds electronic voices. Using someone's audio recordings, CereProc's software can piece together an entire digital voice that sounds like the actual person.

To create a full and accurate range of speech, CereProc typically brings people into a professional recording studio to read specific voice scripts for several hours. That audio is carefully recorded and controlled to make sure it's as clean and consistent as possible. But Ebert had only the audio from commentaries he made for several movies on DVD. The challenge faced by CereProc was trying to piece together his voice from audio that was limited in length and poor in quality.

Ebert's new voice made its first TV appearance on Tuesday's Oprah Winfrey show, where the film critic and his wife Chaz spoke with Oprah and appeared in a taped segment revealing their life at home. Hearing her husband's voice for the first time in several years brought tears to Chaz and smiles to Roger.

CereProc creates and sells a variety of different voices with various accents, dialects, and personalities. People use CereProc's voices and text-to-speech software for a variety of reasons. Some, like Roger Ebert, have lost their own ability to speak. Many people use it to learn English and other languages. Some want to capture a local dialect before it dies out.
I use TTS software as a proofreading and editing tool to listen aloud to my own writing.

To learn more about CereProc's software, I recently spoke with Chris Pidcock, the company's chief voice engineer.

Chris Pidcock, Chief Voice Engineer at CereProc (Credit: CereProc)

Q: Chris, how do you actually create someone's voice using your technology?
Pidcock: When we build a voice for ourselves, we have a special script that covers lots of the sounds of English. It's quite rich and detailed. We get people into a studio and spend fifteen hours or so recording them. But afterwards, the voice creation process can be performed on audio and text from anywhere. That's how the Roger Ebert project came about. He got in touch with us because he saw the little George Bush talking head. That was a good example because obviously we couldn't get [Bush] to sit in the recording studio for fifteen hours. So we used his weekly radio address, which had both the text and audio on the White House Web site. We downloaded it and put it together, and out came the synthetic George Bush.

We take that audio data and send it off for transcription and then segment it into very small pieces. The technique is similar to the one used by AT&T's Natural Voices of selecting different pieces, or phonemes, of the audio and stitching them back together in clever ways. The trick is stitching them back together so they don't sound like they came from different contexts and different words. And obviously now Roger can say anything he likes. He's not limited to the words he used in his DVD commentaries.

Q: It sounds like that process would be fairly time-consuming, taking apart the words into their individual phonemes and putting them back together to form a complete vocabulary. How difficult is that?
Pidcock: It's pretty tricky to do the initial segmentation, to chop things up into the right pieces. And we've put a lot of work into automating that as much as possible. In the old days, we used to have to check many hundreds of thousands of boundaries between sounds to make sure they were correct. We now do nearly all of it automatically. So once we have the audio and the text, we can pretty much put a voice together overnight. The generating of the speech goes quite quickly. There are clever algorithms we use. We have 50 to hundreds of examples of each sound. So you have to pick a path that's optimal among many hundreds of thousands or millions of different options. But that's the usual technique used in speech recognition as well.

With Roger Ebert's voice, it's been trickier because we tend not to use material that's so conversational. In normal conversation, people tend to stop and repeat themselves and laugh and cough. That's actually quite difficult because we have to painstakingly remove all the uhs and ums so that the acoustic material we get is real speech. With the recording talent in the voice studio, if they say um in the middle of a sentence, we have them do it again, which we can't do with Roger.

Q: What were some of the other challenges in creating Roger's voice?
Pidcock: Getting the audio has been a bit of a challenge, more for him than for us. For the DVD commentaries, we needed a version that was quite clean with just his speech. Obviously, the version on the DVD is mixed with the background audio of the film. So he's been trying to get people to dig out the original audio tracks because there really isn't any way of extracting the soundtrack.
There's no way to do that in a way that doesn't also take away from the speech.

Once we got hold of his audio, we sent it out for transcription. That was quite tricky because it's difficult to spot all the ums and other audio disfluencies. So one of the challenges has been to try and automatically find these disfluencies. Also, a big problem with the audio we're getting is that it comes from different recording environments. One of the things we do when we record our own voices is keep the environment very, very consistent. We always use the same studio and microphone. We take photos of all the equipment so we can make sure all the levels are the same. But Roger's DVD commentaries could have been recorded years apart in different environments, some recorded in his house, some recorded in a professional studio. It's a challenge to combine that audio in a consistent way.

Also, the way he speaks is more conversational in these commentaries than what we're used to. That means his speech is more varied. It's possible in the synthesis that we might try to stick in some vowel from a completely different studio recorded 10 years later. So smoothing all that together is a lot more difficult than it is for the voices we record ourselves.

Q: Do you have standard text that people in your studio read from, similar to what we might see in a voice dictation program?
Pidcock: Yes, we have a big database of text, which we basically mine for combinations of sounds that are fairly rare in English. We end up with a lot of phrases with the word "oil" because in British English there aren't many contexts with "oy" in them. We end up with strange sentences like "The Omaha Oil company went down by 10 points today" because "ah" and "oy" don't go together very much.
Over the years, we've been able to develop a really good script that tries to cover all the sounds we need in as much richness as possible in as little time as possible.

Q: How much recorded audio would you typically need from someone to create a voice?
Pidcock: For our voices, we use a minimum of fifteen hours in the studio, compared with the Roger Ebert voice, which I think is only about four hours in the version we got. But we don't have them do all fifteen hours at once. Usually, we do about three hours a day of recordings for a week.

Q: Do you hire professional voiceover people or are these just ordinary people?
Pidcock: It depends. We produce custom voices for people as well. Sometimes if a customer wants a particularly young-sounding voice, it's quite tough to get a professional, so we try to find talented amateurs. And often people who do amateur dramatics are quite good at voiceover work. We have a voice on the Web site called Sue who's from a particular area in Central England where they have a really strong accent. It's in Birmingham. That was actually a competition. They were trying to pick an example of this accent because it's kind of dying out. They went out on the streets of the town with a microphone and recorded people and played that on the local radio station and chose the voices they liked. And then we got the voice from that.

Q: So you're getting local dialects?
Pidcock: Yeah. With the Irish voice [Caitlin], we put some work into getting Irish place names local to Ireland. We did the same with the Birmingham voice and the Scottish voices. We try to find some local color.

Q: Are you mainly focused on English voices or have you branched out into any other languages?
Pidcock: We have partners in Germany and Spain. So we have Spanish and Catalan and German voices and also an Austrian/German dialect. And we're just finishing up French and Italian.
So we've been active in building more languages. We also recorded the French voiceover artist reading a lot of English. So her voice can be made fairly multilingual. Her English is actually so good that we're thinking of adding an English voice with a French accent, kind of a sexy French, which I think might be quite popular.

Q: How did the product get off the ground? How did your company start?
Pidcock: Edinburgh University is one of the top places in Europe for speech technology. And quite a long time ago, a speech synthesis system was created there, which is kind of embedded in Linux. It's called Festival. And that led to a spinoff from Edinburgh University called Rhetorical Systems, a company that kind of bloomed briefly in the Internet boom and then crashed. A couple of us who work for CereProc worked at Rhetorical. And after that company folded, we started again with a more focused idea of building up the software more gradually and hopefully more sustainably.

Q: Where do you see the software going and what do you hope to achieve with it?
Pidcock: One thing we'd like to do in line with the system I've been talking about is have a Web service where anyone could log on and use their computer microphone to read a small number of sentences. And then it would give you a downloadable voice that you can install on your computer that sounds just like you. That kind of thing would only be possible when this technology, called parametric synthesis, comes onstream. We've also been working on trying to get a more emotional output into the speech. We have a project to create little animations or talking heads for in-car use. We put some work into creating an American voice that could sound happy or sad or angry.
And we were interested in seeing how that might affect an interaction between a person and their car. Although I'm not sure if you'd want your car to be angry with you.

Q: Roger Ebert's example is interesting if you relate it to other people who have lost their voice. But as you said, the challenge is finding enough recorded audio from them from the past.
Pidcock: Yeah, at the moment it's just too difficult or too expensive. But we are working on different techniques to enable us to build a voice from a smaller amount of audio. There are new text-to-speech techniques coming up that would actually have a model of the speaker. They don't work by chopping up speech into little pieces and stitching them together. You actually train the model on the speaker's sounds so it can reproduce those sounds. You can take a general male model of speech, say an American general male voice built from lots of different American male speakers. And you can adapt it to sound like Roger Ebert with quite a small amount of material, maybe half an hour or so.

The problem with these voices is that they don't sound very natural at the moment. They're a bit noisy and a bit buzzy, and they don't have the variation. In the intonation, they don't sound as natural. But potentially, they could be good for people who only have a small amount of audio. It'll still produce something that sounds like them. That's not ready for production. That's still something we're experimenting with.

Updated 2:15 PST to correct spelling of Edinburgh and change company's location to Scotland.
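
For readers curious about the "optimal path" search Pidcock mentions (many recorded examples of each sound, stitched so the joins are not audible), here is a minimal sketch in Python. It is only an illustration of the general unit-selection idea, not CereProc's software: the `Unit` fields, the `join_cost` formula, the penalty of 20 for mixing recording sessions, and the sample data are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded snippet of a sound (all fields are hypothetical)."""
    phoneme: str     # sound label, e.g. "OY"
    pitch_hz: float  # average pitch of the snippet
    session: str     # which recording the snippet came from


def join_cost(prev: Unit, nxt: Unit) -> float:
    """Assumed cost of splicing two snippets: pitch jump plus a
    fixed penalty when they come from different recording sessions."""
    cost = abs(prev.pitch_hz - nxt.pitch_hz)
    if prev.session != nxt.session:
        cost += 20.0  # made-up penalty for mixing environments
    return cost


def select_units(target: list[str], inventory: list[Unit]) -> list[Unit]:
    """Choose one snippet per target sound so the total join cost is
    minimal, using dynamic programming (a Viterbi-style search)."""
    columns = [[u for u in inventory if u.phoneme == p] for p in target]
    if not columns or any(not col for col in columns):
        raise ValueError("no recording available for some target sound")

    # best[i][j] = (cheapest cost of a path ending at columns[i][j],
    #               index of the previous snippet on that path)
    best = [[(0.0, -1) for _ in columns[0]]]
    for i in range(1, len(columns)):
        row = []
        for unit in columns[i]:
            row.append(min(
                (best[i - 1][k][0] + join_cost(prev, unit), k)
                for k, prev in enumerate(columns[i - 1])
            ))
        best.append(row)

    # Walk back from the cheapest final snippet to recover the path.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(columns) - 1, -1, -1):
        path.append(columns[i][j])
        j = best[i][j][1]
    return path[::-1]


# Tiny made-up inventory: two takes of each sound from two sessions.
inventory = [
    Unit("OY", 120.0, "dvd_a"), Unit("OY", 150.0, "dvd_b"),
    Unit("L", 118.0, "dvd_a"), Unit("L", 149.0, "dvd_b"),
]
chosen = select_units(["OY", "L"], inventory)
print([(u.phoneme, u.session) for u in chosen])
```

A production system would also score how well each snippet fits the requested sound in context, and would work with far larger inventories; the sketch keeps only the join cost to show why snippets from the same recording environment tend to be preferred.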