We had some amazing content on October 10, including a panel of public servants from Latin America speaking about municipal innovation. The panelists all spoke Spanish. How could we bring this to our global audience?
We evaluated a number of products that seemed promising, and ultimately found one that worked well. Some of the criteria we considered were:
- How accurate is the translation?
- Does the dubbed voice use a sample of the original speaker’s voice, so it sounds like them?
- Does it speed up or slow down the video when the translation is longer or shorter than the original words?
- Does it edit the face to change lip movements so they line up with syllables and phonemes?
- Is it done in real time, or does it require pre-recording and subsequent processing?
Rebecca Croll, who heads up content, did all of the heavy lifting to test several products. Ultimately, real-time processing and facial editing are still “beta” technologies, but the rest of these criteria are easily met.
We found that the results were dramatically better (and cheaper) than human translation, in large part because the software retains the speaker’s voice and inflection and paces the video in ways a human translator can’t. We’re sorry to say it, but for work that doesn’t involve interpreting human body language and nuance, translators are now lamplighters. You’d still want a translator for things like negotiations.
The resulting machine translations (into English) were then translated into French in real time (without the original speaker’s voice) on our event platform.
The experiment left us with some real questions:
- What does this mean for human labor and dignity?
- The results are hard to ignore, and many other jobs will undergo this kind of disruption in the next couple of years. Just this week, Kevin Weil, OpenAI’s CPO, declared that his company’s products could do “8 hours of legal work at $1,000 an hour” for $3 in tokens. That’s $8,000 of billed work for $3. A 2,666x cost reduction isn’t something we can just “upskill and retrain” around.
- We need meaningful economic answers about how to distribute the spoils of the productivity increases that generative AI will bring.
- Translation with lip-sync is going to be a nail in the coffin of human translation once it works well, because it’s much easier to follow a speaker when their lips match their sounds. It’s not clear whether it’s good enough for lip readers, though.
- Some of these tools also have support for .srt subtitle files, making it easier for a human to vet translations line by line (we did this; see the sketch below).
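If you want to do the same kind of vetting, the .srt format is simple enough to script against. Here’s a minimal sketch in Python that pairs cues from an original-language subtitle file with a translated one for side-by-side review. The file names are hypothetical; the format itself (a numeric index, a timestamp line, and one or more text lines per cue, separated by blank lines) is standard.

```python
# Pair cues from an original-language .srt with a machine-translated .srt
# so a reviewer can scan the two line by line. File names are hypothetical.

def parse_srt(path):
    """Return a list of (index, timestamps, text) tuples from an .srt file."""
    with open(path, encoding="utf-8-sig") as f:  # utf-8-sig strips a BOM if present
        blocks = f.read().replace("\r\n", "\n").strip().split("\n\n")
    cues = []
    for block in blocks:
        lines = block.strip().splitlines()
        if len(lines) >= 3:  # index, timestamp line, at least one text line
            cues.append((lines[0], lines[1], " ".join(lines[2:])))
    return cues

original = parse_srt("panel_es.srt")    # hypothetical: Spanish original
translated = parse_srt("panel_en.srt")  # hypothetical: English machine dub

# A mismatch in cue counts is itself worth flagging before reviewing.
if len(original) != len(translated):
    print(f"Warning: {len(original)} vs {len(translated)} cues")

for (idx, ts, src), (_, _, dst) in zip(original, translated):
    print(f"[{idx}  {ts}]\n  ES: {src}\n  EN: {dst}\n")
```

Pairing by position assumes the dubbing tool keeps cue boundaries aligned between tracks; if a tool merges or splits cues, matching on timestamp overlap is a sturdier approach.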
To access this session, log into the platform, or register if you haven’t already, and then choose “Replay” from the menu on the left.
One of the things we love about running FWD50 is how we get to try out new technologies like this. There’s so much room to innovate in events when we experiment with emerging tech.