Google English-Spanish Real-Time Speech Translator Review 2025


Monday June 23, 2025 - Posted by:

Comprehensive Review of Google’s Real-Time English-Spanish Speech Translator: EN-ES & ES-EN AI for International Communication

TL;DR

What is Google’s Speech Translator?
In May 2025, Google launched a real-time speech translator (powered by Gemini AI) for Google Meet. It supports both English-to-Spanish (EN→ES) and Spanish-to-English (ES→EN) translation, aiming for natural, human-like conversations.

Who is it for?

Global teams collaborating remotely, multilingual families and friends, and international customer service and consulting

How was it tested?
Five content types: Casual/informal; Business/corporate; Technical; Fast-paced/overlapping dialogue; Idiomatic language

How was it scored?

Scored on: accuracy, naturalness, tone, comprehensibility, and timing.

Key Strengths

Works best for structured, business, and technical speech (especially ES→EN). Main ideas and data are preserved in those contexts.

Key Weaknesses

Literal, awkward, or unnatural translations in casual and idiomatic speech. Performs poorly with fast-paced or overlapping dialogue. Choppy delivery and unnatural pauses at times.

Bottom Line
Google’s real-time speech translator (EN↔ES) is promising for formal business and technical meetings but unreliable for informal, idiomatic, or rapid conversations. It’s a step forward for breaking language barriers in structured settings, but natural, everyday speech still needs improvement.

 

Introduction

In May 2025, Google launched its new voice translation feature, powered by Gemini AI, for Google AI Pro and Ultra subscribers. Debuting at Google I/O 2025, this tool enables near real-time translation of spoken language during Google Meet sessions. Notably, it attempts to replicate the speaker’s tone, inflection, and expression, aiming for a natural, human-like sound. Currently, the feature is available only for English to Spanish and vice versa.

The core goal? To break down language barriers in virtual meetings, enabling smooth, natural conversations between speakers of different languages. This is especially valuable for:

Remote collaboration among global teams

Families or friends who speak different native languages

International customer service or consulting

Ultimately, Google’s new feature is designed to support flowing conversations, even when participants don’t share a common language.

 

Methodology

For our evaluation, we selected five kinds of text, each covering a different conversational context:

Casual/informal

Business/corporate

Technical

Fast paced / overlapping dialogue

Text heavy with idioms

We chose to base our analysis on the following criteria and weighting to obtain a score out of 10 for each of the content types:

Accuracy of Translation (30% weighting): Fidelity to the source meaning

Naturalness and Fluency (25% weighting): How idiomatic and smooth the translation sounds

Tone and Register (20% weighting): Appropriateness of word choice and tone for the context

Comprehensibility (15% weighting): Whether the main idea is understandable, even if imperfect

Timing and Delivery (10% weighting): Correctness of pauses, sentence breaks, and rhythm
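
As a rough illustration of how these weights combine into a single score, the sketch below reproduces the arithmetic in Python. It assumes the overall score is a plain weighted sum of the five criterion scores; the function and variable names are our own, chosen for illustration only. The per-criterion scores are the ones given in the ES-EN analyses further down.

```python
# Weights in percent, as listed in the methodology above.
WEIGHTS = {
    "accuracy": 30,
    "naturalness": 25,
    "tone": 20,
    "comprehensibility": 15,
    "timing": 10,
}


def weighted_score(scores):
    """Combine per-criterion scores (each out of 10) into one score out of 10.

    Assumption: the overall score is a simple weighted sum.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five criteria")
    # Work in whole percentages and divide once at the end.
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return round(total / 100, 2)


# Per-criterion scores from the ES-EN sections of this review.
es_en_casual = {
    "accuracy": 6, "naturalness": 2, "tone": 4,
    "comprehensibility": 7, "timing": 3,
}
es_en_business = {
    "accuracy": 8, "naturalness": 7, "tone": 7,
    "comprehensibility": 9, "timing": 7,
}
es_en_technical = {
    "accuracy": 7, "naturalness": 7, "tone": 7,
    "comprehensibility": 8, "timing": 6,
}

print(weighted_score(es_en_casual))     # 4.45
print(weighted_score(es_en_business))   # 7.6
print(weighted_score(es_en_technical))  # 7.05
```

Because accuracy carries the largest weight, a weak accuracy score pulls the total down more than a weak timing score does.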

 

Overall scores

ES-EN

Casual/informal: 4.45/10

Business/corporate: 7.6/10

Technical: 7.05/10

Fast paced / overlapping dialogue: 1/10

Idiomatic: 3.1/10

EN-ES

Casual/informal: 4.55/10

Business/corporate: 5.55/10

Technical: 3.4/10

Fast paced / overlapping dialogue: 1/10

Idiomatic: 3.1/10

 

ES-EN Speech Translation

Casual/informal Content (Score: 4.45/10)

The meaning is just about understandable with significant effort; anything longer than the short passage tested would have been unworkable.

ES Transcript (source text)

¡Ey, qué pasa! Ayer estuve viendo una serie que está brutal, te la recomiendo un montón. Luego quedamos para tomar algo y ponernos al día, que hace tiempo que no nos vemos. ¿Quién se apunta?

EN Localisation of source text (correct translation)

Hey, what’s up! I was watching this amazing series yesterday, I totally recommend it. Let’s catch up over a drink soon, it’s been ages since we last saw each other. Are you in?

EN Transcript (Google Speech Translator Output)

Hey. What’s up? Yesterday I was watching a series that is brutal. I highly recommend it. Then we meet up to have a drink and catch up, which we haven’t seen in a while. Who signs up?

Analysis

Accuracy of translation (30%) 6/10

Strengths

The AI translator captures the main ideas and most key information, although some renderings distort the meaning.

Weaknesses

It translates phrases too literally, such as:

“una serie que está brutal” → “a series that is brutal” (should be “an amazing series”)

“¿Quién se apunta?” → “Who signs up?” (should be “Are you in?”)

Literal translations lead to awkward or unnatural English, losing the intended meaning.

Naturalness and fluency (25%) 2/10

Strengths

None in this case.

Weaknesses

The phrasing is stilted and unnatural, e.g.:

“Yesterday. I was watching a series that is brutal.” (fragmented, unnatural pause)

“Then we meet up to have a drink and catch up, which we haven’t seen in a while.” (awkward structure)

Tone and register (20%) 4/10

Strengths

Attempts to keep the informal tone (“Hey. What’s up?”).

Weaknesses

Fails to fully convey the friendly, colloquial style:

Literal translations strip away the casual, inviting feeling.

“Who signs up?” is formal and odd in this context.

The translation is technically informal but misses the warmth and idiomatic nature of the original.

Comprehensibility (15%) 7/10

Strengths

The main message is understandable with effort.

Weaknesses

Unnatural pauses and direct translations make it harder to follow.

Listeners can work out the meaning, but it’s not smooth or easy to process.

Timing and delivery (10%) 3/10

Strengths

Pauses are present.

Weaknesses

Pauses are unnatural and break the flow, making comprehension harder.

The delivery is choppy and distracts from the message.

Business/corporate Content (Score: 7.6/10)

The more structured corporate/business-style content seems to suit the AI better. Although the translation is by no means perfect, it is easily understandable and leaves no room for misinterpretation.

ES Transcript (source text)

Solo una actualización rápida: desde que implementamos los modelos de lenguaje en LATAM, hemos visto mejoras concretas — los tiempos de respuesta bajaron un 12 % y el equipo de soporte ha tenido un 18 % menos de carga. 

Para el próximo trimestre, esperamos subir la eficiencia otro 8–10 %, sobre todo con la ampliación en Bogotá y CDMX. Gracias a todos por el esfuerzo, especialmente a los leads de equipo — el empuje que están dando en el día a día está marcando la diferencia. ¡Sigamos así!

EN Localisation of source text (correct translation)

Just a quick update: since rolling out the language models across LATAM, we’ve seen tangible improvements. Response times are down by 12%, and the support team’s workload has decreased by 18%.

Looking ahead to next quarter, we’re targeting a further 8–10% boost in efficiency, particularly as we expand operations in Bogotá and Mexico City.

A big thank you to everyone for your continued efforts, especially the team leads. Your day-to-day drive is making a real difference. Let’s keep it up!

EN Transcript (Google Speech Translator Output)

Just a quick update since we implemented the language models in LatAm, we’ve seen concrete improvements. Response times fell 12%, and the support team had 18% less load.

For next quarter, we expect to raise efficiency another 8 10%, especially with the extension in Bogota and Mexico City. Thank you all for the effort, especially the team’s leads. The drive they are giving on a day to day basis is making a difference. Let’s keep going.

Analysis

Accuracy of translation (30%) 8/10

Strengths

All key data (percentages, locations, sequence of events) is correctly translated.

No significant omissions or factual errors.

Weaknesses

Some literal translations, e.g., “concrete improvements” for “mejoras concretas,” are not the most idiomatic but do not distort meaning.

For a speech translator, accuracy is strong; the meaning is faithfully preserved, which is critical for understanding spoken business updates.

Naturalness and fluency (25%) 7/10

Strengths

The transcript reflects natural sentence breaks and phrasing typical of spoken language.

Overall, the flow is coherent and logical for a speech context.

Weaknesses

Some phrases are clunky or awkward, such as “extension in Bogota” instead of “expansion,” and “team’s leads” instead of “team leads.”

The AI generally produces plausible spoken phrasing but could improve idiomatic expressions and smoother transitions.

Tone and register (20%) 7/10

Strengths

The tone is appropriately professional and formal for a corporate speech.

Politeness and appreciation are conveyed in a suitable manner.

Weaknesses

Slightly less warmth or motivational nuance than a human speaker might naturally convey.

The tone fits the corporate context well, though a bit more natural variation in emphasis and friendliness would enhance listener engagement.

Comprehensibility (15%) 9/10

Strengths

The live translation is easy to follow and understand with minimal effort.

No ambiguity or confusion arises from the phrasing.

Weaknesses

Minor awkwardness in phrasing does not affect overall comprehension.

The AI’s output supports clear understanding of the spoken message, which is essential for effective communication.

Timing and delivery (10%) 7/10

Strengths

Pauses and sentence breaks generally align well with natural speech patterns.

Weaknesses

Some pauses are slightly misplaced, causing minor disruptions to flow.

Timing is mostly effective but could be refined to better mimic natural speech rhythm and intonation.

Technical Content (Score: 7.05/10)

Overall, a very good effort, aside from the misplaced sentence break at the start.

ES Transcript (source text)

Los modelos de lenguaje son sistemas de inteligencia artificial diseñados para procesar, comprender y generar texto en lenguaje natural. Utilizan arquitecturas basadas en redes neuronales, como los transformadores, entrenadas con grandes volúmenes de datos textuales. Su función principal es predecir la probabilidad de aparición de una palabra o secuencia de palabras, lo que permite realizar tareas como traducción automática, resumen, respuesta a preguntas y redacción de textos. 

Modelos avanzados, como los transformadores generativos preentrenados, incorporan aprendizaje no supervisado y ajustes posteriores, lo que mejora su rendimiento en tareas específicas. Estos modelos han transformado significativamente el procesamiento del lenguaje natural.

EN Localisation of source text (correct translation)

Language models are artificial intelligence systems designed to process, understand, and generate text in natural language. They use neural network architectures — such as transformers — trained on large volumes of textual data. Their core function is to predict the likelihood of a word or sequence of words appearing, enabling tasks such as machine translation, summarisation, question answering, and text generation.

Advanced models, such as generative pre-trained transformers, incorporate unsupervised learning and subsequent fine-tuning, which enhances their performance on specific tasks. These models have significantly transformed the field of natural language processing.

EN Transcript (Google Speech Translator Output)

The language models are artificial intelligence systems designed to process, understand, and generate text. In natural language, they use architectures based on neural networks, such as transformers trained with large volumes of textual data. Its main function is to predict the probability of a word appearance or sequence of words, which allows tasks such as machine translation, summary, answer to questions, and writing of texts. Advanced models such as pre trained generative transformers incorporates unsupervised learning and subsequent adjustments, which improves their performance in specific tasks. These models have significantly transformed the natural language process.

Analysis

Accuracy of translation (30%) 7/10

Strengths

Most technical concepts and terminology are correctly rendered.

Accurately conveys the core functions and features of language models.

Weaknesses

Misplacement of sentence breaks alters the meaning or flow (e.g., splitting “generate text in natural language” into two sentences).

Some minor mistranslations (“natural language process” instead of “natural language processing”).

Generally reliable, but technical precision is occasionally undermined by sentence boundary errors.

Naturalness and fluency (25%) 7/10

Strengths

Some sections, especially towards the end, flow naturally and reflect appropriate spoken intonation.

Weaknesses

The beginning is robotic and fragmented, with unnatural pauses and sentence splits.

Some phrases sound clunky (“writing of texts” instead of “text generation”).

Fluency improves as the transcript progresses, but the start is noticeably stilted for spoken English.

Tone and register (20%) 7/10

Strengths

Maintains a formal, technical register suitable for an explanatory or educational context.

Consistent use of technical vocabulary.

Weaknesses

Slightly monotonous, lacking variation in emphasis that a human technical speaker would use for clarity and engagement.

Appropriate for a technical explanation, but could benefit from more expressive delivery.

Comprehensibility (15%) 8/10

Strengths

The main ideas and technical content are clear and understandable.

No major ambiguity in the technical explanation.

Weaknesses

Sentence break errors at the start may momentarily confuse listeners.

The message is almost entirely clear except for initial confusion with a sentence break.

Timing and delivery (10%) 6/10

Strengths

Good use of intonation at the end of sentences (“…and writing of texts.”).

Delivery becomes more natural as the transcript continues.

Weaknesses

Unnatural pauses and misplaced sentence breaks at the start disrupt the flow.

Mixed performance: delivery is inconsistent, starting off robotic but improving as the segment progresses.

Fast-paced/overlapping Content (Score: 1/10)

The delay makes fast-paced, overlapping dialogue impossible: you really need to wait for the other person to finish speaking before taking over. The speech translation would need to be perfectly simultaneous for this to work. It scores one point for localising the occasional word.

Idiomatic Content (Score: 3.1/10)

Idiomatic expressions are a no-go for the moment, and some of the translations are very strange.

ES Transcript (source text)

Esta mañana me levanté con el pie izquierdo. Primero se me hizo tarde porque no sonó la alarma, luego derramé el café encima de la camisa y, para colmo, perdí el autobús. Ya iba camino al trabajo echando humo, pensando que el día no podía ir peor. Encima, cuando llego al metro, estaba hasta los topes y casi no entro. Para rematarla, me doy cuenta de que dejé el portátil en casa. Mi gozo en un pozo. En serio, hay días en los que es mejor no salir de la cama.

EN Localisation of source text (correct translation)

I woke up on the wrong side of the bed this morning. First, I overslept because my alarm didn’t go off, then I spilled coffee all over my shirt and to top it off, I missed the bus. I was already fuming on the way to work, thinking the day couldn’t get any worse. Then, I get to the tube and it’s absolutely packed, barely managed to squeeze in. And just when I thought I was in the clear… I realise I’ve left my laptop at home. Brilliant. Honestly, some days you’re just better off staying in bed.

EN Transcript (Google Speech Translator Output)

I woke up with my left foot. First it was late for me because the alarm didn’t go off, then I broke my coffee on my shirt. And to top it all off, I missed the bus. I was on my way to work thinking I couldn’t get any worse. On top of that, when I reached the subway, I was almost done and hardly entered. To finish it, I realised that I left the laptop at home.

Analysis

Accuracy of translation (30%) 4/10

Strengths

If the meaning is straightforward, the tool can deliver the message.

Weaknesses

Does not understand idioms. The second half of the text is not translated.

The AI struggles with idiomatic language, leading to loss or distortion of meaning.

Naturalness and fluency (25%) 3/10

Strengths

Some sentences are understandable and have basic structure.

Weaknesses

Pauses are not respected and sentences are broken in the wrong places, so the text is not understandable.

The output lacks the smooth, conversational flow expected in natural speech.

Tone and register (20%) 3/10

Strengths

Attempts to maintain a personal, anecdotal tone.

Weaknesses

Fails to convey the motivation and humour of the original due to poor idiom handling.

The register is inconsistent; some parts sound robotic or odd.

The emotional tone is largely lost, making the story less engaging and relatable.

Comprehensibility (15%) 3/10

Strengths

The general idea of a morning where everything goes wrong is still understandable.

Weaknesses

Confusing or incorrect idiom translations require extra effort to interpret. The last part of the text is omitted.

Comprehension is possible in the first half of the text. The second part is lost.

Timing and delivery (10%) 4/10

Strengths

Some pauses are serviceable when the sentence is more straightforward.

Weaknesses

Some pauses are not respected and sentences are broken in the wrong places, so the text is not understandable.

Delivery is passable but lacks the rhythm and expressiveness of a natural storyteller.

 

EN-ES Speech Translation

Casual/informal Content (Score: 4.55/10)

There is an attempt to preserve the informal tone. The overall context is understandable but the final goal of the text is missed.

EN Transcript (source text)

Hey, how’s it going? I just got back from the gym, and honestly, I’m exhausted. I was thinking of heading to that new café on the corner later, you know, the one with the weird chairs? Fancy meeting there around three-ish? I’ll probably bring my laptop too, might try to get some work done… or at least pretend to.

ES Localisation of source text (correct translation)

Ey, ¿qué tal? Acabo de volver del gimnasio y, la verdad, estoy reventado. Estaba pensando en acercarme a la nueva cafetería de la esquina más tarde. Ya sabes, la que tiene esas sillas raras. ¿Te apetece quedar ahí sobre las tres? Igual me llevo el portátil también, intentaré trabajar un poco… o al menos fingirlo.

ES Transcript (Google Speech Translator Output)

¿Cómo va? Acabo de volver del gimnasio y, honestamente, no estoy muy publicado. Estaba pensando en dirigirme a ese nuevo café en la esquina más tarde. Ya sabes el que tiene una especie… de reunión elegante alrededor de las tres. Probablemente también traeré mi computadora portátil para intentar adivinar.

Analysis

Accuracy of translation (30%) 5/10

Strengths

The AI translator captures the main ideas and most key information, although some renderings distort the meaning.

Weaknesses

The translation contains many mistranslations, which make the output confusing. From the second half of the text onwards, the tool misunderstands several words, which seriously affects the overall meaning. It also omits key parts, such as the word “chairs” and the phrase “might try to get some work done”, and it translates “pretend” as “adivinar” (“guess”) rather than “fingir” (“pretend”).

Naturalness and fluency (25%) 5/10

Strengths

No major strengths.

Weaknesses

The phrasing is not natural because everything is translated word for word. For example, “Estaba pensando en dirigirme a ese nuevo café en la esquina…” is a literal translation of the English wording.

Tone and register (20%) 4/10

Strengths

Attempts to keep the informal tone. For example, “¿Cómo va?”

Weaknesses

Fails to fully convey the friendly, colloquial style.

Literal translations strip away the casual, inviting feeling. There are also many mistranslations that make the text awkward.

Comprehensibility (15%) 4/10

Strengths

The message could be understood overall, but some parts are completely missed due to mistranslations.

“Fancy meeting” > “Reunión elegante”. The invitation to meet is completely missed here.

Weaknesses 

Unnatural pauses, mistranslations and direct translations make it harder to follow.

Listeners can understand some of the content, but the whole point of the text (the invitation to the café) is missed.

Timing and delivery (10%) 4/10

Strengths

Pauses are present.

Weaknesses

Pauses are unnatural and break the flow, making comprehension harder.

Business/corporate Content (Score: 5.55/10)

The tone fits the corporate context well, though a bit more natural variation in emphasis and friendliness would enhance listener engagement. There are no significant omissions for this example.

EN Transcript (source text)

As we move into Q3, our strategic focus remains centred on three core pillars: accelerating growth in emerging markets, driving operational efficiency through automation, and leveraging generative AI to personalise customer journeys. We’ve already seen a 12% uplift in retention through the pilot programme in LATAM, and we’re now scaling that across EMEA. I’d encourage all team leads to review the rollout plan in detail and flag any potential resourcing gaps by next Friday.

ES Localisation of source text (correct translation)

A medida que avanzamos al tercer trimestre, nuestro enfoque estratégico sigue centrado en tres pilares principales: acelerar el crecimiento en los mercados emergentes, mejorar la eficiencia operativa a través de la automatización, y aprovechar la IA generativa para personalizar los viajes de los clientes. Ya hemos visto un aumento del 12% en la retención gracias al programa piloto en LATAM, y ahora lo estamos escalando en EMEA. Animo a todos los líderes de equipo a que revisen el plan de implementación en detalle y a que informen de cualquier brecha de recursos posible antes del próximo viernes.

ES Transcript (Google Speech Translator Output)

A medida que avanzamos hacia el tercer trimestre nuestro enfoque estratégico permanece centrado en tres pilares centrales, lo que acelera el crecimiento de los mercados emergentes, impulsa la eficiencia operativa a través de la automatización y aprovecha la IA generativa para personalizar los recorridos de los clientes. Ya hemos visto un 12% de atención a través de la programación pirata en LATAM y ahora estamos escalando eso en EMEA. Animaría a todos los clientes potenciales del equipo a que realicen los detalles del plan de implementación y marquen cualquier brecha de recursos potenciales para el próximo viernes.

Analysis

Accuracy of translation (30%) 4/10

Strengths

All key data (percentages, locations, sequence of events) is correctly translated.

No significant omissions.

Weaknesses

Mistranslates the core pillars as consequences, not key points to focus on: “As we move into Q3, our strategic focus remains centred on three core pillars, which accelerates…”.

Translates “retention” as “atención” (“attention”), “pilot programme” as “programación pirata” (“pirate programming”), and “team leads” as “clientes potenciales” (“potential customers”).

Some key words are missed, which makes the overall message difficult to understand fully.

Naturalness and fluency (25%) 6/10

Strengths

Sentence breaks are clear. Terminology used is appropriate for this kind of text.

Weaknesses

Everything is a direct translation, so the outcome is not idiomatic.

The AI generally produces plausible spoken phrasing but could improve idiomatic expressions and smoother transitions.

Tone and register (20%) 7/10

Strengths

The tone is appropriately professional and formal for a corporate speech.

Politeness and appreciation are conveyed in a suitable manner.

Weaknesses

Slightly less warmth or motivational nuance than a human speaker might naturally convey.

The tone fits the corporate context well, though a bit more natural variation in emphasis and friendliness would enhance listener engagement.

Comprehensibility (15%) 7/10

Strengths

The live translation is easy to follow and understand with some effort.

Weaknesses

Some awkwardness in the structures used and some pauses that can lead to misunderstanding. Ambiguity in the first sentence, since the core pillars are translated as consequences. Some mistranslations complicate comprehensibility (pilot > pirate).

The AI’s output supports a mostly clear understanding of the spoken message. However, the speaker must speak clearly and make sure the pauses are emphasised.

Timing and delivery (10%) 7/10

Strengths

Pauses and sentence breaks generally align well with natural speech patterns.

Weaknesses

Pauses were not made correctly in a couple of instances, so the message was more difficult to understand. Some delays were experienced.

Timing is mostly effective but could be refined to better mimic natural speech rhythm and intonation.

Technical Content (Score: 3.4/10)

This section has lots of potential but, in this case, the second part of the text gets lost. Because the terminology is technical, the literal translation still sounds reasonably idiomatic.

EN Transcript (source text)

Large Language Models have made huge strides in recent years, particularly in generating human-like text and handling complex tasks like summarisation, translation, and even reasoning. But with that power comes growing attention to something more nuanced: bias.

Because these models are trained on vast amounts of online content — much of it scraped from the open web — they inevitably reflect the patterns and prejudices present in that data. That could mean reinforcing gender stereotypes, favouring certain dialects over others, or even giving more weight to dominant cultural viewpoints in a way that doesn’t reflect the diversity of a global audience.

ES Localisation of source text (correct translation)

Los Modelos de lenguaje grandes han avanzado mucho en los últimos años, particularmente en la generación de texto similar al humano y llevando a cabo tareas complejas como resumen, traducción, e incluso, razonamiento. Pero con ese poder, es necesaria una atención mayor a un matiz: el sesgo.

Debido a que estos modelos se entrenan con grandes cantidades de contenido en línea, gran parte de él extraído de la web abierta, inevitablemente reflejan patrones y prejuicios presentes en esos datos. 

Esto puede suponer un refuerzo de los estereotipos de género, el favorecer algunos dialectos por encima de otros, o incluso aportar más peso a puntos de vista culturalmente dominantes de manera que no se refleje la diversidad de la audiencia global.

ES Transcript (Google Speech Translator Output)

Los Modelos de lenguaje grandes han hecho grandes avances en los últimos años, particularmente en la generación de textos similar a un humano y en el manejo de tareas complejas como resumen, traducción e incluso razonamiento. Así que, con eso, necesitamos decidir más sesgo.

Debido a que estos sesgos no se capacitan en cantidades de contenido en línea. Gran parte de esto es genial de la web abierta, las referencias y prejuicios inevitables presentes en esos datos. Eso podría significar reforzar los estereotipos de género, favorecer los conjuntos y dialectos sobre otros, o incluso aumentar más el peso de la opinión cultural dominante de una manera que no refleje la diversidad de la audiencia global.

Analysis

Accuracy of translation (30%) 3/10

Strengths

Most technical concepts and terminology are correctly rendered.

Accurately conveys the core functions and features of language models.

Weaknesses

Translates “models” as “sesgos” (“biases”). It also fails to recognise the parenthetical “much of it scraped from the open web”, rendering it as something like “much of it is great from the open web”. As a result, the meaning of the second part of the text is lost.

This section has lots of potential, but the second half of the text gets lost.

Naturalness and fluency (25%) 3/10

Strengths

Some sections, especially at the beginning, flow naturally and reflect appropriate spoken intonation.

Weaknesses

The voice suddenly cuts out and starts mumbling words, which seriously affects how understandable the text is.

Since the terminology is technical, the translation is fairly faithful, but the tool issues make the text impossible to understand.

Tone and register (20%) 6/10

Strengths

Maintains a formal, technical register suitable for an explanatory or educational context.

Consistent use of technical vocabulary.

Weaknesses

Slightly monotonous, lacking variation in emphasis that a human technical speaker would use for clarity and engagement.

Appropriate for a technical explanation, but could benefit from more expressive delivery.

Comprehensibility (15%) 3/10

Strengths

Technical terminology is well conveyed. The first section of the text has lots of potential, but the second half of the text gets lost.

Weaknesses 

The text as a whole cannot be fully understood.

Listeners can follow the first half of the text, but the rest is lost.

Timing and delivery (10%) 4/10

Strengths

Good use of intonation and delivery at the beginning.

Weaknesses

Unnatural pauses and lack of understandability on the tool’s side.

Delivery is inconsistent. The first half is good quality but gets worse as the text progresses.

Fast-paced/overlapping Content (Score: 1/10)

It managed a direct translation of a few words; however, because of the delay, fast-paced dialogue is simply not possible. The tool also suddenly stopped translating halfway through.

Idiomatic Content (Score: 3.1/10)

The AI struggles with idiomatic language, leading to loss or distortion of meaning.

EN Transcript (source text)

To be honest, this quarter hasn’t exactly been smooth sailing. We knew there’d be a few bumps in the road, but some of the challenges caught us off guard. That said, the team really stepped up, we rolled with the punches, stayed focused, and kept moving forward.

There were moments when we had to think on our feet and change direction quickly, but in the end, we delivered. It just goes to show that when everyone pulls their weight and keeps their eye on the ball, we can get through even the trickiest situations.

ES Localisation of source text (correct translation)

Para ser honestos, este trimestre no ha sido coser y cantar. Sabíamos que nos encontraríamos con algunos baches, pero algunos retos nos han pillado desprevenidos. Dicho esto, el equipo se puso las pilas, aguantamos el tirón, nos mantuvimos centrados y seguimos avanzando.

En algunos momentos teníamos que pensar sobre la marcha y cambiar de dirección rápido, pero al final, lo conseguimos. Esto solo demuestra que cuando todo el mundo arrima el hombro y mantiene su objetivo claro, podemos sortear hasta las situaciones más complicadas.

ES Transcript (Google Speech Translator Output)

Este trimestre no ha sido exactamente adecuado para ustedes. Sabíamos que había algunos en el camino, pero algunos de los desafíos son cuartos de salida. Dicho esto, el equipo realmente dio un paso adelante, rodamos con los golpes, nos mantuvimos concentrados y seguimos avanzando. Hubo momentos en los que tuvimos que pensar rápidamente y cambiar. Vamos a hacer una pelota y situaciones.

Analysis

Accuracy of translation (30%) 4/10

Strengths

If the meaning is straightforward, the tool can deliver the message. For example: “Dicho esto, el equipo realmente dio un paso adelante, rodamos con los golpes, nos mantuvimos concentrados y seguimos avanzando.”

Weaknesses

The tool does not understand idioms, so their meaning is lost or distorted.

The second half of the text is not translated at all; the output is only a string of garbled words.

Naturalness and fluency (25%) 3/10

Strengths

Some sentences are understandable if they have basic structure.

Weaknesses

Pauses are not respected and sentences are broken in the wrong places.

The output lacks the smooth, conversational flow expected in natural speech.

Tone and register (20%) 3/10

Strengths

Attempts to maintain a personal, anecdotal tone.

Weaknesses

Fails to convey the motivation and humour of the original due to poor idiom handling.

The register is inconsistent; some parts sound robotic or odd.

The emotional tone is largely lost, making the message less engaging and relatable.

Comprehensibility (15%) 3/10

Strengths

The general idea of overcoming a difficult situation is still understandable.

Weaknesses 

Confusing or incorrect idiom translations require extra effort to interpret.

The last part of the text is omitted.

Timing and delivery (10%) 4/10

Strengths

Some pauses are serviceable when the sentence is more straightforward.

Weaknesses

Some pauses are not respected and sentences are broken in the wrong places, making the text difficult to understand.

Delivery is passable but lacks the rhythm and expressiveness of a natural storyteller.
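For readers who want to check how the five criterion scores combine, here is a minimal sketch in Python. It assumes the overall score is a plain weighted average of the criterion scores using the percentages listed above; the function and variable names are illustrative and not part of any published methodology. Under that assumption the idiomatic-content figures work out to 3.4, a little above the 3.1 shown in the section heading, so the published figure may have been aggregated or rounded differently.

```python
# Weights (in percent) and scores (out of 10) copied from the analysis above.
# Assumption: the overall score is a simple weighted average of these values.
CRITERIA = {
    "accuracy": (30, 4),
    "naturalness_and_fluency": (25, 3),
    "tone_and_register": (20, 3),
    "comprehensibility": (15, 3),
    "timing_and_delivery": (10, 4),
}


def weighted_score(criteria):
    """Return the weighted average of (weight, score) pairs."""
    total_weight = sum(weight for weight, _ in criteria.values())
    weighted_sum = sum(weight * score for weight, score in criteria.values())
    return weighted_sum / total_weight


overall = weighted_score(CRITERIA)
print(f"{overall:.1f}/10")  # prints "3.4/10"
```

Keeping the weights as whole percentages avoids floating-point rounding noise, and dividing by the total weight means the function still works if the weights are later changed and no longer sum to 100.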

 

Conclusion

Google’s real-time English-Spanish and Spanish-English voice translation tool shows strong promise, especially in its ability to replicate speaker tone and enable real-time multilingual conversations. The tool is easy to use and integrates smoothly into Google Meet, lowering the barrier for international and cross-lingual collaboration.

Performance Overview

The translator currently performs best in structured, formal contexts, such as business meetings and technical discussions, where vocabulary is predictable and syntax is more rigid. In these scenarios, Spanish-to-English (ES→EN) translation is noticeably more accurate and natural than English-to-Spanish (EN→ES). The tool reliably preserves key information and main ideas, though some literal phrasing and awkwardness remain.

Challenges and Limitations

Speaker Identification: The tool occasionally misidentifies the active speaker, sometimes picking up voices from nearby conversations (such as someone in another phone booth), which can introduce errors or confusion.

Translation Failures: There were instances where the tool failed to translate altogether, requiring the speaker to repeat their message from the beginning.

Speech Clarity: For best results, speakers must articulate clearly, pronounce each syllable, and use deliberate pauses. Stuttering, unclear enunciation, or overlapping speech can confuse the system and degrade translation quality.

Informal & Fast-Paced Speech: The tool struggles in informal, idiomatic, or fast-paced conversations, where nuance and natural rhythm are essential. Literal translations and choppy delivery can make the output awkward or even incomprehensible.

Ease of Use

Despite these challenges, the tool is user-friendly and accessible, making it practical for most business and formal communication needs.

Future Outlook

To become a truly reliable and inclusive bridge for global communication, Google should focus on improving:

Linguistic nuance and idiomatic understanding

Real-time processing speed and speaker identification

Contextual adaptation for informal and dynamic conversations

With these enhancements, Google’s voice translation tool could mature into a vital platform for seamless, natural communication across languages and cultures. For now, it is best suited for structured, formal exchanges, especially when translating from Spanish to English, while casual, idiomatic, or rapid conversations remain a challenge.
