My assumption is that Gemini has no insight into the time stamps, and instead is ballparking it based on how much context has been analyzed up to that point.

I wonder: if you put the audio into a video that's nothing but a black screen with a running timer, would it be able to timestamp correctly?



The Gemini documentation specifically mentions timestamp awareness here: https://ai.google.dev/gemini-api/docs/audio


Per the docs, Gemini represents each second of audio as 32 tokens. Since that rate is fixed, as long as the model is trained to understand the relation between timestamps and the number of tokens (which, per Simon's link, it is), it should be able to infer the correct number of seconds.
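The arithmetic behind that inference is simple enough to sketch. A minimal example, assuming only the documented rate of 32 audio tokens per second (the helper names here are illustrative, not part of any API):

```python
# Sketch: mapping audio-token offsets back to timestamps, assuming
# Gemini's documented rate of 32 tokens per second of audio.
TOKENS_PER_SECOND = 32  # per https://ai.google.dev/gemini-api/docs/audio

def token_index_to_timestamp(token_index: int) -> str:
    """Convert an audio-token offset into an MM:SS timestamp."""
    total_seconds = token_index // TOKENS_PER_SECOND
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes:02d}:{seconds:02d}"

def audio_token_count(duration_seconds: float) -> int:
    """How many tokens a clip of the given length consumes."""
    return int(duration_seconds * TOKENS_PER_SECOND)

print(token_index_to_timestamp(4160))  # token 4160 -> 02:10
print(audio_token_count(60))           # one minute of audio -> 1920 tokens
```

So a model that learns the token-count-to-timestamp correspondence during training needs no explicit clock in the input; the position in the token stream already encodes it.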



