My assumption is that Gemini has no insight into the time stamps, and instead is ballparking it based on how much context has been analyzed up to that point.

I wonder: if you put the audio into a video that's nothing but a black screen with a running timer, would it be able to timestamp correctly?



The Gemini documentation specifically mentions timestamp awareness here: https://ai.google.dev/gemini-api/docs/audio


Per the docs, Gemini represents each second of audio as 32 tokens. Since that rate is fixed, as long as the model is trained to understand the relation between timestamps and the number of tokens (which, per Simon's link, it is), it should be able to infer the correct number of seconds.
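The arithmetic behind that inference is simple enough to sketch. A minimal example, assuming only the documented rate of 32 audio tokens per second (the helper names here are illustrative, not part of any API):

```python
# Sketch: mapping audio-token offsets back to timestamps, assuming
# Gemini's documented rate of 32 tokens per second of audio.
TOKENS_PER_SECOND = 32  # per https://ai.google.dev/gemini-api/docs/audio

def token_index_to_timestamp(token_index: int) -> str:
    """Convert an audio-token offset into an MM:SS timestamp."""
    total_seconds = token_index // TOKENS_PER_SECOND
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes:02d}:{seconds:02d}"

def audio_token_count(duration_seconds: float) -> int:
    """How many tokens a clip of the given length consumes."""
    return int(duration_seconds * TOKENS_PER_SECOND)

print(token_index_to_timestamp(4160))  # token 4160 -> 02:10
print(audio_token_count(60))           # one minute of audio -> 1920 tokens
```

So a model that learns the token-count-to-timestamp correspondence during training needs no explicit clock in the input; the position in the token stream already encodes it.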



