Fascinating. I've always wanted to take part in one of these expeditions to record an endangered language. It's great the RPi is being used for so many things.
Yes, a feedback loop could result in nonsense data being added to a corpus. This is one of the reasons it's important to treat these sorts of technologies not as a replacement for a language speaker in a learning context, but as support for a person who knows the language. It's a concern even with keyboards for Indigenous languages: how much can people misusing a keyboard (intentionally or otherwise) influence the language?