Great question.
I don't have an answer to this question, but I'd like to compare it to a similar question I've been looking at in a different tech sector: using voice control instead of text and typing to control our electronics.
It comes down to this: did we need it? What unmet need are we trying to fill by introducing and implementing this new technology and process?
I think the use case for voice/speech control is more obvious than the one for choose-your-own-adventure cinema, but at the core of the tech issue, the question is the same.
What problem are we solving?