- Sora's training data source questioned, sparking transparency concerns.
- YouTube CEO warns against using platform videos for Sora's training.
- Transparency crucial as AI tools like Sora become more accessible.
OpenAI's text-to-video generator, Sora, is facing criticism over the source of its training data. Sora is the AI company's latest entry in the AI space.
Its ability to generate video clips from text input has amazed onlookers. Yet new reports have raised doubts about the enormous dataset OpenAI used to train the innovative AI model.
Debate Over OpenAI Training Data
OpenAI has not revealed the exact source of Sora's training data, and concern grew when the company's Chief Technology Officer, Mira Murati, admitted to not knowing the source during an interview with the Wall Street Journal. This lack of transparency has become a major issue, and the tech industry is now skeptical about whether data from repositories such as YouTube was used.
Neal Mohan, CEO of Google-owned YouTube, issued a caution to OpenAI. In an interview with Emily Chang on Bloomberg Originals, Mohan said that using YouTube videos to train AI models like Sora would violate YouTube's terms of service.
He also stressed that Google honors its signed contracts with YouTube creators when using their content to build its own AI models, such as the multimodal Gemini. OpenAI, however, has not issued an official response to this concern.
Transparency about the data used to train AI models, and where that data comes from, is a top priority, especially as AI tools like Sora become more widely available to the public. In the coming months, this discussion is likely to broaden as both the hazards and the benefits of AI-assisted video generation come to light.