OpenAI is adding voice conversations and image input to its chatbot, ChatGPT.
- Voice Interactions: ChatGPT will soon support voice conversations on Android and iOS platforms.
- Image-based Queries: Users will be able to show images to ChatGPT on all platforms and ask questions about them.
- Availability: Plus and Enterprise users will be the first to access these features, with a broader rollout planned for the future.
- Safety Measures: OpenAI acknowledges potential misuse and has implemented safety protocols to mitigate risks.
- Collaboration with Spotify: Spotify is piloting a Voice Translation tool for podcasts that uses the voice technology behind ChatGPT.
ChatGPT’s new voice feature lets users hold spoken conversations on mobile platforms. Users can turn it on in the ChatGPT app settings, then tap the microphone icon and choose from five voices, each created with the help of professional actors. The voices are generated by a new OpenAI text-to-speech model that can produce lifelike audio from text and a brief sample of speech. In the other direction, OpenAI’s Whisper speech recognition system transcribes the user’s spoken words into text.
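For readers who want to picture how these pieces fit together, the following is a minimal sketch of one voice turn as a three-stage pipeline: speech recognition, a text reply, then speech synthesis. It is an illustration only, not OpenAI’s implementation. Every name in it (`run_voice_turn`, `VoiceTurn`, the `fake_*` stand-ins, and the voice label `"voice-1"`) is hypothetical, and the stand-in functions do no real audio processing.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; none of these are real OpenAI APIs.
Transcriber = Callable[[bytes], str]       # speech audio -> text (the role Whisper plays)
Responder = Callable[[str], str]           # user text -> reply text (the chat model's role)
Synthesizer = Callable[[str, str], bytes]  # (reply text, voice) -> audio (text-to-speech role)


@dataclass
class VoiceTurn:
    transcript: str
    reply_text: str
    reply_audio: bytes


def run_voice_turn(
    audio: bytes,
    transcribe: Transcriber,
    respond: Responder,
    synthesize: Synthesizer,
    voice: str,
) -> VoiceTurn:
    """Run one spoken exchange: transcribe, generate a reply, then voice it."""
    transcript = transcribe(audio)
    reply_text = respond(transcript)
    reply_audio = synthesize(reply_text, voice)
    return VoiceTurn(transcript, reply_text, reply_audio)


# Stand-in stages so the sketch runs without any model or network access.
def fake_transcribe(audio: bytes) -> str:
    # Pretends the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str, voice: str) -> bytes:
    # Tags the text with the chosen voice instead of producing real audio.
    return f"[{voice}] {text}".encode("utf-8")


if __name__ == "__main__":
    turn = run_voice_turn(
        b"hello", fake_transcribe, fake_respond, fake_synthesize, "voice-1"
    )
    print(turn.transcript)
    print(turn.reply_text)
```

Passing each stage in as a function keeps the sketch independent of any particular model: a real recognizer or synthesizer could replace the stand-ins without changing `run_voice_turn`.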
With image input, users can show ChatGPT a photograph and ask for help with a range of tasks, from troubleshooting a grill to planning a meal based on the contents of a fridge. The feature also works for math problems captured in a photo. Image understanding is powered by GPT-3.5 and GPT-4. To use the feature, users tap the photo button, select or capture an image, and, if needed, point ChatGPT to a specific part of the image.
In its announcement, OpenAI acknowledged that these capabilities could be misused. The ability to mimic voices raises concerns about impersonation and fraud. As a precaution, OpenAI is limiting the voice technology to ChatGPT conversations and to specific applications developed with select partners. For image input, OpenAI is working with Be My Eyes, an app that assists visually impaired people, to help users better understand their surroundings. To protect privacy, OpenAI has also restricted ChatGPT’s analysis of people in images.
ChatGPT reads English text in images well, but its proficiency in other languages is limited, especially for languages written in non-Roman scripts. OpenAI recommends that non-English users exercise caution when asking ChatGPT to read text in images.
Spotify is among the partners using the voice technology behind ChatGPT. The music streaming company is piloting a “Voice Translation” tool for podcasters that translates podcasts into other languages while retaining the original speaker’s voice characteristics. Initially, select English-language podcasts will be translated into Spanish, with French and German versions to follow.