Recapping the Past Week in AI
While AI gains broader industry acceptance, including its incorporation into the 2024 Paris Olympic Games, notable tech leaders have announced new features, general updates, and details on the future of governmental oversight and stewardship.
ChatGPT Advanced Voice
OpenAI released several updates, including to a feature previously surrounded by controversy after its "Sky" voice was said to resemble an American actress, prompting the threat of legal action. ChatGPT Plus subscribers can now access Advanced Voice on the Apple and Android apps. The feature processes language more naturally and in real time, responding to non-verbal and emotional cues. Trained on 45 languages and tested by 100 red team members from 29 countries, Advanced Voice offers four preset voices. OpenAI plans to add video and screen sharing to the feature for additional versatility.
ChatGPT-4o Long Output
LLMs are constrained by tokens, the chunks of text (words, sub-words, punctuation, and even trailing spaces) that a model reads and writes. Tokenization caps how much text can fit in a prompt and limits the length of the model's response. At the moment, ChatGPT-4o can handle a total context of 128K tokens but limits its responses to 4K tokens, meaning a prompt can be up to 124K tokens long while still leaving room for the longest possible generation. ChatGPT-4o Long Output has been released to select members for alpha testing; it raises the response limit 16-fold, to 64K tokens per request. Longer outputs will reshape our conversational interactions with LLMs, affecting long-form writing, content and code generation, and even the interpretation of lengthy documents.
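The budget arithmetic above can be sketched in a few lines. The constants mirror the figures cited here for illustration; this is not an official OpenAI API, just a sketch of how a fixed context window is shared between prompt and response.

```python
# Illustrative token-budget arithmetic for a fixed context window.
# Figures mirror those cited above; names here are hypothetical.

CONTEXT_WINDOW = 128_000   # total tokens shared by prompt + response
STANDARD_OUTPUT = 4_000    # standard response cap
LONG_OUTPUT = 64_000       # Long Output alpha response cap

def max_prompt_tokens(context_window: int, output_limit: int) -> int:
    """Largest prompt that still leaves room for a full-length response."""
    return context_window - output_limit

print(max_prompt_tokens(CONTEXT_WINDOW, STANDARD_OUTPUT))  # 124000
print(max_prompt_tokens(CONTEXT_WINDOW, LONG_OUTPUT))      # 64000
print(LONG_OUTPUT // STANDARD_OUTPUT)                      # 16
```

Note that because the context window is shared, reserving a 64K-token response shrinks the maximum prompt to 64K tokens as well.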
Federal Oversight
After companies released AI models to the public, concerns grew about the pace at which the technology was advancing. These concerns spurred the development of an AI Bill of Rights and prompted multiple Senate testimonies. OpenAI has endorsed several Senate bills, including the Future of AI Innovation Act, which would formally recognize the United States AI Safety Institute. The Institute would provide oversight and establish guidelines and regulations for AI development. Sam Altman confirmed that OpenAI would give the Institute early access to new models for research and safety testing under NIST standards. Anna Makanju, OpenAI's Vice President of Global Affairs, reiterated that the company has been a vocal proponent of developing safety standards, saying it believes government should play a role in ensuring that AI is accessible and safe for everyone and in coordinating that work globally.
Apple Intelligence Release Delayed
Earlier this summer, at its Worldwide Developers Conference, Apple announced a landmark partnership pairing Apple Intelligence with OpenAI and described how it intends to integrate AI into its products. These integrations include a suite of writing tools, an Image Playground for image generation, and upgrades to the voice assistant Siri.
Apple, the largest seller of consumer electronics, will delay the rollout of these AI features with the iPhone 16, opting for a gradual, safety-focused approach. Software developers will get first access for early beta testing on iOS 18.1 and iPadOS 18.1 so Apple can address any issues with the new system. Although delayed, consumers can expect the full suite of updates to arrive between the end of 2024 and the first half of 2025.
Gemini Updates
On December 6th, 2023, Google entered the AI race as a competitor to ChatGPT, announcing its multimodal model, Gemini. The DeepMind team has continued to advance its systems' capabilities, most recently announcing an upgraded version of Gemini 1.5 Pro, available in Google AI Studio. This latest model has surpassed ChatGPT, according to LMSYS, an open research project founded by members of UC Berkeley SkyLab to evaluate and compare LLMs.
Google is also expanding its AI products into home life by incorporating Gemini into Google Home products and Nest cameras. With the updated multimodal models, these security cameras can process video, text, and images, enabling more precise analysis of footage for security or locating objects with a voice prompt. The Home app now generates smart-home automations through a 'Help me' feature: users type or speak a prompt, and Gemini creates the automation.