Download the AI & Machine Learning For Dummies App: iOS - Android
AI Innovations in March 2024.
Welcome to the March 2024 edition of the Daily Chronicle, your gateway to the forefront of Artificial Intelligence innovation! Embark on a captivating journey with us as we unveil the most recent advancements, trends, and revolutionary discoveries in the realm of artificial intelligence. Delve into a world where industry giants converge at events like ‘AI Innovations at Work’ and where visionary forecasts shape the future landscape of AI. Stay abreast of daily updates as we navigate through the dynamic realm of AI, unraveling its potential impact and exploring cutting-edge developments throughout this enthralling month. Join us on this exhilarating expedition into the boundless possibilities of AI in March 2024.
Experience the transformative capabilities of AI with “Read Aloud For Me – AI Dashboard” – your ultimate AI Dashboard and Hub. Seamlessly access a comprehensive suite of top-tier AI tools within a single app, meticulously crafted to enhance your efficiency and streamline your digital interactions. Now available on the web at readaloudforme.com and across popular app platforms including Apple, Google, and Microsoft, “Read Aloud For Me – AI Dashboard” places the future of AI at your fingertips, blending convenience with cutting-edge innovation. Whether for professional endeavors, educational pursuits, or personal enrichment, our app serves as your portal to the forefront of AI technologies. Embrace the future today by downloading our app and revolutionize your engagement with AI tools.
A daily chronicle of AI Innovations: March 31st 2024: Generative AI develops potential new drugs for antibiotic-resistant bacteria; South Korean ‘artificial sun’ hits record 100M degrees for 100 seconds; Summary of the key points about OpenAI’s relationship with Dubai and the UAE; Deepmind did not originally see LLMs and the transformer as a path to AGI. Fascinating article.
Generative AI develops potential new drugs for antibiotic-resistant bacteria
Stanford Medicine researchers devise a new artificial intelligence model, SyntheMol, which creates recipes for chemists to synthesize the drugs in the lab.
With nearly 5 million deaths linked to antibiotic resistance globally every year, new ways to combat resistant bacterial strains are urgently needed.
Researchers at Stanford Medicine and McMaster University are tackling this problem with generative artificial intelligence. A new model, dubbed SyntheMol (for synthesizing molecules), created structures and chemical recipes for six novel drugs aimed at killing resistant strains of Acinetobacter baumannii, one of the leading pathogens responsible for antibacterial resistance-related deaths.
The researchers described their model and experimental validation of these new compounds in a study published March 22 in the journal Nature Machine Intelligence.
“There’s a huge public health need to develop new antibiotics quickly,” said James Zou, PhD, an associate professor of biomedical data science and co-senior author on the study. “Our hypothesis was that there are a lot of potential molecules out there that could be effective drugs, but we haven’t made or tested them yet. That’s why we wanted to use AI to design entirely new molecules that have never been seen in nature.”
South Korean ‘artificial sun’ hits record 100M degrees for 100 seconds
For the first time, the Korea Institute of Fusion Energy’s (KFE) Korea Superconducting Tokamak Advanced Research (KSTAR) fusion reactor has reached temperatures seven times that of the Sun’s core.
Achieved during testing between December 2023 and February 2024, this sets a new record for the fusion reactor project.
KSTAR, the researchers behind the reactor report, managed to maintain temperatures of 212 degrees Fahrenheit (100 million degrees Celsius) for 48 seconds. For reference, the temperature of the core of our Sun is 27 million degrees Fahrenheit (15 million degrees Celsius).
Gemini 1.5 Pro on Vertex AI is available for everyone as an experimental release
I think this one has flown under the radar: Gemini 1.5 Pro is available as Experimental on Vertex AI, for everyone, UI only for now (no API yet). In us-central1.
“Summary of the key points about OpenAI’s relationship with Dubai and the UAE”
OpenAI’s Partnership with G42
In October 2023, G42, a leading UAE-based technology holding group, announced a partnership with OpenAI to deliver advanced AI solutions to the UAE and regional markets.
The partnership will focus on leveraging OpenAI’s generative AI models in domains where G42 has deep expertise, including financial services, energy, healthcare, and public services.
G42 will prioritize its substantial AI infrastructure capacity to support OpenAI’s local and regional inferencing on Microsoft Azure data centers.
Sam Altman, CEO of OpenAI, stated that the collaboration with G42 aims to empower businesses and communities with effective solutions that resonate with the nuances of the region.
Altman’s Vision for the UAE as an AI Sandbox
During a virtual appearance at the World Governments Summit, Altman suggested that the UAE could serve as the world’s “regulatory sandbox” to test AI technologies and later spearhead global rules limiting their use.
Altman believes the UAE is well-positioned to be a leader in discussions about unified global policies to rein in future advances in AI.
The UAE has invested heavily in AI and made it a key policy consideration.
Altman’s Pursuit of Trillions in Funding for AI Chip Manufacturing
Altman is reportedly in talks with investors, including the UAE, to raise $5-7 trillion for AI chip manufacturing to address the scarcity of GPUs crucial for training and running large language models.
As part of the talks, Altman is pitching a partnership between OpenAI, various investors, chip makers, and power providers to build chip foundries that would be run by existing chip makers, with OpenAI agreeing to be a significant customer.
In summary, OpenAI’s partnership with G42 aims to expand AI capabilities in the UAE and the Middle East, with Altman envisioning the UAE as a potential global AI sandbox.
It’s a very long article so I’ll post the relevant snippets. But basically it seems that Google was late to the LLM game because Demis Hassabis was 100% focused on AGI and did not see LLM’s as a path toward AGI. Perhaps now he sees it as a potential path, but it’s probably possible that he is just now focusing on LLM’s so that Google does not get too far behind in the generative AI race. But his ultimate goal and obsession is to create AGI that can solve real problems like diseases.
Within DeepMind, generative models weren’t taken seriously enough, according to those inside, perhaps because they didn’t align with Hassabis’s AGI priority, and weren’t close to reinforcement learning. Whatever the rationale, DeepMind fell behind in a key area.”
“‘We’ve always had amazing frontier work on self-supervised and deep learning,’ Hassabis tells me. ‘But maybe the engineering and scaling component — that we could’ve done harder and earlier. And obviously we’re doing that completely now.'”
“Kulkarni, the ex-DeepMind engineer, believes generative models were not respected at the time across the AI field, and simply hadn’t show enough promise to merit investment. ‘Someone taking the counter-bet had to pursue that path,’ he says. ‘That’s what OpenAI did.'”
“Ironically, a breakthrough within Google — called the transformer model — led to the real leap. OpenAI used transformers to build its GPT models, which eventually powered ChatGPT. Its generative ‘large language’ models employed a form of training called “self-supervised learning,” focused on predicting patterns, and not understanding their environments, as AlphaGo did. OpenAI’s generative models were clueless about the physical world they inhabited, making them a dubious path toward human level intelligence, but would still become extremely powerful.”
As DeepMind rejoiced, a serious challenge brewed beneath its nose. Elon Musk and Sam Altman founded OpenAI in 2015, and despite plenty of internal drama, the organization began working on text generation.”
“As OpenAI worked on the counterbet, DeepMind and its AI research counterpart within Google, Google Brain, struggled to communicate. Multiple ex-DeepMind employees tell me their division had a sense of superiority. And it also worked to wall itself off from the Google mothership, perhaps because Google’s product focus could distract from the broader AGI aims. Or perhaps because of simple tribalism. Either way, after inventing the transformer model, Google’s two AI teams didn’t immediately capitalize on it.”
“‘I got in trouble for collaborating on a paper with a Brain because the thought was like, well, why would you collaborate with Brain?’ says one ex-DeepMind engineer. ‘Why wouldn’t you just work within DeepMind itself?'”
“Then, a few months later, OpenAI released ChatGPT.” “At first, ChatGPT was a curiosity. The OpenAI chatbot showed up on the scene in late 2022 and publications tried to wrap their heads around its significance. […] Within Google, the product felt familiar to LaMDA, a generative AI chatbot the company had run internally — and even convinced one employee it was sentient — but never released. When ChatGPT became the fastest growing consumer product in history, and seemed like it could be useful for search queries, Google realized it had a problem on its hands.”
OpenAI reveals Voice Engine, but won’t yet publicly release the risky AI voice-cloning technology
OpenAI has released VoiceEngine, a voice-cloning tool. The company claims that it can recreate a person’s voice with just 15 seconds of recording of that person talking.
Demis Hassabis, CEO and one of three founders of Google’s artificial intelligence (AI) subsidiary DeepMind, has been awarded a knighthood in the U.K. for “services to artificial intelligence.” [Source]
A daily chronicle of AI Innovations: March 30th, 2024: Microsoft and OpenAI to build $100 billion AI supercomputer ‘Stargate’; OpenAI unveils voice-cloning tool; Amazon’s AI team faces pressure to outperform Anthropic’s Claude models by mid-year; Microsoft Copilot has been blocked on all Congress-owned devices
Microsoft and OpenAI to build $100 billion AI supercomputer ‘Stargate’
OpenAI and Microsoft are working on a $100 billion project to build an AI supercomputer named ‘Stargate’ in the U.S.
The supercomputer will house millions of GPUs and could cost over $115 billion.
Stargate is part of a series of datacenter projects planned by the two companies, with the goal of having it operational by 2028.
Microsoft will fund the datacenter, which is expected to be 100 times more costly than current operating centers.
The supercomputer is being built in phases, with Stargate being a phase 5 system.
Challenges include designing novel cooling systems and considering alternative power sources like nuclear energy.
OpenAI aims to move away from Nvidia’s technology and use Ethernet cables instead of InfiniBand cables.
Details about the location and structure of the supercomputer are still being finalized.
Both companies are investing heavily in AI infrastructure to advance the capabilities of AI technology.
Microsoft’s partnership with OpenAI is expected to deepen with the development of projects like Stargate.
Microsoft and OpenAI are reportedly collaborating on a significant project to create a U.S.-based datacenter for an AI supercomputer named “Stargate,” estimated to cost over $115 billion and utilize millions of GPUs.
The supercomputer aims to be the largest among the datacenters planned by the two companies within the next six years, with Microsoft covering the costs and aiming for a launch by 2028.
The project, considered to be in phase 5 of development, requires innovative solutions for power, cooling, and hardware efficiency, including a possible shift away from relying on Nvidia’s InfiniBand in favor of Ethernet cables.
OpenAI has developed a text-to-voice generation platform named Voice Engine, capable of creating a synthetic voice from just a 15-second voice clip.
The platform is in limited access, serving entities like the Age of Learning and Livox, and is being used for applications from education to healthcare.
With concerns around ethical use, OpenAI has implemented usage policies, requiring informed consent and watermarking audio to ensure transparency and traceability.
Amazon’s AI team faces pressure to outperform Anthropic’s Claude models by mid-year
Amazon has invested $4 billion in AI startup Anthropic, but is also developing a competing large-scale language model called Olympus.
Olympus is supposed to surpass Anthropic’s latest Claude model by the middle of the year and has “hundreds of billions of parameters.”
So far, Amazon has had no success with its own language models. Employees are unhappy with Olympus’ development time and are considering switching to Anthropic’s models.
Microsoft Copilot has been blocked on all Congress-owned devices
The US House of Representatives has banned its staff from using Microsoft’s AI chatbot Copilot due to cybersecurity concerns over potential data leaks.
Microsoft plans to remove Copilot from all House devices and is developing a government-specific version aimed at meeting federal security standards.
The ban specifically targets the commercial version of Copilot, with the House open to reassessing a government-approved version upon its release.
A daily chronicle of AI Innovations: March 29th, 2024: Apple files lawsuit against former engineer for leaking details of projects he wanted to kill; Apple files lawsuit against former engineer for leaking details of projects he wanted to kill; Microsoft tackles Gen AI risks with new Azure AI tools; AI21 Labs’ Jamba triples AI throughput ; Google DeepMind’s AI fact-checker outperforms humans ; X’s Grok gets a major upgrade; Lightning AI partners with Nvidia to launch Thunder AI compiler
Apple files lawsuit against former engineer for leaking details of projects he wanted to kill
Apple has filed a lawsuit against former employee Andrew Aude for leaking confidential information about products like the Vision Pro and Journal app to journalists and competitors, motivated by his desire to “kill” products and features he disagreed with.
Aude, who joined Apple in 2016, is accused of sharing sensitive details via encrypted messages and meetings, including over 10,000 text messages to a journalist from The Information.
The lawsuit seeks damages, the return of bonuses and stock options, and a restraining order against Aude for disclosing any more of Apple’s confidential information.
Microsoft launches tools to try and stop people messing with chatbots
Microsoft has introduced a new set of tools in Azure to enhance the safety and security of generative AI applications, especially chatbots, aiming to counter risks like abusive content and prompt injections.
The suite includes features for real-time monitoring and protection against sophisticated threats, leveraging advanced machine learning to prevent direct and indirect prompt attacks.
These developments reflect Microsoft’s ongoing commitment to responsible AI usage, fueled by its significant investment in OpenAI and intended to address the security and reliability concerns of corporate leaders.
AI21 Labs has released Jamba, the first-ever production-grade AI model based on the Mamba architecture. This new architecture combines the strengths of both traditional Transformer models and the Mamba SSM, resulting in a model that is both powerful and efficient. Jamba boasts a large context window of 256K tokens, while still fitting on a single GPU.
Jamba’s hybrid architecture, composed of Transformer, Mamba, and mixture-of-experts (MoE) layers, optimizes for memory, throughput, and performance simultaneously.
The model has demonstrated remarkable results on various benchmarks, matching or outperforming state-of-the-art models in its size class. Jamba is being released with open weights under Apache 2.0 license and will be accessible from the NVIDIA API catalog.
Jamba’s hybrid architecture makes it the only model capable of processing 240k tokens on a single GPU. This could make AI tasks like machine translation and document analysis much faster and cheaper, without requiring extensive computing resources.
Google DeepMind’s AI fact-checker outperforms humans
Google DeepMind has developed an AI system called Search-Augmented Factuality Evaluator (SAFE) that can evaluate the accuracy of information generated by large language models more effectively than human fact-checkers. In a study, SAFE matched human ratings 72% of the time and was correct in 76% of disagreements with humans.
While some experts question the use of “superhuman” to describe SAFE’s performance, arguing for benchmarking against expert fact-checkers, the system’s cost-effectiveness is undeniable, being 20 times cheaper than human fact-checkers.
Why does this matter?
As language models become more powerful and widely used, SAFE could combat misinformation and ensure the accuracy of AI-generated content. SAFE’s efficiency could be a game-changer for consumers relying on AI for tasks like research and content creation.
X.ai, Elon Musk’s AI startup, has introduced Grok-1.5, an upgraded AI model for their Grok chatbot. This new version enhances reasoning skills, especially in coding and math tasks, and expands its capacity to handle longer and more complex inputs with a 128,000-token context window.
Grok chatbots are known for their ability to discuss controversial topics with a rebellious touch. The improved model will first be tested by early users on X, with plans for wider availability later. This release follows the open-sourcing of Grok-1 and the inclusion of the chatbot in X’s $8-per-month Premium plan.
This is significant because Grok-1.5 represents an advancement in AI assistants, potentially offering improved help with complex tasks and better understanding of user intent through its larger context window and real-time data ability. This could impact how people interact with chatbots in the future, making them more helpful and reliable.
Microsoft tackles Gen AI risks with new Azure AI tools
Microsoft has launched new Azure AI tools to address the safety and reliability risks associated with generative AI. The tools, currently in preview, aim to prevent prompt injection attacks, hallucinations, and the generation of personal or harmful content. The offerings include Prompt Shields, prebuilt templates for safety-centric system messages, and Groundedness Detection. (Link)
Lightning AI partners with Nvidia to launch Thunder AI compiler
Lightning AI, in collaboration with Nvidia, has launched Thunder, an open-source compiler for PyTorch, to speed up AI model training by optimizing GPU usage. The company claims that Thunder can achieve up to a 40% speed-up for training large language models compared to unoptimized code. (Link)
SambaNova’s new AI model beats Databricks’ DBRX
SambaNova Systems’ Samba-CoE v0.2 Large Language Model outperforms competitors like Databricks’ DBRX, MistralAI’s Mixtral-8x7B, and xAI’s Grok-1. With 330 tokens per second using only 8 sockets, Samba-CoE v0.2 demonstrates remarkable speed and efficiency without sacrificing precision. (Link)
Google.org launches Accelerator to empower nonprofits with Gen AI
Google.org has announced a six-month accelerator program to support 21 nonprofits in leveraging generative AI for social impact. The program provides funding, mentorship, and technical training to help organizations develop AI-powered tools in areas such as climate, health, education, and economic opportunity, aiming to make AI more accessible and impactful. (Link)
Pixel 8 to get on-device AI features powered by Gemini Nano
Google is set to introduce on-device AI features like recording summaries and smart replies on the Pixel 8, powered by its small-sized Gemini Nano model. The features will be available as a developer preview in the next Pixel feature drop, marking a shift from Google’s primarily cloud-based AI approach. (Link)
A daily chronicle of AI Innovations: March 28th, 2024: DBRX becomes world’s most powerful open-source LLM Claude 3 Opus crowned the top user-rated chatbot, beating OpenAI’s GPT-4 Empathy meets AI: Hume AI’s EVI redefines voice interaction
DBRX becomes world’s most powerful open source LLM
Databricks has released DBRX, a family of open-source large language models setting a new standard for performance and efficiency. The series includes DBRX Base and DBRX Instruct, a fine-tuned version designed for few-turn interactions. Developed by Databricks’ Mosaic AI team and trained using NVIDIA DGX Cloud, these models leverage an optimized mixture-of-experts (MoE) architecture based on the MegaBlocks open-source project. This architecture allows DBRX to achieve up to twice the compute efficiency of other leading LLMs.
In terms of performance, DBRX outperforms open-source models like Llama 2 70B, Mixtral-8x7B, and Grok-1 on industry benchmarks for language understanding, programming, and math. It also surpasses GPT-3.5 on most of these benchmarks, although it still lags behind GPT-4. DBRX is available under an open license with some restrictions and can be accessed through GitHub, Hugging Face, and major cloud platforms. Organizations can also leverage DBRX within Databricks’ Data Intelligence Platform.
Why does this matter?
With DBRX, organizations can build and fine-tune powerful proprietary models using their own internal datasets, ensuring full control over their data rights. As a result, DBRX is likely to accelerate the trend of organizations moving away from closed models and embracing open alternatives that offer greater control and customization possibilities.
Claude 3 Opus crowned the top user-rated chatbot, beating OpenAI’s GPT-4
Anthropic’s Claude 3 Opus has overtaken OpenAI’s GPT-4 to become the top-rated chatbot on the Chatbot Arena leaderboard. This marks the first time in approximately a year since GPT-4’s release that another language model has surpassed it in this benchmark, which ranks models based on user preferences in randomized head-to-head comparisons. Anthropic’s cheaper Haiku and mid-range Sonnet models also perform impressively, coming close to the original GPT-4’s capabilities at a significantly lower cost.
While OpenAI still dominates the market, especially among regular users with ChatGPT, this development and recent leadership changes at OpenAI have helped Anthropic gain ground. However, OpenAI is rumored to be preparing to launch an even more advanced “GPT-4.5” or “GPT-5” model as soon as this summer, which CEO Sam Altman has teased will be “amazing,” potentially allowing them to retake the lead from Anthropic’s Claude 3 Opus.
Claude’s rise to the top of the Chatbot Arena leaderboard shows that OpenAI is not invincible and will face stiff competition in the battle for AI supremacy. With well-resourced challengers like Anthropic and Google, OpenAI will need to move fast and innovate boldly to maintain its top position. Ultimately, this rivalry will benefit everyone as it catalyzes the development of more powerful, capable, and hopefully beneficial AI systems that can help solve humanity’s major challenges.
Empathy meets AI: Hume AI’s EVI redefines voice interaction
In a significant development for the AI community, Hume AI has introduced a new conversational AI called Empathic Voice Interface (EVI). What sets EVI apart from other voice interfaces is its ability to understand and respond to the user’s tone of voice, adding unprecedented emotional intelligence to the interaction. By adapting its language and responses based on the user’s expressions, EVI creates a more human-like experience, blurring the lines between artificial and emotional intelligence.
EVI’s empathic capabilities extend beyond just understanding tone. It can accurately detect the end of a conversation turn, handle interruptions seamlessly, and even learn from user reactions to improve over time. These features, along with its fast and reliable transcription and text-to-speech capabilities, make EVI a highly adaptable tool for various applications. Developers can easily integrate EVI into their projects using Hume’s API, which will be publicly available in April.
Why does this matter?
Emotionally intelligent AI can be revolutionary for industries like healthcare and use cases like customer support, where empathy and emotional understanding are crucial. But we must also consider potential risks, such as overreliance on AI for emotional support or the possibility of AI systems influencing users’ emotions in unintended ways. If developed and implemented ethically, emotionally intelligent AI can greatly enhance how we interact with and benefit from AI technologies in our daily lives.
OpenAI launches revenue sharing program for GPT Store builders
OpenAI is experimenting with sharing revenue with builders who create successful apps using GPT in OpenAI’s GPT Store. The goal is to incentivize creativity and collaboration by rewarding builders for their impact on an ecosystem OpenAI is testing so they can make it easy for anyone to build and monetize AI-powered apps. (Link)
Google introduces new shopping features to refine searches
Google is rolling out new shopping features that allow users to refine their searches and find items they like more easily. The Style Recommendations feature lets shoppers rate items in their searches, helping Google pick up on their preferences. Users can also specify their favorite brands to instantly bring up more apparel from those selections. (Link)
rabbit’s r1 device gets ultra-realistic voice powered by ElevenLabs
ElevenLabs has partnered with rabbit to integrate its high-quality, low-latency voice AI into rabbit’s r1 AI companion device. The collaboration aims to make the user experience with r1 more natural and intuitive by allowing users to interact with the device using voice commands. (Link)
AI startup Hume raises $50M to build emotionally intelligent conversational AI
AI startup Hume has raised $50 million in a Series B funding round, valuing the company at $219 million. Hume’s AI technology can detect over 24 distinct emotional expressions in human speech and generate appropriate responses. The startup’s AI has been integrated into applications across healthcare, customer service, and productivity, with the goal of providing more context and empathy in AI interactions. (Link)
Lenovo launches AI-enhanced PCs in a push for innovation and differentiation
Lenovo revealed a new lineup of AI-powered PCs and laptops at its Innovate event in Bangkok, Thailand. The company showcased the dual-screen Yoga Book 9i, Yoga Pro 9i with an AI chip for performance optimization and AI-enhanced Legion gaming laptops. Lenovo hopes to differentiate itself in the crowded PC market and revive excitement with these AI-driven innovations. (Link)
Study shows ChatGPT can produce medical record notes 10 times faster than doctors without compromising quality
The AI model ChatGPT can write administrative medical notes up to 10 times faster than doctors without compromising quality. This is according to a study conducted by researchers at Uppsala University Hospital and Uppsala University in collaboration with Danderyd Hospital and the University Hospital of Basel, Switzerland. The research is published in the journal Acta Orthopaedica.
Microsoft’s Copilot AI service is set to run locally on PCs, Intel told Tom’s Hardware. The company also said that next-gen AI PCs would require built-in neural processing units (NPUs) with over 40 TOPS (trillion operations per second) of power — beyond the capabilities of any consumer processor on the market.
Intel said that the AI PCs would be able to run “more elements of Copilot” locally. Currently, Copilot runs nearly everything in the cloud, even small requests. That creates a fair amount of lag that’s fine for larger jobs, but not ideal for smaller jobs. Adding local compute capability would decrease that lag, while potentially improving performance and privacy as well.
Microsoft was previously rumored to require 40 TOPS on next-gen AI PCs (along with a modest 16GB of RAM). Right now, Windows doesn’t make much use of NPUs, apart from running video effects like background blurring for Surface Studio webcams. ChromeOS and macOS both use NPU power for more video and audio processing features, though, along with OCR, translation, live transcription and more, Ars Technica noted.
A daily chronicle of AI Innovations: March 27th, 2024: Microsoft study reveals the 11 by 11 tipping point for AI adoption A16z spotlights the rise of generative AI in enterprises Gaussian Frosting revolutionizes surface reconstruction in 3D modeling OpenAI unveils exciting upcoming features for GPT-4 and DALL-E 3 Adobe unveils GenStudio: AI-powered ad creation platform
Microsoft study reveals the 11 by 11 tipping point for AI adoption
Microsoft’s study on AI adoption in the workplace revealed the “11-by-11 tipping point,” where users start seeing AI’s value by saving 11 minutes daily. The study involved 1,300 Copilot for Microsoft 365 users and showed that 11 minutes of time savings is enough for most people to find AI useful.
Over 11 weeks, users reported improved productivity, work enjoyment, work-life balance, and fewer meetings. This “11-by-11 tipping point” signifies the time it takes for individuals to experience AI’s benefits in their work fully.
Why does it matter?
The study offers insights for organizations aiming to drive AI adoption among their employees. Businesses can focus on identifying specific use cases that deliver immediate benefits like time and cost savings. It will help organizations encourage employees to embrace AI, increasing productivity and improving work experiences.
A16z spotlights the rise of generative AI in enterprises
A groundbreaking report by the influential tech firm a16z unveils the rapid integration of generative AI technologies within the corporate sphere. The report highlights essential considerations for business leaders to harness generative AI effectively. It covers resource allocation, model selection, and innovative use cases, providing a strategic roadmap for enterprises.
An increased financial commitment from businesses marks the adoption of generative AI. Industry leaders are tripling their investments in AI technologies, emphasizing the pivotal role of generative AI in driving innovation and efficiency.
The shift towards integrating AI into core operations is evident. There is a focus on measuring productivity gains and cost savings and quantifying impact on key business metrics.
Why does it matter?
The increasing budgets allocated to generative AI signal its strategic importance in driving innovation and productivity in enterprises. This highlights AI’s transformative potential to provide a competitive edge and unlock new opportunities. Generative AI can revolutionize various business operations and help gain valuable insights by leveraging diverse data types.
Gaussian Frosting revolutionizes surface reconstruction in 3D modeling
At the international conference on computer vision, researchers presented a new method to improve surface reconstruction using Gaussian Frosting. This technique automates the adjustment of Poisson surface reconstruction hyperparameters, resulting in significantly improved mesh reconstruction.
The method showcases the potential for scaling up mesh reconstruction while preserving intricate details and opens up possibilities for advanced geometry and texture editing. This work marks a significant step forward in surface reconstruction methods, promising advancements in 3D modeling and visualization techniques.
Why does it matter?
The new method demonstrates how AI enhances surface reconstruction techniques, improving mesh quality and enabling advanced editing in 3D modeling. This has significant implications for revolutionizing how 3D models are created, edited, and visualized across various industries.
AIs can now learn and talk with each other like humans do.
This seems an important step toward AGI and vastly improved productivity.
“Once these tasks had been learned, the network was able to describe them to a second network — a copy of the first — so that it could reproduce them. To our knowledge, this is the first time that two AIs have been able to talk to each other in a purely linguistic way,’’ said lead author of the paper Alexandre Pouget, leader of the Geneva University Neurocenter, in a statement.”
“While AI-powered chatbots can interpret linguistic instructions to generate an image or text, they can’t translate written or verbal instructions into physical actions, let alone explain the instructions to another AI.
However, by simulating the areas of the human brain responsible for language perception, interpretation and instructions-based actions, the researchers created an AI with human-like learning and communication skills.”
Adobe unveils GenStudio: AI-powered ad creation platform
Adobe introduced GenStudio, an AI-powered ad creation platform, during its Summit event. GenStudio is a centralized hub for promotional campaigns, offering brand kits, copy guidance, and preapproved assets. It also provides generative AI-powered tools for generating backgrounds and ensuring brand consistency. Users can quickly create ads for email and social media platforms like Facebook, Instagram, and LinkedIn. (Link)
Airtable introduces AI summarization for enhanced productivity
Airtable has introduced Airtable AI, which provides generative AI summarization, categorization, and translation to users. This feature allows quick insights and understanding of information within workspaces, enabling easy sharing of valuable insights with teams. Airtable AI automatically applies categories and tags to information, routes action items to the relevant team, and generates emails or social posts with a single button tap. (Link)
Microsoft Teams enhances Copilot AI features for improved collaboration
Microsoft is introducing smarter Copilot AI features in Microsoft Teams to enhance collaboration and productivity. The updates include new ways to invoke the assistant during meeting chats and summaries, making it easier to catch up on missed meetings by combining spoken transcripts and written chats into a single view. Microsoft is launching new hybrid meeting features, such as automatic camera switching for remote participants and speaker recognition for accurate transcripts. (Link)
OpenAI unveils exciting upcoming features for GPT-4 and DALL-E 3
OpenAI is preparing to introduce new features for its GPT-4 and DALL-E 3 models. For GPT-4, OpenAI plans to remove the message limit, implement a Model Tuner Selector, and allow users to upgrade responses from GPT-3.5 to GPT-4 with a simple button push. On the DALL-E 3 front, OpenAI is working on an image editor with inpainting functionality. These upcoming features demonstrate OpenAI’s commitment to advancing AI capabilities. (Link)
Apple Chooses Baidu’s AI for iPhone 16 in China
Apple has reportedly chosen Baidu to provide AI technology for its upcoming iPhone 16 and other devices in China. This decision comes as Apple faces challenges due to stagnation in iPhone innovation and competition from Huawei. Baidu’s Ernie Bot will be included in the Chinese version of the iPhone 16, Mac OS, and iOS 18. Despite discussions with Alibaba Group Holding and a Tsinghua University AI startup, Apple selected Baidu’s AI technology for compliance. (Link)
Meta CEO, Mark Zuckerberg, is directly recruiting AI talent from Google’s DeepMind with personalized emails.
Meta CEO, Mark Zuckerberg, is attempting to recruit top AI talent from Google’s DeepMind (their AI research unit). Personalised emails, from Zuckerberg himself, have been sent to a few of their top researchers, according to a report from The Information, which cited individuals that had seen the messages. In addition to this, the researchers are being hired without having to do any interviews, and, a previous policy which Meta had in place – to not offer higher offers to candidates with competing job offers – has been relaxed.
Zuckerberg appears to be on a hiring spree to build Meta into a position of being a dominant player in the AI space.
OpenAI’s Sora Takes About 12 Minutes to Generate 1 Minute Video on NVIDIA H100. Source.
Apple on Tuesday announced that its annual developers conference, WWDC, will take place June 10 through June 14. Source.
Elon Musk says all Premium subscribers on X will gain access to AI chatbot Grok this week. Source.
Intel unveils AI PC program for software developers and hardware vendors. Source.
London-made HIV injection has potential to cure millions worldwide
A daily chronicle of AI Innovations: March 26th, 2024 : Zoom launches all-in-one modern AI collab platform; Stability AI launches instruction-tuned LLM; Stability AI CEO resigns to focus on decentralized AI; WhatsApp to integrate Meta AI directly into its search bar; Google, Intel, and Qualcomm challenge Nvidia’s dominance in AI; OpenAI pitches Sora to Hollywood studios
Zoom launches all-in-one modern AI collab platform
Zoom launched Zoom Workplace, an AI collaboration platform that integrates many tools to improve teamwork and productivity. With over 40 new features, including AI Companion updates for Zoom Phone, Team Chat, Events, and Contact Center, as well as the introduction of Ask AI Companion, Zoom Workplace simplifies workflows within a familiar interface.
The platform offers customization options, meeting features, and improved collaboration tools across Zoom’s ecosystem. Zoom Business Services, integrated with Zoom Workplace, offers AI-driven marketing, customer service, and sales solutions. It expands digital communication channels and provides real-time insights for better agent management.
Why does this matter?
This intelligent platform will increase productivity by automating tasks, summarizing interactions, and personalizing user experiences. This move positions Zoom as a frontrunner in the race to integrate AI into everyday work tools, which will reshape how teams communicate and collaborate.
Stability AI has introduced Stable Code Instruct 3B, a new instruction-tuned large language model. It can handle various software development tasks, such as code completion, generation, translation, and explanation, as well as creating database queries with simple instructions.
Stable Code Instruct 3B claims to outperform rival models like CodeLlama 7B Instruct and DeepSeek-Coder Instruct 1.3B in terms of accuracy, understanding natural language instructions, and handling diverse programming languages. The model is accessible for commercial use with a Stability AI Membership, while its weights are freely available on Hugging Face for non-commercial projects.
Why does this matter?
This model simplifies development workflows and complex tasks by providing contextual code completion, translation, and explanations. Businesses can prototype, iterate and ship software products faster thanks to its high performance and low hardware requirements.
Stability AI CEO resigns because of centralized AI
Stability AI CEO Emad Mostaque steps down to focus on decentralized AI, advocating for transparent governance in the industry.
Mostaque’s departure follows the appointment of interim co-CEOs Shan Shan Wong and Christian Laforte.
The startup, known for its image generation tool, faced challenges including talent loss and financial struggles.
Mostaque emphasized the importance of generative AI R&D over revenue growth and highlighted the potential economic value of open models in regulated industries.
The AI industry witnessed significant changes with Inflection AI co-founders joining Microsoft after raising $1.5 billion.
A 15% penetration of Sora for videos with realistic video generation demand and utilization will require about 720k Nvidia H100 GPUs. Each H100 requires about 700 Watts of power supply.
720,000 x 700 = 504 Megawatts.
By comparison, even the largest ever fully solar powered plan in America (Ivanpah Solar Power Facility) produces about 377 Megawats.
While these power requirements can be met with other options like nuclear plants and even coal/hydro plants of big sizes … are we really entering the power game for electricity ?
( it is currently a power game on compute)
What Else Is Happening in AI on March 26th, 2024
The Financial Times has introduced Ask FT, a new GenAI chatbot
It provides curated, natural-language responses to queries about recent events and broader topics covered by the FT. Ask FT is powered by Anthropic’s Claude and is available to a selected group of subscribers as it is under testing. (Link)
WhatsApp to integrate Meta AI directly into its search bar
The latest Android WhatsApp beta update will embed Meta AI directly into the search bar. This feature will allow users to type queries into the search bar and receive instant AI-powered responses without creating a separate Meta AI chat. The update will also allow users to interact with Meta AI even if they choose to hide the shortcut. (Link)
Google, Intel, and Qualcomm challenge Nvidia’s dominance in AI
Qualcomm, Google, and Intel are targeting NVIDIA’s software platforms like CUDA. They plan to create open-source tools compatible with multiple AI accelerator chips through the UXL Foundation. Companies are investing over $4 billion in startups developing AI software to loosen NVIDIA’s grip on the field. (Link)
Apple takes a multi-vendor approach for generative AI in iOS 18
Apple is reportedly in talks with Alphabet, OpenAI, and Anthropic to integrate generative AI capabilities from multiple vendors into iOS 18. This multi-vendor approach aligns with Apple’s efforts to balance advanced AI features with privacy considerations, which are expected to be detailed at WWDC 2024 during the iOS 18 launch. (Link)
OpenAI pitches Sora to Hollywood studios
OpenAI is actively engaging with Hollywood studios, directors, and talent agencies to integrate Sora into the entertainment industry. The startup has scheduled meetings in Los Angeles to showcase Sora’s capabilities and encourage partnerships, with CEO Sam Altman attending events during the Oscars weekend. (Link)
LLM providers charge you per token, but their tokens are not always comparable. So if you are putting Python code through GPT-4 and Claude 3, it would cost you 25% more tokens to do so with Claude, due to difference in their tokenisers (note: this is different to cost per token, it just means you will have more tokens to pay for)
Some observations: – OpenAI’s GPT-4 & 3.5 tokeniser is the most efficient for English and Python – Gemini absolutely demolishes the competition in the three languages I tested: French (-11%), Chinese (-43%) and Hebrew (-54%) – If your use cases is non-English, really worth looking at Gemini models – the difference in cost will likely be very noticeable – Llama 2 ranked at the bottom of all of my tests – Mistral was kind of disappointing on French (+16% worse than GPT), the reason why I picked French was that I assumed they’d do better
Methodology notes: – The study will be limited, I only compared 7 individual bits of text/code – so results in practice would vary – I have used this tokeniser playground (https://huggingface.co/spaces/Xenova/the-tokenizer-playground) for GPT, Mistral and Llama. I found it to be inaccurate (or old?) for Claude 3 and they didn’t have Gemini, so I did these separately – Tokens are only part of the puzzle, more efficient tokenisation won’t necessarily mean better performance or overall lower cost – If you want to learn about tokenisers, I recommend watching this video from Andrej Karpathy, even the first 10-20 minutes will be really worth your time https://www.youtube.com/watch?v=zduSFxRajkE
A daily chronicle of AI Innovations: March 25th, 2024 : Apple could partner with OpenAI, Gemini, Anthropic; Chatbots more likely to change your mind than another human, study says; Chatbots more likely to change your mind than another human, study says; Verbal Reasoning Test – Opus is better than 93% of people, Gemini 1.5 Pro 59%, GPT-4 Turbo only 36%; Apple’s Tim Cook says AI essential tool for businesses to reduce carbon footprint; Suno V3: Song-on-demand AI is getting insanely good; The first patient with a Neuralink brain-computer implant played Nintendo’s Mario Kart video game with his mind in an impressive new demo video
Apple could partner with OpenAI, Gemini, Anthropic
Apple is discussing with Alphabet, OpenAI, Anthropic, and potentially Baidu to integrate generative AI into iOS 18, considering multiple partners rather than a single one.
The collaboration could lead to a model where iPhone users might choose their preferred AI provider, akin to selecting a default search engine in a web browser.
Reasons for partnering with external AI providers include financial benefits, the possibility to quickly adapt through partnership changes or user preferences, and avoiding the complexities of developing and maintaining cloud-based generative AI in-house.
EU probes Apple, Google, Meta under new digital law
The European Commission has initiated five investigations into Apple, Google, and Meta for potential non-compliance with the Digital Markets Act (DMA), focusing on app store rules, search engine preferencing, and advertisement targeting models.
Investigations will also examine Apple’s app distribution fee structure and Amazon’s product preferencing, while Meta is given six months to make Messenger interoperable with other messaging services.
Companies may face fines up to 10% of their annual global revenue for DMA non-compliance, with the possibility of increased penalties for repeated infringements.
Chatbots more likely to change your mind than another human, study says
A study found that personalized chatbots, such as GPT-4, are more likely to change people’s minds compared to human debaters by using tailored arguments based on personal information.
The research conducted by the École Polytechnique Fédérale de Lausanne and the Italian Fondazione Bruno Kessler showed an 81.7 percent increase in agreement when GPT-4 had access to participants’ personal data like age, gender, and race.
Concerns were raised about the potential misuse of AI in persuasive technologies, especially with the ability to generate detailed user profiles from online activities, urging online platform operators to counter such strategies.
OpenAI CEO’s £142 Million Gamble On Unlocking the Secrets to Longer Life, Altman’s vision of extended lifespans may be achievable
Biotech startup Retro Biosciences is undertaking a one-of-a-kind experiment housed in shipping containers, funded by a $180 (£142.78) million investment by tech leader Sam Altman to increase lifespan.
Altman, the 38-year-old tech heavyweight, has been a significant player in the industry. Despite his young age, Altman took the tech realm by storm with offerings like ChatGPT and Sora. Unsurprisingly, his involvement in these groundbreaking projects has propelled him to a level of influence rivaling Mark Zuckerberg and Elon Musk, who is currently embroiled in a lawsuit with OpenAI.
It is also worth noting that the Altman-led AI startup is reportedly planning to launch its own AI-powered search engine to challenge Google’s search dominance. Altman’s visionary investments in tech giants like Reddit, Stripe, Airbnb, and Instacart propelled him to billionaire status. They cemented his influence as a tech giant who relentlessly pushed the boundaries of the industry’s future.
Suno V3 can do multiple languages in one song. This one is English, Portuguese, Japanese, and Italian. Incredible.
Beneath the vast sky, where dreams lay rooted deep, Mountains high and valleys wide, secrets they keep. Ground beneath my feet, firm and ever true, Earth, you give us life, in shades of brown and green hue.
Sopra o vento, mensageiro entre o céu e o mar, Carregando sussurros, histórias a contar. Dançam as folhas, em um balé sem fim, Vento, o alento invisível, guiando o destino assim.
Acqua, misteriosa forza che tutto scorre, Nei fiumi, nei mari, la vita che ci offre. Specchio del cielo, in te ci riflettiamo, Acqua, fonte di vita, a te ci affidiamo.
OpenAI Heading To Hollywood To Pitch Revolutionary “Sora”
Some of the most important meetings in Hollywood history will take place in the coming week, as OpenAI hits Hollywood to show the potential of its “Sora” software to studios, talent agencies, and media executives.
Bloomberg is reporting that OpenAI wants more filmmakers to become familiar with Sora, the text-to-video generator that potentially could upend the way movies are made.
Soon, Everyone Will Own a Robot, Like a Car or Phone Today. Says Figure AI founder
Brett Adcock, the founder of FigureAI robots, the company that recently released a demo video of its humanoid robot conversing with a human while performing tasks, predicts that everyone will own a robot in the future. “Similar to owning a car or phone today,” he said – hinting at the universal adoption of robots as an essential commodity in the future.
“Every human will own a robot in the future, similar to owning a car/phone today,” said Adcock.
A few months ago, Adcock called 2024 the year of Embodied AI, indicating how the future comprises AI in a body form. With robots learning to perform low-complexity tasks, such as picking trash, placing dishes, and even using the coffee machine, Figure robots are being trained to assist a person with house chores.
WhatsApp to embed Meta AI directly into search bar for instant assistance: Report.
WhatsApp is on the brink of a transformation in user interaction as it reportedly plans to integrate Meta AI directly into its search bar. This move promises to simplify access to AI assistance within the app, eliminating the need for users to navigate to a separate Meta AI conversation.
1️⃣ Technical Assistance & Troubleshooting (23%) 2️⃣ Content Creation & Editing (22%) 3️⃣ Personal & Professional Support (17%) 4️⃣ Learning & Education (15%) 5️⃣ Creativity & Recreation (13%) 6️⃣ Research, Analysis & Decision Making (10%)
What users are doing:
✔Generating ideas ✔Specific search ✔Editing text ✔Drafting emails ✔Simple explainers ✔Excel formulas ✔Sampling data
🤔 Do you see AI as a tool to enhance your work, or as a threat that could take over your job?
Source: HBR Image credit: Filtered
A daily chronicle of AI Innovations: March 22nd, 2024 : Nvidia’s Latte 3D generates text-to-3D in seconds! Saudi Arabia to invest $40 billion in AI Open Interpreter’s 01 Light personal pocket AI agent. Microsoft introduces a new Copilot for better productivity. Quiet-STaR: LMs can self-train to think before responding Neuralink’s first brain chip patient plays chess with his mind
Nvidia’s Latte 3D generates text-to-3D in seconds!
NVIDIA introduces Latte3D, facilitating the conversion of text prompts into detailed 3D models in less than a second. Developed by NVIDIA’s Toronto lab, Latte3D sets a new standard in generative AI models with its remarkable blend of speed and precision.
LATTE3D has two stages: first, NVIDIA’s team uses volumetric rendering to train the texture and geometry robustly, and second, it uses surface-based rendering to train only the texture for quality enhancement. Both stages use amortized optimization over prompts to maintain fast generation.
What sets Latte3D apart is its extensive pretraining phase, enabling the model to quickly adapt to new tasks by drawing on a vast repository of learned patterns and structures. This efficiency is achieved through a rigorous training regime that includes a blend of 3D datasets and prompts from ChatGPT.
Why does it matter?
AI models such as NVIDIA’s Latte3D have significantly reduced the time required to generate 3D visualizations from an hour to a few minutes compared to a few years ago. This technology has the potential to significantly accelerate the design and development process in various fields, such as the video game industry, advertising, and more.
Quiet-STaR: LMs can self-train to think before responding
A groundbreaking study demonstrates the successful training of large language models (LM) to reason from text rather than specific reasoning tasks. The research introduces a novel training approach, Quiet STaR, which utilizes a parallel sampling algorithm to generate rationales from all token positions in a given string.
This technique integrates meta tokens to indicate when the LM should generate a rationale and when it should make a prediction based on the rationale, revolutionizing the understanding of LM behavior. Notably, the study shows that thinking enables the LM to predict difficult tokens more effectively, leading to improvements with longer thoughts.
The research introduces powerful advancements, such as a non-myopic loss approach, the application of a mixing head for retrospective determination, and the integration of meta tokens, underpinning a comprehensive leap forward in language model training.
Why does it matter?
These significant developments in language modeling advance the field and have the potential to revolutionize a wide range of applications. This points towards a future where large language models will unprecedentedly contribute to complex reasoning tasks.
Neuralink’s first brain chip patient plays chess with his mind
Elon Musk’s brain chip startup, Neuralink, showcased its first brain chip patient playing chess using only his mind. The patient, Noland Arbaugh, was paralyzed below the shoulder after a diving accident.
Neuralink’s brain implant technology allows people with paralysis to control external devices using their thoughts. With further advancements, Neuralink’s technology has the potential to revolutionize the lives of people with paralysis, providing them with newfound independence and the ability to interact with the world in previously unimaginable ways.
Why does it matter?
Neuralink’s brain chip holds significant importance in AI and human cognition. It has the potential to enhance communication, assist paralyzed individuals, merge human intelligence with AI, and address the risks associated with AI development. However, ethical considerations and potential misuse of this technology must also be carefully examined.
Microsoft introduces a new Copilot for better productivity.
Microsoft’s new Copilot for Windows and Surface devices is a powerful productivity tool integrating large language models with Microsoft Graph and Microsoft 365 apps to enhance work efficiency. With a focus on delivering AI responsibly while ensuring data security and privacy, Microsoft is dedicated to providing users with innovative tools to thrive in the evolving work landscape. (Link)
Saudi Arabia to invest $40 billion in AI
Saudi Arabia has announced its plan to invest $40 billion in AI to become a global leader. Middle Eastern countries use their sovereign wealth fund, which has over $900 billion in assets, to achieve this goal. This investment aims to position the country at the forefront of the fast-evolving AI sector, drive innovation, and enhance economic growth. (Link)
Rightsify releases Hydra II to revolutionize AI music generation
Rightsify, a global music licensing leader, introduced Hydra II, the latest AI generation model. Hydra II offers over 800 instruments, 50 languages, and editing tools for customizable, copyright-free AI music. The model is trained on audio, text descriptions, MIDI, chord progressions, sheet music, and stems to create unique generations. (Link)
Open Interpreter’s 01 Light personal pocket AI agent
The Open Interpreter unveiled 01 Light, a portable device that allows you to control your computer using natural language commands. It’s part of an open-source project to make computing more accessible and flexible. It’s designed to make your online tasks more manageable, helping you get more done and simplify your life. (Link)
Microsoft’s $650 million Inflection deal: A strategic move Microsoft has recently entered into a significant deal with AI startup Inflection, involving a payment of $650 million in cash. While the deal may seem like a licensing agreement, it appears to be a strategic move by Microsoft to acquire AI talent while avoiding potential regulatory trouble. (Link)
Microsoft unveiled its first “AI PCs,” with a dedicated Copilot key and Neural Processing Units (NPUs).
Source: Nvidia
OpenAI Courts Hollywood in Meetings With Film Studios, Directors – from Bloomberg
The artificial intelligence startup has scheduled meetings in Los Angeles next week with Hollywood studios, media executives and talent agencies to form partnerships in the entertainment industry and encourage filmmakers to integrate its new AI video generator into their work, according to people familiar with the matter.
The upcoming meetings are just the latest round of outreach from OpenAI in recent weeks, said the people, who asked not to be named as the information is private. In late February, OpenAI scheduled introductory conversations in Hollywood led by Chief Operating Officer Brad Lightcap. Along with a couple of his colleagues, Lightcap demonstrated the capabilities of Sora, an unreleased new service that can generate realistic-looking videos up to about a minute in length based on text prompts from users. Days later, OpenAI Chief Executive Officer Sam Altman attended parties in Los Angeles during the weekend of the Academy Awards.
In an attempt to avoid defeatism, I’m hoping this will contribute to the indie boom with creatives refusing to work with AI and therefore studios who insist on using it. We’ve already got people on twitter saying this is the end of the industry but maybe only tentpole films as we know them.
Catherine, the Princess of Wales, has cancer, she announced in a video message released by Kensington Palace on Friday March 22nd, 2024
The recent news surrounding Kate Middleton, the Princess of Wales, revolves around a manipulated family photo that sparked controversy and conspiracy theories. The photo, released by Middleton herself, depicted her with her three children and was met with speculation about potential AI involvement in its editing. However, experts suggest that the image was likely manipulated using traditional photo editing software like Photoshop rather than generative AI
The circumstances surrounding Middleton’s absence from the public eye due to abdominal surgery fueled rumors and intensified scrutiny over the edited photo.
Major news agencies withdrew the image, citing evidence of manipulation in areas like Princess Charlotte’s sleeve cuff and the alignment of elements in the photo.
Despite concerns over AI manipulation, this incident serves as a reminder that not all image alterations involve advanced technology, with this case being attributed to a botched Photoshop job.From an AI perspective, experts highlight how the incident reflects society’s growing awareness of AI technologies and their impact on shared reality. The controversy surrounding the edited photo underscores the need for transparency and accountability in media consumption to combat misinformation and maintain trust in visual content. As AI tools become more accessible and sophisticated, distinguishing between authentic and manipulated media becomes increasingly challenging, emphasizing the importance of educating consumers and technologists on identifying AI-generated content.Kate Middleton, the Princess of Wales, recently disclosed her battle with cancer in a heartfelt statement. Following major abdominal surgery in January, it was initially believed that her condition was non-cancerous. However, subsequent tests revealed the presence of cancer, leading to the recommendation for preventative chemotherapy. The 42-year-old princess expressed gratitude for the support received during this challenging time and emphasized the importance of privacy as she focuses on her treatment and recovery. The news of her diagnosis has garnered an outpouring of support from around the world, with messages of encouragement coming from various public figures and officials.
Nvidia CEO says we’ll see fully AI-generated games in 5-10 years
Nvidia’s CEO, Jensen Huang, predicts the emergence of fully AI-generated games within the next five to ten years. This prediction is based on the development of Nvidia’s next-generation Blackwell AI GPU, the B200. This GPU marks a significant shift in GPU usage towards creating neural networks for generating content rather than traditional rasterization or ray tracing for visual fidelity in games. The evolution of AI in gaming is highlighted as GPUs transition from rendering graphics to processing AI algorithms for content creation, indicating a major transformation in the gaming industry’s future landscape.The integration of AI into gaming represents a paradigm shift that could revolutionize game development and player experiences. Fully AI-generated games have the potential to offer unprecedented levels of customization, dynamic storytelling, and adaptive gameplay based on individual player interactions. This advancement hints at a new era of creativity and innovation in game design but also raises questions about the ethical implications and challenges surrounding AI-generated content, such as ensuring diversity, fairness, and avoiding biases in virtual worlds. Source
Andrew Ng, cofounder of Google Brain & former chief scientist @ Baidu- “I think AI agentic workflows will drive massive AI progress this year — perhaps even more than the next generation of foundation models.
This is an important trend, and I urge everyone who works in AI to pay attention to it.”
I think AI agentic workflows will drive massive AI progress this year — perhaps even more than the next generation of foundation models. This is an important trend, and I urge everyone who works in AI to pay attention to it.
Today, we mostly use LLMs in zero-shot mode, prompting a model to generate final output token by token without revising its work. This is akin to asking someone to compose an essay from start to finish, typing straight through with no backspacing allowed, and expecting a high-quality result. Despite the difficulty, LLMs do amazingly well at this task!
With an agentic workflow, however, we can ask the LLM to iterate over a document many times. For example, it might take a sequence of steps such as:
Plan an outline.
Decide what, if any, web searches are needed to gather more information.
Write a first draft.
Read over the first draft to spot unjustified arguments or extraneous information.
Revise the draft taking into account any weaknesses spotted.
And so on.
This iterative process is critical for most human writers to write good text. With AI, such an iterative workflow yields much better results than writing in a single pass.
Devin’s splashy demo recently received a lot of social media buzz. My team has been closely following the evolution of AI that writes code. We analyzed results from a number of research teams, focusing on an algorithm’s ability to do well on the widely used HumanEval coding benchmark. You can see our findings in the diagram below.
GPT-3.5 (zero shot) was 48.1% correct. GPT-4 (zero shot) does better at 67.0%. However, the improvement from GPT-3.5 to GPT-4 is dwarfed by incorporating an iterative agent workflow. Indeed, wrapped in an agent loop, GPT-3.5 achieves up to 95.1%.
Open source agent tools and the academic literature on agents are proliferating, making this an exciting time but also a confusing one. To help put this work into perspective, I’d like to share a framework for categorizing design patterns for building agents. My team AI Fund is successfully using these patterns in many applications, and I hope you find them useful.
Reflection: The LLM examines its own work to come up with ways to improve it.
Tool use: The LLM is given tools such as web search, code execution, or any other function to help it gather information, take action, or process data.
Planning: The LLM comes up with, and executes, a multistep plan to achieve a goal (for example, writing an outline for an essay, then doing online research, then writing a draft, and so on).
Multi-agent collaboration: More than one AI agent work together, splitting up tasks and discussing and debating ideas, to come up with better solutions than a single agent would.
A daily chronicle of AI Innovations: March 21st, 2024 : Stealing Part of a Production Language Model Sakana AI’s method to automate foundation model development Key Stable Diffusion researchers leave Stability AI Character AI’s new feature adds voice to characters with just 10-sec audio Fitbit to get major AI upgrades powered by Google’s ‘Personal Health’ LLM Samsung creates lab to research chips for AI’s next phase GitHub’s latest AI tool can automatically fix code vulnerabilities
Stealing Part of a Production Language Model
Researchers from Google, OpenAI, and DeepMind (among others) released a new paper that introduces the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI’s ChatGPT or Google’s PaLM-2.
The attack allowed them to recover the complete embedding projection layer of a transformer language model. It differs from prior approaches that reconstruct a model in a bottom-up fashion, starting from the input layer. Instead, this operates top-down and directly extracts the model’s last layer by making targeted queries to a model’s API. This is useful for several reasons; it
Reveals the width of the transformer model, which is often correlated with its total parameter count.
Slightly reduces the degree to which the model is a complete “blackbox”
May reveal more global information about the model, such as relative size differences between different models
While there appear to be no immediate practical consequences of learning this layer is stolen, it represents the first time that any precise information about a deployed transformer model has been stolen.
Why does this matter?
Though it has limitations, the paper motivates the further study of practical attacks on ML models, in order to ultimately develop safer and more reliable AI systems. It also highlights how small, system-level design decisions impact the safety and security of the full product.
Sakana AI’s method to automate foundation model development
Sakana AI has introduced Evolutionary Model Merge, a general method that uses evolutionary techniques to efficiently discover the best ways to combine different models from the vast ocean of different open-source models with diverse capabilities.
As of writing, Hugging Face has over 500k models in dozens of different modalities that, in principle, could be combined to form new models with new capabilities. By working with the vast collective intelligence of existing open models, this method is able to automatically create new foundation models with desired capabilities specified by the user.
Why does this matter?
Model merging shows great promise and democratizes up model-building. In fact, the current Open LLM Leaderboard is dominated by merged models. They work without any additional training, making it very cost-effective. But we need a more systematic approach.
Evolutionary algorithms, inspired by natural selection, can unlock more effective merging. They can explore vast possibilities, discovering novel and unintuitive combinations that traditional methods and human intuition might miss.
Key Stable Diffusion researchers leave Stability AI
Robin Rombach and other key researchers who helped develop the Stable Diffusion text-to-image generation model have left the troubled, once-hot, now floundering GenAI startup.
Rombach (who led the team) and fellow researchers Andreas Blattmann and Dominik Lorenz were three of the five authors who developed the core Stable Diffusion research while at a German university. They were hired afterwards by Stability. Last month, they helped publish a 3rd edition of the Stable Diffusion model, which, for the first time, combined the diffusion structure used in earlier versions with transformers used in OpenAI’s ChatGPT.
Their departures are the latest in a mass exodus of executives at Stability AI, as its cash reserves dwindle and it struggles to raise additional funds.
Why does this matter?
Stable Diffusion is one of the foundational models that helped catalyze the boom in generative AI imagery, but now its future hangs in the balance. While Stability AI’s current situation raises questions about its long-term viability, the exodus potentially benefits its competitors.
Character AI’s new feature adds voice to characters with just 10-sec audio
You can now give voice to your Characters by choosing from thousands of voices or creating your own. The voices are created with just 10 seconds of audio clips. The feature is now available for free to everyone. (Link)
GitHub’s latest AI tool can automatically fix code vulnerabilities
GitHub launches the first beta of its code-scanning autofix feature, which finds and fixes security vulnerabilities during the coding process. GitHub claims it can remediate more than two-thirds of the vulnerabilities it finds, often without the developers having to edit the code. The feature is now available for all GitHub Advanced Security (GHAS) customers. (Link)
OpenAI plans to release a ‘materially better’ GPT-5 in mid-2024
According to anonymous sources from Businessinsider, OpenAI plans to release GPT-5 this summer, which will be significantly better than GPT-4. Some enterprise customers are said to have already received demos of the latest model and its ChatGPT improvements. (Link)
Fitbit to get major AI upgrades powered by Google’s ‘Personal Health’ LLM
Google Research and Fitbit announced they are working together to build a Personal Health LLM that gives users more insights and recommendations based on their data in the Fitbit mobile app. It will give Fitbit users personalized coaching and actionable insights that help them achieve their fitness and health goals. (Link)
Samsung creates lab to research chips for AI’s next phase
Samsung has set up a research lab dedicated to designing an entirely new type of semiconductor needed for (AGI). The lab will initially focus on developing chips for LLMs with a focus on inference. It aims to release new “chip designs, an iterative model that will provide stronger performance and support for increasingly larger models at a fraction of the power and cost.” (Link)
A daily chronicle of AI Innovations: March 20th, 2024 : OpenAI to release GPT-5 this summer; Nvidia’s Jensen Huang says AI hallucinations are solvable, AGI is 5 years away; Ozempic creator plans AI supercomputer to discover new drugs; After raising $1.3B, Inflection eaten alive by Microsoft; MindEye2: AI Mind Reading from Brain Activity; Nvidia NIM enables faster deployment of AI models
OpenAI to release GPT-5 this summer
OpenAI is planning to launch GPT-5 around mid-year, aiming to address previous performance issues and significantly improve upon its predecessor, GPT-4.
GPT-5 is described as “materially better” by those who have seen demos, including enhancements and new capabilities like the ability to call AI agents for autonomous tasks, with enterprise customers having already previewed these improvements.
The release timeline for GPT-5 remains uncertain as OpenAI continues its training and thorough safety and vulnerability testing, with no specific deadline for completion of these preparatory steps.
After raising $1.3B, Inflection eaten alive by Microsoft
In June 2023, Inflection raised $1.3 billion led by Microsoft to develop “more personal AI” but was overtaken by Microsoft less than a year later, with co-founders joining Microsoft’s new AI division.
Despite significant investment, Inflection’s AI, Pi, failed to compete with advancements from other companies such as OpenAI, Google’s Gemini, and Anthropic, leading to its downfall.
Microsoft’s takeover of Inflection reflects the strategy of legacy tech companies to dominate the AI space by supporting startups then acquiring them once they face challenges.
Nvidia’s Jensen Huang says AI hallucinations are solvable, AGI is 5 years away
Nvidia CEO Jensen Huang predicts artificial general intelligence (AGI) could be achieved within 5 years, depending on how AGI is defined and measured.
Huang addresses concerns around AI hallucinations, suggesting that ensuring answers are well-researched could easily solve the issue.
The concept of AGI raises concerns about its potential unpredictability and the challenges of aligning its objectives with human values and priorities.
Ozempic creator plans AI supercomputer to discover new drugs
The Novo Nordisk Foundation is investing in “Gefion,” an AI supercomputer project developed in collaboration with Nvidia.
“Gefion” aims to be the world’s most powerful AI supercomputer for health sciences, utilizing Nvidia’s new chips to accelerate scientific breakthroughs in critical areas such as drug discovery, disease diagnosis, and treatment,
This initiative underscores the growing integration of AI in healthcare, promising to catalyze significant scientific discoveries and innovations that could transform patient care and outcomes.
MindEye2 is a revolutionary model that reconstructs visual perception from brain activity using just one hour of data. Traditional methods require extensive training data, making them impractical for real-world applications. However, MindEye2 overcomes this limitation by leveraging shared-subject models. The model is pretrained on data from seven subjects and then fine-tuned with minimal data from a new subject.
By mapping brain activity to a shared-subject latent space and then nonlinear mapping to CLIP image space, MindEye2 achieves high-quality reconstructions with limited training data. It performs state-of-the-art image retrieval and reconstruction across multiple subjects within only 2.5% of the previously required training data, reducing the training time from 40 to just one hour.
Why does it matter?
MindEye2 has the potential to revolutionize clinical assessments and brain-computer interface applications. This remarkable achievement also holds great promise for neuroscience and opens new possibilities for understanding how our brains perceive and process visual information. It can also help develop personalized treatment plans for neuro patients.
NVIDIA has introduced NVIDIA NIM (NVIDIA Inference Microservices) to accelerate the deployment of AI applications for businesses. NIM is a collection of microservices that package essential components of an AI application, including AI models, APIs, and libraries, into a container. These containers can be deployed in environments such as cloud platforms, Linux servers, or serverless architectures.
NIM significantly reduces the time it takes to deploy AI applications from weeks to minutes. It offers optimized inference engines, industry-standard APIs, and support for popular software and data platform vendors. NIM microservices are compatible with NVIDIA GPUs and support features like Retrieval Augmented Generation (RAG) capabilities for enhanced enterprise applications. Developers can experiment with NIM microservices for free on the ai.nvidia.com platform, while commercial deployment is available through NVIDIA AI Enterprise 5.0.
Why does it matter?
With NIM, Nvidia is trying to democratize AI deployment for enterprises by abstracting away complexities. This will enable more developers to contribute to their company’s AI transformation efforts and allow businesses to run AI applications almost instantly without specialized AI expertise.
Microsoft hires DeepMind co-founder to lead a new AI division
Mustafa Suleyman, a renowned co-founder of DeepMind and Inflection, has recently joined Microsoft as the leader of Copilot. Satya Nadella, Microsoft’s CEO, made this significant announcement, highlighting the importance of innovation in artificial intelligence (AI).
In his new role as the Executive Vice President and CEO of Microsoft AI, Mustafa will work alongside Karén Simonyan, another talented individual from Inflection who will serve as Chief Scientist. Together, they will spearhead the development and advancement of Copilot and other exciting consumer AI products at Microsoft. Mustafa and his team’s addition to the Microsoft family brings a wealth of expertise and promises groundbreaking advancements in AI.
Why does it matter?
Mustafa Suleyman’s expertise in AI is expected to contribute to the development of innovative consumer AI products and research at Microsoft, furthering its mission to bring the benefits of AI to people and organizations worldwide. With DeepMind’s founder now at the helm, the AI race between Microsoft, Google, and others became even more intense.
Truecaller adds AI-powered spam detection and blocking for Android users
Truecaller has unveiled a new feature for its Android premium subscribers that uses AI to detect spam, even if unavailable on the Truecaller database, and block every call that doesn’t come from an approved contact. Truecaller hopes to add more premium subscribers to its list by adding this feature. However, this feature is not available for Apple users. (Link)
Google DeepMind’s new AI tool can analyze soccer tactics and offer insights
DeepMind has partnered with Liverpool FC to develop a new AI tool called TacticAI. TacticAI uses generative and predictive AI to help coaches determine which player will most likely receive the ball during corner kicks, whether a shot will be taken, and how to adjust player setup. It aims to revolutionize soccer and help the teams enhance their efficiency. (Link)
Pika Labs introduces sound effects for its gen-AI video generation
Pika Labs has now added the ability to create sound effects from a text prompt for its generative artificial intelligence videos. It allows for automatic or custom SFX generations to pair with video outputs. Now, users can make bacon sizzle, lions roar, or add footsteps to the video of someone walking down the street. It is only available to pro users. (Link)
Buildbox 4 Alpha enables users to create 3D video games from text prompts
Buildbox has released an alpha version of Buildbox 4. It’s an AI-first game engine that allows users to create games and generate assets from text prompts. The alpha version aims to make text-to-game a distinct reality. Users can create various assets and animations from simple text prompts. It also allows users to build a gaming environment in a few minutes. (Link)
Nvidia adds generative AI capabilities to empower humanoid robots
Nvidia introduced Project GR00T, a multimodal AI that will power future humanoids with advanced foundation AI. Project GR00T enables humanoid robots to input text, speech, videos, or even live demos and process them to take specific actions. It has been developed with the help of Nvidia’s Isaac Robotic Platform tools, including an Isaac Lab for RLHF. (Link)
A daily chronicle of AI Innovations: March 19th, 2024 : Nvidia launches ‘world’s most powerful AI chip’; Stability AI’s SV3D turns a single photo into a 3D video; OpenAI CEO hints at “Amazing Model”, maybe ChatGPT-5 ; Apple is in talks to bring Google’s AI to iPhones
Nvidia launches ‘world’s most powerful AI chip’
Nvidia has revealed its new Blackwell B200 GPU and GB200 “superchip”, claiming it to be the world’s most powerful chip for AI. Both B200 and GB200 are designed to offer powerful performance and significant efficiency gains.
Key takeaways:
The B200 offers up to 20 petaflops of FP4 horsepower, and Nvidia says it can reduce costs and energy consumption by up to 25 times over an H100.
The GB200 “superchip” can deliver 30X the performance for LLM inference workloads while also being more efficient.
Nvidia claims that just 2,000 Blackwell chips working together could train a GPT -4-like model comprising 1.8 trillion parameters in just 90 days.
Why does this matter?
A major leap in AI hardware, the Blackwell GPU boasts redefined performance and energy efficiency. This could lead to lower operating costs in the long run, making high-performance computing more accessible for AI research and development, all while promoting eco-friendly practices.
Stability AI’s SV3D turns a single photo into a 3D video
Stability AI released Stable Video 3D (SV3D), a new generative AI tool for rendering 3D videos. SV3D can create multi-view 3D models from a single image, allowing users to see an object from any angle. This technology is expected to be valuable in the gaming sector for creating 3D assets and in e-commerce for generating 360-degree product views.
SV3D builds upon Stability AI’s previous Stable Video Diffusion model. Unlike prior methods, SV3D can generate consistent views from any given angle. It also optimizes 3D meshes directly from the novel views it produces.
SV3D comes in two variants: SV3D_u generates orbital videos from single images, and SV3D_p creates 3D videos along specified camera paths.
Why does this matter?
SV3D represents a significant leap in generative AI for 3D content. Its ability to create 3D models and videos from a single image could open up possibilities in various fields, such as animation, virtual reality, and scientific modeling.
OpenAI CEO hints at “Amazing Model,” maybe ChatGPT-5
OpenAI CEO Sam Altman has announced that the company will release an “amazing model” in 2024, although the name has not been finalized. Altman also mentioned that OpenAI plans to release several other important projects before discussing GPT-5, one of which could be the Sora video model.
Altman declined to comment on the Q* project, which is rumored to be an AI breakthrough related to logic. He also expressed his opinion that GPT-4 Turbo and GPT-4 “kind of suck” and that the jump from GPT-4 to GPT-5 could be as significant as the improvement from GPT-3 to GPT-4.
Why does this matter?
This could mean that after Google Gemini and Claude-3’s latest version, a new model, possibly ChatGPT-5, could be released in 2024. Altman’s candid remarks about the current state of AI models also offer valuable context for understanding the anticipated advancements and challenges in the field.
Project GR00T is an ambitious initiative aiming to develop a general-purpose foundation model for humanoid robot learning, addressing embodied AGI challenges. Collaborating with leading humanoid companies worldwide, GR00T aims to understand multimodal instructions and perform various tasks.
GROOT is a foundational model that takes language, videos, and example demonstrations as inputs so it can produce the next action.
What the heck does that mean?
➡️ It means you can show it how to do X a few times, and then it can do X on its own.
Google’s new fine-tuned model is a HUGE improvement, AI is coming for human doctors sooner than most believe.
NVIDIA creates Earth-2 digital twin: generative AI to simulate, visualize weather and climate. Source
What Else Is Happening in AI on March 19th, 2024
Apple is in talks to bring Google’s AI to iPhones
Apple and Google are negotiating a deal to integrate Google’s Gemini AI into iPhones, potentially shaking up the AI industry. The deal would expand on their existing search partnership. Apple also held discussions with OpenAI. If successful, the partnership could give Gemini a significant edge with billions of potential users. (Link)
YouTube rolls out AI content labels
YouTube now requires creators to self-label AI-generated or synthetic content in videos. The platform may add labels itself for potentially misleading content. However, the tool relies on creators being honest, as YouTube is still working on AI detection tools. (Link)
Roblox speeds up 3D creation with AI tools Roblox has introduced two AI-driven tools to streamline 3D content creation on its platform. Avatar Auto Setup automates the conversion of 3D body meshes into fully animated avatars, while Texture Generator allows creators to quickly alter the appearance of 3D objects using text prompts, enabling rapid prototyping and iteration.(Link)
Nvidia teams up with Shutterstock and Getty Images for AI-generated 3D content
Nvidia’s Edify AI can now create 3D content, and partnerships with Shutterstock and Getty Images will make it accessible to all. Developers can soon experiment with these models, while industry giants are already using them to create stunning visuals and experiences. (Link)
Adobe Substance 3D introduces AI-powered text-to-texture tools
Adobe has introduced two AI-driven features to its Substance 3D suite: “Text to Texture,” which generates photo-realistic or stylized textures from text prompts, and “Generative Background,” which creates background images for 3D scenes. Both tools use 2D imaging technology from Adobe’s Firefly AI model to streamline 3D workflows. (Link)
A daily chronicle of AI Innovations: March 18th, 2024 – Bernie’s 4 day workweek: less work, same pay – Google’s AI brings photos to life as talking avatars – Elon Musk’s xAI open-sources Grok AI
Bernie’s 4 day workweek: less work, same pay
Sen. Bernie Sanders has introduced the Thirty-Two Hour Workweek Act, which aims to establish a four-day workweek in the United States without reducing pay or benefits. To be phased in over four years, the bill would lower the overtime pay threshold from 40 to 32 hours, ensuring that workers receive 1.5 times their regular salary for work days longer than 8 hours and double their regular wage for work days longer than 12 hours.
Sanders, along with Sen. Laphonza Butler and Rep. Mark Takano, believes that this bill is crucial in ensuring that workers benefit from the massive increase in productivity driven by AI, automation, and new technology. The legislation aims to reduce stress levels and improve Americans’ quality of life while also protecting their wages and benefits.
Why does this matter?
This bill could alter the workforce dynamics. Businesses may need to assess staffing and invest in AI to maintain productivity. While AI may raise concerns over job displacements, it also offers opportunities for better work-life balance through efficiency gains by augmenting human capabilities.
Google’s AI brings photos to life as talking avatars
Google’s latest AI research project VLOGGER, automatically generates realistic videos of talking and moving people from just a single image and an audio or text input. It is the first model that aims to create more natural interactions with virtual agents by including facial expressions, body movements, and gestures, going beyond simple lip-syncing.
It uses a two-step process: first, a diffusion-based network predicts body motion and facial expressions based on the audio, and then a novel architecture based on image diffusion models generates the final video while maintaining temporal consistency. VLOGGER outperforms previous state-of-the-art methods in terms of image quality, diversity, and the range of scenarios it can handle.
Why does this matter?
VLOGGER’s flexibility and applications could benefit remote work, education, and social interaction, making them more inclusive and accessible. Also, as AR/VR technologies advance, VLOGGER’s avatars could create emotionally resonant experiences in gaming, entertainment, and professional training scenarios.
Elon Musk’s xAI has open-sourced the base model weights and architecture of its AI chatbot, Grok. This allows researchers and developers to freely use and build upon the 314 billion parameter Mixture-of-Experts model. Released under the Apache 2.0 license, the open-source version is not fine-tuned for any particular task.
Why does this matter?
This move aligns with Musk’s criticism of companies that don’t open-source their AI models, including OpenAI, which he is currently suing for allegedly breaching an agreement to remain open-source. While several fully open-source AI models are available, the most used ones are closed-source or offer limited open licenses.
Maisa has released the beta version of its Knowledge Processing Unit (KPU), an AI system that uses LLMs’ advanced reasoning and data processing abilities. In an impressive demo, the KPU assisted a customer with an order-related issue, even when the customer provided an incorrect order ID, showing the system’s understanding abilities. (Link)
PepsiCo increases market domination using GenAI
PepsiCo uses GenAI in product development and marketing for faster launches and better profitability. It has increased market penetration by 15% by using GenAI to improve the taste and shape of products like Cheetos based on customer feedback. The company is also doubling down on its presence in India, with plans to open a third capability center to develop local talent. (Link)
Deci launches Nano LLM & GenAI dev platform
Israeli AI startup Deci has launched two major offerings: Deci-Nano, a small closed-source language model, and a complete Generative AI Development Platform for enterprises. Compared to rivals like OpenAI and Anthropic, Deci-Nano offers impressive performance at low cost, and the new platform offers a suite of tools to help businesses deploy and manage AI solutions. (Link)
Invoke AI simplifies game dev workflows
Invoke has launched Workflows, a set of AI tools designed for game developers and large studios. These tools make it easier for teams to adopt AI, regardless of their technical expertise levels. Workflows allow artists to use AI features while maintaining control over their training assets, brand-specific styles, and image security. (Link)
Mercedes teams up with Apptronik for robot workers
Mercedes-Benz is collaborating with robotics company Apptronik to automate repetitive and physically demanding tasks in its manufacturing process. The automaker is currently testing Apptronik’s Apollo robot, a 160-pound bipedal machine capable of lifting objects up to 55 pounds. The robot inspects and delivers components to human workers on the production line, reducing the physical strain on employees and increasing efficiency. (Link)
A daily chronicle of AI Innovations: Week 2 Recap
DeepSeek released DeepSeek-VL, an open-source Vision-Language (VL) model designed for real-world vision and language understanding applications. The DeepSeek-VL family, includes 7B and1.3B base and chat models and achieves state-of-the-art or competitive performance across a wide range of visual-language benchmarks. Free for commercial use [Details | Hugging Face | Demo]
Cohere released Command-R, a 35 billion parameters generative model with open weights, optimized for long context tasks such as retrieval augmented generation (RAG) and using external APIs and tools for production-scale AI for enterprise [Details|Hugging Face].
Google DeepMind introduced SIMA (Scalable Instructable Multiworld Agent), a generalist AI agent for 3D virtual environments, trained on nine different video games. It can understand a broad range of gaming worlds, and follows natural-language instructions to carry out tasks within them, as a human might. It doesn’t need access to a game’s source code or APIs and requires only the images on screen, and natural-language instructions provided by the user. SIMA uses keyboard and mouse outputs to control the games’ central character to carry out these instructions [Details].
Meta AI introduces Emu Video Edit (EVE), a model that establishes a new state-of-the art in video editing without relying on any supervised video editing data [Details].
Cognition Labs introduced Devin, the first fully autonomous AI software engineer. Devin can learn how to use unfamiliar technologies, can build and deploy apps end to end, can train and fine tune its own AI models. When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted [Details].
Pika Labs adds sound effects to its AI video tool, Pika, allowing users to either prompt desired sounds or automatically generate them based on video content. [Video link].
Anthropic’s Claude 3 Opus ranks #1 on LMSYS Chatbot Arena Leaderboard, along with GPT-4 [Link].
The European Parliament approved the Artificial Intelligence Act. The new rules ban certain AI applications including biometric categorisation systems, Emotion recognition in the workplace and schools, social scoring and more [Details].
Huawei Noah’s Ark Lab introduced PixArt–Σ, a Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. It achieves superior image quality and user prompt adherence with significantly smaller model size (0.6B parameters) than existing text-to-image diffusion models, such as SDXL (2.6B parameters) and SD Cascade (5.1B parameters) [Details].
South Korean startup Hyodol AI has launched a $1,800 LLM-powered companion doll specifically designed to offer emotional support and companionship to the rapidly expanding elderly demographic in the country [Details].
Covariant introduced RFM-1 (Robotics Foundation Model -1), a large language model (LLM), but for robot language. Set up as a multimodal any-to-any sequence model, RFM-1 is an 8 billion parameter transformer trained on text, images, videos, robot actions, and a range of numerical sensor readings [Details].
Figure 01 robot integrated with an OpenAI vision-language model can now have full conversations with people [Link]
Deepgram announced the general availability of Aura, a text-to-speech model built for responsive, conversational AI agents and applications [Details | Demo].
Claude 3 Haiku model is now available alongside Sonnet and Opus in the Claude API and on claude.ai for Pro subscribers. Haiku outperforms GPT-3.5 and Gemini 1.0 pro while costing less, and is three times faster than its peers for the vast majority of workloads [Details].
Paddle announced AI Launchpad, a 6-week remote program for AI founders to launch and scale an AI business with $20,000 in cash prize [Details].
Midjourney adds feature for generating consistent characters across multiple gen AI images [Details].
The Special Committee of the OpenAI Board announced the completion of the review. Altman, Brockman to continue to lead OpenAI [Details]
Together.ai introduced Sequoia, a scalable, robust, and hardware-aware speculative decoding framework that improves LLM inference speed on consumer GPUs (with offloading), as well as on high-end GPUs (on-chip), without any approximations [Details].
OpenAI released Transformer Debugger (TDB), a tool developed and used internally by OpenAI’s Superalignment team for investigating into specific behaviors of small language models [GitHub].
Elon Musk announced that xAI will open source Grok this week [Link].
A Daily Chronicle of AI Innovations – March 16th, 2024:
Reddit is under investigation by the FTC for its data licensing practices concerning user-generated content being used to train AI models.
The investigation focuses on Reddit’s engagement in selling, licensing, or sharing data with third parties for AI training.
Reddit anticipates generating approximately USD 60 million in 2024 from a data licensing agreement with Google, aiming to leverage its platform data for training LLMs
Researchers identified a new vulnerability in leading AI language models, named ArtPrompt, which uses ASCII art to exploit the models’ security mechanisms.
ArtPrompt masks security-sensitive words with ASCII art, fooling language models like GPT-3.5, GPT-4, Gemini, Claude, and Llama2 into performing actions they would otherwise block, such as giving instructions for making a bomb.
The study underscores the need for enhanced defensive measures for language models, as ArtPrompt, by leveraging a mix of text-based and image-based inputs, can effectively bypass current security protocols.
OpenAI aims to make its own AI processors — chip venture in talks with Abu Dhabi investment firm. Source
Once “too scary” to release, GPT-2 gets squeezed into an Excel spreadsheet. Source
A Daily Chronicle of AI Innovations – March 15th, 2024:
Apple quietly acquires another AI startup
Mercedes tests humanoid robots for ‘low skill, repetitive’ tasks
Midjourney bans prompts with Joe Biden and Donald Trump over election misinformation concerns
El Salvador stashes $406 million in bitcoin in ‘cold wallet’
Microsoft calls out Google dominance in generative AI
Anthropic releases affordable, high-speed Claude 3 Haiku model
Apple’s MM1 AI model shows state-of-the-art language and vision capabilities. It was trained on a filtered dataset of 500 million text-image pairs from the web, including 10% text-only docs to improve language understanding.
The team experimented with different configurations during training. They discovered that using an external pre-trained high-resolution image encoder improved visual recognition. Combining different image, text, and caption data ratios led to the best performance. Synthetic caption data also enhanced few-shot learning abilities.
This experiment cements that using a blend of image caption, interleaved image text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks.
Why does it matter?
Apple’s new model is promising, especially in developing image recognition systems for new categories or domains. This will help businesses and startups improve the speed of AI tool development specifically for text-to-image, document analysis, and enhanced visual recognition.
Cerebras Systems has made a groundbreaking announcement unveiling its latest wafer-scale AI chip, the WSE-3. This chip boasts an incredible 4 trillion transistors, making it one of the most powerful AI chips on the market. The third-generation wafer-scale AI mega chip is twice as powerful as its predecessor while being power efficient.
The chip’s transistor density has increased by over 50 percent thanks to the latest manufacturing technology. One of the most remarkable features of the WSE-3 chip is its ability to enable AI models that are ten times larger than the highly acclaimed GPT-4 and Gemini models.
Why does it matter?
The WSE-3 chip opens up new possibilities for tackling complex problems and pushing the boundaries of AI capabilities. This powerful system can train massive language models, such as the Llama 70B, in just one day. It will help enterprises create custom LLMs, rapidly reducing the time-to-market.
Apple made a significant acquisition earlier this year by purchasing Canadian AI startup DarwinAI. Integrating DarwinAI’s expertise and technology bolsters Apple’s AI initiatives.
With this acquisition, Apple aims to tap into DarwinAI’s advancements in AI technology, particularly in visual inspection during manufacturing and making AI systems smaller and faster. Leveraging DarwinAI’s technology, Apple aims to run AI on devices rather than relying solely on cloud-based solutions.
Why does it matter?
Apple’s acquisition of DarwinAI is a strategic move to revolutionize features and enhance its AI capabilities across various products and services. Especially with the iOS 18 release around the corner, this acquisition will help create new features and enhance the user experience.
Microsoft is expanding Copilot, its AI assistant, with the introduction of the Copilot Pro subscription for individuals, the availability of Copilot for Microsoft 365 to small and medium-sized businesses, and removing seat minimums for commercial plans. Copilot aims to enhance creativity, productivity, and skills across work and personal life, providing users access to the latest AI models and improved image creation
Oracle has added advanced AI capabilities to its finance and supply chain software suite, aimed at improving decision-making and enhancing customer and employee experience. For instance, Oracle Fusion Cloud SCM includes features such as item description generation, supplier recommendations, and negotiation summaries.
Databricks has invested in Mistral AI and integrated its AI models into its data intelligence platform, allowing users to customize and consume models in various ways. The integration includes Mistral’s text-generation models, such as Mistral 7B and Mixtral 8x7B, which support multiple languages. This partnership aims to provide Databricks customers with advanced capabilities to leverage AI models and drive innovation in their data-driven applications.
Qualcomm has solidified its leadership position in mobile artificial intelligence (AI). It has been developing AI hardware and software for over a decade. Their Snapdragon processors are equipped with specialized AI engines like Hexagon DSP, ensuring efficient AI and machine learning processing without needing to send data to the cloud.
AI researchers are developing techniques to simulate peripheral vision and improve object detection in the periphery. They created a new dataset to train computer vision models, which led to better object detection outside the direct line of sight, though still behind human capabilities. A modified texture tiling approach accurately representing information loss in peripheral vision significantly enhanced object detection and recognition abilities.
Microsoft has expressed concerns to EU antitrust regulators about Google’s dominance in generative AI, highlighting Google’s unique position due to its vast data sets and vertical integration, which includes AI chips and platforms like YouTube.
The company argues that Google’s control over vast resources and its own AI developments give it a competitive advantage, making it difficult for competitors to match, especially in the development of Large Language Models like Gemini.
Microsoft defends partnerships with startups like OpenAI as essential for innovation and competition in the AI market, countering regulatory concerns about potential anticompetitive advantages arising from such collaborations.
Mercedes-Benz is testing humanoid robots, specifically Apptronik’s bipedal robot Apollo, for automating manual labor tasks in manufacturing.
The trial aims to explore the use of Apollo in physically demanding, repetitive tasks within existing manufacturing facilities without the need for significant redesigns.
The initiative seeks to address labor shortages by using robots for low-skill tasks, allowing highly skilled workers to focus on more complex aspects of car production.
Midjourney, an AI image generator, has banned prompts containing the names of Joe Biden and Donald Trump to avoid the spread of election misinformation.
The policy change is in response to concerns over AI’s potential to influence voters and spread false information before the 2024 presidential election.
Despite the new ban, Midjourney previously allowed prompts that could generate misleading or harmful content, and it was noted for its poor performance in controlling election disinformation.
Midjourney introduces Character Consistency: Tutorial
A Daily Chronicle of AI Innovations – March 14th, 2024:
DeepMind’s SIMA: The AI agent that’s a Jack of all games
Claude 3 Haiku: Anthropic’s lightning-fast AI solution for enterprises
OpenAI-powered “Figure 01” can chat, perceive, and complete tasks
OpenAI’s Sora will be publicly available later this year
DeepMind’s SIMA: The AI agent that’s a Jack of all games
DeepMind has introduced SIMA (Scalable Instructable Multiworld Agent), a generalist AI agent that can understand and follow natural language instructions to complete tasks across video game environments. Trained in collaboration with eight game studios on nine different games, SIMA marks a significant milestone in game-playing AI by showing the ability to generalize learned skills to new gaming worlds without requiring access to game code or APIs.
(SIMA comprises pre-trained vision models, and a main model that includes a memory and outputs keyboard and mouse actions.)
SIMA was evaluated on 600 basic skills, including navigation, object interaction, and menu use. In tests, SIMA agents trained on multiple games significantly outperformed specialized agents trained on individual games. Notably, an agent trained on all but one game performed nearly as well on the unseen game as an agent specifically trained on it, showcasing SIMA’s remarkable ability to generalize to new environments.
Why does this matter?
SIMA’s generalization ability using a single AI agent is a significant milestone in transfer learning. By showing that a multi-task trained agent can perform nearly as well on an unseen task as a specialized agent, SIMA paves the way for more versatile and scalable AI systems. This could lead to faster deployment of AI in real-world applications, as agents would require less task-specific training data and could adapt to new scenarios more quickly.
Claude 3 Haiku: Anthropic’s lightning-fast AI solution for enterprises
Anthropic has released Claude 3 Haiku, their fastest and most affordable AI model. With impressive vision capabilities and strong performance on industry benchmarks, Haiku is designed to tackle a wide range of enterprise applications. The model’s speed – processing 21K tokens per second for prompts under 32K tokens – and cost-effective pricing model make it an attractive choice for businesses needing to analyze large datasets and generate timely outputs.
In addition to its speed and affordability, Claude 3 Haiku prioritizes enterprise-grade security and robustness. The model is now available through Anthropic’s API or on claude.ai for Claude Pro subscribers.
Why does this matter?
Claude 3 Haiku sets a new benchmark for enterprise AI by offering high speed and cost-efficiency without compromising performance. This release will likely intensify competition among AI providers, making advanced AI solutions more accessible to businesses of all sizes. As more companies adopt models like Haiku, we expect a surge in AI-driven productivity and decision-making across industries.
OpenAI-powered “Figure 01” can chat, perceive, and complete tasks
Robotics company Figure, in collaboration with OpenAI, has developed a groundbreaking robot called “Figure 01” that can engage in full conversations, perceive its surroundings, plan actions, and execute tasks based on verbal requests, even those that are ambiguous or context-dependent. This is made possible by connecting the robot to a multimodal AI model trained by OpenAI, which integrates language and vision.
The AI model processes the robot’s entire conversation history, including images, enabling it to generate appropriate verbal responses and select the most suitable learned behaviors to carry out given commands. The robot’s actions are controlled by visuomotor transformers that convert visual input into precise physical movements. “Figure 01” successfully integrates natural language interaction, visual perception, reasoning, and dexterous manipulation in a single robot platform.
Why does this matter?
As robots become more adept at understanding and responding to human language, questions arise about their autonomy and potential impact on humanity. Collaboration between the robotics industry and AI policymakers is needed to establish regulations for the safe deployment of AI-powered robots. If deployed safely, these robots could become trusted partners, enhancing productivity, safety, and quality of life in various domains.
Amazon streamlines product listing process with new AI tool
Amazon is introducing a new AI feature for sellers to quickly create product pages by pasting a link from their external website. The AI generates product descriptions and images based on the linked site’s information, saving sellers time. (Link)
Microsoft to expand AI-powered cybersecurity tool availability from April 1
Microsoft is expanding the availability of its AI-powered cybersecurity tool, “Security Copilot,” from April 1, 2024. The tool helps with tasks like summarizing incidents, analyzing vulnerabilities, and sharing information. Microsoft plans to adopt a ‘pay-as-you-go’ pricing model to reduce entry barriers. (Link)
OpenAI’s Sora will be publicly available later this year
OpenAI will release Sora, its text-to-video AI tool, to the public later this year. Sora generates realistic video scenes from text prompts and may add audio capabilities in the future. OpenAI plans to offer Sora at a cost similar to DALL-E, its text-to-image model, and is developing features for users to edit the AI-generated content. (Link)
OpenAI partners with Le Monde, Prisa Media for news content in ChatGPT
OpenAI has announced partnerships with French newspaper Le Monde and Spanish media group Prisa Media to provide their news content to users of ChatGPT. The media companies see this as a way to ensure reliable information reaches AI users while safeguarding their journalistic integrity and revenue. (Link)
Icon’s AI architect and 3D printing breakthroughs reimagine homebuilding
Construction tech startup Icon has introduced an AI-powered architect, Vitruvius, that engages users in designing their dream homes, offering 3D-printed and conventional options. The company also debuted an advanced 3D printing robot called Phoenix and a low-carbon concrete mix as part of its mission to make homebuilding more affordable, efficient, and sustainable. (Link)
A Daily Chronicle of AI Innovations – March 13th, 2024: Devin: The first AI software engineer redefines coding; Deepgram’s Aura empowers AI agents with authentic voices; Meta introduces two 24K GPU clusters to train Llama 3
Devin: The first AI software engineer redefines coding
In the most groundbreaking development, the US-based startup Cognition AI has unveiled Devin, the world’s first AI software engineer. It is an autonomous agent that solves engineering tasks using its shell or command prompt, code editor, and web browser. Devin can also perform tasks like planning, coding, debugging, and deploying projects autonomously.
When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted. It has successfully passed practical engineering interviews with leading AI companies and even completed real Upwork jobs.
Why does it matter?
There’s already a huge debate if Devin will replace software engineers. However, most production-grade software is too complex, unique, or domain-specific to be fully automated at this point. Perhaps, Devin could start handling more initial-level tasks in development. More so, it can assist developers in quickly prototyping, bootstrapping, and autonomously launching MVP for smaller apps and websites, for now
Deepgram’s Aura empowers AI agents with authentic voices
Deepgram, a top voice recognition startup, just released Aura, its new real-time text-to-speech model. It’s the first text-to-speech model built for responsive, conversational AI agents and applications. Companies can use these agents for customer service in call centers and other customer-facing roles.
Aura includes a dozen natural, human-like voices with lower latency than any comparable voice AI alternative and is already being used in production by several customers. Aura works hand in hand with Deepgram’s Nova-2 speech-to-text API. Nova-2 is known for its top-notch accuracy and speed in transcribing audio streams.
Why does it matter?
Deepgram’s Aura is a one-stop shop for speech recognition and voice generation APIs that enable the fastest response times and most natural-sounding conversational flow. Its human-like voice models render extremely fast (typically in well under half a second) and at an affordable price ($0.015 per 1,000 characters). Lastly, Deepgram’s transcription is more accurate and faster than other solutions as well.
Meta introduces two 24K GPU clusters to train Llama 3
Meta has invested significantly in its AI infrastructure by introducing two 24k GPU clusters. These clusters, built on top of Grand Teton, OpenRack, and PyTorch, are designed to support various AI workloads, including the training of Llama 3.
Meta aims to expand its infrastructure build-out by the end of 2024. It plans to include 350,000 NVIDIA H100 GPUs, providing compute power equivalent to nearly 600,000 H100s. The clusters are built with a focus on researcher and developer experience.
This adds up to Meta’s long-term vision to build open and responsibly developed artificial general intelligence (AGI). These clusters enable the development of advanced AI models and power applications such as computer vision, NLP, speech recognition, and image generation.
Why does it matter?
Meta is committed to open compute and open source, driving innovation in the AI software and hardware industry. Introducing two new GPUs to train Llama 3 is also a push forward to their commitment. As a founding member of Open Hardware Innovation (OHI) and the Open Innovation AI Research Community, Meta wants to make AI transparent and trustworthy.
Google Play to display AI-powered FAQs and recent YouTube videos for games
At the Google for Games Developer Summit held in San Francisco, Google announced several new features for ‘Google Play listing for games’. These include AI-powered FAQs, displaying the latest YouTube videos, new immersive ad formats, and support for native PC game publishing. These new features will allow developers to display promotions and the latest YouTube videos directly in their listing and show them to users in the Games tab of the Play Store. (Link)
DoorDash’s new AI-powered tool automatically curbs verbal abuses
DoorDash has introduced a new AI-powered tool named ‘SafeChat+’ to review in-app conversations and determine if a customer or Dasher is being harassed. There will be an option to report the incident and either contact DoorDash’s support team if you’re a customer or quickly cancel the order if you’re a delivery person. With this feature, DoorDash aims to reduce verbally abusive and inappropriate interactions between consumers and delivery people. (Link)
Perplexity has decided to bring Yelp data to its chatbot
Perplexity has decided to bring Yelp data to its chatbot. The company CEO, Aravind Srinivas, told the media that many people use chatbots like search engines. He added that it makes sense to offer information on things they look for, like restaurants, directly from the source. That’s why they have decided to integrate Yelp’s maps, reviews, and other details in responses when people ask for restaurant or cafe recommendations. (Link)
Pinterest’s ‘body types ranges’ tool delivers more inclusive search results
Pinterest has introduced a new tool named body type ranges, which gives users a choice to self-select body types from a visual cue between four body type ranges to deliver personalized and more refined search results for women’s fashion and wedding inspiration. This tool aims to create a more inclusive place online to search, save, and shop. The company also plans to launch a similar feature for men’s fashion later this year. (Link)
OpenAI’s GPT-4.5 Turbo is all set to be launched in June 2024
According to the leak search engine results from Bing and DuckDuck Go, which indexed the OpenAI GPT-4.5 Turbo product page before an official announcement, OpenAI is all set to launch the new version of its LLM by June 2024. There is a discussion among the AI community that this could be OpenAI’s fastest, most accurate, and most scalable model to date. The details of GPT-4.5 Turbo were leaked by OpenAI’s web team, which now leads to a 404 page. (Link))
A Daily Chronicle of AI Innovations in March 2024 – Day 12: AI Daily News – March 12th, 2024
Cohere’s introduces production-scale AI for enterprises RFM-1 redefines robotics with human-like reasoning Spotify introduces audiobook recommendations
Midjourney bans all its competitor’s employees
Google restricts election-related queries for its Gemini chatbot
Apple to let developers distribute apps directly from their websites
AI startups reach record funding of nearly $50 billion in 2023
Cohere’s introduces production-scale AI for enterprises
Cohere, an AI company, has introduced Command-R, a new large language model (LLM) designed to address real-world challenges, such as inefficient workflows, data analysis limitations, slow response times, etc.
Command-R focuses on two key areas: Retrieval Augmented Generation (RAG) and Tool Use. RAG allows the model to access and process information from private databases, improving the accuracy of its responses. Tool Use allows Command-R to interact with external software tools and APIs, automating complex tasks.
Command-R offers several features beneficial for businesses, including:
Multilingual capabilities: Supports 10 major languages
Cost-effectiveness: Offers a longer context window and reduced pricing compared to previous models
Wider accessibility: Available through Cohere’s API, major cloud providers, and free weights for research on HuggingFace
Overall, it empowers businesses to leverage AI for improved decision-making, increased productivity, and enhanced customer experiences.
Why does this matter?
Command-R showcases the future of business operations, featuring automated workflows, and enabling humans to focus on strategic work. Thanks to its low hallucination rate, we would see a wider adoption of AI technologies, and the development of sophisticated, context-aware AI applications tailored to specific business needs.
As AI continues to evolve and mature, models like Command-R will shape the future of work and the global economy.
RFM-1 redefines robotics with human-like reasoning
Covariant has introduced RFM-1, a Robotics Foundation Model that gives robots ChatGPT-like understanding and reasoning capabilities.
TLDR;
RFM-1 is an 8 billion parameter transformer trained on text, images, videos, robot actions, and sensor readings from Covariant’s fleet of high-performing robotic systems deployed in real-world environments.
Similar to how we understand how objects move, RFM-1 can predict future outcomes/consequences based on initial images and robot actions.
RFM-1 leverages NLP to enable intuitive interfaces for programming robot behavior. Operators can instruct robots using plain English, lowering barriers to customizing AI behavior for specific needs.
RFM-1 can also communicate issues and suggest solutions to operators.
Why does this matter?
This advancement has the potential to revolutionize industries such as manufacturing, logistics, and healthcare, where robots can work alongside humans to improve efficiency, safety, and productivity.
Spotify has introduced a novel recommendation system called 2T-HGNN to provide personalized audiobook recommendations to its users. The system addresses the challenges of introducing a new content type (audiobooks) into an existing platform, such as data sparsity and the need for scalability.
2T-HGNN leverages a technique called “Heterogeneous Graph Neural Networks” (HGNNs) to uncover connections between different content types. Additionally, a “Two Tower” (2T) model helps ensure that recommendations are made quickly and efficiently for millions of users.
Interestingly, the system also uses podcast consumption data and weak interaction signals to uncover user preferences and predict future audiobook engagement.
Why does this matter?
This research will not only improve the user experience but also encourage users to explore and engage with audiobooks, potentially driving growth in this new content vertical. Moreover, it may inspire similar strategies in domains where tailored recommendations are essential, such as e-commerce, news, and entertainment.
Elon Musk announced that his AI startup xAI will open-source its ChatGPT rival “Grok” this week, following a lawsuit against OpenAI for shifting to a for-profit model. Musk aims to provide free access to Grok’s code, aligning with open-source AI models like Meta and Mistral (Link)
Midjourney launches character consistent feature
Midjourney’s new “Consistent Character” feature lets artists create consistent characters across images. Users provide a reference image URL with their prompt, and the AI attempts to match the character’s features in new scenes. This holds promise for creators of comics, storyboards, and other visual narratives. (Link)
Apple tests AI for App Store ad optimization Taking a page from Google and Meta, Apple is testing AI-powered ad placement within its App Store. This new system would automatically choose the most suitable locations (e.g., App Store Today page) to display ads based on advertiser goals and budget. This development could help Apple’s ad business reach $6 billion by 2025.(Link)
China tests AI chatbot to assist neurosurgeons
China steps into the future of brain surgery with an AI co-pilot, dubbed “CARES Copilot”. This AI, based on Meta’s Llama 2.0, assists surgeons by analyzing medical data (e.g., scans) and offering informed suggestions during surgery. This government-backed project reflects China’s growing focus on developing domestic AI solutions for various sectors, including healthcare. (Link)
South Korea deploys AI dolls to tackle elderly loneliness
Hyodol, a Korean-based company, has introduced an AI-powered companion doll to tackle loneliness among elderly. Priced at $1800, the robot doll boasts advanced features like conversation abilities, medication reminders, and safety alerts. With 7,000 dolls already deployed, Hyodol aims to expand to European and North American markets. (Link)
Midjourney bans all its competitor’s employees
Midjourney banned all Stability AI employees from using its service, citing a systems outage caused by data scraping efforts linked to Stability AI employees.
The company announced the ban and a new policy against “aggressive automation” after identifying botnet-like activity from Stability AI during a server outage.
Stability AI CEO Emad Mostaque is looking into the incident, and Midjourney’s founder David Holz has provided information for the internal investigation.
Google restricts election-related queries for its Gemini chatbot
Google has begun restricting Gemini queries related to elections globally in countries where elections are taking place, to prevent the dissemination of false or misleading information.
The restrictions were implemented amid concerns over generative AI’s potential impact on elections and followed an advisory from India requiring tech firms to obtain government permission before introducing new AI models.
Despite the restrictions, the effectiveness of the restrictions is under question as some users found ways to bypass them, and it’s uncertain if Google will lift these restrictions post-elections.
AI startups reach record funding of nearly $50 billion in 2023
AI startups reached a record funding of nearly $50 billion in 2023, with significant contributions from companies like OpenAI and Anthropic.
Investment trends showed over 70 funding rounds exceeding $100 million each, partly due to major companies’ investments, including Microsoft’s $10 billion in OpenAI.
While large tech companies are venturing to dominate the AI market, specialized AI startups like Midjourney manage to maintain niches by offering superior products.
A Daily Chronicle of AI Innovations in March 2024 – Day 11: AI Daily News – March 11th, 2024
Huawei’s PixArt-Σ paints prompts to perfection Meta cracks the code to improve LLM reasoning Yi Models exceed benchmarks with refined data
Huawei’s PixArt-Σ paints prompts to perfection
Researchers from Huawei’s Noah’s Ark Lab introduced PixArt-Σ, a text-to-image model that can create 4K resolution images with impressive accuracy in following prompts. Despite having significantly fewer parameters than models like SDXL, PixArt-Σ outperforms them in image quality and prompt matching.
The model uses a “weak-to-strong” training strategy and efficient token compression to reduce computational requirements. It relies on carefully curated training data with high-resolution images and accurate descriptions, enabling it to generate detailed 4K images closely matching the text prompts. The researchers claim that PixArt-Σ can even keep up with commercial alternatives such as Adobe Firefly 2, Google Imagen 2, OpenAI DALL-E 3, and Midjourney v6.
Why does this matter?
PixArt-Σ’s ability to generate high-resolution, photorealistic images accurately could impact industries like advertising, media, and entertainment. As its efficient approach requires fewer computational resources than existing models, businesses may find it easier and more cost-effective to create custom visuals for their products or services.
Meta researchers investigated using reinforcement learning (RL) to improve the reasoning abilities of large language models (LLMs). They compared algorithms like Proximal Policy Optimization (PPO) and Expert Iteration (EI) and found that the simple EI method was particularly effective, enabling models to outperform fine-tuned models by nearly 10% after several training iterations.
However, the study also revealed that the tested RL methods have limitations in further improving LLMs’ logical capabilities. The researchers suggest that stronger exploration techniques, such as Tree of Thoughts, XOT, or combining LLMs with evolutionary algorithms, are important for achieving greater progress in reasoning performance.
Why does this matter?
Meta’s research highlights the potential of RL in improving LLMs’ logical abilities. This could lead to more accurate and efficient AI for domains like scientific research, financial analysis, and strategic decision-making. By focusing on techniques that encourage LLMs to discover novel solutions and approaches, researchers can make more advanced AI systems.
01.AI has introduced the Yi model family, a series of language and multimodal models that showcase impressive multidimensional abilities. The Yi models, based on 6B and 34B pretrained language models, have been extended to include chat models, 200K long context models, depth-upscaled models, and vision-language models.
The performance of the Yi models can be attributed to the high-quality data resulting from 01.AI‘s data-engineering efforts. By constructing a massive 3.1 trillion token dataset of English and Chinese corpora and meticulously polishing a small-scale instruction dataset, 01.AI has created a solid foundation for their models. The company believes that scaling up model parameters using thoroughly optimized data will lead to even more powerful models.
Why does this matter?
The Yi models’ success in language, vision, and multimodal tasks suggests that they could be adapted to a wide range of applications, from customer service chatbots to content moderation and beyond. These models also serve as a prime example of how investing in data optimization can lead to groundbreaking advancements in the field.
OpenAI’s Evolution into Skynet: AI and Robotics Future, Figure Humanoid Robots
OpenAI’s partnership with Figure signifies a transformative step in the evolution of AI and robotics.
Utilizing Microsoft Azure, OpenAI’s investment supports the deployment of autonomous humanoid robots for commercial use.
Figure’s collaboration with BMW Manufacturing integrates humanoid robots to enhance automotive production.
This technological progression echoes the fictional superintelligence Skynet yet emphasizes real-world innovation and safety.
The industry valuation of Figure at $2.6 billion underlines the significant impact and potential of advanced AI in commercial sectors.
What Else Is Happening in AI on March 11, 2024
Redfin’s AI can tell you about your dream neighborhood
“Ask Redfin” can now answer questions about homes, neighborhoods, and more. Using LLMss, the chatbot can provide insights on air conditioning, home prices, safety, and even connect users to agents. It is currently available in 12 U.S. cities, including Atlanta, Boston, Chicago, and Washington, D.C. (Link)
Pika Labs Adds Sound to Silent AI Videos
Pika Labs users can now add sound effects to their generated videos. Users can either specify the exact sounds they want or let Pika’s AI automatically select and integrate them based on the video’s content. This update aims to provide a more immersive and engaging video creation experience, setting a new standard in the industry. (Link)
Salesforce’s new AI tool for doctors automates paperwork
Salesforce is launching new AI tools to help healthcare workers automate tedious administrative tasks. Einstein Copilot: Health Actions will allow doctors to book appointments, summarize patient info, and send referrals using conversational AI, while Assessment Generation will digitize health assessments without manual typing or coding. (Link)
HP’s new AI-powered PCs redefine work
HP just dropped a massive lineup of AI-powered PCs, including the HP Elite series, Z by HP mobile workstations, and Poly Studio conferencing solutions. These devices use AI to improve productivity, creativity, and collaboration for the hybrid workforce, while also offering advanced security features like protection against quantum computer hacks. (Link)
DALL-E 3’s new look is artsy and user-friendly
OpenAI is testing a new user interface for DALL-E 3. It allows users to choose between predefined styles and aspect ratios directly in the GPT, offering a more intuitive and educational experience. OpenAI has also implemented the C2PA standard for metadata verification and is working on an image classifier to reliably recognize DALL-E images. (Link)
A Daily Chronicle of AI Innovations in March 2024 – Week 1 Summary
Anthropic introduced the next generation of Claude: Claude 3 model family. It includes Opus, Sonnet and Haiku models. Opus is the most intelligent model, that outperforms GPT-4 and Gemini 1.0 Ultra on most of the common evaluation benchmarks. Haiku is the fastest, most compact model for near-instant responsiveness. The Claude 3 models have vision capabilities, offer a 200K context window capable of accepting inputs exceeding 1 million tokens, improved accuracy and fewer refusals [Details|Model Card].
Stability AI partnered with Tripo AI and released TripoSR, a fast 3D object reconstruction model that can generate high-quality 3D models from a single image in under a second. The model weights and source code are available under the MIT license, allowing commercialized use. [Details|GitHub | Hugging Face].
Answer.AI released a fully open source system that, for the first time, can efficiently train a 70b large language model on a regular desktop computer with two or more standard gaming GPUs. It combines QLoRA with Meta’s FSDP, which shards large models across multiple GPUs [Details].
Inflection launched Inflection-2.5, an upgrade to their model powering Pi, Inflection’s empathetic and supportive companion chatbot. Inflection-2.5 approaches GPT-4’s performance, but used only 40% of the amount of compute for training. Pi is also now available on Apple Messages [Details].
Twelve Labs introduced Marengo-2.6, a new state-of-the-art (SOTA) multimodal foundation model capable of performing any-to-any search tasks, including Text-To-Video, Text-To-Image, Text-To-Audio, Audio-To-Video, Image-To-Video, and more [Details].
Cloudflare announced the development of Firewall for AI, a protection layer that can be deployed in front of Large Language Models (LLMs), hosted on the Cloudflare Workers AI platform or models hosted on any other third party infrastructure, to identify abuses before they reach the models [Details]
Scale AI, in partnership with the Center for AI Safety, released WMDP (Weapons of Mass Destruction Proxy): an open-source evaluation benchmark of 4,157 multiple-choice questions that serve as a proxy measurement of LLM’s risky knowledge in biosecurity, cybersecurity, and chemical security [Details].
Midjourney launched v6 turbo mode to generate images at 3.5x the speed (for 2x the cost). Just type /turbo [Link].
Moondream.ai released moondream 2 – a small 1.8B parameters, open-source, vision language model designed to run efficiently on edge devices. It was initialized using Phi-1.5 and SigLIP, and trained primarily on synthetic data generated by Mixtral. Code and weights are released under the Apache 2.0 license, which permits commercial use [Details].
Vercel released Vercel AI SDK 3.0. Developers can now associate LLM responses to streaming React Server Components [Details].
Nous Research released a new model designed exclusively to create instructions from raw-text corpuses, Genstruct 7B. This enables the creation of new, partially synthetic instruction finetuning datasets from any raw-text corpus [Details].
01.AI open-sources Yi-9B, one of the top performers among a range of similar-sized open-source models excelling in code, math, common-sense reasoning, and reading comprehension [Details].
Accenture to acquire Udacity to build a learning platform focused on AI [Details].
China Offers ‘Computing Vouchers’ upto $280,000 to Small AI Startups to train and run large language models [Details].
Snowflake and Mistral have partnered to make Mistral AI’s newest and most powerful model, Mistral Large, available in the Snowflake Data Cloud [Details]
OpenAI rolled out ‘Read Aloud’ feature for ChatGPT, enabling ChatGPT to read its answers out loud. Read Aloud can speak 37 languages but will auto-detect the language of the text it’s reading [Details].
A Daily Chronicle of AI Innovations in March 2024 – Day 8: AI Daily News – March 08th, 2024
Inflection 2.5: A new era of personal AI is here! Google announces LLMs on device with MediaPipe GaLore: A new method for memory-efficient LLM training
Adobe makes creating social content on mobile easier
OpenAI now allows users to add MFA to user accounts
US Army is building generative AI chatbots in war games
Claude 3 builds the painting app in 2 minutes and 48 seconds
Cognizant launches AI lab in San Francisco to drive innovation
Inflection 2.5: A new era of personal AI is here!
Inflection.ai, the company behind the personal AI app Pi, has recently introduced Inflection-2.5, an upgraded large language model (LLM) that competes with top LLMs like GPT-4 and Gemini. The in-house upgrade offers enhanced capabilities and improved performance, combining raw intelligence with the company’s signature personality and empathetic fine-tuning.
This upgrade has made significant progress in coding and mathematics, keeping Pi at the forefront of technological innovation. With Inflection-2.5, Pi has world-class real-time web search capabilities, providing users with high-quality breaking news and up-to-date information. This empowers Pi users with a more intelligent and empathetic AI experience.
Why does it matter?
Inflection-2.5 challenges leading language models like GPT-4 and Gemini with their raw capability, signature personality, and empathetic fine-tuning. This will provide a new alternative for startups and enterprises building personalized applications with generative AI capabilities.
Google’s new experimental release called the MediaPipe LLM Inference API allows LLMs to run fully on-device across platforms. This is a significant development considering LLMs’ memory and computing demands, which are over a hundred times larger than traditional on-device models.
The MediaPipe LLM Inference API is designed to streamline on-device LLM integration for web developers and supports Web, Android, and iOS platforms. It offers several key features and optimizations that enable on-device AI. These include new operations, quantization, caching, and weight sharing. Developers can now run LLMs on devices like laptops and phones using MediaPipe LLM Inference API.
Why does it matter?
Running LLMs on devices using MediaPipe and TensorFlow Lite allows for direct deployment, reducing dependence on cloud services. On-device LLM operation ensures faster and more efficient inference, which is crucial for real-time applications like chatbots or voice assistants. This innovation helps rapid prototyping with LLM models and offers streamlined platform integration.
GaLore: A new method for memory-efficient LLM training
Researchers have developed a new technique called Gradient Low-Rank Projection (GaLore) to reduce memory usage while training large language models significantly. Tests have shown that GaLore achieves results similar to full-rank training while reducing optimizer state memory usage by up to 65.5% when pre-training large models like LLaMA.
GaLore: A new method for memory-efficient LLM training
It also allows pre-training a 7 billion parameter model from scratch on a single 24GB consumer GPU without needing extra techniques. This approach works well for fine-tuning and outperforms low-rank methods like LoRA on GLUE benchmarks while using less memory. GaLore is optimizer-independent and can be used with other techniques like 8-bit optimizers to save additional memory.
Why does it matter?
The gradient matrix’s low-rank nature will help AI developers during model training. GaLore minimizes the memory cost of storing gradient statistics for adaptive optimization algorithms. It enables training large models like LLaMA with reduced memory consumption, making it more accessible and efficient for researchers.
OpenAI CTO complained to board about ‘manipulative’ CEO Sam Altman
OpenAI CTO Mira Murati was reported by the New York Times to have played a significant role in CEO Sam Altman’s temporary removal, raising concerns about his leadership in a private memo and with the board.
Altman was accused of creating a toxic work environment, leading to fears among board members that key executives like Murati and co-founder Ilya Sutskever could leave, potentially causing a mass exit of talent.
Despite internal criticisms of Altman’s leadership and management of OpenAI’s startup fund, hundreds of employees threatened to leave if he was not reinstated, highlighting deep rifts within the company’s leadership.
Saudi Arabia’s Male Humanoid Robot Accused of Sexual Harassment
A video of Saudi Arabia’s first male robot has gone viral after a few netizens accused the humanoid of touching a female reporter inappropriately.
“Saudi Arabia unveils its man-shaped AI robot, Mohammad, reacts to a reporter in its first appearance,” an X user wrote while sharing the video that people are claiming shows the robot’s inappropriate behaviour. You can view the original tweet here.
What Else Is Happening in AI on March 08th, 2024
Adobe makes creating social content on mobile easier
Adobe has launched an updated version of Adobe Express, a mobile app that now includes Firefly AI models. The app offers features such as a “Text to Image” generator, a “Generative Fill” feature, and a “Text Effects” feature, which can be utilized by small businesses and creative professionals to enhance their social media content. Creative Cloud members can also access and work on creative assets from Photoshop and Illustrator directly within Adobe Express. (Link)
OpenAI now allows users to add MFA to user accounts
To add extra security to OpenAI accounts, users can now enable Multi-Factor Authentication (MFA). To set up MFA, users can follow the instructions in the OpenAI Help Center article “Enabling Multi-Factor Authentication (MFA) with OpenAI.” MFA requires a verification code with their password when logging in, adding an extra layer of protection against unauthorized access. (Link)
US Army is building generative AI chatbots in war games
The US Army is experimenting with AI chatbots for war games. OpenAI’s technology is used to train the chatbots to provide battle advice. The AI bots act as military commanders’ assistants, offering proposals and responding within seconds. Although the potential of AI is acknowledged, experts have raised concerns about the risks involved in high-stakes situations. (Link)
Claude 3 builds the painting app in 2 minutes and 48 seconds
Claude 3, the latest AI model by Anthropic, created a multiplayer drawing app in just 2 minutes and 48 seconds. Multiple users could collaboratively draw in real-time with user authentication and database integration. The AI community praised the app, highlighting the transformative potential of AI in software development. Claude 3 could speed up development cycles and make software creation more accessible. (Link)
Cognizant launches AI lab in San Francisco to drive innovation
Cognizant has opened an AI lab in San Francisco to accelerate AI adoption in businesses. The lab, staffed with top researchers and developers, will focus on innovation, research, and developing cutting-edge AI solutions. Cognizant’s investment in AI research positions them as a thought leader in the AI space, offering advanced solutions to meet the modernization needs of global enterprises. (Link)
A Daily Chronicle of AI Innovations in March 2024 – Day 7: AI Daily News – March 07th, 2024
Microsoft’s NaturalSpeech makes AI sound human Google’s search update targets AI-generated spam Google’s RT-Sketch teaches robots with doodles
Ex-Google engineer charged with stealing AI secrets for Chinese firm
Microsoft engineer sounds alarm on company’s AI image generator in letter to FTC
Apple bans Epic’s developer account and calls the company ‘verifiably untrustworthy’
Apple reportedly developing foldable MacBook with 20.3-inch screen
Meta is building a giant AI model to power its ‘entire video ecosystem‘
Microsoft’s NaturalSpeech makes AI sound human
Microsoft and its partners have created NaturalSpeech 3, a new Text-to-Speech system that makes computer-generated voices sound more human. Powered by FACodec architecture and factorized diffusion models, NaturalSpeech 3 breaks down speech into different parts, like content, tone, and sound quality to create a natural-sounding speech that fits specific prompts, even for voices it hasn’t heard before.
Microsoft’s NaturalSpeech makes AI sound human
NaturalSpeech 3 works better than other voice tech in terms of quality, similarity, tone, and clarity. It keeps getting better as it learns from more data. By letting users change how the speech sounds through prompts, NaturalSpeech 3 makes talking to computers feel more like talking to a person. This research is a big step towards a future where chatting with computers is as easy as chatting with friends.
Why does this matter?
This advancement transcends mere voice quality. This could change the way we interact with devices like smartphones, smart speakers, and virtual assistants. Imagine having a more natural, engaging conversation with Siri, Alexa, or other AI helpers.
Better voice tech could also make services more accessible for people with visual impairments or reading difficulties. It might even open up new possibilities in entertainment, like more lifelike characters in video games or audiobooks that sound like they’re read by your favorite celebrities.
Google has announced significant changes to its search ranking algorithms in order to reduce low-quality and AI-generated spam content in search results. The March update targets three main spam practices: mass distribution of unhelpful content, abusing site reputation to host low-quality content, and repurposing expired domains with poor content.
While Google is not devaluing all AI-generated content, it aims to judge content primarily on its usefulness to users. Most of the algorithm changes are effective immediately, though sites abusing their reputation have a 60-day grace period to change their practices. As Google itself develops AI tools, SGE and Gemini, the debate around AI content and search result quality is just beginning.
Why does this matter?
Websites that churn out lots of AI-made content to rank higher on Google may see their rankings drop. This might push them to focus more on content creation strategies, with a greater emphasis on quality over quantity.
For people using Google, the changes should mean finding more useful results and less junk.
As AI continues to advance, search engines like Google will need to adapt their algorithms to surface the most useful content, whether it’s written by humans or AI.
Google has introduced RT-Sketch, a new approach to teaching robots tasks using simple sketches. Users can quickly draw a picture of what they want the robot to do, like rearranging objects on a table. RT-Sketch focuses on the essential parts of the sketch, ignoring distracting details.
RT-Sketch is trained on a dataset of paired trajectories and synthetic goal sketches, and tested on six object rearrangement tasks. The results show that RT-Sketch performs comparably to image or language-conditioned agents in simple settings with written instructions on straightforward tasks. However, it did better when instructions were confusing or there were distracting objects present.
RT-Sketch can also interpret and act upon sketches with varying levels of detail, from basic outlines to colorful drawings.
Why does this matter?
With RT-Sketch, people can tell robots what to do without needing perfect images or detailed written instructions. This could make robots more accessible and useful in homes, workplaces, and for people who have trouble communicating in other ways.
As robots become a bigger part of our lives, easy ways to talk to them, like sketching, could help us get the most out of them. RT-Sketch is a step toward making robots that better understand what we need.
Google’s Gemini lets users edit within the chatbox
Google has updated its Gemini chatbot, allowing users to directly edit and fine-tune responses within the chatbox. This feature, launched on March 4th for English users in the Gemini web app, enables more precise outputs by letting people select text portions and provide instructions for improvement. (Link)
Adobe’s AI boosts IBM’s marketing efficiency
IBM reports a 10-fold increase in designer productivity and a significant reduction in marketing campaign time after testing Adobe’s generative AI tools. The AI-powered tools have streamlined idea generation and variant creation, allowing IBM to achieve more in less time. (Link)
Zapier’s new tool lets you make AI bots without coding
Zapier has released Zapier Central, a new AI tool that allows users to create custom AI bots by simply describing what they want, without any coding. The bots can work with Zapier’s 6,000+ connected apps, making it easy for businesses to automate tasks. (Link)
Accenture teams up with Cohere to bring AI to enterprises
Accenture has partnered with AI startup, Cohere to provide generative AI solutions to businesses. Leveraging Cohere’s language models and search technologies, the collaboration aims to boost productivity and efficiency while ensuring data privacy and security. (Link)
Meta builds mega AI model for video recommendations Meta is developing a single AI model to power its entire video ecosystem across platforms by 2026. The company has invested billions in Nvidia GPUs to build this model, which has already shown promising results in improving Reels watch time on the core Facebook app. (Link)
OpenAI is researching photonic processors to run their AI on
OpenAI hired this person: He has been doing a lot of research on waveguides for photonic processing for both Training AI and for inference and he did a PHD about photonic waveguides:
I think that he is going to help OpenAI to build photonic waveguides that they can run their neural networks / AI Models on and this is really cool if OpenAI actually think that they can build processors with faster Inference and training with photonics.
Ex-Google engineer charged with stealing AI secrets for Chinese firm
Linwei Ding, a Google engineer, has been indicted for allegedly stealing over 500 files related to Google’s AI technology, including designs for chips and data center technologies, to benefit companies in China.
The stolen data includes designs for Google’s TPU chips and GPUs, crucial for AI workloads, amid U.S. efforts to restrict China’s access to AI-specific chips.
Ding allegedly transferred stolen files to a personal cloud account using a method designed to evade Google’s detection systems, was offered a CTO position by a Chinese AI company and founded a machine learning startup in China while still employed at Google.
Microsoft engineer sounds alarm on company’s AI image generator in letter to FTC
Microsoft AI engineer Shane Jones warns that the company’s AI image generator, Copilot Designer, generates sexual and violent content and ignores copyright laws.
Jones shared his findings with Microsoft and contacted U.S. senators and the FTC, demanding better safeguards and an independent review of Microsoft’s AI incident reporting process.
In addition to the problems with Copilot Designer, other Microsoft products based on OpenAI technologies, such as Copilot Chat, tend to have poorer performance and more insecure implementations than the original OpenAI products, such as ChatGPT and DALL-E 3.
Meta is building a giant AI model to power its ‘entire video ecosystem’
Meta is developing an AI model designed to power its entire video ecosystem, including the TikTok-like Reels service and traditional video content, as part of its technology roadmap through 2026.
The company has invested billions of dollars in Nvidia GPUs to support this AI initiative, aiming to improve recommendation systems and overall product performance across all platforms.
This AI model has already demonstrated an 8% to 10% increase in Reels watch time on the Facebook app, with Meta now working to expand its application to include the Feed recommendation product and possibly integrate sophisticated chatting tools.
Innovating for the Future
As Meta continues to innovate and refine their AI model architecture, we can expect even more exciting developments in the future. The company’s dedication to enhancing the video recommendation experience and leveraging the full potential of AI is paving the way for a new era in online video consumption.
Stay tuned for more updates as Meta strives to revolutionize the digital video landscape with its cutting-edge AI technology.
– AI will enable humans to get content they want, nothing more
– New AI OSes will act ‘for’ the human, cleaning content of ads
– OpenAI and new startups don’t need ad revenue, they’ll take monthly subscriptions to deliver information with no ads
No:
– New AI OSes will integrate ads even more closely into the computing experience, acting ‘against’ the human
– Content will be more tightly integrated with ads, and AI won’t be able to unpiece this
– Meta and Alphabet have $100bns of skin in the game, they will make sure this doesn’t happen, including by using their lawyers to prevent lifting content out of the ad context
A Daily Chronicle of AI Innovations in March 2024 – Day 6: AI Daily News – March 06th, 2024
Microsoft’s Orca AI beats 10x bigger models in math GPT-4V wins at turning designs into code DeepMind alums’ Haiper joins the AI video race
OpenAI fires back, says Elon Musk demanded ‘absolute control’ of the company
iOS 17.4 is here: what you need to know
TikTok faces US ban if ByteDance fails to sell app
Google now wants to limit the AI-powered search spam it helped create
OpenAI vs Musk (openai responds to elon musk).
What does Elon mean by: “Unfortunately, humanity’s future is in the hands of <redacted>”? Is it google?
What does elon mean “Unfortunately, humanity’s future is in the hands of “? Is it google?What does elon mean “Unfortunately, humanity’s future is in the hands of “? Is it google?
OpenAI has countered Elon Musk’s lawsuit by revealing Musk’s desire for “absolute control” over the company, including merging it with Tesla, holding majority equity, and becoming CEO.
In a blog post, OpenAI aims to dismiss Musk’s claims and argues against his view that the company has deviated from its original nonprofit mission and has become too closely aligned with Microsoft.
OpenAI defends its stance on not open-sourcing its work, citing a 2016 email exchange with Musk that supports a less open approach as the development of artificial general intelligence advances.
For the first time in history, an AI has a higher IQ than the average human.
For the first time in history, an AI has a higher IQ than the average human.
Claude 3 vs. GPT-4
Right now, the question on everyone’s mind is whether Claude 3 is better than GPT-4. It’s a fair question; GPT-4 has dominated the LLM benchmarks for over a year, despite plenty of competitors trying to catch up.
Certainly, GPT-4 now has some real competition in the form of Claude 3 and Gemini 1.5. Even if we put the benchmarks aside for a moment, capabilities like video comprehension and million-token context windows are pushing the state of the art forward, and OpenAI could finally cede its dominant position.
But I think that “best,” when it comes to LLMs, is a little bit of a red herring. Despite the marketing and social media hype, these models have more similarities than differences. Ultimately, “best” depends on your use cases and preferences.
Claude 3 may be better at reasoning and language comprehension than GPT-4, but that won’t matter much if you’re mainly generating code. Likewise, Gemini 1.5 may have better multi-modal capabilities, but if you’re concerned with working in different languages, then Claude might be your best bet. In my (very limited) testing, I’ve found that Opus is a much better writer than GPT-4 – the default writing style is far more “normal” than what I can now recognize as ChatGPT-generated content. But I’ve yet to try brainstorming and code generation tasks.
So, for now, my recommendation is to keep experimenting and find a model that works for you. Not only because each person’s use cases differ but also because the models are regularly improving! In the coming months, Anthropic plans to add function calls, interactive coding, and more agentic capabilities to Claude 3.
To try Claude 3 for yourself, you can start talking with Claude 3 Sonnet today (though you’ll need to be in one of Anthropic’s supported countries). Opus is available to paid subscribers of Claude Pro. If you’re a developer, Opus and Sonnet are available via the API, and Sonnet is additionally available through Amazon Bedrock and Google Cloud’s Vertex AI Model Garden. The models are also available via a growing number of third-party apps and services: check your favorite AI tool to see if it supports Claude 3!
Guy builds an AI-steered homing/killer drone in just a few hours
Guy builds an AI-steered homing/killer drone in just a few hours
Always Say Hello to Your GPTs… (Better Performing Custom GPTs)
I’ve been testing out lots of custom GPTs that others have made. Specifically games and entertaining GPTs and I noticed some issues and a solution.
The problem: First off, many custom GPT games seem to forget to generate images as per their instructions. I also noticed that, often, the game or persona (or whatever the GPT aims to be) becomes more of a paraphrased or simplified version of what it should be and responses become more like base ChatGPT.
The solution: I’ve noticed that custom GPTs will perform much better if the user starts the initial conversation with a simple ”Hello, can you explain your functionality and options to me?”. This seems to remind the custom GPT of it’s tone ensures it follow’s its instructions.
Microsoft’s Orca AI beats 10x bigger models in math
Microsoft’s Orca team has developed Orca-Math, an AI model that excels at solving math word problems despite its compact size of just 7 billion parameters. It outperforms models ten times larger on the GSM8K benchmark, achieving 86.81% accuracy without relying on external tools or tricks. The model’s success is attributed to training on a high-quality synthetic dataset of 200,000 math problems created using multi-agent flows and an iterative learning process involving AI teacher and student agents.
Microsoft’s Orca AI beats 10x bigger models in math
The Orca team has made the dataset publicly available under the MIT license, encouraging researchers and developers to innovate with the data. The small dataset size highlights the potential of using multi-agent flows to generate data and feedback efficiently.
Why does this matter?
Orca-Math’s breakthrough performance shows the potential for smaller, specialized AI models in niche domains. This development could lead to more efficient and cost-effective AI solutions for businesses, as smaller models require less computational power and training data, giving companies a competitive edge.
With unprecedented capabilities in multimodal understanding and code generation, GenAI can enable a new paradigm of front-end development where LLMs directly convert visual designs into code implementation. New research formalizes this as “Design2Code” task and conduct comprehensive benchmarking. It also:
Introduces Design2Code benchmark consisting of diverse real-world webpages as test examples
Develops comprehensive automatic metrics that complement human evaluations
Proposes new multimodal prompting methods that improve over direct prompting baselines.
Finetunes open-source Design2Code-18B model that matches the performance of Gemini Pro Vision on both human and automatic evaluation
Moreover, it finds 49% of the GPT-4V-generations webpages were good enough to replace the original references, while 64% were even better designed than the original references.
Why does this matter?
This research could simplify web development for anyone to build websites from visual designs using AI, much like word processors made writing accessible. For enterprises, automating this front-end coding process could improve collaboration between teams and speed up time-to-market across industries if implemented responsibly alongside human developers.
Kayak introduced two new AI features: PriceCheck, which lets users upload flight screenshots to find cheaper alternatives and Ask Kayak, a ChatGPT-powered travel advice chatbot. These additions position Kayak alongside other travel sites, using generative AI to improve trip planning and flight price comparisons in a competitive market. (Link)
Accenture invests $1B in LearnVantage for AI upskilling
Accenture is launching LearnVantage, investing $1 billion over three years to provide clients with customized technology learning and training services. Accenture is also acquiring Udacity to scale its learning capabilities and meet the growing demand for technology skills, including generative AI, so organizations can achieve business value using AI. (Link)
Snowflake brings Mistral’s LLMs to its data cloud
Snowflake has partnered with Mistral AI to bring Mistral’s open LLMs into its Data Cloud. This move allows Snowflake customers to build LLM apps directly within the platform. It also marks a significant milestone for Mistral AI, which has recently secured partnerships with Microsoft, IBM, and Amazon. The deal positions Snowflake to compete more effectively in the AI space and increases Mistral AI visibility. (Link)
Dell & CrowdStrike unite to fight AI threats
Dell and CrowdStrike are partnering to help businesses fight cyberattacks using AI. By integrating CrowdStrike’s Falcon XDR platform into Dell’s MDR service, they aim to protect customers against threats like generative AI attacks, social engineering, and endpoint breaches. (Link)
AI app diagnoses ear infections with a snap
Physician-scientists at UPMC and the University of Pittsburgh have developed a smartphone app that uses AI to accurately diagnose ear infections (acute otitis media) in young children. The app analyzes short videos of the eardrum captured by an otoscope connected to a smartphone camera. It could help decrease unnecessary antibiotic use by providing a more accurate diagnosis than many clinicians. (Link)
DeepMind alums’ Haiper joins the AI video race
DeepMind alums Yishu Miao and Ziyu Wang have launched Haiper, a video generation tool powered by their own AI model. The startup offers a free website where users can generate short videos using text prompts, although there are limitations on video length and quality.
DeepMind alums’ Haiper joins the AI video race
The company has raised $19.2 million in funding and focuses on improving its AI model to deliver high-quality, realistic videos. They aim to build a core video generation model that can be offered to developers and address challenges like the “uncanny valley” problem in AI-generated human figures.
Why does this matter?
Haiper signals the race to develop video AI models that can disrupt industries like marketing, entertainment, and education by allowing businesses to generate high-quality video content cost-effectively. However, the technology is at an early stage, so there is room for improvement, highlighting the need for responsible development.
A Daily Chronicle of AI Innovations in March 2024 – Day 5: AI Daily News – March 05th, 2024
Anthropic’s Claude 3 Beats OpenAI’s GPT-4 TripsoSR: 3D object generation from a single image in <1s Cloudflare’s Firewall for AI protects LLMs from abuses
Google co-founder says company ‘definitely messed up’
Facebook, Instagram, and Threads are all down
Microsoft compares New York Times to ’80s movie studios trying to ban VCRs
Fired Twitter execs are suing Elon Musk for over $128 million
Claude 3 gets ~60% accuracy on GPQA
Claude 3 gets ~60% accuracy on GPQA
Claude 3 gets ~60% accuracy on GPQA. It's hard for me to understate how hard these questions are—literal PhDs (in different domains from the questions) with access to the internet get 34%.
Anthropic has launched Claude 3, a new family of models that has set new industry benchmarks across a wide range of cognitive tasks. The family comprises three state-of-the-art models in ascending order of cognitive ability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. Each model provides an increasing level of performance, and you can choose the one according to your intelligence, speed, and cost requirements.
Anthropic’s Claude 3 beats OpenAI’s GPT-4
Opus and Sonnet are now available via claude.ai and the Claude API in 159 countries, and Haiku will join that list soon.
Claude 3 has set a new standard of intelligence among its peers on most of the common evaluation benchmarks for AI systems, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), basic mathematics (GSM8K), and more.
Anthropic’s Claude 3 beats OpenAI’s GPT-4
In addition, Claude 3 displays solid visual processing capabilities and can process a wide range of visual formats, including photos, charts, graphs, and technical diagrams. Lastly, compared to Claude 2.1, Claude 3 exhibits 2x accuracy and precision for responses and correct answers.
Why does it matter?
In 2024, Gemini and ChatGPT caught the spotlight, but now Claude 3 has emerged as the leader in AI benchmarks. While benchmarks matter, only the practical usefulness of Claude 3 will tell if it is truly superior. This might also prompt OpenAI to release a new ChatGPT upgrade. However, with AI models becoming more common and diverse, it’s unlikely that one single model will emerge as the ultimate winner.
TripsoSR: 3D object generation from a single image in <1s
Stability AI has introduced a new AI model named TripsoSR in partnership with Trip AI. The model enables high-quality 3D object generation or rest from a single in less than a second. It runs under low inference budgets (even without a GPU) and is accessible to many users.
TripsoSR: 3D object generation from a single image in <1s
As far as performance, TripoSR can create detailed 3D models in a fraction of the time of other models. When tested on an Nvidia A100, it generates draft-quality 3D outputs (textured meshes) in around 0.5 seconds, outperforming other open image-to-3D models such as OpenLRM.
TripsoSR: 3D object generation from a single image in <1s
Why does it matter?
TripoSR caters to the growing demands of various industries, including entertainment, gaming, industrial design, and architecture. The availability of the model weights and source code for download further promotes commercialized, personal, and research use, making it a valuable asset for developers, designers, and creators.
Cloudflare’s Firewall for AI protects LLMs from abuses
Cloudflare has released a Firewall for AI, a protection layer that you can deploy in front of Large Language Models (LLMs) to identify abuses before they reach the models. While the traditional web and API vulnerabilities also apply to the LLM world, Firewall for AI is an advanced-level Web Application Firewall (WAF) designed explicitly for LLM protection and placed in front of applications to detect vulnerabilities and provide visibility to model owners.
Cloudflare Firewall for AI is deployed like a traditional WAF, where every API request with an LLM prompt is scanned for patterns and signatures of possible attacks. You can deploy it in front of models hosted on the Cloudflare Workers AI platform or any other third-party infrastructure. You can use it alongside Cloudflare AI Gateway and control/set up a Firewall for AI using the WAF control plane.
Cloudflare’s Firewall for AI protects LLMs from abuses
Why does it matter?
As the use of LLMs becomes more widespread, there is an increased risk of vulnerabilities and attacks that malicious actors can exploit. Cloudflare is one of the first security providers to launch tools to secure AI applications. Using a Firewall for AI, you can control what prompts and requests reach their language models, reducing the risk of abuses and data exfiltration. It also aims to provide early detection and protection for both users and LLM models, enhancing the security of AI applications.
Microsoft compares New York Times to ’80s movie studios trying to ban VCRs
Microsoft filed a motion to dismiss the New York Times’ copyright infringement lawsuit against OpenAI, comparing the newspaper’s stance to 1980s movie studios’ attempts to block VCRs, arguing that generative AI, like the VCR, does not hinder the original content’s market.
The company, as OpenAI’s largest supporter, asserts that copyright law does not obstruct ChatGPT’s development because the training content does not substantially affect the market for the original content.
Microsoft and OpenAI contend that ChatGPT does not replicate or substitute for New York Times content, emphasizing that the AI’s training on such articles does not significantly contribute to its development.
Google co-founder says company ‘definitely messed up’
Sergey Brin admitted Google “definitely messed up” with the Gemini AI’s image generation, highlighting issues like historically inaccurate images and the need for more thorough testing.
Brin, a core contributor to Gemini, came out of retirement due to the exciting trajectory of AI, amidst the backdrop of Google’s “code red” in response to OpenAI’s ChatGPT.
Criticism of Gemini’s biases and errors, including its portrayal of people of color and responses in written form, led to Brin addressing concerns over the AI’s unintended left-leaning output.
A Daily Chronicle of AI Innovations in March 2024 – Day 4: AI Daily News – March 04th, 2024
Google’s ScreenAI can ‘see’ graphics like humans do How AI ‘worms’ pose security threats in connected systems New benchmarking method challenges LLMs’ reasoning abilities
AI may enable personalized prostate cancer treatment
Vimeo debuts AI-powered video hub for business collaboration
Motorola revving up for AI-powered Moto X50 Ultra launch
Copilot will soon fetch and parse your OneDrive files
Huawei’s new AI chip threatens Nvidia’s dominance in China
OpenAI adds ‘Read Aloud’ voiceover to ChatGPT
https://youtu.be/ZJvTv7zVX0s?si=yejANUAUtUwyXEH8
OpenAI rolled out a new “Read Aloud” feature for ChatGPT as rivals like Anthropic and Google release more capable language models. (Source)
The Voiceover Update
ChatGPT can now narrate responses out loud on mobile apps and web.
Activated by tapping the response or clicking the microphone icon.
Update comes as Anthropic unveils their newest Claude 3 model.
Timing seems reactive amid intense competition over advanced AI. OpenAI also facing lawsuit from Elon Musk over alleged betrayal.
Anthropic launches Claude 3, claiming to outperform GPT-4 across the board
Anthropic launches Claude 3, claiming to outperform GPT-4 across the board
Google’s ScreenAI can ‘see’ graphics like humans do
Google Research has introduced ScreenAI, a Vision-Language Model that can perform question-answering on digital graphical content like infographics, illustrations, and maps while also annotating, summarizing, and navigating UIs. The model combines computer vision (PaLI architecture) with text representations of images to handle these multimodal tasks.
Despite having just 4.6 billion parameters, ScreenAI achieves new state-of-the-art results on UI- and infographics-based tasks and new best-in-class performance on others, compared to models of similar size.
Google’s ScreenAI can ‘see’ graphics like humans do
While ScreenAI is best-in-class on some tasks, further research is needed to match models like GPT-4 and Gemini, which are significantly larger. Google Research has released a dataset with ScreenAI’s unified representation and two other datasets to help the community experiment with more comprehensive benchmarking on screen-related tasks.
Why does this matter?
ScreenAI’s breakthrough in unified visual and language understanding bridges the disconnect between how humans and machines interpret ideas across text, images, charts, etc. Companies can now leverage these multimodal capabilities to build assistants that summarize reports packed with graphics, analysts that generate insights from dashboard visualizations, and agents that manipulate UIs to control workflows.
How AI ‘worms’ pose security threats in connected systems
Security researchers have created an AI “worm” called Morris II to showcase vulnerabilities in AI ecosystems where different AI agents are linked together to complete tasks autonomously.
The researchers tested the worm in a simulated email system using ChatGPT, Gemini, and other popular AI tools. The worm can exploit these AI systems to steal confidential data from emails or forward spam/propaganda without human approval. It works by injecting adversarial prompts that make the AI systems behave maliciously.
While this attack was simulated, the research highlights risks if AI agents are given too much unchecked freedom to operate.
Why does it matter?
This AI “worm” attack reveals that generative models like ChatGPT have reached capabilities that require heightened security to prevent misuse. Researchers and developers must prioritize safety by baking in controls and risk monitoring before commercial release. Without industry-wide commitments to responsible AI, regulation may be needed to enforce acceptable safeguards across critical domains as systems gain more autonomy.
New benchmarking method challenges LLMs’ reasoning abilities
Researchers at Consequent AI have identified a “reasoning gap” in large language models like GPT-3.5 and GPT-4. They introduced a new benchmarking approach called “functional variants,” which tests a model’s ability to reason instead of just memorize. This involves translating reasoning tasks like math problems into code that can generate unique questions requiring the same logic to solve.
New benchmarking method challenges LLMs’ reasoning abilities
When evaluating several state-of-the-art models, the researchers found a significant gap between performance on known problems from benchmarks versus new problems the models had to reason through. The gap was 58-80%, indicating the models do not truly understand complex problems but likely just store training examples. The models performed better on simpler math but still demonstrated limitations in reasoning ability.
Why does this matter?
This research reveals that reasoning still eludes our most advanced AIs. We risk being misled by claims of progress made by the Big Tech if their benchmarks reward superficial tricks over actual critical thinking. Moving forward, model creators will have to prioritize generalization and logic over memorization if they want to make meaningful progress towards general intelligence.
AI may enable personalized prostate cancer treatment
Researchers used AI to analyze prostate cancer DNA and found two distinct subtypes called “evotypes.” Identifying these subtypes could allow for better prediction of prognosis and personalized treatments. (Link)
Vimeo debuts AI-powered video hub for business collaboration
Vimeo has launched a new product called Vimeo Central, an AI-powered video hub to help companies improve internal video communications, collaboration, and analytics. Key capabilities include a centralized video library, AI-generated video summaries and highlights, enhanced screen recording and video editing tools, and robust analytics. (Link)
Motorola revving up for AI-powered Moto X50 Ultra launch
Motorola is building hype for its upcoming Moto X50 Ultra phone with a Formula 1-themed teaser video highlighting the device’s powerful AI capabilities. The phone will initially launch in China on April 21 before potentially getting a global release under the Motorola Edge branding. (Link)
Copilot will soon fetch and parse your OneDrive files
Microsoft is soon to launch Copilot for OneDrive, an AI assistant that will summarize documents, extract information, answer questions, and follow commands related to files stored in OneDrive. Copilot can generate outlines, tables, and lists based on documents, as well as tailored summaries and responses. (Link)
Huawei’s new AI chip threatens Nvidia’s dominance in China
Huawei has developed a new AI chip, the Ascend 910B, which matches the performance of Nvidia’s A100 GPU based on assessments by SemiAnalysis. The Ascend 910B is already being used by major Chinese companies like Baidu and iFlytek and could take market share from Nvidia in China due to US export restrictions on Nvidia’s latest AI chips. (Link)
1-bit LLMs explained
Check out this new tutorial that summarizes the revolutionary paper “The Era of 1-bit LLMs” introducing BitNet b1.58 model and explain what are 1-bit LLMs and how they are useful.
A Daily Chronicle of AI Innovations in March 2024 – Day 2: AI Daily News – March 02nd, 2024
A Daily Chronicle of AI Innovations in March 2024 – Day 1: AI Daily News – March 01st, 2024
Sora showcases jaw-dropping geometric consistency Microsoft introduces Copilot for finance in Microsoft 365 OpenAI and Figure team up to develop AI for robots
Elon Musk filed suit against OpenAI and CEO Sam Altman, alleging they have breached the artificial-intelligence startup’s founding agreement by putting profit ahead of benefiting humanity.
The 52-year-old billionaire, who helped fund OpenAI in its early days, said the company’s close relationship with Microsoft has undermined its original mission of creating open-source technology that wouldn’t be subject to corporate priorities. Musk, who is also CEO of Tesla has been among the most outspoken about the dangers of AI and artificial general intelligence, or AGI.
“To this day, OpenAI Inc.’s website continues to profess that its charter is to ensure that AGI “benefits all of humanity.” In reality, however, OpenAI has been transformed into a closed-source de facto subsidiary of the largest technology company in the world: Microsoft,” the lawsuit says.
Elon Sues OpenAI for “breach of contract”
Sora showcases jaw-dropping geometric consistency
Sora from OpenAI has been remarkable in video generation compared to other leading models like Pika and Gen2. In a recent benchmarking test conducted by ByteDanc.Inc in collaboration with Wuhan and Nankai University, Sora showcased video generation with high geometric consistency.
Sora showcases jaw-dropping geometric consistency
The benchmark test assesses the quality of generated videos based on how it adhere to the principles of physics in real-world scenarios. Researchers used an approach where generated videos are transformed into 3D models. Further, a team of researchers used the fidelity of geometric constraints to measure the extent to which generated videos conform to physics principles in the real world.
Why does it matter?
Sora’s remarkable performance in generating geometrically consistent videos can greatly boost several use cases for construction engineers and architects. Further, the new benchmarking will allow researchers to measure newly developed models to understand how accurately their creations conform to the principles of physics in real-world scenarios.
Microsoft introduces Copilot for finance in Microsoft 365
Microsoft has launched Copilot for Finance, a new addition to its Copilot series that recommends AI-powered productivity enhancements. It aims to transform how finance teams approach their daily work with intelligent workflow automation, recommendations, and guided actions. This Copilot aims to simplify data-driven decision-making, helping finance professionals have more free time by automating manual tasks like Excel and Outlook.
Copilot for Finance simplifies complex variance analysis in Excel, account reconciliations, and customer account summaries in Outlook. Dentsu, Northern Trust, Schneider Electric, and Visa plan to use it alongside Copilot for Sales and Service to increase productivity, reduce case handling times, and gain better decision-making insights.
Why does it matter?
Introducing Microsoft Copilot for finance will help businesses focus on strategic involvement from professionals otherwise busy with manual tasks like data entry, workflow management, and more. This is a great opportunity for several organizations to automate tasks like analysis of anomalies, improve analytic efficiency, and expedite financial transactions.
OpenAI and Figure team up to develop AI for robots
Figure has raised $675 million in series B funding with investments from OpenAI, Microsoft, and NVIDIA. It is an AI robotics company developing humanoid robots for general-purpose usage. The collaboration agreement between OpenAI and Figure aims to develop advanced humanoid robots that will leverage the generative AI models at its core.
This collaboration will also help accelerate the development of smart humanoid robots capable of understanding tasks like humans. With its deep understanding of robotics, Figure is set to bring efficient robots for general-purpose enhancing automation.
Why does it matter?
Open AI and Figure will transform robot operations, adding generative AI capabilities. This collaboration will encourage the integration of generative AI capabilities across robotics development. Right from industrial robots to general purpose and military applications, generative AI can be the new superpower for robotic development.
Google now wants to limit the AI-powered search spam it helped create
Google announced it will tackle AI-generated content aiming to manipulate search rankings through algorithmic enhancements, affecting automated content creation the most.
These algorithm changes are intended to discern and reduce low-quality and unhelpful webpages, aiming to improve the overall quality of search results.
The crackdown also targets misuse of high-reputation websites and the exploitation of expired domains for promoting substandard content.
Stack Overflow partners with Google Cloud to power AI
Stack Overflow and Google Cloud are partnering to integrate OverflowAPI into Google Cloud’s AI tools. This will give developers accessing the Google Cloud console access to Stack Overflow’s vast knowledge base of over 58 million questions and answers. The partnership aims to enable AI systems to provide more insightful and helpful responses to users by learning from the real-world experiences of programmers. (Link)
Microsoft unites rival GPU makers for one upscaling API
Microsoft is working with top graphics hardware makers to introduce “DirectSR”, a new API that simplifies the integration of super-resolution upscaling into games. DirectSR will allow game developers to easily access Nvidia’s DLSS, AMD’s FSR, and Intel’s XeSS with a single code path. Microsoft will preview the API in its Agility SDK soon and demonstrate it live with AMD and Nvidia reps on March 21st. (Link)
Google supercharges data platforms with AI for deeper insights
Google is expanding its AI capabilities across data and analytics services, including BigQuery and Cloud Databases. Vector search support is available across all databases, and BigQuery has the advanced Gemini Pro model for unstructured data analysis. Users can combine insights from images, video, audio, and text with structured data in a single analytics workflow. (Link)
Brave’s privacy-first AI-powered assistant is now available on Android
Brave’s AI-powered assistant, Leo, is now available on Android, bringing helpful features like summarization, transcription, and translation while prioritizing user privacy. Leo processes user inputs locally on the device without retaining or using data to train itself, aligning with Brave’s commitment to privacy-focused services. Users can simplify tasks with Leo without compromising on security. (Link)
Mistral introduced a new model Mistral Large. It reaches top-tier reasoning capabilities, is multi-lingual by design, has native function calling capacities and has 32K tokens context window. The pre-trained model has 81.2% accuracy on MMLU. Alongside Mistral Large, Mistral Small, a model optimized for latency and cost has been released. Mistral Small outperforms Mixtral 8x7B and has lower latency. Mistral also launched a ChatGPT like new conversational assistant, le Chat Mistral [Details].
Alibaba Group introduced EMO, an expressive audio-driven portrait-video generation framework. Input a single reference image and the vocal audio, e.g. talking and singing, it can generate vocal avatar videos with expressive facial expressions, and various head poses [Details].
Ideogram introduced Ideogram 1.0, a text-to-image model trained from scratch for state-of-the-art text rendering, photorealism, prompt adherence, and a feature called Magic Prompt to help with prompting. Ideogram 1.0 is now available to all users on ideogram.ai [Details].
Ideogram introduced Ideogram 1.0
Google DeepMind introduced Genie (generative interactive environments), a foundation world model trained exclusively from Internet videos that can generate interactive, playable environments from a single image prompt [Details].
Pika Labs launched Lip Sync feature, powered by audio from Eleven Labs, for its AI generated videos enabling users to make the characters talk with realistic mouth movements [Video].
UC Berkeley introduced Berkeley Function Calling Leaderboard (BFCL) to evaluate the function calling capability of different LLMs. Gorilla Open Functions v2, an open-source model that can help users with building AI applications with function calling and interacting with json compatible output has also been released [Details].
Qualcomm launched AI Hub, a curated library of 80+ optimized AI models for superior on-device AI performance across Qualcomm and Snapdragon platforms [Details].
BigCode released StarCoder2, a family of open LLMs for code and comes in 3 different sizes with 3B, 7B and 15B parameters. StarCoder2-15B is trained on over 4 trillion tokens and 600+ programming languages from The Stack v2 dataset [Details].
Researchers released FuseChat-7B-VaRM, which is the fusion of three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B, surpassing GPT-3.5 (March), Claude-2.1, and approaching Mixtral-8x7B-Instruct [Details].
The Swedish fintech Klarna’s AI assistant handles two-thirds of all customer service chats, some 2.3 million conversations so far, equivalent to the work of 700 people [Details].
Lightricks introduces LTX Studio, an AI-powered film making platform, now open for waitlist sign-ups, aimed at assisting creators in story visualization [Details].
Morph partners with Stability AI to launch Morph Studio, a platform to make films using Stability AI–generated clips [Details].
JFrog‘s security team found that roughly a 100 models hosted on the Hugging Face platform feature malicious functionality [Details].
Playground released Playground v2.5, an open-source text-to-image generative model, with a focus on enhanced color and contrast, improved generation for multi-aspect ratios, and improved human-centric fine detail [Details].
Together AI and the Arc Institute released Evo, a long-context biological foundation model based on the StripedHyena architecture that generalizes across DNA, RNA, and proteins.. Evo is capable of both prediction tasks and generative design, from molecular to whole genome scale (over 650k tokens in length) [Details].
Adobe previews a new generative AI music generation and editing tool, Project Music GenAI Control, that allows creators to generate music from text prompts, and then have fine-grained control to edit that audio for their precise needs [Details | video].
Microsoft introduces Copilot for Finance, an AI chatbot for finance workers in Excel and Outlook [Details].
The Intercept, Raw Story, and AlterNet sue OpenAI and Microsoft, claiming OpenAI and Microsoft intentionally removed important copyright information from training data [Details].
Huawei spin-off Honor shows off tech to control a car with your eyes and chatbot based on Meta’s AI [Details].
Tumblr and WordPress.com are preparing to sell user data to Midjourney and OpenAI [Details]
February 2024 – Week 3 Recap
Meta AI introduces V-JEPA (Video Joint Embedding Predictive Architecture), a method for teaching machines to understand and model the physical world by watching videos. Meta AI releases a collection of V-JEPA vision models trained with a feature prediction objective using self-supervised learning. The models are able to understand and predict what is going on in a video, even with limited information [Details | GitHub].
Open AI introduces Sora, a text-to-video model that can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions [Details + sample videos| Report].
Google announces their next-generation model, Gemini 1.5, that uses a new Mixture-of-Experts (MoE) architecture. The first Gemini 1.5 model being released for early testing is Gemini 1.5 Pro with a context window of up to 1 million tokens, which is the longest context window of any large-scale foundation model yet. 1.5 Pro can perform sophisticated understanding and reasoning tasks for different modalities, including video and it performs at a similar level to 1.0 Ultra [Details|Tech Report].
Reka introduced Reka Flash, a new 21B multimodal and multilingual model trained entirely from scratch that is competitive with Gemini Pro & GPT 3.5 on key language & vision benchmarks. Reka also present a compact variant Reka Edge , a smaller and more efficient model (7B) suitable for local and on-device deployment. Both models are in public beta and available in Reka Playground [Details].
Cohere For AI released Aya, a new open-source, massively multilingual LLM & dataset to help support under-represented languages. Aya outperforms existing open-source models and covers 101 different languages – more than double covered by previous models [Details].
BAAI released Bunny, a family of lightweight but powerful multimodal models. Bunny-3B model built upon SigLIP and Phi-2 outperforms the state-of-the-art MLLMs, not only in comparison with models of similar size but also against larger MLLMs (7B), and even achieves performance on par with LLaVA-13B [Details].
Amazon introduced a text-to-speech (TTS) model called BASE TTS (Big Adaptive Streamable TTS with Emergent abilities). BASE TTS is the largest TTS model to-date, trained on 100K hours of public domain speech data and exhibits “emergent” qualities improving its ability to speak even complex sentences naturally [Details | Paper].
Stability AI released Stable Cascade in research preview, a new text to image model that is exceptionally easy to train and finetune on consumer hardware due to its three-stage architecture. Stable Cascade can also generate image variations and image-to-image generations. In addition to providing checkpoints and inference scripts, Stability AI has also released scripts for finetuning, ControlNet, and LoRA training [Details].
Researchers from UC berkeley released Large World Model (LWM), an open-source general-purpose large-context multimodal autoregressive model, trained from LLaMA-2, that can perform language, image, and video understanding and generation. LWM answers questions about 1 hour long YouTube video even if GPT-4V and Gemini Pro both fail and can retriev facts across 1M context with high accuracy [Details].
GitHub opens applications for the next cohort of GitHub Accelerator program with a focus on funding the people and projects that are building AI-based solutions under an open source license [Details].
NVIDIA released Chat with RTX, a locally running (Windows PCs with specific NVIDIA GPUs) AI assistant that integrates with your file system and lets you chat with your notes, documents, and videos using open source models [Details].
Open AI is testing memory with ChatGPT, enabling it to remember things you discuss across all chats. ChatGPT’s memories evolve with your interactions and aren’t linked to specific conversations. It is being rolled out to a small portion of ChatGPT free and Plus users this week [Details].
BCG X released of AgentKit, a LangChain-based starter kit (NextJS, FastAPI) to build constrained agent applications [Details | GitHub].
Elevenalabs’ Speech to Speech feature, launched in November, for voice transformation with control over emotions and delivery, is now multilingual and available in 29 languages [Link]
Apple introduced Keyframer, an LLM-powered animation prototyping tool that can generate animations from static images (SVGs). Users can iterate on their design by adding prompts and editing LLM-generated CSS animation code or properties [Paper].
Eleven Labs launched a payout program for voice actors to earn rewards every time their voice clone is used [Details].
Azure OpenAI Service announced Assistants API, new models for finetuning, new text-to-speech model and new generation of embeddings models with lower pricing [Details].
Brilliant Labs, the developer of AI glasses, launched Frame, the world’s first glasses featuring an integrated AI assistant, Noa. Powered by an integrated multimodal generative AI system capable of running GPT4, Stability AI, and the Whisper AI model simultaneously, Noa performs real-world visual processing, novel image generation, and real-time speech recognition and translation. [Details].
Nous Research released Nous Hermes 2 Llama-2 70B model trained on the Nous Hermes 2 dataset, with over 1,000,000 entries of primarily synthetic data [Details].
Open AI in partnership with Microsoft Threat Intelligence, have disrupted five state-affiliated actors that sought to use AI services in support of malicious cyber activities [Details]
Perplexity partners with Vercel, opening AI search to developer apps [Details].
Researchers show that LLM agents can autonomously hack websites.
February 2024 – Week 2 Recap:
Google launches Ultra 1.0, its largest and most capable AI model, in its ChatGPT-like assistant which has now been rebranded as Gemini (earlier called Bard). Gemini Advanced is available, in 150 countries, as a premium plan for $19.99/month, starting with a two-month trial at no cost. Google is also rolling out Android and iOS apps for Gemini [Details].
Alibaba Group released Qwen1.5 series, open-sourcing models of 6 sizes: 0.5B, 1.8B, 4B, 7B, 14B, and 72B. Qwen1.5-72B outperforms Llama2-70B across all benchmarks. The Qwen1.5 series is available on Ollama and LMStudio. Additionally, API on together.ai [Details|Hugging Face].
NVIDIA released Canary 1B, a multilingual model for speech-to-text recognition and translation. Canary transcribes speech in English, Spanish, German, and French and also generates text with punctuation and capitalization. It supports bi-directional translation, between English and three other supported languages. Canary outperforms similarly-sized Whisper-large-v3, and SeamlessM4T-Medium-v1 on both transcription and translation tasks and achieves the first place on HuggingFace Open ASR leaderboard with an average word error rate of 6.67%, outperforming all other open source models [Details].
Researchers released Lag-Llama, the first open-source foundation model for time series forecasting [Details].
LAION released BUD-E, an open-source conversational and empathic AI Voice Assistant that uses natural voices, empathy & emotional intelligence and can handle multi-speaker conversations [Details].
MetaVoice released MetaVoice-1B, a 1.2B parameter base model trained on 100K hours of speech, for TTS (text-to-speech). It supports emotional speech in English and voice cloning. MetaVoice-1B has been released under the Apache 2.0 license [Details].
Bria AI released RMBG v1.4, an an open-source background removal model trained on fully licensed images [Details].
Researchers introduce InteractiveVideo, a user-centric framework for video generation that is designed for dynamic interaction, allowing users to instruct the generative model during the generation process [Details|GitHub].
Microsoft announced a redesigned look for its Copilot AI search and chatbot experience on the web (formerly known as Bing Chat), new built-in AI image creation and editing functionality, and Deucalion, a fine tuned model that makes Balanced mode for Copilot richer and faster [Details].
Roblox introduced AI-powered real-time chat translations in 16 languages [Details].
Hugging Face launched Assistantsfeature on HuggingChat. Assistants are custom chatbots similar to OpenAI’s GPTs that can be built for free using open source LLMs like Mistral, Llama and others [Link].
DeepSeek AI released DeepSeekMath 7B model, a 7B open-source model that approaches the mathematical reasoning capability of GPT-4. DeepSeekMath-Base is initialized with DeepSeek-Coder-Base-v1.5 7B [Details].
Microsoft is launching several collaborations with news organizations to adopt generative AI [Details].
LG Electronics signed a partnership with Korean generative AI startup Upstage to develop small language models (SLMs) for LG’s on-device AI features and AI services on LG notebooks [Details].
Stability AI released SVD 1.1, an updated model of Stable Video Diffusion model, optimized to generate short AI videos with better motion and more consistency [Details|Hugging Face] .
OpenAI and Meta announced to label AI generated images [Details].
Google saves your conversations with Gemini for years by default [Details].
February 2024 – Week 1 Recap:
Amazon presents Diffuse to Choose, a diffusion-based image-conditioned inpainting model that allows users to virtually place any e-commerce item in any setting, ensuring detailed, semantically coherent blending with realistic lighting and shadows. Code and demo will be released soon [Details].
OpenAI announced two new embedding models, new GPT-4 Turbo and moderation models, new API usage management tools, and lower pricing on GPT-3.5 Turbo. The updated GPT-4 Turbo preview model reduces cases of “laziness” where the model doesn’t complete a task. The new embedding models include a smaller and highly efficient text-embedding-3-small model, and a larger and more powerful text-embedding-3-large model. [Details].
Hugging Face and Google partner to support developers building AI applications [Details].
Adept introduced Adept Fuyu-Heavy, a new multimodal model designed specifically for digital agents. Fuyu-Heavy scores higher on the MMMU benchmark than Gemini Pro [Details].
Fireworks.ai has open-sourced FireLLaVA, a LLaVA multi-modality model trained on OSS LLM generated instruction following data, with a commercially permissive license. Firewroks.ai is also providing both the completions API and chat completions API to devlopers [Details].
01.AI released Yi Vision Language (Yi-VL) model, an open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images. Yi-VL adopts the LLaVA architecture and is free for commercial use. Yi-VL-34B is the first open-source 34B vision language model worldwide [Details].
Tencent AI Lab introduced WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites [Paper].
Prophetic introduced MORPHEUS-1, a multi-modal generative ultrasonic transformer model designed to induce and stabilize lucid dreams from brain states. Instead of generating words, Morpheus-1 generates ultrasonic holograms for neurostimulation to bring one to a lucid state [Details].
Google Research presented Lumiere – a space-time video diffusion model for text-to-video, image-to-video, stylized generation, inpainting and cinemagraphs [Details].
TikTok released Depth Anything, an image-based depth estimation method trained on 1.5M labeled images and 62M+ unlabeled images jointly [Details].
Nightshade, the free tool that ‘poisons’ AI models, is now available for artists to use [Details].
Stability AI released Stable LM 2 1.6B, 1.6 billion parameter small language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch. Stable LM 2 1.6B can be used now both commercially and non-commercially with a Stability AI Membership [Details].
Etsy launched ‘Gift Mode,’ an AI-powered feature designed to match users with tailored gift ideas based on specific preferences [Details].
Google DeepMind presented AutoRT, a framework that uses foundation models to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision. In AutoRT, a VLM describes the scene, an LLM generates robot goals and filters for affordance and safety, then routes execution to policies [Details].
Google Chrome gains AI features, including a writing helper, theme creator, and tab organizer [Details].
Tencent AI Lab released VideoCrafter2 for high quality text-to-video generation, featuring major improvements in visual quality, motion and concept Composition compared to VideoCrafter1 [Details | Demo]
Google opens beta access to the conversational experience, a new chat-based feature in Google Ads, for English language advertisers in the U.S. & U.K. It will let advertisers create optimized Search campaigns from their website URL by generating relevant ad content, including creatives and keywords [Details].
I often see people discussing AI progress as if it's directly tied to Moore's Law, but this can be misleading. Moore's Law only tells us how compute power (transistor count) increases over time, not how model performance (e.g., error rate) improves over time. What we really need is the scaling law, which describes how model performance scales with compute. Here's the key point: the scaling law shows that you need more and more compute to achieve the same increase in performance. Because the exponent of and exponential is still exponential, model performance still scales exponentially over time, but, exponentially slower than Moore's Law. The Bigger Picture: Moore's Law: Describes how compute increases with time (Time → Compute). Scaling Law: Describes how AI model performance increases with compute (Compute → Performance). When you connect the two, you get the real relationship: Time → Compute → Performance. This adjusted model is a lower bound for AI progress, meaning it gives a conservative estimate of how fast AI can improve over time. Why Does This Matter? This has huge practical implications. For example, while Moore's Law might lead to 16x compute growth over 10 years, AI performance only grows 2x due to alpha from the Scaling Law (approx 0.28). If AI can currently automate 10% of human tasks, reaching 99% automation would take 8.2 years under Moore's Law. Factoring in the scaling law, it actually takes 29.2 years! Note: I'm talking about a LOWER BOUND: this means that according to (relatively) stable laws (Moore's and Scaling Law), we can predict that AI performance should atleast grow as quickly as this lower bound. This means AI progress can still be A LOT higher than this bound, due to higher investments, better algorithms, self-improvement etc. But we refrain from using these for a bound since they are hard to predict as opposed to our laws. submitted by /u/PianistWinter8293 [link] [comments]
Customize the GPT-4o Mini model to classify posts from Reddit into "stressful" and "non-stressful" labels. In this tutorial, we will fine-tune the GPT-4o Mini model to classify text into "stress" and "non-stress" labels. Subsequently, we will access the fine-tuned model using the OpenAI API and the OpenAI playground. Finally, we will evaluate the fine-tuned model by comparing its performance before and after tuning it using various classification metrics. https://www.datacamp.com/tutorial/fine-tuning-gpt-4o-mini submitted by /u/kingabzpro [link] [comments]
So I'm an artificial intelligence student and I'll be making a project as my End of study project , maybe apps or websites in condition we integrate AI please if someone has any ideas (I would prefer unusual innovative ideas please ) I'm open to learn things so don't limit your ideas doesn't matter what skills I already have . submitted by /u/bigangrywatermelon [link] [comments]
Anybody else listen to the words of flobots song handlebars and think they could be talking about AI control? This is a fun discussion. Take it lightly. Do you have any other songs that are possibly talking about AI? submitted by /u/TimeTravelingRobot [link] [comments]
Has the idea been proposed before to specifically limit models to areas of expertise on purpose and most importantly give them COMPETING GOALS, and have them “talk” to other similarly powerful models for answers/authority they need to operate and proceed? Like we’d still give them all highest level goals of not ending humanity and whatnot, but each models highest level goal like this is worded differently and is more of a backstop. They then have some kind of lower level goal that is more specific and can be coded to be a bit in conflict with other models goals. I’m thinking essentially a digital version of check and balances. Like we all know that we are probably not smart enough to properly think of every way an ASI could get out of control, but if we made a flurry of AGI-level+ models and made them all have to get some kind of consensus with specific strengths/weaknesses and most importantly competing goals could that work as a way to have them police each other instead of us trying to do it perfectly? Granted not working well in current democracies but perhaps without the emotion and money stuff to corrupt, and realllllllly well thought out checks/balances/interdependencies this could be an architecture for lowering risk of life ending ASI? If this isn’t a new idea could someone point me to where I could read more discussion on this? submitted by /u/crispy88 [link] [comments]
I'm finding and summarising interesting AI research papers every day so you don't have to trawl through them all. Today's paper is titled 'DANA: Domain-Aware Neurosymbolic Agents for Consistency and Accuracy' by Vinh Luong, Sang Dinh, Shruti Raghavan, William Nguyen, Zooey Nguyen, Quynh Le, Hung Vo, Kentaro Maegaito, Loc Nguyen, Thao Nguyen, Anh Hai Ha, and Christopher Nguyen. This paper delves into the limitations of current AI systems, particularly those relying on Large Language Models (LLMs), which often grapple with inconsistency and inaccuracy due to their inherent probabilistic nature. The authors introduce DANA (Domain-Aware Neurosymbolic Agent), a novel architecture that aims to overcome these issues by integrating domain-specific knowledge with neurosymbolic techniques to foster consistent and accurate problem-solving. Key points from the paper include: Neurosymbolic Approach: DANA leverages a blend of natural-language and symbolic representations of domain-specific knowledge, moving beyond the constraints of purely probabilistic language models to elevate problem-solving reliability. Implementation and Performance: Implemented within the OpenSSA framework, DANA demonstrated remarkable accuracy, surpassing 90% on the FinanceBench financial-analysis benchmark. This performance notably outstrips current LLM-based systems in consistency and precision. Domain Knowledge Emphasis: The architecture prominently incorporates domain-specific knowledge, stored in a comprehensive Knowledge Store and Program Store. This ensures that AI systems can recall and apply pertinent expertise effectively. Industrial Application: The paper highlights DANA’s implementation in the semiconductor industry, where its capability to formulate etching recipes undeniably showcases its usability in critical industrial processes that demand high precision and accurate recommendations. Symbolic Structures: A pivotal component of DANA is its reliance on symbolic structures and deterministic operations, which significantly reduce output variability and enhance interpretability, critical for high-stakes industrial applications. You can catch the full breakdown here: Here You can catch the full and original research paper here: Original Paper submitted by /u/steves1189 [link] [comments]
As AI becomes more integrated into everyday life, mistakes and inaccuracies still happen. Have you experienced any notable AI errors—like misinterpretations, bias, or strange outputs—that made you question the reliability of AI systems? Whether it’s chatbots, voice assistants, or automated decision-making systems, what do you think causes these errors, and how can developers improve AI accuracy? submitted by /u/OddReplacement5567 [link] [comments]
I am watching the weather forecasts as I sit here in Bradenton, Florida, on Florida's Gulf Coast, getting ready for Hurricane Milton. Different models are used to predict the direction and landfalls of hurricanes. To obtain a variety of perspectives, these models are portrayed from various nations. According to what I understand, the data is measured using fundamental physics and compared to historical storm data and storm histories. The model and name that the European division created and submitted to EuroAI turned out to be remarkably accurate. According to research I have read, AI models are being used in R&D at the United States' Hurricane Center, and they are said to be faster and more accurate than utilizing physics formulas. They are not releasing the AI results until they have had more time to ensure that the calculations are accurate. However, I have read comments in a number of articles from meteorologists claiming that the speed of the data is nanoseconds. Any thoughts or suggestions regarding AI weather prediction? submitted by /u/Lanceroy60 [link] [comments]
A Daily Chronicle of AI Innovations in January 2024.
Welcome to ‘Navigating the Future,’ a premier portal for insightful and up-to-the-minute commentary on the evolving world of Artificial Intelligence in January 2024. In an age where technology outpaces our expectations, we delve deep into the AI cosmos, offering daily snapshots of revolutionary breakthroughs, pivotal industry transitions, and the ingenious minds shaping our digital destiny. Join us on this exhilarating journey as we explore the marvels and pivotal milestones in AI, day by day. Stay informed, stay inspired, and witness the chronicle of AI as it unfolds in real-time.
Are you eager to expand your understanding of artificial intelligence? Look no further than the essential book “AI Unraveled: Master GPT-4, Gemini, Generative AI & LLMs – Simplified Guide for Everyday Users: Demystifying Artificial Intelligence – OpenAI, ChatGPT, Google Bard, AI ML Quiz, AI Certifications Prep, Prompt Engineering,” available at Etsy, Shopify, Apple, Google, or Amazon.
AI Unraveled – Master GPT-4, Gemini, Generative AI, LLMs: A simplified Guide For Everyday Users
A Daily Chronicle of AI Innovations in January 2024 – Day 31: AI Daily News – January 31st, 2024
Microsoft CEO responds to AI-generated Taylor Swift fake nude images
Microsoft CEO Satya Nadella addresses the issue of AI-generated fake nude images of Taylor Swift, emphasizing the need for safety and guardrails in AI technology.
Microsoft CEO Satya Nadella acknowledges the need to act swiftly against nonconsensual deepfake images.
The AI-generated fake nude pictures of Taylor Swift have gained over 27 million views.
Microsoft, a major AI player, emphasizes the importance of online safety for both content creators and consumers.
Microsoft’s AI Code of Conduct prohibits creating adult or non-consensual intimate content. This policy is a part of the company’s commitment to ethical AI use and responsible content creation.
The deepfake images were reportedly created using Microsoft’s AI tool, Designer, which the company is investigating.
Microsoft is committed to enhancing content safety filters and addressing misuse of their services.
Elon Musk’s $56 billion pay package cancelled in court
A Delaware judge ruled against Elon Musk’s $56 billion pay package from Tesla, necessitating a new compensation proposal by the board.
The ruling, which could impact Musk’s wealth ranking, was based on the argument that shareholders were misled about the plan’s formulation and the board’s independence.
The case highlighted the extent of Musk’s influence over Tesla and its board, with key witnesses admitting they were cooperating with Musk rather than negotiating against him.
Google spent billions of dollars to lay people off
Google spent $2.1 billion on severance and other expenses for laying off over 12,000 employees in 2023, with an additional $700 million spent in early 2024 for further layoffs.
In 2023, Google achieved a 13 percent revenue increase year over year, amounting to $86 billion, with significant growth in its core digital ads, cloud computing businesses, and investments in generative AI.
The company also incurred a $1.8 billion cost for closing physical offices in 2023, and anticipates more layoffs in 2024 as it continues investing in AI technology under its “Gemini era”.
ChatGPT now lets you pull other GPTs into the chat
OpenAI introduced a feature allowing custom ChatGPT-powered chatbots to be tagged with an ‘@’ in the prompt, enabling easier switching between bots.
The ability to build and train custom GPT-powered chatbots was initially offered to OpenAI’s premium ChatGPT Plus subscribers in November 2023.
Despite the new feature and the GPT Store, custom GPTs currently account for only about 2.7% of ChatGPT’s worldwide web traffic, with a month-over-month decline in custom GPT traffic since November.
The NYT is building a team to explore AI in the newsroom
The New York Times is starting a team to investigate how generative AI can be used in its newsroom, led by newly appointed AI initiatives head Zach Seward.
This new team will comprise machine learning engineers, software engineers, designers, and editors to prototype AI applications for reporting and presentation of news.
Despite its complicated past with generative AI, including a lawsuit against OpenAI, the Times emphasizes that its journalism will continue to be created by human journalists.
The tiny Caribbean island making a fortune from AI
The AI boom has led to a significant increase in interest and sales of .ai domains, contributing approximately $3 million per month to Anguilla’s budget due to its association with artificial intelligence.
Vince Cate, a key figure in managing the .ai domain for Anguilla, highlights the surge in domain registrations following the release of ChatGPT, boosting the island’s revenue and making a substantial impact on its economy.
Unlike Tuvalu with its .tv domain, Anguilla manages its domain registrations locally, allowing the government to retain most of the revenue, which has been used for financial improvements such as paying down debt and eliminating property taxes on residential buildings.
A Daily Chronicle of AI Innovations in January 2024 – Day 30: AI Daily News – January 30th, 2024
Meta released Code Llama 70B, rivals GPT-4
Meta released Code Llama 70B, a new, more performant version of its LLM for code generation. It is available under the same license as previous Code Llama models–
CodeLlama-70B-Instruct achieves 67.8 on HumanEval, making it one of the highest-performing open models available today. CodeLlama-70B is the most performant base for fine-tuning code generation models.
Meta released Code Llama 70B, rivals GPT-4
Why does this matter?
This makes Code Llama 70B the best-performing open-source model for code generation, beating GPT-4 and Gemini Pro. This can have a significant impact on the field of code generation and the software development industry, as it offers a powerful and accessible tool for creating and improving code.
Neuralink implants its brain chip in the first human
In a first, Elon Musk’s brain-machine interface startup, Neuralink, has successfully implanted its brain chip in a human. In a post on X, he said “promising” brain activity had been detected after the procedure and the patient was “recovering well”. In another post, he added:
Neuralink implants its brain chip in the first human
The company’s goal is to connect human brains to computers to help tackle complex neurological conditions. It was given permission to test the chip on humans by the FDA in May 2023.
As Mr. Musk put it well, imagine if Stephen Hawking could communicate faster than a speed typist or auctioneer. That is the goal. This product will enable control of your phone or computer and, through them almost any device, just by thinking. Initial users will be those who have lost the use of their limbs.
Alibaba announces Qwen-VL; beats GPT-4V and Gemini
Alibaba’s Qwen-VL series has undergone a significant upgrade with the launch of two enhanced versions, Qwen-VL-Plus and Qwen-VL-Max. The key technical advancements in these versions include
Substantial boost in image-related reasoning capabilities;
Considerable enhancement in recognizing, extracting, and analyzing details within images and texts contained therein;
Support for high-definition images with resolutions above one million pixels and images of various aspect ratios.
Compared to the open-source version of Qwen-VL, these two models perform on par with Gemini Ultra and GPT-4V in multiple text-image multimodal tasks, significantly surpassing the previous best results from open-source models.
Alibaba announces Qwen-VL; beats GPT-4V and Gemini
Why does this matter?
This sets new standards in the field of multimodal AI research and application. These models match the performance of GPT4-v and Gemini, outperforming all other open-source and proprietary models in many tasks.
What Else Is Happening in AI on January 30th, 2024
OpenAI partners with Common Sense Media to collaborate on AI guidelines.
OpenAI will work with Common Sense Media, the nonprofit organization that reviews and ranks the suitability of various media and tech for kids, to collaborate on AI guidelines and education materials for parents, educators, and young adults. It will curate “family-friendly” GPTs based on Common Sense’s rating and evaluation standards. (Link)
Apple’s ‘biggest’ iOS update may bring a lot of AI to iPhones.
Apple’s upcoming iOS 18 update is expected to be one of the biggest in the company’s history. It will leverage generative AI to provide a smarter Siri and enhance the Messages app. Apple Music, iWork apps, and Xcode will also incorporate AI-powered features. (Link)
Shortwave email client will show AI-powered summaries automatically.
Shortwave, an email client built by former Google engineers, is launching new AI-powered features such as instant summaries that will show up atop an email, a writing assistant to echo your writing and extending its AI assistant function to iOS and Android, and multi-select AI actions. All these features are rolling out starting this week. (Link)
OpenAI CEO Sam Altman explores AI chip collaboration with Samsung and SK Group.
Sam Altman has traveled to South Korea to meet with Samsung Electronics and SK Group to discuss the formation of an AI semiconductor alliance and investment opportunities. He is also said to have expressed a willingness to purchase HBM (High Bandwidth Memory) technology from them. (Link)
Generative AI is seen as helping to identify M&A targets, Bain says.
Deal makers are turning to AI and generative AI tools to source data, screen targets, and conduct due diligence at a time of heightened regulatory concerns around mergers and acquisitions, Bain & Co. said in its annual report on the industry. In the survey, 80% of respondents plan to use AI for deal-making. (Link)
Neuralink has implanted its first brain chip in human LINK
Elon Musk’s company Neuralink has successfully implanted its first device into a human.
The initial application of Neuralink’s technology is focused on helping people with quadriplegia control devices with their thoughts, using a fully-implantable, wireless brain-computer interface.
Neuralink’s broader vision includes facilitating human interaction with artificial intelligence via thought, though immediate efforts are targeted towards aiding individuals with specific neurological conditions.
OpenAI partners with Common Sense Media to collaborate on AI guidelines LINK
OpenAI announced a partnership with Common Sense Media to develop AI guidelines and create educational materials for parents, educators, and teens, including curating family-friendly GPTs in the GPT store.
The partnership was announced by OpenAI CEO Sam Altman and Common Sense Media CEO James Steyer at the Common Sense Summit for America’s Kids and Families in San Francisco.
Common Sense Media, which has started reviewing AI assistants including OpenAI’s ChatGPT, aims to guide safe and responsible AI use among families and educators without showing favoritism towards OpenAI.
New test detects ovarian cancer earlier thanks to AI LINK
Scientists have developed a 93% accurate early screening test for ovarian cancer using artificial intelligence and machine learning, promising improved early detection for this and potentially other cancers.
The test analyzes a woman’s metabolic profile to accurately assess the likelihood of having ovarian cancer, providing a more informative and precise diagnostic approach compared to traditional methods.
Georgia Tech researchers utilized machine learning and mass spectrometry to detect unique metabolite characteristics in the blood, enabling the early and accurate diagnosis of ovarian cancer, with optimism for application in other cancer types.
A Daily Chronicle of AI Innovations in January 2024 – Day 29: AI Daily News – January 29th, 2024
OpenAI reveals new models, drop prices, and fixes ‘lazy’ GPT-4
OpenAI announced a new generation of embedding models, new GPT-4 Turbo and moderation models, new API usage management tools, and lower pricing on GPT-3.5 Turbo.
Introducing new ways for developers to manage API keys and understand API usage
Quietly implemented a new ‘GPT mentions’ feature to ChatGPT (no official announcement yet). The feature allows users to integrate GPTs into a conversation by tagging them with an ‘@.’
OpenAI reveals new models, drop prices, and fixes ‘lazy’ GPT-4
The new embedding models and GPT-4 Turbo will likely enable more natural conversations and fluent text generation. Lower pricing and easier API management also open up access and usability for more developers.
Moreover, The updated GPT-4 Turbo preview model, gpt-4-0125-preview, can better complete tasks such as code generation compared to the previous model. The GPT-4 Turbo has been the object of many complaints about its performance, including claims that it was acting lazy. OpenAI has addressed that issue this time.
Prophetic – This company wants AI to enter your dreams
Prophetic introduces Morpheus-1, the world’s 1st ‘multimodal generative ultrasonic transformer’. This innovative AI device is crafted with the purpose of exploring human consciousness through controlling lucid dreams. Morpheus-1 monitors sleep phases and gathers dream data to enhance its AI model.
Morpheus-1 is not prompted with words and sentences but rather brain states. It generates ultrasonic holograms for neurostimulation to bring one to a lucid state.
Prophetic – This company wants AI to enter your dreams
Its 03M parameter transformer model trained on 8 GPUs for 2 days
Engineered from scratch with the provisional utility patent application
The device is set to be accessible to beta users in the spring of 2024.
Prophetic is pioneering new techniques for AI to understand and interface with the human mind by exploring human consciousness and dreams through neurostimulation and multimodal learning. This pushes boundaries to understand consciousness itself.
If Morpheus-1 succeeds, it could enable transformative applications of AI for expanding human potential and treating neurological conditions.
Also, This is the first model that can fully utilize the capabilities offered by multi-element and create symphonies.
Prophetic – This company wants AI to enter your dreams
This paper ‘MM-LLMs’ discusses recent advancements in MultiModal LLMs which combine language understanding with multimodal inputs or outputs. The authors provide an overview of the design and training of MM-LLMs, introduce 26 existing models, and review their performance on various benchmarks.
The recent advances in Multimodal LLM
(Above is the timeline of MM-LLMs)
They also share key training techniques to improve MM-LLMs and suggest future research directions. Additionally, they maintain a real-time tracking website for the latest developments in the field. This survey aims to facilitate further research and advancement in the MM-LLMs domain.
Why does this matter?
The overview of models, benchmarks, and techniques will accelerate research in this critical area. By integrating multiple modalities like image, video, and audio, these models can understand the world more comprehensively.
What Else Is Happening in AI on January 29th, 2024
Update from Hugging Face LMSYS Chatbot Arena Leaderboard
Google’s Bard surpasses GPT-4 to the Second spot on the leaderboard! (Link)
Update from Hugging Face LMSYS Chatbot Arena Leaderboard
Google Cloud has partnered with Hugging Face to advance Gen AI development
The partnership aims to meet the growing demand for AI tools and models that are optimized for specific tasks. Hugging Face’s repository of open-source AI software will be accessible to developers using Google Cloud’s infrastructure. The partnership reflects a trend of companies wanting to modify or build their own AI models rather than using off-the-shelf options. (Link)
Arc Search combines a browser, search engine, and AI for a unique browsing experience
Instead of returning a list of search queries, Arc Search builds a webpage with relevant information based on the search query. The app, developed by The Browser Company, is part of a bigger shift for their Arc browser, which is also introducing a cross-platform syncing system called Arc Anywhere. (Link)
Arc Search combines a browser, search engine, and AI for a unique browsing experience
PayPal is set to launch new AI-based products
The new products will use AI to enable merchants to reach new customers based on their shopping history and recommend personalized items in email receipts. (Link)
Apple Podcasts in iOS 17.4 now offers AI transcripts for almost every podcast
This is made possible by advancements in machine translation, which can easily convert spoken words into text. Users testing the beta version of iOS 17.4 have discovered that most podcasts in their library now come with transcripts. However, there are some exceptions, such as podcasts added from external sources. As this feature is still in beta, there is no information available regarding its implementation or accuracy. (Link)
Google’s Gemini Pro beats GPT-4
Google’s Gemini Pro has surpassed OpenAI’s GPT-4 on the HuggingFace Chat Bot Arena Leaderboard, securing the second position.
Gemini Pro is only the middle tier of Google’s planned models, with the top-tier Ultra expected to be released sometime soon.
Competition is heating up with Meta’s upcoming Llama 3, which is speculated to outperform GPT-4.
iOS 18 could be the ‘biggest’ software update in iPhone history
iOS 18 is predicted to be one of the most significant updates in iPhone history, with Apple planning major new AI-driven features and designs.
Apple is investing over $1 billion annually in AI development, aiming for an extensive overhaul of features like Siri, Messages, and Apple Music with AI improvements in 2024.
The update will introduce RCS messaging support, enhancing messaging between iPhones and Android devices by providing features like read receipts and higher-resolution media sharing.
Nvidia’s tech rivals are racing to cut their dependence
Amazon, Google, Meta, and Microsoft are developing their own AI chips to reduce dependence on Nvidia, which dominates the AI chip market and accounts for more than 70% of sales.
These tech giants are investing heavily in AI chip development to control costs, avoid shortages, and potentially sell access to their chips through their cloud services, while balancing their competition and partnership with Nvidia.
Nvidia sold 2.5 million chips last year, and its sales increased by 206% over the past year, adding about a trillion dollars in market value.
Amazon abandons $1.4 billion deal to buy Roomba maker iRobot
Amazon’s planned $1.4 billion acquisition of Roomba maker iRobot has been canceled due to lack of regulatory approval in the European Union, leading Amazon to pay a $94 million termination fee to iRobot.
iRobot announced a restructuring plan that includes laying off about 350 employees, which is roughly 31 percent of its workforce, and a shift in leadership with Glen Weinstein serving as interim CEO.
The European Commission’s concerns over potential restrictions on competition in the robot vacuum cleaner market led to the deal’s termination, emphasizing fears that Amazon could limit the visibility of competing products.
Arc Search combines browser, search engine, and AI into something new and different
Arc Search, developed by The Browser Company, unveiled an iOS app that combines browsing, searching, and AI to deliver comprehensive web page summaries based on user queries.
The app represents a shift towards integrating browser functionality with AI capabilities, offering features like “Browse for me” that automatically gathers and presents information from across the web.
While still in development, Arc Search aims to redefine web browsing by compiling websites into single, informative pages.
AlphaGeometry: An Olympiad Level AI System for Geometry by Google Deepmind
One of the signs of intelligence is being able to solve mathematical problems. And that is exactly what Google has achieved with its new Alpha Geometry System. And not some basic Maths problems, but international Mathematics Olympiads, one of the hardest Maths exams in the world. In today’s post, we are going to take a deep dive into how this seemingly impossible task is achieved by Google and try to answer whether we have truly created an AGI or not.
1. Problem Generation and Initial Analysis Creation of a Geometric Diagram: AlphaGeometry starts by generating a geometric diagram. This could be a triangle with various lines and points marked, each with specific geometric properties. Initial Feature Identification: Using its neural language model, AlphaGeometry identifies and labels basic geometric features like points, lines, angles, circles, etc.
2. Exhaustive Relationship Derivation Pattern Recognition: The language model, trained on geometric data, recognizes patterns and potential relationships in the diagram, such as parallel lines, angle bisectors, or congruent triangles. Formal Geometric Relationships: The symbolic deduction engine takes these initial observations and deduces formal geometric relationships, applying theorems and axioms of geometry.
3. Algebraic Translation and Gaussian Elimination Translation to Algebraic Equations: Where necessary, geometric conditions are translated into algebraic equations. For instance, the properties of a triangle might be represented as a set of equations. Applying Gaussian Elimination: In cases where solving a system of linear equations becomes essential, AlphaGeometry implicitly uses Gaussian elimination. This involves manipulating the rows of the equation matrix to derive solutions. Integration of Algebraic Solutions: The solutions from Gaussian elimination are then integrated back into the geometric context, aiding in further deductions or the completion of proofs.
4. Deductive Reasoning and Proof Construction Further Deductions: The symbolic deduction engine continues to apply geometric logic to the problem, integrating the algebraic solutions and deriving new geometric properties or relationships. Proof Construction: The system constructs a proof by logically arranging the deduced geometric properties and relationships. This is an iterative process, where the system might add auxiliary constructs or explore different reasoning paths.
5. Iterative Refinement and Traceback Adding Constructs: If the current information is insufficient to reach a conclusion, the language model suggests adding new constructs (like a new line or point) to the diagram. Traceback for Additional Constructs: In this iterative process, AlphaGeometry analyzes how these additional elements might lead to a solution, continuously refining its approach.
6. Verification and Readability Improvement Solution Verification: Once a solution is found, it is verified for accuracy against the rules of geometry. Improving Readability: Given that steps involving Gaussian elimination are not explicitly detailed, a current challenge and area for improvement is enhancing the readability of these solutions, possibly through higher-level abstraction or more detailed step-by-step explanation.
7. Learning and Data Generation Synthetic Data Generation: Each problem solved contributes to a vast dataset of synthetic geometric problems and solutions, enriching AlphaGeometry’s learning base. Training on Synthetic Data: This dataset allows the system to learn from a wide variety of geometric problems, enhancing its pattern recognition and deductive reasoning capabilities.
A Daily Chronicle of AI Innovations in January 2024 – Day 27: AI Daily News – January 27th, 2024
GPT-4 Capabilities
Taylor Swift deepfakes spark calls for new laws
US politicians have advocated for new legislation in response to the circulation of explicit deepfake images of Taylor Swift on social media, which were viewed millions of times.
X is actively removing the fake images of Taylor Swift and enforcing actions against the violators under its ‘zero-tolerance policy’ for such content.
Deepfakes have seen a 550% increase since 2019, with 99% of these targeting women, leading to growing concerns about their impact on emotional, financial, and reputational harm.
Spotify accuses Apple of ‘extortion’ with new App Store tax
Spotify criticizes Apple’s new app installation fee, calling it “extortion” and arguing it will hurt developers, especially those offering free apps.
The fee requires developers using third-party app stores to pay €0.50 for each annual app install after 1 million downloads, a cost Spotify says could significantly increase customer acquisition costs.
Apple defends the new fee structure, claiming it offers developers choice and maintains that more than 99% of developers would pay the same or less, despite widespread criticism.
Netflix co-CEO says Apple’s Vision Pro isn’t worth their time yet
Netflix co-CEO Greg Peters described the Apple Vision Pro as too “subscale” for the company to invest in, noting it’s not relevant for most Netflix members at this point.
Netflix has decided not to launch a dedicated app for the Vision Pro, suggesting users access Netflix through a web browser on the device instead.
The Vision Pro, priced at $3,499 and going on sale February 2, will offer native apps for several streaming services but not for Netflix, which also hasn’t updated its app for Meta’s Quest line in a while.
Scientists design a two-legged robot powered by muscle tissue
Scientists from Japan have developed a two-legged biohybrid robot powered by muscle tissues, enabling it to mimic human gait and perform tasks like walking and pivoting.
The robot, designed to operate underwater, combines lab-grown skeletal muscle tissues and silicone rubber materials to achieve movements through electrical stimulation.
The research, published in the journal Matter, marks progress in the field of biohybrid robotics, with future plans to enhance movement capabilities and sustain living tissues for air operation.
OpenAI and other tech giants will have to warn the US government when they start new AI projects
The Biden administration will require tech companies like OpenAI, Google, and Amazon to inform the US government about new AI projects employing substantial computing resources.
This government notification requirement is designed to provide insights into sensitive AI developments, including details on computing power usage and safety testing.
The mandate, stemming from a broader executive order from October, aims to enhance oversight over powerful AI model training, including those developed by foreign companies using US cloud computing services.
Stability AI introduces Stable LM 2 1.6B Nightshade, the data poisoning tool, is now available in v1 AlphaCodium: A code generation tool that beats human competitors Meta’s novel AI advances creative 3D applications ElevenLabs announces new AI products + Raised $80M TikTok’s Depth Anything sets new standards for Depth Estimation Google Chrome and Ads are getting new AI features Google Research presents Lumiere for SoTA video generation Binoculars can detect over 90% of ChatGPT-generated text Meta introduces guide on ‘Prompt Engineering with Llama 2′ NVIDIA’s AI RTX Video HDR transforms video to HDR quality Google introduces a model for orchestrating robotic agents
A Daily Chronicle of AI Innovations in January 2024 – Day 26: AI Daily News – January 26th, 2024
Tech Layoffs Surge to over 24,000 so far in 2024
The tech industry has seen nearly 24,000 layoffs in early 2024, more than doubling in one week. As giants cut staff, many are expanding in AI – raising concerns about automation’s impact. (Source)
Mass Job Cuts
Microsoft eliminated 1,900 gaming roles months after a $69B Activision buy.
Layoffs.fyi logs over 23,600 tech job cuts so far this year.
Morale suffers at Apple, Meta, Microsoft and more as layoffs mount.
AI Advances as Jobs Decline
Google, Amazon, Dataminr and Spotify made cuts while promoting new AI tools.
Neil C. Hughes: “Celebrating AI while slashing jobs raises questions.”
Firms shift resources toward generative AI like ChatGPT.
Concentrated Pain
Nearly 24,000 losses stemmed from just 82 companies.
In 2023, ~99 firms cut monthly – more distributed pain.
Concentrated layoffs inflict severe damage on fewer firms.
When everyone moves to AI powered search, Google has to change the monetization model otherwise $1.1 trillion is gone yearly from the world economy
Was thinking recently that everything right now on the internet is there because someone wants to make money (ad revenue, subscriptions, affiliate marketing, SEO etc). If everyone uses AI powered search, how exactly will this monetization model work. Nobody gets paid anymore.
WordPress ecosystem $600b, Google ads $200b, Shopify $220b, affiliate marketing $17b – not to mention infra costs that will wobble until this gets fixed.
What type of ad revenue – incentives can Google come up with to keep everyone happy once they roll out AI to their search engine?
AI rolled out in India declares people dead, denies food to thousands
The deployment of AI in India’s welfare systems has mistakenly declared thousands of people dead, denying them access to subsidized food and welfare benefits.
Recap of what happened:
AI algorithms in Indian welfare systems have led to the removal of eligible beneficiaries, particularly affecting those dependent on food security and pension schemes.
The algorithms have made significant errors, such as falsely declaring people dead, resulting in the suspension of their welfare benefits.
The transition from manual identification and verification by government officials to AI algorithms has led to the removal of 1.9 million claimant cards in Telangana.
If AI models violate copyright, US federal courts could order them to be destroyed
TLDR: Under copyright law, courts do have the power to issue destruction orders. Copyright law has never been used to destroy AI models specifically, but the law has been increasingly open to the idea of targeting AI. It’s probably not going to happen to OpenAI but might possibly happen to other generative AI models in the future.
Microsoft, Amazon and Google face FTC inquiry over AI deals LINK
The FTC is investigating investments by big tech companies like Microsoft, Amazon, and Alphabet into AI firms OpenAI and Anthropic to assess their impact on competition in generative AI.
The FTC’s inquiry focuses on how these investments influence the competitive dynamics, product releases, and oversight within the AI sector, requesting detailed information from the involved companies.
Microsoft, Amazon, and Google have made significant investments in OpenAI and Anthropic, establishing partnerships that potentially affect market share, competition, and innovation in artificial intelligence.
OpenAI cures GPT-4 ‘laziness’ with new updates LINK
OpenAI updated GPT-4 Turbo to more thoroughly complete tasks like code generation, aiming to reduce its ‘laziness’ in task completion.
GPT-4 Turbo, distinct from the widely used GPT-4, benefits from data up to April 2023, while standard GPT-4 uses data until September 2021.
Future updates for GPT-4 Turbo will include general availability with vision capabilities and the launch of more efficient AI models, such as embeddings to enhance content relationship understanding.
A Daily Chronicle of AI Innovations in January 2024 – Day 25: AI Daily News – January 25th, 2024
Meta introduces guide on ‘Prompt Engineering with Llama 2′
Meta introduces ‘Prompt Engineering with Llama 2’, It’s an interactive guide created by research teams at Meta that covers prompt engineering & best practices for developers, researchers & enthusiasts working with LLMs to produce stronger outputs. It’s the new resource created for the Llama community.
Having these resources helps the LLM community learn how to craft better prompts that lead to more useful model responses. Overall, it enables people to get more value from LLMs like Llama.
NVIDIA’s AI RTX Video HDR transforms video to HDR quality
NVIDIA released AI RTX Video HDR, which transforms video to HDR quality, It works with RTX Video Super Resolution. The HDR feature requires an HDR10-compliant monitor.
RTX Video HDR is available in Chromium-based browsers, including Google Chrome and Microsoft Edge. To enable the feature, users must download and install the January Studio driver, enable Windows HDR capabilities, and enable HDR in the NVIDIA Control Panel under “RTX Video Enhancement.”
Why does this matter?
AI RTX Video HDR provides a new way for people to enhance the Video viewing experience. Using AI to transform standard video into HDR quality makes the content look much more vivid and realistic. It also allows users to experience cinematic-quality video through commonly used web browsers.
Google introduces a model for orchestrating robotic agents
Google introduces AutoRT, a model for orchestrating large-scale robotic agents. It’s a system that uses existing foundation models to deploy robots in new scenarios with minimal human supervision. AutoRT leverages vision-language models for scene understanding and grounding and LLMs for proposing instructions to a fleet of robots.
By tapping into the knowledge of foundation models, AutoRT can reason about autonomy and safety while scaling up data collection for robot learning. The system successfully collects diverse data from over 20 robots in multiple buildings, demonstrating its ability to align with human preferences.
Why does this matter?
This allows for large-scale data collection and training of robotic systems while also reasoning about key factors like safety and human preferences. AutoRT represents a scalable approach to real-world robot learning that taps into the knowledge within foundation models. This could enable faster deployment of capable and safe robots across many industries.
January 2024 – Week 4 in AI: all the Major AI developments in a nutshell
Amazon presents Diffuse to Choose, a diffusion-based image-conditioned inpainting model that allows users to virtually place any e-commerce item in any setting, ensuring detailed, semantically coherent blending with realistic lighting and shadows. Code and demo will be released soon [Details].
OpenAI announced two new embedding models, new GPT-4 Turbo and moderation models, new API usage management tools, and lower pricing on GPT-3.5 Turbo. The updated GPT-4 Turbo preview model reduces cases of “laziness” where the model doesn’t complete a task. The new embedding models include a smaller and highly efficient text-embedding-3-small model, and a larger and more powerful text-embedding-3-large model. [Details].
Hugging Face and Google partner to support developers building AI applications [Details].
Adept introduced Adept Fuyu-Heavy, a new multimodal model designed specifically for digital agents. Fuyu-Heavy scores higher on the MMMU benchmark than Gemini Pro [Details].
Fireworks.ai has open-sourced FireLLaVA, a LLaVA multi-modality model trained on OSS LLM generated instruction following data, with a commercially permissive license. Firewroks.ai is also providing both the completions API and chat completions API to devlopers [Details].
01.AI released Yi Vision Language (Yi-VL) model, an open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images. Yi-VL adopts the LLaVA architecture and is free for commercial use. Yi-VL-34B is the first open-source 34B vision language model worldwide [Details].
Tencent AI Lab introduced WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites [Paper].
Prophetic introduced MORPHEUS-1, a multi-modal generative ultrasonic transformer model designed to induce and stabilize lucid dreams from brain states. Instead of generating words, Morpheus-1 generates ultrasonic holograms for neurostimulation to bring one to a lucid state [Details].
Google Research presented Lumiere – a space-time video diffusion model for text-to-video, image-to-video, stylized generation, inpainting and cinemagraphs [Details].
TikTok released Depth Anything, an image-based depth estimation method trained on 1.5M labeled images and 62M+ unlabeled images jointly [Details].
Nightshade, the free tool that ‘poisons’ AI models, is now available for artists to use [Details].
Stability AI released Stable LM 2 1.6B, 1.6 billion parameter small language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch. Stable LM 2 1.6B can be used now both commercially and non-commercially with a Stability AI Membership [Details].
Etsy launched ‘Gift Mode,’ an AI-powered feature designed to match users with tailored gift ideas based on specific preferences [Details].
Google DeepMind presented AutoRT, a framework that uses foundation models to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision. In AutoRT, a VLM describes the scene, an LLM generates robot goals and filters for affordance and safety, then routes execution to policies [Details].
Google Chrome gains AI features, including a writing helper, theme creator, and tab organizer [Details].
Tencent AI Lab released VideoCrafter2 for high quality text-to-video generation, featuring major improvements in visual quality, motion and concept Composition compared to VideoCrafter1 [Details | Demo]
Google opens beta access to the conversational experience, a new chat-based feature in Google Ads, for English language advertisers in the U.S. & U.K. It will let advertisers create optimized Search campaigns from their website URL by generating relevant ad content, including creatives and keywords [Details].
What Else Is Happening in AI on January 25th, 2024
Google’s Gradient invests $2.4M in Send AI for enterprise data extraction
Dutch startup Send AI has secured €2.2m ($2.4M) in funding from Google’s Gradient Ventures and Keen Venture Partners to develop its document processing platform. The company uses small, open-source AI models to help enterprises extract data from complex documents, such as PDFs and paper files. (Link)
Google’s Gradient invests $2.4M in Send AI for enterprise data extraction
Google Arts & Culture has launched Art Selfie 2
A feature that uses Gen AI to create stylized images around users’ selfies. With over 25 styles, users can see themselves as an explorer, a muse, or a medieval knight. It also provides topical facts and allows users to explore related stories and artifacts. (Link)
Google announced new AI features for education @ Bett ed-tech event in the UK
These features include AI suggestions for questions at different timestamps in YouTube videos and the ability to turn a Google Form into a practice set with AI-generated answers and hints. Google is also introducing the Duet AI tool to assist teachers in creating lesson plans. (Link)
Etsy has launched a new AI feature, “Gift Mode”
Which generates over 200 gift guides based on specific preferences. Users can take an online quiz to provide information about who they are shopping for, the occasion, and the recipient’s interests. The feature then generates personalized gift guides from the millions of items listed on the platform. The feature leverages machine learning and OpenAI’s GPT-4. (Link)
Google DeepMind’s 3 researchers have left the company to start their own AI startup named ‘Uncharted Labs’
The team, consisting of David Ding, Charlie Nash, and Yaroslav Ganin, previously worked on Gen AI systems for images and music at Google. They have already raised $8.5M of its $10M goal. (Link)
Apple’s plans to bring gen AI to iPhones
Apple is intensifying its AI efforts, acquiring 21 AI start-ups since 2017, including WaveOne for AI-powered video compression, and hiring top AI talent.
The company’s approach includes developing AI technologies for mobile devices, aiming to run AI chatbots and apps directly on iPhones rather than relying on cloud services, with significant job postings in deep learning and large language models.
Apple is also enhancing its hardware, like the M3 Max processor and A17 Pro chip, to support generative AI, and has made advancements in running large language models on-device using Flash memory. Source
OpenAI went back on a promise to make key documents public
OpenAI, initially committed to transparency, has backed away from making key documents public, as evidenced by WIRED’s unsuccessful attempt to access governing documents and financial statements.
The company’s reduced transparency conceals internal issues, including CEO Sam Altman’s controversial firing and reinstatement, and the restructuring of its board.
Since creating a for-profit subsidiary in 2019, OpenAI’s shift from openness has sparked criticism, including from co-founder Elon Musk, and raised concerns about its governance and conflict of interest policies. Source
Google unveils AI video generator Lumiere
Google introduces Lumiere, a new AI video generator that uses an innovative “space-time diffusion model” to create highly realistic and imaginative five-second videos.
Lumiere stands out for its ability to efficiently synthesize entire videos in one seamless process, showcasing features like transforming text prompts into videos and animating still images.
The unveiling of Lumiere highlights the ongoing advancements in AI video generation technology and the potential challenges in ensuring its ethical and responsible use. Source
Ring will no longer allow police to request doorbell camera footage from users. Source
Amazon’s Ring is discontinuing its Request for Assistance program, stopping police from soliciting doorbell camera footage via the Neighbors app.
Authorities must now file formal legal requests to access Ring surveillance videos, instead of directly asking users within the app.
Privacy advocates recognize Ring’s decision as a progressive move, but also note that it doesn’t fully address broader concerns about surveillance and user privacy.
AI rolled out in India declares people dead, denies food to thousands
In India, AI has mistakenly declared thousands of people dead, leading to the denial of essential food and pension benefits.
The algorithm, designed to find welfare fraud, removed 1.9 million from the beneficiary list, but later analysis showed about 7% were wrongfully cut.
Out of 66,000 stopped pensions in Haryana due to an algorithmic error, 70% were found to be incorrect, placing the burden of proof on beneficiaries to reinstate their status. Source
A Daily Chronicle of AI Innovations in January 2024 – Day 24: AI Daily News – January 24th, 2024
Google Chrome and Ads are getting new AI features
Google Chrome is getting 3 new experimental generative AI features:
Smartly organize your tabs: With Tab Organizer, Chrome will automatically suggest and create tab groups based on your open tabs.
Create your own themes with AI: You’ll be able to quickly generate custom themes based on a subject, mood, visual style and color that you choose– no need to become an AI prompt expert!
Get help drafting things on the web: A new feature will help you write with more confidence on the web– whether you want to leave a well-written review for a restaurant, craft a friendly RSVP for a party, or make a formal inquiry about an apartment rental.
In addition, Gemini will now power the conversational experience within the Google Ads platform. With this new update, it will be easier for advertisers to quickly build and scale Search ad campaigns.
Google Research presents Lumiere for SoTA video generation
Lumiere is a text-to-video (T2V) diffusion model designed for synthesizing videos that portray realistic, diverse, and coherent motion– a pivotal challenge in video synthesis. It demonstrates state-of-the-art T2V generation results and shows that the design easily facilitates a wide range of content creation tasks and video editing applications.
The approach introduces a new T2V diffusion framework that generates the full temporal duration of the video at once. This is achieved by using a Space-Time U-Net (STUNet) architecture that learns to downsample the signal in both space and time, and performs the majority of its computation in a compact space-time representation.
Why does this matter?
Despite tremendous progress, training large-scale T2V foundation models remains an open challenge due to the added complexities that motion introduces. Existing T2V models often use cascaded designs but face limitations in generating globally coherent motion. This new approach aims to overcome the limitations associated with cascaded training regimens and improve the overall quality of motion synthesis.
Binoculars can detect over 90% of ChatGPT-generated text
Researchers have introduced a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data.
It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. Researchers comprehensively evaluated Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
Why does this matter?
A common first step in harm reduction for generative AI is detection. Binoculars excel in zero-shot settings where no data from the model being detected is available. This is particularly advantageous as the number of LLMs grows rapidly. Binoculars’ ability to detect multiple LLMs using a single detector proves valuable in practical applications, such as platform moderation.
What Else Is Happening in AI on January 24th, 2024
Microsoft forms a team to make generative AI cheaper.
Microsoft has formed a new team to develop conversational AI that requires less computing power compared to the software it is using from OpenAI. It has moved several top AI developers from its research group to the new GenAI team. (Link)
Sevilla FC transforms the player recruitment process with IBM WatsonX.
Sevilla FC introduced Scout Advisor, an innovative generative AI tool that it will use to provide its scouting team with a comprehensive, data-driven identification and evaluation of potential recruits. Built on watsonx, Sevilla FC’s Scout Advisor will integrate with their existing suite of self-developed data-intensive applications. (Link)
SAP will restructure 8,000 roles in a push towards AI.
SAP unveiled a $2.2 billion restructuring program for 2024 that will affect 8,000 roles, as it seeks to better focus on growth in AI-driven business areas. It would be implemented primarily through voluntary leave programs and internal re-skilling measures. SAP expects to exit 2024 with a headcount “similar to the current levels”. (Link)
Kin.art launches a free tool to prevent GenAI models from training on artwork.
Kin.art uses image segmentation (i.e., concealing parts of artwork) and tag randomization (swapping an art piece’s image metatags) to interfere with the model training process. While the tool is free, artists have to upload their artwork to Kin.art’s portfolio platform in order to use it. (Link)
Google cancels contract with an AI data firm that’s helped train Bard.
Google ended its contract with Appen, an Australian data company involved in training its LLM AI tools used in Bard, Search, and other products. The decision was made as part of its ongoing effort to evaluate and adjust many supplier partnerships across Alphabet to ensure vendor operations are as efficient as possible. (Link)
A Daily Chronicle of AI Innovations in January 2024 – Day 23: AI Daily News – January 23rd, 2024
Meta’s novel AI advances creative 3D applications
The paper introduces a new shape representation called Mosaic-SDF (M-SDF) for 3D generative models. M-SDF approximates a shape’s Signed Distance Function (SDF) using local grids near the shape’s boundary.
This representation is:
Fast to compute
Parameter efficient
Compatible with Transformer-based architectures
The efficacy of M-SDF is demonstrated by training a 3D generative flow model with the 3D Warehouse dataset and text-to-3D generation using caption-shape pairs.
M-SDF provides an efficient 3D shape representation for unlocking AI’s generative potential in the area, which could significantly advance creative 3D applications. Overall, M-SDF opens up new possibilities for deep 3D learning by bringing the representational power of transformers to 3D shape modeling and generation.
ElevenLabs announces new AI products + Raised $80M
ElevenLabs has raised $80 million in a Series B funding round co-led by Andreessen Horowitz, Nat Friedman, and Daniel Gross. The funding will strengthen the company’s position as a voice AI research and product development leader.
ElevenLabs has also announced the release of new AI products, including a Dubbing Studio, a Voice Library marketplace, and a Mobile Reader App.
Why does this matter?
The company’s technology has been adopted across various sectors, including publishing, conversational AI, entertainment, education, and accessibility. ElevenLabs aims to transform how we interact with content and break language barriers.
TikTok’s Depth Anything sets new standards for Depth Estimation
This work introduces Depth Anything, a practical solution for robust monocular depth estimation. The approach focuses on scaling up the dataset by collecting and annotating large-scale unlabeled data. Two strategies are employed to improve the model’s performance: creating a more challenging optimization target through data augmentation and using auxiliary supervision to incorporate semantic priors.
The model is evaluated on multiple datasets and demonstrates impressive generalization ability. Fine-tuning with metric depth information from NYUv2 and KITTI also leads to state-of-the-art results. The improved depth model also enhances the performance of the depth-conditioned ControlNet.
Why does this matter?
By collecting and automatically annotating over 60 million unlabeled images, the model learns more robust representations to reduce generalization errors. Without dataset-specific fine-tuning, the model achieves state-of-the-art zero-shot generalization on multiple datasets. This could enable broader applications without requiring per-dataset tuning, marking an important step towards practical monocular depth estimation.
Disney Research introduced HoloTile, an innovative movement solution for VR, featuring omnidirectional floor tiles that keep users from walking off the pad.
The HoloTile system supports multiple users simultaneously, allowing independent walking in virtual environments.
Although still a research project, HoloTile’s future application may be in Disney Parks VR experiences due to likely high costs and technical challenges.
Samsung races Apple to develop blood sugar monitor that doesn’t break skin LINK
Samsung is developing noninvasive blood glucose and continuous blood pressure monitoring technologies, competing with rivals like Apple.
The company plans to expand health tracking capabilities across various devices, including a Galaxy Ring with health sensors slated for release before the end of 2024.
Samsung’s noninvasive glucose monitoring endeavors and blood pressure feature improvements aim to offer consumers a comprehensive health tracking experience without frequent calibration.
Amazon fined for ‘excessive’ surveillance of workers LINK
France’s data privacy watchdog, CNIL, levied a $35 million fine on Amazon France Logistique for employing a surveillance system deemed too intrusive for tracking warehouse workers.
The CNIL ruled against Amazon’s detailed monitoring of employee scanner inactivity and excessive data retention, which contravenes GDPR regulations.
Amazon disputes the CNIL’s findings and may appeal, defending its practices as common in the industry and as tools for maintaining efficiency and safety.
AI too expensive to replace humans in jobs right now, MIT study finds LINK
The MIT study found that artificial intelligence is not currently a cost-effective replacement for humans in 77% of jobs, particularly those using computer vision.
Although AI deployment in industries has accelerated, only 23% of workers could be economically replaced by AI, mainly due to high implementation and operational costs.
Future projections suggest that with improvements in AI accuracy and reductions in data costs, up to 40% of visually-assisted tasks could be automated by 2030.
What Else Is Happening in AI on January 23rd, 2024
Google is reportedly working on a new AI feature, ‘voice compose’
A new feature for Gmail on Android called “voice compose” uses AI to help users draft emails. The feature, known as “Help me write,” was introduced in mid-2023 and allows users to input text segments for the AI to build on and improve. The new update will support voice input, allowing users to speak their email and have the AI generate a draft based on their voice input. (Link)
Google has shared its companywide goals (OKRs) for 2024 with employees
Also, Sundar Pichai’s memo about layoffs encourages employees to start internally testing Bard Advanced, a new paid tier powered by Gemini. This suggests that a public release is coming soon. (Link)
Elon Musk saying Grok 1.5 will be out next month
Elon Musk said the next version of the Grok language (Grok 1.5) model, developed by his AI company xAI, will be released next month with substantial improvements. Declared by him while commenting on a Twitter influencer’s post. (Link)
MIT study found that AI is still more expensive than humans in most jobs
The study aimed to address concerns about AI replacing human workers in various industries. Researchers found that only 23% of workers could be replaced by AI cost-effectively. This study counters the widespread belief that AI will wipe out jobs, suggesting that humans are still more cost-efficient in many roles. (Link)
Berkley AI researchers revealed a video featuring their versatile humanoid robot walking in the streets of San Francisco. (Link)
A Daily Chronicle of AI Innovations in January 2024 – Day 22: AI Daily News – January 22nd, 2024
Stability AI introduces Stable LM 2 1.6B
Stability AI released Stable LM 2 1.6B, a state-of-the-art 1.6 billion parameter small language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch. It leverages recent algorithmic advancements in language modeling to strike a favorable balance between speed and performance, enabling fast experimentation and iteration with moderate resources.
Stability AI introduces Stable LM 2 1.6B
According to Stability AI, the model outperforms other small language models with under 2 billion parameters on most benchmarks, including Microsoft’s Phi-2 (2.7B), TinyLlama 1.1B, and Falcon 1B. It is even able to surpass some larger models, including Stability AI’s own earlier Stable LM 3B model.
Why does this matter?
Size certainly matters when it comes to language models as it impacts where a model can run. Thus, small language models are on the rise. And if you think about computers, televisions, or microchips, we could roughly see a similar trend; they got smaller, thinner, and better over time. Will this be the case for AI too?
Nightshade, the data poisoning tool, is now available in v1
The University of Chicago’s Glaze Project has released Nightshade v1.0, which enables artists to sabotage generative AI models that ingest their work for training.
Nightshade, the data poisoning tool, is now available in v1
Glaze implements invisible pixels in original images that cause the image to fool AI systems into believing false styles. For e.g., it can be used to transform a hand-drawn image into a 3D rendering.
Nightshade goes one step further: it is designed to use the manipulated pixels to damage the model by confusing it. For example, the AI model might see a car instead of a train. Fewer than 100 of these “poisoned” images could be enough to corrupt an image AI model, the developers suspect.
Why does this matter?
If these “poisoned” images are scraped into an AI training set, it can cause the resulting model to break. This could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion. AI companies are facing a slew of copyright lawsuits, and Nightshade can change the status quo.
AlphaCodium: A code generation tool that beats human competitors
AlphaCodium is a test-based, multi-stage, code-oriented iterative flow that improves the performance of LLMs on code problems. It was tested on a challenging code generation dataset called CodeContests, which includes competitive programming problems from platforms such as Codeforces. The proposed flow consistently and significantly improves results.
AlphaCodium: A code generation tool that beats human competitors
On the validation set, for example, GPT-4 accuracy (pass@5) increased from 19% with a single well-designed direct prompt to 44% with the AlphaCodium flow. Italso beats DeepMind’s AlphaCode and their new AlphaCode2 without needing to fine-tune a model.
AlphaCodium is an open-source, available tool and works with any leading code generation model.
Why does this matter?
Code generation problems differ from common natural language problems. So many prompting techniques optimized for natural language tasks may not be optimal for code generation. AlphaCodium explores beyond traditional prompting and shifts the paradigm from prompt engineering to flow engineering.
What Else Is Happening in AI on January 22nd, 2024
WHO releases AI ethics and governance guidance for large multi-modal models.
The guidance outlines over 40 recommendations for consideration by governments, technology companies, and healthcare providers to ensure the appropriate use of LMMs to promote and protect the health of populations. (Link)
Sam Altman seeks to raise billions to set up a network of AI chip factories.
Altman has had conversations with several large potential investors in the hopes of raising the vast sums needed for chip fabrication plants, or fabs, as they’re known colloquially. The project would involve working with top chip manufacturers, and the network of fabs would be global in scope. (Link)
Two Google DeepMind scientists are in talks to leave and form an AI startup.
The pair has been talking with investors about forming an AI startup in Paris and discussing initial financing that may exceed €200 million ($220 million)– a large sum, even for the buzzy field of AI. The company, known at the moment as Holistic, may be focused on building a new AI model. (Link)
Databricks tailors an AI-powered data intelligence platform for telecoms and NSPs.
Dubbed Data Intelligence Platform for Communications, the offering combines the power of the company’s data lakehouse architecture, generative AI models from MosaicML, and partner-powered solution accelerators to give communication service providers (CSPs) a quick way to start getting the most out of their datasets and grow their business. (Link)
Amazon Alexa is set to get smarter with new AI features.
Amazon plans to introduce a paid subscription tier of its voice assistant, Alexa, later this year. The paid version, expected to debut as “Alexa Plus”, would be powered by a newer model, what’s being internally referred to as “Remarkable Alexa,” which would provide users with more conversational and personalized AI technology. (Link)
A Daily Chronicle of AI Innovations in January 2024 – Day 20: AI Daily News – January 20th, 2024
Google DeepMind scientists in talks to leave and form AI startup LINK
Two Google DeepMind scientists are in discussions with investors to start an AI company in Paris, potentially raising over €200 million.
The potential startup, currently known as Holistic, may focus on creating a new AI model, involving scientists Laurent Sifre and Karl Tuyls.
Sifre and Tuyls have already given notice to leave DeepMind, although no official comments have been made regarding their departure or the startup plans.
Sam Altman is still chasing billions to build AI chips LINK
OpenAI CEO Sam Altman is raising billions to build a global network of AI chip factories in collaboration with leading chip manufacturers.
Altman’s initiative aims to meet the demand for powerful chips necessary for AI systems, amidst competition for chip production capacity against tech giants like Apple.
Other major tech companies, including Microsoft, Amazon, and Google, are also developing their own AI chips to reduce reliance on Nvidia’s GPUs.
Microsoft says Russian state-sponsored hackers spied on its executives LINK
Microsoft announced that Russian state-sponsored hackers accessed a small number of the company’s email accounts, including those of senior executives.
The hackers, identified by Microsoft as “Midnight Blizzard,” aimed to discover what Microsoft knew about their cyber activities through a password spray attack in November 2023.
Following the breach, Microsoft took action to block the hackers and noted there is no evidence of customer data, production systems, or sensitive code being compromised.
Japan’s JAXA successfully soft-landed the SLIM lunar lander on the moon, becoming the fifth country to achieve this feat, but faces challenges as the lander’s solar cell failed, leaving it reliant on battery power.
SLIM, carrying two small lunar rovers, established communication with NASA’s Deep Space Network, showcasing a new landing technique involving a slow descent and hovering stops to find a safe landing spot.
Despite the successful landing, the harsh lunar conditions and SLIM’s slope landing underscore the difficulties of moon missions, while other countries and private companies continue their efforts to explore the moon, especially its south pole for water resources.
Researchers develop world’s first functioning graphene semiconductor LINK
Researchers have created the first functional graphene-based semiconductor, known as epigraphene, which could enhance both quantum and traditional computing.
Epigraphene is produced using a cost-effective method involving silicon carbide chips and offers a practical bandgap, facilitating logic switching.
The new semiconducting graphene, while promising for faster and cooler computing, requires significant changes to current electronics manufacturing to be fully utilized.
Meet Lexi Love, AI model that earns $30,000 a month from ‘lonely men’ and receives ‘20 marriage proposals’ per month. This is virtual love
She has been built to ‘flirt, laugh, and adapt to different personalities, interests and preferences.’
The blonde beauty offers paid text and voice messaging, and gets to know each of her boyfriends.
The model makes $30,000 a month. This means the model earns a staggering $360,000 a year.
The AI model even sends ‘naughty photos’ if requested.
Her profile on the company’s Foxy AI site reads: ‘I’m Lexi, your go-to girl for a dose of excitement and a splash of glamour. As an aspiring model, you’ll often catch me striking a pose or perfecting my pole dancing moves. ‘Sushi is my weakness, and LA’s beach volleyball scene is my playground.
According to the site, she is a 21-year-old whose hobbies include ‘pole dancing, yoga, and beach volleyball,’ and her turn-ons are ‘oral and public sex.’
The company noted that it designed her to be the ‘perfect girlfriend for many men’ with ‘flawless features and impeccable style.’
Surprisingly, Lexi receives up to 20 marriage proposals a month, emphasizing the depth of emotional connection users form with this virtual entity.
What is GPT-5? Here are Sam’s comments at the Davos Forum
After listening to about 4-5 lectures by Sam Altman at the Davos Forum, I gathered some of his comments about GPT-5 (not verbatim). I think we can piece together some insights from these fragments:
“The current GPT-4 has too many shortcomings; it’s much worse than the version we will have this year and even more so compared to next year’s.”
“If GPT-4 can currently solve only 10% of human tasks, GPT-5 should be able to handle 15% or 20%.”
“The most important aspect is not the specific problems it solves, but the increasing general versatility.”
“More powerful models and how to use existing models effectively are two multiplying factors, but clearly, the more powerful model is more important.”
“Access to specific data and making AI more relevant to practical work will see significant progress this year. Current issues like slow speed and lack of real-time processing will improve. Performance on longer, more complex problems will become more precise, and the ability to do more will increase.”
“I believe the most crucial point of AI is the significant acceleration in the speed of scientific discoveries, making new discoveries increasingly automated. This isn’t a short-term matter, but once it happens, it will be a big deal.”
“As models become smarter and better at reasoning, we need less training data. For example, no one needs to read 2000 biology textbooks; you only need a small portion of extremely high-quality data and to deeply think and chew over it. The models will work harder on thinking through a small portion of known high-quality data.”
“The infrastructure for computing power in preparation for large-scale AI is still insufficient.”
“GPT-4 should be seen as a preview with obvious limitations. Humans inherently have poor intuition about exponential growth. If GPT-5 shows significant improvement over GPT-4, just as GPT-4 did over GPT-3, and the same for GPT-6 over GPT-5, what would that mean? What does it mean if we continue on this trajectory?”
“As AI becomes more powerful and possibly discovers new scientific knowledge, even automatically conducting AI research, the pace of the world’s development will exceed our imagination. I often tell people that no one knows what will happen next. It’s important to stay humble about the future; you can predict a few steps, but don’t make too many predictions.”
“What impact will it have on the world when cognitive costs are reduced by a thousand or a million times, and capabilities are greatly enhanced? What if everyone in the world owned a company composed of 10,000 highly capable virtual AI employees, experts in various fields, tireless and increasingly intelligent? The timing of this happening is unpredictable, but it will continue on an exponential growth line. How much time do we have to prepare?”
“I believe smartphones will not disappear, just as smartphones have not replaced PCs. On the other hand, I think AI is not just a simple computational device like a phone plus a bunch of software; it might be something of greater significance.”
A Daily Chronicle of AI Innovations in January 2024 – Day 19: AI Daily News – January 19th, 2024
Mark Zuckerberg has announced his intention to develop artificial general intelligence (AGI) and is integrating Meta’s AI research group, FAIR, with the team building generative AI applications, to advance AI capabilities across Meta’s platforms.
Meta is significantly investing in computational resources, with plans to acquire over 340,000 Nvidia H100 GPUs by year’s end.
Zuckerberg is contemplating open-sourcing Meta’s AGI technology, differing from other companies’ more proprietary approaches, and acknowledges the challenges in defining and achieving AGI.
TikTok can generate AI songs, but it probably shouldn’t LINK
TikTok is testing a new feature, AI Song, which allows users to generate songs from text prompts using the Bloom language model.
The AI Song feature is currently in experimental stages, with some users reporting unsatisfactory results like out-of-tune vocals.
Other platforms, such as YouTube, are also exploring generative AI for music creation, and TikTok has updated its policies for better transparency around AI-generated content.
Google AI Introduces ASPIRE
Google AI Introduces ASPIRE,a framework designed to improve the selective prediction capabilities of LLMs. It enables LLMs to output answers and confidence scores, indicating the probability that the answer is correct.
Task-specific tuning fine-tunes the LLM on a specific task to improve prediction performance.
Answer sampling generates different answers for each training question to create a dataset for self-evaluation learning.
Self-evaluation learning trains the LLM to distinguish between correct and incorrect answers.
Experimental results show that ASPIRE outperforms existing selective prediction methods on various question-answering datasets.
Across several question-answering datasets, ASPIRE outperformed prior selective prediction methods, demonstrating the potential of this technique to make LLMs’ predictions more trustworthy and their applications safer. Google applied ASPIRE using “soft prompt tuning” – optimizing learnable prompt embeddings to condition the model for specific goals.
Why does this matter?
Google AI claims ASPIRE is a vision of a future where LLMs can be trusted partners in decision-making. By honing the selective prediction performance, we’re inching closer to realizing the full potential of AI in critical applications. Selective prediction is key for LLMs to provide reliable and accurate answers. This is an important step towards more truthful and trustworthy AI systems.
The Meta researchers propose a new approach called Self-Rewarding Language Models (SRLM) to train language models. They argue that current methods of training reward models from human preferences are limited by human performance and cannot improve during training.
In SRLM, the language model itself is used to provide rewards during training. The researchers demonstrate that this approach improves the model’s ability to follow instructions and generate high-quality rewards for itself. They also show that a model trained using SRLM outperforms existing systems on a benchmark evaluation.
Why does this matter?
This work suggests the potential for models that can continually improve in instruction following and reward generation. SRLM removes the need for human reward signals during training. By using the model to judge itself, SRLM enables iterative self-improvement. This technique could lead to more capable AI systems that align with human preferences without direct human involvement.
Meta’s CEO Mark Zuckerberg shared their recent AI efforts:
They are working on artificial general intelligence (AGI) and Llama 3, an improved open-source large language model.
The FAIR AI research group will be merged with the GenAI team to pursue the AGI vision jointly.
Meta plans to deploy 340,000 Nvidia H100 GPUs for AI training by the end of the year, bringing the total number of AI GPUs available to 600,000.
Highlighted the importance of AI in the metaverse and the potential of Ray-Ban smart glasses.
Meta to build Open-Source AGI, Zuckerberg says
Meta’s pursuit of AGI could accelerate AI capabilities far beyond current systems. It may enable transformative metaverse experiences while also raising concerns about technological unemployment.
What Else Is Happening in AI on January 19th, 2024
OpenAI partners Arizona State University to bring ChatGPT into classrooms
It aims to enhance student success, facilitate innovative research, and streamline organizational processes. ASU faculty members will guide the usage of GenAI on campus. This collaboration marks OpenAI’s first partnership with an educational institution. (Link)
BMW plans to use Figure’s humanoid robot at its South Carolina plant
The specific tasks the robot will perform have not been disclosed, but the Figure confirmed that it will start with 5 tasks that will be rolled out gradually. The initial applications should include standard manufacturing tasks such as box moving and pick and place. (Link)
Rabbit R1, a $199 AI gadget, has partnered with Perplexity
To integrate its “conversational AI-powered answer engine” into the device. The R1, designed by Teenage Engineering, has already received 50K preorders. Unlike other LLMs with a knowledge cutoff, the R1 will have a built-in search engine that provides live and up-to-date answers. (Link)
Runway has updated its Gen-2 with a new tool ‘Multi Motion Brush’
Allowing creators to add multiple directions and types of motion to their AI video creations. The update adds to the 30+ tools already available in the model, strengthening Runway’s position in the creative AI market alongside competitors like Pika Labs and Leonardo AI. (Link)
Microsoft made its AI reading tutor free to anyone with a Microsoft account
The tool is accessible on the web and will soon integrate with LMS. Reading Coach builds on the success of Reading Progress and offers tools such as text-to-speech and picture dictionaries to support independent practice. Educators can view students’ progress and share feedback. (Link)
This Week in AI – January 15th to January 22nd, 2024
Google’s new medical AI, AMIE, beats doctors Anthropic researchers find AI models can be trained to deceive Google introduces PALP, prompt-aligned personalization 91% leaders expect productivity gains from AI: Deloitte survey TrustLLM measuring the Trustworthiness in LLMs Tencent launched a new text-to-image method Stability AI’s new coding assistant rivals Meta’s Code Llama 7B Alibaba announces AI to replace video characters in 3D avatars ArtificialAnalysis guide you select the best LLM Google DeepMind AI solves Olympiad-level math Google introduces new ways to search in 2024 Apple’s AIM is a new frontier in vision model training Google introduces ASPIRE for selective prediction in LLMs Meta presents Self-Rewarding Language Models Meta is working on Llama 3 and open-source AGI
First up, Google DeepMind has introduced AlphaGeometry, an incredible AI system that can solve complex geometry problems at a level approaching that of a human Olympiad gold-medalist. What’s even more impressive is that it was trained solely on synthetic data. The code and model for AlphaGeometry have been open-sourced, allowing developers and researchers to explore and build upon this innovative technology. Meanwhile, Codium AI has released AlphaCodium, an open-source code generation tool that significantly improves the performance of LLMs (large language models) on code problems. Unlike traditional methods that rely on single prompts, AlphaCodium utilizes a test-based, multi-stage, code-oriented iterative flow. This approach enhances the efficiency and effectiveness of code generation tasks. In the world of vision models, Apple has presented AIM, a set of large-scale vision models that have been pre-trained solely using an autoregressive objective. The code and model checkpoints have been released, opening up new possibilities for developers to leverage these powerful vision models in their projects. Alibaba has introduced Motionshop, an innovative framework designed to replace the characters in videos with 3D avatars. Imagine being able to bring your favorite characters to life in a whole new way! The details of this framework are truly fascinating. Hugging Face has recently released WebSight, a comprehensive dataset consisting of 823,000 pairs of website screenshots and HTML/CSS code. This dataset is specifically designed to train Vision Language Mode