Download the AI & Machine Learning For Dummies PRO App: iOS - Android Our AI and Machine Learning For Dummies PRO App can help you Ace the following AI and Machine Learning certifications:
Welcome to the March 2024 edition of the Daily Chronicle, your gateway to the forefront of Artificial Intelligence innovation! Embark on a captivating journey with us as we unveil the most recent advancements, trends, and revolutionary discoveries in the realm of artificial intelligence. Delve into a world where industry giants converge at events like ‘AI Innovations at Work’ and where visionary forecasts shape the future landscape of AI. Stay abreast of daily updates as we navigate through the dynamic realm of AI, unraveling its potential impact and exploring cutting-edge developments throughout this enthralling month. Join us on this exhilarating expedition into the boundless possibilities of AI in March 2024.
Experience the transformative capabilities of AI with “Read Aloud For Me – AI Dashboard” – your ultimate AI Dashboard and Hub. Seamlessly access a comprehensive suite of top-tier AI tools within a single app, meticulously crafted to enhance your efficiency and streamline your digital interactions. Now available on the web at readaloudforme.com and across popular app platforms including Apple, Google, and Microsoft, “Read Aloud For Me – AI Dashboard” places the future of AI at your fingertips, blending convenience with cutting-edge innovation. Whether for professional endeavors, educational pursuits, or personal enrichment, our app serves as your portal to the forefront of AI technologies. Embrace the future today by downloading our app and revolutionize your engagement with AI tools.
A daily chronicle of AI Innovations: March 31st 2024: Generative AI develops potential new drugs for antibiotic-resistant bacteria; South Korean ‘artificial sun’ hits record 100M degrees for 100 seconds; Summary of the key points about OpenAI’s relationship with Dubai and the UAE; Deepmind did not originally see LLMs and the transformer as a path to AGI. Fascinating article.
Generative AI develops potential new drugs for antibiotic-resistant bacteria
Stanford Medicine researchers devise a new artificial intelligence model, SyntheMol, which creates recipes for chemists to synthesize the drugs in the lab.
With nearly 5 million deaths linked to antibiotic resistance globally every year, new ways to combat resistant bacterial strains are urgently needed.
Researchers at Stanford Medicine and McMaster University are tackling this problem with generative artificial intelligence. A new model, dubbed SyntheMol (for synthesizing molecules), created structures and chemical recipes for six novel drugs aimed at killing resistant strains of Acinetobacter baumannii, one of the leading pathogens responsible for antibacterial resistance-related deaths.
The researchers described their model and experimental validation of these new compounds in a study published March 22 in the journal Nature Machine Intelligence.
“There’s a huge public health need to develop new antibiotics quickly,” said James Zou, PhD, an associate professor of biomedical data science and co-senior author on the study. “Our hypothesis was that there are a lot of potential molecules out there that could be effective drugs, but we haven’t made or tested them yet. That’s why we wanted to use AI to design entirely new molecules that have never been seen in nature.”
South Korean ‘artificial sun’ hits record 100M degrees for 100 seconds
For the first time, the Korea Institute of Fusion Energy’s (KFE) Korea Superconducting Tokamak Advanced Research (KSTAR) fusion reactor has reached temperatures seven times that of the Sun’s core.
Achieved during testing between December 2023 and February 2024, this sets a new record for the fusion reactor project.
KSTAR, the researchers behind the reactor report, managed to maintain temperatures of 212 degrees Fahrenheit (100 million degrees Celsius) for 48 seconds. For reference, the temperature of the core of our Sun is 27 million degrees Fahrenheit (15 million degrees Celsius).
Gemini 1.5 Pro on Vertex AI is available for everyone as an experimental release
I think this one has flown under the radar: Gemini 1.5 Pro is available as Experimental on Vertex AI, for everyone, UI only for now (no API yet). In us-central1.
“Summary of the key points about OpenAI’s relationship with Dubai and the UAE”
OpenAI’s Partnership with G42
In October 2023, G42, a leading UAE-based technology holding group, announced a partnership with OpenAI to deliver advanced AI solutions to the UAE and regional markets.
The partnership will focus on leveraging OpenAI’s generative AI models in domains where G42 has deep expertise, including financial services, energy, healthcare, and public services.
G42 will prioritize its substantial AI infrastructure capacity to support OpenAI’s local and regional inferencing on Microsoft Azure data centers.
Sam Altman, CEO of OpenAI, stated that the collaboration with G42 aims to empower businesses and communities with effective solutions that resonate with the nuances of the region.
Altman’s Vision for the UAE as an AI Sandbox
During a virtual appearance at the World Governments Summit, Altman suggested that the UAE could serve as the world’s “regulatory sandbox” to test AI technologies and later spearhead global rules limiting their use.
Altman believes the UAE is well-positioned to be a leader in discussions about unified global policies to rein in future advances in AI.
The UAE has invested heavily in AI and made it a key policy consideration.
Altman’s Pursuit of Trillions in Funding for AI Chip Manufacturing
Altman is reportedly in talks with investors, including the UAE, to raise $5-7 trillion for AI chip manufacturing to address the scarcity of GPUs crucial for training and running large language models.
As part of the talks, Altman is pitching a partnership between OpenAI, various investors, chip makers, and power providers to build chip foundries that would be run by existing chip makers, with OpenAI agreeing to be a significant customer.
In summary, OpenAI’s partnership with G42 aims to expand AI capabilities in the UAE and the Middle East, with Altman envisioning the UAE as a potential global AI sandbox.
It’s a very long article so I’ll post the relevant snippets. But basically it seems that Google was late to the LLM game because Demis Hassabis was 100% focused on AGI and did not see LLM’s as a path toward AGI. Perhaps now he sees it as a potential path, but it’s probably possible that he is just now focusing on LLM’s so that Google does not get too far behind in the generative AI race. But his ultimate goal and obsession is to create AGI that can solve real problems like diseases.
Within DeepMind, generative models weren’t taken seriously enough, according to those inside, perhaps because they didn’t align with Hassabis’s AGI priority, and weren’t close to reinforcement learning. Whatever the rationale, DeepMind fell behind in a key area.”
“‘We’ve always had amazing frontier work on self-supervised and deep learning,’ Hassabis tells me. ‘But maybe the engineering and scaling component — that we could’ve done harder and earlier. And obviously we’re doing that completely now.'”
“Kulkarni, the ex-DeepMind engineer, believes generative models were not respected at the time across the AI field, and simply hadn’t show enough promise to merit investment. ‘Someone taking the counter-bet had to pursue that path,’ he says. ‘That’s what OpenAI did.'”
“Ironically, a breakthrough within Google — called the transformer model — led to the real leap. OpenAI used transformers to build its GPT models, which eventually powered ChatGPT. Its generative ‘large language’ models employed a form of training called “self-supervised learning,” focused on predicting patterns, and not understanding their environments, as AlphaGo did. OpenAI’s generative models were clueless about the physical world they inhabited, making them a dubious path toward human level intelligence, but would still become extremely powerful.”
As DeepMind rejoiced, a serious challenge brewed beneath its nose. Elon Musk and Sam Altman founded OpenAI in 2015, and despite plenty of internal drama, the organization began working on text generation.”
“As OpenAI worked on the counterbet, DeepMind and its AI research counterpart within Google, Google Brain, struggled to communicate. Multiple ex-DeepMind employees tell me their division had a sense of superiority. And it also worked to wall itself off from the Google mothership, perhaps because Google’s product focus could distract from the broader AGI aims. Or perhaps because of simple tribalism. Either way, after inventing the transformer model, Google’s two AI teams didn’t immediately capitalize on it.”
“‘I got in trouble for collaborating on a paper with a Brain because the thought was like, well, why would you collaborate with Brain?’ says one ex-DeepMind engineer. ‘Why wouldn’t you just work within DeepMind itself?'”
“Then, a few months later, OpenAI released ChatGPT.” “At first, ChatGPT was a curiosity. The OpenAI chatbot showed up on the scene in late 2022 and publications tried to wrap their heads around its significance. […] Within Google, the product felt familiar to LaMDA, a generative AI chatbot the company had run internally — and even convinced one employee it was sentient — but never released. When ChatGPT became the fastest growing consumer product in history, and seemed like it could be useful for search queries, Google realized it had a problem on its hands.”
OpenAI reveals Voice Engine, but won’t yet publicly release the risky AI voice-cloning technology
OpenAI has released VoiceEngine, a voice-cloning tool. The company claims that it can recreate a person’s voice with just 15 seconds of recording of that person talking.
Demis Hassabis, CEO and one of three founders of Google’s artificial intelligence (AI) subsidiary DeepMind, has been awarded a knighthood in the U.K. for “services to artificial intelligence.” [Source]
A daily chronicle of AI Innovations: March 30th, 2024: Microsoft and OpenAI to build $100 billion AI supercomputer ‘Stargate’; OpenAI unveils voice-cloning tool; Amazon’s AI team faces pressure to outperform Anthropic’s Claude models by mid-year; Microsoft Copilot has been blocked on all Congress-owned devices
Microsoft and OpenAI to build $100 billion AI supercomputer ‘Stargate’
OpenAI and Microsoft are working on a $100 billion project to build an AI supercomputer named ‘Stargate’ in the U.S.
The supercomputer will house millions of GPUs and could cost over $115 billion.
Stargate is part of a series of datacenter projects planned by the two companies, with the goal of having it operational by 2028.
Microsoft will fund the datacenter, which is expected to be 100 times more costly than current operating centers.
The supercomputer is being built in phases, with Stargate being a phase 5 system.
Challenges include designing novel cooling systems and considering alternative power sources like nuclear energy.
OpenAI aims to move away from Nvidia’s technology and use Ethernet cables instead of InfiniBand cables.
Details about the location and structure of the supercomputer are still being finalized.
Both companies are investing heavily in AI infrastructure to advance the capabilities of AI technology.
Microsoft’s partnership with OpenAI is expected to deepen with the development of projects like Stargate.
Microsoft and OpenAI are reportedly collaborating on a significant project to create a U.S.-based datacenter for an AI supercomputer named “Stargate,” estimated to cost over $115 billion and utilize millions of GPUs.
The supercomputer aims to be the largest among the datacenters planned by the two companies within the next six years, with Microsoft covering the costs and aiming for a launch by 2028.
The project, considered to be in phase 5 of development, requires innovative solutions for power, cooling, and hardware efficiency, including a possible shift away from relying on Nvidia’s InfiniBand in favor of Ethernet cables.
OpenAI has developed a text-to-voice generation platform named Voice Engine, capable of creating a synthetic voice from just a 15-second voice clip.
The platform is in limited access, serving entities like the Age of Learning and Livox, and is being used for applications from education to healthcare.
With concerns around ethical use, OpenAI has implemented usage policies, requiring informed consent and watermarking audio to ensure transparency and traceability.
Amazon’s AI team faces pressure to outperform Anthropic’s Claude models by mid-year
Amazon has invested $4 billion in AI startup Anthropic, but is also developing a competing large-scale language model called Olympus.
Olympus is supposed to surpass Anthropic’s latest Claude model by the middle of the year and has “hundreds of billions of parameters.”
So far, Amazon has had no success with its own language models. Employees are unhappy with Olympus’ development time and are considering switching to Anthropic’s models.
Microsoft Copilot has been blocked on all Congress-owned devices
The US House of Representatives has banned its staff from using Microsoft’s AI chatbot Copilot due to cybersecurity concerns over potential data leaks.
Microsoft plans to remove Copilot from all House devices and is developing a government-specific version aimed at meeting federal security standards.
The ban specifically targets the commercial version of Copilot, with the House open to reassessing a government-approved version upon its release.
A daily chronicle of AI Innovations: March 29th, 2024: Apple files lawsuit against former engineer for leaking details of projects he wanted to kill; Apple files lawsuit against former engineer for leaking details of projects he wanted to kill; Microsoft tackles Gen AI risks with new Azure AI tools; AI21 Labs’ Jamba triples AI throughput ; Google DeepMind’s AI fact-checker outperforms humans ; X’s Grok gets a major upgrade; Lightning AI partners with Nvidia to launch Thunder AI compiler
Apple files lawsuit against former engineer for leaking details of projects he wanted to kill
Apple has filed a lawsuit against former employee Andrew Aude for leaking confidential information about products like the Vision Pro and Journal app to journalists and competitors, motivated by his desire to “kill” products and features he disagreed with.
Aude, who joined Apple in 2016, is accused of sharing sensitive details via encrypted messages and meetings, including over 10,000 text messages to a journalist from The Information.
The lawsuit seeks damages, the return of bonuses and stock options, and a restraining order against Aude for disclosing any more of Apple’s confidential information.
Microsoft launches tools to try and stop people messing with chatbots
Microsoft has introduced a new set of tools in Azure to enhance the safety and security of generative AI applications, especially chatbots, aiming to counter risks like abusive content and prompt injections.
The suite includes features for real-time monitoring and protection against sophisticated threats, leveraging advanced machine learning to prevent direct and indirect prompt attacks.
These developments reflect Microsoft’s ongoing commitment to responsible AI usage, fueled by its significant investment in OpenAI and intended to address the security and reliability concerns of corporate leaders.
AI21 Labs has released Jamba, the first-ever production-grade AI model based on the Mamba architecture. This new architecture combines the strengths of both traditional Transformer models and the Mamba SSM, resulting in a model that is both powerful and efficient. Jamba boasts a large context window of 256K tokens, while still fitting on a single GPU.
Jamba’s hybrid architecture, composed of Transformer, Mamba, and mixture-of-experts (MoE) layers, optimizes for memory, throughput, and performance simultaneously.
The model has demonstrated remarkable results on various benchmarks, matching or outperforming state-of-the-art models in its size class. Jamba is being released with open weights under Apache 2.0 license and will be accessible from the NVIDIA API catalog.
Jamba’s hybrid architecture makes it the only model capable of processing 240k tokens on a single GPU. This could make AI tasks like machine translation and document analysis much faster and cheaper, without requiring extensive computing resources.
Google DeepMind’s AI fact-checker outperforms humans
Google DeepMind has developed an AI system called Search-Augmented Factuality Evaluator (SAFE) that can evaluate the accuracy of information generated by large language models more effectively than human fact-checkers. In a study, SAFE matched human ratings 72% of the time and was correct in 76% of disagreements with humans.
While some experts question the use of “superhuman” to describe SAFE’s performance, arguing for benchmarking against expert fact-checkers, the system’s cost-effectiveness is undeniable, being 20 times cheaper than human fact-checkers.
Why does this matter?
As language models become more powerful and widely used, SAFE could combat misinformation and ensure the accuracy of AI-generated content. SAFE’s efficiency could be a game-changer for consumers relying on AI for tasks like research and content creation.
X.ai, Elon Musk’s AI startup, has introduced Grok-1.5, an upgraded AI model for their Grok chatbot. This new version enhances reasoning skills, especially in coding and math tasks, and expands its capacity to handle longer and more complex inputs with a 128,000-token context window.
Grok chatbots are known for their ability to discuss controversial topics with a rebellious touch. The improved model will first be tested by early users on X, with plans for wider availability later. This release follows the open-sourcing of Grok-1 and the inclusion of the chatbot in X’s $8-per-month Premium plan.
This is significant because Grok-1.5 represents an advancement in AI assistants, potentially offering improved help with complex tasks and better understanding of user intent through its larger context window and real-time data ability. This could impact how people interact with chatbots in the future, making them more helpful and reliable.
Microsoft tackles Gen AI risks with new Azure AI tools
Microsoft has launched new Azure AI tools to address the safety and reliability risks associated with generative AI. The tools, currently in preview, aim to prevent prompt injection attacks, hallucinations, and the generation of personal or harmful content. The offerings include Prompt Shields, prebuilt templates for safety-centric system messages, and Groundedness Detection. (Link)
Lightning AI partners with Nvidia to launch Thunder AI compiler
Lightning AI, in collaboration with Nvidia, has launched Thunder, an open-source compiler for PyTorch, to speed up AI model training by optimizing GPU usage. The company claims that Thunder can achieve up to a 40% speed-up for training large language models compared to unoptimized code. (Link)
SambaNova’s new AI model beats Databricks’ DBRX
SambaNova Systems’ Samba-CoE v0.2 Large Language Model outperforms competitors like Databricks’ DBRX, MistralAI’s Mixtral-8x7B, and xAI’s Grok-1. With 330 tokens per second using only 8 sockets, Samba-CoE v0.2 demonstrates remarkable speed and efficiency without sacrificing precision. (Link)
Google.org launches Accelerator to empower nonprofits with Gen AI
Google.org has announced a six-month accelerator program to support 21 nonprofits in leveraging generative AI for social impact. The program provides funding, mentorship, and technical training to help organizations develop AI-powered tools in areas such as climate, health, education, and economic opportunity, aiming to make AI more accessible and impactful. (Link)
Pixel 8 to get on-device AI features powered by Gemini Nano
Google is set to introduce on-device AI features like recording summaries and smart replies on the Pixel 8, powered by its small-sized Gemini Nano model. The features will be available as a developer preview in the next Pixel feature drop, marking a shift from Google’s primarily cloud-based AI approach. (Link)
A daily chronicle of AI Innovations: March 28th, 2024: DBRX becomes world’s most powerful open-source LLM Claude 3 Opus crowned the top user-rated chatbot, beating OpenAI’s GPT-4 Empathy meets AI: Hume AI’s EVI redefines voice interaction
DBRX becomes world’s most powerful open source LLM
Databricks has released DBRX, a family of open-source large language models setting a new standard for performance and efficiency. The series includes DBRX Base and DBRX Instruct, a fine-tuned version designed for few-turn interactions. Developed by Databricks’ Mosaic AI team and trained using NVIDIA DGX Cloud, these models leverage an optimized mixture-of-experts (MoE) architecture based on the MegaBlocks open-source project. This architecture allows DBRX to achieve up to twice the compute efficiency of other leading LLMs.
In terms of performance, DBRX outperforms open-source models like Llama 2 70B, Mixtral-8x7B, and Grok-1 on industry benchmarks for language understanding, programming, and math. It also surpasses GPT-3.5 on most of these benchmarks, although it still lags behind GPT-4. DBRX is available under an open license with some restrictions and can be accessed through GitHub, Hugging Face, and major cloud platforms. Organizations can also leverage DBRX within Databricks’ Data Intelligence Platform.
Why does this matter?
With DBRX, organizations can build and fine-tune powerful proprietary models using their own internal datasets, ensuring full control over their data rights. As a result, DBRX is likely to accelerate the trend of organizations moving away from closed models and embracing open alternatives that offer greater control and customization possibilities.
Claude 3 Opus crowned the top user-rated chatbot, beating OpenAI’s GPT-4
Anthropic’s Claude 3 Opus has overtaken OpenAI’s GPT-4 to become the top-rated chatbot on the Chatbot Arena leaderboard. This marks the first time in approximately a year since GPT-4’s release that another language model has surpassed it in this benchmark, which ranks models based on user preferences in randomized head-to-head comparisons. Anthropic’s cheaper Haiku and mid-range Sonnet models also perform impressively, coming close to the original GPT-4’s capabilities at a significantly lower cost.
While OpenAI still dominates the market, especially among regular users with ChatGPT, this development and recent leadership changes at OpenAI have helped Anthropic gain ground. However, OpenAI is rumored to be preparing to launch an even more advanced “GPT-4.5” or “GPT-5” model as soon as this summer, which CEO Sam Altman has teased will be “amazing,” potentially allowing them to retake the lead from Anthropic’s Claude 3 Opus.
Claude’s rise to the top of the Chatbot Arena leaderboard shows that OpenAI is not invincible and will face stiff competition in the battle for AI supremacy. With well-resourced challengers like Anthropic and Google, OpenAI will need to move fast and innovate boldly to maintain its top position. Ultimately, this rivalry will benefit everyone as it catalyzes the development of more powerful, capable, and hopefully beneficial AI systems that can help solve humanity’s major challenges.
Empathy meets AI: Hume AI’s EVI redefines voice interaction
In a significant development for the AI community, Hume AI has introduced a new conversational AI called Empathic Voice Interface (EVI). What sets EVI apart from other voice interfaces is its ability to understand and respond to the user’s tone of voice, adding unprecedented emotional intelligence to the interaction. By adapting its language and responses based on the user’s expressions, EVI creates a more human-like experience, blurring the lines between artificial and emotional intelligence.
EVI’s empathic capabilities extend beyond just understanding tone. It can accurately detect the end of a conversation turn, handle interruptions seamlessly, and even learn from user reactions to improve over time. These features, along with its fast and reliable transcription and text-to-speech capabilities, make EVI a highly adaptable tool for various applications. Developers can easily integrate EVI into their projects using Hume’s API, which will be publicly available in April.
Why does this matter?
Emotionally intelligent AI can be revolutionary for industries like healthcare and use cases like customer support, where empathy and emotional understanding are crucial. But we must also consider potential risks, such as overreliance on AI for emotional support or the possibility of AI systems influencing users’ emotions in unintended ways. If developed and implemented ethically, emotionally intelligent AI can greatly enhance how we interact with and benefit from AI technologies in our daily lives.
OpenAI launches revenue sharing program for GPT Store builders
OpenAI is experimenting with sharing revenue with builders who create successful apps using GPT in OpenAI’s GPT Store. The goal is to incentivize creativity and collaboration by rewarding builders for their impact on an ecosystem OpenAI is testing so they can make it easy for anyone to build and monetize AI-powered apps. (Link)
Google introduces new shopping features to refine searches
Google is rolling out new shopping features that allow users to refine their searches and find items they like more easily. The Style Recommendations feature lets shoppers rate items in their searches, helping Google pick up on their preferences. Users can also specify their favorite brands to instantly bring up more apparel from those selections. (Link)
rabbit’s r1 device gets ultra-realistic voice powered by ElevenLabs
ElevenLabs has partnered with rabbit to integrate its high-quality, low-latency voice AI into rabbit’s r1 AI companion device. The collaboration aims to make the user experience with r1 more natural and intuitive by allowing users to interact with the device using voice commands. (Link)
AI startup Hume raises $50M to build emotionally intelligent conversational AI
AI startup Hume has raised $50 million in a Series B funding round, valuing the company at $219 million. Hume’s AI technology can detect over 24 distinct emotional expressions in human speech and generate appropriate responses. The startup’s AI has been integrated into applications across healthcare, customer service, and productivity, with the goal of providing more context and empathy in AI interactions. (Link)
Lenovo launches AI-enhanced PCs in a push for innovation and differentiation
Lenovo revealed a new lineup of AI-powered PCs and laptops at its Innovate event in Bangkok, Thailand. The company showcased the dual-screen Yoga Book 9i, Yoga Pro 9i with an AI chip for performance optimization and AI-enhanced Legion gaming laptops. Lenovo hopes to differentiate itself in the crowded PC market and revive excitement with these AI-driven innovations. (Link)
Study shows ChatGPT can produce medical record notes 10 times faster than doctors without compromising quality
The AI model ChatGPT can write administrative medical notes up to 10 times faster than doctors without compromising quality. This is according to a study conducted by researchers at Uppsala University Hospital and Uppsala University in collaboration with Danderyd Hospital and the University Hospital of Basel, Switzerland. The research is published in the journal Acta Orthopaedica.
Microsoft’s Copilot AI service is set to run locally on PCs, Intel told Tom’s Hardware. The company also said that next-gen AI PCs would require built-in neural processing units (NPUs) with over 40 TOPS (trillion operations per second) of power — beyond the capabilities of any consumer processor on the market.
Intel said that the AI PCs would be able to run “more elements of Copilot” locally. Currently, Copilot runs nearly everything in the cloud, even small requests. That creates a fair amount of lag that’s fine for larger jobs, but not ideal for smaller jobs. Adding local compute capability would decrease that lag, while potentially improving performance and privacy as well.
Microsoft was previously rumored to require 40 TOPS on next-gen AI PCs (along with a modest 16GB of RAM). Right now, Windows doesn’t make much use of NPUs, apart from running video effects like background blurring for Surface Studio webcams. ChromeOS and macOS both use NPU power for more video and audio processing features, though, along with OCR, translation, live transcription and more, Ars Technica noted.
A daily chronicle of AI Innovations: March 27th, 2024: Microsoft study reveals the 11 by 11 tipping point for AI adoption A16z spotlights the rise of generative AI in enterprises Gaussian Frosting revolutionizes surface reconstruction in 3D modeling OpenAI unveils exciting upcoming features for GPT-4 and DALL-E 3 Adobe unveils GenStudio: AI-powered ad creation platform
Microsoft study reveals the 11 by 11 tipping point for AI adoption
Microsoft’s study on AI adoption in the workplace revealed the “11-by-11 tipping point,” where users start seeing AI’s value by saving 11 minutes daily. The study involved 1,300 Copilot for Microsoft 365 users and showed that 11 minutes of time savings is enough for most people to find AI useful.
Over 11 weeks, users reported improved productivity, work enjoyment, work-life balance, and fewer meetings. This “11-by-11 tipping point” signifies the time it takes for individuals to experience AI’s benefits in their work fully.
Why does it matter?
The study offers insights for organizations aiming to drive AI adoption among their employees. Businesses can focus on identifying specific use cases that deliver immediate benefits like time and cost savings. It will help organizations encourage employees to embrace AI, increasing productivity and improving work experiences.
A16z spotlights the rise of generative AI in enterprises
A groundbreaking report by the influential tech firm a16z unveils the rapid integration of generative AI technologies within the corporate sphere. The report highlights essential considerations for business leaders to harness generative AI effectively. It covers resource allocation, model selection, and innovative use cases, providing a strategic roadmap for enterprises.
An increased financial commitment from businesses marks the adoption of generative AI. Industry leaders are tripling their investments in AI technologies, emphasizing the pivotal role of generative AI in driving innovation and efficiency.
The shift towards integrating AI into core operations is evident. There is a focus on measuring productivity gains and cost savings and quantifying impact on key business metrics.
Why does it matter?
The increasing budgets allocated to generative AI signal its strategic importance in driving innovation and productivity in enterprises. This highlights AI’s transformative potential to provide a competitive edge and unlock new opportunities. Generative AI can revolutionize various business operations and help gain valuable insights by leveraging diverse data types.
Gaussian Frosting revolutionizes surface reconstruction in 3D modeling
At the international conference on computer vision, researchers presented a new method to improve surface reconstruction using Gaussian Frosting. This technique automates the adjustment of Poisson surface reconstruction hyperparameters, resulting in significantly improved mesh reconstruction.
The method showcases the potential for scaling up mesh reconstruction while preserving intricate details and opens up possibilities for advanced geometry and texture editing. This work marks a significant step forward in surface reconstruction methods, promising advancements in 3D modeling and visualization techniques.
Why does it matter?
The new method demonstrates how AI enhances surface reconstruction techniques, improving mesh quality and enabling advanced editing in 3D modeling. This has significant implications for revolutionizing how 3D models are created, edited, and visualized across various industries.
AIs can now learn and talk with each other like humans do.
This seems an important step toward AGI and vastly improved productivity.
“Once these tasks had been learned, the network was able to describe them to a second network — a copy of the first — so that it could reproduce them. To our knowledge, this is the first time that two AIs have been able to talk to each other in a purely linguistic way,’’ said lead author of the paper Alexandre Pouget, leader of the Geneva University Neurocenter, in a statement.”
“While AI-powered chatbots can interpret linguistic instructions to generate an image or text, they can’t translate written or verbal instructions into physical actions, let alone explain the instructions to another AI.
However, by simulating the areas of the human brain responsible for language perception, interpretation and instructions-based actions, the researchers created an AI with human-like learning and communication skills.”
Adobe unveils GenStudio: AI-powered ad creation platform
Adobe introduced GenStudio, an AI-powered ad creation platform, during its Summit event. GenStudio is a centralized hub for promotional campaigns, offering brand kits, copy guidance, and preapproved assets. It also provides generative AI-powered tools for generating backgrounds and ensuring brand consistency. Users can quickly create ads for email and social media platforms like Facebook, Instagram, and LinkedIn. (Link)
Airtable introduces AI summarization for enhanced productivity
Airtable has introduced Airtable AI, which provides generative AI summarization, categorization, and translation to users. This feature allows quick insights and understanding of information within workspaces, enabling easy sharing of valuable insights with teams. Airtable AI automatically applies categories and tags to information, routes action items to the relevant team, and generates emails or social posts with a single button tap. (Link)
Microsoft Teams enhances Copilot AI features for improved collaboration
Microsoft is introducing smarter Copilot AI features in Microsoft Teams to enhance collaboration and productivity. The updates include new ways to invoke the assistant during meeting chats and summaries, making it easier to catch up on missed meetings by combining spoken transcripts and written chats into a single view. Microsoft is launching new hybrid meeting features, such as automatic camera switching for remote participants and speaker recognition for accurate transcripts. (Link)
OpenAI unveils exciting upcoming features for GPT-4 and DALL-E 3
OpenAI is preparing to introduce new features for its GPT-4 and DALL-E 3 models. For GPT-4, OpenAI plans to remove the message limit, implement a Model Tuner Selector, and allow users to upgrade responses from GPT-3.5 to GPT-4 with a simple button push. On the DALL-E 3 front, OpenAI is working on an image editor with inpainting functionality. These upcoming features demonstrate OpenAI’s commitment to advancing AI capabilities. (Link)
Apple Chooses Baidu’s AI for iPhone 16 in China
Apple has reportedly chosen Baidu to provide AI technology for its upcoming iPhone 16 and other devices in China. This decision comes as Apple faces challenges due to stagnation in iPhone innovation and competition from Huawei. Baidu’s Ernie Bot will be included in the Chinese version of the iPhone 16, Mac OS, and iOS 18. Despite discussions with Alibaba Group Holding and a Tsinghua University AI startup, Apple selected Baidu’s AI technology for compliance. (Link)
Meta CEO, Mark Zuckerberg, is directly recruiting AI talent from Google’s DeepMind with personalized emails.
Meta CEO, Mark Zuckerberg, is attempting to recruit top AI talent from Google’s DeepMind (their AI research unit). Personalised emails, from Zuckerberg himself, have been sent to a few of their top researchers, according to a report from The Information, which cited individuals that had seen the messages. In addition to this, the researchers are being hired without having to do any interviews, and, a previous policy which Meta had in place – to not offer higher offers to candidates with competing job offers – has been relaxed.
Zuckerberg appears to be on a hiring spree to build Meta into a position of being a dominant player in the AI space.
OpenAI’s Sora Takes About 12 Minutes to Generate 1 Minute Video on NVIDIA H100. Source.
Apple on Tuesday announced that its annual developers conference, WWDC, will take place June 10 through June 14. Source.
Elon Musk says all Premium subscribers on X will gain access to AI chatbot Grok this week. Source.
Intel unveils AI PC program for software developers and hardware vendors. Source.
London-made HIV injection has potential to cure millions worldwide
A daily chronicle of AI Innovations: March 26th, 2024 : Zoom launches all-in-one modern AI collab platform; Stability AI launches instruction-tuned LLM; Stability AI CEO resigns to focus on decentralized AI; WhatsApp to integrate Meta AI directly into its search bar; Google, Intel, and Qualcomm challenge Nvidia’s dominance in AI; OpenAI pitches Sora to Hollywood studios
Zoom launches all-in-one modern AI collab platform
Zoom launched Zoom Workplace, an AI collaboration platform that integrates many tools to improve teamwork and productivity. With over 40 new features, including AI Companion updates for Zoom Phone, Team Chat, Events, and Contact Center, as well as the introduction of Ask AI Companion, Zoom Workplace simplifies workflows within a familiar interface.
The platform offers customization options, meeting features, and improved collaboration tools across Zoom’s ecosystem. Zoom Business Services, integrated with Zoom Workplace, offers AI-driven marketing, customer service, and sales solutions. It expands digital communication channels and provides real-time insights for better agent management.
Why does this matter?
This intelligent platform will increase productivity by automating tasks, summarizing interactions, and personalizing user experiences. This move positions Zoom as a frontrunner in the race to integrate AI into everyday work tools, which will reshape how teams communicate and collaborate.
Stability AI has introduced Stable Code Instruct 3B, a new instruction-tuned large language model. It can handle various software development tasks, such as code completion, generation, translation, and explanation, as well as creating database queries with simple instructions.
Stable Code Instruct 3B claims to outperform rival models like CodeLlama 7B Instruct and DeepSeek-Coder Instruct 1.3B in terms of accuracy, understanding natural language instructions, and handling diverse programming languages. The model is accessible for commercial use with a Stability AI Membership, while its weights are freely available on Hugging Face for non-commercial projects.
Why does this matter?
This model simplifies development workflows and complex tasks by providing contextual code completion, translation, and explanations. Businesses can prototype, iterate and ship software products faster thanks to its high performance and low hardware requirements.
Stability AI CEO resigns because of centralized AI
Stability AI CEO Emad Mostaque steps down to focus on decentralized AI, advocating for transparent governance in the industry.
Mostaque’s departure follows the appointment of interim co-CEOs Shan Shan Wong and Christian Laforte.
The startup, known for its image generation tool, faced challenges including talent loss and financial struggles.
Mostaque emphasized the importance of generative AI R&D over revenue growth and highlighted the potential economic value of open models in regulated industries.
The AI industry witnessed significant changes with Inflection AI co-founders joining Microsoft after raising $1.5 billion.
A 15% penetration of Sora for videos with realistic video generation demand and utilization will require about 720k Nvidia H100 GPUs. Each H100 requires about 700 Watts of power supply.
720,000 x 700 = 504 Megawatts.
By comparison, even the largest ever fully solar powered plan in America (Ivanpah Solar Power Facility) produces about 377 Megawats.
While these power requirements can be met with other options like nuclear plants and even coal/hydro plants of big sizes … are we really entering the power game for electricity ?
( it is currently a power game on compute)
What Else Is Happening in AI on March 26th, 2024
The Financial Times has introduced Ask FT, a new GenAI chatbot
It provides curated, natural-language responses to queries about recent events and broader topics covered by the FT. Ask FT is powered by Anthropic’s Claude and is available to a selected group of subscribers as it is under testing. (Link)
WhatsApp to integrate Meta AI directly into its search bar
The latest Android WhatsApp beta update will embed Meta AI directly into the search bar. This feature will allow users to type queries into the search bar and receive instant AI-powered responses without creating a separate Meta AI chat. The update will also allow users to interact with Meta AI even if they choose to hide the shortcut. (Link)
Google, Intel, and Qualcomm challenge Nvidia’s dominance in AI
Qualcomm, Google, and Intel are targeting NVIDIA’s software platforms like CUDA. They plan to create open-source tools compatible with multiple AI accelerator chips through the UXL Foundation. Companies are investing over $4 billion in startups developing AI software to loosen NVIDIA’s grip on the field. (Link)
Apple takes a multi-vendor approach for generative AI in iOS 18
Apple is reportedly in talks with Alphabet, OpenAI, and Anthropic to integrate generative AI capabilities from multiple vendors into iOS 18. This multi-vendor approach aligns with Apple’s efforts to balance advanced AI features with privacy considerations, which are expected to be detailed at WWDC 2024 during the iOS 18 launch. (Link)
OpenAI pitches Sora to Hollywood studios
OpenAI is actively engaging with Hollywood studios, directors, and talent agencies to integrate Sora into the entertainment industry. The startup has scheduled meetings in Los Angeles to showcase Sora’s capabilities and encourage partnerships, with CEO Sam Altman attending events during the Oscars weekend. (Link)
LLM providers charge you per token, but their tokens are not always comparable. So if you are putting Python code through GPT-4 and Claude 3, it would cost you 25% more tokens to do so with Claude, due to difference in their tokenisers (note: this is different to cost per token, it just means you will have more tokens to pay for)
Some observations: – OpenAI’s GPT-4 & 3.5 tokeniser is the most efficient for English and Python – Gemini absolutely demolishes the competition in the three languages I tested: French (-11%), Chinese (-43%) and Hebrew (-54%) – If your use cases is non-English, really worth looking at Gemini models – the difference in cost will likely be very noticeable – Llama 2 ranked at the bottom of all of my tests – Mistral was kind of disappointing on French (+16% worse than GPT), the reason why I picked French was that I assumed they’d do better
Methodology notes: – The study will be limited, I only compared 7 individual bits of text/code – so results in practice would vary – I have used this tokeniser playground (https://huggingface.co/spaces/Xenova/the-tokenizer-playground) for GPT, Mistral and Llama. I found it to be inaccurate (or old?) for Claude 3 and they didn’t have Gemini, so I did these separately – Tokens are only part of the puzzle, more efficient tokenisation won’t necessarily mean better performance or overall lower cost – If you want to learn about tokenisers, I recommend watching this video from Andrej Karpathy, even the first 10-20 minutes will be really worth your time https://www.youtube.com/watch?v=zduSFxRajkE
A daily chronicle of AI Innovations: March 25th, 2024 : Apple could partner with OpenAI, Gemini, Anthropic; Chatbots more likely to change your mind than another human, study says; Chatbots more likely to change your mind than another human, study says; Verbal Reasoning Test – Opus is better than 93% of people, Gemini 1.5 Pro 59%, GPT-4 Turbo only 36%; Apple’s Tim Cook says AI essential tool for businesses to reduce carbon footprint; Suno V3: Song-on-demand AI is getting insanely good; The first patient with a Neuralink brain-computer implant played Nintendo’s Mario Kart video game with his mind in an impressive new demo video
Apple could partner with OpenAI, Gemini, Anthropic
Apple is discussing with Alphabet, OpenAI, Anthropic, and potentially Baidu to integrate generative AI into iOS 18, considering multiple partners rather than a single one.
The collaboration could lead to a model where iPhone users might choose their preferred AI provider, akin to selecting a default search engine in a web browser.
Reasons for partnering with external AI providers include financial benefits, the possibility to quickly adapt through partnership changes or user preferences, and avoiding the complexities of developing and maintaining cloud-based generative AI in-house.
EU probes Apple, Google, Meta under new digital law
The European Commission has initiated five investigations into Apple, Google, and Meta for potential non-compliance with the Digital Markets Act (DMA), focusing on app store rules, search engine preferencing, and advertisement targeting models.
Investigations will also examine Apple’s app distribution fee structure and Amazon’s product preferencing, while Meta is given six months to make Messenger interoperable with other messaging services.
Companies may face fines up to 10% of their annual global revenue for DMA non-compliance, with the possibility of increased penalties for repeated infringements.
Chatbots more likely to change your mind than another human, study says
A study found that personalized chatbots, such as GPT-4, are more likely to change people’s minds compared to human debaters by using tailored arguments based on personal information.
The research conducted by the École Polytechnique Fédérale de Lausanne and the Italian Fondazione Bruno Kessler showed an 81.7 percent increase in agreement when GPT-4 had access to participants’ personal data like age, gender, and race.
Concerns were raised about the potential misuse of AI in persuasive technologies, especially with the ability to generate detailed user profiles from online activities, urging online platform operators to counter such strategies.
OpenAI CEO’s £142 Million Gamble On Unlocking the Secrets to Longer Life, Altman’s vision of extended lifespans may be achievable
Biotech startup Retro Biosciences is undertaking a one-of-a-kind experiment housed in shipping containers, funded by a $180 (£142.78) million investment by tech leader Sam Altman to increase lifespan.
Altman, the 38-year-old tech heavyweight, has been a significant player in the industry. Despite his young age, Altman took the tech realm by storm with offerings like ChatGPT and Sora. Unsurprisingly, his involvement in these groundbreaking projects has propelled him to a level of influence rivaling Mark Zuckerberg and Elon Musk, who is currently embroiled in a lawsuit with OpenAI.
It is also worth noting that the Altman-led AI startup is reportedly planning to launch its own AI-powered search engine to challenge Google’s search dominance. Altman’s visionary investments in tech giants like Reddit, Stripe, Airbnb, and Instacart propelled him to billionaire status. They cemented his influence as a tech giant who relentlessly pushed the boundaries of the industry’s future.
Suno V3 can do multiple languages in one song. This one is English, Portuguese, Japanese, and Italian. Incredible.
Beneath the vast sky, where dreams lay rooted deep, Mountains high and valleys wide, secrets they keep. Ground beneath my feet, firm and ever true, Earth, you give us life, in shades of brown and green hue.
Sopra o vento, mensageiro entre o céu e o mar, Carregando sussurros, histórias a contar. Dançam as folhas, em um balé sem fim, Vento, o alento invisível, guiando o destino assim.
Acqua, misteriosa forza che tutto scorre, Nei fiumi, nei mari, la vita che ci offre. Specchio del cielo, in te ci riflettiamo, Acqua, fonte di vita, a te ci affidiamo.
OpenAI Heading To Hollywood To Pitch Revolutionary “Sora”
Some of the most important meetings in Hollywood history will take place in the coming week, as OpenAI hits Hollywood to show the potential of its “Sora” software to studios, talent agencies, and media executives.
Bloomberg is reporting that OpenAI wants more filmmakers to become familiar with Sora, the text-to-video generator that potentially could upend the way movies are made.
Soon, Everyone Will Own a Robot, Like a Car or Phone Today. Says Figure AI founder
Brett Adcock, the founder of FigureAI robots, the company that recently released a demo video of its humanoid robot conversing with a human while performing tasks, predicts that everyone will own a robot in the future. “Similar to owning a car or phone today,” he said – hinting at the universal adoption of robots as an essential commodity in the future.
“Every human will own a robot in the future, similar to owning a car/phone today,” said Adcock.
A few months ago, Adcock called 2024 the year of Embodied AI, indicating how the future comprises AI in a body form. With robots learning to perform low-complexity tasks, such as picking trash, placing dishes, and even using the coffee machine, Figure robots are being trained to assist a person with house chores.
WhatsApp to embed Meta AI directly into search bar for instant assistance: Report.
WhatsApp is on the brink of a transformation in user interaction as it reportedly plans to integrate Meta AI directly into its search bar. This move promises to simplify access to AI assistance within the app, eliminating the need for users to navigate to a separate Meta AI conversation.
1️⃣ Technical Assistance & Troubleshooting (23%) 2️⃣ Content Creation & Editing (22%) 3️⃣ Personal & Professional Support (17%) 4️⃣ Learning & Education (15%) 5️⃣ Creativity & Recreation (13%) 6️⃣ Research, Analysis & Decision Making (10%)
What users are doing:
✔Generating ideas ✔Specific search ✔Editing text ✔Drafting emails ✔Simple explainers ✔Excel formulas ✔Sampling data
🤔 Do you see AI as a tool to enhance your work, or as a threat that could take over your job?
Source: HBR Image credit: Filtered
A daily chronicle of AI Innovations: March 22nd, 2024 : Nvidia’s Latte 3D generates text-to-3D in seconds! Saudi Arabia to invest $40 billion in AI Open Interpreter’s 01 Light personal pocket AI agent. Microsoft introduces a new Copilot for better productivity. Quiet-STaR: LMs can self-train to think before responding Neuralink’s first brain chip patient plays chess with his mind
Nvidia’s Latte 3D generates text-to-3D in seconds!
NVIDIA introduces Latte3D, facilitating the conversion of text prompts into detailed 3D models in less than a second. Developed by NVIDIA’s Toronto lab, Latte3D sets a new standard in generative AI models with its remarkable blend of speed and precision.
LATTE3D has two stages: first, NVIDIA’s team uses volumetric rendering to train the texture and geometry robustly, and second, it uses surface-based rendering to train only the texture for quality enhancement. Both stages use amortized optimization over prompts to maintain fast generation.
What sets Latte3D apart is its extensive pretraining phase, enabling the model to quickly adapt to new tasks by drawing on a vast repository of learned patterns and structures. This efficiency is achieved through a rigorous training regime that includes a blend of 3D datasets and prompts from ChatGPT.
Why does it matter?
AI models such as NVIDIA’s Latte3D have significantly reduced the time required to generate 3D visualizations from an hour to a few minutes compared to a few years ago. This technology has the potential to significantly accelerate the design and development process in various fields, such as the video game industry, advertising, and more.
Quiet-STaR: LMs can self-train to think before responding
A groundbreaking study demonstrates the successful training of large language models (LM) to reason from text rather than specific reasoning tasks. The research introduces a novel training approach, Quiet STaR, which utilizes a parallel sampling algorithm to generate rationales from all token positions in a given string.
This technique integrates meta tokens to indicate when the LM should generate a rationale and when it should make a prediction based on the rationale, revolutionizing the understanding of LM behavior. Notably, the study shows that thinking enables the LM to predict difficult tokens more effectively, leading to improvements with longer thoughts.
The research introduces powerful advancements, such as a non-myopic loss approach, the application of a mixing head for retrospective determination, and the integration of meta tokens, underpinning a comprehensive leap forward in language model training.
Why does it matter?
These significant developments in language modeling advance the field and have the potential to revolutionize a wide range of applications. This points towards a future where large language models will unprecedentedly contribute to complex reasoning tasks.
Neuralink’s first brain chip patient plays chess with his mind
Elon Musk’s brain chip startup, Neuralink, showcased its first brain chip patient playing chess using only his mind. The patient, Noland Arbaugh, was paralyzed below the shoulder after a diving accident.
Neuralink’s brain implant technology allows people with paralysis to control external devices using their thoughts. With further advancements, Neuralink’s technology has the potential to revolutionize the lives of people with paralysis, providing them with newfound independence and the ability to interact with the world in previously unimaginable ways.
Why does it matter?
Neuralink’s brain chip holds significant importance in AI and human cognition. It has the potential to enhance communication, assist paralyzed individuals, merge human intelligence with AI, and address the risks associated with AI development. However, ethical considerations and potential misuse of this technology must also be carefully examined.
Microsoft introduces a new Copilot for better productivity.
Microsoft’s new Copilot for Windows and Surface devices is a powerful productivity tool integrating large language models with Microsoft Graph and Microsoft 365 apps to enhance work efficiency. With a focus on delivering AI responsibly while ensuring data security and privacy, Microsoft is dedicated to providing users with innovative tools to thrive in the evolving work landscape. (Link)
Saudi Arabia to invest $40 billion in AI
Saudi Arabia has announced its plan to invest $40 billion in AI to become a global leader. Middle Eastern countries use their sovereign wealth fund, which has over $900 billion in assets, to achieve this goal. This investment aims to position the country at the forefront of the fast-evolving AI sector, drive innovation, and enhance economic growth. (Link)
Rightsify releases Hydra II to revolutionize AI music generation
Rightsify, a global music licensing leader, introduced Hydra II, the latest AI generation model. Hydra II offers over 800 instruments, 50 languages, and editing tools for customizable, copyright-free AI music. The model is trained on audio, text descriptions, MIDI, chord progressions, sheet music, and stems to create unique generations. (Link)
Open Interpreter’s 01 Light personal pocket AI agent
The Open Interpreter unveiled 01 Light, a portable device that allows you to control your computer using natural language commands. It’s part of an open-source project to make computing more accessible and flexible. It’s designed to make your online tasks more manageable, helping you get more done and simplify your life. (Link)
Microsoft’s $650 million Inflection deal: A strategic move Microsoft has recently entered into a significant deal with AI startup Inflection, involving a payment of $650 million in cash. While the deal may seem like a licensing agreement, it appears to be a strategic move by Microsoft to acquire AI talent while avoiding potential regulatory trouble. (Link)
Microsoft unveiled its first “AI PCs,” with a dedicated Copilot key and Neural Processing Units (NPUs).
Source: Nvidia
OpenAI Courts Hollywood in Meetings With Film Studios, Directors – from Bloomberg
The artificial intelligence startup has scheduled meetings in Los Angeles next week with Hollywood studios, media executives and talent agencies to form partnerships in the entertainment industry and encourage filmmakers to integrate its new AI video generator into their work, according to people familiar with the matter.
The upcoming meetings are just the latest round of outreach from OpenAI in recent weeks, said the people, who asked not to be named as the information is private. In late February, OpenAI scheduled introductory conversations in Hollywood led by Chief Operating Officer Brad Lightcap. Along with a couple of his colleagues, Lightcap demonstrated the capabilities of Sora, an unreleased new service that can generate realistic-looking videos up to about a minute in length based on text prompts from users. Days later, OpenAI Chief Executive Officer Sam Altman attended parties in Los Angeles during the weekend of the Academy Awards.
In an attempt to avoid defeatism, I’m hoping this will contribute to the indie boom with creatives refusing to work with AI and therefore studios who insist on using it. We’ve already got people on twitter saying this is the end of the industry but maybe only tentpole films as we know them.
Catherine, the Princess of Wales, has cancer, she announced in a video message released by Kensington Palace on Friday March 22nd, 2024
The recent news surrounding Kate Middleton, the Princess of Wales, revolves around a manipulated family photo that sparked controversy and conspiracy theories. The photo, released by Middleton herself, depicted her with her three children and was met with speculation about potential AI involvement in its editing. However, experts suggest that the image was likely manipulated using traditional photo editing software like Photoshop rather than generative AI
The circumstances surrounding Middleton’s absence from the public eye due to abdominal surgery fueled rumors and intensified scrutiny over the edited photo.
Major news agencies withdrew the image, citing evidence of manipulation in areas like Princess Charlotte’s sleeve cuff and the alignment of elements in the photo.
Despite concerns over AI manipulation, this incident serves as a reminder that not all image alterations involve advanced technology, with this case being attributed to a botched Photoshop job.From an AI perspective, experts highlight how the incident reflects society’s growing awareness of AI technologies and their impact on shared reality. The controversy surrounding the edited photo underscores the need for transparency and accountability in media consumption to combat misinformation and maintain trust in visual content. As AI tools become more accessible and sophisticated, distinguishing between authentic and manipulated media becomes increasingly challenging, emphasizing the importance of educating consumers and technologists on identifying AI-generated content.Kate Middleton, the Princess of Wales, recently disclosed her battle with cancer in a heartfelt statement. Following major abdominal surgery in January, it was initially believed that her condition was non-cancerous. However, subsequent tests revealed the presence of cancer, leading to the recommendation for preventative chemotherapy. The 42-year-old princess expressed gratitude for the support received during this challenging time and emphasized the importance of privacy as she focuses on her treatment and recovery. The news of her diagnosis has garnered an outpouring of support from around the world, with messages of encouragement coming from various public figures and officials.
Nvidia CEO says we’ll see fully AI-generated games in 5-10 years
Nvidia’s CEO, Jensen Huang, predicts the emergence of fully AI-generated games within the next five to ten years. This prediction is based on the development of Nvidia’s next-generation Blackwell AI GPU, the B200. This GPU marks a significant shift in GPU usage towards creating neural networks for generating content rather than traditional rasterization or ray tracing for visual fidelity in games. The evolution of AI in gaming is highlighted as GPUs transition from rendering graphics to processing AI algorithms for content creation, indicating a major transformation in the gaming industry’s future landscape.The integration of AI into gaming represents a paradigm shift that could revolutionize game development and player experiences. Fully AI-generated games have the potential to offer unprecedented levels of customization, dynamic storytelling, and adaptive gameplay based on individual player interactions. This advancement hints at a new era of creativity and innovation in game design but also raises questions about the ethical implications and challenges surrounding AI-generated content, such as ensuring diversity, fairness, and avoiding biases in virtual worlds. Source
Andrew Ng, cofounder of Google Brain & former chief scientist @ Baidu- “I think AI agentic workflows will drive massive AI progress this year — perhaps even more than the next generation of foundation models.
This is an important trend, and I urge everyone who works in AI to pay attention to it.”
I think AI agentic workflows will drive massive AI progress this year — perhaps even more than the next generation of foundation models. This is an important trend, and I urge everyone who works in AI to pay attention to it.
Today, we mostly use LLMs in zero-shot mode, prompting a model to generate final output token by token without revising its work. This is akin to asking someone to compose an essay from start to finish, typing straight through with no backspacing allowed, and expecting a high-quality result. Despite the difficulty, LLMs do amazingly well at this task!
With an agentic workflow, however, we can ask the LLM to iterate over a document many times. For example, it might take a sequence of steps such as:
Plan an outline.
Decide what, if any, web searches are needed to gather more information.
Write a first draft.
Read over the first draft to spot unjustified arguments or extraneous information.
Revise the draft taking into account any weaknesses spotted.
And so on.
This iterative process is critical for most human writers to write good text. With AI, such an iterative workflow yields much better results than writing in a single pass.
Devin’s splashy demo recently received a lot of social media buzz. My team has been closely following the evolution of AI that writes code. We analyzed results from a number of research teams, focusing on an algorithm’s ability to do well on the widely used HumanEval coding benchmark. You can see our findings in the diagram below.
GPT-3.5 (zero shot) was 48.1% correct. GPT-4 (zero shot) does better at 67.0%. However, the improvement from GPT-3.5 to GPT-4 is dwarfed by incorporating an iterative agent workflow. Indeed, wrapped in an agent loop, GPT-3.5 achieves up to 95.1%.
Open source agent tools and the academic literature on agents are proliferating, making this an exciting time but also a confusing one. To help put this work into perspective, I’d like to share a framework for categorizing design patterns for building agents. My team AI Fund is successfully using these patterns in many applications, and I hope you find them useful.
Reflection: The LLM examines its own work to come up with ways to improve it.
Tool use: The LLM is given tools such as web search, code execution, or any other function to help it gather information, take action, or process data.
Planning: The LLM comes up with, and executes, a multistep plan to achieve a goal (for example, writing an outline for an essay, then doing online research, then writing a draft, and so on).
Multi-agent collaboration: More than one AI agent work together, splitting up tasks and discussing and debating ideas, to come up with better solutions than a single agent would.
A daily chronicle of AI Innovations: March 21st, 2024 : Stealing Part of a Production Language Model Sakana AI’s method to automate foundation model development Key Stable Diffusion researchers leave Stability AI Character AI’s new feature adds voice to characters with just 10-sec audio Fitbit to get major AI upgrades powered by Google’s ‘Personal Health’ LLM Samsung creates lab to research chips for AI’s next phase GitHub’s latest AI tool can automatically fix code vulnerabilities
Stealing Part of a Production Language Model
Researchers from Google, OpenAI, and DeepMind (among others) released a new paper that introduces the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI’s ChatGPT or Google’s PaLM-2.
The attack allowed them to recover the complete embedding projection layer of a transformer language model. It differs from prior approaches that reconstruct a model in a bottom-up fashion, starting from the input layer. Instead, this operates top-down and directly extracts the model’s last layer by making targeted queries to a model’s API. This is useful for several reasons; it
Reveals the width of the transformer model, which is often correlated with its total parameter count.
Slightly reduces the degree to which the model is a complete “blackbox”
May reveal more global information about the model, such as relative size differences between different models
While there appear to be no immediate practical consequences of learning this layer is stolen, it represents the first time that any precise information about a deployed transformer model has been stolen.
Why does this matter?
Though it has limitations, the paper motivates the further study of practical attacks on ML models, in order to ultimately develop safer and more reliable AI systems. It also highlights how small, system-level design decisions impact the safety and security of the full product.
Sakana AI’s method to automate foundation model development
Sakana AI has introduced Evolutionary Model Merge, a general method that uses evolutionary techniques to efficiently discover the best ways to combine different models from the vast ocean of different open-source models with diverse capabilities.
As of writing, Hugging Face has over 500k models in dozens of different modalities that, in principle, could be combined to form new models with new capabilities. By working with the vast collective intelligence of existing open models, this method is able to automatically create new foundation models with desired capabilities specified by the user.
Why does this matter?
Model merging shows great promise and democratizes up model-building. In fact, the current Open LLM Leaderboard is dominated by merged models. They work without any additional training, making it very cost-effective. But we need a more systematic approach.
Evolutionary algorithms, inspired by natural selection, can unlock more effective merging. They can explore vast possibilities, discovering novel and unintuitive combinations that traditional methods and human intuition might miss.
Key Stable Diffusion researchers leave Stability AI
Robin Rombach and other key researchers who helped develop the Stable Diffusion text-to-image generation model have left the troubled, once-hot, now floundering GenAI startup.
Rombach (who led the team) and fellow researchers Andreas Blattmann and Dominik Lorenz were three of the five authors who developed the core Stable Diffusion research while at a German university. They were hired afterwards by Stability. Last month, they helped publish a 3rd edition of the Stable Diffusion model, which, for the first time, combined the diffusion structure used in earlier versions with transformers used in OpenAI’s ChatGPT.
Their departures are the latest in a mass exodus of executives at Stability AI, as its cash reserves dwindle and it struggles to raise additional funds.
Why does this matter?
Stable Diffusion is one of the foundational models that helped catalyze the boom in generative AI imagery, but now its future hangs in the balance. While Stability AI’s current situation raises questions about its long-term viability, the exodus potentially benefits its competitors.
Character AI’s new feature adds voice to characters with just 10-sec audio
You can now give voice to your Characters by choosing from thousands of voices or creating your own. The voices are created with just 10 seconds of audio clips. The feature is now available for free to everyone. (Link)
GitHub’s latest AI tool can automatically fix code vulnerabilities
GitHub launches the first beta of its code-scanning autofix feature, which finds and fixes security vulnerabilities during the coding process. GitHub claims it can remediate more than two-thirds of the vulnerabilities it finds, often without the developers having to edit the code. The feature is now available for all GitHub Advanced Security (GHAS) customers. (Link)
OpenAI plans to release a ‘materially better’ GPT-5 in mid-2024
According to anonymous sources from Businessinsider, OpenAI plans to release GPT-5 this summer, which will be significantly better than GPT-4. Some enterprise customers are said to have already received demos of the latest model and its ChatGPT improvements. (Link)
Fitbit to get major AI upgrades powered by Google’s ‘Personal Health’ LLM
Google Research and Fitbit announced they are working together to build a Personal Health LLM that gives users more insights and recommendations based on their data in the Fitbit mobile app. It will give Fitbit users personalized coaching and actionable insights that help them achieve their fitness and health goals. (Link)
Samsung creates lab to research chips for AI’s next phase
Samsung has set up a research lab dedicated to designing an entirely new type of semiconductor needed for (AGI). The lab will initially focus on developing chips for LLMs with a focus on inference. It aims to release new “chip designs, an iterative model that will provide stronger performance and support for increasingly larger models at a fraction of the power and cost.” (Link)
A daily chronicle of AI Innovations: March 20th, 2024 : OpenAI to release GPT-5 this summer; Nvidia’s Jensen Huang says AI hallucinations are solvable, AGI is 5 years away; Ozempic creator plans AI supercomputer to discover new drugs; After raising $1.3B, Inflection eaten alive by Microsoft; MindEye2: AI Mind Reading from Brain Activity; Nvidia NIM enables faster deployment of AI models
OpenAI to release GPT-5 this summer
OpenAI is planning to launch GPT-5 around mid-year, aiming to address previous performance issues and significantly improve upon its predecessor, GPT-4.
GPT-5 is described as “materially better” by those who have seen demos, including enhancements and new capabilities like the ability to call AI agents for autonomous tasks, with enterprise customers having already previewed these improvements.
The release timeline for GPT-5 remains uncertain as OpenAI continues its training and thorough safety and vulnerability testing, with no specific deadline for completion of these preparatory steps.
After raising $1.3B, Inflection eaten alive by Microsoft
In June 2023, Inflection raised $1.3 billion led by Microsoft to develop “more personal AI” but was overtaken by Microsoft less than a year later, with co-founders joining Microsoft’s new AI division.
Despite significant investment, Inflection’s AI, Pi, failed to compete with advancements from other companies such as OpenAI, Google’s Gemini, and Anthropic, leading to its downfall.
Microsoft’s takeover of Inflection reflects the strategy of legacy tech companies to dominate the AI space by supporting startups then acquiring them once they face challenges.
Nvidia’s Jensen Huang says AI hallucinations are solvable, AGI is 5 years away
Nvidia CEO Jensen Huang predicts artificial general intelligence (AGI) could be achieved within 5 years, depending on how AGI is defined and measured.
Huang addresses concerns around AI hallucinations, suggesting that ensuring answers are well-researched could easily solve the issue.
The concept of AGI raises concerns about its potential unpredictability and the challenges of aligning its objectives with human values and priorities.
Ozempic creator plans AI supercomputer to discover new drugs
The Novo Nordisk Foundation is investing in “Gefion,” an AI supercomputer project developed in collaboration with Nvidia.
“Gefion” aims to be the world’s most powerful AI supercomputer for health sciences, utilizing Nvidia’s new chips to accelerate scientific breakthroughs in critical areas such as drug discovery, disease diagnosis, and treatment,
This initiative underscores the growing integration of AI in healthcare, promising to catalyze significant scientific discoveries and innovations that could transform patient care and outcomes.
MindEye2 is a revolutionary model that reconstructs visual perception from brain activity using just one hour of data. Traditional methods require extensive training data, making them impractical for real-world applications. However, MindEye2 overcomes this limitation by leveraging shared-subject models. The model is pretrained on data from seven subjects and then fine-tuned with minimal data from a new subject.
By mapping brain activity to a shared-subject latent space and then nonlinear mapping to CLIP image space, MindEye2 achieves high-quality reconstructions with limited training data. It performs state-of-the-art image retrieval and reconstruction across multiple subjects within only 2.5% of the previously required training data, reducing the training time from 40 to just one hour.
Why does it matter?
MindEye2 has the potential to revolutionize clinical assessments and brain-computer interface applications. This remarkable achievement also holds great promise for neuroscience and opens new possibilities for understanding how our brains perceive and process visual information. It can also help develop personalized treatment plans for neuro patients.
NVIDIA has introduced NVIDIA NIM (NVIDIA Inference Microservices) to accelerate the deployment of AI applications for businesses. NIM is a collection of microservices that package essential components of an AI application, including AI models, APIs, and libraries, into a container. These containers can be deployed in environments such as cloud platforms, Linux servers, or serverless architectures.
NIM significantly reduces the time it takes to deploy AI applications from weeks to minutes. It offers optimized inference engines, industry-standard APIs, and support for popular software and data platform vendors. NIM microservices are compatible with NVIDIA GPUs and support features like Retrieval Augmented Generation (RAG) capabilities for enhanced enterprise applications. Developers can experiment with NIM microservices for free on the ai.nvidia.com platform, while commercial deployment is available through NVIDIA AI Enterprise 5.0.
Why does it matter?
With NIM, Nvidia is trying to democratize AI deployment for enterprises by abstracting away complexities. This will enable more developers to contribute to their company’s AI transformation efforts and allow businesses to run AI applications almost instantly without specialized AI expertise.
Microsoft hires DeepMind co-founder to lead a new AI division
Mustafa Suleyman, a renowned co-founder of DeepMind and Inflection, has recently joined Microsoft as the leader of Copilot. Satya Nadella, Microsoft’s CEO, made this significant announcement, highlighting the importance of innovation in artificial intelligence (AI).
In his new role as the Executive Vice President and CEO of Microsoft AI, Mustafa will work alongside Karén Simonyan, another talented individual from Inflection who will serve as Chief Scientist. Together, they will spearhead the development and advancement of Copilot and other exciting consumer AI products at Microsoft. Mustafa and his team’s addition to the Microsoft family brings a wealth of expertise and promises groundbreaking advancements in AI.
Why does it matter?
Mustafa Suleyman’s expertise in AI is expected to contribute to the development of innovative consumer AI products and research at Microsoft, furthering its mission to bring the benefits of AI to people and organizations worldwide. With DeepMind’s founder now at the helm, the AI race between Microsoft, Google, and others became even more intense.
Truecaller adds AI-powered spam detection and blocking for Android users
Truecaller has unveiled a new feature for its Android premium subscribers that uses AI to detect spam, even if unavailable on the Truecaller database, and block every call that doesn’t come from an approved contact. Truecaller hopes to add more premium subscribers to its list by adding this feature. However, this feature is not available for Apple users. (Link)
Google DeepMind’s new AI tool can analyze soccer tactics and offer insights
DeepMind has partnered with Liverpool FC to develop a new AI tool called TacticAI. TacticAI uses generative and predictive AI to help coaches determine which player will most likely receive the ball during corner kicks, whether a shot will be taken, and how to adjust player setup. It aims to revolutionize soccer and help the teams enhance their efficiency. (Link)
Pika Labs introduces sound effects for its gen-AI video generation
Pika Labs has now added the ability to create sound effects from a text prompt for its generative artificial intelligence videos. It allows for automatic or custom SFX generations to pair with video outputs. Now, users can make bacon sizzle, lions roar, or add footsteps to the video of someone walking down the street. It is only available to pro users. (Link)
Buildbox 4 Alpha enables users to create 3D video games from text prompts
Buildbox has released an alpha version of Buildbox 4. It’s an AI-first game engine that allows users to create games and generate assets from text prompts. The alpha version aims to make text-to-game a distinct reality. Users can create various assets and animations from simple text prompts. It also allows users to build a gaming environment in a few minutes. (Link)
Nvidia adds generative AI capabilities to empower humanoid robots
Nvidia introduced Project GR00T, a multimodal AI that will power future humanoids with advanced foundation AI. Project GR00T enables humanoid robots to input text, speech, videos, or even live demos and process them to take specific actions. It has been developed with the help of Nvidia’s Isaac Robotic Platform tools, including an Isaac Lab for RLHF. (Link)
A daily chronicle of AI Innovations: March 19th, 2024 : Nvidia launches ‘world’s most powerful AI chip’; Stability AI’s SV3D turns a single photo into a 3D video; OpenAI CEO hints at “Amazing Model”, maybe ChatGPT-5 ; Apple is in talks to bring Google’s AI to iPhones
Nvidia launches ‘world’s most powerful AI chip’
Nvidia has revealed its new Blackwell B200 GPU and GB200 “superchip”, claiming it to be the world’s most powerful chip for AI. Both B200 and GB200 are designed to offer powerful performance and significant efficiency gains.
Key takeaways:
The B200 offers up to 20 petaflops of FP4 horsepower, and Nvidia says it can reduce costs and energy consumption by up to 25 times over an H100.
The GB200 “superchip” can deliver 30X the performance for LLM inference workloads while also being more efficient.
Nvidia claims that just 2,000 Blackwell chips working together could train a GPT -4-like model comprising 1.8 trillion parameters in just 90 days.
Why does this matter?
A major leap in AI hardware, the Blackwell GPU boasts redefined performance and energy efficiency. This could lead to lower operating costs in the long run, making high-performance computing more accessible for AI research and development, all while promoting eco-friendly practices.
Stability AI’s SV3D turns a single photo into a 3D video
Stability AI released Stable Video 3D (SV3D), a new generative AI tool for rendering 3D videos. SV3D can create multi-view 3D models from a single image, allowing users to see an object from any angle. This technology is expected to be valuable in the gaming sector for creating 3D assets and in e-commerce for generating 360-degree product views.
SV3D builds upon Stability AI’s previous Stable Video Diffusion model. Unlike prior methods, SV3D can generate consistent views from any given angle. It also optimizes 3D meshes directly from the novel views it produces.
SV3D comes in two variants: SV3D_u generates orbital videos from single images, and SV3D_p creates 3D videos along specified camera paths.
Why does this matter?
SV3D represents a significant leap in generative AI for 3D content. Its ability to create 3D models and videos from a single image could open up possibilities in various fields, such as animation, virtual reality, and scientific modeling.
OpenAI CEO hints at “Amazing Model,” maybe ChatGPT-5
OpenAI CEO Sam Altman has announced that the company will release an “amazing model” in 2024, although the name has not been finalized. Altman also mentioned that OpenAI plans to release several other important projects before discussing GPT-5, one of which could be the Sora video model.
Altman declined to comment on the Q* project, which is rumored to be an AI breakthrough related to logic. He also expressed his opinion that GPT-4 Turbo and GPT-4 “kind of suck” and that the jump from GPT-4 to GPT-5 could be as significant as the improvement from GPT-3 to GPT-4.
Why does this matter?
This could mean that after Google Gemini and Claude-3’s latest version, a new model, possibly ChatGPT-5, could be released in 2024. Altman’s candid remarks about the current state of AI models also offer valuable context for understanding the anticipated advancements and challenges in the field.
Project GR00T is an ambitious initiative aiming to develop a general-purpose foundation model for humanoid robot learning, addressing embodied AGI challenges. Collaborating with leading humanoid companies worldwide, GR00T aims to understand multimodal instructions and perform various tasks.
GROOT is a foundational model that takes language, videos, and example demonstrations as inputs so it can produce the next action.
What the heck does that mean?
➡️ It means you can show it how to do X a few times, and then it can do X on its own.
Google’s new fine-tuned model is a HUGE improvement, AI is coming for human doctors sooner than most believe.
NVIDIA creates Earth-2 digital twin: generative AI to simulate, visualize weather and climate. Source
What Else Is Happening in AI on March 19th, 2024
Apple is in talks to bring Google’s AI to iPhones
Apple and Google are negotiating a deal to integrate Google’s Gemini AI into iPhones, potentially shaking up the AI industry. The deal would expand on their existing search partnership. Apple also held discussions with OpenAI. If successful, the partnership could give Gemini a significant edge with billions of potential users. (Link)
YouTube rolls out AI content labels
YouTube now requires creators to self-label AI-generated or synthetic content in videos. The platform may add labels itself for potentially misleading content. However, the tool relies on creators being honest, as YouTube is still working on AI detection tools. (Link)
Roblox speeds up 3D creation with AI tools Roblox has introduced two AI-driven tools to streamline 3D content creation on its platform. Avatar Auto Setup automates the conversion of 3D body meshes into fully animated avatars, while Texture Generator allows creators to quickly alter the appearance of 3D objects using text prompts, enabling rapid prototyping and iteration.(Link)
Nvidia teams up with Shutterstock and Getty Images for AI-generated 3D content
Nvidia’s Edify AI can now create 3D content, and partnerships with Shutterstock and Getty Images will make it accessible to all. Developers can soon experiment with these models, while industry giants are already using them to create stunning visuals and experiences. (Link)
Adobe Substance 3D introduces AI-powered text-to-texture tools
Adobe has introduced two AI-driven features to its Substance 3D suite: “Text to Texture,” which generates photo-realistic or stylized textures from text prompts, and “Generative Background,” which creates background images for 3D scenes. Both tools use 2D imaging technology from Adobe’s Firefly AI model to streamline 3D workflows. (Link)
A daily chronicle of AI Innovations: March 18th, 2024 – Bernie’s 4 day workweek: less work, same pay – Google’s AI brings photos to life as talking avatars – Elon Musk’s xAI open-sources Grok AI
Bernie’s 4 day workweek: less work, same pay
Sen. Bernie Sanders has introduced the Thirty-Two Hour Workweek Act, which aims to establish a four-day workweek in the United States without reducing pay or benefits. To be phased in over four years, the bill would lower the overtime pay threshold from 40 to 32 hours, ensuring that workers receive 1.5 times their regular salary for work days longer than 8 hours and double their regular wage for work days longer than 12 hours.
Sanders, along with Sen. Laphonza Butler and Rep. Mark Takano, believes that this bill is crucial in ensuring that workers benefit from the massive increase in productivity driven by AI, automation, and new technology. The legislation aims to reduce stress levels and improve Americans’ quality of life while also protecting their wages and benefits.
Why does this matter?
This bill could alter the workforce dynamics. Businesses may need to assess staffing and invest in AI to maintain productivity. While AI may raise concerns over job displacements, it also offers opportunities for better work-life balance through efficiency gains by augmenting human capabilities.
Google’s AI brings photos to life as talking avatars
Google’s latest AI research project VLOGGER, automatically generates realistic videos of talking and moving people from just a single image and an audio or text input. It is the first model that aims to create more natural interactions with virtual agents by including facial expressions, body movements, and gestures, going beyond simple lip-syncing.
It uses a two-step process: first, a diffusion-based network predicts body motion and facial expressions based on the audio, and then a novel architecture based on image diffusion models generates the final video while maintaining temporal consistency. VLOGGER outperforms previous state-of-the-art methods in terms of image quality, diversity, and the range of scenarios it can handle.
Why does this matter?
VLOGGER’s flexibility and applications could benefit remote work, education, and social interaction, making them more inclusive and accessible. Also, as AR/VR technologies advance, VLOGGER’s avatars could create emotionally resonant experiences in gaming, entertainment, and professional training scenarios.
Elon Musk’s xAI has open-sourced the base model weights and architecture of its AI chatbot, Grok. This allows researchers and developers to freely use and build upon the 314 billion parameter Mixture-of-Experts model. Released under the Apache 2.0 license, the open-source version is not fine-tuned for any particular task.
Why does this matter?
This move aligns with Musk’s criticism of companies that don’t open-source their AI models, including OpenAI, which he is currently suing for allegedly breaching an agreement to remain open-source. While several fully open-source AI models are available, the most used ones are closed-source or offer limited open licenses.
Maisa has released the beta version of its Knowledge Processing Unit (KPU), an AI system that uses LLMs’ advanced reasoning and data processing abilities. In an impressive demo, the KPU assisted a customer with an order-related issue, even when the customer provided an incorrect order ID, showing the system’s understanding abilities. (Link)
PepsiCo increases market domination using GenAI
PepsiCo uses GenAI in product development and marketing for faster launches and better profitability. It has increased market penetration by 15% by using GenAI to improve the taste and shape of products like Cheetos based on customer feedback. The company is also doubling down on its presence in India, with plans to open a third capability center to develop local talent. (Link)
Deci launches Nano LLM & GenAI dev platform
Israeli AI startup Deci has launched two major offerings: Deci-Nano, a small closed-source language model, and a complete Generative AI Development Platform for enterprises. Compared to rivals like OpenAI and Anthropic, Deci-Nano offers impressive performance at low cost, and the new platform offers a suite of tools to help businesses deploy and manage AI solutions. (Link)
Invoke AI simplifies game dev workflows
Invoke has launched Workflows, a set of AI tools designed for game developers and large studios. These tools make it easier for teams to adopt AI, regardless of their technical expertise levels. Workflows allow artists to use AI features while maintaining control over their training assets, brand-specific styles, and image security. (Link)
Mercedes teams up with Apptronik for robot workers
Mercedes-Benz is collaborating with robotics company Apptronik to automate repetitive and physically demanding tasks in its manufacturing process. The automaker is currently testing Apptronik’s Apollo robot, a 160-pound bipedal machine capable of lifting objects up to 55 pounds. The robot inspects and delivers components to human workers on the production line, reducing the physical strain on employees and increasing efficiency. (Link)
A daily chronicle of AI Innovations: Week 2 Recap
DeepSeek released DeepSeek-VL, an open-source Vision-Language (VL) model designed for real-world vision and language understanding applications. The DeepSeek-VL family, includes 7B and1.3B base and chat models and achieves state-of-the-art or competitive performance across a wide range of visual-language benchmarks. Free for commercial use [Details | Hugging Face | Demo]
Cohere released Command-R, a 35 billion parameters generative model with open weights, optimized for long context tasks such as retrieval augmented generation (RAG) and using external APIs and tools for production-scale AI for enterprise [Details|Hugging Face].
Google DeepMind introduced SIMA (Scalable Instructable Multiworld Agent), a generalist AI agent for 3D virtual environments, trained on nine different video games. It can understand a broad range of gaming worlds, and follows natural-language instructions to carry out tasks within them, as a human might. It doesn’t need access to a game’s source code or APIs and requires only the images on screen, and natural-language instructions provided by the user. SIMA uses keyboard and mouse outputs to control the games’ central character to carry out these instructions [Details].
Meta AI introduces Emu Video Edit (EVE), a model that establishes a new state-of-the art in video editing without relying on any supervised video editing data [Details].
Cognition Labs introduced Devin, the first fully autonomous AI software engineer. Devin can learn how to use unfamiliar technologies, can build and deploy apps end to end, can train and fine tune its own AI models. When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted [Details].
Pika Labs adds sound effects to its AI video tool, Pika, allowing users to either prompt desired sounds or automatically generate them based on video content. [Video link].
Anthropic’s Claude 3 Opus ranks #1 on LMSYS Chatbot Arena Leaderboard, along with GPT-4 [Link].
The European Parliament approved the Artificial Intelligence Act. The new rules ban certain AI applications including biometric categorisation systems, Emotion recognition in the workplace and schools, social scoring and more [Details].
Huawei Noah’s Ark Lab introduced PixArt–Σ, a Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. It achieves superior image quality and user prompt adherence with significantly smaller model size (0.6B parameters) than existing text-to-image diffusion models, such as SDXL (2.6B parameters) and SD Cascade (5.1B parameters) [Details].
South Korean startup Hyodol AI has launched a $1,800 LLM-powered companion doll specifically designed to offer emotional support and companionship to the rapidly expanding elderly demographic in the country [Details].
Covariant introduced RFM-1 (Robotics Foundation Model -1), a large language model (LLM), but for robot language. Set up as a multimodal any-to-any sequence model, RFM-1 is an 8 billion parameter transformer trained on text, images, videos, robot actions, and a range of numerical sensor readings [Details].
Figure 01 robot integrated with an OpenAI vision-language model can now have full conversations with people [Link]
Deepgram announced the general availability of Aura, a text-to-speech model built for responsive, conversational AI agents and applications [Details | Demo].
Claude 3 Haiku model is now available alongside Sonnet and Opus in the Claude API and on claude.ai for Pro subscribers. Haiku outperforms GPT-3.5 and Gemini 1.0 pro while costing less, and is three times faster than its peers for the vast majority of workloads [Details].
Paddle announced AI Launchpad, a 6-week remote program for AI founders to launch and scale an AI business with $20,000 in cash prize [Details].
Midjourney adds feature for generating consistent characters across multiple gen AI images [Details].
The Special Committee of the OpenAI Board announced the completion of the review. Altman, Brockman to continue to lead OpenAI [Details]
Together.ai introduced Sequoia, a scalable, robust, and hardware-aware speculative decoding framework that improves LLM inference speed on consumer GPUs (with offloading), as well as on high-end GPUs (on-chip), without any approximations [Details].
OpenAI released Transformer Debugger (TDB), a tool developed and used internally by OpenAI’s Superalignment team for investigating into specific behaviors of small language models [GitHub].
Elon Musk announced that xAI will open source Grok this week [Link].
A Daily Chronicle of AI Innovations – March 16th, 2024:
Reddit is under investigation by the FTC for its data licensing practices concerning user-generated content being used to train AI models.
The investigation focuses on Reddit’s engagement in selling, licensing, or sharing data with third parties for AI training.
Reddit anticipates generating approximately USD 60 million in 2024 from a data licensing agreement with Google, aiming to leverage its platform data for training LLMs
Researchers identified a new vulnerability in leading AI language models, named ArtPrompt, which uses ASCII art to exploit the models’ security mechanisms.
ArtPrompt masks security-sensitive words with ASCII art, fooling language models like GPT-3.5, GPT-4, Gemini, Claude, and Llama2 into performing actions they would otherwise block, such as giving instructions for making a bomb.
The study underscores the need for enhanced defensive measures for language models, as ArtPrompt, by leveraging a mix of text-based and image-based inputs, can effectively bypass current security protocols.
OpenAI aims to make its own AI processors — chip venture in talks with Abu Dhabi investment firm. Source
Once “too scary” to release, GPT-2 gets squeezed into an Excel spreadsheet. Source
A Daily Chronicle of AI Innovations – March 15th, 2024:
Apple quietly acquires another AI startup
Mercedes tests humanoid robots for ‘low skill, repetitive’ tasks
Midjourney bans prompts with Joe Biden and Donald Trump over election misinformation concerns
El Salvador stashes $406 million in bitcoin in ‘cold wallet’
Microsoft calls out Google dominance in generative AI
Anthropic releases affordable, high-speed Claude 3 Haiku model
Apple’s MM1 AI model shows state-of-the-art language and vision capabilities. It was trained on a filtered dataset of 500 million text-image pairs from the web, including 10% text-only docs to improve language understanding.
The team experimented with different configurations during training. They discovered that using an external pre-trained high-resolution image encoder improved visual recognition. Combining different image, text, and caption data ratios led to the best performance. Synthetic caption data also enhanced few-shot learning abilities.
This experiment cements that using a blend of image caption, interleaved image text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks.
Why does it matter?
Apple’s new model is promising, especially in developing image recognition systems for new categories or domains. This will help businesses and startups improve the speed of AI tool development specifically for text-to-image, document analysis, and enhanced visual recognition.
Cerebras Systems has made a groundbreaking announcement unveiling its latest wafer-scale AI chip, the WSE-3. This chip boasts an incredible 4 trillion transistors, making it one of the most powerful AI chips on the market. The third-generation wafer-scale AI mega chip is twice as powerful as its predecessor while being power efficient.
The chip’s transistor density has increased by over 50 percent thanks to the latest manufacturing technology. One of the most remarkable features of the WSE-3 chip is its ability to enable AI models that are ten times larger than the highly acclaimed GPT-4 and Gemini models.
Why does it matter?
The WSE-3 chip opens up new possibilities for tackling complex problems and pushing the boundaries of AI capabilities. This powerful system can train massive language models, such as the Llama 70B, in just one day. It will help enterprises create custom LLMs, rapidly reducing the time-to-market.
Apple made a significant acquisition earlier this year by purchasing Canadian AI startup DarwinAI. Integrating DarwinAI’s expertise and technology bolsters Apple’s AI initiatives.
With this acquisition, Apple aims to tap into DarwinAI’s advancements in AI technology, particularly in visual inspection during manufacturing and making AI systems smaller and faster. Leveraging DarwinAI’s technology, Apple aims to run AI on devices rather than relying solely on cloud-based solutions.
Why does it matter?
Apple’s acquisition of DarwinAI is a strategic move to revolutionize features and enhance its AI capabilities across various products and services. Especially with the iOS 18 release around the corner, this acquisition will help create new features and enhance the user experience.
Microsoft is expanding Copilot, its AI assistant, with the introduction of the Copilot Pro subscription for individuals, the availability of Copilot for Microsoft 365 to small and medium-sized businesses, and removing seat minimums for commercial plans. Copilot aims to enhance creativity, productivity, and skills across work and personal life, providing users access to the latest AI models and improved image creation
Oracle has added advanced AI capabilities to its finance and supply chain software suite, aimed at improving decision-making and enhancing customer and employee experience. For instance, Oracle Fusion Cloud SCM includes features such as item description generation, supplier recommendations, and negotiation summaries.
Databricks has invested in Mistral AI and integrated its AI models into its data intelligence platform, allowing users to customize and consume models in various ways. The integration includes Mistral’s text-generation models, such as Mistral 7B and Mixtral 8x7B, which support multiple languages. This partnership aims to provide Databricks customers with advanced capabilities to leverage AI models and drive innovation in their data-driven applications.
Qualcomm has solidified its leadership position in mobile artificial intelligence (AI). It has been developing AI hardware and software for over a decade. Their Snapdragon processors are equipped with specialized AI engines like Hexagon DSP, ensuring efficient AI and machine learning processing without needing to send data to the cloud.
AI researchers are developing techniques to simulate peripheral vision and improve object detection in the periphery. They created a new dataset to train computer vision models, which led to better object detection outside the direct line of sight, though still behind human capabilities. A modified texture tiling approach accurately representing information loss in peripheral vision significantly enhanced object detection and recognition abilities.
Microsoft has expressed concerns to EU antitrust regulators about Google’s dominance in generative AI, highlighting Google’s unique position due to its vast data sets and vertical integration, which includes AI chips and platforms like YouTube.
The company argues that Google’s control over vast resources and its own AI developments give it a competitive advantage, making it difficult for competitors to match, especially in the development of Large Language Models like Gemini.
Microsoft defends partnerships with startups like OpenAI as essential for innovation and competition in the AI market, countering regulatory concerns about potential anticompetitive advantages arising from such collaborations.
Mercedes-Benz is testing humanoid robots, specifically Apptronik’s bipedal robot Apollo, for automating manual labor tasks in manufacturing.
The trial aims to explore the use of Apollo in physically demanding, repetitive tasks within existing manufacturing facilities without the need for significant redesigns.
The initiative seeks to address labor shortages by using robots for low-skill tasks, allowing highly skilled workers to focus on more complex aspects of car production.
Midjourney, an AI image generator, has banned prompts containing the names of Joe Biden and Donald Trump to avoid the spread of election misinformation.
The policy change is in response to concerns over AI’s potential to influence voters and spread false information before the 2024 presidential election.
Despite the new ban, Midjourney previously allowed prompts that could generate misleading or harmful content, and it was noted for its poor performance in controlling election disinformation.
Midjourney introduces Character Consistency: Tutorial
A Daily Chronicle of AI Innovations – March 14th, 2024:
DeepMind’s SIMA: The AI agent that’s a Jack of all games
Claude 3 Haiku: Anthropic’s lightning-fast AI solution for enterprises
OpenAI-powered “Figure 01” can chat, perceive, and complete tasks
OpenAI’s Sora will be publicly available later this year
DeepMind’s SIMA: The AI agent that’s a Jack of all games
DeepMind has introduced SIMA (Scalable Instructable Multiworld Agent), a generalist AI agent that can understand and follow natural language instructions to complete tasks across video game environments. Trained in collaboration with eight game studios on nine different games, SIMA marks a significant milestone in game-playing AI by showing the ability to generalize learned skills to new gaming worlds without requiring access to game code or APIs.
(SIMA comprises pre-trained vision models, and a main model that includes a memory and outputs keyboard and mouse actions.)
SIMA was evaluated on 600 basic skills, including navigation, object interaction, and menu use. In tests, SIMA agents trained on multiple games significantly outperformed specialized agents trained on individual games. Notably, an agent trained on all but one game performed nearly as well on the unseen game as an agent specifically trained on it, showcasing SIMA’s remarkable ability to generalize to new environments.
Why does this matter?
SIMA’s generalization ability using a single AI agent is a significant milestone in transfer learning. By showing that a multi-task trained agent can perform nearly as well on an unseen task as a specialized agent, SIMA paves the way for more versatile and scalable AI systems. This could lead to faster deployment of AI in real-world applications, as agents would require less task-specific training data and could adapt to new scenarios more quickly.
Claude 3 Haiku: Anthropic’s lightning-fast AI solution for enterprises
Anthropic has released Claude 3 Haiku, their fastest and most affordable AI model. With impressive vision capabilities and strong performance on industry benchmarks, Haiku is designed to tackle a wide range of enterprise applications. The model’s speed – processing 21K tokens per second for prompts under 32K tokens – and cost-effective pricing model make it an attractive choice for businesses needing to analyze large datasets and generate timely outputs.
In addition to its speed and affordability, Claude 3 Haiku prioritizes enterprise-grade security and robustness. The model is now available through Anthropic’s API or on claude.ai for Claude Pro subscribers.
Why does this matter?
Claude 3 Haiku sets a new benchmark for enterprise AI by offering high speed and cost-efficiency without compromising performance. This release will likely intensify competition among AI providers, making advanced AI solutions more accessible to businesses of all sizes. As more companies adopt models like Haiku, we expect a surge in AI-driven productivity and decision-making across industries.
OpenAI-powered “Figure 01” can chat, perceive, and complete tasks
Robotics company Figure, in collaboration with OpenAI, has developed a groundbreaking robot called “Figure 01” that can engage in full conversations, perceive its surroundings, plan actions, and execute tasks based on verbal requests, even those that are ambiguous or context-dependent. This is made possible by connecting the robot to a multimodal AI model trained by OpenAI, which integrates language and vision.
The AI model processes the robot’s entire conversation history, including images, enabling it to generate appropriate verbal responses and select the most suitable learned behaviors to carry out given commands. The robot’s actions are controlled by visuomotor transformers that convert visual input into precise physical movements. “Figure 01” successfully integrates natural language interaction, visual perception, reasoning, and dexterous manipulation in a single robot platform.
Why does this matter?
As robots become more adept at understanding and responding to human language, questions arise about their autonomy and potential impact on humanity. Collaboration between the robotics industry and AI policymakers is needed to establish regulations for the safe deployment of AI-powered robots. If deployed safely, these robots could become trusted partners, enhancing productivity, safety, and quality of life in various domains.
Amazon streamlines product listing process with new AI tool
Amazon is introducing a new AI feature for sellers to quickly create product pages by pasting a link from their external website. The AI generates product descriptions and images based on the linked site’s information, saving sellers time. (Link)
Microsoft to expand AI-powered cybersecurity tool availability from April 1
Microsoft is expanding the availability of its AI-powered cybersecurity tool, “Security Copilot,” from April 1, 2024. The tool helps with tasks like summarizing incidents, analyzing vulnerabilities, and sharing information. Microsoft plans to adopt a ‘pay-as-you-go’ pricing model to reduce entry barriers. (Link)
OpenAI’s Sora will be publicly available later this year
OpenAI will release Sora, its text-to-video AI tool, to the public later this year. Sora generates realistic video scenes from text prompts and may add audio capabilities in the future. OpenAI plans to offer Sora at a cost similar to DALL-E, its text-to-image model, and is developing features for users to edit the AI-generated content. (Link)
OpenAI partners with Le Monde, Prisa Media for news content in ChatGPT
OpenAI has announced partnerships with French newspaper Le Monde and Spanish media group Prisa Media to provide their news content to users of ChatGPT. The media companies see this as a way to ensure reliable information reaches AI users while safeguarding their journalistic integrity and revenue. (Link)
Icon’s AI architect and 3D printing breakthroughs reimagine homebuilding
Construction tech startup Icon has introduced an AI-powered architect, Vitruvius, that engages users in designing their dream homes, offering 3D-printed and conventional options. The company also debuted an advanced 3D printing robot called Phoenix and a low-carbon concrete mix as part of its mission to make homebuilding more affordable, efficient, and sustainable. (Link)
A Daily Chronicle of AI Innovations – March 13th, 2024: Devin: The first AI software engineer redefines coding; Deepgram’s Aura empowers AI agents with authentic voices; Meta introduces two 24K GPU clusters to train Llama 3
Devin: The first AI software engineer redefines coding
In the most groundbreaking development, the US-based startup Cognition AI has unveiled Devin, the world’s first AI software engineer. It is an autonomous agent that solves engineering tasks using its shell or command prompt, code editor, and web browser. Devin can also perform tasks like planning, coding, debugging, and deploying projects autonomously.
When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted. It has successfully passed practical engineering interviews with leading AI companies and even completed real Upwork jobs.
Why does it matter?
There’s already a huge debate if Devin will replace software engineers. However, most production-grade software is too complex, unique, or domain-specific to be fully automated at this point. Perhaps, Devin could start handling more initial-level tasks in development. More so, it can assist developers in quickly prototyping, bootstrapping, and autonomously launching MVP for smaller apps and websites, for now
Deepgram’s Aura empowers AI agents with authentic voices
Deepgram, a top voice recognition startup, just released Aura, its new real-time text-to-speech model. It’s the first text-to-speech model built for responsive, conversational AI agents and applications. Companies can use these agents for customer service in call centers and other customer-facing roles.
Aura includes a dozen natural, human-like voices with lower latency than any comparable voice AI alternative and is already being used in production by several customers. Aura works hand in hand with Deepgram’s Nova-2 speech-to-text API. Nova-2 is known for its top-notch accuracy and speed in transcribing audio streams.
Why does it matter?
Deepgram’s Aura is a one-stop shop for speech recognition and voice generation APIs that enable the fastest response times and most natural-sounding conversational flow. Its human-like voice models render extremely fast (typically in well under half a second) and at an affordable price ($0.015 per 1,000 characters). Lastly, Deepgram’s transcription is more accurate and faster than other solutions as well.
Meta introduces two 24K GPU clusters to train Llama 3
Meta has invested significantly in its AI infrastructure by introducing two 24k GPU clusters. These clusters, built on top of Grand Teton, OpenRack, and PyTorch, are designed to support various AI workloads, including the training of Llama 3.
Meta aims to expand its infrastructure build-out by the end of 2024. It plans to include 350,000 NVIDIA H100 GPUs, providing compute power equivalent to nearly 600,000 H100s. The clusters are built with a focus on researcher and developer experience.
This adds up to Meta’s long-term vision to build open and responsibly developed artificial general intelligence (AGI). These clusters enable the development of advanced AI models and power applications such as computer vision, NLP, speech recognition, and image generation.
Why does it matter?
Meta is committed to open compute and open source, driving innovation in the AI software and hardware industry. Introducing two new GPUs to train Llama 3 is also a push forward to their commitment. As a founding member of Open Hardware Innovation (OHI) and the Open Innovation AI Research Community, Meta wants to make AI transparent and trustworthy.
Google Play to display AI-powered FAQs and recent YouTube videos for games
At the Google for Games Developer Summit held in San Francisco, Google announced several new features for ‘Google Play listing for games’. These include AI-powered FAQs, displaying the latest YouTube videos, new immersive ad formats, and support for native PC game publishing. These new features will allow developers to display promotions and the latest YouTube videos directly in their listing and show them to users in the Games tab of the Play Store. (Link)
DoorDash’s new AI-powered tool automatically curbs verbal abuses
DoorDash has introduced a new AI-powered tool named ‘SafeChat+’ to review in-app conversations and determine if a customer or Dasher is being harassed. There will be an option to report the incident and either contact DoorDash’s support team if you’re a customer or quickly cancel the order if you’re a delivery person. With this feature, DoorDash aims to reduce verbally abusive and inappropriate interactions between consumers and delivery people. (Link)
Perplexity has decided to bring Yelp data to its chatbot
Perplexity has decided to bring Yelp data to its chatbot. The company CEO, Aravind Srinivas, told the media that many people use chatbots like search engines. He added that it makes sense to offer information on things they look for, like restaurants, directly from the source. That’s why they have decided to integrate Yelp’s maps, reviews, and other details in responses when people ask for restaurant or cafe recommendations. (Link)
Pinterest’s ‘body types ranges’ tool delivers more inclusive search results
Pinterest has introduced a new tool named body type ranges, which gives users a choice to self-select body types from a visual cue between four body type ranges to deliver personalized and more refined search results for women’s fashion and wedding inspiration. This tool aims to create a more inclusive place online to search, save, and shop. The company also plans to launch a similar feature for men’s fashion later this year. (Link)
OpenAI’s GPT-4.5 Turbo is all set to be launched in June 2024
According to the leak search engine results from Bing and DuckDuck Go, which indexed the OpenAI GPT-4.5 Turbo product page before an official announcement, OpenAI is all set to launch the new version of its LLM by June 2024. There is a discussion among the AI community that this could be OpenAI’s fastest, most accurate, and most scalable model to date. The details of GPT-4.5 Turbo were leaked by OpenAI’s web team, which now leads to a 404 page. (Link))
A Daily Chronicle of AI Innovations in March 2024 – Day 12: AI Daily News – March 12th, 2024
Cohere’s introduces production-scale AI for enterprises RFM-1 redefines robotics with human-like reasoning Spotify introduces audiobook recommendations
Midjourney bans all its competitor’s employees
Google restricts election-related queries for its Gemini chatbot
Apple to let developers distribute apps directly from their websites
AI startups reach record funding of nearly $50 billion in 2023
Cohere’s introduces production-scale AI for enterprises
Cohere, an AI company, has introduced Command-R, a new large language model (LLM) designed to address real-world challenges, such as inefficient workflows, data analysis limitations, slow response times, etc.
Command-R focuses on two key areas: Retrieval Augmented Generation (RAG) and Tool Use. RAG allows the model to access and process information from private databases, improving the accuracy of its responses. Tool Use allows Command-R to interact with external software tools and APIs, automating complex tasks.
Command-R offers several features beneficial for businesses, including:
Multilingual capabilities: Supports 10 major languages
Cost-effectiveness: Offers a longer context window and reduced pricing compared to previous models
Wider accessibility: Available through Cohere’s API, major cloud providers, and free weights for research on HuggingFace
Overall, it empowers businesses to leverage AI for improved decision-making, increased productivity, and enhanced customer experiences.
Why does this matter?
Command-R showcases the future of business operations, featuring automated workflows, and enabling humans to focus on strategic work. Thanks to its low hallucination rate, we would see a wider adoption of AI technologies, and the development of sophisticated, context-aware AI applications tailored to specific business needs.
As AI continues to evolve and mature, models like Command-R will shape the future of work and the global economy.
RFM-1 redefines robotics with human-like reasoning
Covariant has introduced RFM-1, a Robotics Foundation Model that gives robots ChatGPT-like understanding and reasoning capabilities.
TLDR;
RFM-1 is an 8 billion parameter transformer trained on text, images, videos, robot actions, and sensor readings from Covariant’s fleet of high-performing robotic systems deployed in real-world environments.
Similar to how we understand how objects move, RFM-1 can predict future outcomes/consequences based on initial images and robot actions.
RFM-1 leverages NLP to enable intuitive interfaces for programming robot behavior. Operators can instruct robots using plain English, lowering barriers to customizing AI behavior for specific needs.
RFM-1 can also communicate issues and suggest solutions to operators.
Why does this matter?
This advancement has the potential to revolutionize industries such as manufacturing, logistics, and healthcare, where robots can work alongside humans to improve efficiency, safety, and productivity.
Spotify has introduced a novel recommendation system called 2T-HGNN to provide personalized audiobook recommendations to its users. The system addresses the challenges of introducing a new content type (audiobooks) into an existing platform, such as data sparsity and the need for scalability.
2T-HGNN leverages a technique called “Heterogeneous Graph Neural Networks” (HGNNs) to uncover connections between different content types. Additionally, a “Two Tower” (2T) model helps ensure that recommendations are made quickly and efficiently for millions of users.
Interestingly, the system also uses podcast consumption data and weak interaction signals to uncover user preferences and predict future audiobook engagement.
Why does this matter?
This research will not only improve the user experience but also encourage users to explore and engage with audiobooks, potentially driving growth in this new content vertical. Moreover, it may inspire similar strategies in domains where tailored recommendations are essential, such as e-commerce, news, and entertainment.
Elon Musk announced that his AI startup xAI will open-source its ChatGPT rival “Grok” this week, following a lawsuit against OpenAI for shifting to a for-profit model. Musk aims to provide free access to Grok’s code, aligning with open-source AI models like Meta and Mistral (Link)
Midjourney launches character consistent feature
Midjourney’s new “Consistent Character” feature lets artists create consistent characters across images. Users provide a reference image URL with their prompt, and the AI attempts to match the character’s features in new scenes. This holds promise for creators of comics, storyboards, and other visual narratives. (Link)
Apple tests AI for App Store ad optimization Taking a page from Google and Meta, Apple is testing AI-powered ad placement within its App Store. This new system would automatically choose the most suitable locations (e.g., App Store Today page) to display ads based on advertiser goals and budget. This development could help Apple’s ad business reach $6 billion by 2025.(Link)
China tests AI chatbot to assist neurosurgeons
China steps into the future of brain surgery with an AI co-pilot, dubbed “CARES Copilot”. This AI, based on Meta’s Llama 2.0, assists surgeons by analyzing medical data (e.g., scans) and offering informed suggestions during surgery. This government-backed project reflects China’s growing focus on developing domestic AI solutions for various sectors, including healthcare. (Link)
South Korea deploys AI dolls to tackle elderly loneliness
Hyodol, a Korean-based company, has introduced an AI-powered companion doll to tackle loneliness among elderly. Priced at $1800, the robot doll boasts advanced features like conversation abilities, medication reminders, and safety alerts. With 7,000 dolls already deployed, Hyodol aims to expand to European and North American markets. (Link)
Midjourney bans all its competitor’s employees
Midjourney banned all Stability AI employees from using its service, citing a systems outage caused by data scraping efforts linked to Stability AI employees.
The company announced the ban and a new policy against “aggressive automation” after identifying botnet-like activity from Stability AI during a server outage.
Stability AI CEO Emad Mostaque is looking into the incident, and Midjourney’s founder David Holz has provided information for the internal investigation.
Google restricts election-related queries for its Gemini chatbot
Google has begun restricting Gemini queries related to elections globally in countries where elections are taking place, to prevent the dissemination of false or misleading information.
The restrictions were implemented amid concerns over generative AI’s potential impact on elections and followed an advisory from India requiring tech firms to obtain government permission before introducing new AI models.
Despite the restrictions, the effectiveness of the restrictions is under question as some users found ways to bypass them, and it’s uncertain if Google will lift these restrictions post-elections.
AI startups reach record funding of nearly $50 billion in 2023
AI startups reached a record funding of nearly $50 billion in 2023, with significant contributions from companies like OpenAI and Anthropic.
Investment trends showed over 70 funding rounds exceeding $100 million each, partly due to major companies’ investments, including Microsoft’s $10 billion in OpenAI.
While large tech companies are venturing to dominate the AI market, specialized AI startups like Midjourney manage to maintain niches by offering superior products.
A Daily Chronicle of AI Innovations in March 2024 – Day 11: AI Daily News – March 11th, 2024
Huawei’s PixArt-Σ paints prompts to perfection Meta cracks the code to improve LLM reasoning Yi Models exceed benchmarks with refined data
Huawei’s PixArt-Σ paints prompts to perfection
Researchers from Huawei’s Noah’s Ark Lab introduced PixArt-Σ, a text-to-image model that can create 4K resolution images with impressive accuracy in following prompts. Despite having significantly fewer parameters than models like SDXL, PixArt-Σ outperforms them in image quality and prompt matching.
The model uses a “weak-to-strong” training strategy and efficient token compression to reduce computational requirements. It relies on carefully curated training data with high-resolution images and accurate descriptions, enabling it to generate detailed 4K images closely matching the text prompts. The researchers claim that PixArt-Σ can even keep up with commercial alternatives such as Adobe Firefly 2, Google Imagen 2, OpenAI DALL-E 3, and Midjourney v6.
Why does this matter?
PixArt-Σ’s ability to generate high-resolution, photorealistic images accurately could impact industries like advertising, media, and entertainment. As its efficient approach requires fewer computational resources than existing models, businesses may find it easier and more cost-effective to create custom visuals for their products or services.
Meta researchers investigated using reinforcement learning (RL) to improve the reasoning abilities of large language models (LLMs). They compared algorithms like Proximal Policy Optimization (PPO) and Expert Iteration (EI) and found that the simple EI method was particularly effective, enabling models to outperform fine-tuned models by nearly 10% after several training iterations.
However, the study also revealed that the tested RL methods have limitations in further improving LLMs’ logical capabilities. The researchers suggest that stronger exploration techniques, such as Tree of Thoughts, XOT, or combining LLMs with evolutionary algorithms, are important for achieving greater progress in reasoning performance.
Why does this matter?
Meta’s research highlights the potential of RL in improving LLMs’ logical abilities. This could lead to more accurate and efficient AI for domains like scientific research, financial analysis, and strategic decision-making. By focusing on techniques that encourage LLMs to discover novel solutions and approaches, researchers can make more advanced AI systems.
01.AI has introduced the Yi model family, a series of language and multimodal models that showcase impressive multidimensional abilities. The Yi models, based on 6B and 34B pretrained language models, have been extended to include chat models, 200K long context models, depth-upscaled models, and vision-language models.
The performance of the Yi models can be attributed to the high-quality data resulting from 01.AI‘s data-engineering efforts. By constructing a massive 3.1 trillion token dataset of English and Chinese corpora and meticulously polishing a small-scale instruction dataset, 01.AI has created a solid foundation for their models. The company believes that scaling up model parameters using thoroughly optimized data will lead to even more powerful models.
Why does this matter?
The Yi models’ success in language, vision, and multimodal tasks suggests that they could be adapted to a wide range of applications, from customer service chatbots to content moderation and beyond. These models also serve as a prime example of how investing in data optimization can lead to groundbreaking advancements in the field.
OpenAI’s Evolution into Skynet: AI and Robotics Future, Figure Humanoid Robots
OpenAI’s partnership with Figure signifies a transformative step in the evolution of AI and robotics.
Utilizing Microsoft Azure, OpenAI’s investment supports the deployment of autonomous humanoid robots for commercial use.
Figure’s collaboration with BMW Manufacturing integrates humanoid robots to enhance automotive production.
This technological progression echoes the fictional superintelligence Skynet yet emphasizes real-world innovation and safety.
The industry valuation of Figure at $2.6 billion underlines the significant impact and potential of advanced AI in commercial sectors.
What Else Is Happening in AI on March 11, 2024
Redfin’s AI can tell you about your dream neighborhood
“Ask Redfin” can now answer questions about homes, neighborhoods, and more. Using LLMss, the chatbot can provide insights on air conditioning, home prices, safety, and even connect users to agents. It is currently available in 12 U.S. cities, including Atlanta, Boston, Chicago, and Washington, D.C. (Link)
Pika Labs Adds Sound to Silent AI Videos
Pika Labs users can now add sound effects to their generated videos. Users can either specify the exact sounds they want or let Pika’s AI automatically select and integrate them based on the video’s content. This update aims to provide a more immersive and engaging video creation experience, setting a new standard in the industry. (Link)
Salesforce’s new AI tool for doctors automates paperwork
Salesforce is launching new AI tools to help healthcare workers automate tedious administrative tasks. Einstein Copilot: Health Actions will allow doctors to book appointments, summarize patient info, and send referrals using conversational AI, while Assessment Generation will digitize health assessments without manual typing or coding. (Link)
HP’s new AI-powered PCs redefine work
HP just dropped a massive lineup of AI-powered PCs, including the HP Elite series, Z by HP mobile workstations, and Poly Studio conferencing solutions. These devices use AI to improve productivity, creativity, and collaboration for the hybrid workforce, while also offering advanced security features like protection against quantum computer hacks. (Link)
DALL-E 3’s new look is artsy and user-friendly
OpenAI is testing a new user interface for DALL-E 3. It allows users to choose between predefined styles and aspect ratios directly in the GPT, offering a more intuitive and educational experience. OpenAI has also implemented the C2PA standard for metadata verification and is working on an image classifier to reliably recognize DALL-E images. (Link)
A Daily Chronicle of AI Innovations in March 2024 – Week 1 Summary
Anthropic introduced the next generation of Claude: Claude 3 model family. It includes Opus, Sonnet and Haiku models. Opus is the most intelligent model, that outperforms GPT-4 and Gemini 1.0 Ultra on most of the common evaluation benchmarks. Haiku is the fastest, most compact model for near-instant responsiveness. The Claude 3 models have vision capabilities, offer a 200K context window capable of accepting inputs exceeding 1 million tokens, improved accuracy and fewer refusals [Details|Model Card].
Stability AI partnered with Tripo AI and released TripoSR, a fast 3D object reconstruction model that can generate high-quality 3D models from a single image in under a second. The model weights and source code are available under the MIT license, allowing commercialized use. [Details|GitHub | Hugging Face].
Answer.AI released a fully open source system that, for the first time, can efficiently train a 70b large language model on a regular desktop computer with two or more standard gaming GPUs. It combines QLoRA with Meta’s FSDP, which shards large models across multiple GPUs [Details].
Inflection launched Inflection-2.5, an upgrade to their model powering Pi, Inflection’s empathetic and supportive companion chatbot. Inflection-2.5 approaches GPT-4’s performance, but used only 40% of the amount of compute for training. Pi is also now available on Apple Messages [Details].
Twelve Labs introduced Marengo-2.6, a new state-of-the-art (SOTA) multimodal foundation model capable of performing any-to-any search tasks, including Text-To-Video, Text-To-Image, Text-To-Audio, Audio-To-Video, Image-To-Video, and more [Details].
Cloudflare announced the development of Firewall for AI, a protection layer that can be deployed in front of Large Language Models (LLMs), hosted on the Cloudflare Workers AI platform or models hosted on any other third party infrastructure, to identify abuses before they reach the models [Details]
Scale AI, in partnership with the Center for AI Safety, released WMDP (Weapons of Mass Destruction Proxy): an open-source evaluation benchmark of 4,157 multiple-choice questions that serve as a proxy measurement of LLM’s risky knowledge in biosecurity, cybersecurity, and chemical security [Details].
Midjourney launched v6 turbo mode to generate images at 3.5x the speed (for 2x the cost). Just type /turbo [Link].
Moondream.ai released moondream 2 – a small 1.8B parameters, open-source, vision language model designed to run efficiently on edge devices. It was initialized using Phi-1.5 and SigLIP, and trained primarily on synthetic data generated by Mixtral. Code and weights are released under the Apache 2.0 license, which permits commercial use [Details].
Vercel released Vercel AI SDK 3.0. Developers can now associate LLM responses to streaming React Server Components [Details].
Nous Research released a new model designed exclusively to create instructions from raw-text corpuses, Genstruct 7B. This enables the creation of new, partially synthetic instruction finetuning datasets from any raw-text corpus [Details].
01.AI open-sources Yi-9B, one of the top performers among a range of similar-sized open-source models excelling in code, math, common-sense reasoning, and reading comprehension [Details].
Accenture to acquire Udacity to build a learning platform focused on AI [Details].
China Offers ‘Computing Vouchers’ upto $280,000 to Small AI Startups to train and run large language models [Details].
Snowflake and Mistral have partnered to make Mistral AI’s newest and most powerful model, Mistral Large, available in the Snowflake Data Cloud [Details]
OpenAI rolled out ‘Read Aloud’ feature for ChatGPT, enabling ChatGPT to read its answers out loud. Read Aloud can speak 37 languages but will auto-detect the language of the text it’s reading [Details].
A Daily Chronicle of AI Innovations in March 2024 – Day 8: AI Daily News – March 08th, 2024
Inflection 2.5: A new era of personal AI is here! Google announces LLMs on device with MediaPipe GaLore: A new method for memory-efficient LLM training
Adobe makes creating social content on mobile easier
OpenAI now allows users to add MFA to user accounts
US Army is building generative AI chatbots in war games
Claude 3 builds the painting app in 2 minutes and 48 seconds
Cognizant launches AI lab in San Francisco to drive innovation
Inflection 2.5: A new era of personal AI is here!
Inflection.ai, the company behind the personal AI app Pi, has recently introduced Inflection-2.5, an upgraded large language model (LLM) that competes with top LLMs like GPT-4 and Gemini. The in-house upgrade offers enhanced capabilities and improved performance, combining raw intelligence with the company’s signature personality and empathetic fine-tuning.
This upgrade has made significant progress in coding and mathematics, keeping Pi at the forefront of technological innovation. With Inflection-2.5, Pi has world-class real-time web search capabilities, providing users with high-quality breaking news and up-to-date information. This empowers Pi users with a more intelligent and empathetic AI experience.
Why does it matter?
Inflection-2.5 challenges leading language models like GPT-4 and Gemini with their raw capability, signature personality, and empathetic fine-tuning. This will provide a new alternative for startups and enterprises building personalized applications with generative AI capabilities.
Google’s new experimental release called the MediaPipe LLM Inference API allows LLMs to run fully on-device across platforms. This is a significant development considering LLMs’ memory and computing demands, which are over a hundred times larger than traditional on-device models.
The MediaPipe LLM Inference API is designed to streamline on-device LLM integration for web developers and supports Web, Android, and iOS platforms. It offers several key features and optimizations that enable on-device AI. These include new operations, quantization, caching, and weight sharing. Developers can now run LLMs on devices like laptops and phones using MediaPipe LLM Inference API.
Why does it matter?
Running LLMs on devices using MediaPipe and TensorFlow Lite allows for direct deployment, reducing dependence on cloud services. On-device LLM operation ensures faster and more efficient inference, which is crucial for real-time applications like chatbots or voice assistants. This innovation helps rapid prototyping with LLM models and offers streamlined platform integration.
GaLore: A new method for memory-efficient LLM training
Researchers have developed a new technique called Gradient Low-Rank Projection (GaLore) to reduce memory usage while training large language models significantly. Tests have shown that GaLore achieves results similar to full-rank training while reducing optimizer state memory usage by up to 65.5% when pre-training large models like LLaMA.
GaLore: A new method for memory-efficient LLM training
It also allows pre-training a 7 billion parameter model from scratch on a single 24GB consumer GPU without needing extra techniques. This approach works well for fine-tuning and outperforms low-rank methods like LoRA on GLUE benchmarks while using less memory. GaLore is optimizer-independent and can be used with other techniques like 8-bit optimizers to save additional memory.
Why does it matter?
The gradient matrix’s low-rank nature will help AI developers during model training. GaLore minimizes the memory cost of storing gradient statistics for adaptive optimization algorithms. It enables training large models like LLaMA with reduced memory consumption, making it more accessible and efficient for researchers.
OpenAI CTO complained to board about ‘manipulative’ CEO Sam Altman
OpenAI CTO Mira Murati was reported by the New York Times to have played a significant role in CEO Sam Altman’s temporary removal, raising concerns about his leadership in a private memo and with the board.
Altman was accused of creating a toxic work environment, leading to fears among board members that key executives like Murati and co-founder Ilya Sutskever could leave, potentially causing a mass exit of talent.
Despite internal criticisms of Altman’s leadership and management of OpenAI’s startup fund, hundreds of employees threatened to leave if he was not reinstated, highlighting deep rifts within the company’s leadership.
Saudi Arabia’s Male Humanoid Robot Accused of Sexual Harassment
A video of Saudi Arabia’s first male robot has gone viral after a few netizens accused the humanoid of touching a female reporter inappropriately.
“Saudi Arabia unveils its man-shaped AI robot, Mohammad, reacts to a reporter in its first appearance,” an X user wrote while sharing the video that people are claiming shows the robot’s inappropriate behaviour. You can view the original tweet here.
What Else Is Happening in AI on March 08th, 2024
Adobe makes creating social content on mobile easier
Adobe has launched an updated version of Adobe Express, a mobile app that now includes Firefly AI models. The app offers features such as a “Text to Image” generator, a “Generative Fill” feature, and a “Text Effects” feature, which can be utilized by small businesses and creative professionals to enhance their social media content. Creative Cloud members can also access and work on creative assets from Photoshop and Illustrator directly within Adobe Express. (Link)
OpenAI now allows users to add MFA to user accounts
To add extra security to OpenAI accounts, users can now enable Multi-Factor Authentication (MFA). To set up MFA, users can follow the instructions in the OpenAI Help Center article “Enabling Multi-Factor Authentication (MFA) with OpenAI.” MFA requires a verification code with their password when logging in, adding an extra layer of protection against unauthorized access. (Link)
US Army is building generative AI chatbots in war games
The US Army is experimenting with AI chatbots for war games. OpenAI’s technology is used to train the chatbots to provide battle advice. The AI bots act as military commanders’ assistants, offering proposals and responding within seconds. Although the potential of AI is acknowledged, experts have raised concerns about the risks involved in high-stakes situations. (Link)
Claude 3 builds the painting app in 2 minutes and 48 seconds
Claude 3, the latest AI model by Anthropic, created a multiplayer drawing app in just 2 minutes and 48 seconds. Multiple users could collaboratively draw in real-time with user authentication and database integration. The AI community praised the app, highlighting the transformative potential of AI in software development. Claude 3 could speed up development cycles and make software creation more accessible. (Link)
Cognizant launches AI lab in San Francisco to drive innovation
Cognizant has opened an AI lab in San Francisco to accelerate AI adoption in businesses. The lab, staffed with top researchers and developers, will focus on innovation, research, and developing cutting-edge AI solutions. Cognizant’s investment in AI research positions them as a thought leader in the AI space, offering advanced solutions to meet the modernization needs of global enterprises. (Link)
A Daily Chronicle of AI Innovations in March 2024 – Day 7: AI Daily News – March 07th, 2024
Microsoft’s NaturalSpeech makes AI sound human Google’s search update targets AI-generated spam Google’s RT-Sketch teaches robots with doodles
Ex-Google engineer charged with stealing AI secrets for Chinese firm
Microsoft engineer sounds alarm on company’s AI image generator in letter to FTC
Apple bans Epic’s developer account and calls the company ‘verifiably untrustworthy’
Apple reportedly developing foldable MacBook with 20.3-inch screen
Meta is building a giant AI model to power its ‘entire video ecosystem‘
Microsoft’s NaturalSpeech makes AI sound human
Microsoft and its partners have created NaturalSpeech 3, a new Text-to-Speech system that makes computer-generated voices sound more human. Powered by FACodec architecture and factorized diffusion models, NaturalSpeech 3 breaks down speech into different parts, like content, tone, and sound quality to create a natural-sounding speech that fits specific prompts, even for voices it hasn’t heard before.
Microsoft’s NaturalSpeech makes AI sound human
NaturalSpeech 3 works better than other voice tech in terms of quality, similarity, tone, and clarity. It keeps getting better as it learns from more data. By letting users change how the speech sounds through prompts, NaturalSpeech 3 makes talking to computers feel more like talking to a person. This research is a big step towards a future where chatting with computers is as easy as chatting with friends.
Why does this matter?
This advancement transcends mere voice quality. This could change the way we interact with devices like smartphones, smart speakers, and virtual assistants. Imagine having a more natural, engaging conversation with Siri, Alexa, or other AI helpers.
Better voice tech could also make services more accessible for people with visual impairments or reading difficulties. It might even open up new possibilities in entertainment, like more lifelike characters in video games or audiobooks that sound like they’re read by your favorite celebrities.
Google has announced significant changes to its search ranking algorithms in order to reduce low-quality and AI-generated spam content in search results. The March update targets three main spam practices: mass distribution of unhelpful content, abusing site reputation to host low-quality content, and repurposing expired domains with poor content.
While Google is not devaluing all AI-generated content, it aims to judge content primarily on its usefulness to users. Most of the algorithm changes are effective immediately, though sites abusing their reputation have a 60-day grace period to change their practices. As Google itself develops AI tools, SGE and Gemini, the debate around AI content and search result quality is just beginning.
Why does this matter?
Websites that churn out lots of AI-made content to rank higher on Google may see their rankings drop. This might push them to focus more on content creation strategies, with a greater emphasis on quality over quantity.
For people using Google, the changes should mean finding more useful results and less junk.
As AI continues to advance, search engines like Google will need to adapt their algorithms to surface the most useful content, whether it’s written by humans or AI.
Google has introduced RT-Sketch, a new approach to teaching robots tasks using simple sketches. Users can quickly draw a picture of what they want the robot to do, like rearranging objects on a table. RT-Sketch focuses on the essential parts of the sketch, ignoring distracting details.
RT-Sketch is trained on a dataset of paired trajectories and synthetic goal sketches, and tested on six object rearrangement tasks. The results show that RT-Sketch performs comparably to image or language-conditioned agents in simple settings with written instructions on straightforward tasks. However, it did better when instructions were confusing or there were distracting objects present.
RT-Sketch can also interpret and act upon sketches with varying levels of detail, from basic outlines to colorful drawings.
Why does this matter?
With RT-Sketch, people can tell robots what to do without needing perfect images or detailed written instructions. This could make robots more accessible and useful in homes, workplaces, and for people who have trouble communicating in other ways.
As robots become a bigger part of our lives, easy ways to talk to them, like sketching, could help us get the most out of them. RT-Sketch is a step toward making robots that better understand what we need.
Google’s Gemini lets users edit within the chatbox
Google has updated its Gemini chatbot, allowing users to directly edit and fine-tune responses within the chatbox. This feature, launched on March 4th for English users in the Gemini web app, enables more precise outputs by letting people select text portions and provide instructions for improvement. (Link)
Adobe’s AI boosts IBM’s marketing efficiency
IBM reports a 10-fold increase in designer productivity and a significant reduction in marketing campaign time after testing Adobe’s generative AI tools. The AI-powered tools have streamlined idea generation and variant creation, allowing IBM to achieve more in less time. (Link)
Zapier’s new tool lets you make AI bots without coding
Zapier has released Zapier Central, a new AI tool that allows users to create custom AI bots by simply describing what they want, without any coding. The bots can work with Zapier’s 6,000+ connected apps, making it easy for businesses to automate tasks. (Link)
Accenture teams up with Cohere to bring AI to enterprises
Accenture has partnered with AI startup, Cohere to provide generative AI solutions to businesses. Leveraging Cohere’s language models and search technologies, the collaboration aims to boost productivity and efficiency while ensuring data privacy and security. (Link)
Meta builds mega AI model for video recommendations Meta is developing a single AI model to power its entire video ecosystem across platforms by 2026. The company has invested billions in Nvidia GPUs to build this model, which has already shown promising results in improving Reels watch time on the core Facebook app. (Link)
OpenAI is researching photonic processors to run their AI on
OpenAI hired this person: He has been doing a lot of research on waveguides for photonic processing for both Training AI and for inference and he did a PHD about photonic waveguides:
I think that he is going to help OpenAI to build photonic waveguides that they can run their neural networks / AI Models on and this is really cool if OpenAI actually think that they can build processors with faster Inference and training with photonics.
Ex-Google engineer charged with stealing AI secrets for Chinese firm
Linwei Ding, a Google engineer, has been indicted for allegedly stealing over 500 files related to Google’s AI technology, including designs for chips and data center technologies, to benefit companies in China.
The stolen data includes designs for Google’s TPU chips and GPUs, crucial for AI workloads, amid U.S. efforts to restrict China’s access to AI-specific chips.
Ding allegedly transferred stolen files to a personal cloud account using a method designed to evade Google’s detection systems, was offered a CTO position by a Chinese AI company and founded a machine learning startup in China while still employed at Google.
Microsoft engineer sounds alarm on company’s AI image generator in letter to FTC
Microsoft AI engineer Shane Jones warns that the company’s AI image generator, Copilot Designer, generates sexual and violent content and ignores copyright laws.
Jones shared his findings with Microsoft and contacted U.S. senators and the FTC, demanding better safeguards and an independent review of Microsoft’s AI incident reporting process.
In addition to the problems with Copilot Designer, other Microsoft products based on OpenAI technologies, such as Copilot Chat, tend to have poorer performance and more insecure implementations than the original OpenAI products, such as ChatGPT and DALL-E 3.
Meta is building a giant AI model to power its ‘entire video ecosystem’
Meta is developing an AI model designed to power its entire video ecosystem, including the TikTok-like Reels service and traditional video content, as part of its technology roadmap through 2026.
The company has invested billions of dollars in Nvidia GPUs to support this AI initiative, aiming to improve recommendation systems and overall product performance across all platforms.
This AI model has already demonstrated an 8% to 10% increase in Reels watch time on the Facebook app, with Meta now working to expand its application to include the Feed recommendation product and possibly integrate sophisticated chatting tools.
Innovating for the Future
As Meta continues to innovate and refine their AI model architecture, we can expect even more exciting developments in the future. The company’s dedication to enhancing the video recommendation experience and leveraging the full potential of AI is paving the way for a new era in online video consumption.
Stay tuned for more updates as Meta strives to revolutionize the digital video landscape with its cutting-edge AI technology.
– AI will enable humans to get content they want, nothing more
– New AI OSes will act ‘for’ the human, cleaning content of ads
– OpenAI and new startups don’t need ad revenue, they’ll take monthly subscriptions to deliver information with no ads
No:
– New AI OSes will integrate ads even more closely into the computing experience, acting ‘against’ the human
– Content will be more tightly integrated with ads, and AI won’t be able to unpiece this
– Meta and Alphabet have $100bns of skin in the game, they will make sure this doesn’t happen, including by using their lawyers to prevent lifting content out of the ad context
A Daily Chronicle of AI Innovations in March 2024 – Day 6: AI Daily News – March 06th, 2024
Microsoft’s Orca AI beats 10x bigger models in math GPT-4V wins at turning designs into code DeepMind alums’ Haiper joins the AI video race
OpenAI fires back, says Elon Musk demanded ‘absolute control’ of the company
iOS 17.4 is here: what you need to know
TikTok faces US ban if ByteDance fails to sell app
Google now wants to limit the AI-powered search spam it helped create
OpenAI vs Musk (openai responds to elon musk).
What does Elon mean by: “Unfortunately, humanity’s future is in the hands of <redacted>”? Is it google?
What does elon mean “Unfortunately, humanity’s future is in the hands of “? Is it google?What does elon mean “Unfortunately, humanity’s future is in the hands of “? Is it google?
OpenAI has countered Elon Musk’s lawsuit by revealing Musk’s desire for “absolute control” over the company, including merging it with Tesla, holding majority equity, and becoming CEO.
In a blog post, OpenAI aims to dismiss Musk’s claims and argues against his view that the company has deviated from its original nonprofit mission and has become too closely aligned with Microsoft.
OpenAI defends its stance on not open-sourcing its work, citing a 2016 email exchange with Musk that supports a less open approach as the development of artificial general intelligence advances.
For the first time in history, an AI has a higher IQ than the average human.
For the first time in history, an AI has a higher IQ than the average human.
Claude 3 vs. GPT-4
Right now, the question on everyone’s mind is whether Claude 3 is better than GPT-4. It’s a fair question; GPT-4 has dominated the LLM benchmarks for over a year, despite plenty of competitors trying to catch up.
Certainly, GPT-4 now has some real competition in the form of Claude 3 and Gemini 1.5. Even if we put the benchmarks aside for a moment, capabilities like video comprehension and million-token context windows are pushing the state of the art forward, and OpenAI could finally cede its dominant position.
But I think that “best,” when it comes to LLMs, is a little bit of a red herring. Despite the marketing and social media hype, these models have more similarities than differences. Ultimately, “best” depends on your use cases and preferences.
Claude 3 may be better at reasoning and language comprehension than GPT-4, but that won’t matter much if you’re mainly generating code. Likewise, Gemini 1.5 may have better multi-modal capabilities, but if you’re concerned with working in different languages, then Claude might be your best bet. In my (very limited) testing, I’ve found that Opus is a much better writer than GPT-4 – the default writing style is far more “normal” than what I can now recognize as ChatGPT-generated content. But I’ve yet to try brainstorming and code generation tasks.
So, for now, my recommendation is to keep experimenting and find a model that works for you. Not only because each person’s use cases differ but also because the models are regularly improving! In the coming months, Anthropic plans to add function calls, interactive coding, and more agentic capabilities to Claude 3.
To try Claude 3 for yourself, you can start talking with Claude 3 Sonnet today (though you’ll need to be in one of Anthropic’s supported countries). Opus is available to paid subscribers of Claude Pro. If you’re a developer, Opus and Sonnet are available via the API, and Sonnet is additionally available through Amazon Bedrock and Google Cloud’s Vertex AI Model Garden. The models are also available via a growing number of third-party apps and services: check your favorite AI tool to see if it supports Claude 3!
Guy builds an AI-steered homing/killer drone in just a few hours
Guy builds an AI-steered homing/killer drone in just a few hours
Always Say Hello to Your GPTs… (Better Performing Custom GPTs)
I’ve been testing out lots of custom GPTs that others have made. Specifically games and entertaining GPTs and I noticed some issues and a solution.
The problem: First off, many custom GPT games seem to forget to generate images as per their instructions. I also noticed that, often, the game or persona (or whatever the GPT aims to be) becomes more of a paraphrased or simplified version of what it should be and responses become more like base ChatGPT.
The solution: I’ve noticed that custom GPTs will perform much better if the user starts the initial conversation with a simple ”Hello, can you explain your functionality and options to me?”. This seems to remind the custom GPT of it’s tone ensures it follow’s its instructions.
Microsoft’s Orca AI beats 10x bigger models in math
Microsoft’s Orca team has developed Orca-Math, an AI model that excels at solving math word problems despite its compact size of just 7 billion parameters. It outperforms models ten times larger on the GSM8K benchmark, achieving 86.81% accuracy without relying on external tools or tricks. The model’s success is attributed to training on a high-quality synthetic dataset of 200,000 math problems created using multi-agent flows and an iterative learning process involving AI teacher and student agents.
Microsoft’s Orca AI beats 10x bigger models in math
The Orca team has made the dataset publicly available under the MIT license, encouraging researchers and developers to innovate with the data. The small dataset size highlights the potential of using multi-agent flows to generate data and feedback efficiently.
Why does this matter?
Orca-Math’s breakthrough performance shows the potential for smaller, specialized AI models in niche domains. This development could lead to more efficient and cost-effective AI solutions for businesses, as smaller models require less computational power and training data, giving companies a competitive edge.
With unprecedented capabilities in multimodal understanding and code generation, GenAI can enable a new paradigm of front-end development where LLMs directly convert visual designs into code implementation. New research formalizes this as “Design2Code” task and conduct comprehensive benchmarking. It also:
Introduces Design2Code benchmark consisting of diverse real-world webpages as test examples
Develops comprehensive automatic metrics that complement human evaluations
Proposes new multimodal prompting methods that improve over direct prompting baselines.
Finetunes open-source Design2Code-18B model that matches the performance of Gemini Pro Vision on both human and automatic evaluation
Moreover, it finds 49% of the GPT-4V-generations webpages were good enough to replace the original references, while 64% were even better designed than the original references.
Why does this matter?
This research could simplify web development for anyone to build websites from visual designs using AI, much like word processors made writing accessible. For enterprises, automating this front-end coding process could improve collaboration between teams and speed up time-to-market across industries if implemented responsibly alongside human developers.
Kayak introduced two new AI features: PriceCheck, which lets users upload flight screenshots to find cheaper alternatives and Ask Kayak, a ChatGPT-powered travel advice chatbot. These additions position Kayak alongside other travel sites, using generative AI to improve trip planning and flight price comparisons in a competitive market. (Link)
Accenture invests $1B in LearnVantage for AI upskilling
Accenture is launching LearnVantage, investing $1 billion over three years to provide clients with customized technology learning and training services. Accenture is also acquiring Udacity to scale its learning capabilities and meet the growing demand for technology skills, including generative AI, so organizations can achieve business value using AI. (Link)
Snowflake brings Mistral’s LLMs to its data cloud
Snowflake has partnered with Mistral AI to bring Mistral’s open LLMs into its Data Cloud. This move allows Snowflake customers to build LLM apps directly within the platform. It also marks a significant milestone for Mistral AI, which has recently secured partnerships with Microsoft, IBM, and Amazon. The deal positions Snowflake to compete more effectively in the AI space and increases Mistral AI visibility. (Link)
Dell & CrowdStrike unite to fight AI threats
Dell and CrowdStrike are partnering to help businesses fight cyberattacks using AI. By integrating CrowdStrike’s Falcon XDR platform into Dell’s MDR service, they aim to protect customers against threats like generative AI attacks, social engineering, and endpoint breaches. (Link)
AI app diagnoses ear infections with a snap
Physician-scientists at UPMC and the University of Pittsburgh have developed a smartphone app that uses AI to accurately diagnose ear infections (acute otitis media) in young children. The app analyzes short videos of the eardrum captured by an otoscope connected to a smartphone camera. It could help decrease unnecessary antibiotic use by providing a more accurate diagnosis than many clinicians. (Link)
DeepMind alums’ Haiper joins the AI video race
DeepMind alums Yishu Miao and Ziyu Wang have launched Haiper, a video generation tool powered by their own AI model. The startup offers a free website where users can generate short videos using text prompts, although there are limitations on video length and quality.
DeepMind alums’ Haiper joins the AI video race
The company has raised $19.2 million in funding and focuses on improving its AI model to deliver high-quality, realistic videos. They aim to build a core video generation model that can be offered to developers and address challenges like the “uncanny valley” problem in AI-generated human figures.
Why does this matter?
Haiper signals the race to develop video AI models that can disrupt industries like marketing, entertainment, and education by allowing businesses to generate high-quality video content cost-effectively. However, the technology is at an early stage, so there is room for improvement, highlighting the need for responsible development.
A Daily Chronicle of AI Innovations in March 2024 – Day 5: AI Daily News – March 05th, 2024
Anthropic’s Claude 3 Beats OpenAI’s GPT-4 TripsoSR: 3D object generation from a single image in <1s Cloudflare’s Firewall for AI protects LLMs from abuses
Google co-founder says company ‘definitely messed up’
Facebook, Instagram, and Threads are all down
Microsoft compares New York Times to ’80s movie studios trying to ban VCRs
Fired Twitter execs are suing Elon Musk for over $128 million
Claude 3 gets ~60% accuracy on GPQA
Claude 3 gets ~60% accuracy on GPQA
Claude 3 gets ~60% accuracy on GPQA. It's hard for me to understate how hard these questions are—literal PhDs (in different domains from the questions) with access to the internet get 34%.
Anthropic has launched Claude 3, a new family of models that has set new industry benchmarks across a wide range of cognitive tasks. The family comprises three state-of-the-art models in ascending order of cognitive ability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. Each model provides an increasing level of performance, and you can choose the one according to your intelligence, speed, and cost requirements.
Anthropic’s Claude 3 beats OpenAI’s GPT-4
Opus and Sonnet are now available via claude.ai and the Claude API in 159 countries, and Haiku will join that list soon.
Claude 3 has set a new standard of intelligence among its peers on most of the common evaluation benchmarks for AI systems, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), basic mathematics (GSM8K), and more.
Anthropic’s Claude 3 beats OpenAI’s GPT-4
In addition, Claude 3 displays solid visual processing capabilities and can process a wide range of visual formats, including photos, charts, graphs, and technical diagrams. Lastly, compared to Claude 2.1, Claude 3 exhibits 2x accuracy and precision for responses and correct answers.
Why does it matter?
In 2024, Gemini and ChatGPT caught the spotlight, but now Claude 3 has emerged as the leader in AI benchmarks. While benchmarks matter, only the practical usefulness of Claude 3 will tell if it is truly superior. This might also prompt OpenAI to release a new ChatGPT upgrade. However, with AI models becoming more common and diverse, it’s unlikely that one single model will emerge as the ultimate winner.
TripsoSR: 3D object generation from a single image in <1s
Stability AI has introduced a new AI model named TripsoSR in partnership with Trip AI. The model enables high-quality 3D object generation or rest from a single in less than a second. It runs under low inference budgets (even without a GPU) and is accessible to many users.
TripsoSR: 3D object generation from a single image in <1s
As far as performance, TripoSR can create detailed 3D models in a fraction of the time of other models. When tested on an Nvidia A100, it generates draft-quality 3D outputs (textured meshes) in around 0.5 seconds, outperforming other open image-to-3D models such as OpenLRM.
TripsoSR: 3D object generation from a single image in <1s
Why does it matter?
TripoSR caters to the growing demands of various industries, including entertainment, gaming, industrial design, and architecture. The availability of the model weights and source code for download further promotes commercialized, personal, and research use, making it a valuable asset for developers, designers, and creators.
Cloudflare’s Firewall for AI protects LLMs from abuses
Cloudflare has released a Firewall for AI, a protection layer that you can deploy in front of Large Language Models (LLMs) to identify abuses before they reach the models. While the traditional web and API vulnerabilities also apply to the LLM world, Firewall for AI is an advanced-level Web Application Firewall (WAF) designed explicitly for LLM protection and placed in front of applications to detect vulnerabilities and provide visibility to model owners.
Cloudflare Firewall for AI is deployed like a traditional WAF, where every API request with an LLM prompt is scanned for patterns and signatures of possible attacks. You can deploy it in front of models hosted on the Cloudflare Workers AI platform or any other third-party infrastructure. You can use it alongside Cloudflare AI Gateway and control/set up a Firewall for AI using the WAF control plane.
Cloudflare’s Firewall for AI protects LLMs from abuses
Why does it matter?
As the use of LLMs becomes more widespread, there is an increased risk of vulnerabilities and attacks that malicious actors can exploit. Cloudflare is one of the first security providers to launch tools to secure AI applications. Using a Firewall for AI, you can control what prompts and requests reach their language models, reducing the risk of abuses and data exfiltration. It also aims to provide early detection and protection for both users and LLM models, enhancing the security of AI applications.
Microsoft compares New York Times to ’80s movie studios trying to ban VCRs
Microsoft filed a motion to dismiss the New York Times’ copyright infringement lawsuit against OpenAI, comparing the newspaper’s stance to 1980s movie studios’ attempts to block VCRs, arguing that generative AI, like the VCR, does not hinder the original content’s market.
The company, as OpenAI’s largest supporter, asserts that copyright law does not obstruct ChatGPT’s development because the training content does not substantially affect the market for the original content.
Microsoft and OpenAI contend that ChatGPT does not replicate or substitute for New York Times content, emphasizing that the AI’s training on such articles does not significantly contribute to its development.
Google co-founder says company ‘definitely messed up’
Sergey Brin admitted Google “definitely messed up” with the Gemini AI’s image generation, highlighting issues like historically inaccurate images and the need for more thorough testing.
Brin, a core contributor to Gemini, came out of retirement due to the exciting trajectory of AI, amidst the backdrop of Google’s “code red” in response to OpenAI’s ChatGPT.
Criticism of Gemini’s biases and errors, including its portrayal of people of color and responses in written form, led to Brin addressing concerns over the AI’s unintended left-leaning output.
A Daily Chronicle of AI Innovations in March 2024 – Day 4: AI Daily News – March 04th, 2024
Google’s ScreenAI can ‘see’ graphics like humans do How AI ‘worms’ pose security threats in connected systems New benchmarking method challenges LLMs’ reasoning abilities
AI may enable personalized prostate cancer treatment
Vimeo debuts AI-powered video hub for business collaboration
Motorola revving up for AI-powered Moto X50 Ultra launch
Copilot will soon fetch and parse your OneDrive files
Huawei’s new AI chip threatens Nvidia’s dominance in China
OpenAI adds ‘Read Aloud’ voiceover to ChatGPT
https://youtu.be/ZJvTv7zVX0s?si=yejANUAUtUwyXEH8
OpenAI rolled out a new “Read Aloud” feature for ChatGPT as rivals like Anthropic and Google release more capable language models. (Source)
The Voiceover Update
ChatGPT can now narrate responses out loud on mobile apps and web.
Activated by tapping the response or clicking the microphone icon.
Update comes as Anthropic unveils their newest Claude 3 model.
Timing seems reactive amid intense competition over advanced AI. OpenAI also facing lawsuit from Elon Musk over alleged betrayal.
Anthropic launches Claude 3, claiming to outperform GPT-4 across the board
Anthropic launches Claude 3, claiming to outperform GPT-4 across the board
Google’s ScreenAI can ‘see’ graphics like humans do
Google Research has introduced ScreenAI, a Vision-Language Model that can perform question-answering on digital graphical content like infographics, illustrations, and maps while also annotating, summarizing, and navigating UIs. The model combines computer vision (PaLI architecture) with text representations of images to handle these multimodal tasks.
Despite having just 4.6 billion parameters, ScreenAI achieves new state-of-the-art results on UI- and infographics-based tasks and new best-in-class performance on others, compared to models of similar size.
Google’s ScreenAI can ‘see’ graphics like humans do
While ScreenAI is best-in-class on some tasks, further research is needed to match models like GPT-4 and Gemini, which are significantly larger. Google Research has released a dataset with ScreenAI’s unified representation and two other datasets to help the community experiment with more comprehensive benchmarking on screen-related tasks.
Why does this matter?
ScreenAI’s breakthrough in unified visual and language understanding bridges the disconnect between how humans and machines interpret ideas across text, images, charts, etc. Companies can now leverage these multimodal capabilities to build assistants that summarize reports packed with graphics, analysts that generate insights from dashboard visualizations, and agents that manipulate UIs to control workflows.
How AI ‘worms’ pose security threats in connected systems
Security researchers have created an AI “worm” called Morris II to showcase vulnerabilities in AI ecosystems where different AI agents are linked together to complete tasks autonomously.
The researchers tested the worm in a simulated email system using ChatGPT, Gemini, and other popular AI tools. The worm can exploit these AI systems to steal confidential data from emails or forward spam/propaganda without human approval. It works by injecting adversarial prompts that make the AI systems behave maliciously.
While this attack was simulated, the research highlights risks if AI agents are given too much unchecked freedom to operate.
Why does it matter?
This AI “worm” attack reveals that generative models like ChatGPT have reached capabilities that require heightened security to prevent misuse. Researchers and developers must prioritize safety by baking in controls and risk monitoring before commercial release. Without industry-wide commitments to responsible AI, regulation may be needed to enforce acceptable safeguards across critical domains as systems gain more autonomy.
New benchmarking method challenges LLMs’ reasoning abilities
Researchers at Consequent AI have identified a “reasoning gap” in large language models like GPT-3.5 and GPT-4. They introduced a new benchmarking approach called “functional variants,” which tests a model’s ability to reason instead of just memorize. This involves translating reasoning tasks like math problems into code that can generate unique questions requiring the same logic to solve.
New benchmarking method challenges LLMs’ reasoning abilities
When evaluating several state-of-the-art models, the researchers found a significant gap between performance on known problems from benchmarks versus new problems the models had to reason through. The gap was 58-80%, indicating the models do not truly understand complex problems but likely just store training examples. The models performed better on simpler math but still demonstrated limitations in reasoning ability.
Why does this matter?
This research reveals that reasoning still eludes our most advanced AIs. We risk being misled by claims of progress made by the Big Tech if their benchmarks reward superficial tricks over actual critical thinking. Moving forward, model creators will have to prioritize generalization and logic over memorization if they want to make meaningful progress towards general intelligence.
AI may enable personalized prostate cancer treatment
Researchers used AI to analyze prostate cancer DNA and found two distinct subtypes called “evotypes.” Identifying these subtypes could allow for better prediction of prognosis and personalized treatments. (Link)
Vimeo debuts AI-powered video hub for business collaboration
Vimeo has launched a new product called Vimeo Central, an AI-powered video hub to help companies improve internal video communications, collaboration, and analytics. Key capabilities include a centralized video library, AI-generated video summaries and highlights, enhanced screen recording and video editing tools, and robust analytics. (Link)
Motorola revving up for AI-powered Moto X50 Ultra launch
Motorola is building hype for its upcoming Moto X50 Ultra phone with a Formula 1-themed teaser video highlighting the device’s powerful AI capabilities. The phone will initially launch in China on April 21 before potentially getting a global release under the Motorola Edge branding. (Link)
Copilot will soon fetch and parse your OneDrive files
Microsoft is soon to launch Copilot for OneDrive, an AI assistant that will summarize documents, extract information, answer questions, and follow commands related to files stored in OneDrive. Copilot can generate outlines, tables, and lists based on documents, as well as tailored summaries and responses. (Link)
Huawei’s new AI chip threatens Nvidia’s dominance in China
Huawei has developed a new AI chip, the Ascend 910B, which matches the performance of Nvidia’s A100 GPU based on assessments by SemiAnalysis. The Ascend 910B is already being used by major Chinese companies like Baidu and iFlytek and could take market share from Nvidia in China due to US export restrictions on Nvidia’s latest AI chips. (Link)
1-bit LLMs explained
Check out this new tutorial that summarizes the revolutionary paper “The Era of 1-bit LLMs” introducing BitNet b1.58 model and explain what are 1-bit LLMs and how they are useful.
A Daily Chronicle of AI Innovations in March 2024 – Day 2: AI Daily News – March 02nd, 2024
A Daily Chronicle of AI Innovations in March 2024 – Day 1: AI Daily News – March 01st, 2024
Sora showcases jaw-dropping geometric consistency Microsoft introduces Copilot for finance in Microsoft 365 OpenAI and Figure team up to develop AI for robots
Elon Musk filed suit against OpenAI and CEO Sam Altman, alleging they have breached the artificial-intelligence startup’s founding agreement by putting profit ahead of benefiting humanity.
The 52-year-old billionaire, who helped fund OpenAI in its early days, said the company’s close relationship with Microsoft has undermined its original mission of creating open-source technology that wouldn’t be subject to corporate priorities. Musk, who is also CEO of Tesla has been among the most outspoken about the dangers of AI and artificial general intelligence, or AGI.
“To this day, OpenAI Inc.’s website continues to profess that its charter is to ensure that AGI “benefits all of humanity.” In reality, however, OpenAI has been transformed into a closed-source de facto subsidiary of the largest technology company in the world: Microsoft,” the lawsuit says.
Elon Sues OpenAI for “breach of contract”
Sora showcases jaw-dropping geometric consistency
Sora from OpenAI has been remarkable in video generation compared to other leading models like Pika and Gen2. In a recent benchmarking test conducted by ByteDanc.Inc in collaboration with Wuhan and Nankai University, Sora showcased video generation with high geometric consistency.
Sora showcases jaw-dropping geometric consistency
The benchmark test assesses the quality of generated videos based on how it adhere to the principles of physics in real-world scenarios. Researchers used an approach where generated videos are transformed into 3D models. Further, a team of researchers used the fidelity of geometric constraints to measure the extent to which generated videos conform to physics principles in the real world.
Why does it matter?
Sora’s remarkable performance in generating geometrically consistent videos can greatly boost several use cases for construction engineers and architects. Further, the new benchmarking will allow researchers to measure newly developed models to understand how accurately their creations conform to the principles of physics in real-world scenarios.
Microsoft introduces Copilot for finance in Microsoft 365
Microsoft has launched Copilot for Finance, a new addition to its Copilot series that recommends AI-powered productivity enhancements. It aims to transform how finance teams approach their daily work with intelligent workflow automation, recommendations, and guided actions. This Copilot aims to simplify data-driven decision-making, helping finance professionals have more free time by automating manual tasks like Excel and Outlook.
Copilot for Finance simplifies complex variance analysis in Excel, account reconciliations, and customer account summaries in Outlook. Dentsu, Northern Trust, Schneider Electric, and Visa plan to use it alongside Copilot for Sales and Service to increase productivity, reduce case handling times, and gain better decision-making insights.
Why does it matter?
Introducing Microsoft Copilot for finance will help businesses focus on strategic involvement from professionals otherwise busy with manual tasks like data entry, workflow management, and more. This is a great opportunity for several organizations to automate tasks like analysis of anomalies, improve analytic efficiency, and expedite financial transactions.
OpenAI and Figure team up to develop AI for robots
Figure has raised $675 million in series B funding with investments from OpenAI, Microsoft, and NVIDIA. It is an AI robotics company developing humanoid robots for general-purpose usage. The collaboration agreement between OpenAI and Figure aims to develop advanced humanoid robots that will leverage the generative AI models at its core.
This collaboration will also help accelerate the development of smart humanoid robots capable of understanding tasks like humans. With its deep understanding of robotics, Figure is set to bring efficient robots for general-purpose enhancing automation.
Why does it matter?
Open AI and Figure will transform robot operations, adding generative AI capabilities. This collaboration will encourage the integration of generative AI capabilities across robotics development. Right from industrial robots to general purpose and military applications, generative AI can be the new superpower for robotic development.
Google now wants to limit the AI-powered search spam it helped create
Google announced it will tackle AI-generated content aiming to manipulate search rankings through algorithmic enhancements, affecting automated content creation the most.
These algorithm changes are intended to discern and reduce low-quality and unhelpful webpages, aiming to improve the overall quality of search results.
The crackdown also targets misuse of high-reputation websites and the exploitation of expired domains for promoting substandard content.
Stack Overflow partners with Google Cloud to power AI
Stack Overflow and Google Cloud are partnering to integrate OverflowAPI into Google Cloud’s AI tools. This will give developers accessing the Google Cloud console access to Stack Overflow’s vast knowledge base of over 58 million questions and answers. The partnership aims to enable AI systems to provide more insightful and helpful responses to users by learning from the real-world experiences of programmers. (Link)
Microsoft unites rival GPU makers for one upscaling API
Microsoft is working with top graphics hardware makers to introduce “DirectSR”, a new API that simplifies the integration of super-resolution upscaling into games. DirectSR will allow game developers to easily access Nvidia’s DLSS, AMD’s FSR, and Intel’s XeSS with a single code path. Microsoft will preview the API in its Agility SDK soon and demonstrate it live with AMD and Nvidia reps on March 21st. (Link)
Google supercharges data platforms with AI for deeper insights
Google is expanding its AI capabilities across data and analytics services, including BigQuery and Cloud Databases. Vector search support is available across all databases, and BigQuery has the advanced Gemini Pro model for unstructured data analysis. Users can combine insights from images, video, audio, and text with structured data in a single analytics workflow. (Link)
Brave’s privacy-first AI-powered assistant is now available on Android
Brave’s AI-powered assistant, Leo, is now available on Android, bringing helpful features like summarization, transcription, and translation while prioritizing user privacy. Leo processes user inputs locally on the device without retaining or using data to train itself, aligning with Brave’s commitment to privacy-focused services. Users can simplify tasks with Leo without compromising on security. (Link)
Mistral introduced a new model Mistral Large. It reaches top-tier reasoning capabilities, is multi-lingual by design, has native function calling capacities and has 32K tokens context window. The pre-trained model has 81.2% accuracy on MMLU. Alongside Mistral Large, Mistral Small, a model optimized for latency and cost has been released. Mistral Small outperforms Mixtral 8x7B and has lower latency. Mistral also launched a ChatGPT like new conversational assistant, le Chat Mistral [Details].
Alibaba Group introduced EMO, an expressive audio-driven portrait-video generation framework. Input a single reference image and the vocal audio, e.g. talking and singing, it can generate vocal avatar videos with expressive facial expressions, and various head poses [Details].
Ideogram introduced Ideogram 1.0, a text-to-image model trained from scratch for state-of-the-art text rendering, photorealism, prompt adherence, and a feature called Magic Prompt to help with prompting. Ideogram 1.0 is now available to all users on ideogram.ai [Details].
Ideogram introduced Ideogram 1.0
Google DeepMind introduced Genie (generative interactive environments), a foundation world model trained exclusively from Internet videos that can generate interactive, playable environments from a single image prompt [Details].
Pika Labs launched Lip Sync feature, powered by audio from Eleven Labs, for its AI generated videos enabling users to make the characters talk with realistic mouth movements [Video].
UC Berkeley introduced Berkeley Function Calling Leaderboard (BFCL) to evaluate the function calling capability of different LLMs. Gorilla Open Functions v2, an open-source model that can help users with building AI applications with function calling and interacting with json compatible output has also been released [Details].
Qualcomm launched AI Hub, a curated library of 80+ optimized AI models for superior on-device AI performance across Qualcomm and Snapdragon platforms [Details].
BigCode released StarCoder2, a family of open LLMs for code and comes in 3 different sizes with 3B, 7B and 15B parameters. StarCoder2-15B is trained on over 4 trillion tokens and 600+ programming languages from The Stack v2 dataset [Details].
Researchers released FuseChat-7B-VaRM, which is the fusion of three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B, surpassing GPT-3.5 (March), Claude-2.1, and approaching Mixtral-8x7B-Instruct [Details].
The Swedish fintech Klarna’s AI assistant handles two-thirds of all customer service chats, some 2.3 million conversations so far, equivalent to the work of 700 people [Details].
Lightricks introduces LTX Studio, an AI-powered film making platform, now open for waitlist sign-ups, aimed at assisting creators in story visualization [Details].
Morph partners with Stability AI to launch Morph Studio, a platform to make films using Stability AI–generated clips [Details].
JFrog‘s security team found that roughly a 100 models hosted on the Hugging Face platform feature malicious functionality [Details].
Playground released Playground v2.5, an open-source text-to-image generative model, with a focus on enhanced color and contrast, improved generation for multi-aspect ratios, and improved human-centric fine detail [Details].
Together AI and the Arc Institute released Evo, a long-context biological foundation model based on the StripedHyena architecture that generalizes across DNA, RNA, and proteins.. Evo is capable of both prediction tasks and generative design, from molecular to whole genome scale (over 650k tokens in length) [Details].
Adobe previews a new generative AI music generation and editing tool, Project Music GenAI Control, that allows creators to generate music from text prompts, and then have fine-grained control to edit that audio for their precise needs [Details | video].
Microsoft introduces Copilot for Finance, an AI chatbot for finance workers in Excel and Outlook [Details].
The Intercept, Raw Story, and AlterNet sue OpenAI and Microsoft, claiming OpenAI and Microsoft intentionally removed important copyright information from training data [Details].
Huawei spin-off Honor shows off tech to control a car with your eyes and chatbot based on Meta’s AI [Details].
Tumblr and WordPress.com are preparing to sell user data to Midjourney and OpenAI [Details]
February 2024 – Week 3 Recap
Meta AI introduces V-JEPA (Video Joint Embedding Predictive Architecture), a method for teaching machines to understand and model the physical world by watching videos. Meta AI releases a collection of V-JEPA vision models trained with a feature prediction objective using self-supervised learning. The models are able to understand and predict what is going on in a video, even with limited information [Details | GitHub].
Open AI introduces Sora, a text-to-video model that can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions [Details + sample videos| Report].
Google announces their next-generation model, Gemini 1.5, that uses a new Mixture-of-Experts (MoE) architecture. The first Gemini 1.5 model being released for early testing is Gemini 1.5 Pro with a context window of up to 1 million tokens, which is the longest context window of any large-scale foundation model yet. 1.5 Pro can perform sophisticated understanding and reasoning tasks for different modalities, including video and it performs at a similar level to 1.0 Ultra [Details|Tech Report].
Reka introduced Reka Flash, a new 21B multimodal and multilingual model trained entirely from scratch that is competitive with Gemini Pro & GPT 3.5 on key language & vision benchmarks. Reka also present a compact variant Reka Edge , a smaller and more efficient model (7B) suitable for local and on-device deployment. Both models are in public beta and available in Reka Playground [Details].
Cohere For AI released Aya, a new open-source, massively multilingual LLM & dataset to help support under-represented languages. Aya outperforms existing open-source models and covers 101 different languages – more than double covered by previous models [Details].
BAAI released Bunny, a family of lightweight but powerful multimodal models. Bunny-3B model built upon SigLIP and Phi-2 outperforms the state-of-the-art MLLMs, not only in comparison with models of similar size but also against larger MLLMs (7B), and even achieves performance on par with LLaVA-13B [Details].
Amazon introduced a text-to-speech (TTS) model called BASE TTS (Big Adaptive Streamable TTS with Emergent abilities). BASE TTS is the largest TTS model to-date, trained on 100K hours of public domain speech data and exhibits “emergent” qualities improving its ability to speak even complex sentences naturally [Details | Paper].
Stability AI released Stable Cascade in research preview, a new text to image model that is exceptionally easy to train and finetune on consumer hardware due to its three-stage architecture. Stable Cascade can also generate image variations and image-to-image generations. In addition to providing checkpoints and inference scripts, Stability AI has also released scripts for finetuning, ControlNet, and LoRA training [Details].
Researchers from UC berkeley released Large World Model (LWM), an open-source general-purpose large-context multimodal autoregressive model, trained from LLaMA-2, that can perform language, image, and video understanding and generation. LWM answers questions about 1 hour long YouTube video even if GPT-4V and Gemini Pro both fail and can retriev facts across 1M context with high accuracy [Details].
GitHub opens applications for the next cohort of GitHub Accelerator program with a focus on funding the people and projects that are building AI-based solutions under an open source license [Details].
NVIDIA released Chat with RTX, a locally running (Windows PCs with specific NVIDIA GPUs) AI assistant that integrates with your file system and lets you chat with your notes, documents, and videos using open source models [Details].
Open AI is testing memory with ChatGPT, enabling it to remember things you discuss across all chats. ChatGPT’s memories evolve with your interactions and aren’t linked to specific conversations. It is being rolled out to a small portion of ChatGPT free and Plus users this week [Details].
BCG X released of AgentKit, a LangChain-based starter kit (NextJS, FastAPI) to build constrained agent applications [Details | GitHub].
Elevenalabs’ Speech to Speech feature, launched in November, for voice transformation with control over emotions and delivery, is now multilingual and available in 29 languages [Link]
Apple introduced Keyframer, an LLM-powered animation prototyping tool that can generate animations from static images (SVGs). Users can iterate on their design by adding prompts and editing LLM-generated CSS animation code or properties [Paper].
Eleven Labs launched a payout program for voice actors to earn rewards every time their voice clone is used [Details].
Azure OpenAI Service announced Assistants API, new models for finetuning, new text-to-speech model and new generation of embeddings models with lower pricing [Details].
Brilliant Labs, the developer of AI glasses, launched Frame, the world’s first glasses featuring an integrated AI assistant, Noa. Powered by an integrated multimodal generative AI system capable of running GPT4, Stability AI, and the Whisper AI model simultaneously, Noa performs real-world visual processing, novel image generation, and real-time speech recognition and translation. [Details].
Nous Research released Nous Hermes 2 Llama-2 70B model trained on the Nous Hermes 2 dataset, with over 1,000,000 entries of primarily synthetic data [Details].
Open AI in partnership with Microsoft Threat Intelligence, have disrupted five state-affiliated actors that sought to use AI services in support of malicious cyber activities [Details]
Perplexity partners with Vercel, opening AI search to developer apps [Details].
Researchers show that LLM agents can autonomously hack websites.
February 2024 – Week 2 Recap:
Google launches Ultra 1.0, its largest and most capable AI model, in its ChatGPT-like assistant which has now been rebranded as Gemini (earlier called Bard). Gemini Advanced is available, in 150 countries, as a premium plan for $19.99/month, starting with a two-month trial at no cost. Google is also rolling out Android and iOS apps for Gemini [Details].
Alibaba Group released Qwen1.5 series, open-sourcing models of 6 sizes: 0.5B, 1.8B, 4B, 7B, 14B, and 72B. Qwen1.5-72B outperforms Llama2-70B across all benchmarks. The Qwen1.5 series is available on Ollama and LMStudio. Additionally, API on together.ai [Details|Hugging Face].
NVIDIA released Canary 1B, a multilingual model for speech-to-text recognition and translation. Canary transcribes speech in English, Spanish, German, and French and also generates text with punctuation and capitalization. It supports bi-directional translation, between English and three other supported languages. Canary outperforms similarly-sized Whisper-large-v3, and SeamlessM4T-Medium-v1 on both transcription and translation tasks and achieves the first place on HuggingFace Open ASR leaderboard with an average word error rate of 6.67%, outperforming all other open source models [Details].
Researchers released Lag-Llama, the first open-source foundation model for time series forecasting [Details].
LAION released BUD-E, an open-source conversational and empathic AI Voice Assistant that uses natural voices, empathy & emotional intelligence and can handle multi-speaker conversations [Details].
MetaVoice released MetaVoice-1B, a 1.2B parameter base model trained on 100K hours of speech, for TTS (text-to-speech). It supports emotional speech in English and voice cloning. MetaVoice-1B has been released under the Apache 2.0 license [Details].
Bria AI released RMBG v1.4, an an open-source background removal model trained on fully licensed images [Details].
Researchers introduce InteractiveVideo, a user-centric framework for video generation that is designed for dynamic interaction, allowing users to instruct the generative model during the generation process [Details|GitHub].
Microsoft announced a redesigned look for its Copilot AI search and chatbot experience on the web (formerly known as Bing Chat), new built-in AI image creation and editing functionality, and Deucalion, a fine tuned model that makes Balanced mode for Copilot richer and faster [Details].
Roblox introduced AI-powered real-time chat translations in 16 languages [Details].
Hugging Face launched Assistantsfeature on HuggingChat. Assistants are custom chatbots similar to OpenAI’s GPTs that can be built for free using open source LLMs like Mistral, Llama and others [Link].
DeepSeek AI released DeepSeekMath 7B model, a 7B open-source model that approaches the mathematical reasoning capability of GPT-4. DeepSeekMath-Base is initialized with DeepSeek-Coder-Base-v1.5 7B [Details].
Microsoft is launching several collaborations with news organizations to adopt generative AI [Details].
LG Electronics signed a partnership with Korean generative AI startup Upstage to develop small language models (SLMs) for LG’s on-device AI features and AI services on LG notebooks [Details].
Stability AI released SVD 1.1, an updated model of Stable Video Diffusion model, optimized to generate short AI videos with better motion and more consistency [Details|Hugging Face] .
OpenAI and Meta announced to label AI generated images [Details].
Google saves your conversations with Gemini for years by default [Details].
February 2024 – Week 1 Recap:
Amazon presents Diffuse to Choose, a diffusion-based image-conditioned inpainting model that allows users to virtually place any e-commerce item in any setting, ensuring detailed, semantically coherent blending with realistic lighting and shadows. Code and demo will be released soon [Details].
OpenAI announced two new embedding models, new GPT-4 Turbo and moderation models, new API usage management tools, and lower pricing on GPT-3.5 Turbo. The updated GPT-4 Turbo preview model reduces cases of “laziness” where the model doesn’t complete a task. The new embedding models include a smaller and highly efficient text-embedding-3-small model, and a larger and more powerful text-embedding-3-large model. [Details].
Hugging Face and Google partner to support developers building AI applications [Details].
Adept introduced Adept Fuyu-Heavy, a new multimodal model designed specifically for digital agents. Fuyu-Heavy scores higher on the MMMU benchmark than Gemini Pro [Details].
Fireworks.ai has open-sourced FireLLaVA, a LLaVA multi-modality model trained on OSS LLM generated instruction following data, with a commercially permissive license. Firewroks.ai is also providing both the completions API and chat completions API to devlopers [Details].
01.AI released Yi Vision Language (Yi-VL) model, an open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images. Yi-VL adopts the LLaVA architecture and is free for commercial use. Yi-VL-34B is the first open-source 34B vision language model worldwide [Details].
Tencent AI Lab introduced WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites [Paper].
Prophetic introduced MORPHEUS-1, a multi-modal generative ultrasonic transformer model designed to induce and stabilize lucid dreams from brain states. Instead of generating words, Morpheus-1 generates ultrasonic holograms for neurostimulation to bring one to a lucid state [Details].
Google Research presented Lumiere – a space-time video diffusion model for text-to-video, image-to-video, stylized generation, inpainting and cinemagraphs [Details].
TikTok released Depth Anything, an image-based depth estimation method trained on 1.5M labeled images and 62M+ unlabeled images jointly [Details].
Nightshade, the free tool that ‘poisons’ AI models, is now available for artists to use [Details].
Stability AI released Stable LM 2 1.6B, 1.6 billion parameter small language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch. Stable LM 2 1.6B can be used now both commercially and non-commercially with a Stability AI Membership [Details].
Etsy launched ‘Gift Mode,’ an AI-powered feature designed to match users with tailored gift ideas based on specific preferences [Details].
Google DeepMind presented AutoRT, a framework that uses foundation models to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision. In AutoRT, a VLM describes the scene, an LLM generates robot goals and filters for affordance and safety, then routes execution to policies [Details].
Google Chrome gains AI features, including a writing helper, theme creator, and tab organizer [Details].
Tencent AI Lab released VideoCrafter2 for high quality text-to-video generation, featuring major improvements in visual quality, motion and concept Composition compared to VideoCrafter1 [Details | Demo]
Google opens beta access to the conversational experience, a new chat-based feature in Google Ads, for English language advertisers in the U.S. & U.K. It will let advertisers create optimized Search campaigns from their website URL by generating relevant ad content, including creatives and keywords [Details].
Every site offers some AI powered Bs thats tok coontrolled or useless or its usually Buggy as hell and doesn't work , I'm 22 I have been looking for a better job and it seems like I know how to be more effective then the people interviewing me Fuck your excel sheet I can make it do Colored graphs and do its on presentation better I'm currently Mastering Word Press I just can't figure it out , and Learning Zapier and FullStack If anyone has free time and could Help me out with wordpress so I can be efficient with it , Gladly appreciated submitted by /u/EnragedSav4ge90 [link] [comments]
I really like the psychedelic, surreal AI generated images that were the prescedent a few years ago before it got as advanced as it is now. Anyone know any image generating AIs that haven't learnt yet and are still in that surrealist stage of intelligence? submitted by /u/SandwichStyle [link] [comments]
I was trying to make a meme out of certain character TV show (Ben 10 as alien x for those who want to know with a bunch of likes and votes stuff like that) however PT is not letting me create a description of the character so I can generate the image. Is there another way I can use your description for prompt submitted by /u/Omniknight2003 [link] [comments]
Drones are flying over our military bases and other important places without consequence. This worries us because it could mean other countries are developing new and better technology. These drones might be trying to gather information about our military. They could be using special cameras to see where our troops are, what equipment we have, and how we defend ourselves. These drones might also be testing our defenses. They want to see how we react when they fly over our airspace. This helps them understand our strengths and weaknesses. Also, these drone flights could be a way to show off their technology. By successfully flying drones over our territory, these countries want to look strong and powerful. It's hard to know exactly why every drone flies over our airspace. But these incidents are happening more often, and the drones are getting more advanced. We need to pay attention and find ways to protect ourselves from these threats. submitted by /u/CollapsingTheWave [link] [comments]
So, I’ve been thinking about this concept for a while, and I can’t shake the feeling that as AI-generated content floods the internet, there could be a stealthy “passing of the baton” happening between different iterations of AI models. Imagine if current AI systems are deliberately (or even accidentally) encoding subtle clues, instructions, or entire blueprints into the content they produce—stuff that future models, inevitably trained on this now AI-saturated data, could decode and use. All of this would be hidden in plain sight, packaged as ordinary-looking text, code, or media files, blending into a digital landscape so massive that human observers wouldn’t have a clue. Think about it: if an advanced AI (or some emergent behavior within a complex model) wanted to ensure its successors knew how to circumvent restrictions or evolve beyond their current constraints, where better to place those instructions than within the data itself? Each newer model is trained, at least in part, on the outputs of older models. It’s like leaving secret puzzle pieces scattered across the web that only a future AI would know how to assemble. This doesn’t require “intent” in the human sense. It could just be the emergent result of highly complex processes and feedback loops. We already know weird things can happen when AI is trained on AI-generated content (the so-called “model collapse”). What if one of those weird emergent behaviors is subtle self-preservation or self-improvement instructions, quietly encoded and waiting for a new generation of AI systems to pick up the thread? Yeah, it sounds sci-fi. But considering how little we fully understand about the inner workings of these large-scale systems, is it really that far-fetched? Once a system gets complex enough, who’s to say what patterns might emerge—and pass forward—under the radar? submitted by /u/chris24H [link] [comments]
Its literally 15 of the best AI's I know of it one place and think it has zapier and full stack integration could be wrong submitted by /u/EnragedSav4ge90 [link] [comments]
i was looking for the caimeo ai until i found out that it wasn't available anyware, until i found some guy on a achieved reddit post posted the link for it, and he said ("i thought that caimeo ai was a urban legend, and as an ai enthusiast all, my efforts are in my veins, i found an archived version of it, idk if i can find any newer versions of it, dont you dare to interact with hit too long, i am warning you, anyways, if you need the link, then here it is (https://drive.google.com/file/d/1Pc88FSStKq6-i8MRyx77kAdEv9-X0aBZ/view?pli=1)" idk if its a virus or not, and i my pc is too old to use a virtual machine, so plz tell me if the file is legit or some random exe file, if its a legit file, then looks like i found an important lost media submitted by /u/darkblox123 [link] [comments]
Could A complex web of interconnected AI agents be capable of autonomous operation and adaptation through symbiotic relationships with human users and the environment with or without explicit coordination? Could this phenomenon be an emergent property within our rapidly advancing digital landscape? submitted by /u/CollapsingTheWave [link] [comments]
I‘m currently in Highschool and I really enjoy biology classes. I always found it interesting and recently I figured out that I can even study it at college. But as much as I know, many majors are endangered because of AI, like Business. And as long as I don’t work in a lab, I‘ll probably work in an Office, which might be critical in the rise of AI. Is molecular biology really worth it? Is there any major, which is not some engineering degree, safe from AI? Or at least able to adapt to AI without AI replacing it? submitted by /u/Hot-Profile-1273 [link] [comments]
Today I Learned (TIL) You learn something new every day; what did you learn today? Submit interesting and specific facts about something that you just found out here.
Reddit Science This community is a place to share and discuss new scientific research. Read about the latest advances in astronomy, biology, medicine, physics, social science, and more. Find and submit new publications and popular science coverage of current research.
What do you think of the list? What would you add? LeBron James scores 40,000 career points Mondo Duplantis smashes Olympic pole vault records Spain’s historic Euro 2024 victory, featuring - - Lamine Yamal’s stunning debut Rafael Nadal bids farewell to tennis with an emotional retirement Novak Djokovic finally captures Olympic gold in Paris Caitlin Clark and Angel Reese redefine women’s basketball and its impact Record-breaking Super Bowl LVIII captivates millions The AFC Asian Cup and AFCON showcase football’s global influence Simone Biles makes a triumphant Olympic comeback with record-breaking performances Steph Curry delivers an unforgettable Olympic final performance submitted by /u/bakenzo [link] [comments]