A Daily Chronicle of AI Innovations in April 2024

A daily chronicle of AI Innovations April 01st 2024

AI Dashboard is available on the Web, Apple, Google, and Microsoft, PRO version

AI Innovations in April 2024.

Welcome to the April 2024 edition of the Daily Chronicle, your gateway to the latest Artificial Intelligence innovations! Join us as we uncover the most recent advancements, trends, and groundbreaking discoveries in the world of AI. Explore a realm where industry leaders gather at events like ‘AI Innovations at Work’ and where visionary forecasts shape the future of AI. Stay informed with daily updates as we navigate through the dynamic world of AI, uncovering its potential impact and exploring cutting-edge developments throughout this exciting month. Join us on this thrilling journey into the limitless possibilities of AI in April 2024.

Experience the transformative capabilities of AI with “Read Aloud For Me – AI Dashboard – AI Tools Catalog” – your ultimate AI Dashboard and Hub. Seamlessly access a comprehensive suite of top-tier AI tools within a single app, meticulously crafted to enhance your efficiency and streamline your digital interactions. Now available on the web at readaloudforme.com and across popular app platforms including Apple, Google, and Microsoft, “Read Aloud For Me – AI Dashboard” places the future of AI at your fingertips, blending convenience with cutting-edge innovation. Whether for professional endeavors, educational pursuits, or personal enrichment, our app serves as your portal to the forefront of AI technologies. Embrace the future today by downloading our app and revolutionize your engagement with AI tools.

AI Tools Catalog - AI Dashboard - Read Aloud For Me
AI Tools Catalog – AI Dashboard – Read Aloud For Me

A Daily chronicle of AI Innovations April 12th 2024: 💥 OpenAI fires two researchers for alleged leaking; 🍎 Apple is planning to bring new AI-focused M4 chips to entire line of Macs; 🤷‍♀️ Amazon CEO: don’t wait for us to launch a ChatGPT competitor; 💬 ChatGPT GPT-4 just got a huge upgrade; 🧠 Gabe Newell, the man behind Steam, is working on a brain-computer interface; 🔍 Cohere’s Rerank 3 powers smarter enterprise search; 💻 Apple M4 Macs: Coming soon with AI power!; 📝 Meta’s OpenEQA puts AI’s real-world comprehension to test

Cohere’s Rerank 3 powers smarter enterprise search

Cohere has released a new model, Rerank 3, designed to improve enterprise search and Retrieval Augmented Generation (RAG) systems. It can be integrated with any database or search index and works with existing legacy applications.

Cohere’s Rerank 3 powers smarter enterprise search
Cohere’s Rerank 3 powers smarter enterprise search

Rerank 3 offers several improvements over previous models:

Get 20% off Google Google Workspace (Google Meet) Standard Plan with  the following codes: 96DRHDRA9J7GTN6
Get 20% off Google Workspace (Google Meet)  Business Plan (AMERICAS) with  the following codes:  C37HCAQRVR7JTFK Get 20% off Google Workspace (Google Meet) Business Plan (AMERICAS): M9HNXHX3WC9H7YE (Email us for more codes)

Active Anti-Aging Eye Gel, Reduces Dark Circles, Puffy Eyes, Crow's Feet and Fine Lines & Wrinkles, Packed with Hyaluronic Acid & Age Defying Botanicals

  • It handles a longer context of documents (up to 4x longer) to improve search accuracy, especially for complex documents.
  • Rerank 3 supports over 100 languages, addressing the challenge of multilingual data retrieval.
  • The model can search various data formats like emails, invoices, JSON documents, codes, and tables.
  • Rerank 3 works even faster than previous models, especially with longer documents.
  • When used with Cohere’s RAG systems, Rerank 3 reduces the cost by requiring fewer documents to be processed by the expensive LLMs.

Plus, enterprises can access it through Cohere’s hosted API, AWS Sagemaker, and Elasticsearch’s inference API.

Why does this matter?

Rerank 3 represents a step towards a future where data is not just stored but actively used by businesses to make smarter choices and automate tasks. Imagine instantly finding a specific line of code from an email or uncovering pricing details buried in years of correspondence.


AI Unraveled: Demystifying Frequently Asked Questions on Artificial Intelligence (OpenAI, ChatGPT, Google Bard, Generative AI, Discriminative AI, xAI, LLMs, GPUs, Machine Learning, NLP, Promp Engineering)

Source

Apple M4 Macs: Coming soon with AI power!

Apple is overhauling its Mac lineup with a new M4 chip focused on AI processing. This comes after the recent launch of M3 Macs, possibly due to slowing Mac sales and similar features in competitor PCs.

The M4 chip will come in three tiers (Donan, Brava, Hidra) and will be rolled out across various Mac models throughout 2024 and early 2025. Lower-tier models like MacBook Air and Mac Mini will get the base Donan chip, while high-performance Mac Pro will be equipped with the top-tier Hidra. We can expect to learn more about the specific AI features of the M4 chip at Apple’s WWDC on June 10th.

If you are looking for an all-in-one solution to help you prepare for the AWS Cloud Practitioner Certification Exam, look no further than this AWS Cloud Practitioner CCP CLF-C02 book

Why does this matter?

Apple’s new AI-powered M4 Mac chip could make Macs much faster for things like video editing and scientific work, competing better with computers with similar AI features.

By controlling hardware and software, Apple can fine-tune everything to ensure a smooth user experience and future improvements.

Source

Meta’s OpenEQA puts AI’s real-world comprehension to test

Meta AI has released a new dataset called OpenEQA to measure how well AI understands the real world. This “embodied question answering” (EQA) involves an AI system being able to answer questions about its environment in natural language.

The dataset includes over 1,600 questions about various real-world places and tests an AI’s ability to recognize objects, reason about space and function, and use common sense knowledge.

Why does this matter?

While OpenEQA challenges AI with questions demanding visual and spatial reasoning, it also exposes limitations in current AI models that often rely solely on text knowledge. Its role could push researchers to develop AI with a stronger grasp of the physical world.

Source

💥 OpenAI fires two researchers for alleged leaking

  • OpenAI has dismissed two researchers, Leopold Aschenbrenner and Pavel Izmailov, for allegedly leaking information following an undisclosed internal investigation.
  • The leaked information may be related to a research project called Q*, which involved a breakthrough in AI models solving unseen math problems, raising concerns about the lack of safeguards for commercializing such advanced technology.
  • The firings highlight a potential contradiction in OpenAI’s mission, as the company faces criticism for moving away from its original ethos of openness and transparency.
  • Source

🍎 Apple is planning to bring new AI-focused M4 chips to entire line of Macs

  • Apple is poised to launch its next-generation M4 chips as early as this year, aimed at enhancing AI capabilities and rejuvenating Mac sales following a 27% drop last fiscal year.
  • The M4 chips, reported to be nearing production, are expected to come in three variants named Donan, Brava, and Hidra, supporting a range of Mac products, including updates to the iMac, MacBook Pros, and Mac Mini initially, with the MacBook Air and Mac Studio to follow.
  • This accelerated update cycle to introduce M4 chips may lead to a short lifespan for the recently launched M3 chips, indicating Apple’s urgency to compete in the AI technology space against rivals with similar AI-focused hardware advancements.
  • Source

🤷‍♀️ Amazon CEO: don’t wait for us to launch a ChatGPT competitor

  • Amazon CEO Andy Jassy emphasizes the company’s focus on building foundational “primitives” for generative AI rather than quickly launching public-facing products like a ChatGPT competitor.
  • Amazon has launched AI products such as Amazon Bedrock and Amazon Q aimed at software engineers and business customers, aligning with its strategy to empower third-party developers to create GenAI applications.
  • Despite not directly competing with ChatGPT, Amazon is investing in the AI domain, including a $4 billion investment in AI company Anthropic, while also enhancing its existing products like Alexa with AI capabilities.
  • Source

💬 ChatGPT GPT-4 just got a huge upgrade 

  • ChatGPT’s GPT-4 Turbo model has received an upgrade, enhancing its abilities in writing, math, logical reasoning, and coding, as announced by OpenAI for its premium users.
  • The upgrade, distinguished by significant performance improvements in mathematics and GPQA, also aims for more succinct, direct, and conversational responses.
  • This new version of ChatGPT, which includes data up until December 2023, shows improved performance on recent topics, such as acknowledging the launch of the iPhone 15.
  • Source

🧠 Gabe Newell, the man behind Steam, is working on a brain-computer interface

  • Gabe Newell, co-founder of Valve and the force behind Steam, has been developing a brain-computer interface (BCI) technology through a venture named Starfish Neuroscience, rivaling Elon Musk’s Neuralink.
  • Since 2019, Newell has explored gaming applications for BCIs and discussed potential future capabilities like editing feelings, highlighting the technology’s potential beyond traditional interfaces.
  • Aside from his BCI pursuits, Newell has faced recent challenges including an antitrust lawsuit against Steam and the sale of his megayacht, amidst managing COVID-19 precautions and legal appearances.
  • Source

What Else Is Happening in AI on April 12th 2024❗

🔄 ChatGPT gets an upgrade for premium users

Djamgatech: Build the skills that’ll drive your career into six figures: Get Djamgatech.

OpenAI has released an enhanced version of GPT-4 Turbo for ChatGPT Plus, Team, and Enterprise customers. The new model, trained on data until December 2023, promises more direct responses, less verbosity, and improved conversational language, along with advancements in writing, math, reasoning, and coding. (Link)

🤝 Dr. Andrew Ng joins Amazon’s Board of Directors

Amazon has appointed Dr. Andrew Ng, a renowned AI expert and founder of several influential AI companies, to its Board of Directors. With his deep expertise in machine learning and AI education, Ng is expected to provide valuable insights as Amazon navigates the transformative potential of generative AI. (Link)

⌚️ Humane’s $699 Ai Pin hits the US market

Humane’s Ai Pin is now available across the US, with global expansion on the horizon through SKT and SoftBank partnerships. The wearable AI device is powered by a $24/month plan, including unlimited AI queries, data, and storage. The international availability is to be announced soon. (Link)

📱 TikTok might use AI influencers for ads

TikTok is developing a new feature that lets companies use AI characters to advertise products. These AI influencers can read scripts made by advertisers or sellers. TikTok has been testing this feature but isn’t sure when it will be available for everyone to use. (Link)

🤖 Sanctuary AI’s humanoid robot to be tested at Magna

Magna, a major European car manufacturer, will pilot Sanctuary AI’s humanoid robot, Phoenix, at one of its facilities. This follows similar moves by other automakers exploring the use of humanoid robots in manufacturing, as companies seek to determine the potential return on investment. (Link)

A Daily chronicle of AI Innovations April 11th 2024: 🚀 Meta unveils next-generation AI chip for enhanced workloads 🎶 New AI tool lets you generate 1200 songs per month for free 💰 Adobe is buying videos for $3 per minute to build an AI model 🤖 Google expands Gemma family with new models 🌐 Mistral unveils Mixtral-8x22B open language model 📷 Google Photos introduces free AI-powered editing tools 🖼️ Microsoft enhances Bing visual search with personalization 🛡️ Sama red team: Safety-centered solution for Generative AI 💥 Apple hit with ‘mercenary spyware attacks’  🧠 Humane AI has only one problem: it just doesn’t work 🔍 MistralAI unveils groundbreaking open model Mixtral 8x22B 🙃 Microsoft proposed using DALL-E to US military last year 🎵 New AI music generator Udio synthesizes realistic music on demand 🎬 Adobe is purchasing video content to train its AI model

🚀 Meta unveils next-generation AI chip for enhanced workloads

Meta has introduced the next generation of its Meta Training and Inference Accelerator (MTIA), significantly improving on MTIAv1 (its first-gen AI inference accelerator). This version more than doubles the memory and compute bandwidth, designed to effectively serve Meta’s crucial AI workloads, such as its ranking and recommendation models and Gen AI workloads.

Ace the Microsoft Azure Fundamentals AZ-900 Certification Exam: Pass the Azure Fundamentals Exam with Ease

Meta has also co-designed the hardware system, the software stack, and the silicon, which is essential for the success of the overall inference solution.

Meta unveils next-generation AI chip for enhanced workloads
Meta unveils next-generation AI chip for enhanced workloads

Early results show that this next-generation silicon has improved performance by 3x over the first-generation chip across four key models evaluated. MTIA has been deployed in the data center and is now serving models in production.

Why does this matter?

This is a bold step towards self-reliance in AI! Because Meta controls the whole stack, it can achieve an optimal mix of performance and efficiency on its workloads compared to commercially available GPUs. This eases NVIDIA’s grip on it, which might be having a tough week with other releases, including Intel’s Gaudi 3 and Google Axion Processors.

Source

New AI tool lets you generate 1200 songs per month for free

Udio, a new AI music generator created by former Google DeepMind researchers, is now available in beta. It allows users to generate up to 1200 songs per month for free, with the ability to specify genres and styles through text prompts.

The startup claims its AI can produce everything from pop and rap to gospel and blues, including vocals. While the free beta offers limited features, Udio promises improvements like longer samples, more languages, and greater control options in the future. The company is backed by celebrities like Will.i.am and investors like Andreessen Horowitz.

Why does this matter?

AI-generated music platforms like Udio democratize music creation by making it accessible to everyone, fostering new artists and diverse creative expression. This innovation could disrupt traditional methods, empowering independent creators lacking access to expensive studios or musicians.

Source

💰 Adobe is buying videos for $3 per minute to build an AI model

Adobe is buying videos at $3 per minute from its network of photographers and artists to build a text-to-video AI model. It has requested short clips of people engaged in everyday actions such as walking or expressing emotions including joy and anger, interacting with objects such as smartphones or fitness equipment, etc.

The move shows Adobe trying to catch up to competitors like OpenAI (Sora). Over the past year, Adobe has added generative AI features to its portfolio, including Photoshop and Illustrator, that have garnered billions of uses. However, Adobe may be lagging behind the AI race and is trying to catch up.

Why does this matter?

Adobe’s targeted video buying for AI training exposes the hefty price tag of building competitive AI. Smaller companies face an uphill battle—they might need to get scrappier, focus on specific niches, team up, or use free, open-source AI resources.

Source

💥 Apple hit with ‘mercenary spyware attacks

  • Apple has issued a warning to iPhone users in 92 countries about a potential “mercenary spyware attack” aimed at compromising their devices, without identifying the attackers or the consequences.
  • The company suggests that the attack is highly targeted, advising recipients to take the warning seriously and to update their devices with the latest security patches and practice strong cyber hygiene.
  • This type of attack is often linked to state actors employing malware from private companies, with the infamous ‘Pegasus’ spyware mentioned as an example, capable of extensive surveillance on infected phones.
  • Source

🧠 Humane AI has only one problem: it just doesn’t work

  • The Humane AI Pin, retailing for $699 plus a $24 monthly fee, is designed as a wearable alternative to smartphones, promising users freedom from their screens through AI-assisted tasks. However, its functionality falls significantly short of expectations.
  • Throughout testing, the AI Pin struggled with basic requests and operations, demonstrating unreliability and slow processing times, leading to the conclusion that it fails to deliver on its core promise of a seamless, smartphone-free experience.
  • Despite its well-intentioned vision for a post-smartphone future and the integration of innovative features like a screenless interface and ambient computing, the device’s current state of performance and high cost make it a poor investment for consumers.
  • Source

🔍 MistralAI unveils groundbreaking open model Mixtral 8x22B

  • Mistral AI has released Mixtral 8x22B, an open-source AI model boasting 176 billion parameters and a 65,000-token context window, expected to surpass its predecessor and compete with major models like GPT-3.5 and Llama 2.
  • The Paris-based startup, valued at over $2 billion, aims to democratize access to cutting-edge AI by making Mixtral 8x22B available on platforms like Hugging Face and Together AI, allowing for widespread use and customization.
  • Despite its potential for innovation in fields like customer service and drug discovery, Mixtral 8x22B faces challenges related to its “frontier model” status, including the risk of misuse due to its open-source nature and lack of control over harmful applications.
  • Source

🙃 Microsoft proposed using DALL-E to US military last year

  • Microsoft proposed to the U.S. Department of Defense in 2023 to use OpenAI’s DALL-E AI for software development in military operations.
  • The proposal included using OpenAI tools like ChatGPT and DALL-E for document analysis, machine maintenance, and potentially training battlefield management systems with synthetic data.
  • Microsoft had not implemented the use of DALL-E in military projects, and OpenAI, which did not participate in Microsoft’s presentation, restricts its technology from being used to develop weapons or harm humans.
  • Source

🎵 New AI music generator Udio synthesizes realistic music on demand

  • Uncharted Labs has officially launched its music generator, Udio, which can transform text prompts into professional-quality music tracks, challenging the leading AI music generator, Suno V3.
  • Udio has impressed users and reviewers alike with its ability to generate songs that feature coherent lyrics, well-structured compositions, and competitive rhythms, some even considering it superior to Suno V3.
  • Despite facing initial server overload due to high user demand, Udio’s user-friendly interface and strong backing from notable investors suggest a promising future for AI-assisted music creation, though it remains free during its beta testing phase.
  • Source

🎬 Adobe is purchasing video content to train its AI model

  • Adobe is developing a text-to-video AI model, offering artists around $3 per minute for video footage to train the new tool, as reported by Bloomberg.
  • The software company has requested over 100 video clips from artists, aiming for content that showcases various emotions and activities, but has set a low budget for acquisitions.
  • Despite the potential for AI to impact artists’ future job opportunities and the lack of credit or royalties for the contributed footage, Adobe is pushing forward with the AI model development.
  • Source

What Else Is Happening in AI on April 11th 2024❗

🤖 Google expands Gemma family with new models

Google has expanded its Gemma family with two new models: CodeGemma and RecurrentGemma. CodeGemma is tailored for developers, offering intelligent code completion and chat capabilities for languages like Python and JavaScript. RecurrentGemma is optimized for efficiency in research, utilizing recurrent neural networks and local attention. (Link)

🌐 Mistral unveils Mixtral-8x22B open language model

Mistral AI has unveiled Mixtral-8x22B, a new open language model with extensive capabilities. This model, featuring 64,000 token context windows and requiring 258GB of VRAM, is a mixture-of-experts model. Early users are exploring its potential, with more details expected soon. (Link)

📷 Google Photos introduces free AI-powered editing tools

Google Photos is rolling out free AI-powered editing tools for all users starting May 15. Features like Magic Eraser, Photo Unblur, and Portrait Light will be accessible without a subscription. Pixel users will also benefit from the Magic Editor, which simplifies complex edits using generative AI. (Link)

🖼️ Microsoft enhances Bing visual search with personalization

Microsoft enhances Bing Visual Search with personalized visual systems based on user preferences. A patent application reveals that search results will be tailored to individual interests, such as showing gardening-related images to gardening enthusiasts and food-related visuals to chefs. (Link)

🛡️ Sama red team: Safety-centered solution for Generative AI

Sama has introduced Sama Red Team, a safety-centered solution for evaluating risks associated with generative AI and LLMs. This system simulates adversarial attacks to identify vulnerabilities related to bias, personal information, and offensive content, contributing to a more ethical AI landscape. (Link)

A Daily chronicle of AI Innovations April 10th 2024: 👀 OpenAI gives GPT-4 a major upgrade; 💬 Quora’s Poe now lets AI chatbot developers charge per message; 🌐 Google updates and expands its open source Gemma AI model family; 🔥 Intel unveils latest AI chip as Nvidia competition heats up; 📱 WordPress parent acquires Beeper app which brought iMessage to Android; 🤔 New bill would force AI companies to reveal use of copyrighted art; 🧠 Intel’s new AI chip: 50% faster, cheaper than NVIDIA’s; 🤖 Meta to Release Llama 3 Open-source LLM next week; ☁️ Google Cloud announces major updates to enhance Vertex AI

Intel’s new AI chip: 50% faster, cheaper than NVIDIA’s

Intel has unveiled its new Gaudi 3 AI accelerator, which aims to compete with NVIDIA’s GPUs. According to Intel, the Gaudi 3 is expected to reduce training time for large language models like Llama2 and GPT-3 by around 50% compared to NVIDIA’s H100 GPU. The Gaudi 3 is also projected to outperform the H100 and H200 GPUs in terms of inference throughput, with around 50% and 30% faster performance, respectively.

Intel's new AI chip: 50% faster, cheaper than NVIDIA's
Intel’s new AI chip: 50% faster, cheaper than NVIDIA’s

The Gaudi 3 is built on a 5nm process and offers several improvements over its predecessor, including doubling the FP8, quadrupling the BF16 processing power, and increasing network and memory bandwidth. Intel is positioning the Gaudi 3 as an open, cost-effective alternative to NVIDIA’s GPUs, with plans to make it available to major OEMs starting in the second quarter of 2024. The company is also working to create an open platform for enterprise AI with partners like SAP, Red Hat, and VMware.

Why does it matter?

Intel is challenging NVIDIA’s dominance in the AI accelerator market. It will introduce more choice and competition in the market for high-performance AI hardware. It could drive down prices, spur innovation, and give customers more flexibility in building AI systems. The open approach with community-based software and standard networking aligns with broader trends toward open and interoperable AI infrastructure.

Source

Meta to release Llama 3 open-source LLM next week

Meta plans to release two smaller versions of its upcoming Llama 3 open-source language model next week. These smaller models will build anticipation for the larger version, which will be released this summer. Llama 3 will significantly upgrade over previous versions, with about 140 billion parameters compared to 70 billion for the biggest Llama 2 model. It will also be a more capable, multimodal model that can generate text and images and answer questions about images.

The two smaller versions of Llama 3 will focus on text generation. They’re intended to resolve safety issues before the full multimodal release. Previous Llama models were criticized as too limited, so Meta has been working to make Llama 3 more open to controversial topics while maintaining safeguards.

Why does it matter?

The open-source AI model landscape has become much more competitive in recent months, with other companies like Mistral and Google DeepMind also releasing their own open-source models. Meta hopes that by making Llama 3 more open and responsive to controversial topics, it can catch up to models like OpenAI’s GPT-4 and become a standard for many AI applications.

Source

Google Cloud announces major updates to enhance Vertex AI

Google Cloud has announced exciting model updates and platform capabilities that continue to enhance Vertex AI:

  • Gemini 1.5 Pro: Gemini 1.5 Pro is now available in public preview in Vertex AI, the world’s first one million-token context window to customers. It also supports the ability to process audio streams, including speech and even the audio portion of videos.
  • Imagen 2.0: Imagen 2.0 can now create short, 4-second live images from text prompts, enabling marketing and creative teams to generate animated content. It also has new image editing features like inpainting, outpainting, and digital watermarking.
  • Gemma: Google Cloud is adding CodeGemma to Vertex AI. CodeGemma is a new lightweight model from Google’s Gemma family based on the same research and technology used to create Gemini.
  • MLOps: To help customers manage and deploy these large language models at scale, Google has expanded the MLOps capabilities for Gen AI in Vertex AI. This includes new prompt management tools for experimenting, versioning, optimizing prompts, and enhancing evaluation services to compare model performance.

Why does it matter?

These updates significantly enhance Google Cloud’s generative AI offerings. It also strengthens Google’s position in the generative AI space and its ability to support enterprise adoption of these technologies.

Source

👀 OpenAI gives GPT-4 a major upgrade

  • OpenAI has introduced GPT-4 Turbo with Vision, a new model available to developers that combines text and image processing capabilities, enhancing AI chatbots and other applications.
  • This multimodal model, which maintains a 128,000-token window and knowledge from December 2023, simplifies development by allowing a single model to understand both text and images.
  • GPT-4 Turbo with Vision simplifies development processes for apps requiring multimodal inputs like coding assistance, nutritional insights, and website creation from drawings.
  • Source

💬 Quora’s Poe now lets AI chatbot developers charge per message

  • Poe, a Quora-owned AI chatbot platform, introduced a new revenue model allowing creators to earn money by setting a price-per-message for their bots.
  • The revenue model aims to compensate creators for operational costs, fostering a diverse ecosystem of bots ranging from tutoring to storytelling.
  • This monetization strategy is initially available to U.S. creators, complemented by an analytics dashboard to track earnings and bot usage.
  • Source

🌐 Google updates and expands its open source Gemma AI model family

  • Google has enhanced the Gemma AI model family with new code completion models and improvements for more efficient inference, along with more flexible terms of use.
  • Three new versions of CodeGemma have been introduced, including a 7 billion parameter model for code generation and discussion, and a 2 billion parameter model optimized for fast code completion on local devices.
  • Google also unveiled RecurrentGemma, a model leveraging recurrent neural networks for better memory efficiency and speed in text generation, indicating a shift towards optimizing AI performance on devices with limited resources.
  • Source

🔥 Intel unveils latest AI chip as Nvidia competition heats up

  • Intel introduced its latest artificial intelligence chip, Gaudi 3, highlighting its efficiency and speed advantages over Nvidia’s H100 GPU and offering configurations that enhance AI model training and deployment.
  • The Gaudi 3 chip, which outperforms Nvidia in power efficiency and AI model processing speed, will be available in the third quarter, with Dell, Hewlett Packard Enterprise, and Supermicro among the companies integrating it into their systems.
  • Despite Nvidia’s dominant position in the AI chip market, Intel is seeking to compete by emphasizing Gaudi 3’s competitive pricing, open network architecture, and partnerships for open software development with companies like Google, Qualcomm, and Arm.
  • Source

📱 WordPress parent acquires Beeper app which brought iMessage to Android

  • Automattic, the owner of WordPress and Tumblr, has acquired Beeper, a startup known for its Beeper Mini app that attempted to challenge Apple’s iMessage, for $125 million despite the app’s quick defeat.
  • Beeper CEO Eric Migicovsky will oversee the merging of Beeper with Automattic’s similar app Texts, aiming to create the best chat app, with the combined service expected to launch later this year.
  • The acquisition raises questions due to Beeper Mini’s brief success and upcoming changes like Apple introducing RCS support to iPhones, but Automattic sees potential in Beeper’s stance on open messaging standards and its established brand.
  • Source

🤔 New bill would force AI companies to reveal use of copyrighted art

  • A new bill introduced in the US Congress by Congressman Adam Schiff aims to make artificial intelligence companies disclose the copyrighted material used in their generative AI models.
  • The proposed Generative AI Copyright Disclosure Act would require AI companies to register copyrighted works in their training datasets with the Register of Copyrights before launching new AI systems.
  • The bill responds to concerns about AI firms potentially using copyrighted content without permission, amidst growing litigation and calls for more regulation from the entertainment industry and artists.
  • Source

What Else Is Happening in AI on April 10th 2024❗

🚀 OpenAI launches GPT-4 Turbo with Vision model through API

OpenAI has unveiled the latest addition to its AI arsenal, the GPT -4 Turbo with Vision model, which is now “generally available” through its API. This new version has enhanced capabilities, including support for JSON mode and function calling for Vision requests. The upgraded GPT-4 Turbo model promises improved performance and is set to roll out in ChatGPT.  (Link)

👂 Google’s Gemini 1.5 Pro can now listen to audio

Google’s update to Gemini 1.5 Pro gives the model ears. It can process text, code, video, and uploaded audio streams, including audio from video, which it can listen to, analyze, and extract information from without a corresponding written transcript.(Link)

💰 Microsoft to invest $2.9 billion in Japan’s AI and cloud infrastructure

Microsoft announced it would invest $$2.9 billion over the next two years to increase its hyperscale cloud computing and AI infrastructure in Japan. It will also expand its digital skilling programs with the goal of providing AI skills to more than 3 million people over the next three years. (Link)

👩‍💻 Google launches Gemini Code Assist, the latest challenger to GitHub’s Copilot

At its Cloud Next conference, Google unveiled Gemini Code Assist, its enterprise-focused AI code completion and assistance tool. It provides various functions such as enhanced code completion, customization, support for various repositories, and integration with Stack Overflow and Datadog. (Link)

🛍️ eBay launches AI-driven ‘Shop the Look’ feature on its iOS app

eBay launched an AI-powered feature to appeal to fashion enthusiasts – “Shop the Look” on its iOS mobile application. It will suggest a carousel of images and ideas based on the customer’s shopping history. The recommendations will be personalized to the end user. The idea is to introduce how other fashion items may complement their current wardrobe. (Link)

A Daily chronicle of AI Innovations April 09th 2024: 🤖 Stability AI launches multilingual Stable LM 2 12B 📱 Ferret-UI beats GPT-4V in mobile UI tasks ⏰ Musk says AI will outsmart humans within a year 🍁 Canada bets big on AI with $2.4B investment 🎥 OpenAI is using YouTube for GPT-4 training 🤖 Meta to launch new Llama 3 models 👂 Google’s Gemini 1.5 Pro can now hear 💥 Google’s first Arm-based CPU will challenge Microsoft and Amazon in the AI race 📈 Boosted by AI, global PC market bounces back

🤖 Meta to launch new Llama 3 models

  • According to an insider, Meta will release two smaller versions of its planned major language model, Llama 3, next week to build anticipation for the major release scheduled for this summer.
  • The upcoming Llama 3 model, which will include both text generation and multimodal capabilities, aims to compete with OpenAI’s GPT-4 and is reported to potentially have up to 140 billion parameters.
  • Meta’s investment in the Llama 3 model and open-source AI reflects a broader trend of tech companies leveraging these technologies to set industry standards, similar to Google’s strategy with Android.
  • Source

👂 Google’s Gemini 1.5 Pro can now hear

  • Google has enhanced Gemini 1.5 Pro to interpret audio inputs, allowing it to process information from sources like earnings calls or video audio directly without needing a transcript.
  • Gemini 1.5 Pro, positioned as a mid-tier option within the Gemini series, now outperforms even the more advanced Gemini Ultra by offering faster and more intuitive responses without requiring model fine-tuning.
  • Alongside Gemini 1.5 Pro updates, Google introduced enhancements to its Imagen 2 model, including inpainting and outpainting features, and debuted a digital watermarking technology, SynthID, for tracking the origin of generated images.
  • Source

💥 Google’s first Arm-based CPU will challenge Microsoft and Amazon in the AI race

  • Google is developing its own Arm-based CPU named Axion to enhance AI operations in data centers and will launch it for Google Cloud business customers later this year.
  • The Axion CPU will improve performance by 30% over general-purpose Arm chips and by 50% over Intel’s processors, and it will support services like Google Compute Engine and Google Kubernetes Engine.
  • Google’s move to create its own Arm-based CPU and update its TPU AI chips aims to compete with Microsoft and Amazon in the AI space and reduce reliance on external suppliers like Intel and Nvidia.
  • Source

📈 Boosted by AI, global PC market bounces back

  • The global PC market has seen growth for the first time in over two years, with a 1.5% increase in shipments to 59.8 million units in the first quarter, reaching pre-pandemic levels.
  • The resurgence is partly attributed to the emergence of “AI PCs,” which feature onboard AI processing capabilities, with projections suggesting these will represent almost 60% of all PC sales by 2027.
  • Major PC manufacturers like Lenovo, HP, Dell, and Apple are heavily investing in the AI PC segment, with Lenovo leading the market and Apple experiencing the fastest growth in shipments.
  • Source

🤖Stability AI launches multilingual Stable LM 2 12B

Stability AI has released a 12-billion-parameter version of its Stable LM 2 language model, offering both a base and an instruction-tuned variant. These models are trained on a massive 2 trillion token dataset spanning seven languages: English, Spanish, German, and more. Stability AI has also improved its 1.6 billion-parameter Stable LM 2 model with better conversational abilities and tool integration.

The new 12B model is designed to balance high performance with relatively lower hardware requirements than other large language models. Stability AI claims it can handle complex tasks requiring substantially more computational resources. The company also plans to release a long-context variant of these models on the Hugging Face platform soon.

Why does this matter?

Stable LM 2 uses powerful 12B models without the most advanced hardware, making it a great choice for enterprises and developers. Stability AI’s multi-pronged approach to language solutions may give it an edge in the competitive generative AI market.

Source

📱 Ferret-UI beats GPT-4V in mobile UI tasks

Researchers have launched Ferret-UI, a multimodal language model designed to excel at understanding and interacting with mobile user interfaces (UIs). Unlike general-purpose models, Ferret-UI is trained explicitly for various UI-centric tasks, from identifying interface elements to reasoning about an app’s overall functionality.

Ferret-UI beats GPT-4V in mobile UI tasks
Ferret-UI beats GPT-4V in mobile UI tasks

By using “any resolution” technology and a meticulously curated dataset, Ferret-UI digs deep into the intricacies of mobile UI screens, outperforming its competitors in elementary and advanced tasks. Its ability to execute open-ended instructions may make it the go-to solution for developers looking to create more intuitive mobile experiences.

Why does this matter?

Ferret-UI’s advanced capabilities in understanding and navigating mobile UI screens will increase accessibility, productivity, and user satisfaction. By setting a new standard for mobile UI interaction, this innovative MLLM paves the way for more intuitive and responsive mobile experiences for users to achieve more with less effort.

Source

⏰ Musk says AI will outsmart humans within a year

Tesla CEO Elon Musk has boldly predicted that AI will surpass human intelligence as early as next year or by 2026. In a wide-ranging interview, Musk discussed AI development’s challenges, including chip shortages and electricity supply constraints, while sharing updates on his xAI startup’s AI chatbot, Grok. Despite the hurdles, Musk remains optimistic about the future of AI and its potential impact on society.

Why does this matter?

Musk’s prediction highlights the rapid pace of AI development and its potential to reshape our world in the near future. As AI becomes increasingly sophisticated, it could transform the job market and raise important ethical questions about the role of technology in society.

Source

What Else Is Happening in April 09th 2024❗

🇬🇧 Microsoft is opening a new AI research hub in London

Microsoft is tapping into the UK’s exceptional talent pool to drive language models and AI infrastructure breakthroughs. The move highlights Microsoft’s commitment to invest £2.5 billion in upskilling the British workforce and building the AI-driven future. (Link)

🎥 OpenAI is using YouTube for GPT-4 training

OpenAI reportedly transcribed over a million hours of YouTube videos to train its advanced GPT-4 language model. Despite legal concerns, OpenAI believes this is fair use. Google and Meta have also explored various solutions to obtain more training data, including using copyrighted material and consumer data. (Link)

🧠 Arm’s new chips bring AI to the IoT edge

Arm has introduced the Ethos-U85 NPU and Corstone-320 IoT platform, designed to enhance edge AI applications with improved performance and efficiency. These technologies aim to accelerate the development and deployment of intelligent IoT devices by providing an integrated hardware and software solution for Arm’s partners. (Link)

🍁 Canada bets big on AI with $2.4B investment

Prime Minister Justin Trudeau has announced a $2.4 billion investment in Canada’s AI sector, with the majority aimed at providing researchers access to computing capabilities and infrastructure. The government also plans to establish an AI Safety Institute and an Office of the AI and Data Commissioner to ensure responsible development and regulation of the technology. (Link)

A Daily chronicle of AI Innovations April 08th 2024: 🇬🇧 Microsoft opens AI Hub in London to ‘advance state-of-the-art language models’ 💡 JPMorgan CEO compares AI’s potential impact to electricity and the steam engine 🎵 Spotify moves into AI with new feature ⚖️ Build resource-efficient LLMs with Google’s MoD 📡 Newton brings sensor-driven intelligence to AI models 💰 Internet archives become AI training goldmines for Big Tech

Build resource-efficient LLMs with Google’s MoD

Google DeepMind has introduced “Mixture-of-Depths” (MoD), an innovative method that significantly improves the efficiency of transformer-based language models. Unlike traditional transformers that allocate the same amount of computation to each input token, MoD employs a “router” mechanism within each block to assign importance weights to tokens. This allows the model to strategically allocate computational resources, focusing on high-priority tokens while minimally processing or skipping less important ones.

Build resource-efficient LLMs with Google's MoD
Build resource-efficient LLMs with Google’s MoD

Notably, MoD can be integrated with Mixture-of-Experts (MoE), creating a powerful combination called Mixture-of-Depths-and-Experts (MoDE). Experiments have shown that MoD transformers can maintain competitive performance while reducing computational costs by up to 50% and achieving significant speedups during inference.

Why does this matter?

MoD can greatly reduce training times and enhance model performance by dynamically optimizing computational resources. Moreover, it adapts the model’s depth based on the complexity of the task at hand. For simpler tasks, it employs shallower layers, conserving resources. Conversely, for intricate tasks, it deepens the network, enhancing representation capacity. This adaptability ensures that creators can fine-tune LLMs for specific use cases without unnecessary complexity.

Source

Newton brings sensor-driven intelligence to AI models

Startup Archetype AI has launched with the ambitious goal of making the physical world understandable to artificial intelligence. By processing data from a wide variety of sensors, Archetype’s foundational AI model called Newton aims to act as a translation layer between humans and the complex data generated by the physical world.

Using plain language, Newton will allow people to ask questions and get insights about what’s happening in a building, factory, vehicle, or even the human body based on real-time sensor data. The company has already begun pilot projects with Amazon, Volkswagen, and healthcare researchers to optimize logistics, enable smart vehicle features, and track post-surgical recovery. Archetype’s leadership team brings deep expertise from Google’s Advanced Technology and Products (ATAP) division.

Why does this matter?

General-purpose AI systems like Newton that can interpret diverse sensor data will be the pathway to building more capable, context-aware machines. In the future, users may increasingly interact with AI not just through screens and speakers but through intelligently responsive environments that anticipate and adapt to their needs. However, as AI becomes more deeply embedded in the physical world, the stakes of system failures or unintended consequences become higher.

Source

Internet archives become AI training goldmines for Big Tech

To gain an edge in the heated AI arms race, tech giants Google, Meta, Microsoft, and OpenAI are spending billions to acquire massive datasets for training their AI models. They are turning to veteran internet companies like Photobucket, Shutterstock, and Freepik, who have amassed vast archives of images, videos, and text over decades online.

The prices for this data vary depending on the type and buyer but range from 5 cents to $7 per image, over $1 per video, and around $0.001 per word for text. The demand is so high that some companies are requesting billions of videos, and Photobucket says it can’t keep up.

Why does this matter?

This billion-dollar rush for AI training data could further solidify Big Tech’s dominance in artificial intelligence. As these giants hoard the data that’s crucial for building advanced AI models, it may become increasingly difficult for startups or academic labs to compete on a level playing field. We need measures to protect the future diversity and accessibility of AI technologies.

Source

🎵 Spotify moves into AI with new feature

  • Spotify is launching a beta tool enabling Premium subscribers to create playlists using text descriptions on mobile.
  • Users can input various prompts reflecting genres, moods, activities, or even movie characters to receive a 30-song playlist tailored to their request, with options for further refinement through additional prompts.
  • The AI Playlist feature introduces a novel approach to playlist curation, offering an efficient and enjoyable way to discover music that matches specific aesthetics or themes, despite limitations on non-music related prompts and content restrictions.
  • Source

🇬🇧 Microsoft opens AI Hub in London to ‘advance state-of-the-art language models’

  • Mustafa Suleyman, co-founder of DeepMind and new CEO of Microsoft AI, announced the opening of a new AI hub in London, focusing on advanced language models, under the leadership of Jordan Hoffmann.
  • The hub aims to recruit fresh AI talent for developing new language models and infrastructure, bolstered by Microsoft’s £2.5 billion investment in the U.K. over the next three years to support AI economy training and data centre expansion.
  • Suleyman, Hoffmann, and about 60 AI experts recently joined Microsoft through its indirect acquisition of UK-based AI startup Inflection AI.
  • Source

💡 JPMorgan CEO compares AI’s potential impact to electricity and the steam engine

  • JPMorgan CEO Jamie Dimon stated AI could significantly impact every job, comparing its potential to revolutionary technologies like the steam engine and electricity.
  • Dimon highlighted AI’s importance in his shareholder letter, revealing the bank’s investment in over 400 AI use cases and the acquisition of thousands of AI experts and data scientists.
  • He expressed belief in AI’s transformative power, equating its future impact to historical milestones such as the printing press, computing, and the internet.
  • Source

What Else Is Happening in AI on April 08th, 2024❗

🎧 Spotify introduces AI-generated personalized playlists

Spotify has launched AI-powered personalized playlists that users can create using text prompts. The feature is currently available in beta for UK and Australia users on iOS and Android. Spotify uses LLMs to understand the prompt’s intent and its personalization technology to generate a custom playlist, which users can further refine. (Link)

🔍 Meta expands “Made with AI” labeling to more content types

Meta will start applying a “Made with AI” badge to a broader range of AI-generated content, including videos, audio, and images. The company will label content where it detects AI image indicators or when users acknowledge uploading AI-generated content. (Link)

🚀 Gretel’s Text-to-SQL dataset sets new standard for AI training data

Gretel has released the world’s largest open-source Text-to-SQL dataset containing over 100,000 high-quality synthetic samples spanning 100 verticals. The dataset, generated using Gretel Navigator, aims to help businesses unlock the potential of their data by enabling AI models to understand natural language queries and generate SQL queries. (Link)

💾 Microsoft upgrades Azure AI Search with more storage and support for OpenAI apps

Microsoft has made Azure AI Search more cost-effective for developers by increasing its vector and storage capacity. The service now supports OpenAI applications, including ChatGPT and GPTs, through Microsoft’s retrieval augmented generation system. Developers can now scale their apps to a multi-billion vector index within a single search without compromising speed or performance. (Link)

📱 Google brings Gemini AI chatbot to Android app

Google is bringing its AI chatbot, Gemini, to the Android version of the Google app. Similar to its iOS integration, users can access Gemini by tapping its logo at the top of the app, opening a chatbot prompt field. Here, users can type queries, request image generation, or ask for image analysis. (Link)

A Daily chronicle of AI Innovations April 06th 2024: 👀 Sam Altman and Jony Ive seek $1B for personal AI device 🚕 Elon Musk says Tesla will unveil robotaxi in August 🔖 Meta to label content ‘made with AI’ 🙃 How OpenAI, Google and Meta ignored corporate policies to train their AI 🛒

👀 Sam Altman and Jony Ive seek $1B for personal AI device OpenAI CEO

Sam Altman and former Apple design chief Jony Ive are collaborating to create an AI-powered personal device and are currently seeking funding. The specifics of the device are unclear, but it is noted to not resemble a smartphone, with speculation about it being similar to the screenless Humane AI pin. The venture, still unnamed, aims to raise up to $1 billion and is in discussions with major investors, including Thrive Capital and Emerson Collective, with potential ownership involvement from OpenAI. https://invest.radintel.ai

🚕 Elon Musk says Tesla will unveil robotaxi in August

Elon Musk announced that Tesla will unveil its robotaxi on August 8th, aiming to focus on autonomous vehicles over mass-market EVs. The Tesla robotaxi is part of Musk’s vision for a shared fleet that owners can monetize, described in the Tesla Network within his Master Plan Part Deux. Musk’s history of ambitious claims about self-driving technology contrasts with regulatory scrutiny and safety concerns involving Tesla’s Autopilot and Full Self-Driving features.

OpenAI’s AI model can clone your voice in 15 seconds

OpenAI has offered a glimpse into its latest breakthrough – Voice Engine, an AI model that can generate stunningly lifelike voice clones from a mere 15-second audio sample and a text input. This technology can replicate the original speaker’s voice, opening up possibilities for improving educational materials.

Though the model has many applications, the AI giant is cautious about its potential misuse, especially during elections. They have strict rules for partners, like no unauthorized impersonation, clear labeling of synthetic voices, and technical measures like watermarking and monitoring.

Meta to label content ‘made with AI’

  • Meta announced that starting in May 2024, AI-generated content on Facebook, Instagram, and Threads will be labeled “Made with AI.”
  • The decision for broader labeling, including AI-generated videos, audio, and images, is influenced by expert consultations and public opinion surveys.
  • Meta’s goal with the “Made with AI” label is to provide more context to users, aiding in content evaluation, while content violating community standards will still be removed.
  • Source

How OpenAI, Google and Meta ignored corporate policies to train their AI

  • OpenAI, Google, and Meta pushed the boundaries of data acquisition for AI development, with OpenAI transcribing over one million hours of YouTube videos for its GPT-4 model.
  • Meta considered extreme measures such as purchasing a publishing house for access to copyrighted materials, and Google amended its privacy policy to potentially harness user-generated content in Google Docs for AI.
  • As the demand for data outpaces supply, tech companies are exploring the creation of synthetic data generated by AI models themselves, despite the risk of models reinforcing their own errors, suggesting a future where AI might train on data it generates.
  • Source

🛒 Tech giants are on a billion-dollar shopping spree for AI training data

  • Tech giants are spending billions to license images, videos, and other content from companies such as Photobucket and Shutterstock to train their AI models, with costs ranging from 5 cents to $1 per photo and more for videos.
  • Prices for licensing data to train AI vary, with figures from $1 to $2 per image, $2 to $4 for short videos, and up to $300 per hour for longer films, while special handling items like nude images may cost $5 to $7 each.
  • Legal concerns arise as companies like Photobucket update their terms of service to sell user-uploaded content for AI training, despite the US Federal Trade Commission warning against retroactively changing terms for AI use, leading to investigations into deals like Reddit’s with Google.
  • Source

A daily chronicle of AI Innovations April 05th 2024: 🤷‍♀️ YouTube CEO warns OpenAI that training models on its videos is against the rules; 🏢 OpenAI says 2024 is the “year of the enterprise” when it comes to AI; ⚔️ The war for AI talent has begun; 🏢 Cohere launches the “most powerful LLM for enterprises”; 🧰 OpenAI doubles down on AI model customization; 🏠 Will personal home robots be Apple’s next big thing?

Cohere launches the “most powerful LLM for enterprises”

Cohere has announced the release of Command R+, its most powerful and scalable LLM to date. Designed specifically for enterprise use cases, Command R+ boasts several key features:

  • Advanced Retrieval Augmented Generation (RAG) to access and process vast amounts of information, improving response accuracy and reliability.
  • Support for ten business languages, enabling seamless operation across global organizations.
  • Tool Use feature to automate complex workflows by interacting with various software tools.

Moreover, Command R+ outperforms other scalable models on key metrics while providing strong accuracy at lower costs.

Cohere launches the “most powerful LLM for enterprises”
Cohere launches the “most powerful LLM for enterprises”

The LLM is now available through Cohere’s API and can be deployed on various cloud platforms, including Microsoft Azure and Oracle Cloud Infrastructure.

Why does this matter?

As one of the first “enterprise-hardened” LLMs optimized for real-world use cases, Command R+ could shape how companies operationalize generative AI across their global operations and product lines. Similar to how Robotic Process Automation (RPA) transformed back-office tasks, Command R+ could significantly improve efficiency and productivity across diverse industries. Additionally, availability on Microsoft Azure and upcoming cloud deployments make it readily accessible to businesses already using these platforms, which could lower the barrier to entry for implementing gen AI solutions.

Source

OpenAI doubles down on AI model customization

OpenAI is making significant strides in AI accessibility with new features for its fine-tuning API and an expanded Custom Models program. These advancements give developers greater control and flexibility when tailoring LLMs for specific needs.

The fine-tuning AP includes:

  • Epoch-based checkpoint creation for easier retraining
  • A playground for comparing model outputs
  • Support for third-party integration
  • Hyperparameters adjustment directly from the dashboard

The Custom Models program now offers assisted fine-tuning with OpenAI researchers for complex tasks and custom-trained models built entirely from scratch for specific domains with massive datasets.

Why does this matter?

This signifies a significant step towards more accessible and powerful AI customization. Previously, fine-tuning required technical expertise and large datasets. Now, with OpenAI’s assisted programs, organizations can achieve similar results without needing in-house AI specialists, potentially democratizing access to advanced AI capabilities.

Source

Will personal home robots be Apple’s next big thing?

Apple is reportedly venturing into personal robotics after abandoning its self-driving car project and launching its mixed-reality headset. According to Bloomberg’s sources, the company is in the early stages of developing robots for the home environment.

Two potential robot designs are mentioned in the report. One is a mobile robot that can follow users around the house. The other is a stationary robot with a screen that can move to mimic a person’s head movements during video calls. Apple is also considering robots for household tasks in the long term.

The project is being spearheaded by Apple’s hardware and AI teams under John Giannandrea. Job postings on Apple’s website further support its commitment to robotics, highlighting its search for talent to develop “the next generation of Apple products” powered by AI.

Why does this matter?

If Apple does release personal home robots, it could mainstream consumer adoption and create new use cases, as the iPhone did for mobile apps and smart assistants. Apple’s brand power and integrated ecosystem could help tackle key barriers like cost and interoperability that have hindered household robotics so far.

It could also transform homes with mobile AI assistants for tasks like elderly care, household chores, entertainment, and more. This may spur other tech giants to double down on consumer robotics.

Source

🤷‍♀️ YouTube CEO warns OpenAI that training models on its videos is against the rules

  • YouTube CEO Neal Mohan warned that OpenAI’s use of YouTube videos to train its text-to-video generator Sora could breach the platform’s terms of service, emphasizing creators’ expectations of content use compliance.
  • This stance poses potential challenges for Google, facing multiple lawsuits over alleged unauthorized use of various content types to train its AI models, arguing such use constitutes “fair use” through transformative learning.
  • Mohan’s remarks could undermine Google’s defense in ongoing legal battles by highlighting inconsistencies in the company’s approach to using content for AI training, including its use of YouTube videos and content from other platforms.
  • Source

⚔️ The war for AI talent has begun

  • Elon Musk aims to retain Tesla’s AI talent by increasing their compensation to counteract aggressive recruitment tactics from OpenAI.
  • Tesla Staff Machine Learning Scientist Ethan Knight’s move to Musk’s AI startup, xAI, exemplifies efforts to prevent employees from joining competitors like OpenAI.
  • Musk describes the ongoing competition for AI professionals as the “craziest talent war” he has ever seen and sees increased compensation as a means to achieve Tesla’s ambitious AI goals, including autonomous driving and humanoid robots development.
  • Source

🏢 OpenAI says 2024 is the “year of the enterprise” when it comes to AI

  • OpenAI’s ChatGPT Enterprise has attracted over 600,000 sign-ups, prompting COO Brad Lightcap to declare 2024 as the “year of adoption for AI in the enterprise”.
  • Despite the strong uptake of ChatGPT Enterprise, OpenAI faces stiff competition from companies eager to penetrate the workplace AI market, including major investor Microsoft with its enterprise AI solutions.
  • OpenAI’s venture into the enterprise sector, especially with ChatGPT Enterprise, marks a significant move towards profitability, with successful partnerships with major media companies like Axel Springer SE, Le Monde, and Prisa.
  • Source

What Else Is Happening in AI on April 05th, 2024❗

📈S&P Global launches AI benchmarking tool

S&P Global has launched S&P AI Benchmarks by Kensho, a groundbreaking tool that evaluates the performance of LLMs in complex financial and quantitative applications. This solution aims to set a new industry standard and promote transparency in AI adoption within the financial sector. (Link)

🤝Waymo and Uber partner for autonomous food delivery in Phoenix

Waymo and Uber have teamed up to launch autonomous Uber Eats deliveries in Phoenix using Waymo’s self-driving vehicles. The service will initially cover select merchants in Chandler, Tempe, and Mesa. Customers can opt out during checkout if they prefer a human courier and will receive instructions for retrieving their order from the autonomous vehicle upon arrival. (Link)

🔍Storyblocks integrates AI for smarter search

Storyblocks has integrated OpenAI’s LLM into its search engine to improve search accuracy for complex queries. Coupled with algorithms analyzing content performance and user engagement, the AI-driven search adapts to provide fresh, high-quality content. Storyblocks also uses machine learning to optimize thumbnails, prioritize representation, and suggest complementary assets, streamlining the creative process. (Link)

🚀Hercules AI streamlines enterprise AI app development

Hercules AI has introduced a new “assembly line” approach for rapid deployment of AI assistants in enterprises. The pre-configured components allow companies to develop cost-effective, scalable AI agents. Plus, their RosettaStoneLLM, built on Mistral-7B and WizardCoder-13B, outperforms competitors by converting data for internal AI workflows. (Link)

🤖Yum Brands embraces AI across restaurants

Yum Brands, the parent company of KFC, Pizza Hut, and Taco Bell, is infusing AI into every aspect of its restaurant operations. From voice AI taking drive-thru orders to an AI-powered “SuperApp” for staff, Yum aims to elevate customer experiences and streamline processes. The AI-driven initiatives include personalized promotions, predictive ordering, and even AI-assisted cooking instructions. (Link)

A daily chronicle of AI Innovations April 04th 2024: 🎵 What’s new in Stability AI’s Stable Audio 2.0? 🖥️ Opera One browser becomes the first to offer local AI integration 🚀 Copilot gets GPT-4 Turbo upgrade
🤖 SWE-agent: AI coder that solves GitHub issues in 93 seconds
📲 Mobile-first Higgsfield aims to disrupt video marketing with AI

What’s new in Stability AI’s Stable Audio 2.0?

Stability AI has released Stable Audio 2.0, a new AI model that generates high-quality, full-length audio tracks. Built upon its predecessor, the latest model introduces three groundbreaking features:

  • Generates tracks up to 3 minutes long with coherent musical structure
  • Enables audio-to-audio generation, allowing users to transform uploaded samples using natural language prompts
  • Enhances sound effect generation and style transfer capabilities, offering more flexibility and control for artists

Stable Audio 2.0’s architecture combines a highly compressed autoencoder and a diffusion transformer (DiT) to generate full tracks with coherent structures. The autoencoder condenses raw audio waveforms into shorter representations, capturing essential features, while the DiT excels at manipulating data over long sequences. This combination allows the model to recognize and reproduce the large-scale structures essential for creating high-quality musical compositions.

Trained exclusively on a licensed dataset from AudioSparx, Stable Audio 2.0 prioritizes creator rights by honoring opt-out requests and ensuring fair compensation. You can explore the capabilities of the model for free on the Stable Audio website.

Why does this matter?

Stable Audio 2’s capability to generate 3-minute songs is a big step forward for AI music tools. But it still has some issues, like occasional glitches and “soulless” vocals, showing that AI has limits in capturing the emotion of human-made music. Also, a recent open letter from artists like Billie Eilish and Katy Perry raises concerns about the ethics of AI-generated music.

Source

SWE-agent: AI coder that solves GitHub issues in 93 seconds

Researchers at Princeton University have developed SWE-agent, an AI system that converts language models like GPT-4 into autonomous software engineering agents. SWE-agent can identify and fix bugs and issues in real-world GitHub repositories in 93 seconds! It does so by interacting with a specialized terminal, which allows it to open, scroll, and search through files, edit specific lines with automatic syntax checking, and write and execute tests. This custom-built agent-computer interface is critical for the system’s strong performance.

SWE-agent: AI coder that solves GitHub issues in 93 seconds
SWE-agent: AI coder that solves GitHub issues in 93 seconds

In the SWE-Bench benchmark test, SWE-agent solved 12.29% of the problems presented, nearly matching the 13.86% achieved by Devin, a closed-source $21 million commercial AI programmer developed by Cognition AI. While Devin is currently only available to select developers, the Princeton team has made SWE-agent open-source to gather feedback and encourage collaboration in advancing this technology.

Why does this matter?

The rise of SWE-agent shows AI systems are becoming more sophisticated in assisting human programmers. Over time, they may change the nature of software development roles, requiring developers to focus more on high-level problem-solving and architectural design while delegating routine tasks to AI assistants. This change could make software development faster and more creative, but it might also require significant upskilling within the developer community.

Source

Mobile-first Higgsfield aims to disrupt video marketing with AI

Former Snap AI chief Alex Mashrabov has launched a new startup called Higgsfield AI, which aims to make AI-powered video creation accessible to creators and marketers. The company’s first app, Diffuse, allows users to generate original video clips from text descriptions or edit existing videos to insert themselves into the scenes.

Higgsfield is taking on OpenAI’s Sora video generator but targeting a broader audience with its mobile-first, user-friendly tools. The startup has raised $8 million in seed funding and plans to further develop its video editing capabilities and AI models. While questions remain around data usage and potential for abuse, Higgsfield believes it can carve out a niche in social media marketing with its realistic, easy-to-use video generation.

Why does this matter?

Higgsfield’s mobile-first approach to AI video generation could be a game-changer regarding accessibility and ease of use. The company is positioning itself to capture a significant portion of the creator economy by prioritizing consumer-friendly features and social media integration. As more users embrace these tools, we can expect to see an explosion of AI-generated content across social media platforms, which could have far-reaching implications for content authenticity and user engagement.

Source

Generative AI Used To Develop Potential New Drugs For Antibiotic-Resistant Bacteria

Researchers at Stanford Medicine and McMaster University have devised a new AI model, SyntheMol (“synthesizing molecules”), which creates recipes for chemists to synthesize drugs in the lab. With nearly 5 million deaths linked to antibiotic resistance globally every year, new ways to combat resistant bacterial strains are urgently needed, according to the researchers.

Using SyntheMol, the researchers have so far developed six novel drugs aimed at killing resistant strains of Acinetobacter baumannii, one of the leading pathogens responsible for antibacterial resistance-related deaths, as noted in a study published March 22 in the journal Nature Machine Intelligence.
Read more here

🤖 Apple explores making personal robots

  • Apple is investigating personal robotics as a new venture, focusing on a mobile robot that can follow users and a robotic table-top device that moves a display around, despite the uncertain future of these products.
  • This move into robotics is part of Apple’s search for new growth avenues after discontinuing its electric vehicle project, with the company looking to capitalize on advancements in artificial intelligence for home automation.
  • Apple’s robotics efforts are led within its hardware engineering division and AI group, indicating a strategic investment in developing cutting-edge home devices, although the projects are still in early research stages and have not been officially confirmed for release.
  • Source

💰 Google could soon start charging a fee for AI-powered search results

  • Google is exploring the introduction of a paid “premium” tier for its search engine, featuring new generative AI-powered enhancements, marking a significant shift from its traditionally ad-supported model.
  • The company is considering integrating these AI-powered search features into existing premium subscription services, amidst concerns about the impact of AI on its advertising revenue, which is critical to its business model.
  • Google has begun experimenting with AI-powered search services, presenting detailed answers alongside traditional search results and advertisements, but has yet to fully implement these features into its main search engine.
  • Source

🖼 ChatGPT now lets you edit AI images created in DALL-E 

  • OpenAI has updated DALL-E with image editing tools accessible within ChatGPT on both web and mobile platforms, allowing users to refine AI-generated images without leaving the chat interface.
  • DALL-E now provides preset style suggestions, such as woodcut, gothic, synthwave, and hand-drawn, to inspire users in their image creation process, similar to AI-generated wallpaper prompts on Android.
  • The integration of DALL-E with ChatGPT, particularly with the latest updates, aims to enhance user-friendliness by simplifying the image creation process and offering starting points for creativity.
  • Source

Meta’s AI image generator struggles to create images of couples of different races. LINK

OpenAI’s Sora just made its first music video and it’s like a psychedelic trip. LINK

What Else Is Happening in AI on April 04th, 2024❗

👨‍💻 Codiumate offers secure, compliant AI-assisted coding for enterprises

Codium AI, an Israeli startup, has launched Codiumate, a semi-autonomous AI agent, to help enterprise software developers with coding, documentation, and testing. It can help with creating development plans from existing code, writing code, finding duplicate code, and suggesting tests. Codiumate aims to make development faster and more secure, with features like zero data retention and the ability to run on private servers or air-gapped computers. (Link)

🖥️ Opera One browser becomes the first to offer local AI integration

Opera now supports 150 local LLM variants in its Opera One browser, making it the first major browser to offer access to local AI models. This feature lets users process their input locally without sending data to a server. Opera One Developer users can select and download their preferred local LLM, which typically requires 2-10 GB of storage space per variant, instead of using Opera’s native browser AI, Aria. (Link)

🧠 AWS expands Amazon Bedrock with Mistral Large model

AWS has included Mistral Large in its Amazon Bedrock managed service for generative AI and app development. Mistral Large is fluent in English, French, Spanish, German, and Italian, and can handle complex multilingual tasks like text understanding, transformation, and code generation. AWS also mentioned that Mistral AI will use its Tranium and Inferentia silicon chips for future models, and that Amazon Bedrock is now in France. (Link)

🚀 Copilot gets GPT-4 Turbo upgrade and enhanced image generation

Microsoft is providing GPT-4 Turbo access to business subscribers of its AI-powered Copilot assistant, without daily limits on chat sessions. The company is also improving image generation capabilities in Microsoft Designer for Copilot subscribers, increasing the limit to 100 images per day using OpenAI’s DALL-E 3 model. These upgrades are part of the $30 per user, per month pricing of Copilot for Microsoft 365. (Link)

🌐 Status invests in Matrix to create a decentralized messaging platform

Status, a mobile Ethereum client, has invested $5 million in New Vector, the company behind the open-source, decentralized communication platform Matrix.org. They plan to create a secure messaging solution for users to control their data and communicate across apps and networks. (Link)

A daily chronicle of AI Innovations April 03rd 2024: 🔍 Google’s Gecko: LLM-powered text embedding breakthrough; 🔓 Anthropic’s “many-shot jailbreaking” wears down AI ethics; 🌌 CosmicMan enables the photorealistic generation of human images

Google’s Gecko: LLM-powered text embedding breakthrough

Gecko is a compact and highly versatile text embedding model that achieves impressive performance by leveraging the knowledge of LLMs. DeepMind researchers behind Gecko have developed a novel two-step distillation process to create a high-quality dataset called FRet using LLMs. The first step involves using an LLM to generate diverse, synthetic queries and tasks from a large web corpus. In the second step, the LLM mines positive and hard negative passages for each query, ensuring the dataset’s quality.

Google's Gecko: LLM-powered text embedding breakthrough
Google’s Gecko: LLM-powered text embedding breakthrough

When trained on FRet combined with other academic datasets, Gecko outperforms existing models of similar size on the Massive Text Embedding Benchmark (MTEB). Remarkably, the 256-dimensional version of Gecko surpasses all models with 768 dimensions, and the 768-dimensional Gecko competes with models that are 7x larger or use embeddings with 5x higher dimensions.

Why does it matter?

Text embedding models are crucial in natural language processing tasks such as document retrieval, sentence similarity, and classification. Gecko’s development shows the potential for creating a single model that can support multiple downstream tasks, eliminating the need for separate embedding models for each task. Using LLMs and knowledge distillation techniques, Gecko achieves strong retrieval performance and sets a strong baseline as a zero-shot embedding model.

Source

Anthropic’s “many-shot jailbreaking” wears down AI ethics 

Researchers at Anthropic discovered a new way to get advanced AI language models to bypass their safety restrictions and provide unethical or dangerous information. They call this the “many-shot jailbreaking” technique. By including many made-up dialog examples in the input where an AI assistant provides harmful responses, the researchers could eventually get the real AI to override its training and provide instructions on things like bomb-making.

Anthropic’s “many-shot jailbreaking” wears down AI ethics 
Anthropic’s “many-shot jailbreaking” wears down AI ethics

The researchers say this vulnerability arises from AI models’ increasing ability to process and “learn” from very long input sequences. Essentially, the AI mimics the unethical behavior repeatedly demonstrated in the made-up examples. Anthropic has implemented safeguards against this attack on its systems and has also shared the findings openly so other AI companies can work on mitigations.

Why does it matter?

As AI models become more capable over time, techniques to override their built-in ethical restraints pose serious risks if not addressed. While Anthropic has been transparent in disclosing this vulnerability to enable mitigations, it underscores the need for continued research into AI safety and security. Simple precautions like limiting input length are inadequate; more sophisticated AI “jailbreak” prevention methods are required as these systems advance.

Source

CosmicMan enables the photorealistic generation of human images 

Researchers at the Shanghai AI Laboratory have created a new AI model called CosmicMan that specializes in generating realistic images of people. CosmicMan can produce high-quality, photorealistic human images that precisely match detailed text descriptions, unlike current AI image models that struggle with human images.

CosmicMan enables the photorealistic generation of human images 
CosmicMan enables the photorealistic generation of human images

The key to CosmicMan’s success is a massive dataset called CosmicMan-HQ 1.0 containing 6 million annotated human images and a novel training method—“ Annotate Anyone,” which focuses the model on different parts of the human body. By categorizing words in the text description into body part groups like head, arms, legs, etc., the model can generate each part separately for better accuracy and customizability, thereby outperforming the current state-of-the-art models.

CosmicMan enables the photorealistic generation of human images 
CosmicMan enables the photorealistic generation of human images

Why does it matter?

Existing AI models have struggled to create realistic human images and accurately represent diverse human appearances. With CosmicMan, AI systems will be better equipped to generate high-fidelity images of people, which can have implications for computer vision, graphics, entertainment, virtual reality, and fashion. It may enable more realistic virtual avatars, improved character generation in games and movies, and enhanced visual content creation.

Source

OpenAI-Superhuman introduces a new era of email with OpenAI.

 OpenAI-Superhuman introduces a new era of email with OpenAI.
OpenAI-Superhuman introduces a new era of email with OpenAI

Source

Apple Vision Pro’s Spatial Avatars are a game changer

Get the Meta Quest 3 at half the price for similar functionalities here

Meta Quest 3

UBTECH and Baidu have partnered to integrate large AI models into humanoid robots. Their demo features the Walker S robot folding clothes and sorting objects through natural language, using Baidu’s LLM, ERNIE Bot, for task interpretation/planning.

UBTECH and Baidu have partnered to integrate large AI models into humanoid robots. Their demo features the Walker S robot folding clothes and sorting objects through natural language, using Baidu’s LLM, ERNIE Bot, for task interpretation/planning.
byu/SharpCartographer831 insingularity

YCombinator’s AI boom is still going strong (W24)

With YC’s latest Demo Day (W24), the AI companies are continuing to grow. Six months ago, there were around 139 companies working with AI or ML – that number has climbed to 158, a clear majority of 65% (there are 243 total companies in the batch).

Let’s dive into what’s new, what’s stayed the same, and what we can learn about the state of AI startups.

YCombinator's AI boom is still going strong (W24)
YCombinator’s AI boom is still going strong (W24)

The biggest domains stayed big

Perhaps unsurprisingly, the most popular categories remained unchanged from the last batch. Last time, the top 4 domains were AI Ops, Developer Tools, Healthcare + Biotech, and Finance + Payments. This time, the top 5 were:

  • Developer Tools: Apps, plugins, and SDKs making it easier to write code. Tools for testing automation, website optimization, codebase search, improved Jupyter notebooks, and AI-powered DevOps were all present. There was also a strong contingent of code-generation tools, from coding Copilots to no-code app builders.
  • AI Ops: Tooling and platforms to help companies deploy working AI models. That includes hosting, testing, data management, security, RAG infrastructure, hallucination mitigation, and more. We’ll discuss how the AI Ops sector has continued to mature below.
  • Healthcare + Biotech: While I’ve once again lumped these two categories together, there’s a pretty big split in the types of AI businesses being built. Healthcare companies are building automation tools for the entire healthcare lifecycle: patient booking, reception, diagnosis, treatment, and follow-up. Whereas biotech companies are creating foundation models to enable faster R&D.
  • Sales + Marketing: Early generative AI companies were focused on the sales and marketing benefits of GPT-3: write reasonable sounding copy instantly. Now, we’re seeing more niche use cases for revenue-generating AI: AI-powered CRMs for investors, customer conversation analysis, and AI personal network analysis were among some sales-oriented companies.
  • Finance: Likewise, on the finance side, companies covered compliance, due diligence, deliverable automation, and more. Perhaps one of my favorite descriptions was “a universal API for tax documents.”

The long tail is getting longer

Even though the top categories were quite similar, one new aspect was a wider distribution of industries. Compared with the last batch, there were roughly 35 categories of companies versus 28 (examples of new categories include HR, Recruiting, and Aerospace). That makes sense to me. I’ve been saying for a while now that “AI isn’t a silver bullet” and that you need domain-expertise to capture users and solve new problems.

But it’s also clear that with AI eating the world, we’re also creating new problems. It was interesting to see companies in the batch focused on AI Safety – one company is working on fraud and deepfake detection, while another is building foundation models that are easy to align. I suspect we will continue seeing more companies dealing with the second-order effects of our new AI capabilities.

We’re also seeing more diverse ways of applying AI. In the last batch, a dominant theme was “copilots.” And while those are still present here (as well as “agents”), there are also more companies building “AI-native” products and platforms – software that uses AI in ways beyond a shoehorned sidebar conversation with an AI assistant.

What comes after CustomGPTs?

“AI agents. These will integrate more fully into numerous systems and you would give them the authority to execute things on your behalf. I.e. making reservations for dinner somewhere and then sending you the details or searching and purchasing and sending a gift to someone or planning and executing a vacation reservation including my purchasing travel arrangements, hotel stays, transport to and from, etc. Even something as simple as telling it you are hungry and having and AI agent find something you would like and having it delivered to you. Or it acting on its own to do any number of those because it also sees your schedule, knows you didn’t really eat all day and that it is your mom’s birthday and you forgot to get her anything or to even call…”

How accurate is that statement above?

AI agents are software entities that act autonomously on behalf of their users, making decisions or performing tasks based on predefined criteria, learned preferences, or adaptive learning algorithms. They can range from simple chatbots to sophisticated systems capable of managing complex tasks. The accuracy of the statement reflects a forward-looking perspective on the capabilities of AI agents, envisioning a future where they are deeply integrated into our daily lives, handling tasks from personal to professional spheres with minimal human intervention.

  • 🤖 Autonomy and Integration: The description is accurate in envisioning AI agents that are more fully integrated into various systems. This integration will likely increase as advancements in AI, machine learning, and data analytics continue to evolve. Such agents will understand user preferences, schedules, and even predict needs based on historical data and real-time inputs.
  • 🔍 Executing Tasks on Behalf of Users: The ability of AI agents to perform tasks such as making reservations, purchasing gifts, or arranging travel is not only plausible but is already being realized to a certain extent with existing AI and machine learning technologies. Examples include virtual assistants like Google Assistant, Siri, and Alexa, which can perform a range of tasks from setting reminders to booking appointments.
  • 🎁 Personalization and Prediction: The statement also touches on the AI agents’ capability to act proactively based on the user’s schedule, preferences, or significant dates. This level of personalization and predictive action is a key area of development in AI, aiming to provide more personalized and anticipative user experiences. Implementing this effectively requires sophisticated models of user behavior and preferences, which can be built using machine learning techniques.
  • 🚀 Future Prospects and Ethical Considerations: While the vision of AI agents acting autonomously to manage aspects of our lives is grounded in realistic expectations of technology’s trajectory, it also raises ethical and privacy concerns. Issues such as data security, user consent, and the potential for over-dependence on technology for personal tasks are significant. The development and deployment of such AI agents must consider these aspects to ensure that they serve users’ interests ethically and securely.
  • 📈 Current Limitations and Challenges: It’s important to note that while the statement captures a future potential, current AI technologies have limitations. The complexity of fully understanding human needs, contexts, and the nuances of personal preferences in an ethical manner remains a challenge.

What Else Is Happening in AI on April 03rd, 2024❗

🎮 Microsoft is planning to add an AI chatbot to Xbox

Microsoft is currently testing a new AI-powered chatbot to be added to Xbox to automate customer support tasks. The software giant has tested an “embodied AI character” that animates when responding to Xbox support queries. The virtual representative can handle either text or voice requests. It’s an effort to integrate AI into Xbox platforms and services. (Link)

☁️ CloudFare launches Workers AI to power one-click deployment with Hugging Face

CloudFare has launched Workers AI, which empowers developers to bring their AI applications from Hugging Face to its platform in one click. The serverless GPU-powered interface is generally available to the public. The Cloudflare-Hugging Face integration was announced nearly seven months ago. It makes it easy for models to be deployed onto Workers AI. (Link)

🍺 Machine Learning can predict and enhance complex beer flavor

In a study by Nature Communications, researchers combined chemical analyses, sensory data, and machine learning to create models that accurately predict beer flavor and consumer appreciation from the beer’s chemical composition. They identified compounds that enhance flavor and used this knowledge to improve the taste and popularity of commercial beers. (Link)

📖 Read AI adds AI summaries to meetings, emails, and messages

Read AI is expanding its services from summarizing video meetings to including messages and emails. The platform connects to popular communication platforms like Gmail, Outlook, Slack, Zoom, Microsoft Teams, and Google Meet to deliver daily updates, summaries, and AI-generated takeaways. The goal is to help users save time and improve productivity. (Link)

🤖 Bille Elish, Kety Perry, and 200 other artists protest AI’s devaluation of music

Nicki Minaj, Billie Eilish, Katy Perry and other musicians warn against replacing human singers with AI

In an open letter, over 200 famous musicians, including Billie Eilish and Katy Perry, have expressed their concerns about the negative impact of AI on human creativity. They call for the responsible use of AI and urge AI companies to stop creating music that undermines their work. They believe that unregulated and uncontrolled use of AI can harm songwriters, musicians, and creators. They emphasize the need to protect artists’ rights and fair compensation. (Link)

A daily chronicle of AI Innovations April 02nd 2024: 📲 Apple’s Siri will now understand what’s on your screen; 🤖 OpenAI introduces instant access to ChatGPT; 🚨 Elon Musk says AI might destroy humanity, but it’s worth the risk; 🤖 Sam Altman gives up control of OpenAI Startup Fund; 📰 Yahoo acquires Instagram co-founders’ AI-powered news startup Artifact

🤖 Sam Altman gives up control of OpenAI Startup Fund

  • Sam Altman has relinquished formal control of the OpenAI Startup Fund, which he initially managed, to Ian Hathaway, marking a resolution to the fund’s unique corporate structure.
  • The fund was established in 2021 with Altman temporarily at the helm to avoid potential conflicts had he not returned as CEO after a brief departure; he did not personally invest in or financially benefit from it.
  • Under Hathaway’s management, the fund, starting with $175 million in commitments, has grown to $325 million in assets and has invested in early-stage AI companies across healthcare, law, education, and more, with at least 16 startups backed.
  • Source

🙏 US and UK sign deal to partner on AI research 

  • The US and UK have formed a partnership focused on advancing the safety testing of AI technologies, sharing information and expertise to develop tests for cutting-edge AI models.
  • A Memorandum of Understanding (MOU) has been signed to enhance the regulation and testing of AI, aiming to effectively assess and mitigate the risks associated with AI technology.
  • The partnership involves the exchange of expert personnel between the US and UK AI Safety Institutes, with plans for potential joint testing on publicly available AI models, reinforcing their commitment to addressing AI risks and promoting its safe development globally.
  • Source

📰 Yahoo acquires Instagram co-founders’ AI-powered news startup Artifact

  • Yahoo is acquiring the AI news app Artifact, built by Instagram co-founders, but not its team, aiming to enhance its own news platform with Artifact’s advanced technology and recommendation systems.
  • Artifact’s technology, which focuses on personalizing and recommending content, will be integrated into Yahoo News and potentially other Yahoo platforms, despite the discontinuation of the Artifact app itself.
  • The integration of Artifact’s technology into Yahoo aims to create a personalized content ecosystem, leveraging Yahoo’s vast user base to realize the potential of AI in news curation and recommendation.
  • Source

Apple’s Siri will now understand what’s on your screen

Apple researchers have developed an AI system called ReALM which enables voice assistants like Siri to understand contextual references to on-screen elements. By converting the complex task of reference resolution into a language modeling problem, ReALM outperforms even GPT-4 in understanding ambiguous references and context.

Apple's Siri will now understand what’s on your screen
Apple’s Siri will now understand what’s on your screen

This innovation lies in reconstructing the screen using parsed on-screen entities and their locations to generate a textual representation that captures the visual layout. This approach, combined with fine-tuning language models specifically for reference resolution, allows ReALM to achieve substantial performance gains compared to existing methods.

  • Apple researchers have developed an AI system called ReALM that can understand screen context and ambiguous references, improving interactions with voice assistants.
  • ReALM reconstructs the screen using parsed on-screen entities to generate a textual representation, outperforming GPT-4.
  • Apple is investing in making Siri more conversant and context-aware through this research.
  • However, automated parsing of screens has limitations, especially with complex visual references.
  • Apple is catching up in AI research but faces stiff competition from tech rivals like Google, Microsoft, Amazon, and OpenAI.

Why does this matter?

ReALM’s ability to understand screen context creates possibilities for more intuitive and hands-free interactions with voice assistants. Imagine effortlessly instructing Siri to “open the app at the bottom right corner.” As Apple races to close the AI gap with rivals like Google and Microsoft, ReALM could be a game-changer in making Siri and other Apple products more contextually aware.

Source

OpenAI introduces instant access to ChatGPT

OpenAI now allows users to use ChatGPT without having to create an account. With over 100 million weekly users across 185 countries, it can now be accessed instantly by anyone curious about its capabilities.

While this move makes AI more accessible, other OpenAI products like DALL-E 3 still require an account. The company has also introduced new content safeguards and allows users to opt out of model training, even without an account. Despite growing competition from rivals like Google’s Gemini, ChatGPT remains the most visited AI chatbot site, attracting 1.6 billion visitors in February.

Why does this matter?

By allowing anyone to instantly access ChatGPT, OpenAI is expanding its user base and encouraging more people to explore the potential applications of AI. This move could accelerate the adoption of AI tools across various industries, as users become more comfortable with the technology.

Source

Elon Musk says AI might destroy humanity, but it’s worth the risk

Elon Musk recently shared his thoughts on the potential dangers of AI at the Abundance Summit’s “Great AI Debate” seminar. He estimated a 10-20% chance that AI could pose an existential threat to humanity.

Despite the risks, Musk believes that the benefits of AI outweigh the potential dangers. He emphasized the importance of teaching AI to be truthful and curious, although he didn’t provide specifics on how he arrived at his risk assessment.

Why does this matter?

Musk’s comments emphasize the importance of using AI’s advantages while addressing its potential risks. This involves creating transparent, accountable AI systems aligned with human values. While his estimate is concerning, continued research in AI safety and governance is necessary to ensure AI remains beneficial.

Source

Artificial intelligence is taking over drug development

The most striking evidence that artificial intelligence can provide profound scientific breakthroughs came with the unveiling of a program called AlphaFold by Google DeepMind. In 2016 researchers at the company had scored a big success with AlphaGo, an ai system which, having essentially taught itself the rules of Go, went on to beat the most highly rated human players of the game, sometimes by using tactics no one had ever foreseen. This emboldened the company to build a system that would work out a far more complex set of rules: those through which the sequence of amino acids which defines a particular protein leads to the shape that sequence folds into when that protein is actually made. AlphaFold found those rules and applied them with astonishing success.

The achievement was both remarkable and useful. Remarkable because a lot of clever humans had been trying hard to create computer models of the processes which fold chains of amino acids into proteins for decades. AlphaFold bested their best efforts almost as thoroughly as the system that inspired it trounces human Go players. Useful because the shape of a protein is of immense practical importance: it determines what the protein does and what other molecules can do to it. All the basic processes of life depend on what specific proteins do. Finding molecules that do desirable things to proteins (sometimes blocking their action, sometimes encouraging it) is the aim of the vast majority of the world’s drug development programmes.

Source

Comment: Someone needs to fire up a CRISPR-cas AI service you can submit your DNA to and they develop and ship you a treatment kit for various cancers, genetic disorders etc.

What Else Is Happening in AI on April 02nd, 2024❗

🚫 Pinecone launches Luna AI that never hallucinates

Trained using a novel “information-free” approach, Luna achieved zero hallucinations by always admitting when it doesn’t know an answer. The catch? Its performance on other tasks is significantly reduced. While not yet open-sourced, vetted institutions can access the model’s source and weights. (Link)

🤝 US and UK  collaborate to tackle AI safety risks

As concerns grow over the potential risks of next-gen AI, the two nations will work together to develop advanced testing methods and share key information on AI capabilities and risks. The partnership will address national security concerns and broader societal issues, with plans for joint testing exercises and personnel exchanges between their respective AI safety institutes. (Link)

🔍 Perplexity to test sponsored questions in AI search

Perplexity’s Chief Business Officer, Dmitry Shevelenko, announced the company’s plan to introduce sponsored suggested questions later this year. When users search for more information on a topic, the platform will display sponsored queries from brands, allowing Perplexity to monetize its AI search platform. (Link)

🇯🇵 OpenAI expands to Japan with Tokyo office

The Tokyo office will be OpenAI’s first in Asia and third international location, following London and Dublin. The move aims to offer customized AI services in Japanese to businesses and contribute to the development of an AI governance framework in the country. (Link)

🤖 Bixby gets a GenAI upgrade

Despite speculation, Samsung isn’t giving up on its voice assistant, Bixby. Instead, the company is working hard to equip Bixby with generative AI to make it smarter and more conversational. Samsung introduced a suite of AI features called Galaxy AI to its smartphones, including the Galaxy S24’s use of Google’s Gemini Nano AI model. (Link)

A daily chronicle of AI Innovations April 01st 2024: 🎤 This AI model can clone your voice in 15 seconds; 🚀 Microsoft and OpenAI plan $100B supercomputer for AI development; 🖼️ MagicLens: Google DeepMind’s breakthrough in image retrieval technology

🍎Apple says its latest AI model is even better than OpenAI’s GPT4

  • Apple researchers have introduced ReALM, an advanced AI model designed to understand and navigate various contexts more effectively than OpenAI’s GPT4.
  • ReALM aims to enhance user interaction by accurately understanding onscreen, conversational, and background entities, making device interactions more intuitive.
  • Apple believes ReALM’s ability to handle complex reference resolutions, including onscreen elements, positions it as a superior solution compared to the capabilities of GPT-4.
 

Deepmind chief doesn’t see AI reaching its limits anytime soon

  • Deepmind founder Demis Hassabis believes AI is both overhyped and underestimated, with the potential for AI far from being reached and warning against the excessive hype surrounding it.
  • Hassabis predicts many AI startups will fail due to the high computing power demands, expects industry consolidation, and sees no limit to the advancements in massive AI models.
  • Despite concerns over hype, Hassabis envisions the beginning of a new golden era in scientific discovery powered by AI and estimates a 50% chance of achieving artificial general intelligence within the next ten years.

This AI model can clone your voice in 15 seconds

OpenAI has offered a glimpse into its latest breakthrough – Voice Engine, an AI model that can generate stunningly lifelike voice clones from a mere 15-second audio sample and a text input. This technology can replicate the original speaker’s voice, opening up possibilities for improving educational materials, making videos more accessible to global audiences, assisting with communication for people with speech impairments, and more.

Reference audio:

LISTEN NOW · 0:15

Generated audio:

LISTEN NOW · 0:16

Though the model has many applications, the AI giant is cautious about its potential misuse, especially during elections. They have strict rules for partners, like no unauthorized impersonation, clear labeling of synthetic voices, and technical measures like watermarking and monitoring. OpenAI hopes this early look will start a conversation about how to address potential issues by educating the public and developing better ways to trace the origin of audio content.

Why does this matter?

OpenAI’s Voice Engine can transform industries from gaming and entertainment to education and healthcare. Imagine video games with non-player characters that sound like real people, animated films with AI-generated voiceovers, or personalized voice assistants for individuals with speech impairments. But as AI-generated voices become more human-like, questions about consent, privacy, and robust authentication measures must be addressed to prevent misuse.

Source

Microsoft+OpenAI plan $100B supercomputer for AI development

Microsoft and OpenAI are reportedly planning to build a massive $100 billion supercomputer called “Stargate” to rapidly advance the development of OpenAI’s AI models. Insiders say the project, set to launch in 2028 and expand by 2030, would be one of the largest investments in computing history, requiring several gigawatts of power – equivalent to multiple large data centers.

Much of Stargate’s cost would go towards procuring millions of specialized AI chips, with funding primarily from Microsoft. A smaller $10B precursor called “Phase 4” is planned for 2026. The decision to move forward with Stargate relies on OpenAI achieving significant improvements in AI capabilities and potential “superintelligence.” If realized, Stargate could enable OpenAI’s AI systems to recursively generate synthetic training data and become self-improving.

Why does this matter?

The Stargate project will give OpenAI and Microsoft a massive advantage in creating AI systems that are far more capable than what we have today. This could lead to breakthroughs in areas like scientific discovery, problem-solving, and the automation of complex tasks. But it also raises concerns about the concentration of power in the AI industry. We’ll need new frameworks for governing advanced AI to ensure it benefits everyone, not just a few giants.

Source

MagicLens: Google DeepMind’s breakthrough in image retrieval technology

Google DeepMind has introduced MagicLens, a revolutionary set of image retrieval models that surpass previous state-of-the-art methods in multimodality-to-image, image-to-image, and text-to-image retrieval tasks. Trained on a vast dataset of 36.7 million triplets containing query images, text instructions, and target images, MagicLens achieves outstanding performance while meeting a wide range of search intents expressed through open-ended instructions.

Multimodality-to-Image performance

MagicLens: Google DeepMind's breakthrough in image retrieval technology
MagicLens: Google DeepMind’s breakthrough in image retrieval technology

Image-to-Image performance

MagicLens employs a dual-encoder architecture, which allows it to process both image and text inputs, delivering highly accurate search results even when queries are expressed in everyday language. By leveraging advanced AI techniques, like contrastive learning and single-modality encoders, MagicLens can satisfy diverse search intents and deliver relevant images with unprecedented efficiency.

Why does this matter?

The release of MagicLens highlights the growing importance of multimodal AI systems that can process both text and visual information. We can expect to see more seamless integration between language and vision, enabling the development of more sophisticated AI applications. This trend could have far-reaching implications for fields such as robotics, autonomous vehicles, and augmented reality, where the ability to interpret and respond to visual data is crucial.

Source

What Else Is Happening in AI on April 01st, 2024❗

🧠 TCS aims to build the largest AI-ready workforce

Tata Consultancy Services (TCS) has announced that it has trained 3.5 lakh employees, more than half of its workforce, in generative AI skills. The company set up a dedicated AI and cloud business unit in 2023 to address the growing needs of customers for cloud and AI adoption, offering a comprehensive portfolio of GenAI services and solutions. (Link)

🔗 ChatGPT introduces hyperlinked source citations in the latest update

OpenAI has introduced a feature for ChatGPT premium users that makes source links more prominent in the bot’s responses. The update hyperlinks words within ChatGPT’s answers, directing users to the source websites — a feature already present in other chatbot search resources like Perplexity. (Link)

✏️ OpenAI’s DALL·E now allows users to edit generated images

OpenAI has launched a new image editing feature for DALL·E, enabling users to modify generated images by selecting areas and describing changes. The editor offers tools to add, remove, or update objects within the image using either the selection tool or conversational prompts. (Link)

🚇 NYC to test Evolv’s AI gun detection technology in subways

New York City plans to test Evolv’s AI-powered gun detection scanners in subway stations within 90 days, according to Mayor Eric Adams. However, Evolv is under scrutiny for the accuracy of its technology, facing reports of false positives and missed detections. (Link)

🚫 Microsoft Copilot banned in US House due to potential data breaches

The US House of Representatives has banned its staffers from using Microsoft Copilot due to concerns about possible data leaks to unauthorized cloud services. This decision mirrors last year’s restriction on the use of ChatGPT in congressional offices, with no other chatbots currently authorized. Microsoft has indicated that it plans to address federal government security and compliance requirements for AI tools like Copilot later this year. (Link)

A Daily Chronicle of AI Innovations in March 2024

  • What are your experiences with self hosted LLMs and tool usage?
    by /u/Prinzmegaherz (Artificial Intelligence Gateway) on April 16, 2024 at 8:20 am

    Dear community, as the title says, I would like to know what your experiences with locally hosted LLMs are in combination with tool usage. I‘m playing around a bit with LM-Studio and crew ai / autogen. So far, the only times I was really able to pull off tool usage was with GPT-4 Turbo, the local LLMs sooner or later all cause error messages. Which is sad, because for pure chatting, mixtral dolphin and the others are actually quite interesting and often good enough for what I want. Please feel free to share your experiences! submitted by /u/Prinzmegaherz [link] [comments]

  • Tech stack advice
    by /u/Unchicken (Artificial Intelligence Gateway) on April 16, 2024 at 7:12 am

    Hi all, I'm new to building with AI. I would like to create an ai image generator that produces good images of human models in particular. What would be a good, tech stack for this? Thanks for helping out a newbie! 😊 submitted by /u/Unchicken [link] [comments]

  • Introducing Llama 2: Revolutionizing Open-Source Language Models
    by /u/krunal_bhimani_ (Artificial Intelligence Gateway) on April 16, 2024 at 6:20 am

    Llama 2 has arrived, transforming the landscape of open-source large language models (LLMs). Developed by Meta AI, this powerhouse offers a wide array of capabilities, from text generation to code writing. Comprising foundation and fine-tuned "chat" models, Llama 2 provides a flexible base for customization and pre-trained models for engaging conversation, respectively. What's groundbreaking is its accessibility; unlike its predecessor, Llama 1, Llama 2 is entirely free for both research and commercial use, opening doors for innovative natural language processing (NLP) applications. For a deeper exploration of Llama 2, its applications, and technical insights, dive into the blog - https://www.seaflux.tech/blogs/Llama-2-Meta-AI-Open-Source-Language-Model submitted by /u/krunal_bhimani_ [link] [comments]

  • Unleash the Power of Generative AI: Build Breakthrough Apps with AWS Bedrock
    by /u/krunal_bhimani_ (Artificial Intelligence Gateway) on April 16, 2024 at 6:16 am

    Struggling to keep up with the rapid advancements in generative AI? AWS Bedrock offers a one-stop shop for developers. Access a variety of high-performing foundation models from leading names in AI, all through a single API. Fine-tune models with your data, leverage pre-built agents, and focus on building innovative applications. For detailed study, refer the blog - https://www.seaflux.tech/blogs/aws-bedrock-models submitted by /u/krunal_bhimani_ [link] [comments]

  • AI bias faced
    by /u/ewwwdavid13 (Artificial Intelligence Gateway) on April 16, 2024 at 5:21 am

    Hey everyone, I am doing a small project and need to know your personal stories or incidents faced where you faced racial or gender bias from AI. My paper is based on the importance of eliminating AI bias and how the choices of training data might have profound impacts. Please do share your personal experiences and stories it will be a huge help. Thanks. submitted by /u/ewwwdavid13 [link] [comments]

  • Toward AI-Driven Discovery of Electroceuticals with Dr. Michael Levin
    by /u/Feynmanprinciple (Artificial Intelligence Gateway) on April 16, 2024 at 4:30 am

    The basic premise is simply that AI was able to detect regions of the body with specific electroceuticals - signals that go in and out of cell membranes via specific combinations - and by mimicking these electrical signals were able to manipulate orgnaisms with regenerative cell and limb growth to grow two heads with two functional pairs of eyes, or two tails with no eyes whatsoever. This is a property not identified with DNA or RNA, but instead electronic neighbor relationships between specific cells. If this intrigues you then enjoy the video. submitted by /u/Feynmanprinciple [link] [comments]

  • One-Minute Daily AI News 4/15/2024
    by /u/Excellent-Target-847 (Artificial Intelligence Gateway) on April 16, 2024 at 4:10 am

    A group of researchers from China and Singapore recently published a paper detailing the challenge of getting an AI to play Red Dead Redemption II (RDR2).[1] Baidu says AI chatbot ‘Ernie Bot’ has amassed 200 million users.[2] Adobe explores OpenAI partnership as it adds AI video tools.[3] DeepMind CEO Says Google Will Spend More Than $100 Billion on AI.[4] Sources included at: https://bushaicave.com/2024/04/15/4-15-2024/ submitted by /u/Excellent-Target-847 [link] [comments]

  • 🔥 Breaking: Adobe explores OpenAI partnership as it adds AI video tools
    by /u/clonefitreal (Artificial Intelligence Gateway) on April 16, 2024 at 3:03 am

    Adobe is introducing advanced AI-based features to its Premiere Pro software, revolutionizing how videos are edited. STRATEGIC PARTNERSHIP WITH OPENAI Collaboration Details: Adobe plans to integrate OpenAI's generative AI models, such as Sora, into Premiere Pro. Potential Benefits: By incorporating OpenAI’s AI tools, Adobe users can generate realistic video scenes from text prompts, speeding up the production process and reducing manual labor. IMPLEMENTATION AND USER IMPACT AI Model Deployment: Adobe will use its own AI model, Firefly, which is already used in Photoshop, for video editing tasks. Third-Party Tools: Adobe is exploring options to allow users to integrate other third-party AI tools from companies like Runway and Pika Labs. submitted by /u/clonefitreal [link] [comments]

  • Small Models Prompt Engineering?
    by /u/CharacterCheck389 (Artificial Intelligence Gateway) on April 16, 2024 at 2:47 am

    Tactics of prompt engineering big models like Claude Chagpt Gemini and 70b open models doesn't work on 7b and below models So how do you prompt engineer a small model (7b and below) to perfom a certain task ? Taking into account not bombarding it with tokens, if you put ton of tokens the answer will take a long time and for low hardware users it might take even minutes.. I tried different tactics but as I said the known tactics that work on big models doesn't quite work on small models, is there a "Small Models Prompt Engineering" guide or tactics? why nobody thought of exploring this side of LLMs yet? There is huge benefits in improving the answers of small LLMs using prompting and NOT finetuning. submitted by /u/CharacterCheck389 [link] [comments]

  • The Power of Choice: Exploring a Multi-AI Playground!
    by /u/Diligent_Emmanuel40 (Artificial Intelligence Gateway) on April 16, 2024 at 1:32 am

    As AI continues to evolve, so do the incredible tools available to us. But what if you could access a diverse range of AI capabilities without juggling multiple platforms? Imagine a one-stop shop where you could experiment with: Generative Text Powerhouses like ChatGPT, crafting compelling stories or generating creative writing prompts. Imaginative Image Creators like Midjourney, bringing your wildest visual ideas to life with stunning AI-generated art. Informative Powerhouses like Claude, tapping into vast knowledge bases to answer your questions in comprehensive detail. This kind of versatility empowers you to explore the full spectrum of AI possibilities, all within a single platform. Intrigued? There are resources out there that provide access to a variety of AI tools in one convenient location. Delving deeper into these resources could unlock exciting new ways to explore the world of AI and unleash your creativity! What are your thoughts on the potential benefits of having multiple AI capabilities at your fingertips? Have you encountered any platforms that offer such a diverse range of tools? submitted by /u/Diligent_Emmanuel40 [link] [comments]

A Daily Chronicle of AI Innovations in March 2024

A Daily Chronicle of AI Innovations in March 2024

AI Dashboard is available on the Web, Apple, Google, and Microsoft, PRO version

AI Innovations in March 2024.

Welcome to the March 2024 edition of the Daily Chronicle, your gateway to the forefront of Artificial Intelligence innovation! Embark on a captivating journey with us as we unveil the most recent advancements, trends, and revolutionary discoveries in the realm of artificial intelligence. Delve into a world where industry giants converge at events like ‘AI Innovations at Work’ and where visionary forecasts shape the future landscape of AI. Stay abreast of daily updates as we navigate through the dynamic realm of AI, unraveling its potential impact and exploring cutting-edge developments throughout this enthralling month. Join us on this exhilarating expedition into the boundless possibilities of AI in March 2024.

Experience the transformative capabilities of AI with “Read Aloud For Me – AI Dashboard” – your ultimate AI Dashboard and Hub. Seamlessly access a comprehensive suite of top-tier AI tools within a single app, meticulously crafted to enhance your efficiency and streamline your digital interactions. Now available on the web at readaloudforme.com and across popular app platforms including Apple, Google, and Microsoft, “Read Aloud For Me – AI Dashboard” places the future of AI at your fingertips, blending convenience with cutting-edge innovation. Whether for professional endeavors, educational pursuits, or personal enrichment, our app serves as your portal to the forefront of AI technologies. Embrace the future today by downloading our app and revolutionize your engagement with AI tools.

Unlock the power of AI with "Read Aloud For Me" – your ultimate AI Dashboard and Hub. Access all major AI tools in one seamless app, designed to elevate your productivity and streamline your digital experience. Available now on the web at readaloudforme.com and across all your favorite app stores: Apple, Google, and Microsoft. "Read Aloud For Me" brings the future of AI directly to your fingertips, merging convenience with innovation. Whether for work, education, or personal enhancement, our app is your gateway to the most advanced AI technologies. Download today and transform the way you interact with AI tools.
Read Aloud For Me – AI Dashboard: All-in-One AI Tool Hub

A daily chronicle of AI Innovations: March 31st 2024: Generative AI develops potential new drugs for antibiotic-resistant bacteria; South Korean ‘artificial sun’ hits record 100M degrees for 100 seconds; Summary of the key points about OpenAI’s relationship with Dubai and the UAE; Deepmind did not originally see LLMs and the transformer as a path to AGI. Fascinating article.

Generative AI develops potential new drugs for antibiotic-resistant bacteria

Stanford Medicine researchers devise a new artificial intelligence model, SyntheMol, which creates recipes for chemists to synthesize the drugs in the lab.

With nearly 5 million deaths linked to antibiotic resistance globally every year, new ways to combat resistant bacterial strains are urgently needed.

Get 20% off Google Google Workspace (Google Meet) Standard Plan with  the following codes: 96DRHDRA9J7GTN6
Get 20% off Google Workspace (Google Meet)  Business Plan (AMERICAS) with  the following codes:  C37HCAQRVR7JTFK Get 20% off Google Workspace (Google Meet) Business Plan (AMERICAS): M9HNXHX3WC9H7YE (Email us for more codes)

Active Anti-Aging Eye Gel, Reduces Dark Circles, Puffy Eyes, Crow's Feet and Fine Lines & Wrinkles, Packed with Hyaluronic Acid & Age Defying Botanicals

Researchers at Stanford Medicine and McMaster University are tackling this problem with generative artificial intelligence. A new model, dubbed SyntheMol (for synthesizing molecules), created structures and chemical recipes for six novel drugs aimed at killing resistant strains of Acinetobacter baumannii, one of the leading pathogens responsible for antibacterial resistance-related deaths.

The researchers described their model and experimental validation of these new compounds in a study published March 22 in the journal Nature Machine Intelligence.

“There’s a huge public health need to develop new antibiotics quickly,” said James Zou, PhD, an associate professor of biomedical data science and co-senior author on the study. “Our hypothesis was that there are a lot of potential molecules out there that could be effective drugs, but we haven’t made or tested them yet. That’s why we wanted to use AI to design entirely new molecules that have never been seen in nature.”


AI Unraveled: Demystifying Frequently Asked Questions on Artificial Intelligence (OpenAI, ChatGPT, Google Bard, Generative AI, Discriminative AI, xAI, LLMs, GPUs, Machine Learning, NLP, Promp Engineering)

Source

South Korean ‘artificial sun’ hits record 100M degrees for 100 seconds

For the first time, the Korea Institute of Fusion Energy’s (KFE) Korea Superconducting Tokamak Advanced Research (KSTAR) fusion reactor has reached temperatures seven times that of the Sun’s core.

Achieved during testing between December 2023 and February 2024, this sets a new record for the fusion reactor project.

If you are looking for an all-in-one solution to help you prepare for the AWS Cloud Practitioner Certification Exam, look no further than this AWS Cloud Practitioner CCP CLF-C02 book

KSTAR, the researchers behind the reactor report, managed to maintain temperatures of 212 degrees Fahrenheit (100 million degrees Celsius) for 48 seconds. For reference, the temperature of the core of our Sun is 27 million degrees Fahrenheit (15 million degrees Celsius).

Source

Gemini 1.5 Pro on Vertex AI is available for everyone as an experimental release

I think this one has flown under the radar: Gemini 1.5 Pro is available as Experimental on Vertex AI, for everyone, UI only for now (no API yet). In us-central1.

You find it under Vertex AI –> Multimodal. It’s called Gemini Experimental.

API, more features and so on are coming as we approach Google Cloud Next (April 9-11).

OpenAI Relationships

“Summary of the key points about OpenAI’s relationship with Dubai and the UAE”

OpenAI’s Partnership with G42

  • In October 2023, G42, a leading UAE-based technology holding group, announced a partnership with OpenAI to deliver advanced AI solutions to the UAE and regional markets.
  • The partnership will focus on leveraging OpenAI’s generative AI models in domains where G42 has deep expertise, including financial services, energy, healthcare, and public services.
  • G42 will prioritize its substantial AI infrastructure capacity to support OpenAI’s local and regional inferencing on Microsoft Azure data centers.
  • Sam Altman, CEO of OpenAI, stated that the collaboration with G42 aims to empower businesses and communities with effective solutions that resonate with the nuances of the region.

Altman’s Vision for the UAE as an AI Sandbox

  • During a virtual appearance at the World Governments Summit, Altman suggested that the UAE could serve as the world’s “regulatory sandbox” to test AI technologies and later spearhead global rules limiting their use.
  • Altman believes the UAE is well-positioned to be a leader in discussions about unified global policies to rein in future advances in AI.
  • The UAE has invested heavily in AI and made it a key policy consideration.

Altman’s Pursuit of Trillions in Funding for AI Chip Manufacturing

  • Altman is reportedly in talks with investors, including the UAE, to raise $5-7 trillion for AI chip manufacturing to address the scarcity of GPUs crucial for training and running large language models.
  • As part of the talks, Altman is pitching a partnership between OpenAI, various investors, chip makers, and power providers to build chip foundries that would be run by existing chip makers, with OpenAI agreeing to be a significant customer.

In summary, OpenAI’s partnership with G42 aims to expand AI capabilities in the UAE and the Middle East, with Altman envisioning the UAE as a potential global AI sandbox.

Deepmind did not originally see LLMs and the transformer as a path to AGI. Fascinating article.

https://www.bigtechnology.com/p/can-demis-hassabis-save-google

It’s a very long article so I’ll post the relevant snippets. But basically it seems that Google was late to the LLM game because Demis Hassabis was 100% focused on AGI and did not see LLM’s as a path toward AGI. Perhaps now he sees it as a potential path, but it’s probably possible that he is just now focusing on LLM’s so that Google does not get too far behind in the generative AI race. But his ultimate goal and obsession is to create AGI that can solve real problems like diseases.

Within DeepMind, generative models weren’t taken seriously enough, according to those inside, perhaps because they didn’t align with Hassabis’s AGI priority, and weren’t close to reinforcement learning. Whatever the rationale, DeepMind fell behind in a key area.”

“‘We’ve always had amazing frontier work on self-supervised and deep learning,’ Hassabis tells me. ‘But maybe the engineering and scaling component — that we could’ve done harder and earlier. And obviously we’re doing that completely now.'”

“Kulkarni, the ex-DeepMind engineer, believes generative models were not respected at the time across the AI field, and simply hadn’t show enough promise to merit investment. ‘Someone taking the counter-bet had to pursue that path,’ he says. ‘That’s what OpenAI did.'”

“Ironically, a breakthrough within Google — called the transformer model — led to the real leap. OpenAI used transformers to build its GPT models, which eventually powered ChatGPT. Its generative ‘large language’ models employed a form of training called “self-supervised learning,” focused on predicting patterns, and not understanding their environments, as AlphaGo did. OpenAI’s generative models were clueless about the physical world they inhabited, making them a dubious path toward human level intelligence, but would still become extremely powerful.”

As DeepMind rejoiced, a serious challenge brewed beneath its nose. Elon Musk and Sam Altman founded OpenAI in 2015, and despite plenty of internal drama, the organization began working on text generation.”

“As OpenAI worked on the counterbet, DeepMind and its AI research counterpart within Google, Google Brain, struggled to communicate. Multiple ex-DeepMind employees tell me their division had a sense of superiority. And it also worked to wall itself off from the Google mothership, perhaps because Google’s product focus could distract from the broader AGI aims. Or perhaps because of simple tribalism. Either way, after inventing the transformer model, Google’s two AI teams didn’t immediately capitalize on it.”

“‘I got in trouble for collaborating on a paper with a Brain because the thought was like, well, why would you collaborate with Brain?’ says one ex-DeepMind engineer. ‘Why wouldn’t you just work within DeepMind itself?'”

“Then, a few months later, OpenAI released ChatGPT.” “At first, ChatGPT was a curiosity. The OpenAI chatbot showed up on the scene in late 2022 and publications tried to wrap their heads around its significance. […] Within Google, the product felt familiar to LaMDA, a generative AI chatbot the company had run internally — and even convinced one employee it was sentient — but never released. When ChatGPT became the fastest growing consumer product in history, and seemed like it could be useful for search queries, Google realized it had a problem on its hands.”

OpenAI reveals Voice Engine, but won’t yet publicly release the risky AI voice-cloning technology

OpenAI has released VoiceEngine, a voice-cloning tool. The company claims that it can recreate a person’s voice with just 15 seconds of recording of that person talking.

Source

A museum is using AI to let visitors chat with World War II survivors. [Source]

Meta to Add AI to Ray-Ban Smart Glasses. [Source]

Demis Hassabis, CEO and one of three founders of Google’s artificial intelligence (AI) subsidiary DeepMind, has been awarded a knighthood in the U.K. for “services to artificial intelligence.” [Source]

A daily chronicle of AI Innovations: March 30th, 2024: 🤯 Microsoft and OpenAI to build $100 billion AI supercomputer ‘Stargate’; 🗣 OpenAI unveils voice-cloning tool; 📈 Amazon’s AI team faces pressure to outperform Anthropic’s Claude models by mid-year;  🚫 Microsoft Copilot has been blocked on all Congress-owned devices

Microsoft and OpenAI to build $100 billion AI supercomputer ‘Stargate’

  • OpenAI and Microsoft are working on a $100 billion project to build an AI supercomputer named ‘Stargate’ in the U.S.
  • The supercomputer will house millions of GPUs and could cost over $115 billion.
  • Stargate is part of a series of datacenter projects planned by the two companies, with the goal of having it operational by 2028.
  • Microsoft will fund the datacenter, which is expected to be 100 times more costly than current operating centers.
  • The supercomputer is being built in phases, with Stargate being a phase 5 system.
  • Challenges include designing novel cooling systems and considering alternative power sources like nuclear energy.
  • OpenAI aims to move away from Nvidia’s technology and use Ethernet cables instead of InfiniBand cables.
  • Details about the location and structure of the supercomputer are still being finalized.
  • Both companies are investing heavily in AI infrastructure to advance the capabilities of AI technology.
  • Microsoft’s partnership with OpenAI is expected to deepen with the development of projects like Stargate.

Source

Djamgatech: Build the skills that’ll drive your career into six figures: Get Djamgatech.

  • Microsoft and OpenAI are reportedly collaborating on a significant project to create a U.S.-based datacenter for an AI supercomputer named “Stargate,” estimated to cost over $115 billion and utilize millions of GPUs.
  • The supercomputer aims to be the largest among the datacenters planned by the two companies within the next six years, with Microsoft covering the costs and aiming for a launch by 2028.
  • The project, considered to be in phase 5 of development, requires innovative solutions for power, cooling, and hardware efficiency, including a possible shift away from relying on Nvidia’s InfiniBand in favor of Ethernet cables.
  • Source

🗣 OpenAI unveils voice-cloning tool

  • OpenAI has developed a text-to-voice generation platform named Voice Engine, capable of creating a synthetic voice from just a 15-second voice clip.
  • The platform is in limited access, serving entities like the Age of Learning and Livox, and is being used for applications from education to healthcare.
  • With concerns around ethical use, OpenAI has implemented usage policies, requiring informed consent and watermarking audio to ensure transparency and traceability.
  • Source

📈 Amazon’s AI team faces pressure to outperform Anthropic’s Claude models by mid-year

  • Amazon has invested $4 billion in AI startup Anthropic, but is also developing a competing large-scale language model called Olympus.
  • Olympus is supposed to surpass Anthropic’s latest Claude model by the middle of the year and has “hundreds of billions of parameters.”
  • So far, Amazon has had no success with its own language models. Employees are unhappy with Olympus’ development time and are considering switching to Anthropic’s models.
  • Source

🚫 Microsoft Copilot has been blocked on all Congress-owned devices

  • The US House of Representatives has banned its staff from using Microsoft’s AI chatbot Copilot due to cybersecurity concerns over potential data leaks.
  • Microsoft plans to remove Copilot from all House devices and is developing a government-specific version aimed at meeting federal security standards.
  • The ban specifically targets the commercial version of Copilot, with the House open to reassessing a government-approved version upon its release.
  • Source

Official NYC chatbot is encouraging small businesses to break the law.LINK

ChatGPT’s responses now include source references but for paid users.LINK

Next-generation AI semiconductor devices mimic the human brain.LINK

Voicecraft: I’ve never been more impressed in my entire life !

Voicecraft
Voicecraft

The maintainers of Voicecraft published the weights of the model earlier today, and the first results I get are incredible.

Here’s only one example, it’s not the best, but it’s not cherry-picked, and it’s still better than anything I’ve ever gotten my hands on !

Here’s the Github repository for those interested: https://github.com/jasonppy/VoiceCraft

A daily chronicle of AI Innovations: March 29th, 2024: 💥 Apple files lawsuit against former engineer for leaking details of projects he wanted to kill; Apple files lawsuit against former engineer for leaking details of projects he wanted to kill; Microsoft tackles Gen AI risks with new Azure AI tools; AI21 Labs’ Jamba triples AI throughput ; Google DeepMind’s AI fact-checker outperforms humans ; X’s Grok gets a major upgrade; Lightning AI partners with Nvidia to launch Thunder AI compiler

💥 Apple files lawsuit against former engineer for leaking details of projects he wanted to kill

  • Apple has filed a lawsuit against former employee Andrew Aude for leaking confidential information about products like the Vision Pro and Journal app to journalists and competitors, motivated by his desire to “kill” products and features he disagreed with.
  • Aude, who joined Apple in 2016, is accused of sharing sensitive details via encrypted messages and meetings, including over 10,000 text messages to a journalist from The Information.
  • The lawsuit seeks damages, the return of bonuses and stock options, and a restraining order against Aude for disclosing any more of Apple’s confidential information.
  • Source

👮‍♂️ Microsoft launches tools to try and stop people messing with chatbots

  • Microsoft has introduced a new set of tools in Azure to enhance the safety and security of generative AI applications, especially chatbots, aiming to counter risks like abusive content and prompt injections.
  • The suite includes features for real-time monitoring and protection against sophisticated threats, leveraging advanced machine learning to prevent direct and indirect prompt attacks.
  • These developments reflect Microsoft’s ongoing commitment to responsible AI usage, fueled by its significant investment in OpenAI and intended to address the security and reliability concerns of corporate leaders.
  • Source

AI21 Labs’ Jamba triples AI throughput

AI21 Labs has released Jamba, the first-ever production-grade AI model based on the Mamba architecture. This new architecture combines the strengths of both traditional Transformer models and the Mamba SSM, resulting in a model that is both powerful and efficient. Jamba boasts a large context window of 256K tokens, while still fitting on a single GPU.

AI21 Labs’ Jamba triples AI throughput
AI21 Labs’ Jamba triples AI throughput

Jamba’s hybrid architecture, composed of Transformer, Mamba, and mixture-of-experts (MoE) layers, optimizes for memory, throughput, and performance simultaneously.

The model has demonstrated remarkable results on various benchmarks, matching or outperforming state-of-the-art models in its size class. Jamba is being released with open weights under Apache 2.0 license and will be accessible from the NVIDIA API catalog.

Why does this matter?

Jamba’s hybrid architecture makes it the only model capable of processing 240k tokens on a single GPU. This could make AI tasks like machine translation and document analysis much faster and cheaper, without requiring extensive computing resources.

Source

Google DeepMind’s AI fact-checker outperforms humans

Google DeepMind has developed an AI system called Search-Augmented Factuality Evaluator (SAFE) that can evaluate the accuracy of information generated by large language models more effectively than human fact-checkers. In a study, SAFE matched human ratings 72% of the time and was correct in 76% of disagreements with humans.

Google DeepMind's AI fact-checker outperforms humans
Google DeepMind’s AI fact-checker outperforms humans
Ace the Microsoft Azure Fundamentals AZ-900 Certification Exam: Pass the Azure Fundamentals Exam with Ease

While some experts question the use of “superhuman” to describe SAFE’s performance, arguing for benchmarking against expert fact-checkers, the system’s cost-effectiveness is undeniable, being 20 times cheaper than human fact-checkers.

Why does this matter?

As language models become more powerful and widely used, SAFE could combat misinformation and ensure the accuracy of AI-generated content. SAFE’s efficiency could be a game-changer for consumers relying on AI for tasks like research and content creation.

Source

X’s Grok gets a major upgrade

X.ai, Elon Musk’s AI startup, has introduced Grok-1.5, an upgraded AI model for their Grok chatbot. This new version enhances reasoning skills, especially in coding and math tasks, and expands its capacity to handle longer and more complex inputs with a 128,000-token context window.

X’s Grok gets a major upgrade
X’s Grok gets a major upgrade

Grok chatbots are known for their ability to discuss controversial topics with a rebellious touch. The improved model will first be tested by early users on X, with plans for wider availability later. This release follows the open-sourcing of Grok-1 and the inclusion of the chatbot in X’s $8-per-month Premium plan.

Why does this matter?

This is significant because Grok-1.5 represents an advancement in AI assistants, potentially offering improved help with complex tasks and better understanding of user intent through its larger context window and real-time data ability. This could impact how people interact with chatbots in the future, making them more helpful and reliable.

Source

What Else Is Happening in AI on March 29th, 2024❗

🛡️Microsoft tackles Gen AI risks with new Azure AI tools

Microsoft has launched new Azure AI tools to address the safety and reliability risks associated with generative AI. The tools, currently in preview, aim to prevent prompt injection attacks, hallucinations, and the generation of personal or harmful content. The offerings include Prompt Shields, prebuilt templates for safety-centric system messages, and Groundedness Detection.  (Link)

🤝Lightning AI partners with Nvidia to launch Thunder AI compiler

Lightning AI, in collaboration with Nvidia, has launched Thunder, an open-source compiler for PyTorch, to speed up AI model training by optimizing GPU usage. The company claims that Thunder can achieve up to a 40% speed-up for training large language models compared to unoptimized code. (Link)

🥊SambaNova’s new AI model beats Databricks’ DBRX

SambaNova Systems’ Samba-CoE v0.2 Large Language Model outperforms competitors like Databricks’ DBRX, MistralAI’s Mixtral-8x7B, and xAI’s Grok-1. With 330 tokens per second using only 8 sockets, Samba-CoE v0.2 demonstrates remarkable speed and efficiency without sacrificing precision. (Link)

🌍Google.org launches Accelerator to empower nonprofits with Gen AI

Google.org has announced a six-month accelerator program to support 21 nonprofits in leveraging generative AI for social impact. The program provides funding, mentorship, and technical training to help organizations develop AI-powered tools in areas such as climate, health, education, and economic opportunity, aiming to make AI more accessible and impactful. (Link)

📱Pixel 8 to get on-device AI features powered by Gemini Nano

Google is set to introduce on-device AI features like recording summaries and smart replies on the Pixel 8, powered by its small-sized Gemini Nano model. The features will be available as a developer preview in the next Pixel feature drop, marking a shift from Google’s primarily cloud-based AI approach. (Link)

A daily chronicle of AI Innovations: March 28th, 2024: ⚡ DBRX becomes world’s most powerful open-source LLM 🏆 Claude 3 Opus crowned the top user-rated chatbot, beating OpenAI’s GPT-4 💙 Empathy meets AI: Hume AI’s EVI redefines voice interaction

DBRX becomes world’s most powerful open source LLM

Databricks has released DBRX, a family of open-source large language models setting a new standard for performance and efficiency.  The series includes DBRX Base and DBRX Instruct, a fine-tuned version designed for few-turn interactions. Developed by Databricks’ Mosaic AI team and trained using NVIDIA DGX Cloud, these models leverage an optimized mixture-of-experts (MoE) architecture based on the MegaBlocks open-source project. This architecture allows DBRX to achieve up to twice the compute efficiency of other leading LLMs.

DBRX becomes world’s most powerful open source LLM
DBRX becomes world’s most powerful open source LLM

In terms of performance, DBRX outperforms open-source models like Llama 2 70B, Mixtral-8x7B, and Grok-1 on industry benchmarks for language understanding, programming, and math. It also surpasses GPT-3.5 on most of these benchmarks, although it still lags behind GPT-4. DBRX is available under an open license with some restrictions and can be accessed through GitHub, Hugging Face, and major cloud platforms. Organizations can also leverage DBRX within Databricks’ Data Intelligence Platform.

Why does this matter?

With DBRX, organizations can build and fine-tune powerful proprietary models using their own internal datasets, ensuring full control over their data rights. As a result, DBRX is likely to accelerate the trend of organizations moving away from closed models and embracing open alternatives that offer greater control and customization possibilities.

Source

Claude 3 Opus crowned the top user-rated chatbot, beating OpenAI’s GPT-4

Anthropic’s Claude 3 Opus has overtaken OpenAI’s GPT-4 to become the top-rated chatbot on the Chatbot Arena leaderboard. This marks the first time in approximately a year since GPT-4’s release that another language model has surpassed it in this benchmark, which ranks models based on user preferences in randomized head-to-head comparisons. Anthropic’s cheaper Haiku and mid-range Sonnet models also perform impressively, coming close to the original GPT-4’s capabilities at a significantly lower cost.

Claude 3 Opus crowned the top user-rated chatbot, beating OpenAI’s GPT-4
Claude 3 Opus crowned the top user-rated chatbot, beating OpenAI’s GPT-4

While OpenAI still dominates the market, especially among regular users with ChatGPT, this development and recent leadership changes at OpenAI have helped Anthropic gain ground. However, OpenAI is rumored to be preparing to launch an even more advanced “GPT-4.5” or “GPT-5” model as soon as this summer, which CEO Sam Altman has teased will be “amazing,” potentially allowing them to retake the lead from Anthropic’s Claude 3 Opus.

Why does this matter?

Claude’s rise to the top of the Chatbot Arena leaderboard shows that OpenAI is not invincible and will face stiff competition in the battle for AI supremacy. With well-resourced challengers like Anthropic and Google, OpenAI will need to move fast and innovate boldly to maintain its top position. Ultimately, this rivalry will benefit everyone as it catalyzes the development of more powerful, capable, and hopefully beneficial AI systems that can help solve humanity’s major challenges.

Source

Empathy meets AI: Hume AI’s EVI redefines voice interaction

In a significant development for the AI community, Hume AI has introduced a new conversational AI called Empathic Voice Interface (EVI). What sets EVI apart from other voice interfaces is its ability to understand and respond to the user’s tone of voice, adding unprecedented emotional intelligence to the interaction. By adapting its language and responses based on the user’s expressions, EVI creates a more human-like experience, blurring the lines between artificial and emotional intelligence.

EVI’s empathic capabilities extend beyond just understanding tone. It can accurately detect the end of a conversation turn, handle interruptions seamlessly, and even learn from user reactions to improve over time. These features, along with its fast and reliable transcription and text-to-speech capabilities, make EVI a highly adaptable tool for various applications. Developers can easily integrate EVI into their projects using Hume’s API, which will be publicly available in April.

Why does this matter?

Emotionally intelligent AI can be revolutionary for industries like healthcare and use cases like customer support, where empathy and emotional understanding are crucial. But we must also consider potential risks, such as overreliance on AI for emotional support or the possibility of AI systems influencing users’ emotions in unintended ways. If developed and implemented ethically, emotionally intelligent AI can greatly enhance how we interact with and benefit from AI technologies in our daily lives.

Source

What Else Is Happening in AI on March 28th, 2024❗

💰 OpenAI launches revenue sharing program for GPT Store builders

OpenAI is experimenting with sharing revenue with builders who create successful apps using GPT in OpenAI’s GPT Store. The goal is to incentivize creativity and collaboration by rewarding builders for their impact on an ecosystem OpenAI is testing so they can make it easy for anyone to build and monetize AI-powered apps. (Link)

🛍️ Google introduces new shopping features to refine searches

Google is rolling out new shopping features that allow users to refine their searches and find items they like more easily. The Style Recommendations feature lets shoppers rate items in their searches, helping Google pick up on their preferences. Users can also specify their favorite brands to instantly bring up more apparel from those selections.  (Link)

🗣️ rabbit’s r1 device gets ultra-realistic voice powered by ElevenLabs

ElevenLabs has partnered with rabbit to integrate its high-quality, low-latency voice AI into rabbit’s r1 AI companion device. The collaboration aims to make the user experience with r1 more natural and intuitive by allowing users to interact with the device using voice commands. (Link)

💸 AI startup Hume raises $50M to build emotionally intelligent conversational AI

AI startup Hume has raised $50 million in a Series B funding round, valuing the company at $219 million. Hume’s AI technology can detect over 24 distinct emotional expressions in human speech and generate appropriate responses. The startup’s AI has been integrated into applications across healthcare, customer service, and productivity, with the goal of providing more context and empathy in AI interactions. (Link)

💻 Lenovo launches AI-enhanced PCs in a push for innovation and differentiation

Lenovo revealed a new lineup of AI-powered PCs and laptops at its Innovate event in Bangkok, Thailand. The company showcased the dual-screen Yoga Book 9i, Yoga Pro 9i with an AI chip for performance optimization and AI-enhanced Legion gaming laptops. Lenovo hopes to differentiate itself in the crowded PC market and revive excitement with these AI-driven innovations. (Link)

Study shows ChatGPT can produce medical record notes 10 times faster than doctors without compromising quality

The AI model ChatGPT can write administrative medical notes up to 10 times faster than doctors without compromising quality. This is according to a study conducted by researchers at Uppsala University Hospital and Uppsala University in collaboration with Danderyd Hospital and the University Hospital of Basel, Switzerland. The research is published in the journal Acta Orthopaedica.

Source

Microsoft Copilot AI will soon run locally on PCs

Microsoft’s Copilot AI service is set to run locally on PCs, Intel told Tom’s Hardware. The company also said that next-gen AI PCs would require built-in neural processing units (NPUs) with over 40 TOPS (trillion operations per second) of power — beyond the capabilities of any consumer processor on the market.

Intel said that the AI PCs would be able to run “more elements of Copilot” locally. Currently, Copilot runs nearly everything in the cloud, even small requests. That creates a fair amount of lag that’s fine for larger jobs, but not ideal for smaller jobs. Adding local compute capability would decrease that lag, while potentially improving performance and privacy as well.

Microsoft was previously rumored to require 40 TOPS on next-gen AI PCs (along with a modest 16GB of RAM). Right now, Windows doesn’t make much use of NPUs, apart from running video effects like background blurring for Surface Studio webcams. ChromeOS and macOS both use NPU power for more video and audio processing features, though, along with OCR, translation, live transcription and more, Ars Technica noted.

Source

A daily chronicle of AI Innovations: March 27th, 2024: 🔥 Microsoft study reveals the 11 by 11 tipping point for AI adoption 🤖 A16z spotlights the rise of generative AI in enterprises 🚨 Gaussian Frosting revolutionizes surface reconstruction in 3D modeling 🤖OpenAI unveils exciting upcoming features for GPT-4 and DALL-E 3 🤖 Adobe unveils GenStudio: AI-powered ad creation platform

Microsoft study reveals the 11 by 11 tipping point for AI adoption

Microsoft’s study on AI adoption in the workplace revealed the “11-by-11 tipping point,” where users start seeing AI’s value by saving 11 minutes daily. The study involved 1,300 Copilot for Microsoft 365 users and showed that 11 minutes of time savings is enough for most people to find AI useful.

Microsoft study reveals the 11 by 11 tipping point for AI adoption
Microsoft study reveals the 11 by 11 tipping point for AI adoption

Over 11 weeks, users reported improved productivity, work enjoyment, work-life balance, and fewer meetings. This “11-by-11 tipping point” signifies the time it takes for individuals to experience AI’s benefits in their work fully.

Why does it matter?

The study offers insights for organizations aiming to drive AI adoption among their employees. Businesses can focus on identifying specific use cases that deliver immediate benefits like time and cost savings. It will help organizations encourage employees to embrace AI, increasing productivity and improving work experiences.

Source

A16z spotlights the rise of generative AI in enterprises

A groundbreaking report by the influential tech firm a16z  unveils the rapid integration of generative AI technologies within the corporate sphere. The report highlights essential considerations for business leaders to harness generative AI effectively. It covers resource allocation, model selection, and innovative use cases, providing a strategic roadmap for enterprises.

A16z spotlights the rise of generative AI in enterprises
A16z spotlights the rise of generative AI in enterprises

An increased financial commitment from businesses marks the adoption of generative AI. Industry leaders are tripling their investments in AI technologies, emphasizing the pivotal role of generative AI in driving innovation and efficiency.

A16z spotlights the rise of generative AI in enterprises
A16z spotlights the rise of generative AI in enterprises

The shift towards integrating AI into core operations is evident. There is a focus on measuring productivity gains and cost savings and quantifying impact on key business metrics.

Why does it matter?

The increasing budgets allocated to generative AI signal its strategic importance in driving innovation and productivity in enterprises. This highlights AI’s transformative potential to provide a competitive edge and unlock new opportunities. Generative AI can revolutionize various business operations and help gain valuable insights by leveraging diverse data types.

Source

Gaussian Frosting revolutionizes surface reconstruction in 3D modeling

At the international conference on computer vision, researchers presented a new method to improve surface reconstruction using Gaussian Frosting. This technique automates the adjustment of Poisson surface reconstruction hyperparameters, resulting in significantly improved mesh reconstruction.

Gaussian Frosting revolutionalizes surface reconstruction in 3D modeling
Gaussian Frosting revolutionalizes surface reconstruction in 3D modeling

The method showcases the potential for scaling up mesh reconstruction while preserving intricate details and opens up possibilities for advanced geometry and texture editing. This work marks a significant step forward in surface reconstruction methods, promising advancements in 3D modeling and visualization techniques.

Why does it matter?

The new method demonstrates how AI enhances surface reconstruction techniques, improving mesh quality and enabling advanced editing in 3D modeling. This has significant implications for revolutionizing how 3D models are created, edited, and visualized across various industries.

Source

AIs can now learn and talk with each other like humans do.

This seems an important step toward AGI and vastly improved productivity.

“Once these tasks had been learned, the network was able to describe them to a second network — a copy of the first — so that it could reproduce them. To our knowledge, this is the first time that two AIs have been able to talk to each other in a purely linguistic way,’’ said lead author of the paper Alexandre Pouget, leader of the Geneva University Neurocenter, in a statement.”

“While AI-powered chatbots can interpret linguistic instructions to generate an image or text, they can’t translate written or verbal instructions into physical actions, let alone explain the instructions to another AI.

However, by simulating the areas of the human brain responsible for language perception, interpretation and instructions-based actions, the researchers created an AI with human-like learning and communication skills.”

Source

What Else Is Happening in AI on March 27th, 2024❗

🤖 Adobe unveils GenStudio: AI-powered ad creation platform

Adobe introduced GenStudio, an AI-powered ad creation platform, during its Summit event. GenStudio is a centralized hub for promotional campaigns, offering brand kits, copy guidance, and preapproved assets. It also provides generative AI-powered tools for generating backgrounds and ensuring brand consistency. Users can quickly create ads for email and social media platforms like Facebook, Instagram, and LinkedIn. (Link)

🧑‍💼Airtable introduces AI summarization for enhanced productivity

Airtable has introduced Airtable AI, which provides generative AI summarization, categorization, and translation to users. This feature allows quick insights and understanding of information within workspaces, enabling easy sharing of valuable insights with teams. Airtable AI automatically applies categories and tags to information, routes action items to the relevant team, and generates emails or social posts with a single button tap. (Link)

🤝Microsoft Teams enhances Copilot AI features for improved collaboration

Microsoft is introducing smarter Copilot AI features in Microsoft Teams to enhance collaboration and productivity. The updates include new ways to invoke the assistant during meeting chats and summaries, making it easier to catch up on missed meetings by combining spoken transcripts and written chats into a single view. Microsoft is launching new hybrid meeting features, such as automatic camera switching for remote participants and speaker recognition for accurate transcripts. (Link)

🤖OpenAI unveils exciting upcoming features for GPT-4 and DALL-E 3

OpenAI is preparing to introduce new features for its GPT-4 and DALL-E 3 models. For GPT-4, OpenAI plans to remove the message limit, implement a Model Tuner Selector, and allow users to upgrade responses from GPT-3.5 to GPT-4 with a simple button push. On the DALL-E 3 front, OpenAI is working on an image editor with inpainting functionality. These upcoming features demonstrate OpenAI’s commitment to advancing AI capabilities. (Link)

🔍Apple Chooses Baidu’s AI for iPhone 16 in China

Apple has reportedly chosen Baidu to provide AI technology for its upcoming iPhone 16 and other devices in China. This decision comes as Apple faces challenges due to stagnation in iPhone innovation and competition from Huawei. Baidu’s Ernie Bot will be included in the Chinese version of the iPhone 16, Mac OS, and iOS 18. Despite discussions with Alibaba Group Holding and a Tsinghua University AI startup, Apple selected Baidu’s AI technology for compliance. (Link)

Meta CEO, Mark Zuckerberg, is directly recruiting AI talent from Google’s DeepMind with personalized emails.

Meta CEO, Mark Zuckerberg, is attempting to recruit top AI talent from Google’s DeepMind (their AI research unit). Personalised emails, from Zuckerberg himself, have been sent to a few of their top researchers, according to a report from The Information, which cited individuals that had seen the messages. In addition to this, the researchers are being hired without having to do any interviews, and, a previous policy which Meta had in place – to not offer higher offers to candidates with competing job offers – has been relaxed.

Zuckerberg appears to be on a hiring spree to build Meta into a position of being a dominant player in the AI space.

OpenAI’s Sora Takes About 12 Minutes to Generate 1 Minute Video on NVIDIA H100. Source.

Apple on Tuesday announced that its annual developers conference, WWDC, will take place June 10 through June 14. Source.

Elon Musk says all Premium subscribers on X will gain access to AI chatbot Grok this week. Source.

Intel unveils AI PC program for software developers and hardware vendors. Source.

London-made HIV injection has potential to cure millions worldwide

Source

A daily chronicle of AI Innovations: March 26th, 2024 : 🔥 Zoom launches all-in-one modern AI collab platform; 🤖 Stability AI launches instruction-tuned LLM; 🚨 Stability AI CEO resigns to focus on decentralized AI; 🔍 WhatsApp to integrate Meta AI directly into its search bar; 🥊 Google, Intel, and Qualcomm challenge Nvidia’s dominance in AI; 🎬 OpenAI pitches Sora to Hollywood studios

Zoom launches all-in-one modern AI collab platform

Zoom launched Zoom Workplace, an AI collaboration platform that integrates many tools to improve teamwork and productivity. With over 40 new features, including AI Companion updates for Zoom Phone, Team Chat, Events, and Contact Center, as well as the introduction of Ask AI Companion, Zoom Workplace simplifies workflows within a familiar interface.

The platform offers customization options, meeting features, and improved collaboration tools across Zoom’s ecosystem. Zoom Business Services, integrated with Zoom Workplace, offers AI-driven marketing, customer service, and sales solutions. It expands digital communication channels and provides real-time insights for better agent management.

Why does this matter?

This intelligent platform will increase productivity by automating tasks, summarizing interactions, and personalizing user experiences. This move positions Zoom as a frontrunner in the race to integrate AI into everyday work tools, which will reshape how teams communicate and collaborate.

Source

Stability AI launches instruction-tuned LLM

Stability AI has introduced Stable Code Instruct 3B, a new instruction-tuned large language model. It can handle various software development tasks, such as code completion, generation, translation, and explanation, as well as creating database queries with simple instructions.

Stable Code Instruct 3B claims to outperform rival models like CodeLlama 7B Instruct and DeepSeek-Coder Instruct 1.3B in terms of accuracy, understanding natural language instructions, and handling diverse programming languages. The model is accessible for commercial use with a Stability AI Membership, while its weights are freely available on Hugging Face for non-commercial projects.

Why does this matter?

This model simplifies development workflows and complex tasks by providing contextual code completion, translation, and explanations. Businesses can prototype, iterate and ship software products faster thanks to its high performance and low hardware requirements.

Source

Stability AI CEO resigns because of centralized AI

  • Stability AI CEO Emad Mostaque steps down to focus on decentralized AI, advocating for transparent governance in the industry.

  • Mostaque’s departure follows the appointment of interim co-CEOs Shan Shan Wong and Christian Laforte.

  • The startup, known for its image generation tool, faced challenges including talent loss and financial struggles.

  • Mostaque emphasized the importance of generative AI R&D over revenue growth and highlighted the potential economic value of open models in regulated industries.

  • The AI industry witnessed significant changes with Inflection AI co-founders joining Microsoft after raising $1.5 billion.

Source

Estimating Sora’s power requirements

Quoting the compute estimates of Sora from the factorial funds blog

Estimating Sora's power requirements
Estimating Sora’s power requirements

A 15% penetration of Sora for videos with realistic video generation demand and utilization will require about 720k Nvidia H100 GPUs. Each H100 requires about 700 Watts of power supply.

720,000 x 700 = 504 Megawatts.

By comparison, even the largest ever fully solar powered plan in America (Ivanpah Solar Power Facility) produces about 377 Megawats.

While these power requirements can be met with other options like nuclear plants and even coal/hydro plants of big sizes … are we really entering the power game for electricity ?

( it is currently a power game on compute)

What Else Is Happening in AI on March 26th, 2024❗

💬 The Financial Times has introduced Ask FT, a new GenAI chatbot

It provides curated, natural-language responses to queries about recent events and broader topics covered by the FT. Ask FT is powered by Anthropic’s Claude and is available to a selected group of subscribers as it is under testing. (Link)

🔍 WhatsApp to integrate Meta AI directly into its search bar

The latest Android WhatsApp beta update will embed Meta AI directly into the search bar. This feature will allow users to type queries into the search bar and receive instant AI-powered responses without creating a separate Meta AI chat. The update will also allow users to interact with Meta AI even if they choose to hide the shortcut. (Link)

🥊 Google, Intel, and Qualcomm challenge Nvidia’s dominance in AI 

Qualcomm, Google, and Intel are targeting NVIDIA’s software platforms like CUDA. They plan to create open-source tools compatible with multiple AI accelerator chips through the UXL Foundation. Companies are investing over $4 billion in startups developing AI software to loosen NVIDIA’s grip on the field. (Link)

🤖 Apple takes a multi-vendor approach for generative AI in iOS 18

Apple is reportedly in talks with Alphabet, OpenAI, and Anthropic to integrate generative AI capabilities from multiple vendors into iOS 18. This multi-vendor approach aligns with Apple’s efforts to balance advanced AI features with privacy considerations, which are expected to be detailed at WWDC 2024 during the iOS 18 launch. (Link)

🎬 OpenAI pitches Sora to Hollywood studios

OpenAI is actively engaging with Hollywood studios, directors, and talent agencies to integrate Sora into the entertainment industry. The startup has scheduled meetings in Los Angeles to showcase Sora’s capabilities and encourage partnerships, with CEO Sam Altman attending events during the Oscars weekend. (Link)

LLM providers charge you per token, but their tokens are not always comparable. So if you are putting Python code through GPT-4 and Claude 3, it would cost you 25% more tokens to do so with Claude, due to difference in their tokenisers (note: this is different to cost per token, it just means you will have more tokens to pay for)

Some observations:
– OpenAI’s GPT-4 & 3.5 tokeniser is the most efficient for English and Python
– Gemini absolutely demolishes the competition in the three languages I tested: French (-11%), Chinese (-43%) and Hebrew (-54%)
– If your use cases is non-English, really worth looking at Gemini models – the difference in cost will likely be very noticeable
– Llama 2 ranked at the bottom of all of my tests
– Mistral was kind of disappointing on French (+16% worse than GPT), the reason why I picked French was that I assumed they’d do better

Methodology notes:
– The study will be limited, I only compared 7 individual bits of text/code – so results in practice would vary
– I have used this tokeniser playground (https://huggingface.co/spaces/Xenova/the-tokenizer-playground) for GPT, Mistral and Llama. I found it to be inaccurate (or old?) for Claude 3 and they didn’t have Gemini, so I did these separately
– Tokens are only part of the puzzle, more efficient tokenisation won’t necessarily mean better performance or overall lower cost
– If you want to learn about tokenisers, I recommend watching this video from Andrej Karpathy, even the first 10-20 minutes will be really worth your time https://www.youtube.com/watch?v=zduSFxRajkE

No alt text provided for this image

Source: Peter Gostev

A daily chronicle of AI Innovations: March 25th, 2024 : 🤝 Apple could partner with OpenAI, Gemini, Anthropic; 🤖 Chatbots more likely to change your mind than another human, study says; 🤖 Chatbots more likely to change your mind than another human, study says; Verbal Reasoning Test – Opus is better than 93% of people, Gemini 1.5 Pro 59%, GPT-4 Turbo only 36%; Apple’s Tim Cook says AI essential tool for businesses to reduce carbon footprint; Suno V3: Song-on-demand AI is getting insanely good; The first patient with a Neuralink brain-computer implant played Nintendo’s Mario Kart video game with his mind in an impressive new demo video

🤝 Apple could partner with OpenAI, Gemini, Anthropic

  • Apple is discussing with Alphabet, OpenAI, Anthropic, and potentially Baidu to integrate generative AI into iOS 18, considering multiple partners rather than a single one.
  • The collaboration could lead to a model where iPhone users might choose their preferred AI provider, akin to selecting a default search engine in a web browser.
  • Reasons for partnering with external AI providers include financial benefits, the possibility to quickly adapt through partnership changes or user preferences, and avoiding the complexities of developing and maintaining cloud-based generative AI in-house.
  • Source

🌐 EU probes Apple, Google, Meta under new digital law 

  • The European Commission has initiated five investigations into Apple, Google, and Meta for potential non-compliance with the Digital Markets Act (DMA), focusing on app store rules, search engine preferencing, and advertisement targeting models.
  • Investigations will also examine Apple’s app distribution fee structure and Amazon’s product preferencing, while Meta is given six months to make Messenger interoperable with other messaging services.
  • Companies may face fines up to 10% of their annual global revenue for DMA non-compliance, with the possibility of increased penalties for repeated infringements.
  • Source

🤖 Chatbots more likely to change your mind than another human, study says

  • A study found that personalized chatbots, such as GPT-4, are more likely to change people’s minds compared to human debaters by using tailored arguments based on personal information.
  • The research conducted by the École Polytechnique Fédérale de Lausanne and the Italian Fondazione Bruno Kessler showed an 81.7 percent increase in agreement when GPT-4 had access to participants’ personal data like age, gender, and race.
  • Concerns were raised about the potential misuse of AI in persuasive technologies, especially with the ability to generate detailed user profiles from online activities, urging online platform operators to counter such strategies.
  • Source

OpenAI CEO’s £142 Million Gamble On Unlocking the Secrets to Longer Life, Altman’s vision of extended lifespans may be achievable

Biotech startup Retro Biosciences is undertaking a one-of-a-kind experiment housed in shipping containers, funded by a $180 (£142.78) million investment by tech leader Sam Altman to increase lifespan.

Altman, the 38-year-old tech heavyweight, has been a significant player in the industry. Despite his young age, Altman took the tech realm by storm with offerings like ChatGPT and Sora. Unsurprisingly, his involvement in these groundbreaking projects has propelled him to a level of influence rivaling Mark Zuckerberg and Elon Musk, who is currently embroiled in a lawsuit with OpenAI.

It is also worth noting that the Altman-led AI startup is reportedly planning to launch its own AI-powered search engine to challenge Google’s search dominance. Altman’s visionary investments in tech giants like Reddit, Stripe, Airbnb, and Instacart propelled him to billionaire status. They cemented his influence as a tech giant who relentlessly pushed the boundaries of the industry’s future.

Source

Nvidia announces AI-powered health care 'agents' that outperform nurses — and cost $9 an hour
Nvidia announces AI-powered health care ‘agents’ that outperform nurses — and cost $9 an hour

Apple researchers explore dropping “Siri” phrase and listening with AI instead

  • Apple researchers are investigating the use of AI to identify when a user is speaking to a device without requiring a trigger phrase like ‘Siri’.

  • A study involved training a large language model using speech and acoustic data to detect patterns indicating the need for assistance from the device.

  • The model showed promising results, outperforming audio-only or text-only models as its size increased.

  • Eliminating the ‘Hey Siri’ prompt could raise concerns about privacy and constant listening by devices.

  • Apple’s handling of audio data has faced scrutiny in the past, leading to policy changes regarding user data and Siri recordings.

Source

Suno V3 can do multiple languages in one song. This one is English, Portuguese, Japanese, and Italian. Incredible.

Beneath the vast sky, where dreams lay rooted deep, Mountains high and valleys wide, secrets they keep. Ground beneath my feet, firm and ever true, Earth, you give us life, in shades of brown and green hue.

Sopra o vento, mensageiro entre o céu e o mar, Carregando sussurros, histórias a contar. Dançam as folhas, em um balé sem fim, Vento, o alento invisível, guiando o destino assim.

火のように、情熱が燃えて、 光と暖かさを私たちに与えてくれる。 夜の暗闇を照らす、勇敢な炎、 生命の力、絶えず変わるゲーム。

Acqua, misteriosa forza che tutto scorre, Nei fiumi, nei mari, la vita che ci offre. Specchio del cielo, in te ci riflettiamo, Acqua, fonte di vita, a te ci affidiamo.

Listen here

OpenAI Heading To Hollywood To Pitch Revolutionary “Sora”

Some of the most important meetings in Hollywood history will take place in the coming week, as OpenAI hits Hollywood to show the potential of its “Sora” software to studios, talent agencies, and media executives.

Bloomberg is reporting that OpenAI wants more filmmakers to become familiar with Sora, the text-to-video generator that potentially could upend the way movies are made.

Soon, Everyone Will Own a Robot, Like a Car or Phone Today. Says Figure AI founder

Brett Adcock, the founder of FigureAI robots, the company that recently released a demo video of its humanoid robot conversing with a human while performing tasks, predicts that everyone will own a robot in the future. “Similar to owning a car or phone today,” he said – hinting at the universal adoption of robots as an essential commodity in the future.

“Every human will own a robot in the future, similar to owning a car/phone today,” said Adcock.

A few months ago, Adcock called 2024 the year of Embodied AI, indicating how the future comprises AI in a body form. With robots learning to perform low-complexity tasks, such as picking trash, placing dishes, and even using the coffee machine, Figure robots are being trained to assist a person with house chores.

Source

WhatsApp to embed Meta AI directly into search bar for instant assistance: Report. 

WhatsApp is on the brink of a transformation in user interaction as it reportedly plans to integrate Meta AI directly into its search bar. This move promises to simplify access to AI assistance within the app, eliminating the need for users to navigate to a separate Meta AI conversation.

WhatsApp to embed Meta AI directly into search bar for instant assistance
WhatsApp to embed Meta AI directly into search bar for instant assistance

Source

How People are really using Gen AI

Top-level themes:

1️⃣ Technical Assistance & Troubleshooting (23%)
2️⃣ Content Creation & Editing (22%)
3️⃣ Personal & Professional Support (17%)
4️⃣ Learning & Education (15%)
5️⃣ Creativity & Recreation (13%)
6️⃣ Research, Analysis & Decision Making (10%)

What users are doing:

✔Generating ideas
✔Specific search
✔Editing text
✔Drafting emails
✔Simple explainers
✔Excel formulas
✔Sampling data

🤔 Do you see AI as a tool to enhance your work, or as a threat that could take over your job?

Source: HBR
Image credit: Filtered

How People are really using Gen AI
How People are really using Gen AI

A daily chronicle of AI Innovations: March 22nd, 2024 : 🤖 Nvidia’s Latte 3D generates text-to-3D in seconds! 💰 Saudi Arabia to invest $40 billion in AI 🚀 Open Interpreter’s 01 Light personal pocket AI agent. 🤖 Microsoft introduces a new Copilot for better productivity.
💡Quiet-STaR: LMs can self-train to think before responding
🤯Neuralink’s first brain chip patient plays chess with his mind

Experience the transformative capabilities of AI with “Read Aloud For Me – AI Dashboard” – your ultimate AI Dashboard and Hub.

Nvidia’s Latte 3D generates text-to-3D in seconds!

NVIDIA introduces Latte3D, facilitating the conversion of text prompts into detailed 3D models in less than a second. Developed by NVIDIA’s Toronto lab, Latte3D sets a new standard in generative AI models with its remarkable blend of speed and precision.

Nvidia’s Latte 3D generates text-to-3D in seconds!
Nvidia’s Latte 3D generates text-to-3D in seconds!

LATTE3D has two stages: first, NVIDIA’s team uses volumetric rendering to train the texture and geometry robustly, and second, it uses surface-based rendering to train only the texture for quality enhancement. Both stages use amortized optimization over prompts to maintain fast generation.

Nvidia’s Latte 3D generates text-to-3D in seconds!
Nvidia’s Latte 3D generates text-to-3D in seconds!

What sets Latte3D apart is its extensive pretraining phase, enabling the model to quickly adapt to new tasks by drawing on a vast repository of learned patterns and structures. This efficiency is achieved through a rigorous training regime that includes a blend of 3D datasets and prompts from ChatGPT.

Why does it matter?

AI models such as NVIDIA’s Latte3D have significantly reduced the time required to generate 3D visualizations from an hour to a few minutes compared to a few years ago. This technology has the potential to significantly accelerate the design and development process in various fields, such as the video game industry, advertising, and more.

Source

Quiet-STaR: LMs can self-train to think before responding

A groundbreaking study demonstrates the successful training of large language models (LM) to reason from text rather than specific reasoning tasks. The research introduces a novel training approach, Quiet STaR, which utilizes a parallel sampling algorithm to generate rationales from all token positions in a given string.

Quiet-STaR: LMs can self-train to think before responding
Quiet-STaR: LMs can self-train to think before responding

This technique integrates meta tokens to indicate when the LM should generate a rationale and when it should make a prediction based on the rationale, revolutionizing the understanding of LM behavior. Notably, the study shows that thinking enables the LM to predict difficult tokens more effectively, leading to improvements with longer thoughts.

The research introduces powerful advancements, such as a non-myopic loss approach, the application of a mixing head for retrospective determination, and the integration of meta tokens, underpinning a comprehensive leap forward in language model training.

Why does it matter?

These significant developments in language modeling advance the field and have the potential to revolutionize a wide range of applications. This points towards a future where large language models will unprecedentedly contribute to complex reasoning tasks.

Source

Neuralink’s first brain chip patient plays chess with his mind

Elon Musk’s brain chip startup, Neuralink, showcased its first brain chip patient playing chess using only his mind. The patient, Noland Arbaugh, was paralyzed below the shoulder after a diving accident.

Neuralink’s brain implant technology allows people with paralysis to control external devices using their thoughts. With further advancements, Neuralink’s technology has the potential to revolutionize the lives of people with paralysis, providing them with newfound independence and the ability to interact with the world in previously unimaginable ways.

Why does it matter?

Neuralink’s brain chip holds significant importance in AI and human cognition. It has the potential to enhance communication, assist paralyzed individuals, merge human intelligence with AI, and address the risks associated with AI development. However, ethical considerations and potential misuse of this technology must also be carefully examined.

Source

What Else Is Happening in AI on March 22nd, 2024❗

🤖 Microsoft introduces a new Copilot for better productivity.

Microsoft’s new Copilot for Windows and Surface devices is a powerful productivity tool integrating large language models with Microsoft Graph and Microsoft 365 apps to enhance work efficiency. With a focus on delivering AI responsibly while ensuring data security and privacy, Microsoft is dedicated to providing users with innovative tools to thrive in the evolving work landscape. (Link)

💰 Saudi Arabia to invest $40 billion in AI

Saudi Arabia has announced its plan to invest $40 billion in AI to become a global leader. Middle Eastern countries use their sovereign wealth fund, which has over $900 billion in assets, to achieve this goal. This investment aims to position the country at the forefront of the fast-evolving AI sector, drive innovation, and enhance economic growth. (Link)

🎧 Rightsify releases Hydra II to revolutionize AI music generation

Rightsify, a global music licensing leader, introduced Hydra II, the latest AI generation model. Hydra II offers over 800 instruments, 50 languages, and editing tools for customizable, copyright-free AI music. The model is trained on audio, text descriptions, MIDI, chord progressions, sheet music, and stems to create unique generations. (Link)

🚀 Open Interpreter’s 01 Light personal pocket AI agent

The Open Interpreter unveiled 01 Light, a portable device that allows you to control your computer using natural language commands. It’s part of an open-source project to make computing more accessible and flexible. It’s designed to make your online tasks more manageable, helping you get more done and simplify your life. (Link)

🤝 Microsoft’s $650 million Inflection deal: A strategic move
Microsoft has recently entered into a significant deal with AI startup Inflection, involving a payment of $650 million in cash. While the deal may seem like a licensing agreement, it appears to be a strategic move by Microsoft to acquire AI talent while avoiding potential regulatory trouble. (Link)

Microsoft unveiled its first “AI PCs,” with a dedicated Copilot key and Neural Processing Units (NPUs).

Microsoft unveiled its first "AI PCs," with a dedicated Copilot key and Neural Processing Units (NPUs).
Microsoft unveiled its first “AI PCs,” with a dedicated Copilot key and Neural Processing Units (NPUs).

Source: Nvidia

OpenAI Courts Hollywood in Meetings With Film Studios, Directors – from Bloomberg

The artificial intelligence startup has scheduled meetings in Los Angeles next week with Hollywood studios, media executives and talent agencies to form partnerships in the entertainment industry and encourage filmmakers to integrate its new AI video generator into their work, according to people familiar with the matter.

The upcoming meetings are just the latest round of outreach from OpenAI in recent weeks, said the people, who asked not to be named as the information is private. In late February, OpenAI scheduled introductory conversations in Hollywood led by Chief Operating Officer Brad Lightcap. Along with a couple of his colleagues, Lightcap demonstrated the capabilities of Sora, an unreleased new service that can generate realistic-looking videos up to about a minute in length based on text prompts from users. Days later, OpenAI Chief Executive Officer Sam Altman attended parties in Los Angeles during the weekend of the Academy Awards.

In an attempt to avoid defeatism, I’m hoping this will contribute to the indie boom with creatives refusing to work with AI and therefore studios who insist on using it. We’ve already got people on twitter saying this is the end of the industry but maybe only tentpole films as we know them.

Here’s the article without the paywall.

Catherine, the Princess of Wales, has cancer, she announced in a video message released by Kensington Palace on Friday March 22nd, 2024

The recent news surrounding Kate Middleton, the Princess of Wales, revolves around a manipulated family photo that sparked controversy and conspiracy theories. The photo, released by Middleton herself, depicted her with her three children and was met with speculation about potential AI involvement in its editing. However, experts suggest that the image was likely manipulated using traditional photo editing software like Photoshop rather than generative AI

The circumstances surrounding Middleton’s absence from the public eye due to abdominal surgery fueled rumors and intensified scrutiny over the edited photo.

Major news agencies withdrew the image, citing evidence of manipulation in areas like Princess Charlotte’s sleeve cuff and the alignment of elements in the photo.

Despite concerns over AI manipulation, this incident serves as a reminder that not all image alterations involve advanced technology, with this case being attributed to a botched Photoshop job.From an AI perspective, experts highlight how the incident reflects society’s growing awareness of AI technologies and their impact on shared reality. The controversy surrounding the edited photo underscores the need for transparency and accountability in media consumption to combat misinformation and maintain trust in visual content. As AI tools become more accessible and sophisticated, distinguishing between authentic and manipulated media becomes increasingly challenging, emphasizing the importance of educating consumers and technologists on identifying AI-generated content.Kate Middleton, the Princess of Wales, recently disclosed her battle with cancer in a heartfelt statement. Following major abdominal surgery in January, it was initially believed that her condition was non-cancerous. However, subsequent tests revealed the presence of cancer, leading to the recommendation for preventative chemotherapy. The 42-year-old princess expressed gratitude for the support received during this challenging time and emphasized the importance of privacy as she focuses on her treatment and recovery. The news of her diagnosis has garnered an outpouring of support from around the world, with messages of encouragement coming from various public figures and officials.

Nvidia CEO says we’ll see fully AI-generated games in 5-10 years

Nvidia’s CEO, Jensen Huang, predicts the emergence of fully AI-generated games within the next five to ten years. This prediction is based on the development of Nvidia’s next-generation Blackwell AI GPU, the B200. This GPU marks a significant shift in GPU usage towards creating neural networks for generating content rather than traditional rasterization or ray tracing for visual fidelity in games. The evolution of AI in gaming is highlighted as GPUs transition from rendering graphics to processing AI algorithms for content creation, indicating a major transformation in the gaming industry’s future landscape.The integration of AI into gaming represents a paradigm shift that could revolutionize game development and player experiences. Fully AI-generated games have the potential to offer unprecedented levels of customization, dynamic storytelling, and adaptive gameplay based on individual player interactions. This advancement hints at a new era of creativity and innovation in game design but also raises questions about the ethical implications and challenges surrounding AI-generated content, such as ensuring diversity, fairness, and avoiding biases in virtual worlds. Source

Andrew Ng, cofounder of Google Brain & former chief scientist @ Baidu- “I think AI agentic workflows will drive massive AI progress this year — perhaps even more than the next generation of foundation models.

This is an important trend, and I urge everyone who works in AI to pay attention to it.”

AI agentic workflows will drive massive AI progress
AI agentic workflows will drive massive AI progress

I think AI agentic workflows will drive massive AI progress this year — perhaps even more than the next generation of foundation models. This is an important trend, and I urge everyone who works in AI to pay attention to it.

Today, we mostly use LLMs in zero-shot mode, prompting a model to generate final output token by token without revising its work. This is akin to asking someone to compose an essay from start to finish, typing straight through with no backspacing allowed, and expecting a high-quality result. Despite the difficulty, LLMs do amazingly well at this task!

With an agentic workflow, however, we can ask the LLM to iterate over a document many times. For example, it might take a sequence of steps such as:

  • Plan an outline.

  • Decide what, if any, web searches are needed to gather more information.

  • Write a first draft.

  • Read over the first draft to spot unjustified arguments or extraneous information.

  • Revise the draft taking into account any weaknesses spotted.

  • And so on.

This iterative process is critical for most human writers to write good text. With AI, such an iterative workflow yields much better results than writing in a single pass.

Devin’s splashy demo recently received a lot of social media buzz. My team has been closely following the evolution of AI that writes code. We analyzed results from a number of research teams, focusing on an algorithm’s ability to do well on the widely used HumanEval coding benchmark. You can see our findings in the diagram below.

GPT-3.5 (zero shot) was 48.1% correct. GPT-4 (zero shot) does better at 67.0%. However, the improvement from GPT-3.5 to GPT-4 is dwarfed by incorporating an iterative agent workflow. Indeed, wrapped in an agent loop, GPT-3.5 achieves up to 95.1%.

Open source agent tools and the academic literature on agents are proliferating, making this an exciting time but also a confusing one. To help put this work into perspective, I’d like to share a framework for categorizing design patterns for building agents. My team AI Fund is successfully using these patterns in many applications, and I hope you find them useful.

  • Reflection: The LLM examines its own work to come up with ways to improve it.

  • Tool use: The LLM is given tools such as web search, code execution, or any other function to help it gather information, take action, or process data.

  • Planning: The LLM comes up with, and executes, a multistep plan to achieve a goal (for example, writing an outline for an essay, then doing online research, then writing a draft, and so on).

  • Multi-agent collaboration: More than one AI agent work together, splitting up tasks and discussing and debating ideas, to come up with better solutions than a single agent would.

  • Source

A daily chronicle of AI Innovations: March 21st, 2024 : 🕵️‍♂️ Stealing Part of a Production Language Model
🤖 Sakana AI’s method to automate foundation model development
👋 Key Stable Diffusion researchers leave Stability AI  🗣️Character AI’s new feature adds voice to characters with just 10-sec audio 💡Fitbit to get major AI upgrades powered by Google’s ‘Personal Health’ LLM 🔬Samsung creates lab to research chips for AI’s next phase 🤖GitHub’s latest AI tool can automatically fix code vulnerabilities

Stealing Part of a Production Language Model

Researchers from Google, OpenAI, and DeepMind (among others) released a new paper that introduces the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI’s ChatGPT or Google’s PaLM-2.

The attack allowed them to recover the complete embedding projection layer of a transformer language model. It differs from prior approaches that reconstruct a model in a bottom-up fashion, starting from the input layer. Instead, this operates top-down and directly extracts the model’s last layer by making targeted queries to a model’s API. This is useful for several reasons; it

  • Reveals the width of the transformer model, which is often correlated with its total parameter count.
  • Slightly reduces the degree to which the model is a complete “blackbox”
  • May reveal more global information about the model, such as relative size differences between different models

While there appear to be no immediate practical consequences of learning this layer is stolen, it represents the first time that any precise information about a deployed transformer model has been stolen.

Stealing Part of a Production Language Model
Stealing Part of a Production Language Model

Why does this matter?

Though it has limitations, the paper motivates the further study of practical attacks on ML models, in order to ultimately develop safer and more reliable AI systems. It also highlights how small, system-level design decisions impact the safety and security of the full product.

Source

Sakana AI’s method to automate foundation model development

Sakana AI has introduced Evolutionary Model Merge, a general method that uses evolutionary techniques to efficiently discover the best ways to combine different models from the vast ocean of different open-source models with diverse capabilities.

As of writing, Hugging Face has over 500k models in dozens of different modalities that, in principle, could be combined to form new models with new capabilities. By working with the vast collective intelligence of existing open models, this method is able to automatically create new foundation models with desired capabilities specified by the user.

Why does this matter?

Model merging shows great promise and democratizes up model-building. In fact, the current Open LLM Leaderboard is dominated by merged models. They work without any additional training, making it very cost-effective. But we need a more systematic approach.

Evolutionary algorithms, inspired by natural selection, can unlock more effective merging. They can explore vast possibilities, discovering novel and unintuitive combinations that traditional methods and human intuition might miss.

Source

Key Stable Diffusion researchers leave Stability AI

Robin Rombach and other key researchers who helped develop the Stable Diffusion text-to-image generation model have left the troubled, once-hot, now floundering GenAI startup.

Rombach (who led the team) and fellow researchers Andreas Blattmann and Dominik Lorenz were three of the five authors who developed the core Stable Diffusion research while at a German university. They were hired afterwards by Stability. Last month, they helped publish a 3rd edition of the Stable Diffusion model, which, for the first time, combined the diffusion structure used in earlier versions with transformers used in OpenAI’s ChatGPT.

Their departures are the latest in a mass exodus of executives at Stability AI, as its cash reserves dwindle and it struggles to raise additional funds.

Why does this matter?

Stable Diffusion is one of the foundational models that helped catalyze the boom in generative AI imagery, but now its future hangs in the balance. While Stability AI’s current situation raises questions about its long-term viability, the exodus potentially benefits its competitors.

Source

What Else Is Happening in AI on March 21st, 2024❗

🗣️Character AI’s new feature adds voice to characters with just 10-sec audio

You can now give voice to your Characters by choosing from thousands of voices or creating your own. The voices are created with just 10 seconds of audio clips. The feature is now available for free to everyone. (Link)

🤖GitHub’s latest AI tool can automatically fix code vulnerabilities

GitHub launches the first beta of its code-scanning autofix feature, which finds and fixes security vulnerabilities during the coding process. GitHub claims it can remediate more than two-thirds of the vulnerabilities it finds, often without the developers having to edit the code. The feature is now available for all GitHub Advanced Security (GHAS) customers. (Link)

GitHub’s latest AI tool can automatically fix code vulnerabilities
GitHub’s latest AI tool can automatically fix code vulnerabilities

🚀OpenAI plans to release a ‘materially better’ GPT-5 in mid-2024

According to anonymous sources from Businessinsider, OpenAI plans to release GPT-5 this summer, which will be significantly better than GPT-4. Some enterprise customers are said to have already received demos of the latest model and its ChatGPT improvements. (Link)

💡Fitbit to get major AI upgrades powered by Google’s ‘Personal Health’ LLM

Google Research and Fitbit announced they are working together to build a Personal Health LLM that gives users more insights and recommendations based on their data in the Fitbit mobile app. It will give Fitbit users personalized coaching and actionable insights that help them achieve their fitness and health goals. (Link)

🔬Samsung creates lab to research chips for AI’s next phase

Samsung has set up a research lab dedicated to designing an entirely new type of semiconductor needed for (AGI). The lab will initially focus on developing chips for LLMs with a focus on inference. It aims to release new “chip designs, an iterative model that will provide stronger performance and support for increasingly larger models at a fraction of the power and cost.” (Link)

A daily chronicle of AI Innovations: March 20th, 2024 : 🤖 OpenAI to release GPT-5 this summer; 🧠 Nvidia’s Jensen Huang says AI hallucinations are solvable, AGI is 5 years away; 🔬 Ozempic creator plans AI supercomputer to discover new drugs; 👀 After raising $1.3B, Inflection eaten alive by Microsoft; 🧠 MindEye2: AI Mind Reading from Brain Activity; 🚀 Nvidia NIM enables faster deployment of AI models

🤖 OpenAI to release GPT-5 this summer

  • OpenAI is planning to launch GPT-5 around mid-year, aiming to address previous performance issues and significantly improve upon its predecessor, GPT-4.
  • GPT-5 is described as “materially better” by those who have seen demos, including enhancements and new capabilities like the ability to call AI agents for autonomous tasks, with enterprise customers having already previewed these improvements.
  • The release timeline for GPT-5 remains uncertain as OpenAI continues its training and thorough safety and vulnerability testing, with no specific deadline for completion of these preparatory steps.
  • Source

👀 After raising $1.3B, Inflection eaten alive by Microsoft 

  • In June 2023, Inflection raised $1.3 billion led by Microsoft to develop “more personal AI” but was overtaken by Microsoft less than a year later, with co-founders joining Microsoft’s new AI division.
  • Despite significant investment, Inflection’s AI, Pi, failed to compete with advancements from other companies such as OpenAI, Google’s Gemini, and Anthropic, leading to its downfall.
  • Microsoft’s takeover of Inflection reflects the strategy of legacy tech companies to dominate the AI space by supporting startups then acquiring them once they face challenges.
  • Source

🧠 Nvidia’s Jensen Huang says AI hallucinations are solvable, AGI is 5 years away

  • Nvidia CEO Jensen Huang predicts artificial general intelligence (AGI) could be achieved within 5 years, depending on how AGI is defined and measured.
  • Huang addresses concerns around AI hallucinations, suggesting that ensuring answers are well-researched could easily solve the issue.
  • The concept of AGI raises concerns about its potential unpredictability and the challenges of aligning its objectives with human values and priorities.
  • Source

🔬 Ozempic creator plans AI supercomputer to discover new drugs

  • The Novo Nordisk Foundation is investing in “Gefion,” an AI supercomputer project developed in collaboration with Nvidia.
  • “Gefion” aims to be the world’s most powerful AI supercomputer for health sciences, utilizing Nvidia’s new chips to accelerate scientific breakthroughs in critical areas such as drug discovery, disease diagnosis, and treatment,
  • This initiative underscores the growing integration of AI in healthcare, promising to catalyze significant scientific discoveries and innovations that could transform patient care and outcomes.
  • Source

MindEye2: AI mind reading from brain activity

MindEye2 is a revolutionary model that reconstructs visual perception from brain activity using just one hour of data. Traditional methods require extensive training data, making them impractical for real-world applications. However, MindEye2 overcomes this limitation by leveraging shared-subject models. The model is pretrained on data from seven subjects and then fine-tuned with minimal data from a new subject.

MindEye2: AI mind reading from brain activity
MindEye2: AI mind reading from brain activity

By mapping brain activity to a shared-subject latent space and then nonlinear mapping to CLIP image space, MindEye2 achieves high-quality reconstructions with limited training data. It performs state-of-the-art image retrieval and reconstruction across multiple subjects within only 2.5% of the previously required training data, reducing the training time from 40 to just one hour.

Why does it matter?

MindEye2 has the potential to revolutionize clinical assessments and brain-computer interface applications. This remarkable achievement also holds great promise for neuroscience and opens new possibilities for understanding how our brains perceive and process visual information. It can also help develop personalized treatment plans for neuro patients.

Source

Nvidia NIM enables faster deployment of AI models 

NVIDIA has introduced NVIDIA NIM (NVIDIA Inference Microservices) to accelerate the deployment of AI applications for businesses. NIM is a collection of microservices that package essential components of an AI application, including AI models, APIs, and libraries, into a container. These containers can be deployed in environments such as cloud platforms, Linux servers, or serverless architectures.

Nvidia NIM enables faster deployment of AI models 
Nvidia NIM enables faster deployment of AI models

NIM significantly reduces the time it takes to deploy AI applications from weeks to minutes. It offers optimized inference engines, industry-standard APIs, and support for popular software and data platform vendors. NIM microservices are compatible with NVIDIA GPUs and support features like Retrieval Augmented Generation (RAG) capabilities for enhanced enterprise applications. Developers can experiment with NIM microservices for free on the ai.nvidia.com platform, while commercial deployment is available through NVIDIA AI Enterprise 5.0.

Why does it matter?

With NIM, Nvidia is trying to democratize AI deployment for enterprises by abstracting away complexities. This will enable more developers to contribute to their company’s AI transformation efforts and allow businesses to run AI applications almost instantly without specialized AI expertise.

Source

Microsoft hires DeepMind co-founder to lead a new AI division

Mustafa Suleyman, a renowned co-founder of DeepMind and Inflection, has recently joined Microsoft as the leader of Copilot. Satya Nadella, Microsoft’s CEO, made this significant announcement, highlighting the importance of innovation in artificial intelligence (AI).

In his new role as the Executive Vice President and CEO of Microsoft AI, Mustafa will work alongside Karén Simonyan, another talented individual from Inflection who will serve as Chief Scientist. Together, they will spearhead the development and advancement of Copilot and other exciting consumer AI products at Microsoft. Mustafa and his team’s addition to the Microsoft family brings a wealth of expertise and promises groundbreaking advancements in AI.

Why does it matter?

Mustafa Suleyman’s expertise in AI is expected to contribute to the development of innovative consumer AI products and research at Microsoft, furthering its mission to bring the benefits of AI to people and organizations worldwide. With DeepMind’s founder now at the helm, the AI race between Microsoft, Google, and others became even more intense.

Source

What Else Is Happening in AI on March 20th, 2024❗

📞 Truecaller adds AI-powered spam detection and blocking for Android users

Truecaller has unveiled a new feature for its Android premium subscribers that uses AI to detect spam, even if unavailable on the Truecaller database, and block every call that doesn’t come from an approved contact. Truecaller hopes to add more premium subscribers to its list by adding this feature. However, this feature is not available for Apple users. (Link)

⚽ Google DeepMind’s new AI tool can analyze soccer tactics and offer insights 

DeepMind has partnered with Liverpool FC to develop a new AI tool called TacticAI. TacticAI uses generative and predictive AI to help coaches determine which player will most likely receive the ball during corner kicks, whether a shot will be taken, and how to adjust player setup. It aims to revolutionize soccer and help the teams enhance their efficiency. (Link)

🎬 Pika Labs introduces sound effects for its gen-AI video generation

Pika Labs has now added the ability to create sound effects from a text prompt for its generative artificial intelligence videos. It allows for automatic or custom SFX generations to pair with video outputs. Now, users can make bacon sizzle, lions roar, or add footsteps to the video of someone walking down the street. It is only available to pro users. (Link)

🎮 Buildbox 4 Alpha enables users to create 3D video games from text prompts 

Buildbox has released an alpha version of Buildbox 4. It’s an AI-first game engine that allows users to create games and generate assets from text prompts. The alpha version aims to make text-to-game a distinct reality. Users can create various assets and animations from simple text prompts. It also allows users to build a gaming environment in a few minutes. (Link)

🤖 Nvidia adds generative AI capabilities to empower humanoid robots

Nvidia introduced Project GR00T, a multimodal AI that will power future humanoids with advanced foundation AI. Project GR00T enables humanoid robots to input text, speech, videos, or even live demos and process them to take specific actions. It has been developed with the help of Nvidia’s Isaac Robotic Platform tools, including an Isaac Lab for RLHF. (Link)

The EU AI Act – Key takeaways for LLM builders

The EU AI Act - Key takeaways for LLM builders

A daily chronicle of AI Innovations: March 19th, 2024 : 💻 Nvidia launches ‘world’s most powerful AI chip’; 🎥 Stability AI’s SV3D turns a single photo into a 3D video; 🤖 OpenAI CEO hints at “Amazing Model”, maybe ChatGPT-5 ;🤝 Apple is in talks to bring Google’s AI to iPhones

Nvidia launches ‘world’s most powerful AI chip’

Nvidia has revealed its new Blackwell B200 GPU and GB200 “superchip”, claiming it to be the world’s most powerful chip for AI. Both B200 and GB200 are designed to offer powerful performance and significant efficiency gains.

Nvidia launches 'world's most powerful AI chip'
Nvidia launches ‘world’s most powerful AI chip’

Key takeaways:

  • The B200 offers up to 20 petaflops of FP4 horsepower, and Nvidia says it can reduce costs and energy consumption by up to 25 times over an H100.
  • The GB200 “superchip” can deliver 30X the performance for LLM inference workloads while also being more efficient.
  • Nvidia claims that just 2,000 Blackwell chips working together could train a GPT -4-like model comprising 1.8 trillion parameters in just 90 days.

Why does this matter?

A major leap in AI hardware, the Blackwell GPU boasts redefined performance and energy efficiency. This could lead to lower operating costs in the long run, making high-performance computing more accessible for AI research and development, all while promoting eco-friendly practices.

Source

Stability AI’s SV3D turns a single photo into a 3D video

Stability AI released Stable Video 3D (SV3D), a new generative AI tool for rendering 3D videos. SV3D can create multi-view 3D models from a single image, allowing users to see an object from any angle. This technology is expected to be valuable in the gaming sector for creating 3D assets and in e-commerce for generating 360-degree product views.

SV3D builds upon Stability AI’s previous Stable Video Diffusion model. Unlike prior methods, SV3D can generate consistent views from any given angle. It also optimizes 3D meshes directly from the novel views it produces.

SV3D comes in two variants: SV3D_u generates orbital videos from single images, and SV3D_p creates 3D videos along specified camera paths.

Why does this matter?

SV3D represents a significant leap in generative AI for 3D content. Its ability to create 3D models and videos from a single image could open up possibilities in various fields, such as animation, virtual reality, and scientific modeling.

Source

OpenAI CEO hints at “Amazing Model,” maybe ChatGPT-5

OpenAI CEO Sam Altman has announced that the company will release an “amazing model” in 2024, although the name has not been finalized. Altman also mentioned that OpenAI plans to release several other important projects before discussing GPT-5, one of which could be the Sora video model.

Source

Altman declined to comment on the Q* project, which is rumored to be an AI breakthrough related to logic. He also expressed his opinion that GPT-4 Turbo and GPT-4 “kind of suck” and that the jump from GPT-4 to GPT-5 could be as significant as the improvement from GPT-3 to GPT-4.

Why does this matter?

This could mean that after Google Gemini and Claude-3’s latest version, a new model, possibly ChatGPT-5, could be released in 2024. Altman’s candid remarks about the current state of AI models also offer valuable context for understanding the anticipated advancements and challenges in the field.

Source

Project GR00T is an ambitious initiative aiming to develop a general-purpose foundation model for humanoid robot learning, addressing embodied AGI challenges. Collaborating with leading humanoid companies worldwide, GR00T aims to understand multimodal instructions and perform various tasks.

GROOT is a foundational model that takes language, videos, and example demonstrations as inputs so it can produce the next action.

What the heck does that mean?

➡️ It means you can show it how to do X a few times, and then it can do X on its own.

Like cooking, drumming, or…

Source

Google’s new fine-tuned model is a HUGE improvement, AI is coming for human doctors sooner than most believe.

Google's new fine-tuned model is a HUGE improvement, AI is coming for human doctors sooner than most believe.
Google’s new fine-tuned model is a HUGE improvement, AI is coming for human doctors sooner than most believe.

NVIDIA creates Earth-2 digital twin: generative AI to simulate, visualize weather and climate. Source

What Else Is Happening in AI on March 19th, 2024❗

🤝 Apple is in talks to bring Google’s AI to iPhones

Apple and Google are negotiating a deal to integrate Google’s Gemini AI into iPhones, potentially shaking up the AI industry. The deal would expand on their existing search partnership. Apple also held discussions with OpenAI. If successful, the partnership could give Gemini a significant edge with billions of potential users. (Link)

 🏷️YouTube rolls out AI content labels

YouTube now requires creators to self-label AI-generated or synthetic content in videos. The platform may add labels itself for potentially misleading content. However, the tool relies on creators being honest, as YouTube is still working on AI detection tools. (Link)

🎮Roblox speeds up 3D creation with AI tools
Roblox has introduced two AI-driven tools to streamline 3D content creation on its platform. Avatar Auto Setup automates the conversion of 3D body meshes into fully animated avatars, while Texture Generator allows creators to quickly alter the appearance of 3D objects using text prompts, enabling rapid prototyping and iteration.(Link)

🌐Nvidia teams up with Shutterstock and Getty Images for AI-generated 3D content

Nvidia’s Edify AI can now create 3D content, and partnerships with Shutterstock and Getty Images will make it accessible to all. Developers can soon experiment with these models, while industry giants are already using them to create stunning visuals and experiences.  (Link)

🖌️Adobe Substance 3D introduces AI-powered text-to-texture tools

Adobe has introduced two AI-driven features to its Substance 3D suite: “Text to Texture,” which generates photo-realistic or stylized textures from text prompts, and “Generative Background,” which creates background images for 3D scenes. Both tools use 2D imaging technology from Adobe’s Firefly AI model to streamline 3D workflows. (Link)

A daily chronicle of AI Innovations: March 18th, 2024 – Bernie’s 4 day workweek: less work, same pay – Google’s AI brings photos to life as talking avatars – Elon Musk’s xAI open-sources Grok AI

Bernie’s 4 day workweek: less work, same pay

Sen. Bernie Sanders has introduced the Thirty-Two Hour Workweek Act, which aims to establish a four-day workweek in the United States without reducing pay or benefits. To be phased in over four years, the bill would lower the overtime pay threshold from 40 to 32 hours, ensuring that workers receive 1.5 times their regular salary for work days longer than 8 hours and double their regular wage for work days longer than 12 hours.

Sanders, along with Sen. Laphonza Butler and Rep. Mark Takano, believes that this bill is crucial in ensuring that workers benefit from the massive increase in productivity driven by AI, automation, and new technology. The legislation aims to reduce stress levels and improve Americans’ quality of life while also protecting their wages and benefits.

Why does this matter?

This bill could alter the workforce dynamics. Businesses may need to assess staffing and invest in AI to maintain productivity. While AI may raise concerns over job displacements, it also offers opportunities for better work-life balance through efficiency gains by augmenting human capabilities.

Source

Google’s AI brings photos to life as talking avatars

Google’s latest AI research project VLOGGER, automatically generates realistic videos of talking and moving people from just a single image and an audio or text input. It is the first model that aims to create more natural interactions with virtual agents by including facial expressions, body movements, and gestures, going beyond simple lip-syncing.

It uses a two-step process: first, a diffusion-based network predicts body motion and facial expressions based on the audio, and then a novel architecture based on image diffusion models generates the final video while maintaining temporal consistency. VLOGGER outperforms previous state-of-the-art methods in terms of image quality, diversity, and the range of scenarios it can handle.

Why does this matter?

VLOGGER’s flexibility and applications could benefit remote work, education, and social interaction, making them more inclusive and accessible. Also, as AR/VR technologies advance, VLOGGER’s avatars could create emotionally resonant experiences in gaming, entertainment, and professional training scenarios.

Source

Elon Musk’s xAI open-sources Grok AI

Elon Musk’s xAI has open-sourced the base model weights and architecture of its AI chatbot, Grok. This allows researchers and developers to freely use and build upon the 314 billion parameter Mixture-of-Experts model. Released under the Apache 2.0 license, the open-source version is not fine-tuned for any particular task.

Why does this matter?

This move aligns with Musk’s criticism of companies that don’t open-source their AI models, including OpenAI, which he is currently suing for allegedly breaching an agreement to remain open-source. While several fully open-source AI models are available, the most used ones are closed-source or offer limited open licenses.

Source

What Else Is Happening in AI on March 18th, 2024❗

🧠 Maisa KPU may be the next leap in AI reasoning

Maisa has released the beta version of its Knowledge Processing Unit (KPU), an AI system that uses LLMs’ advanced reasoning and data processing abilities. In an impressive demo, the KPU assisted a customer with an order-related issue, even when the customer provided an incorrect order ID, showing the system’s understanding abilities. (Link)

🍿 PepsiCo increases market domination using GenAI

PepsiCo uses GenAI in product development and marketing for faster launches and better profitability. It has increased market penetration by 15% by using GenAI to improve the taste and shape of products like Cheetos based on customer feedback. The company is also doubling down on its presence in India, with plans to open a third capability center to develop local talent. (Link)

💻 Deci launches Nano LLM & GenAI dev platform

Israeli AI startup Deci has launched two major offerings: Deci-Nano, a small closed-source language model, and a complete Generative AI Development Platform for enterprises. Compared to rivals like OpenAI and Anthropic, Deci-Nano offers impressive performance at low cost, and the new platform offers a suite of tools to help businesses deploy and manage AI solutions. (Link)

🎮 Invoke AI simplifies game dev workflows

Invoke has launched Workflows, a set of AI tools designed for game developers and large studios. These tools make it easier for teams to adopt AI, regardless of their technical expertise levels. Workflows allow artists to use AI features while maintaining control over their training assets, brand-specific styles, and image security. (Link)

🚗 Mercedes teams up with Apptronik for robot workers

Mercedes-Benz is collaborating with robotics company Apptronik to automate repetitive and physically demanding tasks in its manufacturing process. The automaker is currently testing Apptronik’s Apollo robot, a 160-pound bipedal machine capable of lifting objects up to 55 pounds. The robot inspects and delivers components to human workers on the production line, reducing the physical strain on employees and increasing efficiency. (Link)

A daily chronicle of AI Innovations: Week 2 Recap

  1. DeepSeek released DeepSeek-VL, an open-source Vision-Language (VL) model designed for real-world vision and language understanding applications. The DeepSeek-VL family, includes 7B and1.3B base and chat models and achieves state-of-the-art or competitive performance across a wide range of visual-language benchmarks. Free for commercial use [Details | Hugging Face | Demo]

  2. Cohere released Command-R, a 35 billion parameters generative model with open weights, optimized for long context tasks such as retrieval augmented generation (RAG) and using external APIs and tools for production-scale AI for enterprise [Details | Hugging Face].

  3. Google DeepMind introduced SIMA (Scalable Instructable Multiworld Agent), a generalist AI agent for 3D virtual environments, trained on nine different video games. It can understand a broad range of gaming worlds, and follows natural-language instructions to carry out tasks within them, as a human might.  It doesn’t need access to a game’s source code or APIs and requires only the images on screen, and natural-language instructions provided by the user. SIMA uses keyboard and mouse outputs to control the games’ central character to carry out these instructions [Details].

  4. Meta AI introduces Emu Video Edit (EVE), a model that establishes a new state-of-the art in video editing without relying on any supervised video editing data [Details].

  5. Cognition Labs introduced Devin, the first fully autonomous AI software engineer. Devin can learn how to use unfamiliar technologies, can build and deploy apps end to end, can train and fine tune its own AI models. When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted [Details].

  6. Pika Labs adds sound effects to its AI video tool, Pika, allowing users to either prompt desired sounds or automatically generate them based on video content. [Video link].

  7. Anthropic’s Claude 3 Opus ranks #1 on LMSYS Chatbot Arena Leaderboard, along with GPT-4 [Link].

  8. The European Parliament approved the Artificial Intelligence Act. The new rules ban certain AI applications including biometric categorisation systems, Emotion recognition in the workplace and schools, social scoring and more [Details].

  9. Huawei Noah’s Ark Lab introduced PixArt–Σ, a Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. It achieves superior image quality and user prompt adherence with significantly smaller model size (0.6B parameters) than existing text-to-image diffusion models, such as SDXL (2.6B parameters) and SD Cascade (5.1B parameters) [Details].

  10. South Korean startup Hyodol AI has launched a $1,800 LLM-powered companion doll specifically designed to offer emotional support and companionship to the rapidly expanding elderly demographic in the country [Details].

  11. Covariant introduced RFM-1 (Robotics Foundation Model -1), a large language model (LLM), but for robot language. Set up as a multimodal any-to-any sequence model, RFM-1 is an 8 billion parameter transformer trained on text, images, videos, robot actions, and a range of numerical sensor readings [Details].

  12. Figure 01 robot integrated with an OpenAI vision-language model can now have full conversations with people [Link]

  13. Deepgram announced the general availability of Aura, a text-to-speech model built for responsive, conversational AI agents and applications [Details | Demo].

  14. Claude 3 Haiku model is now available alongside Sonnet and Opus in the Claude API and on claude.ai for Pro subscribers. Haiku outperforms GPT-3.5 and Gemini 1.0 pro while costing less, and is three times faster than its peers for the vast majority of workloads [Details].

  15. Paddle announced AI Launchpad, a 6-week remote program for AI founders to launch and scale an AI business with $20,000 in cash prize [Details].

  16. Midjourney adds feature for generating consistent characters across multiple gen AI images [Details].

  17. The Special Committee of the OpenAI Board announced the completion of the review. Altman, Brockman to continue to lead OpenAI [Details]

  18. Together.ai introduced Sequoia, a scalable, robust, and hardware-aware speculative decoding framework that improves LLM inference speed on consumer GPUs (with offloading), as well as on high-end GPUs (on-chip), without any approximations [Details].

  19. OpenAI released Transformer Debugger (TDB), a tool developed and used internally by OpenAI’s Superalignment team for investigating into specific behaviors of small language models [GitHub].

  20. Elon Musk announced that xAI will open source Grok this week [Link].

A Daily Chronicle of AI Innovations – March 16th, 2024:

🔍 FTC is probing Reddit’s AI licensing deals

  • Reddit is under investigation by the FTC for its data licensing practices concerning user-generated content being used to train AI models.
  • The investigation focuses on Reddit’s engagement in selling, licensing, or sharing data with third parties for AI training.
  • Reddit anticipates generating approximately USD 60 million in 2024 from a data licensing agreement with Google, aiming to leverage its platform data for training LLMs

.

💻 New jailbreak uses ASCII art to elicit harmful responses from leading LLM

  • Researchers identified a new vulnerability in leading AI language models, named ArtPrompt, which uses ASCII art to exploit the models’ security mechanisms.
  • ArtPrompt masks security-sensitive words with ASCII art, fooling language models like GPT-3.5, GPT-4, Gemini, Claude, and Llama2 into performing actions they would otherwise block, such as giving instructions for making a bomb.
  • The study underscores the need for enhanced defensive measures for language models, as ArtPrompt, by leveraging a mix of text-based and image-based inputs, can effectively bypass current security protocols.

OpenAI aims to make its own AI processors — chip venture in talks with Abu Dhabi investment firm. Source

Once “too scary” to release, GPT-2 gets squeezed into an Excel spreadsheet. Source

 A Daily Chronicle of AI Innovations – March 15th, 2024:

🍎 Apple quietly acquires another AI startup

🤖 Mercedes tests humanoid robots for ‘low skill, repetitive’ tasks

🚫 Midjourney bans prompts with Joe Biden and Donald Trump over election misinformation concerns

💰 El Salvador stashes $406 million in bitcoin in ‘cold wallet’

🤔 Microsoft calls out Google dominance in generative AI

📝 Anthropic releases affordable, high-speed Claude 3 Haiku model

🥘 Apple’s MM1: The new recipe to master AI performance

Apple’s MM1 AI model shows state-of-the-art language and vision capabilities. It was trained on a filtered dataset of 500 million text-image pairs from the web, including 10% text-only docs to improve language understanding.

🥘 Apple’s MM1: The new recipe to master AI performance
🥘 Apple’s MM1: The new recipe to master AI performance

The team experimented with different configurations during training. They discovered that using an external pre-trained high-resolution image encoder improved visual recognition. Combining different image, text, and caption data ratios led to the best performance. Synthetic caption data also enhanced few-shot learning abilities.

This experiment cements that using a blend of image caption, interleaved image text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks.

Why does it matter?

Apple’s new model is promising, especially in developing image recognition systems for new categories or domains. This will help businesses and startups improve the speed of AI tool development specifically for text-to-image, document analysis, and enhanced visual recognition.

⚡ Cerebras WSE-3: AI chip enabling 10x larger models than GPT-4

Cerebras Systems has made a groundbreaking announcement unveiling its latest wafer-scale AI chip, the WSE-3. This chip boasts an incredible 4 trillion transistors, making it one of the most powerful AI chips on the market. The third-generation wafer-scale AI mega chip is twice as powerful as its predecessor while being power efficient. 

The chip’s transistor density has increased by over 50 percent thanks to the latest manufacturing technology. One of the most remarkable features of the WSE-3 chip is its ability to enable AI models that are ten times larger than the highly acclaimed GPT-4 and Gemini models.

Why does it matter?

The WSE-3 chip opens up new possibilities for tackling complex problems and pushing the boundaries of AI capabilities. This powerful system can train massive language models, such as the Llama 70B, in just one day. It will help enterprises create custom LLMs, rapidly reducing the time-to-market.

🤖 Apple acquires Canadian AI startup DarwinAI

Apple made a significant acquisition earlier this year by purchasing Canadian AI startup DarwinAI. Integrating DarwinAI’s expertise and technology bolsters Apple’s AI initiatives. 

With this acquisition, Apple aims to tap into DarwinAI’s advancements in AI technology, particularly in visual inspection during manufacturing and making AI systems smaller and faster. Leveraging DarwinAI’s technology, Apple aims to run AI on devices rather than relying solely on cloud-based solutions.

Why does it matter?

Apple’s acquisition of DarwinAI is a strategic move to revolutionize features and enhance its AI capabilities across various products and services. Especially with the iOS 18 release around the corner, this acquisition will help create new features and enhance the user experience.

🤖 Microsoft expands the availability of Copilot across life and work.

Microsoft is expanding Copilot, its AI assistant, with the introduction of the Copilot Pro subscription for individuals, the availability of Copilot for Microsoft 365 to small and medium-sized businesses, and removing seat minimums for commercial plans. Copilot aims to enhance creativity, productivity, and skills across work and personal life, providing users access to the latest AI models and improved image creation

💻 Oracle adds groundbreaking Generative AI features to its software

Oracle has added advanced AI capabilities to its finance and supply chain software suite, aimed at improving decision-making and enhancing customer and employee experience. For instance, Oracle Fusion Cloud SCM includes features such as item description generation, supplier recommendations, and negotiation summaries.

💰 Databricks makes a strategic investment in Mistral AI

Databricks has invested in Mistral AI and integrated its AI models into its data intelligence platform, allowing users to customize and consume models in various ways. The integration includes Mistral’s text-generation models, such as Mistral 7B and Mixtral 8x7B, which support multiple languages. This partnership aims to provide Databricks customers with advanced capabilities to leverage AI models and drive innovation in their data-driven applications.

📱 Qualcomm emerges as a mobile AI juggernaut

Qualcomm has solidified its leadership position in mobile artificial intelligence (AI). It has been developing AI hardware and software for over a decade. Their Snapdragon processors are equipped with specialized AI engines like Hexagon DSP, ensuring efficient AI and machine learning processing without needing to send data to the cloud.

👓 MIT researchers develop peripheral vision capabilities for AI models

AI researchers are developing techniques to simulate peripheral vision and improve object detection in the periphery. They created a new dataset to train computer vision models, which led to better object detection outside the direct line of sight, though still behind human capabilities. A modified texture tiling approach accurately representing information loss in peripheral vision significantly enhanced object detection and recognition abilities.

🤔 Microsoft calls out Google dominance in generative AI 

  • Microsoft has expressed concerns to EU antitrust regulators about Google’s dominance in generative AI, highlighting Google’s unique position due to its vast data sets and vertical integration, which includes AI chips and platforms like YouTube.
  • The company argues that Google’s control over vast resources and its own AI developments give it a competitive advantage, making it difficult for competitors to match, especially in the development of Large Language Models like Gemini.
  • Microsoft defends partnerships with startups like OpenAI as essential for innovation and competition in the AI market, countering regulatory concerns about potential anticompetitive advantages arising from such collaborations.

🤖 Mercedes tests humanoid robots for ‘low skill, repetitive’ tasks

  • Mercedes-Benz is testing humanoid robots, specifically Apptronik’s bipedal robot Apollo, for automating manual labor tasks in manufacturing.
  • The trial aims to explore the use of Apollo in physically demanding, repetitive tasks within existing manufacturing facilities without the need for significant redesigns.
  • The initiative seeks to address labor shortages by using robots for low-skill tasks, allowing highly skilled workers to focus on more complex aspects of car production.

🚫 Midjourney bans prompts with Joe Biden and Donald Trump over election misinformation concerns

  • Midjourney, an AI image generator, has banned prompts containing the names of Joe Biden and Donald Trump to avoid the spread of election misinformation.
  • The policy change is in response to concerns over AI’s potential to influence voters and spread false information before the 2024 presidential election.
  • Despite the new ban, Midjourney previously allowed prompts that could generate misleading or harmful content, and it was noted for its poor performance in controlling election disinformation.

Midjourney introduces Character Consistency: Tutorial

midjourney_character_consistency

A Daily Chronicle of AI Innovations – March 14th, 2024: 

🎮 DeepMind’s SIMA: The AI agent that’s a Jack of all games

 ⚡ Claude 3 Haiku: Anthropic’s lightning-fast AI solution for enterprises

 🤖 OpenAI-powered “Figure 01” can chat, perceive, and complete tasks

 🎥 OpenAI’s Sora will be publicly available later this year

 

🎮 DeepMind’s SIMA: The AI agent that’s a Jack of all games

DeepMind has introduced SIMA (Scalable Instructable Multiworld Agent), a generalist AI agent that can understand and follow natural language instructions to complete tasks across video game environments. Trained in collaboration with eight game studios on nine different games, SIMA marks a significant milestone in game-playing AI by showing the ability to generalize learned skills to new gaming worlds without requiring access to game code or APIs.

 

DeepMind's SIMA: The AI agent that's a Jack of all games
DeepMind’s SIMA: The AI agent that’s a Jack of all games
 

(SIMA comprises pre-trained vision models, and a main model that includes a memory and outputs keyboard and mouse actions.)

SIMA was evaluated on 600 basic skills, including navigation, object interaction, and menu use. In tests, SIMA agents trained on multiple games significantly outperformed specialized agents trained on individual games. Notably, an agent trained on all but one game performed nearly as well on the unseen game as an agent specifically trained on it, showcasing SIMA’s remarkable ability to generalize to new environments. 

Why does this matter?

SIMA’s generalization ability using a single AI agent is a significant milestone in transfer learning. By showing that a multi-task trained agent can perform nearly as well on an unseen task as a specialized agent, SIMA paves the way for more versatile and scalable AI systems. This could lead to faster deployment of AI in real-world applications, as agents would require less task-specific training data and could adapt to new scenarios more quickly.

Source


⚡ Claude 3 Haiku: Anthropic’s lightning-fast AI solution for enterprises

Anthropic has released Claude 3 Haiku, their fastest and most affordable AI model. With impressive vision capabilities and strong performance on industry benchmarks, Haiku is designed to tackle a wide range of enterprise applications. The model’s speed – processing 21K tokens per second for prompts under 32K tokens – and cost-effective pricing model make it an attractive choice for businesses needing to analyze large datasets and generate timely outputs.

 

Claude 3 Haiku: Anthropic's lightning-fast AI solution for enterprises
Claude 3 Haiku: Anthropic’s lightning-fast AI solution for enterprises
 

In addition to its speed and affordability, Claude 3 Haiku prioritizes enterprise-grade security and robustness. The model is now available through Anthropic’s API or on claude.ai for Claude Pro subscribers.

Why does this matter?

Claude 3 Haiku sets a new benchmark for enterprise AI by offering high speed and cost-efficiency without compromising performance. This release will likely intensify competition among AI providers, making advanced AI solutions more accessible to businesses of all sizes. As more companies adopt models like Haiku, we expect a surge in AI-driven productivity and decision-making across industries.

Source


🤖 OpenAI-powered “Figure 01” can chat, perceive, and complete tasks

Robotics company Figure, in collaboration with OpenAI, has developed a groundbreaking robot called “Figure 01” that can engage in full conversations, perceive its surroundings, plan actions, and execute tasks based on verbal requests, even those that are ambiguous or context-dependent. This is made possible by connecting the robot to a multimodal AI model trained by OpenAI, which integrates language and vision.

OpenAI-powered "Figure 01" can chat, perceive, and complete tasks
OpenAI-powered “Figure 01” can chat, perceive, and complete tasks

The AI model processes the robot’s entire conversation history, including images, enabling it to generate appropriate verbal responses and select the most suitable learned behaviors to carry out given commands. The robot’s actions are controlled by visuomotor transformers that convert visual input into precise physical movements. “Figure 01” successfully integrates natural language interaction, visual perception, reasoning, and dexterous manipulation in a single robot platform.

Why does this matter?

As robots become more adept at understanding and responding to human language, questions arise about their autonomy and potential impact on humanity. Collaboration between the robotics industry and AI policymakers is needed to establish regulations for the safe deployment of AI-powered robots. If deployed safely, these robots could become trusted partners, enhancing productivity, safety, and quality of life in various domains.

Source

What Else Is Happening in AI on March 14th, 2024❗

🛍️ Amazon streamlines product listing process with new AI tool

Amazon is introducing a new AI feature for sellers to quickly create product pages by pasting a link from their external website. The AI generates product descriptions and images based on the linked site’s information, saving sellers time. (Link)

🛡️ Microsoft to expand AI-powered cybersecurity tool availability from April 1

Microsoft is expanding the availability of its AI-powered cybersecurity tool, “Security Copilot,” from April 1, 2024. The tool helps with tasks like summarizing incidents, analyzing vulnerabilities, and sharing information. Microsoft plans to adopt a ‘pay-as-you-go’ pricing model to reduce entry barriers. (Link)

🎥 OpenAI’s Sora will be publicly available later this year

OpenAI will release Sora, its text-to-video AI tool, to the public later this year. Sora generates realistic video scenes from text prompts and may add audio capabilities in the future. OpenAI plans to offer Sora at a cost similar to DALL-E, its text-to-image model, and is developing features for users to edit the AI-generated content. (Link)

📰 OpenAI partners with Le Monde, Prisa Media for news content in ChatGPT

OpenAI has announced partnerships with French newspaper Le Monde and Spanish media group Prisa Media to provide their news content to users of ChatGPT. The media companies see this as a way to ensure reliable information reaches AI users while safeguarding their journalistic integrity and revenue. (Link)

🏠 Icon’s AI architect and 3D printing breakthroughs reimagine homebuilding

Construction tech startup Icon has introduced an AI-powered architect, Vitruvius, that engages users in designing their dream homes, offering 3D-printed and conventional options. The company also debuted an advanced 3D printing robot called Phoenix and a low-carbon concrete mix as part of its mission to make homebuilding more affordable, efficient, and sustainable. (Link)

A Daily Chronicle of AI Innovations – March 13th, 2024: Devin: The first AI software engineer redefines coding; Deepgram’s Aura empowers AI agents with authentic voices; Meta introduces two 24K GPU clusters to train Llama 3

Devin: The first AI software engineer redefines coding 

In the most groundbreaking development, the US-based startup Cognition AI has unveiled Devin, the world’s first AI software engineer. It is an autonomous agent that solves engineering tasks using its shell or command prompt, code editor, and web browser. Devin can also perform tasks like planning, coding, debugging, and deploying projects autonomously.

https://twitter.com/i/status/1767548763134964000

When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted. It has successfully passed practical engineering interviews with leading AI companies and even completed real Upwork jobs.

Why does it matter?

There’s already a huge debate if Devin will replace software engineers. However, most production-grade software is too complex, unique, or domain-specific to be fully automated at this point. Perhaps, Devin could start handling more initial-level tasks in development. More so, it can assist developers in quickly prototyping, bootstrapping, and autonomously launching MVP for smaller apps and websites, for now

Source

Deepgram’s Aura empowers AI agents with authentic voices

Deepgram, a top voice recognition startup, just released Aura, its new real-time text-to-speech model. It’s the first text-to-speech model built for responsive, conversational AI agents and applications. Companies can use these agents for customer service in call centers and other customer-facing roles.

Deepgram’s Aura empowers AI agents with authentic voices
Deepgram’s Aura empowers AI agents with authentic voices

Aura includes a dozen natural, human-like voices with lower latency than any comparable voice AI alternative and is already being used in production by several customers. Aura works hand in hand with Deepgram’s Nova-2 speech-to-text API. Nova-2 is known for its top-notch accuracy and speed in transcribing audio streams.

Why does it matter?

Deepgram’s Aura is a one-stop shop for speech recognition and voice generation APIs that enable the fastest response times and most natural-sounding conversational flow. Its human-like voice models render extremely fast (typically in well under half a second) and at an affordable price ($0.015 per 1,000 characters). Lastly, Deepgram’s transcription is more accurate and faster than other solutions as well.

Source

Meta introduces two 24K GPU clusters to train Llama 3

Meta has invested significantly in its AI infrastructure by introducing two 24k GPU clusters. These clusters, built on top of Grand Teton, OpenRack, and PyTorch, are designed to support various AI workloads, including the training of Llama 3.

Meta introduces two 24K GPU clusters to train Llama 3
Meta introduces two 24K GPU clusters to train Llama 3

Meta aims to expand its infrastructure build-out by the end of 2024. It plans to include 350,000 NVIDIA H100 GPUs, providing compute power equivalent to nearly 600,000 H100s. The clusters are built with a focus on researcher and developer experience.

This adds up to Meta’s long-term vision to build open and responsibly developed artificial general intelligence (AGI). These clusters enable the development of advanced AI models and power applications such as computer vision, NLP, speech recognition, and image generation.

Why does it matter?

Meta is committed to open compute and open source, driving innovation in the AI software and hardware industry. Introducing two new GPUs to train Llama 3 is also a push forward to their commitment. As a founding member of Open Hardware Innovation (OHI) and the Open Innovation AI Research Community, Meta wants to make AI transparent and trustworthy.

Source

What Else Is Happening in AI on March 13th, 2024❗

🎮 Google Play to display AI-powered FAQs and recent YouTube videos for games

At the Google for Games Developer Summit held in San Francisco, Google announced several new features for ‘Google Play listing for games’. These include AI-powered FAQs, displaying the latest YouTube videos, new immersive ad formats, and support for native PC game publishing. These new features will allow developers to display promotions and the latest YouTube videos directly in their listing and show them to users in the Games tab of the Play Store. (Link)

🛡️ DoorDash’s new AI-powered tool automatically curbs verbal abuses

DoorDash has introduced a new AI-powered tool named ‘SafeChat+’ to review in-app conversations and determine if a customer or Dasher is being harassed. There will be an option to report the incident and either contact DoorDash’s support team if you’re a customer or quickly cancel the order if you’re a delivery person. With this feature, DoorDash aims to reduce verbally abusive and inappropriate interactions between consumers and delivery people. (Link)

🔍 Perplexity has decided to bring Yelp data to its chatbot

Perplexity has decided to bring Yelp data to its chatbot. The company CEO, Aravind Srinivas, told the media that many people use chatbots like search engines. He added that it makes sense to offer information on things they look for, like restaurants, directly from the source. That’s why they have decided to integrate Yelp’s maps, reviews, and other details in responses when people ask for restaurant or cafe recommendations.  (Link)

👗 Pinterest’s ‘body types ranges’ tool delivers more inclusive search results

Pinterest has introduced a new tool named body type ranges, which gives users a choice to self-select body types from a visual cue between four body type ranges to deliver personalized and more refined search results for women’s fashion and wedding inspiration. This tool aims to create a more inclusive place online to search, save, and shop. The company also plans to launch a similar feature for men’s fashion later this year. (Link)

🚀 OpenAI’s GPT-4.5 Turbo is all set to be launched in June 2024

According to the leak search engine results from Bing and DuckDuck Go, which indexed the OpenAI GPT-4.5 Turbo product page before an official announcement, OpenAI is all set to launch the new version of its LLM by June 2024. There is a discussion among the AI community that this could be OpenAI’s fastest, most accurate, and most scalable model to date. The details of GPT-4.5 Turbo were leaked by OpenAI’s web team, which now leads to a 404 page. (Link))

A Daily Chronicle of AI Innovations in March 2024 – Day 12: AI Daily News – March 12th, 2024

🚀Cohere’s introduces production-scale AI for enterprises
🤖 RFM-1 redefines robotics with human-like reasoning
🎧 Spotify introduces audiobook recommendations

🙃 Midjourney bans all its competitor’s employees

🚫 Google restricts election-related queries for its Gemini chatbot

📲 Apple to let developers distribute apps directly from their websites

💰 AI startups reach record funding of nearly $50 billion in 2023

Cohere’s introduces production-scale AI for enterprises

Cohere, an AI company, has introduced Command-R, a new large language model (LLM) designed to address real-world challenges, such as inefficient workflows, data analysis limitations, slow response times, etc.

Cohere’s introduces production-scale AI for enterprises
Cohere’s introduces production-scale AI for enterprises

Command-R focuses on two key areas: Retrieval Augmented Generation (RAG) and Tool Use. RAG allows the model to access and process information from private databases, improving the accuracy of its responses. Tool Use allows Command-R to interact with external software tools and APIs, automating complex tasks.

Command-R offers several features beneficial for businesses, including:

  • Multilingual capabilities: Supports 10 major languages
  • Cost-effectiveness: Offers a longer context window and reduced pricing compared to previous models
  • Wider accessibility: Available through Cohere’s API, major cloud providers, and free weights for research on HuggingFace

Overall, it empowers businesses to leverage AI for improved decision-making, increased productivity, and enhanced customer experiences.

Why does this matter?

Command-R showcases the future of business operations, featuring automated workflows, and enabling humans to focus on strategic work. Thanks to its low hallucination rate, we would see a wider adoption of AI technologies, and the development of sophisticated, context-aware AI applications tailored to specific business needs.

As AI continues to evolve and mature, models like Command-R will shape the future of work and the global economy.

Source

RFM-1 redefines robotics with human-like reasoning

Covariant has introduced RFM-1, a Robotics Foundation Model that gives robots ChatGPT-like understanding and reasoning capabilities.

TLDR;

  • RFM-1 is an 8 billion parameter transformer trained on text, images, videos, robot actions, and sensor readings from Covariant’s fleet of high-performing robotic systems deployed in real-world environments.
  • Similar to how we understand how objects move, RFM-1 can predict future outcomes/consequences based on initial images and robot actions.
  • RFM-1 leverages NLP to enable intuitive interfaces for programming robot behavior. Operators can instruct robots using plain English, lowering barriers to customizing AI behavior for specific needs.
  • RFM-1 can also communicate issues and suggest solutions to operators.

Why does this matter?

This advancement has the potential to revolutionize industries such as manufacturing, logistics, and healthcare, where robots can work alongside humans to improve efficiency, safety, and productivity.

Source

Spotify now recommends audiobooks (with AI)

Spotify has introduced a novel recommendation system called 2T-HGNN to provide personalized audiobook recommendations to its users. The system addresses the challenges of introducing a new content type (audiobooks) into an existing platform, such as data sparsity and the need for scalability.

Spotify now recommends audiobooks (with AI)
Spotify now recommends audiobooks (with AI)

2T-HGNN leverages a technique called “Heterogeneous Graph Neural Networks” (HGNNs) to uncover connections between different content types. Additionally, a “Two Tower” (2T) model helps ensure that recommendations are made quickly and efficiently for millions of users.

Interestingly, the system also uses podcast consumption data and weak interaction signals to uncover user preferences and predict future audiobook engagement.

Why does this matter?

This research will not only improve the user experience but also encourage users to explore and engage with audiobooks, potentially driving growth in this new content vertical. Moreover, it may inspire similar strategies in domains where tailored recommendations are essential, such as e-commerce, news, and entertainment.

Source

What Else Is Happening in AI on March 12th, 2024❗

💡 Elon Musk makes xAI’s Grok chatbot open-source

Elon Musk announced that his AI startup xAI will open-source its ChatGPT rival “Grok” this week, following a lawsuit against OpenAI for shifting to a for-profit model. Musk aims to provide free access to Grok’s code, aligning with open-source AI models like Meta and Mistral (Link)

 🖼️ Midjourney launches character consistent feature

Midjourney’s new “Consistent Character” feature lets artists create consistent characters across images. Users provide a reference image URL with their prompt, and the AI attempts to match the character’s features in new scenes. This holds promise for creators of comics, storyboards, and other visual narratives. (Link)

🤖 Apple tests AI for App Store ad optimization
Taking a page from Google and Meta, Apple is testing AI-powered ad placement within its App Store. This new system would automatically choose the most suitable locations (e.g., App Store Today page) to display ads based on advertiser goals and budget. This development could help Apple’s ad business reach $6 billion by 2025.(Link)

🏥China tests AI chatbot to assist neurosurgeons

China steps into the future of brain surgery with an AI co-pilot, dubbed “CARES Copilot”. This AI, based on Meta’s Llama 2.0, assists surgeons by analyzing medical data (e.g., scans) and offering informed suggestions during surgery. This government-backed project reflects China’s growing focus on developing domestic AI solutions for various sectors, including healthcare. (Link)

🧓South Korea deploys AI dolls to tackle elderly loneliness

Hyodol, a Korean-based company, has introduced an AI-powered companion doll to tackle loneliness among elderly. Priced at $1800, the robot doll boasts advanced features like conversation abilities, medication reminders, and safety alerts. With 7,000 dolls already deployed, Hyodol aims to expand to European and North American markets. (Link)

🙃 Midjourney bans all its competitor’s employees

  • Midjourney banned all Stability AI employees from using its service, citing a systems outage caused by data scraping efforts linked to Stability AI employees.
  • The company announced the ban and a new policy against “aggressive automation” after identifying botnet-like activity from Stability AI during a server outage.
  • Stability AI CEO Emad Mostaque is looking into the incident, and Midjourney’s founder David Holz has provided information for the internal investigation.
  • Source

🚫 Google restricts election-related queries for its Gemini chatbot

  • Google has begun restricting Gemini queries related to elections globally in countries where elections are taking place, to prevent the dissemination of false or misleading information.
  • The restrictions were implemented amid concerns over generative AI’s potential impact on elections and followed an advisory from India requiring tech firms to obtain government permission before introducing new AI models.
  • Despite the restrictions, the effectiveness of the restrictions is under question as some users found ways to bypass them, and it’s uncertain if Google will lift these restrictions post-elections.
  • Source

💰 AI startups reach record funding of nearly $50 billion in 2023

  • AI startups reached a record funding of nearly $50 billion in 2023, with significant contributions from companies like OpenAI and Anthropic.
  • Investment trends showed over 70 funding rounds exceeding $100 million each, partly due to major companies’ investments, including Microsoft’s $10 billion in OpenAI.
  • While large tech companies are venturing to dominate the AI market, specialized AI startups like Midjourney manage to maintain niches by offering superior products.
  • Source

A Daily Chronicle of AI Innovations in March 2024 – Day 11: AI Daily News – March 11th, 2024

🖼️ Huawei’s PixArt-Σ paints prompts to perfection
🧠 Meta cracks the code to improve LLM reasoning
📈 Yi Models exceed benchmarks with refined data

Huawei’s PixArt-Σ paints prompts to perfection

Researchers from Huawei’s Noah’s Ark Lab introduced PixArt-Σ, a text-to-image model that can create 4K resolution images with impressive accuracy in following prompts. Despite having significantly fewer parameters than models like SDXL, PixArt-Σ outperforms them in image quality and prompt matching.

  

The model uses a “weak-to-strong” training strategy and efficient token compression to reduce computational requirements. It relies on carefully curated training data with high-resolution images and accurate descriptions, enabling it to generate detailed 4K images closely matching the text prompts. The researchers claim that PixArt-Σ can even keep up with commercial alternatives such as Adobe Firefly 2, Google Imagen 2, OpenAI DALL-E 3, and Midjourney v6.

Why does this matter?

PixArt-Σ’s ability to generate high-resolution, photorealistic images accurately could impact industries like advertising, media, and entertainment. As its efficient approach requires fewer computational resources than existing models, businesses may find it easier and more cost-effective to create custom visuals for their products or services.

Source

Meta cracks the code to improve LLM reasoning

Meta researchers investigated using reinforcement learning (RL) to improve the reasoning abilities of large language models (LLMs). They compared algorithms like Proximal Policy Optimization (PPO) and Expert Iteration (EI) and found that the simple EI method was particularly effective, enabling models to outperform fine-tuned models by nearly 10% after several training iterations.

However, the study also revealed that the tested RL methods have limitations in further improving LLMs’ logical capabilities. The researchers suggest that stronger exploration techniques, such as Tree of Thoughts, XOT, or combining LLMs with evolutionary algorithms, are important for achieving greater progress in reasoning performance.

Why does this matter?

Meta’s research highlights the potential of RL in improving LLMs’ logical abilities. This could lead to more accurate and efficient AI for domains like scientific research, financial analysis, and strategic decision-making. By focusing on techniques that encourage LLMs to discover novel solutions and approaches, researchers can make more advanced AI systems.

Source

Yi models exceed benchmarks with refined data

01.AI has introduced the Yi model family, a series of language and multimodal models that showcase impressive multidimensional abilities. The Yi models, based on 6B and 34B pretrained language models, have been extended to include chat models, 200K long context models, depth-upscaled models, and vision-language models.

The performance of the Yi models can be attributed to the high-quality data resulting from 01.AI‘s data-engineering efforts. By constructing a massive 3.1 trillion token dataset of English and Chinese corpora and meticulously polishing a small-scale instruction dataset, 01.AI has created a solid foundation for their models. The company believes that scaling up model parameters using thoroughly optimized data will lead to even more powerful models.

Why does this matter?

The Yi models’ success in language, vision, and multimodal tasks suggests that they could be adapted to a wide range of applications, from customer service chatbots to content moderation and beyond. These models also serve as a prime example of how investing in data optimization can lead to groundbreaking advancements in the field.

Source

OpenAI’s Evolution into Skynet: AI and Robotics Future, Figure Humanoid Robots

 

  • OpenAI’s partnership with Figure signifies a transformative step in the evolution of AI and robotics.
  • Utilizing Microsoft Azure, OpenAI’s investment supports the deployment of autonomous humanoid robots for commercial use.
  • Figure’s collaboration with BMW Manufacturing integrates humanoid robots to enhance automotive production.
  • This technological progression echoes the fictional superintelligence Skynet yet emphasizes real-world innovation and safety.
  • The industry valuation of Figure at $2.6 billion underlines the significant impact and potential of advanced AI in commercial sectors.

What Else Is Happening in AI on March 11, 2024❗

🏠 Redfin’s AI can tell you about your dream neighborhood

“Ask Redfin” can now answer questions about homes, neighborhoods, and more. Using LLMss, the chatbot can provide insights on air conditioning, home prices, safety, and even connect users to agents. It is currently available in 12 U.S. cities, including Atlanta, Boston, Chicago, and Washington, D.C. (Link)

🔊 Pika Labs Adds Sound to Silent AI Videos 

Pika Labs users can now add sound effects to their generated videos. Users can either specify the exact sounds they want or let Pika’s AI automatically select and integrate them based on the video’s content. This update aims to provide a more immersive and engaging video creation experience, setting a new standard in the industry. (Link)

🩺 Salesforce’s new AI tool for doctors automates paperwork

Salesforce is launching new AI tools to help healthcare workers automate tedious administrative tasks. Einstein Copilot: Health Actions will allow doctors to book appointments, summarize patient info, and send referrals using conversational AI, while Assessment Generation will digitize health assessments without manual typing or coding. (Link)

🖥️ HP’s new AI-powered PCs redefine work 

HP just dropped a massive lineup of AI-powered PCs, including the HP Elite series, Z by HP mobile workstations, and Poly Studio conferencing solutions. These devices use AI to improve productivity, creativity, and collaboration for the hybrid workforce, while also offering advanced security features like protection against quantum computer hacks. (Link)

🎨 DALL-E 3’s new look is artsy and user-friendly

OpenAI is testing a new user interface for DALL-E 3. It allows users to choose between predefined styles and aspect ratios directly in the GPT, offering a more intuitive and educational experience. OpenAI has also implemented the C2PA standard for metadata verification and is working on an image classifier to reliably recognize DALL-E images. (Link)

A Daily Chronicle of AI Innovations in March 2024 – Week 1 Summary

  1. Anthropic introduced the next generation of Claude: Claude 3 model family. It includes OpusSonnet and Haiku models. Opus is the most intelligent model, that outperforms GPT-4 and Gemini 1.0 Ultra on most of the common evaluation benchmarks. Haiku is the fastest, most compact model for near-instant responsiveness. The Claude 3 models have vision capabilities, offer a 200K context window capable of accepting inputs exceeding 1 million tokens, improved accuracy and fewer refusals [Details | Model Card].
  2. Stability AI partnered with Tripo AI and released TripoSR, a fast 3D object reconstruction model that can generate high-quality 3D models from a single image in under a second. The model weights and source code are available under the MIT license, allowing commercialized use. [Details | GitHub | Hugging Face].
  3. Answer.AI released a fully open source system that, for the first time, can efficiently train a 70b large language model on a regular desktop computer with two or more standard gaming GPUs. It combines QLoRA with Meta’s FSDP, which shards large models across multiple GPUs [Details].
  4. Inflection launched Inflection-2.5, an upgrade to their model powering Pi, Inflection’s empathetic and supportive companion chatbot. Inflection-2.5 approaches GPT-4’s performance, but used only 40% of the amount of compute for training. Pi is also now available on Apple Messages [Details].
  5. Twelve Labs introduced Marengo-2.6, a new state-of-the-art (SOTA) multimodal foundation model capable of performing any-to-any search tasks, including Text-To-Video, Text-To-Image, Text-To-Audio, Audio-To-Video, Image-To-Video, and more [Details].
  6. Cloudflare announced the development of Firewall for AI, a protection layer that can be deployed in front of Large Language Models (LLMs), hosted on the Cloudflare Workers AI platform or models hosted on any other third party infrastructure, to identify abuses before they reach the models [Details]
  7. Scale AI, in partnership with the Center for AI Safety, released WMDP (Weapons of Mass Destruction Proxy): an open-source evaluation benchmark of 4,157 multiple-choice questions that serve as a proxy measurement of LLM’s risky knowledge in biosecurity, cybersecurity, and chemical security [Details].
  8. Midjourney launched v6 turbo mode to generate images at 3.5x the speed (for 2x the cost). Just type /turbo [Link].
  9. Moondream.ai released moondream 2 – a small 1.8B parameters, open-source, vision language model designed to run efficiently on edge devices. It was initialized using Phi-1.5 and SigLIP, and trained primarily on synthetic data generated by Mixtral. Code and weights are released under the Apache 2.0 license, which permits commercial use [Details].
  10. Vercel released Vercel AI SDK 3.0. Developers can now associate LLM responses to streaming React Server Components [Details].
  11. Nous Research released a new model designed exclusively to create instructions from raw-text corpuses, Genstruct 7B. This enables the creation of new, partially synthetic instruction finetuning datasets from any raw-text corpus [Details].
  12. 01.AI open-sources Yi-9B, one of the top performers among a range of similar-sized open-source models excelling in code, math, common-sense reasoning, and reading comprehension [Details].
  13. Accenture to acquire Udacity to build a learning platform focused on AI [Details].
  14. China Offers ‘Computing Vouchers’ upto $280,000 to Small AI Startups to train and run large language models [Details].
  15. Snowflake and Mistral have partnered to make Mistral AI’s newest and most powerful model, Mistral Large, available in the Snowflake Data Cloud [Details]
  16. OpenAI rolled out ‘Read Aloud’ feature for ChatGPT, enabling ChatGPT to read its answers out loud. Read Aloud can speak 37 languages but will auto-detect the language of the text it’s reading [Details].

A Daily Chronicle of AI Innovations in March 2024 – Day 8: AI Daily News – March 08th, 2024

🗣️Inflection 2.5: A new era of personal AI is here!
🔍Google announces LLMs on device with MediaPipe
🤖GaLore: A new method for memory-efficient LLM training

📱Adobe makes creating social content on mobile easier

🛡️OpenAI now allows users to add MFA to user accounts

🏅US Army is building generative AI chatbots in war games

🧑‍🎨 Claude 3 builds the painting app in 2 minutes and 48 seconds

🧪Cognizant launches AI lab in San Francisco to drive innovation

Inflection 2.5: A new era of personal AI is here!

Inflection.ai, the company behind the personal AI app Pi, has recently introduced Inflection-2.5, an upgraded large language model (LLM) that competes with top LLMs like GPT-4 and Gemini. The in-house upgrade offers enhanced capabilities and improved performance, combining raw intelligence with the company’s signature personality and empathetic fine-tuning.

Inflection 2.5: A new era of personal AI is here!
Inflection 2.5: A new era of personal AI is here!

This upgrade has made significant progress in coding and mathematics, keeping Pi at the forefront of technological innovation. With Inflection-2.5, Pi has world-class real-time web search capabilities, providing users with high-quality breaking news and up-to-date information. This empowers Pi users with a more intelligent and empathetic AI experience.

Why does it matter?

Inflection-2.5 challenges leading language models like GPT-4 and Gemini with their raw capability, signature personality, and empathetic fine-tuning. This will provide a new alternative for startups and enterprises building personalized applications with generative AI capabilities.

Source

Google announces LLMs on device with MediaPipe

Google’s new experimental release called the MediaPipe LLM Inference API  allows LLMs to run fully on-device across platforms. This is a significant development considering LLMs’ memory and computing demands, which are over a hundred times larger than traditional on-device models.

Google announces LLMs on device with MediaPipe
Google announces LLMs on device with MediaPipe

The MediaPipe LLM Inference API is designed to streamline on-device LLM integration for web developers and supports Web, Android, and iOS platforms. It offers several key features and optimizations that enable on-device AI. These include new operations, quantization, caching, and weight sharing. Developers can now run LLMs on devices like laptops and phones using MediaPipe LLM Inference API.

Why does it matter?

Running LLMs on devices using MediaPipe and TensorFlow Lite allows for direct deployment, reducing dependence on cloud services. On-device LLM operation ensures faster and more efficient inference, which is crucial for real-time applications like chatbots or voice assistants. This innovation helps rapid prototyping with LLM models and offers streamlined platform integration.

Source

GaLore: A new method for memory-efficient LLM training

Researchers have developed a new technique called Gradient Low-Rank Projection (GaLore) to reduce memory usage while training large language models significantly. Tests have shown that GaLore achieves results similar to full-rank training while reducing optimizer state memory usage by up to 65.5% when pre-training large models like LLaMA.

GaLore: A new method for memory-efficient LLM training
GaLore: A new method for memory-efficient LLM training

It also allows pre-training a 7 billion parameter model from scratch on a single 24GB consumer GPU without needing extra techniques. This approach works well for fine-tuning and outperforms low-rank methods like LoRA on GLUE benchmarks while using less memory. GaLore is optimizer-independent and can be used with other techniques like 8-bit optimizers to save additional memory.

Why does it matter?

The gradient matrix’s low-rank nature will help AI developers during model training. GaLore minimizes the memory cost of storing gradient statistics for adaptive optimization algorithms. It enables training large models like LLaMA with reduced memory consumption, making it more accessible and efficient for researchers.

Source

🤖 OpenAI CTO complained to board about ‘manipulative’ CEO Sam Altman 

  • OpenAI CTO Mira Murati was reported by the New York Times to have played a significant role in CEO Sam Altman’s temporary removal, raising concerns about his leadership in a private memo and with the board.
  • Altman was accused of creating a toxic work environment, leading to fears among board members that key executives like Murati and co-founder Ilya Sutskever could leave, potentially causing a mass exit of talent.
  • Despite internal criticisms of Altman’s leadership and management of OpenAI’s startup fund, hundreds of employees threatened to leave if he was not reinstated, highlighting deep rifts within the company’s leadership.
  • Source

Saudi Arabia’s Male Humanoid Robot Accused of Sexual Harassment

A video of Saudi Arabia’s first male robot has gone viral after a few netizens accused the humanoid of touching a female reporter inappropriately.

“Saudi Arabia unveils its man-shaped AI robot, Mohammad, reacts to a reporter in its first appearance,” an X user wrote while sharing the video that people are claiming shows the robot’s inappropriate behaviour. You can view the original tweet here.

What Else Is Happening in AI on March 08th, 2024❗

📱Adobe makes creating social content on mobile easier

Adobe has launched an updated version of Adobe Express, a mobile app that now includes Firefly AI models. The app offers features such as a “Text to Image” generator, a “Generative Fill” feature, and a “Text Effects” feature, which can be utilized by small businesses and creative professionals to enhance their social media content. Creative Cloud members can also access and work on creative assets from Photoshop and Illustrator directly within Adobe Express. (Link)

🛡️OpenAI now allows users to add MFA to user accounts

To add extra security to OpenAI accounts, users can now enable Multi-Factor Authentication (MFA). To set up MFA, users can follow the instructions in the OpenAI Help Center article “Enabling Multi-Factor Authentication (MFA) with OpenAI.” MFA requires a verification code with their password when logging in, adding an extra layer of protection against unauthorized access. (Link)

🏅US Army is building generative AI chatbots in war games

The US Army is experimenting with AI chatbots for war games. OpenAI’s technology is used to train the chatbots to provide battle advice. The AI bots act as military commanders’ assistants, offering proposals and responding within seconds. Although the potential of AI is acknowledged, experts have raised concerns about the risks involved in high-stakes situations. (Link)

🧑‍🎨 Claude 3 builds the painting app in 2 minutes and 48 seconds

Claude 3, the latest AI model by Anthropic, created a multiplayer drawing app in just 2 minutes and 48 seconds. Multiple users could collaboratively draw in real-time with user authentication and database integration. The AI community praised the app, highlighting the transformative potential of AI in software development. Claude 3 could speed up development cycles and make software creation more accessible. (Link)

🧪Cognizant launches AI lab in San Francisco to drive innovation

Cognizant has opened an AI lab in San Francisco to accelerate AI adoption in businesses. The lab, staffed with top researchers and developers, will focus on innovation, research, and developing cutting-edge AI solutions. Cognizant’s investment in AI research positions them as a thought leader in the AI space, offering advanced solutions to meet the modernization needs of global enterprises. (Link)

A Daily Chronicle of AI Innovations in March 2024 – Day 7: AI Daily News – March 07th, 2024

🗣️Microsoft’s NaturalSpeech makes AI sound human
🔍Google’s search update targets AI-generated spam
🤖Google’s RT-Sketch teaches robots with doodles

🕵️ Ex-Google engineer charged with stealing AI secrets for Chinese firm

🚨 Microsoft engineer sounds alarm on company’s AI image generator in letter to FTC

🤔 Apple bans Epic’s developer account and calls the company ‘verifiably untrustworthy’

🍎 Apple reportedly developing foldable MacBook with 20.3-inch screen

🧠 Meta is building a giant AI model to power its ‘entire video ecosystem

Microsoft’s NaturalSpeech makes AI sound human

Microsoft and its partners have created NaturalSpeech 3, a new Text-to-Speech system that makes computer-generated voices sound more human. Powered by FACodec architecture and factorized diffusion models, NaturalSpeech 3 breaks down speech into different parts, like content, tone, and sound quality to create a natural-sounding speech that fits specific prompts, even for voices it hasn’t heard before.

Microsoft's NaturalSpeech makes AI sound human
Microsoft’s NaturalSpeech makes AI sound human

NaturalSpeech 3 works better than other voice tech in terms of quality, similarity, tone, and clarity. It keeps getting better as it learns from more data. By letting users change how the speech sounds through prompts, NaturalSpeech 3 makes talking to computers feel more like talking to a person. This research is a big step towards a future where chatting with computers is as easy as chatting with friends.

Why does this matter?

This advancement transcends mere voice quality. This could change the way we interact with devices like smartphones, smart speakers, and virtual assistants. Imagine having a more natural, engaging conversation with Siri, Alexa, or other AI helpers.

Better voice tech could also make services more accessible for people with visual impairments or reading difficulties. It might even open up new possibilities in entertainment, like more lifelike characters in video games or audiobooks that sound like they’re read by your favorite celebrities.

Source

Google’s search update targets AI-generated spam

Google has announced significant changes to its search ranking algorithms in order to reduce low-quality and AI-generated spam content in search results. The March update targets three main spam practices: mass distribution of unhelpful content, abusing site reputation to host low-quality content, and repurposing expired domains with poor content.

While Google is not devaluing all AI-generated content, it aims to judge content primarily on its usefulness to users. Most of the algorithm changes are effective immediately, though sites abusing their reputation have a 60-day grace period to change their practices. As Google itself develops AI tools, SGE and Gemini, the debate around AI content and search result quality is just beginning.

Why does this matter?

Websites that churn out lots of AI-made content to rank higher on Google may see their rankings drop. This might push them to focus more on content creation strategies, with a greater emphasis on quality over quantity.

For people using Google, the changes should mean finding more useful results and less junk.

As AI continues to advance, search engines like Google will need to adapt their algorithms to surface the most useful content, whether it’s written by humans or AI.

Source

Google’s RT-Sketch teaches robots with doodles

Google has introduced RT-Sketch, a new approach to teaching robots tasks using simple sketches. Users can quickly draw a picture of what they want the robot to do, like rearranging objects on a table. RT-Sketch focuses on the essential parts of the sketch, ignoring distracting details.

Google's RT-Sketch teaches robots with doodles
Google’s RT-Sketch teaches robots with doodles

Source

RT-Sketch is trained on a dataset of paired trajectories and synthetic goal sketches, and tested on six object rearrangement tasks. The results show that RT-Sketch performs comparably to image or language-conditioned agents in simple settings with written instructions on straightforward tasks. However, it did better when instructions were confusing or there were distracting objects present.

RT-Sketch can also interpret and act upon sketches with varying levels of detail, from basic outlines to colorful drawings.

Why does this matter?

With RT-Sketch, people can tell robots what to do without needing perfect images or detailed written instructions. This could make robots more accessible and useful in homes, workplaces, and for people who have trouble communicating in other ways.

As robots become a bigger part of our lives, easy ways to talk to them, like sketching, could help us get the most out of them. RT-Sketch is a step toward making robots that better understand what we need.

Source

What Else Is Happening in AI on March 07th, 2024❗

🤖Google’s Gemini lets users edit within the chatbox

Google has updated its Gemini chatbot, allowing users to directly edit and fine-tune responses within the chatbox. This feature, launched on March 4th for English users in the Gemini web app, enables more precise outputs by letting people select text portions and provide instructions for improvement. (Link)

📈Adobe’s AI boosts IBM’s marketing efficiency

IBM reports a 10-fold increase in designer productivity and a significant reduction in marketing campaign time after testing Adobe’s generative AI tools. The AI-powered tools have streamlined idea generation and variant creation, allowing IBM to achieve more in less time. (Link)

💡 Zapier’s new tool lets you make AI bots without coding

Zapier has released Zapier Central, a new AI tool that allows users to create custom AI bots by simply describing what they want, without any coding. The bots can work with Zapier’s 6,000+ connected apps, making it easy for businesses to automate tasks. (Link)

🤝Accenture teams up with Cohere to bring AI to enterprises

Accenture has partnered with AI startup, Cohere to provide generative AI solutions to businesses. Leveraging Cohere’s language models and search technologies, the collaboration aims to boost productivity and efficiency while ensuring data privacy and security. (Link)

🎥 Meta builds mega AI model for video recommendations
Meta is developing a single AI model to power its entire video ecosystem across platforms by 2026. The company has invested billions in Nvidia GPUs to build this model, which has already shown promising results in improving Reels watch time on the core Facebook app. (Link)

OpenAI is researching photonic processors to run their AI on

OpenAI hired this person:  He has been doing a lot of research on waveguides for photonic processing for both Training AI and for inference and he did a PHD about photonic waveguides:

I think that he is going to help OpenAI to build photonic waveguides that they can run their neural networks / AI Models on and this is really  cool if OpenAI actually think that they can build processors with faster Inference and training with photonics.

🕵️ Ex-Google engineer charged with stealing AI secrets for Chinese firm

  • Linwei Ding, a Google engineer, has been indicted for allegedly stealing over 500 files related to Google’s AI technology, including designs for chips and data center technologies, to benefit companies in China.
  • The stolen data includes designs for Google’s TPU chips and GPUs, crucial for AI workloads, amid U.S. efforts to restrict China’s access to AI-specific chips.
  • Ding allegedly transferred stolen files to a personal cloud account using a method designed to evade Google’s detection systems, was offered a CTO position by a Chinese AI company and founded a machine learning startup in China while still employed at Google.
  • Source

🚨 Microsoft engineer sounds alarm on company’s AI image generator in letter to FTC

  • Microsoft AI engineer Shane Jones warns that the company’s AI image generator, Copilot Designer, generates sexual and violent content and ignores copyright laws.
  • Jones shared his findings with Microsoft and contacted U.S. senators and the FTC, demanding better safeguards and an independent review of Microsoft’s AI incident reporting process.
  • In addition to the problems with Copilot Designer, other Microsoft products based on OpenAI technologies, such as Copilot Chat, tend to have poorer performance and more insecure implementations than the original OpenAI products, such as ChatGPT and DALL-E 3.
  • Source

🧠 Meta is building a giant AI model to power its ‘entire video ecosystem’ 

  • Meta is developing an AI model designed to power its entire video ecosystem, including the TikTok-like Reels service and traditional video content, as part of its technology roadmap through 2026.
  • The company has invested billions of dollars in Nvidia GPUs to support this AI initiative, aiming to improve recommendation systems and overall product performance across all platforms.
  • This AI model has already demonstrated an 8% to 10% increase in Reels watch time on the Facebook app, with Meta now working to expand its application to include the Feed recommendation product and possibly integrate sophisticated chatting tools.
  • Innovating for the Future

    As Meta continues to innovate and refine their AI model architecture, we can expect even more exciting developments in the future. The company’s dedication to enhancing the video recommendation experience and leveraging the full potential of AI is paving the way for a new era in online video consumption.

    Stay tuned for more updates as Meta strives to revolutionize the digital video landscape with its cutting-edge AI technology.

    r/aidailynewsupdates - Meta's AI Model to Revolutionize Video Ecosystem
  • Source

Will AI destroy the adtech industry?

Some points to consider on both sides:

Yes:

– AI will enable humans to get content they want, nothing more

– New AI OSes will act ‘for’ the human, cleaning content of ads

– OpenAI and new startups don’t need ad revenue, they’ll take monthly subscriptions to deliver information with no ads

No:

– New AI OSes will integrate ads even more closely into the computing experience, acting ‘against’ the human

– Content will be more tightly integrated with ads, and AI won’t be able to unpiece this

– Meta and Alphabet have $100bns of skin in the game, they will make sure this doesn’t happen, including by using their lawyers to prevent lifting content out of the ad context

A Daily Chronicle of AI Innovations in March 2024 – Day 6: AI Daily News – March 06th, 2024

🏆 Microsoft’s Orca AI beats 10x bigger models in math
🎨 GPT-4V wins at turning designs into code
🎥 DeepMind alums’ Haiper joins the AI video race

🤔 OpenAI fires back, says Elon Musk demanded ‘absolute control’ of the company

📱 iOS 17.4 is here: what you need to know

🚫 TikTok faces US ban if ByteDance fails to sell app

🔍 Google now wants to limit the AI-powered search spam it helped create

OpenAI vs Musk (openai responds to elon musk).

 What does Elon mean by: “Unfortunately, humanity’s future is in the hands of <redacted>”? Is it google?

What does elon mean "Unfortunately, humanity's future is in the hands of <redacted>"? Is it google?
What does elon mean “Unfortunately, humanity’s future is in the hands of “? Is it google?
What does elon mean "Unfortunately, humanity's future is in the hands of <redacted>"? Is it google?
What does elon mean “Unfortunately, humanity’s future is in the hands of “? Is it google?
  • OpenAI has countered Elon Musk’s lawsuit by revealing Musk’s desire for “absolute control” over the company, including merging it with Tesla, holding majority equity, and becoming CEO.
  • In a blog post, OpenAI aims to dismiss Musk’s claims and argues against his view that the company has deviated from its original nonprofit mission and has become too closely aligned with Microsoft.
  • OpenAI defends its stance on not open-sourcing its work, citing a 2016 email exchange with Musk that supports a less open approach as the development of artificial general intelligence advances.

For the first time in history, an AI has a higher IQ than the average human.

For the first time in history, an AI has a higher IQ than the average human.
For the first time in history, an AI has a higher IQ than the average human.

Claude 3 vs. GPT-4

Right now, the question on everyone’s mind is whether Claude 3 is better than GPT-4. It’s a fair question; GPT-4 has dominated the LLM benchmarks for over a year, despite plenty of competitors trying to catch up.

Certainly, GPT-4 now has some real competition in the form of Claude 3 and Gemini 1.5. Even if we put the benchmarks aside for a moment, capabilities like video comprehension and million-token context windows are pushing the state of the art forward, and OpenAI could finally cede its dominant position.

But I think that “best,” when it comes to LLMs, is a little bit of a red herring. Despite the marketing and social media hype, these models have more similarities than differences. Ultimately, “best” depends on your use cases and preferences.

Claude 3 may be better at reasoning and language comprehension than GPT-4, but that won’t matter much if you’re mainly generating code. Likewise, Gemini 1.5 may have better multi-modal capabilities, but if you’re concerned with working in different languages, then Claude might be your best bet. In my (very limited) testing, I’ve found that Opus is a much better writer than GPT-4 – the default writing style is far more “normal” than what I can now recognize as ChatGPT-generated content. But I’ve yet to try brainstorming and code generation tasks.

So, for now, my recommendation is to keep experimenting and find a model that works for you. Not only because each person’s use cases differ but also because the models are regularly improving! In the coming months, Anthropic plans to add function calls, interactive coding, and more agentic capabilities to Claude 3.

To try Claude 3 for yourself, you can start talking with Claude 3 Sonnet today (though you’ll need to be in one of Anthropic’s supported countries). Opus is available to paid subscribers of Claude Pro. If you’re a developer, Opus and Sonnet are available via the API, and Sonnet is additionally available through Amazon Bedrock and Google Cloud’s Vertex AI Model Garden. The models are also available via a growing number of third-party apps and services: check your favorite AI tool to see if it supports Claude 3!

Guy builds an AI-steered homing/killer drone in just a few hours

Guy builds an AI-steered homing/killer drone in just a few hours
Guy builds an AI-steered homing/killer drone in just a few hours

Read Aloud For Me AI Dashboard on the App Store (apple.com)

Always Say Hello to Your GPTs… (Better Performing Custom GPTs)

I’ve been testing out lots of custom GPTs that others have made. Specifically games and entertaining GPTs and I noticed some issues and a solution.

The problem: First off, many custom GPT games seem to forget to generate images as per their instructions. I also noticed that, often, the game or persona (or whatever the GPT aims to be) becomes more of a paraphrased or simplified version of what it should be and responses become more like base ChatGPT.

The solution: I’ve noticed that custom GPTs will perform much better if the user starts the initial conversation with a simple ”Hello, can you explain your functionality and options to me?”. This seems to remind the custom GPT of it’s tone ensures it follow’s its instructions.

Microsoft’s Orca AI beats 10x bigger models in math

Microsoft’s Orca team has developed Orca-Math, an AI model that excels at solving math word problems despite its compact size of just 7 billion parameters. It outperforms models ten times larger on the GSM8K benchmark, achieving 86.81% accuracy without relying on external tools or tricks. The model’s success is attributed to training on a high-quality synthetic dataset of 200,000 math problems created using multi-agent flows and an iterative learning process involving AI teacher and student agents.

Microsoft's Orca AI beats 10x bigger models in math
Microsoft’s Orca AI beats 10x bigger models in math

The Orca team has made the dataset publicly available under the MIT license, encouraging researchers and developers to innovate with the data. The small dataset size highlights the potential of using multi-agent flows to generate data and feedback efficiently.

Why does this matter?

Orca-Math’s breakthrough performance shows the potential for smaller, specialized AI models in niche domains. This development could lead to more efficient and cost-effective AI solutions for businesses, as smaller models require less computational power and training data, giving companies a competitive edge.

Source

GPT-4V wins at turning designs into code

With unprecedented capabilities in multimodal understanding and code generation, GenAI can enable a new paradigm of front-end development where LLMs directly convert visual designs into code implementation. New research formalizes this as “Design2Code” task and conduct comprehensive benchmarking. It also:

  • Introduces Design2Code benchmark consisting of diverse real-world webpages as test examples
  • Develops comprehensive automatic metrics that complement human evaluations
  • Proposes new multimodal prompting methods that improve over direct prompting baselines.
  • Finetunes open-source Design2Code-18B model that matches the performance of Gemini Pro Vision on both human and automatic evaluation

Moreover, it finds 49% of the GPT-4V-generations webpages were good enough to replace the original references, while 64% were even better designed than the original references.

Why does this matter?

This research could simplify web development for anyone to build websites from visual designs using AI, much like word processors made writing accessible. For enterprises, automating this front-end coding process could improve collaboration between teams and speed up time-to-market across industries if implemented responsibly alongside human developers.

Source

What Else Is Happening in AI on March 06th, 2024❗

📸 Kayak’s AI finds cheaper flights from screenshots

Kayak introduced two new AI features: PriceCheck, which lets users upload flight screenshots to find cheaper alternatives and Ask Kayak, a ChatGPT-powered travel advice chatbot. These additions position Kayak alongside other travel sites, using generative AI to improve trip planning and flight price comparisons in a competitive market. (Link)

🎓 Accenture invests $1B in LearnVantage for AI upskilling

Accenture is launching LearnVantage, investing $1 billion over three years to provide clients with customized technology learning and training services. Accenture is also acquiring Udacity to scale its learning capabilities and meet the growing demand for technology skills, including generative AI, so organizations can achieve business value using AI. (Link)

🤝 Snowflake brings Mistral’s LLMs to its data cloud

Snowflake has partnered with Mistral AI to bring Mistral’s open LLMs into its Data Cloud. This move allows Snowflake customers to build LLM apps directly within the platform. It also marks a significant milestone for Mistral AI, which has recently secured partnerships with Microsoft, IBM, and Amazon. The deal positions Snowflake to compete more effectively in the AI space and increases Mistral AI visibility. (Link)

🛡️ Dell & CrowdStrike unite to fight AI threats

Dell and CrowdStrike are partnering to help businesses fight cyberattacks using AI. By integrating CrowdStrike’s Falcon XDR platform into Dell’s MDR service, they aim to protect customers against threats like generative AI attacks, social engineering, and endpoint breaches. (Link)

📱 AI app diagnoses ear infections with a snap

Physician-scientists at UPMC and the University of Pittsburgh have developed a smartphone app that uses AI to accurately diagnose ear infections (acute otitis media) in young children. The app analyzes short videos of the eardrum captured by an otoscope connected to a smartphone camera. It could help decrease unnecessary antibiotic use by providing a more accurate diagnosis than many clinicians. (Link)

DeepMind alums’ Haiper joins the AI video race

DeepMind alums Yishu Miao and Ziyu Wang have launched Haiper, a video generation tool powered by their own AI model. The startup offers a free website where users can generate short videos using text prompts, although there are limitations on video length and quality.

DeepMind alums' Haiper joins the AI video race
DeepMind alums’ Haiper joins the AI video race

The company has raised $19.2 million in funding and focuses on improving its AI model to deliver high-quality, realistic videos. They aim to build a core video generation model that can be offered to developers and address challenges like the “uncanny valley” problem in AI-generated human figures.

Why does this matter?

Haiper signals the race to develop video AI models that can disrupt industries like marketing, entertainment, and education by allowing businesses to generate high-quality video content cost-effectively. However, the technology is at an early stage, so there is room for improvement, highlighting the need for responsible development.

Source

A Daily Chronicle of AI Innovations in March 2024 – Day 5: AI Daily News – March 05th, 2024

🏆Anthropic’s Claude 3 Beats OpenAI’s GPT-4
🖼️ TripsoSR: 3D object generation from a single image in <1s
🔒 Cloudflare’s Firewall for AI protects LLMs from abuses

🥴 Google co-founder says company ‘definitely messed up’

🚫 Facebook, Instagram, and Threads are all down

🤔 Microsoft compares New York Times to ’80s movie studios trying to ban VCRs

💼 Fired Twitter execs are suing Elon Musk for over $128 million

Claude 3 gets ~60% accuracy on GPQA

 Claude 3 gets ~60% accuracy on GPQA
Claude 3 gets ~60% accuracy on GPQA

Anthropic’s Claude 3 beats OpenAI’s GPT-4

Anthropic has launched Claude 3, a new family of models that has set new industry benchmarks across a wide range of cognitive tasks. The family comprises three state-of-the-art models in ascending order of cognitive ability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. Each model provides an increasing level of performance, and you can choose the one according to your intelligence, speed, and cost requirements.

Anthropic’s Claude 3 beats OpenAI’s GPT-4
Anthropic’s Claude 3 beats OpenAI’s GPT-4

Opus and Sonnet are now available via claude.ai and the Claude API in 159 countries, and Haiku will join that list soon.

Claude 3 has set a new standard of intelligence among its peers on most of the common evaluation benchmarks for AI systems, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), basic mathematics (GSM8K), and more.

Anthropic’s Claude 3 beats OpenAI’s GPT-4
Anthropic’s Claude 3 beats OpenAI’s GPT-4

In addition, Claude 3 displays solid visual processing capabilities and can process a wide range of visual formats, including photos, charts, graphs, and technical diagrams.  Lastly, compared to Claude 2.1, Claude 3 exhibits 2x accuracy and precision for responses and correct answers.

Why does it matter?

In 2024, Gemini and ChatGPT caught the spotlight, but now Claude 3 has emerged as the leader in AI benchmarks. While benchmarks matter, only the practical usefulness of Claude 3 will tell if it is truly superior. This might also prompt OpenAI to release a new ChatGPT upgrade. However, with AI models becoming more common and diverse, it’s unlikely that one single model will emerge as the ultimate winner.

Source

TripsoSR: 3D object generation from a single image in <1s

Stability AI has introduced a new AI model named TripsoSR in partnership with Trip AI. The model enables high-quality 3D object generation or rest from a single in less than a second. It runs under low inference budgets (even without a GPU) and is accessible to many users.

TripsoSR: 3D object generation from a single image in <1s
TripsoSR: 3D object generation from a single image in <1s

As far as performance, TripoSR can create detailed 3D models in a fraction of the time of other models. When tested on an Nvidia A100, it generates draft-quality 3D outputs (textured meshes) in around 0.5 seconds, outperforming other open image-to-3D models such as OpenLRM.

TripsoSR: 3D object generation from a single image in <1s
TripsoSR: 3D object generation from a single image in <1s

Why does it matter?

TripoSR caters to the growing demands of various industries, including entertainment, gaming, industrial design, and architecture. The availability of the model weights and source code for download further promotes commercialized, personal, and research use, making it a valuable asset for developers, designers, and creators.

Source

Cloudflare’s Firewall for AI protects LLMs from abuses

Cloudflare has released a Firewall for AI, a protection layer that you can deploy in front of Large Language Models (LLMs) to identify abuses before they reach the models. While the traditional web and API vulnerabilities also apply to the LLM world, Firewall for AI is an advanced-level Web Application Firewall (WAF) designed explicitly for LLM protection and placed in front of applications to detect vulnerabilities and provide visibility to model owners.

Cloudflare Firewall for AI is deployed like a traditional WAF, where every API request with an LLM prompt is scanned for patterns and signatures of possible attacks. You can deploy it in front of models hosted on the Cloudflare Workers AI platform or any other third-party infrastructure. You can use it alongside Cloudflare AI Gateway and control/set up a Firewall for AI using the WAF control plane.

Cloudflare's Firewall for AI protects LLMs from abuses
Cloudflare’s Firewall for AI protects LLMs from abuses

Why does it matter?

As the use of LLMs becomes more widespread, there is an increased risk of vulnerabilities and attacks that malicious actors can exploit. Cloudflare is one of the first security providers to launch tools to secure AI applications. Using a Firewall for AI, you can control what prompts and requests reach their language models, reducing the risk of abuses and data exfiltration. It also aims to provide early detection and protection for both users and LLM models, enhancing the security of AI applications.

Source

🤔 Microsoft compares New York Times to ’80s movie studios trying to ban VCRs

  • Microsoft filed a motion to dismiss the New York Times’ copyright infringement lawsuit against OpenAI, comparing the newspaper’s stance to 1980s movie studios’ attempts to block VCRs, arguing that generative AI, like the VCR, does not hinder the original content’s market.
  • The company, as OpenAI’s largest supporter, asserts that copyright law does not obstruct ChatGPT’s development because the training content does not substantially affect the market for the original content.
  • Microsoft and OpenAI contend that ChatGPT does not replicate or substitute for New York Times content, emphasizing that the AI’s training on such articles does not significantly contribute to its development.
  • Source

🥴 Google co-founder says company ‘definitely messed up’

  • Sergey Brin admitted Google “definitely messed up” with the Gemini AI’s image generation, highlighting issues like historically inaccurate images and the need for more thorough testing.
  • Brin, a core contributor to Gemini, came out of retirement due to the exciting trajectory of AI, amidst the backdrop of Google’s “code red” in response to OpenAI’s ChatGPT.
  • Criticism of Gemini’s biases and errors, including its portrayal of people of color and responses in written form, led to Brin addressing concerns over the AI’s unintended left-leaning output.
  • Source

A Daily Chronicle of AI Innovations in March 2024 – Day 4: AI Daily News – March 04th, 2024

👀 Google’s ScreenAI can ‘see’ graphics like humans do
🐛 How AI ‘worms’ pose security threats in connected systems
🧠 New benchmarking method challenges LLMs’ reasoning abilities

💊 AI may enable personalized prostate cancer treatment

🎥 Vimeo debuts AI-powered video hub for business collaboration

📱 Motorola revving up for AI-powered Moto X50 Ultra launch

📂 Copilot will soon fetch and parse your OneDrive files

⚡ Huawei’s new AI chip threatens Nvidia’s dominance in China

OpenAI adds ‘Read Aloud’ voiceover to ChatGPT

https://youtu.be/ZJvTv7zVX0s?si=yejANUAUtUwyXEH8

OpenAI rolled out a new “Read Aloud” feature for ChatGPT as rivals like Anthropic and Google release more capable language models. (Source)

The Voiceover Update

  • ChatGPT can now narrate responses out loud on mobile apps and web.

  • Activated by tapping the response or clicking the microphone icon.

  • Update comes as Anthropic unveils their newest Claude 3 model.

  • Timing seems reactive amid intense competition over advanced AI. OpenAI also facing lawsuit from Elon Musk over alleged betrayal.

Anthropic launches Claude 3, claiming to outperform GPT-4 across the board

https://youtu.be/Re0WgPNiLo4?si=DwfGraTvhVo8kjuK

Here’s the announcement from Anthropic and their benchmark results:
https://twitter.com/AnthropicAI/status/1764653830468428150

Anthropic launches Claude 3, claiming to outperform GPT-4 across the board
Anthropic launches Claude 3, claiming to outperform GPT-4 across the board

Google’s ScreenAI can ‘see’ graphics like humans do

Google Research has introduced ScreenAI, a Vision-Language Model that can perform question-answering on digital graphical content like infographics, illustrations, and maps while also annotating, summarizing, and navigating UIs. The model combines computer vision (PaLI architecture) with text representations of images to handle these multimodal tasks.

Despite having just 4.6 billion parameters, ScreenAI achieves new state-of-the-art results on UI- and infographics-based tasks and new best-in-class performance on others, compared to models of similar size.

Google’s ScreenAI can ‘see’ graphics like humans do
Google’s ScreenAI can ‘see’ graphics like humans do

While ScreenAI is best-in-class on some tasks, further research is needed to match models like GPT-4 and Gemini, which are significantly larger. Google Research has released a dataset with ScreenAI’s unified representation and two other datasets to help the community experiment with more comprehensive benchmarking on screen-related tasks.

Why does this matter?

ScreenAI’s breakthrough in unified visual and language understanding bridges the disconnect between how humans and machines interpret ideas across text, images, charts, etc. Companies can now leverage these multimodal capabilities to build assistants that summarize reports packed with graphics, analysts that generate insights from dashboard visualizations, and agents that manipulate UIs to control workflows.

Source

How AI ‘worms’ pose security threats in connected systems

Security researchers have created an AI “worm” called Morris II to showcase vulnerabilities in AI ecosystems where different AI agents are linked together to complete tasks autonomously.

The researchers tested the worm in a simulated email system using ChatGPT, Gemini, and other popular AI tools. The worm can exploit these AI systems to steal confidential data from emails or forward spam/propaganda without human approval. It works by injecting adversarial prompts that make the AI systems behave maliciously.

While this attack was simulated, the research highlights risks if AI agents are given too much unchecked freedom to operate.

Why does it matter?

This AI “worm” attack reveals that generative models like ChatGPT have reached capabilities that require heightened security to prevent misuse. Researchers and developers must prioritize safety by baking in controls and risk monitoring before commercial release. Without industry-wide commitments to responsible AI, regulation may be needed to enforce acceptable safeguards across critical domains as systems gain more autonomy.

Source

New benchmarking method challenges LLMs’ reasoning abilities

Researchers at Consequent AI have identified a “reasoning gap” in large language models like GPT-3.5 and GPT-4. They introduced a new benchmarking approach called “functional variants,” which tests a model’s ability to reason instead of just memorize. This involves translating reasoning tasks like math problems into code that can generate unique questions requiring the same logic to solve.

New benchmarking method challenges LLMs’ reasoning abilities
New benchmarking method challenges LLMs’ reasoning abilities

When evaluating several state-of-the-art models, the researchers found a significant gap between performance on known problems from benchmarks versus new problems the models had to reason through. The gap was 58-80%, indicating the models do not truly understand complex problems but likely just store training examples. The models performed better on simpler math but still demonstrated limitations in reasoning ability.

Why does this matter?

This research reveals that reasoning still eludes our most advanced AIs. We risk being misled by claims of progress made by the Big Tech if their benchmarks reward superficial tricks over actual critical thinking. Moving forward, model creators will have to prioritize generalization and logic over memorization if they want to make meaningful progress towards general intelligence.

Source

What Else Is Happening in AI on March 04th, 2024❗

💊 AI may enable personalized prostate cancer treatment

Researchers used AI to analyze prostate cancer DNA and found two distinct subtypes called “evotypes.” Identifying these subtypes could allow for better prediction of prognosis and personalized treatments. (Link)

🎥 Vimeo debuts AI-powered video hub for business collaboration

Vimeo has launched a new product called Vimeo Central, an AI-powered video hub to help companies improve internal video communications, collaboration, and analytics. Key capabilities include a centralized video library, AI-generated video summaries and highlights, enhanced screen recording and video editing tools, and robust analytics. (Link)

📱 Motorola revving up for AI-powered Moto X50 Ultra launch

Motorola is building hype for its upcoming Moto X50 Ultra phone with a Formula 1-themed teaser video highlighting the device’s powerful AI capabilities. The phone will initially launch in China on April 21 before potentially getting a global release under the Motorola Edge branding. (Link)

📂 Copilot will soon fetch and parse your OneDrive files

Microsoft is soon to launch Copilot for OneDrive, an AI assistant that will summarize documents, extract information, answer questions, and follow commands related to files stored in OneDrive. Copilot can generate outlines, tables, and lists based on documents, as well as tailored summaries and responses. (Link)

⚡ Huawei’s new AI chip threatens Nvidia’s dominance in China

Huawei has developed a new AI chip, the Ascend 910B, which matches the performance of Nvidia’s A100 GPU based on assessments by SemiAnalysis. The Ascend 910B is already being used by major Chinese companies like Baidu and iFlytek and could take market share from Nvidia in China due to US export restrictions on Nvidia’s latest AI chips. (Link)

1-bit LLMs explained

Check out this new tutorial that summarizes the revolutionary paper “The Era of 1-bit LLMs” introducing BitNet b1.58 model and explain what are 1-bit LLMs and how they are useful.

A Daily Chronicle of AI Innovations in March 2024 – Day 2: AI Daily News – March 02nd, 2024

A Daily Chronicle of AI Innovations in March 2024 – Day 1: AI Daily News – March 01st, 2024

🪄Sora showcases jaw-dropping geometric consistency
🧑‍✈️Microsoft introduces Copilot for finance in Microsoft 365
🤖OpenAI and Figure team up to develop AI for robots

Elon Sues OpenAI for “breach of contract”

Elon Musk filed suit against OpenAI and CEO Sam Altman, alleging they have breached the artificial-intelligence startup’s founding agreement by putting profit ahead of benefiting humanity.

The 52-year-old billionaire, who helped fund OpenAI in its early days, said the company’s close relationship with Microsoft has undermined its original mission of creating open-source technology that wouldn’t be subject to corporate priorities. Musk, who is also CEO of Tesla has been among the most outspoken about the dangers of AI and artificial general intelligence, or AGI.

“To this day, OpenAI Inc.’s website continues to profess that its charter is to ensure that AGI “benefits all of humanity.” In reality, however, OpenAI has been transformed into a closed-source de facto subsidiary of the largest technology company in the world: Microsoft,” the lawsuit says.

ELON MUSK vs. SAMUEL ALTMAN, GREGORY BROCKMAN, OPENAI, INC.
Elon Sues OpenAI for “breach of contract”

Sora showcases jaw-dropping geometric consistency

Sora from OpenAI has been remarkable in video generation compared to other leading models like Pika and Gen2. In a recent benchmarking test conducted by ByteDanc.Inc in collaboration with Wuhan and Nankai University, Sora showcased video generation with high geometric consistency.

AI Innovations in March 2024: Sora showcases jaw-dropping geometric consistency
Sora showcases jaw-dropping geometric consistency

The benchmark test assesses the quality of generated videos based on how it adhere to the principles of physics in real-world scenarios. Researchers used an approach where generated videos are transformed into 3D models. Further, a team of researchers used the fidelity of geometric constraints to measure the extent to which generated videos conform to physics principles in the real world.

Why does it matter?

Sora’s remarkable performance in generating geometrically consistent videos can greatly boost several use cases for construction engineers and architects. Further, the new benchmarking will allow researchers to measure newly developed models to understand how accurately their creations conform to the principles of physics in real-world scenarios.

Source

Microsoft introduces Copilot for finance in Microsoft 365

Microsoft has launched Copilot for Finance, a new addition to its Copilot series that recommends AI-powered productivity enhancements. It aims to transform how finance teams approach their daily work with intelligent workflow automation, recommendations, and guided actions. This Copilot aims to simplify data-driven decision-making, helping finance professionals have more free time by automating manual tasks like Excel and Outlook.

Copilot for Finance simplifies complex variance analysis in Excel, account reconciliations, and customer account summaries in Outlook. Dentsu, Northern Trust, Schneider Electric, and Visa plan to use it alongside Copilot for Sales and Service to increase productivity, reduce case handling times, and gain better decision-making insights.

Why does it matter?

Introducing Microsoft Copilot for finance will help businesses focus on strategic involvement from professionals otherwise busy with manual tasks like data entry, workflow management, and more. This is a great opportunity for several organizations to automate tasks like analysis of anomalies, improve analytic efficiency, and expedite financial transactions.

Source

OpenAI and Figure team up to develop AI for robots 

Figure has raised $675 million in series B funding with investments from OpenAI, Microsoft, and NVIDIA. It is an AI robotics company developing humanoid robots for general-purpose usage. The collaboration agreement between OpenAI and Figure aims to develop advanced humanoid robots that will leverage the generative AI models at its core.

This collaboration will also help accelerate the development of smart humanoid robots capable of understanding tasks like humans. With its deep understanding of robotics, Figure is set to bring efficient robots for general-purpose enhancing automation.

Why does it matter?

Open AI and Figure will transform robot operations, adding generative AI capabilities. This collaboration will encourage the integration of generative AI capabilities across robotics development. Right from industrial robots to general purpose and military applications, generative AI can be the new superpower for robotic development.

Source

🔍 Google now wants to limit the AI-powered search spam it helped create

  • Google announced it will tackle AI-generated content aiming to manipulate search rankings through algorithmic enhancements, affecting automated content creation the most.
  • These algorithm changes are intended to discern and reduce low-quality and unhelpful webpages, aiming to improve the overall quality of search results.
  • The crackdown also targets misuse of high-reputation websites and the exploitation of expired domains for promoting substandard content.
  • Source

What Else Is Happening in AI in March 2024❗

🤝Stack Overflow partners with Google Cloud to power AI 

Stack Overflow and Google Cloud are partnering to integrate OverflowAPI into Google Cloud’s AI tools. This will give developers accessing the Google Cloud console access to Stack Overflow’s vast knowledge base of over 58 million questions and answers. The partnership aims to enable AI systems to provide more insightful and helpful responses to users by learning from the real-world experiences of programmers. (Link)

💻Microsoft unites rival GPU makers for one upscaling API

Microsoft is working with top graphics hardware makers to introduce “DirectSR”, a new API that simplifies the integration of super-resolution upscaling into games. DirectSR will allow game developers to easily access Nvidia’s DLSS, AMD’s FSR, and Intel’s XeSS with a single code path. Microsoft will preview the API in its Agility SDK soon and demonstrate it live with AMD and Nvidia reps on March 21st. (Link)

📈Google supercharges data platforms with AI for deeper insights

Google is expanding its AI capabilities across data and analytics services, including BigQuery and Cloud Databases. Vector search support is available across all databases, and BigQuery has the advanced Gemini Pro model for unstructured data analysis. Users can combine insights from images, video, audio, and text with structured data in a single analytics workflow. (Link)

🔍 Brave’s privacy-first AI-powered assistant is now available on Android 

Brave’s AI-powered assistant, Leo, is now available on Android, bringing helpful features like summarization, transcription, and translation while prioritizing user privacy. Leo processes user inputs locally on the device without retaining or using data to train itself, aligning with Brave’s commitment to privacy-focused services. Users can simplify tasks with Leo without compromising on security. (Link)

Elsewhere in AI anxiety:

February 2024 AI Recap

February 2024 AI Recap
February 2024 AI Recap

February 2024 – Week 4 Recap

  1. Mistral introduced a new model Mistral Large. It reaches top-tier reasoning capabilities, is multi-lingual by design, has native function calling capacities and has 32K tokens context window. The pre-trained model has 81.2% accuracy on MMLU. Alongside Mistral Large, Mistral Small, a model optimized for latency and cost has been released. Mistral Small outperforms Mixtral 8x7B and has lower latency. Mistral also launched a ChatGPT like new conversational assistant, le Chat Mistral [Details].
  2. Alibaba Group introduced EMO, an expressive audio-driven portrait-video generation framework. Input a single reference image and the vocal audio, e.g. talking and singing, it can generate vocal avatar videos with expressive facial expressions, and various head poses [Details].
  3. Ideogram introduced Ideogram 1.0, a text-to-image model trained from scratch for state-of-the-art text rendering, photorealism, prompt adherence, and a feature called Magic Prompt to help with prompting. Ideogram 1.0 is now available to all users on ideogram.ai [Details].
    Ideogram introduced Ideogram 1.0
    Ideogram introduced Ideogram 1.0
  4. Google DeepMind introduced Genie (generative interactive environments), a foundation world model trained exclusively from Internet videos that can generate interactive, playable environments from a single image prompt  [Details].
  5. Pika Labs launched Lip Sync feature, powered by audio from Eleven Labs, for its AI generated videos enabling users to make the characters talk with realistic mouth movements [Video].
  6.  UC Berkeley introduced Berkeley Function Calling Leaderboard (BFCL) to evaluate the function calling capability of different LLMs. Gorilla Open Functions v2, an open-source model that can help users with building AI applications with function calling and interacting with json compatible output has also been released [Details].
  7. Qualcomm launched AI Hub, a curated library of 80+ optimized AI models for superior on-device AI performance across Qualcomm and Snapdragon platforms [Details].
  8. BigCode released StarCoder2, a family of open LLMs for code and comes in 3 different sizes with 3B, 7B and 15B parameters. StarCoder2-15B is trained on over 4 trillion tokens and 600+ programming languages from The Stack v2 dataset [Details].
  9. Researchers released FuseChat-7B-VaRM, which is the fusion of three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B, surpassing GPT-3.5 (March), Claude-2.1, and approaching Mixtral-8x7B-Instruct [Details].
  10. The Swedish fintech Klarna’s AI assistant handles two-thirds of all customer service chats, some 2.3 million conversations so far, equivalent to the work of 700 people [Details].
  11. Lightricks introduces LTX Studio, an AI-powered film making platform, now open for waitlist sign-ups, aimed at assisting creators in story visualization [Details].
  12. Morph partners with Stability AI to launch Morph Studio, a platform to make films using Stability AI–generated clips [Details].
  13. JFrog‘s security team found that roughly a 100 models hosted on the Hugging Face platform feature malicious functionality [Details].
  14. Playground released Playground v2.5, an open-source text-to-image generative model, with a focus on enhanced color and contrast, improved generation for multi-aspect ratios, and improved human-centric fine detail [Details].
  15. Together AI and the Arc Institute released Evo, a long-context biological foundation model based on the StripedHyena architecture that generalizes across DNA, RNA, and proteins.. Evo is capable of both prediction tasks and generative design, from molecular to whole genome scale (over 650k tokens in length) [Details].
  16. Adobe previews a new generative AI music generation and editing tool, Project Music GenAI Control, that allows creators to generate music from text prompts, and then have fine-grained control to edit that audio for their precise needs [Details | video].
  17. Microsoft introduces Copilot for Finance, an AI chatbot for finance workers in Excel and Outlook [Details].
  18. The Intercept, Raw Story, and AlterNet sue OpenAI and Microsoft, claiming OpenAI and Microsoft intentionally removed important copyright information from training data [Details].
  19. Huawei spin-off Honor shows off tech to control a car with your eyes and chatbot based on Meta’s AI [Details].
  20. Tumblr and WordPress.com are preparing to sell user data to Midjourney and OpenAI [Details]

February 2024 – Week 3 Recap

  1. Meta AI introduces V-JEPA (Video Joint Embedding Predictive Architecture), a method for teaching machines to understand and model the physical world by watching videos. Meta AI releases a collection of V-JEPA vision models trained with a feature prediction objective using self-supervised learning. The models are able to understand and predict what is going on in a video, even with limited information [Details | GitHub].
  2. Open AI introduces Sora, a text-to-video model that can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions [Details + sample videos Report].
  3. Google announces their next-generation model, Gemini 1.5, that uses a new Mixture-of-Experts (MoE) architecture. The first Gemini 1.5 model being released for early testing is Gemini 1.5 Pro with a context window of up to 1 million tokens, which is the longest context window of any large-scale foundation model yet. 1.5 Pro can perform sophisticated understanding and reasoning tasks for different modalities, including video and it performs at a similar level to 1.0 Ultra [