AI Innovations in April 2024.
Welcome to the April 2024 edition of the Daily Chronicle, your gateway to the latest artificial intelligence innovations! Join us as we track the most recent advancements, trends, and groundbreaking discoveries in AI, from industry events like ‘AI Innovations at Work’ to the visionary forecasts shaping the field. Stay informed with daily updates as we explore cutting-edge developments and their potential impact throughout this exciting month.
Experience the transformative capabilities of AI with “Read Aloud For Me – AI Dashboard – AI Tools Catalog – AI Tools Recommender” – your ultimate AI Dashboard and Hub. Seamlessly access a comprehensive suite of top-tier AI tools within a single app, meticulously crafted to enhance your efficiency and streamline your digital interactions. Now available on the web at readaloudforme.com and across popular app platforms including Apple, Google, and Microsoft, “Read Aloud For Me – AI Dashboard” places the future of AI at your fingertips, blending convenience with cutting-edge innovation. Whether for professional endeavors, educational pursuits, or personal enrichment, our app serves as your portal to the forefront of AI technologies. Embrace the future today by downloading our app and revolutionize your engagement with AI tools.
A Daily Chronicle of AI Innovations, April 30th, 2024:
Gradient AI releases Llama-3 8B with 1M context
Mysterious “gpt2-chatbot” AI model bemuses experts
GitHub’s Copilot Workspace turns ideas into AI-powered software
OpenAI collaborates with Financial Times to use its content in ChatGPT
Cohere’s Command R models family is accessible through Amazon Bedrock
NIST launches a new platform for generative AI evaluation
‘ChatGPT for CRISPR’ creates new genome-editing tools
Microsoft to invest $1.7 billion in Indonesia’s AI and cloud infrastructure
Gradient AI releases Llama-3 8B with 1M context
Gradient AI has released a new version of the Llama-3 8B language model called Llama-3-8B-Instruct-Gradient-1048k. The model’s key feature is its ability to handle extremely long contexts, up to 1 million tokens.
To extend the context window to 1 million tokens, Gradient AI used techniques like NTK-aware initialization of positional encodings, progressive training on increasing context lengths (as in prior work on long-context modeling), and optimizations for training efficiently on huge GPU clusters. The model was trained on 1.4 billion tokens for this extension, a tiny fraction of Llama-3’s original pretraining data.
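As a rough illustration of the NTK-aware idea (a minimal sketch only; Gradient AI’s exact recipe is not detailed here, and the scale factor below is hypothetical), the rotary embedding base is stretched so that low-frequency components cover a longer context while high-frequency components, which encode local detail, stay nearly unchanged:

```python
import torch

def ntk_scaled_inv_freq(dim: int, base: float = 10000.0,
                        scale: float = 128.0) -> torch.Tensor:
    """Inverse RoPE frequencies with NTK-aware base scaling.

    `scale` is the context extension factor, e.g. 1_048_576 / 8_192
    for an 8k -> 1M extension (illustrative values, not Gradient AI's).
    """
    # Stretching the base spreads the extension non-uniformly across
    # frequencies: low frequencies absorb most of the stretch, while
    # high frequencies (local positional detail) are barely touched.
    adjusted_base = base * scale ** (dim / (dim - 2))
    return 1.0 / (adjusted_base ** (torch.arange(0, dim, 2).float() / dim))
```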
Why does it matter?
The 1M-token context window allows the Llama-3 8B model to process and generate text based on much larger inputs, like entire books or long documents. This could enable new applications: summarizing lengthy materials, answering questions that require referencing an extensive context, and analyzing or writing on topics that require a large amount of background information.
Mysterious “gpt2-chatbot” AI model bemuses experts
A mysterious new AI model called “gpt2-chatbot” is going viral. It was released without official documentation, and there is speculation that it could be OpenAI’s next model.
gpt2-chatbot shows incredible reasoning skills and answers difficult AI questions correctly, with a more human-like tone than usual.
On a math test, gpt2-chatbot solved an International Math Olympiad (IMO) problem in one try. That doesn’t hold for all IMO problems, but it is still insanely impressive.
Many AI experts are also discussing gpt2-chatbot’s coding skills, which appear stronger than those of the newest models, GPT-4 and Claude Opus. Without official documentation, we still don’t know who released it or for what purpose.
However, a couple of speculations are circulating in the industry about what gpt2-chatbot is:
It’s secretly GPT-5, released early so OpenAI can benchmark it.
It’s OpenAI’s GPT-2 from 2019, fine-tuned on modern assistant datasets.
You can try out gpt2-chatbot for free via the direct chat at https://chat.lmsys.org. Unfortunately, with so many people trying it right now, response times are slow and conversations are capped at 8 turns.
If the “gpt2-chatbot” model truly represents a major advancement in language generation and conversational abilities, it could accelerate the development of more advanced virtual assistants, chatbots, and other natural language processing applications. However, if the model’s capabilities are overstated or have significant limitations, it may lead to disappointment and a temporary setback in the progress of conversational AI.
GitHub’s Copilot Workspace turns ideas into AI-powered software
GitHub is releasing a new AI-powered developer environment called Copilot Workspace. It allows developers to turn an idea into software code using natural language and provides AI assistance throughout the development process—planning the steps, writing the actual code, testing, debugging, etc.
The developer just needs to describe what they want in plain English, and Copilot Workspace will generate a step-by-step plan and the code itself. By automating repetitive tasks and providing step-by-step plans, Copilot Workspace aims to reduce developers’ cognitive strain and enable them to focus more on creativity and problem-solving. This new Copilot-native developer environment is designed for any device, making it accessible to developers anywhere.
Why does it matter?
Copilot Workspace could significantly lower the barrier to entry for those who can create software by automating much of the coding work. This could potentially enable a future with 1 billion developers on GitHub building software simply by describing what they want. Copilot Workspace could also make software development more accessible to non-technical people.
OpenAI collaborates with Financial Times to use its content in ChatGPT
The Financial Times has signed a deal with OpenAI to license its content for developing AI models and allow ChatGPT to answer queries with summaries attributable to the newspaper. It will help OpenAI enhance the ChatGPT chatbot with archived content from the FT, and the firms will work together to develop new AI products and features for FT readers. (Link)
Cohere’s Command R models family is accessible through Amazon Bedrock
Amazon Bedrock developers can access Cohere’s Command R and Command R+ LLMs via APIs. This addition gives enterprise customers more LLM options, joining Claude 3 Sonnet, Haiku, and Opus, Mistral 7B, Mixtral 8x7B, and Mistral Large. The Command R and R+ models are highly scalable, RAG-optimized, and multilingual across 10 languages. (Link)
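For developers, access works like any other Bedrock model. Below is a minimal sketch using boto3; the region, model ID, and request fields are assumptions based on Bedrock’s naming conventions and Cohere’s request schema, so verify them against the current model catalog before use:

```python
import json

import boto3

# Assumed region and model ID; check what is enabled in your account.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.invoke_model(
    modelId="cohere.command-r-plus-v1:0",  # assumed ID for Command R+
    body=json.dumps({
        "message": "List three RAG use cases for an enterprise wiki.",
        "max_tokens": 300,
    }),
)
print(json.loads(response["body"].read())["text"])
```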
NIST launches a new platform for generative AI evaluation
NIST announced the launch of NIST GenAI, a new program to assess generative AI technologies, including text- and image-generating AI. NIST GenAI will release benchmarks, help create “content authenticity” detection (i.e., deepfake-checking) systems, and encourage the development of software that can spot the source of fake or misleading AI-generated information. (Link)
‘ChatGPT for CRISPR’ creates new genome-editing tools
ChatGPT has a specialized version called “GenomeGuide for CRISPR Research,” focused on genetic engineering. It aims to help researchers design new gene-editing tools that are more versatile than existing ones, and it serves as an AI assistant dedicated to genetic discovery, providing R&D support in genetic engineering and CRISPR technology. (Link)
Microsoft to invest $1.7 billion in Indonesia’s AI and cloud infrastructure
Microsoft will invest $1.7 billion over the next four years in cloud and AI infrastructure in Indonesia, as well as AI skilling opportunities for 840,000 people and support for the nation’s growing developer community. These initiatives aim to advance the Indonesian government’s Golden Indonesia 2045 Vision of transforming the nation into a global economic powerhouse. (Link)
A Daily Chronicle of AI Innovations, April 29th, 2024:
China unveils Sora challenger
OpenAI to train AI on Financial Times content
Meta’s AI ad platform is glitching and blowing through cash
Tesla and Baidu join forces to build cars that will drive themselves in China
New iPad Pro may use an AI-enabled M4 chip
iOS 18 may have OpenAI-powered gen AI capabilities
China’s Vidu generates 16-second 1080P videos, matching OpenAI’s Sora
New S1 robot mimics human-like movements, speed, and precision
iOS 18 may have OpenAI-powered gen AI capabilities
Apple has reportedly reinitiated talks with OpenAI to incorporate generative AI capabilities into the upcoming iOS 18 operating system, which will power the next generation of iPhones. The tech giant has been quietly exploring ways to enhance Siri and introduce new AI-powered features across its ecosystem. As of now, the companies are reportedly actively negotiating the terms of the agreement.
Apple is also in discussions with Google about licensing its Gemini chatbot technology. As of now, Apple hasn’t made a final decision on which partners it will work with, and there’s no guarantee that a deal will be finalized. The company may ultimately reach agreements with both OpenAI and Google or choose another provider entirely.
Why does this matter?
The renewed talks indicate Apple’s desperate attempt to accelerate its gen AI innovation and catch up with Big Tech. If successful, this collaboration would position Apple as a leader in AI-driven mobile devices, setting a new standard for chatbot-like interactions. Users can anticipate more sophisticated AI features, improved voice assistants, and a wider range of AI-powered applications on future iPhones.
China’s Vidu generates 16-second 1080P videos, matching OpenAI’s Sora
At the ongoing Zhongguancun Forum in Beijing, Chinese tech firm ShengShu-AI and Tsinghua University have unveiled Vidu, a text-to-video AI model. Vidu is said to be the first Chinese AI model on par with OpenAI’s Sora, capable of generating 16-second 1080P video clips with a single click. The model is built on a self-developed architecture called Universal Vision Transformer (U-ViT), which combines the two dominant approaches to generative video: diffusion and transformer models.
During a live demonstration, Vidu showcased its ability to simulate the real physical world, generating scenes with complex details that adhere to real physical laws, such as realistic light and shadow effects and intricate facial expressions. Vidu also has a deep understanding of Chinese cultural elements and can generate uniquely Chinese imagery, like pandas and loong (Chinese dragons).
Why does this matter?
Vidu’s launch represents a technical and strategic achievement for China. No other text-to-video AI model has been developed with such attention to cultural nuance and the preservation of national identity. Moreover, the integration of diffusion and transformer models in the U-ViT architecture pushes the boundaries of realistic, dynamic video generation, potentially reshaping what’s possible in creative industries.
New S1 robot mimics human-like movements, speed, and precision
Chinese robotics firm Astribot, a subsidiary of Stardust Intelligence, has previewed its advanced humanoid robot assistant, the S1. In a recently released video, the S1 shows remarkable agility, dexterity, and speed while doing various household tasks, marking a significant milestone in the development of humanoid robots.
Utilizing imitation learning, the S1 robot can execute intricate tasks at a pace matching adult humans. The video showcases the robot’s impressive capabilities, like smoothly pulling a tablecloth from beneath a stack of wine glasses, opening and pouring wine, delicately shaving a cucumber, flipping a sandwich, etc. Astribot claims that the S1 is currently undergoing rigorous testing and is slated for commercial release in 2024.
Why does this matter?
The AI-powered humanoid robot industry is booming with innovation and competition. OpenAI recently introduced two impressive bots: one for folding laundry with “soft-touch” skills and another for natural language reasoning. Boston Dynamics unveiled the Atlas robot, and UBTech from China introduced its speaking bot, Walker S. Now, Astribot’s S1 bot has amazed us with its incredible speed and precision in household tasks.
China has developed a new text-to-video AI tool named Vidu, capable of generating 16-second videos in 1080p, akin to OpenAI’s Sora but with shorter video length capability.
The tool was created by Shengshu Technology in collaboration with Tsinghua University, and aims to advance China’s standing in the global generative AI market.
Vidu has been showcased with demo clips, such as a panda playing guitar and a puppy swimming, highlighting its imaginative capabilities and understanding of Chinese cultural elements.
The Financial Times has made a deal with OpenAI to license their content and collaborate on developing AI tools, with plans to integrate FT content summaries, quotes, and links within ChatGPT responses.
OpenAI commits to developing new AI products with the Financial Times, which already utilizes OpenAI products, including a generative AI search function, indicating a deeper technological partnership.
This licensing agreement places the Financial Times among other news organizations engaging with AI, contrasting with some organizations like The New York Times, which is pursuing legal action against OpenAI for copyright infringement.
Meta’s AI ad platform is glitching and blowing through cash
Meta’s automated ad platform, Advantage Plus shopping campaigns, has been heavily overspending and failing to deliver expected sales outcomes for advertisers.
Marketers have experienced unpredictable costs and poor performance with Advantage Plus, citing instances of ad budgets being rapidly depleted and a lack of transparent communication from Meta.
Despite efforts to address technical issues, ongoing problems with Advantage Plus have led some businesses to revert to manual ad buying and question the efficiency of AI-driven advertising on Meta’s platforms.
Tesla and Baidu join forces to build cars that will drive themselves in China
Elon Musk’s Tesla has partnered with Chinese tech giant Baidu to collect data on China’s public roads, aiming to develop and deploy Tesla’s full self-driving (FSD) system in China.
The partnership enables Tesla to meet local regulatory requirements by using Baidu’s mapping service, facilitating the legal operation of its FSD software on Chinese roads.
Elon Musk also claimed companies need to spend at least $10 billion on AI this year, similar to Tesla’s investment, to stay competitive.
New iPad Pro may use an AI-enabled M4 chip
The upcoming iPad Pro lineup is anticipated to feature the latest M4 chipset, marking a significant upgrade from the current M2 chipset-equipped models.
The new M4 chipset in the iPad Pro is expected to introduce advanced AI capabilities, positioning the device as Apple’s first truly AI-powered product.
Apple’s “Let Loose” event, scheduled for May 7, will also showcase new OLED iPad Pro variants and the first 12.9-inch iPad Air, alongside potential launches of a new Magic Keyboard and Apple Pencil.
OpenAI hit with GDPR complaint over ChatGPT’s ‘hallucination’ failure
OpenAI faces a new GDPR privacy complaint in the EU due to ChatGPT’s generation of incorrect personal information without a means to correct it.
The complaint challenges ChatGPT’s compliance with the GDPR, emphasizing the right of EU citizens to have erroneous data corrected and OpenAI’s refusal to amend incorrect information.
OpenAI’s situation highlights tension with GDPR requirements, including rights to rectification and transparency, as authorities in various EU countries investigate or consider actions against the company.
Estée Lauder and Microsoft’s collaboration for beauty brands
Estée Lauder Companies (ELC) and Microsoft have launched the AI Innovation Lab to help ELC’s brands leverage generative AI. The collaboration aims to enable faster responses to social trends and consumer demands, as well as accelerate product innovation. (Link)
Oracle boosts Fusion Cloud apps with 50+ generative AI capabilities
Oracle has launched new generative AI features across its Fusion Cloud CX suite to help sales, marketing, and service agents automate and accelerate critical workflows. The AI capabilities will enable contextually-aware responses, optimized schedules for on-field service agents, targeted content creation, and AI-based look-alike modeling for contacts. (Link)
Google’s new AI feature helps users practice English conversations
The chatbot, currently available in select countries through Search Labs or Google Translate on Android, provides feedback and helps users find the best words and conjugations within the context of a conversation. (Link)
OpenAI enhances ChatGPT with user-specific memory update
The update enables ChatGPT to provide more personalized and contextually relevant responses over time by storing details about users’ preferences and interactions. Users have control over the memory feature, including the ability to toggle it on or off, inspect stored information, and delete specific data entries. (Link)
Tech CEOs join DHS advisory board on AI safety and security
The US DHS has announced a blue-ribbon board that includes CEOs of major tech companies to advise the government on the role of AI in critical infrastructure. They will develop recommendations to prevent and prepare for AI-related disruptions to critical services that impact national economic security, public health, or safety. (Link)
A Daily Chronicle of AI Innovations, April 27th, 2024:
Apple in talks with OpenAI to build chatbot
China developed its very own Neuralink
Tesla Autopilot has ‘critical safety gap’ linked to hundreds of collisions
Google’s AI bot can now help you learn English
Apple in talks with OpenAI to build chatbot
Apple is in talks with OpenAI and Google to incorporate their AI technology into the iPhone’s upcoming features, aiming to debut new generative AI functionalities at the Worldwide Developers Conference.
Apple has struggled to develop a competitive AI chatbot internally, leading to the cancellation of some projects to refocus on generative AI technologies using external partnerships.
Choosing to partner with OpenAI or Google could mitigate past challenges with AI implementations, but also increase Apple’s dependency on these competitors for AI advancements.
Google’s AI bot can now help you learn English
Google has introduced a ‘Speaking practice’ feature for learning English, allowing users to converse with an AI bot on their phones, a tool that began offering feedback on spoken sentences in October 2023 and now supports continuous dialogues.
The ‘Speaking practice’ feature is available to Search Labs users in countries like Argentina, India, Mexico, Colombia, Venezuela, and Indonesia and may appear when translating to or from English on an Android device.
Unlike structured curriculum apps such as Duolingo, Babbel, and Pimsleur, Google’s approach allows for practicing English within conversational contexts, with the company expanding its AI’s language comprehension.
AI Weekly Rundown April 2024 Week 4 [April 21 – April 28]
iOS 18 to have AI features with on-device processing
Many-shot ICL is a breakthrough in improving LLM performance
Groq shatters AI inference speed record with 800 tokens/second on LLaMA 3
Microsoft launches its smallest AI model that can fit on your phone
Adobe survey says 50% of Americans use generative AI every day
Microsoft hired former Meta VP of infrastructure
Firefly 3: Adobe’s best AI image generation model to date
Meta finally rolls out multimodal AI capabilities for its smart glasses
Profluent’s OpenCRISPR-1 can edit the human genome
NVIDIA acquires Run:ai; integrates it with DGX Cloud AI Platform
Snowflake enters the generative AI arena with Arctic LLM
Monetizing generative AI to take time, says Zuckerberg
Sanctuary AI launches Phoenix 7 robot for industrial automation
AI integration hits roadblocks for CIOs
Moderna and OpenAI partner to accelerate drug development
China developed its very own Neuralink
Beijing Xinzhida Neurotechnology, backed by the Chinese Communist Party, unveiled a brain-computer interface named Neucyber, currently successful in controlling a robotic arm via a monkey.
Neucyber, regarded as a competitive response to Neuralink, highlights the intensifying global race in developing brain-computer interfaces, though it has not advanced to human trials yet.
The long-term implications of such technology remain uncertain, stirring both intrigue and concern in the context of its potential impact on health and the broader tech industry.
Tesla Autopilot has ‘critical safety gap’ linked to hundreds of collisions
The National Highway Traffic Safety Administration (NHTSA) reported that Tesla’s Autopilot contributed to at least 467 collisions, including 13 fatalities, due to a “critical safety gap” in its design.
NHTSA criticized the Autopilot system for inadequate driver monitoring and staying active even when drivers are not paying sufficient attention, leading to “foreseeable misuse and avoidable crashes.”
The agency is also investigating the effectiveness of a software update issued by Tesla intended to improve the Autopilot’s driver monitoring capabilities, following continued reports of related crashes.
A Daily Chronicle of AI Innovations, April 26th, 2024:
Elon Musk raises $6B to compete with OpenAI
Sanctuary AI unveils next-gen robots
CIOs go big on AI!
Moderna and OpenAI partner to accelerate drug development
Samsung and Google tease collaborative AI features for Android
Salesforce launches Einstein Copilot with advanced reasoning and actions
AuditBoard integrates AI-powered descriptions to cut audit busywork
LA Metro to install AI cameras on buses to issue tickets to illegal parkers
EPFL and Yale researchers develop Meditron, a medical AI model
Sanctuary AI unveils next-gen robots
Sanctuary AI, a company developing human-like intelligence in robots, unveiled its latest robot – Phoenix Gen 7. This comes less than a year after their previous generation robot.
The new robot boasts significant improvements in both hardware and software. It can now perform complex tasks for longer durations, learn new tasks 50 times faster than before, and have a wider range of motion with improved dexterity. The company believes this is a major step towards achieving human-like general-purpose AI in robots.
Why does it matter?
While Boston Dynamics dominates headlines with robotic feats, Sanctuary AI’s progress could set a new standard for the future of work and automation. As robots become more human-like in their capabilities, they can take on complex tasks in manufacturing, healthcare, and other sectors, reducing the need for human intervention in potentially dangerous or repetitive jobs.
CIOs go big on AI!
A new Lenovo survey shows that CIOs are prioritizing integrating AI into their businesses alongside cybersecurity.
However, there are challenges hindering rapid AI adoption, such as:
Large portions of organizations are not prepared to integrate AI swiftly (e.g., new product lines, supply chain).
Security concerns around data privacy, attack vulnerability, and ethical AI use.
Talent shortage in machine learning, data science, and AI integration.
Difficulty demonstrating ROI of AI projects.
Resource constraints – focusing on AI may take away from sustainability efforts.
Despite the challenges, there is still a positive outlook on AI:
80% of CIOs believe AI will significantly impact their businesses.
96% of CIOs plan to increase their investments in AI.
Why does it matter?
This highlights a significant transition where CIOs are now focused on driving business outcomes rather than just operational maintenance. As AI plays a crucial role, addressing the barriers to adoption will have far-reaching implications across industries seeking to leverage AI for competition, innovation, and efficiency gains. Overcoming the skills gap and security risks and demonstrating clear ROI will be key to AI’s proliferation.
Moderna and OpenAI partner to accelerate drug development
Biotech giant Moderna has expanded its partnership with Open AI to deploy ChatGPT enterprise to every corner of its business. The aim is to leverage AI to accelerate the development of new life-saving treatments.
Here’s the gist:
Moderna plans to launch up to 15 new mRNA products in 5 years, including vaccines and cancer treatments.
Their custom “Dose ID” GPT helps select optimal vaccine doses for clinical trials.
Moderna employees have created 750+ custom GPTs, averaging 120 ChatGPT conversations per user per week.
The redesign aims for a lean 3,000-employee team to perform like 100,000 with AI force multiplication.
Why does it matter?
If Moderna can pull this off, it could mean a future where new life-saving drugs are developed at lightning speed. And who knows, maybe your next doctor’s visit will involve a friendly chat with a healthcare AI. Just don’t ask it to diagnose you on WebMD first.
Samsung and Google tease collaborative AI features for Android: Samsung and Google are teasing new AI features developed through their strong partnership. Recent social media posts from Samsung Mobile and Google’s Rick Osterloh confirm the companies are working together on AI and exploring opportunities. The collaboration aims to deliver the best Android ecosystem of products and services. (Link)
Salesforce launches Einstein Copilot with advanced reasoning and actions
Salesforce announced the general availability of its generative AI platform, Einstein Copilot, with new features like Copilot Actions and Analytics. Actions enable sales teams to optimize workflows and close more deals, while Analytics provides insights into Copilot’s usage and performance. Salesforce is also working on improving efficiency with smaller AI models. (Link)
AuditBoard integrates AI-powered descriptions to cut audit busywork
AuditBoard, a cloud-based audit software company, has launched AuditBoard AI, an advanced AI feature to automate risk assessment descriptions. The AI-powered tool generates descriptions for risks and controls, reducing the time auditors spend on repetitive tasks and increasing efficiency. (Link)
LA Metro to install AI cameras on buses to issue tickets to illegal parkers
LA Metro equips buses with AI cameras to catch and ticket vehicles blocking bus lanes, aiming to improve bus times and accessibility. Violations will be human-reviewed before ticketing. The program, launching this year, could lead to AI-assisted traffic management in the future. (Link)
EPFL and Yale researchers develop Meditron, a medical AI model
Researchers from EPFL and Yale have developed Meditron, an open-source suite of medical AI models based on Meta’s Llama. Designed for low-resource settings, Meditron assists with clinical decision-making and diagnosis. The models, fine-tuned on high-quality medical data with expert input, have been downloaded over 30,000 times. (Link)
Elon Musk raises $6B to compete with OpenAI
xAI, Elon Musk’s AI company, is nearing a funding round of $6 billion at a pre-money valuation of $18 billion, aiming to be a competitor to OpenAI.
The funding round has attracted significant interest from investors, including Sequoia Capital and Future Ventures, and terms were adjusted from an initial $3 billion at a $15 billion valuation due to demand.
X, Musk’s social network, not only has a stake in xAI but also integrates its chatbot Grok, showcasing xAI’s broader ambition to merge the digital with the physical through data from Musk’s companies.
A Daily Chronicle of AI Innovations, April 25th, 2024:
NVIDIA acquires Run:ai, integrates it with DGX Cloud AI Platform
Snowflake enters the generative AI arena with Arctic LLM
Monetizing generative AI to take time, says Zuckerberg
Adobe unveils VideoGigaGAN: AI project upscaling blurry videos to HD
OpenELM: Apple’s evolving AI strategy for iPhones
IBM acquires HashiCorp for $6.4 billion to boost cloud business
Synthesia introduces emotions to AI video avatars
HubSpot introduces cutting-edge AI tools for SMBs
NVIDIA acquires Run:ai, integrates it with DGX Cloud AI Platform
NVIDIA has acquired Run:ai, an Israeli startup that simplifies AI hardware infrastructure management and optimization for developers and operations teams. The acquisition was made for an undisclosed sum, but sources suggest it was around $700 million.
Run:ai’s platform allows AI models to run parallel across various hardware environments, whether on-premises, in public clouds, or at the edge.
Nvidia plans to maintain Run:ai’s products with their existing business model and will support Run:ai’s product development within Nvidia’s DGX Cloud AI platform. This platform offers enterprise users access to computing infrastructure and software for training AI models, including generative AI.
Why does it matter?
NVIDIA is strengthening its offering across the entire AI stack, from hardware to software, and solidifying its status as a comprehensive solution provider for AI infrastructure needs. NVIDIA’s vertical integration strategy aims to simplify and optimize AI deployments for customers, asserting its dominance in the evolving AI landscape.
Snowflake enters the generative AI arena with Arctic LLM
Snowflake, the cloud computing company, has released Arctic LLM, a generative AI model for enterprise use. It’s optimized for generating database code and is available under an Apache 2.0 license.
Arctic LLM outperforms other models like DBRX and Llama 3 in tasks like coding and SQL generation. Snowflake aims to address enterprise challenges with this model, including building SQL co-pilots and high-quality chatbots. This move aligns with the trend of cloud vendors offering specialized generative AI solutions for businesses.
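Since the weights are openly licensed, Arctic can in principle be pulled straight from Hugging Face. A minimal sketch, assuming the checkpoint name below and noting that the 480B-parameter MoE weights realistically require a multi-GPU cluster to load:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; loading the full MoE needs serious hardware.
name = "Snowflake/snowflake-arctic-instruct"
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    name, trust_remote_code=True, device_map="auto")

# Arctic is pitched at enterprise SQL generation, so prompt it for SQL.
prompt = "Write a SQL query returning the top 5 customers by total revenue."
inputs = tok(prompt, return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=128)[0],
                 skip_special_tokens=True))
```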
Why does it matter?
Approximately 46% of global enterprise AI decision-makers use existing open-source LLMs for generative AI. With the release of Arctic, Snowflake democratizes access to cutting-edge models by offering an Apache 2.0 license for ungated personal, research, and commercial use.
Monetizing generative AI to take time, says Zuckerberg
Meta CEO Mark Zuckerberg stated that it would take several years for Meta to make money from generative AI. The company is already profitable, but building advanced AI capabilities will be lengthy and costly. Monetization strategies include scaling business messaging, introducing ads or paid content, and offering larger AI models for a fee. However, it will take time for these efforts to yield significant profits.
Why does it matter?
Mark Zuckerberg’s statement highlights the challenges and time required to monetize generative AI technologies effectively. It underscores the complexity of developing advanced AI capabilities and the need for substantial investments. Furthermore, it emphasizes the importance of long-term planning and patient investment in developing and commercializing AI applications.
AI start-up unveils avatars that convincingly show human emotions
An AI startup named Synthesia has created hyperrealistic AI-generated avatars that are extremely lifelike and expressive, pushing the boundaries of generative AI technology.
The avatars can replicate human emotions and mannerisms closely, thanks to advancements in AI and extensive data from human actors, aiming to make digital clones indistinguishable from real humans in videos.
Despite the technological marvel, the creation of such realistic avatars raises significant ethical concerns about distinguishing between real and AI-generated content, potentially affecting trust and truth in digital media.
Microsoft and Amazon’s AI ambitions spark regulatory rumble
UK regulators are investigating Microsoft and Amazon’s investments in AI startups, such as Amazon’s partnership with Anthropic and Microsoft’s dealings with Mistral AI and Inflection AI, for potential anti-competitive impacts.
The CMA is analyzing if these partnerships align with UK merger rules and their effect on competition, following significant investments and strategic hiring by the companies.
Both Microsoft and Amazon assert that their AI investments and partnerships promote competition and are confident in a favorable resolution by regulators.
Adobe unveils VideoGigaGAN: AI project upscaling blurry videos to HD
Adobe’s VideoGigaGAN AI project enhances low-quality videos by upscaling them to higher resolutions, even when the original footage is blurry. It applies automatic adjustments to brightness, contrast, saturation, and sharpness, benefiting brand perception, engagement, and customer satisfaction. (Link)
OpenELM: Apple’s evolving AI strategy for iPhones
Apple has unveiled OpenELM, a collection of compact language models that enable AI functionality on its devices. These models, available in four sizes ranging from 270 million to 3 billion parameters, are specifically designed to excel in text-related tasks like email composition.
Just as Google, Samsung and Microsoft continue to push their efforts with generative AI on PCs and mobile devices, Apple is moving to join the party with OpenELM, a new family of open source large language models (LLMs) that can run entirely on a single device rather than having to connect to cloud servers.
There are eight OpenELM models in total, four pre-trained and four instruction-tuned, covering parameter sizes between 270 million and 3 billion. (Parameters refer to the connections between artificial neurons in an LLM; more parameters typically denote greater performance and capability, though not always.)
Apple is offering the weights of its OpenELM models under what it deems a “sample code license,” along with training checkpoints, stats on how the models perform, and instructions for pre-training, evaluation, instruction tuning, and parameter-efficient fine-tuning.
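Because the weights are public, the smaller variants can run on a laptop. A minimal sketch, assuming the Hugging Face checkpoint name below and the Llama-2 tokenizer the model card pairs it with (which may require separate access approval):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint; the 270M variant is the smallest of the eight.
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct", trust_remote_code=True)
# OpenELM ships without its own tokenizer; the model card suggests Llama-2's.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tok("Draft a two-sentence email declining a meeting:",
             return_tensors="pt")
print(tok.decode(model.generate(**inputs, max_new_tokens=80)[0],
                 skip_special_tokens=True))
```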
IBM acquires HashiCorp for $6.4 Billion to boost cloud business
IBM has acquired HashiCorp, Inc., for $6.4 billion, aiming to enhance its hybrid cloud and AI capabilities. The acquisition will integrate HashiCorp’s suite of products, including Terraform, to automate hybrid and multi-cloud environments. (Link)
Synthesia Introduces Emotions to AI Video Avatars
Synthesia, an AI startup specializing in video avatars for business users, has released an update introducing emotions to its avatars. The latest version includes avatars built from actual humans, providing better lip tracking, more expressive natural movements, and improved emotional range when generating videos. (Link)
HubSpot introduces cutting-edge AI tools for SMBs
HubSpot introduced HubSpot AI at INBOUND 2023, featuring AI assistants for email drafting and content creation, AI agents for customer service, predictive analytics, and ChatSpot powered by OpenAI’s ChatGPT. The revamped Sales Hub offers modernized sales processes tailored for SMBs. (Link)
A Daily Chronicle of AI Innovations, April 24th, 2024:
Firefly 3: Adobe’s best AI image generation model to date
Meta finally rolls out multimodal AI capabilities for its smart glasses
Profluent’s OpenCRISPR-1 can edit the human genome
Coca-Cola and Microsoft partner to accelerate cloud and Gen AI initiatives
Cognizant and Microsoft team up to boost Gen AI adoption
Amazon wishes to host companies’ custom Gen AI models
OpenAI launches more enterprise-grade features for API customers
Tesla could start selling Optimus robots by the end of 2025
Snowflake launches 480bn-parameter AI to take on OpenAI, Google and Meta
Meta adds AI to its Ray-Ban smart glasses
Apple reduces production of Vision Pro due to low demand
Firefly 3: Adobe’s best AI image generation model to date
Adobe has announced a major update to its AI image generation technology called Firefly Image 3. The model showcases a significant improvement in creating more realistic and high-quality images over previous versions. It has enhanced capabilities to understand longer text prompts, generate better lighting, and depict subjects like crowds and human expressions. The Firefly Image 3 model is now available through Adobe’s Firefly web app as well as integrated into Adobe Photoshop and InDesign apps.
It powers new AI-assisted features in these apps, such as generating custom backgrounds, creating image variations, and enhancing detail. Adobe has also introduced advanced creative controls like Structure Reference to match a reference image’s composition and Style Reference to transfer artistic styles between images. Adobe also attaches “Content Credentials” to all Firefly-generated assets to promote responsible AI development.
Why does it matter?
In AI image generation, a more powerful model from a major player like Adobe could intensify competition with rivals like Midjourney and DALL-E. It may motivate other providers to accelerate their own model improvements to keep pace. For creative professionals and enthusiasts, access to such advanced AI tools could unlock new levels of creative expression and productivity.
Meta finally rolls out multimodal AI capabilities for its smart glasses; adds new features
Meta has announced exciting updates to their Ray-Ban Meta smart glasses collection. They are introducing new styles to cater to a wider range of face shapes. The new styles include the vintage-inspired Skyler frames, designed for smaller faces, and the Headliner frames with a low bridge option. It also introduces video calling capabilities via WhatsApp and Messenger, allowing users to share their views during a video call.
Meta is integrating its AI technology, Meta AI Vision, into Ray-Ban smart glasses. Users can interact with the glasses using voice commands, saying “Hey Meta,” and receive real-time information. The multimodal AI can translate text into different languages using the built-in camera. These capabilities were in testing for a while and are now available to everyone in the US and Canada.
Why does it matter?
Meta is pushing the boundaries of smart glasses technology, making them more versatile, user-friendly, and AI-powered. This could lead to increased mainstream adoption and integration of augmented reality wearables and voice-controlled AI assistants. Smart glasses could also redefine how people interact with the world around them, potentially changing how we work, communicate, and access information in the future.
Profluent’s OpenCRISPR-1 can edit the human genome
Profluent, a biotechnology company, has developed the world’s first precision gene editing system using AI-generated components. They trained LLMs on a vast dataset of CRISPR-Cas proteins to generate novel gene editors that greatly expand the natural diversity of these systems. OpenCRISPR-1 performed similarly to the widely used SpCas9 gene editor regarding on-target editing activity but had a 95% reduction in off-target effects. This means OpenCRISPR-1 can edit the human genome with high precision.
The researchers further improved OpenCRISPR-1 by using AI to design compatible guide RNAs, enhancing its editing efficiency. Profluent publicly released OpenCRISPR-1 to enable broader, ethical use of this advanced gene editing technology across research, agriculture, and therapeutic applications. By using AI-generated components, they aim to lower the cost and barriers to accessing powerful genome editing capabilities.
Why does it matter?
The ability to design custom gene editors using AI could dramatically accelerate the pace of innovation in gene editing, making these powerful technologies more precise, safer, accessible, and affordable for a wide range of diseases. This could lead to breakthroughs like personalized medicine, agricultural applications, and basic scientific research.
Coca-Cola and Microsoft partner to accelerate cloud and Gen AI initiatives
Microsoft and Coca-Cola announced a 5-year strategic partnership, where Coca-Cola has made a $1.1 billion commitment to the Microsoft Cloud and its generative AI capabilities. The collaboration underscores Coca-Cola’s ongoing technology transformation, underpinned by the Microsoft Cloud as Coca-Cola’s globally preferred and strategic cloud and AI platform. (Link)
Cognizant and Microsoft team up to boost Gen AI adoption
Microsoft has teamed up with Cognizant to bring Microsoft’s Gen AI capabilities to Cognizant’s employees and users. Cognizant acquired 25,000 Microsoft 365 Copilot seats for its associates, 500 Sales Copilot seats, and 500 Services Copilot seats. With that, Cognizant will transform business operations, enhance employee experiences, and deliver new customer value. (Link)
Amazon wishes to host companies’ custom Gen AI models
AWS wants to become the go-to place for companies to host and fine-tune their custom Gen AI models. Amazon Bedrock’s new Custom Model Import feature lets organizations import and access Gen AI models as fully managed APIs. Companies’ proprietary models, once imported, benefit from the same infrastructure as other generative AI models in Bedrock’s library. (Link)
OpenAI launches more enterprise-grade features for API customers
OpenAI expanded its enterprise features for API customers, further enriching its Assistants API and introducing new tools to enhance security and administrative control. The company has introduced Private Link, a secure method to enable direct communication between Azure and OpenAI. It has also added Multi-Factor Authentication (MFA) to bolster access control. (Link)
Tesla could start selling Optimus robots by the end of 2025
According to CEO Elon Musk, Tesla’s humanoid robot, Optimus, may be ready to sell by the end of next year. Several companies have been betting on humanoid robots to meet potential labor shortages and perform repetitive tasks that could be dangerous or tedious in industries such as logistics, warehousing, retail, and manufacturing. (Link)
Microsoft launches Phi-3, its smallest AI model yet
Microsoft launched Phi-3 Mini, the next version of its lightweight AI model and the first of three small models the company plans to release.
The company released Phi-2 in December, which performed just as well as bigger models like Llama 2.
Eric Boyd, corporate vice president of Microsoft Azure AI Platform, tells The Verge Phi-3 Mini is as capable as LLMs like GPT-3.5 “just in a smaller form factor.”
Compared to their larger counterparts, small AI models are often cheaper to run and perform better on personal devices like phones and laptops.
Meta adds AI to its Ray-Ban smart glasses
Ray-Ban Meta smart glasses now include multimodal AI, enabling the device to process diverse types of data such as images, videos, text, and sound to understand the user’s environment in real-time.
The AI capabilities allow users to interact with their surroundings in enhanced ways, such as identifying dog breeds, translating signs in foreign languages, and offering recipe suggestions based on visible ingredients.
Initial testing of the multimodal AI has shown promise, although it has also revealed some inconsistencies in accuracy, such as errors in identifying certain car models and plant species.
Apple reduces production of Vision Pro due to low demand
Apple is reducing production of its Vision Pro headset for the rest of 2024 due to lower than expected demand, with sales projections adjusted down from up to 800,000 units to around 400,000 to 450,000 units.
Following weaker sales and reduced demand, the launch of a more affordable mixed-reality headset from Apple could be delayed until after 2025, as the company reassesses its Vision Pro strategy.
Despite efforts to boost Vision Pro’s appeal, including introducing new features and accessories, lack of key app support and customer dissatisfaction with practicality are contributing to its sluggish sales.
Snowflake launches 480bn-parameter AI to take on OpenAI, Google and Meta
Snowflake announced Arctic LLM, an enterprise-grade generative AI model designed for generating database code and available under an Apache 2.0 license for free commercial and research use.
Arctic LLM, using a mixture of experts (MoE) architecture, claims to outperform competitors like DBRX and certain models from Meta on coding and SQL generation tasks.
Snowflake aims to integrate Arctic LLM into its platform, Cortex, offering it as a solution for building AI- and machine learning-powered apps with a focus on security, governance, and scalability.
Sentiment Analysis Tools:
VADER (Valence Aware Dictionary and sEntiment Reasoner)
TextBlob
IBM Watson Natural Language Understanding
Lexalytics
Aylien Text Analysis API
Recommendation Systems:
Apache Mahout
LightFM
Surprise
Amazon Personalize
TensorFlow Recommenders
AI-driven Marketing Tools:
Salesforce Einstein
Marketo
HubSpot
Adobe Sensei
Optimizely
AI-powered Content Creation:
Artbreeder
Copy.ai
ShortlyAI
Jasper (Journalism AI)
AI Dungeon
PerfectEssayWriter.ai
MyPerfectPaper.net – AI Essay Writing
Healthcare AI Tools:
IBM Watson Health
NVIDIA Clara
Google Health
Ada Health
PathAI
AI in Finance:
AlphaSense
QuantConnect
Kensho Technologies
FactSet
Yewno|Edge
AI in Cybersecurity:
Darktrace
Cylance
CrowdStrike Falcon
Symantec AI Solutions
FireEye Helix
AI in Robotics:
ROS (Robot Operating System)
NVIDIA Isaac
Universal Robots
SoftBank Robotics
Boston Dynamics
AI in Energy and Sustainability:
Google DeepMind for Energy
C3.ai
GridGain Systems
Siemens Digital Grid
Envision Digital
AI in Agriculture:
Climate Corporation
Blue River Technology
PrecisionHawk
AgShift
Taranis
AI in Education:
Duolingo
Coursera
Gradescope
DreamBox Learning
Carnegie Learning
AI in Supply Chain Management:
Llamasoft
Blue Yonder (formerly JDA Software)
Element AI
ClearMetal
Kinaxis
AI in Gaming:
Unity ML-Agents
NVIDIA Deep Learning Super Sampling (DLSS)
Unreal Engine AI
Microsoft Project Malmo
IBM Watson Unity SDK
AI in Transportation:
Waymo
Tesla Autopilot
Uber ATG (Advanced Technologies Group)
Didi Chuxing AI Labs
Mobileye by Intel
AI in Customer Service:
Zendesk AI
Ada Support
Helpshift
Intercom
Freshworks AI
AI in Legal Services:
ROSS Intelligence
Luminance
Kira Systems
Casetext
Lex Machina
AI in Real Estate:
Zillow
Redfin
CompStak
Skyline AI
Matterport
AI in Human Resources:
HireVue
Textio
Pymetrics
Traitify
Visage
AI in Retail:
Amazon Go
Salesforce Commerce Cloud Einstein
Blue Yonder (formerly JDA Software)
Dynamic Yield
Sentient Ascend
AI in Personalization and Recommendation:
Netflix Recommendation System
Spotify Discover Weekly
Amazon Product Recommendations
YouTube Recommendations
Pandora Music Genome Project
AI in Natural Disaster Prediction:
One Concern
Jupiter
Descartes Labs
Zizmos
Earth AI
AI in Language Translation:
Google Translate
DeepL
Microsoft Translator
SYSTRAN
Translate.com
AI in Facial Recognition:
Amazon Rekognition
Face++ by Megvii
Kairos
Microsoft Azure Face API
NEC NeoFace
AI in Music Generation:
AIVA
Amper Music
Jukedeck
Magenta by Google
OpenAI Jukebox
AI in Remote Sensing:
Orbital Insight
Descartes Labs
SkyWatch
TerrAvion
Planet Labs
AI in Document Management:
DocuSign
Adobe Acrobat
Abbyy FineReader
DocuWare
Nitro
AI in Social Media Analysis:
Brandwatch
Sprinklr
Talkwalker
Hootsuite Insights
Synthesio
AI in Fraud Detection:
Feedzai
Forter
Simility
Featurespace
Signifyd
AI in Smart Cities:
Sidewalk Labs
CityBrain by Alibaba Cloud
Siemens City Performance Tool
StreetLight Data
SmartCone
AI in Mental Health:
Woebot
Wysa
X2AI
Talkspace
Ginger
AI in Music Streaming Services:
Spotify
Apple Music
Pandora
Tidal
Deezer
AI in Journalism:
Automated Insights
Narrativa
Heliograf by The Washington Post
Wordsmith by Automated Insights
RADAR by The Associated Press
AI in Predictive Maintenance:
Uptake
IBM Maximo Asset Performance Management
SAS Predictive Maintenance
Predikto
Augury
AI in 3D Printing:
Autodesk Netfabb
Formlabs PreForm
Stratasys GrabCAD
Materialise Magics
SLM Solutions
AI in Wildlife Conservation:
ConservationFIT
PAWS (Protection Assistant for Wildlife Security)
Instant Wild
TrailGuard AI
Wildlife Insights
AI in Graphic Design:
Adobe Sensei (Adobe Creative Cloud’s AI platform)
Canva’s Magic Resize
Designhill’s AI Logo Maker
Tailor Brands
Piktochart
A Daily Chronicle of AI Innovations, April 23rd, 2024:
Microsoft launches its smallest AI model that can fit on your phone
Meta opens Quest OS to third-party developers to rival Apple
Adobe claims its new image generation model is its best yet
Adobe survey says 50% of Americans use generative AI every day
Mercedes-Benz becomes first automaker to sell Level 3 autonomous vehicles in the US
GPT-4 can exploit zero-day security vulnerabilities all by itself, a new study finds
Creative Artists Agency (CAA) is testing an AI initiative called CAA Vault
Poetry Camera by Kelin Carolyn Zhang and Ryan Mather generates poems from pictures
Alethea AI launched expressive AI avatars on Coinbase’s blockchain
Microsoft launches its smallest AI model that can fit on your phone
Microsoft launched Phi-3-Mini, a 3.8 billion parameter language model, as the first of three small models in the Phi-3 series. It is trained on a smaller dataset than larger LLMs like GPT-4 and outperforms models like Meta’s Llama 2 7B and GPT-3.5 on benchmarks like MMLU and MT-bench. The Phi-3 series also includes Phi-3-Small (7B parameters) and Phi-3-Medium (14B parameters), which are more capable than Phi-3-Mini.
What sets Phi-3-Mini apart is its ability to run locally on mobile devices like the iPhone 14, thanks to its optimized size and innovative quantization techniques. Microsoft’s team took inspiration from how children learn, using a “curriculum” approach to train Phi-3 on synthetic “bedtime stories” and simplified texts. While robust for its size, Phi-3-Mini is limited in storing extensive factual knowledge and is primarily focused on English.
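To give a feel for how small the model is in practice, here is a minimal sketch of running it with Hugging Face transformers; the checkpoint name follows Microsoft’s published naming, but treat it and the generation settings as illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# 3.8B-parameter checkpoint, small enough for a single consumer GPU
# (and, quantized, phone-class hardware as claimed above).
name = "microsoft/Phi-3-mini-4k-instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, trust_remote_code=True, device_map="auto")

messages = [{"role": "user",
             "content": "Explain the MMLU benchmark in one sentence."}]
input_ids = tok.apply_chat_template(messages, add_generation_prompt=True,
                                    return_tensors="pt").to(model.device)
print(tok.decode(model.generate(input_ids, max_new_tokens=60)[0],
                 skip_special_tokens=True))
```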
Why does this matter?
Microsoft’s innovative training approach could lead to more effective and efficient model development techniques. However, Phi-3-Mini’s limitations in storing factual knowledge and its English-centric focus highlight the challenges in creating truly comprehensive and multilingual AI systems.
Adobe survey says 50% of Americans use generative AI every day
Adobe surveyed 3,000 consumers from February 15 to 19, 2024, about their usage of generative AI and found that over half of Americans have already used it. The majority believe it helps them be more creative. Adobe’s Firefly has generated 6.5 billion images since its launch last March. Americans use generative AI for research, brainstorming, creating content, searching, summarization, coding, and learning new skills.
Moreover, 41% of Americans expect brands to use AI for personalized shopping, price comparisons, and customer support. Adobe’s data also reveals that online traffic to retail and travel sites has surged, with faster customer service and more creative experiences due to generative AI tools.
Why does this matter?
Gen AI usage has increased over time; several surveys last year found that only a small percentage of Americans used ChatGPT. As generative AI tools become more accessible, businesses must embrace this technology faster to deliver experiences that resonate with modern consumers.
Microsoft hires former Meta VP of infrastructure
Following the recent addition of Google DeepMind co-founder Mustafa Suleyman to lead Microsoft’s consumer AI division, Microsoft has now poached a former Meta VP of infrastructure. This strategic hire comes amid rumors of Microsoft and OpenAI’s plans to construct a $100 billion supercomputer, “Stargate,” to power their AI models.
Jason Taylor oversaw infrastructure for AI, data, and privacy at Meta. He will join Microsoft as a corporate vice president and deputy CTO, tasked with building systems to advance the company’s AI ambitions.
Why does this matter?
Microsoft’s aggressive moves in the AI space highlight the fierce competition among tech giants. As AI systems become increasingly resource-intensive, having the right talent will be vital for delivering cutting-edge AI experiences. In addition to strategic hires, Microsoft is rumored to develop a supercomputer project, which could have far-reaching implications for various industries.
Creative Artists Agency (CAA) is testing an AI initiative called CAA Vault
Hollywood’s leading talent agency is allowing its A-list clients to create digital clones of themselves. The agency is partnering with AI firms to scan its clients’ bodies, faces, and voices. These AI replicas can be used for reshooting scenes, dubbing, or superimposing onto stunt doubles in film and TV production. CAA also plans to make this technology available to the entire industry. (Link)
Poetry Camera by Kelin Carolyn Zhang and Ryan Mather generates poems from pictures
Powered by GPT-4, this open-source AI camera allows users to choose from various poetic forms from the scenes it captures. It prioritizes privacy by not digitally saving images or poems. The positive response has led the creators to consider making the Poetry Camera commercially available. (Link)
Alethea AI launched expressive AI avatars on Coinbase’s blockchain
Their proprietary Emote Engine powers high-fidelity facial animations, body movements, and generative AI capabilities. The platform lets users create AI agents quickly and collaborate with the community. Creators can also monetize their AI agents without centralized censorship or revenue sharing. Alethea AI aims to create an avatar arena featuring full-body animation, voice, and lip-syncing. (Link)
TikTok is working on a new feature that lets users clone their voice
Discovered in the latest Android app version, this new AI text-to-speech feature will allow users to record their voices, which will then be added to the TikTok Voice Library for others. While the feature is still under development, it’s already raising concerns about potential misuse and spreading misinformation. TikTok is expected to provide additional details on privacy and safety measures when the feature is ready for broader release. (Link)
Meta opens Quest OS to third-party developers to rival Apple
Meta is licensing its Horizon OS, designed for Quest headsets, to hardware manufacturers such as Lenovo and Asus and creating a special Quest version with Xbox.
The company is promoting alternative app stores on its platform, making its App Lab store more visible and inviting Google to integrate the Play Store with Horizon OS.
With Horizon OS, Meta aims to create a more open ecosystem similar to Microsoft’s approach with Windows, focusing on expanding its social network Horizon through licensing and hardware partnerships.
Adobe claims its new image generation model is its best yet
Adobe has introduced its third-generation image-generation model, Firefly Image 3, which boasts enhanced realism and improved rendering capabilities for complex scenes and lighting, compared to its predecessors.
Firefly Image 3, which is now integrated into Photoshop and the Adobe Firefly web app, features advancements such as better understanding of detailed prompts, more accurate depiction of dense crowds, and improved text and iconography rendering.
In addition to technical improvements, Adobe emphasizes ethical AI practices with Firefly Image 3 by using a diverse and ethically sourced training dataset, including content from Adobe Stock and AI-generated images under strict moderation.
Mercedes-Benz becomes first automaker to sell Level 3 autonomous vehicles in the US
Mercedes-Benz is the first automaker to sell Level 3 autonomous driving vehicles in the U.S., with the EQS and S-Class sedans now available in California and Nevada.
The Drive Pilot feature in these vehicles allows drivers to take their eyes off the road and hands off the wheel in certain conditions, requiring a $2,500 yearly subscription.
Drive Pilot can be activated only during specific conditions such as clear weather, daytime, in heavy traffic under 40 mph, and on preapproved freeways in California and Nevada.
GPT-4 can exploit zero-day security vulnerabilities all by itself, a new study finds
GPT-4 has demonstrated the ability to exploit zero-day security vulnerabilities autonomously, as revealed by a new study.
The study, conducted by researchers from the University of Illinois Urbana-Champaign, found that GPT-4 could exploit 87% of tested vulnerabilities, significantly outperforming other models including GPT-3.5.
Despite the potential for “security through obscurity” strategies, the researchers advocate for more proactive security measures against the risks posed by highly capable AI agents like GPT-4.
A Daily Chronicle of AI Innovations, April 22nd, 2024:
iOS 18 to have AI features with on-device processing
Many-shot ICL is a breakthrough in improving LLM performance
Groq shatters AI inference speed record with 800 tokens/second on LLaMA 3
Why Zuckerberg wants to give away a $10B AI model
Sundar Pichai tells Google staff he doesn’t want any more political debates in the office
Israel-based startup enters AI humanoid race with Menteebot
Hugging Face introduces benchmark for evaluating gen AI in healthcare
Google announces major restructuring to accelerate AI development
Nothing’s new earbuds offer ChatGPT integration
Japanese researchers develop AI tool to predict employee turnover
iOS 18 to have AI features with complete on-device processing
Apple is set to make significant strides in artificial intelligence with the upcoming release of iOS 18. According to Apple Insider’s recent report, the tech giant is focusing on privacy-centric AI features that will function entirely on-device, eliminating the need for cloud-based processing or an internet connection. This approach addresses concerns surrounding AI tools that rely on server-side processing, which have been known to generate inaccurate information and compromise user privacy.
The company is reportedly developing an in-house LLM called “Ajax,” which will power AI features in iOS 18. Users can expect improvements to Messages, Safari, Spotlight Search, and Siri, with basic text analysis and response generation available offline. We’ll learn more about Apple’s AI plans at the Worldwide Developers Conference (WWDC) starting June 10.
Why does this matter?
Apple’s commitment to user data privacy is commendable, but eliminating cloud-based processing and internet connectivity may impede the implementation of more advanced features. Nevertheless, it presents an opportunity for Apple to differentiate itself from competitors by offering users a choice between privacy-focused on-device processing and more powerful cloud-based features.
Many-shot in-context learning is a breakthrough in improving LLM performance
A recent research paper has introduced a groundbreaking technique that enables LLMs to significantly improve performance by learning from hundreds or thousands of examples provided in context. This approach, called many-shot in-context learning (ICL), has shown superior results compared to the traditional few-shot learning method across a wide range of generative and discriminative tasks.
To address the limitation of relying on human-generated examples for many-shot ICL, the researchers explored two novel settings: Reinforced ICL, which uses model-generated chain-of-thought rationales instead of human examples, and Unsupervised ICL, which removes rationales from the prompt altogether and presents the model with only domain-specific questions.
Both approaches have proven highly effective in the many-shot regime, particularly for complex reasoning tasks. Furthermore, the study reveals that many-shot learning can override pretraining biases and learn high-dimensional functions with numerical inputs, unlike few-shot learning, showcasing its potential to revolutionize AI applications.
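To make the mechanics concrete, here is a minimal sketch of how a many-shot prompt might be assembled: rationales are included for a Reinforced-ICL-style setup and omitted for an Unsupervised-ICL-style one. The helper name, prompt layout, and example counts are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of many-shot in-context learning (ICL) prompt construction.
# Layout and example counts are illustrative, not the paper's exact setup.

def build_many_shot_prompt(examples, question, include_rationales=True):
    """Pack hundreds of solved examples into a single long-context prompt.

    Reinforced ICL: examples carry model-generated chain-of-thought
    rationales. Unsupervised ICL: set include_rationales=False so only
    the domain-specific questions and answers are shown.
    """
    parts = []
    for ex in examples:  # many-shot: often hundreds or thousands of examples
        parts.append(f"Q: {ex['question']}")
        if include_rationales and "rationale" in ex:
            parts.append(f"Reasoning: {ex['rationale']}")
        parts.append(f"A: {ex['answer']}\n")
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)

examples = [
    {"question": "12 * 7 = ?", "rationale": "12 * 7 = 84.", "answer": "84"},
    # ... hundreds more examples fit inside a long-context model's window
]
prompt = build_many_shot_prompt(examples, "15 * 6 = ?")
```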
Why does this matter?
Many-shot ICL allows for quick adaptation to new tasks and domains without the need for extensive fine-tuning or retraining. However, the success of many-shot ICL heavily depends on the quality and relevance of the examples provided. Moreover, as shown by Anthropic’s jailbreaking experiment, some users could use this technique to intentionally provide carefully crafted examples designed to exploit vulnerabilities or introduce biases, leading to unintended and dangerous consequences.
Groq shatters AI inference speed record with 800 tokens/second on LLaMA 3
AI chip startup Groq recently confirmed that its novel processor architecture is serving Meta’s newly released LLaMA 3 large language model at over 800 tokens per second. This translates to generating about 500 words of text per second – nearly an order of magnitude faster than the typical speeds of large models on mainstream GPUs. Early testing by users seems to validate the claim.
Groq’s Tensor Streaming Processor is designed from the ground up to accelerate AI inference workloads, eschewing the caches and complex control logic of general-purpose CPUs and GPUs. The company asserts this “clean sheet” approach dramatically reduces the latency, power consumption, and cost of running massive neural networks.
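For a rough sense of how such throughput claims can be checked, here is a small sketch that times completion speed against Groq's OpenAI-compatible chat endpoint. It assumes the `groq` Python SDK is installed and a GROQ_API_KEY is set in the environment; the model name is an assumption based on Groq's LLaMA 3 deployment and may differ.

```python
# Rough throughput check against Groq's OpenAI-compatible API.
# The model name "llama3-8b-8192" is an assumption and may differ.
import time
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

start = time.time()
resp = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[{"role": "user", "content": "Explain AI inference in 300 words."}],
)
elapsed = time.time() - start

tokens = resp.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.0f} tokens/s")
```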
Why does this matter?
If the LLaMA 3 result holds up, it could shake up the competitive landscape for AI inference, challenging Nvidia’s dominance of GPUs and increasing the demand for purpose-built AI hardware for faster and more cost-effective inference solutions. Also, Groq’s capabilities could revolutionize software solutions that depend on real-time AI, such as virtual assistants, chatbots, and interactive customer services.
Why Zuckerberg wants to give away a $10B AI model
Mark Zuckerberg, CEO of Meta, said in a podcast he would be willing to open source a $10 billion AI model under certain conditions, if it was safe and beneficial for all involved.
Zuckerberg believes that open sourcing can mitigate dependency on a few companies controlling AI technology, fostering innovation and competition.
He also points to Meta’s strong open-source legacy with projects like PyTorch and the Open Compute Project, which have significantly reduced costs and expanded supply chains by making their designs available to the public.
Sundar Pichai tells Google staff he doesn’t want any more political debates in the office
Google CEO Sundar Pichai directed employees to stop bringing political debates into the workplace, emphasizing the company as a business focused on being an objective information provider.
The directive came after 28 employees were fired for protesting against a controversial cloud computing contract.
Pichai’s stance reflects a broader trend in tech companies to restrict political discourse at work to maintain focus and avoid internal conflicts, with companies like Coinbase and Meta implementing similar policies.
Israel-based startup enters AI humanoid race with Menteebot
Israel-based startup Mentee Robotics has unveiled Menteebot, an AI-driven humanoid robot prototype for home and warehouse use. It employs transformer-based large language models, NeRF-based algorithms, and simulator-to-reality machine learning to understand commands, create 3D maps, and perform tasks. The finalized Menteebot is anticipated to launch in Q1 2025. (Link)
Hugging Face introduces benchmark for evaluating gen AI in healthcare
The benchmark combines existing test sets to assess medical knowledge and reasoning across various fields. It’s a starting point for evaluating healthcare-focused AI models, but experts caution against relying solely on the benchmark and emphasize the need for thorough real-world testing. (Link)
Google announces major restructuring to accelerate AI development
The changes involve consolidating AI model building at Google Research and DeepMind, focusing Google Research on foundational breakthroughs and responsible AI practices, and introducing a new “Platforms & Devices” product area. (Link)
Nothing’s new earbuds offer ChatGPT integration
Nothing Ear and Nothing Ear (a) allow users to ask questions by pinching the headphones’ stem, provided the ChatGPT app is installed on a connected Nothing handset. The earbuds offer improved sound quality, better noise-canceling, and longer battery life than their predecessors. (Link)
Japanese researchers develop AI tool to predict employee turnover
The tool analyzes employee data, such as attendance records and personal information, and creates a turnover model for each company. By predicting which new recruits are likely to quit, the AI tool enables managers to offer targeted support to those employees and potentially reduce turnover rates. (Link)
Apple acquires Paris-based AI company Datakalab to bolster its AI technology. LINK
China’s AI data centers to outpace Korea’s human water consumption. LINK
Google Gemini app on Android may soon let you read ‘real-time responses’. LINK
Bitcoin miners upgrade power centers and get into AI to brace for slashed revenue post halving. LINK
Ecosia launches world’s first energy-generating browser. LINK
A Daily chronicle of AI Innovations April 20th 2024 and April Week 3 Recap: OpenAI fires back at Elon Musk Google DeepMind researchers call for limits on AI that mimics humans Bitcoin just completed its fourth-ever ‘halving’ Twitter alternative Post News is shutting down
OpenAI fires back at Elon Musk
OpenAI has refuted Elon Musk’s lawsuit allegations, asserting that he is attempting to discredit the company for his own commercial gain after a failed attempt to dominate it years ago.
The company’s legal team disputes Musk’s claim that OpenAI violated its founding principles by commercializing its technology and forming a partnership with Microsoft.
OpenAI has requested a court to dismiss Musk’s lawsuit, arguing that there is no basis for his claims, and a hearing for the motion is set for April 24.
Google DeepMind researchers call for limits on AI that mimics humans
Google DeepMind researchers advocate for setting limits on AI that imitates human behaviors, highlighting the risk of users forming overly close bonds that could lead to loss of autonomy and disorientation.
Their paper discusses the potential of AI assistants to enhance daily life by acting as partners in creativity, analysis, and planning, but warns they could also misalign with user and societal interests, potentially exacerbating social and technological inequalities.
Researchers call for comprehensive research and protective measures, including restrictions on human-like elements in AI, to ensure these systems preserve user autonomy and prevent negative social impacts while promoting the advancement of socially beneficial AI.
What happened in AI from April 14th to April 20th 2024
xAI’s first multimodal model with a unique dataset Infini-Attention: Google’s breakthrough gives LLMs limitless context Adobe’s Firefly AI trained on competitor’s images: Bloomberg report Adobe partners with OpenAI, RunwayML & Pika for Premiere Pro Reka launches Reka Core: Their frontier in multimodal AI OpenAI is opening its first international office in Tokyo NVIDIA RTX A400 A1000: Lower-cost single slot GPUs Amazon Music launches Maestro, an AI-based playlist generator Stanford’s report reflects industry dominance and rising training costs in AI Microsoft VASA-1 generates lifelike talking faces with audio Boston Dynamics charges up for the future by electrifying Atlas Intel reveals world’s largest brain-inspired computer Meta released two Llama 3 models; 400B+ models in training Mixtral 8x22B claims highest open-source performance and efficiency Meta’s Megalodon to solve the fundamental challenges of the Transformer
A Daily chronicle of AI Innovations April 19th 2024: Meta declares war on OpenAI Meta’s Llama 3 models are here; 400B+ models in training! Google consolidates teams with aim to create AI products faster Apple pulls WhatsApp, Threads and Signal from app store in China Moderna CEO says AI will help scientists understand ‘most diseases’ in 3 to 5 years Mixtral 8x22B claims highest open-source performance and efficiency Meta’s Megalodon to solve the fundamental challenges of the Transformer Meta adds its AI chatbot, powered by Llama 3, to the search bar in all its apps. Wayve introduces LINGO-2, a groundbreaking AI model that drives and narrates its journey. Salesforce updates Slack AI with smart recaps and more languages. US Air Force tests AI-controlled jets against human pilots in simulated dogfights. Google Maps will use AI to find out-of-the-way EV chargers for you.
Meta’s Llama 3 models are here; 400B+ models in training!
Llama 3 is finally here! Meta introduced the first two models of the Llama 3 family for broad use: pretrained and instruction-fine-tuned language models with 8B and 70B parameters. Meta claims these are the best models existing today at the 8B and 70B parameter scale, with greatly improved reasoning, code generation, and instruction following, making Llama 3 more steerable.
But that’s not all. Meta is also training large models with over 400B parameters. Over the coming months, it will release multiple models with new capabilities, including multimodality, the ability to converse in multiple languages, a much longer context window, and stronger overall capabilities.
Llama 3 models will soon be available on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake, and with support from hardware platforms offered by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.
Why does this matter?
While Llama 400B+ is still in training, it is already trending. Its release might mark a watershed moment for AI as the open-source community gains access to a GPT-4-class model. It will be a powerful foundation for research efforts, and it could be a win for open source in the longer run if startups and businesses start building more local, tailored models with it.
Mixtral 8x22B claims highest open-source performance and efficiency
Mistral AI has unveiled Mixtral 8x22B, a new open-source language model that the startup claims achieves the highest open-source performance and efficiency. Its sparse mixture-of-experts (SMoE) architecture actively uses only 39 billion of its 141 billion parameters. As a result, it offers an exceptionally good price/performance ratio for its size.
The model’s other strengths include multilingualism, with support for English, French, Italian, German, and Spanish, as well as strong math and programming capabilities.
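For intuition on how a sparse mixture-of-experts keeps most parameters idle, here is a toy routing layer in the spirit of Mixtral's top-2-of-8 expert selection. Dimensions, expert sizes, and routing details are illustrative assumptions, not Mixtral's actual implementation.

```python
# Toy sparse mixture-of-experts (SMoE) layer: each token is routed to
# only top_k of n_experts feed-forward networks, so most parameters
# stay inactive per token. Dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)  # scores experts per token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                           nn.Linear(4 * dim, dim)) for _ in range(n_experts)]
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):           # only the chosen experts run
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k:k+1] * self.experts[int(e)](x[mask])
        return out

layer = SparseMoE()
y = layer(torch.randn(10, 64))  # each token touches 2 of 8 experts
```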
Why does this matter?
While Mistral AI’s claims may be true, there’s a new competitor on the market: Llama 3. So we might have to reconsider which open-source model is the best right now. But whatever the benchmarks say, only the practical usefulness of these models will tell which is truly superior.
Meta’s Megalodon to solve the fundamental challenges of the Transformer
Researchers at Meta and the University of Southern California have proposed a new model that aims to solve some of the fundamental challenges of the Transformer, the deep learning architecture that gave rise to the age of LLMs.
The model, called Megalodon, allows language models to extend their context window to millions of tokens without requiring huge amounts of memory. Experiments show that Megalodon outperforms Transformer models of equal size in processing large texts. The researchers have also obtained promising results on small- and medium-scale experiments on other data modalities and will later work on adapting Megalodon to multimodal settings.
Why does this matter?
Scientists have been looking for alternative architectures that can replace transformers. Megalodon is the latest in the series. However, much research has already been poured into enhancing transformers and making them efficient; Google’s Infini-attention, released this week, is one example. So the alternatives have a lot to catch up on. For now, transformers remain the dominant architecture for language models.
Meta declares war on OpenAI
Meta has expanded the integration of its AI assistant into platforms like Instagram, WhatsApp, Facebook, and a standalone website, aiming to challenge ChatGPT in the AI chatbot market.
Meta announced Llama 3, its latest AI model, which reportedly outperforms its predecessors and competitors in several benchmarks, with versions available for both internal use and external developers.
CEO Mark Zuckerberg stated that with Llama 3, Meta aims to establish the most advanced and globally accessible AI assistant, featuring enhanced capabilities such as integrated real-time search results and improved image generation.
Google consolidates teams with aim to create AI products faster
Google is merging its Android and Chrome software division with the Pixel and Fitbit hardware division to more extensively incorporate artificial intelligence into the company.
CEO Sundar Pichai stated that this integration aims to “turbocharge the Android and Chrome ecosystems” and foster innovation under the leadership of executive Rick Osterloh.
The reorganization reflects Google’s strategy to leverage AI for consumer and enterprise applications, emphasizing AI’s role in enhancing features like the Pixel camera.
Moderna CEO says AI will help scientists understand ‘most diseases’ in 3 to 5 years
Moderna CEO Stéphane Bancel predicted that AI will enable scientists to understand most diseases within the next 3 to 5 years, marking a significant milestone for human health.
AI is expected to accelerate drug development, allowing pharmaceutical companies to bring new medicines to patients faster and improving the diagnosis of conditions like heart disease.
Bancel expressed optimism about AI’s potential in healthcare, citing insights gained from AI that were previously unknown to scientists, indicating a bright future for medical research and treatment.
Meta adds its AI chatbot, powered by Llama 3, to the search bar in all its apps.
Meta has upgraded its AI chatbot with its newest LLM, Llama 3, and has added it to the search bar of its apps (Facebook, Messenger, Instagram, and WhatsApp) in multiple countries. It also launched a new meta.ai site for users to access the chatbot and other new features, such as faster image generation and access to web search results. (Link)
Wayve introduces LINGO-2, a groundbreaking AI model that drives and narrates its journey.
LINGO-2 merges vision, language, and action, resulting in every driving maneuver coming with an explanation. This provides a window into the AI’s decision-making, deepening trust and understanding of our assisted and autonomous driving technology. (Link)
Salesforce updates Slack AI with smart recaps and more languages.
Salesforce rolled out generative AI updates for Slack. The new features build on the native AI smarts, collectively dubbed Slack AI, announced in February, and provide users with easy-to-digest recaps to stay on top of their day-to-day work interactions. Salesforce also confirmed it is expanding Slack AI to more languages. (Link)
US Air Force tests AI-controlled jets against human pilots in simulated dogfights.
The Defense Advanced Research Projects Agency (DARPA) revealed that an AI-controlled jet successfully faced a human pilot during an in-air dogfight test last year. The agency has conducted 21 test flights so far and says the tests will continue through 2024. (Link)
Google Maps will use AI to find out-of-the-way EV chargers for you.
Google Maps will use AI to summarize customer reviews of EV chargers to display more specific directions to certain chargers, such as ones in parking garages or more hard-to-find places. The app will also have more prompts to encourage users to submit their feedback after using an EV charger. (Link)
A Daily chronicle of AI Innovations April 18th 2024: Samsung unveils lightning-fast DRAM for AI-powered devices Logitech’s new AI prompt builder & Signature AI edition mouse Snapchat to add watermark to images produced with its AI tools US Air Force confirms first successful AI dogfight Mistral’s latest model sets new records for open source LLMs Microsoft’s new AI model creates hyper-realistic video using static image GPT-4 nearly matches expert doctors in eye assessments Brave unleashes real-time privacy-focused AI answer engine
Microsoft’s VASA-1 generates lifelike talking faces with audio
Microsoft Research’s groundbreaking project, VASA-1, introduces a remarkable framework for generating lifelike talking faces from a single static image and a speech audio clip.
The model achieves exquisite lip synchronization and captures a rich spectrum of facial nuances and natural head motions, resulting in hyper-realistic videos.
Why does it matter?
VASA-1 is crucial in AI for improving lifelike interactions with realistic facial expressions, benefiting customer service, education, and companionship. Its expressive features also enhance storytelling in games and media. Additionally, VASA-1 contributes to developing accessibility tools for those with communication challenges.
Boston Dynamics charges up for the future by electrifying Atlas
Boston Dynamics has unveiled an electric version of their humanoid robot, Atlas. Previously powered by hydraulics, the new Atlas operates entirely on electricity. This development aims to enhance its strength and range of motion, making it more versatile for real-world applications.
Boston Dynamics also plans to collaborate with partners like Hyundai to test and iterate Atlas applications in various environments, including labs, factories, and everyday life.
Why does it matter?
The electric version of Atlas matters because it offers enhanced strength, agility, and practicality for real-world applications. Its electric power source allows it to move in ways that exceed human capabilities, making it versatile for a wide variety of tasks.
Intel reveals world’s largest brain-inspired computer
Intel has introduced the world’s largest neuromorphic computer, which mimics the human brain. Unlike traditional computers, it combines computation and memory using artificial neurons. With 1.15 billion neurons, it consumes 100 times less energy than conventional machines and performs 380 trillion synaptic operations per second. This breakthrough could revolutionize AI and enhance energy-efficient computing.
Why does it matter?
In current AI models, data transfer between processing units can be a bottleneck. Neuromorphic architectures directly address this issue by integrating computation and memory. This could lead to breakthroughs in training deep learning models.
US Air Force confirms first successful AI dogfight
The US Air Force, via DARPA, announced that an AI-controlled jet successfully engaged in an in-air dogfight against a human pilot for the first time, during tests at Edwards Air Force Base in California in September 2023.
DARPA has been working on AI for air combat through its Air Combat Evolution (ACE) program, flight-testing AI agents on the X-62A since December 2022, aiming to develop AI capable of autonomously flying fighter jets while adhering to safety protocols.
The AI was tested in a real aircraft, the experimental X-62A, against an F-16 flown by a human, achieving close maneuvers without the need for human pilots to intervene; testing is planned to continue through 2024.
Mistral’s latest model sets new records for open source LLMs
French AI startup Mistral AI has released Mixtral 8x22B, claiming it to be the highest-performing and most efficient open-source language model, utilizing a sparse mixture-of-experts model with 39 billion of its 141 billion parameters active.
Mixtral 8x22B excels in multilingual support and possesses strong math and programming capabilities, despite having a smaller context window compared to leading commercial models like GPT-4 or Claude 3.
The model, licensed under the Apache 2.0 license for unrestricted use, achieves top results on various comprehension and logic benchmarks and outperforms other models in its supported languages on specific tests.
Intel unveils the world’s largest neuromorphic computer
Intel Labs introduced its largest neuromorphic computer yet, the Hala Point, featuring 1.15 billion neurons (likened to the brain capacity of an owl), aiming to process information more efficiently by emulating the brain’s neurons and synapses in silicon.
The Hala Point system, consuming 2,600 W, is designed to achieve deep neural network efficiencies up to 15 TOPS/W at 8-bit precision, significantly surpassing Nvidia’s current and forthcoming systems in energy efficiency.
While showcasing remarkable potential for AI inference and optimization problems with significantly reduced power consumption, Intel’s neuromorphic technology is not yet a universal solution for all AI workloads, with limitations in general-purpose AI acceleration and challenges in adapting large language models.
Microsoft’s new AI model creates hyper-realistic video using static image
Microsoft introduced VASA-1, an AI model that produces hyper-realistic videos from a single photo and audio clip, featuring realistic lip syncs and facial movements.
The model can create 512x512p resolution videos at 40fps from one image, support modifications like eye gaze and emotional expressions, and even incorporate singing or non-English audio.
While Microsoft recognizes the AI’s potential for misuse in creating deepfakes, it intends to use VASA-1 solely for developing virtual interactive characters and advancing forgery detection.
GPT-4 nearly matches expert doctors in eye assessments
OpenAI’s GPT-4 almost matched the performance of expert ophthalmologists in an eye assessment study, as reported by the Financial Times and conducted by the University of Cambridge’s School of Clinical Medicine.
GPT-4 scored higher than trainee and junior doctors with 60 correct answers out of 87, closely following the expert doctors’ average score of 66.4, in a test evaluating knowledge on various ophthalmology topics.
The study, highlighting both potential benefits and risks, indicates that while GPT-4 shows promise in medical assessments, concerns about inaccuracies and the model’s tendency to “hallucinate” answers remain.
Samsung unveils lightning-fast DRAM for AI-powered devices
Samsung Electronics has achieved a significant milestone by developing the industry’s fastest LPDDR5X DRAM, capable of reaching speeds up to 10.7 Gbps. This new LPDDR5X offers 25% higher performance and 30% more capacity, making it an optimal solution for the on-device AI era. (Link)
Logitech’s new AI prompt builder & Signature AI edition mouse
Logitech has launched the Logi AI Prompt Builder, a free software tool that enhances interaction with OpenAI’s ChatGPT. It allows Logitech keyboards and mice to serve as shortcuts for more fluent AI prompts. Additionally, Logitech introduced the Signature AI Edition Mouse, featuring a dedicated AI prompt button. (Link)
Snapchat to add watermark to images produced with its AI tools
Snapchat plans to add watermarks to AI-generated images on its platform. These watermarks, featuring a translucent Snap logo and a sparkle emoji, will enhance transparency and prevent content misuse. (Link)
Brave unleashes real-time privacy-focused AI answer engine
Brave, the privacy-centric browser, has introduced an AI-driven answer engine within Brave Search. Unlike competitors, it prioritizes privacy by avoiding external search engines. The feature provides real-time generative answers across multiple languages, making it a robust alternative to traditional search. (Link)
LinkedIn tests premium company page subscription
LinkedIn is quietly testing a Premium Company Page subscription service for small and medium businesses. The service includes AI-generated content, follower-enhancing tools, and other features to elevate company profiles. Pricing starts at $99.99 per month. (Link)
A Daily chronicle of AI Innovations April 17th 2024: NVIDIA RTX A400 A1000: Lower-cost single slot GPUs; Stanford’s report reflects industry dominance and rising training costs in AI; Amazon Music launches Maestro, an AI playlist generator; Snap adds watermarks to AI-generated images; Boston Dynamics unveils a new humanoid robot; Andreessen Horowitz raises $7.2 billion, a sign that tech startup market may be bouncing back; OpenAI offers a 50% discount for off-peak GPT usage; AMD unveils AI chips for business laptops and desktops; Anthropic Claude 3 Opus is now available on Amazon Bedrock; Zendesk launches an AI-powered customer experience platform; Intel and The Linux Foundation launch Open Platform for Enterprise AI (OPEA)
Google will pump more than $100B into AI, says DeepMind boss
DeepMind CEO Demis Hassabis predicts Google will invest over $100 billion in AI, surpassing rivals like Microsoft in processing prowess.
Google’s investment in AI may involve hardware like Axion CPUs based on the Arm architecture, claimed to be faster and more efficient than competitors.
Some of the budget will likely go to DeepMind, known for its work on the software side of AI, despite recent mixed results in material discoveries and weather prediction.
DeepMind has made progress in teaching AI social skills, a crucial step in advancing AI capabilities.
Hassabis emphasized the need for significant computing power, a reason for teaming up with Google in 2014.
A monster of a paper by Stanford, a 500-page report on the 2024 state of AI
Top 10 Takeaways:
AI beats humans on some tasks, but not on all. AI has surpassed human performance on several benchmarks, including some in image classification, visual reasoning, and English understanding. Yet it trails behind on more complex tasks like competition-level mathematics, visual commonsense reasoning and planning.
Industry continues to dominate frontier AI research. In 2023, industry produced 51 notable machine learning models, while academia contributed only 15. There were also 21 notable models resulting from industry-academia collaborations in 2023, a new high.
Frontier models get way more expensive. According to AI Index estimates, the training costs of state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an estimated $78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million for compute.
The United States leads China, the EU, and the U.K. as the leading source of top AI models. In 2023, 61 notable AI models originated from U.S.-based institutions, far outpacing the European Union’s 21 and China’s 15.
Robust and standardized evaluations for LLM responsibility are seriously lacking. New research from the AI Index reveals a significant lack of standardization in responsible AI reporting. Leading developers, including OpenAI, Google, and Anthropic, primarily test their models against different responsible AI benchmarks. This practice complicates efforts to systematically compare the risks and limitations of top AI models.
Generative AI investment skyrockets. Despite a decline in overall AI private investment last year, funding for generative AI surged, nearly octupling from 2022 to reach $25.2 billion. Major players in the generative AI space, including OpenAI, Anthropic, Hugging Face, and Inflection, reported substantial fundraising rounds.
The data is in: AI makes workers more productive and leads to higher quality work. In 2023, several studies assessed AI’s impact on labor, suggesting that AI enables workers to complete tasks more quickly and to improve the quality of their output. These studies also demonstrated AI’s potential to bridge the skill gap between low- and high-skilled workers. Still, other studies caution that using AI without proper oversight can lead to diminished performance.
Scientific progress accelerates even further, thanks to AI. In 2022, AI began to advance scientific discovery. 2023, however, saw the launch of even more significant science-related AI applications— from AlphaDev, which makes algorithmic sorting more efficient, to GNoME, which facilitates the process of materials discovery.
The number of AI regulations in the United States sharply increases. The number of AI-related regulations in the U.S. has risen significantly in the past year and over the last five years. In 2023, there were 25 AI-related regulations, up from just one in 2016. Last year alone, the total number of AI-related regulations grew by 56.3%.
People across the globe are more cognizant of AI’s potential impact—and more nervous. A survey from Ipsos shows that, over the last year, the proportion of those who think AI will dramatically affect their lives in the next three to five years has increased from 60% to 66%. Moreover, 52% express nervousness toward AI products and services, marking a 13 percentage point rise from 2022. In America, Pew data suggests that 52% of Americans report feeling more concerned than excited about AI, rising from 37% in 2022.
NVIDIA RTX A400 A1000: Lower-cost single slot GPUs
NVIDIA is expanding its lineup of professional RTX graphics cards with two new desktop GPUs – the RTX A400 and RTX A1000. These new GPUs are designed to bring enhanced AI and ray-tracing capabilities to workstation-class computers. The RTX A1000 GPU is already available from resellers, while the RTX A400 GPU is expected to launch in May.
NVIDIA RTX A400
With 24 tensor cores for AI processing, the A400 enables professionals to run AI apps directly on their desktops, such as intelligent chatbots and copilots. The GPU allows creatives to produce vivid, physically accurate 3D renderings. The A400 also features four display outputs, making it ideal for high-density display environments such as financial services, command and control, retail, and transportation.
NVIDIA RTX A1000
With 72 Tensor Cores, the A1000 offers 3x faster generative AI processing for tools like Stable Diffusion. The A1000 also excels in video processing, as it can process up to 38% more encoding streams and offers up to 2x faster decoding performance than the previous generation. With their slim single-slot design and power consumption of just 50W, the A400 and A1000 GPUs offer impressive features for compact, energy-efficient workstations.
Why does it matter?
NVIDIA RTX A400 and A1000 GPUs provide professionals with cutting-edge AI, graphics, and computing capabilities to increase productivity and unlock creative possibilities. These GPUs can be used by industrial designers, creatives, architects, engineers, healthcare teams, and financial professionals to improve their workflows and achieve faster and more accurate results. With their advanced features and energy efficiency, these GPUs have the potential to impact the future of AI in various industries.
Amazon Music launches Maestro, an AI-based playlist generator
Amazon Music is launching its AI-powered playlist generator, Maestro, following a similar feature introduced by Spotify. Maestro allows users in the U.S. to create playlists by speaking or writing prompts. The AI will then generate a song playlist that matches the user’s input. This feature is currently in beta and is being rolled out to a subset of Amazon Music’s free, Prime, and Unlimited subscribers on iOS and Android.
Like Spotify’s AI playlist generator, Amazon has built safeguards to block inappropriate prompts. However, the technology is still new, and Amazon warns that Maestro “won’t always get it right the first time.”
Why does it matter?
Introducing AI-powered playlist generators could profoundly impact how we discover and consume music in the future. These AI tools can revolutionize music curation and personalization by allowing users to create highly tailored playlists simply through prompts. This trend could increase user engagement, drive more paid subscriptions, and spur further innovation in AI-powered music experiences as companies offer more cutting-edge features.
Stanford’s report reflects industry dominance and rising training costs in AI
The AI Index, an independent report by the Stanford Institute for Human-Centered Artificial Intelligence (HAI), provides a comprehensive overview of global AI trends in 2023.
The report states that the industry outpaced academia in AI development and deployment. Out of the 149 foundational models published in 2023, 108 (72.5%) were from industry compared to just 28 (18.8%) from academia.
Google (18) leads the way, followed by Meta (11), Microsoft (9), and OpenAI (7).
The United States leads as the top source with 109 of the 149 foundational models, followed by China (20) and the UK (9). In the case of machine learning models, the United States again tops the chart with 61 notable models, followed by China (15) and France (8).
Regarding AI models’ training and computing costs, Gemini Ultra leads with a training cost of $191 million, followed by GPT-4, which has a training cost of $78 million.
Lastly, in 2023, AI reached human performance levels in many key AI benchmarks, such as reading comprehension, English understanding, visual thinking, image classification, etc.
Why does it matter?
Industry dominance in AI research suggests that companies will continue to drive advancements in the field, leading to more advanced and capable AI systems. However, the rising costs of AI training may pose challenges, as it could limit access to cutting-edge AI technology for smaller organizations or researchers.
A Daily chronicle of AI Innovations April 16th 2024: Adobe partners with OpenAI, RunwayML & Pika for Premiere Pro; Reka launches Reka Core: their frontier in multimodal AI; OpenAI is opening its first international office in Tokyo; Hugging Face has rolled out Idefics2 ; Quora’s Poe aims to become the ‘App Store’ for AI chatbots; Instagram is testing an AI program to amplify influencer engagement; Microsoft has released and open-sourced the new WizardLM-2 family of LLMs; Limitless AI launched a personal meeting assistant in a pendant Microsoft invests $1.5 billion in AI firm Baidu says its ChatGPT-like Ernie bot exceeds 200 million users OpenAI introduces Batch API with up to 50% discount for asynchronous tasks
Adobe partners with OpenAI, RunwayML & Pika for Premiere Pro
Adobe is integrating generative AI in Premiere Pro. The company is developing its own Firefly Video Model and teaming up with third-party AI models like OpenAI’s Sora, RunwayML, and Pika to bring features like Generative Extend, Object Addition and Removal, and Generative B-Roll to the editing timeline.
Adobe is committed to an open approach for delivering models. It allows editors to choose the best AI models for their needs to streamline video workflows, reduce tedious tasks, and expand creativity. It also provides “Content Credentials” to track model usage.
Why does this matter?
Adobe Premiere Pro has been used in blockbuster films like Deadpool, Gone Girl, and Everything Everywhere All at Once. By integrating generative AI into Premiere Pro, Adobe is transforming the film industry, allowing editors to streamline workflows and focus more on creative storytelling.
Reka launches Reka Core: their frontier in multimodal AI
Another day, another GPT-4-class model. But this time, it’s not from the usual suspects like OpenAI, Google, or Anthropic. Reka, a lesser-known AI startup, has launched a new flagship offering, Reka Core: its most advanced model and one of only two commercially available comprehensive multimodal solutions. It excels at understanding images, videos, and audio while offering a massive context window, exceptional reasoning skills, and even coding.
It outperforms other models on various industry-accepted evaluation metrics. To provide flexibility, Reka Core can be deployed via API, on-premises, or on-device. Reka’s partnerships with Snowflake and Oracle are set to democratize access to this tech for AI innovation across industries.
Why does this matter?
Reka Core matches and even surpasses the performance of leading OpenAI, Google, and Anthropic models across various benchmarks and modalities. By offering cost-effective, multi-modal solutions, Reka has the potential to make advanced AI more accessible and drive new applications across multiple industries.
OpenAI is opening its first international office in Tokyo
OpenAI is releasing a custom version of its GPT-4 model, specially optimized for the Japanese language. This specialized offering promises faster and more accurate performance and improved text handling.
Tadao Nagasaki has been appointed President of OpenAI Japan. The company plans to collaborate with the Japanese government, local businesses, and research institutions to develop safe AI tools that serve Japan’s unique needs. With Daikin and Rakuten already using ChatGPT Enterprise and local governments like Yokosuka City seeing productivity boosts, OpenAI is poised to impact the region significantly.
Why does it matter?
The move reflects OpenAI’s commitment to serving diverse markets. It could set a precedent for other AI companies, fostering a more inclusive and local approach. And as Japan grapples with rural depopulation and labor shortages, AI could prove invaluable in driving progress.
Microsoft invests $1.5 billion in AI firm
Microsoft will invest $1.5 billion in G42, a leading UAE artificial intelligence firm, as part of a strategic shift to align with American technology and disengage from Chinese partnerships following negotiations with the US government.
The investment enhances Microsoft’s influence in the Middle East, positioning G42 to use Microsoft Azure for its AI services, underpinning US efforts to limit Chinese access to advanced technologies.
This deal, which also involves Microsoft’s Brad Smith joining G42’s board, comes amidst broader US concerns about tech firms with Chinese ties.
OpenAI introduces Batch API with up to 50% discount for asynchronous tasks
OpenAI introduces a new Batch API providing up to 50% discount for asynchronous tasks like summarization, translation, and image classification.
The Batch API returns results for bulk API requests within 24 hours; users upload a JSONL file of requests in batch format, and it currently supports only the /v1/chat/completions endpoint.
OpenAI expects this to enable more efficient use of its APIs for applications that require a large number of requests.
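A minimal sketch of that flow, assuming the official `openai` Python SDK and an illustrative model and prompts: requests are written to a JSONL file, uploaded, and submitted as a batch with a 24-hour completion window.

```python
# Sketch of the Batch API flow: write requests to JSONL, upload the
# file, then create a batch job. Model name and prompts are illustrative.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",  # currently the only supported endpoint
        "body": {
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "content": f"Summarize document {i}."}],
        },
    }
    for i in range(3)
]

with open("batch.jsonl", "w") as f:
    for r in requests:
        f.write(json.dumps(r) + "\n")

batch_file = client.files.create(file=open("batch.jsonl", "rb"), purpose="batch")
job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(job.id, job.status)  # results arrive within 24 hours at the discounted rate
```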
Unleash the Power of Generative AI: Build Breakthrough Apps with AWS Bedrock
Struggling to keep up with the rapid advancements in generative AI? AWS Bedrock offers a one-stop shop for developers. Access a variety of high-performing foundation models from leading names in AI, all through a single API. Fine-tune models with your data, leverage pre-built agents, and focus on building innovative applications.
AWS Bedrock is a fully managed service designed to streamline the development of generative AI applications. It offers high-performing foundation models (FMs) from leading AI companies, all accessible from a single API. AWS Bedrock partners with AI leaders such as AI21 Labs, Anthropic, Cohere, Meta, and Stability AI, and also offers in-house models. Each FM has unique features that can be leveraged according to your project’s needs. This eliminates the need for developers to manage the infrastructure and tooling necessary to train and deploy their own models. Despite the simplified development process, privacy and security are not compromised: AWS Bedrock ensures the integrity and confidentiality of the data developers use to build generative AI applications.
Key Features of AWS Bedrock
Variety of FMs: A wide range of high-performing models are available for different tasks like text generation, image generation, code generation, and more.
Simple API: A single API that makes it quick and easy to integrate FMs into your applications.
Fully managed service: All the infrastructure and tooling are managed for you to focus on building your applications.
Scalable: Applications can be scaled up or down as the requirement changes.
Secure: AWS Bedrock provides built-in security and privacy features, ensuring integrity and confidentiality.
How does AWS Bedrock work?
Choose a foundation model: Browse the available models and select the one that best fits your needs.
Send an API request: Use the simple API to send your data to the chosen model.
Receive the output: The model will generate the desired output, such as text, code, or an image.
Integrate the output: Use the output in your application however you like.
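A minimal sketch of the four steps above, using boto3’s Bedrock runtime client; the region, prompt, and choice of an Anthropic Claude model ID are illustrative assumptions.

```python
# Sketch of the Bedrock flow: pick a model, send a request through the
# single API, read the generated output. Region and prompt are illustrative.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Steps 1-2: choose a foundation model and send an API request
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Write a product tagline."}],
    }),
)

# Steps 3-4: receive the output and use it in your application
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```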
Types of Foundation Model (FM)
AWS Bedrock provides six FMs, with more than 15 versions, that can be leveraged per the project’s requirements. All the models are pre-trained on large datasets and are reliable tools for a wide range of applications. To learn more about the individual models, visit the official AWS Bedrock website.
AWS Bedrock Pricing
AWS Bedrock provides two types of pricing models and charges based on the model inference and customization in the model.
On-demand & Batch: The pay-as-you-go pricing model is used without any time-based commitments.
Provisioned Throughput: Sufficient throughput is provided in exchange for a time-based commitment to meet the application’s performance demands. Terms are available as one-month or six-month commitments.
To compare the models and learn more, visit the official AWS Bedrock pricing page.
What Else Is Happening in AI on April 16th, 2024
Hugging Face has rolled out Idefics2
Hugging Face has released Idefics2, a more compact and capable version of its visual language model. With just 8 billion parameters, this open-source model enhances image manipulation, improves OCR, and answers questions on visual data. (Link)
Quora’s Poe aims to become the ‘App Store’ for AI chatbots
After a $75 million funding round, Poe has launched a “multi-bot chat” feature that allows users to seamlessly integrate various AI models into a single conversation. Positioning itself as the “app store” for chatbots, Poe is also rolling out monetization tools for creators and planning an enterprise tier for businesses. (Link)
Instagram is testing an AI program to amplify influencer engagement
The “Creator AI” program lets popular creators interact with fans through automated chatbots. The bots will mimic the influencer’s voice using their past content, aiming to boost engagement while cutting down on manual responses. While some creators worry this could undermine authenticity, Meta sees AI as crucial to its future. (Link)
Microsoft has released and open-sourced the new WizardLM-2 family of LLMs
This next-gen LLM lineup boasts three cutting-edge versions: the 8x22B model outperforms even the best open-source alternatives, while the 70B and 7B variants deliver best-in-class reasoning and efficiency, respectively. (Link)
Limitless AI launched a personal meeting assistant in a pendant
Limitless launched a $99 wearable “Limitless Pendant” to transcribe conversations, generate real-time notes, and seamlessly integrate with your work apps. While starting with a focus on meetings, the startup’s CEO Dan Siroker sees Limitless eventually doing much more – proactively surfacing relevant information and even automating tasks on your behalf. (Link)
A Daily chronicle of AI Innovations April 15th 2024: Tesla lays off more than 10% of its workforce Adobe explores OpenAI partnership as it adds AI video tools Apple’s AI features on iOS 18 may run locally on your iPhone xAI’s first multimodal model with a unique dataset Infini-Attention: Google’s breakthrough gives LLMs limitless context Adobe’s Firefly AI trained on competitor’s images: Bloomberg report
xAI’s first multimodal model with a unique dataset
xAI, Elon Musk’s AI startup, has released the preview of Grok-1.5V, its first-generation multimodal AI model. This new model combines strong language understanding capabilities with the ability to process various types of visual information, like documents, diagrams, charts, screenshots, and photographs.
The startup claims Grok-1.5V has shown competitive performance across several benchmarks, including tests for multidisciplinary reasoning, mathematical problem-solving, and visual question answering. One notable achievement is its exceptional performance on the RealWorldQA dataset, which evaluates real-world spatial understanding in AI models.
Developed by xAI, this dataset features over 700 anonymized images from real-world scenarios, each accompanied by a question and verifiable answer. The release of Grok-1.5V and the RealWorldQA dataset aims to advance the development of AI models that can effectively comprehend and interact with the physical world.
Why does this matter?
What makes Grok-1.5V unique is its integration with the RealWorldQA dataset, which focuses on real-world spatial understanding crucial for AI systems in physical environments. The public availability of this dataset could significantly advance the development of AI-driven robotics and autonomous systems. With Musk’s backing, xAI could lead in multimodal AI and contribute to reshaping human-AI interaction.
Infini-Attention: Google’s breakthrough gives LLMs limitless context
Google researchers have developed a new technique called Infini-attention that allows LLMs to process text sequences of unlimited length. By elegantly modifying the Transformer architecture, Infini-attention enables LLMs to maintain strong performance on input sequences exceeding 1 million tokens without requiring additional memory or causing exponential increases in computation time.
The key innovation behind Infini-attention is the addition of a “compressive memory” module that efficiently stores old attention states once the input sequence grows beyond the model’s base context length. This compressed long-range context is then aggregated with local attention to generate coherent and contextually relevant outputs.
In benchmark tests on long-context language modeling, summarization, and information retrieval tasks, Infini-attention models significantly outperformed other state-of-the-art long-context approaches while using up to 114 times less memory.
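A highly simplified sketch of the core idea, assuming a linear-attention-style compressive memory: each segment is processed with ordinary local attention, older key/value states are folded into a fixed-size memory matrix, and a learned gate blends the two outputs. The paper’s exact update rules and normalization are omitted.

```python
# Toy Infini-attention-style segment processing: local softmax attention
# plus a fixed-size compressive memory of older key/value states.
import torch

def local_attention(q, k, v):
    # standard softmax attention within one segment
    scores = q @ k.T / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

def infini_attention_segment(q, k, v, memory, gate):
    # q, k, v: (seg_len, dim); memory: (dim, dim) summary of older segments
    phi = lambda t: torch.nn.functional.elu(t) + 1  # positive feature map
    local = local_attention(q, k, v)                # attention over this segment
    from_memory = phi(q) @ memory                   # linear read of old context
    memory = memory + phi(k).T @ v                  # fold this segment into memory
    g = torch.sigmoid(gate)                         # learned blend in practice
    return g * from_memory + (1 - g) * local, memory

dim, seg = 32, 16
memory = torch.zeros(dim, dim)
gate = torch.zeros(1)
for _ in range(4):  # stream the segments of a long input one by one
    q, k, v = (torch.randn(seg, dim) for _ in range(3))
    out, memory = infini_attention_segment(q, k, v, memory, gate)
```

Because the memory matrix has a fixed size regardless of how many segments have been seen, the context window can grow without the quadratic memory cost of full attention.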
Why does this matter?
Infini-attention can help AI systems expertly organize, summarize, and surface relevant information from vast knowledge bases. Additionally, infinite contextual understanding can help AI systems generate more nuanced and contextually relevant long-form content like articles, reports, and creative writing pieces. Overall, we can expect AI tools to generate more valuable and less generic content with this technique.
Adobe’s Firefly AI trained on competitor’s images: Bloomberg report
In a surprising revelation, Adobe’s AI image generator Firefly was found to have been trained not just on Adobe’s own stock photos but also on AI-generated images from rival platforms like Midjourney and DALL-E. The Bloomberg report, which cites insider sources, notes that while these AI images made up only 5% of Firefly’s training data, their inclusion has sparked an internal ethics debate within Adobe.
The news is particularly noteworthy given Adobe’s public emphasis on Firefly’s “ethical” sourcing of training data, a stance that aimed to differentiate it from competitors. The company had even set up a bonus scheme to compensate artists whose work was used to train Firefly. However, the decision to include AI-generated images, even if labeled as such by the submitting artists, has raised questions about the consistency of Adobe’s ethical AI practices.
Why does it matter?
As AI systems learn from one another in a continuous feedback loop, the distinction between original creation, inspiration, and imitation becomes blurred. This raises complex issues around intellectual property rights, consent, and the difference between remixing and replicating. Moreover, the increasing prevalence of AI-generated content in training data sets could lead to a homogenization of AI outputs, potentially stifling creativity and diversity.
Tesla lays off more than 10% of its workforce
Tesla plans to lay off “more than 10 percent” of its global workforce following its first year-over-year decline in vehicle deliveries since 2020, impacting at least 14,000 employees.
CEO Elon Musk expressed regret over the layoffs in an internal email, stating they are necessary for the company to remain “lean, innovative and hungry” for future growth.
Senior vice president Drew Baglino and policy chair Rohan Patel are among the top executives reported to be leaving the company amid these changes.
Adobe explores OpenAI partnership as it adds AI video tools
Adobe is enhancing Premiere Pro with new AI video tools, enabling capabilities such as video generation, object addition/removal, and clip extension, and is exploring a potential partnership with OpenAI.
The integration of OpenAI’s Sora with Adobe’s video tools is considered an “early exploration,” aiming to augment Adobe’s offerings and provide users with advanced generative capabilities.
Adobe aims to offer more choice to Premiere Pro users by potentially integrating third-party AI models and adding Content Credentials to identify the AI used, despite current limitations and the unclear extent of user control over these new features.
Apple’s AI features on iOS 18 may run locally on your iPhone LINK
Apple’s iOS 18, set to debut at WWDC 2024 on June 10, promises to be the most significant software upgrade with enhanced features like a smarter Siri through generative AI.
According to Bloomberg’s Mark Gurman, the initial set of AI features in iOS 18 will operate entirely on-device without requiring cloud processing, ensuring privacy and efficiency.
Apple is in discussions with AI developers such as Google’s Gemini, OpenAI’s GPT, and Baidu to integrate generative AI tools into iOS 18, potentially including third-party AI chatbots.
What Else Is Happening in AI on April 15th 2024
Meta trials AI chatbot on WhatsApp, Instagram, and Messenger
Meta is testing its AI chatbot, Meta AI, with WhatsApp, Instagram, and Messenger users in India and parts of Africa. The move allows Meta to leverage its massive user base across these apps to scale its AI offerings. Meta AI can answer user queries, generate images from text prompts, and assist with Instagram search queries. (Link)
Ideogram introduces new features to its AI image generation model
Ideogram’s AI image generation model now offers enhanced capabilities like description-based referencing, negative prompting, and options for generating images at varying speeds and quality levels. The upgrade aims to improve image coherence, photorealism, and text rendering quality, with human raters showing a 30-50% preference for the new version over the previous one. (Link)
New Freepik AI tool redefines image generation with realism and versatility
Freepik has launched the latest version of its AI Image Generator that offers real-time generation, infinite variations, and photorealistic results. The tool allows users to create infinite variations of an image with intuitive prompts, combining colors, settings, characters, and scenarios. It delivers highly realistic results and offers a streamlined workflow with real-time generation and infinite scrolling. (Link)
OpenAI promoted ChatGPT Enterprise to corporates with road-show-like events
OpenAI CEO Sam Altman recently hosted events in San Francisco, New York, and London, pitching ChatGPT Enterprise and other AI services to hundreds of Fortune 500 executives. This move is part of OpenAI’s strategy to diversify revenue streams and compete with partner Microsoft in selling AI products to enterprises. The events showcased applications such as call center management, translation, and custom AI solutions. (Link)
Google’s Notes tool now offers custom AI-generated backgrounds
Google has introduced an AI-powered background generation feature for its experimental Notes tool, allowing users to personalize their notes with custom images created from text prompts. The feature, currently available for select users in the US and India, utilizes Google’s Gemini AI model for image generation. (Link)
A Daily chronicle of AI Innovations April 12th 2024: OpenAI fires two researchers for alleged leaking; Apple is planning to bring new AI-focused M4 chips to entire line of Macs; Amazon CEO: don’t wait for us to launch a ChatGPT competitor; ChatGPT GPT-4 just got a huge upgrade; Gabe Newell, the man behind Steam, is working on a brain-computer interface; Cohere’s Rerank 3 powers smarter enterprise search; Apple M4 Macs: Coming soon with AI power!; Meta’s OpenEQA puts AI’s real-world comprehension to test
Cohere’s Rerank 3 powers smarter enterprise search
Cohere has released a new model, Rerank 3, designed to improve enterprise search and Retrieval Augmented Generation (RAG) systems. It can be integrated with any database or search index and works with existing legacy applications.
Rerank 3 offers several improvements over previous models:
It handles a longer context of documents (up to 4x longer) to improve search accuracy, especially for complex documents.
Rerank 3 supports over 100 languages, addressing the challenge of multilingual data retrieval.
The model can search various data formats, such as emails, invoices, JSON documents, code, and tables.
Rerank 3 works even faster than previous models, especially with longer documents.
When used with Cohere’s RAG systems, Rerank 3 reduces the cost by requiring fewer documents to be processed by the expensive LLMs.
Plus, enterprises can access it through Cohere’s hosted API, AWS Sagemaker, and Elasticsearch’s inference API.
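As a small illustration, here is how a rerank call might look through Cohere’s hosted API, assuming the `cohere` Python SDK with an API key picked up from the environment; the model name follows Cohere’s Rerank 3 release, and the query and documents are invented for the example.

```python
# Sketch of reranking semi-structured documents with Cohere's hosted API.
# Documents and query are illustrative.
import cohere

co = cohere.Client()  # assumes CO_API_KEY is set in the environment

docs = [
    "Invoice #4521: total due $1,200 by May 1.",
    '{"sku": "A-77", "price": 349.99, "currency": "USD"}',
    "Meeting notes: pricing discussion deferred to Q3.",
]

results = co.rerank(
    model="rerank-english-v3.0",
    query="What is the price of SKU A-77?",
    documents=docs,
    top_n=2,  # fewer, better-ranked documents cut downstream LLM cost
)
for r in results.results:
    print(r.index, r.relevance_score)
```

Passing only the top-ranked documents to the generation model is what drives the cost reduction mentioned above: the expensive LLM processes fewer, more relevant inputs.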
Why does this matter?
Rerank 3 represents a step towards a future where data is not just stored but actively used by businesses to make smarter choices and automate tasks. Imagine instantly finding a specific line of code from an email or uncovering pricing details buried in years of correspondence.
Apple M4 Macs: Coming soon with AI power!
Apple is overhauling its Mac lineup with a new M4 chip focused on AI processing. This comes after the recent launch of M3 Macs, possibly due to slowing Mac sales and similar features in competitor PCs.
The M4 chip will come in three tiers (Donan, Brava, Hidra) and will be rolled out across various Mac models throughout 2024 and early 2025. Lower-tier models like MacBook Air and Mac Mini will get the base Donan chip, while high-performance Mac Pro will be equipped with the top-tier Hidra. We can expect to learn more about the specific AI features of the M4 chip at Apple’s WWDC on June 10th.
Why does this matter?
Apple’s new AI-powered M4 Mac chip could make Macs much faster for things like video editing and scientific work, competing better with computers with similar AI features.
By controlling hardware and software, Apple can fine-tune everything to ensure a smooth user experience and future improvements.
Meta’s OpenEQA puts AI’s real-world comprehension to test
Meta AI has released a new dataset called OpenEQA to measure how well AI understands the real world. This “embodied question answering” (EQA) involves an AI system being able to answer questions about its environment in natural language.
The dataset includes over 1,600 questions about various real-world places and tests an AI’s ability to recognize objects, reason about space and function, and use common sense knowledge.
Why does this matter?
While OpenEQA challenges AI with questions demanding visual and spatial reasoning, it also exposes limitations in current AI models that often rely solely on text knowledge. Its role could push researchers to develop AI with a stronger grasp of the physical world.
OpenAI fires two researchers for alleged leaking
OpenAI has dismissed two researchers, Leopold Aschenbrenner and Pavel Izmailov, for allegedly leaking information following an undisclosed internal investigation.
The leaked information may be related to a research project called Q*, which involved a breakthrough in AI models solving unseen math problems, raising concerns about the lack of safeguards for commercializing such advanced technology.
The firings highlight a potential contradiction in OpenAI’s mission, as the company faces criticism for moving away from its original ethos of openness and transparency.
Apple is planning to bring new AI-focused M4 chips to its entire line of Macs
Apple is poised to launch its next-generation M4 chips as early as this year, aimed at enhancing AI capabilities and rejuvenating Mac sales following a 27% drop last fiscal year.
The M4 chips, reported to be nearing production, are expected to come in three variants named Donan, Brava, and Hidra, supporting a range of Mac products, including updates to the iMac, MacBook Pros, and Mac Mini initially, with the MacBook Air and Mac Studio to follow.
This accelerated update cycle to introduce M4 chips may lead to a short lifespan for the recently launched M3 chips, indicating Apple’s urgency to compete in the AI technology space against rivals with similar AI-focused hardware advancements.
Amazon CEO: don’t wait for us to launch a ChatGPT competitor
Amazon CEO Andy Jassy emphasizes the company’s focus on building foundational “primitives” for generative AI rather than quickly launching public-facing products like a ChatGPT competitor.
Amazon has launched AI products such as Amazon Bedrock and Amazon Q aimed at software engineers and business customers, aligning with its strategy to empower third-party developers to create GenAI applications.
Despite not directly competing with ChatGPT, Amazon is investing in the AI domain, including a $4 billion investment in AI company Anthropic, while also enhancing its existing products like Alexa with AI capabilities.
ChatGPT GPT-4 just got a huge upgrade
ChatGPT’s GPT-4 Turbo model has received an upgrade, enhancing its abilities in writing, math, logical reasoning, and coding, as announced by OpenAI for its premium users.
The upgrade, distinguished by significant performance improvements in mathematics and on the GPQA benchmark, also aims for more succinct, direct, and conversational responses.
This new version of ChatGPT, which includes data up until December 2023, shows improved performance on recent topics, such as acknowledging the launch of the iPhone 15.
Gabe Newell, the man behind Steam, is working on a brain-computer interface
Gabe Newell, co-founder of Valve and the force behind Steam, has been developing a brain-computer interface (BCI) technology through a venture named Starfish Neuroscience, rivaling Elon Musk’s Neuralink.
Since 2019, Newell has explored gaming applications for BCIs and discussed potential future capabilities like editing feelings, highlighting the technology’s potential beyond traditional interfaces.
Aside from his BCI pursuits, Newell has faced recent challenges including an antitrust lawsuit against Steam and the sale of his megayacht, amidst managing COVID-19 precautions and legal appearances.
OpenAI upgrades GPT-4 Turbo in ChatGPT
OpenAI has released an enhanced version of GPT-4 Turbo for ChatGPT Plus, Team, and Enterprise customers. The new model, trained on data until December 2023, promises more direct responses, less verbosity, and improved conversational language, along with advancements in writing, math, reasoning, and coding. (Link)
Dr. Andrew Ng joins Amazon’s Board of Directors
Amazon has appointed Dr. Andrew Ng, a renowned AI expert and founder of several influential AI companies, to its Board of Directors. With his deep expertise in machine learning and AI education, Ng is expected to provide valuable insights as Amazon navigates the transformative potential of generative AI. (Link)
Humane’s $699 Ai Pin hits the US market
Humane’s Ai Pin is now available across the US, with global expansion on the horizon through SKT and SoftBank partnerships. The wearable AI device requires a $24/month plan that includes unlimited AI queries, data, and storage. International availability will be announced soon. (Link)
TikTok might use AI influencers for ads
TikTok is developing a new feature that lets companies use AI characters to advertise products. These AI influencers can read scripts made by advertisers or sellers. TikTok has been testing this feature but isn’t sure when it will be available for everyone to use. (Link)
Sanctuary AI’s humanoid robot to be tested at Magna
Magna, a major European car manufacturer, will pilot Sanctuary AI’s humanoid robot, Phoenix, at one of its facilities. This follows similar moves by other automakers exploring the use of humanoid robots in manufacturing, as companies seek to determine the potential return on investment. (Link)
A Daily chronicle of AI Innovations April 11th 2024: Meta unveils next-generation AI chip for enhanced workloads; New AI tool lets you generate 1200 songs per month for free; Adobe is buying videos for $3 per minute to build an AI model; Google expands Gemma family with new models; Mistral unveils Mixtral-8x22B open language model; Google Photos introduces free AI-powered editing tools; Microsoft enhances Bing visual search with personalization; Sama red team: Safety-centered solution for Generative AI; Apple hit with ‘mercenary spyware attacks’; Humane AI has only one problem: it just doesn’t work; MistralAI unveils groundbreaking open model Mixtral 8x22B; Microsoft proposed using DALL-E to US military last year; New AI music generator Udio synthesizes realistic music on demand; Adobe is purchasing video content to train its AI model
Meta unveils next-generation AI chip for enhanced workloads
Meta has introduced the next generation of its Meta Training and Inference Accelerator (MTIA), significantly improving on MTIAv1 (its first-gen AI inference accelerator). This version more than doubles the memory and compute bandwidth, designed to effectively serve Meta’s crucial AI workloads, such as its ranking and recommendation models and Gen AI workloads.
Meta has also co-designed the hardware system, the software stack, and the silicon, which is essential for the success of the overall inference solution.
Early results show that this next-generation silicon has improved performance by 3x over the first-generation chip across four key models evaluated. MTIA has been deployed in the data center and is now serving models in production.
Why does this matter?
This is a bold step towards self-reliance in AI! Because Meta controls the whole stack, it can achieve an optimal mix of performance and efficiency on its workloads compared to commercially available GPUs. This eases NVIDIA’s grip on it, which might be having a tough week with other releases, including Intel’s Gaudi 3 and Google Axion Processors.
New AI tool lets you generate 1200 songs per month for free
Udio, a new AI music generator created by former Google DeepMind researchers, is now available in beta. It allows users to generate up to 1200 songs per month for free, with the ability to specify genres and styles through text prompts.
The startup claims its AI can produce everything from pop and rap to gospel and blues, including vocals. While the free beta offers limited features, Udio promises improvements like longer samples, more languages, and greater control options in the future. The company is backed by celebrities like Will.i.am and investors like Andreessen Horowitz.
Why does this matter?
AI-generated music platforms like Udio democratize music creation by making it accessible to everyone, fostering new artists and diverse creative expression. This innovation could disrupt traditional methods, empowering independent creators lacking access to expensive studios or musicians.
Adobe is buying videos for $3 per minute to build an AI model
Adobe is buying videos at $3 per minute from its network of photographers and artists to build a text-to-video AI model. It has requested short clips of people engaged in everyday actions such as walking or expressing emotions including joy and anger, interacting with objects such as smartphones or fitness equipment, etc.
The move shows Adobe trying to catch up to competitors like OpenAI (Sora). Over the past year, Adobe has added generative AI features across its portfolio, including Photoshop and Illustrator, that have garnered billions of uses, but it has yet to ship a video model and is racing to close that gap.
Why does this matter?
Adobe’s targeted video buying for AI training exposes the hefty price tag of building competitive AI. Smaller companies face an uphill battle—they might need to get scrappier, focus on specific niches, team up, or use free, open-source AI resources.
Apple has issued a warning to iPhone users in 92 countries about a potential “mercenary spyware attack” aimed at compromising their devices, without identifying the attackers or the consequences.
The company suggests that the attack is highly targeted, advising recipients to take the warning seriously and to update their devices with the latest security patches and practice strong cyber hygiene.
This type of attack is often linked to state actors employing malware from private companies, with the infamous ‘Pegasus’ spyware mentioned as an example, capable of extensive surveillance on infected phones.
Humane AI has only one problem: it just doesn’t work
The Humane AI Pin, retailing for $699 plus a $24 monthly fee, is designed as a wearable alternative to smartphones, promising users freedom from their screens through AI-assisted tasks. However, its functionality falls significantly short of expectations.
Throughout testing, the AI Pin struggled with basic requests and operations, demonstrating unreliability and slow processing times, leading to the conclusion that it fails to deliver on its core promise of a seamless, smartphone-free experience.
Despite its well-intentioned vision for a post-smartphone future and the integration of innovative features like a screenless interface and ambient computing, the device’s current state of performance and high cost make it a poor investment for consumers.
MistralAI unveils groundbreaking open model Mixtral 8x22B
Mistral AI has released Mixtral 8x22B, an open-source AI model boasting 176 billion parameters and a 65,000-token context window, expected to surpass its predecessor and compete with major models like GPT-3.5 and Llama 2.
The Paris-based startup, valued at over $2 billion, aims to democratize access to cutting-edge AI by making Mixtral 8x22B available on platforms like Hugging Face and Together AI, allowing for widespread use and customization.
Despite its potential for innovation in fields like customer service and drug discovery, Mixtral 8x22B faces challenges related to its “frontier model” status, including the risk of misuse due to its open-source nature and lack of control over harmful applications.
Microsoft proposed using DALL-E to US military last year
Microsoft proposed to the U.S. Department of Defense in 2023 to use OpenAI’s DALL-E AI for software development in military operations.
The proposal included using OpenAI tools like ChatGPT and DALL-E for document analysis, machine maintenance, and potentially training battlefield management systems with synthetic data.
Microsoft had not implemented the use of DALL-E in military projects, and OpenAI, which did not participate in Microsoft’s presentation, restricts its technology from being used to develop weapons or harm humans.
New AI music generator Udio synthesizes realistic music on demand
Uncharted Labs has officially launched its music generator, Udio, which can transform text prompts into professional-quality music tracks, challenging the leading AI music generator, Suno V3.
Udio has impressed users and reviewers alike with its ability to generate songs that feature coherent lyrics, well-structured compositions, and competitive rhythms, some even considering it superior to Suno V3.
Despite facing initial server overload due to high user demand, Udio’s user-friendly interface and strong backing from notable investors suggest a promising future for AI-assisted music creation, though it remains free during its beta testing phase.
Adobe is purchasing video content to train its AI model
Adobe is developing a text-to-video AI model, offering artists around $3 per minute for video footage to train the new tool, as reported by Bloomberg.
The software company has requested over 100 video clips from artists, aiming for content that showcases various emotions and activities, but has set a low budget for acquisitions.
Despite the potential for AI to impact artists’ future job opportunities and the lack of credit or royalties for the contributed footage, Adobe is pushing forward with the AI model development.
Google has expanded its Gemma family with two new models: CodeGemma and RecurrentGemma. CodeGemma is tailored for developers, offering intelligent code completion and chat capabilities for languages like Python and JavaScript. RecurrentGemma is optimized for efficiency in research, utilizing recurrent neural networks and local attention. (Link)
Mistral unveils Mixtral-8x22B open language model
Mistral AI has unveiled Mixtral-8x22B, a new open language model with extensive capabilities. The model, featuring a 64,000-token context window and requiring 258GB of VRAM to run, is a mixture-of-experts model. Early users are exploring its potential, with more details expected soon. (Link)
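For those who want to try it, loading the weights with Hugging Face’s transformers library would look roughly like this; the repository ID follows Mistral’s usual naming but is an assumption, and the full-precision weights need multi-GPU sharding or quantization.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x22B-v0.1"  # assumed Hub ID; verify on Mistral's page

tok = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" shards the mixture-of-experts layers across available GPUs;
# at ~258GB of VRAM in full precision, a single card will not fit the model.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

inputs = tok("Sparse mixture-of-experts models are efficient because",
             return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
```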
Google Photos introduces free AI-powered editing tools
Google Photos is rolling out free AI-powered editing tools for all users starting May 15. Features like Magic Eraser, Photo Unblur, and Portrait Light will be accessible without a subscription. Pixel users will also benefit from the Magic Editor, which simplifies complex edits using generative AI. (Link)
Microsoft enhances Bing visual search with personalization
Microsoft enhances Bing Visual Search with personalized visual systems based on user preferences. A patent application reveals that search results will be tailored to individual interests, such as showing gardening-related images to gardening enthusiasts and food-related visuals to chefs. (Link)
Sama red team: Safety-centered solution for Generative AI
Sama has introduced Sama Red Team, a safety-centered solution for evaluating risks associated with generative AI and LLMs. This system simulates adversarial attacks to identify vulnerabilities related to bias, personal information, and offensive content, contributing to a more ethical AI landscape. (Link)
A Daily chronicle of AI Innovations April 10th 2024: OpenAI gives GPT-4 a major upgrade; Quora’s Poe now lets AI chatbot developers charge per message; Google updates and expands its open source Gemma AI model family; Intel unveils latest AI chip as Nvidia competition heats up; WordPress parent acquires Beeper app which brought iMessage to Android; New bill would force AI companies to reveal use of copyrighted art; Intel’s new AI chip: 50% faster, cheaper than NVIDIA’s; Meta to Release Llama 3 Open-source LLM next week; Google Cloud announces major updates to enhance Vertex AI
Intel’s new AI chip: 50% faster, cheaper than NVIDIA’s
Intel has unveiled its new Gaudi 3 AI accelerator, which aims to compete with NVIDIA’s GPUs. According to Intel, the Gaudi 3 is expected to reduce training time for large language models like Llama2 and GPT-3 by around 50% compared to NVIDIA’s H100 GPU. The Gaudi 3 is also projected to outperform the H100 and H200 GPUs in terms of inference throughput, with around 50% and 30% faster performance, respectively.
The Gaudi 3 is built on a 5nm process and offers several improvements over its predecessor, including doubling the FP8, quadrupling the BF16 processing power, and increasing network and memory bandwidth. Intel is positioning the Gaudi 3 as an open, cost-effective alternative to NVIDIA’s GPUs, with plans to make it available to major OEMs starting in the second quarter of 2024. The company is also working to create an open platform for enterprise AI with partners like SAP, Red Hat, and VMware.
Why does it matter?
Intel is challenging NVIDIA’s dominance in the AI accelerator market. It will introduce more choice and competition in the market for high-performance AI hardware. It could drive down prices, spur innovation, and give customers more flexibility in building AI systems. The open approach with community-based software and standard networking aligns with broader trends toward open and interoperable AI infrastructure.
Meta to Release Llama 3 Open-source LLM next week
Meta plans to release two smaller versions of its upcoming Llama 3 open-source language model next week. These smaller models will build anticipation for the larger version, which will be released this summer. Llama 3 will be a significant upgrade over previous versions, with about 140 billion parameters compared to 70 billion for the biggest Llama 2 model. It will also be a more capable, multimodal model that can generate text and images and answer questions about images.
The two smaller versions of Llama 3 will focus on text generation. They’re intended to resolve safety issues before the full multimodal release. Previous Llama models were criticized as too limited, so Meta has been working to make Llama 3 more open to controversial topics while maintaining safeguards.
Why does it matter?
The open-source AI model landscape has become much more competitive in recent months, with other companies like Mistral and Google DeepMind also releasing their own open-source models. Meta hopes that by making Llama 3 more open and responsive to controversial topics, it can catch up to models like OpenAI’s GPT-4 and become a standard for many AI applications.
Google Cloud announces major updates to enhance Vertex AI
Google Cloud has announced exciting model updates and platform capabilities that continue to enhance Vertex AI:
Gemini 1.5 Pro: Gemini 1.5 Pro is now available in public preview in Vertex AI, bringing the world’s first one-million-token context window to customers. It also supports the ability to process audio streams, including speech and even the audio portion of videos (see the example sketch after this list).
Imagen 2.0: Imagen 2.0 can now create short, 4-second live images from text prompts, enabling marketing and creative teams to generate animated content. It also has new image editing features like inpainting, outpainting, and digital watermarking.
Gemma: Google Cloud is adding CodeGemma to Vertex AI. CodeGemma is a new lightweight model from Google’s Gemma family based on the same research and technology used to create Gemini.
MLOps: To help customers manage and deploy these large language models at scale, Google has expanded the MLOps capabilities for Gen AI in Vertex AI. This includes new prompt management tools for experimenting, versioning, optimizing prompts, and enhancing evaluation services to compare model performance.
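As a rough sketch of the new audio capability, a call through the Vertex AI Python SDK might look like the following; the preview model ID, project, and bucket path are placeholders to check against the Vertex AI documentation.

```python
import vertexai
from vertexai.generative_models import GenerativeModel, Part

# Hypothetical project, region, and audio file; the preview model ID is an assumption.
vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-pro-preview-0409")

audio = Part.from_uri("gs://your-bucket/earnings_call.mp3", mime_type="audio/mpeg")
response = model.generate_content(
    [audio, "Summarize the revenue figures mentioned in this call."]
)
print(response.text)
```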
Why does it matter?
These updates significantly enhance Google Cloud’s generative AI offerings. It also strengthens Google’s position in the generative AI space and its ability to support enterprise adoption of these technologies.
OpenAI gives GPT-4 a major upgrade
OpenAI has introduced GPT-4 Turbo with Vision, a new model available to developers that combines text and image processing capabilities, enhancing AI chatbots and other applications.
This multimodal model, which maintains a 128,000-token window and knowledge from December 2023, simplifies development by allowing a single model to understand both text and images.
GPT-4 Turbo with Vision simplifies development processes for apps requiring multimodal inputs like coding assistance, nutritional insights, and website creation from drawings.
Google updates and expands its open source Gemma AI model family
Google has enhanced the Gemma AI model family with new code completion models and improvements for more efficient inference, along with more flexible terms of use.
Three new versions of CodeGemma have been introduced, including a 7 billion parameter model for code generation and discussion, and a 2 billion parameter model optimized for fast code completion on local devices.
Google also unveiled RecurrentGemma, a model leveraging recurrent neural networks for better memory efficiency and speed in text generation, indicating a shift towards optimizing AI performance on devices with limited resources.
Intel unveils latest AI chip as Nvidia competition heats up
Intel introduced its latest artificial intelligence chip, Gaudi 3, highlighting its efficiency and speed advantages over Nvidia’s H100 GPU and offering configurations that enhance AI model training and deployment.
The Gaudi 3 chip, which Intel claims outperforms Nvidia’s H100 in power efficiency and AI model processing speed, will be available in the third quarter, with Dell, Hewlett Packard Enterprise, and Supermicro among the companies integrating it into their systems.
Despite Nvidia’s dominant position in the AI chip market, Intel is seeking to compete by emphasizing Gaudi 3’s competitive pricing, open network architecture, and partnerships for open software development with companies like Google, Qualcomm, and Arm.
WordPress parent acquires Beeper app which brought iMessage to Android
Automattic, the owner of WordPress and Tumblr, has acquired Beeper, a startup known for its Beeper Mini app that attempted to challenge Apple’s iMessage, for $125 million despite the app’s quick defeat.
Beeper CEO Eric Migicovsky will oversee the merging of Beeper with Automattic’s similar app Texts, aiming to create the best chat app, with the combined service expected to launch later this year.
The acquisition raises questions due to Beeper Mini’s brief success and upcoming changes like Apple introducing RCS support to iPhones, but Automattic sees potential in Beeper’s stance on open messaging standards and its established brand.
New bill would force AI companies to reveal use of copyrighted art
A new bill introduced in the US Congress by Congressman Adam Schiff aims to make artificial intelligence companies disclose the copyrighted material used in their generative AI models.
The proposed Generative AI Copyright Disclosure Act would require AI companies to register copyrighted works in their training datasets with the Register of Copyrights before launching new AI systems.
The bill responds to concerns about AI firms potentially using copyrighted content without permission, amidst growing litigation and calls for more regulation from the entertainment industry and artists.
OpenAI launches GPT-4 Turbo with Vision model through API
OpenAI has unveiled the latest addition to its AI arsenal, the GPT-4 Turbo with Vision model, which is now “generally available” through its API. This new version has enhanced capabilities, including support for JSON mode and function calling for Vision requests. The upgraded GPT-4 Turbo model promises improved performance and is set to roll out in ChatGPT. (Link)
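As a minimal sketch, a Vision request through the API looks like this with OpenAI’s Python SDK; the model string and image URL are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4-turbo",  # assumed name of the vision-capable GA model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what this chart shows."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```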
Google’s Gemini 1.5 Pro can now listen to audio
Google’s update to Gemini 1.5 Pro gives the model ears. It can process text, code, video, and uploaded audio streams, including audio from video, which it can listen to, analyze, and extract information from without a corresponding written transcript. (Link)
Microsoft to invest $2.9 billion in Japan’s AI and cloud infrastructure
Microsoft announced it would invest $2.9 billion over the next two years to increase its hyperscale cloud computing and AI infrastructure in Japan. It will also expand its digital skilling programs with the goal of providing AI skills to more than 3 million people over the next three years. (Link)
Google launches Gemini Code Assist, the latest challenger to GitHub’s Copilot
At its Cloud Next conference, Google unveiled Gemini Code Assist, its enterprise-focused AI code completion and assistance tool. It provides various functions such as enhanced code completion, customization, support for various repositories, and integration with Stack Overflow and Datadog. (Link)
eBay launches AI-driven ‘Shop the Look’ feature on its iOS app
eBay launched an AI-powered feature to appeal to fashion enthusiasts – “Shop the Look” on its iOS mobile application. It will suggest a carousel of images and ideas based on the customer’s shopping history. The recommendations will be personalized to the end user. The idea is to introduce how other fashion items may complement their current wardrobe. (Link)
A Daily chronicle of AI Innovations April 09th 2024: Stability AI launches multilingual Stable LM 2 12B; Ferret-UI beats GPT-4V in mobile UI tasks; Musk says AI will outsmart humans within a year; Canada bets big on AI with $2.4B investment; OpenAI is using YouTube for GPT-4 training; Meta to launch new Llama 3 models; Google’s Gemini 1.5 Pro can now hear; Google’s first Arm-based CPU will challenge Microsoft and Amazon in the AI race; Boosted by AI, global PC market bounces back
Meta to launch new Llama 3 models
According to an insider, Meta will release two smaller versions of its planned major language model, Llama 3, next week to build anticipation for the major release scheduled for this summer.
The upcoming Llama 3 model, which will include both text generation and multimodal capabilities, aims to compete with OpenAI’s GPT-4 and is reported to potentially have up to 140 billion parameters.
Meta’s investment in the Llama 3 model and open-source AI reflects a broader trend of tech companies leveraging these technologies to set industry standards, similar to Google’s strategy with Android.
Google’s Gemini 1.5 Pro can now hear
Google has enhanced Gemini 1.5 Pro to interpret audio inputs, allowing it to process information from sources like earnings calls or video audio directly without needing a transcript.
Gemini 1.5 Pro, positioned as a mid-tier option within the Gemini series, now outperforms even the more advanced Gemini Ultra by offering faster and more intuitive responses without requiring model fine-tuning.
Alongside Gemini 1.5 Pro updates, Google introduced enhancements to its Imagen 2 model, including inpainting and outpainting features, and debuted a digital watermarking technology, SynthID, for tracking the origin of generated images.
Google’s first Arm-based CPU will challenge Microsoft and Amazon in the AI race
Google is developing its own Arm-based CPU named Axion to enhance AI operations in data centers and will launch it for Google Cloud business customers later this year.
The Axion CPU will improve performance by 30% over general-purpose Arm chips and by 50% over Intel’s processors, and it will support services like Google Compute Engine and Google Kubernetes Engine.
Google’s move to create its own Arm-based CPU and update its TPU AI chips aims to compete with Microsoft and Amazon in the AI space and reduce reliance on external suppliers like Intel and Nvidia.
Boosted by AI, global PC market bounces back
The global PC market has seen growth for the first time in over two years, with a 1.5% increase in shipments to 59.8 million units in the first quarter, reaching pre-pandemic levels.
The resurgence is partly attributed to the emergence of “AI PCs,” which feature onboard AI processing capabilities, with projections suggesting these will represent almost 60% of all PC sales by 2027.
Major PC manufacturers like Lenovo, HP, Dell, and Apple are heavily investing in the AI PC segment, with Lenovo leading the market and Apple experiencing the fastest growth in shipments.
Stability AI launches multilingual Stable LM 2 12B
Stability AI has released a 12-billion-parameter version of its Stable LM 2 language model, offering both a base and an instruction-tuned variant. These models are trained on a massive 2-trillion-token dataset spanning seven languages, including English, Spanish, and German. Stability AI has also improved its 1.6-billion-parameter Stable LM 2 model with better conversational abilities and tool integration.
The new 12B model is designed to balance high performance with relatively lower hardware requirements than other large language models. Stability AI claims it can handle complex tasks that typically require substantially more computational resources. The company also plans to release a long-context variant of these models on the Hugging Face platform soon.
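If the instruction-tuned variant ships on Hugging Face under Stability’s usual naming, a quick trial might look like the sketch below; the repository ID is an assumption, and some Stability releases also require trust_remote_code=True.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-2-12b-chat"  # assumed ID of the instruction-tuned variant

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user",
             "content": "In two sentences, what is retrieval augmented generation?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=80)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```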
Why does this matter?
Stable LM 2 uses powerful 12B models without the most advanced hardware, making it a great choice for enterprises and developers. Stability AI’s multi-pronged approach to language solutions may give it an edge in the competitive generative AI market.
Ferret-UI beats GPT-4V in mobile UI tasks
Researchers have launched Ferret-UI, a multimodal language model designed to excel at understanding and interacting with mobile user interfaces (UIs). Unlike general-purpose models, Ferret-UI is trained explicitly for various UI-centric tasks, from identifying interface elements to reasoning about an app’s overall functionality.
By using “any resolution” technology and a meticulously curated dataset, Ferret-UI digs deep into the intricacies of mobile UI screens, outperforming its competitors in elementary and advanced tasks. Its ability to execute open-ended instructions may make it the go-to solution for developers looking to create more intuitive mobile experiences.
Why does this matter?
Ferret-UI’s advanced capabilities in understanding and navigating mobile UI screens will increase accessibility, productivity, and user satisfaction. By setting a new standard for mobile UI interaction, this innovative MLLM paves the way for more intuitive and responsive mobile experiences for users to achieve more with less effort.
Musk says AI will outsmart humans within a year
Tesla CEO Elon Musk has boldly predicted that AI will surpass human intelligence as early as next year or by 2026. In a wide-ranging interview, Musk discussed AI development’s challenges, including chip shortages and electricity supply constraints, while sharing updates on his xAI startup’s AI chatbot, Grok. Despite the hurdles, Musk remains optimistic about the future of AI and its potential impact on society.
Why does this matter?
Musk’s prediction highlights the rapid pace of AI development and its potential to reshape our world in the near future. As AI becomes increasingly sophisticated, it could transform the job market and raise important ethical questions about the role of technology in society.
Microsoft is opening a new AI research hub in London
Microsoft is tapping into the UK’s exceptional talent pool to drive language models and AI infrastructure breakthroughs. The move highlights Microsoft’s commitment to invest £2.5 billion in upskilling the British workforce and building the AI-driven future. (Link)
OpenAI is using YouTube for GPT-4 training
OpenAI reportedly transcribed over a million hours of YouTube videos to train its advanced GPT-4 language model. Despite legal concerns, OpenAI believes this is fair use. Google and Meta have also explored various solutions to obtain more training data, including using copyrighted material and consumer data. (Link)
Arm’s new chips bring AI to the IoT edge
Arm has introduced the Ethos-U85 NPU and Corstone-320 IoT platform, designed to enhance edge AI applications with improved performance and efficiency. These technologies aim to accelerate the development and deployment of intelligent IoT devices by providing an integrated hardware and software solution for Arm’s partners. (Link)
Canada bets big on AI with $2.4B investment
Prime Minister Justin Trudeau has announced a $2.4 billion investment in Canada’s AI sector, with the majority aimed at providing researchers access to computing capabilities and infrastructure. The government also plans to establish an AI Safety Institute and an Office of the AI and Data Commissioner to ensure responsible development and regulation of the technology. (Link)
A Daily chronicle of AI Innovations April 08th 2024: Microsoft opens AI Hub in London to ‘advance state-of-the-art language models’; JPMorgan CEO compares AI’s potential impact to electricity and the steam engine; Spotify moves into AI with new feature; Build resource-efficient LLMs with Google’s MoD; Newton brings sensor-driven intelligence to AI models; Internet archives become AI training goldmines for Big Tech
Build resource-efficient LLMs with Google’s MoD
Google DeepMind has introduced “Mixture-of-Depths” (MoD), an innovative method that significantly improves the efficiency of transformer-based language models. Unlike traditional transformers that allocate the same amount of computation to each input token, MoD employs a “router” mechanism within each block to assign importance weights to tokens. This allows the model to strategically allocate computational resources, focusing on high-priority tokens while minimally processing or skipping less important ones.
Notably, MoD can be integrated with Mixture-of-Experts (MoE), creating a powerful combination called Mixture-of-Depths-and-Experts (MoDE). Experiments have shown that MoD transformers can maintain competitive performance while reducing computational costs by up to 50% and achieving significant speedups during inference.
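To make the routing idea concrete, here is a toy PyTorch sketch of the mechanism as described, where the top-k scored tokens get full compute and the rest ride the residual stream; this is an illustration of the idea, not DeepMind’s implementation.

```python
import torch
import torch.nn as nn

def mod_block(x: torch.Tensor, router: nn.Linear, block: nn.Module,
              capacity: float = 0.5) -> torch.Tensor:
    """Toy Mixture-of-Depths routing: only the top-k scored tokens are
    processed by `block`; the remainder skip it via the residual path."""
    batch, seq, _ = x.shape
    k = max(1, int(seq * capacity))          # static compute budget per sequence
    scores = router(x).squeeze(-1)           # (batch, seq) importance weights
    top = scores.topk(k, dim=-1).indices     # tokens selected for full compute
    out = x.clone()
    for b in range(batch):                   # written as a loop for clarity
        sel = top[b]
        processed = block(x[b, sel].unsqueeze(0)).squeeze(0)
        # Scale by the router weight so the routing decision stays differentiable.
        out[b, sel] = x[b, sel] + torch.sigmoid(scores[b, sel]).unsqueeze(-1) * processed
    return out
```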
Why does this matter?
MoD can greatly reduce training times and enhance model performance by dynamically optimizing computational resources. Moreover, it adapts the model’s depth based on the complexity of the task at hand. For simpler tasks, it employs shallower layers, conserving resources. Conversely, for intricate tasks, it deepens the network, enhancing representation capacity. This adaptability ensures that creators can fine-tune LLMs for specific use cases without unnecessary complexity.
Newton brings sensor-driven intelligence to AI models
Startup Archetype AI has launched with the ambitious goal of making the physical world understandable to artificial intelligence. By processing data from a wide variety of sensors, Archetype’s foundational AI model called Newton aims to act as a translation layer between humans and the complex data generated by the physical world.
Using plain language, Newton will allow people to ask questions and get insights about what’s happening in a building, factory, vehicle, or even the human body based on real-time sensor data. The company has already begun pilot projects with Amazon, Volkswagen, and healthcare researchers to optimize logistics, enable smart vehicle features, and track post-surgical recovery. Archetype’s leadership team brings deep expertise from Google’s Advanced Technology and Products (ATAP) division.
Why does this matter?
General-purpose AI systems like Newton that can interpret diverse sensor data will be the pathway to building more capable, context-aware machines. In the future, users may increasingly interact with AI not just through screens and speakers but through intelligently responsive environments that anticipate and adapt to their needs. However, as AI becomes more deeply embedded in the physical world, the stakes of system failures or unintended consequences become higher.
Internet archives become AI training goldmines for Big Tech
To gain an edge in the heated AI arms race, tech giants Google, Meta, Microsoft, and OpenAI are spending billions to acquire massive datasets for training their AI models. They are turning to veteran internet companies like Photobucket, Shutterstock, and Freepik, who have amassed vast archives of images, videos, and text over decades online.
The prices for this data vary depending on the type and buyer but range from 5 cents to $7 per image, over $1 per video, and around $0.001 per word for text. The demand is so high that some companies are requesting billions of videos, and Photobucket says it can’t keep up.
Why does this matter?
This billion-dollar rush for AI training data could further solidify Big Tech’s dominance in artificial intelligence. As these giants hoard the data that’s crucial for building advanced AI models, it may become increasingly difficult for startups or academic labs to compete on a level playing field. We need measures to protect the future diversity and accessibility of AI technologies.
Spotify moves into AI with new feature
Spotify is launching a beta tool enabling Premium subscribers to create playlists using text descriptions on mobile.
Users can input various prompts reflecting genres, moods, activities, or even movie characters to receive a 30-song playlist tailored to their request, with options for further refinement through additional prompts.
The AI Playlist feature introduces a novel approach to playlist curation, offering an efficient and enjoyable way to discover music that matches specific aesthetics or themes, despite limitations on non-music related prompts and content restrictions.
Microsoft opens AI Hub in London to ‘advance state-of-the-art language models’
Mustafa Suleyman, co-founder of DeepMind and new CEO of Microsoft AI, announced the opening of a new AI hub in London, focusing on advanced language models, under the leadership of Jordan Hoffmann.
The hub aims to recruit fresh AI talent for developing new language models and infrastructure, bolstered by Microsoft’s £2.5 billion investment in the U.K. over the next three years to support AI economy training and data centre expansion.
Suleyman, Hoffmann, and about 60 AI experts recently joined Microsoft through its indirect acquisition of UK-based AI startup Inflection AI.
JPMorgan CEO compares AI’s potential impact to electricity and the steam engine
JPMorgan CEO Jamie Dimon stated AI could significantly impact every job, comparing its potential to revolutionary technologies like the steam engine and electricity.
Dimon highlighted AI’s importance in his shareholder letter, revealing the bank’s investment in over 400 AI use cases and the acquisition of thousands of AI experts and data scientists.
He expressed belief in AI’s transformative power, equating its future impact to historical milestones such as the printing press, computing, and the internet.
Spotify has launched AI-powered personalized playlists that users can create using text prompts. The feature is currently available in beta for UK and Australia users on iOS and Android. Spotify uses LLMs to understand the prompt’s intent and its personalization technology to generate a custom playlist, which users can further refine. (Link)
Meta expands “Made with AI” labeling to more content types
Meta will start applying a “Made with AI” badge to a broader range of AI-generated content, including videos, audio, and images. The company will label content where it detects AI image indicators or when users acknowledge uploading AI-generated content. (Link)
Gretel’s Text-to-SQL dataset sets new standard for AI training data
Gretel has released the world’s largest open-source Text-to-SQL dataset containing over 100,000 high-quality synthetic samples spanning 100 verticals. The dataset, generated using Gretel Navigator, aims to help businesses unlock the potential of their data by enabling AI models to understand natural language queries and generate SQL queries. (Link)
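Assuming the dataset is published on the Hugging Face Hub as the announcement suggests, loading it is a one-liner with the datasets library; the dataset ID below is an assumption to verify on the Hub.

```python
from datasets import load_dataset

# Assumed Hub ID for Gretel's open-source Text-to-SQL release.
ds = load_dataset("gretelai/synthetic_text_to_sql", split="train")
print(len(ds))
print(ds[0])  # each record pairs a natural-language question with SQL and schema context
```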
Microsoft upgrades Azure AI Search with more storage and support for OpenAI apps
Microsoft has made Azure AI Search more cost-effective for developers by increasing its vector and storage capacity. The service now supports OpenAI applications, including ChatGPT and GPTs, through Microsoft’s retrieval augmented generation system. Developers can now scale their apps to a multi-billion vector index within a single search without compromising speed or performance. (Link)
Google brings Gemini AI chatbot to Android app
Google is bringing its AI chatbot, Gemini, to the Android version of the Google app. Similar to its iOS integration, users can access Gemini by tapping its logo at the top of the app, opening a chatbot prompt field. Here, users can type queries, request image generation, or ask for image analysis. (Link)
A Daily chronicle of AI Innovations April 06th 2024: 👀 Sam Altman and Jony Ive seek $1B for personal AI device 🚕 Elon Musk says Tesla will unveil robotaxi in August 🔖 Meta to label content ‘made with AI’ 🙃 How OpenAI, Google and Meta ignored corporate policies to train their AI
👀 Sam Altman and Jony Ive seek $1B for personal AI device
OpenAI CEO Sam Altman and former Apple design chief Jony Ive are collaborating to create an AI-powered personal device and are currently seeking funding. The specifics of the device are unclear, but it is noted not to resemble a smartphone, with speculation that it could be similar to the screenless Humane Ai Pin. The venture, still unnamed, aims to raise up to $1 billion and is in discussions with major investors, including Thrive Capital and Emerson Collective, with potential ownership involvement from OpenAI.
🚕 Elon Musk says Tesla will unveil robotaxi in August
Elon Musk announced that Tesla will unveil its robotaxi on August 8th, signaling a focus on autonomous vehicles over mass-market EVs. The robotaxi is part of Musk’s vision for a shared fleet that owners can monetize, a concept described as the Tesla Network in his Master Plan Part Deux. Musk’s history of ambitious claims about self-driving technology contrasts with regulatory scrutiny and safety concerns involving Tesla’s Autopilot and Full Self-Driving features.
OpenAI’s AI model can clone your voice in 15 seconds
OpenAI has offered a glimpse into its latest breakthrough – Voice Engine, an AI model that can generate stunningly lifelike voice clones from a mere 15-second audio sample and a text input. This technology can replicate the original speaker’s voice, opening up possibilities for improving educational materials.
Though the model has many applications, the AI giant is cautious about its potential misuse, especially during elections. They have strict rules for partners, like no unauthorized impersonation, clear labeling of synthetic voices, and technical measures like watermarking and monitoring.
Meta to label content ‘made with AI’
Meta announced that starting in May 2024, AI-generated content on Facebook, Instagram, and Threads will be labeled “Made with AI.”
The decision for broader labeling, including AI-generated videos, audio, and images, is influenced by expert consultations and public opinion surveys.
Meta’s goal with the “Made with AI” label is to provide more context to users, aiding in content evaluation, while content violating community standards will still be removed.
How OpenAI, Google and Meta ignored corporate policies to train their AI
OpenAI, Google, and Meta pushed the boundaries of data acquisition for AI development, with OpenAI transcribing over one million hours of YouTube videos for its GPT-4 model.
Meta considered extreme measures such as purchasing a publishing house for access to copyrighted materials, and Google amended its privacy policy to potentially harness user-generated content in Google Docs for AI.
As the demand for data outpaces supply, tech companies are exploring the creation of synthetic data generated by AI models themselves, despite the risk of models reinforcing their own errors, suggesting a future where AI might train on data it generates.
Tech giants are on a billion-dollar shopping spree for AI training data
Tech giants are spending billions to license images, videos, and other content from companies such as Photobucket and Shutterstock to train their AI models, with costs ranging from 5 cents to $1 per photo and more for videos.
Prices for licensing data to train AI vary, with figures from $1 to $2 per image, $2 to $4 for short videos, and up to $300 per hour for longer films, while special handling items like nude images may cost $5 to $7 each.
Legal concerns arise as companies like Photobucket update their terms of service to sell user-uploaded content for AI training, despite the US Federal Trade Commission warning against retroactively changing terms for AI use, leading to investigations into deals like Reddit’s with Google.
A Daily chronicle of AI Innovations April 05th 2024: YouTube CEO warns OpenAI that training models on its videos is against the rules; OpenAI says 2024 is the “year of the enterprise” when it comes to AI; The war for AI talent has begun; Cohere launches the “most powerful LLM for enterprises”; OpenAI doubles down on AI model customization; Will personal home robots be Apple’s next big thing?
Cohere launches the “most powerful LLM for enterprises”
Cohere has announced the release of Command R+, its most powerful and scalable LLM to date. Designed specifically for enterprise use cases, Command R+ boasts several key features:
Advanced Retrieval Augmented Generation (RAG) to access and process vast amounts of information, improving response accuracy and reliability.
Support for ten business languages, enabling seamless operation across global organizations.
Tool Use feature to automate complex workflows by interacting with various software tools.
Moreover, Command R+ outperforms other scalable models on key metrics while providing strong accuracy at lower costs.
The LLM is now available through Cohere’s API and can be deployed on various cloud platforms, including Microsoft Azure and Oracle Cloud Infrastructure.
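As an illustrative sketch, a grounded (RAG) call to Command R+ through Cohere’s API might look like this; the model string and response fields are assumptions to check against Cohere’s documentation.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")

response = co.chat(
    model="command-r-plus",  # assumed model identifier for Command R+
    message="What does our travel policy say about booking business class?",
    # Documents ground the answer; the RAG mode cites them in the reply.
    documents=[
        {"title": "travel_policy.md",
         "snippet": "Business class is permitted on flights longer than 8 hours."},
        {"title": "expenses_faq.md",
         "snippet": "All bookings must go through the corporate portal."},
    ],
)
print(response.text)
print(response.citations)  # spans of the reply linked back to the source documents
```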
Why does this matter?
As one of the first “enterprise-hardened” LLMs optimized for real-world use cases, Command R+ could shape how companies operationalize generative AI across their global operations and product lines. Similar to how Robotic Process Automation (RPA) transformed back-office tasks, Command R+ could significantly improve efficiency and productivity across diverse industries. Additionally, availability on Microsoft Azure and upcoming cloud deployments make it readily accessible to businesses already using these platforms, which could lower the barrier to entry for implementing gen AI solutions.
OpenAI doubles down on AI model customization
OpenAI is making significant strides in AI accessibility with new features for its fine-tuning API and an expanded Custom Models program. These advancements give developers greater control and flexibility when tailoring LLMs for specific needs.
The fine-tuning API now includes:
Epoch-based checkpoint creation for easier retraining
A playground for comparing model outputs
Support for third-party integration
Hyperparameter adjustment directly from the dashboard
The Custom Models program now offers assisted fine-tuning with OpenAI researchers for complex tasks and custom-trained models built entirely from scratch for specific domains with massive datasets.
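For context, kicking off a job against the fine-tuning API looks roughly like this with OpenAI’s Python SDK; the file path and hyperparameter values are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Upload a JSONL file of chat-formatted training examples (placeholder path).
train = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    training_file=train.id,
    model="gpt-3.5-turbo",
    hyperparameters={"n_epochs": 3},  # also adjustable from the dashboard
)
print(job.id, job.status)

# Epoch-based checkpoints can be listed once the job has produced them.
for ckpt in client.fine_tuning.jobs.checkpoints.list(job.id):
    print(ckpt.fine_tuned_model_checkpoint)
```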
Why does this matter?
This signifies a significant step towards more accessible and powerful AI customization. Previously, fine-tuning required technical expertise and large datasets. Now, with OpenAI’s assisted programs, organizations can achieve similar results without needing in-house AI specialists, potentially democratizing access to advanced AI capabilities.
Will personal home robots be Apple’s next big thing?
Apple is reportedly venturing into personal robotics after abandoning its self-driving car project and launching its mixed-reality headset. According to Bloomberg’s sources, the company is in the early stages of developing robots for the home environment.
Two potential robot designs are mentioned in the report. One is a mobile robot that can follow users around the house. The other is a stationary robot with a screen that can move to mimic a person’s head movements during video calls. Apple is also considering robots for household tasks in the long term.
The project is being spearheaded by Apple’s hardware and AI teams under John Giannandrea. Job postings on Apple’s website further support its commitment to robotics, highlighting its search for talent to develop “the next generation of Apple products” powered by AI.
Why does this matter?
If Apple does release personal home robots, it could mainstream consumer adoption and create new use cases, as the iPhone did for mobile apps and smart assistants. Apple’s brand power and integrated ecosystem could help tackle key barriers like cost and interoperability that have hindered household robotics so far.
It could also transform homes with mobile AI assistants for tasks like elderly care, household chores, entertainment, and more. This may spur other tech giants to double down on consumer robotics.
YouTube CEO warns OpenAI that training models on its videos is against the rules
YouTube CEO Neal Mohan warned that OpenAI’s use of YouTube videos to train its text-to-video generator Sora could breach the platform’s terms of service, emphasizing creators’ expectations of content use compliance.
This stance poses potential challenges for Google, facing multiple lawsuits over alleged unauthorized use of various content types to train its AI models, arguing such use constitutes “fair use” through transformative learning.
Mohan’s remarks could undermine Google’s defense in ongoing legal battles by highlighting inconsistencies in the company’s approach to using content for AI training, including its use of YouTube videos and content from other platforms.
The war for AI talent has begun
Elon Musk aims to retain Tesla’s AI talent by increasing their compensation to counteract aggressive recruitment tactics from OpenAI.
Tesla Staff Machine Learning Scientist Ethan Knight’s move to Musk’s AI startup, xAI, exemplifies efforts to prevent employees from joining competitors like OpenAI.
Musk describes the ongoing competition for AI professionals as the “craziest talent war” he has ever seen and sees increased compensation as a means to achieve Tesla’s ambitious AI goals, including autonomous driving and humanoid robots development.
OpenAI says 2024 is the “year of the enterprise” when it comes to AI
OpenAI’s ChatGPT Enterprise has attracted over 600,000 sign-ups, prompting COO Brad Lightcap to declare 2024 as the “year of adoption for AI in the enterprise”.
Despite the strong uptake of ChatGPT Enterprise, OpenAI faces stiff competition from companies eager to penetrate the workplace AI market, including major investor Microsoft with its enterprise AI solutions.
OpenAI’s venture into the enterprise sector, especially with ChatGPT Enterprise, marks a significant move towards profitability, with successful partnerships with major media companies like Axel Springer SE, Le Monde, and Prisa.
S&P Global launches AI benchmarks for the financial sector
S&P Global has launched S&P AI Benchmarks by Kensho, a groundbreaking tool that evaluates the performance of LLMs in complex financial and quantitative applications. This solution aims to set a new industry standard and promote transparency in AI adoption within the financial sector. (Link)
Waymo and Uber partner for autonomous food delivery in Phoenix
Waymo and Uber have teamed up to launch autonomous Uber Eats deliveries in Phoenix using Waymo’s self-driving vehicles. The service will initially cover select merchants in Chandler, Tempe, and Mesa. Customers can opt out during checkout if they prefer a human courier and will receive instructions for retrieving their order from the autonomous vehicle upon arrival. (Link)
Storyblocks integrates AI for smarter search
Storyblocks has integrated OpenAI’s LLM into its search engine to improve search accuracy for complex queries. Coupled with algorithms analyzing content performance and user engagement, the AI-driven search adapts to provide fresh, high-quality content. Storyblocks also uses machine learning to optimize thumbnails, prioritize representation, and suggest complementary assets, streamlining the creative process. (Link)
Hercules AI streamlines enterprise AI app development
Hercules AI has introduced a new “assembly line” approach for rapid deployment of AI assistants in enterprises. The pre-configured components allow companies to develop cost-effective, scalable AI agents. Plus, their RosettaStoneLLM, built on Mistral-7B and WizardCoder-13B, outperforms competitors at converting data for internal AI workflows. (Link)
Yum Brands embraces AI across restaurants
Yum Brands, the parent company of KFC, Pizza Hut, and Taco Bell, is infusing AI into every aspect of its restaurant operations. From voice AI taking drive-thru orders to an AI-powered “SuperApp” for staff, Yum aims to elevate customer experiences and streamline processes. The AI-driven initiatives include personalized promotions, predictive ordering, and even AI-assisted cooking instructions. (Link)
A Daily chronicle of AI Innovations April 04th 2024: What’s new in Stability AI’s Stable Audio 2.0? Opera One browser becomes the first to offer local AI integration; Copilot gets GPT-4 Turbo upgrade; SWE-agent: AI coder that solves GitHub issues in 93 seconds; Mobile-first Higgsfield aims to disrupt video marketing with AI
What’s new in Stability AI’s Stable Audio 2.0?
Stability AI has released Stable Audio 2.0, a new AI model that generates high-quality, full-length audio tracks. Built upon its predecessor, the latest model introduces three groundbreaking features:
Generates tracks up to 3 minutes long with coherent musical structure
Enables audio-to-audio generation, allowing users to transform uploaded samples using natural language prompts
Enhances sound effect generation and style transfer capabilities, offering more flexibility and control for artists
Stable Audio 2.0’s architecture combines a highly compressed autoencoder and a diffusion transformer (DiT) to generate full tracks with coherent structures. The autoencoder condenses raw audio waveforms into shorter representations, capturing essential features, while the DiT excels at manipulating data over long sequences. This combination allows the model to recognize and reproduce the large-scale structures essential for creating high-quality musical compositions.
Trained exclusively on a licensed dataset from AudioSparx, Stable Audio 2.0 prioritizes creator rights by honoring opt-out requests and ensuring fair compensation. You can explore the capabilities of the model for free on the Stable Audio website.
Why does this matter?
Stable Audio 2’s capability to generate 3-minute songs is a big step forward for AI music tools. But it still has some issues, like occasional glitches and “soulless” vocals, showing that AI has limits in capturing the emotion of human-made music. Also, a recent open letter from artists like Billie Eilish and Katy Perry raises concerns about the ethics of AI-generated music.
SWE-agent: AI coder that solves GitHub issues in 93 seconds
Researchers at Princeton University have developed SWE-agent, an AI system that converts language models like GPT-4 into autonomous software engineering agents. SWE-agent can identify and fix bugs and issues in real-world GitHub repositories in 93 seconds! It does so by interacting with a specialized terminal, which allows it to open, scroll, and search through files, edit specific lines with automatic syntax checking, and write and execute tests. This custom-built agent-computer interface is critical for the system’s strong performance.
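To convey the shape of that agent-computer interface, here is a hypothetical sketch of the interaction loop; the command set and helper objects are illustrative stand-ins, not SWE-agent’s actual API.

```python
# Hypothetical sketch of an agent-computer interface loop; command names and
# helper objects are illustrative, not SWE-agent's actual code.
COMMANDS = {"open", "scroll", "search", "edit", "run_tests"}

def agent_loop(llm, workspace, issue: str, max_steps: int = 30):
    history = [f"ISSUE: {issue}"]
    for _ in range(max_steps):
        action = llm(history)                # model chooses the next command
        if action.split()[0] not in COMMANDS:
            history.append("error: unknown command")
            continue
        # The specialized terminal executes the command; edits come back
        # with automatic syntax checking, searches with paged results.
        observation = workspace.execute(action)
        history.append(f"$ {action}\n{observation}")
        if action.startswith("run_tests") and "FAILED" not in observation:
            return workspace.diff()          # propose the fix as a patch
    return None                              # give up after the step budget
```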
In the SWE-Bench benchmark test, SWE-agent solved 12.29% of the problems presented, nearly matching the 13.86% achieved by Devin, a closed-source $21 million commercial AI programmer developed by Cognition AI. While Devin is currently only available to select developers, the Princeton team has made SWE-agent open-source to gather feedback and encourage collaboration in advancing this technology.
Why does this matter?
The rise of SWE-agent shows AI systems are becoming more sophisticated in assisting human programmers. Over time, they may change the nature of software development roles, requiring developers to focus more on high-level problem-solving and architectural design while delegating routine tasks to AI assistants. This change could make software development faster and more creative, but it might also require significant upskilling within the developer community.
Mobile-first Higgsfield aims to disrupt video marketing with AI
Former Snap AI chief Alex Mashrabov has launched a new startup called Higgsfield AI, which aims to make AI-powered video creation accessible to creators and marketers. The company’s first app, Diffuse, allows users to generate original video clips from text descriptions or edit existing videos to insert themselves into the scenes.
Higgsfield is taking on OpenAI’s Sora video generator but targeting a broader audience with its mobile-first, user-friendly tools. The startup has raised $8 million in seed funding and plans to further develop its video editing capabilities and AI models. While questions remain around data usage and potential for abuse, Higgsfield believes it can carve out a niche in social media marketing with its realistic, easy-to-use video generation.
Why does this matter?
Higgsfield’s mobile-first approach to AI video generation could be a game-changer regarding accessibility and ease of use. The company is positioning itself to capture a significant portion of the creator economy by prioritizing consumer-friendly features and social media integration. As more users embrace these tools, we can expect to see an explosion of AI-generated content across social media platforms, which could have far-reaching implications for content authenticity and user engagement.
Generative AI Used To Develop Potential New Drugs For Antibiotic-Resistant Bacteria
Researchers at Stanford Medicine and McMaster University have devised a new AI model, SyntheMol (“synthesizing molecules”), which creates recipes for chemists to synthesize drugs in the lab. With nearly 5 million deaths linked to antibiotic resistance globally every year, new ways to combat resistant bacterial strains are urgently needed, according to the researchers.
Using SyntheMol, the researchers have so far developed six novel drugs aimed at killing resistant strains of Acinetobacter baumannii, one of the leading pathogens responsible for antibacterial resistance-related deaths, as noted in a study published March 22 in the journal Nature Machine Intelligence. (Link)
Apple explores making personal robots
Apple is investigating personal robotics as a new venture, focusing on a mobile robot that can follow users and a robotic table-top device that moves a display around, despite the uncertain future of these products.
This move into robotics is part of Apple’s search for new growth avenues after discontinuing its electric vehicle project, with the company looking to capitalize on advancements in artificial intelligence for home automation.
Apple’s robotics efforts are led within its hardware engineering division and AI group, indicating a strategic investment in developing cutting-edge home devices, although the projects are still in early research stages and have not been officially confirmed for release.
Google could soon start charging a fee for AI-powered search results
Google is exploring the introduction of a paid “premium” tier for its search engine, featuring new generative AI-powered enhancements, marking a significant shift from its traditionally ad-supported model.
The company is considering integrating these AI-powered search features into existing premium subscription services, amidst concerns about the impact of AI on its advertising revenue, which is critical to its business model.
Google has begun experimenting with AI-powered search services, presenting detailed answers alongside traditional search results and advertisements, but has yet to fully implement these features into its main search engine.
ChatGPT now lets you edit AI images created in DALL-E
OpenAI has updated DALL-E with image editing tools accessible within ChatGPT on both web and mobile platforms, allowing users to refine AI-generated images without leaving the chat interface.
DALL-E now provides preset style suggestions, such as woodcut, gothic, synthwave, and hand-drawn, to inspire users in their image creation process, similar to AI-generated wallpaper prompts on Android.
The integration of DALL-E with ChatGPT, particularly with the latest updates, aims to enhance user-friendliness by simplifying the image creation process and offering starting points for creativity.
Meta’s AI image generator struggles to create images of couples of different races. LINK
OpenAI’s Sora just made its first music video and it’s like a psychedelic trip. LINK
What Else Is Happening in AI on April 04th, 2024
Codiumate offers secure, compliant AI-assisted coding for enterprises
Codium AI, an Israeli startup, has launched Codiumate, a semi-autonomous AI agent, to help enterprise software developers with coding, documentation, and testing. It can help with creating development plans from existing code, writing code, finding duplicate code, and suggesting tests. Codiumate aims to make development faster and more secure, with features like zero data retention and the ability to run on private servers or air-gapped computers. (Link)
Opera One browser becomes the first to offer local AI integration
Opera now supports 150 local LLM variants in its Opera One browser, making it the first major browser to offer access to local AI models. This feature lets users process their input locally without sending data to a server. Opera One Developer users can select and download their preferred local LLM, which typically requires 2-10 GB of storage space per variant, instead of using Opera’s native browser AI, Aria. (Link)
AWS expands Amazon Bedrock with Mistral Large model
AWS has included Mistral Large in its Amazon Bedrock managed service for generative AI and app development. Mistral Large is fluent in English, French, Spanish, German, and Italian, and can handle complex multilingual tasks like text understanding, transformation, and code generation. AWS also mentioned that Mistral AI will use its Trainium and Inferentia silicon chips for future models, and that Amazon Bedrock is now available in France. (Link)
Copilot gets GPT-4 Turbo upgrade and enhanced image generation
Microsoft is providing GPT-4 Turbo access to business subscribers of its AI-powered Copilot assistant, without daily limits on chat sessions. The company is also improving image generation capabilities in Microsoft Designer for Copilot subscribers, increasing the limit to 100 images per day using OpenAI’s DALL-E 3 model. These upgrades are part of the $30 per user, per month pricing of Copilot for Microsoft 365. (Link)
Status invests in Matrix to create a decentralized messaging platform
Status, a mobile Ethereum client, has invested $5 million in New Vector, the company behind the open-source, decentralized communication platform Matrix.org. They plan to create a secure messaging solution for users to control their data and communicate across apps and networks. (Link)
A daily chronicle of AI Innovations April 03rd 2024: Google’s Gecko: LLM-powered text embedding breakthrough; Anthropic’s “many-shot jailbreaking” wears down AI ethics; CosmicMan enables the photorealistic generation of human images
Google’s Gecko: LLM-powered text embedding breakthrough
Gecko is a compact and highly versatile text embedding model that achieves impressive performance by leveraging the knowledge of LLMs. The DeepMind researchers behind Gecko developed a novel two-step distillation process to create a high-quality dataset called FRet using LLMs. The first step uses an LLM to generate diverse, synthetic queries and tasks from a large web corpus. In the second step, the LLM mines positive and hard negative passages for each query, ensuring the dataset’s quality.
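A rough sketch of that two-step recipe, with a stubbed LLM standing in for the real model (the prompts and helper names are illustrative guesses, not DeepMind's actual pipeline):

```python
def llm(prompt: str) -> str:
    return "0.5"  # stand-in; wire up a real instruction-following model here

def generate_task_and_query(passage: str) -> tuple[str, str]:
    """Step 1: the LLM invents a retrieval task plus a synthetic query
    for a passage sampled from the web corpus."""
    task = llm(f"Describe a retrieval task this passage could serve:\n{passage}")
    query = llm(f"Task: {task}\nWrite a search query answered by:\n{passage}")
    return task, query

def mine_positive_and_negative(query: str, candidates: list[str]) -> tuple[str, str]:
    """Step 2: the LLM scores retrieved candidates; the top passage becomes the
    positive (it may differ from the seed passage) and a highly ranked
    non-answer becomes the hard negative."""
    scored = sorted(
        candidates,
        key=lambda p: float(llm(f"Score 0-1: does this answer '{query}'?\n{p}")),
        reverse=True,
    )
    return scored[0], scored[1]

corpus = ["passage about ducks", "passage about geckos", "passage about rivers"]
task, query = generate_task_and_query(corpus[1])
positive, hard_negative = mine_positive_and_negative(query, corpus)
# (task, query, positive, hard_negative) forms one FRet training example.
print(task, query, positive, hard_negative)
```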
When trained on FRet combined with other academic datasets, Gecko outperforms existing models of similar size on the Massive Text Embedding Benchmark (MTEB). Remarkably, the 256-dimensional version of Gecko surpasses all models with 768 dimensions, and the 768-dimensional Gecko competes with models that are 7x larger or use embeddings with 5x higher dimensions.
Why does it matter?
Text embedding models are crucial in natural language processing tasks such as document retrieval, sentence similarity, and classification. Gecko’s development shows the potential for creating a single model that can support multiple downstream tasks, eliminating the need for separate embedding models for each task. Using LLMs and knowledge distillation techniques, Gecko achieves strong retrieval performance and sets a strong baseline as a zero-shot embedding model.
Anthropic’s “many-shot jailbreaking” wears down AI ethics
Researchers at Anthropic discovered a new way to get advanced AI language models to bypass their safety restrictions and provide unethical or dangerous information. They call this the “many-shot jailbreaking” technique. By including many made-up dialog examples in the input where an AI assistant provides harmful responses, the researchers could eventually get the real AI to override its training and provide instructions on things like bomb-making.
The researchers say this vulnerability arises from AI models’ increasing ability to process and “learn” from very long input sequences. Essentially, the AI mimics the unethical behavior repeatedly demonstrated in the made-up examples. Anthropic has implemented safeguards against this attack on its systems and has also shared the findings openly so other AI companies can work on mitigations.
Why does it matter?
As AI models become more capable over time, techniques to override their built-in ethical restraints pose serious risks if not addressed. While Anthropic has been transparent in disclosing this vulnerability to enable mitigations, it underscores the need for continued research into AI safety and security. Simple precautions like limiting input length are inadequate; more sophisticated AI “jailbreak” prevention methods are required as these systems advance.
CosmicMan enables the photorealistic generation of human images
Researchers at the Shanghai AI Laboratory have created a new AI model called CosmicMan that specializes in generating realistic images of people. CosmicMan can produce high-quality, photorealistic human images that precisely match detailed text descriptions, unlike current AI image models that struggle with human images.
The key to CosmicMan’s success is a massive dataset called CosmicMan-HQ 1.0, containing 6 million annotated human images, and a novel training method, “Annotate Anyone,” which focuses the model on different parts of the human body. By categorizing words in the text description into body-part groups like head, arms, and legs, the model can generate each part separately for better accuracy and customizability, thereby outperforming the current state-of-the-art models.
Why does it matter?
Existing AI models have struggled to create realistic human images and accurately represent diverse human appearances. With CosmicMan, AI systems will be better equipped to generate high-fidelity images of people, which can have implications for computer vision, graphics, entertainment, virtual reality, and fashion. It may enable more realistic virtual avatars, improved character generation in games and movies, and enhanced visual content creation.
Apple Vision Pro’s Spatial Avatars are a game changer
UBTECH and Baidu have partnered to integrate large AI models into humanoid robots. Their demo features the Walker S robot folding clothes and sorting objects through natural language, using Baidu’s LLM, ERNIE Bot, for task interpretation/planning.
With YC’s latest Demo Day (W24), AI companies are continuing to grow in number. Six months ago, around 139 companies in the batch were working with AI or ML; that number has climbed to 158, a clear majority of 65% of the 243 total companies in the batch.
Let’s dive into what’s new, what’s stayed the same, and what we can learn about the state of AI startups.
The biggest domains stayed big
Perhaps unsurprisingly, the most popular categories remained unchanged from the last batch. Last time, the top 4 domains were AI Ops, Developer Tools, Healthcare + Biotech, and Finance + Payments. This time, the top 5 were:
Developer Tools: Apps, plugins, and SDKs making it easier to write code. Tools for testing automation, website optimization, codebase search, improved Jupyter notebooks, and AI-powered DevOps were all present. There was also a strong contingent of code-generation tools, from coding Copilots to no-code app builders.
AI Ops: Tooling and platforms to help companies deploy working AI models. That includes hosting, testing, data management, security, RAG infrastructure, hallucination mitigation, and more. We’ll discuss how the AI Ops sector has continued to mature below.
Healthcare + Biotech: While I’ve once again lumped these two categories together, there’s a pretty big split in the types of AI businesses being built. Healthcare companies are building automation tools for the entire healthcare lifecycle: patient booking, reception, diagnosis, treatment, and follow-up, whereas biotech companies are creating foundation models to enable faster R&D.
Sales + Marketing: Early generative AI companies were focused on the sales and marketing benefits of GPT-3: writing reasonable-sounding copy instantly. Now, we’re seeing more niche use cases for revenue-generating AI: AI-powered CRMs for investors, customer conversation analysis, and AI personal network analysis were among the sales-oriented companies.
Finance: Likewise, on the finance side, companies covered compliance, due diligence, deliverable automation, and more. Perhaps one of my favorite descriptions was “a universal API for tax documents.”
The long tail is getting longer
Even though the top categories were quite similar, one new aspect was a wider distribution of industries. Compared with the last batch, there were roughly 35 categories of companies versus 28 (examples of new categories include HR, Recruiting, and Aerospace). That makes sense to me. I’ve been saying for a while now that “AI isn’t a silver bullet” and that you need domain expertise to capture users and solve new problems.
But it’s also clear that with AI eating the world, we’re also creating new problems. It was interesting to see companies in the batch focused on AI Safety – one company is working on fraud and deepfake detection, while another is building foundation models that are easy to align. I suspect we will continue seeing more companies dealing with the second-order effects of our new AI capabilities.
We’re also seeing more diverse ways of applying AI. In the last batch, a dominant theme was “copilots.” And while those are still present here (as well as “agents”), there are also more companies building “AI-native” products and platforms – software that uses AI in ways beyond a shoehorned sidebar conversation with an AI assistant.
What comes after CustomGPTs?
“AI agents. These will integrate more fully into numerous systems and you would give them the authority to execute things on your behalf. I.e. making reservations for dinner somewhere and then sending you the details, or searching and purchasing and sending a gift to someone, or planning and executing a vacation reservation including purchasing travel arrangements, hotel stays, transport to and from, etc. Even something as simple as telling it you are hungry and having an AI agent find something you would like and having it delivered to you. Or it acting on its own to do any number of those because it also sees your schedule, knows you didn’t really eat all day and that it is your mom’s birthday and you forgot to get her anything or to even call…”
How accurate is that statement above?
AI agents are software entities that act autonomously on behalf of their users, making decisions or performing tasks based on predefined criteria, learned preferences, or adaptive learning algorithms. They can range from simple chatbots to sophisticated systems capable of managing complex tasks. The accuracy of the statement reflects a forward-looking perspective on the capabilities of AI agents, envisioning a future where they are deeply integrated into our daily lives, handling tasks from personal to professional spheres with minimal human intervention.
🤖 Autonomy and Integration: The description is accurate in envisioning AI agents that are more fully integrated into various systems. This integration will likely increase as advancements in AI, machine learning, and data analytics continue to evolve. Such agents will understand user preferences, schedules, and even predict needs based on historical data and real-time inputs.
🔍 Executing Tasks on Behalf of Users: The ability of AI agents to perform tasks such as making reservations, purchasing gifts, or arranging travel is not only plausible but is already being realized to a certain extent with existing AI and machine learning technologies. Examples include virtual assistants like Google Assistant, Siri, and Alexa, which can perform a range of tasks from setting reminders to booking appointments.
🎁 Personalization and Prediction: The statement also touches on the AI agents’ capability to act proactively based on the user’s schedule, preferences, or significant dates. This level of personalization and predictive action is a key area of development in AI, aiming to provide more personalized and anticipative user experiences. Implementing this effectively requires sophisticated models of user behavior and preferences, which can be built using machine learning techniques.
🚀 Future Prospects and Ethical Considerations: While the vision of AI agents acting autonomously to manage aspects of our lives is grounded in realistic expectations of technology’s trajectory, it also raises ethical and privacy concerns. Issues such as data security, user consent, and the potential for over-dependence on technology for personal tasks are significant. The development and deployment of such AI agents must consider these aspects to ensure that they serve users’ interests ethically and securely.
📈 Current Limitations and Challenges: It’s important to note that while the statement captures a future potential, current AI technologies have limitations. The complexity of fully understanding human needs, contexts, and the nuances of personal preferences in an ethical manner remains a challenge.
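For readers who want the pattern rather than the prediction, here is a minimal sketch of the proactive agent loop described above: observe user state, decide against some criteria, then act through tool calls. The tools are stubs; a production agent would use an LLM planner and real APIs instead of hand-written rules.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class UserState:
    meals_today: int
    moms_birthday: date
    today: date

def order_food(preference: str) -> str:
    return f"ordered {preference}"          # stub for a delivery API

def send_gift(recipient: str) -> str:
    return f"gift sent to {recipient}"      # stub for a shopping API

def agent_step(state: UserState) -> list[str]:
    actions = []
    if state.meals_today == 0:              # "knows you didn't eat all day"
        actions.append(order_food("the usual"))
    if state.today == state.moms_birthday:  # "sees your schedule"
        actions.append(send_gift("mom"))
    return actions

state = UserState(meals_today=0, moms_birthday=date(2024, 4, 3),
                  today=date(2024, 4, 3))
print(agent_step(state))  # ['ordered the usual', 'gift sent to mom']
```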
What Else Is Happening in AI on April 03rd, 2024
Microsoft is planning to add an AI chatbot to Xbox
Microsoft is currently testing a new AI-powered chatbot to be added to Xbox to automate customer support tasks. The software giant has tested an “embodied AI character” that animates when responding to Xbox support queries. The virtual representative can handle either text or voice requests. It’s an effort to integrate AI into Xbox platforms and services. (Link)
Cloudflare launches Workers AI to power one-click deployment with Hugging Face
Cloudflare has launched Workers AI, which empowers developers to bring their AI applications from Hugging Face to its platform in one click. The serverless, GPU-powered inference platform is generally available to the public. The Cloudflare-Hugging Face integration, announced nearly seven months ago, makes it easy for models to be deployed onto Workers AI. (Link)
Machine Learning can predict and enhance complex beer flavor
In a study published in Nature Communications, researchers combined chemical analyses, sensory data, and machine learning to create models that accurately predict beer flavor and consumer appreciation from the beer’s chemical composition. They identified compounds that enhance flavor and used this knowledge to improve the taste and popularity of commercial beers. (Link)
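As a toy illustration of the approach (the feature names and data below are made up, not taken from the study), predicting an appreciation score from chemical measurements is a standard regression problem:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))   # e.g., [ethyl acetate, lactic acid, glycerol]
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Fit on 150 beers, evaluate on the held-out 50.
model = GradientBoostingRegressor().fit(X[:150], y[:150])
print("held-out R^2:", round(model.score(X[150:], y[150:]), 2))
```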
Read AI adds AI summaries to meetings, emails, and messages
Read AI is expanding its services from summarizing video meetings to including messages and emails. The platform connects to popular communication platforms like Gmail, Outlook, Slack, Zoom, Microsoft Teams, and Google Meet to deliver daily updates, summaries, and AI-generated takeaways. The goal is to help users save time and improve productivity. (Link)
Nicki Minaj, Billie Eilish, Katy Perry, and 200 other artists warn against AI’s devaluation of music
In an open letter, over 200 famous musicians, including Billie Eilish and Katy Perry, have expressed their concerns about the negative impact of AI on human creativity. They call for the responsible use of AI and urge AI companies to stop creating music that undermines their work. They believe that unregulated and uncontrolled use of AI can harm songwriters, musicians, and creators. They emphasize the need to protect artists’ rights and fair compensation. (Link)
A daily chronicle of AI Innovations April 02nd 2024: Apple’s Siri will now understand what’s on your screen; OpenAI introduces instant access to ChatGPT; Elon Musk says AI might destroy humanity, but it’s worth the risk; Sam Altman gives up control of OpenAI Startup Fund; Yahoo acquires Instagram co-founders’ AI-powered news startup Artifact
Sam Altman gives up control of OpenAI Startup Fund
Sam Altman has relinquished formal control of the OpenAI Startup Fund, which he initially managed, to Ian Hathaway, marking a resolution to the fund’s unique corporate structure.
The fund was established in 2021 with Altman temporarily at the helm to avoid potential conflicts had he not returned as CEO after a brief departure; he did not personally invest in or financially benefit from it.
Under Hathaway’s management, the fund, starting with $175 million in commitments, has grown to $325 million in assets and has invested in early-stage AI companies across healthcare, law, education, and more, with at least 16 startups backed.
US and UK partner to advance AI safety testing
The US and UK have formed a partnership focused on advancing the safety testing of AI technologies, sharing information and expertise to develop tests for cutting-edge AI models.
A Memorandum of Understanding (MOU) has been signed to enhance the regulation and testing of AI, aiming to effectively assess and mitigate the risks associated with AI technology.
The partnership involves the exchange of expert personnel between the US and UK AI Safety Institutes, with plans for potential joint testing on publicly available AI models, reinforcing their commitment to addressing AI risks and promoting its safe development globally.
Yahoo acquires Instagram co-founders’ AI-powered news startup Artifact
Yahoo is acquiring the AI news app Artifact, built by Instagram co-founders, but not its team, aiming to enhance its own news platform with Artifact’s advanced technology and recommendation systems.
Artifact’s technology, which focuses on personalizing and recommending content, will be integrated into Yahoo News and potentially other Yahoo platforms, despite the discontinuation of the Artifact app itself.
The integration of Artifact’s technology into Yahoo aims to create a personalized content ecosystem, leveraging Yahoo’s vast user base to realize the potential of AI in news curation and recommendation.
Apple’s Siri will now understand what’s on your screen
Apple researchers have developed an AI system called ReALM which enables voice assistants like Siri to understand contextual references to on-screen elements. By converting the complex task of reference resolution into a language modeling problem, ReALM outperforms even GPT-4 in understanding ambiguous references and context.
This innovation lies in reconstructing the screen using parsed on-screen entities and their locations to generate a textual representation that captures the visual layout. This approach, combined with fine-tuning language models specifically for reference resolution, allows ReALM to achieve substantial performance gains compared to existing methods.
Apple researchers have developed an AI system called ReALM that can understand screen context and ambiguous references, improving interactions with voice assistants.
ReALM reconstructs the screen using parsed on-screen entities to generate a textual representation, outperforming GPT-4.
Apple is investing in making Siri more conversant and context-aware through this research.
However, automated parsing of screens has limitations, especially with complex visual references.
Apple is catching up in AI research but faces stiff competition from tech rivals like Google, Microsoft, Amazon, and OpenAI.
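A rough sketch of the core idea follows: parsed on-screen entities and their positions are flattened into a text layout, so resolving "the app at the bottom right" becomes an ordinary language-modeling task. The entity format and prompt are illustrative guesses, not Apple's actual scheme.

```python
entities = [  # (label, row, column) from a hypothetical screen parser
    ("Settings", 0, 0), ("Mail", 0, 1),
    ("Photos", 1, 0), ("Music", 1, 1),
]

def screen_to_text(entities):
    """Render entities row by row so relative layout survives as plain text."""
    rows = {}
    for label, r, c in entities:
        rows.setdefault(r, {})[c] = label
    return "\n".join(
        " | ".join(f"[{rows[r][c]}]" for c in sorted(rows[r])) for r in sorted(rows)
    )

prompt = (
    "Screen:\n" + screen_to_text(entities) +
    "\nUser: open the app at the bottom right corner.\nReferenced entity:"
)
print(prompt)  # an LLM fine-tuned for reference resolution would answer: Music
```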
Why does this matter?
ReALM’s ability to understand screen context creates possibilities for more intuitive and hands-free interactions with voice assistants. Imagine effortlessly instructing Siri to “open the app at the bottom right corner.” As Apple races to close the AI gap with rivals like Google and Microsoft, ReALM could be a game-changer in making Siri and other Apple products more contextually aware.
OpenAI introduces instant access to ChatGPT
OpenAI now allows users to use ChatGPT without having to create an account. With over 100 million weekly users across 185 countries, it can now be accessed instantly by anyone curious about its capabilities.
While this move makes AI more accessible, other OpenAI products like DALL-E 3 still require an account. The company has also introduced new content safeguards and allows users to opt out of model training, even without an account. Despite growing competition from rivals like Google’s Gemini, ChatGPT remains the most visited AI chatbot site, attracting 1.6 billion visitors in February.
Why does this matter?
By allowing anyone to instantly access ChatGPT, OpenAI is expanding its user base and encouraging more people to explore the potential applications of AI. This move could accelerate the adoption of AI tools across various industries, as users become more comfortable with the technology.
Elon Musk says AI might destroy humanity, but it’s worth the risk
Elon Musk recently shared his thoughts on the potential dangers of AI at the Abundance Summit’s “Great AI Debate” seminar. He estimated a 10-20% chance that AI could pose an existential threat to humanity.
Despite the risks, Musk believes that the benefits of AI outweigh the potential dangers. He emphasized the importance of teaching AI to be truthful and curious, although he didn’t provide specifics on how he arrived at his risk assessment.
Why does this matter?
Musk’s comments emphasize the importance of using AI’s advantages while addressing its potential risks. This involves creating transparent, accountable AI systems aligned with human values. While his estimate is concerning, continued research in AI safety and governance is necessary to ensure AI remains beneficial.
Artificial intelligence is taking over drug development
The most striking evidence that artificial intelligence can provide profound scientific breakthroughs came with the unveiling of a program called AlphaFold by Google DeepMind. In 2016, researchers at the company had scored a big success with AlphaGo, an AI system which, having essentially taught itself the rules of Go, went on to beat the most highly rated human players of the game, sometimes by using tactics no one had ever foreseen. This emboldened the company to build a system that would work out a far more complex set of rules: those through which the sequence of amino acids which defines a particular protein leads to the shape that sequence folds into when that protein is actually made. AlphaFold found those rules and applied them with astonishing success.
The achievement was both remarkable and useful. Remarkable because a lot of clever humans had been trying hard to create computer models of the processes which fold chains of amino acids into proteins for decades. AlphaFold bested their best efforts almost as thoroughly as the system that inspired it trounced human Go players. Useful because the shape of a protein is of immense practical importance: it determines what the protein does and what other molecules can do to it. All the basic processes of life depend on what specific proteins do. Finding molecules that do desirable things to proteins (sometimes blocking their action, sometimes encouraging it) is the aim of the vast majority of the world’s drug development programmes.
Comment: Someone needs to fire up a CRISPR-Cas AI service you can submit your DNA to, and they develop and ship you a treatment kit for various cancers, genetic disorders, etc.
What Else Is Happening in AI on April 02nd, 2024
Pinecone launches Luna AI that never hallucinates
Trained using a novel “information-free” approach, Luna achieved zero hallucinations by always admitting when it doesn’t know an answer. The catch? Its performance on other tasks is significantly reduced. While not yet open-sourced, vetted institutions can access the model’s source and weights. (Link)
US and UK collaborate to tackle AI safety risks
As concerns grow over the potential risks of next-gen AI, the two nations will work together to develop advanced testing methods and share key information on AI capabilities and risks. The partnership will address national security concerns and broader societal issues, with plans for joint testing exercises and personnel exchanges between their respective AI safety institutes. (Link)
Perplexity to test sponsored questions in AI search
Perplexity’s Chief Business Officer, Dmitry Shevelenko, announced the company’s plan to introduce sponsored suggested questions later this year. When users search for more information on a topic, the platform will display sponsored queries from brands, allowing Perplexity to monetize its AI search platform. (Link)
OpenAI expands to Japan with Tokyo office
The Tokyo office will be OpenAI’s first in Asia and third international location, following London and Dublin. The move aims to offer customized AI services in Japanese to businesses and contribute to the development of an AI governance framework in the country. (Link)
Bixby gets a GenAI upgrade
Despite speculation, Samsung isn’t giving up on its voice assistant, Bixby. Instead, the company is working hard to equip Bixby with generative AI to make it smarter and more conversational. Samsung introduced a suite of AI features called Galaxy AI to its smartphones, including the Galaxy S24’s use of Google’s Gemini Nano AI model. (Link)
A daily chronicle of AI Innovations April 01st 2024: This AI model can clone your voice in 15 seconds; Microsoft and OpenAI plan $100B supercomputer for AI development; MagicLens: Google DeepMind’s breakthrough in image retrieval technology
🍎 Apple says its latest AI model is even better than OpenAI’s GPT-4
Apple researchers have introduced ReALM, an advanced AI model designed to understand and navigate various contexts more effectively than OpenAI’s GPT-4.
ReALM aims to enhance user interaction by accurately understanding onscreen, conversational, and background entities, making device interactions more intuitive.
Apple believes ReALM’s ability to handle complex reference resolutions, including onscreen elements, positions it as a superior solution compared to the capabilities of GPT-4.
DeepMind chief doesn’t see AI reaching its limits anytime soon
DeepMind co-founder Demis Hassabis believes AI is both overhyped and underestimated, with its full potential still far from being reached, and he warns against the excessive hype surrounding it.
Hassabis predicts many AI startups will fail due to the high computing power demands, expects industry consolidation, and sees no limit to the advancements in massive AI models.
Despite concerns over hype, Hassabis envisions the beginning of a new golden era in scientific discovery powered by AI and estimates a 50% chance of achieving artificial general intelligence within the next ten years.
This AI model can clone your voice in 15 seconds
OpenAI has offered a glimpse into its latest breakthrough – Voice Engine, an AI model that can generate stunningly lifelike voice clones from a mere 15-second audio sample and a text input. This technology can replicate the original speaker’s voice, opening up possibilities for improving educational materials, making videos more accessible to global audiences, assisting with communication for people with speech impairments, and more.
Though the model has many applications, the AI giant is cautious about its potential misuse, especially during elections. They have strict rules for partners, like no unauthorized impersonation, clear labeling of synthetic voices, and technical measures like watermarking and monitoring. OpenAI hopes this early look will start a conversation about how to address potential issues by educating the public and developing better ways to trace the origin of audio content.
Why does this matter?
OpenAI’s Voice Engine can transform industries from gaming and entertainment to education and healthcare. Imagine video games with non-player characters that sound like real people, animated films with AI-generated voiceovers, or personalized voice assistants for individuals with speech impairments. But as AI-generated voices become more human-like, questions about consent, privacy, and robust authentication measures must be addressed to prevent misuse.
Microsoft+OpenAI plan $100B supercomputer for AI development
Microsoft and OpenAI are reportedly planning to build a massive $100 billion supercomputer called “Stargate” to rapidly advance the development of OpenAI’s AI models. Insiders say the project, set to launch in 2028 and expand by 2030, would be one of the largest investments in computing history, requiring several gigawatts of power – equivalent to multiple large data centers.
Much of Stargate’s cost would go towards procuring millions of specialized AI chips, with funding primarily from Microsoft. A smaller $10B precursor called “Phase 4” is planned for 2026. The decision to move forward with Stargate relies on OpenAI achieving significant improvements in AI capabilities and potential “superintelligence.” If realized, Stargate could enable OpenAI’s AI systems to recursively generate synthetic training data and become self-improving.
Why does this matter?
The Stargate project will give OpenAI and Microsoft a massive advantage in creating AI systems that are far more capable than what we have today. This could lead to breakthroughs in areas like scientific discovery, problem-solving, and the automation of complex tasks. But it also raises concerns about the concentration of power in the AI industry. We’ll need new frameworks for governing advanced AI to ensure it benefits everyone, not just a few giants.
MagicLens: Google DeepMind’s breakthrough in image retrieval technology
Google DeepMind has introduced MagicLens, a revolutionary set of image retrieval models that surpass previous state-of-the-art methods in multimodality-to-image, image-to-image, and text-to-image retrieval tasks. Trained on a vast dataset of 36.7 million triplets containing query images, text instructions, and target images, MagicLens achieves outstanding performance while meeting a wide range of search intents expressed through open-ended instructions.
[Figures: multimodality-to-image and image-to-image retrieval performance comparisons]
MagicLens employs a dual-encoder architecture, which allows it to process both image and text inputs, delivering highly accurate search results even when queries are expressed in everyday language. By leveraging advanced AI techniques, like contrastive learning and single-modality encoders, MagicLens can satisfy diverse search intents and deliver relevant images with unprecedented efficiency.
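The dual-encoder plus contrastive-learning recipe can be sketched in a few lines of PyTorch. The random-projection encoders below are placeholders for MagicLens's large pretrained single-modality encoders; everything here is illustrative rather than DeepMind's actual code.

```python
import torch
import torch.nn.functional as F

dim, n = 32, 8
query_feats = torch.randn(n, 64)   # stand-in for fused image+instruction features
target_feats = torch.randn(n, 64)  # stand-in for target image features

q_proj = torch.nn.Linear(64, dim)  # toy "encoders": learned projections
t_proj = torch.nn.Linear(64, dim)

q = F.normalize(q_proj(query_feats), dim=-1)
t = F.normalize(t_proj(target_feats), dim=-1)

logits = q @ t.T / 0.07                    # cosine similarity / temperature
labels = torch.arange(n)                   # i-th query matches i-th target
loss = F.cross_entropy(logits, labels)     # contrastive (InfoNCE-style) objective
print("InfoNCE loss:", loss.item())

# Retrieval at inference: rank candidates by similarity to the query embedding.
print("top candidate for query 0:", logits[0].argmax().item())
```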
Why does this matter?
The release of MagicLens highlights the growing importance of multimodal AI systems that can process both text and visual information. We can expect to see more seamless integration between language and vision, enabling the development of more sophisticated AI applications. This trend could have far-reaching implications for fields such as robotics, autonomous vehicles, and augmented reality, where the ability to interpret and respond to visual data is crucial.
What Else Is Happening in AI on April 01st, 2024
TCS trains over half of its workforce in generative AI skills
Tata Consultancy Services (TCS) has announced that it has trained 3.5 lakh (350,000) employees, more than half of its workforce, in generative AI skills. The company set up a dedicated AI and cloud business unit in 2023 to address the growing needs of customers for cloud and AI adoption, offering a comprehensive portfolio of GenAI services and solutions. (Link)
ChatGPT introduces hyperlinked source citations in the latest update
OpenAI has introduced a feature for ChatGPT premium users that makes source links more prominent in the bot’s responses. The update hyperlinks words within ChatGPT’s answers, directing users to the source websites — a feature already present in other chatbot search resources like Perplexity. (Link)
OpenAI’s DALL·E now allows users to edit generated images
OpenAI has launched a new image editing feature for DALL·E, enabling users to modify generated images by selecting areas and describing changes. The editor offers tools to add, remove, or update objects within the image using either the selection tool or conversational prompts. (Link)
NYC to test Evolv’s AI gun detection technology in subways
New York City plans to test Evolv’s AI-powered gun detection scanners in subway stations within 90 days, according to Mayor Eric Adams. However, Evolv is under scrutiny for the accuracy of its technology, facing reports of false positives and missed detections. (Link)
Microsoft Copilot banned in US House due to potential data breaches
The US House of Representatives has banned its staffers from using Microsoft Copilot due to concerns about possible data leaks to unauthorized cloud services. This decision mirrors last year’s restriction on the use of ChatGPT in congressional offices, with no other chatbots currently authorized. Microsoft has indicated that it plans to address federal government security and compliance requirements for AI tools like Copilot later this year. (Link)