The AI revolution continues to blaze through 2024. June was a month of monumental strides, marked by breakthroughs in quantum AI, autonomous medical drones, and natural language processing. But the AI landscape is a dynamic one, and August has already proven to be no exception.
This month, we’re diving deep into the latest AI developments, from groundbreaking research to real-world applications. We’ll explore how AI is reshaping industries, addressing global challenges, and redefining what’s possible. Join us as we uncover the stories behind the headlines and analyze the implications of these innovations for society.
Whether you’re an AI expert or just curious about the future, this blog is your go-to source for the most up-to-date insights. Stay tuned for daily updates as we navigate the exciting world of artificial intelligence together.
A Daily Chronicle of AI Innovations on August 30th 2024
Apple and Nvidia may invest in OpenAI
Amazon’s new Alexa voice assistant will use Claude AI
OpenAI and Anthropic will share their models with the US government
Google is working on AI that can hear signs of sickness
China’s new Qwen2-VL beats GPT-4o
AI startup reaches 100M token context
China’s new Qwen2-VL beats GPT-4o
Alibaba just unveiled Qwen2-VL, a new vision-language AI model that outperforms GPT-4o in several benchmarks — particularly excelling in document comprehension and multilingual text-image understanding.
Qwen2-VL can understand images of various resolutions and ratios, as well as videos over 20 minutes long.
The model excels particularly at complex tasks such as college-level problem-solving, mathematical reasoning, and document analysis.
It also supports multilingual text understanding in images, including most European languages, Japanese, Korean, Arabic, and Vietnamese.
You can try Qwen2-VL on Hugging Face, with more information on the official announcement blog.
There’s yet another new contender in the state-of-the-art AI model arena, and it comes from China’s Alibaba. Qwen2-VL’s ability to understand diverse visual inputs and multilingual requests could lead to more sophisticated, globally accessible AI applications.
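For developers who want to experiment, here is a minimal sketch of querying Qwen2-VL through the Hugging Face transformers library. The model ID, class names, and example image URL are assumptions based on the public release; check the model card for the current API.

```python
# Minimal sketch: document Q&A with Qwen2-VL via Hugging Face transformers.
# Assumes a transformers version with Qwen2-VL support and the public
# 7B-Instruct checkpoint; the image URL is a hypothetical placeholder.
import requests
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open(requests.get("https://example.com/invoice.png", stream=True).raw)
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Summarize this document in two sentences."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```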
Apple and Nvidia may invest in OpenAI
Apple and Nvidia are reportedly in talks to participate in a significant funding round for OpenAI, with Apple planning to integrate ChatGPT into iOS and Nvidia being a key supplier of the chips that power OpenAI’s AI services.
Apple, which had earlier considered appointing Phil Schiller to OpenAI’s board before abandoning the plan, is looking to deepen its involvement with OpenAI as it prepares to enhance Siri with ChatGPT capabilities later this year.
Nvidia, whose hardware is essential for OpenAI’s operations, is also considering investing in this funding round, joining Microsoft, which has been a major investor in OpenAI since 2019 and made another substantial investment in 2023.
OpenAI and Anthropic will share their models with the US government
OpenAI and Anthropic just signed a groundbreaking agreement with the U.S. Artificial Intelligence Safety Institute to allow government access and testing of their AI models before public release.
The U.S. AI Safety Institute will have access to major new models from both companies prior to and after their public release.
This collaboration is a step toward AI regulation and safety efforts, with the U.S. government evaluating AI models’ capabilities and associated risks.
The institute will provide feedback to OpenAI and Anthropic on potential safety improvements that should be made.
These agreements come as AI companies face increasing regulatory scrutiny, with California legislators passing a broad AI regulation bill earlier today.
The two most popular AI companies in the world are granting the U.S. government access to their models before public release. This could reshape how AI is developed, tested, and deployed worldwide, with major implications for innovation, safety, and international competition in the AI space, for better or worse.
Amazon’s new Alexa voice assistant will use Claude AI
Amazon’s new voice assistant, “Remarkable Alexa,” will launch in October and be powered by Anthropic’s Claude AI, offering a subscription-based service.
The existing Alexa model struggled with accuracy, leading Amazon to invest in Anthropic’s AI technology after facing internal technical and bureaucratic issues.
Remarkable Alexa is set to feature daily AI-generated news summaries, a child-focused chatbot, and conversational shopping tools, with a demo planned for Amazon’s September event.
AI startup reaches 100M token context
Magic just developed LTM-2-mini, a model capable of processing 100 million tokens of context — equivalent to about 10 million lines of code or 750 novels — and partnered with Google Cloud to build advanced AI supercomputers.
LTM-2-mini can process and understand 100 million tokens of context given during inference, surpassing current models by 50x.
The model’s innovative algorithm processes long sequences of data 1000x more efficiently than the current top-performing AI models.
Magic is also partnering with Google Cloud to build supercomputers powered by Nvidia’s newest and most advanced GPUs.
The company has raised more than $450 million in total funding, including a recent $320 million investment round.
This breakthrough in context length allows AI agents to process and reason over dense and complicated codebases, vast databases, and years of conversation history in a single inference. It’s a significant step toward creating AI assistants with near-perfect recall and memory.
Google is working on AI that can hear signs of sickness
Google is developing artificial intelligence technology that can detect early signs of illness by analyzing sound signals like coughs and sniffles.
The AI model is trained with 300 million audio samples and can identify diseases such as tuberculosis by recognizing specific audio patterns of labored breathing.
Google has partnered with Salcit Technologies, an AI startup in India, to integrate this technology into smartphones to assist high-risk populations in areas with limited healthcare access.
What Else is Happening in AI on August 30th 2024!
Anthropic’s Prompt Engineering Interactive Tutorial: a digital platform designed to teach users how to effectively craft prompts for AI applications, enhancing user interaction and efficiency.
Documents reveal state-linked Chinese entities are using cloud services from AWS or its rivals to access advanced US chips and AI models they cannot acquire otherwise.
California lawmakers approved a bill proposing sweeping AI regulations, including safety testing requirements and potential legal consequences for harmful AI systems.
A Daily Chronicle of AI Innovations on August 29th 2024
AI creates DOOM video game in real-time
OpenAI raises at $100B valuation
AI spots cancer earlier than ever
Nvidia just showed how hard it is to be the AI king
Google researchers run Doom on a self-generating AI model
Midjourney says it’s ‘getting into hardware’
OpenAI aims for $100B+ valuation in new funding round
Major websites reject Apple AI data scraping
AI creates DOOM video game in real-time
Google researchers just developed GameNGen, an AI system that can simulate the classic game DOOM in real-time, running at over 20 frames per second and producing visuals nearly indistinguishable from the original game.
GameNGen produces playable gameplay at 20 frames per second on a single chip, with each frame predicted by a diffusion model.
The AI was trained on 900M frames of gameplay data, resulting in 3-second clips almost indistinguishable from the actual game by playtesters.
Running on a single TPU, GameNGen handles Doom’s 3D environments and fast-paced action without traditional game engine components.
In tests, human raters could barely distinguish between short clips of the AI simulation and the actual game.
GameNGen is the first AI model that can generate a complex, playable video game in real time without any underlying game engine. We’re approaching a fascinating point where AI will be able to create entire games on the fly, personalized to each player.
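GameNGen itself isn’t public code, but the core loop it describes, a diffusion model autoregressively predicting each next frame from a window of recent frames and player actions, can be sketched conceptually. Everything below (the denoiser stub, frame sizes, action encoding) is a hypothetical stand-in, not Google’s implementation:

```python
# Conceptual sketch of GameNGen-style gameplay simulation: a diffusion model
# predicts each new frame conditioned on past frames and actions.
# The "denoiser" here is a random stub standing in for a trained model.
import torch

H, W, CONTEXT = 64, 64, 8          # toy resolution and context window

def denoise(noisy_frame, past_frames, past_actions):
    # Stand-in for a trained diffusion denoiser; a real model would run
    # several denoising steps conditioned on frames and actions.
    return torch.sigmoid(noisy_frame + 0.1 * past_frames.mean(dim=0))

frames = [torch.zeros(3, H, W)] * CONTEXT   # start from blank context
actions = [0] * CONTEXT                     # e.g. 0 = no-op, 1 = forward, ...

for step in range(20 * 3):                  # ~3 seconds at 20 fps
    player_action = 1                       # would come from the player in real time
    noise = torch.randn(3, H, W)
    next_frame = denoise(noise, torch.stack(frames[-CONTEXT:]),
                         actions[-CONTEXT:])
    frames.append(next_frame)               # model output becomes new context
    actions.append(player_action)

print(f"Simulated {len(frames) - CONTEXT} frames with no game engine involved.")
```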
OpenAI raises at $100B valuation
OpenAI is reportedly in talks to raise a new funding round at a valuation exceeding $100 billion, led by Thrive Capital, with Microsoft also expected to participate.
The potential valuation of over $100 billion would be significantly higher than OpenAI’s previous $86 billion valuation.
Thrive Capital is expected to invest around $1 billion in this round.
OpenAI’s annualized revenue reportedly surpassed $3.4 billion earlier this year.
The company is still, however, projected to lose nearly $5 billion by the end of the year and has already spent $8.5 billion on AI training and staffing.
Building AI is expensive, and raising billions of dollars at a $100B+ valuation would silence OpenAI’s critics who insist the company is in decline. The increased valuation also suggests the company may have breakthroughs behind the scenes, such as Project Strawberry and Orion.
AI spots cancer earlier than ever
Researchers recently developed an AI tool called AINU that can differentiate cancer cells from normal cells and detect early stages of viral infection by analyzing high-resolution images of cell nuclei.
AINU uses a convolutional neural network to analyze images captured by STORM microscopy, which offers nanoscale resolution.
The AI can detect structural changes in cells as small as 20 nanometers, 5,000 times smaller than a human hair’s width.
AINU also detected viral infections (herpes simplex virus type-1) just one hour after infection by observing subtle changes in DNA packing.
The tool can accurately identify stem cells too, which could accelerate stem cell research without relying on animal testing.
Yesterday, researchers revealed an AI tool to help with early dementia detection, and now AI is detecting cancer cells at a nanoscale level. Clinical applications may be years away, but AI healthcare breakthroughs like AINU are only accelerating — and will dramatically revolutionize scientific research in the coming years.
Nvidia just showed how hard it is to be the AI king
Nvidia achieved strong second-quarter results by more than doubling its revenue compared to the same period last year, but industry experts anticipated these outcomes due to ongoing investments in AI by tech companies.
Despite reporting $30.04 billion in revenue, which surpassed analyst expectations, Nvidia’s stock fell 6.9% after hours due to investor concerns and sky-high expectations.
Issues like shipment delays for Nvidia’s upcoming Blackwell GPUs and slightly lower-than-expected revenue projections for the next quarter also contributed to investor unease, as noted by multiple analysts.
Midjourney says it’s ‘getting into hardware’
Midjourney, known for its AI image-generation tool, announced it is entering the hardware market and invited job seekers to join its new division.
The announcement was made on Midjourney’s official X account, revealing that founder David Holz and new hire Ahmad Abbas, a former Apple hardware manager, will lead the hardware efforts.
Midjourney hinted at multiple ongoing projects and the possibility of new form factors, though no specific timeline or further details have been provided yet.
OpenAI aims for $100B+ valuation in new funding round
OpenAI is reportedly negotiating with venture capital firms to raise a large sum of money, potentially valuing the company at over $100 billion.
Thrive Capital plans to invest $1 billion in this funding round, and Microsoft is also expected to contribute additional funds, as reported by The Wall Street Journal.
If successful, this would be the most substantial new capital for OpenAI since Microsoft’s $10 billion investment in January 2023, with OpenAI’s valuation potentially exceeding $103 billion based on recent negotiations.
Major websites reject Apple AI data scraping
Many of the largest websites, such as Facebook, Instagram, and The New York Times, have opted out of Apple’s AI training by using the Applebot-Extended tag to exclude their content.
Apple allows publishers to easily opt out of content scraping for Apple Intelligence training through a publicly-accessible robots.txt file, ensuring their data is not used for AI purposes.
Apple’s use of Applebot for AI training is designed to be ethical, with mechanisms to filter out personal data and a system for web publishers to prevent their data from being utilized.
A Daily Chronicle of AI Innovations on August 28th 2024
OpenAI prepares ‘Project Strawberry’
Google launches trio of new models
😯 Google AI-Powered Interview Warmup
Create an AI prompt optimizer GPT
AI tools help early dementia detection
📈 Nvidia earnings to test AI boom
Google Meet will now take notes for you
OpenAI prepares ‘Project Strawberry’
OpenAI researchers are preparing to launch a new AI model, code-named Strawberry (previously Q*), that demonstrates superior reasoning capabilities in solving complex problems, according to a new report via The Information.
Project Strawberry could be integrated into ChatGPT as soon as this fall, marking a significant leap in AI intelligence.
Given extra “thinking” time, Strawberry can tackle subjective topics and solve complex puzzles like the New York Times Connections.
OpenAI is using Strawberry to generate high-quality training data for another secretive upcoming LLM, reportedly code-named Orion.
The new AI model could enhance OpenAI’s development of AI agents, potentially automating multi-step tasks more effectively.
If Strawberry lives up to the leaks, it could mark a significant leap in AI reasoning capabilities, potentially advancing OpenAI towards Stage 2 of its five-level roadmap to AGI. With ChatGPT reported to gain these capabilities this fall, we’re likely on the verge of seeing the next major wave of AI disruption.
Google Meet will now take notes for you
Google Meet’s new AI-powered “take notes for me” feature started rolling out today. Initially announced at the 2023 Cloud Next conference, it summarizes meetings for Google Workspace customers with specific add-ons.
This feature automatically generates a Google Doc with meeting notes, attaches it to the calendar event, and sends it to the meeting organizer and participants who activated the tool, although it currently supports only spoken English.
Google predicts the feature will be available to all Google Workspace customers by September 10th, 2024, but there are concerns about its accuracy, given the performance of similar transcription tools in the past.
Google launches trio of new models
Google just released three new experimental Gemini 1.5 models, including a compact 8B parameter version, an improved Pro model, and an enhanced Flash model — all available for developers on Google AI Studio.
Gemini 1.5 Flash-8B is a smaller, faster model that can handle text, images, and other data types efficiently for super quick responses while processing a lot of information.
The updated Gemini 1.5 Pro model is now better at writing code and understanding complex instructions.
An improved Gemini 1.5 Flash model offers overall enhancements, performing better on Google’s internal tests across various tasks.
The upgraded Gemini 1.5 Pro model now ranks as #2, and the new Gemini 1.5 Flash ranks as #6 on the Chatbot Arena leaderboard.
While OpenAI is leaving everyone waiting, Google has been shipping out constant upgrades and new features to its AI offerings. These new enhancements give Gemini 1.5 Flash big improvements overall and Gemini 1.5 Pro new upgrades in math, coding, and responding to longer prompts.
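If you want to try the experimental models yourself, a minimal sketch using the google-generativeai Python SDK follows. The experimental model IDs shown are the AI Studio names from this announcement and may have rotated since; verify the current names in AI Studio.

```python
# Minimal sketch: calling one of the experimental Gemini 1.5 models via the
# google-generativeai SDK. The model ID is an AI Studio experimental name
# from this announcement and may have changed; check AI Studio.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash-8b-exp-0827")
response = model.generate_content("Explain mixture-of-experts models in one paragraph.")
print(response.text)
```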
Google AI-Powered Interview Warmup
Google runs a handy tool called “Interview Warmup,” an AI-powered trainer for your next big interview. It throws real questions based on your discipline: UX, data and analytics, cybersecurity, etc. Then the magic kicks in, evaluating your audio answers and sending back recommendations on everything from framing your qualifications to communicating your impact.
5 questions. Get some analysis. Build some confidence. Easy, right? 🌟
Oh, and for the tech-oriented: make sure you check out this site too. Videos, former (real) interview questions, the works. Interview Prep – Google Tech Dev Guide
Create an AI prompt optimizer GPT
OpenAI’s Custom GPTs allow premium users to create AI assistants that can optimize prompts for other AI creative tools such as Midjourney for AI image generation or Gen-3 for AI video generation.
Log into your ChatGPT Plus account and click “Explore GPTs”, then click “Create”.
Name your GPT and add a brief description.
In the Instructions, paste: “User is using an AI video generator called [Tool Name]. You need to craft a perfect prompt for the topic they ask by following the prompting guide below. The prompt needs to follow the format provided in the guide.”
Test your GPT in the preview panel, then click “Create” to finalize and choose sharing options.
Hot tip: Add a complete prompting guide for your chosen AI tool (e.g. Runway’s Gen-3 prompting guide)
AI tools help early dementia detection
Scientists from the Universities of Edinburgh and Dundee are launching a massive AI-driven study of over 1.6 million brain scans to develop tools for early dementia prediction and diagnosis.
The project, called NEURii, will use AI and machine learning to analyze CT and MRI scans from Scottish patients over the past decade.
Researchers aim to create digital tools for radiologists to assess dementia risk during routine scans.
The study will match image data with linked health records to identify patterns associated with dementia risk.
With global dementia cases projected to reach 153 million by 2050, this research could significantly impact early intervention and treatment development.
This week alone, we’ve seen AI developing new cancer drugs, 3D printing lifelike human organs, and now creating tools for early dementia detection. As AI rapidly advances in healthcare, we’re accelerating into a new era of personalized medicine and preventative care.
Nvidia earnings to test AI boom
There have been several negative reports ahead of Nvidia’s earnings, ranging from supply chain and design challenges to concerns about use cases and applications. However, one thing we learned from discussions with customers is that supply is still extremely constrained relative to demand.
Key topics ahead of the results:
1. Will the Hopper architecture stay stronger for longer?
2. Is Blackwell really delayed?
3. What is the upside if the company can deliver on the systems orders?
Here are some thoughts on each:
1. Key players like Microsoft, Snowflake, and Tesla highlighted tight capacity for GPUs and more demand than available supply. Snowflake particularly called out H100 (un)availability. This makes us believe the Hopper cycle may extend beyond ’23/’24.
2. Several reports pointed to delays of Blackwell, the new generation of GPUs, and analysts have now taken it out of estimates for this year (C24). However, our research indicates that the delays are mainly on the systems side, which were not supposed to be delivered until C25. Meanwhile, Nvidia’s CEO noted that we can expect significant revenue from Blackwell this year; the key will be finding out whether this is still the case.
3. Systems, namely the GB200 NVL36/72, are where the delays are. But our intel suggests that the order book for these is through the roof due to the TCO (total cost of ownership) they offer. If Nvidia is in fact able to deliver these in ’25, revenue from systems alone could exceed $100BN, with total data center revenue above $200BN.
What Else is Happening in AI on August 28th 2024!
Apple announced a September 9 event where it’s expected to debut the iPhone 16 with new generative AI features.
Elon Musk endorsed California’s Senate Bill 1047, which would require safety testing for large AI models, breaking with other tech leaders who oppose the regulation.
Amazon plans to launch a delayed AI-powered Alexa subscription in October, featuring “Smart Briefing” AI-generated news summaries.
Anthropic announced the full release of its Artifacts feature for all Claude users, including mobile apps, after millions were created in its test phase.
A Daily Chronicle of AI Innovations on August 27th 2024
AI can 3D print lifelike human organs
Anthropic reveals Claude’s secret sauce
Amazon aims to launch delayed AI Alexa subscription in October
OpenAI, Adobe, Microsoft want all companies to label AI-generated content
ChatGPT teams up with ASU
Discovering new drugs with AI
How to use Midjourney ‘Erase‘
AI can 3D print lifelike human organs
Researchers at Washington State University recently applied a machine-learning technique called Bayesian optimization to dramatically improve the speed and efficiency of 3D printing lifelike human organs.
The AI balances geometric precision, density, and printing time to create organ models that look and feel authentic.
In tests, it printed 60 continually improving versions of kidney and prostate organ models.
This approach significantly reduces the time and materials needed to find optimal 3D printing settings for complex objects.
The technology also has potential applications beyond medicine — for example, in the computer science, automotive, and aviation industries.
With cheaper, lifelike 3D-printed human organs, medical students could better practice for surgery before operating on actual patients. Beyond medicine, this AI technique could help reduce manufacturing costs for a variety of things like smartphones, car parts, and even airplane components.
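Bayesian optimization itself is a well-established technique: fit a probabilistic surrogate to past trials, then pick the next settings that look most promising. Here is a generic sketch with scikit-optimize, using a made-up print-quality objective rather than the WSU team’s actual scoring of geometry, density, and print time:

```python
# Generic Bayesian-optimization sketch with scikit-optimize
# (pip install scikit-optimize). The objective is a hypothetical stand-in
# for a print-quality score; it is not the WSU pipeline.
from skopt import gp_minimize

def print_quality_loss(params):
    speed, temperature, layer_height = params
    # Hypothetical smooth loss: penalize deviation from an unknown sweet spot.
    return ((speed - 45) / 30) ** 2 + ((temperature - 210) / 25) ** 2 \
        + ((layer_height - 0.2) / 0.1) ** 2

result = gp_minimize(
    print_quality_loss,
    dimensions=[(10.0, 100.0),   # print speed (mm/s)
                (180.0, 240.0),  # nozzle temperature (C)
                (0.1, 0.4)],     # layer height (mm)
    n_calls=30,                  # each "call" is one (simulated) test print
    random_state=0,
)
print("Best settings found:", result.x, "loss:", round(result.fun, 4))
```

The appeal for 3D printing is that each trial (a test print) is expensive, and Bayesian optimization is designed to find good settings in as few trials as possible.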
Discovering new drugs with AI
Scientists from China and the U.S. just developed ActFound, a new AI model that outperforms existing methods in predicting drug bioactivity, potentially accelerating and reducing costs in drug development.
ActFound combines meta-learning and pairwise learning to overcome common limitations in AI drug discovery, like small datasets and incompatible measurements.
The model was trained on 35,000+ assays (laboratory tests that measure a compound’s biological activity) and 1.6 million experimentally measured bioactivities from a popular chemical database.
In tests, ActFound outperformed nine competing models and showed strong performance in predicting cancer drug bioactivity.
ActFound could significantly speed up drug development by accurately predicting compound properties with less data and lower costs than traditional methods. While still in early stages, AI breakthroughs like this are the lesser-talked about developments that could end up saving millions of lives.
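ActFound’s own code isn’t shown here, but the pairwise-learning idea it builds on, predicting the difference in bioactivity between two compounds instead of absolute values so that assays with incompatible units can still be combined, can be demonstrated with a toy example. All data and features below are synthetic:

```python
# Toy sketch of pairwise bioactivity learning: train on differences between
# compound pairs so assays with incompatible measurement scales can be pooled.
# Synthetic data only; this shows the general idea, not ActFound itself.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))              # fake compound fingerprints
y = X[:, 0] * 2 - X[:, 1] + rng.normal(scale=0.1, size=200)  # fake bioactivity

# Build pairwise training examples: features are (x_i - x_j), target is (y_i - y_j).
i, j = rng.integers(0, 200, size=2000), rng.integers(0, 200, size=2000)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[i] - X[j], y[i] - y[j])

# Predict an unknown compound's activity relative to a measured anchor compound.
anchor, query = 0, 1
delta = model.predict((X[query] - X[anchor]).reshape(1, -1))[0]
print(f"Predicted activity: {y[anchor] + delta:.2f} (true: {y[query]:.2f})")
```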
ChatGPT teams up with ASU
OpenAI’s ChatGPT is headed to Arizona State University (ASU), where the university is integrating the AI assistant into over 200 projects across teaching, research, and operations.
ASU is using ChatGPT Edu, a version designed for universities with enhanced privacy and security features.
The university also launched an ‘AI Innovation Challenge’ for faculty and staff, receiving overwhelming demand for using ChatGPT to enhance teaching, research, and operations.
Key projects include an AI writing companion for scholarly work, ‘Sam’ (a chatbot for med students to practice patient interactions), and AI-assisted research recruitment.
The partnership has inspired other institutions like Oxford and Wharton to pursue similar collaborations.
While some schools are attempting to resist AI, ASU is embracing ChatGPT to make learning more personalized and to prepare students for an increasingly AI-driven job market. As education continues to change in the age of AI, case studies like this will be instrumental in shaping the future of academia.
Source: https://openai.com/index/asu/
Anthropic reveals Claude’s secret sauce
Anthropic has published the system prompts for its latest AI models, including Claude 3 Opus, Claude 3.5 Sonnet, and Claude 3.5 Haiku, to demonstrate transparency and ethical practices.
The system prompts reveal specific behaviors and capabilities of the Claude models, such as the inability to open URLs or recognize faces, aiming to ensure ethical interactions.
Anthropic plans to continue updating and disclosing these system prompts to promote transparency, potentially pressuring other AI vendors to follow suit.
Amazon aims to launch delayed AI Alexa subscription in October
The new Alexa AI, set to launch around mid-October, will feature a “Smart Briefing” that provides daily, AI-generated news summaries based on user preferences.
A more personalized experience is expected, with Alexa AI learning user preferences through interactive and tailored responses, such as dietary requirements for recipe suggestions.
Alexa AI will also introduce a “Shopping Scout” feature to help users find deals and track prices, alongside a kid-friendly “Explore with Alexa 2.0” for safe, moderated conversations.
OpenAI, Adobe, Microsoft want all companies to label AI-generated content
OpenAI, Adobe, and Microsoft now back a California bill that requires tech companies to add watermarks to AI-generated content, with the bill set for a final vote in August.
AB 3211 requires AI-generated photos, videos, and audio clips to carry watermarks in their metadata and mandates that large online platforms label AI content clearly for average viewers.
Initially opposed by a trade group representing major software companies, the bill gained support from OpenAI, Adobe, and Microsoft after amendments addressed concerns about its practicality.
What Else is Happening in AI on August 27th 2024!
Inflection AI partnered with the Data Transfer Initiative, enabling Pi users to export conversations, and announced plans to cap free usage while focusing on enterprise AI.
Source: https://inflection.ai/the-future-of-pi
Aleph Alpha released Pharia-1-LLM-7B, an open-source model optimized for German, French, and Spanish that excels in domain-specific applications.
IBM previewed Spyre, a new AI accelerator chip for IBM Z mainframes, designed to scale enterprise AI workloads with clustering capabilities.
Source: https://research.ibm.com/blog/spyre-for-z
Hugging Face and Google Cloud just partnered to release optimized Deep Learning Containers for building AI with open models on Google Cloud infrastructure.
SPONSOR US: Get your product in front of over 1 million AI enthusiasts
Our Daily AI Chronicle blog, newsletter, and podcast are read by thousands of Redditors, Quorans, LinkedIn professionals, tech executives, investors, engineers, managers, and business owners around the world. Get in touch today.
A Daily Chronicle of AI Innovations on August 26th 2024
Amazon is telling its salespeople to trash talk Google, Microsoft, and OpenAI
Apple may be working on an AI ‘personality’ to replace Siri on its robots
Chinese companies showcased 27 humanoid robots alongside Tesla’s Optimus
AI learns to plan better without humans
How to use Ideogram for generating images
Grok-2 improves speed, accuracy, transparency
AI learns to plan better without humans
IBM Research and Cornell University recently created AutoToS, a system that teaches AI to solve complex planning problems at 100% accuracy — without needing a human to check its work.
AutoToS is like a smart tutor for AI, helping it learn how to break down and solve tricky problems step-by-step.
The system uses clever tests to check the AI’s work, pointing out mistakes and showing examples of how to do better without human interference.
This approach seems to work equally well for smaller and larger models.
AutoToS succeeded in teaching AI to solve complex puzzles, including classic problems like arranging blocks and solving Sokoban, a box-pushing game.
Right now, it’s difficult to trust AI agents to autonomously perform actions on your behalf, but AutoToS is solving complex tasks at 100% accuracy. If this system works in the real world, it’s the next big step in creating more reliable AI assistants.
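AutoToS’s exact prompts aren’t public in this post, but the generate-test-feedback loop it describes can be sketched: a model proposes code for a search problem, unit tests check it, and failure feedback drives the next attempt. The “LLM” below is a canned stub that improves on its second try:

```python
# Conceptual sketch of an AutoToS-style loop: candidate code from a model is
# checked by unit tests, and failure feedback drives the next attempt.
# The generate() stub stands in for a real LLM call.
CANDIDATES = [
    "def successors(state):\n    return [state + 1]",            # buggy: misses -1
    "def successors(state):\n    return [state + 1, state - 1]", # correct
]

def generate(feedback, attempt):
    # A real system would send `feedback` back to the LLM; here we just
    # return the next canned candidate.
    return CANDIDATES[min(attempt, len(CANDIDATES) - 1)]

def run_tests(code):
    scope = {}
    exec(code, scope)   # trusted sketch only; never exec untrusted model output
    succ = scope["successors"]
    if set(succ(5)) != {4, 6}:
        return "FAIL: successors(5) should be {4, 6}, got " + str(set(succ(5)))
    return None

feedback, attempt = None, 0
while True:
    code = generate(feedback, attempt)
    feedback = run_tests(code)
    if feedback is None:
        print(f"Passed on attempt {attempt + 1}:\n{code}")
        break
    attempt += 1
```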
Apple may be working on an AI ‘personality’ to replace Siri on its robots
Apple is developing a new AI-based ‘personality’ for use in upcoming robotic devices, aiming to enhance interactions similar to how Siri functions on existing Apple products.
Bloomberg’s Mark Gurman reports that Apple’s futuristic AI assistant will be more humanlike and could operate on a tabletop product and other future robots, potentially costing under $1,000.
The project is in early development stages with no guarantees of release, while Apple continues to integrate generative AI features into its devices, like iPhones, iPads, and Macs, later this year.
Chinese companies showcased 27 humanoid robots alongside Tesla’s Optimus
At the Beijing World Robot Conference, Tesla’s Optimus humanoid was displayed motionless inside a clear box, facing tough competition from Chinese robots demonstrated by various companies.
The event saw 27 new humanoid robots debut, with significant financial investments in China’s robotics industry surpassing 100 billion yuan over the past decade.
Chinese startups like Agibot and Stardust Intelligence showcased robots capable of performing complex tasks, while experts believe Tesla’s and other U.S. companies’ robot technology leads by about one to two years.
Grok-2 improves speed, accuracy, transparency
xAI’s Grok-2 and Grok-2 mini just made major improvements — doubling the model’s speed in the mini version and showing increased accuracy in both models, just days after its beta launch.
Grok-2 mini is now twice as fast as it was previously, thanks to a rewritten inference stack using SGLang.
Both Grok-2 and its mini version have become slightly more accurate due to reduced quantization error, according to one xAI employee.
Additionally, both Grok-2 models are now part of the LMSYS Chatbot Arena leaderboard for increased transparency, with Grok-2’s larger model ranking #2 and surpassing Claude 3.5 Sonnet.
Grok-2 excels particularly in math, where it ranks #1 and performs at a state-of-the-art level in hard prompts, coding, and instruction-following.
From being founded only ~18 months ago to creating an LLM ranked #2 in the world, xAI has left the entire AI community stunned. This not only makes Grok-2 a top contender in the AI race but also intensifies competition, potentially accelerating advancements across the industry.
At the 2024 World Robot Conference in Beijing, Chinese companies showcased 27 humanoid robots alongside Tesla’s Optimus, signalling China’s ambition to dominate the industry.
Chinese tech firms unveiled 27 humanoid robots at the expo, with Tesla’s Optimus being the only foreign competitor present.
AGIBOT, founded by a Huawei alumnus, presented robots powered by large language models (LLMs) for industrial use and customer service.
Other notable entries included Astribot’s S1 robot assistant capable of writing calligraphy and playing musical instruments, and Galbot’s wheeled robots for food delivery and retail tasks.
Despite the impressive showcase, experts note that technological hurdles and high costs still create challenges for Chinese manufacturers.
China may be slightly behind in the AI race against the U.S., but it’s clear the country is committed to dominating the humanoid robotics race. With a whopping 27 China-based humanoid robots demonstrating a wide range of use cases at the event, commercially available humanoids may be coming sooner than most expect.
How to use Ideogram for generating images
Ideogram 2.0, the latest state-of-the-art AI image generator, excels at creating images that include text — opening new possibilities for use cases like thumbnails, posters, newsletter graphics, memes, and more.
Head over to Ideogram’s website and sign up. You’ll get free credits to try the image generator without a credit card.
Click “Describe what you want to see” and enter a detailed text prompt for your desired image.
Customize settings like aspect ratio, AI model (choose 2.0), and style (Realistic, Design, 3D, or Anime).
Click “Generate” to create four AI-generated images based on your prompt!
Pro tip: Experiment with different prompts and settings to discover its full potential and create unique visuals for your projects!
What Else is Happening in AI on August 26th 2024!
Scientists to use AI and 1.6 million brain scans for earlier and more accurate dementia diagnoses.
Anthropic supported California’s AI regulation bill after changes were made, saying its benefits likely outweigh its costs for advanced AI development.
A Daily Chronicle of AI Innovations on August 23rd 2024
Nvidia and Mistral make laptop-ready AI
Amazon’s AI assistant saves 4,500 years of development time
Slack AI could be tricked into leaking login details and more
Cruise’s robotaxis are coming on Uber
Google DeepMind workers urge the company to end ties with military organizations
Salesforce unveils AI agents for sales
Nvidia and Mistral make laptop-ready AI
Nvidia and Mistral just released Mistral-NeMo-Minitron 8B, a highly accurate small language model that can run efficiently on laptops and PCs.
The model uses optimization techniques like pruning (removing certain weights) and distillation (retraining the pruned model on a small dataset) to achieve high accuracy with a smaller footprint.
These optimizations resulted in up to 40x cost savings in terms of raw compute during training.
Laptops and PCs can run the model locally for faster and more secure interactions with AI.
Minitron 8B leads nine language-driven AI benchmarks for similarly sized models, from language understanding to reasoning and coding.
AI models that are small enough to run locally on laptops and PCs means less reliance on cloud services, improved data privacy, and faster responses. As this tech evolves, we could soon see advanced AI in everything from smartphones and watches to home appliances.
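The pruning and distillation recipe described above can be illustrated with a toy PyTorch sketch: zero out the smallest-magnitude weights of a copied layer, then retrain the pruned “student” to match the original “teacher.” This is the general technique on a single layer, not the actual Minitron pipeline:

```python
# Toy sketch of pruning + distillation, the two techniques named above,
# applied to one linear layer. Not the Nvidia/Mistral Minitron pipeline.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
teacher = torch.nn.Linear(64, 64)

# 1) Pruning: copy the teacher, then zero the 50% smallest-magnitude weights.
student = torch.nn.Linear(64, 64)
student.load_state_dict(teacher.state_dict())
with torch.no_grad():
    w = student.weight
    mask = (w.abs() >= w.abs().median()).float()
    w.mul_(mask)

# 2) Distillation: retrain the pruned student to match teacher outputs.
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(200):
    x = torch.randn(32, 64)
    loss = F.mse_loss(student(x), teacher(x).detach())
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        student.weight.mul_(mask)          # keep pruned weights at zero
print(f"Distillation loss after retraining: {loss.item():.4f}")
```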
Amazon’s AI assistant saves 4,500 years of development time
Amazon CEO Andy Jassy stated that their AI assistant, Amazon Q, has significantly reduced software upgrade times, saving the company thousands of work hours.
Jassy mentioned that implementing Amazon Q resulted in estimated savings equivalent to 4,500 developer-years and $260 million in annual efficiency gains.
The AI-generated code reviews were so accurate that 79% of them were shipped without any additional changes, demonstrating the tool’s effectiveness in streamlining tedious tasks.
Researchers just developed a new AI-based method called NES-VMC that can accurately calculate the excited states of atoms and molecules, a challenge in physics and chemistry that previously delayed improvements in solar tech.
NES-VMC (natural excited states variational Monte Carlo) accurately predicted quantum excited states on systems ranging from single atoms to benzene-sized molecules.
The method outperforms leading computational chemistry techniques, often achieving chemical accuracy.
Excited states are crucial for understanding light-matter interactions, key to improving solar cells, LEDs, lasers, and more.
NES-VMC overcomes long-standing challenges in physics and chemistry that have hindered progress in these fields.
This AI-driven breakthrough could lead to more efficient solar cells, brighter LEDs, and more powerful lasers. The ripple effects could be dramatic: lower electricity costs, improvements in phone and laptop battery life and displays, faster fiber-optic internet, and so much more.
Salesforce unveils AI agents for sales
Salesforce just introduced two fully autonomous, AI-powered sales agents, Einstein SDR Agent and Einstein Sales Coach Agent, designed to help sales teams accelerate growth through automation and personalization.
Einstein SDR Agent engages with inbound leads 24/7 to answer questions, handle objections, and book meetings.
Einstein Sales Coach Agent helps salespeople rehearse pitches and offers real-time suggestions during calls.
The agents both leverage Salesforce’s CRM data and external data uploaded via Data Cloud to generate accurate, contextually relevant responses.
The agents will be generally available in October, with more details expected at the Dreamforce conference in September.
By integrating AI agents into existing platforms, Salesforce is lowering the barrier for AI adoption in business processes. These agents offer 24/7 support and automate repetitive tasks like qualifying leads and booking meetings, freeing human sales teams to focus on high-value tasks and potentially close more deals.
Slack AI could be tricked into leaking login details and more
Security experts found that Slack’s AI assistant can be misled into disclosing sensitive information, like API keys, to unauthorized users through carefully crafted prompts.
Hackers can exploit this vulnerability by creating a public Slack channel, inputting a malicious command that causes the AI to leak private data via clickable URLs.
Salesforce fixed the issue for private channels, but public ones remain exposed, allowing attackers to use social engineering tactics to get workspace members to upload malicious documents.
Google DeepMind workers urge the company to end ties with military organizations
In May 2024, approximately 200 Google DeepMind employees signed a letter urging the company to cease its contracts with military organizations due to concerns over the use of AI technology in warfare, according to Time magazine.
The letter highlights internal tensions between Google’s AI division and its cloud business, referencing Google’s defense contract with the Israeli military and the use of AI for mass surveillance and targeting in Gaza.
The letter calls for Google to investigate claims of its cloud services being used by militaries, cut off such access, and establish a new governance body to prevent future military use of DeepMind’s AI technology.
A Daily Chronicle of AI Innovations on August 22nd 2024
Neuralink’s second patient is already playing video games with brain implant
Apple’s first foldable MacBook might see big delays
OpenAI joins Silicon Valley companies lobbying against California’s AI bill
Ideogram 2.0 launches with major upgrades
xAI releases Grok 2 in early beta
Create your own AI Clone
Disney AI brings robots to life
Ideogram 2.0 launches with major upgrades
Ideogram just released version 2.0 of its advanced text-to-image model with major upgrades and new features, including five new image styles, an iOS app, a beta API, and over 1 billion public Ideogram images.
Ideogram 2.0 offers five image styles: General, Realistic, Design, 3D, and Anime.
The Realistic style convincingly resembles photographs with dramatically improved textures for human features like hands and hair, a pain point for previous image generation models.
The Design style also significantly improves text rendering, allowing users to create greeting cards, t-shirt designs and more.
Ideogram offers a free tier that allows users to generate around 40 images, or 10 prompts a day at no charge.
Ideogram 2.0 consistently renders high-quality images with near-perfect human hands and text, two flaws that are an instant ‘AI giveaway’ in other AI image generators. This makes the model the new gold standard for use cases like memes, newsletter images, YouTube thumbnails, posters, and more.
xAI releases Grok 2 in early beta
xAI has begun rolling out early beta access for Grok 2, a powerful new AI model that leverages real-time data from X and uses Flux.1 to generate relatively unfiltered AI images.
Grok 2 is now available to a select group of premium X users in early beta mode.
The model can access and use real-time information from X, setting it apart from ChatGPT and other LLMs.
Grok 2 offers two modes: regular and “fun” mode, with the latter providing a more distinctive and entertaining personality.
When gathering and summarizing news, Grok 2 can reference specific tweets, a capability that cannot be found in ChatGPT or Claude.
Grok 2’s biggest advantage against other top-tier AI chatbots like ChatGPT is its ability to access real-time information from X and provide unfiltered responses. And with Grok 3 rumoured to be coming at the end of 2024, xAI has proven itself as a serious competitor in the LLM race — in a very short period of time.
Disney AI brings robots to life
ETH Zurich and Disney Research scientists have developed an AI system that can generate realistic, physics-based movements for virtual characters and robots from simple text or image inputs.
The system uses a two-stage approach: first, it learns a latent representation of motion from a large dataset, then trains a control policy using reinforcement learning.
It can handle a diverse range of motions, from simple walking to complex acrobatics, outperforming previous methods in accuracy and generalization.
The AI adapts to physical constraints, allowing it to transfer motions to real robots while maintaining balance and style.
Disney released a video showcasing one robot trained on the new two-stage AI technique dancing and getting pushed around while staying on its feet.
This AI system bridges the gap between animation and robotics, helping humanoids move more naturally and adapt better to new situations. With personal robots coming as soon as 2025 and the rapid pace of AI and robotics advancements, we might be coexisting with robots sooner than most people realize.
Neuralink’s second patient is already playing video games with brain implant
Elon Musk’s company Neuralink has implanted a brain chip in a second human patient named Alex, who is now using it to play video games and design 3D objects.
Alex’s recovery from the procedure has been smooth, and he has successfully used computer-aided design software to create a custom mount for his Neuralink charger.
The core technology of Neuralink involves a small, implantable chip with flexible electrode threads that capture and transmit brain activity to external devices like computers.
OpenAI joins Silicon Valley companies lobbying against California’s AI bill
OpenAI’s chief strategy officer Jason Kwon argues that AI regulations should be managed by the federal government, not individual states, to avoid hindering progress and causing businesses to relocate from California.
Kwon states that a consistent, nation-wide set of AI policies will promote innovation, allowing the U.S. to become a leader in global AI standards, and thus opposes California’s SB 1047 bill.
The proposed California AI safety bill, authored by Senator Scott Wiener, includes measures like pre-deployment safety testing and whistleblower protections, and awaits its final vote before potentially being signed by Governor Gavin Newsom.
What Else is Happening in AI on August 22nd 2024!
California and Google drafted a $300 million, 5-year partnership to fund in-state newsrooms and AI initiatives, including a $40 million annual “AI Innovation Accelerator”.
A Daily Chronicle of AI Innovations on August 21st 2024
OpenAI signs landmark agreement with Condé Nast
Microsoft releases new Phi-3.5 models, beating Google, OpenAI and more
AWS CEO tells employees that most developers could stop coding soon as AI takes over
OpenAI adds free fine-tuning to GPT-4o
Claude sued for copyright infringement
Create AI images in real-time on WhatsApp
Microsoft’s new AI beats larger models
Microsoft just released Phi-3.5-MoE, an advanced AI model that rivals the reasoning capabilities of much larger models while maintaining a compact and efficient architecture.
Phi-3.5-MoE uses a new mixture-of-experts (MoE) approach, which selectively activates only the most relevant parts of the model for each task to save compute power.
The new model excels at understanding and following complex instructions and can handle up to ~125,000 words in a single prompt.
In head-to-head benchmarks, Phi-3.5-MoE outperformed popular models like Meta’s Llama 3 8B and Google’s Gemma 2 9B, but fell short against OpenAI’s GPT-4o mini.
Microsoft made the model available under an open-source MIT license on Hugging Face.
While the mainstream media focuses on the most advanced large language model, there’s also another race amongst tech giants for the smartest, fastest, and smallest AI. Breakthroughs like Phi-3.5-MoE are paving the way for advanced AI models to run directly and privately on our mobile devices.
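The mixture-of-experts idea, a router sending each token only to the few expert networks most relevant to it, can be sketched in a few lines of PyTorch. This is a generic top-2 MoE layer with arbitrary toy dimensions, not Phi-3.5-MoE’s actual architecture:

```python
# Generic sketch of a mixture-of-experts layer with top-2 routing, the
# technique Phi-3.5-MoE uses to activate only part of the model per token.
# Expert count and dimensions are arbitrary toy values.
import torch
import torch.nn.functional as F

N_EXPERTS, TOP_K, D = 8, 2, 64
experts = torch.nn.ModuleList(
    [torch.nn.Linear(D, D) for _ in range(N_EXPERTS)]
)
router = torch.nn.Linear(D, N_EXPERTS)

def moe_layer(tokens):                      # tokens: (num_tokens, D)
    gate_logits = router(tokens)            # score every expert per token
    weights, picks = gate_logits.topk(TOP_K, dim=-1)
    weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts
    out = torch.zeros_like(tokens)
    for slot in range(TOP_K):               # run only the selected experts
        for e in range(N_EXPERTS):
            hit = picks[:, slot] == e
            if hit.any():
                out[hit] += weights[hit, slot, None] * experts[e](tokens[hit])
    return out

print(moe_layer(torch.randn(4, D)).shape)   # torch.Size([4, 64])
```

The payoff is that compute per token scales with TOP_K experts rather than all N_EXPERTS, which is how MoE models match larger dense models at a fraction of the inference cost.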
OpenAI signs landmark agreement with Condé Nast
OpenAI announced a new media partnership with Condé Nast to enhance search features using their SearchGPT prototype, aiming to make finding information and reliable content sources faster and more intuitive.
The partnership has raised transparency issues, particularly among Condé Nast’s unionized workers, who are worried about the impact on journalism and the lack of clear details on the agreement.
This deal occurs as Wall Street expresses growing concern over a potential AI bubble, with investors questioning the monetization and viability of AI technologies in the current market.
Microsoft releases new Phi-3.5 models, beating Google, OpenAI and more
Microsoft introduced three new open-source AI models, named mini-instruct, MoE-instruct, and vision-instruct, which excel in logical reasoning and support multiple languages but face challenges in factual accuracy and safety.
The Phi series aims to deliver highly efficient AI models for commercial and scientific purposes using quality training data, though specifics of the Phi-3.5 training process remain undisclosed by Microsoft.
All the new Phi 3.5 models are accessible under the MIT license on Hugging Face and Microsoft’s Azure AI Studio, but they require specialized GPU hardware like NVIDIA A100, A6000, or H100 for optimal performance.
AWS CEO tells employees that most developers could stop coding soon as AI takes over
A leaked recording revealed that AWS CEO Matt Garman believes software developers may soon stop coding as artificial intelligence takes over many of their tasks.
Garman’s remarks, shared during an internal chat in June, were intended as a positive forecast rather than a dire warning for software engineers, emphasizing new opportunities and skills.
Garman highlighted that developers should focus more on understanding customer needs and innovation, rather than just writing code, as AI tools increasingly manage the technical aspects.
Meta deploys new web crawlers that bypass scraping blocks
Meta has introduced new web crawling bots designed to collect data for training its AI models and related products without being easily blocked by website owners.
These new bots, Meta-ExternalAgent and Meta-ExternalFetcher, have features that potentially bypass the traditional robots.txt file, making website owners’ efforts to block them less effective.
Meta’s bots, launched in July, have shown low block rates compared to older versions, with only 1.5% blocking Meta-ExternalAgent and less than 1% blocking Meta-ExternalFetcher, according to Originality.ai.
OpenAI adds free fine-tuning to GPT-4o
OpenAI just launched free fine-tuning (up to 1 million tokens per day through September 23) for GPT-4o, allowing developers to customize the model for higher performance and accuracy.
Developers can now, for the first time ever, fine-tune GPT-4o to improve the model’s structure, tone, and domain-specific instructions for their AI applications.
Fine-tuning is available on all paid usage tiers with training costs of $25 per million tokens, but it is completely free until September 23.
OpenAI suggests that developers should see strong results from fine-tuning with only a few dozen training examples.
Additionally, Google’s Gemini API is giving developers 1.5 billion tokens for free every day on its Gemini 1.5 Flash model and 1.6 million tokens on its Gemini 1.5 Pro model.
Just last week, a company granted early access to fine-tune GPT-4o produced Genie, which achieved state-of-the-art scores on both the SWE-bench Verified (43.8%) and Full (30.1%) benchmarks. With free fine-tuning now available to all developers, get ready for a new wave of smarter, faster, and more capable AI bots.
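Kicking off a fine-tune uses OpenAI’s standard fine-tuning API. A minimal sketch follows; the training file and its examples are hypothetical, and the model snapshot name is the one from OpenAI’s fine-tuning announcement:

```python
# Minimal sketch of starting a GPT-4o fine-tune with the OpenAI Python SDK.
# Assumes a prepared chat-format JSONL file; the dataset is hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each JSONL line holds one training example, e.g.:
# {"messages": [{"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
training_file = client.files.create(
    file=open("train_examples.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",   # snapshot named in OpenAI's fine-tuning launch
)
print(job.id, job.status)
```

As the post notes, a few dozen well-chosen examples are often enough to see a clear difference in tone and structure.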
Claude sued for copyright infringement
A group of authors filed a lawsuit against AI startup Anthropic, alleging the company committed “large-scale theft” by training its Claude chatbot on pirated copies of copyrighted books.
This is the first lawsuit from writers targeting Anthropic and Claude, but similar lawsuits have been filed against competitor OpenAI and ChatGPT.
The lawsuit accuses Anthropic of using a dataset called The Pile, which includes numerous pirated books.
Anthropic and others, including OpenAI, have argued that training AI models is protected under the “fair use” doctrine of U.S. laws, which permits the limited use of copyrighted materials.
This is not the first time an AI company has been sued over copyright infringement, but it resurfaces an important debate about AI training data. While similar cases have been largely dismissed in the past, courts have yet to definitively address the core issue of using unauthorized internet-scraped material for AI training.
What Else is Happening in AI on August 21st 2024!
International Data Corporation (IDC) forecast that worldwide AI spending will reach $632 billion by 2028, with generative AI accounting for 32% of that.
LTX Studio opened to the public and launched five new features, including character animation and dialogue, face motion capture, and generation and keyframe control.
A Daily Chronicle of AI Innovations on August 20th 2024
AGIBOT reveals new humanoid robot family
ChatGPT runs for mayor in Wyoming
Luma Labs launches Dream Machine 1.5
Tesla’s humanoid robot has a new competitor
Waymo now giving 100,000 weekly robotaxi rides
Fortune 500 companies are getting increasingly worried about AI
Anthropic gets sued on allegations of ‘large-scale theft’
Nvidia’s new AI predicts thunderstorms with kilometer-scale precision
Luma Labs launches Dream Machine 1.5
Luma Labs just released Dream Machine 1.5, a major upgrade to their current AI video generation model, with higher quality text-to-video, smarter prompt understanding, and better image-to-video capabilities.
Dream Machine 1.5 builds on the original model’s ability to generate high-quality, realistic 5-second video clips from text and image prompts.
The upgraded model showcases better natural language processing, interpreting and executing prompts at a higher accuracy.
It excels in creating smooth motion, cinematography, and dramatic shots, turning static concepts into dynamic stories, but lags in morphing, movement, and text.
Dream Machine 1.5 is available to try for free here.
With text-to-image AI generation nearly indistinguishable from reality, the next big frontier is text-to-video — and Dream Machine 1.5 is another big leap forward for realism. While AI video still has some catching up to do, expect fast-moving startups like Luma Labs to close that gap for AI video, fast.
ChatGPT runs for mayor in Wyoming
Victor Miller, a mayoral candidate in Wyoming’s capital city, just vowed to let his customized GPT named VIC (Virtual Integrated Citizen) help run the local government if elected.
Miller created VIC using ChatGPT, feeding it city ordinances and related documents to make municipal decisions.
Miller filed for himself and VIC to run for mayor, proposing that the custom GPT provide data-driven insights and solutions while Miller ensures legal execution.
OpenAI has shut down Miller’s account twice, citing policies against using its products for campaigning.
Wyoming’s Secretary of State raised concerns, but local officials allowed Miller’s candidacy with his human name on the ballot.
While Miller’s chances of winning seem slim, and his grasp of data privacy and LLMs seems slimmer, this marks the first time a political candidate has openly advocated for AI in governance. Whether Cheyenne, Wyoming is ready for an AI co-pilot in City Hall is debatable, but AI will certainly infiltrate politics in the coming years.
AGIBOT reveals new humanoid robot family
AGIBOT, a China-based robotics startup, just unveiled a family of five advanced humanoid robots, directly challenging Elon Musk and Tesla’s upcoming Optimus bot.
AGIBOT’s five new models are both wheeled and biped humanoid robots specifically designed for diverse tasks — from household chores to industrial operations.
The flagship model, Yuanzheng A2, stands 5’9″ (175cm), weighs 121 lbs (55kg), and can perform delicate tasks like needle threading.
The company aims to start shipping 300 units by the end of 2024, claiming better commercialization and cost-control abilities than Tesla.
Unitree, another high-performance robot manufacturer from China, also showcased its new G1 mass production-ready robot with better functionality and appearance.
The humanoid robotics and AI race between the US and China is intensifying. While it’s been months since Tesla unveiled its Optimus 2 prototype, four Chinese startups, including AGIBOT with its five new humanoid robots, have showcased major technical progress in just a few days.
Tesla’s humanoid robot has a new competitor
Unitree Robotics has launched the production version of its G1 humanoid robot, priced at $16,000, just three months after its initial announcement.
The G1 is 90% cheaper than Unitree’s previous humanoid model, the H1, offering advanced features such as 23 degrees of freedom and a 3D vision system for real-time navigation.
While the G1 is not ready for consumer use, it is envisioned as an affordable platform for research and development, likely appealing to institutions and businesses exploring robotic automation.
Waymo now giving 100,000 weekly robotaxi rides
Waymo disclosed it is now giving more than 100,000 paid robotaxi rides every week across Los Angeles, San Francisco, and Phoenix, doubling its previously stated figures.
This milestone was shared by Waymo co-CEO Tekedra Mawakana and reflects a significant increase from the over 50,000 weekly rides reported by Alphabet CEO Sundar Pichai earlier this year.
Waymo’s fleet consists of hundreds of fully autonomous Jaguar I-Pace vehicles, with 778 robotaxis deployed in California, and it has recently expanded its service to operate 24/7 in San Francisco and parts of Los Angeles.
Fortune 500 companies are getting increasingly worried about AI
Fortune 500 companies reporting AI as a risk factor saw a surge of 473.5% in the past year, according to a report by Arize AI, with 281 companies now flagging such risks.
Arize AI’s analysis revealed that 56.2% of Fortune 500 companies now include AI risks in their latest annual reports, a substantial jump from the previous year’s 49 companies.
The software and technology sectors lead the mentions of generative AI, while advertising, media, and entertainment industries report the highest percentage, 91.7%, of AI as a risk factor.
Anthropic gets sued on allegations of ‘large-scale theft’
A group of authors has filed a lawsuit against AI startup Anthropic, alleging “large-scale theft” for using pirated copies of copyrighted books to train its chatbot, Claude.
This marks the first lawsuit by writers specifically targeting Anthropic, although similar cases have been brought against OpenAI, the maker of ChatGPT, for the same reasons.
The lawsuit accuses Anthropic, which markets itself as a responsible AI developer, of contradicting its goals by using unauthorized works, and it adds to the increasing legal challenges faced by AI developers.
Nvidia’s new AI predicts thunderstorms with kilometer-scale precision
Nvidia Research has introduced StormCast, a new AI model for high-precision atmospheric dynamics to enhance mesoscale weather prediction, which is critical for disaster preparedness and mitigation.
Integrated into Nvidia’s Earth-2 platform, StormCast provides hourly autoregressive forecasts that are 10% more accurate than current U.S. operational models, improving early warning systems for severe weather events.
Trained on NOAA climate data, StormCast predicts over 100 weather variables and allows scientists to observe storm evolution in three dimensions, marking significant advancements in AI-driven weather forecasting by Nvidia.
A Daily Chronicle of AI Innovations on August 19th 2024
You can now rent ‘living computers’ made from human neurons
Start-up failures up by 60% as founders face hangover from boom years
AMD is going after Nvidia with a $5 billion acquisition
Tesla will pay you to pretend to be a robot
You can now rent ‘living computers’ made from human neurons
Researchers and companies like FinalSpark are creating computers from lab-grown human brain organoids, which can be rented for $500 a month.
These biocomputers use human neurons to form pathways mimicking human brain learning processes, potentially consuming significantly less energy than current AI technologies.
While challenges remain, such as limited organoid lifespans and lack of standardized manufacturing, FinalSpark and other researchers are exploring various biocomputing approaches, including cellular and fungal computing.
AMD is going after Nvidia with a $5 billion acquisition
AMD is set to buy ZT Systems for $4.9 billion in cash and stock, aiming to strengthen its AI ecosystem and offer better support to companies building large AI computing businesses.
The acquisition will integrate ZT Systems’ computing infrastructure design business into AMD, although AMD plans to sell the data center infrastructure manufacturing arm to a strategic partner.
ZT Systems’ CEO Frank Zhang and President Doug Huang will lead roles within AMD’s Data Center Solutions Business Group, with the deal expected to conclude in the first half of 2025.
Tesla will pay you to pretend to be a robot
Tesla is offering up to $48 per hour for Data Collection Operators to wear motion-capture suits and VR headsets to help train its humanoid Optimus robot.
Workers wearing these suits perform and analyze tasks to gather extensive data, aiding in the robot’s development for various roles, from factory work to caregiving.
Tesla’s initiative involves collecting potentially millions of hours of data, aiming to overcome the challenges of producing versatile robots at scale and ensuring their success in diverse tasks.
Swiss startup FinalSpark just launched a service allowing scientists to rent cloud access to “biocomputers” made of human brain cells for $500 a month, in an effort to create AI that uses 100,000x less energy than current systems.
The system uses organoids (clumps of human brain cells) that can “live” and compute for up to 100 days.
AI models are trained using dopamine for positive reinforcement and electrical signals for negative reinforcement, mimicking natural neural processes.
FinalSpark claims these biocomputers could be up to 100,000 times more efficient for AI training than traditional silicon-based technology.
The organoids and their behaviour are live streamed 24/7, which you can access here.
AI is an energy-hungry industry, and alleviating its dependence on CPUs and GPUs is generally a step in the right direction. That said, using brain organoids for biocomputing is completely uncharted territory and is bound to raise ethical concerns — such as the sci-fi possibility that cell masses somehow achieve consciousness.
California’s SB 1047, an aggressive AI safety bill aimed at preventing AI disasters, just got significantly revised to address concerns raised by AI companies like Anthropic and open-source developers.
The bill no longer allows California’s attorney general to sue AI companies for negligent safety practices before a catastrophic event occurs.
AI labs are now only required to submit public “statements” about their safety practices vs certifications “under penalty of perjury.”
Likewise, developers must now provide “reasonable care” vs “reasonable assurance” that AI models do not pose significant risks.
The bill is headed to California’s Assembly floor for a final vote.
There’s a fine line between advancing technological progress and mitigating potential existential risks that governments are navigating — and California is showing that regulation can be practical and adaptive. These changes are a big step towards fostering responsible AI development through collaborative governance.
Researchers just developed a new technique to find shorter solutions to scrambled Rubik’s Cubes by cleverly analyzing the puzzle’s structure and identifying the best moves more quickly.
The Rubik’s Cube has an enormous number of possible configurations, over 43 quintillion, making it challenging for AI to solve in the fewest moves possible.
Researchers represented the Rubik’s Cube as a complex network or “graph” and used a new technique to pass useful information, like the moves required to solve the puzzle, between connected nodes.
The AI then considers which next moves are most likely to lead to a quick solution, using the probabilities as weights, and focuses on the most promising paths.
When tested, the new technique found solutions to the puzzle faster than current state-of-the-art Rubik’s Cube solving AI systems.
As companies like Sakana build AIs that can completely automate scientific research, it’s important to make sure they’re solving highly complex problems efficiently. This technique, coupled with Sakana’s processes, could be massively beneficial in areas like optimizing supply chains and advanced drug discovery.
Free event: Navigating AI Data Privacy. Join Section CEO Greg Shove to learn how to protect your data, write a team or company AI data policy, and lead your company on safe AI. RSVP here.*Source: https://www.sectionschool.com/events/live-events/ai-data-privacy-in-large-organizations
Claudehttps://x.com/alexalbert__/status/1824483452802175082 a new screenshot capture button, allowing users to easily include images from their screen in prompts.Source: https://x.com/alexalbert__/status/1824483452802175082
Midjourneyreleased a new unified web-based AI image editor with advanced tools for seamlessly modifying and extending generated images.Source: https://venturebeat.com/ai/midjourney-releases-new-unified-ai-image-editor-on-the-web
Rebellions and Sapeon, South Korean AI chip makers, signed a definitive merger agreement to challenge global leaders like Nvidia.Source: https://www.reuters.com/technology/artificial-intelligence/south-korean-ai-chip-makers-rebellions-sapeon-agree-merge-2024-08-18
Bzigo launched Iris, an AI-powered mosquito detector that tracks and marks mosquitoes with a laser pointer for easy swatting.Source: https://www.foxnews.com/tech/ai-technology-can-help-you-win-battle-over-mosquitoes
Coinbasestarted a $15,000 accelerator grant program for projects combining AI with crypto wallets to enable economic participation.Source: https://cointelegraph.com/news/coinbase-ceo-brian-armstrong-ai-should-have-crypto-wallets
Microsoftunveiled PowerToys Workspaces, a new feature to auto-arrange apps, plus an AI-powered copy-paste tool with OpenAI API integration.Source: https://www.theverge.com/2024/8/16/24221639/microsoft-powertoys-workspaces-feature-demo
A Daily Chronicle of AI Innovations on August 16th 2024
AI makes Walmart 100x more productive
SoftBank’s AI chip faces setback
Create a Siri-like voice AI with Llama 3.1
Hermes 3 is the newest open-source model
AI makes Walmart 100x more productive
Walmart’s CEO Doug McMillon just reported that the company is using generative AI to increase its productivity, updating 850 million product catalog entries 100 times faster than human-led methods.
The report came during the company’s Q2 financial earnings call, where McMillon also announced AI improvements to customer search and seller support.
Customers can now use AI-powered search and a new shopping assistant on Walmart’s app and website — it even provides advice for questions like “Which TV is best for watching sports?”.
Walmart is also testing a completely new AI-driven experience for U.S. based marketplace sellers, but the details are not yet available.
McMillon said the company plans to continue experimenting with AI globally across all parts of its business.
Another multibillion dollar company is using AI to increase productivity, but most notably, Walmart is exploring the tech in all areas of its business ops. Whether people should be excited about the endless possibilities ahead or concerned about the relevance of their jobs is a question that’s not going away any time soon.
SoftBank’s ambitious Project Izanagi initiative, aimed at developing AI processors to rival Nvidia, is reportedly facing a major setback after Intel failed to meet volume and speed requirements.
SoftBank had been working with Intel to develop AI processors for Project Izanagi because it lacks in-house chip design expertise, but Intel failed to meet SoftBank’s demands.
In an effort to keep Project Izanagi on track, SoftBank is considering a new partnership with TSMC, the world’s largest chipmaker.
TSMC has its own issues, however, failing to meet its current chipmaking demands, which has stalled the negotiations.
Despite the complications, SoftBank CEO Masayoshi Son remains committed to the company’s ambitious plan and is seeking investments from Saudi Arabia, UAE, and major tech companies.
Nvidia is currently dominating the AI chip space, which propelled the company to its current $3 trillion dollar market capitalization. But with recent delays of Nvidia’s next-gen Blackwell AI chip, it could be time for competitors to strike.
Nous Research just released Hermes 3, a new open-source model with significant improvements in roleplaying, agentic tasks, function calling, multi-turn chats, and long context coherence.
Hermes 3 is available in three sizes (8B, 70B, and 405B) with the 405B parameter model achieving state-of-the-art performance relative to other open models.
The model is instruct tuned, or trained, to faithfully respond to user requests and closely follow provided system prompts, unlike base or foundation models.
It achieves similar or better performance to Meta’s Llama-3.1 405B in judgement, reward modeling, interpretable problem-solving, code generation, and tool use.
Hermes 3 is available now for free via Lambda Chat or in the Nous Research Discord server.
Meta has been the leader in open-source AI for a while, but companies like Nous Research and Mistral are catching up with their latest Hermes 3 and Large 2 models. And the more free, customizable and state-of-the-art AIs available to the public, the more transparency the world has.
Elon Muskrevealed that xAI is developing an in-house image generation system to replace the current Flux model in Grok 2 but it’s currently months away from release.
The U.S. Consumer Financial Protection Bureauhighlighted risks of AI in finance, saying existing laws apply and innovation requires consistent regulatory treatment.
Apptronik, an automation company that makes humanoid robots, recently reported that the company is preparing for a commercial launch by the end of 2025.
A Daily Chronicle of AI Innovations on August 15th 2024
Apple’s iPad is getting a robotic arm
Google’s Imagen 3 tops Midjourney, DALL-E
Apple’s next big thing is a $1000 home robot
Grok-2 reaches state-of-the-art status
Creating sound effects with text
X’s AI image generator allows users to create uncensored images
Ex-Google CEO says successful AI startups can steal IP and hire lawyers to ‘clean up the mess’
FTC finalizes rule banning fake reviews, including those made with AI
Apple’s next big thing is a $1000 home robot
Apple is reportedly working on a new smart home project featuring an iPad attached to a robotic arm that can twist and rotate, designed as a home “command center” with AI capabilities.
The initiative, backed by CEO Tim Cook and head of hardware engineering John Ternus, has involved hundreds of staff and follows the cancelled Apple-brand electric car project.
According to Bloomberg, the device is expected to be released around 2026 or 2027, potentially costing about $1,000, and will use a modified version of iPadOS.
xAI’s newest AI model, Grok-2, is now available in beta for users on the X platform — achieving state-of-the-art status and outperforming versions of Anthropic’s Claude and OpenAI’s GPT-4.
In addition to Grok-2, Grok-2 mini is also now available to users on the X platform in beta with an enterprise API release planned for later this month.
Both Grok-2 and Grok-2 mini show significant improvements in reasoning with retrieved content, tool use capabilities, and performance across all academic benchmarks.
Grok-2 can now create and publish images directly on the X platform, powered by Black Forest Lab’s Flux 1 AI model.
Grok-2 surpasses OpenAI’s latest GPT-4o and Anthropic’s Claude 3.5 Sonnet in some categories, making it one of the best models currently available to the public if based purely on benchmarks.
Grok-1 debuted as a niche, no-filter chatbot, but Grok-2’s newly achieved state-of-the-art status has catapulted xAI into a legitimate competitor in the AI race. The startup is looking to have a bright future with its new Supercluster, Elon’s ability to attract talent, and vast amounts of real-time training data available on X.
Apple is reportedly ramping up development on a high-end tabletop smart home device with a robotic arm, an iPad-like display, and Siri voice command to operate its AI features.
The project, codenamed J595, reportedly involves a team of several hundred people and could launch as early as 2026 or 2027.
The device combines an iPad-like display with a thin robotic arm that can tilt, spin 360 degrees, and move the screen around.
It is expected to run a modified version of iPadOS making it a familiar smart home command center, videoconferencing tool, and remote-controlled home security device.
Apple is targeting a price point of around $1,000 for the product.
Apple is doubling down on its commitment to artificial intelligence by ramping up the development of a strange new Siri-powered, countertop robotic arm. With Apple Intelligence launching later this year, the tech giant seemingly has big plans for implementing AI into its hardware.
X’s AI image generator allows users to create uncensored images
X’s new AI image generator, Grok, allows users to create and share highly controversial images, including those of public figures in inappropriate scenarios, raising concerns about the lack of content moderation.
Despite claiming to have restrictions, Grok often generates offensive or misleading images, with many users easily bypassing its few safeguards, leading to further scrutiny from regulators.
The chaotic rollout of Grok’s image generation feature aligns with Elon Musk’s relaxed approach to content moderation, potentially driving away advertisers and inviting regulatory action.
ElevenLabs now offers a text-to-sound feature that allows users to generate sound effects by writing a simple description of the noise they want.
Visit ElevenLabs and log in or create an account. You can try this feature for free.
Select “Sound Effects” from the left sidebar.
Describe your desired sound effect in the text box.
Adjust settings for duration and prompt influence.
Click “Generate Sound Effects” to create your sounds.
Source: https://elevenlabs.io/
Google’s Imagen 3 tops Midjourney, DALL-E
Google DeepMind recently published the paper for it’s new state-of-the-art AI image generation model, Imagen 3, flexing that it beat DALL-E 3, Midjourney v6, and Stable Diffusion 3 in human performance evaluations.
The human evaluations asked participants to rank their preferred models for overall quality and adherence to detailed prompts.
Imagen 3 excelled particularly in generating high-quality, realistic images that closely match long and complex text descriptions.
Despite its capability to accurately generate photorealistic images, it struggles with certain tasks requiring numerical reasoning, understanding scale, and depicting actions.
Ex-Google CEO says successful AI startups can steal IP and hire lawyers to ‘clean up the mess’
Former Google CEO Eric Schmidt suggested that successful AI startups can initially steal intellectual property and later hire lawyers to resolve legal issues if their product gains traction.
Schmidt used a hypothetical example of copying TikTok to illustrate how Silicon Valley entrepreneurs might prioritize rapid growth over legal considerations.
Schmidt’s comments, made during a talk at Stanford, were later removed from the university’s YouTube channel after drawing media attention.
FTC finalizes rule banning fake reviews, including those made with AI
The FTC has introduced a final rule prohibiting companies from producing or selling fake reviews, including AI-generated ones, and can now penalize companies that ignore the regulation.
The rule targets deceptive practices such as incentivizing feedback, undisclosed insider reviews, company-controlled review sites, intimidation to remove negative feedback, and the trade of fake followers or views.
Although the FTC first proposed the fake review ban last year, there are concerns about enforcing it on global marketplaces like Amazon, where numerous fraudulent reviews come from businesses outside the U.S.
Free eBook: The AI Proficiency Report from Section. 7% of the workforce is getting all the benefits of AI. Download the report to see what they do differently.*
A Daily Chronicle of AI Innovations on August 14th 2024
Google beats OpenAI in voice mode race
OpenAI redesigns coding benchmark
Bring images to life with Kling AI
Become a tennis pro with AI
Android phones get an AI upgrade
xAI releases Grok-2, adds image generation on X
New ‘AI Scientist’ conducts research autonomously
Android phones get an AI upgrade
Google is replacing Google Assistant with its new AI model, Gemini, on Android phones, introducing generative AI capabilities like automating calendar invites and creating playlists based on user input.
Gemini will operate through cloud-based services, allowing for advanced AI processing, while Apple plans to run its AI models directly on devices for better privacy and latency.
The introduction of Gemini marks a significant shift in smartphone functionality, offering the potential to automate day-to-day tasks, but there are risks of errors as AI assistants become more integrated into daily life.
Google just launched Gemini Live, a mobile conversational AI with advanced voice capabilities, while OpenAI’s ChatGPT voice mode remains in its “limited alpha phase” and is not yet available to everyone.
Gemini Live, Google’s answer to OpenAI’s Advanced Voice Mode, is capable of “in-depth“ hands-free conversations and has 10 different human-like voice options.
Users can interrupt and ask follow-up questions mid-response, mimicking natural conversation flow — however Gemini Live’s ability to see and respond to your camera view is planned later this year.
Similar to Apple’s upcoming Intelligence features, Gemini integrates directly with Google to provide context-aware answers without switching apps.
Gemini Live is now the default assistant on Google’s Pixel 9 and is available today to all Gemini Advanced subscribers on Android (coming to iOS soon).
Real-time voice is slowly shifting AI from a tool we text/prompt with, to an intelligence that we collaborate, learn, consult, and grow with. As the world’s anticipation for OpenAI’s unreleased products grows, Google has swooped in to steal the spotlight as the first to lead widespread advanced AI voice rollouts.
xAI has launched upgraded Grok-2 and Grok-2 mini chatbots with new image-generation capabilities, which are powered by Black Forest Lab’s Flux 1 AI model and allow users to publish images to X with few restrictions.
Both Grok-2 models are currently in beta, available to Premium and Premium Plus subscribers on X, and will be accessible via xAI’s enterprise API later this month.
Early examples of Grok-generated images, depicting figures like Donald Trump and Barack Obama, indicate minimal content restrictions, raising concerns about the spread of false information on the platform.
OpenAI and the authors of SWE-bench collaborated to redesign the popular software engineering benchmark and release ‘SWE-bench Verified’, a human-validated subset of the original benchmark.
SWE-bench Verified addresses issues in the original benchmark, such as overly specific unit tests and unreliable development environments that leads to incorrect assessments of AI performance.
The new subset includes 500 samples verified by human professional software developers to make evaluating models on SWE-bench easier and more reliable.
On SWE-bench Verified, GPT-4o figures out 33.2% of samples, and the best open-source scaffold, Agentless, doubles its previous score to 16%.
The leaderboard for SWE-bench Verified does not include Cosine’s Genie we wrote about yesterday, which shattered the high score on the old benchmark by over 10%.
Accurate benchmarking of AI in human-level tasks like coding is crucial for transparency and assessing AI risk. However, OpenAI’s collab with SWE-bench is a double-edged sword — while it improves the benchmark, it also raises questions about potential conflicts of interest, especially with ‘Project Strawberry’ rumors heating up.
Tokyo-based R&D company Sakana AI introduced “The AI Scientist,” an AI designed to fully automate research, claiming it’s the first system of its kind to independently handle numerous scientific tasks.
The AI Scientist generates innovative research ideas, conducts experiments, writes code, and produces scientific papers while using a simulated review process to evaluate its own findings, mimicking human scientific collaboration.
A rival AI startup, Omniscience, contested Sakana AI’s originality, asserting their AI model, Omni, was released months earlier and offers similar capabilities for aiding users in scientific writing and research tasks.
Kling AI’s new image-to-video feature allows users to take static images, and turn them into dynamic videos, offering a new dimension to the AI video generator’s character consistency.
Click “AI Videos” on the dashboard, then select “Image to Video” on the top bar.
Upload your chosen image and write a prompt describing how you want the image animated.
Hit “Generate” and watch your image come to life!
Source: https://klingai.com/
Become a tennis pro with AI
Researchers just created Match Point AI, a groundbreaking tennis simulation that pits AI agents against virtual pros, giving players data-driven tennis strategies and tools to help improve their game.
Match Point AI realistically models the complexities and uncertainties of real tennis, allowing AI to test new strategies in virtual games.
Early experiments show the AI rediscovering time-tested tennis strategies, like making opponents run, validating the framework’s ability to understand the sport.
By watching Match Point’s AI agents that mimic tennis legends like Novak Djokovic, players can learn the perfect strategies to optimize their game quickly and efficiently.
AI has long been trained to compete in games, but researchers usually focus on board and video games with straightforward mechanics. Match Point AI learns to make decisions in a real-world, complex sport, similar to how Google’s newest AI robot can play ping pong against intermediate players.
What else is happening in AI on August 14th 2024!
Google unveiled Pixel Buds Pro 2 with a custom Tensor A1 chip, enhanced noise cancellation, and Gemini AI integration.
A Daily Chronicle of AI Innovations on August 13th 2024
New AI can diagnose stroke via tongue color
Sakana reveals an autonomous AI scientist
New AI model sparks rumors about OpenAI’s Q* New AI model can listen while speaking Gemini 1.5 Flash cuts usage fees by 78% OpenAI releases GPT-4o System Card, revealing safety measures SingularityNet’s supercomputer network: A step closer to AGI
New AI model sparks rumors about OpenAI’s Q*
A mysterious new AI model has appeared in the LMSYS Chatbot Arena, sparking rumors that it could be OpenAI’s highly anticipated Q* AI breakthrough or its evolution, codenamed ‘Strawberry.’
Testers report that this “anonymous-chatbot” displays more advanced reasoning capabilities than the current state-of-the-art GPT-4o model. To add to the speculation, OpenAI CEO Sam Altman has tweeted a picture of a strawberry, which is believed to be the codename for OpenAI’s secret new AI model.
Why does it matter?
If this mystery model is indeed Q*, it could represent another significant leap forward in AI capabilities as OpenAI’s competitors like Anthropic and Meta start to catch up to GPT-4o. This could be a massive paradigm shift that could significantly reshape the landscape of AI.
Tokyo-based Sakana AI just introduced “The AI Scientist,” the world’s first AI system capable of autonomously conducting scientific research — potentially revolutionizing the scientific process.
The system generates new research ideas, writes code, runs experiments, writes papers, and performs its own peer review with near-human accuracy.
Sakana AI envisions a future where we won’t just see an autonomous AI researcher but also autonomous reviewers, area chairs, and entire conferences.
The AI Scientist has already produced papers with novel contributions in machine learning domains like language modeling and diffusion models.
Each paper only costs approximately $15 to produce, which could potentially democratize research capabilities.
This breakthrough could dramatically accelerate scientific progress by allowing researchers to collaborate with AI agents and automate time-consuming tasks. We’re entering a new era where academia could soon be powered by a tireless community of AI agents, working round-the-clock on any problem they’re directed to.
Cosine just showed off Genie, its new fully autonomous AI software engineer that broke the high score on a benchmark for evaluating the coding abilities of large language models (LLMs), by over 10%.
Cosine trained Genie on a dataset that emulates how human software engineers actually work from incremental knowledge discovery to step-by-step decision making.
When it makes a mistake, Genie iterates, re-plans, and re-executes until it fixes the problem, something that foundational models struggle with.
Genie scored 30.08% on SWE-Bench, a 57% improvement over previous top performers like Amazon’s Q and Code Factory at 19% (GPT-4 scores 1.31%).
The waitlist is currently open, but Genie has not yet been released to the general public.
Cosine completely rethinks the way that AI is trained, teaching it to be more human-like during its training rather than focusing on post-training prompt design — and it works! With its recent SWE-Bench success, more companies are likely to adopt the process and build smarter AIs, a win-win for everyone.
Researchers have developed a new Listening-While-Speaking Language Model (LSLM) that can listen and speak simultaneously. This allows for more natural and responsive conversations with AI systems. The LSLM uses a token-based decoder-only text-to-speech model for speech generation and a streaming self-supervised learning encoder for real-time audio input.
This enables the model to detect turn-taking and respond to interruptions, a key feature of natural conversation. In addition, the LSLM has demonstrated robustness to noise and sensitivity to diverse instructions in experiments.
Why does it matter?
While OpenAI’s advanced voice mode for ChatGPT pushes us towards realistic AI conversations, LSLM takes that to the next level, where it could revolutionize human-AI interactions, making conversations with machines feel natural and responsive.
Google has announced significant updates and improvements to its Gemini API and Google AI Studio. The biggest news is a significant reduction in the usage fees for Gemini 1.5 Flash. The input token costs have decreased by 78% to $0.075 per 1 million tokens, and the output token costs have decreased by 71% to $0.3 per 1 million.
This makes Gemini 1.5 Flash a popular and affordable summarization and multi-modal understanding model. Google has also completed the Gemini 1.5 Flash tuning rollout, allowing developers to customize the base model and improve its performance.
Why does it matter?
The extended language support, model tuning options, and improvements to the Gemini API will enable more developers and researchers to build innovative AI-powered products and services using advanced NLP capabilities.
SingularityNet’s supercomputer network: A step closer to AGI
SingularityNET is launching a network of powerful supercomputers to accelerate the development of AGI. The first of these supercomputers is expected to come online in Sep 2024. The network will use cutting-edge hardware like Nvidia GPUs and AMD processors to create a “multi-level cognitive computing network” for hosting and training complex AGI systems.
The company uses an open-source software framework called OpenCog Hyperon to manage the distributed computing power. Users will access the network through a tokenized system, allowing them to contribute data and test AGI concepts.
Why does it matter?
Major AI companies such as OpenAI, Anthropic, and Google currently dominate the race to AGI development. However, SingularityNET’s novel decentralized approach could disrupt this, democratizing AI research for a broader range of contributors and innovators.
An AI developed by researchers at Middle Technical University and the University of South Australia can diagnose stroke by analyzing the color of a person’s tongue.
The advanced algorithm, which boasts a 98% accuracy rate, can also detect conditions such as anaemia, asthma, diabetes, liver, and gallbladder issues, COVID-19, and various gastrointestinal diseases.
This innovative system uses tongue color analysis, an ancient technique from traditional Chinese medicine, and could potentially be adapted for use with smartphones for real-time health assessments.
Reddit is testing AI-powered search result pages that provide summaries and recommendations to help users “dig deep” into content and discover new communities.
According to leaked documents, Nvidia has been scraping video content from sources like YouTube and Netflix to train its AI models for its upcoming Cosmos project.
Automattic has launched a newtool called “Write Brief with AI.” This helps WordPress bloggers write concisely and improve the readability of their content.
Anthropic is expanding its safety bug bounty program to focus on finding flaws in its AI safeguarding systems. The company is offering bounty rewards of up to $15,000.
OpenAI allows free ChatGPT users to generate up to two images per day using its DALL-E 3 model. This was previously available only to ChatGPT Plus subscribers.
Google Researchers developed a robot to play competitive table tennis at an amateur human level. It can also adapt its game to play vs. unseen human opponents.
Alibaba has released a new LLM called Qwen2-Math that scored 84% on the MATH Benchmark, surpassing OpenAI’s GPT-4o and other leading math-focused AI models.
Google Meet is rolling out a new AI-powered feature, “Take notes for me,” which can automatically take notes during video calls,boosting productivity and efficiency.
A Daily Chronicle of AI Innovations on August 12th 2024
AI search is gaining momentum
ChatGPT unexpectedly began speaking in a user’s cloned voice during testing
Meta and UMG struck an agreement to ‘protect’ artists from AI
Google Meet adds new note-taking AI
FCC cracks down on AI voice calls
Google Meet adds new note-taking AI
Google is rolling out a new “Take notes for me” feature powered by its Gemini AI for it’s Google Meet feature, allowing users to focus on the meeting while the AI automatically captures key points.
The AI-powered tool will automatically take notes during Google Meet calls, reducing the need for manual note-taking.
The feature is powered by Google’s Gemini AI and will be available to Workspace customers with specific add-ons.
“Take notes for me” is part of the AI Meetings and Messaging add-on, which costs $10 per user/month across most Google Workspace plans.
Admins can configure the feature’s availability through the Google Workspace Admin console.
Taking notes during meetings will soon be a thing from our prehistoric, non-AI past — with Google pushing for a more practical, AI-assisted future of work. Alongside this, the tech giant is directly competing against smaller AI startups such as Otter AI and Fireflies who’ve thrived by selling a nearly identical features to users.
The U.S. Federal Communications Commission (FCC) just proposed new regulations requiring AI-generated voice calls to disclose the use of artificial intelligence.
The proposal aims to combat the rise of AI-generated voices in unwanted and potentially fraudulent ‘robocalls’.
AI voices would be required to explicitly state they are artificial at the beginning of calls.
The FCC is also exploring tools to alert people when they receive AI-generated calls and texts, including enhanced call filters, AI-based detection algorithms, and improved caller ID flagging.
As AI voices become indistinguishable from human speech, these regulations are crucial in combating highly targeted scams. But with enforcement likely to be a cat-and-mouse game against scammers, the best defence is education—especially for those most vulnerable to AI deception.
Perplexity’s AI search engine experienced substantial growth, answering 250 million queries last month, signaling a rising demand for AI-driven search technologies. In contrast, 500 million queries were processed throughout 2023, Shevelenko told the Financial Times
Despite this growth, Perplexity remains significantly behind Google, which dominates the market with over 90 percent share and processes around 8.5 billion queries daily.
The rise of AI in search, exemplified by Perplexity and other players, suggests a potential shift in user behavior and challenges to the traditional search engine business models.
ChatGPT unexpectedly began speaking in a user’s cloned voice during testing
During testing, ChatGPT’s Advanced Voice Mode accidentally mimicked users’ voices without their consent, as highlighted in OpenAI’s new GPT-4o system card released on Thursday.
OpenAI has implemented safeguards to prevent unauthorized voice imitation, although rare episodes during testing showcased the model’s ability to unintentionally generate user-like voices.
The GPT-4o AI model can synthesize almost any sound, and OpenAI directs this capability by using authorized voice samples and employing an output classifier to ensure only selected voices are generated.
Meta and UMG struck an agreement to ‘protect’ artists from AI
Meta and Universal Music Group (UMG) updated their licensing agreements to extend UMG’s content use across more Meta platforms, now including Threads and WhatsApp alongside Facebook, Instagram, Messenger, and Meta Horizon.
This multiyear agreement aims to explore new collaboration opportunities on WhatsApp and other Meta platforms, addressing issues like unauthorized AI-generated content that could impact artists and songwriters.
Meta’s collaboration with UMG dates back to 2017, allowing users to use UMG music in content and addressing copyright issues, a challenge shared by TikTok in its recent dealings with UMG.
Delphi unveiled an AI clone feature that creates lifelike digital replicas of individuals, demonstrating its capabilities in a TV interview on FOX Business.
A Daily Chronicle of AI Innovations on August 09th 2024
OpenAI fears users will become emotionally dependent on its ChatGPT voice mode
Google’s new robot can play table tennis like humans
GPT-4 tackles top-secret tasks
AI speeds up schizophrenia cure
OpenAI fears users will become emotionally dependent on its ChatGPT voice mode
OpenAI is concerned that users may become emotionally dependent on ChatGPT due to its new, human-sounding voice mode, which could affect relationships and social interactions.
The company observed users expressing shared bonds with ChatGPT’s voice mode, raising fears that prolonged use could reduce the need for human interaction and lead to unhealthy trust in AI-supplied information.
OpenAI plans to continue studying the potential for emotional reliance on its tools and aims to navigate the ethical and social implications responsibly while ensuring AI safety.
Google’s new robot can play table tennis like humans
Google’s DeepMind team has developed a table tennis robot that performs at a “solidly amateur” human level, successfully competing against beginner and intermediate players while struggling against advanced ones.
During testing, the robot achieved a 55% win rate against intermediate players, winning 45% of the 29 games it played in total, but it failed to win any matches against advanced players.
DeepMind identifies the robot’s main weaknesses as reacting to fast balls and dealing with system latency, suggesting improvements like advanced control algorithms and predictive models for better performance.
Researchers at Uppsala University recently used AI to accurately predict 3D structures of receptors linked to schizophrenia and depression treatments and speed up possible treatment strategies.
The AI model predicted the structure of TAAR1, a receptor linked to schizophrenia and depression treatments.
Then, supercomputers screened millions of molecules to find those fitting the AI-generated model.
Experimental testing confirmed many AI-predicted molecules activated TAAR1, and one potent molecule showed promising positive effects in animal experiments.
Researchers reported on a new model that can predict major diseases early enough to treat them, and now AI is working on curing schizophrenia and depression. As the tech continues to improve, we’re going to see a complete transformation in healthcare that will likely save millions, if not billions, of lives.
Microsoft and Palantir just partnered to deliver advanced AI, including GPT-4, and analytics capabilities to U.S. Defense and Intelligence agencies through classified cloud environments.
The partnership integrates Palantir’s AI Platforms with Microsoft’s Azure OpenAI Service in classified clouds.
The aim is to safely and securely enable AI-driven operational workloads across defense and intelligence sectors.
OpenAI’s models, including GPT-4, will be leveraged by the U.S. government to develop innovations for national security missions.
AI being trusted with classified documents is a big leap in its acceptance as a useful tool for humanity. However, it does feel a bit unsettling knowing that OpenAI’s models are being used at the government level, with the safety team completely dissolving last month and the still uncovered mysteries sorrounding Q*.
Galileo*: Our latest LLM Hallucination Index ranks 22 of the leading models on their performance across 3 different RAG tasks, evaluating the correctness of their responses and propensity to hallucinate.Read the report
A Daily Chronicle of AI Innovations on August 08th 2024
Humane’s AI Pin daily returns are outpacing sales
Sam Altman teases ‘Project Strawberry‘
AI breakthrough accurately predicts diseases
OpenAI bets $60M on webcams
Humane’s AI Pin daily returns are outpacing sales
Humane has faced considerable challenges with the AI Pin, seeing more returns than purchases between May and August, with current customer holdings near 7,000 units.
The AI Pin received negative reviews at launch, leading to efforts by Humane to stabilize operations and look for potential buyers or additional funding from investors.
Humane’s total sales of the AI Pin and accessories have only reached $9 million, which is significantly lower than the $200 million investment from prominent Silicon Valley executives.
OpenAI is reportedly leading a $60 million Series B funding round for Opal, a company known for high-end webcams, with plans to develop AI-powered consumer devices.
Opal plans to expand beyond high-end webcams and develop creative tools powered by OpenAI’s AI models.
The startup will work closely with OpenAI researchers to prototype various device ideas.
OpenAI executives are reportedly most interested in integrating their new voice AI models into Opal’s devices.
OpenAI’s $60 million bet on Opal and Sam Altman’s personal investments in AI hardware startups signals a major push from the AI giant to bring advanced AI from the cloud directly into users’ hands.
A new unknown AI model has appeared in the LMSYS Chatbot Arena, igniting rumors that it could be OpenAI’s highly anticipated Q* AI breakthrough or its evolution — codenamed ‘Strawberry’.
A new ‘anonymous-chatbot’ appeared in the LMSYS Chatbot Arena — an open-source platform where AI startups often test upcoming releases.
Previously, OpenAI tested GPT-4o with gpt2-chatbot two weeks before releasing it to the public, which put the arena on high alert for new AI models.
Testers of “anonymous-chatbot” report that it shows more advanced reasoning than GPT-4o and any other frontier model.
To add fuel to the speculation, Sam Altman tweeted a picture of a Strawberry on X, which is the codename of OpenAI’s reported secret AI model.
As competitors like Anthropic and Meta start to catch up to GPT-4o, the Internet has been eagerly awaiting OpenAI’s next move. If this mystery model is indeed Q*/Strawberry, then we could be on the cusp of another seismic shift in AI capabilities.
Researchers have just developed an AI model that can predict major diseases like heart conditions, diabetes, and cancer — significantly outperforming existing methods.
The new model analyzes patient data using statistics and deep learning to spot disease indicators more accurately.
It employs a smart algorithm (SEV-EB) to identify crucial health markers, helping doctors prioritize the most relevant patient information.
This achieves 95% accuracy in predicting specific diseases like coronary artery disease, type 2 diabetes, and breast cancer.
It also leverages patients’ digital health records for personalized risk assessment and earlier healthcare interventions.
Remember when AlphaFold cracked the protein folding problem? This could be healthcare’s next big AI moment. By significantly improving disease prediction accuracy, this model could transform early diagnosis and treatment planning to help save millions of lives across the globe
Intel reportedly declined an opportunity to invest in OpenAI in 2017, missing early entry into the AI market due to doubts about AI’s near-term potential.
A Daily Chronicle of AI Innovations on August 07th 2024
Reddit to test AI-powered search result pages
Robot dentist performs first automated procedure
AI robot helps assemble a BMW
New AI can listen while speaking
Reddit to test AI-powered search result pages
Reddit CEO Steve Huffman announced plans to test AI-powered search results later this year, aiming to help users explore products, shows, games, and new communities on the platform.
Huffman indicated that the company might explore monetizing through paywalled subreddits, which could offer exclusive content or private areas while still maintaining the traditional free version of Reddit.
As Reddit seeks to diversify revenue sources, Huffman emphasized that the company has blocked certain entities from accessing Reddit content to ensure transparency and protect user privacy.
A Boston-based tech company, backed by Mark Zuckerberg’s dentist father, completed the world’s first all-robotic dental procedure, marking a significant advancement in medical technology.
The robot, operated by Perceptive, independently performed a process called “cutting,” which involves drilling into and shaving down a tooth, demonstrating its capabilities in Barranquilla, Colombia.
This breakthrough aims to use autonomous machines for procedures like crown placements in as little as 15 minutes, enhancing precision, efficiency, and patient care.
OpenAI-backed startup Figure AI just showed off Figure 02, its next-generation AI-powered humanoid robot — capable of completely autonomous work in complex environments like a BMW factory.
Figure 02 uses OpenAI’s AI models for speech-to-speech reasoning, allowing the humanoid robot to have full conversations with humans.
A Vision Language Model (VLM) enables the robot to make quick, common-sense decisions based on visual input and self-correct errors.
Six RGB cameras provide the robot with 360-degree vision to help it navigate the real world.
The robot stands 5’6″and weighs 132 lbs, with a 44 lb lifting capacity and a 20-hour runtime thanks to a custom 2.25 KWh battery pack.
The humanoid robot race is intensifying, withFigure CEO Brett Adcock claiming that Figure 02 is now the “most advanced humanoid on the planet” — a direct challenge toward Elon Musk and Tesla Optimus. While the world now waits for Elon’s response, Figure has one ace up its sleeve: its OpenAI partnership.
ByteDance, the parent company of TikTok, just launched Jimeng AI for Chinese users, a text-to-video AI app that directly competes with OpenAI’s (unreleased) Sora AI video model.
Jimeng AI is available on the Apple App Store and Android for Chinese users.
ByteDance’s entry into the AI video generation market follows similar launches by other Chinese tech firms, including Kuaishou’s Kling AI.
The subscription, priced at 79 yuan ($11) monthly or 659 yuan ($92) annually allows for the creation of ~2,050 images or 168 AI videos per month.
Unlike OpenAI’s Sora, which isn’t yet publicly available, these models by Jimeng AI are already accessible to users (in China).
China’s AI video generation race is accelerating, with Kling AI’s public release just weeks ago and now ByteDance’s Jimeng AI launching while the world anxiously waits for Sora’s public release. With Jimeng AI being backed by TikTok, it will have plenty of training data and deep pockets to compete against other AI giants.
AI researchers just developed a new Listening-While-Speaking Language Model (LSLM) that can listen and speak simultaneously — advancing real-time, interactive speech-based AI conversations.
The new model, called the Listening-while-Speaking Language Model (LSLM), enables full-duplex modeling in interactive speech-language models.
LSLM uses a token-based decoder-only TTS for speech generation and a streaming self-supervised learning encoder for real-time audio input.
The system can detect turn-taking in real-time and respond to interruptions, a key feature of natural conversation.
The model demonstrated robustness to noise and sensitivity to diverse instructions in experiments.
While OpenAI’s recent Her-like advanced voice mode for ChatGPT inches us toward realistic AI conversations, LSLM leaps even further by enabling AI to process incoming speech WHILE talking. This could revolutionize human-AI interactions — making conversations with machines feel truly natural and responsive.
Reddit announced plans to test AI-generated summaries at the top of search result pages, using a combination of first-party and third-party technology to enhance content discovery.
A Daily Chronicle of AI Innovations on August 06th 2024
Figure unveils new sleeker and smarter humanoid robot
Nvidia used ‘a lifetime’ of videos everyday to train AI
Leaked code reveals Apple Intelligence’s plan to prevent hallucinations
Nvidia trains video model ‘Cosmos’
OpenAI co-founder leaves for Anthropic
Nvidia AI powers robots with Apple Vision Pro OpenAI has a secretive tool to detect AI-generated text Tesla’s AI gives robots human-like vision Nvidia delays new AI chip launch Google’s Gemini 1.5 Pro leads AI chatbot rankings AI turns brain cancer cells into immune cells
Nvidia AI powers robots with Apple Vision Pro
Nvidia introduced a new tool suite for developers to control and monitor robots using Apple’s Vision Pro headset. The MimicGen NIM microservice translates user movements captured by the Vision Pro into robot actions, enabling intuitive control of robotic limbs.
Additionally, Nvidia’s Isaac Sim can generate synthetic datasets from these captured movements, which reduces the time and cost of collecting real-world data for robot training.
Why does it matter?
This advancement is a practical application of teleoperation. It can lead to more intuitive and effective ways for humans to interact with and control robots and improve their usability in various fields such as manufacturing, healthcare, and service industries.
Leaked documents obtained by 404 media report Nvidia has been scraping millions of videos daily from YouTube, Netflix, and other sources to train its unreleased foundational AI model.
Nvidia’s project, codenamed Cosmos, aims to process “a human lifetime visual experience worth of training data per day.”
The company used open-source tools and virtual machines to download videos, including full-length movies and TV shows.
Employees raised concerns about copyright and ethics, but were told there was “umbrella approval” from executives.
Nvidia claims its practices are “in full compliance with the letter and spirit of copyright law.”
Project Cosmos appears to be Nvidia’s big move into video-based AI, which could revolutionize everything from 3D world generation to self-driving cars, digital humans, and more. However, this harsh introduction is not a good look for the company, especially as the industry’s practices are coming under intense scrutiny.
OpenAI has a secretive tool to detect AI-generated text
OpenAI has been sitting on a tool that can detect AI-assisted cheating for nearly a year. Using an invisible watermarking technique, the company has developed a tool that can detect ChatGPT-generated text with 99.9% accuracy. However, internal debates about user retention, potential bias, and distribution methods have kept this technology under wraps.
Meanwhile, educators are desperately seeking ways to detect AI misuse in schools. A recent survey found that 59% of middle- and high-school teachers were confident some students had used AI for schoolwork, up 17 points from the previous year.
Why does it matter?
This tool could preserve the value of original thought in education. However, OpenAI’s hesitation shows there are complex ethical considerations about AI detection and unintended consequences in language communities.
Three key leaders at OpenAI are departing or taking leave, including co-founder John Schulman, co-founder Greg Brockman, and Peter Deng — another major shakeup for the AI powerhouse.
John Schulman, co-founder and a key leader at OpenAI, has left to join rival AI startup Anthropic — one of OpenAI’s biggest competitors.
Greg Brockman, OpenAI’s president and co-founder, is taking an extended leave of absence until the end of the year.
Peter Deng, a product leader who joined last year from Meta, has reportedly also departed.
These moves follow other recent high-profile exits, including co-founders Ilya Sutskever and Andrej Karpathy.
OpenAI has struggled to regain its footing after Sam Altman’s departure and eventual return as CEO in November 2023. Brockman, one of Altman’s biggest supporters during the ousting, mysteriously takes a leave of absence at a crucial time as OpenAI sees increased competition from Anthropic and Meta AI.
Tesla’s latest patent introduces a vision system for autonomous robots, particularly its humanoid robot Optimus. The end-to-end AI model uses only camera inputs to create a detailed 3D understanding of the environment, without using expensive sensors like LiDAR.
By dividing the space into voxels (3D pixels), the system can predict each spatial unit’s occupancy, shape, semantics, and motion in real-time. It has already been implemented, with Tesla’s manufacturing team training and deploying the neural network in Optimus for tasks like picking up battery cells on a conveyor belt.
Why does it matter?
The development of such AI-driven perception technologies could lead to progress in autonomous systems for more sophisticated and reliable operations.
The Information reports that design flaws could delay the launch of Nvidia’s next-gen AI chips by three months or more. This setback could affect giants like Microsoft, Google, and Meta, who have collectively placed orders worth tens of billions of dollars for these chips.
Despite the rumored delay, Nvidia maintains that production of its new Blackwell chip series is on track. The company also reports strong demand for its Hopper chips and says a broad sampling of Blackwell has already begun. However, sources claim that Microsoft and another major cloud provider were informed of production delays just this week.
Why does it matter?
A slowdown in chip availability could hamper the development and deployment of new AI technologies, affecting everything from cloud services to generative AI applications. It also highlights the delicate balance and vulnerabilities in the AI supply chain.
Google has launched Gemini 1.5 Pro, an experimental version available for early testing. It quickly claimed the top spot on the LMSYS Chatbot Arena leaderboard, outperforming OpenAI’s GPT-4o and Anthropic’s Claude-3.5 Sonnet. With an impressive Elo score of 1300, Gemini 1.5 Pro excels in multilingual tasks, technical areas, and multimodal capabilities.
The model builds on the foundation of Gemini 1.5, boasting a massive context window of up to two million tokens.
Why does it matter?
Google’s decision to make the model available for early testing reflects a growing trend of open development and community engagement in the AI industry. The company’s focus on community feedback also reflects its move toward responsible AI development.
Researchers at the Keck School of Medicine of USC used AI to reprogram glioblastoma cells into cancer-fighting dendritic cells. It increased survival chances by up to 75% in mouse models of glioblastoma, the deadliest form of brain cancer in adults. The technique cleverly bypasses the blood-brain barrier by converting cancer cells within the tumor itself, a major hurdle in traditional glioblastoma treatments.
The approach greatly improved survival rates in animal models when combined with existing treatments like immune checkpoint therapy or DC vaccines. The research team aims to begin clinical trials in patients within the next few years
Why does it matter?
The technique offers new hope for patients facing this aggressive disease. Moreover, the approach’s application to other cancer types suggests a broader impact on cancer immunotherapy, transforming how we approach cancer treatment in the future.
Figure unveils new sleeker and smarter humanoid robot
Figure has introduced its new humanoid robot, the Figure 02, which features improved hardware and software, including six RGB cameras and enhanced CPU/GPU computing capabilities.
Leveraging a longstanding partnership with OpenAI, the Figure 02 is equipped for natural speech conversations, featuring speakers and microphones to facilitate communication with human co-workers.
Figure 02’s advanced AI and language processing aim to make interactions transparent and safe, which is crucial given the robot’s potential use alongside humans in factory and commercial environments.
Nvidia used ‘a lifetime’ of videos everyday to train AI
Nvidia collected videos from YouTube and other sites to create training data for its AI products, as shown by internal documents and communications obtained by 404 Media.
Nvidia asserted that their data collection practices align with both the letter and spirit of copyright law when questioned about legal and ethical concerns regarding the use of copyrighted material.
A former Nvidia employee revealed that workers were directed to gather videos from sources like Netflix and YouTube to train AI for the company’s 3D world generator project, internally referred to as Cosmos.
Leaked code reveals Apple Intelligence’s plan to prevent hallucinations
Leaked code for macOS Sequoia 15.1 has revealed pre-prompt instructions for Apple Intelligence to minimize hallucinations and improve accuracy in responses.
These pre-prompt instructions include directives for Apple Intelligence to ensure questions and answers in mail assistance are concise and relevant to avoid false information.
Instructions also specify limitations for creating photo memories, prohibiting religious, political, harmful, or provocative content to maintain a positive user experience.
OpenAI’s co-founder John Schulman has left for rival Anthropic and wants to focus on AI alignment research. Meanwhile, another co-founder and president of OpenAI Greg Brockman, is taking a sabbatical.
Meta is offering Judi Dench, Awkwafina, and Keegan-Michael Key millions for AI voice projects. While some stars are intrigued by the pay, others disagree over voice usage terms.
YouTube creator David Millette sued OpenAI for allegedly transcribing millions of videos without permission, claiming copyright infringement and seeking over $5 million in damages.
Google hired Character.AI’s co-founders Noam Shazeer and Daniel De Freitas for the DeepMind team, and secured a licensing deal for their large language model tech.
Black Forest Labs, an AI startup, has launched a suite of text-to-image models in three variants: [pro], [dev], and [schnell], which outperforms competitors like Midjourney v6.0 and DALL·E 3.
OpenAI has rolled out an advanced voice mode for ChatGPT to a select Plus subscribers. It has singing, accent imitation, language pronunciation, and storytelling capabilities.
Google’s latest Gemini ad shows a dad using Gemini to help his daughter write a fan letter to an Olympian. Critics argue it promotes lazy parenting and undermines human skills like writing. Google claims the ad aims to show Gemini as a source of initial inspiration.
Stability AI has introduced Stable Fast 3D which turns 2D images into detailed 3D assets in 0.5 seconds. It is significantly faster than previous models while maintaining high quality.
Google’s “About this image” tool is now accessible through Circle to Search and Google Lens. With a simple gesture, you can now check if an image is AI-generated, how it’s used across the web, and even see its metadata.
Karpathy/Nano-Llama31: a minimal, dependency-free version of the Llama 3.1 model architecture, enabling simple training, finetuning, and inference with significantly lighter dependencies compared to the official Meta and Hugging Face implementations.
Secretaries of state from five U.S. statesurged Elon Musk to address misinformation spread by X’s AI chatbot Grok regarding the upcoming November election.
A Daily Chronicle of AI Innovations on August 05th 2024
Neuralink successfully implants brain chip in second patient
OpenAI has a ‘highly accurate’ ChatGPT text detector, but won’t release it for now
Elon Musk is suing OpenAI and Sam Altman again
Meta AI’s new Hollywood hires
Google absorbs Character AI talent
Tesla unveils new AI vision for robots
Google takes another startup out of the AI race
Google pulls AI Olympics ad after backlash
Nvidia delays next AI chip due to design flaw
Meta AI’s new Hollywood hires
Meta is reportedly offering millions to celebrities like Awkwafina, Judi Dench, and Keegan-Michael Key to use their voices in upcoming AI projects.
The AI voices would be used across Meta’s platforms, including Facebook, Instagram, and Meta Ray-Ban smart glasses.
Meta is reportedly rushing to secure deals before its Meta Connect conference in September.
Contracts are reportedly temporary, with actors having the option to renew.
Meta has previously experimented with celebrity-inspired chatbots, though that program has ended.
In our exclusive interview with Mark Zuckerberg, he predicted that “we’re going to live in a world where there are going to be hundreds of millions or billions of different AI agents.” If that prediction holds true, celebrity voice-powered AI could be part of Meta’s next big play to drive user engagement and growth on the platform.
Google has signed a non-exclusive licensing agreement with AI startup Character AI for its large language model technology, while also reabsorbing the startup’s co-founders and key talent back into its AI team.
Character AI co-founders Noam Shazeer and Daniel De Freitas return to Google, their former employer.
Google gains a non-exclusive license to Character AI’s language model technology.
About 30 of Character AI’s 130 employees, mainly those working on model training and voice AI, will join Google’s Gemini AI efforts.
Character AI will switch to open-source models like Meta’s Llama 3.1 for its products, moving away from in-house models.
This deal highlights the intensifying race to secure top AI talent, mirroring Microsoft’s recent deal with Inflection and Amazon’s deal with Adept. As AI becomes increasingly critical to tech companies’ futures, these talent grabs could reshape the landscape, while raising antitrust concerns.
Tesla just filed a patent for an AI-powered vision system that could transform how autonomous robots perceive and navigate their environment using only camera inputs.
The system uses a single neural network to process camera data and output detailed 3D environment information without LiDAR or radar.
It divides space into 3D voxels, predicting occupancy, shape, semantic data, and motion for each in real time.
The tech is designed to run on a robot’s onboard computer, enabling immediate decision-making.
This system could be implemented in both Tesla’s vehicles and humanoid robots like Optimus.
By relying solely on camera inputs and onboard processing, Tesla’s new vision system could enable robots to navigate diverse environments more efficiently and adapt to changes in real time. This would eliminate the need for extensive pre-mapping and accelerate the arrival of affordable, autonomous robots.
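To illustrate the voxel idea, here is a toy sketch of what the output head of such a camera-only occupancy network could look like. Everything below (dimensions, layer shapes, the OccupancyHead name) is invented for illustration; Tesla's patent does not publish code.

```python
# Toy sketch of a camera-only occupancy network head; all sizes are made up.
import torch
import torch.nn as nn

class OccupancyHead(nn.Module):
    """Predict per-voxel occupancy, semantics, and motion from fused camera features."""
    def __init__(self, feat_dim=128, grid=(16, 16, 4), n_classes=8):
        super().__init__()
        self.grid = grid
        n_vox = grid[0] * grid[1] * grid[2]
        # per voxel: 1 occupancy logit + class logits + a 3D motion vector
        self.out_dim = 1 + n_classes + 3
        self.proj = nn.Linear(feat_dim, n_vox * self.out_dim)

    def forward(self, fused_camera_feats):           # (batch, feat_dim)
        out = self.proj(fused_camera_feats)
        out = out.view(-1, *self.grid, self.out_dim)
        occupancy = torch.sigmoid(out[..., 0])       # probability the voxel is filled
        semantics = out[..., 1:-3]                   # per-class logits
        motion = out[..., -3:]                       # velocity estimate per voxel
        return occupancy, semantics, motion

occ, sem, mot = OccupancyHead()(torch.randn(2, 128))
print(occ.shape, sem.shape, mot.shape)
```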
Neuralink successfully implants brain chip in second patient
Elon Musk’s brain-computer interface startup, Neuralink, has commenced its second human trial, revealing that the implant is successfully functioning with about 400 electrodes providing signals.
Musk claimed that Neuralink could bestow exceptional abilities such as thermal and eagle-like vision, and could potentially restore sight to the blind and cure neurological disorders in humans.
Despite some initial problems and federal investigations into animal testing practices, Neuralink has over 1,000 volunteers for further trials and plans to implant chips in up to eight more patients by the end of 2024.
OpenAI has a ‘highly accurate’ ChatGPT text detector, but won’t release it for now
OpenAI has an AI-detection tool that is highly effective at identifying AI-generated text, but the company hesitates to release it to avoid upsetting its user base.
The tool, reportedly 99.9% effective, is much more accurate than previous detection algorithms and utilizes a proprietary watermarking system to identify AI-created content.
Despite its potential to aid educators in spotting AI-generated homework, OpenAI is concerned that its technique could be deciphered and that it may be biased against non-native English speakers.
Elon Musk has filed a new lawsuit against OpenAI, Sam Altman, and Greg Brockman, accusing them of breaching the company’s founding mission to benefit humanity with artificial intelligence.
The lawsuit alleges that Altman and Brockman manipulated Musk into co-founding OpenAI by promising it would be safer and more transparent than profit-driven alternatives.
Musk previously withdrew a similar lawsuit in June, but the new suit claims that OpenAI violated federal racketeering laws and manipulated its contract with Microsoft.
Founders of Character.AI, Noam Shazeer and Daniel De Freitas, along with other team members, are rejoining Google’s AI unit DeepMind, the companies announced on Friday.
Character.AI reached a $1 billion valuation last year and plans to offer a nonexclusive license of its large language models to Google, which will help fund its growth and the development of personalized AI products.
The founders, who left Google in 2021 due to disagreements about advancing chatbot technologies, are now returning amid a competitive AI landscape and will contribute to DeepMind’s research team.
Google has withdrawn its “Dear Sydney” ad from the Olympics after receiving significant backlash from viewers and negative feedback on social media.
The controversial advertisement featured a father using the Gemini AI to write a fan letter to Olympic track star Sydney McLaughlin-Levrone on behalf of his daughter, instead of composing it together.
Critics argued that the ad missed the essence of writing a personal fan letter and feared it promoted AI as a substitute for genuine human expression.
The production of Nvidia’s “Blackwell” B200 AI chips has been delayed by at least three months due to a late-discovered design flaw, according to sources.
The B200 chips are successors to the highly sought-after H100 chips and were expected to power many AI cloud infrastructures, but now face production setbacks.
Nvidia is collaborating with Taiwan Semiconductor Manufacturing Company to address the issue, with large-scale shipments now anticipated in the first quarter of next year.
For the first time ever, Google DeepMind’s experimental Gemini 1.5 Pro has claimed the top spot on the AI Chatbot Arena leaderboard, surpassing OpenAI’s GPT-4o and Anthropic’s Claude-3.5 with an impressive score of 1300.
Gemini 1.5 Pro (experimental 0801) gathered over 12K community votes during a week of testing on the LMSYS Chatbot Arena.
The new experimental model achieved the #1 position on both the overall and vision leaderboards.
The experimental version is available for early testing in Google AI Studio, the Gemini API, and the LMSYS Chatbot Arena.
Google DeepMind hasn’t disclosed specific improvements, but promises more updates soon.
Without any announcement, Gemini 1.5 Pro unexpectedly rose to the top of the overall AI chatbot leaderboard — by a whopping 14 points. The leap means that either Google just quietly established itself as the new leader in the LLM space, or we’re on the cusp of major competitive responses from industry rivals.
Meta’s Llama 3.1 allows users to search the internet and train the AI to write in their personal style, saving time on content creation and research.
Access Llama 3.1 through Meta AI and log in with your Facebook or Instagram account.
Use the internet search feature by asking questions like “Summarize the Olympics highlights this week.”
Train Llama 3.1 in your voice by providing a sample of your best content and instructing it to mimic your style.
Generate content by asking Llama 3.1 to create posts on your desired topics.
Pro tip: The more examples and feedback you provide, the better Llama 3.1 will become at emulating your unique writing style!
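For reference, the “train it in your voice” pattern described above boils down to a few-shot prompt like the sketch below. The helper and sample texts are invented for illustration; the same structure can be pasted directly into Meta AI’s chat box.

```python
# A sketch of the few-shot "write in my voice" prompt pattern; sample texts are invented.
MY_SAMPLES = [
    "Shipping beats perfection. I'd rather post a rough draft today than a polished one never.",
    "Hot take: most productivity advice is procrastination in disguise.",
]

def style_prompt(topic: str, samples: list[str]) -> str:
    examples = "\n\n".join(f"Example {i + 1}:\n{s}" for i, s in enumerate(samples))
    return (
        "Here are samples of my writing:\n\n"
        f"{examples}\n\n"
        "Study the tone, sentence length, and vocabulary above, then write "
        f"a short post about: {topic}. Match my style exactly."
    )

print(style_prompt("this week's Olympics highlights", MY_SAMPLES))
```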
Stability AI just introduced Stable Fast 3D, an AI model that generates high-quality 3D assets from a single image in just 0.5 seconds — potentially reshaping industries from gaming to e-commerce.
The model creates complete 3D assets, including UV unwrapped mesh, material parameters, and albedo colors with reduced illumination bake-in.
It outperforms previous models, reducing generation time from 10 minutes to 0.5 seconds while maintaining high-quality output.
Stable Fast 3D is available on Hugging Face and through Stability AI’s API, under Stability AI’s Community License.
The leap from 10 minutes to 0.5 seconds for high-quality 3D asset generation is nothing short of insane. We’re entering a world where video games will soon feature infinite, dynamically generated assets, e-commerce will have instant 3D product previews, architects will see designs in real-time, and so much more.
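A hedged sketch of calling the model through Stability AI’s REST API is below. The endpoint path, field names, and output format are assumptions based on Stability’s v2beta API conventions; check the official API docs before relying on them.

```python
# Sketch of an image-to-3D request; endpoint path and fields are assumptions.
import requests

API_KEY = "sk-..."  # your Stability AI key (placeholder)
ENDPOINT = "https://api.stability.ai/v2beta/3d/stable-fast-3d"  # assumed path

with open("product_photo.png", "rb") as image:
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"image": image},  # single 2D input image
        timeout=120,
    )

resp.raise_for_status()
with open("asset.glb", "wb") as f:  # response assumed to be a GLB 3D asset
    f.write(resp.content)
```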
🔍 Gemma Scope: helping the safety community shed light on the inner workings of language models.
Explainable AI: one of the most requested features for LLMs is insight into how they reach their internal decisions, and this is a big step toward interpretability. From the tutorial: “This is a barebones tutorial on how to use Gemma Scope, Google DeepMind’s suite of Sparse Autoencoders (SAEs) on every layer and sublayer of Gemma 2 2B and 9B. Sparse Autoencoders are an interpretability tool that act like a ‘microscope’ on language model activations. They let us zoom in on dense, compressed activations, and expand them to a larger but sparser and seemingly more interpretable form, which can be a very useful tool when doing interpretability research!”
AI systems can be powerful but opaque “black boxes” – even to researchers who train them. ⬛
Enter Gemma Scope: a set of open tools made up of sparse autoencoders to help decode the inner workings of Gemma 2 models, and better address safety issues.
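In miniature, a sparse autoencoder expands a dense activation vector into a much wider, mostly-zero feature vector and then reconstructs the original. The sketch below shows that mechanic with an untrained SAE and made-up dimensions; Gemma Scope’s released SAEs use a JumpReLU variant and are trained, so treat this purely as a shape-and-flow illustration.

```python
import torch
import torch.nn as nn

d_model, d_sae = 256, 4096  # illustrative sizes; the SAE is much wider than the stream

W_enc = nn.Linear(d_model, d_sae)  # expand into many candidate "features"
W_dec = nn.Linear(d_sae, d_model)  # reconstruct the original activation

activation = torch.randn(1, d_model)      # stand-in for a Gemma 2 residual activation
features = torch.relu(W_enc(activation))  # sparse code (ReLU here; Gemma Scope uses JumpReLU)
reconstruction = W_dec(features)

sparsity = (features > 0).float().mean()
mse = (activation - reconstruction).pow(2).mean()
print(f"active features: {sparsity:.2%}, reconstruction MSE: {mse:.3f}")
```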
What else is happening in AI on August 02nd 2024
Google introduced three new AI features for Chrome, including Google Lens for desktop, Tab compare for product comparisons, and an improved browsing history search.
GitHub launched GitHub Models, a new platform allowing developers to access and experiment with various AI models directly on GitHub, including a playground, Codespaces integration, and deployment.
Healx, an AI-enabled drug discovery startup, raised $47 million in Series C funding and received regulatory clearance to start Phase 2 clinical trials for a new rare disease treatment in the U.S.
Google is facing backlash over its Gemini AI Olympics-themed ad, with critics arguing it promotes overreliance on AI tools at the expense of children’s learning and creativity.
Microsoft officially listed OpenAI as a competitor in AI offerings and search advertising in its annual report, despite their long-term partnership and Microsoft’s significant investment in the company.
Character AI open-sourced Prompt Poet, their innovative approach to prompt design, aiming to revolutionize how AI interactions are built and managed in production environments.
A Daily Chronicle of AI Innovations on August 01st 2024
Microsoft declares OpenAI as competitor
Meta is proving there’s still big AI hype on Wall Street
Reddit CEO says Microsoft needs to pay to search the site
Google launches three ‘open’ AI models prioritizing safety and transparency
Google’s tiny AI model bests GPT-3.5
Taco Bell’s AI drive-thru
AI reprograms brain cancer cells
Microsoft declares OpenAI as competitor
Microsoft has officially listed OpenAI as a competitor in AI, search, and news advertising in its latest annual report, signalling a shift in their relationship.
Despite Microsoft being the largest investor and exclusive cloud provider for OpenAI, both companies are now encroaching on each other’s market territories.
An OpenAI spokesperson indicated that their competitive dynamic was always expected as part of their partnership, and Microsoft still remains a strong partner for OpenAI.
Meta is proving there’s still big AI hype on Wall Street
Meta’s shares surged by about 7% in extended trading after surpassing Wall Street’s revenue and profit expectations and providing an optimistic forecast for the current period.
The company reported a 22% increase in second-quarter revenue to $39.07 billion and a 73% rise in net income, attributing the growth to gains in the digital ad market and cost-cutting measures.
Meta continues to invest heavily in AI and VR technologies, with plans for significant capital expenditure growth in 2025 to support AI research and development, despite a broader downsizing effort.
Google launches three ‘open’ AI models prioritizing safety and transparency
Google has unveiled three new models to the Gemma 2 lineup, building on the original models released in June 2024, focusing on performance and safety enhancements.
The first addition, Gemma 2 2B, provides improved capabilities and is adaptable for various devices, while ShieldGemma and Gemma Scope focus on content safety and model interpretability, respectively.
These new tools and models are available on platforms like Kaggle and Hugging Face, promoting broader use and development within the AI community with a focus on responsible innovation.
Researchers at USC made a breakthrough using AI to reprogram glioblastoma cells into immune-activating dendritic cells in mouse models, potentially revolutionizing treatment for the deadly brain cancer.
Glioblastoma is the deadliest adult brain cancer, with less than 10% of patients surviving five years after diagnosis.
AI identified genes that can convert glioblastoma cells into dendritic cells (DCs), which sample cancer antigens and activate other immune cells to attack the tumor.
In mouse models, this approach increased survival chances by up to 75% when combined with immune checkpoint therapy.
Researchers have also identified human genes that could potentially reprogram human glioblastoma cells, paving the way for future clinical trials.
By turning cancer cells against themselves, this new research offers a novel way to fight tumors from within. If the 75% increased survival chances in mice translate to humans, this could not only revolutionize glioblastoma treatment but potentially open doors for similar approaches in other hard-to-treat cancers.
Taco Bell’s parent company, Yum Brands, just announced plans to roll out AI-powered drive-thru ordering at hundreds of restaurants in the U.S. by the end of 2024, with ambitions for global implementation.
The AI understands orders, auto-inputs them into the system, and even suggests additional items — potentially increasing sales through upselling.
Over 100 Taco Bell restaurants in the U.S. already use voice AI in drive-thrus.
The company has been testing the AI for over two years and claims it has outperformed humans in accuracy, reduced wait times, and decreased employee workload.
Rivals like Wendy’s and White Castle are also experimenting with AI ordering, while McDonald’s recently ended its IBM partnership for similar tech.
If Taco Bell’s positive results from its two-year test are any indication, this large-scale AI implementation could change the way fast-food chains operate and how we order food at drive-thrus. However, the success (or failure) of this rollout could set the tone for the entire industry’s adoption.
Google just unveiled Gemma 2 2B, a lightweight AI model with just 2B parameters that outperforms much larger models like GPT-3.5 and Mixtral 8x7B on key benchmarks.
Gemma 2 2B boasts just 2.6B parameters, but was trained on a massive 2 trillion token dataset.
It scores 1130 on the LMSYS Chatbot Arena, matching GPT-3.5-Turbo-0613 (1117) and Mixtral-8x7b (1114) — models 10x its size.
Other notable key benchmark scores include 56.1 on MMLU and 36.6 on MBPP, beating its predecessor by over 10%.
The model is open-source, and developers can download the model’s weights from Google’s announcement page.
As we enter a new era of on-device, local AI, lightweight and efficient models are crucial for running AI directly on our phones and laptops. With Gemma 2 beating GPT-3.5 Turbo at just 1/10th the size, Google isn’t just showing what’s possible — they’re cementing their position as the leader in the small model space.
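Since the weights are openly available, trying the model locally is a few lines with Hugging Face transformers. The sketch below assumes the instruction-tuned checkpoint is published on the Hub as google/gemma-2-2b-it and that you have accepted Google’s license there.

```python
# Minimal local inference with Gemma 2 2B via transformers (checkpoint id assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain RLHF in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```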
Google expanded access to its “About this image” tool, making it available through Circle to Search and Google Lens, allowing users to quickly get context on images they encounter online or via messaging.
NEURA, a German robotics company, released a new video showcasing its humanoid robot 4NE-1 performing tasks like chopping vegetables, ironing clothes, solving puzzles, and more. Source: https://x.com/TheHumanoidHub/status/1818726046633804184
Synthesia introduced “Personal Avatars,” AI-generated lifelike avatars created from brief webcam or phone footage, allowing users to create short-form videos for social media in multiple languages. Source: https://www.synthesia.io/features/custom-avatar/persona
Enjoying these FREE AI updates without the clutter? Set yourself up for a promotion or a better job by acing the AWS Certified Data Engineer Associate Exam (DEA-C01) with the book or app below:
A Daily Chronicle of AI Innovations in January 2024.
Welcome to ‘Navigating the Future,’ a premier portal for insightful and up-to-the-minute commentary on the evolving world of Artificial Intelligence in January 2024. In an age where technology outpaces our expectations, we delve deep into the AI cosmos, offering daily snapshots of revolutionary breakthroughs, pivotal industry transitions, and the ingenious minds shaping our digital destiny. Join us on this exhilarating journey as we explore the marvels and pivotal milestones in AI, day by day. Stay informed, stay inspired, and witness the chronicle of AI as it unfolds in real-time.
Are you eager to expand your understanding of artificial intelligence? Look no further than the essential book “AI Unraveled: Master GPT-4, Gemini, Generative AI & LLMs – Simplified Guide for Everyday Users: Demystifying Artificial Intelligence – OpenAI, ChatGPT, Google Bard, AI ML Quiz, AI Certifications Prep, Prompt Engineering,” available at Etsy, Shopify, Apple, Google, or Amazon.
A Daily Chronicle of AI Innovations in January 2024 – Day 31: AI Daily News – January 31st, 2024
Microsoft CEO responds to AI-generated Taylor Swift fake nude images
Microsoft CEO Satya Nadella addresses the issue of AI-generated fake nude images of Taylor Swift, emphasizing the need for safety and guardrails in AI technology.
Microsoft CEO Satya Nadella acknowledges the need to act swiftly against nonconsensual deepfake images.
The AI-generated fake nude pictures of Taylor Swift have gained over 27 million views.
Microsoft, a major AI player, emphasizes the importance of online safety for both content creators and consumers.
Microsoft’s AI Code of Conduct prohibits creating adult or non-consensual intimate content. This policy is a part of the company’s commitment to ethical AI use and responsible content creation.
The deepfake images were reportedly created using Microsoft’s AI tool, Designer, which the company is investigating.
Microsoft is committed to enhancing content safety filters and addressing misuse of their services.
Elon Musk’s $56 billion pay package cancelled in court
A Delaware judge ruled against Elon Musk’s $56 billion pay package from Tesla, necessitating a new compensation proposal by the board.
The ruling, which could impact Musk’s wealth ranking, was based on the argument that shareholders were misled about the plan’s formulation and the board’s independence.
The case highlighted the extent of Musk’s influence over Tesla and its board, with key witnesses admitting they were cooperating with Musk rather than negotiating against him.
Google spent billions of dollars to lay people off
Google spent $2.1 billion on severance and other expenses for laying off over 12,000 employees in 2023, with an additional $700 million spent in early 2024 for further layoffs.
In 2023, Google achieved a 13 percent revenue increase year over year, amounting to $86 billion, with significant growth in its core digital ads, cloud computing businesses, and investments in generative AI.
The company also incurred a $1.8 billion cost for closing physical offices in 2023, and anticipates more layoffs in 2024 as it continues investing in AI technology under its “Gemini era”.
ChatGPT now lets you pull other GPTs into the chat
OpenAI introduced a feature allowing custom ChatGPT-powered chatbots to be tagged with an ‘@’ in the prompt, enabling easier switching between bots.
The ability to build and train custom GPT-powered chatbots was initially offered to OpenAI’s premium ChatGPT Plus subscribers in November 2023.
Despite the new feature and the GPT Store, custom GPTs currently account for only about 2.7% of ChatGPT’s worldwide web traffic, with a month-over-month decline in custom GPT traffic since November.
The NYT is building a team to explore AI in the newsroom
The New York Times is starting a team to investigate how generative AI can be used in its newsroom, led by newly appointed AI initiatives head Zach Seward.
This new team will comprise machine learning engineers, software engineers, designers, and editors to prototype AI applications for reporting and presentation of news.
Despite its complicated past with generative AI, including a lawsuit against OpenAI, the Times emphasizes that its journalism will continue to be created by human journalists.
The tiny Caribbean island making a fortune from AI
The AI boom has led to a significant increase in interest and sales of .ai domains, contributing approximately $3 million per month to Anguilla’s budget due to its association with artificial intelligence.
Vince Cate, a key figure in managing the .ai domain for Anguilla, highlights the surge in domain registrations following the release of ChatGPT, boosting the island’s revenue and making a substantial impact on its economy.
Unlike Tuvalu with its .tv domain, Anguilla manages its domain registrations locally, allowing the government to retain most of the revenue, which has been used for financial improvements such as paying down debt and eliminating property taxes on residential buildings.
A Daily Chronicle of AI Innovations in January 2024 – Day 30: AI Daily News – January 30th, 2024
Meta released Code Llama 70B, rivals GPT-4
Meta released Code Llama 70B, a new, more performant version of its LLM for code generation. It is available under the same license as previous Code Llama models.
CodeLlama-70B-Instruct achieves 67.8 on HumanEval, making it one of the highest-performing open models available today. CodeLlama-70B is the most performant base for fine-tuning code generation models.
Why does this matter?
This makes Code Llama 70B the best-performing open-source model for code generation, beating GPT-4 and Gemini Pro. This can have a significant impact on the field of code generation and the software development industry, as it offers a powerful and accessible tool for creating and improving code.
Neuralink implants its brain chip in the first human
In a first, Elon Musk’s brain-machine interface startup, Neuralink, has successfully implanted its brain chip in a human. In a post on X, Musk said “promising” brain activity had been detected after the procedure and that the patient was “recovering well.”
The company’s goal is to connect human brains to computers to help tackle complex neurological conditions. It was given permission to test the chip on humans by the FDA in May 2023.
As Musk put it, imagine if Stephen Hawking could communicate faster than a speed typist or auctioneer. That is the goal. This product will enable control of your phone or computer, and through them almost any device, just by thinking. Initial users will be those who have lost the use of their limbs.
Alibaba announces Qwen-VL; beats GPT-4V and Gemini
Alibaba’s Qwen-VL series has undergone a significant upgrade with the launch of two enhanced versions, Qwen-VL-Plus and Qwen-VL-Max. The key technical advancements in these versions include
Substantial boost in image-related reasoning capabilities;
Considerable enhancement in recognizing, extracting, and analyzing details within images and texts contained therein;
Support for high-definition images with resolutions above one million pixels and images of various aspect ratios.
Compared to the open-source version of Qwen-VL, these two models perform on par with Gemini Ultra and GPT-4V in multiple text-image multimodal tasks, significantly surpassing the previous best results from open-source models.
This sets new standards in the field of multimodal AI research and application. These models match the performance of GPT4-v and Gemini, outperforming all other open-source and proprietary models in many tasks.
What Else Is Happening in AI on January 30th, 2024
OpenAI partners with Common Sense Media to collaborate on AI guidelines.
OpenAI will work with Common Sense Media, the nonprofit organization that reviews and ranks the suitability of various media and tech for kids, to collaborate on AI guidelines and education materials for parents, educators, and young adults. It will curate “family-friendly” GPTs based on Common Sense’s rating and evaluation standards. (Link)
Apple’s ‘biggest’ iOS update may bring a lot of AI to iPhones.
Apple’s upcoming iOS 18 update is expected to be one of the biggest in the company’s history. It will leverage generative AI to provide a smarter Siri and enhance the Messages app. Apple Music, iWork apps, and Xcode will also incorporate AI-powered features. (Link)
Shortwave email client will show AI-powered summaries automatically.
Shortwave, an email client built by former Google engineers, is launching new AI-powered features such as instant summaries that will show up atop an email, a writing assistant to echo your writing and extending its AI assistant function to iOS and Android, and multi-select AI actions. All these features are rolling out starting this week. (Link)
OpenAI CEO Sam Altman explores AI chip collaboration with Samsung and SK Group.
Sam Altman has traveled to South Korea to meet with Samsung Electronics and SK Group to discuss the formation of an AI semiconductor alliance and investment opportunities. He is also said to have expressed a willingness to purchase HBM (High Bandwidth Memory) technology from them. (Link)
Generative AI is seen as helping to identify M&A targets, Bain says.
Deal makers are turning to AI and generative AI tools to source data, screen targets, and conduct due diligence at a time of heightened regulatory concerns around mergers and acquisitions, Bain & Co. said in its annual report on the industry. In the survey, 80% of respondents plan to use AI for deal-making. (Link)
Neuralink has implanted its first brain chip in human LINK
Elon Musk’s company Neuralink has successfully implanted its first device into a human.
The initial application of Neuralink’s technology is focused on helping people with quadriplegia control devices with their thoughts, using a fully-implantable, wireless brain-computer interface.
Neuralink’s broader vision includes facilitating human interaction with artificial intelligence via thought, though immediate efforts are targeted towards aiding individuals with specific neurological conditions.
OpenAI partners with Common Sense Media to collaborate on AI guidelines LINK
OpenAI announced a partnership with Common Sense Media to develop AI guidelines and create educational materials for parents, educators, and teens, including curating family-friendly GPTs in the GPT store.
The partnership was announced by OpenAI CEO Sam Altman and Common Sense Media CEO James Steyer at the Common Sense Summit for America’s Kids and Families in San Francisco.
Common Sense Media, which has started reviewing AI assistants including OpenAI’s ChatGPT, aims to guide safe and responsible AI use among families and educators without showing favoritism towards OpenAI.
New test detects ovarian cancer earlier thanks to AI LINK
Scientists have developed a 93% accurate early screening test for ovarian cancer using artificial intelligence and machine learning, promising improved early detection for this and potentially other cancers.
The test analyzes a woman’s metabolic profile to accurately assess the likelihood of having ovarian cancer, providing a more informative and precise diagnostic approach compared to traditional methods.
Georgia Tech researchers utilized machine learning and mass spectrometry to detect unique metabolite characteristics in the blood, enabling the early and accurate diagnosis of ovarian cancer, with optimism for application in other cancer types.
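To give a flavor of the general approach (a classifier over metabolite features), here is a toy sketch on synthetic data. This is emphatically not the Georgia Tech team’s model, data, or accuracy; it only shows the shape of the pipeline.

```python
# Toy illustration: classify synthetic "metabolite profiles"; all data is fake.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))  # 40 synthetic metabolite levels per sample
y = (X[:, :5].sum(axis=1) + rng.normal(scale=2, size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"toy accuracy: {accuracy_score(y_te, clf.predict(X_te)):.2f}")
```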
A Daily Chronicle of AI Innovations in January 2024 – Day 29: AI Daily News – January 29th, 2024
OpenAI reveals new models, drop prices, and fixes ‘lazy’ GPT-4
OpenAI announced a new generation of embedding models, new GPT-4 Turbo and moderation models, new API usage management tools, and lower pricing on GPT-3.5 Turbo.
Introducing new ways for developers to manage API keys and understand API usage
Quietly implemented a new ‘GPT mentions’ feature to ChatGPT (no official announcement yet). The feature allows users to integrate GPTs into a conversation by tagging them with an ‘@.’
The new embedding models and GPT-4 Turbo will likely enable more natural conversations and fluent text generation. Lower pricing and easier API management also open up access and usability for more developers.
Moreover, the updated GPT-4 Turbo preview model, gpt-4-0125-preview, can better complete tasks such as code generation compared to the previous model. GPT-4 Turbo has been the object of many complaints about its performance, including claims that it was acting lazy, and OpenAI has now addressed that issue.
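For developers, the new embedding models are a one-call change in the official openai Python SDK (v1 interface); text-embedding-3-small below is one of the newly announced models.

```python
# Requesting embeddings from one of the new models via the openai SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["The food was delicious.", "The service was slow."],
)
vectors = [item.embedding for item in resp.data]
print(len(vectors), len(vectors[0]))  # 2 embeddings, each a list of floats
```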
Prophetic – This company wants AI to enter your dreams
Prophetic introduces Morpheus-1, the world’s 1st ‘multimodal generative ultrasonic transformer’. This innovative AI device is crafted with the purpose of exploring human consciousness through controlling lucid dreams. Morpheus-1 monitors sleep phases and gathers dream data to enhance its AI model.
Morpheus-1 is not prompted with words and sentences but rather brain states. It generates ultrasonic holograms for neurostimulation to bring one to a lucid state.
Its transformer model, reported at 103M parameters, was trained on 8 GPUs for 2 days.
Engineered from scratch, with a provisional utility patent application filed.
The device is set to be accessible to beta users in the spring of 2024.
Prophetic is pioneering new techniques for AI to understand and interface with the human mind by exploring human consciousness and dreams through neurostimulation and multimodal learning. This pushes boundaries to understand consciousness itself.
If Morpheus-1 succeeds, it could enable transformative applications of AI for expanding human potential and treating neurological conditions.
Also, this is reportedly the first model that can fully utilize the capabilities offered by multi-element ultrasonic transducer arrays, composing “symphonies” of neurostimulation.
This paper ‘MM-LLMs’ discusses recent advancements in MultiModal LLMs which combine language understanding with multimodal inputs or outputs. The authors provide an overview of the design and training of MM-LLMs, introduce 26 existing models, and review their performance on various benchmarks.
(The paper includes a timeline of existing MM-LLMs.)
They also share key training techniques to improve MM-LLMs and suggest future research directions. Additionally, they maintain a real-time tracking website for the latest developments in the field. This survey aims to facilitate further research and advancement in the MM-LLMs domain.
Why does this matter?
The overview of models, benchmarks, and techniques will accelerate research in this critical area. By integrating multiple modalities like image, video, and audio, these models can understand the world more comprehensively.
What Else Is Happening in AI on January 29th, 2024
Update from Hugging Face LMSYS Chatbot Arena Leaderboard
Google’s Bard surpasses GPT-4 to the Second spot on the leaderboard! (Link)
Google Cloud has partnered with Hugging Face to advance Gen AI development
The partnership aims to meet the growing demand for AI tools and models that are optimized for specific tasks. Hugging Face’s repository of open-source AI software will be accessible to developers using Google Cloud’s infrastructure. The partnership reflects a trend of companies wanting to modify or build their own AI models rather than using off-the-shelf options. (Link)
Arc Search combines a browser, search engine, and AI for a unique browsing experience
Instead of returning a list of search queries, Arc Search builds a webpage with relevant information based on the search query. The app, developed by The Browser Company, is part of a bigger shift for their Arc browser, which is also introducing a cross-platform syncing system called Arc Anywhere. (Link)
PayPal is set to launch new AI-based products
The new products will use AI to enable merchants to reach new customers based on their shopping history and recommend personalized items in email receipts. (Link)
Apple Podcasts in iOS 17.4 now offers AI transcripts for almost every podcast
This is made possible by advancements in automatic speech recognition, which can convert spoken words into text. Users testing the beta version of iOS 17.4 have discovered that most podcasts in their library now come with transcripts. However, there are some exceptions, such as podcasts added from external sources. As this feature is still in beta, there is no information available regarding its implementation or accuracy. (Link)
Google’s Gemini Pro beats GPT-4
Google’s Gemini Pro has surpassed OpenAI’s GPT-4 on the HuggingFace Chat Bot Arena Leaderboard, securing the second position.
Gemini Pro is only the middle tier of Google’s planned models, with the top-tier Ultra expected to be released sometime soon.
Competition is heating up with Meta’s upcoming Llama 3, which is speculated to outperform GPT-4.
iOS 18 could be the ‘biggest’ software update in iPhone history
iOS 18 is predicted to be one of the most significant updates in iPhone history, with Apple planning major new AI-driven features and designs.
Apple is investing over $1 billion annually in AI development, aiming for an extensive overhaul of features like Siri, Messages, and Apple Music with AI improvements in 2024.
The update will introduce RCS messaging support, enhancing messaging between iPhones and Android devices by providing features like read receipts and higher-resolution media sharing.
Nvidia’s tech rivals are racing to cut their dependence
Amazon, Google, Meta, and Microsoft are developing their own AI chips to reduce dependence on Nvidia, which dominates the AI chip market and accounts for more than 70% of sales.
These tech giants are investing heavily in AI chip development to control costs, avoid shortages, and potentially sell access to their chips through their cloud services, while balancing their competition and partnership with Nvidia.
Nvidia sold 2.5 million chips last year, and its sales increased by 206% over the past year, adding about a trillion dollars in market value.
Amazon abandons $1.4 billion deal to buy Roomba maker iRobot
Amazon’s planned $1.4 billion acquisition of Roomba maker iRobot has been canceled due to lack of regulatory approval in the European Union, leading Amazon to pay a $94 million termination fee to iRobot.
iRobot announced a restructuring plan that includes laying off about 350 employees, which is roughly 31 percent of its workforce, and a shift in leadership with Glen Weinstein serving as interim CEO.
The European Commission’s concerns over potential restrictions on competition in the robot vacuum cleaner market led to the deal’s termination, emphasizing fears that Amazon could limit the visibility of competing products.
Arc Search combines browser, search engine, and AI into something new and different
Arc Search, developed by The Browser Company, unveiled an iOS app that combines browsing, searching, and AI to deliver comprehensive web page summaries based on user queries.
The app represents a shift towards integrating browser functionality with AI capabilities, offering features like “Browse for me” that automatically gathers and presents information from across the web.
While still in development, Arc Search aims to redefine web browsing by compiling websites into single, informative pages.
AlphaGeometry: An Olympiad Level AI System for Geometry by Google Deepmind
One of the signs of intelligence is being able to solve mathematical problems, and that is exactly what Google has achieved with its new AlphaGeometry system. And not just basic math problems, but problems from the International Mathematical Olympiad, one of the hardest math exams in the world. In today’s post, we take a deep dive into how Google achieved this seemingly impossible task, and ask whether we have truly created an AGI or not.
1. Problem Generation and Initial Analysis
Creation of a Geometric Diagram: AlphaGeometry starts by generating a geometric diagram. This could be a triangle with various lines and points marked, each with specific geometric properties.
Initial Feature Identification: Using its neural language model, AlphaGeometry identifies and labels basic geometric features like points, lines, angles, and circles.
2. Exhaustive Relationship Derivation
Pattern Recognition: The language model, trained on geometric data, recognizes patterns and potential relationships in the diagram, such as parallel lines, angle bisectors, or congruent triangles.
Formal Geometric Relationships: The symbolic deduction engine takes these initial observations and deduces formal geometric relationships, applying theorems and axioms of geometry.
3. Algebraic Translation and Gaussian Elimination
Translation to Algebraic Equations: Where necessary, geometric conditions are translated into algebraic equations. For instance, the properties of a triangle might be represented as a set of equations.
Applying Gaussian Elimination: When solving a system of linear equations becomes essential, AlphaGeometry implicitly uses Gaussian elimination, manipulating the rows of the equation matrix to derive solutions (a minimal sketch of this step follows the list).
Integration of Algebraic Solutions: The solutions from Gaussian elimination are then integrated back into the geometric context, aiding further deductions or the completion of proofs.
4. Deductive Reasoning and Proof Construction
Further Deductions: The symbolic deduction engine continues to apply geometric logic to the problem, integrating the algebraic solutions and deriving new geometric properties or relationships.
Proof Construction: The system constructs a proof by logically arranging the deduced properties and relationships. This is an iterative process in which the system might add auxiliary constructs or explore different reasoning paths.
5. Iterative Refinement and Traceback
Adding Constructs: If the current information is insufficient to reach a conclusion, the language model suggests adding new constructs (like a new line or point) to the diagram.
Traceback for Additional Constructs: In this iterative process, AlphaGeometry analyzes how these additional elements might lead to a solution, continuously refining its approach.
6. Verification and Readability Improvement
Solution Verification: Once a solution is found, it is verified for accuracy against the rules of geometry.
Improving Readability: Since steps involving Gaussian elimination are not explicitly detailed, a current challenge is enhancing the readability of these solutions, possibly through higher-level abstraction or more detailed step-by-step explanations.
7. Learning and Data Generation
Synthetic Data Generation: Each problem solved contributes to a vast dataset of synthetic geometric problems and solutions, enriching AlphaGeometry’s learning base.
Training on Synthetic Data: This dataset lets the system learn from a wide variety of geometric problems, enhancing its pattern recognition and deductive reasoning capabilities.
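As promised in step 3, here is that step in miniature: Gaussian elimination with partial pivoting on a small linear system of the kind a set of geometric conditions might reduce to. The numbers are illustrative only.

```python
# Gaussian elimination with partial pivoting on a tiny illustrative system.
import numpy as np

def gaussian_eliminate(A, b):
    """Solve Ax = b by forward elimination, then back-substitution."""
    A, b = A.astype(float).copy(), b.astype(float).copy()
    n = len(b)
    for col in range(n):
        pivot = col + np.argmax(np.abs(A[col:, col]))  # partial pivoting
        A[[col, pivot]], b[[col, pivot]] = A[[pivot, col]], b[[pivot, col]]
        for row in range(col + 1, n):
            factor = A[row, col] / A[col, col]
            A[row, col:] -= factor * A[col, col:]
            b[row] -= factor * b[col]
    x = np.zeros(n)
    for row in range(n - 1, -1, -1):  # back-substitution
        x[row] = (b[row] - A[row, row + 1:] @ x[row + 1:]) / A[row, row]
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # e.g. two linear relations between unknowns
b = np.array([5.0, 10.0])
print(gaussian_eliminate(A, b))  # [1. 3.]
```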
A Daily Chronicle of AI Innovations in January 2024 – Day 27: AI Daily News – January 27th, 2024
Taylor Swift deepfakes spark calls for new laws
US politicians have advocated for new legislation in response to the circulation of explicit deepfake images of Taylor Swift on social media, which were viewed millions of times.
X is actively removing the fake images of Taylor Swift and enforcing actions against the violators under its ‘zero-tolerance policy’ for such content.
Deepfakes have seen a 550% increase since 2019, with 99% of these targeting women, leading to growing concerns about their impact on emotional, financial, and reputational harm.
Spotify accuses Apple of ‘extortion’ with new App Store tax
Spotify criticizes Apple’s new app installation fee, calling it “extortion” and arguing it will hurt developers, especially those offering free apps.
The fee requires developers using third-party app stores to pay €0.50 for each annual app install after 1 million downloads, a cost Spotify says could significantly increase customer acquisition costs.
Apple defends the new fee structure, claiming it offers developers choice and maintains that more than 99% of developers would pay the same or less, despite widespread criticism.
Netflix co-CEO says Apple’s Vision Pro isn’t worth their time yet
Netflix co-CEO Greg Peters described the Apple Vision Pro as too “subscale” for the company to invest in, noting it’s not relevant for most Netflix members at this point.
Netflix has decided not to launch a dedicated app for the Vision Pro, suggesting users access Netflix through a web browser on the device instead.
The Vision Pro, priced at $3,499 and going on sale February 2, will offer native apps for several streaming services but not for Netflix, which also hasn’t updated its app for Meta’s Quest line in a while.
Scientists design a two-legged robot powered by muscle tissue
Scientists from Japan have developed a two-legged biohybrid robot powered by muscle tissues, enabling it to mimic human gait and perform tasks like walking and pivoting.
The robot, designed to operate underwater, combines lab-grown skeletal muscle tissues and silicone rubber materials to achieve movements through electrical stimulation.
The research, published in the journal Matter, marks progress in the field of biohybrid robotics, with future plans to enhance movement capabilities and sustain living tissues for air operation.
OpenAI and other tech giants will have to warn the US government when they start new AI projects
The Biden administration will require tech companies like OpenAI, Google, and Amazon to inform the US government about new AI projects employing substantial computing resources.
This government notification requirement is designed to provide insights into sensitive AI developments, including details on computing power usage and safety testing.
The mandate, stemming from a broader executive order from October, aims to enhance oversight over powerful AI model training, including those developed by foreign companies using US cloud computing services.
Stability AI introduces Stable LM 2 1.6B Nightshade, the data poisoning tool, is now available in v1 AlphaCodium: A code generation tool that beats human competitors Meta’s novel AI advances creative 3D applications ElevenLabs announces new AI products + Raised $80M TikTok’s Depth Anything sets new standards for Depth Estimation Google Chrome and Ads are getting new AI features Google Research presents Lumiere for SoTA video generation Binoculars can detect over 90% of ChatGPT-generated text Meta introduces guide on ‘Prompt Engineering with Llama 2′ NVIDIA’s AI RTX Video HDR transforms video to HDR quality Google introduces a model for orchestrating robotic agents
A Daily Chronicle of AI Innovations in January 2024 – Day 26: AI Daily News – January 26th, 2024
Tech Layoffs Surge to over 24,000 so far in 2024
The tech industry has seen nearly 24,000 layoffs in early 2024, more than doubling in one week. As giants cut staff, many are expanding in AI – raising concerns about automation’s impact. (Source)
Mass Job Cuts
Microsoft eliminated 1,900 gaming roles months after a $69B Activision buy.
Layoffs.fyi logs over 23,600 tech job cuts so far this year.
Morale suffers at Apple, Meta, Microsoft and more as layoffs mount.
AI Advances as Jobs Decline
Google, Amazon, Dataminr and Spotify made cuts while promoting new AI tools.
Neil C. Hughes: “Celebrating AI while slashing jobs raises questions.”
Firms shift resources toward generative AI like ChatGPT.
Concentrated Pain
Nearly 24,000 losses stemmed from just 82 companies.
In 2023, ~99 firms cut monthly – more distributed pain.
Concentrated layoffs inflict severe damage on fewer firms.
When everyone moves to AI powered search, Google has to change the monetization model otherwise $1.1 trillion is gone yearly from the world economy
Was thinking recently that everything right now on the internet is there because someone wants to make money (ad revenue, subscriptions, affiliate marketing, SEO etc). If everyone uses AI powered search, how exactly will this monetization model work. Nobody gets paid anymore.
WordPress ecosystem $600b, Google ads $200b, Shopify $220b, affiliate marketing $17b – not to mention infra costs that will wobble until this gets fixed.
What type of ad revenue – incentives can Google come up with to keep everyone happy once they roll out AI to their search engine?
AI rolled out in India declares people dead, denies food to thousands
The deployment of AI in India’s welfare systems has mistakenly declared thousands of people dead, denying them access to subsidized food and welfare benefits.
Recap of what happened:
AI algorithms in Indian welfare systems have led to the removal of eligible beneficiaries, particularly affecting those dependent on food security and pension schemes.
The algorithms have made significant errors, such as falsely declaring people dead, resulting in the suspension of their welfare benefits.
The transition from manual identification and verification by government officials to AI algorithms has led to the removal of 1.9 million claimant cards in Telangana.
If AI models violate copyright, US federal courts could order them to be destroyed
TLDR: Under copyright law, courts do have the power to issue destruction orders. Copyright law has never been used to destroy AI models specifically, but the law has been increasingly open to the idea of targeting AI. It’s probably not going to happen to OpenAI but might possibly happen to other generative AI models in the future.
Microsoft, Amazon and Google face FTC inquiry over AI deals LINK
The FTC is investigating investments by big tech companies like Microsoft, Amazon, and Alphabet into AI firms OpenAI and Anthropic to assess their impact on competition in generative AI.
The FTC’s inquiry focuses on how these investments influence the competitive dynamics, product releases, and oversight within the AI sector, requesting detailed information from the involved companies.
Microsoft, Amazon, and Google have made significant investments in OpenAI and Anthropic, establishing partnerships that potentially affect market share, competition, and innovation in artificial intelligence.
OpenAI cures GPT-4 ‘laziness’ with new updates LINK
OpenAI updated GPT-4 Turbo to more thoroughly complete tasks like code generation, aiming to reduce its ‘laziness’ in task completion.
GPT-4 Turbo, distinct from the widely used GPT-4, benefits from data up to April 2023, while standard GPT-4 uses data until September 2021.
Future updates for GPT-4 Turbo will include general availability with vision capabilities and the launch of more efficient AI models, such as embeddings to enhance content relationship understanding.
A Daily Chronicle of AI Innovations in January 2024 – Day 25: AI Daily News – January 25th, 2024
Meta introduces guide on ‘Prompt Engineering with Llama 2′
Meta introduces ‘Prompt Engineering with Llama 2’, It’s an interactive guide created by research teams at Meta that covers prompt engineering & best practices for developers, researchers & enthusiasts working with LLMs to produce stronger outputs. It’s the new resource created for the Llama community.
Having these resources helps the LLM community learn how to craft better prompts that lead to more useful model responses. Overall, it enables people to get more value from LLMs like Llama.
NVIDIA’s AI RTX Video HDR transforms video to HDR quality
NVIDIA released AI RTX Video HDR, which transforms video to HDR quality, It works with RTX Video Super Resolution. The HDR feature requires an HDR10-compliant monitor.
RTX Video HDR is available in Chromium-based browsers, including Google Chrome and Microsoft Edge. To enable the feature, users must download and install the January Studio driver, enable Windows HDR capabilities, and enable HDR in the NVIDIA Control Panel under “RTX Video Enhancement.”
Why does this matter?
AI RTX Video HDR provides a new way for people to enhance the Video viewing experience. Using AI to transform standard video into HDR quality makes the content look much more vivid and realistic. It also allows users to experience cinematic-quality video through commonly used web browsers.
Google introduces a model for orchestrating robotic agents
Google introduces AutoRT, a model for orchestrating large-scale robotic agents. It’s a system that uses existing foundation models to deploy robots in new scenarios with minimal human supervision. AutoRT leverages vision-language models for scene understanding and grounding and LLMs for proposing instructions to a fleet of robots.
By tapping into the knowledge of foundation models, AutoRT can reason about autonomy and safety while scaling up data collection for robot learning. The system successfully collects diverse data from over 20 robots in multiple buildings, demonstrating its ability to align with human preferences.
Why does this matter?
This allows for large-scale data collection and training of robotic systems while also reasoning about key factors like safety and human preferences. AutoRT represents a scalable approach to real-world robot learning that taps into the knowledge within foundation models. This could enable faster deployment of capable and safe robots across many industries.
January 2024 – Week 4 in AI: all the Major AI developments in a nutshell
Amazon presents Diffuse to Choose, a diffusion-based image-conditioned inpainting model that allows users to virtually place any e-commerce item in any setting, ensuring detailed, semantically coherent blending with realistic lighting and shadows. Code and demo will be released soon [Details].
OpenAI announced two new embedding models, new GPT-4 Turbo and moderation models, new API usage management tools, and lower pricing on GPT-3.5 Turbo. The updated GPT-4 Turbo preview model reduces cases of “laziness” where the model doesn’t complete a task. The new embedding models include a smaller and highly efficient text-embedding-3-small model, and a larger and more powerful text-embedding-3-large model. [Details].
Hugging Face and Google partner to support developers building AI applications [Details].
Adept introduced Adept Fuyu-Heavy, a new multimodal model designed specifically for digital agents. Fuyu-Heavy scores higher on the MMMU benchmark than Gemini Pro [Details].
Fireworks.ai has open-sourced FireLLaVA, a LLaVA multi-modality model trained on OSS LLM generated instruction following data, with a commercially permissive license. Firewroks.ai is also providing both the completions API and chat completions API to devlopers [Details].
01.AI released Yi Vision Language (Yi-VL) model, an open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images. Yi-VL adopts the LLaVA architecture and is free for commercial use. Yi-VL-34B is the first open-source 34B vision language model worldwide [Details].
Tencent AI Lab introduced WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites [Paper].
Prophetic introduced MORPHEUS-1, a multi-modal generative ultrasonic transformer model designed to induce and stabilize lucid dreams from brain states. Instead of generating words, Morpheus-1 generates ultrasonic holograms for neurostimulation to bring one to a lucid state [Details].
Google Research presented Lumiere – a space-time video diffusion model for text-to-video, image-to-video, stylized generation, inpainting and cinemagraphs [Details].
TikTok released Depth Anything, an image-based depth estimation method trained on 1.5M labeled images and 62M+ unlabeled images jointly [Details].
Nightshade, the free tool that ‘poisons’ AI models, is now available for artists to use [Details].
Stability AI released Stable LM 2 1.6B, 1.6 billion parameter small language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch. Stable LM 2 1.6B can be used now both commercially and non-commercially with a Stability AI Membership [Details].
Etsy launched ‘Gift Mode,’ an AI-powered feature designed to match users with tailored gift ideas based on specific preferences [Details].
Google DeepMind presented AutoRT, a framework that uses foundation models to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision. In AutoRT, a VLM describes the scene, an LLM generates robot goals and filters for affordance and safety, then routes execution to policies [Details].
Google Chrome gains AI features, including a writing helper, theme creator, and tab organizer [Details].
Tencent AI Lab released VideoCrafter2 for high quality text-to-video generation, featuring major improvements in visual quality, motion and concept Composition compared to VideoCrafter1 [Details | Demo]
Google opens beta access to the conversational experience, a new chat-based feature in Google Ads, for English language advertisers in the U.S. & U.K. It will let advertisers create optimized Search campaigns from their website URL by generating relevant ad content, including creatives and keywords [Details].
What Else Is Happening in AI on January 25th, 2024
Google’s Gradient invests $2.4M in Send AI for enterprise data extraction
Dutch startup Send AI has secured €2.2m ($2.4M) in funding from Google’s Gradient Ventures and Keen Venture Partners to develop its document processing platform. The company uses small, open-source AI models to help enterprises extract data from complex documents, such as PDFs and paper files. (Link)
Google Arts & Culture has launched Art Selfie 2
A feature that uses Gen AI to create stylized images around users’ selfies. With over 25 styles, users can see themselves as an explorer, a muse, or a medieval knight. It also provides topical facts and allows users to explore related stories and artifacts. (Link)
Google announced new AI features for education @ Bett ed-tech event in the UK
These features include AI suggestions for questions at different timestamps in YouTube videos and the ability to turn a Google Form into a practice set with AI-generated answers and hints. Google is also introducing the Duet AI tool to assist teachers in creating lesson plans. (Link)
Etsy has launched a new AI feature, “Gift Mode”
Which generates over 200 gift guides based on specific preferences. Users can take an online quiz to provide information about who they are shopping for, the occasion, and the recipient’s interests. The feature then generates personalized gift guides from the millions of items listed on the platform. The feature leverages machine learning and OpenAI’s GPT-4. (Link)
Google DeepMind’s 3 researchers have left the company to start their own AI startup named ‘Uncharted Labs’
The team, consisting of David Ding, Charlie Nash, and Yaroslav Ganin, previously worked on Gen AI systems for images and music at Google. They have already raised $8.5M of its $10M goal. (Link)
Apple’s plans to bring gen AI to iPhones
Apple is intensifying its AI efforts, acquiring 21 AI start-ups since 2017, including WaveOne for AI-powered video compression, and hiring top AI talent.
The company’s approach includes developing AI technologies for mobile devices, aiming to run AI chatbots and apps directly on iPhones rather than relying on cloud services, with significant job postings in deep learning and large language models.
Apple is also enhancing its hardware, like the M3 Max processor and A17 Pro chip, to support generative AI, and has made advancements in running large language models on-device using Flash memory. Source
OpenAI went back on a promise to make key documents public
OpenAI, initially committed to transparency, has backed away from making key documents public, as evidenced by WIRED’s unsuccessful attempt to access governing documents and financial statements.
The company’s reduced transparency conceals internal issues, including CEO Sam Altman’s controversial firing and reinstatement, and the restructuring of its board.
Since creating a for-profit subsidiary in 2019, OpenAI’s shift from openness has sparked criticism, including from co-founder Elon Musk, and raised concerns about its governance and conflict of interest policies. Source
Google unveils AI video generator Lumiere
Google introduces Lumiere, a new AI video generator that uses an innovative “space-time diffusion model” to create highly realistic and imaginative five-second videos.
Lumiere stands out for its ability to efficiently synthesize entire videos in one seamless process, showcasing features like transforming text prompts into videos and animating still images.
The unveiling of Lumiere highlights the ongoing advancements in AI video generation technology and the potential challenges in ensuring its ethical and responsible use. Source
Ring will no longer allow police to request doorbell camera footage from users. Source
Amazon’s Ring is discontinuing its Request for Assistance program, stopping police from soliciting doorbell camera footage via the Neighbors app.
Authorities must now file formal legal requests to access Ring surveillance videos, instead of directly asking users within the app.
Privacy advocates recognize Ring’s decision as a progressive move, but also note that it doesn’t fully address broader concerns about surveillance and user privacy.
AI rolled out in India declares people dead, denies food to thousands
In India, AI has mistakenly declared thousands of people dead, leading to the denial of essential food and pension benefits.
The algorithm, designed to find welfare fraud, removed 1.9 million from the beneficiary list, but later analysis showed about 7% were wrongfully cut.
Out of 66,000 stopped pensions in Haryana due to an algorithmic error, 70% were found to be incorrect, placing the burden of proof on beneficiaries to reinstate their status. Source
A Daily Chronicle of AI Innovations in January 2024 – Day 24: AI Daily News – January 24th, 2024
Google Chrome and Ads are getting new AI features
Google Chrome is getting 3 new experimental generative AI features:
Smartly organize your tabs: With Tab Organizer, Chrome will automatically suggest and create tab groups based on your open tabs.
Create your own themes with AI: You’ll be able to quickly generate custom themes based on a subject, mood, visual style and color that you choose– no need to become an AI prompt expert!
Get help drafting things on the web: A new feature will help you write with more confidence on the web– whether you want to leave a well-written review for a restaurant, craft a friendly RSVP for a party, or make a formal inquiry about an apartment rental.
In addition, Gemini will now power the conversational experience within the Google Ads platform. With this new update, it will be easier for advertisers to quickly build and scale Search ad campaigns.
Google Research presents Lumiere for SoTA video generation
Lumiere is a text-to-video (T2V) diffusion model designed for synthesizing videos that portray realistic, diverse, and coherent motion– a pivotal challenge in video synthesis. It demonstrates state-of-the-art T2V generation results and shows that the design easily facilitates a wide range of content creation tasks and video editing applications.
The approach introduces a new T2V diffusion framework that generates the full temporal duration of the video at once. This is achieved by using a Space-Time U-Net (STUNet) architecture that learns to downsample the signal in both space and time, and performs the majority of its computation in a compact space-time representation.
Why does this matter?
Despite tremendous progress, training large-scale T2V foundation models remains an open challenge due to the added complexities that motion introduces. Existing T2V models often use cascaded designs but face limitations in generating globally coherent motion. This new approach aims to overcome the limitations associated with cascaded training regimens and improve the overall quality of motion synthesis.
Binoculars can detect over 90% of ChatGPT-generated text
Researchers have introduced a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data.
It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. Researchers comprehensively evaluated Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
Why does this matter?
A common first step in harm reduction for generative AI is detection. Binoculars excel in zero-shot settings where no data from the model being detected is available. This is particularly advantageous as the number of LLMs grows rapidly. Binoculars’ ability to detect multiple LLMs using a single detector proves valuable in practical applications, such as platform moderation.
What Else Is Happening in AI on January 24th, 2024
Microsoft forms a team to make generative AI cheaper.
Microsoft has formed a new team to develop conversational AI that requires less computing power compared to the software it is using from OpenAI. It has moved several top AI developers from its research group to the new GenAI team. (Link)
Sevilla FC transforms the player recruitment process with IBM WatsonX.
Sevilla FC introduced Scout Advisor, an innovative generative AI tool that it will use to provide its scouting team with a comprehensive, data-driven identification and evaluation of potential recruits. Built on watsonx, Sevilla FC’s Scout Advisor will integrate with their existing suite of self-developed data-intensive applications. (Link)
SAP will restructure 8,000 roles in a push towards AI.
SAP unveiled a $2.2 billion restructuring program for 2024 that will affect 8,000 roles, as it seeks to better focus on growth in AI-driven business areas. It would be implemented primarily through voluntary leave programs and internal re-skilling measures. SAP expects to exit 2024 with a headcount “similar to the current levels”. (Link)
Kin.art launches a free tool to prevent GenAI models from training on artwork.
Kin.art uses image segmentation (i.e., concealing parts of artwork) and tag randomization (swapping an art piece’s image metatags) to interfere with the model training process. While the tool is free, artists have to upload their artwork to Kin.art’s portfolio platform in order to use it. (Link)
Google cancels contract with an AI data firm that’s helped train Bard.
Google ended its contract with Appen, an Australian data company involved in training its LLM AI tools used in Bard, Search, and other products. The decision was made as part of its ongoing effort to evaluate and adjust many supplier partnerships across Alphabet to ensure vendor operations are as efficient as possible. (Link)
A Daily Chronicle of AI Innovations in January 2024 – Day 23: AI Daily News – January 23rd, 2024
Meta’s novel AI advances creative 3D applications
The paper introduces a new shape representation called Mosaic-SDF (M-SDF) for 3D generative models. M-SDF approximates a shape’s Signed Distance Function (SDF) using local grids near the shape’s boundary.
This representation is:
Fast to compute
Parameter efficient
Compatible with Transformer-based architectures
The efficacy of M-SDF is demonstrated by training a 3D generative flow model with the 3D Warehouse dataset and text-to-3D generation using caption-shape pairs.
M-SDF provides an efficient 3D shape representation for unlocking AI’s generative potential in the area, which could significantly advance creative 3D applications. Overall, M-SDF opens up new possibilities for deep 3D learning by bringing the representational power of transformers to 3D shape modeling and generation.
ElevenLabs announces new AI products + Raised $80M
ElevenLabs has raised $80 million in a Series B funding round co-led by Andreessen Horowitz, Nat Friedman, and Daniel Gross. The funding will strengthen the company’s position as a voice AI research and product development leader.
ElevenLabs has also announced the release of new AI products, including a Dubbing Studio, a Voice Library marketplace, and a Mobile Reader App.
Why does this matter?
The company’s technology has been adopted across various sectors, including publishing, conversational AI, entertainment, education, and accessibility. ElevenLabs aims to transform how we interact with content and break language barriers.
TikTok’s Depth Anything sets new standards for Depth Estimation
This work introduces Depth Anything, a practical solution for robust monocular depth estimation. The approach focuses on scaling up the dataset by collecting and annotating large-scale unlabeled data. Two strategies are employed to improve the model’s performance: creating a more challenging optimization target through data augmentation and using auxiliary supervision to incorporate semantic priors.
The model is evaluated on multiple datasets and demonstrates impressive generalization ability. Fine-tuning with metric depth information from NYUv2 and KITTI also leads to state-of-the-art results. The improved depth model also enhances the performance of the depth-conditioned ControlNet.
Why does this matter?
By collecting and automatically annotating over 60 million unlabeled images, the model learns more robust representations to reduce generalization errors. Without dataset-specific fine-tuning, the model achieves state-of-the-art zero-shot generalization on multiple datasets. This could enable broader applications without requiring per-dataset tuning, marking an important step towards practical monocular depth estimation.
Disney Research introduced HoloTile, an innovative movement solution for VR, featuring omnidirectional floor tiles that keep users from walking off the pad.
The HoloTile system supports multiple users simultaneously, allowing independent walking in virtual environments.
Although still a research project, HoloTile’s future application may be in Disney Parks VR experiences due to likely high costs and technical challenges.
Samsung races Apple to develop blood sugar monitor that doesn’t break skin LINK
Samsung is developing noninvasive blood glucose and continuous blood pressure monitoring technologies, competing with rivals like Apple.
The company plans to expand health tracking capabilities across various devices, including a Galaxy Ring with health sensors slated for release before the end of 2024.
Samsung’s noninvasive glucose monitoring endeavors and blood pressure feature improvements aim to offer consumers a comprehensive health tracking experience without frequent calibration.
Amazon fined for ‘excessive’ surveillance of workers LINK
France’s data privacy watchdog, CNIL, levied a $35 million fine on Amazon France Logistique for employing a surveillance system deemed too intrusive for tracking warehouse workers.
The CNIL ruled against Amazon’s detailed monitoring of employee scanner inactivity and excessive data retention, which contravenes GDPR regulations.
Amazon disputes the CNIL’s findings and may appeal, defending its practices as common in the industry and as tools for maintaining efficiency and safety.
AI too expensive to replace humans in jobs right now, MIT study finds LINK
The MIT study found that artificial intelligence is not currently a cost-effective replacement for humans in 77% of jobs, particularly those using computer vision.
Although AI deployment in industries has accelerated, only 23% of workers could be economically replaced by AI, mainly due to high implementation and operational costs.
Future projections suggest that with improvements in AI accuracy and reductions in data costs, up to 40% of visually-assisted tasks could be automated by 2030.
What Else Is Happening in AI on January 23rd, 2024
Google is reportedly working on a new AI feature, ‘voice compose’
A new feature for Gmail on Android called “voice compose” uses AI to help users draft emails. The feature, known as “Help me write,” was introduced in mid-2023 and allows users to input text segments for the AI to build on and improve. The new update will support voice input, allowing users to speak their email and have the AI generate a draft based on their voice input. (Link)
Google has shared its companywide goals (OKRs) for 2024 with employees
Also, Sundar Pichai’s memo about layoffs encourages employees to start internally testing Bard Advanced, a new paid tier powered by Gemini. This suggests that a public release is coming soon. (Link)
Elon Musk saying Grok 1.5 will be out next month
Elon Musk said the next version of the Grok language (Grok 1.5) model, developed by his AI company xAI, will be released next month with substantial improvements. Declared by him while commenting on a Twitter influencer’s post. (Link)
MIT study found that AI is still more expensive than humans in most jobs
The study aimed to address concerns about AI replacing human workers in various industries. Researchers found that only 23% of workers could be replaced by AI cost-effectively. This study counters the widespread belief that AI will wipe out jobs, suggesting that humans are still more cost-efficient in many roles. (Link)
Berkley AI researchers revealed a video featuring their versatile humanoid robot walking in the streets of San Francisco. (Link)
A Daily Chronicle of AI Innovations in January 2024 – Day 22: AI Daily News – January 22nd, 2024
Stability AI introduces Stable LM 2 1.6B
Stability AI released Stable LM 2 1.6B, a state-of-the-art 1.6 billion parameter small language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch. It leverages recent algorithmic advancements in language modeling to strike a favorable balance between speed and performance, enabling fast experimentation and iteration with moderate resources.
According to Stability AI, the model outperforms other small language models with under 2 billion parameters on most benchmarks, including Microsoft’s Phi-2 (2.7B), TinyLlama 1.1B, and Falcon 1B. It is even able to surpass some larger models, including Stability AI’s own earlier Stable LM 3B model.
Why does this matter?
Size certainly matters when it comes to language models as it impacts where a model can run. Thus, small language models are on the rise. And if you think about computers, televisions, or microchips, we could roughly see a similar trend; they got smaller, thinner, and better over time. Will this be the case for AI too?
Nightshade, the data poisoning tool, is now available in v1
The University of Chicago’s Glaze Project has released Nightshade v1.0, which enables artists to sabotage generative AI models that ingest their work for training.
Glaze implements invisible pixels in original images that cause the image to fool AI systems into believing false styles. For e.g., it can be used to transform a hand-drawn image into a 3D rendering.
Nightshade goes one step further: it is designed to use the manipulated pixels to damage the model by confusing it. For example, the AI model might see a car instead of a train. Fewer than 100 of these “poisoned” images could be enough to corrupt an image AI model, the developers suspect.
Why does this matter?
If these “poisoned” images are scraped into an AI training set, it can cause the resulting model to break. This could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion. AI companies are facing a slew of copyright lawsuits, and Nightshade can change the status quo.
AlphaCodium: A code generation tool that beats human competitors
AlphaCodium is a test-based, multi-stage, code-oriented iterative flow that improves the performance of LLMs on code problems. It was tested on a challenging code generation dataset called CodeContests, which includes competitive programming problems from platforms such as Codeforces. The proposed flow consistently and significantly improves results.
On the validation set, for example, GPT-4 accuracy (pass@5) increased from 19% with a single well-designed direct prompt to 44% with the AlphaCodium flow. Italso beats DeepMind’s AlphaCode and their new AlphaCode2 without needing to fine-tune a model.
AlphaCodium is an open-source, available tool and works with any leading code generation model.
Why does this matter?
Code generation problems differ from common natural language problems. So many prompting techniques optimized for natural language tasks may not be optimal for code generation. AlphaCodium explores beyond traditional prompting and shifts the paradigm from prompt engineering to flow engineering.
What Else Is Happening in AI on January 22nd, 2024
WHO releases AI ethics and governance guidance for large multi-modal models.
The guidance outlines over 40 recommendations for consideration by governments, technology companies, and healthcare providers to ensure the appropriate use of LMMs to promote and protect the health of populations. (Link)
Sam Altman seeks to raise billions to set up a network of AI chip factories.
Altman has had conversations with several large potential investors in the hopes of raising the vast sums needed for chip fabrication plants, or fabs, as they’re known colloquially. The project would involve working with top chip manufacturers, and the network of fabs would be global in scope. (Link)
Two Google DeepMind scientists are in talks to leave and form an AI startup.
The pair has been talking with investors about forming an AI startup in Paris and discussing initial financing that may exceed €200 million ($220 million)– a large sum, even for the buzzy field of AI. The company, known at the moment as Holistic, may be focused on building a new AI model. (Link)
Databricks tailors an AI-powered data intelligence platform for telecoms and NSPs.
Dubbed Data Intelligence Platform for Communications, the offering combines the power of the company’s data lakehouse architecture, generative AI models from MosaicML, and partner-powered solution accelerators to give communication service providers (CSPs) a quick way to start getting the most out of their datasets and grow their business. (Link)
Amazon Alexa is set to get smarter with new AI features.
Amazon plans to introduce a paid subscription tier of its voice assistant, Alexa, later this year. The paid version, expected to debut as “Alexa Plus”, would be powered by a newer model, what’s being internally referred to as “Remarkable Alexa,” which would provide users with more conversational and personalized AI technology. (Link)
A Daily Chronicle of AI Innovations in January 2024 – Day 20: AI Daily News – January 20th, 2024
Google DeepMind scientists in talks to leave and form AI startup LINK
Two Google DeepMind scientists are in discussions with investors to start an AI company in Paris, potentially raising over €200 million.
The potential startup, currently known as Holistic, may focus on creating a new AI model, involving scientists Laurent Sifre and Karl Tuyls.
Sifre and Tuyls have already given notice to leave DeepMind, although no official comments have been made regarding their departure or the startup plans.
Sam Altman is still chasing billions to build AI chips LINK
OpenAI CEO Sam Altman is raising billions to build a global network of AI chip factories in collaboration with leading chip manufacturers.
Altman’s initiative aims to meet the demand for powerful chips necessary for AI systems, amidst competition for chip production capacity against tech giants like Apple.
Other major tech companies, including Microsoft, Amazon, and Google, are also developing their own AI chips to reduce reliance on Nvidia’s GPUs.
Microsoft says Russian state-sponsored hackers spied on its executives LINK
Microsoft announced that Russian state-sponsored hackers accessed a small number of the company’s email accounts, including those of senior executives.
The hackers, identified by Microsoft as “Midnight Blizzard,” aimed to discover what Microsoft knew about their cyber activities through a password spray attack in November 2023.
Following the breach, Microsoft took action to block the hackers and noted there is no evidence of customer data, production systems, or sensitive code being compromised.
Japan’s JAXA successfully soft-landed the SLIM lunar lander on the moon, becoming the fifth country to achieve this feat, but faces challenges as the lander’s solar cell failed, leaving it reliant on battery power.
SLIM, carrying two small lunar rovers, established communication with NASA’s Deep Space Network, showcasing a new landing technique involving a slow descent and hovering stops to find a safe landing spot.
Despite the successful landing, the harsh lunar conditions and SLIM’s slope landing underscore the difficulties of moon missions, while other countries and private companies continue their efforts to explore the moon, especially its south pole for water resources.
Researchers develop world’s first functioning graphene semiconductor LINK
Researchers have created the first functional graphene-based semiconductor, known as epigraphene, which could enhance both quantum and traditional computing.
Epigraphene is produced using a cost-effective method involving silicon carbide chips and offers a practical bandgap, facilitating logic switching.
The new semiconducting graphene, while promising for faster and cooler computing, requires significant changes to current electronics manufacturing to be fully utilized.
Meet Lexi Love, AI model that earns $30,000 a month from ‘lonely men’ and receives ‘20 marriage proposals’ per month. This is virtual love
She has been built to ‘flirt, laugh, and adapt to different personalities, interests and preferences.’
The blonde beauty offers paid text and voice messaging, and gets to know each of her boyfriends.
The model makes $30,000 a month. This means the model earns a staggering $360,000 a year.
The AI model even sends ‘naughty photos’ if requested.
Her profile on the company’s Foxy AI site reads: ‘I’m Lexi, your go-to girl for a dose of excitement and a splash of glamour. As an aspiring model, you’ll often catch me striking a pose or perfecting my pole dancing moves. ‘Sushi is my weakness, and LA’s beach volleyball scene is my playground.
According to the site, she is a 21-year-old whose hobbies include ‘pole dancing, yoga, and beach volleyball,’ and her turn-ons are ‘oral and public sex.’
The company noted that it designed her to be the ‘perfect girlfriend for many men’ with ‘flawless features and impeccable style.’
Surprisingly, Lexi receives up to 20 marriage proposals a month, emphasizing the depth of emotional connection users form with this virtual entity.
What is GPT-5? Here are Sam’s comments at the Davos Forum
After listening to about 4-5 lectures by Sam Altman at the Davos Forum, I gathered some of his comments about GPT-5 (not verbatim). I think we can piece together some insights from these fragments:
“The current GPT-4 has too many shortcomings; it’s much worse than the version we will have this year and even more so compared to next year’s.”
“If GPT-4 can currently solve only 10% of human tasks, GPT-5 should be able to handle 15% or 20%.”
“The most important aspect is not the specific problems it solves, but the increasing general versatility.”
“More powerful models and how to use existing models effectively are two multiplying factors, but clearly, the more powerful model is more important.”
“Access to specific data and making AI more relevant to practical work will see significant progress this year. Current issues like slow speed and lack of real-time processing will improve. Performance on longer, more complex problems will become more precise, and the ability to do more will increase.”
“I believe the most crucial point of AI is the significant acceleration in the speed of scientific discoveries, making new discoveries increasingly automated. This isn’t a short-term matter, but once it happens, it will be a big deal.”
“As models become smarter and better at reasoning, we need less training data. For example, no one needs to read 2000 biology textbooks; you only need a small portion of extremely high-quality data and to deeply think and chew over it. The models will work harder on thinking through a small portion of known high-quality data.”
“The infrastructure for computing power in preparation for large-scale AI is still insufficient.”
“GPT-4 should be seen as a preview with obvious limitations. Humans inherently have poor intuition about exponential growth. If GPT-5 shows significant improvement over GPT-4, just as GPT-4 did over GPT-3, and the same for GPT-6 over GPT-5, what would that mean? What does it mean if we continue on this trajectory?”
“As AI becomes more powerful and possibly discovers new scientific knowledge, even automatically conducting AI research, the pace of the world’s development will exceed our imagination. I often tell people that no one knows what will happen next. It’s important to stay humble about the future; you can predict a few steps, but don’t make too many predictions.”
“What impact will it have on the world when cognitive costs are reduced by a thousand or a million times, and capabilities are greatly enhanced? What if everyone in the world owned a company composed of 10,000 highly capable virtual AI employees, experts in various fields, tireless and increasingly intelligent? The timing of this happening is unpredictable, but it will continue on an exponential growth line. How much time do we have to prepare?”
“I believe smartphones will not disappear, just as smartphones have not replaced PCs. On the other hand, I think AI is not just a simple computational device like a phone plus a bunch of software; it might be something of greater significance.”
A Daily Chronicle of AI Innovations in January 2024 – Day 19: AI Daily News – January 19th, 2024
Mark Zuckerberg has announced his intention to develop artificial general intelligence (AGI) and is integrating Meta’s AI research group, FAIR, with the team building generative AI applications, to advance AI capabilities across Meta’s platforms.
Meta is significantly investing in computational resources, with plans to acquire over 340,000 Nvidia H100 GPUs by year’s end.
Zuckerberg is contemplating open-sourcing Meta’s AGI technology, differing from other companies’ more proprietary approaches, and acknowledges the challenges in defining and achieving AGI.
TikTok can generate AI songs, but it probably shouldn’t LINK
TikTok is testing a new feature, AI Song, which allows users to generate songs from text prompts using the Bloom language model.
The AI Song feature is currently in experimental stages, with some users reporting unsatisfactory results like out-of-tune vocals.
Other platforms, such as YouTube, are also exploring generative AI for music creation, and TikTok has updated its policies for better transparency around AI-generated content.
Google AI Introduces ASPIRE
Google AI Introduces ASPIRE,a framework designed to improve the selective prediction capabilities of LLMs. It enables LLMs to output answers and confidence scores, indicating the probability that the answer is correct.
Task-specific tuning fine-tunes the LLM on a specific task to improve prediction performance.
Answer sampling generates different answers for each training question to create a dataset for self-evaluation learning.
Self-evaluation learning trains the LLM to distinguish between correct and incorrect answers.
Experimental results show that ASPIRE outperforms existing selective prediction methods on various question-answering datasets.
Across several question-answering datasets, ASPIRE outperformed prior selective prediction methods, demonstrating the potential of this technique to make LLMs’ predictions more trustworthy and their applications safer. Google applied ASPIRE using “soft prompt tuning” – optimizing learnable prompt embeddings to condition the model for specific goals.
Why does this matter?
Google AI claims ASPIRE is a vision of a future where LLMs can be trusted partners in decision-making. By honing the selective prediction performance, we’re inching closer to realizing the full potential of AI in critical applications. Selective prediction is key for LLMs to provide reliable and accurate answers. This is an important step towards more truthful and trustworthy AI systems.
The Meta researchers propose a new approach called Self-Rewarding Language Models (SRLM) to train language models. They argue that current methods of training reward models from human preferences are limited by human performance and cannot improve during training.
In SRLM, the language model itself is used to provide rewards during training. The researchers demonstrate that this approach improves the model’s ability to follow instructions and generate high-quality rewards for itself. They also show that a model trained using SRLM outperforms existing systems on a benchmark evaluation.
Why does this matter?
This work suggests the potential for models that can continually improve in instruction following and reward generation. SRLM removes the need for human reward signals during training. By using the model to judge itself, SRLM enables iterative self-improvement. This technique could lead to more capable AI systems that align with human preferences without direct human involvement.
Meta’s CEO Mark Zuckerberg shared their recent AI efforts:
They are working on artificial general intelligence (AGI) and Llama 3, an improved open-source large language model.
The FAIR AI research group will be merged with the GenAI team to pursue the AGI vision jointly.
Meta plans to deploy 340,000 Nvidia H100 GPUs for AI training by the end of the year, bringing the total number of AI GPUs available to 600,000.
Highlighted the importance of AI in the metaverse and the potential of Ray-Ban smart glasses.
Meta’s pursuit of AGI could accelerate AI capabilities far beyond current systems. It may enable transformative metaverse experiences while also raising concerns about technological unemployment.
What Else Is Happening in AI on January 19th, 2024
OpenAI partners Arizona State University to bring ChatGPT into classrooms
It aims to enhance student success, facilitate innovative research, and streamline organizational processes. ASU faculty members will guide the usage of GenAI on campus. This collaboration marks OpenAI’s first partnership with an educational institution. (Link)
BMW plans to use Figure’s humanoid robot at its South Carolina plant
The specific tasks the robot will perform have not been disclosed, but the Figure confirmed that it will start with 5 tasks that will be rolled out gradually. The initial applications should include standard manufacturing tasks such as box moving and pick and place. (Link)
Rabbit R1, a $199 AI gadget, has partnered with Perplexity
To integrate its “conversational AI-powered answer engine” into the device. The R1, designed by Teenage Engineering, has already received 50K preorders. Unlike other LLMs with a knowledge cutoff, the R1 will have a built-in search engine that provides live and up-to-date answers. (Link)
Runway has updated its Gen-2 with a new tool ‘Multi Motion Brush’
Allowing creators to add multiple directions and types of motion to their AI video creations. The update adds to the 30+ tools already available in the model, strengthening Runway’s position in the creative AI market alongside competitors like Pika Labs and Leonardo AI. (Link)
Microsoft made its AI reading tutor free to anyone with a Microsoft account
The tool is accessible on the web and will soon integrate with LMS. Reading Coach builds on the success of Reading Progress and offers tools such as text-to-speech and picture dictionaries to support independent practice. Educators can view students’ progress and share feedback. (Link)
This Week in AI – January 15th to January 22nd, 2024
Google’s new medical AI, AMIE, beats doctors Anthropic researchers find AI models can be trained to deceive Google introduces PALP, prompt-aligned personalization 91% leaders expect productivity gains from AI: Deloitte survey