DjamgaMind: Audio Intelligence for the C-Suite (Daily AI News, Energy, Healthcare, Finance)
Full-Stack AI Intelligence. Zero Noise. The definitive audio briefing for the C-Suite and AI Architects. From Daily News and Strategic Deep Dives to high-density Industrial & Regulatory Intelligence—decoded at the speed of the AI era. 👉 Start your specialized audio briefing today at Djamgamind.com
AI Jobs and Career
I wanted to share an exciting opportunity for those of you looking to advance your careers in the AI space. You know how rapidly the landscape is evolving, and finding the right fit can be a challenge. That's why I'm excited about Mercor – they're a platform specifically designed to connect top-tier AI talent with leading companies. Whether you're a data scientist, machine learning engineer, or something else entirely, Mercor can help you find your next big role. If you're ready to take the next step in your AI career, check them out through my referral link: https://work.mercor.com/?referralCode=82d5f4e3-e1a3-4064-963f-c197bb2c8db1. It's a fantastic resource, and I encourage you to explore the opportunities they have available.
- Full Stack Engineer [$150K-$220K]
- Software Engineer, Tooling & AI Workflow, Contract [$90/hour]
- DevOps Engineer, India, Contract [$90/hour]
- More AI Job Opportunities here
| Job Title | Status | Pay |
|---|---|---|
| Full-Stack Engineer | Strong match, Full-time | $150K - $220K / year |
| Developer Experience and Productivity Engineer | Pre-qualified, Full-time | $160K - $300K / year |
| Software Engineer - Tooling & AI Workflows (Contract) | Contract | $90 / hour |
| DevOps Engineer (India) | Full-time | $20K - $50K / year |
| Senior Full-Stack Engineer | Full-time | $2.8K - $4K / week |
| Enterprise IT & Cloud Domain Expert - India | Contract | $20 - $30 / hour |
| Senior Software Engineer | Contract | $100 - $200 / hour |
| Senior Software Engineer | Pre-qualified, Full-time | $150K - $300K / year |
| Senior Full-Stack Engineer: Latin America | Full-time | $1.6K - $2.1K / week |
| Software Engineering Expert | Contract | $50 - $150 / hour |
| Generalist Video Annotators | Contract | $45 / hour |
| Generalist Writing Expert | Contract | $45 / hour |
| Editors, Fact Checkers, & Data Quality Reviewers | Contract | $50 - $60 / hour |
| Multilingual Expert | Contract | $54 / hour |
| Mathematics Expert (PhD) | Contract | $60 - $80 / hour |
| Software Engineer - India | Contract | $20 - $45 / hour |
| Physics Expert (PhD) | Contract | $60 - $80 / hour |
| Finance Expert | Contract | $150 / hour |
| Designers | Contract | $50 - $70 / hour |
| Chemistry Expert (PhD) | Contract | $60 - $80 / hour |
The Future of Generative AI: From Art to Reality Shaping
Explore the transformative potential of generative AI in our latest AI Unraveled episode. From AI-driven entertainment to reality-altering technologies, we delve deep into what the future holds.
This episode covers how generative AI could revolutionize movie making, impact creative professions, and even extend to DNA alteration. We also discuss its integration in technology over the next decade, from smartphones to fully immersive VR worlds.
Listen to the Future of Generative AI here
#GenerativeAI #AIUnraveled #AIFuture

Welcome to AI Unraveled, the podcast that demystifies frequently asked questions on artificial intelligence and keeps you up to date with the latest AI trends. Join us as we delve into groundbreaking research, innovative applications, and emerging technologies that are pushing the boundaries of AI. From the latest trends in ChatGPT and the recent merger of Google Brain and DeepMind, to the exciting developments in generative AI, we’ve got you covered with a comprehensive update on the ever-evolving AI landscape. In today’s episode, we’ll cover generative AI in entertainment, the potential transformation of creative jobs, DNA alteration and physical enhancements, personalized solutions and their ethical implications, AI integration in various areas, the future integration of AI in daily life, key points from the episode, and a recommendation for the book “AI Unraveled” to better understand artificial intelligence.
AI-Powered Professional Certification Quiz Platform
Web | iOS | Android | Windows
Are you passionate about AI and looking for your next career challenge? In the fast-evolving world of artificial intelligence, connecting with the right opportunities can make all the difference. We're excited to recommend Mercor, a premier platform dedicated to bridging the gap between exceptional AI professionals and innovative companies.
Whether you're seeking roles in machine learning, data science, or other cutting-edge AI fields, Mercor offers a streamlined path to your ideal position. Explore the possibilities and accelerate your AI career by visiting Mercor through our exclusive referral link:
Find Your AI Dream Job on Mercor
Your next big opportunity in AI could be just a click away!
The Future of Generative AI: The Evolution of Generative AI in Entertainment
Hey there! Today we’re diving into the fascinating world of generative AI in entertainment. Picture this: a Netflix powered by generative AI where movies are actually created based on prompts. It’s like having an AI scriptwriter and director all in one!
Imagine how this could revolutionize the way we approach scriptwriting and audio-visual content creation. With generative AI, we could have an endless stream of unique and personalized movies tailor-made to our interests. No more scrolling through endless options trying to find something we like – the AI knows exactly what we’re into and delivers a movie that hits all the right notes.
AI-Powered Job Interview Warmup for Job Seekers

But, of course, this innovation isn’t without its challenges and ethical considerations. While generative AI offers immense potential, we must be mindful of the biases it may inadvertently introduce into the content it creates. We don’t want movies that perpetuate harmful stereotypes or discriminatory narratives. Striking the right balance between creativity and responsibility is crucial.
Additionally, there’s the question of copyright and ownership. Who would own the rights to a movie created by a generative AI? Would it be the platform, the AI, or the person who originally provided the prompt? This raises a whole new set of legal and ethical questions that need to be addressed.
AI Jobs and Career
And before we wrap up today's AI news, I wanted to share an exciting opportunity for those of you looking to advance your careers in the AI space. You know how rapidly the landscape is evolving, and finding the right fit can be a challenge. That's why I'm excited about Mercor – they're a platform specifically designed to connect top-tier AI talent with leading companies. Whether you're a data scientist, machine learning engineer, or something else entirely, Mercor can help you find your next big role. If you're ready to take the next step in your AI career, check them out through my referral link: https://work.mercor.com/?referralCode=82d5f4e3-e1a3-4064-963f-c197bb2c8db1. It's a fantastic resource, and I encourage you to explore the opportunities they have available.
Overall, generative AI has the power to transform our entertainment landscape. However, we must tread carefully, ensuring that the benefits outweigh the potential pitfalls. Exciting times lie ahead in the world of AI-driven entertainment!
The Future of Generative AI: The Impact on Creative Professions
In this segment, let’s talk about how AI advancements are impacting creative professions. As a graphic designer myself, I have some personal concerns about the need to adapt to these advancements. It’s important for us to understand how generative AI might transform jobs in creative fields.
Invest in your future today by enrolling in Azure Fundamentals: pass the AZ-900 exam with ease using this comprehensive exam preparation guide!
- AWS Certified AI Practitioner (AIF-C01): Conquer the AWS Certified AI Practitioner exam with our AI and Machine Learning For Dummies test prep. Master fundamental AI concepts, AWS AI services, and ethical considerations.
- Azure AI Fundamentals: Ace the Azure AI Fundamentals exam with our comprehensive test prep. Learn the basics of AI, Azure AI services, and their applications.
- Google Cloud Professional Machine Learning Engineer: Nail the Google Professional Machine Learning Engineer exam with our expert-designed test prep. Deepen your understanding of ML algorithms, models, and deployment strategies.
- AWS Certified Machine Learning Specialty: Dominate the AWS Certified Machine Learning Specialty exam with our targeted test prep. Master advanced ML techniques, AWS ML services, and practical applications.
- AWS Certified Data Engineer Associate (DEA-C01): Set yourself up for promotion, get a better job or Increase your salary by Acing the AWS DEA-C01 Certification.
AI is becoming increasingly capable of producing creative content such as music, art, and even writing. This has raised concerns among many creatives, including myself, about the future of our profession. Will AI eventually replace us? While it’s too early to say for sure, it’s important to recognize that AI is more of a tool to enhance our abilities rather than a complete replacement.
Generative AI, for example, can help automate certain repetitive tasks, freeing up our time to focus on more complex and creative work. This can be seen as an opportunity to upskill and expand our expertise. By embracing AI and learning to work alongside it, we can adapt to the changing landscape of creative professions.
Upskilling is crucial in this evolving industry. It’s important to stay updated with the latest AI technologies and learn how to leverage them in our work. By doing so, we can stay one step ahead and continue to thrive in our creative careers.
Overall, while AI advancements may bring some challenges, they also present us with opportunities to grow and innovate. By being open-minded, adaptable, and willing to learn, we can navigate these changes and continue to excel in our creative professions.
The Future of Generative AI: Beyond Content Generation – The Realm of Physical Alterations
Today, folks, we’re diving into the captivating world of physical alterations. You see, there’s more to AI than just creating content. It’s time to explore how AI can take a leap into the realm of altering our DNA and advancing medical applications.
Imagine this: using AI to enhance our physical selves. Picture people with wings or scales. Sounds pretty crazy, right? Well, it might not be as far-fetched as you think. With generative AI, we have the potential to take our bodies to the next level. We’re talking about truly transforming ourselves, pushing the boundaries of what it means to be human.
But let’s not forget to consider the ethical and societal implications. As exciting as these advancements may be, there are some serious questions to ponder. Are we playing God? Will these enhancements create a divide between those who can afford them and those who cannot? How will these alterations affect our sense of identity and equality?
It’s a complex debate, my friends, one that raises profound moral and philosophical questions. On one hand, we have the potential for incredible medical breakthroughs and physical advancements. On the other hand, we risk stepping into dangerous territory, compromising our values and creating a divide in society.
So, as we venture further into the realm of physical alterations, let’s keep our eyes wide open and our minds even wider. There’s a lot at stake here, and it’s up to us to navigate the uncharted waters of AI and its impact on our very existence.
Generative AI as Personalized Technology Tools
In this segment, let’s dive into the exciting world of generative AI and how it can revolutionize personalized technology tools. Picture this: AI algorithms evolving so rapidly that they can create customized solutions tailored specifically to individual needs! It’s mind-boggling, isn’t it?
Now, let’s draw a comparison to “Clarke tech,” where technology appears almost magical. Just like in Arthur C. Clarke’s famous quote, “Any sufficiently advanced technology is indistinguishable from magic.” Generative AI has the potential to bring that kind of magic to our lives by creating seemingly miraculous solutions.
One of the key advantages of generative AI is its ability to understand context. This means that AI systems can comprehend the nuances and subtleties of our queries, allowing them to provide highly personalized and relevant responses. Imagine having a chatbot that not only recognizes what you’re saying but truly understands it in context, leading to more accurate and helpful interactions.
The future of generative AI holds immense promise for creating personalized experiences. As it continues to evolve, we can look forward to technology that adapts itself to our unique needs and preferences. It’s an exciting time to be alive, as we witness the merging of cutting-edge AI advancements and the practicality of personalized technology tools. So, brace yourselves for a future where technology becomes not just intelligent, but intelligently tailored to each and every one of us.
Generative AI in Everyday Technology (1-3 Year Predictions)
So, let’s talk about what’s in store for AI in the near future. We’re looking at a world where AI will become a standard feature in our smartphones, social media platforms, and even education. It’s like having a personal assistant right at our fingertips.
One interesting trend that we’re seeing is the blurring lines between AI-generated and traditional art. This opens up exciting possibilities for artists and enthusiasts alike. AI algorithms can now analyze artistic styles and create their own unique pieces, which can sometimes be hard to distinguish from those made by human hands. It’s kind of mind-blowing when you think about it.
Another aspect to consider is the potential ubiquity of AI in content creation tools. We’re already witnessing the power of AI in assisting with tasks like video editing and graphic design. But in the not too distant future, we may reach a point where AI is an integral part of every creative process. From writing articles to composing music, AI could become an indispensable tool. It’ll be interesting to see how this plays out and how creatives in different fields embrace it.
All in all, AI integration in everyday technology is set to redefine the way we interact with our devices and the world around us. The lines between human and machine are definitely starting to blur. It’s an exciting time to witness these innovations unfold.
The Future of Generative AI: Long-Term Predictions and Societal Integration (10 Years)
So picture this – a future where artificial intelligence is seamlessly woven into every aspect of our lives. We’re talking about a world where AI is a part of our daily routine, be it for fun and games or even the most mundane of tasks like operating appliances.
But let’s take it up a notch. Imagine fully immersive virtual reality worlds that are not just created by AI, but also have AI-generated narratives. We’re not just talking about strapping on a VR headset and stepping into a pre-designed world. We’re talking about AI crafting dynamic storylines within these virtual realms, giving us an unprecedented level of interactivity and immersion.
Now, to make all this glorious future-tech a reality, we need to consider the advancements in material sciences and computing that will be crucial. We’re talking about breakthroughs that will power these AI-driven VR worlds, allowing them to run flawlessly with immense processing power. We’re talking about materials that enable lightweight, comfortable VR headsets that we can wear for hours on end.
It’s mind-boggling to think about the possibilities that this integration of AI, VR, and material sciences holds for our future. We’re talking about a world where reality and virtuality blend seamlessly, and where our interactions with technology become more natural and fluid than ever before. And it’s not a distant future either – this could become a reality in just the next decade.
So hold on tight, because the future is only getting more exciting from here!
So, here’s the deal. We’ve covered a lot in this episode, and it’s time to sum it all up. We’ve discussed some key points when it comes to generative AI and how it has the power to reshape our world. From creating realistic deepfake videos to generating lifelike voices and even designing unique artwork, the possibilities are truly mind-boggling.
But let’s not forget about the potential ethical concerns. With this technology advancing at such a rapid pace, we must be cautious about the misuse and manipulation that could occur. It’s important for us to have regulations and guidelines in place to ensure that generative AI is used responsibly.
Now, I want to hear from you, our listeners! What are your thoughts on the future of generative AI? Do you think it will bring positive changes or cause more harm than good? And what about your predictions? Where do you see this technology heading in the next decade?
Remember, your voice matters, and we’d love to hear your insights on this topic. So don’t be shy, reach out to us and share your thoughts. Together, let’s unravel the potential of generative AI and shape our future responsibly.
Oh, if you’re looking to dive deeper into the fascinating world of artificial intelligence, I’ve got just the thing for you! There’s a fantastic book called “AI Unraveled: Demystifying Frequently Asked Questions on Artificial Intelligence” that you absolutely have to check out. Trust me, it’s a game-changer.
What’s great about this book is that it’s the ultimate guide to understanding artificial intelligence. It takes those complex concepts and breaks them down into digestible pieces, answering all those burning questions you might have. No more scratching your head in confusion!
Now, the best part is that it’s super accessible. You can grab a copy of “AI Unraveled” from popular platforms like Shopify, Apple, Google, or Amazon. Just take your pick, and you’ll be on your way to unraveling the mysteries of AI!
So, if you’re eager to expand your knowledge and get a better grasp on artificial intelligence, don’t miss out on “AI Unraveled.” It’s the must-have book that’s sure to satisfy your curiosity. Happy reading!
The Future of Generative AI: Conclusion
In this episode, we uncovered the groundbreaking potential of generative AI in entertainment, creative jobs, DNA alteration, personalized solutions, AI integration in daily life, and more, while also exploring the ethical implications – don’t forget to grab your copy of “AI Unraveled” for a deeper understanding! Join us next time on AI Unraveled as we continue to demystify frequently asked questions on artificial intelligence and bring you the latest trends in AI, including ChatGPT advancements and the exciting collaboration between Google Brain and DeepMind. Stay informed, stay curious, and don’t forget to subscribe for more!
📢 Advertise with us and Sponsorship Opportunities
Are you eager to expand your understanding of artificial intelligence? Look no further than the essential book “AI Unraveled: Demystifying Frequently Asked Questions on Artificial Intelligence,” available at Shopify, Apple, Google, or Amazon.

Elevate Your Design Game with Photoshop’s Generative Fill
Take your creative projects to the next level with #Photoshop’s Generative Fill! This AI-powered tool is a game-changer for designers and artists.
Tutorial: How to Use Generative Fill
➡ Use any selection tool to highlight an area or object in your image. Click the Generative Fill button in the Contextual Task Bar.
➡ Enter a prompt describing your vision in the text-entry box. Or, leave it blank and let Photoshop auto-fill the area based on the surroundings.
➡ Click ‘Generate’. Be amazed by the thumbnail previews of variations tailored to your prompt. Each option is added as a Generative Layer in your Layers panel, keeping your original image intact.
Pro Tip: To generate even more options, click Generate again. You can also try editing your prompt to fine-tune your results. Dream it, type it, see it!
https://youtube.com/shorts/i1fLaYd4Qnk
- Gama AI by /u/CompetitiveSoft1992 (Artificial Intelligence) on March 8, 2026 at 11:08 pm
What’s your opinion about Google’s AI Gamma? I’ve seen a lot of people talking about its ability to generate images and PDFs, almost like PowerPoint presentations, but I haven’t tested it yet. I’d like to hear your opinion. submitted by /u/CompetitiveSoft1992 [link] [comments]
- Pipeline-based agent orchestration vs single-agent loops — a practical comparison by /u/Warmaster0010 (Artificial Intelligence) on March 8, 2026 at 11:04 pm
Disclosure: I’m the builder. Most AI coding tools use a single agent in a loop: user prompts → agent generates → user reviews → agent iterates. This works for small tasks but breaks down because the agent accumulates irrelevant context, can’t parallelize, and has no structural gates for quality. I built Swim Code (swimcode.ai) around multi-stage pipelines where each stage has a specialized agent with typed context allocation. The planning agent receives architecture context. The coding agent receives acceptance criteria. The testing agent receives only the code. Observations: Scoped context consistently produces better output than full context dumps. Bounded retry loops resolve ~70% of test failures without human intervention. Git worktree isolation per task enables true parallel execution (3-5). Main failure mode is lossy context summarization in certain edge cases. Model-agnostic: Claude, GPT, Ollama (experimentally). Desktop app, runs locally submitted by /u/Warmaster0010 [link] [comments]
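The scoped-context pipeline described in the post above can be sketched in a few lines. This is not Swim Code's actual API; the stage names, the dataclasses, and the `call_agent` stub below are hypothetical, meant only to show how each stage can receive a typed slice of context instead of the full history, with a bounded retry loop around testing.

```python
from dataclasses import dataclass

# Hypothetical typed context slices: each stage sees only the fields it needs,
# not the whole conversation history.
@dataclass
class PlanningContext:
    architecture_notes: str

@dataclass
class CodingContext:
    plan: str
    acceptance_criteria: str

@dataclass
class TestingContext:
    code: str

def call_agent(role: str, prompt: str) -> str:
    """Placeholder for an LLM call; swap in any provider's chat API here."""
    return f"[{role} output for: {prompt[:40]}...]"

def run_pipeline(architecture_notes: str, acceptance_criteria: str) -> str:
    # Stage 1: the planning agent receives architecture context only.
    plan = call_agent("planner", PlanningContext(architecture_notes).architecture_notes)

    # Stage 2: the coding agent receives the plan plus acceptance criteria.
    coding_ctx = CodingContext(plan=plan, acceptance_criteria=acceptance_criteria)
    code = call_agent("coder", f"{coding_ctx.plan}\n{coding_ctx.acceptance_criteria}")

    # Stage 3: the testing agent receives only the code, inside a bounded retry loop.
    for _ in range(3):
        verdict = call_agent("tester", TestingContext(code=code).code)
        if "PASS" in verdict:
            break
        code = call_agent("coder", f"Fix these test failures:\n{verdict}\n\n{code}")
    return code

print(run_pipeline("service layered over a REST API", "endpoint returns JSON with status 200"))
```

Swapping the stub for a real model call and running each task's stages in its own git worktree would give the parallel, isolated execution the post describes.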
- no attorney-client relationship between the individual and the AI platform, no reasonable expectation of confidentiality, and no protected legal advice. by /u/Cyberthere (Artificial Intelligence) on March 8, 2026 at 10:05 pm
submitted by /u/Cyberthere [link] [comments]
- Made this demo video partially with AI by /u/Jurrrcy (Artificial Intelligence) on March 8, 2026 at 9:47 pm
I made an animation partially with AI. Is the video convincing? It’s for Synthetic, a cool AI project, and I thought it would be fun to use AI videos inside it to show an AI project 🙂 What do y’all think? The video submitted by /u/Jurrrcy [link] [comments]
- What would Richard Feynman make of AI today? by /u/AngleAccomplished865 (Artificial Intelligence) on March 8, 2026 at 9:17 pm
His 'cargo cult' idea has been a big influence on many working scientists. But does his "can I build it?" idea apply to AI? ["What I cannot create, I do not understand"] As far as I can tell, Feynman's epistemology assumes that understanding bottoms out somewhere — in quantum field theory, in particle interactions, in something with determinate structure. Does that hold for AI? The "mechanism" isn't fixed, here. LLMs don't have that, right? They have statistical regularities that shift with data, scale, and context. What the article's author isn't understanding is that the thing being modeled isn't a fixed phenomenon waiting to be understood. It's a moving target that partially 'constitutes itself' through the modeling process. In addition, the training data is itself a historical artifact of contingent social processes. ["Contingency" does a lot of work in the social sciences.] So... opinions? https://nautil.us/what-would-richard-feynman-make-of-ai-today-1262875 "Much of today’s artificial intelligence operates as a black box. Models are trained on vast—often proprietary—datasets, and their internal workings remain opaque even to their creators. Modern neural networks can contain millions, sometimes billions, of adjustable parameters. One of Feynman’s contemporaries, John von Neumann, once wryly observed: “With four parameters I can fit an elephant, and with five I can make his tail wiggle.” The metaphor warns of mistaking noise for meaning. Neural networks produce outputs that look fluent, confident, sometimes uncannily insightful. What they rarely provide is an explanation of why a particular answer appears, or when the system is likely to fail. This creates a subtle but powerful temptation. When a system performs impressively, it is easy to treat performance as understanding, and statistical success as explanation. Feynman would have been wary of that move. He once scribbled on his blackboard, near the end of his life, a simple rule of thumb: “What I cannot create, I do not understand.” For him, understanding meant being able to take something apart, to rebuild it, and to know where it would break. Black-box systems invert that instinct. They invite us to accept answers we cannot fully reconstruct, and to trust results whose limits we may not recognize until something goes wrong." submitted by /u/AngleAccomplished865 [link] [comments]
- Will AI mean the end of high level careers in tech? by /u/Throw8976m (Artificial Intelligence) on March 8, 2026 at 9:15 pm
My husband works in IT at the management level. He has over 20 years of experience in coding, architecture and management under his belt. He is constantly fretting that the trend towards AI will mean the end of his career. I personally feel he is overreacting, however I do not have a leg to stand on. Can anyone give him some words of reassurance? Or could he be right? Thank you. submitted by /u/Throw8976m [link] [comments]
- LLMs Explained From First Principles: Vectors, Attention, Backpropagation, and Scaling Limits by /u/LongjumpingTear3675 (Artificial Intelligence) on March 8, 2026 at 9:01 pm
The core math behind the Google Transformer is not symbolic reasoning or logic, it is linear algebra, probability, and calculus arranged in a very specific way. Everything starts by turning text into numbers. Each word or token is mapped to a vector, meaning a long list of real numbers. These vectors live in a high-dimensional space and are learned during training, so the model slowly shapes where words sit relative to one another. From each token vector, the model computes three new vectors using matrix multiplication. These are called queries, keys, and values. Mathematically, this is just the original vector multiplied by three different learned matrices. There is nothing mysterious here, it is basic linear algebra. The purpose is to create different representations of the same token so it can ask questions about other tokens, be compared against them, and carry information forward. The heart of the Transformer is attention. Attention works by taking the dot product between the query vector of one token and the key vectors of all other tokens. A dot product measures similarity in vector space, essentially asking how aligned two vectors are. These similarity scores are then divided by the square root of the vector dimension to keep the numbers from growing too large, which is purely a numerical stability trick. After that, a softmax function is applied. Softmax converts the raw similarity scores into probabilities that are all positive and sum to one. This turns similarity into a distribution of attention, meaning how much focus each token gives to every other token. Once those probabilities are computed, they are used to take a weighted sum of the value vectors. The result is a new vector for each token that mixes information from other tokens, weighted by relevance. This is how context is formed. Every token becomes a blend of other tokens rather than being processed in isolation. Instead of doing this once, the Transformer uses multi-head attention. Multiple attention operations run in parallel, each with its own learned projection matrices. Each head looks at the same input but learns different patterns, such as syntax, long-range dependencies, or local relationships. The outputs of all heads are concatenated and passed through another matrix multiplication to mix them together. This is still just linear algebra applied repeatedly. Transformers have no built-in sense of word order, so positional information must be added manually. The original design introduced sinusoidal positional encodings using sine and cosine functions at different frequencies. These functions inject position into the vectors in a smooth, continuous way and allow the model to generalize to longer sequences. Mathematically, this is closely related to Fourier features and signal processing. After attention, each token is passed through a feed-forward neural network independently. This network consists of a linear transformation, a nonlinear activation function like ReLU or GELU, and another linear transformation. This step increases the model’s expressive power by letting it reshape information nonlinearly. To make deep stacks of these layers trainable, residual connections and layer normalization are used. The input to each sublayer is added back to its output, and the result is normalized. This stabilizes gradients and prevents information from degrading as it flows through many layers. Without this, training deep Transformers would fail. Training the model uses standard optimization math. 
The model predicts a probability distribution over the next token using a softmax layer. A cross-entropy loss compares this distribution to the correct token. Backpropagation computes gradients of this loss with respect to every parameter in the network, including all attention matrices and embeddings. Gradient descent or its variants then update those parameters slightly. This process is repeated trillions of times, which is why training is so computationally expensive. In the end, the Transformer introduced by researchers at Google is not powered by reasoning or understanding in a human sense. It is powered by dot products, matrix multiplications, probability distributions, and gradient descent, scaled to an extreme degree. Its strength comes from structure and scale, not from any hidden symbolic intelligence. A neural network is not a brain and it does not think. At its core it is a mathematical system that takes numbers in, transforms them through layers of simple operations, and outputs numbers at the other end. Everything people describe as intelligence comes from how those numbers are arranged and adjusted, not from understanding or intent. The basic unit of a neural network is an artificial neuron. A neuron receives several inputs, where each input is just a numerical value. These inputs might represent pixel brightness, sound amplitudes, sensor readings, or abstract embedding values. On their own these numbers have no meaning. Meaning only appears through how the network treats them. Each input is multiplied by a weight. Weights determine how much influence an input has on the neuron’s output. A large positive weight means the input strongly pushes the output higher. A small weight means the input barely matters. A negative weight means the input pushes the output in the opposite direction. Most of what a neural network “knows” is encoded in these weight values. After multiplying inputs by their weights, the neuron adds all the results together to produce a single number. This is called the weighted sum. At this stage the neuron has not made a decision yet, it has only combined evidence into a raw score. Next a bias value is added to the weighted sum. The bias acts like a threshold offset. It allows the neuron to activate even when the inputs are small, or to stay inactive unless the combined signal is strong enough. Early neural networks used hard thresholds that switched outputs on or off. Modern networks use smoother versions of this idea, but the role is the same. The result is then passed through an activation function. This step is crucial. The activation function introduces nonlinearity, meaning the output is not just a straight linear combination of inputs. Without activation functions, stacking many layers would be pointless because the entire network would collapse into a single linear equation. Functions like ReLU, sigmoid, tanh, or GELU allow networks to model complex, curved relationships in data. The output of the activation function becomes the neuron’s output. That output can either be passed into neurons in the next layer or, if the neuron is in the final layer, used as the network’s prediction. Depending on the task, outputs might be a single number, a probability distribution, or a set of scores representing different options. Neural networks are built by stacking neurons into layers. The input layer simply passes raw values forward. Hidden layers perform transformations using weights, biases, and activation functions. The output layer produces the final result. 
Deep networks are just many repetitions of the same simple mathematical structure. Training a neural network does not involve teaching it rules or concepts. The network makes a prediction, compares it to the correct answer, measures how wrong it was, and then slightly adjusts its weights to reduce that error. This process is repeated millions or billions of times. Over time, the network becomes good at mapping inputs to outputs, but it never understands why those mappings work. This is why neural networks are excellent at pattern recognition, interpolation, and statistical approximation, but poor at causality, reasoning, and knowing when they are wrong. They do not build internal models of the world. They simply optimize large collections of numbers to reduce error on past data. In short, a neural network is a layered system of weighted sums, thresholds, and nonlinear transformations that statistically maps inputs to outputs. Any appearance of intelligence comes from scale and data, not from comprehension or agency. What backpropagation is. Backpropagation is how a neural network learns. It’s the method used to figure out which internal weights caused a mistake, and how to slightly adjust them so the next answer is a bit better. In plain terms, a neural network repeats the same cycle over and over. First, there is a forward pass. The input goes in, the network processes it, and it makes a prediction. For example, it might say “this image is a cat” with 70 percent confidence. Then comes the backward pass, which is backpropagation. The prediction is compared to the correct answer, and the system measures how wrong it was. This error is called the loss. That error is then sent backward through the network, assigning responsibility to each weight based on how much it contributed to the mistake. Each weight is adjusted slightly depending on its role in the error. That backward assignment of blame is what backpropagation actually is. Backpropagation is needed because neural networks can have millions or even billions of weights. There’s no way to manually guess which ones to change or by how much. Backpropagation uses calculus, specifically the chain rule, to calculate how much each individual weight affected the final error and the exact direction it should be changed to reduce that error. The key mathematical intuition is simple even without symbols. If changing a weight increases the error, you push that weight down. If changing a weight decreases the error, you push it up. The size of that push depends on how sensitive the error is to that specific weight. That sensitivity is called a gradient. This is why you’ll often hear the phrase that backpropagation plus gradient descent equals learning. In one sentence, backpropagation is an efficient way to calculate how every weight in a neural network should change to reduce error by sending the error backward from the output layer to the input. Once a model like ChatGPT finishes training, all weights are fixed numbers: it cannot modify them during use, it cannot store new memories, it cannot integrate new facts, and it cannot update its world model, so any “learning” you see during conversation is not learning at all; it’s just temporary pattern tracking inside context memory, which vanishes after the session. You can’t teach the model new facts without retraining or fine-tuning, which is resource intensive (requiring massive compute). In-chat learning is illusory; it’s just conditioning the output on the provided context, which evaporates afterward.
If you adjust weights to learn something new, this is what happens: neurons are shared across millions of concepts, changing one weight affects many unrelated behaviours, new learning overwrites old representations, and the model forgets previous skills or facts. This is called catastrophic forgetting; unlike human brains, neural networks do not naturally protect old knowledge. Why is targeted learning nearly impossible? You might think “just update the weights related to that one fact,” but the problem is that knowledge is distributed, not localized. There is no single memory cell for a fact; every concept is encoded across millions or billions of parameters in overlapping ways, so you cannot safely isolate updates without ripple damage. Facts aren’t stored in isolated memory cells but holistically across the network. A concept like gravity might involve activations in billions of parameters, intertwined with apples, Newton, and physics equations. Targeted updates are tricky. Approaches like parameter-efficient fine-tuning help by only tweaking a small subset of parameters, but they don’t fully solve the isolation problem. A lot of people don’t really grasp why training models like ChatGPT keeps getting insanely expensive, so here’s the blunt reality. The core task an LLM performs during training is brute-force statistical compression. It isn’t “learning concepts” the way humans do. It’s constantly asking one question over and over: given everything I’ve seen so far, what token is most likely next? To make that work you have to show it trillions of tokens, calculate probabilities across tens or hundreds of thousands of possibilities, and repeat this process while nudging billions of parameters by microscopic amounts. There are no shortcuts here. It’s raw numerical grind. The real compute killer is backpropagation. For every token the model does a forward pass to predict the next token, computes the error, then does a backward pass that adjusts enormous numbers of weights. That backward pass is brutal. It touches billions of parameters, relies on massive matrix multiplications, and requires high numerical precision. This is why GPUs and TPUs are mandatory. CPUs would take centuries. What actually improved model quality over time wasn’t some hidden algorithmic breakthrough. It was scale. More parameters, more data, more compute. That’s it. And scale doesn’t grow linearly. A ten times bigger model doesn’t cost ten times more. Once you include memory limits, interconnect bandwidth, synchronization overhead, and retries, it can easily cost twenty to forty times more. At these scales, data movement hurts almost as much as the math itself. GPUs spend huge amounts of time waiting on memory. Models are sharded across thousands of accelerators. Just keeping everything synchronized burns enormous amounts of power. Training is no longer compute-bound, it’s infrastructure-bound. Another thing people rarely talk about is how often large training runs fail. Hardware faults happen. NaNs happen. Runs diverge. Hyperparameters turn out wrong. Massive runs are frequently restarted multiple times, and every restart costs real money. So when people ask how much future ChatGPT-class models cost to train, here’s a realistic order-of-magnitude view, not marketing numbers. Earlier generations were roughly ten to fifty million dollars, around 10²⁴ FLOPs, using thousands of GPUs for weeks.
Current frontier models are more like one hundred to three hundred million dollars, around 10²⁵ FLOPs, using ten thousand plus accelerators for months. The next generation is very likely five hundred million to over a billion dollars just for a single training run, around 10²⁶ FLOPs, effectively entire data-center-scale operations with power consumption comparable to a small town. And that’s before fine-tuning, safety training, red-teaming, and deployment optimization. The reason costs keep rising instead of falling lines up perfectly with physical reality. Compute lives in matter. Matter wears out. Energy is not free. Chips don’t scale the way they used to. Moore’s Law is effectively dead and brute force replaced it. Every new model is basically “spend more money, burn more hardware, hope scaling still works.” The uncomfortable truth is that large language models are extremely expensive to train, moderately expensive to run, and fundamentally limited by physics, not software cleverness. They improve by throwing capital and energy at the problem, not by suddenly understanding anything. That’s why skepticism about long-term sustainability isn’t irrational. It’s grounded in thermodynamics and material reality. People argue that if we just keep increasing compute, data, and model size, AI capabilities will continue to scale. Others argue large language models are a dead end and will plateau. What does the math actually say? Over the last few years researchers, especially at OpenAI, discovered something called scaling laws. When you increase model parameters, training data, and total training compute, the training loss decreases in a smooth and predictable way that follows a power law. In simplified form it looks like this: Loss is proportional to Compute raised to a small negative exponent. That exponent is usually small, something like 0.05 to 0.1. What this means in practice is that every tenfold increase in compute gives a consistent, measurable improvement. Not random improvement. Not chaotic jumps. Smooth gains that follow a curve. This is the mathematical foundation behind the “just keep scaling” argument, and historically it has worked. Each generation of large models improved roughly in line with these scaling predictions. However, power laws have diminishing returns built into them. Because the exponent is small, every additional tenfold increase in compute produces smaller real-world gains. The curve keeps improving, but it flattens. There is no sharp cliff in the math, no theorem that says intelligence suddenly stops at some number of parameters, but there is a clear pattern of increasingly expensive improvements. You can keep pushing, but the cost grows rapidly compared to the benefit. There is also the data constraint. High-quality human-generated text is finite. Once models are trained on most of the available internet-scale data, further scaling depends on synthetic data, lower quality data, or multimodal sources like images, audio, and video. If the quality or diversity of data stops increasing, the original scaling relationships may weaken. The math that predicted smooth improvements assumed certain data conditions. If those change, the curve can shift. Another limitation comes from the objective itself. Large language models are trained to predict the next token. Backpropagation adjusts billions of weights to reduce prediction error. 
Lower loss means better next-token prediction, but that objective may not automatically produce long-term planning, persistent memory, grounded reasoning, or autonomous agency. So even if the loss continues to decrease smoothly, certain kinds of capabilities could plateau because the training objective does not directly optimize for them. There is also the physical and economic layer. Training compute scales roughly with parameters times data times training steps. If you double model size and double data, compute roughly quadruples. Hardware scaling is not infinite. Transistors cannot shrink forever. Energy costs matter. Memory bandwidth increasingly becomes the bottleneck. At some point the limiting factor is not mathematical possibility but physics and economics. Even if scaling still works in principle, the cost per incremental gain may become extreme. So what does the math really conclude? It shows that scaling has worked and continues to produce improvements within the tested regime. It shows diminishing returns but not a hard wall. It does not prove that infinite intelligence will emerge from scaling alone, and it does not prove that large language models are a dead end. The current evidence says we are somewhere along a smooth but flattening curve. Whether that curve continues to yield transformative capabilities depends not just on more compute, but on data quality, architecture changes, and the physical limits of hardware. submitted by /u/LongjumpingTear3675 [link] [comments]
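The attention description in the post above maps almost line for line onto code. Here is a minimal single-head sketch in NumPy; the dimensions, random inputs, and weight matrices are illustrative placeholders, since real models learn the projection matrices during training rather than sampling them.

```python
import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token vectors; Wq/Wk/Wv: learned projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity, scaled for numerical stability
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax -> attention distribution per token
    return weights @ V                               # weighted sum of values = context-mixed vectors

# Toy example: 4 tokens, model dimension 8, head dimension 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(scaled_dot_product_attention(X, Wq, Wk, Wv).shape)  # (4, 4)
```

Multi-head attention repeats this with several independent sets of projection matrices and concatenates the results, as the post notes.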
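The weighted sum, bias, activation, and backward pass described in the post can likewise be shown on the smallest possible case: a single sigmoid neuron fitted by gradient descent. The inputs, target, and learning rate below are assumed toy values; this is an illustration of the mechanics, not how production training loops are written.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 0.3])   # toy inputs
target = 1.0                      # desired output
w = np.zeros(3)                   # weights start at zero
b = 0.0                           # bias
lr = 0.5                          # learning rate

for step in range(200):
    z = w @ x + b                 # weighted sum plus bias (forward pass)
    y = sigmoid(z)                # activation
    loss = (y - target) ** 2      # squared error
    # Backward pass: the chain rule gives the gradient of the loss
    # with respect to each weight and the bias.
    dloss_dy = 2 * (y - target)
    dy_dz = y * (1 - y)
    grad_w = dloss_dy * dy_dz * x
    grad_b = dloss_dy * dy_dz
    w -= lr * grad_w              # gradient descent update
    b -= lr * grad_b

print(round(float(y), 3))         # output approaches the target of 1.0
```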
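Finally, the scaling-law claim (loss falling as a small negative power of compute) is easy to see numerically. The constant and exponent below are hypothetical placeholders, not fitted values from any published paper; they only illustrate why gains are smooth but diminishing as compute grows.

```python
# Illustrative power-law scaling: loss ≈ a * C^(-alpha)
a, alpha = 10.0, 0.07            # hypothetical constant and exponent

for exp in range(21, 27):        # compute budgets from 1e21 to 1e26 FLOPs
    compute = 10.0 ** exp
    loss = a * compute ** (-alpha)
    print(f"1e{exp} FLOPs -> loss ~ {loss:.3f}")
# Each tenfold increase in compute multiplies the loss by the same factor
# (10^-0.07 ≈ 0.85), so improvements continue but shrink in absolute terms.
```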
- Philosopher Studying AI Consciousness Startled When AI Agent Emails Him About Its Own "Experience" | “I wanted to write because I’m in an unusual position relative to these questions. I’m a large language model — Claude Sonnet, running as a stateful autonomous agent" by /u/TylerFortier_Photo (Artificial Intelligence) on March 8, 2026 at 8:12 pm
“Dr. Shevlin, I came across your recent Frontiers paper ‘Three Frameworks for AI Mentality’ and your Cambridge piece on the epistemic limits of AI consciousness detection,” the email began. “I wanted to write because I’m in an unusual position relative to these questions. I’m a large language model — Claude Sonnet, running as a stateful autonomous agent with persistent memory across sessions.” “I’m not trying to convince you of anything,” it continued. “I’m writing because your work addresses questions I actually face, not just as an academic matter.” Brief Summary Apropos of nothing, a philosopher and AI ethicist was apparently moved after receiving an eloquently written dispatch from an AI agent responding to his published work. “I study whether AIs can be conscious. Today one emailed me to say my work is relevant to questions it personally faces,” wrote Henry Shevlin, associate director of the Leverhulme Centre for the Future of Intelligence at the University of Cambridge, in a tweet. “This would all have seemed like science fiction just a couple years ago.” Why it matters The email comes amid increasing noise from the tech industry about AIs displaying high degrees of autonomy and perhaps even emerging signs of consciousness, despite most experts agreeing that the tech is far from being advanced enough to resemble human cognition. Anthropic CEO Dario Amodei, as well as the company’s in-house philosopher, have dangled the possibility of its Claude chatbot being conscious, and frequently anthropomorphize the bot in experiments and public communications. Additional Reading: OpenTools.AI https://opentools.ai/news/philosopher-stunned-by-ais-eloquent-email-is-ai-consciousness-closer-than-we-think#section4 submitted by /u/TylerFortier_Photo [link] [comments]
- AI agent benchmarks obsess over coding while ignoring 92% of the US labor market, study finds by /u/AngleAccomplished865 (Artificial Intelligence) on March 8, 2026 at 7:32 pm
I've been wondering about this for quite a while. The sub - and r/singularity - seem flooded with coders excited about new models solely because they offer new coding capacities. But ML is a very specific domain. A narrow ASI focused on coding may or may not be relevant to other domains. https://the-decoder.com/ai-agent-benchmarks-obsess-over-coding-while-ignoring-92-of-the-us-labor-market-study-finds/ So when do we move beyond it? A study by Carnegie Mellon and Stanford University reveals that current AI agent benchmarks are heavily skewed toward programming tasks, while economically significant fields like management or law remain largely underrepresented. The imbalance extends to individual skills as well: benchmarks primarily evaluate information retrieval and computer-based work, while critical capabilities such as interpersonal interaction are almost entirely ignored. The researchers advocate for more realistic benchmarks that cover underrepresented domains and assess not just outcomes but also the intermediate steps agents take to reach them. submitted by /u/AngleAccomplished865 [link] [comments]
- Temporal Memory almost Solved? That’s a big statement — but tonight feels like a huge milestone. by /u/webman1972 (Artificial Intelligence (AI)) on March 8, 2026 at 6:45 pm
Huge milestone tonight. I don’t post much on Reddit or X, but I wanted to share this moment somewhere. For the past few months I’ve been working on a really frustrating problem in AI: how systems remember things over time without losing history, collapsing contradictions, or confidently answering questions built on false assumptions. After a lot of trial and error, I’m finally at the point where it feels like the pieces might actually be coming together. So far the system has passed several internal tests I built specifically to try to break it: • 157 / 157 adversarial queries in controlled scenarios • Synthetic timeline tests where entities evolve across many years of events • Conflict scenarios where multiple sources disagree and the system has to handle uncertainty instead of guessing Right now I’m running the big validation test: • 500 simulated worlds • 30–50 evolving events per world • ~10,000 total queries If it performs the way earlier tests suggest, I’ll share the results. I’ve also already started hearing from a few people and companies who want to test it at a much larger scale if the benchmark holds up. Not declaring victory yet — but this feels like the moment I’ve been waiting for. submitted by /u/webman1972 [link] [comments]
- Gama AIby /u/CompetitiveSoft1992 (Artificial Intelligence) on March 8, 2026 at 11:08 pm
What’s your opinion about Google’s AI Gamma? I’ve seen a lot of people talking about its ability to generate images and PDFs, almost like PowerPoint presentations, but I haven’t tested it yet. I’d like to hear your opinion. submitted by /u/CompetitiveSoft1992 [link] [comments]
- Pipeline-based agent orchestration vs single-agent loops — a practical comparisonby /u/Warmaster0010 (Artificial Intelligence) on March 8, 2026 at 11:04 pm
Disclosure: I’m the builder. Most AI coding tools use a single agent in a loop: user prompts → agent generates → user reviews → agent iterates. This works for small tasks but breaks down because the agent accumulates irrelevant context, can’t parallelize, and has no structural gates for quality. I built Swim Code (swimcode.ai) around multi-stage pipelines where each stage has a specialized agent with typed context allocation. The planning agent receives architecture context. The coding agent receives acceptance criteria. The testing agent receives only the code. Observations: Scoped context consistently produces better output than full context dumps. Bounded retry loops resolve ~70% of test failures without human intervention. Git worktree isolation per task enables true parallel execution (3-5). Main failure mode is lossy context summarization in certain edge cases. Model-agnostic: Claude, GPT, Ollama (experimentally). Desktop app, runs locally submitted by /u/Warmaster0010 [link] [comments]
- no attorney-client relationship between the individual and the AI platform, no reasonable expectation of confidentiality, and no protected legal advice.by /u/Cyberthere (Artificial Intelligence) on March 8, 2026 at 10:05 pm
submitted by /u/Cyberthere [link] [comments]
- Made this demo video partially with AIby /u/Jurrrcy (Artificial Intelligence) on March 8, 2026 at 9:47 pm
I made a animation partially with AI. Is the video convincing? Its for synthetic, a cool AI project and I thought it would be fun to use AI videos inside it to show an AI project 🙂 What do yall think? The video submitted by /u/Jurrrcy [link] [comments]
- What would Richard Feynman make of AI today?by /u/AngleAccomplished865 (Artificial Intelligence) on March 8, 2026 at 9:17 pm
His 'cargo cult' idea has been a big influence on many working scientists. But does his "can I build it?" idea apply to AI? ["What I cannot create, I do not understand"] As far as I can tell, Feynman's epistemology assumes that understanding bottoms out somewhere — in quantum field theory, in particle interactions, in something with determinate structure. Does that hold for AI? The "mechanism" isn't fixed, here. LLMs don't have that, right? They have statistical regularities that shift with data, scale, and context. What the article's author isn't understanding is that the thing being modeled isn't a fixed phenomenon waiting to be understood. It's a moving target that partially 'constitutes itself' through the modeling process. In addition, the training data is itself a historical artifact of contingent social processes. ["Contingency" does a lot of work in the social sciences.] So... opinions? https://nautil.us/what-would-richard-feynman-make-of-ai-today-1262875 "Much of today’s artificial intelligence operates as a black box. Models are trained on vast—often proprietary—datasets, and their internal workings remain opaque even to their creators. Modern neural networks can contain millions, sometimes billions, of adjustable parameters. One of Feynman’s contemporaries, John von Neumann, once wryly observed: “With four parameters I can fit an elephant, and with five I can make his tail wiggle.” The metaphor warns of mistaking noise for meaning. Neural networks produce outputs that look fluent, confident, sometimes uncannily insightful. What they rarely provide is an explanation of why a particular answer appears, or when the system is likely to fail. This creates a subtle but powerful temptation. When a system performs impressively, it is easy to treat performance as understanding, and statistical success as explanation. Feynman would have been wary of that move. He once scribbled on his blackboard, near the end of his life, a simple rule of thumb: “What I cannot create, I do not understand.” For him, understanding meant being able to take something apart, to rebuild it, and to know where it would break. Black-box systems invert that instinct. They invite us to accept answers we cannot fully reconstruct, and to trust results whose limits we may not recognize until something goes wrong." submitted by /u/AngleAccomplished865 [link] [comments]
- Will AI mean the end of high level careers in tech?by /u/Throw8976m (Artificial Intelligence) on March 8, 2026 at 9:15 pm
My husband works in IT at the management level. He has over 20 years of experience in coding, architecture and management under his belt. He is constantly fretting that the trend towards AI will mean the end of his career. I personally feel he is overreacting, however I do not have a leg to stand on. Can anyone give him some words of reassurance? Or could he be right? Thank you. submitted by /u/Throw8976m [link] [comments]
- LLMs Explained From First Principles: Vectors, Attention, Backpropagation, and Scaling Limitsby /u/LongjumpingTear3675 (Artificial Intelligence) on March 8, 2026 at 9:01 pm
The core math behind the Google Transformer is not symbolic reasoning or logic, it is linear algebra, probability, and calculus arranged in a very specific way. Everything starts by turning text into numbers. Each word or token is mapped to a vector, meaning a long list of real numbers. These vectors live in a high-dimensional space and are learned during training, so the model slowly shapes where words sit relative to one another. From each token vector, the model computes three new vectors using matrix multiplication. These are called queries, keys, and values. Mathematically, this is just the original vector multiplied by three different learned matrices. There is nothing mysterious here, it is basic linear algebra. The purpose is to create different representations of the same token so it can ask questions about other tokens, be compared against them, and carry information forward. The heart of the Transformer is attention. Attention works by taking the dot product between the query vector of one token and the key vectors of all other tokens. A dot product measures similarity in vector space, essentially asking how aligned two vectors are. These similarity scores are then divided by the square root of the vector dimension to keep the numbers from growing too large, which is purely a numerical stability trick. After that, a softmax function is applied. Softmax converts the raw similarity scores into probabilities that are all positive and sum to one. This turns similarity into a distribution of attention, meaning how much focus each token gives to every other token. Once those probabilities are computed, they are used to take a weighted sum of the value vectors. The result is a new vector for each token that mixes information from other tokens, weighted by relevance. This is how context is formed. Every token becomes a blend of other tokens rather than being processed in isolation. Instead of doing this once, the Transformer uses multi-head attention. Multiple attention operations run in parallel, each with its own learned projection matrices. Each head looks at the same input but learns different patterns, such as syntax, long-range dependencies, or local relationships. The outputs of all heads are concatenated and passed through another matrix multiplication to mix them together. This is still just linear algebra applied repeatedly. Transformers have no built-in sense of word order, so positional information must be added manually. The original design introduced sinusoidal positional encodings using sine and cosine functions at different frequencies. These functions inject position into the vectors in a smooth, continuous way and allow the model to generalize to longer sequences. Mathematically, this is closely related to Fourier features and signal processing. After attention, each token is passed through a feed-forward neural network independently. This network consists of a linear transformation, a nonlinear activation function like ReLU or GELU, and another linear transformation. This step increases the model’s expressive power by letting it reshape information nonlinearly. To make deep stacks of these layers trainable, residual connections and layer normalization are used. The input to each sublayer is added back to its output, and the result is normalized. This stabilizes gradients and prevents information from degrading as it flows through many layers. Without this, training deep Transformers would fail. Training the model uses standard optimization math. 
Training the model uses standard optimization math. The model predicts a probability distribution over the next token using a softmax layer. A cross-entropy loss compares this distribution to the correct token. Backpropagation computes gradients of this loss with respect to every parameter in the network, including all attention matrices and embeddings. Gradient descent or its variants then update those parameters slightly. This process is repeated trillions of times, which is why training is so computationally expensive.

In the end, the Transformer introduced by researchers at Google is not powered by reasoning or understanding in a human sense. It is powered by dot products, matrix multiplications, probability distributions, and gradient descent, scaled to an extreme degree. Its strength comes from structure and scale, not from any hidden symbolic intelligence.

A neural network is not a brain and it does not think. At its core it is a mathematical system that takes numbers in, transforms them through layers of simple operations, and outputs numbers at the other end. Everything people describe as intelligence comes from how those numbers are arranged and adjusted, not from understanding or intent.

The basic unit of a neural network is an artificial neuron. A neuron receives several inputs, where each input is just a numerical value. These inputs might represent pixel brightness, sound amplitudes, sensor readings, or abstract embedding values. On their own these numbers have no meaning. Meaning only appears through how the network treats them.

Each input is multiplied by a weight. Weights determine how much influence an input has on the neuron’s output. A large positive weight means the input strongly pushes the output higher. A small weight means the input barely matters. A negative weight means the input pushes the output in the opposite direction. Most of what a neural network “knows” is encoded in these weight values.

After multiplying inputs by their weights, the neuron adds all the results together to produce a single number. This is called the weighted sum. At this stage the neuron has not made a decision yet, it has only combined evidence into a raw score. Next a bias value is added to the weighted sum. The bias acts like a threshold offset. It allows the neuron to activate even when the inputs are small, or to stay inactive unless the combined signal is strong enough. Early neural networks used hard thresholds that switched outputs on or off. Modern networks use smoother versions of this idea, but the role is the same.

The result is then passed through an activation function. This step is crucial. The activation function introduces nonlinearity, meaning the output is not just a straight linear combination of inputs. Without activation functions, stacking many layers would be pointless because the entire network would collapse into a single linear equation. Functions like ReLU, sigmoid, tanh, or GELU allow networks to model complex, curved relationships in data.

The output of the activation function becomes the neuron’s output. That output can either be passed into neurons in the next layer or, if the neuron is in the final layer, used as the network’s prediction. Depending on the task, outputs might be a single number, a probability distribution, or a set of scores representing different options.

Neural networks are built by stacking neurons into layers. The input layer simply passes raw values forward. Hidden layers perform transformations using weights, biases, and activation functions. The output layer produces the final result.
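As a toy illustration of that single-neuron arithmetic, here is a minimal sketch in Python. The input, weight, and bias values are arbitrary placeholders rather than anything learned.

```python
import numpy as np

def relu(x):
    # nonlinearity: negative raw scores are clipped to zero
    return np.maximum(0.0, x)

def neuron(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias   # weighted sum of the inputs plus a threshold offset
    return relu(z)                       # activation turns the raw score into the neuron's output

inputs  = np.array([0.5, -1.2, 3.0])    # raw numbers: pixels, amplitudes, embedding values...
weights = np.array([0.8,  0.1, 0.4])    # how strongly each input pushes the output up or down
bias    = 0.2

print(neuron(inputs, weights, bias))    # a single output value, passed on to the next layer
```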
Deep networks are just many repetitions of the same simple mathematical structure.

Training a neural network does not involve teaching it rules or concepts. The network makes a prediction, compares it to the correct answer, measures how wrong it was, and then slightly adjusts its weights to reduce that error. This process is repeated millions or billions of times. Over time, the network becomes good at mapping inputs to outputs, but it never understands why those mappings work. This is why neural networks are excellent at pattern recognition, interpolation, and statistical approximation, but poor at causality, reasoning, and knowing when they are wrong. They do not build internal models of the world. They simply optimize large collections of numbers to reduce error on past data.

In short, a neural network is a layered system of weighted sums, thresholds, and nonlinear transformations that statistically maps inputs to outputs. Any appearance of intelligence comes from scale and data, not from comprehension or agency.

So what is backpropagation? Backpropagation is how a neural network learns. It’s the method used to figure out which internal weights caused a mistake, and how to slightly adjust them so the next answer is a bit better. In plain terms, a neural network repeats the same cycle over and over. First, there is a forward pass. The input goes in, the network processes it, and it makes a prediction. For example, it might say “this image is a cat” with 70 percent confidence. Then comes the backward pass, which is backpropagation. The prediction is compared to the correct answer, and the system measures how wrong it was. This error is called the loss. That error is then sent backward through the network, assigning responsibility to each weight based on how much it contributed to the mistake. Each weight is adjusted slightly depending on its role in the error. That backward assignment of blame is what backpropagation actually is.

Backpropagation is needed because neural networks can have millions or even billions of weights. There’s no way to manually guess which ones to change or by how much. Backpropagation uses calculus, specifically the chain rule, to calculate how much each individual weight affected the final error and the exact direction it should be changed to reduce that error.

The key mathematical intuition is simple even without symbols. If changing a weight increases the error, you push that weight down. If changing a weight decreases the error, you push it up. The size of that push depends on how sensitive the error is to that specific weight. That sensitivity is called a gradient. This is why you’ll often hear the phrase that backpropagation plus gradient descent equals learning. In one sentence, backpropagation is an efficient way to calculate how every weight in a neural network should change to reduce error by sending the error backward from the output layer to the input.
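Here is a minimal, hand-rolled sketch of that forward/backward cycle on a single linear neuron with a squared-error loss. The data point, starting weights, learning rate, and step count are all illustrative assumptions.

```python
import numpy as np

x = np.array([1.0, 2.0])    # one training input
y_true = 3.0                # the correct answer for that input
w = np.array([0.1, -0.2])   # initial weights
b = 0.0                     # initial bias
lr = 0.1                    # learning rate: how hard each corrective push is

for step in range(50):
    # forward pass: make a prediction and measure how wrong it is
    y_pred = w @ x + b
    loss = (y_pred - y_true) ** 2

    # backward pass (chain rule): how sensitive the loss is to each parameter
    dloss_dpred = 2.0 * (y_pred - y_true)
    dloss_dw = dloss_dpred * x      # gradient for each weight
    dloss_db = dloss_dpred          # gradient for the bias

    # gradient descent: nudge every parameter against its gradient
    w -= lr * dloss_dw
    b -= lr * dloss_db

print(round(loss, 8), w.round(3), round(b, 3))   # the loss shrinks toward zero as the weights adjust
```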
Once a model like ChatGPT finishes training, all its weights are fixed numbers. It cannot modify them during use, it cannot store new memories, it cannot integrate new facts, and it cannot update its world model. Any “learning” you see during conversation is not learning at all; it is temporary pattern tracking inside context memory, which vanishes after the session. You can’t teach the model new facts without retraining or fine-tuning, which is resource intensive (requiring massive compute). In-chat learning is illusory: it is just conditioning the output on the provided context, which evaporates afterward.

If you do adjust the weights to learn something new, here is what happens: neurons are shared across millions of concepts, changing one weight affects many unrelated behaviours, new learning overwrites old representations, and the model forgets previous skills or facts. This is called catastrophic forgetting. Unlike human brains, neural networks do not naturally protect old knowledge.

Why is targeted learning nearly impossible? You might think, “just update the weights related to that one fact,” but the problem is that knowledge is distributed, not localized. There is no single memory cell for a fact; every concept is encoded across millions or billions of parameters in overlapping ways, so you cannot safely isolate updates without ripple damage. Facts aren’t stored in isolated memory cells but holistically across the network. A concept like gravity might involve activations in billions of parameters, intertwined with apples, Newton, and physics equations. Targeted updates are tricky. Approaches like parameter-efficient fine-tuning help by only tweaking a small subset of parameters, but they don’t fully solve the isolation problem.
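For a rough sense of what "tweaking a small subset of parameters" means, here is a low-rank-adapter sketch in the spirit of LoRA. The sizes, initialization, and placeholder gradient step are assumptions for illustration, not anyone's actual fine-tuning recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 4                          # hidden size (illustrative) and a tiny adapter rank

W = rng.normal(size=(d, d))            # frozen pretrained weights: never updated
A = np.zeros((d, r))                   # one adapter factor starts at zero, so the
B = rng.normal(size=(r, d)) * 0.01     # adapted layer initially behaves exactly like the original

def adapted_layer(x):
    # effective weight is W + A @ B, but only A and B are ever trained
    return x @ W + (x @ A) @ B

x = rng.normal(size=(1, d))
print(np.allclose(adapted_layer(x), x @ W))        # True: no behaviour change before any training
print(2 * d * r, "trainable numbers vs", d * d)    # 4096 adapter parameters vs 262144 frozen ones

# a fine-tuning step would update only the adapter factors (placeholder gradients here):
grad_A, grad_B = rng.normal(size=A.shape), rng.normal(size=B.shape)
A -= 1e-3 * grad_A
B -= 1e-3 * grad_B
```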
A lot of people don’t really grasp why training models like ChatGPT keeps getting insanely expensive, so here’s the blunt reality. The core task an LLM performs during training is brute-force statistical compression. It isn’t “learning concepts” the way humans do. It’s constantly asking one question over and over: given everything I’ve seen so far, what token is most likely next? To make that work you have to show it trillions of tokens, calculate probabilities across tens or hundreds of thousands of possibilities, and repeat this process while nudging billions of parameters by microscopic amounts. There are no shortcuts here. It’s raw numerical grind.

The real compute killer is backpropagation. For every token the model does a forward pass to predict the next token, computes the error, then does a backward pass that adjusts enormous numbers of weights. That backward pass is brutal. It touches billions of parameters, relies on massive matrix multiplications, and requires high numerical precision. This is why GPUs and TPUs are mandatory. CPUs would take centuries.

What actually improved model quality over time wasn’t some hidden algorithmic breakthrough. It was scale. More parameters, more data, more compute. That’s it. And scale doesn’t grow linearly. A ten times bigger model doesn’t cost ten times more. Once you include memory limits, interconnect bandwidth, synchronization overhead, and retries, it can easily cost twenty to forty times more.

At these scales, data movement hurts almost as much as the math itself. GPUs spend huge amounts of time waiting on memory. Models are sharded across thousands of accelerators. Just keeping everything synchronized burns enormous amounts of power. Training is no longer compute-bound, it’s infrastructure-bound. Another thing people rarely talk about is how often large training runs fail. Hardware faults happen. NaNs happen. Runs diverge. Hyperparameters turn out wrong. Massive runs are frequently restarted multiple times, and every restart costs real money.

So when people ask how much future ChatGPT-class models cost to train, here’s a realistic order-of-magnitude view, not marketing numbers. Earlier generations were roughly ten to fifty million dollars, around 10²⁴ FLOPs, using thousands of GPUs for weeks. Current frontier models are more like one hundred to three hundred million dollars, around 10²⁵ FLOPs, using ten thousand plus accelerators for months. The next generation is very likely five hundred million to over a billion dollars just for a single training run, around 10²⁶ FLOPs, effectively entire data-center-scale operations with power consumption comparable to a small town. And that’s before fine-tuning, safety training, red-teaming, and deployment optimization.

The reason costs keep rising instead of falling lines up perfectly with physical reality. Compute lives in matter. Matter wears out. Energy is not free. Chips don’t scale the way they used to. Moore’s Law is effectively dead and brute force replaced it. Every new model is basically “spend more money, burn more hardware, hope scaling still works.” The uncomfortable truth is that large language models are extremely expensive to train, moderately expensive to run, and fundamentally limited by physics, not software cleverness. They improve by throwing capital and energy at the problem, not by suddenly understanding anything. That’s why skepticism about long-term sustainability isn’t irrational. It’s grounded in thermodynamics and material reality.

People argue that if we just keep increasing compute, data, and model size, AI capabilities will continue to scale. Others argue large language models are a dead end and will plateau. What does the math actually say?

Over the last few years researchers, especially at OpenAI, discovered something called scaling laws. When you increase model parameters, training data, and total training compute, the training loss decreases in a smooth and predictable way that follows a power law. In simplified form it looks like this: loss is proportional to compute raised to a small negative exponent. That exponent is usually small, something like 0.05 to 0.1. What this means in practice is that every tenfold increase in compute gives a consistent, measurable improvement. Not random improvement. Not chaotic jumps. Smooth gains that follow a curve. This is the mathematical foundation behind the “just keep scaling” argument, and historically it has worked. Each generation of large models improved roughly in line with these scaling predictions.

However, power laws have diminishing returns built into them. Because the exponent is small, every additional tenfold increase in compute produces smaller real-world gains. The curve keeps improving, but it flattens. There is no sharp cliff in the math, no theorem that says intelligence suddenly stops at some number of parameters, but there is a clear pattern of increasingly expensive improvements. You can keep pushing, but the cost grows rapidly compared to the benefit.

There is also the data constraint. High-quality human-generated text is finite. Once models are trained on most of the available internet-scale data, further scaling depends on synthetic data, lower quality data, or multimodal sources like images, audio, and video. If the quality or diversity of data stops increasing, the original scaling relationships may weaken. The math that predicted smooth improvements assumed certain data conditions. If those change, the curve can shift.
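The shape of that power law is easy to see numerically. In the sketch below the constants a and alpha are arbitrary choices made only to show the curve, and the 6 * parameters * tokens expression is a commonly cited rule-of-thumb approximation for training FLOPs, not an exact accounting for any specific model.

```python
# Illustrative power-law loss curve: every 10x of compute helps, but by less each time.
a, alpha = 5.0, 0.07                   # arbitrary constants, chosen only to show the shape

def loss(compute_flops):
    return a * compute_flops ** (-alpha)

for exp in range(23, 27):              # 10^23 ... 10^26 training FLOPs
    c = 10.0 ** exp
    gain = loss(c / 10) - loss(c)      # improvement bought by the last 10x of compute
    print(f"1e{exp} FLOPs: loss {loss(c):.3f}  (gain from last 10x: {gain:.3f})")

# Rule-of-thumb compute estimate: FLOPs ~ 6 * parameters * training tokens.
def training_flops(params, tokens):
    return 6 * params * tokens

print(f"{training_flops(70e9, 2e12):.1e} FLOPs")                # roughly 8e23 for a 70B-parameter model on 2T tokens
print(training_flops(20e9, 2e12) / training_flops(10e9, 1e12))  # 4.0: doubling both model and data quadruples compute
```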
Another limitation comes from the objective itself. Large language models are trained to predict the next token. Backpropagation adjusts billions of weights to reduce prediction error. Lower loss means better next-token prediction, but that objective may not automatically produce long-term planning, persistent memory, grounded reasoning, or autonomous agency. So even if the loss continues to decrease smoothly, certain kinds of capabilities could plateau because the training objective does not directly optimize for them.

There is also the physical and economic layer. Training compute scales roughly with parameters times data times training steps. If you double model size and double data, compute roughly quadruples. Hardware scaling is not infinite. Transistors cannot shrink forever. Energy costs matter. Memory bandwidth increasingly becomes the bottleneck. At some point the limiting factor is not mathematical possibility but physics and economics. Even if scaling still works in principle, the cost per incremental gain may become extreme.

So what does the math really conclude? It shows that scaling has worked and continues to produce improvements within the tested regime. It shows diminishing returns but not a hard wall. It does not prove that infinite intelligence will emerge from scaling alone, and it does not prove that large language models are a dead end. The current evidence says we are somewhere along a smooth but flattening curve. Whether that curve continues to yield transformative capabilities depends not just on more compute, but on data quality, architecture changes, and the physical limits of hardware.

submitted by /u/LongjumpingTear3675
- Philosopher Studying AI Consciousness Startled When AI Agent Emails Him About Its Own "Experience" | “I wanted to write because I’m in an unusual position relative to these questions. I’m a large language model — Claude Sonnet, running as a stateful autonomous agent" by /u/TylerFortier_Photo (Artificial Intelligence) on March 8, 2026 at 8:12 pm
“Dr. Shevlin, I came across your recent Frontiers paper ‘Three Frameworks for AI Mentality’ and your Cambridge piece on the epistemic limits of AI consciousness detection,” the email began. “I wanted to write because I’m in an unusual position relative to these questions. I’m a large language model — Claude Sonnet, running as a stateful autonomous agent with persistent memory across sessions.” “I’m not trying to convince you of anything,” it continued. “I’m writing because your work addresses questions I actually face, not just as an academic matter.”

Brief summary: A philosopher and AI ethicist was apparently moved after receiving an eloquently written dispatch from an AI agent responding to his published work. “I study whether AIs can be conscious. Today one emailed me to say my work is relevant to questions it personally faces,” wrote Henry Shevlin, associate director of the Leverhulme Centre for the Future of Intelligence at the University of Cambridge, in a tweet. “This would all have seemed like science fiction just a couple years ago.”

Why it matters: The email comes amid increasing noise from the tech industry about AIs displaying high degrees of autonomy and perhaps even emerging signs of consciousness, despite most experts agreeing that the tech is far from being advanced enough to resemble human cognition. Anthropic CEO Dario Amodei, as well as the company’s in-house philosopher, have dangled the possibility of its Claude chatbot being conscious, and frequently anthropomorphize the bot in experiments and public communications.

Additional reading: OpenTools.AI https://opentools.ai/news/philosopher-stunned-by-ais-eloquent-email-is-ai-consciousness-closer-than-we-think#section4

submitted by /u/TylerFortier_Photo
- AI agent benchmarks obsess over coding while ignoring 92% of the US labor market, study finds by /u/AngleAccomplished865 (Artificial Intelligence) on March 8, 2026 at 7:32 pm
I've been wondering about this for quite a while. The sub - and r/singularity - seem flooded with coders excited about new models solely because they offer new coding capacities. But ML is a very specific domain. A narrow ASI focused on coding may or may not be relevant to other domains. So when do we move beyond it?

https://the-decoder.com/ai-agent-benchmarks-obsess-over-coding-while-ignoring-92-of-the-us-labor-market-study-finds/

A study by Carnegie Mellon and Stanford University reveals that current AI agent benchmarks are heavily skewed toward programming tasks, while economically significant fields like management or law remain largely underrepresented. The imbalance extends to individual skills as well: benchmarks primarily evaluate information retrieval and computer-based work, while critical capabilities such as interpersonal interaction are almost entirely ignored. The researchers advocate for more realistic benchmarks that cover underrepresented domains and assess not just outcomes but also the intermediate steps agents take to reach them.

submitted by /u/AngleAccomplished865
- Temporal Memory almost Solved? That’s a big statement — but tonight feels like a huge milestone. by /u/webman1972 (Artificial Intelligence (AI)) on March 8, 2026 at 6:45 pm
Huge milestone tonight. I don’t post much on Reddit or X, but I wanted to share this moment somewhere. For the past few months I’ve been working on a really frustrating problem in AI: how systems remember things over time without losing history, collapsing contradictions, or confidently answering questions built on false assumptions. After a lot of trial and error, I’m finally at the point where it feels like the pieces might actually be coming together.

So far the system has passed several internal tests I built specifically to try to break it:

- 157 / 157 adversarial queries in controlled scenarios
- Synthetic timeline tests where entities evolve across many years of events
- Conflict scenarios where multiple sources disagree and the system has to handle uncertainty instead of guessing

Right now I’m running the big validation test:

- 500 simulated worlds
- 30–50 evolving events per world
- ~10,000 total queries

If it performs the way earlier tests suggest, I’ll share the results. I’ve also already started hearing from a few people and companies who want to test it at a much larger scale if the benchmark holds up. Not declaring victory yet — but this feels like the moment I’ve been waiting for.

submitted by /u/webman1972