DjamgaMind: Audio Intelligence for the C-Suite (Daily AI News, Energy, Healthcare, Finance)
Full-Stack AI Intelligence. Zero Noise.The definitive audio briefing for the C-Suite and AI Architects. From Daily News and Strategic Deep Dives to high-density Industrial & Regulatory Intelligence—decoded at the speed of the AI era. . 👉 Start your specialized audio briefing today at Djamgamind.com
AI Jobs and Career
I wanted to share an exciting opportunity for those of you looking to advance your careers in the AI space. You know how rapidly the landscape is evolving, and finding the right fit can be a challenge. That's why I'm excited about Mercor – they're a platform specifically designed to connect top-tier AI talent with leading companies. Whether you're a data scientist, machine learning engineer, or something else entirely, Mercor can help you find your next big role. If you're ready to take the next step in your AI career, check them out through my referral link: https://work.mercor.com/?referralCode=82d5f4e3-e1a3-4064-963f-c197bb2c8db1. It's a fantastic resource, and I encourage you to explore the opportunities they have available.
- Full Stack Engineer [$150K-$220K]
- Software Engineer, Tooling & AI Workflow, Contract [$90/hour]
- DevOps Engineer, India, Contract [$90/hour]
- More AI Jobs Opportunitieshere
| Job Title | Status | Pay |
|---|---|---|
| Full-Stack Engineer | Strong match, Full-time | $150K - $220K / year |
| Developer Experience and Productivity Engineer | Pre-qualified, Full-time | $160K - $300K / year |
| Software Engineer - Tooling & AI Workflows (Contract) | Contract | $90 / hour |
| DevOps Engineer (India) | Full-time | $20K - $50K / year |
| Senior Full-Stack Engineer | Full-time | $2.8K - $4K / week |
| Enterprise IT & Cloud Domain Expert - India | Contract | $20 - $30 / hour |
| Senior Software Engineer | Contract | $100 - $200 / hour |
| Senior Software Engineer | Pre-qualified, Full-time | $150K - $300K / year |
| Senior Full-Stack Engineer: Latin America | Full-time | $1.6K - $2.1K / week |
| Software Engineering Expert | Contract | $50 - $150 / hour |
| Generalist Video Annotators | Contract | $45 / hour |
| Generalist Writing Expert | Contract | $45 / hour |
| Editors, Fact Checkers, & Data Quality Reviewers | Contract | $50 - $60 / hour |
| Multilingual Expert | Contract | $54 / hour |
| Mathematics Expert (PhD) | Contract | $60 - $80 / hour |
| Software Engineer - India | Contract | $20 - $45 / hour |
| Physics Expert (PhD) | Contract | $60 - $80 / hour |
| Finance Expert | Contract | $150 / hour |
| Designers | Contract | $50 - $70 / hour |
| Chemistry Expert (PhD) | Contract | $60 - $80 / hour |
What is the Best Machine Learning Algorithms for Imbalanced Datasets?
In machine learning, imbalanced datasets are those where one class heavily outnumbers the others. This can be due to the nature of the problem or simply because more data is available for one class than the others. Either way, imbalanced datasets can pose a challenge for machine learning algorithms. In this blog post, we’ll take a look at which machine learning algorithms are best suited for imbalanced datasets and why they tend to perform better than others.
For example, in a binary classification problem, if there are 100 observations, and only 10 of them are positive (the rest are negatives), then we say that the dataset is imbalanced. The ratio of positive to negative cases is 1:10.

There are a few reasons why some machine learning algorithms tend to perform better on imbalanced datasets than others. First, certain algorithms are designed to handle imbalanced datasets. Second, some algorithms are more robust to outliers, which can be more common in imbalanced datasets. And third, some algorithms are better able to learn from a limited amount of data, which can be an issue when one class is heavily outnumbered by the others.
Some of the best machine learning algorithms for imbalanced datasets include:
– Support Vector Machines (SVMs),
– Decision Trees,
– Random Forests,
– Naive Bayes Classifiers,
– k-Nearest Neighbors (kNN),
Of these, SVMs tend to be the most popular choice as they are specifically designed to handle imbalanced datasets. SVMs work by finding a hyperplane that maximizes the margin between the two classes. This helps to reduce overfitting and improve generalization. Decision trees and random forests are also popular choices as they are less sensitive to outliers than other algorithms such as linear regression. Naive Bayes classifiers are another good choice as they are able to learn from a limited amount of data. kNN is also a good choice as it is not sensitive to outliers and is able to learn from a limited amount of data. However, it can be computationally intensive for large datasets.
There are two main types of machine learning algorithms: supervised and unsupervised. Supervised algorithms tend to perform better on imbalanced datasets than unsupervised algorithms. In this blog post, we will discuss why this is so and look at some examples.
Supervised Algorithms
Supervised algorithms are those where the target variable is known. In other words, we have training data where the correct answers are already given. The algorithm then learns from this data and is able to generalize to new data. Some examples of supervised algorithms are regression and classification.
Unsupervised Algorithms
Unsupervised algorithms are those where the target variable is not known. With unsupervised algorithms, we only have input data, without any corresponding output labels. The algorithm has to learn from the data itself without any guidance. Some examples of unsupervised algorithms are clustering and dimensionality reduction.
Why Supervised Algorithms Perform Better on Imbalanced Datasets
The reason why supervised algorithms perform better on imbalanced datasets is because they can learn from the training data which cases are more important. With unsupervised algorithms, all data points are treated equally, regardless of whether they are in the minority or majority class.
For example, in a binary classification problem with an imbalanced dataset, let’s say that we want to predict whether a customer will default on their loan payment or not. We have a training dataset of 1000 customers, out of which only 100 (10%) have defaulted on their loan in the past.
If we use a supervised algorithm like logistic regression, the algorithm will learn from the training data that defaulting on a loan is rare (since only 10% of cases in the training data are Positive). This means that it will be more likely to predict correctly that a new customer will not default on their loan (since this is the majority class in the training data).
However, if we use an unsupervised algorithm like k-means clustering, all data points will be treated equally since there is no target variable to guide the algorithm. This means that it might incorrectly cluster together customers who have defaulted on their loans with those who haven’t since there is no guidance provided by a target variable.
Conclusion:
In conclusion, supervised machine learning algorithms tend to perform better on imbalanced datasets than unsupervised machine learning algorithms because they can learn from the training data which cases are more important.
Some machine learning algorithms tend to perform better on highly imbalanced datasets because they are designed to deal with imbalance or because they can learn from both classes simultaneously. If you are working with a highly imbalanced dataset, then you should consider using one of these algorithms.
Thanks for reading!
How are machine learning techniques being used to address unstructured data challenges?
Machine learning techniques are being used to address unstructured data challenges in a number of ways:
- Natural language processing (NLP): NLP algorithms can be used to extract meaningful information from unstructured text data, such as emails, documents, and social media posts. NLP algorithms can be trained to classify text data, identify key terms and concepts, and extract structured data from unstructured text.
- Image recognition: Machine learning algorithms can be used to analyze and classify images, enabling the automatic identification and classification of objects, people, and other elements in images. This can be useful for tasks such as image tagging and search, as well as for applications such as security and surveillance.
- Audio and speech recognition: Machine learning algorithms can be used to analyze and classify audio data, enabling the automatic transcription and translation of spoken language. This can be useful for tasks such as speech-to-text transcription, as well as for applications such as call center automation and language translation.
- Video analysis: Machine learning algorithms can be used to analyze and classify video data, enabling the automatic detection and classification of objects, people, and other elements in video. This can be useful for tasks such as video tagging and search, as well as for applications such as security and surveillance.
Overall, machine learning techniques are being used in a wide range of applications to extract meaningful information from unstructured data, and to enable the automatic classification and analysis of data in a variety of formats.
How is AI and machine learning impacting application development today?
Artificial intelligence (AI) and machine learning are having a significant impact on application development today in a number of ways:
- Enabling new capabilities: AI and machine learning algorithms can be used to enable applications to perform tasks that would be difficult or impossible for humans to do. For example, AI-powered applications can be used to analyze and classify large amounts of data, or to automate complex decision-making processes.
- Improving performance: AI and machine learning algorithms can be used to optimize the performance of applications, making them faster, more efficient, and more accurate. For example, machine learning algorithms can be used to improve the accuracy of predictive models, or to optimize the performance of search algorithms.
- Streamlining development: AI and machine learning algorithms can be used to automate various aspects of application development, such as testing, debugging, and deployment. This can help to streamline the development process and reduce the time and resources needed to build and maintain applications.
- Enhancing user experiences: AI and machine learning algorithms can be used to enhance the user experience of applications, by providing personalized recommendations, recommendations, or by enabling applications to anticipate and respond to the needs and preferences of users.
Overall, AI and machine learning are having a significant impact on application development today, and they are likely to continue to shape the way applications are built and used in the future.
How will advancements in artificial intelligence and machine learning shape the future of work and society?
Advancements in artificial intelligence (AI) and machine learning are likely to shape the future of work and society in a number of ways. Some potential impacts include:
- Automation: AI and machine learning algorithms can be used to automate tasks that are currently performed by humans, such as data entry, customer service, and manufacturing. This could lead to changes in the types of jobs that are available and the skills that are in demand, as well as to increased productivity and efficiency.
- Job displacement: While automation may create new job opportunities, it could also lead to job displacement, particularly for workers in industries that are more susceptible to automation. This could lead to social and economic challenges, including unemployment and income inequality.
- Increased efficiency: AI and machine learning algorithms can be used to optimize and streamline business processes, leading to increased efficiency and productivity. This could lead to economic growth and innovation, and could also help to reduce costs for businesses and consumers.
- Enhanced decision-making: AI and machine learning algorithms can be used to analyze large amounts of data and make more informed and accurate decisions. This could lead to improved outcomes in fields such as healthcare, finance, and education, and could also help to reduce bias and improve fairness.
Overall, the impact of AI and machine learning on the future of work and society is likely to be significant and complex, with both potential benefits and challenges. It will be important to consider and address these impacts as these technologies continue to advance and become more widely adopted.
- ROCm with PyTorch and PyTorch Lightning seems to still suck for research [D]by /u/QuantumQuokka (Machine Learning) on May 16, 2026 at 12:01 am
So I asked about people's experiences with ROCm in a post a few weeks or so ago https://www.reddit.com/r/MachineLearning/comments/1t6cng3/rocm_status_in_mid_2026_d/ I actually went and procured a RX 7900XTX reference version to give it a try My discovery is that it kind of still sucks I have a small codebase for training flow matching models (SANA Architecture), which runs fine on my RTX3090s. But the moment I ported it across to ROCm it was NaNs absolutely everywhere. Forward passes were absolutely fine, but the moment you called backwards() all bets were off. The code was kept identical, apart from altering the pip environment to point to torch2.12 with ROCm7.2 instead of CUDA Trying everything from switching between bf16, fp32, to tweaking various environment variables yielded nothing. Unless there's some trick I'm missing, I get the feeling that ROCm is still seriously behind. I tried running the nanoGPT training script, which ran perfectly My intuition is that the ROCm people have probably tested their stack on established well known codebases. But, it's still remarkably fragile on even slightly uncommon code. submitted by /u/QuantumQuokka [link] [comments]
- Doubts Urgent Guys![R]by /u/ZeroDark_Hereford (Machine Learning) on May 15, 2026 at 11:03 pm
For an expensive simulator inside an MCMC DA setup like this, do you see amortised inference (SBI / neural posterior estimation) as more transformative than surrogating the forward model, since it attacks the per-pixel MCMC bottleneck directly? A neural operator framing (FNO / DeepONet) mapping environmental forcings to ecosystem state feels appealing for spatial structure. But given your fluid mechanics work with discontinuities, have you found neural operators robust in systems with sharp spatial transitions (which would map to sharp biome boundaries here)? Happy to share more context if useful. Thank you for your time. submitted by /u/ZeroDark_Hereford [link] [comments]
- Struggling with Overfitting on Medical Imaging Task [D]by /u/Future-Structure-296 (Machine Learning) on May 15, 2026 at 8:16 pm
Hi everyone, I’m working on a 2-class classification problem (LCA vs. RCA coronary arteries) using 2D X-ray angiograms. I’m currently stuck in a cycle of extreme overfitting and could use some advice on my training strategy. The Setup: Dataset: Small (~900 training frames from ~300 unique DICOMs). Architecture: InceptionV3 (PyTorch). Input: Grayscale .npy arrays converted to 3-channel, resized to 299x299. Current Strategy: Transfer learning from ImageNet. I’ve tried full unfreezing and partial unfreezing (last blocks). The Problem: My training accuracy hits ~95-99% within a few epochs, but validation accuracy peaks early (around 74-79%) and then collapses toward 30-40% as the model starts memorizing the specific textures of the training patients. What I’ve Tried So Far: Normalization: Standard ImageNet mean/std (applied at load time). Class Weights: Handled 2:1 imbalance (LCA:RCA). Regularization: Added Dropout (tried 0.3 to 0.6) and Weight Decay (1e-4). Augmentation: Flips, 25deg rotations, and translation. Schedulers: ReduceLROnPlateau (factor 0.5, patience 8). Would love any insights or papers you'd recommend for small-sample medical classification. Thanks! submitted by /u/Future-Structure-296 [link] [comments]
- Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion [R]by /u/Franck_Dernoncourt (Machine Learning) on May 15, 2026 at 5:21 pm
Paper: https://arxiv.org/abs/2605.12825 Code: https://github.com/chiennv2000/orthrus Disclosure: co-author. Idea: Inject a trainable diffusion attention module into each layer of a frozen AR Transformer. Both heads share one KV cache. Diffusion head projects K=32 tokens in parallel; AR head verifies in a second pass and accepts the longest matching prefix. Output distribution is provably identical to the base model. Results: Up to 7.8× TPF, ~6× wall-clock on MATH-500. 16% of params trained, <1B tokens, 24h on 8×H200. vs. diffusion LMs (Dream, Fast-dLLM-v2, SDAR, Mercury, Gemini Diffusion): they modify base weights and lose accuracy (Fast-dLLM-v2: -11 pts on MATH-500). Orthrus freezes the backbone; accuracy matches Qwen3-8B exactly. vs. Speculative Decoding (EAGLE-3, DFlash): No external drafter, no separate cache, and zero Time-To-First-Token (TTFT) penalty because we don't have to initialize and sync a separate drafter model. KV overhead is O(1) (~4.5 MiB flat). Acceptance length on MATH-500: 11.7 vs. 7.9 (DFlash) vs. 3.5 (EAGLE-3). Single-step denoising beats multi-step (6.35 vs. 3.53 TPF). KL distillation beats CE on acceptance rate. Limitations: strictly bounded by the frozen base model (inherits its biases, hallucinations, knowledge gaps); Qwen3-only evaluation; greedy + rejection sampling only. https://i.redd.it/5lsf6l5w4c1h1.gif submitted by /u/Franck_Dernoncourt [link] [comments]
- PINN is predicting trivial solution for stiff ODE [D]by /u/cae_shot (Machine Learning) on May 15, 2026 at 4:07 pm
I am learning physics informed neural networks. Currently, I am solving a simple second ODE (damped harmonic oscillator). The equation is m*d2y/dt2 + mu*dy/dt + k*y = 0 (bcs: y(t=0) = 1, y'(t=0) = 0). I managed to draft a code. The code works for k values upto 50. However, when increased the value beyond 50, PINN is predicting trivial solution. I tried several things: reducing the learning rate, increasing the data points, reusing the weights trained using lower k values, and using a for loop to increase the k value in smaller steps (step size 20). However, none of them helped. Could you help me with this. Thanks in advance. submitted by /u/cae_shot [link] [comments]
- Looking for a real world dataset (or website where i can find it) [P]by /u/novromeda (Machine Learning) on May 15, 2026 at 2:39 pm
Hi guys, I’m gonna do a data analysis project based on data privacy, bias and data interpretability. For this reason our professor asked for a real world dataset in order to analyze a real case. Additionally I would prefer the least anonymity possible for that dataset in order to create some interesting technique over it (differential privacy, k-anonimity exc…) Do you have any advice where to find the dataset? (links or website names) Because I checked on Kaggle but I don’t know how to find if the dataset is real or not submitted by /u/novromeda [link] [comments]
- software trying to catch software is officially a dead en [D]by /u/bebo117722 (Machine Learning) on May 15, 2026 at 2:36 pm
I feel like we've crossed a weird threshold in the generative AI space where the arms race against botnets is just over. and the bots won I was reading that interview recently where the Reddit CEO was floating the idea of using Face ID and Touch ID just to verify that commenters are actual humans. it honestly hit me how absurd things have gotten. standard heuristics and behavioral analysis are completely useless now against modern LLMs, and vision models solve captchas faster than I can. the dead internet theory is basically just our daily engineering reality at this point we are at a stage where the only reliable way to prove you aren't an automated script is to literally anchor your digital presence to your physical biology. From a purely technical standpoint, it’s fascinating seeing the shift toward hardware verification. like looking at the engineering behind that Orb device the idea of doing local biometric iris hashing on custom hardware just to output a zero-knowledge proof of personhood. It's wild that we actually need dedicated physical devices now just to enforce the concept of "one human, one account" it makes total sense why platforms are pushing for this, beacuse trying to build software firewalls against infinitely scalable AI agents is a losing battle. but it just feels like such a massive, permanent shift for how the internet works. idk, is anyone else working on sybil resistance right now? are we just collectively accepting that biometric hardware gates are the only way to save the web from being 99% synthetic noise? submitted by /u/bebo117722 [link] [comments]
- Does anyone know any ready-to-go Emotion Cause Extraction (ECE) model? [R]by /u/Mountain_Turnip_6403 (Machine Learning) on May 15, 2026 at 11:04 am
Hi everyone, I am currently looking for a Emotion Cause Extraction (ECE) model that is ready to go which means that I can download the model and run it immediately on text. submitted by /u/Mountain_Turnip_6403 [link] [comments]
- It is the process of rapidly ever improving differentiation between noise and signal patterns and constant generalization of those that produces intelligence, not merely compression of data. [D]by /u/Briefin69 (Machine Learning) on May 15, 2026 at 10:41 am
Until we can design a mathematical system with one unavoidable intrinsic goal that drives it with undeniable force and encode that to hardware, plug it into a simulator of raw data, and give it the initial faculties to form, store, manipulate and alter all patterns based on its own feedback with no restriction on developing new faculties; all this AI noise will only serve investors accumulating wealth. The currently required data sanitization and filtration, and the missing intrinsic unavoidable goal, kill the very base requirement for intelligence to emerge as we see and value it in humans. Of course if that happens, new questions arise: human safety from conflict with the system; not just the current concerns which are human misuse related; and what ideology to follow while deciding the goal. But those could be dealt with, given we have the base. For the present situation of things: the current increasing productivity automation is ofcourse undeniable. But that should not be a bad thing if we look towards the long horizon of things. People enjoy cooking, and if doing the dishes and the prep and the shopping were to be automated, it should only make things better. Ofcourse if we can figure out a way to tackle the unemployment and resource access problem and thus wealth concentration, for people that were too specialized for the old system of labour. Thoughts? submitted by /u/Briefin69 [link] [comments]
- arXiv implements 1-year ban for papers containing incontrovertible evidence of unchecked LLM-generated errors, such as hallucinated references or results. [N]by /u/Nunki08 (Machine Learning) on May 15, 2026 at 2:44 am
From Thomas G. Dietterich (arXiv moderator for cs.LG) on 𝕏 (thread): https://x.com/tdietterich/status/2055000956144935055 https://xcancel.com/tdietterich/status/2055000956144935055 "Attention arXiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. If generative AI tools generate inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s). We have recently clarified our penalties for this. If a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can't trust anything in the paper. The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue. Examples of incontrovertible evidence: hallucinated references, meta-comments from the LLM ("here is a 200 word summary; would you like me to make any changes?"; "the data in this table is illustrative, fill it in with the real numbers from your experiments")." submitted by /u/Nunki08 [link] [comments]
- Follow the Mean: Reference-Guided Flow Matching [R]by /u/Professional-Ant-117 (Machine Learning) on May 14, 2026 at 9:30 pm
Follow the Mean: Reference-Guided Flow Matching: https://www.alphaxiv.org/abs/2605.10302 https://preview.redd.it/5pleq5b4861h1.png?width=1036&format=png&auto=webp&s=805940b079176b65c45bb10e5458ecce140b0044 submitted by /u/Professional-Ant-117 [link] [comments]
- Would a 2000-2021 ML paper even get accepted today? [D]by /u/Hope999991 (Machine Learning) on May 14, 2026 at 11:39 am
I keep hearing some version of this: “A paper that got accepted years ago wouldn’t stand a chance today.” Honestly, for a lot of ML subfields, this doesn’t sound crazy anymore. A paper that once looked solid can now look under-evaluated, under-ablated, weak on baselines, or just too obvious. So maybe the real claim is: A mediocre accepted ML paper from years ago would probably get rejected today. Do people agree? Has the bar actually gone up, or has the field just become more crowded and more competitive? submitted by /u/Hope999991 [link] [comments]
- Continual Harness: Online Adaptation for Self-Improving Foundation Agents [R]by /u/PokeAgentChallenge (Machine Learning) on May 14, 2026 at 3:45 am
https://preview.redd.it/p9cd2zmfy01h1.png?width=2000&format=png&auto=webp&s=a8e99bac438c2505d97ed3716983aa731da855f8 Sharing a new paper from the GPP and PokeAgent teams. Gemini Plays Pokémon (GPP) was the first AI system to complete Pokémon Blue, Yellow Legacy on hard mode, and Crystal without losing a battle. How? Early signs of iterative harness development. In the Blue era a human watched the stream and edited the harness. By Yellow Legacy and Crystal, the model itself was performing most of the editing through general meta-tools (define_agent, run_code, notepad edits). Our new paper, Continual Harness: Online Adaptation for Self-Improving Foundation Agents, formalizes the loop and automates the refining role end to end. We then carry the same loop into training, enabling model-harness co-learning. The takeaways: 1. Iterative harness refinement closes most of the gap to a hand-engineered version. 2. Long-horizon agency requires self-refinement, and self-refinement requires a useful model. 3. The future of agents is model-harness co-learning. Paper (arXiv). https://arxiv.org/abs/2605.09998 Article (Substack). https://sethkarten.substack.com/p/gemini-plays-pokemon-discovered-something Project page (video demos). https://sethkarten.ai/continual-harness submitted by /u/PokeAgentChallenge [link] [comments]
- Trained transformer-based chess models to play like humans (including thinking time) [P]by /u/hazard02 (Machine Learning) on May 13, 2026 at 10:08 pm
I trained a set of deep learning (transformer-based) chess models to play like humans (inspired by MAIA and Grandmaster Chess Without Search). There's a separate model for each 100-point rating bucket from ~800 to 2500+. I started with training a mid-strength model from scratch on a 8xH100 cluster, then fine-tuned models for the other rating ranges on my local 5090 GPU. The total training size was nearly a year of Lichess data, about 1B total games. Each rating range actually has 3 models: A move model, a thinking time model, and a white win / draw / black win model. Despite being quite small (only 9MM parameters!) the move models achieve better accuracy than MAIA-2 and are approximately on par with MAIA-3 (see here for MAIA-2 comparison). AFAIK this is the only attempt to train on thinking times in chess, so I don't have a benchmark to compare against for that. Likely because of the network size, at high ratings the models aren't quite as good as they could be. They see short tactical motifs but can't do deep calculation - probably a bigger model would help here. The move and win models take into account player ratings and clock times. For instance, under extreme time pressure a much stronger player has a lower win prob even if their opponent is weaker. The models blunder more under time pressure as well. The data pipeline is C++ via nanobind, then training with Pytorch. Getting this right was actually the thing I spent the most time on. Pre-shuffling the dataset and then being able to read the shuffled dataset sequentially at training time kept the GPU utilization high. Without this it spent a huge percentage of time on I/O while the GPU sat idle. Happy to answer questions about the rating-conditioning, the clock model, or the data pipeline. Code (including training code and model weights) is at https://github.com/thomasj02/1e4_ai/. A demo is at https://1e4.ai/ but all the frontend code is also in the repo if you want to self-host. submitted by /u/hazard02 [link] [comments]
- Scenema Audio: Zero-shot expressive voice cloning and speech generation [N]by /u/a__side_of_fries (Machine Learning) on May 13, 2026 at 9:29 pm
We've been building Scenema Audio as part of our video production platform at scenema.ai, and we're releasing the model weights and inference code. The core idea: emotional performance and voice identity are independent. You describe how the speech should be performed (rage, grief, excitement, a child's wonder), and optionally provide reference audio for voice identity. The reference provides the "who." The prompt provides the "how." Any voice can perform any emotion, even if that voice has never been recorded in that emotional state. Limitations (and why we still use it) This is a diffusion model, not a traditional TTS pipeline. Common issues include repetition and gibberish on some seeds. Different seeds give different results, and you will not get a perfect output with 0% error rate. This model is meant for a post-editing workflow: generate, pick the best take, trim if needed. Same way you'd work with any generative model. That said, we keep coming back to Scenema Audio over even Gemini 3.1 Flash TTS, which is already more controllable than most TTS systems out there. The reason is simple: the output just sounds more natural and less robotic. There's a quality to diffusion-generated speech that autoregressive TTS doesn't quite match, especially for emotional delivery. Audio-first video generation As this video points out, generating audio first and then using it to drive video generation is a powerful workflow. That's actually how we've used Scenema Audio in some cases. Generate the voice performance, then feed it into an A2V pipeline (LTX 2.3, Wan 2.6, Seedance 2.0, etc.) to generate video that matches the speech. Here's an example of that workflow in action. On distillation and speed A few people have asked this. Our bottleneck is not denoising steps. The diffusion pass is a small fraction of total generation time. The real costs are elsewhere in the pipeline. We're already at 8 steps (down from 50 in the base model), and that's the sweet spot where quality holds. Prompting matters This model is sensitive to prompting, the same way LTX 2.3 is for video. A generic voice description gives you generic output. A specific, theatrical description with action tags gives you a performance. There's also a pace parameter that controls how much time the model gets per word. Takes some experimentation to find what works for your use case, but once you do, you can generate hours of audio with minimal quality loss. Complex words and proper nouns benefit from phonetic spelling. Unlike traditional TTS, it doesn't have a phoneme-to-audio pipeline or a pronunciation dictionary. If it garbles "Tchaikovsky," you would spell it "Chai-koff-skee" or whatever makes sense to you. Docker REST API with automatic VRAM management We ship this as a Docker container with a REST API. Same setup we use in production on scenema.ai. The service auto-detects your GPU and picks the right configuration: VRAM Audio Model Gemma Notes 16 GB INT8 (4.9 GB) CPU streaming Needs 32 GB system RAM 24 GB INT8 (4.9 GB) NF4 on GPU Default config 48 GB bf16 (9.8 GB) bf16 on GPU Best quality We went with Docker because that's how we serve it. No dependency hell, no conda environments. Pull, set your HF token for Gemma access, then docker compose up. ComfyUI Native ComfyUI node support is planned. We're hoping to release it in the coming weeks, unless someone from the community beats us to it. In the meantime, the REST API is straightforward to call from a custom node since it's just a local HTTP service. Links All demos + article: scenema.ai/audio Model weights: huggingface.co/ScenemaAI/scenema-audio Code + setup: github.com/ScenemaAI/scenema-audio YouTube demo: youtu.be/VnEQ_ImOaAc This is fully open source. The model weights derive from the LTX-2 Community License but all inference and pipeline code is MIT. submitted by /u/a__side_of_fries [link] [comments]
- Have the "on-hold" durations been getting longer for arXiv submissions? [D]by /u/Megixist (Machine Learning) on May 13, 2026 at 6:51 pm
I have a paper that has been "on-hold" for about 2 weeks now. I understand that it might take a little longer now because of inundation of AI generated low-effort papers but my papers have gone from "on-hold" to "submitted" within a couple of days in the past. Wondering if anyone else is facing the same issue. submitted by /u/Megixist [link] [comments]
- EEML Summer School (Eastern European ML) - Anyone here got accepted? [D]by /u/ade17_in (Machine Learning) on May 13, 2026 at 4:55 pm
Has anyone got into EEML Summer School in Montenegro? I did and please feel free to DM to manage stay or other plans after the summer school. I see that it's tricky to get there and find a stay. submitted by /u/ade17_in [link] [comments]
- Human-level performance via ML was *not* proven impossible with complexity theory [D]by /u/mike_uoftdcs (Machine Learning) on May 13, 2026 at 2:50 pm
Van Rooij, Guest, de Haan, Adolfi, Kolokolova, and Rich claimed to have proven that AGI via ML is impossible in Computational Brain & Behavior in 2024. The basic idea was to try to reduce a known NP-hard problem to the problem of learning a human-level classifier from data. The purported result, called "Ingenia Theorem" by the authors, made some noise on the internet, including here. My paper showing that the proof is irreparably broken is now also out in CBB (ungated preprint here). The basic issue is that "human-level classifier" is not mathematically defined, which the authors solve by ... never defining it. They have a construct that corresponds to "distribution of human situation-behaviour tuples" when they introduce the problem, but the construct then gets swapped out for "for all polytime-sampleable distributions" when it comes time to doing the formal proof. This means that the paper, if you find-and-replace human situation-behavior tuples for ImageNet inputs/labels, also proves that learning to classify ImageNet is intractable. Blogpost discussion similar attempts from Penrose to Chomsky here. submitted by /u/mike_uoftdcs [link] [comments]
- Built Support Vector Machine(SVM) from scratch in Rust [P]by /u/Yeet132416 (Machine Learning) on May 13, 2026 at 2:23 pm
Built my own SVM classifier from scratch in Rust. It uses SMO optimization, have linear and rbf kernel, uses grid search to tune the hyperparameters. I tested it on two datasets one using Linear dataset and other using RBF, these were the results: Dataset Kernel Accuracy Recall F1 Banknote Auth Linear 96% 94% 95% Breast Cancer RBF 93% 100% 92% https://preview.redd.it/uw26u1uo0w0h1.jpg?width=720&format=pjpg&auto=webp&s=1784e1d7d310a26fa67efc63fa5191f45433a695 https://preview.redd.it/o0ahkq7p0w0h1.jpg?width=720&format=pjpg&auto=webp&s=dcb1053c34931d11b82831c6ad8cd4755ebc5816 The plot.rs file, used for plotting only was written using AI as I could not wrap my head around plotters crate, apart from that everything was by my own. Repo Link: Github Repo Happy to get some feedback! submitted by /u/Yeet132416 [link] [comments]
- Elastic Attention Cores for Scalable Vision Transformers [R]by /u/44seconds (Machine Learning) on May 13, 2026 at 11:51 am
Wanted to share our latest paper on an alternative building block for Vision Transformers. Illustration of our model's accuracy and dense features Traditional ViTs utilize dense (N2) self-attention, which can become pretty costly at higher resolutions. In this work, we propose an alternative backbone with a core-periphery block-sparse attention structure that scales as (2NC + C2) for C core tokens. We further train this using nested dropout, which enables test-time elastic adjustments to the inference cost. The whole model can achieve very competitive dense & classification accuracy compared with DINOv3, and is stable across resolutions (256 all the way to 1024). Interestingly, the core-dense attention patterns exhibit strong emergent behavior. At early layers of the network the attention maps are isotropic (spherical), but become increasingly semantically aligned deeper into the network. Visual Elastic Core Attention paper abstract While adjusting the number of core tokens, if you decrease the number of cores, the attention patterns become more diffuse & cover a spatially larger region. If you increase the number of core tokens, the attention patterns become smaller & more concentrated. Paper: https://arxiv.org/abs/2605.12491 Project with the code (still in progress): https://github.com/alansong1322/VECA Happy to answer any questions about our research. submitted by /u/44seconds [link] [comments]
![Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion [R]](https://preview.redd.it/5lsf6l5w4c1h1.gif?frame=1&width=140&height=72&auto=webp&s=023b02ee925924f982e6eea09bbeefbe039d00ab)
![Follow the Mean: Reference-Guided Flow Matching [R]](https://preview.redd.it/5pleq5b4861h1.png?width=140&height=91&auto=webp&s=5f80ce290c30e51700f9b9fd0f907ee56e9382b2)
![Continual Harness: Online Adaptation for Self-Improving Foundation Agents [R]](https://preview.redd.it/p9cd2zmfy01h1.png?width=140&height=56&auto=webp&s=fd1f3545b0efc0e42e9d014caa1aa2f83f0f409a)
![Elastic Attention Cores for Scalable Vision Transformers [R]](https://preview.redd.it/zjea47ez7w0h1.png?width=140&height=140&crop=1:1,smart&auto=webp&s=2017a3d330a172670baae5645ddff3137bbe1df6)

























96DRHDRA9J7GTN6