DjamgaMind: Audio Intelligence for the C-Suite (Energy, Healthcare, Finance)
Are you drowning in dense legal text? DjamgaMind is the new audio intelligence platform that turns 100-page healthcare or energy mandates into 5-minute executive briefings. Whether you are navigating Bill C-27 (Canada) or the CMS-0057-F Interoperability Rule (USA), our AI agents decode the liability so you don’t have to. 👉 Start your specialized audio briefing today at Djamgamind.com
AI Jobs and Careers
I wanted to share an exciting opportunity for those of you looking to advance your careers in the AI space. You know how rapidly the landscape is evolving, and finding the right fit can be a challenge. That's why I'm excited about Mercor – they're a platform specifically designed to connect top-tier AI talent with leading companies. Whether you're a data scientist, machine learning engineer, or something else entirely, Mercor can help you find your next big role. If you're ready to take the next step in your AI career, check them out through my referral link: https://work.mercor.com/?referralCode=82d5f4e3-e1a3-4064-963f-c197bb2c8db1. It's a fantastic resource, and I encourage you to explore the opportunities they have available.
- Full Stack Engineer [$150K-$220K]
- Software Engineer, Tooling & AI Workflow, Contract [$90/hour]
- DevOps Engineer, India, Contract [$90/hour]
- More AI Job Opportunities here
| Job Title | Status | Pay |
|---|---|---|
| Full-Stack Engineer | Strong match, Full-time | $150K - $220K / year |
| Developer Experience and Productivity Engineer | Pre-qualified, Full-time | $160K - $300K / year |
| Software Engineer - Tooling & AI Workflows (Contract) | Contract | $90 / hour |
| DevOps Engineer (India) | Full-time | $20K - $50K / year |
| Senior Full-Stack Engineer | Full-time | $2.8K - $4K / week |
| Enterprise IT & Cloud Domain Expert - India | Contract | $20 - $30 / hour |
| Senior Software Engineer | Contract | $100 - $200 / hour |
| Senior Software Engineer | Pre-qualified, Full-time | $150K - $300K / year |
| Senior Full-Stack Engineer: Latin America | Full-time | $1.6K - $2.1K / week |
| Software Engineering Expert | Contract | $50 - $150 / hour |
| Generalist Video Annotators | Contract | $45 / hour |
| Generalist Writing Expert | Contract | $45 / hour |
| Editors, Fact Checkers, & Data Quality Reviewers | Contract | $50 - $60 / hour |
| Multilingual Expert | Contract | $54 / hour |
| Mathematics Expert (PhD) | Contract | $60 - $80 / hour |
| Software Engineer - India | Contract | $20 - $45 / hour |
| Physics Expert (PhD) | Contract | $60 - $80 / hour |
| Finance Expert | Contract | $150 / hour |
| Designers | Contract | $50 - $70 / hour |
| Chemistry Expert (PhD) | Contract | $60 - $80 / hour |
What Are the Best Machine Learning Algorithms for Imbalanced Datasets?
In machine learning, imbalanced datasets are those where one class heavily outnumbers the others. This can be due to the nature of the problem or simply because more data is available for one class than the others. Either way, imbalanced datasets can pose a challenge for machine learning algorithms. In this blog post, we’ll take a look at which machine learning algorithms are best suited for imbalanced datasets and why they tend to perform better than others.
For example, in a binary classification problem, if there are 100 observations and only 10 of them are positive (the rest are negative), then we say that the dataset is imbalanced: the ratio of positive to negative cases is 1:9.
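As a quick sanity check, that ratio can be computed directly from the label counts. A minimal Python sketch, using a made-up label vector that mirrors the example above:

```python
from collections import Counter

# Hypothetical labels: 10 positive cases out of 100 observations
labels = [1] * 10 + [0] * 90

counts = Counter(labels)
print(counts)                                              # Counter({0: 90, 1: 10})
print(f"positive:negative = 1:{counts[0] // counts[1]}")   # positive:negative = 1:9
```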

There are a few reasons why some machine learning algorithms tend to perform better on imbalanced datasets than others. First, certain algorithms are designed to handle imbalanced datasets. Second, some algorithms are more robust to outliers, which can be more common in imbalanced datasets. And third, some algorithms are better able to learn from a limited amount of data, which can be an issue when one class is heavily outnumbered by the others.
Some of the best machine learning algorithms for imbalanced datasets include:
- Support Vector Machines (SVMs)
- Decision Trees
- Random Forests
- Naive Bayes Classifiers
- k-Nearest Neighbors (kNN)
Of these, SVMs are a popular choice because their misclassification penalty can be weighted per class, so errors on the rare class cost more, and because maximizing the margin between the two classes helps reduce overfitting and improve generalization. Decision trees and random forests are also popular choices, as they are less sensitive to outliers than algorithms such as linear regression. Naive Bayes classifiers are another good option because they can learn from a limited amount of data. kNN is likewise insensitive to outliers and works with small samples, although it can be computationally intensive for large datasets.
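To make this concrete, here is a minimal scikit-learn sketch, on a synthetic 1:9 dataset, of weighting an SVM so that mistakes on the rare class are penalized more heavily; the dataset and parameter choices are illustrative, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Synthetic binary dataset with roughly 90% negatives and 10% positives
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# class_weight="balanced" scales the error penalty inversely to class frequency
clf = SVC(kernel="rbf", class_weight="balanced").fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

The same class_weight option is available on scikit-learn's decision trees, random forests, and logistic regression, so the idea carries over to the other algorithms listed above.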
There are two main types of machine learning algorithms: supervised and unsupervised. Supervised algorithms tend to perform better on imbalanced datasets than unsupervised algorithms. The next sections explain why and walk through an example.
Supervised Algorithms
Supervised algorithms are those where the target variable is known. In other words, we have training data where the correct answers are already given. The algorithm then learns from this data and is able to generalize to new data. Some examples of supervised algorithms are regression and classification.
Unsupervised Algorithms
Unsupervised algorithms are those where the target variable is not known. With unsupervised algorithms, we only have input data, without any corresponding output labels. The algorithm has to learn from the data itself without any guidance. Some examples of unsupervised algorithms are clustering and dimensionality reduction.
Why Supervised Algorithms Perform Better on Imbalanced Datasets
Supervised algorithms perform better on imbalanced datasets because the labels tell them which cases matter most during training. With unsupervised algorithms, all data points are treated equally, regardless of whether they belong to the minority or majority class.
For example, in a binary classification problem with an imbalanced dataset, let’s say that we want to predict whether a customer will default on their loan payment or not. We have a training dataset of 1000 customers, out of which only 100 (10%) have defaulted on their loan in the past.
If we use a supervised algorithm like logistic regression, the algorithm will learn from the training data that defaulting on a loan is rare (only 10% of the training cases are positive). As a result, it will be more likely to correctly predict that a new customer will not default, since that is the majority class in the training data.
However, if we use an unsupervised algorithm like k-means clustering, all data points will be treated equally since there is no target variable to guide the algorithm. This means that it might incorrectly cluster together customers who have defaulted on their loans with those who haven’t since there is no guidance provided by a target variable.
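The contrast can be sketched in a few lines. Everything here is synthetic and illustrative: the supervised model sees the default labels, while k-means only sees the inputs, so its two clusters are under no obligation to line up with default vs. no-default:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.metrics import f1_score

# 1000 synthetic "customers", roughly 10% of whom defaulted (label 1)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Supervised: labels guide the fit; class_weight keeps the rare class visible
logreg = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
print("logistic regression F1:", round(f1_score(y, logreg.predict(X)), 3))

# Unsupervised: clusters are formed with no notion of "default" at all
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
best = max(f1_score(y, clusters), f1_score(y, 1 - clusters))  # try both label orders
print("k-means F1 (best alignment):", round(best, 3))
```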
Conclusion:
Supervised machine learning algorithms tend to perform better on imbalanced datasets than unsupervised ones because the labels let them learn which cases matter most.
Some algorithms also handle heavy class imbalance better than others, whether because they support class weighting, are robust to outliers, or can learn from only a few minority-class examples. If you are working with a highly imbalanced dataset, consider using one of the algorithms discussed above.
Thanks for reading!
How are machine learning techniques being used to address unstructured data challenges?
Machine learning techniques are being used to address unstructured data challenges in a number of ways:
- Natural language processing (NLP): NLP algorithms can be used to extract meaningful information from unstructured text data, such as emails, documents, and social media posts. NLP algorithms can be trained to classify text data, identify key terms and concepts, and extract structured data from unstructured text.
- Image recognition: Machine learning algorithms can be used to analyze and classify images, enabling the automatic identification and classification of objects, people, and other elements in images. This can be useful for tasks such as image tagging and search, as well as for applications such as security and surveillance.
- Audio and speech recognition: Machine learning algorithms can be used to analyze and classify audio data, enabling the automatic transcription and translation of spoken language. This can be useful for tasks such as speech-to-text transcription, as well as for applications such as call center automation and language translation.
- Video analysis: Machine learning algorithms can be used to analyze and classify video data, enabling the automatic detection and classification of objects, people, and other elements in video. This can be useful for tasks such as video tagging and search, as well as for applications such as security and surveillance.
Overall, machine learning techniques are being used in a wide range of applications to extract meaningful information from unstructured data, and to enable the automatic classification and analysis of data in a variety of formats.
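As a small illustration of the NLP case above, here is a hedged sketch that turns a handful of invented support emails into structured predictions; the texts, labels, and model choice are placeholders rather than a production setup:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy unstructured text: 1 = complaint, 0 = positive feedback
texts = [
    "The product arrived broken and support never replied",
    "Great service, my issue was resolved in minutes",
    "I want a refund, this is unacceptable",
    "Thanks for the quick and helpful response",
]
labels = [1, 0, 1, 0]

# TF-IDF converts free text into numeric features; the classifier learns the labels
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["Still waiting on a reply about my broken order"]))
```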
How is AI and machine learning impacting application development today?
Artificial intelligence (AI) and machine learning are having a significant impact on application development today in a number of ways:
- Enabling new capabilities: AI and machine learning algorithms can be used to enable applications to perform tasks that would be difficult or impossible for humans to do. For example, AI-powered applications can be used to analyze and classify large amounts of data, or to automate complex decision-making processes.
- Improving performance: AI and machine learning algorithms can be used to optimize the performance of applications, making them faster, more efficient, and more accurate. For example, machine learning algorithms can be used to improve the accuracy of predictive models, or to optimize the performance of search algorithms.
- Streamlining development: AI and machine learning algorithms can be used to automate various aspects of application development, such as testing, debugging, and deployment. This can help to streamline the development process and reduce the time and resources needed to build and maintain applications.
- Enhancing user experiences: AI and machine learning algorithms can be used to enhance the user experience of applications, by providing personalized recommendations or by enabling applications to anticipate and respond to the needs and preferences of users.
Overall, AI and machine learning are having a significant impact on application development today, and they are likely to continue to shape the way applications are built and used in the future.
How will advancements in artificial intelligence and machine learning shape the future of work and society?
Advancements in artificial intelligence (AI) and machine learning are likely to shape the future of work and society in a number of ways. Some potential impacts include:
- Automation: AI and machine learning algorithms can be used to automate tasks that are currently performed by humans, such as data entry, customer service, and manufacturing. This could lead to changes in the types of jobs that are available and the skills that are in demand, as well as to increased productivity and efficiency.
- Job displacement: While automation may create new job opportunities, it could also lead to job displacement, particularly for workers in industries that are more susceptible to automation. This could lead to social and economic challenges, including unemployment and income inequality.
- Increased efficiency: AI and machine learning algorithms can be used to optimize and streamline business processes, leading to increased efficiency and productivity. This could lead to economic growth and innovation, and could also help to reduce costs for businesses and consumers.
- Enhanced decision-making: AI and machine learning algorithms can be used to analyze large amounts of data and make more informed and accurate decisions. This could lead to improved outcomes in fields such as healthcare, finance, and education, and could also help to reduce bias and improve fairness.
Overall, the impact of AI and machine learning on the future of work and society is likely to be significant and complex, with both potential benefits and challenges. It will be important to consider and address these impacts as these technologies continue to advance and become more widely adopted.
- [D] ICML26 new review policies by /u/reutococco (Machine Learning) on January 18, 2026 at 10:54 am
ICML26 introduced a review type selection, where the author can decide whether LLMs can be used during their paper review, according to these two policies: Policy A (Conservative): Use of LLMs for reviewing is strictly prohibited. Policy B (Permissive): Allowed: Use of LLMs to help understand the paper and related works, and polish reviews. Submissions can be fed to privacy-compliant* LLMs. Not allowed: Ask LLMs about strengths/weaknesses, ask to suggest key points for the review, suggest an outline for the review, or write the full review. *By “privacy-compliant”, we refer to LLM tools that do not use logged data for training and that place limits on data retention. This includes enterprise/institutional subscriptions to LLM APIs, consumer subscriptions with an explicit opt-out from training, and self-hosted LLMs. (We understand that this is an oversimplification.) I'm struggling to decide which one to select, any suggestions? submitted by /u/reutococco [link] [comments]
- [P] I built an offline AI system on Raspberry Pi that analyzes wound images and gives basic medical guidance — fully on-device by /u/Severe-Environment-2 (Machine Learning) on January 18, 2026 at 8:09 am
Most AI medical demos assume cloud GPUs, APIs, and always-on internet. But in many real-world settings, that’s simply not available. So I built Companion — an offline AI system that runs entirely on a Raspberry Pi and analyzes wound images to provide basic, safe medical guidance, without relying on the cloud. The interesting part isn’t just what it does — but how it’s designed. 🧠 System design (high level) Instead of one big AI model, the system is split into independent agents, each with a clearly limited role: 👁️ Perception (Computer Vision) INT8-optimized MobileNetV2 Runs via TensorFlow Lite on ARM Classifies wound images on-device No internet, no GPU This agent answers: “What does the image look like?” Risk Logic (Rules, not AI) Deterministic, rule-based logic Converts wound type → severity & escalation No LLMs involved here on purpose This ensures safety decisions are predictable and auditable. Reasoning (Local LLM, constrained) Uses a locally running LLM (via Ollama) Only used for explanation and guidance Guardrails prevent diagnosis, treatment advice, or overriding safety logic The LLM explains — it does not decide. Architecture Each agent runs in its own Docker container Docker Compose defines responsibilities, inputs, and flow Makes the system easier to reason about and modify safely It’s built for environments where: connectivity is unreliable hardware is limited safety matters more than “AI magic” submitted by /u/Severe-Environment-2 [link] [comments]
- [D] Shower thought after 13hr coding session: Could physical filtration principles inform attention head design? (Claude wrote this, I just had the idea) by /u/notruescotchman (Machine Learning) on January 18, 2026 at 2:41 am
Full transparency upfront: I’m not an ML researcher. I’m a solutions architect who works with voice AI integrations. After a 13-hour coding marathon today, my brain started making weird connections and I asked Claude to help me write this up properly because I don’t have the background to formalize it myself. I’m posting this because: (a) I genuinely want to know if this is interesting or stupid, (b) I don’t need credit for anything, and (c) if there’s signal here, someone smarter than me should do something with it. The shower thought: Physical substrate filtration (like building a road bed or water filtration) layers materials by particle size: fine sand → coarse sand → gravel → crushed stone. Each layer handles what it can and passes the rest up. Order matters. The system is subtractive. Attention in transformers seems to have emergent granularity—early layers handle local patterns, later layers handle global dependencies. But this is learned, not constrained. The question: What if you explicitly constrained attention heads to specific receptive field sizes, like physical filter substrates? Something like: ∙ Heads 1-4: only attend within 16 tokens (fine) ∙ Heads 5-8: attend within 64 tokens (medium) ∙ Heads 9-12: global attention (coarse) Why this might not be stupid: ∙ Longformer/BigBird already do binary local/global splits ∙ WaveNet uses dilated convolutions with exponential receptive fields ∙ Probing studies show heads naturally specialize by granularity anyway ∙ Could reduce compute (fine heads don’t need O(n²)) ∙ Adds interpretability (you know what each head is doing) Why this might be stupid (more likely): ∙ Maybe the flexibility of unconstrained heads is the whole point ∙ Maybe this has been tried and doesn’t work ∙ I literally don’t know what I don’t know Bonus weird idea: What if attention was explicitly subtractive like physical filtration? Fine-grained heads “handle” local patterns and remove them from the residual stream, so coarse heads only see what’s ambiguous. No idea if gradient flow would survive this. What I’m asking: 1. Is this a known research direction I just haven’t found? 2. Is the analogy fundamentally broken somewhere? 3. Is this interesting enough that someone should actually test it? 4. Please destroy this if it deserves destruction—I’d rather know Thanks for reading my 1am brain dump. For Clyde Tombaugh. submitted by /u/notruescotchman [link] [comments]
- [D] It feels like LLM inference is missing its AWS Lambda moment. by /u/pmv143 (Machine Learning) on January 17, 2026 at 3:54 pm
If we actually wanted “model = function” to work, a few things seem fundamentally required: •. Fast scale from zero without keeping GPUs alive just to hold state • Execution state reuse so models don’t need full re-init and KV rebuild on every scale event • Clear separation between orchestration and runtime, like Lambda vs the underlying compute • Predictable latency even under spiky, bursty traffic • Cost model that doesn’t assume always-on GPUs Today, most inference setups still treat models as long-lived services, which makes scale-to-zero and elasticity awkward. What’s the real hard blocker to a true Lambda-style abstraction for models? Cold starts, KV cache, GPU memory semantics, scheduling, or something else? submitted by /u/pmv143 [link] [comments]
- [D] LLMs as a semantic regularizer for feature synthesis (small decision-tree experiment) by /u/ChavXO (Machine Learning) on January 17, 2026 at 2:59 pm
I’ve been experimenting with using LLMs not to generate features, but instead to filter them during enumerative feature synthesis. The approach was inspired by this paper: https://arxiv.org/pdf/2403.03997v1 I had already been playing with enumerative bottom up synthesis but noticed it usually gave me unintelligible features (even with regularization). I looked into how other symbolic approaches deal with this problem and saw that they tried to model the semantics of the domain somehow - including dimensions, refinement types etc. But those approaches weren't appealing to me because I was trying to come up with something that worked in general. So I tried using an LLM to score candidate expressions by how meaningful they are. The idea was that the semantic meaning of the column names, the dimensions, and the salience of the operations could be embedded in the LLM. My approach was: * Enumerate simple arithmetic features (treat feature eng as program synthesis) * Use an LLM as a semantic filter (“does this look like a meaningful quantity?”) * Train a decision tree (with oblique splits) considering only the filtered candidates as potential splits. The result was that the tree was noticeably more readable, accuracy was similar / slightly better in my small test. I wrote it up here: https://mchav.github.io/learning-better-decision-tree-splits/ Runnable code is here If you’ve tried constraining feature synthesis before: what filters worked best in practice? Are the any measures of semantic viability out there? submitted by /u/ChavXO [link] [comments]
- [P] Progressive coding exercises for transformer internals by /u/randmusr66 (Machine Learning) on January 17, 2026 at 8:33 am
For a while I've been looking for a good format to practice implementing ML algorithms. LeetCode feels too disconnected from real work, but in actual projects you just use existing libraries. What worked for me was breaking real algorithms into progressive steps and implementing them piece by piece. I've been using this approach for myself, and recently decided to clean up some of it with tests and hints in case others find it useful. Currently covers: attention, BPE tokenization, beam search variants, and RoPE. Curious if others have found similar formats helpful, or what primitives would be worth adding. submitted by /u/randmusr66 [link] [comments]
- [D] Irreproducible KDD Paper? by /u/Massive-Bobcat-5363 (Machine Learning) on January 17, 2026 at 5:11 am
So I came across a 2025 KDD paper whose idea is pretty simple and not too novel in my opinion. The paper shared a code link that was broken. But the same paper was rejected from ICLR but had shared the code there. They primarily did experiments on 2 datasets that were public following some training/credentialing steps. I was planning to submit something to KDD this year trying to improve upon this work. I was thinking of simply following their experimental procedure for my method and use the results of all models reported in their paper as baselines. So I emailed the corresponding author who immediately directed the first author to contact me. The first author then shared a Github repo that was created 3 weeks ago. However, the experimental setup was still very vague (like the first preprocessing script assumed that a file is already available while the raw data is spread across directories and there was no clarity about what folders were even used). Initially the author was pretty fast in responding to my emails (took maybe 10-15 mins or so), but as soon as I asked for the script to create this file, they first said that they cannot share the script as the data is behind the credentialing step. However, having worked in this field for 4 years now, I know that you can share codes, but not data in this case. However, I actually sent proof that I have access to the data and shared my data usage agreement. However, it's been 7 hrs or so and no response. I mean, I have seen this type of radio silence from researchers from Chinese Universities before. But the authors of this paper are actually from a good R-1 University in the US. So it was kinda weird. I do not want to specifically reveal the names of the paper or the authors but what is the harm in sharing your experimental setup? I would have actually cited their work had I been able to code this up. Also, I do not get how such a borderline paper (in terms of the technical novelty) with poor reproducibility get into KDD in the first place? submitted by /u/Massive-Bobcat-5363 [link] [comments]
- [D] Burnout from the hiring process by /u/RNRuben (Machine Learning) on January 16, 2026 at 7:16 pm
I've been interviewing for research (some engineering) interships for the last 2 months, and I think I'm at a point of mental exhaustion from constant rejections and wasted time. For context, I just started my master’s at Waterloo, but I'm a research associate at one of the top labs in Europe. I have been doing research since my sophomore year. I did not start in ML, but over the last year and a half, I ended up in ML research, first in protein design and now in pretraining optimization. I started applying for interships a few months ago, and after 10+ first-round interviews and endless OAs, I haven't landed any offers. Most of the companies that I've interviewed with were a mix of (non-FAANG) frontier AI companies, established deep tech startups, research labs of F100 companies, a couple non name startups, and a quant firm. I get past a few rounds, then get cut. The feedback in general is that I'm not a good "fit" (a few companies told me I'm too researchy for a research engineer, another few were researching some niche stuff). And the next most common reason is that I failed the coding technical (I have no issue passing the research and ML theory technical interviews), but I think too slow for an engineer, and it's never the same type of questions (with one frontier company, I passed the research but failed the code review) and I'm not even counting OAs. Not a single one asked Leetcode or ML modelling; it's always some sort of a custom task that I have no prior experience with, so it's never the same stuff I can prepare. I'm at a loss, to be honest. Every PhD and a bunch of master's students in our lab have interned at frontier companies, and I feel like a failure that, after so many interviews, I can't get an offer. Because of my CV (no lies), I don't have a problem getting interviews, but I can't seem to get an offer. I've tried applying for non-research and less competitive companies, but I get hit with "not a good fit." I have 3 technicals next week, and tbh I know for a fact I'm not gonna pass 2 of them (too stupid to be a quant researcher) and the other is a 3rd round technical, but from the way he described it I don't think I'll be passing it (they're gonna throw a scientific simulation coding problem at me). And I still need to schedule one more between those 3, but I'm not sure why they even picked me, I don't do RL or robotics research. After so many days and hours spent preparing for each technical only to get cut, I mentally can't get myself to prepare for them anymore. It's always a new random format. I'm severely burned out by this whole process, but time is running out. I love research, but I'm starting to hate the hiring process in this industry. Any advice on what to do? submitted by /u/RNRuben [link] [comments]
- [P] vLLM-MLX: Native Apple Silicon LLM inference - 464 tok/s on M4 Max by /u/waybarrios (Machine Learning) on January 16, 2026 at 5:05 pm
Hey everyone! I built vLLM-MLX - a framework that uses Apple's MLX for native GPU acceleration. What it does: - OpenAI-compatible API (drop-in replacement for your existing code) - Multimodal support: Text, Images, Video, Audio - all in one server - Continuous batching for concurrent users (3.4x speedup) - TTS in 10+ languages (Kokoro, Chatterbox models) - MCP tool calling support Performance on M4 Max: - Llama-3.2-1B-4bit → 464 tok/s - Qwen3-0.6B → 402 tok/s - Whisper STT → 197x real-time Works with standard OpenAI Python SDK - just point it to localhost. GitHub: https://github.com/waybarrios/vllm-mlx submitted by /u/waybarrios [link] [comments]
- [D] ICASSP 2026 Results by /u/Financial-Panda6581 (Machine Learning) on January 16, 2026 at 3:18 pm
It looks like ICASSP 2026 decisions may already be accessible. If you can log in to the following link and successfully send an invitation email, that seems to indicate your paper has been accepted: https://cmsworkshops.com/ICASSP2026/author_invitation_request.php The email says: “On behalf of IEEE ICASSP 2026, I invite you to join us for the upcoming conference. We are pleased to inform you that your submission has been accepted for presentation at the 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP 2026) in Barcelona, Spain, during 3–8 May 2026. ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. It offers a comprehensive technical program presenting all the latest development in research and technology in the industry that attracts thousands of professionals annually.” Hopefully this helps others who are anxiously waiting. Good luck everyone -------- Update: It was a bug that got fixed within a few hours. It looks like no one can access it right now. “Error: No match for paper number and password. 0x4C”. -------- Update: Just got the official email! 🥰 ID 9000-10000 Some folks haven’t gotten the email yet, but they can already find their papers on the accepted list here: https://cmsworkshops.com/ICASSP2026/papers/accepted_papers.php you can also check a community-maintained spreadsheet compiled by users on another platform: https://docs.qq.com/sheet/DY3NTYVhwVVVGUUtx?tab=BB08J2 The list is still updating, so no worries if yours isn’t there yet just give it a bit more time. You can check your paper status here: https://cmsworkshops.com/ICASSP2026/Papers/FindPaperStatus.asp submitted by /u/Financial-Panda6581 [link] [comments]
- [D] Why Mamba rewrote its core algorithm and Microsoft abandoned RetNet by /u/petroslamb (Machine Learning) on January 16, 2026 at 2:47 pm
Mamba-2 restructured its recurrence from parallel scans (10-20% Tensor Core utilization) to block-diagonal GEMMs (60-70%). The architecture bent to fit the silicon. RetNet was published by Microsoft Research in July 2023 with promising results at 6.7B. Five months later, the same organization shipped Phi-2, a dense Transformer. Then Phi-3. Then Phi-4. The co-authors didn't bet on their own architecture. I wrote an analysis of why this pattern keeps repeating. The short version: Transformers and NVIDIA GPUs co-evolved into a stable attractor. Breaking out requires clearing two reinforcing gates at once, hardware compatibility and institutional backing, and the gates make each other harder to pass. At frontier scale, no pure alternative has done it. Essay has Tensor Core utilization numbers, analysis of alternative chip vendors, and three falsifiable predictions for 2028. submitted by /u/petroslamb [link] [comments]
- [D] Does weight decay in RealNVP (Normalizing flows) encourage identity transforms? by /u/Screech-1 (Machine Learning) on January 16, 2026 at 10:00 am
I’m looking for some opinions on the use of weight decay in RealNVP-style normalizing flows. My concern is that blindly applying standard weight decay (L2 on parameters) may be actively harmful in this setting. In RealNVP, each coupling layer is explicitly structured so that small weights push the transformation toward the identity map. With weight decay, we’re therefore not just regularizing capacity, we are actually biasing the model towards doing nothing. In flows, the identity transform is a perfectly valid (and often high-likelihood early) solution (especially if you zero init your scale networks which seems to be standard practice), so weight decay feels like it’s reinforcing a bad inductive bias. Most implementations seem to include weight decay by default, but I haven’t seen much discussion about whether it actually makes sense for invertible models. EDIT: Following this post, I took the liberty of exploring this question through a toy problem. The setup is intentionally simple: I train a RealNVP-style flow to map between a standard Gaussian and a learned latent distribution coming from another model I’m working on. The target latent distribution has very small variance (overall std ≈ 0.067, with some dimensions down at 1e-4), which makes the identity-map bias especially relevant. I ran a small ablation comparing no weight decay vs standard L2 (1e-4), keeping everything else fixed. With weight decay 0: === ABLATION CONFIG === weight_decay: 0.0 tanh_scale: 3.0 grad_clip: 1.0 lr: 0.001 epochs: 2000 print_every: 200 Latents: mean=0.0008, std=0.0667 per-dim std: min=0.0002, max=0.1173 === TRAINING === Epoch 200 | NLL: -801.28 | z_std: 0.900 | inv_std: 0.0646 | base1: [0.06573893129825592, 0.04342599958181381, 0.08187682926654816] Epoch 400 | NLL: -865.13 | z_std: 0.848 | inv_std: 0.0611 | base1: [0.10183795541524887, 0.05562306195497513, 0.14103063941001892] Epoch 600 | NLL: -892.77 | z_std: 0.956 | inv_std: 0.0618 | base1: [0.12410587072372437, 0.06660845875740051, 0.1999545693397522] Epoch 800 | NLL: -925.00 | z_std: 1.055 | inv_std: 0.0650 | base1: [0.13949117064476013, 0.07608211040496826, 0.2613525688648224] Epoch 1000 | NLL: -952.22 | z_std: 0.957 | inv_std: 0.0651 | base1: [0.1513708531856537, 0.08401045948266983, 0.3233321011066437] Epoch 1200 | NLL: -962.60 | z_std: 0.930 | inv_std: 0.0630 | base1: [0.16100724041461945, 0.09044866263866425, 0.385517954826355] Epoch 1400 | NLL: -972.35 | z_std: 1.120 | inv_std: 0.0644 | base1: [0.16973918676376343, 0.09588785469532013, 0.4429493546485901] Epoch 1600 | NLL: -1003.05 | z_std: 1.034 | inv_std: 0.0614 | base1: [0.17728091776371002, 0.10034342855215073, 0.4981722831726074] Epoch 1800 | NLL: -1005.57 | z_std: 0.949 | inv_std: 0.0645 | base1: [0.18365693092346191, 0.10299171507358551, 0.5445704460144043] Epoch 2000 | NLL: -1027.24 | z_std: 0.907 | inv_std: 0.0676 | base1: [0.19001561403274536, 0.10608844459056854, 0.5936127305030823] === FINAL EVALUATION === Target: mean=0.0008, std=0.0667 Forward: mean=0.0239, std=0.9074 (should be ~0, ~1) Inverse: mean=0.0009, std=0.0644 (should match target) With weight decay 1e-4: === ABLATION CONFIG === weight_decay: 0.0001 tanh_scale: 3.0 grad_clip: 1.0 lr: 0.001 epochs: 2000 print_every: 200 Latents: mean=0.0008, std=0.0667 per-dim std: min=0.0002, max=0.1173 === TRAINING === Epoch 200 | NLL: -766.17 | z_std: 0.813 | inv_std: 0.1576 | base1: [0.06523454189300537, 0.04702048376202583, 0.07113225013017654] Epoch 400 | NLL: -795.67 | z_std: 1.064 | inv_std: 0.7390 | base1: [0.08956282585859299, 
0.0620030015707016, 0.10142181813716888] Epoch 600 | NLL: -786.70 | z_std: 1.004 | inv_std: 0.1259 | base1: [0.09346793591976166, 0.06835056096315384, 0.11534363776445389] Epoch 800 | NLL: -772.45 | z_std: 1.146 | inv_std: 0.1531 | base1: [0.09313802421092987, 0.06970944255590439, 0.12027867138385773] Epoch 1000 | NLL: -825.67 | z_std: 0.747 | inv_std: 0.1728 | base1: [0.09319467097520828, 0.06899876147508621, 0.12167126685380936] Epoch 1200 | NLL: -817.38 | z_std: 0.911 | inv_std: 0.1780 | base1: [0.09275200963020325, 0.06717729568481445, 0.12130238860845566] Epoch 1400 | NLL: -831.18 | z_std: 0.722 | inv_std: 0.1677 | base1: [0.0924605205655098, 0.0654158964753151, 0.1201595664024353] Epoch 1600 | NLL: -833.45 | z_std: 0.889 | inv_std: 0.1919 | base1: [0.09225902706384659, 0.06358200311660767, 0.11815735697746277] Epoch 1800 | NLL: -838.98 | z_std: 0.893 | inv_std: 0.1714 | base1: [0.09210160374641418, 0.06210005283355713, 0.11663311719894409] Epoch 2000 | NLL: -832.70 | z_std: 0.812 | inv_std: 0.1860 | base1: [0.0919715166091919, 0.060423776507377625, 0.11383745074272156] === FINAL EVALUATION === Target: mean=0.0008, std=0.0667 Forward: mean=-0.0090, std=0.8116 (should be ~0, ~1) Inverse: mean=0.0023, std=0.2111 (should match target) Without weight decay, the model steadily moves away from the identity. The inverse pass closely matches the target latent statistics, and the forward pass converges to something very close to a standard normal (std ≈ 0.91 by the end, still improving). NLL improves monotonically, and the learned base transform parameters keep growing, indicating the model is actually using its capacity. With weight decay, training is noticeably different. NLL plateaus much earlier and fluctuates. More importantly, the inverse mapping never fully contracts to the target latent distribution (final inverse std ≈ 0.21 vs target 0.067). The forward mapping also under-disperses (std ≈ 0.81). Qualitatively, this looks exactly like the concern I raised originally: weight decay doesn’t just regularize complexity here. Now, I’m not claiming this means “never use weight decay in flows,” but in appears that indeed in certain settings one should definitely think twice :D. submitted by /u/Screech-1 [link] [comments]
- [D] Is “video sentiment analysis” actually a thing? by /u/YiannisPits91 (Machine Learning) on January 16, 2026 at 9:48 am
We’ve been doing sentiment analysis on text forever (tweets, reviews, comments, etc.). But what about video? With so much content now being video-first (YouTube, TikTok, ads, UGC, webinars), I’m wondering if anyone is actually doing sentiment analysis on video in a serious way. Things like: detecting positive / negative tone in spoken video understanding context around product mentions knowing when something is said in a video, not just that it was said analysing long videos, not just short clips I’m curious if: this is already being used in the real world it’s mostly research / experimental or people still just rely on transcripts + basic metrics Would love to hear from anyone in ML, data, marketing analytics, or CV who’s seen this in practice or experimented with it. submitted by /u/YiannisPits91 [link] [comments]
- [R] China just released first SOTA multimodal model trained entirely on domestic chips by /u/Different_Case_6484 (Machine Learning) on January 16, 2026 at 8:27 am
Zhipu AI and Huawei just dropped GLM-Image, and the technical details are interesting. First multimodal model trained completely on Chinese chips (Huawei Ascend 910) from data preprocessing to full scale training. They're using a hybrid architecture combining autoregressive + diffusion decoder. What stands out is the Chinese text rendering. It consistently ranks first among open source models for complex text generation, especially handling Chinese characters which most models struggle with. Native support for 1024 to 2048 resolution at any aspect ratio without additional training. API pricing is 0.1 yuan per image (roughly $0.014). The model handles both text to image and image to image generation in a single model. GitHub and Hugging Face repos are already up. This is significant because it proves you can train frontier models without relying on Nvidia hardware. The compute efficiency numbers they're claiming are 60% better than H200 for tokens per joule. Whether those benchmarks hold up in practice remains to be seen but the fact they pulled this off on domestic hardware is noteworthy. submitted by /u/Different_Case_6484 [link] [comments]
- [P] cv-pipeline: A minimal PyTorch toolkit for CV researchers who hate boilerplate by /u/Extension_Key_5970 (Machine Learning) on January 16, 2026 at 7:14 am
To all DS and ML researchers If someone got tired of copy-pasting the same data loading, training loops, and export code for every CV project. So I built a toolkit that handles the boring stuff. What it does: from cv_pipeline import quick_train, analyze_dataset, export_model # Analyze your dataset analyze_dataset("./my_images") # Train (one line) model, history = quick_train("./my_images", model="efficientnet_b0", epochs=10) # Export for deployment export_model(model, "model.onnx", format="onnx") Key features: Data loading - Point to a folder, get DataLoaders. Handles splits, augmentation, and normalisation. 50+ architectures - ResNet, EfficientNet, ViT, MobileNet via timm. One-line model loading. Dataset analysis - Class distribution, imbalance detection, image stats. Model comparison: benchmark multiple architectures on your data. Export - TorchScript, ONNX, state_dict. CLI - cv-pipeline train --data ./images --model resnet50 --epochs 20 Notebook generator - Auto-generate starter notebooks for classification/detection/segmentation. CLI example: # Analyze dataset cv-pipeline analyze --data ./images # Train cv-pipeline train --data ./images --model efficientnet_b0 --epochs 20 # Compare models cv-pipeline compare --models resnet50,efficientnet_b0,vit_base --data ./images Not a framework - just utilities. Use with your existing PyTorch code. No lock-in. Built for rapid prototyping and experiment iteration. Includes configs for medical imaging, manufacturing QC, retail, and document processing use cases. GitHub: https://github.com/var1914/pytorch-ml-pipeline Feedback welcome. What utilities would you add? submitted by /u/Extension_Key_5970 [link] [comments]
- [R] Is it possible for a high school student to publish multiple papers at top conferences within a year? by /u/ApprehensiveEgg5201 (Machine Learning) on January 16, 2026 at 1:12 am
I recently came across the Google Scholar profile of a high school student and was quite astonished by the strength of his publication record. Even more strikingly, he is also serving as a reviewer for ICLR and AISTATS. submitted by /u/ApprehensiveEgg5201 [link] [comments]
- [D] Scale AI ML Research Engineer Interviews by /u/sailor-goon-is-here (Machine Learning) on January 16, 2026 at 1:06 am
Hi, I'm looking for help into preparing for the upcoming coding interviews for an ML research engineer position I applied to at Scale. These are for the onsite. The first coding question relates parsing data, data transformations, getting statistics about the data. The second (ML) coding involves ML concepts, LLMs, and debugging. I found the description of the ML part to be a bit vague. For those that have done this type of interview, what did you do to prepare? So far on my list, I have reviewing hyperparameters of LLMs, PyTorch debugging, transformer debugging, and data pipeline pre-processing, ingestion, etc. Will I need to implement NLP or CV algorithms from scratch? Any insight to this would be really helpful. submitted by /u/sailor-goon-is-here [link] [comments]
- [P] Adaptive load balancing in Go for LLM traffic - harder than expected by /u/dinkinflika0 (Machine Learning) on January 15, 2026 at 6:58 pm
I am an open source contributor, working on load balancing for Bifrost (LLM gateway) and ran into some interesting challenges with Go implementation. Standard weighted round-robin works fine for static loads, but LLM providers behave weirdly. OpenAI might be fast at 9am, slow at 2pm. Azure rate limits kick in unexpectedly. One region degrades while others stay healthy. Built adaptive routing that adjusts weights based on live metrics - latency, error rates, throughput. Used EWMAs (exponentially weighted moving averages) to smooth out spikes without overreacting to noise. The Go part that was tricky: tracking per-provider metrics without locks becoming a bottleneck at high RPS. Ended up using atomic operations for counters and a separate goroutine that periodically reads metrics and recalculates weights. Keeps the hot path lock-free. Also had to handle provider health scoring. Not just "up or down" but scoring based on recent performance. A provider recovering from issues should gradually earn traffic back, not get slammed immediately. Connection pooling matters more than expected. Go's http.Transport reuses connections well, but tuning MaxIdleConnsPerHost made a noticeable difference under sustained load. Running this at 5K RPS with sub-microsecond overhead now. The concurrency primitives in Go made this way easier than Python would've been. Anyone else built adaptive routing in Go? What patterns worked for you? submitted by /u/dinkinflika0 [link] [comments]
- [R] statistical learning in machine learning vs cognitive sciences by /u/Ok_Fudge1993 (Machine Learning) on January 15, 2026 at 3:22 pm
Hi everyone! Please bear with me with this question 🫣 I’m looking for someone in research to pick their brain about the similarities and differences between statistical learning in cognitive science and in machine learning, so definition, conceptual differences/similarities, predictions, testing…. Hope it makes sense, I’m doing research in cognitive sciences and I’d love to learn more about this term’s use in ML for a review I’m working on 🙂 thanks! submitted by /u/Ok_Fudge1993 [link] [comments]
- ISBI 2026: Results Out [D] by /u/ade17_in (Machine Learning) on January 15, 2026 at 7:02 am
Results out for ISBI 2026 - London a few days back. Just want to check with fellow medical imaging peeps on how did it go for all. Results were delayed by a month and I see a pretty high acceptance rate this time. submitted by /u/ade17_in [link] [comments]