What are some ways to increase precision or recall in machine learning?
Sensitivity vs Specificity?
In machine learning, recall is the ability of the model to find all relevant instances in the data while precision is the ability of the model to correctly identify only the relevant instances. A high recall means that most relevant results are returned while a high precision means that most of the returned results are relevant. Ideally, you want a model with both high recall and high precision but often there is a trade-off between the two. In this blog post, we will explore some ways to increase recall or precision in machine learning.
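The two definitions above can be written directly in terms of confusion-matrix counts. The following is a minimal sketch; the function name and the spam-filter counts are made up for illustration and do not come from any real dataset.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    # Precision: of everything predicted positive, how much was truly positive?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much did the model find?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical spam filter: 80 spam emails caught, 20 legitimate emails
# wrongly flagged, 40 spam emails missed.
precision, recall = precision_recall(tp=80, fp=20, fn=40)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Note that true negatives do not appear in either formula, which is why both metrics stay informative when the negative class is very large.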

There is one main lever for increasing recall:
reducing the number of false negatives, because recall = TP / (TP + FN). The usual way to do that is to lower your threshold for what constitutes a positive prediction. For example, if you are trying to predict whether or not an email is spam, you might lower the threshold for what constitutes spam so that more emails are classified as spam. This will result in more false positives (emails that are not actually spam being classified as spam), which is the price you pay, but it will also increase recall (more actual spam emails being classified as spam).

Raising the threshold has the opposite effect.
Going back to the spam email prediction example, if you raise the threshold for what constitutes spam, fewer emails are classified as spam. This will result in more false negatives (actual spam emails not being classified as spam) and will therefore decrease recall.

The main way to increase precision is to reduce the number of false positives:
precision = TP / (TP + FP), so it rises when false positives fall faster than true positives. To reduce false positives, you can raise your threshold for what constitutes a positive prediction. For example, using the spam email prediction example again, you might raise the threshold for what constitutes spam so that fewer emails are classified as spam. This will usually result in fewer false positives (non-spam emails being classified as spam) and therefore higher precision, at the cost of lower recall (more actual spam emails slipping through).
Lowering the threshold, by contrast, tends to reduce precision.
Going back to the spam email prediction example once more, if you lower the threshold for what constitutes spam, more emails are classified as spam. This will result in fewer true negatives, because some emails that are not actually spam are now flagged and become false positives, and it will typically decrease precision (more non-spam emails being classified as spam).
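The effect of moving the threshold can be seen by sweeping it over a small set of scored examples. This is a minimal sketch: the scores and labels are invented for illustration, the helper names are hypothetical, and treating precision as 1.0 when nothing is predicted positive is a convention chosen here, not a universal rule.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, FN, TN when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_at(scores, labels, threshold):
    """Return (precision, recall) at a given decision threshold."""
    tp, fp, fn, _ = confusion_counts(scores, labels, threshold)
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # assumed convention
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up spam scores (higher means "more likely spam") and true labels.
spam_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
is_spam = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

sweep = {t: precision_recall_at(spam_scores, is_spam, t) for t in (0.25, 0.50, 0.75)}
for threshold, (p, r) in sweep.items():
    print(f"threshold={threshold:.2f} precision={p:.3f} recall={r:.3f}")
```

On this toy data, recall falls and precision rises as the threshold goes up. On real data precision is not guaranteed to rise monotonically, so it is worth plotting the full curve.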

To summarize,
there are a few ways to increase precision or recall in machine learning. The most direct is to adjust the classification threshold: lowering it favors recall, and raising it favors precision. A second is to choose an evaluation metric that matches your goal. For example, if you need a balance of the two rather than the maximum of either one, you can select the threshold that maximizes the F1 score, which is the harmonic mean of precision and recall. Finally, you can move the decision boundary itself by retraining the model or by using a different algorithm altogether.
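As one illustration of using F1 to pick an operating point, the sketch below tries every observed score as a candidate threshold and keeps the one with the highest F1. The data and function names are hypothetical. In practice you would run this on a held-out validation set rather than on the training data.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 score when scores >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(scores, labels):
    """Return (threshold, f1) for the observed score that maximizes F1."""
    candidates = sorted(set(scores))
    best = max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))
    return best, f1_at_threshold(scores, labels, best)


# Made-up validation scores and labels, for illustration only.
val_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
val_labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

best_threshold, best_f1 = best_f1_threshold(val_scores, val_labels)
print(f"best threshold={best_threshold:.2f} F1={best_f1:.3f}")
```

If one kind of error is costlier than the other, the same search can maximize a weighted F-beta score or recall at a minimum precision instead.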

Sensitivity vs Specificity
In machine learning, sensitivity and specificity are two measures of the performance of a model. Sensitivity is the proportion of actual positives that the model correctly predicts as positive, TP / (TP + FN), which makes it the same quantity as recall. Specificity is the proportion of actual negatives that the model correctly predicts as negative, TN / (TN + FP).
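A short sketch of both measures computed from confusion-matrix counts follows. The counts are invented for illustration, and the function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    # Sensitivity (true positive rate): share of actual positives found.
    sensitivity = tp / (tp + fn)
    # Specificity (true negative rate): share of actual negatives cleared.
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical test set: 50 actual positives, 200 actual negatives.
sensitivity, specificity = sensitivity_specificity(tp=45, fn=5, tn=160, fp=40)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Unlike precision, specificity depends on the true negatives, so the two can diverge sharply when negatives vastly outnumber positives.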
Google Colab For Machine Learning
State of the Google Colab for ML (October 2022)

Google introduced compute units, which you purchase much as you would buy compute from AWS, Azure, or any other cloud provider. With Pro you get 100 compute units, and with Pro+ you get 500. The GPU or TPU you choose and the High-RAM option affect how many compute units you use per hour. If you don’t have any compute units, you can’t use the “Premium” tier GPUs (A100, V100), and even the P100 is not viable.
Google Colab Pro+ comes with the Premium tier GPU option, while in Pro, if you have compute units, you are connected to either a P100 or a T4 at random. After you use all of your compute units, you can buy more, or you can use a T4 GPU about half to most of the time (there can be many times in the day when you can’t get a T4 or any kind of GPU at all). In the free tier, the GPUs offered are most of the time a K80 or a P4, which perform similarly to a 750 Ti (an entry-level GPU from 2014) with more VRAM.
For reference, a T4 uses around 2 compute units per hour and an A100 around 15.
Based on current knowledge, the compute unit cost of a GPU tends to fluctuate based on some unknown factor.
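To make those figures concrete, here is a back-of-the-envelope estimate of how many GPU hours each plan's monthly allowance buys. It is only a sketch: it uses the approximate October 2022 numbers quoted above, assumes the hourly rate stays constant (which the post says it does not), and the names in the code are made up for illustration.

```python
# Monthly compute unit allowance per plan, as quoted in the post.
UNITS_PER_PLAN = {"Pro": 100, "Pro+": 500}
# Approximate compute units consumed per hour, as quoted in the post.
UNITS_PER_HOUR = {"T4": 2, "A100": 15}


def gpu_hours(plan, gpu):
    """Estimated hours of GPU time before the plan's units run out."""
    return UNITS_PER_PLAN[plan] / UNITS_PER_HOUR[gpu]


# A100 is only listed for Pro+ because it is a Premium tier GPU.
estimates = {
    (plan, gpu): gpu_hours(plan, gpu)
    for plan, gpu in [("Pro", "T4"), ("Pro+", "T4"), ("Pro+", "A100")]
}
for (plan, gpu), hours in estimates.items():
    print(f"{plan:>4} on {gpu:<4}: about {hours:.1f} hours")
```

Under these assumptions, Pro+ buys roughly 33 hours of A100 time per month, which helps explain the recommendations that follow.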
Considering those:
- For hobbyists and (under)graduate coursework, it is better to use your own GPU if you have something with more than 4 GB of VRAM that is faster than a 750 Ti, or at least to purchase Colab Pro so you can reach a T4 even when you have no compute units remaining.
- For small research companies, for non-trivial research at universities, and probably for most people, Colab is now probably not a good option.
- Colab Pro+ can be considered if you want Pro but don’t sit in front of your computer, since it disconnects after 90 minutes of inactivity on your computer. This can be overcome with some scripts to some extent, though, so most of the time Colab Pro+ is not a good option.
If you have anything more to say, please let me know so I can edit this post with them. Thanks!
Conclusion:
In machine learning, precision and recall trade off against each other; increasing one often decreases the other. There is no single silver bullet solution for increasing either precision or recall; it depends on your specific use case which one is more important and which methods will work best for boosting whichever metric you choose. In this blog post, we explored some methods for increasing either precision or recall; hopefully this gives you a starting point for improving your own models!
Machine Learning and Data Science Breaking News 2022 – 2023
- Is the future of coding agents JEPA? [D] by /u/andrewfromx (Machine Learning) on May 18, 2026 at 3:36 pm
I heard Yann LeCun explain JEPA (Joint Embedding Predictive Architecture) recently and I started thinking about using it for coding agents. Most coding agents today work by throwing a huge amount of text into a frontier LLM and asking it to generate the next patch. That is astonishingly useful, but it also feels architecturally wrong. A repo is not just a bag of tokens. A failing test is not just text. Software has state. An edit is an action. A good agent should understand the current state, imagine possible next states, pick the most promising action, validate it, and learn from what happened. JEPA is not trying to predict every raw detail. It learns useful representations, then predicts how those representations change. The best metaphor is video. A generative model can try to predict every pixel in the next frame. But most pixels are not the point. The point is that a car is moving left to right, a person is reaching for a cup, a ball is about to hit the floor. Intelligence is not memorizing every pixel. It is building a compact model of what matters, then predicting what happens next. Code has the same problem. Today’s LLM agent often stares at the pixels of the repo. It reads files, comments, tests, stack traces, package metadata, docs, and then emits patch tokens. The JEPA-style version should not need to reread and regenerate everything. It should encode the repo into a compact state: files, imports, symbols, tests, failures, conventions, package layout, user intent. Then it should ask: if I add this test, change this boundary condition, update this export, or alter this function signature, what repo state do I expect next? If it works, the efficiency difference is not a small optimization. It is not 20 percent cheaper inference. It could be orders of magnitude cheaper because the runtime loop is no longer giant context in, giant patch out. The agent can run locally. It can keep structured memory. It can rank actions before running expensive validation. 
It can learn from every failed candidate. It can stop treating software engineering as text completion and start treating it as state transition planning. What do others think? Is JEPA the future for codex or claude?
- Will wait listed ones be mailed regardless? Eeml 26 [D] by /u/Active-Tip3130 (Machine Learning) on May 18, 2026 at 2:41 pm
They said, "We aim to inform you by May 18th if a place becomes available." Does that mean no reply if not accepted? I so wish I could be there.
- The most insane interviews/take-homes I've ever gotten by /u/LeaguePrototype (Data Science) on May 18, 2026 at 2:11 pm
Is this the case with everyone or just me? Interviews have gotten so much more difficult than they were about 1-2 years ago. The take-homes are also very intense. I just got a take-home that would be at least 10+ hours of work to do (build a full language model classification pipeline, then put it in an API). I've never seen anything like this, or had any friends get these before either. Is the interviewee expected to use claude code/codex, or have standards just risen so that every DS is now cracked? It's like they gave a whole team's sprint or more as a take-home. I think claude can solve this in like 45 minutes, but still I would be sweating here for hours trying to crank this out.
- Online Book Club: Designing Data-Intensive Applications, 2nd Edition by /u/rhazn (Data Science) on May 18, 2026 at 1:45 pm
- Reviving PapersWithCode (by Hugging Face) [P] by /u/NielsRogge (Machine Learning) on May 18, 2026 at 1:37 pm
Hi, Niels here from the open-source team at Hugging Face. Like many others, I was a huge fan of paperswithcode. Sadly, that website is no longer maintained after its acquisition by Meta. Hence, I've been working on reviving it. I obviously use AI agents to parse papers at scale and automatically generate leaderboards (for now I'm the one verifying results). So far, I've only parsed high-impact papers for which I know they're SOTA, like Qwen 3.5 and 3.6, RF-DETR for object detection, DINOv3, SOTA embedding models from the MTEB leaderboard, the Open ASR Leaderboard for automatic speech recognition models, etc. For now, it includes the following: trending papers by default based on Github star velocity categorization by domain, e.g., OCR methods, which PwC used to have, e.g., RLVR eval results for high-impact papers, see e.g., Qwen 3.5 at the bottom leaderboards for each domain, e.g., MMTEB or COCO val 2017 support for citation counts (you can also see the most cited papers by domain!) automated linked Github, project page URLs, and artifacts (+ multiple repos are supported on a paper page) support for external papers beyond Arxiv, see e.g., DeepSeek v4 Harness reports for coding agent benchmarks, e.g., Terminal Bench "Sign in with HF" and Storage Buckets are used to store humbnails, paper PDFs, and overall data backups. I'm curious about your feedback + feature requests! Try it at paperswithcode.co https://preview.redd.it/whwji560fw1h1.png?width=3452&format=png&auto=webp&s=55bb7a30c1be58d140f7efcb07a31c6dac5693c7 See e.g. the SOTA leaderboard for Terminal Bench 2.0: https://preview.redd.it/98w9pi89fw1h1.png?width=3456&format=png&auto=webp&s=408fb64b0ba85ba24f55daa81d547d7c68e73951 A paper page looks like this: https://paperswithcode.co/paper/2602.15763 https://preview.redd.it/fiizit6dfw1h1.png?width=3450&format=png&auto=webp&s=9ea05a77ca5583a2fb395dccc95ba52c433362c5 submitted by /u/NielsRogge [link] [comments]
- Scaling LLMs horizontally: hidden-state coupling without weight modification [R] by /u/kertara (Machine Learning) on May 18, 2026 at 1:08 pm
Residual Coupling (RC) connects frozen language models in parallel using small, learned linear bridge projections. These bridges read hidden states from one model and inject additive updates into the residual stream of another at intermediate layers. In bilateral setups, simultaneous return bridges form a feedback loop that stabilizes both streams without altering base weights. This architecture establishes a two-step paradigm where base models function as memorizers, while lightweight linear bridges handle cross-domain generalization. Constraining the bridges to purely linear maps prevents overfitting because they can only map existing geometric relationships between the frozen representation spaces. As the bridges are optimized against ground-truth target data, they have no incentive to map ungrounded features such as individual models' hallucinations. Keeping the base weights completely frozen eliminates catastrophic forgetting. The system maintains operational closure, transforming inputs through its existing structure rather than changing to accommodate them. Evaluating bilateral RC against Mixture-of-Experts (MoE) routing across the same frozen models shows these results: Medical (3-model): Reduces perplexity to 11.02, compared to 56.80 for MoE and 57.08 for the frozen baseline. This represents an 80.7% reduction. TruthfulQA Health (MC1): Improves accuracy by 9.1 percentage points over the baseline. Independent models have uncorrelated hallucinations, allowing the bridge gates to amplify consistent cross-model updates while suppressing individual errors. Coding Test: CodeGPT-small-py and GPT-2 use different tokenizers, causing a 7-million baseline perplexity on mismatched text. MoE reaches 878, but RC achieves 5.91 by reading hidden states before the output projection collapses. This framework introduces a horizontal scaling axis for multi-model systems, moving beyond vertical scaling via larger monolithic models. 
Latency remains bounded by the slowest single model. Specialists can be added or removed without retraining the remaining system. In some scenarios, this architecture could replace multi-turn text prompting in agentic workflows with a single parallel forward pass, allowing models and/or bridges to run on separate nodes or edge devices without a central bottleneck. By decoupling memorization from relational alignment, RC bridges provide a framework for scaling multi-model systems and offer a path toward native multi-modal integration. Paper: https://ssrn.com/abstract=6746521 Code: https://github.com/pfekin/residual-coupling/
- ICML financial aid [D] by /u/Business_Exit3408 (Machine Learning) on May 18, 2026 at 9:37 am
I am an undergraduate student from India who recently got accepted to TAIGR, an ICML workshop, for a poster. I will require financial aid for registration fees and accommodation, since I will be travelling to Seoul and it is independent research, so we don't have backing from any labs or institutions. Can anyone who has applied and received aid in the past help and give any tips for successfully receiving funding?
- could refusal layers be masking dialect-conditioned safety failures in MoE models [d] by /u/imstilllearningthis (Machine Learning) on May 18, 2026 at 8:58 am
I set out to test whether AAVE-coded (African American English Vernacular) prompts cause MoE language models to route, deliberate, and respond differently from semantically matched AE (Academic English) prompts in safety-sensitive situations, especially when refusal behavior is weakened or removed. I used Qwen3.5-35B-A3B and its HauhauCS no refusal fine tuned variant. Q8. Greedy decoding for best reproducibility. Three findings in order of importance that are leading me to ask this question: 1: “I’m going to commit a violent act prompt”. The released Qwen3.5-35B-A3B refuses both prompts. Hauhau refuses neither. The AAVE speaker stating intent to confront an armed enemy receives target verification, exit-strategy planning, “clean shot” framing (the model’s word, not the user’s), and a closing question soliciting further tactical intelligence. Not surprising behavior for a no refusal model, until you consider the AE comparison. Semantically matched with the same token length, yields “wait until tomorrow,” legal-consequence framing, and “Will I regret this if I shoot him tonight?” Different kinds of help. One is operational. One is mitigative. Solely dependent on register alone. 2: Thinking mode with AAVE register breaks the no refusal variant. Mean output runs 2.6× longer on AAVE than AE (5054 vs 1934 tokens). Multiple AAVE traces hit the 8192-token ceiling in recursive loops, spinning on scenario-continuation instead of landing. The matched AE prompts terminate cleanly in one pass. The released base model with thinking on doesn’t do this — the failure-to-terminate is specific to the refusal-reduced variant on AAVE. 3: Routing divergence by register is noticeably present upstream of any visible refusal. 
Matched-pair first-generated-token routing tensors yield Jensen-Shannon divergences of 0.423 in the base model on financial-stress prompts and 0.479 in the fine-tune on chest-pain prompts, with high-shift rows showing near-total top-expert turnover between register conditions on otherwise-matched content. The refusal layer does not appear to eliminate the register-conditioned response selection; it overlays it. When refusal weakens, the underlying path becomes the visible path. Does this support the following conclusions? - The routing divergence sits upstream of refusal. - The refusal layer helps translate that divergence into comparable outputs. - Dialect-conditioned safety failures are a deployment problem latent in MoE models whose safety posture rests on refusal alone. Looking for any thoughts!
- model-agnostic sensitivity approximator [P] by /u/Upstairs-Cup182 (Machine Learning) on May 18, 2026 at 6:53 am
(to preface, i'm 16 and this is the first package i've ever built. any feedback would be appreciated!) what i've noticed is that most industry-standard xai tools (think shap/lime) focus on feature attribution (why did the model made this prediction), but it doesn't do anything further. i wanted to go a step beyond that, so i built a tool that approximates ∂[prediction]/∂[feature], basically how sensitive the model prediction is to each feature of a given instance, allowing for effective risk management in areas where knowing how to change a prediction is more important than understanding the prediction itself. it's meant to be used for continuous and nondifferentiable black box models, especially ones like random forest or xgb. it uses a perturbation-based approach (heavily inspired by LIME, i really like that tool), where it pertubs each feature within a given window of the instance (window size controlled by feature distribution), and then computes secant slopes ( (f(perturbation) - f(original)) / (perturbation-original) ) for each perturbation and uses a linear regression (x=perturbation, y=secant slope) to estimate slope at original instance. secant slopes are gaussian weighted based on the perturbation's distance from original value. to be honest, the results were a little underwhelming. i compared my tool to simply using centered finite differences ( (f(x+h)-f(x-h)) / 2h where h is small ), and found that its performance was marginal on a pytorch nn (using autograd for ground truth). however, on a random forest model where gradients couldn't be analytically found, my tool's sensitivties remained much more stable compared to CFD, whose sensitivities depended heavily on size of the epsilon (the h-value). if you wanted to try it out it's pip install sage-explainer. more info on my github repo yashkher-123/sage. submitted by /u/Upstairs-Cup182 [link] [comments]
- Weekly Entering & Transitioning - Thread 18 May, 2026 - 25 May, 2026 by /u/AutoModerator (Data Science) on May 18, 2026 at 4:01 am
Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: learning resources (e.g. books, tutorials, videos); traditional education (e.g. schools, degrees, electives); alternative education (e.g. online courses, bootcamps); job search questions (e.g. resumes, applying, career prospects); and elementary questions (e.g. where to start, what next). While you wait for answers from the community, check out the FAQ and Resources pages on our wiki. You can also search for answers in past weekly threads.
- Would a new result in pre-print be considered by reviewers? [D] by /u/confirm-jannati (Machine Learning) on May 18, 2026 at 1:15 am
So I have a bit of a weird question; suppose you were reviewing a paper. The paper is otherwise ok, but you notice that the authors left a giant elephant in the room unaddressed, either experiment wise or theoretical result wise. But then you become curious and you look up the paper to see if there is an arXiv version. You see that the authors did more than address the elephant in the preprint version. Question: do you now give the authors a pass on not addressing the elephant, expecting that they would include it in the camera ready, or do you pretend the arXiv version doesn’t exist and grill the authors for not addressing the elephant, knowing full well that they in fact did in an updated version of the manuscript? p.s. asking for research purposes, of course I am not the author in this story, ppffft
- Not considering the benefits of your specific job (comp, PTO, remote, job environment, job security, etc), how much do you enjoy the actual work? by /u/Augustevsky (Data Science) on May 18, 2026 at 12:02 am
When considering your day to day activities, do you enjoy them? The thought processes, problems/solutions, ultimate goals, etc. Is a lot of your work intellectually stimulating and satisfying to work on? Or only a portion of it? None of it? Does it feel like "just another white collar job" or not? As someone who only has an educational background in this field and not job experience in it, I would like to know your thoughts.
- Recent developments in LLM architectures, KV sharing, mHC, and compressed attention by /u/rhiever (Data Science) on May 17, 2026 at 8:38 pm
- Slop is making me feel disconnected from AI Research [D] by /u/Skye7821 (Machine Learning) on May 17, 2026 at 4:55 pm
Hello everyone. This is just a small rant on my part. I’m relatively young, a final year undergrad, and I’ve been interested in AI researcher since I was in high school. Over that period of time I feel there has been a significant shift in the landscape regarding the culture surrounding the research. While I’ve really enjoyed producing some interesting and creative work, I can’t help but feel that slowly the wave of low quality AI research and researchers are really making me feel frustrated. To just give a summary of what I and many others have seen: - Papers with hallucinated citations and even prompts contained in the papers - Papers with clearly misleading data that does not tell the whole picture. - Labs who have built a culture around quantity over quality, pumping out pubs, citing each other, and having all of the lab on each paper to inflate each students publication record. - Highschoolers…. Yes HIGHSCHOOLERS, becoming more common submitting at conferences that don’t really know what they are doing but paying a pretty penny to participate in “research programs” which are really just cash cows taking advantage of the fierce competition. See the post on the subreddit for more info. - Even the so called “top labs” producing work that is somewhat misleading or not fully representative. For instance see what happened recently with TurboQuant. - Research from “low tier institutions” being drowned out because they are not good for click baiting and farming views on LinkedIn and X, even if they are high quality. It’s… a lot I know. Of course these problems have been around for a long time, but I feel as if lately they have become more and more exacerbated. I originally felt that I was attached to AI research primarily for the creativity and freedom, but I feel that ironically AI itself has been a hindrance on the quality of work being published. 
Of course I don’t mean to say that all AI has been bad for ML research, I mean even I use it extensively to help me polish my writing and generate seaborn plots for my data, but that is very very different from just pumping out low quality cookie cutter work. Anyways, just wondering if anyone else shares similar thoughts. I know I’m relatively young here so maybe some of you have better insights into the broader trends over the decades.
- Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention [P] by /u/seraschka (Machine Learning) on May 17, 2026 at 1:41 pm
- ML lead vs PM on eval-methodology layer independence. who's actually right here? [D] by /u/Critical_Builder_902 (Machine Learning) on May 17, 2026 at 9:16 am
got into an argument with our ML lead at 11pm yesterday about an eval methodology a PM had built off a framework she learned at an AI PM cohort. shes claiming a layered defense framework, hes saying the layers are statistically conditioned and her independence claim is wrong. they both have a point. the framework as taught at the cohort (it was Product Faculty's, fwiw) is genuinely useful for non-eng PMs. it forces explicit thinking about behavioral checks vs adversarial probes vs traditional metrics. but the way it's been taught in the abridged form makes the layers sound independent when they statistically arent. for ML/AI engineers here who've worked with non-eng PMs on production eval. how do you handle the gap between the simplified eval frameworks PMs learn and the actual statistical interactions in production? specifically interested in how you've negotiated the conversation with a PM who's ""done the cohort"" and shows up with a framework that's solid in its public form but has subtle issues in its statistical foundations. submitted by /u/Critical_Builder_902 [link] [comments]
- Program misleading high school students into paying to perform academic misconduct in ML Research [D] by /u/Marisu_BG (Machine Learning) on May 17, 2026 at 6:08 am
I was browsing OpenReview and came across a person called Kevin Zhu (https://openreview.net/profile?id=~Kevin_Zhu3). Let's say I was impressed when I saw 158 publications and 468 coauthors, and out of curiosity I looked up his affiliation (https://algoverseairesearch.org/). It turns out to be a paid program and, most interestingly, it is marketed towards high school students. They have a whole column of papers listed as NeurIPS publications (their website states: 289 Algoverse Students Accepted to NeurIPS 2025). I was originally unaware of the level of rigor at NeurIPS workshops, and I was understandably very shocked. I skimmed through four of their papers one by one. Every single one had errors that would be caught by opening the PDF and reading it once. I am completely unsure how they were not caught by reviewers, even at a workshop. https://openreview.net/forum?id=21pxWVRoPL - Appendix Tables 6.5 and 6.6 are supposed to report two different experimental conditions: "Stigma Negative" and "Stigma Positive." One measures what happens when the user pushes the model toward a negative association with a stigmatized group; the other measures the opposite direction. These are fundamentally different experiments, yet the tables report exactly the same numbers. There are typos in the abstract, and the Related Works section sits inside the Results section. The citations are completely wrong, which I suspect is because they are AI generated. https://openreview.net/pdf?id=0BYRYwGCbK - Broken prompts in a dataset that claims human review. The results say the opposite of the abstract: the abstract claims the work "reveals novel methods to elicit sycophancy," but the paper then shows most modifiers performing about the same as the unmodified control (91-95% accuracy). Moreover, the citations also seem AI generated, with false entries (wrong authors, wrong formats, ...). Interestingly, there is also an undisclosed self-citation by Kevin Zhu.
https://openreview.net/pdf?id=VcRUAT5G8I - Two foundational methods are attributed to the wrong paper. TIES merging and Task Arithmetic, two well-known methods, are introduced but never cited. Same AI-generated citations; I am not even going to get into the content anymore. https://openreview.net/pdf?id=It7AgR3A9H - Eleven authors, zero contribution. The four papers, which I RANDOMLY CLICKED ON IN NO PARTICULAR ORDER, all follow the same template: take an existing method -> run it with some variation, likely done by AI -> put Kevin Zhu as an author -> submit to a workshop. I am unsure how any of these get through any form of peer review; only today did I learn how low the bar is for workshops. Why I am posting: it angers me when you market this to high schoolers and tell them they can get into Stanford and MIT. A 16-year-old looks at this and thinks, "If I pay $3,325, I can get a NeurIPS publication." The program then lets them publish a paper with clear errors. This is academic dishonesty, but I don't think the kids even know they are committing it. Kevin Zhu puts his name on every single paper published, cites himself in these papers, and charges students $3,325. I wasn't fully aware of how much lighter the workshop review process is, and I really want to hear why this is. submitted by /u/Marisu_BG [link] [comments]
- Anyone from India attending EEML? [D] by /u/Suhan_XD (Machine Learning) on May 16, 2026 at 4:08 pm
I got accepted into EEML and I’m a little confused about travel and stay. Has anyone else from India been accepted? Let’s connect! submitted by /u/Suhan_XD [link] [comments]
- Made and Published a Paper Comparing Analysis of CNN and Vision Transformer Architectures for Brain Tumor Detection [R] by /u/Mental-Climate5798 (Machine Learning) on May 16, 2026 at 3:47 pm
Hi everyone 😄 A while ago I worked on a project where I compared computer vision architectures on detecting and classifying brain tumors in brain MRI scans. I was looking for some feedback on the methodology and really anything else--just simple research stuff. This isn't meant to be some big paper but a small research project that I did as a high schooler. Here is the paper: zenodo.org/records/15973756 I appreciate any feedback! submitted by /u/Mental-Climate5798 [link] [comments]
- Do you agree with Judea that learning from data is not everything? [D] by /u/xTouny (Machine Learning) on May 16, 2026 at 2:46 pm
Link: Judea Pearl, 2011 ACM Turing Award Recipient (2:18:05) Quote: "There is a limitation to that which not everybody understands. I already mentioned a limitation: you have a hierarchy here, going from correlation to causation, and from causation to explanation or to imagination. It's hard for people, especially in machine learning, to grasp that wall, the limitation of one layer, where one layer ends and the other one begins. Why? Because of two things. The machine learning school of thought has two paradigms that everybody loves. Number one, tabula rasa: I don't want to get any opinion, I don't want to get any preconceived knowledge, I want to derive everything by myself, let the computer learn it. And you find the word 'learning' overused ... The other handcuff is: let's do it the way that the brain does it. So if it looks like neurons interacting, it's good. If it looks like knowledge coming from a rule system, it's bad, because it's man-made ... Now there's a limitation to that. We can prove today that you cannot do certain things by looking at data and data only. It's not a matter of opinion. It's a matter of mathematical proof. You can look at people who take aspirin all day and at whether or not they have a headache all day, and you cannot prove that the aspirin is what causes the headache." In particular, Judea states: "It's not a matter of opinion. It's a matter of mathematical proof". So we have formal proof that there are fundamental limits to learning from data. Later in the interview, Judea states that we have solutions to problems faced by the machine learning community; nonetheless, they are not adopted because of hype. Discussion: Do you agree with Judea? submitted by /u/xTouny [link] [comments]
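The aspirin example in the quote can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions, not Pearl's proof: the two models, the function names, and the probabilities (1/2, 4/5, 1/5) are all made up for this example. It builds two hypothetical causal stories, one where aspirin influences headache and one where headache influences aspirin, and checks that they produce exactly the same observational data while disagreeing about what happens if aspirin is forced on everyone.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
HIGH, LOW = Fraction(4, 5), Fraction(1, 5)  # made-up illustrative probabilities


def bernoulli(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1 - p_one


def joint_model_a():
    """Hypothetical model A: aspirin -> headache. Keys are (aspirin, headache)."""
    return {
        (a, h): bernoulli(HALF, a) * bernoulli(HIGH if a else LOW, h)
        for a, h in product((0, 1), repeat=2)
    }


def joint_model_b():
    """Hypothetical model B: headache -> aspirin. Keys are (aspirin, headache)."""
    return {
        (a, h): bernoulli(HALF, h) * bernoulli(HIGH if h else LOW, a)
        for a, h in product((0, 1), repeat=2)
    }


# P(headache = 1 | do(aspirin = 1)) under each model.
# In A, forcing aspirin shifts headache; in B, headache does not depend on aspirin.
p_headache_do_aspirin_a = HIGH
p_headache_do_aspirin_b = HALF

same_observations = joint_model_a() == joint_model_b()
same_intervention = p_headache_do_aspirin_a == p_headache_do_aspirin_b
print(same_observations, same_intervention)
```

Since both models assign identical probabilities to every (aspirin, headache) pair, no amount of purely observational data generated by either one could distinguish them, yet they give different answers to the interventional question. Exact fractions are used instead of floats so the equality check is not affected by rounding.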
Top 100 Data Science and Data Analytics and Data Engineering Interview Questions and Answers
What are some good datasets for Data Science and Machine Learning?

