What are some ways to increase precision or recall in machine learning?

What are some ways to increase precision or recall in machine learning?
DjamgaMind - AI Unraveled Podcast

DjamgaMind: Audio Intelligence for the C-Suite (Daily AI News, Energy, Healthcare, Finance)

Full-Stack AI Intelligence. Zero Noise.The definitive audio briefing for the C-Suite and AI Architects. From Daily News and Strategic Deep Dives to high-density Industrial & Regulatory Intelligence—decoded at the speed of the AI era. . 👉 Start your specialized audio briefing today at Djamgamind.com


AI Jobs and Career

I wanted to share an exciting opportunity for those of you looking to advance your careers in the AI space. You know how rapidly the landscape is evolving, and finding the right fit can be a challenge. That's why I'm excited about Mercor – they're a platform specifically designed to connect top-tier AI talent with leading companies. Whether you're a data scientist, machine learning engineer, or something else entirely, Mercor can help you find your next big role. If you're ready to take the next step in your AI career, check them out through my referral link: https://work.mercor.com/?referralCode=82d5f4e3-e1a3-4064-963f-c197bb2c8db1. It's a fantastic resource, and I encourage you to explore the opportunities they have available.

Job TitleStatusPay
Full-Stack Engineer Strong match, Full-time $150K - $220K / year
Developer Experience and Productivity Engineer Pre-qualified, Full-time $160K - $300K / year
Software Engineer - Tooling & AI Workflows (Contract) Contract $90 / hour
DevOps Engineer (India) Full-time $20K - $50K / year
Senior Full-Stack Engineer Full-time $2.8K - $4K / week
Enterprise IT & Cloud Domain Expert - India Contract $20 - $30 / hour
Senior Software Engineer Contract $100 - $200 / hour
Senior Software Engineer Pre-qualified, Full-time $150K - $300K / year
Senior Full-Stack Engineer: Latin America Full-time $1.6K - $2.1K / week
Software Engineering Expert Contract $50 - $150 / hour
Generalist Video Annotators Contract $45 / hour
Generalist Writing Expert Contract $45 / hour
Editors, Fact Checkers, & Data Quality Reviewers Contract $50 - $60 / hour
Multilingual Expert Contract $54 / hour
Mathematics Expert (PhD) Contract $60 - $80 / hour
Software Engineer - India Contract $20 - $45 / hour
Physics Expert (PhD) Contract $60 - $80 / hour
Finance Expert Contract $150 / hour
Designers Contract $50 - $70 / hour
Chemistry Expert (PhD) Contract $60 - $80 / hour

What are some ways to increase precision or recall in machine learning?

What are some ways to Boost Precision and Recall in Machine Learning?

Sensitivity vs Specificity?


In machine learning, recall is the ability of the model to find all relevant instances in the data while precision is the ability of the model to correctly identify only the relevant instances. A high recall means that most relevant results are returned while a high precision means that most of the returned results are relevant. Ideally, you want a model with both high recall and high precision but often there is a trade-off between the two. In this blog post, we will explore some ways to increase recall or precision in machine learning.

What are some ways to increase precision or recall in machine learning?
What are some ways to increase precision or recall in machine learning?


There are two main ways to increase recall:

by increasing the number of false positives or by decreasing the number of false negatives. To increase the number of false positives, you can lower your threshold for what constitutes a positive prediction. For example, if you are trying to predict whether or not an email is spam, you might lower the threshold for what constitutes spam so that more emails are classified as spam. This will result in more false positives (emails that are not actually spam being classified as spam) but will also increase recall (more actual spam emails being classified as spam).

2023 AWS Certified Machine Learning Specialty (MLS-C01) Practice Exams
2023 AWS Certified Machine Learning Specialty (MLS-C01) Practice Exams

To decrease the number of false negatives,

you can increase your threshold for what constitutes a positive prediction. For example, going back to the spam email prediction example, you might raise the threshold for what constitutes spam so that fewer emails are classified as spam. This will result in fewer false negatives (actual spam emails not being classified as spam) but will also decrease recall (fewer actual spam emails being classified as spam).

What are some ways to increase precision or recall in machine learning?

There are two main ways to increase precision:

by increasing the number of true positives or by decreasing the number of true negatives. To increase the number of true positives, you can raise your threshold for what constitutes a positive prediction. For example, using the spam email prediction example again, you might raise the threshold for what constitutes spam so that fewer emails are classified as spam. This will result in more true positives (emails that are actually spam being classified as spam) but will also decrease precision (more non-spam emails being classified as spam).

AI-Powered Professional Certification Quiz Platform
Crack Your Next Exam with Djamgatech AI Cert Master

Web|iOs|Android|Windows

Are you passionate about AI and looking for your next career challenge? In the fast-evolving world of artificial intelligence, connecting with the right opportunities can make all the difference. We're excited to recommend Mercor, a premier platform dedicated to bridging the gap between exceptional AI professionals and innovative companies.

Whether you're seeking roles in machine learning, data science, or other cutting-edge AI fields, Mercor offers a streamlined path to your ideal position. Explore the possibilities and accelerate your AI career by visiting Mercor through our exclusive referral link:

Find Your AI Dream Job on Mercor

Your next big opportunity in AI could be just a click away!

To decrease the number of true negatives,

you can lower your threshold for what constitutes a positive prediction. For example, going back to the spam email prediction example once more, you might lower the threshold for what constitutes spam so that more emails are classified as spam. This will result in fewer true negatives (emails that are not actually spam not being classified as spam) but will also decrease precision (more non-spam emails being classified as spam).

What are some ways to increase precision or recall in machine learning?

To summarize,

there are a few ways to increase precision or recall in machine learning. One way is to use a different evaluation metric. For example, if you are trying to maximize precision, you can use the F1 score, which is a combination of precision and recall. Another way to increase precision or recall is to adjust the threshold for classification. This can be done by changing the decision boundary or by using a different algorithm altogether.

What are some ways to increase precision or recall in machine learning?

Sensitivity vs Specificity

In machine learning, sensitivity and specificity are two measures of the performance of a model. Sensitivity is the proportion of true positives that are correctly predicted by the model, while specificity is the proportion of true negatives that are correctly predicted by the model.

Google Colab For Machine Learning

State of the Google Colab for ML (October 2022)

Google introduced computing units, which you can purchase just like any other cloud computing unit you can from AWS or Azure etc. With Pro you get 100, and with Pro+ you get 500 computing units. GPU, TPU and option of High-RAM effects how much computing unit you use hourly. If you don’t have any computing units, you can’t use “Premium” tier gpus (A100, V100) and even P100 is non-viable.

AI Jobs and Career

And before we wrap up today's AI news, I wanted to share an exciting opportunity for those of you looking to advance your careers in the AI space. You know how rapidly the landscape is evolving, and finding the right fit can be a challenge. That's why I'm excited about Mercor – they're a platform specifically designed to connect top-tier AI talent with leading companies. Whether you're a data scientist, machine learning engineer, or something else entirely, Mercor can help you find your next big role. If you're ready to take the next step in your AI career, check them out through my referral link: https://work.mercor.com/?referralCode=82d5f4e3-e1a3-4064-963f-c197bb2c8db1. It's a fantastic resource, and I encourage you to explore the opportunities they have available.

Google Colab Pro+ comes with Premium tier GPU option, meanwhile in Pro if you have computing units you can randomly connect to P100 or T4. After you use all of your computing units, you can buy more or you can use T4 GPU for the half or most of the time (there can be a lot of times in the day that you can’t even use a T4 or any kinds of GPU). In free tier, offered gpus are most of the time K80 and P4, which performs similar to a 750ti (entry level gpu from 2014) with more VRAM.

For your consideration, T4 uses around 2, and A100 uses around 15 computing units hourly.
Based on the current knowledge, computing units costs for GPUs tend to fluctuate based on some unknown factor.

Considering those:

  1. For hobbyists and (under)graduate school duties, it will be better to use your own gpu if you have something with more than 4 gigs of VRAM and better than 750ti, or atleast purchase google pro to reach T4 even if you have no computing units remaining.
  2. For small research companies, and non-trivial research at universities, and probably for most of the people Colab now probably is not a good option.
  3. Colab Pro+ can be considered if you want Pro but you don’t sit in front of your computer, since it disconnects after 90 minutes of inactivity in your computer. But this can be overcomed with some scripts to some extend. So for most of the time Colab Pro+ is not a good option.

If you have anything more to say, please let me know so I can edit this post with them. Thanks!

Conclusion:


In machine learning, precision and recall trade off against each other; increasing one often decreases the other. There is no single silver bullet solution for increasing either precision or recall; it depends on your specific use case which one is more important and which methods will work best for boosting whichever metric you choose. In this blog post, we explored some methods for increasing either precision or recall; hopefully this gives you a starting point for improving your own models!

 

What are some ways we can use machine learning and artificial intelligence for algorithmic trading in the stock market?

Machine Learning and Data Science Breaking News 2022 – 2023

  • Using FC26 to simulate the world cup ? [D]
    by /u/Stillane (Machine Learning) on June 6, 2026 at 8:34 am

    maybe this should be asked in the Fc26 game subreddit but not sure. Anyway I just saw a video of someone predicting the winner of the world cup using the simulate match feature in the game but he only did it once. Would running this feature 100-1000 times give a significant result ? or is that feature only based on luck ? submitted by /u/Stillane [link] [comments]

  • Building a Custom Drones MuJoCo Environment [P]
    by /u/MT1699 (Machine Learning) on June 6, 2026 at 3:24 am

    Hi all, Lately I have been working on creating a package for Multi Agent RL based drone environments with different objectives, all bundled into a single GitHub repository: https://github.com/tau-intelligence/MuJoCo-drones-gym I am currently trying to organize things for RL community people, with a couple more tools coming soon. But right now, I want to make it useful for the community and hence would love some feedback from different people, about how I could improve it, incorporate more things into it or fix some broken implementation. Also everyone is welcome to raise issues on the repo. Thank you for the support. PS: I have some research publications at RL and ML venues regarding work on RL, though I still want to consider myself as a student of the field and hence would love your help here. submitted by /u/MT1699 [link] [comments]

  • TinyTPU: SystemVerilog systolic array compiled to WASM, running live in browser - RTL golden-verified against numpy [P]
    by /u/Horror-Flamingo-2150 (Machine Learning) on June 5, 2026 at 8:05 pm

    Most explanations of TPUs and systolic arrays are either hand-wavy diagrams or papers. I wanted to see the thing actually run, so I built it. TinyTPU is a 4×4 weight-stationary systolic array in real SystemVerilog, compiled to WebAssembly, with a step-by-step browser visualization. You enter two matrices, hit run, and watch the actual hardware execute: weights loading into PEs, matrix A streaming in diagonally (the "skew" that makes systolic arrays work), partial sums accumulating down the grid, results draining from the bottom. It has three levels: L1 - isolate a single MAC cell, watch one multiply-accumulate happen L2 - the full 4×4 array executing a real matmul L3 - tiling: what happens when your matrix is bigger than the hardware Nothing on screen is faked. The visualization reads state directly from compiled RTL. If you're trying to understand how matrix multiply maps to hardware why TPUs are efficient, what "weight-stationary" actually means, why the diagonal stagger exists this might click it for you in a way papers don't. Repo: tiny-tpu Live demo: Live If this project interests you please do star the repo, if you find something needs improving open a PR, I hope ya'll check this out and give me some feedback 🙏 submitted by /u/Horror-Flamingo-2150 [link] [comments]

  • What are the downsides of asking for an inflation adjustment in the salary?
    by /u/Fig_Towel_379 (Data Science) on June 5, 2026 at 4:22 pm

    On average, I have received a 0.75% salary hike over the last 5 years, which I know is pretty unreasonable. I have been looking for a new job, but given the current market, I cannot say for certain when I will find a new role. In the meantime, I was thinking of asking my manager for an inflation based adjustment to my base salary. I am not sure how much they will offer, if anything at all, but it still seems better than nothing. My performance has also been strong, though asking for a performance-based hike feels riskier and like it could backfire. What would you suggest? submitted by /u/Fig_Towel_379 [link] [comments]

  • ICML non-archival workshop - worth attending? [D]
    by /u/YOYOBOYOO (Machine Learning) on June 5, 2026 at 3:47 pm

    I have a paper accepted at a non-archival ICML workshop this year, and I am trying to decide whether it is worth registering and attending. By coincidence, I will already be in Seoul around that time, but I would have to pay the workshop registration fee (~$400) out of my own pocket. I would only be registering for the workshop day since I have other commitments during the rest of the conference. I am thinking of applying to PhD programs this fall (I applied this year too, but didn't get in), and the workshop speakers and panellists look genuinely great. Not sure what the real benefits are here or whether I should go for it. For context, I am also attending ACL 2026 this year, but that trip is fortunately sponsored, so this would be a separate personal expense. I would also appreciate guidance on how non-archival workshops work in general. Since the paper is non-archival and not formally published (at least to my understanding), is registration still expected or required for accepted papers? Do authors typically attend and present in person, or is it common to skip attendance and conference registration? Has anyone been in a similar situation? I want to understand the benefits of this. Any advice would be greatly appreciated because I honestly have no idea how to evaluate this. submitted by /u/YOYOBOYOO [link] [comments]

  • How do you identify researchers who are good? [D]
    by /u/roguejedi1 (Machine Learning) on June 5, 2026 at 2:04 pm

    About 10 years ago, I got into the basics of ML (like regression, KNN's, LVQ's) and read a few papers before taking a break a few years back. It feels like now, there's a lot of researchers in AI. How do you identify the ones who are actually solid vs those who (forgive my phrasing) are more researchers for appearance/status (i.e don't actually know what they're talking about)? Is the core filter h-index or where they work? How would you identify them? submitted by /u/roguejedi1 [link] [comments]

  • Would you say capture-time semantic annotation for robot trajectories is a solved problem? [R]
    by /u/Several-Many9101 (Machine Learning) on June 5, 2026 at 8:42 am

    It seems raw teleoperation data (RGB + joint states) structurally lacks affordance, contact intent, and embodiment-specific kinematic context. (information that can't be reliably recovered post-hoc once the demonstration is recorded) Most current approaches either filter/clean after collection, or rely on simulation to compensate. But neither seems to close the semantic gap for contact-rich tasks in unstructured environments. Is anyone working on supervision at acquisition time, enriching the stream as it's captured rather than labeling after the fact? And if not, is this a real bottleneck or am I overestimating the problem? submitted by /u/Several-Many9101 [link] [comments]

  • What is the most common reason data science projects fail to deliver business value?
    by /u/Effective_Ocelot_445 (Data Science) on June 5, 2026 at 5:57 am

    Iam curious whether the biggest challenges are related to data quality, stakeholder alignment, model adoption, business understanding, or something else entirely. submitted by /u/Effective_Ocelot_445 [link] [comments]

  • Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? [d]
    by /u/ororo88 (Machine Learning) on June 5, 2026 at 5:52 am

    Hello everyone, Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? I am working on a project idea related to library-specific code generation. The concrete case is a specific Python library used in a technical/scientific domain. The goal would be to improve and evaluate how well code-generation models can use this library correctly. I am trying to understand the legal / Terms of Service boundary around using OpenAI API outputs in two different scenarios: Scenario 1: Silver dataset for fine-tuning an OSS model Use the OpenAI API to generate programming tasks, reference solutions, and verification tests for the specific Python library. Then human-review, filter, and validate the generated examples. Then use this silver dataset to fine-tune an open-source code model, with the goal of improving its performance on this specific library. My question: would this violate OpenAI’s terms because the API outputs are being used to train/fine-tune another coding model, even if the scope is narrow and library-specific? Scenario 2: Benchmark only, not training Use the OpenAI API to generate programming tasks, reference solutions, and verification tests. Human-review and validate them. Then use the resulting dataset only as an evaluation benchmark to compare different models. The benchmark would not be used to fine-tune or train any model. My question: is this generally considered allowed under OpenAI’s terms, assuming the benchmark is properly reviewed and documented as AI-assisted? I understand that Reddit is not legal advice, and I would still contact OpenAI or legal counsel for a definitive answer. However, I thought new ideas could come up from people who have already faced similar situations in practice. submitted by /u/ororo88 [link] [comments]

  • Potential grad job lined up - how best to prepare?
    by /u/Tackit286 (Data Science) on June 5, 2026 at 12:49 am

    I’m have a potential grad position lined up starting in July. It’s starting out in more of a BI Analyst/Report Development type of role before working under a Data Scientist to get into more of the ML side of things. I’m fine with this as I’m undertaking a career change anyway, so I was always open to starting at the bottom. This would be my first job of any kind in the field and I want to make a good impression and show that I have what it takes. While I’m incredibly fortunate to have a potential job in such a tough market, I feel woefully underprepared for it given that I don’t really have much in the way of demonstrable project work outside my university studies and a few online certs. I will be continuing with some study and start doing some project work if and when I have time. Any advice for what I could do between now and then so that I can feel a little better prepared? submitted by /u/Tackit286 [link] [comments]

  • [R] Measuring the Symmetry--Data Exchange Rate
    by /u/AhmedMostafa16 (Machine Learning) on June 4, 2026 at 10:43 pm

    The prediction that equivariance reduces sample complexity by a factor of |G| appears in roughly every paper on geometric deep learning and is measured as an actual scaling law in roughly none of them. This paper does the measurement. The methodology is the interesting part. Naive estimators conflate group order with task difficulty (larger groups induce harder symmetry structure, not just more constraint), so the authors derive a relative exchange rate that cancels the shared difficulty out, meaning roughly how much less data the equivariant model needs compared to a vanilla baseline as a function of n, on a controlled C_n-symmetric task where n is a free knob. They also pre-specify a failure taxonomy: explicit conditions that would count as evidence against the hypothesis before seeing results. The headline number is beta_diff ~ 1.28, consistent with the theoretical 1.0. But the more durable finding is the wrong-group control: a model built with the wrong cyclic symmetry, same orbit size and same compute budget, is actively worse than no constraint. Not noise. The joint pairwise CI [+0.79, +3.26] excludes zero robustly across every estimator they run. Misalignment isn't just unhelpful; it is harmful. There is also a clean mathematical result slipped into Sec. 4.3: augmentation + test-time orbit averaging is exactly equivariant for output-pooling architectures, provably and verified to bit-identical training curves. The architecture-vs-augmentation gap collapses to whether you apply the orbit average at test time, not to anything structural. This seems underappreciated. The paper is unusually transparent about what it didn't nail: the relative-rate estimator was adopted post-hoc, the two-level bootstrap CI (seeds x group sizes) includes zero, and a finer-N replication on a sqrt(2)-spaced grid is inconclusive. They rank their findings explicitly by robustness. The wrong-group result is the one they would stake a claim on. The exchange rate is directionally probable. submitted by /u/AhmedMostafa16 [link] [comments]

  • How do ML researchers actually use AI tools to improve their writing? [D]
    by /u/Hope999991 (Machine Learning) on June 4, 2026 at 5:02 pm

    As an ML researcher, how do you use AI tools in your daily work? Do you mostly use them to clean up grammar and wording, or also to rewrite, structure, or draft technical text? submitted by /u/Hope999991 [link] [comments]

  • We built a source-available LLM reliability library (free for research / personal / internal eval) that can cut inference cost by half at matched quality, and you adopt it by changing one import [P] [R]
    by /u/Intellerce (Machine Learning) on June 4, 2026 at 4:51 pm

    TL;DR: Reliability techniques (methods that boost an LLM's correctness by spending extra inference, e.g., retries with feedback, ensembling, generator/critic refinement, verification passes, difficulty-aware routing) are scattered across the literature, each in its own paper-specific codebase. We unified 28 reliability techniques (21 communication-theoretic methods across 6 families plus 7 prior-method baselines: Self-Consistency, Self-Refine, CoVe, BoN, Weighted BoN, CISC, MoA), each measured against an uncoded single-pass baseline, under a single API, with 3 adaptive routers (SemKNN + two local ACM routers) sitting on top, then showed that routing the technique adaptively per prompt lets you slide along a quality/cost frontier. In our paper benchmark with one specific lineup, Nemotron + Devstral as the two generators and GLM-5.1 as the judge, the adaptive router delivered ~56% cost reduction at matched quality, or ~7% quality bump at matched cost, vs the best fixed method we compared against at that same lineup. One knob (λ) does the sliding. The qualitative pattern (adaptive beats fixed) should generalize, but absolute numbers are lineup-specific, and we haven't run the full sweep across other model combinations yet. Adoption is change one import: python - from openai import OpenAI + from agentcodec.openai import OpenAI Pass reliability="harq_ir" (or any of the 28 techniques) and existing client.chat.completions.create(...) calls keep their native OpenAI response shape. Same drop-in shims for Anthropic and Ollama. GitHub: https://github.com/intellerce/agentcodec Working paper: https://arxiv.org/abs/2605.09121 After spending a while researching reliability methods from papers, we kept hitting the same wall: every paper ships its own one-off codebase with its own prompt format, its own scoring rubric, its own model wrapper. Benchmarking "should we use self-refine or best-of-N here?" turned into a week of plumbing per comparison. The communication-theory framing is what tied it together: an LLM is a stochastic channel Y = A(X) + N, and every reliability technique from the wireless world has a direct analog in agent-land: Wireless Agent-land ARQ / HARQ retry-with-feedback loops Diversity combining (MRC/SC/EGC) ensemble multiple models Turbo decoding iterative generator/critic mutual refinement Fountain codes rateless sampling, stop when the judge is confident FEC answer + structured parity passes (re-derivation, verification, alternative), decode by cross-check ACM (adaptive coding-modulation) route by difficulty We put all of them in one library: 28 reliability techniques (the 7 prior-method baselines are part of that 28, not on top of it), plus the uncoded single-pass baseline they're all measured against, plus 3 adaptive routers (SemKNN + two local ACM routers) that select a technique per prompt. Full breakdown in the README. The minimal version ```python from agentcodec import ReliabilityModule mod = ReliabilityModule.from_dict({ "models": [ # Spatial diversity: two different families = uncorrelated errors {"model": "qwen3:8b", "base_url": "http://localhost:11434/v1", "api_key": "ollama"}, {"model": "llama3.1:8b", "base_url": "http://localhost:11434/v1", "api_key": "ollama"}, ], "judge": {"model": "gemma3:12b", "base_url": "http://localhost:11434/v1", "api_key": "ollama"}, "critic": {"same": True}, "strategy": {"type": "fixed", "technique": "harq_ir", "params": {"max_rounds": 4}}, }) result = mod.run("Prove the sum of the first n odd integers is n2.", category="reasoning") print(result.text, result.cost_usd, result.cost_source, result.technique_used) ``` Swap "harq_ir" for "diversity_mrc", "turbo", "fountain", etc. Same API, same ReliabilityResult shape, same cost-source tier on every output. For production, flip strategy to routed and the library picks the technique per prompt (cheap baseline on easy prompts, diversity_mrc on hard ones). Three things worth calling out Beyond the technique catalog, three pieces of the implementation that took real work: 1. Native async streaming for all but 2 techniques (acm_soft, acm_learned), with role-tagged events. mod.astream() drives AsyncOpenAI / AsyncAnthropic / httpx.AsyncClient end-to-end (no worker-thread bridge) and emits TokenEvents tagged with a role: "answer", "thinking", "draft", "critique", "verification", "candidate", "synthesis". So when you stream a HARQ-IR run, you can render the round-by-round drafts and critiques live, not just the final answer: python async for ev in mod.astream("Explain QUIC vs TCP."): if isinstance(ev, TokenEvent): if ev.role == "answer": print(ev.text, end="", flush=True) elif ev.role == "draft": print(f"\n[draft] {ev.text}") elif ev.role == "critique": print(f"\n[CRITIC] {ev.text}") elif ev.role == "thinking": pass # captured to result.thinking_text elif isinstance(ev, FinalEvent): print(f"\ndone — {ev.result.technique_used}, " f"thinking_cost=${ev.result.thinking_cost_usd:.4f}") Parallel-branch techniques fan out concurrently via asyncio.gather. diversity_mrc with two models actually runs them in parallel, and you see per-branch ProgressEvents as each one completes. 2. Thinking-text capture across all backends. Anthropic ThinkingBlock, OpenAI reasoning_content (+ exact reasoning_tokens from usage.completion_tokens_details), Ollama msg.thinking, and inline <think>...</think> tag stripping (DeepSeek-R1, Qwen3, GLM-4.5+, Nemotron) all populate result.thinking_text and split result.cost_usd into thinking_cost_usd + answer_cost_usd. So you can finally see what the o-series / Claude / DeepSeek is actually charging you for. 3. Drop-in compat shims with expose_reliability_stream=True. Default: the shim looks identical to the native SDK, delta.content for the answer, delta.reasoning_content for thinking. Drafts/critiques are hidden so existing code keeps working unchanged. Set the flag and the shim surfaces internal roles via sentinel fields (delta.agentcodec_role, delta.agentcodec_call_id) that existing consumers ignore harmlessly: ```python from agentcodec.openai import AsyncOpenAI client = AsyncOpenAI(api_key=KEY, reliability="harq_ir", expose_reliability_stream=True) Now drafts/critiques flow through the native OpenAI stream with sentinels. ``` Same flag and same semantics on agentcodec.anthropic.AsyncAnthropic and agentcodec.ollama.AsyncClient. Other useful bits Cost transparency built in: every result carries a cost_source tier marking how the price was obtained, from exact_user_rate (you supplied the rate) through openrouter_rate / exact_table_rate / inferred_table_rate down to default_fallback, plus token-estimation flags when only character counts were available. Live pricing fetched from OpenRouter, cached locally for 7 days. No more "I think this run cost $40, maybe?" Works against whatever you have: OpenAI, Anthropic (native SDK), Ollama (native + python lib + OpenAI-compat), vLLM, OpenRouter, LM Studio, Together. No Docker, no separate inference server, no LangChain. Strict config schema: typos in YAML / dict configs raise at load time, not on first .run(). 195 tests, 25 runnable examples under examples/: async streaming, thinking capture, drop-in compat for all three backends, plus a fully-annotated YAML config. Caveats The headline numbers are for a specific model lineup. The ~56% cost / ~7% quality figures come from a single benchmark run with Nemotron + Devstral as the two generators and GLM-5.1 as the judge. We expect the qualitative pattern (adaptive routing dominates fixed) to hold for other model combinations, since that's the whole point of the framework, but the absolute numbers will move with the lineup, and we haven't done the cross-lineup sweep yet. If you swap in different generators expect different absolute savings; the right comparison is your adaptive vs your best fixed baseline at your lineup. License is PolyForm Noncommercial 1.0.0: free for research, teaching, personal/internal eval. Commercial use needs a separate license. The trained SemKNN routing artifacts (learned router mapping prompt embeddings → best technique, the thing that delivers the headline cost number) are not redistributed; the client talks to a remote SemKNN service. All other routers (fixed, acm_table, acm_linear) run fully locally, though the last one needs you to train it. 2 techniques (acm_soft, acm_learned) still fall back to sync dispatch in an executor on the async streaming path. They produce correct FinalEvents but no mid-stream tokens. Roadmap. This is research code. Expect rough edges on the less-traveled paths (soft-output diversity variants, the learned ACM router). Feel free to ask about specific techniques, the routing approach, how to add a new one, or the streaming / thinking / compat work. Suggestions on what to ship next are welcome. submitted by /u/Intellerce [link] [comments]

  • How much do patents or publications actually matter in interviews?
    by /u/tinkerpal (Data Science) on June 4, 2026 at 4:27 pm

    I'm curious how much these things matter in practice during DS or MLE interview loops. I keep hearing mixed things. Did interviewers actually bring them up or did you have to steer the conversation yourself? Did it change the vibe of the interview, like more focus on your actual work instead of textbook ML questions and leetcode? Did it help with leveling or comp? Was there any difference between how big tech vs smaller companies treated them? Just trying to figure out how much weight these actually carry. submitted by /u/tinkerpal [link] [comments]

  • Faithful uncertainty in LLM agents: calibration vs utility tradeoff in practice[D]
    by /u/Ill_Awareness6706 (Machine Learning) on June 4, 2026 at 2:53 pm

    The Google paper on metacognition for hallucination reduction makes a distinction that is underappreciated in benchmarks. Calibration is not about being right more often. It is about matching confidence to correctness. A perfectly calibrated model can still be wrong twenty five percent of the time. It just does not pretend otherwise. In agent systems this distinction matters more than in chat. A conversational model giving a hedged answer is slightly annoying. An agent with tool access acting confidently on a wrong premise is dangerous. I have been trying this in a small verdent based coding setup by splitting the pipeline into a planning stage that produces a task graph, then running a verifier before any expensive tool gets invoked. The risk is the model trusts its own reasoning even when speculative. Grounding helps but it is not the same as calibration. One practical pattern: a planning stage produces a task graph, then a lightweight verifier checks whether the plan is consistent with available evidence. This catches about sixty percent of hallucinated tool calls in my setup before they execute. The downside is the utility tax. Extra verification adds latency. Dropping hallucination from twenty five to five percent costs about half the easy correct answers, mirroring the paper. My current compromise: let the planning layer flag low confidence tasks for human review, but auto execute high confidence ones. The reviewer only sees edge cases instead of drowning in every step. The awkward part is that most agent stacks still treat confidence as a log detail, not as a control surface. submitted by /u/Ill_Awareness6706 [link] [comments]

  • KVarN: Variance-Normalized KV-Cache Quantization [R]
    by /u/intentionallyBlue (Machine Learning) on June 4, 2026 at 1:21 pm

    Excited to share some of my own work here 🙂 KVarN is our new KV-Cache quantization method. In very brief, we combine Hadamard rotations with variance-normalization on both axes of the K and V matrices, then round to nearest. Simple, but works very well, especially for decode-heavy test-time-scaling settings (reasoning, code-gen, agentics). We get 3-4x compression at virtually no accuracy drop (mostly 0-1%) on tough benchmarks like AIME24 as well as a speed-up over fp16 baseline in vLLM (in contrast to other recent KV-Cache compression works). Behind it is an analysis of where quantization errors come from and have the biggest impact, especially in the error-accumulating decode setting: 1) fixing large errors is disproportionally useful (if you had a fixed MSE budget that you could ~fix, you should spend it on few big errors, rather than many small) 2) These big errors are mostly caused by bad token-scales (hence the normalization). Paper: https://arxiv.org/abs/2606.03458 vLLM implementation: https://github.com/huawei-csl/KVarN submitted by /u/intentionallyBlue [link] [comments]

  • On-policy distillation: one of the hottest terms on PapersWithCode [R]
    by /u/NielsRogge (Machine Learning) on June 4, 2026 at 12:40 pm

    Hi, Niels here from the open-source team at Hugging Face. At paperswithcode.co I am trying to make it easier for people to learn about the newest techniques used across AI papers. One of the hottest terms in AI research that I've recently added is On-policy distillation, also abbreviated as OPD. It's the key post-training behind models like Qwen 3.6 and 3.7, GLM-5.1, and DeepSeek-V4. https://preview.redd.it/yegq2gfag95h1.png?width=3046&format=png&auto=webp&s=f68fdf3ca075f3c4e56051fdd0ebcf97be9bcbc9 On PapersWithCode, you can find the original paper that introduced it, learn more about the method itself, as well as all papers that cite or mention it. Sasha Rush (who used to be a colleague of mine at Hugging Face, now at Cursor) recently made an excellent whiteboard explanation of OPD with Dwarkesh. I've linked this video lecture in the method description on PwC's website, so more people can find it. I'll copy the excellent short description of the method from Dwarkesh here: "The basic idea is this: if the model made a mistake at some point in the rollout (for example, calling a tool that doesn't exist), we want to discourage this specific error, but we don't want to just learn from the final reward, because it's a very noisy signal spread out over the whole trajectory. So we have another model to read this trajectory and figure out where the error was made. It simply inserts some hint tokens into the part of the trajectory immediately above where the mistake occurred. Now, with these injected hint tokens, run a forward pass through the model. You're not having to regenerate a new rollout - aka no new decode required. The hint causes the model to assign lower probabilities to the error tokens. You then train the original model to match these new probabilities, teaching it to downweight that specific mistake." Let me know which other methods I should add! Cheers submitted by /u/NielsRogge [link] [comments]

  • How Do You Handle Ablation Studies When the Original Model Is Already Trained?[R]
    by /u/Plane_Stick8394 (Machine Learning) on June 4, 2026 at 11:07 am

    I'm running into an issue with an ablation study for a paper I'm preparing. I trained a model. The model achieved my best result, and I saved the trained checkpoint (.pth file). Now my supervisor wants me to perform an ablation study by removing components and how it impacts the accuracy. My concern is that if I retrain from scratch, the accuracies will not exactly match the original run due to randomness, different seeds, etc. is there any way i can do the ablation study without retraining? I'd appreciate hearing how others have handled this situation in publications or thesis work. please help me out submitted by /u/Plane_Stick8394 [link] [comments]

  • Repo for implementations of various Transformer Attn mechanisms [P]
    by /u/AnyIce3007 (Machine Learning) on June 4, 2026 at 8:28 am

    Initially, I developed this so I can easily switch between different Attention mechanisms for my Small Language Model (SLM) experiments and benchmarking. However, I also realized that these implementations can be applicable in Computer Vision, modernize Vision Encoders, RL, and others. I hope this helps researchers, students, or educators in general. I also included MiniMax M3's sparse attention. This can be integrated with Andrej Karpathy's autoresearch framework. For contributing: I encourage you to please open a PR. I would like to see and learn implementations of other attention mechanisms I haven't covered in this repo. Thank you! GitHub Link: https://github.com/egmaminta/attnhut submitted by /u/AnyIce3007 [link] [comments]

  • Best Visual Reasoning Model in 2026 (Including APIs) [D]
    by /u/Alternative_Art2984 (Machine Learning) on June 4, 2026 at 3:52 am

    For example, suppose I have a one-hour video and I provide it to ChatGPT or another AI model. If I ask complex reasoning questions about the video, which models are best suited for long-horizon video understanding and reasoning? Which models can produce the most reliable answers in this scenario? submitted by /u/Alternative_Art2984 [link] [comments]

Top 100 Data Science and Data Analytics and Data Engineering Interview Questions and Answers


AI Unraveled: Demystifying Frequently Asked Questions on Artificial Intelligence (OpenAI, ChatGPT, Google Gemini, Generative AI, Discriminative AI, xAI, LLMs, GPUs, Machine Learning, NLP, Promp Engineering)

What are some good datasets for Data Science and Machine Learning?

What is Google Workspace?
Google Workspace is a cloud-based productivity suite that helps teams communicate, collaborate and get things done from anywhere and on any device. It's simple to set up, use and manage, so your business can focus on what really matters.

Watch a video or find out more here.

Here are some highlights:
Business email for your domain
Look professional and communicate as you@yourcompany.com. Gmail's simple features help you build your brand while getting more done.

Access from any location or device
Check emails, share files, edit documents, hold video meetings and more, whether you're at work, at home or on the move. You can pick up where you left off from a computer, tablet or phone.

Enterprise-level management tools
Robust admin settings give you total command over users, devices, security and more.

Sign up using my link https://referworkspace.app.goo.gl/Q371 and get a 14-day trial, and message me to get an exclusive discount when you try Google Workspace for your business.

Google Workspace Business Standard Promotion code for the Americas 63F733CLLY7R7MM 63F7D7CPD9XXUVT 63FLKQHWV3AEEE6 63JGLWWK36CP7WM
Email me for more promo codes

Active Hydrating Toner, Anti-Aging Replenishing Advanced Face Moisturizer, with Vitamins A, C, E & Natural Botanicals to Promote Skin Balance & Collagen Production, 6.7 Fl Oz

Age Defying 0.3% Retinol Serum, Anti-Aging Dark Spot Remover for Face, Fine Lines & Wrinkle Pore Minimizer, with Vitamin E & Natural Botanicals

Firming Moisturizer, Advanced Hydrating Facial Replenishing Cream, with Hyaluronic Acid, Resveratrol & Natural Botanicals to Restore Skin's Strength, Radiance, and Resilience, 1.75 Oz

Skin Stem Cell Serum

Smartphone 101 - Pick a smartphone for me - android or iOS - Apple iPhone or Samsung Galaxy or Huawei or Xaomi or Google Pixel

Can AI Really Predict Lottery Results? We Asked an Expert.

Ace the 2025 AWS Solutions Architect Associate SAA-C03 Exam with Confidence Pass the 2025 AWS Certified Machine Learning Specialty MLS-C01 Exam with Flying Colors

List of Freely available programming books - What is the single most influential book every Programmers should read



#BlackOwned #BlackEntrepreneurs #BlackBuniness #AWSCertified #AWSCloudPractitioner #AWSCertification #AWSCLFC02 #CloudComputing #AWSStudyGuide #AWSTraining #AWSCareer #AWSExamPrep #AWSCommunity #AWSEducation #AWSBasics #AWSCertified #AWSMachineLearning #AWSCertification #AWSSpecialty #MachineLearning #AWSStudyGuide #CloudComputing #DataScience #AWSCertified #AWSSolutionsArchitect #AWSArchitectAssociate #AWSCertification #AWSStudyGuide #CloudComputing #AWSArchitecture #AWSTraining #AWSCareer #AWSExamPrep #AWSCommunity #AWSEducation #AzureFundamentals #AZ900 #MicrosoftAzure #ITCertification #CertificationPrep #StudyMaterials #TechLearning #MicrosoftCertified #AzureCertification #TechBooks

Top 1000 Canada Quiz and trivia: CANADA CITIZENSHIP TEST- HISTORY - GEOGRAPHY - GOVERNMENT- CULTURE - PEOPLE - LANGUAGES - TRAVEL - WILDLIFE - HOCKEY - TOURISM - SCENERIES - ARTS - DATA VISUALIZATION
zCanadian Quiz and Trivia, Canadian History, Citizenship Test, Geography, Wildlife, Secenries, Banff, Tourism

Top 1000 Africa Quiz and trivia: HISTORY - GEOGRAPHY - WILDLIFE - CULTURE - PEOPLE - LANGUAGES - TRAVEL - TOURISM - SCENERIES - ARTS - DATA VISUALIZATION
Africa Quiz, Africa Trivia, Quiz, African History, Geography, Wildlife, Culture

Exploring the Pros and Cons of Visiting All Provinces and Territories in Canada.
Exploring the Pros and Cons of Visiting All Provinces and Territories in Canada

Exploring the Advantages and Disadvantages of Visiting All 50 States in the USA
Exploring the Advantages and Disadvantages of Visiting All 50 States in the USA


Health Health, a science-based community to discuss human health

Today I Learned (TIL) You learn something new every day; what did you learn today? Submit interesting and specific facts about something that you just found out here.

Reddit Science This community is a place to share and discuss new scientific research. Read about the latest advances in astronomy, biology, medicine, physics, social science, and more. Find and submit new publications and popular science coverage of current research.

Reddit Sports Sports News and Highlights from the NFL, NBA, NHL, MLB, MLS, NCAA, F1, and other leagues around the world.

Turn your dream into reality with Google Workspace: It’s free for the first 14 days.
Get 20% off Google Google Workspace (Google Meet) Standard Plan with  the following codes:
Get 20% off Google Google Workspace (Google Meet) Standard Plan with  the following codes: 96DRHDRA9J7GTN6 96DRHDRA9J7GTN6
63F733CLLY7R7MM
63F7D7CPD9XXUVT
63FLKQHWV3AEEE6
63JGLWWK36CP7WM
63KKR9EULQRR7VE
63KNY4N7VHCUA9R
63LDXXFYU6VXDG9
63MGNRCKXURAYWC
63NGNDVVXJP4N99
63P4G3ELRPADKQU
With Google Workspace, Get custom email @yourcompany, Work from anywhere; Easily scale up or down
Google gives you the tools you need to run your business like a pro. Set up custom email, share files securely online, video chat from any device, and more.
Google Workspace provides a platform, a common ground, for all our internal teams and operations to collaboratively support our primary business goal, which is to deliver quality information to our readers quickly.
Get 20% off Google Workspace (Google Meet) Business Plan (AMERICAS): M9HNXHX3WC9H7YE
C37HCAQRVR7JTFK
C3AE76E7WATCTL9
C3C3RGUF9VW6LXE
C3D9LD4L736CALC
C3EQXV674DQ6PXP
C3G9M3JEHXM3XC7
C3GGR3H4TRHUD7L
C3LVUVC3LHKUEQK
C3PVGM4CHHPMWLE
C3QHQ763LWGTW4C
Even if you’re small, you want people to see you as a professional business. If you’re still growing, you need the building blocks to get you where you want to be. I’ve learned so much about business through Google Workspace—I can’t imagine working without it.
(Email us for more codes)