What Are the Best Machine Learning Algorithms for Imbalanced Datasets

Machine Learning Algorithms and Imbalanced Datasets




In machine learning, imbalanced datasets are those where one class heavily outnumbers the others. This can be due to the nature of the problem or simply because more data is available for one class than the others. Either way, imbalanced datasets can pose a challenge for machine learning algorithms. In this blog post, we’ll take a look at which machine learning algorithms are best suited for imbalanced datasets and why they tend to perform better than others.

For example, in a binary classification problem, if there are 100 observations and only 10 of them are positive (the remaining 90 are negative), then we say that the dataset is imbalanced. The ratio of positive to negative cases is 1:9.
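As a quick illustration of the 100-observation example, here is a minimal sketch that counts the classes and reports how skewed they are. The function name `imbalance_summary` and the label encoding (1 for positive, 0 for negative) are assumptions made for this example only.

```python
from collections import Counter


def imbalance_summary(labels):
    """Return class counts, the minority share, and the majority-to-minority ratio."""
    counts = Counter(labels)
    minority_count = min(counts.values())
    majority_count = max(counts.values())
    return {
        "counts": dict(counts),
        "minority_share": minority_count / len(labels),
        "majority_to_minority": majority_count / minority_count,
    }


# 10 positives (1) and 90 negatives (0), as in the example above.
labels = [1] * 10 + [0] * 90
summary = imbalance_summary(labels)
print(summary)
```

With these labels, the minority class makes up 10% of the data and there are 9 negatives for every positive.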


There are a few reasons why some machine learning algorithms tend to perform better on imbalanced datasets than others. First, certain algorithms can be configured to handle imbalance, for example by weighting errors on the minority class more heavily. Second, some algorithms are more robust to outliers and noisy points, which matters because a handful of unusual points can make up a large share of a small minority class. And third, some algorithms are better able to learn from a limited amount of data, which can be an issue when one class is heavily outnumbered by the others.

Some of the best machine learning algorithms for imbalanced datasets include:

Support Vector Machines (SVMs)
Decision Trees
Random Forests
Naive Bayes Classifiers
k-Nearest Neighbors (kNN)

Of these, SVMs tend to be a popular choice. They were not designed specifically for imbalanced data, but they can be adapted to it by assigning a higher misclassification penalty to the minority class. SVMs work by finding a hyperplane that maximizes the margin between the two classes. This helps to reduce overfitting and improve generalization. Decision trees and random forests are also popular choices as they tend to be less sensitive to outliers than linear models such as logistic regression, and they can also be trained with class weights. Naive Bayes classifiers are another good choice as they have few parameters and are able to learn from a limited amount of data. kNN can also work with a limited amount of data because it makes no assumptions about how the data is distributed. However, the majority class can dominate the neighbor vote, small values of k make it sensitive to noisy points, and it can be computationally intensive for large datasets.
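One common way to adapt a classifier such as an SVM or a random forest to imbalance is class weighting. The sketch below computes weights that are inversely proportional to class frequency, using the formula n_samples / (n_classes * class_count). This is the heuristic behind scikit-learn's `class_weight="balanced"` option; the helper name `balanced_class_weights` is an assumption made for this example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# 90 majority-class examples (0) and 10 minority-class examples (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, each minority example counts 9 times as much as a majority example, so both classes contribute the same total weight to the training loss. In scikit-learn, estimators such as `SVC` and `RandomForestClassifier` accept a `class_weight` argument for this purpose.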

Machine learning algorithms are commonly divided into two main types: supervised and unsupervised. When labels are available, supervised algorithms tend to perform better on imbalanced datasets than unsupervised algorithms. In the rest of this post, we will discuss why this is so and look at some examples.

Supervised Algorithms
Supervised algorithms are those where the target variable is known. In other words, we have training data where the correct answers are already given. The algorithm then learns from this data and is able to generalize to new data. Regression and classification are examples of supervised learning tasks.

Unsupervised Algorithms
Unsupervised algorithms are those where the target variable is not known. With unsupervised algorithms, we only have input data, without any corresponding output labels. The algorithm has to learn from the data itself without any guidance. Clustering and dimensionality reduction are examples of unsupervised learning tasks.

Why Supervised Algorithms Perform Better on Imbalanced Datasets
Supervised algorithms perform better on imbalanced datasets because the labels tell them which cases belong to the minority class, so they can learn from the training data which cases are more important. With unsupervised algorithms, all data points are treated equally, regardless of whether they are in the minority or majority class.

For example, in a binary classification problem with an imbalanced dataset, let’s say that we want to predict whether a customer will default on their loan payment or not. We have a training dataset of 1000 customers, out of which only 100 (10%) have defaulted on their loan in the past.

If we use a supervised algorithm like logistic regression, the algorithm will learn from the training data that defaulting on a loan is rare (since only 10% of cases in the training data are positive). This means that it will usually be correct when it predicts that a new customer will not default on their loan (since this is the majority class in the training data). Because the labels are available, we can also tell the algorithm that a missed default is the costlier mistake, for example by weighting the minority class more heavily or by lowering the decision threshold.
However, if we use an unsupervised algorithm like k-means clustering, all data points will be treated equally, since there is no target variable to guide the algorithm. This means that it might incorrectly cluster together customers who have defaulted on their loans with those who haven't.
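The loan example also shows why accuracy alone can be misleading on imbalanced data. The sketch below scores a trivial baseline that always predicts "no default" on 1000 customers, 100 of whom defaulted. The helper names `accuracy` and `recall` and the label encoding (1 for default, 0 for no default) are assumptions made for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    actual_positives = sum(1 for t in y_true if t == positive)
    if actual_positives == 0:
        return 0.0
    true_positives = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    return true_positives / actual_positives


# 100 defaulters (1) and 900 non-defaulters (0).
y_true = [1] * 100 + [0] * 900
# Baseline that always predicts the majority class.
y_pred = [0] * 1000

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)
print(baseline_accuracy, baseline_recall)
```

The baseline reaches 90% accuracy while catching none of the defaulters, which is why recall, precision, or similar minority-focused metrics are usually reported alongside accuracy for imbalanced problems.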

Conclusion:
In conclusion, supervised machine learning algorithms tend to perform better on imbalanced datasets than unsupervised machine learning algorithms because the labels let them learn from the training data which cases are more important.

Some machine learning algorithms tend to perform better on highly imbalanced datasets because they can be adjusted to deal with imbalance, for example through class weights, or because they can learn from a small number of minority-class examples. If you are working with a highly imbalanced dataset, then you should consider using one of the algorithms listed above.

Thanks for reading!

How are machine learning techniques being used to address unstructured data challenges?

Machine learning techniques are being used to address unstructured data challenges in a number of ways:

  1. Natural language processing (NLP): NLP algorithms can be used to extract meaningful information from unstructured text data, such as emails, documents, and social media posts. NLP algorithms can be trained to classify text data, identify key terms and concepts, and extract structured data from unstructured text.
  2. Image recognition: Machine learning algorithms can be used to analyze and classify images, enabling the automatic identification and classification of objects, people, and other elements in images. This can be useful for tasks such as image tagging and search, as well as for applications such as security and surveillance.
  3. Audio and speech recognition: Machine learning algorithms can be used to analyze and classify audio data, enabling the automatic transcription and translation of spoken language. This can be useful for tasks such as speech-to-text transcription, as well as for applications such as call center automation and language translation.
  4. Video analysis: Machine learning algorithms can be used to analyze and classify video data, enabling the automatic detection and classification of objects, people, and other elements in video. This can be useful for tasks such as video tagging and search, as well as for applications such as security and surveillance.

Overall, machine learning techniques are being used in a wide range of applications to extract meaningful information from unstructured data, and to enable the automatic classification and analysis of data in a variety of formats.

How are AI and machine learning impacting application development today?

Artificial intelligence (AI) and machine learning are having a significant impact on application development today in a number of ways:

  1. Enabling new capabilities: AI and machine learning algorithms can be used to enable applications to perform tasks that would be difficult or impossible for humans to do. For example, AI-powered applications can be used to analyze and classify large amounts of data, or to automate complex decision-making processes.
  2. Improving performance: AI and machine learning algorithms can be used to optimize the performance of applications, making them faster, more efficient, and more accurate. For example, machine learning algorithms can be used to improve the accuracy of predictive models, or to optimize the performance of search algorithms.
  3. Streamlining development: AI and machine learning algorithms can be used to automate various aspects of application development, such as testing, debugging, and deployment. This can help to streamline the development process and reduce the time and resources needed to build and maintain applications.
  4. Enhancing user experiences: AI and machine learning algorithms can be used to enhance the user experience of applications by providing personalized recommendations, or by enabling applications to anticipate and respond to the needs and preferences of users.

Overall, AI and machine learning are having a significant impact on application development today, and they are likely to continue to shape the way applications are built and used in the future.

How will advancements in artificial intelligence and machine learning shape the future of work and society?

Advancements in artificial intelligence (AI) and machine learning are likely to shape the future of work and society in a number of ways. Some potential impacts include:

  1. Automation: AI and machine learning algorithms can be used to automate tasks that are currently performed by humans, such as data entry, customer service, and manufacturing. This could lead to changes in the types of jobs that are available and the skills that are in demand, as well as to increased productivity and efficiency.
  2. Job displacement: While automation may create new job opportunities, it could also lead to job displacement, particularly for workers in industries that are more susceptible to automation. This could lead to social and economic challenges, including unemployment and income inequality.
  3. Increased efficiency: AI and machine learning algorithms can be used to optimize and streamline business processes, leading to increased efficiency and productivity. This could lead to economic growth and innovation, and could also help to reduce costs for businesses and consumers.
  4. Enhanced decision-making: AI and machine learning algorithms can be used to analyze large amounts of data and make more informed and accurate decisions. This could lead to improved outcomes in fields such as healthcare, finance, and education, and could also help to reduce bias and improve fairness.

Overall, the impact of AI and machine learning on the future of work and society is likely to be significant and complex, with both potential benefits and challenges. It will be important to consider and address these impacts as these technologies continue to advance and become more widely adopted.

  • How do you create memorable poster for top tier conferences ( ICML/ICLR/NEURips ect…) [D]
    by /u/DazzlingPin3965 (Machine Learning) on May 13, 2026 at 12:05 am

    Hello everyone, Presenting at a top-tier conference for the first time and having a very hard time coming up with an appropriate design for my poster. Everything I do seems basic and banal. My paper is more theory-oriented, and apart from putting math formulas in bold in the middle, I am not sure what the best way is to design the poster. Even the sizing choice is complicated as ICML gives 3 different recommendations to pick from, and somehow from my computer, I can’t see how the PowerPoint slide will look like printed on those dimensions. And Printing a poster is nearly $100 CAD, so there’s no room for trial and error. So If anyone has any tips on how to do it properly, I have been using PowerPoint, but perhaps I should go to Canvas? Or Does anyone have another software to recommend? submitted by /u/DazzlingPin3965 [link] [comments]

  • I created a minimal one-file implementations (160loc) of JEPA family (ijepa, vjepa, vjepa2, cjepa) for educational purposes [P]
    by /u/kwk236 (Machine Learning) on May 12, 2026 at 11:08 pm

    Hi all, I made my own minimal implementation of JEPA algorithms. Making things minimal and removing all the things needed for scaling the algorithm always helped me understanding. So I stripped everything but the algorithm parts. What's left is 160-200 lines of code that distills the essence of the mathematics. It is very easy to compare with the math in the paper and the code and how it can be implemented in PyTorch. I added [algo]_tutorial.md files to help with understanding. https://github.com/keon/jepa submitted by /u/kwk236 [link] [comments]

  • Steam Recommender using similarity! (Undergraduate Student Project) [P]
    by /u/Expensive-Ad8916 (Machine Learning) on May 12, 2026 at 5:30 pm

    (DISCLAIMER: I accidentally deleted the last post on this subreddit my apologies if this is your second time seeing it) Last year I made a post about my steam recommender The last one was great and served its purpose of showing many people new games, But this new version is much more functional! I love making recommendation systems that tell the user WHY they got the recommendation. During a steam sale event, I always find myself trying to look for new video games to play. If I wanted to find a new game I would try to whittle it down by using steam tags, but the steam tag system is very broad "action". could apply to many many games. That got me thinking, what aspects do I like about my favorite games? Well I like Persona 4 because of the city vibes and jazz fusion, Spore because of the unique character creation and whimsical theme. Balatro for its unique deck building synergies. What if I could capture unique tags that identify a game that aren't just "action" and put them into vectors to show the (focus) of a game For example I could break persona 4 into something like Game play Focus vector: Day cycle 20% Dungeon crawling 20% Social sim 20% Tags: Music: jazz fusion Vibe: Small rural town I find that this system makes searching for games more "fun" now I can see why I like balatro. I like it because of the card synergies not so much for its rogue-like nature. I also find that this helps find new underrated games, and beats the trap that Collaborative Filtering algorithms that get into where it "feels" like you get recommended the same things. find your next favorite game! : https://nextsteamgame.com/ pull a PR!: https://github.com/BakedSoups/NextSteamGame ( I actually made some git issues myself for problems I can't fix) if anyone has any criticism I would love to hear it! this is probably my favorite passion project. 
I made this during final season, Since the database takes around 1 day to build, there were some inevitable rate limiting errors that I go into. So I am sure there are many bugs. if you come across any and are willing to share that would be Amazing. Hope this website helps people find new games! Also I have a advance mode for people that don't mind messing with sliders and weird data terms. submitted by /u/Expensive-Ad8916 [link] [comments]

  • TabPFN-3 just released: a pre-trained tabular foundation model for up to 1M rows [R][N]
    by /u/rsesrsfh (Machine Learning) on May 12, 2026 at 2:33 pm

    TabPFN-3 was released today, the next iteration of the tabular foundation model, originally published in Nature. Quick recap for anyone new to TabPFN: TabPFN predicts on tabular data in a single forward pass - no training, no hyperparameter search, no tuning. Built on TabPFN-2.5 (Nov 2025) and TabPFNv2 (Nature, Jan 2025), which together crossed 3M downloads and 200+ published applications. What's new: Scale: 1M rows on a single H100 (10x larger than 2.5).A reduced KV cache (~8GB per million rows per estimator) and row-chunked inference make this practical on a single GPU Speed: 10x-1000x faster inference than previous versions. 120x on SHAP via KV caching Thinking Mode (API only): test-time compute pushes predictions further via one-time extra fitting at inference. Beats every non-TabPFN method on TabArena by over 200 Elo, including 4-hour-tuned AutoGluon 1.5 extreme. Gap more than doubles to 420 Elo on the larger-data slice. Accuracy: it has a 93% win rate over classical ML on TabArena Many-class: native non-parametric retrieval decoder supporting up to 160 classes Calibrated quantile regression: bar-distribution regression head produces calibrated quantile predictions in a single forward pass Lifts adjacent tasks: time-series, interpretability, and new SOTA on relational benchmarks. 3 deployment paths: API, enterprise licensing, and open-source weights (permissive for research and academic evaluation) You can try it here or read the model report here. Happy to answer questions in the comments. submitted by /u/rsesrsfh [link] [comments]

  • I Found a Hidden Ratio in Transformers That Predicts Geometric Stability [R]
    by /u/Otaku_7nfy (Machine Learning) on May 12, 2026 at 2:04 pm

    I have analyzed some decoder transformer models using Lyapunov spectral analysis and found that the ratio of the MLP and attention spectral norms strongly indicates whether a model will eventually collapse to rank-1 or not by the final layers. I found that the spectral ratio is best kept around 0.5–2 for keeping the model stable till the final layers. Paper/Github repo: https://github.com/yousef-rafat/the-1-1-rule submitted by /u/Otaku_7nfy [link] [comments]

  • ICML Visa issues [D]
    by /u/No_Cardiologist7609 (Machine Learning) on May 12, 2026 at 11:50 am

    Has anyone applying for a Korean visa for ICML been asked for the conference’s Business Registration Number? The ICML website explicitly states that it cannot provide the BRC so I wanted to ask how others handled this submitted by /u/No_Cardiologist7609 [link] [comments]

  • Cache-testing software for LLM-provider-style tiered ephemeral caches? [D]
    by /u/flatmax (Machine Learning) on May 12, 2026 at 11:07 am

    I'm looking for a cache simulator / benchmark suite suited to the kind of tiered ephemeral cache that LLM providers use — e.g. Anthropic's 4-tier prompt cache, where context sits across several tiers with different residency windows, costs, and eviction rules. I've already tried libCacheSim. It's a solid piece of software for classical caches (LRU, FIFO, ARC, SIEVE, S3-FIFO, W-TinyLFU, Belady oracle, plugin API, trace replay), and I got a plugin + synthetic trace working against it. But it seems fundamentally aimed at single, flat caches: One cache, not a hierarchy of tiers with different costs No notion of partial / multi-tier residency of the same object Misses are uniform-cost — no way to express "miss to L1 vs miss to L3 vs full recompute," which is the whole point in LLM prompt caching Trace model is atomic get/put, not edit streams where cached objects mutate in place No first-class support for token-weighted object sizes So it works as a baseline comparator, but it's not really the right shape for evaluating LLM-cache policies. Does anyone know of cache-testing software specifically targeting LLM-provider-style caches? Something that models multiple tiers with per-tier cost/residency, tokenised objects, and edit-driven workloads would be ideal. Academic code, research prototypes, internal tools that got open-sourced — all welcome. Even partial matches (e.g. KV-cache simulators for inference servers) would be useful pointers. submitted by /u/flatmax [link] [comments]

  • Interaction Models from Thinking Machines Lab [P]
    by /u/Agitated-Ad809 (Machine Learning) on May 12, 2026 at 8:45 am

    submitted by /u/Agitated-Ad809 [link] [comments]

  • Follow-up on the TranslateGemma subtitle benchmark: human review of segments rated "clean" by MetricX-24 and COMETKiwi [D]
    by /u/ritis88 (Machine Learning) on May 12, 2026 at 8:38 am

    A few weeks ago I shared the results of a benchmark here comparing 6 LLMs on subtitle translation, scored with two reference-free QE metrics - MetricX-24 (~13B mT5-XXL) and COMETKiwi (~10.7B XLM-R-XXL) - combined into a TQI index. Posting a follow-up because we did human review afterwards, and the result is worth discussing. The original benchmark put TranslateGemma-12b first in every language pair. The natural question: are those high scores accurate, or are the metrics insensitive in their high-confidence zone? These metrics correlate well with human judgment at the population level (that's what they're trained for), but population-level correlation doesn't tell you whether the segments they call "clean" are actually clean. So we ran the check directly. 21 English subtitle segments from one tutorial video. TranslateGemma's translations into 4 languages (ES, JA, TH, ZH-CN - Korean and Traditional Chinese got dropped). All 84 translations chosen because they passed the dashboard clean-rule (MX < 5 AND CK ≥ 0.70) in all 4 languages simultaneously. Then full MQM annotation by professional linguists - Major/Minor severity, with categories covering accuracy (mistranslation, omission, addition, untranslated), fluency (grammar, punctuation, inconsistency), style, terminology. Results under the dashboard threshold: Auto-flagged: 1/84 Human-flagged: 60/84 any-error, 13/84 Major-only Metric-blindness rate (auto-clean ∩ human-flagged / auto-clean): 59/83 = 71% any-error, 12/83 = 14.5% Major-only All 25 human-found Accuracy-class errors fell in the metric-blind quadrant. Zero overlap with the auto-flagged region (which contained one Style-category Major error). Japanese carries 10 of 15 total mistranslations across the dataset, all metric-blind, despite having the highest mean COMETKiwi (0.863) of the four languages. Caveat: small n, one model, one content set, so the numbers are directional rather than definitive. 
Original thread: [link] Full benchmark report: in comments. submitted by /u/ritis88 [link] [comments]

  • Online RL Reading Group[D]
    by /u/eramyu (Machine Learning) on May 11, 2026 at 11:51 pm

    Hi, I am a student going into my first year in Ph.D in RL this September. Although each university kinda has their own reading groups, I was wondering if there is active RL Online reading group I can participate. Sadly I couldnt find any info elsewhere. Does anyone have any information regarding Online RL Reading groups? Thank you! submitted by /u/eramyu [link] [comments]

  • How can I check whether my paper follows the required ARR formatting before submission? [D]
    by /u/Distinct_Relation129 (Machine Learning) on May 11, 2026 at 9:39 pm

    Last cycle, one of my research paper was rejected because of formatting issues. I recently heard from someone that there may be a tool or software called something like “aclpubcheck” that can be used to check whether a manuscript follows the required submission format correctly. Does anyone know the exact name of this software or tool? Also, if there is no such reliable tool, what is the best way to make sure that a paper is formatted correctly before submission? Like, how do you usually verify margins, page limits, font size, template compliance, bibliography format, and other formatting requirements before submitting to a conference or journal? submitted by /u/Distinct_Relation129 [link] [comments]

  • A hackable compiler to generate efficient fused GPU kernels for AI models [P]
    by /u/NoVibeCoding (Machine Learning) on May 11, 2026 at 8:48 pm

    The modern ML (LLM) compiler stack is brutal. TVM is 500K+ lines of C++. PyTorch piles Dynamo, Inductor, and Triton on top of each other. I built a hackable LLM compiler from scratch and am documenting the process. It takes a small model (TinyLlama, Qwen2.5-7B) and lowers it to a sequence of CUDA kernels through six IRs. Currently, on RTX 5090, the emitted FP32 kernels run at geomean 1.11× vs PyTorch eager and 1.20× vs torch.compile, with full-block parity on TinyLlama-128 and Qwen2.5-7B at seq=128. Wins on small reductions / SDPA / kv-projections (up to 4.7×); losses on dense matmul at seq=512. Part 1 took an RMSNorm layer end-to-end and walked the upper half of that pipeline in detail. This second part closes the gap and explains Tile IR, Kernel IR, and associated lowering rules in depth. Full article: A Principled ML Compiler Stack in 5,000 Lines of Python The article focuses on producing a GPU schedule for an operation written in loop-nest form (Loop IR). Example for RMSNorm: python v0 = reciprocal(2048) for a0 in 0..32: # free for a1 in 0..2048: # reduce in2 = load x[0, a0, a1] v1 = multiply(in2, in2) acc0 <- add(acc0, v1) v2 = multiply(acc0, v0) v3 = add(v2, 1e-06) v4 = rsqrt(v3) for a2 in 0..2048: # free in3 = load x[0, a0, a2] in4 = load p_weight[a2] v5 = multiply(in3, v4) v6 = multiply(v5, in4) merged_n0[0, a0, a2] = v6 The stack mimics a sequence of optimization steps a CUDA engineer would perform when optimizing kernels: stage inputs to smem, reduce bank conflicts, increase occupancy, and so on. 
diff LoopOp │ ▼ [001] tileify — lift outer free Loops to thread axes [002] chunk_matmul_k — chunk the K reduce into K-outer × K-inner (intra-CTA) [003] split_matmul_k — promote the K-outer chunk loop into a grid dimension [004] cooperative_reduce — let multiple threads share one reduce; tree-merge with Combine [005] blockify_launch — pick block extents; partition free axes into BLOCK and THREAD [006] chunk_reduce — chunk non-matmul reduces so their Loads fit in shared memory [007] stage_inputs — hoist hot input slabs into Stage nodes [008] register_tile — replicate the inner tile so each thread owns a register block [009] permute_register_tile — reorder the register strip so bank-conflicting loads land on far columns [010] double_buffer — promote K-outer Stages to BufferedStage (ping-pong) [011] tma_copy — narrow eligible BufferedStages to TmaBufferedStage (sm_90+) [012] split_inner_for_swizzle — split the inner cache axis of a TmaBufferedStage for swizzle [013] async_copy — narrow the rest to AsyncBufferedStage (cp.async, sm_80+) [014] pad_smem — pad shared-memory strides to break bank conflicts [015] pipeline_k_outer — rotate the K-outer loop into prologue/steady-state/epilogue (cp.async + TMA) [016] mark_unroll — annotate small inner loops for #pragma unroll │ ▼ TileOp (fully scheduled) Each stage can be reproduced with a CLI command. For example, the stage_inputs pass stages input buffers into smem if possible and if there is a benefit in doing that (inputs are being read multiple times within CTA). 
To see it, the following command can be used: bash deplodock compile \ -c "torch.nn.RMSNorm(2048)(torch.randn(1,32,2048))" \ --ir tile -vv \ | awk '/^>>> t:007/,/^<<< t:007/' ```diff t:007_stage_inputs @@ matched at rms_norm (in-place) @@ @@ -2,6 +2,7 @@ v0 = reciprocal(2048) Tile(axes=(a0:256=THREAD, a1:32=BLOCK)): + x_smem = Stage(x, origin=(0, a1, 0), slab=(a2:2048@2)) StridedLoop(a2 = a0; < 2048; += 256): # reduce - in2 = load x[0, a1, a2] + in2 = load x_smem[a2] v1 = multiply(in2, in2) acc0 <- add(acc0, v1) @@ -11,5 +12,5 @@ v4 = rsqrt(v3) StridedLoop(a2 = a0; < 2048; += 256): # free - in3 = load x[0, a1, a2] + in3 = load x_smem[a2] in4 = load p_weight[a2] v5 = multiply(in3, v4) <<< t:007_stage_inputs ``` The final CUDA kernel for the RMSNorm layer: bash deplodock compile \ -c "torch.nn.RMSNorm(2048)(torch.randn(1,32,2048))" \ --target sm_120 --ir cuda c extern "C" __global__ __launch_bounds__(256) void k_rms_norm_reduce( const float* x, const float* p_weight, float* rms_norm) { float v0 = 1.0f / 2048.0f; int a1 = blockIdx.x; int a0 = threadIdx.x; int lane = threadIdx.x & 31; int warp = threadIdx.x >> 5; float acc0 = 0.0f; __shared__ float x_smem[2048]; for (int x_smem_flat = a0; x_smem_flat < 2048; x_smem_flat += 256) { float x_smem_v = x[a1 * 2048 + x_smem_flat]; x_smem[x_smem_flat] = x_smem_v; } __syncthreads(); for (int a2 = a0; a2 < 2048; a2 += 256) { float in2 = x_smem[a2]; float v1 = in2 * in2; acc0 += v1; } float acc0_w = acc0; acc0_w = acc0_w + __shfl_xor_sync(0xffffffff, acc0_w, 16); acc0_w = acc0_w + __shfl_xor_sync(0xffffffff, acc0_w, 8); acc0_w = acc0_w + __shfl_xor_sync(0xffffffff, acc0_w, 4); acc0_w = acc0_w + __shfl_xor_sync(0xffffffff, acc0_w, 2); acc0_w = acc0_w + __shfl_xor_sync(0xffffffff, acc0_w, 1); __shared__ float acc0_smem[8]; if (lane == 0) { acc0_smem[warp] = acc0_w; } __syncthreads(); for (int s = 4; s > 0; s >>= 1) { if (warp < s) { acc0_smem[warp] = acc0_smem[warp] + acc0_smem[warp + s]; } __syncthreads(); } float acc0_b = 
acc0_smem[0]; float v2 = acc0_b * v0; float v3 = v2 + 1e-06f; float v4 = rsqrtf(v3); for (int a2 = a0; a2 < 2048; a2 += 256) { float in3 = x_smem[a2]; float in4 = p_weight[a2]; float v5 = in3 * v4; float v6 = v5 * in4; rms_norm[a1 * 2048 + a2] = v6; } } submitted by /u/NoVibeCoding [link] [comments]

  • Passing Multidimensional time series to VLM [R]
    by /u/zillur-av (Machine Learning) on May 11, 2026 at 6:23 pm

    Hello all, I have a multidimensional time series dataset and corresponding environment videos. I want to pass them to a VLM to perform some tasks. What is the best way to pass the time series data? From the literature review, I see there are two methods: pass time series as text and plot line charts and pass those as images. Neither method performed well on my task. Appreciate any guidance. submitted by /u/zillur-av [link] [comments]

  • Where are small Models like Qwen3 0.6B and Qwen3.5 0.8B used ? Huggingface shows 2.88 million downloads this month.[D]
    by /u/adssidhu86 (Machine Learning) on May 11, 2026 at 5:19 pm

    I can see 2.88 million downloads per month for small Qwen3.5 model. I tried using earlier model 0.6B in a deep resarch workflow and it was very difficult to get something done with this model . Firstly they have a very surface level understanding of concepts. Poor Semantic understand means they can get confused about the topic or the task. Json outputs are often broken . Adding a layer of checks on top took much of my time while working with these models. Slow resposne. This one depends on a lot of factors and can actullay be improved , still slow response is a buzz kill most of the time I am very curious how is the community using these models. submitted by /u/adssidhu86 [link] [comments]

  • Interactive Jensen–Shannon Divergence Visualisation [P]
    by /u/ancillia (Machine Learning) on May 11, 2026 at 3:03 pm

    An interactive visualisation of Jensen–Shannon divergence - the symmetric, always-finite cousin of KL. Shape two distributions and watch JSD, its ceiling of one bit, and the per-point contribution respond in real time. https://robotchinwag.com/posts/jensen-shannon-divergence-visualisation/ Feedback welcome. submitted by /u/ancillia [link] [comments]
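
As a companion to the description in this post, the quantities it names can be computed in a few lines. This is a minimal sketch for discrete distributions, assuming base-2 logarithms so that the result is in bits; the function names are illustrative and are not taken from the linked page.

```python
import math


def _term(a, m):
    """One KL summand a * log2(a / m), with the convention 0 * log(0) = 0."""
    return a * math.log2(a / m) if a > 0 else 0.0


def jsd_contributions(p, q):
    """Per-point contributions to the Jensen-Shannon divergence, in bits."""
    out = []
    for pi, qi in zip(p, q):
        mi = (pi + qi) / 2  # the mixture distribution M = (P + Q) / 2
        out.append(0.5 * _term(pi, mi) + 0.5 * _term(qi, mi))
    return out


def jsd_bits(p, q):
    """JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M)."""
    return sum(jsd_contributions(p, q))


p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(jsd_contributions(p, q))  # [0.25, 0.0, 0.25]
print(jsd_bits(p, q))           # 0.5
print(jsd_bits([1.0, 0.0], [0.0, 1.0]))  # 1.0, the one-bit ceiling
```

Because the mixture is nonzero wherever either distribution is nonzero, every summand stays finite, which is why JSD never diverges the way KL can.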

  • What to expect from AlphaZero's value predictions [D]
    by /u/YamEnvironmental4720 (Machine Learning) on May 11, 2026 at 12:29 pm

    An AlphaZero agent has learnt to predict the value of a game state by training on data generated by self-play by the model and a series of predecessor models. By construction, this value should reflect the probability of winning against a copy of itself starting from the given state. To be more precise, the value measures the state's average strength against opponent players collected among all the predecessors of the current model. This average depends on the manner in which the training data is sampled from the pool of self-play data (using a rolling window of self-play by the latest x models, putting more emphasis on recent models by geometric weighting, etc.). In each round of self-play, we can think of the agents (a copy for each player) making moves following a strategy, albeit a stochastic one (unless the temperature parameter is zero), defined by the PUCT function for the predicted values and policies, but that this strategy is a little perturbed by the addition of some proportion of Dirichlet noise. The purpose of this perturbation is to give the model an opportunity to find successful actions by chance and not get trapped into some rigid, possibly narrow, pattern of playing. Because of the role of noise in deciding which move to make, the formulation above that the value reflects the chances of winning against the model itself is an over-simplification. The data on which the value prediction is based does include "outlier" moves, and - as far as I've understood - this is a heuristic argument for the claim that the model makes its predictions based on experience of playing against a variety of different players. However, due to the moves that differ the most from the "predicted" ones being outliers, such moves also have a correspondingly small impact on the value predictions: it is the agent's own playing style, and the historical development of said style, that governs value predictions. 
So, if the agent meets a strong opponent, either a human being or an algorithm with a strong track record, why should AlphaZero's value prediction be a reliable measure of the agent's chances of winning against this opponent from the given position? Experience has shown AlphaZero to indeed outperform both human players and other algorithms in a variety of games. I wonder if this success is also to be expected a priori, or is it conceivable that AlphaZero could even fail miserably in some game against a specific algorithm whose moves, though occurring in AlphaZero's training data pool, occur so infrequently that they don't make any significant impact on the predictions? submitted by /u/YamEnvironmental4720 [link] [comments]
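
The move-selection mechanism this post refers to can be sketched as follows. This is a simplified illustration rather than AlphaZero's actual implementation: it shows the commonly cited PUCT score Q + c_puct * P * sqrt(total visits) / (1 + N) and the mixing of Dirichlet noise into the root priors. The constants `c_puct=1.5`, `eps=0.25`, and `alpha=0.3` are assumed example values, and all function names are hypothetical.

```python
import math
import random


def dirichlet(alpha, k, rng):
    """Sample a symmetric Dirichlet(alpha) vector by normalising gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def noisy_priors(priors, eps, alpha, rng):
    """Mix root priors with Dirichlet noise: (1 - eps) * p + eps * noise."""
    noise = dirichlet(alpha, len(priors), rng)
    return [(1 - eps) * p + eps * n for p, n in zip(priors, noise)]


def puct_select(q_values, priors, visits, c_puct=1.5):
    """Return the index of the child maximising Q + U under the PUCT rule."""
    total = sum(visits)
    scores = [
        q + c_puct * p * math.sqrt(total) / (1 + n)
        for q, p, n in zip(q_values, priors, visits)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
priors = noisy_priors([0.7, 0.2, 0.1], eps=0.25, alpha=0.3, rng=rng)
move = puct_select(q_values=[0.1, 0.0, 0.0], priors=priors, visits=[20, 3, 1])
print(priors, move)
```

With `eps=0.25`, three quarters of each prior still comes from the network's own policy, which matches the post's point that the noise only perturbs the agent's preferred style and does not replace it.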

  • Is reproducing or implementing a paper considered research? [R]
    by /u/UmbraShield (Machine Learning) on May 11, 2026 at 10:55 am

    I completed my bachelor's recently and I plan to apply to a master's program either this cycle or the next. Unfortunately, I did not publish any papers or do any research during my undergrad. Right now I’m in a research internship which is coming to an end soon, and it’s unlikely that I’ll get to publish a paper. I would like to know if reproducing results from a known paper for validation, extension, or comparative analysis counts as credible research. It’s the only thing I could find to do independently. submitted by /u/UmbraShield [link] [comments]

  • Why is human LLM annotation so expensive? [D]
    by /u/Neil-Sharma (Machine Learning) on May 11, 2026 at 12:12 am

    Scale AI and similar services charge a lot for annotation. MTurk is cheap, but the quality is horrible for anything requiring real domain understanding. For small teams that need a few thousand labeled examples to calibrate their evals or fine-tune a model, there seems to be no good middle ground. How is everyone handling this? Are you doing it manually, or has anyone found something that actually works? submitted by /u/Neil-Sharma [link] [comments]

  • PhD students in ML, how many hours on average do you work? [D]
    by /u/akardashian (Machine Learning) on May 10, 2026 at 11:54 pm

    I generally work around 9–10 hours a day, but not contiguously. I can usually carve out a dedicated chunk of time in the morning, take lab or project meetings in the afternoon, and block out around 6–8 PM for commute, exercise, socializing, and dinner. I also get more work done in the evening, since my focus is often best then. On weekends, I mostly run errands and try out new food spots, but I also make sure to do at least a little bit of work every day. I try to schedule my Slurm jobs so they run when I’m not actively working, so I can collect results when I get back. When I don’t have at least some Slurm jobs going, I feel anxious. I also feel pressure to use coding agents whenever I can. At the same time, I find that these agents can create an illusion of productivity: I end up with more “dead time” where I’m just waiting for the agent to finish thinking. I’m in my 3rd year as a PhD student at a top-5 program for my field in the US, and I’ve been thinking a lot about time management recently. I'm done with classes and not TA'ing this quarter. I mainly target the 3 main ML conferences (though I would love to make every deadline consistently and don’t), plus core NLP venues and journals. submitted by /u/akardashian [link] [comments]

  • Signals: finding the most informative agent traces without LLM judges [R]
    by /u/AdditionalWeb107 (Machine Learning) on May 10, 2026 at 5:26 pm

    Hello peeps, Salman, Shuguang and Adil here from Katanemo Labs (a DigitalOcean company). We wanted to introduce our latest research on agentic systems, called Signals. If you've been building agents, you've probably noticed that there are far too many agent traces/trajectories to review one by one, and using humans or extra LLM calls to inspect all of them gets expensive really fast. The paper proposes a lightweight way to compute structured “signals” from live agent interactions so you can surface the trajectories most worth looking at, without changing the agent’s online behavior. Computing Signals doesn't require a GPU. Signals are grouped into a simple taxonomy across interaction, execution, and environment patterns, including things like misalignment, stagnation, disengagement, failure, looping, and exhaustion. In an annotation study on τ-bench, signal-based sampling reached an 82% informativeness rate versus 54% for random sampling, which translated to a 1.52x efficiency gain per informative trajectory. Paper: arXiv 2604.00356. https://arxiv.org/abs/2604.00356 Project where Signals are already implemented: https://github.com/katanemo/plano Happy to answer questions on the taxonomy, implementation details, or where this breaks down. submitted by /u/AdditionalWeb107 [link] [comments]
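
The 1.52x figure quoted in this post is consistent with a simple ratio of the two informativeness rates. The snippet below is a back-of-the-envelope check under that assumption; the paper may define the efficiency gain differently, and the variable names are illustrative.

```python
signal_rate = 0.82  # informativeness rate with signal-based sampling (from the post)
random_rate = 0.54  # informativeness rate with random sampling (from the post)

# Expected number of trajectories reviewed per informative one.
reviews_per_hit_signal = 1 / signal_rate
reviews_per_hit_random = 1 / random_rate

# Assumed definition: how many times fewer reviews are needed per informative trajectory.
gain = reviews_per_hit_random / reviews_per_hit_signal
print(round(gain, 2))  # 1.52
```

Read this way, a reviewer looks at roughly 1.22 trajectories per informative one with signal-based sampling, compared with roughly 1.85 under random sampling.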





