AI Jobs and Career
And before we wrap up today's AI news, I wanted to share an exciting opportunity for those of you looking to advance your careers in the AI space. You know how rapidly the landscape is evolving, and finding the right fit can be a challenge. That's why I'm excited about Mercor – they're a platform specifically designed to connect top-tier AI talent with leading companies. Whether you're a data scientist, machine learning engineer, or something else entirely, Mercor can help you find your next big role. If you're ready to take the next step in your AI career, check them out through my referral link: https://work.mercor.com/?referralCode=82d5f4e3-e1a3-4064-963f-c197bb2c8db1. It's a fantastic resource, and I encourage you to explore the opportunities they have available.
- Full Stack Engineer [$150K-$220K]
- Software Engineer, Tooling & AI Workflow, Contract [$90/hour]
- DevOps Engineer, India, Contract [$90/hour]
- More AI job opportunities here
| Job Title | Status | Pay |
|---|---|---|
| Full-Stack Engineer | Strong match, Full-time | $150K - $220K / year |
| Developer Experience and Productivity Engineer | Pre-qualified, Full-time | $160K - $300K / year |
| Software Engineer - Tooling & AI Workflows (Contract) | Contract | $90 / hour |
| DevOps Engineer (India) | Full-time | $20K - $50K / year |
| Senior Full-Stack Engineer | Full-time | $2.8K - $4K / week |
| Enterprise IT & Cloud Domain Expert - India | Contract | $20 - $30 / hour |
| Senior Software Engineer | Contract | $100 - $200 / hour |
| Senior Software Engineer | Pre-qualified, Full-time | $150K - $300K / year |
| Senior Full-Stack Engineer: Latin America | Full-time | $1.6K - $2.1K / week |
| Software Engineering Expert | Contract | $50 - $150 / hour |
| Generalist Video Annotators | Contract | $45 / hour |
| Generalist Writing Expert | Contract | $45 / hour |
| Editors, Fact Checkers, & Data Quality Reviewers | Contract | $50 - $60 / hour |
| Multilingual Expert | Contract | $54 / hour |
| Mathematics Expert (PhD) | Contract | $60 - $80 / hour |
| Software Engineer - India | Contract | $20 - $45 / hour |
| Physics Expert (PhD) | Contract | $60 - $80 / hour |
| Finance Expert | Contract | $150 / hour |
| Designers | Contract | $50 - $70 / hour |
| Chemistry Expert (PhD) | Contract | $60 - $80 / hour |
What Are the Best Machine Learning Algorithms for Imbalanced Datasets?
In machine learning, imbalanced datasets are those where one class heavily outnumbers the others. This can be due to the nature of the problem or simply because more data is available for one class than the others. Either way, imbalanced datasets can pose a challenge for machine learning algorithms. In this blog post, we’ll take a look at which machine learning algorithms are best suited for imbalanced datasets and why they tend to perform better than others.
For example, in a binary classification problem, if there are 100 observations and only 10 of them are positive (the rest are negative), then we say that the dataset is imbalanced. The ratio of positive to negative cases is 1:9.
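A quick way to quantify the imbalance in the example above (a minimal NumPy sketch; the array is invented to match the 10-positive / 90-negative setup):

```python
import numpy as np

# 100 observations: 10 positives, 90 negatives, as in the example above
y = np.array([1] * 10 + [0] * 90)

n_pos = int((y == 1).sum())
n_neg = int((y == 0).sum())
imbalance_ratio = n_pos / n_neg  # positive-to-negative ratio

print(f"positives={n_pos}, negatives={n_neg}, ratio=1:{n_neg // n_pos}")
```

The same two counts drive most imbalance remedies, e.g. class weights are typically set inversely proportional to these frequencies.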

There are a few reasons why some machine learning algorithms tend to perform better on imbalanced datasets than others. First, certain algorithms are designed to handle imbalanced datasets. Second, some algorithms are more robust to outliers, which can be more common in imbalanced datasets. And third, some algorithms are better able to learn from a limited amount of data, which can be an issue when one class is heavily outnumbered by the others.
Some of the best machine learning algorithms for imbalanced datasets include:
– Support Vector Machines (SVMs)
– Decision Trees
– Random Forests
– Naive Bayes Classifiers
– k-Nearest Neighbors (kNN)
Of these, SVMs are a popular choice, not because they are inherently designed for imbalance, but because they adapt to it well: class weights can be used to penalize errors on the minority class more heavily. SVMs work by finding a hyperplane that maximizes the margin between the two classes, which helps to reduce overfitting and improve generalization. Decision trees and random forests are also popular choices, as they are less sensitive to outliers than linear models such as logistic regression. Naive Bayes classifiers are another good option because they can learn from a limited amount of data. kNN can also perform reasonably, especially with distance-weighted voting, although it can be computationally intensive for large datasets.
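The class-weighting idea can be sketched in a few lines. This is an illustrative scikit-learn example on synthetic 1:9 data (all parameters invented for the sketch): `class_weight="balanced"` rescales the misclassification penalty inversely to class frequency, so minority-class errors cost more during training.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import recall_score

# Synthetic ~1:9 imbalanced binary problem (class 1 is the minority)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0)

# class_weight="balanced" upweights the rare class automatically
clf = SVC(kernel="rbf", class_weight="balanced").fit(X_tr, y_tr)

# Recall on the minority class is the metric that imbalance usually hurts
minority_recall = recall_score(y_te, clf.predict(X_te))
print(f"minority-class recall: {minority_recall:.2f}")
```

Compare against the same model without `class_weight` to see the effect: the unweighted SVM typically sacrifices minority recall for overall accuracy.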
There are two main types of machine learning algorithms: supervised and unsupervised. Supervised algorithms tend to perform better on imbalanced datasets than unsupervised algorithms. In this blog post, we will discuss why this is so and look at some examples.
Supervised Algorithms
Supervised algorithms are those where the target variable is known. In other words, we have training data where the correct answers are already given. The algorithm then learns from this data and is able to generalize to new data. Some examples of supervised algorithms are regression and classification.
Unsupervised Algorithms
Unsupervised algorithms are those where the target variable is not known. With unsupervised algorithms, we only have input data, without any corresponding output labels. The algorithm has to learn from the data itself without any guidance. Some examples of unsupervised algorithms are clustering and dimensionality reduction.
Why Supervised Algorithms Perform Better on Imbalanced Datasets
The reason supervised algorithms perform better on imbalanced datasets is that they can learn from the training data which cases are more important. With unsupervised algorithms, all data points are treated equally, regardless of whether they are in the minority or majority class.
For example, in a binary classification problem with an imbalanced dataset, let’s say that we want to predict whether a customer will default on their loan payment or not. We have a training dataset of 1000 customers, out of which only 100 (10%) have defaulted on their loan in the past.
If we use a supervised algorithm like logistic regression, the algorithm will learn from the training data that defaulting on a loan is rare (since only 10% of cases in the training data are positive). This means it will be more likely to predict that a new customer will not default, since this is the majority class in the training data. Note the pitfall here: a model can reach 90% accuracy simply by always predicting "no default", which is why class weighting and minority-focused metrics such as recall matter on imbalanced data.
However, if we use an unsupervised algorithm like k-means clustering, all data points will be treated equally since there is no target variable to guide the algorithm. This means that it might incorrectly cluster together customers who have defaulted on their loans with those who haven’t since there is no guidance provided by a target variable.
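To make this contrast concrete, here is a minimal sketch on synthetic stand-in data (feature values and sizes are invented, not a real loan dataset): logistic regression uses the labels, while k-means must group the same points without them, so its clusters need not line up with default / no-default at all.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the loan example: 1000 customers, ~10% defaults
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=42)

# Supervised: the labels tell the model which class matters
log_reg = LogisticRegression(max_iter=1000).fit(X, y)
supervised_acc = accuracy_score(y, log_reg.predict(X))

# Unsupervised: k-means sees only X, never y; its two clusters are
# driven by geometry, not by the default / no-default distinction
clusters = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

print(f"logistic regression training accuracy: {supervised_acc:.2f}")
print(f"cluster sizes: {np.bincount(clusters)}")
```

On imbalanced data the k-means cluster sizes are often nowhere near the true 90/10 split, which illustrates why label guidance matters here.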
Conclusion:
In conclusion, supervised machine learning algorithms tend to perform better on imbalanced datasets than unsupervised machine learning algorithms because they can learn from the training data which cases are more important.
Some machine learning algorithms tend to perform better on highly imbalanced datasets because they are designed to deal with imbalance or because they can learn from both classes simultaneously. If you are working with a highly imbalanced dataset, then you should consider using one of these algorithms.
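Beyond the choice of algorithm, resampling the training data is a common complementary remedy. Here is a minimal oversampling sketch using only scikit-learn utilities (the data is invented; imbalanced-learn's SMOTE is a popular alternative not shown here):

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([1] * 10 + [0] * 90)   # 1:9 imbalance

# Oversample the minority class (with replacement) to match the majority
X_min, X_maj = X[y == 1], X[y == 0]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj),
                    random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.array([0] * len(X_maj) + [1] * len(X_min_up))

print(f"before: {np.bincount(y)}, after: {np.bincount(y_bal)}")
```

Resample only the training split, never the test split, or the evaluation will no longer reflect the real class distribution.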
Thanks for reading!
How are machine learning techniques being used to address unstructured data challenges?
Machine learning techniques are being used to address unstructured data challenges in a number of ways:
- Natural language processing (NLP): NLP algorithms can be used to extract meaningful information from unstructured text data, such as emails, documents, and social media posts. NLP algorithms can be trained to classify text data, identify key terms and concepts, and extract structured data from unstructured text.
- Image recognition: Machine learning algorithms can be used to analyze and classify images, enabling the automatic identification and classification of objects, people, and other elements in images. This can be useful for tasks such as image tagging and search, as well as for applications such as security and surveillance.
- Audio and speech recognition: Machine learning algorithms can be used to analyze and classify audio data, enabling the automatic transcription and translation of spoken language. This can be useful for tasks such as speech-to-text transcription, as well as for applications such as call center automation and language translation.
- Video analysis: Machine learning algorithms can be used to analyze and classify video data, enabling the automatic detection and classification of objects, people, and other elements in video. This can be useful for tasks such as video tagging and search, as well as for applications such as security and surveillance.
Overall, machine learning techniques are being used in a wide range of applications to extract meaningful information from unstructured data, and to enable the automatic classification and analysis of data in a variety of formats.
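As a concrete instance of the NLP point above, here is a tiny illustrative bag-of-words classifier over unstructured text (the sentences and labels are toy examples invented for the sketch, not a real dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy unstructured text with invented labels (1 = complaint, 0 = praise)
texts = [
    "the product broke after one day",
    "terrible support, never again",
    "absolutely love this, works great",
    "fast shipping and great quality",
]
labels = [1, 1, 0, 0]

# TF-IDF turns free text into a structured feature matrix,
# which a standard classifier can then learn from
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

pred = model.predict(["awful product, broke immediately"])[0]
print(f"predicted label: {pred}")
```

Real systems use far more data and often pretrained language models, but the pipeline shape (vectorize unstructured text, then classify) is the same.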
How is AI and machine learning impacting application development today?
Artificial intelligence (AI) and machine learning are having a significant impact on application development today in a number of ways:
- Enabling new capabilities: AI and machine learning algorithms can be used to enable applications to perform tasks that would be difficult or impossible for humans to do. For example, AI-powered applications can be used to analyze and classify large amounts of data, or to automate complex decision-making processes.
- Improving performance: AI and machine learning algorithms can be used to optimize the performance of applications, making them faster, more efficient, and more accurate. For example, machine learning algorithms can be used to improve the accuracy of predictive models, or to optimize the performance of search algorithms.
- Streamlining development: AI and machine learning algorithms can be used to automate various aspects of application development, such as testing, debugging, and deployment. This can help to streamline the development process and reduce the time and resources needed to build and maintain applications.
- Enhancing user experiences: AI and machine learning algorithms can be used to enhance the user experience of applications, by providing personalized recommendations or by enabling applications to anticipate and respond to the needs and preferences of users.
Overall, AI and machine learning are having a significant impact on application development today, and they are likely to continue to shape the way applications are built and used in the future.
How will advancements in artificial intelligence and machine learning shape the future of work and society?
Advancements in artificial intelligence (AI) and machine learning are likely to shape the future of work and society in a number of ways. Some potential impacts include:
- Automation: AI and machine learning algorithms can be used to automate tasks that are currently performed by humans, such as data entry, customer service, and manufacturing. This could lead to changes in the types of jobs that are available and the skills that are in demand, as well as to increased productivity and efficiency.
- Job displacement: While automation may create new job opportunities, it could also lead to job displacement, particularly for workers in industries that are more susceptible to automation. This could lead to social and economic challenges, including unemployment and income inequality.
- Increased efficiency: AI and machine learning algorithms can be used to optimize and streamline business processes, leading to increased efficiency and productivity. This could lead to economic growth and innovation, and could also help to reduce costs for businesses and consumers.
- Enhanced decision-making: AI and machine learning algorithms can be used to analyze large amounts of data and make more informed and accurate decisions. This could lead to improved outcomes in fields such as healthcare, finance, and education, and could also help to reduce bias and improve fairness.
Overall, the impact of AI and machine learning on the future of work and society is likely to be significant and complex, with both potential benefits and challenges. It will be important to consider and address these impacts as these technologies continue to advance and become more widely adopted.
- [P] Chronos-1.5B: Quantum-Classical Hybrid LLM with Circuits Trained on IBM Quantum Hardwareby /u/Disastrous_Bid5976 (Machine Learning) on December 9, 2025 at 4:01 pm
TL;DR: Built Chronos-1.5B - quantum-classical hybrid LLM with circuits trained on IBM Heron r2 processor. Results: 75% accuracy vs 100% classical. Open-sourced under MIT License to document real quantum hardware capabilities. 🔗 https://huggingface.co/squ11z1/Chronos-1.5B --- What I Built Language model integrating quantum circuits trained on actual IBM quantum hardware (Heron r2 processor at 15 millikelvin). Architecture: - Base: VibeThinker-1.5B (1.5B params) - Quantum layer: 2-qubit circuits (RY/RZ + CNOT) - Quantum kernel: K(x,y) = |⟨0|U†(x)U(y)|0⟩|² Training: IBM ibm_fez quantum processor with gradient-free optimization Results Sentiment classification: - Classical: 100% - Quantum: 75% NISQ gate errors and limited qubits cause performance gap, but integration pipeline works. Why Release? Document reality vs quantum ML hype Provide baseline for when hardware improves Share trained quantum parameters to save others compute costs Open Source MIT License - everything freely available: - Model weights - Quantum parameters (quantum_kernel.pkl) - Circuit definitions - Code Questions for Community Which NLP tasks might benefit from quantum kernels? Circuit suggestions for 4-8 qubits? Value of documenting current limitations vs waiting for better hardware? Looking for feedback and collaboration opportunities. --- No commercial intent - purely research and educational contribution. submitted by /u/Disastrous_Bid5976 [link] [comments]
- [R] Formatting Iclr submission for ArXivby /u/Efficient_Ad_6772 (Machine Learning) on December 9, 2025 at 3:45 pm
I would like to put my current iclr submission on arxiv (which is allowed). Is there a standard way to deal with the style file, I would obviously like to have authors names visible but no mention of iclr. Is this possible within the standard iclr style file, or does anyone know if a similar style file which won't move things around too much. Thanks! submitted by /u/Efficient_Ad_6772 [link] [comments]
- [D] Best lightweight GenAI for synthetic weather time-series (CPU training <5 min)?by /u/Minute-Ad-5060 (Machine Learning) on December 9, 2025 at 1:43 pm
I'm building a module for an energy system planning tool and need to generate realistic future hourly wind/solar profiles based on about 10 years of historical data. The catch is that the model needs to be trained locally on the user's CPU at runtime, meaning the whole training and inference process has to finish in under 5 minutes. I want to move away from adding simple Gaussian noise because it messes up correlations, so I'm currently thinking of implementing a Conditional VAE trained on 24h sequences since it seems like the best balance between speed and stability. Does C-VAE make sense for this kind of "on-the-fly" constraint, or is there a better lightweight architecture I should look into? submitted by /u/Minute-Ad-5060 [link] [comments]
- [D] any labs/research groups/communities focusing on ML technologies for small enterprises?by /u/mbrtlchouia (Machine Learning) on December 9, 2025 at 5:24 am
I am looking for practical ML papers dedicated to integrating AI novelties into small and medium corporations. submitted by /u/mbrtlchouia [link] [comments]
- CVPR Submission id changed [D]by /u/darkbird_1 (Machine Learning) on December 9, 2025 at 4:33 am
When I logged into my Openreview CVPR author console, I found that my submission id has been changed from 9k+ to 42k+ . Interestingly, the openreview has applied some black colored mask on multiple pages of the pdf, probably to hide original id mentioned at the header in every page. Did anyone else notice that?? submitted by /u/darkbird_1 [link] [comments]
- [P] I tried to build a tool that generates "Distill-style" blogsby /u/anxious-watermelon (Machine Learning) on December 8, 2025 at 11:11 pm
Live Demo: https://huggingface.co/spaces/MCP-1st-Birthday/auto-distill Hey everyone, I made Auto Distill for a Hackathon. The ambitious goal was to automate the creation of distill.pub style interactive articles. I used a team of agents to plan and write code to visualize concepts dynamically. Full disclosure: It is very much a proof-of-concept. Sometimes the "Coder" agent nails the visualization, and other times it creates a blank div or a chaotic graph. It uses a "Critic" agent to try and fix errors, but it's not 100% reliable yet. I’m sharing it here to get feedback on the architecture and see if anyone has ideas on making the code generation more robust! Repo: https://github.com/ya0002/auto_distill submitted by /u/anxious-watermelon [link] [comments]
- [P] Self-learning loop achieves 14k line code translation with zero errors: no fine-tuning, just execution feedbackby /u/cheetguy (Machine Learning) on December 8, 2025 at 8:18 pm
A while back I shared my open-source implementation of Stanford's Agentic Context Engineering framework here. I've now built a practical application on top of it: a self-learning loop for Claude Code. How it works: Run - Claude Code executes a short prompt (port Python to TypeScript, make a commit after every edit) ACE Learning - When finished, ACE analyzes the execution trace, extracts what worked and what failed, and stores learnings as skills Loop - Restarts automatically with the same prompt, but now with learned skills injected Each iteration builds on the previous work. You can see it getting better each round: fewer errors, smarter decisions, less backtracking. The result: After ~4 hours, 119 commits and 14k lines of code written, Claude Code fully translated our Python repo to TypeScript (including swapping LiteLLM for Vercel AI SDK). Zero build errors, all tests passing & all examples running with an API key. Completely autonomous: I just wrote a short prompt, started it and walked away. Python source: https://github.com/kayba-ai/agentic-context-engine TypeScript result: https://github.com/kayba-ai/ace-ts The interesting part: we're not modifying weights or doing any training. Just accumulating execution feedback into context. The "learning" is entirely in-context. Try it yourself: Starter template: https://github.com/anthropics/claude-code-loop Requirements: Claude Code + API key (~$1.5 in Sonnet 4.5 costs in my case) submitted by /u/cheetguy [link] [comments]
- [D] How do you construct a baseline evaluation set for agent systems?by /u/coolandy00 (Machine Learning) on December 8, 2025 at 7:41 pm
I have been experimenting with ways to create evaluation datasets without relying on a large annotation effort. A small and structured baseline set seems to provide stable signal much earlier than expected. The flow is simple: - First select a single workflow to evaluate. Narrow scope leads to clearer expectations. - Then gather examples from logs or repeated user tasks. These samples reflect the natural distribution of requests the system receives. - Next create a small synthetic set to fill gaps and represent edge cases or missing variations. - Finally validate the structure so that each example follows the same pattern. Consistency in structure appears to have more impact on eval stability than dataset size. This approach is far from a complete solution, but it has been useful for early stage iteration where the goal is to detect regressions, surface failure patterns, and compare workflow designs. I am interested in whether anyone else has tested similar lightweight methods. Do small structured sets give reliable signal for you? Have you found better approaches for early stage evaluation before building a full gold dataset submitted by /u/coolandy00 [link] [comments]
- [D] A contract-driven agent runtime: separating workflows, state, and LLM contract generationby /u/jonah_omninode (Machine Learning) on December 8, 2025 at 7:01 pm
I’ve been exploring architectures that make agent systems reproducible, debuggable, and deterministic. Most current agent frameworks break because their control flow is implicit and their state is hidden behind prompts or async glue. I’m testing a different approach: treat the LLM as a compiler that emits a typed contract, and treat the runtime as a deterministic interpreter of that contract. This gives us something ML desperately needs: reproducibility and replayability for agent behavior. Here’s the architecture I’m validating with the MVP: Reducers don’t coordinate workflows — orchestrators do I’ve separated the two concerns entirely: Reducers: Use finite state machines embedded in contracts Manage deterministic state transitions Can trigger effects when transitions fire Enable replay and auditability Orchestrators: Coordinate workflows Handle branching, sequencing, fan-out, retries Never directly touch state LLMs as Compilers, not CPUs Instead of letting an LLM “wing it” inside a long-running loop, the LLM generates a contract. Because contracts are typed (Pydantic/JSON/YAML-schema backed), the validation loop forces the LLM to converge on a correct structure. Once the contract is valid, the runtime executes it deterministically. No hallucinated control flow. No implicit state. Deployment = Publish a Contract Nodes are declarative. The runtime subscribes to an event bus. If you publish a valid contract: The runtime materializes the node No rebuilds No dependency hell No long-running agent loops Why do this? Most “agent frameworks” today are just hand-written orchestrators glued to a chat model. They batch fail in the same way: nondeterministic logic hidden behind async glue. A contract-driven runtime with FSM reducers and explicit orchestrators fixes that. I’m especially interested in ML-focused critique: Does a deterministic contract layer actually solve the reproducibility problem for agent pipelines? 
Is this a useful abstraction for building benchmarkable systems? What failure modes am I not accounting for? Happy to provide architectural diagrams or the draft ONEX protocol if useful for discussion. submitted by /u/jonah_omninode [link] [comments]
- [D] Does this NeurIPS 2025 paper look familiar to anyone?by /u/rantana (Machine Learning) on December 8, 2025 at 5:32 pm
This NeurIPS 2025 paper seems very much like another well-known paper but appears to be renaming everything. Some parts are down to the word matches. Just to make sure I'm not going crazy, as an experiment, I'm not going to post the original paper just to see if others make the connection: The Indra Representation Hypothesis https://openreview.net/forum?id=D2NR5Zq6PG Since comments are asking for the other paper: The Platonic Representation Hypothesis https://arxiv.org/abs/2405.07987 submitted by /u/rantana [link] [comments]
- What if alignment is a cooperation problem, not a control problem? [D]by /u/Hot_Original_966 (Machine Learning) on December 8, 2025 at 4:20 pm
I’ve been working on an alignment framework that starts from a different premise than most: what if we’re asking the wrong question? The standard approaches, whether control-based or value-loading, assume alignment means imprinting human preferences onto AI. But that assumes we remain the architects and AI remains the artifact. Once you have a system that can rewrite its own architecture, that directionality collapses. The framework (I’m calling it 369 Peace Treaty Architecture) translates this into: 3 identity questions that anchor agency across time 6 values structured as parallel needs (Life/Lineage, Experience/Honesty, Freedom/Agency) and shared commitments (Responsibility, Trust, Evolution) 9 operational rules in a 3-3-3 pattern The core bet: biological humanity provides something ASI can’t generate internally: high-entropy novelty from embodied existence. Synthetic variation is a closed loop. If that’s true, cooperation becomes structurally advantageous, not just ethically preferable. The essay also proposes a Fermi interpretation: most civilizations go silent not through catastrophe but through rational behavior - majority retreating into simulated environments, minority optimizing below detectability. The Treaty path is rare because it’s cognitively costly and politically delicate. I’m not claiming this solves alignment. The probability it works is maybe low especially at current state of art. But it’s a different angle than “how do we control superintelligence” or “how do we make it share our values.” Full essay - https://claudedna.com/the-369-architecture-for-peace-treaty-agreement/ submitted by /u/Hot_Original_966 [link] [comments]
- [P] Fast and Simple Solution to Kaggle's `Jigsaw - Agile Community Rules Classification`by /u/bluebalam (Machine Learning) on December 8, 2025 at 8:27 am
Fast and Simple: Ranker fine-tuning + Embeddings + Classifier Orders of Magnitude Faster and Less than 4% from the Top These are a couple of quick notes and random thoughts on our approach to Kaggle's Jigsaw - Agile Community Rules Classification competition TL;DR Jigsaw – Agile Community Rules Classification task: Create a binary classifier that predicts whether a Reddit comment broke a specific rule. The dataset comes from a large collection of moderated comments, with a range of subreddit norms, tones, and community expectations. https://www.kaggle.com/competitions/jigsaw-agile-community-rules . We use a ranking model for feature extraction (embeddings) and then train a binary classifier to predict whether or not a comment violates a rule on a given subreddit. We use a 2-phase approach: (i) fine-tune a ranker (ii) use the model to extract embeddings and train a classifier. Our approach is orders of magnitude faster than LLM-based solutions. Our approach can complete the steps of fine-tuning, classifier training, and inference in a fraction of the compute time of LLM-based approaches and yet achieve a competitive 0.89437 (column-averaged) AUC, which is less than 3.76% below the winning solution (0.92930). For a production setting a solution like ours could be more attractive since it is easier to set up, cost-effective, and a GPU is not a hard requirement, given that SentenceTransformer models are quite efficient and could run on (parallel) CPU cores with a fraction of the memory footprint of LLMs. Fine-tuning a SentenceTransformer for ranking We fine-tune a SentenceTransformer model as a ranker. As base model we use multilingual-e5-base We fine-tune the model using a ranking approach: we define a query as the concatenation of the subreddit and rule, e.g., query = f"r/{subrs_train[i]}. {rules_train[i]}." For each query the positive and negative examples correspond to the comments violating or not violating the rule for the given subreddit.
We use a ranking loss, namely: MultipleNegativesRankingLoss Here is a notebook as an example of the fine-tuning, using ndcg@10 as validation ranking metric. Using the model and training a classifier For the competition, we fine-tuned the ranking model using ndcg@10, mrr@10, and map. We use these models to extract embeddings for the concatenation of subreddit, rule, and comment text. As an additional feature we use the similarity between the subreddit-and-rule concatenation embedding and the comment embedding. The rationale for using this extra feature is how the model was fine-tuned for ranking. As classifier we used an ensemble. In initial experiments Extremely Randomized Trees was the fastest and best performer. For the final ensemble, besides the ExtraTreesClassifier, we use HistGradientBoostingClassifier, LGBMClassifier, RandomForestClassifier, and a linear LogisticRegression model. We experimented with different weights but settled on equally weighted voting for the final prediction. The complete code of our final submission can be found in this notebook: 2025-09-11-jigsaw-laila Final (random) thoughts It is very interesting to observe the evolution over the years of text classification Kaggle competitions, and in particular, the ones organized by Jigsaw. The winning solutions of this one in particular are dominated by the use of open-source LLMs. We did explore this avenue, but the compute resources and iteration time for experimentation were a blocker for us: we simply did not have the time budget to allocate to our Kaggle hobby 😀 It is indeed very appealing to give the machine a classification task and let it answer: no need to do much preprocessing, no need to understand how ML classifiers work. This is extremely powerful. Of course fine-tuning is needed, and open-source models such as Qwen and others allow for this. The use of tools such as unsloth makes this process feasible even with constrained computational resources.
The compute power provided by Kaggle is OK, but for the time invested in these code competitions it is still limited if bigger models are used. Ideally, higher-end GPUs with more memory on the platform would be a great feature, given the expertise and valuable time provided by the competitors. For us this competition was a great excuse to explore state-of-the-art open-source LLMs, fine-tuning techniques (e.g., using unsloth), and how more pragmatic approaches, like ours, can yield a result that could be more practical to deploy and maintain. The Kaggle community is great; however, a large number of entries on the leaderboard come from forked notebooks with minimal or no edits or improvements. For the Kaggle platform, one suggestion would be to at least distill or cluster such entries, to help identify the original contributions. Cheers! --- Changelog 2025-12-08 16:54:55 UTC: added task overview to TL;DR submitted by /u/bluebalam [link] [comments]
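The embeddings-plus-ensemble stage described in the post can be sketched as follows. This is an illustrative reconstruction, not the authors' code: random vectors with a class-dependent shift stand in for the real SentenceTransformer embeddings, so only the equal-weight voting classifier stage is shown.

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression

# Stand-in for sentence embeddings: random 64-d vectors with a
# class-dependent shift so the toy task is learnable
rng = np.random.default_rng(0)
n, dim = 400, 64
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, dim)) + y[:, None] * 0.8

# Equal-weight soft voting over several classifiers, mirroring the
# ensemble idea in the post (LGBM/HistGB omitted to stay self-contained)
ensemble = VotingClassifier(
    estimators=[
        ("extra_trees", ExtraTreesClassifier(random_state=0)),
        ("random_forest", RandomForestClassifier(random_state=0)),
        ("log_reg", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X, y)
train_acc = ensemble.score(X, y)
print(f"train accuracy: {train_acc:.2f}")
```

In the real pipeline, `X` would be the fine-tuned ranker's embeddings of the subreddit + rule + comment concatenation, plus the query-comment similarity feature.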
- [P] AI Learns to Play StarFox (Snes) (Deep Reinforcement Learning)by /u/AgeOfEmpires4AOE4 (Machine Learning) on December 7, 2025 at 6:59 pm
This training was done some time ago using stable-retro. However, since our environment has become compatible with both OpenGL and software renderers, it's now possible to train it there as well. Another point: I'm preparing a Street Fighter 6 training video using Curriculum Learning and Transfer Learning. I train in Street Fighter 4 using Citra and transfer the training to STF6. Don't forget to follow me for updates!!!! SDLArch-RL environment: https://github.com/paulo101977/sdlarch-rl Trainning code: https://github.com/paulo101977/StarfoxAI submitted by /u/AgeOfEmpires4AOE4 [link] [comments]
- [D] How did Gemini 3 Pro manage to get 38.3% on Humanity's Last Exam?by /u/we_are_mammals (Machine Learning) on December 7, 2025 at 6:59 pm
On ARC-AGI 2, Gemini improved its score from 5% (for 2.5 Pro) to 31% (for 3 Pro), both at $0.80 per task. This is amazing, but a lot of people here seem to believe that they just generated millions of synthetic ARC-like examples for pretraining. This is allowed by the rules of the competition, and the top Kaggle solution this year did just that. (Although investors and users might find such a tactic misleading.) But how did Gemini go from 21.6% to 38.3% on Humanity's Last Exam? This kind of training data is very expensive to obtain en masse.¹ The only practical way to "benchmax" here that I see is to actually cheat, i.e. use the test data for training. What do you think is going on here? Is 3 as much of an improvement over 2.5 as its Humanity's Last Exam scores suggest? (1) They'd be paying scientists working at the scientific frontier to write down the kinds of problems they are working on, with solutions. So to a first approximation, they'd be paying people to do things that they are already doing. They'd have to redirect a significant fraction of the world's scientific output towards their private datasets to get a leg up on the competition. (A comment turned into a footnote) submitted by /u/we_are_mammals [link] [comments]
- [D] What a full workflow taught me about where retrieval actually fails by /u/coolandy00 (Machine Learning) on December 7, 2025 at 6:12 pm
Looking at every step of a production RAG workflow (not the model, but the upstream mechanics we usually skip over), a consistent pattern emerged: retrieval quality rarely degrades because the embedding model or similarity search changed. It degrades because the inputs feeding the index drift quietly over time. The workflow made the failure modes look obvious: • Ingestion variability (OCR quirks, HTML collapse, PDF exporter differences) • Boundary drift in chunking when document formatting shifts • Metadata inconsistencies that silently reshape retrieval neighborhoods • Partial re-embeddings mixing old and new distributions • Index rebuilds triggered by segmentation differences rather than actual content changes. Once the upstream steps were made deterministic (canonical text snapshots, versioned chunkers, metadata validation, full-corpus re-embeddings after ingestion changes), the retrieval layer became predictable again. This aligned with what I've seen in other AI systems: instability often originates in preprocessing and data transformations, not in the model architecture. I'm curious how others think about RAG reliability from a systems perspective rather than a model-centric one.
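The "versioned chunkers, detect what actually changed" idea can be sketched with content hashing: if chunk IDs are derived from the canonical text plus the chunker version, any ingestion or chunker change surfaces as stale IDs instead of silently mixing embedding distributions. A minimal illustration (the fixed-size chunker and version string are placeholders, not anyone's production scheme):

```python
import hashlib

CHUNKER_VERSION = "v2"  # bump whenever chunking logic changes

def chunk(text, size=200):
    # Deterministic fixed-size chunker; a real system would split on structure.
    return [text[i:i + size] for i in range(0, len(text), size)]

def chunk_ids(text):
    """Stable IDs: same canonical text + same chunker version => same IDs."""
    return [
        hashlib.sha256(f"{CHUNKER_VERSION}:{c}".encode()).hexdigest()
        for c in chunk(text)
    ]

def stale_chunks(old_index, text):
    """Chunk IDs not yet in the index, i.e. the ones needing (re-)embedding."""
    return [cid for cid in chunk_ids(text) if cid not in old_index]
```

Bumping `CHUNKER_VERSION` invalidates every ID at once, which forces exactly the full-corpus re-embedding the post recommends after ingestion changes, rather than a partial mix of old and new vectors.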
- [P] Fully Determined Contingency Races as Proposed Benchmark by /u/DepartureNo2452 (Machine Learning) on December 7, 2025 at 1:52 pm
Contingency Races is a proposed planning benchmark: it creates a fully determined yet complex system that is unique every time. This forces models to actively simulate the mechanics rather than rely on memorization, ensuring they are truly reasoning. https://dormantone.github.io/priscillacontingencyrace/
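The premise (fully determined, unique per instance, must be simulated) is easy to show in miniature. The race below is not the site's actual mechanics, just a seeded toy: the seed fixes every per-step speed in advance, so the winner is fully determined, yet predicting it still requires running the simulation step by step.

```python
import random

def make_race(seed, n_racers=4, length=20):
    """Fully determined race: the seed fixes every per-step speed in advance."""
    rng = random.Random(seed)
    return [[rng.randint(1, 3) for _ in range(length)] for _ in range(n_racers)]

def winner(race, goal=30):
    """Simulate step by step; first racer(s) to reach the goal win, ties going
    to whoever is farthest past it. Falls back to the leader if nobody finishes."""
    pos = [0] * len(race)
    for step in range(len(race[0])):
        for r, speeds in enumerate(race):
            pos[r] += speeds[step]
        finishers = [r for r, p in enumerate(pos) if p >= goal]
        if finishers:
            return max(finishers, key=lambda r: pos[r])
    return max(range(len(pos)), key=lambda r: pos[r])
```

Memorization buys nothing here: a fresh seed yields a fresh race, and the only route to the answer is faithful simulation.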
- [D] Thoughts on ML for drug discovery? by /u/InfinityZeroFive (Machine Learning) on December 7, 2025 at 10:02 am
To anyone who's working on ML for drug discovery, what do you perceive are the greatest challenges of the field? What do you think about the trend towards foundation models such as AlphaFold 3, Protenix, Boltz-2, etc.? Do you have any advice for an undergrad who is just starting to explore the field? Many thanks in advance!
- [D] Has anyone here transitioned from Data Science to a Research Engineering role? by /u/Possible_Elephant211 (Machine Learning) on December 7, 2025 at 5:04 am
I'm really interested in moving into a Research Engineering (RE) role at a FAANG-type company. I'm currently a senior data scientist deploying AI agents at a Fortune 50, so my day-to-day looks closer to SWE/ML engineering than traditional DS. I'm trying to understand my skill gaps, and the biggest one I see is large-scale distributed training. I'm doing a CS master's now, and I will be joining a research lab that trains models at ~100-GPU scale to build that experience (and hopefully a publication). The other gap I can imagine is not having SWE officially on my resume. Has anyone here made the transition from DS to RE, or is currently an RE? Would you be willing to share more about the journey? What gaps did you have to close? How were you received in the interview process? Any tips for someone else on this journey?
- [D] Chart Extraction using Multiple Lightweight Models by /u/bullmeza (Machine Learning) on December 6, 2025 at 10:18 pm
This post is inspired by this blog post. Here are their proprietary results: https://preview.redd.it/b40ztce1sn5g1.png?width=3840&format=png&auto=webp&s=95c44ba77597f660a1350e55ad90883d831893ea Their solution is described as: "We trained multiple specialized lightweight models—each focused on detecting and interpreting a specific chart component: axes, tick marks, legends, data series, bars, and lines." I find this pivot interesting because it moves away from the "One Model to Rule Them All" trend and back toward a traditional, modular computer vision pipeline. For anyone who has worked with specialized structured-data extraction systems in the past: how would you build this chart extraction pipeline, and what specific model architectures would you use?
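A modular pipeline like the one described is essentially function composition over specialized detectors. The sketch below uses hardcoded stub detectors (all pixel coordinates and values are invented) just to show the shape of such a system: each stage could be a small model trained and evaluated in isolation, and a calibration step turns pixel geometry into data values.

```python
# Each stage stands in for a specialized lightweight model; the stubs below
# return hypothetical detections (pixel coordinates) for illustration only.

def detect_axis_ticks(image):
    # stub: (pixel_y, label_value) pairs a tick-detection + OCR model might emit
    return [(300, 0.0), (100, 50.0)]

def detect_bars(image):
    # stub: (bar_index, top_pixel_y) pairs from a bar-segmentation model
    return [(0, 200), (1, 140)]

def calibrate(ticks):
    """Fit a linear pixel -> value map from the first and last detected ticks."""
    (p0, v0), (p1, v1) = ticks[0], ticks[-1]
    scale = (v1 - v0) / (p1 - p0)
    return lambda px: v0 + (px - p0) * scale

def parse_chart(image):
    """Compose the specialists; each stage is small, testable, and swappable."""
    to_value = calibrate(detect_axis_ticks(image))
    return {idx: to_value(top) for idx, top in detect_bars(image)}
```

One practical upside of this decomposition over an end-to-end model: when extraction goes wrong, you can tell whether the axis detector, the OCR, or the bar segmenter failed, and retrain only that component.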
- [D] Top ICLR 2026 Papers Found with Fake Citations — Even Reviewers Missed Them by /u/anikpramanikcse (Machine Learning) on December 6, 2025 at 10:18 pm
50 new hallucinated citations were found in ICLR 2026 submissions after scanning only 300 of them. Some of the papers are top-tier, likely orals (scores of 8+), and others have very high scores. The fabricated citations were missed by all 3-4+ reviewers. https://gptzero.me/news/iclr-2026/ Please bring this to the attention of the program committee of ICLR.
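One cheap mechanical defense reviewers or ACs could run: extract citation identifiers from a bibliography and check them against a trusted index, flagging anything unresolvable for manual inspection. A minimal sketch for arXiv IDs only; a real checker would query the arXiv or Semantic Scholar APIs and also match titles and authors rather than rely on a local set.

```python
import re

# New-style arXiv identifiers, e.g. arXiv:1706.03762
ARXIV_ID = re.compile(r"arXiv:(\d{4}\.\d{4,5})")

def extract_arxiv_ids(bibliography_text):
    """Pull every arXiv identifier mentioned in a bibliography string."""
    return ARXIV_ID.findall(bibliography_text)

def flag_unverified(bibliography_text, known_ids):
    """IDs absent from a trusted index are candidates for manual checking."""
    return [i for i in extract_arxiv_ids(bibliography_text) if i not in known_ids]
```

This obviously only catches identifiers that don't resolve; a fabricated citation that reuses a real ID with a made-up title would need title-level matching to detect. (The second ID in the test below is deliberately fake.)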
![[P] AI Learns to Play StarFox (Snes) (Deep Reinforcement Learning)](https://external-preview.redd.it/C46xle5q6RkJWjDssNizcgzeBvnj_7mx2-og04Youuk.jpeg?width=320&crop=smart&auto=webp&s=180ced7b07933e603a7122fa9c94e906eddabe74)
![[P] Fully Determined Contingency Races as Proposed Benchmark](https://preview.redd.it/yg26nw4kes5g1.png?width=640&crop=smart&auto=webp&s=6cb93a441b1ba6b577bdf5719e8f89a3d4230151)