DjamgaMind: Audio Intelligence for the C-Suite (Daily AI News, Energy, Healthcare, Finance)
Full-Stack AI Intelligence. Zero Noise.The definitive audio briefing for the C-Suite and AI Architects. From Daily News and Strategic Deep Dives to high-density Industrial & Regulatory Intelligence—decoded at the speed of the AI era. . 👉 Start your specialized audio briefing today at Djamgamind.com
AI Jobs and Career
I wanted to share an exciting opportunity for those of you looking to advance your careers in the AI space. You know how rapidly the landscape is evolving, and finding the right fit can be a challenge. That's why I'm excited about Mercor – they're a platform specifically designed to connect top-tier AI talent with leading companies. Whether you're a data scientist, machine learning engineer, or something else entirely, Mercor can help you find your next big role. If you're ready to take the next step in your AI career, check them out through my referral link: https://work.mercor.com/?referralCode=82d5f4e3-e1a3-4064-963f-c197bb2c8db1. It's a fantastic resource, and I encourage you to explore the opportunities they have available.
- Full Stack Engineer [$150K-$220K]
- Software Engineer, Tooling & AI Workflow, Contract [$90/hour]
- DevOps Engineer, India, Contract [$90/hour]
- More AI Jobs Opportunitieshere
| Job Title | Status | Pay |
|---|---|---|
| Full-Stack Engineer | Strong match, Full-time | $150K - $220K / year |
| Developer Experience and Productivity Engineer | Pre-qualified, Full-time | $160K - $300K / year |
| Software Engineer - Tooling & AI Workflows (Contract) | Contract | $90 / hour |
| DevOps Engineer (India) | Full-time | $20K - $50K / year |
| Senior Full-Stack Engineer | Full-time | $2.8K - $4K / week |
| Enterprise IT & Cloud Domain Expert - India | Contract | $20 - $30 / hour |
| Senior Software Engineer | Contract | $100 - $200 / hour |
| Senior Software Engineer | Pre-qualified, Full-time | $150K - $300K / year |
| Senior Full-Stack Engineer: Latin America | Full-time | $1.6K - $2.1K / week |
| Software Engineering Expert | Contract | $50 - $150 / hour |
| Generalist Video Annotators | Contract | $45 / hour |
| Generalist Writing Expert | Contract | $45 / hour |
| Editors, Fact Checkers, & Data Quality Reviewers | Contract | $50 - $60 / hour |
| Multilingual Expert | Contract | $54 / hour |
| Mathematics Expert (PhD) | Contract | $60 - $80 / hour |
| Software Engineer - India | Contract | $20 - $45 / hour |
| Physics Expert (PhD) | Contract | $60 - $80 / hour |
| Finance Expert | Contract | $150 / hour |
| Designers | Contract | $50 - $70 / hour |
| Chemistry Expert (PhD) | Contract | $60 - $80 / hour |
What are the top 3 methods used to find Autoregressive Parameters in Data Science?
In order to find autoregressive parameters, you will first need to understand what autoregression is. Autoregression is a statistical method used to create a model that describes data as a function of linear regression of lagged values of the dependent variable. In other words, it is a model that uses past values of a dependent variable in order to predict future values of the same dependent variable.
In time series analysis, autoregression is the use of previous values in a time series to predict future values. In other words, it is a form of regression where the dependent variable is forecasted using a linear combination of past values of the independent variable. The parameter values for the autoregression model are estimated using the method of least squares.
The autoregressive parameters are the coefficients in the autoregressive model. These coefficients can be estimated in a number of ways, including ordinary least squares (OLS), maximum likelihood (ML), or least squares with L1 regularization (LASSO). Once estimated, the autoregressive parameters can be used to predict future values of the dependent variable.
To find the autoregressive parameters, you need to use a method known as least squares regression. This method finds the parameters that minimize the sum of the squared residuals. The residual is simply the difference between the predicted value and the actual value. So, in essence, you are finding the parameters that best fit the data.

How to Estimate Autoregressive Parameters?
There are three main ways to estimate autoregressive parameters: ordinary least squares (OLS), maximum likelihood (ML), or least squares with L1 regularization (LASSO).
Ordinary Least Squares: Ordinary least squares is the simplest and most common method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values.
Maximum Likelihood: Maximum likelihood is another common method for estimating autoregressive parameters. This method estimates the parameters by maximizing the likelihood function. The likelihood function is a mathematical function that quantifies the probability of observing a given set of data given certain parameter values.
Least Squares with L1 Regularization: Least squares with L1 regularization is another method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values while also penalizing models with many parameters. L1 regularization penalizes models by adding an extra term to the error function that is proportional to the sum of absolute values of the estimator coefficients.
Finding Autoregressive Parameters: The Math Behind It
To find the autoregressive parameters using least squares regression, you first need to set up your data in a certain way. You need to have your dependent variable in one column and your independent variables in other columns. For example, let’s say you want to use three years of data to predict next year’s sales (the dependent variable). Your data would look something like this:
| Year | Sales |
|——|——-|
| 2016 | 100 |
| 2017 | 150 |
| 2018 | 200 |
AI-Powered Professional Certification Quiz Platform
Web|iOs|Android|Windows
Are you passionate about AI and looking for your next career challenge? In the fast-evolving world of artificial intelligence, connecting with the right opportunities can make all the difference. We're excited to recommend Mercor, a premier platform dedicated to bridging the gap between exceptional AI professionals and innovative companies.
Whether you're seeking roles in machine learning, data science, or other cutting-edge AI fields, Mercor offers a streamlined path to your ideal position. Explore the possibilities and accelerate your AI career by visiting Mercor through our exclusive referral link:
Find Your AI Dream Job on Mercor
Your next big opportunity in AI could be just a click away!
Next, you need to calculate the means for each column. For our sales example, that would look like this:
$$ \bar{Y} = \frac{100+150+200}{3} = 150$$
AI- Powered Jobs Interview Warmup For Job Seekers

⚽️Comparative Analysis: Top Calgary Amateur Soccer Clubs – Outdoor 2025 Season (Kids' Programs by Age Group)
Now we can calculate each element in what’s called the variance-covariance matrix:
$$ \operatorname {Var} (X)=\sum _{i=1}^{n}\left({x_{i}}-{\bar {x}}\right)^{2} $$
AI Jobs and Career
And before we wrap up today's AI news, I wanted to share an exciting opportunity for those of you looking to advance your careers in the AI space. You know how rapidly the landscape is evolving, and finding the right fit can be a challenge. That's why I'm excited about Mercor – they're a platform specifically designed to connect top-tier AI talent with leading companies. Whether you're a data scientist, machine learning engineer, or something else entirely, Mercor can help you find your next big role. If you're ready to take the next step in your AI career, check them out through my referral link: https://work.mercor.com/?referralCode=82d5f4e3-e1a3-4064-963f-c197bb2c8db1. It's a fantastic resource, and I encourage you to explore the opportunities they have available.
and
$$ \operatorname {Cov} (X,Y)=\sum _{i=1}^{n}\left({x_{i}}-{\bar {x}}\right)\left({y_{i}}-{\bar {y}}\right) $$
Invest in your future today by enrolling in this Azure Fundamentals - Pass the Azure Fundamentals Exam with Ease: Master the AZ-900 Certification with the Comprehensive Exam Preparation Guide!
- AWS Certified AI Practitioner (AIF-C01): Conquer the AWS Certified AI Practitioner exam with our AI and Machine Learning For Dummies test prep. Master fundamental AI concepts, AWS AI services, and ethical considerations.
- Azure AI Fundamentals: Ace the Azure AI Fundamentals exam with our comprehensive test prep. Learn the basics of AI, Azure AI services, and their applications.
- Google Cloud Professional Machine Learning Engineer: Nail the Google Professional Machine Learning Engineer exam with our expert-designed test prep. Deepen your understanding of ML algorithms, models, and deployment strategies.
- AWS Certified Machine Learning Specialty: Dominate the AWS Certified Machine Learning Specialty exam with our targeted test prep. Master advanced ML techniques, AWS ML services, and practical applications.
- AWS Certified Data Engineer Associate (DEA-C01): Set yourself up for promotion, get a better job or Increase your salary by Acing the AWS DEA-C01 Certification.
For our sales example, that calculation would look like this:
$$ \operatorname {Var} (Y)=\sum _{i=1}^{3}\left({y_{i}}-{\bar {y}}\right)^{2}=(100-150)^{2}+(150-150)^{2}+(200-150)^{2})=2500 $$
and
$$ \operatorname {Cov} (X,Y)=\sum _{i=1}^{3}\left({x_{i}}-{\bar {x}}\right)\left({y_{i}}-{\bar {y}}\right)=(2016-2017)(100-150)+(2017-2017)(150-150)+(2018-2017)(200-150))=-500 $$
Now we can finally calculate our autoregressive parameters! We do that by solving this equation:
$$ \hat {\beta }=(X^{\prime }X)^{-1}X^{\prime }Y=\frac {1}{2500}\times 2500\times (-500)=0.20 $$\.20 . That’s it! Our autoregressive parameter is 0\.20 . Once we have that parameter, we can plug it into our autoregressive equation:
$$ Y_{t+1}=0\.20 Y_t+a_1+a_2+a_3footnote{where $a_1$, $a_2$, and $a_3$ are error terms assuming an AR(3)} .$$ And that’s how you solve for autoregressive parameters! Of course, in reality you would be working with much larger datasets, but the underlying principles are still the same. Once you have your autoregressive parameters, you can plug them into the equation and start making predictions!.
Which Method Should You Use?
The estimation method you should use depends on your particular situation and goals. If you are looking for simple and interpretable results, then Ordinary Least Squares may be the best method for you. If you are looking for more accurate predictions, then Maximum Likelihood or Least Squares with L1 Regularization may be better methods for you.
Autoregressive models STEP BY STEP:
1) Download data: The first step is to download some data. This can be done by finding a publicly available dataset or by using your own data if you have any. For this example, we will be using data from the United Nations Comtrade Database.
2) Choose your variables: Once you have your dataset, you will need to choose the variables you want to use in your autoregression model. In our case, we will be using the import and export values of goods between countries as our independent variables.
3) Estimate your model: After choosing your independent variables, you can estimate your autoregression model using the method of least squares. OLS estimation can be done in many statistical software packages such as R or STATA.
4) Interpret your results: Once you have estimated your model, it is important to interpret the results in order to understand what they mean. The coefficients represent the effect that each independent variable has on the dependent variable. In our case, the coefficients represent the effect that imports and exports have on trade balance. A positive coefficient indicates that an increase in the independent variable leads to an increase in the dependent variable while a negative coefficient indicates that an increase in the independent variable leads to a decrease in the dependent variable.
5)Make predictions: Finally, once you have interpreted your results, you can use your autoregression model to make predictions about future values of the dependent variable based on past values of the independent variables.
Conclusion: In this blog post, we have discussed what autoregression is and how to find autoregressive parameters.
Estimating an autoregression model is a relatively simple process that can be done in many statistical software packages such as R or STATA.
In statistics and machine learning, autoregression is a modeling technique used to describe the linear relationship between a dependent variable and one more independent variables. To find the autoregressive parameters, you can use a method known as least squares regression which minimizes the sum of squared residuals. This blog post also explains how to set up your data for calculating least squares regression as well as how to calculate Variance and Covariance before finally calculating your autoregressive parameters. After finding your parameters you can plug them into an autoregressive equation to start making predictions about future events!
We have also discussed three different methods for estimating those parameters: Ordinary Least Squares, Maximum Likelihood, and Least Squares with L1 Regularization. The appropriate estimation method depends on your particular goals and situation.

Machine Learning For Dummies App
Machine Learning For Dummies on iOs: https://apps.apple.com/
Machine Learning For Dummies on Windows: https://www.
Machine Learning For Dummies Web/Android on Amazon: https://www.amazon.
What are some good datasets for Data Science and Machine Learning?
Machine Learning Engineer Interview Questions and Answers
Machine Learning Breaking News
Transformer – Machine Learning Models
Machine Learning – Software Classification
Autoregressive Model
Autoregressive generative models can estimate complex continuous data distributions such as trajectory rollouts in an RL environment, image intensities, and audio. Traditional techniques discretize continuous data into various bins and approximate the continuous data distribution using categorical distributions over the bins. This approximation is parameter inefficient as it cannot express abrupt changes in density without using a significant number of additional bins. Adaptive Categorical Discretization (ADACAT) is proposed in this paper as a parameterization of 1-D conditionals that is expressive, parameter efficient, and multimodal. A vector of interval widths and masses is used to parameterize the distribution known as ADACAT. Figure 1 showcases the difference between the traditional uniform categorical discretization approach with the proposed ADACAT.
Each component of the ADACAT distribution has non-overlapping support, making it a specific subfamily of mixtures of uniform distributions. ADACAT generalizes uniformly discretized 1-D categorical distributions. The proposed architecture allows for variable bin widths and more closely approximates the modes of two Gaussians mixture than a uniformly discretized categorical, making it highly expressive than the latter. Additionally, a distribution’s support is discretized using quantile-based discretization, which bins data into groups with similar measured data points. ADACAT uses deep autoregressive frameworks to factorize the joint density into numerous 1-D conditional ADACAT distributions in problems with more than one dimension.
Continue reading | Check out the paper and github link.
Pytorch – Computer Application
https://torchmetrics.readthedocs.io/en/stable//index.html
Best practices for training PyTorch model
What are some ways we can use machine learning and artificial intelligence for algorithmic trading in the stock market?
What are some good datasets for Data Science and Machine Learning?
Top 100 Data Science and Data Analytics and Data Engineering Interview Questions and Answers
Machine Learning Engineer Interview Questions and Answers
- Recent developments in LLM architectures, KV sharing, mHC, and compressed attentionby /u/rhiever (Data Science) on May 17, 2026 at 8:38 pm
submitted by /u/rhiever [link] [comments]
- #1 on memory benchmark LongMemEval with Gemini Flash, not Pro [R]by /u/j-m-k-s (Machine Learning) on May 17, 2026 at 5:44 pm
Disclosure: first author. Evaluation of an experimental memory retrieval system against LongMemEval (Wang et al., 2024). Figured the results might be of interest here, particularly the deliberate use of a smaller answering model to isolate retrieval quality from model capability. 96.4% at top-50 with Gemini 3 Flash. Comparative reported scores (all Gemini 3 Pro): Mem0 94.8%, Honcho 92.6%, HydraDB 90.79%, Supermemory 85.2%. Retrieval architecture draws on episodic memory theory (Tulving, 1972), reconstructive recall (Bartlett, 1932), and temporal context models (Howard & Kahana, 2002). Three design choices we think mattered: Query decomposition: parallel retrieval passes targeting distinct information needs. Critical for multi-session questions where no single query surfaces all relevant fragments. Temporal salience scoring: candidates scored on semantic similarity, lexical precision, and temporal salience, reflecting associative and recency factors in human recall (Polyn et al., 2009). Coherence re-ranking: re-ranked for cross-memory coherence and temporal chain resolution before presentation to the answering model. Methodology: forked Mem0's open-source benchmarking script, replaced storage and retrieval with our system, stripped all question-specific prompt templates. Single generic prompt, 500 questions. Category results at top-50: single-session (user) 98.6%, assistant 100%, preferences 96.7%, knowledge update 97.4%, multi-session 94.0%, temporal reasoning 95.5%. Limitations: single benchmark evaluation; architecture details intentionally limited; single model configuration, no ablations; production conditions (adversarial inputs, privacy, contradictory information) not tested. Above ~96% we hit evaluation ceiling effects: ambiguous questions, narrow expected answers, dataset inconsistencies. Some benchmark errors identified, which we reported upstream. Paper | Results | Answerer prompt Curious if others have explored similar cognitive-science-informed retrieval architectures for conversational memory. submitted by /u/j-m-k-s [link] [comments]
- Slop is making me feel disconnected from AI Research [D]by /u/Skye7821 (Machine Learning) on May 17, 2026 at 4:55 pm
Hello everyone. This is just a small rant on my part. I’m relatively young, a final year undergrad, and I’ve been interested in AI researcher since I was in high school. Over that period of time I feel there has been a significant shift in the landscape regarding the culture surrounding the research. While I’ve really enjoyed producing some interesting and creative work, I can’t help but feel that slowly the wave of low quality AI research and researchers are really making me feel frustrated. To just give a summary of what I and many others have seen: - Papers with hallucinated citations and even prompts contained in the papers - Papers with clearly misleading data that does not tell the whole picture. - Labs who have built a culture around quantity over quality, pumping out pubs, citing each other, and having all of the lab on each paper to inflate each students publication record. - Highschoolers…. Yes HIGHSCHOOLERS, becoming more common submitting at conferences that don’t really know what they are doing but paying a pretty penny to participate in “research programs” which are really just cash cows taking advantage of the fierce competition. See the post on the subreddit for more info. - Even the so called “top labs” producing work that is somewhat misleading or not fully representative. For instance see what happened recently with TurboQuant. - Research from “low tier institutions” being drowned out because they are not good for click baiting and farming views on LinkedIn and X, even if they are high quality. It’s… a lot I know. Of course these problems have been around for a long time, but I feel as if lately they have become more and more exacerbated. I originally felt that I was attached to AI research primarily for the creativity and freedom, but I feel that ironically AI itself has been a hindrance on the quality of work being published. Of course I don’t mean to say that all AI has been bad for ML research, I mean even I use it extensively to help me polish my writing and generate seaborn plots for my data, but that is very very different from just pumping out low quality cookie cutter work. Anyways, just wondering if anyone else shares similar thoughts. I know I’m relatively young here so maybe some of you have better insights into the broader trends over the decades. submitted by /u/Skye7821 [link] [comments]
- Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention [P]by /u/seraschka (Machine Learning) on May 17, 2026 at 1:41 pm
submitted by /u/seraschka [link] [comments]
- ML lead vs PM on eval-methodology layer independence. who's actually right here? [D]by /u/Critical_Builder_902 (Machine Learning) on May 17, 2026 at 9:16 am
got into an argument with our ML lead at 11pm yesterday about an eval methodology a PM had built off a framework she learned at an AI PM cohort. shes claiming a layered defense framework, hes saying the layers are statistically conditioned and her independence claim is wrong. they both have a point. the framework as taught at the cohort (it was Product Faculty's, fwiw) is genuinely useful for non-eng PMs. it forces explicit thinking about behavioral checks vs adversarial probes vs traditional metrics. but the way it's been taught in the abridged form makes the layers sound independent when they statistically arent. for ML/AI engineers here who've worked with non-eng PMs on production eval. how do you handle the gap between the simplified eval frameworks PMs learn and the actual statistical interactions in production? specifically interested in how you've negotiated the conversation with a PM who's ""done the cohort"" and shows up with a framework that's solid in its public form but has subtle issues in its statistical foundations. submitted by /u/Critical_Builder_902 [link] [comments]
- Program misleading high school students into paying to perform academic misconduct in ML Research [D]by /u/Marisu_BG (Machine Learning) on May 17, 2026 at 6:08 am
I was browsing OpenReview and I came accross this person called Kevin Zhu https://openreview.net/profile?id=~Kevin_Zhu3, lets say I was impressed when I saw 158 publications and 468 coauthors, and out of curiosity I searched up his afflication (https://algoverseairesearch.org/) Turns out it is a paid program, and most interesting it is marketed towards high school students. They have a whole column of papers listed as Neurips publications (their website states: 289 Algoverse Students Accepted to NeurIPS 2025). I was originally unware of the rigor of Neurips workshops and I was understandably very shocked. I skimmed through four of their papers one by one. Every single one had errors that would be caught by opening the PDF and reading it once. I am completely unsure how they are not caught by reviewers even at a workshop. https://openreview.net/forum?id=21pxWVRoPL - Appendix Tables 6.5 and 6.6 are supposed to report two different experimental conditions: "Stigma Negative" and "Stigma Positive." One measures what happens when the user pushes the model toward a negative association with a stigmatized group. The other measures the opposite direction. These are fundamentally different experiments, yet they have the exact same numbers in the results. There are typo in the Abstract section, their Related Works is within Results section. Citations are completely wrong, which I suspect to be AI generated. https://openreview.net/pdf?id=0BYRYwGCbK - broken prompts in a dataset that claims human review. The results say the opposite of the abstract. The abstract claims the work "reveals novel methods to elicit sycophancy." Then they proceed to show most modifiers perform about the same as the unmodified control (91-95% accuracy). Moreover, their citations also seem AI generated with false citations (wrong authors, wrong formats ..) Interestingly, undisclosed self-citation by Kevin Zhu. https://openreview.net/pdf?id=VcRUAT5G8I - Two foundational methods are attributed to the wrong paper. TIES merging and Task Arithmetic, two well known methods, was introduced but never cited. Same AI generated citations, I am not even going to get to the content anymore. https://openreview.net/pdf?id=It7AgR3A9H - eleven authors, zero contribution. Four papers, that I RANDOMLY CLICKED ON WITH NO ORDER, all follow the same template take existing method -> run it with some variation, likely done by AI -> put Kevin Zhu as an author -> submit to workshop I am unsure how any of these bypass any form of peer review process, only today I learned how low the bar is for workshops. Why I am posting: It angers to me when you market this to high schoolers and tell them you can get into Stanford and MIT. A 16 year old look at this and say, if I pay $3,325, I can get a Neurip publication. Then they proceed to let them publish a paper clear errors. This is academic dishonesty, but I dont think the kids even know they are commiting it. Kevin Zhu puts his name on every single paper published, self-cite himself in these paper, and charge student $3,325. I wasn't fully aware of how much lighter the workshop review process is, and I really want to hear why this is. submitted by /u/Marisu_BG [link] [comments]
- Anyone from India attending EEML ? [D]by /u/Suhan_XD (Machine Learning) on May 16, 2026 at 4:08 pm
I got accepted into EEML and I’m a little confused about travel and stay. Has anyone else from India been accepted? Let’s connect! submitted by /u/Suhan_XD [link] [comments]
- Made and Published a Paper Comparing Analysis of CNN and Vision Transformer Architectures for Brain Tumor Detection [R]by /u/Mental-Climate5798 (Machine Learning) on May 16, 2026 at 3:47 pm
Hi everyone 😄 A while ago I worked on a project where I compared computer vision architectures on detecting and classifying brain tumors in brain MRI scans. I was looking for some feedback on the methodology and really anything else--just simple research stuff. This isn't meant to be some big paper but a small research project that I did as a high schooler. Here is the paper: zenodo.org/records/15973756 I appreciate any feedback! submitted by /u/Mental-Climate5798 [link] [comments]
- Do you agree with Judea that learning from data is not everything? [D]by /u/xTouny (Machine Learning) on May 16, 2026 at 2:46 pm
Link: Judea Pearl, 2011 ACM Turing Award Recipient (2:18:05) Quote: There is a limitation to that which people not everybody understand. I already mentioned a limitation that you have a hierarchy here and going from correlation to causation and from causation from causation to explanation or to imagination. It's hard for people especially in machine learning to grasp that wall the limitation of one layer where one layer ends and the other one begins. Why? Because of two things. Machine learning school of thought has two paradigms that they love everybody love. Number one tabula raza I don't want to get any opinion I don't want to get any preconceived knowledge I want to derive everything by myself let the computer learn it and you find the word learning overused .. The other handcuff is let's do it the way that the brain does it. So if it looks like neurons interacting, it's good. If it looks like knowledge coming from rule system, it's bad because it's man-made .. Now there's limitation to that. We can prove today that you cannot do certain things by looking at data and data only. It's not a matter of opinion. It's a matter of mathematical proof that you cannot you can look at people who take aspirin all day and people whether or not they have headache all day and you cannot prove that the aspirin is what causes the headache. In particular, Judea states: "It's not a matter of opinion. It's a matter of mathematical proof". So we have formal proof that there are fundamental limits of learning from data. Judea later in the interview states we have solutions to problems faced by the machine learning community; nonetheless they are not adopted because of hype. Discussion. Do you agree with Judea? submitted by /u/xTouny [link] [comments]
- For those in corporate roles, how do you all work with the non-technical areas you support?by /u/SkipGram (Data Science) on May 16, 2026 at 12:50 pm
I've spent the past few years at what feels like a somewhat dysfunctional company. Our Data Science and Engineering teams are very siloed away from the rest of the company, including the teams we support and build things for. IC individuals rarely interact with those requesting the work, and myself and many of my peers have the common challenge of needing to talk to the people who asked for what we're building, but we're often told no we can't go talk to them. This is one of our biggest pain points, and it makes it very difficult to know if I'm making the most sensible choices given the goals of the work. In the small amount of conversations I have been able to be in with our non-tech teams, it feels like there's this constant tension. Some of my team's 'vision' for the future feels more like changing another area's business strategy instead of using Data Science to support them with their actual stated strategy. Maybe these two things can work towards the same goals in the future, but from the small amount I've seen now, we're rowing in a different direction than the teams we're supposed to be helping, and I'm worried this will harm trust and the ability to influence in the future if there are places we want to suggest different ways of approaching a problem. I'm not in enough of the conversations I need to be in to have this context though. Is it like this at other companies? I know the economy and job market are pretty rough right now, but as I'm thinking about longer term decisions, I want a company where there's a functional relationship between business and technology and those of us building can actually speak to the people we're building for. Building the best technical solution doesn't matter if it doesn't actually help the people it's for, or have a way to be incorporated into current processes. I'm just not sure how to assess this from the outside or how common this is. submitted by /u/SkipGram [link] [comments]
- Backlash against Arxiv's proposed 1 year ban is genuinely perplexing. [D]by /u/NeighborhoodFatCat (Machine Learning) on May 16, 2026 at 8:30 am
Anyone else surprised at the enormous amount of backlash against Arxiv's proposed 1 year ban for authors and coauthors publishing papers with hallucinated reference and other obvious LLM/Gen AI artifacts? https://x.com/tdietterich/status/2055000956144935055 https://xcancel.com/tdietterich/status/2055000956144935055 Some of the responses: "This is the age of AI, Arxiv should be part of the movement instead of holding onto the old ways" "The P.I. is a macro-manager, not a micro-manager, can't be expected to read every reference that his/her student puts in." "I publish 20+ papers a year with my students, how do you expect me to read everything?" "What about teams with 100s of people? How can you expect the authors to check references?" "Who reads references in depth anyways!?" These responses are very revealing how academia works. Apparently people have just been slapping names on research papers they've never even read or fact-checked themselves. Very obscene! submitted by /u/NeighborhoodFatCat [link] [comments]
- KDD 2026 Cycle 2 Results [D]by /u/ATadDisappointed (Machine Learning) on May 16, 2026 at 4:07 am
Results for the research track have been released. submitted by /u/ATadDisappointed [link] [comments]
- ROCm with PyTorch and PyTorch Lightning seems to still suck for research [D]by /u/QuantumQuokka (Machine Learning) on May 16, 2026 at 12:01 am
So I asked about people's experiences with ROCm in a post a few weeks or so ago https://www.reddit.com/r/MachineLearning/comments/1t6cng3/rocm_status_in_mid_2026_d/ I actually went and procured a RX 7900XTX reference version to give it a try My discovery is that it kind of still sucks I have a small codebase for training flow matching models (SANA Architecture), which runs fine on my RTX3090s. But the moment I ported it across to ROCm it was NaNs absolutely everywhere. Forward passes were absolutely fine, but the moment you called backwards() all bets were off. The code was kept identical, apart from altering the pip environment to point to torch2.12 with ROCm7.2 instead of CUDA Trying everything from switching between bf16, fp32, to tweaking various environment variables yielded nothing. Unless there's some trick I'm missing, I get the feeling that ROCm is still seriously behind. I tried running the nanoGPT training script, which ran perfectly My intuition is that the ROCm people have probably tested their stack on established well known codebases. But, it's still remarkably fragile on even slightly uncommon code. submitted by /u/QuantumQuokka [link] [comments]
- No feeling quite lower than...by /u/MeLikaDoTheChaCha (Data Science) on May 15, 2026 at 10:49 pm
crushing the system design interview just to bomb the pandas-live coding interview even though you've been using pandas everyday for 10 years. If anyone wants feedback on how that feels like hmu. Anyone know if they sell kegs of Jager? Asking for a friend... submitted by /u/MeLikaDoTheChaCha [link] [comments]
- Struggling with Overfitting on Medical Imaging Task [D]by /u/Future-Structure-296 (Machine Learning) on May 15, 2026 at 8:16 pm
Hi everyone, I’m working on a 2-class classification problem (LCA vs. RCA coronary arteries) using 2D X-ray angiograms. I’m currently stuck in a cycle of extreme overfitting and could use some advice on my training strategy. The Setup: Dataset: Small (~900 training frames from ~300 unique DICOMs). Architecture: InceptionV3 (PyTorch). Input: Grayscale .npy arrays converted to 3-channel, resized to 299x299. Current Strategy: Transfer learning from ImageNet. I’ve tried full unfreezing and partial unfreezing (last blocks). The Problem: My training accuracy hits ~95-99% within a few epochs, but validation accuracy peaks early (around 74-79%) and then collapses toward 30-40% as the model starts memorizing the specific textures of the training patients. What I’ve Tried So Far: Normalization: Standard ImageNet mean/std (applied at load time). Class Weights: Handled 2:1 imbalance (LCA:RCA). Regularization: Added Dropout (tried 0.3 to 0.6) and Weight Decay (1e-4). Augmentation: Flips, 25deg rotations, and translation. Schedulers: ReduceLROnPlateau (factor 0.5, patience 8). Would love any insights or papers you'd recommend for small-sample medical classification. Thanks! submitted by /u/Future-Structure-296 [link] [comments]
- Publication Topics Questionby /u/InfamousTrouble7993 (Data Science) on May 15, 2026 at 6:02 pm
Hi, i am looking for topics to cover in a potential publication, as I will have a few months free time. The problem is, I am struggling to decide for a potential problem statement to focus on, to find a solution/get insights about it. I asked ai what kind of problems are covered in papers currently, but the response was not satisfying for me. Now I ask this in this com. Are you currently working on problems and know about additional problems to tackle? My experience fields: statistics/probability theory machine/deep learning natural language processing submitted by /u/InfamousTrouble7993 [link] [comments]
- Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion [R]by /u/Franck_Dernoncourt (Machine Learning) on May 15, 2026 at 5:21 pm
Paper: https://arxiv.org/abs/2605.12825 Code: https://github.com/chiennv2000/orthrus Disclosure: co-author. Idea: Inject a trainable diffusion attention module into each layer of a frozen AR Transformer. Both heads share one KV cache. Diffusion head projects K=32 tokens in parallel; AR head verifies in a second pass and accepts the longest matching prefix. Output distribution is provably identical to the base model. Results: Up to 7.8× TPF, ~6× wall-clock on MATH-500. 16% of params trained, <1B tokens, 24h on 8×H200. vs. diffusion LMs (Dream, Fast-dLLM-v2, SDAR, Mercury, Gemini Diffusion): they modify base weights and lose accuracy (Fast-dLLM-v2: -11 pts on MATH-500). Orthrus freezes the backbone; accuracy matches Qwen3-8B exactly. vs. Speculative Decoding (EAGLE-3, DFlash): No external drafter, no separate cache, and zero Time-To-First-Token (TTFT) penalty because we don't have to initialize and sync a separate drafter model. KV overhead is O(1) (~4.5 MiB flat). Acceptance length on MATH-500: 11.7 vs. 7.9 (DFlash) vs. 3.5 (EAGLE-3). Single-step denoising beats multi-step (6.35 vs. 3.53 TPF). KL distillation beats CE on acceptance rate. Limitations: strictly bounded by the frozen base model (inherits its biases, hallucinations, knowledge gaps); Qwen3-only evaluation; greedy + rejection sampling only. https://i.redd.it/5lsf6l5w4c1h1.gif submitted by /u/Franck_Dernoncourt [link] [comments]
- PINN is predicting trivial solution for stiff ODE [D]by /u/cae_shot (Machine Learning) on May 15, 2026 at 4:07 pm
I am learning physics informed neural networks. Currently, I am solving a simple second ODE (damped harmonic oscillator). The equation is m*d2y/dt2 + mu*dy/dt + k*y = 0 (bcs: y(t=0) = 1, y'(t=0) = 0). I managed to draft a code. The code works for k values upto 50. However, when increased the value beyond 50, PINN is predicting trivial solution. I tried several things: reducing the learning rate, increasing the data points, reusing the weights trained using lower k values, and using a for loop to increase the k value in smaller steps (step size 20). However, none of them helped. Could you help me with this. Thanks in advance. submitted by /u/cae_shot [link] [comments]
- Looking for a real world dataset (or website where i can find it) [P]by /u/novromeda (Machine Learning) on May 15, 2026 at 2:39 pm
Hi guys, I’m gonna do a data analysis project based on data privacy, bias and data interpretability. For this reason our professor asked for a real world dataset in order to analyze a real case. Additionally I would prefer the least anonymity possible for that dataset in order to create some interesting technique over it (differential privacy, k-anonimity exc…) Do you have any advice where to find the dataset? (links or website names) Because I checked on Kaggle but I don’t know how to find if the dataset is real or not submitted by /u/novromeda [link] [comments]
- software trying to catch software is officially a dead en [D]by /u/bebo117722 (Machine Learning) on May 15, 2026 at 2:36 pm
I feel like we've crossed a weird threshold in the generative AI space where the arms race against botnets is just over. and the bots won I was reading that interview recently where the Reddit CEO was floating the idea of using Face ID and Touch ID just to verify that commenters are actual humans. it honestly hit me how absurd things have gotten. standard heuristics and behavioral analysis are completely useless now against modern LLMs, and vision models solve captchas faster than I can. the dead internet theory is basically just our daily engineering reality at this point we are at a stage where the only reliable way to prove you aren't an automated script is to literally anchor your digital presence to your physical biology. From a purely technical standpoint, it’s fascinating seeing the shift toward hardware verification. like looking at the engineering behind that Orb device the idea of doing local biometric iris hashing on custom hardware just to output a zero-knowledge proof of personhood. It's wild that we actually need dedicated physical devices now just to enforce the concept of "one human, one account" it makes total sense why platforms are pushing for this, beacuse trying to build software firewalls against infinitely scalable AI agents is a losing battle. but it just feels like such a massive, permanent shift for how the internet works. idk, is anyone else working on sybil resistance right now? are we just collectively accepting that biometric hardware gates are the only way to save the web from being 99% synthetic noise? submitted by /u/bebo117722 [link] [comments]


![Made and Published a Paper Comparing Analysis of CNN and Vision Transformer Architectures for Brain Tumor Detection [R]](https://preview.redd.it/fgq8e0swsi1h1.png?width=640&crop=smart&auto=webp&s=1376d513ddfdeeb56ba9ad93366b037a322d2e87)
![Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion [R]](https://preview.redd.it/5lsf6l5w4c1h1.gif?frame=1&width=140&height=72&auto=webp&s=023b02ee925924f982e6eea09bbeefbe039d00ab)























96DRHDRA9J7GTN6