What are the top 3 methods used to find Autoregressive Parameters in Data Science?


DjamgaMind: Audio Intelligence for the C-Suite (Energy, Healthcare, Finance)

Are you drowning in dense legal text? DjamgaMind is the new audio intelligence platform that turns 100-page healthcare or Energy mandates into 5-minute executive briefings. Whether you are navigating Bill C-27 (Canada) or the CMS-0057-F Interoperability Rule (USA), our AI agents decode the liability so you don’t have to. 👉 Start your specialized audio briefing today at Djamgamind.com


AI Jobs and Career

I wanted to share an exciting opportunity for those of you looking to advance your careers in the AI space. You know how rapidly the landscape is evolving, and finding the right fit can be a challenge. That's why I'm excited about Mercor – they're a platform specifically designed to connect top-tier AI talent with leading companies. Whether you're a data scientist, machine learning engineer, or something else entirely, Mercor can help you find your next big role. If you're ready to take the next step in your AI career, check them out through my referral link: https://work.mercor.com/?referralCode=82d5f4e3-e1a3-4064-963f-c197bb2c8db1. It's a fantastic resource, and I encourage you to explore the opportunities they have available.

| Job Title | Status | Pay |
|-----------|--------|-----|
| Full-Stack Engineer | Strong match, Full-time | $150K - $220K / year |
| Developer Experience and Productivity Engineer | Pre-qualified, Full-time | $160K - $300K / year |
| Software Engineer - Tooling & AI Workflows | Contract | $90 / hour |
| DevOps Engineer (India) | Full-time | $20K - $50K / year |
| Senior Full-Stack Engineer | Full-time | $2.8K - $4K / week |
| Enterprise IT & Cloud Domain Expert - India | Contract | $20 - $30 / hour |
| Senior Software Engineer | Contract | $100 - $200 / hour |
| Senior Software Engineer | Pre-qualified, Full-time | $150K - $300K / year |
| Senior Full-Stack Engineer: Latin America | Full-time | $1.6K - $2.1K / week |
| Software Engineering Expert | Contract | $50 - $150 / hour |
| Generalist Video Annotators | Contract | $45 / hour |
| Generalist Writing Expert | Contract | $45 / hour |
| Editors, Fact Checkers, & Data Quality Reviewers | Contract | $50 - $60 / hour |
| Multilingual Expert | Contract | $54 / hour |
| Mathematics Expert (PhD) | Contract | $60 - $80 / hour |
| Software Engineer - India | Contract | $20 - $45 / hour |
| Physics Expert (PhD) | Contract | $60 - $80 / hour |
| Finance Expert | Contract | $150 / hour |
| Designers | Contract | $50 - $70 / hour |
| Chemistry Expert (PhD) | Contract | $60 - $80 / hour |

What are the top 3 methods used to find Autoregressive Parameters in Data Science?

 In order to find autoregressive parameters, you will first need to understand what autoregression is. Autoregression is a statistical method used to create a model that describes data as a function of linear regression of lagged values of the dependent variable. In other words, it is a model that uses past values of a dependent variable in order to predict future values of the same dependent variable.

In time series analysis, autoregression is the use of previous values in a series to predict future values. In other words, it is a form of regression in which the dependent variable is forecast from a linear combination of its own past (lagged) values. The parameters of the autoregression model are typically estimated using the method of least squares.

The autoregressive parameters are the coefficients in the autoregressive model. These coefficients can be estimated in a number of ways, including ordinary least squares (OLS), maximum likelihood (ML), or least squares with L1 regularization (LASSO). Once estimated, the autoregressive parameters can be used to predict future values of the dependent variable.

The most common way to find the autoregressive parameters is least squares regression. This method finds the parameters that minimize the sum of squared residuals, where a residual is simply the difference between a predicted value and the actual value. So, in essence, you are finding the parameters that best fit the data.
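As a quick sketch of that idea, the snippet below simulates an AR(1) series and recovers its parameter by regressing each value on the previous one. The true coefficient of 0.7 and the series length are illustrative choices, not values from the text:

```python
import numpy as np

# Simulate an AR(1) process y_t = 0.7 * y_{t-1} + noise (0.7 is a toy choice).
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# Regress y_t on y_{t-1}: the fitted slope is the AR(1) parameter estimate.
X = np.column_stack([np.ones(499), y[:-1]])      # intercept column + lagged values
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
phi_hat = beta[1]
print(phi_hat)  # close to the true value 0.7
```

The residuals being minimized here are exactly `y[1:] - X @ beta`, matching the description above.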


How to Estimate Autoregressive Parameters?


There are three main ways to estimate autoregressive parameters: ordinary least squares (OLS), maximum likelihood (ML), and least squares with L1 regularization (LASSO).

Ordinary Least Squares: Ordinary least squares is the simplest and most common method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values.

Maximum Likelihood: Maximum likelihood is another common method for estimating autoregressive parameters. This method estimates the parameters by maximizing the likelihood function. The likelihood function is a mathematical function that quantifies the probability of observing a given set of data given certain parameter values.

Least Squares with L1 Regularization: Least squares with L1 regularization (LASSO) is another method for estimating autoregressive parameters. It estimates the parameters by minimizing the sum of squared errors between actual and predicted values while also penalizing models with many nonzero parameters. L1 regularization does this by adding an extra term to the error function that is proportional to the sum of the absolute values of the coefficients, which shrinks small coefficients toward zero.
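A minimal sketch contrasting two of these estimators on a toy series: closed-form OLS for an AR(3) fit, and LASSO solved by simple proximal gradient descent (ISTA). The simulated process, the penalty weight `lam`, and the iteration count are all illustrative assumptions, not prescribed values:

```python
import numpy as np

def soft(z, a):
    """Soft-thresholding operator, the proximal step for the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - a, 0.0)

rng = np.random.default_rng(1)
n, p = 600, 3
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + rng.normal()   # true process is AR(1), phi = 0.6 (toy choice)

# Lag matrix: column j holds the series shifted by lag j+1.
X = np.column_stack([y[p - j: n - j] for j in range(1, p + 1)])
target = y[p:]

# 1) Ordinary least squares: closed form via lstsq.
beta_ols, *_ = np.linalg.lstsq(X, target, rcond=None)

# 2) LASSO via ISTA: gradient step on the squared error, then soft-threshold.
lam = 40.0                                  # illustrative penalty weight
L = np.linalg.eigvalsh(X.T @ X).max()       # Lipschitz constant of the gradient
beta_l1 = np.zeros(p)
for _ in range(500):
    grad = X.T @ (X @ beta_l1 - target)
    beta_l1 = soft(beta_l1 - grad / L, lam / L)

print("OLS:  ", np.round(beta_ols, 3))
print("LASSO:", np.round(beta_l1, 3))       # lag-1 kept, higher lags shrunk toward 0
```

Since the data really come from an AR(1), the LASSO fit should keep the lag-1 coefficient and shrink the spurious lag-2 and lag-3 coefficients, illustrating the "penalize extra parameters" behavior described above.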

Finding Autoregressive Parameters: The Math Behind It
To find the autoregressive parameters using least squares regression, you first need to set up your data in a certain way. You need to have your dependent variable in one column and your independent variables in other columns. For example, let’s say you want to use three years of data to predict next year’s sales (the dependent variable). Your data would look something like this:

| Year | Sales |
|------|-------|
| 2016 | 100 |
| 2017 | 150 |
| 2018 | 200 |


Next, you need to calculate the means for each column. For our sales example, that would look like this:

$$ \bar{X} = \frac{2016+2017+2018}{3} = 2017 \qquad \bar{Y} = \frac{100+150+200}{3} = 150$$

Now we can calculate each element in what’s called the variance-covariance matrix:

$$ \operatorname {Var} (X)=\sum _{i=1}^{n}\left({x_{i}}-{\bar {x}}\right)^{2} $$


and

$$ \operatorname {Cov} (X,Y)=\sum _{i=1}^{n}\left({x_{i}}-{\bar {x}}\right)\left({y_{i}}-{\bar {y}}\right) $$

For our sales example, that calculation would look like this:

$$ \operatorname {Var} (X)=\sum _{i=1}^{3}\left({x_{i}}-{\bar {x}}\right)^{2}=(2016-2017)^{2}+(2017-2017)^{2}+(2018-2017)^{2}=2 $$

and

$$ \operatorname {Cov} (X,Y)=\sum _{i=1}^{3}\left({x_{i}}-{\bar {x}}\right)\left({y_{i}}-{\bar {y}}\right)=(2016-2017)(100-150)+(2017-2017)(150-150)+(2018-2017)(200-150)=100 $$

Now we can finally calculate our autoregressive parameters! We do that by solving this equation:


$$ \hat {\beta }=\frac{\operatorname {Cov} (X,Y)}{\operatorname {Var} (X)}=\frac{100}{2}=50 $$

That's it! Our estimated coefficient is 50, meaning sales grow by about 50 units per year. Once we have that parameter, we can plug it into the fitted equation:

$$ \hat{Y} = \bar{Y} + \hat{\beta}\left(X-\bar{X}\right) = 150 + 50\,(2019-2017) = 250 $$

And that's how you solve for the parameters with least squares. (Strictly speaking, this example regresses sales on the year, which is a trend model; in a true autoregression the predictor would be a lagged value of sales itself, for example $Y_{t+1}=\phi Y_{t}+\varepsilon_{t}$ in an AR(1) model, but the least-squares mechanics are identical.) Of course, in reality you would be working with much larger datasets, but the underlying principles are still the same. Once you have your parameters, you can plug them into the equation and start making predictions!
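The arithmetic in this example can be double-checked in a few lines of numpy (the variable names below are just for this sketch):

```python
import numpy as np

years = np.array([2016.0, 2017.0, 2018.0])
sales = np.array([100.0, 150.0, 200.0])

x_bar, y_bar = years.mean(), sales.mean()          # means: 2017.0 and 150.0
sxx = np.sum((years - x_bar) ** 2)                 # sum of squared deviations of X
sxy = np.sum((years - x_bar) * (sales - y_bar))    # sum of cross products
beta = sxy / sxx                                   # least squares slope

prediction_2019 = y_bar + beta * (2019 - x_bar)    # plug the slope into the fitted line
print(beta, prediction_2019)
```

Running this reproduces the sums of squares and cross products, the slope, and the one-step-ahead prediction for 2019.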

Which Method Should You Use?
The estimation method you should use depends on your particular situation and goals. If you are looking for simple and interpretable results, then Ordinary Least Squares may be the best method for you. If you are looking for more accurate predictions, then Maximum Likelihood or Least Squares with L1 Regularization may be better methods for you.

Autoregressive models STEP BY STEP:

1) Download data: The first step is to download some data. This can be done by finding a publicly available dataset or by using your own data if you have any. For this example, we will be using data from the United Nations Comtrade Database.

2) Choose your variables: Once you have your dataset, you will need to choose the variables you want to use in your autoregression model. In our case, we will be using the import and export values of goods between countries as our independent variables.

3) Estimate your model: After choosing your independent variables, you can estimate your autoregression model using the method of least squares. OLS estimation can be done in many statistical software packages such as R or STATA.

4) Interpret your results: Once you have estimated your model, it is important to interpret the results in order to understand what they mean. The coefficients represent the effect that each independent variable has on the dependent variable. In our case, the coefficients represent the effect that imports and exports have on trade balance. A positive coefficient indicates that an increase in the independent variable leads to an increase in the dependent variable while a negative coefficient indicates that an increase in the independent variable leads to a decrease in the dependent variable.

5) Make predictions: Finally, once you have interpreted your results, you can use your autoregression model to make predictions about future values of the dependent variable based on past values of the independent variables.
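Putting the steps above together, here is a minimal end-to-end sketch. Since the Comtrade download requires an account, a simulated series stands in for steps 1 and 2, and `fit_ar` / `forecast` are hypothetical helper names for this sketch only:

```python
import numpy as np

def fit_ar(y, p):
    """Step 3: estimate AR(p) coefficients (plus intercept) by least squares."""
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - j: len(y) - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return beta  # [intercept, lag-1 coef, ..., lag-p coef]

def forecast(y, beta, steps):
    """Step 5: iterate the fitted AR equation forward to make predictions."""
    p = len(beta) - 1
    hist = list(y[-p:])
    out = []
    for _ in range(steps):
        nxt = beta[0] + sum(beta[j] * hist[-j] for j in range(1, p + 1))
        hist.append(nxt)
        out.append(nxt)
    return out

# Steps 1-2 stand-in: simulate an AR(2) series instead of downloading data.
rng = np.random.default_rng(42)
y = np.zeros(400)
for t in range(2, 400):
    y[t] = 0.5 * y[t - 1] + 0.2 * y[t - 2] + rng.normal()

beta = fit_ar(y, p=2)                 # Step 3: estimate
preds = forecast(y, beta, steps=5)    # Step 5: predict five periods ahead
print(np.round(beta, 2))              # Step 4: the lag coefficients are the effects
```

The printed coefficients are interpreted exactly as in step 4: each one is the estimated effect of that lagged value on the current value.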

Conclusion: In this blog post, we have discussed what autoregression is and how to find autoregressive parameters. 

Estimating an autoregression model is a relatively simple process that can be done in many statistical software packages such as R or STATA.

In statistics and machine learning, autoregression is a modeling technique that describes the linear relationship between a variable and its own past values. To find the autoregressive parameters, you can use least squares regression, which minimizes the sum of squared residuals. This blog post also explains how to set up your data for least squares regression and how to calculate the variance and covariance terms before finally computing your autoregressive parameters. After finding your parameters, you can plug them into the autoregressive equation to start making predictions about future values!

We have also discussed three different methods for estimating those parameters: Ordinary Least Squares, Maximum Likelihood, and Least Squares with L1 Regularization. The appropriate estimation method depends on your particular goals and situation.

Machine Learning For Dummies

Machine Learning For Dummies App

Machine Learning For Dummies on iOS: https://apps.apple.com/us/app/machinelearning-for-dummies-p/id1610947211

Machine Learning For Dummies on Windows: https://www.microsoft.com/en-ca/p/machinelearning-for-dummies-ml-ai-ops-on-aws-azure-gcp/9p6f030tb0mt?

Machine Learning For Dummies Web/Android on Amazon: https://www.amazon.com/gp/product/B09TZ4H8V6

What are some good datasets for Data Science and Machine Learning?

Machine Learning Engineer Interview Questions and Answers

Machine Learning Breaking News 

Transformer – Machine Learning Models

transformer neural network

Machine Learning – Software Classification

Autoregressive Model

Autoregressive generative models can estimate complex continuous data distributions, such as trajectory rollouts in an RL environment, image intensities, and audio. Traditional techniques discretize continuous data into bins and approximate the continuous distribution using categorical distributions over those bins. This approximation is parameter-inefficient, as it cannot express abrupt changes in density without using a significant number of additional bins. Adaptive Categorical Discretization (ADACAT) is proposed in this paper as a parameterization of 1-D conditionals that is expressive, parameter-efficient, and multimodal: the distribution is parameterized by a vector of interval widths and masses. Figure 1 showcases the difference between the traditional uniform categorical discretization approach and the proposed ADACAT.

Each component of the ADACAT distribution has non-overlapping support, making it a specific subfamily of mixtures of uniform distributions, and ADACAT generalizes uniformly discretized 1-D categorical distributions. Because the parameterization allows variable bin widths, it approximates the modes of a mixture of two Gaussians more closely than a uniformly discretized categorical does, making it more expressive. Additionally, a distribution's support can be discretized using quantile-based discretization, which bins the data into groups containing similar numbers of data points. In problems with more than one dimension, ADACAT uses deep autoregressive frameworks to factorize the joint density into many 1-D conditional ADACAT distributions.
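To make the width-and-mass parameterization concrete, here is a toy 1-D numpy sketch (not the paper's implementation): a density built from non-overlapping uniform components, where a narrow bin with large mass expresses a sharp mode that fixed-width bins would need many extra bins to capture. The specific widths and masses are illustrative:

```python
import numpy as np

def adacat_pdf(x, widths, masses):
    """Density of a mixture of non-overlapping uniforms on [0, 1].

    widths and masses are positive vectors that each sum to 1, mirroring
    the interval-width / interval-mass parameterization (toy version).
    """
    edges = np.concatenate([[0.0], np.cumsum(widths)])
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(widths) - 1)
    return masses[idx] / widths[idx]   # constant (uniform) density within each bin

widths = np.array([0.6, 0.1, 0.3])    # adaptive: one narrow bin for a sharp mode
masses = np.array([0.2, 0.6, 0.2])

x = np.linspace(0.0, 0.999, 1000)
pdf = adacat_pdf(x, widths, masses)
total = float(np.sum(pdf) * (x[1] - x[0]))   # Riemann check: mass integrates to ~1
print(total, pdf.max())
```

The peak density is `masses[1] / widths[1]`, so shrinking the middle bin's width while keeping its mass sharpens the mode, which is exactly the expressiveness a uniform discretization lacks.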

Continue reading | Check out the paper and github link.

Pytorch – Computer Application

https://torchmetrics.readthedocs.io/en/stable//index.html

Best practices for training PyTorch model

What are some ways we can use machine learning and artificial intelligence for algorithmic trading in the stock market?

What are some good datasets for Data Science and Machine Learning?

Top 100 Data Science and Data Analytics and Data Engineering Interview Questions and Answers

Machine Learning Engineer Interview Questions and Answers

  • ISBI 2026: Results Out [D]
    by /u/ade17_in (Machine Learning) on January 15, 2026 at 7:02 am

    Results out for ISBI 2026 - London a few days back. Just want to check with fellow medical imaging peeps on how did it go for all. Results were delayed by a month and I see a pretty high acceptance rate this time. submitted by /u/ade17_in [link] [comments]

  • SQL performance training question
    by /u/idan_huji (Data Science) on January 15, 2026 at 6:26 am

    submitted by /u/idan_huji [link] [comments]

  • Google DS interview
    by /u/No-Mud4063 (Data Science) on January 15, 2026 at 2:34 am

    Have a Google Sr. DS interview coming up in a month. Has anyone taken it? tips? submitted by /u/No-Mud4063 [link] [comments]

  • Nvidia: End-to-End Test-Time Training for Long Context aka Being Able To Update A Model's Weights In Real-Time As You Use It | "TTT changes the paradigm from retrieving info to learning it on the fly...the TTT model treats the context window as a dataset & trains itself on it in real-time." [R]
    by /u/44th--Hokage (Machine Learning) on January 15, 2026 at 1:43 am

    TL;DR: The paper describes a mechanism that essentially turns the context window into a training dataset for a "fast weight" update loop: Inner Loop: The model runs a mini-gradient descent on the context during inference. It updates specific MLP layers to "learn" the current context. Outer Loop: The model's initial weights are meta-learned during training to be "highly updateable" or optimized for this test-time adaptation From the Paper: "Overall, our empirical observations strongly indicate that TTT-E2E should produce the same trend as full attention for scaling with training compute in large-budget production runs." Abstract: We formulate long-context language modeling as a problem in continual learning rather than architecture design. Under this formulation, we only use a standard architecture a Transformer with sliding-window attention. However, our model continues learning at test time via next-token prediction on the given context, compressing the context it reads into its weights. In addition, we improve the model's initialization for learning at test time via meta-learning at training time. Overall, our method, a form of Test-Time Training (TTT), is End-to-End (E2E) both at test time (via next-token prediction) and training time (via meta-learning), in contrast to previous forms. We conduct extensive experiments with a focus on scaling properties. In particular, for 3B models trained with 164B tokens, our method (TTT-E2E) scales with context length in the same way as Transformer with full attention, while others, such as Mamba 2 and Gated DeltaNet, do not. However, similar to RNNs, TTT-E2E has constant inference latency regardless of context length, making it 2.7x faster than full attention for 128K context. Our code is publicly available. Layman's Explanation: Think of this paper as solving the memory bottleneck by fundamentally changing how a model processes information. Imagine you are taking a massive open-book exam. 
A standard Transformer (like GPT-4) is the student who frantically re-reads every single page of the textbook before answering every single question. This strategy guarantees they find the specific details (perfect recall), but as the textbook gets thicker, they get exponentially slower until they simply cannot finish the test in time. On the other hand, alternatives like RNNs or Mamba try to summarize the entire textbook onto a single index card. They can answer questions instantly because they don't have to look back at the book, but for long, complex subjects, they eventually run out of space on the card and start forgetting crucial information. This new method, Test-Time Training (TTT), changes the paradigm from retrieving information to learning it on the fly. Instead of re-reading the book or summarizing it onto a card, the TTT model treats the context window as a dataset and actually trains itself on it in real-time. It performs a mini-gradient descent update on its own neural weights as it reads. This is equivalent to a student who reads the textbook and physically rewires their brain to master the subject matter before the test. Because the information is now compressed into the model's actual intelligence (its weights) rather than a temporary cache, the model can answer questions instantly (matching the constant speed of the fast index-card models) but with the high accuracy and scaling capability of the slow, page-turning Transformers. This effectively decouples intelligence from memory costs, allowing for massive context lengths without the usual slowdown. Link to the Paper: https://arxiv.org/pdf/2512.23675 Link to the Open-Sourced Official Implementation of End-to-End Test Time Training for Long Context: https://github.com/test-time-training/e2e submitted by /u/44th--Hokage [link] [comments]

  • Does anyone know how hard it is to work with the All of Us database?
    by /u/phymathnerd (Data Science) on January 15, 2026 at 12:04 am

    I have limited python proficiency but I can code well with R. I want to design a project that’ll require me to collect patient data from the All of Us database. Does this sound like an unrealistic plan with my limited python proficiency? submitted by /u/phymathnerd [link] [comments]

  • [P] Provider outages are more common than you'd think - here's how we handle them
    by /u/dinkinflika0 (Machine Learning) on January 14, 2026 at 9:04 pm

    I Work on Bifrost (been posting a lot here lol) and wanted to share what we learned building multi-provider routing, since it's messier than it seems. Github: https://github.com/maximhq/bifrost Initially thought weighted routing would be the main thing - like send 80% of traffic to Azure, 20% to OpenAI. Pretty straightforward. Configure weights, distribute requests proportionally, done. But production is messier. Providers go down regionally. Rate limits hit unexpectedly. Azure might be healthy in US-East but degraded in EU-West. Or you hit your tier limit mid-day and everything starts timing out. So we built automatic fallback chains. When you configure multiple providers on a virtual key, Bifrost sorts them by weight and creates fallbacks automatically. Primary request goes to Azure, fails, immediately retries with OpenAI. Happens transparently - your app doesn't see it. The health monitoring part was interesting. We track success rates, response times, error patterns per provider. When issues get detected, requests start routing to backup providers within milliseconds. No manual intervention needed. Also handles rate limits differently now. If a provider hits TPM/RPM limits, it gets excluded from routing temporarily while other providers stay available. Prevents cascading failures. One thing that surprised us - weighted routing alone isn't enough. You need adaptive load balancing that actually looks at real-time metrics (latency, error rates, throughput) and adjusts on the fly. Static weights don't account for degradation. The tricky part was making failover fast enough that it doesn't add noticeable latency. Had to optimize connection pooling, timeout handling, and how we track provider health. how are you folks handling multi-provider routing in production. Static configs? Manual switching? Something else? submitted by /u/dinkinflika0 [link] [comments]

  • Spine surgery has massive decision variability. Retrospective ML won’t fix it. Curious if a workflow-native, outcome-driven approach could. [D]
    by /u/LaniakeaResident (Machine Learning) on January 14, 2026 at 8:25 pm

    Hi everyone I’m a fellowship-trained neurosurgeon / spine surgeon. I’ve been discussing a persistent problem in our field with other surgeons for a while, and I wanted to run it by people who think about ML systems, not just model performance. I’m trying to pressure-test whether a particular approach is even technically sound, where it would break, and what I’m likely underestimating. Id love to find an interested person to have a discussion with to get a 10000 feet level understanding of the scope of what I am trying to accomplish. The clinical problem: For the same spine pathology and very similar patient presentations, you can see multiple reputable surgeons and get very different surgical recommendations. anything from continued conservative management to decompression, short fusion, or long multilevel constructs. Costs and outcomes vary widely. This isn’t because surgeons are careless. It’s because spine surgery operates with: Limited prospective evidence Inconsistent documentation Weak outcome feedback loops Retrospective datasets that are biased, incomplete, and poorly labeled EMRs are essentially digital paper charts. PACS is built for viewing images, not capturing decision intent. Surgical reasoning is visual, spatial, and 3D, yet we reduce it to free-text notes after the fact. From a data perspective, the learning signal is pretty broken. Why I’m skeptical that training on existing data works: “Labels” are often inferred indirectly (billing codes, op notes) Surgeon decision policies are non-stationary Available datasets are institution-specific and access-restricted Selection bias is extreme (who gets surgery vs who doesn’t is itself a learned policy) Outcomes are delayed, noisy, and confounded Even with access, I’m not convinced retrospective supervision converges to something clinically useful. 
The idea I’m exploring: Instead of trying to clean bad data later, what if the workflow itself generated structured, high-fidelity labels as a byproduct of doing the work, or at least the majority of it? Concretely, I’m imagining an EMR-adjacent, spine-specific surgical planning and case monitoring environment that surgeons would actually want to use. Not another PACS viewer, but a system that allows: 3D reconstruction from pre-op imaging Automated calculation of alignment parameters Explicit marking of anatomic features tied to symptoms Surgical plan modeling (levels, implants, trajectories, correction goals) Structured logging of surgical cases (to derive patterns and analyze for trends) Enable productivity (generate note, auto populate plans ect.) Enable standardized automated patient outcomes data collection. The key point isn’t the UI, but UI is also an area that currently suffers. It’s that surgeons would be forced (in a useful way) to externalize decision intent in a structured format because it directly helps them plan cases and generate documentation. Labeling wouldn’t feel like labeling it would almost just be how you work. The data used for learning would explicitly include post-operative outcomes. PROMs collected at standardized intervals, complications (SSI, reoperation), operative time, etc, with automated follow-up built into the system. The goal would not be to replicate surgeon decisions, but to learn decision patterns that are associated with better outcomes. Surgeons could specify what they want to optimize for a given patient (eg pain relief vs complication risk vs durability), and the system would generate predictions conditioned on those objectives. 
Over time, this would generate: Surgeon-specific decision + outcome datasets Aggregate cross-surgeon data Explicit representations of surgical choices, not just endpoints Learning systems could then train on: Individual surgeon decision–outcome mappings Population-level patterns Areas of divergence where similar cases lead to different choices and outcomes Where I’m unsure, and why I’m posting here: From an ML perspective, I’m trying to understand: Given delayed, noisy outcomes, is this best framed as supervised prediction or closer to learning decision policies under uncertainty? How feasible is it to attribute outcome differences to surgical decisions rather than execution, environment, or case selection? Does it make sense to learn surgeon-specific decision–outcome mappings before attempting cross-surgeon generalization? How would you prevent optimizing for measurable metrics (PROMs, SSI, etc) at the expense of unmeasured but important patient outcomes? Which outcome signals are realistically usable for learning, and which are too delayed or confounded? What failure modes jump out immediately? I’m also trying to get a realistic sense of: The data engineering complexity this implies Rough scale of compute once models actually exist The kind of team required to even attempt this (beyond just training models) I know there are a lot of missing details. If anyone here has worked on complex ML systems tightly coupled to real-world workflows (medical imaging, decision support, etc) and finds this interesting, I’d love to continue the discussion privately or over Zoom. Maybe we can collaborate on some level! Appreciate any critique especially the uncomfortable kind!! submitted by /u/LaniakeaResident [link] [comments]

  • [D] Peer matrix evaluation: 10 frontier models judge each other's responses to eliminate single-evaluator bias. Results from async debugging and probability reasoning tasks.
    by /u/Silver_Raspberry_811 (Machine Learning) on January 14, 2026 at 8:10 pm

    Methodology: 10 frontier models (Claude Opus/Sonnet 4.5, o1, GPT-4o, Gemini 3 Pro, Grok 4, DeepSeek V3.2, Llama 4 Scout, Mistral Large, Command A) Each answers identical prompt blindly All 10 judge all 10 responses (100 judgments) Self-judgments excluded from final scores 5 criteria: Correctness (30%), Completeness (20%), Clarity (20%), Depth (15%), Usefulness (15%) CODE-001 Results (Async Python Debugging): Claude Opus 4.5: 9.49 o1: 9.48 Claude Sonnet 4.5: 9.41 DeepSeek V3.2: 9.39 Grok 4: 9.37 Command A: 9.23 Gemini 3 Pro: 9.19 Mistral Large: 9.10 GPT-4o: 8.79 Llama 4 Scout: 8.04 REASON-001 Results (Two Envelope Paradox): Claude Opus 4.5: 9.24 o1: 9.23 Claude Sonnet 4.5: 9.09 DeepSeek V3.2: 8.93 Grok 4: 8.88 GPT-4o: 8.75 Gemini 3 Pro: 8.68 Mistral Large: 8.64 Command A: 8.38 Llama 4 Scout: 7.92 Judge Bias Patterns: Strictest: Claude Opus (avg 7.10-8.76 depending on task) Most lenient: Mistral Large (9.22-9.73) Correlation: Strict judges tend to score higher themselves Open questions for feedback: Is 5-point rubric weighting optimal for different task types? Should we normalize for judge harshness before aggregating? Are 9 judgments per response sufficient for statistical validity? Full data + prompts: https://themultivac.substack.com Daily evals at themultivac.com — currently in Phase 2 (peer matrix format). submitted by /u/Silver_Raspberry_811 [link] [comments]

  • [P] my shot at a DeepSeek style moe on a single rtx 5090
    by /u/exhorder72 (Machine Learning) on January 14, 2026 at 7:53 pm

    I know most will wonder why I’m wasting my time training at only 19k tok a sec. It’s because I can. I’m doing this in my living room in my spare time. 0 formal ML experience. The absurd amount I’ve learned in the last few months made me realize I really picked the wrong career. My Mixture of Experts is 2.36B parameter with 8 routed experts plus a shared expert using top-2 routing. Attention is Grouped Query Attention with QK-normalization and RoPE positional embeddings. All feed-forward layers use SwiGLU activation with RMSNorm throughout. Load balancing follows DeepSeek V3’s auxiliary-loss-free approach using bias-based routing. I monitor coefficient of variation and maximum violation per step. Training runs on TorchAO FP8 quantization with the Muon optimizer and a multi-stage learning rate schedule (warmup, constant, cosine decay). The backend is optimized for Blackwell architecture with cuBLASLt. The data pipeline implements MeCo (Metadata Conditioning then Cooldown) with ledger-based deterministic sampling. I have document-aware attention masking and cross-document loss masking but was disabled for the initial MeCo run. I have since disabled MeCo and curated a clean corpus with no tagging of any kind. MeCo worked but it worked too well and with only 8 experts, it became very problematic. My two biggest early mistakes were not using symmetric router initialization (std=0.006) and not having a dense first layer. Cost me a lot of time and sleep. So what did I do? I cheated. I used aux loss of .003 snd ema smoothing at the beginning. I just didn’t know better. I paid a price later on for that. DO NOT use router scaling on a small MoE. DeepSeek used 2.5. Kimi K2 used 2.446. I tried 1.2 and it was horribly unstable and violation blew up to over .500. 24 batch 6 Grad LR 3e-4 AdamW+Muon Scaled. Bias .001 Aux .0001. I update every step. 
As of yesterday:

2026-01-13 20:53:06 step 41915 | lr 3.00e-04 | loss 1.8867 | gnorm 0.13 | 19,415 tok/s (ema 19,553) | 75.9s/5 steps | cv 0.022 | bias -0.001708±0.179996 | rel_max=0.036 maxvio=0.027 ent=1.203 applied=True | seq_aux 2.444
2026-01-13 20:54:20 [moe] token counts: [150018, 148422, 155402, 147966, 145236, 146724, 144358, 141522]
2026-01-13 20:54:20 step 41920 | lr 3.00e-04 | loss 1.9263 | gnorm 0.13 | 20,102 tok/s (ema 19,828) | 73.4s/5 steps | cv 0.026 | bias -0.001708±0.179920 | rel_max=0.054 maxvio=0.054 ent=1.211 applied=True | seq_aux 2.515

I've got a long way to go 🙂 I'll gladly answer any questions. No gatekeeping here. submitted by /u/exhorder72 [link] [comments]
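The auxiliary-loss-free, bias-based routing the post describes can be sketched in a few lines. This is a toy pure-Python illustration of the idea only (top-2 selection on bias-adjusted scores, gate weights from the unbiased scores, fixed-step bias update), not the poster's actual code; the function names and toy numbers are mine, with the bias step set to the .001 the post mentions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top2(logits, bias):
    """Pick top-2 experts by bias-adjusted score; gate weights come
    from the unbiased scores, so the bias only steers selection."""
    order = sorted(range(len(logits)),
                   key=lambda i: logits[i] + bias[i], reverse=True)
    chosen = order[:2]
    gates = softmax([logits[i] for i in chosen])
    return list(zip(chosen, gates))

def update_bias(bias, token_counts, step=0.001):
    """Auxiliary-loss-free balancing: nudge under-loaded experts up
    and over-loaded experts down by a fixed step per update."""
    mean = sum(token_counts) / len(token_counts)
    return [b + step * (1 if c < mean else -1)
            for b, c in zip(bias, token_counts)]

# Toy run: expert 0 is over-loaded, so its bias gets pushed down,
# making it less likely to be selected on the next step.
bias = [0.0, 0.0, 0.0, 0.0]
counts = [400, 100, 100, 100]
bias = update_bias(bias, counts)
```

Because the gates ignore the bias, balancing pressure never distorts the mixture weights themselves, which is the point of the auxiliary-loss-free scheme.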

  • [R] Controlled LLM Training on Spectral Sphere
    by /u/StartledWatermelon (Machine Learning) on January 14, 2026 at 3:23 pm

    TL;DR: The paper introduces the Spectral Sphere Optimizer, which takes steepest descent under the spectral norm (Muon) and forces the weights & updates onto a spectral sphere. Paper: https://www.arxiv.org/pdf/2601.08393 Repo: https://github.com/Unakar/Spectral-Sphere-Optimizer Abstract: Scaling large models requires optimization strategies that ensure rapid convergence grounded in stability. Maximal Update Parametrization (muP) provides a theoretical safeguard for width-invariant Θ(1) activation control, whereas emerging optimizers like Muon are only "half-aligned" with these constraints: they control updates but allow weights to drift. To address this limitation, we introduce the Spectral Sphere Optimizer (SSO), which enforces strict module-wise spectral constraints on both weights and their updates. By deriving the steepest descent direction on the spectral sphere, SSO realizes a fully muP-aligned optimization process. To enable large-scale training, we implement SSO as an efficient parallel algorithm within Megatron. Through extensive pretraining on diverse architectures, including Dense 1.7B, MoE 8B-A1B, and 200-layer DeepNet models, SSO consistently outperforms AdamW and Muon. Furthermore, we observe significant practical stability benefits, including improved MoE router load balancing, suppressed outliers, and strictly bounded activations.
Algorithm: https://preview.redd.it/f1bvi7yd1cdg1.png?width=1197&format=png&auto=webp&s=88a15a375316f54b092e8101e492a2574dc2ace1
Evals:
https://preview.redd.it/5hefuy7g1cdg1.png?width=1503&format=png&auto=webp&s=8a0864c5279654a1c9a29b7aae57d2a1b160aa4d
https://preview.redd.it/0sy8ih8h1cdg1.png?width=1517&format=png&auto=webp&s=ffd675a60192908ed95652b89540cce8d2110088
https://preview.redd.it/rz6bhc6i1cdg1.png?width=1585&format=png&auto=webp&s=50cd471c7805517d0279877fee235dea3e42954e
https://preview.redd.it/fu5wd7zi1cdg1.png?width=1524&format=png&auto=webp&s=5bfb7668a76ceefa320d7325b6abdb731d985e45
submitted by /u/StartledWatermelon [link] [comments]
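The "weights on a spectral sphere" constraint can be made concrete with a toy projection: estimate a matrix's spectral norm by power iteration, then rescale the matrix onto a sphere of fixed spectral radius. This is a minimal pure-Python sketch of the constraint only, not SSO's actual update (the paper derives the steepest descent direction on the sphere and implements it inside Megatron); every name here is illustrative.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(x):
    return sum(v * v for v in x) ** 0.5

def spectral_norm(A, iters=50):
    """Largest singular value of A via power iteration on A^T A."""
    x = [1.0] * len(A[0])
    At = transpose(A)
    for _ in range(iters):
        y = matvec(At, matvec(A, x))
        n = norm(y)
        x = [v / n for v in y]
    return norm(matvec(A, x))

def project_to_sphere(A, radius):
    """Rescale A so its spectral norm equals `radius` exactly,
    i.e. place it on the spectral sphere of that radius."""
    s = spectral_norm(A)
    return [[a * radius / s for a in row] for row in A]

W = [[3.0, 0.0], [0.0, 1.0]]    # sigma_max = 3
W = project_to_sphere(W, 1.0)   # now sigma_max ≈ 1
```

Keeping weights pinned to a fixed spectral norm like this is what prevents the weight drift the abstract attributes to "half-aligned" optimizers such as Muon.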

  • Modeling exercise for triplets
    by /u/idan_huji (Data Science) on January 14, 2026 at 3:18 pm

    submitted by /u/idan_huji [link] [comments]

  • How far should I go with LeetCode topics for coding interviews?
    by /u/Lamp_Shade_Head (Data Science) on January 14, 2026 at 2:49 pm

    I recently started doing LeetCode to prep for coding interviews. So far I’ve mostly been focusing on arrays, hash maps, strings, and patterns like two pointers, sliding window, and binary search. Should I move on to other topics like stacks, queues, and trees, or is this enough for now? submitted by /u/Lamp_Shade_Head [link] [comments]

  • [D] CUDA Workstation vs Apple Silicon for ML / LLMs
    by /u/Individual-School-07 (Machine Learning) on January 14, 2026 at 1:22 pm

    Hi everyone, I’m trying to make a deliberate choice between two paths for machine learning and AI development, and I’d really value input from people who’ve used both CUDA GPUs and Apple Silicon.

Context: I already own a MacBook Pro M1, which I use daily for coding and general work. I’m now considering adding a local CUDA workstation mainly for: local LLM inference (30B–70B models), real-time AI projects (LLM + TTS + RVC), Unreal Engine 5 + AI-driven characters, and ML experimentation and systems-level learning. I’m also thinking long-term about portfolio quality and employability (FAANG / ML infra / quant-style roles).

Option A, Apple Silicon first: stick with the M1 MacBook Pro, use Metal / MPS where possible, and offload heavy jobs to cloud GPUs (AWS, etc.). Pros I see: efficiency, quiet operation, great dev experience. Concerns: lack of CUDA, tooling gaps, transferability to industry infra.

Option B, local CUDA workstation: a used build (~£1,270 / ~$1,700) with an RTX 3090 (24GB), i5-13600K, and 32GB DDR4 (upgradeable). Pros I see: the CUDA ecosystem, local latency, hands-on GPU systems work. Concerns: power, noise, cost, maintenance.

What I’d love feedback on: For local LLMs and real-time pipelines, how limiting is Apple Silicon today vs CUDA? For those who’ve used both, where did Apple Silicon shine, and where did it fall short? From a portfolio / hiring perspective, does CUDA experience meaningfully matter in practice? Is a local 3090 still a solid learning platform in 2025, or is cloud-first the smarter move? Is the build I found a good deal? I’m not anti-Mac (I use one daily), but I want to be realistic about what builds strong, credible ML experience. Thanks in advance, especially interested in responses from people who’ve run real workloads on both platforms. submitted by /u/Individual-School-07 [link] [comments]

  • [D] Classification of low resource language using Deep learning
    by /u/Sikandarch (Machine Learning) on January 14, 2026 at 6:54 am

    I have been trying to solve a classification problem on a low-resource language. I am doing a comparative analysis; LinearSVC and Logistic Regression performed the best and were the only models with 80+ accuracy and no overfitting. I have to classify it using a deep learning model as well. I applied BERT to the dataset (model: 'bert-base-multilingual-cased') and I am fine-tuning it, but the issue is overfitting. Training logs:

Epoch 6/10 | Train Loss: 0.4135 | Train Acc: 0.8772 | Val Loss: 0.9208 | Val Acc: 0.7408
Epoch 7/10 | Train Loss: 0.2984 | Train Acc: 0.9129 | Val Loss: 0.8313 | Val Acc: 0.7530
Epoch 8/10 | Train Loss: 0.2207 | Train Acc: 0.9388 | Val Loss: 0.8720 | Val Acc: 0.7505

This was with the model's default dropout. When I change dropout to 0.3, or even 0.2, the model still overfits, though not as much, but with dropout I don't get near 60% accuracy. Long training introduces overfitting, and early stopping isn't triggering because val loss continues to decrease. Over 10 epochs I trained with patience of 2 and 3; it doesn't stop. To prevent this I am not doing warmup steps. My optimizer is below:

optimizer = AdamW([
    {'params': model.bert.parameters(), 'lr': 2e-5},
    {'params': model.classifier.parameters(), 'lr': 3e-5}
], weight_decay=0.01)

About my dataset: I have 9000 training samples and 11 classes, with 17 words per training sample on average. The data is imbalanced, but not drastically; to cater for this I have added class weights to the loss function. I set max_length to 120 for token IDs and attention masks. How can I improve my training? I am trying to achieve at least 75% accuracy without overfitting for my comparative analysis. What am I doing wrong? Please guide me. Data augmentation didn't work either: I tried Easy Data Augmentation, and Mixup augmentation also didn't work. If you need more information about my training, ask in the comments, thanks. submitted by /u/Sikandarch [link] [comments]
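One possible reason patience-based early stopping never fires in a run like this: if val loss keeps improving by tiny amounts, the patience counter keeps resetting even though the model is effectively done. A `min_delta` threshold that only counts meaningful improvements is the usual fix. This is a generic sketch of that logic, not the poster's setup, and the loss values are made up for illustration.

```python
class EarlyStopper:
    """Stop when val loss hasn't improved by at least min_delta for
    `patience` consecutive epochs. A nonzero min_delta ignores tiny
    decreases that would otherwise reset the patience counter."""
    def __init__(self, patience=2, min_delta=0.01):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss       # meaningful improvement
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1       # trivial change: count it
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=2, min_delta=0.01)
history = [0.95, 0.92, 0.918, 0.917, 0.93]   # tiny "improvements"
stopped_at = next(i for i, v in enumerate(history)
                  if stopper.should_stop(v))
```

With `min_delta=0` the 0.918 and 0.917 epochs would each reset the counter; with the threshold they count as stalls, so training stops at epoch index 3.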

  • [R] My team and I have created a system that autonomously creates pufferlib envs. Looking for a compute sponsor
    by /u/cobalt1137 (Machine Learning) on January 14, 2026 at 6:40 am

    Hey hey. Like the title says, we are currently building some pretty weird and ambitious systems (think hive-mind / swarm-like collectives), and we are growing them to be able to create great RL environments, starting with pufferlib envs. It is doing a pretty damn good job atm. We are currently bootstrapped and limited on compute. Even a small batch of GPUs (of decent-size chips) would be pretty great. If you have any extra GPUs lying around, or would potentially want to sponsor us, I'd love to chat. I am open to any questions in the thread as well. I'm also down to do a decent amount of discovery (NDA needed, ideally). submitted by /u/cobalt1137 [link] [comments]

  • [D] Some of CVPR 2026 Workshops are released
    by /u/Striking-Warning9533 (Machine Learning) on January 14, 2026 at 5:44 am

    https://openreview.net/group?id=thecvf.com/CVPR/2026/Workshop submitted by /u/Striking-Warning9533 [link] [comments]

  • [D] TMLR timeline question: how long after rebuttal is it normal to wait for a decision?
    by /u/SynagogueLog (Machine Learning) on January 14, 2026 at 12:08 am

    Hi everyone, I have a quick question about typical timelines for TMLR. I submitted a paper to TMLR, received reviews, and then submitted the rebuttal. It’s now been about 3 weeks since the rebuttal, and there hasn’t been any update yet. I understand TMLR is a journal with rolling submissions and no hard deadlines, so delays are expected. I’ve seen some mentions that the discussion/rebuttal phase is designed to last ~2–4 weeks, and that Action Editors may wait during this period for possible reviewer responses or official recommendations before making a decision. For those who’ve submitted to TMLR before: Is 3–4 weeks after rebuttal still considered normal? How long did it take for you to receive a decision after rebuttal? Just trying to calibrate expectations — not complaining. Thanks in advance! submitted by /u/SynagogueLog [link] [comments]

  • [R] Why AI Self-Assessment Actually Works: Measuring Knowledge, Not Experience
    by /u/entheosoul (Machine Learning) on January 13, 2026 at 11:57 pm

    TL;DR: We collected 87,871 observations showing AI epistemic self-assessment produces consistent, calibratable measurements. No consciousness claims required.

The Conflation Problem: when people hear "AI assesses its uncertainty," they assume it requires consciousness or introspection. It doesn't.

Functional Measurement          | Phenomenological Introspection
"Rate your knowledge 0-1"       | "Are you aware of your states?"
Evaluating the context window   | Accessing inner experience
Thermometer measuring temp      | Thermometer feeling hot

A thermometer doesn't need to feel hot. An LLM evaluating its knowledge state is doing the same thing: measuring information density, coherence, and domain coverage. These are properties of the context window, not reports about inner life.

The Evidence: 87,871 observations across 852 sessions, with 308 clean learning pairs: 91.3% showed knowledge improvement; mean KNOW delta: +0.172 (0.685 → 0.857); calibration variance drops 62× as evidence accumulates.

Evidence Level | Variance | Reduction
Low (5)        | 0.0366   | baseline
High (175+)    | 0.0006   | 62× tighter

That's Bayesian convergence: more data → tighter calibration → reliable measurements.

For the Skeptics: don't trust self-report, trust the protocol. Consistent across similar contexts? ✓ Correlates with outcomes? ✓ Systematic biases correctable? ✓ Improves with data? ✓ (62× variance reduction). The question isn't "does AI truly know what it knows?" It's "are the measurements consistent, correctable, and useful?" That's empirically testable. We tested it.

Paper + dataset: Empirica: Epistemic Self-Assessment for AI Systems. Code: github.com/Nubaeon/empirica. Independent researcher here. If anyone has arXiv endorsement for cs.AI and is willing to help, I'd appreciate it. The endorsement system is... gatekeepy. submitted by /u/entheosoul [link] [comments]
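The "more data → tighter calibration" pattern the post reports is ordinary Bayesian shrinkage, which a toy Beta-posterior example makes concrete. This is my illustration, not the paper's model: a Beta posterior's variance falls roughly as 1/n with the number of observations, so the exact reduction factor (the post's 62×) depends on the data, and the success/failure counts below are invented.

```python
def beta_posterior_variance(successes, failures, a0=1.0, b0=1.0):
    """Variance of the Beta(a0 + s, b0 + f) posterior for a rate,
    starting from a uniform Beta(1, 1) prior."""
    a = a0 + successes
    b = b0 + failures
    return a * b / ((a + b) ** 2 * (a + b + 1))

low  = beta_posterior_variance(4, 1)      # ~5 observations
high = beta_posterior_variance(140, 35)   # ~175 observations
ratio = low / high                        # variance shrinks sharply
```

Running this gives a variance reduction on the order of tens as the evidence count grows from ~5 to ~175, the same qualitative convergence the post's low/high evidence table shows.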

  • [P] Awesome Physical AI – A curated list of academic papers and resources on Physical AI — focusing on VLA models, world models, embodied intelligence, and robotic foundation models.
    by /u/kwk236 (Machine Learning) on January 13, 2026 at 11:24 pm

    I've been compiling papers on Physical AI — the intersection of foundation models and robotics. This covers Vision-Language-Action (VLA) models like RT-2 and π₀, world models (DreamerV3, Genie 2, JEPA), diffusion policies, real-world deployment and latency problems, cross-embodiment transfer, scaling laws, and safety/alignment for robots. The field has exploded in the past 18 months. We went from "let's try LLMs on robotics" to having so many dimensions to optimize for, so it felt right to maintain a running list of resources. Organized by: foundations → architectures → action representations → world models → learning paradigms → deployment → applications. Contributions welcome — especially corrections and missing papers. https://github.com/keon/awesome-physical-ai submitted by /u/kwk236 [link] [comments]

  • Undergrad Data Science dissertation ideas [Quantitative Research]
    by /u/ItzSaf (Data Science) on January 13, 2026 at 11:11 pm

    Hi everyone, I’m an undergraduate Data Science student in the UK starting my dissertation, and I’m looking for ideas relevant to quantitative research, which is the field I’d like to move into after graduating. I’m not coming in with a fixed idea yet; I’m mainly interested in data science / ML problems that are realistic to do at undergrad level over a few months and aligned with how quantitative research is actually done. I’ve worked on ML and neural networks as part of my degree projects and a previous internship, but I’m still early in understanding how these ideas are applied in quant research, so I’m very open to suggestions. I’d really appreciate: examples of dissertation topics that would be viewed positively for quant research roles; areas that are commonly misunderstood or overdone; and pointers to papers or directions worth exploring. Thanks in advance! Any advice would be really helpful. submitted by /u/ItzSaf [link] [comments]
