What are the top 3 methods used to find Autoregressive Parameters in Data Science?
In order to find autoregressive parameters, you first need to understand what autoregression is. Autoregression is a statistical method used to build a model that describes data as a linear regression on lagged values of the dependent variable. In other words, it is a model that uses past values of a dependent variable in order to predict future values of that same variable.
In time series analysis, autoregression is the use of previous values in a time series to predict future values. In other words, it is a form of regression in which the dependent variable is forecast using a linear combination of its own past values. The parameter values for the autoregression model are typically estimated using the method of least squares.
The autoregressive parameters are the coefficients in the autoregressive model. These coefficients can be estimated in a number of ways, including ordinary least squares (OLS), maximum likelihood (ML), and least squares with L1 regularization (LASSO). Once estimated, the autoregressive parameters can be used to predict future values of the dependent variable.
The most common way to find the autoregressive parameters is least squares regression. This method finds the parameters that minimize the sum of the squared residuals, where a residual is simply the difference between the predicted value and the actual value. In essence, you are finding the parameters that best fit the data.
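To make "minimizing the sum of squared residuals" concrete, here is a minimal sketch in Python (the language choice is ours; the sales figures are the ones used in the worked example later in this post) that scores candidate AR(1) coefficients by their sum of squared one-step-ahead errors:

```python
import numpy as np

def ar1_ssr(series, phi, intercept=0.0):
    """Sum of squared residuals for a candidate AR(1) model y_t = intercept + phi * y_{t-1}."""
    y_prev, y_curr = series[:-1], series[1:]
    residuals = y_curr - (intercept + phi * y_prev)   # actual minus predicted
    return np.sum(residuals ** 2)

sales = np.array([100.0, 150.0, 200.0])
for phi in (0.5, 1.0, 1.5):
    print(phi, ar1_ssr(sales, phi, intercept=50.0))   # phi = 1.0 gives the smallest SSR here
```

Least squares simply picks the coefficient (and intercept) for which this quantity is smallest.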

How to Estimate Autoregressive Parameters?
There are three main ways to estimate autoregressive parameters: ordinary least squares (OLS), maximum likelihood (ML), and least squares with L1 regularization (LASSO).
Ordinary Least Squares: Ordinary least squares is the simplest and most common method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values.
Maximum Likelihood: Maximum likelihood is another common method for estimating autoregressive parameters. This method estimates the parameters by maximizing the likelihood function. The likelihood function is a mathematical function that quantifies the probability of observing a given set of data given certain parameter values.
Least Squares with L1 Regularization: Least squares with L1 regularization (LASSO) is another method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values while also penalizing large coefficients. L1 regularization adds an extra term to the error function that is proportional to the sum of the absolute values of the coefficients; this shrinks some coefficients to exactly zero, effectively favoring models with fewer parameters.
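As a rough illustration, all three estimators can be run in a few lines of Python (a sketch rather than a canonical recipe: the simulated series, the statsmodels ARIMA model choice, the scikit-learn LASSO, and the penalty strength `alpha=0.1` are all assumptions made for this example):

```python
import numpy as np
from sklearn.linear_model import Lasso
from statsmodels.tsa.arima.model import ARIMA

def lagged_design(y, p):
    """Design matrix with an intercept column followed by p lagged values of y."""
    lags = np.column_stack([y[p - k : len(y) - k] for k in range(1, p + 1)])
    return np.column_stack([np.ones(len(lags)), lags]), y[p:]

# Simulate an AR(1) series with true coefficient 0.7
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + rng.normal()

X, target = lagged_design(y, p=1)

beta_ols = np.linalg.lstsq(X, target, rcond=None)[0]   # OLS: [intercept, phi]
mle_fit = ARIMA(y, order=(1, 0, 0), trend="c").fit()   # Gaussian maximum likelihood
lasso = Lasso(alpha=0.1).fit(X[:, 1:], target)         # L1-penalized least squares

print("OLS:  ", beta_ols)
print("MLE:  ", mle_fit.params)       # reported in statsmodels' own parameterization
print("LASSO:", lasso.intercept_, lasso.coef_)
```

All three should recover an AR coefficient close to 0.7; the LASSO estimate is shrunk slightly toward zero by the penalty.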
Finding Autoregressive Parameters: The Math Behind It
To find the autoregressive parameters using least squares regression, you first need to arrange your data so that each value of the series (the dependent variable) is paired with its lagged value, which serves as the regressor. For example, let’s say you have three years of sales data and want to fit an AR(1) model that predicts next year’s sales from this year’s sales. Your raw data would look something like this:
| Year | Sales |
|------|-------|
| 2016 | 100 |
| 2017 | 150 |
| 2018 | 200 |
From this series you form the lagged pairs $(Y_{t-1}, Y_t)$: $(100, 150)$ and $(150, 200)$. Next, you need to calculate the mean of the lagged values ($X$) and of the current values ($Y$):
$$ \bar{X} = \frac{100+150}{2} = 125, \qquad \bar{Y} = \frac{150+200}{2} = 175 $$
Now we can calculate each element in what’s called the variance-covariance matrix:
$$ \operatorname {Var} (X)=\sum _{i=1}^{n}\left({x_{i}}-{\bar {x}}\right)^{2} $$
and
$$ \operatorname {Cov} (X,Y)=\sum _{i=1}^{n}\left({x_{i}}-{\bar {x}}\right)\left({y_{i}}-{\bar {y}}\right) $$
For our sales example, with $X$ the lagged sales and $Y$ the current sales, that calculation looks like this:
$$ \operatorname {Var} (X)=\sum _{i=1}^{2}\left({x_{i}}-{\bar {x}}\right)^{2}=(100-125)^{2}+(150-125)^{2}=1250 $$
and
$$ \operatorname {Cov} (X,Y)=\sum _{i=1}^{2}\left({x_{i}}-{\bar {x}}\right)\left({y_{i}}-{\bar {y}}\right)=(100-125)(150-175)+(150-125)(200-175)=1250 $$
Now we can finally calculate our autoregressive parameters! For a single lag, the least squares solution $\hat {\beta }=(X^{\prime }X)^{-1}X^{\prime }Y$ reduces to the ratio of the covariance to the variance of the (mean-centered) regressor, with the intercept recovered from the means:
$$ \hat{\phi}=\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}=\frac{1250}{1250}=1.00, \qquad \hat{c}=\bar{Y}-\hat{\phi}\,\bar{X}=175-125=50 $$
That’s it! Our autoregressive parameter is $\hat{\phi}=1.00$, with intercept $\hat{c}=50$. Once we have those estimates, we can plug them into our autoregressive equation:
$$ Y_{t+1}=\hat{c}+\hat{\phi}\,Y_{t}+\varepsilon_{t+1}=50+1.00\,Y_{t}+\varepsilon_{t+1}, $$
where $\varepsilon_{t+1}$ is the error term. And that’s how you solve for autoregressive parameters! Of course, in reality you would be working with much larger datasets (and usually more lags), but the underlying principles are still the same. Once you have your autoregressive parameters, you can plug them into the equation and start making predictions.
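The same numbers can be checked in a few lines of Python (a small sketch using NumPy on the three sales figures from the table above):

```python
import numpy as np

sales = np.array([100.0, 150.0, 200.0])
x, y = sales[:-1], sales[1:]                       # lagged pairs: (100, 150) and (150, 200)

cov_xy = np.sum((x - x.mean()) * (y - y.mean()))   # 1250, matching the hand calculation
var_x = np.sum((x - x.mean()) ** 2)                # 1250
phi = cov_xy / var_x                               # 1.0
c = y.mean() - phi * x.mean()                      # 50.0

print(phi, c)
print(c + phi * sales[-1])                         # one-step-ahead prediction for 2019: 250.0
```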
Which Method Should You Use?
The estimation method you should use depends on your particular situation and goals. If you are looking for simple, interpretable results, Ordinary Least Squares is usually the best place to start. If you need a full probabilistic model of the series (for example, to build forecast intervals), Maximum Likelihood may be preferable. If you are considering many lags and want to guard against overfitting, Least Squares with L1 Regularization (LASSO) may give more reliable predictions.
Autoregressive models STEP BY STEP:
1) Download data: The first step is to download some data. This can be done by finding a publicly available dataset or by using your own data if you have any. For this example, we will be using data from the United Nations Comtrade Database.
2) Choose your variables: Once you have your dataset, you will need to choose the variables you want to use in your autoregression model. In our case, we will be using the import and export values of goods between countries as our independent variables.
3) Estimate your model: After choosing your independent variables, you can estimate your autoregression model using the method of least squares. OLS estimation can be done in many statistical software packages such as R or STATA.
4) Interpret your results: Once you have estimated your model, it is important to interpret the results in order to understand what they mean. The coefficients represent the effect that each independent variable has on the dependent variable. In our case, the coefficients represent the effect that imports and exports have on trade balance. A positive coefficient indicates that an increase in the independent variable leads to an increase in the dependent variable while a negative coefficient indicates that an increase in the independent variable leads to a decrease in the dependent variable.
5) Make predictions: Finally, once you have interpreted your results, you can use your autoregression model to make predictions about future values of the dependent variable based on past values (see the end-to-end sketch below).
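Putting the five steps together, a minimal end-to-end sketch might look like the following. Python with pandas and statsmodels is assumed here (the article's steps themselves are tool-agnostic), and the file name `trade_balance.csv`, the column names, and the monthly frequency are placeholders for whatever series you actually download:

```python
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

# 1) Download / load data (placeholder file and column names)
df = pd.read_csv("trade_balance.csv", parse_dates=["period"], index_col="period")

# 2) Choose the variable you want to model as an autoregression
y = df["trade_balance"].asfreq("MS")          # assume a monthly series

# 3) Estimate an AR(3) model by (conditional) least squares
results = AutoReg(y, lags=3).fit()

# 4) Interpret the results: sign and size of each lag coefficient
print(results.summary())

# 5) Forecast the next 12 periods from the fitted model
print(results.predict(start=len(y), end=len(y) + 11))
```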
Conclusion: In this blog post, we have discussed what autoregression is and how to find autoregressive parameters.
Estimating an autoregression model is a relatively simple process that can be done in many statistical software packages such as R or STATA.
In statistics and machine learning, autoregression is a modeling technique used to describe the linear relationship between a variable and one or more of its own lagged values. To find the autoregressive parameters, you can use a method known as least squares regression, which minimizes the sum of squared residuals. This blog post also explains how to set up your data for least squares regression, and how to calculate the variance and covariance terms before finally calculating your autoregressive parameters. After finding your parameters, you can plug them into an autoregressive equation to start making predictions about future values!
We have also discussed three different methods for estimating those parameters: Ordinary Least Squares, Maximum Likelihood, and Least Squares with L1 Regularization. The appropriate estimation method depends on your particular goals and situation.

Machine Learning For Dummies App
Machine Learning For Dummies on iOs: https://apps.apple.com/
Machine Learning For Dummies on Windows: https://www.
Machine Learning For Dummies Web/Android on Amazon: https://www.amazon.
What are some good datasets for Data Science and Machine Learning?
Machine Learning Engineer Interview Questions and Answers
Machine Learning Breaking News
Transformer – Machine Learning Models
Machine Learning – Software Classification
Autoregressive Model
Autoregressive generative models can estimate complex continuous data distributions such as trajectory rollouts in an RL environment, image intensities, and audio. Traditional techniques discretize continuous data into a number of bins and approximate the continuous data distribution using categorical distributions over those bins. This approximation is parameter-inefficient because it cannot express abrupt changes in density without using a significant number of additional bins. Adaptive Categorical Discretization (ADACAT) is proposed in the paper as a parameterization of 1-D conditionals that is expressive, parameter-efficient, and multimodal. The ADACAT distribution is parameterized by a vector of interval widths and masses. Figure 1 of the paper shows the difference between the traditional uniform categorical discretization approach and the proposed ADACAT.
Each component of the ADACAT distribution has non-overlapping support, making it a specific subfamily of mixtures of uniform distributions, and it generalizes uniformly discretized 1-D categorical distributions. Because it allows variable bin widths, it approximates the modes of a mixture of two Gaussians more closely than a uniformly discretized categorical does, making it more expressive than the latter. Additionally, a distribution’s support can be discretized using quantile-based discretization, which places roughly equal numbers of data points in each bin. For problems with more than one dimension, ADACAT uses deep autoregressive frameworks to factorize the joint density into a product of 1-D conditional ADACAT distributions.
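As a rough sketch of the core idea (not the authors' implementation; the softmax normalization of widths and masses over the unit interval is an assumption made to keep the example self-contained), a 1-D adaptive-categorical density can be evaluated like this:

```python
import numpy as np

def adacat_style_pdf(x, width_logits, mass_logits):
    """Piecewise-constant density on [0, 1] built from per-bin widths and masses.

    Widths and masses are softmax-normalized so the bins partition [0, 1] and
    the masses sum to one; the density inside bin k is mass_k / width_k.
    """
    widths = np.exp(width_logits) / np.exp(width_logits).sum()
    masses = np.exp(mass_logits) / np.exp(mass_logits).sum()
    edges = np.concatenate([[0.0], np.cumsum(widths)])                    # bin boundaries
    k = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(widths) - 1)
    return masses[k] / widths[k]

# Four bins: narrow bins carrying large mass can represent sharp density peaks.
rng = np.random.default_rng(0)
print(adacat_style_pdf(0.25, rng.normal(size=4), rng.normal(size=4)))
```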
Check out the paper and GitHub link.
Pytorch – Computer Application
https://torchmetrics.readthedocs.io/en/stable//index.html
Best practices for training PyTorch model
What are some ways we can use machine learning and artificial intelligence for algorithmic trading in the stock market?
What are some good datasets for Data Science and Machine Learning?
Top 100 Data Science and Data Analytics and Data Engineering Interview Questions and Answers
Machine Learning Engineer Interview Questions and Answers
- [D] If you had one year and basic knowledge of ML? How would you design it to get maximum benefit out of it?by /u/Cool_Bhidu (Machine Learning) on December 7, 2023 at 9:38 pm
Hello, Let's assume a scenario where you know Python, basic probability, basic linear algebra, a few concepts like supervised and unsupervised, etc., basic statistics, and basic software experience. What could be realistic things that I can learn within a year - given I will devote complete one year to learn ML and nothing else. and there is a organization which keeps me accountable. submitted by /u/Cool_Bhidu [link] [comments]
- [D] Thoughts on Mamba?by /u/ExaminationNo8522 (Machine Learning) on December 7, 2023 at 9:29 pm
I ran the NanoGPT of Karparthy replacing Self-Attention with Mamba on his TinyShakespeare Dataset and within 5 minutes it started spitting out the following: https://preview.redd.it/4r96tp6lxx4c1.png?width=836&format=png&auto=webp&s=10f2f61cd4cea96f4f903cb2070835fc5d1df951 https://preview.redd.it/32ler5vnxx4c1.png?width=622&format=png&auto=webp&s=dd00e53f43dd0afa058758a987901ee6789d2258 https://preview.redd.it/sc96i4xoxx4c1.png?width=678&format=png&auto=webp&s=94d2ed279054363d3ed2b6beed65be89468582b0 So much faster than self-attention, and so much smoother, running at 6 epochs per second. I'm honestly gobsmacked. submitted by /u/ExaminationNo8522 [link] [comments]
- [P] flex-prompt: a flexible prompt rendering engine that ensures you'll never exceed your LLM's context length againby /u/queerviolet (Machine Learning) on December 7, 2023 at 8:49 pm
When working with LLMs, I frequently experience token agony. Error: This model's maximum context length is 4097 but you are trying to push in all of War and Peace, you imbecile Perhaps you've experienced it too! The issue is particularly pronounced with retrieval augmented pipelines, since you have potentially quite a large set of documents which you could perhaps include in the prompt if only you knew how big it could be. I got tired of hacking around this headache, so I wrote flex-prompt to address it. I wish I didn't have to. Perhaps someone can point me to a better solution! But I couldn't find one, so alas, here it is. flex-prompt provides a basic layout and component model to help you describe how you want the pieces of your prompt to grow and shrink and a token-aware renderer which renders your prompt to fit your model's window. Github, Intro to flex prompt colab Quick examples You can just render(Flex(...)), and flex prompt will fit the prompt into the context window, and tell you how many tokens are left over for the response: from flex_prompt import render, Flex, Expect rendered = render( Flex([ "Given the text, answer the question.", "--Text--", WAR_AND_PEACE, "--End Text--", "Question: What's the title of this text?", "Answer:", Expect() ], join='\n'), model='text-davinci-002') # rendered.output is the string to send to the model # rendered.max_response_tokens is how many tokens you can # request in response without exceeding the model's context window print(rendered.output, rendered.max_response_tokens) More typically, you'll want to define a prompt which takes parameters. To do this, you can create a class (probably a dataclass) which derives Flexed: from flex_prompt import Flexed, Expect from dataclasses import dataclass @dataclass class Ask(Flexed): text: str question: str answer: str | Expect = Expect() instruct: str = "Given a text, answer the question." flex_join = '\n' # yielded items will be joined by newlines def content(self, _ctx): if self.instruct: yield 'Given the text, answer the question.' yield '' yield '-- Begin Text --' # note: we're using `Flex` here just to attach a flex_weight # to the text, telling the renderer we'd like more space for the # text than anything else. yield Flex([self.text], flex_weight=2) yield '-- End Text --' yield 'Question: ', self.question yield 'Answer: ', self.answer The renderer works much as you might expect. You can `yield` anything which you can pass to the top-level render function, including other components, creating a whole tree. Note that the component above can be used to render both the actual prompt and examples. Examples simply have an answer. This is useful for experimenting with different ways of structuring a prompt while ensuring that all the examples we present to the LLM are in the same format. LangChain and Haystack Integrations Flex prompt doesn't really care how you execute your prompt. For convenience, render(model=) does accept both LangChain and Haystack models: ask_tolstoy = Ask(text=WAR_AND_PEACE, question="Who wrote this?") # Using LangChain from langchain.llms import OpenAI lc_llm = OpenAI() rendering = render(ask_tolstoy, model=lc_llm) print(lc_llm(rendering.output, max_tokens=rendering.max_response_tokens)) # Using Haystack from haystack.nodes import PromptModel hs_llm = PromptModel(model_name_or_path='text-davinci-002', api_key=os.environ['OPENAI_API_KEY']) rendering = render(ask_tolstoy, model=hs_llm) print(hs_llm.invoke(rendering.output, max_tokens=rendering.max_response_tokens)) Is it worth it? 
As models grow larger and larger context windows, I've asked myself whether this is worth it. Won't context sizes eventually big enough to put in everything we might want without worry? One response: "everything I might want" is a very, very big set, plausibly bigger than any window size we're going to see soon. Another: being able to do this kind of token accounting is useful even if we don't completely fill context windows. For example, we might be able to augment our prompt with examples, documents, and tips. How much space should we allocate to each? The answer might well be model-dependent. How do we figure it out? Flex prompt's output, a Rendering object, actually holds the entire component tree. You can look through the object to see how many tokens were allocated to each child. This is currently very manual, but it does provide the bedrock infrastructure to e.g. run tests to discover the optimal balance of augmented data for a given prompt and model. Additionally, the right admixture (and for that matter, the right phrasing) may well be model-dependent. Flex prompt currently provides only very limited model-specific rendering (you can look at ctx.target, but it doesn't tell you much), but there's no reason that can't be significantly improved. At the extreme limit is prompt erasure, where we fine-tune a model to require no or minimal instructions/examples for a given set of prompts. Flex prompt can enable transitions like this with no changes to the pipelines themselves: you'd still use the same prompt components, they'd just render differently if the target is a fine-tuned model vs. a generic one. Status & Future Work Flex prompt is very much in early development. I would love to hear if and how people find it useful, and would love input and contributions! Some things I'd like to tackle in the future: Rendering message lists. Flex prompt currently only renders strings, though it's set up to be able to render any type of output. Message histories basically grow without bound, so supporting this seems like a no-brainer. Pagination. If your rendering overflows (as above, where we're trying to stuff the entirety of war and peace into a prompt), flex prompt will clip the offending pieces to fit. But there's currently no way to get "the next page". But the Rendering actually retains enough information to do this! It would be great to be able to call render(...).pages() to get the sequence of prompts as we "scroll" whatever has overflowed. This is medium-hanging fruit—a little tricky because we do have to descend the tree of renderings to find the exact one(s) which overflowed and then update only those. Token accounting. As mentioned above, you can currently grovel around in Rendering and look at the pieces of the prompt. This would be more useful if it were a little easier, e.g. if you could use rendering[Examples] to find all the parts rendered by the Examples component, or rendering['advice'] to find all the parts which are tagged (somehow) as "advice". The use case here is prompt optimization: discovering the optimal number or percentage of tokens to allot to each thing we might want to drop into the prompt. More integrations. Currently, flex prompt only supports OpenAI models. You can register your own target finders, but it would be great to have more support out of the box. This is mostly a matter of digging around and finding the tokenizers and window sizes for common models, and then writing the appropriate target finders. Contributions very welcome! Model tuning. 
As mentioned above, the rendering context could provide a mechanism for fetching model-specific parameters. The basic idea is that ctx[param] will evaluate param against the context, and then we can define some parameter types which load their model-specific values from gestures vaguely somewhere. Thanks for reading! Flex prompt Github Intro to flex prompt colab My website. shameless plug: I have a lot of engineering experience and a bit of machine learning experience and I am currently looking for a job submitted by /u/queerviolet [link] [comments]
- How do you deal with people wanting definite answers when statistics isn't deterministic?by /u/son_of_tv_c (Data Science) on December 7, 2023 at 8:30 pm
I'm sure you all know that nothing in statistics is certain, it's all probabilities and degrees of confidence. Well, I'm finding business people simply just don't comprehend that. The amount of times I've had to explain that correlation =/= causation, or why aggregate metrics based on very small sample sizes aren't reliable is insane. Like I get it's not their job to know stats and data science, but at some point these things should be common sense, and I shouldn't have to waste half a meeting explaining it for the 30th time. And whenever I come back to them with some kind of result, I choose my words carefully, not to over promise, cause guess who's ass is on the line if I'm wrong. If I say "increased advertising appears to be correlated with increased sales", they hear "spend more money on advertising". They will then spend that money and if it doesn't work, I'm the one who apparently messed up. I've been working around it by both choosing my words carefully and creating documentation. Kind of like a CYA, if they don't heed my warnings and it blows up in their faces, at least I can point back to them saying I told them. For the former, it's in one ear out the other, are increasingly happening in meetings where there is no official transcript I can point back to. They don't listen to any of my warnings about over-concluding from my results. As to the former, well I was told recently after doing a sales analysis that "no one gives a fuck" about my methodology or results. Drop it from the write up. They just want broad conclusions and actions. In my mind, my job is to tell them what I found and let them draw their own conclusions. I get that data science and stats aren't their job, but at the same job, sales isn't mine, so why am I making conclusions for sales people about what they should do? IDK, I figure this is common for this field, so what do you guys do? submitted by /u/son_of_tv_c [link] [comments]
- [D] Is there a tool that indicates which parts of the input prompt impact the LLM's output the most?by /u/ToughOpening (Machine Learning) on December 7, 2023 at 8:29 pm
Hi, Is there a tool that indicates which parts of the input prompt impact the LLM's output the most? I do not care which LLM the tool is for if it exists. I guess it could be backtracked via the weights of each node in the neural network, but you guys are smarter than me so I'll listen to y'all. My use case is I have a prompt that slightly changes variation to variation. The output of the model is "Yes" or "No", so I want to see which parts of the prompt I change impact its response Best, A Reddit User submitted by /u/ToughOpening [link] [comments]
- Public Datasets With At Least 2 Rows Per Subject? [D]by /u/ZeApelido (Machine Learning) on December 7, 2023 at 8:11 pm
Are you aware of any public datasets that have at least 2 trials / samples / rows per subject? Could be in any domain. Preferably with > 100 subjects, and the tests not sampled years apart (but not dealbreakers). For instance, a large cohort of patients who have had ECG scans collected on 2 separate occasions. I am slowly working my way through the PhysioNet databases: https://physionet.org/about/database/ Most of course only have one scan per subject. submitted by /u/ZeApelido [link] [comments]
- [D] Llms/Generative AI citing sourcesby /u/edixtor93 (Machine Learning) on December 7, 2023 at 7:48 pm
This might be something the big boys in AI are considering, or maybe already doing. If not, would it be such a crazy endeavor? Technically speaking... With artists claiming copyright infringement, with Meta using Facebook and Instagram images (which I assume we all blindly agreed to in tos), and in general more and more backlash against AI using sources without permission or whatever. Wouldn't a system where during training it somehow keeps track of sources such that when the model is finally trained and ready to use, it could generate whatever it generates along with a probably long list of citations of the sources it used to generate the work of the prompt? Ain't saying this would be an easy thing to do, hell I know very little of how it all works, but wouldn't a system like this put a lot of people at ease? Knowing that at the very least their work is being credited? Opening the floor for thoughts and opinions. submitted by /u/edixtor93 [link] [comments]
- [D] How do LLMs Combine with Traditional ML approaches?by /u/yoquierodata (Machine Learning) on December 7, 2023 at 7:38 pm
I’ve got experience in “traditional” ML like building classification and regression models with GB Trees and the like, so I’m curious how, if at all LLMs can be combined with other ML modeling approaches. If my use case entails structured data as well as something like chat history, is there a need to “combine” the modeling approaches? Thanks for any resources or input you might have. submitted by /u/yoquierodata [link] [comments]
- [D] Considering Switching to DS/ML from SWE.by /u/EquivalentAbies6095 (Machine Learning) on December 7, 2023 at 7:26 pm
How is the day to day for Data Science/ML? I am thinking about changing careers as I am really not enjoying certain aspects of Software Engineering. I did an 4 month ML internship, but I don't feel like I got the full grasp of the day to day. I know this can vary from company to company and team to team. For some background, I have ~4 years of experience as a SWE, BS in CS from top 50 school, and MS in CS from top 10 school. submitted by /u/EquivalentAbies6095 [link] [comments]
- [D] Is ICPRAI a Reputable ML Conference? Seeking Input!by /u/SufficientAd3564 (Machine Learning) on December 7, 2023 at 7:17 pm
Hey ML folks! Found ICPRAI on aideadlin.es but don't know much about it. Any insights on the conference quality, review process, or personal experiences? Considering submitting a paper and looking for advice. Your thoughts are much appreciated! submitted by /u/SufficientAd3564 [link] [comments]
- [P] Machinery failure detection with training on the edge using Arduino-compatible boardsby /u/prokyber (Machine Learning) on December 7, 2023 at 7:05 pm
Bender Bending Rodríguez Some time ago I posted about the C++ library I made for training decision trees on the edge (using any Arduino-compatible board). People were asking reasonable questions, like is it actually good for something. So, recently I decided to make a little proof of concept project by training it to detect fan 'failure' based on it's vibration patterns. The code is in the examples folder. The code is practically the same as the code for 'physical activity recognition' except for the fact that I increased the sampling frequency a little bit. Here is a video of the whole process. (Sorry for my bad English) First part of the video is learning two states('ok' and 'failure') then goes the classification. I guess the coolest thing is that how fast and easy it is. I wonder If it is possible to make unsupervised anomaly detection on the edge using KNN... submitted by /u/prokyber [link] [comments]
- Learning Resources for MLE/CS Topicsby /u/Dezireless (Data Science) on December 7, 2023 at 6:11 pm
When I was first hired as a DS, I was working on data analysis, statistics, and experimental design aspects. Whenever I did any ML, it was always just in a Jupyter notebook environment and didn't seem to go anywhere beyond that. I want to delve deeper into some MLE/CS topics. for a variety of reasons. In the past year I have become more focused on putting ML models and data analysis into production. I want to be self sufficient. I don't like having to beg for help from a software engineer to make changes to the production environment. Can you suggest any beginner hands on tutorials on any of these topics: 1) Constructing python modules, including python requirements files, setup.py, etc. 2) deploying a module in a docker container 3) Constructing an API with #1 and #2? Not sure if this makes sense. 4) Other topics, such as Airflow, AWS, etc. submitted by /u/Dezireless [link] [comments]
- [D] Undergrad contemplating a Machine Learning PhD, but worried about what that truly entails.by /u/FM-2070 (Machine Learning) on December 7, 2023 at 4:46 pm
I'm a 3rd year at a T5 school, and I've been doing research since high school. I've got a few publications under my belt and my professor (at a different T3 institution) is expecting my current work will get me a first authorship to Nature Medicine (or a similar top journal). It's all been applied ML/DL work, especially in reducing the cost of healthcare. I've been keeping up a high GPA and I've been trying to plan out the next year to set myself up as well as I can for a PhD. In theory, what I really want to do is figure out next-generation learning algorithms, especially those motivated by principles from the brain. I've found this incredibly exciting, and freshman year I'd get up at 5, spend most of the day learning about this stuff and implementing my ideas, skip all my classes, and barely manage As. This was to little avail (nothing was really that novel, I just kept reinventing old techniques), so by my second year I went all-in on applied work so I could rack up publications. Consequently, at this point, I'm honestly a little out of touch with my favorite area of work. My obsession with that specific domain of research is what makes me think that what I want is a PhD. Personally, truth be told, I generally dislike the process of applied research. Most of it is, of course, just handing data and applying existing ideas in ML (maybe with some cool strokes of insight here and there). While I think it's really important and meaningful, I definitely don't enjoy it in the same way. I don't feel that thrill of ideation, and I'm not sure if I'd rather just try solving problems like that in industry. So my interest in research is already somewhat specific. My end goal is to continue to do research, either at big tech companies where I can work on new innovations at the forefront of AI or simply in universities. The least painful viable path to that is what I want, and a PhD seems like what I'd need to get there. There are a couple things I'm worried about though: As soon as things start shifting too far from my core interests, while I suck it up and still work hard, I don't feel the same sheer enjoyment doing research. The results are cool and mean something, but the process is less fun. I feel more like I'm doing ML engineering than research. Might as well just do industry or a master's instead? A PhD is 6-7 years. I know this sounds a bit silly (like, duh - that's what a PhD is! it takes time, and everything is a gamble), but that honestly scares me. The fear of future regret, my worries that I'll perish from not publishing enough (or being forced to spend all my time doing research I don't truly enjoy), or just general fear of me not being cut out to take advantage of opportunities a PhD confers and ultimately just ending up in the same place/worse off than if I'd just entered industry straight away. I have mixed feelings about SDE/MLE/DS roles in big tech. I've had internships in legacy big tech companies and honestly the fact that work was kinda chill, big impacts took less effort, work-life balance was not nonexistent and the problems were somewhat interesting made it an overall enjoyable experience. But when I'm thinking about my life long-term, I want to produce something that actually means something to me. Helping make (however minuscule) minor contributions to fundamental ideas is at the intersection of "fun" and "meaningful" for me, which is why it's my top pick. Second to that is applied work in healthcare, third applied work in space exploration. 
Not sure if a PhD is the way to go for those two. Anyway, this is sort of an infodump, and I was generally just hoping to get some advice from people with more experience and practical wisdom than I do. Thank you for taking the time to read this. submitted by /u/FM-2070 [link] [comments]
- [D] I keep seeing ambiguous use of terminology in academic and SoTA papersby /u/reverendCappuccino (Machine Learning) on December 7, 2023 at 3:29 pm
I think the community should do better and maybe even converge explicitly on some terminology, even though in still developing fields it is normal to have concepts that are discovered to be equal or different on the run. I keep seeing "residual connection" as "skip connection", but I think the latter is the identity map and the former is the residual function. I keep seeing kernels and filters used interchangeably but it would be nice to have a way to express the "depthwise" slices of the 2D convolution filters (whose weights are in a 3D tensor). I keep seeing embedding dimension for that of a Query/Key matrix, but then PyTorch MultiHeadAttention divides the projection instead of repeating it (iykyk I'm not talking about the efficiency turnaround per se). This are surely or mostly minor issues, but slow down code implementations and mathematical understanding whenever notation is unclear and code isn't available, which unfortunately happens quite often. What bothers you, what do you find helpful? submitted by /u/reverendCappuccino [link] [comments]
- [D] What are some examples of being clever with batching for training efficiency?by /u/angelcatfish (Machine Learning) on December 7, 2023 at 3:27 pm
Language Model novice here. I was going through the README section of minGPT and read this line. The majority of the complexity is just being clever with batching (both across examples and over sequence length) for efficiency. What could be some examples of doing such? submitted by /u/angelcatfish [link] [comments]
- [D] In decoder models, if later tokens attend to early tokens but early tokens don't attend to later tokens, what stops the influence of the early tokens from growing with each layer?by /u/30299578815310 (Machine Learning) on December 7, 2023 at 2:19 pm
Let's imagine just two tokens, A and B. In each attention layer, B will attend to itself and A, but A will only ever attend to itself. So over and over B will become a weighted sum of itself and A, then they will both go through a FF layer, and then the process will repeat. So shouldn't the "percentage" of info in B that comes from A grow with each layer? Am I missing something basic here? Does the model have to learn to make the Q of B and the K of A become more orthogonal with each layer (on average) to prevent B from paying too much attention to A over the course of all the layers? edit: To be clear, I totally get this is trained out of the model, but what I don't get is how it is. Do models tend to only a few layers where they pay attention to early tokens? Does the FF layer try to reverse the impacts of an oversaturation of info from early tokens? etc. submitted by /u/30299578815310 [link] [comments]
- [R] Half-Quadratic Quantization of Large Machine Learning Modelsby /u/sightio (Machine Learning) on December 7, 2023 at 1:38 pm
Sharing our work on model quantization. Blog: https://mobiusml.github.io/hqq_blog/ Code: https://github.com/mobiusml/hqq Models: https://huggingface.co/mobiuslabsgmbh/ No data calibration needed, extremely fast 🚀, works on both language and vision models! Why does it matter? Quantization significantly reduces GPU memory requirements but degrades the quality of the models. Having faster and more accurate quantization methods is extremely valuable for the ML community. Approach: Sparsity-based error formulation between the original weights and their dequantized version. We used a Half-Quadratic solver to derive a closed-form solution that is 100x faster than backprop via Pytorch's Autograd. Quantization speed: ~ 1 minute for Llama2-13B ~ 4 minutes for LLama2-70B (over 50x faster than GPTQ) Findings: - Larger models quantized to 3/2-bit outperform smaller full-precision models with similar or lower memory requirements. - Successful 2-bit quantization requires a lower group-size (e.g., 32 or 16) and compression of both the zero-point and the scaling factor for lower memory usage. While we acknowledge our view might be slightly biased, we genuinely believe that our work will significantly benefit the open-source software (OSS) machine learning community. Code and model are in Apache permissive license. submitted by /u/sightio [link] [comments]
- For PhD data scientists in research focused roles, do you exclusively hire PhDs?by /u/AdFew4357 (Data Science) on December 7, 2023 at 12:18 pm
This is regarding the data scientist positions in the industry which are more research focused. Not business facing or product facing ones. I find in the research focused data scientist roles the main criteria is a PhD. However, I’m wondering if there are: Any MS stats folks working in these types of jobs? And if PhDs are the ones hiring, do you exclusively hire PhDs for these roles as oppose to a MS with industry experience? submitted by /u/AdFew4357 [link] [comments]