**What are the top 3 methods used to find Autoregressive Parameters in Data Science?**

To find autoregressive parameters, you first need to understand what autoregression is. **Autoregression is a statistical method that models a variable as a linear function of its own lagged values.** In other words, it is a model that uses past values of a variable to predict future values of the same variable.

In time series analysis, **autoregression is the use of previous values in a time series to predict future values.** Put differently, it is a form of regression in which the variable is forecast using a linear combination of its own past values, so the lagged values play the role of the independent variables. The parameter values for the autoregression model are commonly estimated using the method of least squares.
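Written out, an autoregressive model of order p, usually denoted AR(p), takes the standard form below. The symbols are notation introduced here for illustration: $c$ is an intercept, $\phi_1, \dots, \phi_p$ are the autoregressive parameters, and $\varepsilon_t$ is a zero-mean error term.

$$ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t $$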

The autoregressive parameters are the coefficients in the autoregressive model. These coefficients can be estimated in a number of ways, including ordinary least squares (OLS), maximum likelihood (ML), or least squares with L1 regularization (LASSO). Once estimated, the autoregressive parameters can be used to predict future values of the dependent variable.

The most common way to find the autoregressive parameters is **least squares regression**. This method finds the parameters that minimize the sum of the squared residuals. A residual is simply the difference between the actual value and the predicted value. So, in essence, you are finding the parameters that best fit the data.
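As a rough illustration of least squares applied to an autoregression, here is a minimal sketch using NumPy. The function name `fit_ar_ols` and the toy series are made up for this example; the idea is simply to stack the lagged values into a matrix and let `numpy.linalg.lstsq` find the coefficients that minimize the sum of squared residuals.

```python
import numpy as np

def fit_ar_ols(series, p):
    """Estimate the intercept and p lag coefficients of an AR(p) model by OLS."""
    y = np.asarray(series, dtype=float)
    if len(y) <= p:
        raise ValueError("need more observations than lags")
    # Column k holds y[t-k] for every target y[t], t = p .. n-1.
    lags = [y[p - k : len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y) - p)] + lags)
    target = y[p:]
    # lstsq returns the coefficients minimizing the sum of squared residuals.
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    return coef[0], coef[1:], residuals

# Toy, noise-free series in which every value is twice the previous one.
intercept, phi, residuals = fit_ar_ols([1, 2, 4, 8, 16], p=1)
print(intercept, phi)  # intercept close to 0, phi close to [2.]
```

On real data the residuals will not be zero, and inspecting them is a useful first check on whether the chosen number of lags is adequate.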

**How to Estimate Autoregressive Parameters?**

There are three main ways to estimate autoregressive parameters: ordinary least squares (OLS), maximum likelihood (ML), or least squares with L1 regularization (LASSO).

**Ordinary Least Squares**: Ordinary least squares is the simplest and most common method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values.

**Maximum Likelihood**: Maximum likelihood is another common method for estimating autoregressive parameters. This method estimates the parameters by maximizing the likelihood function. The likelihood function is a mathematical function that quantifies the probability of observing a given set of data given certain parameter values.
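To make the likelihood idea concrete, here is a small sketch for an AR(1) model, assuming normally distributed errors and conditioning on the first observation (exact maximum likelihood would add a term for that first value). The function names and the data are made up for illustration. Under these assumptions the likelihood is maximized by the least squares estimates, which the last lines check by nudging one parameter.

```python
import math

def gaussian_nll(series, c, phi, sigma2):
    """Negative conditional log-likelihood of an AR(1) model with Gaussian errors."""
    resid = [series[t] - c - phi * series[t - 1] for t in range(1, len(series))]
    n = len(resid)
    sse = sum(r * r for r in resid)
    return 0.5 * n * math.log(2 * math.pi * sigma2) + sse / (2 * sigma2)

def fit_ar1_ols(series):
    """Closed-form least squares estimates (intercept, slope) for an AR(1) model."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi

series = [2.0, 2.9, 3.1, 3.8, 3.6, 4.1, 3.9, 4.4]  # made-up illustrative data
c_hat, phi_hat = fit_ar1_ols(series)
resid = [series[t] - c_hat - phi_hat * series[t - 1] for t in range(1, len(series))]
sigma2_hat = sum(r * r for r in resid) / len(resid)  # ML estimate of the error variance

best = gaussian_nll(series, c_hat, phi_hat, sigma2_hat)
nudged = gaussian_nll(series, c_hat, phi_hat + 0.05, sigma2_hat)
print(best < nudged)  # True: moving away from the OLS estimate lowers the likelihood
```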

**Least Squares with L1 Regularization**: Least squares with L1 regularization is another method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values while also penalizing large coefficients. The penalty is an extra term added to the error function that is proportional to the sum of the absolute values of the estimated coefficients, and it can shrink some coefficients exactly to zero, which effectively removes the corresponding lags from the model.
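One common way to write the penalized objective for an AR(p) model is shown below. The notation is an assumption made for this illustration: $c$ is the intercept, $\phi_i$ are the lag coefficients, $n$ is the number of observations, and $\lambda \ge 0$ is a tuning parameter that controls the strength of the penalty. Software packages often scale the squared-error term differently, so the same $\lambda$ is not directly comparable across tools.

$$ \min_{c,\,\phi_1,\dots,\phi_p}\; \sum_{t=p+1}^{n}\Bigl(y_t - c - \sum_{i=1}^{p}\phi_i\,y_{t-i}\Bigr)^{2} + \lambda \sum_{i=1}^{p}\lvert\phi_i\rvert $$

Setting $\lambda = 0$ recovers ordinary least squares, while larger values of $\lambda$ push more coefficients to zero.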

**Finding Autoregressive Parameters: The Math Behind It**

To find the autoregressive parameters using least squares regression, you first need to set up your data in a certain way. You need the value you want to predict (the dependent variable) in one column and its lagged values (the independent variables) in the other columns. For example, let’s say you want to use three years of data to predict next year’s sales. Your raw data would look something like this:

| Year | Sales |
|------|-------|
| 2016 | 100 |
| 2017 | 150 |
| 2018 | 200 |
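Turning a series like the one in the table into regression rows can be sketched as follows. The helper name `make_lagged_rows` is hypothetical; it pairs each value with the `p` values that came before it, so with one lag the three sales figures yield two rows.

```python
def make_lagged_rows(series, p):
    """Return (lags, target) rows where lags = [y[t-1], ..., y[t-p]] and target = y[t]."""
    rows = []
    for t in range(p, len(series)):
        lags = [series[t - k] for k in range(1, p + 1)]
        rows.append((lags, series[t]))
    return rows

sales = [100, 150, 200]  # 2016, 2017, 2018 from the table
rows = make_lagged_rows(sales, p=1)
print(rows)  # [([100], 150), ([150], 200)]
```

Note that the first `p` observations have no complete set of lags, so they appear only as predictors and never as targets.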

Next, pair each year’s sales with the previous year’s sales. In an autoregression the predictor X is the lagged value and the response Y is the current value, so three observations give two pairs: (X, Y) = (100, 150) and (150, 200). Now calculate the mean of each column:

$$ \bar{x} = \frac{100+150}{2} = 125, \qquad \bar{y} = \frac{150+200}{2} = 175 $$

Now we can calculate the two quantities we need, the variance of X and the covariance of X and Y (strictly speaking these are unnormalized sums of squares and cross-products, but the normalizing factor cancels when we take their ratio):

$$ \operatorname {Var} (X)=\sum _{i=1}^{n}\left({x_{i}}-{\bar {x}}\right)^{2} $$

and

$$ \operatorname {Cov} (X,Y)=\sum _{i=1}^{n}\left({x_{i}}-{\bar {x}}\right)\left({y_{i}}-{\bar {y}}\right) $$

For our sales example, that calculation would look like this:

$$ \operatorname {Var} (X)=\sum _{i=1}^{2}\left({x_{i}}-{\bar {x}}\right)^{2}=(100-125)^{2}+(150-125)^{2}=625+625=1250 $$

and

$$ \operatorname {Cov} (X,Y)=\sum _{i=1}^{2}\left({x_{i}}-{\bar {x}}\right)\left({y_{i}}-{\bar {y}}\right)=(100-125)(150-175)+(150-125)(200-175)=625+625=1250 $$

Now we can finally calculate our autoregressive parameter! With a single lag, the least squares solution $\hat {\beta }=(X^{\prime }X)^{-1}X^{\prime }Y$ applied to the centered data reduces to:

$$ \hat {\beta }=\frac {\operatorname {Cov} (X,Y)}{\operatorname {Var} (X)}=\frac {1250}{1250}=1 $$

The intercept follows from the means:

$$ \hat {c}=\bar {y}-\hat {\beta }\,\bar {x}=175-1\times 125=50 $$

That’s it! Our autoregressive parameter is 1 and our intercept is 50. Once we have them, we can plug them into our autoregressive equation:

$$ Y_{t+1}=50+1\cdot Y_{t}+a_{t+1} $$

where $a_{t+1}$ is the error term. This is an AR(1) model because it uses a single lag; an AR(3) model would have three lag coefficients but still only one error term. The fitted equation forecasts 2019 sales as 50 + 200 = 250. And that’s how you solve for autoregressive parameters! With only two pairs the line fits the data exactly, so treat these numbers as an illustration of the mechanics. In reality you would be working with much larger datasets, but the underlying principles are still the same. Once you have your autoregressive parameters, you can plug them into the equation and start making predictions!
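As a cross-check on the arithmetic, here is a minimal sketch that fits an AR(1) model with an intercept to the three sales figures from the table by regressing each year’s sales on the previous year’s sales. The function name `fit_ar1` is made up for this example. With only two (lag, target) pairs the line passes through both points, so the result illustrates the mechanics rather than a meaningful forecast.

```python
def fit_ar1(series):
    """Least squares intercept and slope for regressing y[t] on y[t-1]."""
    x = series[:-1]  # lagged values
    y = series[1:]   # current values
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

sales = [100.0, 150.0, 200.0]  # 2016, 2017, 2018
intercept, slope = fit_ar1(sales)
forecast_2019 = intercept + slope * sales[-1]
print(intercept, slope, forecast_2019)  # 50.0 1.0 250.0
```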

**Which Method Should You Use?**

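The estimation method you should use depends on your particular situation and goals. If you are looking for simple and interpretable results, then Ordinary Least Squares may be the best method for you. When the errors are roughly normal and the series is long, Maximum Likelihood gives nearly the same estimates as OLS; it tends to matter most for short series or when the model is extended with moving-average terms. Least Squares with L1 Regularization is worth considering when you include many lags and want the fit to drop the unimportant ones.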

**Autoregressive Models Step by Step:**

1) **Download data**: The first step is to download some data. This can be done by finding a publicly available dataset or by using your own data if you have any. For this example, we will be using data from the United Nations Comtrade Database.

2) **Choose your variables**: Once you have your dataset, you will need to choose the series you want to model. In an autoregression the predictors are lagged values of that same series, so in our case we will model the trade balance (exports minus imports) and use its own past values as the independent variables.

3) **Estimate your model:** After choosing the series and the number of lags, you can estimate your autoregression model using the method of least squares. OLS estimation can be done in many statistical software packages such as R or Stata.

4) **Interpret your results**: Once you have estimated your model, it is important to interpret the results in order to understand what they mean. The coefficients represent the effect that each lagged value has on the current value. In our case, they describe how past trade balances carry over into the current trade balance. A positive coefficient indicates that a higher past value tends to go with a higher current value, while a negative coefficient indicates that a higher past value tends to go with a lower current value.

5) **Make predictions:** Finally, once you have interpreted your results, you can use your autoregression model to make predictions about future values of the dependent variable based on its own past values.
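The estimation and prediction steps can be sketched in a few lines of NumPy. Everything here is illustrative: the function names `fit_ar` and `forecast` are hypothetical, and the series is synthetic and noise-free (generated from y = 10 + 0.5 × previous value) rather than real Comtrade data, so the fit recovers the parameters almost exactly.

```python
import numpy as np

def fit_ar(series, p):
    """Return [intercept, phi_1, ..., phi_p] estimated by ordinary least squares."""
    y = np.asarray(series, dtype=float)
    lags = [y[p - k : len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y) - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def forecast(series, coef, steps):
    """Iterate the fitted equation forward, feeding each forecast back in as a lag."""
    p = len(coef) - 1
    history = list(series)
    predictions = []
    for _ in range(steps):
        recent = history[-1 : -p - 1 : -1]  # y[t-1], ..., y[t-p]
        nxt = coef[0] + float(np.dot(coef[1:], recent))
        predictions.append(nxt)
        history.append(nxt)
    return predictions

# Synthetic stand-in for a yearly trade balance series.
balance = [100.0, 60.0, 40.0, 30.0, 25.0, 22.5, 21.25]
coef = fit_ar(balance, p=1)
print(coef)                        # close to [10.0, 0.5]
print(forecast(balance, coef, 2))  # close to [20.625, 20.3125]
```

Because each forecast is fed back in as an input, errors compound as the horizon grows, so forecasts far ahead deserve less trust than the next step or two.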

**Conclusion:** In this blog post, we have discussed what autoregression is and how to find autoregressive parameters.

Estimating an autoregression model is a relatively simple process that can be done in many statistical software packages such as R or Stata.

In statistics and machine learning, autoregression is a modeling technique that describes the linear relationship between a variable and one or more of its own lagged values. To find the autoregressive parameters, you can use a method known as least squares regression, which minimizes the sum of squared residuals. This blog post also explains how to set up your data for least squares regression and how to calculate the variance and covariance before finally calculating your autoregressive parameters. After finding your parameters, you can plug them into an autoregressive equation to start making predictions about future values!

We have also discussed three different methods for estimating those parameters: Ordinary Least Squares, Maximum Likelihood, and Least Squares with L1 Regularization. **The appropriate estimation method depends on your particular goals and situation.**


# Autoregressive Model

Autoregressive generative models can estimate complex continuous data distributions such as trajectory rollouts in an RL environment, image intensities, and audio. Traditional techniques discretize continuous data into bins and approximate the continuous distribution with categorical distributions over those bins. This approximation is parameter inefficient because it cannot express abrupt changes in density without a significant number of additional bins. Adaptive Categorical Discretization (ADACAT) is proposed in this paper as a parameterization of 1-D conditionals that is expressive, parameter efficient, and multimodal. An ADACAT distribution is parameterized by a vector of interval widths and masses. Figure 1 showcases the difference between the traditional uniform categorical discretization approach and the proposed ADACAT.

Each component of the ADACAT distribution has non-overlapping support, making it a specific subfamily of mixtures of uniform distributions. ADACAT generalizes uniformly discretized 1-D categorical distributions. The proposed architecture allows for variable bin widths and approximates the modes of a mixture of two Gaussians more closely than a uniformly discretized categorical does, making it more expressive than the latter. Additionally, a distribution’s support is discretized using quantile-based discretization, which bins data into groups with similar numbers of measured data points. For problems with more than one dimension, ADACAT uses deep autoregressive frameworks to factorize the joint density into numerous 1-D conditional ADACAT distributions.


**Pytorch – Computer Application**

https://torchmetrics.readthedocs.io/en/stable//index.html

Best practices for training PyTorch model

What are some ways we can use machine learning and artificial intelligence for algorithmic trading in the stock market?

What are some good datasets for Data Science and Machine Learning?

Top 100 Data Science and Data Analytics and Data Engineering Interview Questions and Answers

Machine Learning Engineer Interview Questions and Answers

- Is there a place to learn where people aren't petty and condescending?by /u/databro92 (Data Science) on July 23, 2024 at 4:19 pm
I see people posting in this subreddit frequently trying to learn things, asking for recommendations and tips, trying to discuss data science, and about 50% of the replies here are people who think they are so much smarter than they are being petty, mocking them, being denigrating to them, aggressive, toxic, for no reason at all. Just acting like they think they are one of the smartest people in the world to ever exist.. The other 50% are pretty nice, they talk, provide recommendations, support, words of encouragement, advice, technical information. Some wondering if there is another place where people go to discuss data science as they are learning it. I'm not talking about doing a boot camp, or doing a udemy course or anything like that. I'm talking about a place where people who are devoted to learning data science and machine learning fundamentals can go to discuss freely. submitted by /u/databro92 [link] [comments]

- If you peek in your AB tests, you're setting yourself up for dissapointmentby /u/__compactsupport__ (Data Science) on July 23, 2024 at 1:37 pm
Peeking (looking for significance in an AB test before the experiment has enough samples to reach desired power) is a “no no”. Rationales for not peeking typically mention inflated type 1 error rate. Unless you’re just randomizing into two groups and not changing anything, the null is unlikely to be true. So inflated type one error rate is really not the primary concern. Rather, if we peek then we are setting ourselves up for disappointment. Detected effects from peeking will typically not generalize, and we will be overstating out impact. The reason why is fairly clear when considering the Winner’s Curse. I write a short little blog post to demonstrate just how exaggerated the effects detected from peeking can be here. If you need to tell your stakeholders not to peek, its probably best to come at it from this angle as opposed to a statistical angle, which they neither understand nor care about. submitted by /u/__compactsupport__ [link] [comments]

- [P] Multi Output Regression to predict cost and revenue from ROAS and other featuresby /u/ibraheemn73 (Machine Learning) on July 23, 2024 at 11:14 am
I am trying to predict expected Cost and Revenue for hotel_name and Channel from user inputs: ROAS (Revenue / Cost), hotel_name, and month (refer to below sample data). I've attempted using Multioutput Regression and the pymc-marketing library but haven't found a satisfactory solution. The predictions are not close to real data and major variabilities. Could someone suggest a method or a library that might be better suited for this problem? Multi Output Model script I have import pandas as pd import numpy as np from warnings import filterwarnings from sklearn.pipeline import Pipeline from sklearn.compose import ColumnTransformer from sklearn.preprocessing import OneHotEncoder from sklearn.multioutput import MultiOutputRegressor from sklearn.ensemble import RandomForestRegressor from sklearn.model_selection import train_test_split filterwarnings('ignore') # Define function to exclude outliers def exclude_outliers_using_iqr(df, group_columns, columns, multiplier=1.5): def exclude_outliers(group): for column in columns: Q1 = group[column].quantile(0.25) Q3 = group[column].quantile(0.75) IQR = Q3 - Q1 lower_bound = Q1 - multiplier * IQR upper_bound = Q3 + multiplier * IQR group = group[(group[column] >= lower_bound) & (group[column] <= upper_bound)] return group df = df.groupby(group_columns).apply(exclude_outliers).reset_index(drop=True) return df # Define function to make new prediction def make_new_prediction(new_data, pipeline): new_df = pd.DataFrame([new_data]) new_X = new_df[['month', 'channel_grup', 'market', 'ROAS']] new_prediction = pipeline.predict(new_X) return new_prediction # Define function to predict ROAS for all def predict_roas_for_all(df, model, roas, month): days_in_month = { 1: 31, 2: 28, 3: 31, 4: 30, 5: 31, 6: 30, 7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31 } num_days = days_in_month[month] unique_combinations = df[['channel_grup', 'market']].drop_duplicates() predictions = [] for _, row in unique_combinations.iterrows(): channel_group = 
row['channel_grup'] market = row['market'] input_data = { 'channel_grup': channel_group, 'market': market, 'ROAS': roas, 'month': month } input_df = pd.DataFrame([input_data]) prediction = model.predict(input_df) cost = prediction[0][0] * num_days revenue = prediction[0][1] * num_days prediction_result = { 'channel_grup': channel_group, 'market': market, 'ROAS': roas, 'month': month, 'cost': cost, 'revenue': revenue } predictions.append(prediction_result) predictions_df = pd.DataFrame(predictions) return predictions_df # Load data df = pd.read_csv(r'data.csv') df['date'] = pd.to_datetime(df['date']) # Define unique hotels list hotels_list = df['hotel_name'].unique() # Initialize final results list final_results = [] for hotel in hotels_list: print(hotel) df_hotel = df.loc[ (df['Revenue'] > 1) & (df['hotel_name'] == hotel) ].reset_index(drop=True) if df_hotel.shape[0] == 0: continue df_hotel.loc[df_hotel['channel_group'] == 'Search', 'channel_grup'] = df_hotel['channel'] + '_' + df_hotel['channel_group'] df_hotel.loc[df_hotel['channel_grup'].isna(), 'channel_grup'] = df_hotel['channel_group'] group = df_hotel.groupby(by=['date', 'channel_grup', 'hotel_name', 'market'])[['Cost', 'Revenue']].sum().reset_index() group['ROAS'] = (group['Revenue'] / group['Cost']).round(2) market_counts = group.groupby(by=['market'])['date'].count().reset_index().sort_values(by=['date']) top_market_percent = market_counts.tail(int(np.ceil(0.75 * len(market_counts)))) top_market_percent = top_market_percent.drop(columns=['date']) group = pd.merge(group, top_market_percent, on=['market'], how='right') group = exclude_outliers_using_iqr(group, ['market', 'channel_grup'], ['ROAS', 'Cost', 'Revenue']) group['month'] = group['date'].dt.month X = group[['month', 'channel_grup', 'market', 'ROAS']] y = group[['Cost', 'Revenue']] categorical_features = ['channel_grup', 'market'] categorical_transformer = OneHotEncoder(handle_unknown='ignore') preprocessor = ColumnTransformer( transformers=[ 
('cat', categorical_transformer, categorical_features) ], remainder='passthrough' ) pipeline = Pipeline(verbose=True, steps=[ ('preprocessor', preprocessor), ('regressor', MultiOutputRegressor(RandomForestRegressor())) ]) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) pipeline.fit(X_train, y_train) for increment in range(1, 70, 1): for month in range(1, 13): increment_predictions = predict_roas_for_all(group, pipeline, increment, month) increment_predictions['hotel_name'] = hotel increment_predictions['increment'] = increment increment_predictions['month'] = month final_results.append(increment_predictions) # Combine all results into a single DataFrame final_results_df = pd.concat(final_results, ignore_index=True) Sample data: import pandas as pd data = { 'hotel_name': [ 'Jumeirah Burj Al Arab', 'Jumeirah Beach Hotel', 'Atlantis The Palm', 'Burj Khalifa Hotel', 'Armani Hotel Dubai', 'Jumeirah Burj Al Arab', 'Jumeirah Beach Hotel', 'Atlantis The Palm', 'Burj Khalifa Hotel', 'Armani Hotel Dubai', 'Jumeirah Burj Al Arab', 'Jumeirah Beach Hotel', 'Atlantis The Palm', 'Burj Khalifa Hotel', 'Armani Hotel Dubai' ], 'Channel': [ 'Bing_Search', 'Bing_Search', 'Bing_Search', 'Bing_Search', 'Bing_Search', 'Google_Search', 'Google_Search', 'Google_Search', 'Google_Search', 'Google_Search', 'Google_Search', 'Metasearch', 'Metasearch', 'Metasearch', 'Metasearch' ], 'market': [ 'Australia', 'UAE', 'UK', 'US', 'World Wide', 'Australia', 'Canada', 'UAE', 'UK', 'US', 'World Wide', 'India', 'UAE', 'UK', 'US' ], 'year': [ 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2024, 2023, 2023, 2023 ], 'month': [ 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 2, 2, 2 ], 'Cost': [ 38.1, 27.0, 26.2, 426.2, 119.8, 1177.8, 291.3, 16727.9, 10178.4, 4592.7, 44880.7, 162.2, 281.8, 45.0, 321.4 ], 'Revenue': [ 20946.6, 30081.5, 21308.8, 174064.0, 22784.2, 105614.4, 13672.4, 509304.4, 692854.5, 353565.6, 1164871.3, 107757.7, 27406.1, 31325.9, 80625.0 ], 
'ROAS': [ 549.78, 1114.13, 813.31, 408.41, 190.19, 89.67, 46.94, 30.45, 68.07, 76.98, 25.95, 664.35, 97.25, 696.13, 250.86 ] ) Thank You! submitted by /u/ibraheemn73 [link] [comments]

- [P] haipera - an open source tool to instrument Python notebooks & scripts with configs without writing any codeby /u/dromger (Machine Learning) on July 23, 2024 at 3:24 am
TL;DR: I made an open source (apache 2) tool (https://github.com/haipera/haipera) to make it easier to do hyperparameter sweeps with simple scripts & notebooks. Hey everyone! I've been doing research in ML / CV for a good 7 years now and I've always been frustrated by how much time I have to spend writing instrumentation code instead of writing algorithms. By instrumentation code, I mean things like: config management, config logging, logging in general, experiment tracking, etc... In my career I've written countless dataclasses, yaml files, json files, and a lot more code to pass the parameters from these config objects through layers of class hierarchies- just to find out whatever experiment I was trying wasn't fruitful and now having to delete what I just added. The oft-repeated meme is 'machine learning researchers just do hyperparameter sweeps' but reality is that we actually write code to pass in these hyper parameters so that we can do the sweeps. This causes even more problems when transferring the code to product teams; product teams get code that has 600 lines of argparse and code that copies from argparse into initializers; code that is often buggy and makes cross-project compatibility hard. I also have a lot of friends who work / worked on config systems to try to solve this- from Hydra to tyro to dysweep to countless internal tools. I was one of them too, but the problem is that these libraries tend to get increasingly complex... as they try to be more 'robust' and less broken. More code to write is almost never the solution. So I wanted to experiment with a new paradigm which throws away instrumentation code entirely and relies on static parsing to instrument code. This means you don't have to ever write a single line of code to enable things like configs for your code. This is recently made possible with the availability of better parsing libraries (like ast and libcst). LLMs hold a lot of exciting potential for this too looking into the future. 
How does this all work? Given a script like: num_apples = 100 apple_price = 3.0 print("# apples: ", num_apples) print("price of an apple: ", apple_price) price = num_apples * apple_price print("total: ", price) You just do pip install haipera and then you can run the script with haipera run script.py. You can run haipera run script.py --help to see that variables are directly editable from the CLI (right now only supports globals, and primitive types like numbers, bools, strings). You can run something like haipera run script.py --apple-price 1.0 to directly set the parameters from CLI. When you run with haipera, it will create its own experiment folder in reports and populates it with an automatically generated config file which you can rerun directly for reproducibility. If you want to do grid sweeps, you can simply pass in multiple arguments like haipera run script.py --num-apples 1,2,3 --apple-price 2.0,3.0,4.0. You can also do other things like haipera run script.ipynb to run a notebook as a script (convenient if you want to develop inside a notebook, but run lots of experiment with configs as scripts) or haipera notebook script.ipynb --opt1 2 to spin up a new variant of the notebook with the provided config. This turns out to be convenient for versioning your notebooks too! I'm pretty excited about this library and have been getting feedback from my researcher friends, but I wanted to show you all and gather feedback. We plan to make this much more feature complete (like supporting more types of variables, generally making everything more robust, and adding support for things like GPU profiling instrumentation)- but before that we wanted to hear what people think of this and hear what sorts of features you wish existed in MLOps tooling in general. Let us know what you think! https://github.com/haipera/haipera submitted by /u/dromger [link] [comments]

- Self-supervised learning weights initialization "after" projection head [D][R]by /u/grid_world (Machine Learning) on July 22, 2024 at 8:00 pm
For most Self-supervised learning algorithms: SimCLR, MoCo, BYOL, SimSiam, SwAV, etc., its common to have a projection head after the base encoder (which in most cases is a vanilla ResNet-50 CNN). An example of such a projection (taken from SwAV) is: projection_head = nn.Sequential( nn.Linear(2048, 512), nn.BatchNorm1d(512), nn.ReLU(inplace=True), nn.Linear(512, 128), ) The output of this projection head is L2-normalized: x = projection_head(x) x = nn.functional.normalize(x, dim = 1, p = 2) I am trying to initialize a layer after the projection head as: wts = nn.Parameter(data = torch.empty(40 * 40, 128), requires_grad = True) # The projection head outputs weights in the range [-1, 1], so initialize SOM weights to be in that range- wts.data.uniform_(-1.0, 1.0) Since the output of the projection head is L2-normalized, I am assuming that the input range to "wts" ∈ [-1, 1] and therefore use the uniform initialization above. Is this a correct approach or am I missing something? submitted by /u/grid_world [link] [comments]

- [D] What are the problems with using Llama in a commercial app?by /u/technicallynotlying (Machine Learning) on July 22, 2024 at 6:24 pm
I searched and saw a thread saying Llama shouldn't be used for commercial purposes, but I can't tell why. I looked at the Meta license for Llama and it says you don't need a license until you have 700M monthly users, a number which there is no way the application I have in mind would ever hit. What am I missing? If I use Llama in a commercial application with far fewer users (maybe 1M per month at the very highest), is there going to be a problem? submitted by /u/technicallynotlying [link] [comments]

- Suggested literature/techniques to model forward moving averagesby /u/Brites_Krieg (Data Science) on July 22, 2024 at 2:10 pm
I want to start a personal project, but i'm failing to formulate my business problem into a model. I would love inputs on how to better look into this issue and what type of models/techniques i should be researching to tackle it. I want to model the nth day forward moving average of a metric on a given date based on previous days and on the latest available forward moving average for that given day. For example: Consider today is day 30 and I want to predict up to 360d forward moving average a metric. I will only have the actual average value of the 360d forward moving average on day 360. Currently i have the actual average values for 1d to 30d. I also have all of these forward moving averages for the past 5 years. The goal is to define ranges in for all forward moving averages from the latest date (31d) to 360d. I am failing to think of the type of model i'd be looking for or how should i structure the problem, given how the goal here is not to predict a single value, but all the values in the 31d to 360d range. submitted by /u/Brites_Krieg [link] [comments]

- Easiest way to calculate required sample size for A/B testsby /u/vastava_viz (Data Science) on July 22, 2024 at 2:03 pm
I am a data scientist that monitors ~5-10 A/B experiments in a given month. I've used numerous online sample size calculators, but had minor grievances with each of them.. so I did a completely sane and normal thing, and built my own! Screenshot of A/B Test calculator at www.samplesizecalc.com/proportion-metric Unlike other calculators, mine can handle different split ratios (e.g. 20/80 tests), more than 2 testing groups beyond "Control" and "Treatment", and you can choose between a one-sided or two-sided statistical test. Most importantly, it outputs the required sample size and estimated duration for multiple Minimum Detectable Effects so you can make the most informed estimate (and of course you can input your own custom MDE value!). Here is the calculator: https://www.samplesizecalc.com/proportion-metric And here is an article explaining the methodology, inputs and the calculator's underlying formula: https://www.samplesizecalc.com/blog/how-sample-size-calculator-works Please let me know what you think! I'm looking for feedback from those who design and run A/B tests in their day-to-day. I've built this to tailor my own needs, but now I want to make sure it's helpful to the general audience as well 🙂 Note: You all were very receptive to the first version of this calculator I posted, so wanted to re-share now that's it's been updated in some key ways. Cheers! submitted by /u/vastava_viz [link] [comments]

- [D] Supervised Fine-Tuning (SFT)by /u/juliannorton (Machine Learning) on July 22, 2024 at 2:03 pm
Every chatbot in use today, from ChatGPT to custom chatbots built from open-source large language models (LLMs), has been instruction-tuned. An LLM, like any language model, is simply a next-token predictor. To get a vanilla LLM to interact with a user like a chatbot, it must be fine-tuned using tens of thousands of examples of user-and-assistant conversations. This process, called supervised fine-tuning, is a basic building block of productionizing an LLM application. Publicly available LLMs remain general purpose and aren’t suitable for direct use in most business applications because they need to be continuously fine-tuned to produce high-quality results. A modern supervised fine-tuning solution involves something called the low-rank adapter. Low-rank adapters are relatively small matrices (millions, not billions of elements) that sit alongside each layer of the LLM and act as a sidekick. It’s job is to translate the inputs and outputs of LLM layers into the proper domain without adding latency in production. During the fine-tuning process, low-rank adapters are trained on gold standard examples to teach an LLM how to respond. If the dataset is high quality and diverse, then the fine-tuned LLM’s output measurably increases in quality with as few as 100 examples as opposed to tens of thousands. Traditionally, these examples would be handcrafted by an expert, but writing them is time-consuming and labor-intensive. At Plum Defense, we automatically generate examples that are on par with human-written ones. This allows for continuous fine-tuning, which increases the quality of the LLM’s responses on an ongoing basis. By combining well-trained low-rank adapters with a well-written system prompt, a machine learning practitioner can produce a robust application that conforms well to the required output and is fast enough to use in production. 
A good system prompt conveys the intention but is concise enough to leave room for retrieval-augmentation (RAG) systems to inject relevant facts into the application. The system prompt’s length also has a direct impact on application latency. The smaller the system prompt, the faster the application’s average response time. With advanced techniques like soft-prompting, the size of the system prompt can be reduced significantly, which speeds up response time. If you’d like to learn more about continuous fine-tuning and soft-prompting system for your production application, shoot me a message. submitted by /u/juliannorton [link] [comments]

- [P] TTSDS - Benchmarking recent TTS systemsby /u/cdminix (Machine Learning) on July 22, 2024 at 1:29 pm
TL;DR - I made a benchmark for TTS, and you can see the results here: https://huggingface.co/spaces/ttsds/benchmark There are a lot of LLM benchmarks out there and while they're not perfect, they give at least an overview over which systems perform well at which tasks. There wasn't anything similar for Text-to-Speech systems, so I decided to address that with my latest project. The idea was to find representations of speech that correspond to different factors: for example prosody, intelligibility, speaker, etc. - then compute a score based on the Wasserstein distances to real and noise data for the synthetic speech. I go more into detail on this in the paper (https://www.arxiv.org/abs/2407.12707), but I'm happy to answer any questions here as well. I then aggregate those factors into one score that corresponds with the overall quality of the synthetic speech - and this score correlates well with human evluation scores from papers from 2008 all the way to the recently released TTS Arena by huggingface. Anyone can submit their own synthetic speech here. and I will be adding some more models as well over the coming weeks. The code to run the benchmark offline is here. submitted by /u/cdminix [link] [comments]

- [Discussion] When can I use research models for commercial purposes? by /u/Frosty-Equipment-692 (Machine Learning) on July 22, 2024 at 11:47 am
I was going through a research paper in which the authors use a diffusion model for a specific purpose. It occurred to me that it could be used commercially, with a huge market opportunity if executed correctly. So, assuming I have the paper's code, model architecture, and trained weights, I have three questions: 1. Can I productionize this model and its weights and use them commercially? 2. If not, can I use it commercially if I make some necessary changes to the architecture, train it on a new dataset, or both? 3. At what point would I run into legal, copyright, or license issues? submitted by /u/Frosty-Equipment-692

- [R] Equation requirements for PINNs (Physics-informed Neural Networks) by /u/its_a_targaryen (Machine Learning) on July 22, 2024 at 11:00 am
I had a question about the differential equations in the loss term. Typically, in PINNs, the loss function uses differential equations involving derivatives of the predicted output with respect to the input variables. For example, if u is the predicted output and x, y, m are the inputs, the loss function includes terms like du/d(x,y,m). However, what if we only have differential equations for the input variables with respect to another input or the output variable? For example: dx/dt = f(x,y,u) and dy/dt = g(x,u). Here, the derivatives of x and y are with respect to time t, and there is no equation for du/d(x,y,m). Is it possible to use a PINN approach in this case, where the loss function is constructed using only dx/dt and dy/dt? submitted by /u/its_a_targaryen

- [P] FLUTE - a new CUDA kernel for quantized LLM Inference achieving up to 2.6x latency improvements over vLLM. It extends QLoRA with learnable scales to 4-bit and 3-bit per parameter quantization. by /u/radi-cho (Machine Learning) on July 22, 2024 at 8:56 am
The deployment of large language models (LLMs) is often constrained by memory bandwidth, where the primary bottleneck is the cost of transferring model parameters from the GPU's global memory to its registers. When coupled with custom kernels that fuse the dequantization and matmul operations, weight-only quantization can thus enable faster inference by reducing the amount of memory movement. However, developing high-performance kernels for weight-quantized LLMs presents substantial challenges, especially when the weights are compressed to non-evenly-divisible bit widths (e.g., 3 bits) with non-uniform, lookup table (LUT) quantization. This paper describes FLUTE, a flexible lookup table engine for LUT-quantized LLMs, which uses offline restructuring of the quantized weight matrix to minimize bit manipulations associated with unpacking, and vectorization and duplication of the lookup table to mitigate shared memory bandwidth constraints. At batch sizes < 32 and quantization group size of 128 (typical in LLM inference), the FLUTE kernel can be 2-4x faster than existing GEMM kernels. As an application of FLUTE, we explore a simple extension to lookup table-based NormalFloat quantization and apply it to quantize LLaMA3 to various configurations, obtaining competitive quantization performance against strong baselines while obtaining an end-to-end throughput increase of 1.5 to 2 times. Arxiv: https://arxiv.org/abs/2407.10960 submitted by /u/radi-cho
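To make the terms in the abstract concrete, here is a minimal CPU-side sketch of lookup-table quantization with 3-bit codes and one scale per group. It is not the FLUTE kernel and performs none of its GPU optimizations; the table values, the group size of 8, and all function names are assumptions chosen for illustration (the table is not the NormalFloat table).

```python
def pack_codes(codes, bits):
    """Pack small integer codes into bytes, least-significant bits first."""
    buf = 0
    for i, code in enumerate(codes):
        if not 0 <= code < (1 << bits):
            raise ValueError("code does not fit in the given bit width")
        buf |= code << (i * bits)
    n_bytes = (len(codes) * bits + 7) // 8
    return buf.to_bytes(n_bytes, "little")

def unpack_codes(packed, bits, count):
    """Recover `count` codes from bytes produced by pack_codes."""
    buf = int.from_bytes(packed, "little")
    mask = (1 << bits) - 1
    return [(buf >> (i * bits)) & mask for i in range(count)]

def quantize_group(weights, table):
    """Map each weight to the index of the nearest table entry after scaling."""
    scale = max(abs(w) for w in weights) or 1.0
    codes = [min(range(len(table)), key=lambda k: abs(table[k] - w / scale))
             for w in weights]
    return codes, scale

def dequantize_group(codes, scale, table):
    """Look up each code in the table and undo the group scaling."""
    return [table[c] * scale for c in codes]

# Hypothetical non-uniform 8-entry table (3 bits), denser near zero.
TABLE = [-1.0, -0.55, -0.25, -0.08, 0.08, 0.25, 0.55, 1.0]
BITS = 3

weights = [0.31, -0.02, 0.9, -0.47, 0.12, -0.88, 0.05, 0.6]  # one toy group
codes, scale = quantize_group(weights, TABLE)
packed = pack_codes(codes, BITS)             # 8 codes * 3 bits = 24 bits = 3 bytes
restored = dequantize_group(unpack_codes(packed, BITS, len(codes)), scale, TABLE)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(len(packed), codes, round(max_error, 4))
```

Note how a 3-bit code straddles byte boundaries (the third code occupies bits 6 to 8), which is the kind of unpacking work the abstract says the offline weight restructuring is meant to minimize.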

- Perpetual: a gradient boosting machine which doesn't need hyperparameter tuning by /u/mutlu_simsek (Data Science) on July 22, 2024 at 8:30 am
Repo: https://github.com/perpetual-ml/perpetual PerpetualBooster is a gradient boosting machine (GBM) algorithm that doesn't need hyperparameter tuning, so unlike other GBM algorithms you can use it without hyperparameter optimization libraries. Similar to AutoML libraries, it has a budget parameter. Increasing the budget parameter increases the predictive power of the algorithm and gives better results on unseen data. The following table summarizes the results for the California Housing dataset (regression):

| Perpetual budget | LightGBM n_estimators | Perpetual mse | LightGBM mse | Perpetual cpu time | LightGBM cpu time | Speed-up |
|---|---|---|---|---|---|---|
| 1.0 | 100 | 0.192 | 0.192 | 7.6 | 978 | 129x |
| 1.5 | 300 | 0.188 | 0.188 | 21.8 | 3066 | 141x |
| 2.1 | 1000 | 0.185 | 0.186 | 86.0 | 8720 | 101x |

PerpetualBooster prevents overfitting with a generalization algorithm. A paper explaining how the algorithm works is a work in progress. Check our blog post for a high-level introduction to the algorithm. submitted by /u/mutlu_simsek

- [R] Neural networks have been trained to accurately predict the optimal geometry of molecules using 50 times less data by /u/AIRI_Institute (Machine Learning) on July 22, 2024 at 8:04 am
An important task of computational chemistry is to find molecular geometries where a local energy minimum is achieved, as these are the most likely configurations in which the molecule undergoes a chemical reaction. Despite recent progress in neural networks for molecular conformation energy prediction, such models are prone to errors due to distribution shifts, leading to inaccurate energy minimization. The quality of energy minimization with neural networks can be improved by providing optimization trajectories as additional training data. Still, obtaining complete optimization trajectories demands a lot of extra computation. A team of researchers developed a new framework called Gradual Optimization Learning Framework (GOLF), consisting of an efficient data-collecting scheme and an external optimizer. The authors demonstrated that, using significantly less additional data, the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules. The paper is published in the ICLR 2024 conference proceedings. submitted by /u/AIRI_Institute

- [P] ModelClash: Dynamic LLM Evaluation Through AI Duels by /u/throwquestion111 (Machine Learning) on July 22, 2024 at 7:20 am
I've developed ModelClash, an open-source framework for LLM evaluation that could offer some potential advantages over static benchmarks: automatic challenge generation, which reduces manual effort; difficulty that should scale with advancing model capabilities; and evaluation of both problem-creation and problem-solving skills. The project is in its early stages, but initial tests with GPT and Claude models show promising results. GitHub: https://github.com/mrconter1/ModelClash What are your thoughts on how this approach could complement existing LLM evaluation methods? submitted by /u/throwquestion111

- [D] Aggregating token probabilities by /u/archiesteviegordie (Machine Learning) on July 22, 2024 at 5:36 am
What are some good aggregation techniques that I can use to give a score to a generated sequence using its token probabilities (either the softmax probabilities or the log probabilities)? For example, finding the key entities in an answer, looking up their token probabilities, and taking the median token probability across those key entities. submitted by /u/archiesteviegordie
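A few common aggregates, together with the entity-median idea from the post, can be sketched as follows. The function names, the example probabilities, and the entity spans are hypothetical; which aggregate works best depends on the task and is not something this sketch settles.

```python
import math
import statistics

def aggregate_logprobs(logprobs):
    """Compute several sequence-level scores from per-token log probabilities."""
    mean_lp = sum(logprobs) / len(logprobs)
    probs = [math.exp(lp) for lp in logprobs]
    return {
        "sequence_prob": math.exp(sum(logprobs)),   # product of token probabilities
        "mean_logprob": mean_lp,                    # length-normalized log-likelihood
        "geometric_mean_prob": math.exp(mean_lp),   # same information on a 0..1 scale
        "perplexity": math.exp(-mean_lp),           # lower means more confident
        "min_prob": min(probs),                     # weakest token in the sequence
        "median_prob": statistics.median(probs),    # robust to a few outlier tokens
    }

def entity_median_prob(logprobs, entity_spans):
    """Median token probability over the tokens inside (start, end) entity spans."""
    probs = [math.exp(lp)
             for start, end in entity_spans
             for lp in logprobs[start:end]]
    return statistics.median(probs)

# Hypothetical per-token probabilities for a six-token answer.
token_probs = [0.9, 0.8, 0.5, 0.95, 0.2, 0.99]
logprobs = [math.log(p) for p in token_probs]

scores = aggregate_logprobs(logprobs)
# Assume tokens 2 and 4 were tagged as key entities (end index is exclusive).
entity_score = entity_median_prob(logprobs, [(2, 3), (4, 5)])
print(scores, entity_score)
```

The raw sequence probability shrinks with length, so length-normalized scores such as the mean log probability are usually easier to compare across answers of different sizes.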

- [Discussion] Document Image Restoration by /u/atlury (Machine Learning) on July 22, 2024 at 5:25 am
Here is DocRes, an image restoration model, running in chaiNNer to improve scanned documents: the original image, followed by the restored image, followed by the chaiNNer model. Going further, I am using Mindee docTR to get line segments very accurately. The next task I am working on is recognizing font sizes, then font styles, and then using Microsoft Phi-3 or a similar model with OCR capabilities to OCR the text, apply the styles, and restore the image. Links: https://github.com/ZZZHANG-jx/DocRes https://github.com/chaiNNer-org/chaiNNer [Images: original image, restored image, chaiNNer architecture, recognized line segments] submitted by /u/atlury

- [P] Best practices in fine tuning OS models with sparse data for custom downstream tasks by /u/VBQL (Machine Learning) on July 22, 2024 at 5:03 am
I have a downstream task in which 99+% of the input is context generated by various sources. The actual model output is just a couple of tokens, but the input can vary from 2k tokens all the way up to 10k tokens in size. Therefore, I'm trying to fine-tune Mistral 7B v0.3 for this task, given its long context window. But even with a lower learning rate like 8e-6 with decay, I'm still getting higher and higher training losses per run. The training set consists of the standard input_ids, attention_mask, and labels, but due to the nature of the training data, attention_mask and labels are mostly 1s and -100s, respectively. Since the examples also vary wildly in size, I've packed the data into sequences of length 4096 so that the length is constant. My training machine is an AWS trn1n.32xlarge instance. Are there any suggestions on what I should do here? For anyone curious about the dataset, here is a link to the directly tokenized version of the data. submitted by /u/VBQL
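The data layout described in the post can be sketched in plain Python. This is an illustration under assumptions, not the poster's pipeline: the helper names are hypothetical, packing is assumed to mean concatenating examples and cutting fixed-size blocks, and a block size of 8 stands in for 4096 so the output is easy to read.

```python
IGNORE_INDEX = -100  # label value that PyTorch-style cross-entropy losses skip

def build_example(context_ids, answer_ids):
    """Mask the context so that only answer tokens contribute to the loss."""
    input_ids = list(context_ids) + list(answer_ids)
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "labels": [IGNORE_INDEX] * len(context_ids) + list(answer_ids),
    }

def pack_examples(examples, block_size, pad_id=0):
    """Concatenate examples and cut them into fixed-size blocks, padding the last one."""
    ids, labels = [], []
    for ex in examples:
        ids.extend(ex["input_ids"])
        labels.extend(ex["labels"])
    blocks = []
    for start in range(0, len(ids), block_size):
        chunk_ids = ids[start:start + block_size]
        chunk_labels = labels[start:start + block_size]
        pad = block_size - len(chunk_ids)
        blocks.append({
            "input_ids": chunk_ids + [pad_id] * pad,
            "attention_mask": [1] * len(chunk_ids) + [0] * pad,
            "labels": chunk_labels + [IGNORE_INDEX] * pad,
        })
    return blocks

def supervised_tokens(block):
    """Count the tokens in a block that actually receive a training signal."""
    return sum(label != IGNORE_INDEX for label in block["labels"])

# Two toy examples: long context, very short answer (token ids are arbitrary).
examples = [
    build_example(range(100, 110), [7, 8]),
    build_example(range(200, 205), [9]),
]
blocks = pack_examples(examples, block_size=8)
counts = [supervised_tokens(b) for b in blocks]
print(counts)
```

Under these assumptions the first block contains no supervised tokens at all, because an example longer than the block size is split and its answer lands in a later block. Counting supervised tokens per block is a quick sanity check for data shaped like this.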

- Weekly Entering & Transitioning - Thread 22 Jul, 2024 - 29 Jul, 2024 by /u/AutoModerator (Data Science) on July 22, 2024 at 4:01 am
Welcome to this week's entering & transitioning thread! This thread is for any questions about getting started, studying, or transitioning into the data science field. Topics include: learning resources (e.g. books, tutorials, videos); traditional education (e.g. schools, degrees, electives); alternative education (e.g. online courses, bootcamps); job search questions (e.g. resumes, applying, career prospects); and elementary questions (e.g. where to start, what next). While you wait for answers from the community, check out the FAQ and Resources pages on our wiki. You can also search for answers in past weekly threads. submitted by /u/AutoModerator