**What are the top 3 methods used to find Autoregressive Parameters in Data Science?**

In order to find autoregressive parameters, you will first need to understand what autoregression is. **Autoregression is a statistical method that models a variable as a linear function of its own lagged values.** In other words, it is a model that uses past values of a dependent variable in order to predict future values of that same variable.

In time series analysis, **autoregression is the use of previous values in a time series to predict future values.** In other words, it is a form of regression where the dependent variable is forecast using a linear combination of its own past values. The parameter values for the autoregression model are typically estimated using the method of least squares.
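In symbols, an autoregressive model of order $p$, written AR($p$), takes the form:

$$ Y_{t}=c+\varphi _{1}Y_{t-1}+\varphi _{2}Y_{t-2}+\cdots +\varphi _{p}Y_{t-p}+\varepsilon _{t} $$

where $c$ is a constant, the coefficients $\varphi_1,\dots,\varphi_p$ are the autoregressive parameters we want to estimate, and $\varepsilon_t$ is a white-noise error term.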

The autoregressive parameters are the coefficients in the autoregressive model. These coefficients can be estimated in a number of ways, including ordinary least squares (OLS), maximum likelihood (ML), or least squares with L1 regularization (LASSO). Once estimated, the autoregressive parameters can be used to predict future values of the dependent variable.

To find the autoregressive parameters, you need to use a method known as **least squares regression**. This method finds the parameters that minimize the sum of the squared residuals. A residual is simply the difference between a predicted value and the actual value. So, in essence, you are finding the parameters that best fit the data.

**How to Estimate Autoregressive Parameters?**

There are three main ways to estimate autoregressive parameters: ordinary least squares (OLS), maximum likelihood (ML), and least squares with L1 regularization (LASSO).

**Ordinary Least Squares**: Ordinary least squares is the simplest and most common method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values.

**Maximum Likelihood**: Maximum likelihood is another common method for estimating autoregressive parameters. This method estimates the parameters by maximizing the likelihood function. The likelihood function is a mathematical function that quantifies the probability of observing a given set of data given certain parameter values.

**Least Squares with L1 Regularization**: Least squares with L1 regularization is another method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values while also penalizing models with many parameters. L1 regularization penalizes models by adding an extra term to the error function that is proportional to the sum of absolute values of the estimator coefficients.
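Written as optimization problems, the three criteria differ only in their objective. With $\hat{y}_{t}$ denoting the model's prediction from lagged values and $\lambda \ge 0$ a regularization strength:

$$ \hat{\beta }_{\text{OLS}}=\arg \min _{\beta }\sum _{t}\left(y_{t}-\hat{y}_{t}\right)^{2} $$

$$ \hat{\beta }_{\text{ML}}=\arg \max _{\beta }\;L\left(\beta ;y_{1},\dots ,y_{n}\right) $$

$$ \hat{\beta }_{\text{LASSO}}=\arg \min _{\beta }\sum _{t}\left(y_{t}-\hat{y}_{t}\right)^{2}+\lambda \sum _{j}\left|\beta _{j}\right| $$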

**Finding Autoregressive Parameters: The Math Behind It**

To find the autoregressive parameters using least squares regression, you first need to set up your data in a certain way. You need to have your dependent variable in one column and your independent variables in other columns. For example, let’s say you want to use three years of data to predict next year’s sales (the dependent variable). Your data would look something like this:

| Year | Sales |
|------|-------|
| 2016 | 100 |
| 2017 | 150 |
| 2018 | 200 |

Next, you need to calculate the mean of each column. For our sales example, that would look like this:

$$ \bar{X} = \frac{2016+2017+2018}{3} = 2017 \qquad \bar{Y} = \frac{100+150+200}{3} = 150 $$

Now we can calculate each element in what’s called the variance-covariance matrix:

$$ \operatorname {Var} (X)=\sum _{i=1}^{n}\left({x_{i}}-{\bar {x}}\right)^{2} $$

and

$$ \operatorname {Cov} (X,Y)=\sum _{i=1}^{n}\left({x_{i}}-{\bar {x}}\right)\left({y_{i}}-{\bar {y}}\right) $$

For our sales example, with $X$ the year and $Y$ the sales, that calculation would look like this:

$$ \operatorname {Var} (X)=\sum _{i=1}^{3}\left({x_{i}}-{\bar {x}}\right)^{2}=(2016-2017)^{2}+(2017-2017)^{2}+(2018-2017)^{2}=2 $$


and

$$ \operatorname {Cov} (X,Y)=\sum _{i=1}^{3}\left({x_{i}}-{\bar {x}}\right)\left({y_{i}}-{\bar {y}}\right)=(2016-2017)(100-150)+(2017-2017)(150-150)+(2018-2017)(200-150)=100 $$

Now we can finally calculate our parameter! We do that by dividing the covariance by the variance:

$$ \hat {\beta }=\frac{\operatorname {Cov} (X,Y)}{\operatorname {Var} (X)}=\frac{100}{2}=50 $$

That's it! Our estimated parameter is 50, meaning sales grow by about 50 units per year. Once we have that parameter, we can plug it into the regression equation to forecast:

$$ \hat{Y}_{t}=\bar{Y}+\hat {\beta }\left(X_{t}-\bar{X}\right) $$

so the prediction for 2019 is $150 + 50\times(2019-2017)=250$. And that's how you solve for the parameters! Note that in a true autoregressive model the independent variable would be the lagged sales $Y_{t-1}$ rather than the year, but the least-squares mechanics are identical. Of course, in reality you would be working with much larger datasets, but the underlying principles are still the same. Once you have your parameters, you can plug them into the equation and start making predictions!
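As a sanity check, the variance, covariance, and least-squares slope for the toy table can be computed in a few lines of plain Python (a sketch; the variable names are our own):

```python
# Least-squares slope for the toy example: predict sales from year.
years = [2016, 2017, 2018]
sales = [100, 150, 200]

x_bar = sum(years) / len(years)   # 2017.0
y_bar = sum(sales) / len(sales)   # 150.0

# Sum of squared deviations of X, and cross-deviations of X and Y
var_x = sum((x - x_bar) ** 2 for x in years)
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))

beta = cov_xy / var_x                          # least-squares slope: 50.0
forecast_2019 = y_bar + beta * (2019 - x_bar)  # 250.0

print(var_x, cov_xy, beta, forecast_2019)
```

Computing the pieces by hand first, as above, makes it easy to spot arithmetic slips before trusting a statistical package's output.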

**Which Method Should You Use?**

The estimation method you should use depends on your particular situation and goals. If you are looking for simple and interpretable results, then Ordinary Least Squares may be the best method for you. If you are looking for more accurate predictions, then Maximum Likelihood or Least Squares with L1 Regularization may be better methods for you.

**Autoregressive models STEP BY STEP:**

1) **Download data**: The first step is to download some data. This can be done by finding a publicly available dataset or by using your own data if you have any. For this example, we will be using data from the United Nations Comtrade Database.

2) **Choose your variables**: Once you have your dataset, you will need to choose the variables you want to use in your autoregression model. In our case, we will be using the import and export values of goods between countries as our independent variables.

3) **Estimate your model:** After choosing your independent variables, you can estimate your autoregression model using the method of least squares. OLS estimation can be done in many statistical software packages such as R or STATA.

4) **Interpret your results**: Once you have estimated your model, it is important to interpret the results in order to understand what they mean. The coefficients represent the effect that each independent variable has on the dependent variable; in our case, the effect that imports and exports have on the trade balance. A positive coefficient indicates that an increase in the independent variable leads to an increase in the dependent variable, while a negative coefficient indicates that an increase in the independent variable leads to a decrease in the dependent variable.

5) **Make predictions:** Finally, once you have interpreted your results, you can use your autoregression model to make predictions about future values of the dependent variable based on past values of the independent variables.
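Putting steps 3–5 together, here is a minimal sketch that fits an AR(1) model by ordinary least squares in plain Python. The series and variable names below are invented for illustration (not the UN Comtrade data); any numeric series loaded into a list works the same way:

```python
# Fit an AR(1) model y_t = c + phi * y_{t-1} by ordinary least squares.
series = [100, 112, 119, 131, 140, 148, 160, 171]  # synthetic toy data

x = series[:-1]  # lagged values y_{t-1} (independent variable)
y = series[1:]   # current values y_t   (dependent variable)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates: slope (phi) and intercept (c)
phi = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
      sum((xi - x_bar) ** 2 for xi in x)
c = y_bar - phi * x_bar

# Step 5: one-step-ahead prediction from the last observed value
prediction = c + phi * series[-1]
print(phi, c, prediction)
```

For a steadily growing series like this one, `phi` lands close to 1 and the prediction continues the trend; statistical packages such as R or STATA perform the same calculation with standard errors and diagnostics included.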

**Conclusion:** In this blog post, we have discussed what autoregression is and how to find autoregressive parameters.

Estimating an autoregression model is a relatively simple process that can be done in many statistical software packages such as R or STATA.

In statistics and machine learning, autoregression is a modeling technique used to describe the linear relationship between a dependent variable and one or more independent variables. To find the autoregressive parameters, you can use a method known as least squares regression, which minimizes the sum of squared residuals. This blog post also explains how to set up your data for least squares regression and how to calculate the variance and covariance before finally calculating your autoregressive parameters. After finding your parameters, you can plug them into an autoregressive equation to start making predictions about future events!

We have also discussed three different methods for estimating those parameters: Ordinary Least Squares, Maximum Likelihood, and Least Squares with L1 Regularization. **The appropriate estimation method depends on your particular goals and situation.**



# Autoregressive Model

Autoregressive generative models can estimate complex continuous data distributions such as trajectory rollouts in an RL environment, image intensities, and audio. Traditional techniques discretize continuous data into various bins and approximate the continuous data distribution using categorical distributions over the bins. This approximation is parameter inefficient, as it cannot express abrupt changes in density without using a significant number of additional bins. Adaptive Categorical Discretization (ADACAT) is proposed in this paper as a parameterization of 1-D conditionals that is expressive, parameter efficient, and multimodal. The ADACAT distribution is parameterized by a vector of interval widths and masses. Figure 1 showcases the difference between the traditional uniform categorical discretization approach and the proposed ADACAT.

Each component of the ADACAT distribution has non-overlapping support, making it a specific subfamily of mixtures of uniform distributions. ADACAT generalizes uniformly discretized 1-D categorical distributions. The proposed architecture allows for variable bin widths and more closely approximates the modes of a mixture of two Gaussians than a uniformly discretized categorical distribution, making it more expressive than the latter. Additionally, a distribution's support is discretized using quantile-based discretization, which bins data into groups with similar numbers of data points. ADACAT uses deep autoregressive frameworks to factorize the joint density into numerous 1-D conditional ADACAT distributions in problems with more than one dimension.
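The quantile-based discretization idea can be illustrated in isolation (a sketch of the binning idea only, not the ADACAT parameterization itself): equal-mass bins place their edges where the data actually lives, while equal-width bins waste capacity on empty regions between modes:

```python
# Uniform vs quantile-based discretization of a bimodal 1-D sample.
import random

random.seed(0)
# Two well-separated clusters, so uniform bins leave gaps in the middle
data = [random.gauss(-3, 0.3) for _ in range(500)] + \
       [random.gauss(3, 0.3) for _ in range(500)]

k = 8  # number of bins

# Uniform bins: equal widths over the data range
lo, hi = min(data), max(data)
uniform_edges = [lo + (hi - lo) * i / k for i in range(k + 1)]

# Quantile bins: edges at order statistics, so each bin holds equal mass
s = sorted(data)
n = len(s)
quantile_edges = [s[min(n * i // k, n - 1)] for i in range(k + 1)]

def bin_counts(edges, xs):
    """Count points per bin: each value goes to the last bin it can enter."""
    out = [0] * (len(edges) - 1)
    for v in xs:
        j = max(i for i in range(len(edges) - 1) if edges[i] <= v)
        out[j] += 1
    return out

uniform_counts = bin_counts(uniform_edges, data)
quantile_counts = bin_counts(quantile_edges, data)
print(uniform_counts)   # several central bins are empty
print(quantile_counts)  # roughly equal mass in every bin
```

The uniform scheme spends several of its bins on the empty gap between the two modes, while every quantile bin carries useful probability mass, which is the intuition behind adaptive bin widths.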



- [D] Is creating indexes and using vector databases really required in my use case for Retrieval Augmented Generation operations? by /u/Wise-Grand-8374 (Machine Learning) on August 7, 2024 at 8:14 pm
My use case is simple: I need to extract information from small 1-3 paged documents. Method 1: Should I append the whole text from the document along with the query and pass it to a LLM? I think this will do the job(I may be wrong) since the final number of tokens will for sure be lesser than the context window size of the LLM. Method 2: Or should I split it into chunks and store them in a vector database with indexes and retrieve relevant chunks before passing it to the LLM? The only advantage I can think of is scalability in case of large documents in the future. Will it improve the accuracy of retrieval in case of small documents? Since the documents are small, the processing time won’t be an issue. The main thing I care about is accurately extracting the information from the text. If method 2 helps in improving the accuracy, I will go with it. Any suggestions will be helpful! submitted by /u/Wise-Grand-8374 [link] [comments]

- [Research] The Puzzling Failure of Multimodal AI Chatbots by /u/chiayewken (Machine Learning) on August 7, 2024 at 5:33 pm
https://preview.redd.it/ummnvenf1ahd1.png?width=2592&format=png&auto=webp&s=7115ba5de026ada17b0636ec2fa3c3151b3e5eb6 Chatbot models such as GPT-4o and Gemini have demonstrated impressive capabilities in understanding both images and texts. However, it is not clear whether they can emulate the general intelligence and reasoning ability of humans. To this end, PuzzleVQA is a new benchmark of multimodal puzzles to explore the limits of current models. As shown above, even models such as GPT-4V struggle to understand simple abstract patterns that a child could grasp. https://preview.redd.it/7l5fmuys1ahd1.png?width=2716&format=png&auto=webp&s=337118dbc55230637cec1b08b90ae943746ddbb0 Despite the apparent simplicity of the puzzles, we observe surprisingly poor performance for current multimodal AI models. Notably, there remains a massive gap towards human performance. Thus, the natural question arises: what caused the failure of the models? To answer this question, we ran a bottleneck analysis by progressively providing ground-truth "hints" to the models, such as image captions for perception or reasoning explanations. As shown above, we found that leading models face key challenges in visual perception and inductive reasoning. This means that they are not able to accurately perceive the objects in the images, and they are also poor at recognizing the correct patterns. https://arxiv.org/abs/2403.13315 submitted by /u/chiayewken [link] [comments]

- [D] Is there an appropriate community for technical discussions of general intelligence development? by /u/Revolutionary-Fig660 (Machine Learning) on August 7, 2024 at 5:27 pm
Acknowledgment that is post is skirting the line of not discussing AGI, and mods can delete it. I know that posts related to AGI should be directed to r/singularity , but that reddit seems to mostly be filled with non-technical posts hyping and philosophizing news articles. I think there is a lot of valid discussion in the field of ML to be had regarding technical approaches, issues, and research to creating generalized intelligence, such as spiking networks, evolutionary algorithms, memory augmented networks, RL etc. I don't think just scaling current approaches (LLMs) will get us there for technical reasons and we are rather far out, but I don't want this post to be about discussing that. Rather, are there recommendations for communities or other groups that focus on the technical work, research, and practical discussion of working towards AGI? submitted by /u/Revolutionary-Fig660 [link] [comments]

- [D] What are the benefits of being on a program committee? by /u/smorad (Machine Learning) on August 7, 2024 at 4:59 pm
I'm curious as to what being on a program committee entails for an ML conference, and why one would choose to be on the program committee. As a reviewer for the ML conferences, I can see benefit in getting to read a few papers deeply. What is the benefit of being on the program committee? My understanding is the job is to mostly ping reviewers, summarize reviews, and other administrative tasks. submitted by /u/smorad [link] [comments]

- [D] Sequence labeling by /u/FeatureBackground634 (Machine Learning) on August 7, 2024 at 3:30 pm
Looking for a an NLP model/research papers that can tag long sequences. Unline NER where entities tagged are usually small spans like name, location etc ; I am looking for a model that can work on extracting longer sequences. It can be a QA like model which is capable of tagging longer spans as the answer. Thanks!!! submitted by /u/FeatureBackground634 [link] [comments]

- [D] Question About Public Access to Reviews for AAAI Submissions by /u/RudeFollowing2534 (Machine Learning) on August 7, 2024 at 3:24 pm
I am preparing a submission for AAAI, which is using OpenReview this year. Does anyone know if all reviews, including those for rejected papers, will be made public after the review process? I couldn't find this information on the AAAI website. Thanks! submitted by /u/RudeFollowing2534 [link] [comments]

- [P] Training an Embedding Model to Ignore Unnecessary Dimensions for a Topic by /u/zeronyk (Machine Learning) on August 7, 2024 at 2:09 pm
Hi, I’m working on building a Knowledge Management Tool for a fixed set of topic-specific documents. The primary goal is to make these documents "explorable" in the embedding space and to cluster them intelligently. However, I've noticed that most of the embeddings are very close together, which I believe is because they all revolve around the same topic. My idea is to fine-tune a model to de-emphasize the rest of the embedding space, thereby boosting the differences within the same topic and making them more comparable. I initially tried using PCA for this, but the results were not good. Another idea I’m exploring is using a small autoencoder on the embeddings, or possibly fine-tuning an open-source embedding model for this purpose. However, I’m not sure how to start. Does anyone have experience with this? If so, what approaches, models, frameworks, or sources did you use, and what were the results? Additionally, I’m searching for nice visual exploration of the dataset on top of this. While aesthetics are secondary, I’m interested in any recommendations for effective plotting methods. submitted by /u/zeronyk [link] [comments]

- [D] Are Neurips 2024 rebuttal viewable to reviewers now? by /u/fixed-point-learning (Machine Learning) on August 7, 2024 at 1:37 pm
This should have happened a couple of hours ago, but the papers I reviewed still only show the original reviews only, no rebuttals. What’s going on? submitted by /u/fixed-point-learning [link] [comments]

- [D] RLHF for LLMs: Variable number of actions? by /u/No_Individual_7831 (Machine Learning) on August 7, 2024 at 1:13 pm
Hey, I have a question regarding the PPO involvement in RLHF for LLMs. As the goal is to optimize the answer of the model, i.e. a sequence of tokens, how does the action space look? Token sequences always differ in length and one token equals one action. So the number of actions the model has to output simultaneously at each step varies. I have never worked in RL with such a scenario where the number of actions varies at each step. So my question is, is this even the right intuition how the actions are framed and, if yes, how is the variable number of actions handled? submitted by /u/No_Individual_7831 [link] [comments]

- [D] How do you keep track of all your experiments? by /u/Theboredhuman_56 (Machine Learning) on August 7, 2024 at 12:32 pm
Hello everyone, In my company, we are conducting a lot of experiments on LLMs. We are currently in the process of doing "small-scale" experiments to do various things (select various hyperparameters, do some small architecture changes, what dataset to use, etc ...) We are using WandB and it's pretty cool to log experiments but I'm not aware of any features to go a step further in terms of collaboration. For instance, we would like to have something were we can write conclusions from the various experiments/plots we launched and ideally have the plots and conclusions stored in one place. This way it's easy to keep track of everything and in particular when we go back to experiments months later, we are able to understand why we launched it and what was the conclusion out of it. How do you manage that ? Do you use specific tools ? submitted by /u/Theboredhuman_56 [link] [comments]

- [P] Error in `Actual` values by /u/WillsSpirit (Machine Learning) on August 7, 2024 at 11:44 am
So, I am trying to make a neural network from scratch in C. It started out just being able to fit to boolean functions such as XOR, but I slowly tinkered and made it more and more complicated. I am now on the task of trying to fit the model to classify Circles and Crosses, with both able to be made of 0's, 1's, 2's, and 3's. I am so sorry if my model is horrible, I am a relative noob at programming, but I am so close to getting near 100% accuracy. Here's the only problem: When my variable `int label == 5`, my `actual` value is wrong. This means that my network is correctly classifying everything perfectly, but my labels (supposed to be ground-truth) are sometimes wrong, and only when label = 5. I have no idea what I'm doing wrong and have spent an hour + trying to figure it out. I assume the error is in either my void generate_image() function, or in main(). I'd appreciate any help I can get! About this error, or about anything stupid architecture-wise that I should change. I am mainly doing this for educational purposes to apply the math I've learned. 
#include <stdio.h> #include <stdlib.h> #include <math.h> #include <time.h> #include <string.h> #define INPUT_SIZE 784 #define HIDDEN_SIZE 64 #define OUTPUT_SIZE 8 #define NUM_TRAINING_EXAMPLES 1000 #define LEARNING_RATE 0.05 #define EPOCHS 100 #define IMAGE_SIZE 28 // Function prototypes double sigmoid(double x); double sigmoid_derivative(double x); void init_weights(double *matrix, int size, int input_size); void init_bias(double *matrix, int size); void matrix_multiply(const double *a, const double *b, double *result, int a_rows, int a_cols, int b_cols); void matrix_add(const double *a, const double *b, double *result, int size); void matrix_subtract(const double *a, const double *b, double *result, int size); void matrix_hadamard(const double *a, const double *b, double *result, int size); void forward_propagation(const double *input, const double *hidden_weights, const double *hidden_bias, const double *output_weights, const double *output_bias, double *hidden_output, double *final_output); void backward_propagation(const double *input, const double *hidden_output, const double *final_output, double *hidden_weights, double *output_weights, double *hidden_bias, double *output_bias, int target); void generate_image(double *image, int *label); void print_image(const double *image, int predicted); int main() { clock_t start_time = clock(); srand(time(NULL)); double *hidden_weights = malloc(INPUT_SIZE * HIDDEN_SIZE * sizeof(double)); double *output_weights = malloc(HIDDEN_SIZE * OUTPUT_SIZE * sizeof(double)); double *hidden_bias = malloc(HIDDEN_SIZE * sizeof(double)); double *output_bias = malloc(OUTPUT_SIZE * sizeof(double)); init_weights(hidden_weights, INPUT_SIZE * HIDDEN_SIZE, INPUT_SIZE); init_weights(output_weights, HIDDEN_SIZE * OUTPUT_SIZE, HIDDEN_SIZE); init_bias(hidden_bias, HIDDEN_SIZE); init_bias(output_bias, OUTPUT_SIZE); for (int epoch = 0; epoch < EPOCHS; epoch++) { int correct = 0; for (int i = 0; i < NUM_TRAINING_EXAMPLES; i++) { double 
input[INPUT_SIZE]; int label; generate_image(input, &label); double hidden_output[HIDDEN_SIZE]; double final_output[OUTPUT_SIZE]; forward_propagation(input, hidden_weights, hidden_bias, output_weights, output_bias, hidden_output, final_output); backward_propagation(input, hidden_output, final_output, hidden_weights, output_weights, hidden_bias, output_bias, label); int predicted = 0; double max_output = final_output[0]; for (int j = 1; j < OUTPUT_SIZE; j++) { if (final_output[j] > final_output[predicted]) { max_output = final_output[j]; predicted = j; } } if (predicted == label) correct++; } printf("Epoch %d, Accuracy: %.2f%%\n", epoch, (float)correct / NUM_TRAINING_EXAMPLES * 100); } printf("\nTesting the neural network:\n"); int correct = 0; for (int i = 0; i < 50; i++) { double input[INPUT_SIZE]; int label; generate_image(input, &label); double hidden_output[HIDDEN_SIZE]; double final_output[OUTPUT_SIZE]; forward_propagation(input, hidden_weights, hidden_bias, output_weights, output_bias, hidden_output, final_output); int predicted = 0; double max_output = final_output[0]; for (int j = 1; j < OUTPUT_SIZE; j++) { if (final_output[j] > final_output[predicted]) { max_output = final_output[j]; predicted = j; } } printf("%d", label); if (predicted == label) correct++; const char* label_names[] = {"Cross 0", "Cross 1", "Cross 2", "Cross 3", "Circle 0", "Circle 1", "Circle 2", "Circle 3"}; printf("\nActual: %s, Predicted: %s\n", label_names[label], label_names[predicted]); print_image(input, predicted); printf("\n"); } free(hidden_weights); free(output_weights); free(hidden_bias); free(output_bias); clock_t end_time = clock(); double time_spent = (double)(end_time - start_time) / CLOCKS_PER_SEC; printf("Time spent: %.2f seconds\n", time_spent); return 0; } double sigmoid(double x) { return 1 / (1 + exp(-x)); } double sigmoid_derivative(double x) { return x * (1 - x); } void init_weights(double *matrix, int size, int input_size) { double limit = sqrt(6.0 / (input_size + 
size)); for (int i = 0; i < size; i++) { matrix[i] = ((double)rand() / RAND_MAX) * 2 * limit - limit; } } void init_bias(double *matrix, int size) { for (int i = 0; i < size; i++) { matrix[i] = 0.0; } } void matrix_multiply(const double *a, const double *b, double *result, int a_rows, int a_cols, int b_cols) { for (int i = 0; i < a_rows; i++) { for (int j = 0; j < b_cols; j++) { double sum = 0.0; for (int k = 0; k < a_cols; k++) { sum += a[i * a_cols + k] * b[k * b_cols + j]; } result[i * b_cols + j] = sum; } } } void matrix_add(const double *a, const double *b, double *result, int size) { for (int i = 0; i < size; i++) { result[i] = a[i] + b[i]; } } void matrix_subtract(const double *a, const double *b, double *result, int size) { for (int i = 0; i < size; i++) { result[i] = a[i] - b[i]; } } void matrix_hadamard(const double *a, const double *b, double *result, int size) { for (int i = 0; i < size; i++) { result[i] = a[i] * b[i]; } } void forward_propagation(const double *input, const double *hidden_weights, const double *hidden_bias, const double *output_weights, const double *output_bias, double *hidden_output, double *final_output) { double hidden_sum[HIDDEN_SIZE]; matrix_multiply(input, hidden_weights, hidden_sum, 1, INPUT_SIZE, HIDDEN_SIZE); matrix_add(hidden_sum, hidden_bias, hidden_sum, HIDDEN_SIZE); for (int i = 0; i < HIDDEN_SIZE; i++) { hidden_output[i] = sigmoid(hidden_sum[i]); } double output_sum[OUTPUT_SIZE]; matrix_multiply(hidden_output, output_weights, output_sum, 1, HIDDEN_SIZE, OUTPUT_SIZE); matrix_add(output_sum, output_bias, output_sum, OUTPUT_SIZE); for (int i = 0; i < OUTPUT_SIZE; i++) { final_output[i] = sigmoid(output_sum[i]); } } void backward_propagation(const double *input, const double *hidden_output, const double *final_output, double *hidden_weights, double *output_weights, double *hidden_bias, double *output_bias, int target) { double target_vector[OUTPUT_SIZE] = {0}; target_vector[target] = 1; double output_error[OUTPUT_SIZE]; 
matrix_subtract(target_vector, final_output, output_error, OUTPUT_SIZE); double output_delta[OUTPUT_SIZE]; for (int i = 0; i < OUTPUT_SIZE; i++) { output_delta[i] = output_error[i] * sigmoid_derivative(final_output[i]); } double hidden_error[HIDDEN_SIZE] = {0}; for (int i = 0; i < HIDDEN_SIZE; i++) { for (int j = 0; j < OUTPUT_SIZE; j++) { hidden_error[i] += output_delta[j] * output_weights[i * OUTPUT_SIZE + j]; } } double hidden_delta[HIDDEN_SIZE]; for (int i = 0; i < HIDDEN_SIZE; i++) { hidden_delta[i] = hidden_error[i] * sigmoid_derivative(hidden_output[i]); } for (int i = 0; i < HIDDEN_SIZE; i++) { for (int j = 0; j < OUTPUT_SIZE; j++) { output_weights[i * OUTPUT_SIZE + j] += LEARNING_RATE * output_delta[j] * hidden_output[i]; } } for (int i = 0; i < INPUT_SIZE; i++) { for (int j = 0; j < HIDDEN_SIZE; j++) { hidden_weights[i * HIDDEN_SIZE + j] += LEARNING_RATE * hidden_delta[j] * input[i]; } } for (int i = 0; i < OUTPUT_SIZE; i++) { output_bias[i] += LEARNING_RATE * output_delta[i]; } for (int i = 0; i < HIDDEN_SIZE; i++) { hidden_bias[i] += LEARNING_RATE * hidden_delta[i]; } } void generate_image(double *image, int *label) { int shape = rand() % 2; // 0 for cross, 1 for circle int symbol = rand() % 4; // 0, 1, 2, or 3 *label = shape * 4 + symbol; memset(image, 0, INPUT_SIZE * sizeof(double)); double symbol_values[] = {0.6, 0.7, 0.8, 0.9}; double symbol_value = symbol_values[symbol]; double background_value = 0.0; if (shape == 0) { // Cross for (int i = 0; i < IMAGE_SIZE; i++) { for (int j = 0; j < IMAGE_SIZE; j++) { if (i == IMAGE_SIZE/2 || i == IMAGE_SIZE/2-1 || j == IMAGE_SIZE/2 || j == IMAGE_SIZE/2-1) { image[i * IMAGE_SIZE + j] = symbol_value; } else { image[i * IMAGE_SIZE + j] = background_value; } } } } else { // Circle int center = IMAGE_SIZE / 2; int radius = IMAGE_SIZE / 4; for (int i = 0; i < IMAGE_SIZE; i++) { for (int j = 0; j < IMAGE_SIZE; j++) { int dx = i - center; int dy = j - center; if (dx*dx + dy*dy <= radius*radius && dx*dx + dy*dy >= 
(radius-2)*(radius-2)) { image[i * IMAGE_SIZE + j] = symbol_value; } else { image[i * IMAGE_SIZE + j] = background_value; } } } } // Add noise (unchanged) for (int i = 0; i < INPUT_SIZE; i++) { image[i] += ((double)rand() / RAND_MAX) * 0.2 - 0.1; if (image[i] > 1.0) image[i] = 1.0; if (image[i] < 0.0) image[i] = 0.0; } } void print_image(const double *input, int label) { int shape = label / 4; // 0 for cross, 1 for circle int symbol = label % 4; char symbol_char = '0' + symbol; for (int i = 0; i < IMAGE_SIZE; i++) { for (int j = 0; j < IMAGE_SIZE; j++) { if (input[i * IMAGE_SIZE + j] > 0.5) { if (shape == 1) { // Circle int center = IMAGE_SIZE / 2; int dx = j - center; int dy = i - center; if (dx*dx + dy*dy <= center*center) { printf("%c", symbol_char); } else { printf(" "); } } else { // Cross if (i == IMAGE_SIZE/2 || i == IMAGE_SIZE/2-1 || j == IMAGE_SIZE/2 || j == IMAGE_SIZE/2-1) { printf("%c", symbol_char); } else { printf(" "); } } } else { printf(" "); } } printf("\n"); } } submitted by /u/WillsSpirit [link] [comments]

- [D] AI/ML in big tech vs biotech by /u/Pleasant_Wish1799 (Machine Learning) on August 7, 2024 at 5:13 am
I'm curious why a strong ML engineer would leave a big tech firm (like Google, Microsoft or OpenAI) and work for biotech company. What is the appeal to biotech versus all the cutting edge innovation happening in tech companies? submitted by /u/Pleasant_Wish1799 [link] [comments]

- [R] State of the art in Scene Flow Estimation? by /u/DisciplinedPenguin (Machine Learning) on August 7, 2024 at 4:31 am
What's the state of the art in scene flow estimation? Suggested reads would be very appreciated. submitted by /u/DisciplinedPenguin [link] [comments]

- [P] GroundedAI: Open-Source Framework/Models for Efficient LLM Evaluation by /u/Jl_btdipsbro (Machine Learning) on August 6, 2024 at 11:53 pm
I'm excited to share GroundedAI, an open-source framework I've developed for evaluating large language model application outputs using fine-tuned small language models and specialized adapters. Key features: - Evaluate LLM outputs for toxicity, RAG relevance, and hallucination - Efficient small language models with metric-specific adapters - Local evaluation using less than 5GB VRAM - Easy-to-use Python package - Contends with GPT4 performance at just 3.8B params The framework currently includes three main evaluators: 1. Toxicity Evaluator 2. RAG Relevance Evaluator 3. Hallucination Evaluator Each evaluator uses a base model that merges with a specialized adapter during warmup, allowing for efficient and metric specific evals. Our models are available on Hugging Face: https://huggingface.co/grounded-ai We welcome contributions and feedback from the community. Check out our GitHub repo https://github.com/grounded-ai/grounded_ai for more details and documentation. Let me know if you have any questions or ideas for improvement! submitted by /u/Jl_btdipsbro [link] [comments]

- [D] Why does overparameterization and reparameterization result in a better model? by /u/Revolutionary-Fig660 (Machine Learning) on August 6, 2024 at 10:43 pm
The backbone for Apple's mobileCLIP network is FastVIT, which uses network reparameterization between train and inference time to produce a smaller network with better performance. I've seen this crop up in several papers recently, but the basic idea is that you overparameterize your model during training and then mathematically reduce it for inference. For example, instead of doing a single conv op you can make two "branches", each of which is an independent conv op and then sum the results. It doubles the parameters of the op during training, but then during inference you "reparameterize" which in this case means adding the weight/biases of the two branches together resulting in a single, mathematically identical conv op (same input, same output, one conv op instead of two summed branches). A similar trick is done by adding skip connections over a few ops during training, then during inference mathematically incorporating the skip into the op weights to produce an identical output without the need to preserve the earlier layer tensors or do the extra addition. The situation seems equivalent to modifying y = a*x + b during training to y = (a1+a2)*x +b1+b2 to get more parameters, then just going back to the base form using a = a1+a2 and b = b1+b2 for inference. I understand mathematically that the operations are equivalent, but I have less intuition regard why overparameterizing for training and then reducing for inference produces a better model. My naive thought is that this would add more memory and compute to the network, reducing training speed, without actually enhancing the capacity of the model, since the overparameterized ops are still mathematically equivalent to a single op, regardless of whether they have actually been reduced. Is there strong theory behind it, or is it an interesting idea someone tried that happened to work? submitted by /u/Revolutionary-Fig660 [link] [comments]

- [R] alphaXiv - a comments section for arXiv by /u/Vivid_Perception_143 (Machine Learning) on August 6, 2024 at 7:36 pm
I've been working on an arXiv Labs project, alphaXiv.org, a comment and discussion section built directly on top of arXiv. A lot of readers often have the same questions about papers, so I hope a central forum could be of great value to the research community. Last week, we were featured by Stanford's AI Lab. Please check it out and let me know what you think! The project is in active development, so please DM me if you'd like to collaborate or have feedback.

submitted by /u/Vivid_Perception_143

- [Discussion] Beat GPT-4o at Python by searching with 100 dumb LLaMAs by /u/thundergolfer (Machine Learning) on August 6, 2024 at 5:41 pm
> One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.
>
> Richard Sutton, The Bitter Lesson

The eponymously distasteful take-away of Richard Sutton's essay has often been misconstrued: because scale is all you need, they say, smaller models are doomed to irrelevance. The rapid increase in model size above one trillion parameters and the technological limitations of GPU memory together seemed to foreclose on economical frontier intelligence anywhere except at an oligopoly of intelligence-as-a-service providers. Open models and self-serve inference were in retreat.

But as the quote above indicates, there are in fact two arrows in the scaling quiver: learning and search. Learning, as we do it now with neural networks, scales with memory at inference time: larger models perform better, ceteris paribus, because they can extract more data from their training set into more circuits and more templates. Search scales smoothly with compute at inference time, compute that can be spent either on producing higher-quality candidates or on producing more candidates. In the ideal case, the scaling behavior can be predicted via so-called scaling laws.

Recent papers indicate that generative models like LLMs can be scaled up with search. The Large Language Monkeys paper, published on arXiv by Brown, Juravsky, and co-authors last week, includes several results in this vein and indicates that frontier-level intelligence in certain domains can be elicited from smaller models that can run on a single, past-generation GPU. Further, they observed smooth, predictable improvement of performance with scale.

Put more simply: where before, it seemed frontier capabilities required one horse-sized duck, it is clear we can now alternatively get them with one hundred duck-sized horses (or, rather, LLaMAs). This weekend, we set out to replicate this finding.

Scaling LLaMA 3.1 8B HumanEval on Modal

Running all of our experiments, including configuration and testing, cost well under $50. You can find our code here. You can run it yourself without exceeding the $30/month in credits included in Modal's free tier.

Metrics and data: HumanEval and pass@k

Continued in original post...

submitted by /u/thundergolfer
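The pass@k metric mentioned above is conventionally computed with the unbiased estimator from the original HumanEval paper: generate n samples, count the c that pass the tests, and estimate the probability that a random subset of k samples contains at least one pass. A short sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n-c, k) / C(n, k).

    n: total samples generated, c: samples that passed, k: evaluation budget.
    """
    if n - c < k:
        # Too few failures to fill a size-k subset: every subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 1 passing sample out of 2 and a budget of 1, pass@1 = 0.5.
print(pass_at_k(2, 1, 1))  # -> 0.5
```

Scaling search then simply means increasing n and watching pass@k rise, which is the smooth, predictable improvement the Large Language Monkeys paper reports.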

- [P] Grounded SAM 2: Ground and Track Anything by /u/Technical-Vast1314 (Machine Learning) on August 6, 2024 at 4:55 pm
With the release of SAM 2, we have taken the opportunity to update our Grounded SAM algorithm. The biggest improvement of SAM 2 over SAM is the expansion of its segmentation capabilities to video, allowing users to interactively segment any object and track it through a video. However, the main issue with SAM 2 is that the segmented and tracked objects carry no semantic information. To address this, we have continued the Grounded SAM approach by incorporating an open-set detection model, Grounding DINO. This enables us to extend 2D open-set detection to video object segmentation and tracking.

We have released our code at https://github.com/IDEA-Research/Grounded-SAM-2 with very simple implementations that are convenient for users.

Project highlights: in this repo, we support the following demos with simple implementations:

  - Ground and Segment Anything with Grounding DINO, Grounding DINO 1.5 & 1.6, and SAM 2
  - Ground and Track Anything with Grounding DINO, Grounding DINO 1.5 & 1.6, and SAM 2
  - Detect, Segment and Track visualization based on the powerful https://github.com/roboflow/supervision library

We will continue to update our code to make it easier for users.

submitted by /u/Technical-Vast1314
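The pipeline the post describes — an open-set detector supplying labeled boxes that prompt a video segmenter/tracker — can be sketched abstractly. Note this is a structural illustration only: `detect` and `segment_and_track` are hypothetical callables standing in for the Grounding DINO and SAM 2 roles, and their signatures are not taken from the actual repo.

```python
from typing import Any, Callable

def grounded_tracking(
    video: list,
    prompt: str,
    detect: Callable[[Any, str], list],
    segment_and_track: Callable[[list, list], list],
) -> list:
    """Attach open-set semantic labels to tracked video masks.

    detect(frame, prompt) -> [(box, label), ...]   (the Grounding DINO role)
    segment_and_track(video, boxes) -> one masklet per box (the SAM 2 role)
    """
    first_frame = video[0]
    detections = detect(first_frame, prompt)
    boxes = [box for box, _ in detections]
    labels = [label for _, label in detections]
    masklets = segment_and_track(video, boxes)
    # Each tracked masklet inherits the label of the box that prompted it,
    # which is exactly the semantic information SAM 2 alone lacks.
    return list(zip(labels, masklets))
```

The key design point is that the tracker never needs to understand the text prompt; semantics flow in only through the box-to-masklet correspondence.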

- [P] Identify the faulty components of water-cooled HVAC systems via anomalous sound detection and diagnose ensuing cooling malfunctions via thermal visual anomaly detection. For its AI-powered features, the device employs Audio MFE and FOMO-AD algorithms in combination with the web application. by /u/the-amplituhedron (Machine Learning) on August 6, 2024 at 4:44 pm
submitted by /u/the-amplituhedron

- [D] Per-class augmentation for highly imbalanced image data. Good or bad idea? by /u/Antman-007 (Machine Learning) on August 6, 2024 at 12:17 pm
When solving computer vision problems where the data is highly imbalanced, I have come across a number of techniques one could try, ranging from loss functions tailored to imbalanced datasets, class/sample weights, and sampling techniques like SMOTE, a weighted random sampler, or plain random sampling, to using GANs to generate more data for the minority class. I wonder, however, whether anyone has explored per-class augmentations, i.e. different augmentations applied to different classes, with the minority classes being heavily augmented compared to the majority class. I have scoured the internet for material indicating why this could be a good or bad idea, and its implications, to no avail.

submitted by /u/Antman-007
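One simple way to realize the per-class augmentation idea is to scale the number of augmented copies (or the augmentation strength) inversely with class frequency. A minimal sketch of the bookkeeping step, assuming an inverse-frequency heuristic (one possible rule, not an established best practice):

```python
from collections import Counter

def per_class_aug_multipliers(labels: list) -> dict:
    """Augmented copies per class, inversely proportional to class frequency.

    The majority class gets 1x (little or no extra augmentation); rarer
    classes get proportionally more, so the augmented dataset is roughly
    balanced.
    """
    counts = Counter(labels)
    max_count = max(counts.values())
    return {cls: max(1, round(max_count / n)) for cls, n in counts.items()}

# 100 dogs, 10 cats, 4 ferrets -> dogs x1, cats x10, ferrets x25.
labels = ["dog"] * 100 + ["cat"] * 10 + ["ferret"] * 4
print(per_class_aug_multipliers(labels))
```

In a training pipeline, a dataset's item loader would then pick a transform whose intensity (or repetition count) depends on the sample's class multiplier. One caveat worth noting: heavily augmenting a tiny class mostly replays the same few images with perturbations, so this is usually combined with, rather than substituted for, class-weighted losses or resampling.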