What are the top 3 methods used to find Autoregressive Parameters in Data Science?

In order to find autoregressive parameters, you first need to understand what autoregression is. Autoregression is a statistical method that models a series as a linear regression on lagged values of the dependent variable. In other words, it is a model that uses past values of a variable to predict future values of that same variable.

In time series analysis, autoregression is the use of previous values in a time series to predict future values. In other words, it is a form of regression where the dependent variable is forecast using a linear combination of its own past values. The parameters of the autoregression model are typically estimated using the method of least squares.

The autoregressive parameters are the coefficients in the autoregressive model. These coefficients can be estimated in a number of ways, including ordinary least squares (OLS), maximum likelihood (ML), and least squares with L1 regularization (LASSO). Once estimated, the autoregressive parameters can be used to predict future values of the dependent variable.

The most common way to find the autoregressive parameters is least squares regression. This method finds the parameters that minimize the sum of the squared residuals, where a residual is simply the difference between a predicted value and the actual value. In essence, you are finding the parameters that best fit the data.
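To make this concrete, here is a minimal sketch (my own illustration, not code from the original post) of estimating an AR(1) coefficient by least squares with NumPy; the sales figures are invented for the example.

```python
# Minimal sketch: estimate an AR(1) coefficient by least squares on lagged values.
import numpy as np

# Hypothetical sales series (made up for illustration)
y = np.array([112., 118., 132., 129., 121., 135., 148., 148., 136., 119.])

# Design matrix: an intercept column plus the lag-1 values y[t-1]
X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
target = y[1:]  # the values we try to predict, y[t]

# Least squares: choose the parameters that minimize the sum of squared residuals
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
intercept, phi = coef
print(f"intercept = {intercept:.2f}, AR(1) coefficient = {phi:.2f}")
```

Running this prints the intercept and the lag-1 coefficient that minimize the sum of squared residuals for the toy series.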

How to Estimate Autoregressive Parameters?


There are three main ways to estimate autoregressive parameters: ordinary least squares (OLS), maximum likelihood (ML), and least squares with L1 regularization (LASSO).

Ordinary Least Squares: Ordinary least squares is the simplest and most common method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values.

Maximum Likelihood: Maximum likelihood is another common method for estimating autoregressive parameters. This method estimates the parameters by maximizing the likelihood function. The likelihood function is a mathematical function that quantifies the probability of observing a given set of data given certain parameter values.

Least Squares with L1 Regularization: Least squares with L1 regularization (LASSO) is another method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values while also penalizing large coefficients. L1 regularization adds an extra term to the error function that is proportional to the sum of the absolute values of the coefficients, which shrinks small or unimportant lag coefficients toward zero.
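For a rough side-by-side of the three estimators, here is a hedged Python sketch; it assumes statsmodels (AutoReg for least squares, ARIMA for maximum likelihood) and scikit-learn's Lasso, and the AR(1) series is simulated purely for demonstration.

```python
# Sketch comparing OLS, ML, and LASSO estimates of an AR(1) coefficient.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.arima.model import ARIMA
from sklearn.linear_model import Lasso

# Simulate an AR(1) series with true coefficient 0.6
rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(1, len(y)):
    y[t] = 0.6 * y[t - 1] + rng.normal()

# 1) Ordinary least squares (conditional on the first observation)
ols_fit = AutoReg(y, lags=1).fit()
print("OLS params:", ols_fit.params)    # [intercept, lag-1 coefficient]

# 2) Maximum likelihood, via an ARIMA(1, 0, 0) specification
ml_fit = ARIMA(y, order=(1, 0, 0)).fit()
print("ML params:", ml_fit.params)      # typically [const, ar.L1, sigma2]

# 3) Least squares with an L1 penalty on the lag coefficient (LASSO)
lasso_fit = Lasso(alpha=0.1).fit(y[:-1].reshape(-1, 1), y[1:])
print("LASSO lag-1:", lasso_fit.coef_[0])  # shrunk toward zero by the penalty
```

All three should land near the true value of 0.6 on a simulated series this long; the LASSO estimate sits slightly closer to zero because of the penalty.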

Finding Autoregressive Parameters: The Math Behind It
To find the autoregressive parameters using least squares regression, you first need to set up your data in a certain way. You need to have your dependent variable in one column and your independent variables in other columns. For example, let’s say you want to use three years of data to predict next year’s sales (the dependent variable). Your data would look something like this:

| Year | Sales |
|------|-------|
| 2016 | 100 |
| 2017 | 150 |
| 2018 | 200 |

Next, you need to calculate the mean of each column. For our sales example, the mean year is $\bar{X} = 2017$ and the mean of sales is:

$$ \bar{Y} = \frac{100+150+200}{3} = 150$$

Now we can calculate each element in what’s called the variance-covariance matrix:

$$ \operatorname {Var} (X)=\sum _{i=1}^{n}\left({x_{i}}-{\bar {x}}\right)^{2} $$

and

$$ \operatorname {Cov} (X,Y)=\sum _{i=1}^{n}\left({x_{i}}-{\bar {x}}\right)\left({y_{i}}-{\bar {y}}\right) $$

For our sales example, that calculation would look like this:

$$ \operatorname {Var} (Y)=\sum _{i=1}^{3}\left({y_{i}}-{\bar {y}}\right)^{2}=(100-150)^{2}+(150-150)^{2}+(200-150)^{2}=5000 $$

and

$$ \operatorname {Cov} (X,Y)=\sum _{i=1}^{3}\left({x_{i}}-{\bar {x}}\right)\left({y_{i}}-{\bar {y}}\right)=(2016-2017)(100-150)+(2017-2017)(150-150)+(2018-2017)(200-150)=100 $$

Now we can finally calculate the least-squares slope. For a single regressor, the matrix formula $\hat {\beta }=(X^{\prime }X)^{-1}X^{\prime }Y$ reduces to the ratio of the covariance between $X$ and $Y$ to the variance of $X$:

$$ \hat {\beta }=\frac {\operatorname {Cov} (X,Y)}{\operatorname {Var} (X)}=\frac {100}{(2016-2017)^{2}+(2017-2017)^{2}+(2018-2017)^{2}}=\frac {100}{2}=50 $$

That's it! The least-squares slope is 50, which simply says that sales in this toy dataset grow by about 50 units per year. In a genuine autoregressive model, the regressor is not the calendar year but the lagged value of the series itself: you line up each year's sales next to the previous year's sales and run exactly the same variance-covariance calculation on those pairs. The resulting coefficient is the autoregressive parameter, which you plug into the AR(1) equation:

$$ Y_{t+1}=\phi Y_{t}+\varepsilon _{t+1} $$

where $\phi$ is the autoregressive parameter and $\varepsilon _{t+1}$ is an error term. And that's how you solve for autoregressive parameters! Of course, in reality you would be working with much larger datasets, but the underlying principles are the same. Once you have your autoregressive parameters, you can plug them into the equation and start making predictions.
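You can check this toy calculation in a couple of lines of NumPy (a verification added here, not part of the original derivation):

```python
# Check the worked example: variance, covariance, and the least-squares slope.
import numpy as np

years = np.array([2016., 2017., 2018.])
sales = np.array([100., 150., 200.])

var_x = np.sum((years - years.mean()) ** 2)                       # 2.0
var_y = np.sum((sales - sales.mean()) ** 2)                       # 5000.0
cov_xy = np.sum((years - years.mean()) * (sales - sales.mean()))  # 100.0

slope = cov_xy / var_x                                            # 50.0
print(var_y, cov_xy, slope)
```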

Which Method Should You Use?
The estimation method you should use depends on your particular situation and goals. If you want simple, interpretable results, Ordinary Least Squares is usually the best place to start. Maximum Likelihood can give more efficient estimates, particularly for short series, while Least Squares with L1 Regularization (LASSO) is useful when you include many lags and want the unimportant ones shrunk toward zero.

Autoregressive models STEP BY STEP:

1) Download data: The first step is to download some data. This can be done by finding a publicly available dataset or by using your own data if you have any. For this example, we will be using data from the United Nations Comtrade Database.

2) Choose your variables: Once you have your dataset, you need to choose the series you want to model. In our case, we will work with the import and export values of goods between countries and model their difference, the trade balance; in an autoregression, the lagged values of that series act as the independent variables.

3) Estimate your model: After choosing your independent variables, you can estimate your autoregression model using the method of least squares. OLS estimation can be done in many statistical software packages such as R or STATA.

4) Interpret your results: Once you have estimated your model, it is important to interpret the results in order to understand what they mean. Each coefficient represents the effect of an independent variable on the dependent variable; in an autoregression, that means the effect of a past value of the series on its current value. In our case, the coefficients represent the effect that past values of the trade balance have on its current value. A positive coefficient indicates that an increase in that lagged value is associated with an increase in the dependent variable, while a negative coefficient indicates that it is associated with a decrease.

5) Make predictions: Finally, once you have interpreted your results, you can use your autoregression model to make predictions about future values of the dependent variable based on its own past values. A minimal Python sketch of these five steps is shown below.
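Here is a hedged, end-to-end sketch of the five steps in Python using statsmodels' AutoReg; the file name and column names are placeholders rather than the actual Comtrade extract.

```python
# End-to-end sketch of the five steps; file and column names are hypothetical.
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

# 1) Download / load data (placeholder file name for a Comtrade extract)
df = pd.read_csv("comtrade_extract.csv")

# 2) Choose your variable: a trade-balance series (exports minus imports)
series = (df["export_value"] - df["import_value"]).astype(float)

# 3) Estimate an AR(2) model by least squares
fit = AutoReg(series, lags=2).fit()

# 4) Interpret the results: sign and size of each lag coefficient
print(fit.summary())

# 5) Make predictions for the next six periods
print(fit.predict(start=len(series), end=len(series) + 5))
```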

Conclusion: In this blog post, we have discussed what autoregression is and how to find autoregressive parameters. 

Estimating an autoregression model is a relatively simple process that can be done in many statistical software packages such as R or STATA.

In statistics and machine learning, autoregression is a modeling technique used to describe the linear relationship between a variable and its own past values. To find the autoregressive parameters, you can use least squares regression, which minimizes the sum of squared residuals. This blog post also explained how to set up your data for least squares regression and how to calculate the variance and covariance before finally calculating your autoregressive parameters. After finding your parameters, you can plug them into the autoregressive equation and start making predictions about future values.

We have also discussed three different methods for estimating those parameters: Ordinary Least Squares, Maximum Likelihood, and Least Squares with L1 Regularization. The appropriate estimation method depends on your particular goals and situation.



Autoregressive Model

Autoregressive generative models can estimate complex continuous data distributions, such as trajectory rollouts in an RL environment, image intensities, and audio. Traditional techniques discretize continuous data into bins and approximate the continuous distribution with a categorical distribution over those bins. This approximation is parameter inefficient, because it cannot express abrupt changes in density without a large number of additional bins. Adaptive Categorical Discretization (ADACAT) is proposed in a recent paper as a parameterization of 1-D conditionals that is expressive, parameter efficient, and multimodal. The ADACAT distribution is parameterized by a vector of interval widths and a vector of masses. Figure 1 of the paper shows the difference between the traditional uniform categorical discretization approach and the proposed ADACAT.

Each component of the ADACAT distribution has non-overlapping support, making it a specific subfamily of mixtures of uniform distributions and a generalization of uniformly discretized 1-D categorical distributions. The variable bin widths allow it to approximate the modes of a mixture of two Gaussians much more closely than a uniformly discretized categorical, making it considerably more expressive. Additionally, the distribution's support can be discretized using quantile-based discretization, which bins the data into groups containing similar numbers of observed data points. For problems with more than one dimension, ADACAT uses deep autoregressive frameworks to factorize the joint density into many 1-D conditional ADACAT distributions.
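As a rough illustration only (my own sketch based on the description above, not the paper's implementation), a 1-D adaptive categorical can be parameterized by a vector of bin widths and a vector of bin masses, with each bin acting as a uniform component whose density is its mass divided by its width:

```python
# Rough illustration of an adaptive-bin categorical density on [0, 1].
# K bins with learnable width logits and mass logits; each bin is a uniform
# component, so the density inside bin k is masses[k] / widths[k].
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def adaptive_categorical_density(x, width_logits, mass_logits):
    widths = softmax(width_logits)      # bin widths, summing to 1 over [0, 1]
    masses = softmax(mass_logits)       # bin probabilities, summing to 1
    edges = np.cumsum(widths)           # right edge of each bin
    k = np.searchsorted(edges, np.clip(x, 0.0, 1.0))
    k = np.minimum(k, len(widths) - 1)  # guard against floating-point spill
    return masses[k] / widths[k]        # piecewise-uniform density

rng = np.random.default_rng(0)
x = np.array([0.05, 0.5, 0.93])
print(adaptive_categorical_density(x, rng.normal(size=8), rng.normal(size=8)))
```

Because the widths sum to one and each component is uniform on its own interval, the sketch integrates to one over [0, 1], which is the basic property the adaptive parameterization relies on.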

