
**What are the top 3 methods used to find Autoregressive Parameters in Data Science?**

In order to find autoregressive parameters, you will first need to understand what autoregression is. **Autoregression is a statistical method for building a model that describes a variable as a linear regression on its own lagged (past) values.** In other words, it is a model that uses past values of a dependent variable in order to predict future values of the same dependent variable.

In time series analysis, **autoregression is the use of previous values in a time series to predict future values.** In other words, it is a form of regression in which the dependent variable is forecast using a linear combination of its own past values. The parameters of an autoregressive model are often estimated using the method of least squares.
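To make the definition concrete, an autoregressive model of order $p$, written AR($p$), can be sketched in the usual textbook notation (the symbols $c$, $\phi_i$ and $\varepsilon_t$ are labels chosen here, not taken from the original post):

$$ Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t $$

Here $c$ is an intercept, $\phi_1, \dots, \phi_p$ are the autoregressive parameters, and $\varepsilon_t$ is a zero-mean error term. Estimating the model means choosing values of $c$ and the $\phi_i$ from the observed series.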

The autoregressive parameters are the coefficients in the autoregressive model. These coefficients can be estimated in a number of ways, including ordinary least squares (OLS), maximum likelihood (ML), or least squares with L1 regularization (LASSO). Once estimated, the autoregressive parameters can be used to predict future values of the dependent variable.

A common way to find the autoregressive parameters is **least squares regression**. This method finds the parameters that minimize the sum of the squared residuals, where a residual is the difference between the actual value and the predicted value. So, in essence, you are finding the parameters that best fit the data.


**How to Estimate Autoregressive Parameters?**

There are three common ways to estimate autoregressive parameters: ordinary least squares (OLS), maximum likelihood (ML), and least squares with L1 regularization (LASSO).

**Ordinary Least Squares**: Ordinary least squares is the simplest and most common method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values.
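To show what OLS estimation looks like in practice, here is a minimal NumPy sketch, assuming an AR($p$) model with an intercept. It builds a design matrix whose columns are the lagged values of the series and solves the least squares problem with `np.linalg.lstsq`. The helper names `lag_matrix` and `fit_ar_ols`, and the simulated AR(2) data, are illustrative choices, not part of the original post.

```python
import numpy as np

def lag_matrix(y, p):
    """Build the OLS design matrix for an AR(p) model with an intercept.

    Row i corresponds to time t = p + i and holds [1, y[t-1], ..., y[t-p]];
    the matching target is y[t].
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n - p)] + [y[p - k:n - k] for k in range(1, p + 1)])
    return X, y[p:]

def fit_ar_ols(y, p):
    """Return [c, phi_1, ..., phi_p] minimizing the sum of squared residuals."""
    X, target = lag_matrix(y, p)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

# Simulated AR(2) series with known parameters (illustrative data only)
rng = np.random.default_rng(0)
y = np.zeros(2000)
for t in range(2, len(y)):
    y[t] = 0.5 + 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.normal()

coef = fit_ar_ols(y, p=2)
print(coef)  # estimates of [c, phi_1, phi_2]; the true values are [0.5, 0.6, -0.2]
```

With a long enough series the estimates land close to the true parameters; with short series they can be noticeably off.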

**Maximum Likelihood**: Maximum likelihood is another common method for estimating autoregressive parameters. This method estimates the parameters by maximizing the likelihood function. The likelihood function is a mathematical function that quantifies the probability of observing a given set of data given certain parameter values.
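A minimal sketch of maximum likelihood estimation for an AR($p$) model, assuming Gaussian errors and conditioning on the first $p$ observations (conditional ML; exact ML, which also models those first observations, gives slightly different estimates). The negative log-likelihood and its gradient are written out by hand and minimized with SciPy's `scipy.optimize.minimize`. The helper names and the simulated AR(1) data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def lag_matrix(y, p):
    """Intercept column plus lags 1..p of y; targets are y[p:]."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n - p)] + [y[p - k:n - k] for k in range(1, p + 1)])
    return X, y[p:]

def neg_log_likelihood(params, X, target):
    """Negative conditional Gaussian log-likelihood of an AR(p) model and its gradient.

    params = [c, phi_1, ..., phi_p, log_sigma]; optimizing log(sigma)
    keeps the error standard deviation positive.
    """
    beta, log_sigma = params[:-1], params[-1]
    sigma2 = np.exp(2.0 * log_sigma)
    resid = target - X @ beta
    m = len(target)
    rss = resid @ resid
    value = 0.5 * m * np.log(2.0 * np.pi * sigma2) + rss / (2.0 * sigma2)
    grad = np.append(-X.T @ resid / sigma2, m - rss / sigma2)
    return value, grad

def fit_ar_ml(y, p):
    """Return (coefficients [c, phi_1..phi_p], sigma) maximizing the likelihood."""
    X, target = lag_matrix(y, p)
    x0 = np.zeros(X.shape[1] + 1)  # start from c = phi = 0 and sigma = 1
    result = minimize(neg_log_likelihood, x0, args=(X, target), jac=True, method="BFGS")
    return result.x[:-1], float(np.exp(result.x[-1]))

# Simulated AR(1) data: c = 0.5, phi = 0.7, sigma = 1 (illustrative only)
rng = np.random.default_rng(1)
y = np.zeros(1000)
for t in range(1, len(y)):
    y[t] = 0.5 + 0.7 * y[t - 1] + rng.normal()

beta_ml, sigma_ml = fit_ar_ml(y, p=1)
print(beta_ml, sigma_ml)
```

Under these assumptions the ML coefficient estimates coincide with OLS, and the estimated error standard deviation is the square root of the mean squared residual. The likelihood framework pays off in models, such as ARMA, where least squares is less convenient.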

**Least Squares with L1 Regularization**: Least squares with L1 regularization is another method for estimating autoregressive parameters. This method estimates the parameters by minimizing the sum of squared errors between actual and predicted values while also penalizing large coefficients. L1 regularization adds an extra term to the error function that is proportional to the sum of the absolute values of the model coefficients. This penalty can shrink some lag coefficients exactly to zero, which effectively selects the lags that matter.
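A minimal sketch of L1-regularized least squares for an AR model, written from scratch with coordinate descent so it only needs NumPy. It minimizes $\frac{1}{2m}\lVert y - c - Xw\rVert^2 + \alpha\lVert w\rVert_1$ (the same objective scikit-learn's `Lasso` uses), leaves the intercept $c$ unpenalized, and is fit here with more lags than the simulated AR(1) process really has, so the extra lags are shrunk toward, and often exactly to, zero. The helper names, the value of `alpha`, and the simulated data are illustrative assumptions.

```python
import numpy as np

def lag_matrix(y, p):
    """Columns are lags 1..p of y (no intercept column); targets are y[p:]."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    return X, y[p:]

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, returning exactly 0 when |z| <= gamma."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def fit_ar_lasso(y, p, alpha, n_iter=1000, tol=1e-10):
    """Minimize (1/(2m))||target - c - X w||^2 + alpha * ||w||_1 by coordinate descent.

    Centering X and the target removes the unpenalized intercept c,
    which is recovered at the end from the column means.
    """
    X, target = lag_matrix(y, p)
    m = len(target)
    x_mean, t_mean = X.mean(axis=0), target.mean()
    Xc, tc = X - x_mean, target - t_mean
    col_sq = (Xc ** 2).sum(axis=0) / m
    w = np.zeros(p)
    resid = tc.copy()  # invariant: resid == tc - Xc @ w
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # (1/m) * x_j . (residual with feature j's contribution added back)
            rho = Xc[:, j] @ resid / m + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            resid -= Xc[:, j] * (new_wj - w[j])
            max_change = max(max_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_change < tol:
            break
    c = t_mean - x_mean @ w
    return c, w

# Simulated AR(1) data (phi_1 = 0.5), fit with 5 candidate lags
rng = np.random.default_rng(0)
y = np.zeros(5000)
for t in range(1, len(y)):
    y[t] = 0.5 * y[t - 1] + rng.normal()

c_hat, w_hat = fit_ar_lasso(y, p=5, alpha=0.05)
print(np.round(w_hat, 3))
```

Setting `alpha=0` reduces this to ordinary least squares; larger values of `alpha` zero out more lags, at the cost of shrinking the lags that are kept.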

**Finding Autoregressive Parameters: The Math Behind It**

To find the autoregressive parameters using least squares regression, you first need to set up your data in a certain way: the dependent variable goes in one column, and its lagged values, which act as the independent variables, go in other columns. For example, let's say you want to use three years of sales data to fit an AR(1) model, $Y_t = c + \phi Y_{t-1} + \varepsilon_t$, and predict next year's sales (the dependent variable). Each row pairs a year's sales with the previous year's sales:

| Year | Sales ($Y_t$) | Previous year's sales ($Y_{t-1}$) |
|------|---------------|-----------------------------------|
| 2016 | 100 | n/a |
| 2017 | 150 | 100 |
| 2018 | 200 | 150 |

The 2016 row has no lagged value, so only the 2017 and 2018 rows can be used for estimation. Write $x_i$ for the lagged sales and $y_i$ for the current sales in those rows. Next, you need to calculate the mean of each column. For our sales example, that looks like this:

$$ \bar{x} = \frac{100+150}{2} = 125, \qquad \bar{y} = \frac{150+200}{2} = 175 $$

Now we can calculate the elements of the variance-covariance matrix that we need:

$$ \operatorname{Var}(X)=\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2} $$

and

$$ \operatorname{Cov}(X,Y)=\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right) $$

These are unnormalized sums; dividing both by the same constant would not change the result below. For our sales example, with $n = 2$, that calculation looks like this:

$$ \operatorname{Var}(X)=(100-125)^{2}+(150-125)^{2}=625+625=1250 $$

and

$$ \operatorname{Cov}(X,Y)=(100-125)(150-175)+(150-125)(200-175)=625+625=1250 $$

Now we can finally calculate our autoregressive parameters! With one lagged regressor plus an intercept, the least squares solution $\hat{\beta}=(X^{\prime}X)^{-1}X^{\prime}Y$ reduces to

$$ \hat{\phi}=\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}=\frac{1250}{1250}=1.0, \qquad \hat{c}=\bar{y}-\hat{\phi}\,\bar{x}=175-1.0\times 125=50 $$

That's it! Our autoregressive parameter is $\hat{\phi} = 1.0$, with intercept $\hat{c} = 50$. Once we have these, we can plug them into our autoregressive equation to forecast 2019:

$$ \hat{Y}_{t+1}=\hat{c}+\hat{\phi}\,Y_t=50+1.0\times 200=250 $$

And that's how you solve for autoregressive parameters! With only two usable observations the fitted line passes through both points exactly, so this example illustrates the arithmetic rather than a realistic estimate. In reality you would be working with much larger datasets (and often more than one lag), but the underlying principles are the same. Once you have your autoregressive parameters, you can plug them into the equation and start making predictions.
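The arithmetic of a least squares AR(1) fit on the three sales figures can be checked with a short script in plain Python. It regresses each year's sales on the previous year's sales (so only two pairs are usable) and uses the unnormalized variance and covariance sums defined above; the variable names are illustrative.

```python
# Sales figures for 2016, 2017, 2018 from the example table
sales = [100.0, 150.0, 200.0]

# Pair each year's sales (y) with the previous year's sales (x);
# the first year has no predecessor, so it only serves as a lag.
x = sales[:-1]   # Y_{t-1}: [100, 150]
y = sales[1:]    # Y_t:     [150, 200]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Unnormalized variance of the lag and its covariance with the current value
var_x = sum((xi - x_bar) ** 2 for xi in x)
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

phi_hat = cov_xy / var_x          # slope on the lagged value
c_hat = y_bar - phi_hat * x_bar   # intercept

forecast_2019 = c_hat + phi_hat * sales[-1]
print(phi_hat, c_hat, forecast_2019)  # 1.0 50.0 250.0
```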

**Which Method Should You Use?**

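The estimation method you should use depends on your particular situation and goals. If you are looking for simple, interpretable results that are quick to compute, ordinary least squares is usually a good starting point. Maximum likelihood is the standard choice when the model includes moving-average terms or when you want likelihood-based tools, such as information criteria for choosing the number of lags; for a pure AR model with Gaussian errors it gives essentially the same coefficient estimates as OLS. Least squares with L1 regularization is useful when you have many candidate lags and want the model to keep only the most relevant ones.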

**Autoregressive models STEP BY STEP:**

1) **Download data**: The first step is to download some data. This can be done by finding a publicly available dataset or by using your own data if you have any. For this example, we will be using data from the United Nations Comtrade Database.

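2) **Choose your variables**: Once you have your dataset, you will need to choose the variables you want to use in your autoregression model. In our case, the dependent variable is a country's trade balance, and the regressors are its own lagged values. Lagged import and export values of goods between countries can also be added as independent variables, which turns the model into an autoregression with exogenous inputs (often called ARX).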

3) **Estimate your model:** After choosing your variables, you can estimate your autoregression model using the method of least squares. OLS estimation can be done in many statistical software packages such as R or Stata.

4) **Interpret your results**: Once you have estimated your model, it is important to interpret the results in order to understand what they mean. Each coefficient represents the effect of one regressor on the dependent variable, holding the other regressors fixed. In our case, the coefficients represent the effect that past trade balances (and, if included, lagged imports and exports) have on the current trade balance. A positive coefficient indicates that an increase in that regressor is associated with an increase in the dependent variable, while a negative coefficient indicates that an increase in the regressor is associated with a decrease in the dependent variable.

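5) **Make predictions:** Finally, once you have interpreted your results, you can use your autoregression model to predict future values of the dependent variable from its own past values (and, in an ARX model, past values of the exogenous variables). For forecasts more than one step ahead, each prediction is fed back in as a lagged value for the next step.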
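The five steps can be sketched end to end with NumPy. This is a minimal illustration under stated assumptions: it does not download UN Comtrade data; a simulated series stands in for a trade balance (hypothetical placeholder data), the lag order `p = 2` is an arbitrary choice, `forecast` is a hypothetical helper name, and exogenous regressors are omitted for brevity.

```python
import numpy as np

# Step 1 (placeholder): a simulated monthly series standing in for a downloaded
# trade-balance series; replace it with your own data.
rng = np.random.default_rng(42)
n = 240
balance = np.zeros(n)
for t in range(1, n):
    balance[t] = 2.0 + 0.8 * balance[t - 1] + rng.normal(scale=1.0)

# Step 2: choose the variables, here the series itself and p of its lags
p = 2

# Step 3: estimate the model by ordinary least squares
X = np.column_stack([np.ones(n - p)] + [balance[p - k:n - k] for k in range(1, p + 1)])
target = balance[p:]
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
c_hat, phi_hat = coef[0], coef[1:]

# Step 4: interpret the signs of the lag coefficients
for k, phi in enumerate(phi_hat, start=1):
    direction = "raises" if phi > 0 else "lowers"
    print(f"lag {k}: {phi:+.3f}; a higher value {k} period(s) ago {direction} "
          "the prediction, holding the other lags fixed")

# Step 5: forecast h steps ahead, feeding each prediction back in as a lag
def forecast(history, c, phi, h):
    hist = [float(v) for v in history]
    preds = []
    for _ in range(h):
        # hist[-1] is the most recent value, matching phi[0] (lag 1)
        next_val = c + sum(phi[k] * hist[-1 - k] for k in range(len(phi)))
        preds.append(float(next_val))
        hist.append(float(next_val))
    return preds

print(forecast(balance, c_hat, phi_hat, h=3))
```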

**Conclusion:** In this blog post, we have discussed what autoregression is and how to find autoregressive parameters.

Estimating an autoregression model is a relatively simple process that can be done in many statistical software packages such as R or Stata.

In statistics and machine learning, autoregression is a modeling technique that describes a variable as a linear function of its own past values. To find the autoregressive parameters, you can use least squares regression, which minimizes the sum of squared residuals. This blog post also explained how to set up your data for least squares regression and how to calculate the variance and covariance before finally calculating your autoregressive parameters. After finding your parameters, you can plug them into the autoregressive equation to start making predictions about future values!

We have also discussed three different methods for estimating those parameters: Ordinary Least Squares, Maximum Likelihood, and Least Squares with L1 Regularization. **The appropriate estimation method depends on your particular goals and situation.**


# Autoregressive Model

Autoregressive generative models can estimate complex continuous data distributions such as trajectory rollouts in an RL environment, image intensities, and audio. Traditional techniques discretize continuous data into a number of bins and approximate the continuous distribution with a categorical distribution over those bins. This approximation is parameter-inefficient because it cannot express abrupt changes in density without a significant number of additional bins. Adaptive Categorical Discretization (ADACAT) is proposed in this paper as a parameterization of 1-D conditionals that is expressive, parameter-efficient, and multimodal. ADACAT parameterizes each distribution with a vector of interval widths and a vector of masses. Figure 1 shows the difference between the traditional uniform categorical discretization approach and the proposed ADACAT.

Each component of the ADACAT distribution has non-overlapping support, making it a specific subfamily of mixtures of uniform distributions. ADACAT generalizes uniformly discretized 1-D categorical distributions. Because it allows variable bin widths, it approximates the modes of a mixture of two Gaussians more closely than a uniformly discretized categorical, which makes it more expressive than the latter. Additionally, a distribution's support is discretized using quantile-based discretization, which places bin edges so that each bin contains a similar number of data points. For problems with more than one dimension, ADACAT relies on deep autoregressive frameworks to factorize the joint density into a product of 1-D conditional ADACAT distributions.
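The description above can be made concrete with a small density function. This is a sketch of our reading of the text, not the paper's code: a 1-D distribution whose support is split into non-overlapping bins with their own widths and masses, giving a piecewise-constant density (a mixture of uniforms). Fixing the support to $[0, 1]$, producing widths and masses with a softmax over unconstrained logits, and the function names are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def adacat_density(x, width_logits, mass_logits):
    """Density of an ADACAT-style distribution on [0, 1] (illustrative sketch).

    Widths sum to 1, so the bins tile [0, 1] without overlap; masses sum to 1.
    Inside bin k the density is constant and equal to mass_k / width_k.
    """
    widths = softmax(width_logits)
    masses = softmax(mass_logits)
    edges = np.concatenate([[0.0], np.cumsum(widths)])
    # Bin index of each x; clip so that x = 1.0 falls into the last bin
    k = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(widths) - 1)
    return masses[k] / widths[k]

# Equal logits recover a uniformly discretized categorical (density 1 everywhere)
print(adacat_density(np.array([0.1, 0.5, 0.9]), np.zeros(4), np.zeros(4)))

# Two narrow, heavy bins near 0 and one wide, light bin: a sharp peak
# that a uniform grid would need many bins to express
widths = np.array([0.1, 0.1, 0.8])
masses = np.array([0.45, 0.45, 0.1])
print(adacat_density(np.array([0.05, 0.15, 0.5]), np.log(widths), np.log(masses)))
```

Because each bin's density is mass divided by width, a handful of adaptively sized bins can represent an abrupt change in density, which is the parameter-efficiency argument made above.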
