**What Are the Best Machine Learning Algorithms for Imbalanced Datasets?**

In machine learning, **imbalanced datasets** are those where one class heavily outnumbers the others. This can be due to the nature of the problem or simply because more data is available for one class than the others. Either way, imbalanced datasets can pose a challenge for machine learning algorithms. In this blog post, we’ll take a look at which machine learning algorithms are best suited for imbalanced datasets and why they tend to perform better than others.

**For example, in a binary classification problem with 100 observations, of which only 10 are positive and the remaining 90 are negative, we say that the dataset is imbalanced. The ratio of positive to negative cases is 1:9.**
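As a quick check of the arithmetic, here is a minimal sketch that counts the classes in a label list and reports the reduced positive-to-negative ratio. The `labels` list and the `class_ratio` helper are made up for illustration and are not part of any library.

```python
from collections import Counter
from fractions import Fraction


def class_ratio(labels, positive=1):
    """Return (n_positive, n_negative, (a, b)) where a:b is the reduced ratio."""
    counts = Counter(labels)
    n_pos = counts[positive]
    n_neg = len(labels) - n_pos
    ratio = Fraction(n_pos, n_neg)  # Fraction reduces 10/90 to 1/9
    return n_pos, n_neg, (ratio.numerator, ratio.denominator)


# Hypothetical dataset: 10 positives and 90 negatives out of 100 observations.
labels = [1] * 10 + [0] * 90
n_pos, n_neg, (a, b) = class_ratio(labels)
print(f"{n_pos} positives, {n_neg} negatives, ratio {a}:{b}")
# prints: 10 positives, 90 negatives, ratio 1:9
```

With 10 positives among 100 observations, the positive class makes up 10% of the data, while the positive-to-negative ratio is 10:90.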

There are a few reasons why some machine learning algorithms tend to perform better on imbalanced datasets than others. First, certain algorithms are designed to handle imbalanced datasets. Second, some algorithms are more robust to outliers, which can be more common in imbalanced datasets. And third, some algorithms are better able to learn from a limited amount of data, which can be an issue when one class is heavily outnumbered by the others.

**Some of the best machine learning algorithms for imbalanced datasets include:**

- **Support Vector Machines (SVMs)**
- **Decision Trees**
- **Random Forests**
- **Naive Bayes Classifiers**
- **k-Nearest Neighbors (kNN)**

Of these, SVMs tend to be the most popular choice. SVMs work by finding a hyperplane that maximizes the margin between the two classes, which helps to reduce overfitting and improve generalization. They are not immune to imbalance by default, but many implementations let you assign a higher misclassification penalty to the minority class. Decision trees and random forests are also popular choices as they are less sensitive to outliers than linear models such as logistic regression. Naive Bayes classifiers are another good choice as they are able to learn from a limited amount of data. kNN is also a good choice as it can learn from a limited amount of data and, with a reasonably large k, is fairly robust to isolated outliers. However, it can be computationally intensive for large datasets.
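In practice, a common way to make these algorithms pay more attention to the minority class is to weight each class inversely to its frequency. The sketch below computes such weights in plain Python using the heuristic `n_samples / (n_classes * class_count)`. As far as we know this matches what scikit-learn does when `class_weight="balanced"` is set, but treat that as an assumption and check the library documentation. The helper name `balanced_class_weights` and the label list are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical labels: 10 positives (1) and 90 negatives (0).
labels = [1] * 10 + [0] * 90
weights = balanced_class_weights(labels)
print(weights)  # the minority class gets weight 5.0, the majority about 0.56
```

The resulting dictionary could then be passed to a classifier that accepts per-class weights, so that an error on a minority example costs about nine times as much as an error on a majority example.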

There are two main types of machine learning algorithms: supervised and unsupervised. Supervised algorithms tend to perform better on imbalanced datasets than unsupervised algorithms. In this blog post, we will discuss why this is so and look at some examples.

**Supervised Algorithms**

Supervised algorithms are those where the target variable is known. In other words, we have training data where the correct answers are already given. The algorithm then learns from this data and is able to generalize to new data. Some examples of supervised algorithms are regression and classification.

**Unsupervised Algorithms**

Unsupervised algorithms are those where the target variable is not known. With unsupervised algorithms, we only have input data, without any corresponding output labels. The algorithm has to learn from the data itself without any guidance. Some examples of unsupervised algorithms are clustering and dimensionality reduction.

**Why Supervised Algorithms Perform Better on Imbalanced Datasets**

**Supervised algorithms perform better on imbalanced datasets because they can learn from the training data which cases are more important**. With unsupervised algorithms, all data points are treated equally, regardless of whether they are in the minority or majority class.

For example, in a binary classification problem with an imbalanced dataset, let’s say that we want to predict whether a customer will default on their loan payment or not. We have a training dataset of 1000 customers, of whom only 100 (10%) have defaulted on their loan in the past.

If we use a supervised algorithm like logistic regression, the algorithm will learn from the training data that defaulting on a loan is rare (since only 10% of cases in the training data are positive). This means that it will be more likely to predict correctly that a new customer will not default on their loan (since this is the majority class in the training data).
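The flip side of this behavior is worth seeing in numbers. The sketch below uses made-up labels matching the example (1000 customers, 100 defaults) and a hypothetical model that always predicts "no default". It computes accuracy and recall by hand to show why accuracy alone can look good while every defaulter is missed. The function names are illustrative, not a library API.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the model actually found."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos


# Hypothetical data: 100 defaulters (1) and 900 non-defaulters (0).
y_true = [1] * 100 + [0] * 900
# A model that always predicts the majority class ("no default").
y_pred = [0] * 1000

print(f"accuracy={accuracy(y_true, y_pred):.2f}, recall={recall(y_true, y_pred):.2f}")
# prints: accuracy=0.90, recall=0.00
```

For this reason, on imbalanced data it is usually worth reporting recall (or precision and recall together) for the minority class alongside accuracy.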

However, if we use an unsupervised algorithm like k-means clustering, all data points will be treated equally since there is no target variable to guide the algorithm. This means that it might incorrectly cluster together customers who have defaulted on their loans with those who haven’t since there is no guidance provided by a target variable.

**Conclusion**

**In conclusion, supervised machine learning algorithms tend to perform better on imbalanced datasets than unsupervised machine learning algorithms because they can learn from the training data which cases are more important.**

**Some machine learning algorithms tend to perform better on highly imbalanced datasets because they are designed to deal with imbalance or because they can learn from both classes simultaneously. If you are working with a highly imbalanced dataset, then you should consider using one of these algorithms.**

**Thanks for reading!**

**How are machine learning techniques being used to address unstructured data challenges?**

Machine learning techniques are being used to address unstructured data challenges in a number of ways:

- **Natural language processing (NLP)**: NLP algorithms can be used to extract meaningful information from unstructured text data, such as emails, documents, and social media posts. NLP algorithms can be trained to classify text data, identify key terms and concepts, and extract structured data from unstructured text.
- **Image recognition**: Machine learning algorithms can be used to analyze and classify images, enabling the automatic identification and classification of objects, people, and other elements in images. This can be useful for tasks such as image tagging and search, as well as for applications such as security and surveillance.
- **Audio and speech recognition**: Machine learning algorithms can be used to analyze and classify audio data, enabling the automatic transcription and translation of spoken language. This can be useful for tasks such as speech-to-text transcription, as well as for applications such as call center automation and language translation.
- **Video analysis**: Machine learning algorithms can be used to analyze and classify video data, enabling the automatic detection and classification of objects, people, and other elements in video. This can be useful for tasks such as video tagging and search, as well as for applications such as security and surveillance.

Overall, machine learning techniques are being used in a wide range of applications to extract meaningful information from unstructured data, and to enable the automatic classification and analysis of data in a variety of formats.

**How are AI and machine learning impacting application development today?**

Artificial intelligence (AI) and machine learning are having a significant impact on application development today in a number of ways:

- Enabling new capabilities: AI and machine learning algorithms can be used to enable applications to perform tasks that would be difficult or impossible for humans to do. For example, AI-powered applications can be used to analyze and classify large amounts of data, or to automate complex decision-making processes.
- Improving performance: AI and machine learning algorithms can be used to optimize the performance of applications, making them faster, more efficient, and more accurate. For example, machine learning algorithms can be used to improve the accuracy of predictive models, or to optimize the performance of search algorithms.
- Streamlining development: AI and machine learning algorithms can be used to automate various aspects of application development, such as testing, debugging, and deployment. This can help to streamline the development process and reduce the time and resources needed to build and maintain applications.
- Enhancing user experiences: AI and machine learning algorithms can be used to enhance the user experience of applications by providing personalized recommendations, or by enabling applications to anticipate and respond to the needs and preferences of users.

Overall, AI and machine learning are having a significant impact on application development today, and they are likely to continue to shape the way applications are built and used in the future.

**How will advancements in artificial intelligence and machine learning shape the future of work and society?**

Advancements in artificial intelligence (AI) and machine learning are likely to shape the future of work and society in a number of ways. Some potential impacts include:

- **Automation**: AI and machine learning algorithms can be used to automate tasks that are currently performed by humans, such as data entry, customer service, and manufacturing. This could lead to changes in the types of jobs that are available and the skills that are in demand, as well as to increased productivity and efficiency.
- **Job displacement**: While automation may create new job opportunities, it could also lead to job displacement, particularly for workers in industries that are more susceptible to automation. This could lead to social and economic challenges, including unemployment and income inequality.
- **Increased efficiency**: AI and machine learning algorithms can be used to optimize and streamline business processes, leading to increased efficiency and productivity. This could lead to economic growth and innovation, and could also help to reduce costs for businesses and consumers.
- **Enhanced decision-making**: AI and machine learning algorithms can be used to analyze large amounts of data and make more informed and accurate decisions. This could lead to improved outcomes in fields such as healthcare, finance, and education, and could also help to reduce bias and improve fairness.

Overall, **the impact of AI and machine learning on the future of work and society is likely to be significant and complex, with both potential benefits and challenges.** It will be important to consider and address these impacts as these technologies continue to advance and become more widely adopted.
