**What Are the Best Machine Learning Algorithms for Imbalanced Datasets?**

In machine learning, **imbalanced datasets** are those where one class heavily outnumbers the others. This can be due to the nature of the problem or simply because more data is available for one class than the others. Either way, imbalanced datasets can pose a challenge for machine learning algorithms. In this blog post, we’ll take a look at which machine learning algorithms are best suited for imbalanced datasets and why they tend to perform better than others.

**For example, in a binary classification problem, if there are 100 observations and only 10 of them are positive (the remaining 90 are negative), then we say that the dataset is imbalanced. The ratio of positive to negative cases is 1:9.**
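As a quick illustration of the example above, the following sketch counts the classes and reports the ratio. The helper name `class_balance` and the label encoding (1 for positive, 0 for negative) are assumptions made for this example only.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts and the minority-to-majority ratio."""
    counts = Counter(labels)
    minority = min(counts.values())
    majority = max(counts.values())
    return counts, minority / majority


# Hypothetical labels matching the example: 10 positives, 90 negatives.
labels = [1] * 10 + [0] * 90
counts, ratio = class_balance(labels)

print(dict(counts))                                        # {1: 10, 0: 90}
print(f"positive:negative = 1:{counts[0] // counts[1]}")   # 1:9
```

With 10 positives and 90 negatives, the positive-to-negative ratio works out to 1:9, while positives make up 10% of all observations.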

There are a few reasons why some machine learning algorithms tend to perform better on imbalanced datasets than others. First, certain algorithms are designed to handle imbalanced datasets. Second, some algorithms are more robust to outliers, which can be more common in imbalanced datasets. And third, some algorithms are better able to learn from a limited amount of data, which can be an issue when one class is heavily outnumbered by the others.

**Some of the best machine learning algorithms for imbalanced datasets include:**

- Support Vector Machines (SVMs)
- Decision Trees
- Random Forests
- Naive Bayes Classifiers
- k-Nearest Neighbors (kNN)

Of these, SVMs tend to be the most popular choice. They were not designed specifically for imbalanced data, but they work by finding a hyperplane that maximizes the margin between the two classes, which helps to reduce overfitting and improve generalization, and common implementations let you penalize mistakes on the minority class more heavily. Decision trees and random forests are also popular choices as they are less sensitive to outliers than linear models such as logistic regression. Naive Bayes classifiers are another good choice as they are able to learn from a limited amount of data. kNN can also work well with a limited amount of data and, with a suitably large k, is fairly robust to isolated outliers. However, it can be computationally intensive for large datasets.
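One common way to make classifiers such as SVMs, decision trees, or random forests pay more attention to the minority class is class weighting. The sketch below is an addition for illustration, not something the article prescribes: it computes weights as `n_samples / (n_classes * class_count)`, which is the heuristic scikit-learn documents for `class_weight="balanced"`. The function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical dataset: 10 positives (class 1) and 90 negatives (class 0).
labels = [1] * 10 + [0] * 90
weights = balanced_class_weights(labels)

# The minority class gets weight 5.0; the majority class gets about 0.56.
print(weights)
```

The rarer a class is, the larger its weight, so each minority-class error costs more during training.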

There are two main types of machine learning algorithms: supervised and unsupervised. Supervised algorithms tend to perform better on imbalanced datasets than unsupervised algorithms. In this blog post, we will discuss why this is so and look at some examples.

**Supervised Algorithms**

Supervised algorithms are those where the target variable is known. In other words, we have training data where the correct answers are already given. The algorithm then learns from this data and is able to generalize to new data. Some examples of supervised algorithms are regression and classification.

**Unsupervised Algorithms**

Unsupervised algorithms are those where the target variable is not known. With unsupervised algorithms, we only have input data, without any corresponding output labels. The algorithm has to learn from the data itself without any guidance. Some examples of unsupervised algorithms are clustering and dimensionality reduction.

**Why Supervised Algorithms Perform Better on Imbalanced Datasets**

**Supervised algorithms perform better on imbalanced datasets because they can learn from the labeled training data which cases are more important**. With unsupervised algorithms, all data points are treated equally, regardless of whether they are in the minority or majority class.

For example, in a binary classification problem with an imbalanced dataset, let’s say that we want to predict whether a customer will default on their loan payment or not. We have a training dataset of 1000 customers, out of which only 100 (10%) have defaulted on their loan in the past.

If we use a supervised algorithm like logistic regression, the algorithm will learn from the training data that defaulting on a loan is rare (since only 10% of cases in the training data are positive). This means that it will be more likely to predict correctly that a new customer will not default on their loan (since this is the majority class in the training data).

However, if we use an unsupervised algorithm like k-means clustering, all data points will be treated equally since there is no target variable to guide the algorithm. This means that it might incorrectly cluster together customers who have defaulted on their loans with those who haven’t since there is no guidance provided by a target variable.
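To make the loan example concrete, the sketch below scores a hypothetical baseline that always predicts "no default" on 1000 customers, 100 of whom defaulted. It is an illustration added here, not part of the original analysis, and the function name `confusion_counts` is made up for this example. It suggests why accuracy alone can look good on imbalanced data even when every minority case is missed.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive and pred != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# 100 defaulters (1) and 900 non-defaulters (0), as in the example.
y_true = [1] * 100 + [0] * 900
# Hypothetical baseline: always predict "no default".
y_pred = [0] * 1000

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
acc = (tp + tn) / len(y_true)
rec = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy = {acc}, recall on defaulters = {rec}")
```

Here accuracy is 0.9 while recall on defaulters is 0.0, which is why recall or precision on the minority class is usually reported alongside accuracy.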

**Conclusion**

**In conclusion, supervised machine learning algorithms tend to perform better on imbalanced datasets than unsupervised machine learning algorithms because they can learn from the training data which cases are more important.**

**Some machine learning algorithms tend to perform better on highly imbalanced datasets because they are designed to deal with imbalance or because they can learn from both classes simultaneously. If you are working with a highly imbalanced dataset, then you should consider using one of these algorithms.**

**Thanks for reading!**

**How are machine learning techniques being used to address unstructured data challenges?**

Machine learning techniques are being used to address unstructured data challenges in a number of ways:

- **Natural language processing (NLP)**: NLP algorithms can be used to extract meaningful information from unstructured text data, such as emails, documents, and social media posts. NLP algorithms can be trained to classify text data, identify key terms and concepts, and extract structured data from unstructured text.
- **Image recognition**: Machine learning algorithms can be used to analyze and classify images, enabling the automatic identification and classification of objects, people, and other elements in images. This can be useful for tasks such as image tagging and search, as well as for applications such as security and surveillance.
- **Audio and speech recognition**: Machine learning algorithms can be used to analyze and classify audio data, enabling the automatic transcription and translation of spoken language. This can be useful for tasks such as speech-to-text transcription, as well as for applications such as call center automation and language translation.
- **Video analysis**: Machine learning algorithms can be used to analyze and classify video data, enabling the automatic detection and classification of objects, people, and other elements in video. This can be useful for tasks such as video tagging and search, as well as for applications such as security and surveillance.

Overall, machine learning techniques are being used in a wide range of applications to extract meaningful information from unstructured data, and to enable the automatic classification and analysis of data in a variety of formats.

**How are AI and machine learning impacting application development today?**

Artificial intelligence (AI) and machine learning are having a significant impact on application development today in a number of ways:

- Enabling new capabilities: AI and machine learning algorithms can be used to enable applications to perform tasks that would be difficult or impossible for humans to do. For example, AI-powered applications can be used to analyze and classify large amounts of data, or to automate complex decision-making processes.
- Improving performance: AI and machine learning algorithms can be used to optimize the performance of applications, making them faster, more efficient, and more accurate. For example, machine learning algorithms can be used to improve the accuracy of predictive models, or to optimize the performance of search algorithms.
- Streamlining development: AI and machine learning algorithms can be used to automate various aspects of application development, such as testing, debugging, and deployment. This can help to streamline the development process and reduce the time and resources needed to build and maintain applications.
- Enhancing user experiences: AI and machine learning algorithms can be used to enhance the user experience of applications by providing personalized recommendations, or by enabling applications to anticipate and respond to the needs and preferences of users.

Overall, AI and machine learning are having a significant impact on application development today, and they are likely to continue to shape the way applications are built and used in the future.

**How will advancements in artificial intelligence and machine learning shape the future of work and society?**

Advancements in artificial intelligence (AI) and machine learning are likely to shape the future of work and society in a number of ways. Some potential impacts include:

- **Automation**: AI and machine learning algorithms can be used to automate tasks that are currently performed by humans, such as data entry, customer service, and manufacturing. This could lead to changes in the types of jobs that are available and the skills that are in demand, as well as to increased productivity and efficiency.
- **Job displacement**: While automation may create new job opportunities, it could also lead to job displacement, particularly for workers in industries that are more susceptible to automation. This could lead to social and economic challenges, including unemployment and income inequality.
- **Increased efficiency**: AI and machine learning algorithms can be used to optimize and streamline business processes, leading to increased efficiency and productivity. This could lead to economic growth and innovation, and could also help to reduce costs for businesses and consumers.
- **Enhanced decision-making**: AI and machine learning algorithms can be used to analyze large amounts of data and make more informed and accurate decisions. This could lead to improved outcomes in fields such as healthcare, finance, and education, and could also help to reduce bias and improve fairness.

Overall, **the impact of AI and machine learning on the future of work and society is likely to be significant and complex, with both potential benefits and challenges.** It will be important to consider and address these impacts as these technologies continue to advance and become more widely adopted.

- [D] Serving a model for clients with sensitive data by /u/Deto (Machine Learning) on June 23, 2024 at 4:01 am
I'm sure this situation happens, so I wanted to ask the sub what tools or platforms might be available to facilitate it. Say you don't want to give your model to a client and they don't want to give you their data, but you want to enter an agreement with them to process their data on your model. It's fundamentally an issue of trust. Is there, say, a third-party platform that could mitigate this, where you could upload your model and they could send it data for inference and receive results, with some assurance that you couldn't be secretly saving their data?

- [R] The concept of an inverse SoftMax function in a multi-layered LLM structure within a multi-dimensional vector space. by /u/utkohoc (Machine Learning) on June 23, 2024 at 1:59 am
Introduction Machine learning (ML) is fundamentally rooted in mathematics, utilizing complex functions and programming to extrapolate vectors in a space and calculate probabilities. Large language models (LLMs), a subset of ML, employ mathematical techniques to determine connections between data points in a high-dimensional space. This paper explores the enhancement of LLM capabilities through the integration of additional mathematical layers, parallel computing, and advanced programming techniques like Bend. The Mathematical Foundation of Machine Learning At its core, machine learning involves the manipulation and transformation of vectors within a space to model and predict outcomes. This process heavily relies on functions such as weights and softmax. **Weights**: In neural networks, weights adjust the influence of input signals. They are optimized during training to minimize error and enhance prediction accuracy. **Softmax Function**: This function converts a vector of values into a probability distribution, often used in the final layer of a neural network for classification tasks. It ensures that the output values sum to one, making them interpretable as probabilities. Large Language Models (LLMs) LLMs, such as those based on the Transformer architecture, utilize a series of mathematical operations to model language. Transformers, introduced in "Attention is All You Need" by Vaswani et al., leverage self-attention mechanisms to process sequences of data without relying on recurrent structures. **Self-Attention Mechanism**: This mechanism allows the model to weigh the importance of different words in a sequence relative to each other, facilitating the capture of long-range dependencies in the data. **Multi-Head Attention**: Enhances the model’s ability to focus on different parts of the input sequence simultaneously by running multiple self-attention operations in parallel. 
Enhancing Learning Through Additional Mathematical Layers If LLMs use extensive mathematics to map connections between data points, incorporating more sophisticated mathematical operations into each layer can theoretically enhance their learning ability. The idea is to add new layers of mathematical functions on top of the existing probabilistic layers, effectively increasing the model's capacity to understand and manipulate data. **Parallel Computing with Bend**: Bend, a programming language designed for parallelism, can significantly boost the performance of LLMs. Bend supports features like fast object allocations, higher-order functions, and runs on massively parallel hardware like GPUs. This allows for nearly linear acceleration based on core count without explicit parallelism annotations (e.g., no thread creation or locks). Building a Multi-Layered LLM Structure Imagine constructing a multi-layered LLM where each layer represents an additional dimension of mathematical processing. The base layer operates as a standard LLM, processing data using conventional methods. Above this, additional layers perform more complex mathematical transformations. **First Layer**: Standard LLM processing on a GPU. **Second Layer**: Enhanced with additional mathematical functions running in parallel, leveraging Bend for optimal performance. By stacking these layers, the LLM can process data through multiple stages of mathematical refinement. The bottom layers handle probabilistic computations, while the upper layers focus on deterministic, linear algebra transformations. Example Structure Visualize the LLM structure as a 3D cube: **Base Layer**: A 10x10 grid of vector spaces, each running an LLM. **Upper Layers**: Additional 10x10 grids, each incorporating advanced mathematical functions. Each layer performs softmax operations on the outputs of the layer beneath it, iteratively refining the model's understanding of the data. 
This multi-dimensional approach can potentially produce a higher-order softmax function, enhancing the model's learning capabilities exponentially. Conclusion The integration of additional mathematical layers and advanced parallel computing techniques like Bend into LLMs represents a promising avenue for enhancing their learning capabilities. By building a multi-layered structure, we can leverage both probabilistic and deterministic computations to achieve more sophisticated data modeling and prediction. References Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. *31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA*. Bend Programming Language. GitHub repository. [https://github.com/HigherOrderCO/bend\](https://github.com/HigherOrderCO/bend) Points of Analysis **Mathematical Complexity and Computational Load**: **Claim**: Adding more mathematical layers and functions increases learning ability. **Analysis**: While more complex mathematical operations can provide deeper insights and refined models, they also significantly increase computational requirements. Each additional layer introduces more parameters to be learned, which can lead to issues such as overfitting if not managed properly. Moreover, more complex models require more data to train effectively and more computational power, potentially leading to inefficiencies and increased costs. **Parallel Computing and Bend**: **Claim**: Using parallel computing with a language like Bend can optimize the process without explicit parallelism annotations. **Analysis**: Bend's features (fast object allocations, support for higher-order functions, and scalability like CUDA) are promising for parallel computing. However, translating theoretical benefits into practical gains can be challenging. 
Effective parallelization of neural networks often requires careful tuning and management of data dependencies, which might still necessitate some level of explicit control over parallel processes. **Multi-Dimensional LLM Structure**: **Claim**: Constructing a multi-layered LLM structure (like a 3D cube) can enhance learning through additional mathematical refinements. **Analysis**: The concept of stacking layers in a 3D space and refining outputs through successive softmax operations is innovative. However, the practical implementation of such a structure poses several challenges: **Complexity Management**: Managing the increased complexity and ensuring stable training across multiple layers require sophisticated techniques to prevent issues like gradient vanishing or exploding. **Data Requirements**: More layers and parameters necessitate larger datasets for training to avoid overfitting and ensure the model generalizes well to unseen data. **Interpretability**: Adding multiple layers of mathematical functions can reduce the interpretability of the model, making it harder to diagnose issues and understand the model’s decision-making process. **Probabilistic vs. Deterministic Layers**: **Claim**: Combining probabilistic layers with deterministic, linear algebra transformations enhances model capabilities. **Analysis**: Integrating deterministic operations with probabilistic ones can indeed enrich the model’s feature extraction capabilities. However, ensuring smooth interaction between these two types of operations is non-trivial. Linear algebra transformations need to be carefully designed to complement the probabilistic layers without introducing instability or incompatibility in the learning process. **Softmax and Higher-Order Functions**: **Claim**: Using softmax operations across multiple layers to derive a higher-order softmax function. **Analysis**: The idea of iteratively refining softmax operations through additional layers is intriguing. 
However, ensuring that each layer’s softmax output correctly informs the next layer without loss of meaningful information or introduction of noise is critical. Additionally, the computational cost of repeatedly applying softmax functions across many layers might outweigh the benefits if not efficiently managed. Conclusion While the proposed enhancements to LLMs through additional mathematical layers, parallel computing, and advanced programming techniques present innovative ideas, they also introduce several challenges. The feasibility of these improvements depends on careful management of computational resources, sophisticated model tuning, and ensuring compatibility between different types of operations. Balancing increased complexity with practical benefits is crucial to make these enhancements viable in real-world applications. Recommendations **Incremental Implementation**: Start by incrementally adding mathematical layers and functions, closely monitoring the impact on model performance and computational load. **Advanced Regularization Techniques**: Employ advanced regularization methods to manage the risk of overfitting with more complex models. **Scalability Testing**: Conduct thorough scalability testing with parallel computing frameworks like Bend to evaluate real-world performance gains. **Collaborative Research**: Collaborate with researchers and practitioners to refine and test these concepts in various settings, ensuring robustness and practicality. Inverse Softmax Function in a 3D LLM Structure The concept of an inverse softmax function in a multi-layered LLM structure within a 3D vector space. Softmax Function Overview The softmax function is used to convert a vector of values (logits) into a probability distribution. 
For a vector \(\mathbf{z} = [z_1, z_2, \ldots, z_n]\), the softmax function is defined as: \[ \sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}} \] where \(\sigma(\mathbf{z})_i\) represents the probability associated with the \(i\)-th element. Concept of Inverse Softmax The inverse softmax would ideally perform the reverse operation: converting a probability distribution back into logits. While this isn't straightforward due to the nature of the softmax function (it maps a vector to a simplex), we can consider the following approach: Given a probability distribution \(\mathbf{p} = [p_1, p_2, \ldots, p_n]\) where \( \sum_{i=1}^{n} p_i = 1\), the inverse softmax can be defined (in a simplified form) as: \[ z_i = \log(p_i) + C \] where \(C\) is a constant ensuring that the logits maintain the relative differences in the probability distribution. One common approach is to set \(C\) such that the logits sum to zero or another fixed value for stability. Application in a 3D Vector Space In the proposed 3D LLM structure, layers of LLMs are stacked, each adding complexity and refining outputs. Here’s how an inverse softmax might fit into this structure: **Base Layer (Standard LLMs)** : Each element of the 3D grid runs a standard LLM, producing a set of logits for the input data. **Intermediate Layers (Mathematical Functions)**: Subsequent layers perform additional mathematical transformations on the logits, refining them further. **Inverse Softmax Layer**: At a certain stage, an inverse softmax function is applied to convert probability distributions back into logits. This step could help in scenarios where it's beneficial to revert to a logit representation for further transformations. **Upper Layers (Enhanced Transformations)**: The logits are then processed through additional layers of mathematical functions, eventually producing a refined output. 
Practical Example **Initial Logits**: Let’s say the base layer produces logits \(\mathbf{z}^{(0)} = [z_1^{(0)}, z_2^{(0)}, \ldots, z_n^{(0)}]\). **Softmax Application**: These logits are transformed into probabilities using the softmax function, yielding \(\mathbf{p}^{(1)} = \sigma(\mathbf{z}^{(0)})\). **Intermediate Transformations**: Several layers perform mathematical operations on \(\mathbf{p}^{(1)}\), producing refined probabilities \(\mathbf{p}^{(2)}, \mathbf{p}^{(3)}, \ldots\). **Inverse Softmax Application**: At a specific layer, the inverse softmax is applied to \(\mathbf{p}^{(k)}\), converting it back into logits \(\mathbf{z}^{(k)} = \log(\mathbf{p}^{(k)}) + C\). **Further Processing**: These logits \(\mathbf{z}^{(k)}\) are processed through additional layers, ultimately generating the final output. Conclusion Incorporating an inverse softmax function within a multi-layered LLM structure in a 3D vector space adds flexibility in handling logits and probability distributions. While the implementation details require careful consideration, this approach can enhance the model’s ability to refine and process data through various mathematical transformations.
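The inverse softmax relation described in the post above, z_i = log(p_i) + C, can be checked numerically. The following is a minimal sketch using only Python's standard library. The function names `softmax` and `inverse_softmax` and the sample logits are assumptions for illustration, not code from the post. It also assumes every probability is strictly positive, since log(0) is undefined.

```python
import math


def softmax(z):
    """Convert logits to probabilities (max-shifted for numerical stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_softmax(p, c=0.0):
    """Recover logits up to an additive constant: z_i = log(p_i) + c."""
    return [math.log(p_i) + c for p_i in p]


z = [2.0, 1.0, 0.1]          # hypothetical logits
p = softmax(z)               # probabilities that sum to 1
z_rec = inverse_softmax(p)   # logits recovered with C = 0

# The recovered logits differ from the originals by one shared constant,
# so applying softmax again gives back the same probabilities.
shifts = [a - b for a, b in zip(z, z_rec)]
p_again = softmax(z_rec)

print(p)
print(shifts)
```

Because softmax is unchanged by adding a constant to every logit, the original logits can only be recovered up to that constant, which is the role of C in the formula.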

- [D] What are open unsolved interesting problems in machine learning? by /u/marshallggggg (Machine Learning) on June 23, 2024 at 1:10 am
I am curious what the next big leap forward in machine learning is. What are some obstacles that, if solved, would make machine learning even more useful? Or, to phrase the question differently: to which problems has a machine learning approach not yet been applied where it could turn out to be useful?

- [D] Why does developing these RAG applications feel like alchemy? by /u/latentnumber (Machine Learning) on June 23, 2024 at 12:58 am
^ Basically the title. Is there a principled way of doing this? Like Weights & Biases, where you can at least monitor what's happening.

- [D] How do you quantize a finetuned encoder-decoder (seq2seq) transformer like mT5 on ONNXRuntime or Optimum? by /u/Abs0lute_Jeer0 (Machine Learning) on June 22, 2024 at 7:26 pm
I believe I have to quantize the encoder and decoder parts separately. I am able to do this, but when I use `model = ORTSeq2SeqLM('path/to/onnx/files')`, `tokenizer = …`, `tokenized_input = …`, `model.generate()`, I end up with tensor shape mismatch errors at the input node itself. It expects an input of shape (16, 2). Why is this happening? Have I made a mistake while quantizing them? If anyone can point me towards any good tutorials or guides on quantizing seq2seq models, I will be grateful!

- [R] GNOME: Generating Negotiations through Open-Domain Mapping of Exchanges by /u/Megixist (Machine Learning) on June 22, 2024 at 6:41 pm

- [D] Datasets of the Google Gemma for Indic languages by /u/cern_unnosi (Machine Learning) on June 22, 2024 at 3:43 pm
Were the Indic language datasets used to train Gemma originally created in the Indic languages themselves, or were they translations from English datasets? The responses seem overly translated.

- [D] Academic ML Labs: How many GPUs? by /u/South-Conference-395 (Machine Learning) on June 22, 2024 at 10:29 am
Following a recent post, I was wondering how other labs are doing in this regard. During my PhD (top-5 program), compute was a major bottleneck (it could have been significantly shorter if we had more high-capacity GPUs). We currently have *no* H100s. How many GPUs does your lab have? Are you getting extra compute credits from Amazon/NVIDIA through hardware grants? Thanks.

- [D] Memory mechanism for Transformers by /u/Janos95 (Machine Learning) on June 21, 2024 at 6:29 pm
Hey folks! I am wondering what interesting work has been done on adding a short-term memory mechanism to transformers. Does someone know what the important work in this area is?

- [P] AgileRL - evolutionary RLOps for state-of-the-art deep reinforcement learning by /u/nicku_a (Machine Learning) on June 21, 2024 at 5:49 pm
Hi, I've posted before about our evolutionary hyperparameter optimization for reinforcement learning achieving SOTA results, but I'd like to share that our open-source framework has now had its v1.0.0 release! Please check it out! https://github.com/AgileRL/AgileRL This library is initially focused on reducing the time taken for training models and hyperparameter optimization by pioneering evolutionary HPO techniques for reinforcement learning. Evolutionary HPO has been shown to drastically reduce overall training times by automatically converging on optimal hyperparameters, without requiring numerous training runs. We are constantly adding more algorithms and features. AgileRL already includes state-of-the-art evolvable on-policy, off-policy, offline, multi-agent and contextual multi-armed bandit reinforcement learning algorithms with distributed training. I'd love to get your feedback!

- [D] Visualising attention maps for multimodal ACT model by /u/Few_Pangolin4015 (Machine Learning) on June 21, 2024 at 4:39 pm
Does anyone know how to visualise encoder and decoder transformer attention maps for ACT? The observation is a combination of robot proprioceptive and multi-camera image data. The output is an action chunk. The model is based on DETR. The hard part is splitting the attention maps in a way that links back to the current observation. I think the most interesting visual would be: given the last-layer decoder attention map and the current observation, what did the model attend to in the observation to produce the generated action chunk? That is, which parts of the image in each of the cameras, and which parts of the robot proprioceptive data, did the model pay attention to when generating the action chunk? ACT project page: https://tonyzhaozh.github.io/aloha/

- Work on Text to video for Sign Language. [R] by /u/One_Definition_8975 (Machine Learning) on June 21, 2024 at 3:52 pm
I am working on text to video for Sign Language. I see that the main bottleneck is the keypoint extraction. Is anyone working in this area?

- [D] [R] Need Help: Using ML to differentiate Radiation Necrosis from Tumor Progression in glioblastoma by /u/Eastern_Phase_6323 (Machine Learning) on June 21, 2024 at 2:39 pm
Hi, I have a set of MRI images and I'm trying to figure out if new lesions visible on a set of images are due to tumor progression or radiation necrosis. I have a background in software development and machine learning, and I’m looking for insight into how ML can help solve this problem. Based on the latest research, my understanding is that it's possible with a combination of different imaging techniques. I'm looking for proven ML models that can help to distinguish between radiation necrosis and tumor progression, and for anyone who has experience with the BRATS dataset and can give some advice. Thank you! Update: Removed personal background to be more objective.

- [R] [D] Sanity Check on use of biLSTM for time series prediction by /u/rutherfordofman (Machine Learning) on June 21, 2024 at 1:08 pm
TL;DR: this published paper uses a biLSTM and I think it violates causality. Hi, I am struggling to convince myself I am not going mad. I am looking at this paper published in Neural Networks, an Elsevier journal. In this paper they use a bidirectional LSTM model (+ some other novel stuff) to predict time series. This seems fundamentally wrong, as a biLSTM cannot/should not be used for time series prediction. The best-known use case for biLSTM is translating a phrase word by word when the entire sentence is known in advance. In this case the preceding and succeeding words can influence the meaning, and so the translation, of a focal word. A silly example would be translating this into Spanish: "I need a shot, I got bitten by a dog". If you are scanning through each word in turn to translate, you might suggest w_4 (= 'shot') would translate to 'inyeccion', i.e. a vaccination. Knowing that w_10 = 'dog' would have important predictive value here. Likewise, in "I need a shot, let's go to a bar!", w_4 would probably translate to 'chupito' for a shot of booze because w_9 = 'bar' has an influence. So you can and should use a biLSTM here, so you can scan what comes before and after the word to know the context. However, for a time series prediction, you don't know the future! The future cannot affect the present without violating causality. In the translation example the sentence is in fact already created in the person's head before they say/write it, so the later words don't violate causality. However, in this paper they use biLSTM on general time series benchmarks and it seems totally unscientific! Am I missing something?

- [P] Importance map of image based on segmentations by /u/mrex778 (Machine Learning) on June 21, 2024 at 12:41 pm
Hi, I've been working on a project where I need to identify the important areas in an image. The dataset has the full image plus the segmentations with each region's importance (a label of -1, 0, or 1, with -1 being the least important and 1 the most important). Also, the dataset is small (around 200 images). I'm stuck and can't think of anything I haven't tried. I also know about salient object detection, but that just gives the most important object in the image and not a map of importance. I would appreciate any help, ideas or guidance. Thanks

- [P] Synthetic data generator by /u/Possible-Suspect2127 (Machine Learning) on June 21, 2024 at 10:29 am
We are trying to build a synthetic data generator for tabular and textual data in a particular domain. In the final product, the user provides a dataset and specifies the number of rows to generate, and we generate those rows along with different metrics to evaluate the generated data. We have thought of using GANs such as CTGAN for tabular data, but we have no idea what to use for textual datasets (e.g. mental health conversation data). Please suggest how we can train our model so that it generalizes well to new datasets: should we train the same model on multiple datasets from the same domain, or use a different model and train from scratch? Any guidance would be appreciated. If you have previously worked on such a problem, do let me know and I will reach out to you. submitted by /u/Possible-Suspect2127

- [P] Classifier for prioritizing emails by /u/mr_house7 (Machine Learning) on June 21, 2024 at 9:20 am
I'm trying to build a classifier for prioritizing emails with traditional ML models (Decision Tree, Logistic Regression, etc.).

Input: email body (vectorized), subject (vectorized), number of characters.

Output: email priority (3 classes), generated with an LLM (phi3-mini). (I know this is controversial, but my boss wants a model and has no data, so this was the only way I knew how to "create" data.)

Dataset: 7K rows (class 0: 4K, class 1: 2K, class 2: 1K). I have dealt with class imbalance by adding a class weight and looking mostly at confusion matrices.

I tried several models with subpar results. I was wondering if any of you have had a similar experience with a problem like this. What do you think is the problem? AI-generated data? Small dataset? Impossible to do with traditional ML models? Am I doing something wrong? Any help or insight would be greatly appreciated. submitted by /u/mr_house7
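For reference, here is a minimal sketch of how a class weight could be derived from the class counts quoted in the post. The post does not say which weighting was used, so this is an assumption: it applies the common "balanced" heuristic, `n_samples / (n_classes * class_count)`, which is the same formula scikit-learn uses for `class_weight="balanced"`. The function name `balanced_class_weights` is made up for illustration.

```python
def balanced_class_weights(class_counts):
    """Weight each class inversely to its frequency.

    Uses the "balanced" heuristic: n_samples / (n_classes * class_count).
    """
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }

# Class counts as stated in the post (7K rows in total).
counts = {0: 4000, 1: 2000, 2: 1000}
weights = balanced_class_weights(counts)

for label, weight in weights.items():
    print(f"class {label}: weight {weight:.3f}")
# class 0: weight 0.583
# class 1: weight 1.167
# class 2: weight 2.333
```

With these weights, a misclassified class 2 email costs four times as much as a misclassified class 0 email, which matches the 4:1 ratio between their counts.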

- [D] FP8 current state by /u/ClumsyClassifier (Machine Learning) on June 21, 2024 at 8:43 am
I remember there was some hype about FP8 training; the problem back then was that it wasn't really supported. I checked recently and there still doesn't seem to be a lot of support, even though the RTX 40 series as well as the H100 support FP8. I'm just wondering what happened. Was it too unstable? Did PyTorch just not bother? It just seems like a mystery to me, considering modern hardware supports it. submitted by /u/ClumsyClassifier

- [D] OpenAI JSON mode implementation by /u/WrapKey69 (Machine Learning) on June 21, 2024 at 7:08 am
How can function calling or JSON mode be implemented on the LLM side? I suppose there must be a JSON validator and a classifier somewhere. Would appreciate any ideas. submitted by /u/WrapKey69
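One technique often described for this kind of feature is constrained decoding: at each step, tokens that would make the output invalid are masked out before the next token is chosen. The sketch below is a toy illustration of that idea only, and makes no claim about how OpenAI implements JSON mode. Everything in it is hypothetical: the vocabulary, the fixed `toy_scores` standing in for model logits, and the tiny schema whose valid outputs are simply enumerated (a practical system would more likely track a grammar or state machine over the tokenizer's vocabulary).

```python
import json

# Hypothetical token vocabulary and the complete set of valid outputs
# for a made-up schema: {"priority": "high" | "low"}.
VOCAB = ["Sure", ",", "{", "}", ":", '"priority"', '"high"', '"low"']
VALID_OUTPUTS = ['{"priority":"high"}', '{"priority":"low"}']

def toy_scores(prefix):
    """Stand-in for model logits. Fixed preferences that ignore the prefix;
    unconstrained, this "model" would start with the chatty token "Sure"."""
    return {"Sure": 5.0, ",": 4.0, '"low"': 3.0, '"high"': 2.5,
            "{": 2.0, '"priority"': 1.5, ":": 1.0, "}": 0.5}

def allowed_tokens(prefix):
    """Tokens that keep the output a prefix of at least one valid output."""
    return [tok for tok in VOCAB
            if any(valid.startswith(prefix + tok) for valid in VALID_OUTPUTS)]

def constrained_decode(max_steps=10):
    """Greedy decoding restricted to the allowed tokens at every step."""
    output = ""
    for _ in range(max_steps):
        if output in VALID_OUTPUTS:
            break
        scores = toy_scores(output)
        candidates = allowed_tokens(output)
        # Pick the highest-scoring token among those that are still valid.
        output += max(candidates, key=scores.__getitem__)
    return output

result = constrained_decode()
print(result)              # {"priority":"low"}
print(json.loads(result))  # {'priority': 'low'}
```

Because invalid tokens are never selectable, the result parses as JSON by construction, and no separate validation pass is needed after generation.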

- [Project] LLM based Python docs that never touches your original code by /u/ford_prefect_9931 (Machine Learning) on June 21, 2024 at 6:44 am
Documentation is tedious and time-consuming. I thought LLMs might be the answer, but they tend to hallucinate, inventing functions or misinterpreting code. Not ideal when you're trying to document real, working code. So I built lmdocs. It can: reference documentation from imported libraries; guarantee that your original code is unchanged; work with OpenAI and local LLMs. I'd love to get some feedback from other devs. If you're interested, you can check it out here: https://github.com/MananSoni42/lmdocs It's open source, so feel free to contribute or just let me know what you think. submitted by /u/ford_prefect_9931