What are some ways to increase precision or recall in machine learning?
In machine learning, recall is the ability of the model to find all relevant instances in the data while precision is the ability of the model to correctly identify only the relevant instances. A high recall means that most relevant results are returned while a high precision means that most of the returned results are relevant. Ideally, you want a model with both high recall and high precision but often there is a trade-off between the two. In this blog post, we will explore some ways to increase recall or precision in machine learning.
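To make the two definitions concrete, here is a minimal sketch that computes both metrics from confusion-matrix counts. The function name and the spam-filter counts are made up for illustration and do not come from any particular library or dataset.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts."""
    # Precision: of everything predicted positive, how much was truly positive?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much did the model find?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical spam filter: 40 spam emails caught (TP), 10 legitimate emails
# wrongly flagged (FP), 20 spam emails missed (FN).
precision, recall = precision_recall(tp=40, fp=10, fn=20)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

True negatives do not appear in either formula, which is why neither metric says anything about how well the model handles the non-relevant instances it leaves alone.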
What are some ways to increase precision or recall in machine learning?
The main way to increase recall is:
by decreasing the number of false negatives, usually at the cost of more false positives. Recall is TP / (TP + FN), so it rises only when fewer actual positives are missed. To achieve this, you can lower your threshold for what constitutes a positive prediction. For example, if you are trying to predict whether or not an email is spam, you might lower the threshold for what constitutes spam so that more emails are classified as spam. This will result in more false positives (emails that are not actually spam being classified as spam) but will also increase recall (more actual spam emails being classified as spam).
Raising the threshold does the opposite,
so it is not a way to reduce false negatives. For example, going back to the spam email prediction example, you might raise the threshold for what constitutes spam so that fewer emails are classified as spam. This will result in more false negatives (actual spam emails not being classified as spam) and will therefore decrease recall (fewer actual spam emails being classified as spam).
The main way to increase precision is:
by decreasing the number of false positives. Precision is TP / (TP + FP), so true negatives do not enter into it. To decrease the number of false positives, you can raise your threshold for what constitutes a positive prediction. For example, using the spam email prediction example again, you might raise the threshold for what constitutes spam so that fewer emails are classified as spam. This will result in fewer false positives (non-spam emails being classified as spam) and so typically increases precision, but it also tends to reduce the number of true positives, which lowers recall.
Conversely, you can lower your threshold for what constitutes a positive prediction. For example, going back to the spam email prediction example once more, you might lower the threshold for what constitutes spam so that more emails are classified as spam. This will result in fewer true negatives (emails that are not actually spam not being classified as spam) and more false positives, which typically decreases precision (more non-spam emails being classified as spam).
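The effect of the threshold on both metrics is easiest to see on a toy example. The sketch below uses ten made-up (score, label) pairs, where the score stands in for a model's estimated spam probability; the data and the `metrics_at` helper are assumptions for illustration only.

```python
# Hypothetical (score, is_spam) pairs: the score is the model's estimated
# probability that an email is spam; the label is the ground truth.
scored_emails = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, True), (0.40, False), (0.35, True), (0.20, False),
    (0.10, False), (0.05, False),
]


def metrics_at(threshold, data):
    """Classify score >= threshold as spam and return (precision, recall)."""
    tp = sum(1 for score, is_spam in data if score >= threshold and is_spam)
    fp = sum(1 for score, is_spam in data if score >= threshold and not is_spam)
    fn = sum(1 for score, is_spam in data if score < threshold and is_spam)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Sweep from a low threshold to a high one and watch the metrics trade off.
for threshold in (0.3, 0.5, 0.85):
    p, r = metrics_at(threshold, scored_emails)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this data the lowest threshold catches every spam email but flags two legitimate ones, while the highest threshold flags no legitimate email but misses three of the five spam emails. Real score distributions are noisier, so precision does not always rise monotonically with the threshold, but the general direction holds.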
To summarize,
there are a few ways to increase precision or recall in machine learning. One way is to choose an evaluation metric that matches your goal. For example, if you need to balance precision and recall rather than maximize one of them alone, you can use the F1 score, which is the harmonic mean of precision and recall. Another way to increase precision or recall is to adjust the threshold for classification, which moves the decision boundary: a higher threshold favors precision and a lower one favors recall. If no threshold gives an acceptable trade-off, using a different algorithm altogether may help.
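As a small illustration of how the F1 score combines the two metrics, here is a sketch using the standard harmonic-mean formula. The function name and the input values are chosen for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# A balanced model versus one that buys perfect precision with poor recall.
balanced = f1_score(0.8, 0.8)
lopsided = f1_score(1.0, 0.4)
print(f"balanced F1={balanced:.3f}, lopsided F1={lopsided:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the lopsided model scores well below the balanced one even though its precision is perfect.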
Sensitivity vs Specificity
In machine learning, sensitivity and specificity are two measures of the performance of a model. Sensitivity is the proportion of actual positives that are correctly predicted by the model (the same quantity as recall), while specificity is the proportion of actual negatives that are correctly predicted by the model.
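Both measures can be read off the confusion matrix. The sketch below is a minimal illustration with made-up counts; the function name is hypothetical.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    # Sensitivity (true positive rate): share of actual positives found.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity (true negative rate): share of actual negatives left alone.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical counts: 60 actual positives (40 found, 20 missed) and
# 100 actual negatives (90 correctly rejected, 10 wrongly flagged).
sens, spec = sensitivity_specificity(tp=40, fn=20, tn=90, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Unlike precision, specificity uses the true negatives, so the two pairs of metrics answer slightly different questions about the same predictions.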
Google introduced computing units for Colab, which you can purchase much as you would buy cloud compute from AWS, Azure, and similar providers. With Pro you get 100 computing units, and with Pro+ you get 500. The GPU, the TPU, and the High-RAM option all affect how many computing units you use per hour. If you don't have any computing units, you can't use the "Premium" tier GPUs (A100, V100), and even the P100 is not viable.
Google Colab Pro+ comes with the Premium tier GPU option, whereas in Pro, if you have computing units, you are randomly connected to a P100 or a T4. After you use all of your computing units, you can buy more, or you can use a T4 GPU about half or most of the time (there can be many times in the day when you can't get a T4 or any kind of GPU at all). In the free tier, the GPUs offered are most of the time the K80 and P4, which perform similarly to a 750 Ti (an entry-level GPU from 2014) with more VRAM.
For your consideration, a T4 uses around 2 computing units per hour and an A100 around 15. Based on current knowledge, the computing-unit costs for GPUs tend to fluctuate based on some unknown factor.
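To turn those figures into something easier to compare, here is a rough back-of-the-envelope sketch. It uses only the approximate numbers quoted in this post (100 and 500 units, about 2 and 15 units per hour); since the post notes that rates fluctuate, the results are estimates, and the helper name is made up.

```python
# Approximate figures quoted in the post; actual rates fluctuate.
UNITS_PER_PLAN = {"Pro": 100, "Pro+": 500}
UNITS_PER_HOUR = {"T4": 2, "A100": 15}


def gpu_hours(units: float, rate_per_hour: float) -> float:
    """Estimated hours of GPU time a unit allowance buys at a given rate."""
    return units / rate_per_hour


# Plan/GPU pairings the post describes as reachable with units in hand.
for plan, gpu in (("Pro", "T4"), ("Pro+", "T4"), ("Pro+", "A100")):
    hours = gpu_hours(UNITS_PER_PLAN[plan], UNITS_PER_HOUR[gpu])
    print(f"{plan:4s} on {gpu:4s}: about {hours:.0f} hours")
```

This ignores availability, the High-RAM surcharge, and any rate changes, so it is best treated as an upper bound on usable hours.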
Considering those:
For hobbyists and (under)graduate coursework, it is better to use your own GPU if you have something with more than 4 GB of VRAM and better than a 750 Ti, or at least to purchase Colab Pro so you can reach a T4 even when you have no computing units remaining.
For small research companies, non-trivial research at universities, and probably most people, Colab is now probably not a good option.
Colab Pro+ can be considered if you want Pro but don't sit in front of your computer, since it disconnects after 90 minutes of inactivity on your computer. But this can be overcome with some scripts to some extent, so most of the time Colab Pro+ is not a good option.
If you have anything more to say, please let me know so I can edit this post with them. Thanks!
In machine learning, precision and recall trade off against each other; increasing one often decreases the other. There is no single silver bullet solution for increasing either precision or recall; it depends on your specific use case which one is more important and which methods will work best for boosting whichever metric you choose. In this blog post, we explored some methods for increasing either precision or recall; hopefully this gives you a starting point for improving your own models!