What are some ways to increase precision or recall in machine learning?
In machine learning, recall is the ability of a model to find all relevant instances in the data, while precision is the ability of a model to identify only relevant instances. A high recall means that most relevant results are returned, while a high precision means that most of the returned results are relevant. Ideally, you want a model with both high recall and high precision, but there is often a trade-off between the two. In this blog post, we will explore some ways to increase recall or precision in machine learning.
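As a minimal sketch, both metrics can be computed directly from true and predicted labels: precision is TP / (TP + FP) and recall is TP / (TP + FN). The helper name `precision_recall` and the toy spam labels are illustrative assumptions; in practice scikit-learn's `precision_score` and `recall_score` do the same job.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Return 0.0 instead of dividing by zero when there are no predicted or actual positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical data: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f}")  # prints precision=0.67 recall=0.50
```

Here the model flags three emails, two of which are spam (precision 2/3), and it catches two of the four spam emails (recall 1/2).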
Recall is defined as TP / (TP + FN), where TP is the number of true positives and FN the number of false negatives. Because TP + FN is the number of actual positive instances, which is fixed for a given dataset, recall can only go up by reducing false negatives, that is, by turning missed positives into true positives.
The most direct way to do this is to lower your threshold for what constitutes a positive prediction. For example, if you are trying to predict whether or not an email is spam, you might lower the threshold for what constitutes spam so that more emails are classified as spam. Fewer actual spam emails will slip through (fewer false negatives), so recall increases. The cost is usually more false positives (emails that are not actually spam being classified as spam), which tends to lower precision.
Raising the threshold does the opposite. Going back to the spam example, if you raise the threshold so that fewer emails are classified as spam, more actual spam emails are missed (more false negatives), and recall decreases.
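To see the threshold effect concretely, the sketch below counts true positives, false positives and false negatives at two thresholds, predicting "spam" whenever a score is at or above the threshold. The scores are made-up illustrative values, and `confusion_counts` is a hypothetical helper, not a library function.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP and FN when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
    return tp, fp, fn


# Hypothetical spam scores: 1 = spam, 0 = legitimate email.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.55, 0.35, 0.60, 0.40, 0.20, 0.10]

for threshold in (0.7, 0.3):
    tp, fp, fn = confusion_counts(y_true, scores, threshold)
    recall = tp / (tp + fn)
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn} recall={recall:.2f}")
# threshold=0.7: TP=2 FP=0 FN=2 recall=0.50
# threshold=0.3: TP=4 FP=2 FN=0 recall=1.00
```

Lowering the threshold from 0.7 to 0.3 eliminates both false negatives, raising recall from 0.50 to 1.00, but it also introduces two false positives.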
Precision is defined as TP / (TP + FP), where FP is the number of false positives. To increase precision, you need to reduce false positives relative to true positives; true negatives do not appear in the formula at all.
The most direct way to do this is to raise your threshold for what constitutes a positive prediction. Using the spam email prediction example again, you might raise the threshold for what constitutes spam so that fewer emails are classified as spam. Only the emails the model is most confident about are flagged, so fewer non-spam emails are misclassified as spam (fewer false positives), and precision typically increases. The cost is that some actual spam emails are no longer flagged, so there are more false negatives and recall decreases.
Lowering the threshold does the opposite. Going back to the spam example once more, if you lower the threshold so that more emails are classified as spam, more non-spam emails are flagged by mistake (more false positives), and precision typically decreases.
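One common way to put this into practice is to sweep candidate thresholds on held-out data and keep the lowest one that still meets a target precision, which preserves as much recall as possible. This is a minimal sketch under that assumption; the function name `lowest_threshold_for_precision`, the 0.9 target and the scores are all illustrative, not taken from a specific library.

```python
def lowest_threshold_for_precision(y_true, scores, target_precision):
    """Return the lowest threshold whose precision meets the target, or None.

    Each distinct score is tried as a candidate threshold; an example is
    predicted positive when its score is >= the threshold.
    """
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, t in zip(predicted, y_true) if p and t == 1)
        fp = sum(1 for p, t in zip(predicted, y_true) if p and t == 0)
        # Candidates are checked in ascending order, so the first match is the lowest.
        if tp + fp and tp / (tp + fp) >= target_precision:
            return threshold
    return None


# Hypothetical validation labels (1 = spam) and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.55, 0.35, 0.60, 0.40, 0.20, 0.10]
print(lowest_threshold_for_precision(y_true, scores, 0.9))  # prints 0.8
```

In this toy data, precision is not strictly monotonic in the threshold (it is 4/6 at 0.35 but 3/5 at 0.40), which is why the sketch checks every candidate instead of assuming a smooth curve.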
There are also a few ways to improve precision or recall beyond simply moving the threshold. One is to choose an evaluation metric that matches your goal when tuning and comparing models. The F1 score is the harmonic mean of precision and recall, so it rewards a balance between the two rather than precision alone; if precision matters more, the F-beta score with beta < 1 weights precision more heavily, and beta > 1 favors recall. Another is to change the decision boundary itself, for example by retraining or tuning the model or by using a different algorithm altogether.
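The F-beta score is (1 + beta²) · P · R / (beta² · P + R), where P is precision and R is recall; beta = 1 gives the ordinary F1 score. The sketch below compares two hypothetical models, one tilted toward precision and one toward recall, under different values of beta. scikit-learn exposes the same metric as `fbeta_score`; the `f_beta` helper here is only for illustration.

```python
def f_beta(precision, recall, beta):
    """F-beta score: beta < 1 weights precision more, beta > 1 weights recall more."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Two hypothetical models as (precision, recall) pairs.
precise_model = (0.9, 0.5)
sensitive_model = (0.5, 0.9)

for beta in (0.5, 1.0, 2.0):
    a = f_beta(*precise_model, beta)
    b = f_beta(*sensitive_model, beta)
    print(f"beta={beta}: precise={a:.3f} sensitive={b:.3f}")
# beta=0.5: precise=0.776 sensitive=0.549
# beta=1.0: precise=0.643 sensitive=0.643
# beta=2.0: precise=0.549 sensitive=0.776
```

With beta = 1 the two models tie, beta = 0.5 prefers the high-precision model, and beta = 2 prefers the high-recall one, so the choice of beta encodes which error type you care more about.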
Sensitivity vs Specificity
In machine learning, sensitivity and specificity are two further measures of a classifier's performance. Sensitivity is the proportion of actual positives that the model correctly predicts as positive, TP / (TP + FN); it is the same quantity as recall. Specificity is the proportion of actual negatives that the model correctly predicts as negative, TN / (TN + FP).
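A minimal sketch of both measures computed from binary labels. The helper name and the toy labels are illustrative assumptions, and returning 0.0 when a class is absent is a convention chosen here to avoid division by zero.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # identical to recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # prints sensitivity=0.75 specificity=0.83
```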
Google Colab For Machine Learning
State of Google Colab for ML (October 2022)
Google introduced computing units, which you can purchase much like cloud computing resources from AWS, Azure, and similar providers. With Pro you get 100 computing units, and with Pro+ you get 500. The GPU or TPU you choose, and whether you enable the High-RAM option, affect how many computing units you use per hour. If you don’t have any computing units, you can’t use “Premium” tier GPUs (A100, V100), and even a P100 is not viable.
Google Colab Pro+ comes with the Premium tier GPU option, whereas with Pro, if you have computing units, you are randomly connected to a P100 or a T4. After you use all of your computing units, you can buy more, or you can use a T4 GPU about half or most of the time (there can be long stretches of the day when you can’t get a T4 or any other kind of GPU). On the free tier, the GPUs offered are most of the time a K80 or a P4, which perform similarly to a GTX 750 Ti (an entry-level GPU from 2014) with more VRAM.
For reference, a T4 uses around 2 computing units per hour and an A100 around 15.
As far as is currently known, the computing-unit cost of each GPU tends to fluctuate based on some unknown factor.
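To put these rates in perspective, a back-of-the-envelope sketch converts each plan's included computing units into approximate GPU hours. It assumes the rough October 2022 figures quoted in this post (100 or 500 units; about 2 units per hour for a T4 and 15 for an A100); since actual rates fluctuate, the results are only estimates.

```python
# Rough figures quoted in this post (October 2022); real rates fluctuate.
HOURLY_RATE = {"T4": 2, "A100": 15}   # computing units used per hour
PLAN_UNITS = {"Pro": 100, "Pro+": 500}  # computing units included with each plan


def gpu_hours(plan, gpu):
    """Approximate GPU hours that a plan's included computing units would buy."""
    return PLAN_UNITS[plan] / HOURLY_RATE[gpu]


for plan in PLAN_UNITS:
    for gpu in HOURLY_RATE:
        print(f"{plan} on {gpu}: ~{gpu_hours(plan, gpu):.1f} hours")
# Pro on T4: ~50.0 hours
# Pro on A100: ~6.7 hours
# Pro+ on T4: ~250.0 hours
# Pro+ on A100: ~33.3 hours
```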
- For hobbyists and (under)graduate coursework, it is better to use your own GPU if you have one with more than 4 GB of VRAM that outperforms a 750 Ti, or at least to purchase Colab Pro so you can still reach a T4 even when you have no computing units remaining.
- For small research companies, non-trivial university research, and probably most people, Colab is now not a good option.
- Colab Pro+ can be considered if you want Pro but don’t sit in front of your computer, since a Pro session disconnects after 90 minutes of inactivity. However, this can be worked around to some extent with scripts, so most of the time Colab Pro+ is not a good option.
If you have anything to add, please let me know so I can update this post. Thanks!
In machine learning, precision and recall trade off against each other; increasing one often decreases the other. There is no single silver bullet solution for increasing either precision or recall; it depends on your specific use case which one is more important and which methods will work best for boosting whichever metric you choose. In this blog post, we explored some methods for increasing either precision or recall; hopefully this gives you a starting point for improving your own models!