What are some ways to increase precision or recall in machine learning?
Sensitivity vs Specificity?
In machine learning, recall is the ability of the model to find all relevant instances in the data while precision is the ability of the model to correctly identify only the relevant instances. A high recall means that most relevant results are returned while a high precision means that most of the returned results are relevant. Ideally, you want a model with both high recall and high precision but often there is a trade-off between the two. In this blog post, we will explore some ways to increase recall or precision in machine learning.
There is really one main lever for increasing recall: decreasing the number of false negatives.
Since recall is TP / (TP + FN), you can raise it by lowering your threshold for what constitutes a positive prediction. For example, if you are trying to predict whether or not an email is spam, you might lower the threshold for what constitutes spam so that more emails are classified as spam. This catches more of the actual spam (fewer false negatives, so higher recall), but the side effect is more false positives (emails that are not actually spam being classified as spam), which hurts precision.
Raising your threshold has the opposite effect.
Going back to the spam email prediction example, if you raise the threshold for what constitutes spam, fewer emails are classified as spam. This produces more false negatives (actual spam emails slipping through unflagged), which decreases recall, in exchange for fewer false positives and therefore higher precision.
Likewise, there is one main lever for increasing precision: decreasing the number of false positives.
Since precision is TP / (TP + FP), you can raise it by increasing your threshold for what constitutes a positive prediction. Using the spam email prediction example again, raising the threshold means fewer emails are classified as spam, but the emails that do get flagged are more likely to actually be spam. Fewer non-spam emails are misclassified (fewer false positives), so precision increases, at the cost of missing some actual spam and therefore lowering recall.
Conversely, lowering the threshold trades precision away.
Going back to the spam email prediction example once more, a lower threshold means more emails are classified as spam, including more legitimate ones. Those extra false positives (non-spam emails flagged as spam) decrease precision, even though recall goes up.
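The threshold trade-off described above can be sketched in a few lines of Python. The scores and labels below are made-up toy values for illustration, not output from a real model:

```python
# Toy "spam" classifier output: model confidence that each email is spam,
# plus the ground-truth label (1 = actually spam). Values are illustrative.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1,    1,    0,    1,    0,    1,    0,    0]

def precision_recall(threshold):
    """Compute precision and recall at a given decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# As the threshold drops, recall rises and precision falls.
for t in (0.7, 0.5, 0.3):
    p, r = precision_recall(t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Running this on the toy data shows exactly the trade-off from the spam example: the lowest threshold flags almost everything (perfect recall, worst precision), while the highest threshold flags only the most confident predictions.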
Beyond threshold tuning, there are a few other ways to improve precision or recall. One is to pick an evaluation metric that matches your goal: if you need to balance precision and recall rather than maximize one alone, you can optimize the F1 score, the harmonic mean of the two. Others include reweighting classes or resampling the training data to shift the decision boundary, or using a different algorithm altogether.
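The F1 score mentioned above is the harmonic mean of precision and recall, which means it is only high when both are high. A minimal sketch:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall.

    Unlike the arithmetic mean, the harmonic mean is dragged down
    sharply by whichever of the two values is smaller.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Balanced model: both metrics high, F1 high.
print(f1_score(0.9, 0.9))
# Lopsided model: one metric near zero drags F1 down with it,
# even though the arithmetic mean would be 0.5.
print(f1_score(0.9, 0.1))
```

This is why optimizing F1 pushes a model toward a balanced operating point rather than letting it game one metric at the expense of the other.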
Sensitivity vs Specificity
In machine learning, sensitivity and specificity are two measures of the performance of a classifier. Sensitivity is the proportion of actual positives that the model correctly identifies (the true positive rate, which is the same as recall), while specificity is the proportion of actual negatives that the model correctly identifies (the true negative rate).
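Both quantities fall straight out of the confusion-matrix counts. The counts below are illustrative, not from a real model:

```python
# Illustrative confusion-matrix counts for a binary classifier.
tp, fn = 80, 20   # actual positives: 100 total
tn, fp = 90, 10   # actual negatives: 100 total

sensitivity = tp / (tp + fn)   # true positive rate (= recall)
specificity = tn / (tn + fp)   # true negative rate

print(sensitivity, specificity)
```

With these counts the model finds 80% of the positives (sensitivity 0.8) and correctly clears 90% of the negatives (specificity 0.9).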
Google Colab For Machine Learning
State of Google Colab for ML (October 2022)
Google introduced computing units, which you can purchase just like any other cloud compute from AWS, Azure, etc. With Pro you get 100 computing units, and with Pro+ you get 500. The GPU or TPU you use and the High-RAM option affect how many computing units you burn per hour. If you don't have any computing units, you can't use "Premium" tier GPUs (A100, V100), and even the P100 is non-viable.
Google Colab Pro+ comes with the Premium-tier GPU option, while on Pro, if you have computing units, you get randomly connected to a P100 or T4. After you use up all of your computing units, you can buy more, or fall back to a T4 for maybe half the time at best (there can be long stretches of the day when you can't get a T4, or any GPU at all). In the free tier, the offered GPUs are most of the time a K80 or P4, which perform similarly to a 750 Ti (an entry-level GPU from 2014) but with more VRAM.
For your consideration, a T4 uses around 2 computing units per hour, and an A100 around 15.
Based on current knowledge, computing-unit costs for GPUs tend to fluctuate based on some unknown factor.
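Taking the approximate rates quoted above at face value (around 2 units/hour for a T4, around 15 for an A100, with 100 units on Pro and 500 on Pro+), a quick back-of-the-envelope sketch shows how long each plan's allowance lasts. Remember these rates fluctuate, so treat the results as rough estimates:

```python
# Approximate burn rates and plan allowances quoted in the text above.
rates = {"T4": 2, "A100": 15}       # computing units per hour (approximate)
plans = {"Pro": 100, "Pro+": 500}   # computing units included

# Hours of runtime each plan buys on each GPU, at those rates.
hours = {plan: {gpu: units / rate for gpu, rate in rates.items()}
         for plan, units in plans.items()}

for plan, per_gpu in hours.items():
    for gpu, h in per_gpu.items():
        print(f"{plan}: ~{h:.1f} hours on a {gpu}")
```

The gap is striking: Pro's 100 units last around 50 hours on a T4 but under 7 hours on an A100, which is why the Premium-tier GPUs evaporate your allowance so quickly.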
- For hobbyists and (under)graduate school duties, it is better to use your own GPU if you have something with more than 4 GB of VRAM and better than a 750 Ti, or at least purchase Colab Pro to reach a T4 even if you have no computing units remaining.
- For small research companies, non-trivial research at universities, and probably most people, Colab is now probably not a good option.
- Colab Pro+ can be considered if you want Pro but don't sit in front of your computer, since Pro disconnects after 90 minutes of inactivity. But this can be overcome with some scripts to some extent, so for most of the time Colab Pro+ is not a good option either.
If you have anything more to say, please let me know so I can edit this post with them. Thanks!
In machine learning, precision and recall trade off against each other: increasing one often decreases the other. There is no silver-bullet solution for boosting either metric; which one matters more, and which methods work best, depends on your specific use case. In this blog post, we explored some methods for increasing either precision or recall; hopefully this gives you a starting point for improving your own models!
Machine Learning and Data Science Breaking News 2022 – 2023
- Is the workload lesser in larger organizations? by /u/Mission-Language8789 (Data Science) on October 2, 2023 at 5:55 pm
I work in a small data team, and I'm the only one working on my project. So I have to look after everything from data collection, processing, dashboarding, quite a bit of software engineering, testing, CI/CD and deployment. So even though my primary work is data science, there's a requirement for end-to-end work. That got me thinking (not in a disrespectful way): what do data scientists who work at larger organizations spend time on? By larger organizations, I mean places where there are specialised teams for each of the steps I mentioned above.
- What more can I do? by /u/PlayfulCobbler1497 (Data Science) on October 2, 2023 at 5:24 pm
I recently acquired my bachelors in Data Science. We covered the usual stuff (Python, stats, ML, DL, SQL, R, visualization, etc.) but I'm finding it difficult to get hired. My CVs don't seem to be getting anywhere. Any suggestions on how I can improve my employability?
- How to prep for final round interviews (intern) by /u/asmalltowngirlie (Data Science) on October 2, 2023 at 5:22 pm
I was told that I'm moving onto final round interviews, and it will consist of a coding portion using either SQL/R/Python, A/B testing and statistics, and behavioral questions. I only have 1 week to prepare and am scared of messing this chance up. I don't know how to really prepare for A/B testing in the sense of an interview.
- Data Analytics Certification by /u/dameis (Data Science) on October 2, 2023 at 4:54 pm
I was looking at my school's catalog and saw a data analytics certification offered. After some research I found that the mathematics department offers it. I feel like the classes in the certification sound better than most classes required by my data science program. Could anyone offer insight into whether the data science program covers the same material? Trying to figure out if I should take the data analytics certification or not.
- Are there any playlists on YT that are ACTUALLY GOOD to learn Excel in context of Data Science (beginner to advanced)? by /u/vich_lasagna (Data Science) on October 2, 2023 at 4:42 pm
- Looking for something different by /u/SupertrampDFenx (Data Science) on October 2, 2023 at 4:18 pm
Hi everyone, I would like to share my experience. I am currently working as a Data Scientist for a company (part of the R&D team). I have been with this company for about two years, and initially, everything seemed to be going well. At a certain point, the organization (both in terms of projects and the team) started to fall apart: pointless meetings, lack of clarity, and much more. After several months, I realized that the main problem is the manager of the R&D department, who, frankly speaking, does not understand anything about the technical side (starting from the programming language used and more). This has caused me a sense of discomfort that makes me consider leaving. The last straw was the assignment of a training course: initially, we were asked to choose a topic for a course (not necessarily related to the role, and in my case, I chose concepts of Data Engineering and frameworks like Airflow). Without saying anything, this manager assigned a course to all of us, Data Scientists, and what did she assign us? A course on Data Science! What's the point! In fact, I'm completely skipping every single video as I already know all the concepts. Having said that, can you give me some advice on how to handle this situation? Even better, can you suggest companies that have fully remote positions for Data Scientists? Thanks everyone
- What I wish I had known earlier in my career, particularly with disorganized companies by /u/Excellent_Cost170 (Data Science) on October 2, 2023 at 4:15 pm
I'm quoting directly from a Reddit user named funbike. This is the rule you should abide by in organizations. I also made the same mistake when I joined a company, attempting to prove myself. "After being a fool in my early career trying too hard to impress, this is how I handle this kind of thing these days: Document EVERYTHING. Follow up verbal conversations with a summary email. When things go south, I'll be able to prove I warned them. Give realistic estimates on how long things will take. Whatever I say is usually twice how long I actually think it will take, because things never go like you think. Make it clear that longer-term estimates will be less accurate the farther out they are, because software is notoriously difficult to estimate. Tell them to their face that we will not make the unrealistic dates they've set, and to prevent this in future, to always consult first. I will not work overtime due to artificial deadlines. I'll do O/T for extreme exceptional cases only, such as a one-time short-term crisis or for a regulatory-mandated deadline. By 6pm I'll be at my house. Explain quality should never be abandoned for speed. It will violently backfire in the end, with the opposite effect. I stand my ground. I can make them mildly unhappy now, or furiously disappointed in our results in the future. I'll take the first one please. Even if you were to heroically meet their unreasonable date, they'll just expect more next time. You'll burn out and maybe the next time you'll have an embarrassing failure even with crazy overtime. They'll say "tsk, tsk" and blame you. Don't fall into this trap"
- How long did it take you to self-learn data science and afterwards, how long to get employed? by /u/Remarkable-Floor-351 (Data Science) on October 2, 2023 at 4:04 pm
To anyone who taught themselves data science and then achieved employment in a data science role, how long did it take you to learn in hours per day? And additionally, how long did it take you after you stopped learning to find a job and keep a job? If you did not self-learn or hold a job afterwards, please do not reply with any speculations.
- Which industries have the most lacking data architecture for data analysis/modelling? by /u/Brief-Living-5083 (Data Science) on October 2, 2023 at 3:39 pm
I have noticed some industries or domains are lacking in integrated data architecture (a data warehouse, let alone a data lake). One example that comes to mind is marketing, where the different levels in data sources make it difficult to build integrated data architecture, and hence also difficult to do cross-source analysis for KPIs. And vice versa: which domains are leading in this?
- Benefits of converting DICOM images to PNGs by /u/01jasper (Data Science) on October 2, 2023 at 3:36 pm
I'm trying to understand the benefits of converting DICOM images to PNGs. Context: I have DICOM images from which I already extracted the useful metadata I want to use. The images are for a classification-detection pipeline for some disease. So, as I already asked, what are the benefits of converting those DICOM files to PNGs rather than just using pydicom and the DICOM pixel_array? I ask because I saw many top-5 users on Kaggle do this when dealing with DICOM images. If I understand how networks actually work, they take as input an array of pixels as floating point numbers, no? So what's the difference between a DICOM pixel_array and a PNG's pixel array as a numpy array or tensor? Both will eventually be fed to the network as a tensor of floating point numbers. Is it because PNGs are usually faster to train on? Because PNGs have more library support for preprocessing / augmentation / etc.? Because PNGs are the format many pre-trained models expect (I write this knowing it's 99% not true, as mentioned with the tensor point)? Thanks in advance, and please forgive my English (I could use AI tools to fix it but I feel addicted already).
- Graduated with an MS in Data Science, now in the workforce as a systems analyst for a small consulting firm. Now what? by /u/drunkmute (Data Science) on October 2, 2023 at 2:48 pm
Hello everyone! As the title explains, I graduated with an MS in Data Science just recently in July, and in mid-August I started my job as a systems analyst (essentially a business analyst but for technology). It has been only 5 weeks in this job, but I have yet to implement the skills I gained in my Master's, except for some project/process management (my MS specialized in business), along with some web scraping and data manipulation in Python for an internal project. Because of this, I have grown pretty desperate to apply and practice all those skills I heavily enjoy, and this feeling has been exacerbated by a pretty decent amount of downtime since I'm new. This is all to say that I am going to use the leverage of working in a small and very lax company to meet with a higher-up and ask for any work available, either internally or in any present projects, that intersects the area of data analytics and/or machine learning. But other than that, I wanted to know what other structured activities you all recommend to continue building on my career and skills that I can commit to during my downtime. For example, I have been looking into an AWS certification path towards their ML certification, but I wanted to know what other suggestions you have? Thanks!
- How far can you go in data science with C++? by /u/DickSmithismydad (Data Science) on October 2, 2023 at 2:41 pm
Is there previous work in the C++ language in data science? Please consider sharing, along with any context related to the use of C++ in data science.
- How do you handle making mistakes on the job? by /u/SuitableElk7382 (Data Science) on October 2, 2023 at 2:09 pm
What are some of the biggest mistakes you guys have made and how do you handle them? Especially when there is a time crunch. I'm a quality data analyst for a steel company and have been in this position for almost 2 years. I finished my masters in data analytics this past May, so this job has been my only real experience in the world of data. I want to transition to data science in this next year. In my free time, I take Codecademy courses to learn Python and SQL and I will eventually dive into Java as well. I take what I learn and I try to apply it to my job. We're a legacy steel mill, so there is no fancy automation, the business and production systems don't communicate very well, and data can only be gathered through exporting reports from these systems in csv files. So I've been able to sort of make my own database using the tools I've been approved to download (basically just Anaconda and Power BI). As the only data analyst in my mill, with no previous steel-making background, my company relies heavily on my data analytics to make business decisions both small and large, and sometimes it is overwhelming pressure to be precise. Luckily I haven't had any major mistakes. The downside is I'm the only person doing the job I do and there isn't a whole lot of computer literacy in the management, so unless my conclusions appear extremely illogical to them, they just roll with it. I've definitely made mistakes along the way but have caught them myself, sometimes working through the night so I can hurry and send emails out to disregard my previous work and look at the revised stuff. This just made me wonder how others handle mistakes, both when they catch them and when they don't. I understand larger companies probably have a team of people doing the same projects or can lend a hand to be a second pair of eyes. Maybe I'm just overdue to make my first big mistake lol. I feel like I make a lot of decisions day-to-day that I have to cross my fingers on.
- Second Data Project: Web Scraping. I am a beginner, help with suggestions!! by /u/Alarming_Scene126 (Data Science) on October 2, 2023 at 2:07 pm
Hello everyone!! I have come up with my second project and I am very excited to share it here. I did this work after a day of learning web scraping. Please review my project and give feedback and suggestions, and do not hesitate to leave brutal comments. Also, I request help with my next steps in web scraping. I would like to thank this community for letting me share my projects!! Project title: Nepali Beverage Seller data web scraping https://www.kaggle.com/code/aadeshpradhan/nepali-alcohol-seller-data-web-scraping-cheers/notebook
- What should I major in? by /u/QueasyCold5787 (Data Science) on October 2, 2023 at 1:51 pm
I recently transferred from a state school to a tech school. Previously I was CS but now I am math. I still have until the end of this semester to decide what I want to do, but I am debating math vs CS. My career goals involve working with big data, such as an analyst or data engineer, but working as a SWE is also a dream. At my school, I can major in math, minor in computational data analysis and have a concentration in data science, or I can major in CS and do threads in mod and sims, and info networks. Which is better and why? Also, I know that I want to get a masters, so what would be better for that?
- Masters in Data Science or Statistics? by /u/Akshath_memeLoRd (Data Science) on October 2, 2023 at 1:00 pm
I have around a year's experience as a tech consultant in one of the Big 4 firms after my undergrad in CS. Having seen many threads on this sub regarding DS programs as cash grabs, I am genuinely confused whether to pursue a Masters in Data Science or a Masters in Statistics with a data science track. I am preferably looking to get into the quant industry after my masters, or am even open to data roles depending on the job situation then.
- Git and Jupyter Notebooks: The Ultimate Guide by /u/amirathi (Data Science) on October 2, 2023 at 10:48 am
- Intern question by /u/snmstyle (Data Science) on October 2, 2023 at 9:45 am
I'm about to start my Masters and am wondering if I should be interning concurrently, doing side projects, or both? Coming from a non-CS background (chemistry), but I did do a full-stack bootcamp and am familiar with Python, SQL, and data science.
- How can an LLM be good at compressing images and audio? by /u/axlrosen (Data Science) on October 2, 2023 at 8:55 am
This paper seems to say that an LLM trained on text can somehow be good at compressing not just text but images and audio. Intuitively this seems improbable to me. How does this work? https://arxiv.org/pdf/2309.10668.pdf
- Quick review of most used algorithm answers by /u/Choweeez (Data Science) on October 2, 2023 at 8:51 am
First, thanks to everyone who answered my previous post! Following that post (https://www.reddit.com/r/datascience/comments/16tgojm/what_kind_of_algorithm_do_you_use_the_most_as_a/), here is a quick review of the most common answers:
- Gradient boosted machines (22): XGBoost (12), LightGBM (8), CatBoost (2)
- Linear methods (19): Linear Regression (9), Logistic Regression (5), GAM (3), OLS (2)
- Random Forest (or tree-based algos) (9)
- DBSCAN (4)
- DNN / CNN / GNN (4)
- Clustering algos / K-means (3)
- ARIMA (2)
The number in parentheses is how many times an answer appeared; I only included answers that appeared at least twice. I also tried to group some answers, but I don't know all the algorithms or tools you are using, so please forgive me if I made some mistakes or approximations in the way I grouped them.