What are some ways to Boost Precision and Recall in Machine Learning?
Sensitivity vs Specificity?
In machine learning, recall is the ability of the model to find all relevant instances in the data while precision is the ability of the model to correctly identify only the relevant instances. A high recall means that most relevant results are returned while a high precision means that most of the returned results are relevant. Ideally, you want a model with both high recall and high precision but often there is a trade-off between the two. In this blog post, we will explore some ways to increase recall or precision in machine learning.
To increase recall, reduce the number of false negatives.
In practice the main lever is the decision threshold: lower the threshold for what counts as a positive prediction and the model flags more cases as positive. For example, if you are trying to predict whether or not an email is spam, you might lower the score required for an email to be classified as spam. More actual spam gets caught (fewer false negatives), so recall goes up, but more legitimate emails are also flagged (more false positives), so precision usually goes down.
The same lever works in reverse.
If you raise the threshold for what constitutes a positive prediction, fewer emails are classified as spam. You get fewer false positives (fewer legitimate emails wrongly flagged), but more actual spam slips through (more false negatives), so recall goes down. The threshold does not remove errors; it only shifts them between false positives and false negatives.
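To make the trade-off concrete, here is a minimal sketch using scikit-learn on synthetic data; the dataset, classifier, and threshold values are purely illustrative, not a recipe for any particular problem:

```python
# Minimal sketch: how the decision threshold trades recall against precision.
# The synthetic dataset, logistic regression model, and threshold values are
# illustrative assumptions, not part of the original post.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced binary problem standing in for "spam vs. not spam".
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_val)[:, 1]  # predicted P(spam) for each email

for threshold in (0.5, 0.3, 0.1):  # lower threshold -> more emails flagged
    preds = (probs >= threshold).astype(int)
    print(f"threshold={threshold:.1f}  "
          f"recall={recall_score(y_val, preds):.2f}  "
          f"precision={precision_score(y_val, preds):.2f}")
```

Running something like this should show recall climbing as the threshold drops while precision falls, which is exactly the trade described above.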
To increase precision, reduce the number of false positives.
Raising the threshold for what constitutes a positive prediction is the usual way to do this. In the spam example, fewer emails are classified as spam, so fewer legitimate emails are wrongly flagged and precision rises, at the cost of missing more actual spam (lower recall). Lowering the threshold has the opposite effect: more legitimate emails get swept up as spam (more false positives), so precision falls even though recall improves.
To summarize,
there are a few practical levers for trading off precision and recall. One is to optimize for an evaluation metric that reflects what you actually care about: if you need a balance of the two, the F1 score (the harmonic mean of precision and recall) is a common choice. Another is to adjust the classification threshold, either by moving the decision boundary on the predicted probabilities or by switching to an algorithm or class weighting that shifts the balance of errors.
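As a rough sketch of how that threshold choice can be automated, the snippet below uses scikit-learn's precision_recall_curve to pick the threshold with the best F1; the labels and scores are made-up illustrative values rather than real model output:

```python
# Sketch: choosing a threshold that balances precision and recall via F1.
# y_true and scores are made-up values standing in for validation labels
# and predicted probabilities.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90, 0.60, 0.30])

precisions, recalls, thresholds = precision_recall_curve(y_true, scores)

# precision_recall_curve returns one extra precision/recall pair with no
# matching threshold, hence the [:-1] slices.
f1 = 2 * precisions[:-1] * recalls[:-1] / (precisions[:-1] + recalls[:-1] + 1e-12)
best = thresholds[np.argmax(f1)]
print(f"best threshold by F1: {best:.2f} (F1={f1.max():.2f})")
```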
Sensitivity vs Specificity
In machine learning, sensitivity and specificity are two measures of a model's performance. Sensitivity (also called recall or the true positive rate) is the proportion of actual positives that the model correctly identifies, while specificity (the true negative rate) is the proportion of actual negatives that the model correctly identifies.
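A minimal sketch of computing both quantities from a confusion matrix with scikit-learn (the labels below are made-up illustrative values):

```python
# Sketch: sensitivity and specificity from a confusion matrix.
# y_true and y_pred are made-up illustrative labels.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # true positive rate: share of actual positives caught
specificity = tn / (tn + fp)  # true negative rate: share of actual negatives correctly rejected
print(f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
```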
Google introduced compute units for Colab, which you purchase much like compute from AWS or Azure. With Pro you get 100 compute units and with Pro+ you get 500. The choice of GPU or TPU and the high-RAM option all affect how many compute units you burn per hour. Without any compute units you cannot use the "Premium" tier GPUs (A100, V100), and even the P100 is effectively out of reach.
Google Colab Pro+ comes with the Premium-tier GPU option, while on Pro, if you have compute units, you are randomly assigned a P100 or T4. Once your compute units run out, you can buy more or fall back to a T4 some of the time (there are often stretches of the day when you cannot get a T4, or any GPU at all). On the free tier the offered GPUs are usually a K80 or P4, which perform roughly like a GTX 750 Ti (an entry-level GPU from 2014) with more VRAM.
For reference, a T4 consumes roughly 2 compute units per hour and an A100 roughly 15. The per-hour cost of each GPU also seems to fluctuate based on factors Google does not disclose.
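As a back-of-the-envelope illustration, the quoted plan allowances and hourly rates translate into roughly the following amounts of GPU time (treat the numbers as approximate, since Google can change them at any time):

```python
# Back-of-the-envelope sketch: hours of GPU time per plan at the approximate
# rates quoted above. The numbers are illustrative and fluctuate over time.
rates = {"T4": 2, "A100": 15}      # compute units burned per hour (approx.)
plans = {"Pro": 100, "Pro+": 500}  # compute units included with each plan

for plan, units in plans.items():
    for gpu, rate in rates.items():
        print(f"{plan}: ~{units / rate:.0f} hours on a {gpu}")
```

At those rates, Pro buys roughly 50 hours on a T4 but only about 6-7 hours on an A100.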
For hobbyists and undergraduate or graduate coursework, you are better off using your own GPU if you have something with more than 4 GB of VRAM and faster than a 750 Ti, or at least buying Colab Pro so you can still reach a T4 even with no compute units remaining.
For small research companies, non-trivial university research, and probably most people, Colab is no longer a good option.
Colab Pro+ is worth considering only if you want Pro but cannot sit in front of your computer, since Pro disconnects after roughly 90 minutes of inactivity. That limitation can be worked around with scripts to some extent, so most of the time Pro+ is not a good option either.
If you have anything to add, please let me know and I will edit this post accordingly. Thanks!
In machine learning, precision and recall trade off against each other: increasing one often decreases the other. There is no silver-bullet solution for boosting either metric; which one matters more, and which methods work best, depends on your specific use case. In this blog post, we explored some methods for increasing either precision or recall; hopefully this gives you a starting point for improving your own models!
Hoping to see if I can find any recommendations or suggestions on deploying R alongside other code (probably JavaScript) for commercial software. It's hard to give specifics since it is an extremely niche industry and I would dox myself immediately, but we need to use a Bayesian package that has primarily been developed in R. The issue, from my perspective, is that the package is poorly developed: no unit tests, poor or non-existent documentation, and practically impossible to understand unless you have a PhD in statistics along with a deep understanding of the niche industry I am in. Also, the values provided have to be "correct"; lawyers await us if not. While I am okay with statistics and maths, I am not at the level of the people who created this package, nor do I know anyone in my immediate circle who is. The tested JAGS and untested Stan models are freely provided along with their papers. Either I refactor the R package myself to allow for easier documentation, unit testing, and maintainability, or I recreate it in Python (which I am more confident with), or I just use the package as is and pray to Thomas Bayes for (probable) luck. Any feedback would be appreciated. submitted by /u/Sebyon
I have just started using Linux, and most of the tools I use aren't available as desktop apps; some are available as web apps. After some searching, I found that there are tools available for Linux such as Grafana, Apache Superset, and various database management systems. Which ones work better than others in your experience? submitted by /u/Emotional-Rhubarb725
This is normally the kind of thing I'd go to GPT for since it has endless patience; however, it often comes up with wonderful ideas and no way to actually fulfill them (no available data). One thing I've considered is using my Spotify listening history to find new songs. On the one hand, I would love to do a data visualization project on my listening history, as I'm the type who has music on constantly. On the other hand, for the actual data science part of the project I would need information on songs I haven't listened to in order to classify them. Does anybody know how I could get my hands on a list of Spotify URIs so I can fetch data from their API? Moreover, does anybody know of any open-source datasets that would lend themselves well to this kind of project? Kaggle data often seems too perfect and can't be used for a real-time project or tool, which is the bar nowadays. Some ideas I've had:
- Classifying crop diseases, but I'm not sure whether there is open, labelled data for that.
- Predicting the probability that a roof is suitable for solar panel installation, based on an address and the Google satellite API combined with an LLM and prompt engineering; I don't think I could use a logistic regression for this since there isn't labelled data that I'm aware of.
Any other ideas that use some element of machine learning? I'm comfortable with things like logistic regression and getting to grips with neural networks. Starting to ramble so I'll leave it there! submitted by /u/ColdStorage256
What are some of the contemporary ways in which telemetry data is modeled? My experience is from before the pandemic, when I used fact tables (Kimball dimensional modeling practices) and relied on metadata and views, but I anticipate working with large volumes of real-time streaming data like logs and clickstream. What resources/docs can I refer to when it comes to wrangling, modeling, and analyzing this data for insights and further development? submitted by /u/StuckInLocalMinima
I currently work in the retail industry, which has quite a lot of transactional data. Apart from the traditional product recommendation / propensity / basket analysis / classification models used in client targeting, what other types of models are commonly being built in the retail scene? I'd love to hear about your use cases to get some new inspiration! submitted by /u/EstablishmentHead569
I am at a good level now and want to practice what I have learned, but most of the projects online are far from practical, and I want to do something close to reality. If anyone here works as a DA or in BI, can you please point me to online projects that are close to what you actually work on? submitted by /u/Emotional-Rhubarb725
I worked as a civil engineer for 5 years before getting a hydroinformatics position (primarily data analysis with some elements of machine learning applied to wastewater) and am now looking to move into an official data science position. I haven't gotten any hits while applying to various roles, but I've been leaving out my engineering experience since I don't want to work in anything related to engineering. Wondering if this is a bad idea and whether I should just put the experience back on my resume? submitted by /u/GoldenPandaCircus
Hi all, as the title implies, I've been relying on (somewhat near) real-time monitoring of model performance metrics to see whether data drift has happened in my use case. I'm wondering if you know of more sophisticated or advanced methods to detect data drift. I would love to hear about any kind of method, whether it targets covariate/feature drift, target/label drift, or concept drift. Even better if you can share any Python or R implementations to carry out the above data drift checks. Thanks in advance! submitted by /u/YsrYsl
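One common covariate-drift check, offered only as a sketch and not as the poster's method, is a per-feature two-sample Kolmogorov-Smirnov test comparing a reference window (for example, the training data) with the latest serving window; the function name, column handling, and alpha cut-off below are illustrative assumptions:

```python
# Illustrative covariate-drift check (not from the original post): a two-sample
# Kolmogorov-Smirnov test per numeric feature, comparing a reference window
# (e.g. training data) with the most recent serving window. The function name,
# alpha cut-off, and usage lines are assumptions for the sketch.
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def drift_report(reference: pd.DataFrame, current: pd.DataFrame, alpha: float = 0.01) -> pd.DataFrame:
    rows = []
    for col in reference.select_dtypes(include=np.number).columns:
        stat, p_value = ks_2samp(reference[col].dropna(), current[col].dropna())
        rows.append({"feature": col, "ks_stat": stat, "p_value": p_value,
                     "drift_flag": p_value < alpha})
    return pd.DataFrame(rows).sort_values("ks_stat", ascending=False)

# Usage sketch (hypothetical DataFrames):
# report = drift_report(train_df[feature_cols], latest_window_df[feature_cols])
# print(report[report["drift_flag"]])
```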
Hi, I'm at the beginning of my career. Currently my main focus has been to get into data science (product side), and then maybe later decide whether to become a PM or stay in DS, depending on the role and company I get. Recently I came across a company hiring for a Sales Engineer position. The company is a SaaS for data science, so I researched the role and felt I would like it. I do enjoy working with people, presenting, and public speaking when required, and I've heard the pay in sales engineering is pretty good too. However, I was wondering: if I didn't like it after a few months or years, would it be possible to come back to product or data science? I do like the data science field, as I'm curious about finding insights, solving problems, and, on the product side, improving user experiences and creating new features. I've been looking for an opportunity since last year and haven't been able to land a job, not even as an analyst. So my fear is that if I get into sales engineering (I'm only interested in this particular firm for this role), it would be hard to get back into the industry. I'm currently working as a volunteer data scientist to maintain my visa status, so it's not as if I'd be leaving an actual DS job. I also don't know whether I'd even get the SE role, but I wanted to decide first because I have someone who will refer me, and I fear it would look bad to ask them for a referral for a data science role later (they don't have any open DS roles now). Also, in the long term, if money is the biggest driver for me, which would be better? Additionally, I'm an international on a STEM visa in the US, so how hard is it for other firms to sponsor a sales engineering role? Thanks! submitted by /u/Starktony11
I have the opportunity to take two jobs. One is for a senior BI analyst at a medium-sized sports betting company ($500M revenue) and the other is for a "reporting analyst" position with a large asset management firm (>$1B revenue). Here is my dilemma:
- The BI analyst role involves a lot more data science methodology and probably offers more direct advancement in this field.
- The reporting analyst role is mostly focused on outreach and data collection (calling member banks and whatnot) to report investment data to supervisors.
However, I would prefer to work in the financial industry long term, as I enjoy economic data more and like the idea of working within capital markets and economic planning. I'm worried that if I take the sports betting role, I won't be able to pivot into this field later on. I would appreciate any and all advice. submitted by /u/OutrageousPressure6
I'm a senior Data Scientist/Machine Learning Engineer with 7 years of experience and a Kaggle Grandmaster. I just finished the first round of interviews at Jane Street. I think I did okay: I managed to come up with a somewhat decent solution, although I got stuck a few times. I don't really understand the rationale behind asking LeetCode-style questions for MLE positions. The interviewer was nice, but when I asked about the responsibilities of MLEs at Jane Street, he had no idea. I'm not sure how to feel about this process, but it doesn't make much sense to me. submitted by /u/BurnerMcBurnersonne
Forecasting is still very clumsy and very painful. Even the models built by major companies -- Meta's Prophet and Google's Causal Impact come to mind -- don't really succeed as one-step, plug-and-play forecasting tools. They miss a lot of seasonality, overreact to outliers, and need a lot of tweaking to get right. It's an area of data science where the models that I build on my own tend to work better than the models I can find. LLMs, on the other hand, have reached incredible versatility and usability. ChatGPT and its clones aren't necessarily perfect yet, but they're definitely way beyond what I can do. Any time I have a language processing challenge, I know I'm going to get a better result leveraging somebody else's model than I will trying to build my own solution. Why is that? After all the time we as data scientists have put into forecasting, why haven't we created something that outperforms what an individual data scientist can create? Or -- if I'm wrong, and that does exist -- what tool does that? submitted by /u/takenorinvalid
I was given the title of Director of Data after only 3 years of being a Data Analyst. I am a one-man department in a company of ~30 employees with ~$6-7 million in revenue per year, and I only recently hired a part-time direct report to assist with some of the more mundane tasks. I was given the title because I routinely deal with executives; management wants them to view me more as an equal, rather than reaching out to managers who just forward the data request to me anyway. While I enjoy the pay bump and increase in autonomy, my goal is to work as a high-level individual contributor (DA or DS), as I do not want to deal with direct reports if possible, especially not ones that have to be micromanaged. I've noticed that since my title change ~2 years ago, recruiters reach out asking me to hire candidates rather than approaching me as a candidate. This makes me think the title may have come too early, as I am not even 30 yet and am not ready to settle into a management role long term. Has anyone else had a similar experience, and how did you deal with it? I am in a situation where my employer would be willing to give me essentially whatever title I want, if I left on good terms, for when a company calls for a reference.
Edit: for any new comments, I do not claim to be a Data Scientist. My role is a mix of data analyst and data engineer, as I do both ad hoc analysis and manage our SQL database. Because I'm already getting paid around 120k, I would likely need to move into a Senior Data Analyst or entry-level Data Scientist role to maintain my salary, and I could likely only enter a DS role after I finish my Masters in Data Science, since my Bachelors is in Finance and my Python skills are still in the early stages. submitted by /u/Inception952
Which of these graduate-level classes would be more beneficial for getting a DS job? Which do you use more? Thanks! submitted by /u/Careless-Tailor-2317
How much Bayesian inference are data scientists generally doing in their day-to-day work? Are there roles in specific areas of data science where that knowledge is needed? Marketing comes to mind, but I'm not sure where else. By knowledge of Bayesian inference I mean building hierarchical Bayesian models, or more complex models, in languages like Stan. submitted by /u/AdFew4357
I have been a data science manager for a little more than two years and absolutely hate it. I used to be in analytics and then a technical product manager for ML solutions, and I took on this role to gain people-management experience. Biggest mistake of my life. I have been trying to get back to being an individual contributor but feel rusty at the moment. My relationships with my stakeholders are great; they love me, and consequently I am not able to move back to my old role as it would leave a void in the current one. My skip-level boss feels the same and wouldn't allow it. I have been interviewing outside but not clearing interviews, primarily because I do not have anything groundbreaking to say about my individual performance. I also feel like I need to get back to basics and start from scratch. Any advice on how to proceed? P.S. I don't like the people-management part because I do not feel in control of my day: I manage 9 ICs and there's always some fire to put out. I also think I was given responsibility for a big portfolio without enough management experience. submitted by /u/TheEmotionalNerd