What Are the Best Machine Learning Algorithms for Imbalanced Datasets?
In machine learning, imbalanced datasets are those where one class heavily outnumbers the others. This can be due to the nature of the problem or simply because more data is available for one class than the others. Either way, imbalanced datasets can pose a challenge for machine learning algorithms. In this blog post, we’ll take a look at which machine learning algorithms are best suited for imbalanced datasets and why they tend to perform better than others.
For example, in a binary classification problem, if there are 100 observations and only 10 of them are positive (the rest are negative), then we say that the dataset is imbalanced: the ratio of positive to negative cases is 1:9.
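As a quick sanity check, class balance can be counted directly from the labels. A minimal sketch in plain Python, where the `labels` list is made up to match the example above:

```python
from collections import Counter

# Hypothetical labels matching the example above: 10 positives, 90 negatives.
labels = [1] * 10 + [0] * 90

counts = Counter(labels)
print(counts)  # Counter({0: 90, 1: 10})
print(f"imbalance ratio 1:{counts[0] // counts[1]}")  # imbalance ratio 1:9
```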
There are a few reasons why some machine learning algorithms tend to perform better on imbalanced datasets than others. First, certain algorithms are designed to handle imbalanced datasets. Second, some algorithms are more robust to outliers, which can be more common in imbalanced datasets. And third, some algorithms are better able to learn from a limited amount of data, which can be an issue when one class is heavily outnumbered by the others.
Some of the best machine learning algorithms for imbalanced datasets include:
– Support Vector Machines (SVMs),
– Decision Trees,
– Random Forests,
– Naive Bayes Classifiers,
– k-Nearest Neighbors (kNN).
Of these, SVMs are a popular choice: they work by finding a hyperplane that maximizes the margin between the two classes, so the decision boundary depends only on the support vectors near that margin rather than on raw class counts, and class weights can be used to penalize minority-class errors more heavily. This helps to reduce overfitting and improve generalization. Decision trees and random forests are also popular choices, as they are less sensitive to outliers than algorithms such as linear regression. Naive Bayes classifiers are another good option, as they can learn from a limited amount of data. kNN is also a reasonable choice because it makes few assumptions about the data, although it can be computationally intensive for large datasets.
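To illustrate the class-weighting idea mentioned above for SVMs, here is a minimal sketch assuming Python with scikit-learn (the dataset is synthetic, generated purely for illustration); `class_weight="balanced"` re-scales the misclassification penalty inversely to class frequency:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 1:9 imbalanced dataset (roughly 10% positives), for illustration only.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# class_weight="balanced" raises the cost of minority-class mistakes,
# counteracting the classifier's pull toward the majority class.
clf = SVC(class_weight="balanced", random_state=42).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

Per-class precision and recall are far more informative here than plain accuracy, which a constant majority-class predictor would already score around 90% on.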
There are two main types of machine learning algorithms: supervised and unsupervised. Supervised algorithms tend to perform better on imbalanced datasets than unsupervised algorithms. In this blog post, we will discuss why this is so and look at some examples.
Supervised Algorithms
Supervised algorithms are those where the target variable is known. In other words, we have training data where the correct answers are already given. The algorithm then learns from this data and is able to generalize to new data. Some examples of supervised algorithms are regression and classification.
Unsupervised Algorithms
Unsupervised algorithms are those where the target variable is not known. With unsupervised algorithms, we only have input data, without any corresponding output labels. The algorithm has to learn from the data itself without any guidance. Some examples of unsupervised algorithms are clustering and dimensionality reduction.
Why Supervised Algorithms Perform Better on Imbalanced Datasets
Supervised algorithms perform better on imbalanced datasets because the labels let them learn which cases matter most. With unsupervised algorithms, all data points are treated equally, regardless of whether they belong to the minority or majority class.
For example, in a binary classification problem with an imbalanced dataset, let’s say that we want to predict whether a customer will default on their loan payment or not. We have a training dataset of 1000 customers, out of which only 100 (10%) have defaulted on their loan in the past.
If we use a supervised algorithm like logistic regression, the algorithm will learn from the training data that defaulting on a loan is rare (since only 10% of cases in the training data are positive). Left unadjusted it will lean toward predicting the majority class, so it will often be right that a new customer will not default; and because the labels are available, we can also correct this bias with class weights or a tuned decision threshold so that the rare defaulters are still caught.
However, if we use an unsupervised algorithm like k-means clustering, all data points will be treated equally since there is no target variable to guide the algorithm. This means that it might incorrectly cluster together customers who have defaulted on their loans with those who haven’t since there is no guidance provided by a target variable.
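The contrast can be sketched in code. This is a hypothetical reconstruction of the loan example, assuming Python with scikit-learn; `make_classification` stands in for real customer data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Hypothetical loan-default data: roughly 10% defaulters (class 1).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Supervised: logistic regression sees the labels, and class_weight can
# compensate for the 1:9 imbalance.
logreg = LogisticRegression(class_weight="balanced").fit(X, y)
print("logistic regression F1:", f1_score(y, logreg.predict(X)))

# Unsupervised: k-means only sees the inputs, so its two clusters need not
# line up with the default / no-default split at all.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# Map each cluster to its majority class before scoring, to be generous.
mapping = {c: np.bincount(y[km.labels_ == c]).argmax() for c in (0, 1)}
clusters_as_labels = np.array([mapping[c] for c in km.labels_])
print("k-means (best mapping) F1:", f1_score(y, clusters_as_labels))
```

Mapping each cluster to its majority class is the most generous possible reading of the clustering, and it still has no particular reason to isolate the 10% of defaulters.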
Conclusion:
In conclusion, supervised machine learning algorithms tend to perform better on imbalanced datasets than unsupervised machine learning algorithms because they can learn from the training data which cases are more important.
Some machine learning algorithms tend to perform better on highly imbalanced datasets because they are designed to deal with imbalance or because they can learn from both classes simultaneously. If you are working with a highly imbalanced dataset, then you should consider using one of these algorithms.
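Beyond choosing an algorithm, a common complementary tactic is resampling the training data. A minimal oversampling sketch, assuming scikit-learn's `resample` utility (the arrays here are invented for illustration):

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical imbalanced arrays: 90 negatives, 10 positives.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

X_min, y_min = X[y == 1], y[y == 1]
X_maj, y_maj = X[y == 0], y[y == 0]

# Oversample the minority class (with replacement) up to the majority size.
X_up, y_up = resample(X_min, y_min, replace=True, n_samples=len(y_maj), random_state=0)

X_bal = np.vstack([X_maj, X_up])
y_bal = np.concatenate([y_maj, y_up])
print(np.bincount(y_bal))  # [90 90]
```

Oversampling should be applied to the training split only; the test set keeps the original class distribution so that evaluation reflects reality.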
Thanks for reading!
How are machine learning techniques being used to address unstructured data challenges?
Machine learning techniques are being used to address unstructured data challenges in a number of ways:
- Natural language processing (NLP): NLP algorithms can be used to extract meaningful information from unstructured text data, such as emails, documents, and social media posts. NLP algorithms can be trained to classify text data, identify key terms and concepts, and extract structured data from unstructured text.
- Image recognition: Machine learning algorithms can be used to analyze and classify images, enabling the automatic identification and classification of objects, people, and other elements in images. This can be useful for tasks such as image tagging and search, as well as for applications such as security and surveillance.
- Audio and speech recognition: Machine learning algorithms can be used to analyze and classify audio data, enabling the automatic transcription and translation of spoken language. This can be useful for tasks such as speech-to-text transcription, as well as for applications such as call center automation and language translation.
- Video analysis: Machine learning algorithms can be used to analyze and classify video data, enabling the automatic detection and classification of objects, people, and other elements in video. This can be useful for tasks such as video tagging and search, as well as for applications such as security and surveillance.
Overall, machine learning techniques are being used in a wide range of applications to extract meaningful information from unstructured data, and to enable the automatic classification and analysis of data in a variety of formats.
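As a concrete example of the NLP case above, here is a short text-classification sketch assuming Python with scikit-learn; the documents and labels are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled text data (hypothetical) for a spam / not-spam style task.
texts = [
    "win a free prize now", "limited offer click here",
    "meeting moved to 3pm", "please review the attached report",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# TF-IDF turns unstructured text into a numeric matrix a classifier can use.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["free prize offer", "see the report"]))
```

The same vectorize-then-classify pattern underlies many of the text-extraction tasks described above, just with far larger corpora and stronger models.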
How is AI and machine learning impacting application development today?
Artificial intelligence (AI) and machine learning are having a significant impact on application development today in a number of ways:
- Enabling new capabilities: AI and machine learning algorithms can be used to enable applications to perform tasks that would be difficult or impossible for humans to do. For example, AI-powered applications can be used to analyze and classify large amounts of data, or to automate complex decision-making processes.
- Improving performance: AI and machine learning algorithms can be used to optimize the performance of applications, making them faster, more efficient, and more accurate. For example, machine learning algorithms can be used to improve the accuracy of predictive models, or to optimize the performance of search algorithms.
- Streamlining development: AI and machine learning algorithms can be used to automate various aspects of application development, such as testing, debugging, and deployment. This can help to streamline the development process and reduce the time and resources needed to build and maintain applications.
- Enhancing user experiences: AI and machine learning algorithms can be used to enhance the user experience of applications, by providing personalized recommendations or by enabling applications to anticipate and respond to the needs and preferences of users.
Overall, AI and machine learning are having a significant impact on application development today, and they are likely to continue to shape the way applications are built and used in the future.
How will advancements in artificial intelligence and machine learning shape the future of work and society?
Advancements in artificial intelligence (AI) and machine learning are likely to shape the future of work and society in a number of ways. Some potential impacts include:
- Automation: AI and machine learning algorithms can be used to automate tasks that are currently performed by humans, such as data entry, customer service, and manufacturing. This could lead to changes in the types of jobs that are available and the skills that are in demand, as well as to increased productivity and efficiency.
- Job displacement: While automation may create new job opportunities, it could also lead to job displacement, particularly for workers in industries that are more susceptible to automation. This could lead to social and economic challenges, including unemployment and income inequality.
- Increased efficiency: AI and machine learning algorithms can be used to optimize and streamline business processes, leading to increased efficiency and productivity. This could lead to economic growth and innovation, and could also help to reduce costs for businesses and consumers.
- Enhanced decision-making: AI and machine learning algorithms can be used to analyze large amounts of data and make more informed and accurate decisions. This could lead to improved outcomes in fields such as healthcare, finance, and education, and could also help to reduce bias and improve fairness.
Overall, the impact of AI and machine learning on the future of work and society is likely to be significant and complex, with both potential benefits and challenges. It will be important to consider and address these impacts as these technologies continue to advance and become more widely adopted.
- Adaptive RAG: A retrieval technique to reduce LLM token cost for top-k Vector Index retrieval [R] by /u/dxtros (Machine Learning) on March 28, 2024 at 6:55 pm
Abstract: We demonstrate a technique which allows us to dynamically adapt the number of documents in a top-k retriever RAG prompt using feedback from the LLM. This allows a 4x cost reduction of RAG LLM question answering while maintaining the same level of accuracy. We also show that the method helps explain the lineage of LLM outputs. The reference implementation works with most models (GPT-4, many local models, older GPT-3.5 Turbo) and can be adapted to work with most vector databases exposing a top-k retrieval primitive. Blog post: https://pathway.com/developers/showcases/adaptive-rag Reference implementation: https://github.com/pathwaycom/pathway/blob/main/python/pathway/xpacks/llm/question_answering.py
- [D] Suggested readings on distributed inference by /u/Shintuku1 (Machine Learning) on March 28, 2024 at 6:37 pm
I'm looking for readings on distributed inference: is it at all possible? Is there any system architecture that makes this feasible, or at all worthwhile? What approaches are there to distributed inference? I'm getting a number of hits on Google Scholar; anything you personally consider worthwhile digging into?
- [D] Advice needed!! by /u/ray_ashh (Machine Learning) on March 28, 2024 at 5:44 pm
I am currently a sophomore studying computer science. In this era of AI, is it necessary for me to learn the inner workings of AI, like the math and other fundamentals, or should I dive straight into the top-level stuff and create projects based on models made by others? What would be better for breaking into jobs at AI startups or MNCs?
- [D] Stanford's BioMedLM paper: reported accuracy vs evaluated accuracy doesn't make sense by /u/aadityaura (Machine Learning) on March 28, 2024 at 5:32 pm
Stanford releases #BioMedLM, a 2.7B-parameter language model trained on biomedical data. However, the results do not seem to make sense. Here is the evaluation report using the LM Evaluation Harness framework on MultiMedQA (MedMCQA, MedQA, MMLU, PubMed). [evaluation screenshots omitted]
- [D] What skills should I have to make the transition from Physics/EE/SWE to ML professionally by /u/Brilliant-Donkey-320 (Machine Learning) on March 28, 2024 at 5:24 pm
I am looking to transition to an ML engineer (or possibly DS) role in the future (1-3 yrs); I will continue to work as a SWE in the meantime, possibly in a Python job, TBD. My education and work background are below. What skills and knowledge should I gain or brush up on? Anything I should add to my rough plan for the next year or so? Courses: CS50 AI with Python, Andrew Ng ML Specialization, Andrew Ng DL Specialization. These courses seem to give a good base (but not too deep). (They teach with TF instead of PyTorch; side question, is it easy to transition from TF to PyTorch?) I will also be reading Introduction to Statistical Learning in Python, which seems to have good depth at a glance, and reviewing my linear algebra, probability and statistics, and multivariable calculus. During this process, I thought it would be good to get some Kaggle datasets and do some projects with them. Any suggestions or thoughts on what I am missing or have overlooked? Thanks! Education background: BSc Physics; MSc Electronics and ICT. Work background: hardware engineer, 1.5 yrs; software engineer, 1.5 yrs (C# .NET desktop applications).
- [D] Suggestions on organizing and monitoring multi-model training by /u/pwinggles (Machine Learning) on March 28, 2024 at 5:05 pm
Hey all, I have a project that, for me, is a bit complicated, so I'm trying to scheme out the best structure for it prior to getting things running, and I'm looking for some advice. The situation: I have 4 tabular predictor datasets, each of which has 31 response variables (RVs) for which I need to train regression models (using XGBoost). By the end, I will have 124 (4 × 31) trained models. Ideally, for each RV I'd like to perform some form of K-fold cross-validated hyperparameter optimization, and final model analysis will also be based on K-fold CV. The challenge: I'm trying to figure out the best way to organize all of this so that it isn't a complete mess for reproducibility and analysis, while keeping the potential to add new predictor data and/or new RVs. I've done this once before and opted for just writing data out to a CSV, but that quickly became unwieldy and required a lot of extra code just to handle and parse the results sanely. I'd really like to be able to visualize the training and performance of each of the models, but most examples of popular tools in this space seem to focus on training a single model, with "experiments" generally referring to different hyperparameters or feature modifications. DVC, Aim, and WandB all look appealing, but I'm not quite sure how to conceptualize my particular workflow, and I'd like to avoid limiting pitfalls down the road by making sure my initial setup is sound. I'd love to hear how others have organized such multi-model/ensemble training projects!
- [D] A Little Guide to Building Large Language Models in 2024 – 75-min lecture by /u/Thomjazz (Machine Learning) on March 28, 2024 at 4:26 pm
I finally recorded this lecture I gave two weeks ago because people kept asking me for a video. So here it is, I hope you'll enjoy it: "A Little Guide to Building Large Language Models in 2024". I tried to keep it short and comprehensive, focusing on concepts that are crucial for training good LLMs but often hidden in tech reports. In the lecture, I introduce the students to all the important concepts/tools/techniques for training a good-performance LLM: finding, preparing and evaluating web-scale data; understanding model parallelism and efficient training; fine-tuning/aligning models; and fast inference. There are of course many things and details missing that I should have added, so don't hesitate to tell me your most frustrating omission and I'll add it in a future part. In particular I think I'll add more focus on how to filter topics well and extensively, and maybe more practical anecdotes and details. Now that I've recorded it, I've been thinking this could be part 1 of a two-part series, with a 2nd fully hands-on video on how to run all these steps with some libraries and recipes we've released recently at HF around LLM training (and which could be easily adapted to other frameworks): datatrove for all things web-scale data preparation (https://github.com/huggingface/datatrove), nanotron for lightweight 4D-parallelism LLM training (https://github.com/huggingface/nanotron), and lighteval for in-training fast parallel LLM evaluations (https://github.com/huggingface/lighteval). Here is the link to watch the lecture on YouTube: https://www.youtube.com/watch?v=2-SPH9hIKT8 and here is the link to the Google Slides: https://docs.google.com/presentation/d/1IkzESdOwdmwvPxIELYJi8--K3EZ98_cL6c5ZcLKSyVg/edit#slide=id.p Enjoy, and happy to hear feedback on what to add, correct, or extend in a second part.
- The end of hallucination (for those who can afford it)? [R] by /u/we_are_mammals (Machine Learning) on March 28, 2024 at 3:04 pm
DeepMind just published a paper about fact-checking text. The approach costs $0.19 per model response, using GPT-3.5-Turbo, which is cheaper than human annotators while being more accurate than them. They use this approach to create a factuality benchmark and compare some popular LLMs. Paper and code: https://arxiv.org/abs/2403.18802
- [Discussion] MS in bioinformatics & digital health with a minor in machine learning, data science and AI? Is this a good combo for jobs in tech? by /u/maansaee1 (Machine Learning) on March 28, 2024 at 2:11 pm
Hi! I need advice. Will I be able to find jobs in the machine learning, data science, and AI field if I have an MS in Bioinformatics with a minor in machine learning? Bioinformatics is a niche field, and I'm worried that my minor will not be enough to transition into tech if there are no good jobs available in bioinformatics. The bioinformatics major also consists of a lot of machine learning and data analysis. Here's more info on the courses listed in the major for anyone interested: https://www.aalto.fi/en/programmes/masters-programme-in-life-science-technologies/curriculum-2022-2024. Should I stick with only a minor in machine learning, or should I just complete my MS in machine learning? I'm interested in both fields and want to make sure my future salary is good.
- [D] Anyone remember the AI company Element AI? by /u/xiikjuy (Machine Learning) on March 28, 2024 at 10:49 am
Founded in 2016. Yoshua Bengio was one of its co-founders. Sold for $230M in 2020. I think with their talent, they could have provided one of the top LLMs by now. Timing is really important for a startup...
- [D] How is deep learning used to correct spelling errors in search engines? by /u/Seankala (Machine Learning) on March 28, 2024 at 10:46 am
I was reading this 2021 blog post by Google about how Google handles spelling errors: https://blog.google/products/search/abcs-spelling-google-search/ They said that they introduced deep learning into their search, but to me it's not really that straightforward. What is the input and output? Is the input a potentially misspelled query, and is the output a 0 or 1, where 1 means misspelled? Or is the input a potentially misspelled query, with the model generating a corrected query? How is deep learning used in general for search?
- [D] A sentence-level transformer to improve memory for a token-level transformer? by /u/Alarming-Ad8154 (Machine Learning) on March 28, 2024 at 10:29 am
I have a (probably dumb) idea for long-term transformer memory. You can embed sentences into vectors of length ~128 to ~2048, right? Then you can cluster those sentences and effectively project them into lower-dimensional spaces. I have often wondered whether you could take ~50,000 cardinal points in the embedding space (points such that the summed squared distance to all sentences in a representative corpus is minimal). You'd then map each sentence in a big corpus to the nearest point, and these points are then used as tokens. Subsequently you encode a massive text library into these tokens and train a bog-standard GPT model to predict the "next sentence". Given the model deals in "sentences", even a 4096 context length would be BIG, but it wouldn't be able to give you the details of those sentences, as the 50k tokens are a very coarse representation of all possible sentences. However, you could then train a token-level model to predict the next token, which takes input from both its own context (the previous 4096 tokens, or more, whatever is expedient) AND the sentence-level prediction model, which would have a coarser memory going WAY WAY back... You could potentially use a cross-attention-style mechanism to feed the next-sentence-level model into the next-token-level model. It's sort of a multi-modal model, but the modalities are both text, just at different levels of organisation?
- [D] Evaluating xG models: comparing discrete outcomes with continuous predictions by /u/tipoviento (Machine Learning) on March 28, 2024 at 10:13 am
I've recently developed an xG (expected goals) model using event data, and I'm exploring the best methods for evaluating its accuracy. Given the nature of football, goals are discrete (each shot is a binary outcome) while my model predicts a continuous probability in (0, 1). I'm curious about the most appropriate statistical techniques or metrics for comparison, beyond just MSE/RMSE. How do you assess the accuracy of your xG models under these conditions? Any advice or references on this topic would be greatly appreciated.
- [N] The 77 French legal codes are now available via Hugging Face's Datasets library with daily updates by /u/louisbrulenaudet (Machine Learning) on March 28, 2024 at 7:37 am
This groundwork enables ecosystem players to consider deploying RAG solutions in real time without having to configure data retrieval systems. Link to Louis Brulé-Naudet's Hugging Face profile.

```python
import concurrent.futures
import logging

import datasets
from tqdm import tqdm


def dataset_loader(name: str, streaming: bool = True) -> datasets.Dataset:
    """Load a single dataset, returning None (and logging the error) on failure.

    Parameters
    ----------
    name : str
        Name of the dataset to be loaded.
    streaming : bool, optional
        Determines if datasets are streamed. Default is True.
    """
    try:
        return datasets.load_dataset(name, split="train", streaming=streaming)
    except Exception as exc:
        logging.error(f"Error loading dataset {name}: {exc}")
        return None


def load_datasets(req: list, streaming: bool = True) -> list:
    """Download the datasets named in `req` in parallel and return the loaded ones.

    Examples
    --------
    >>> datasets_list = load_datasets(["dataset1", "dataset2"], streaming=False)
    """
    datasets_list = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_to_dataset = {
            executor.submit(dataset_loader, name, streaming): name for name in req
        }
        for future in tqdm(
            concurrent.futures.as_completed(future_to_dataset), total=len(req)
        ):
            name = future_to_dataset[future]
            try:
                dataset = future.result()
                if dataset:
                    datasets_list.append(dataset)
            except Exception as exc:
                logging.error(f"Error processing dataset {name}: {exc}")
    return datasets_list


req = [
    "louisbrulenaudet/code-artisanat", "louisbrulenaudet/code-action-sociale-familles",
    "louisbrulenaudet/code-assurances", "louisbrulenaudet/code-aviation-civile",
    "louisbrulenaudet/code-cinema-image-animee", "louisbrulenaudet/code-civil",
    "louisbrulenaudet/code-commande-publique", "louisbrulenaudet/code-commerce",
    "louisbrulenaudet/code-communes", "louisbrulenaudet/code-communes-nouvelle-caledonie",
    "louisbrulenaudet/code-consommation", "louisbrulenaudet/code-construction-habitation",
    "louisbrulenaudet/code-defense", "louisbrulenaudet/code-deontologie-architectes",
    "louisbrulenaudet/code-disciplinaire-penal-marine-marchande",
    "louisbrulenaudet/code-domaine-etat",
    "louisbrulenaudet/code-domaine-etat-collectivites-mayotte",
    "louisbrulenaudet/code-domaine-public-fluvial-navigation-interieure",
    "louisbrulenaudet/code-douanes", "louisbrulenaudet/code-douanes-mayotte",
    "louisbrulenaudet/code-education", "louisbrulenaudet/code-electoral",
    "louisbrulenaudet/code-energie",
    "louisbrulenaudet/code-entree-sejour-etrangers-droit-asile",
    "louisbrulenaudet/code-environnement",
    "louisbrulenaudet/code-expropriation-utilite-publique",
    "louisbrulenaudet/code-famille-aide-sociale", "louisbrulenaudet/code-forestier-nouveau",
    "louisbrulenaudet/code-fonction-publique",
    "louisbrulenaudet/code-propriete-personnes-publiques",
    "louisbrulenaudet/code-collectivites-territoriales", "louisbrulenaudet/code-impots",
    "louisbrulenaudet/code-impots-annexe-i", "louisbrulenaudet/code-impots-annexe-ii",
    "louisbrulenaudet/code-impots-annexe-iii", "louisbrulenaudet/code-impots-annexe-iv",
    "louisbrulenaudet/code-impositions-biens-services",
    "louisbrulenaudet/code-instruments-monetaires-medailles",
    "louisbrulenaudet/code-juridictions-financieres",
    "louisbrulenaudet/code-justice-administrative",
    "louisbrulenaudet/code-justice-militaire-nouveau",
    "louisbrulenaudet/code-justice-penale-mineurs",
    "louisbrulenaudet/code-legion-honneur-medaille-militaire-ordre-national-merite",
    "louisbrulenaudet/livre-procedures-fiscales", "louisbrulenaudet/code-minier",
    "louisbrulenaudet/code-minier-nouveau", "louisbrulenaudet/code-monetaire-financier",
    "louisbrulenaudet/code-mutualite", "louisbrulenaudet/code-organisation-judiciaire",
    "louisbrulenaudet/code-patrimoine", "louisbrulenaudet/code-penal",
    "louisbrulenaudet/code-penitentiaire",
    "louisbrulenaudet/code-pensions-civiles-militaires-retraite",
    "louisbrulenaudet/code-pensions-retraite-marins-francais-commerce-peche-plaisance",
    "louisbrulenaudet/code-pensions-militaires-invalidite-victimes-guerre",
    "louisbrulenaudet/code-ports-maritimes",
    "louisbrulenaudet/code-postes-communications-electroniques",
    "louisbrulenaudet/code-procedure-civile", "louisbrulenaudet/code-procedure-penale",
    "louisbrulenaudet/code-procedures-civiles-execution",
    "louisbrulenaudet/code-propriete-intellectuelle", "louisbrulenaudet/code-recherche",
    "louisbrulenaudet/code-relations-public-administration", "louisbrulenaudet/code-route",
    "louisbrulenaudet/code-rural-ancien", "louisbrulenaudet/code-rural-peche-maritime",
    "louisbrulenaudet/code-sante-publique", "louisbrulenaudet/code-securite-interieure",
    "louisbrulenaudet/code-securite-sociale", "louisbrulenaudet/code-service-national",
    "louisbrulenaudet/code-sport", "louisbrulenaudet/code-tourisme",
    "louisbrulenaudet/code-transports", "louisbrulenaudet/code-travail",
    "louisbrulenaudet/code-travail-maritime", "louisbrulenaudet/code-urbanisme",
    "louisbrulenaudet/code-voirie-routiere",
]

dataset = load_datasets(req=req, streaming=True)
```
- [D] What are some of the big-tech-company-sponsored ML research websites that you are aware of for constantly keeping up with ML research and the workings behind their products, like Apple Machine Learning Research (https://machinelearning.apple.com/) or Tesla's AI Day videos? by /u/pontiac_RN (Machine Learning) on March 28, 2024 at 5:08 am
It would be great if there were a bundle of such sources, or if you have a go-to place where you keep up to date with all the new research going on.
- [D] Machine Learning On The Edge by /u/TheLastMate (Machine Learning) on March 28, 2024 at 2:29 am
Hi guys, I found this today in my drawer. I forgot I had it and have never used it. It made me wonder: what is the current state of ML on the edge, and what are your predictions for the near future? We usually see big advances and news on big models, but not much on on-device applications.
- [P] deit3-jax: A codebase for training ViTs on TPUs by /u/affjljoo3581 (Machine Learning) on March 27, 2024 at 9:54 pm
Hey all, I have written a codebase to train ViTs following the DeiT and DeiT-III recipes. As they are strong baselines for training vanilla ViTs, reproducing them is necessary for adapting to variant research. However, the original repository is implemented in PyTorch, so it cannot run on TPUs. Therefore I re-implemented a simple ViT training codebase with the DeiT and DeiT-III training recipes. Here is my repository: https://github.com/affjljoo3581/deit3-jax. I used Jax/Flax and webdataset to build a TPU-friendly training environment. Below are the reproduction results (configs, wandb logs, and checkpoints are linked in the repository):

DeiT reproduction:

| Name | Data | Resolution | Epochs | Time | Reimpl. | Original |
|------|------|------------|--------|------|---------|----------|
| T/16 | in1k | 224 | 300 | 2h 40m | 73.1% | 72.2% |
| S/16 | in1k | 224 | 300 | 2h 43m | 79.68% | 79.8% |
| B/16 | in1k | 224 | 300 | 4h 40m | 81.46% | 81.8% |

DeiT-III on ImageNet-1k:

| Name | Data | Resolution | Epochs | Time | Reimpl. | Original |
|------|------|------------|--------|------|---------|----------|
| S/16 | in1k | 224 | 400 | 2h 38m | 80.7% | 80.4% |
| S/16 | in1k | 224 | 800 | 5h 19m | 81.44% | 81.4% |
| B/16 | in1k | 192 → 224 | 400 | 4h 42m | 83.6% | 83.5% |
| B/16 | in1k | 192 → 224 | 800 | 9h 28m | 83.91% | 83.8% |
| L/16 | in1k | 192 → 224 | 400 | 14h 10m | 84.62% | 84.5% |
| L/16 | in1k | 192 → 224 | 800 | - | - | 84.9% |
| H/14 | in1k | 154 → 224 | 400 | 19h 10m | 85.12% | 85.1% |
| H/14 | in1k | 154 → 224 | 800 | - | - | 85.2% |

DeiT-III on ImageNet-21k:

| Name | Data | Resolution | Epochs | Time | Reimpl. | Original |
|------|------|------------|--------|------|---------|----------|
| S/16 | in21k | 224 | 90 | 7h 30m | 83.04% | 82.6% |
| S/16 | in21k | 224 | 240 | 20h 6m | 83.39% | 83.1% |
| B/16 | in21k | 224 | 90 | 12h 12m | 85.35% | 85.2% |
| B/16 | in21k | 224 | 240 | 33h 9m | 85.68% | 85.7% |
| L/16 | in21k | 224 | 90 | 37h 13m | 86.83% | 86.8% |
| L/16 | in21k | 224 | 240 | - | - | 87% |
| H/14 | in21k | 126 → 224 | 90 | 35h 51m | 86.78% | 87.2% |
| H/14 | in21k | 126 → 224 | 240 | - | - | - |

I trained all models on a TPU v4-64 Pod slice provided by the TRC program. I uploaded the checkpoints to the Hugging Face hub, and you can also see the training logs on wandb. For more details, please check out my repository.
- [D] How do you measure the performance of an AI copilot/assistant? by /u/n2parko (Machine Learning) on March 27, 2024 at 5:38 pm
Curious to hear from those who are building and deploying products with AI copilots. How are you tracking the interactions? And are you feeding the interactions back into the model for retraining? I put together a how-to for doing this with an open-source copilot (Vercel AI SDK) and Segment, and would love any feedback to improve the spec: https://segment.com/blog/instrumenting-user-insights-for-your-ai-copilot/ submitted by /u/n2parko
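The linked post is about instrumenting per-interaction analytics events. As a rough sketch of the kind of payload such tracking might record (the field names here are my own illustration, not Segment's schema or the post's spec):

```python
import time
import uuid

def copilot_event(user_id: str, prompt: str, completion: str, accepted: bool) -> dict:
    """Build a hypothetical analytics event for one copilot interaction.

    Recording whether the user accepted the suggestion is what later lets you
    compute acceptance rate and build a feedback dataset for retraining.
    """
    return {
        "event": "copilot_interaction",
        "message_id": str(uuid.uuid4()),   # unique id to join with later feedback
        "timestamp": time.time(),
        "user_id": user_id,
        "properties": {
            "prompt_chars": len(prompt),
            "completion_chars": len(completion),
            "accepted": accepted,          # did the user keep the suggestion?
        },
    }

event = copilot_event("u_42", "write a unit test", "def test(): ...", accepted=True)
```

Aggregating these events gives acceptance rate and latency-style metrics; the accepted/rejected pairs are also the natural training signal to feed back into the model.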
- [D] What is the state of the art for 1D signal cleanup? by /u/XmintMusic (Machine Learning) on March 27, 2024 at 4:52 pm
I have the following problem: imagine I have a 'supervised' dataset of 1D curves with inputs and outputs, where the input is a modulated noisy signal and the output is the cleaned desired signal. Is there a consensus in the machine learning community on how to tackle this simple problem? Have you ever worked on anything similar? What algorithm did you end up using? Example: https://imgur.com/JYgkXEe submitted by /u/XmintMusic
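For context on what "cleanup" means here: before reaching for learned models, the usual baseline is a classical smoothing filter. A minimal sketch under an assumed additive-noise setup (synthetic data, not the poster's):

```python
import numpy as np

def moving_average_denoise(signal: np.ndarray, window: int = 11) -> np.ndarray:
    """Baseline 1D cleanup: smooth with a normalized box filter."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

# Synthetic example: a sine corrupted by Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 500)
clean = np.sin(t)
noisy = clean + 0.3 * rng.standard_normal(t.size)
denoised = moving_average_denoise(noisy)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

The supervised version of this replaces the fixed kernel with a learned model, typically a 1D convolutional network (or an encoder-decoder / sequence model) trained directly on (noisy, clean) pairs, which is the common answer to the question above.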
- [P] Insta Face Swap by /u/abdullahozmntr (Machine Learning) on March 27, 2024 at 2:03 pm
ComfyUI node repo: https://github.com/abdozmantar/ComfyUI-InstaSwap Standalone repo: https://github.com/abdozmantar/Standalone-InstaSwap https://i.redd.it/9d4ti20fvvqc1.gif submitted by /u/abdullahozmntr
List of freely available programming books - What is the single most influential book every programmer should read?
- Bjarne Stroustrup - The C++ Programming Language
- Brian W. Kernighan, Rob Pike - The Practice of Programming
- Donald Knuth - The Art of Computer Programming
- Ellen Ullman - Close to the Machine
- Ellis Horowitz - Fundamentals of Computer Algorithms
- Eric Raymond - The Art of Unix Programming
- Gerald M. Weinberg - The Psychology of Computer Programming
- James Gosling - The Java Programming Language
- Joel Spolsky - The Best Software Writing I
- Keith Curtis - After the Software Wars
- Richard M. Stallman - Free Software, Free Society
- Richard P. Gabriel - Patterns of Software
- Richard P. Gabriel - Innovation Happens Elsewhere
- Code Complete (2nd edition) by Steve McConnell
- The Pragmatic Programmer
- Structure and Interpretation of Computer Programs
- The C Programming Language by Kernighan and Ritchie
- Introduction to Algorithms by Cormen, Leiserson, Rivest & Stein
- Design Patterns by the Gang of Four
- Refactoring: Improving the Design of Existing Code
- The Mythical Man Month
- Compilers: Principles, Techniques and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman
- Gödel, Escher, Bach by Douglas Hofstadter
- Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin
- Effective C++
- More Effective C++
- CODE by Charles Petzold
- Programming Pearls by Jon Bentley
- Working Effectively with Legacy Code by Michael C. Feathers
- Peopleware by Demarco and Lister
- Coders at Work by Peter Seibel
- Surely You're Joking, Mr. Feynman!
- Effective Java 2nd edition
- Patterns of Enterprise Application Architecture by Martin Fowler
- The Little Schemer
- The Seasoned Schemer
- Why's (Poignant) Guide to Ruby
- The Inmates Are Running The Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity
- Test-Driven Development: By Example by Kent Beck
- Practices of an Agile Developer
- Don't Make Me Think
- Agile Software Development, Principles, Patterns, and Practices by Robert C. Martin
- Domain-Driven Design by Eric Evans
- The Design of Everyday Things by Donald Norman
- Modern C++ Design by Andrei Alexandrescu
- Pragmatic Thinking and Learning: Refactor Your Wetware by Andy Hunt
- Software Estimation: Demystifying the Black Art by Steve McConnell
- The Passionate Programmer (My Job Went To India) by Chad Fowler
- Hackers: Heroes of the Computer Revolution
- Algorithms + Data Structures = Programs
- JavaScript - The Good Parts
- Getting Real by 37 Signals
- Foundations of Programming by Karl Seguin
- Computer Graphics: Principles and Practice in C (2nd Edition)
- Thinking in Java by Bruce Eckel
- The Elements of Computing Systems
- Refactoring to Patterns by Joshua Kerievsky
- Modern Operating Systems by Andrew S. Tanenbaum
- The Annotated Turing
- Things That Make Us Smart by Donald Norman
- The Timeless Way of Building by Christopher Alexander
- The Deadline: A Novel About Project Management by Tom DeMarco
- The C++ Programming Language (3rd edition) by Stroustrup
- Computer Systems - A Programmer's Perspective
- Agile Principles, Patterns, and Practices in C# by Robert C. Martin
- Growing Object-Oriented Software, Guided by Tests
- Framework Design Guidelines by Brad Abrams
- Object Thinking by Dr. David West
- Advanced Programming in the UNIX Environment by W. Richard Stevens
- Hackers and Painters: Big Ideas from the Computer Age
- The Soul of a New Machine by Tracy Kidder
- CLR via C# by Jeffrey Richter
- Design Patterns in C# by Steve Metsker
- Alice in Wonderland by Lewis Carroll
- Zen and the Art of Motorcycle Maintenance by Robert M. Pirsig
- About Face - The Essentials of Interaction Design
- Here Comes Everybody: The Power of Organizing Without Organizations by Clay Shirky
- The Tao of Programming
- Computational Beauty of Nature
- Writing Solid Code by Steve Maguire
- Philip and Alex's Guide to Web Publishing
- Object-Oriented Analysis and Design with Applications by Grady Booch
- Effective Java by Joshua Bloch
- Computability by N. J. Cutland
- Masterminds of Programming
- The Tao Te Ching
- The Productive Programmer
- The Art of Deception by Kevin Mitnick
- The Career Programmer: Guerilla Tactics for an Imperfect World by Christopher Duncan
- Paradigms of Artificial Intelligence Programming: Case studies in Common Lisp
- Masters of Doom
- Pragmatic Unit Testing in C# with NUnit by Andy Hunt and Dave Thomas with Matt Hargett
- How To Solve It by George Polya
- The Alchemist by Paulo Coelho
- Smalltalk-80: The Language and its Implementation
- Writing Secure Code (2nd Edition) by Michael Howard
- Introduction to Functional Programming by Philip Wadler and Richard Bird
- No Bugs! by David Thielen
- Rework by Jason Fried and DHH
- JUnit in Action
Health: a science-based community to discuss health news and the coronavirus (COVID-19) pandemic
- Older Americans now have twice as many STIs as a decade ago by /u/newsweek on March 28, 2024 at 1:08 pm
- Former Kansas City Chiefs cheerleader Krystal Anderson dies from sepsis after giving birth by /u/zsreport on March 28, 2024 at 10:54 am
- Here's what to know about dengue, as Puerto Rico declares a public health emergency by /u/Maxcactus on March 28, 2024 at 10:50 am
- HIV cure nearer with way to "shock and kill" latent virus by /u/newsweek on March 28, 2024 at 10:05 am
- ‘The craving is just not there’: How Ozempic is affecting snacking culture - National | Globalnews.ca by /u/Bean_Tiger on March 28, 2024 at 8:47 am
Today I Learned (TIL): You learn something new every day; what did you learn today? Submit interesting and specific facts about something that you just found out here.
- TIL after murdering a woman in 1821, convicted killer John Horwood was hanged, had his body dissected, and his skin used to bind a book that contained the details of his crime, in a practice called anthropodermic bibliopegy. by /u/AudibleNod on March 28, 2024 at 1:50 pm
- TIL in the year 2003, Maywood Chemical Works — now owned by Stepan Company — imported more than 385,000 pounds of coca leaf for Coca-Cola, enough to make $200 million of cocaine, all of which legally had to be destroyed, likely by incineration. by /u/Pappyjang on March 28, 2024 at 1:13 pm
- TIL that bed bugs have no courtship rituals. What they have instead is a type of mating behavior called traumatic insemination. by /u/bashfulstandpoint on March 28, 2024 at 12:19 pm
- TIL the "Red Phone", which linked Washington to Moscow, wasn't red, or even a phone. It was a teletype when the system started in 1963, switched to fax in 1986, and has been secure email since 2008. by /u/-nhops- on March 28, 2024 at 11:59 am
- TIL about Walter F. White, an NAACP leader for over 25 years who passed as white, infiltrated lynching rings, and architected Brown v. Board of Education. Despite controversy surrounding his methods, his work exposed injustices and advanced civil rights. by /u/bobbyioaloha on March 28, 2024 at 7:46 am
Reddit Science: This community is a place to share and discuss new scientific research. Read about the latest advances in astronomy, biology, medicine, physics, social science, and more. Find and submit new publications and popular science coverage of current research.
- Dogs can be trained to detect the scent of trauma reactions and stress in people's breath, according to international researchers, who say this could make PTSD assistance dogs more effective. by /u/chrisdh79 on March 28, 2024 at 12:48 pm
- Living on a greener street or having views of blue spaces from your home will help you sleep for longer, finds new research across 18 countries. Lack of sleep, defined as fewer than 6 hours a night, is a significant public health issue in industrialised countries, affecting around 16% of UK adults. by /u/mvea on March 28, 2024 at 11:59 am
- Applying Polygenic Predictors of Musical Ability to Beethoven's Genome Incorrectly Predicts Beethoven to Have Been a Below-Average Musician by /u/Imperio_do_Interior on March 28, 2024 at 11:52 am
- Taking certain common contraceptive hormones for a year or more is associated with an increased risk of developing a brain tumor that requires surgical removal, according to a new study | The findings highlight the need for women to review their contraceptive medications regularly. by /u/chrisdh79 on March 28, 2024 at 11:22 am
- A component of the aromatic spice cinnamon caused hair follicles to sprout in the lab, with researchers now set on developing a novel treatment to reverse hair loss through the use of natural compounds. by /u/chrisdh79 on March 28, 2024 at 11:20 am
Reddit Sports: Sports news and highlights from the NFL, NBA, NHL, MLB, MLS, and leagues around the world.
- LeBron James being 'strategic' with health vs. Lakers' seeding by /u/PrincessBananas85 on March 28, 2024 at 1:38 pm
- Stephen Curry 'lets out steam,' lifts Dubs after Green tossed by /u/PrincessBananas85 on March 28, 2024 at 7:42 am
- Sunrisers hit highest total in IPL history by /u/Huge-Physics5491 on March 28, 2024 at 4:13 am
- Isaiah Joe puts Jeff Green on a poster with a monster dunk by /u/BCLetsRide69 on March 28, 2024 at 3:50 am
- Draymond ejected early in 1st quarter vs. Magic by /u/Oldtimer_2 on March 28, 2024 at 1:27 am