DjamgaMind: Audio Intelligence for the C-Suite (Energy, Healthcare, Finance)
Are you drowning in dense legal text? DjamgaMind is the new audio intelligence platform that turns 100-page healthcare or energy mandates into 5-minute executive briefings. Whether you are navigating Bill C-27 (Canada) or the CMS-0057-F Interoperability Rule (USA), our AI agents decode the liability so you don’t have to. 👉 Start your specialized audio briefing today at Djamgamind.com
AI Jobs and Career
I wanted to share an exciting opportunity for those of you looking to advance your careers in the AI space. You know how rapidly the landscape is evolving, and finding the right fit can be a challenge. That's why I'm excited about Mercor – they're a platform specifically designed to connect top-tier AI talent with leading companies. Whether you're a data scientist, machine learning engineer, or something else entirely, Mercor can help you find your next big role. If you're ready to take the next step in your AI career, check them out through my referral link: https://work.mercor.com/?referralCode=82d5f4e3-e1a3-4064-963f-c197bb2c8db1. It's a fantastic resource, and I encourage you to explore the opportunities they have available.
- Full Stack Engineer [$150K-$220K]
- Software Engineer, Tooling & AI Workflow, Contract [$90/hour]
- DevOps Engineer, India, Contract [$90/hour]
- More AI Jobs Opportunities here
| Job Title | Status | Pay |
|---|---|---|
| Full-Stack Engineer | Strong match, Full-time | $150K - $220K / year |
| Developer Experience and Productivity Engineer | Pre-qualified, Full-time | $160K - $300K / year |
| Software Engineer - Tooling & AI Workflows (Contract) | Contract | $90 / hour |
| DevOps Engineer (India) | Full-time | $20K - $50K / year |
| Senior Full-Stack Engineer | Full-time | $2.8K - $4K / week |
| Enterprise IT & Cloud Domain Expert - India | Contract | $20 - $30 / hour |
| Senior Software Engineer | Contract | $100 - $200 / hour |
| Senior Software Engineer | Pre-qualified, Full-time | $150K - $300K / year |
| Senior Full-Stack Engineer: Latin America | Full-time | $1.6K - $2.1K / week |
| Software Engineering Expert | Contract | $50 - $150 / hour |
| Generalist Video Annotators | Contract | $45 / hour |
| Generalist Writing Expert | Contract | $45 / hour |
| Editors, Fact Checkers, & Data Quality Reviewers | Contract | $50 - $60 / hour |
| Multilingual Expert | Contract | $54 / hour |
| Mathematics Expert (PhD) | Contract | $60 - $80 / hour |
| Software Engineer - India | Contract | $20 - $45 / hour |
| Physics Expert (PhD) | Contract | $60 - $80 / hour |
| Finance Expert | Contract | $150 / hour |
| Designers | Contract | $50 - $70 / hour |
| Chemistry Expert (PhD) | Contract | $60 - $80 / hour |
What are the Top 200 AWS and Google Certified Machine Learning Specialty Questions and Answers Dumps?
This blog is the best way to prepare for your upcoming AWS Certified Machine Learning Specialty and Google Certified Professional Machine Learning Engineer exams. With over 100 questions and answers, it provides quizzes that are very similar to the real exams, with the option to show and hide answers. It also includes machine learning interview questions with detailed answers, as well as cheat sheets and illustrations, to make sure you are well-prepared for your AWS Certified Machine Learning Specialty exam.

The typical Google Machine Learning Engineer salary is $147,218. Machine Learning Engineer salaries at Google can range from $110,000 – $152,183.
Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
- It was predicted that by the end of 2020, 85% of customer interactions would be handled without a human (call centers, chatbots, etc.)
- 61% of marketers say artificial intelligence is the most important aspect of their data strategy.
- 80% of business and tech leaders say AI already boosts productivity (Robotic Process Automation, Power Automate, etc.)
- Current AI technology can boost business productivity by up to 40%
AWS Machine Learning Certification Specialty Exam Prep for iOS, Android, Windows 10/11

GCP Professional Machine Learning Engineer for iOS, Android, Windows 10/11
Quizzes, Practice Exams: Framing, Architecting, Designing, Developing ML Problems & Solutions, ML Jobs Interview Q&A

Azure AI Fundamentals AI-900 Exam Prep App for iOS, Android, Windows 10/11
Basic and Advanced Machine Learning Quizzes on Azure, Azure Machine Learning Job Interview Questions and Answers, ML Cheat Sheets

Machine Learning For Dummies App for iOS, Android, Windows 10/11
Use this app to learn about machine learning and elevate your brain with machine learning quizzes, cheat sheets, and ML job interview questions and answers, updated daily.
AI-Powered Professional Certification Quiz Platform
Web | iOS | Android | Windows
Are you passionate about AI and looking for your next career challenge? In the fast-evolving world of artificial intelligence, connecting with the right opportunities can make all the difference. We're excited to recommend Mercor, a premier platform dedicated to bridging the gap between exceptional AI professionals and innovative companies.
Whether you're seeking roles in machine learning, data science, or other cutting-edge AI fields, Mercor offers a streamlined path to your ideal position. Explore the possibilities and accelerate your AI career by visiting Mercor through our exclusive referral link:
Find Your AI Dream Job on Mercor
Your next big opportunity in AI could be just a click away!

What does a Professional Machine Learning Engineer do?
A Professional Machine Learning Engineer designs, builds, and productionizes ML models to solve business challenges using Google Cloud technologies and knowledge of proven ML models and techniques. The ML Engineer collaborates closely with other job roles to ensure long-term success of models. The ML Engineer should be proficient in all aspects of model architecture, data pipeline interaction, and metrics interpretation. The ML Engineer needs familiarity with application development, infrastructure management, data engineering, and security. Through an understanding of training, retraining, deploying, scheduling, monitoring, and improving models, they design and create scalable solutions for optimal performance.
The AWS Certified Machine Learning – Specialty certification is intended for individuals who perform a development or data science role. It validates a candidate’s ability to design, implement, deploy, and maintain machine learning (ML) solutions for given business problems.
This blog covers Machine Learning 101, Top 20 AWS Certified Machine Learning Specialty Questions and Answers, Top 20 Google Professional Machine Learning Engineer Sample Questions, Machine Learning Quizzes, Machine Learning Q&A, Top 10 Machine Learning Algorithms, Machine Learning Latest Hot News, and Machine Learning Demos (e.g., TensorFlow demos).
Below are the Top 100 AWS Certified Machine Learning Specialty Questions and Answers Dumps.
Question1: A machine learning team has several large CSV datasets in Amazon S3. Historically, models built with the Amazon SageMaker Linear Learner algorithm have taken hours to train on similar-sized datasets. The team’s leaders need to accelerate the training process. What can a machine learning specialist do to address this concern?
A) Use Amazon SageMaker Pipe mode.
B) Use Amazon Machine Learning to train the models.
C) Use Amazon Kinesis to stream the data to Amazon SageMaker.
D) Use AWS Glue to transform the CSV dataset to the JSON format.
ANSWER1:
Notes/Hint1:
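Question 1 turns on how SageMaker delivers training data to the algorithm container. In File mode, the dataset is copied from S3 to the training instance before training starts; in Pipe mode, SageMaker streams the data from S3 into the container as training runs, which can shorten startup time for large datasets. The sketch below only builds a request dictionary for the SageMaker `CreateTrainingJob` API with `TrainingInputMode` set to `Pipe`; it is not sent anywhere. The job name, image URI, role ARN, bucket paths, instance type, and the `predictor_type` value are hypothetical placeholders, and you should confirm in the algorithm's documentation which content types it accepts in Pipe mode.

```python
def build_pipe_mode_training_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    train_s3_uri: str,
    output_s3_uri: str,
) -> dict:
    """Build a CreateTrainingJob request that streams training data (Pipe mode).

    All values are caller-supplied placeholders; nothing is sent to AWS here.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            # "Pipe" streams data from S3 into the container; "File" copies it first.
            "TrainingInputMode": "Pipe",
        },
        "RoleArn": role_arn,
        # Hypothetical hyperparameter choice; hyperparameter values are strings.
        "HyperParameters": {"predictor_type": "binary_classifier"},
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                "ContentType": "text/csv",
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",  # hypothetical instance choice
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


if __name__ == "__main__":
    request = build_pipe_mode_training_request(
        job_name="linear-learner-pipe-demo",  # hypothetical
        image_uri="<linear-learner-image-uri>",  # hypothetical placeholder
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",  # hypothetical
        train_s3_uri="s3://example-bucket/train/",  # hypothetical
        output_s3_uri="s3://example-bucket/output/",  # hypothetical
    )
    print(request["AlgorithmSpecification"]["TrainingInputMode"])  # Pipe
```

With AWS credentials configured, a dictionary like this could be passed as `boto3.client("sagemaker").create_training_job(**request)`.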
Question 2) A local university wants to track cars in a parking lot to determine which students are parking in the lot. The university wants to ingest videos of the cars parking in near-real time, use machine learning to identify license plates, and store that data in an AWS data store. Which solution meets these requirements with the LEAST amount of development effort?
A) Use Amazon Kinesis Data Streams to ingest the video in near-real time, use the Kinesis Data Streams consumer integrated with Amazon Rekognition Video to process the license plate information, and then store results in DynamoDB.
B) Use Amazon Kinesis Video Streams to ingest the videos in near-real time, use the Kinesis Video Streams integration with Amazon Rekognition Video to identify the license plate information, and then store the results in DynamoDB.
C) Use Amazon Kinesis Data Streams to ingest videos in near-real time, call Amazon Rekognition to identify license plate information, and then store results in DynamoDB.
D) Use Amazon Kinesis Firehose to ingest the video in near-real time and output the results to S3. Set up a Lambda function that triggers when a new video is PUT onto S3 to send results to Amazon Rekognition to identify license plate information, and then store results in DynamoDB.
Answer 2)
Notes/Hint2)
Question 3) A term frequency–inverse document frequency (tf–idf) matrix using both unigrams and bigrams is built from a text corpus consisting of the following two sentences:
ANSWER3:
Notes/Hint3:
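The two sentences that Question 3 refers to are not reproduced here, so the sketch below uses a hypothetical two-sentence corpus. It builds a tf–idf matrix over unigrams and bigrams in plain Python: one row per document and one column per distinct unigram or bigram, so the shape is (number of documents, number of distinct unigrams + bigrams). The tokenizer (lowercase, letters and apostrophes only) and the unsmoothed idf of log(N / df) are simplifying assumptions; libraries such as scikit-learn use smoothed variants, but the weighting choice does not change the matrix shape.

```python
import math
import re
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return the space-joined n-grams of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_unigram_bigram(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Build a tf-idf matrix (rows = documents, columns = unigrams + bigrams)."""
    # Simplified tokenizer: lowercase and keep runs of letters/apostrophes.
    tokenized = [re.findall(r"[a-z']+", d.lower()) for d in docs]
    grams_per_doc = [ngrams(t, 1) + ngrams(t, 2) for t in tokenized]

    # Vocabulary: every distinct unigram and bigram, in sorted order.
    vocab = sorted({g for grams in grams_per_doc for g in grams})

    # Document frequency: the number of documents each n-gram appears in.
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    n_docs = len(docs)

    matrix = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        # tf = count / total n-grams in the document; idf = log(N / df).
        row = [
            (counts[term] / total) * math.log(n_docs / df[term]) if total else 0.0
            for term in vocab
        ]
        matrix.append(row)
    return vocab, matrix


if __name__ == "__main__":
    # Hypothetical corpus; the question's actual sentences are not shown.
    docs = ["The cat sat", "The dog sat down"]
    vocab, matrix = tfidf_unigram_bigram(docs)
    print((len(matrix), len(vocab)))  # (2, 10)
```

Here the corpus has 5 distinct unigrams and 5 distinct bigrams, giving a 2 × 10 matrix; "the" and "sat" get weight 0 because they appear in every document.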
Question 4: A company is setting up a system to manage all of the datasets it stores in Amazon S3. The company would like to automate running transformation jobs on the data and maintaining a catalog of the metadata concerning the datasets. The solution should require the least amount of setup and maintenance. Which solution will allow the company to achieve its goals?
ANSWER4:
Notes/Hint4:
Question 5) Which service in the Kinesis family allows you to easily load streaming data into data stores and analytics tools?
ANSWER5:
Notes/Hint5:
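Question 5 is about loading streaming data into data stores. On the producer side of Kinesis Data Firehose, the `PutRecordBatch` API accepts up to 500 records per call, and Firehose does not insert delimiters between records, so producers commonly add one themselves. The sketch below serializes events as newline-delimited JSON and sends them in batches of at most 500. The client is injected (for example `boto3.client("firehose")`) so the logic can be exercised with a stub; the stream name is up to the caller, and the 4 MiB per-call size limit and retries of failed records are deliberately not handled.

```python
import json
from typing import Any, Iterable

MAX_RECORDS_PER_BATCH = 500  # PutRecordBatch limit on records per call


def to_firehose_records(events: Iterable[dict]) -> list[dict]:
    """Serialize events as newline-delimited JSON, one Firehose record each.

    Firehose does not add delimiters between records, so a trailing newline
    keeps the objects it writes (for example, to S3) line-separated.
    """
    return [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]


def send_in_batches(client: Any, stream_name: str, events: Iterable[dict]) -> int:
    """Send events with put_record_batch; return how many records failed.

    This sketch does not enforce the 4 MiB per-call size limit or retry failures.
    """
    records = to_firehose_records(events)
    failed = 0
    for start in range(0, len(records), MAX_RECORDS_PER_BATCH):
        batch = records[start:start + MAX_RECORDS_PER_BATCH]
        response = client.put_record_batch(DeliveryStreamName=stream_name, Records=batch)
        failed += response.get("FailedPutCount", 0)
    return failed
```

A non-zero return value signals that some records should be resent; the per-record `RequestResponses` in each response identify which ones.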
Question 14) You have been tasked with capturing two different types of streaming events. The first event type includes mission-critical data that must be processed immediately before operations can continue. The second event type includes less important data, and operations can continue without processing it immediately. What is the most appropriate solution to record these different types of events?
Answer 14)
Notes 14)
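The trade-off behind Question 14 is producer latency versus throughput. The Kinesis Producer Library buffers records before sending them (controlled by its `RecordMaxBufferedTime` setting), which improves throughput but adds delay, whereas a direct `PutRecord` call through the AWS SDK writes one record right away. The sketch below routes events on that basis: events flagged as critical are written immediately with `put_record`, while the rest are buffered and flushed with `put_records`. The `critical` flag, the `source_id` partition-key field, and the buffer size are hypothetical choices, and the Kinesis client is injected (for example `boto3.client("kinesis")`). It does not reproduce the KPL's own aggregation or retry behavior.

```python
import json
from typing import Any

BUFFER_LIMIT = 100  # hypothetical flush threshold (PutRecords accepts up to 500 records)


class EventRouter:
    """Write critical events immediately; buffer the rest and send them in batches."""

    def __init__(self, client: Any, stream_name: str, buffer_limit: int = BUFFER_LIMIT):
        self.client = client
        self.stream_name = stream_name
        self.buffer_limit = buffer_limit
        self.buffer: list[dict] = []

    def record(self, event: dict) -> None:
        data = json.dumps(event).encode("utf-8")
        key = str(event.get("source_id", "default"))  # hypothetical partition-key field
        if event.get("critical"):
            # Synchronous single write: the caller waits until Kinesis accepts it.
            self.client.put_record(StreamName=self.stream_name, Data=data, PartitionKey=key)
        else:
            self.buffer.append({"Data": data, "PartitionKey": key})
            if len(self.buffer) >= self.buffer_limit:
                self.flush()

    def flush(self) -> None:
        """Send buffered, non-critical events in one PutRecords call."""
        if self.buffer:
            self.client.put_records(StreamName=self.stream_name, Records=self.buffer)
            self.buffer = []
```

Call `flush()` on shutdown (or on a timer) so buffered low-priority events are not left unsent.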
Question 15) You are collecting clickstream data from an e-commerce website to make near-real time product suggestions for users actively using the site. Which combination of tools can be used to achieve the quickest recommendations and meet all of the requirements?
Answer 15)
Notes 15)
Question 16) Which service built by AWS makes it easy to set up a retry mechanism, aggregate records to improve throughput, and automatically submit CloudWatch metrics?
Answer 16)
Notes 16)
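Question 16 names three producer-side features: retries, record aggregation, and CloudWatch metrics. To show what the first one involves when writing with the plain AWS SDK: a `PutRecords` call can partially succeed, with the response reporting a `FailedRecordCount` and, in its `Records` list (in the same order as the request), an `ErrorCode` on each failed entry. The sketch below resends only the failed records with exponential backoff. The attempt count and delays are hypothetical defaults, the client is injected (for example `boto3.client("kinesis")`), and aggregation and metrics are not covered.

```python
import time
from typing import Any, Callable


def put_records_with_retry(
    client: Any,
    stream_name: str,
    records: list[dict],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Send records with PutRecords, retrying only the entries that failed.

    Returns the records that still failed after max_attempts (empty on success).
    """
    pending = records
    for attempt in range(max_attempts):
        response = client.put_records(StreamName=stream_name, Records=pending)
        if response.get("FailedRecordCount", 0) == 0:
            return []
        # Result entries line up with request entries; failed ones carry an ErrorCode.
        pending = [
            rec for rec, result in zip(pending, response["Records"])
            if "ErrorCode" in result
        ]
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # exponential backoff before retrying
    return pending
```

Injecting `sleep` keeps the backoff testable without real delays.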
Question 17) You have been tasked with capturing data from an online gaming platform to run analytics on and process through a machine learning pipeline. The data you are ingesting is the players' controller inputs, captured every second (up to 10 players in a game) in JSON format. The data needs to be ingested through Kinesis Data Streams, and each JSON data blob is 100 KB in size. What is the minimum number of shards you can use to successfully ingest this data?
Answer 17)
Notes 17)
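The arithmetic behind Question 17 can be checked against the per-shard write limits of Kinesis Data Streams: each shard accepts up to 1,000 records per second and up to 1 MB of data per second. The function below computes the shards needed for each limit, takes the larger, and rounds up. It assumes each player produces one 100 KB record per second and treats 1 MB as 1,000 KB, as the question's round numbers suggest; both are assumptions of this sketch.

```python
import math

# Per-shard write limits for Kinesis Data Streams.
SHARD_WRITE_KB_PER_SEC = 1000  # 1 MB/s, treating 1 MB as 1,000 KB (assumption)
SHARD_WRITE_RECORDS_PER_SEC = 1000


def min_shards(records_per_sec: float, record_kb: float) -> int:
    """Smallest shard count that satisfies both the throughput and record-rate limits."""
    by_throughput = (records_per_sec * record_kb) / SHARD_WRITE_KB_PER_SEC
    by_record_rate = records_per_sec / SHARD_WRITE_RECORDS_PER_SEC
    return max(1, math.ceil(max(by_throughput, by_record_rate)))


# Question 17: up to 10 players, one 100 KB record per player per second.
print(min_shards(records_per_sec=10, record_kb=100))  # 1
```

With 11 players, the same record size would need 1,100 KB/s and therefore 2 shards.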
Question 18) Which service in the Kinesis family allows you to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time?
Answer 18)
Notes 18)
Question 19) You are an ML specialist who needs to collect data from Twitter. Your goal is to collect tweets that include the name of your company, keep only the company name and the tweet body, and store the result in a data store in AWS. What set of tools can you use to stream, transform, and load the data into AWS with the LEAST amount of effort?
Answer 19)
Notes 19)
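For Question 19, Kinesis Data Firehose can invoke an AWS Lambda function to transform records before delivery. The function receives a batch of base64-encoded records, each with a `recordId`, and returns every `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed` plus base64-encoded `data`. The sketch below keeps only tweets that mention the company and reduces each to the company name and tweet body. The company name `ExampleCorp` and the input field `text` are hypothetical assumptions about the incoming JSON.

```python
import base64
import json

COMPANY_NAME = "ExampleCorp"  # hypothetical company name to filter on


def transform(record: dict) -> dict:
    """Transform one Firehose record into the shape Firehose expects back."""
    record_id = record["recordId"]
    try:
        tweet = json.loads(base64.b64decode(record["data"]))
        body = tweet["text"]  # assumed field holding the tweet body
        if not isinstance(body, str):
            raise ValueError("tweet body is not a string")
    except (ValueError, KeyError, TypeError):
        # Payloads that cannot be parsed are reported back as failures.
        return {"recordId": record_id, "result": "ProcessingFailed", "data": record["data"]}

    if COMPANY_NAME.lower() not in body.lower():
        # "Dropped" tells Firehose the record was filtered out on purpose.
        return {"recordId": record_id, "result": "Dropped", "data": record["data"]}

    # Keep only the company name and tweet body, newline-delimited for S3.
    trimmed = json.dumps({"company": COMPANY_NAME, "body": body}) + "\n"
    encoded = base64.b64encode(trimmed.encode("utf-8")).decode("utf-8")
    return {"recordId": record_id, "result": "Ok", "data": encoded}


def lambda_handler(event, context):
    """Entry point Firehose invokes with a batch of base64-encoded records."""
    return {"records": [transform(r) for r in event["records"]]}
```

The substring match is deliberately simple; a real filter might also check hashtags or mentions.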
Question 20) Which service in the Kinesis family allows you to build custom applications that process or analyze streaming data for specialized needs?
Answer 20)
Notes 20)
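Question 20 concerns custom applications that read a stream. A minimal polling consumer for Kinesis Data Streams calls `GetShardIterator` to choose a starting position in a shard, then calls `GetRecords` repeatedly, following `NextShardIterator` until the shard is closed. The sketch below reads a bounded number of batches from one shard and passes each record's `Data` to a callback. The shard ID, batch limit, poll interval, and callback are hypothetical, and the client is injected (for example `boto3.client("kinesis")`); a production consumer would more likely use the Kinesis Client Library, which manages shard leases, checkpointing, and resharding.

```python
import time
from typing import Any, Callable


def poll_shard(
    client: Any,
    stream_name: str,
    shard_id: str,
    handle: Callable[[bytes], None],
    max_batches: int = 10,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Read up to max_batches GetRecords batches from one shard; return records handled."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest available record
    )["ShardIterator"]

    handled = 0
    for _ in range(max_batches):
        if iterator is None:  # no next iterator: the shard has been closed
            break
        response = client.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            handle(record["Data"])
            handled += 1
        iterator = response.get("NextShardIterator")
        sleep(poll_interval)  # pause between calls to stay under per-shard read limits
    return handled
```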
Question21:
Answer21:
Notes 21:
Question22:
Answer22:
Notes 22:
Question23:
Answer23:
Notes 23:
Question24:
Answer24:
Notes 24:
What are the Top 100 AWS and Google Certified Machine Learning Specialty Questions and Answers Dumps?
This blog is the best way is the best way to prepare for your upcoming AWS Certified Machine Learning Specialty and Google Certified Professional Machine Learning Engineer exam. With over 100 questions and answers, this blog provides quizzes similar that are very similar to the real exam. It also includes the option to show and hide answers. Additionally, there are machine learning interview questions and detailed answers, as well as cheat sheets and illustrations. This blog is the best way to make sure you are well-prepared for your AWS Certified Machine Learning Specialty Exam.
The typical Google Machine Learning Engineer salary is $147,218. Machine Learning Engineer salaries at Google can range from $110,000 – $152,183.
Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
- By the end of 2020, 85% of customer interactions will be handled without a human (Call Center, Chatbot, etc…)
- 61% of marketers say artificial intelligence is the most important aspect of their data strategy.
- 80% of business and tech leaders say AI already boosts productivity (Robotic Process Automation, Power Automate, etc..)
- Current AI technology can boost business productivity by up to 40%
AWS Machine Learning Certification Specialty Exam Prep for iOs Android Windows10/11

GCP Professional Machine Learning Engineer for iOs, Android, Windows 10/11
Quizzes, Practice Exams: Framing, Architecting, Designing, Developing ML Problems & Solutions, ML Jobs Interview Q&A

Azure AI Fundamentals AI-900 Exam Prep App for iOS, Android, Windows10/11
Basics and Advanced Machine Learning Quizzes on Azure, Azure Machine Learning Job Interviews Questions and Answer, ML Cheat Sheets

Machine Learning For Dummies App for iOs, Android, Windows10/11
Use this App to learn about Machine Learning and Elevate your Brain with Machine Learning Quizzes, Cheat Sheets, Ml Jobs Interview Questions and Answers updated daily.

What does a Professional Machine Learning Engineer do?
A Professional Machine Learning Engineer designs, builds, and productionizes ML models to solve business challenges using Google Cloud technologies and knowledge of proven ML models and techniques. The ML Engineer collaborates closely with other job roles to ensure long-term success of models. The ML Engineer should be proficient in all aspects of model architecture, data pipeline interaction, and metrics interpretation. The ML Engineer needs familiarity with application development, infrastructure management, data engineering, and security. Through an understanding of training, retraining, deploying, scheduling, monitoring, and improving models, they design and create scalable solutions for optimal performance.
The AWS Certified Machine Learning – Specialty certification is intended for individuals who perform a development or data science role. It validates a candidate’s ability to design, implement, deploy, and maintain machine learning (ML) solutions for given business problems.
This blog covers Machine Learning 101, Top 20 AWS Certified Machine Learning Specialty Questions and Answers, Top 20 Google Professional Machine Learning Engineer Sample Questions, Machine Learning Quizzes, Machine Learning Q&A, Top 10 Machine Learning Algorithms, Machine Learning Latest Hot News, Machine Learning Demos (Ex: Tensorflow Demos)
Question1: A machine learning team has several large CSV datasets in Amazon S3. Historically, models built with the Amazon SageMaker Linear Learner algorithm have taken hours to train on similar-sized datasets. The team’s leaders need to accelerate the training process. What can a machine learning specialist do to address this concern?
A) Use Amazon SageMaker Pipe mode.
B) Use Amazon Machine Learning to train the models.
C) Use Amazon Kinesis to stream the data to Amazon SageMaker.
D) Use AWS Glue to transform the CSV dataset to the JSON format.
ANSWER1:
Notes/Hint1:
Question 2) A local university wants to track cars in a parking lot to determine which students are parking in the lot. The university is wanting to ingest videos of the cars parking in near-real time, use machine learning to identify license plates, and store that data in an AWS data store. Which solution meets these requirements with the LEAST amount of development effort?
A) Use Amazon Kinesis Data Streams to ingest the video in near-real time, use the Kinesis Data Streams consumer integrated with Amazon Rekognition Video to process the license plate information, and then store results in DynamoDB.
B) Use Amazon Kinesis Video Streams to ingest the videos in near-real time, use the Kinesis Video Streams integration with Amazon Rekognition Video to identify the license plate information, and then store the results in DynamoDB.
C) Use Amazon Kinesis Data Streams to ingest videos in near-real time, call Amazon Rekognition to identify license plate information, and then store results in DynamoDB.
D) Use Amazon Kinesis Firehose to ingest the video in near-real time and outputs results onto S3. Set up a Lambda function that triggers when a new video is PUT onto S3 to send results to Amazon Rekognition to identify license plate information, and then store results in DynamoDB.
Answer 2)
Notes/Hint2)
Question 3) A term frequency–inverse document frequency (tf–idf) matrix using both unigrams and bigrams is built from a text corpus consisting of the following two sentences:
ANSWER3:
Notes/Hint3:
Question 4: A company is setting up a system to manage all of the datasets it stores in Amazon S3. The company would like to automate running transformation jobs on the data and maintaining a catalog of the metadata concerning the datasets. The solution should require the least amount of setup and maintenance. Which solution will allow the company to achieve its goals?
ANSWER4:
Notes/Hint4:
Question 5) Which service in the Kinesis family allows you to easily load streaming data into data stores and analytics tools?
ANSWER5:
Notes/Hint5:
Notes 6)
Notes/Hint 8)
Answer 9)
Notes 9)
Answer 10)
Answer 11)
Notes 11)
Notes 12)
Answer 13)
Notes 13)
Question 14) You have been tasked with capturing two different types of streaming events. The first event type includes mission-critical data that needs to immediately be processed before operations can continue. The second event type includes data of less importance, but operations can continue without immediately processing. What is the most appropriate solution to record these different types of events?
Answer 14)
Notes 14)
Question 15) You are collecting clickstream data from an e-commerce website to make near-real time product suggestions for users actively using the site. Which combination of tools can be used to achieve the quickest recommendations and meets all of the requirements?
Answer 15)
Notes 15)
Question 16) Which service built by AWS makes it easy to set up a retry mechanism, aggregate records to improve throughput, and automatically submits CloudWatch metrics?
Answer 16)
Notes 16)
[appbox appstore 1611045854-iphone screenshots]
[appbox microsoftstore 9n8rl80hvm4t-mobile screenshots]
Question 17) You have been tasked with capturing data from an online gaming platform to run analytics on and process through a machine learning pipeline. The data that you are ingesting is players controller inputs every 1 second (up to 10 players in a game) that is in JSON format. The data needs to be ingested through Kinesis Data Streams and the JSON data blob is 100 KB in size. What is the minimum number of shards you can use to successfully ingest this data?
Answer 17)
Notes 17)
Question 18) Which services in the Kinesis family allows you to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time?
Answer 18)
Notes 18)
Question 19) You are a ML specialist needing to collect data from Twitter tweets. Your goal is to collect tweets that include only the name of your company and the tweet body, and store it off into a data store in AWS. What set of tools can you use to stream, transform, and load the data into AWS with the LEAST amount of effort?
Answer 19)
Notes 19)
Question 20) Which service in the Kinesis family allows you to build custom applications that process or analyze streaming data for specialized needs?
Answer 20)
Notes 20)
Question21:
Answer21:
What are the Top 100 AWS and Google Certified Machine Learning Specialty Questions and Answers Dumps?
This blog is the best way is the best way to prepare for your upcoming AWS Certified Machine Learning Specialty and Google Certified Professional Machine Learning Engineer exam. With over 100 questions and answers, this blog provides quizzes similar that are very similar to the real exam. It also includes the option to show and hide answers. Additionally, there are machine learning interview questions and detailed answers, as well as cheat sheets and illustrations. This blog is the best way to make sure you are well-prepared for your AWS Certified Machine Learning Specialty Exam.
The typical Google Machine Learning Engineer salary is $147,218. Machine Learning Engineer salaries at Google can range from $110,000 – $152,183.
Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
- By the end of 2020, 85% of customer interactions will be handled without a human (Call Center, Chatbot, etc…)
- 61% of marketers say artificial intelligence is the most important aspect of their data strategy.
- 80% of business and tech leaders say AI already boosts productivity (Robotic Process Automation, Power Automate, etc..)
- Current AI technology can boost business productivity by up to 40%
AWS Machine Learning Certification Specialty Exam Prep for iOS, Android, Windows 10/11

GCP Professional Machine Learning Engineer for iOS, Android, Windows 10/11
Quizzes, Practice Exams: Framing, Architecting, Designing, Developing ML Problems & Solutions, ML Jobs Interview Q&A

Azure AI Fundamentals AI-900 Exam Prep App for iOS, Android, Windows 10/11
Basic and Advanced Machine Learning Quizzes on Azure, Azure Machine Learning Job Interview Questions and Answers, ML Cheat Sheets

Machine Learning For Dummies App for iOS, Android, Windows 10/11
Use this app to learn about machine learning and elevate your brain with machine learning quizzes, cheat sheets, and ML job interview questions and answers, updated daily.

What does a Professional Machine Learning Engineer do?
A Professional Machine Learning Engineer designs, builds, and productionizes ML models to solve business challenges using Google Cloud technologies and knowledge of proven ML models and techniques. The ML Engineer collaborates closely with other job roles to ensure long-term success of models. The ML Engineer should be proficient in all aspects of model architecture, data pipeline interaction, and metrics interpretation. The ML Engineer needs familiarity with application development, infrastructure management, data engineering, and security. Through an understanding of training, retraining, deploying, scheduling, monitoring, and improving models, they design and create scalable solutions for optimal performance.
The AWS Certified Machine Learning – Specialty certification is intended for individuals who perform a development or data science role. It validates a candidate’s ability to design, implement, deploy, and maintain machine learning (ML) solutions for given business problems.
This blog covers Machine Learning 101, the Top 20 AWS Certified Machine Learning Specialty Questions and Answers, the Top 20 Google Professional Machine Learning Engineer Sample Questions, Machine Learning Quizzes, Machine Learning Q&A, the Top 10 Machine Learning Algorithms, the latest Machine Learning news, and Machine Learning demos (e.g., TensorFlow demos).
Question1: A machine learning team has several large CSV datasets in Amazon S3. Historically, models built with the Amazon SageMaker Linear Learner algorithm have taken hours to train on similar-sized datasets. The team’s leaders need to accelerate the training process. What can a machine learning specialist do to address this concern?
A) Use Amazon SageMaker Pipe mode.
B) Use Amazon Machine Learning to train the models.
C) Use Amazon Kinesis to stream the data to Amazon SageMaker.
D) Use AWS Glue to transform the CSV dataset to the JSON format.
ANSWER1:
Notes/Hint1:
Question 2) A local university wants to track cars in a parking lot to determine which students are parking in the lot. The university wants to ingest videos of the cars parking in near-real time, use machine learning to identify license plates, and store that data in an AWS data store. Which solution meets these requirements with the LEAST amount of development effort?
A) Use Amazon Kinesis Data Streams to ingest the video in near-real time, use the Kinesis Data Streams consumer integrated with Amazon Rekognition Video to process the license plate information, and then store results in DynamoDB.
B) Use Amazon Kinesis Video Streams to ingest the videos in near-real time, use the Kinesis Video Streams integration with Amazon Rekognition Video to identify the license plate information, and then store the results in DynamoDB.
C) Use Amazon Kinesis Data Streams to ingest videos in near-real time, call Amazon Rekognition to identify license plate information, and then store results in DynamoDB.
D) Use Amazon Kinesis Firehose to ingest the video in near-real time and output the results to S3. Set up a Lambda function that triggers when a new video is PUT onto S3 to send results to Amazon Rekognition to identify license plate information, and then store results in DynamoDB.
Answer 2)
Notes/Hint2)
Question 3) A term frequency–inverse document frequency (tf–idf) matrix using both unigrams and bigrams is built from a text corpus consisting of the following two sentences:
ANSWER3:
Notes/Hint3:
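The two sentences for Question 3 are missing from this page, so the sketch below uses two made-up sentences (an assumption, not the exam's corpus) to show how the dimensions of a unigram + bigram tf-idf matrix are determined: one row per document and one column per distinct unigram or bigram in the vocabulary. It uses the smoothed idf formula ln((1 + n) / (1 + df)) + 1, which is one common variant (scikit-learn's default), and skips row normalization; other tools weight terms differently, but the matrix shape comes out the same.

```python
import math
import re

def tokenize(text):
    """Lowercase and keep word characters only (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())

def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_matrix(docs, ngram_range=(1, 2)):
    # Collect every unigram and bigram term of each document.
    doc_terms = []
    for doc in docs:
        tokens = tokenize(doc)
        terms = []
        for n in range(ngram_range[0], ngram_range[1] + 1):
            terms.extend(ngrams(tokens, n))
        doc_terms.append(terms)

    # Vocabulary = every distinct term in the corpus, sorted for a stable column order.
    vocab = sorted({t for terms in doc_terms for t in terms})
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = {t: sum(t in terms for terms in doc_terms) for t in vocab}
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    # One row per document, one column per term: raw count times idf.
    matrix = [[terms.count(t) * idf[t] for t in vocab] for terms in doc_terms]
    return vocab, matrix

docs = ["The cat sat.", "The dog sat."]  # hypothetical corpus
vocab, matrix = tfidf_matrix(docs)
print(len(matrix), len(vocab))  # prints: 2 8
print(vocab)
```

With these sentences the vocabulary holds 4 unigrams (the, cat, sat, dog) and 4 bigrams (the cat, cat sat, the dog, dog sat), so the matrix is 2 x 8. The same counting applied to the exam's two sentences gives that question's dimensions.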
Question 4: A company is setting up a system to manage all of the datasets it stores in Amazon S3. The company would like to automate running transformation jobs on the data and maintaining a catalog of the metadata concerning the datasets. The solution should require the least amount of setup and maintenance. Which solution will allow the company to achieve its goals?
ANSWER4:
Notes/Hint4:
Question 5) Which service in the Kinesis family allows you to easily load streaming data into data stores and analytics tools?
ANSWER5:
Notes/Hint5:
Question 6) A data scientist is working on optimizing a model during the training process by varying multiple parameters. The data scientist observes that, during multiple runs with identical parameters, the loss function converges to different, yet stable, values. What should the data scientist do to improve the training process?
Notes 6)
Question 7) Your organization has a standalone JavaScript (Node.js) application that streams data into AWS using Kinesis Data Streams. You notice that it uses the Kinesis API (AWS SDK) rather than the Kinesis Producer Library (KPL). What might be the reasoning behind this?
Question 8) A data scientist is evaluating different binary classification models. A false positive result is 5 times more expensive (from a business perspective) than a false negative result. The models should be evaluated based on the following criteria:
Notes/Hint 8)
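The evaluation criteria for Question 8 did not survive on this page, but the cost asymmetry it describes can be made concrete. A minimal sketch, assuming a false negative costs 1 unit and a false positive 5 units (the 5x ratio from the question); the two confusion matrices are hypothetical numbers, not the exam's.

```python
FN_COST = 1.0            # assumed unit cost of a false negative
FP_COST = 5.0 * FN_COST  # a false positive is 5x more expensive (from the question)

def metrics(tp, fp, tn, fn):
    """Return recall, false positive rate, and total business cost for a confusion matrix."""
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    cost = fp * FP_COST + fn * FN_COST
    return recall, fpr, cost

# Hypothetical confusion matrices for two models on the same 1,000 test samples
# (100 actual positives, 900 actual negatives).
candidates = {
    "model_a": dict(tp=90, fp=40, tn=860, fn=10),
    "model_b": dict(tp=80, fp=10, tn=890, fn=20),
}

for name, cm in candidates.items():
    recall, fpr, cost = metrics(**cm)
    print(f"{name}: recall={recall:.2f} fpr={fpr:.3f} cost={cost:.0f}")

# Choose the model with the lowest total cost.
best = min(candidates, key=lambda name: metrics(**candidates[name])[2])
print("lowest cost:", best)
```

Model A has the higher recall, but its extra false positives make it three times as costly (210 vs. 70 units), which is why recall or false positive rate thresholds are usually combined with an explicit cost when the error types are priced differently.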
Question 9) A data scientist uses logistic regression to build a fraud detection model. While the model accuracy is 99%, 90% of the fraud cases are not detected by the model. What action will definitely help the model detect more than 10% of fraud cases?
Answer 9)
Notes 9)
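One lever relevant to Question 9 is the decision threshold: logistic regression outputs a probability, and a case is flagged as fraud only when that probability reaches a cutoff (commonly 0.5). Lowering the cutoff can only flag more cases, so recall on the fraud class cannot go down. A minimal sketch with hypothetical scores and labels:

```python
def recall_at(threshold, scores, labels):
    """Share of actual fraud cases (label 1) whose predicted probability is >= threshold."""
    flagged = [s >= threshold for s in scores]
    true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    return true_pos / sum(labels)

# Hypothetical predicted fraud probabilities and true labels (1 = fraud).
scores = [0.02, 0.10, 0.35, 0.48, 0.05, 0.62, 0.20, 0.01, 0.30, 0.15]
labels = [0,    1,    1,    1,    0,    1,    1,    0,    1,    0]

for t in (0.5, 0.3, 0.1):
    print(f"threshold={t}: recall={recall_at(t, scores, labels):.2f}")
```

Recall climbs from 1 of 6 fraud cases at 0.5 to all 6 at 0.1. The trade-off is more false positives (the legitimate case scored 0.15 is flagged at 0.1), so the cutoff is normally tuned against the cost of each error type.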
Question 10) A company is interested in building a fraud detection model. Currently, the data scientist does not have a sufficient amount of information due to the low number of fraud cases. Which method is MOST likely to detect the GREATEST number of valid fraud cases?
Answer 10)
Question 11) A machine learning engineer is preparing a data frame for a supervised learning task with the Amazon SageMaker Linear Learner algorithm. The ML engineer notices the target label classes are highly imbalanced and multiple feature columns contain missing values. The proportion of missing values across the entire data frame is less than 5%. What should the ML engineer do to minimize bias due to missing values?
Answer 11)
Notes 11)
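Rather than dropping rows or filling gaps with a constant such as the column mean, missing values can be estimated from the other features with a supervised model, which tends to distort the data less when features are correlated. A minimal sketch, assuming one numeric feature with gaps and one complete, linearly related feature (both hypothetical); a real pipeline would use a richer model and repeat this for each column with missing values.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    return mean_y - b * mean_x, b

def impute_with_regression(x_col, y_col):
    """Fill None entries in y_col by regressing y on x over the complete rows."""
    complete = [(x, y) for x, y in zip(x_col, y_col) if y is not None]
    a, b = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [y if y is not None else a + b * x for x, y in zip(x_col, y_col)]

def impute_with_mean(y_col):
    """Baseline: fill None entries with the mean of the observed values."""
    observed = [y for y in y_col if y is not None]
    mean = sum(observed) / len(observed)
    return [y if y is not None else mean for y in y_col]

# Hypothetical data: square footage (complete) and price in $1,000s (with gaps).
sqft = [800, 1000, 1200, 1500, 1800, 2000, 2400]
price = [160, 200, None, 300, None, 400, 480]

print(impute_with_regression(sqft, price))
print(impute_with_mean(price))
```

The mean fill puts the same value (308) into both gaps regardless of the house size, while the regression fill follows the relationship learned from the complete rows.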
Question 12) A company has collected customer comments on its products, rating them as safe or unsafe, using decision trees. The training dataset has the following features: id, date, full review, full review summary, and a binary safe/unsafe tag. During training, any data sample with missing features was dropped. In a few instances, the test set was found to be missing the full review text field. For this use case, which is the most effective course of action to address test data samples with missing features?
Notes 12)
Question 13) An insurance company needs to automate claim compliance reviews because human reviews are expensive and error-prone. The company has a large set of claims and a compliance label for each. Each claim consists of a few sentences in English, many of which contain complex related information. Management would like to use Amazon SageMaker built-in algorithms to design a machine learning supervised model that can be trained to read each claim and predict if the claim is compliant or not. Which approach should be used to extract features from the claims to be used as inputs for the downstream supervised task?
Answer 13)
Notes 13)
Question 14) You have been tasked with capturing two different types of streaming events. The first event type includes mission-critical data that needs to immediately be processed before operations can continue. The second event type includes data of less importance, but operations can continue without immediately processing. What is the most appropriate solution to record these different types of events?
Answer 14)
Notes 14)
Question 15) You are collecting clickstream data from an e-commerce website to make near-real time product suggestions for users actively using the site. Which combination of tools can be used to achieve the quickest recommendations and meet all of the requirements?
Answer 15)
Notes 15)
Question 16) Which service built by AWS makes it easy to set up a retry mechanism, aggregate records to improve throughput, and automatically submit CloudWatch metrics?
Answer 16)
Notes 16)
[appbox appstore 1611045854-iphone screenshots]
[appbox microsoftstore 9n8rl80hvm4t-mobile screenshots]
Question 17) You have been tasked with capturing data from an online gaming platform to run analytics on and process through a machine learning pipeline. The data that you are ingesting is the players' controller inputs every 1 second (up to 10 players in a game), in JSON format. The data needs to be ingested through Kinesis Data Streams, and the JSON data blob is 100 KB in size. What is the minimum number of shards you can use to successfully ingest this data?
Answer 17)
Notes 17)
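Question 17 comes down to the per-shard write limits of Kinesis Data Streams: each shard accepts up to 1 MB per second or 1,000 records per second. The helper below makes that arithmetic explicit; treating 1 MB as 1,024 KB is an assumption, and the result for this question is the same with 1,000 KB.

```python
import math

# Per-shard write limits for Kinesis Data Streams.
SHARD_MAX_KB_PER_SEC = 1024        # 1 MB/s, assuming 1 MB = 1,024 KB
SHARD_MAX_RECORDS_PER_SEC = 1000   # 1,000 records/s

def min_shards(records_per_sec, record_kb):
    """Smallest shard count that satisfies both the throughput and the record-rate limit."""
    by_throughput = math.ceil(records_per_sec * record_kb / SHARD_MAX_KB_PER_SEC)
    by_record_rate = math.ceil(records_per_sec / SHARD_MAX_RECORDS_PER_SEC)
    return max(1, by_throughput, by_record_rate)

# Question 17: up to 10 players, one 100 KB JSON blob per player per second.
print(min_shards(records_per_sec=10, record_kb=100))  # prints 1
```

10 x 100 KB = 1,000 KB per second fits within one shard's 1 MB/s, and 10 records per second is far below 1,000, so a single shard suffices. Doubling the players to 20 would push throughput past 1 MB/s and require a second shard.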
Question 18) Which service in the Kinesis family allows you to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time?
Answer 18)
Notes 18)
Question 19) You are an ML specialist needing to collect data from Twitter tweets. Your goal is to collect tweets that include only the name of your company and the tweet body, and store them in a data store in AWS. What set of tools can you use to stream, transform, and load the data into AWS with the LEAST amount of effort?
Answer 19)
Notes 19)
Question 20) Which service in the Kinesis family allows you to build custom applications that process or analyze streaming data for specialized needs?
Answer 20)
Notes 20)
Question21: Of the following, which are examples of machine learning? (Select TWO.)
A) Calculating the shortest route from current location to the destination
B) Optimizing product pricing based on real-time sales data
C) Sentiment analysis of text on product reviews
D) A loan approval system that classifies applicants entirely based on credit score
Answer21:
Notes 21:
Question22: Which of the following is an appropriate use case for unsupervised learning?
A) Partitioning an image of a street scene into multiple segments
B) Finding an optimal path out of a maze
C) Identifying clusters of housing sales based on related data points
D) Analyzing sentiment of social media posts
Answer22:
Notes 22:
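Identifying clusters of housing sales (option C) is a clustering problem, and k-means is one common unsupervised algorithm for it: it groups records by similarity without any labels. A minimal plain-Python sketch, assuming each sale is reduced to two hypothetical features (price in $1,000s and size in hundreds of square feet); in practice you would scale the features and use a library implementation.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: alternate assigning points to the nearest centroid and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical sales: (price in $1,000s, size in 100s of sq ft).
sales = [(150, 9), (160, 10), (155, 11), (600, 30), (620, 32), (590, 29)]
centroids, clusters = kmeans(sales, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

Price dominates the distance here because it is on a larger scale than size; with real data, standardizing the features first keeps one column from driving the clusters.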
Question23:
Answer23:
Notes 23:
Question24: A Djamgatech retail company wants to deploy a machine learning model to predict the demand for a product using sales data from the past 5 years. What is the MOST efficient solution that the company should implement first?
A) Regression
B) Multi-class classification
C) Binary class classification
D) N/A
Answer24:
Notes 24:
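Demand is a continuous quantity, which is what makes Question 24 a regression problem. A minimal sketch fitting a straight-line trend to five years of hypothetical annual sales with ordinary least squares and extrapolating one year ahead; a real demand model would add seasonality and other features, or use a managed service such as Amazon Forecast.

```python
def fit_trend(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    return mean_y - slope * mean_x, slope

# Hypothetical units sold per year over the past 5 years.
years = [2019, 2020, 2021, 2022, 2023]
units = [1200, 1350, 1500, 1650, 1800]

intercept, slope = fit_trend(years, units)
forecast_2024 = intercept + slope * 2024
print(f"slope={slope:.1f} units/year, 2024 forecast={forecast_2024:.0f}")
```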
Question25: In which phase of the ML pipeline do you analyze the business requirements and reframe that information into a machine learning context?
A) Problem formulation
B) Model training
C) Deployment
D)
Answer25:
Notes 25:
iOS: https://apps.apple.com/
Android/Amazon: https://www.amazon.com/gp/product/B09TZ4H8V6
AWS MLS-C01 Machine Learning Exam Prep
Quizzes, Practice Exams: Modeling, Data Engineering, Vision, Exploratory Data Analysis, ML Ops, Cheat Sheets, ML Jobs Interview Q&A
Use this App to learn about Machine Learning on AWS and prepare for the AWS Machine Learning Specialty Certification MLS-C01.
Earning AWS Certified Machine Learning Specialty validates expertise in building, training, tuning, and deploying machine learning (ML) models on AWS.
The App provides hundreds of quizzes and practice exams covering:
– Machine Learning Operations on AWS
– Modelling
– Data Engineering
– Computer Vision
– Exploratory Data Analysis
– ML implementation & Operations
– Machine Learning Basics Questions and Answers
– Machine Learning Advanced Questions and Answers
– Scorecard
– Countdown timer
– Machine Learning Cheat Sheets
– Machine Learning Interview Questions and Answers
– Machine Learning Latest News
The App covers Machine Learning Basics and Advanced topics including: NLP, Computer Vision, Python, linear regression, logistic regression, Sampling, dataset, statistical interaction, selection bias, non-Gaussian distribution, bias-variance trade-off, Normal Distribution, correlation and covariance, Point Estimates and Confidence Interval, A/B Testing, p-value, statistical power of sensitivity, over-fitting and under-fitting, regularization, Law of Large Numbers, Confounding Variables, Survivorship Bias, univariate, bivariate and multivariate, Resampling, ROC curve, TF/IDF vectorization, Cluster Sampling, etc.
Domain 1: Data Engineering
Create data repositories for machine learning.
Identify data sources (e.g., content and location, primary sources such as user data)
Determine storage mediums (e.g., DB, Data Lake, S3, EFS, EBS)
Identify and implement a data ingestion solution.
Data job styles/types (batch load, streaming)
Data ingestion pipelines (Batch-based ML workloads and streaming-based ML workloads), etc.
Domain 2: Exploratory Data Analysis
Sanitize and prepare data for modeling.
Perform feature engineering.
Analyze and visualize data for machine learning.
Domain 3: Modeling
Frame business problems as machine learning problems.
Select the appropriate model(s) for a given machine learning problem.
Train machine learning models.
Perform hyperparameter optimization.
Evaluate machine learning models.
Domain 4: Machine Learning Implementation and Operations
Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance.
Recommend and implement the appropriate machine learning services and features for a given problem.
Apply basic AWS security practices to machine learning solutions.
Deploy and operationalize machine learning solutions.
Machine Learning Services covered:
Amazon Comprehend
AWS Deep Learning AMIs (DLAMI)
AWS DeepLens
Amazon Forecast
Amazon Fraud Detector
Amazon Lex
Amazon Polly
Amazon Rekognition
Amazon SageMaker
Amazon Textract
Amazon Transcribe
Amazon Translate
Other Services and topics covered are:
Ingestion/Collection
Processing/ETL
Data analysis/visualization
Model training
Model deployment/inference
Operational
AWS ML application services
Language relevant to ML (for example, Python, Java, Scala, R, SQL)
Notebooks and integrated development environments (IDEs)
S3, SageMaker, Kinesis, Lake Formation, Athena, Kibana, Redshift, Textract, EMR, Glue; data formats such as CSV, JSON, IMG, Parquet, or databases
Amazon EC2, Amazon Elastic Container Registry (Amazon ECR), Amazon Elastic Container Service, Amazon Elastic Kubernetes Service, Amazon Redshift
SageMaker API Explained:
AWS Certified Machine Learning Engineer Specialty Questions and Answers:
Question1: An advertising and analytics company uses machine learning to predict user response to online advertisements using a custom XGBoost model. The company wants to improve its ML pipeline by porting its training and inference code, written in R, to Amazon SageMaker, and do so with minimal changes to the existing code.
Answer1: Use the Build Your Own Container (BYOC) Amazon SageMaker option.
Create a new Docker container with the existing code, register the container in Amazon Elastic Container Registry (Amazon ECR), and finally run the training and inference jobs using this container.
Question2: Which feature of Amazon SageMaker can you use for preprocessing the data?
Answer2: Amazon SageMaker notebook instances
Amazon SageMaker enables developers and data scientists to build, train, tune, and deploy machine learning (ML) models at scale. You can deploy trained ML models for real-time or batch predictions on unseen data, a process known as inference. However, in most cases, the raw input data must be preprocessed and can’t be used directly for making predictions. This is because most ML models expect the data in a predefined format, so the raw data needs to be first cleaned and formatted in order for the ML model to process the data. You can use the Amazon SageMaker built-in Scikit-learn library for preprocessing input data and then use the Amazon SageMaker built-in Linear Learner algorithm for predictions.
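The kind of preprocessing described above, turning raw records into the purely numeric, consistently scaled matrix a linear model expects, can be sketched without any library. This assumes hypothetical columns (a numeric `age` and a categorical `plan`) and uses z-score scaling plus one-hot encoding; in a SageMaker notebook you would typically do the same with scikit-learn (for example `StandardScaler` and `OneHotEncoder`) before sending the data to Linear Learner.

```python
import statistics

def preprocess(rows, numeric, categorical):
    """Drop incomplete rows, z-score numeric columns, one-hot encode categorical columns."""
    rows = [r for r in rows if all(r.get(c) is not None for c in numeric + categorical)]

    # Learn scaling statistics and category levels from the (training) rows.
    stats = {}
    for c in numeric:
        values = [r[c] for r in rows]
        # Fall back to 1.0 so a constant column does not cause division by zero.
        stats[c] = (statistics.mean(values), statistics.pstdev(values) or 1.0)
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}

    features = []
    for r in rows:
        vec = [(r[c] - stats[c][0]) / stats[c][1] for c in numeric]
        for c in categorical:
            # One indicator column per category level.
            vec.extend(1.0 if r[c] == level else 0.0 for level in levels[c])
        features.append(vec)
    return features

raw = [
    {"age": 25, "plan": "basic"},
    {"age": 35, "plan": "premium"},
    {"age": None, "plan": "basic"},  # incomplete row, dropped
    {"age": 45, "plan": "basic"},
]
for vec in preprocess(raw, numeric=["age"], categorical=["plan"]):
    print(vec)
```

The scaling statistics and category levels should be learned on the training data only and then reused unchanged at inference time, so that training and prediction inputs are encoded identically.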
Question3: What setting, when creating an Amazon SageMaker notebook instance, can you use to install libraries and import data?
Answer3: LifeCycle Configuration
Question4: How do you choose the right SageMaker built-in algorithm?




This is a general guide for choosing which algorithm to use depending on what business problem you have and what data you have.
Top 10 Google Professional Machine Learning Engineer Sample Questions
Question 1: You work for a textile manufacturer and have been asked to build a model to detect and classify fabric defects. You trained a machine learning model with high recall based on high resolution images taken at the end of the production line. You want quality control inspectors to gain trust in your model. Which technique should you use to understand the rationale of your classifier?
A. Use K-fold cross validation to understand how the model performs on different test datasets.
B. Use the Integrated Gradients method to efficiently compute feature attributions for each predicted image.
C. Use PCA (Principal Component Analysis) to reduce the original feature set to a smaller set of easily understood features.
D. Use k-means clustering to group similar images together, and calculate the Davies-Bouldin index to evaluate the separation between clusters.
Answer 1)
Notes 1)
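Option B refers to Integrated Gradients, which attributes a prediction to input features by accumulating the model's gradients along a straight path from a baseline input x' to the actual input x: attribution_i = (x_i - x'_i) * (integral from 0 to 1 of dF(x' + alpha * (x - x')) / dx_i d(alpha)). The sketch below applies it to a toy logistic model whose gradient is known in closed form and approximates the integral with a midpoint Riemann sum; the weights, input, and all-zeros baseline are hypothetical stand-ins for an image classifier and its pixels.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy differentiable classifier: F(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Closed-form gradient of F with respect to x: sigmoid'(z) * w."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bl + alpha * (xi - bl) for xi, bl in zip(x, baseline)]
        for i, g in enumerate(model_grad(point, w, b)):
            avg_grad[i] += g / steps
    return [(xi - bl) * g for xi, bl, g in zip(x, baseline, avg_grad)]

w, b = [2.0, -1.0, 0.5], -0.5  # hypothetical trained weights
x = [1.0, 0.3, 2.0]            # hypothetical input
baseline = [0.0, 0.0, 0.0]     # all-zeros baseline (e.g. a black image)

attributions = integrated_gradients(x, baseline, w, b)
print([round(a, 4) for a in attributions])
# Completeness check: the attributions should sum to F(x) - F(baseline).
print(round(sum(attributions), 4), round(model(x, w, b) - model(baseline, w, b), 4))
```

Features with large positive attributions pushed the prediction toward the positive class. For images, the same computation runs over pixels with gradients from the framework's automatic differentiation, producing a per-pixel map that inspectors can check against the defect they see.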
Question 2: You need to write a generic test to verify whether Dense Neural Network (DNN) models automatically released by your team have a sufficient number of parameters to learn the task for which they were built. What should you do?
Answer 2)
Notes 2)
[appbox appstore 1560083470-iphone screenshots]
[appbox googleplay com.awssolutionarchitectassociateexampreppro.app]
Answer 3)
Notes 3)
Question 4: You work on a team where the process for deploying a model into production starts with data scientists training different versions of models in a Kubeflow pipeline. The workflow then stores the new model artifact into the corresponding Cloud Storage bucket. You need to build the next steps of the pipeline after the submitted model is ready to be tested and deployed in production on AI Platform. How should you configure the architecture before deploying the model to production?
Question 10) You work for a large financial institution that is planning to use Dialogflow to create a chatbot for the company’s mobile app. You have reviewed old chat logs and tagged each conversation for intent based on each customer’s stated intention for contacting customer service. About 70% of customer inquiries are simple requests that are solved within 10 intents. The remaining 30% of inquiries require much longer and more complicated requests. Which intents should you automate first?
Machine Learning Q&A Part I:
Google.
Azure and AWS are second-class citizens in this area.
Sure, AWS has 70% of the market.
Sure, Azure is the easiest turnkey option and super user-friendly.
But the king of machine learning in the cloud is GCP.
GCP = Google Cloud Platform
Google has the largest data science team in the world, not to mention they have Hinton.
Let’s forget for a minute that they created TensorFlow and gave it away.
Let’s just talk about building a real-world model with data that doesn’t fit into an Excel spreadsheet.
The vast majority of applied machine learning is supervised, and that means we need data.
Not just normal data: we need very clean, highly structured data.
Where’s the easiest place in the world to upload and model a petabyte of structured data? BigQuery, of course.
Why BigQuery? I don’t have to do anything but upload my data. No spinning up Redshift clusters or whatever I have to do in Azure; I just upload and massage the data with my familiar SQL. If I do have to wrangle my data, it won’t take me six months to update 5 rows here; minutes, usually.
Then you’ll need a front end. Cloud Datalab is a Jupyter notebook, which is good because I don’t want or need anything else.
Then, with a single line of code, I connect my Datalab (Jupyter) notebook to my data in BigQuery and build away.
I’ve worked in all three, and the only thing I care about is getting my job done the fastest; right now that means I build my models in GCP.
If you’re new to machine learning don’t start in GCP or any cloud vendor for that matter. Start learning Python from the comfort of your laptop.
The course below is free to the first 20.
The Complete Python Course for Machine Learning Engineers
Here, I want to share the best research paper on Machine Learning classification methods, titled ‘Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?’, published in the ‘Journal of Machine Learning Research’.
This paper evaluates 179 classification techniques on 121 data sets; here is a short summary:
Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?
The paper evaluated 179 classifiers arising from 17 ML families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest neighbours, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today.
The experiments used a total of 121 data sets, which represent the whole UCI database (excluding the large-scale problems) plus other real problems of the authors' own, in order to reach significant conclusions about classifier behaviour that do not depend on the data set collection.
The whole data set and partitions are available from: http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz
The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, its difference from the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy, is not statistically significant. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0, and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package).
The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top 10), and neural networks and boosting ensembles (5 and 3 members in the top 20, respectively).
You can see the table with the complete results: http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/results.txt
I hope it will be helpful for statistics and machine learning aspirants!
Thank you!
These basic questions should help:
1. Is the classification going to be supervised or unsupervised? Several well-defined techniques like SVM (Support Vector Machines), trained neural nets, etc. are applicable for supervised classification. For unsupervised classification, GMMs (Gaussian Mixture Models) and HMMs (Hidden Markov Models) with Bayesian techniques could be used. (Several other techniques could of course be used as well.)
2. How much training data do you have, in case it is supervised? A small amount of training data may yield discouraging classification accuracy even if the chosen classifier is the most suitable one for the problem. In such a case, try to obtain more samples. There is also generally a relationship (for practical purposes at least) between the feature dimensionality and the number of samples needed for a given technique. For example, with an SVM, the linear kernel tends to yield better results than RBF or other kernels when the number of training samples is less than, equal to, or only slightly more than the number of feature dimensions.
3. If the feature vector dimensionality is small enough (1-D, 2-D, or 3-D), it makes sense to plot the data and visually inspect whether techniques like clustering could be useful. With a very high number of feature dimensions, methods like clustering are generally not advisable (see “The Curse of Dimensionality”).
4. Are you doing classification in real time? Some techniques, e.g. “Template Match” in image classification, may lead to a higher number of errors but are generally faster than most other techniques if the number of templates to be evaluated is not excessively high.
5. Depending on the problem domain, you can choose the underlying model so that it uses temporal or spatial correlations that may be inherent in the data. For example, HMMs use the temporal continuity of speech samples to improve classification results in speech recognition problems.
Another point, slightly off topic perhaps: classification performance is as much a function of choosing the correct feature vectors and pre-processing them as of the classifier itself. It is generally a good idea to reserve some initial part of the project for trying out various classifiers on the same data set. It may at least help you reject the ones that are highly inaccurate.
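Point 3's warning about the curse of dimensionality can be seen numerically: as dimensionality grows, the distances from a query point to its nearest and farthest neighbours become nearly equal, so distance-based methods such as clustering lose contrast. A minimal sketch using uniformly random synthetic points with a fixed seed (an illustrative assumption, not real data):

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=42):
    """(max distance - min distance) / min distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

In 2-D the farthest point is many times farther away than the nearest one, while in 1,000-D all points sit at almost the same distance, which is why "nearest" carries little information there.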
At a high level, these skills are a combination of software and data engineering.
The people best suited to this job are data engineers and/or machine learning engineers.
That being said, if you work at a startup or happen to be in a small company and need to put the models into production yourself, here are the top skills you need to get:
- Well-structured code: it doesn’t need to be perfect, but it should at least be understandable and updatable by other team members. Avoid spaghetti code[1] like the plague.
- Add logs: if you are a Python user, the logging[2] module is your friend. Avoid print statements at any cost.
- Model versioning: add a hash key to your different models. You will thank me later.
- Metadata everywhere: save as much data about your models and ML experiments as you can (running time, hyperparameters, used features, CV scores, and so on). You will thank me later, again.
- Monitor performance: execution time and statistical scores of your models.
- Data and models management: store the necessary data and models somewhere that is available to everyone (S3[3], for example). Avoid uploading these to your VCS[4]. Don’t share them using Slack or Drive. I won’t judge you though; I do it sometimes (read: often). Read more here …
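Three of these habits (logging instead of print, hashing model versions, and saving run metadata) fit in a few lines of standard-library Python. A minimal sketch, assuming the model can be summarized by a JSON-serializable dict of parameters; the `train` function, the field names, and the `run_metadata.json` file name are hypothetical placeholders.

```python
import hashlib
import json
import logging
import os
import tempfile
import time

# Configure logging once, instead of scattering print() calls.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("training")

def model_version(params):
    """Deterministic short hash of the model parameters, usable as a version key."""
    payload = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

def train(hyperparameters):
    """Hypothetical stand-in for a real training routine."""
    start = time.perf_counter()
    params = {"weights": [0.4, -1.2], "bias": 0.1, **hyperparameters}  # placeholder "model"
    metadata = {
        "version": model_version(params),
        "hyperparameters": hyperparameters,
        "features": ["age", "income"],    # hypothetical feature names
        "cv_scores": [0.81, 0.79, 0.83],  # placeholder values, not real results
        "running_time_sec": round(time.perf_counter() - start, 4),
    }
    logger.info("trained model %s with %s", metadata["version"], hyperparameters)
    return params, metadata

params, metadata = train({"learning_rate": 0.01, "epochs": 10})

# Persist the metadata alongside the model artifacts (here: the temp directory).
path = os.path.join(tempfile.gettempdir(), "run_metadata.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(metadata, f, indent=2)
logger.info("metadata written to %s", path)
```

Serializing with `sort_keys=True` makes the hash independent of dictionary key order, so the same parameters always map to the same version key.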
Some of the mistakes that can be made when building a machine learning model (that I can think of) are listed here:
- Not understanding the structure of the dataset
- Not taking proper care during feature selection
- Leaving out categorical features and considering just numerical variables
- Falling into the dummy variable trap
- Selecting an inefficient machine learning algorithm
- Not trying out various ML algorithms for building the model based on the structure of the data
- Improper tuning of model parameters
- Most importantly: building a naive, imperfect model. Suppose we have a classification problem where 99% of the samples fall into class1 and the rest into class2. The model may learn a mapping function that predicts class1 for every input. One might then say the model has 99% accuracy, but in reality it never identifies the 1% of class2 cases. This must be taken into consideration.
- Read more here…
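The "most important" mistake above, a model that always predicts the majority class, is easy to reproduce. With a hypothetical test set where 99% of the samples are class1, a constant predictor scores 99% accuracy while its recall on class2 is zero:

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of actual `positive` samples that were predicted as `positive`."""
    actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in actual) / len(actual)

# Hypothetical imbalanced test set: 990 samples of class1, 10 of class2.
y_true = ["class1"] * 990 + ["class2"] * 10
y_pred = ["class1"] * len(y_true)  # degenerate model: always predicts the majority class

print(f"accuracy = {accuracy(y_true, y_pred):.2%}")                    # prints: accuracy = 99.00%
print(f"recall on class2 = {recall(y_true, y_pred, 'class2'):.2%}")   # prints: recall on class2 = 0.00%
```

This is why imbalanced problems are judged with per-class recall, precision, F1, or cost-based metrics rather than accuracy alone.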
Basically, data mining is a key aspect of data analytics. Some even consider the former as essential to execute before the latter. While data analytics is the complete package and involves most components needed to examine a data set and extract valuable information, data mining focuses specifically on identifying hidden patterns.
That’s just the surface-level comparison though. The image above gives an overview of how the two differ.
One such difference is the presence of a hypothesis. Data analytics usually requires coming up with one, as it aims to find specific answers. Data mining, on the other hand, generally doesn’t need one to test or prove. The expected outputs are patterns or trends, which don’t require coming up with a statement or fact to test.
However, that doesn’t mean you mine data blindly. You still have a goal, whether it’s to come up with a recommender system or identify predictors of a certain dimension. Ultimately though, you strive to come up with data patterns or trends. For data analysis on the other hand, you’re expected to come up with valuable and actionable insights, usually in relation to a predetermined hypothesis. Read more here ….
The data science life cycle is not as well-defined as the software development life cycle, and there is no one-size-fits-all solution for data science projects. Every step in the life cycle of a data science project depends on various data scientist skills and data science tools. The typical life cycle of a data science project involves jumping back and forth among various interdependent data science tasks using a variety of tools, techniques, programming, etc.
Thus, the data science life-cycle can include the following steps:
- Business requirement understanding.
- Data collection.
- Data cleaning.
- Data analysis.
- Modeling.
- Performance evaluation.
- Communicating with stakeholders.
- Deployment.
- Real-world testing.
- Business buy-in.
- Support and maintenance.
Looks neat, but here is the scheme to visualize how it is happening in reality:
Agile development processes, especially continuous delivery, lend themselves well to the data science project life cycle. Early comparison helps the data science team to change approaches, refine hypotheses, and even discard the project if the business case is nonviable or the benefits from the predictive models are not worth the effort to build them.
Machine Learning Q&A Part II:
At a high level, these skills are a combination of software and data engineering.
The persons that are more appropriate to do this job are a data engineer and/or a machine learning engineer.
That being said, if you work at a startup or happen to be in a small company and need to put the models into production yourself, here are the top skills you need to get:
- Well structured code: it doesn’t need to be perfect but at least can be understood and updated by other team members. Avoid spaghetti code[1] as the plague.
- Add logs: if you are a Python user, the logging[2] module is your friend. Avoid print statements at any cost.
- Model versioning: add a hash key to your different models. You will thank me later.
- Metadata everywhere: save as much data about your models and ML experiments as you can (running time, hyperparameters, used features, CV scores, and so on). You will thank me later, again.
- Monitor performances: execution time and statistical scores of your models.
- Data and models management: store the necessary data and models somewhere that is available to everyone (S3[3] for example). Avoid uploading these to your VCS[4] system. Don’t share them using Slack or Drive. I won’t judge you though, I do it sometimes (read often). Read more here …..
Here are some of the mistakes I can think of that people make when building a machine learning model:
- Not understanding the structure of the dataset
- Not taking proper care during feature selection
- Leaving out categorical features and considering only numerical variables
- Falling into the dummy variable trap
- Selecting an inefficient machine learning algorithm
- Not trying out various ML algorithms suited to the structure of the data
- Improper tuning of model hyperparameters
- Most importantly: building a naive model that only looks good on paper. Suppose we have a classification problem where 99% of the samples belong to class1 and the remaining 1% to class2. The model may learn a mapping function that predicts class1 for every input. One might then say the model has 99% accuracy, but in reality it never detects a single class2 case. This must be taken into consideration, for example by checking per-class recall instead of relying on accuracy alone.
- Read more here…
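The "most importantly" point above is often called the accuracy paradox. A minimal sketch on hypothetical labels, assuming a 990/10 class split and a model that always predicts class1, shows accuracy looking excellent while recall on class2 is zero. Balanced accuracy (the mean of per-class recall) is one common alternative metric that exposes the problem.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of the actual `positive` samples the model found."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of its size."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical imbalanced data set: 990 samples of class1 and 10 of class2.
y_true = ["class1"] * 990 + ["class2"] * 10
# A "model" that always predicts the majority class.
y_pred = ["class1"] * len(y_true)

print(f"accuracy          = {accuracy(y_true, y_pred):.2%}")           # 99.00%
print(f"recall (class2)   = {recall(y_true, y_pred, 'class2'):.2%}")   # 0.00%
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.2%}")  # 50.00%
```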
Basically, data mining is a key aspect of data analytics. Some even consider the former essential to perform before the latter. While data analytics is the complete package and involves most of the components needed to examine a data set and extract valuable information, data mining focuses specifically on identifying hidden patterns.
That’s just the surface-level comparison though. The image above gives an overview of how the two differ.
One such difference is the presence of a hypothesis. Data analytics usually requires one, as it aims to find specific answers. Data mining, on the other hand, generally doesn’t need a hypothesis to test or prove. The expected outputs are patterns or trends, which don’t require formulating a statement or fact to test.
However, that doesn’t mean you mine data blindly. You still have a goal, whether it’s to come up with a recommender system or identify predictors of a certain dimension. Ultimately though, you strive to come up with data patterns or trends. For data analysis on the other hand, you’re expected to come up with valuable and actionable insights, usually in relation to a predetermined hypothesis. Read more here ….
iOS: https://apps.apple.com/ca/app/aws-machine-learning-prep-pro/id1611045854
Android/Amazon: https://www.amazon.com/gp/product/B09TZ4H8V6
AWS MLS-C01 Machine Learning Exam Prep
Quizzes, Practice Exams: Modeling, Data Engineering, Vision, Exploratory Data Analysis, ML Ops, Cheat Sheets, ML Jobs Interview Q&A
Use this App to learn about Machine Learning on AWS and prepare for the AWS Machine Learning Specialty Certification MLS-C01.
Earning AWS Certified Machine Learning Specialty validates expertise in building, training, tuning, and deploying machine learning (ML) models on AWS.
The App provides hundreds of quizzes and practice exams covering:
– Machine Learning Operations on AWS
– Modelling
– Data Engineering
– Computer Vision
– Exploratory Data Analysis
– ML implementation & Operations
– Machine Learning Basics Questions and Answers
– Machine Learning Advanced Questions and Answers
– Scorecard
– Countdown timer
– Machine Learning Cheat Sheets
– Machine Learning Interview Questions and Answers
– Machine Learning Latest News
The App covers Machine Learning Basics and Advanced topics including: NLP, Computer Vision, Python, linear regression, logistic regression, Sampling, dataset, statistical interaction, selection bias, non-Gaussian distribution, bias-variance trade-off, Normal Distribution, correlation and covariance, Point Estimates and Confidence Interval, A/B Testing, p-value, statistical power of sensitivity, over-fitting and under-fitting, regularization, Law of Large Numbers, Confounding Variables, Survivorship Bias, univariate, bivariate and multivariate, Resampling, ROC curve, TF/IDF vectorization, Cluster Sampling, etc.
Domain 1: Data Engineering
Create data repositories for machine learning.
Identify data sources (e.g., content and location, primary sources such as user data)
Determine storage mediums (e.g., DB, Data Lake, S3, EFS, EBS)
Identify and implement a data ingestion solution.
Data job styles/types (batch load, streaming)
Data ingestion pipelines (Batch-based ML workloads and streaming-based ML workloads), etc.
Domain 2: Exploratory Data Analysis
Sanitize and prepare data for modeling.
Perform feature engineering.
Analyze and visualize data for machine learning.
Domain 3: Modeling
Frame business problems as machine learning problems.
Select the appropriate model(s) for a given machine learning problem.
Train machine learning models.
Perform hyperparameter optimization.
Evaluate machine learning models.
Domain 4: Machine Learning Implementation and Operations
Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance.
Recommend and implement the appropriate machine learning services and features for a given problem.
Apply basic AWS security practices to machine learning solutions.
Deploy and operationalize machine learning solutions.
Machine Learning Services covered:
Amazon Comprehend
AWS Deep Learning AMIs (DLAMI)
AWS DeepLens
Amazon Forecast
Amazon Fraud Detector
Amazon Lex
Amazon Polly
Amazon Rekognition
Amazon SageMaker
Amazon Textract
Amazon Transcribe
Amazon Translate
Other Services and topics covered are:
Ingestion/Collection
Processing/ETL
Data analysis/visualization
Model training
Model deployment/inference
Operational
AWS ML application services
Language relevant to ML (for example, Python, Java, Scala, R, SQL)
Notebooks and integrated development environments (IDEs)
S3, SageMaker, Kinesis, Lake Formation, Athena, Kibana, Redshift, Textract, EMR, Glue, CSV, JSON, IMG, Parquet or databases
Amazon EC2, Amazon Elastic Container Registry (Amazon ECR), Amazon Elastic Container Service, Amazon Elastic Kubernetes Service, Amazon Redshift
SageMaker API Explained:
AWS Certified Machine Learning Specialty Questions and Answers:
Question1: An advertising and analytics company uses machine learning to predict user response to online advertisements using a custom XGBoost model. The company wants to improve its ML pipeline by porting its training and inference code, written in R, to Amazon SageMaker, and do so with minimal changes to the existing code.
Answer1: Use the Build Your Own Container (BYOC) Amazon SageMaker option.
Create a new Docker container with the existing code and register it in Amazon Elastic Container Registry (Amazon ECR). Finally, run the training and inference jobs using this container.
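To make the BYOC answer more concrete, here is a minimal sketch of the training side of such a container, assuming the layout SageMaker uses for custom training images: the image is started with the argument `train`, hyperparameters arrive as string values in `/opt/ml/input/config/hyperparameters.json`, each input channel is mounted under `/opt/ml/input/data/<channel>/`, and whatever is written to `/opt/ml/model/` is packaged as the model artifact. The question's code is in R; Python is used here only to illustrate the same directory contract. The `shrinkage` hyperparameter and the mean-based "model" are hypothetical placeholders for the company's existing XGBoost code. For inference, the same image is started with `serve` and must answer `GET /ping` and `POST /invocations` on port 8080, which this sketch does not cover.

```python
import csv
import json
import sys
import tempfile
from pathlib import Path


def train(prefix="/opt/ml", channel="training"):
    """Training entry point that follows the SageMaker BYOC directory layout under `prefix`."""
    root = Path(prefix)
    hp_file = root / "input" / "config" / "hyperparameters.json"
    # SageMaker writes hyperparameters as a JSON object whose values are all strings.
    hyperparameters = json.loads(hp_file.read_text()) if hp_file.exists() else {}
    shrinkage = float(hyperparameters.get("shrinkage", "0.0"))  # hypothetical hyperparameter

    # Each input channel is a directory; assume headerless CSV files with the label first.
    labels = []
    for csv_path in sorted((root / "input" / "data" / channel).glob("*.csv")):
        with open(csv_path, newline="") as f:
            labels.extend(float(row[0]) for row in csv.reader(f) if row)

    # Placeholder "model" (a shrunken label mean) standing in for the real training code.
    model = {"prediction": sum(labels) / len(labels) * (1.0 - shrinkage), "n_rows": len(labels)}

    # Whatever lands in <prefix>/model is packaged by SageMaker as the model artifact.
    model_dir = root / "model"
    model_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "model.json").write_text(json.dumps(model))
    return model


def local_smoke_test():
    """Fake the /opt/ml layout in a temporary directory and run train() against it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "input" / "config").mkdir(parents=True)
        (root / "input" / "config" / "hyperparameters.json").write_text(json.dumps({"shrinkage": "0.5"}))
        data_dir = root / "input" / "data" / "training"
        data_dir.mkdir(parents=True)
        (data_dir / "part-0.csv").write_text("2,0.1\n4,0.3\n")
        model = train(tmp)
        saved = json.loads((root / "model" / "model.json").read_text())
    return model, saved


if __name__ == "__main__" and sys.argv[1:] == ["train"]:
    # Inside the container, SageMaker starts training by running the image with "train".
    train()
```

Running `local_smoke_test()` on your laptop is a cheap way to check the directory handling before pushing the image to Amazon ECR.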
Question2: Which feature of Amazon SageMaker can you use for preprocessing the data?
Answer2: Amazon SageMaker Notebook instances
Amazon SageMaker enables developers and data scientists to build, train, tune, and deploy machine learning (ML) models at scale. You can deploy trained ML models for real-time or batch predictions on unseen data, a process known as inference. However, in most cases, the raw input data must be preprocessed and can’t be used directly for making predictions. This is because most ML models expect the data in a predefined format, so the raw data needs to be first cleaned and formatted in order for the ML model to process the data. You can use the Amazon SageMaker built-in Scikit-learn library for preprocessing input data and then use the Amazon SageMaker built-in Linear Learner algorithm for predictions.
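As an illustration of what "cleaned and formatted" can mean here, the sketch below swaps the Scikit-learn step for plain Python: it imputes missing numeric values with the column mean, standardizes them, one-hot encodes a categorical column, and writes a headerless CSV with the label in the first column, which is the CSV layout SageMaker built-in algorithms such as Linear Learner expect for training. The column names and records are hypothetical. In a real pipeline you would compute the means, standard deviations and category list on the training data only and reuse them at inference time.

```python
import csv
import io
import statistics


def preprocess(records, numeric_cols, categorical_col, label_col):
    """Turn raw dict records into rows: label first, then scaled numerics, then one-hot columns."""
    # 1. Impute missing numeric values (None) with the column mean, then standardize.
    scaled = {}
    for col in numeric_cols:
        present = [r[col] for r in records if r[col] is not None]
        mean = statistics.fmean(present)
        std = statistics.pstdev(present) or 1.0  # guard against constant columns
        scaled[col] = [((r[col] if r[col] is not None else mean) - mean) / std for r in records]

    # 2. One-hot encode the categorical column with a fixed, sorted category order.
    categories = sorted({r[categorical_col] for r in records})

    rows = []
    for i, r in enumerate(records):
        row = [float(r[label_col])]
        row += [scaled[col][i] for col in numeric_cols]
        row += [1.0 if r[categorical_col] == c else 0.0 for c in categories]
        rows.append(row)
    return rows, categories


def to_headerless_csv(rows):
    """Serialize rows as CSV text without a header record."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


raw = [
    {"age": 30, "income": 50.0, "region": "east", "clicked": 1},
    {"age": None, "income": 70.0, "region": "west", "clicked": 0},  # missing age
    {"age": 50, "income": 60.0, "region": "east", "clicked": 1},
]
rows, categories = preprocess(raw, ["age", "income"], "region", "clicked")
csv_text = to_headerless_csv(rows)
print(categories)
print(csv_text)
```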
Question3: What setting, when creating an Amazon SageMaker notebook instance, can you use to install libraries and import data?
Answer3: LifeCycle Configuration
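A lifecycle configuration is a pair of shell scripts: `OnCreate` runs once when the notebook instance is created, and `OnStart` runs every time it starts; each script must finish within 5 minutes. The sketch below only builds the request for boto3's `create_notebook_instance_lifecycle_config` call, base64-encoding the script as the API requires, so it runs without AWS credentials. The configuration name, the S3 bucket and the commands inside the script are hypothetical placeholders.

```python
import base64

# Hypothetical on-create script: install an extra library and copy data from S3.
ON_CREATE_SCRIPT = """#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
pip install --quiet xgboost
aws s3 cp s3://example-bucket/training-data/ /home/ec2-user/SageMaker/data/ --recursive
EOF
"""

MAX_CONTENT_LENGTH = 16384  # documented limit for each base64-encoded script


def lifecycle_config_request(name, on_create_script, on_start_script=None):
    """Build kwargs for boto3's SageMaker create_notebook_instance_lifecycle_config call."""

    def encode(script):
        content = base64.b64encode(script.encode("utf-8")).decode("ascii")
        if len(content) > MAX_CONTENT_LENGTH:
            raise ValueError("lifecycle script is too long once base64-encoded")
        return [{"Content": content}]

    request = {"NotebookInstanceLifecycleConfigName": name, "OnCreate": encode(on_create_script)}
    if on_start_script is not None:
        request["OnStart"] = encode(on_start_script)
    return request


request = lifecycle_config_request("install-libs-and-data", ON_CREATE_SCRIPT)
# With credentials configured, you could then call:
# boto3.client("sagemaker").create_notebook_instance_lifecycle_config(**request)
print(sorted(request))
```

You would then pass `LifecycleConfigName="install-libs-and-data"` when calling `create_notebook_instance`.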
Question4: How do you choose the right SageMaker built-in algorithm?




This is a general guide for choosing which algorithm to use depending on what business problem you have and what data you have.
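The guide's images did not survive here, so as a rough textual stand-in, below is my own partial mapping from (data type, problem type) to Amazon SageMaker built-in algorithms, based on the algorithm list in the SageMaker documentation. It is not the original figure and it is not exhaustive, and the set of built-in algorithms changes over time, so check the current documentation before relying on it.

```python
# Partial, unofficial mapping; verify against the current SageMaker documentation.
BUILT_IN_ALGORITHMS = {
    ("tabular", "classification"): ["XGBoost", "Linear Learner", "K-Nearest Neighbors (k-NN)",
                                    "Factorization Machines"],
    ("tabular", "regression"): ["XGBoost", "Linear Learner", "K-Nearest Neighbors (k-NN)"],
    ("tabular", "clustering"): ["K-Means"],
    ("tabular", "dimensionality reduction"): ["Principal Component Analysis (PCA)"],
    ("tabular", "anomaly detection"): ["Random Cut Forest (RCF)"],
    ("tabular", "recommendation"): ["Factorization Machines"],
    ("ip addresses", "anomaly detection"): ["IP Insights"],
    ("time series", "forecasting"): ["DeepAR"],
    ("text", "classification"): ["BlazingText"],
    ("text", "embeddings"): ["BlazingText", "Object2Vec"],
    ("text", "topic modeling"): ["Latent Dirichlet Allocation (LDA)", "Neural Topic Model (NTM)"],
    ("text", "translation"): ["Sequence-to-Sequence"],
    ("image", "classification"): ["Image Classification"],
    ("image", "object detection"): ["Object Detection"],
    ("image", "semantic segmentation"): ["Semantic Segmentation"],
}


def suggest_algorithms(data_type, problem):
    """Return candidate built-in algorithms, or an empty list if the pair is not in the table."""
    key = (data_type.strip().lower(), problem.strip().lower())
    return BUILT_IN_ALGORITHMS.get(key, [])


print(suggest_algorithms("Tabular", "Classification"))
print(suggest_algorithms("time series", "forecasting"))
```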
Top 10 Google Professional Machine Learning Engineer Sample Questions
Question 1: You work for a textile manufacturer and have been asked to build a model to detect and classify fabric defects. You trained a machine learning model with high recall based on high resolution images taken at the end of the production line. You want quality control inspectors to gain trust in your model. Which technique should you use to understand the rationale of your classifier?
A. Use K-fold cross validation to understand how the model performs on different test datasets.
B. Use the Integrated Gradients method to efficiently compute feature attributions for each predicted image.
C. Use PCA (Principal Component Analysis) to reduce the original feature set to a smaller set of easily understood features.
D. Use k-means clustering to group similar images together, and calculate the Davies-Bouldin index to evaluate the separation between clusters.
Answer 1) B
Notes 1)
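Answer B names Integrated Gradients, which attributes a prediction to input features by integrating the model's gradient along a straight path from a baseline input x' (often an all-black image) to the actual input x: IG_i = (x_i - x'_i) * integral from 0 to 1 of dF/dx_i(x' + a(x - x')) da. The sketch below approximates that integral with a midpoint Riemann sum on a hypothetical three-feature logistic model, using finite-difference gradients so it needs only the standard library; a real image model would use automatic differentiation and produce one attribution per pixel. The last line checks the completeness property: the attributions should sum to F(x) - F(x').

```python
import math


def toy_classifier(x):
    """Hypothetical stand-in for an image model: logistic regression on three 'pixel' features."""
    weights = [2.0, -1.0, 0.5]
    z = sum(w * xi for w, xi in zip(weights, x)) - 0.5
    return 1.0 / (1.0 + math.exp(-z))


def gradient(f, x, eps=1e-5):
    """Central finite-difference gradient; real frameworks would use automatic differentiation."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def integrated_gradients(f, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients for each input feature."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(f, point)):
            totals[i] += g
    # Scale the average gradient along the path by the input's distance from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


image = [0.9, 0.2, 0.6]     # hypothetical input
baseline = [0.0, 0.0, 0.0]  # all-black baseline image
attributions = integrated_gradients(toy_classifier, image, baseline)
print("attributions:", [round(a, 4) for a in attributions])
# Completeness check: these two numbers should nearly match.
print(sum(attributions), toy_classifier(image) - toy_classifier(baseline))
```

Features with large positive attributions pushed the prediction up, negative ones pushed it down, which is the kind of per-feature rationale the quality control inspectors in the question would review.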
Question 2: You need to write a generic test to verify whether Dense Neural Network (DNN) models automatically released by your team have a sufficient number of parameters to learn the task for which they were built. What should you do?
Answer 2)
Notes 2)
Answer 3)
Notes 3)
Question 4: You work on a team where the process for deploying a model into production starts with data scientists training different versions of models in a Kubeflow pipeline. The workflow then stores the new model artifact into the corresponding Cloud Storage bucket. You need to build the next steps of the pipeline after the submitted model is ready to be tested and deployed in production on AI Platform. How should you configure the architecture before deploying the model to production?
Question 10) You work for a large financial institution that is planning to use Dialogflow to create a chatbot for the company’s mobile app. You have reviewed old chat logs and tagged each conversation for intent based on each customer’s stated intention for contacting customer service. About 70% of customer inquiries are simple requests that are solved within 10 intents. The remaining 30% of inquiries require much longer and more complicated requests. Which intents should you automate first?
Machine Learning Q&A Part I:
Google.
Azure and AWS are second-class citizens in this area.
Sure, AWS has 70% of the market.
Sure, Azure is the easiest turnkey option and super user-friendly.
But the king of machine learning in the cloud is GCP.
GCP = Google Cloud Platform
Google has the largest data science team in the world, not to mention they have Hinton.
Let’s forget for a minute that they created TensorFlow and gave it away.
Let’s just talk about building a real-world model with data that doesn’t fit into an Excel spreadsheet.
The vast majority of applied machine learning is supervised and that means we need data.
Not just normal data, we need very clean highly structured data.
Where’s the easiest place in the world to upload and model a petabyte of structured data? BigQuery, of course.
Why BigQuery? I don’t have to do anything but upload my data. No spinning up Redshift clusters or whatever I have to do in Azure; I just upload and massage the data with my familiar SQL. If I do have to wrangle my data, it won’t take me six months to update five rows; it usually takes minutes.
Then you’ll need a front end. Cloud Datalab is a Jupyter notebook, which is good because I don’t want or need anything else.
Then, with a single line of code, I connect my Datalab (Jupyter) notebook to my data in BigQuery and build away.
I’ve worked in all three, and the only thing I care about is getting my job done the fastest; right now that means I build my models in GCP.
If you’re new to machine learning don’t start in GCP or any cloud vendor for that matter. Start learning Python from the comfort of your laptop.
The course below is free to the first 20.
The Complete Python Course for Machine Learning Engineers
Here, I want to share the best research paper on Machine Learning classification methods, titled ‘Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?’, published in the ‘Journal of Machine Learning Research’.
The paper evaluates 179 classification techniques on 121 data sets. Here is a short summary:
Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?
The paper evaluated 179 classifiers arising from 17 ML families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest neighbours, partial least squares and principal component regression, logistic and multinomial regression, multivariate adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, covering all the relevant classifiers available at the time.
The experiments used 121 data sets in total, which represent the whole UCI database (excluding the large-scale problems) plus the authors’ own real-world problems, in order to reach significant conclusions about classifier behaviour that do not depend on the data set collection.
The whole data set and partitions are available from: http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz
The classifiers most likely to be the best are the random forest (RF) versions. The best of these (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, and exceeds 90% of the maximum accuracy in 84.3% of the data sets. However, its difference from the second best, the SVM with Gaussian kernel implemented in C using LibSVM (92.3% of the maximum accuracy), is not statistically significant. A few models are clearly better than the rest: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package).
The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top 10), neural networks and boosting ensembles (5 and 3 members in the top 20, respectively).
You can see the table with the complete results: http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/results.txt
I hope it will be helpful for statistics and machine learning aspirants!
Thank you!
These basic questions should help:
1. Is the classification going to be supervised or unsupervised? Several well-defined techniques, like SVMs (Support Vector Machines) and trained neural nets, are applicable to supervised classification. For unsupervised classification, GMMs (Gaussian Mixture Models) or HMMs (Hidden Markov Models) with Bayesian techniques could be used. (Several other techniques could of course be used as well.)
2. How much training data do you have, in case it is supervised? A small amount of training data may yield discouraging classification accuracy even if the chosen classifier is the most suitable one for the problem. In such a case, try to obtain more samples. There is also generally a relationship (for practical purposes at least) between the feature dimensionality and the number of samples needed for a given technique. For example, with SVMs the linear kernel tends to yield better results than the RBF or other kernels when the number of training samples is less than, equal to, or only slightly more than the number of feature dimensions.
3. If the feature vector dimensionality is small enough (1-D, 2-D or 3-D), it makes sense to plot the data and visually inspect whether techniques like clustering could be more useful. With a very high number of feature dimensions, methods like clustering are generally not advisable (refer to “The Curse of Dimensionality”).
4. Are you doing classification in real time? Some techniques, e.g. template matching in image classification, may produce more errors but are generally faster than most other techniques, provided the number of templates to evaluate is not excessively high.
5. Depending on the problem domain, you may be able to choose an underlying model that exploits temporal or spatial correlations inherent in the data. For example, HMMs use the temporal continuity of speech samples to improve classification results in speech recognition problems.
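Point 3's reference to the curse of dimensionality can be seen numerically. The sketch below draws uniform random points in [0, 1]^d and measures the relative contrast, (max distance - min distance) / min distance, from a random query point: as d grows the ratio collapses, so the nearest and farthest points become almost equally far away, which is one reason distance-based clustering degrades in high dimensions. The data is synthetic and the exact numbers depend on the seed.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min Euclidean distance from one random query to n random points in [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:>4}  relative contrast={relative_contrast(dim):.3f}")
```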
Another point, slightly off topic perhaps: classification performance is as much a function of choosing the correct feature vectors and pre-processing them as of the classifier itself. It’s generally a good idea to reserve some initial part of the project for trying out various classifiers on the same data set. It may at least help you reject the ones that are highly inaccurate.
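The advice to reserve early project time for trying several classifiers on the same data set can be sketched as a small k-fold cross-validation harness. Every classifier sees exactly the same folds, so the mean accuracies are directly comparable. The data (two hypothetical Gaussian clouds) and the three toy classifiers, written in pure Python, are illustrative stand-ins for library implementations; the majority-class baseline is included because any candidate that cannot beat it can be rejected immediately.

```python
import math
import random
from collections import Counter


def make_blobs(n_per_class=50, seed=1):
    """Hypothetical 2-D data set: two Gaussian clouds centred at (0, 0) and (3, 3)."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            data.append(([rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)], label))
    rng.shuffle(data)
    return data


def majority_class(train):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_centroid(train):
    """Predict the label whose training-set mean is closest to x."""
    centroids = {}
    for label in {y for _, y in train}:
        members = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))


def one_nearest_neighbour(train):
    """Predict the label of the single closest training point."""
    return lambda x: min(train, key=lambda item: math.dist(x, item[0]))[1]


def k_fold_accuracy(fit, data, k=5):
    """Mean accuracy over k contiguous folds; every classifier sees exactly the same splits."""
    fold_size = len(data) // k
    scores = []
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        predict = fit(train)
        scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(scores) / k


data = make_blobs()
for name, fit in [("majority class", majority_class),
                  ("nearest centroid", nearest_centroid),
                  ("1-nearest neighbour", one_nearest_neighbour)]:
    print(f"{name:<20} mean 5-fold accuracy = {k_fold_accuracy(fit, data):.3f}")
```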
At a high level, these skills are a combination of software and data engineering.
The persons that are more appropriate to do this job are a data engineer and/or a machine learning engineer.
That being said, if you work at a startup or happen to be in a small company and need to put the models into production yourself, here are the top skills you need to get:
- Well structured code: it doesn’t need to be perfect but at least can be understood and updated by other team members. Avoid spaghetti code[1] as the plague.
- Add logs: if you are a Python user, the logging[2] module is your friend. Avoid print statements at any cost.
- Model versioning: add a hash key to your different models. You will thank me later.
- Metadata everywhere: save as much data about your models and ML experiments as you can (running time, hyperparameters, used features, CV scores, and so on). You will thank me later, again.
- Monitor performances: execution time and statistical scores of your models.
- Data and models management: store the necessary data and models somewhere that is available to everyone (S3[3] for example). Avoid uploading these to your VCS[4] system. Don’t share them using Slack or Drive. I won’t judge you though, I do it sometimes (read often). Read more here …..
Some of the mistakes that might involve during building a machine learning model (I can think of) are listed here:
- Not understanding the structure of the dataset
- Not giving proper care during features selection
- Leaving out categorical features and considering just numerical variables
- Falling into dummy variable trap
- Selection of inefficient machine learning algorithm
- Not trying out various ML algorithms for building the model based on structure of data.
- Improper tuning of model parameters
- Most importantly: Building an idiotstic imperfect model i.e. suppose we have a classification problem with 99% chances of falling into class1 and remaining to class2. The built model may develop a mapping function which all the time for all data inputs, may predict the result to be class1. Well, one might say his/her model has 99% accuracy. But in reality the 1% class2 case hasn’t been included in the model. So this must be taken into consideration.
- Read more here…
[appbox appstore 1560083470-iphone screenshots]
[appbox googleplay com.awssolutionarchitectassociateexampreppro.app]
Basically, data mining is a key aspect of data analytics. Some even consider the former as essential to execute before the latter. While data analytics is the complete package and involves most components needed to examine a data set and extract valuable information, data mining focuses specifically on identifying hidden patterns.
That’s just the surface-level comparison though. The image above gives an overview of how the two differ.
One such difference is the presence of a hypothesis. Data analytics usually requires coming up with one, as it aims to find specific answers. Data mining, on the other hand, generally doesn’t need one to test or prove. The expected output are patterns or trends, which doesn’t require coming up with a statement or fact to test.
However, that doesn’t mean you mine data blindly. You still have a goal, whether it’s to come up with a recommender system or identify predictors of a certain dimension. Ultimately though, you strive to come up with data patterns or trends. For data analysis on the other hand, you’re expected to come up with valuable and actionable insights, usually in relation to a predetermined hypothesis. Read more here ….
The data science life cycle is not something well-defined like the software development life-cycle, and there is no ‘one-size-fits-all’ solution for data science projects. Every step in the life-cycle of a data science project depends on various data scientist skills and data science tools. The typical life-cycle of a data science project involves jumping back and forth among various interdependent science tasks using a variety of tools, techniques, programming, etc.
Thus, the data science life-cycle can include the following steps:
- Business requirement understanding.
- Data collection.
- Data cleaning.
- Data analysis.
- Modeling.
- Performance evaluation.
- Communicating with stakeholders.
- Deployment.
- Real-world testing.
- Business buy-in.
- Support and maintenance.
Looks neat, but here is the scheme to visualize how it is happening in reality:
Agile development processes, especially continuous delivery lends itself well to the data science project life-cycle. The early comparison helps the data science team to change approaches, refine hypotheses and even discard the project if the business case is nonviable or the benefits from the predictive models are not worth the effort to build it.
[appbox appstore 1611045854-iphone screenshots]
[appbox microsoftstore 9n8rl80hvm4t-mobile screenshots]
Machine Learning Q&A -Part II:
At a high level, these skills are a combination of software and data engineering.
The persons that are more appropriate to do this job are a data engineer and/or a machine learning engineer.
That being said, if you work at a startup or happen to be in a small company and need to put the models into production yourself, here are the top skills you need to get:
- Well structured code: it doesn’t need to be perfect but at least can be understood and updated by other team members. Avoid spaghetti code[1] as the plague.
- Add logs: if you are a Python user, the logging[2] module is your friend. Avoid print statements at any cost.
- Model versioning: add a hash key to your different models. You will thank me later.
- Metadata everywhere: save as much data about your models and ML experiments as you can (running time, hyperparameters, used features, CV scores, and so on). You will thank me later, again.
- Monitor performances: execution time and statistical scores of your models.
- Data and models management: store the necessary data and models somewhere that is available to everyone (S3[3] for example). Avoid uploading these to your VCS[4] system. Don’t share them using Slack or Drive. I won’t judge you though, I do it sometimes (read often). Read more here …..
Some of the mistakes that might involve during building a machine learning model (I can think of) are listed here:
- Not understanding the structure of the dataset
- Not giving proper care during features selection
- Leaving out categorical features and considering just numerical variables
- Falling into dummy variable trap
- Selection of inefficient machine learning algorithm
- Not trying out various ML algorithms for building the model based on structure of data.
- Improper tuning of model parameters
- Most importantly: Building an idiotstic imperfect model i.e. suppose we have a classification problem with 99% chances of falling into class1 and remaining to class2. The built model may develop a mapping function which all the time for all data inputs, may predict the result to be class1. Well, one might say his/her model has 99% accuracy. But in reality the 1% class2 case hasn’t been included in the model. So this must be taken into consideration.
- Read more here…
Basically, data mining is a key aspect of data analytics. Some even consider the former as essential to execute before the latter. While data analytics is the complete package and involves most components needed to examine a data set and extract valuable information, data mining focuses specifically on identifying hidden patterns.
That’s just the surface-level comparison though. The image above gives an overview of how the two differ.
One such difference is the presence of a hypothesis. Data analytics usually requires coming up with one, as it aims to find specific answers. Data mining, on the other hand, generally doesn’t need one to test or prove. The expected output are patterns or trends, which doesn’t require coming up with a statement or fact to test.
However, that doesn’t mean you mine data blindly. You still have a goal, whether it’s to come up with a recommender system or identify predictors of a certain dimension. Ultimately though, you strive to come up with data patterns or trends. For data analysis on the other hand, you’re expected to come up with valuable and actionable insights, usually in relation to a predetermined hypothesis. Read more here ….
The data science life cycle is not something well-defined like the software development life-cycle, and there is no ‘one-size-fits-all’ solution for data science projects. Every step in the life-cycle of a data science project depends on various data scientist skills and data science tools. The typical life-cycle of a data science project involves jumping back and forth among various interdependent science tasks using a variety of tools, techniques, programming, etc.
Thus, the data science life-cycle can include the following steps:
- Business requirement understanding.
- Data collection.
- Data cleaning.
- Data analysis.
- Modeling.
- Performance evaluation.
- Communicating with stakeholders.
- Deployment.
- Real-world testing.
- Business buy-in.
- Support and maintenance.
Looks neat, but here is the scheme to visualize how it is happening in reality:
Agile development processes, especially continuous delivery lends itself well to the data science project life-cycle. The early comparison helps the data science team to change approaches, refine hypotheses and even discard the project if the business case is nonviable or the benefits from the predictive models are not worth the effort to build it.
iOs: https://apps.apple.com/ca/app/aws-machine-learning-prep-pro/id1611045854
Android/Amazon: https://www.amazon.com/gp/product/B09TZ4H8V6
AWS MLS-C01 Machine Learning Exam Prep
Quizzes, Practice Exams: Modeling, Data Engineering, Vision, Exploratory Data Analysis, ML Ops, Cheat Sheets, ML Jobs Interview Q&A
Use this App to learn about Machine Learning on AWS and prepare for the AWS Machine Learning Specialty Certification MLS-C01.
Earning AWS Certified Machine Learning Specialty validates expertise in building, training, tuning, and deploying machine learning (ML) models on AWS.
The App provides hundreds of quizzes and practice exam about:
– Machine Learning Operation on AWS
– Modelling
– Data Engineering
– Computer Vision,
– Exploratory Data Analysis,
– ML implementation & Operations
– Machine Learning Basics Questions and Answers
– Machine Learning Advanced Questions and Answers
– Scorecard
– Countdown timer
– Machine Learning Cheat Sheets
– Machine Learning Interview Questions and Answers
– Machine Learning Latest News
The App covers Machine Learning Basics and Advanced topics including: NLP, Computer Vision, Python, linear regression, logistic regression, Sampling, dataset, statistical interaction, selection bias, non-Gaussian distribution, bias-variance trade-off, Normal Distribution, correlation and covariance, Point Estimates and Confidence Interval, A/B Testing, p-value, statistical power of sensitivity, over-fitting and under-fitting, regularization, Law of Large Numbers, Confounding Variables, Survivorship Bias, univariate, bivariate and multivariate, Resampling, ROC curve, TF/IDF vectorization, Cluster Sampling, etc.
Domain 1: Data Engineering
Create data repositories for machine learning.
Identify data sources (e.g., content and location, primary sources such as user data)
Determine storage mediums (e.g., DB, Data Lake, S3, EFS, EBS)
Identify and implement a data ingestion solution.
Data job styles/types (batch load, streaming)
Data ingestion pipelines (Batch-based ML workloads and streaming-based ML workloads), etc.
Domain 2: Exploratory Data Analysis
Sanitize and prepare data for modeling.
Perform feature engineering.
Analyze and visualize data for machine learning.
Domain 3: Modeling
Frame business problems as machine learning problems.
Select the appropriate model(s) for a given machine learning problem.
Train machine learning models.
Perform hyperparameter optimization.
Evaluate machine learning models.
Domain 4: Machine Learning Implementation and Operations
Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance.
Recommend and implement the appropriate machine learning services and features for a given problem.
Apply basic AWS security practices to machine learning solutions.
Deploy and operationalize machine learning solutions.
Machine Learning Services covered:
Amazon Comprehend
AWS Deep Learning AMIs (DLAMI)
AWS DeepLens
Amazon Forecast
Amazon Fraud Detector
Amazon Lex
Amazon Polly
Amazon Rekognition
Amazon SageMaker
Amazon Textract
Amazon Transcribe
Amazon Translate
Other Services and topics covered are:
Ingestion/Collection
Processing/ETL
Data analysis/visualization
Model training
Model deployment/inference
Operational
AWS ML application services
Language relevant to ML (for example, Python, Java, Scala, R, SQL)
Notebooks and integrated development environments (IDEs),
S3, SageMaker, Kinesis, Lake Formation, Athena, Kibana, Redshift, Textract, EMR, Glue, SageMaker, CSV, JSON, IMG, parquet or databases, Amazon Athena
Amazon EC2, Amazon Elastic Container Registry (Amazon ECR), Amazon Elastic Container Service, Amazon Elastic Kubernetes Service , Amazon Redshift
Sagemaker API Explained:
AWS Certified Machine Learning Engineer Specialty Questions and Answers:
Question1: An advertising and analytics company uses machine learning to predict user response to online advertisements using a custom XGBoost model. The company wants to improve its ML pipeline by porting its training and inference code, written in R, to Amazon SageMaker, and do so with minimal changes to the existing code.
Answer1: Use the Build Your Own Container (BYOC) Amazon Sagemaker option.
Create a new docker container with the existing code. Register the container in Amazon Elastic Container registry. with the existing code. Register the container in Amazon Elastic Container Registry. Finally run the training and inference jobs using this container.
Question2: Which feature of Amazon SageMaker can you use for preprocessing the data?
Answer2: Amazon SageMaker notebook instances
Amazon SageMaker enables developers and data scientists to build, train, tune, and deploy machine learning (ML) models at scale. You can deploy trained ML models for real-time or batch predictions on unseen data, a process known as inference. However, in most cases, the raw input data must be preprocessed and can’t be used directly for making predictions, because most ML models expect data in a predefined format; the raw data first needs to be cleaned and formatted before the model can process it. You can use the Scikit-learn framework container provided by Amazon SageMaker to preprocess input data and then use the SageMaker built-in Linear Learner algorithm for predictions.
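As a rough illustration of this kind of preprocessing, the sketch below standardizes numeric columns and writes headerless CSV rows with the label in the first column, which is the CSV layout SageMaker built-in algorithms such as Linear Learner expect for training. It uses only the Python standard library rather than the SageMaker Scikit-learn container, and the column names and records are hypothetical.

```python
import csv
import io
import statistics


def fit_scaler(rows, columns):
    """Learn the mean and (population) standard deviation of each numeric column."""
    stats = {}
    for col in columns:
        values = [float(r[col]) for r in rows]
        std = statistics.pstdev(values)
        stats[col] = (statistics.fmean(values), std if std > 0 else 1.0)
    return stats


def to_training_csv(rows, label, columns, stats):
    """Return headerless CSV text: label first, then standardized features."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for r in rows:
        features = [(float(r[c]) - stats[c][0]) / stats[c][1] for c in columns]
        writer.writerow([r[label], *(round(f, 6) for f in features)])
    return buffer.getvalue()


# Hypothetical raw records; in practice they would be read from S3.
raw = [
    {"clicked": 1, "age": 30, "time_on_page": 12.0},
    {"clicked": 0, "age": 50, "time_on_page": 3.0},
    {"clicked": 1, "age": 40, "time_on_page": 9.0},
]
scaler = fit_scaler(raw, ["age", "time_on_page"])
print(to_training_csv(raw, "clicked", ["age", "time_on_page"], scaler))
```

The scaler statistics are learned once on training data and reused at inference time, so predictions see features on the same scale as training.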
Question3: What setting, when creating an Amazon SageMaker notebook instance, can you use to install libraries and import data?
Answer3: Lifecycle configuration
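A lifecycle configuration is a shell script that runs when the notebook instance is created (`OnCreate`) or started (`OnStart`). The boto3 SageMaker client exposes `create_notebook_instance_lifecycle_config`, which takes each script base64-encoded. The sketch below only builds that request; the configuration name, the conda environment, the package, and the S3 bucket in the script are hypothetical.

```python
import base64

# Hypothetical on-start script: install a library and copy data from S3.
ON_START_SCRIPT = """#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
source activate python3
pip install --quiet xgboost
aws s3 cp s3://example-bucket/training-data/ /home/ec2-user/SageMaker/data/ --recursive
EOF
"""


def build_lifecycle_request(name: str, on_start: str, on_create: str = "") -> dict:
    """Build keyword arguments for create_notebook_instance_lifecycle_config."""

    def encode(script: str) -> list:
        # The API expects each script as base64-encoded content.
        return [{"Content": base64.b64encode(script.encode("utf-8")).decode("ascii")}]

    request = {"NotebookInstanceLifecycleConfigName": name, "OnStart": encode(on_start)}
    if on_create:
        request["OnCreate"] = encode(on_create)
    return request


request = build_lifecycle_request("install-libs-and-data", ON_START_SCRIPT)
```

With AWS credentials configured, the request could be sent with `boto3.client("sagemaker").create_notebook_instance_lifecycle_config(**request)`, and the configuration name passed as `LifecycleConfigName` when creating the notebook instance.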
Question4: How do you choose the right SageMaker built-in algorithm?




This is a general guide for choosing which algorithm to use depending on what business problem you have and what data you have.
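Since the guide itself is an image, here is a partial sketch of the same idea as a lookup table from problem type to some of SageMaker's built-in algorithms. The problem-type keys are illustrative labels of my own and the mapping is a simplification, not the original chart; data size, data format, and latency requirements still matter when choosing.

```python
# Simplified, partial mapping; keys are illustrative labels, not SageMaker terms.
BUILT_IN_ALGORITHMS = {
    "classification": ["XGBoost", "Linear Learner", "K-Nearest Neighbors (k-NN)"],
    "regression": ["XGBoost", "Linear Learner", "K-Nearest Neighbors (k-NN)"],
    "recommendation": ["Factorization Machines"],
    "time_series_forecasting": ["DeepAR"],
    "clustering": ["K-Means"],
    "anomaly_detection": ["Random Cut Forest"],
    "dimensionality_reduction": ["PCA"],
    "topic_modeling": ["LDA", "Neural Topic Model (NTM)"],
    "text_classification": ["BlazingText"],
    "machine_translation": ["Sequence-to-Sequence"],
    "image_classification": ["Image Classification"],
    "object_detection": ["Object Detection"],
    "semantic_segmentation": ["Semantic Segmentation"],
}


def suggest_algorithms(problem_type: str) -> list:
    """Return candidate built-in algorithms for a problem type."""
    key = problem_type.strip().lower().replace(" ", "_").replace("-", "_")
    if key not in BUILT_IN_ALGORITHMS:
        known = ", ".join(sorted(BUILT_IN_ALGORITHMS))
        raise ValueError(f"unknown problem type {problem_type!r}; known types: {known}")
    return BUILT_IN_ALGORITHMS[key]


print(suggest_algorithms("Time series forecasting"))
```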
Top 10 Google Professional Machine Learning Engineer Sample Questions
Question 1: You work for a textile manufacturer and have been asked to build a model to detect and classify fabric defects. You trained a machine learning model with high recall based on high resolution images taken at the end of the production line. You want quality control inspectors to gain trust in your model. Which technique should you use to understand the rationale of your classifier?
A. Use K-fold cross validation to understand how the model performs on different test datasets.
B. Use the Integrated Gradients method to efficiently compute feature attributions for each predicted image.
C. Use PCA (Principal Component Analysis) to reduce the original feature set to a smaller set of easily understood features.
D. Use k-means clustering to group similar images together, and calculate the Davies-Bouldin index to evaluate the separation between clusters.
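Answer 1) B
Notes 1) Integrated Gradients computes, for each prediction, how much each input pixel contributed to it, which shows inspectors why the classifier flagged a defect. K-fold cross validation, PCA, and k-means clustering evaluate the model or restructure the data, but they do not explain individual predictions.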
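Integrated Gradients attributes a prediction F(x) to each input feature i as (x_i - x'_i) multiplied by the average gradient of F with respect to x_i along the straight path from a baseline x' (for images, often an all-black image) to x. The sketch below applies it to a hypothetical three-feature logistic model instead of an image classifier, so the gradient can be written analytically; with a real network the gradient would come from the framework's automatic differentiation. The path integral is approximated with a midpoint Riemann sum of `steps` points.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Hypothetical differentiable model: logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_gradient(x, w, b):
    """Analytic gradient of the logistic model with respect to its inputs."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(model_gradient(point, w, b)):
            totals[i] += g
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, totals)]


w, b = [2.0, -1.0, 0.5], -0.25
x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, b)
# Completeness check: the attributions should sum to F(x) - F(baseline).
print(sum(attributions), model(x, w, b) - model(baseline, w, b))
```

The completeness check is a useful sanity test of any implementation; for images, the per-pixel attributions are usually shown to users as a heat map over the input.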
Question 2: You need to write a generic test to verify whether Dense Neural Network (DNN) models automatically released by your team have a sufficient number of parameters to learn the task for which they were built. What should you do?
Answer 2)
Notes 2)
Answer 3)
Notes 3)
Question 4: You work on a team where the process for deploying a model into production starts with data scientists training different versions of models in a Kubeflow pipeline. The workflow then stores the new model artifact into the corresponding Cloud Storage bucket. You need to build the next steps of the pipeline after the submitted model is ready to be tested and deployed in production on AI Platform. How should you configure the architecture before deploying the model to production?
Question 10) You work for a large financial institution that is planning to use Dialogflow to create a chatbot for the company’s mobile app. You have reviewed old chat logs and tagged each conversation for intent based on each customer’s stated intention for contacting customer service. About 70% of customer inquiries are simple requests that are solved within 10 intents. The remaining 30% of inquiries require much longer and more complicated requests. Which intents should you automate first?
Machine Learning Q&A Part I:
Google.
Azure and AWS are second-class citizens in this area.
Sure, AWS has 70% of the market.
Sure, Azure is the easiest turnkey option and super user-friendly.
But the king of machine learning in the cloud is GCP.
GCP = Google Cloud Platform
Google has the largest data science team in the world, not to mention they have Hinton.
Let’s forget for a minute that they created TensorFlow and gave it away.
Let’s just talk about building a real-world model with data that doesn’t fit into an Excel spreadsheet.
The vast majority of applied machine learning is supervised, and that means we need data.
Not just any data: we need very clean, highly structured data.
Where’s the easiest place in the world to upload and model a petabyte of structured data? BigQuery, of course.
Why BigQuery? I don’t have to do anything but upload my data. No spinning up Redshift clusters or whatever I have to do in Azure; I just upload and massage the data with my familiar SQL. If I do have to wrangle my data, it won’t take me six months to update five rows; it usually takes minutes.
Then you’ll need a front end. Cloud Datalab is a Jupyter notebook, which is good because I neither want nor need anything else.
Then, with a single line of code, I connect my Datalab (Jupyter) notebook to my data in BigQuery and build away.
I’ve worked in all three, and the only thing I care about is getting my job done the fastest; right now that means I build my models in GCP.
If you’re new to machine learning, don’t start in GCP or any cloud vendor for that matter. Start learning Python from the comfort of your laptop.
The course below is free for the first 20.
The Complete Python Course for Machine Learning Engineers
Here I want to share what I consider the best research paper on machine learning classification methods, titled ‘Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?’, published in the ‘Journal of Machine Learning Research’.
The paper applies 179 classification techniques to 121 data sets. Here is a short summary:
Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?
The paper evaluated 179 classifiers arising from 17 ML families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest neighbours, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today.
The experiments used 121 data sets, which represent the whole UCI database (excluding the large-scale problems) plus some of the authors' own real problems, in order to reach significant conclusions about classifier behaviour that do not depend on the data set collection.
The whole data set and partitions are available from: http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz
The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, its difference from the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy, is not statistically significant. A few models are clearly better than the rest: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package).
Random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top 10), and then by neural networks and boosting ensembles (5 and 3 members in the top 20, respectively).
You can see the table with the complete results: http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/results.txt
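The "percentage of the maximum accuracy" figure can be read as follows (my reading of the metric, not code from the paper): for each data set, divide a classifier's accuracy by the best accuracy any classifier reached on that data set, then average over all data sets. The accuracies below are made-up toy numbers, not results from the paper.

```python
def percent_of_max_accuracy(accuracy):
    """accuracy[classifier][dataset] -> mean percentage of the per-dataset best."""
    datasets = list(next(iter(accuracy.values())))
    best = {d: max(scores[d] for scores in accuracy.values()) for d in datasets}
    return {
        name: 100.0 * sum(scores[d] / best[d] for d in datasets) / len(datasets)
        for name, scores in accuracy.items()
    }


# Toy accuracies for two classifiers on two data sets (not from the paper).
toy = {
    "random_forest": {"d1": 0.90, "d2": 0.70},
    "svm_gaussian": {"d1": 0.80, "d2": 0.80},
}
print(percent_of_max_accuracy(toy))
```

Because the score is normalized per data set, a classifier that is close to the best everywhere can rank above one that wins some data sets outright but does poorly on others.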
I hope it will be helpful for statistics and machine learning aspirants!
Thank you!
These basic questions should help:
1. Is the classification going to be supervised or unsupervised? Several well-defined techniques, like SVMs (Support Vector Machines) and trained neural networks, are applicable to supervised classification. For unsupervised classification, GMMs (Gaussian Mixture Models) and HMMs (Hidden Markov Models) with Bayesian techniques could be used. (Several other techniques could, of course, be used as well.)
2. How much training data do you have, if the classification is supervised? A small amount of training data may yield discouraging classification accuracy even if the chosen classifier is the most suitable one for the problem. In such a case, try to obtain more samples. There is also generally a relationship (for practical purposes at least) between the feature dimensionality and the number of samples a given technique needs. For example, with an SVM, the linear kernel tends to yield better results than the RBF or any other kernel when the number of training samples is less than, equal to, or only slightly more than the number of feature dimensions.
3. If the feature vector dimensionality is small enough (1-D, 2-D or 3-D), it makes sense to plot the data and visually inspect whether techniques like clustering could be useful. With a very high number of feature dimensions, methods like clustering are generally not advisable (see “The Curse of Dimensionality”).
4. Are you doing classification in real time? Some techniques, e.g. “template matching” in image classification, may produce more errors but are generally faster than most other techniques, as long as the number of templates to evaluate is not excessively high.
5. Depending on the problem domain, consider choosing an underlying model that can exploit temporal or spatial correlations inherent in the data. For example, HMMs use the temporal continuity of speech samples to improve classification results in speech recognition.
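To see why distance-based methods such as clustering struggle in high dimensions (point 3), the sketch below measures the contrast between the farthest and the nearest neighbour of a query point among uniformly random points. As the dimension grows, the ratio shrinks toward 1, so "near" and "far" become hard to tell apart. The number of points, the dimensions, and the seed are arbitrary choices for illustration.

```python
import math
import random


def distance_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Ratio of the farthest to the nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return max(distances) / min(distances)


for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 2))
```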
Another point, slightly off topic perhaps: classification performance depends as much on choosing the right feature vectors and pre-processing them as on the classifier itself. It’s generally a good idea to reserve the initial part of the project for trying out various classifiers on the same data set. At the very least, it may help you reject the ones that are highly inaccurate.
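Trying several classifiers on the same data set can be sketched with two simple classifiers written from scratch: a nearest-template classifier (one template per class, here the class mean, in the spirit of the template matching mentioned above) and a 1-nearest-neighbour classifier, both scored on the same held-out split. The synthetic two-cluster data, the seed, and the 70/30 split are assumptions for illustration.

```python
import math
import random


def make_data(n_per_class=50, seed=1):
    """Two hypothetical Gaussian clusters in 2-D, labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, (0.0, 0.0)), (1, (4.0, 4.0))):
        for _ in range(n_per_class):
            data.append(([rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)], label))
    rng.shuffle(data)
    return data


def nearest_template(train):
    """Build one template (mean vector) per class; predict the closest template."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    templates = {y: [v / counts[y] for v in s] for y, s in sums.items()}
    return lambda x: min(templates, key=lambda y: math.dist(x, templates[y]))


def one_nearest_neighbour(train):
    """Predict the label of the closest training example."""
    return lambda x: min(train, key=lambda item: math.dist(x, item[0]))[1]


def accuracy(predict, test):
    return sum(predict(x) == y for x, y in test) / len(test)


data = make_data()
split = int(0.7 * len(data))
train_set, test_set = data[:split], data[split:]
for name, build in (("nearest template", nearest_template), ("1-NN", one_nearest_neighbour)):
    print(name, accuracy(build(train_set), test_set))
```

The nearest-template classifier compares each input with only one template per class, which is why it is fast; 1-NN compares against every training example.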
At a high level, these skills are a combination of software and data engineering.
The people best suited to this job are data engineers and/or machine learning engineers.
That being said, if you work at a startup or a small company and need to put the models into production yourself, here are the top skills you need:
- Well-structured code: it doesn’t need to be perfect, but other team members should at least be able to understand and update it. Avoid spaghetti code[1] like the plague.
- Add logs: if you are a Python user, the logging[2] module is your friend. Avoid print statements at any cost.
- Model versioning: add a hash key to your different models. You will thank me later.
- Metadata everywhere: save as much data about your models and ML experiments as you can (running time, hyperparameters, features used, CV scores, and so on). You will thank me later, again.
- Monitor performance: track the execution time and statistical scores of your models.
- Data and model management: store the necessary data and models somewhere available to everyone (S3[3], for example). Avoid uploading them to your VCS[4]. Don’t share them over Slack or Drive. I won’t judge you, though; I do it sometimes (read: often). Read more here…
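Several of these tips (logging instead of print, a hash key per model, and metadata for every experiment) can be combined in a few lines of standard-library Python. This is a minimal sketch: the model bytes, hyperparameters, and directory layout are hypothetical, and in practice the artifacts would go to shared storage such as S3 rather than a local temporary directory.

```python
import hashlib
import json
import logging
import tempfile
import time
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("model_registry")


def model_hash(model_bytes: bytes, hyperparameters: dict) -> str:
    """Version key derived from the serialized model and its hyperparameters."""
    digest = hashlib.sha256()
    digest.update(model_bytes)
    digest.update(json.dumps(hyperparameters, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()[:12]


def save_model(model_bytes, hyperparameters, features, cv_scores, training_seconds, out_dir):
    """Save the model under its hash key and write its metadata next to it."""
    key = model_hash(model_bytes, hyperparameters)
    target = Path(out_dir) / key
    target.mkdir(parents=True, exist_ok=True)
    (target / "model.bin").write_bytes(model_bytes)
    mean_cv = sum(cv_scores) / len(cv_scores) if cv_scores else float("nan")
    metadata = {
        "model_hash": key,
        "saved_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "training_seconds": training_seconds,
        "hyperparameters": hyperparameters,
        "features": features,
        "cv_scores": cv_scores,
        "mean_cv_score": mean_cv,
    }
    (target / "metadata.json").write_text(json.dumps(metadata, indent=2))
    logger.info("saved model %s (mean CV score %.3f)", key, mean_cv)
    return key


# Example run against a throwaway directory (a stand-in for shared storage).
with tempfile.TemporaryDirectory() as tmp:
    key = save_model(b"weights-v1", {"max_depth": 6}, ["age", "income"], [0.81, 0.79], 12.5, tmp)
    saved_metadata = json.loads((Path(tmp) / key / "metadata.json").read_text())
```

Hashing the hyperparameters together with the model bytes means that retraining with different settings always yields a new key, even if you forget to bump a version number by hand.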
Here are some mistakes I can think of that are commonly made when building a machine learning model:
- Not understanding the structure of the dataset
- Not taking proper care during feature selection
- Leaving out categorical features and considering only numerical variables
- Falling into the dummy variable trap
- Selecting an inefficient machine learning algorithm
- Not trying out various ML algorithms based on the structure of the data
- Improper tuning of model parameters
- Most importantly, building a naive model. Suppose we have a classification problem where 99% of the samples fall into class1 and the rest into class2. The model may learn a mapping function that predicts class1 for every input. One might then claim the model has 99% accuracy, yet it never identifies a single class2 case. This must be taken into consideration.
- Read more here…
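The last mistake, the accuracy paradox on imbalanced data, is easy to reproduce. The sketch below uses a hypothetical data set with 99 examples of class1 and 1 of class2, and a "model" that always predicts class1: accuracy is 99%, yet recall on class2 is zero. Per-class recall, or balanced accuracy (the mean of the per-class recalls), exposes the problem.

```python
def per_class_recall(y_true, y_pred):
    """Fraction of each true class that the model predicted correctly."""
    recall = {}
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        recall[label] = sum(y_pred[i] == label for i in idx) / len(idx)
    return recall


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced labels: 99 of class1, 1 of class2.
y_true = ["class1"] * 99 + ["class2"]
# A naive model that always predicts the majority class.
y_pred = ["class1"] * len(y_true)

recall = per_class_recall(y_true, y_pred)
balanced_accuracy = sum(recall.values()) / len(recall)
print(accuracy(y_true, y_pred), recall, balanced_accuracy)
# -> 0.99 {'class1': 1.0, 'class2': 0.0} 0.5
```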
Basically, data mining is a key aspect of data analytics. Some even consider the former an essential step before the latter. While data analytics is the complete package, involving most of the components needed to examine a data set and extract valuable information, data mining focuses specifically on identifying hidden patterns.
That’s just the surface-level comparison, though. The image above gives an overview of how the two differ.
One such difference is the presence of a hypothesis. Data analytics usually requires one, as it aims to find specific answers. Data mining, on the other hand, generally doesn’t need a hypothesis to test or prove. The expected outputs are patterns or trends, which don’t require a statement or fact to test.
However, that doesn’t mean you mine data blindly. You still have a goal, whether it’s building a recommender system or identifying predictors of a certain dimension. Ultimately, though, you aim to surface data patterns or trends. In data analysis, on the other hand, you’re expected to produce valuable and actionable insights, usually in relation to a predetermined hypothesis. Read more here…
The data science life cycle is not as well defined as the software development life cycle, and there is no one-size-fits-all solution for data science projects. Every step in the life cycle of a data science project depends on various data scientist skills and data science tools. The typical life cycle involves jumping back and forth among interdependent data science tasks using a variety of tools, techniques, programming languages, etc.
Thus, the data science life-cycle can include the following steps:
- Business requirement understanding.
- Data collection.
- Data cleaning.
- Data analysis.
- Modeling.
- Performance evaluation.
- Communicating with stakeholders.
- Deployment.
- Real-world testing.
- Business buy-in.
- Support and maintenance.
Looks neat, but here is a scheme showing how it happens in reality:
Agile development processes, especially continuous delivery, lend themselves well to the data science project life cycle. Early comparison helps the data science team change approaches, refine hypotheses, and even discard the project if the business case is nonviable or the benefits of the predictive models are not worth the effort to build them.
iOS: https://apps.apple.com/ca/app/aws-machine-learning-prep-pro/id1611045854
Android/Amazon: https://www.amazon.com/gp/product/B09TZ4H8V6
AWS MLS-C01 Machine Learning Exam Prep
Quizzes, Practice Exams: Modeling, Data Engineering, Vision, Exploratory Data Analysis, ML Ops, Cheat Sheets, ML Jobs Interview Q&A
Use this App to learn about Machine Learning on AWS and prepare for the AWS Machine Learning Specialty Certification MLS-C01.
Earning the AWS Certified Machine Learning Specialty certification validates expertise in building, training, tuning, and deploying machine learning (ML) models on AWS.
The App provides hundreds of quizzes and practice exams covering:
– Machine Learning Operations on AWS
– Modelling
– Data Engineering
– Computer Vision
– Exploratory Data Analysis
– ML Implementation & Operations
– Machine Learning Basics Questions and Answers
– Machine Learning Advanced Questions and Answers
– Scorecard
– Countdown timer
– Machine Learning Cheat Sheets
– Machine Learning Interview Questions and Answers
– Machine Learning Latest News
The App covers Machine Learning Basics and Advanced topics including: NLP, Computer Vision, Python, linear regression, logistic regression, Sampling, dataset, statistical interaction, selection bias, non-Gaussian distribution, bias-variance trade-off, Normal Distribution, correlation and covariance, Point Estimates and Confidence Interval, A/B Testing, p-value, statistical power (sensitivity), over-fitting and under-fitting, regularization, Law of Large Numbers, Confounding Variables, Survivorship Bias, univariate, bivariate and multivariate, Resampling, ROC curve, TF/IDF vectorization, Cluster Sampling, etc.
Domain 1: Data Engineering
Create data repositories for machine learning.
Identify data sources (e.g., content and location, primary sources such as user data)
Determine storage mediums (e.g., DB, Data Lake, S3, EFS, EBS)
Identify and implement a data ingestion solution.
Data job styles/types (batch load, streaming)
Data ingestion pipelines (Batch-based ML workloads and streaming-based ML workloads), etc.
Domain 2: Exploratory Data Analysis
Sanitize and prepare data for modeling.
Perform feature engineering.
Analyze and visualize data for machine learning.
Domain 3: Modeling
Frame business problems as machine learning problems.
Select the appropriate model(s) for a given machine learning problem.
Train machine learning models.
Perform hyperparameter optimization.
Evaluate machine learning models.
Domain 4: Machine Learning Implementation and Operations
Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance.
Recommend and implement the appropriate machine learning services and features for a given problem.
Apply basic AWS security practices to machine learning solutions.
Deploy and operationalize machine learning solutions.
Machine Learning Services covered:
Amazon Comprehend
AWS Deep Learning AMIs (DLAMI)
AWS DeepLens
Amazon Forecast
Amazon Fraud Detector
Amazon Lex
Amazon Polly
Amazon Rekognition
Amazon SageMaker
Amazon Textract
Amazon Transcribe
Amazon Translate
Other Services and topics covered are:
Ingestion/Collection
Processing/ETL
Data analysis/visualization
Model training
Model deployment/inference
Operational
AWS ML application services
Language relevant to ML (for example, Python, Java, Scala, R, SQL)
Notebooks and integrated development environments (IDEs),
S3, SageMaker, Kinesis, Lake Formation, Athena, Kibana, Redshift, Textract, EMR, Glue, SageMaker, CSV, JSON, IMG, parquet or databases, Amazon Athena
Amazon EC2, Amazon Elastic Container Registry (Amazon ECR), Amazon Elastic Container Service, Amazon Elastic Kubernetes Service , Amazon Redshift
Sagemaker API Explained:
AWS Certified Machine Learning Engineer Specialty Questions and Answers:
Question1: An advertising and analytics company uses machine learning to predict user response to online advertisements using a custom XGBoost model. The company wants to improve its ML pipeline by porting its training and inference code, written in R, to Amazon SageMaker, and do so with minimal changes to the existing code.
Answer1: Use the Build Your Own Container (BYOC) Amazon Sagemaker option.
Create a new docker container with the existing code. Register the container in Amazon Elastic Container registry. with the existing code. Register the container in Amazon Elastic Container Registry. Finally run the training and inference jobs using this container.
Question2: Which feature of Amazon SageMaker can you use for preprocessing the data?
Answer2: Amazon Sagemaker Notebook instances
Amazon SageMaker enables developers and data scientists to build, train, tune, and deploy machine learning (ML) models at scale. You can deploy trained ML models for real-time or batch predictions on unseen data, a process known as inference. However, in most cases, the raw input data must be preprocessed and can’t be used directly for making predictions. This is because most ML models expect the data in a predefined format, so the raw data needs to be first cleaned and formatted in order for the ML model to process the data. You can use the Amazon SageMaker built-in Scikit-learn library for preprocessing input data and then use the Amazon SageMaker built-in Linear Learner algorithm for predictions.
Question3: What setting, when creating an Amazon SageMaker notebook instance, can you use to install libraries and import data?
Answer3: LifeCycle Configuration
Question4: How to Choose the right Sagemaker built-in algorithm?




This is a general guide for choosing which algorithm to use depending on what business problem you have and what data you have.
Top 10 Google Professional Machine Learning Engineer Sample Questions
Question 1: You work for a textile manufacturer and have been asked to build a model to detect and classify fabric defects. You trained a machine learning model with high recall based on high resolution images taken at the end of the production line. You want quality control inspectors to gain trust in your model. Which technique should you use to understand the rationale of your classifier?
A. Use K-fold cross validation to understand how the model performs on different test datasets.
B. Use the Integrated Gradients method to efficiently compute feature attributions for each predicted image.
C. Use PCA (Principal Component Analysis) to reduce the original feature set to a smaller set of easily understood features.
D. Use k-means clustering to group similar images together, and calculate the Davies-Bouldin index to evaluate the separation between clusters.
Answer 1)
BNotes 1)
Question 2: You need to write a generic test to verify whether Dense Neural Network (DNN) models automatically released by your team have a sufficient number of parameters to learn the task for which they were built. What should you do?
Answer 2)
Notes 2)
[appbox appstore 1560083470-iphone screenshots]
[appbox googleplay com.awssolutionarchitectassociateexampreppro.app]
Answer 3)
Notes 3)
Question 4: You work on a team where the process for deploying a model into production starts with data scientists training different versions of models in a Kubeflow pipeline. The workflow then stores the new model artifact into the corresponding Cloud Storage bucket. You need to build the next steps of the pipeline after the submitted model is ready to be tested and deployed in production on AI Platform. How should you configure the architecture before deploying the model to production?
Question 10) You work for a large financial institution that is planning to use Dialogflow to create a chatbot for the company’s mobile app. You have reviewed old chat logs and tagged each conversation for intent based on each customer’s stated intention for contacting customer service. About 70% of customer inquiries are simple requests that are solved within 10 intents. The remaining 30% of inquiries require much longer and more complicated requests. Which intents should you automate first?
[appbox appstore 1611045854-iphone screenshots]
[appbox microsoftstore 9n8rl80hvm4t-mobile screenshots]
Machine Learning Q&A Part I:
Google.
Azure and AWS are second class citizens in this area.
Sure, AWS has 70% of the market.
Sure, Azure is the easiest turn key and super user friendly.
But, the king of machine learning in the cloud is GCP.
GCP = Google Cloud Platform
Google has the largest data science team in the world, not mention they have Hinton.
Let’s forgot for a minute they created TensorFlow and give it away.
Let’s just talk about building a real world model with data that doesn’t fit into a excel spreadsheet.
The vast majority of applied machine learning is supervised and that means we need data.
Not just normal data, we need very clean highly structured data.
Where’s the easiest place in the world to upload and model a Petabyte of structured data? BigQuery of course.
Why BigQuery? I don’t have to do anything but upload my data. No spinning up RedShit clusters or whatever I have to do in Azure, just upload and massage data with my familiar SQL. If I do have to wrangle my data it won’t take my six months to update 5 rows here, minutes usually.
Then, you’ll need a front end. Cloud datalab is a Jupyter notebook, which is good because I don’t want nor do I need anything else.
Then, with a single line of code I connect by datalab (Jupyter) notebook to my data in BigQuery and build away.
I’ve worked in all three and the only thing I care about is getting to my job the fastest and right now that means I build my models in GCP.
If you’re new to machine learning don’t start in GCP or any cloud vendor for that matter. Start learning Python from the comfort of your laptop.
The course below is free to the first 20.
The Complete Python Course for Machine Learning Engineers
Here, I want to share the best research paper on Machine Learning classification methods, titled ‘Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?’, published in the ‘Journal of Machine Learning Research’.
This paper nicely explained 179 classification techniques and applied them on 121 data sets thus sharing small summary of the paper:
Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?
The paper evaluated 179 classifiers arising from 17 ML families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest neighbours, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R ( with and without the caret package), C and Matlab, including all the relevant classifiers available today.
Experiments used total 121 data sets , which represent the whole UCI data base (excluding the large-scale problems) and other own real problems, in order to achieve significant conclusions about the classifier behaviour, not dependent on the data set collection.
The whole data set and partitions are available from: http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz
The classifiers most likely to be the bests are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy overcoming 90% in the 84.3% of the data sets. However, the difference is not statistically significant with the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package).
The random forest is clearly the best family of classifiers (3 out of 5 bests classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).
You can see the table with the complete results: http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/results.txt
I hope this will be helpful for statistics and machine learning aspirants!
Thank you!
These basic questions should help:
1. Is the classification going to be supervised or unsupervised? Several well-defined techniques, like SVMs (Support Vector Machines), trained neural nets, etc., are applicable for supervised classification. For unsupervised classification, GMMs (Gaussian Mixture Models) or HMMs (Hidden Markov Models) with Bayesian techniques could be used. (Several other techniques could of course be used as well.)
2. How much training data do you have, in case it is supervised? A small amount of training data may yield discouraging classification accuracy even if the chosen classifier is the most suitable one for the problem. In such a case, try to obtain more samples. There is also generally a relationship (for practical purposes at least) between the feature dimensionality and the number of samples a given technique needs. For example, with SVMs, the linear kernel tends to yield better results than the RBF or other kernels when the number of training samples is less than, equal to, or only slightly more than the number of feature dimensions.
3. If the feature vector dimensionality is small enough (1, 2 or 3 dimensions), it makes sense to plot the data and visually inspect whether techniques like clustering could be more useful. With a very high number of feature dimensions, methods like clustering are generally not advisable (see “The Curse of Dimensionality”).
4. Are you doing classification in real time? Some techniques, e.g. “Template Match” in image classification, may lead to a higher number of errors but are generally faster than most other techniques, as long as the number of templates to be evaluated is not excessively high.
5. Depending on the problem domain, you may be able to choose the underlying model so that it exploits temporal or spatial correlations inherent in the data. For example, HMMs use the temporal continuity of speech samples to improve classification results in speech recognition problems.
Another point, slightly off topic perhaps: classification performance depends as much on choosing the correct feature vectors and on how they are pre-processed as on the classifier itself. It’s generally a good idea to reserve some initial part of the project for trying out various classifiers on the same data set. At the very least, this may help you reject the ones that are highly inaccurate.
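Trying several classifiers on the same data set is usually done with cross-validation, so that every candidate is scored on the same held-out folds. The sketch below is a minimal, dependency-free version of k-fold cross-validation; the two candidate classifiers (a majority-class baseline and a nearest-centroid classifier) and the toy data are illustrative assumptions, not techniques prescribed above.

```python
import math
import random
from collections import Counter
from typing import Callable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]
FitPredict = Callable[[List[Vector], List[str], List[Vector]], List[str]]


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k shuffled, disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_val_accuracy(fit_predict: FitPredict, X: List[Vector], y: List[str], k: int = 5, seed: int = 0) -> float:
    """Mean accuracy over k folds; the same seed gives every model the same folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test):
    """Always predict the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


def nearest_centroid(X_train, y_train, X_test):
    """Predict the label whose class mean is closest in Euclidean distance."""
    centroids = {}
    for label in set(y_train):
        members = [x for x, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return [min(centroids, key=lambda lab: math.dist(x, centroids[lab])) for x in X_test]


# Toy data: two well-separated clusters of 10 points each.
X = [(i * 0.1, 0.0) for i in range(10)] + [(10 + i * 0.1, 10.0) for i in range(10)]
y = ["a"] * 10 + ["b"] * 10
results = {
    "majority": cross_val_accuracy(majority_baseline, X, y),
    "nearest_centroid": cross_val_accuracy(nearest_centroid, X, y),
}
```

Using one shared seed is the important design choice: differences between the scores then come from the classifiers, not from different train/test splits.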
At a high level, these skills are a combination of software and data engineering.
The people best suited to this job are data engineers and/or machine learning engineers.
That being said, if you work at a startup or happen to be in a small company and need to put the models into production yourself, here are the top skills you need to get:
- Well-structured code: it doesn’t need to be perfect, but it should at least be understandable and maintainable by other team members. Avoid spaghetti code[1] like the plague.
- Add logs: if you are a Python user, the logging[2] module is your friend. Avoid print statements at any cost.
- Model versioning: add a hash key to your different models. You will thank me later.
- Metadata everywhere: save as much data about your models and ML experiments as you can (running time, hyperparameters, used features, CV scores, and so on). You will thank me later, again.
- Monitor performance: track the execution time and statistical scores of your models.
- Data and models management: store the necessary data and models somewhere that is available to everyone (S3[3] for example). Avoid uploading these to your VCS[4] system. Don’t share them using Slack or Drive. I won’t judge you though, I do it sometimes (read often). Read more here …..
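The logging, versioning and metadata advice above can be combined in a few lines of standard-library Python. This is a minimal sketch, assuming the model can be pickled and that a local directory stands in for shared storage such as S3; the function name `save_model_with_metadata`, the 12-character hash prefix and the file layout are illustrative choices, not an established convention.

```python
import hashlib
import json
import logging
import pickle
import tempfile
import time
from pathlib import Path
from typing import Any, Dict, List

logger = logging.getLogger("model_registry")


def model_hash(model_bytes: bytes) -> str:
    """Short content hash used as the model version key."""
    return hashlib.sha256(model_bytes).hexdigest()[:12]


def save_model_with_metadata(
    model: Any,
    out_dir: str,
    hyperparameters: Dict[str, Any],
    features: List[str],
    cv_scores: List[float],
    train_seconds: float,
) -> str:
    """Pickle `model`, name it by its hash and write a JSON metadata file next to it."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    blob = pickle.dumps(model)
    version = model_hash(blob)
    (out / f"model-{version}.pkl").write_bytes(blob)
    metadata = {
        "version": version,
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "train_seconds": train_seconds,
        "hyperparameters": hyperparameters,
        "features": features,
        "cv_scores": cv_scores,
    }
    (out / f"model-{version}.json").write_text(json.dumps(metadata, indent=2))
    # Log instead of print: level, format and destination stay configurable.
    logger.info("Saved model %s to %s (mean CV score %.3f)", version, out, sum(cv_scores) / len(cv_scores))
    return version


logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")

# Hypothetical usage: time a (dummy) training step, then save the model and its metadata.
start = time.perf_counter()
model = {"weights": [0.5, -1.2], "bias": 0.1}  # stand-in for a trained model
elapsed = time.perf_counter() - start
demo_dir = tempfile.mkdtemp()
version = save_model_with_metadata(model, demo_dir, {"C": 1.0}, ["age", "income"], [0.81, 0.79, 0.83], elapsed)
```

Because the version is a hash of the serialized model, saving the same model twice yields the same key, while any retrained model gets a new one.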
Some of the mistakes that can be made while building a machine learning model (that I can think of) are listed here:
- Not understanding the structure of the dataset
- Not taking proper care during feature selection
- Leaving out categorical features and considering just numerical variables
- Falling into the dummy variable trap
- Selecting an inefficient machine learning algorithm
- Not trying out various ML algorithms for building the model based on the structure of the data
- Improper tuning of model parameters
- Most importantly: building a naive, imperfect model. Suppose we have a classification problem where 99% of the samples fall into class1 and the rest into class2. The model may learn a mapping function that predicts class1 for every input. One might then say the model has 99% accuracy, but in reality it never identifies the 1% of class2 cases. This must be taken into consideration.
- Read more here…
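Two of the pitfalls above are easy to demonstrate in code. The sketch below, with hypothetical labels and colour values, shows (a) why plain accuracy is misleading for the 99%/1% example and how per-class recall and balanced accuracy (mean per-class recall) expose the problem, and (b) one-hot encoding with the first category dropped, a common way to avoid the dummy variable trap when the model also has an intercept.

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """Share of the true `label` samples that were predicted as `label`."""
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits)


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean recall over all classes, so a rare class counts as much as a common one."""
    labels = sorted(set(y_true))
    return sum(recall(y_true, y_pred, lab) for lab in labels) / len(labels)


def one_hot_drop_first(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Encode a categorical column with k-1 dummy columns (first sorted category dropped).

    Keeping all k columns would make them sum to 1 in every row, which is perfectly
    collinear with an intercept term: the dummy variable trap.
    """
    categories = sorted(set(values))[1:]
    rows = [[int(v == c) for c in categories] for v in values]
    return categories, rows


# The 99%/1% example: a model that always predicts "class1".
y_true = ["class1"] * 99 + ["class2"]
y_pred = ["class1"] * 100
acc = accuracy(y_true, y_pred)                 # 0.99, looks great
class2_recall = recall(y_true, y_pred, "class2")  # 0.0, never finds class2
bal_acc = balanced_accuracy(y_true, y_pred)    # 0.5, chance level for two classes

columns, encoded = one_hot_drop_first(["red", "green", "blue", "green"])
# columns == ["green", "red"]; "blue" is the implicit reference category
```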
Basically, data mining is a key aspect of data analytics. Some even consider the former as essential to execute before the latter. While data analytics is the complete package and involves most components needed to examine a data set and extract valuable information, data mining focuses specifically on identifying hidden patterns.
That’s just the surface-level comparison though. The image above gives an overview of how the two differ.
One such difference is the presence of a hypothesis. Data analytics usually requires one, as it aims to find specific answers. Data mining, on the other hand, generally doesn’t need a hypothesis to test or prove. The expected outputs are patterns or trends, which don’t require a statement or fact to test.
However, that doesn’t mean you mine data blindly. You still have a goal, whether it’s to come up with a recommender system or identify predictors of a certain dimension. Ultimately though, you strive to come up with data patterns or trends. For data analysis on the other hand, you’re expected to come up with valuable and actionable insights, usually in relation to a predetermined hypothesis. Read more here ….
The data science life cycle is not as well defined as the software development life-cycle, and there is no ‘one-size-fits-all’ solution for data science projects. Every step in the life-cycle of a data science project depends on various data scientist skills and data science tools. The typical life-cycle of a data science project involves jumping back and forth among various interdependent data science tasks using a variety of tools, techniques, programming languages, etc.
Thus, the data science life-cycle can include the following steps:
- Business requirement understanding.
- Data collection.
- Data cleaning.
- Data analysis.
- Modeling.
- Performance evaluation.
- Communicating with stakeholders.
- Deployment.
- Real-world testing.
- Business buy-in.
- Support and maintenance.
Looks neat, but here is a scheme that visualizes how it happens in reality:
Agile development processes, especially continuous delivery, lend themselves well to the data science project life-cycle. Early comparison helps the data science team change approaches, refine hypotheses and even discard the project if the business case is nonviable or the benefits from the predictive models are not worth the effort to build them.
Machine Learning Q&A -Part II:
Machine Learning Latest News
Top 10 Machine Learning Algorithms
What are the simplest examples of machine learning algorithms?
Source: Top 10 Machine Learning Algorithms for Data Scientist
In machine learning, there’s something called the “No Free Lunch” theorem. In a nutshell, it states that no one algorithm works best for every problem. It’s especially relevant for supervised learning. For example, you can’t say that neural networks are always better than decision trees or vice-versa. Furthermore, there are many factors at play, such as the size and structure of your dataset. As a result, you should try many different algorithms for your problem!
Top ML Algorithms
1. Linear Regression
Regression is a technique for numerical prediction. It is a statistical measure that attempts to determine the strength of the relationship between one dependent variable and a series of other changing variables, the independent variables. Just as classification is for predicting categorical labels, regression is for predicting a continuous value. For example, we may wish to predict the salary of university graduates with 5 years of work experience. We use regression to determine how much specific factors or sectors influence the dependent variable.
Linear regression attempts to model the relationship between a scalar dependent variable and one or more explanatory variables by fitting a linear equation. For example, one might want to relate the weights of individuals to their heights using a linear regression model.
Some linear regression implementations also use the Akaike criterion for model selection. The Akaike information criterion (AIC) is a measure of the relative goodness of fit of a statistical model; lower values indicate a better trade-off between goodness of fit and the number of parameters.
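The height/weight example can be sketched with ordinary least squares for a single explanatory variable, plus the Gaussian-error form of AIC, n·ln(RSS/n) + 2k (additive constants dropped), for comparing candidate models. The data below are made up for illustration, and this closed-form fit is a minimal sketch rather than any particular tool's operator.

```python
import math
from typing import Sequence, Tuple


def fit_simple_linear_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept; returns (slope, intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gaussian_aic(residuals: Sequence[float], num_params: int) -> float:
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    if rss == 0:
        raise ValueError("AIC is undefined for a perfect fit")
    return n * math.log(rss / n) + 2 * num_params


# Hypothetical heights (cm) and weights (kg).
heights = [160.0, 165.0, 170.0, 175.0, 180.0]
weights = [55.0, 61.0, 64.0, 70.0, 74.0]
slope, intercept = fit_simple_linear_regression(heights, weights)  # about 0.94 and -95.0
residuals = [w - (slope * h + intercept) for h, w in zip(heights, weights)]
aic = gaussian_aic(residuals, num_params=2)  # slope + intercept
```

To compare models, fit each candidate, compute its AIC with the right parameter count, and prefer the lowest value.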
2. Logistic Regression
Logistic regression is a classification model. It uses input variables to predict a categorical outcome variable, which can take on one of a limited set of class values. A binomial logistic regression handles two output categories, while a multinomial logistic regression allows for more than two classes. An example of logistic regression is classifying a binary condition as “healthy” / “not healthy”. Logistic regression applies the logistic sigmoid function to a weighted sum of the input values to generate a prediction of the data class.
A logistic regression model estimates the probability of each class of the dependent variable as a function of the independent variables. The dependent variable is the output that we are trying to predict. The independent (explanatory) variables are the factors that we think could influence the output. Multiple regression refers to regression analysis with two or more independent variables. Multivariate regression, on the other hand, refers to regression analysis with two or more dependent variables.
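A minimal sketch of binomial logistic regression trained by batch gradient descent on the mean log-loss, using only the standard library. The learning rate, epoch count and toy data are assumptions chosen for illustration; production libraries typically use faster solvers and add regularization.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.1, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent (labels must be 0/1)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy 1-D data: negative x -> "not healthy" (0), positive x -> "healthy" (1).
X = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
```

A class label follows by thresholding the probability, typically at 0.5.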
3. Linear Discriminant Analysis
Logistic Regression is a classification algorithm traditionally used for two-class classification problems. If you have more than two classes, then the Linear Discriminant Analysis algorithm is the preferred linear classification technique.
The representation of LDA is pretty straightforward. It consists of statistical properties of your data, calculated for each class. For a single input variable, this includes:
- The mean value for each class.
- The variance calculated across all classes.
We make predictions by calculating a discriminant value for each class and then predicting the class with the largest value. The technique assumes that the data has a Gaussian distribution, so it is a good idea to remove outliers from your data beforehand. It’s a simple and powerful method for classification predictive modelling problems.
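For a single input variable, with class means μ_k, a variance σ² shared by all classes and class priors P(k), a common form of the discriminant for class k is x·μ_k/σ² − μ_k²/(2σ²) + ln P(k). The sketch below implements that form; the pooled-variance estimate (dividing by n − K) and the toy data are standard choices assumed here, not details from the text.

```python
import math
from typing import Dict, Sequence, Tuple

LdaModel = Tuple[Dict[int, float], Dict[int, float], float]


def fit_lda_1d(xs: Sequence[float], ys: Sequence[int]) -> LdaModel:
    """Estimate class means, class priors and the pooled (shared) variance."""
    classes = sorted(set(ys))
    n, k = len(xs), len(classes)
    if n <= k:
        raise ValueError("need more samples than classes")
    means, priors = {}, {}
    for c in classes:
        members = [x for x, label in zip(xs, ys) if label == c]
        means[c] = sum(members) / len(members)
        priors[c] = len(members) / n
    squared = sum((x - means[label]) ** 2 for x, label in zip(xs, ys))
    return means, priors, squared / (n - k)


def predict_lda_1d(model: LdaModel, x: float) -> int:
    """Return the class with the largest linear discriminant value."""
    means, priors, var = model

    def discriminant(c: int) -> float:
        mu = means[c]
        return x * mu / var - mu * mu / (2 * var) + math.log(priors[c])

    return max(means, key=discriminant)


model = fit_lda_1d([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], [0, 0, 0, 1, 1, 1])
# means {0: 2.0, 1: 8.0}, priors 0.5 each, pooled variance 1.0
```

With equal priors and a shared variance, the decision boundary lies halfway between the class means (here at x = 5).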
4. Classification and Regression Trees
Prediction trees are used to predict a response or class $Y$ from inputs $X_1, X_2, \dots, X_n$. If the response is continuous, the tree is a regression tree; if it is categorical, it is a classification tree. At each node of the tree, we check the value of one of the inputs $X_i$. Depending on the (binary) answer, we continue to the left or to the right sub-branch. When we reach a leaf, we find the prediction.
Contrary to linear or polynomial regression, which are global models, trees try to partition the data space into parts small enough that a simple, different model can be applied to each part. The non-leaf part of the tree is just the procedure that determines, for each data point $x$, which model we will use to classify it.
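Growing a classification tree comes down to repeatedly choosing the split that makes the child nodes as pure as possible. The sketch below finds the best binary split `x <= threshold` on a single numeric input using Gini impurity, the criterion CART uses for classification trees. Candidate thresholds are midpoints between consecutive distinct values (a common choice assumed here), and the recursion that builds the full tree is left out.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum of squared class proportions (0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs: Sequence[float], ys: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted child impurity) of the best split x <= threshold."""
    values = sorted(set(xs))
    best_threshold, best_impurity = None, gini(ys)  # baseline: no split at all
    n = len(ys)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


threshold, impurity = best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
# threshold == 6.5 and impurity == 0.0: both children are pure
```

A full tree applies `best_split` over every input, keeps the best one, and recurses into the left and right subsets until the leaves are pure or too small.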
5. Naive Bayes
A Naive Bayes classifier is a supervised machine learning algorithm that applies Bayes’ Theorem together with the assumption that features are statistically independent. This “naive” assumption states that the input variables are independent of each other given the class, i.e. knowing the value of one variable tells you nothing extra about the others. Despite this simplifying assumption, it has proven itself to be a classifier with good results.
Naive Bayes classifiers rely on Bayes’ Theorem, which is based on conditional probability, or in simple terms, the likelihood that an event (A) will happen given that another event (B) has already happened. Essentially, the theorem allows a hypothesis to be updated each time new evidence is introduced. In the language of probability, Bayes’ Theorem reads:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Let’s explain what each of these terms means.
- “P” is the symbol to denote probability.
- P(A | B) = The probability of event A (hypothesis) occurring given that B (evidence) has occurred.
- P(B | A) = The probability of the event B (evidence) occurring given that A (hypothesis) has occurred.
- P(A) = The probability of event A (hypothesis) occurring.
- P(B) = The probability of event B (evidence) occurring.
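Putting the pieces together: for a class c and features x_1, ..., x_n, the naive assumption lets the numerator of Bayes’ Theorem factor into P(c)·P(x_1 | c)···P(x_n | c), and the denominator can be ignored because it is the same for every class. The sketch below implements this for categorical features, working in log space and adding Laplace smoothing (`alpha`) so that an unseen feature value does not zero out a class; both are common additions assumed here, and the weather data is made up.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, X: Sequence[Sequence[Hashable]], y: Sequence[Hashable]) -> "CategoricalNaiveBayes":
        self.n = len(y)
        self.class_counts = Counter(y)
        n_features = len(X[0])
        # counts[c][j][v]: how often feature j took value v in class c
        self.counts: Dict[Hashable, List[Counter]] = {
            c: [Counter() for _ in range(n_features)] for c in self.class_counts
        }
        self.values = [set() for _ in range(n_features)]
        for xi, yi in zip(X, y):
            for j, v in enumerate(xi):
                self.counts[yi][j][v] += 1
                self.values[j].add(v)
        return self

    def log_score(self, x: Sequence[Hashable], c: Hashable) -> float:
        """log P(c) + sum_j log P(x_j | c): the log numerator of Bayes' Theorem."""
        score = math.log(self.class_counts[c] / self.n)
        for j, v in enumerate(x):
            k = len(self.values[j])
            score += math.log((self.counts[c][j][v] + self.alpha) / (self.class_counts[c] + self.alpha * k))
        return score

    def predict(self, x: Sequence[Hashable]) -> Hashable:
        return max(self.class_counts, key=lambda c: self.log_score(x, c))


X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"), ("rainy", "cool")]
y = ["no", "no", "yes", "yes", "yes"]
nb = CategoricalNaiveBayes().fit(X, y)
```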
6. K-Nearest Neighbors
k-nearest neighbours (or k-NN for short) is a simple machine learning algorithm that categorizes an input by using its k nearest neighbours.
For example, suppose a k-NN algorithm is given data points of the weight and height of specific men and women, as plotted below. To determine the gender of an unknown input (green point), k-NN looks at its k nearest neighbours and, in this example, determines that the input’s gender is male. This method is a very simple and logical way of labelling unknown inputs, with a high rate of success.
Also, we can use k-NN in a variety of machine learning tasks; for example, in computer vision, k-NN can help identify handwritten letters, and in gene expression analysis, the algorithm can determine which genes contribute to a certain characteristic. Overall, k-nearest neighbours provides a combination of simplicity and effectiveness that makes it an attractive algorithm for many machine learning tasks.
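The height/weight example can be sketched as follows, assuming Euclidean distance and a simple majority vote among the k nearest training points. The data points and k = 3 are made-up values, not the ones in the plot.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def knn_predict(
    train_X: Sequence[Tuple[float, ...]], train_y: Sequence[str], x: Tuple[float, ...], k: int = 3
) -> str:
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (height cm, weight kg) samples.
train_X = [(180, 80), (175, 75), (185, 90), (160, 55), (165, 60), (155, 50)]
train_y = ["male", "male", "male", "female", "female", "female"]
label = knn_predict(train_X, train_y, (178, 78))  # "male"
```

Note that the whole training set is kept and scanned for every prediction, which is the memory cost that LVQ addresses next. In practice, features on different scales are usually normalised first so that one feature does not dominate the distance.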
7. Learning Vector Quantization
A downside of K-Nearest Neighbors is that you need to hang on to your entire training dataset. The Learning Vector Quantization algorithm (or LVQ for short) is an artificial neural network algorithm that allows you to choose how many training instances to hang onto and learns exactly what those instances should look like.
Additionally, the representation for LVQ is a collection of codebook vectors. These are selected randomly in the beginning and adapted to best summarize the training dataset over a number of iterations of the learning algorithm. Once learned, the codebook vectors can be used to make predictions just like K-Nearest Neighbors: we find the most similar neighbour (best matching codebook vector) by calculating the distance between each codebook vector and the new data instance. The class value (or real value in the case of regression) of the best matching unit is then returned as the prediction. Moreover, you can get the best results if you rescale your data to have the same range, such as between 0 and 1.
If you discover that KNN gives good results on your dataset try using LVQ to reduce the memory requirements of storing the entire training dataset.
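A minimal sketch of the LVQ1 training rule described above: for each training instance, the best matching codebook vector is moved towards the instance if their classes agree and away from it otherwise, with a learning rate that decays linearly over the epochs. To keep the example deterministic, the codebooks are initialised from chosen training instances instead of randomly; the learning rate, epoch count and data are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Codebook = Tuple[List[float], str]


def best_matching_unit(codebooks: Sequence[Codebook], x: Sequence[float]) -> Codebook:
    """Codebook vector closest to x (Euclidean distance)."""
    return min(codebooks, key=lambda book: math.dist(book[0], x))


def train_lvq1(
    X: Sequence[Sequence[float]],
    y: Sequence[str],
    initial: Sequence[Codebook],
    lr: float = 0.3,
    epochs: int = 20,
) -> List[Codebook]:
    codebooks = [(list(vec), label) for vec, label in initial]  # copy; keep the input intact
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linear decay
        for xi, yi in zip(X, y):
            vec, label = best_matching_unit(codebooks, xi)
            direction = 1.0 if label == yi else -1.0  # attract if same class, else repel
            for j in range(len(vec)):
                vec[j] += direction * rate * (xi[j] - vec[j])
    return codebooks


def predict_lvq(codebooks: Sequence[Codebook], x: Sequence[float]) -> str:
    return best_matching_unit(codebooks, x)[1]


X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = ["a", "a", "a", "b", "b", "b"]
books = train_lvq1(X, y, initial=[([0.0, 0.0], "a"), ([5.0, 5.0], "b")])
```

Only the two codebook vectors need to be stored for prediction, instead of all six training instances.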
8. Bagging and Random Forest
A Random Forest consists of a collection or ensemble of simple tree predictors, each capable of producing a response when presented with a set of predictor values. For classification problems, this response takes the form of a class membership, which associates, or classifies, a set of independent predictor values with one of the categories present in the dependent variable. Alternatively, for regression problems, the tree response is an estimate of the dependent variable given the predictors.
A Random Forest consists of an arbitrary number of simple trees, which determine the final outcome. For classification problems, the ensemble of simple trees votes for the most popular class. In the regression problem, we average responses to obtain an estimate of the dependent variable. Using tree ensembles can lead to significant improvement in prediction accuracy (i.e., better ability to predict new data cases).
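The sketch below illustrates the two ingredients described above, bootstrap sampling (bagging) and majority voting, with a heavy simplification: each “tree” is a one-level decision stump trained on a single randomly chosen feature, standing in for the random feature selection of a real random forest. The stump learner, the number of trees and the data are assumptions for illustration.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], str]


def fit_stump(X: Sequence[Vector], y: Sequence[str], feature: int) -> Predictor:
    """One-level tree on `feature`: pick the threshold with the fewest training errors."""
    majority = Counter(y).most_common(1)[0][0]
    best_errors, best_rule = len(y) + 1, (None, majority, majority)
    values = sorted(set(x[feature] for x in X))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [lab for x, lab in zip(X, y) if x[feature] <= t]
        right = [lab for x, lab in zip(X, y) if x[feature] > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
        if errors < best_errors:
            best_errors, best_rule = errors, (t, left_label, right_label)
    t, left_label, right_label = best_rule

    def predict(x: Vector) -> str:
        if t is None:  # only one distinct value: predict the majority label
            return majority
        return left_label if x[feature] <= t else right_label

    return predict


def fit_forest(X: Sequence[Vector], y: Sequence[str], n_trees: int = 25, seed: int = 0) -> List[Predictor]:
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        feature = rng.randrange(d)  # random feature choice per tree
        trees.append(fit_stump([X[i] for i in sample], [y[i] for i in sample], feature))
    return trees


def predict_forest(trees: Sequence[Predictor], x: Vector) -> str:
    """Classification: each tree votes and the most popular class wins."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]


X = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 0), (10, 11), (11, 10), (12, 12), (11, 13), (13, 11)]
y = ["small"] * 5 + ["large"] * 5
forest = fit_forest(X, y)
```

For regression, `predict_forest` would average the trees' numeric outputs instead of counting votes.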
9. SVM
A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are more commonly used in classification problems, and as such, this is what we will focus on in this post.
SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes, as shown in the image below.
In two dimensions, you can think of a hyperplane as a line that linearly separates and classifies a set of data.
Intuitively, the further from the hyperplane our data points lie, the more confident we are that they have been correctly classified. We therefore want our data points to be as far away from the hyperplane as possible, while still being on the correct side of it.
So when we add new test data, the side of the hyperplane on which it lands decides the class that we assign to it.
The distance between the hyperplane and the nearest data point from either set is the margin. Furthermore, the goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of correct classification of data.
But the data is rarely ever as clean as our simple example above. A dataset will often look more like the jumbled balls below which represent a linearly non-separable dataset.
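A minimal sketch of a linear soft-margin SVM, trained by full-batch subgradient descent on the regularised hinge loss (λ/2)·‖w‖² + (1/n)·Σ max(0, 1 − y_i(w·x_i + b)). The hinge term penalises points inside the margin or on the wrong side, so the same code still returns a hyperplane for a linearly non-separable dataset like the one above, at the cost of some training errors. The optimiser and its settings (λ, learning rate, epochs) are assumptions; libraries such as LibSVM solve the dual problem instead and also support kernels.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.01, lr: float = 0.1, epochs: int = 200
) -> Tuple[List[float], float]:
    """Labels must be +1 / -1. Returns (w, b) of the hyperplane w.x + b = 0."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """The side of the hyperplane decides the class."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```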
10. Boosting and AdaBoost
Boosting is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers. We do this by building a model from the training data, then creating a second model that attempts to correct the errors of the first model. Models are added until the training set is predicted perfectly or a maximum number of models is reached.
AdaBoost was the first really successful boosting algorithm developed for binary classification. It is the best starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines.
AdaBoost is used with short decision trees. After the first tree is created, its performance on each training instance is used to weight how much attention the next tree should pay to each training instance. Training data that is hard to predict is given more weight, whereas easy-to-predict instances are given less weight. Models are created sequentially, one after the other, each updating the weights on the training instances that affect the learning performed by the next tree in the sequence. After all the trees are built, predictions are made for new data, and the vote of each tree is weighted by how accurate it was on the training data.
Because the algorithm puts so much attention on correcting mistakes, it is important that you have clean data with outliers removed.
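A minimal sketch of discrete AdaBoost on a single numeric feature with labels ±1, using decision stumps (the shortest possible trees) as weak learners. Each stump's vote weight is α = ½·ln((1 − ε)/ε), where ε is its weighted error, and instance weights are multiplied by exp(−α·y·h(x)) and renormalised, which raises the weight of misclassified instances. The toy data is an assumption; it needs three rounds because no single stump can separate it.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, int]  # (threshold t, sign s): predicts s if x > t else -s


def stump_predict(stump: Stump, x: float) -> int:
    t, s = stump
    return s if x > t else -s


def adaboost(xs: Sequence[float], ys: Sequence[int], n_rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with 1-D decision stumps; labels must be +1 / -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    values = sorted(set(xs))
    # Candidate thresholds: below the minimum (constant stump) and between neighbours.
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    ensemble = []
    for _ in range(n_rounds):
        best_err, best = math.inf, None
        for t in thresholds:
            for s in (1, -1):
                err = sum(w for x, y, w in zip(xs, ys, weights) if stump_predict((t, s), x) != y)
                if err < best_err:
                    best_err, best = err, (t, s)
        if best_err >= 0.5:  # no weak learner better than chance: stop
            break
        best_err = max(best_err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - best_err) / best_err)
        ensemble.append((alpha, best))
        # Misclassified instances (y * h(x) = -1) get heavier, correct ones lighter.
        weights = [w * math.exp(-alpha * y * stump_predict(best, x)) for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble: Sequence[Tuple[float, Stump]], x: float) -> int:
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, n_rounds=3)
```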
Summary
A typical question asked by a beginner, when facing a wide variety of machine learning algorithms, is “which algorithm should I use?” The answer to the question varies depending on many factors, including: (1) The size, quality, and nature of data; (2) The available computational time; (3) The urgency of the task; and (4) What you want to do with the data.
Even an experienced data scientist cannot tell which algorithm will perform the best before trying different algorithms. Although there are many other Machine Learning algorithms, these are the most popular ones. If you’re a newbie to Machine Learning, these would be a good starting point to learn.
The foundations of most algorithms lie in linear algebra, multivariable calculus, and optimization methods. Most algorithms use a sequence of combinations to estimate an objective function given a set of data, and the sequence order and included methods distinguish one algorithm from another. It’s helpful to learn enough math to read the development papers associated with key algorithms in the field, as many other methods (or one’s own innovations) include pieces of those algorithms. It’s like learning the language of machine learning. Once you are fluent in it, it’s pretty easy to modify algorithms as needed and create new ones likely to improve on a problem in a short period of time.
Matrix factorization: a simple, beautiful way to do dimensionality reduction, and dimensionality reduction is the essence of cognition. Recommender systems would be a big application of matrix factorization. Another application I’ve been using over the years (starting in 2010 with video data) is factorizing a matrix of pairwise mutual information (or pointwise mutual information, which is more common) between features, which can be used for feature extraction, computing word embeddings, computing label embeddings (that was the topic of a recent paper of mine [1]), etc.
Used in a convolutional setting, this acts as an excellent unsupervised feature extractor for images and videos. There’s one big issue though: it is fundamentally a shallow algorithm. Deep neural networks will quickly outperform it if any kind of supervision labels are available.
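A minimal sketch of the pointwise-mutual-information idea, assuming a matrix of co-occurrence counts between features as input. It computes positive PMI, PPMI(i, j) = max(0, log(C[i][j] · N / (row_i · col_j))), where N is the total count and row_i, col_j are the row and column sums, and then extracts the leading rank-1 factor with power iteration. Clipping negative PMI to zero, and power iteration as the factorization method, are choices assumed here; the author's work may use a different factorization (for example a truncated SVD with more components).

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def ppmi(counts: Sequence[Sequence[float]]) -> Matrix:
    """Positive pointwise mutual information of a co-occurrence count matrix."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    result = []
    for i, row in enumerate(counts):
        out = []
        for j, c in enumerate(row):
            if c == 0:
                out.append(0.0)  # log(0) is -inf; PPMI clips it to 0
            else:
                out.append(max(0.0, math.log(c * total / (row_sums[i] * col_sums[j]))))
        result.append(out)
    return result


def rank1_factor(a: Matrix, iterations: int = 100) -> Tuple[float, List[float], List[float]]:
    """Leading singular triplet (sigma, u, v) of `a` by power iteration on A^T A."""
    n_cols = len(a[0])
    v = [1.0] * n_cols  # assumed not orthogonal to the top right singular vector
    for _ in range(iterations):
        av = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]  # A v
        atav = [sum(a[i][j] * av[i] for i in range(len(a))) for j in range(n_cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in atav))
        v = [x / norm for x in atav]
    av = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v


# Hypothetical co-occurrence counts between three features.
counts = [[10, 2, 0], [2, 8, 1], [0, 1, 6]]
embedding_matrix = ppmi(counts)
sigma, u, v = rank1_factor(embedding_matrix)
# Each feature i gets a 1-D embedding sigma * u[i]; real uses keep more components.
```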
[1] [1607.05691] Information-theoretical label embeddings for large-scale image classification
Machine Learning Demos:
- See how well you synchronize to the lyrics of the popular hit “Dance Monkey.” This in-browser experience uses the Facemesh model for estimating key points around the lips to score lip-syncing accuracy.
- Use your phone’s camera to identify emojis in the real world. Can you find all the emojis before time expires?
- Play Pac-Man using images trained in your browser.
- No coding required! Teach a machine to recognize images and play sounds.
- Explore pictures in a fun new way, just by moving around.
- Enjoy a real-time piano performance by a neural network.
- Train a server-side model to classify baseball pitch types using Node.js.
- See how to visualize in-browser training and model behaviour using tfjs-vis.
Community demos
Get started with official templates and explore top picks from the community for inspiration.
- Glitch: Check out community Glitches and make your own TensorFlow.js-powered projects.
- CodePen: Fork boilerplate templates and check out working examples from the community.
- GitHub Community Projects: See what the community has created and submitted to the TensorFlow.js gallery page.
https://cdpn.io/jasonmayes/fullcpgrid/QWbNeJd
Real time body segmentation using TensorFlow.js
Load in a pre-trained Body-Pix model from the TensorFlow.js team so that you can locate all pixels in an image that are part of a body, and what part of the body they belong to. Clone this to make your own TensorFlow.js powered projects to recognize body parts in images from your webcam and more!
https://cdpn.io/jasonmayes/fullcpgrid/qBEJxgg
Multiple object detection using a pre-trained model in TensorFlow.js
This demo shows how we can use a pre-made machine learning solution to recognize objects (yes, more than one at a time!) in any image you present to it. Even better, not only do we know that the image contains an object, but we also get the coordinates of the bounding box for each object it finds, which allows you to highlight the found object in the image.
For this demo we are loading a model using the ImageNet-SSD architecture, to recognize 90 common objects it has already been taught to find from the COCO dataset.
If what you want to recognize is in that list of things it knows about (for example a cat, dog, etc), this may be useful to you as is in your own projects, or just to experiment with Machine Learning in the browser and get familiar with the possibilities of machine learning.
If you are feeling particularly confident you can check out our GitHub documentation (https://github.com/tensorflow/tfjs-models/tree/master/coco-ssd) which goes into much more detail for customizing various parameters to tailor performance to your needs.
https://cdpn.io/jasonmayes/fullcpgrid/Jjompww
Classifying images using a pre-trained model in TensorFlow.js
This demo shows how we can use a pre-made machine learning solution to classify images. Note that this model works best when a single item is in the image at a time; busy images may not work so well. For those, you may want to try our demo for Multiple Object Detection (https://codepen.io/jasonmayes/pen/qBEJxgg).
For this demo we are loading a model using the MobileNet architecture, to recognize 1000 common objects it has already been taught to find from the ImageNet data set (http://image-net.org/).
If what you want to recognize is in that list of things it knows about (for example a cat, dog, etc), this may be useful to you as is in your own projects, or just to experiment with Machine Learning in the browser and get familiar with the possibilities of machine learning.
Please note: This demo loads an easy-to-use JavaScript class made by the TensorFlow.js team to do the hard work for you, so no machine learning knowledge is needed to use it.
If you were looking to learn how to load in a TensorFlow.js saved model directly yourself then please see our tutorial on loading TensorFlow.js models directly.
If you want to train a system to recognize your own objects, using your own data, then check out our tutorials on “transfer learning”.
TensorFlow.js Boilerplate
The hello world for TensorFlow.js 🙂 This is the absolute minimum needed to import TensorFlow.js into your website; it simply prints the loaded TensorFlow.js version. From here we can do great things. Clone this to make your own TensorFlow.js-powered projects, or use it if you are following a tutorial that needs TensorFlow.js to work.
Examples
tfjs-examples provides small code examples that implement various ML tasks using TensorFlow.js.
- MNIST Digit Recognizer: Train a model to recognize handwritten digits from the MNIST database.
- Addition RNN: Train a model to learn addition from text examples.
- TensorFlow.js Layers: Iris Demo
More TensorFlow examples
Top-paying Cloud certifications:
- Google Certified Professional Cloud Architect ($175,761/year)
- AWS Certified Solutions Architect – Associate ($149,446/year)
- Azure/Microsoft Cloud Solution Architect ($141,748/year)
- Google Cloud Associate Engineer ($145,769/year)
- AWS Certified Cloud Practitioner ($131,465/year)
- Microsoft Certified: Azure Fundamentals ($126,653/year)
- Microsoft Certified: Azure Administrator Associate ($125,993/year)
Supervised Learning
Linear Regression
Logistic Regression
Naive Bayes
Support Vector Machines
Decision Trees
K-Nearest Neighbors
Machine Learning in Practice
Bias-Variance Tradeoff
How to Select a Model
How to Select Features
Regularizing Your Model
Ensembling: How to Combine Your Models
Evaluation Metrics
Unsupervised Learning
Market Basket Analysis
K-Means Clustering
Principal Components Analysis
Deep Learning
Feedforward Neural Networks
Grab Bag of Neural Network Practices
Convolutional Neural Networks
Recurrent Neural Networks
Test Your Knowledge
Best Subset Features
Feature Selection Examples
Adding Features Example
Activation Practice I
Activation Practice II
Activation Practice III
Weight Initialization
Batch vs. Stochastic
Convolutional Application
Convolutional Layer Advantages
Are you interested in becoming an AWS Certified Machine Learning Specialist? If so, then this exam preparation blog is for you! The blog contains over 100 quiz and practice exam questions, as well as detailed answers. The questions are very similar to those you will encounter on the actual exam, so this is a great way to prepare. In addition, the blog also includes cheat sheets and illustrations to help you understand the concepts better.
Bring your own algorithm to an MLOps Pipeline: Architecture




Code and Serve Your ML Model with AWS CodeBuild


What are some ways we can use machine learning and artificial intelligence for algorithmic trading in the stock market?
How do we know that the top 3 voice recognition devices, like Siri, Alexa and OK Google, are not spying on us?
What are some good datasets for Data Science and Machine Learning?
Machine Learning Engineer Interview Questions and Answers
- [R] Graph-Oriented Generation (GOG): Replacing Vector R.A.G. for Codebases with Deterministic AST Traversal (70% Average Token Reduction) by /u/BodeMan5280 (Machine Learning) on March 7, 2026 at 3:25 am
Hey everyone. I’m a 5 YoE full-stack engineer who has been crossing over into AI research. Like many of you, I got incredibly frustrated with Vector RAG hallucinating import paths and losing context when navigating deep codebases. RAG treats strict software architecture like a probabilistic novel. I wanted to see what happened if we treated it like a mathematical graph instead.

I wrote a white paper and built a framework around this concept called Graph-Oriented Generation (GOG). The core idea is offloading architectural reasoning from the LLM to a deterministic Symbolic Reasoning Model (SRM).

How it works:
The Graph: Instead of chunking text, the SRM parses the entire repository using an AST and builds a strict Directed Acyclic Graph (DAG) of all dependencies.
Deterministic Traversal: We use zero-shot lexical seeding to find the user's target nodes, and then run a strict shortest-path / descendant-capture traversal to isolate the exact execution path. If a file isn't mathematically on that path, it's dropped.
O(1) State Evolution: Standard RAG requires O(N) re-indexing when a file changes. The SRM intercepts file saves and uses torch.cat to perform O(1) tensor surgery in-memory, hot-swapping the new AST nodes instantly.

The Benchmark Data: I ran a 3-tier complexity gauntlet using a highly constrained local model (Qwen 0.8B) on a procedurally generated 100+ file Vue/TS enterprise maze loaded with "red herring" files.
Local Compute Time (Context Assembly): 1.619s (RAG) vs. 0.001s (GOG) -> 99.9% Reduction
Tokens Sent to LLM (Easy Tier): 4,230 (RAG) vs. 451 (GOG) -> 89.3% Reduction
Total Execution Time: 136.77s vs. 29.96s -> 78.1% Reduction

Fed a pristine, noise-free execution path, the 0.8B model flawlessly solved deep architectural routing that caused the RAG-backed model to suffer catastrophic context collapse. It effectively demotes the LLM from a "reasoning engine" to a "syntax translator."
I'm relatively new to formal research, so I am actively looking for rigorous feedback, teardowns of the methodology, or anyone interested in collaborating on the next phase (applying this to headless multi-agent loops). GitHub Repo (Code + Benchmarks): https://github.com/dchisholm125/graph-oriented-generation Would love to hear your thoughts on where this architecture falls short or how it might scale into standard IDE environments! submitted by /u/BodeMan5280 [link] [comments]
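The "deterministic traversal" step in the GOG post can be illustrated with a minimal sketch, assuming the repository has already been parsed into an import map (file → files it imports). The `execution_slice` helper, the `import_graph` dict and the file names are hypothetical and do not come from the GOG repository; the sketch implements only descendant capture (breadth-first reachability from seed files), not the shortest-path part or the lexical seeding, and it uses plain dicts rather than tensors.

```python
from collections import deque


def execution_slice(import_graph: dict[str, list[str]], seeds: list[str]) -> list[str]:
    """Return every file reachable from the seed files by following import edges (BFS order)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already captured; also guards against accidental cycles
        seen.add(node)
        order.append(node)
        for dep in import_graph.get(node, []):
            if dep not in seen:
                queue.append(dep)
    return order


# Hypothetical Vue/TS project: the "red herring" legacy file is never imported from the seed.
import_graph = {
    "router.ts": ["views/Checkout.vue"],
    "views/Checkout.vue": ["stores/cart.ts", "utils/money.ts"],
    "stores/cart.ts": ["utils/money.ts"],
    "legacy/OldCheckout.vue": ["utils/money.ts"],
}
print(execution_slice(import_graph, ["router.ts"]))
# ['router.ts', 'views/Checkout.vue', 'stores/cart.ts', 'utils/money.ts']
```

Anything not reachable from the seeds, like `legacy/OldCheckout.vue` here, never reaches the prompt. The `seen` set also keeps the walk finite if a real import graph turns out to contain a cycle despite the DAG assumption.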
- [D] ISBI 2026 in London by /u/Marion-De (Machine Learning) on March 6, 2026 at 9:29 pm
Hey, everyone, is anyone from the sub going to ISBI this year? I have a paper accepted and will be giving an oral presentation. Would love to meet and connect in London for ISBI this year. submitted by /u/Marion-De [link] [comments]
- [R] Functional regularization: where do I start? by /u/PurpleCardiologist11 (Machine Learning) on March 6, 2026 at 9:09 pm
Hey guys, Any advice on functional regularization? Especially in physics applications, but general pointers are welcome too. I’m new to this and trying to understand how to regularize by controlling the function a model learns (its behavior), not just the parameters. Any good explanations, examples, or resources would be helpful! Also, I’m a bit confused about what the “original” functional regularization paper actually is, cause I’ve seen the term used in different contexts. Which paper is usually being referred to? Thanks! submitted by /u/PurpleCardiologist11 [link] [comments]
- [Project] Extracting vector geometry (SVG/DXF/STL) from photos + experimental hand-drawn sketch extraction by /u/Most-Geologist-9547 (Machine Learning) on March 6, 2026 at 5:36 pm
Hi everyone, I’ve been working on a project called ShapeScan, focused on extracting clean geometric outlines from photos of real-world objects. The goal is to convert images into usable vector and fabrication-ready formats such as SVG, DXF and STL.

The pipeline currently includes several stages:
Image normalization: color calibration, automatic page detection, perspective correction, noise cleanup.
Segmentation: classical segmentation for simple scenes, optional background removal, experiments with larger visual models for more complex objects.
Contour extraction: mask → contour detection, topology preservation (outer contour + holes), contour smoothing.
Geometry conversion: contours converted into paths, exported to SVG, DXF and STL (extruded).

One of the main challenges has been producing stable and manufacturable contours, especially for workflows such as laser cutting, CNC or CAD prototyping.

Drawing Mode (in development): I’m currently working on a new drawing mode designed specifically for hand-drawn sketches. The idea is simple: the user draws shapes on a sheet of paper and takes a photo of the sheet, and ShapeScan extracts the drawn outlines and converts them into clean SVG vector paths. This mode uses a different processing pipeline tuned for pen/pencil drawings, sketch noise cleanup, and outline extraction from hand-drawn lines.

I’m also experimenting with integrating larger vision models to improve segmentation robustness for more complex scenes. The long-term goal is to combine object scanning + sketch extraction into a single pipeline that can convert physical shapes or drawings into fabrication-ready geometry.

I’d be very interested in feedback from people working with segmentation, contour extraction, vectorization pipelines, or topology-preserving geometry extraction. Happy to discuss approaches or technical challenges. submitted by /u/Most-Geologist-9547 [link] [comments]
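The "geometry conversion" stage in the ShapeScan post (outer contour + holes → SVG path) can be sketched roughly as below. This is not ShapeScan's code; it assumes the contours have already been extracted as lists of (x, y) points, and the helper names are hypothetical. Each contour becomes one closed subpath, and `fill-rule="evenodd"` (a standard SVG attribute) makes the inner contours render as holes.

```python
def contour_to_subpath(points):
    """Turn one closed contour [(x, y), ...] into an SVG subpath: M to the first point, L to the rest, Z to close."""
    head, *rest = points
    parts = [f"M {head[0]} {head[1]}"] + [f"L {x} {y}" for x, y in rest]
    return " ".join(parts) + " Z"


def contours_to_svg(outer, holes, width, height):
    """Build a minimal SVG document from an outer contour and zero or more hole contours."""
    d = " ".join(contour_to_subpath(c) for c in [outer, *holes])
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'viewBox="0 0 {width} {height}">'
        f'<path d="{d}" fill="black" fill-rule="evenodd"/></svg>'
    )


# A 100 x 100 square plate with a square hole in the middle (illustrative coordinates).
outer = [(0, 0), (100, 0), (100, 100), (0, 100)]
hole = [(30, 30), (70, 30), (70, 70), (30, 70)]
svg = contours_to_svg(outer, [hole], 100, 100)
print(svg)
```

Keeping all contours in one `<path>` with even-odd filling is one simple way to preserve the outer/hole topology; laser-cutting or DXF export would typically also need units and path smoothing, which this sketch leaves out.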
- [P] Domain specific LoRA fine tuning on consumer hardware by /u/sandseb123 (Machine Learning) on March 6, 2026 at 5:29 pm
Been experimenting with a pattern for building domain-specific local LLMs that I haven't seen documented cleanly elsewhere.

The problem: base models are fine for general tasks but struggle with domain-specific structured data: wrong schema assumptions, inconsistent output formatting, and hallucinated column names, even when the data is passed as context via RAG.

The approach:
Phase 1: Use your existing RAG pipeline to generate (question, SQL, data, baseline_answer) examples automatically via a local model. No annotation, no cloud, ~100-200 examples in 20 minutes.
Phase 2: A single cloud pass, in which a stronger model rewrites baseline answers to gold-standard quality in your target style. One-time cost ~$2-5. This is the only external API call in the entire pipeline.
Phase 3: LoRA fine-tune on Qwen3.5-4B using mlx-lm (Apple Silicon) or Unsloth+TRL (CUDA). 15-40 min on M4 Mac mini, 10-25 min on RTX 3090.
Phase 4: Fuse and serve locally: mlx-lm on Apple Silicon, GGUF + Ollama on any platform.

Key observations:
- RAG alone doesn't fix schema hallucination in smaller models; LoRA is needed for structural consistency.
- The annotation quality ceiling matters more than example count past ~100 samples.
- 4B models post fine-tuning outperform untuned 70B models on narrow domain tasks in my testing.

Built a working implementation with a finance coach example. Curious if others have found better approaches to the annotation phase specifically; that feels like the biggest lever. https://github.com/sandseb123/local-lora-cookbook submitted by /u/sandseb123 [link] [comments]
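For readers new to the technique named in Phase 3, a minimal pure-Python sketch of the standard LoRA reparameterization may help: a frozen weight W (d × k) is used as W + (alpha / r) · B·A, where only B (d × r) and A (r × k) are trained. The toy matrices below are illustrative values, not Qwen3.5-4B's real layer sizes, and this is not how mlx-lm or Unsloth implement it internally.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * (B @ A): the weight a LoRA-adapted layer effectively applies."""
    r = len(A)  # rank = number of rows of A
    delta = matmul(B, A)  # d x k low-rank update
    scale = alpha / r
    return [[w + scale * dv for w, dv in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def trainable_params(d, k, r):
    """Full fine-tuning trains d*k weights for this matrix; LoRA trains only r*(d + k)."""
    return d * k, r * (d + k)


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2 x 3 weight
A = [[1.0, 2.0, 3.0]]                   # r x k, with r = 1
B = [[0.5], [0.0]]                      # d x r
print(lora_effective_weight(W, A, B, alpha=2.0))  # [[2.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
print(trainable_params(4096, 4096, 16))           # (16777216, 131072)
```

In the usual LoRA setup B starts at zero, so the adapted layer initially behaves exactly like the base model; "fusing" in Phase 4 corresponds to materializing the effective weight once so inference needs no extra matmuls.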
- [R] Low-effort papers by /u/lightyears61 (Machine Learning) on March 6, 2026 at 5:21 pm
I came across a professor with 100+ published papers, and the pattern is striking. Almost every paper follows the same formula: take a new YOLO version (v8, v9, v10, v11...), train it on a public dataset from Roboflow, report results, and publish. Repeat for every new YOLO release and every new application domain. https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=%22murat+bakirci%22+%22yolo%22&btnG= As someone who works in computer vision, I can confidently say this entire research output could be replicated by a grad student in a day or two using the Ultralytics repo. No novel architecture, no novel dataset, no new methodology, no real contribution beyond "we ran the latest YOLO on this dataset." The papers are getting accepted in IEEE conferences and even some Q1/Q2 journals, with surprisingly high citation counts. My questions: Is this actually academic misconduct? Is it reportable, or just a peer review failure? Is anything being done systemically about this kind of research? submitted by /u/lightyears61 [link] [comments]
- [D] Two college students built a prototype that tries to detect contradictions between research papers: curious if this would actually be useful by /u/PS_2005 (Machine Learning) on March 6, 2026 at 4:54 pm
Hi everyone, we’re two college students who spend way too much time reading papers for projects, and we kept running into the same frustrating situation: sometimes two papers say completely opposite things, but unless you happen to read both, you’d never notice. So we started building a small experiment to see if this could be detected automatically.

The idea is pretty simple: instead of just indexing papers, the system reads them and extracts causal claims like “X improves Y”, “X reduces Y”, “X enables Y”. Then it builds a graph of those relationships and checks if different papers claim opposite things. Example: Paper A: X increases Y; Paper B: X decreases Y. The system flags that and shows both papers side-by-side.

We recently ran it on one professor’s publication list (about 50 papers), and the graph it produced was actually pretty interesting. It surfaced a couple of conflicting findings across studies that we probably wouldn't have noticed just by reading abstracts.

But it's definitely still a rough prototype. Some issues we’ve noticed: claim extraction sometimes loses conditions in sentences; occasionally the system proposes weird hypotheses; domain filtering still needs improvement.

Tech stack is pretty simple: Python / FastAPI backend, React frontend, Neo4j graph database, OpenAlex for paper data, LLMs for extracting claims. Also, being honest here: a decent portion of the project was vibe-coded while exploring the idea, so the architecture evolved as we went along.

We’d really appreciate feedback from people who actually deal with research literature regularly. Some things we’re curious about: Would automatic contradiction detection be useful in real research workflows? How do you currently notice when papers disagree with each other? What would make you trust (or distrust) a tool like this?
If anyone wants to check it out, here’s the prototype: ukc-pink.vercel.app/ We’re genuinely trying to figure out whether this is something researchers would actually want, so honest criticism is very welcome. Thanks! submitted by /u/PS_2005 [link] [comments]
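The core check described in that post (paper A says X increases Y, paper B says X decreases Y) can be sketched in a few lines, assuming claims have already been extracted as (paper, subject, verb, object) tuples. The `POLARITY` table and `find_contradictions` helper are hypothetical, not the prototype's Neo4j implementation; verbs outside the table, such as "enables", are treated as non-directional and skipped, and conditions attached to claims are ignored entirely, which is exactly the limitation the authors mention.

```python
from collections import defaultdict

# Assumed verb polarity: +1 pushes the object up, -1 pushes it down.
POLARITY = {"increases": 1, "improves": 1, "decreases": -1, "reduces": -1}


def find_contradictions(claims):
    """claims: iterable of (paper_id, subject, verb, object) tuples.

    Groups claims by (subject, object) edge and reports every edge that at least
    one paper says goes up and at least one paper says goes down.
    """
    edges = defaultdict(lambda: {1: set(), -1: set()})
    for paper, subj, verb, obj in claims:
        sign = POLARITY.get(verb)
        if sign is None:
            continue  # non-directional or unknown verb
        edges[(subj, obj)][sign].add(paper)
    conflicts = []
    for (subj, obj), papers in sorted(edges.items()):
        if papers[1] and papers[-1]:
            conflicts.append((subj, obj, sorted(papers[1]), sorted(papers[-1])))
    return conflicts


claims = [
    ("paper_A", "X", "increases", "Y"),
    ("paper_B", "X", "decreases", "Y"),
    ("paper_C", "X", "improves", "Z"),
    ("paper_D", "X", "enables", "Z"),
]
print(find_contradictions(claims))  # [('X', 'Y', ['paper_A'], ['paper_B'])]
```

Treating synonyms like "improves" and "increases" as the same polarity is what lets two differently worded papers land on the same edge; a real system would also need entity normalization so that "X" means the same thing in both papers.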
- How do you deal with bad bosses? by /u/AdministrativeRub484 (Data Science) on March 6, 2026 at 3:50 pm
By "bad" I don't mean that they make me work extra hours or micromanage; the issue I have faced at my last 2 companies is that my bosses are just bad at technical stuff. One example is that they sometimes tell me to evaluate regular classifiers on the training data... Another is that they come up with their own method (without researching anything). I, on the other hand, am not that cocky, so I try to see how the field is tackling the problem we have. I run experiments, show that my method is better, and they either admit they were wrong and we move on with my method (very, very rarely) or they make up an excuse or complaint (last time it was "why did you even evaluate your method and compared it with mine?!?!"). Now we are working on a refactor of a package that is being done 100% with Claude, but it's making so many mistakes and it misunderstood the purpose of the project so badly that the package is unusable, yet my boss keeps saying "don't code, just ask Claude". These are people with high egos who never "research" anything and think their word is gospel. They have 10+ years of experience (sometimes in data science), and because I only have 2, they never listen to me. Is there anything I can do in these situations, or can I really only look for a better company/work colleagues? Or is it like this in every company and I might as well leave it be? Or is it just because I am "new"? submitted by /u/AdministrativeRub484 [link] [comments]
- [D] Unpopular opinion: "context window size" is a red herring if you don’t control what goes in it. by /u/hack_the_developer (Machine Learning) on March 6, 2026 at 3:24 pm
We keep talking about 128k, 200k, 1M context. But if the model is bad at using the middle, or we’re stuffing in noise, more window just means more cost and more confusion. I’d rather have a small, curated context than a huge dump. Curious if others think the real problem is formation - what we put in, in what order, and how we compact - not raw size. What’s your take? submitted by /u/hack_the_developer [link] [comments]
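One way to make "formation" concrete is a small budgeted packer: rank candidate chunks by relevance, keep the best ones that fit a token budget, then restore their original order so related text stays in sequence. This is a minimal sketch under assumptions: the relevance scores and token counts are taken as given (in practice they would come from a retriever and a tokenizer), and the chunk tuples and `pack_context` name are hypothetical.

```python
def pack_context(chunks, budget_tokens):
    """Greedy context formation under a token budget.

    chunks: list of (position, relevance_score, token_count, text).
    Returns (texts in original document order, tokens used).
    """
    ranked = sorted(chunks, key=lambda c: c[1], reverse=True)  # most relevant first
    chosen, used = [], 0
    for chunk in ranked:
        if used + chunk[2] <= budget_tokens:  # skip anything that would overflow the budget
            chosen.append(chunk)
            used += chunk[2]
    chosen.sort(key=lambda c: c[0])  # order matters: put chunks back in document order
    return [c[3] for c in chosen], used


chunks = [
    (0, 0.9, 400, "schema for orders table"),
    (1, 0.2, 3000, "unrelated changelog"),
    (2, 0.7, 500, "query examples"),
    (3, 0.8, 700, "join conventions"),
]
print(pack_context(chunks, 1500))  # (['schema for orders table', 'join conventions'], 1100)
```

Note that the large low-relevance chunk is excluded even though a bigger window could hold it; with a fixed budget, the selection rule, not the window size, decides what the model sees.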
- New ML/DS project structure for human & AI by /u/SummerElectrical3642 (Data Science) on March 6, 2026 at 2:33 pm
AI is pushing DS/ML work toward faster, automated, parallel iteration. Recently I found that the bottleneck is no longer training runs: it’s the repo and process design.

Most projects are still organized by file type (src/, notebooks/, data/, configs/). That’s convenient for browsing, but brittle for operating a team of AI agents:
Hidden lineage: you can’t answer “what produced this model?” without reading the code.
Scattered dependencies: one experiment touches 5 places; it's easy to miss the real source of truth.
No parallel safety: multiple experiments create conflicts.

I tried to wrap my head around this topic and propose a better structure:
Organize by self-sufficient deliverables: src/ is the main package, the glue stitching everything together. datasets/ holds self-contained datasets, HF style, with docs, a loading utility and a lineage script, versioned by DVC. model/ is similar to datasets/: self-contained, HF style, with docs, including scripts to train, evaluate, run error analysis, etc. deployments/ is organized by deployment artifacts for different environments.
Make entry points obvious: each deliverable has a local README and one canonical run command per artifact.
Make lineage explicit and mechanical: DVC pipeline + versioned outputs.
All context lives in the repo: all insights, experiments and decisions are logged in journal/. Journal entries are Markdown, timestamped, and reference a git hash.

Process: experiments start on a branch such as exp/try-something-new and are then either merged back to main or archived. In both cases, create a journal entry in main. Merges to main trigger staging; releases trigger production. If the project grows large, it is easy to split into independent repos.

It may sound heavy in the beginning, but once the rules are set, our AI friends take care of the operations and bookkeeping. Curious how you have been working with AI agents recently, and which structure works best for you? submitted by /u/SummerElectrical3642 [link] [comments]
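The journal rule in that post (timestamped Markdown entries that reference a git hash) is easy to mechanize. A minimal sketch, assuming entries live in a `journal/` folder and the caller supplies the commit hash (for example from `git rev-parse HEAD`); the file-naming scheme and helper names are hypothetical, not part of the post.

```python
from datetime import datetime, timezone
from pathlib import Path


def format_journal_entry(title, body, git_hash, now):
    """Return (filename, markdown) for a journal entry; `now` is expected to be a UTC datetime."""
    stamp = now.strftime("%Y-%m-%dT%H-%M-%SZ")  # filesystem-safe timestamp (no colons)
    slug = "-".join(title.lower().split())
    text = (
        f"# {title}\n\n"
        f"- date: {now.isoformat()}\n"
        f"- commit: {git_hash}\n\n"
        f"{body}\n"
    )
    return f"{stamp}_{slug}.md", text


def write_journal_entry(journal_dir, title, body, git_hash):
    """Write the entry under journal_dir (created if missing), stamped with the current UTC time."""
    name, text = format_journal_entry(title, body, git_hash, datetime.now(timezone.utc))
    path = Path(journal_dir) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


name, text = format_journal_entry(
    "Try new tokenizer", "Experiment notes go here.", "abc1234",
    datetime(2026, 3, 6, 14, 33, tzinfo=timezone.utc),
)
print(name)  # 2026-03-06T14-33-00Z_try-new-tokenizer.md
```

Splitting formatting from writing keeps the naming logic testable, and a timestamp prefix makes `ls journal/` sort entries chronologically, which is handy when an agent needs to read recent decisions first.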
- [D] ECCV submission flowed over page limit by 5 lines at the last minute.. how screwed are we? by /u/PatientWrongdoer9257 (Machine Learning) on March 6, 2026 at 2:14 pm
We were making minor changes (like replacing a single word) to the submission before it closed and forgot to check the page count, since we already uploaded one that fit. Unfortunately it overflowed by 5 lines onto page 15, leaving empty space on others. Are they going to be flexible about this? Can we address this to AC and pray they understand? submitted by /u/PatientWrongdoer9257 [link] [comments]
- Mar 2026: How effective is a Copilot Studio RAG Agent for easy/medium use-cases? by /u/noimgonnalie (Data Science) on March 6, 2026 at 7:53 am
submitted by /u/noimgonnalie [link] [comments]
- [P] On-device speech toolkit for Apple Silicon: ASR, TTS, diarization, speech-to-speech, all in native Swift by /u/ivan_digital (Machine Learning) on March 6, 2026 at 7:44 am
Open-source Swift package running 11 speech models on Apple Silicon via MLX (GPU) and CoreML (Neural Engine). Fully local inference, no cloud dependency.

Models implemented:
ASR: Qwen3-ASR 0.6B/1.7B (4-bit), Parakeet TDT (CoreML INT4); RTF ~0.06 on M2 Max
TTS: Qwen3-TTS 0.6B (4-bit), CosyVoice3 0.5B (4-bit); streaming, ~120ms first chunk
Speech-to-speech: PersonaPlex 7B (4-bit); full-duplex, RTF ~0.87
VAD: Silero v5, Pyannote segmentation-3.0; streaming + overlap detection
Diarization: Pyannote + WeSpeaker + spectral clustering; auto speaker count via GMM-BIC
Enhancement: DeepFilterNet3 (CoreML); real-time 48kHz noise suppression
Alignment: Qwen3-ForcedAligner; non-autoregressive, RTF ~0.018

Key design choice: MLX for large models on GPU, CoreML for small models on Neural Engine. This lets you run VAD on ANE while ASR runs on GPU without contention, something WhisperKit struggles with (their Core ML audio encoder blocks the ANE for 300-600ms per call). All models conform to shared protocols, so you can swap implementations or compose pipelines.

Currently working on a MeetingTranscriber pipeline (diarize → per-segment ASR) and streaming real-time diarization. Roadmap: https://github.com/soniqo/speech-swift/discussions/81 Repo: https://github.com/soniqo/speech-swift submitted by /u/ivan_digital [link] [comments]
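The RTF (real-time factor) figures in that post follow the usual definition: processing time divided by audio duration, so values below 1.0 mean faster than real time. A small sketch of what those numbers imply; the helper names are illustrative, and the RTF values plugged in are simply the ones quoted in the post for the author's M2 Max setup.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 means faster than real time."""
    return processing_seconds / audio_seconds


def processing_time(rtf, audio_seconds):
    """Invert the definition: expected processing time for a clip at a given RTF."""
    return rtf * audio_seconds


# ASR at RTF ~0.06 on one hour of audio: roughly 216 s (about 3.6 minutes).
print(round(processing_time(0.06, 3600), 1))
# Speech-to-speech at RTF ~0.87 on one minute of audio: roughly 52 s, only just real time.
print(round(processing_time(0.87, 60), 1))
```

This is why the speech-to-speech model's RTF of ~0.87 matters more than it looks: with little headroom below 1.0, any contention (for example another model competing for the same accelerator) can push a full-duplex pipeline behind real time.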
- [R] Anyone experimenting with heterogeneous (different base LLMs) multi-agent systems for open-ended scientific reasoning or hypothesis generation? by /u/Clear-Dimension-6890 (Machine Learning) on March 6, 2026 at 6:35 am
Quick question — has anyone tried multi-agent setups where agents use genuinely different underlying LLMs (not just roles on the same model) for scientific-style open-ended reasoning or hypothesis gen? Most stuff seems homogeneous. Curious if mixing distinct priors adds anything useful, or if homogeneous still rules. Pointers to papers/experiments/anecdotes appreciated! Thanks! submitted by /u/Clear-Dimension-6890 [link] [comments]
- [R] MICCAI 2026 Early Decisions by /u/Amazing_Lie1688 (Machine Learning) on March 6, 2026 at 2:13 am
Hi, I am wondering if anyone has received their manuscript decision. Mine shows the status "awaiting decision." Last time, it was desk-rejected, and I am curious if this indicates a desk rejection. Thanks submitted by /u/Amazing_Lie1688 [link] [comments]
- [D] M1 Pro is hitting a wall with LLMs. Upgrade to M5 Max now or wait for the M6 redesign? by /u/tom_mathews (Machine Learning) on March 6, 2026 at 1:43 am
I'm an AI Engineer currently daily-driving a 16" M1 Pro MBP. It’s been a workhorse, but I’m feeling the bottleneck when running larger local LLMs (30B+ parameters or heavy RAG pipelines). With the M5 Pro/Max "Fusion Architecture" just announced, the 8x AI performance jump over the M1 generation is tempting, especially with the 18-core CPU and faster SSDs. However, I have two hesitations: The Notch: I still find it non-functional and distracting. The M6 Rumors: Reliable leaks suggest a late 2026 redesign with Tandem OLED, a hole-punch/Dynamic Island (finally moving past the notch), and even thinner chassis. For those doing heavy local inference: is the M5 Max gain worth pulling the trigger now, or is the M1 Pro "good enough" to limp through until the M6 redesign actually fixes the display? submitted by /u/tom_mathews [link] [comments]
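Since the bottleneck in that post is running 30B+ parameter models locally, a back-of-the-envelope estimate of weight memory can help frame the hardware question. This is a rough rule of thumb (bytes ≈ parameters × bits per weight / 8), not a measurement: real quantized files carry some extra metadata, and the KV cache, activations and the rest of the system all need unified memory on top of the weights.

```python
def weight_memory_gb(n_params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    bytes = parameters * bits_per_weight / 8. KV cache, activations and runtime
    overhead come on top of this, so treat the result as a lower bound.
    """
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"30B model at {bits}-bit: ~{weight_memory_gb(30, bits):.0f} GB of weights")
# 30B model at 16-bit: ~60 GB of weights
# 30B model at 8-bit: ~30 GB of weights
# 30B model at 4-bit: ~15 GB of weights
```

Comparing these figures against a machine's unified memory, minus what the OS and other apps need, is a quick first filter before looking at throughput numbers.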
- How to prep for Full Stack DS interview? by /u/LeaguePrototype (Data Science) on March 6, 2026 at 1:43 am
I have an interview coming up for a full-stack DS position at a small, public, tech-adjacent company. I'm excited for it since it seems highly technical, but they list every aspect of DS in the job description. It seems oriented toward ML and A/B testing: you'd be helping build the models and test them, since the product itself is built around ML. The technical part of the interview consists of a Python round and an onsite (or virtual onsite). Has anyone had similar interviews? How do you recommend I prep? I'm mostly concerned about how deep to go on each topic, or what they are mostly interested in seeing. In the past I've had interviews of all levels of technical depth. submitted by /u/LeaguePrototype [link] [comments]
- [D] IJCAI'26 AI4Tech track by /u/AddendumNo5533 (Machine Learning) on March 6, 2026 at 12:12 am
Did anyone submit to this ? Please let me know if you have, and whether or not you received any notification yet. submitted by /u/AddendumNo5533 [link] [comments]
- How do you keep track of model iterations in a project? by /u/Fig_Towel_379 (Data Science) on March 5, 2026 at 9:53 pm
At my company some of the ML processes are still pretty immature. For example, if my teammate and I are testing two different modeling approaches, each approach ends up having multiple iterations like different techniques, hyperparameters, new datasets, etc. It quickly gets messy and it’s hard to keep track of which model run corresponds to what. We also end up with a lot of scattered Jupyter notebooks. To address this I’m trying to build a small internal tool. Since we only use XGBoost, the idea is to keep it simple. A user would define a config file with things like XGBoost parameters, dataset, output path, etc. The tool would run the training and generate a report that summarizes the experiment: which hyperparameters were used, which model performed best, evaluation metrics, and some visualizations. My hope is that this reduces the need for long, messy notebooks and makes experiments easier to track and reproduce. What do you think of this? Edit: I cannot use external tools such as MLflow submitted by /u/Fig_Towel_379 [link] [comments]
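The config-driven tool described in that post can start very small without external trackers. A minimal sketch using only the standard library: each config gets a deterministic run ID (a hash of the canonical JSON), each run produces a report record, and a helper picks the best run by a chosen metric. `train_and_eval` stands in for the team's own XGBoost wrapper and is hypothetical; `max_depth` and `eta` are real XGBoost parameter names, but the metric values in the usage example come from a fake training function, not real results.

```python
import hashlib
import json


def run_id(config: dict) -> str:
    """Deterministic ID: the same config (regardless of key order) always maps to the same run ID."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def run_experiment(config: dict, train_and_eval) -> dict:
    """Run one experiment and return a report record.

    train_and_eval is a hypothetical callable(config) -> dict of metrics (e.g. wrapping XGBoost).
    """
    metrics = train_and_eval(config)
    return {"run_id": run_id(config), "config": config, "metrics": metrics}


def best_run(reports, metric, higher_is_better=True):
    """Pick the report with the best value of `metric`."""
    key = lambda r: r["metrics"][metric]
    return max(reports, key=key) if higher_is_better else min(reports, key=key)


def fake_train(cfg):
    """Placeholder standing in for real training; returns a made-up score."""
    return {"auc": 0.80 + cfg["params"]["max_depth"] / 100}


configs = [{"dataset": "v1.parquet", "params": {"max_depth": d, "eta": 0.1}} for d in (3, 6)]
reports = [run_experiment(c, fake_train) for c in configs]
print(best_run(reports, "auc")["config"]["params"])  # {'max_depth': 6, 'eta': 0.1}
```

Hashing the config means rerunning an identical configuration lands on the same ID, which makes duplicate runs easy to spot; dumping each report to a JSON file named after its run ID would give a simple, greppable experiment history.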
- [D] Has anyone read Blaise Agüera y Arcas' What is Intelligence? by /u/LowStatistician11 (Machine Learning) on March 5, 2026 at 5:47 pm
I've read the first couple sections and it seems he is gearing up to make some big claims. Almost suspecting some pop philosophy that belongs on r/singularity. But he seems like a legit researcher and also the guy that invented federated learning apparently. lmk if anyone here has any inputs. submitted by /u/LowStatistician11 [link] [comments]
What is Google Workspace?
Google Workspace is a cloud-based productivity suite that helps teams communicate, collaborate and get things done from anywhere and on any device. It's simple to set up, use and manage, so your business can focus on what really matters.
Watch a video or find out more here.
Here are some highlights:
Business email for your domain
Look professional and communicate as you@yourcompany.com. Gmail's simple features help you build your brand while getting more done.
Access from any location or device
Check emails, share files, edit documents, hold video meetings and more, whether you're at work, at home or on the move. You can pick up where you left off from a computer, tablet or phone.
Enterprise-level management tools
Robust admin settings give you total command over users, devices, security and more.
Sign up using my link https://referworkspace.app.goo.gl/Q371 and get a 14-day trial, and message me to get an exclusive discount when you try Google Workspace for your business.
Google Workspace Business Standard Promotion code for the Americas
63F733CLLY7R7MM
63F7D7CPD9XXUVT
63FLKQHWV3AEEE6
63JGLWWK36CP7WM
Email me for more promo codes
Smartphone 101 - Pick a smartphone for me - Android or iOS - Apple iPhone or Samsung Galaxy or Huawei or Xiaomi or Google Pixel
Can AI Really Predict Lottery Results? We Asked an Expert.
Djamgatech

Read Photos and PDFs Aloud for me iOS
Read Photos and PDFs Aloud for me android
Read Photos and PDFs Aloud For me Windows 10/11
Read Photos and PDFs Aloud For Amazon
Get 20% off Google Workspace (Google Meet) Business Plan (AMERICAS): M9HNXHX3WC9H7YE (Email us for more)
Get 20% off Google Workspace (Google Meet) Standard Plan with the following code: 96DRHDRA9J7GTN6 (Email us for more)
AI-Powered Professional Certification Quiz Platform
Web|iOS|Android|Windows
FREE 10,000+ Quiz Trivia and Brain Teasers for All Topics, including Cloud Computing, General Knowledge, History, Television, Music, Art, Science, Movies, Films, US History, Soccer Football, World Cup, Data Science, Machine Learning, Geography, etc.

List of freely available programming books: what is the single most influential book every programmer should read?
- Bjarne Stroustrup - The C++ Programming Language
- Brian W. Kernighan, Rob Pike - The Practice of Programming
- Donald Knuth - The Art of Computer Programming
- Ellen Ullman - Close to the Machine
- Ellis Horowitz - Fundamentals of Computer Algorithms
- Eric Raymond - The Art of Unix Programming
- Gerald M. Weinberg - The Psychology of Computer Programming
- James Gosling - The Java Programming Language
- Joel Spolsky - The Best Software Writing I
- Keith Curtis - After the Software Wars
- Richard M. Stallman - Free Software, Free Society
- Richard P. Gabriel - Patterns of Software
- Richard P. Gabriel - Innovation Happens Elsewhere
- Code Complete (2nd edition) by Steve McConnell
- The Pragmatic Programmer
- Structure and Interpretation of Computer Programs
- The C Programming Language by Kernighan and Ritchie
- Introduction to Algorithms by Cormen, Leiserson, Rivest & Stein
- Design Patterns by the Gang of Four
- Refactoring: Improving the Design of Existing Code
- The Mythical Man Month
- The Art of Computer Programming by Donald Knuth
- Compilers: Principles, Techniques and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman
- Gödel, Escher, Bach by Douglas Hofstadter
- Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin
- Effective C++
- More Effective C++
- CODE by Charles Petzold
- Programming Pearls by Jon Bentley
- Working Effectively with Legacy Code by Michael C. Feathers
- Peopleware by DeMarco and Lister
- Coders at Work by Peter Seibel
- Surely You're Joking, Mr. Feynman!
- Effective Java 2nd edition
- Patterns of Enterprise Application Architecture by Martin Fowler
- The Little Schemer
- The Seasoned Schemer
- Why's (Poignant) Guide to Ruby
- The Inmates Are Running The Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity
- The Art of Unix Programming
- Test-Driven Development: By Example by Kent Beck
- Practices of an Agile Developer
- Don't Make Me Think
- Agile Software Development, Principles, Patterns, and Practices by Robert C. Martin
- Domain-Driven Design by Eric Evans
- The Design of Everyday Things by Donald Norman
- Modern C++ Design by Andrei Alexandrescu
- Best Software Writing I by Joel Spolsky
- The Practice of Programming by Kernighan and Pike
- Pragmatic Thinking and Learning: Refactor Your Wetware by Andy Hunt
- Software Estimation: Demystifying the Black Art by Steve McConnell
- The Passionate Programmer (My Job Went To India) by Chad Fowler
- Hackers: Heroes of the Computer Revolution
- Algorithms + Data Structures = Programs
- Writing Solid Code
- JavaScript - The Good Parts
- Getting Real by 37 Signals
- Foundations of Programming by Karl Seguin
- Computer Graphics: Principles and Practice in C (2nd Edition)
- Thinking in Java by Bruce Eckel
- The Elements of Computing Systems
- Refactoring to Patterns by Joshua Kerievsky
- Modern Operating Systems by Andrew S. Tanenbaum
- The Annotated Turing
- Things That Make Us Smart by Donald Norman
- The Timeless Way of Building by Christopher Alexander
- The Deadline: A Novel About Project Management by Tom DeMarco
- The C++ Programming Language (3rd edition) by Stroustrup
- Patterns of Enterprise Application Architecture
- Computer Systems - A Programmer's Perspective
- Agile Principles, Patterns, and Practices in C# by Robert C. Martin
- Growing Object-Oriented Software, Guided by Tests
- Framework Design Guidelines by Brad Abrams
- Object Thinking by Dr. David West
- Advanced Programming in the UNIX Environment by W. Richard Stevens
- Hackers and Painters: Big Ideas from the Computer Age
- The Soul of a New Machine by Tracy Kidder
- CLR via C# by Jeffrey Richter
- Design Patterns in C# by Steve Metsker
- Alice in Wonderland by Lewis Carroll
- Zen and the Art of Motorcycle Maintenance by Robert M. Pirsig
- About Face - The Essentials of Interaction Design
- Here Comes Everybody: The Power of Organizing Without Organizations by Clay Shirky
- The Tao of Programming
- Computational Beauty of Nature
- Writing Solid Code by Steve Maguire
- Philip and Alex's Guide to Web Publishing
- Object-Oriented Analysis and Design with Applications by Grady Booch
- Effective Java by Joshua Bloch
- Computability by N. J. Cutland
- Masterminds of Programming
- The Tao Te Ching
- The Productive Programmer
- The Art of Deception by Kevin Mitnick
- The Career Programmer: Guerilla Tactics for an Imperfect World by Christopher Duncan
- Paradigms of Artificial Intelligence Programming: Case studies in Common Lisp
- Masters of Doom
- Pragmatic Unit Testing in C# with NUnit by Andy Hunt and Dave Thomas with Matt Hargett
- How To Solve It by George Polya
- The Alchemist by Paulo Coelho
- Smalltalk-80: The Language and its Implementation
- Writing Secure Code (2nd Edition) by Michael Howard
- Introduction to Functional Programming by Philip Wadler and Richard Bird
- No Bugs! by David Thielen
- Rework by Jason Fried and DHH
- JUnit in Action
Top 1000 Canada Quiz and Trivia: CANADA CITIZENSHIP TEST - HISTORY - GEOGRAPHY - GOVERNMENT - CULTURE - PEOPLE - LANGUAGES - TRAVEL - WILDLIFE - HOCKEY - TOURISM - SCENERIES - ARTS - DATA VISUALIZATION

Top 1000 Africa Quiz and Trivia: HISTORY - GEOGRAPHY - WILDLIFE - CULTURE - PEOPLE - LANGUAGES - TRAVEL - TOURISM - SCENERIES - ARTS - DATA VISUALIZATION

Exploring the Pros and Cons of Visiting All Provinces and Territories in Canada

Exploring the Advantages and Disadvantages of Visiting All 50 States in the USA

Health: a science-based community to discuss human health
- Just one dose of psilocybin relieves symptoms of OCD for months | New Scientist (by /u/chilladipa on March 7, 2026 at 5:28 am)
- Groundbreaking new drug shows promise for treating children with a devastating form of epilepsy (by /u/_Dark_Wing on March 7, 2026 at 5:19 am)
- FDA’s controversial vaccine chief will exit agency next month (by /u/cnn on March 6, 2026 at 11:01 pm)
- Measles is 'worse than expected' in Utah, officials say: Measles patients in Utah are developing severe complications, health officials say, including potentially life-threatening anemia and liver inflammation (by /u/DoremusJessup on March 6, 2026 at 10:33 pm)
- Researchers say AI can predict Alzheimer’s disease with close to 93 percent accuracy (by /u/theindependentonline on March 6, 2026 at 6:20 pm)
Today I Learned (TIL): You learn something new every day; what did you learn today? Submit interesting and specific facts about something that you just found out here.
- TIL about Dixy Lee Ray, the first woman governor of Washington. A Democrat, she climbed Mt. Rainier at age 12, changed her name from "Marguerite" to reference Robert E. Lee, ran the Atomic Energy Commission under Nixon, never married, and ran off hippies with a whistle when she was a museum curator (by /u/Smaptimania on March 7, 2026 at 8:31 am)
- TIL Vicks VapoInhalers contained an optical isomer of methamphetamine until 2016 (by /u/letseatnudels on March 7, 2026 at 6:31 am)
- TIL that an estimated 1 in 10 people have “inverse salt sensitivity”: if they eat less salt, their blood pressure actually increases (by /u/Comfortable_List3413 on March 7, 2026 at 5:10 am)
- TIL the classic Interstellar black hole doesn't look like this because there are multiple rings; it's one horizontal ring that is "lensed" to look like a separate vertical halo. (by /u/velvetcrow5 on March 7, 2026 at 3:55 am)
- TIL that the United States Coast Guard has only one member, SM1 Douglas Munro, who has been awarded the Medal of Honor. SM1 Munro used his Higgins boat as a shield to protect Marines being evacuated during the Battle of Guadalcanal, dying in the process. (by /u/Advanced_Narwhal_949 on March 7, 2026 at 3:44 am)
Reddit Science: This community is a place to share and discuss new scientific research. Read about the latest advances in astronomy, biology, medicine, physics, social science, and more. Find and submit new publications and popular science coverage of current research.
- People with the least political knowledge tend to be the most overconfident in their grasp of facts. This tendency to be overconfident appears most common among individuals who actually know the least about politics and those who lean conservative. (by /u/mvea on March 7, 2026 at 11:04 am)
- Cross-sectional study of healthy human fetuses finds stable yawning frequency between 23 and 31 weeks of gestation and a negative association between yawning rates and birth weight. (by /u/adamjeffson on March 7, 2026 at 10:18 am)
- A quarter of healthy years lost to breast cancer are due to lifestyle factors, research finds. The largest study of its kind suggests high red meat consumption has the biggest impact, followed by smoking. (by /u/F0urLeafCl0ver on March 7, 2026 at 9:48 am)
- Family dynamics, social media (including “what I eat in a day” videos), health care providers’ lack of acknowledgement, and mental health challenges can dissuade people with eating disorders from telling those close to them about their struggles (by /u/NGNResearch on March 6, 2026 at 8:49 pm)
- Brain scans reveal two physical subtypes of ADHD. The first subtype has an increase in gray matter across areas of the brain; these patients struggle with severe inattentiveness. The second subtype shows widespread atrophy in gray matter; these patients exhibit both inattentive and highly hyperactive or impulsive behaviors. (by /u/mvea on March 6, 2026 at 8:20 pm)
Reddit Sports: News and highlights from the NFL, NBA, NHL, MLB, MLS, NCAA, F1, and other leagues around the world.
- Women's Asian Cup 2026: 'No way' Iran’s footballers can return home, with calls to protect them in Australia (by /u/AlamutJones on March 7, 2026 at 11:06 am)
- [AFL] Bailey Williams from the Western Bulldogs with an early goal of the year candidate (by /u/B0llywoodBulkBogan on March 7, 2026 at 10:59 am)
Shaking off that many defenders in a small space is incredibly impressive, and most goals that win the Goal of the Year award aren't as good as this.
- The Maxx Crosby trade is a cannon shot signaling the start of a massive Raiders rebuild around Fernando Mendoza (by /u/Movie-Kino on March 7, 2026 at 9:55 am)
- Miami University finishes the regular season 31-0. (by /u/shunestar on March 7, 2026 at 5:31 am)
Only the 8th team in NCAA Division I men’s basketball history to accomplish the feat.
- Lucas Ramirez hits his second home run of the game to cut the lead to 3 (by /u/noahlylesusa on March 7, 2026 at 4:04 am)

