Big Data and Data Analytics 101 – Top 100 AWS Certified Data Analytics Specialty Certification Questions and Answers Dumps

AI Dashboard is available on the Web, Apple, Google, and Microsoft, PRO version

Top 100 AWS Certified Data Analytics Specialty Certification Questions and Answers Dumps

If you’re looking to take your data analytics career to the next level, then this AWS Data Analytics Specialty Certification Exam Preparation blog is a must-read! With over 100 exam questions and answers, plus data science and data analytics interview questions, cheat sheets and more, you’ll be fully prepared to ace the DAS-C01 exam.

In this blog, we talk about big data and data analytics; we also give you the last updated top 100 AWS Certified Data Analytics – Specialty Questions and Answers Dumps

The AWS Certified Data Analytics – Specialty (DAS-C01) examination is intended for individuals who perform in a data analytics-focused role. This exam validates an examinee’s comprehensive understanding of using AWS services to design, build, secure, and maintain analytics solutions that provide insight from data.

Download the App for an interactive experience:

AWS DAS-C01 Exam Prep on iOS

Get 20% off Google Google Workspace (Google Meet) Standard Plan with the following codes: 96DRHDRA9J7GTN6
Get 20% off Google Workspace (Google Meet) Business Plan (AMERICAS): M9HNXHX3WC9H7YE (Email us for more codes)

Active Anti-Aging Eye Gel, Reduces Dark Circles, Puffy Eyes, Crow's Feet and Fine Lines & Wrinkles, Packed with Hyaluronic Acid & Age Defying Botanicals

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

AI Unraveled: Demystifying Frequently Asked Questions on Artificial Intelligence (OpenAI, ChatGPT, Google Bard, Generative AI, Discriminative AI, xAI, LLMs, GPUs, Machine Learning, NLP, Promp Engineering)

The AWS Certified Data Analytics – Specialty (DAS-C01) covers the following domains:

Domain 1: Collection 18%

Domain 2: Storage and Data Management 22%

Domain 3: Processing 24%

Advertise with us - Post Your Good Content Here
We are ranked in the Top 20 on Google

Domain 4: Analysis and Visualization 18%

"Pass the AWS Cloud Practitioner Certification with flying colors: Master the Exam with 250+ Quizzes, Cheat Sheets, Flashcards, and Illustrated Study Guides - 2024 Edition"

Domain 5: Security 18%

Below are the Top 100 AWS Certified Data Analytics – Specialty Questions and Answers Dumps and References –

https://enoumen.com/2021/11/07/top-100-data-science-and-data-analytics-interview-questions-and-answers/

Question1: What combination of services do you need for the following requirements: accelerate petabyte-scale data transfers, load streaming data, and the ability to create scalable, private connections. Select the correct answer order.

A) Snowball, Kinesis Firehose, Direct Connect

Dive into a comprehensive AWS Cloud Practitioner CLF-C02 Certification guide, masterfully weaving insights from Tutorials Dojo, Adrian Cantrill, Stephane Maarek, and AWS Skills Builder into one unified resource.

B) Data Migration Services, Kinesis Firehose, Direct Connect

C) Snowball, Data Migration Services, Direct Connect

Invest in your future today by enrolling in this Azure Fundamentals - Pass the Azure Fundamentals Exam with Ease: Master the AZ-900 Certification with the Comprehensive Exam Preparation Guide!

Microsoft Azure AZ900 Certification and Training

D) Snowball, Direct Connection, Kinesis Firehose

ANSWER1:

Notes/Hint1:

AWS has many options to help get data into the cloud, including secure devices like AWS Import/Export Snowball to accelerate petabyte-scale data transfers, Amazon Kinesis Firehose to load streaming data, and scalable private connections through AWS Direct Connect.

Reference1: Big Data Analytics Options

AWS Data Analytics Specialty Certification Exam Preparation App is a great way to prepare for your upcoming AWS Data Analytics Specialty Certification Exam. The app provides you with over 300 questions and answers, detailed explanations of each answer, a scorecard to track your progress, and a countdown timer to help keep you on track. You can also find data science and data analytics interview questions and detailed answers, cheat sheets, and flashcards to help you study. The app is very similar to the real exam, so you will be well-prepared when it comes time to take the test.

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question2: A company ingests a large set of clickstream data in nested JSON format from different sources and stores it in Amazon S3. Data analysts need to analyze this data in combination with data stored in an Amazon Redshift cluster. Data analysts want to build a cost-effective and automated solution for this need.
Which solution meets these requirements?

A) Use Apache Spark SQL on Amazon EMR to convert the clickstream data to a tabular format. Use the Amazon Redshift COPY command to load the data into the Amazon Redshift cluster.

B) Use AWS Lambda to convert the data to a tabular format and write it to Amazon S3. Use the Amazon Redshift COPY command to load the data into the Amazon Redshift cluster.

C) Use the Relationalize class in an AWS Glue ETL job to transform the data and write the data back to Amazon S3. Use Amazon Redshift Spectrum to create external tables and join with the internal tables.

D) Use the Amazon Redshift COPY command to move the clickstream data directly into new tables in the Amazon Redshift cluster.

Ace the Microsoft Azure Fundamentals AZ-900 Certification Exam: Pass the Azure Fundamentals Exam with Ease

ANSWER2:

Notes/Hint2:

The Relationalize PySpark transform can be used to flatten the nested data into a structured format. Amazon Redshift Spectrum can join the external tables and query the transformed clickstream data in place rather than needing to scale the cluster to accommodate the large dataset.

Reference1: Relationalize PySpark

AWS DAS-C01 Exam Prep on iOS

Unlock the Secrets of Africa: Master African History, Geography, Culture, People, Cuisine, Economics, Languages, Music, Wildlife, Football, Politics, Animals, Tourism, Science and Environment with the Top 1000 Africa Quiz and Trivia. Get Yours Now!

"Become a Canada Expert: Ace the Citizenship Test and Impress Everyone with Your Knowledge of Canadian History, Geography, Government, Culture, People, Languages, Travel, Wildlife, Hockey, Tourism, Sceneries, Arts, and Data Visualization. Get the Top 1000 Canada Quiz Now!"

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 3: There is a five-day car rally race across Europe. The race coordinators are using a Kinesis stream and IoT sensors to monitor the movement of the cars. Each car has a sensor and data is getting back to the stream with the default stream settings. On the last day of the rally, data is sent to S3. When you go to interpret the data in S3, there is only data for the last day and nothing for the first 4 days. Which of the following is the most probable cause of this?

A) You did not have versioning enabled and would need to create individual buckets to prevent the data from being overwritten.

B) Data records are only accessible for a default of 24 hours from the time they are added to a stream.

C) One of the sensors failed, so there was no data to record.

D) You needed to use EMR to send the data to S3; Kinesis Streams are only compatible with DynamoDB.

ANSWER3:

Notes/Hint3:

Streams support changes to the data record retention period of your stream. An Amazon Kinesis stream is an ordered sequence of data records, meant to be written to and read from in real-time. Data records are therefore stored in shards in your stream temporarily. The period from when a record is added to when it is no longer accessible is called the retention period. An Amazon Kinesis stream stores records for 24 hours by default, up to 168 hours.

Cloud Certification made simple. Ace your exams with Djamgatech.

Reference3: Kinesis Extended Reading

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 4: A publisher website captures user activity and sends clickstream data to Amazon Kinesis Data Streams. The publisher wants to design a cost-effective solution to process the data to create a timeline of user activity within a session. The solution must be able to scale depending on the number of active sessions.
Which solution meets these requirements?

A) Include a variable in the clickstream data from the publisher website to maintain a counter for the number of active user sessions. Use a timestamp for the partition key for the stream. Configure the consumer application to read the data from the stream and change the number of processor threads based upon the counter. Deploy the consumer application on Amazon EC2 instances in an EC2 Auto Scaling group.

B) Include a variable in the clickstream to maintain a counter for each user action during their session. Use the action type as the partition key for the stream. Use the Kinesis Client Library (KCL) in the consumer application to retrieve the data from the stream and perform the processing. Configure the consumer application to read the data from the stream and change the number of processor threads based upon the
counter. Deploy the consumer application on AWS Lambda.

C) Include a session identifier in the clickstream data from the publisher website and use as the partition key for the stream. Use the Kinesis Client Library (KCL) in the consumer application to retrieve the data from the stream and perform the processing. Deploy the consumer application on Amazon EC2 instances in an
EC2 Auto Scaling group. Use an AWS Lambda function to reshard the stream based upon Amazon CloudWatch alarms.

D) Include a variable in the clickstream data from the publisher website to maintain a counter for the number of active user sessions. Use a timestamp for the partition key for the stream. Configure the consumer application to read the data from the stream and change the number of processor threads based upon the counter. Deploy the consumer application on AWS Lambda.

ANSWER4:

Notes/Hint4:

Partitioning by the session ID will allow a single processor to process all the actions for a user session in order. An AWS Lambda function can call the UpdateShardCount API action to change the number of shards in the stream. The KCL will automatically manage the number of processors to match the number of shards. Amazon EC2 Auto Scaling will assure the correct number of instances are running to meet the processing load.

Reference4: UpdateShardCount API

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 5: Your company has two batch processing applications that consume financial data about the day’s stock transactions. Each transaction needs to be stored durably and guarantee that a record of each application is delivered so the audit and billing batch processing applications can process the data. However, the two applications run separately and several hours apart and need access to the same transaction information. After reviewing the transaction information for the day, the information no longer needs to be stored. What is the best way to architect this application?

A) Use SQS for storing the transaction messages; when the billing batch process performs first and consumes the message, write the code in a way that does not remove the message after consumed, so it is available for the audit application several hours later. The audit application can consume the SQS message and remove it from the queue when completed.

B) Use Kinesis to store the transaction information. The billing application will consume data from the stream and the audit application can consume the same data several hours later.

C) Store the transaction information in a DynamoDB table. The billing application can read the rows while the audit application will read the rows then remove the data.

D) Use SQS for storing the transaction messages. When the billing batch process consumes each message, have the application create an identical message and place it in a different SQS for the audit application to use several hours later.

SQS would make this more difficult because the data does not need to persist after a full day.

ANSWER5:

Notes/Hint5:

Kinesis appears to be the best solution that allows multiple consumers to easily interact with the records.

Reference5: Amazon Kinesis

Get mobile friendly version of the quiz @ the App Store

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 6: A company is currently using Amazon DynamoDB as the database for a user support application. The company is developing a new version of the application that will store a PDF file for each support case ranging in size from 1–10 MB. The file should be retrievable whenever the case is accessed in the application.
How can the company store the file in the MOST cost-effective manner?

A) Store the file in Amazon DocumentDB and the document ID as an attribute in the DynamoDB table.

B) Store the file in Amazon S3 and the object key as an attribute in the DynamoDB table.

C) Split the file into smaller parts and store the parts as multiple items in a separate DynamoDB table.

D) Store the file as an attribute in the DynamoDB table using Base64 encoding.

ANSWER6:

Notes/Hint6:

Use Amazon S3 to store large attribute values that cannot fit in an Amazon DynamoDB item. Store each file as an object in Amazon S3 and then store the object path in the DynamoDB item.

Reference6: S3 Storage Cost – DynamODB Storage Cost

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 7: Your client has a web app that emits multiple events to Amazon Kinesis Streams for reporting purposes. Critical events need to be immediately captured before processing can continue, but informational events do not need to delay processing. What solution should your client use to record these types of events without unnecessarily slowing the application?

A) Log all events using the Kinesis Producer Library.

B) Log critical events using the Kinesis Producer Library, and log informational events using the PutRecords API method.

C) Log critical events using the PutRecords API method, and log informational events using the Kinesis Producer Library.

D) Log all events using the PutRecords API method.

ANSWER2:

Notes/Hint7:

The PutRecords API can be used in code to be synchronous; it will wait for the API request to complete before the application continues. This means you can use it when you need to wait for the critical events to finish logging before continuing. The Kinesis Producer Library is asynchronous and can send many messages without needing to slow down your application. This makes the KPL ideal for the sending of many non-critical alerts asynchronously.

Reference7: PutRecords API

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 8: You work for a start-up that tracks commercial delivery trucks via GPS. You receive coordinates that are transmitted from each delivery truck once every 6 seconds. You need to process these coordinates in near real-time from multiple sources and load them into Elasticsearch without significant technical overhead to maintain. Which tool should you use to digest the data?

A) Amazon SQS

B) Amazon EMR

C) AWS Data Pipeline

D) Amazon Kinesis Firehose

ANSWER8:

Notes/Hint8:

Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service, enabling near real-time analytics with existing business intelligence tools and dashboards.

Reference8: Amazon Kinesis Firehose

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 9: A company needs to implement a near-real-time fraud prevention feature for its ecommerce site. User and order details need to be delivered to an Amazon SageMaker endpoint to flag suspected fraud. The amount of input data needed for the inference could be as much as 1.5 MB.
Which solution meets the requirements with the LOWEST overall latency?

A) Create an Amazon Managed Streaming for Kafka cluster and ingest the data for each order into a topic. Use a Kafka consumer running on Amazon EC2 instances to read these messages and invoke the Amazon SageMaker endpoint.

B) Create an Amazon Kinesis Data Streams stream and ingest the data for each order into the stream. Create an AWS Lambda function to read these messages and invoke the Amazon SageMaker endpoint.

C) Create an Amazon Kinesis Data Firehose delivery stream and ingest the data for each order into the stream. Configure Kinesis Data Firehose to deliver the data to an Amazon S3 bucket. Trigger an AWS Lambda function with an S3 event notification to read the data and invoke the Amazon SageMaker endpoint.

D) Create an Amazon SNS topic and publish the data for each order to the topic. Subscribe the Amazon SageMaker endpoint to the SNS topic.

ANSWER9:

Notes/Hint9:

An Amazon Managed Streaming for Kafka cluster can be used to deliver the messages with very low latency. It has a configurable message size that can handle the 1.5 MB payload.

Reference9: Amazon Managed Streaming for Kafka cluster

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 10: You need to filter and transform incoming messages coming from a smart sensor you have connected with AWS. Once messages are received, you need to store them as time series data in DynamoDB. Which AWS service can you use?

A) IoT Device Shadow Service

B) Redshift

C) Kinesis

D) IoT Rules Engine

ANSWER10:

Notes/Hint10:

The IoT rules engine will allow you to send sensor data over to AWS services like DynamoDB

Reference10: The IoT rules engine

Get mobile friendly version of the quiz @ the App Store

Question 11: A media company is migrating its on-premises legacy Hadoop cluster with its associated data processing scripts and workflow to an Amazon EMR environment running the latest Hadoop release. The developers want to reuse the Java code that was written for data processing jobs for the on-premises cluster.
Which approach meets these requirements?

A) Deploy the existing Oracle Java Archive as a custom bootstrap action and run the job on the EMR cluster.

B) Compile the Java program for the desired Hadoop version and run it using a CUSTOM_JAR step on the EMR cluster.

C) Submit the Java program as an Apache Hive or Apache Spark step for the EMR cluster.

D) Use SSH to connect the master node of the EMR cluster and submit the Java program using the AWS CLI.

ANSWER11:

Notes/Hint11:

A CUSTOM JAR step can be configured to download a JAR file from an Amazon S3 bucket and execute it. Since the Hadoop versions are different, the Java application has to be recompiled.

Reference11: Automating analytics workflows on EMR

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 12: You currently have databases running on-site and in another data center off-site. What service allows you to consolidate to one database in Amazon?

A) AWS Kinesis

B) AWS Database Migration Service

C) AWS Data Pipeline

D) AWS RDS Aurora

ANSWER12:

Notes/Hint12:

AWS Database Migration Service can migrate your data to and from most of the widely used commercial and open source databases. It supports homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle to Amazon Aurora. Migrations can be from on-premises databases to Amazon RDS or Amazon EC2, databases running on EC2 to RDS, or vice versa, as well as from one RDS database to another RDS database.

Reference12: DMS

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 13: An online retail company wants to perform analytics on data in large Amazon S3 objects using Amazon EMR. An Apache Spark job repeatedly queries the same data to populate an analytics dashboard. The analytics team wants to minimize the time to load the data and create the dashboard.
Which approaches could improve the performance? (Select TWO.)

A) Copy the source data into Amazon Redshift and rewrite the Apache Spark code to create analytical reports by querying Amazon Redshift.

B) Copy the source data from Amazon S3 into Hadoop Distributed File System (HDFS) using s3distcp.

C) Load the data into Spark DataFrames.

D) Stream the data into Amazon Kinesis and use the Kinesis Connector Library (KCL) in multiple Spark jobs to perform analytical jobs.

E) Use Amazon S3 Select to retrieve the data necessary for the dashboards from the S3 objects.

ANSWER13:

C and E

Notes/Hint13:

One of the speed advantages of Apache Spark comes from loading data into immutable dataframes, which can be accessed repeatedly in memory. Spark DataFrames organizes distributed data into columns. This makes summaries and aggregates much quicker to calculate. Also, instead of loading an entire large Amazon S3 object, load only what is needed using Amazon S3 Select. Keeping the data in Amazon S3 avoids loading the large dataset into HDFS.

Reference13: Spark DataFrames

Question 14: You have been hired as a consultant to provide a solution to integrate a client’s on-premises data center to AWS. The customer requires a 300 Mbps dedicated, private connection to their VPC. Which AWS tool do you need?

A) VPC peering

B) Data Pipeline

C) Direct Connect

D) EMR

ANSWER14:

Notes/Hint14:

Direct Connect will provide a dedicated and private connection to an AWS VPC.

Reference14: Direct Connect

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 15: Your organization has a variety of different services deployed on EC2 and needs to efficiently send application logs over to a central system for processing and analysis. They’ve determined it is best to use a managed AWS service to transfer their data from the EC2 instances into Amazon S3 and they’ve decided to use a solution that will do what?

A) Installs the AWS Direct Connect client on all EC2 instances and uses it to stream the data directly to S3.

B) Leverages the Kinesis Agent to send data to Kinesis Data Streams and output that data in S3.

C) Ingests the data directly from S3 by configuring regular Amazon Snowball transactions.

D) Leverages the Kinesis Agent to send data to Kinesis Firehose and output that data in S3.

ANSWER15:

Notes/Hint15:

Kinesis Firehose is a managed solution, and log files can be sent from EC2 to Firehose to S3 using the Kinesis agent.

Reference15: Kinesis Firehose

Question 16: A data engineer needs to create a dashboard to display social media trends during the last hour of a large company event. The dashboard needs to display the associated metrics with a latency of less than 1 minute.
Which solution meets these requirements?

A) Publish the raw social media data to an Amazon Kinesis Data Firehose delivery stream. Use Kinesis Data Analytics for SQL Applications to perform a sliding window analysis to compute the metrics and output the results to a Kinesis Data Streams data stream. Configure an AWS Lambda function to save the stream data to an Amazon DynamoDB table. Deploy a real-time dashboard hosted in an Amazon S3 bucket to read and display the metrics data stored in the DynamoDB table.

B) Publish the raw social media data to an Amazon Kinesis Data Firehose delivery stream. Configure the stream to deliver the data to an Amazon Elasticsearch Service cluster with a buffer interval of 0 seconds. Use Kibana to perform the analysis and display the results.

C) Publish the raw social media data to an Amazon Kinesis Data Streams data stream. Configure an AWS Lambda function to compute the metrics on the stream data and save the results in an Amazon S3 bucket. Configure a dashboard in Amazon QuickSight to query the data using Amazon Athena and display the results.

D) Publish the raw social media data to an Amazon SNS topic. Subscribe an Amazon SQS queue to the topic. Configure Amazon EC2 instances as workers to poll the queue, compute the metrics, and save the results to an Amazon Aurora MySQL database. Configure a dashboard in Amazon QuickSight to query the data in Aurora and display the results.

ANSWER16:

Notes/Hint16:

Amazon Kinesis Data Analytics can query data in a Kinesis Data Firehose delivery stream in near-real time using SQL. A sliding window analysis is appropriate for determining trends in the stream. Amazon S3 can host a static webpage that includes JavaScript that reads the data in Amazon DynamoDB and refreshes the dashboard.

Reference16: Amazon Kinesis Data Analytics can query data in a Kinesis Data Firehose delivery stream in near-real time using SQL

Question 17: A real estate company is receiving new property listing data from its agents through .csv files every day and storing these files in Amazon S3. The data analytics team created an Amazon QuickSight visualization report that uses a dataset imported from the S3 files. The data analytics team wants the visualization report to reflect the current data up to the previous day. How can a data analyst meet these requirements?

A) Schedule an AWS Lambda function to drop and re-create the dataset daily.

B) Configure the visualization to query the data in Amazon S3 directly without loading the data into SPICE.

C) Schedule the dataset to refresh daily.

D) Close and open the Amazon QuickSight visualization.

ANSWER17:

Notes/Hint17:

Datasets created using Amazon S3 as the data source are automatically imported into SPICE. The Amazon QuickSight console allows for the refresh of SPICE data on a schedule.

Reference17: Amazon QuickSight and SPICE

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 18: You need to migrate data to AWS. It is estimated that the data transfer will take over a month via the current AWS Direct Connect connection your company has set up. Which AWS tool should you use?

A) Establish additional Direct Connect connections.

B) Use Data Pipeline to migrate the data in bulk to S3.

C) Use Kinesis Firehose to stream all new and existing data into S3.

D) Snowball

ANSWER18:

Notes/Hint18:

As a general rule, if it takes more than one week to upload your data to AWS using the spare capacity of your existing Internet connection, then you should consider using Snowball. For example, if you have a 100 Mb connection that you can solely dedicate to transferring your data and need to transfer 100 TB of data, it takes more than 100 days to complete a data transfer over that connection. You can make the same transfer by using multiple Snowballs in about a week.

Reference18: Snowball

Question 19: You currently have an on-premises Oracle database and have decided to leverage AWS and use Aurora. You need to do this as quickly as possible. How do you achieve this?

A) It is not possible to migrate an on-premises database to AWS at this time.

B) Use AWS Data Pipeline to create a target database, migrate the database schema, set up the data replication process, initiate the full load and a subsequent change data capture and apply, and conclude with a switchover of your production environment to the new database once the target database is caught up with the source database.

C) Use AWS Database Migration Services and create a target database, migrate the database schema, set up the data replication process, initiate the full load and a subsequent change data capture and apply, and conclude with a switch-over of your production environment to the new database once the target database is caught up with the source database.

D) Use AWS Glue to crawl the on-premises database schemas and then migrate them into AWS with Data Pipeline jobs.

https://aws.amazon.com/dms/faqs/

ANSWER19:

Notes/Hint19:

DMS can efficiently support this sort of migration using the steps outlined. While AWS Glue can help you crawl schemas and store metadata on them inside of Glue for later use, it isn’t the best tool for actually transitioning a database over to AWS itself. Similarly, while Data Pipeline is great for ETL and ELT jobs, it isn’t the best option to migrate a database over to AWS.

Reference19: DMS

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Question 20: A financial company uses Amazon EMR for its analytics workloads. During the company’s annual security audit, the security team determined that none of the EMR clusters’ root volumes are encrypted. The security team recommends the company encrypt its EMR clusters’ root volume as soon as possible.
Which solution would meet these requirements?

A) Enable at-rest encryption for EMR File System (EMRFS) data in Amazon S3 in a security configuration. Re-create the cluster using the newly created security configuration.

B) Specify local disk encryption in a security configuration. Re-create the cluster using the newly created security configuration.

C) Detach the Amazon EBS volumes from the master node. Encrypt the EBS volume and attach it back to the master node.

D) Re-create the EMR cluster with LZO encryption enabled on all volumes.

ANSWER20:

Notes/Hint20:

Local disk encryption can be enabled as part of a security configuration to encrypt root and storage volumes.

Reference20: EMR Cluster Local disk encryption

Question 21: A company has a clickstream analytics solution using Amazon Elasticsearch Service. The solution ingests 2 TB of data from Amazon Kinesis Data Firehose and stores the latest data collected within 24 hours in an Amazon ES cluster. The cluster is running on a single index that has 12 data nodes and 3 dedicated master nodes. The cluster is configured with 3,000 shards and each node has 3 TB of EBS storage attached. The Data Analyst noticed that the query performance of Elasticsearch is sluggish, and some intermittent errors are produced by the Kinesis Data Firehose when it tries to write to the index. Upon further investigation, there were occasional JVMMemoryPressure errors found in Amazon ES logs.

What should be done to improve the performance of the Amazon Elasticsearch Service cluster?

A) Improve the cluster performance by increasing the number of master nodes of Amazon Elasticsearch.

B) Improve the cluster performance by increasing the number of shards of the Amazon Elasticsearch index.

C) Improve the cluster performance by decreasing the number of data nodes of Amazon Elasticsearch.

D) Improve the cluster performance by decreasing the number of shards of the Amazon Elasticsearch index.

ANSWER21:
D

Notes/Hint21:

“Amazon Elasticsearch Service (Amazon ES) is a managed service that makes it easy to deploy, operate, and scale Elasticsearch clusters in AWS Cloud. Elasticsearch is a popular open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analysis. With Amazon ES, you get direct access to the Elasticsearch APIs; existing code and applications work seamlessly with the service.

Each Elasticsearch index is split into some number of shards. You should decide the shard count before indexing your first document. The overarching goal of choosing a number of shards is to distribute an index evenly across all data nodes in the cluster. However, these shards shouldn’t be too large or too numerous.

A good rule of thumb is to try to keep a shard size between 10 – 50 GiB. Large shards can make it difficult for Elasticsearch to recover from failure, but because each shard uses some amount of CPU and memory, having too many small shards can cause performance issues and out of memory errors. In other words, shards should be small enough that the underlying Amazon ES instance can handle them, but not so small that they place needless strain on the hardware. Therefore the correct answer is: Improve the cluster performance by decreasing the number of shards of Amazon Elasticsearch index.

Reference: ElasticsSearch

Question 22: A data lake is a central repository that enables which operation?

A) Store unstructured data from a single data source

B) Store structured data from any data source

C) Store structure and unstructured data from any source

D) Store structured and unstructured data from a single source

ANSWER22:
C

Notes/Hint22:

Data lake is a centralized repository for large amounts of structured and unstructured data to enable direct analytics.

Reference: Data Lakes

Question 23: What is the most cost-effective storage option for your data lake?

A) Amazon EBS

B) Amazon S3

C) Amazon RDS

D) Amazon Redshift

ANSWER23:
B

Notes/Hint23:

Amazon S3

Reference: Data Lakes – S3

Question 24: Which services are used in the processing layer of a data lake architecture? (SELECT TWO)

A. AWS Snowball

B. AWS Glue

C. Amazon EMR

D. Amazon QuickSight

ANSWER24:
B and C

Notes/Hint24:

Amazon Glue and Amazon EMR

Reference: Data Lakes – Glue and EMR

Question 25: Which services can be used for data ingestion into your data lake? (SELECT TWO)

A) Amazon Kinesis Data Firehose

B) Amazon QuickSight

C) Amazon Athena

D) AWS Storage Gateway

ANSWER25:
A and D

Notes/Hint25:

Amazon Kinesis Data Firehose and and Amazon Storage Gateway

Reference: Data Lakes

Question 26: Which service uses continuous data replication with high availability to consolidate databases into a petabyte-scale data warehouse by streaming data to amazon Redshift and Amazon S3?

A) AWS Storage Gateway

B) AWS Schema Conversion Tool

C) AWS Database Migration Service

D) Amazon Kinesis Data Firehose

ANSWER26:
C

Notes/Hint26:

AWS Database Migration Service

Reference: Data Lakes

Question 27: What is the AWS Glue Data Catalog?

A) A fully managed ETL (extract, transform, and load) pipeline service

B) A service to schedule jobs

C) A visual data preparation tool

D) An index to the location, schema, and runtime metrics of your data

ANSWER27:
D

Notes/Hint27:

An index to the location, schema, and runtime metrics of your data

Reference: Data Lakes

Questions 28: What AWS Glue feature “catalogs” your data?

A) AWS Glue crawler

B) AWS Glue DataBrew

C) AWS Glue Studio

D) AWS Glue Elastic Views

ANSWER28:
A

Notes/Hint28:

AWS Glue crawler

Reference: Data Lakes

Question 29: During your data preparation stage, the raw data has been enriched to support additional insights. You need to improve query performance and reduce costs of the final analytics solution.

Which data formats meet these requirements (SELECT TWO)

ANSWER29:
C and D

Notes/Hint29:

Apache Parquet and Apache ORC

Reference: Data Lakes

Question 30: Your small start-uo company is developing a data analytics solution. You need to clean and normalize large datasets, but you do not have developers with the skill set to write custom scripts. Which tool will help efficiently design and run the data preparation activities?

ANSWER30:
B

Notes/Hint30:

AWS Glue DataBrew

To be able to run analytics, build reports, or apply machine learning, you need to be sure the data you’re using is clean and in the right format. This data preparation step requires data analysts and data scientists to write custom code and perform many manual activities. When cleaning and normalizing data, it is helpful to first review the dataset to understand which possible values are present. Simple visualizations are helpful for determining whether correlations exist between the columns.

AWS Glue DataBrew is a visual data preparation tool that helps you clean and normalize data up to 80% faster so you can focus more on the business value you can get. DataBrew provides a visual interface that quickly connects to your data stored in Amazon S3, Amazon Redshift, Amazon Relational Database Service (RDS), any JDBC-accessible data store, or data indexed by the AWS Glue Data Catalog. You can then explore the data, look for patterns, and apply transformations. For example, you can apply joins and pivots, merge different datasets, or use functions to manipulate data.

Reference: Data Lakes

Question 30: In which scenario would you use AWS Glue jobs?

A) Analyze data in real-time as data comes into the data lake

B) Transform data in real-time as data comes into the data lake

C) Analyze data in batches on schedule or on demand

D) Transform data in batches on schedule or on demand.

ANSWER30:
D

Notes/Hint30:

An AWS Glue job encapsulates a script that connects to your source data, processes it, and then writes it out to your data target. Typically, a job runs extract, transform, and load (ETL) scripts. Jobs can also run general-purpose Python scripts (Python shell jobs.) AWS Glue triggers can start jobs based on a schedule or event, or on demand. You can monitor job runs to understand runtime metrics such as completion status, duration, and start tim

Question 31: Your data resides in multiple data stores, including Amazon S3, Amazon RDS, and Amazon DynamoDB. You need to efficiently query the combined datasets.

Which tool can achieve this, using a single query, without moving data?

A) Amazon Athena Federated Query

B) Amazon Redshift Query Editor

C) SQl Workbench

D) AWS Glue DataBrew

ANSWER31:
A

Notes/Hint31:

With Amazon Athena Federated Query, you can run SQL queries across a variety of relational, non-relational, and custom data sources. You get a unified way to run SQL queries across various data stores.

Athena uses data source connectors that run on AWS Lambda to run federated queries. A data source connector is a piece of code that can translate between your target data source and Athena. You can think of a connector as an extension of Athena’s query engine. Pre-built Athena data source connectors exist for data sources like Amazon CloudWatch Logs, Amazon DynamoDB, Amazon DocumentDB, Amazon RDS, and JDBC-compliant relational data sources such MySQL and PostgreSQL under the Apache 2.0 license. You can also use the Athena Query Federation SDK to write custom connectors. To choose, configure, and deploy a data source connector to your account, you can use the Athena and Lambda consoles or the AWS Serverless Application Repository. After you deploy data source connectors, the connector is associated with a catalog that you can specify in SQL queries. You can combine SQL statements from multiple catalogs and span multiple data sources with a single query.

Question 32: Which benefit do you achieve by using AWS Lake Formation to build data lakes?

A) Build data lakes quickly

B) Simplify security management

C) Provide self-service access to data

D) All of the above

ANSWER32:
D

Notes/Hint32:

Build data lakes quickly

With Lake Formation, you can move, store, catalog, and clean your data faster. You simply point Lake Formation at your data sources, and Lake Formation crawls those sources and moves the data into your new Amazon S3 data lake. Lake Formation organizes data in S3 around frequently used query terms and into right-sized chunks to increase efficiency. Lake Formation also changes data into formats like Apache Parquet and ORC for faster analytics. In addition, Lake Formation has built-in machine learning to deduplicate and find matching records (two entries that refer to the same thing) to increase data quality.

Simplify security management

You can use Lake Formation to centrally define security, governance, and auditing policies in one place, versus doing these tasks per service. You can then enforce those policies for your users across their analytics applications. Your policies are consistently implemented, eliminating the need to manually configure them across security services like AWS Identity and Access Management (AWS IAM) and AWS Key Management Service (AWS KMS), storage services like Amazon S3, and analytics and machine learning services like Amazon Redshift, Amazon Athena, and (in beta) Amazon EMR for Apache Spark. This reduces the effort in configuring policies across services and provides consistent enforcement and compliance.

Provide self-service access to data

With Lake Formation, you build a data catalog that describes the different available datasets along with which groups of users have access to each. This makes your users more productive by helping them find the right dataset to analyze. By providing a catalog of your data with consistent security enforcement, Lake Formation makes it easier for your analysts and data scientists to use their preferred analytics service. They can use Amazon EMR for Apache Spark (in beta), Amazon Redshift, or Amazon Athena on diverse datasets that are now housed in a single data lake. Users can also combine these services without having to move data between silos.

Question 33: What are the three stages to set up a data lake using AWS Lake Formation? (SELECT THREE)

A) Register the storage location

B) Create a database

C) Populate the database

D) Grant permissions

ANSWER33:
A B and D

Notes/Hint33:

Lake Formation manages access to designated storage locations within Amazon S3. Register the storage locations that you want to be part of the data lake.

Create a database

Lake Formation organizes data into a catalog of logical databases and tables. Create one or more databases and then automatically generate tables during data ingestion for common workflows.

Grant permissions

Lake Formation manages access for IAM users, roles, and Active Directory users and groups via flexible database, table, and column permissions. Grant permissions to one or more resources for your selected users.

Question 34: Which of the following AWS Lake Formation tasks are performed by the AWS Glue service? (SELECT THREE)

A) ETL code creation and job monitoring

B) Blueprints to create workflows

C) Data catalog and serverless architecture

D) Simplify securty management

ANSWER34:
A B and C

Notes/Hint34:

Lake Formation leverages a shared infrastructure with AWS Glue, including console controls, ETL code creation and job monitoring, blueprints to create workflows for data ingest, the same data catalog, and a serverless architecture. While AWS Glue focuses on these types of functions, Lake Formation encompasses all AWS Glue features AND provides additional capabilities designed to help build, secure, and manage a data lake. See the AWS Glue features page for more de

Question 35: A digital media customer needs to quickly build a data lake solution for the data housed in a PostgreSQL database. As a solutions architect, what service and feature would meet this requirement?

A) Copy PostgreSQL data to an Amazon S3 bucket and build a data lake using AWS Lake Formation

B) Use AWS Lake Formation blueprints

C) Build a data lake manually

D) Build an analytics solution by directly accessing the database.

ANSWER35:
B

Notes/Hint35:

A blueprint is a data management template that enables you to easily ingest data into a data lake. Lake Formation provides several blueprints, each for a predefined source type, such as a relational database or AWS CloudTrail logs. From a blueprint, you can create a workflow. Workflows consist of AWS Glue crawlers, jobs, and triggers that are generated to orchestrate the loading and update of data. Blueprints take the data source, data target, and schedule as input to configure the workflow.

Question 36: AWS Lake Formation has a set of suggested personas and IAM permissions. Which is a required persona?

A) Data lake administrator

B) Data engineer

C) Data analyst

D) Business analyst

ANSWER36:
A

Notes/Hint36:

Data lake administrator (Required)

A user who can register Amazon S3 locations, access the Data Catalog, create databases, create and run workflows, grant Lake Formation permissions to other users, and view AWS CloudTrail logs. The user has fewer IAM permissions than the IAM administrator but enough to administer the data lake. Cannot add other data lake administrators.

Data engineer (Optional) A user who can create and run crawlers and workflows and grant Lake Formation permissions on the Data Catalog tables that the crawlers and workflows create.

Data analyst (Optional) A user who can run queries against the data lake using, for example, Amazon Athena. The user has only enough permissions to run queries.

Business analyst (Optional) Generally, an end-user application specific persona that would query data and resource using a workflow role.

Question 37: Which three types of blueprints does AWS Lake Formation support? (SELECT THREE)

A) ETL code creation and job monitoring

B) Database snapshot

C) Incremental database

D) Log file sources (AWS CloudTrail, ELB/ALB logs)

ANSWER37:
B C and D

Notes/Hint37:

AWS Lake Formation blueprints simplify and automate creating workflows. Lake Formation provides the following types of blueprints:

• Database snapshot – Loads or reloads data from all tables into the data lake from a JDBC source. You can exclude some data from the source based on an exclude pattern.

• Incremental database – Loads only new data into the data lake from a JDBC source, based on previously set bookmarks. You specify the individual tables in the JDBC source database to include. For each table, you choose the bookmark columns and bookmark sort order to keep track of data that has previously been loaded. The first time that you run an incremental database blueprint against a set of tables, the workflow loads all data from the tables and sets bookmarks for the next incremental database blueprint run. You can therefore use an incremental database blueprint instead of the database snapshot blueprint to load all data, provided that you specify each table in the data source as a paramete

• Log file – Bulk loads data from log file sources, including AWS CloudTrail, Elastic Load Balancing logs, and Application Load Balancer logs.

Question 38: Which one of the following is the best description of the capabilities of Amazon QuickSight?

A) Automated configuration service build on AWS Glue

B) Fast, serverless, business intelligence service

C) Fast, simple, cost-effective data warehousing

D) Simple, scalable, and serverless data integration

ANSWER38:
B C and D

Notes/Hint38:

B. Scalable, serverless business intelligence service is the correct choice.

See the brief descriptions of several AWS Analytics services below:

AWS Lake Formation Build a secure data lake in days using Glue blueprints and workflows

Amazon QuickSight Scalable, serverless, embeddable, ML-powered BI Service built for the cloud

Amazon Redshift Analyze all of your data with the fastest and most widely used cloud data warehouse

AWS Glue Simple, scalable, and serverless data integration

Question 39: Which benefits are provided by Amazon Redshift? (Select TWO)

A) Analyze Data stored in your data lake

B) Maintain performance at scale

C) Focus effort on Data warehouse administration

D) Store all the data to meet analytics need

E) Amazon Redshift includes enterprise-level security and compliance features.

ANSWER38:
A and B

Notes/Hint38:

A is correct – With Amazon Redshift, you can analyze all your data, including exabytes of data stored in your Amazon S3 data lake.

B is correct – Amazon Redshift provides consistent performance at scale.

• C is incorrect – Amazon Redshift is a fully managed data warehouse solution. It includes automations to reduce the administrative overhead traditionally associated with data warehouses. When using Amazon Redshift, you can focus your development effort on strategic data analytics solutions.

• D is incorrect – With Amazon Redshift features—such as Amazon Redshift Spectrum, materialized views, and federated query—you can analyze data where it is stored in your data lake or AWS databases. This capability provides flexibility to meet new analytics requirements without the cost, time, or complexity of moving large volumes of data between solutions.

• Answer E is incorrect – Amazon Redshift includes enterprise-level security and compliance features.

Djamga Data Sciences Big Data – Data Analytics Youtube Playlist

2- Prepare for Your AWS Certification Exam

3- LinuxAcademy

Big Data – Data Analytics Jobs:

Big Data – Data Analytics – Data Sciences Latest News:

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

DATA ANALYTICS Q&A:

Can I know what is big data?

(MySQL) When I save an invoice, the records save in more than 5 tables (ex: invoice, stock, etc). Save, if only, all the tables received all the data successfully otherwise reject that invoice which should not impact any table?

How can I become a data scientist?

Data analysis made easy: Text2Code for Jupyter notebook

What should be the learning sequence (among the following skills) to become a Data scientist – Tableau, Data Analysis, SQL, Data Science and Python (please indicate if any skill(s) is redundant?

What’s the best thing a data scientist can do for his career?

What is the best way to learn SQL for data science?

Learn Deep Learning With Python Tutorials Complete Guide…

How can I become a data scientist?

What are some struggles for people who are trying to break into the Data Science field?

Why should a data scientist work in a startup?

Why does a Data scientist needs to learn SQL when all the data exploration and data munging can be done in R or Python?

As data science changes over the next decade, what are the most crucial skills to learn to stay relevant?

Is Machine Learning a good choice for a career?

How do I learn more advanced SQL?

How can I become a data scientist?

Where can I intuitively learn SQL Joins specifically?

What is the most overrated thing about data science?

Is SQL an easy programming language?

Can a position as a data analyst lead to a position as a data scientist?

In layman’s terms (you naughty experts), can you explain how huge corporations like Google, Microsoft, Facebook, etc. upgrade their data centres and servers?

What are some of the natural careers people transition into after 5-7 years in data science/analytics?

Why would people, especially data scientists (maybe those in the industry) use SQL, rather than Pandas?

How much level of SQL knowledge is required for data analytics?

How do I learn data science (step-by-step)?

How do I get data science, computer vision, and machine learning tutorial apps?

Roughly speaking, how long should it take a person with no programming background (who already knows probability & statistics) to learn how to use R and Python for data analytics well enough to land a decent job as a data analyst?

How can I become a data scientist or a data analyst?

Does data science have good potential, or is it only hype?

What are some wrong ways to learn data science?

Is Data Science actually in demand? What skills can make me launch a fruitful career in Data Science?

Which are the best analytics and data visualization companies who implement Tableau?

Will the salary of data scientists decrease a lot in 2025? Is it worth starting a degree in it?

Will there be more data science and machine learning jobs in the future? I’m wondering if it is a good career choice.

Here’s a all list of the popular Data Science Tools used by Data Scientists !!!!…

How should you answer the question “How’s your SQL skills?” when asked by the data scientist hiring manager?

Are CS questions part of a data scientist interview at Facebook and are there interview questions position specific?

Can I work in data science if I don’t like programming?

How to organize data cleaning scripts

[/bg_collapse]

Clever Questions, Answers, Resources about:

Data Sciences
Big Data
Data Analytics
Data Sciences
Databases
Data Streams
Large DataSets

What Is a Data Scientist?

Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician. – Josh Wills
Data scientists apply sophisticated quantitative and computer science skills to both structure and analyze massive stores or continuous streams of unstructured data, with the intent to derive insights and prescribe action. – Burtch Works Data Science Salary Survey, May 2018
More than anything, what data scientists do is make discoveries while swimming in data… In a competitive landscape where challenges keep changing and data never stop flowing, data scientists help decision makers shift from ad hoc analysis to an ongoing conversation with data. – Data Scientist: The Sexiest Job of the 21st Century, Harvard Business Review

Do All Data Scientists Hold Graduate Degrees?

Data scientists are highly educated. With exceedingly rare exception, every data scientist holds at least an undergraduate degree. 91% of data scientists in 2018 held advanced degrees. The remaining 9% all held undergraduate degrees. Furthermore,

25% of data scientists hold a degree in statistics or mathematics,
20% have a computer science degree,
an additional 20% hold a degree in the natural sciences, and
18% hold an engineering degree.

The remaining 17% of surveyed data scientists held degrees in business, social science, or economics.

How Are Data Scientists Different From Data Analysts?

Broadly speaking, the roles differ in scope: data analysts build reports with narrow, well-defined KPIs. Data scientists often to work on broader business problems without clear solutions. Data scientists live on the edge of the known and unknown.

We’ll leave you with a concrete example: A data analyst cares about profit margins. A data scientist at the same company cares about market share.

How Is Data Science Used in Medicine?

Data science in healthcare best translates to biostatistics. It can be quite different from data science in other industries as it usually focuses on small samples with several confounding variables.

How Is Data Science Used in Manufacturing?

Data science in manufacturing is vast; it includes everything from supply chain optimization to the assembly line.

What are data scientists paid?

Most people are attracted to data science for the salary. It’s true that data scientists garner high salaries compares to their peers. There is data to support this: The May 2018 edition of the BurtchWorks Data Science Salary Survey, annual salary statistics were

Note the above numbers do not reflect total compensation which often includes standard benefits and may include company ownership at high levels.

How will data science evolve in the next 5 years?

Will AI replace data scientists?

What is the workday like for a data scientist?

It’s common for data scientists across the US to work 40 hours weekly. While company culture does dictate different levels of work life balance, it’s rare to see data scientists who work more than they want. That’s the virtue of being an expensive resource in a competitive job market.

How do I become a Data Scientist?

The roadmap given to aspiring data scientists can be boiled down to three steps:

Earning an undergraduate and/or advanced degree in computer science, statistics, or mathematics,
Building their portfolio of SQL, Python, and R skills, and
Getting related work experience through technical internships.

All three require a significant time and financial commitment.

There used to be a saying around datascience: The road into a data science starts with two years of university-level math.

What Should I Learn? What Order Do I Learn Them?

This answer assumes your academic background ends with a HS diploma in the US.

Python
Differential Calculus
Integral Calculus
Multivariable Calculus
Linear Algebra
Probability
Statistics

Some follow up questions and answers:

Why Python first?

Python is a general purpose language. R is used primarily by statisticians. In the likely scenario that you decide data science requires too much time, effort, and money, Python will be more valuable than your R skills. It’s preparing you to fail, sure, but in the same way a savings account is preparing you to fail.

When do I start working with data?

You’ll start working with data when you’ve learned enough Python to do so. Whether you’ll have the tools to have any fun is a much more open-ended question.

How long will this take me?

Assuming self-study and average intelligence, 3-5 years from start to finish.

How Do I Learn Python?

If you don’t know the first thing about programming, start with MIT’s course in the curated list.

These modules are the standard tools for data analysis in Python:

pandas (and by extension, numpy)Check out Minimally Sufficient Pandas for style guides and best practices.
matplotlib and seaborn See /u/rhiever’s response to How do you decide between the plotting libraries: Matplotlib, Seaborn, Bokeh?Don’t worry about bokeh or dash unless you have a personal interest in interactive visualizations.
scipy and scikit-learnInternalize the .fit() and .predict() pattern.

Curated Threads & Resources

MIT’s Introduction to Computer Science and Programming in Python A free, archived course taught at MIT in the fall 2016 semester.
Data Scientist with Python Career Track | DataCamp The first courses are free, but unlimited access costs $29/month. Users usually report a positive experience, and it’s one of the better hands-on ways to learn Python.
Sentdex’s (Harrison Kinsley) Youtube Channel Related to Python Programming Tutorials
/r/learnpython is an active sub and very useful for learning the basics.

How Do I Learn R?

If you don’t know the first thing about programming, start with R for Data Science in the curated list.

These modules are the standard tools for data analysis in Python:

Curated Threads & Resources

R for Data Science by Hadley WickhamA free ebook full of succinct code examples. Terrific for learning tidyverse syntax.Folks with some math background may prefer the free alternative, Introduction to Statistical Learning.
Data Scientist with R Career Track | DataCamp The first courses are free, but unlimited access costs $29/month. Users usually report a positive experience, and it’s one of the few hands-on ways to learn R.
R Inferno Learners with a CS background will appreciate this free handbook explaining how and why R behaves the way that it does.

How Do I Learn SQL?

Prioritize the basics of SQL. i.e. when to use functions like POW, SUM, RANK; the computational complexity of the different kinds of joins.

Concepts like relational algebra, when to use clustered/non-clustered indexes, etc. are useful, but (almost) never come up in interviews.

You absolutely do not need to understand administrative concepts like managing permissions.

Finally, there are numerous query engines and therefore numerous dialects of SQL. Use whichever dialect is supported in your chosen resource. There’s not much difference between them, so it’s easy to learn another dialect after you’ve learned one.

Curated Threads & Resources

The SQL Tutorial for Data Analysis | Mode.com
Introduction to Databases A Free MOOC supported by Stanford University.
SQL Queries for Mere MortalsA $30 book highly recommended by /u/karmanujan

How Do I Learn Calculus?

Fortunately (or unfortunately), calculus is the lament of many students, and so resources for it are plentiful. Khan Academy mimics lectures very well, and Paul’s Online Math Notes are a terrific reference full of practice problems and solutions.

Calculus, however, is not just calculus. For those unfamiliar with US terminology,

Calculus I is differential calculus.
Calculus II is integral calculus.
Calculus III is multivariable calculus.
Calculus IV is differential equations.

Differential and integral calculus are both necessary for probability and statistics, and should be completed first.

Multivariable calculus can be paired with linear algebra, but is also required.

Differential equations is where consensus falls apart. The short it is, they’re all but necessary for mathematical modeling, but not everyone does mathematical modeling. It’s another tool in the toolbox.

Curated Threads & Resources about Data Science and Data Analytics

Khan AcademyDifferential Calculus Integral Calculus Multivariable Calculus Differential Equations
Paul’s Online Math NotesDifferential Calculus Integral Calculus Multivariable Calculus

How Do I Learn Probability?

Probability is not friendly to beginners. Definitions are rooted in higher mathematics, notation varies from source to source, and solutions are frequently unintuitive. Probability may present the biggest barrier to entry in data science.

It’s best to pick a single primary source and a community for help. If you can spend the money, register for a university or community college course and attend in person.

The best free resource is MIT’s 18.05 Introduction to Probability and Statistics (Spring 2014). Leverage /r/learnmath, /r/learnmachinelearning, and /r/AskStatistics when you get inevitably stuck.

How Do I Learn Linear Algebra?

Curated Threads & Resources https://www.youtube.com/watch?v=fNk_zzaMoSs&index=1&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab

What does the typical data science interview process look like?

For general advice, Mastering the DS Interview Loop is a terrific article. The community discussed the article here.

Briefly summarized, most companies follow a five stage process:

Coding Challenge: Most common at software companies and roles contributing to a digital product.
HR Screen
Technical Screen: Often in the form of a project. Less frequently, it takes the form of a whiteboarding session at the onsite.
Onsite: Usually the project from the technical screen is presented here, followed by a meeting with the director overseeing the team you’ll join.
Negotiation & Offer

Mastering the DS Interview Loop

Preparation:

Practice questions on Leetcode which has both SQL and traditional data structures/algorithm questions
Review Brilliant for math and statistics questions.
SQL Zoo and Mode Analytics both offer various SQL exercises you can solve in your browser.

Tips:

Before you start coding, read through all the questions. This allows your unconscious mind to start working on problems in the background.
Start with the hardest problem first, when you hit a snag, move to the simpler problem before returning to the harder one.
Focus on passing all the test cases first, then worry about improving complexity and readability.
If you’re done and have a few minutes left, go get a drink and try to clear your head. Read through your solutions one last time, then submit.
It’s okay to not finish a coding challenge. Sometimes companies will create unreasonably tedious coding challenges with one-week time limits that require 5–10 hours to complete. Unless you’re desperate, you can always walk away and spend your time preparing for the next interview.

Remember, interviewing is a skill that can be learned, just like anything else. Hopefully, this article has given you some insight on what to expect in a data science interview loop.

The process also isn’t perfect and there will be times that you fail to impress an interviewer because you don’t possess some obscure piece of knowledge. However, with repeated persistence and adequate preparation, you’ll be able to land a data science job in no time!

What does the Airbnb data science interview process look like? [Coming soon]

What does the Facebook data science interview process look like? [Coming soon]

What does the Uber data science interview process look like? [Coming soon]

What does the Microsoft data science interview process look like? [Coming soon]

What does the Google data science interview process look like? [Coming soon]

What does the Netflix data science interview process look like? [Coming soon]

What does the Apple data science interview process look like? [Coming soon]

Question: How is SQL used in real data science jobs?

Real life enterprise databases are orders of magnitude more complex than the “customers, products, orders” examples used as teaching tools. SQL as a language is actually, IMO, a relatively simple language (the db administration component can get complex, but mostly data scientists aren’t doing that anyways). SQL is an incredibly important skill though for any DS role. I think when people emphasize SQL, what they really are talking about is the ability to write queries that interrogate the data and discover the nuances behind how it is collected and/or manipulated by an application before it is written to the dB. For example, is the employee’s phone number their current phone number or does the database store a history of all previous phone numbers? Critically important questions for understanding the nature of your data, and it doesn’t necessarily deal with statistics! The level of syntax required to do this is not that sophisticated, you can get pretty damn far with knowledge of all the joins, group by/analytical functions, filtering and nesting queries. In many cases, the data is too large to just select * and dump into a csv to load into pandas, so you start with SQL against the source. In my mind it’s more important for “SQL skills” to know how to generate hypotheses (that will build up to answering your business question) that can be investigated via a query than it is to be a master of SQL’s syntax. Just my two cents though!

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Data Visualization example: 12000 Years of Human Population Dynamic

[OC] 12,000 years of human population dynamics from dataisbeautiful

Human population density estimates based on the Hyde 3.2 model.

Data visualization example: AirPods Revenue vs. Top Tech Companies

[OC] AirPods Revenue vs. Top Tech Companies from dataisbeautiful

Source: 24/7 Kevin Rooke, Google Search, SEC Edgar

Data visualization example: Crypto race: DOGE (red) versus BTC (blue), 5/6/2020 – 5/5/2021

[OC] Crypto race: DOGE (red) versus BTC (blue), 5/6/2020 – 5/5/2021 from dataisbeautiful

Data sources: Coindesk-Bitcoin ; Coindesk-Dodgecoin

Data visualization example: How have cryptocurrencies done during the Pandemic?

[OC] How have cryptocurrencies done during the Pandemic? from dataisbeautiful

Data source: Performance data on these cryptocurrencies from Investing.com which provides free historic data

Data visualization example: Countries with the Most Nuclear Warheads

[OC] Countries with the Most Nuclear Warheads from dataisbeautiful

Data Source: Here

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Capitol insurrection arrests per million people by state

[OC] Capitol insurrection arrests per million people by state from dataisbeautiful

Data Source: Made in Google Sheets using data from this USA Today article (for the number of arrests by arrestee’s home state) and this spreadsheet of the results of the 2020 Census (for the population of each state and DC in 2020, which was used as the denominator in calculating arrests/million people).

For more information about analytics architecture, visit the AWS Big Data Blog: AWS serverless data analytics pipeline reference architecture here

Basic Data Lake Architecture

Data Analytics Architecture on AWS

Data Analytics Process

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Data Lake Storage:

AWS DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Event Driven Data Analytics Workflow on AWS

What is a Data Lake?

What is a Data Warehouse?

What are benefits of a data warehouse?

• Informed decision making

• Consolidated data from many sources

• Historical data analysis

• Data quality, consistency, and accuracy

• Separation of analytics processing from transactional databases

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Data Lake vs Data Warehouse – Comparison

A data warehouse is specially designed for data analytics, which identifies relationships and trends across large amounts of data. A database is used to capture and store data, such as the details of a transaction. Unlike a data warehouse, a data lake is a centralized repository for structured, semi-structured, and unstructured data. A data warehouse organizes data in a tabular format (or schema) that enables SQL queries on the data. But not all applications require data to be in tabular format. Some applications can access data in the data lake even if it is “semi-structured” or unstructured. These include big data analytics, full-text search, and machine learning.

An AWS data lake only has a storage charge for the data. No servers are necessary for the data to be stored and accessed. In the case of Amazon Athena, also, there are no additional charges for processing. Data warehouse enable fast queries of structured data from transactional systems for batch reports, business intelligence, and visualization use cases. A data lake stores data without regard to its structure. Data scientists, data analysts, and business analysts use the data lake. They support use cases such as machine learning, predictive analytics, and data discovery and profiling.

Transactional Data Ingestion

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Structured Query Language (SQL)

Data definition language (DDL) refers to the subset of SQL commands that define data structures and objects such as databases, tables, and views. DDL commands include the following:

• CREATE: used to create a new object.

• DROP: used to delete an object.

• ALTER: used to modify an object.

• RENAME: used to rename an object.

• TRUNCATE: used to remove all rows from a table without deleting the table itself.

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Data manipulation language (DML) refers to the subset of SQL commands that are used to work with data. DML commands include the following:

• SELECT: used to request records from one or more tables.

• INSERT: used to insert one or more records into a table.

• UPDATE: used to modify the data of one or more records in a table.

• DELETE: used to delete one or more records from a table.

• EXPLAIN: used to analyze and display the expected execution plan of a SQL statement.

• LOCK: used to lock a table from write operations (INSERT, UPDATE, DELETE) and prevent concurrent operations from conflicting with one another.

Data control language (DCL) refers to the subset of SQL commands that are used to configure permissions to objects. DCL commands include:

• GRANT: used to grant access and permissions to a database or object in a database, such as a schema or table.

• REVOKE: used to remove access and permissions from a database or objects in a database.

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Comparison of OLTP and OLAP

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

What is Amazon Macie?

Businesses are responsible to identify and limit disclosure of sensitive data such as personally identifiable information (PII) or proprietary information. Identifying and masking sensitive information is time consuming, and becomes more complex in data lakes with various data sources and formats and broad user access to published data sets.

Amazon Macie is a fully managed data security and privacy service that uses machine learning and pattern matching to discover sensitive data in AWS. Macie includes a set of managed data identifiers which automatically detect common types of sensitive data. Examples of managed data identifiers include keywords, credentials, financial information, health information, and PII. You can also configure custom data identifiers using keywords or regular expressions to highlight organizational proprietary data, intellectual property, and other specific scenarios. You can develop security controls that operate at scale to monitor and remediate risk automatically when Macie detects sensitive data. You can use AWS Lambda functions to automatically turn on encryption for an Amazon S3 bucket where Macie detects sensitive data. Or automatically tag datasets containing sensitive data, for inclusion in orchestrated data transformations or audit reports.

Amazon Macie can be integrated into the data ingestion and processing steps of your data pipeline. This approach avoids inadvertent disclosures in published data sets by detecting and addressing the sensitive data as it is ingested and processed. Building the automated detection and processing of sensitive data into your ETL pipelines simplifies and standardizes handling of sensitive data at scale.

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

What is AWS Glue DataBrew?

AWS Glue DataBrew is a visual data preparation tool that simplifies cleaning and normalizing datasets in preparation for use in analytics and machine learning.

• Profile data quality, identifying patterns and automatically detecting anomalies.

• Clean and normalize data using over 250 pre-built transformations, without writing code.

• Visually map the lineage of your data to understand data sources and transformation history.

• Save data cleaning and normalization workflows for automatic application to new data.

Data processed in AWS Glue DataBrew is immediately available for use in analytics and machine learning projects.

Learn more about the built-in transformations available in AWS Glue DataBrew in the Recipe actions reference: https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions-reference.html

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

What is AWS Glue?

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue can run your ETL jobs as new data arrives. For example, you can use an AWS Lambda function to trigger your ETL jobs to run as soon as new data becomes available in Amazon S3. You can also register this new dataset in the

AWS Glue Data Catalog as part of your ETL jobs.

AWS Glue is serverless, so there’s no infrastructure to set up or manage.

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

AWS Glue Data Catalog The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos, and use that metadata to query and transform the data. Once the data is cataloged, it is immediately available for search and query using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.

You can use AWS Identity and Access Management (IAM) policies to control access to the data sources managed by the AWS Glue Data Catalog. The Data Catalog also provides comprehensive audit and governance capabilities, with schema-change tracking and data access controls.

AWS Glue crawler

AWS Glue crawlers can scan data in all kinds of repositories, classify it, extract schema information from it, and store the metadata automatically in the AWS Glue Data Catalog.

AWS Glue ETL

AWS Glue can run your ETL jobs as new data arrives. For example, you can use an AWS Lambda function to trigger your ETL jobs to run as soon as new data becomes available in Amazon S3. You can also register this new dataset in the AWS Glue Data Catalog as part of your ETL jobs.

AWS Glue Studio

AWS Glue Studio provides a graphical interface to create, run, and monitor extract, transform, and load (ETL) jobs in AWS Glue. You can visually compose data transformation workflows and seamlessly run them on AWS Glue’s Apache Spark-based serverless ETL engine. AWS Glue Studio also offers tools to monitor ETL workflows and validate that they are operating as intended.

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

What is Amazon Athena?

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you can start analyzing data immediately. You don’t even need to load your data into Athena, it works directly with data stored in S3. To get started, just log into the Amazon Athena console, define your schema, and start querying. Athena uses Presto with full standard SQL support. It works with a variety of standard data formats, including CSV, JSON, ORC, Apache Parquet and Avro. While Athena is ideal for quick, ad-hoc querying, it can also handle complex analysis, including large joins, window functions, and arrays.

Amazon Athena helps you analyze data stored in Amazon S3. You can use Athena to run ad-hoc queries using ANSI SQL, without the need to aggregate or load the data into Athena. It can process unstructured, semi-structured, and structured datasets. Examples include CSV, JSON, Avro or columnar data formats such as Apache Parquet and Apache ORC. Athena integrates with Amazon QuickSight for easy visualization. You can also use Athena to generate reports or to explore data with business intelligence tools or SQL clients, connected via an ODBC or JDBC driver.

The tables and databases that you work with in Athena to run queries are based on metadata. Metadata is data about the underlying data in your dataset. How that metadata describes your dataset is called the schema. For example, a table name, the column names in the table, and the data type of each column are schema, saved as metadata, that describe an underlying dataset. In Athena, we call a system for organizing metadata a data catalog or a metastore. The combination of a dataset and the data catalog that describes it is called a data source.

The relationship of metadata to an underlying dataset depends on the type of data source that you work with. Relational data sources like MySQL, PostgreSQL, and SQL Server tightly integrate the metadata with the dataset. In these systems, the metadata is most often written when the data is written. Other data sources, like those built using Hive, allow you to define metadata on-the-fly when you read the dataset. The dataset can be in a variety of formats; for example, CSV, JSON, Parquet, or Avro.

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

What is AWS Lake Formation?

Lake Formation is a fully managed service that enables data engineers, security officers, and data analysts to build, secure, manage, and use your data lake

To build your data lake in AWS Lake Formation, you must register an Amazon S3 location as a data lake. The Lake Formation service must have permission to write to the AWS Glue Data Catalog and to Amazon S3 locations in the data lake.

Next, identify the data sources to be ingested. AWS Lake formation can move data into your data lake from existing Amazon S3 data stores. Lake Formation can collect and organize datasets, such as logs from AWS CloudTrail, AWS CloudFront, detailed billing reports, or Elastic Load Balancing. You can ingest bulk or incremental datasets from relational, NoSQL, or non-relational databases. Lake Formation can ingest data from databases running in Amazon RDS or hosted in Amazon EC2. You can also ingest data from on-premises databases using Java Database Connectivity JDBC connectors. You can use custom AWS Glue jobs to load data from other databases or to ingest streaming data using Amazon Kinesis or Amazon DynamoDB.

AWS Lake Formation manages AWS Glue crawlers, AWS Glue ETL jobs, the AWS Glue Data Catalog, security settings, and access control:

• Lake Formation is an automated build environment based on AWS Glue.

• Lake Formation coordinates AWS Glue crawlers to identify datasets within the specified data stores and collect metadata for each dataset

• Lake Formation can perform transformations on your data, such as rewriting and organizing data into a consistent, analytics-friendly format. Lake Formation creates transformation templates and schedules AWS Glue jobs to prepare and optimize your data for analytics. Lake Formation also helps clean your data using FindMatches, an ML-based deduplication transform. AWS Glue jobs encapsulate scripts, such as ETL scripts, which connect to source data, process it, and write it out to a data target. AWS Glue triggers can start jobs based on a schedule or event, or on demand. AWS Glue workflows orchestrate AWS ETL jobs, crawlers, and triggers. You can define a workflow manually or use a blueprint based on commonly ingested data source types.

• The AWS Glue Data Catalog within the data lake persistently stores the metadata from raw and processed datasets. Metadata about data sources and targets is in the form of databases and tables. Tables store information about the underlying data, including schema information, partition information, and data location. Databases are collections of tables. Each AWS account has one data catalog per AWS Region.

• Lake Formation provides centralized access controls for your data lake, including security policy-based rules for users and applications by role. You can authenticate the users and roles using AWS IAM. Once the rules are defined, Lake Formation enforces them with table-and column-level granularity for users of Amazon Redshift Spectrum and Amazon Athena. Rules are enforced at the table-level in AWS Glue, which is normally accessed for administrators.

• Lake Formation leverages the encryption capabilities of Amazon S3 for data in the data lake. This approach provides automatic server-side encryption with keys managed by the AWS Key Management Service (KMS). S3 encrypts data in transit when replicating across Regions. You can separate accounts for source and destination Regions to further protect your data

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

What is Amazon Quicksight?

Amazon QuickSight is a cloud-scale business intelligence (BI) service. In a single data dashboard, QuickSight gives decision-makers the opportunity to explore and interpret information in an interactive visual environment. QuickSight can include AWS data, third-party data, big data, spreadsheet data, SaaS data, B2B data, and more. QuickSight delivers fast and responsive query performance by using a robust in-memory engine (SPICE).

Scale from tens to tens of thousands of users

Amazon QuickSight has a serverless architecture that automatically scales to tens of thousands of users without the need to setup, configure, or manage your own servers.

Embed BI dashboards in your applications

With QuickSight, you can quickly embed interactive dashboards into your applications, websites, and portals.

Access deeper insights with Machine Learning

QuickSight leverages the proven machine learning (ML) capabilities of AWS. BI teams can perform advanced analytics without prior data science experience.

Ask questions of your data, receive answers

With QuickSight, you can quickly get answers to business questions asked in natural language with QuickSight’s new ML-powered natural language query capability, Q.

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

What is SPICE?

SPICE is the Super-fast, Parallel, In-memory Calculation Engine in QuickSight. SPICE is engineered to rapidly perform advanced calculations and serve data. The storage and processing capacity available in SPICE speeds up the analytical queries that you run against your imported data. By using SPICE, you save time because you don’t need to retrieve the data every time that you change an analysis or update a visual.

When you import data into a dataset rather than using a direct SQL query, it becomes SPICE data because of how it’s stored. SPICE is the Amazon QuickSight Super-fast, Parallel, In-memory Calculation Engine. It’s engineered to rapidly perform advanced calculations and serve data. In Enterprise edition, data stored in SPICE is encrypted at rest.

When you create or edit a dataset, you choose to use either SPICE or a direct query, unless the dataset contains uploaded files. Importing (also called ingesting) your data into SPICE can save time and money:

• Your analytical queries process faster.

• You don’t need to wait for a direct query to process.

• Data stored in SPICE can be reused multiple times without incurring additional costs. If you use a data source that charges per query, you’re charged for querying the data when you first create the dataset and later when you refresh the dataset.

AWS Data Analytics Specialty Certification DAS-C01 Exam Prep on iOS

AWS DAS-C01 Exam Prep on android

AWS DAS-C01 Exam Prep on Windows

Serverless data lake reference architecture:

You can use AWS services as building blocks to build serverless data lakes and analytics pipelines. You can apply best practices on how to ingest, store, transform, and analyze structured and unstructured data at scale. Achieve the scale without needing to manage any storage or compute infrastructure. A decoupled, component-driven architecture allows you to start small and scale out slowly. You can quickly add new purpose-built components to one of six architecture layers to address new requirements and data sources.

This data lake-centric architecture can support business intelligence (BI) dashboarding, interactive SQL queries, big data processing, predictive analytics, and machine learning use cases.

• The ingestion layer includes protocols to support ingestion of structured, unstructured, or streaming data from a variety of sources.

• The storage layer provides durable, scalable, secure, and cost-effective storage of datasets across ingestion and processing.

• The landing zone stores data as ingested.

• Data engineers run initial quality checks to validate and cleanse data in the landing zone, producing the raw dataset.

• The processing layer creates curated datasets by further cleansing, normalizing, standardizing, and enriching data from the raw zone. The curated dataset is typically stored in formats that support performant and cost-effective access by the consumption layer.

• The catalog layer stores business and technical metadata about the datasets hosted in the storage layer.

• The consumption layer contains functionality for Search, Analytics, and Visualization. It integrates with the data lake storage, cataloging, and security layers. This integration supports analysis methods such as SQL, batch analytics, BI dashboards, reporting, and ML.

• The security and monitoring layer protects data within the storage layer and other resources in the data lake. This layer includes access control, encryption, network protection, usage monitoring, and auditing.

You can learn more about this reference architecture at AWS Big Data Blog: AWS serverless data analytics pipeline reference architecture: https://aws.amazon.com/blogs/big-data/aws-serverless-data-analytics-pipeline-reference-architecture/

What are Data Lakes Best Practices?

The main challenge with a data lake architecture is that raw data is stored with no oversight of the contents. To make the data usable, you must have defined mechanisms to catalog and secure the data. Without these mechanisms, data cannot be found or trusted, resulting in a “data swamp.” Meeting the needs of diverse stakeholders requires data lakes to have governance, semantic consistency, and access controls.

The Analytics Lens for the AWS Well-Architected Framework covers common analytics applications scenarios, including data lakes. It identifies key elements to help you architect your data lake according to best practices, including the following configuration notes:

• Decide on a location for data lake ingestion (that is, an S3 bucket). Select a frequency and isolation mechanism that meets your business needs.

• For Tier 2 Data, partition the data with keys that align to common query filter

. This enables pruning by common analytics tools that work on raw data files and increases performance

• Choose optimal file sizes to reduce Amazon S3 round trips during compute environment ingestion. Recommended: 512 MB – 1 GB in a columnar format (ORC/Parquet) per partition.

• Perform frequent scheduled compactions that align to the optimal file sizes noted previously. For example, compact into daily partitions if hourly files are too small.

• For data with frequent updates or deletes (that is, mutable data), either: o Temporarily store replicated data to a database like Amazon Redshift, Apache Hive, or Amazon RDS. Once the data becomes static, and then offload it to Amazon S3. Or, o Append the data to delta files per partition and compact it on a scheduled basis. You can use AWS Glue or Apache Spark on Amazon EMR for this processing.

With Tier 2 and Tier 3 Data being stored in Amazon S3, partition data using a high cardinality key. This is honored by Presto, Apache Hive, and Apache Spark and improves the query filter performance on that key

• Sort data in each partition with a secondary key that aligns to common filter queries. This allows query engines to skip files and get to requested data faster. For more information on the Analytics Lens for the AWS Well-Architected Framework, visit https://docs.aws.amazon.com/wellarchitected/latest/analytics-lens/data-lake.html

References:

For additional information on AWS data lakes and data analytics architectures, visit:

• AWS Well-Architected: Learn, measure, and build using architectural best practices: https://aws.amazon.com/architecture/well-architected

• AWS Lake Formation: Build a secure data lake in days: https://aws.amazon.com/lake-formation

• Getting Started with Amazon S3: https://aws.amazon.com/s3/getting-started

• Security in AWS Lake Formation: https://docs.aws.amazon.com/lake-formation/latest/dg/security.html

AWS Lake Formation: How It Works: https://docs.aws.amazon.com/lake-formation/latest/dg/how-it-works.html

• AWS Lake Formation Dashboard: https://us-west-2.console.aws.amazon.com/lakeformation

• Data Lake Storage on AWS: https://aws.amazon.com/products/storage/data-lake-storage/

• Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility: https://docs.aws.amazon.com/whitepapers/latest/building-data-lakes/building-data-lake-aws.html

• Data Ingestion Methods: https://docs.aws.amazon.com/whitepapers/latest/building-data-lakes/data-ingestion-methods.html

October 12, 2019January 24, 2023

What are the corresponding Azure and Google Cloud services for each of the AWS services?

AI Dashboard is available on the Web, Apple, Google, and Microsoft, PRO version

What are the corresponding Azure and Google Cloud services for each of the AWS services?

What are unique distinctions and similarities between AWS, Azure and Google Cloud services? For each AWS service, what is the equivalent Azure and Google Cloud service? For each Azure service, what is the corresponding Google Service? AWS Services vs Azure vs Google Services? Side by side comparison between AWS, Google Cloud and Azure Service?

For a better experience, use the mobile app here.

AWS vs Azure vs Google
What are the corresponding Azure and Google Cloud services for each of the AWS services? — AWS vs Azure vs Google Mobile App

Category: Marketplace
Easy-to-deploy and automatically configured third-party applications, including single virtual machine or multiple virtual machine solutions.
References:
[AWS]:AWS Marketplace
[Azure]:Azure Marketplace
[Google]:Google Cloud Marketplace
Tags: #AWSMarketplace, #AzureMarketPlace, #GoogleMarketplace
Differences: They are both digital catalog with thousands of software listings from independent software vendors that make it easy to find, test, buy, and deploy software that runs on their respective cloud platform.

Get 20% off Google Google Workspace (Google Meet) Standard Plan with the following codes: 96DRHDRA9J7GTN6
Get 20% off Google Workspace (Google Meet) Business Plan (AMERICAS): M9HNXHX3WC9H7YE (Email us for more codes)

Active Anti-Aging Eye Gel, Reduces Dark Circles, Puffy Eyes, Crow's Feet and Fine Lines & Wrinkles, Packed with Hyaluronic Acid & Age Defying Botanicals

Category: AI and machine learning
A cloud service to train, deploy, automate, and manage machine learning models.
References:
[AWS]:AWS SageMaker(build, train and deploy machine learning models), AWS DeepComposer (ML enabled musical keyboard), Amazon Fraud Detector (Detect more online fraud faster), Amazon CodeGuru (Automate code reviews and identify expensive lines of code), Contact Lens for Amazon Connect (Contact center analytics powered by ML), Amazon Kendra (Reinvent enterprise search with ML), Amazon Augmented AI (Easily implement human review of ML predictions), Amazon SageMaker Studio (The first visual IDE for machine learning), Amazon SageMaker Notebooks (Quickly start and share ML notebooks), Amazon SageMaker Experiments (Organize, track, and evaluate ML experiments), Amazon SageMaker Debugger (Analyze and debug ML models in real time), Amazon SageMaker Autopilot (Automatically create high quality ML models), Amazon SageMaker Model Monitor (Continuously monitor ML models)
[Azure]:Azure Machine Learning
[Google]:Google Cloud TensorFlow
Tags: #AI, #CloudAI, #SageMaker, #AzureMachineLearning, #TensorFlow
Differences: According to the StackShare community, Azure Machine Learning has a broader approval, being mentioned in 12 company stacks & 8 developers stacks; compared to Amazon Machine Learning, which is listed in 8 company stacks and 9 developer stacks.

Category: AI and machine learning
Build and connect intelligent bots that interact with your users using text/SMS, Skype, Teams, Slack, Office 365 mail, Twitter, and other popular services.
References:
[AWS]:Alexa Skills Kit (enables a developer to build skills, also called conversational applications, on the Amazon Alexa artificial intelligence assistant.)
[Azure]:Microsoft Bot Framework (building enterprise-grade conversational AI experiences.)
[Google]:Google Assistant Actions ( developer platform that lets you create software to extend the functionality of the Google Assistant, Google’s virtual personal assistant,)

Tags: #AlexaSkillsKit, #MicrosoftBotFramework, #GoogleAssistant
Differences: One major advantage Google gets over Alexa is that Google Assistant is available to almost all Android devices.

Advertise with us - Post Your Good Content Here
We are ranked in the Top 20 on Google

Category: AI and machine learning
Description:API capable of converting speech to text, understanding intent, and converting text back to speech for natural responsiveness.
References:
[AWS]:Amazon Lex (building conversational interfaces into any application using voice and text.)
[Azure]:Azure Speech Services(unification of speech-to-text, text-to-speech, and speech translation into a single Azure subscription)
[Google]:Google APi.ai, AI Hub (Hosted repo of plug-and-play AI component), AI building blocks(for developers to add sight, language, conversation, and structured data to their applications.), AI Platform(code-based data science development environment, lets ML developers and data scientists quickly take projects from ideation to deployment.), DialogFlow (Google-owned developer of human–computer interaction technologies based on natural language conversations. ), TensorFlow(Open Source Machine Learning platform)

"Pass the AWS Cloud Practitioner Certification with flying colors: Master the Exam with 250+ Quizzes, Cheat Sheets, Flashcards, and Illustrated Study Guides - 2024 Edition"

Tags: #AmazonLex, #CogintiveServices, #AzureSpeech, #Api.ai, #DialogFlow, #Tensorflow
Differences: api.ai provides us with such a platform which is easy to learn and comprehensive to develop conversation actions. It is a good example of the simplistic approach to solving complex man to machine communication problem using natural language processing in proximity to machine learning. Api.ai supports context based conversations now, which reduces the overhead of handling user context in session parameters. On the other hand in Lex this has to be handled in session. Also, api.ai can be used for both voice and text based conversations (assistant actions can be easily created using api.ai).

Category: AI and machine learning
Description:Computer Vision: Extract information from images to categorize and process visual data.
References:
[AWS]:Amazon Rekognition (based on the same proven, highly scalable, deep learning technology developed by Amazon’s computer vision scientists to analyze billions of images and videos daily. It requires no machine learning expertise to use.)
[Azure]:Cognitive Services(bring AI within reach of every developer—without requiring machine-learning expertise.)
[Google]:Google Vision (offers powerful pre-trained machine learning models through REST and RPC APIs.)
Tags: AmazonRekognition, #GoogleVision, #AzureSpeech
Differences: For now, only Google Cloud Vision supports batch processing. Videos are not natively supported by Google Cloud Vision or Amazon Rekognition. The Object Detection functionality of Google Cloud Vision and Amazon Rekognition is almost identical, both syntactically and semantically.
Differences:
Google Cloud Vision and Amazon Rekognition offer a broad spectrum of solutions, some of which are comparable in terms of functional details, quality, performance, and costs.

Dive into a comprehensive AWS Cloud Practitioner CLF-C02 Certification guide, masterfully weaving insights from Tutorials Dojo, Adrian Cantrill, Stephane Maarek, and AWS Skills Builder into one unified resource.

Category: Big data and analytics: Data warehouse
Description:Cloud-based Enterprise Data Warehouse (EDW) that uses Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data.
References:
[AWS]:AWS Redshift (scalable data warehouse that makes it simple and cost-effective to analyze all your data across your data warehouse and data lake.), Amazon Redshift Data Lake Export (Save query results in an open format),Amazon Redshift Federated Query(Run queries n line transactional data), Amazon Redshift RA3(Optimize costs with up to 3x better performance), AQUA: AQUA: Advanced Query Accelerator for Amazon Redshift (Power analytics with a new hardware-accelerated cache), UltraWarm for Amazon Elasticsearch Service(Store logs at ~1/10th the cost of existing storage tiers )
[Azure]:Azure Synapse formerly SQL Data Warehouse (limitless analytics service that brings together enterprise data warehousing and Big Data analytics.)
[Google]:BigQuery (RESTful web service that enables interactive analysis of massive datasets working in conjunction with Google Storage. )
Tags:#AWSRedshift, #GoogleBigQuery, #AzureSynapseAnalytics
Differences: Loading data, Managing resources (and hence pricing), Ecosystem. Ecosystem is where Redshift is clearly ahead of BigQuery. While BigQuery is an affordable, performant alternative to Redshift, they are considered to be more up and coming

Invest in your future today by enrolling in this Azure Fundamentals - Pass the Azure Fundamentals Exam with Ease: Master the AZ-900 Certification with the Comprehensive Exam Preparation Guide!

Category: Big data and analytics: Data warehouse
Description: Apache Spark-based analytics platform. Managed Hadoop service. Data orchestration, ETL, Analytics and visualization
References:
[AWS]:EMR, Data Pipeline, Kinesis Stream, Kinesis Firehose, Glue, QuickSight, Athena, CloudSearch
[Azure]:Azure Databricks, Data Catalog Cortana Intelligence, HDInsight, Power BI, Azure Datafactory, Azure Search, Azure Data Lake Anlytics, Stream Analytics, Azure Machine Learning
[Google]:Cloud DataProc, Machine Learning, Cloud Datalab
Tags:#EMR, #DataPipeline, #Kinesis, #Cortana, AzureDatafactory, #AzureDataAnlytics, #CloudDataProc, #MachineLearning, #CloudDatalab
Differences: All three providers offer similar building blocks; data processing, data orchestration, streaming analytics, machine learning and visualisations. AWS certainly has all the bases covered with a solid set of products that will meet most needs. Azure offers a comprehensive and impressive suite of managed analytical products. They support open source big data solutions alongside new serverless analytical products such as Data Lake. Google provide their own twist to cloud analytics with their range of services. With Dataproc and Dataflow, Google have a strong core to their proposition. Tensorflow has been getting a lot of attention recently and there will be many who will be keen to see Machine Learning come out of preview.

Category: Virtual servers
Description:Virtual servers allow users to deploy, manage, and maintain OS and server software. Instance types provide combinations of CPU/RAM. Users pay for what they use with the flexibility to change sizes.
Batch: Run large-scale parallel and high-performance computing applications efficiently in the cloud.
References:
[AWS]:Elastic Compute Cloud (EC2), Amazon Bracket(Explore and experiment with quantum computing), Amazon Ec2 M6g Instances (Achieve up to 40% better price performance), Amazon Ec2 Inf1 instancs (Deliver cost-effective ML inference), AWS Graviton2 Processors (Optimize price performance for cloud workloads), AWS Batch, AWS AutoScaling, VMware Cloud on AWS, AWS Local Zones (Run low latency applications at the edge), AWS Wavelength (Deliver ultra-low latency applications for 5G devices), AWS Nitro Enclaves (Further protect highly sensitive data), AWS Outposts (Run AWS infrastructure and services on-premises)
[Azure]:Azure Virtual Machines, Azure Batch, Virtual Machine Scale Sets, Azure VMware by CloudSimple
[Google]:Compute Engine, Preemptible Virtual Machines, Managed instance groups (MIGs), Google Cloud VMware Solution by CloudSimple
Tags: #AWSEC2, #AWSBatch, #AWSAutoscaling, #AzureVirtualMachine, #AzureBatch, #VirtualMachineScaleSets, #AzureVMWare, #ComputeEngine, #MIGS, #VMWare
Differences: There is very little to choose between the 3 providers when it comes to virtual servers. Amazon has some impressive high end kit, on the face of it this sound like it would make AWS a clear winner. However, if your only option is to choose the biggest box available you will need to make sure you have very deep pockets, and perhaps your money may be better spent re-architecting your apps for horizontal scale.Azure’s remains very strong in the PaaS space and now has a IaaS that can genuinely compete with AWS
Google offers a simple and very capable set of services that are easy to understand. However, with availability in only 5 regions it does not have the coverage of the other players.

Category: Containers and container orchestrators
Description: A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.
Container orchestration is all about managing the lifecycles of containers, especially in large, dynamic environments.
References:
[AWS]:EC2 Container Service (ECS), Fargate(Run containers without anaging servers or clusters), EC2 Container Registry(managed AWS Docker registry service that is secure, scalable, and reliable.), Elastic Container Service for Kubernetes (EKS: runs the Kubernetes management infrastructure across multiple AWS Availability Zones), App Mesh( application-level networking to make it easy for your services to communicate with each other across multiple types of compute infrastructure)
[Azure]:Azure Container Instances, Azure Container Registry, Azure Kubernetes Service (AKS), Service Fabric Mesh
[Google]:Google Container Engine, Container Registry, Kubernetes Engine
Tags:#ECS, #Fargate, #EKS, #AppMesh, #ContainerEngine, #ContainerRegistry, #AKS
Differences: Google Container Engine, AWS Container Services, and Azure Container Instances can be used to run docker containers. Google offers a simple and very capable set of services that are easy to understand. However, with availability in only 5 regions it does not have the coverage of the other players.

Category: Serverless
Description: Integrate systems and run backend processes in response to events or schedules without provisioning or managing servers.
References:
[AWS]:AWS Lambda
[Azure]:Azure Functions
[Google]:Google Cloud Functions
Tags:#AWSLAmbda, #AzureFunctions, #GoogleCloudFunctions
Differences: Both AWS Lambda and Microsoft Azure Functions and Google Cloud Functions offer dynamic, configurable triggers that you can use to invoke your functions on their platforms. AWS Lambda, Azure and Google Cloud Functions support Node.js, Python, and C#. The beauty of serverless development is that, with minor changes, the code you write for one service should be portable to another with little effort – simply modify some interfaces, handle any input/output transforms, and an AWS Lambda Node.JS function is indistinguishable from a Microsoft Azure Node.js Function. AWS Lambda provides further support for Python and Java, while Azure Functions provides support for F# and PHP. AWS Lambda is built from the AMI, which runs on Linux, while Microsoft Azure Functions run in a Windows environment. AWS Lambda uses the AWS Machine architecture to reduce the scope of containerization, letting you spin up and tear down individual pieces of functionality in your application at will.

Category: Relational databases
Description: Managed relational database service where resiliency, scale, and maintenance are primarily handled by the platform.
References:
[AWS]:AWS RDS(MySQL and PostgreSQL-compatible relational database built for the cloud,), Aurora(MySQL and PostgreSQL-compatible relational database built for the cloud)
[Azure]:SQL Database, Azure Database for MySQL, Azure Database for PostgreSQL
[Google]:Cloud SQL
Tags: #AWSRDS, #AWSAUrora, #AzureSQlDatabase, #AzureDatabaseforMySQL, #GoogleCloudSQL
Differences: All three providers boast impressive relational database offering. RDS supports an impressive range of managed relational stores while Azure SQL Database is probably the most advanced managed relational database available today. Azure also has the best out-of-the-box support for cross-region geo-replication across its database offerings.

Category: NoSQL, Document Databases
Description:A globally distributed, multi-model database that natively supports multiple data models: key-value, documents, graphs, and columnar.
References:
[AWS]:DynamoDB (key-value and document database that delivers single-digit millisecond performance at any scale.), SimpleDB ( a simple web services interface to create and store multiple data sets, query your data easily, and return the results.), Managed Cassandra Services(MCS)
[Azure]:Table Storage, DocumentDB, Azure Cosmos DB
[Google]:Cloud Datastore (handles sharding and replication in order to provide you with a highly available and consistent database. )
Tags:#AWSDynamoDB, #SimpleDB, #TableSTorage, #DocumentDB, AzureCosmosDB, #GoogleCloudDataStore
Differences:DynamoDB and Cloud Datastore are based on the document store database model and are therefore similar in nature to open-source solutions MongoDB and CouchDB. In other words, each database is fundamentally a key-value store. With more workloads moving to the cloud the need for NoSQL databases will become ever more important, and again all providers have a good range of options to satisfy most performance/cost requirements. Of all the NoSQL products on offer it’s hard not to be impressed by DocumentDB; Azure also has the best out-of-the-box support for cross-region geo-replication across its database offerings.

Category:Caching
Description:An in-memory–based, distributed caching service that provides a high-performance store typically used to offload non transactional work from a database.
References:
[AWS]:AWS ElastiCache (works as an in-memory data store and cache to support the most demanding applications requiring sub-millisecond response times.)
[Azure]:Azure Cache for Redis (based on the popular software Redis. It is typically used as a cache to improve the performance and scalability of systems that rely heavily on backend data-stores.)
[Google]:Memcache (In-memory key-value store, originally intended for caching)
Tags:#Redis, #Memcached
<Differences: They all support horizontal scaling via sharding.They all improve the performance of web applications by allowing you to retrive information from fast, in-memory caches, instead of relying on slower disk-based databases.”, “Differences”: “ElastiCache supports Memcached and Redis. Memcached Cloud provides various data persistence options as well as remote backups for disaster recovery purposes. Redis offers persistence to disk, Memcache does not. This can be very helpful if you cache lots of data, since you remove the slowness around having a fully cold cache. Redis also offers several extra data structures that Memcache doesn’t— Lists, Sets, Sorted Sets, etc. Memcache only has Key/Value pairs. Memcache is multi-threaded. Redis is single-threaded and event driven. Redis is very fast, but it’ll never be multi-threaded. At hight scale, you can squeeze more connections and transactions out of Memcache. Memcache tends to be more memory efficient. This can make a big difference around the magnitude of 10s of millions or 100s of millions of keys. ElastiCache supports Memcached and Redis. Memcached Cloud provides various data persistence options as well as remote backups for disaster recovery purposes. Redis offers persistence to disk, Memcache does not. This can be very helpful if you cache lots of data, since you remove the slowness around having a fully cold cache. Redis also offers several extra data structures that Memcache doesn’t— Lists, Sets, Sorted Sets, etc. Memcache only has Key/Value pairs. Memcache is multi-threaded. Redis is single-threaded and event driven. Redis is very fast, but it’ll never be multi-threaded. At hight scale, you can squeeze more connections and transactions out of Memcache. Memcache tends to be more memory efficient. This can make a big difference around the magnitude of 10s of millions or 100s of millions of keys.

Category: Security, identity, and access
Description:Authentication and authorization: Allows users to securely control access to services and resources while offering data security and protection. Create and manage users and groups, and use permissions to allow and deny access to resources.
References:
[AWS]:Identity and Access Management (IAM), AWS Organizations, Multi-Factor Authentication, AWS Directory Service, Cognito(provides solutions to control access to backend resources from your app), Amazon Detective (Investigate potential security issues), AWS IAM Access Analyzer(Easily analyze resource accessibility)
[Azure]:Azure Active Directory, Azure Subscription Management + Azure RBAC, Multi-Factor Authentication, Azure Active Directory Domain Services, Azure Active Directory B2C, Azure Policy, Management Groups
[Google]:Cloud Identity, Identity Platform, Cloud IAM, Policy Intelligence, Cloud Resource Manager, Cloud Identity-Aware Proxy, Context-aware accessManaged Service for Microsoft Active Directory, Security key enforcement, Titan Security Key
Tags: #IAM, #AWSIAM, #AzureIAM, #GoogleIAM, #Multi-factorAuthentication
Differences: One unique thing about AWS IAM is that accounts created in the organization (not through federation) can only be used within that organization. This contrasts with Google and Microsoft. On the good side, every organization is self-contained. On the bad side, users can end up with multiple sets of credentials they need to manage to access different organizations. The second unique element is that every user can have a non-interactive account by creating and using access keys, an interactive account by enabling console access, or both. (Side note: To use the CLI, you need to have access keys generated.)

Category: Object Storage and Content delivery
Description:Object storage service, for use cases including cloud applications, content distribution, backup, archiving, disaster recovery, and big data analytics.
References:
[AWS]:Simple Storage Services (S3), Import/Export(used to move large amounts of data into and out of the Amazon Web Services public cloud using portable storage devices for transport.), Snowball( petabyte-scale data transport solution that uses devices designed to be secure to transfer large amounts of data into and out of the AWS Cloud), CloudFront( content delivery network (CDN) is massively scaled and globally distributed), Elastic Block Store (EBS: high performance block storage service), Elastic File System(shared, elastic file storage system that grows and shrinks as you add and remove files.), S3 Infrequent Access (IA: is for data that is accessed less frequently, but requires rapid access when needed. ), S3 Glacier( long-term storage of data that is infrequently accessed and for which retrieval latency times of 3 to 5 hours are acceptable.), AWS Backup( makes it easy to centralize and automate the back up of data across AWS services in the cloud as well as on-premises using the AWS Storage Gateway.), Storage Gateway(hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage), AWS Import/Export Disk(accelerates moving large amounts of data into and out of AWS using portable storage devices for transport)
[Azure]:Azure Blob storage, File Storage, Data Lake Store, Azure Backup, Azure managed disks, Azure Files, Azure Storage cool tier, Azure Storage archive access tier, Azure Backup, StorSimple, Import/Export
[Google]:Cloud Storage, GlusterFS, CloudCDN
Tags:#S3, #AzureBlobStorage, #CloudStorage
Differences:Source: All providers have good object storage options and so storage alone is unlikely to be a deciding factor when choosing a cloud provider. The exception perhaps is for hybrid scenarios, in this case Azure and AWS clearly win. AWS and Google’s support for automatic versioning is a great feature that is currently missing from Azure; however Microsoft’s fully managed Data Lake Store offers an additional option that will appeal to organisations who are looking to run large scale analytical workloads. If you are prepared to wait 4 hours for your data and you have considerable amounts of the stuff then AWS Glacier storage might be a good option. If you use the common programming patterns for atomic updates and consistency, such as etags and the if-match family of headers, then you should be aware that AWS does not support them, though Google and Azure do. Azure also supports blob leasing, which can be used to provide a distributed lock.

Category:Internet of things (IoT)
Description:A cloud gateway for managing bidirectional communication with billions of IoT devices, securely and at scale. Deploy cloud intelligence directly on IoT devices to run in on-premises scenarios.
References:
[AWS]:AWS IoT (Internet of Things), AWS Greengrass, Kinesis Firehose, Kinesis Streams, AWS IoT Things Graph
[Azure]:Azure IoT Hub, Azure IoT Edge, Event Hubs, Azure Digital Twins, Azure Sphere
[Google]:Google Cloud IoT Core, Firebase, Brillo, Weave, CLoud Pub/SUb, Stream Analysis, Big Query, Big Query Streaming API
Tags:#IoT, #InternetOfThings, #Firebase
Differences:AWS and Azure have a more coherent message with their products clearly integrated into their respective platforms, whereas Google Firebase feels like a distinctly separate product.

Unlock the Secrets of Africa: Master African History, Geography, Culture, People, Cuisine, Economics, Languages, Music, Wildlife, Football, Politics, Animals, Tourism, Science and Environment with the Top 1000 Africa Quiz and Trivia. Get Yours Now!

"Become a Canada Expert: Ace the Citizenship Test and Impress Everyone with Your Knowledge of Canadian History, Geography, Government, Culture, People, Languages, Travel, Wildlife, Hockey, Tourism, Sceneries, Arts, and Data Visualization. Get the Top 1000 Canada Quiz Now!"

Category:Web Applications
Description:Managed hosting platform providing easy to use services for deploying and scaling web applications and services. API Gateway is a a turnkey solution for publishing APIs to external and internal consumers. Cloudfront is a global content delivery network that delivers audio, video, applications, images, and other files.
References:
[AWS]:Elastic Beanstalk (for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS), AWS Wavelength (for delivering ultra-low latency applications for 5G), API Gateway (makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale.), CloudFront (web service that speeds up distribution of your static and dynamic web content, such as .html, .css, .js, and image files, to your users. CloudFront delivers your content through a worldwide network of data centers called edge locations.),Global Accelerator ( improves the availability and performance of your applications with local or global users. It provides static IP addresses that act as a fixed entry point to your application endpoints in a single or multiple AWS Regions, such as your Application Load Balancers, Network Load Balancers or Amazon EC2 instances.)AWS AppSync (simplifies application development by letting you create a flexible API to securely access, manipulate, and combine data from one or more data sources: GraphQL service with real-time data synchronization and offline programming features. )
[Azure]:App Service, API Management, Azure Content Delivery Network, Azure Content Delivery Network
[Google]:App Engine, Cloud API, Cloud Enpoint, APIGee
Tags: #AWSElasticBeanstalk, #AzureAppService, #GoogleAppEngine, #CloudEnpoint, #CloudFront, #APIgee
Differences: With AWS Elastic Beanstalk, developers retain full control over the AWS resources powering their application. If developers decide they want to manage some (or all) of the elements of their infrastructure, they can do so seamlessly by using Elastic Beanstalk’s management capabilities. AWS Elastic Beanstalk integrates with more apps than Google App Engines (Datadog, Jenkins, Docker, Slack, Github, Eclipse, etc..). Google App Engine has more features than AWS Elastic BEanstalk (App Identity, Java runtime, Datastore, Blobstore, Images, Go Runtime, etc..). Developers describe Amazon API Gateway as “Create, publish, maintain, monitor, and secure APIs at any scale”. Amazon API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management. On the other hand, Google Cloud Endpoints is detailed as “Develop, deploy and manage APIs on any Google Cloud backend”. An NGINX-based proxy and distributed architecture give unparalleled performance and scalability. Using an Open API Specification or one of our API frameworks, Cloud Endpoints gives you the tools you need for every phase of API development and provides insight with Google Cloud Monitoring, Cloud Trace, Google Cloud Logging and Cloud Trace.

Category:Encryption
Description:Helps you protect and safeguard your data and meet your organizational security and compliance commitments.
References:
[AWS]:Key Management Service AWS KMS, CloudHSM
[Azure]:Key Vault
[Google]:Encryption By Default at Rest, Cloud KMS
Tags:#AWSKMS, #Encryption, #CloudHSM, #EncryptionAtRest, #CloudKMS
Differences: AWS KMS, is an ideal solution for organizations that want to manage encryption keys in conjunction with other AWS services. In contrast to AWS CloudHSM, AWS KMS provides a complete set of tools to manage encryption keys, develop applications and integrate with other AWS services. Google and Azure offer 4096 RSA. AWS and Google offer 256 bit AES. With AWs, you can bring your own key

Category:Internet of things (IoT)
Description:A cloud gateway for managing bidirectional communication with billions of IoT devices, securely and at scale. Deploy cloud intelligence directly on IoT devices to run in on-premises scenarios.
References:
[AWS]:AWS IoT, AWS Greengrass, Kinesis Firehose ( captures and loads streaming data in storage and business intelligence (BI) tools to enable near real-time analytics in the AWS cloud), Kinesis Streams (for rapid and continuous data intake and aggregation.), AWS IoT Things Graph (makes it easy to visually connect different devices and web services to build IoT applications.)
[Azure]:Azure IoT Hub, Azure IoT Edge, Event Hubs, Azure Digital Twins, Azure Sphere
[Google]:Google Cloud IoT Core, Firebase, Brillo, Weave, CLoud Pub/SUb, Stream Analysis, Big Query, Big Query Streaming API
Tags:#IoT, #InternetOfThings, #Firebase
Differences:AWS and Azure have a more coherent message with their products clearly integrated into their respective platforms, whereas Google Firebase feels like a distinctly separate product.

Category: Backend process logic
Description: Cloud technology to build distributed applications using out-of-the-box connectors to reduce integration challenges. Connect apps, data and devices on-premises or in the cloud.
References:
[AWS]:AWS Step Functions ( lets you build visual workflows that enable fast translation of business requirements into technical requirements. You can build applications in a matter of minutes, and when needs change, you can swap or reorganize components without customizing any code.)
[Azure]:Logic Apps (cloud service that helps you schedule, automate, and orchestrate tasks, business processes, and workflows when you need to integrate apps, data, systems, and services across enterprises or organizations.)
[Google]:Dataflow ( fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem.)
Tags:#AWSStepFunctions, #LogicApps, #Dataflow
Differences: AWS Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly. AWS Step Functions belongs to \”Cloud Task Management\” category of the tech stack, while Google Cloud Dataflow can be primarily classified under \”Real-time Data Processing\”. According to the StackShare community, Google Cloud Dataflow has a broader approval, being mentioned in 32 company stacks & 8 developers stacks; compared to AWS Step Functions, which is listed in 19 company stacks and 7 developer stacks.

Cloud Certification made simple. Ace your exams with Djamgatech.

Category: Enterprise application services
Description:Fully integrated Cloud service providing communications, email, document management in the cloud and available on a wide variety of devices.
References:
[AWS]:Amazon WorkMail, Amazon WorkDocs, Amazon Kendra (Sync and Index)
[Azure]:Office 365
[Google]:G Suite
Tags: #AmazonWorkDocs, #Office365, #GoogleGSuite
Differences: G suite document processing applications like Google Docs are far behind Office 365 popular Word and Excel software, but G Suite User interface is intuite, simple and easy to navigate. Office 365 is too clunky. Get 20% off G-Suite Business Plan with Promo Code: PCQ49CJYK7EATNC

Category: Networking
Description: Provides an isolated, private environment in the cloud. Users have control over their virtual networking environment, including selection of their own IP address range, creation of subnets, and configuration of route tables and network gateways.
References:
[AWS]:Virtual Private Cloud (VPC), Cloud virtual networking, Subnets, Elastic Network Interface (ENI), Route Tables, Network ACL, Secutity Groups, Internet Gateway, NAT Gateway, AWS VPN Gateway, AWS Route 53, AWS Direct Connect, AWS Network Load Balancer, VPN CloudHub, AWS Local Zones, AWS Transit Gateway network manager (Centrally manage global networks)
[Azure]:Virtual Network(provide services for building networks within Azure.),Subnets (network resources can be grouped by subnet for organisation and security.), Network Interface (Each virtual machine can be assigned one or more network interfaces (NICs)), Network Security Groups (NSG: contains a set of prioritised ACL rules that explicitly grant or deny access), Azure VPN Gateway ( allows connectivity to on-premise networks), Azure DNS, Traffic Manager (DNS based traffic routing solution.), ExpressRoute (provides connections up to 10 Gbps to Azure services over a dedicated fibre connection), Azure Load Balancer, Network Peering, Azure Stack (Azure Stack allows organisations to use Azure services running in private data centers.), Azure Load Balancer , Azure Log Analytics, Azure DNS,
[Google]:Cloud Virtual Network, Subnets, Network Interface, Protocol fowarding, Cloud VPN, Cloud DNS, Virtual Private Network, Cloud Interconnect, CDN interconnect, Cloud DNS, Stackdriver, Google Cloud Load Balancing,
Tags:#VPC, #Subnets, #ACL, #VPNGateway, #CloudVPN, #NetworkInterface, #ENI, #RouteTables, #NSG, #NetworkACL, #InternetGateway, #NatGateway, #ExpressRoute, #CloudInterConnect, #StackDriver
Differences: Subnets group related resources, however, unlike AWS and Azure, Google do not constrain the private IP address ranges of subnets to the address space of the parent network. Like Azure, Google has a built in internet gateway that can be specified from routing rules.

Category: Management
Description: A unified management console that simplifies building, deploying, and operating your cloud resources.
References:
[AWS]: AWS Management Console, Trusted Advisor, AWS Usage and Billing Report, AWS Application Discovery Service, Amazon EC2 Systems Manager, AWS Personal Health Dashboard, AWS Compute Optimizer (Identify optimal AWS Compute resources)
[Azure]:Azure portal, Azure Advisor, Azure Billing API, Azure Migrate, Azure Monitor, Azure Resource Health
[Google]:Google CLoud Platform, Cost Management, Security Command Center, StackDriver
Tags: #AWSConsole, #AzurePortal, #GoogleCloudConsole, #TrustedAdvisor, #AzureMonitor, #SecurityCommandCenter
Differences: AWS Console categorizes its Infrastructure as a Service offerings into Compute, Storage and Content Delivery Network (CDN), Database, and Networking to help businesses and individuals grow. Azure excels in the Hybrid Cloud space allowing companies to integrate onsite servers with cloud offerings. Google has a strong offering in containers, since Google developed the Kubernetes standard that AWS and Azure now offer. GCP specializes in high compute offerings like Big Data, analytics and machine learning. It also offers considerable scale and load balancing – Google knows data centers and fast response time.

Category: DevOps and application monitoring
Description: Comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments; Cloud services for collaborating on code development; Collection of tools for building, debugging, deploying, diagnosing, and managing multiplatform scalable apps and services; Fully managed build service that supports continuous integration and deployment.
References:
[AWS]:AWS CodePipeline(orchestrates workflow for continuous integration, continuous delivery, and continuous deployment), AWS CloudWatch (monitor your AWS resources and the applications you run on AWS in real time. ), AWS X-Ray (application performance management service that enables a developer to analyze and debug applications in aws), AWS CodeDeploy (automates code deployments to Elastic Compute Cloud (EC2) and on-premises servers. ), AWS CodeCommit ( source code storage and version-control service), AWS Developer Tools, AWS CodeBuild (continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy. ), AWS Command Line Interface (unified tool to manage your AWS services), AWS OpsWorks (Chef-based), AWS CloudFormation ( provides a common language for you to describe and provision all the infrastructure resources in your cloud environment.), Amazon CodeGuru (for automated code reviews and application performance recommendations)
[Azure]:Azure Monitor, Azure DevOps, Azure Developer Tools, Azure CLI Azure PowerShell, Azure Automation, Azure Resource Manager , VM extensions , Azure Automation
[Google]:DevOps Solutions (Infrastructure as code, Configuration management, Secrets management, Serverless computing, Continuous delivery, Continuous integration , Stackdriver (combines metrics, logs, and metadata from all of your cloud accounts and projects into a single comprehensive view of your environment)
Tags: #CloudWatch, #StackDriver, #AzureMonitor, #AWSXray, #AWSCodeDeploy, #AzureDevOps, #GoogleDevopsSolutions
Differences: CodeCommit eliminates the need to operate your own source control system or worry about scaling its infrastructure. Azure DevOps provides unlimited private Git hosting, cloud build for continuous integration, agile planning, and release management for continuous delivery to the cloud and on-premises. Includes broad IDE support.

SageMaker | Azure Machine Learning Studio

A collaborative, drag-and-drop tool to build, test, and deploy predictive analytics solutions on your data.

Alexa Skills Kit | Microsoft Bot Framework

Build and connect intelligent bots that interact with your users using text/SMS, Skype, Teams, Slack, Office 365 mail, Twitter, and other popular services.

Amazon Lex | Speech Services

API capable of converting speech to text, understanding intent, and converting text back to speech for natural responsiveness.

Amazon Lex | Language Understanding (LUIS)

Allows your applications to understand user commands contextually.

Amazon Polly, Amazon Transcribe | Azure Speech Services

Enables both Speech to Text, and Text into Speech capabilities.
The Speech Services are the unification of speech-to-text, text-to-speech, and speech-translation into a single Azure subscription. It’s easy to speech enable your applications, tools, and devices with the Speech SDK, Speech Devices SDK, or REST APIs.
Amazon Polly is a Text-to-Speech (TTS) service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. With dozens of lifelike voices across a variety of languages, you can select the ideal voice and build speech-enabled applications that work in many different countries.
Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications. Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech.

Amazon Rekognition | Cognitive Services

Computer Vision: Extract information from images to categorize and process visual data.
Amazon Rekognition is a simple and easy to use API that can quickly analyze any image or video file stored in Amazon S3. Amazon Rekognition is always learning from new data, and we are continually adding new labels and facial recognition features to the service.

Face: Detect, identy, and analyze faces in photos.

Emotions: Recognize emotions in images.

Alexa Skill Set | Azure Virtual Assistant

The Virtual Assistant Template brings together a number of best practices we’ve identified through the building of conversational experiences and automates integration of components that we’ve found to be highly beneficial to Bot Framework developers.

Big data and analytics

Data warehouse

AWS Redshift | SQL Data Warehouse

Cloud-based Enterprise Data Warehouse (EDW) that uses Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data.

Big data processing EMR | Azure Databricks
Apache Spark-based analytics platform.

EMR | HDInsight

Managed Hadoop service. Deploy and manage Hadoop clusters in Azure.

Data orchestration / ETL

AWS Data Pipeline, AWS Glue | Data Factory

Processes and moves data between different compute and storage services, as well as on-premises data sources at specified intervals. Create, schedule, orchestrate, and manage data pipelines.

AWS Glue | Data Catalog

A fully managed service that serves as a system of registration and system of discovery for enterprise data sources

Analytics and visualization

AWS Kinesis Analytics | Stream Analytics

Data Lake Analytics | Data Lake Store

Storage and analysis platforms that create insights from large quantities of data, or data that originates from many sources.

QuickSight | Power BI

Business intelligence tools that build visualizations, perform ad hoc analysis, and develop business insights from data.

CloudSearch | Azure Search

Delivers full-text search and related search analytics and capabilities.

Amazon Athena | Azure Data Lake Analytics

Provides a serverless interactive query service that uses standard SQL for analyzing databases.

Compute

Virtual servers

Elastic Compute Cloud (EC2) | Azure Virtual Machines

Virtual servers allow users to deploy, manage, and maintain OS and server software. Instance types provide combinations of CPU/RAM. Users pay for what they use with the flexibility to change sizes.

AWS Batch | Azure Batch

Run large-scale parallel and high-performance computing applications efficiently in the cloud.

AWS Auto Scaling | Virtual Machine Scale Sets

Allows you to automatically change the number of VM instances. You set defined metric and thresholds that determine if the platform adds or removes instances.

VMware Cloud on AWS | Azure VMware by CloudSimple

Redeploy and extend your VMware-based enterprise workloads to Azure with Azure VMware Solution by CloudSimple. Keep using the VMware tools you already know to manage workloads on Azure without disrupting network, security, or data protection policies.

Containers and container orchestrators

EC2 Container Service (ECS), Fargate | Azure Container Instances

Azure Container Instances is the fastest and simplest way to run a container in Azure, without having to provision any virtual machines or adopt a higher-level orchestration service.

EC2 Container Registry | Azure Container Registry

Allows customers to store Docker formatted images. Used to create all types of container deployments on Azure.

Elastic Container Service for Kubernetes (EKS) | Azure Kubernetes Service (AKS)

Deploy orchestrated containerized applications with Kubernetes. Simplify monitoring and cluster management through auto upgrades and a built-in operations console.

App Mesh | Service Fabric Mesh

Fully managed service that enables developers to deploy microservices applications without managing virtual machines, storage, or networking.
AWS App Mesh is a service mesh that provides application-level networking to make it easy for your services to communicate with each other across multiple types of compute infrastructure. App Mesh standardizes how your services communicate, giving you end-to-end visibility and ensuring high-availability for your applications.

Serverless

AWS Lambda | Azure Functions

Integrate systems and run backend processes in response to events or schedules without provisioning or managing servers.
AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a part of the Amazon Web Services. It is a computing service that runs code in response to events and automatically manages the computing resources required by that code

Database

Relational database

AWS RDS | SQL Database Azure Database for MySQL Azure Database for PostgreSQL

Managed relational database service where resiliency, scale, and maintenance are primarily handled by the platform.
Amazon Relational Database Service is a distributed relational database service by Amazon Web Services. It is a web service running “in the cloud” designed to simplify the setup, operation, and scaling of a relational database for use in applications. Administration processes like patching the database software, backing up databases and enabling point-in-time recovery are managed automatically. Scaling storage and compute resources can be performed by a single API call as AWS does not offer an ssh connection to RDS instances.

NoSQL / Document

DynamoDB and SimpleDB | Azure Cosmos DB

A globally distributed, multi-model database that natively supports multiple data models: key-value, documents, graphs, and columnar.

Caching

AWS ElastiCache | Azure Cache for Redis

An in-memory–based, distributed caching service that provides a high-performance store typically used to offload non transactional work from a database.
Amazon ElastiCache is a fully managed in-memory data store and cache service by Amazon Web Services. The service improves the performance of web applications by retrieving information from managed in-memory caches, instead of relying entirely on slower disk-based databases. ElastiCache supports two open-source in-memory caching engines: Memcached and Redis.

Database migration

AWS Database Migration Service | Azure Database Migration Service

Migration of database schema and data from one database format to a specific database technology in the cloud.
AWS Database Migration Service helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. The AWS Database Migration Service can migrate your data to and from most widely used commercial and open-source databases.

DevOps and application monitoring

AWS CloudWatch, AWS X-Ray | Azure Monitor

Comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments.
Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. CloudWatch provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing you with a unified view of AWS resources, applications, and services that run on AWS and on-premises servers.
AWS X-Ray is an application performance management service that enables a developer to analyze and debug applications in the Amazon Web Services (AWS) public cloud. A developer can use AWS X-Ray to visualize how a distributed application is performing during development or production, and across multiple AWS regions and accounts.

AWS CodeDeploy, AWS CodeCommit, AWS CodePipeline | Azure DevOps

A cloud service for collaborating on code development.
AWS CodeDeploy is a fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Fargate, AWS Lambda, and your on-premises servers. AWS CodeDeploy makes it easier for you to rapidly release new features, helps you avoid downtime during application deployment, and handles the complexity of updating your applications.
AWS CodePipeline is a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. CodePipeline automates the build, test, and deploy phases of your release process every time there is a code change, based on the release model you define.
AWS CodeCommit is a source code storage and version-control service for Amazon Web Services’ public cloud customers. CodeCommit was designed to help IT teams collaborate on software development, including continuous integration and application delivery.

AWS Developer Tools | Azure Developer Tools

Collection of tools for building, debugging, deploying, diagnosing, and managing multiplatform scalable apps and services.
The AWS Developer Tools are designed to help you build software like Amazon. They facilitate practices such as continuous delivery and infrastructure as code for serverless, containers, and Amazon EC2.

AWS CodeBuild | Azure DevOps

Fully managed build service that supports continuous integration and deployment.

AWS Command Line Interface | Azure CLI Azure PowerShell

Built on top of the native REST API across all cloud services, various programming language-specific wrappers provide easier ways to create solutions.
The AWS Command Line Interface (CLI) is a unified tool to manage your AWS services. With just one tool to download and configure, you can control multiple AWS services from the command line and automate them through scripts.

AWS OpsWorks (Chef-based) | Azure Automation

Configures and operates applications of all shapes and sizes, and provides templates to create and manage a collection of resources.
AWS OpsWorks is a configuration management service that provides managed instances of Chef and Puppet. Chef and Puppet are automation platforms that allow you to use code to automate the configurations of your servers.

AWS CloudFormation | Azure Resource Manager , VM extensions , Azure Automation

Provides a way for users to automate the manual, long-running, error-prone, and frequently repeated IT tasks.
AWS CloudFormation provides a common language for you to describe and provision all the infrastructure resources in your cloud environment. CloudFormation allows you to use a simple text file to model and provision, in an automated and secure manner, all the resources needed for your applications across all regions and accounts.

Networking

Area

Cloud virtual networking, Virtual Private Cloud (VPC) | Virtual Network

Provides an isolated, private environment in the cloud. Users have control over their virtual networking environment, including selection of their own IP address range, creation of subnets, and configuration of route tables and network gateways.

Cross-premises connectivity

AWS VPN Gateway | Azure VPN Gateway

Connects Azure virtual networks to other Azure virtual networks, or customer on-premises networks (Site To Site). Allows end users to connect to Azure services through VPN tunneling (Point To Site).

DNS management

AWS Route 53 | Azure DNS

Manage your DNS records using the same credentials and billing and support contract as your other Azure services

Route 53 | Traffic Manager

A service that hosts domain names, plus routes users to Internet applications, connects user requests to datacenters, manages traffic to apps, and improves app availability with automatic failover.

Dedicated network

AWS Direct Connect | ExpressRoute

Establishes a dedicated, private network connection from a location to the cloud provider (not over the Internet).

Load balancing

AWS Network Load Balancer | Azure Load Balancer

Azure Load Balancer load-balances traffic at layer 4 (TCP or UDP).

Application Load Balancer | Application Gateway

Application Gateway is a layer 7 load balancer. It supports SSL termination, cookie-based session affinity, and round robin for load-balancing traffic.

Internet of things (IoT)

AWS IoT | Azure IoT Hub

A cloud gateway for managing bidirectional communication with billions of IoT devices, securely and at scale.

AWS Greengrass | Azure IoT Edge

Deploy cloud intelligence directly on IoT devices to run in on-premises scenarios.

Kinesis Firehose, Kinesis Streams | Event Hubs

Services that allow the mass ingestion of small data inputs, typically from devices and sensors, to process and route the data.

AWS IoT Things Graph | Azure Digital Twins

Azure Digital Twins is an IoT service that helps you create comprehensive models of physical environments. Create spatial intelligence graphs to model the relationships and interactions between people, places, and devices. Query data from a physical space rather than disparate sensors.

Management

Trusted Advisor | Azure Advisor

Provides analysis of cloud resource configuration and security so subscribers can ensure they’re making use of best practices and optimum configurations.

AWS Usage and Billing Report | Azure Billing API

Services to help generate, monitor, forecast, and share billing data for resource usage by time, organization, or product resources.

AWS Management Console | Azure portal

A unified management console that simplifies building, deploying, and operating your cloud resources.

AWS Application Discovery Service | Azure Migrate

Assesses on-premises workloads for migration to Azure, performs performance-based sizing, and provides cost estimations.

Amazon EC2 Systems Manager | Azure Monitor

Comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments.

AWS Personal Health Dashboard | Azure Resource Health

Provides detailed information about the health of resources as well as recommended actions for maintaining resource health.

Security, identity, and access

Authentication and authorization

Identity and Access Management (IAM) | Azure Active Directory

Allows users to securely control access to services and resources while offering data security and protection. Create and manage users and groups, and use permissions to allow and deny access to resources.

Identity and Access Management (IAM) | Azure Role Based Access Control

Role-based access control (RBAC) helps you manage who has access to Azure resources, what they can do with those resources, and what areas they have access to.

AWS Organizations | Azure Subscription Management + Azure RBAC

Security policy and role management for working with multiple accounts.

Multi-Factor Authentication | Multi-Factor Authentication

Safeguard access to data and applications while meeting user demand for a simple sign-in process.

AWS Directory Service | Azure Active Directory Domain Services

Provides managed domain services such as domain join, group policy, LDAP, and Kerberos/NTLM authentication that are fully compatible with Windows Server Active Directory.

Cognito | Azure Active Directory B2C

A highly available, global, identity management service for consumer-facing applications that scales to hundreds of millions of identities.

AWS Organizations | Azure Policy

Azure Policy is a service in Azure that you use to create, assign, and manage policies. These policies enforce different rules and effects over your resources, so those resources stay compliant with your corporate standards and service level agreements.

AWS Organizations | Management Groups

Azure management groups provide a level of scope above subscriptions. You organize subscriptions into containers called “management groups” and apply your governance conditions to the management groups. All subscriptions within a management group automatically inherit the conditions applied to the management group. Management groups give you enterprise-grade management at a large scale, no matter what type of subscriptions you have.

Encryption

Server-side encryption with Amazon S3 Key Management Service | Azure Storage Service Encryption

Helps you protect and safeguard your data and meet your organizational security and compliance commitments.

Key Management Service AWS KMS, CloudHSM | Key Vault

Provides security solution and works with other services by providing a way to manage, create, and control encryption keys stored in hardware security modules (HSM).

Firewall

Web Application Firewall | Application Gateway – Web Application Firewall

A firewall that protects web applications from common web exploits.

Web Application Firewall | Azure Firewall

Provides inbound protection for non-HTTP/S protocols, outbound network-level protection for all ports and protocols, and application-level protection for outbound HTTP/S.

Security

Inspector | Security Center

An automated security assessment service that improves the security and compliance of applications. Automatically assess applications for vulnerabilities or deviations from best practices.

Certificate Manager | App Service Certificates available on the Portal

Service that allows customers to create, manage, and consume certificates seamlessly in the cloud.

GuardDuty | Azure Advanced Threat Protection

Detect and investigate advanced attacks on-premises and in the cloud.

AWS Artifact | Service Trust Portal

Provides access to audit reports, compliance guides, and trust documents from across cloud services.

AWS Shield | Azure DDos Protection Service

Provides cloud services with protection from distributed denial of services (DDoS) attacks.

Storage

Object storage

Simple Storage Services (S3) | Azure Blob storage

Object storage service, for use cases including cloud applications, content distribution, backup, archiving, disaster recovery, and big data analytics.

Virtual server disks

Elastic Block Store (EBS) | Azure managed disks

SSD storage optimized for I/O intensive read/write operations. For use as high-performance Azure virtual machine storage.

Shared files

Elastic File System | Azure Files

Provides a simple interface to create and configure file systems quickly, and share common files. Can be used with traditional protocols that access files over a network.

Archiving and backup

S3 Infrequent Access (IA) | Azure Storage cool tier

Cool storage is a lower-cost tier for storing data that is infrequently accessed and long-lived.

S3 Glacier | Azure Storage archive access tier

Archive storage has the lowest storage cost and higher data retrieval costs compared to hot and cool storage.

AWS Backup | Azure Backup

Back up and recover files and folders from the cloud, and provide offsite protection against data loss.

Hybrid storage

Storage Gateway | StorSimple

Integrates on-premises IT environments with cloud storage. Automates data management and storage, plus supports disaster recovery.

Bulk data transfer

AWS Import/Export Disk | Import/Export

A data transport solution that uses secure disks and appliances to transfer large amounts of data. Also offers data protection during transit.

AWS Import/Export Snowball, Snowball Edge, Snowmobile | Azure Data Box

Petabyte- to exabyte-scale data transport solution that uses secure data storage devices to transfer large amounts of data to and from Azure.

Web applications

Elastic Beanstalk | App Service

Managed hosting platform providing easy to use services for deploying and scaling web applications and services.

API Gateway | API Management

A turnkey solution for publishing APIs to external and internal consumers.

CloudFront | Azure Content Delivery Network

A global content delivery network that delivers audio, video, applications, images, and other files.

Global Accelerator | Azure Front Door

Easily join your distributed microservice architectures into a single global application using HTTP load balancing and path-based routing rules. Automate turning up new regions and scale-out with API-driven global actions, and independent fault-tolerance to your back end microservices in Azure—or anywhere.

Miscellaneous

Backend process logic

AWS Step Functions | Logic Apps

Cloud technology to build distributed applications using out-of-the-box connectors to reduce integration challenges. Connect apps, data and devices on-premises or in the cloud.

Enterprise application services

Amazon WorkMail, Amazon WorkDocs | Office 365

Fully integrated Cloud service providing communications, email, document management in the cloud and available on a wide variety of devices.

Gaming

GameLift, GameSparks | PlayFab

Managed services for hosting dedicated game servers.

Media transcoding

Elastic Transcoder | Media Services

Services that offer broadcast-quality video streaming services, including various transcoding technologies.

Workflow

Simple Workflow Service (SWF) | Logic Apps

Serverless technology for connecting apps, data and devices anywhere, whether on-premises or in the cloud for large ecosystems of SaaS and cloud-based connectors.

Hybrid

Outposts | Azure Stack

Azure Stack is a hybrid cloud platform that enables you to run Azure services in your company’s or service provider’s datacenter. As a developer, you can build apps on Azure Stack. You can then deploy them to either Azure Stack or Azure, or you can build truly hybrid apps that take advantage of connectivity between an Azure Stack cloud and Azure.

How does a business decide between Microsoft Azure or AWS?

Basically, it all comes down to what your organizational needs are and if there’s a particular area that’s especially important to your business (ex. serverless, or integration with Microsoft applications).

Some of the main things it comes down to is compute options, pricing, and purchasing options.

Here’s a brief comparison of the compute option features across cloud providers:

Here’s an example of a few instances’ costs (all are Linux OS):

Each provider offers a variety of options to lower costs from the listed On-Demand prices. These can fall under reservations, spot and preemptible instances and contracts.

Both AWS and Azure offer a way for customers to purchase compute capacity in advance in exchange for a discount: AWS Reserved Instances and Azure Reserved Virtual Machine Instances. There are a few interesting variations between the instances across the cloud providers which could affect which is more appealing to a business.

Another discounting mechanism is the idea of spot instances in AWS and low-priority VMs in Azure. These options allow users to purchase unused capacity for a steep discount.

With AWS and Azure, enterprise contracts are available. These are typically aimed at enterprise customers, and encourage large companies to commit to specific levels of usage and spend in exchange for an across-the-board discount – for example, AWS EDPs and Azure Enterprise Agreements.

You can read more about the differences between AWS and Azure to help decide which your business should use in this blog post

Source: AWS to Azure services comparison – Azure Architecture

Active Hydrating Toner, Anti-Aging Replenishing Advanced Face Moisturizer, with Vitamins A, C, E & Natural Botanicals to Promote Skin Balance & Collagen Production, 6.7 Fl Oz

Age Defying 0.3% Retinol Serum, Anti-Aging Dark Spot Remover for Face, Fine Lines & Wrinkle Pore Minimizer, with Vitamin E & Natural Botanicals

Firming Moisturizer, Advanced Hydrating Facial Replenishing Cream, with Hyaluronic Acid, Resveratrol & Natural Botanicals to Restore Skin's Strength, Radiance, and Resilience, 1.75 Oz

Skin Stem Cell Serum

Smartphone 101 - Pick a smartphone for me - android or iOS - Apple iPhone or Samsung Galaxy or Huawei or Xaomi or Google Pixel

Can AI Really Predict Lottery Results? We Asked an Expert.

Pass the 2023 AWS Cloud Practitioner CCP CLF-C02 Certification with flying colors

Ace the 2023 AWS Solutions Architect Associate SAA-C03 Exam with Confidence

Pass the 2023 AWS Certified Machine Learning Specialty MLS-C01 Exam with Flying Colors

Football/Soccer World Cup 2022 Guide and Past World Cups History and Quiz illustrated

Djamgatech

Read Photos and PDFs Aloud for me iOS
Read Photos and PDFs Aloud for me android
Read Photos and PDFs Aloud For me Windows 10/11
Read Photos and PDFs Aloud For Amazon

Get 20% off Google Workspace (Google Meet) Business Plan (AMERICAS): M9HNXHX3WC9H7YE (Email us for more)

Get 20% off Google Google Workspace (Google Meet) Standard Plan with the following codes: 96DRHDRA9J7GTN6(Email us for more)

FREE 10000+ Quiz Trivia and and Brain Teasers for All Topics including Cloud Computing, General Knowledge, History, Television, Music, Art, Science, Movies, Films, US History, Soccer Football, World Cup, Data Science, Machine Learning, Geography, etc....

taimienphi.vn

List of Freely available programming books - What is the single most influential book every Programmers should read

#BlackOwned #BlackEntrepreneurs #BlackBuniness #AWSCertified #AWSCloudPractitioner #AWSCertification #AWSCLFC02 #CloudComputing #AWSStudyGuide #AWSTraining #AWSCareer #AWSExamPrep #AWSCommunity #AWSEducation #AWSBasics #AWSCertified #AWSMachineLearning #AWSCertification #AWSSpecialty #MachineLearning #AWSStudyGuide #CloudComputing #DataScience #AWSCertified #AWSSolutionsArchitect #AWSArchitectAssociate #AWSCertification #AWSStudyGuide #CloudComputing #AWSArchitecture #AWSTraining #AWSCareer #AWSExamPrep #AWSCommunity #AWSEducation #AzureFundamentals #AZ900 #MicrosoftAzure #ITCertification #CertificationPrep #StudyMaterials #TechLearning #MicrosoftCertified #AzureCertification #TechBooks

Top 1000 Canada Quiz and trivia: CANADA CITIZENSHIP TEST- HISTORY - GEOGRAPHY - GOVERNMENT- CULTURE - PEOPLE - LANGUAGES - TRAVEL - WILDLIFE - HOCKEY - TOURISM - SCENERIES - ARTS - DATA VISUALIZATION

Top 1000 Africa Quiz and trivia: HISTORY - GEOGRAPHY - WILDLIFE - CULTURE - PEOPLE - LANGUAGES - TRAVEL - TOURISM - SCENERIES - ARTS - DATA VISUALIZATION

Exploring the Pros and Cons of Visiting All Provinces and Territories in Canada.

Exploring the Advantages and Disadvantages of Visiting All 50 States in the USA

Health Health, a science-based community to discuss health news and the coronavirus (COVID-19) pandemic

Circadian rhythms can influence drugs’ effectiveness
by /u/Sariel007 on April 24, 2024 at 10:33 pm
submitted by /u/Sariel007 [link] [comments]
Does eating too much sugar really make kids hyper? We asked researchers.
by /u/newzee1 on April 24, 2024 at 9:45 pm
submitted by /u/newzee1 [link] [comments]
Why dentists say you shouldn’t rinse after brushing
by /u/newzee1 on April 24, 2024 at 9:35 pm
submitted by /u/newzee1 [link] [comments]
Partnering for a Healthier Detroit: Gilbert Family Foundation & Henry Ford Health's Vision
by /u/sbgroup65 on April 24, 2024 at 7:50 pm
submitted by /u/sbgroup65 [link] [comments]
First patient receives combined pig kidney transplant, heart pump
by /u/VirtualNecromancer on April 24, 2024 at 6:45 pm
submitted by /u/VirtualNecromancer [link] [comments]

Today I Learned (TIL) You learn something new every day; what did you learn today? Submit interesting and specific facts about something that you just found out here.

TIL that John Rock, one of the creators of the contraceptive pill, was a devout Catholic
by /u/trashconverters on April 25, 2024 at 2:46 am
submitted by /u/trashconverters [link] [comments]
TIL that the Chicago area has more hot dog restaurants than McDonald's, Wendy's, and Burger King restaurants combined
by /u/Brix001 on April 25, 2024 at 1:32 am
submitted by /u/Brix001 [link] [comments]
TIL of the Glasgow effect, a term which refers to the lower life expectancy of residents of the Scottish city compared to the rest of the UK and Europe. Some hypotheses for this effect include stress, especially in childhood, leading to ill health; violent gang culture; and rate of premature births.
by /u/pukkapaddington on April 25, 2024 at 1:32 am
submitted by /u/pukkapaddington [link] [comments]
TIL of the mummy of Takabuti, a young ancient Egyptian woman who died from an axe blow to her back. A study of the proteins in her leg muscles allowed researchers to hypothesise that she had been running for some time before she was killed.
by /u/roughvandyke on April 24, 2024 at 11:55 pm
submitted by /u/roughvandyke [link] [comments]
TIL that it took Boeing less than 3 years from starting the 747 project to first flight. The first commercial flight occurred 11 months later.
by /u/getthedudesdanny on April 24, 2024 at 11:19 pm
submitted by /u/getthedudesdanny [link] [comments]

Reddit Science This community is a place to share and discuss new scientific research. Read about the latest advances in astronomy, biology, medicine, physics, social science, and more. Find and submit new publications and popular science coverage of current research.

James Webb Space Telescope takes its first images of forming planetary systems
by /u/UVicScience on April 25, 2024 at 1:05 am
submitted by /u/UVicScience [link] [comments]
Tiny rubber spheres used to make a programmable fluid: “We can [now] make hydraulic actuators soft and self-controlled. The fluid itself is doing all the control for us, so we don’t have to control the robot from the outside”
by /u/TurretLauncher on April 25, 2024 at 12:33 am
submitted by /u/TurretLauncher [link] [comments]
A sweetener used in cakes, soft drinks and chewing gum - neotame (sugar substitute E961), used to replace aspartame - can affect people’s health by weakening the gut wall as it damages intestinal bacteria, which may lead to irritable bowel syndrome or insulin resistance, a new study has found.
by /u/mvea on April 24, 2024 at 9:29 pm
submitted by /u/mvea [link] [comments]
A Florida man with migraines had a CT scan which showed that his brain was infested with tapeworm cysts. A new study hypothesised that he ate undercooked infected pork that contained tapeworm cysts, known as cysticercus, and re-infected himself with eggs passed in his faeces through poor hygiene.
by /u/mvea on April 24, 2024 at 9:13 pm
submitted by /u/mvea [link] [comments]
Early warning sign of extinction. Research found before an extinction pulse 34 million years ago, marine communities became highly specialized everywhere but in the southern high latitudes, implying that these micro-plankton migrated en masse to higher latitudes and away from the tropics
by /u/Wagamaga on April 24, 2024 at 9:04 pm
submitted by /u/Wagamaga [link] [comments]

Reddit Sports Sports News and Highlights from the NFL, NBA, NHL, MLB, MLS, and leagues around the world.

Heat catch fire from deep, steal Game 2 from Celtics
by /u/Oldtimer_2 on April 25, 2024 at 2:54 am
submitted by /u/Oldtimer_2 [link] [comments]
'Invest in these women': WNBA champion A'ja Wilson hopes Caitlin Clark's popularity helps the league grow
by /u/kundu123 on April 25, 2024 at 1:28 am
submitted by /u/kundu123 [link] [comments]
Tiger Woods, Rory McIlroy to receive massive loyalty payouts from PGA Tour after equity investment, per report
by /u/Oldtimer_2 on April 25, 2024 at 12:50 am
submitted by /u/Oldtimer_2 [link] [comments]
Hiltzik: The folly of public financing for stadiums - Los Angeles Times
by /u/davster39 on April 24, 2024 at 10:12 pm
submitted by /u/davster39 [link] [comments]
Chicago Bears give new stadium update, say they will provide over $2B for lakefront domed project
by /u/Oldtimer_2 on April 24, 2024 at 10:02 pm
submitted by /u/Oldtimer_2 [link] [comments]

Turn your dream into reality with Google Workspace: It’s free for the first 14 days.
Get 20% off Google Google Workspace (Google Meet) Standard Plan with the following codes:
96DRHDRA9J7GTN6
63F733CLLY7R7MM
63F7D7CPD9XXUVT
63FLKQHWV3AEEE6
63JGLWWK36CP7WM
63KKR9EULQRR7VE
63KNY4N7VHCUA9R
63LDXXFYU6VXDG9
63MGNRCKXURAYWC
63NGNDVVXJP4N99
63P4G3ELRPADKQU
With Google Workspace, Get custom email @yourcompany, Work from anywhere; Easily scale up or down
Google gives you the tools you need to run your business like a pro. Set up custom email, share files securely online, video chat from any device, and more.
Google Workspace provides a platform, a common ground, for all our internal teams and operations to collaboratively support our primary business goal, which is to deliver quality information to our readers quickly.
Get 20% off Google Workspace (Google Meet) Business Plan (AMERICAS): M9HNXHX3WC9H7YE
C37HCAQRVR7JTFK
C3AE76E7WATCTL9
C3C3RGUF9VW6LXE
C3D9LD4L736CALC
C3EQXV674DQ6PXP
C3G9M3JEHXM3XC7
C3GGR3H4TRHUD7L
C3LVUVC3LHKUEQK
C3PVGM4CHHPMWLE
C3QHQ763LWGTW4C
Even if you’re small, you want people to see you as a professional business. If you’re still growing, you need the building blocks to get you where you want to be. I’ve learned so much about business through Google Workspace—I can’t imagine working without it. (Email us for more codes)

AI Dashboard is available on the Web, Apple, Google, and Microsoft, PRO version

Top 100 AWS Certified Data Analytics Specialty Certification Questions and Answers Dumps

The AWS Certified Data Analytics – Specialty (DAS-C01) covers the following domains:

Below are the Top 100 AWS Certified Data Analytics – Specialty Questions and Answers Dumps and References –

Question1: What combination of services do you need for the following requirements: accelerate petabyte-scale data transfers, load streaming data, and the ability to create scalable, private connections. Select the correct answer order.

Question 10: You need to filter and transform incoming messages coming from a smart sensor you have connected with AWS. Once messages are received, you need to store them as time series data in DynamoDB. Which AWS service can you use?

Question 12: You currently have databases running on-site and in another data center off-site. What service allows you to consolidate to one database in Amazon?

Question 14: You have been hired as a consultant to provide a solution to integrate a client’s on-premises data center to AWS. The customer requires a 300 Mbps dedicated, private connection to their VPC. Which AWS tool do you need?

Question 16: A data engineer needs to create a dashboard to display social media trends during the last hour of a large company event. The dashboard needs to display the associated metrics with a latency of less than 1 minute.Which solution meets these requirements?

Question 18: You need to migrate data to AWS. It is estimated that the data transfer will take over a month via the current AWS Direct Connect connection your company has set up. Which AWS tool should you use?

Question 19: You currently have an on-premises Oracle database and have decided to leverage AWS and use Aurora. You need to do this as quickly as possible. How do you achieve this?

Question 22: A data lake is a central repository that enables which operation?

Question 23: What is the most cost-effective storage option for your data lake?

Question 24: Which services are used in the processing layer of a data lake architecture? (SELECT TWO)

Question 25: Which services can be used for data ingestion into your data lake? (SELECT TWO)

Question 26: Which service uses continuous data replication with high availability to consolidate databases into a petabyte-scale data warehouse by streaming data to amazon Redshift and Amazon S3?

Question 27: What is the AWS Glue Data Catalog?

Questions 28: What AWS Glue feature “catalogs” your data?

Question 29: During your data preparation stage, the raw data has been enriched to support additional insights. You need to improve query performance and reduce costs of the final analytics solution.

Which data formats meet these requirements (SELECT TWO)

Question 30: In which scenario would you use AWS Glue jobs?

Question 31: Your data resides in multiple data stores, including Amazon S3, Amazon RDS, and Amazon DynamoDB. You need to efficiently query the combined datasets.

Which tool can achieve this, using a single query, without moving data?

Question 32: Which benefit do you achieve by using AWS Lake Formation to build data lakes?

Question 33: What are the three stages to set up a data lake using AWS Lake Formation? (SELECT THREE)

Question 35: A digital media customer needs to quickly build a data lake solution for the data housed in a PostgreSQL database. As a solutions architect, what service and feature would meet this requirement?

Question 36: AWS Lake Formation has a set of suggested personas and IAM permissions. Which is a required persona?

Question 37: Which three types of blueprints does AWS Lake Formation support? (SELECT THREE)

Question 38: Which one of the following is the best description of the capabilities of Amazon QuickSight?

Question 39: Which benefits are provided by Amazon Redshift? (Select TWO)

Djamga Data Sciences Big Data – Data Analytics Youtube Playlist

Big Data – Data Analytics Jobs:

Big Data – Data Analytics – Data Sciences Latest News:

DATA ANALYTICS Q&A:

What Is a Data Scientist?

Do All Data Scientists Hold Graduate Degrees?

How Are Data Scientists Different From Data Analysts?

How Is Data Science Used in Medicine?

What are data scientists paid?

What is the workday like for a data scientist?

How do I become a Data Scientist?

What Should I Learn? What Order Do I Learn Them?

How Do I Learn Python?

How Do I Learn R?

How Do I Learn SQL?

Curated Threads & Resources

How Do I Learn Calculus?

Curated Threads & Resources about Data Science and Data Analytics

How Do I Learn Probability?

How Do I Learn Linear Algebra?

Data visualization example: Crypto race: DOGE (red) versus BTC (blue), 5/6/2020 – 5/5/2021

Basic Data Lake Architecture

Data Analytics Architecture on AWS

Data Analytics Process

Data Lake Storage:

Event Driven Data Analytics Workflow on AWS

What is a Data Lake?

What is a Data Warehouse?

What are benefits of a data warehouse?

Data Lake vs Data Warehouse – Comparison

Transactional Data Ingestion

Structured Query Language (SQL)

Comparison of OLTP and OLAP

What is Amazon Macie?

What is AWS Glue DataBrew?

What is AWS Glue?

What is Amazon Athena?

What is AWS Lake Formation?

What is Amazon Quicksight?

What is SPICE?

Serverless data lake reference architecture:

What are Data Lakes Best Practices?

References:

AI Dashboard is available on the Web, Apple, Google, and Microsoft, PRO version

Get 20% off Google Workspace (Google Meet) Business Plan (AMERICAS): M9HNXHX3WC9H7YE (Email us for more)

Get 20% off Google Google Workspace (Google Meet) Standard Plan with the following codes: 96DRHDRA9J7GTN6(Email us for more)

List of Freely available programming books - What is the single most influential book every Programmers should read

Health Health, a science-based community to discuss health news and the coronavirus (COVID-19) pandemic

Today I Learned (TIL) You learn something new every day; what did you learn today? Submit interesting and specific facts about something that you just found out here.

Reddit Science This community is a place to share and discuss new scientific research. Read about the latest advances in astronomy, biology, medicine, physics, social science, and more. Find and submit new publications and popular science coverage of current research.

Question 16: A data engineer needs to create a dashboard to display social media trends during the last hour of a large company event. The dashboard needs to display the associated metrics with a latency of less than 1 minute.
Which solution meets these requirements?