AZ-900 Certification Notes
Chapter 9.2 - Big Data
Big Data - Moving Definition
- Data from Millions of Devices
- The definition of how big "Big Data" is changes as the industry becomes able to process more and more data
Business Value
Big Data = Better service, better products, more profits
Data Lake Analytics
- Large Amounts of Data
- A data lake is a very large body of data
- Parallel Processing
- Two or more processes or computers processing the same data at the same time. Data Lake Analytics includes parallel processing
- Ready to Go
- Servers, processes and any other needed services are ready to go from the start. Jump straight into the data analytics
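The parallel processing idea above (split the data, work on the pieces at the same time, combine the results) can be sketched in plain Python. This is only an illustration, not how Data Lake Analytics is implemented: the function names and the sample readings are made up, and threads stand in for the separate processes or computers a real service would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_count):
    """Split values into chunk_count nearly equal contiguous slices."""
    size, remainder = divmod(len(values), chunk_count)
    chunks, start = [], 0
    for i in range(chunk_count):
        # The first `remainder` chunks each take one extra item.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def parallel_sum(values, workers=4):
    """Sum each chunk on its own worker, then combine the partial sums."""
    chunks = split_into_chunks(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker sums one chunk; results come back in chunk order.
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


readings = list(range(1, 1001))  # hypothetical device readings
total = parallel_sum(readings)
```

In CPython, threads do not speed up CPU-bound work, so the sketch shows the split/combine shape rather than a real performance gain.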
HDInsight
- Similar to Azure Data Lake Analytics
- Built on open-source frameworks, which are free and community supported
- Includes Apache Hadoop, Spark, and Kafka
Azure Databricks
- Based on Apache Spark, a distributed cluster-computing framework
- Run and process a dataset on many computers simultaneously
- Databricks provides all the computing power
- Integrates with Azure storage services
Azure Synapse Analytics
- Azure's data warehouse offering
- Formerly Azure SQL Data Warehouse
- Used for reporting and data analysis
- Only limited by your scope
- Use Synapse SQL to query and manipulate the data
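Reporting against a data warehouse usually comes down to aggregate SQL queries. The sketch below uses Python's built-in sqlite3 module as a local stand-in, not Synapse itself. The sales table, its columns, and the rows are hypothetical. The same GROUP BY pattern is what a reporting query would typically look like.

```python
import sqlite3

# In-memory database as a stand-in for a warehouse (assumption, not Synapse).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],  # made-up rows
)

# Report: total sales per region, sorted by region name.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```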
Outcomes
- Speed
- Processing large amounts of data quickly and efficiently provides real value
- Cost Reduction
- Save large amounts of money on storage and processing by using a Big Data solution in the cloud
- Better Decision Making
- Immediate data processing and analysis in-memory means you can make better decisions, and make them faster
- New Products and Services
- Understand what customers want and provide them with much better products and services