Learning big data is a big topic that takes a big commitment. You have probably heard the hype around big data. Big data is useful in today's environment, where businesses collect large amounts of data. The five V's of big data (volume, velocity, variety, veracity, and value) help explain why big data is so promising and special. In this article, we point you to some curated material that can help you get started with big data.
Step 1 Understand the Big Data Architecture
Big data is not only about the size of the data. It is also about the technology used to process and analyze that data. Therefore, the first step in learning big data is to understand the big data architecture. In today's businesses, big data is combined with machine learning, streaming data, and of course the cloud. You can see the big data architecture here:
The components of a big data solution can be aligned with two common architectures:
- Lambda Architecture
- Kappa Architecture
You can learn more about the two architectures in the article here.
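To make the difference between the two architectures more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not a reference implementation: the event data and the names `count_clicks`, `lambda_view`, and `kappa_view` are hypothetical. In the Lambda style, a batch layer periodically recomputes a view over older data, a speed layer covers the events that arrived since the last batch run, and a serving layer merges both. In the Kappa style, there is a single streaming path, and the view is rebuilt by replaying the event log.

```python
from collections import Counter

# Hypothetical event log: (user, clicks) pairs in arrival order.
EVENTS = [("ann", 1), ("bob", 2), ("ann", 3), ("cat", 1), ("bob", 1)]


def count_clicks(events):
    """Aggregate clicks per user for a given slice of events."""
    totals = Counter()
    for user, clicks in events:
        totals[user] += clicks
    return totals


def lambda_view(events, batch_cutoff):
    """Lambda style: merge a batch view with a speed view."""
    batch_view = count_clicks(events[:batch_cutoff])  # recomputed periodically
    speed_view = count_clicks(events[batch_cutoff:])  # covers recent events
    return batch_view + speed_view                    # serving layer merges both


def kappa_view(events):
    """Kappa style: one streaming path; replay the log to rebuild the view."""
    totals = Counter()
    for user, clicks in events:  # each event is handled as it arrives
        totals[user] += clicks
    return totals


if __name__ == "__main__":
    print(dict(lambda_view(EVENTS, batch_cutoff=3)))
    print(dict(kappa_view(EVENTS)))
```

Both functions produce the same totals here. The difference is operational: the Lambda style maintains two code paths (batch and speed), while the Kappa style maintains only one and relies on being able to replay the log.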
Step 2 Understand How Big Data Is Processed
Based on the diagram, we can see that there are two ways to process big data.
- Real-time processing. This is the recommended way for data that arrives at high velocity, such as tweets, IoT data, and other unstructured data. You can learn more here.
- Batch data processing. This is the recommended way for data that has already been collected and stored and can be processed in bulk. You can learn more here.
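The two processing styles above can be sketched in a few lines of Python. This is a simplified illustration, assuming a hypothetical list of sensor readings: the batch function sees the whole stored data set at once, while the streaming function updates its result as each reading arrives. The function names are made up for this example and do not come from any particular big data product.

```python
from typing import Iterable, Iterator, Sequence


def batch_average(readings: Sequence[float]) -> float:
    """Batch: all data is already stored, so process it in one pass."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Real time: emit an updated average as each reading arrives."""
    count = 0
    mean = 0.0
    for value in readings:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield mean


if __name__ == "__main__":
    readings = [20.0, 22.0, 24.0]  # hypothetical IoT temperature readings
    print(batch_average(readings))            # one result after all data is in
    print(list(streaming_average(readings)))  # one result per incoming reading
```

The streaming version never needs the full data set in memory, which is why this style suits high-velocity sources.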
Step 3 Understand Where to Put the Big Data
When planning to deploy a big data solution, you have three options for where to run it:
- Cloud: this is the recommended way for typical situations, since you do not need to take care of things such as scaling, configuration expertise, or hardware investment yourself.
- On-premises: this is the recommended way when the data must be processed locally, for example because of confidentiality, government requirements, or a disconnected environment.
- Hybrid: this is a combination of cloud and on-premises. You can use this if your business requirements are too complex to be solved by cloud or on-premises alone.
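As a rough aid, the three options above can be written as a small decision helper. This is only a sketch of the rules of thumb in the list, not a complete decision framework. The function name and both parameters are hypothetical, and treating "needs local processing and also needs cloud services" as the trigger for hybrid is an assumption made for this example.

```python
def choose_deployment(must_process_locally: bool, needs_cloud_services: bool) -> str:
    """Return a deployment option based on two simplified yes/no questions."""
    if must_process_locally and needs_cloud_services:
        # Assumption: neither option alone covers the requirements.
        return "hybrid"
    if must_process_locally:
        # Confidentiality, government rules, or a disconnected environment.
        return "on-premises"
    # Typical case: no need to manage scaling or hardware yourself.
    return "cloud"


if __name__ == "__main__":
    print(choose_deployment(must_process_locally=False, needs_cloud_services=True))
    print(choose_deployment(must_process_locally=True, needs_cloud_services=False))
    print(choose_deployment(must_process_locally=True, needs_cloud_services=True))
```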
Step 4 Choose the Technology
This is a huge list, and you can learn more about it from here. To keep it simple, I list only the most-used technologies, based on my experience.
| Big Data Process | Technology that you can choose |
| --- | --- |
| Analytical data store | Azure Synapse Analytics (all-in-one), Azure SQL (tabular), Azure Data Explorer (time series), Hive on HDInsight (in memory), Azure Cosmos DB (non-tabular) |
| Analytics | Power BI, Jupyter Notebooks, Excel (limited) |
| Batch processing | Azure Synapse, HDInsight (Hadoop), Azure Data Lake Analytics, Azure Databricks |
| Stream processing | Azure Stream Analytics, HDInsight with Spark Streaming, Apache Spark in Azure Databricks, HDInsight with Storm |
| Search the data | Azure Cognitive Search, Elasticsearch, HDInsight with Solr |
I hope this guide helps you get started with learning big data. Enjoy your weekend!