Getting Started with Big Data Analytics in the Cloud

Learning big data is a big topic that demands a big commitment. You have probably heard the hype around big data. Big data is useful in today's business environment, where organizations collect large amounts of data. The five Vs of big data (volume, velocity, variety, veracity, and value) help explain why big data is so promising and special. In this article, we show you some curated material that can help you get started with big data.

Step 1: Understand the Big Data Architecture

Big data is not only about the size of the data; it is also about the technology used to process and analyze that data. Therefore, the first step in learning big data is to understand the big data architecture. In today's businesses, big data is combined with machine learning, streaming data, and, of course, the cloud. You can see the big data architecture here:

The components of a big data solution can be aligned with two common architectures:

  • Lambda Architecture
  • Kappa Architecture

You can learn more about the two architectures in the article here.
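To make the difference between the two architectures concrete, here is a minimal Python sketch under simplifying assumptions: events are hypothetical (device_id, reading) pairs, and every layer just sums readings per device. In the Lambda architecture, a batch layer and a real-time (speed) layer compute separate views that a serving layer merges at query time. In the Kappa architecture, a single streaming path maintains state, and reprocessing means replaying the event log. The names `aggregate`, `lambda_query`, `KappaProcessor`, and `kappa_query` are illustrative only and are not part of any Azure service.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event shape: (device_id, reading), e.g. ("sensor-1", 3).
Event = Tuple[str, int]


def aggregate(events: Iterable[Event]) -> Counter:
    """Sum readings per device; Lambda uses this in both the batch and speed layers."""
    totals: Counter = Counter()
    for device_id, reading in events:
        totals[device_id] += reading
    return totals


def lambda_query(batch_view: Counter, speed_view: Counter) -> Counter:
    """Lambda serving layer: merge the batch view with the real-time (speed) view."""
    merged = Counter(batch_view)
    merged.update(speed_view)  # update() adds counts without dropping zero/negative totals
    return merged


class KappaProcessor:
    """Kappa: a single streaming path that keeps running state per device."""

    def __init__(self) -> None:
        self.state: Counter = Counter()

    def process(self, event: Event) -> None:
        device_id, reading = event
        self.state[device_id] += reading


def kappa_query(event_log: Iterable[Event]) -> Counter:
    """Reprocessing in Kappa means replaying the append-only log from the start."""
    processor = KappaProcessor()
    for event in event_log:
        processor.process(event)
    return processor.state


if __name__ == "__main__":
    historical = [("sensor-1", 3), ("sensor-2", 5)]  # already covered by the last batch run
    recent = [("sensor-1", 2)]                       # arrived after the last batch run
    print(lambda_query(aggregate(historical), aggregate(recent)))
    print(kappa_query(historical + recent))
```

Both queries return the same totals here. The Lambda version needs the same aggregation logic in two places (batch and speed), which is the maintenance cost the Kappa architecture tries to avoid by keeping a single streaming path.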

Step 2: Understand How Big Data Is Processed

Based on the diagram, there are two ways to process big data:

  • Real-time processing. This is the recommended approach for high-velocity data such as tweets, IoT data, and other unstructured data. You can learn more here.
  • Batch processing. This is the recommended approach for large volumes of stored data that do not need to be processed immediately and can instead be processed on a schedule. You can learn more here.
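The difference between the two approaches can be sketched in Python, assuming hypothetical (timestamp_in_seconds, value) events. Batch processing works over data that is already stored, while real-time processing emits results incrementally as events arrive. In this sketch, each real-time result covers a fixed, non-overlapping (tumbling) window, similar in spirit to the `TumblingWindow` function in Azure Stream Analytics. The function names are illustrative, and the streaming version assumes events arrive in timestamp order.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical event shape: (timestamp_in_seconds, value), e.g. tweet counts or IoT readings.
Event = Tuple[int, int]


def batch_total(stored_events: List[Event]) -> int:
    """Batch processing: the whole dataset is at rest, so process it in one pass."""
    return sum(value for _, value in stored_events)


def tumbling_window_totals(
    stream: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[int, int]]:
    """Real-time processing: yield (window_start, total) for each tumbling window.

    Assumes events arrive in timestamp order; a window is emitted as soon as
    the first event belonging to a later window is seen.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    current_window: Optional[int] = None
    total = 0
    for timestamp, value in stream:
        window_start = timestamp - timestamp % window_seconds
        if current_window is not None and window_start != current_window:
            yield current_window, total  # the previous window is complete
            total = 0
        current_window = window_start
        total += value
    if current_window is not None:
        yield current_window, total  # flush the last (possibly partial) window
```

For example, the events `[(0, 1), (3, 2), (10, 5), (12, 1), (25, 4)]` with 10-second windows yield one total per window, and those totals add up to the single batch total. The streaming version trades a complete answer for an early one: it reports each window while later data is still arriving.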

Step 3: Understand Where to Run Big Data Workloads

When you plan to deploy a big data solution, you have three options for where the workload runs:

  • Cloud: This is the recommended option for typical situations, since you do not need to handle scaling, configuration expertise, or hardware investment yourself.
  • On-premises: This is the recommended option when the data must be processed locally, for example because of confidentiality, government requirements, or a disconnected environment.
  • Hybrid: This combines the cloud and on-premises options. Use it when your business requirements are complex enough that neither the cloud nor on-premises alone can satisfy them.
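The three options above can be read as a simple decision rule. The sketch below is a hypothetical helper, not a sizing or compliance tool: each `Workload` only records whether its data must be processed locally, and the rule picks cloud when nothing must stay local, on-premises when everything must, and hybrid for a mix. The `Workload` class and `choose_deployment` function are illustrative names introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    """Hypothetical description of a single big data workload."""
    name: str
    # True when the data must be processed locally, e.g. for confidentiality,
    # government requirements, or a site without reliable connectivity.
    must_stay_local: bool


def choose_deployment(workloads: List[Workload]) -> str:
    """Map the Step 3 criteria to "cloud", "on-premises", or "hybrid"."""
    if not workloads:
        raise ValueError("describe at least one workload")
    local = [w for w in workloads if w.must_stay_local]
    if not local:
        return "cloud"        # nothing forces local processing
    if len(local) == len(workloads):
        return "on-premises"  # every workload must be processed locally
    return "hybrid"           # some workloads must stay local, others can use the cloud


if __name__ == "__main__":
    print(choose_deployment([
        Workload("clickstream", must_stay_local=False),
        Workload("patient-records", must_stay_local=True),
    ]))
```

Real decisions weigh more factors (cost, latency, existing hardware), so treat this only as a way to make the criteria explicit.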

Step 4: Choose the Technology

The full list of technologies is huge, and you can explore it here. To keep things simple, I list only the technologies I use most often, based on my experience, grouped by big data process.

Technologies you can choose for each big data process:

Analytical data store
  • Azure Synapse Analytics (all-in-one)
  • Azure SQL (tabular)
  • Azure Data Explorer (time series)
  • Hive on HDInsight (in-memory)
  • Azure Cosmos DB (non-tabular)

Analytics
  • Power BI
  • Jupyter Notebooks
  • Excel (limited)

Batch processing
  • Azure Synapse
  • HDInsight (Hadoop)
  • Azure Data Lake Analytics
  • Azure Databricks

Stream processing
  • Azure Stream Analytics
  • HDInsight with Spark Streaming
  • Apache Spark in Azure Databricks
  • HDInsight with Storm

Search
  • Azure Cognitive Search
  • Elasticsearch
  • HDInsight with Solr
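The data-shape annotations on the analytical data store options listed above (tabular, time series, in-memory, non-tabular) can be turned into a small lookup. This is a hypothetical helper that only encodes that list; the `STORE_BY_SHAPE` keys and the fallback to Azure Synapse Analytics for unlisted shapes are my assumptions, based on its "all-in" label, and not an official Microsoft recommendation.

```python
from typing import Dict

# Hypothetical mapping from data shape to the analytical data store annotated
# with that shape in the technology list above.
STORE_BY_SHAPE: Dict[str, str] = {
    "tabular": "Azure SQL",
    "time series": "Azure Data Explorer",
    "in-memory": "Hive on HDInsight",
    "non-tabular": "Azure Cosmos DB",
}

# Assumed fallback for shapes the list does not annotate.
DEFAULT_STORE = "Azure Synapse Analytics"


def suggest_store(shape: str) -> str:
    """Return the store annotated for a data shape (case and whitespace insensitive)."""
    return STORE_BY_SHAPE.get(shape.strip().lower(), DEFAULT_STORE)


if __name__ == "__main__":
    for shape in ["Time Series", "non-tabular", "graph"]:
        print(f"{shape!r} -> {suggest_store(shape)}")
```

A lookup like this is only a starting point; in practice, query patterns, data volume, and cost also influence which store fits best.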

 

I hope this guide helps you get started with learning big data. Enjoy your weekend!

About @ridife

This blog is dedicated to bridging academic knowledge and industry needs in software engineering, DevOps, cloud computing, and the Microsoft 365 platform. Enjoy the blog, and let's get in touch on any social media.
