Getting Started with Big Data Analytics in the Cloud

Learning big data is a big topic with a big commitment. You have probably heard the hype. Big data is useful in today's environment, where we collect a lot of data for our businesses. The five V principles of big data (volume, velocity, variety, veracity, and value) help us understand why big data is so promising and special. In this article, we will show you some curated material that can help you get started with big data.

Step 1: Understand the Big Data Architecture

Big data is not only about the size of the data; it is also about the technology used to process and analyze that data. Therefore, the first step in learning big data is to understand the big data architecture. In today's business, we combine big data with machine learning, streaming data, and of course the cloud. You can see the big data architecture here:

The components of big data can be aligned with two common architectures:

- Lambda Architecture
- Kappa Architecture

You can learn more about the two architectures in the article here.

Step 2: Understand How Big Data Processes the Data

Based on the diagram, we can see that there are two ways to process big data:

- Real-time processing. This is the recommended way for data that arrives with great velocity, such as tweets, IoT data, and other unstructured data. You can learn more here.
- Batch processing. This is the recommended way for data that can be collected first and processed later in large chunks. You can learn more here.

Step 3: Understand Where to Put the Big Data

When planning to deploy a big data solution, you have to decide where the workload will run.
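Wherever you deploy it, the real-time path from Step 2 usually boils down to windowed aggregation over a stream of events. Here is a toy, engine-agnostic sketch in plain Python of a tumbling (fixed, non-overlapping) window, the model used by engines such as Azure Stream Analytics or Spark Streaming; the timestamps and payloads are invented for illustration:

```python
# Toy tumbling-window aggregation: count events per fixed time window.
# This only illustrates the idea; real streaming engines do this over
# unbounded, distributed streams.
def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, payload) events into fixed, non-overlapping
    windows of window_seconds and count the events in each window."""
    counts = {}
    for ts, _payload in events:
        window_start = ts - (ts % window_seconds)  # window the event falls in
        counts[window_start] = counts.get(window_start, 0) + 1
    return counts

# Invented IoT-style readings: (seconds, payload).
readings = [(0, "temp=20"), (3, "temp=21"), (7, "temp=19"), (12, "temp=22")]
print(tumbling_window_counts(readings, 5))  # {0: 2, 5: 1, 10: 1}
```

The same per-window grouping idea scales up; the engine's job is to do it continuously, fault-tolerantly, and in parallel.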
You will have three ways to put big data to work:

- Cloud: the recommended way for typical situations, since you do not need to prepare everything yourself, such as scaling, configuration expertise, or hardware investment.
- On-premises: the recommended way when you have a requirement that the data must be processed locally, for example because of confidentiality, government regulation, or a disconnected environment.
- Hybrid: a combination of cloud and on-premises. You can use this when your business requirements are complex enough that neither the cloud nor on-premises alone can solve them.

Step 4: Choosing the Technology

This is a huge list, and you can learn the full list here. For simplicity, I will only write down the most used technologies based on my experience:

- Analytical data store: Azure Synapse Analytics (all-in), Azure SQL (tabular), Azure Data Explorer (time series), Hive on HDInsight (in-memory), Cosmos DB (non-tabular)
- Analytics: Power BI, Jupyter Notebooks, Excel (limited)
- Batch processing: Azure Synapse, HDInsight (Hadoop), Azure Data Lake Analytics, Azure Databricks
- Stream processing: Azure Stream Analytics, HDInsight with Spark Streaming, Apache Spark in Azure Databricks, HDInsight with Storm
- Search: Azure Cognitive Search, Elasticsearch, HDInsight with Solr

I hope this guide helps you get started with your big data learning. Enjoy your weekend!
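As a postscript: the Lambda Architecture from Step 1 is essentially the two processing modes from Step 2 glued together, with a batch layer that recomputes from the full dataset, a speed layer that keeps an incremental view of recent events, and a serving layer that merges both at query time. A minimal toy sketch in plain Python, with an invented event shape and no real Azure API:

```python
# Toy Lambda Architecture: batch layer + speed layer + merged queries.
from collections import Counter

master_dataset = []     # batch layer input: immutable, append-only event log
speed_view = Counter()  # speed layer: cheap incremental view of recent events

def ingest(event):
    """New events feed both layers, as in the Lambda Architecture."""
    master_dataset.append(event)
    speed_view[event["user"]] += 1

def run_batch():
    """Batch layer: periodically recompute the view from the full log."""
    batch_view = Counter(e["user"] for e in master_dataset)
    speed_view.clear()  # events so far are now covered by the batch view
    return batch_view

def query(batch_view, user):
    """Serving layer: merge the batch view with the speed view."""
    return batch_view[user] + speed_view[user]

ingest({"user": "ana"})
ingest({"user": "ana"})
batch_view = run_batch()   # e.g. a nightly batch job
ingest({"user": "ana"})    # arrives after the batch run
print(query(batch_view, "ana"))  # prints 3: batch sees 2, speed layer adds 1
```

The Kappa Architecture simplifies this by dropping the batch layer and replaying the event log through the streaming path instead.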

Azure Data Fundamentals

The Fun Fact About Data

When we build anything, we use data. From structured data to unstructured and semi-structured data, we store data so that we can retrieve it later as information and knowledge. Regardless of how the data is used, we know that the data in our lives keeps growing, and when we can no longer store it locally, the cloud is the answer. The question is how we store and manage data in the cloud. This article discusses how we store and analyze data in the cloud era. You can read about the data concepts here.

The Data Store

You can store data in two types: relational or non-relational.

- With non-relational data you have Azure Cosmos DB, File storage, Blob storage, and many more. You can learn more here.
- With relational data you have the power of Azure SQL, as well as MySQL, MariaDB, and other databases. You can learn more here.

If you need high-volume transactions without tight relationships between the data, then non-relational storage is for you. However, for smaller data with tight relationships, you need a relational database such as SQL Server. You can learn more about the considerations here.

The Data Analytics

After the data is stored, you can analyze it in more useful ways. This step is known as analytics. Microsoft has several products for this:

- Azure Data Factory takes any data and converts it into the format that you need. The ETL process largely happens in Azure Data Factory.
- Azure Data Lake stores raw data so that it is ready to retrieve as fast as possible. Azure Data Lake is the main storage for Azure Data Factory.
- Azure Databricks is a tool that provides big data processing, streaming, and machine learning. It can use the data lake as a data source.
- Azure Synapse Analytics is an analytics engine designed to process large amounts of data very quickly. It supports two computational models: SQL pools and Spark pools.
- Azure Analysis Services enables you to build tabular models to support online analytical processing (OLAP) queries. You can combine data from multiple sources, such as the data lake, Cosmos DB, and of course Azure SQL.
- Azure HDInsight is a big data processing tool based on the well-known Hadoop platform.

You can learn more about analytics here. After you have the analytics, you can pull the results into a dashboard or report by using Power BI.
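To make the ETL idea behind Azure Data Factory concrete, here is a stand-alone sketch of extract, transform, and load in plain Python. The CSV sample and the cleaning rules are invented for illustration only; Data Factory orchestrates the same pattern at scale through pipelines, not through code like this:

```python
# Toy ETL pipeline: extract rows from a source, clean them, load them
# into a destination ("sink"). The data below is made up.
import csv
import io

raw_csv = "name,revenue\nContoso, 100\nFabrikam,\nTailwind,250\n"

def extract(source):
    """Extract: read raw records from the source system."""
    return list(csv.DictReader(io.StringIO(source)))

def transform(rows):
    """Transform: strip whitespace and drop rows with missing revenue."""
    return [
        {"name": r["name"].strip(), "revenue": int(r["revenue"])}
        for r in rows
        if r["revenue"].strip()
    ]

def load(rows, sink):
    """Load: write the cleaned rows into the destination store."""
    sink.extend(rows)
    return sink

warehouse = load(transform(extract(raw_csv)), [])
print(warehouse)  # Fabrikam is dropped; the other rows are cleaned
```

In a real pipeline the extract step reads from blobs, databases, or APIs, and the load step targets a store such as the data lake or Azure Synapse; the shape of the work is the same.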


About @ridife

This blog is dedicated to integrating knowledge between academia and industry needs in Software Engineering, DevOps, Cloud Computing, and the Microsoft 365 platform. Enjoy this blog, and let's get in touch on any social media.

