Big Data & Data Analytics

Saumit Patki, Last updated: 10 July 2018


As we go about our daily routines with the help of telecommunication, we knowingly or unknowingly share our data with many organisations, known and unknown. Have we ever asked why these organisations want our phone numbers, dates of birth, bank details, favourite colours, vacation spots, sports, political inclinations and so on? This information reveals our behavioural patterns to the organisation: our likes and dislikes, tastes, perceptions and much more.

Collected from different parts of the world, such information adds up to an enormous volume of data.

So, what is all this fuss about Big Data? Big data is a term that describes large volumes of data, both structured and unstructured.

However, the amount of data is not what matters most; what organizations do with the data is.

What Is Data Analytics?

Big data analytics is the process of examining large and varied data sets - i.e., big data - to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information that can help organizations make more-informed business decisions. These data are characterised by the high speed at which they are generated, the huge volume in which they are generated, the variety of events they capture and the veracity of those events.

Types of Data Analytics

  1. Descriptive: Explains what happened in the past, usually by presenting the data in graphical form. It does not explain why the event happened.
  2. Diagnostic: Explains why a given past event occurred.
  3. Predictive: Helps to anticipate what is likely to happen in future, based on historical data. This type is useful to all types of organisation.
  4. Prescriptive: The system analyses the pattern, predicts the future and prescribes a suitable solution.
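
The four types can be illustrated on a toy data set. The sketch below is only an illustration under stated assumptions: the monthly sales figures, the promotion flags, the stock level and all function names are hypothetical, and the straight-line forecast stands in for whatever model a real system would use.

```python
from statistics import mean

# Hypothetical data: units sold per month and whether a promotion ran.
sales = [100, 120, 110, 150, 130, 170]
promo = [False, True, False, True, False, True]


def describe(values):
    """Descriptive: summarise what happened."""
    return {"total": sum(values), "average": mean(values)}


def diagnose(values, flags):
    """Diagnostic: compare average sales with and without a promotion."""
    with_promo = mean([v for v, f in zip(values, flags) if f])
    without_promo = mean([v for v, f in zip(values, flags) if not f])
    return with_promo - without_promo


def predict_next(values):
    """Predictive: fit a least-squares line and extrapolate one month."""
    n = len(values)
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(values)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar + slope * (n - x_bar)


def prescribe(forecast, stock):
    """Prescriptive: turn the forecast into a suggested action."""
    shortfall = forecast - stock
    return f"order {shortfall:.0f} more units" if shortfall > 0 else "no action"


print(describe(sales))
print(diagnose(sales, promo))
forecast = predict_next(sales)
print(forecast)
print(prescribe(forecast, stock=150))
```

Each function answers one of the four questions: what happened, why, what is likely next, and what to do about it.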

Big Data Analytics Benefits

Driven by specialized analytics systems and software, big data analytics can point the way to various business benefits, including new revenue opportunities, more effective marketing, better customer service, improved operational efficiency and competitive advantages over rivals.

Big data analytics applications enable data scientists, predictive modelers, statisticians and other analytics professionals to analyse growing volumes of structured transaction data, plus other forms of data that are often left untapped by conventional business intelligence (BI) and analytics programs. That encompasses a mix of semi-structured and unstructured data - for example, internet clickstream data, web server logs, social media content, text from customer emails and survey responses, mobile-phone call-detail records and machine data captured by sensors connected to the internet of things. 

On a broad scale, data analytics technologies and techniques provide a means of analysing data sets and drawing conclusions about them to help organizations make informed business decisions. BI queries answer basic questions about business operations and performance. Big data analytics is a form of advanced analytics, which involves complex applications with elements such as predictive models, statistical algorithms and what-if analyses powered by high-performance analytics systems.

Types of Tools for Analysis

  • YARN: a cluster management technology and one of the key features in second-generation Hadoop.
  • MapReduce: a software framework that allows developers to write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers.
  • Spark: an open-source parallel processing framework that enables users to run large-scale data analytics applications across clustered systems.
  • HBase: a column-oriented key/value data store built to run on top of the Hadoop Distributed File System (HDFS).
  • Hive: an open-source data warehouse system for querying and analysing large datasets stored in Hadoop files.
  • Kafka: a distributed publish-subscribe messaging system designed to replace traditional message brokers.
  • Pig: an open-source technology that offers a high-level mechanism for the parallel programming of MapReduce jobs to be executed on Hadoop clusters.
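
To make the MapReduce idea from the list above more concrete, here is a minimal in-process sketch of its three phases (map, shuffle, reduce) applied to a word count. It is a single-machine simulation in plain Python, not the Hadoop API; the function names are hypothetical, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


print(word_count(["big data", "Big analytics"]))
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread across machines, which is what makes the model suitable for very large inputs.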

In some cases, Hadoop clusters and NoSQL systems are being used primarily as landing pads and staging areas for data before it gets loaded into a data warehouse or analytical database for analysis, usually in a summarized form that is more conducive to relational structures.

More frequently, however, big data analytics users are adopting the concept of a Hadoop data lake that serves as the primary repository for incoming streams of raw data. In such architectures, data can be analysed directly in a Hadoop cluster or run through a processing engine like Spark. As in data warehousing, sound data management is a crucial first step in the big data analytics process. Data being stored in the Hadoop Distributed File System must be organized, configured and partitioned properly to get good performance on both extract, transform and load (ETL) integration jobs and analytical queries. 
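
One common way to organise and partition files is to encode a record's date in the directory path, using the `key=value` directory naming that Hive-style tables use. The sketch below only builds such paths in memory; the base directory, the record layout and the function names are hypothetical, and nothing is written to HDFS.

```python
from collections import defaultdict
from datetime import date
import posixpath


def partition_path(base, day):
    """Build a date-based partition directory such as year=2018/month=07/day=10."""
    return posixpath.join(
        base,
        f"year={day.year:04d}",
        f"month={day.month:02d}",
        f"day={day.day:02d}",
    )


def group_by_partition(base, records):
    """Group (date, payload) records by the partition they belong to."""
    groups = defaultdict(list)
    for day, payload in records:
        groups[partition_path(base, day)].append(payload)
    return dict(groups)


# Hypothetical click events, each tagged with the day it occurred.
events = [
    (date(2018, 7, 10), "click:/home"),
    (date(2018, 7, 10), "click:/pricing"),
    (date(2018, 7, 11), "click:/home"),
]
for path, payloads in group_by_partition("/data/events", events).items():
    print(path, payloads)
```

With a layout like this, a query that filters on a single day only needs to read one directory instead of scanning the whole data set.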

Once the data is ready, it can be analysed with the software commonly used in advanced analytics processes. That includes tools for data mining, which sift through data sets in search of patterns and relationships; predictive analytics, which builds models for forecasting customer behaviour and other future developments; machine learning, which taps algorithms to analyse large data sets; and deep learning, a more advanced offshoot of machine learning.

Text mining and statistical analysis software can also play a role in the big data analytics process, as can mainstream BI software and data visualization tools. For both ETL and analytics applications, queries can be written in batch-mode MapReduce; programming languages, such as R, Python and Scala; and SQL, the standard language for relational databases that's supported via SQL-on-Hadoop technologies.
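
As a rough illustration of the kind of SQL an analyst might write, the sketch below counts page hits in a small clickstream table. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for a SQL-on-Hadoop engine; the table name, columns and sample rows are hypothetical.

```python
import sqlite3


def top_pages(rows):
    """Load (user_id, page) rows into a table and count hits per page."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE clicks (user_id TEXT, page TEXT)")
        conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
        query = """
            SELECT page, COUNT(*) AS hits
            FROM clicks
            GROUP BY page
            ORDER BY hits DESC, page ASC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Hypothetical clickstream sample.
clicks = [
    ("u1", "/home"), ("u2", "/home"), ("u1", "/pricing"),
    ("u3", "/home"), ("u2", "/pricing"), ("u3", "/blog"),
]
print(top_pages(clicks))
```

The same `GROUP BY` query shape would typically carry over to a SQL-on-Hadoop engine, although dialect details differ between engines.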

The Revolution of Datafication is just beginning.

Article Written & Edited By: Saumit Patki
