@lumarseg

Data Analytics 101 | Introduction

In this note, I will present the concept of data analytics and data analysis solutions. I will discuss the challenges that may arise when working with large datasets that need to provide relevant information quickly. I will also introduce the five V’s of data analysis and highlight some questions that should be asked when starting to plan a data analysis solution.


1: Data analytics and data analysis solutions

Let’s play with some definitions:

  • Analysis is a detailed study of something to understand its nature or determine its fundamental characteristics. Data analysis is the process of compiling, processing, and examining data with the objective of using it to make decisions.
  • Analytics is the systematic analysis of data. Data analytics is the specific analytical process applied.
  • Data analytics is essential for both small and large businesses. The processes of data analytics come together to create data analysis solutions that help companies decide where and when to launch new products, when to offer discounts, and when to market products in new areas. Without the data provided by data analytics, many decision-makers would base their choices on intuition and mere luck.

As companies begin to implement data analytics solutions, challenges arise. These challenges depend on the characteristics of the data and the analytics needed for each specific use case. In the past, they were described as “big data” challenges. However, in the current cloud-based environment, they can arise in small or slow datasets almost as frequently as in very large and fast ones.

The objective of this note is to guide you in identifying the most suitable data analytics solution for your specific needs and in defining an effective implementation strategy.

  • Big Data refers to datasets so large and complex that traditional data processing tools cannot store, manage, and analyze them efficiently.

A common myth about Big Data is that it encompasses every case, when in fact Big Data is only one part of data analytics solutions. The term has undergone significant changes in the industry in recent years, and big data solutions are now commonly integrated into data analytics solutions.

Organizations invest millions of dollars in data storage. However, the real challenge lies not in finding the data but rather in effectively utilizing it to derive valuable insights and actions.

Benefits of large-scale data analytics

  • Personalization based on the customer: Show customers products according to their consumption habits.
  • Fraud detection: Determine if a transaction is fraudulent.
  • Security threat detection: Identify higher security risks from malicious agents based on usage patterns.
  • User behavior: Identify the types of products or services that could be promoted to a social network user based on their interests.
  • Financial models and forecasts: Predict future market changes.
  • Real-time alerts: Determine the problem and who receives the notification.

To achieve effective data analysis solutions and obtain high-value benefits, it is essential to have both storage and the capability to analyze data in near real time with low latency.

Big Data challenge

Data is generated in myriad ways, and the major question lies in both storing this vast amount of data and harnessing its potential to create value and gain competitive advantages. The challenges encountered in numerous data analysis solutions can be categorized into five pivotal aspects (the 5Vs): 

  • Volume.
  • Velocity. 
  • Variety.
  • Veracity.
  • Value.

It’s important to note that not all organizations face challenges in every area. Some struggle with efficiently ingesting vast volumes of data in a timely manner. Others encounter difficulties in processing these massive data sets to extract valuable predictive insights. Additionally, there are organizations where users require the ability to conduct intricate data analyses on enormous datasets on the fly.

Components of a data analytics solution

A data analytics solution comprises numerous components, and the analytics applied to each of them may necessitate different services and approaches. The main components are the following.

  • Ingestion / Collection: Gathering raw data from transactions, logs, and IoT devices presents a significant challenge. An effective data analysis solution enables developers to ingest a wide variety of data (structured, semi-structured, and unstructured) at any speed, whether through batch processing or real-time streaming.
  • Storage: A robust data analysis solution must provide secure, scalable, and durable storage options. These options must be able to accommodate structured, semi-structured, and unstructured data. For instance, data warehouses efficiently manage structured analytical data, databases handle both structured and semi-structured data, while data lakes can store all three data types.
  • Processing / Analysis: The initial step involves processing the data for easier consumption. During this stage, data is also subjected to analysis, typically involving sorting, summarizing, joining, and applying business logic to generate meaningful analytical datasets. The final step is loading these analytical datasets into a new storage location, such as a data lake, database, or data warehouse.
  • Consumption / Visualization: Data can be consumed in two main ways: through queries or Business Intelligence (BI) tools. Queries yield results suitable for data analysts to perform quick analyses. On the other hand, BI tools generate visualizations presented in reports and dashboards to assist users in exploring the data and making informed decisions.

By effectively integrating these components, a data analytics solution empowers organizations to extract valuable insights and drive data-informed decision-making processes.

The ingestion component is where services gather data from various sources.
The storage component stores the data in repositories.
The processing component is where services manipulate the data to fit into necessary formats.
The consumption component is where data is presented in the required formats.
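
The four components above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the sample CSV data, the in-memory list used as a repository, and the function names (ingest, store, process, consume) are hypothetical and do not correspond to any specific service.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input: a small CSV export of sales transactions.
RAW_CSV = """order_id,region,amount
1,north,120.50
2,south,80.00
3,north,39.50
"""

def ingest(raw_text):
    """Ingestion: gather raw records from a source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def store(records, repository):
    """Storage: keep the records in a repository (here, a plain list)."""
    repository.extend(records)
    return repository

def process(repository):
    """Processing: summarize stored records into an analytical dataset."""
    totals = defaultdict(float)
    for row in repository:
        totals[row["region"]] += float(row["amount"])
    return dict(totals)

def consume(totals):
    """Consumption: present the result in a readable format."""
    return [f"{region}: {total:.2f}" for region, total in sorted(totals.items())]

repository = []
store(ingest(RAW_CSV), repository)
report = consume(process(repository))
print("\n".join(report))
```

In a real solution each function would be replaced by a dedicated service, but the order of the stages stays the same.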

2: Data Analytics Challenges

As the volume, velocity, variety, veracity, and value of data continue to surge, some data management challenges surpass the capabilities of traditional processing and database solutions. This is where data analytics solutions step in.

Before we proceed further, let’s provide a brief definition of each of the five challenges to better comprehend their significance.

  • Volume: Volume refers to the amount of data ingested by the solution, i.e., the total size of the incoming data. Solutions must efficiently operate across distributed systems and be easily scalable to handle traffic spikes.
  • Velocity: Velocity pertains to the speed at which data enters a solution. Many organizations now require near-real-time data ingestion and processing. High-velocity data leaves less time for analysis than traditional data processing does, so solutions must manage this speed effectively and processing systems must deliver results within an acceptable timeframe.
  • Variety: Data can originate from diverse sources. Variety relates to the number of different sources (and types of sources) that the solution will utilize. Solutions must be sophisticated enough to handle all different types of data while providing accurate data analysis.
  • Veracity: Veracity refers to the level of accuracy and reliability of data. It depends on data integrity and credibility. Solutions should be able to identify common data errors and correct them before storage, a process known as “data cleansing.” This process should be achievable within solution time requirements, including real-time processing speeds.
  • Value: Value is the ability of a solution to extract meaningful insights from stored and analyzed data. Solutions must be capable of delivering analytical results in the right format to provide information to decision-makers and stakeholders through trusted reports and dashboards.
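
To make the veracity challenge more concrete, here is a minimal sketch of data cleansing before storage. The record layout (customer_id, email) and the three validation rules are assumptions chosen for illustration; a real solution would define its own rules.

```python
def cleanse(records):
    """Return (clean, rejected) after basic veracity checks."""
    clean, rejected = [], []
    seen_ids = set()
    for record in records:
        customer_id = record.get("customer_id")
        # Normalize the email: trim whitespace and lowercase it.
        email = (record.get("email") or "").strip().lower()
        # Reject records with a missing id, a duplicate id, or a malformed email.
        if customer_id is None or customer_id in seen_ids or "@" not in email:
            rejected.append(record)
            continue
        seen_ids.add(customer_id)
        clean.append({"customer_id": customer_id, "email": email})
    return clean, rejected

# Hypothetical sample records with typical data errors.
raw = [
    {"customer_id": 1, "email": " Ana@Example.com "},
    {"customer_id": 1, "email": "ana@example.com"},   # duplicate id
    {"customer_id": 2, "email": "not-an-email"},      # malformed email
    {"customer_id": None, "email": "x@example.com"},  # missing id
    {"customer_id": 3, "email": "luis@example.com"},
]
clean, rejected = cleanse(raw)
print(len(clean), "clean,", len(rejected), "rejected")
```

Keeping the rejected records, instead of silently dropping them, makes it possible to review later how reliable each data source is.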

3: Planning a Data Analytics Solution

Data analytics solutions encompass various types of analysis to store, process, and visualize data. To begin planning a data analytics solution, you need to understand what you require from that solution.

  1. Know the source of your data.
  2. Understand the options for processing the data.
  3. Identify what insights you need to glean from your data.

Knowing the source of your data

Most data ingested by data analytics solutions comes from databases and file repositories in existing on-premises installations. These data sets often require minimal processing within the solution.

Streaming data is an emerging and increasingly popular source of enterprise data. This data source is less structured. You may need special software to collect the data and specific processing applications to aggregate and analyze it correctly in near real time.
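
One common way to aggregate streaming data is to group events into fixed, non-overlapping time windows. The sketch below assumes each event is a hypothetical (timestamp in seconds, payload) pair and uses a plain list to simulate the stream; it is not tied to any particular streaming product.

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds):
    """Count events per fixed-size, non-overlapping time window."""
    counts = Counter()
    for timestamp, _payload in events:
        # Map each timestamp to the start of the window it falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))

# Simulated stream of (timestamp_in_seconds, payload) events.
events = [(0, "click"), (3, "view"), (9, "click"), (10, "click"), (25, "view")]
counts = tumbling_window_counts(events, window_seconds=10)
print(counts)
```

Because the function only iterates over its input once, the list could be swapped for a generator that yields events as they arrive.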

Public datasets are another data source for businesses. These include census, healthcare, population data, and many other datasets that help businesses interpret the data they collect about their customers. These data sets may need transformation to contain only what the company requires.
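
As a small illustration of that kind of transformation, the following sketch keeps only the columns a company needs from a public dataset. The CSV content and the column names are made up for the example and do not come from any real dataset.

```python
import csv
import io

# Hypothetical extract of a public dataset (values are invented).
PUBLIC_CSV = """region,population,median_age,area_km2
north,1200000,34.1,5400
south,800000,29.7,7200
"""

def select_columns(raw_text, wanted):
    """Keep only the wanted columns from each row of a CSV text."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [{name: row[name] for name in wanted} for row in reader]

trimmed = select_columns(PUBLIC_CSV, ["region", "population"])
print(trimmed)
```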

Understanding the options for processing the data

There are many different solutions available for processing data. There is no one-size-fits-all approach. You must carefully assess the business needs and match them with the services that combine to deliver the required results.

Identifying what insights you need to glean from your data

You must be prepared to learn from your data, collaborate with internal teams to optimize efforts, and be willing to experiment.

It is essential to identify trends, establish relationships, and drive more efficient and profitable business decisions by putting data to work.