Data Analytics

Data is the raw material but on its own hold’s little value. Before data can create value for a business, it needs to be refined and analysed; this refinement process is referred to as data analytics. The ultimate goal of data analytics is to turn raw data into insight which can be acted on to create business value.

Types of Data Analytics

Advanced data analytics enable you to make better, data-driven decisions, with reduced reliance on experience and “gut-feel”. Data analytics is roughly split into three categories, as can be seen in the Gartner ascendancy model. Each type of data analytics attempts to answer a different question as we progress from looking at the past and answering “What happened?” and looking into the future to answer “What will happen?” or “How can we make it happen?”.
data analytics

Descriptive analytics is the most commonly used form of data analytics. This form of analytics describes and presents historical data in a way that can easily be summarised and understood. It answers the question of “What happened?”. Descriptive analytics allows businesses to learn from past events and understand how they may influence the future. Descriptive analytics is usually the starting point to inform, diagnose and prepare data for advanced analytics techniques.

Example uses of descriptive analytics.

  • Reports such as sales, revenue, profit etc. aggregated by region, branch, customer segment, etc.
  • Customer or product segmentation based on predefined properties and classifications
  • Ability to drill-down, to understand and “diagnose” the data

    Visit our PredictCustomer or PredictInventory page for domain-specific examples.

Predictive analytics is the application of data miningstatistical modelling and machine learning to predict “What is likely to happen in the future?”. While there is no crystal ball to see into the future, predictive analytics enables you to make an educated guess using mathematics and statistics, not gut-feel. Predictions are typically created by analysing historical data, for example, using machine learning, then attempting to fill future data based on the patterns and relationships in the data. Unlike descriptive analytics that focuses on the past, predictive analytics focuses on future events which did not yet happen; this means we can do something about them. We can use the predictive insight to make informed decisions and effective business strategies based on probabilities.

Example uses of predictive analytics. 

  • Using POS and ERP data to forecast future sales, by day, week, month, store, etc.
  • Using past customer purchasing behaviours to predict what and when a customer may purchase next
  • Using purchasing histories to predict lifetime value over a predetermined period
  • Predictive maintenance of vehicles and machinery

    Visit our PredictCustomer or PredictInventory page for domain-specific examples or click here to learn more about predictive analytics.

The most advanced and complex form of analytics, and the most rarely used one. While descriptive analytics focuses on what happened, and predictive on what may happen, prescriptive attempts to answer “How do we make it happen?”. “What is the best course of action to ensure that a simulated outcome is achieved?”

Prescriptive analytics is the combination of data, both internal and external, business rules, boundaries and simulation, to find the best path to the desired outcome. Prescriptive analytics is the final stage of the analytics journey and comparatively the most complex and challenging to manage, requiring deep domain expertise and ongoing fine-tuning.

Click here to learn more about prescriptive analytics.

What is Big Data Analytics?

Big data refers to massive amounts of data that cannot be stored or processed in a traditional relational database. Big data analytics refers to the use of advanced analytics, including predictive modelling, against these large data sets containing both structured and unstructured data, from diverse data streams. Big data will typically have a large volume, velocity or variety of data. The fundamentals of data analytics apply equally to relational or big data analytics; mainly the tools used are different.
big Data analytics

Predictive Analytics Lifecycle

Predictive analytics lifecycle is a continuous process of learning and improvement. The process consists of approximately six lifecycle stages revolving around the primary business goal

analytics lifecycle

Setting clear and measurable goals is at the core of any data analytics process, and the entire lifecycle revolves around this primary objective. To define the goals and objectives, you need to be asking the right questions and domain knowledge and descriptive analytics help immensely to determine the right questions.

When starting a data analytics project and setting goals, at a bare minimum, ask yourself these questions. If there was a dashboard with a gauge and the purpose of the analytics project is to ‘move the needle’.

  • What performance indicator is the gauge measuring?
  • By what percentage must the needle move in order to create ROI?
  • How do we measure uplift?

Appoint a data analytics “champion” − an executive, or a person with authority, to spearhead the implementation of the organisation’s data strategy. Change management is crucial for success as the workforce transitions from using gut feel and experience, to using data and analytics to make better decisions.

After requirement gathering and defining objectives and measurements, it’s time to collect relevant data. Preferably lots of historical data. Predictive modelling is most accurate when there is sufficient data to establish strong trends and relationships.

Every dataset has its own nuances that are business-specific. For example, there could be transaction types that cause every row in the dataset to have a different meaning, depending on this transaction type. There could be a historic event that caused a big but temporary deviation in data, for example, a fire or natural disaster that disrupted business operation for a period of time.  These sort of nuances are only known internally, and typically not well documented. It’s not enough to just gather the data, it’s important to collaborate closely with the internal teams in order to uncover these nuances and outliers. and gain a solid understanding of what the data means, at its core.
Domain knowledge and experience help tremendously during this step. You need to “know what you don’t know” and looking at data is not enough, you need a solid understanding of the operational processes as well.

Data is the fuel that powers data analytics; inferior data produces inferior results, so data goes through several phases of refinement and preparation. Data is extracted, transformed and loaded (ETL) before the process of feature extraction and modelling begins. It is a crucial step that will often take a large chunk of the total project time, and with good reason. Data gathering, cleaning and merging are used to identify and remove errors, outliers and inconsistencies that can affect data quality and accuracy. The goal is to create repeatable data pipelines that continuously feed raw data into big data and relational database storage. Data Engineers typically handle this process.
When the data cleaning and ETL steps are complete, Data Scientists can apply advanced analytics techniques, including machine learning and predictive modelling, to convert clean data into insight and business value. Because we are predicting future events, there always needs to be a way to validate the accuracy and estimate uplift during the development and training of the model, and not at a future date. There needs to be a level of trust in the accuracy before it is actioned operationally. Trust is achieved by referencing back to the ‘gauge and needle’ we are aiming to move, and the primary objective of the project. Model assessment reports and dashboards are used to track accuracy and overall uplift. Assessment reports allow data scientists to apply a variety of algorithms and machine learning techniques and benchmark each against the assessment dashboards until a satisfactory level of accuracy is achieved, and a top candidate is identified.
Model assessment is typically benchmarked by time-boxing and splitting the data into a training and validation dataset. For example, the training dataset could be data older than six months; this dataset is used to train the machine learning model to predict the future. The validation dataset could be data from 6 months ago up to today; this dataset is used to assess the accuracy of the predictions and the trained model without having to wait. In essence, this enables us to validate the model accuracy over six months in a simulated environment but using real-life data. Through this process, we will not only be able to identify the best performing algorithm or method, but we will also be able to estimate the accuracy and uplift with a high degree of confidence. If the model is X% accurate for the past six months, we can safely assume that the same level of accuracy, or better, will be achieved when trained with a full dataset to predict the following six months.
When the model is trained, tested and benchmarked, predictions and insight need to be actioned and utilised operationally. To use an analogy, predicting the winning lottery numbers and not buying a lottery ticket invalidates all the effort, this is where the importance of a business champion and custodian comes into play. This champion is typically an executive sponsor, or someone with authority to handle change management to action the insight and measure the uplift. To ‘buy the lottery ticket’!
The last step is the ongoing monitoring of the model’s performance. The context around us changes continuously, new data sources become available, data-drifts, unexpected events happen (corona!), etc. These changes could affect the performance. It’s essential to monitor and continuously improve the accuracy and performance of the models, in order to maximise business value.
When the predictions are actioned, they will produce new sets of data, or data could be enriched using external data sources. There will always be new data which needs to go back into the business analytics lifecycle in order for the predictive models to learn and improve continuously.

Business analytics lifecycle is cyclical and only improves over time as more relevant data becomes available.

What is Artificial Intelligence, Machine Learning and Deep Learning?

Artificial intelligence (AI), machine learning (ML) and deep learning fall under the same umbrella but mean different things. AI is a broad term describing any machine that is able to learn and perform tasks that typically require human intelligence. You can think of machine learning and deep learning as the enabler of AI.

ai overview
Artificial intelligence is the broad umbrella term encapsulating any program or machine attempting to mimic human behaviour in some way.
The surge of AI in recent times was made possible by the availability of large amounts of data and the wide adoption of cloud computing tools that can process data faster than ever before. Today, AI is everywhere around us, providing turn-by-turn directions when we need them, recommending what we should buy or watch next and automatically tagging us in images and videos. While AI is the broad marketing term, machine learning and deep learning is the enabler for creating AI-based applications.

The primary potential of AI lies in its ability to collect large volumes of data at high speed, recognise patterns, learn from them, and enable better decision-making.

ai timeline

Machine learning refers to a technique that gives computer programs the ability to learn from data, without explicitly being programmed to do so. Just like humans learn from experiences, machine learning enables programs to learn from historical data, allowing businesses to make decisions based on the trends and relationships in the data. Model accuracy is affected by the volume of data available to learn from, the strength of the trends, and the algorithms used.

To help conceptualise machine learning, let’s look at how we might apply ML to build personalised customer recommendations in a retail environment.

Sales and transaction histories usually have patterns and relationships in the data, some are obvious, but most are not. If you have a large number of customers, it would be challenging to analyse trends for each customer individually and consistently determine the most viable product or service to trigger an intent. Machine learning can quickly solve this challenge. Historical data is used to “train” and teach a machine-learning algorithm to analyse relationships between products, customers, price, transactions, clickstreams and other “features” or data properties, to statistically determine in real-time the products or services with the highest probability of conversion. The algorithm learns on an ongoing basis and adjusts to the changing context and available data. The trained ML model is, in essence, a “rules engine” that is trained for one purpose, increase sales, without explicitly being programmed to do so by a programmer. When the trends and patterns in the data change, so will the rules to always maintain maximum efficiency.

machine learning

The more recent deep learning is a subset and evolution of machine learning. It is modelled based on how the neurons in the human brain work. Deep learning uses multiple layers of algorithms called artificial neural networks to analyse complex data without human intervention. The most human-like AI, such as autonomous vehicles, image and voice recognition, natural language processing, etc., are made possible by deep learning. Deep learning is particularly effective when there is a massive volume of data, the data is not structured and labelled, or the problem is too complex to explain or solve with traditional machine learning.

“In traditional Machine learning techniques, most of the applied features need to be identified by a domain expert in order to reduce the complexity of the data and make patterns more visible to learning algorithms to work. The biggest advantage of Deep Learning algorithms is that they try to learn high-level features from data in an incremental manner. This eliminates the need for domain expertise and hardcore feature extraction.” Source

Deep learning vs machine learning

Data transformation and processing

ATG is a Google Cloud Build partner. We utilise a microservices architecture and best-of-breed Google Cloud Analytics tools including Google Big QueryGoogle DataflowGoogle AI, and Kubeflow to handle any data transformation or machine learning workload with ease.