Data Analytics

Data is the raw material but, on its own, it holds little value. Before data can create value for a business, it needs to be refined and analysed; this process is referred to as data analytics. The ultimate goal of data analytics is to turn raw data into insight that can be acted on to create business value.

Types of Data Analytics

Advanced data analytics enables you to make better, data-driven decisions with reduced reliance on experience and gut feel. Data analytics is roughly split into three categories, each answering a different question as we progress from looking at the past ("What happened?") to looking into the future ("What will happen?" and "How can we make it happen?").
Source: Gartner analytics ascendancy model

What is Big Data Analytics?

Big data refers to massive amounts of data that cannot be stored or processed in a traditional relational database. Big data analytics is the use of advanced analytics, including predictive modelling, against these large data sets, which contain both structured and unstructured data from diverse data streams. Big data is typically characterised by high volume, velocity or variety. The fundamentals of data analytics apply equally to relational and big data analytics; mainly, the tools used differ.

Data is the fuel that powers any successful analytics project

Predictive Analytics Lifecycle

The predictive analytics lifecycle is a continuous process of learning and improvement. The process consists of roughly six stages revolving around the primary business goal.

Setting clear and measurable goals is at the core of any data analytics process, and the entire lifecycle revolves around this primary objective. To define the goals and objectives, you need to be asking the right questions. Domain knowledge and descriptive analytics help immensely to determine the right questions.

When starting a data analytics project and setting goals, ask yourself, at a bare minimum, the following questions. Imagine a dashboard with a gauge, where the purpose of the project is to 'move the needle':

  • What performance indicator is the gauge measuring?
  • By what percentage must the needle move to create ROI?
  • How do we measure uplift?
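To make the 'gauge and needle' idea concrete, uplift on a performance indicator can be expressed as a simple relative percentage change. The KPI and figures below are hypothetical illustrations, not from the source:

```python
def uplift_pct(baseline: float, current: float) -> float:
    """Percentage change of a KPI relative to its baseline."""
    if baseline == 0:
        raise ValueError("baseline KPI must be non-zero")
    return (current - baseline) / baseline * 100

# Hypothetical example: monthly conversion rate moved from 2.0% to 2.3%,
# a 15% relative uplift on the KPI the gauge is measuring.
print(round(uplift_pct(2.0, 2.3), 1))
```

Agreeing up front on the baseline, the target and this measurement method is what makes the uplift verifiable later.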

Appoint a data analytics “champion”, an executive or a person with authority, to spearhead the implementation of the organisation’s data strategy. Change management is crucial for success as the workforce transitions from using gut feel and experience to using data and analytics to make better decisions.

After requirements gathering and defining objectives and measurements, it’s time to collect relevant data, preferably plenty of historical data. Predictive modelling is most accurate when there is sufficient data to establish strong trends and relationships.

Every dataset has business-specific nuances. For example, a transaction-type field could give every row in the dataset a different meaning, depending on that type. There could also be a historical event that caused a significant but temporary deviation in the data, for example a fire or natural disaster that disrupted business operations. These nuances are only known internally and are typically not well documented. It’s not enough to just gather the data; it’s important to collaborate closely with the internal teams to uncover these nuances and outliers and to gain a solid understanding of what the data means at its core.

Domain knowledge and experience help tremendously during this step. You need to “know what you don’t know”, and looking at data is not enough. You need a solid understanding of the operational processes as well.

Data is the fuel that powers data analytics; inferior data produces inferior results, so data goes through several phases of refinement and preparation. Data is extracted, transformed and loaded (ETL) before feature extraction and modelling begin. This crucial step often takes a large chunk of the total project time, and with good reason.

Data gathering, cleaning and merging identify and remove the errors, outliers and inconsistencies that affect data quality and accuracy. The goal is to create repeatable data pipelines that continuously feed raw data into big data and relational database storage. Data engineers typically handle this process.
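As a minimal, stdlib-only sketch of the kind of cleaning described above, the function below removes duplicate rows, rows with missing values, and extreme outliers. The field names, the row layout and the three-standard-deviation rule are illustrative assumptions; real pipelines use dedicated ETL tooling:

```python
from statistics import mean, stdev

def clean(rows):
    """Drop duplicate rows, rows with missing values, and amount outliers (> 3 std devs)."""
    # Deduplicate while preserving order.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    # Drop rows with missing values.
    complete = [r for r in unique if all(v is not None for v in r.values())]
    # Drop outliers more than three standard deviations from the mean amount.
    amounts = [r["amount"] for r in complete]
    mu, sigma = mean(amounts), stdev(amounts)
    return [r for r in complete if abs(r["amount"] - mu) <= 3 * sigma]
```

In practice each of these three decisions (what counts as a duplicate, how to treat missing values, what counts as an outlier) depends on the business nuances uncovered with the internal teams.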

When the data cleaning and ETL steps are complete, data scientists can apply advanced analytics techniques, including machine learning and predictive modelling, to convert clean data into insight and business value. Because we are predicting future events, there must be a way to validate accuracy and estimate uplift during the development and training of the model, not at some future date.

There needs to be a level of trust in the accuracy before predictions are actioned operationally. Trust is built by referring back to the ‘gauge and needle’ we aim to move and the project’s primary objective. Model assessment reports and dashboards track accuracy and overall uplift. They allow data scientists to apply various algorithms and machine learning techniques and benchmark each against the assessment dashboards until a satisfactory level of accuracy is achieved and a top candidate is identified.

Model assessment is typically benchmarked by time-boxing: splitting the data into a training and a validation dataset. For example, the training dataset could be data older than six months; it is used to train the machine learning model to predict the future. The validation dataset could be data from six months ago up to today; it is used to assess the accuracy of the predictions and the trained model without having to wait. In essence, this lets us validate the model's accuracy over six months in a simulated environment, but using real-life data.
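The time-boxed split above can be sketched in a few lines. The record layout, dates and the 30-day month approximation are illustrative assumptions:

```python
from datetime import date, timedelta

def time_box_split(records, today, months=6):
    """Split records into training (older than the cut-off) and validation (cut-off to today)."""
    cutoff = today - timedelta(days=months * 30)  # simple approximation of N months
    training = [r for r in records if r["date"] < cutoff]
    validation = [r for r in records if cutoff <= r["date"] <= today]
    return training, validation

# Hypothetical transactions: one well before the cut-off, one inside the validation window.
records = [
    {"date": date(2021, 1, 5), "amount": 100},
    {"date": date(2021, 9, 1), "amount": 250},
]
train, valid = time_box_split(records, today=date(2021, 10, 1))
print(len(train), len(valid))  # 1 1
```

The model only ever sees the training partition; the validation partition stands in for "the future" so accuracy can be measured immediately.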

Through this process, we will not only be able to identify the best-performing algorithm or method, but we will also be able to estimate the accuracy and uplift with a high degree of confidence. If the model is X% accurate for the past six months, we can safely assume that the same level of accuracy, or better, will be achieved when trained with a full dataset to predict the following six months.
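The benchmarking loop that identifies the best-performing candidate can be sketched as follows. The two "models" here are trivial stand-ins for real algorithms, and the churn-prediction framing is an invented example:

```python
def accuracy(model, validation):
    """Fraction of validation examples the model predicts correctly."""
    hits = sum(1 for features, label in validation if model(features) == label)
    return hits / len(validation)

def best_model(candidates, validation):
    """Benchmark every candidate on the same validation set; return (name, accuracy) of the winner."""
    scores = {name: accuracy(fn, validation) for name, fn in candidates.items()}
    return max(scores.items(), key=lambda kv: kv[1])

# Hypothetical stand-ins: predict churn (1) when monthly spend drops below a threshold.
validation = [({"spend": 5}, 1), ({"spend": 50}, 0), ({"spend": 8}, 1), ({"spend": 40}, 0)]
candidates = {
    "threshold_10": lambda f: 1 if f["spend"] < 10 else 0,
    "always_churn": lambda f: 1,
}
print(best_model(candidates, validation))  # ('threshold_10', 1.0)
```

Because every candidate is scored against the same held-out validation set, the winning score doubles as the confidence estimate described above.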

When the model is trained, tested and benchmarked, its predictions and insight need to be actioned and utilised operationally. To use an analogy: predicting the winning lottery numbers and not buying a ticket invalidates all the effort. This is where the importance of a business champion and custodian comes into play. The champion is typically an executive sponsor, or someone with the authority to handle change management, who actions the insight and measures the uplift. In short: buy the lottery ticket!

The last step is the ongoing monitoring of the model’s performance. The context around us changes continuously: new data sources become available, data drifts, and unexpected events (such as the COVID-19 pandemic) happen. These changes can degrade performance, so it’s essential to monitor and continuously improve the accuracy and performance of the models in order to maximise business value.

When the predictions are actioned, they will produce new sets of data, or data could be enriched using external data sources. There will always be new data that needs to go back into the business analytics lifecycle in order for the predictive models to learn and improve continuously.

The business analytics lifecycle is cyclical and improves over time as more relevant data becomes available.

What are Artificial Intelligence, Machine Learning and Deep Learning?

Artificial intelligence (AI), machine learning (ML) and deep learning (DL) fall under the same umbrella but mean different things. AI is a broad term describing any machine that can learn and perform tasks that typically require human intelligence. You can think of machine learning and deep learning as the enablers of AI.

Artificial Intelligence

Artificial intelligence is the broad umbrella term encapsulating any program or machine attempting to mimic human behaviour in some way.
The surge of AI in recent times was made possible by the availability of large amounts of data and the wide adoption of cloud computing tools that can process data faster than ever before. Today, AI is everywhere around us, providing turn-by-turn directions when we need them, recommending what we should buy or watch next and automatically tagging us in images and videos. While AI is the broad marketing term, machine learning and deep learning are the enablers for creating AI-based applications.

The primary potential of AI lies in its ability to collect large volumes of data at high speed, recognise patterns, learn from them, and enable better decision-making.


Machine Learning

Machine learning refers to a technique that gives computer programs the ability to learn from data, without explicitly being programmed to do so. Just like humans learn from experiences, machine learning enables programs to learn from historical data, allowing businesses to make decisions based on the trends and relationships in the data. Model accuracy is affected by the volume of data available to learn from, the strength of the trends, and the algorithms used.

To help conceptualise machine learning, let’s look at how we might apply ML to build personalised customer recommendations in a retail environment.

Sales and transaction histories usually contain patterns and relationships; some are obvious, but most are not. With a large number of customers, it would be challenging to analyse trends for each customer individually and consistently determine the most viable product or service to trigger purchase intent. ML can solve this challenge quickly. Historical data is used to train a machine-learning algorithm to analyse relationships between products, customers, price, transactions, clickstreams and other features or data properties, and to statistically determine, in real time, the products or services with the highest probability of conversion. The algorithm learns on an ongoing basis and adjusts to the changing context and available data. The trained ML model is, in essence, a “rules engine” trained for one purpose, increasing sales, without explicitly being programmed to do so. When the trends and patterns in the data change, so do the rules, always maintaining maximum efficiency.
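A toy illustration of the idea of learning product relationships from transaction history: count which products are bought together, then recommend the most frequent co-purchase. The product names are hypothetical, and real recommenders use far richer features and algorithms; this only sketches the "rules learned from data, not programmed" point:

```python
from collections import Counter
from itertools import combinations

def train_recommender(baskets):
    """Count product co-occurrences across historical shopping baskets."""
    co = {}
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co

def recommend(co, product):
    """Return the product most frequently bought together with `product`."""
    return co[product].most_common(1)[0][0] if product in co else None

# Hypothetical transaction history: the "rule" tea -> milk is learned, not hand-coded.
baskets = [["tea", "milk"], ["tea", "milk", "sugar"], ["coffee", "milk"]]
model = train_recommender(baskets)
print(recommend(model, "tea"))  # milk
```

Retraining on new baskets automatically updates the learned rules as buying patterns change, which is the point the paragraph above makes about adjusting to changing context.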

Deep Learning

The more recent deep learning is a subset and evolution of machine learning, modelled on how the neurons in the human brain work. DL uses multiple layers of algorithms, called artificial neural networks, to analyse complex data without human intervention. The most human-like AI, such as autonomous vehicles, image and voice recognition and natural language processing, is made possible by deep learning. DL is particularly effective when there is a massive volume of data, when the data is not structured and labelled, or when the problem is too complex to explain or solve with traditional machine learning.
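To make "multiple layers" concrete, here is a toy forward pass through a two-layer network in plain Python: each layer computes weighted sums of its inputs and passes them through an activation function, and each layer's output feeds the next. The weights are arbitrary illustrative numbers, and real deep learning frameworks (and training via backpropagation) are omitted entirely:

```python
import math

def layer(inputs, weights, biases):
    """One dense layer: weighted sums of the inputs passed through a sigmoid activation."""
    sigmoid = lambda x: 1 / (1 + math.exp(-x))
    return [sigmoid(sum(w * i for w, i in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Two stacked layers: a hidden layer of two neurons feeding a single output neuron.
x = [0.5, -1.0]
hidden = layer(x, weights=[[0.8, -0.2], [0.4, 0.9]], biases=[0.1, -0.3])
output = layer(hidden, weights=[[1.2, -0.7]], biases=[0.05])
print(len(hidden), len(output))  # 2 1
```

Deep networks simply stack many such layers, letting later layers build higher-level features out of earlier ones, which is what the quote below describes as incremental feature learning.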

“In traditional Machine learning techniques, most of the applied features need to be identified by a domain expert in order to reduce the complexity of the data and make patterns more visible to learning algorithms to work. The biggest advantage of Deep Learning algorithms is that they try to learn high-level features from data in an incremental manner. This eliminates the need for domain expertise and hardcore feature extraction.” Source


Data transformation and processing

Argility is a Google Cloud Build Partner. We utilise a microservices architecture and best-of-breed Google Cloud analytics tools, including Google BigQuery, Google Dataflow, Google AI and Google Vertex AI, to handle any data transformation or machine learning workload with ease.