Data Analytics

Data is the raw material, but on its own it holds little value. Before data can create value for a business, it needs to be refined and analysed; this refinement process is referred to as data analytics. The ultimate goal of data analytics is to turn raw data into insight that can be acted on to create business value.

Types of Data Analytics

Advanced data analytics enables you to make better, data-driven decisions, with reduced reliance on experience and “gut feel”. Data analytics is roughly split into three categories, as can be seen in the Gartner ascendancy model. Each type of data analytics attempts to answer a different question as we progress from looking at the past and answering “What happened?” to looking into the future and answering “What will happen?” or “How can we make it happen?”.
[Figure: data analytics]

What is Big Data Analytics?

Big data refers to massive amounts of data that cannot be stored or processed in a traditional relational database. Big data analytics refers to the use of advanced analytics, including predictive modelling, against these large data sets, which contain both structured and unstructured data from diverse data streams. Big data is typically characterised by a large volume, velocity or variety of data. The fundamentals of data analytics apply equally to relational and big data analytics; it is mainly the tools that differ.
[Figure: big data analytics]

Predictive Analytics Lifecycle

The predictive analytics lifecycle is a continuous process of learning and improvement. The process consists of approximately six lifecycle stages revolving around the primary business goal.

[Figure: analytics lifecycle]

Setting clear and measurable goals is at the core of any data analytics process, and the entire lifecycle revolves around this primary objective. To define the goals and objectives, you need to ask the right questions. Domain knowledge and descriptive analytics help immensely in determining what those questions are.

When starting a data analytics project and setting goals, ask yourself these questions at a bare minimum. Imagine a dashboard with a gauge, where the purpose of the analytics project is to ‘move the needle’:

  • What performance indicator is the gauge measuring?
  • By what percentage must the needle move in order to create ROI?
  • How do we measure uplift?
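
The last two questions can be made concrete with a small calculation. The following is a minimal sketch in Python, assuming uplift is measured as the relative change of the performance indicator against its baseline. The function names, the conversion rates, the project cost and the value per percentage point are all hypothetical figures chosen for illustration.

```python
def uplift_percent(baseline: float, observed: float) -> float:
    """Relative change of the KPI versus its baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative uplift")
    return (observed - baseline) / baseline * 100.0


def break_even_uplift_percent(project_cost: float, value_per_percent: float) -> float:
    """Smallest uplift (in percent) at which the gains cover the project cost."""
    if value_per_percent <= 0:
        raise ValueError("value_per_percent must be positive")
    return project_cost / value_per_percent


# Hypothetical figures, for illustration only.
baseline_conversion = 0.040   # the gauge reading before the project
observed_conversion = 0.046   # the gauge reading after acting on the insight

measured = uplift_percent(baseline_conversion, observed_conversion)
required = break_even_uplift_percent(project_cost=50_000, value_per_percent=5_000)

print(f"measured uplift: {measured:.1f}%")
print(f"uplift required to break even: {required:.1f}%")
print("ROI achieved" if measured >= required else "ROI not yet achieved")
```

Agreeing on the baseline and on the value of each percentage point before the project starts is what makes the uplift measurable afterwards.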

Appoint a data analytics “champion”: an executive, or a person with authority, to spearhead the implementation of the organisation’s data strategy. Change management is crucial for success as the workforce transitions from using gut feel and experience to using data and analytics to make better decisions.

After gathering requirements and defining objectives and measurements, it’s time to collect relevant data, preferably lots of historical data. Predictive modelling is most accurate when there is sufficient data to establish strong trends and relationships.

 
Every dataset has its own business-specific nuances. For example, there could be a transaction type that gives every row in the dataset a different meaning, depending on its value. There could be a historic event that caused a big but temporary deviation in the data, such as a fire or natural disaster that disrupted business operations for a period of time. These sorts of nuances are only known internally, and are typically not well documented. It’s not enough to just gather the data; it’s important to collaborate closely with the internal teams in order to uncover these nuances and outliers, and to gain a solid understanding of what the data means at its core.
 
Domain knowledge and experience help tremendously during this step. You need to “know what you don’t know”. Looking at the data is not enough; you need a solid understanding of the operational processes as well.

Data is the fuel that powers data analytics; inferior data produces inferior results, so data goes through several phases of refinement and preparation. Data is extracted, transformed and loaded (ETL) before the process of feature extraction and modelling begins. It is a crucial step that will often take a large chunk of the total project time, and with good reason. Data gathering, cleaning and merging are used to identify and remove errors, outliers and inconsistencies that can affect data quality and accuracy. The goal is to create repeatable data pipelines that continuously feed raw data into big data and relational database storage. Data Engineers typically handle this process.
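
As an illustration of the cleaning step, here is a minimal sketch in Python using only the standard library. The record layout, the sample rows and the outlier threshold are assumptions made for this example; a production pipeline would normally run in dedicated ETL tooling. Outliers are detected with the median absolute deviation, which is one common choice among several.

```python
from datetime import date
from statistics import median

# Hypothetical raw records: (order_id, order_date, amount)
raw_rows = [
    ("A1", date(2023, 1, 5), 120.0),
    ("A1", date(2023, 1, 5), 120.0),     # exact duplicate
    ("A2", date(2023, 1, 6), None),      # missing amount
    ("A3", date(2023, 1, 7), 95.0),
    ("A4", date(2023, 1, 8), 110.0),
    ("A5", date(2023, 1, 9), 105.0),
    ("A6", date(2023, 1, 10), 9_999.0),  # suspicious outlier
    ("A7", date(2023, 1, 11), 100.0),
]


def clean(rows, max_deviations=5.0):
    """Drop duplicates, incomplete rows and extreme outliers."""
    # 1. Remove exact duplicates while preserving the original order.
    unique = list(dict.fromkeys(rows))
    # 2. Remove rows with a missing amount.
    complete = [r for r in unique if r[2] is not None]
    # 3. Remove outliers using the median absolute deviation (MAD),
    #    which is less sensitive to extreme values than mean and stddev.
    amounts = [r[2] for r in complete]
    centre = median(amounts)
    mad = median(abs(a - centre) for a in amounts)
    if mad == 0:
        return complete  # no spread to judge outliers against
    return [r for r in complete if abs(r[2] - centre) / mad <= max_deviations]


cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

Whether a row such as A6 is an error or a genuine business event is exactly the kind of question that needs input from the internal teams before it is dropped.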

When the data cleaning and ETL steps are complete, Data Scientists can apply advanced analytics techniques, including machine learning and predictive modelling, to convert clean data into insight and business value. Because we are predicting future events, there always needs to be a way to validate the accuracy and estimate uplift during the development and training of the model, and not at a future date. There needs to be a level of trust in the model’s accuracy before its predictions are actioned operationally. Trust is achieved by referencing back to the ‘gauge and needle’ we are aiming to move, and the primary objective of the project. Model assessment reports and dashboards are used to track accuracy and overall uplift. They allow data scientists to apply a variety of algorithms and machine learning techniques and benchmark each one until a satisfactory level of accuracy is achieved and a top candidate is identified.
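
Benchmarking candidates against a common measure can be sketched as follows. This is a minimal Python example with hypothetical model names and made-up predictions; mean absolute error is used as the accuracy measure here purely as an assumption, and the right metric depends on the gauge the project is trying to move.

```python
from statistics import mean

# Hypothetical validation data: outcomes that are already known.
actuals = [100.0, 110.0, 120.0, 130.0]

# Hypothetical predictions from three candidate models for the same period.
candidate_predictions = {
    "naive_last_value": [95.0, 100.0, 110.0, 120.0],
    "linear_trend": [101.0, 111.0, 118.0, 131.0],
    "seasonal_average": [90.0, 115.0, 125.0, 140.0],
}


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Score every candidate against the same validation data.
scores = {
    name: mean_absolute_error(actuals, predictions)
    for name, predictions in candidate_predictions.items()
}

# The top candidate is the one with the lowest error.
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {score:.2f}")
print(f"top candidate: {best}")
```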

Model assessment is typically benchmarked by time-boxing: splitting the data into a training dataset and a validation dataset. For example, the training dataset could be data older than six months; this dataset is used to train the machine learning model to predict the future. The validation dataset could be data from six months ago up to today; this dataset is used to assess the accuracy of the trained model’s predictions without having to wait. In essence, this enables us to validate the model accuracy over six months in a simulated environment, but using real-life data. Through this process, we will not only be able to identify the best-performing algorithm or method, but also estimate the accuracy and uplift with a high degree of confidence. If the model is X% accurate for the past six months, we can reasonably expect a similar level of accuracy when it is retrained on the full dataset to predict the following six months, provided conditions remain broadly stable.
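
The time-boxed split described above can be sketched in a few lines of Python. The record layout, the sample dates and the 182-day window (roughly six months) are assumptions made for this example.

```python
from datetime import date, timedelta


def time_boxed_split(records, today, validation_days=182):
    """Split (date, value) records into training and validation sets.

    Records older than the cutoff are used for training; records from the
    cutoff up to and including today are held back for validation.
    """
    cutoff = today - timedelta(days=validation_days)
    training = [r for r in records if r[0] < cutoff]
    validation = [r for r in records if cutoff <= r[0] <= today]
    return training, validation


# Hypothetical historical records: (date, observed value)
records = [
    (date(2023, 1, 15), 10.0),
    (date(2023, 5, 1), 12.0),
    (date(2023, 9, 1), 11.0),
    (date(2023, 12, 1), 14.0),
]

training, validation = time_boxed_split(records, today=date(2023, 12, 31))
print(f"training rows: {len(training)}, validation rows: {len(validation)}")
```

Splitting by date rather than at random matters here: it keeps information from the validation period out of the training data, so the benchmark reflects how the model would have performed had it been deployed six months ago.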

When the model is trained, tested and benchmarked, the predictions and insight need to be actioned and utilised operationally. To use an analogy, predicting the winning lottery numbers and then not buying a lottery ticket wastes all the effort. This is where the importance of a business champion and custodian comes into play. This champion is typically an executive sponsor, or someone with the authority to handle change management, action the insight and measure the uplift. In other words, to ‘buy the lottery ticket’!

The last step is the ongoing monitoring of the model’s performance. The context around us changes continuously: new data sources become available, data drifts, and unexpected events happen (the coronavirus pandemic, for example). These changes could affect the model’s performance. It’s essential to monitor and continuously improve the accuracy and performance of the models in order to maximise business value.
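
One simple way to monitor for drift is to compare recent values of an input, or of the model’s error, with the values seen when the model was trained. The sketch below is a minimal Python example of that idea. The sample values and the threshold of three standard deviations are assumptions chosen for illustration, and real monitoring would usually track several measures over time.

```python
from statistics import mean, stdev


def drift_score(baseline, recent):
    """How many baseline standard deviations the recent mean has moved."""
    spread = stdev(baseline)
    if spread == 0:
        raise ValueError("baseline has no variation to compare against")
    return abs(mean(recent) - mean(baseline)) / spread


# Hypothetical values of one input feature.
baseline_values = [100.0, 102.0, 98.0, 101.0, 99.0]    # seen during training
recent_values = [110.0, 112.0, 109.0, 111.0, 113.0]    # seen in production

DRIFT_THRESHOLD = 3.0  # assumed tolerance, to be tuned per use case

score = drift_score(baseline_values, recent_values)
needs_review = score > DRIFT_THRESHOLD

print(f"drift score: {score:.2f}")
print("model needs review" if needs_review else "model within tolerance")
```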

When the predictions are actioned, they will produce new sets of data, and data could also be enriched using external data sources. There will always be new data that needs to go back into the business analytics lifecycle in order for the predictive models to learn and improve continuously.

The business analytics lifecycle is cyclical and improves over time as more relevant data becomes available.

What are Artificial Intelligence, Machine Learning and Deep Learning?

Artificial intelligence (AI), machine learning (ML) and deep learning fall under the same umbrella but mean different things. AI is a broad term describing any machine that is able to learn and perform tasks that typically require human intelligence. You can think of machine learning and deep learning as the enablers of AI.

[Figure: AI overview]

Data transformation and processing

ATG is a Google Cloud Build partner. We utilise a microservices architecture and best-of-breed Google Cloud Analytics tools, including Google BigQuery, Google Dataflow, Google AI and Kubeflow, to handle any data transformation or machine learning workload with ease.