How does Divvit's Data Driven Attribution work?

We have built a machine learning attribution model using an LSTM (Long Short-Term Memory) neural network. We feed sessions (visits) and sequences (a visitor's full journey) into the network, and the network gives us a probability of purchase after each session.

We then take those probabilities and apply a weighted formula to them, to find out the effect each session had on the final decision to place an order.

Our neural network is self-learning, which means the more data we give it, the smarter (and more accurate) it gets!

So far we’ve analyzed over 1 billion events.
In academic terms, our network has its own PhD in marketing attribution.
Let's take a closer look at the process.

What happens after we input the data?

When we have put in all the data from visits and user journeys, the network analyzes 14 different factors from each. These are:

Channel group and Channel that the visit came from
What industry the ecommerce merchant is in
Date and time of the visit
Device used
Landing page of the visit
Landing page category
The visit's duration
Page categories visited
Marketing campaign
Number of pageviews for the visit
Did the visitor bounce or not
If it’s the customer’s first order or not
Order being made or not
If it’s a new or returning visitor
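
As a rough illustration, the 14 factors above could be held in a record like the one below. The class name, field names, types, and sample values are all hypothetical; the actual input schema and encoding are not described here.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import List


@dataclass
class Session:
    """One visit, described by the 14 factors (hypothetical field names)."""
    channel: str                 # channel group and channel the visit came from
    merchant_industry: str       # industry of the ecommerce merchant
    visited_at: datetime         # date and time of the visit
    device: str
    landing_page: str
    landing_page_category: str
    duration_seconds: float
    page_categories: List[str]   # page categories visited
    campaign: str                # marketing campaign
    pageviews: int
    bounced: bool
    first_order: bool            # is this the customer's first order?
    order_made: bool
    returning_visitor: bool


# A sequence is a visitor's full journey: an ordered list of sessions.
# The values below are made-up sample data.
journey = [
    Session("Paid Search / Google", "Fashion", datetime(2020, 1, 6, 9, 30),
            "mobile", "/sale", "category", 45.0, ["category"],
            "winter-sale", 2, False, False, False, False),
    Session("Email / Newsletter", "Fashion", datetime(2020, 1, 8, 20, 15),
            "desktop", "/product/123", "product", 310.0,
            ["product", "cart", "checkout"], "winter-sale", 7,
            False, True, True, True),
]
```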

The network then outputs a probability of purchase score from 0 to 1 for each session in a sequence. The higher the number, the higher the probability of purchase. Typically, the number rises steadily with each visit until it reaches a peak, after which it decreases and becomes less stable. How much the probability changes after each visit varies a lot and is affected by each individual factor listed above.
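
To make the idea of "one probability per session" concrete, here is a minimal sketch of an LSTM forward pass in plain Python. It is not our production network: the class `TinyLSTM`, the layer sizes, the three-number feature encoding, and the random (untrained) weights are all assumptions made for illustration, so the numbers it produces carry no meaning. It only shows how the network keeps a memory across visits and emits a score between 0 and 1 after each one.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(weights, vector):
    return sum(w * v for w, v in zip(weights, vector))


class TinyLSTM:
    """Single-layer LSTM with a sigmoid read-out. Untrained, biases omitted."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        width = n_inputs + n_hidden
        self.n_hidden = n_hidden
        # One weight matrix per gate, applied to [features + previous hidden state].
        self.weights = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                   for _ in range(n_hidden)]
            for gate in self.GATES
        }
        self.readout = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def predict(self, sequence):
        """Return one score in (0, 1) per session in the sequence."""
        hidden = [0.0] * self.n_hidden
        cell = [0.0] * self.n_hidden  # the long-term memory
        probabilities = []
        for features in sequence:
            joined = list(features) + hidden
            pre = {gate: [dot(row, joined) for row in rows]
                   for gate, rows in self.weights.items()}
            i = [sigmoid(v) for v in pre["input"]]
            f = [sigmoid(v) for v in pre["forget"]]
            o = [sigmoid(v) for v in pre["output"]]
            g = [math.tanh(v) for v in pre["candidate"]]
            # Forget part of the old memory, add part of the new candidate.
            cell = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, cell, i, g)]
            hidden = [ok * math.tanh(ck) for ok, ck in zip(o, cell)]
            probabilities.append(sigmoid(dot(self.readout, hidden)))
        return probabilities


# Hypothetical encoding: each session reduced to three numbers.
journey = [[0.1, 0.2, 1.0], [0.4, 0.5, 0.0], [0.8, 0.9, 0.0]]
model = TinyLSTM(n_inputs=3, n_hidden=4)
probabilities = model.predict(journey)
print(len(probabilities))  # 3, one score per session
```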

What do we do with the output data?

The final step is to apply the weighted formula to the output data. Each session in a sequence will be given a weighted score. The sum of all the weighted scores from a sequence will always equal 1. The way the scoring works is that the higher the score, the more impactful that visit was on the final purchase.

As an example, if the probability of a purchase is 20% after the first visit and 50% after the second visit, these will be weighted so the first visit had 40% impact on the order and the second had 60% impact. To decide the actual value of each visit, this impact is then multiplied by the order value of the purchase. If an order of €200 was made, the value of the first visit was €80 and the value of the second was €120.
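
The exact weighted formula is not spelled out here. One formula that reproduces the example above is to credit each visit with the increase in purchase probability it brought about (0 to 20%, then 20% to 50%, giving 20:30, or 40%:60%). The sketch below assumes that formula; the function name `attribute_order`, the rule that a drop in probability earns no credit, and the even-split fallback are all assumptions rather than a description of the real model.

```python
def attribute_order(probabilities, order_value):
    """Split order_value across visits in proportion to each visit's
    increase in purchase probability (an assumed formula)."""
    if not probabilities:
        return [], []
    gains = []
    previous = 0.0
    for p in probabilities:
        gains.append(max(p - previous, 0.0))  # assumption: a drop earns no credit
        previous = p
    total = sum(gains)
    if total == 0:
        # Assumption: split evenly when no visit raised the probability.
        weights = [1.0 / len(probabilities)] * len(probabilities)
    else:
        weights = [g / total for g in gains]  # weights always sum to 1
    values = [w * order_value for w in weights]
    return weights, values


weights, values = attribute_order([0.20, 0.50], 200.0)
print([round(w, 2) for w in weights])  # [0.4, 0.6]
print([round(v, 2) for v in values])   # [80.0, 120.0]
```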

What makes our model so good?

To start with, standard attribution models only take the position of each visit into account. With a last click model, the last click gets 100% of the value; with a first click model, the first visit receives all the value; and with any multi-touch model, the value is simply distributed according to a fixed formula. Our Data Driven Attribution model not only takes the position of a visit into consideration, it also takes 14 different parameters into account when analyzing how much value a visit actually added.
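
For comparison, the position-based models mentioned above can be written in a few lines each, because they only need to know how many visits there were. The function names are ours, and the even split (`linear`) is just one example of a fixed multi-touch formula.

```python
def _check(n_visits):
    if n_visits < 1:
        raise ValueError("a journey needs at least one visit")


def last_click(n_visits):
    """The last visit gets 100% of the value."""
    _check(n_visits)
    return [0.0] * (n_visits - 1) + [1.0]


def first_click(n_visits):
    """The first visit gets 100% of the value."""
    _check(n_visits)
    return [1.0] + [0.0] * (n_visits - 1)


def linear(n_visits):
    """One fixed multi-touch formula: every visit gets an equal share."""
    _check(n_visits)
    return [1.0 / n_visits] * n_visits


def to_values(weights, order_value):
    return [w * order_value for w in weights]


# The same two-visit, €200 order used in the example earlier.
print(to_values(last_click(2), 200.0))   # [0.0, 200.0]
print(to_values(first_click(2), 200.0))  # [200.0, 0.0]
print(to_values(linear(2), 200.0))       # [100.0, 100.0]
```

None of these results depend on what happened during the visits, which is the difference from the €80 / €120 split in the data driven example.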

How do we test the accuracy of our model?

In order to determine the accuracy of our model, we use something called the Receiver Operating Characteristic Curve, or simply ROC Curve. It is a great tool for testing a predictive model, and it helps our dev team and mathematicians further tweak and optimize ours.

Essentially, the ROC Curve tests how many predictions the model got right and how many it got wrong.

These predictions are labeled as:

True Positive
When the model correctly predicts that a visitor will convert
True Negative
When the model correctly predicts that a visitor will not convert
False Positive
When the model incorrectly predicts that a visitor will convert
False Negative
When the model incorrectly predicts that a visitor will not convert
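
The four labels can be counted with a short function. This is a sketch under stated assumptions: the function name, the 0.5 cut-off for "predicts a conversion", and the sample data are all made up for illustration.

```python
def confusion_counts(probabilities, converted, threshold=0.5):
    """Count the four outcome types for a given probability cut-off."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for p, actual in zip(probabilities, converted):
        predicted = p >= threshold  # model says "will convert"
        if predicted and actual:
            counts["TP"] += 1
        elif not predicted and not actual:
            counts["TN"] += 1
        elif predicted and not actual:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


# Made-up sample: predicted purchase probability and what really happened.
probabilities = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
converted = [True, True, True, False, False, False]
counts = confusion_counts(probabilities, converted)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```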

The ROC curve is created by plotting the ‘true positive’ rate against the ‘false positive’ rate at every possible probability threshold.

The way you quantitatively read an ROC Curve graph is by looking at the ‘Area Under the Curve’ or AUC. A score of 1 would mean perfection, while a score of 0.5 is no better than random guessing. Whilst no model can achieve perfection, the aim is to get as close to 1 as possible.
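
As a sketch of how the curve and its area can be computed, the code below sweeps the threshold over every distinct score, records the true positive rate and false positive rate at each step, and sums the area with the trapezoid rule. The function names and the sample data are made up, and the code assumes the data contains at least one converting and one non-converting visitor.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs,
    one per threshold, from the strictest threshold to the loosest."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid rule over consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up sample: predicted purchase probability and whether the visitor converted.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
labels = [True, True, True, False, False, False]
points = roc_points(scores, labels)
auc = area_under_curve(points)
print(round(auc, 3))  # 0.889
```

In this sample, one converting visitor (0.3) scored below one non-converting visitor (0.6), so 8 of the 9 possible converter/non-converter pairs are ranked correctly, which is where 8/9 ≈ 0.889 comes from.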

In general, AUC scores are classified into the following ranges:

From 0.9 to 1: excellent
From 0.8 to 0.9: good
From 0.7 to 0.8: fair
From 0.6 to 0.7: poor
From 0.5 to 0.6: fail

If you look at the graph below you’ll see that we’ve measured our predictions for two scenarios.

You’ll see that after only analyzing one session we generated an AUC of 0.872 which is considered good. Once we’ve had the opportunity to analyze multiple sessions though, our predictions become even more accurate and achieve an excellent score of 0.93. This is in line with logic, which would dictate that with more visits to analyze, the model will be able to more reliably predict the probability of a purchase.

Whilst these are very good scores, our attribution team is never satisfied and is constantly testing the model to find opportunities for improvement.