Graphs in Digital Marketing Optimization
Digital marketing optimization is both big business and big data, exploiting the whole arsenal of machine learning and big data engineering. Still, on a UI and conceptual level, data visualization remains crucial for understanding and tuning touchpoints. Sankey diagrams, if shaped appropriately, compellingly convey the interaction flow and bring forward the crux of your digital marketing. We demonstrate how the yFiles diagramming library can articulate this and help you implement your own vibrant marketing visualization.
Once upon a time, all roads led to Rome. Today Rome is still a popular holiday destination, and many roads, though not all, lead to it. Many roads lead elsewhere, and unless you have Rome as your destination, you can end up pretty much anywhere in the world. Imagine, for a moment, that you did not decide on the destination of your yearly holiday in advance. Where would you end up? Not in a random place, because your preferences, say cultural or culinary ones, define every next step. At each step, you make a choice, and this directs you towards some city or point of interest. Let’s presume Rome is the ultimate destination given your preferences. How would you end up in Rome without consciously deciding to go there (or even knowing about it)? Without any intermediate pointers, hints, or references to Rome, you would never get there. Clues and enticing elements need to point you towards it. This does not mean you would travel via the optimal roads or series of intermediate cities. In fact, you can end up somewhere completely different if the clues are not positioned well and the directive elements do not seduce you. Tourism is in a way precisely this: giving people lots of hints and hoping they buy it.
Imagine you need a new car. How do you proceed? You have some preferences, a budget, and some ideas. Visiting a vendor’s website is a good move, of course. How many pages of the site will you visit? If the website is well-designed and answers your questions well, you will dig deeper and potentially ask for an offer. In marketing terms, the website should guide you towards a key page where you get converted. Conversion, in this sense, can be many things: downloading a white paper, buying something, sending a mail, signing up, and so on.
In many cases, conversion means buying. A good e-commerce website manages to guide you via intermediate pages and links towards conversion: special promotions, banners, shouting hyperlinks, pretty pictures, etc., anything and everything to make you buy the product. The destination is 'buying', and the intermediate clues are all there to entice you towards the conversion. Just like in our example of Rome above, you had not decided to get converted when you first landed somewhere on the website, but the site pushed you in that direction. Digital marketing optimization is all about converting visitors and finding the right clues and hints to do so.
Let’s focus on buying a digital product via a website. The analogy with roads and cities is valid in the following sense:
Cities are web pages, and there is a particular series of pages called the 'check-out process.' There may be a single or multiple pages, but visitors are only effectively converted when they have pressed the 'commit' or 'confirm' button.
Hyperlinks, buttons, and other elements leading to another page are direction signs along the roads. In marketing terms, one speaks of touchpoints: something the user interacts with on the way to conversion.
A path consists of a series of intermediate pages or parts of a page.
Optimal conversion corresponds to the most efficient road towards a city. In a way, your GPS gives you the optimal conversion from where you stand to your destination.
Alternative roads and points of interest are only valuable if they manage to keep you on the way to your destination. For instance, it’s unlikely that while visiting Paris, you get to see direction signs to Rome. Similarly, engaging website visitors to watch videos unrelated to the product is unlikely to seduce them to buy it. Typically, forums and documentation sites are valuable for existing customers but usually do not have a clear path towards buying something. This does not mean that nobody gets convinced about a product via documentation, but that the path from being convinced to effectively buying it should be as smooth as possible. Creating an account, confirming your address, broken links, server issues… these are the little details that can make or break conversion.
Note that even during the check-out process, one can exit. For this reason, the check-out pages do not link to any other pages: this discourages exiting and pushes visitors to finish the process, a bit like forcing tourists to enter the museum when standing in front of it.
How does one optimize conversion? This is a business and research domain on its own, and there are well-known proven approaches: A/B testing, search optimization (indexing), banners, and ads. Google Analytics is often used during Social Media Optimization (SMO) to analyze what visitors do (or try to do). The more tedious and expensive approach goes via data gathering and analysis. That is, one collects as much data from touchpoints as possible and tries to 'see' where visitors get lost on the way to conversion. Once the inadequate exit points and possible shortcuts to a faster conversion are identified, the website is altered, and new data is collected to 'prove' the newly discovered insights. A/B testing is nothing but assigning different alternatives to different visitors and finding out which alternative performs best. When an optimized strategy is found via data analysis, it’s often tested through an A/B process to show that the hypothesis holds in the real world and not just theoretically.
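As a minimal sketch of the A/B idea, the following assigns each visitor to one of two alternatives and compares conversion rates. The hash-based bucketing, the experiment name, and the visitor identifiers are assumptions made for illustration, not something prescribed by any particular analytics tool. A real test would also need a statistical significance check before declaring a winner.

```python
import hashlib

def assign_variant(visitor_id, experiment="checkout-button"):
    """Deterministically map a visitor to variant 'A' or 'B'.

    Hashing the (hypothetical) experiment name together with the visitor id
    keeps the assignment stable across visits without storing any state.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

def conversion_rates(visits):
    """visits: iterable of (visitor_id, converted) pairs.

    Returns the observed conversion rate per variant.
    """
    seen = {"A": 0, "B": 0}
    converted = {"A": 0, "B": 0}
    for visitor_id, did_convert in visits:
        variant = assign_variant(visitor_id)
        seen[variant] += 1
        converted[variant] += int(did_convert)
    return {v: (converted[v] / seen[v] if seen[v] else 0.0) for v in seen}

# Made-up log of visits: (visitor id, did the visit end in a conversion?)
log = [("u1", True), ("u2", False), ("u3", False), ("u4", True)]
print(conversion_rates(log))
```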
Outside the conventional basic approaches to optimization, one finds well-guarded corporate secrets involving approaches like portfolio optimization techniques, multi-channel attribution modeling, hidden Markov models, and much more. By combining social network data and graph analysis, one can go beyond all this in what is often referred to as the 360-degree approach. One tries to capture as much as possible about products and potential buyers on all available channels: mail, social media, account, buying history, anything. You do not need to include social network data to see how graphs are an integral part of marketing optimization, however.
Just as roads and cities form a graph, so do web pages and touchpoints. The links between touchpoints correspond to transitions, and the more visitors make a transition, the heavier the corresponding link. Hence, the corresponding data model is a weighted, directed graph. The graph is by no means acyclic: visitors can go back and forth between two pages or end up on a previously visited page after a long detour. Note that a heavy link is not necessarily an important link towards conversion. It is simply used a lot.
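To make the data model concrete, here is a small sketch that turns recorded visitor sessions into a weighted, directed graph. The session data and the touchpoint names are invented for illustration, and the representation (a dictionary keyed by source and target) is just one convenient choice.

```python
from collections import Counter

# Hypothetical click-stream: each session is the ordered list of touchpoints
# that one visitor passed through.
sessions = [
    ["home", "product", "cart", "checkout"],
    ["home", "blog", "product", "cart"],
    ["product", "cart", "product", "cart", "checkout"],
]

def count_transitions(sessions):
    """Return {(source, target): number of observed transitions}."""
    counts = Counter()
    for path in sessions:
        # zip pairs every touchpoint with its successor in the session
        for source, target in zip(path, path[1:]):
            counts[(source, target)] += 1
    return counts

edge_weights = count_transitions(sessions)
for (source, target), weight in sorted(edge_weights.items()):
    print(f"{source} -> {target}: {weight}")
```

The third session goes back and forth between the product and cart pages, so the resulting graph contains a cycle, as described above.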
As a side-note, the transition from a big dataset to a graph is conceptually simple but technically often a big-data challenge. Streaming info about thousands of touchpoints from thousands of visitors demands a special infrastructure. Storing the data, transforming it (ETL), and analyzing it usually involves Hadoop-like techniques leading to large budgets and dedicated teams.
Assuming this huge graph is available, how does one proceed? What does it take to find the optimal routes towards conversion(s)? Because visitors can enter a site via any page and transition in endless ways, the task of finding optimal routes is not as simple as finding the shortest paths or spanning trees in the graph. The sheer size of the graph usually makes this approach intractable. A possible road ahead goes via some abstract constructs and simplifications:
A graph representing transitions is a representation of a so-called Markov chain, and looking at visitor behavior in this fashion means you can use lots of algorithms based on Markov chains.
By cutting off the less probable transitions, one can simplify things a lot. For example, one typically does not care about visits to the 'terms and conditions' page.
Parts of the graph that are disconnected from the component containing the conversion touchpoint, paths not leading to conversion, loops, and cycles can all be discarded since they do not contribute to hot-paths.
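One of these simplifications, discarding everything that cannot lead to conversion, can be sketched with a backward breadth-first search from the conversion touchpoint. The edge data and the node names below are assumptions for illustration only.

```python
from collections import defaultdict, deque

def nodes_reaching(edges, target):
    """Return the set of nodes from which `target` is reachable."""
    predecessors = defaultdict(set)
    for source, dest in edges:
        predecessors[dest].add(source)
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        # walk the edges backwards, starting from the conversion touchpoint
        for prev in predecessors[node]:
            if prev not in seen:
                seen.add(prev)
                queue.append(prev)
    return seen

# Hypothetical transition graph: (source, target) -> weight
edges = {
    ("home", "product"): 0.6,
    ("home", "forum"): 0.4,
    ("product", "checkout"): 0.3,
    ("forum", "docs"): 0.7,
    ("docs", "forum"): 0.5,
}

keep = nodes_reaching(edges, "checkout")
# keep only edges whose endpoints can both still lead to conversion
pruned = {(s, t): w for (s, t), w in edges.items() if s in keep and t in keep}
print(sorted(pruned))
```

In this made-up example the forum and documentation pages only link to each other, so they drop out of the graph together with the edges touching them.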
At this point, the following insight helps the most:
Replacing the large amount of data gathered from every single transition with the probability of making a transition from one touchpoint to another makes the problem much more manageable.
The result is a weighted directed graph with a much lower number of edges. Its edge weights correspond to the transition probabilities and are a direct mirror of people’s behavior on the website and, as such, not just some abstract theoretical construct.
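A minimal sketch of this reduction, assuming the raw data has already been aggregated into transition counts per pair of touchpoints (the counts below are made up): each count is divided by the total number of transitions leaving its source touchpoint. Visitors who leave the site are not modeled here; one way to include them, again an assumption, would be an explicit 'exit' node.

```python
from collections import defaultdict

def transition_probabilities(counts):
    """Normalize the outgoing counts of every touchpoint to probabilities."""
    totals = defaultdict(int)
    for (source, _target), n in counts.items():
        totals[source] += n
    return {(s, t): n / totals[s] for (s, t), n in counts.items()}

# Hypothetical aggregated counts: (source, target) -> number of transitions
counts = {
    ("home", "product"): 60,
    ("home", "blog"): 40,
    ("product", "cart"): 25,
    ("product", "home"): 75,
}

probs = transition_probabilities(counts)
for edge, p in sorted(probs.items()):
    print(edge, round(p, 2))
```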
The crucial parameter in all of this is the cut-off threshold: the probability above which a transition is included. The lower the threshold, the more paths you allow. The higher the threshold, the more you focus on the most critical paths (hot-paths).
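The effect of the threshold can be sketched in a few lines. The probabilities and the two threshold values are arbitrary, chosen only to show how a higher threshold leaves fewer transitions.

```python
def filter_by_threshold(probs, threshold):
    """Keep only transitions whose probability is at least `threshold`."""
    return {edge: p for edge, p in probs.items() if p >= threshold}

# Hypothetical transition probabilities: (source, target) -> probability
probs = {
    ("home", "product"): 0.6,
    ("home", "terms"): 0.05,
    ("product", "cart"): 0.25,
    ("cart", "checkout"): 0.5,
}

fine = filter_by_threshold(probs, 0.1)    # low threshold: more paths survive
coarse = filter_by_threshold(probs, 0.4)  # high threshold: only the hottest links
print(len(fine), len(coarse))
```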
The hot-paths can be deduced from the resulting graph by treating the search as a weighted path problem. One looks for the heaviest paths, that is, the paths with a high probability of reaching the conversion node. The conversion node is considered as an endpoint, and all heavy paths from the other nodes are considered. The resulting set of paths does not form a tree, as one might initially think, since any touchpoint can lead to any other and can itself be an intermediate one. Instead, the set of paths forms a subgraph of the initial graph (Markov chain) describing the best possible paths towards conversion given the threshold value.
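One common way to find a heaviest path, sketched here under the assumption that a path's weight is the product of its transition probabilities, is to replace every probability p by the cost -log(p) and run a shortest path search: the smallest total cost then corresponds to the largest product. The sketch runs Dijkstra's algorithm backwards from the conversion node and keeps only the single heaviest path per touchpoint, which is a simplification of the 'all heavy paths' described above. The graph data is invented.

```python
import heapq
import math
from collections import defaultdict

def heaviest_paths(probs, conversion):
    """Map every node that can reach `conversion` to
    (probability of its heaviest path, next hop on that path)."""
    incoming = defaultdict(list)
    for (source, target), p in probs.items():
        if p > 0:
            # -log(p) is non-negative for 0 < p <= 1, as Dijkstra requires
            incoming[target].append((source, -math.log(p)))

    cost = {conversion: 0.0}
    next_hop = {}
    heap = [(0.0, conversion)]
    while heap:
        c, node = heapq.heappop(heap)
        if c > cost.get(node, math.inf):
            continue  # stale heap entry, a cheaper route was already found
        for prev, w in incoming[node]:
            new_cost = c + w
            if new_cost < cost.get(prev, math.inf):
                cost[prev] = new_cost
                next_hop[prev] = node
                heapq.heappush(heap, (new_cost, prev))
    return {n: (math.exp(-cost[n]), next_hop[n]) for n in next_hop}

# Hypothetical transition probabilities
probs = {
    ("home", "product"): 0.5,
    ("home", "pricing"): 0.3,
    ("product", "checkout"): 0.2,
    ("product", "pricing"): 0.4,
    ("pricing", "checkout"): 0.6,
}

hot = heaviest_paths(probs, "checkout")
for node, (p, hop) in sorted(hot.items()):
    print(f"{node}: next hop {hop}, path probability {p:.2f}")
```

In this example the direct link from the product page to check-out (0.2) loses against the detour via the pricing page (0.4 × 0.6 = 0.24), which illustrates that the hottest path is not necessarily the one with the fewest steps.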
Decision-makers, executives, and marketers usually do not get excited by the latter statement, and this is where data visualization comes in. One could represent things via standard graphs, but the Sankey representation is particularly well-suited because
the horizontal flow emphasizes the motion from incoming visitors (on the left) flowing towards conversion (on the right),
the thickness of the bands is an indication of the strength of a transition,
the colors of the bands can be used to indicate the hotness of a path: how fast a touchpoint leads to conversion.
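How such visual attributes might be derived from the graph can be sketched independently of any drawing library. In the sketch below, band thickness is proportional to the transition probability, and the color depends on how many further steps separate the band's target from conversion. The palette, the maximum width, and the data are all assumptions for illustration, and no yFiles API is used.

```python
from collections import defaultdict, deque

def hops_to_conversion(edges, conversion):
    """Fewest number of transitions from each node to `conversion`."""
    incoming = defaultdict(list)
    for source, target in edges:
        incoming[target].append(source)
    hops = {conversion: 0}
    queue = deque([conversion])
    while queue:
        node = queue.popleft()
        for prev in incoming[node]:
            if prev not in hops:
                hops[prev] = hops[node] + 1
                queue.append(prev)
    return hops

def sankey_bands(probs, conversion, max_width=40.0):
    """Describe each band by source, target, width, and color."""
    hops = hops_to_conversion(probs, conversion)
    palette = ["red", "orange", "gold"]  # hottest first, hypothetical colors
    bands = []
    for (source, target), p in probs.items():
        if target not in hops:
            continue  # this band cannot lead to conversion
        color = palette[min(hops[target], len(palette) - 1)]
        bands.append({"source": source, "target": target,
                      "width": p * max_width, "color": color})
    return bands

# Hypothetical transition probabilities
probs = {
    ("home", "product"): 0.5,
    ("product", "checkout"): 0.2,
    ("home", "forum"): 0.5,
}

bands = sankey_bands(probs, "checkout")
for band in bands:
    print(band)
```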
Of course, such a Sankey diagram is often not the end of the story but part of a bigger picture. It can be complemented by other statistical techniques, and might contradict or corroborate other findings.
yFiles is a commercial programming library designed explicitly for graph visualization and is a perfect fit for the generation of Sankey diagrams. Among other things, yFiles provides a sophisticated implementation of a layered graph layout algorithm that is well suited for small and large graphs and can be easily configured for drawing Sankey diagrams.
Compared to ready-to-use Sankey drawing tools, a custom marketing visualization application built with yFiles can, for example,
automatically connect to your data source,
provide tailored filtering for your data,
use the library’s powerful graph analysis algorithms,
integrate into your workflow, and more.
The Digital Marketing Optimization demo application that accompanies this use case contains a sample dataset: a probability graph of the transitions on a fictitious company website. The data is representative of a typical organization selling digital goods. Note that the reduction from the original data to this graph is not part of the implementation.
In the application, users can specify the threshold of the transition probability, giving a coarser or finer set of hot-paths. After each change, the transition into the new layout preserves as much as possible of the previous state, enabling users to retain their mental map of the data.
The source code of the Digital Marketing Optimization demo application is available on GitHub. See the included readme file for usage instructions and implementation notes.