A core responsibility of Digital Analytics is feeding conversion data back to advertising tools and their respective algorithms via Conversion Tracking. Usually, a tracked conversion has a value assigned to it that represents its worth to the advertiser. In most cases this is a static value for a lead, or the revenue generated by an online sale. When building a more sophisticated setup, a common improvement is to send a more precise value to the advertising network, for example a sale's margin instead of the generated revenue, or a more carefully calculated value for a lead. But no matter how precise the value itself is, it does not address one core issue of Conversion Tracking: Will the purchased product be returned? Will the order be cancelled? Will the tracked lead turn into actual business? Unfortunately, these questions cannot be answered within the time frame that advertising networks allow for Conversion Tracking (e.g. 3 days). To tackle this issue, I will demonstrate how to create a Logistic Regression model in BigQuery and deploy it to GCP's Vertex AI. In Part 2, I will go through the sGTM setup required to adjust the regular conversion value before sending it to your advertising network.

Create a Model in Google BigQuery

BigQuery offers a SQL interface to create a variety of machine learning models with a wide range of options. In this case, I will show how to build a model that predicts a lead's probability of converting into a sale. Since we are trying to create a value between 0 (no sale) and 1 (sale), Logistic Regression is a good model type to work with.

The first thing you need to do is look at the data at your disposal and anticipate which parameters could have an influence on a lead's conversion. This can be details within the lead form, the number of fields that were left empty, session or product-page duration and so on. In addition, there are rather generic parameters you can use in many cases, like hour of the day, day of the week, device category, new vs. returning user and so on. While building the model, you can always add or remove parameters depending on their contribution.

Once you have a specific set of parameters in mind, the arguably most important step is up next: data cleansing. Filter out debug as well as internal traffic, choose the correct data type for every parameter, and watch out for classics like upper/lower case, undefined values and bot traffic. In addition, think about every parameter in terms of what you want to signal to your model. Do you want the day of the week to be a string ("monday") or an integer (1)? The latter signals an ordered relationship between the days. Or do you want to go for a boolean that describes whether the conversion happened on a weekend? Last but not least, you need to join lead-completion data onto your tracked leads, so that you end up with a boolean column describing whether a lead turned into a sale or not.

When all your data is cleaned and structured, load it into a BigQuery table as training data. Now let's start creating the model! After running a CREATE MODEL statement like the one sketched below, BigQuery will train the model and save it as 'full_conversion_predicition' in the dataset 'ml_test'.
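The exact SQL depends on the columns you prepared; a minimal sketch of such a statement could look like the following. The dataset ml_test and the model name come from the text above, while the training table lead_training_data and the feature columns (device_category, weekday, is_returning_user, session_duration, converted_to_sale) are hypothetical placeholders for whatever parameters you selected during data cleansing.

```sql
-- Minimal sketch: train a logistic regression model on the prepared training table.
-- Table and column names are placeholders for your own cleaned data.
CREATE OR REPLACE MODEL `ml_test.full_conversion_predicition`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['converted_to_sale']  -- 1 = lead turned into a sale, 0 = it did not
) AS
SELECT
  device_category,
  weekday,
  is_returning_user,
  session_duration,
  converted_to_sale
FROM
  `ml_test.lead_training_data`;
```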
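Once the model has been trained, it can be applied to new leads with ML.PREDICT. Again just a sketch, assuming a hypothetical table ml_test.new_leads that contains the same feature columns plus a lead_id:

```sql
-- Minimal sketch: score new leads with the trained model.
SELECT
  lead_id,
  predicted_converted_to_sale,        -- predicted label (0 or 1)
  predicted_converted_to_sale_probs   -- array with the probability per label
FROM
  ML.PREDICT(
    MODEL `ml_test.full_conversion_predicition`,
    (
      SELECT lead_id, device_category, weekday, is_returning_user, session_duration
      FROM `ml_test.new_leads`
    )
  );
```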
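And looking ahead to the deployment step described below: once you are happy with the model, the export to Cloud Storage can also be written in SQL with an EXPORT MODEL statement instead of using the export button in the UI. Again a sketch, where the bucket path is a placeholder:

```sql
-- Minimal sketch: export the trained model to a (placeholder) Cloud Storage bucket.
EXPORT MODEL `ml_test.full_conversion_predicition`
OPTIONS (URI = 'gs://your-bucket/models/full_conversion_predicition');
```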
After diving a little deeper into BigQuery and its machine learning capabilities, you are probably going to run the CREATE MODEL statement a few more times with updated model options until you get the result that works best for you. You can then apply your model with an ML.PREDICT query like the one sketched above. Its result contains both a probability (a float between 0 and 1) and a predicted label (0 or 1) for every tracked lead. If you want to learn more about machine learning in BigQuery, feel free to check out the documentation.

When you have a model that meets all of your requirements and is precise enough to predict a lead's conversion correctly most of the time, you can export it from BigQuery to Google Cloud Storage using the "export model" button at the top of the model's details page (or with the EXPORT MODEL statement sketched above).

Once the model has been exported successfully, you can go to Vertex AI. There you can import a model from Cloud Storage (after activating all required APIs) by going to the "model registry" page and clicking the "import" button. After importing the model, go to the "online prediction" page and set up an HTTP endpoint for your model.

Assuming you did all of the above correctly, you now have a model hosted on Vertex AI that can be applied to individual leads as well as batches via HTTP. This will allow us to apply the model to real-time data in our GTM server container. In the second part, we are going to take a look at the sGTM setup that lets us apply the model in real time to tracking data, so stay tuned! 🙂

Helpful links: