Ramon Ozod-Seradj

Elevate your Conversion-Tracking using sGTM & Machine Learning | Part 3

Vertex AI is great for hosting BigQuery-developed models – especially when you want to do both individual and batch predictions. But sometimes all you need is a batch prediction once a day or so to enrich existing conversion data before uploading it to an advertising network. Especially since the launch of Google Ads Data Manager (I know, it is still buggy), batch predictions in BigQuery on conversion data have become a viable option for creating ML-based feedback for advertising algorithms.

Why use BigQuery instead of Vertex AI? The advantages of using BigQuery for batch predictions are obvious: no extra GCP service, no Vertex AI costs, less governance, no extra access management, and so on. That's why you should always evaluate your requirements first instead of just adopting a new service and creating unnecessary costs.

Using BigQuery batch predictions

To do so, let's take a look at the model that we created in the first post: ml_test.full_conversion_prediction. For the purpose of this post, let's assume we have access to the GCLID (Google Ads click identifier) for every lead. So this time, instead of just running the prediction, we want to add the GCLID to all the leads that have GCLID data available. After running the prediction, we unnest the result to create a flattened table, which can be used for conversion uploads (a sketch of such a query follows below). In the result table you can then multiply the conversion value with the lead's conversion probability to calculate a more accurate conversion value.

There are several resources on Google Ads Data Manager and how to use it (e.g. the documentation by Google and the appropriate rant by Lukas Oldenburg), so I won't dive into that topic. I hope this is helpful on your journey to more sophisticated MarTech setups 🙂
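A minimal sketch of what such a batch-prediction query could look like, assuming the model from Part 1 was trained on a 0/1 label column named converted_to_sale (so ML.PREDICT returns an array column predicted_converted_to_sale_probs) and assuming a leads table ml_test.leads with gclid and conversion_value columns; none of these names are spelled out in the post:

    SELECT
      pred.gclid,
      pred.conversion_value,
      probs.prob AS conversion_probability,
      -- more accurate upload value: tracked value weighted by the conversion probability
      pred.conversion_value * probs.prob AS adjusted_conversion_value
    FROM
      ML.PREDICT(
        MODEL `ml_test.full_conversion_prediction`,
        (SELECT * FROM `ml_test.leads` WHERE gclid IS NOT NULL)
      ) AS pred,
      -- flatten the array of (label, prob) structs returned by ML.PREDICT
      UNNEST(pred.predicted_converted_to_sale_probs) AS probs
    WHERE
      probs.label = 1

The resulting flat table (GCLID, probability, adjusted value) can then be used for the conversion upload.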


Elevate your Conversion-Tracking using sGTM & Machine Learning | Part 2

After reading this series' first part and applying everything explained there, you are probably wondering what to do with that super fancy, high-performing ML model. Well, the answer is: we are going to query it via HTTP every time a lead event arrives at the sGTM container. To do so, just follow these simple steps:

Import the variable template

I created this template based on a solution engineered by Google. It keeps the data-type flexibility of Google's solution while being customized to return a single float that either represents your model's return value (the conversion probability) or a value that has been multiplied with it (e.g. the lead's value multiplied with its probability to convert).

My template on GitHub
Google's original template

Configure the template according to your needs

First, configure all of the template's fields with your Vertex AI model data, like region, ID and so on. Then add all the model parameters to the request data table within the template. You can convert your data to either integer or float by adding a suffix to the corresponding key (either "_int" or "_num"). You can skip the "event list" field for now; I will get back to it in the next step.

Configure the Transformation

In my opinion, sGTM's Transformations are the optimal way to set up API queries, since they give you full control over which tags get access to which data, while at the same time avoiding one API call for every tag that requires that data. But Transformations do have one issue: they are resolved every time the runContainer() API is called, even when their rules do not apply and the Transformation itself is therefore not applied to the event data. To avoid querying Vertex AI for every single incoming event when we actually only want to run the query for one or two, I've added the "event list" field. Just add a comma-separated list of all the events for which the query should run (a small sketch of this gating logic follows at the end of this post).

Test & deploy

Testing thoroughly is always important, so don't be lazy!

Last words

There we go! Now you can officially call yourself an analyst who has built a machine learning model, deployed it and applied it in production. I hope you enjoyed this read!
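For illustration, the event-list gating inside the variable could conceptually look like the following sandboxed-JavaScript sketch; the field name data.eventList is an assumption, not necessarily what the template calls it:

    const getEventData = require('getEventData');

    // data.eventList stands in for the "event list" template field,
    // e.g. "generate_lead,qualified_lead".
    const allowed = (data.eventList || '').split(',');
    const eventName = getEventData('event_name');

    let match = false;
    for (let i = 0; i < allowed.length; i++) {
      if (allowed[i].trim() === eventName) {
        match = true;
      }
    }

    // Skip the Vertex AI request entirely for events that are not listed.
    if (!match) {
      return undefined;
    }
    // ...otherwise build the request body and query the Vertex AI endpoint here.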


Elevate your Conversion-Tracking using sGTM & Machine Learning | Part 1

A core responsibility of Digital Analytics is providing feedback to advertising tools and their respective algorithms via Conversion-Tracking. Usually, a tracked conversion has a value assigned to it that represents its value to the advertiser. In most cases this is a static value for a lead or the revenue that was generated with an online sale. When trying to create a more sophisticated setup, a common improvement is to send a more precise value to the advertising network – for example a sale's margin instead of generated revenue, or a more precisely calculated value for a lead. But no matter how precise the value itself is, it does not address one core issue with Conversion-Tracking: Will the purchased product be returned? Will there be a cancellation? Is the tracked lead going to convert into business? Unfortunately, these questions cannot be answered within the time frame for Conversion-Tracking given by advertising networks (e.g. 3 days).

To tackle this issue, I will demonstrate how to create a Logistic Regression model in BigQuery and deploy it to GCP's Vertex AI. In Part 2, I will go through the required sGTM setup to manipulate the regular conversion value before sending it to your advertising network.

Create a Model in Google BigQuery

BigQuery offers a SQL interface to create a variety of machine-learning models with a wide range of options. In this case, I will show how to build a model that predicts a lead's probability of converting into a sale. Since we are trying to create a value between 0 (no sale) and 1 (sale), Logistic Regression is a good model type to work with.

The first thing you need to do is look at the data at your disposal and anticipate which parameters could have an influence on a lead's conversion. This can be details within the lead form, the number of fields that have not been filled, session or product-page duration and so on. In addition, there are rather generic parameters you can use in many cases, like hour of the day, day of the week, device category, new vs. existing user and so on. While building the model, you can always add or remove parameters depending on their contribution.

Once you have a specific set of parameters in mind, the arguably most important step is up next: data cleansing. Filter out debug as well as internal traffic data, choose the correct data type for every parameter and so on. Watch out for classics like upper-/lowercase, undefined values and bot traffic. Also think about every parameter in terms of what you want to signal to your model. Do you want day of the week to be a string ("monday") or an integer (1)? The latter signals a steady relationship across rising/declining days. Or do you want to go for a boolean that describes whether the conversion happened on a weekend? Last but not least, you need to join lead-completion data onto your tracked leads, so that you have a boolean column that describes whether a lead has resulted in a sale or not.

When you have all your data cleaned and structured, you should load it into a BigQuery table as training data. Now let's start creating the model! Running a CREATE MODEL statement (a hedged sketch follows below) makes BigQuery train the model and save it as 'full_conversion_prediction' in the dataset 'ml_test'.
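A hedged sketch of such a CREATE MODEL statement, plus the simplest way to apply the model afterwards; the training table ml_test.lead_training_data, its feature columns and the 0/1 label column converted_to_sale are illustrative assumptions, not the post's exact code:

    -- Train a logistic regression model on the prepared training data.
    CREATE OR REPLACE MODEL `ml_test.full_conversion_prediction`
    OPTIONS (
      model_type = 'logistic_reg',
      input_label_cols = ['converted_to_sale']  -- 0/1 flag from the lead-completion join
    ) AS
    SELECT
      hour_of_day,
      day_of_week,
      device_category,
      is_existing_user,
      converted_to_sale
    FROM `ml_test.lead_training_data`;

    -- Apply the trained model to a table of new leads.
    SELECT *
    FROM ML.PREDICT(
      MODEL `ml_test.full_conversion_prediction`,
      TABLE `ml_test.new_leads`
    );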
After diving a little deeper into BigQuery and its machine-learning capabilities, you are probably going to run this a few more times with updated model options until you get the result that works best for you. You can then apply your model with an ML.PREDICT query as sketched above. Its result contains both a probability (a float between 0 and 1) and a predicted label (0 or 1) that is meant to predict the tracked lead's outcome. If you want to learn more about BigQuery machine learning, feel free to check out the documentation.

When you have a model that meets all of your requirements and has the necessary precision to predict a lead's conversion correctly most of the time, you can export the model from BigQuery to Google Cloud Storage using the "export model" button above the model's details page in BigQuery's interface. When the model has been exported successfully, you can go to Vertex AI. There you can import a model from Cloud Storage (after activating all required APIs) by going to the "model registry" page and clicking the "import" button. After importing the model, you can go to the "online prediction" page and set up an HTTP endpoint for it.

Assuming you did all of the above correctly, you now have a model hosted on Vertex AI that can be applied to both individual leads and batches via HTTP. This will help us apply the model to real-time data in our GTM server container. In the second part, we are going to take a look at the sGTM setup that lets us apply our model in real time to tracking data, so stay tuned! 🙂

Helpful links:


Automate your DataLayer tests with Selenium & Python

Let's face it: nobody enjoys DataLayer quality checks on a regular basis. Most of us prefer developing a cool new feature or diving deeply into a new analysis. But to be able to do all of this while producing value, data quality is a must ("shit in, shit out"). Thus, it is crucial to know whether all the pages in your CMS have the correct page attributes pushed into the DataLayer. To automate the above and other test cases, I've developed a handful of Python modules for such quality-assurance tasks. The only prerequisite is that you have Python installed on your machine (Python 3.10 or above).

The Setup:

The following function retrieves the sitemap.xml file and returns a list of all URLs included. Optionally, you can use the limit argument to cut the list's length (a combined sketch of these utilities follows at the end of this section).

The next thing you want to do is take this list of URLs and use Selenium to open each webpage and retrieve the DataLayer. This part is a little more tricky and needs to take some optional actions into account. The url argument is obvious – it's the webpage's URL that you want to visit to retrieve the DataLayer object from. The index tells the function which occurrence of a specific DataLayer event you want to retrieve: if you have multiple scroll events and you want to check the DataLayer for the first one, the index is 0; for the third one it's 2. The event argument is None by default. In this case, you get the object from the DataLayer that matches the index argument. If you want to check e.g. page information that is populated on load, you can often leave the event as it is and set the index to 0, as this information is often the first element.

The navigation_steps argument is the fun part. It takes a list of instructions that Selenium shall execute to simulate user behavior on your webpage. You can click things, scroll through the page or even submit a form. Such a list could, for example, instruct Selenium to click a button in the consent manager, scroll to the page's footer and then scroll back to the H1 (see the navigation_steps example in the sketch below). The "wait" key passes the maximum number of seconds Selenium is allowed to wait for the element to appear on the page. My function only includes clicking and scrolling as possible actions, but feel free to add more. The last two arguments instruct Selenium to wait for your CMP to be loaded and visible – just pass the CSS selector for an element within the CMP in the cmp_selector argument, and that's it.

To check for missing keys, you can use a small helper function: all you need to do is define a list of keys that are mandatory in your DataLayer, and the function returns those that are missing as a list.

Before we put it all together, we need to add a way to send an email (e.g. to yourself or to someone managing your website). Of course, you could use another communication tool, such as Slack or Microsoft Teams webhooks. But since e-mail is commonly available, I am sticking to it this time. A generic function can send an e-mail using the SMTP credentials of your mail server; there are several posts/tutorials on how to get these for almost any provider.
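A minimal sketch of what the sitemap helper, the missing-key check and a navigation_steps list could look like; the function names, the dictionary keys and the use of the third-party requests library are assumptions based on the description above, not the exact code from the repository:

    import xml.etree.ElementTree as ET

    import requests


    def get_sitemap_urls(sitemap_url: str, limit: int | None = None) -> list[str]:
        """Download sitemap.xml and return the contained URLs, optionally truncated."""
        response = requests.get(sitemap_url, timeout=30)
        response.raise_for_status()
        root = ET.fromstring(response.content)
        # Sitemap entries live in <url><loc> nodes within the sitemap namespace.
        ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
        urls = [loc.text for loc in root.findall(".//sm:loc", ns)]
        return urls[:limit] if limit else urls


    def find_missing_keys(datalayer_obj: dict, required_keys: list[str]) -> list[str]:
        """Return all required keys that are absent from the given DataLayer object."""
        return [key for key in required_keys if key not in datalayer_obj]


    # Example navigation_steps list: accept the CMP, scroll to the footer, scroll back up.
    navigation_steps = [
        {"action": "click", "selector": "#consent-accept", "wait": 10},
        {"action": "scroll", "selector": "footer", "wait": 5},
        {"action": "scroll", "selector": "h1", "wait": 5},
    ]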
So let's put it all together. First, we take all the above functions (except the one for e-mails) and put them into a functions.py file as a collection of utility functions. Then we create a file send_mail.py and add the mail function as well as all necessary imports to it. Lastly, we create main.py to tie everything together (a minimal sketch follows at the end of this post). When you run main.py, Selenium opens the first 10 pages in your sitemap.xml file (due to the limit argument being 10), checks for the existence of the defined DataLayer keys, saves the missing ones to a file and sends it via e-mail.

I hope this helps you on your journey to automating DataLayer QAs! Feel free to check out the GitHub repo with all files: https://github.com/ramonseradj/static_cms_qa_public
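A sketch of what such a main.py could look like; the imported function names (get_datalayer, send_mail), their signatures and the required-key list mirror the description above but are assumptions, not the repository's exact code:

    # main.py – orchestration sketch
    import json

    from functions import get_sitemap_urls, get_datalayer, find_missing_keys
    from send_mail import send_mail

    REQUIRED_KEYS = ["page_type", "page_language", "login_status"]  # illustrative keys

    results = {}
    for url in get_sitemap_urls("https://www.example.com/sitemap.xml", limit=10):
        # Retrieve the first DataLayer object pushed on page load (index 0, no event filter).
        datalayer_obj = get_datalayer(url, index=0, event=None)
        missing = find_missing_keys(datalayer_obj, REQUIRED_KEYS)
        if missing:
            results[url] = missing

    if results:
        with open("missing_keys.json", "w", encoding="utf-8") as f:
            json.dump(results, f, indent=2)
        send_mail(
            subject="DataLayer QA: missing keys found",
            body=json.dumps(results, indent=2),
            recipient="webmaster@example.com",
        )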


BigQuery User-Defined-Function for mapping tables

In this post I'll explain how to use a User-Defined Function ("UDF") to give analysts access to small mapping and dimension tables without using a SQL JOIN, making the procedure more reliable and standardized. Let's say you have two tables: an event table and a campaign-mapping table. The goal is to give reliable access to both tables while minimizing the risk of poorly written JOINs, or of differently written ones across SQL operations. You can avoid that by creating a UDF that takes the JOIN column as an argument and returns the corresponding row as a STRUCT. This way, you can complement your event data (table 1) with campaign information reliably, without performing a JOIN. In addition, you ensure that the campaign mapping is always performed correctly, as long as you supply the function to everyone.

There is one thing you need to keep in mind, though: this only works for mapping/dimension tables that consist of distinct rows only. Duplicate rows cause an error in BigQuery and prevent your whole query from running successfully. A hedged sketch of what the SQL could look like follows at the end of this post. I hope this gave you an idea of a neat alternative to JOINs in BigQuery – at least for a specific set of cases.
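A minimal sketch, assuming a mapping table with campaign_id, campaign_name and channel columns; all project, dataset, table and column names here are illustrative:

    -- UDF that returns the mapping row for a given campaign id as a STRUCT.
    CREATE OR REPLACE FUNCTION `my_project.my_dataset.get_campaign_info`(cid STRING)
    RETURNS STRUCT<campaign_name STRING, channel STRING>
    AS ((
      -- The scalar subquery may return at most one row, which is why the
      -- mapping table has to be distinct on campaign_id.
      SELECT AS STRUCT campaign_name, channel
      FROM `my_project.my_dataset.campaign_mapping`
      WHERE campaign_id = cid
    ));

    -- Usage: enrich event data without writing a JOIN.
    SELECT
      e.event_name,
      `my_project.my_dataset.get_campaign_info`(e.campaign_id).campaign_name AS campaign_name
    FROM `my_project.my_dataset.events` AS e;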


Advanced sGTM Cookie Setter

In my previous blog post, I discussed different methods to persist session data and create a session data model (post). One of the options was setting a cookie via your sGTM container that contains an object with all the relevant session data you want to persist. Building on this, I've built a tag template for sGTM to set such a cookie without having to code it yourself. In this post, we're going to take a quick look at the template and explain how to use it.

How does the template work?

The tag template sets a cookie with an object as its value. The values within the object can be configured in sGTM's template when setting up the tag. Next to your custom data, it automatically generates a session id and can set a first-hit timestamp as well as a latest-hit timestamp. Since some session data is set when a session starts, while other data is updated during the session, you can choose for every row whether its value should be updated after the cookie has been set. To be a little less transparent towards others, you can optionally base64-encode the cookie value.

There is one important thing to be aware of when using the template: the cookie is set in the incoming hit's response – thus, the data within the cookie is not accessible to tags that fire on the same hit as this template. That is why I'd recommend sending a generic event on load, before every other event (e.g. page_view). This generic event instructs the sGTM container to set/update the cookie. Afterwards, you can send all other events and read the cookie values with every incoming hit (a small sketch for reading the cookie follows at the end of this post).

How to set it up

A few last words

This template enables you to persist both session and user data. Always keep in mind that privacy and data protection are more important than data collection – so take consent and your users' privacy into consideration before using the tag blindly. As with all other solutions in the technical-marketing space, it has to be used responsibly.
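To illustrate the read side, here is a minimal sketch of an sGTM variable that reads and decodes such a cookie; the cookie name and the assumption that base64 encoding was enabled in the tag are illustrative:

    const getCookieValues = require('getCookieValues');
    const fromBase64 = require('fromBase64');
    const JSON = require('JSON');

    // Cookie name is illustrative – use whatever you configured in the tag.
    const raw = getCookieValues('sgtm_session')[0];
    if (!raw) {
      return undefined; // cookie not set yet (e.g. first hit of a session)
    }

    // If base64 encoding was enabled in the tag, decode before parsing.
    return JSON.parse(fromBase64(raw));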


How to build and persist a session data-model using sGTM

Even though every professional working in Digital Analytics and Web Development wants to reduce client-side JavaScript, the "new and advanced" Google Analytics 4 has added JavaScript-based client-side session handling – which basically means that Google outsourced compute resources to your users' devices in order to save costs. Admittedly, it's free and has some astonishing functionality for a free tool. But let's assume you don't want to use GA4 anymore – you want to build your own tracker (possibly after you've read my posts about inserting data into BigQuery directly). Server-Side Google Tag Manager offers more than one way to handle a session data model in a Digital Analytics context. I'd argue there are at least three, so let's take a closer look at them:

1. Build a SQL-based session data model after BigQuery insert

This is the classic way of creating a session data model. Let's say you have a denormalized BigQuery table with all your event data as well as a column "session_id" containing the associated session id for each event. The session_id itself is persisted by some web-storage mechanism (e.g. cookies, localStorage, sessionStorage). While transforming this raw event data into more sophisticated production data, you can use SQL to create a model of all relevant session data. For example, you want to create a session table that contains the session id, the session's landing page as well as the timestamp of the first hit. A query to model this data could look like the following sketch:
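(A hedged sketch – the raw event table my_dataset.events and its columns session_id, page_location and event_timestamp are illustrative assumptions:)

    -- Sketch: derive one row per session from the raw event table.
    CREATE OR REPLACE TABLE `my_dataset.sessions` AS
    SELECT
      session_id,
      -- page of the chronologically first hit in the session
      ARRAY_AGG(page_location ORDER BY event_timestamp ASC LIMIT 1)[OFFSET(0)] AS landingpage,
      MIN(event_timestamp) AS first_hit_timestamp
    FROM `my_dataset.events`
    GROUP BY session_id;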
The above query creates a session table that contains the first page visited in a session as the landing page, the timestamp of the session's first hit as first_hit_timestamp and the corresponding session id. This table can be extended by any useful column imaginable. This solution's biggest advantage is probably its ability to be rebuilt/updated at any time. The biggest disadvantage is that the query usually only runs once a day or so, which means you do not have a session data model in real time. Additionally, you have to perform SQL JOINs whenever you need session information while working with your event data.

2. Persist session data based on an (HTTP) cookie

In this version, we use sGTM's capabilities to create an HTTP-only cookie which contains all relevant session data. It can be static throughout the session or be updated whenever necessary. Imagine having an object within a cookie that contains not only the session id but also data like the landing page and the first-hit timestamp. The cookie needs to be created by sGTM when a new session is started. Whether a session starts or an existing session has ended can be determined by checking for an existing cookie and calculating the difference between the current timestamp and the latest-hit timestamp in that object. With every hit after the session's initialization, the cookie is read, and both the event data and the session data are written to BigQuery. This way your event table contains up-to-date session data in every row – which is probably this option's biggest advantage. But be aware that this option is more difficult to engineer, easier to manipulate, and your collected data cannot be changed afterwards. A little service advice: in case you want to store all your session data in a cookie, remember to stringify the object and also consider encoding it in some way.

3. Persist session data based on a backend document storage (Firebase)

The third option is actually kind of a mix of the other two. Instead of setting a cookie which contains all relevant session data in an object, you create and update a Google Firebase document (visually similar to a JavaScript object) using sGTM's native API. Like the cookie in option two, the document has to be created whenever a new session starts. For every incoming event afterwards, you query Firestore to retrieve the session data and write it to BigQuery together with the event data (a small sketch of such a read follows at the end of this post). The resulting table contents should be identical to option two. But all this happens in your cloud backend instead of a cookie visible to your users – similar to option one. This option has all the advantages that number two has, while also being more difficult to identify and manipulate. But mind the costs that can be generated in GCP by using Firestore for every single incoming tracking request.

I hope you were able to get an idea of the different possibilities to persist a session and create a data model. At the end of the day, which solution you choose should depend on your need for real-time session data and on whether you need the option to change the logic with which you stitch together a session. Best!
Ramon
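As an appendix to option three, a minimal sketch of reading such a session document in an sGTM variable using the sandboxed Firestore API; the project id, the "sessions" collection path and the session-id cookie name are assumptions:

    const Firestore = require('Firestore');
    const getCookieValues = require('getCookieValues');

    // Session id is assumed to be persisted in a cookie called "session_id".
    const sessionId = getCookieValues('session_id')[0];
    if (!sessionId) {
      return undefined;
    }

    // Read the session document and return its fields (landingpage,
    // first_hit_timestamp, ...) so a BigQuery tag can pick them up.
    return Firestore.read('sessions/' + sessionId, { projectId: 'my-gcp-project' })
      .then(function (doc) {
        return doc.data;
      }, function () {
        return undefined; // document not found or read failed
      });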


Import Event-Data to BigQuery using Server Side Google Tagmanager | Part 2

In this guide I will continue to show how to send data from any client or server to Server-Side Google Tag Manager and handle the incoming data so that it can be written to a BigQuery table immediately. In the previous part of this post, I demonstrated how to create a generic REST endpoint to receive data from any kind of source. Now let's see how we can insert this data, event by event, into BigQuery to build an analytical database.

In general, there are two ways to insert your data into BigQuery. One would be to create a stringified object out of all the event data from the common event data model; this would require you to parse and manipulate the data afterwards. The obvious advantage is that there is no data loss in case you later need some data you previously weren't aware of. The second possibility is to define the columns and values you want to populate in your target table. This way, the receiving table already has structured data to work with and no additional data-processing steps are necessary. In this post I will show how to do the latter.

There are a few preparations one has to take care of beforehand. After that, you can start by importing the following tag template: Link to template file

After importing the template and checking the required tag permissions, you can start configuring the tag. Set the destination table and the data you want to insert (under the hood, templates like this build on sGTM's sandboxed BigQuery API – a small sketch follows at the end of this post). There is also an option to insert data into nested fields in BigQuery (similar to JSON and ARRAY types); to do so, the column's value needs to be a stringified JavaScript object. I recommend adding a timestamp as well (big thanks to TRKKN for adding this to their template). Optionally, you can add your Slack webhook URL as well as an individual message.

And that's it! If something does not work, you can see it in sGTM's debug console and/or your Slack messages. Have fun setting it up and testing it!
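For context, a minimal sketch of a direct insert via the sandboxed BigQuery API that such tag templates build on; the project, dataset, table and field names are illustrative, and this is not the linked template's actual code:

    const BigQuery = require('BigQuery');
    const getAllEventData = require('getAllEventData');
    const getTimestampMillis = require('getTimestampMillis');
    const JSON = require('JSON');

    const eventData = getAllEventData();

    const connection = {
      projectId: 'my-gcp-project',
      datasetId: 'analytics_raw',
      tableId: 'events'
    };

    const rows = [{
      event_name: eventData.event_name,
      event_timestamp: getTimestampMillis(),
      // Nested (RECORD) columns can be filled from a parsed, previously stringified object.
      event_params: JSON.parse(eventData.custom_params || '{}')
    }];

    BigQuery.insert(connection, rows, { ignoreUnknownValues: true })
      .then(data.gtmOnSuccess, data.gtmOnFailure);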


Import Event-Data to BigQuery using Server Side Google Tagmanager | Part 1

In this guide I will show how to send data from any client or server to Server-Side Google Tag Manager and handle the incoming data so that it can be written to a BigQuery table immediately. Yes, I know, there are many ways to import data into Google Cloud BigQuery, and I am also aware that this topic has been around for quite some time now. Generally speaking, the advantage of developing a REST endpoint to write data to BigQuery is obvious: you can handle multiple data streams in one sGTM container while your developers do not have to study complex documentation for other APIs or even the GA4 Measurement Protocol. While Measurement Protocol offers a significant number of advantages for cases in which you want to send additional data to GA4 directly, there is no real benefit in using MP if you just want to send data to your server container and do other things with it. In that case, complying with the MP schema is basically over-engineering a simple data stream.

Let me give you an example: your company's website has a monitoring system (like Instana) that checks the availability of several services (e.g. is the checkout available). In order to have up-to-date data for your checkout – and thus be able to give context on dropping conversion rates in your reports and dashboards – you want the availability data in your BigQuery data warehouse. You could do that by querying the monitoring tool's API (in case it has one), but that would require you to query the API extremely often and deduplicate every incident in a probably complex way. Another possibility is to use built-in webhooks, which most of these monitoring tools have. These allow you to send a request every time a specific incident condition is met. For these you could use MP – but what would you want with arbitrary client_ids, session_ids and so on when you only want data on a technical incident?

Instead, you can build your own sGTM client that claims incoming requests when the conditions you configured are met. You can find the .tpl file in my GitHub; a simplified sketch of what this client could look like follows at the end of this post. The client checks both the request path and a request header (in this case "sgtm_auth") against the configuration you made when setting up the template. If both match, the client parses the request body, checks for single or bulk event import (JSON object vs. array) and runs the container. This way you would only need to tell your developers the sGTM endpoint ("subdomain.mydomain.com"), the request path (e.g. "/generic-data-import") as well as the required request header (e.g. "sgtm_auth"), and they'd be able to send you all the incident data that you need.

I hope this was an interesting read for you. In the second part (coming soon), I will show you how to import the data claimed by the above client into BigQuery.
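A simplified sketch of such a client in sGTM's sandboxed JavaScript – not the actual .tpl file; the template field names (data.requestPath, data.authToken) and the response handling are assumptions:

    const claimRequest = require('claimRequest');
    const getRequestPath = require('getRequestPath');
    const getRequestHeader = require('getRequestHeader');
    const getRequestBody = require('getRequestBody');
    const getType = require('getType');
    const JSON = require('JSON');
    const runContainer = require('runContainer');
    const returnResponse = require('returnResponse');
    const setResponseStatus = require('setResponseStatus');

    // Only claim requests that match the configured path and auth header.
    if (getRequestPath() !== data.requestPath) return;
    if (getRequestHeader('sgtm_auth') !== data.authToken) return;

    claimRequest();

    const body = JSON.parse(getRequestBody() || '{}');
    if (!body) {
      // Body was not valid JSON – answer with a client error.
      setResponseStatus(400);
      returnResponse();
      return;
    }

    // Accept a single event object or an array of events (bulk import).
    const events = getType(body) === 'array' ? body : [body];

    let remaining = events.length;
    events.forEach(function (event) {
      runContainer(event, function () {
        remaining--;
        if (remaining === 0) {
          setResponseStatus(200);
          returnResponse();
        }
      });
    });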
