Chat with your data using Gemini, Streamlit & CloudRun
There is something I have to confess: I am not riding the AI-hype-train as much as many others do. Don’t get me wrong – AI as it is now has the potential to drastically improve productivity at a variety of tasks, especially coding. Nevertheless, I do not believe that it will destroy all knowledge work, nor do I believe that we will all lose our jobs in the coming months. Conversely, managers are eager to investigate the potential of AI to generate the aforementioned productivity increase. While coding is the obvious case, distribution of basic analysis can be another one. So today, I am going to demonstrate how to create a Gemini- and BigQuery-based chatbot, that allows anyone to ask questions directly to your data warehouse. Let’s imagine this case: You have a table containing all of your ecommerce orders. Next to the common columns you can find in most tables, like order_id, order_timestamp, item_name, revenue, it contains a huge number of columns for acquisition, device, and product details. Creating a report or dashboard, that allows users to filter as well as analyze all these different combinations would be a pain – either in terms of complexity or in terms of quantity. So instead, we are going to create a chat-like interface, that allows your users to ask questions and generate charts based on just that specific table. This project is based on Googles Conversational API quickstart guide. My repository: GitHub Prequisites – what you’ll need Create the application To start, you have to enable the required APIs in your Google Cloud Project. To do so, run the following command using gcloud CLI. It sets the correct project-id and sets up what service account you use for API-enablement, deployment, and so on. The next thing we need to do is create an OAuth Client in GCP, assuming you don’t want the application to be publicly available. The OAuth Client allows you to set up access for specific Google accounts. While comfortable for teams that are working a lot with Google products already, there could be better access management options for you (e.g. IP-range based access for companies with a static IP-range). However, for the purpose of this post, we will stick to OAuth just as Google did in their guide. Create consent screen Create OAuth client Prepare the project Next up is cloning the repository to your local machine: Now you need to create a .env file on the root level of your project clone: This is the version to run the application locally. Since we are planning to deloy it to CloudRun, the redirect URI has to be changed to the default-URL that will be provisioned for your application once it is deployed. This is how it should look: Run locally To run the application locally, all you need to do is install the required packages (I highly recommend using a virtual environment): And now start the Streamlit app using the following command. To cancel Streamlit just press CTRL and C in your terminal. streamlit run app.py Deploy to CloudRun Before we start the deployment, there are a few updates we need to make beforehand. Firstly, we need to update the .env file like we did before. The Redirect-URI has to be set to the CloudRun Service-URI. Otherwise, your app will always try to redirect you to localhost after authentication. In addition to that, the CloudRun service URI has to be added to your OAuth-Client for both “Authorized JavaScript origins” and “Authorized redirect URIs”. Since gcloud doesn’t listen to .gitignore instructions, a .gcloudignore file is necessary. It is already part of the repository, but you should check for possible additions like your venv or service account file. Last but not least, we want to deploy our project as a dockerized application, so we need to add a Dockerfile. To do so, just create a new file called “Dockerfile” in your project at the root level and add the following contents: With all of the above being done, you can deploy your application using this command: After deployment, the application is publicly available (authentication is handled by the OAuth Client) and listens to port 8501.You can test it by browsing to GCP’s CloudRun interface, clicking on your application, and then visiting the default URL. Create an agent Now you open your AI-based chat-application and create your first agent to “talk” to your orders table. To do so, click “Agents” on the left and scroll down to “Create a data agent”. Add your table’s project-, dataset- and table-id as well as an agent name and description. Agent instructions are a bit more complicated – they are basically used to create context for the agent to understand your data better. It is a YAML-based description for the contents of both table and columns: what do they mean, how to run calculations on them, and so on. Please check out Google’s documentation on how to create agent instructions for BigQuery. This is their example for agent instructions: If you have maintained your BigQuery metadata correctly (I for sure did not :D), it’ll be much easier for both you and the agent. Now you click “Create agent” navigate on the top left to “Chat” and start chatting with your agent. You can use it to answer direct questions or even create rendered visualizations like bar-charts. Customizing the Interface To impress your boss with a sophisticated MVP, you should consider applying your corporate identity to the application. You can add a logo, set a theme color as well as styles for individual elements or even add custom CSS. You can check out the Streamlit documentation on styling to get an overview of what is possible: https://docs.streamlit.io/develop/concepts/configuration/theming. Last words I really like the idea of using AI in combination with YAML-based instructions. It feels way more reliable to have a specific agent with detailed instructions for just one table.Nevertheless you should keep in mind that AI has an error margin; thus you should test thoroughly
Chat with your data using Gemini, Streamlit & CloudRun Read More »
