Read, Watch and Listen.
Recipes for Preparing Raw Firebase Events in Big-query
Firebase / Google Analytics allows you to export raw event data to Bigquery, Google's data warehouse platform. Bigquery supports a SQL dialect called GoogleSQL (previously known as StandardSQL). Having access to full unabridged data of your users' events is great if you want to customize your analytics layer and obtain in-depth insights into your users' behavior. However, this comes at the cost of working with raw data and building many basic analytics tools yourself. While doing this, it is important to know the structure of the available data. In this article, we want to look at how the Firebase events data is structured and propose a few "recipes" for some of the most basic tasks you will come by when working with such data.
1. Accessing multiple tables in one query
Firebase event data is saved in BigQuery in multiple tables, one for each date. Thankfully, a convenient naming convention `events_YYYYMMDD` allows access to all such tables in a single query using a wildcard in the table name:
You can also get data for a particular year or month as follows:
Get all events for 2023
Get all events for March 2023
Moreover, you can apply a more granular filter with `_TABLE_SUFFIX`, a pseudo column like this to get all events for the first seven days of March 2023
Applying filters to your wildcard queries allows you to limit your requests to particular subsets of raw event data and save on Bigquery processing costs.
2. A few useful events data fields
- `event_name` can be one of either standard events or custom events your app is sending to Firebase.
- `event_timestamp` specifies a timezone-independent representation of the time when the event happened at the user's device (not when data was uploaded to the server). The `event_timestamp` is in microseconds so that it can be converted to Bigquery's `TIMESTAMP` format using the `TIMESTAMP_MICROS` function.
- `user_pseudo_id` is a unique identifier for a particular app installation. If the app is reinstalled, a new `user_pseudo_id` is generated. This does not apply to app updates, though, so `user_pseudo_id` can be quite useful if you do not include custom user ids to your events yourself.
- `event_value_in_usd` indicates gross revenue for a particular `in_app_purchase` event, conveniently converted into US dollars.
- `platform` specifies the device platform (iOS / Android).
- `geo` contains data about the user's geographical location, with subfields such as `geo.country` or `geo.city` providing different levels of detail. Note that geographical data is included based on the user's location during each particular event, so it can differ between events.
- `device` and its subfields contain a lot of useful information on the device model, OS, language, etc.
- `app_info.version` provides app version.
Let's see how we can use some of the available fields.
3. Getting local event time
If you have an app with a global audience, you must consider time zone differences, which gets tricky very quickly. Fortunately, Firebase provides a field `device.time_zone_offset_seconds` that specifies a timezone offset from the GMT in seconds for each particular event. Then it is easy to calculate the local date time for each event as follows. First, convert `event_timestamp` to `DATETIME` in GMT timezone:
Then, convert `TIMESTAMP` to `DATETIME` at GMT timezone:
Then, add time zone offset to arrive at the local `DATETIME`:
4. Time to purchase
For each event, a `user_first_touch_timestamp` field is available. Calculating the difference between `user_first_touch_timestamp` and `event_timestamp` allows filtering events within a particular time range since each user's first opening.
For example, we can learn how many users are subscribing within the first 10 minutes, 60 minutes, one day, etc. of first opening the app. To do that, we calculate the time difference between `user_first_touch_timestamp` and `event_timestamp` for each `in_app_purchase` event using the `TIMESTAMP_DIFF` function:
With this data, we can build a "time-to-purchase" histogram with any convenient tool (Python, Google Sheets, etc.).
5. Accessing events' custom attributes
A lot of useful information about the events is included in the columns named `event_params` and `user_properties`. The specific list of such attributes depends on the integrations you use, but many useful things can already be inferred from default Firebase and Google Analytics attributes. Custom attributes you assign to particular events or users can also be found in these columns. Note that attributes can differ for different events. For example, attributes `ga_session_id` are included for every event type, while `price` is understandably included only for `in_app_purchase` events.
Some of the useful attributes are:
- For `event_params` array:
- `ga_session_number` contains a sequential number of the session for a specific user;
- `engagement_time_msec` provides data on each particular user's engagement time. Sum over all `engagement_time_msec` values for a user to get their total engagement time (note that this attribute is not unique to the `user_engagement` event type, so make sure to sum over all event types to get your engagement time data);
- `product_id`, `price` and `currency` for `in_app_purchase` events.
- Your custom event parameters.
- For `user_properties` array:
- `firebase_exp_NN` provides Firebase A/B testing data;
- Your custom user properties.
6. Working with columns that are arrays of structs
Each of the `event_params` and `user_properties` columns has a type of repeated STRUCT/RECORD, so you can think of it as an array of key-value pairs. Since each value in this column is an array, you won't be able to access their attributes directly by key. For example, you cannot get the price of your `in_app_purchase` events like this:
Will not work.
Instead, we have to use Bigquery's UNNEST operator, which takes an array and converts it into a set of rows that can be joined or filtered in a usual way. For example, to get all events for which the event_params array contains the `price` key, we can use the following query:
Notice that the above query will exclude all rows that do not have the `price` key in `event_params`. If you want to return ALL rows and add price information only to the rows that have the `price` property (and NULLs everywhere else) your query can look something like this:
Note that each attribute has predetermined value fields: string, int, float, and double. Usually, only one of these fields is used, so double-check which type is used for your particular field value.
7. A/B tests data
Data that allows mapping events to Firebase A/B experiments is stored in the `user_properties` field. If a user was part of the experiment `N` with variant `V`, a key-value pair `firebase_exp_N`-`V` is added to the `user_properties` array for each relevant event. Variant id is specified in the corresponding `string_value` field (even though the variant id looks like an integer).
Applying the approach from the previous section, we can query the events from experiment 42 as follows: -- All events in experiment 42
With this as a starting point, you can calculate all kinds of stats about users in your experiments.
8. Calculate basic stats for an experiment
One basic question you can ask about an experiment is how many events of each type are there for each variant and how many unique users had such events. The following query achieves this:
WITH data AS (...) -- All events in a particular experiment, as described above
We can infer many things from this straightforward calculation alone.
For example, we can understand how many users were included in each variant of the experiment, and compare conversion rates from `session_start` to any chosen event between variants.
To do that, just divide `dcount` of your chosen event by `dcount` of the `session_start` event.Note that this query will not give you conversion rates between arbitrary events, for which we have to write a more sophisticated query with joins or window functions.
We leave it for the future. Total revenue by variant can be most easily accessed via `event_value_in_usd` field as follows:
WITH data AS (...) -- All events in a particular experiment, as described above
Combining this query with the result of the previous one, you can easily calculate revenue per user by variant.
We’ve looked at some of the tasks that you might find useful in your own exploration of Firebase events in Bigquery. This is undoubtedly just scratching the surface of what's possible with direct access to raw events so that we will look at more sophisticated examples in future posts.