1. Accessing multiple tables in one query
Firebase event data is saved in BigQuery in multiple tables, one for each date. Thankfully, a convenient naming convention `events_YYYYMMDD` allows access to all such tables in a single query using a wildcard in the table name:
SELECT * FROM `project.analytics_XXXXXXXXX.events_*`
You can also get data for a particular year or month as follows:
Get all events for 2023
SELECT * FROM `project.analytics_XXXXXXXXX.events_2023*`
Get all events for March 2023
SELECT * FROM `project.analytics_XXXXXXXXX.events_202303*`
Moreover, you can apply a more granular filter with `_TABLE_SUFFIX`, a pseudo column like this to get all events for the first seven days of March 2023
SELECT * FROM `project.analytics_XXXXXXXXX.events_202303*`
WHERE _TABLE_SUFFIX BETWEEN '01' AND '07
Applying filters to your wildcard queries allows you to limit your requests to particular subsets of raw event data and save on Bigquery processing costs.
- `event_name` can be one of either standard events or custom events your app is sending to Firebase.
- `event_timestamp` specifies a timezone-independent representation of the time when the event happened at the user's device (not when data was uploaded to the server). The `event_timestamp` is in microseconds so that it can be converted to Bigquery's `TIMESTAMP` format using the `TIMESTAMP_MICROS` function.
- `user_pseudo_id` is a unique identifier for a particular app installation. If the app is reinstalled, a new `user_pseudo_id` is generated. This does not apply to app updates, though, so `user_pseudo_id` can be quite useful if you do not include custom user ids to your events yourself.
- `event_value_in_usd` indicates gross revenue for a particular `in_app_purchase` event, conveniently converted into US dollars.
- `platform` specifies the device platform (iOS / Android).
- `geo` contains data about the user's geographical location, with subfields such as `geo.country` or `geo.city` providing different levels of detail. Note that geographical data is included based on the user's location during each particular event, so it can differ between events.
- `device` and its subfields contain a lot of useful information on the device model, OS, language, etc.
- `app_info.version` provides app version.
Let's see how we can use some of the available fields.
If you have an app with a global audience, you must consider time zone differences, which gets tricky very quickly. Fortunately, Firebase provides a field `device.time_zone_offset_seconds` that specifies a timezone offset from the GMT in seconds for each particular event. Then it is easy to calculate the local date time for each event as follows. First, convert `event_timestamp` to `DATETIME` in GMT timezone:
SELECT TIMESTAMP_MICROS(event_timestamp) AS ts
Then, convert `TIMESTAMP` to `DATETIME` at GMT timezone:
SELECT DATETIME(ts, 'GMT') AS dt_gmt
Then, add time zone offset to arrive at the local `DATETIME`:
DATETIME_ADD(dt_gmt, INTERVAL device.time_zone_offset_seconds SECOND) AS dt_local
For each event, a `user_first_touch_timestamp` field is available. Calculating the difference between `user_first_touch_timestamp` and `event_timestamp` allows filtering events within a particular time range since each user's first opening.
For example, we can learn how many users are subscribing within the first 10 minutes, 60 minutes, one day, etc. of first opening the app. To do that, we calculate the time difference between `user_first_touch_timestamp` and `event_timestamp` for each `in_app_purchase` event using the `TIMESTAMP_DIFF` function:
SELECT
TIMESTAMP_DIFF(
TIMESTAMP_MICROS(event_timestamp),
TIMESTAMP_MICROS(user_first_touch_timestamp),
MINUTE
) AS delta
FROM `project.analytics_XXXXXXXXX.events_2023*`
WHERE event_name = 'in_app_purchase'
With this data, we can build a "time-to-purchase" histogram with any convenient tool (Python, Google Sheets, etc.).
A lot of useful information about the events is included in the columns named `event_params` and `user_properties`. The specific list of such attributes depends on the integrations you use, but many useful things can already be inferred from default Firebase and Google Analytics attributes. Custom attributes you assign to particular events or users can also be found in these columns. Note that attributes can differ for different events. For example, attributes `ga_session_id` are included for every event type, while `price` is understandably included only for `in_app_purchase` events.
Some of the useful attributes are:
- For `event_params` array:
- `ga_session_number` contains a sequential number of the session for a specific user;
- `engagement_time_msec` provides data on each particular user's engagement time. Sum over all `engagement_time_msec` values for a user to get their total engagement time (note that this attribute is not unique to the `user_engagement` event type, so make sure to sum over all event types to get your engagement time data);
- `product_id`, `price` and `currency` for `in_app_purchase` events.
- Your custom event parameters.
- For `user_properties` array:
- `firebase_exp_NN` provides Firebase A/B testing data;
- Your custom user properties.
6. Working with columns that are arrays of structs
Each of the `event_params` and `user_properties` columns has a type of repeated STRUCT/RECORD, so you can think of it as an array of key-value pairs. Since each value in this column is an array, you won't be able to access their attributes directly by key. For example, you cannot get the price of your `in_app_purchase` events like this:
SELECT event_params.price.int_value AS variant_id, *
FROM `project.analytics_XXXXXXXXX.events_*`
Will not work.
Instead, we have to use Bigquery's UNNEST operator, which takes an array and converts it into a set of rows that can be joined or filtered in a usual way. For example, to get all events for which the event_params array contains the `price` key, we can use the following query:
SELECT
p.value.int_value AS price,
*
FROM `project.analytics_XXXXXXXXX.events_*`
CROSS JOIN UNNEST(event_params) AS p
WHERE p.key = 'price'
Notice that the above query will exclude all rows that do not have the `price` key in `event_params`. If you want to return ALL rows and add price information only to the rows that have the `price` property (and NULLs everywhere else) your query can look something like this:
SELECT
(SELECT value.int_value FROM t.event_params WHERE key = 'price') AS price,
*
FROM `project.analytics_XXXXXXXXX.events_*` t
Note that each attribute has predetermined value fields: string, int, float, and double. Usually, only one of these fields is used, so double-check which type is used for your particular field value.
Data that allows mapping events to Firebase A/B experiments is stored in the `user_properties` field. If a user was part of the experiment `N` with variant `V`, a key-value pair `firebase_exp_N`-`V` is added to the `user_properties` array for each relevant event. Variant id is specified in the corresponding `string_value` field (even though the variant id looks like an integer).
Applying the approach from the previous section, we can query the events from experiment 42 as follows: -- All events in experiment 42
SELECT
p.key AS exp_id,
p.value.string_value AS variant_id,
*
FROM `project.analytics_XXXXXXXXX.events_*`
CROSS JOIN UNNEST(user_properties) AS p
WHERE p.key = 'firebase_exp_42'
With this as a starting point, you can calculate all kinds of stats about users in your experiments.
One basic question you can ask about an experiment is how many events of each type are there for each variant and how many unique users had such events. The following query achieves this:
WITH data AS (...) -- All events in a particular experiment, as described above
SELECT
event_name,
variant_id,
COUNT(user_pseudo_id) AS count,
COUNT(DISTINCT user_pseudo_id) AS dcount
FROM data
GROUP BY event_name, variant_id ORDER BY event_name, variant_id
We can infer many things from this straightforward calculation alone.
For example, we can understand how many users were included in each variant of the experiment, and compare conversion rates from `session_start` to any chosen event between variants.
To do that, just divide `dcount` of your chosen event by `dcount` of the `session_start` event.Note that this query will not give you conversion rates between arbitrary events, for which we have to write a more sophisticated query with joins or window functions.
We leave it for the future. Total revenue by variant can be most easily accessed via `event_value_in_usd` field as follows:
WITH data AS (...) -- All events in a particular experiment, as described above
SELECT
variant_id,
SUM(event_value_in_usd) AS revenue
FROM data
WHERE event_name = 'in_app_purchase'
GROUP BY variant_id ORDER BY variant_id
Combining this query with the result of the previous one, you can easily calculate revenue per user by variant.
To Conclude
We’ve looked at some of the tasks that you might find useful in your own exploration of Firebase events in Bigquery. This is undoubtedly just scratching the surface of what's possible with direct access to raw events so that we will look at more sophisticated examples in future posts.