21 Sep 2022
Min Read
DeltaStream 101 Part 2 – Always Up-to-date Materialized Views for Kafka and Kinesis
If you recall in DeltaStream 101 Part 1, we introduced DeltaStream, a serverless stream processing platform to manage, secure and process all your streams on cloud, and walked through a simple clickstream analytics use case.
In this post, we will continue to build on that base. Here, we’ll walk through how you can build materialized views that are continuously updated based on the results of the streaming queries in our previous post, and serve the results of those views to a user on a web page. The event streams that we begin with to ultimately wind up with data in DeltaStream could be built in popular event stream storage platforms such as Apache Kafka, Confluent Cloud, Amazon MSK, or Amazon Kinesis.
This is a sample of a web page built with materialized views in DeltaStream to serve user statistics to a visitor.
Before we dive in, what is a materialized view? In short, a materialized view is the result of a query, stored as a table. Sounds simple enough. But in a database built for streaming data, queries must produce the most up-to-date results in a real-time manner whenever called, so creating and updating a materialized view becomes more complex. Fortunately, DeltaStream takes care of all of these concerns under the hood, serving results with sub second latency. For the DeltaStream user, everything behind creating materialized views for streaming data looks like familiar SQL. The following figure shows how continuous queries in DeltaStream can build Materialized Views from a stream of events in Apache Kafka.
Materialized View #1: Number of times a URL has been visited
Let’s take a look at our first materialized view. Here, we are using the queries from our previous post to create a materialized view that represents the number of times every url has been visited. While this looks like standard SQL, the fact is, if the user visited that URL only a half second ago, it will be represented in our materialized view.
CREATE MATERIALIZED VIEW url_visit_count AS SELECT url, count(*) AS url_visit_count FROM clicks_dev GROUP BY url;
Once we create the materialized view in DeltaStream, we can query it for the latest result the same way we would query a table in a relational database. For instance, the following query returns the number of times a url with address “./home” has been visited:
SELECT * FROM url_visit_count WHERE url = './home';
With another query, we could find the url with the most number of views. This can be computed easily using the following query on the materialized view. Note that, again, since our materialized view continuously updates as click events are received, the result of this query will be the accurate real-time value.
SELECT url, url_visit_count FROM url_visit_count ORDER BY url_visit_count LIMIT 1;
Materialized View #2: Number of visits for each URL via every device type
Now we will build another materialized view. This time we want to build a view to store the number of visits a url has on different devices. If you recall in our previous post, we had filtered events with a desktop device ID and computed the number of events per url and device ID using a continuous query. Here is how we turn that into a materialized view that updates with very low latency with DeltaStream.
CREATE MATERIALIZED VIEW url_device_visit_count AS SELECT url, device_id, count(*) as url_device_visit_count FROM clicks_dev GROUP BY url, device_id;
Materialized View #3: Number of times a user visited a url on each device
And finally, we want to create a materialized view that we can query to know how many times a user visited the website on each device. For example, we may want to see how many times a given user visited various urls in the website on laptops, mobile devices, and tablets. Below is the SQL we can use to create this materialized view in DeltaStream.
CREATE MATERIALIZED VIEW user_device_visit_count AS SELECT user_id, device_id, count(*) as user_device_visit_count FROM clicks_dev GROUP BY user_id, device_id;
Similar to the above examples, once we create the materialized view in DeltaStream, it is ready for querying. Here is another example of querying the last materialized view we created to get the number of visits the user with userid of ‘User_9’ from mobile devices had.
SELECT user_id, user_device_visit_count FROM user_device_visit_count WHERE user_id = 'User_9' AND device_id = 'mobile';
I hope you enjoyed these examples so far of how you can use DeltaStream to go from raw event streaming to materialized views that can serve results to a web page for the latest, absolutely accurate results. In future posts, we’ll cover more capabilities for building, managing and securing real-time applications and pipelines. In the meantime, if you want to try this yourself please request a demo.