17 Aug 2023
Building Upon Apache Flink for Better Stream Processing
Table of contents
- Why Choose Flink for DeltaStream?
- Supporting a wide range of use cases
- System performance
- Large open source community
- Enhancing Flink Beyond the Barriers
- A serverless stream processing platform
- Lower barrier to entry with extended SQL and modern user interface
- Support for real-time materialized views
- Wrapping Up
Making streaming data easy to manage, secure, and process has been at the heart of DeltaStream’s problem statement since day one. When it came time to choose how we wanted to process streaming data, it was clear that leveraging Apache Flink was the correct choice. In our previous blog post, we covered Why Apache Flink is the Industry Gold Standard. In this post, we’ll reiterate some of the reasons why we at DeltaStream chose Flink as our underlying processing engine, and explain how our platform improves upon Flink to provide a powerful and intuitive way to build stream processing applications for our customers.
Why Choose Flink for DeltaStream?
The use cases for stream processing are boundless (pun intended). From anomaly detection to gaming to IoT to machine learning, streaming can solve a vast range of problems, and each use case has its own set of requirements. When choosing the correct stream processing engine, we want to ensure that (1) we can support a wide range of use cases, (2) the performance of our system is on par with the industry’s cutting edge, and (3) there is an opportunity to contribute back to the community and fix any bugs we may find. With these requirements, Flink is the only viable choice. At a high level, Apache Flink is a distributed real-time data processing engine that is highly scalable and highly efficient, and that has seen massive growth in the past five years.
Supporting a wide range of use cases
Flink’s set of low-level and high-level APIs gives us flexibility when building features. When we want to expose certain configurations to our end users, we can use the low-level APIs; on the flip side, Flink’s high-level APIs let us support powerful data transformations that the Flink engine has already optimized.
Flink also has a very rich connector ecosystem. For almost every popular data storage system, whether it’s a streaming store such as Apache Kafka or an at-rest database such as Postgres, there is likely already an open source Flink connector. If a connector is missing, Flink provides APIs that allow developers to write their own. This enables Flink to integrate with most data infrastructures.
System performance
Flink is both low-latency and highly scalable. This means that resources can be rightsized for the workload, which saves cost while maintaining performance. Flink’s state-of-the-art checkpointing algorithm also enables Flink to be fault tolerant and highly available.
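To illustrate the idea behind fault tolerance through periodic state snapshots, here is a minimal single-process Python sketch. It is not Flink code, and Flink’s real mechanism is distributed and considerably more involved. The function `run`, its parameters, and the sample events are all hypothetical. The sketch assumes the input can be re-read from a saved offset, as it can with a replayable source such as Kafka.

```python
import copy


def run(events, checkpoint_every, fail_at=None):
    """Sum values per key, snapshotting (offset, state) periodically.

    If fail_at is set, simulate one crash at that offset, then restore the
    last snapshot and replay the input from the saved offset.
    """
    state = {}
    checkpoint = (0, {})  # (next offset to read, copy of the state)
    offset = 0
    failed = False
    while offset < len(events):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Crash: in-memory state is lost, so recover from the snapshot.
            offset, saved = checkpoint
            state = copy.deepcopy(saved)
            continue
        key, value = events[offset]
        state[key] = state.get(key, 0) + value
        offset += 1
        if offset % checkpoint_every == 0:
            # State and input position are saved together, so replayed
            # records are never counted twice.
            checkpoint = (offset, copy.deepcopy(state))
    return state


events = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]
clean = run(events, checkpoint_every=2)
recovered = run(events, checkpoint_every=2, fail_at=3)
print(clean == recovered)  # True
```

Because the snapshot pairs the state with the input position it corresponds to, the run that crashes and recovers ends with the same result as the run that never fails.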
Large open source community
Flink is one of the most popular Apache projects, with over 1,600 contributions during 2022. Its active mailing list and numerous committers make it easy to find support for any issues as well as take part in the community. The DeltaStream team has made a number of contributions to the Flink project to fix bugs and make improvements. Choosing a project that we can both receive support from and contribute back to was a big factor in choosing Apache Flink.
If you want a more in-depth analysis of how Flink works and why we think it’s so great, check out our previous blog post.
Enhancing Flink Beyond the Barriers
While Flink is an incredibly powerful technology, it is not the quick stream processing solution you may be searching for. It can be difficult to operate and take months to learn, and designing a Flink-based service that actually meets your demands requires stream processing expertise. We at DeltaStream have become experts in Flink and have built a complete stream processing cloud offering that goes beyond Flink. The DeltaStream platform addresses not only the difficulties users face when starting out with Flink, but also the difficulties they face when trying to integrate stream processing into their systems in general.
A serverless stream processing platform
While Flink is great in many ways, users may find it and other stream processing frameworks quite complex, especially if they are not familiar with other distributed processing systems. Operating Flink has a learning curve: users need to understand Flink’s memory model, tune savepointing and checkpointing, and diagnose issues when things fail. There are plenty of gotchas that come along with Flink and other stream processing frameworks, such as data skew and inefficient serialization. For companies that are trying to add stream processing, this means a longer ramp-up time for engineers to learn and build the system. A serverless system such as DeltaStream removes all this operational overhead from the user, which means companies can set up and trust stream processing applications in minutes instead of months or even years.
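To make the data skew gotcha concrete, here is a minimal Python sketch, not Flink code. It assumes records are routed to parallel workers by hashing their key, which is the general idea behind keyed partitioning. The function `partition_counts`, the worker count, and the sample keys are all hypothetical.

```python
import zlib
from collections import Counter


def partition_counts(keys, num_workers):
    """Count how many records each worker receives under hash partitioning."""
    counts = Counter({worker: 0 for worker in range(num_workers)})
    for key in keys:
        # crc32 is a stable hash, so the routing is the same on every run.
        worker = zlib.crc32(key.encode("utf-8")) % num_workers
        counts[worker] += 1
    return counts


# Hypothetical workload: one "hot" user produces 90% of the events.
skewed_keys = ["user-hot"] * 900 + [f"user-{i}" for i in range(100)]
counts = partition_counts(skewed_keys, num_workers=4)

busiest = max(counts.values())
print(f"records per worker: {dict(counts)}")
print(f"busiest worker handles {busiest / len(skewed_keys):.0%} of all records")
```

Every record with the hot key lands on the same worker, so adding more workers does not relieve the bottleneck. This is the kind of issue an operator has to detect and design around.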
Lower barrier to entry with extended SQL and modern user interface
To work with Flink directly, developers need to be experts in Java or Scala. Flink has been developing its Python API as well, but as of this writing it is much less mature, has fewer features, and is less performant than JVM Flink. DeltaStream removes this barrier by exposing our own SQL grammar for defining stream processing jobs. Because we own the SQL grammar, we can abstract the user experience away from Flink’s complexities while still taking advantage of Flink’s performance. Accessible through our scriptable CLI, web application, or REST API, the DeltaStream platform lowers the barrier to entry for stream processing while also providing a richer developer experience to our users. Visibility into query metrics, secure data sharing capabilities, and RBAC for queries and other entities in the platform are just a few of the ways that DeltaStream makes it easy for users to manage their streaming resources.
Support for real-time materialized views
Materialized views have been a staple in at-rest databases for a long time, but keeping them constantly up-to-date puts a lot of strain on the underlying database. Flink does not support materialized views out of the box (although Flink does have Dynamic Tables). While Flink is great for building stream processing applications for event-driven use cases such as anomaly detection, other use cases that aren’t event-driven, such as querying to find the most-clicked ad in an ad campaign, may be better solved with materialized views. DeltaStream provides materialized views as a core feature to support both kinds of use cases in the same platform.
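As a rough illustration of what a continuously maintained materialized view does, here is a toy Python sketch for the most-clicked ad example. It is not how DeltaStream or Flink implements views. The class `AdClickView`, its methods, and the event shape are hypothetical.

```python
from collections import Counter


class AdClickView:
    """Toy materialized view: click counts per ad, updated per event."""

    def __init__(self):
        self._clicks = Counter()

    def apply(self, event):
        # Incremental maintenance: each event touches only one row of the view.
        self._clicks[event["ad_id"]] += 1

    def most_clicked(self):
        # Reads are answered from the precomputed counts, not by rescanning events.
        if not self._clicks:
            return None
        return self._clicks.most_common(1)[0]


view = AdClickView()
for ad_id in ["ad-1", "ad-2", "ad-1", "ad-3", "ad-1", "ad-2"]:
    view.apply({"ad_id": ad_id})

print(view.most_clicked())  # ('ad-1', 3)
```

The work of aggregating is paid once per incoming event, so a query at any moment is a cheap lookup against an already up-to-date result.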
Wrapping Up
Flink is the industry gold standard for stream processing, and that’s the main reason we use it as our underlying processing engine. However, it comes with its complexities, both in development and operations. By managing the operations of Flink and using a more streaming-friendly SQL grammar, DeltaStream aims to make stream processing intuitive and pain-free.
Make sure you visit our blogs page if you want to read more about what’s going on at DeltaStream. If you’re ready to try out our system, you can schedule a demo with us or sign up for a free trial. We’d love to show you what we’ve built.