Google announced its Apache Kafka for BigQuery cloud service at its conference Google Cloud Next 2024 in Las Vegas. Welcome to the data streaming club, joining Amazon, Microsoft, IBM, Oracle, Confluent, and others. This blog post explores this new managed Kafka offering for GCP, reviews the current status of the data streaming landscape, and shares some criteria to evaluate when Kafka in general and Google Apache Kafka in particular should (not) be used.
Welcome Google Apache Kafka to the Data Streaming Club
Better late than never… Google announced a brand new Apache Kafka cloud service for GCP at Google Cloud Next 2024. All other major cloud providers already have one, including AWS, Azure, Oracle, IBM, and Alibaba. Various other software vendors provide Kafka services, including Confluent, Aiven, Redpanda, WarpStream, and many more. Most leverage the open-source Kafka project as their core component, while others re-implement the Kafka protocol.
Apache Kafka and Apache Flink dominate the open-source data streaming ecosystem. Vendors and cloud solutions provide cloud-native offerings. Some developers, data engineers, and business people still struggle with a paradigm shift: continuous data processing enables better data quality, reduced cost, and faster time to market with innovative new applications. Kafka and Flink are a match made in heaven for data streaming.
Use cases for data streaming exist across all industries. Google Apache Kafka for BigQuery is potentially a good fit for some of them, but not for others.
Google Apache Kafka for BigQuery — What Is It?
What is Google Apache Kafka for BigQuery? Quoting Google’s website: “Apache Kafka for BigQuery is a managed service that operates highly available Apache Kafka clusters. It’s compatible with open source versions of Apache Kafka and includes first-party Google Cloud IAM, monitoring, logging, key management, organization policy, networking, and more.” Here are a few more thoughts:
- Asynchronous messaging with true decoupling of producers and consumers using the publish/subscribe pattern is already possible with GCP's proprietary service Google Pub/Sub. So why did Google now introduce a Kafka service? Because of limitations of Google Pub/Sub, or because Kafka became the standard (e.g., to migrate on-premise Kafka workloads from customers)? I guess a bit of both.
- Google re-uses open-source Kafka instead of re-implementing the Kafka protocol (like Microsoft Azure’s Event Hubs). I like this approach, as a new implementation always creates several new challenges like incomplete feature coverage, delayed support for new features, and unexpected behavior. The compatibility with open-source Kafka is mentioned several times. My personal assumption is that Google’s main strategic goal for the new Kafka service is to migrate existing on-premise workloads into Google Cloud.
- I really like that the service is secure out of the box. It is integrated with and supports Google Cloud IAM, customer-managed encryption keys (CMEK), and Virtual Private Cloud (VPC) from the beginning. This is crucial, as most enterprise workloads require it.
- Including the term ‘BigQuery’ is only a marketing strategy: “Data engineers often rely on Apache Kafka to build pipelines that stream data into BigQuery and other analytics systems. Apache Kafka for BigQuery can be used for real-time and batch use cases”. There is no requirement to use BigQuery for analytics. Google’s Kafka service is usable with other analytics platforms, too.
- Google emphasizes analytics use cases everywhere around its Kafka service; NOT transactional workloads. This approach is similar to Amazon MSK. Hopefully, the Google terms and conditions do not exclude Kafka support when the service is GA (that is what MSK does; unfortunately, too many people do not read the T&C and just use a cloud service in production).
Data Streaming Is a NEW Software Category
Data streaming represents a new software category that revolutionizes the way businesses harness and process data in real-time. Unlike traditional batch processing methods, data streaming enables continuous ingestion, analysis, and processing of data as it flows through systems.
The Data Streaming Landscape 2024
Many software companies have emerged in the data streaming category in the last few years. And several mature players in the data market added support for data streaming in their platforms or cloud service ecosystems. Most software vendors use Kafka for their data streaming platforms. However, there is more than solutions powered by open-source Kafka. Some vendors only use the Kafka protocol (e.g., Azure Event Hubs) or completely different APIs (like Amazon Kinesis).
The following Data Streaming Landscape 2024 summarizes the current status of relevant products and cloud services for data streaming around Kafka and additional stream processing engines.
Forrester Wave for Streaming Data and IDC MarketScape for Stream Processing
Apache Kafka became the de facto standard for data streaming, similar to how Amazon S3 became the de facto standard for object storage.
In December 2023, the research company Forrester published “The Forrester Wave™: Streaming Data Platforms, Q4 2023.” Get free access to the report here. The leaders are Microsoft, Google, and Confluent, followed by Oracle, Amazon, Cloudera, and a few others.
In April 2024, IDC named Confluent a leader in the IDC MarketScape for Worldwide Analytic Stream Processing 2024.
It would not be a surprise if we see a Gartner Magic Quadrant for Data Streaming soon, too. Gartner reports mention Kafka and related vendors more and more year by year.
When Not To Choose Google Apache Kafka for BigQuery
Qualifying out a technology is often the easier option. Why evaluate a service if it does not meet the requirements? Let’s explore when NOT to use Kafka at all, and especially when the Google Apache Kafka service is probably NOT the right choice for you.
When Not To Use Apache Kafka
Apache Kafka overlaps with technologies such as message brokers (like IBM MQ, TIBCO, or RabbitMQ) and other streaming analytics platforms, and it actually is a database, too. But Apache Kafka is not an allrounder that solves every problem.
Apache Kafka is NOT:
- A replacement for your favorite database, data warehouse, or data lake. Instead, it complements and integrates with these platforms.
- An analytics platform for AI/ML model training, although model scoring is often done within the streaming platform for critical or low-latency use cases.
- A proxy for thousands of clients in bad networks.
- An API Management solution, although you can connect REST/HTTP producers and consumers against Kafka.
- An IoT gateway, although direct integration with IoT protocols like MQTT or OPC-UA is possible.
- Hard real-time for safety-critical embedded workloads.
Read the thorough analysis “When NOT to use Apache Kafka?” for more details.
When To Choose Another Kafka Instead of Google’s
If Apache Kafka is the right choice for your project, you still have plenty of options.
Here are a few criteria that let you easily disqualify Google Apache Kafka for BigQuery:
- Non-GCP: If your use case requires on-premise, multi-cloud, hybrid cloud, or edge deployments, then you need another offering.
- Critical SLAs: If you need 24/7 critical support and consulting expertise, a dedicated Kafka vendor like Confluent is the better choice. Kafka is not just for analytics, but shines for transactional workloads, too. Google’s Managed Apache Kafka service is not GA yet. That will probably happen in the second half of 2024. Hence, do not even consider it for critical applications before GA.
- Serverless: A managed service is not always a truly managed service. The future will show where Google goes with Kafka. But right now, Google Apache Kafka is not serverless like, e.g., Confluent Cloud. Pricing is capacity-based, and you have to manage cluster capacity yourself. Amazon even created a second service, Amazon MSK Serverless, to address this issue with its traditional MSK offering.
- Complete data streaming platform: A data streaming platform requires more than just messaging: data integration with first and third-party systems, stream processing for continuous data correlation, flexible (long-term) retention with Tiered Storage, data governance, and more. The future will show us where Google’s Kafka service goes. Google is a car, but not (yet) a Porsche (complete luxury car) and not yet a Google Waymo (self-driving car level 5). Google Apache Kafka even misses basic features for data streaming best practices, like defining data contracts in schemas for building data products with good data quality.
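To make the last point a bit more concrete, here is a minimal sketch of what a data contract check can look like. It is a plain-Python stand-in and not how any Kafka service implements it: in practice, contracts are usually defined in a schema format and enforced by a schema registry. The `ORDER_CONTRACT` fields and the `violations` helper are hypothetical names made up for this illustration.

```python
# Hypothetical data contract for an "orders" topic:
# field name -> required Python type. Invented for illustration only.
ORDER_CONTRACT = {
    "order_id": str,
    "customer_id": str,
    "amount_cents": int,
    "currency": str,
}


def violations(event, contract):
    """Return a list of contract violations; an empty list means the event is valid."""
    problems = []
    # Every field named in the contract must be present with the right type.
    for field, expected_type in contract.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(event[field]).__name__}"
            )
    # Fields that the contract does not know about are rejected as well.
    for field in event:
        if field not in contract:
            problems.append(f"unexpected field: {field}")
    return problems


if __name__ == "__main__":
    bad_event = {"order_id": "o-2", "amount_cents": "19.99", "currency": "EUR"}
    for problem in violations(bad_event, ORDER_CONTRACT):
        print(problem)
```

A producer would run such a check before sending, so that bad events are rejected at the source instead of breaking every downstream consumer. Without schema support in the platform, each team has to build and maintain this kind of validation on its own.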
The Evolution of Data Streaming Is Not Stopping
If you have not qualified out Kafka in general or Google Apache Kafka in particular yet, that is great. Start evaluating Google’s Managed Apache Kafka cloud service and compare it against self-managed open source Kafka and other semi-managed or fully-managed Kafka cloud services on GCP.
As we look ahead, the future possibilities for data streaming are boundless, promising more agile, intelligent, and real-time insights into the ever-increasing streams of data.
I often get asked whether I am worried about the growing competition, as I work for Confluent where we “only do data streaming”.
No, I am not! Actually, the new Google Apache Kafka cloud service is great news for the industry! Data Streaming established itself as a new software category. Research analysts like Forrester and IDC already created dedicated waves and comparisons. What could be better than working with the people who invented Kafka and the company that created this software category across all industries and continents? And competition is always good for innovation, too.
Real-time data beats slow data. That is true in almost every use case. At Confluent, we are now ~3000 people working only on one thing: Data Streaming. I think we should celebrate this Google announcement and look forward to more mass adoption of data streaming around the world.
And because Confluent is a strategic Google partner, customers can:
- Leverage GCP credits to consume Confluent Cloud
- Leverage GCP's security and private networking infrastructure
- Integrate via fully managed connectors into various GCP services like Google BigQuery or Google Cloud Storage and third-party cloud solutions like MongoDB, Snowflake, or Databricks
Are you excited about the new Google Apache Kafka cloud service? Or do you still plan to use open-source Kafka or another cloud service like Confluent Cloud? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.