Announcing the release of Apache Samza 0.13.1
We are very excited to announce the release of Apache Samza 0.13.1.
Samza has been powering real-time applications in production across several large companies (including LinkedIn, Netflix, Uber) for years now. Samza provides leading support for large-scale stateful stream processing with:
- First-class support for local state (with a RocksDB store). This allows a stateful application to scale up to 1.1 million events/sec on a single machine with an SSD.
- Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state.
- A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB streams etc.) and output systems (HDFS, Kafka, ElastiCache etc.).
- A fully asynchronous programming model that makes parallelizing remote calls efficient and effortless.
- A high-level API for expressing complex stream processing pipelines in a few lines of code.
- A flexible deployment model for running the applications in any hosting environment and with cluster managers other than YARN.
- Features like canaries, upgrades and rollbacks that support extremely large deployments with minimal downtime.
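To illustrate why an asynchronous model helps with remote calls, here is a minimal sketch in plain Java. It does not use Samza's API: the class `AsyncFanOut`, the method `fetchAll`, and the `lookup` function are hypothetical names chosen for this example, and the "remote call" is a stand-in that runs on the JDK's common thread pool. The idea is simply that all calls are started before any result is awaited, so they overlap instead of running one after another.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Samza: fans out one async call per key.
final class AsyncFanOut {

    // Starts one remote call per key without blocking and returns a future
    // that completes with the results in the same order as the input keys.
    static <K, V> CompletableFuture<List<V>> fetchAll(
            List<K> keys, Function<K, CompletableFuture<V>> remoteCall) {
        // Kick off every call first so that all of them are in flight at once.
        List<CompletableFuture<V>> pending = keys.stream()
                .map(remoteCall)
                .collect(Collectors.toList());
        // allOf completes once every call has completed, so join() below
        // only reads values that are already available.
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> pending.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        // Assumed stand-in for a remote lookup (for example, a profile service).
        Function<String, CompletableFuture<String>> lookup =
                id -> CompletableFuture.supplyAsync(() -> "profile-" + id);
        List<String> results = fetchAll(Arrays.asList("1", "2", "3"), lookup).join();
        System.out.println(results);
    }
}
```

With a blocking client, three lookups would take roughly the sum of their latencies; with the pattern above, the total is closer to the slowest single call.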
Enhancements, Upgrades and Bug Fixes
This is a stability release to make Samza as an embedded library production ready. Samza as a library is part of Samza’s Flexible Deployment model. The release fixes a number of outstanding bugs and includes the following enhancements to existing features:
- Standalone
  - SAMZA-1165 Cleanup data created by ZkStandalone in ZK
  - SAMZA-1324 Add a metrics reporter lifecycle for JobCoordinator component of StreamProcessor
  - SAMZA-1336 Standalone session expiration propagation
  - SAMZA-1337 LocalApplicationRunner supports StreamTask
  - SAMZA-1339 Add standalone integration tests
- General
  - SAMZA-1282 Fix issue where spinning up more containers than the number of tasks kills the leader process
  - SAMZA-1340 StreamProcessor does not propagate container failures from StreamTask
  - SAMZA-1346 GroupByContainerCount.balance() should guard against null LocalityManager
  - SAMZA-1347 GroupByContainerIds NPE if containerIds list is null
  - SAMZA-1358 task.class empty string should be ignored when app.class is configured
  - SAMZA-1361 OperatorImplGraph used wrong keys to store/retrieve OperatorImpl in the map
  - SAMZA-1366 ScriptRunner should allow callers to control the child process environment
  - SAMZA-1384 Race condition with async commit affects checkpoint correctness
  - SAMZA-1385 Fix coordination issues during stream creation in LocalApplicationRunner
Overall, 29 JIRAs were resolved in this release. A source download of the 0.13.1 release is available here. The release JARs are also available in Apache’s Maven repository. See Samza’s download page for details and Samza’s feature preview for new features. Users running the 0.13.1 release with Scala 2.12 need a JDK version newer than 1.8.0_111.
Community Developments
We’ve made great community progress since the last release (0.13.0). We presented Samza’s high-level API features at the Cloud+Data NEXT Conference 2017 in Silicon Valley, USA, gave a talk on the key features (Secret Kung Fu) of Samza at ArchSummit 2017 in Shenzhen, China, and presented a detailed study of stateful stream processing at VLDB 2017. Here are the details of these conferences:
- July 15, 2017 - Unified Processing with the Samza High-level API (Cloud+Data NEXT Conference, Silicon Valley) slides
- July 7, 2017 - Secret Kung Fu of Massive Scale Stream Processing with Apache Samza - Xinyu Liu (ArchSummit, Shenzhen, 2017)
- Aug 28, 2017 - Samza: Stateful Scalable Stream Processing at LinkedIn - Kartik Paramasivam (ACM VLDB, Munich, 2017)
In industry, Samza gained new adopters, including Redfin and VMware. Going forward, we are continuing to improve the new High Level API and flexible deployment features. Here is the list of tasks for upcoming features and improvements.
Contribute
It’s a great time to get involved. You can start by reviewing the tutorials, signing up for the mailing list, and grabbing some newbie JIRAs.
I’d like to close by thanking everyone who’s been involved in the project. It’s been a great experience to be involved in this community, and I look forward to its continued growth.