Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni,Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy
Twitter , *University of Wisconsin – Madison
1、INTRODUCTION
Configuration | Configuration details | Result |
1 | Use an existing Zookeeper cluster at
Twitter that was also being used by many other systems inside Twitter |
Quickly exceeded the amount of
clients that this Zookeeper cluster could support |
2 | Identical to the first one, except with
dedicated hardware for the Zookeeper cluster |
Improved the number of workers
processes and topologies that run in Storm cluster , but quicklyhit a limit at around 300 workers per cluster |
3 | Changed the Zookeeper hardware and
configuration again: We used database class hardware |
Scaled to approximately 1200
workers |
4 | Changed the Kafka Spout code to
write its state to a key-value store , also changed the Storm core |
In favor of high availability and
high write performance |
Storm Overheads:
The experiment | Experimental
environment |
Support
message reliability |
Input
consumption (msgs/sec) |
CPU
utilization |
1 | Write a Java program
that did not use Storm1 machine |
no | 300K | 700% |
2 | Storm topology
1 machine |
no | 300K | 660% |
3 | Storm topology
3 machine |
yes | 300K | 924% |
Conclusion 1: Mitigated the concerns regarding Storm adding significant overhead compared to vanilla Java code that did the same computation.
Conclusion 2: Storm CPU costs related to the message reliability mechanism in Storm are nontrivial .
Max Spout Tuning:
Storm topologies have a max spout pending parameter.
Problem :right value for this max spout pending parameter.
Solution:auto-tuning algorithm.
4、Conclusions and Future Work
Storm is a critical infrastructure at Twitter that powers many of the real-time data-driven decisions that are made at Twitter.
1.Automatically optimizing the topology statically and re-optimizing dynamically
2.Add exactonce semantics
3.Improve the visualization tools, improve the reliability of certain parts
4.Provide a better integration of Storm with Hadoop
5.Support a declarative query paradigm for Storm that still allows easy extensibility.