Scrapy Cluster 1.2.1 Documentation
- Learn about the design considerations for the Kafka Monitor
- Quick Start
- How to use and run the Kafka Monitor
- The default Kafka API that comes with Scrapy Cluster
- Gives an overview of the different plugin components within the Kafka Monitor, and how to make your own.
- Explains all of the settings used by the Kafka Monitor
- Learn about the design considerations for the Scrapy Cluster Crawler
- Quick Start
- How to use and run the distributed crawlers
- Learning how to control your Scrapy Cluster will enable you to get the most out of it
- How to use both Scrapy and Scrapy Cluster to enhance your crawling capabilities
- Explains all of the settings used by the Crawler
- Argparse Helper
- Simple module to assist in argument parsing with subparsers.
- Log Factory
- Module for logging multithreaded or concurrent processes to files, stdout, and/or JSON.
- Method Timer
- A method decorator to time out function calls.
- Redis Queue
- A module for creating easy Redis-based FIFO, Stack, and Priority Queues.
- Redis Throttled Queue
- A wrapper around the redis_queue module to enable distributed throttled pops from the queue.
- Settings Wrapper
- Easy-to-use module that loads both default and local settings for your Python application and returns a dictionary object.
- Stats Collector
- Module for statistics based collection in Redis, including counters, rolling time windows, and hyperloglog counters.
- Zookeeper Watcher
- Module for watching a Zookeeper file that handles Zookeeper session connection troubles and re-establishment of watches.
- Upgrade Scrapy Cluster
- How to update an older version of Scrapy Cluster to the latest version
- Integration with ELK
- Visualizing your cluster with the ELK stack gives you new insight into your cluster
- Use Docker to provision and scale your Scrapy Cluster
- Crawling Responsibly
- Responsible Crawling with Scrapy Cluster
- Production Setup
- Thoughts on Production Scale Deployments
- DNS Cache
- DNS caching is bad for long-lived spiders
- Response Time
- How the production setup influences cluster response times
- Kafka Topics
- The Kafka topics typically generated when running the cluster
- Redis Keys
- The keys generated when running a Scrapy Cluster in production
- Other Distributed Scrapy Projects
- A comparison with other Scrapy projects that are distributed in nature
- Frequently Asked Questions
- Scrapy Cluster FAQ
- Debugging distributed applications is hard; learn how easy it is to debug Scrapy Cluster.
- Learn how to contribute to Scrapy Cluster
- Change Log
- View the changes between versions of Scrapy Cluster.
- Scrapy Cluster is licensed under the MIT License.