Stats Collector

The Stats Collector Utility consists of a series of Redis based counting mechanisms, that allow a program to do distributed counting for particular time periods.

There are many useful types of keys within Redis, and this counting Stats Collector allow you to use the following styles of keys:

  • Integer values
  • Unique values
  • HyperLogLog values
  • Bitmap values

You can also specify the style of time window you wish to do your collection in. The Stats Collect currently supports the following use cases:

  • Sliding window integer counter
  • Step based counter for all other types

The sliding window counter allows you to determine “How many hits have occurred in the last X seconds?”. This is useful if you wish to know how many hits your API has had in the last hour, or how many times you have handled a particular exception in the past day.

The step based counter allows you to to collect counts based on a rounded time range chunks. This allows you to collect counts of things in meaningful time ranges, like from 9-10 am, 10-11 am, 11-12 pm, etc. The counter incrementally steps through the day, mapping your counter to the aggregated key for your desired time range. If you wanted to collect in 15 minute chunks, the counter steps through any particular hour from :00-:15, :15-:30, :30-:45, and :45-:00. This applies to all time ranges available. When using the step style counters, you can also specify the number of previous steps to keep.

Note

The step based counter does not map to the same key once all possible steps have been accounted for. 9:00 - 9:15 am is not the same thing as 10:00 - 10:15am. or 9-10 am on Monday is not the same thing as 9-10am on Tuesday (or next Monday). All steps have a unique key associated with them.

You should use the following static class methods to generate your counter objects.

class StatsCollector

These easy to use variables are provided for convenience for setting up your collection windows. Note that some are duplicates for naming convention only.

Variables:
  • SECONDS_1_MINUTE – The number of seconds in 1 minute
  • SECONDS_15_MINUTE – The number of seconds in 15 minutes
  • SECONDS_30_MINUTE – The number of seconds in 30 minutes
  • SECONDS_1_HOUR – The number of seconds in 1 hour
  • SECONDS_2_HOUR – The number of seconds in 2 hours
  • SECONDS_4_HOUR – The number of seconds in 4 hours
  • SECONDS_6_HOUR – The number of seconds in 6 hours
  • SECONDS_12_HOUR – The number of seconds in 12 hours
  • SECONDS_24_HOUR – The number of seconds in 24 hours
  • SECONDS_48_HOUR – The number of seconds in 48 hours
  • SECONDS_1_DAY – The number of seconds in 1 day
  • SECONDS_2_DAY – The number of seconds in 2 days
  • SECONDS_3_DAY – The number of seconds in 3 day
  • SECONDS_7_DAY – The number of seconds in 7 days
  • SECONDS_1_WEEK – The number of seconds in 1 week
  • SECONDS_30_DAY – The number of seconds in 30 days
get_time_window(redis_conn=None, host='localhost', port=6379, password=None, key='time_window_counter', cycle_time=5, start_time=None, window=SECONDS_1_HOUR, roll=True, keep_max=12)

Generates a new TimeWindow Counter. Useful for collecting number of hits generated between certain times

Parameters:
  • redis_conn – A premade redis connection (overrides host, port and password)
  • host (str) – the redis host
  • port (int) – the redis port
  • password (str) – the redis password
  • key (str) – the key for your stats collection
  • cycle_time (int) – how often to check for expiring counts
  • start_time (int) – the time to start valid collection
  • window (int) – how long to collect data for in seconds (if rolling)
  • roll (bool) – Roll the window after it expires, to continue collecting on a new date based key.
  • keep_max (bool) – If rolling the static window, the max number of prior windows to keep
Returns:

A TimeWindow counter object.

get_rolling_time_window(redis_conn=None, host='localhost', port=6379, password=None, key='rolling_time_window_counter', cycle_time=5, window=SECONDS_1_HOUR)

Generates a new RollingTimeWindow. Useful for collect data about the number of hits in the past X seconds

Parameters:
  • redis_conn – A premade redis connection (overrides host, port and password)
  • host (str) – the redis host
  • port (int) – the redis port
  • password (str) – the redis password
  • key (str) – the key for your stats collection
  • cycle_time (int) – how often to check for expiring counts
  • window (int) – the number of seconds behind now() to keep data for
Returns:

A RollingTimeWindow counter object.

get_counter(redis_conn=None, host='localhost', port=6379, password=None, key='counter', cycle_time=5, start_time=None, window=SECONDS_1_HOUR, roll=True, keep_max=12, start_at=0)

Generate a new Counter. Useful for generic distributed counters

Parameters:
  • redis_conn – A premade redis connection (overrides host, port and password)
  • host (str) – the redis host
  • port (int) – the redis port
  • password (str) – the redis password
  • key (str) – the key for your stats collection
  • cycle_time (int) – how often to check for expiring counts
  • start_time (int) – the time to start valid collection
  • window (int) – how long to collect data for in seconds (if rolling)
  • roll (bool) – Roll the window after it expires, to continue collecting on a new date based key.
  • keep_max (int) – If rolling the static window, the max number of prior windows to keep
  • start_at (int) – The integer to start counting at
Returns:

A Counter object.

get_unique_counter(redis_conn=None, host='localhost', port=6379, password=None, key='unique_counter', cycle_time=5, start_time=None, window=SECONDS_1_HOUR, roll=True, keep_max=12)

Generate a new UniqueCounter. Useful for exactly counting unique objects

Parameters:
  • redis_conn – A premade redis connection (overrides host, port and password)
  • host (str) – the redis host
  • port (int) – the redis port
  • password (str) – the redis password
  • key (str) – the key for your stats collection
  • cycle_time (int) – how often to check for expiring counts
  • start_time (int) – the time to start valid collection
  • window (int) – how long to collect data for in seconds (if rolling)
  • roll (bool) – Roll the window after it expires, to continue collecting on a new date based key.
  • keep_max (int) – If rolling the static window, the max number of prior windows to keep
Returns:

A UniqueCounter object.

get_hll_counter(redis_conn=None, host='localhost', port=6379, password=None, key='hyperloglog_counter', cycle_time=5, start_time=None, window=SECONDS_1_HOUR, roll=True, keep_max=12)

Generate a new HyperLogLogCounter. Useful for approximating extremely large counts of unique items

Parameters:
  • redis_conn – A premade redis connection (overrides host, port and password)
  • host (str) – the redis host
  • port (int) – the redis port
  • password (str) – the redis password
  • key (str) – the key for your stats collection
  • cycle_time (int) – how often to check for expiring counts
  • start_time (int) – the time to start valid collection
  • window (int) – how long to collect data for in seconds (if rolling)
  • roll (bool) – Roll the window after it expires, to continue collecting on a new date based key.
  • keep_max (int) – If rolling the static window, the max number of prior windows to keep
Returns:

A HyperLogLogCounter object.

get_bitmap_counter(redis_conn=None, host='localhost', port=6379, password=None, key='bitmap_counter', cycle_time=5, start_time=None, window=SECONDS_1_HOUR, roll=True, keep_max=12)

Generate a new BitMapCounter. Useful for creating different bitsets about users/items that have unique indices.

Parameters:
  • redis_conn – A premade redis connection (overrides host, port and password)
  • host (str) – the redis host
  • port (int) – the redis port
  • password (str) – the redis password
  • key (str) – the key for your stats collection
  • cycle_time (int) – how often to check for expiring counts
  • start_time (int) – the time to start valid collection
  • window (int) – how long to collect data for in seconds (if rolling)
  • roll (bool) – Roll the window after it expires, to continue collecting on a new date based key.
  • keep_max (int) – If rolling the static window, the max number of prior windows to keep
Returns:

A BitmapCounter object.

Each of the above methods generates a counter object that works in slightly different ways.

class TimeWindow
increment()

Increments the counter by 1.

value()
Returns:The value of the counter
get_key()
Returns:The string of the key being used
delete_key()

Deletes the key being used from Redis

class RollingTimeWindow
increment()

Increments the counter by 1.

value()
Returns:The value of the counter
get_key()
Returns:The string of the key being used
delete_key()

Deletes the key being used from Redis

class Counter
increment()

Increments the counter by 1.

value()
Returns:The value of the counter
get_key()
Returns:The string of the key being used
delete_key()

Deletes the key being used from Redis

class UniqueCounter
increment(item)

Tries to increment the counter by 1, if the item is unique

Parameters:item – the potentially unique item
value()
Returns:The value of the counter
get_key()
Returns:The string of the key being used
delete_key()

Deletes the key being used from Redis

class HyperLogLogCounter
increment(item)

Tries to increment the counter by 1, if the item is unique

Parameters:item – the potentially unique item
value()
Returns:The value of the counter
get_key()
Returns:The string of the key being used
delete_key()

Deletes the key being used from Redis

class BitmapCounter
increment(index)

Sets the bit at the particular index to 1

Parameters:item – the potentially unique item
value()
Returns:The number of bits set to 1 in the key
get_key()
Returns:The string of the key being used
delete_key()

Deletes the key being used from Redis

Usage

To use any counter, you should import the StatsCollector and use one of the static methods to generate your counting object. From there you can call increment() to increment the counter and value() to get the current count of the Redis key being used.

>>> from scutils.stats_collector import StatsCollector
>>> counter = StatsCollector.get_counter(host='scdev')
>>> counter.increment()
>>> counter.increment()
>>> counter.increment()
>>> counter.value()
3
>>> counter.get_key()
'counter:2016-01-31_19:00:00'

The key generated by the counter is based off of the UTC time of the machine it is running on. Note here since the default window time range is SECONDS_1_HOUR, the counter rounded the key down to the appropriate step.

Warning

When doing multi-threaded or multi-process counting on the same key, all counters operating on that key should be created with the counter style and the same parameters to avoid unintended behavior.

Example

In this example we are going count the number of times a user presses the Space bar while our program continuously runs.

Note

You will need the py-getch module from pip to run this example. pip install py-getch

import argparse
from getch import getch
from time import time
from scutils.stats_collector import StatsCollector

# set up arg parser
parser = argparse.ArgumentParser(
    description='Example key press stats collector.\n')
parser.add_argument('-rw', '--rolling-window', action='store_true',
                    required=False, help="Use a RollingTimeWindow counter",
                    default=False)
parser.add_argument('-r', '--redis-host', action='store', required=True,
                    help="The Redis host ip")
parser.add_argument('-p', '--redis-port', action='store', default='6379',
                    help="The Redis port")
parser.add_argument('-P', '--redis-password', action='store', default=None,
                    help="The Redis password")

args = vars(parser.parse_args())

the_window = StatsCollector.SECONDS_1_MINUTE

if args['rolling_window']:
    counter = StatsCollector.get_rolling_time_window(host=args['redis_host'],
                                                     port=args['redis_port'],
                                                     password=args['redis_password'],
                                                     window=the_window,
                                                     cycle_time=1)
else:
    counter = StatsCollector.get_time_window(host=args['redis_host'],
                                                     port=args['redis_port'],
                                                     password=args['redis_password'],
                                                     window=the_window,
                                                     keep_max=3)

print("Kill this program by pressing `ENTER` when done")

the_time = int(time())
floor_time = the_time % the_window
final_time = the_time - floor_time

pressed_enter = False
while not pressed_enter:
    print("The current counter value is " + str(counter.value()))
    key = getch()

    if key == '\r' or key == '\n':
        pressed_enter = True
    elif key == ' ':
        counter.increment()

    if not args['rolling_window']:
        new_time = int(time())
        floor_time = new_time % the_window
        new_final_time = new_time - floor_time

        if new_final_time != final_time:
            print("The counter window will roll soon")
            final_time = new_final_time

print("The final counter value is " + str(counter.value()))
counter.delete_key()

This code either creates a TimeWindow counter, or a RollingTimeWindow counter to collect the number of space bar presses that occurs while the program is running (press Enter to exit). With these two different settings, you can view the count for a specific minute or the count from the last 60 seconds.

Save the above code snippet, or use the example at utils/examples/example_sc.py. When running this example you will get similar results to the following.

$ python example_sc.py -r scdev
Kill this program by pressing `ENTER` when done
The current counter value is 0
The current counter value is 1
The current counter value is 2
The current counter value is 3
The current counter value is 4
The current counter value is 5
The current counter value is 6
The current counter value is 7
The final counter value is 7

It is fairly straightforward to increment the counter and to get the current value, and with only a bit of code tweaking you could use the other counters that the StatsCollector provides.