Parallels

Provides parallel processing for the algorithm classes.

It is written in C++/OpenMP to maximize CPU utilization. Even with a single thread, it runs faster than the default implementation in the Algo classes. Parallels can also speed up the most_similar function by using N2, an approximate nearest neighbor library. For performance figures and usage examples, please refer to the benchmark page and the unit test code.
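The idea of answering many independent per-key queries at once can be sketched in plain Python. This is only an analogy under stated assumptions: `parallel_map_keys`, `chunked`, `num_workers` and `chunk_size` are hypothetical names introduced here, and Python threads do not give the CPU utilization that the C++/OpenMP implementation aims for.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(seq, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def parallel_map_keys(query_fn, keys, num_workers=4, chunk_size=2):
    """Apply query_fn to every key, spreading chunks of keys across worker threads.

    Hypothetical helper for illustration; it is not part of buffalo.
    """
    def run_chunk(chunk):
        return [query_fn(key) for key in chunk]

    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # executor.map yields results in the order of the input chunks,
        # so the output lines up with `keys`.
        chunk_results = executor.map(run_chunk, chunked(keys, chunk_size))
        return [result for chunk in chunk_results for result in chunk]


if __name__ == "__main__":
    # A stand-in query: the real work would be a similarity search per key.
    print(parallel_map_keys(len, ["apple", "pear", "fig"], num_workers=2))
```

Each key is processed independently of the others, which is what makes this kind of workload easy to split across workers.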

class buffalo.parallel.base.Parallel(algo, *argv, **kwargs)

Bases: abc.ABC

most_similar(keys, topk=10, group='item', pool=None, repr=False, ef_search=-1, use_mmap=True)

Calculate the top-k most similar items for each key, in parallel.

Parameters:

keys (list) – Query keys
topk (int) – Number of results to return for each key
group (str) – Data group to search in (default: item)
pool (list or numpy.ndarray) – The item keys to restrict the search to. If it is a numpy.ndarray instance, it is treated as item indexes, which helps calculation speed. (default: None)
repr (bool) – Set True to return item keys instead of indexes.
ef_search (int) – Passed to N2 when an hnsw_index was given for the group. (default: -1, which means topk * 10)
use_mmap – Passed to N2 when an hnsw_index was given for the group. (default: True)

Returns:

list of tuple(key, score)
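To make the roles of topk, pool and repr concrete, here is a minimal brute-force sketch for a single query key. It is not buffalo's implementation: the function name, the `as_key` flag (standing in for `repr=True`), the use of cosine similarity as the score, and the choice to skip the query item itself are all assumptions made for illustration.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_sketch(query_key, keys, vectors, topk=10, pool=None, as_key=False):
    """Brute-force top-k neighbours of one key (hypothetical helper, not buffalo code).

    keys    -- item keys; keys[i] belongs to vectors[i]
    pool    -- optional iterable of item indexes that restricts the candidates
    as_key  -- return item keys instead of indexes
    """
    index_of = {key: i for i, key in enumerate(keys)}
    q = index_of[query_key]
    candidates = range(len(keys)) if pool is None else pool
    # Score every candidate except the query item itself (a choice made in this sketch).
    scored = [(i, cosine(vectors[q], vectors[i])) for i in candidates if i != q]
    top = heapq.nlargest(topk, scored, key=lambda pair: pair[1])
    return [(keys[i] if as_key else i, score) for i, score in top]


if __name__ == "__main__":
    keys = ["apple", "pear", "car", "truck"]
    vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.0, 0.9]]
    print(most_similar_sketch("apple", keys, vectors, topk=2, as_key=True))
    print(most_similar_sketch("apple", keys, vectors, topk=2, pool=[2, 3]))
```

Passing the pool as indexes, as in the second call, avoids a key-to-index lookup for every pool entry, which is one plausible reason an index array is faster than a list of keys.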

topk_recommendation(keys, topk=10, pool=None, repr=False)

Calculate the top-k recommendations for each user, in parallel.

Parameters:

keys (list) – Query keys
topk (int) – Number of results to return for each key
repr (bool) – Set True to return item keys instead of indexes.

Returns:

list of tuple(key, score)
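For a factorization model, a top-k recommendation for one user can be pictured as scoring items against that user's vector and keeping the best ones. The sketch below assumes an inner-product score and uses hypothetical names (`topk_recommendation_sketch`, `item_keys`); it illustrates the idea and is not the library's code.

```python
import heapq


def topk_recommendation_sketch(user_vec, item_vecs, topk=10, pool=None, item_keys=None):
    """Score items by inner product with one user vector and keep the top-k.

    pool      -- optional iterable of item indexes that restricts the candidates
    item_keys -- if given, results carry item keys instead of indexes
    (Hypothetical helper for illustration; not part of buffalo.)
    """
    candidates = range(len(item_vecs)) if pool is None else pool
    scored = [
        (i, sum(u * v for u, v in zip(user_vec, item_vecs[i])))
        for i in candidates
    ]
    top = heapq.nlargest(topk, scored, key=lambda pair: pair[1])
    if item_keys is None:
        return top
    return [(item_keys[i], score) for i, score in top]


if __name__ == "__main__":
    user = [1.0, 0.0]
    items = [[0.5, 0.5], [2.0, 0.0], [0.0, 3.0]]
    print(topk_recommendation_sketch(user, items, topk=2))
    print(topk_recommendation_sketch(user, items, topk=2, item_keys=["a", "b", "c"]))
```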

class buffalo.parallel.base.ParALS(algo, **kwargs)

Bases: buffalo.parallel.base.Parallel

most_similar(keys, topk=10, group='item', pool=None, repr=False, ef_search=-1, use_mmap=True): See the documentation of Parallel.

topk_recommendation(keys, topk=10, pool=None, repr=False): See the documentation of Parallel.

class buffalo.parallel.base.ParBPRMF(algo, **kwargs)

Bases: buffalo.parallel.base.ParALS

most_similar(keys, topk=10, group='item', pool=None, repr=False, ef_search=-1, use_mmap=True): See the documentation of Parallel.

topk_recommendation(keys, topk=10, pool=None, repr=False): See the documentation of Parallel.

class buffalo.parallel.base.ParW2V(algo, **kwargs)

Bases: buffalo.parallel.base.Parallel

most_similar(keys, topk=10, pool=None, repr=False, ef_search=-1, use_mmap=True): See the documentation of Parallel.

topk_recommendation(keys, topk=10, pool=None)

Calculate the top-k recommendations for each user, in parallel.

Parameters:

keys (list) – Query keys
topk (int) – Number of results to return for each key
repr (bool) – Set True to return item keys instead of indexes.

Returns:

list of tuple(key, score)

class buffalo.parallel.base.ParCFR(algo, *argv, **kwargs)

Bases: buffalo.parallel.base.Parallel

most_similar(keys, topk=10, group='item', pool=None, repr=False, ef_search=-1, use_mmap=True)

Calculate the top-k most similar items for each key, in parallel.

Parameters:

keys (list) – Query keys
topk (int) – Number of results to return for each key
group (str) – Data group to search in (default: item)
pool (list or numpy.ndarray) – The item keys to restrict the search to. If it is a numpy.ndarray instance, it is treated as item indexes, which helps calculation speed. (default: None)
repr (bool) – Set True to return item keys instead of indexes.
ef_search (int) – Passed to N2 when an hnsw_index was given for the group. (default: -1, which means topk * 10)
use_mmap – Passed to N2 when an hnsw_index was given for the group. (default: True)

Returns:

list of tuple(key, score)
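The documented default for ef_search (-1 meaning topk * 10) amounts to a small rule, sketched below. `resolve_ef_search` is a hypothetical helper name; how the library applies the rule internally is not shown in this reference, so treat this as a reading of the parameter description only.

```python
def resolve_ef_search(topk, ef_search=-1):
    """Return the effective ef_search: the documented default of -1 maps to topk * 10.

    Hypothetical helper for illustration; not part of buffalo or N2.
    """
    return topk * 10 if ef_search == -1 else ef_search


if __name__ == "__main__":
    print(resolve_ef_search(10))      # default: derived from topk
    print(resolve_ef_search(10, 50))  # explicit value is kept as given
```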

topk_recommendation(keys, topk=10, pool=None, repr=False)

Calculate the top-k recommendations for each user, in parallel.

Parameters:

keys (list) – Query keys
topk (int) – Number of results to return for each key
repr (bool) – Set True to return item keys instead of indexes.

Returns:

list of tuple(key, score)