Parallels¶
Adds parallel processing to the algorithm classes.
It is written in C++/OpenMP to maximize CPU utilization. Even with a single thread, it runs faster than the default implementation in the Algo classes. Parallels also accelerates the most_similar function using N2, an approximate nearest neighbors library. For performance figures and usage examples, please refer to the benchmark page and the unit test code.
-
class
buffalo.parallel.base.
Parallel
(algo, *argv, **kwargs)¶ Bases:
abc.ABC
-
most_similar
(keys, topk=10, group='item', pool=None, repr=False, ef_search=-1, use_mmap=True)¶ Calculate the top-K most similar items for each key, in parallel.
Parameters: - keys (list) – Query keys
- topk (int) – Number of top results to return for each key
- group (str) – Data group to search in (default: item)
- pool (list or numpy.ndarray) – The list of item keys to search within. If it is a numpy.ndarray instance, it is treated as item indexes, which speeds up the calculation. (default: None)
- repr (bool) – Set to True to return item keys instead of indexes.
- ef_search (int) – Passed to N2 when an hnsw_index is given for the group. (default: -1, which means topk * 10)
- use_mmap – Passed to N2 when an hnsw_index is given for the group. (default: True)
Returns: list of tuple(key, score)
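To make the parameters above concrete, here is a minimal pure-Python sketch of what a top-K similarity query computes. It is not Buffalo's implementation: the function name most_similar_sketch, the factors and index_to_key arguments, the choice of cosine similarity, the exclusion of the query item from its own results, and the one-list-per-key return shape are all assumptions made for illustration.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_sketch(keys, factors, index_to_key, topk=10, pool=None, repr=False):
    """Hypothetical stand-in for most_similar; not Buffalo's actual code.

    factors: list of item vectors, addressed by item index.
    index_to_key: list mapping item index -> item key.
    pool: optional iterable of candidate item indexes (assumed semantics).
    repr: mirrors the documented flag; True returns keys instead of indexes.
    """
    key_to_index = {k: i for i, k in enumerate(index_to_key)}
    candidates = range(len(factors)) if pool is None else list(pool)
    results = []
    for key in keys:
        q = key_to_index[key]
        # Score every candidate except the query item itself (an assumption).
        scored = ((cosine(factors[q], factors[i]), i) for i in candidates if i != q)
        best = heapq.nlargest(topk, scored)
        results.append([(index_to_key[i] if repr else i, s) for s, i in best])
    return results
```

Calling most_similar_sketch(["a"], factors, index_to_key, topk=2, repr=True) returns one list of (key, score) tuples for the query "a". Passing pool restricts the candidates to the given indexes.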
-
topk_recommendation
(keys, topk=10, pool=None, repr=False)¶ Calculate the top-K recommendations for each user, in parallel.
Parameters: - keys (list) – Query keys
- topk (int) – Number of top results to return for each key
- repr (bool) – Set to True to return item keys instead of indexes.
Returns: list of tuple(key, score)
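As a rough illustration of a top-K recommendation query, the sketch below ranks items for each user by the inner product of user and item vectors. This scoring rule is an assumption typical of matrix factorization models, not a statement about Buffalo's internals. The names topk_recommendation_sketch, user_factors, item_factors, and item_keys are hypothetical, and the pool parameter is left out because its behaviour is not documented above.

```python
import heapq


def topk_recommendation_sketch(user_keys, user_factors, item_factors, item_keys,
                               topk=10, repr=False):
    """Hypothetical stand-in: rank items for each user by inner product.

    user_factors: dict mapping user key -> user vector.
    item_factors: list of item vectors, addressed by item index.
    item_keys: list mapping item index -> item key.
    """
    results = []
    for user in user_keys:
        u = user_factors[user]
        # Score each item as the dot product of the user and item vectors.
        scored = ((sum(a * b for a, b in zip(u, vec)), i)
                  for i, vec in enumerate(item_factors))
        best = heapq.nlargest(topk, scored)
        results.append([(item_keys[i] if repr else i, score) for score, i in best])
    return results
```

With repr=True each result row holds (item key, score) tuples; with the default repr=False it holds (item index, score) tuples.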
-
-
class
buffalo.parallel.base.
ParALS
(algo, **kwargs)¶ Bases:
buffalo.parallel.base.Parallel
-
most_similar
(keys, topk=10, group='item', pool=None, repr=False, ef_search=-1, use_mmap=True)¶ See the documentation of Parallel.
-
topk_recommendation
(keys, topk=10, pool=None, repr=False)¶ See the documentation of Parallel.
-
-
class
buffalo.parallel.base.
ParBPRMF
(algo, **kwargs)¶ Bases:
buffalo.parallel.base.ParALS
-
most_similar
(keys, topk=10, group='item', pool=None, repr=False, ef_search=-1, use_mmap=True)¶ See the documentation of Parallel.
-
topk_recommendation
(keys, topk=10, pool=None, repr=False)¶ See the documentation of Parallel.
-
-
class
buffalo.parallel.base.
ParW2V
(algo, **kwargs)¶ Bases:
buffalo.parallel.base.Parallel
-
most_similar
(keys, topk=10, pool=None, repr=False, ef_search=-1, use_mmap=True)¶ See the documentation of Parallel.
-
topk_recommendation
(keys, topk=10, pool=None)¶ Calculate the top-K recommendations for each user, in parallel.
Parameters: - keys (list) – Query keys
- topk (int) – Number of top results to return for each key
- repr (bool) – Set to True to return item keys instead of indexes.
Returns: list of tuple(key, score)
-
-
class
buffalo.parallel.base.
ParCFR
(algo, *argv, **kwargs)¶ Bases:
buffalo.parallel.base.Parallel
-
most_similar
(keys, topk=10, group='item', pool=None, repr=False, ef_search=-1, use_mmap=True)¶ Calculate the top-K most similar items for each key, in parallel.
Parameters: - keys (list) – Query keys
- topk (int) – Number of top results to return for each key
- group (str) – Data group to search in (default: item)
- pool (list or numpy.ndarray) – The list of item keys to search within. If it is a numpy.ndarray instance, it is treated as item indexes, which speeds up the calculation. (default: None)
- repr (bool) – Set to True to return item keys instead of indexes.
- ef_search (int) – Passed to N2 when an hnsw_index is given for the group. (default: -1, which means topk * 10)
- use_mmap – Passed to N2 when an hnsw_index is given for the group. (default: True)
Returns: list of tuple(key, score)
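The ef_search default described above can be read as a small resolution rule: -1 is a sentinel that expands to topk * 10. The helper below is a hypothetical sketch of that rule only; the name resolve_ef_search is not part of Buffalo or N2.

```python
def resolve_ef_search(topk, ef_search=-1):
    """Return the effective ef_search value.

    Per the parameter description, -1 stands for topk * 10;
    any other value is used as given.
    """
    return topk * 10 if ef_search == -1 else ef_search
```

In approximate nearest neighbor search, a larger ef_search generally trades query speed for better recall, so scaling the default with topk keeps the search breadth proportional to the number of results requested.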
-
topk_recommendation
(keys, topk=10, pool=None, repr=False)¶ Calculate the top-K recommendations for each user, in parallel.
Parameters: - keys (list) – Query keys
- topk (int) – Number of top results to return for each key
- repr (bool) – Set to True to return item keys instead of indexes.
Returns: list of tuple(key, score)
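Because each query key is scored independently, the work can be split across workers. Buffalo does this in C++/OpenMP; the sketch below uses Python's standard thread pool purely as a stand-in to show the chunk-and-merge pattern. The names chunked, parallel_map_keys, and score_chunk, and the num_workers and chunk_size parameters, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(seq, size):
    """Yield consecutive slices of seq, each with at most size elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def parallel_map_keys(keys, score_chunk, num_workers=4, chunk_size=2):
    """Apply score_chunk to chunks of keys concurrently and merge the rows.

    score_chunk takes a list of keys and returns one result row per key.
    Executor.map yields results in submission order, so the merged
    output lines up with the input keys.
    """
    chunks = list(chunked(keys, chunk_size))
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        parts = executor.map(score_chunk, chunks)
        return [row for part in parts for row in part]
```

Any per-key scoring function can be passed as score_chunk, provided it returns its rows in the same order as the keys it receives.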
-