Parallels

Provides parallel processing features for the algorithm classes.

It is written in C++/OpenMP to maximize CPU utilization. Even with a single thread, it runs faster than the default implementation of the Algo classes. Parallels also provides a boosting feature for the most_similar function, based on the approximate nearest neighbor library N2. For performance figures and usage examples, refer to the benchmark page and the unit test code.
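Because each query is independent, the work parallelizes naturally. The sketch below illustrates only that fan-out pattern, using Python's standard `concurrent.futures` thread pool in place of OpenMP. `parallel_map` and `num_workers` are hypothetical names for illustration, not part of Buffalo's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Q = TypeVar("Q")
R = TypeVar("R")


def parallel_map(fn: Callable[[Q], R], queries: Sequence[Q], num_workers: int = 4) -> List[R]:
    """Apply fn to every query on a pool of worker threads (hypothetical helper).

    Queries do not depend on each other, so they can be processed concurrently;
    executor.map returns results in the same order as the input queries.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        return list(executor.map(fn, queries))


if __name__ == "__main__":
    # Toy workload standing in for a per-key top-K search.
    print(parallel_map(lambda x: x * x, [1, 2, 3, 4], num_workers=2))  # [1, 4, 9, 16]
```

In CPython the global interpreter lock keeps pure-Python scoring from running truly in parallel across threads, which is one reason to move the hot loop into C++/OpenMP as described above.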

class buffalo.parallel.base.Parallel(algo, *argv, **kwargs)

Bases: abc.ABC

most_similar(keys, topk=10, group='item', pool=None, repr=False, ef_search=-1, use_mmap=True)

Calculate the top-K most similar items for each key in parallel.

Parameters:
  • keys (list) – Query keys
  • topk (int) – Number of results to return for each key
  • group (str) – Data group to search in (default: item)
  • pool (list or numpy.ndarray) – The list of item keys to search among. If it is a numpy.ndarray, it is treated as item indices, which speeds up the calculation. (default: None)
  • repr (bool) – Set to True to return item keys instead of indices.
  • ef_search (int) – Passed to N2 when an hnsw_index is given for the group. (default: -1, which means topk * 10)
  • use_mmap (bool) – Passed to N2 when an hnsw_index is given for the group. (default: True)
Returns:

list of tuple(key, score)
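To make the parameter semantics concrete, here is a brute-force, exact version of most_similar in plain Python. It is a hypothetical sketch, not Buffalo's implementation: it assumes cosine similarity over item vectors, excludes the query item from its own results, and takes the item keys and vectors as explicit arguments (`item_keys`, `vectors`, and `num_workers` are illustrative names). It shows how pool narrows the candidate set and how repr switches the output between indices and keys.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    keys: List[str],
    item_keys: List[str],
    vectors: List[List[float]],
    topk: int = 10,
    pool: Optional[List[str]] = None,
    repr: bool = False,
    num_workers: int = 4,
) -> List[List[Tuple[object, float]]]:
    """Exact top-K similar items for each query key (hypothetical sketch)."""
    index_of: Dict[str, int] = {k: i for i, k in enumerate(item_keys)}
    # Map pool keys to indices once up front; callers that already hold
    # indices could skip this lookup entirely.
    if pool is not None:
        candidates = [index_of[k] for k in pool]
    else:
        candidates = list(range(len(item_keys)))

    def query(key: str) -> List[Tuple[object, float]]:
        q = index_of[key]
        # Score every candidate except the query item itself.
        scored = ((cosine(vectors[q], vectors[c]), c) for c in candidates if c != q)
        best = heapq.nlargest(topk, scored)
        # repr=True returns item keys; otherwise raw item indices.
        return [(item_keys[c] if repr else c, s) for s, c in best]

    # Each query key is independent, so they are scored on a thread pool.
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        return list(executor.map(query, keys))
```

This exact scan costs O(|pool| · d) per query for d-dimensional vectors. When an hnsw_index is available, the N2 path replaces the scan with an approximate graph search whose breadth is governed by ef_search.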

topk_recommendation(keys, topk=10, pool=None, repr=False)

Calculate the top-K recommendations for each user in parallel.

Parameters:
  • keys (list) – Query keys
  • topk (int) – Number of results to return for each key
  • repr (bool) – Set to True to return item keys instead of indices.
Returns:

list of tuple(key, score)
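A similar sketch for topk_recommendation, assuming the common matrix-factorization rule that a user's score for an item is the dot product of their latent vectors. The `user_factors`, `item_keys`, and `item_factors` arguments are hypothetical inputs introduced for illustration; whether each Parallel subclass scores items exactly this way is an assumption of this sketch.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def topk_recommendation(
    keys: List[str],
    user_factors: Dict[str, List[float]],
    item_keys: List[str],
    item_factors: List[List[float]],
    topk: int = 10,
    pool: Optional[List[str]] = None,
    repr: bool = False,
) -> List[List[Tuple[object, float]]]:
    """Top-K items per user by dot-product score (hypothetical sketch)."""
    index_of = {k: i for i, k in enumerate(item_keys)}
    # Restrict scoring to the pool if one is given, otherwise score every item.
    if pool is not None:
        candidates = [index_of[k] for k in pool]
    else:
        candidates = list(range(len(item_keys)))
    results = []
    for key in keys:  # users are independent, so this loop parallelizes trivially
        u = user_factors[key]
        best = heapq.nlargest(topk, ((dot(u, item_factors[c]), c) for c in candidates))
        # repr=True returns item keys; otherwise raw item indices.
        results.append([(item_keys[c] if repr else c, s) for s, c in best])
    return results
```

Since each user's list depends only on that user's vector, the loop over keys can be split across workers without any coordination beyond collecting the results in order.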

class buffalo.parallel.base.ParALS(algo, **kwargs)

Bases: buffalo.parallel.base.Parallel

most_similar(keys, topk=10, group='item', pool=None, repr=False, ef_search=-1, use_mmap=True)

See the documentation of Parallel.

topk_recommendation(keys, topk=10, pool=None, repr=False)

See the documentation of Parallel.

class buffalo.parallel.base.ParBPRMF(algo, **kwargs)

Bases: buffalo.parallel.base.ParALS

most_similar(keys, topk=10, group='item', pool=None, repr=False, ef_search=-1, use_mmap=True)

See the documentation of Parallel.

topk_recommendation(keys, topk=10, pool=None, repr=False)

See the documentation of Parallel.

class buffalo.parallel.base.ParW2V(algo, **kwargs)

Bases: buffalo.parallel.base.Parallel

most_similar(keys, topk=10, pool=None, repr=False, ef_search=-1, use_mmap=True)

See the documentation of Parallel.

topk_recommendation(keys, topk=10, pool=None)

Calculate the top-K recommendations for each user in parallel.

Parameters:
  • keys (list) – Query keys
  • topk (int) – Number of results to return for each key
  • repr (bool) – Set to True to return item keys instead of indices.
Returns:

list of tuple(key, score)

class buffalo.parallel.base.ParCFR(algo, *argv, **kwargs)

Bases: buffalo.parallel.base.Parallel

most_similar(keys, topk=10, group='item', pool=None, repr=False, ef_search=-1, use_mmap=True)

Calculate the top-K most similar items for each key in parallel.

Parameters:
  • keys (list) – Query keys
  • topk (int) – Number of results to return for each key
  • group (str) – Data group to search in (default: item)
  • pool (list or numpy.ndarray) – The list of item keys to search among. If it is a numpy.ndarray, it is treated as item indices, which speeds up the calculation. (default: None)
  • repr (bool) – Set to True to return item keys instead of indices.
  • ef_search (int) – Passed to N2 when an hnsw_index is given for the group. (default: -1, which means topk * 10)
  • use_mmap (bool) – Passed to N2 when an hnsw_index is given for the group. (default: True)
Returns:

list of tuple(key, score)

topk_recommendation(keys, topk=10, pool=None, repr=False)

Calculate the top-K recommendations for each user in parallel.

Parameters:
  • keys (list) – Query keys
  • topk (int) – Number of results to return for each key
  • repr (bool) – Set to True to return item keys instead of indices.
Returns:

list of tuple(key, score)