query

This module contains the classes necessary to perform searches. It defines polymr.query.Index as well as a parallel version: polymr.query.ParallelIndex.


class polymr.query.Index(backend)

Create an index. The index contains the methods necessary to perform searches and to incrementally index records.

Parameters: backend (subclass of polymr.storage.AbstractBackend) – The storage backend from which to retrieve search results
add(records, idxs=[])

Incrementally index one or more records. This method performs the steps needed to add the records to the storage backend. Optionally, it can overwrite or update existing records by id.

Parameters: idxs (iterable of int) – Overwrite the records at these ids and update the index accordingly
Returns: The record ids of the added records.
Return type: list of int
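
The add-or-overwrite behavior described above can be pictured with a small in-memory stand-in. This is only a sketch under stated assumptions: ToyIndex is a hypothetical class, not part of polymr, and it assumes that idxs pairs up positionally with records and that new ids are allocated sequentially. The real method also updates the search index in the storage backend, which is not modeled here.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in; not polymr's implementation."""

    def __init__(self):
        self.records = {}  # record id -> record (a list of str fields)
        self.next_id = 0   # next unused id (assumed sequential allocation)

    def add(self, records, idxs=()):
        """Store records and return their ids.

        Assumption: idxs[i], when given, is the id that records[i] overwrites.
        """
        idxs = list(idxs)
        assigned = []
        for position, record in enumerate(records):
            if position < len(idxs):
                rec_id = idxs[position]   # overwrite an existing id
            else:
                rec_id = self.next_id     # allocate a fresh id
            self.records[rec_id] = record
            self.next_id = max(self.next_id, rec_id + 1)
            assigned.append(rec_id)
        return assigned


index = ToyIndex()
first_ids = index.add([["Ada", "Lovelace"], ["Alan", "Turing"]])
updated_ids = index.add([["Ada", "King"]], idxs=[0])
```

In this toy model the second call reuses id 0, so the record count stays at two while the stored fields for that id change.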
close()

Close the index. Clean up any temporary files. Close any connections. Shut down the backend.

search(query, limit=5, r=100000, n=600, k=None, extract_func=<function features>, score_func=<function hit>)

Find records that match a list of search fields.

Parameters:
  • query (list of str) – The search query. Try to find records that match these fields.
  • limit (int) – The max number of search results to return
  • r (int) – The search space, defined as the max number of record ids to tally before scoring search hits.
  • n (int) – The max number of search hits to compare to the query
  • k (int) – The max number of tokens to use when gathering search hits. When set, k works together with r: the smaller of the two hit sets produced by the r and k limits is used.
  • extract_func (Callable that maps a list of str to anything that can be used by score_func) – A function used in scoring search hits. This function breaks a list of search fields into features. The resulting feature collections are then compared to determine a search score. See polymr.score.features().
  • score_func (Callable that maps the output from extract_func to a float) – A function used in scoring search hits. This function takes the feature collections output by extract_func and produces a floating point score. The score describes how well a query matches a search hit. Low scores are returned first.
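
The contract between extract_func and score_func can be illustrated with a pair of custom callables. The sketch below is an assumption-laden example, not polymr's scoring code: extract_trigrams, jaccard_distance, and rank are hypothetical names, and it assumes score_func is called with the query's features and one hit's features. It only shows the shape of the two functions and the "low scores first" ordering.

```python
def extract_trigrams(fields):
    """Hypothetical extract_func: map a list of str to a set of character trigrams."""
    features = set()
    for field in fields:
        padded = "  " + field.lower() + " "
        for i in range(len(padded) - 2):
            features.add(padded[i:i + 3])
    return features


def jaccard_distance(query_features, hit_features):
    """Hypothetical score_func: 0.0 means identical features, 1.0 means none shared."""
    union = query_features | hit_features
    if not union:
        return 0.0
    return 1.0 - len(query_features & hit_features) / len(union)


def rank(query, candidates, limit=5,
         extract_func=extract_trigrams, score_func=jaccard_distance):
    """Score every candidate against the query and return the best `limit` hits."""
    query_features = extract_func(query)
    scored = [(score_func(query_features, extract_func(c)), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0])  # low scores are returned first
    return scored[:limit]


candidates = [["Alan", "Turing"], ["Ada", "Lovelace"], ["Ada", "Lovelance"]]
results = rank(["Ada", "Lovelace"], candidates, limit=2)
```

Here the exact match scores 0.0 and is ranked first, followed by the near match with the misspelled surname. Any pair of callables with compatible input and output types should slot in the same way.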
class polymr.query.ParallelIndex(backend_url, n_workers)

A parallel version of polymr.query.Index. This class has all the behavior of the serial index, but also has a method to perform many searches at once in a batch.

Parameters:
  • backend (subclass of polymr.storage.AbstractBackend) – The storage backend from which to retrieve search results
  • n_workers (int) – When using a parallel method, this parameter governs the number of searches executed simultaneously
close(timeout=None, close_backend=True)

Close the index. Clean up any temporary files. Close any connections. Shut down the backend.

search(query, limit=5, r=100000, n=600, k=None, extract_func=<function features>, score_func=<function hit>)

Perform a single search. See polymr.query.Index.search().

searchmany(queries, limit=5, r=100000, n=600, k=None, extract_func=<function features>, score_func=<function hit>)

Perform many searches in parallel. See polymr.query.Index.search() for information on arguments and keyword parameters.

Return type: Iterable of search result lists.
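
The batch pattern behind searchmany, where n_workers bounds how many searches run at once, can be sketched with the standard library alone. This is a sketch under stated assumptions, not the ParallelIndex implementation: toy_search, toy_searchmany, and RECORDS are hypothetical, and a thread pool is used here purely for illustration (the real class may use a different concurrency mechanism).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical record store standing in for a storage backend.
RECORDS = [["Ada", "Lovelace"], ["Alan", "Turing"], ["Grace", "Hopper"]]


def toy_search(query, limit=5):
    """Hypothetical single search: rank records by number of shared lowercase fields."""
    wanted = {field.lower() for field in query}
    scored = []
    for record in RECORDS:
        overlap = len(wanted & {field.lower() for field in record})
        if overlap:
            scored.append((-overlap, record))  # negate so low scores come first
    scored.sort(key=lambda pair: pair[0])
    return [record for _, record in scored[:limit]]


def toy_searchmany(queries, n_workers=2, limit=5):
    """Run one toy_search per query, with at most n_workers running simultaneously."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in the same order as the input queries.
        return list(pool.map(lambda q: toy_search(q, limit=limit), queries))


batch = toy_searchmany([["ada", "lovelace"], ["hopper"], ["nobody"]])
```

Each element of batch is the result list for the query at the same position, so a query with no hits yields an empty list rather than being dropped.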