query
This module contains the classes necessary to perform searches. It defines polymr.query.Index as well as a parallel version, polymr.query.ParallelIndex.
-
class polymr.query.Index(backend)
Create an index. The index contains the methods necessary to perform searches and to incrementally index records.
Parameters: backend (subclass of polymr.storage.AbstractBackend) – The storage backend from which to retrieve search results.
-
add(records, idxs=[])
Incrementally index one or more records. This method performs the steps necessary to add the records to the storage backend. Optionally, it can overwrite or update records by id.
Parameters: idxs (iterable of int) – Overwrite the records at these ids and update the index accordingly.
Returns: The record ids of the added records.
Return type: list of int
-
close()
Close the index. Clean up any temporary files. Close any connections. Shut down the backend.
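The add/close lifecycle described above can be sketched with `contextlib.closing`, which calls `close()` on any object even if an exception is raised part-way through. This is a minimal sketch under stated assumptions: `StubIndex` and `index_and_close` are hypothetical names, the stub only imitates the documented shape of `add(records, idxs)` and `close()`, and representing a record as a list of strings and assigning ids sequentially are illustrative choices, not polymr's behavior.

```python
from contextlib import closing


class StubIndex:
    """Hypothetical in-memory stand-in with the documented add/close shape.

    This is NOT polymr's implementation; it only mimics the contract
    described above so the calling pattern can be run on its own.
    """

    def __init__(self):
        self.records = {}
        self.closed = False
        self._next_id = 0

    def add(self, records, idxs=()):
        idxs = list(idxs)
        ids = []
        for pos, record in enumerate(records):
            # Reuse a caller-supplied id (overwrite) or allocate a new one.
            if pos < len(idxs):
                rec_id = idxs[pos]
            else:
                rec_id = self._next_id
            self.records[rec_id] = record
            self._next_id = max(self._next_id, rec_id + 1)
            ids.append(rec_id)
        return ids

    def close(self):
        self.closed = True


def index_and_close(index, records, idxs=()):
    """Add records, then close the index even if add() raises."""
    with closing(index):
        return index.add(records, idxs=idxs)
```

Any object with `add` and `close` methods of this shape could be passed to `index_and_close`, so the same pattern should carry over to a real index as long as its methods match the signatures documented here.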
-
search(query, limit=5, r=100000, n=600, k=None, extract_func=<function features>, score_func=<function hit>)
Find records that match a list of search fields.
Parameters:
- query (list of str) – The search query. Try to find records that match these fields.
- limit (int) – The max number of search results to return
- r (int) – The search space, defined as the max number of record ids to tally before scoring search hits.
- n (int) – The max number of search hits to compare to the query
- k (int) – The max number of tokens to use when gathering search hits. Optionally coordinates with r to consider the lesser of the two hit sets returned by either r or k.
- extract_func (callable that maps a list of str to anything that can be used by score_func) – A function used in scoring search hits. This function breaks up a list of search fields into features. The collections of features are then compared to determine a search score. See polymr.score.features().
- score_func (callable that maps the output from extract_func to a float) – A function used in scoring search hits. This function takes the feature collections output from extract_func and produces a floating-point score. The score describes how well a query matches a search hit. Low scores are returned first.
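The division of labor between extract_func and score_func can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `trigram_features`, `jaccard_distance_score` and `rank` are hypothetical names, character trigrams and Jaccard distance are swapped-in techniques rather than what polymr.score.features() and the default score function do, and calling the score function with two arguments (query features, hit features) is a guess at the convention.

```python
def trigram_features(fields):
    """Hypothetical extract_func: one set of character trigrams per field."""
    feats = []
    for field in fields:
        padded = f"  {field.lower()} "
        feats.append({padded[i:i + 3] for i in range(len(padded) - 2)})
    return feats


def jaccard_distance_score(query_feats, hit_feats):
    """Hypothetical score_func: mean Jaccard distance across fields.

    0.0 means identical features; values near 1.0 mean little overlap.
    """
    dists = []
    for q, h in zip(query_feats, hit_feats):
        union = q | h
        dists.append(1.0 - len(q & h) / len(union) if union else 0.0)
    return sum(dists) / len(dists) if dists else 1.0


def rank(query, candidates, limit=5):
    """Score each candidate against the query and return the lowest scores first."""
    query_feats = trigram_features(query)
    scored = [
        (jaccard_distance_score(query_feats, trigram_features(c)), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0])
    return scored[:limit]
```

Because low scores are returned first, a distance-style measure fits naturally: an exact match scores 0.0 and sorts to the top, while a similarity-style measure would need to be inverted before use.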
-
-
class polymr.query.ParallelIndex(backend_url, n_workers)
A parallel version of polymr.query.Index. This class has all the behavior of the serial index, but also has a method to perform many searches at once in a batch.
Parameters:
- backend (subclass of polymr.storage.AbstractBackend) – The storage backend from which to retrieve search results
- n_workers (int) – When using a parallel method, this parameter governs the number of searches executed simultaneously
-
close(timeout=None, close_backend=True)
Close the index. Clean up any temporary files. Close any connections. Shut down the backend.
-
search(query, limit=5, r=100000, n=600, k=None, extract_func=<function features>, score_func=<function hit>)
Just perform one search. See polymr.query.Index.search().
-
searchmany(queries, limit=5, r=100000, n=600, k=None, extract_func=<function features>, score_func=<function hit>)
Perform many searches in parallel. See polymr.query.Index for information on arguments and keyword parameters.
Return type: Iterable of search result lists.
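The batch behavior described for searchmany, where several searches run at once and one result list comes back per query, can be approximated with the standard library. This is only a sketch of the idea, not how ParallelIndex is implemented: `StubIndex` and `search_many_sketch` are hypothetical names, a thread pool is an assumed substitute for whatever worker model the class uses, and the stub's matching rule is invented for the demonstration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class StubIndex:
    """Hypothetical stand-in exposing only search(query, limit=...)."""

    def __init__(self, records):
        self.records = records

    def search(self, query, limit=5, **kwargs):
        # Toy matching rule: a record is a hit if it shares any field
        # with the query, ignoring case.
        wanted = {field.lower() for field in query}
        hits = [r for r in self.records if wanted & {f.lower() for f in r}]
        return hits[:limit]


def search_many_sketch(index, queries, n_workers=2, **search_kwargs):
    """Run index.search for each query, at most n_workers at a time.

    Executor.map yields results in the order of the input queries,
    so result i belongs to query i.
    """
    run = partial(index.search, **search_kwargs)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run, queries))
```

For example, `search_many_sketch(index, queries, n_workers=4, limit=3)` would return a list with one entry per query, each entry holding at most three hits.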