featurizers
This module contains all the ways to break up a search query into feature sets.
polymr.featurizers.featurize_compress(rec)
Compute the ngram set of a record with a kmer size of 3 and a step size of 1, compressing each record attribute with zlib. Useful for indexing up to 3GB of search fields.
Returns: A set of 3-character ngram bytestrings
polymr.featurizers.featurize_compress_k4(rec)
Compute the ngram set of a record with a kmer size of 4 and a step size of 1, compressing each record attribute with zlib. Useful for indexing more than 3GB of search fields.
Returns: A set of 4-character ngram bytestrings
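The compressed featurizers above can be pictured with a rough sketch. This is not polymr's implementation: the helper name `compressed_ngram_set`, the treatment of a record as a list of strings, and the choice to compress each attribute first and then slice ngrams out of the compressed bytes are all assumptions made for illustration. Only `zlib.compress` is a real standard-library call.

```python
import zlib


def compressed_ngram_set(fields, k=3, step=1):
    """Hypothetical helper: zlib-compress each field, then collect k-byte ngrams.

    Assumes `fields` is an iterable of str and that compression happens
    before ngram extraction; the real library may differ on both points.
    """
    features = set()
    for field in fields:
        # Compress the UTF-8 bytes of one record attribute.
        data = zlib.compress(field.encode("utf-8"))
        # Slide a window of width k over the compressed bytes.
        for i in range(0, len(data) - k + 1, step):
            features.add(data[i:i + k])
    return features


if __name__ == "__main__":
    rec = ["Jane Doe", "123 Main Street"]
    feats = compressed_ngram_set(rec, k=3)
    print(len(feats), "distinct 3-byte features")
```

Passing `k=4` to the same helper gives the 4-byte variant, which corresponds to the larger index size the documentation associates with `featurize_compress_k4`.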
polymr.featurizers.featurize_k2(rec)
Compute the ngram set of a record with a kmer size of 2 and a step size of 1. Useful for indexing up to 300KB of search fields.
Returns: A set of 2-character ngram bytestrings
polymr.featurizers.featurize_k3(rec)
Compute the ngram set of a record with a kmer size of 3 and a step size of 1. Useful for indexing up to 1GB of search fields.
Returns: A set of 3-character ngram bytestrings
polymr.featurizers.featurize_k4(rec)
Compute the ngram set of a record with a kmer size of 4 and a step size of 1. Useful for indexing 1GB or more of search fields.
Returns: A set of 4-character ngram bytestrings
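To make "kmer size" and "step size" concrete, here is a minimal sketch of ngram extraction over a record. It is an illustration, not the library's code: the helper name `ngram_set` and the assumption that a record is a list of strings encoded as UTF-8 are hypothetical.

```python
def ngram_set(fields, k=3, step=1):
    """Hypothetical helper: return the set of k-byte ngrams across all fields.

    Assumes `fields` is an iterable of str. Fields shorter than k
    contribute no features.
    """
    features = set()
    for field in fields:
        data = field.encode("utf-8")
        # Window start positions: 0, step, 2*step, ... while a full
        # k-byte window still fits inside the field.
        for i in range(0, max(len(data) - k + 1, 0), step):
            features.add(data[i:i + k])
    return features


if __name__ == "__main__":
    rec = ["abcd", "xyz"]
    print(sorted(ngram_set(rec, k=3)))  # [b'abc', b'bcd', b'xyz']
    print(len(ngram_set(rec, k=2)))     # 5
```

A smaller kmer size produces fewer distinct features, each shared by more records, which fits the guidance above that k=2 suits small indexes while k=4 suits larger ones.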