featurizers

This module contains the functions that break up a record or search query into a feature set, i.e. the set of its overlapping n-grams.
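To make "feature set" concrete, here is a minimal sketch of n-gram featurization with a configurable kmer size and a step size of 1. It is not polymr's implementation. The helper names `ngrams` and `featurize` are hypothetical, and the sketch assumes a record is an iterable of string attributes, each encoded as UTF-8 before its n-grams are taken.

```python
from typing import Iterable


def ngrams(data: bytes, k: int, step: int = 1) -> set[bytes]:
    """Return every k-byte window of `data`, advancing `step` bytes at a time."""
    return {data[i:i + k] for i in range(0, len(data) - k + 1, step)}


def featurize(rec: Iterable[str], k: int) -> set[bytes]:
    """Union the k-grams of every attribute in a record.

    Hypothetical record shape: an iterable of string attributes,
    each encoded as UTF-8 before n-gram extraction.
    """
    features: set[bytes] = set()
    for attr in rec:
        features |= ngrams(attr.encode("utf-8"), k)
    return features


print(featurize(["banana"], 2))
```

With k=2, "banana" produces the windows ba, an, na, an, na, so the feature set is {b'ba', b'an', b'na'}. Duplicate windows collapse because the result is a set. An attribute shorter than k contributes no features.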


polymr.featurizers.featurize_compress(rec)

Compute the ngram set of a record with a kmer size of 3 and a step size of 1, but compress each record attribute with zlib. Useful for indexing up to 3GB of search fields.

Returns: A set of 3-character ngram bytestrings

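The compressed variants can be pictured with the sketch below. It assumes, without confirmation from the source, that each attribute is UTF-8 encoded and zlib-compressed first, and that k-byte windows are then taken over the compressed bytes. The function name `featurize_compressed` is hypothetical.

```python
import zlib
from typing import Iterable


def featurize_compressed(rec: Iterable[str], k: int = 3) -> set[bytes]:
    """Hypothetical sketch: zlib-compress each attribute, then take k-byte windows."""
    features: set[bytes] = set()
    for attr in rec:
        compressed = zlib.compress(attr.encode("utf-8"))
        # Step size 1: every offset from 0 through len(compressed) - k.
        for i in range(len(compressed) - k + 1):
            features.add(compressed[i:i + k])
    return features


print(sorted(featurize_compressed(["hello world"])))
```

In this sketch, every compressed attribute begins with the same two-byte zlib header (b"x\x9c" at the default level), so the first window of each attribute is shared widely across records. Passing k=4 gives the 4-byte analogue of featurize_compress_k4.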
polymr.featurizers.featurize_compress_k4(rec)

Compute the ngram set of a record with a kmer size of 4 and a step size of 1, but compress each record attribute with zlib. Useful for indexing more than 3GB of search fields.

Returns: A set of 4-character ngram bytestrings

polymr.featurizers.featurize_k2(rec)

Compute the ngram set of a record with a kmer size of 2 and a step size of 1. Useful for indexing up to 300KB of search fields.

Returns: A set of 2-character ngram bytestrings

polymr.featurizers.featurize_k3(rec)

Compute the ngram set of a record with a kmer size of 3 and a step size of 1. Useful for indexing up to 1GB of search fields.

Returns: A set of 3-character ngram bytestrings

polymr.featurizers.featurize_k4(rec)

Compute the ngram set of a record with a kmer size of 4 and a step size of 1. Useful for indexing 1GB or more of search fields.

Returns: A set of 4-character ngram bytestrings
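One way to see why larger kmer sizes are paired with larger indexes is the size of the feature space. This is an interpretation rather than something the source states. At the byte level there are at most 256^k distinct k-grams, so a larger k allows far more distinct features and, in a large corpus, tends to spread records across more of them. The helper `max_distinct_features` is hypothetical, and real text uses only a fraction of this upper bound.

```python
def max_distinct_features(k: int) -> int:
    """Upper bound on distinct k-byte n-grams: each of k positions holds one of 256 byte values."""
    return 256 ** k


for k in (2, 3, 4):
    print(f"k={k}: at most {max_distinct_features(k):,} distinct features")
```

This prints upper bounds of 65,536 for k=2, 16,777,216 for k=3, and 4,294,967,296 for k=4.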