Installing polymr

Install the latest release of polymr with pip:

pip install polymr

To get the full source distribution, including extra storage backends, tests, and documentation, clone the GitHub repository:

git clone https://github.com/massmutual/polymr
cd polymr

To run the tests for polymr:

python setup.py test

To build the docs:

cd doc
make html

Using the Python API

The polymr API is best shown by example. The data directory of the source repository contains a CSV of the senators serving in the 190th Massachusetts General Court. The examples below index, query, and modify that data.

Creating polymr indexes

Let’s start by opening and indexing the sample data, storing it in a LevelDB backend.

>>> import polymr
>>> be = polymr.storage.LevelDBBackend('data/ma_senators.polymr')
>>> with open('data/ma_senators.csv') as f:
...     records = polymr.record.from_csv(
...         f,
...         searched_fields_idxs=[0,1],
...         pk_field_idx=3
...     )
...     polymr.index.create(records, 1, 10, be)
...
>>> be.get_rowcount()
38

Searching

Now that we have a backend populated with an index, we can create an Index object and run some searches.

>>> import polymr
>>> be = polymr.storage.LevelDBBackend('data/ma_senators.polymr')
>>> index = polymr.query.Index(be)
>>> index.search(['', 'oconnor'])
[{'fields': ['Patrick', "O'Connor"], 'pk': '520', 'score': 0.7777777777777778, 'data': [b'Republican', b'617-722-1646', b'Patrick.OConnor@masenate.gov'], 'rownum': 26}, {'fields': ['Kathleen', "O'Connor Ives"], 'pk': '215', 'score': 0.8571428571428572, 'data': [b'Democrat', b'617-722-1604', b'Kathleen.OConnorIves@masenate.gov'], 'rownum': 27}, {'fields': ['Sonia', 'Chang-Diaz'], 'pk': '111', 'score': 1.0, 'data': [b'Democrat', b'617-722-1673', b'Sonia.Chang-Diaz@masenate.gov'], 'rownum': 5}]
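
The raw list of result dicts is hard to read at a glance. The sketch below is one way to turn each hit into a single line. It assumes only the dict shape shown in the output above (fields, pk, score, data, rownum); summarize is a hypothetical helper written for this example, not part of polymr.

```python
# Result dicts copied from the search output above (shortened to two hits).
results = [
    {'fields': ['Patrick', "O'Connor"], 'pk': '520', 'score': 0.7777777777777778,
     'data': [b'Republican', b'617-722-1646', b'Patrick.OConnor@masenate.gov'],
     'rownum': 26},
    {'fields': ['Kathleen', "O'Connor Ives"], 'pk': '215', 'score': 0.8571428571428572,
     'data': [b'Democrat', b'617-722-1604', b'Kathleen.OConnorIves@masenate.gov'],
     'rownum': 27},
]


def summarize(hit):
    """Hypothetical helper: render one result dict as a single readable line."""
    name = ' '.join(hit['fields'])
    # The non-searched columns come back as bytes; decode them for display.
    extra = ', '.join(value.decode('utf-8') for value in hit['data'])
    return '{} (pk={}, score={:.3f}): {}'.format(name, hit['pk'], hit['score'], extra)


lines = [summarize(hit) for hit in results]
for line in lines:
    print(line)
```

In a live session, the results list would be replaced by the return value of index.search.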

Incremental indexing

Besides the batch method shown above, records can be added to the index incrementally.

>>> import polymr
>>> be = polymr.storage.LevelDBBackend('data/ma_senators.polymr')
>>> index = polymr.query.Index(be)
>>> rec = polymr.record.Record(
...     ['Sarah', "Connor"],
...     '911',
...     [b'Resistance', b'617-575-1300', b'Sarah.Connor@masenate.gov']
... )
>>> index.add([rec])
[39]
>>> index.search(['sarah', 'onno'])
[{'fields': ['Sarah', 'Connor'], 'pk': '911', 'score': 0.4, 'data': [b'Resistance', b'617-575-1300', b'Sarah.Connor@masenate.gov'], 'rownum': 39}, {'fields': ['Patrick', "O'Connor"], 'pk': '520', 'score': 0.7857142857142857, 'data': [b'Republican', b'617-722-1646', b'Patrick.OConnor@masenate.gov'], 'rownum': 26}, {'fields': ['Kathleen', "O'Connor Ives"], 'pk': '215', 'score': 0.875, 'data': [b'Democrat', b'617-722-1604', b'Kathleen.OConnorIves@masenate.gov'], 'rownum': 27}, {'fields': ['Karen', 'Spilka'], 'pk': '212', 'score': 0.9285714285714286, 'data': [b'Democrat', b'617-722-1640', b'Karen.Spilka@masenate.gov'], 'rownum': 33}]

Using the command line interface

Polymr ships with a command line interface for indexing and searching. The polymr executable is installed along with the rest of the polymr module. To see the invocation instructions, available options, and subcommands, run polymr --help.

Creating polymr indexes with polymr index

The index subcommand creates polymr indexes. Creating an index with polymr index involves describing where you want the new index to be created, and feeding a delimited file of records into the executable. Use polymr index --help to see invocation instructions and available options.

Let’s index some sample data. The source code repository contains a list of contact information for the senators serving in the 190th General Court of the Commonwealth of Massachusetts:

$ cd data
$ head -n3 ma_senators.csv
Michael,Barrett,Democrat,416,617-722-1572,Mike.Barrett@masenate.gov
Joseph,Boncore,Democrat,112,617-722-1634,Joseph.Boncore@masenate.gov
Michael,Brady,Democrat,519,617-722-1200,Michael.Brady@masenate.gov

The ma_senators.csv file is a CSV containing the first name, last name, party affiliation, room number, phone number, and email address of each senator. To index these entries with the room number (column 3, counting from zero) as the primary key and the first and last names (columns 0 and 1) as the search fields, we can use:

$ polymr index \
>    -b leveldb://localhost/$PWD/ma_senators.polymr \
>    --primary-key 3 \
>    --search-idxs 0,1 \
>    < ma_senators.csv

This creates a polymr index named ma_senators.polymr in the current directory using the LevelDB backend.
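
To make the column numbering concrete, the following sketch splits the three sample rows the same way the options above describe: columns 0 and 1 become the search fields, column 3 becomes the primary key, and everything else is carried along as data. It uses only the standard library and mirrors the shape of the records seen in search results; split_row is a hypothetical helper, not polymr's own parsing code.

```python
import csv
import io

# First three rows of ma_senators.csv, as printed by `head -n3` above.
sample = """\
Michael,Barrett,Democrat,416,617-722-1572,Mike.Barrett@masenate.gov
Joseph,Boncore,Democrat,112,617-722-1634,Joseph.Boncore@masenate.gov
Michael,Brady,Democrat,519,617-722-1200,Michael.Brady@masenate.gov
"""

SEARCH_IDXS = [0, 1]  # first name, last name (matches --search-idxs 0,1)
PK_IDX = 3            # room number (matches --primary-key 3)


def split_row(row):
    """Hypothetical helper: split a CSV row into (search fields, pk, other data)."""
    fields = [row[i] for i in SEARCH_IDXS]
    pk = row[PK_IDX]
    # Whatever is neither searched nor the primary key is kept as plain data.
    data = [value for i, value in enumerate(row)
            if i not in SEARCH_IDXS and i != PK_IDX]
    return fields, pk, data


rows = [split_row(row) for row in csv.reader(io.StringIO(sample))]
for fields, pk, data in rows:
    print(fields, pk, data)
```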

Searching polymr indexes with polymr query

The query subcommand searches through a polymr index to find the records most similar to a query. Queries are terms similar to the search fields on the records you’re looking for. A query should contain the same number of elements as the index has search fields. For example, if a set of records were indexed with two search fields, queries should be composed of two elements, where the first element searches through the first search field, and the second element searches through the second search field.
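
As a small illustration of that rule, the sketch below builds a query with one element per search field and uses an empty string for any field you do not want to constrain, as the examples in this document do. make_query is a hypothetical convenience function for this example only; polymr itself simply takes the resulting list (or, on the command line, the positional arguments).

```python
def make_query(n_fields, terms):
    """Hypothetical helper: build a query with one element per search field.

    `terms` maps a search-field position to the text to look for.
    Positions that are left out are filled with an empty string.
    """
    unknown = [pos for pos in terms if not 0 <= pos < n_fields]
    if unknown:
        raise ValueError('no search field at position(s) {}'.format(unknown))
    return [terms.get(pos, '') for pos in range(n_fields)]


# An index with two search fields (first name, last name); search last name only.
query = make_query(2, {1: 'oconnor'})
print(query)
```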

Let’s search through the index of senators created in the previous section, trying to find all senators with a last name resembling ‘oconnor’:

$ polymr query -b leveldb://localhost/$PWD/ma_senators.polymr '' 'oconnor'
[
  {
    "fields": [
      "Patrick",
      "O'Connor"
    ],
    "pk": "520",
    "score": 0.7777777777777778,
    "data": [
      "Republican",
      "617-722-1646",
      "Patrick.OConnor@masenate.gov"
    ],
    "rownum": 26
  },
  {
    "fields": [
      "Kathleen",
      "O'Connor Ives"
    ],
    "pk": "215",
    "score": 0.8571428571428572,
    "data": [
      "Democrat",
      "617-722-1604",
      "Kathleen.OConnorIves@masenate.gov"
    ],
    "rownum": 27
  },
  {
    "fields": [
      "Sonia",
      "Chang-Diaz"
    ],
    "pk": "111",
    "score": 1.0,
    "data": [
      "Democrat",
      "617-722-1673",
      "Sonia.Chang-Diaz@masenate.gov"
    ],
    "rownum": 5
  }
]

We find that there are two senators with last names resembling ‘oconnor’: a Republican and a Democrat. As always, consult polymr query --help for invocation instructions and available options.
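
Because polymr query prints JSON, its output can be post-processed with any JSON parser. The sketch below parses an abbreviated copy of the output above (the data arrays are omitted) and keeps the hits scoring below 1.0. That cutoff is an assumption made for this example: in the sample output the scores ascend and the unrelated-looking record scores exactly 1.0, which suggests that lower scores mean closer matches, but this document does not define the score, so pick a threshold that suits your own data.

```python
import json

# Abbreviated copy of the JSON printed by `polymr query` above.
output = """
[
  {"fields": ["Patrick", "O'Connor"], "pk": "520",
   "score": 0.7777777777777778, "rownum": 26},
  {"fields": ["Kathleen", "O'Connor Ives"], "pk": "215",
   "score": 0.8571428571428572, "rownum": 27},
  {"fields": ["Sonia", "Chang-Diaz"], "pk": "111",
   "score": 1.0, "rownum": 5}
]
"""

hits = json.loads(output)

# Assumption: treat a score of 1.0 as "no real resemblance" and drop it.
close = [hit for hit in hits if hit["score"] < 1.0]
names = [" ".join(hit["fields"]) for hit in close]
print(names)
```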