Installing polymr
Install the latest release of polymr with pip:
pip install polymr
To get the full source distribution, including extra storage backends, tests, and documentation, clone the GitHub repository:
git clone https://github.com/massmutual/polymr
cd polymr
To run the tests for polymr:
python setup.py test
To build the docs:
cd doc
make html
Using the Python API
Interacting with the polymr API is best shown by example. The data
directory of the source repository contains a CSV of the senators
serving in the 190th Massachusetts General Court. The examples below
will index, query, and modify that data.
Creating polymr indexes
Let’s start by opening and indexing the sample data, storing it in a LevelDB backend.
>>> import polymr
>>> be = polymr.storage.LevelDBBackend('data/ma_senators.polymr')
>>> with open('data/ma_senators.csv') as f:
...     records = polymr.record.from_csv(
...         f,
...         searched_fields_idxs=[0, 1],
...         pk_field_idx=3
...     )
...     polymr.index.create(records, 1, 10, be)
...
>>> be.get_rowcount()
38
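To make the two index arguments concrete, here is a minimal sketch, using only the standard library, of how a CSV row could be divided into search fields, a primary key, and the remaining data. The split_row helper is hypothetical and is not part of polymr; it only mirrors the shape of the search results shown in the next section, and it keeps the data values as strings where polymr returns bytes.

```python
import csv
import io


def split_row(row, searched_fields_idxs, pk_field_idx):
    """Hypothetical helper: split one CSV row into (fields, pk, data)."""
    fields = [row[i] for i in searched_fields_idxs]
    pk = row[pk_field_idx]
    # Everything that is neither a search field nor the key is carried as data.
    used = set(searched_fields_idxs) | {pk_field_idx}
    data = [value for i, value in enumerate(row) if i not in used]
    return fields, pk, data


# First row of the sample file, inlined so the sketch runs on its own.
sample = "Michael,Barrett,Democrat,416,617-722-1572,Mike.Barrett@masenate.gov\n"
row = next(csv.reader(io.StringIO(sample)))
fields, pk, data = split_row(row, searched_fields_idxs=[0, 1], pk_field_idx=3)
print(fields)
print(pk)
print(data)
```

With these indexes, the first and last name become the search fields, the room number becomes the primary key, and party, phone number, and email address are stored alongside.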
Searching
Now that we have a backend populated with an index, we can create an Index object and run some searches.
>>> import polymr
>>> be = polymr.storage.LevelDBBackend('data/ma_senators.polymr')
>>> index = polymr.query.Index(be)
>>> index.search(['', 'oconnor'])
[{'fields': ['Patrick', "O'Connor"], 'pk': '520', 'score': 0.7777777777777778, 'data': [b'Republican', b'617-722-1646', b'Patrick.OConnor@masenate.gov'], 'rownum': 26}, {'fields': ['Kathleen', "O'Connor Ives"], 'pk': '215', 'score': 0.8571428571428572, 'data': [b'Democrat', b'617-722-1604', b'Kathleen.OConnorIves@masenate.gov'], 'rownum': 27}, {'fields': ['Sonia', 'Chang-Diaz'], 'pk': '111', 'score': 1.0, 'data': [b'Democrat', b'617-722-1673', b'Sonia.Chang-Diaz@masenate.gov'], 'rownum': 5}]
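Each result is a plain dictionary, so ordinary Python is enough to post-process a search. The sketch below copies two of the results above into a literal list and renders one line per hit. The summarize helper is hypothetical, not a polymr function.

```python
# Two results copied from the search output above.
results = [
    {'fields': ['Patrick', "O'Connor"], 'pk': '520',
     'score': 0.7777777777777778,
     'data': [b'Republican', b'617-722-1646',
              b'Patrick.OConnor@masenate.gov'],
     'rownum': 26},
    {'fields': ['Kathleen', "O'Connor Ives"], 'pk': '215',
     'score': 0.8571428571428572,
     'data': [b'Democrat', b'617-722-1604',
              b'Kathleen.OConnorIves@masenate.gov'],
     'rownum': 27},
]


def summarize(result):
    """Hypothetical helper: render one search result as a single line."""
    name = ' '.join(result['fields'])
    # Data values come back as bytes, so decode them for display.
    data = [value.decode('utf-8') for value in result['data']]
    return '{} (pk {}, score {:.2f}): {}'.format(
        name, result['pk'], result['score'], ', '.join(data))


lines = [summarize(r) for r in results]
for line in lines:
    print(line)
```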
Incremental indexing
Besides the batch method shown above, records can be added to the index incrementally.
>>> import polymr
>>> be = polymr.storage.LevelDBBackend('data/ma_senators.polymr')
>>> index = polymr.query.Index(be)
>>> rec = polymr.record.Record(
...     ['Sarah', "Connor"],
...     '911',
...     [b'Resistance', b'617-575-1300', b'Sarah.Connor@masenate.gov']
... )
>>> index.add([rec])
[39]
>>> index.search(['sarah', 'onno'])
[{'fields': ['Sarah', 'Connor'], 'pk': '911', 'score': 0.4, 'data': [b'Resistance', b'617-575-1300', b'Sarah.Connor@masenate.gov'], 'rownum': 39}, {'fields': ['Patrick', "O'Connor"], 'pk': '520', 'score': 0.7857142857142857, 'data': [b'Republican', b'617-722-1646', b'Patrick.OConnor@masenate.gov'], 'rownum': 26}, {'fields': ['Kathleen', "O'Connor Ives"], 'pk': '215', 'score': 0.875, 'data': [b'Democrat', b'617-722-1604', b'Kathleen.OConnorIves@masenate.gov'], 'rownum': 27}, {'fields': ['Karen', 'Spilka'], 'pk': '212', 'score': 0.9285714285714286, 'data': [b'Democrat', b'617-722-1640', b'Karen.Spilka@masenate.gov'], 'rownum': 33}]
Using the command line interface
Polymr ships with a command-line interface for creating and searching indexes.
The polymr executable is installed along with the rest of the
polymr module. To see the invocation instructions,
available options, and subcommands, try polymr --help.
Creating polymr indexes with polymr index
The index subcommand creates polymr indexes. Creating an index
with polymr index involves specifying where the new index should
be created and feeding a delimited file of records to the
executable. Use polymr index --help to see invocation instructions
and available options.
Let’s index some sample data. The source code repository contains a list of contact information for the senators serving in the 190th General Court of the Commonwealth of Massachusetts:
$ cd data
$ head -n3 ma_senators.csv
Michael,Barrett,Democrat,416,617-722-1572,Mike.Barrett@masenate.gov
Joseph,Boncore,Democrat,112,617-722-1634,Joseph.Boncore@masenate.gov
Michael,Brady,Democrat,519,617-722-1200,Michael.Brady@masenate.gov
The ma_senators.csv file is a CSV containing the first name, last
name, party affiliation, room number, phone number, and email address
of all senate members. To index these entries, with the primary key
set to the senator’s room number and the search fields set to the
senator’s first name and last name, we can use:
$ polymr index \
> -b leveldb://localhost/$PWD/ma_senators.polymr \
> --primary-key 3 \
> --search-idxs 0,1 \
> < ma_senators.csv
This creates a polymr index named ma_senators.polymr in the
current directory using the LevelDB backend.
Searching polymr indexes with polymr query
The query subcommand searches through a polymr index to find the
records most similar to a query. Queries are terms similar to the
search fields on the records you’re looking for. A query should
contain the same number of elements as the index has search
fields. For example, if a set of records were indexed with two search
fields, queries should be composed of two elements, where the first
element searches through the first search field, and the second
element searches through the second search field.
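One way to keep a query aligned with the search fields is to build it by position, leaving an empty string for any field that is not being searched, as the Python search for ['', 'oconnor'] does earlier in this document. The make_query helper below is hypothetical and not part of polymr; it is only a sketch of that bookkeeping.

```python
def make_query(n_search_fields, terms):
    """Hypothetical helper: build a query with one element per search field.

    `terms` maps a search-field position to the text to look for; positions
    that are not mentioned are left as empty strings.
    """
    for position in terms:
        if not 0 <= position < n_search_fields:
            raise ValueError(
                'no search field at position {}'.format(position))
    return [terms.get(i, '') for i in range(n_search_fields)]


# Two search fields (first name, last name); search only the last name.
query = make_query(2, {1: 'oconnor'})
print(query)
```

The resulting list has the same length as the number of search fields, so it can be handed to a search call or expanded into command-line arguments without reordering.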
Let’s search through the index of senators created in the previous section, trying to find all senators with a last name resembling ‘oconnor’:
$ polymr query -b leveldb://localhost/$PWD/ma_senators.polymr '' 'oconnor'
[
    {
        "fields": [
            "Patrick",
            "O'Connor"
        ],
        "pk": "520",
        "score": 0.7777777777777778,
        "data": [
            "Republican",
            "617-722-1646",
            "Patrick.OConnor@masenate.gov"
        ],
        "rownum": 26
    },
    {
        "fields": [
            "Kathleen",
            "O'Connor Ives"
        ],
        "pk": "215",
        "score": 0.8571428571428572,
        "data": [
            "Democrat",
            "617-722-1604",
            "Kathleen.OConnorIves@masenate.gov"
        ],
        "rownum": 27
    },
    {
        "fields": [
            "Sonia",
            "Chang-Diaz"
        ],
        "pk": "111",
        "score": 1.0,
        "data": [
            "Democrat",
            "617-722-1673",
            "Sonia.Chang-Diaz@masenate.gov"
        ],
        "rownum": 5
    }
]
We find two senators with last names resembling ‘oconnor’: a
Republican and a Democrat. As always, consult polymr query --help
for invocation instructions and available options.
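Because polymr query prints JSON, its output can be consumed by other programs. The following is a minimal sketch, assuming the output has already been captured into a string; here that string is an abridged copy of the first result above rather than a live invocation.

```python
import json

# Abridged copy of the output shown above; in practice this text would be
# captured from the `polymr query` process.
output = '''
[
    {
        "fields": ["Patrick", "O'Connor"],
        "pk": "520",
        "score": 0.7777777777777778,
        "data": ["Republican", "617-722-1646", "Patrick.OConnor@masenate.gov"],
        "rownum": 26
    }
]
'''

results = json.loads(output)
# Map each primary key to the email address, the third data column.
emails = {r["pk"]: r["data"][2] for r in results}
print(emails)
```

Note that the data values arrive as ordinary JSON strings here, whereas the Python API returns them as bytes.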