When we search for ordinary written documents, we send words into a search
engine and get pages of words back.

What if we could search for spreadsheets
by sending spreadsheets into a search engine and getting spreadsheets back?
The order of the results would be determined by various specialized statistics;
just as we use PageRank to find relevant hypertext documents, we can develop
other statistics that help us find relevant spreadsheets.

Indexing
------------
Comma Search indexes only spreadsheets that are stored locally. To index a new
spreadsheet, run ::

    , --index [csv file]

Regardless of what path you give for the csv file, Comma Search will expand the
path to an absolute path and then use this as the key to meta-index the cached
results of the indexing. These caches are all stored in the ``~/.,``
directory.

By default, CSV files that have already been indexed will be skipped; to index
the same CSV file again, run with the ``--force`` or ``-f`` option. ::

    , --index --force [csv file]

Once you have indexed a bunch of CSV files, you can search. ::

    , [csv file]

You'll see a bunch of file paths as results

    $ , 'Math Scores 2009.csv'
    /home/tlevine/math-scores-2010-gender.csv
    /home/tlevine/Math Scores 2009.csv
    /home/tlevine/Math Scores 2009 Copy (1).csv
    /home/tlevine/math-scores-2009-ethnicity.csv

Implementation details
------------------------------
When we index a table, we first figure out the unique indices ::

    import special_snowflake
    indices = special_snowflake.fromcsv(open(filepath))

and save them. ::

    from pickle_warehouse import Warehouse
    Warehouse(os.path.expathuser('~/.,/indices'))[filepath] = indices

Then we look at all of the values of all of the unique indices
and save them.

    Warehouse(os.path.expanduser('~/.,/values/%d' % hash(index)))[filepath] = set_of_indexed_values

When we search for a table, it actually gets indexed first.
Once it has been indexed, we know the unique keys of the table.
We look up the indices, ::

    indices = Warehouse(os.path.expathuser('~/.,/indices'))[path]

then we look up all of the tables that contain this index, ::

    tables = Warehouse(os.path.expanduser('~/.,/values/%d' % hash(index)))

and the values of this ``tables`` object are sets of hashes of the
different values. I can then count how many items are in the intersection
between the set for the table that is used as the query and the every
other particular table.

If I want to go crazy, I might do this for combinations of columns that
aren't unique indices, and I'd use ``collections.Counter`` objects to
represent the distributions of the values.
