CHANGES
=======

0.9.4
-----

* Allow multi-value arguments on command line

0.9.3
-----

* Correctly interleave choice elements

0.9.2
-----

* Normalize input tokens to NFD
* Unify output for kraken and tesseract

0.9.1
-----

* Clear out error list after refetching status
* Turns out tesseract can segfault during segmentation, too
* Reorder task definitions only before adding them
* Fix regression causing tasks to be added twice
* Pipeline redis operations to fix regressions
* Return error instead of exception

0.9
---

* Include REST API docs
* Fix tests
* Hopefully fix everything
* Huh?
* Fix travis stuff
* Correctly set mimetype for static files
* Re-Add crude web UI
* Remove unnamed parameters from documentation
* isinstance(str) -> isinstance(basestring)
* str is iterable but shouldn't be set()able
* Minor nidaba module fixes
* Correct logging in kraken segmentation
* Use flask blueprints to incorporate web UI and REST-API
* Log exceptions in tasks
* Don't try to install flasky stuff from conda
* Remove debuggin print
* Fix ordered dict initialization
* Add segment starts to simplexml format
* Correctly iterfind segments
* Docs for web API
* Display a spinner when uploading files using the web API
* self.celery is not set on static method
* Raise positional argument exception only when there is an argument
* Add flask_restful to requirements
* Also speed up command line execution
* Replace document tuples with URLs on the web API
* Speed up imports when plugin initialization isn't needed
* Refactor task imports
* Correct arg_values for sauvola binarization
* Change lock behavior on non-existing path components
* Rewrite API server using flask_restful

0.6.1
-----

* Add arg_values field to NidabaTask decorator
* Introduce preliminary Web API
* Bugfixes in the storage module
* Add arg_values field to NidabaTask decorator
* Update requirements with gunicorn/Flask
* Add SimpleBatch class to the core API

0.6.0
-----

* Fork before entering tesseract
* Turns out celery has a usable logging module
* Tesseract's GetUTF8Text doesn't output spaces - add them manually
* Set arg-/restypes for tesseract ctypes
* PEP8 compliance
* Herp-Derp release lock on file closure
* Static type all tesseract functions used
* Make TEI output EpiDoc compliant
* Moar docs

0.5.4
-----

* Scale correct confidence value this time
* Revert "Scale confidences to correct range (again)"
* Scale confidences to correct range (again)

0.5.3
-----

* Documentation update
* Hopefully eradicate segfaults by freeing PIX explicitely
* Fix restype of TessBaseAPIRecognize
* New output format for status command
* Run tesseract manually on each line using SetRectangle

0.5.2
-----

* Bump tesseract's PSM so it uses the segmentation

0.5.1
-----

* Add tei2txt task to output layer

0.5.0
-----

* Add tei2simplexml output layer thingy
* Add write_simplexml procedure to TEIFacsimile
* Add write_hocr command to TEIFacsimile
* Add output layer metadata subtask

0.4.2
-----

* Temporarily disable output merging
* Cleanup of the stats package for TEI input
* Deinitialize StorageFile if file hasn't been opened in the first place
* StorageFile is usable now
* Port spell-checker to TEI input documents
* Set correct page segmentation mode for tesseract character recognition
* Correct display of errors on task failure
* Allow subsequent addition of alternative readings
* add_choices() function for TEIFacsimile
* More merging
* Document merging
* Declare conf before use
* Update documentation to refer to new XML output
* Some changes to TEI output
* Bump requirements file
* Properly squish certainty value to the range 0-1
* Adapt ocropus tests to new xml output
* Adapt tesseract tests to new XML output
* Implement a file-like object for storage access

0.4.1
-----

* Hotfix for manifest incompetence
* Test skeleton for storage module

0.4.0
-----

* Check zone type when aggregating segment/grapheme boundaries
* Properly extract segment bounding boxes
* Use whitespace in the hOCR when loading from it
* TEI XML output for direct tesseract invocation
* Add rudimentary hOCR loading function
* Tests for krakens new XML interface
* Tesseract TEI segmentation/recognition results
* Some minor fixes in TEIFacsimile
* Add NidabaTEIException
* CLI prepare_filestore exception handling
* TEI segmentation/recognition ocropus support
* Fix some minor issues arising from kraken TEI code
* Let lines/segments/graphemes return element ID
* TEI-XML segmentation/recognition: ocropus
* TEI-XML segmentation/recognition: kraken
* Introduce TEI facsimile class
* Properly use custom page segmentations using ocropus
* Don't copy zone files if source and destination are identical
* Correct tesseract argument order in tests
* Properly mock nidaba_cfg in storage module
* Conditionalize kraken test execution
* Refactor storage module to throw exceptions
* Correctly encode file names as ctypes doesn't handle unicode objects
* Modular page segmentation: ocropus
* Document modular page segmentation
* Tests running the actual tasks
* Modular page segmentation: tesseract pt. 2
* Test tasks directly
* Modular page segmentation: tesseract support
* CLI page segmentation
* Generic UZN reader/writer
* Modular page segmentation: kraken support
* Import plugins/tasks so celery finds them
* Absolute import mangling again
* Add miscellaneous data to task graph
* New result backend access
* Add new result backend
* Unpin specific dependencies
* Hotfix removing exception in Tesseract output

0.3.18
------

* Update documentation
* Remove printf debugging

0.3.17
------

* Correct Plugin Runtime Exception
* Absolute import
* Fix tesseract segmentation fault
* don't install stevedore from conda
* Add nidaba_mkdict command
* Add plugin subcommand
* Pep8ify
* Remove unused import
* pep8ify
* Replace pluginbase with stevedore
* pytables isn't needed anyway
* Moar ocropy installs
* My failtitude is unmatched
* numexpr is not on conda
* Try to get ocropy install working
* Install ocropy in travis
* expected failure for kraken ocr jpeg
* Hopefully fix travis build
* Requirements fixing
* Enable more testing on travis
* Moar duncicity
* Fix travis link
* Update README
* Travis-CI file
* unibarrier is gone
* Test using the new HDF5 models
* Extras to setup.cfg
* Whitespace
* Make note of kraken bundle

0.3.16
------

* Squash them bugs
* Allow usage of kraken's new HDF5 models
* Add postprocessing parameter to cli
* Make configuration more explicit
* Add task for spell checking
* Return unicode not BS object
* s/x_conf/x_cost
* Update requirements file
* Add spellchecking functions beneath lex

0.3.15
------

* Bump version of pyxdameraulevenshtein

0.3.14
------

* Explicitely enable/disable extended hocr output
* Changelog up to 0.3.13
* Edit distance using pyxDamerauLevenshtein
* difflib result similarity measure
* Error checking around functions required for extended tesseract output as there hasn't been a stable release containing them yet
* new_join is on the result backend not broker url

0.3.13
------

* Squashed commit of the following:

0.3.12
------

* text edit distance measure
* Add move_to_storage for file: parameter on cli
* Update documentation for new cli
* Uncomment plugins not recommended in documentation

0.3.11
------

* Preparatory for for diversion
* Fix for B/W capi tesseract inputs
* Add dependency for new cli
* Change plugin hierarchy
* Make batch do more than display task help
* encoding preamble to kraken.py
* Add --help-tasks option to batch command
* Extract kwargs from task return values if dict
* Tesseract hotfix
* New and fancy command line interface
* Fix makefile for new documentation

0.3.10
------

* Hotfix: calling c function with unicode strings is not a good idea
* Move changelog to sidebar
* Add changelog to documentation
* Make kraken an optional install
* Miscellaneous fixes to plugins and correspoding tests
* grc+ara somehow causes segfaults
* Fix segfault caused by multiple API initializations
* Don't execute tesseract_capi_multiple test case
* Major revamp of documentation
* Add worker subcommand

0.3.7
-----

* Homepage is gh page for now
* markdown -> rst

0.3.6
-----

* Make download command work again
* Let tesseract tests check some basic premises before exploding in a glorious mess

0.3.5
-----

* Some error checking is in order
* tessdata may be a storage tuple

0.3.4
-----

* Add C-API to tesseract plugin
* Don't use __import__
* Add plugins_load to configuration file
* Update README
* Remove leper
* Some more test typos
* Typo

0.3.3
-----

* Really fix tests this time
* Fix missing dependency on redis
* Add plugin architecture and remove static imports
* Fix stupid import/make Otsu usable
* Fix algorithm test imports
* PEP8ify
* Bump versions in requirements file

0.3.2
-----

* Really add git versioning to sphinx
* Retrieve tag version from master
* Sphinx now infers version from git tags
* Update task source documentation
* Fix docstrings to remove transient objects
* Fixup for tracking object handling
* Use native python Otsu's thresholding
* Fix imports for lexical tools
* Move algorithms.py to a package
* Add automagic (read arcane) propagation of root documents and global ids
* Update readme documentation link
* Remove default ocrization
* Cope with parameterless binarization configuration
* Add nlbin binarization from ocropus

0.3.1
-----

* Add directive to example config disabling pickle serialization
* Fix incorrect behavior of batch object with more than one step

0.3.0
-----

* Fix will-it-blend switch to add correct task
* Update sphinx docs
* Autoinclude still doesn't work for tasks
* Minor fixes to docstrings
* Build apidoc in gh_pages
* Update documentation

0.2.0
-----

* Fix makefile to run more than once successfully
* Update README to link to new docs pages
* Add gh_pages target to sphinx makefile
* Update README.md to note test skipping
* remove trailing spaces
* Skip tests for ocropus or tesseract of if not installed
* Fix length of subheading ~s
* Fancy shmancy documentation
* New contributing file
* Fix link in README
* docstrings, so many docstrings
* Remove ununsed imports from setup.py and PEP8ify
* Remove tests for obsolete hocr module
* PEP8ify tests and fix some errors in the algorithm test module
* Annoying commit containing: * PEP8ify * Exception refactoring * Start of docstring work
* Clean up sphinx configuration
* Rename iris to nidaba
* Start to replace leper
* Remove unused files
* Update requirements
* Use pbr for setuptool magic
* Minor bugfix making cli usable
* One large commit that unfortunately can't be split into multiple smaller ones:
* Remove obsolete/nonfunctioning web stuff
* new style classes
* Add is_file function to storage module
* Make custom exceptions pickleable
* switch over to absolute imports so tasks are more easily modularized
* Fix install using pip version >=6.0
* Fix tesseracts tests to run clenaly independent of version
* Rename old_* switches to legacy_* Make legacy_ocropus actually do something
* Update README with new stuff
* New configuration format documentation
* new global configuration files in sys.prefix + '/etc/iris/'
* batch documentation
* Hotfix for ocropus
* Fix ocropus task
* Comment cleanup in tasks.py
* Remove useless variable
* Sorted suggestions for spell checking
* Complete spell checking/dictionary tools
* Remove ununsed manifest
* Don't segfault on nonexisting job id in get_state
* Some minor documentation updates
* sphinx replaced the wiki
* Allow multiple input documents in iris.batch()
* Add sphinx documentation tree with incomplete documentation
* We are reusing code from Bruce Robertson, who's licensed his code under GPL2 forcing us to do the same. Add a license file to reflect this
* Remove superfluous stuff in the README
* Flesh out iris command line interface
* status/configuration command for iris
* Remove flower and its dependency tornado from requirements.txt
* Updated requirements
* Add note about new download command
* Add download command and remove downloaded files
* Ocropus task
* ocropus tests
* Rudimentary support for the Ocropus OCR framework
* get_progress is obsolete, use get_state
* Allow arbitrary depth images for grayscale conversion
* Fix leper build failure
* Ocropus binarization parameter
* Fix configuration file for dictionaries
* Fix threshold values in documentation (float -> int)
* Add more information to explain my job entry data structure worthy of institutionalisation
* Working inter-subtree synchronization and result retrieval
* Remove useless load_del_dict function
* Updated build instructions in README
* Fixed test_file_output_png in test_tesseract to accomodate the new/old tesseract file output switch
* Nonfunctional synchronization thingy
* tesseract-grc required for tests
* Add configuration switch for tesseract hocr output format
* Changed iris/__init__.py back the having convenient imports
* Updated requirements to latest versions. Added nosetests to requirements
* Remove fs from requirements.txt
* Rebased tasks onto master; fixed __init__.py so tests would clear; backed up old __init__.py to __init__.py.bak
* Slightly less experimental hocr document merging
* Fix typo
* Install a non-functional iris executable
* Experimental untested hOCR document merging
* Add sane default binarization values from rigaudon
* Update readme with our new and fancy status and result checking functions
* Use storage module style path globally
* leper NULL pointer handling
* get_storage_path from absolute path function
* Note that leptonica builds without image format support
* Remove double parametrization of output file paths
* Complete documentation of tasks
* New status reporting function
* Fix tesseract exception handling
* Commit everything next time so group creation actually works
* Tesseract should throw an exception if stderr is not empty
* Removed superfluous comment
* Fixed typo
* Major cleanup of the spellchecker; added proper tests for the spellchecker
* Cleanup of old comments
* Cleaned garbage whitespace in test_algorithms.py
* Cleaned garbage whitespace and a typo in algorithms.py
* Replaced the WORDS xpath with WORDS and XWORDS to accomodate more hocr markup
* Cleaned whitespace in hocr and tests; remove extraneous code in iris/__init__.py
* Renamed scripts to contrib; created testrun stub
* Removed extraneous code from __init__.py; removed useless test
* Removed task tests
* Fixed incorrect naming of Tesseract output files
* Commented out code that uses leper
* Removed __init__.py from project root; commented out leper import
* Removed biopython from requirements.txt because it got added accidentally
* Make pipeline usable
* Apparently relative imports are the new hot thing right now and py3k deprecated absolute ones
* Use absolute imports
* Fix minor issues from merging
* Added stub tesseract task; changed celeryconfig imports in task.py to match new module name
* Moved tests to root
* Commented out some stubs/incomplete functions
* Make leper more digestible for celery Implement correct unicode support
* Removed old/never used taskManager and testLauncher modules
* Updated requirements.txt
* PEP8ified celeryconfig's filename
* Added symmetric delete spell check search with on-disk binary searching. Update basic script for generating deletion dictionaries
* test
* test :
* test commit
* Add get_abs_path to storage module
* Task skeleton
* Added function for binary search of a memory mapped symmetric delete dictionary file with tests
* Check for model creation failure in dewarpSinglePage
* Revert "Kitchen is broken" because it is not broken
* Kitchen is broken
* Don't need python-fs any more
* leper does more than deskewing now
* Ensure correct color depth for output by making it the responsibility of the user
* Add handlers for incorrect input color depth
* Add page-wise dewarping function
* Squashed commit of the following:
* tesseract produces hocr files with ending .hocr not .html
* Fix conversion tests
* Remove obsolete launch script
* Mention dependencies for the c extension
* Add deskew c extension
* Update readme to reflect reality
* Use setuptools
* Move tests for setuptoolization
* Added dictionary of Greek words with invariant diacritics
* Added diacritic stripping fuctions with tests. Modified the uniblock function to make it work/useful
* Implemented proper symmetric delete based spellcheck with tests
* Removed old abbyy conversion module; will re-implement with hocr tools
* Added a script for generating symmetric deletion dictionaries
* Added single delete Greek symmetric spell check dictionary (compressed)
* new and fancier mkdict building the revised dictionaries for sym_suggest
* Rewrite sym_suggest to run in somewhat constant time
* Add dictionary creation script
* s/set([])/set()
* s/NFD/NFC
* Strip trailing newlines from dictionaries
* Rewrite spell checker to use a simple ngram similarity measure as hunspell is too intelligent for the hocrinfoaggregator algorithm
* Fix stray return
* s/DICT_URL/DICT_PATH
* Encapsulate spell checker into class to speed up recurring tasks
* Spell check postprocessing
* Add spell checking function. It works but there are wonky encoding issues brought over from hunspell
* Added symmetric deletion based suggestion algorithms with tests
* Added a normalization parameter (with tests) to the sanitize function
* Added the strings_by_deletion function with tests. Rearranged the unibarrier and sanitize functions
* Modified algorithms.sanitize() to accept strings and convert to unicode. Added appropriate tests for said changes. Fixed doc comment in algorithms.sanitize()
* Added the sanitize method with tests
* Added the extract_words function
* Added the hocr parser context manager HocrContext
* Refactored hocr bbox extraction methods to use the new xpath-based extract_bbox methods. Updated tests accordingly
* Added the islang function in algorithms.py with tests
* Remove the 'test' dir; all tests now in 'tests'
* Renamed tests for autodiscovery
* Removed useless/accidentally committed files and dirs 'extras', 'gr.txt', and 'holder.html'
* Fixed incorrect bound in the comining diacritical mark unicode range
* HOCR specification .pdf in /docs was unreadable/corrupted; fixed
* Added extract_bboxes_by_class function in hocr, with tests
* PEP8ify tests
* Reformat test file name scheme to allow test autodiscovery
* Add Lock tests
* Rename config option as it isn't an URL anymore
* PEP8-ify
* Added a copy of the hocr specification doc to /docs
* Removed useless print statement in hocr.py
* Added bbox preview algorithm to hocr.py. Removed unused import from tests/hocrtests.py
* Moved hocr related algorithms (extract_hocr_tokens and extract_bboxes) to the now hocr.py file. Moved tests for said functions to the now hocrtests.py
* Bugfix in extract_bboxes which could cause regex mismatch
* Added function for bbox extraction with tests. Added unibarrier decorator function with tests. Added unicode block identification function with tests. Misc formatting and docstring/line length cleanup
* The empty shell that the locking tests are right now
* Finish storage unit tests
* Use new file locking thingy
* Add own implementation of symlink based file locking
* Some half-broken tests. Acquiring locks for multiple files doesn't seem to work correctly
* New storage module utilizing locking
* Fixed docstring comments in algorithmtests.py
* Actually changed 'test' to 'tests' (not fixed in previous commit). Modified tesseract.py to accept lists of tesseract training data
* Renamed 'test' to 'tests'; improved tesseract.py and added tests for same
* Created tesseract.py for tesseract bindings. Created tests for tesseract bindings. Renamed 'test' dir to 'tests'. Added tests/resources/tesseract for tesseract tests
* Make sure retrieve_content works with single document retrieval
* Removed redundant updateall.sh script from /iris
* Created native python version of all alignment algorithms with tests. Refactored numpy versions. Updated requirements.txt Pep8-ified algorithms.py
* Add some storage backend unit tests
* Remove test celery tasks
* Rudimentary external storage module
* Reformat irisconfig to 80 characters width
* Add gitignore
* Use opener.opendir() method to allow more generalized storage backend specification
* Commented out/removed old FTP code it tasks/tests. Added Greek dictionary file
* Removed experimental ftp related code and tests. (For conversion to an NFS based system) Commented out related code in tasks.py (most of it :) )
* Added link to the hocr standard in the dev guide
* Extracted initial matrix generation for alignment algorithms to their own methods. Renamed "BacktraceTests" to the correct "SemiGlobalAlignmentTests"
* Modified full_edit_distance to support semi-global alignments. Extracted edit distance backtracing to its own function. Added a semi-global alignment methed with tests
* Fixed typos
* Added whole word/list alignment tests. Fixed bug in test_all_empty test in algorithms.py Updated requirements file
* Implemented full edit distance and alignment algorithms. Implemended full unit test suite for the abovementioned algorithms. Deleted old or unnecessary files
* More typo fixes
* Updated/fixed typos in the dev guide, added a TODO file
* Ok, this is a much larger commit thas is normally good practice, but things have some clear structure now. Main changes include: 1. Added useful tasks to tasks.py, along with unit tests (tasktests.py). 2. Created the algorithms module, along with unit tests (algorithmtests.py). 3. Converted python modules to use UTF-8. 4. Created unit tests for the FTP-based cluster file store. 5. Added a bash script for updating dependencies via pip (updateall.sh). 6. Updated requirements.txt
* Fixed typo: "platform dependent" is now "platform independent"
* Fixed an incorrect comment in abbyyToOCR.py Made abbyyToOCR.py and its test pep8 compliant
* Created a complete, deployment-ready version of the abbyy to hocr converter, along with unit testing. Added the Iris developer's guide, devguide.md
* Updated jQuery filedrop dependency to remove Chrome 22 console warning
* More code for jQuery drag-n-drop file upload
* Reorganization of installation guide, SASS details
* Created installation instructions in docs/readme.md. So far, includes instructions for configuring Apache. Also included the Apache configuration file for Iris
* Moved everything from Iris/iris/static into templates directory so it would be accessible by the templates. Added a couple missing files into the bootstrap directory. Fixed references in OCR tutorial
* Static content added to repo (css, fonts, js)
* Initial commit of web UI
* Updated .gitignore to ignore sass-cache
* Added redis and its prereqs to the requirements. Created a stub taskManager class
* Completely debugged and refactored imagePreview.py into imageTools.py. Added a field in the batch stub template for images; cleaned up views.py
* Template folder not correctly added in previous commit; fixed
* Created a module for working with images; added image format validation. Moved the templates folder to /iris
* Moved the test run script to the /iris dir to simplify importations; modified views.py to support basic file uploading at the /batch endpoint
* Certain changes to .gitignore, requirements.txt, and several test HTML sub templates were not properly tracked in previous commits; thises have now been updated
* Removed redundant schematic copies; updated requirements to include PIL (Python imaging library)
* Stopped tracking of .pyc files; updated requirements to include Celery 3.0.22
* Updated README, removed old schematic files from the root dir
* Restructred folders and Python packages; added a basic inital version of the views.py module for displaying the web frontend; updated the README file
* Added a pipeline schematic and an xml file for updating the schematic with draw.io
* Added .gitignore, and the requirements list
* Initial commit; repo empty
