Metadata-Version: 2.0
Name: revscoring
Version: 0.7.2
Summary: A set of utilities for generating quality scores for MediaWiki revisions
Home-page: https://github.com/halfak/Revision-Scores
Author: Aaron Halfaker
Author-email: ahalfaker@wikimedia.org
License: MIT
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Environment :: Other Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Dist: deltas (>=0.3.1,<0.3.999)
Requires-Dist: docopt (>=0.6.2,<0.6.999)
Requires-Dist: mwapi (>=0.3.0,<0.3.999)
Requires-Dist: mwparserfromhell (>=0.3.3,<0.4.999)
Requires-Dist: mwtypes (>=0.2.0,<0.2.999)
Requires-Dist: nltk (>=3.0.0,<3.0.999)
Requires-Dist: nose (>=1.3.4,<1.3.999)
Requires-Dist: numpy (>=1.8.2,<1.10.999)
Requires-Dist: pyenchant (>=1.6.6,<1.6.999)
Requires-Dist: pytz (==2012c)
Requires-Dist: requests (>=2.0.0,<2.999.999)
Requires-Dist: scikit-learn (==0.15.2)
Requires-Dist: scipy (>=0.13.3,<0.16.999)
Requires-Dist: setuptools (>=5.5.1,<15.999)
Requires-Dist: tabulate (>=0.7.5,<0.7.999)

Revision Scoring
================
A generic, machine learning-based revision scoring system designed to be used
to automatically differentiate damage from productive contributory behavior on
Wikipedia.

Example
=======

Using a scorer_model to score a revision:

    >>> import mwapi
    >>> from revscoring import ScorerModel
    >>> from revscoring.extractors import APIExtractor
    >>>
    >>> with open("models/enwiki.damaging.linear_svc.model") as f:
    ...     scorer_model = ScorerModel.load(f)
    ...
    >>> extractor = APIExtractor(mwapi.Session(host="https://en.wikipedia.org",
    ...                                        user_agent="revscoring demo"))
    >>>
    >>> feature_values = extractor.extract(123456789, scorer_model.features)
    >>>
    >>> print(scorer_model.score(feature_values))
    {'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}
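
The score is returned as a plain dictionary, so it can be inspected with ordinary Python. The following is a minimal sketch that assumes only the output shape shown above; the helper name ``predicted_probability`` is hypothetical and is not part of ``revscoring``. It looks up the probability assigned to the predicted class:

```python
def predicted_probability(score):
    """Return the probability assigned to the predicted class.

    Assumes `score` has the shape shown in the example output:
    {'prediction': <label>, 'probability': {<label>: <float>, ...}}
    This helper is illustrative only; it is not a revscoring function.
    """
    return score['probability'][score['prediction']]


# Score copied from the example output above.
score = {'prediction': True,
         'probability': {False: 0.4694409344514984,
                         True: 0.5305590655485017}}

confidence = predicted_probability(score)
print(round(confidence, 3))  # prints 0.531
```

In this example the two class probabilities are close to each other, so the prediction of ``True`` is only weakly supported.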


Installation
============
The easiest way to install `revscoring` is via the Python package installer
(pip).

``pip install revscoring``

You may find that some of `revscoring`'s dependencies fail to compile (namely
`scipy`, `numpy` and `scikit-learn`).  In that case, you'll need to install some
dependencies in your operating system.

Ubuntu & Debian:
  Run ``sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev``
Windows:
  'TODO'
MacOS:
  'TODO'

Finally, in order to make use of language features, you'll need to download
some NLTK data.  The following command will get the necessary corpus.

``python -m nltk.downloader stopwords``
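
To confirm that the download succeeded, one option is to ask NLTK to locate the corpus. This is a minimal sketch, assuming the corpus is stored under the standard ``corpora/stopwords`` resource path; the function name ``stopwords_installed`` is hypothetical:

```python
def stopwords_installed():
    """Return True if the NLTK stopwords corpus can be located.

    Illustrative helper only; it is not part of revscoring or NLTK.
    """
    try:
        import nltk
    except ImportError:
        # NLTK itself is not installed, so the corpus cannot be used.
        return False
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        return False
    return True


print(stopwords_installed())
```

If this prints ``False``, re-run the download command above and check that it completed without errors.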

You'll also need to install `enchant <https://enchant.org>`_ compatible
dictionaries for the languages you'd like to use.  We recommend the following:

* ``languages.dutch``: myspell-nl
* ``languages.english``: myspell-en-us myspell-en-gb myspell-en-au
* ``languages.french``: myspell-fr
* ``languages.german``: myspell-de-at myspell-de-ch myspell-de-de
* ``languages.indonesian``: aspell-id
* ``languages.italian``: myspell-it
* ``languages.hebrew``: myspell-he
* ``languages.portuguese``: myspell-pt
* ``languages.persian``: myspell-fa
* ``languages.spanish``: myspell-es
* ``languages.vietnamese``: hunspell-vi
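
After installing the packages, you can check whether enchant actually finds a dictionary for a given language. The sketch below uses ``enchant.dict_exists`` from ``pyenchant``. The language tags passed in are assumptions for illustration (the tags that ``revscoring`` requests are not listed here), and ``missing_dictionaries`` is a hypothetical helper name:

```python
def missing_dictionaries(tags):
    """Return the tags in `tags` for which enchant finds no dictionary.

    Illustrative helper only; it is not part of revscoring or pyenchant.
    """
    try:
        import enchant
    except ImportError:
        # pyenchant (or the underlying enchant library) is unavailable,
        # so every requested dictionary is reported as missing.
        return list(tags)
    return [tag for tag in tags if not enchant.dict_exists(tag)]


# Example tags; adjust them to the languages you intend to use.
print(missing_dictionaries(["en_US", "en_GB", "nl", "fr"]))
```

An empty list means enchant located a dictionary for every tag that was checked.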

Authors
=======
    Aaron Halfaker:
        * `http://halfaker.info`
    Helder:
        * `https://github.com/he7d3r`
    Adam Roses Wight:
        * `https://mediawiki.org/wiki/User:Adamw`


