Metadata-Version: 1.1
Name: SoMaJo
Version: 2.1.2
Summary: A tokenizer and sentence splitter for German and English web and social media texts.
Home-page: https://github.com/tsproisl/SoMaJo
Author: Thomas Proisl, Peter Uhrig
Author-email: thomas.proisl@fau.de
License: GNU General Public License v3 or later (GPLv3+)
Download-URL: https://github.com/tsproisl/SoMaJo/archive/v2.1.2.tar.gz
Description: SoMaJo
        ======
        
        SoMaJo is a state-of-the-art tokenizer and sentence splitter for
        German and English web and social media texts. It won the
        `EmpiriST 2015 shared task
        <https://sites.google.com/site/empirist2015/>`_ on automatic
        linguistic annotation of computer-mediated communication and social
        media. As such, it is particularly well suited to tokenizing all
        kinds of written discourse, for example chats, forums, wiki talk
        pages, tweets, blog comments, social networks, SMS and WhatsApp
        dialogues.
        
        More detailed documentation is available in the `project repository
        <https://github.com/tsproisl/SoMaJo>`_.
        
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Natural Language :: German
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Text Processing :: Linguistic
