============
Tutorial CSV
============

.. module:: topogram


Topogram relies on three core components: a corpus, processors and visualizers.

1. Corpus: describe your dataset
================================

::

    from topogram.corpus.csv_file import CSVCorpus

    # import corpus
    csv_corpus = CSVCorpus('data.csv',
        origin="user_id",
        content="text",
        timestamp="created_at",
        time_pattern="%Y-%m-%d %H:%M:%S",
        adds=["permission_denied", "deleted_last_seen"])

    # validate CSV corpus formatting
    try:
        csv_corpus.validateCSV()
    except ValueError as e:
        # print the validation error followed by the 422 code
        print("%s %s" % (e, 422))
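
To make the column mapping concrete, the sketch below uses only the Python 3 standard library (not Topogram) to read a tiny CSV with the column names from the example and to check that every ``created_at`` value matches ``time_pattern``. The sample rows and the helper name ``check_timestamps`` are made up for illustration; what ``validateCSV()`` actually checks is not shown here.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data using the column names from the example above.
SAMPLE = (
    "user_id,text,created_at,permission_denied,deleted_last_seen\n"
    "u1,hello world,2012-01-03 09:07:23,,\n"
    "u2,second post,2012-01-03 09:08:01,true,2012-01-04 10:00:00\n"
)

TIME_PATTERN = "%Y-%m-%d %H:%M:%S"


def check_timestamps(csv_text, column="created_at", pattern=TIME_PATTERN):
    """Parse every timestamp in `column`; raise ValueError on a bad row."""
    parsed = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # strptime raises ValueError if the value does not match the pattern
        parsed.append(datetime.strptime(row[column], pattern))
    return parsed


timestamps = check_timestamps(SAMPLE)
print(timestamps[0].isoformat())
```

A row whose timestamp uses another layout (for example ``03/01/2012``) makes ``check_timestamps`` raise ``ValueError``, which is the kind of error the ``try``/``except`` block above is written to catch.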



2. Processor: extract your information
======================================

::

    from topogram import Topogram
    from topogram.processors.nlp import NLP
    from topogram.processors.regexp import Regexp

    # init processors
    chinese_nlp = NLP("zh")
    url = Regexp(r"\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^\p{P}\s]|/)))")

    # init Topogram with the corpus and the named processors
    topogram = Topogram(corpus=csv_corpus, processors=[("zh", chinese_nlp), ("urls", url)])
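
As a rough illustration of what a regular-expression processor extracts, the sketch below runs a simplified URL pattern over a string with the standard ``re`` module. It is not Topogram's ``Regexp`` class, and the pattern is deliberately simpler than the one above: the standard ``re`` module does not support the ``\p{P}`` property class, so this version stops at whitespace and brackets and then strips trailing punctuation. The function name ``extract_urls`` is hypothetical.

```python
import re

# Simplified pattern (an assumption, not the tutorial's regex):
# a scheme or "www." followed by characters that are not whitespace or brackets.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s()<>]+")


def extract_urls(text):
    """Return all URL-like substrings, with trailing punctuation stripped."""
    return [m.rstrip(".,;:!?") for m in URL_PATTERN.findall(text)]


urls = extract_urls("See http://example.com/a?b=1, or www.example.org.")
print(urls)
```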


3. Visualizer: get your viz data
================================

::

    from topogram.vizparsers.network import Network

    # create viz model
    words_network = Network( directed=False )

    for row in topogram.process():
        words_network.add_edges_from_nodes_list(row["zh"])

    # get processed graph as d3js json
    print(words_network.get(nodes_count=1000, min_edge_weight=3, json=True))
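
The idea of building a network from per-row word lists can be sketched with the standard library alone. The code below is not Topogram's ``Network`` class: it assumes every pair of words in the same row forms an undirected edge, counts how often each pair occurs, and keeps only the edges whose weight reaches a threshold. The ``nodes``/``links`` output layout, the function name ``cooccurrence_graph`` and the sample rows are all assumptions made for illustration.

```python
import json
from collections import Counter
from itertools import combinations


def cooccurrence_graph(rows, min_edge_weight=1):
    """Build a weighted, undirected word co-occurrence graph."""
    weights = Counter()
    for words in rows:
        # sorted + set: each unordered pair is counted once per row
        for a, b in combinations(sorted(set(words)), 2):
            weights[(a, b)] += 1
    # keep only the edges that occur often enough
    links = [
        {"source": a, "target": b, "weight": w}
        for (a, b), w in sorted(weights.items())
        if w >= min_edge_weight
    ]
    # keep only the nodes that still have at least one edge
    nodes = sorted({n for link in links for n in (link["source"], link["target"])})
    return {"nodes": [{"id": n} for n in nodes], "links": links}


rows = [["cat", "dog"], ["dog", "cat", "fish"], ["fish", "bird"]]
graph = cooccurrence_graph(rows, min_edge_weight=2)
print(json.dumps(graph, sort_keys=True))
```

Raising ``min_edge_weight`` prunes rare pairs, which is the same trade-off the ``min_edge_weight=3`` argument expresses in the call above.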
