Metadata-Version: 1.0
Name: textmodel
Version: 0.3.6
Summary: A data type for storing and manipulating rich text data. It aims to be fast and efficient and it is suited even for very long texts.
Home-page: https://pypi.python.org/pypi/textmodel/
Author: C. Ecker
Author-email: textmodelview@gmail.com
License: BSD
Description: Text Model
        ==========
        
        Quick example
        -------------
        
        ::
        
            >>> from textmodel import TextModel
            >>> text = TextModel(u'Hello World')
            >>> text2 = TextModel(u'!', fontsize=20)
            >>> text.insert(11, text2)
            >>> text.set_properties(6, 11, bgcolor='yellow')
            >>> for i in range(1000):
            ...     text.append(TextModel("Line %i\n" % i))
            >>> text.linelength(0) # length of first line
            19
            >>> text.index2position(100) # row, col of index 100
            (12, 2)
        
        
        Introduction
        ------------
        
        Word processors are usually believed to be heavy and slow
        applications. However, I think it is possible to design a word
        processor that is lightweight and fast, so fast that it can even
        be implemented in a "slow" scripting language. Text model is
        meant to be a proof of concept (even though it is merely a text
        editor and not a full word processor).
        
        Storing and editing text information is a problem with a long history
        in computer science. Known solutions include the `gap buffer
        <http://en.wikipedia.org/wiki/Gap_buffer>`_ (used by Emacs), the
        `piece table <http://www.cs.unm.edu/~crowley/papers/sds.pdf>`_ (used
        by MS-Word) and the `rope data structure
        <http://en.wikipedia.org/wiki/Rope_%28data_structure%29>`_. Instead,
        text model internally uses a structure which I named "texel tree" and
        which is probably a new approach to the problem. The goal was to find
        a data structure which stores text together with format information
        and is
        
        - fast (even when implemented in a scripting language)
        - efficient (in memory consumption)
        - hierarchic (so that texts can contain elements like tables which
          themselves contain text)
        
        The texel tree consists of nodes which are called texels (text
        elements). Each texel can have a variable number of child texels
        (between 8 and 15), forming a highly branched tree, similar to a
        B-tree. Operations on the tree are performed in such a way that the
        tree is kept balanced, i.e. all branches have exactly the same
        depth. The texel tree is fast because it allows all text operations
        (insert, remove, copy, paste) in logarithmic time. It is efficient
        because it stores text on the level of strings and not on the
        character level, and it stores the styling in an economical way.
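        
        Why a branching factor of 8 to 15 keeps the tree shallow can be
        checked with a little arithmetic. The following is only a rough
        sketch and not code from textmodel: the helper ``min_depth`` is a
        hypothetical name, and it assumes that each leaf holds one line and
        that every node has the same number of children.
        
        ```python
        def min_depth(n_leaves, fanout):
            """Smallest depth d such that fanout**d >= n_leaves.

            Uses integer arithmetic only, so there are no rounding issues.
            """
            depth, capacity = 0, 1
            while capacity < n_leaves:
                capacity *= fanout
                depth += 1
            return depth


        # Depth needed for 50000 lines with the largest and the smallest
        # allowed number of children per node (assumed uniform here).
        shallowest = min_depth(50000, 15)   # 4, since 15**4 = 50625
        deepest = min_depth(50000, 8)       # 6, since 8**5 = 32768 < 50000
        print(shallowest, deepest)
        ```
        
        Under these assumptions a text of 50000 lines needs only about 4 to
        6 levels, so an operation has to visit just a handful of nodes.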
        
        Text model is an interface to the texel tree, hiding all the
        complexity of the recursive texel data structure. It is termed "text
        model" because in a model-view-controller scenario it would have the
        role of the "model". A matching view / editor component is
        `wxtextview <https://pypi.python.org/pypi/wxtextview>`_. In
        combination they can be used as a text editor.
        
        
        Speed
        -----
        
        Note that textmodel is not yet optimized. By saying that the texel
        structure is fast, I mean that the time of operations grows only
        slowly with the length of the text. I would not be surprised if the
        times could be improved by a factor of 2 or more.
        
        The following table shows how the time needed to insert a line grows
        with the length of the text. The text length is measured as the
        number of text nodes, where each text node holds one line of text,
        e.g. 50000 means a text with 50 thousand lines.
        
        =============== ====================
          # lines        time (milliseconds)
        =============== ====================
                  1         0.332514
                  3         0.379985
                  5         0.436915
                 10         0.519033
                 30         0.596213
                 50         0.657198
                100         0.75822
                300         0.843198
                500         0.897312
               1000         0.998324
               3000         1.081806
               5000         1.136462
              10000         1.246638
              30000         1.356982
              50000         1.404089
        =============== ====================
        
        As can be seen, the time grows only very little with the number of
        lines. Ideally, I would expect a logarithmic dependence on text
        length. This should hold in particular for the following operations:
        
        - inserting strings
        - inserting other trees (=paste)
        - copying text
        - removing text
        - calculating index positions from (row, col)-tuples and vice versa
        - counting lines
        
        Moreover, the time for cutting and pasting depends only weakly on
        the size of the text which is cut out or pasted in. Again, there
        should be a logarithmic dependence.
        
        Implementation details
        ----------------------
        
        The texel tree consists of different kinds of texels: group texels,
        character texels, glyph texels and container texels.
        
        Character texels hold strings of uniformly styled Unicode
        text. NewLines are a special case of character texels. Groups hold
        child elements. The following texel stores the words *Hello world!*
        with *world* marked in red.
        
        ::
        
          G[C('Hello '), C('world', bgcolor='red'), C('!')]
          
        Each texel has a length, which corresponds to the number of contained
        characters. For example, the length of C('Hello') is 5 and the length
        of an empty group is zero.
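        
        The length rule can be written down in a few lines. This is a
        minimal sketch, assuming simplified classes ``C`` and ``G`` which
        only mimic the notation used above. They are hypothetical and are
        not the texel classes of textmodel.
        
        ```python
        class C:
            """Character texel: a run of uniformly styled text (illustration)."""

            def __init__(self, text, **style):
                self.text = text
                self.style = style

            def length(self):
                # A character texel is as long as its string.
                return len(self.text)


        class G:
            """Group texel: holds child texels (illustration)."""

            def __init__(self, children):
                self.children = list(children)

            def length(self):
                # A group is as long as all of its children together.
                return sum(child.length() for child in self.children)


        hello = G([C('Hello '), C('world', bgcolor='red'), C('!')])
        print(C('Hello').length())   # 5
        print(G([]).length())        # 0
        print(hello.length())        # 12
        ```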
        
        There are also texels for new lines and tabs and a special mark for
        the end of text.
        
        It is easy to extend text model by introducing new texels, e.g. tables
        and math formulas.
        
        Each texel has a **weights** attribute. This attribute is one of the
        reasons for the high efficiency of the texel tree. It is a tuple of 3
        integers and it facilitates fast navigation in the tree. The
        first entry gives the depth of the texel, which is needed internally,
        the second gives the number of characters in the texel and the third
        gives the number of line breaks in the texel. The latter is used
        extensively by methods such as nlines, linelength, lineend and
        index2position.
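        
        How such weights allow fast navigation can be sketched as follows.
        The code is an illustration under assumptions and not the
        implementation used by textmodel: the classes ``Chars`` and
        ``Group``, the helpers ``count_breaks`` and ``break_position``, and
        the convention that leaves have depth 0 are all hypothetical. The
        point is that whole subtrees can be skipped by looking only at their
        weights.
        
        ```python
        class Chars:
            """Leaf holding a string (hypothetical stand-in for a texel)."""

            def __init__(self, text):
                self.text = text
                # (depth, number of characters, number of line breaks)
                self.weights = (0, len(text), text.count('\n'))


        class Group:
            """Inner node; its weights are derived from its children."""

            def __init__(self, children):
                self.children = tuple(children)
                depth = 1 + max((c.weights[0] for c in self.children), default=0)
                length = sum(c.weights[1] for c in self.children)
                breaks = sum(c.weights[2] for c in self.children)
                self.weights = (depth, length, breaks)


        def count_breaks(texel, index):
            """Number of line breaks among the first `index` characters."""
            if isinstance(texel, Chars):
                return texel.text.count('\n', 0, index)
            total = 0
            for child in texel.children:
                _, length, breaks = child.weights
                if index < length:
                    return total + count_breaks(child, index)
                total += breaks      # skip the whole child using its weights
                index -= length
            return total


        def break_position(texel, k):
            """Index of the k-th line break (counting from 0) inside texel."""
            if isinstance(texel, Chars):
                pos = -1
                for _ in range(k + 1):
                    pos = texel.text.index('\n', pos + 1)
                return pos
            offset = 0
            for child in texel.children:
                _, length, breaks = child.weights
                if k < breaks:
                    return offset + break_position(child, k)
                k -= breaks          # skip the whole child using its weights
                offset += length
            raise IndexError('texel contains too few line breaks')


        def index2position(texel, index):
            """Return (row, col) for a character index."""
            row = count_breaks(texel, index)
            if row == 0:
                return (0, index)
            line_start = break_position(texel, row - 1) + 1
            return (row, index - line_start)


        # Same text as in the quick example, grouped ten leaves per group.
        leaves = [Chars('Hello World!')]
        leaves += [Chars('Line %i\n' % i) for i in range(1000)]
        root = Group(Group(leaves[i:i + 10]) for i in range(0, len(leaves), 10))
        print(index2position(root, 100))   # (12, 2)
        ```
        
        This sketch does not balance the tree. It only shows that row and
        column can be found by descending along one path per lookup instead
        of scanning the whole text.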
        
        
Platform: any
