Text Model
==========

Quick example
-------------

::

    >>> from textmodel import TextModel
    >>> text = TextModel(u'Hello World')
    >>> text2 = TextModel(u'!', fontsize=20)
    >>> text.insert(11, text2)
    >>> text.set_properties(6, 11, bgcolor='yellow')
    >>> for i in range(1000):
    ...     text.append(TextModel("Line %i\n" % i))
    >>> text.linelength(0) # length of first line
    19
    >>> text.index2position(100) # row, col of index 100
    (12, 2)


Introduction
------------

Word processors are usually believed to be heavy and slow
applications. However I think, that it is possible to design a word
processor which is light weight and which is fast - so fast that it
even can be implemented in a "slow" scripting language. Text model is
ment to be a prove of concept (even though it is merely a text editor
and not a full word processor).

Storing and editing text information is a problem with a long history
in computer science. Known solutions include the `gap buffer
<http://en.wikipedia.org/wiki/Gap_buffer>`_ (used by Emacs), the
`piece table <http://www.cs.unm.edu/~crowley/papers/sds.pdf>`_ (used
by MS-Word) and the `rope data structure
<http://en.wikipedia.org/wiki/Rope_%28data_structure%29>`_ .  Instead,
text model uses internally a structure which I named "texel tree" and
which is probably a new approach to the problem. The goal was to find
a data structure which stores text together with format information
and is

- fast (even when implemented in a scripting language)
- efficient (in memory consumption)
- hierarchic (so that texts can contain elements like tables which
  itself contain text)

The texel tree consists of nodes which are called texels (text
elements). Each texel can have a variable number of child texels
(between 8 and 15), forming a highly branched tree, similar to a
B-tree. Operations to the tree a performed in such a way, that the
tree is kept balanced, i.e. all branches have exactly the same
depth. The texel tree is fast because it allows all text operations
(insert, remove, copy, paste) in logarithmic time. It is efficient
because it stores text on the level of strings and not on the
character level and it stores the styling in a economic way. 

Text model is an interface to the texel tree, hiding all the
complexity of the recursive texel data structure. It is termed "text
model" because in a model-view-controller scenario it would have the
role of the "model". A matching view / editor component is
`wxtextview <https://pypi.python.org/pypi/wxtextview>`_. In
combination they can be used as text editor.


Speed
-----

Note that textmodel is not yet optimized. By saying that the texel
structure is fast, I mean that the time of operations grows only
slowly with the length of the text. I would not be surprised, if the
times could be improved by a factor of 2 or more.

The following table shows how the time needed to insert a line grows
with the length of the text. The text length is measured as number of
text nodes, where each text node holds one line of text, e.g. 50000
means a text with 50 thousand lines of text.

=============== ====================
  # lines        time (milliseconds)
=============== ====================
          1         0.332514
          3         0.379985
          5         0.436915
         10         0.519033
         30         0.596213
         50         0.657198
        100         0.75822
        300         0.843198
        500         0.897312
       1000         0.998324
       3000         1.081806
       5000         1.136462
      10000         1.246638
      30000         1.356982
      50000         1.404089
=============== ====================

As can be seen, the time grows only very little with number of
lines. Ideally, I would expect a logarithmic dependence on text
length. This is especially true for the following operations:

- inserting strings
- inserting other trees (=paste)
- copying text
- removing text
- calculating index positions from (row, col)-tuples and vice versa
- counting lines

Moreover, pasting and cutting text changes only little with the size of
the text which is cut out or pasted in. Again, there should be a
logarithmic dependence.

Implementation details
----------------------

The texel tree consists of different kinds of texels: group texels,
character texels, glyphs texels and containers texels.

Character texels hold strings of uniformly styled unicode
text. NewLines are a special case of character texels. Groups hold
child elements. The following texel stores the words *Hello world!*
with *world* marked with red.

::

  G[C('Hello'), C('world!', bgcolor='red')] 
  
Each texel has a length, which corresponds to the number of contained
characters. For example, the length of C('Hello') is 5 and the length
of an empty group is zero.

There are also texels for new lines and tabs and a special mark for
the end of text.

It is easy to extend text model by introducing new texels, e.g. tables
and math formulas.

Each texel has a **weights** attribute. This attribute is one of the
reasons for the high efficiency of the texel tree. It is a tuple of 3
integer numbers and it facilitates fast navigation in the tree. The
first entry gives the depth of the texel, which is needed internally,
the second gives the number of characters in texel and the third gives
the number of line breaks in the texel. The latter is used excessively
by methods as nlines, linelength, lineend and index2position.

