Extending Types
===============

This is the next tutorial, see the :doc:`previous <tutorial0>`

Once you have your base transformation up and running it is time to extend the
fields. Out of the box we already offer various extensions, the full list of
:ref:`Field types <field-types>` can be found in the reference.

Translating IDs
---------------

One of the patterns we see in the evolution with microservices is a move away
from fully normalized database schema's. Modern systems rely more on
auto-completion, data is cheaper and probably a lot more reasons to store
full strings instead of IDs. In *data-migrator* this is easily supported for
hardcoded values with a small python function:

.. code-block:: python

  M = {
    0: "NOT SET",
    1: "Hallo",
    2: "Hello",
    3: "Bonjour"
  }

  def parse_b(v):
    return M.get(int(v), M[0])

  class Result(models.Model):
    id = models.IntField(pos=0) # keep id
    ....
    # replace ID with value
    b  = models.StringField(pos=2, parse=parse_b, default="How are you doing?")

Note the values are parsed as string from the CSV reader.``NULL`` is by default
translated to ``None``, which is replaced by the default value and will never
see the ``parse`` function.

Merging columns
---------------

Another migration pattern is to merge separate (boolean) columns back to a
single enumeration column. To support that use a row parser instead of a single
value parser. If no ``pos`` is given, the parser will be row based
instead of a value parsed and linked to a single column value:

.. code-block:: python

  def employment_type(row):
    if row[26] == "1": 		# contractor
      return 'contractor'
    elif row[27] == "1": 	# intern
      return 'intern'
    else:
      return 'perm'

  class Result(models.Model):
    ....
    b  = models.StringField(parse=employment_type, default="perm")


Dynamic lookups
---------------

At moments one needs to lookup values in the target database. Do not be shy to
generate dynamic lookups in the target database using SELECT statements that
run during import into the target database.

.. code-block:: python

  class Result(models.Model):
    recruiter_uuid = models.StringField(pos=38,
      replacement=lambda x:'(SELECT uuid FROM `persons` WHERE `mail`="%s" limit 0,1)' % x)

This can off course be combined with python based transformations to fix deleted values:

.. code-block:: python

  def recruiter(v):
    if v is None or v in ['missing1@mail.com', 'missing2@mail.com']:
        return 'default_person@mail.com'
    else:
      return v

  class Result(models.Model):
    recruiter_uuid = models.StringField(pos=38, parse=recruiter,
      replacement=lambda x:'(SELECT uuid FROM `persons` WHERE `mail`="%s" limit 0,1)' % x)

The output is a value for the target database as being the input for a query on that target database.
BTW replacement by output string is considered a default. Therefore the
replacement string in ``format`` type of transformation can be provided directly:

.. code-block:: python

  ...

  class Result(models.Model):
    recruiter_uuid = models.StringField(pos=38, parse=recruiter,
      replacement='(SELECT uuid FROM `persons` WHERE `mail`="{}" limit 0,1)')


Table lookups
-------------

For larger tables there is support for table driven lookups from external
CSV files. It is also possible for the map (just a key,value) to be ad-hoc
generated by other means. *data-migrator* offers a helper function
:func:`~.read_map_from_csv` to read the csv.

.. code-block:: python

  from data_migrator.contrib.read import read_map_from_csv

  class Result(models.Model):
    country = models.MappingField(pos=33, default='NLD',
      data_map=read_map_from_csv(f=open('data/country.csv'), delimiter=';', key='country_id', value='alpha3'))

Combining table lookups
-----------------------

The table lookup and column reduction can also be combined. Consider a multi
state entity with specific lookup values to be merged in one lookup value:

.. code-block:: python

  from data_migrator.contrib.read import read_map_from_csv

  LOOKUP1 = read_map_from_csv(f=open('data/state1.csv'), delimiter=';', key='id', value='name')
  LOOKUP2 = read_map_from_csv(f=open('data/state2.csv'), delimiter=';', key='id', value='name')

  def parse_lookup(row):
    return LOOKUP1.get(row[1], LOOKUP2.get(row[2], ''))

  class Result(models.Model):
    ...
    state = models.StringField(parse=parse_lookup)

Flatten multi values
--------------------

The most extensive many-2-many flattening is for example a tagging of multiple
values to a main entity. This is mostly implemented in a 3 table structure,
following the classic normalization approach:

* A table with the main entity (for example persons)
* a table with the attributes in a fixed id,value structure and last
* a many-to-many table linking the attributes to the main entities.

A simple approach to flatten this is to encode this a JSON list, to transform the data use a four step
approach:

#. Extract the data from the old system fully expanded
#. Read the CSV and flatten to a map of lists
#. Link the values at read time replacing the main ID with lists
#. Emit the whole as a JSON list

The first step relies on queries like:

.. code-block:: sql

	SELECT
		P.person_id,
		S.name as skill
	FROM person_skill_m2m P
	INNER JOIN skills S
		ON S.id=P.skill_id;

After that, loading and emitting to JSON is simply using the :class:`~.MappingField`

.. code-block:: python

  from data_migrator.contrib.read import read_map_from_csv

  class Result(models.Model):
    skills = models.MappingField(pos=0, default=[], as_json=True,
      data_map=read_map_from_csv(f=open('results/skill.csv'), key='candidate_id', value='skill', as_list=True))


Now take these examples and mix your own. It is standard Python, we have no
doubt you can come up with all kinds of amazing transformations.
