Metadata-Version: 2.1
Name: triplify-csv
Version: 0.1.0
Summary: A tool to generate triples from CSV files according to a configuration file.
License: BSD-3-Clause
Author: Adrian Atley
Requires-Python: >=3.7,<4.0
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: click (>=8.0.3,<9.0.0)
Requires-Dist: pytest (>=6.2.5,<7.0.0)
Requires-Dist: rdflib (>=6.0.2,<7.0.0)
Description-Content-Type: text/x-rst

============
triplify_csv
============
Features
--------
Generates RDF triples or quads in Turtle or NQuad syntax from one or more CSV files and a configuration file.

Installation
------------
triplify_csv can be installed from PyPI using 'pip':

.. code-block:: bash

	pip install triplify_csv


Usage
-----
triplify_csv installs as both a package and command line interface tool

**Example of using the package**

.. code-block:: python

	from TriplifyCsv import Rml, CsvOptions
	
	# config mapping files are .ttl files
	configfile = 'myconfig.ttl'
	
	# csv files should be .csv files
	csvfile1, csvfile2 = 'mycsv1.csv', 'mycsv2.csv'
	
	# output file must have either a .ttl extension for turtle triples 
	# or a .nq extension for quads
	outputfile = 'mytriples.ttl'
	
	rml = Rml()
	
	# default date format of dates in your CSV files is '%Y-%m-%d'
	# default csv delimiter is ','
	# override the defaults by setting options
	options = CsvOptions(dateformat='%d/%m/%Y', delimiter='|')
	
	# load one rml and one or more csvs
	rml.loadFile(configfile, [csvfile1,csvfile2], options)
	
	rml.create_triples()
	
	# "nquads" for named graphs need a .nq extension
	# here we are generating triples so .ttl for turtle syntax
	rml.write_file(outputfile, format="ttl")
	 

**Example of CLI use - help text**

To display full help text on the options enter the following at the command line

.. code-block:: bash
	
	triplify_csv --help
	 

**Example of CLI use - making triples**
The same example as the one in code above as a CLI call instead ...

.. code-block:: bash
	
	triplify_csv -m 'myconfig.ttl' -c 'mycsv1.csv' -c 'mycsv2.csv' -o 'mytriples.ttl'
	 
**How to make your configuration file**

The configuration file contains a set of mappings for triplify_csv to follow to set the subjects, predicates and objects or literal values of your triples or nquads from the data in one or more CSV files. These mappings are RDF triples in the turtle syntax.  The terms that can be used are a subset of the terms defined in the R2RML standard.  

R2RML was not designed for this purpose. R2RML is '.. a language for expressing customized mappings from relational databases to RDF datasets.' (see https://www.w3.org/TR/r2rml/ ). Triplify_csv uses a subset of R2RML to express customised mappings from CSV files to RDF datasets.  Where R2RML refers to the tables of a database using 'rr:logicalTable' this should be understood in the triplify_csv use of R2RML as referring to the name (without '.csv') of a corresponding csv file.  'rr:sqlQuery', the term of the R2RML language that lets you express mappings from database queries to RDF isn't supported in the triplify_csv usage.  Also, there is no need to support 'rr:sqlVersion'.

For a complete list of what parts of the R2RML language are supported see the examples in the /tests folder and refer to the R2RML test cases document (https://www.w3.org/TR/rdb2rdf-test-cases/). As of version 0.1.0 the test cases supported are

- R2RMLTC0007a - Typing resources by relying on rdf:type predicate
- R2RMLTC0007b - Assigning triples to Named Graphs
- R2RMLTC0007c - One column mapping, using rr:class
- R2RMLTC0007d - One column mapping, specifying an rr:predicateObjectMap with rdf:type
- R2RMLTC0007e - One column mapping, using rr:graphMap and rr:class
- R2RMLTC0007f - One column mapping, using rr:graphMap and specifying an rr:predicateObjectMap with rdf:type
- R2RMLTC0007g - Assigning triples to the default graph
- R2RMLTC0007h - Assigning triples to a non-IRI named graph
- R2RMLTC0008a - Generation of triples to a target graph by using rr:graphMap and rr:template
- R2RMLTC0008b - Generation of triples referencing object map
- R2RMLTC0008c - Generation of triples by using multiple predicateMaps within a rr:predicateObjectMap
- R2RMLTC0009a - Generation of triples from foreign key relations


**Simple config file example**
Suppose you have a CSV file containing details of contacts (example format below) and you want to generate RDF data from this using FOAF as the ontology you want to make use of for this, the R2RML config file might look like this ...

.. code-block:: bash

	@prefix rr: <http://www.w3.org/ns/r2rml#> .
	@prefix foaf: <http://xmlns.com/foaf/0.1/> .
	@prefix ex: <http://example.com/> .
	@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
	@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
	@base <http://example.com/base/> .
	
	<TriplesMap1> a rr:TriplesMap;
	rr:logicalTable [ rr:tableName "\"Contacts\"" ];
	
	rr:subjectMap [ rr:template "http://example.com/Contact/{\"ID\"}/{\"Name\"}";
	 rr:class foaf:Person;
	];
	
	rr:predicateObjectMap [ rr:predicate ex:id ;
	 rr:objectMap [ rr:column "\"ID\"" ;  ] ;
	];
	
	rr:predicateObjectMap [ rr:predicate foaf:name ;
	 rr:objectMap [ rr:column "\"Name\"" ; ] ;
	];
	
	rr:predicateObjectMap [ rr:predicate foaf:interest ;
	  rr:objectMap [ rr:column "\"Interest\"" ; ] ;
	];
	
	.




Create a CSV file called 'Contacts.csv' using commas as delimiters between the following values (shown here in a table) ...

.. csv-table:: Contacts.csv
	:header: "ID", "Name", "Interest"
	:widths: auto

	10, "John Smith", https://en.m.wikipedia.org/wiki/Tennis
	20, "Joe Bloggs", https://en.m.wikipedia.org/wiki/Golf
	30, "Mr Bun", https://en.m.wikipedia.org/wiki/Spam_(food)


Now, with triplify_csv installed save the R2RML config file as 'contactsmap.ttl' and the csv file as 'Contacts.csv' and generate the output containing your triples to a file called 'contactstriples.ttl' (for example) with the following command ...

.. code-block:: bash
	
	triplify_csv -m 'contactsmap.ttl' -c 'Contacts.csv' -o 'contactstriples.ttl'

	
The resulting triples in turtle syntax in the 'contactstriples.ttl' file
would look like this ...

.. code-block:: bash

	@prefix ex: <http://example.com/> .
	@prefix foaf: <http://xmlns.com/foaf/0.1/> .
	@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
	
	<http://example.com/Contact/10/John%20Smith> a foaf:Person ;
	    ex:id 10 ;
	    foaf:interest "https://en.m.wikipedia.org/wiki/Tennis" ;
	    foaf:name "John Smith" .
	
	<http://example.com/Contact/20/Joe%20Bloggs> a foaf:Person ;
	    ex:id 20 ;
	    foaf:interest "https://en.m.wikipedia.org/wiki/Golf" ;
	    foaf:name "Joe Bloggs" .
	
	<http://example.com/Contact/30/Mr%20Bun> a foaf:Person ;
	    ex:id 30 ;
	    foaf:interest "https://en.m.wikipedia.org/wiki/Spam_(food)" ;
	    foaf:name "Mr Bun" .

