Metadata-Version: 2.4
Name: datapaths
Version: 1.0.1
Summary: Python bindings for DataPaths SQL engine
Classifier: Operating System :: Microsoft :: Windows
Description-Content-Type: text/markdown
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: summary

## Database

DataPaths is an in-memory SQL database for relational data storage. Written in C/C++, it is designed to maximize storage capacity by compressing low-entropy columns. Developed from the start without external dependencies, the DataPaths library can be easily embedded in a wide range of projects.

## Data Types

The following data types are currently available:

- SHORT: 16-bit signed int
- INT: 32-bit signed int
- LONG: 64-bit signed int
- FLOAT: 32-bit floating point number
- DOUBLE: 64-bit floating point number
- STRING: text data
- DATE: 2-bit calendar date stored as an integer offset from Jan 1, 1970

## Table Configuration

Several methods for table configuration are available. The default method assumes no knowledge of data structure and attempts to optimize the table structure once a certain number of rows are available in the table:

`CREATE TABLE(column1 TYPE, column2 TYPE,...)`

An alternative approach is to explicitly specify which columns can be used as categories. This method requires some knowledge of the data to identify which table columns contain the least information:

`CREATE TABLE(column1 TYPE [CAT NUM] , column2 TYPE [CAT NUM],...)`

Finally, an option is available to configure the table using an existing data set. This procedure will read the CSV file to determine which columns should be used as categories:

`CREATE TABLE(column1 TYPE, column2 TYPE,...) CONFIG FROM file_name`

## Loading Data

Data can be loaded from a standard CSV file using the following command:

`LOAD TABLE FROM file_name`

## SELECT Query

The following is a general form of SELECT statement:

`SELECT column1, column2, ... FROM table [JOIN join_table on table_col = join_col ] [WHERE condition1, condition2, ...] [ORDER BY column1 [DESC], column2 [DESC], ...]`

## Python Interface

Python bindings are available for interacting with the database in Python applications. Currently, the DataPaths module can be installed on Windows and Linux platforms for Python 3.11â€“3.14:

`pip install datapaths`

The package provides two main classes for interacting with the database:

- `DatabaseParser`: executes database commands and SELECT queries
- `QuerySet`: represents query results and provides accessors for rows, columns, and string data

`DatabaseParser` is the main class for interacting with the database and it can execute database configuration commands as well as SELECT queries. The following methods are available:

- `executeCommand(string)`: executes database command
- `executeSelect(string)`: execute SELECT command and return `QuerySet` object containing data results

`QuerySet` is the storage container for SELECT query results. The following methods are available:

- `getNumberOfRows()`: returns number of rows of the resulting query
- `getNumberOfCols()`: returns number of columns of the resulting query
- `setRow(int)`: set the current query row
- `getString(int)`: get column entry as string
- `getFloat(int)`: get column entry as float
- `getInt(int)`: get column entry as integer

DataPaths library error handling is performed using exceptions mechanism. If invalid command is executed, an exception is thrown which is then translated into Python `RuntimeError`.

## Python Example

The following code demonstrates how to load a 65 MB CSV file into the DataPaths Python module with just 35 MB of memory usage:

```python
import urllib.request, datapaths

csvFile = "Electric_Vehicle_Population_Data.csv"
urllib.request.urlretrieve("https://data.wa.gov/api/views/f6w7-q2d2/rows.csv?accessType=DOWNLOAD", csvFile)
parser = datapaths.DatabaseParser()
parser.executeCommand("create table ev(vin string,county string,city string,state string,postalcode int,"
                     "modelyear short,make string,model string,type string,cafveligibility string,"
                     "electricrange short,legislativedistrict short,vehicleid string,"
                     "vehiclelocation string,electricutility string,censustract int)")
parser.executeCommand(f"load ev from {csvFile}")
res = parser.executeSelect("select * from ev where make = TESLA")
for i in range(res.getNumberOfRows()):
    res.setRow(i)
    print(f"VIN: {res.getString(0)} Model: {res.getString(7)}")
```

## Software Errors

To report software errors and bugs, please contact support@datapaths.technology
