Metadata-Version: 2.4
Name: remote-read-sql
Version: 1.1.2
Summary: Securely read sql into a pandas dataframe from a remote mysql DB
Keywords: pandas,ssh,mysql,remote,dataframe
Author: Erik van Widenfelt
Author-email: Erik van Widenfelt <ew2789@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 5 - Production/Stable
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: mysqlclient>=2.2.7
Requires-Dist: pandas>=2.3.3
Requires-Dist: paramiko==2.9.3
Requires-Dist: python-dotenv>=1.1.1
Requires-Dist: sphinx>=8.2.3
Requires-Dist: sqlalchemy>=2.0.44
Requires-Dist: sqlglot>=27.28.1
Requires-Dist: sshtunnel>=0.4.0
Requires-Python: >=3.12
Description-Content-Type: text/x-rst

Remote read_sql
===============

Read SQL into a pandas data frame from a remote server

Installation
------------

.. code-block:: bash

    pip install remote-read-sql

Usage
-----

In this example, `remote_read_sql` opens an ssh tunnel and connects to the mysql server locally on port 3306. The SQL query is sanitized and passed to pandas `read_sql`.

After reading the data into the dataframe, the ssh and db connections are closed.

Storing your credentials in files
+++++++++++++++++++++++++++++++++

You should read your credentials from a file or files. Do not write credentials directly in your notebook.

In this example, the ssh credentials are in a ENV file that might look something like this::

    SSH_SERVER_IP=server.example.com
    SSH_USER=user
    SSH_KEY_PATH=~/.ssh/id_rsa
    SSH_KEY_PASS=
    LOCAL_BIND_PORT=3307
    REMOTE_HOST=127.0.0.1
    LOCAL_BIND_PORT=3307
    REMOTE_DB_PORT=3306

and the mysql credentials are in the ``my.cnf`` file and might look like this::

    [remote_server]
    user=user_readonly
    password=password
    default-character-set=utf8
    host=127.0.0.1
    port=3306

Preparing your credentials
++++++++++++++++++++++++++

Since you may be calling ``remote_read_sql`` several times in the same notebook, store the paths to your credentials in a dictionary as a convenience.

.. code-block:: python

    # change to your own paths
    ssh_config_path = Path("~/.my_ssh_config")
    my_cnf_path = Path("~/.my.cnf")
    db_name = "my_database"

    # combine kwargs into a dictionary
    conn_opts = {
        "ssh_config_path": ssh_config_path,
        "my_cnf_path": my_cnf_path,
        "my_cnf_connection_name": "remote_server",
        "db_name": db_name,
    }

Running a single query
++++++++++++++++++++++

To run a single query and return a Dataframe, pass the SQL query to ``remote_read_sql`` along with your ``conn_opts`` from above. The SQL query can be any valid SELECT statement.

.. code-block:: python

    # open ssh, open db, read sql into dataframe, close db, close ssh
    df = remote_read_sql("SELECT * FROM subject_glucose", **conn_opts)

    # inspect the dataframe
    df.head()


Running multiple queries
++++++++++++++++++++++++

When running ``remote_read_sql`` with the SQL query as above, the connection closes immediately after running the SQL statement. If you want to run several SQL queries using the same connection, use ``remote_read_sql`` as a context manager. As a context manager, ``remote_read_sql`` opens the connection, you run ``pd.read_sql()`` for multiple queries within the ``with`` statement. Once you leave the ``with`` statement, ``remote_read_sql`` closes the connection.

For ``remote_read_sql`` to work as a context manager, the SQL query is not passed to `remote_read_sql`:

.. code-block:: python

    with  remote_read_sql(**conn_opts) as db_conn:
        # connection db_conn is open
        # read sql
        df_glucose = pd.read_sql("SELECT * FROM subject_glucose", db_conn)
        # read sql
        df_bp = pd.read_sql("SELECT * FROM subject_bp", db_conn)

    # connection db_conn is closed
    # view your Dataframes
    df_glucose.head()
    df_bp.head()
