Metadata-Version: 2.4
Name: bqdf
Version: 0.0.6
Summary: Ergonomic wrapper for pandas_gbq that simplifies loading BigQuery data into DataFrames
Home-page: https://github.com/motdam/bqdf
Author: motdam
Author-email: tmad@hotmail.co.uk
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastcore
Requires-Dist: pandas
Requires-Dist: google-cloud-bigquery
Requires-Dist: google-cloud-bigquery-storage
Requires-Dist: pandas-gbq
Provides-Extra: dev
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# bqdf


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Usage

### Installation

Install latest from the GitHub
[repository](https://github.com/motdam/bqdf):

``` sh
$ pip install git+https://github.com/motdam/bqdf.git
```

or from [conda](https://anaconda.org/motdam/bqdf)

``` sh
$ conda install -c motdam bqdf
```

or from [pypi](https://pypi.org/project/bqdf/)

``` sh
$ pip install bqdf
```

### Documentation

Documentation can be found hosted on this GitHub
[repository](https://github.com/motdam/bqdf)’s
[pages](https://motdam.github.io/bqdf/). Additionally you can find
package manager specific guidelines on
[conda](https://anaconda.org/motdam/bqdf) and
[pypi](https://pypi.org/project/bqdf/) respectively.

## How to use

This lib provides convenience functions for streamlining the interface
of the pandas-gbq library to perform CRUD operations in BigQuery more
quickly

``` python
import pandas_gbq
import pandas as pd
```

``` python
top_terms_query = """
-- todays top 10 search terms in England
SELECT refresh_date, rank, term, score, percent_gain / 100 as percent_gain, country_name, week
FROM `bigquery-public-data.google_trends.international_top_rising_terms` 
WHERE country_name = 'United Kingdom'
  and refresh_date = current_date - 1
  and region_name = 'England'
order by refresh_date desc, week desc, rank
limit 5
"""
```

### Reading a BigQuery table

``` python
df = read(top_terms_query, project_id='bq-sandbox-motdam')
df.head()
```

To recreate the above with the original library you would need the below
boiler plate to inspect the results and convert columns into pandas
friendly dtypes.

``` python
df = pandas_gbq.read_gbq(top_terms_query, project_id='bq-sandbox-motdam')
df = df.astype({
    'percent_gain':'Float64'
})
df['week'] = pd.to_datetime(df['week'])
df['refresh_date'] = pd.to_datetime(df['refresh_date'])
print(df.info())
df.head()
```

### Writing a df to BigQuery

The rest [`to`](https://motdam.github.io/bqdf/core.html#to) function is
unchanged beyond removing the redundant \_gbq suffix. We can write our
df back into BigQuery using hte
[`to`](https://motdam.github.io/bqdf/core.html#to) function.

``` python
# Write the dataframe to a temporary table
to(df, 'bq-sandbox-motdam.temporary.top_10_eng_search_terms', if_exists='replace')
```

### Executing SQL in BigQuery

The [`ex`](https://motdam.github.io/bqdf/core.html#ex) fucntion enables
non df based CRUD operations within the same api which can be useful for
creating feature processing pipelines.

``` python
project = 'bq-sandbox-motdam'

def create_top_terms(period, days):
    return f"""
    CREATE OR REPLACE TABLE `{project}.temporary.top_terms_{period}` AS
    WITH ranked AS (
      SELECT region_name, term, COUNT(*) as appearances, AVG(rank) as avg_rank,
        ROW_NUMBER() OVER (PARTITION BY region_name ORDER BY COUNT(*) DESC, AVG(rank)) as rn
      FROM `bigquery-public-data.google_trends.international_top_rising_terms`
      WHERE country_name = 'United Kingdom'
        AND region_name IN ('England', 'Scotland', 'Wales', 'Northern Ireland')
        AND refresh_date BETWEEN CURRENT_DATE() - {days} AND CURRENT_DATE()
        AND rank <= 100
      GROUP BY region_name, term
    )
    SELECT region_name, term as top_term_{period}
    FROM ranked WHERE rn = 1
    """

ex(create_top_terms('today', 1), project_id=project)
ex(create_top_terms('week', 8), project_id=project)
ex(create_top_terms('month', 31), project_id=project)
ex(create_top_terms('year', 366), project_id=project)

final_query = f"""
SELECT t.region_name, t.top_term_today, w.top_term_week, m.top_term_month, y.top_term_year
FROM `{project}.temporary.top_terms_today` as t
JOIN `{project}.temporary.top_terms_week` as w ON t.region_name = w.region_name
JOIN `{project}.temporary.top_terms_month` as m ON t.region_name = m.region_name
JOIN `{project}.temporary.top_terms_year` as y ON t.region_name = y.region_name
ORDER BY t.region_name
"""

read(final_query, project_id=project)
```

## Developer Guide

If you are new to using `nbdev` here are some useful pointers to get you
started.

### Install bqdf in Development mode

``` sh
# make sure bqdf package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to bqdf
$ nbdev_prepare
```
