Metadata-Version: 2.1
Name: synct
Version: 1.5.2
Summary: Transform data from source to Google or Excel spreadsheet
Home-page: https://github.com/ari3s/synct
Author: Jan Beran
Author-email: jberan@redhat.com
License: GPL-3.0-or-later
Description-Content-Type: text/markdown
License-File: COPYING
License-File: AUTHORS
Requires-Dist: jira
Requires-Dist: google-api-python-client
Requires-Dist: google-auth-oauthlib
Requires-Dist: openpyxl
Requires-Dist: pandas
Requires-Dist: pyperclip
Requires-Dist: pyyaml
Requires-Dist: python-bugzilla

# synct

## Description

`synct` (derived from *synchronize tables*) is a Python script that retrieves data from a source and converts it to either Google or Excel spreadsheet as defined in the configuration file.

## Installation

### Fedora

The script can be installed on Fedora systems using `dnf` from the package stored in this project:

```
sudo dnf install synct-1.5.2-1.fc39.noarch.rpm
```

The script can also be installed from Fedora COPR using these commands:

```
sudo dnf copr enable aries/synct
sudo dnf install synct
```

The following packages are required to be installed:

```
python3-bugzilla
python3-google-api-client
python3-google-auth-oauthlib
python3-jira
python3-openpyxl
python3-pandas
python3-pyperclip
python3-pyyaml
```

### Linux and MacOS

The script can be installed on Linux or MacOS systems using `pip`:

```
python -m pip install synct-1.5.2.tar.gz
```

The following dependencies will be installed from PyPI by the above command:

```
google-api-python-client
google-auth-oauthlib
jira
openpyxl
pandas
pyperclip
pyyaml
python-bugzilla
```

## Usage

```
synct [-h] [--version] \
      [-c CONFIG] [-s SHEET [SHEET ...]] \
      [-a] [-r] [-n] \
      [-f FILE] [-t TABLE] [-o OFFSET] \
      [-v] [-q]
```

The script updates data rows in the target Google or Excel spreadsheet based on key values. If a key value is missing or placed inappropriately, it should be manually corrected. Then the script can update the related data. Missing key values are stored in the clipboard, separated by new lines, which allows for easy copying into the spreadsheet.

### Options

- `-h`, `--help`
  - Show help with options list and their short description, and exit without any action.

- `--version`
  - Display the script name and version, and exit without any action.

- `-c CONFIG`, `--config CONFIG`
  - It defines the name of the YAML configuration file containing the input identification with access attributes, a reference to the target spreadsheet with sheet names and related queries, column names with their related items, and additional parameters defining the content of the spreadsheet. If this option is not defined, the script uses the `synct.yaml` file in the working directory.

- `-s SHEET [SHEET ...]`, `--sheet SHEET [SHEET ...]`
  - This option determines which sheets are processed. The selected sheets can be any sheets defined in the configuration YAML file. If the option is not specified, all sheets listed in the configuration file will be processed.

- `-a`, `--add`
  - It enables the addition of missing rows to the spreadsheet. The added rows are placed at the end of the specific sheet.

- `-n`,` --noupdate`
  - It disables updating the target spreadsheet.

- `-r`, `--remove`
  - It enables the removal of rows in the spreadsheet. It is related to rows whose values are not retrieved from a source. For example, the script is configured to collect and update open bugs from Bugzilla. The `-r` parameter allows for the deletion of rows containing bugs that have not been retrieved, meaning those that have been closed. Without this option, such rows are not updated, and a warning is reported by the script.

- `-f FILE`, `--file FILE`
  - It defines the source file name that can be either Excel or OpenDocument spreadsheet or CSV file. If the parameter occurs, the source file in the configuration file is ignored.

- `-t TABLE`, `--table TABLE`
  - This option defines the table/sheet in the source file that will be used. If the parameter occurs, a table name in the configuration file is ignored.

- `-o OFFSET`, `--offset OFFSET`
  - The parameter defines the header offset in the source file. If the parameter occurs, an offset in the configuration file is ignored. Default value is 0.

- `-v `, `--verbose`
  - By default, the script reports warnings and errors. The `-v` parameter extends the logging level to include the info level. Repeating the option, it increases verbosity, so `-vv` includes the debug level.

- `-q` , `--quiet`
  - The option reduces the logging to errors only.

## Configuration file structure

The configuration file contains reserved words written in capital letters, as well as additional data that defines the transition of input data to the target spreadsheet.

Bugzilla REST API documentation: [https://wiki.mozilla.org/Bugzilla:REST_API](https://wiki.mozilla.org/Bugzilla:REST_API)

GitHub REST API documentation: [https://docs.github.com/en/rest](https://docs.github.com/en/rest)

GitLab REST API documentation: [https://docs.gitlab.com/ee/api/rest/](https://docs.gitlab.com/ee/api/rest/)

Jira REST API documentation: [https://developer.atlassian.com/server/jira/platform/rest-apis/](https://developer.atlassian.com/server/jira/platform/rest-apis/)

Jira structured data, including custom field IDs and names, can be found in XML data exported from Jira.

| Reserved word      | Description |
| ------------------ | ----------- |
| `API_KEY`          | File name containing API key to access Bugzilla. |
| `BUGZILLA`         | The script retrieves data from Bugzilla. It should contain `API_KEY`, `DOMAIN`, and `URL`, optionally `MAX_RESULTS`. |
| `CONDITION`        | Used with the `FROM` and `GET` reserved words to define a condition that must be met to obtain the required data from the input. |
| `DEFAULT_COLUMNS`  | Enables usage of default column names. It means they can be omitted in the configuration file, and equal names of source data items and target spreadsheet columns are paired. The reserved word value can be either 'True' or 'False' and can be defined either globally or specifically for each sheet. This option is globally set to 'False' by default. |
| `DELIMITER`        | The delimiter separates items in one cell. The default value is space. The delimiter can be defined globally as well as individually in sheets and columns. If `DELIMITER` is defined together with the `GET` reserved word, it defines a separator between items obtained from the `GET` list. |
| `DOMAIN`           | Bugzilla domain. |
| `FILE`             | The script retrieves data from local file in spreadsheet format (.ods, .xls, .xlsx, .csv). It should contain `TYPE`, optionally 'FILE_NAME', `OFFSET' and/or `TABLE`. |
| `FILE_NAME`        | Name of the input file (optional). It is ignored if a file name is defined on the command line. |
| `FROM`             | Used with the `GET` (and optionally with the `CONDITION`) reserved word to address the higher level of structured identifiers. |
| `GET`              | Used with the `FROM` (and optionally with the `CONDITION`) reserved word to address the list of lower level structured identifiers with explicit values, which can be regular expressions. |
| `GITHUB`           | The script retrieves data from GitHub. It should contain `SEARCH_API` and `TOKEN`. |
| `GITLAB`           | The script retrieves data from GitLab. It should contain `SEARCH_API` and `TOKEN`. |
| `HEADER_OFFSET`    | The first row of the target spreadsheet is expected to be the header. In this case, `HEADER_OFFSET` is 0, which is the default value. If the header is larger, `HEADER_OFFSET` defines the value. It can be defined either globally or specifically for each sheet. |
| `INHERIT_FORMULAS` | Enables formula inheritance in added rows from the last original row in the columns that are not included in the source data. The reserved word value can be either 'True' or 'False' and can be defined either globally or specifically for each sheet or column. This option is globally set to 'False' by default. |
| `JIRA`             | The script retrieves data from Jira. It should contain `SERVER` and `TOKEN`, optionally `MAX_RESULTS`. |
| `KEY`              | The column containing keys is identified by the `KEY` reserved word with a value of `True`. It can be defined either globally or specifically for each sheet. |
| `LINK`             | Used in columns, it contains a URL that is used as a prefix for values. If the column is a key column, link format is used. |
| `MAX_RESULTS`      | Defines the maximum number of obtained items from Jira for each query. It can only be a part of the `JIRA` section. |
| `NAME`             | Defines the name of each sheet. |
| `OFFSET`           | Header offset in the spreadsheet input file (optional). It is ignored if an offset is defined on the command line. |
| `OPTIONAL`         | When the key with this specific column value is missing, it is not reported as a warning. The value can be a regular expression. |
| `QUERY`            | Query definition for each sheet. It is specific to the input: Bugzilla queries are in YAML format, GitHub and GitLab queries contain searching parameters of search API URL, Jira queries are in JIRA Query Language (JQL), and queries for spreadsheets are in Pandas query format. |
| `SEARCH_API`       | URL of the GitHub or GitLab search API. It can only be a part of either `GITHUB` or `GITLAB` section. |
| `SERVER`           | URL of the Jira server. It can only be a part of the `JIRA` section. |
| `SHEET_COLUMNS`    | Definition of column names and their relation to data identifiers obtained from the input. It can be defined either globally or specifically for each sheet. |
| `SHEETS`           | List of sheets that should be addressed in the target spreadsheet. |
| `SOURCE`           | Defines the column name's relation to a data identifier obtained from the input. It is used when multiple reserved words belong to a specific column, such as `CONDITION`, `FROM`, `GET`, `KEY`, `LINK`, or `OPTIONAL`. |
| `SPREADSHEET`      | Name of the target Excel spreadsheet. |
| `SPREADSHEET_ID`   | ID of the target Google spreadsheet. |
| `TABLE`            | Table/sheet name of the spreadsheet source (optional). Only one table is allowed. It is ignored if a table name is defined on the command line. |
| `TOKEN`            | File name that contains the token to access GitHub, GitLab or Jira. |
| `TYPE`             | Type of the local input file. The value must be `SPREADSHEET`. |
| `URL`              | Bugzilla URL. |

### Bugzilla YAML configuration file example

```
BUGZILLA:
  DOMAIN: bugzilla.redhat.com
  URL: https://bugzilla.redhat.com/xmlrpc.cgi
  API_KEY: ~/.config/python-bugzilla/bugzillarc
SPREADSHEET_ID: 1ftibl011mzlge8gcJ0MfmvIw_C8Hq2iYeUdU6B-rbwI
DELIMITER: ", "
HEADER_OFFSET: 0
SHEET_COLUMNS:
  Bug ID:
    SOURCE: id
    KEY: True
    LINK: https://bugzilla.redhat.com/show_bug.cgi?id=
  Component: component
  Status: status
  Summary: summary
  Assignee: assigned_to
  Priority: priority
  Flags:
    DELIMITER: " "
    SOURCE:
      DELIMITER: ""
      FROM: flags
      GET:
      - name: .*
        status: .*
  Opened: creation_time
SHEETS:
  - NAME: rpm
    QUERY:
        query_format: advanced
        include_fields:
        - _default
        - flags
        - external_bugs
        product: Fedora
        component: rpm
        status: __open__
        limit: 100
  - NAME: libdnf
    QUERY:
        query_format: advanced
        include_fields:
        - _default
        - flags
        - external_bugs
        product: Fedora
        component: libdnf
        status: __open__
        limit: 100
    SHEET_COLUMNS:
      Jira:
        SOURCE:
          FROM: external_bugs
          GET:
          - ext_bz_bug_id: .*
          CONDITION:
            type:
              type: JIRA
          LINK: https://issues.redhat.com/browse/
```

### GitHub YAML configuration file example

```
GITHUB:
  SEARCH_API: "https://api.github.com/search/"
  TOKEN: "~/.github/github_token"
SPREADSHEET_ID: 1aEHCYma8orzDiDbkOnDifoFL2LuOy4b0aVAD1j7NTPg
HEADER_OFFSET: 2
SHEET_COLUMNS:
  Issue Number: 
    SOURCE: number
    KEY: true
    LINK: https://github.com/ari3s/synct/issues/
  Title: title
  Assignee:
    SOURCE:
      FROM: assignees
      GET:
        - login: .*
  Label:
    SOURCE:
      FROM: labels
      GET:
        - name: .*
  Status: state
  Created: created_at
  Updated: updated_at
  Closed: closed_at
SHEETS:
- NAME: Issues
  QUERY: issues?q=state:open+state:closed+repo:ari3s/synct
```

### GitLab YAML configuration file example

```
GITLAB:
  SEARCH_API: "https://gitlab.cee.redhat.com/api/v4/groups/"
  TOKEN: "~/.gitlab/gitlab_token"
SPREADSHEET_ID: 1jj2EI5Kum7Q2_DgJMKGAAP7bzCwOOX_WBOZ9AussSZo
HEADER_OFFSET: 2
SHEET_COLUMNS:
  Issue Number:
    SOURCE: iid
    KEY: true
    LINK: https://gitlab.cee.redhat.com/developer-guide/developer-guide/-/issues/
  Title: title
  Assignee:
    SOURCE:
      FROM: assignees
      GET:
        - name: .*
  Label: labels
  Status: state
  Created: created_at
  Updated: updated_at
  Closed: closed_at
SHEETS:
- NAME: Issues
  DELIMITER: "\n"
  QUERY: 9649/search?scope=issues&search=file
```

### Jira YAML configuration file example

```
JIRA:
  SERVER: https://issues.redhat.com/
  TOKEN: "~/.jira/jira.token"
  MAX_RESULTS: '1000'
SPREADSHEET_ID: 1RuA01d-Asa6EKhZ9HJU7ykXk-wHjX3SIUsEgScS3ONM
SHEETS:
- NAME: rpm
  QUERY: project = RHEL AND component = rpm AND issuetype = Bug
SHEET_COLUMNS:
  Issue key:
    SOURCE: key
    KEY: True
    LINK: https://issues.redhat.com/browse/
  Title: fields.summary
  Assignee: fields.assignee
  Priority: fields.priority
  Fix Versions: fields.fixVersions
  Affects Versions: fields.versions
  Label: fields.labels
  Issue status: fields.status
```

## Authorized access

### Bugzilla

Bugzilla access is handled using an API key, as described at [https://bugzilla.readthedocs.io/en/latest/api/core/v1/general.html#authentication](https://bugzilla.readthedocs.io/en/latest/api/core/v1/general.html#authentication). An API key can be generated in the Preferences of the personal Bugzilla profile and stored in a file referred to in the YAML configuration file of the script.

### GitHub

The script uses the REST API, which can use an API token generated according to the guidance here: [https://docs.github.com/en/rest/overview/authenticating-to-the-rest-api](https://docs.github.com/en/rest/overview/authenticating-to-the-rest-api). The stored token file must be referred to in the YAML configuration file of the script.

### GitLab

The script uses the REST API, which uses an API token generated according to the guidance here: [https://docs.gitlab.com/ee/api/rest/#authentication](https://docs.gitlab.com/ee/api/rest/#authentication). The stored token file must be referred to in the YAML configuration file of the script.

### Jira

The script uses the REST API, which requires an API token. The token can be generated from your Jira account according to the guidance here: [https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html). The stored token file must be referred to in the YAML configuration file of the script.

### Google

Since the script uses the Google Sheets API to read and write data, it requires authorized access using an OAuth JSON key file, which can be obtained according to the instructions here: [https://developers.google.com/workspace/guides/create-credentials#service-account](https://developers.google.com/workspace/guides/create-credentials#service-account).

The following *Enabled APIs & Services* should be selected: *Cloud Identity-Aware Proxy API*, *Google Sheets API*, and *Token Service API*. Sensitive scopes set through the *OAuth consent screen* should allow the following option:

| API               | Scope                 | User-facing description |
| ----------------- | --------------------- | ----------------------- |
| Google Sheets API | .../auth/spreadsheets | See, edit, create, and delete all your Google Sheets spreadsheets |

The OAuth client ID type set in the *Credentials* menu should be *Desktop*. The OAuth client JSON file with service account credentials should be downloaded and stored locally. An easy way to provide service account credentials is by setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the JSON file's name. The script will use the value of this variable to find the service account key JSON file.

## Support

Bugs can be reported or new feature requests can be raised in the [Issues](https://github.com/ari3s/synct/issues) section.
