Metadata-Version: 2.4
Name: MalwareClassifier
Version: 0.1.6
Summary: A malware classifier template with built-in logging.
Author-email: cchunhuang <cchunhuang147@gmail.com>
License: MIT License
        
        Copyright (c) 2025 cchunhuang
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/cchunhuang/MalwareClassifier
Project-URL: Issues, https://github.com/cchunhuang/MalwareClassifier/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: python_box>=7.3.2
Requires-Dist: python-json-logger>=2.0.7
Dynamic: license-file

# MalwareClassifier

MalwareClassifier is a Python package that provides a **template** for building a malware classification system with a **built-in logging system** and **configurable settings**.
It is designed to be **modular**, **extensible**, and **easy to install** using `pip`.

---

## Table of Contents

* [Installation](#installation)
* [Quick Start](#quick-start)
* [Configuration](#configuration)
* [Logging Usage](#logging-usage)
* [Publishing](#publishing)
* [License](#license)
* [Contact](#contact)

---

## Installation

### Option A: Install using pip

```bash
pip install MalwareClassifier
```

### Option B: Install from source

```bash
# Clone the repository
git clone git@github.com:cchunhuang/MalwareClassifier.git
cd MalwareClassifier

# Install
pip install .

# Install additional dependencies (optional)
pip install -r requirements.txt
```

### Dependencies

The package requires the following core dependencies:
- `python_box>=7.3.2` - For configuration management with dot notation access
- `python-json-logger>=2.0.7` - For JSON-formatted logging (installed automatically; using the `json` formatter is optional)

---

## Quick Start

```python
from MalwareClassifier import MalwareClassifier, setup_logging, get_logger

class SubMalwareClassifier(MalwareClassifier):
    def __init__(self, config_path="./config.json"):
        super().__init__(config_path)
        logging_config = setup_logging(log_dir=self.config.folder.log)
        self.start_time = logging_config["start_time"]
        self.logger = get_logger(__name__)

    def get_feature(self):
        self.logger.info("Extracting features.")
    
    def get_vector(self):
        self.logger.info("Vectorizing.")

    def get_model(self, action: str = "train"):
        self.logger.info(f"Using model for action: {action}.")
    
    def get_prediction(self):
        self.logger.info("Predicting.")
    
if __name__ == "__main__":
    classifier = SubMalwareClassifier()
    classifier.get_feature()
    classifier.get_vector()
    classifier.get_model()
    classifier.get_prediction()
```

### Key Features

- The `MalwareClassifier` class in `malware_classifier.py` defines the **workflow skeleton**. Subclass it to override the following methods:
  - `get_feature()` - Extracts features from the malware dataset
  - `get_vector()` - Vectorizes the extracted features
  - `get_model(action="train")` - Trains the model or performs inference (action: "train" or "predict")
  - `get_prediction()` - Runs prediction on the given files
- **Configuration Management**: Use `config.json` with dot notation access via `python_box` (e.g., `self.config.folder.log`)
- **Automatic Directory Creation**: All folders specified in config are created automatically via `self.mkdir()`
- **Built-in Logging**: Integrated logging system with file and console output
- **Custom Config Path**: Pass your own `config_path` when initializing the classifier
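
As a rough sketch of how a subclass might handle the `action` argument of `get_model()`, the standalone example below dispatches on `"train"` and `"predict"` and rejects anything else. `ModelStepSketch`, `_train()`, and `_predict()` are hypothetical names used only for illustration and are not part of the package; in a real project the method would live in your `MalwareClassifier` subclass.

```python
class ModelStepSketch:
    """Standalone stand-in for a MalwareClassifier subclass (hypothetical)."""

    VALID_ACTIONS = ("train", "predict")

    def get_model(self, action: str = "train") -> str:
        # Reject unknown actions early so a typo does not silently do nothing.
        if action not in self.VALID_ACTIONS:
            raise ValueError(
                f"Unknown action {action!r}; expected one of {self.VALID_ACTIONS}"
            )
        if action == "train":
            return self._train()
        return self._predict()

    def _train(self) -> str:
        # Placeholder: fit and optionally save a model here.
        return "trained"

    def _predict(self) -> str:
        # Placeholder: load a saved model and run inference here.
        return "predicted"


step = ModelStepSketch()
print(step.get_model())
print(step.get_model("predict"))
```

Validating `action` up front keeps the two code paths explicit and surfaces misspelled values as errors.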

---

## Configuration

The package includes a default `config.json`:

```json
{
    "file": {
        "label": "./dataset/label.csv"
    },
    "folder": {
        "log": "./output/log/",
        "dataset": "./dataset/",
        "feature": "./output/feature/",
        "vector": "./output/vector/",
        "model": "./output/model/",
        "predict": "./output/predict/"
    },
    "params": {
        "mode": "detection",
        "feature": {
            "save": true,
            "load": false
        },
        "vector": {
            "save": true,
            "load": false
        },
        "model": {
            "save": true,
            "load": false
        },
        "predict": {
            "save": true,
            "load": false
        }
    }
}
```

### Configuration Access

After calling `super().__init__()`, you can access configuration values using dot notation:

- **File paths**: `self.config.file.label`
- **Directory paths**: `self.config.folder.log`, `self.config.folder.dataset`, etc.
- **Parameters**: `self.config.params.mode`
- **Feature settings**: `self.config.params.feature.save`, `self.config.params.vector.load`, etc.
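
To illustrate what dot notation access looks like, here is a minimal sketch that parses an excerpt of the default config. It deliberately uses the standard library's `types.SimpleNamespace` as a stand-in so that it runs without extra packages; the package itself relies on `python_box`, and `load_config_text()` is a hypothetical helper, not part of the package.

```python
import json
from types import SimpleNamespace

# Excerpt of the default config.json shown above.
CONFIG_TEXT = """
{
    "file": {"label": "./dataset/label.csv"},
    "folder": {"log": "./output/log/"},
    "params": {"mode": "detection", "feature": {"save": true, "load": false}}
}
"""


def load_config_text(text: str) -> SimpleNamespace:
    # object_hook converts every JSON object into a namespace,
    # so nested values become reachable as attributes.
    return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))


config = load_config_text(CONFIG_TEXT)
print(config.file.label)
print(config.folder.log)
print(config.params.feature.save)
```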

### Customization

- The `config.json` structure is fully customizable
- Add new sections or parameters as needed
- Access nested values with dot notation, e.g. `self.config.section.subsection.value`
- All directories listed in the `folder` section are created automatically via `self.mkdir()`
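
The automatic directory creation can be approximated with the standard library. The sketch below is an assumption about the general idea, not the package's actual `self.mkdir()` implementation; `make_folders()` is a hypothetical helper that takes the `folder` section as a plain dict.

```python
import os
import tempfile


def make_folders(folders: dict, base_dir: str = ".") -> list:
    """Ensure every directory listed in `folders` exists and return the paths."""
    paths = []
    for path in folders.values():
        full_path = os.path.join(base_dir, path)
        # exist_ok=True makes the call safe to repeat on later runs.
        os.makedirs(full_path, exist_ok=True)
        paths.append(full_path)
    return paths


# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    created = make_folders(
        {"log": "output/log/", "model": "output/model/"}, base_dir=tmp
    )
    print(all(os.path.isdir(p) for p in created))
```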

---

## Logging Usage

The logging system is defined in `src/MalwareClassifier/logging.py`.

### Available functions

* `setup_logging(config=None, config_path=None, reset_handlers=True, log_dir=None)`
  Initialize logging with optional config overrides.
    * Recommended call: `setup_logging(log_dir=self.config.folder.log)`
    * Returns: the logging config (dict). The built-in default config is:
    ```python
    from datetime import datetime
    from typing import Any, Dict

    # Process start time, captured once per interpreter run
    START_TIME = datetime.now().strftime("%Y%m%d-%H%M%S")

    _DEFAULT_CONFIG: Dict[str, Any] = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "basic": {
                "format": "[%(levelname)s] %(name)s %(filename)s: %(message)s",
                "datefmt": "%Y-%m-%d %H:%M:%S",
            },
            "verbose": {
                "format": "%(asctime)s [%(levelname)s] %(name)s %(filename)s:%(lineno)d - %(message)s",
                "datefmt": "%Y-%m-%d %H:%M:%S",
            },
            "json": {
                "class": "pythonjsonlogger.jsonlogger.JsonFormatter",
                "format": "%(asctime)s %(levelname)s %(name)s %(message)s",
            },
        },
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "level": "INFO",
                "formatter": "basic",
                "stream": "ext://sys.stdout",
            },
            "file": {
                "class": "logging.FileHandler",
                "level": "DEBUG",
                "formatter": "verbose",
                "filename": f"malware_classifier-{START_TIME}.log",
                "encoding": "utf-8",
            },
        },
        "root": {
            "level": "INFO",
            "handlers": ["console", "file"],
        },
        "start_time": START_TIME
    }
    ```
* `get_logger(name)`
  Retrieve a logger for any module.

### Default behavior

* Logs are written to both the **console** and a **file**.
* Log files are automatically named as:
  `malware_classifier-YYYYMMDD-HHMMSS.log`
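
The naming pattern corresponds to the timestamp format `%Y%m%d-%H%M%S`. A minimal sketch, assuming the file name is simply joined onto the log directory (`build_log_path()` is a hypothetical helper that only illustrates the pattern, not a function provided by the package):

```python
import os
from datetime import datetime
from typing import Optional


def build_log_path(log_dir: str, start: Optional[datetime] = None) -> str:
    # Default to the current time when no start time is supplied.
    start = start or datetime.now()
    stamp = start.strftime("%Y%m%d-%H%M%S")
    return os.path.join(log_dir, f"malware_classifier-{stamp}.log")


print(build_log_path("./output/log", datetime(2025, 1, 2, 3, 4, 5)))
```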

### Environment variables

| Variable                 | Description                                  | Example         |
| ------------------------ | -------------------------------------------- | --------------- |
| `MALCLASS_LOG_LEVEL`     | Set log level                                | `DEBUG`, `INFO` |
| `MALCLASS_LOG_FILE`      | Full path for the log file                   | `/tmp/log.txt`  |
| `MALCLASS_LOG_DIR`       | Directory for log files                      | `./output/log`  |
| `MALCLASS_LOG_FORMATTER` | Choose formatter: `basic`, `verbose`, `json` | `verbose`       |

**Note:** JSON logging is supported through [`python-json-logger`](https://pypi.org/project/python-json-logger/), which is included as a dependency.
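
One way such environment overrides could be layered onto a `dictConfig`-style dictionary is sketched below. This is an assumption about the general approach, not the package's actual code: `apply_env_overrides()` is a hypothetical helper, and both the precedence of `MALCLASS_LOG_FILE` over `MALCLASS_LOG_DIR` and the choice to apply the formatter to the file handler are guesses.

```python
import copy
import os
from typing import Any, Dict, Mapping


def apply_env_overrides(config: Dict[str, Any], env: Mapping[str, str]) -> Dict[str, Any]:
    """Return a copy of `config` with MALCLASS_* overrides applied."""
    result = copy.deepcopy(config)  # never mutate the caller's dict
    file_handler = result["handlers"]["file"]

    level = env.get("MALCLASS_LOG_LEVEL")
    if level:
        result["root"]["level"] = level.upper()

    # Only accept formatter names that the config actually defines.
    formatter = env.get("MALCLASS_LOG_FORMATTER")
    if formatter in result["formatters"]:
        file_handler["formatter"] = formatter

    # Assumed precedence: an explicit file path wins over a directory.
    log_file = env.get("MALCLASS_LOG_FILE")
    log_dir = env.get("MALCLASS_LOG_DIR")
    if log_file:
        file_handler["filename"] = log_file
    elif log_dir:
        name = os.path.basename(file_handler["filename"])
        file_handler["filename"] = os.path.join(log_dir, name)
    return result


base = {
    "formatters": {"basic": {}, "verbose": {}, "json": {}},
    "handlers": {"file": {"formatter": "verbose", "filename": "malware_classifier.log"}},
    "root": {"level": "INFO"},
}
updated = apply_env_overrides(
    base, {"MALCLASS_LOG_LEVEL": "debug", "MALCLASS_LOG_DIR": "logs"}
)
print(updated["root"]["level"], updated["handlers"]["file"]["filename"])
```

In practice you would pass `os.environ` as `env`; taking a mapping as a parameter keeps the helper easy to test.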

### Example usage in modules

```python
from MalwareClassifier import setup_logging, get_logger

setup_logging()
logger = get_logger(__name__)
logger.info("This is an info message")
logger.debug("This is a debug message")
```

---

## Publishing

### Check version

```bash
pip install -e .
python -c "import MalwareClassifier; print(MalwareClassifier.__version__)"
```

### Publish to PyPI

```bash
git tag v0.1.0  # Replace with the version being released
git push origin v0.1.0
# A GitHub Actions workflow then publishes the package automatically.
```

### Publish to PyPI (Manual)

```bash
pip install build twine
python -m build
twine upload dist/*
```

---

## License

This project is licensed under the terms of the [MIT License](LICENSE).

---

## Contact

* **Homepage:** [https://github.com/cchunhuang/MalwareClassifier](https://github.com/cchunhuang/MalwareClassifier)
* **Issues:** [https://github.com/cchunhuang/MalwareClassifier/issues](https://github.com/cchunhuang/MalwareClassifier/issues)
