Metadata-Version: 2.1
Name: PgsFile
Version: 0.5.1
Summary: This module simplifies Python package management, script execution, file handling, web scraping, and multimedia downloads. The module supports (LLM-based) NLP tasks such as OCR, tokenization, lemmatization, POS tagging, NER, ATE, dependency parsing, MDD, WSD, LIWC, MIP analysis and Chinese-English sentence alignment. It also generates word lists and plots data, which is useful for students of literature. Suited to scraping data, cleaning text, and analyzing language, it offers user-friendly tools to streamline workflows.
Home-page: https://mp.weixin.qq.com/s/lWMkYDWQMjBJNKY2vMYTpw
Author: Pan Guisheng
Author-email: panguisheng@sufe.edu.cn
License: Educational free
Classifier: Programming Language :: Python :: 3
Classifier: License :: Free For Educational Use
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: chardet
Requires-Dist: pandas
Requires-Dist: python-docx
Requires-Dist: pip
Requires-Dist: requests
Requires-Dist: fake-useragent
Requires-Dist: lxml
Requires-Dist: pimht
Requires-Dist: pysbd
Requires-Dist: nlpir-python
Requires-Dist: pillow
Requires-Dist: liwc

Purpose: This module is designed to make complex tasks accessible and convenient, even for beginners. By providing a unified set of tools, it simplifies the workflow for data collection, processing, and analysis. Whether you're scraping data from the web, cleaning text, or performing LLM-based NLP tasks, the module is meant to let you focus on your research rather than on technical hurdles.

Key Features:
1. **Web Scraping:** Easily scrape data from websites and download multimedia content.
2. **Package Management:** Install, uninstall, and manage Python packages with simple commands.
3. **Data Retrieval:** Extract data from various file formats like text, JSON, TSV, Excel, XML, and HTML (both online and offline).
4. **Data Storage:** Write and append data to text files, Excel, JSON, TMX, and JSON lines.
5. **File and Folder Processing:** Manage file paths, create directories, move or copy files, and search for files with specific keywords.
6. **Data Cleaning:** Clean text, handle punctuation, remove stopwords, convert Markdown strings into Python objects, and prepare data for analysis using corpora and dictionaries such as the CET-4/6 vocabulary and the BE21 and BNC-COCA word lists.
7. **NLP:** Perform OCR, word tokenization, lemmatization, POS tagging, NER, dependency parsing, ATE, MDD, WSD, LIWC, MIP analysis, and Chinese-English sentence alignment using prepared LLM prompts.
8. **Math Operations:** Format numbers, convert decimals to percentages, and validate data.
9. **Visualization:** Process images (e.g., make white pixels transparent, resize images) and manage fonts for rendering text.
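
To make features 6 and 8 concrete, here is a minimal sketch of stopword removal, word-list generation, and decimal-to-percentage formatting. It uses only the Python standard library. This description does not list PgsFile's own function names, so `clean_tokens`, `word_list`, `to_percent`, and the tiny `STOPWORDS` set are hypothetical stand-ins, not the module's API.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration only; these are NOT PgsFile functions.
STOPWORDS = {"the", "a", "of", "and", "is"}  # tiny illustrative stopword set


def clean_tokens(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def word_list(text):
    """Return (word, count) pairs sorted by descending count, then alphabetically."""
    counts = Counter(clean_tokens(text))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def to_percent(value, digits=1):
    """Format a decimal fraction such as 0.256 as a percentage string."""
    return f"{value * 100:.{digits}f}%"


sample = "The cat and the hat: a tale of the cat."
print(word_list(sample))   # [('cat', 2), ('hat', 1), ('tale', 1)]
print(to_percent(0.256))   # 25.6%
```

A real stopword list would be far larger; the small set here only keeps the example readable.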

Author: Pan Guisheng, a PhD student at the Graduate Institute of Interpretation and Translation of Shanghai International Studies University
Email: 895284504@qq.com
