Metadata-Version: 2.1
Name: sword2vec
Version: 3.2.5
Summary: A simple skip-gram word2vec implementation
Home-page: https://github.com/aziyan99/sword2vec
Author: Raja Azian
Author-email: rajaazian08@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: Cython (==0.29.35)
Requires-Dist: joblib (==1.2.0)
Requires-Dist: nltk (==3.8.1)
Requires-Dist: numpy (==1.24.3)
Provides-Extra: dev
Requires-Dist: pytest (>=7.0) ; extra == 'dev'
Requires-Dist: twine (>=4.0.2) ; extra == 'dev'

# sword2vec

The sword2vec package provides the `SkipGramWord2Vec` class, a proof-of-concept implementation for academic research in natural language processing. It demonstrates the Skip-Gram Word2Vec model, a widely studied technique for learning word embeddings.

Word embeddings, which are dense vector representations of words, play a crucial role in numerous NLP tasks, including text classification, sentiment analysis, and machine translation. The class showcases the training process of the Skip-Gram Word2Vec model, allowing researchers to experiment and validate their ideas in a controlled environment.

Key functionalities of the class include:

1. Training: The `train` method trains the Skip-Gram Word2Vec model on a custom text corpus. It handles vocabulary construction, embedding learning, and convergence monitoring. Hyperparameters such as window size, learning rate, embedding dimension, and the number of training epochs can be tuned to suit the research objective.

2. Prediction: The `predict` method enables researchers to explore the model's predictive capabilities by obtaining the most probable words given a target word. This functionality facilitates analysis of the model's ability to capture semantic relationships and contextual similarities between words.

3. Word Similarity: The `search_similar_words` method returns the words most similar to a given target word, ranked by cosine similarity between their embeddings. This helps evaluate how well the learned embeddings capture semantic similarity.

4. Saving and Loading Models: The class offers methods for saving trained models (`save_model` and `save_compressed_model`) and loading them for further analysis (`load_model` and `load_compressed_model`). This allows researchers to save their trained models, reproduce results, and conduct comparative studies.
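
The ideas behind training and word similarity can be illustrated with a minimal, standard-library-only sketch. This is not the package's own code and does not use the `SkipGramWord2Vec` API; the function names `skipgram_pairs`, `cosine_similarity`, and `most_similar`, as well as the toy embeddings, are hypothetical and exist only for illustration. It shows how a window size determines the (target, context) pairs a skip-gram model learns from, and how cosine similarity ranks words once embeddings exist.

```python
import math


def skipgram_pairs(tokens, window_size=2):
    """Return (target, context) pairs for tokens within window_size of each target."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, top_k=3):
    """Rank every other word in `embeddings` by cosine similarity to `word`."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    print(skipgram_pairs(["the", "cat", "sat"], window_size=1))
    toy_embeddings = {
        "king": [1.0, 0.0],
        "queen": [0.9, 0.1],
        "apple": [0.0, 1.0],
    }
    print(most_similar("king", toy_embeddings, top_k=2))
```

A larger window yields more pairs per target word, which is why window size is one of the hyperparameters worth tuning.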

By providing an accessible and customizable implementation, the `SkipGramWord2Vec` class lets researchers explore and validate ideas in word embedding research. It can also be used to demonstrate the Skip-Gram Word2Vec model in academic projects related to natural language processing.
