Project author: jacksonpradolima

Project description:
GSP (Generalized Sequence Pattern) algorithm in Python
Primary language: Python
Repository: git://github.com/jacksonpradolima/gsp-py.git
Created: 2017-10-26T18:43:24Z
Project community: https://github.com/jacksonpradolima/gsp-py

License: MIT License


GSP-Py

GSP-Py: A Python-powered library to mine sequential patterns in large datasets, based on the robust Generalized Sequence Pattern (GSP) algorithm. Ideal for market basket analysis, temporal mining, and user journey discovery.

[!IMPORTANT]
GSP-Py is compatible with Python 3.8 and later versions!


📚 Table of Contents

  1. 🔍 What is GSP?
  2. 🔧 Requirements
  3. 🚀 Installation
  4. 🛠️ Developer Installation
  5. 💡 Usage
  6. 🌟 Planned Features
  7. 🤝 Contributing
  8. 📝 License
  9. 📖 Citation

🔍 What is GSP?

The Generalized Sequential Pattern (GSP) algorithm is a sequential pattern mining technique based on Apriori principles. Using support thresholds, GSP identifies frequent sequences of items in transaction datasets.

Key Features:

  • Support-based pruning: Only retains sequences that meet the minimum support threshold.
  • Candidate generation: Iteratively generates candidate sequences of increasing length.
  • General-purpose: Useful in retail, web analytics, social networks, temporal sequence mining, and more.

For example:

  • In a shopping dataset, GSP can identify patterns like “Customers who buy bread and milk often purchase diapers next.”
  • In a website clickstream, GSP might find patterns like “Users visit A, then go to B, and later proceed to C.”
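
The pruning and candidate-generation steps listed above can be sketched as a small level-wise loop. This is a simplified toy, not GSP-Py's implementation: the names `contains` and `mine` are hypothetical, candidates are grown by appending one frequent item rather than by the join step of the full algorithm, and a pattern is assumed to match only when its items appear as a contiguous run inside a transaction.

```python
from collections import Counter
from itertools import product


def contains(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` as a contiguous run (assumption)."""
    n = len(pattern)
    return any(tuple(sequence[i:i + n]) == pattern
               for i in range(len(sequence) - n + 1))


def mine(transactions, min_support):
    """Toy level-wise miner: grow patterns one item at a time, pruning by support."""
    min_count = min_support * len(transactions)
    # Level 1: count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {(item,): c for item, c in counts.items() if c >= min_count}
    items = [pattern[0] for pattern in frequent]
    levels = []
    while frequent:
        levels.append(frequent)
        # Candidate generation: extend every frequent pattern by one frequent item.
        candidates = {p + (i,) for p, i in product(frequent, items)}
        # Support-based pruning: keep only candidates seen in enough transactions.
        frequent = {}
        for candidate in candidates:
            count = sum(contains(t, candidate) for t in transactions)
            if count >= min_count:
                frequent[candidate] = count
    return levels


# Hypothetical clickstream sessions, mined at 50% support.
sessions = [['A', 'B', 'C'], ['A', 'B'], ['B', 'C']]
print(mine(sessions, 0.5))
```

Each element of the returned list holds the frequent patterns of one length, so the loop stops as soon as a level produces no survivors.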

🔧 Requirements

You will need Python installed on your system. On Debian- or Ubuntu-based Linux systems, you can install Python with:

    sudo apt install python3

GSP-Py's package dependencies are installed automatically when you install it with pip.


🚀 Installation

GSP-Py can be easily installed from either the repository or PyPI.

Option 1: Clone the Repository

To manually clone the repository and set up the environment:

    git clone https://github.com/jacksonpradolima/gsp-py.git
    cd gsp-py

Then install Rye as described in the Developer Installation section and run:

    rye sync

Option 2: Install via pip

Alternatively, install GSP-Py from PyPI with:

    pip install gsppy

🛠️ Developer Installation

This project uses Rye for managing dependencies, running scripts, and setting up the environment. Follow these steps to install and set up Rye for this project:

1. Install Rye

Run the following command to install Rye:

    curl -sSf https://rye.astral.sh/get | bash

If the ~/.rye/bin directory is not in your PATH, add the following line to your shell configuration file (e.g., ~/.bashrc, ~/.zshrc, etc.):

    export PATH="$HOME/.rye/bin:$PATH"

Reload your shell configuration file:

    source ~/.bashrc  # or: source ~/.zshrc

2. Set Up the Project Environment

To configure the project environment and install its dependencies, run:

    rye sync

3. Use Rye Scripts

Once the environment is set up, you can run the following commands to simplify project tasks:

  • Run tests (in parallel): rye run test
  • Format code: rye run format
  • Lint code: rye run lint
  • Type-check: rye run typecheck
  • Add new dependencies: rye add <package-name>
    • Add new dependency to dev dependencies: rye add --dev <package-name>

Notes

  • Rye automatically reads dependencies and scripts from the pyproject.toml file.
  • No need for requirements.txt, as Rye manages all dependencies!

💡 Usage

The library is designed to be easy to use and integrate with your own projects. Below is an example of how you can
configure and run GSP-Py.

Example Input Data

The input to the algorithm is a list of transactions, where each transaction is an ordered list of items:

    transactions = [
        ['Bread', 'Milk'],
        ['Bread', 'Diaper', 'Beer', 'Eggs'],
        ['Milk', 'Diaper', 'Beer', 'Coke'],
        ['Bread', 'Milk', 'Diaper', 'Beer'],
        ['Bread', 'Milk', 'Diaper', 'Coke']
    ]
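
Real data often arrives as a flat event log rather than as ready-made lists. A minimal sketch of reshaping such a log into the structure above is shown here; the `events` records and their `(customer_id, position, item)` layout are hypothetical and not something GSP-Py prescribes.

```python
from collections import defaultdict

# Hypothetical raw event log: (customer_id, position_in_basket, item).
events = [
    ("c2", 2, "Beer"),
    ("c1", 1, "Bread"),
    ("c2", 1, "Diaper"),
    ("c1", 2, "Milk"),
]

# Sorting the tuples orders events by customer, then by position,
# so each customer's items are appended in sequence order.
grouped = defaultdict(list)
for customer, _position, item in sorted(events):
    grouped[customer].append(item)

# One ordered list of items per customer.
transactions = list(grouped.values())
print(transactions)
```

The item order inside each list matters, since sequential pattern mining treats `['Bread', 'Milk']` and `['Milk', 'Bread']` as different sequences.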

Importing and Initializing the GSP Algorithm

Import the GSP class from the gsppy package and call the search method to find frequent patterns with a support
threshold (e.g., 0.3):

    from gsppy.gsp import GSP

    # Example transactions: customer purchases
    transactions = [
        ['Bread', 'Milk'],                    # Transaction 1
        ['Bread', 'Diaper', 'Beer', 'Eggs'],  # Transaction 2
        ['Milk', 'Diaper', 'Beer', 'Coke'],   # Transaction 3
        ['Bread', 'Milk', 'Diaper', 'Beer'],  # Transaction 4
        ['Bread', 'Milk', 'Diaper', 'Coke']   # Transaction 5
    ]

    # Set minimum support threshold (30%)
    min_support = 0.3

    # Find frequent patterns
    result = GSP(transactions).search(min_support)

    # Output the results
    print(result)

Output

The search method returns a list of dictionaries, one per pattern length. Each dictionary maps a pattern (a tuple of items) to the number of transactions that contain it.

Sample Output:

    [
        {('Bread',): 4, ('Milk',): 4, ('Diaper',): 4, ('Beer',): 3, ('Coke',): 2},
        {('Bread', 'Milk'): 3, ('Milk', 'Diaper'): 3, ('Diaper', 'Beer'): 3},
        {('Bread', 'Milk', 'Diaper'): 2, ('Milk', 'Diaper', 'Beer'): 2}
    ]

  • The first dictionary contains single-item sequences with their frequencies (e.g., ('Bread',): 4 means “Bread” appears in 4 transactions).
  • The second dictionary contains 2-item sequential patterns (e.g., ('Bread', 'Milk'): 3 means the sequence “Bread → Milk” appears in 3 transactions).
  • The third dictionary contains 3-item sequential patterns (e.g., ('Bread', 'Milk', 'Diaper'): 2 means the sequence “Bread → Milk → Diaper” appears in 2 transactions).

[!NOTE]
The support of a sequence is calculated as the fraction of transactions containing the sequence, e.g.,
[Bread, Milk] appears in 3 out of 5 transactions → Support = 3 / 5 = 0.6 (60%).
This insight helps identify frequently occurring sequential patterns in datasets, such as shopping trends or user
behavior.
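
The returned values are raw counts, so the fractional support described in the note has to be derived by dividing by the number of transactions. The sketch below does that for the sample output; `to_support` is a hypothetical helper written for this example, not part of GSP-Py.

```python
# Sample output copied from above; 5 is the number of input transactions.
result = [
    {('Bread',): 4, ('Milk',): 4, ('Diaper',): 4, ('Beer',): 3, ('Coke',): 2},
    {('Bread', 'Milk'): 3, ('Milk', 'Diaper'): 3, ('Diaper', 'Beer'): 3},
    {('Bread', 'Milk', 'Diaper'): 2, ('Milk', 'Diaper', 'Beer'): 2},
]
n_transactions = 5


def to_support(levels, total):
    """Flatten the per-length count dictionaries into {pattern: fraction}."""
    return {
        pattern: count / total
        for patterns in levels
        for pattern, count in patterns.items()
    }


support = to_support(result, n_transactions)
for pattern, fraction in support.items():
    # Render each pattern as "A -> B" followed by its support as a percentage.
    print(f"{' -> '.join(pattern):<26}{fraction:.0%}")
```

Every pattern in the sample sits at or above the 0.3 threshold passed to search, which is a quick sanity check when trying a new min_support value.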

[!TIP]
For more complex examples, see the example scripts in the gsppy/tests folder.


🌟 Planned Features

We are actively working to improve GSP-Py. Here are some exciting features planned for future releases:

  1. Custom Filters for Candidate Pruning:

    • Enable users to define their own pruning logic during the mining process.
  2. Support for Preprocessing and Postprocessing:

    • Add hooks to allow users to transform datasets before mining and customize the output results.
  3. Support for Time-Constrained Pattern Mining:

    • Extend GSP-Py to handle temporal datasets by allowing users to define time constraints (e.g., maximum time gaps
      between events, time windows) during the sequence mining process.
    • Enable candidate pruning and support calculations based on these temporal constraints.
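
To make the idea of a maximum time gap concrete, here is a purely illustrative sketch of how such a constraint could be checked on timestamped events. Nothing here is GSP-Py API: the function `occurs_within_gap`, the `(timestamp, item)` event layout, and the `max_gap` parameter are all hypothetical, since the feature is only planned.

```python
def occurs_within_gap(events, pattern, max_gap, start=0, last_time=None):
    """Check whether `pattern` occurs in order in `events`, with at most
    `max_gap` time units between consecutive matched items.

    `events` is a list of (timestamp, item) pairs sorted by timestamp.
    """
    if not pattern:
        return True  # every item of the pattern has been matched
    for i in range(start, len(events)):
        time, item = events[i]
        if last_time is not None and time - last_time > max_gap:
            break  # events are sorted, so later ones are even further away
        if item == pattern[0] and occurs_within_gap(
            events, pattern[1:], max_gap, i + 1, time
        ):
            return True
    return False


# Hypothetical session: item 'B' follows 'A' quickly, 'C' arrives much later.
session = [(0, 'A'), (1, 'B'), (10, 'C')]
print(occurs_within_gap(session, ('A', 'B'), max_gap=2))
print(occurs_within_gap(session, ('A', 'C'), max_gap=5))
```

The recursion backtracks over repeated items, so an early occurrence that violates the gap does not hide a later one that satisfies it.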

Want to contribute or suggest an improvement? Open a discussion or issue!


🤝 Contributing

We welcome contributions from the community! If you’d like to help improve GSP-Py, read
our CONTRIBUTING.md guide to get started.

Development dependencies (e.g., testing and linting tools) are automatically managed using Rye. To install
these dependencies and set up the environment, run:

    rye sync

After syncing, you can run the following scripts using Rye for development tasks:

  • Run tests (in parallel): rye run test
  • Lint code: rye run lint
  • Type-check: rye run typecheck
  • Format code: rye run format

General Steps:

  1. Fork the repository.
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Commit your changes: git commit -m "Add my feature."
  4. Push to your branch: git push origin feature/my-feature
  5. Submit a pull request to the main repository!

Looking for ideas? Check out our Planned Features section.


📝 License

This project is licensed under the terms of the MIT License. For more details, refer to the LICENSE file.


📖 Citation

If GSP-Py contributed to your research or project that led to a publication, we kindly ask that you cite it as follows:

    @misc{pradolima_gsppy,
      author = {Prado Lima, Jackson Antonio do},
      title  = {{GSP-Py - Generalized Sequence Pattern algorithm in Python}},
      month  = Dec,
      year   = 2025,
      doi    = {10.5281/zenodo.3333987},
      url    = {https://doi.org/10.5281/zenodo.3333987}
    }