Skip to content

sii-research/COCLP

 
 

Repository files navigation

English|简体中文

Corpus Cleansing Pipeline: COCLP

This project provides a series of workflows and tools for data cleaning, developed based on the ComfyUI platform.


🧪 Local Deployment Guide

1️⃣ Install ComfyUI

This project depends on ComfyUI. Please first clone and install it:

git clone https://github.com/comfyanonymous/ComfyUI.git

Please install the relevant dependencies according to the ComfyUI project documentation.

2️⃣ Download Project Source Code

First, clone this project's code repository and initialize submodules:

git clone https://github.com/LikeSwim/COCLP.git
cd COCLP/
git submodule update --init --recursive

3️⃣ Install Dependencies

Enter the project directory and install dependencies:

cd COCLP/Corpus_Cleansing_Pipeline/
pip install -r requirements.txt

🔧 Additional Dependencies

Depends on the following plugins or modules:
MinerU
rgthree-comfy
ComfyUI-to-Python-Extension
Comfyui-LG_GroupExecutor
faster-whisper
Please install the corresponding dependencies according to each project's documentation.
If you encounter the following error when using faster-whisper:

Could not load library libcudnn_ops_infer.so.8
Unable to load any of {libcudnn_cnn.so.9.1.0, libcudnn_cnn.so.9.1, libcudnn_cnn.so.9, libcudnn_cnn.so}
libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory

It indicates that the system lacks the CUDA Deep Neural Network library (cuDNN).
Install cuDNN (example for apt-based systems):

sudo apt update
sudo apt install libcudnn8 libcudnn8-dev -y  

Check the local cuDNN corresponding .so files:

find / -name "libcudnn_ops.so*" 2>/dev/null

Specify the cuDNN path:

export LD_LIBRARY_PATH=/path/to/your/cudnn/lib:$LD_LIBRARY_PATH

▶️ Run the Project

Put the following folders into ComfyUI's custom_nodes folder:
rgthree-comfy
ComfyUI-to-Python-Extension
Comfyui-LG_GroupExecutor
Corpus_Cleansing_Pipeline
Then start ComfyUI:

cd ComfyUI/
python main.py

📌 Example Explanation

In the example folder, the following workflow examples are provided:
Image processing workflow
PDF document processing workflow
DOCX document processing workflow
File decompression and classification workflow
Data privacy processing workflow
These workflow files can be imported into ComfyUI and run directly.

💾 Convert Workflow to Python Script

In the ComfyUI interface:
Click the top-left menu: Workflow ➡ Save as Script
Save the workflow as an executable .py file
Run in the terminal:

python your_workflow_script.py

🧩 Acknowledgements

Thanks to the following projects or individuals for their support:
Thanks to the ComfyUI, MinerU, ComfyUI-to-Python-Extension, Comfyui-LG_GroupExecutor, and faster-whisper teams for providing platform support.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 100.0%