Jieba: Effective Chinese Text Segmentation
├── Introduction
│ └── Overview of Jieba
├── Setting Up the Environment
│ ├── Installing Jieba
│ └── Importing Libraries
├── Core Functionalities of Jieba
│ ├── Tokenization
│ ├── Adding Custom Words
│ └── Keyword Extraction
├── Practical Examples
│ └── Implementing Jieba in Text Processing
└── Conclusion
└── Applications and Extensions
1. Introduction
Overview of Jieba
- Jieba is a widely used Chinese text segmentation tool, known for its ease of use and flexibility. It offers efficient tokenization and supports custom lexicons for domain-specific vocabulary.
2. Setting Up the Environment
Installing Jieba
pip install jieba
Importing Libraries
import jieba
3. Core Functionalities of Jieba
Tokenization
- Splitting Chinese text into individual words or terms.
text = "結巴斷詞是中文斷詞的Python開源工具。"  # "Jieba is an open-source Python tool for Chinese word segmentation."
tokens = jieba.cut(text)  # returns a generator of segments
print(list(tokens))
Adding Custom Words
- Integrating specialized or domain-specific terms into Jieba's dictionary.
jieba.add_word('結巴斷詞')
Keyword Extraction
- Identifying key terms within a body of text.