TF-IDF: Term Frequency-Inverse Document Frequency Analysis
├── Introduction
│   └── What is TF-IDF?
├── Setting Up the Environment
│   ├── Importing Libraries
│   └── Generating Sample Text Data
├── Implementing TF-IDF
│   ├── Data Preparation
│   ├── Applying TF-IDF
│   └── Understanding the TF-IDF Matrix
├── Visualization
│   └── Visualizing TF-IDF Scores
└── Conclusion
    └── Advantages and Applications

1. Introduction

What is TF-IDF?
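
TF-IDF (term frequency-inverse document frequency) gives a term a high score when it occurs often in one document but in few documents overall. As a minimal sketch, the pure-Python snippet below computes such scores by hand for a hypothetical three-document toy corpus (not the sample data used in the later sections). It assumes the smoothed variant idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalization, which matches scikit-learn's TfidfVectorizer defaults. The whitespace tokenization and the helper names idf and tfidf are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already lowercased and free of punctuation
docs = [
    "the dog slept",
    "the dog barked",
    "the cat slept",
]

tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents does each term appear?
df = Counter(term for tokens in tokenized for term in set(tokens))


def idf(term):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + df[term])) + 1


def tfidf(tokens):
    """Return L2-normalized TF-IDF weights for one tokenized document."""
    tf = Counter(tokens)  # raw term counts
    weights = {t: count * idf(t) for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()}


print(idf("the"))  # 1.0, because "the" occurs in every document

scores = tfidf(tokenized[1])  # "the dog barked"
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term:8s}{score:.3f}")
```

In this toy corpus "barked" appears in only one document, so it outranks "dog" (two documents) and "the" (all three) within the second document.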

2. Setting Up the Environment

Importing Libraries

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

Generating Sample Text Data

# Sample text data
documents = [
    "The quick brown fox jumped over the lazy dog.",
    "The dog slept under the veranda.",
    "John and Mary went to the market to buy bread and jam.",
    "The lazy dog woke up and chased the quick brown fox."
]

3. Implementing TF-IDF

Data Preparation
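
With default settings, TfidfVectorizer handles basic preparation itself: it lowercases the text and keeps tokens of two or more word characters, which drops punctuation. The sketch below, offered as one way to check this rather than a required step, uses build_analyzer() to show the tokens the vectorizer would see for the first sample document. The stop_words="english" variant is an optional setting that the code in this tutorial does not use.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

sentence = "The quick brown fox jumped over the lazy dog."

# Default analyzer: lowercase, then split on the default token pattern
default_analyzer = TfidfVectorizer().build_analyzer()
tokens = default_analyzer(sentence)
print(tokens)
# ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']

# Optional variant: also drop scikit-learn's built-in English stop words
filtered_analyzer = TfidfVectorizer(stop_words="english").build_analyzer()
filtered_tokens = filtered_analyzer(sentence)
print(filtered_tokens)
```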

Applying TF-IDF

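# Initialize the vectorizer with scikit-learn's defaults
# (lowercasing, smoothed IDF, L2-normalized rows)
vectorizer = TfidfVectorizer()

# Learn the vocabulary and IDF weights, then transform the documents
# into a sparse matrix with one row per document and one column per term
tfidf_matrix = vectorizer.fit_transform(documents)

# Convert to a dense DataFrame with the vocabulary terms as column labels
tfidf_df = pd.DataFrame(
    tfidf_matrix.toarray(),
    columns=vectorizer.get_feature_names_out(),
)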

Understanding the TF-IDF Matrix

4. Visualization

Visualizing TF-IDF Scores