Extraction

HTML → Markdown → Chunks

Extracted: 9,953 pages
Empty: 1,213 pages
Total Chunks: 71,351
Total Pages: 12,068
Avg Chunks/Page: 5.9
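
As a quick sanity check on the figures above, the sketch below recomputes the average from the reported totals. It assumes the average is total chunks divided by total pages rather than by extracted pages; that reading matches the reported 5.9, but the source does not state the formula. The variable names are illustrative only. The sketch also shows that the extracted and empty counts do not add up to the page total; the source does not say what the remaining pages are.

```python
# Figures copied from the summary above.
extracted_pages = 9_953
empty_pages = 1_213
total_chunks = 71_351
total_pages = 12_068

# Assumption: the reported average uses total pages as the denominator.
avg_chunks_per_page = total_chunks / total_pages

# Pages counted in the total but in neither the "Extracted" nor the "Empty" figure.
unaccounted_pages = total_pages - (extracted_pages + empty_pages)

print(f"avg chunks/page: {avg_chunks_per_page:.1f}")
print(f"pages not listed as extracted or empty: {unaccounted_pages}")
```

Dividing by extracted pages instead would give roughly 7.2, so the total-page denominator appears to be the one in use.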

Chunks by Domain (top 30)