Extraction
HTML → Markdown → Chunks
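
The source names the three stages but not how they are implemented, so the following is only a minimal sketch of one way the pipeline could look. It assumes Python's standard-library `html.parser`, handles only `<h1>` to `<h3>` and `<p>`, and splits chunks at headings or at a size limit. The names `MarkdownExtractor`, `html_to_markdown`, `chunk_markdown`, and the `max_chars` parameter are hypothetical and do not come from the original.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Hypothetical converter: handles only <h1>-<h3> and <p>."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished Markdown blocks
        self.prefix = None  # None while outside a handled tag
        self.text = []      # text pieces of the block being built

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.prefix, self.text = self.HEADINGS[tag], []
        elif tag == "p":
            self.prefix, self.text = "", []

    def handle_data(self, data):
        if self.prefix is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.prefix is not None and (tag in self.HEADINGS or tag == "p"):
            # Collapse runs of whitespace before emitting the block.
            content = " ".join("".join(self.text).split())
            if content:
                self.blocks.append(self.prefix + content)
            self.prefix = None


def html_to_markdown(html):
    """Stage 1 -> 2: turn HTML into Markdown blocks joined by blank lines."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


def chunk_markdown(markdown, max_chars=500):
    """Stage 2 -> 3: start a new chunk at each heading or when max_chars would be exceeded."""
    chunks, current = [], []
    for block in markdown.split("\n\n"):
        if not block:
            continue
        size = sum(len(b) + 2 for b in current)
        is_heading = block.startswith(("# ", "## ", "### "))
        if current and (is_heading or size + len(block) > max_chars):
            chunks.append("\n\n".join(current))
            current = []
        current.append(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    page = "<h1>Title</h1><p>First  paragraph.</p><h2>Section</h2><p>Second.</p>"
    for chunk in chunk_markdown(html_to_markdown(page)):
        print(repr(chunk))
```

Splitting at headings keeps each chunk aligned with a section of the page. A real pipeline would likely need to handle lists, links, tables, and code blocks as well, which this sketch ignores.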