ReadTheDocs Documentation

This notebook covers how to load content from HTML that was generated as part of a Read the Docs build.

For an example of this in the wild, see here.

This assumes that the HTML has already been scraped into a folder. This can be done by uncommenting and running the following command:

#!wget -r -A.html -P rtdocs https://langchain.readthedocs.io/en/latest/

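from langchain.document_loaders import ReadTheDocsLoader

# "rtdocs" is the folder the scraped HTML was saved into.
# The `features` keyword is forwarded to BeautifulSoup and selects its parser.
loader = ReadTheDocsLoader("rtdocs", features='html.parser')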

docs = loader.load()
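
To make the idea behind the loader more concrete, here is a minimal standard-library sketch of the same pattern: walk a folder of scraped pages, strip the markup, and keep the file path as the source. This is not the `ReadTheDocsLoader` implementation. The helper names `_TextExtractor` and `load_html_folder` are hypothetical, and plain dictionaries stand in for LangChain's document objects as an assumption for illustration.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def load_html_folder(path):
    """Hypothetical helper: one dict per .html file found under `path`."""
    docs = []
    for p in sorted(Path(path).rglob("*.html")):
        parser = _TextExtractor()
        parser.feed(p.read_text(encoding="utf-8", errors="ignore"))
        parser.close()
        docs.append(
            {
                "page_content": "\n".join(parser.chunks),
                "metadata": {"source": str(p)},
            }
        )
    return docs


# Try it on a throwaway folder standing in for "rtdocs".
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_text(
        "<html><body><h1>Welcome</h1><p>Hello docs</p>"
        "<script>var x = 1;</script></body></html>",
        encoding="utf-8",
    )
    sample_docs = load_html_folder(tmp)

print(sample_docs[0]["page_content"])
```

Each entry pairs the extracted text with the path of the file it came from, which is the kind of information you would typically want to keep alongside the content of each loaded page.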