An offline R-based web scraper built to extract key content from the term_project.html file of the Neptune Technologies Job Application Platform.
This scraper analyzes the HTML structure of a previously built static webpage, term_project.html, and does not require an internet connection.
The R script parses a local HTML file and extracts:
- Page title
- All headings (`<h1>`–`<h6>`)
- Paragraph text
- Links (anchor `<a>` tags)
- Specific content from targeted sections using CSS selectors
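
The extraction steps listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the helper name `scrape_local_page` and its `section_selector` argument are hypothetical, and it assumes rvest 1.0 or later is installed.

```r
library(rvest)  # re-exports read_html() from xml2

# Hypothetical helper (not from the original script): parse a local HTML
# file and collect the title, headings, paragraphs, links, and an
# optional CSS-selected section.
scrape_local_page <- function(path, section_selector = NULL) {
  page <- read_html(path)  # reads from disk, so no network access is needed

  # Page title from the <title> element (NA if the page has none)
  title <- html_text(html_element(page, "title"), trim = TRUE)

  # All headings <h1> to <h6>, returned in document order
  heading_nodes <- html_elements(page, "h1, h2, h3, h4, h5, h6")
  headings <- data.frame(
    level = html_name(heading_nodes),
    text = html_text2(heading_nodes),
    stringsAsFactors = FALSE
  )

  # Paragraph text
  paragraphs <- html_text2(html_elements(page, "p"))

  # Links: visible text plus the href attribute of each <a> tag
  link_nodes <- html_elements(page, "a")
  links <- data.frame(
    text = html_text2(link_nodes),
    href = html_attr(link_nodes, "href"),
    stringsAsFactors = FALSE
  )

  # Targeted content: text of whatever the caller's CSS selector matches
  section <- character(0)
  if (!is.null(section_selector)) {
    section <- html_text2(html_elements(page, section_selector))
  }

  list(title = title, headings = headings, paragraphs = paragraphs,
       links = links, section = section)
}
```

With the file in the working directory, a call might look like `scrape_local_page("term_project.html", section_selector = "#jobs")`, where `#jobs` stands in for whichever selector matches the section of interest.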
- `rvest` – for web scraping
- `xml2` – to read and parse HTML
- `dplyr` – for data wrangling
- Base R functions
- Save your `term_project.html` file locally in your R working directory
- Run the scraper script in RStudio or your preferred IDE
- View the output in the console or export to `.csv` or `.txt` for further analysis
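
For the export step, one possible approach uses only base R. The sketch below assumes the scraped links are held in a data frame and the paragraphs in a character vector. The helper name `export_results` and the output file names `links.csv` and `paragraphs.txt` are illustrative choices, not part of the original script.

```r
# Hypothetical helper (not from the original script): write scraped
# results to disk using base R only.
export_results <- function(links, paragraphs, out_dir = ".") {
  csv_path <- file.path(out_dir, "links.csv")       # assumed file name
  txt_path <- file.path(out_dir, "paragraphs.txt")  # assumed file name

  # Tabular data goes to CSV, without the row-number column
  write.csv(links, csv_path, row.names = FALSE)

  # Free text goes to a plain .txt file, one paragraph per line
  writeLines(paragraphs, txt_path)

  # Return the written paths invisibly so callers can reuse them
  invisible(c(csv = csv_path, txt = txt_path))
}
```

Calling `export_results(links, paragraphs)` writes both files to the current working directory. Pass `out_dir` to send them elsewhere.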