Offline-Web-Scraper

An offline R-based web scraper built to extract key content from the term_project.html file of the Neptune Technologies Job Application Platform.

Project Purpose

This scraper analyzes the HTML structure of a previously built static webpage, term_project.html, and does not require an internet connection.

What It Does

The R script parses a local HTML file and extracts:

Page title
All headings (<h1>–<h6>)
Paragraph text
Links (anchor <a> tags)
Specific content from targeted sections using CSS selectors
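
The extraction steps listed above could be sketched roughly as follows. This is an illustrative sketch, not the contents of offline_webscraper.R: the helper name extract_page_content, the stand-in HTML, and the "#openings" selector are all assumptions made so the example runs without term_project.html. It assumes rvest 1.0.0 or later.

```r
library(rvest)

# Hypothetical helper (not taken from offline_webscraper.R):
# parse a local HTML file and collect the pieces listed above.
extract_page_content <- function(path) {
  page  <- xml2::read_html(path)      # reads from disk; no network access
  links <- html_elements(page, "a")

  list(
    title      = html_text(html_element(page, "title"), trim = TRUE),
    headings   = html_text2(html_elements(page, "h1, h2, h3, h4, h5, h6")),
    paragraphs = html_text2(html_elements(page, "p")),
    links      = data.frame(
      text = html_text2(links),
      href = html_attr(links, "href"),
      stringsAsFactors = FALSE
    )
  )
}

# Stand-in document so the sketch is self-contained.
sample_html <- '<html><head><title>Sample Page</title></head><body>
<h1>Careers</h1>
<section id="openings"><h2>Open Roles</h2><p>Data Analyst</p></section>
<p>Apply <a href="apply.html">here</a>.</p>
</body></html>'
sample_path <- tempfile(fileext = ".html")
writeLines(sample_html, sample_path)

content <- extract_page_content(sample_path)

# Targeted section via a CSS selector (the "#openings" id is made up).
openings <- html_text2(
  html_elements(xml2::read_html(sample_path), "#openings p")
)

print(content$headings)
print(content$links)
```

To try this against the real page, pass "term_project.html" to the helper instead of sample_path and replace "#openings" with a selector that matches the section of interest.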

📦 Tools & Libraries Used

rvest – for web scraping
xml2 – to read and parse HTML
dplyr – for data wrangling
Base R functions
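
As a rough illustration of where dplyr might fit, the sketch below tidies a table of extracted links. The links data is invented for the example, and the README does not say which wrangling steps the actual script performs, so treat this as one plausible use rather than a description of the script.

```r
library(dplyr)

# Made-up link table standing in for scraper output.
links <- tibble(
  text = c("Apply", "Home", "Apply", "Contact"),
  href = c("apply.html", "index.html", "apply.html", NA)
)

clean_links <- links %>%
  filter(!is.na(href)) %>%   # drop anchors that have no href attribute
  distinct() %>%             # remove exact duplicate rows
  arrange(text)              # sort by link text

print(clean_links)
```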

📝 How to Use

Save your term_project.html file locally in your R working directory
Run the scraper script (offline_webscraper.R) in RStudio or your preferred IDE
View the output in the console or export to .csv or .txt for further analysis
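
The export step could look something like the sketch below, which uses only base R. The headings and paragraphs objects are placeholder data, and the output file names are assumptions; the actual script may name or structure its output differently.

```r
# Placeholder results; in practice these would come from the scraper.
headings <- data.frame(
  tag  = c("h1", "h2"),
  text = c("Careers", "Open Roles"),
  stringsAsFactors = FALSE
)
paragraphs <- c("Data Analyst", "Apply here.")

# tempdir() keeps the sketch side-effect free; swap in a project folder.
out_dir  <- tempdir()
csv_path <- file.path(out_dir, "headings.csv")
txt_path <- file.path(out_dir, "paragraphs.txt")

write.csv(headings, csv_path, row.names = FALSE)  # tabular data to .csv
writeLines(paragraphs, txt_path)                  # one paragraph per line

print(headings)  # quick look in the console
```

Tabular results such as headings or links suit .csv, while free text such as paragraphs is often easier to read back from a plain .txt file.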
