What is Web Scraping?
Web scraping is the process of extracting data from a website. The data is mostly unstructured, so it is filtered and then exported to a CSV or Excel file in a readable format.
Is it legal?
As long as we don't harm the website (for example, by flooding it with requests) and we respect its terms of service, scraping publicly available data is generally acceptable. We can also check whether a particular website allows scraping by looking at its robots.txt file.
While web scraping has numerous real-world applications, I will demonstrate how it can help with a job search. I'm going to scrape jobs on indeed.com.
The very first thing to do is retrieve the robots.txt file and check for permission. Just go to https://www.indeed.com/robots.txt
As you can see, the User-agent line has an asterisk (*), which means the rules that follow apply to every crawler. Any path that is not listed under a Disallow rule is open to scraping.
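The same check can be done in code instead of by eye. Below is a minimal sketch, assuming we only care about the `User-agent: *` group and simple path-prefix matching. It ignores `Allow` rules and wildcard patterns, so treat it as an illustration rather than a complete robots.txt parser. The class name `RobotsCheck` and the sample paths are made up for this example and are not taken from Indeed's actual file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Minimal robots.txt check limited to the "User-agent: *" group (illustrative only). */
final class RobotsCheck {

    /** Returns the Disallow path prefixes that apply to every crawler. */
    static List<String> disallowedForAll(String robotsTxt) {
        List<String> disallowed = new ArrayList<>();
        boolean inWildcardGroup = false;
        boolean prevWasUserAgent = false;
        for (String rawLine : robotsTxt.split("\\R")) {
            // Drop trailing comments and surrounding whitespace.
            String line = rawLine.replaceFirst("#.*$", "").trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                boolean isWildcard = value.equals("*");
                // Consecutive User-agent lines share one group of rules.
                inWildcardGroup = prevWasUserAgent ? (inWildcardGroup || isWildcard) : isWildcard;
                prevWasUserAgent = true;
            } else {
                prevWasUserAgent = false;
                if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
        return disallowed;
    }

    /** True when no wildcard-group Disallow prefix matches the given path. */
    static boolean isAllowed(String robotsTxt, String path) {
        for (String prefix : disallowedForAll(robotsTxt)) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical robots.txt content, not the real file from any site.
        String sample = "User-agent: *\nDisallow: /private/\n";
        System.out.println(isAllowed(sample, "/jobs"));
        System.out.println(isAllowed(sample, "/private/reports"));
    }
}
```

In practice, the file's text would first be downloaded from the robots.txt URL and then passed to `isAllowed` along with the path we plan to scrape.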
I'm going to use the power of Java and Selenium to do the job of scraping and storing the data in a spreadsheet.
1. Launch the website
2. Create an Excel file in a local folder to store the scraped data
3. Pass relevant values for job search
4. Determine the number of pages to scrape, then go over every job card on each page to collect the job role, salary, company name, and job type, and store them in a hash map
5. Write the data to the Excel file
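Steps 4 and 5 can be sketched without any browser automation to show the data handling on its own. This is a minimal sketch under stated assumptions, not the exact code used for this post. It writes CSV text instead of an Excel file so that it needs no extra libraries, it uses a `LinkedHashMap` (a hash map that remembers insertion order) so the columns stay in a fixed order, and the class name `JobExport`, the figure of 15 jobs per page, and the sample job values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative helpers for paging, storing job cards, and exporting them as CSV. */
final class JobExport {

    /** Number of result pages needed to cover totalJobs at jobsPerPage each. */
    static int pageCount(int totalJobs, int jobsPerPage) {
        if (jobsPerPage <= 0) {
            throw new IllegalArgumentException("jobsPerPage must be positive");
        }
        return (totalJobs + jobsPerPage - 1) / jobsPerPage; // ceiling division
    }

    /** Stores one job card in an insertion-ordered hash map. */
    static Map<String, String> jobCard(String role, String salary, String company, String type) {
        Map<String, String> card = new LinkedHashMap<>();
        card.put("Job Role", role);
        card.put("Salary", salary);
        card.put("Company", company);
        card.put("Job Type", type);
        return card;
    }

    /** Quotes a CSV field when it contains a comma, a quote, or a line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Builds CSV text: a header row from the first card's keys, then one row per card. */
    static String toCsv(List<Map<String, String>> cards) {
        if (cards.isEmpty()) {
            return "";
        }
        StringBuilder csv = new StringBuilder();
        csv.append(String.join(",", cards.get(0).keySet())).append('\n');
        for (Map<String, String> card : cards) {
            List<String> cells = new ArrayList<>();
            for (String value : card.values()) {
                cells.add(escape(value));
            }
            csv.append(String.join(",", cells)).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for text read from a job card.
        List<Map<String, String>> cards = new ArrayList<>();
        cards.add(jobCard("QA Engineer", "$90,000 a year", "Example Corp", "Full-time"));
        System.out.println("Pages to scrape: " + pageCount(45, 15));
        System.out.print(toCsv(cards));
    }
}
```

In the real flow, the arguments to `jobCard` would come from the text of the elements Selenium finds on each job card, and the rows would go into the Excel file through a spreadsheet library rather than into a CSV string.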
Here are the console output and the Excel file with the scraped data.
It takes about 15 seconds to scrape about 45 jobs and save them in Excel.
Happy coding and good luck with the job hunt!!!