I’m looking for a script to extract meta tags from all pages of a website. The meta tags we are interested in are (a rough extraction sketch follows this list):
– Title
– Keywords
– Description
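To make the requirement concrete, here is a minimal Python sketch of the kind of per-page extraction we have in mind. The use of requests and BeautifulSoup is only an assumption for illustration; you are free to choose other languages or libraries.

    # Illustrative sketch only, not the final deliverable.
    import requests
    from bs4 import BeautifulSoup

    def extract_meta(url):
        # Fetch one page and pull out <title>, meta keywords and meta description.
        resp = requests.get(url, timeout=30)
        soup = BeautifulSoup(resp.text, "html.parser")

        title = soup.title.string.strip() if soup.title and soup.title.string else ""

        def meta_content(name):
            tag = soup.find("meta", attrs={"name": name})
            return tag.get("content", "").strip() if tag else ""

        # Row in the same order as the CSV output described below.
        return [url, title, meta_content("keywords"), meta_content("description")]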
The inputs to this script would be:
– Website’s URL (e.g. http://www.domain.com)
– Option to extract meta tags from all pages or from just the single URL specified above
– Directories to exclude (e.g. http://www.domain.com/directory/, http://www.domain.com/directory_2/)
The output of this script will be a CSV file containing the following fields, in this order, as sketched below:
URL, Title, Keywords, Description
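For clarity on the output format, a short sketch using Python’s standard csv module, assuming each crawled page yields a [URL, Title, Keywords, Description] row like the extraction sketch above:

    import csv

    def write_csv(path, rows):
        # rows: iterable of [URL, Title, Keywords, Description] lists.
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["URL", "Title", "Keywords", "Description"])
            writer.writerows(rows)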
Additional inputs (a sample command-line interface is sketched after this list):
– Delay between each GET request
– Total number of URLs to crawl – for testing purposes
– Option to just print (export) the URLs to the CSV file, without meta tags – for testing purposes
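To remove ambiguity about these inputs, here is a hypothetical command-line interface covering all of the options above; the flag names are illustrative only, not a requirement.

    import argparse

    def parse_args():
        # Hypothetical CLI; option names are examples, not a specification.
        p = argparse.ArgumentParser(description="Extract meta tags into a CSV file")
        p.add_argument("url", help="website URL, e.g. http://www.domain.com")
        p.add_argument("--single-page", action="store_true",
                       help="extract meta tags from this URL only, do not crawl")
        p.add_argument("--exclude", action="append", default=[],
                       help="directory URL to exclude (may be repeated)")
        p.add_argument("--delay", type=float, default=1.0,
                       help="delay in seconds between GET requests")
        p.add_argument("--max-urls", type=int, default=None,
                       help="stop after this many URLs (for testing)")
        p.add_argument("--urls-only", action="store_true",
                       help="only export the discovered URLs to the CSV (for testing)")
        p.add_argument("--output", default="metatags.csv", help="output CSV file")
        return p.parse_args()

A typical test run might then look like: script.py http://www.domain.com --delay 2 --max-urls 50 --urls-only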
The script must be able to cope with crawling about 2,000 pages on a website. If crawling and extracting these details for 2,000 URLs takes a very significant amount of time, the script will not be suitable for our use.
You can use any programming language (PHP, Perl, Python, etc.) or a shell scripting language.
We would like to avoid using a database unless you think it is absolutely necessary.
The target website’s server may restrict how many requests can be sent in a short time, so you will need to include a delay between each GET request.
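As an illustration of the delay requirement, here is a sketch of a polite breadth-first crawl loop that pauses between GET requests and respects the excluded-directory and maximum-URL options. It again assumes Python with requests and BeautifulSoup, and the meta-tag extraction from the first sketch would plug in where indicated.

    import time
    from collections import deque
    from urllib.parse import urljoin, urldefrag, urlparse

    import requests
    from bs4 import BeautifulSoup

    def crawl_urls(start_url, exclude=(), delay=1.0, max_urls=None):
        # Breadth-first crawl of one site, returning the URLs visited.
        site = urlparse(start_url).netloc
        seen, visited = set(), []
        queue = deque([start_url])
        while queue and (max_urls is None or len(visited) < max_urls):
            url = queue.popleft()
            # Skip duplicates and anything under an excluded directory.
            if url in seen or any(url.startswith(prefix) for prefix in exclude):
                continue
            seen.add(url)
            resp = requests.get(url, timeout=30)
            visited.append(url)  # meta-tag extraction would happen here
            # Queue up links that stay on the same site.
            for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
                link = urldefrag(urljoin(url, a["href"]))[0]
                if urlparse(link).netloc == site:
                    queue.append(link)
            time.sleep(delay)  # pause between GET requests to respect rate limits
        return visited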
We need this script as soon as possible.
The script must be fully tested against the requirements mentioned above.
Provide your best solution, mentioning the language you will use and the cost of the final script.
If you already have a similar script, you are welcome to show us a working example.
Apply via the following form.