An internet sourcer friend of mine and I were talking, and he told me that his employer, a large recruiting and RPO company, routinely uses a data extraction tool on some of the biggest resume job boards to pull down tens of thousands of resumes every night. Now, I have no way of confirming this, so I'll keep company names out of the conversation, but where is the line in the sand for these data extraction tools?
For those of you who aren't familiar with DE tools, imagine if you could program a piece of software to do the following:
1) Log into your favorite job board
2) Perform specific searches for candidate resumes
3) Follow the link to each individual's resume
4) Compare that resume/name/address/phone against other resumes you already have in your ATS
5) If it's not a duplicate, create a new candidate file and save it
It's a way of automating the whole process.
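To make steps 4 and 5 concrete, here is a minimal Python sketch of the duplicate check only. It assumes the candidate records are already in hand as plain data; the `Candidate` fields and the `dedupe_key` and `merge_new_candidates` names are hypothetical, not taken from any particular ATS or DE tool, and a real system would likely match on more than name and phone.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record shape; a real ATS will have its own fields.
    name: str
    phone: str
    address: str


def dedupe_key(c: Candidate) -> tuple[str, str]:
    """Normalize name and phone so trivial formatting differences still match."""
    name = " ".join(c.name.lower().split())  # lowercase, collapse whitespace
    phone = re.sub(r"\D", "", c.phone)       # keep digits only
    return (name, phone)


def merge_new_candidates(
    existing: list[Candidate], incoming: list[Candidate]
) -> list[Candidate]:
    """Return only the incoming candidates that are not already on file."""
    seen = {dedupe_key(c) for c in existing}
    added = []
    for c in incoming:
        key = dedupe_key(c)
        if key not in seen:
            seen.add(key)  # also catches duplicates within the incoming batch
            added.append(c)
    return added
```

Matching on a normalized name plus phone is just one possible rule. It treats "Jane Doe, (555) 123-4567" and "jane doe, 555-123-4567" as the same person, but it would miss someone who changed phone numbers.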
Now imagine if you had 20 subscriptions and could pull 10,000 resumes per subscription per night. That is 200,000 resumes a night, so it wouldn't take long for you to have the entire job board database.
Now, I'm certainly not advocating the use of DE tools in this manner. It is blatantly wrong, IMO, to strip a company of its most valued asset: the data it creates and supplies.
But put that scenario aside. What if you used it on a more limited basis, say on LinkedIn? As anyone with a LinkedIn account knows, it's damn near impossible to extract a name, address, company name, and profile address out of LinkedIn. Try copying the info into MS Word and you get all the graphics; MS Excel, ugh, nothing is formatted; Notepad, forget about it, a completely unformatted mess. But the DE tool could scan every page of the search results, extract only the information you needed or wanted, and put it into an Excel spreadsheet format, making it easier to verify the information and manipulate it by sorting the columns, doing a global replace of a title or company name, etc.
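As a rough illustration of the "only the information you needed, in spreadsheet format" idea, here is a small Python sketch that takes records you already have and writes just the chosen columns to CSV, which Excel can open and sort. The column names in `FIELDS` and the `records_to_csv` function are assumptions made up for this example; nothing here touches LinkedIn or any other site.

```python
import csv
import io

# Hypothetical column names; pick whatever fields you actually care about.
FIELDS = ["name", "company", "title", "profile_url"]


def records_to_csv(records, sort_by="company"):
    """Write only the wanted fields to CSV text, sorted by one column."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    # Case-insensitive sort so "acme" and "Acme" end up together.
    for row in sorted(records, key=lambda r: r.get(sort_by, "").lower()):
        # Drop any extra keys and fill missing ones with an empty string.
        writer.writerow({f: row.get(f, "") for f in FIELDS})
    return buf.getvalue()
```

Saving the returned text to a `.csv` file gives you a sheet with one row per person and one column per field, with none of the graphics or layout clutter.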
So where is that "Line in the Sand" for Data Extraction Tools?
Are we as Internet Researchers solely responsible for their use/misuse, or are the Internet sites solely responsible for protecting their data?
How is it different from Google spidering a web site and cataloging the site's web pages so they can sort for relevancy, claiming it adds more value to their service, and then sell advertising (a lot of advertising)?
And you can't say, "Well, as long as you use it in a limited fashion." Define limited. What you define as limited as a single researcher working from a home office may not meet the definition of a corporate staffing manager who has to keep 50 recruiters flush with names to call every day.
Jeff Weidner
Title HTC Research Corp
Tuesday, April 1, 2008
Data Extraction Tools; High Tech Theft or Sourcing Automation ?
Posted by Jeff Weidner at 9:33 AM
Labels: candidate development sourcing candidate sourcing name generation, Data extraction Tools
