Tuesday, April 1, 2008

Data Extraction Tools: High-Tech Theft or Sourcing Automation?

An internet sourcer friend of mine and I were talking, and he told me that his employer, a large recruiting and RPO company, routinely uses a data extraction tool on some of the biggest resume job boards to pull down tens of thousands of resumes every night. Now, I have no way of confirming this, so I'll keep company names out of the conversation, but where is the line in the sand for these data extraction tools?

For those of you who aren't familiar with DE tools, imagine if you could program a piece of software to do the following:
1) Log into your favorite job board
2) Perform specific searches for candidate resumes
3) Follow the link to each individual's resume
4) Compare that resume/name/address/phone against other resumes you already have in your ATS
5) If it's not a duplicate, create a new candidate file and save the file

It's a way of automating the whole process.
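To make steps 4 and 5 concrete, here is a minimal Python sketch of the duplicate check only. It assumes each resume has already been reduced to a record with "name" and "phone" fields; those field names and the matching rule (same normalized name and phone) are my own assumptions for illustration, not how any particular ATS or DE tool works.

```python
import re


def normalize_phone(phone):
    """Keep digits only so '(919) 555-0100' and '919.555.0100' compare equal."""
    return re.sub(r"\D", "", phone)


def candidate_key(record):
    """Build a comparison key from name and phone (hypothetical field names)."""
    name = " ".join(record["name"].lower().split())
    return (name, normalize_phone(record["phone"]))


def add_new_candidates(ats_records, incoming):
    """Return only the incoming records that are not already in the ATS."""
    seen = {candidate_key(r) for r in ats_records}
    added = []
    for record in incoming:
        key = candidate_key(record)
        if key not in seen:
            seen.add(key)          # also catches duplicates within the batch
            added.append(record)
    return added
```

A real tool would probably need fuzzier matching (nicknames, changed phone numbers), but the idea is the same: normalize, compare, and only save what is new.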

Now imagine if you had 20 subscriptions and could pull 10,000 resumes per subscription per night. That is 200,000 resumes a night, so it wouldn't take long for you to have the entire job board database.
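The arithmetic is easy to check. In the sketch below, the 20 subscriptions and 10,000 resumes per night come from the scenario above; the 10 million resume database size is a made-up figure purely for illustration, since I don't know the real size of any job board's database.

```python
import math


def nightly_volume(subscriptions, resumes_per_subscription):
    """Total resumes pulled in one night across all subscriptions."""
    return subscriptions * resumes_per_subscription


def nights_to_copy(database_size, subscriptions, resumes_per_subscription):
    """Nights needed to pull database_size resumes, rounded up."""
    per_night = nightly_volume(subscriptions, resumes_per_subscription)
    return math.ceil(database_size / per_night)


per_night = nightly_volume(20, 10_000)
# Hypothetical database size, for illustration only.
nights = nights_to_copy(10_000_000, 20, 10_000)
print(per_night, nights)
```

Under those assumptions the whole database would be gone in under two months, and that ignores duplicates, which would only slow things down a little.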

Now, I'm certainly not advocating the use of DE tools in this manner. It is blatantly wrong, IMO, to strip a company of its most valued asset, that being the data it creates and supplies.

But put that scenario aside. What if you used it on a more limited basis, say on LinkedIn? As anyone with a LinkedIn account knows, it's damn near impossible to extract a name, address, company name and profile address out of LinkedIn. Try copying the info into MS Word and you get all the graphics; in MS Excel, ugh, nothing is formatted; in Notepad, forget about it, a completely unformatted mess. But a DE tool could scan every page of the search results, extract only the information you needed or wanted, and put it into an Excel spreadsheet format, making it easier to verify the information and to manipulate it by sorting the columns, doing a global replace of a title or company name, etc.
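Here is a rough Python sketch of just that last part: taking records that have already been extracted, doing a global replace on one column, sorting, and writing a CSV file that Excel can open. The column names and the sample values are hypothetical, and the sketch deliberately says nothing about how the records were collected.

```python
import csv
import io

# Hypothetical column names, chosen for illustration.
FIELDS = ["name", "title", "company", "profile_url"]


def replace_value(records, field, old, new):
    """Global replace of one exact value in one column, e.g. a company name."""
    return [{**r, field: new} if r[field] == old else dict(r) for r in records]


def to_csv(records, sort_by="company"):
    """Sort the records by one column and return CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(sorted(records, key=lambda r: r[sort_by]))
    return buffer.getvalue()
```

For example, `to_csv(replace_value(records, "company", "Acme Corp.", "Acme"))` would give you one tidy, sortable sheet with a consistent company name, which is exactly what copy and paste won't give you.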

So where is that "Line in the Sand" for Data Extraction Tools?
Are we as Internet Researchers solely responsible for their use or misuse, or are the Internet sites solely responsible for protecting their data?
How is it different from Google spidering a web site and cataloging its pages so it can sort them for relevancy, claiming that adds more value to its service, and then selling advertising (a lot of advertising)?
And you can't say "Well, as long as you use it in a limited fashion." Define limited. What a single researcher working from a home office defines as limited may not match the definition of a Corp Staffing Mgr who has to keep 50 recruiters flush with names to call every day.

Jeff Weidner

Title HTC Research Corp

2 comments:

Anonymous said...

Great article. It's becoming a common practice amongst offshore RPOs as well. There are no laws restraining or curtailing the practice of Data Extraction for offshore RPOs. What I do know is that certain offshore RPOs created a way for 20, 50 or 100 recruiters to log on with a single login without the major job boards noticing it, meaning that they pay for only one login but 100 recruiters can use it at the same time. I'm not saying DE is right or wrong, but creating a tool to get out of paying might be questionable. There are large gray areas, and these practitioners are banking on questionable practices. Some are well known in the industry. Pretty funny: create it, sell it, then steal it back quietly bit by bit, one resume at a time.

Here's a thought...
What if all the recruiters in the world decided to come together and share their own resume databases through P2P sharing? Now that would scare the big players in the JB market.

Well that is a big 'IF'

Little Bird

Bob Etheridge said...

Hi Little Bird,

That is a big IF to get "all the recruiters in the world" to come together and share resume databases in a P2P model. Too big an if... but is it too big an if to get recruiters (corporate or agency, but not both) working in a certain industry and/or geographic location to get together? Say, all biotech recruiters in North Carolina?