REA Group has ramped up its defences against website scrapers and account takeover fraudsters, using technology from Australian security start-up Kasada to frustrate bots targeting its content and users.
The ASX-listed operator of property sites such as realestate.com.au said it faced increasingly sophisticated attempts to scrape its content for monetisation elsewhere.
It also suspected it was the target of credential-stuffing attacks, where bots aimed reams of stolen credentials at its websites in the hope that some logins worked, allowing attackers to take over those accounts.
“One thing that [real estate listing platforms] have in common is they’re very much a treasure trove of data,” REA Group systems manager Andrew Logue told the recent AppSec Day Australia conference.
The data is not just REA’s. Increasingly, third-party data, such as performance data of local schools, is surfaced through real estate listing sites.
“The problem with that is we’re now not only protecting our own intellectual property, we’re protecting somebody else’s,” Logue said.
“And if you read the terms and conditions when you sign up to a third party, they’re probably going to say, ‘If you access our data feeds, you’ve got to make sure that nobody’s going to scrape them’.
“We’re going to put them on the public web – so anyway, that’s a different hard problem.”
The bot problem
Logue said scrapers ranged from individuals running scripts for personal use, such as to be alerted to new rental listings every Monday, to more sophisticated operators that re-packaged the data and onsold it.
Some scrapers were relatively easy to identify and frustrate using “traditional” means, such as rate limiting, IP address blocking, geoblocking, or by turning on web application firewall (WAF) rules.
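As an illustration of the first of those traditional means, per-client rate limiting is commonly implemented as a token bucket keyed on source IP; the class and thresholds below are a hypothetical sketch, not REA's configuration:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-client token bucket: each request spends a token;
    tokens refill at a fixed rate up to a burst capacity."""

    def __init__(self, rate=5.0, capacity=10):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = defaultdict(lambda: capacity)
        self.last = defaultdict(time.monotonic)

    def allow(self, client_ip, now=None):
        now = time.monotonic() if now is None else now
        # Refill according to elapsed time, capped at capacity
        elapsed = now - self.last[client_ip]
        self.last[client_ip] = now
        self.tokens[client_ip] = min(
            self.capacity, self.tokens[client_ip] + elapsed * self.rate)
        if self.tokens[client_ip] >= 1:
            self.tokens[client_ip] -= 1
            return True
        return False  # over the limit: throttle or block
```

Requests beyond the burst capacity are denied until tokens refill; a distributed scraper defeats this by keeping each individual IP under the threshold, which is exactly the evasion described below.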
But Logue said scrapers were becoming more sophisticated and harder to detect as they were able to use tools and open source scripts to better blend in with normal website traffic.
“We’ve found that a lot of the scrapers and the people involved in potential account takeover activity [are] quite easily circumventing a lot of the more traditional approaches,” he said.
“Scrapers are getting better, largely because they have a better set of tools at their disposal.
“You’ve got toolkits such as Selenium and Puppeteer – and then on the [account takeover] side you’ve got tools such as Sentry MBA that can be used to drip feed a whole lot of user creds under the radar using a previously disclosed breach.”
Scrapers were also able to distribute their activities over hundreds – possibly tens of millions – of different residential IP addresses and user agents (“crawlers”), and a range of geographies.
Logue noted that was “making the life of people out there that are managing users, managing intellectual property and trying to protect user identities a hell of a lot harder”.
“It means that you’re not only dealing with these very specific characteristics,” he said.
“You’ve got an extremely high cardinality when it comes to IP addresses, locations, user agents, and most of the other characteristics that get associated with any kind of scraping activity or attack that you might encounter.”
High cardinality describes values (such as in a database) that are relatively rare or unique.
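Concretely, cardinality here is just the number of distinct values a field takes across request logs; when the ratio of distinct IPs to requests approaches 1, per-IP defences have nothing to bite on. A toy sketch with made-up log records:

```python
def cardinality(records, field):
    """Number of distinct values a field takes across log records."""
    return len({r[field] for r in records})

# Hypothetical request log: a distributed scraper shows many
# distinct IPs and user agents for relatively few requests.
log = [
    {"ip": "203.0.113.1",  "ua": "Mozilla/5.0 (X11)"},
    {"ip": "203.0.113.7",  "ua": "Mozilla/5.0 (Windows)"},
    {"ip": "198.51.100.2", "ua": "Mozilla/5.0 (Mac)"},
    {"ip": "198.51.100.9", "ua": "Mozilla/5.0 (X11)"},
]

distinct_ips = cardinality(log, "ip")  # 4 distinct IPs
distinct_uas = cardinality(log, "ua")  # 3 distinct user agents
ip_ratio = distinct_ips / len(log)     # 1.0: every request from a new IP
```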
Logue said a scraping bid that REA dubbed ‘The Monster’ was a case in point.
“It was a lower but far more sustained number of requests, so over time, it just blended in and made us sort of think that our base level of user traffic was a lot higher than it was,” he said.
“[It] had over 3000 distinct IPs and over 10,000 distinct user agents, and it was spread across 72 different countries.
“And you’re like, OK, where do we start? Luckily they were only trying to scrape. This wasn’t an account takeover attempt, because the ramifications of that can be a lot greater.”
Logue said REA tackled ‘The Monster’ largely through geoblocking.
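Geoblocking at this scale boils down to resolving each source IP to a country code and rejecting a blocklist of countries. Production systems would use a real GeoIP database (e.g. MaxMind GeoLite2); the lookup table below is a stub for illustration only:

```python
# Stub standing in for a real GeoIP database lookup (e.g. MaxMind GeoLite2).
GEO_STUB = {
    "203.0.113.5":  "AU",
    "198.51.100.7": "RU",
    "192.0.2.99":   "BR",
}

BLOCKED_COUNTRIES = {"RU", "BR"}  # illustrative blocklist only

def is_geoblocked(ip, lookup=GEO_STUB.get):
    """Return True if the IP resolves to a blocked country.
    Unknown IPs are allowed (fail open) in this sketch."""
    country = lookup(ip)
    return country in BLOCKED_COUNTRIES
```

The whack-a-mole Logue describes follows directly from this design: blocking one country simply moves the traffic to an IP range that resolves somewhere else.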
“I can’t recall the full count, but I think … we were talking about 50-70 different countries blocked around the world, and as we were blocking them, they were popping up elsewhere.
“Luckily, the only casualties were person hours. So while it seemed like quite a grave attack at the time, the damage was actually mitigated to maybe about $100,000 worth of engineering time.
“But in terms of the opportunity cost, we’d rather spend that money elsewhere.”
At the time, Logue said REA had Kasada’s Polyform under proof-of-concept.
The technology uses a large amount of telemetry to distinguish between “good” and “bad” bots and human traffic, and then deploys a range of techniques against sources of unwanted traffic.
“In the event of what we thought could be a credential-stuffing attack, we essentially fast-tracked our proof-of-concept … into production, and it immediately stopped all of the attempts that we were seeing pop up,” Logue said.
Logue said Kasada has only been set up to protect “specific channels” associated with realestate.com.au, but he said the results had been promising.
Polyform was able to return traffic levels “back to what we think human traffic should look like” for one particular channel, knocking “out 43 percent of all the requests that were hitting our origin for that particular channel” without skewing the site’s audience metrics, Logue said.
Logue said REA also found scraping attempts – and vectors – that had previously gone undetected.
“We were being scraped by Google Docs, Google Sheets, PHP, PowerShell … and we didn’t know about any of this until we turned this device on,” he said.
Kasada field CTO Nick Rieniets said the company’s goal was to turn the tables on attackers.
“You’re taking a situation where it’s very cheap, easy and fast to attack the site, and flipping it so that it becomes very difficult, very expensive, and ultimately, very time consuming,” he said.
“The idea here is that we’re now trying to make the REA site as expensive as possible to scrape, and we do that by changing the way that we defend them on a very frequent basis.
“So we will allow an attacker to go down a path of investigating a particular tool of choice – maybe they’re going to go from Python to a headless Chrome environment. We’ll follow that environment and then as they get more confident that’s possible, we will then neutralise them.
Rieniets said Polyform offered “limitless” options for defenders of websites to frustrate “real, live attacks”.
“I could instruct the browser to solve a puzzle where I could make that puzzle super difficult, and so all of a sudden, you’ve got no memory left in your bot,” he said.
“I could instruct the browser to download an extremely large file and just blow that portion of it up. I could eat the CPU, I could send you the wrong information, and I could change things on the fly, so that it’s completely unreliable to you and I’ve neutralised the output of your bot, but you still probably think that it’s working, because the thing about this is that we’re actually giving back to the user the response, which will not signal to them that anything [untoward is] going on [to their bot].”
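The puzzle tactic Rieniets describes works like a client-side proof-of-work: the server sets a challenge whose solving cost it can dial up arbitrarily while its own verification stays cheap. A hash-based sketch of the general idea (illustrative only, not Kasada's actual mechanism):

```python
import hashlib
from itertools import count

def solve(challenge: str, difficulty: int) -> int:
    """Find a nonce whose hash over the challenge starts with
    `difficulty` zero hex digits. Cost grows ~16x per extra digit,
    so the defender can make each request arbitrarily expensive."""
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Verification stays cheap for the server: a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

Each extra zero digit multiplies the bot's work by roughly 16 while the server still verifies with one hash, which is the asymmetry that makes mass scraping “very expensive … very time consuming”.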
Rieniets said the aim was to “elongate and frustrate” attempts to scrape data or to run a credential-stuffing attack, to the point where the attacker “makes a decision to stop attacking REA and to go somewhere else.”
“The whole goal of changing the game of the economics of attacking this particular part of the site is to get the person that’s writing the scripts to make one final statement in the battle for victory: to give up. And that’s the goal.”